steve 0cea6a4b28 v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish

- Add system tray icon with Exit menu (tray-icon/muda)
- Add IPC daemon status via named pipe (Windows) / Unix socket (Linux)
- Add `mouth status` command to query running daemon
- Add daemon lock to prevent multiple instances
- Hide Windows console window when running as daemon
- Wire up Silero VAD model download and speech filtering
- Switch hotkey listener from rdev::listen to rdev::grab to consume hotkeys
- Add hotkey capture mode in interactive config (press keys instead of typing)
- Add all missing key names (brackets, punctuation, numpad, etc.)
- Fix ONNX tensor type mismatches (encoder wants i64, decoder wants i32)
- Add 300ms lead-in silence to compensate for mic startup latency
- Add 300ms trailing recording after stop so the end of speech is not clipped
- Add 50ms silence before audio feedback blips for device warmup
- Reduce overlay size (150x18, was 200x36)
- Add PolyForm Noncommercial 1.0.0 license
- Flesh out user-focused README
- Update release script with Gitea/GitHub forge support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-10 22:04:39 +01:00

Name	Last commit	Date
src	v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish	2026-04-10 22:04:39 +01:00
.gitignore	Add release build script with Linux and Windows targets	2026-04-10 16:56:35 +01:00
Cargo.lock	v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish	2026-04-10 22:04:39 +01:00
Cargo.toml	v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish	2026-04-10 22:04:39 +01:00
CLAUDE.md	Implement core speech-to-text pipeline	2026-04-10 16:47:46 +01:00
config.yaml.example	Implement core speech-to-text pipeline	2026-04-10 16:47:46 +01:00
LICENSE	v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish	2026-04-10 22:04:39 +01:00
README.md	v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish	2026-04-10 22:04:39 +01:00
release.sh	v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish	2026-04-10 22:04:39 +01:00

README.md

Mouth

Offline speech-to-text with a global hotkey. Press a key, speak, and transcribed text is pasted at your cursor. No cloud services, no API keys — everything runs locally.

Uses Parakeet TDT 0.6B v3 for transcription and Silero VAD for voice activity detection, both via ONNX Runtime.

Quick Start

1. Download mouth.exe (Windows) or build from source.
2. Run mouth. Models download automatically on first launch (about 800 MB, one time only).
3. Press your hotkey (default: Ctrl+Space), speak, and release. The text appears at your cursor.

Mouth runs in the background with a system tray icon. Right-click the tray icon to exit.

Usage

Command	Description
mouth	Run the daemon (default)
mouth config	Interactive configuration
mouth config --show	Print current config
mouth config --reset	Reset to defaults
mouth models	List available models
mouth models --download	Download configured model
mouth status	Show daemon status

Configuration

Config file location:

Windows: %APPDATA%\mouth\config.yaml
Linux/macOS: ~/.config/mouth/config.yaml
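
The two locations above can be resolved with a small helper. This is a minimal sketch using only the standard library; the function names `config_path_from` and `config_path` are hypothetical, and reading `%APPDATA%` and `$HOME` directly is an assumption about how the lookup might be done, not a description of Mouth's actual code.

```rust
use std::path::PathBuf;

/// Build the config path from explicit values so the logic is easy to test.
/// `appdata` stands in for %APPDATA% and `home` for $HOME (both assumptions).
fn config_path_from(
    is_windows: bool,
    appdata: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    let base = if is_windows {
        // Windows: %APPDATA%\mouth\config.yaml
        PathBuf::from(appdata?)
    } else {
        // Linux/macOS: ~/.config/mouth/config.yaml
        PathBuf::from(home?).join(".config")
    };
    Some(base.join("mouth").join("config.yaml"))
}

/// Resolve the path for the current platform from the real environment.
/// Returns None when the relevant variable is not set.
fn config_path() -> Option<PathBuf> {
    let appdata = std::env::var("APPDATA").ok();
    let home = std::env::var("HOME").ok();
    config_path_from(cfg!(windows), appdata.as_deref(), home.as_deref())
}
```

Splitting the pure path logic from the environment lookup keeps the platform rule testable without touching real environment variables.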

Run mouth config for an interactive setup, or edit the YAML directly:

hotkey: "ctrl+space"
mode: push_to_talk        # push_to_talk or toggle
cancel_key: "escape"
model: "parakeet-tdt-0.6b-v3"
accelerator: auto          # auto, cpu, cuda, directml
gpu_device: 0
paste_method: ctrl_v       # ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
copy_to_clipboard: true
overlay_position: top      # top, bottom, none
audio_feedback: true
input_device: null          # null = system default
vad_enabled: true
language: en

Recording Modes

push_to_talk — Hold the hotkey while speaking, release to transcribe
toggle — Press once to start recording, press again to stop and transcribe
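
The difference between the two modes can be sketched as a small state machine. All type and method names here (`Mode`, `KeyEvent`, `Action`, `Recorder`, `on_hotkey`) are hypothetical and chosen for illustration; the cancel key is left out, and the real daemon may structure this differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    PushToTalk,
    Toggle,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyEvent {
    Press,
    Release,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    StartRecording,
    StopAndTranscribe,
    Ignore,
}

struct Recorder {
    mode: Mode,
    recording: bool,
}

impl Recorder {
    fn new(mode: Mode) -> Self {
        Recorder { mode, recording: false }
    }

    /// Decide what a hotkey event means in the current mode and state.
    fn on_hotkey(&mut self, event: KeyEvent) -> Action {
        let action = match (self.mode, event, self.recording) {
            // push_to_talk: hold to record, release to transcribe.
            (Mode::PushToTalk, KeyEvent::Press, false) => Action::StartRecording,
            (Mode::PushToTalk, KeyEvent::Release, true) => Action::StopAndTranscribe,
            // toggle: each press flips the state; releases are ignored.
            (Mode::Toggle, KeyEvent::Press, false) => Action::StartRecording,
            (Mode::Toggle, KeyEvent::Press, true) => Action::StopAndTranscribe,
            // Everything else (e.g. key auto-repeat while held) is a no-op.
            _ => Action::Ignore,
        };
        match action {
            Action::StartRecording => self.recording = true,
            Action::StopAndTranscribe => self.recording = false,
            Action::Ignore => {}
        }
        action
    }
}
```

Ignoring repeated press events while already recording matters in push_to_talk mode, since many systems send auto-repeat presses while a key is held.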

Hotkey Format

Hotkeys are written as modifier+key combinations:

Modifiers: ctrl, alt, shift, meta (Win key)
Keys: letters (a-z), numbers (0-9), function keys (f1-f12), punctuation ([, ], ;, etc.), and special keys (space, enter, escape, tab, etc.)

Examples: ctrl+space, alt+r, ctrl+shift+[, f9

When running mouth config, you can press the key combination directly instead of typing it.
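
To make the format concrete, here is a rough sketch of how a string such as ctrl+shift+[ could be split into modifiers and a key. The `Hotkey` struct and `parse_hotkey` function are hypothetical, and the sketch does not validate the key name against the supported key list; it only illustrates the modifier+key shape described above.

```rust
#[derive(Debug, Default, PartialEq)]
struct Hotkey {
    ctrl: bool,
    alt: bool,
    shift: bool,
    meta: bool,
    /// The single non-modifier key, lowercased (e.g. "space", "[", "f9").
    key: String,
}

/// Parse a "modifier+key" string. Exactly one non-modifier key is required.
fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let mut hotkey = Hotkey::default();
    for part in spec.split('+') {
        let part = part.trim().to_ascii_lowercase();
        match part.as_str() {
            "" => return Err(format!("empty segment in hotkey '{spec}'")),
            "ctrl" => hotkey.ctrl = true,
            "alt" => hotkey.alt = true,
            "shift" => hotkey.shift = true,
            "meta" => hotkey.meta = true,
            // First non-modifier segment becomes the key.
            other if hotkey.key.is_empty() => hotkey.key = other.to_string(),
            _ => return Err(format!("more than one non-modifier key in '{spec}'")),
        }
    }
    if hotkey.key.is_empty() {
        return Err(format!("no non-modifier key in '{spec}'"));
    }
    Ok(hotkey)
}
```

Because the sketch splits on `+`, it cannot express a hotkey whose key is itself the plus sign; a real parser would need a named key for that case.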

Paste Methods

ctrl_v — Simulates Ctrl+V (works in most apps)
shift_insert — Simulates Shift+Insert (useful for terminals)
ctrl_shift_v — Simulates Ctrl+Shift+V (plain text paste)
clipboard_only — Copies to clipboard without pasting
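
One way to model these four values is an enum that parses the config string and reports which keys to simulate. This is an illustrative sketch: `PasteMethod` and `chord` are hypothetical names, and the key names in the returned slices are placeholders rather than identifiers from any input-simulation library.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq)]
enum PasteMethod {
    CtrlV,
    ShiftInsert,
    CtrlShiftV,
    ClipboardOnly,
}

impl FromStr for PasteMethod {
    type Err = String;

    /// Map the `paste_method` config value to a variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ctrl_v" => Ok(Self::CtrlV),
            "shift_insert" => Ok(Self::ShiftInsert),
            "ctrl_shift_v" => Ok(Self::CtrlShiftV),
            "clipboard_only" => Ok(Self::ClipboardOnly),
            other => Err(format!("unknown paste_method '{other}'")),
        }
    }
}

impl PasteMethod {
    /// Keys to press together after the text is on the clipboard.
    /// An empty slice means "copy only, do not paste".
    fn chord(self) -> &'static [&'static str] {
        match self {
            Self::CtrlV => &["ctrl", "v"],
            Self::ShiftInsert => &["shift", "insert"],
            Self::CtrlShiftV => &["ctrl", "shift", "v"],
            Self::ClipboardOnly => &[],
        }
    }
}
```

With this shape, a caller can write `"shift_insert".parse::<PasteMethod>()` and reject unknown values at config load time instead of at paste time.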

Overlay

A small colour-coded bar appears at the top (or bottom) of your screen:

Red — Recording
Amber — Transcribing
Green — Done

Set overlay_position: none to disable.

Building from Source

Requires Rust 1.75+.

# Linux dependencies (Ubuntu/Debian)
sudo apt-get install libssl-dev libasound2-dev libpulse-dev \
  libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev \
  libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake

# Build
cargo build --release

# Cross-compile for Windows from Linux (requires cargo-xwin)
cargo xwin build --release --target x86_64-pc-windows-msvc

How It Works

1. A global hotkey listener intercepts your configured key combination (consuming it so it doesn't reach other apps).
2. Audio is captured from your microphone and resampled to 16 kHz.
3. Silero VAD trims silence from the recording.
4. The Parakeet TDT model transcribes speech to text via ONNX Runtime.
5. Text is placed on the clipboard and pasted at your cursor.

All processing happens locally. No data leaves your machine.
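
The audio preparation steps (resampling and silence trimming) can be approximated with the standard library alone. The sketch below is not Mouth's implementation: `resample_linear` uses plain linear interpolation, and `trim_silence` uses a simple RMS energy threshold as a stand-in for Silero VAD, which is a neural model and behaves differently. The function names, frame size, and threshold are all assumptions for illustration.

```rust
/// Sample rate the pipeline description above resamples to.
const TARGET_RATE: u32 = 16_000;

/// Resample mono audio by linear interpolation between neighbouring samples.
fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == 0 || to_rate == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_rate as u64 / from_rate as u64) as usize;
    let step = from_rate as f64 / to_rate as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            // Position of output sample i on the input time axis.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

/// Drop leading and trailing frames whose RMS level is at or below `threshold`.
/// This is an energy gate, NOT Silero VAD; it only illustrates the trimming step.
fn trim_silence(samples: &[f32], frame: usize, threshold: f32) -> &[f32] {
    if frame == 0 {
        return samples;
    }
    let is_loud = |chunk: &[f32]| {
        let mean_sq = chunk.iter().map(|s| s * s).sum::<f32>() / chunk.len() as f32;
        mean_sq.sqrt() > threshold
    };
    let loud: Vec<bool> = samples.chunks(frame).map(is_loud).collect();
    let first = loud.iter().position(|&l| l);
    let last = loud.iter().rposition(|&l| l);
    match (first, last) {
        (Some(f), Some(l)) => &samples[f * frame..((l + 1) * frame).min(samples.len())],
        // No frame exceeded the threshold: nothing to transcribe.
        _ => &samples[0..0],
    }
}
```

A caller would typically run `resample_linear(&captured, device_rate, TARGET_RATE)` first and then trim, so that frame sizes are expressed at a fixed 16 kHz rate. Linear interpolation does no low-pass filtering, so a production resampler would likely use a proper filter to avoid aliasing.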

License

PolyForm Noncommercial 1.0.0 — free for personal and non-commercial use. For commercial licensing, contact the author.

Releases 3

Mouth v0.2.3 Latest

2026-04-15 05:46:15 +01:00

Languages

Rust 93.5%

Shell 6.5%