# Mouth

Offline speech-to-text with a global hotkey. Press a key, speak, and transcribed text is pasted at your cursor. No cloud services, no API keys: everything runs locally.

Uses [Parakeet TDT 0.6B v3](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx) for transcription and [Silero VAD](https://huggingface.co/onnx-community/silero-vad) for voice activity detection, both via ONNX Runtime.

## Quick Start

1. Download `mouth.exe` (Windows) or build from source
2. Run `mouth`; models download automatically on first launch (~800MB one-time)
3. Press your hotkey (default: `Ctrl+Space`), speak, release; text appears at your cursor

Mouth runs in the background with a system tray icon. Right-click the tray icon to exit.

## Usage

| Command                   | Description               |
| ------------------------- | ------------------------- |
| `mouth`                   | Run the daemon (default)  |
| `mouth config`            | Interactive configuration |
| `mouth config --show`     | Print current config      |
| `mouth config --reset`    | Reset to defaults         |
| `mouth models`            | List available models     |
| `mouth models --download` | Download configured model |
| `mouth status`            | Show daemon status        |

## Configuration

Config file location:

- **Windows:** `%APPDATA%\mouth\config.yaml`
- **Linux/macOS:** `~/.config/mouth/config.yaml`

Run `mouth config` for an interactive setup, or edit the YAML directly:

```yaml
hotkey: "ctrl+space"
mode: push_to_talk # push_to_talk or toggle
cancel_key: "escape"
model: "parakeet-tdt-0.6b-v3"
accelerator: auto # auto, cpu, cuda, directml
gpu_device: 0
paste_method: ctrl_v # ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
copy_to_clipboard: true
overlay_position: top # top, bottom, none
audio_feedback: true
input_device: null # null = system default
vad_enabled: true
language: en
```

### Recording Modes

- **push_to_talk**: Hold the hotkey while speaking, release to transcribe
- **toggle**: Press once to start recording, press again to stop and transcribe

### Hotkey Format

Hotkeys are written as modifier+key
combinations:

- Modifiers: `ctrl`, `alt`, `shift`, `meta` (Win key)
- Keys: letters (`a`-`z`), numbers (`0`-`9`), function keys (`f1`-`f12`), punctuation (`[`, `]`, `;`, etc.), and special keys (`space`, `enter`, `escape`, `tab`, etc.)

Examples: `ctrl+space`, `alt+r`, `ctrl+shift+[`, `f9`

When running `mouth config`, you can press the key combination directly instead of typing it.

### Paste Methods

- **ctrl_v**: Simulates Ctrl+V (works in most apps)
- **shift_insert**: Simulates Shift+Insert (useful for terminals)
- **ctrl_shift_v**: Simulates Ctrl+Shift+V (plain text paste)
- **clipboard_only**: Copies to clipboard without pasting

## Overlay

A small colour-coded bar appears at the top (or bottom) of your screen:

- **Red**: Recording
- **Amber**: Transcribing
- **Green**: Done

Set `overlay_position: none` to disable.

## Building from Source

Requires Rust 1.75+.

```bash
# Linux dependencies (Ubuntu/Debian)
sudo apt-get install libssl-dev libasound2-dev libpulse-dev \
  libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev \
  libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake

# Build
cargo build --release

# Cross-compile for Windows from Linux (requires cargo-xwin)
cargo xwin build --release --target x86_64-pc-windows-msvc
```

## How It Works

1. A global hotkey listener intercepts your configured key combination (consuming it so it doesn't reach other apps)
2. Audio is captured from your microphone and resampled to 16kHz
3. Silero VAD trims silence from the recording
4. The Parakeet TDT model transcribes speech to text via ONNX Runtime
5. Text is placed on the clipboard and pasted at your cursor

All processing happens locally. No data leaves your machine.

## License

[PolyForm Noncommercial 1.0.0](LICENSE): free for personal and non-commercial use. For commercial licensing, contact the author.
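
## Appendix: Hotkey Parsing Sketch

The Hotkey Format section describes hotkeys as `+`-separated modifier and key names. The sketch below shows one way such a string could be parsed in Rust. It is not taken from Mouth's source: `Modifier`, `Hotkey`, and `parse_hotkey` are hypothetical names chosen for illustration, and the key name is accepted as-is rather than checked against the supported key list.

```rust
use std::collections::BTreeSet;

/// The four modifier names listed under Hotkey Format.
/// (Hypothetical type, not Mouth's actual representation.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Modifier {
    Ctrl,
    Alt,
    Shift,
    Meta,
}

/// A parsed hotkey: zero or more modifiers plus exactly one key.
#[derive(Debug, PartialEq, Eq)]
struct Hotkey {
    modifiers: BTreeSet<Modifier>,
    key: String,
}

/// Parse a string such as "ctrl+shift+[" into a `Hotkey`.
/// Assumption: components are case-insensitive and separated by '+',
/// so a literal '+' key cannot be expressed in this sketch.
fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let mut modifiers = BTreeSet::new();
    let mut key: Option<String> = None;

    for raw in spec.split('+') {
        let part = raw.trim().to_ascii_lowercase();
        if part.is_empty() {
            return Err(format!("empty component in hotkey {:?}", spec));
        }

        // Recognise modifier names; anything else is treated as the key.
        let modifier = match part.as_str() {
            "ctrl" => Some(Modifier::Ctrl),
            "alt" => Some(Modifier::Alt),
            "shift" => Some(Modifier::Shift),
            "meta" => Some(Modifier::Meta),
            _ => None,
        };

        match modifier {
            Some(m) => {
                modifiers.insert(m);
            }
            None => {
                if key.is_some() {
                    return Err(format!("more than one key in hotkey {:?}", spec));
                }
                key = Some(part);
            }
        }
    }

    match key {
        Some(key) => Ok(Hotkey { modifiers, key }),
        None => Err(format!("hotkey {:?} has no non-modifier key", spec)),
    }
}
```

With this sketch, `parse_hotkey("ctrl+shift+[")` yields the `Ctrl` and `Shift` modifiers with key `[`, `parse_hotkey("f9")` yields no modifiers with key `f9`, and a modifier-only string such as `ctrl+shift` is rejected.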