- Add system tray icon with Exit menu (tray-icon/muda)
- Add IPC daemon status via named pipe (Windows) / Unix socket (Linux)
- Add `mouth status` command to query running daemon
- Add daemon lock to prevent multiple instances
- Hide Windows console window when running as daemon
- Wire up Silero VAD model download and speech filtering
- Switch hotkey listener from rdev::listen to rdev::grab to consume hotkeys
- Add hotkey capture mode in interactive config (press keys instead of typing)
- Add all missing key names (brackets, punctuation, numpad, etc.)
- Fix ONNX tensor type mismatches (encoder wants i64, decoder wants i32)
- Add 300ms lead-in silence to compensate for mic startup latency
- Add 300ms trailing recording after stop so speech is not clipped
- Add 50ms silence before audio feedback blips for device warmup
- Reduce overlay size (150x18, was 200x36)
- Add PolyForm Noncommercial 1.0.0 license
- Flesh out user-focused README
- Update release script with Gitea/GitHub forge support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mouth
Offline speech-to-text with a global hotkey. Press a key, speak, and transcribed text is pasted at your cursor. No cloud services, no API keys — everything runs locally.
Uses Parakeet TDT 0.6B v3 for transcription and Silero VAD for voice activity detection, both via ONNX Runtime.
Quick Start
- Download mouth.exe (Windows) or build from source
- Run mouth: models download automatically on first launch (~800MB one-time)
- Press your hotkey (default: Ctrl+Space), speak, release, and the text appears at your cursor
Mouth runs in the background with a system tray icon. Right-click the tray icon to exit.
Usage
| Command | Description |
|---|---|
| mouth | Run the daemon (default) |
| mouth config | Interactive configuration |
| mouth config --show | Print current config |
| mouth config --reset | Reset to defaults |
| mouth models | List available models |
| mouth models --download | Download configured model |
| mouth status | Show daemon status |
Configuration
Config file location:
- Windows: %APPDATA%\mouth\config.yaml
- Linux/macOS: ~/.config/mouth/config.yaml
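The lookup above can be sketched in Rust roughly as follows. This is an illustration only, not the project's actual code: the function name `config_path` is hypothetical, and the environment values are passed in as arguments (rather than read directly) purely to keep the example easy to test.

```rust
use std::path::PathBuf;

/// Hypothetical helper: builds the config file path from the documented
/// locations. `appdata` stands in for %APPDATA%, `home` for the user's
/// home directory. Returns None when the needed value is missing.
fn config_path(is_windows: bool, appdata: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = if is_windows {
        // Windows: %APPDATA%\mouth\config.yaml
        PathBuf::from(appdata?)
    } else {
        // Linux/macOS: ~/.config/mouth/config.yaml
        PathBuf::from(home?).join(".config")
    };
    Some(base.join("mouth").join("config.yaml"))
}
```

In a real program the arguments would presumably come from `cfg!(windows)` and `std::env::var`; how Mouth itself resolves the path is not stated here.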
Run mouth config for an interactive setup, or edit the YAML directly:
hotkey: "ctrl+space"
mode: push_to_talk # push_to_talk or toggle
cancel_key: "escape"
model: "parakeet-tdt-0.6b-v3"
accelerator: auto # auto, cpu, cuda, directml
gpu_device: 0
paste_method: ctrl_v # ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
copy_to_clipboard: true
overlay_position: top # top, bottom, none
audio_feedback: true
input_device: null # null = system default
vad_enabled: true
language: en
Recording Modes
- push_to_talk — Hold the hotkey while speaking, release to transcribe
- toggle — Press once to start recording, press again to stop and transcribe
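The difference between the two modes comes down to which hotkey events start and stop a recording. The sketch below models that as a small state machine. All type and method names (`Recorder`, `HotkeyEvent`, `Action`) are hypothetical, and the key-repeat handling is an assumption about what a robust implementation would want, not a description of Mouth's internals.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode { PushToTalk, Toggle }

#[derive(Clone, Copy, Debug, PartialEq)]
enum HotkeyEvent { Pressed, Released }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Action { StartRecording, StopAndTranscribe, Nothing }

/// Hypothetical recorder state: tracks whether we are recording and
/// whether the hotkey is currently held down.
struct Recorder { mode: Mode, recording: bool, key_down: bool }

impl Recorder {
    fn new(mode: Mode) -> Self {
        Recorder { mode, recording: false, key_down: false }
    }

    /// Maps a hotkey event to the action the daemon should take.
    fn on_event(&mut self, ev: HotkeyEvent) -> Action {
        match ev {
            HotkeyEvent::Pressed => {
                // Ignore repeated press events while the key is held.
                if self.key_down { return Action::Nothing; }
                self.key_down = true;
                match (self.mode, self.recording) {
                    // Both modes start recording on the first press.
                    (_, false) => { self.recording = true; Action::StartRecording }
                    // Toggle: a second press stops and transcribes.
                    (Mode::Toggle, true) => { self.recording = false; Action::StopAndTranscribe }
                    (Mode::PushToTalk, true) => Action::Nothing,
                }
            }
            HotkeyEvent::Released => {
                self.key_down = false;
                // Push-to-talk: releasing the key stops and transcribes.
                if self.mode == Mode::PushToTalk && self.recording {
                    self.recording = false;
                    Action::StopAndTranscribe
                } else {
                    Action::Nothing
                }
            }
        }
    }
}
```

In this model a release event is meaningful only in push_to_talk mode; toggle mode reacts to presses alone.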
Hotkey Format
Hotkeys are written as modifier+key combinations:
- Modifiers: ctrl, alt, shift, meta (Win key)
- Keys: letters (a-z), numbers (0-9), function keys (f1-f12), punctuation ([, ], ;, etc.), and special keys (space, enter, escape, tab, etc.)
Examples: ctrl+space, alt+r, ctrl+shift+[, f9
When running mouth config, you can press the key combination directly instead of typing it.
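To make the format concrete, here is a minimal parser sketch for modifier+key strings. The `Hotkey` struct and `parse_hotkey` function are hypothetical names, not Mouth's API. The sketch does not validate key names against the supported list, and it cannot express a hotkey whose key is `+` itself, since the text above does not say how that case is written.

```rust
/// Hypothetical parsed form of a hotkey string such as "ctrl+shift+[".
#[derive(Debug, Default, PartialEq)]
struct Hotkey {
    ctrl: bool,
    alt: bool,
    shift: bool,
    meta: bool,
    key: String,
}

/// Splits on '+', collects modifiers, and expects exactly one other key.
fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let mut hk = Hotkey::default();
    for part in spec.split('+') {
        let part = part.trim().to_ascii_lowercase();
        match part.as_str() {
            "" => return Err(format!("empty component in {:?}", spec)),
            "ctrl" => hk.ctrl = true,
            "alt" => hk.alt = true,
            "shift" => hk.shift = true,
            "meta" => hk.meta = true,
            key => {
                if !hk.key.is_empty() {
                    return Err(format!("more than one non-modifier key in {:?}", spec));
                }
                hk.key = key.to_string();
            }
        }
    }
    if hk.key.is_empty() {
        Err(format!("no non-modifier key in {:?}", spec))
    } else {
        Ok(hk)
    }
}
```

With this sketch, `ctrl+space` parses to a `Hotkey` with `ctrl` set and `key` equal to `"space"`, while `ctrl+shift` alone is rejected because it names no key.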
Paste Methods
- ctrl_v — Simulates Ctrl+V (works in most apps)
- shift_insert — Simulates Shift+Insert (useful for terminals)
- ctrl_shift_v — Simulates Ctrl+Shift+V (plain text paste)
- clipboard_only — Copies to clipboard without pasting
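One way to model these options in code is an enum that maps each config value to the key chord to simulate. The names below (`PasteMethod`, `from_config`, `chord`) are hypothetical and only illustrate the mapping described in the list above; actual key simulation is left out.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum PasteMethod { CtrlV, ShiftInsert, CtrlShiftV, ClipboardOnly }

impl PasteMethod {
    /// Parses the `paste_method` value from the config file.
    fn from_config(value: &str) -> Option<Self> {
        match value {
            "ctrl_v" => Some(PasteMethod::CtrlV),
            "shift_insert" => Some(PasteMethod::ShiftInsert),
            "ctrl_shift_v" => Some(PasteMethod::CtrlShiftV),
            "clipboard_only" => Some(PasteMethod::ClipboardOnly),
            _ => None,
        }
    }

    /// Keys to simulate, in press order. Empty means the text is only
    /// copied to the clipboard and nothing is pasted.
    fn chord(self) -> &'static [&'static str] {
        match self {
            PasteMethod::CtrlV => &["ctrl", "v"],
            PasteMethod::ShiftInsert => &["shift", "insert"],
            PasteMethod::CtrlShiftV => &["ctrl", "shift", "v"],
            PasteMethod::ClipboardOnly => &[],
        }
    }
}
```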
Overlay
A small colour-coded bar appears at the top (or bottom) of your screen:
- Red — Recording
- Amber — Transcribing
- Green — Done
Set overlay_position: none to disable.
Building from Source
Requires Rust 1.75+.
# Linux dependencies (Ubuntu/Debian)
sudo apt-get install libssl-dev libasound2-dev libpulse-dev \
libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev \
libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake
# Build
cargo build --release
# Cross-compile for Windows from Linux (requires cargo-xwin)
cargo xwin build --release --target x86_64-pc-windows-msvc
How It Works
- A global hotkey listener intercepts your configured key combination (consuming it so it doesn't reach other apps)
- Audio is captured from your microphone and resampled to 16kHz
- Silero VAD trims silence from the recording
- The Parakeet TDT model transcribes speech to text via ONNX Runtime
- Text is placed on the clipboard and pasted at your cursor
All processing happens locally. No data leaves your machine.
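As an illustration of the resampling step above, here is a minimal linear-interpolation resampler. It is a stand-in technique chosen for brevity: it applies no anti-aliasing filter, and the text above does not say which resampler Mouth actually uses. The function name `resample_linear` is hypothetical.

```rust
/// Resamples mono audio by linear interpolation between neighbouring
/// samples. Sketch only: no low-pass filtering is applied, so content
/// above half the target rate will alias when downsampling.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    // Number of output samples covering the same duration.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance in input samples between consecutive output samples.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp indices so the final samples reuse the last input value.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, one second of 48kHz microphone audio (48000 samples) passed through `resample_linear(&samples, 48_000, 16_000)` yields 16000 samples, the rate the pipeline above feeds to the models.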
License
PolyForm Noncommercial 1.0.0 — free for personal and non-commercial use. For commercial licensing, contact the author.