v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish
- Add system tray icon with Exit menu (tray-icon/muda)
- Add IPC daemon status via named pipe (Windows) / Unix socket (Linux)
- Add `mouth status` command to query the running daemon
- Add daemon lock to prevent multiple instances
- Hide Windows console window when running as daemon
- Wire up Silero VAD model download and speech filtering
- Switch hotkey listener from rdev::listen to rdev::grab to consume hotkeys
- Add hotkey capture mode in interactive config (press keys instead of typing)
- Add all missing key names (brackets, punctuation, numpad, etc.)
- Fix ONNX tensor type mismatches (encoder wants i64, decoder wants i32)
- Add 300ms lead-in silence to compensate for mic startup latency
- Keep recording for 300ms after stop so trailing speech isn't clipped
- Add 50ms silence before audio feedback blips for device warmup
- Reduce overlay size (150x18, was 200x36)
- Add PolyForm Noncommercial 1.0.0 license
- Flesh out user-focused README
- Update release script with Gitea/GitHub forge support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,3 +1,114 @@
# Mouth
`Mouth` is a utility that sits in the background waiting for you to press a global hotkey. When you do, it listens to you, quickly transcribes your speech into text using a local speech-recognition model, and pastes the text into your application where the cursor currently sits. No internet connection is needed after the one-time model download.
Offline speech-to-text with a global hotkey. Press a key, speak, and transcribed text is pasted at your cursor. No cloud services, no API keys — everything runs locally.
Uses [Parakeet TDT 0.6B v3](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx) for transcription and [Silero VAD](https://huggingface.co/onnx-community/silero-vad) for voice activity detection, both via ONNX Runtime.
## Quick Start
1. Download `mouth.exe` (Windows) or build from source
2. Run `mouth` — models download automatically on first launch (~800MB one-time)
3. Press your hotkey (default: `Ctrl+Space`), speak, release — text appears at your cursor
Mouth runs in the background with a system tray icon. Right-click the tray icon to exit.
## Usage
| Command                   | Description               |
| ------------------------- | ------------------------- |
| `mouth`                   | Run the daemon (default)  |
| `mouth config`            | Interactive configuration |
| `mouth config --show`     | Print current config      |
| `mouth config --reset`    | Reset to defaults         |
| `mouth models`            | List available models     |
| `mouth models --download` | Download configured model |
| `mouth status`            | Show daemon status        |
## Configuration
Config file location:
- **Windows:** `%APPDATA%\mouth\config.yaml`
- **Linux/macOS:** `~/.config/mouth/config.yaml`
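A minimal sketch of how those two locations could be resolved in Rust, assuming `%APPDATA%` on Windows and `$HOME` elsewhere. The helper names `config_path` and `current_config_path` are hypothetical, and Mouth's own lookup may differ (for example, it might honour `XDG_CONFIG_HOME`):

```rust
use std::path::PathBuf;

/// Hypothetical helper: builds the config file path for a platform.
/// `appdata` and `home` are passed in (instead of read from the environment)
/// so the logic is easy to test; returns `None` if the needed variable is unset.
fn config_path(is_windows: bool, appdata: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    if is_windows {
        // Windows: %APPDATA%\mouth\config.yaml
        let mut p = PathBuf::from(appdata?);
        p.push("mouth");
        p.push("config.yaml");
        Some(p)
    } else {
        // Linux/macOS: ~/.config/mouth/config.yaml
        let mut p = PathBuf::from(home?);
        p.push(".config");
        p.push("mouth");
        p.push("config.yaml");
        Some(p)
    }
}

/// Hypothetical helper: resolves the path for the platform this binary targets.
fn current_config_path() -> Option<PathBuf> {
    let appdata = std::env::var("APPDATA").ok();
    let home = std::env::var("HOME").ok();
    config_path(cfg!(windows), appdata.as_deref(), home.as_deref())
}
```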
Run `mouth config` for an interactive setup, or edit the YAML directly:
```yaml
hotkey: "ctrl+space"
mode: push_to_talk # push_to_talk or toggle
cancel_key: "escape"
model: "parakeet-tdt-0.6b-v3"
accelerator: auto # auto, cpu, cuda, directml
gpu_device: 0
paste_method: ctrl_v # ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
copy_to_clipboard: true
overlay_position: top # top, bottom, none
audio_feedback: true
input_device: null # null = system default
vad_enabled: true
language: en
```
### Recording Modes
- **push_to_talk** — Hold the hotkey while speaking, release to transcribe
- **toggle** — Press once to start recording, press again to stop and transcribe
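The difference between the two modes comes down to how hotkey press and release events are interpreted. A minimal sketch of that decision, using hypothetical types (`Mode`, `KeyEvent`, `Action`) that are not taken from Mouth's source:

```rust
/// Recording modes; names mirror the `mode` config values.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    PushToTalk,
    Toggle,
}

/// Hotkey events as a listener might report them (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum KeyEvent {
    Pressed,
    Released,
}

/// What the daemon should do in response to an event (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    StartRecording,
    StopAndTranscribe,
    Ignore,
}

/// Decides the next action from the mode, whether recording is in progress,
/// and the incoming hotkey event.
fn on_hotkey(mode: Mode, recording: bool, event: KeyEvent) -> Action {
    match (mode, event) {
        // push_to_talk: press starts recording, release stops it.
        (Mode::PushToTalk, KeyEvent::Pressed) if !recording => Action::StartRecording,
        (Mode::PushToTalk, KeyEvent::Released) if recording => Action::StopAndTranscribe,
        // toggle: each press flips the state; releases are ignored.
        (Mode::Toggle, KeyEvent::Pressed) => {
            if recording {
                Action::StopAndTranscribe
            } else {
                Action::StartRecording
            }
        }
        // Everything else (e.g. a repeated press while already recording).
        _ => Action::Ignore,
    }
}
```

In `push_to_talk`, a press while already recording falls through to `Ignore`, so repeated press events (such as key auto-repeat while the hotkey is held) do not restart the recording.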
### Hotkey Format
Hotkeys are written as modifier+key combinations:
- Modifiers: `ctrl`, `alt`, `shift`, `meta` (Win key)
- Keys: letters (`a`-`z`), numbers (`0`-`9`), function keys (`f1`-`f12`), punctuation (`[`, `]`, `;`, etc.), and special keys (`space`, `enter`, `escape`, `tab`, etc.)
Examples: `ctrl+space`, `alt+r`, `ctrl+shift+[`, `f9`
When running `mouth config`, you can press the key combination directly instead of typing it.
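Splitting on `+` and treating the last part as the key is enough to read the formats above. A minimal parsing sketch, assuming case-insensitive matching and only the four modifier names listed; `Hotkey` and `parse_hotkey` are hypothetical, not Mouth's API:

```rust
/// Modifier flags plus one key, parsed from a hotkey string (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct Hotkey {
    ctrl: bool,
    alt: bool,
    shift: bool,
    meta: bool,
    key: String,
}

/// Parses strings such as "ctrl+shift+[" or "f9".
/// Returns an error for an unknown modifier or a missing key.
fn parse_hotkey(s: &str) -> Result<Hotkey, String> {
    let mut hk = Hotkey::default();
    // Normalise each part so "Ctrl+Space" and "ctrl+space" parse the same.
    let parts: Vec<String> = s.split('+').map(|p| p.trim().to_lowercase()).collect();
    // The last part is the key; everything before it must be a modifier.
    let (key, modifiers) = parts.split_last().ok_or("empty hotkey")?;
    for m in modifiers {
        match m.as_str() {
            "ctrl" => hk.ctrl = true,
            "alt" => hk.alt = true,
            "shift" => hk.shift = true,
            "meta" => hk.meta = true,
            other => return Err(format!("unknown modifier: {other}")),
        }
    }
    if key.is_empty() {
        return Err("missing key".to_string());
    }
    hk.key = key.clone();
    Ok(hk)
}
```

For example, `parse_hotkey("ctrl+shift+[")` sets `ctrl` and `shift` and yields the key `[`. Because `+` is the separator, this sketch cannot express `+` itself as the key.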
### Paste Methods
- **ctrl_v** — Simulates Ctrl+V (works in most apps)
- **shift_insert** — Simulates Shift+Insert (useful for terminals)
- **ctrl_shift_v** — Simulates Ctrl+Shift+V (plain text paste)
- **clipboard_only** — Copies to clipboard without pasting
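Each `paste_method` value boils down to a key chord simulated after the transcript is on the clipboard, or no chord at all for `clipboard_only`. A sketch of that mapping, assuming the config strings shown above; the `PasteMethod` enum and its `keys` method are hypothetical:

```rust
use std::str::FromStr;

/// The four `paste_method` values listed above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PasteMethod {
    CtrlV,
    ShiftInsert,
    CtrlShiftV,
    ClipboardOnly,
}

impl FromStr for PasteMethod {
    type Err = String;

    /// Parses the exact strings used in config.yaml.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ctrl_v" => Ok(PasteMethod::CtrlV),
            "shift_insert" => Ok(PasteMethod::ShiftInsert),
            "ctrl_shift_v" => Ok(PasteMethod::CtrlShiftV),
            "clipboard_only" => Ok(PasteMethod::ClipboardOnly),
            other => Err(format!("unknown paste_method: {other}")),
        }
    }
}

impl PasteMethod {
    /// Key chord to simulate once the text is on the clipboard,
    /// or `None` when the text should only be copied.
    fn keys(self) -> Option<&'static [&'static str]> {
        let chord: &'static [&'static str] = match self {
            PasteMethod::CtrlV => &["ctrl", "v"],
            PasteMethod::ShiftInsert => &["shift", "insert"],
            PasteMethod::CtrlShiftV => &["ctrl", "shift", "v"],
            PasteMethod::ClipboardOnly => return None,
        };
        Some(chord)
    }
}
```

Parsing through `FromStr` means an unrecognised value fails loudly rather than silently falling back to a default; whether Mouth validates its config this way is an assumption.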
## Overlay
A small colour-coded bar appears at the top (or bottom) of your screen:
- **Red** — Recording
- **Amber** — Transcribing
- **Green** — Done
Set `overlay_position: none` to disable.
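The overlay is effectively a three-state indicator that can be switched off. A sketch of that mapping, where the `OverlayState`/`OverlayPosition` types and the RGB values are illustrative assumptions rather than Mouth's actual palette:

```rust
/// The three overlay states described above (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum OverlayState {
    Recording,
    Transcribing,
    Done,
}

/// Mirrors the `overlay_position` config values; `Hidden` stands for `none`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OverlayPosition {
    Top,
    Bottom,
    Hidden,
}

/// Returns the bar colour as 0xRRGGBB, or `None` when the overlay is disabled.
/// The specific RGB values are placeholders chosen for illustration.
fn overlay_colour(state: OverlayState, position: OverlayPosition) -> Option<u32> {
    if position == OverlayPosition::Hidden {
        return None;
    }
    Some(match state {
        OverlayState::Recording => 0xE0_3C_31,    // red
        OverlayState::Transcribing => 0xF5_A6_23, // amber
        OverlayState::Done => 0x2E_B8_4C,         // green
    })
}
```

Returning `None` for `overlay_position: none` lets the caller skip drawing entirely instead of drawing an invisible bar.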
## Building from Source
Requires Rust 1.75+.
```bash
# Linux dependencies (Ubuntu/Debian)
sudo apt-get install libssl-dev libasound2-dev libpulse-dev \
  libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev \
  libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake

# Build
cargo build --release

# Cross-compile for Windows from Linux (requires cargo-xwin)
cargo xwin build --release --target x86_64-pc-windows-msvc
```
## How It Works
1. A global hotkey listener intercepts your configured key combination (consuming it so it doesn't reach other apps)
2. Audio is captured from your microphone and resampled to 16 kHz
3. Silero VAD trims silence from the recording
4. The Parakeet TDT model transcribes speech to text via ONNX Runtime
5. Text is placed on the clipboard and pasted at your cursor (unless `paste_method` is `clipboard_only`)
All processing happens locally. No data leaves your machine.
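Steps 2 and 3 above can be sketched with two small functions: linear interpolation to reach 16 kHz, then trimming quiet frames from either end. This is a simplified illustration, not Mouth's pipeline: Mouth uses Silero VAD for step 3, whereas `trim_silence` here swaps in a plain RMS energy threshold, and the function names, frame length, and threshold are hypothetical.

```rust
/// Linearly resamples mono samples from `from_hz` to `to_hz`.
/// Simple by design: production resamplers usually low-pass filter first
/// to avoid aliasing when downsampling.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Position of output sample i on the input timeline.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // At the last input sample, hold its value instead of reading past the end.
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}

/// Trims leading and trailing frames whose RMS falls below `rms_threshold`.
/// NOTE: an energy gate standing in for Silero VAD, for illustration only.
fn trim_silence(samples: &[f32], frame_len: usize, rms_threshold: f32) -> &[f32] {
    if frame_len == 0 {
        return samples;
    }
    let is_speech = |frame: &[f32]| -> bool {
        let mean_square = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        mean_square.sqrt() >= rms_threshold
    };
    let frames: Vec<&[f32]> = samples.chunks(frame_len).collect();
    let first = frames.iter().position(|f| is_speech(*f));
    let last = frames.iter().rposition(|f| is_speech(*f));
    match (first, last) {
        (Some(a), Some(b)) => {
            let start = a * frame_len;
            let end = ((b + 1) * frame_len).min(samples.len());
            &samples[start..end]
        }
        // No frame crossed the threshold: nothing worth transcribing.
        _ => &samples[..0],
    }
}
```

With 16 kHz input, a `frame_len` of 160 samples corresponds to 10 ms frames.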
## License
[PolyForm Noncommercial 1.0.0](LICENSE) — free for personal and non-commercial use. For commercial licensing, contact the author.