# Mouth

Offline speech-to-text with a global hotkey. Press a key, speak, and the transcribed text is pasted at your cursor. No cloud services, no API keys: everything runs locally.

Mouth uses [Parakeet TDT 0.6B v3](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx) for transcription and [Silero VAD](https://huggingface.co/onnx-community/silero-vad) for voice activity detection, both via ONNX Runtime.

## Quick Start

1. Download `mouth.exe` (Windows) or build from source
2. Run `mouth`. Models download automatically on first launch (~800 MB, one-time)
3. Press and hold your hotkey (default: `Ctrl+Space`), speak, then release. The text appears at your cursor

Mouth runs in the background with a system tray icon. Right-click the tray icon to exit.

## Usage

| Command                   | Description               |
| ------------------------- | ------------------------- |
| `mouth`                   | Run the daemon (default)  |
| `mouth config`            | Interactive configuration |
| `mouth config --show`     | Print current config      |
| `mouth config --reset`    | Reset to defaults         |
| `mouth models`            | List available models     |
| `mouth models --download` | Download configured model |
| `mouth status`            | Show daemon status        |

## Configuration

Config file location:

- **Windows:** `%APPDATA%\mouth\config.yaml`
- **Linux/macOS:** `~/.config/mouth/config.yaml`

Run `mouth config` for an interactive setup, or edit the YAML directly:

```yaml
hotkey: "ctrl+space"
mode: push_to_talk              # push_to_talk or toggle
cancel_key: "escape"
model: "parakeet-tdt-0.6b-v3"
accelerator: auto               # auto, cpu, cuda, directml
gpu_device: 0
paste_method: ctrl_v            # ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
copy_to_clipboard: true
overlay_position: top           # top, bottom, none
audio_feedback: true
input_device: null              # null = system default
vad_enabled: true
language: en
```

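To make the shape of this file concrete, here is a minimal Rust sketch of a struct that mirrors the YAML keys and checks the enumerated options. The type and function names (`Config`, `validate`, `check`) are hypothetical and not taken from Mouth's source, the field types are guesses, and treating the example values above as defaults is an assumption.

```rust
/// Hypothetical in-memory mirror of `config.yaml`. Field names follow the
/// YAML keys; the Rust types are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub hotkey: String,
    pub mode: String,
    pub cancel_key: String,
    pub model: String,
    pub accelerator: String,
    pub gpu_device: u32,
    pub paste_method: String,
    pub copy_to_clipboard: bool,
    pub overlay_position: String,
    pub audio_feedback: bool,
    pub input_device: Option<String>, // None = system default
    pub vad_enabled: bool,
    pub language: String,
}

impl Default for Config {
    /// Uses the values from the example file (assumed to be the defaults).
    fn default() -> Self {
        Config {
            hotkey: "ctrl+space".to_string(),
            mode: "push_to_talk".to_string(),
            cancel_key: "escape".to_string(),
            model: "parakeet-tdt-0.6b-v3".to_string(),
            accelerator: "auto".to_string(),
            gpu_device: 0,
            paste_method: "ctrl_v".to_string(),
            copy_to_clipboard: true,
            overlay_position: "top".to_string(),
            audio_feedback: true,
            input_device: None,
            vad_enabled: true,
            language: "en".to_string(),
        }
    }
}

/// Rejects any enumerated option that is not one of the values listed in
/// the comments of the example file.
pub fn validate(cfg: &Config) -> Result<(), String> {
    check("mode", &cfg.mode, &["push_to_talk", "toggle"])?;
    check("accelerator", &cfg.accelerator, &["auto", "cpu", "cuda", "directml"])?;
    check(
        "paste_method",
        &cfg.paste_method,
        &["ctrl_v", "shift_insert", "ctrl_shift_v", "clipboard_only"],
    )?;
    check("overlay_position", &cfg.overlay_position, &["top", "bottom", "none"])
}

/// Returns an error naming the key if `value` is not in `allowed`.
fn check(key: &str, value: &str, allowed: &[&str]) -> Result<(), String> {
    if allowed.contains(&value) {
        Ok(())
    } else {
        Err(format!("{key}: expected one of {allowed:?}, got {value:?}"))
    }
}
```

With this sketch, `validate(&Config::default())` succeeds, while a config whose `mode` is set to something like `hold` is rejected with a message naming the offending key.
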
### Recording Modes

- **push_to_talk**: Hold the hotkey while speaking, release to transcribe
- **toggle**: Press once to start recording, press again to stop and transcribe

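The difference between the two modes comes down to which hotkey events start and stop a recording. The sketch below models that as a small pure function; the names (`Mode`, `KeyEvent`, `Action`, `on_hotkey`) are hypothetical and only illustrate the behaviour described above, not Mouth's actual code.

```rust
/// The two recording modes from `config.yaml`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    PushToTalk,
    Toggle,
}

/// A hotkey transition reported by some keyboard listener (assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyEvent {
    Pressed,
    Released,
}

/// What the daemon should do in response to a hotkey event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    StartRecording,
    StopAndTranscribe,
    Ignore,
}

/// Decides what a hotkey event should do, given the configured mode and
/// whether a recording is currently in progress.
pub fn on_hotkey(mode: Mode, recording: bool, event: KeyEvent) -> Action {
    match (mode, event, recording) {
        // push_to_talk: record for as long as the key is held down.
        (Mode::PushToTalk, KeyEvent::Pressed, false) => Action::StartRecording,
        (Mode::PushToTalk, KeyEvent::Released, true) => Action::StopAndTranscribe,
        // toggle: each press flips the state; releases are ignored.
        (Mode::Toggle, KeyEvent::Pressed, false) => Action::StartRecording,
        (Mode::Toggle, KeyEvent::Pressed, true) => Action::StopAndTranscribe,
        // Everything else (key auto-repeat, stray releases) is a no-op.
        _ => Action::Ignore,
    }
}
```

Keeping the decision in a pure function like this makes the edge cases easy to reason about: for example, a repeated `Pressed` event while holding the key in push_to_talk mode falls through to `Ignore`.
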
### Hotkey Format

Hotkeys are written as zero or more modifiers followed by a key, joined with `+`:

- Modifiers: `ctrl`, `alt`, `shift`, `meta` (Win key)
- Keys: letters (`a`-`z`), numbers (`0`-`9`), function keys (`f1`-`f12`), punctuation (`[`, `]`, `;`, etc.), and special keys (`space`, `enter`, `escape`, `tab`, etc.)

Examples: `ctrl+space`, `alt+r`, `ctrl+shift+[`, `f9`

When running `mouth config`, you can press the key combination directly instead of typing it.

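As an illustration of the format, the following sketch parses a hotkey string into its modifiers and key. It is not Mouth's parser: `Hotkey` and `parse_hotkey` are hypothetical names, key names are not validated because the list above is open-ended, and a literal `+` as the key is not handled.

```rust
/// Parsed form of a hotkey string such as `ctrl+shift+[`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Hotkey {
    pub ctrl: bool,
    pub alt: bool,
    pub shift: bool,
    pub meta: bool,
    pub key: String,
}

/// Parses `modifier+...+key`, case-insensitively. The last segment is the
/// key; every segment before it must be a known modifier.
pub fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let lowered = spec.trim().to_ascii_lowercase();
    let mut parts: Vec<&str> = lowered.split('+').map(str::trim).collect();

    // `split` always yields at least one segment, so `pop` cannot fail here.
    let key = parts.pop().unwrap_or("");
    if key.is_empty() {
        return Err(format!("missing key in hotkey: {spec:?}"));
    }

    let mut hotkey = Hotkey {
        ctrl: false,
        alt: false,
        shift: false,
        meta: false,
        key: key.to_string(),
    };
    for modifier in parts {
        match modifier {
            "ctrl" => hotkey.ctrl = true,
            "alt" => hotkey.alt = true,
            "shift" => hotkey.shift = true,
            "meta" => hotkey.meta = true,
            other => return Err(format!("unknown modifier: {other:?}")),
        }
    }
    Ok(hotkey)
}
```

For example, `parse_hotkey("ctrl+shift+[")` yields a `Hotkey` with `ctrl` and `shift` set and `key` equal to `[`, while `parse_hotkey("hyper+a")` returns an error because `hyper` is not one of the four modifiers.
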
### Paste Methods

- **ctrl_v**: Simulates Ctrl+V (works in most apps)
- **shift_insert**: Simulates Shift+Insert (useful for terminals)
- **ctrl_shift_v**: Simulates Ctrl+Shift+V (plain text paste)
- **clipboard_only**: Copies to clipboard without pasting

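One way to picture these options is as a mapping from the config value to the key chord that gets simulated once the text is on the clipboard. The sketch below shows that mapping only; `PasteMethod`, `from_config`, and `chord` are hypothetical names, and the actual input simulation is left out.

```rust
/// The four `paste_method` values from `config.yaml`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PasteMethod {
    CtrlV,
    ShiftInsert,
    CtrlShiftV,
    ClipboardOnly,
}

impl PasteMethod {
    /// Maps the config string to a variant; unknown values give `None`.
    pub fn from_config(value: &str) -> Option<Self> {
        match value {
            "ctrl_v" => Some(PasteMethod::CtrlV),
            "shift_insert" => Some(PasteMethod::ShiftInsert),
            "ctrl_shift_v" => Some(PasteMethod::CtrlShiftV),
            "clipboard_only" => Some(PasteMethod::ClipboardOnly),
            _ => None,
        }
    }

    /// Key chord to simulate after copying, written in the hotkey format.
    /// `None` means the text is only copied and no keystroke is sent.
    pub fn chord(self) -> Option<&'static str> {
        match self {
            PasteMethod::CtrlV => Some("ctrl+v"),
            PasteMethod::ShiftInsert => Some("shift+insert"),
            PasteMethod::CtrlShiftV => Some("ctrl+shift+v"),
            PasteMethod::ClipboardOnly => None,
        }
    }
}
```

Modelling `clipboard_only` as "no chord" rather than as a special case elsewhere keeps the paste step uniform: copy first, then send a keystroke only if `chord()` returns one.
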
## Overlay

A small colour-coded bar appears at the top (or bottom) of your screen to show the current state:

- **Red**: Recording
- **Amber**: Transcribing
- **Green**: Done

Set `overlay_position: none` to disable.

## Windows Defender False Positive

Windows Defender may flag `mouth.exe` as malicious and quarantine it. This is a
false positive caused by the way Mouth works: it uses global keyboard hooks,
simulated input, and clipboard access. Legitimate accessibility tools rely on
the same techniques, but they also match the heuristic patterns that antivirus
software looks for.

Mouth is open source and you can inspect every line of code in this repository.
Unfortunately, the only reliable way to prevent these warnings is to purchase a
code signing certificate, which I can't justify for a free, non-commercial
project. If you're not comfortable adding an exception, you're welcome to build
the exe yourself from source (see Building from Source); a locally built binary
is far less likely to be flagged.

To add an exclusion in Windows Defender:

1. Open **Windows Security** (search for it in the Start menu)
2. Go to **Virus & threat protection**
3. Under "Virus & threat protection settings", click **Manage settings**
4. Scroll down to **Exclusions** and click **Add or remove exclusions**
5. Click **Add an exclusion** → **File**, then select `mouth.exe`

If Defender has already quarantined the file, you'll need to restore it first:

1. In **Virus & threat protection**, click **Protection history**
2. Find the Mouth entry, expand it, and click **Restore**

Then add the exclusion as described above so the file is not quarantined again.

## Building from Source

Requires Rust 1.75+.

```bash
# Linux dependencies (Ubuntu/Debian)
sudo apt-get install libssl-dev libasound2-dev libpulse-dev \
  libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev \
  libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake

# Build
cargo build --release

# Cross-compile for Windows from Linux (requires cargo-xwin)
cargo xwin build --release --target x86_64-pc-windows-msvc
```

## How It Works

1. A global hotkey listener intercepts your configured key combination (consuming it so it doesn't reach other apps)
2. Audio is captured from your microphone and resampled to 16 kHz
3. Silero VAD trims silence from the recording
4. The Parakeet TDT model transcribes speech to text via ONNX Runtime
5. Text is placed on the clipboard and pasted at your cursor

All processing happens locally. No data leaves your machine.

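To give a feel for the resampling in step 2, here is a deliberately naive sketch that converts mono samples to a new rate by linear interpolation. This README does not say which resampler Mouth uses, so `resample_linear` is a hypothetical stand-in; a production resampler would also low-pass filter before downsampling to avoid aliasing.

```rust
/// Resamples mono `f32` samples from `from_hz` to `to_hz` using linear
/// interpolation between neighbouring input samples. No anti-aliasing
/// filter is applied, so this is only suitable as an illustration.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }

    // Number of output samples covering the same duration as the input.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // How far to advance in the input for each output sample.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;

    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let left = (pos.floor() as usize).min(last);
            let right = (left + 1).min(last);
            let frac = (pos - left as f64) as f32;
            // Blend the two nearest input samples.
            input[left] + (input[right] - input[left]) * frac
        })
        .collect()
}
```

Called as `resample_linear(&samples, 48_000, 16_000)`, this keeps roughly every third sample of a 48 kHz recording, which matches the 16 kHz rate mentioned in step 2.
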
## License

[PolyForm Noncommercial 1.0.0](LICENSE): free for personal and non-commercial use. For commercial licensing, contact the author.