Implement core speech-to-text pipeline

Adds all major components: hotkey listener (rdev), audio capture (cpal), resampling (rubato), VAD (Silero ONNX), Parakeet v3 TDT transcription (ort), overlay window (winit+softbuffer), paste simulation (enigo+arboard), audio feedback (rodio), YAML config, CLI (clap), and Hugging Face model download. ~2400 lines of Rust across 16 source files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:47:46 +01:00
parent 6b737f92fe
commit 9b0bf7d9e3
22 changed files with 7750 additions and 0 deletions
@@ -0,0 +1,42 @@
+# Mouth configuration
+# Copy to ~/.config/mouth/config.yaml (Linux/macOS)
+# or %APPDATA%\mouth\config.yaml (Windows)
+
+# Hotkey to activate recording
+hotkey: ctrl+space
+
+# Recording mode: push_to_talk or toggle
+mode: push_to_talk
+
+# Cancel hotkey (only active while recording)
+cancel_key: escape
+
+# Speech-to-text model
+model: parakeet-tdt-0.6b-v3
+
+# Inference accelerator: auto, cpu, cuda, directml
+accelerator: auto
+
+# GPU device index (when accelerator is cuda/directml)
+gpu_device: 0
+
+# How to paste text: ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
+paste_method: ctrl_v
+
+# Keep transcribed text on clipboard after pasting
+copy_to_clipboard: true
+
+# Overlay position: top, bottom, none
+overlay_position: top
+
+# Play audio feedback sounds
+audio_feedback: true
+
+# Audio input device name (null = system default)
+input_device: null
+
+# Voice activity detection (trim silence)
+vad_enabled: true
+
+# Language hint for model
+language: en
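
The string-valued options in the config above (`mode`, `paste_method`, `hotkey`) have to be validated somewhere after the YAML is loaded. The sketch below shows one way that could look using only the standard library. The names `Mode`, `PasteMethod`, and `split_hotkey` are hypothetical and are not taken from this commit; the actual code may well rely on serde derives instead.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the `mode` values named in the config comments.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    PushToTalk,
    Toggle,
}

impl FromStr for Mode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "push_to_talk" => Ok(Self::PushToTalk),
            "toggle" => Ok(Self::Toggle),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

/// Hypothetical enum mirroring the `paste_method` values named in the config comments.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PasteMethod {
    CtrlV,
    ShiftInsert,
    CtrlShiftV,
    ClipboardOnly,
}

impl FromStr for PasteMethod {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "ctrl_v" => Ok(Self::CtrlV),
            "shift_insert" => Ok(Self::ShiftInsert),
            "ctrl_shift_v" => Ok(Self::CtrlShiftV),
            "clipboard_only" => Ok(Self::ClipboardOnly),
            other => Err(format!("unknown paste_method: {other}")),
        }
    }
}

/// Splits a hotkey spec such as `ctrl+space` into (modifiers, key).
/// Returns `None` when any segment is empty, e.g. `ctrl+` or an empty string.
pub fn split_hotkey(spec: &str) -> Option<(Vec<String>, String)> {
    let mut parts: Vec<String> = spec
        .split('+')
        .map(|p| p.trim().to_ascii_lowercase())
        .collect();
    if parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    // The last segment is treated as the key; everything before it is a modifier.
    let key = parts.pop()?;
    Some((parts, key))
}
```

With the defaults from the config, `"push_to_talk".parse::<Mode>()` and `"ctrl_v".parse::<PasteMethod>()` both succeed, and `split_hotkey("ctrl+space")` yields the modifier list `["ctrl"]` with key `"space"`. Rejecting unknown strings at load time keeps a typo in the config from surfacing later as a silent no-op.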