Implement core speech-to-text pipeline
All major components: hotkey listener (rdev), audio capture (cpal), resampling (rubato), VAD (Silero ONNX), Parakeet v3 TDT transcription (ort), overlay window (winit+softbuffer), paste simulation (enigo+arboard), audio feedback (rodio), YAML config, CLI with clap, HuggingFace model download. ~2400 lines of Rust across 16 source files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,42 @@
# Mouth configuration
# Copy to ~/.config/mouth/config.yaml (Linux/macOS)
# or %APPDATA%\mouth\config.yaml (Windows)

# Hotkey to activate recording
hotkey: ctrl+space

# Recording mode: push_to_talk or toggle
mode: push_to_talk

# Cancel hotkey (only active while recording)
cancel_key: escape

# Speech-to-text model
model: parakeet-tdt-0.6b-v3

# Inference accelerator: auto, cpu, cuda, directml
accelerator: auto

# GPU device index (when accelerator is cuda/directml)
gpu_device: 0

# How to paste text: ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
paste_method: ctrl_v

# Keep transcribed text on clipboard after pasting
copy_to_clipboard: true

# Overlay position: top, bottom, none
overlay_position: top

# Play audio feedback sounds
audio_feedback: true

# Audio input device name (null = system default)
input_device: null

# Voice activity detection (trim silence)
vad_enabled: true

# Language hint for model
language: en