# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Mouth is a single-binary, offline speech-to-text tool. Press a global hotkey, speak, and transcribed text is pasted at your cursor. Configured via YAML, no UI. Primary target is Windows; Linux/macOS supported where possible.

Uses Parakeet TDT 0.6B v3 (ONNX, from `istupakov/parakeet-tdt-0.6b-v3-onnx`) for transcription, Silero VAD v4 for voice activity detection.

## Build & Run

```bash
cargo build                      # debug build
cargo build --release            # release build
cargo run                        # run daemon (default command)
cargo run -- config --show       # show current config
cargo run -- config              # interactive config TUI
cargo run -- config --reset      # reset to defaults
cargo run -- models              # list models
cargo run -- models --download   # download configured model
cargo run -- status              # daemon status
```

## Architecture

Single-binary Rust application. Core pipeline: hotkey capture (rdev) → audio recording (cpal) → resampling to 16kHz (rubato) → VAD (Silero ONNX) → mel spectrogram → transcription (Parakeet v3 TDT decoder via ort) → clipboard/paste (arboard + enigo). Minimal native overlay window (winit + softbuffer).

**Threading model:** Main thread owns the overlay window event loop (required by winit). Background threads: hotkey listener (rdev::listen is blocking), audio recorder (cpal stream), coordinator (state machine). All communicate via `std::sync::mpsc` channels.

**Coordinator state machine:** Idle → Recording → Transcribing → (Pasting) → Idle. Cancel from Recording returns to Idle.

**Parakeet v3 inference:** Two-stage ONNX model: encoder (FastConformer) produces features, decoder+joint (TDT transducer) greedily decodes tokens with duration predictions. Audio preprocessing: pre-emphasis → STFT → 128-band log-mel → per-utterance CMVN. Vocab is SentencePiece BPE with `▁` as word boundary marker.

**ort crate (v2.0.0-rc.12) notes:** Session::run needs `&mut self`. Input values must be converted to `Value::into_dyn()` before passing. Use `SessionInputValue::Owned(value.into_dyn())` pattern. `try_extract_tensor` returns `(&Shape, &[T])` tuple. `from_shape_vec` needs `[usize; N]` not `Vec`.

Config lives at `~/.config/mouth/config.yaml` (Linux/macOS) or `%APPDATA%\mouth\config.yaml` (Windows). Models cached via HuggingFace Hub standard cache (`~/.cache/huggingface/hub/`).

## Cross-Compilation

Developing on Ubuntu 24.04, targeting Windows:

```bash
cargo build --target x86_64-pc-windows-gnu
```

## System Dependencies (Ubuntu)

```bash
sudo apt-get install libssl-dev libasound2-dev libpulse-dev libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake
```
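
As an illustration of the coordinator state machine and `std::sync::mpsc` threading model described under Architecture, the following is a minimal sketch rather than the repository's actual code. The state names come from this file; the `Event` variants, `next_state`, and `run_coordinator` are hypothetical names chosen for the example, and the optional Pasting step is assumed to always be taken.

```rust
use std::sync::mpsc;
use std::thread;

/// Coordinator states, as listed in the Architecture section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Transcribing,
    Pasting,
}

/// Hypothetical event names; the real message types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    StartRecording,
    StopRecording,
    Cancel,
    TranscriptReady,
    PasteDone,
}

/// Pure transition function: returns the next state for a (state, event) pair.
/// Any combination not listed leaves the state unchanged.
fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::StartRecording) => State::Recording,
        (State::Recording, Event::StopRecording) => State::Transcribing,
        (State::Recording, Event::Cancel) => State::Idle,
        // Assumption: pasting always follows transcription in this sketch.
        (State::Transcribing, Event::TranscriptReady) => State::Pasting,
        (State::Pasting, Event::PasteDone) => State::Idle,
        (unchanged, _) => unchanged,
    }
}

/// Runs the coordinator on a background thread, feeds it events over an
/// mpsc channel, and returns every state it passed through (starting at Idle).
fn run_coordinator(events: Vec<Event>) -> Vec<State> {
    let (tx, rx) = mpsc::channel::<Event>();

    let coordinator = thread::spawn(move || {
        let mut state = State::Idle;
        let mut visited = vec![state];
        // The loop ends once every sender has been dropped.
        for event in rx {
            state = next_state(state, event);
            visited.push(state);
        }
        visited
    });

    for event in events {
        tx.send(event).expect("coordinator thread should be alive");
    }
    drop(tx); // close the channel so the coordinator loop exits

    coordinator.join().expect("coordinator thread panicked")
}
```

Keeping the transition logic in a pure function separate from the channel loop means the state machine can be checked without spawning threads. In the real application the senders would presumably be held by the hotkey listener and audio recorder threads rather than by a single caller.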