v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish

- Add system tray icon with Exit menu (tray-icon/muda)
- Add IPC daemon status via named pipe (Windows) / Unix socket (Linux)
- Add `mouth status` command to query running daemon
- Add daemon lock to prevent multiple instances
- Hide Windows console window when running as daemon
- Wire up Silero VAD model download and speech filtering
- Switch hotkey listener from rdev::listen to rdev::grab to consume hotkeys
- Add hotkey capture mode in interactive config (press keys instead of typing)
- Add all missing key names (brackets, punctuation, numpad, etc.)
- Fix ONNX tensor type mismatches (encoder wants i64, decoder wants i32)
- Add 300ms lead-in silence to compensate for mic startup latency
- Keep recording for 300ms after stop so the end of speech is not clipped
- Add 50ms silence before audio feedback blips for device warmup
- Reduce overlay size (150x18, was 200x36)
- Add PolyForm Noncommercial 1.0.0 license
- Flesh out user-focused README
- Update release script with Gitea/GitHub forge support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
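The daemon lock and `mouth status` can share one mechanism, as Phase 10 of the implementation plan removed in this commit describes: if connecting to the daemon's socket succeeds, an instance is already running. Below is a Linux-only sketch using std's `UnixListener`/`UnixStream`, assuming the daemon writes one JSON reply per connection and then closes it. `daemon_running`, `claim_socket`, and `query_status` are hypothetical names, not functions from the Mouth codebase; the socket path is a parameter (the plan suggested `/tmp/mouth.sock`), and the Windows named-pipe side is not shown.

```rust
use std::io::{self, Read};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// True if some process is accepting connections on the socket at `path`.
fn daemon_running(path: &Path) -> bool {
    UnixStream::connect(path).is_ok()
}

/// Bind the daemon's status socket, refusing if another daemon answers.
/// If nobody answers, a leftover socket file is treated as stale and removed.
fn claim_socket(path: &Path) -> io::Result<UnixListener> {
    if daemon_running(path) {
        return Err(io::Error::new(
            io::ErrorKind::AddrInUse,
            "mouth is already running",
        ));
    }
    match std::fs::remove_file(path) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    UnixListener::bind(path)
}

/// Client side of `mouth status`: read the daemon's whole reply,
/// or return `None` if no daemon is listening.
fn query_status(path: &Path) -> Option<String> {
    let mut stream = UnixStream::connect(path).ok()?;
    let mut reply = String::new();
    stream.read_to_string(&mut reply).ok()?;
    Some(reply)
}
```

A caller could map `None` to the plan's "Mouth is not running" message and exit code 1. The check-then-bind sequence is not atomic: two daemons started at the same instant could both see no listener, and the second one's `remove_file` could delete the first one's fresh socket, so a production lock would need to close that window (for example with an advisory file lock).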
Generated +706 -56
File diff suppressed because it is too large
+12 -2
@@ -1,8 +1,9 @@
 [package]
 name = "mouth"
-version = "0.1.0"
+version = "0.2.0"
 edition = "2024"
 description = "Offline speech-to-text with global hotkey and paste"
+license-file = "LICENSE"

 [dependencies]
 # CLI
@@ -24,7 +25,7 @@ tracing-subscriber = { version = "0.3", features = ["env-filter"] }
 tokio = { version = "1", features = ["full"] }

 # Global hotkey
-rdev = "0.5"
+rdev = { version = "0.5", features = ["unstable_grab"] }

 # Audio capture
 cpal = "0.15"
@@ -56,6 +57,15 @@ rodio = "0.20"
 # System info
 num_cpus = "1"

+# System tray
+tray-icon = "0.19"
+
+# IPC status
+serde_json = "1"
+
 # Error handling
 anyhow = "1"
 thiserror = "2"
+
+[target.'cfg(windows)'.dependencies]
+windows-sys = { version = "0.59", features = ["Win32_System_Console", "Win32_UI_WindowsAndMessaging", "Win32_System_Pipes", "Win32_System_IO", "Win32_Storage_FileSystem", "Win32_Foundation", "Win32_Security"] }
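The `unstable_grab` feature enables `rdev::grab`, whose callback returns `Some(event)` to let a key event through or `None` to swallow it; that is what lets the hotkey be consumed rather than merely observed, as with `rdev::listen`. The std-only sketch below shows the bookkeeping such a callback might need. It uses a hypothetical `Key`/`KeyEvent` pair in place of rdev's types, and `HotkeyFilter` is an illustrative name, not Mouth's code. The filter tracks held modifiers, swallows the trigger key only when every required modifier is down, and also swallows the matching release so other apps never see half of the combination.

```rust
/// Stand-in for a real key type such as `rdev::Key` (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    Ctrl,
    Shift,
    Alt,
    Space,
    Other(char),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyEvent {
    Press(Key),
    Release(Key),
}

fn is_modifier(k: Key) -> bool {
    matches!(k, Key::Ctrl | Key::Shift | Key::Alt)
}

struct HotkeyFilter {
    required: Vec<Key>,
    trigger: Key,
    held: Vec<Key>,
    swallowing: bool,
}

impl HotkeyFilter {
    fn new(required: Vec<Key>, trigger: Key) -> Self {
        Self { required, trigger, held: Vec::new(), swallowing: false }
    }

    /// Same contract as a grab callback: `Some(ev)` passes the event, `None` consumes it.
    fn filter(&mut self, ev: KeyEvent) -> Option<KeyEvent> {
        match ev {
            KeyEvent::Press(k) if k == self.trigger => {
                // Extra held modifiers are tolerated in this sketch.
                let mods_held = self.required.iter().all(|m| self.held.contains(m));
                if self.swallowing || mods_held {
                    // Once the combination fires, auto-repeat presses are swallowed too.
                    self.swallowing = true;
                    None
                } else {
                    Some(ev)
                }
            }
            KeyEvent::Release(k) if k == self.trigger && self.swallowing => {
                self.swallowing = false;
                None
            }
            KeyEvent::Press(k) => {
                if is_modifier(k) && !self.held.contains(&k) {
                    self.held.push(k);
                }
                Some(ev)
            }
            KeyEvent::Release(k) => {
                self.held.retain(|h| *h != k);
                Some(ev)
            }
        }
    }
}
```

Modifier events themselves always pass through here, so the foreground app still sees Ctrl going down and up; only the trigger key is hidden.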
@@ -0,0 +1,131 @@
+# PolyForm Noncommercial License 1.0.0
+
+<https://polyformproject.org/licenses/noncommercial/1.0.0>
+
+## Acceptance
+
+In order to get any license under these terms, you must agree
+to them as both strict obligations and conditions to all
+your licenses.
+
+## Copyright License
+
+The licensor grants you a copyright license for the
+software to do everything you might do with the software
+that would otherwise infringe the licensor's copyright
+in it for any permitted purpose. However, you may
+only distribute the software according to [Distribution
+License](#distribution-license) and make changes or new works
+based on the software according to [Changes and New Works
+License](#changes-and-new-works-license).
+
+## Distribution License
+
+The licensor grants you an additional copyright license
+to distribute copies of the software. Your license
+to distribute covers distributing the software with
+changes and new works permitted by [Changes and New Works
+License](#changes-and-new-works-license).
+
+## Notices
+
+You must ensure that anyone who gets a copy of any part of
+the software from you also gets a copy of these terms or the
+URL for them above, as well as copies of any plain-text lines
+beginning with `Required Notice:` that the licensor provided
+with the software. For example:
+
+> Required Notice: Copyright Yoyodyne, Inc. (http://example.com)
+
+## Changes and New Works License
+
+The licensor grants you an additional copyright license to
+make changes and new works based on the software for any
+permitted purpose.
+
+## Patent License
+
+The licensor grants you a patent license for the software that
+covers patent claims the licensor can license, or becomes able
+to license, that you would infringe by using the software.
+
+## Noncommercial Purposes
+
+Any noncommercial purpose is a permitted purpose.
+
+## Personal Uses
+
+Personal use for research, experiment, and testing for
+the benefit of public knowledge, personal study, private
+entertainment, hobby projects, amateur pursuits, or religious
+observance, without any anticipated commercial application,
+is use for a permitted purpose.
+
+## Noncommercial Organizations
+
+Use by any charitable organization, educational institution,
+public research organization, public safety or health
+organization, environmental protection organization,
+or government institution is use for a permitted purpose
+regardless of the source of funding or obligations resulting
+from the funding.
+
+## Fair Use
+
+You may have "fair use" rights for the software under the
+law. These terms do not limit them.
+
+## No Other Rights
+
+These terms do not allow you to sublicense or transfer any of
+your licenses to anyone else, or prevent the licensor from
+granting licenses to anyone else. These terms do not imply
+any other licenses.
+
+## Patent Defense
+
+If you make any written claim that the software infringes or
+contributes to infringement of any patent, your patent license
+for the software granted under these terms ends immediately. If
+your company makes such a claim, your patent license ends
+immediately for work on behalf of your company.
+
+## Violations
+
+The first time you are notified in writing that you have
+violated any of these terms, or done anything with the software
+not covered by your licenses, your licenses can nonetheless
+continue if you come into full compliance with these terms,
+and take practical steps to correct past violations, within
+32 days of receiving notice. Otherwise, all your licenses
+end immediately.
+
+## No Liability
+
+***As far as the law allows, the software comes as is, without
+any warranty or condition, and the licensor will not be liable
+to you for any damages arising out of these terms or the use
+or nature of the software, under any kind of legal claim.***
+
+## Definitions
+
+The **licensor** is the individual or entity offering these
+terms, and the **software** is the software the licensor makes
+available under these terms.
+
+**You** refers to the individual or entity agreeing to these
+terms.
+
+**Your company** is any legal entity, sole proprietorship,
+or other kind of organization that you work for, plus all
+organizations that have control over, are under the control of,
+or are under common control with that organization. **Control**
+means ownership of substantially all the assets of an entity,
+or the power to direct its management and policies by vote,
+contract, or otherwise. Control can be direct or indirect.
+
+**Your licenses** are all the licenses granted to you for the
+software under these terms.
+
+**Use** means anything you do with the software requiring one
+of your licenses.
@@ -1,3 +1,114 @@
 # Mouth

-`Mouth` is a utility that sits in the background waiting for you to hit a global hot key - when you press, it listens to you, quickly translates your voice in to text using a local LLM model and pastes it in to your application where the cursor currently sits. No internet required!
+Offline speech-to-text with a global hotkey. Press a key, speak, and transcribed text is pasted at your cursor. No cloud services, no API keys — everything runs locally.
+
+Uses [Parakeet TDT 0.6B v3](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx) for transcription and [Silero VAD](https://huggingface.co/onnx-community/silero-vad) for voice activity detection, both via ONNX Runtime.
+
+## Quick Start
+
+1. Download `mouth.exe` (Windows) or build from source
+2. Run `mouth` — models download automatically on first launch (~800MB one-time)
+3. Press your hotkey (default: `Ctrl+Space`), speak, release — text appears at your cursor
+
+Mouth runs in the background with a system tray icon. Right-click the tray icon to exit.
+
+## Usage
+
+| Command | Description |
+| ----------------------- | ------------------------- |
+| mouth | Run the daemon (default) |
+| mouth config | Interactive configuration |
+| mouth config --show | Print current config |
+| mouth config --reset | Reset to defaults |
+| mouth models | List available models |
+| mouth models --download | Download configured model |
+| mouth status | Show daemon status |
+
+
+## Configuration
+
+Config file location:
+- **Windows:** `%APPDATA%\mouth\config.yaml`
+- **Linux/macOS:** `~/.config/mouth/config.yaml`
+
+Run `mouth config` for an interactive setup, or edit the YAML directly:
+
+```yaml
+hotkey: "ctrl+space"
+mode: push_to_talk # push_to_talk or toggle
+cancel_key: "escape"
+model: "parakeet-tdt-0.6b-v3"
+accelerator: auto # auto, cpu, cuda, directml
+gpu_device: 0
+paste_method: ctrl_v # ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
+copy_to_clipboard: true
+overlay_position: top # top, bottom, none
+audio_feedback: true
+input_device: null # null = system default
+vad_enabled: true
+language: en
+```
+
+### Recording Modes
+
+- **push_to_talk** — Hold the hotkey while speaking, release to transcribe
+- **toggle** — Press once to start recording, press again to stop and transcribe
+
+### Hotkey Format
+
+Hotkeys are written as modifier+key combinations:
+
+- Modifiers: `ctrl`, `alt`, `shift`, `meta` (Win key)
+- Keys: letters (`a`-`z`), numbers (`0`-`9`), function keys (`f1`-`f12`), punctuation (`[`, `]`, `;`, etc.), and special keys (`space`, `enter`, `escape`, `tab`, etc.)
+
+Examples: `ctrl+space`, `alt+r`, `ctrl+shift+[`, `f9`
+
+When running `mouth config`, you can press the key combination directly instead of typing it.
+
+### Paste Methods
+
+- **ctrl_v** — Simulates Ctrl+V (works in most apps)
+- **shift_insert** — Simulates Shift+Insert (useful for terminals)
+- **ctrl_shift_v** — Simulates Ctrl+Shift+V (plain text paste)
+- **clipboard_only** — Copies to clipboard without pasting
+
+## Overlay
+
+A small colour-coded bar appears at the top (or bottom) of your screen:
+
+- **Red** — Recording
+- **Amber** — Transcribing
+- **Green** — Done
+
+Set `overlay_position: none` to disable.
+
+## Building from Source
+
+Requires Rust 1.85+ (the 2024 edition declared in `Cargo.toml` needs it).
+
+```bash
+# Linux dependencies (Ubuntu/Debian)
+sudo apt-get install libssl-dev libasound2-dev libpulse-dev \
+    libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev \
+    libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake
+
+# Build
+cargo build --release
+
+# Cross-compile for Windows from Linux (requires cargo-xwin)
+cargo xwin build --release --target x86_64-pc-windows-msvc
+```
+
+## How It Works
+
+1. A global hotkey listener intercepts your configured key combination (consuming it so it doesn't reach other apps)
+2. Audio is captured from your microphone and resampled to 16kHz
+3. Silero VAD trims silence from the recording
+4. The Parakeet TDT model transcribes speech to text via ONNX Runtime
+5. Text is placed on the clipboard and pasted at your cursor
+
+All processing happens locally. No data leaves your machine.
+
+## License
+
+[PolyForm Noncommercial 1.0.0](LICENSE) — free for personal and non-commercial use. For commercial licensing, contact the author.
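The README's hotkey format is a `+`-separated list of modifiers ending in exactly one key. Here is a minimal parsing sketch under that reading. `Modifiers`, `Hotkey`, `is_modifier_name`, and `parse_hotkey` are hypothetical names, not the API of Mouth's `hotkey.rs`, and key names are kept as lowercase strings rather than checked against Mouth's real key table.

```rust
/// Which modifiers a hotkey requires.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Modifiers {
    ctrl: bool,
    alt: bool,
    shift: bool,
    meta: bool,
}

/// A parsed hotkey: required modifiers plus exactly one non-modifier key.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Hotkey {
    mods: Modifiers,
    key: String,
}

fn is_modifier_name(name: &str) -> bool {
    matches!(name, "ctrl" | "alt" | "shift" | "meta")
}

/// Parse strings such as "ctrl+space", "ctrl+shift+[" or "f9".
/// Case and surrounding whitespace are ignored; key names are not validated.
fn parse_hotkey(s: &str) -> Result<Hotkey, String> {
    let parts: Vec<String> = s
        .split('+')
        .map(|p| p.trim().to_ascii_lowercase())
        .collect();
    // `split` always yields at least one item, so this cannot panic.
    let (key, mod_names) = parts.split_last().expect("split yields at least one part");
    if key.is_empty() || mod_names.iter().any(|m| m.is_empty()) {
        return Err(format!("malformed hotkey: {s:?}"));
    }
    if is_modifier_name(key) {
        return Err(format!("hotkey needs a non-modifier key: {s:?}"));
    }
    let mut mods = Modifiers::default();
    for name in mod_names {
        let flag = match name.as_str() {
            "ctrl" => &mut mods.ctrl,
            "alt" => &mut mods.alt,
            "shift" => &mut mods.shift,
            "meta" => &mut mods.meta,
            other => return Err(format!("unknown modifier: {other:?}")),
        };
        if *flag {
            return Err(format!("duplicate modifier: {name:?}"));
        }
        *flag = true;
    }
    Ok(Hotkey { mods, key: key.clone() })
}
```

Because `+` is the separator, a literal plus key cannot be written in this format; the sketch rejects `ctrl++` as malformed.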
@@ -1,287 +0,0 @@
-# Mouth — Implementation Plan
-
-## Overview
-
-Mouth is a single-binary, offline speech-to-text tool for Windows (with Linux/macOS support where possible). Press a hotkey, speak, and transcribed text is pasted at your cursor. Configured entirely via YAML.
-
-## Architecture
-
-```
-┌─────────────┐     ┌───────────┐     ┌─────────────┐     ┌────────────┐
-│   Hotkey    │────▶│ Recorder  │────▶│ Transcriber │────▶│   Paste    │
-│  Listener   │     │  (cpal)   │     │ (ort/ONNX)  │     │  (enigo)   │
-│   (rdev)    │     │           │     │             │     │            │
-└─────────────┘     └───────────┘     └─────────────┘     └────────────┘
-       │                  │                  │                  │
-       │                  ▼                  │                  │
-       │            ┌───────────┐            │                  │
-       │            │    VAD    │            │                  │
-       │            │ (silero)  │            │                  │
-       │            └───────────┘            │                  │
-       │                                     │                  │
-       ▼                                     ▼                  ▼
-┌──────────────────────────────────────────────────────────────────────┐
-│                           Overlay (winit)                            │
-│            State: idle → recording → transcribing → done             │
-└──────────────────────────────────────────────────────────────────────┘
-```
-
-### Component Communication
-
-All components communicate via channels (`std::sync::mpsc` or `tokio::sync`). The main thread owns the overlay window (required by most windowing systems). A coordinator task receives events from hotkey/recorder/transcriber and drives state transitions.
-
-```
-HotkeyEvent(Pressed/Released) ──┐
-AudioReady(Vec<f32>) ───────────┼──▶ Coordinator ──▶ OverlayState
-TranscriptionDone(String) ──────┘                ──▶ PasteAction
-CancelRequested ────────────────┘
-```
-
-## Crate Dependencies
-
-| Crate | Purpose | Notes |
-|-------|---------|-------|
-| `rdev` | Global hotkey capture | Cross-platform key events, no focus required |
-| `cpal` | Audio capture | Cross-platform mic input |
-| `rubato` | Audio resampling | Resample to 16kHz for Parakeet |
-| `ort` | ONNX Runtime | Run Parakeet v3 + Silero VAD |
-| `hf-hub` | Model download | Download from HuggingFace, standard cache dir |
-| `enigo` | Keyboard simulation | Simulate Ctrl+V, Shift+Insert, etc. |
-| `arboard` | Clipboard access | Read/write clipboard, save/restore |
-| `winit` | Windowing | Minimal overlay window |
-| `softbuffer` | Pixel rendering | Draw coloured overlay (no GPU needed for overlay) |
-| `serde` + `serde_yaml` | Config | Deserialize YAML config |
-| `clap` | CLI | Subcommands: `run`, `config`, `models` |
-| `dialoguer` | Interactive TUI | `mouth config` interactive setup |
-| `rodio` | Audio playback | Blip up/down sounds |
-| `indicatif` | Progress bars | Model download progress |
-| `dirs` | Platform dirs | Config/cache paths |
-| `tracing` | Logging | Structured logging |
-
-## Config File
-
-Location: `~/.config/mouth/config.yaml` (Linux/macOS), `%APPDATA%\mouth\config.yaml` (Windows)
-
-```yaml
-# Hotkey to activate recording
-hotkey: "ctrl+space"
-
-# Recording mode: push_to_talk or toggle
-mode: push_to_talk
-
-# Cancel hotkey (only active while recording)
-cancel_key: "escape"
-
-# Speech-to-text model
-model: "parakeet-tdt-0.6b-v3"
-
-# Inference accelerator: auto, cpu, cuda, directml
-accelerator: auto
-
-# GPU device index (only used when accelerator is cuda/directml)
-gpu_device: 0
-
-# How to paste text
-paste_method: ctrl_v # ctrl_v | shift_insert | ctrl_shift_v | clipboard_only
-
-# Also keep transcribed text on clipboard after pasting
-copy_to_clipboard: true
-
-# Overlay position on screen
-overlay_position: top # top | bottom | none
-
-# Audio feedback
-audio_feedback: true
-
-# Audio input device (null = system default)
-input_device: null
-
-# VAD: trim silence from audio before transcription
-vad_enabled: true
-
-# Language (for model hint, if supported)
-language: en
-```
-
-## CLI Interface
-
-```
-mouth run # Start the daemon (default if no subcommand)
-mouth config # Interactive TUI to edit config
-mouth config --show # Print current config to stdout
-mouth config --reset # Reset config to defaults
-mouth models # List available/downloaded models
-mouth models download # Download configured model (if not cached)
-mouth status # Show daemon status, loaded model, app version
-```
-
-## Implementation Phases
-
-### Phase 1: Project Skeleton + Config
-
-- Cargo.toml with all dependencies
-- Config struct with serde, defaults, load/save
-- CLI with clap (run, config, models subcommands)
-- `mouth config` interactive TUI with dialoguer
-- Platform-aware config/cache directory resolution
-
-### Phase 2: Hotkey Listener
-
-- Global hotkey capture using rdev
-- Support configurable key combinations (parse from string like "ctrl+space")
-- Push-to-talk mode: record on press, stop on release
-- Toggle mode: start on first press, stop on second press
-- Cancel on Escape while recording
-- Debounce rapid key events (~30ms)
-
-### Phase 3: Audio Capture + VAD
-
-- Open mic input via cpal (default device or configured)
-- Convert to f32 mono
-- Resample to 16kHz via rubato
-- Buffer audio chunks during recording
-- Run Silero VAD to trim leading/trailing silence
-- Produce final `Vec<f32>` of clean speech at 16kHz
-
-### Phase 4: Model Management
-
-- Use hf-hub to download Parakeet v3 ONNX model from HuggingFace
-- Store in standard HF cache (`~/.cache/huggingface/hub/`)
-- Show download progress with indicatif
-- `mouth models` command to list/download models
-- Auto-download on first run if model not cached
-
-### Phase 5: Transcription
-
-- Load Parakeet v3 ONNX model via ort
-- Auto-detect GPU (DirectML on Windows, CUDA if available, CPU fallback)
-- Respect accelerator override from config
-- Run inference on captured audio
-- Return transcribed text string
-
-### Phase 6: Overlay
-
-- Create a small always-on-top window using winit
-- Render with softbuffer (simple coloured rectangle + text)
-- States and colours:
-  - Recording: red pulsing indicator
-  - Transcribing: amber/yellow
-  - Done: brief green flash, then hide
-  - Error: brief red flash with error hint
-- Window flags (Windows): `WS_EX_TOPMOST | WS_EX_TOOLWINDOW | WS_EX_NOACTIVATE`
-- Position: centered horizontally at top or bottom of current monitor
-- No focus steal, no taskbar entry
-
-### Phase 7: Paste System
-
-- Save current clipboard content (if preserving)
-- Set transcribed text to clipboard via arboard
-- Simulate keypress via enigo based on paste_method:
-  - `ctrl_v`: Ctrl+V (Cmd+V on macOS)
-  - `shift_insert`: Shift+Insert
-  - `ctrl_shift_v`: Ctrl+Shift+V
-  - `clipboard_only`: no keypress, just clipboard
-- Restore previous clipboard content (unless copy_to_clipboard is true)
-- Small delay between clipboard set and paste simulation (~50ms)
-
-### Phase 8: Audio Feedback
-
-- Bundle two short PCM blip sounds in the binary (via `include_bytes!`)
-- "Blip up" on recording start
-- "Blip down" on recording stop / transcription complete
-- Play via rodio on a separate thread (non-blocking)
-- Respect audio_feedback config flag
-
-### Phase 9: Coordinator + Integration
-
-- Wire all components together with channel-based message passing
-- Main thread: overlay window event loop (winit requires this)
-- Spawned threads/tasks: hotkey listener, audio recorder, transcriber
-- Coordinator receives events, drives state machine:
-  ```
-  Idle ──[hotkey press]──▶ Recording
-  Recording ──[hotkey release/press]──▶ Transcribing
-  Recording ──[cancel]──▶ Idle
-  Transcribing ──[result]──▶ Pasting ──▶ Idle
-  Transcribing ──[error]──▶ Error ──▶ Idle
-  ```
-- Graceful shutdown on SIGINT / tray quit
-
-### Phase 10: Daemon IPC + Status
-
-- The running daemon listens on a local Unix domain socket (Linux/macOS) or named pipe (Windows) for status queries
-- Socket/pipe path: `/tmp/mouth.sock` (Linux/macOS), `\\.\pipe\mouth` (Windows)
-- `mouth status` connects and requests current state; daemon responds with JSON:
-  ```json
-  {
-    "version": "0.1.0",
-    "state": "idle",
-    "model": "parakeet-tdt-0.6b-v3",
-    "accelerator": "directml",
-    "uptime_secs": 3420
-  }
-  ```
-- If the daemon is not running, `mouth status` reports "Mouth is not running" and exits with code 1
-- Also used internally to prevent launching a second daemon instance (lock check)
-
-### Phase 11: Polish + Distribution
-
-- Error handling: user-friendly messages for common failures (no mic, model not found, etc.)
-- Windows installer via `cargo-wix` or distribute as standalone .exe
-- Test on Windows 10/11 primarily
-- Test on Linux (X11 + Wayland) and macOS as secondary
-- Update CLAUDE.md with build/run/test instructions
-- Write user-facing README with setup instructions
-
-## Risks & Mitigations
-
-| Risk | Impact | Mitigation |
-|------|--------|------------|
-| Parakeet v3 ONNX model compatibility with `ort` | Blocks core functionality | Test early in Phase 5; Parakeet v2 as fallback |
-| `rdev` hotkey reliability on Windows | Broken UX | Test early in Phase 2; fallback to Win32 `RegisterHotKey` |
-| Overlay focus stealing | Annoying | Use proper window flags; test with various foreground apps |
-| Audio resampling quality | Poor transcription | Use rubato SincInterpolation (high quality) |
-| Binary size with bundled ONNX Runtime | Large download | ONNX Runtime is ~20-40MB; acceptable for a single-binary tool |
-| winit event loop blocking | Unresponsive | All heavy work on background threads; overlay is lightweight |
-
-## File Structure
-
-```
-mouth/
-├── Cargo.toml
-├── CLAUDE.md
-├── README.md
-├── plan.md
-├── config.yaml.example
-├── resources/
-│   ├── blip_up.pcm       # bundled audio feedback
-│   └── blip_down.pcm
-└── src/
-    ├── main.rs           # CLI entry, clap setup
-    ├── config.rs         # Config struct, YAML load/save, defaults
-    ├── hotkey.rs         # Global hotkey listener (rdev)
-    ├── recorder.rs       # Audio capture (cpal + rubato + VAD)
-    ├── vad.rs            # Silero VAD wrapper
-    ├── transcriber.rs    # ONNX inference, model loading, GPU detection
-    ├── model_cache.rs    # HuggingFace download, cache management
-    ├── overlay.rs        # Minimal overlay window (winit + softbuffer)
-    ├── paste.rs          # Clipboard + paste simulation
-    ├── audio_feedback.rs # Blip sounds via rodio
-    ├── coordinator.rs    # State machine, channel hub
-    └── cli/
-        ├── mod.rs
-        ├── run.rs        # `mouth run` handler
-        ├── config_cmd.rs # `mouth config` TUI
-        ├── models_cmd.rs # `mouth models` handler
-        └── status_cmd.rs # `mouth status` handler
-```
-
-## Not In Scope (v1)
-
-- LLM post-processing of transcriptions
-- Transcription history / database
-- Multiple model support (v1 is Parakeet v3 only, architecture supports adding more later)
-- Auto-submit (Enter after paste)
-- Multi-language UI
-- Tray icon / system tray integration
-- Translate-to-English mode
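The Phase 9 state machine and the channel layout from "Component Communication" in the removed plan can be sketched with `std::sync::mpsc` and a pure transition function. The event, state, and action names here are illustrative, not Mouth's actual types. The coordinator only records the actions it would request instead of driving a real recorder, transcriber, overlay, or paste step, and the plan's `Pasting` and `Error` states are folded into the action emitted on the way back to `Idle`.

```rust
use std::sync::mpsc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    PushToTalk,
    Toggle,
}

/// Events the coordinator receives from other threads.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    HotkeyPressed,
    HotkeyReleased,
    CancelRequested,
    TranscriptionDone(String),
    TranscriptionFailed(String),
}

#[derive(Debug, Clone, PartialEq)]
enum State {
    Idle,
    Recording,
    Transcribing,
}

/// Side effects the coordinator asks other components to perform.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    StartRecording,
    StopAndTranscribe,
    Discard,
    Paste(String),
    ShowError(String),
}

/// Pure transition function: (state, event) -> (next state, optional action).
fn step(mode: Mode, state: &State, event: Event) -> (State, Option<Action>) {
    match (state, event) {
        (State::Idle, Event::HotkeyPressed) => (State::Recording, Some(Action::StartRecording)),
        (State::Recording, Event::HotkeyReleased) if mode == Mode::PushToTalk => {
            (State::Transcribing, Some(Action::StopAndTranscribe))
        }
        (State::Recording, Event::HotkeyPressed) if mode == Mode::Toggle => {
            (State::Transcribing, Some(Action::StopAndTranscribe))
        }
        (State::Recording, Event::CancelRequested) => (State::Idle, Some(Action::Discard)),
        (State::Transcribing, Event::TranscriptionDone(text)) => {
            (State::Idle, Some(Action::Paste(text)))
        }
        (State::Transcribing, Event::TranscriptionFailed(err)) => {
            (State::Idle, Some(Action::ShowError(err)))
        }
        // Anything else (e.g. key auto-repeat while recording) leaves the state unchanged.
        (s, _) => (s.clone(), None),
    }
}

/// Drain events until every sender is dropped, collecting the requested actions.
fn run_coordinator(mode: Mode, events: mpsc::Receiver<Event>) -> Vec<Action> {
    let mut state = State::Idle;
    let mut actions = Vec::new();
    for event in events {
        let (next, action) = step(mode, &state, event);
        state = next;
        actions.extend(action);
    }
    actions
}
```

Keeping `step` free of channels and threads means the transitions can be tested on their own, while `run_coordinator` stays a thin loop over the receiver.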
+116 -32
@@ -1,11 +1,22 @@
 #!/usr/bin/env bash
 set -euo pipefail

+# ============================================================
+# Release configuration
+# ============================================================
+REPO="https://gitea.dcglab.co.uk/steve/mouth"
+FORGE="gitea" # "gitea" (uses tea CLI) or "github" (uses gh CLI)
+
+# ============================================================
+# Derived variables
+# ============================================================
 VERSION=$(grep '^version' Cargo.toml | head -1 | sed 's/.*"\(.*\)"/\1/')
 RELEASE_DIR="release/v${VERSION}"
 BINARY_NAME="mouth"
+TAG="v${VERSION}"

-echo "=== Mouth Release Build v${VERSION} ==="
+echo "=== Mouth Release Build ${TAG} ==="
+echo "Forge: ${FORGE} (${REPO})"
 echo ""

 # Ensure we're in the project root
@@ -14,6 +25,22 @@ if [ ! -f Cargo.toml ]; then
     exit 1
 fi

+# Check CLI tools
+if [ "${FORGE}" = "gitea" ]; then
+    if ! command -v tea &>/dev/null; then
+        echo "ERROR: 'tea' CLI not found. Install: https://gitea.com/gitea/tea"
+        exit 1
+    fi
+elif [ "${FORGE}" = "github" ]; then
+    if ! command -v gh &>/dev/null; then
+        echo "ERROR: 'gh' CLI not found. Install: https://cli.github.com"
+        exit 1
+    fi
+else
+    echo "ERROR: Unknown forge '${FORGE}'. Must be 'gitea' or 'github'."
+    exit 1
+fi
+
 # Clean previous release artifacts for this version
 rm -rf "${RELEASE_DIR}"
 mkdir -p "${RELEASE_DIR}"
@@ -32,19 +59,12 @@ build_target() {
     if cargo build --release --target "${target}" 2>&1; then
         local binary="target/${target}/release/${BINARY_NAME}${ext}"
         if [ -f "${binary}" ]; then
-            local archive="${RELEASE_DIR}/${BINARY_NAME}-v${VERSION}-${target}"
+            local archive="${RELEASE_DIR}/${BINARY_NAME}-${TAG}-${target}"
             if [ -n "${ext}" ]; then
-                # Windows: zip
-                local zip_name="${archive}.zip"
-                zip -j "${zip_name}" "${binary}" 2>/dev/null || {
-                    # Fallback if zip not installed
-                    cp "${binary}" "${archive}${ext}"
-                    echo " -> ${archive}${ext}"
-                    BUILT+=("${archive}${ext}")
-                    return
-                }
-                echo " -> ${zip_name}"
-                BUILT+=("${zip_name}")
+                # Windows: ship the exe directly
+                cp "${binary}" "${archive}${ext}"
+                echo " -> ${archive}${ext}"
+                BUILT+=("${archive}${ext}")
             else
                 # Linux/macOS: tar.gz
                 local tar_name="${archive}.tar.gz"
@@ -71,30 +91,15 @@ build_target() {
 build_target "x86_64-unknown-linux-gnu" "Linux x86_64"

 # Windows x86_64 (MSVC target via cargo-xwin)
-# ort requires the MSVC target — the GNU/MinGW target has no prebuilt
-# ONNX Runtime binaries. cargo-xwin cross-compiles using the MSVC
-# toolchain from Linux without needing a Windows machine.
-#
-# Install once:
-# cargo install cargo-xwin
-# rustup target add x86_64-pc-windows-msvc
-#
 if command -v cargo-xwin &>/dev/null && rustup target list --installed | grep -q x86_64-pc-windows-msvc; then
     echo "--- Building Windows x86_64 (x86_64-pc-windows-msvc via cargo-xwin) ---"
     if cargo xwin build --release --target x86_64-pc-windows-msvc 2>&1; then
         local_binary="target/x86_64-pc-windows-msvc/release/${BINARY_NAME}.exe"
         if [ -f "${local_binary}" ]; then
-            archive="${RELEASE_DIR}/${BINARY_NAME}-v${VERSION}-x86_64-pc-windows-msvc"
-            zip_name="${archive}.zip"
-            zip -j "${zip_name}" "${local_binary}" 2>/dev/null || {
-                cp "${local_binary}" "${archive}.exe"
-                echo " -> ${archive}.exe"
-                BUILT+=("${archive}.exe")
-            }
-            if [ -f "${zip_name}" ]; then
-                echo " -> ${zip_name}"
-                BUILT+=("${zip_name}")
-            fi
+            archive="${RELEASE_DIR}/${BINARY_NAME}-${TAG}-x86_64-pc-windows-msvc.exe"
+            cp "${local_binary}" "${archive}"
+            echo " -> ${archive}"
+            BUILT+=("${archive}")
         else
             echo " WARN: Binary not found"
             FAILED+=("Windows x86_64 (MSVC)")
@@ -149,3 +154,82 @@ if [ ${#BUILT[@]} -gt 0 ]; then
     cat checksums-sha256.txt
     cd - > /dev/null
 fi
+
+# ============================================================
+# Publish release
+# ============================================================
+
+if [ ${#BUILT[@]} -eq 0 ]; then
+    echo ""
+    echo "No successful builds — skipping release publish."
+    exit 1
+fi
+
+echo ""
+read -rp "Publish release ${TAG} to ${FORGE}? [y/N] " confirm
+if [[ ! "${confirm}" =~ ^[Yy]$ ]]; then
+    echo "Skipped. Artifacts are in ${RELEASE_DIR}/"
+    exit 0
+fi
+
+# Ensure the git tag exists
+if ! git rev-parse "${TAG}" &>/dev/null; then
+    echo "Creating git tag ${TAG}..."
+    git tag -a "${TAG}" -m "Release ${TAG}"
+    git push origin "${TAG}"
+fi
+
+# Collect all release files (artifacts + checksums)
+RELEASE_FILES=()
+for b in "${BUILT[@]}"; do
+    RELEASE_FILES+=("${b}")
+done
+RELEASE_FILES+=("${RELEASE_DIR}/checksums-sha256.txt")
+
+RELEASE_TITLE="Mouth ${TAG}"
+RELEASE_BODY="## Mouth ${TAG}
+
+### Downloads
+$(for b in "${BUILT[@]}"; do echo "- $(basename "${b}")"; done)
+
+### Checksums (SHA256)
+\`\`\`
+$(cat "${RELEASE_DIR}/checksums-sha256.txt")
+\`\`\`
+"
+
+if [ "${FORGE}" = "gitea" ]; then
+    echo "Publishing to Gitea via tea..."
+
+    # Extract owner/repo from the REPO URL (the host part is dropped)
+    REPO_OWNER_NAME=$(echo "${REPO}" | sed 's|.*://[^/]*/||')
+
+    # Create the release
+    tea release create \
+        --repo "${REPO_OWNER_NAME}" \
+        --tag "${TAG}" \
+        --title "${RELEASE_TITLE}" \
+        --note "${RELEASE_BODY}"
+
+    # Upload assets
+    for f in "${RELEASE_FILES[@]}"; do
+        echo " Uploading $(basename "${f}")..."
+        tea release asset create \
+            --repo "${REPO_OWNER_NAME}" \
+            --tag "${TAG}" \
+            --name "$(basename "${f}")" \
+            --file "${f}"
+    done
+
+elif [ "${FORGE}" = "github" ]; then
+    echo "Publishing to GitHub via gh..."
+
+    gh release create "${TAG}" \
+        --repo "${REPO}" \
+        --title "${RELEASE_TITLE}" \
+        --notes "${RELEASE_BODY}" \
+        "${RELEASE_FILES[@]}"
+fi
+
+echo ""
+echo "=== Release ${TAG} published to ${FORGE}! ==="
@@ -84,7 +84,11 @@ pub fn play_blip_down() {
|
||||
}
|
||||
|
||||
fn play_blip_internal(freq_start: f32, freq_end: f32, duration_ms: u64) -> Result<()> {
|
||||
let samples = generate_blip(freq_start, freq_end, duration_ms);
|
||||
// Prepend silence so the audio device has time to warm up
|
||||
let silence_ms = 50u64;
|
||||
let silence_samples = (44100u64 * silence_ms / 1000) as usize;
|
||||
let mut samples = vec![0i16; silence_samples];
|
||||
samples.extend(generate_blip(freq_start, freq_end, duration_ms));
|
||||
let wav_data = encode_wav(&samples, 44100);
|
||||
|
||||
let (_stream, stream_handle) = OutputStream::try_default()?;
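The hunk above pads each feedback blip with 50 ms of silence before the tone; at the 44.1 kHz rate passed to `encode_wav` that is 44100 × 50 / 1000 = 2205 zero samples. Below is a minimal standalone sketch of the same padding step, assuming mono `i16` PCM; `prepend_silence` is a hypothetical helper, not a function in this commit.

```rust
/// Return `samples` with `silence_ms` of digital silence (zeros) in front.
/// Hypothetical helper: assumes mono i16 PCM at `sample_rate` Hz.
pub fn prepend_silence(samples: &[i16], sample_rate: u32, silence_ms: u64) -> Vec<i16> {
    // Multiply before dividing so integer division does not drop samples.
    let silence_samples = (sample_rate as u64 * silence_ms / 1000) as usize;
    let mut out = Vec::with_capacity(silence_samples + samples.len());
    out.resize(silence_samples, 0i16); // the silent lead-in
    out.extend_from_slice(samples); // the tone itself
    out
}
```

The multiply-first order matters for short durations: with integer math, `44100 / 1000 * 50` would yield 2200 samples instead of 2205.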
+37
-8
@@ -1,7 +1,9 @@
use anyhow::Result;
use dialoguer::{Input, Select};
use std::time::Duration;

use crate::config::{Accelerator, Config, OverlayPosition, PasteMethod, RecordingMode};
use crate::hotkey::capture_hotkey;

pub fn show() -> Result<()> {
let config = Config::load()?;
@@ -20,10 +22,7 @@ pub fn reset() -> Result<()> {
pub fn interactive() -> Result<()> {
let mut config = Config::load()?;

config.hotkey = Input::new()
.with_prompt("Hotkey")
.default(config.hotkey)
.interact_text()?;
config.hotkey = prompt_hotkey("Hotkey", &config.hotkey)?;

let mode_idx = Select::new()
.with_prompt("Recording mode")
@@ -38,10 +37,7 @@ pub fn interactive() -> Result<()> {
_ => RecordingMode::Toggle,
};

config.cancel_key = Input::new()
.with_prompt("Cancel key")
.default(config.cancel_key)
.interact_text()?;
config.cancel_key = prompt_hotkey("Cancel key", &config.cancel_key)?;

config.model = Input::new()
.with_prompt("Model")
@@ -125,3 +121,36 @@ pub fn interactive() -> Result<()> {
println!("\nConfig saved to {}", Config::path()?.display());
Ok(())
}

/// Prompt the user to either press a key combination or type it manually.
fn prompt_hotkey(label: &str, current: &str) -> Result<String> {
let choice = Select::new()
.with_prompt(format!("{label} (current: {current})"))
.items(&["Press the key combination", "Type it manually", "Keep current"])
.default(0)
.interact()?;

match choice {
0 => {
println!("Press your desired key combination (timeout: 10s)...");
match capture_hotkey(Duration::from_secs(10)) {
Some(hotkey) => {
println!(" Captured: {hotkey}");
Ok(hotkey)
}
None => {
println!(" No keypress detected, keeping current: {current}");
Ok(current.to_string())
}
}
}
1 => {
let value = Input::new()
.with_prompt(label)
.default(current.to_string())
.interact_text()?;
Ok(value)
}
_ => Ok(current.to_string()),
}
}

+93
-50
@@ -1,18 +1,31 @@
use anyhow::{Context, Result};
use std::sync::mpsc;
use std::sync::{mpsc, Arc};
use std::thread;
use tracing::info;

use crate::config::{Config, OverlayPosition};
use crate::config::Config;
use crate::coordinator::Coordinator;
use crate::hotkey;
use crate::ipc;
use crate::model_cache;
use crate::overlay;
use crate::recorder;
use crate::shared_state::SharedState;
use crate::transcriber::Transcriber;

pub fn run() -> Result<()> {
let config = Config::load()?;

// Check if already running
if ipc::is_daemon_running() {
eprintln!("Mouth is already running.");
std::process::exit(1);
}

// Hide Windows console window
#[cfg(windows)]
hide_console();

info!("Mouth v{} starting", env!("CARGO_PKG_VERSION"));
info!("Mode: {:?}", config.mode);
info!("Hotkey: {}", config.hotkey);
@@ -30,10 +43,25 @@ pub fn run() -> Result<()> {
let transcriber = Transcriber::new(&model_paths, &config.accelerator, config.gpu_device)
.context("Failed to load transcription engine")?;

// Step 3: VAD (not yet bundled)
// Step 3: VAD
let vad = if config.vad_enabled {
info!("VAD enabled but Silero model not yet bundled — skipping");
None
info!("Loading Silero VAD...");
match model_cache::ensure_vad_model() {
Ok(vad_path) => match crate::vad::Vad::new(vad_path.to_str().unwrap_or_default()) {
Ok(v) => {
info!("VAD loaded");
Some(v)
}
Err(e) => {
tracing::warn!("Failed to load VAD, continuing without it: {e}");
None
}
},
Err(e) => {
tracing::warn!("Failed to download VAD model, continuing without it: {e}");
None
}
}
} else {
None
};
@@ -44,12 +72,29 @@ pub fn run() -> Result<()> {
let cancel_combo = hotkey::parse_hotkey(&config.cancel_key)
.with_context(|| format!("Invalid cancel key: {}", config.cancel_key))?;

// Step 5: Set up channels
// Step 5: Create shared state
let shared_state = Arc::new(SharedState::new(
config.model.clone(),
format!("{:?}", config.accelerator).to_lowercase(),
));

// Step 6: Start IPC listener
let ipc_state = Arc::clone(&shared_state);
thread::Builder::new()
.name("mouth-ipc".into())
.spawn(move || {
if let Err(e) = ipc::start_ipc_listener(ipc_state) {
tracing::error!("IPC listener failed: {e}");
}
})
.context("Failed to spawn IPC thread")?;

// Step 7: Set up channels
let (hotkey_tx, hotkey_rx) = mpsc::channel();
let (recorder_cmd_tx, recorder_cmd_rx) = mpsc::channel();
let (audio_tx, audio_rx) = mpsc::channel();

// Step 6: Spawn background threads
// Step 8: Spawn background threads
let device_name = config.input_device.clone();
thread::Builder::new()
.name("mouth-recorder".into())
@@ -65,52 +110,50 @@ pub fn run() -> Result<()> {
})
.context("Failed to spawn hotkey thread")?;

// Step 7: Start overlay + coordinator
if config.overlay_position != OverlayPosition::None {
let (event_loop, proxy) = overlay::create_event_loop()
.map_err(|e| anyhow::anyhow!("Failed to create overlay event loop: {e}"))?;
// Step 9: Start overlay + coordinator
// Always create the event loop (needed for tray icon even when overlay is hidden)
let (event_loop, proxy) = overlay::create_event_loop()
.map_err(|e| anyhow::anyhow!("Failed to create overlay event loop: {e}"))?;

let overlay_position = config.overlay_position.clone();
let coord_proxy = Some(proxy);
let overlay_position = config.overlay_position.clone();

// Coordinator runs on a background thread
let coord_config = config.clone();
thread::Builder::new()
.name("mouth-coordinator".into())
.spawn(move || {
let mut coordinator = Coordinator::new(
coord_config,
transcriber,
vad,
recorder_cmd_tx,
audio_rx,
hotkey_rx,
coord_proxy,
);
coordinator.run();
})
.context("Failed to spawn coordinator thread")?;
// Coordinator runs on a background thread
let coord_config = config.clone();
let coord_state = Arc::clone(&shared_state);
thread::Builder::new()
.name("mouth-coordinator".into())
.spawn(move || {
let mut coordinator = Coordinator::new(
coord_config,
coord_state,
transcriber,
vad,
recorder_cmd_tx,
audio_rx,
hotkey_rx,
Some(proxy),
);
coordinator.run();
})
.context("Failed to spawn coordinator thread")?;

println!("Mouth is running. Press {} to record. Ctrl+C to quit.", config.hotkey);

// Overlay event loop runs on main thread (blocking)
overlay::run_event_loop(event_loop, overlay_position)
.map_err(|e| anyhow::anyhow!("Overlay event loop error: {e}"))?;
} else {
// No overlay — coordinator runs on main thread
println!("Mouth is running. Press {} to record. Ctrl+C to quit.", config.hotkey);

let mut coordinator = Coordinator::new(
config,
transcriber,
vad,
recorder_cmd_tx,
audio_rx,
hotkey_rx,
None,
);
coordinator.run();
}
// Overlay event loop runs on main thread (blocking)
// Tray icon is created inside the overlay app
overlay::run_event_loop(event_loop, overlay_position)
.map_err(|e| anyhow::anyhow!("Overlay event loop error: {e}"))?;

ipc::cleanup();
Ok(())
}

#[cfg(windows)]
fn hide_console() {
use windows_sys::Win32::System::Console::GetConsoleWindow;
use windows_sys::Win32::UI::WindowsAndMessaging::{ShowWindow, SW_HIDE};
unsafe {
let console = GetConsoleWindow();
if !console.is_null() {
ShowWindow(console, SW_HIDE);
}
}
}
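`shared_state.rs` itself is not part of this section, but the call sites pin down its surface: `SharedState::new(model, accelerator)`, `set_state(&str)` from the coordinator thread, and `get_state()`, `model`, `accelerator`, and `uptime_secs()` read by the IPC thread when it builds a status reply. A minimal sketch that would satisfy those calls, assuming a `Mutex<String>` for the state label and an `Instant` captured at construction for uptime; the initial `"idle"` label and the lock-poisoning policy are assumptions, not taken from the commit.

```rust
use std::sync::Mutex;
use std::time::Instant;

/// Hypothetical thread-safe status holder shaped after the diff's call sites.
pub struct SharedState {
    pub model: String,       // fixed at startup, read without locking
    pub accelerator: String, // fixed at startup, read without locking
    state: Mutex<String>,    // changes on every coordinator transition
    started: Instant,        // basis for uptime
}

impl SharedState {
    pub fn new(model: String, accelerator: String) -> Self {
        Self {
            model,
            accelerator,
            state: Mutex::new("idle".to_string()), // assumed initial label
            started: Instant::now(),
        }
    }

    /// Called by the coordinator thread.
    pub fn set_state(&self, label: &str) {
        // A poisoned lock still holds a valid String, so recover it rather than panic.
        let mut guard = self.state.lock().unwrap_or_else(|e| e.into_inner());
        *guard = label.to_string();
    }

    /// Called by the IPC thread; returns a copy so the lock is released immediately.
    pub fn get_state(&self) -> String {
        self.state.lock().unwrap_or_else(|e| e.into_inner()).clone()
    }

    pub fn uptime_secs(&self) -> u64 {
        self.started.elapsed().as_secs()
    }
}
```

Only the state label needs a lock; since `model` and `accelerator` never change after `Arc::new(SharedState::new(...))`, sharing them through the `Arc` alone is enough.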
+20
-7
@@ -1,11 +1,24 @@
use anyhow::Result;

pub fn status() -> Result<()> {
let version = env!("CARGO_PKG_VERSION");
use crate::ipc;

// TODO: Phase 10 — connect to daemon IPC socket/pipe and query status
// For now, just show version info
println!("Mouth v{version}");
println!("Status: not yet implemented (requires daemon IPC)");
Ok(())
pub fn status() -> Result<()> {
match ipc::query_daemon_status() {
Ok(status) => {
println!("Mouth v{}", status.version);
println!("State: {}", status.state);
println!("Model: {}", status.model);
println!("Accelerator: {}", status.accelerator);

let hours = status.uptime_secs / 3600;
let mins = (status.uptime_secs % 3600) / 60;
let secs = status.uptime_secs % 60;
println!("Uptime: {}h {}m {}s", hours, mins, secs);
Ok(())
}
Err(_) => {
eprintln!("Mouth is not running.");
std::process::exit(1);
}
}
}
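`mouth status` turns the daemon's `uptime_secs` into hours, minutes, and seconds with plain integer division and remainder; 3661 seconds, for example, prints as `Uptime: 1h 1m 1s`. The same arithmetic as a testable helper (`format_uptime` is an illustrative name; the commit inlines this in `status()`):

```rust
/// Format a whole-second uptime the way `mouth status` prints it.
/// Hypothetical helper mirroring the inline arithmetic in `status()`.
pub fn format_uptime(uptime_secs: u64) -> String {
    let hours = uptime_secs / 3600; // whole hours, not wrapped into days
    let mins = (uptime_secs % 3600) / 60; // minutes left after removing hours
    let secs = uptime_secs % 60; // seconds left after removing minutes
    format!("{}h {}m {}s", hours, mins, secs)
}
```

Hours are not folded into days, so a daemon that has been up for 90061 seconds reports `25h 1m 1s`.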
+28
-11
@@ -1,4 +1,4 @@
use std::sync::mpsc;
use std::sync::{mpsc, Arc};
use std::thread;
use tracing::{debug, error, info, warn};
use winit::event_loop::EventLoopProxy;
@@ -9,6 +9,7 @@ use crate::hotkey::HotkeyEvent;
use crate::overlay::{OverlayEvent, OverlayState};
use crate::paste;
use crate::recorder::{AudioData, RecorderCommand};
use crate::shared_state::SharedState;
use crate::transcriber::Transcriber;
use crate::vad::Vad;

@@ -24,6 +25,7 @@ enum State {
pub struct Coordinator {
config: Config,
state: State,
shared_state: Arc<SharedState>,
transcriber: Transcriber,
vad: Option<Vad>,
recorder_tx: mpsc::Sender<RecorderCommand>,
@@ -35,6 +37,7 @@ pub struct Coordinator {
impl Coordinator {
pub fn new(
config: Config,
shared_state: Arc<SharedState>,
transcriber: Transcriber,
vad: Option<Vad>,
recorder_tx: mpsc::Sender<RecorderCommand>,
@@ -45,6 +48,7 @@ impl Coordinator {
Self {
config,
state: State::Idle,
shared_state,
transcriber,
vad,
recorder_tx,
@@ -54,6 +58,16 @@ impl Coordinator {
}
}

fn set_state(&mut self, state: State) {
self.state = state;
let label = match state {
State::Idle => "idle",
State::Recording => "recording",
State::Transcribing => "transcribing",
};
self.shared_state.set_state(label);
}

/// Run the coordinator loop. This blocks until shutdown.
pub fn run(&mut self) {
info!("Coordinator started");
@@ -111,7 +125,7 @@ impl Coordinator {

fn start_recording(&mut self) {
info!("Recording started");
self.state = State::Recording;
self.set_state(State::Recording);
self.set_overlay(OverlayState::Recording);

if self.config.audio_feedback {
@@ -120,23 +134,26 @@ impl Coordinator {

if self.recorder_tx.send(RecorderCommand::Start).is_err() {
error!("Failed to send start command to recorder");
self.state = State::Idle;
self.set_state(State::Idle);
self.set_overlay(OverlayState::Hidden);
}
}

fn stop_recording(&mut self) {
info!("Recording stopped, starting transcription");
self.state = State::Transcribing;
self.set_state(State::Transcribing);
self.set_overlay(OverlayState::Transcribing);

if self.config.audio_feedback {
audio_feedback::play_blip_down();
}

// Keep recording briefly after the stop signal so trailing speech isn't clipped
thread::sleep(std::time::Duration::from_millis(300));

if self.recorder_tx.send(RecorderCommand::Stop).is_err() {
error!("Failed to send stop command to recorder");
self.state = State::Idle;
self.set_state(State::Idle);
self.set_overlay(OverlayState::Hidden);
return;
}
@@ -148,7 +165,7 @@ impl Coordinator {
}
Err(_) => {
error!("Failed to receive audio data");
self.state = State::Idle;
self.set_state(State::Idle);
self.set_overlay(OverlayState::Error);
self.delayed_hide_overlay();
}
@@ -157,7 +174,7 @@ impl Coordinator {

fn cancel_recording(&mut self) {
info!("Recording cancelled");
self.state = State::Idle;
self.set_state(State::Idle);

if self.recorder_tx.send(RecorderCommand::Stop).is_err() {
warn!("Failed to send stop command to recorder");
@@ -176,7 +193,7 @@ impl Coordinator {
Ok(filtered) => {
if filtered.is_empty() {
info!("No speech detected by VAD");
self.state = State::Idle;
self.set_state(State::Idle);
self.set_overlay(OverlayState::Hidden);
return;
}
@@ -199,7 +216,7 @@ impl Coordinator {
Ok(text) => {
if text.is_empty() {
info!("Empty transcription");
self.state = State::Idle;
self.set_state(State::Idle);
self.set_overlay(OverlayState::Hidden);
return;
}
@@ -218,11 +235,11 @@ impl Coordinator {
}

self.delayed_hide_overlay();
self.state = State::Idle;
self.set_state(State::Idle);
}
Err(e) => {
error!("Transcription failed: {e}");
self.state = State::Idle;
self.set_state(State::Idle);
self.set_overlay(OverlayState::Error);
self.delayed_hide_overlay();
}
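The new `set_state` stores `state` in `self.state` and then matches on the same value, which compiles only if `State` is `Copy`; the enum's derives fall outside these hunks, so that is an inference, not something the diff shows. A sketch with the label mapping moved onto the enum, assuming `Copy` is derived; `label` and the `Tracker` stand-in for the coordinator are hypothetical names.

```rust
/// Illustrative copy of the coordinator's states. `Copy` is assumed so a value
/// can be stored in `self.state` and still be inspected afterwards.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
}

impl State {
    /// Lowercase label published to the shared state and shown by `mouth status`.
    pub fn label(self) -> &'static str {
        match self {
            State::Idle => "idle",
            State::Recording => "recording",
            State::Transcribing => "transcribing",
        }
    }
}

/// Hypothetical stand-in for the coordinator: keep the local state and the
/// published label in step on every transition.
pub struct Tracker {
    pub state: State,
    pub published: String,
}

impl Tracker {
    pub fn set_state(&mut self, state: State) {
        self.state = state; // copies, because State: Copy
        self.published = state.label().to_string(); // `state` is still usable here
    }
}
```

Routing every transition through one `set_state`, as the diff now does in place of direct `self.state = ...` assignments, is what keeps `mouth status` from reporting a stale state.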
+247
-26
@@ -1,5 +1,6 @@
use anyhow::{bail, Result};
use rdev::{self, Event, EventType, Key};
use std::cell::RefCell;
use std::sync::mpsc;
use std::time::{Duration, Instant};
use tracing::{debug, error, info};
@@ -164,77 +165,297 @@ fn parse_key(s: &str) -> Result<Key> {
"7" => Key::Num7,
"8" => Key::Num8,
"9" => Key::Num9,
// Punctuation / symbol keys
"[" | "leftbracket" => Key::LeftBracket,
"]" | "rightbracket" => Key::RightBracket,
";" | "semicolon" => Key::SemiColon,
"'" | "quote" => Key::Quote,
"`" | "backquote" | "backtick" => Key::BackQuote,
"\\" | "backslash" => Key::BackSlash,
"," | "comma" => Key::Comma,
"." | "dot" | "period" => Key::Dot,
"/" | "slash" => Key::Slash,
"-" | "minus" => Key::Minus,
"=" | "equal" | "equals" => Key::Equal,
// Additional non-character keys
"printscreen" | "prtsc" => Key::PrintScreen,
"scrolllock" => Key::ScrollLock,
"pause" | "break" => Key::Pause,
"numlock" => Key::NumLock,
"capslock" => Key::CapsLock,
// Numpad
"kp0" | "numpad0" => Key::Kp0,
"kp1" | "numpad1" => Key::Kp1,
"kp2" | "numpad2" => Key::Kp2,
"kp3" | "numpad3" => Key::Kp3,
"kp4" | "numpad4" => Key::Kp4,
"kp5" | "numpad5" => Key::Kp5,
"kp6" | "numpad6" => Key::Kp6,
"kp7" | "numpad7" => Key::Kp7,
"kp8" | "numpad8" => Key::Kp8,
"kp9" | "numpad9" => Key::Kp9,
"kpenter" | "numpadenter" => Key::KpReturn,
"kpminus" | "numpadminus" => Key::KpMinus,
"kpplus" | "numpadplus" => Key::KpPlus,
"kpmultiply" | "numpadmultiply" => Key::KpMultiply,
"kpdivide" | "numpaddivide" => Key::KpDivide,
"kpdelete" | "numpaddelete" => Key::KpDelete,
_ => bail!("Unknown key: {s}"),
};
Ok(key)
}

/// Convert an rdev Key back to the config string representation.
fn key_to_string(key: &Key) -> Option<String> {
let s = match key {
Key::Space => "space",
Key::Return => "enter",
Key::Escape => "escape",
Key::Tab => "tab",
Key::Backspace => "backspace",
Key::Delete => "delete",
Key::Insert => "insert",
Key::Home => "home",
Key::End => "end",
Key::PageUp => "pageup",
Key::PageDown => "pagedown",
Key::UpArrow => "up",
Key::DownArrow => "down",
Key::LeftArrow => "left",
Key::RightArrow => "right",
Key::F1 => "f1",
Key::F2 => "f2",
Key::F3 => "f3",
Key::F4 => "f4",
Key::F5 => "f5",
Key::F6 => "f6",
Key::F7 => "f7",
Key::F8 => "f8",
Key::F9 => "f9",
Key::F10 => "f10",
Key::F11 => "f11",
Key::F12 => "f12",
Key::KeyA => "a",
Key::KeyB => "b",
Key::KeyC => "c",
Key::KeyD => "d",
Key::KeyE => "e",
Key::KeyF => "f",
Key::KeyG => "g",
Key::KeyH => "h",
Key::KeyI => "i",
Key::KeyJ => "j",
Key::KeyK => "k",
Key::KeyL => "l",
Key::KeyM => "m",
Key::KeyN => "n",
Key::KeyO => "o",
Key::KeyP => "p",
Key::KeyQ => "q",
Key::KeyR => "r",
Key::KeyS => "s",
Key::KeyT => "t",
Key::KeyU => "u",
Key::KeyV => "v",
Key::KeyW => "w",
Key::KeyX => "x",
Key::KeyY => "y",
Key::KeyZ => "z",
Key::Num0 => "0",
Key::Num1 => "1",
Key::Num2 => "2",
Key::Num3 => "3",
Key::Num4 => "4",
Key::Num5 => "5",
Key::Num6 => "6",
Key::Num7 => "7",
Key::Num8 => "8",
Key::Num9 => "9",
Key::LeftBracket => "[",
Key::RightBracket => "]",
Key::SemiColon => ";",
Key::Quote => "'",
Key::BackQuote => "`",
Key::BackSlash => "\\",
Key::Comma => ",",
Key::Dot => ".",
Key::Slash => "/",
Key::Minus => "-",
Key::Equal => "=",
Key::PrintScreen => "printscreen",
Key::ScrollLock => "scrolllock",
Key::Pause => "pause",
Key::NumLock => "numlock",
Key::CapsLock => "capslock",
Key::Kp0 => "kp0",
Key::Kp1 => "kp1",
Key::Kp2 => "kp2",
Key::Kp3 => "kp3",
Key::Kp4 => "kp4",
Key::Kp5 => "kp5",
Key::Kp6 => "kp6",
Key::Kp7 => "kp7",
Key::Kp8 => "kp8",
Key::Kp9 => "kp9",
Key::KpReturn => "kpenter",
Key::KpMinus => "kpminus",
Key::KpPlus => "kpplus",
Key::KpMultiply => "kpmultiply",
Key::KpDivide => "kpdivide",
Key::KpDelete => "kpdelete",
_ => return None,
};
Some(s.to_string())
}
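`parse_key` accepts several aliases per key while `key_to_string` emits one canonical spelling, so the two hand-written `match` tables must stay in step for a captured hotkey to load back correctly. One way to make that invariant hard to break, sketched here with a hypothetical four-variant `Key` enum instead of `rdev::Key`, is to drive both directions from a single table and test the round trip. The lowercasing in `parse_key` is an assumption of the sketch.

```rust
/// Hypothetical stand-in for a few `rdev::Key` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    LeftBracket,
    Equal,
    Kp0,
    KpReturn,
}

/// One row per key: the canonical name written to config comes first,
/// followed by any extra aliases accepted when parsing.
const KEY_NAMES: &[(Key, &[&str])] = &[
    (Key::LeftBracket, &["[", "leftbracket"]),
    (Key::Equal, &["=", "equal", "equals"]),
    (Key::Kp0, &["kp0", "numpad0"]),
    (Key::KpReturn, &["kpenter", "numpadenter"]),
];

/// Accept any alias (case-insensitively).
pub fn parse_key(s: &str) -> Option<Key> {
    let s = s.to_ascii_lowercase();
    KEY_NAMES
        .iter()
        .find(|(_, names)| names.contains(&s.as_str()))
        .map(|(key, _)| *key)
}

/// Emit the canonical (first) name.
pub fn key_to_string(key: Key) -> Option<&'static str> {
    KEY_NAMES
        .iter()
        .find(|(k, _)| *k == key)
        .map(|(_, names)| names[0])
}
```

With one table, adding a key in one direction automatically adds it to the other, and a loop over all variants can assert `parse_key(key_to_string(k)) == Some(k)`.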
/// Returns true if the key is a modifier (ctrl, alt, shift, meta).
fn is_modifier(key: &Key) -> bool {
matches!(
key,
Key::ControlLeft
| Key::ControlRight
| Key::Alt
| Key::AltGr
| Key::ShiftLeft
| Key::ShiftRight
| Key::MetaLeft
| Key::MetaRight
)
}

/// Capture a hotkey combination by listening for an actual keypress.
/// Blocks until the user presses a non-modifier key while optionally holding modifiers.
/// Returns the hotkey string (e.g. "ctrl+[") or None on timeout/error.
pub fn capture_hotkey(timeout: Duration) -> Option<String> {
let (tx, rx) = mpsc::channel();

std::thread::spawn(move || {
let mut modifier_state = ModifierState::default();

let callback = move |event: Event| {
match event.event_type {
EventType::KeyPress(key) => {
modifier_state.update(&key, true);

// Ignore pure modifier presses — wait for a real key
if is_modifier(&key) {
return;
}

if let Some(key_name) = key_to_string(&key) {
let mut parts = Vec::new();
if modifier_state.ctrl {
parts.push("ctrl".to_string());
}
if modifier_state.alt {
parts.push("alt".to_string());
}
if modifier_state.shift {
parts.push("shift".to_string());
}
if modifier_state.meta {
parts.push("meta".to_string());
}
parts.push(key_name);
let _ = tx.send(parts.join("+"));
}
}
EventType::KeyRelease(key) => {
modifier_state.update(&key, false);
}
_ => {}
}
};

let _ = rdev::listen(callback);
});

rx.recv_timeout(timeout).ok()
}
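`capture_hotkey` skips bare modifier presses and, on the first non-modifier key that `key_to_string` can name, sends the held modifiers followed by that key, joined with `+`. The joining step in isolation, with a hypothetical `Modifiers` struct standing in for the diff's `ModifierState` (whose definition is outside this section):

```rust
/// Hypothetical mirror of the modifier flags the diff tracks in `ModifierState`.
#[derive(Debug, Default, Clone, Copy)]
pub struct Modifiers {
    pub ctrl: bool,
    pub alt: bool,
    pub shift: bool,
    pub meta: bool,
}

/// Join held modifiers and the final key into a config string such as "ctrl+shift+[".
/// Modifiers are always emitted in ctrl, alt, shift, meta order, regardless of
/// the order in which they were pressed.
pub fn compose_hotkey(mods: Modifiers, key_name: &str) -> String {
    let mut parts: Vec<&str> = Vec::new();
    if mods.ctrl {
        parts.push("ctrl");
    }
    if mods.alt {
        parts.push("alt");
    }
    if mods.shift {
        parts.push("shift");
    }
    if mods.meta {
        parts.push("meta");
    }
    parts.push(key_name);
    parts.join("+")
}
```

Because the order is fixed, pressing Shift before Ctrl still yields `ctrl+shift+[`. Keys that `key_to_string` returns `None` for are simply ignored by the capture loop, which keeps waiting until a nameable key arrives or the timeout expires.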
/// Start the global hotkey listener on the current thread (blocking).
/// Sends HotkeyEvents to the provided channel.
/// Uses `rdev::grab` to intercept and consume hotkey events so they don't
/// reach the focused application.
pub fn listen(
hotkey: HotkeyCombination,
cancel_key: HotkeyCombination,
tx: mpsc::Sender<HotkeyEvent>,
) {
let debounce_duration = Duration::from_millis(30);
let mut last_event_time = Instant::now() - debounce_duration;
let mut modifier_state = ModifierState::default();
let mut hotkey_held = false;

info!("Hotkey listener started");
info!("Hotkey listener started (grab mode)");
debug!("Hotkey: {:?}", hotkey);
debug!("Cancel: {:?}", cancel_key);

let callback = move |event: Event| {
// rdev::grab requires Fn (not FnMut), so wrap mutable state in RefCell
struct GrabState {
last_event_time: Instant,
modifier_state: ModifierState,
hotkey_held: bool,
}
let state = RefCell::new(GrabState {
last_event_time: Instant::now() - debounce_duration,
modifier_state: ModifierState::default(),
hotkey_held: false,
});

let callback = move |event: Event| -> Option<Event> {
let mut s = state.borrow_mut();
let now = Instant::now();
match event.event_type {
EventType::KeyPress(key) => {
modifier_state.update(&key, true);
s.modifier_state.update(&key, true);

// Check cancel key
if key == cancel_key.key && modifier_state.all_held(&cancel_key.modifiers) {
if now.duration_since(last_event_time) >= debounce_duration {
last_event_time = now;
// Check cancel key — swallow it
if key == cancel_key.key && s.modifier_state.all_held(&cancel_key.modifiers) {
if now.duration_since(s.last_event_time) >= debounce_duration {
s.last_event_time = now;
debug!("Cancel key pressed");
if tx.send(HotkeyEvent::Cancel).is_err() {
error!("Failed to send cancel event");
}
}
return;
return None;
}

// Check hotkey
if key == hotkey.key && modifier_state.all_held(&hotkey.modifiers) {
if now.duration_since(last_event_time) >= debounce_duration && !hotkey_held {
last_event_time = now;
hotkey_held = true;
// Check hotkey — swallow it
if key == hotkey.key && s.modifier_state.all_held(&hotkey.modifiers) {
if now.duration_since(s.last_event_time) >= debounce_duration && !s.hotkey_held {
s.last_event_time = now;
s.hotkey_held = true;
debug!("Hotkey pressed");
if tx.send(HotkeyEvent::Pressed).is_err() {
error!("Failed to send pressed event");
}
}
return None;
}

Some(event)
}
EventType::KeyRelease(key) => {
modifier_state.update(&key, false);
s.modifier_state.update(&key, false);

// Check hotkey release (for push-to-talk)
if key == hotkey.key && hotkey_held {
if now.duration_since(last_event_time) >= debounce_duration {
last_event_time = now;
hotkey_held = false;
// Check hotkey release — swallow it
if key == hotkey.key && s.hotkey_held {
if now.duration_since(s.last_event_time) >= debounce_duration {
s.last_event_time = now;
s.hotkey_held = false;
debug!("Hotkey released");
if tx.send(HotkeyEvent::Released).is_err() {
error!("Failed to send released event");
}
}
return None;
}

Some(event)
}
_ => {}
_ => Some(event),
}
};

if let Err(e) = rdev::listen(callback) {
error!("Hotkey listener error: {:?}", e);
if let Err(e) = rdev::grab(callback) {
error!("Hotkey grab error: {:?}", e);
}
}
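The old `rdev::listen` callback mutated `last_event_time`, `modifier_state`, and `hotkey_held` directly. `rdev::grab` takes an `Fn` callback (as the comment in the hunk notes) that returns `Option<Event>`, where `None` swallows the event, so that state moves into a `RefCell`. The pattern in isolation, without rdev: a 30 ms debounce gate whose state lives in a `RefCell` inside an `Fn` closure. `make_debounced` is a hypothetical name, and using `Option<Instant>` for "nothing accepted yet" is a small deviation from the diff, which seeds the timestamp with `Instant::now() - debounce_duration`.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

/// Build an `Fn` debounce gate: returns true if `now` is at least `debounce`
/// after the last accepted event, and records `now` when it accepts.
/// The timestamp lives in a RefCell, so the closure needs only `&self`
/// access and therefore implements `Fn` rather than just `FnMut`.
pub fn make_debounced(debounce: Duration) -> impl Fn(Instant) -> bool {
    // None = nothing accepted yet, so the first event always passes.
    let last: RefCell<Option<Instant>> = RefCell::new(None);
    move |now: Instant| {
        let mut last = last.borrow_mut(); // would panic only on a re-entrant call
        let accept = match *last {
            None => true,
            // duration_since saturates to zero if `now` is earlier than `prev`
            Some(prev) => now.duration_since(prev) >= debounce,
        };
        if accept {
            *last = Some(now);
        }
        accept
    }
}
```

A `RefCell` is enough here because the callback is only invoked from the single thread running the event hook; a callback that could be called from several threads at once would need a `Mutex` instead.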
+233
@@ -0,0 +1,233 @@
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::io::{Read, Write};
use std::sync::Arc;
use tracing::{debug, info, warn};

use crate::shared_state::SharedState;

/// Status response sent over IPC.
#[derive(Debug, Serialize, Deserialize)]
pub struct DaemonStatus {
pub version: String,
pub state: String,
pub model: String,
pub accelerator: String,
pub uptime_secs: u64,
}

/// Returns the platform-specific IPC path.
pub fn ipc_path() -> String {
#[cfg(unix)]
{
"/tmp/mouth.sock".to_string()
}
#[cfg(windows)]
{
r"\\.\pipe\mouth".to_string()
}
}

/// Check if a daemon is already running by attempting to connect.
pub fn is_daemon_running() -> bool {
query_daemon_status().is_ok()
}

/// Query the running daemon for its status.
pub fn query_daemon_status() -> Result<DaemonStatus> {
let path = ipc_path();

#[cfg(unix)]
{
use std::os::unix::net::UnixStream;
let mut stream = UnixStream::connect(&path)
.with_context(|| format!("Could not connect to daemon at {path}"))?;
stream
.set_read_timeout(Some(std::time::Duration::from_secs(2)))
.ok();
let mut buf = String::new();
stream.read_to_string(&mut buf)?;
let status: DaemonStatus =
serde_json::from_str(&buf).context("Invalid status response from daemon")?;
Ok(status)
}

#[cfg(windows)]
{
use std::fs::OpenOptions;
let mut file = OpenOptions::new()
.read(true)
.write(true)
.open(&path)
.with_context(|| format!("Could not connect to daemon at {path}"))?;
// Write a newline to trigger the server to respond
file.write_all(b"\n")?;
file.flush()?;
// Read response — use a fixed buffer since read_to_string waits for EOF
let mut buf = vec![0u8; 4096];
let n = file.read(&mut buf)?;
let text = String::from_utf8_lossy(&buf[..n]);
let status: DaemonStatus =
serde_json::from_str(&text).context("Invalid status response from daemon")?;
Ok(status)
}
}

/// Start the IPC listener on the current thread (blocking).
/// Call this from a dedicated thread.
pub fn start_ipc_listener(shared_state: Arc<SharedState>) -> Result<()> {
let path = ipc_path();
info!("Starting IPC listener at {path}");

#[cfg(unix)]
{
unix_listener(&path, shared_state)
}

#[cfg(windows)]
{
windows_listener(&path, shared_state)
}
}

#[cfg(unix)]
fn unix_listener(path: &str, shared_state: Arc<SharedState>) -> Result<()> {
use std::os::unix::net::UnixListener;

// Clean up stale socket
if std::path::Path::new(path).exists() {
if is_daemon_running() {
anyhow::bail!("Another instance of Mouth is already running");
}
std::fs::remove_file(path).ok();
}

let listener = UnixListener::bind(path).context("Failed to bind IPC socket")?;
info!("IPC listener ready");

for stream in listener.incoming() {
match stream {
Ok(mut stream) => {
let status = build_status(&shared_state);
match serde_json::to_string(&status) {
Ok(json) => {
if let Err(e) = stream.write_all(json.as_bytes()) {
debug!("Failed to write IPC response: {e}");
}
}
Err(e) => {
warn!("Failed to serialize status: {e}");
}
}
}
Err(e) => {
debug!("IPC accept error: {e}");
}
}
}

Ok(())
}
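On Unix the status exchange needs no request: `unix_listener` writes one JSON document per accepted connection and drops the stream, and `query_daemon_status` reads until EOF (the Windows path instead sends a newline and reads into a fixed buffer, which this sketch does not cover). A self-contained, Unix-only sketch of that exchange using only `std`, where a literal string stands in for the serde-serialized `DaemonStatus`; `serve_once`, `query`, and `round_trip` are illustrative names, and the socket path is passed in rather than hard-coded to `/tmp/mouth.sock`.

```rust
use std::io::{Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;

/// Serve one status request: write the payload, then drop the stream.
/// Dropping closes the connection, which lets the client's read_to_string hit EOF.
pub fn serve_once(listener: &UnixListener, payload: &str) -> std::io::Result<()> {
    let (mut stream, _addr) = listener.accept()?;
    stream.write_all(payload.as_bytes())?;
    Ok(()) // `stream` is dropped here, closing the connection
}

/// Client side: connect, read until EOF (or a 2 s timeout), return the body.
pub fn query(path: &Path) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(path)?;
    stream.set_read_timeout(Some(std::time::Duration::from_secs(2)))?;
    let mut buf = String::new();
    stream.read_to_string(&mut buf)?;
    Ok(buf)
}

/// Run one server/client exchange over a socket at `path`.
pub fn round_trip(path: &Path, payload: &str) -> std::io::Result<String> {
    // Unlike unix_listener, no liveness probe here: just clear any leftover file.
    let _ = std::fs::remove_file(path);
    let listener = UnixListener::bind(path)?; // bound before the client connects
    let payload = payload.to_string();
    let server = thread::spawn(move || serve_once(&listener, &payload));
    let response = query(path)?;
    server.join().expect("server thread panicked")?;
    std::fs::remove_file(path)?;
    Ok(response)
}
```

A successful `query` means something is answering on the socket, which is why the same call doubles as the single-instance probe in `is_daemon_running`; a socket file that exists but refuses connections is what `unix_listener` treats as stale and removes.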
|
||||
|
||||
#[cfg(windows)]
|
||||
fn windows_listener(path: &str, shared_state: Arc<SharedState>) -> Result<()> {
|
||||
use windows_sys::Win32::Foundation::{CloseHandle, INVALID_HANDLE_VALUE};
|
||||
use windows_sys::Win32::Storage::FileSystem::{
|
||||
FlushFileBuffers, ReadFile, WriteFile, PIPE_ACCESS_DUPLEX,
|
||||
};
|
||||
use windows_sys::Win32::System::Pipes::{
|
||||
ConnectNamedPipe, CreateNamedPipeW, DisconnectNamedPipe,
|
||||
PIPE_READMODE_BYTE, PIPE_TYPE_BYTE, PIPE_UNLIMITED_INSTANCES, PIPE_WAIT,
|
||||
};
|
||||
|
||||
let wide_path: Vec<u16> = path.encode_utf16().chain(std::iter::once(0)).collect();
|
||||
|
||||
info!("IPC listener ready");
|
||||
|
||||
loop {
|
||||
let handle = unsafe {
|
||||
CreateNamedPipeW(
|
||||
wide_path.as_ptr(),
|
||||
PIPE_ACCESS_DUPLEX,
|
||||
+                PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT,
+                PIPE_UNLIMITED_INSTANCES,
+                4096,
+                4096,
+                0,
+                std::ptr::null(),
+            )
+        };
+
+        if handle == INVALID_HANDLE_VALUE {
+            tracing::error!("Failed to create named pipe");
+            std::thread::sleep(std::time::Duration::from_secs(1));
+            continue;
+        }
+
+        // Wait for a client to connect
+        let connected = unsafe { ConnectNamedPipe(handle, std::ptr::null_mut()) };
+        if connected == 0 {
+            let err = std::io::Error::last_os_error();
+            // ERROR_PIPE_CONNECTED (535) means client already connected — that's ok
+            if err.raw_os_error() != Some(535) {
+                debug!("ConnectNamedPipe error: {err}");
+                unsafe { CloseHandle(handle) };
+                continue;
+            }
+        }
+
+        // Read the trigger byte from the client (just 1 byte to unblock)
+        let mut read_buf = [0u8; 1];
+        let mut bytes_read: u32 = 0;
+        unsafe {
+            ReadFile(
+                handle,
+                read_buf.as_mut_ptr(),
+                1,
+                &mut bytes_read,
+                std::ptr::null_mut(),
+            );
+        }
+
+        // Write the status response
+        let status = build_status(&shared_state);
+        if let Ok(json) = serde_json::to_string(&status) {
+            let bytes = json.as_bytes();
+            let mut written: u32 = 0;
+            unsafe {
+                WriteFile(
+                    handle,
+                    bytes.as_ptr().cast(),
+                    bytes.len() as u32,
+                    &mut written,
+                    std::ptr::null_mut(),
+                );
+                FlushFileBuffers(handle);
+            }
+        }
+
+        unsafe {
+            DisconnectNamedPipe(handle);
+            CloseHandle(handle);
+        }
+    }
+}
+
+fn build_status(shared_state: &SharedState) -> DaemonStatus {
+    DaemonStatus {
+        version: env!("CARGO_PKG_VERSION").to_string(),
+        state: shared_state.get_state(),
+        model: shared_state.model.clone(),
+        accelerator: shared_state.accelerator.clone(),
+        uptime_secs: shared_state.uptime_secs(),
+    }
+}
+
+/// Clean up the IPC socket (Unix only).
+pub fn cleanup() {
+    #[cfg(unix)]
+    {
+        let path = ipc_path();
+        std::fs::remove_file(&path).ok();
+    }
+}
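The Windows listener above handles each connection as a single request. It waits for one trigger byte, writes the JSON from `build_status`, flushes, and disconnects. The Unix side is not part of this hunk, so what follows is only a std-only sketch of the same exchange over a Unix domain socket (Unix-only).

Several names in it are hypothetical stand-ins:

- `Status`, `status_json`, `serve_once` and `query` are illustrative helpers.
- The socket path in the temp directory is made up for the demo.

The daemon itself serializes `DaemonStatus` with `serde_json` and takes its path from `ipc_path()`.

```rust
use std::io::{Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;

/// Hypothetical stand-in for DaemonStatus (the real struct has more fields).
struct Status {
    version: &'static str,
    state: String,
    uptime_secs: u64,
}

/// Hand-rolled JSON in place of serde_json; no escaping, so only safe for plain values.
fn status_json(s: &Status) -> String {
    format!(
        r#"{{"version":"{}","state":"{}","uptime_secs":{}}}"#,
        s.version, s.state, s.uptime_secs
    )
}

/// Serve one request: read the 1-byte trigger, reply with JSON, then drop the
/// stream so the client sees EOF (the role DisconnectNamedPipe plays on Windows).
fn serve_once(listener: &UnixListener, status: &Status) -> std::io::Result<()> {
    let (mut stream, _addr) = listener.accept()?;
    let mut trigger = [0u8; 1];
    stream.read_exact(&mut trigger)?;
    stream.write_all(status_json(status).as_bytes())?;
    stream.flush()?;
    Ok(())
}

/// Client side: send the trigger byte and read the whole reply.
fn query(path: &Path) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(path)?;
    stream.write_all(&[1])?;
    let mut reply = String::new();
    stream.read_to_string(&mut reply)?;
    Ok(reply)
}

fn main() -> std::io::Result<()> {
    // Hypothetical demo path, not the daemon's ipc_path().
    let path = std::env::temp_dir().join(format!("mouth-demo-{}.sock", std::process::id()));
    let _ = std::fs::remove_file(&path); // stale socket from an earlier run, like ipc::cleanup()
    let listener = UnixListener::bind(&path)?;

    let server = thread::spawn(move || {
        let status = Status { version: "0.2.0", state: "idle".to_string(), uptime_secs: 42 };
        serve_once(&listener, &status)
    });

    let reply = query(&path)?;
    server.join().expect("server thread panicked")?;
    std::fs::remove_file(&path)?;
    println!("{reply}");
    Ok(())
}
```

The listener is bound before the server thread starts. Because of that, the client's `connect` just waits in the backlog until `accept` runs, and the example needs no extra synchronization.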
@@ -3,10 +3,12 @@ mod cli;
 mod config;
 mod coordinator;
 mod hotkey;
+mod ipc;
 mod model_cache;
 mod overlay;
 mod paste;
 mod recorder;
+mod shared_state;
 mod transcriber;
 mod vad;
@@ -82,6 +82,23 @@ pub fn ensure_model(model_name: &str) -> Result<ModelPaths> {
     })
 }

+/// Ensure the Silero VAD model is downloaded and return its path.
+pub fn ensure_vad_model() -> Result<PathBuf> {
+    let repo_id = "onnx-community/silero-vad";
+    let model_file = "onnx/model.onnx";
+
+    let api = Api::new().context("Failed to create HuggingFace Hub API")?;
+    let repo = api.model(repo_id.to_string());
+
+    info!("Ensuring Silero VAD model from {repo_id}");
+    let path = repo
+        .get(model_file)
+        .with_context(|| format!("Failed to download VAD model from {repo_id}"))?;
+    debug!("VAD model: {}", path.display());
+
+    Ok(path)
+}
+
 /// Check if model files are already cached.
 pub fn is_model_cached(model_name: &str) -> bool {
     ensure_model(model_name).is_ok()
+139 -2
@@ -8,8 +8,8 @@ use winit::window::{Window, WindowAttributes, WindowId, WindowLevel};

 use crate::config::OverlayPosition;

-const OVERLAY_WIDTH: u32 = 200;
-const OVERLAY_HEIGHT: u32 = 36;
+const OVERLAY_WIDTH: u32 = 150;
+const OVERLAY_HEIGHT: u32 = 18;

 /// State of the overlay display.
 #[derive(Debug, Clone, Copy, PartialEq)]
@@ -34,6 +34,8 @@ struct OverlayApp {
     surface: Option<softbuffer::Surface<std::rc::Rc<Window>, std::rc::Rc<Window>>>,
     state: OverlayState,
     position: OverlayPosition,
+    _tray_icon: Option<tray_icon::TrayIcon>,
+    tray_exit_id: Option<tray_icon::menu::MenuId>,
 }

 impl OverlayApp {
@@ -99,6 +101,43 @@ impl OverlayApp {
             window.set_visible(visible);
         }
     }
+
+    fn create_tray_icon(&mut self) {
+        use tray_icon::menu::{Menu, MenuItem};
+        use tray_icon::TrayIconBuilder;
+
+        let menu = Menu::new();
+        let exit_item = MenuItem::new("Exit", true, None);
+        let exit_id = exit_item.id().clone();
+        if let Err(e) = menu.append(&exit_item) {
+            warn!("Failed to add tray menu item: {e}");
+            return;
+        }
+
+        let icon = match load_tray_icon() {
+            Ok(i) => i,
+            Err(e) => {
+                warn!("Failed to load tray icon: {e}");
+                return;
+            }
+        };
+
+        match TrayIconBuilder::new()
+            .with_menu(Box::new(menu))
+            .with_tooltip("Mouth — Speech to Text")
+            .with_icon(icon)
+            .build()
+        {
+            Ok(tray) => {
+                info!("System tray icon created");
+                self._tray_icon = Some(tray);
+                self.tray_exit_id = Some(exit_id);
+            }
+            Err(e) => {
+                warn!("Failed to create tray icon: {e}");
+            }
+        }
+    }
 }

 impl ApplicationHandler<OverlayEvent> for OverlayApp {
@@ -154,6 +193,9 @@ impl ApplicationHandler<OverlayEvent> for OverlayApp {
                 error!("Failed to create overlay window: {e}");
             }
         }
+
+        // Create tray icon (must be done on the main/event-loop thread)
+        self.create_tray_icon();
     }

     fn user_event(&mut self, event_loop: &ActiveEventLoop, event: OverlayEvent) {
@@ -176,6 +218,99 @@ impl ApplicationHandler<OverlayEvent> for OverlayApp {
             self.draw();
         }
     }
+
+    fn about_to_wait(&mut self, event_loop: &ActiveEventLoop) {
+        // Poll tray menu events
+        if let Some(exit_id) = &self.tray_exit_id {
+            if let Ok(event) = tray_icon::menu::MenuEvent::receiver().try_recv() {
+                if event.id() == exit_id {
+                    info!("Exit requested via tray icon");
+                    crate::ipc::cleanup();
+                    event_loop.exit();
+                }
+            }
+        }
+    }
 }
+
+fn load_tray_icon() -> Result<tray_icon::Icon, Box<dyn std::error::Error>> {
+    const S: u32 = 32;
+    let mut pixels = vec![0u8; (S * S * 4) as usize];
+
+    let cx = S as f32 / 2.0;
+
+    for y in 0..S {
+        for x in 0..S {
+            let fx = x as f32 + 0.5;
+            let fy = y as f32 + 0.5;
+            let idx = ((y * S + x) * 4) as usize;
+
+            let mut alpha: f32 = 0.0;
+
+            // Microphone body: rounded rectangle (capsule shape)
+            // Center x=16, from y=3 to y=18, radius 5
+            let mic_top = 3.0;
+            let mic_bot = 18.0;
+            let mic_r = 5.5;
+            let mic_cx = cx;
+            {
+                let dy = fy.clamp(mic_top + mic_r, mic_bot - mic_r);
+                let dist = ((fx - mic_cx).powi(2) + (fy - dy).powi(2)).sqrt();
+                if dist <= mic_r {
+                    alpha = 1.0;
+                } else if dist <= mic_r + 1.0 {
+                    alpha = alpha.max(mic_r + 1.0 - dist); // anti-alias
+                }
+            }
+
+            // Cradle arc: U-shape below mic, from y=14 to y=22
+            {
+                let arc_cy = 14.0;
+                let arc_r = 8.5;
+                let arc_thickness = 2.2;
+                let dx = fx - cx;
+                let dy = fy - arc_cy;
+                let dist = (dx * dx + dy * dy).sqrt();
+                if fy >= arc_cy && dist >= arc_r - arc_thickness / 2.0 && dist <= arc_r + arc_thickness / 2.0 {
+                    let edge_outer = (arc_r + arc_thickness / 2.0 - dist).min(1.0).max(0.0);
+                    let edge_inner = (dist - (arc_r - arc_thickness / 2.0)).min(1.0).max(0.0);
+                    alpha = alpha.max(edge_outer.min(edge_inner));
+                }
+            }
+
+            // Stem: vertical line from arc bottom to near bottom
+            {
+                let stem_top = 22.0;
+                let stem_bot = 27.0;
+                let stem_w = 1.2;
+                if fy >= stem_top && fy <= stem_bot && (fx - cx).abs() <= stem_w {
+                    let edge = (stem_w - (fx - cx).abs()).min(1.0);
+                    alpha = alpha.max(edge);
+                }
+            }
+
+            // Base: horizontal line at bottom
+            {
+                let base_y = 27.0;
+                let base_h = 2.0;
+                let base_hw = 5.0;
+                if fy >= base_y && fy <= base_y + base_h && (fx - cx).abs() <= base_hw {
+                    let edge = (base_hw - (fx - cx).abs()).min(1.0);
+                    alpha = alpha.max(edge);
+                }
+            }
+
+            let a = (alpha.clamp(0.0, 1.0) * 255.0) as u8;
+            // White icon with alpha (looks good on both light and dark taskbars)
+            pixels[idx] = 255; // R
+            pixels[idx + 1] = 255; // G
+            pixels[idx + 2] = 255; // B
+            pixels[idx + 3] = a; // A
+        }
+    }
+
+    let icon = tray_icon::Icon::from_rgba(pixels, S, S)?;
+    Ok(icon)
+}

 /// Create an event loop and return the proxy for sending events.
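`load_tray_icon` draws the microphone analytically instead of shipping an image. For each pixel centre it computes a distance to a shape and converts that distance into a 0..1 coverage, with a one-pixel linear falloff at the edge. It then merges the shapes with `max`.

Below is a std-only sketch of just the capsule (the microphone body), reusing the diff's constants: a 32 px icon, `mic_top` 3.0, `mic_bot` 18.0 and `mic_r` 5.5. `capsule_coverage` and `render_capsule` are hypothetical helper names. The arc, stem and base would be folded in the same way with `max`.

```rust
/// Coverage in 0.0..=1.0 of the pixel centred at (px, py) by a vertical capsule:
/// every point within `r` of the segment x = cx, y in [top + r, bot - r].
/// Requires top + r <= bot - r, since f32::clamp panics if min > max.
fn capsule_coverage(px: f32, py: f32, cx: f32, top: f32, bot: f32, r: f32) -> f32 {
    // Nearest point on the capsule's core segment.
    let nearest_y = py.clamp(top + r, bot - r);
    let dist = ((px - cx).powi(2) + (py - nearest_y).powi(2)).sqrt();
    // Same as: dist <= r -> 1.0, dist <= r + 1 -> r + 1 - dist, else 0.0.
    (r + 1.0 - dist).clamp(0.0, 1.0)
}

/// Render only the microphone body as white RGBA, coverage in the alpha channel.
fn render_capsule(size: u32) -> Vec<u8> {
    let mut rgba = vec![0u8; (size * size * 4) as usize];
    let cx = size as f32 / 2.0;
    for y in 0..size {
        for x in 0..size {
            // Sample at the pixel centre, as load_tray_icon does.
            let cov = capsule_coverage(x as f32 + 0.5, y as f32 + 0.5, cx, 3.0, 18.0, 5.5);
            // Other shapes would be merged here with cov = cov.max(other_cov).
            let i = ((y * size + x) * 4) as usize;
            rgba[i..i + 4].copy_from_slice(&[255, 255, 255, (cov * 255.0) as u8]);
        }
    }
    rgba
}

fn main() {
    let rgba = render_capsule(32);
    // Print the alpha channel: '#' opaque, '+' partial, '.' transparent.
    for row in rgba.chunks(32 * 4) {
        let line: String = row
            .chunks(4)
            .map(|px| match px[3] {
                255 => '#',
                0 => '.',
                _ => '+',
            })
            .collect();
        println!("{line}");
    }
}
```

Clamping the sample's y into the core segment is what turns a circle distance into a capsule distance. Above and below the segment the shape is a round cap; alongside it the distance is purely horizontal.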
@@ -195,6 +330,8 @@ pub fn run_event_loop(
         surface: None,
         state: OverlayState::Hidden,
         position,
+        _tray_icon: None,
+        tray_exit_id: None,
     };

     event_loop.run_app(&mut app)
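`about_to_wait` takes at most one menu event per wake-up through `try_recv()`. With a single Exit item that is enough. If more items are added, however, an Exit click queued behind another event may only be noticed on a later wake-up.

Draining the channel avoids this. The sketch below uses `std::sync::mpsc` with hypothetical `MenuId`/`MenuEvent` stand-ins so it runs without `tray-icon`. The receiver from `tray_icon::menu::MenuEvent::receiver()` also has the non-blocking `try_recv()` the diff already calls, so the same `while let` loop should carry over. Treat that as an assumption to check against the crate version in use.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical stand-in for tray_icon::menu::MenuId.
#[derive(Debug, Clone, PartialEq)]
struct MenuId(String);

/// Hypothetical stand-in for tray_icon::menu::MenuEvent.
struct MenuEvent {
    id: MenuId,
}

/// Drain every queued menu event and report whether any was the Exit item.
fn exit_requested(rx: &Receiver<MenuEvent>, exit_id: &MenuId) -> bool {
    let mut exit = false;
    // try_recv never blocks: it returns Err as soon as the queue is empty.
    while let Ok(event) = rx.try_recv() {
        if event.id == *exit_id {
            exit = true;
        }
    }
    exit
}

fn main() {
    let (tx, rx) = channel();
    let exit_id = MenuId("exit".into());
    tx.send(MenuEvent { id: MenuId("settings".into()) }).unwrap();
    tx.send(MenuEvent { id: exit_id.clone() }).unwrap();
    // A single try_recv would only see "settings"; draining also finds "exit".
    println!("exit requested: {}", exit_requested(&rx, &exit_id));
}
```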
+9 -1
@@ -7,6 +7,9 @@ use std::sync::{Arc, Mutex};
 use tracing::{debug, error, info, warn};

 const TARGET_SAMPLE_RATE: u32 = 16000;
+/// Silence prepended to recordings to give the model a clean lead-in,
+/// compensating for mic startup latency.
+const LEAD_IN_MS: u32 = 300;

 /// Commands sent to the recorder.
 #[derive(Debug)]
@@ -252,8 +255,13 @@ pub fn run(

         debug!("Resampled to {} samples at {}Hz", samples.len(), TARGET_SAMPLE_RATE);

+        // Prepend silence to compensate for mic startup latency
+        let lead_in_samples = (TARGET_SAMPLE_RATE * LEAD_IN_MS / 1000) as usize;
+        let mut padded = vec![0.0f32; lead_in_samples];
+        padded.extend_from_slice(&samples);
+
         let audio = AudioData {
-            samples,
+            samples: padded,
             sample_rate: TARGET_SAMPLE_RATE,
         };
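The lead-in padding is plain arithmetic on the 16 kHz stream. `TARGET_SAMPLE_RATE * LEAD_IN_MS / 1000` = 16000 × 300 / 1000 = 4800 zero samples, which puts 0.3 s of silence in front of the speech.

Below is the same step written as a standalone function; `prepend_silence` is a hypothetical name. It widens to `u64` and pre-sizes the vector. Neither change is needed at the diff's constants, because 16000 × 300 = 4,800,000 fits comfortably in `u32`.

```rust
const TARGET_SAMPLE_RATE: u32 = 16000;
const LEAD_IN_MS: u32 = 300;

/// Return `samples` with `lead_in_ms` of digital silence in front.
fn prepend_silence(samples: &[f32], sample_rate: u32, lead_in_ms: u32) -> Vec<f32> {
    // Multiply before dividing to keep precision; u64 rules out overflow.
    let lead_in = (sample_rate as u64 * lead_in_ms as u64 / 1000) as usize;
    let mut padded = Vec::with_capacity(lead_in + samples.len());
    padded.resize(lead_in, 0.0); // the silence
    padded.extend_from_slice(samples); // the recording, unchanged
    padded
}

fn main() {
    let speech = vec![0.25f32; 16000]; // one second of placeholder audio
    let padded = prepend_silence(&speech, TARGET_SAMPLE_RATE, LEAD_IN_MS);
    let lead_in = padded.len() - speech.len();
    // Prints "4800 silent samples = 300 ms"
    println!("{lead_in} silent samples = {} ms", lead_in as u32 * 1000 / TARGET_SAMPLE_RATE);
}
```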
@@ -0,0 +1,35 @@
+use std::sync::RwLock;
+use std::time::Instant;
+
+/// Thread-safe shared state accessible by the coordinator, IPC listener, and tray icon.
+pub struct SharedState {
+    pub state: RwLock<String>,
+    pub model: String,
+    pub accelerator: String,
+    pub started_at: Instant,
+}
+
+impl SharedState {
+    pub fn new(model: String, accelerator: String) -> Self {
+        Self {
+            state: RwLock::new("idle".to_string()),
+            model,
+            accelerator,
+            started_at: Instant::now(),
+        }
+    }
+
+    pub fn set_state(&self, state: &str) {
+        if let Ok(mut s) = self.state.write() {
+            *s = state.to_string();
+        }
+    }
+
+    pub fn get_state(&self) -> String {
+        self.state.read().map(|s| s.clone()).unwrap_or_else(|_| "unknown".to_string())
+    }
+
+    pub fn uptime_secs(&self) -> u64 {
+        self.started_at.elapsed().as_secs()
+    }
+}
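`SharedState` keeps the status string behind an `RwLock` and never panics on a poisoned lock. If a thread panics while holding the write guard, `set_state` becomes a silent no-op. From then on `get_state` returns `"unknown"`, which is what `build_status` would put in the status JSON.

The std-only sketch below shows that behaviour using a trimmed copy of the struct that keeps only `state`. The thread that panics on purpose is illustrative; the daemon does nothing like it.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Trimmed copy of SharedState: model, accelerator and started_at omitted.
pub struct SharedState {
    pub state: RwLock<String>,
}

impl SharedState {
    pub fn new() -> Self {
        Self { state: RwLock::new("idle".to_string()) }
    }

    /// Same body as in the diff: a poisoned lock makes this a no-op.
    pub fn set_state(&self, state: &str) {
        if let Ok(mut s) = self.state.write() {
            *s = state.to_string();
        }
    }

    /// Same body as in the diff: a poisoned lock reads as "unknown".
    pub fn get_state(&self) -> String {
        self.state.read().map(|s| s.clone()).unwrap_or_else(|_| "unknown".to_string())
    }
}

fn main() {
    let shared = Arc::new(SharedState::new());

    // One thread (e.g. the coordinator) updates the state...
    let writer = Arc::clone(&shared);
    thread::spawn(move || writer.set_state("recording")).join().unwrap();
    // ...and another (e.g. the IPC listener) reads it.
    println!("state: {}", shared.get_state()); // state: recording

    // A panic while the write guard is held poisons the lock.
    let crasher = Arc::clone(&shared);
    let result = thread::spawn(move || {
        let _guard = crasher.state.write().unwrap();
        panic!("simulated panic while holding the lock");
    })
    .join();
    assert!(result.is_err());

    shared.set_state("idle"); // silently ignored now
    println!("state: {}", shared.get_state()); // state: unknown
}
```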
+6 -6
@@ -22,7 +22,7 @@ pub struct Transcriber {
     encoder: Session,
     decoder: Session,
     vocab: Vec<String>,
-    blank_id: i64,
+    blank_id: i32,
     vocab_size: usize,
 }
@@ -45,7 +45,7 @@ impl Transcriber {

         let vocab = load_vocab(&paths.vocab)?;
         let vocab_size = vocab.len();
-        let blank_id = (vocab_size - 1) as i64; // <blk> is the last token
+        let blank_id = (vocab_size - 1) as i32; // <blk> is the last token
         info!("Vocab loaded: {vocab_size} tokens, blank_id={blank_id}");

         Ok(Self {
@@ -121,7 +121,7 @@ impl Transcriber {
         Ok((enc_data.to_vec(), feat_dim, encoded_length))
     }

-    fn tdt_greedy_decode(&mut self, encoder_output: &[f32], feat_dim: usize, encoded_length: usize) -> Result<Vec<i64>> {
+    fn tdt_greedy_decode(&mut self, encoder_output: &[f32], feat_dim: usize, encoded_length: usize) -> Result<Vec<i32>> {
         // Determine decoder LSTM state dimensions by inspecting input metadata
         // Default fallback values
         let mut state_shape: [usize; 3] = [1, 1, 640];
@@ -168,7 +168,7 @@ impl Transcriber {
             let frame = Array3::from_shape_vec([1, feat_dim, 1], frame_data)?;

             let targets = ndarray::Array2::from_shape_vec((1, 1), vec![prev_token])?;
-            let target_length = ndarray::Array1::from_vec(vec![1i64]);
+            let target_length = ndarray::Array1::from_vec(vec![1i32]);

             let outputs = self.decoder.run(vec![
                 make_input("encoder_outputs", Value::from_array(frame)?.into_dyn()),
@@ -186,7 +186,7 @@ impl Transcriber {
             let token_logits = &output_data[..self.vocab_size];
             let duration_logits = &output_data[self.vocab_size..];

-            let token_id = argmax(token_logits) as i64;
+            let token_id = argmax(token_logits) as i32;
             let duration = if !duration_logits.is_empty() {
                 argmax(duration_logits)
             } else {
@@ -225,7 +225,7 @@ impl Transcriber {
         Ok(tokens)
     }

-    fn tokens_to_text(&self, tokens: &[i64]) -> String {
+    fn tokens_to_text(&self, tokens: &[i32]) -> String {
         let mut text = String::new();
         for &token_id in tokens {
             if token_id >= 0 && (token_id as usize) < self.vocab.len() {
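The decoder hunks above split one joint output into `vocab_size` token logits followed by duration logits, then take an argmax of each. The loop around them is not part of this diff. What follows is therefore only a simplified sketch of token-and-duration (TDT) greedy decoding, under these assumptions:

- Duration bin `i` means "advance `i` encoder frames". This matches the shown code, which takes `argmax(duration_logits)` as `duration`.
- A blank predicted with duration 0 is forced to advance one frame.
- At most `max_symbols_per_frame` tokens are emitted per frame.

`joint` is a hypothetical closure standing in for the ONNX decoder call. Tokens are `i32` to match the new `blank_id` type.

```rust
/// Index of the largest value (the first one on ties).
fn argmax(xs: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in xs.iter().enumerate() {
        if x > xs[best] {
            best = i;
        }
    }
    best
}

/// Simplified TDT greedy decoding over `num_frames` encoder frames.
/// `joint(t, prev_token)` returns `vocab_size` token logits followed by duration logits.
fn tdt_greedy<F>(
    num_frames: usize,
    vocab_size: usize,
    blank_id: i32,
    max_symbols_per_frame: usize,
    mut joint: F,
) -> Vec<i32>
where
    F: FnMut(usize, i32) -> Vec<f32>,
{
    let mut tokens = Vec::new();
    let mut prev_token = blank_id; // assumption: decoding starts from the blank token
    let mut t = 0;
    let mut emitted_this_frame = 0;

    while t < num_frames {
        let out = joint(t, prev_token);
        let (token_logits, duration_logits) = out.split_at(vocab_size);
        let token_id = argmax(token_logits) as i32;
        // Assumption: bin i means "advance i frames"; no duration bins means advance 1.
        let mut duration = if duration_logits.is_empty() { 1 } else { argmax(duration_logits) };

        if token_id != blank_id {
            tokens.push(token_id);
            prev_token = token_id; // only real tokens feed back into the decoder
            emitted_this_frame += 1;
        }
        // Guarantee progress: a blank, or a frame that hit the symbol cap, must advance.
        if duration == 0 && (token_id == blank_id || emitted_this_frame >= max_symbols_per_frame) {
            duration = 1;
        }
        if duration > 0 {
            emitted_this_frame = 0;
        }
        t += duration;
    }
    tokens
}

fn main() {
    // Toy setup: tokens 0 and 1, blank_id = 2, duration bins 0..=2.
    let joint = |t: usize, prev: i32| -> Vec<f32> {
        match (t, prev) {
            (0, 2) => vec![9.0, 0.0, 0.0, 5.0, 0.0, 0.0], // token 0, stay on frame 0
            (0, _) => vec![0.0, 0.0, 9.0, 0.0, 5.0, 0.0], // blank, advance 1
            _ => vec![0.0, 9.0, 0.0, 0.0, 0.0, 5.0],      // token 1, advance 2
        }
    };
    println!("{:?}", tdt_greedy(3, 3, 2, 10, joint)); // [0, 1]
}
```

The symbol cap is a safety net. Without it, a joint that keeps predicting a non-blank token with duration 0 would never leave its frame.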