v0.2.0: System tray, IPC status, VAD, hotkey grab, and polish
- Add system tray icon with Exit menu (tray-icon/muda)
- Add IPC daemon status via named pipe (Windows) / Unix socket (Linux)
- Add `mouth status` command to query running daemon
- Add daemon lock to prevent multiple instances
- Hide Windows console window when running as daemon
- Wire up Silero VAD model download and speech filtering
- Switch hotkey listener from rdev::listen to rdev::grab to consume hotkeys
- Add hotkey capture mode in interactive config (press keys instead of typing)
- Add all missing key names (brackets, punctuation, numpad, etc.)
- Fix ONNX tensor type mismatches (encoder wants i64, decoder wants i32)
- Add 300ms lead-in silence to compensate for mic startup latency
- Add 300ms trailing recording after stop for speech not to be clipped
- Add 50ms silence before audio feedback blips for device warmup
- Reduce overlay size (150x18, was 200x36)
- Add PolyForm Noncommercial 1.0.0 license
- Flesh out user-focused README
- Update release script with Gitea/GitHub forge support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generated file: +706 -56 (diff suppressed because it is too large)
+12
-2
@@ -1,8 +1,9 @@
 [package]
 name = "mouth"
-version = "0.1.0"
+version = "0.2.0"
 edition = "2024"
 description = "Offline speech-to-text with global hotkey and paste"
+license-file = "LICENSE"
 
 [dependencies]
 # CLI
@@ -24,7 +25,7 @@ tracing-subscriber = { version = "0.3", features = ["env-filter"] }
 tokio = { version = "1", features = ["full"] }
 
 # Global hotkey
-rdev = "0.5"
+rdev = { version = "0.5", features = ["unstable_grab"] }
 
 # Audio capture
 cpal = "0.15"
@@ -56,6 +57,15 @@ rodio = "0.20"
 # System info
 num_cpus = "1"
 
+# System tray
+tray-icon = "0.19"
+
+# IPC status
+serde_json = "1"
+
 # Error handling
 anyhow = "1"
 thiserror = "2"
+
+[target.'cfg(windows)'.dependencies]
+windows-sys = { version = "0.59", features = ["Win32_System_Console", "Win32_UI_WindowsAndMessaging", "Win32_System_Pipes", "Win32_System_IO", "Win32_Storage_FileSystem", "Win32_Foundation", "Win32_Security"] }
@@ -0,0 +1,131 @@
+# PolyForm Noncommercial License 1.0.0
+
+<https://polyformproject.org/licenses/noncommercial/1.0.0>
+
+## Acceptance
+
+In order to get any license under these terms, you must agree
+to them as both strict obligations and conditions to all
+your licenses.
+
+## Copyright License
+
+The licensor grants you a copyright license for the
+software to do everything you might do with the software
+that would otherwise infringe the licensor's copyright
+in it for any permitted purpose. However, you may
+only distribute the software according to [Distribution
+License](#distribution-license) and make changes or new works
+based on the software according to [Changes and New Works
+License](#changes-and-new-works-license).
+
+## Distribution License
+
+The licensor grants you an additional copyright license
+to distribute copies of the software. Your license
+to distribute covers distributing the software with
+changes and new works permitted by [Changes and New Works
+License](#changes-and-new-works-license).
+
+## Notices
+
+You must ensure that anyone who gets a copy of any part of
+the software from you also gets a copy of these terms or the
+URL for them above, as well as copies of any plain-text lines
+beginning with `Required Notice:` that the licensor provided
+with the software. For example:
+
+> Required Notice: Copyright Yoyodyne, Inc. (http://example.com)
+
+## Changes and New Works License
+
+The licensor grants you an additional copyright license to
+make changes and new works based on the software for any
+permitted purpose.
+
+## Patent License
+
+The licensor grants you a patent license for the software that
+covers patent claims the licensor can license, or becomes able
+to license, that you would infringe by using the software.
+
+## Noncommercial Purposes
+
+Any noncommercial purpose is a permitted purpose.
+
+## Personal Uses
+
+Personal use for research, experiment, and testing for
+the benefit of public knowledge, personal study, private
+entertainment, hobby projects, amateur pursuits, or religious
+observance, without any anticipated commercial application,
+is use for a permitted purpose.
+
+## Noncommercial Organizations
+
+Use by any charitable organization, educational institution,
+public research organization, public safety or health
+organization, environmental protection organization,
+or government institution is use for a permitted purpose
+regardless of the source of funding or obligations resulting
+from the funding.
+
+## Fair Use
+
+You may have "fair use" rights for the software under the
+law. These terms do not limit them.
+
+## No Other Rights
+
+These terms do not allow you to sublicense or transfer any of
+your licenses to anyone else, or prevent the licensor from
+granting licenses to anyone else. These terms do not imply
+any other licenses.
+
+## Patent Defense
+
+If you make any written claim that the software infringes or
+contributes to infringement of any patent, your patent license
+for the software granted under these terms ends immediately. If
+your company makes such a claim, your patent license ends
+immediately for work on behalf of your company.
+
+## Violations
+
+The first time you are notified in writing that you have
+violated any of these terms, or done anything with the software
+not covered by your licenses, your licenses can nonetheless
+continue if you come into full compliance with these terms,
+and take practical steps to correct past violations, within
+32 days of receiving notice. Otherwise, all your licenses
+end immediately.
+
+## No Liability
+
+***As far as the law allows, the software comes as is, without
+any warranty or condition, and the licensor will not be liable
+to you for any damages arising out of these terms or the use
+or nature of the software, under any kind of legal claim.***
+
+## Definitions
+
+The **licensor** is the individual or entity offering these
+terms, and the **software** is the software the licensor makes
+available under these terms.
+
+**You** refers to the individual or entity agreeing to these
+terms.
+
+**Your company** is any legal entity, sole proprietorship,
+or other kind of organization that you work for, plus all
+organizations that have control over, are under the control of,
+or are under common control with that organization. **Control**
+means ownership of substantially all the assets of an entity,
+or the power to direct its management and policies by vote,
+contract, or otherwise. Control can be direct or indirect.
+
+**Your licenses** are all the licenses granted to you for the
+software under these terms.
+
+**Use** means anything you do with the software requiring one
+of your licenses.
@@ -1,3 +1,114 @@
 # Mouth
 
-`Mouth` is a utility that sits in the background waiting for you to hit a global hot key - when you press, it listens to you, quickly translates your voice in to text using a local LLM model and pastes it in to your application where the cursor currently sits. No internet required!
+Offline speech-to-text with a global hotkey. Press a key, speak, and transcribed text is pasted at your cursor. No cloud services, no API keys — everything runs locally.
+
+Uses [Parakeet TDT 0.6B v3](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx) for transcription and [Silero VAD](https://huggingface.co/onnx-community/silero-vad) for voice activity detection, both via ONNX Runtime.
+
+## Quick Start
+
+1. Download `mouth.exe` (Windows) or build from source
+2. Run `mouth` — models download automatically on first launch (~800MB one-time)
+3. Press your hotkey (default: `Ctrl+Space`), speak, release — text appears at your cursor
+
+Mouth runs in the background with a system tray icon. Right-click the tray icon to exit.
+
+## Usage
+
+| Command                 | Description               |
+| ----------------------- | ------------------------- |
+| mouth                   | Run the daemon (default)  |
+| mouth config            | Interactive configuration |
+| mouth config --show     | Print current config      |
+| mouth config --reset    | Reset to defaults         |
+| mouth models            | List available models     |
+| mouth models --download | Download configured model |
+| mouth status            | Show daemon status        |
+
+
+## Configuration
+
+Config file location:
+- **Windows:** `%APPDATA%\mouth\config.yaml`
+- **Linux/macOS:** `~/.config/mouth/config.yaml`
+
+Run `mouth config` for an interactive setup, or edit the YAML directly:
+
+```yaml
+hotkey: "ctrl+space"
+mode: push_to_talk # push_to_talk or toggle
+cancel_key: "escape"
+model: "parakeet-tdt-0.6b-v3"
+accelerator: auto # auto, cpu, cuda, directml
+gpu_device: 0
+paste_method: ctrl_v # ctrl_v, shift_insert, ctrl_shift_v, clipboard_only
+copy_to_clipboard: true
+overlay_position: top # top, bottom, none
+audio_feedback: true
+input_device: null # null = system default
+vad_enabled: true
+language: en
+```
+
+### Recording Modes
+
+- **push_to_talk** — Hold the hotkey while speaking, release to transcribe
+- **toggle** — Press once to start recording, press again to stop and transcribe
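The two recording modes can be sketched as a small decision function. This is a minimal illustration, not the project's implementation: the `Mode`, `KeyEvent`, `Action` and `on_hotkey` names are hypothetical, and the choice to ignore repeated press events while the key is held is an assumption.

```rust
/// Recording mode, mirroring the `mode` values in the config (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    PushToTalk,
    Toggle,
}

/// A hotkey transition as seen by the listener.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyEvent {
    Pressed,
    Released,
}

/// What the daemon should do in response to a hotkey event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    StartRecording,
    StopAndTranscribe,
    Ignore,
}

/// Map a hotkey event to an action, given the mode and whether a recording is in progress.
pub fn on_hotkey(mode: Mode, recording: bool, event: KeyEvent) -> Action {
    match (mode, event, recording) {
        // push_to_talk: press starts, release stops.
        (Mode::PushToTalk, KeyEvent::Pressed, false) => Action::StartRecording,
        (Mode::PushToTalk, KeyEvent::Released, true) => Action::StopAndTranscribe,
        // toggle: each press flips the state; releases carry no meaning.
        (Mode::Toggle, KeyEvent::Pressed, false) => Action::StartRecording,
        (Mode::Toggle, KeyEvent::Pressed, true) => Action::StopAndTranscribe,
        // Everything else (e.g. key auto-repeat while held) is ignored.
        _ => Action::Ignore,
    }
}
```

Keeping the decision in a pure function like this makes the mode logic easy to unit-test without a real keyboard hook.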
+
+### Hotkey Format
+
+Hotkeys are written as modifier+key combinations:
+
+- Modifiers: `ctrl`, `alt`, `shift`, `meta` (Win key)
+- Keys: letters (`a`-`z`), numbers (`0`-`9`), function keys (`f1`-`f12`), punctuation (`[`, `]`, `;`, etc.), and special keys (`space`, `enter`, `escape`, `tab`, etc.)
+
+Examples: `ctrl+space`, `alt+r`, `ctrl+shift+[`, `f9`
+
+When running `mouth config`, you can press the key combination directly instead of typing it.
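A rough sketch of how a hotkey string in this format could be parsed. The `Hotkey` struct and `parse_hotkey` function are hypothetical names, key names are not validated against the real key table, and a binding for the `+` key itself cannot be expressed in this simplified version.

```rust
/// Parsed hotkey: a set of modifiers plus exactly one non-modifier key (illustrative type).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Hotkey {
    pub ctrl: bool,
    pub alt: bool,
    pub shift: bool,
    pub meta: bool,
    pub key: String,
}

/// Parse strings such as "ctrl+space" or "ctrl+shift+[" (case-insensitive).
pub fn parse_hotkey(spec: &str) -> Result<Hotkey, String> {
    let mut hotkey = Hotkey::default();
    for part in spec.split('+') {
        let token = part.trim().to_ascii_lowercase();
        match token.as_str() {
            "" => return Err(format!("empty token in hotkey '{spec}'")),
            "ctrl" => hotkey.ctrl = true,
            "alt" => hotkey.alt = true,
            "shift" => hotkey.shift = true,
            "meta" => hotkey.meta = true,
            // The first non-modifier token becomes the key.
            other if hotkey.key.is_empty() => hotkey.key = other.to_string(),
            // A second non-modifier token is an error.
            _ => return Err(format!("more than one non-modifier key in '{spec}'")),
        }
    }
    if hotkey.key.is_empty() {
        return Err(format!("hotkey '{spec}' has no non-modifier key"));
    }
    Ok(hotkey)
}
```

For example, `parse_hotkey("ctrl+shift+[")` yields a `Hotkey` with `ctrl` and `shift` set and `key == "["`, while `parse_hotkey("ctrl+")` is rejected.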
+
+### Paste Methods
+
+- **ctrl_v** — Simulates Ctrl+V (works in most apps)
+- **shift_insert** — Simulates Shift+Insert (useful for terminals)
+- **ctrl_shift_v** — Simulates Ctrl+Shift+V (plain text paste)
+- **clipboard_only** — Copies to clipboard without pasting
+
+## Overlay
+
+A small colour-coded bar appears at the top (or bottom) of your screen:
+
+- **Red** — Recording
+- **Amber** — Transcribing
+- **Green** — Done
+
+Set `overlay_position: none` to disable.
+
+## Building from Source
+
+Requires Rust 1.85+ (needed for the 2024 edition declared in `Cargo.toml`).
+
+```bash
+# Linux dependencies (Ubuntu/Debian)
+sudo apt-get install libssl-dev libasound2-dev libpulse-dev \
+  libx11-dev libxcb-shape0-dev libxcb-xfixes0-dev libxkbcommon-dev \
+  libwayland-dev libgtk-3-dev libxtst-dev libxdo-dev cmake
+
+# Build
+cargo build --release
+
+# Cross-compile for Windows from Linux (requires cargo-xwin)
+cargo xwin build --release --target x86_64-pc-windows-msvc
+```
+
+## How It Works
+
+1. A global hotkey listener intercepts your configured key combination (consuming it so it doesn't reach other apps)
+2. Audio is captured from your microphone and resampled to 16kHz
+3. Silero VAD trims silence from the recording
+4. The Parakeet TDT model transcribes speech to text via ONNX Runtime
+5. Text is placed on the clipboard and pasted at your cursor
+
+All processing happens locally. No data leaves your machine.
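As one concrete piece of the audio step, the commit message mentions adding 300ms of lead-in silence to compensate for mic startup latency. A minimal sketch of that padding at the 16kHz rate named above follows; the function names are hypothetical and the real code may pad at a different stage of the pipeline.

```rust
/// Sample rate the audio is resampled to before transcription.
pub const SAMPLE_RATE_HZ: usize = 16_000;

/// Number of mono samples covering `ms` milliseconds at 16kHz.
pub fn samples_for_ms(ms: usize) -> usize {
    SAMPLE_RATE_HZ * ms / 1000
}

/// Return a new buffer with `lead_in_ms` of digital silence ahead of the speech.
pub fn with_lead_in_silence(speech: &[f32], lead_in_ms: usize) -> Vec<f32> {
    let pad = samples_for_ms(lead_in_ms);
    let mut out = Vec::with_capacity(pad + speech.len());
    out.resize(pad, 0.0); // zeros are silence for f32 PCM
    out.extend_from_slice(speech);
    out
}
```

At 16kHz, 300ms corresponds to 4800 samples, so the padded buffer is 4800 samples longer than the input.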
+
+## License
+
+[PolyForm Noncommercial 1.0.0](LICENSE) — free for personal and non-commercial use. For commercial licensing, contact the author.
@@ -1,287 +0,0 @@
-# Mouth — Implementation Plan
-
-## Overview
-
-Mouth is a single-binary, offline speech-to-text tool for Windows (with Linux/macOS support where possible). Press a hotkey, speak, and transcribed text is pasted at your cursor. Configured entirely via YAML.
-
-## Architecture
-
-```
-┌─────────────┐     ┌───────────┐     ┌─────────────┐     ┌────────────┐
-│   Hotkey    │────▶│  Recorder │────▶│ Transcriber │────▶│   Paste    │
-│  Listener   │     │  (cpal)   │     │ (ort/ONNX)  │     │  (enigo)   │
-│   (rdev)    │     │           │     │             │     │            │
-└─────────────┘     └───────────┘     └─────────────┘     └────────────┘
-       │                  │                  │                  │
-       │                  ▼                  │                  │
-       │            ┌───────────┐            │                  │
-       │            │    VAD    │            │                  │
-       │            │ (silero)  │            │                  │
-       │            └───────────┘            │                  │
-       │                                     │                  │
-       ▼                                     ▼                  ▼
-┌──────────────────────────────────────────────────────────────────────┐
-│                           Overlay (winit)                            │
-│            State: idle → recording → transcribing → done             │
-└──────────────────────────────────────────────────────────────────────┘
-```
-
-### Component Communication
-
-All components communicate via channels (`std::sync::mpsc` or `tokio::sync`). The main thread owns the overlay window (required by most windowing systems). A coordinator task receives events from hotkey/recorder/transcriber and drives state transitions.
-
-```
-HotkeyEvent(Pressed/Released) ──┐
-AudioReady(Vec<f32>) ───────────┼──▶ Coordinator ──▶ OverlayState
-TranscriptionDone(String) ──────┘                ──▶ PasteAction
-CancelRequested ────────────────┘
-```
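The channel layout described in the plan can be sketched with `std::sync::mpsc`. The `Event` variants follow the names in the diagram; the coordinator body, the log strings and the `demo` function are illustrative assumptions rather than the project's code.

```rust
use std::sync::mpsc;
use std::thread;

/// Events flowing into the coordinator (variant names follow the plan's diagram).
#[derive(Debug)]
pub enum Event {
    HotkeyPressed,
    HotkeyReleased,
    AudioReady(Vec<f32>),
    TranscriptionDone(String),
    CancelRequested,
}

/// Drain events until every sender is dropped, recording what would be done for each.
pub fn run_coordinator(rx: mpsc::Receiver<Event>) -> Vec<String> {
    let mut log = Vec::new();
    for event in rx {
        log.push(match event {
            Event::HotkeyPressed => "overlay: recording".to_string(),
            Event::HotkeyReleased => "overlay: transcribing".to_string(),
            Event::AudioReady(samples) => format!("transcribe {} samples", samples.len()),
            Event::TranscriptionDone(text) => format!("paste: {text}"),
            Event::CancelRequested => "overlay: idle".to_string(),
        });
    }
    log
}

/// Wire one producer thread and the coordinator together and return the coordinator's log.
pub fn demo() -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let coordinator = thread::spawn(move || run_coordinator(rx));

    // Stand-in for the hotkey listener thread.
    let hotkey_tx = tx.clone();
    let hotkey = thread::spawn(move || {
        hotkey_tx.send(Event::HotkeyPressed).unwrap();
        hotkey_tx.send(Event::HotkeyReleased).unwrap();
    });
    hotkey.join().unwrap();

    // Stand-ins for the recorder and transcriber.
    tx.send(Event::AudioReady(vec![0.0; 16_000])).unwrap();
    tx.send(Event::TranscriptionDone("hello".to_string())).unwrap();
    drop(tx); // closing the last sender ends the coordinator loop

    coordinator.join().unwrap()
}
```

A single `mpsc` receiver with cloned senders gives the "many producers, one coordinator" shape without any shared mutable state.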
-
-## Crate Dependencies
-
-| Crate | Purpose | Notes |
-|-------|---------|-------|
-| `rdev` | Global hotkey capture | Cross-platform key events, no focus required |
-| `cpal` | Audio capture | Cross-platform mic input |
-| `rubato` | Audio resampling | Resample to 16kHz for Parakeet |
-| `ort` | ONNX Runtime | Run Parakeet v3 + Silero VAD |
-| `hf-hub` | Model download | Download from HuggingFace, standard cache dir |
-| `enigo` | Keyboard simulation | Simulate Ctrl+V, Shift+Insert, etc. |
-| `arboard` | Clipboard access | Read/write clipboard, save/restore |
-| `winit` | Windowing | Minimal overlay window |
-| `softbuffer` | Pixel rendering | Draw coloured overlay (no GPU needed for overlay) |
-| `serde` + `serde_yaml` | Config | Deserialize YAML config |
-| `clap` | CLI | Subcommands: `run`, `config`, `models` |
-| `dialoguer` | Interactive TUI | `mouth config` interactive setup |
-| `rodio` | Audio playback | Blip up/down sounds |
-| `indicatif` | Progress bars | Model download progress |
-| `dirs` | Platform dirs | Config/cache paths |
-| `tracing` | Logging | Structured logging |
-
-## Config File
-
-Location: `~/.config/mouth/config.yaml` (Linux/macOS), `%APPDATA%\mouth\config.yaml` (Windows)
-
-```yaml
-# Hotkey to activate recording
-hotkey: "ctrl+space"
-
-# Recording mode: push_to_talk or toggle
-mode: push_to_talk
-
-# Cancel hotkey (only active while recording)
-cancel_key: "escape"
-
-# Speech-to-text model
-model: "parakeet-tdt-0.6b-v3"
-
-# Inference accelerator: auto, cpu, cuda, directml
-accelerator: auto
-
-# GPU device index (only used when accelerator is cuda/directml)
-gpu_device: 0
-
-# How to paste text
-paste_method: ctrl_v # ctrl_v | shift_insert | ctrl_shift_v | clipboard_only
-
-# Also keep transcribed text on clipboard after pasting
-copy_to_clipboard: true
-
-# Overlay position on screen
-overlay_position: top # top | bottom | none
-
-# Audio feedback
-audio_feedback: true
-
-# Audio input device (null = system default)
-input_device: null
-
-# VAD: trim silence from audio before transcription
-vad_enabled: true
-
-# Language (for model hint, if supported)
-language: en
-```
-
-## CLI Interface
-
-```
-mouth run                  # Start the daemon (default if no subcommand)
-mouth config               # Interactive TUI to edit config
-mouth config --show        # Print current config to stdout
-mouth config --reset       # Reset config to defaults
-mouth models               # List available/downloaded models
-mouth models download      # Download configured model (if not cached)
-mouth status               # Show daemon status, loaded model, app version
-```
-
-## Implementation Phases
-
-### Phase 1: Project Skeleton + Config
-
-- Cargo.toml with all dependencies
-- Config struct with serde, defaults, load/save
-- CLI with clap (run, config, models subcommands)
-- `mouth config` interactive TUI with dialoguer
-- Platform-aware config/cache directory resolution
-
-### Phase 2: Hotkey Listener
-
-- Global hotkey capture using rdev
-- Support configurable key combinations (parse from string like "ctrl+space")
-- Push-to-talk mode: record on press, stop on release
-- Toggle mode: start on first press, stop on second press
-- Cancel on Escape while recording
-- Debounce rapid key events (~30ms)
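The ~30ms debounce listed in Phase 2 could look roughly like the following. This is a sketch under assumptions: the `Debouncer` type is hypothetical, and it takes the timestamp as a parameter so the behaviour can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Drops events that arrive within `window` of the last accepted event.
pub struct Debouncer {
    window: Duration,
    last_accepted: Option<Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_accepted: None }
    }

    /// Returns true if an event observed at `now` should be processed.
    pub fn accept(&mut self, now: Instant) -> bool {
        match self.last_accepted {
            // Too close to the previous accepted event: drop it.
            Some(prev) if now.saturating_duration_since(prev) < self.window => false,
            // First event, or far enough from the last accepted one.
            _ => {
                self.last_accepted = Some(now);
                true
            }
        }
    }
}
```

In a listener callback this would be called as `debouncer.accept(Instant::now())` before forwarding the key event.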
-
-### Phase 3: Audio Capture + VAD
-
-- Open mic input via cpal (default device or configured)
-- Convert to f32 mono
-- Resample to 16kHz via rubato
-- Buffer audio chunks during recording
-- Run Silero VAD to trim leading/trailing silence
-- Produce final `Vec<f32>` of clean speech at 16kHz
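The "convert to f32 mono" step can be illustrated with two small helpers. These are sketches with hypothetical names; they assume interleaved frames (the layout audio capture callbacks commonly deliver) and do not cover resampling, which the plan assigns to rubato.

```rust
/// Convert signed 16-bit PCM samples to f32 in the range [-1.0, 1.0).
pub fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Average interleaved frames down to a single channel.
/// A trailing partial frame (fewer than `channels` samples) is dropped.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

For stereo input `[L0, R0, L1, R1, ...]`, `downmix_to_mono(&buf, 2)` returns one averaged sample per frame.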
-
-### Phase 4: Model Management
-
-- Use hf-hub to download Parakeet v3 ONNX model from HuggingFace
-- Store in standard HF cache (`~/.cache/huggingface/hub/`)
-- Show download progress with indicatif
-- `mouth models` command to list/download models
-- Auto-download on first run if model not cached
-
-### Phase 5: Transcription
-
-- Load Parakeet v3 ONNX model via ort
-- Auto-detect GPU (DirectML on Windows, CUDA if available, CPU fallback)
-- Respect accelerator override from config
-- Run inference on captured audio
-- Return transcribed text string
-
-### Phase 6: Overlay
-
-- Create a small always-on-top window using winit
-- Render with softbuffer (simple coloured rectangle + text)
-- States and colours:
-  - Recording: red pulsing indicator
-  - Transcribing: amber/yellow
-  - Done: brief green flash, then hide
-  - Error: brief red flash with error hint
-- Window flags (Windows): `WS_EX_TOPMOST | WS_EX_TOOLWINDOW | WS_EX_NOACTIVATE`
-- Position: centered horizontally at top or bottom of current monitor
-- No focus steal, no taskbar entry
-
-### Phase 7: Paste System
-
-- Save current clipboard content (if preserving)
-- Set transcribed text to clipboard via arboard
-- Simulate keypress via enigo based on paste_method:
-  - `ctrl_v`: Ctrl+V (Cmd+V on macOS)
-  - `shift_insert`: Shift+Insert
-  - `ctrl_shift_v`: Ctrl+Shift+V
-  - `clipboard_only`: no keypress, just clipboard
-- Restore previous clipboard content (unless copy_to_clipboard is true)
-- Small delay between clipboard set and paste simulation (~50ms)
-
-### Phase 8: Audio Feedback
-
-- Bundle two short PCM blip sounds in the binary (via `include_bytes!`)
-- "Blip up" on recording start
-- "Blip down" on recording stop / transcription complete
-- Play via rodio on a separate thread (non-blocking)
-- Respect audio_feedback config flag
-
-### Phase 9: Coordinator + Integration
-
-- Wire all components together with channel-based message passing
-- Main thread: overlay window event loop (winit requires this)
-- Spawned threads/tasks: hotkey listener, audio recorder, transcriber
-- Coordinator receives events, drives state machine:
-  ```
-  Idle ──[hotkey press]──▶ Recording
-  Recording ──[hotkey release/press]──▶ Transcribing
-  Recording ──[cancel]──▶ Idle
-  Transcribing ──[result]──▶ Pasting ──▶ Idle
-  Transcribing ──[error]──▶ Error ──▶ Idle
-  ```
-- Graceful shutdown on SIGINT / tray quit
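The Phase 9 state machine translates fairly directly into a transition function. The `State` names come from the diagram; the `Trigger` names are hypothetical, and treating unlisted state/trigger pairs as no-ops is an assumption, not something the plan specifies.

```rust
/// Coordinator states, as named in the plan's diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
    Pasting,
    Error,
}

/// Inputs that can move the coordinator between states (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    HotkeyStart,
    HotkeyStop,
    Cancel,
    TranscriptReady,
    TranscriptFailed,
    Done,
}

/// Pure transition function: any pair not listed in the plan leaves the state unchanged.
pub fn next_state(state: State, trigger: Trigger) -> State {
    match (state, trigger) {
        (State::Idle, Trigger::HotkeyStart) => State::Recording,
        (State::Recording, Trigger::HotkeyStop) => State::Transcribing,
        (State::Recording, Trigger::Cancel) => State::Idle,
        (State::Transcribing, Trigger::TranscriptReady) => State::Pasting,
        (State::Transcribing, Trigger::TranscriptFailed) => State::Error,
        (State::Pasting, Trigger::Done) | (State::Error, Trigger::Done) => State::Idle,
        (unchanged, _) => unchanged,
    }
}
```

Because stray triggers are ignored, a hotkey press that arrives while a transcription is still running cannot start an overlapping recording.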
-
-### Phase 10: Daemon IPC + Status
-
-- The running daemon listens on a local Unix domain socket (Linux/macOS) or named pipe (Windows) for status queries
-- Socket/pipe path: `/tmp/mouth.sock` (Linux/macOS), `\\.\pipe\mouth` (Windows)
-- `mouth status` connects and requests current state; daemon responds with JSON:
-  ```json
-  {
-    "version": "0.1.0",
-    "state": "idle",
-    "model": "parakeet-tdt-0.6b-v3",
-    "accelerator": "directml",
-    "uptime_secs": 3420
-  }
-  ```
-- If the daemon is not running, `mouth status` reports "Mouth is not running" and exits with code 1
-- Also used internally to prevent launching a second daemon instance (lock check)
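A transport-agnostic sketch of the status exchange. The plan does not specify the request format, so the one-line `status` request is an assumption, as are the `Status` and `handle_request` names. The JSON is built by hand only to keep the sketch dependency-free (no string escaping is done); the commit itself adds `serde_json` for this.

```rust
use std::io::{self, BufRead, Write};

/// Fields of the status reply shown in the plan.
pub struct Status<'a> {
    pub version: &'a str,
    pub state: &'a str,
    pub model: &'a str,
    pub accelerator: &'a str,
    pub uptime_secs: u64,
}

impl Status<'_> {
    /// Serialize to a single JSON line. Values are assumed to need no escaping.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"version":"{}","state":"{}","model":"{}","accelerator":"{}","uptime_secs":{}}}"#,
            self.version, self.state, self.model, self.accelerator, self.uptime_secs
        )
    }
}

/// Read one request line and write one reply line.
/// Works over any stream pair, e.g. a Unix socket or named pipe handle.
pub fn handle_request<R: BufRead, W: Write>(
    mut reader: R,
    mut writer: W,
    status: &Status<'_>,
) -> io::Result<()> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    let reply = if line.trim() == "status" {
        status.to_json()
    } else {
        r#"{"error":"unknown request"}"#.to_string()
    };
    writeln!(writer, "{reply}")
}
```

Writing the handler against `BufRead` and `Write` keeps the platform-specific part (socket versus named pipe) out of the protocol logic and lets it be tested with in-memory buffers.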
-
-### Phase 11: Polish + Distribution
-
-- Error handling: user-friendly messages for common failures (no mic, model not found, etc.)
-- Windows installer via `cargo-wix` or distribute as standalone .exe
-- Test on Windows 10/11 primarily
-- Test on Linux (X11 + Wayland) and macOS as secondary
-- Update CLAUDE.md with build/run/test instructions
-- Write user-facing README with setup instructions
-
-## Risks & Mitigations
-
-| Risk | Impact | Mitigation |
-|------|--------|------------|
-| Parakeet v3 ONNX model compatibility with `ort` | Blocks core functionality | Test early in Phase 5; Parakeet v2 as fallback |
-| `rdev` hotkey reliability on Windows | Broken UX | Test early in Phase 2; fallback to Win32 `RegisterHotKey` |
-| Overlay focus stealing | Annoying | Use proper window flags; test with various foreground apps |
-| Audio resampling quality | Poor transcription | Use rubato SincInterpolation (high quality) |
-| Binary size with bundled ONNX Runtime | Large download | ONNX Runtime is ~20-40MB; acceptable for a single-binary tool |
-| winit event loop blocking | Unresponsive | All heavy work on background threads; overlay is lightweight |
-
-## File Structure
-
-```
-mouth/
-├── Cargo.toml
-├── CLAUDE.md
-├── README.md
-├── plan.md
-├── config.yaml.example
-├── resources/
-│   ├── blip_up.pcm          # bundled audio feedback
-│   └── blip_down.pcm
-└── src/
-    ├── main.rs              # CLI entry, clap setup
-    ├── config.rs            # Config struct, YAML load/save, defaults
-    ├── hotkey.rs            # Global hotkey listener (rdev)
-    ├── recorder.rs          # Audio capture (cpal + rubato + VAD)
-    ├── vad.rs               # Silero VAD wrapper
-    ├── transcriber.rs       # ONNX inference, model loading, GPU detection
-    ├── model_cache.rs       # HuggingFace download, cache management
-    ├── overlay.rs           # Minimal overlay window (winit + softbuffer)
-    ├── paste.rs             # Clipboard + paste simulation
-    ├── audio_feedback.rs    # Blip sounds via rodio
-    ├── coordinator.rs       # State machine, channel hub
-    └── cli/
-        ├── mod.rs
-        ├── run.rs           # `mouth run` handler
-        ├── config_cmd.rs    # `mouth config` TUI
-        ├── models_cmd.rs    # `mouth models` handler
-        └── status_cmd.rs    # `mouth status` handler
-```
-
-## Not In Scope (v1)
-
-- LLM post-processing of transcriptions
-- Transcription history / database
-- Multiple model support (v1 is Parakeet v3 only, architecture supports adding more later)
-- Auto-submit (Enter after paste)
-- Multi-language UI
-- Tray icon / system tray integration
-- Translate-to-English mode
+113
-29
@@ -1,11 +1,22 @@
 #!/usr/bin/env bash
 set -euo pipefail
 
+# ============================================================
+# Release configuration
+# ============================================================
+REPO="https://gitea.dcglab.co.uk/steve/mouth"
+FORGE="gitea" # "gitea" (uses tea CLI) or "github" (uses gh CLI)
+
+# ============================================================
+# Derived variables
+# ============================================================
 VERSION=$(grep '^version' Cargo.toml | head -1 | sed 's/.*"\(.*\)"/\1/')
 RELEASE_DIR="release/v${VERSION}"
 BINARY_NAME="mouth"
+TAG="v${VERSION}"
 
-echo "=== Mouth Release Build v${VERSION} ==="
+echo "=== Mouth Release Build ${TAG} ==="
+echo "Forge: ${FORGE} (${REPO})"
 echo ""
 
 # Ensure we're in the project root
@@ -14,6 +25,22 @@ if [ ! -f Cargo.toml ]; then
     exit 1
 fi
 
+# Check CLI tools
+if [ "${FORGE}" = "gitea" ]; then
+    if ! command -v tea &>/dev/null; then
+        echo "ERROR: 'tea' CLI not found. Install: https://gitea.com/gitea/tea"
+        exit 1
+    fi
+elif [ "${FORGE}" = "github" ]; then
+    if ! command -v gh &>/dev/null; then
+        echo "ERROR: 'gh' CLI not found. Install: https://cli.github.com"
+        exit 1
+    fi
+else
+    echo "ERROR: Unknown forge '${FORGE}'. Must be 'gitea' or 'github'."
+    exit 1
+fi
+
 # Clean previous release artifacts for this version
 rm -rf "${RELEASE_DIR}"
 mkdir -p "${RELEASE_DIR}"
@@ -32,19 +59,12 @@ build_target() {
     if cargo build --release --target "${target}" 2>&1; then
         local binary="target/${target}/release/${BINARY_NAME}${ext}"
         if [ -f "${binary}" ]; then
-            local archive="${RELEASE_DIR}/${BINARY_NAME}-v${VERSION}-${target}"
+            local archive="${RELEASE_DIR}/${BINARY_NAME}-${TAG}-${target}"
             if [ -n "${ext}" ]; then
-                # Windows: zip
-                local zip_name="${archive}.zip"
-                zip -j "${zip_name}" "${binary}" 2>/dev/null || {
-                    # Fallback if zip not installed
+                # Windows: ship the exe directly
                     cp "${binary}" "${archive}${ext}"
                     echo " -> ${archive}${ext}"
                     BUILT+=("${archive}${ext}")
-                    return
-                }
-                echo " -> ${zip_name}"
-                BUILT+=("${zip_name}")
             else
                 # Linux/macOS: tar.gz
                 local tar_name="${archive}.tar.gz"
@@ -71,30 +91,15 @@ build_target() {
 build_target "x86_64-unknown-linux-gnu" "Linux x86_64"
 
 # Windows x86_64 (MSVC target via cargo-xwin)
-# ort requires the MSVC target — the GNU/MinGW target has no prebuilt
-# ONNX Runtime binaries. cargo-xwin cross-compiles using the MSVC
-# toolchain from Linux without needing a Windows machine.
-#
-# Install once:
-#   cargo install cargo-xwin
-#   rustup target add x86_64-pc-windows-msvc
-#
 if command -v cargo-xwin &>/dev/null && rustup target list --installed | grep -q x86_64-pc-windows-msvc; then
     echo "--- Building Windows x86_64 (x86_64-pc-windows-msvc via cargo-xwin) ---"
     if cargo xwin build --release --target x86_64-pc-windows-msvc 2>&1; then
         local_binary="target/x86_64-pc-windows-msvc/release/${BINARY_NAME}.exe"
         if [ -f "${local_binary}" ]; then
-            archive="${RELEASE_DIR}/${BINARY_NAME}-v${VERSION}-x86_64-pc-windows-msvc"
-            zip_name="${archive}.zip"
-            zip -j "${zip_name}" "${local_binary}" 2>/dev/null || {
-                cp "${local_binary}" "${archive}.exe"
-                echo " -> ${archive}.exe"
-                BUILT+=("${archive}.exe")
-            }
-            if [ -f "${zip_name}" ]; then
-                echo " -> ${zip_name}"
-                BUILT+=("${zip_name}")
-            fi
+            archive="${RELEASE_DIR}/${BINARY_NAME}-${TAG}-x86_64-pc-windows-msvc.exe"
+            cp "${local_binary}" "${archive}"
+            echo " -> ${archive}"
+            BUILT+=("${archive}")
         else
             echo " WARN: Binary not found"
             FAILED+=("Windows x86_64 (MSVC)")
@@ -149,3 +154,82 @@ if [ ${#BUILT[@]} -gt 0 ]; then
|
|||||||
cat checksums-sha256.txt
|
cat checksums-sha256.txt
|
||||||
cd - > /dev/null
|
cd - > /dev/null
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
# ============================================================
|
||||||
|
# Publish release
|
||||||
|
# ============================================================
|
||||||
|
|
||||||
|
if [ ${#BUILT[@]} -eq 0 ]; then
|
||||||
|
echo ""
|
||||||
|
echo "No successful builds — skipping release publish."
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
read -rp "Publish release ${TAG} to ${FORGE}? [y/N] " confirm
|
||||||
|
if [[ ! "${confirm}" =~ ^[Yy]$ ]]; then
|
||||||
|
echo "Skipped. Artifacts are in ${RELEASE_DIR}/"
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Ensure the git tag exists
|
||||||
|
if ! git rev-parse "${TAG}" &>/dev/null; then
|
||||||
|
echo "Creating git tag ${TAG}..."
|
||||||
|
git tag -a "${TAG}" -m "Release ${TAG}"
|
||||||
|
git push origin "${TAG}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Collect all release files (artifacts + checksums)
|
||||||
|
RELEASE_FILES=()
|
||||||
|
for b in "${BUILT[@]}"; do
|
||||||
|
RELEASE_FILES+=("${b}")
|
||||||
|
done
|
||||||
|
RELEASE_FILES+=("${RELEASE_DIR}/checksums-sha256.txt")
|
||||||
|
|
||||||
|
RELEASE_TITLE="Mouth ${TAG}"
|
||||||
|
RELEASE_BODY="## Mouth ${TAG}
|
||||||
|
|
||||||
|
### Downloads
|
||||||
|
$(for b in "${BUILT[@]}"; do echo "- $(basename "${b}")"; done)
|
||||||
|
|
||||||
|
### Checksums (SHA256)
|
||||||
|
\`\`\`
|
||||||
|
$(cat "${RELEASE_DIR}/checksums-sha256.txt")
|
||||||
|
\`\`\`
|
||||||
|
"
|
||||||
|
|
||||||
|
if [ "${FORGE}" = "gitea" ]; then
|
||||||
|
echo "Publishing to Gitea via tea..."
|
||||||
|
|
||||||
|
# Extract owner/repo from the REPO URL (strip the scheme and host)
|
||||||
|
REPO_OWNER_NAME=$(echo "${REPO}" | sed 's|.*://[^/]*/||')
|
||||||
|
|
||||||
|
# Create the release
|
||||||
|
tea release create \
|
||||||
|
--repo "${REPO_OWNER_NAME}" \
|
||||||
|
--tag "${TAG}" \
|
||||||
|
--title "${RELEASE_TITLE}" \
|
||||||
|
--note "${RELEASE_BODY}"
|
||||||
|
|
||||||
|
# Upload assets
|
||||||
|
for f in "${RELEASE_FILES[@]}"; do
|
||||||
|
echo " Uploading $(basename "${f}")..."
|
||||||
|
tea release asset create \
|
||||||
|
--repo "${REPO_OWNER_NAME}" \
|
||||||
|
--tag "${TAG}" \
|
||||||
|
--name "$(basename "${f}")" \
|
||||||
|
--file "${f}"
|
||||||
|
done
|
||||||
|
|
||||||
|
elif [ "${FORGE}" = "github" ]; then
|
||||||
|
echo "Publishing to GitHub via gh..."
|
||||||
|
|
||||||
|
gh release create "${TAG}" \
|
||||||
|
--repo "${REPO}" \
|
||||||
|
--title "${RELEASE_TITLE}" \
|
||||||
|
--notes "${RELEASE_BODY}" \
|
||||||
|
"${RELEASE_FILES[@]}"
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo "=== Release ${TAG} published to ${FORGE}! ==="
|
||||||
|
|||||||
@@ -84,7 +84,11 @@ pub fn play_blip_down() {
|
|||||||
}
|
}
|
||||||
|
|
||||||
fn play_blip_internal(freq_start: f32, freq_end: f32, duration_ms: u64) -> Result<()> {
|
fn play_blip_internal(freq_start: f32, freq_end: f32, duration_ms: u64) -> Result<()> {
|
||||||
let samples = generate_blip(freq_start, freq_end, duration_ms);
|
// Prepend silence so the audio device has time to warm up
|
||||||
|
let silence_ms = 50u64;
|
||||||
|
let silence_samples = (44100u64 * silence_ms / 1000) as usize;
|
||||||
|
let mut samples = vec![0i16; silence_samples];
|
||||||
|
samples.extend(generate_blip(freq_start, freq_end, duration_ms));
|
||||||
let wav_data = encode_wav(&samples, 44100);
|
let wav_data = encode_wav(&samples, 44100);
|
||||||
|
|
||||||
let (_stream, stream_handle) = OutputStream::try_default()?;
|
let (_stream, stream_handle) = OutputStream::try_default()?;
|
||||||
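The warm-up change above comes down to a small piece of arithmetic: 50 ms at 44100 Hz is 2205 zero-valued samples placed in front of the blip. A minimal sketch of that step on its own follows. The helper names `samples_for_ms` and `prepend_silence` are illustrative and do not appear in the commit.

```rust
/// Number of samples that cover `ms` milliseconds at `sample_rate` Hz.
/// (Hypothetical helper; the commit inlines this arithmetic.)
fn samples_for_ms(sample_rate: u32, ms: u64) -> usize {
    (sample_rate as u64 * ms / 1000) as usize
}

/// Return `blip` with `silence_ms` of zero-valued samples in front of it.
fn prepend_silence(blip: &[i16], sample_rate: u32, silence_ms: u64) -> Vec<i16> {
    let lead = samples_for_ms(sample_rate, silence_ms);
    let mut out = vec![0i16; lead];
    out.extend_from_slice(blip);
    out
}
```

Taking the sample rate as a parameter keeps the silence length consistent with whatever rate is later passed to the WAV encoder, instead of repeating the literal in two places.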
|
|||||||
+37
-8
@@ -1,7 +1,9 @@
|
|||||||
use anyhow::Result;
|
use anyhow::Result;
|
||||||
use dialoguer::{Input, Select};
|
use dialoguer::{Input, Select};
|
||||||
|
use std::time::Duration;
|
||||||
|
|
||||||
use crate::config::{Accelerator, Config, OverlayPosition, PasteMethod, RecordingMode};
|
use crate::config::{Accelerator, Config, OverlayPosition, PasteMethod, RecordingMode};
|
||||||
|
use crate::hotkey::capture_hotkey;
|
||||||
|
|
||||||
pub fn show() -> Result<()> {
|
pub fn show() -> Result<()> {
|
||||||
let config = Config::load()?;
|
let config = Config::load()?;
|
||||||
@@ -20,10 +22,7 @@ pub fn reset() -> Result<()> {
|
|||||||
pub fn interactive() -> Result<()> {
|
pub fn interactive() -> Result<()> {
|
||||||
let mut config = Config::load()?;
|
let mut config = Config::load()?;
|
||||||
|
|
||||||
config.hotkey = Input::new()
|
config.hotkey = prompt_hotkey("Hotkey", &config.hotkey)?;
|
||||||
.with_prompt("Hotkey")
|
|
||||||
.default(config.hotkey)
|
|
||||||
.interact_text()?;
|
|
||||||
|
|
||||||
let mode_idx = Select::new()
|
let mode_idx = Select::new()
|
||||||
.with_prompt("Recording mode")
|
.with_prompt("Recording mode")
|
||||||
@@ -38,10 +37,7 @@ pub fn interactive() -> Result<()> {
|
|||||||
_ => RecordingMode::Toggle,
|
_ => RecordingMode::Toggle,
|
||||||
};
|
};
|
||||||
|
|
||||||
config.cancel_key = Input::new()
|
config.cancel_key = prompt_hotkey("Cancel key", &config.cancel_key)?;
|
||||||
.with_prompt("Cancel key")
|
|
||||||
.default(config.cancel_key)
|
|
||||||
.interact_text()?;
|
|
||||||
|
|
||||||
config.model = Input::new()
|
config.model = Input::new()
|
||||||
.with_prompt("Model")
|
.with_prompt("Model")
|
||||||
@@ -125,3 +121,36 @@ pub fn interactive() -> Result<()> {
|
|||||||
println!("\nConfig saved to {}", Config::path()?.display());
|
println!("\nConfig saved to {}", Config::path()?.display());
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Prompt the user to press a key combination, type it manually, or keep the current value.
|
||||||
|
fn prompt_hotkey(label: &str, current: &str) -> Result<String> {
|
||||||
|
let choice = Select::new()
|
||||||
|
.with_prompt(format!("{label} (current: {current})"))
|
||||||
|
.items(&["Press the key combination", "Type it manually", "Keep current"])
|
||||||
|
.default(0)
|
||||||
|
.interact()?;
|
||||||
|
|
||||||
|
match choice {
|
||||||
|
0 => {
|
||||||
|
println!("Press your desired key combination (timeout: 10s)...");
|
||||||
|
match capture_hotkey(Duration::from_secs(10)) {
|
||||||
|
Some(hotkey) => {
|
||||||
|
println!(" Captured: {hotkey}");
|
||||||
|
Ok(hotkey)
|
||||||
|
}
|
||||||
|
None => {
|
||||||
|
println!(" No keypress detected, keeping current: {current}");
|
||||||
|
Ok(current.to_string())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
1 => {
|
||||||
|
let value = Input::new()
|
||||||
|
.with_prompt(label)
|
||||||
|
.default(current.to_string())
|
||||||
|
.interact_text()?;
|
||||||
|
Ok(value)
|
||||||
|
}
|
||||||
|
_ => Ok(current.to_string()),
|
||||||
|
}
|
||||||
|
}
|
||||||
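The capture branch of `prompt_hotkey` waits for a value with a deadline and falls back to the current setting when nothing arrives. The sketch below shows that pattern with only the standard library. `wait_with_timeout` and `captured_or_current` are hypothetical names, and the closure stands in for the real keyboard listener, which this diff does not show in isolation.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `produce` on a worker thread and wait at most `timeout` for its value.
/// Returns `None` if nothing arrives in time; the worker is left to finish on its own.
fn wait_with_timeout<T, F>(timeout: Duration, produce: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> Option<T> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        if let Some(value) = produce() {
            // The receiver may already be gone after a timeout; ignore that.
            let _ = tx.send(value);
        }
    });
    rx.recv_timeout(timeout).ok()
}

/// Fall back to the current value when no capture arrives.
fn captured_or_current(captured: Option<String>, current: &str) -> String {
    captured.unwrap_or_else(|| current.to_string())
}
```

One consequence of this design is that a timed-out worker keeps running in the background until it finishes by itself, which matters if the worker is a listener that never returns.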
|
|||||||
+70
-27
@@ -1,18 +1,31 @@
|
|||||||
use anyhow::{Context, Result};
|
use anyhow::{Context, Result};
|
||||||
use std::sync::mpsc;
|
use std::sync::{mpsc, Arc};
|
||||||
use std::thread;
|
use std::thread;
|
||||||
use tracing::info;
|
use tracing::info;
|
||||||
|
|
||||||
use crate::config::{Config, OverlayPosition};
|
use crate::config::Config;
|
||||||
use crate::coordinator::Coordinator;
|
use crate::coordinator::Coordinator;
|
||||||
use crate::hotkey;
|
use crate::hotkey;
|
||||||
|
use crate::ipc;
|
||||||
use crate::model_cache;
|
use crate::model_cache;
|
||||||
use crate::overlay;
|
use crate::overlay;
|
||||||
use crate::recorder;
|
use crate::recorder;
|
||||||
|
use crate::shared_state::SharedState;
|
||||||
use crate::transcriber::Transcriber;
|
use crate::transcriber::Transcriber;
|
||||||
|
|
||||||
pub fn run() -> Result<()> {
|
pub fn run() -> Result<()> {
|
||||||
let config = Config::load()?;
|
let config = Config::load()?;
|
||||||
|
|
||||||
|
// Check if already running
|
||||||
|
if ipc::is_daemon_running() {
|
||||||
|
eprintln!("Mouth is already running.");
|
||||||
|
std::process::exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Hide Windows console window
|
||||||
|
#[cfg(windows)]
|
||||||
|
hide_console();
|
||||||
|
|
||||||
info!("Mouth v{} starting", env!("CARGO_PKG_VERSION"));
|
info!("Mouth v{} starting", env!("CARGO_PKG_VERSION"));
|
||||||
info!("Mode: {:?}", config.mode);
|
info!("Mode: {:?}", config.mode);
|
||||||
info!("Hotkey: {}", config.hotkey);
|
info!("Hotkey: {}", config.hotkey);
|
||||||
@@ -30,10 +43,25 @@ pub fn run() -> Result<()> {
|
|||||||
let transcriber = Transcriber::new(&model_paths, &config.accelerator, config.gpu_device)
|
let transcriber = Transcriber::new(&model_paths, &config.accelerator, config.gpu_device)
|
||||||
.context("Failed to load transcription engine")?;
|
.context("Failed to load transcription engine")?;
|
||||||
|
|
||||||
// Step 3: VAD (not yet bundled)
|
// Step 3: VAD
|
||||||
let vad = if config.vad_enabled {
|
let vad = if config.vad_enabled {
|
||||||
info!("VAD enabled but Silero model not yet bundled — skipping");
|
info!("Loading Silero VAD...");
|
||||||
|
match model_cache::ensure_vad_model() {
|
||||||
|
Ok(vad_path) => match crate::vad::Vad::new(vad_path.to_str().unwrap_or_default()) {
|
||||||
|
Ok(v) => {
|
||||||
|
info!("VAD loaded");
|
||||||
|
Some(v)
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
tracing::warn!("Failed to load VAD, continuing without it: {e}");
|
||||||
None
|
None
|
||||||
|
}
|
||||||
|
},
|
||||||
|
Err(e) => {
|
||||||
|
tracing::warn!("Failed to download VAD model, continuing without it: {e}");
|
||||||
|
None
|
||||||
|
}
|
||||||
|
}
|
||||||
} else {
|
} else {
|
||||||
None
|
None
|
||||||
};
|
};
|
||||||
@@ -44,12 +72,29 @@ pub fn run() -> Result<()> {
|
|||||||
let cancel_combo = hotkey::parse_hotkey(&config.cancel_key)
|
let cancel_combo = hotkey::parse_hotkey(&config.cancel_key)
|
||||||
.with_context(|| format!("Invalid cancel key: {}", config.cancel_key))?;
|
.with_context(|| format!("Invalid cancel key: {}", config.cancel_key))?;
|
||||||
|
|
||||||
// Step 5: Set up channels
|
// Step 5: Create shared state
|
||||||
|
let shared_state = Arc::new(SharedState::new(
|
||||||
|
config.model.clone(),
|
||||||
|
format!("{:?}", config.accelerator).to_lowercase(),
|
||||||
|
));
|
||||||
|
|
||||||
|
// Step 6: Start IPC listener
|
||||||
|
let ipc_state = Arc::clone(&shared_state);
|
||||||
|
thread::Builder::new()
|
||||||
|
.name("mouth-ipc".into())
|
||||||
|
.spawn(move || {
|
||||||
|
if let Err(e) = ipc::start_ipc_listener(ipc_state) {
|
||||||
|
tracing::error!("IPC listener failed: {e}");
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.context("Failed to spawn IPC thread")?;
|
||||||
|
|
||||||
|
// Step 7: Set up channels
|
||||||
let (hotkey_tx, hotkey_rx) = mpsc::channel();
|
let (hotkey_tx, hotkey_rx) = mpsc::channel();
|
||||||
let (recorder_cmd_tx, recorder_cmd_rx) = mpsc::channel();
|
let (recorder_cmd_tx, recorder_cmd_rx) = mpsc::channel();
|
||||||
let (audio_tx, audio_rx) = mpsc::channel();
|
let (audio_tx, audio_rx) = mpsc::channel();
|
||||||
|
|
||||||
// Step 6: Spawn background threads
|
// Step 8: Spawn background threads
|
||||||
let device_name = config.input_device.clone();
|
let device_name = config.input_device.clone();
|
||||||
thread::Builder::new()
|
thread::Builder::new()
|
||||||
.name("mouth-recorder".into())
|
.name("mouth-recorder".into())
|
||||||
@@ -65,52 +110,50 @@ pub fn run() -> Result<()> {
|
|||||||
})
|
})
|
||||||
.context("Failed to spawn hotkey thread")?;
|
.context("Failed to spawn hotkey thread")?;
|
||||||
|
|
||||||
// Step 7: Start overlay + coordinator
|
// Step 9: Start overlay + coordinator
|
||||||
if config.overlay_position != OverlayPosition::None {
|
// Always create the event loop (needed for tray icon even when overlay is hidden)
|
||||||
let (event_loop, proxy) = overlay::create_event_loop()
|
let (event_loop, proxy) = overlay::create_event_loop()
|
||||||
.map_err(|e| anyhow::anyhow!("Failed to create overlay event loop: {e}"))?;
|
.map_err(|e| anyhow::anyhow!("Failed to create overlay event loop: {e}"))?;
|
||||||
|
|
||||||
let overlay_position = config.overlay_position.clone();
|
let overlay_position = config.overlay_position.clone();
|
||||||
let coord_proxy = Some(proxy);
|
|
||||||
|
|
||||||
// Coordinator runs on a background thread
|
// Coordinator runs on a background thread
|
||||||
let coord_config = config.clone();
|
let coord_config = config.clone();
|
||||||
|
let coord_state = Arc::clone(&shared_state);
|
||||||
thread::Builder::new()
|
thread::Builder::new()
|
||||||
.name("mouth-coordinator".into())
|
.name("mouth-coordinator".into())
|
||||||
.spawn(move || {
|
.spawn(move || {
|
||||||
let mut coordinator = Coordinator::new(
|
let mut coordinator = Coordinator::new(
|
||||||
coord_config,
|
coord_config,
|
||||||
|
coord_state,
|
||||||
transcriber,
|
transcriber,
|
||||||
vad,
|
vad,
|
||||||
recorder_cmd_tx,
|
recorder_cmd_tx,
|
||||||
audio_rx,
|
audio_rx,
|
||||||
hotkey_rx,
|
hotkey_rx,
|
||||||
coord_proxy,
|
Some(proxy),
|
||||||
);
|
);
|
||||||
coordinator.run();
|
coordinator.run();
|
||||||
})
|
})
|
||||||
.context("Failed to spawn coordinator thread")?;
|
.context("Failed to spawn coordinator thread")?;
|
||||||
|
|
||||||
println!("Mouth is running. Press {} to record. Ctrl+C to quit.", config.hotkey);
|
|
||||||
|
|
||||||
// Overlay event loop runs on main thread (blocking)
|
// Overlay event loop runs on main thread (blocking)
|
||||||
|
// Tray icon is created inside the overlay app
|
||||||
overlay::run_event_loop(event_loop, overlay_position)
|
overlay::run_event_loop(event_loop, overlay_position)
|
||||||
.map_err(|e| anyhow::anyhow!("Overlay event loop error: {e}"))?;
|
.map_err(|e| anyhow::anyhow!("Overlay event loop error: {e}"))?;
|
||||||
} else {
|
|
||||||
// No overlay — coordinator runs on main thread
|
|
||||||
println!("Mouth is running. Press {} to record. Ctrl+C to quit.", config.hotkey);
|
|
||||||
|
|
||||||
let mut coordinator = Coordinator::new(
|
|
||||||
config,
|
|
||||||
transcriber,
|
|
||||||
vad,
|
|
||||||
recorder_cmd_tx,
|
|
||||||
audio_rx,
|
|
||||||
hotkey_rx,
|
|
||||||
None,
|
|
||||||
);
|
|
||||||
coordinator.run();
|
|
||||||
}
|
|
||||||
|
|
||||||
|
ipc::cleanup();
|
||||||
Ok(())
|
Ok(())
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[cfg(windows)]
|
||||||
|
fn hide_console() {
|
||||||
|
use windows_sys::Win32::System::Console::GetConsoleWindow;
|
||||||
|
use windows_sys::Win32::UI::WindowsAndMessaging::{ShowWindow, SW_HIDE};
|
||||||
|
unsafe {
|
||||||
|
let console = GetConsoleWindow();
|
||||||
|
if !console.is_null() {
|
||||||
|
ShowWindow(console, SW_HIDE);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
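The daemon hands one `Arc<SharedState>` to the IPC thread and another to the coordinator, but the `shared_state` module itself is not part of this excerpt. The following is a rough sketch of what such a type could look like. The field layout, the `Mutex<&'static str>` choice, and the `status_line` and `set_from_worker` helpers are all assumptions made for illustration; only the `new(model, accelerator)` and `set_state(label)` call shapes are taken from the diff.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

/// Hypothetical stand-in for the daemon's shared state; the real fields are not shown in the diff.
struct SharedState {
    model: String,
    accelerator: String,
    state: Mutex<&'static str>,
    started: Instant,
}

impl SharedState {
    fn new(model: String, accelerator: String) -> Self {
        Self {
            model,
            accelerator,
            state: Mutex::new("idle"),
            started: Instant::now(),
        }
    }

    /// Called by the coordinator whenever its state changes.
    fn set_state(&self, label: &'static str) {
        *self.state.lock().unwrap() = label;
    }

    /// Read by the IPC thread when answering a status query.
    fn state(&self) -> &'static str {
        *self.state.lock().unwrap()
    }

    fn uptime_secs(&self) -> u64 {
        self.started.elapsed().as_secs()
    }

    fn status_line(&self) -> String {
        format!("{} on {}: {}", self.model, self.accelerator, self.state())
    }
}

/// Give a clone of the handle to a worker thread, as the daemon does for its threads.
fn set_from_worker(shared: &Arc<SharedState>, label: &'static str) {
    let worker = Arc::clone(shared);
    thread::spawn(move || worker.set_state(label))
        .join()
        .expect("worker panicked");
}
```

Because the setter takes `&self`, every thread can hold a plain `Arc` without needing exclusive access to the whole struct.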
|
|||||||
+19
-6
@@ -1,11 +1,24 @@
|
|||||||
use anyhow::Result;
|
use anyhow::Result;
|
||||||
|
|
||||||
pub fn status() -> Result<()> {
|
use crate::ipc;
|
||||||
let version = env!("CARGO_PKG_VERSION");
|
|
||||||
|
|
||||||
// TODO: Phase 10 — connect to daemon IPC socket/pipe and query status
|
pub fn status() -> Result<()> {
|
||||||
// For now, just show version info
|
match ipc::query_daemon_status() {
|
||||||
println!("Mouth v{version}");
|
Ok(status) => {
|
||||||
println!("Status: not yet implemented (requires daemon IPC)");
|
println!("Mouth v{}", status.version);
|
||||||
|
println!("State: {}", status.state);
|
||||||
|
println!("Model: {}", status.model);
|
||||||
|
println!("Accelerator: {}", status.accelerator);
|
||||||
|
|
||||||
|
let hours = status.uptime_secs / 3600;
|
||||||
|
let mins = (status.uptime_secs % 3600) / 60;
|
||||||
|
let secs = status.uptime_secs % 60;
|
||||||
|
println!("Uptime: {}h {}m {}s", hours, mins, secs);
|
||||||
Ok(())
|
Ok(())
|
||||||
|
}
|
||||||
|
Err(_) => {
|
||||||
|
eprintln!("Mouth is not running.");
|
||||||
|
std::process::exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
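The uptime line in `status` splits a count of seconds into hours, minutes, and seconds. As a worked example, 3725 seconds gives 3725 / 3600 = 1 hour, (3725 % 3600) / 60 = 2 minutes, and 3725 % 60 = 5 seconds. The same arithmetic as a standalone sketch, where `split_uptime` and `format_uptime` are illustrative names rather than functions from the commit:

```rust
/// Split a duration in whole seconds into (hours, minutes, seconds).
fn split_uptime(uptime_secs: u64) -> (u64, u64, u64) {
    let hours = uptime_secs / 3600;
    let mins = (uptime_secs % 3600) / 60;
    let secs = uptime_secs % 60;
    (hours, mins, secs)
}

/// Render the split value in the same "Xh Ym Zs" shape the status command prints.
fn format_uptime(uptime_secs: u64) -> String {
    let (h, m, s) = split_uptime(uptime_secs);
    format!("{h}h {m}m {s}s")
}
```

Hours are not wrapped into days, so a daemon that has been up for a full day reports `24h 0m 0s`.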
|
|||||||
+28
-11
@@ -1,4 +1,4 @@
|
|||||||
use std::sync::mpsc;
|
use std::sync::{mpsc, Arc};
|
||||||
use std::thread;
|
use std::thread;
|
||||||
use tracing::{debug, error, info, warn};
|
use tracing::{debug, error, info, warn};
|
||||||
use winit::event_loop::EventLoopProxy;
|
use winit::event_loop::EventLoopProxy;
|
||||||
@@ -9,6 +9,7 @@ use crate::hotkey::HotkeyEvent;
|
|||||||
use crate::overlay::{OverlayEvent, OverlayState};
|
use crate::overlay::{OverlayEvent, OverlayState};
|
||||||
use crate::paste;
|
use crate::paste;
|
||||||
use crate::recorder::{AudioData, RecorderCommand};
|
use crate::recorder::{AudioData, RecorderCommand};
|
||||||
|
use crate::shared_state::SharedState;
|
||||||
use crate::transcriber::Transcriber;
|
use crate::transcriber::Transcriber;
|
||||||
use crate::vad::Vad;
|
use crate::vad::Vad;
|
||||||
|
|
||||||
@@ -24,6 +25,7 @@ enum State {
|
|||||||
pub struct Coordinator {
|
pub struct Coordinator {
|
||||||
config: Config,
|
config: Config,
|
||||||
state: State,
|
state: State,
|
||||||
|
shared_state: Arc<SharedState>,
|
||||||
transcriber: Transcriber,
|
transcriber: Transcriber,
|
||||||
vad: Option<Vad>,
|
vad: Option<Vad>,
|
||||||
recorder_tx: mpsc::Sender<RecorderCommand>,
|
recorder_tx: mpsc::Sender<RecorderCommand>,
|
||||||
@@ -35,6 +37,7 @@ pub struct Coordinator {
|
|||||||
impl Coordinator {
|
impl Coordinator {
|
||||||
pub fn new(
|
pub fn new(
|
||||||
config: Config,
|
config: Config,
|
||||||
|
shared_state: Arc<SharedState>,
|
||||||
transcriber: Transcriber,
|
transcriber: Transcriber,
|
||||||
vad: Option<Vad>,
|
vad: Option<Vad>,
|
||||||
recorder_tx: mpsc::Sender<RecorderCommand>,
|
recorder_tx: mpsc::Sender<RecorderCommand>,
|
||||||
@@ -45,6 +48,7 @@ impl Coordinator {
|
|||||||
Self {
|
Self {
|
||||||
config,
|
config,
|
||||||
state: State::Idle,
|
state: State::Idle,
|
||||||
|
shared_state,
|
||||||
transcriber,
|
transcriber,
|
||||||
vad,
|
vad,
|
||||||
recorder_tx,
|
recorder_tx,
|
||||||
@@ -54,6 +58,16 @@ impl Coordinator {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn set_state(&mut self, state: State) {
|
||||||
|
self.state = state;
|
||||||
|
let label = match state {
|
||||||
|
State::Idle => "idle",
|
||||||
|
State::Recording => "recording",
|
||||||
|
State::Transcribing => "transcribing",
|
||||||
|
};
|
||||||
|
self.shared_state.set_state(label);
|
||||||
|
}
|
||||||
|
|
||||||
/// Run the coordinator loop. This blocks until shutdown.
|
/// Run the coordinator loop. This blocks until shutdown.
|
||||||
pub fn run(&mut self) {
|
pub fn run(&mut self) {
|
||||||
info!("Coordinator started");
|
info!("Coordinator started");
|
||||||
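The `set_state` helper in this hunk stores `state` in `self.state` and then matches on `state` again, which only compiles if `State` is `Copy`. Whether the real enum derives `Copy` is not visible in this diff, so the sketch below assumes it does. The `label` method is a hypothetical way to keep the mapping next to the enum.

```rust
/// Assumed shape of the coordinator's state enum; the derives are a guess.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Transcribing,
}

impl State {
    /// Lowercase label published to other threads for status queries.
    fn label(self) -> &'static str {
        match self {
            State::Idle => "idle",
            State::Recording => "recording",
            State::Transcribing => "transcribing",
        }
    }
}
```

With the mapping on the enum, adding a fourth state produces a compile error in `label` until its string is defined, so the status output cannot silently fall out of date.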
@@ -111,7 +125,7 @@ impl Coordinator {
|
|||||||
|
|
||||||
fn start_recording(&mut self) {
|
fn start_recording(&mut self) {
|
||||||
info!("Recording started");
|
info!("Recording started");
|
||||||
self.state = State::Recording;
|
self.set_state(State::Recording);
|
||||||
self.set_overlay(OverlayState::Recording);
|
self.set_overlay(OverlayState::Recording);
|
||||||
|
|
||||||
if self.config.audio_feedback {
|
if self.config.audio_feedback {
|
||||||
@@ -120,23 +134,26 @@ impl Coordinator {
|
|||||||
|
|
||||||
if self.recorder_tx.send(RecorderCommand::Start).is_err() {
|
if self.recorder_tx.send(RecorderCommand::Start).is_err() {
|
||||||
error!("Failed to send start command to recorder");
|
error!("Failed to send start command to recorder");
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
self.set_overlay(OverlayState::Hidden);
|
self.set_overlay(OverlayState::Hidden);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fn stop_recording(&mut self) {
|
fn stop_recording(&mut self) {
|
||||||
info!("Recording stopped, starting transcription");
|
info!("Recording stopped, starting transcription");
|
||||||
self.state = State::Transcribing;
|
self.set_state(State::Transcribing);
|
||||||
self.set_overlay(OverlayState::Transcribing);
|
self.set_overlay(OverlayState::Transcribing);
|
||||||
|
|
||||||
if self.config.audio_feedback {
|
if self.config.audio_feedback {
|
||||||
audio_feedback::play_blip_down();
|
audio_feedback::play_blip_down();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Keep recording briefly after the stop signal so trailing speech isn't clipped
|
||||||
|
thread::sleep(std::time::Duration::from_millis(300));
|
||||||
|
|
||||||
if self.recorder_tx.send(RecorderCommand::Stop).is_err() {
|
if self.recorder_tx.send(RecorderCommand::Stop).is_err() {
|
||||||
error!("Failed to send stop command to recorder");
|
error!("Failed to send stop command to recorder");
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
self.set_overlay(OverlayState::Hidden);
|
self.set_overlay(OverlayState::Hidden);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@@ -148,7 +165,7 @@ impl Coordinator {
|
|||||||
}
|
}
|
||||||
Err(_) => {
|
Err(_) => {
|
||||||
error!("Failed to receive audio data");
|
error!("Failed to receive audio data");
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
self.set_overlay(OverlayState::Error);
|
self.set_overlay(OverlayState::Error);
|
||||||
self.delayed_hide_overlay();
|
self.delayed_hide_overlay();
|
||||||
}
|
}
|
||||||
@@ -157,7 +174,7 @@ impl Coordinator {
|
|||||||
|
|
||||||
fn cancel_recording(&mut self) {
|
fn cancel_recording(&mut self) {
|
||||||
info!("Recording cancelled");
|
info!("Recording cancelled");
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
|
|
||||||
if self.recorder_tx.send(RecorderCommand::Stop).is_err() {
|
if self.recorder_tx.send(RecorderCommand::Stop).is_err() {
|
||||||
warn!("Failed to send stop command to recorder");
|
warn!("Failed to send stop command to recorder");
|
||||||
@@ -176,7 +193,7 @@ impl Coordinator {
|
|||||||
Ok(filtered) => {
|
Ok(filtered) => {
|
||||||
if filtered.is_empty() {
|
if filtered.is_empty() {
|
||||||
info!("No speech detected by VAD");
|
info!("No speech detected by VAD");
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
self.set_overlay(OverlayState::Hidden);
|
self.set_overlay(OverlayState::Hidden);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@@ -199,7 +216,7 @@ impl Coordinator {
|
|||||||
Ok(text) => {
|
Ok(text) => {
|
||||||
if text.is_empty() {
|
if text.is_empty() {
|
||||||
info!("Empty transcription");
|
info!("Empty transcription");
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
self.set_overlay(OverlayState::Hidden);
|
self.set_overlay(OverlayState::Hidden);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@@ -218,11 +235,11 @@ impl Coordinator {
|
|||||||
}
|
}
|
||||||
|
|
||||||
self.delayed_hide_overlay();
|
self.delayed_hide_overlay();
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
}
|
}
|
||||||
Err(e) => {
|
Err(e) => {
|
||||||
error!("Transcription failed: {e}");
|
error!("Transcription failed: {e}");
|
||||||
self.state = State::Idle;
|
self.set_state(State::Idle);
|
||||||
self.set_overlay(OverlayState::Error);
|
self.set_overlay(OverlayState::Error);
|
||||||
self.delayed_hide_overlay();
|
self.delayed_hide_overlay();
|
||||||
}
|
}
|
||||||
|
|||||||
+247
-26
@@ -1,5 +1,6 @@
|
|||||||
use anyhow::{bail, Result};
|
use anyhow::{bail, Result};
|
||||||
use rdev::{self, Event, EventType, Key};
|
use rdev::{self, Event, EventType, Key};
|
||||||
|
use std::cell::RefCell;
|
||||||
use std::sync::mpsc;
|
use std::sync::mpsc;
|
||||||
use std::time::{Duration, Instant};
|
use std::time::{Duration, Instant};
|
||||||
use tracing::{debug, error, info};
|
use tracing::{debug, error, info};
|
||||||
@@ -164,77 +165,297 @@ fn parse_key(s: &str) -> Result<Key> {
|
|||||||
"7" => Key::Num7,
|
"7" => Key::Num7,
|
||||||
"8" => Key::Num8,
|
"8" => Key::Num8,
|
||||||
"9" => Key::Num9,
|
"9" => Key::Num9,
|
||||||
|
// Punctuation / symbol keys
|
||||||
|
"[" | "leftbracket" => Key::LeftBracket,
|
||||||
|
"]" | "rightbracket" => Key::RightBracket,
|
||||||
|
";" | "semicolon" => Key::SemiColon,
|
||||||
|
"'" | "quote" => Key::Quote,
|
||||||
|
"`" | "backquote" | "backtick" => Key::BackQuote,
|
||||||
|
"\\" | "backslash" => Key::BackSlash,
|
||||||
|
"," | "comma" => Key::Comma,
|
||||||
|
"." | "dot" | "period" => Key::Dot,
|
||||||
|
"/" | "slash" => Key::Slash,
|
||||||
|
"-" | "minus" => Key::Minus,
|
||||||
|
"=" | "equal" | "equals" => Key::Equal,
|
||||||
|
// Additional non-character keys
|
||||||
|
"printscreen" | "prtsc" => Key::PrintScreen,
|
||||||
|
"scrolllock" => Key::ScrollLock,
|
||||||
|
"pause" | "break" => Key::Pause,
|
||||||
|
"numlock" => Key::NumLock,
|
||||||
|
"capslock" => Key::CapsLock,
|
||||||
|
// Numpad
|
||||||
|
"kp0" | "numpad0" => Key::Kp0,
|
||||||
|
"kp1" | "numpad1" => Key::Kp1,
|
||||||
|
"kp2" | "numpad2" => Key::Kp2,
|
||||||
|
"kp3" | "numpad3" => Key::Kp3,
|
||||||
|
"kp4" | "numpad4" => Key::Kp4,
|
||||||
|
"kp5" | "numpad5" => Key::Kp5,
|
||||||
|
"kp6" | "numpad6" => Key::Kp6,
|
||||||
|
"kp7" | "numpad7" => Key::Kp7,
|
||||||
|
"kp8" | "numpad8" => Key::Kp8,
|
||||||
|
"kp9" | "numpad9" => Key::Kp9,
|
||||||
|
"kpenter" | "numpadenter" => Key::KpReturn,
|
||||||
|
"kpminus" | "numpadminus" => Key::KpMinus,
|
||||||
|
"kpplus" | "numpadplus" => Key::KpPlus,
|
||||||
|
"kpmultiply" | "numpadmultiply" => Key::KpMultiply,
|
||||||
|
"kpdivide" | "numpaddivide" => Key::KpDivide,
|
||||||
|
"kpdelete" | "numpaddelete" => Key::KpDelete,
|
||||||
_ => bail!("Unknown key: {s}"),
|
_ => bail!("Unknown key: {s}"),
|
||||||
};
|
};
|
||||||
Ok(key)
|
Ok(key)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Convert an rdev Key back to the config string representation.
|
||||||
|
fn key_to_string(key: &Key) -> Option<String> {
|
||||||
|
let s = match key {
|
||||||
|
Key::Space => "space",
|
||||||
|
Key::Return => "enter",
|
||||||
|
Key::Escape => "escape",
|
||||||
|
Key::Tab => "tab",
|
||||||
|
Key::Backspace => "backspace",
|
||||||
|
Key::Delete => "delete",
|
||||||
|
Key::Insert => "insert",
|
||||||
|
Key::Home => "home",
|
||||||
|
Key::End => "end",
|
||||||
|
Key::PageUp => "pageup",
|
||||||
|
Key::PageDown => "pagedown",
|
||||||
|
Key::UpArrow => "up",
|
||||||
|
Key::DownArrow => "down",
|
||||||
|
Key::LeftArrow => "left",
|
||||||
|
Key::RightArrow => "right",
|
||||||
|
Key::F1 => "f1",
|
||||||
|
Key::F2 => "f2",
|
||||||
|
Key::F3 => "f3",
|
||||||
|
Key::F4 => "f4",
|
||||||
|
Key::F5 => "f5",
|
||||||
|
Key::F6 => "f6",
|
||||||
|
Key::F7 => "f7",
|
||||||
|
Key::F8 => "f8",
|
||||||
|
Key::F9 => "f9",
|
||||||
|
Key::F10 => "f10",
|
||||||
|
Key::F11 => "f11",
|
||||||
|
Key::F12 => "f12",
|
||||||
|
Key::KeyA => "a",
|
||||||
|
Key::KeyB => "b",
|
||||||
|
Key::KeyC => "c",
|
||||||
|
Key::KeyD => "d",
|
||||||
|
Key::KeyE => "e",
|
||||||
|
Key::KeyF => "f",
|
||||||
|
Key::KeyG => "g",
|
||||||
|
Key::KeyH => "h",
|
||||||
|
Key::KeyI => "i",
|
||||||
|
Key::KeyJ => "j",
|
||||||
|
Key::KeyK => "k",
|
||||||
|
Key::KeyL => "l",
|
||||||
|
Key::KeyM => "m",
|
||||||
|
Key::KeyN => "n",
|
||||||
|
Key::KeyO => "o",
|
||||||
|
Key::KeyP => "p",
|
||||||
|
Key::KeyQ => "q",
|
||||||
|
Key::KeyR => "r",
|
||||||
|
Key::KeyS => "s",
|
||||||
|
Key::KeyT => "t",
|
||||||
|
Key::KeyU => "u",
|
||||||
|
Key::KeyV => "v",
|
||||||
|
Key::KeyW => "w",
|
||||||
|
Key::KeyX => "x",
|
||||||
|
Key::KeyY => "y",
|
||||||
|
Key::KeyZ => "z",
|
||||||
|
Key::Num0 => "0",
|
||||||
|
Key::Num1 => "1",
|
||||||
|
Key::Num2 => "2",
|
||||||
|
Key::Num3 => "3",
|
||||||
|
Key::Num4 => "4",
|
||||||
|
Key::Num5 => "5",
|
||||||
|
Key::Num6 => "6",
|
||||||
|
Key::Num7 => "7",
|
||||||
|
Key::Num8 => "8",
|
||||||
|
Key::Num9 => "9",
|
||||||
|
Key::LeftBracket => "[",
|
||||||
|
Key::RightBracket => "]",
|
||||||
|
Key::SemiColon => ";",
|
||||||
|
Key::Quote => "'",
|
||||||
|
Key::BackQuote => "`",
|
||||||
|
Key::BackSlash => "\\",
|
||||||
|
Key::Comma => ",",
|
||||||
|
Key::Dot => ".",
|
||||||
|
Key::Slash => "/",
|
||||||
|
Key::Minus => "-",
|
||||||
|
Key::Equal => "=",
|
||||||
|
Key::PrintScreen => "printscreen",
|
||||||
|
Key::ScrollLock => "scrolllock",
|
||||||
|
Key::Pause => "pause",
|
||||||
|
Key::NumLock => "numlock",
|
||||||
|
Key::CapsLock => "capslock",
|
||||||
|
Key::Kp0 => "kp0",
|
||||||
|
Key::Kp1 => "kp1",
|
||||||
|
Key::Kp2 => "kp2",
|
||||||
|
Key::Kp3 => "kp3",
|
||||||
|
Key::Kp4 => "kp4",
|
||||||
|
Key::Kp5 => "kp5",
|
||||||
|
Key::Kp6 => "kp6",
|
||||||
|
Key::Kp7 => "kp7",
|
||||||
|
Key::Kp8 => "kp8",
|
||||||
|
Key::Kp9 => "kp9",
|
||||||
|
Key::KpReturn => "kpenter",
|
||||||
|
Key::KpMinus => "kpminus",
|
||||||
|
Key::KpPlus => "kpplus",
|
||||||
|
Key::KpMultiply => "kpmultiply",
|
||||||
|
Key::KpDivide => "kpdivide",
|
||||||
|
Key::KpDelete => "kpdelete",
|
||||||
|
_ => return None,
|
||||||
|
};
|
||||||
|
Some(s.to_string())
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Returns true if the key is a modifier (ctrl, alt, shift, meta).
|
||||||
|
fn is_modifier(key: &Key) -> bool {
|
||||||
|
matches!(
|
||||||
|
key,
|
||||||
|
Key::ControlLeft
|
||||||
|
| Key::ControlRight
|
||||||
|
| Key::Alt
|
||||||
|
| Key::AltGr
|
||||||
|
| Key::ShiftLeft
|
||||||
|
| Key::ShiftRight
|
||||||
|
| Key::MetaLeft
|
||||||
|
| Key::MetaRight
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Capture a hotkey combination by listening for an actual keypress.
|
||||||
|
/// Blocks until the user presses a non-modifier key (optionally while holding modifiers) or `timeout` elapses.
|
||||||
|
/// Returns the hotkey string (e.g. "ctrl+[") or None on timeout/error.
|
||||||
|
pub fn capture_hotkey(timeout: Duration) -> Option<String> {
|
||||||
|
let (tx, rx) = mpsc::channel();
|
||||||
|
|
||||||
|
std::thread::spawn(move || {
|
||||||
|
let mut modifier_state = ModifierState::default();
|
||||||
|
|
||||||
|
let callback = move |event: Event| {
|
||||||
|
match event.event_type {
|
||||||
|
EventType::KeyPress(key) => {
|
||||||
|
modifier_state.update(&key, true);
|
||||||
|
|
||||||
|
// Ignore pure modifier presses — wait for a real key
|
||||||
|
if is_modifier(&key) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if let Some(key_name) = key_to_string(&key) {
|
||||||
|
let mut parts = Vec::new();
|
||||||
|
if modifier_state.ctrl {
|
||||||
|
parts.push("ctrl".to_string());
|
||||||
|
}
|
||||||
|
if modifier_state.alt {
|
||||||
|
parts.push("alt".to_string());
|
||||||
|
}
|
||||||
|
if modifier_state.shift {
|
||||||
|
parts.push("shift".to_string());
|
||||||
|
}
|
||||||
|
if modifier_state.meta {
|
||||||
|
parts.push("meta".to_string());
|
||||||
|
}
|
||||||
|
parts.push(key_name);
|
||||||
|
let _ = tx.send(parts.join("+"));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
EventType::KeyRelease(key) => {
|
||||||
|
modifier_state.update(&key, false);
|
||||||
|
}
|
||||||
|
_ => {}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
let _ = rdev::listen(callback);
|
||||||
|
});
|
||||||
|
|
||||||
|
rx.recv_timeout(timeout).ok()
|
||||||
|
}
|
||||||
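`capture_hotkey` builds its result by pushing the held modifiers in a fixed order (ctrl, alt, shift, meta) and then the key name, joined with `+`. A small sketch of just that composition step follows. `Modifiers` is a hypothetical mirror of the commit's `ModifierState`, whose definition is outside this excerpt, and `hotkey_string` is an illustrative name.

```rust
/// Modifier flags tracked while keys are held (hypothetical mirror of `ModifierState`).
#[derive(Default, Clone, Copy)]
struct Modifiers {
    ctrl: bool,
    alt: bool,
    shift: bool,
    meta: bool,
}

/// Join held modifiers and the key name in a fixed order: ctrl, alt, shift, meta, key.
fn hotkey_string(mods: Modifiers, key_name: &str) -> String {
    let mut parts: Vec<&str> = Vec::new();
    if mods.ctrl {
        parts.push("ctrl");
    }
    if mods.alt {
        parts.push("alt");
    }
    if mods.shift {
        parts.push("shift");
    }
    if mods.meta {
        parts.push("meta");
    }
    parts.push(key_name);
    parts.join("+")
}
```

A fixed modifier order means the same physical combination always yields the same config string, regardless of the order in which the keys were pressed.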
|
|
||||||
/// Start the global hotkey listener on the current thread (blocking).
|
/// Start the global hotkey listener on the current thread (blocking).
|
||||||
/// Sends HotkeyEvents to the provided channel.
|
/// Uses `rdev::grab` to intercept and consume hotkey events so they don't
|
||||||
|
/// reach the focused application.
|
||||||
pub fn listen(
|
pub fn listen(
|
||||||
hotkey: HotkeyCombination,
|
hotkey: HotkeyCombination,
|
||||||
cancel_key: HotkeyCombination,
|
cancel_key: HotkeyCombination,
|
||||||
tx: mpsc::Sender<HotkeyEvent>,
|
tx: mpsc::Sender<HotkeyEvent>,
|
||||||
) {
|
) {
|
||||||
let debounce_duration = Duration::from_millis(30);
|
let debounce_duration = Duration::from_millis(30);
|
||||||
let mut last_event_time = Instant::now() - debounce_duration;
|
|
||||||
let mut modifier_state = ModifierState::default();
|
|
||||||
let mut hotkey_held = false;
|
|
||||||
|
|
||||||
info!("Hotkey listener started");
|
info!("Hotkey listener started (grab mode)");
|
||||||
debug!("Hotkey: {:?}", hotkey);
|
debug!("Hotkey: {:?}", hotkey);
|
||||||
debug!("Cancel: {:?}", cancel_key);
|
debug!("Cancel: {:?}", cancel_key);
|
||||||
|
|
||||||
let callback = move |event: Event| {
|
// rdev::grab requires Fn (not FnMut), so wrap mutable state in RefCell
|
||||||
|
struct GrabState {
|
||||||
|
last_event_time: Instant,
|
||||||
|
modifier_state: ModifierState,
|
||||||
|
hotkey_held: bool,
|
||||||
|
}
|
||||||
|
let state = RefCell::new(GrabState {
|
||||||
|
last_event_time: Instant::now() - debounce_duration,
|
||||||
|
modifier_state: ModifierState::default(),
|
||||||
|
hotkey_held: false,
|
||||||
|
});
|
||||||
|
|
||||||
|
let callback = move |event: Event| -> Option<Event> {
|
||||||
|
let mut s = state.borrow_mut();
|
||||||
let now = Instant::now();
|
let now = Instant::now();
|
||||||
match event.event_type {
|
match event.event_type {
|
||||||
EventType::KeyPress(key) => {
|
EventType::KeyPress(key) => {
|
||||||
modifier_state.update(&key, true);
|
s.modifier_state.update(&key, true);
|
||||||
|
|
||||||
// Check cancel key
|
// Check cancel key — swallow it
|
||||||
if key == cancel_key.key && modifier_state.all_held(&cancel_key.modifiers) {
|
if key == cancel_key.key && s.modifier_state.all_held(&cancel_key.modifiers) {
|
||||||
if now.duration_since(last_event_time) >= debounce_duration {
|
if now.duration_since(s.last_event_time) >= debounce_duration {
|
||||||
last_event_time = now;
|
s.last_event_time = now;
|
||||||
debug!("Cancel key pressed");
|
debug!("Cancel key pressed");
|
||||||
if tx.send(HotkeyEvent::Cancel).is_err() {
|
if tx.send(HotkeyEvent::Cancel).is_err() {
|
||||||
error!("Failed to send cancel event");
|
error!("Failed to send cancel event");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return;
|
return None;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Check hotkey
|
// Check hotkey — swallow it
|
||||||
if key == hotkey.key && modifier_state.all_held(&hotkey.modifiers) {
|
if key == hotkey.key && s.modifier_state.all_held(&hotkey.modifiers) {
|
||||||
if now.duration_since(last_event_time) >= debounce_duration && !hotkey_held {
|
if now.duration_since(s.last_event_time) >= debounce_duration && !s.hotkey_held {
|
||||||
last_event_time = now;
|
s.last_event_time = now;
|
||||||
hotkey_held = true;
|
s.hotkey_held = true;
|
||||||
debug!("Hotkey pressed");
|
debug!("Hotkey pressed");
|
||||||
if tx.send(HotkeyEvent::Pressed).is_err() {
|
if tx.send(HotkeyEvent::Pressed).is_err() {
|
||||||
error!("Failed to send pressed event");
|
error!("Failed to send pressed event");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
return None;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
Some(event)
|
||||||
}
|
}
|
||||||
EventType::KeyRelease(key) => {
|
EventType::KeyRelease(key) => {
|
||||||
modifier_state.update(&key, false);
|
s.modifier_state.update(&key, false);
|
||||||
|
|
||||||
// Check hotkey release (for push-to-talk)
|
// Check hotkey release — swallow it
|
||||||
if key == hotkey.key && hotkey_held {
|
if key == hotkey.key && s.hotkey_held {
|
||||||
if now.duration_since(last_event_time) >= debounce_duration {
|
if now.duration_since(s.last_event_time) >= debounce_duration {
|
||||||
last_event_time = now;
|
s.last_event_time = now;
|
||||||
hotkey_held = false;
|
s.hotkey_held = false;
|
||||||
debug!("Hotkey released");
|
debug!("Hotkey released");
|
||||||
if tx.send(HotkeyEvent::Released).is_err() {
|
if tx.send(HotkeyEvent::Released).is_err() {
|
||||||
error!("Failed to send released event");
|
error!("Failed to send released event");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
return None;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
Some(event)
|
||||||
}
|
}
|
||||||
_ => {}
|
_ => Some(event),
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
if let Err(e) = rdev::listen(callback) {
|
if let Err(e) = rdev::grab(callback) {
|
||||||
error!("Hotkey listener error: {:?}", e);
|
error!("Hotkey grab error: {:?}", e);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
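The comment in `listen` says `rdev::grab` needs an `Fn` callback, so the mutable state moves into a `RefCell`. Below is a minimal std-only sketch of that pattern. `KeyEvent`, `make_filter`, and `run_grab` are hypothetical stand-ins for illustration and are not part of rdev or of this commit.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a key event; not the rdev `Event` type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeyEvent {
    Press(char),
    Release(char),
}

/// Mutable bookkeeping the callback needs between invocations.
struct FilterState {
    held: bool,
}

/// Builds a callback that swallows (returns `None` for) a press of `hotkey`
/// and the matching release, and passes every other event through.
/// The closure is only `Fn`, so its state lives behind a `RefCell`.
pub fn make_filter(hotkey: char) -> impl Fn(KeyEvent) -> Option<KeyEvent> {
    let state = RefCell::new(FilterState { held: false });
    move |event| {
        let mut s = state.borrow_mut();
        match event {
            KeyEvent::Press(k) if k == hotkey => {
                s.held = true;
                None
            }
            KeyEvent::Release(k) if k == hotkey && s.held => {
                s.held = false;
                None
            }
            other => Some(other),
        }
    }
}

/// Hypothetical driver that, like a grab-style API, accepts only `Fn`
/// callbacks. Returns the events that were not swallowed.
pub fn run_grab<F>(events: &[KeyEvent], callback: F) -> Vec<KeyEvent>
where
    F: Fn(KeyEvent) -> Option<KeyEvent>,
{
    events.iter().filter_map(|e| callback(*e)).collect()
}
```

With `make_filter('[')`, a sequence of press `a`, press `[`, release `[`, release `a` yields only the two `a` events. A release of `[` with no prior press is passed through, which mirrors the `hotkey_held` check in the diff.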
+233
@@ -0,0 +1,233 @@
use anyhow::{Context, Result};
use serde::{Deserialize, Serialize};
use std::io::{Read, Write};
use std::sync::Arc;
use tracing::{debug, info, warn};

use crate::shared_state::SharedState;

/// Status response sent over IPC.
#[derive(Debug, Serialize, Deserialize)]
pub struct DaemonStatus {
    pub version: String,
    pub state: String,
    pub model: String,
    pub accelerator: String,
    pub uptime_secs: u64,
}

/// Returns the platform-specific IPC path.
pub fn ipc_path() -> String {
    #[cfg(unix)]
    {
        "/tmp/mouth.sock".to_string()
    }
    #[cfg(windows)]
    {
        r"\\.\pipe\mouth".to_string()
    }
}

/// Check if a daemon is already running by attempting to connect.
pub fn is_daemon_running() -> bool {
    query_daemon_status().is_ok()
}

/// Query the running daemon for its status.
pub fn query_daemon_status() -> Result<DaemonStatus> {
    let path = ipc_path();

    #[cfg(unix)]
    {
        use std::os::unix::net::UnixStream;
        let mut stream = UnixStream::connect(&path)
            .with_context(|| format!("Could not connect to daemon at {path}"))?;
        stream
            .set_read_timeout(Some(std::time::Duration::from_secs(2)))
            .ok();
        let mut buf = String::new();
        stream.read_to_string(&mut buf)?;
        let status: DaemonStatus =
            serde_json::from_str(&buf).context("Invalid status response from daemon")?;
        Ok(status)
    }

    #[cfg(windows)]
    {
        use std::fs::OpenOptions;
        let mut file = OpenOptions::new()
            .read(true)
            .write(true)
            .open(&path)
            .with_context(|| format!("Could not connect to daemon at {path}"))?;
        // Write a newline to trigger the server to respond
        file.write_all(b"\n")?;
        file.flush()?;
        // Read the response into a fixed buffer, since read_to_string waits for EOF
        let mut buf = vec![0u8; 4096];
        let n = file.read(&mut buf)?;
        let text = String::from_utf8_lossy(&buf[..n]);
        let status: DaemonStatus =
            serde_json::from_str(&text).context("Invalid status response from daemon")?;
        Ok(status)
    }
}

/// Start the IPC listener on the current thread (blocking).
/// Call this from a dedicated thread.
pub fn start_ipc_listener(shared_state: Arc<SharedState>) -> Result<()> {
    let path = ipc_path();
    info!("Starting IPC listener at {path}");

    #[cfg(unix)]
    {
        unix_listener(&path, shared_state)
    }

    #[cfg(windows)]
    {
        windows_listener(&path, shared_state)
    }
}

#[cfg(unix)]
fn unix_listener(path: &str, shared_state: Arc<SharedState>) -> Result<()> {
    use std::os::unix::net::UnixListener;

    // Clean up stale socket
    if std::path::Path::new(path).exists() {
        if is_daemon_running() {
            anyhow::bail!("Another instance of Mouth is already running");
        }
        std::fs::remove_file(path).ok();
    }

    let listener = UnixListener::bind(path).context("Failed to bind IPC socket")?;
    info!("IPC listener ready");

    for stream in listener.incoming() {
        match stream {
            Ok(mut stream) => {
                let status = build_status(&shared_state);
                match serde_json::to_string(&status) {
                    Ok(json) => {
                        if let Err(e) = stream.write_all(json.as_bytes()) {
                            debug!("Failed to write IPC response: {e}");
                        }
                    }
                    Err(e) => {
                        warn!("Failed to serialize status: {e}");
                    }
                }
            }
            Err(e) => {
                debug!("IPC accept error: {e}");
            }
        }
    }

    Ok(())
}

#[cfg(windows)]
fn windows_listener(path: &str, shared_state: Arc<SharedState>) -> Result<()> {
    use windows_sys::Win32::Foundation::{CloseHandle, INVALID_HANDLE_VALUE};
    use windows_sys::Win32::Storage::FileSystem::{
        FlushFileBuffers, ReadFile, WriteFile, PIPE_ACCESS_DUPLEX,
    };
    use windows_sys::Win32::System::Pipes::{
        ConnectNamedPipe, CreateNamedPipeW, DisconnectNamedPipe,
        PIPE_READMODE_BYTE, PIPE_TYPE_BYTE, PIPE_UNLIMITED_INSTANCES, PIPE_WAIT,
    };

    let wide_path: Vec<u16> = path.encode_utf16().chain(std::iter::once(0)).collect();

    info!("IPC listener ready");

    loop {
        let handle = unsafe {
            CreateNamedPipeW(
                wide_path.as_ptr(),
                PIPE_ACCESS_DUPLEX,
                PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT,
                PIPE_UNLIMITED_INSTANCES,
                4096,
                4096,
                0,
                std::ptr::null(),
            )
        };

        if handle == INVALID_HANDLE_VALUE {
            tracing::error!("Failed to create named pipe");
            std::thread::sleep(std::time::Duration::from_secs(1));
            continue;
        }

        // Wait for a client to connect
        let connected = unsafe { ConnectNamedPipe(handle, std::ptr::null_mut()) };
        if connected == 0 {
            let err = std::io::Error::last_os_error();
            // ERROR_PIPE_CONNECTED (535) means the client already connected, which is fine
            if err.raw_os_error() != Some(535) {
                debug!("ConnectNamedPipe error: {err}");
                unsafe { CloseHandle(handle) };
                continue;
            }
        }

        // Read the trigger byte from the client (just 1 byte to unblock)
        let mut read_buf = [0u8; 1];
        let mut bytes_read: u32 = 0;
        unsafe {
            ReadFile(
                handle,
                read_buf.as_mut_ptr(),
                1,
                &mut bytes_read,
                std::ptr::null_mut(),
            );
        }

        // Write the status response
        let status = build_status(&shared_state);
        if let Ok(json) = serde_json::to_string(&status) {
            let bytes = json.as_bytes();
            let mut written: u32 = 0;
            unsafe {
                WriteFile(
                    handle,
                    bytes.as_ptr().cast(),
                    bytes.len() as u32,
                    &mut written,
                    std::ptr::null_mut(),
                );
                FlushFileBuffers(handle);
            }
        }

        unsafe {
            DisconnectNamedPipe(handle);
            CloseHandle(handle);
        }
    }
}

fn build_status(shared_state: &SharedState) -> DaemonStatus {
    DaemonStatus {
        version: env!("CARGO_PKG_VERSION").to_string(),
        state: shared_state.get_state(),
        model: shared_state.model.clone(),
        accelerator: shared_state.accelerator.clone(),
        uptime_secs: shared_state.uptime_secs(),
    }
}

/// Clean up the IPC socket (Unix only).
pub fn cleanup() {
    #[cfg(unix)]
    {
        let path = ipc_path();
        std::fs::remove_file(&path).ok();
    }
}
||||||
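On Unix the status exchange above is one-shot: the listener writes the JSON and drops the stream, and the client's `read_to_string` returns once it sees end-of-file. Here is a minimal sketch of that handshake, assuming a Unix target and using an in-process `UnixStream::pair()` instead of `/tmp/mouth.sock`. The helper names `respond`, `read_reply`, and `round_trip` are hypothetical, and the payload is a plain string rather than a serialized `DaemonStatus`.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Server half of the one-shot exchange: write the payload, then let the
/// stream drop so the peer observes end-of-file.
pub fn respond(mut stream: UnixStream, payload: &str) -> std::io::Result<()> {
    stream.write_all(payload.as_bytes())
    // `stream` is dropped on return, which closes the server end.
}

/// Client half: read until end-of-file, with a read timeout as a safety net
/// in case the server never closes its end.
pub fn read_reply(mut stream: UnixStream) -> std::io::Result<String> {
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    let mut buf = String::new();
    stream.read_to_string(&mut buf)?;
    Ok(buf)
}

/// Runs both halves over a connected socket pair (no filesystem path).
pub fn round_trip(payload: &str) -> std::io::Result<String> {
    let (server, client) = UnixStream::pair()?;
    respond(server, payload)?;
    read_reply(client)
}
```

The Windows path in the diff cannot rely on end-of-file in the same way, which is why its client sends a trigger byte and performs a single fixed-size read.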
@@ -3,10 +3,12 @@ mod cli;
|
|||||||
mod config;
|
mod config;
|
||||||
mod coordinator;
|
mod coordinator;
|
||||||
mod hotkey;
|
mod hotkey;
|
||||||
|
mod ipc;
|
||||||
mod model_cache;
|
mod model_cache;
|
||||||
mod overlay;
|
mod overlay;
|
||||||
mod paste;
|
mod paste;
|
||||||
mod recorder;
|
mod recorder;
|
||||||
|
mod shared_state;
|
||||||
mod transcriber;
|
mod transcriber;
|
||||||
mod vad;
|
mod vad;
|
||||||
|
|
||||||
|
|||||||
@@ -82,6 +82,23 @@ pub fn ensure_model(model_name: &str) -> Result<ModelPaths> {
|
|||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Ensure the Silero VAD model is downloaded and return its path.
|
||||||
|
pub fn ensure_vad_model() -> Result<PathBuf> {
|
||||||
|
let repo_id = "onnx-community/silero-vad";
|
||||||
|
let model_file = "onnx/model.onnx";
|
||||||
|
|
||||||
|
let api = Api::new().context("Failed to create HuggingFace Hub API")?;
|
||||||
|
let repo = api.model(repo_id.to_string());
|
||||||
|
|
||||||
|
info!("Ensuring Silero VAD model from {repo_id}");
|
||||||
|
let path = repo
|
||||||
|
.get(model_file)
|
||||||
|
.with_context(|| format!("Failed to download VAD model from {repo_id}"))?;
|
||||||
|
debug!("VAD model: {}", path.display());
|
||||||
|
|
||||||
|
Ok(path)
|
||||||
|
}
|
||||||
|
|
||||||
/// Check if model files are already cached.
|
/// Check if model files are already cached.
|
||||||
pub fn is_model_cached(model_name: &str) -> bool {
|
pub fn is_model_cached(model_name: &str) -> bool {
|
||||||
ensure_model(model_name).is_ok()
|
ensure_model(model_name).is_ok()
|
||||||
|
|||||||
+139
-2
@@ -8,8 +8,8 @@ use winit::window::{Window, WindowAttributes, WindowId, WindowLevel};
|
|||||||
|
|
||||||
use crate::config::OverlayPosition;
|
use crate::config::OverlayPosition;
|
||||||
|
|
||||||
const OVERLAY_WIDTH: u32 = 200;
|
const OVERLAY_WIDTH: u32 = 150;
|
||||||
const OVERLAY_HEIGHT: u32 = 36;
|
const OVERLAY_HEIGHT: u32 = 18;
|
||||||
|
|
||||||
/// State of the overlay display.
|
/// State of the overlay display.
|
||||||
#[derive(Debug, Clone, Copy, PartialEq)]
|
#[derive(Debug, Clone, Copy, PartialEq)]
|
||||||
@@ -34,6 +34,8 @@ struct OverlayApp {
|
|||||||
surface: Option<softbuffer::Surface<std::rc::Rc<Window>, std::rc::Rc<Window>>>,
|
surface: Option<softbuffer::Surface<std::rc::Rc<Window>, std::rc::Rc<Window>>>,
|
||||||
state: OverlayState,
|
state: OverlayState,
|
||||||
position: OverlayPosition,
|
position: OverlayPosition,
|
||||||
|
_tray_icon: Option<tray_icon::TrayIcon>,
|
||||||
|
tray_exit_id: Option<tray_icon::menu::MenuId>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl OverlayApp {
|
impl OverlayApp {
|
||||||
@@ -99,6 +101,43 @@ impl OverlayApp {
|
|||||||
window.set_visible(visible);
|
window.set_visible(visible);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn create_tray_icon(&mut self) {
|
||||||
|
use tray_icon::menu::{Menu, MenuItem};
|
||||||
|
use tray_icon::TrayIconBuilder;
|
||||||
|
|
||||||
|
let menu = Menu::new();
|
||||||
|
let exit_item = MenuItem::new("Exit", true, None);
|
||||||
|
let exit_id = exit_item.id().clone();
|
||||||
|
if let Err(e) = menu.append(&exit_item) {
|
||||||
|
warn!("Failed to add tray menu item: {e}");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
let icon = match load_tray_icon() {
|
||||||
|
Ok(i) => i,
|
||||||
|
Err(e) => {
|
||||||
|
warn!("Failed to load tray icon: {e}");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
match TrayIconBuilder::new()
|
||||||
|
.with_menu(Box::new(menu))
|
||||||
|
.with_tooltip("Mouth — Speech to Text")
|
||||||
|
.with_icon(icon)
|
||||||
|
.build()
|
||||||
|
{
|
||||||
|
Ok(tray) => {
|
||||||
|
info!("System tray icon created");
|
||||||
|
self._tray_icon = Some(tray);
|
||||||
|
self.tray_exit_id = Some(exit_id);
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
warn!("Failed to create tray icon: {e}");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
impl ApplicationHandler<OverlayEvent> for OverlayApp {
|
impl ApplicationHandler<OverlayEvent> for OverlayApp {
|
||||||
@@ -154,6 +193,9 @@ impl ApplicationHandler<OverlayEvent> for OverlayApp {
|
|||||||
error!("Failed to create overlay window: {e}");
|
error!("Failed to create overlay window: {e}");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Create tray icon (must be done on the main/event-loop thread)
|
||||||
|
self.create_tray_icon();
|
||||||
}
|
}
|
||||||
|
|
||||||
fn user_event(&mut self, event_loop: &ActiveEventLoop, event: OverlayEvent) {
|
fn user_event(&mut self, event_loop: &ActiveEventLoop, event: OverlayEvent) {
|
||||||
@@ -176,6 +218,99 @@ impl ApplicationHandler<OverlayEvent> for OverlayApp {
|
|||||||
self.draw();
|
self.draw();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn about_to_wait(&mut self, event_loop: &ActiveEventLoop) {
|
||||||
|
// Poll tray menu events
|
||||||
|
if let Some(exit_id) = &self.tray_exit_id {
|
||||||
|
if let Ok(event) = tray_icon::menu::MenuEvent::receiver().try_recv() {
|
||||||
|
if event.id() == exit_id {
|
||||||
|
info!("Exit requested via tray icon");
|
||||||
|
crate::ipc::cleanup();
|
||||||
|
event_loop.exit();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn load_tray_icon() -> Result<tray_icon::Icon, Box<dyn std::error::Error>> {
|
||||||
|
const S: u32 = 32;
|
||||||
|
let mut pixels = vec![0u8; (S * S * 4) as usize];
|
||||||
|
|
||||||
|
let cx = S as f32 / 2.0;
|
||||||
|
|
||||||
|
for y in 0..S {
|
||||||
|
for x in 0..S {
|
||||||
|
let fx = x as f32 + 0.5;
|
||||||
|
let fy = y as f32 + 0.5;
|
||||||
|
let idx = ((y * S + x) * 4) as usize;
|
||||||
|
|
||||||
|
let mut alpha: f32 = 0.0;
|
||||||
|
|
||||||
|
// Microphone body: rounded rectangle (capsule shape)
|
||||||
|
// Center x=16, from y=3 to y=18, radius 5.5
|
||||||
|
let mic_top = 3.0;
|
||||||
|
let mic_bot = 18.0;
|
||||||
|
let mic_r = 5.5;
|
||||||
|
let mic_cx = cx;
|
||||||
|
{
|
||||||
|
let dy = fy.clamp(mic_top + mic_r, mic_bot - mic_r);
|
||||||
|
let dist = ((fx - mic_cx).powi(2) + (fy - dy).powi(2)).sqrt();
|
||||||
|
if dist <= mic_r {
|
||||||
|
alpha = 1.0;
|
||||||
|
} else if dist <= mic_r + 1.0 {
|
||||||
|
alpha = alpha.max(mic_r + 1.0 - dist); // anti-alias
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cradle arc: U-shape below mic, from y=14 to y=22
|
||||||
|
{
|
||||||
|
let arc_cy = 14.0;
|
||||||
|
let arc_r = 8.5;
|
||||||
|
let arc_thickness = 2.2;
|
||||||
|
let dx = fx - cx;
|
||||||
|
let dy = fy - arc_cy;
|
||||||
|
let dist = (dx * dx + dy * dy).sqrt();
|
||||||
|
if fy >= arc_cy && dist >= arc_r - arc_thickness / 2.0 && dist <= arc_r + arc_thickness / 2.0 {
|
||||||
|
let edge_outer = (arc_r + arc_thickness / 2.0 - dist).min(1.0).max(0.0);
|
||||||
|
let edge_inner = (dist - (arc_r - arc_thickness / 2.0)).min(1.0).max(0.0);
|
||||||
|
alpha = alpha.max(edge_outer.min(edge_inner));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Stem: vertical line from arc bottom to near bottom
|
||||||
|
{
|
||||||
|
let stem_top = 22.0;
|
||||||
|
let stem_bot = 27.0;
|
||||||
|
let stem_w = 1.2;
|
||||||
|
if fy >= stem_top && fy <= stem_bot && (fx - cx).abs() <= stem_w {
|
||||||
|
let edge = (stem_w - (fx - cx).abs()).min(1.0);
|
||||||
|
alpha = alpha.max(edge);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Base: horizontal line at bottom
|
||||||
|
{
|
||||||
|
let base_y = 27.0;
|
||||||
|
let base_h = 2.0;
|
||||||
|
let base_hw = 5.0;
|
||||||
|
if fy >= base_y && fy <= base_y + base_h && (fx - cx).abs() <= base_hw {
|
||||||
|
let edge = (base_hw - (fx - cx).abs()).min(1.0);
|
||||||
|
alpha = alpha.max(edge);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let a = (alpha.clamp(0.0, 1.0) * 255.0) as u8;
|
||||||
|
// White icon with alpha (looks good on both light and dark taskbars)
|
||||||
|
pixels[idx] = 255; // R
|
||||||
|
pixels[idx + 1] = 255; // G
|
||||||
|
pixels[idx + 2] = 255; // B
|
||||||
|
pixels[idx + 3] = a; // A
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
let icon = tray_icon::Icon::from_rgba(pixels, S, S)?;
|
||||||
|
Ok(icon)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Create an event loop and return the proxy for sending events.
|
/// Create an event loop and return the proxy for sending events.
|
||||||
@@ -195,6 +330,8 @@ pub fn run_event_loop(
|
|||||||
surface: None,
|
surface: None,
|
||||||
state: OverlayState::Hidden,
|
state: OverlayState::Hidden,
|
||||||
position,
|
position,
|
||||||
|
_tray_icon: None,
|
||||||
|
tray_exit_id: None,
|
||||||
};
|
};
|
||||||
|
|
||||||
event_loop.run_app(&mut app)
|
event_loop.run_app(&mut app)
|
||||||
|
|||||||
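The microphone body in `load_tray_icon` is drawn as a capsule: each pixel centre is measured against the nearest point on a vertical segment, with a one-pixel linear falloff for anti-aliasing. A minimal sketch of that coverage calculation as a standalone function follows. `capsule_alpha` is a hypothetical helper and is not in the commit, which computes the same value inline with an `if`/`else if`.

```rust
/// Coverage in [0, 1] of the pixel centre `(fx, fy)` for a vertical capsule
/// centred on `x = cx`, spanning `top..bot` with radius `r`.
/// Assumes `bot - top >= 2.0 * r`; otherwise `clamp` panics.
pub fn capsule_alpha(fx: f32, fy: f32, cx: f32, top: f32, bot: f32, r: f32) -> f32 {
    // Nearest point on the capsule's central segment (same x, clamped y).
    let seg_y = fy.clamp(top + r, bot - r);
    let dist = ((fx - cx).powi(2) + (fy - seg_y).powi(2)).sqrt();
    // 1.0 inside the capsule, linear falloff across one pixel, 0.0 beyond.
    (r + 1.0 - dist).clamp(0.0, 1.0)
}
```

With the values from the diff (`cx = 16.0`, `top = 3.0`, `bot = 18.0`, `r = 5.5`), a pixel centre on the axis gets full coverage, and one that sits 6.0 from the axis at mid-height gets 0.5.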
+9
-1
@@ -7,6 +7,9 @@ use std::sync::{Arc, Mutex};
|
|||||||
use tracing::{debug, error, info, warn};
|
use tracing::{debug, error, info, warn};
|
||||||
|
|
||||||
const TARGET_SAMPLE_RATE: u32 = 16000;
|
const TARGET_SAMPLE_RATE: u32 = 16000;
|
||||||
|
/// Silence prepended to recordings to give the model a clean lead-in,
|
||||||
|
/// compensating for mic startup latency.
|
||||||
|
const LEAD_IN_MS: u32 = 300;
|
||||||
|
|
||||||
/// Commands sent to the recorder.
|
/// Commands sent to the recorder.
|
||||||
#[derive(Debug)]
|
#[derive(Debug)]
|
||||||
@@ -252,8 +255,13 @@ pub fn run(
|
|||||||
|
|
||||||
debug!("Resampled to {} samples at {}Hz", samples.len(), TARGET_SAMPLE_RATE);
|
debug!("Resampled to {} samples at {}Hz", samples.len(), TARGET_SAMPLE_RATE);
|
||||||
|
|
||||||
|
// Prepend silence to compensate for mic startup latency
|
||||||
|
let lead_in_samples = (TARGET_SAMPLE_RATE * LEAD_IN_MS / 1000) as usize;
|
||||||
|
let mut padded = vec![0.0f32; lead_in_samples];
|
||||||
|
padded.extend_from_slice(&samples);
|
||||||
|
|
||||||
let audio = AudioData {
|
let audio = AudioData {
|
||||||
samples,
|
samples: padded,
|
||||||
sample_rate: TARGET_SAMPLE_RATE,
|
sample_rate: TARGET_SAMPLE_RATE,
|
||||||
};
|
};
|
||||||
|
|
||||||
|
|||||||
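For the lead-in padding, the arithmetic at the target rate is 16000 × 300 / 1000 = 4800 zero samples ahead of the recording. Here is a small sketch of the same step as a reusable function. `prepend_silence` is a hypothetical name, and the `u64` intermediate is an addition to avoid overflow at unusual rates; the commit does this inline in `run` with `u32` arithmetic.

```rust
/// Returns `samples` with `lead_in_ms` milliseconds of silence prepended,
/// given the sample rate in Hz.
pub fn prepend_silence(samples: &[f32], sample_rate: u32, lead_in_ms: u32) -> Vec<f32> {
    // Widen before multiplying so rate * ms cannot overflow u32.
    let lead_in = (sample_rate as u64 * lead_in_ms as u64 / 1000) as usize;
    let mut padded = vec![0.0f32; lead_in];
    padded.extend_from_slice(samples);
    padded
}
```

Calling `prepend_silence(&samples, 16_000, 300)` returns a buffer that is 4800 samples longer than the input, with the original audio starting at index 4800.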
@@ -0,0 +1,35 @@
use std::sync::RwLock;
use std::time::Instant;

/// Thread-safe shared state accessible by the coordinator, IPC listener, and tray icon.
pub struct SharedState {
    pub state: RwLock<String>,
    pub model: String,
    pub accelerator: String,
    pub started_at: Instant,
}

impl SharedState {
    pub fn new(model: String, accelerator: String) -> Self {
        Self {
            state: RwLock::new("idle".to_string()),
            model,
            accelerator,
            started_at: Instant::now(),
        }
    }

    pub fn set_state(&self, state: &str) {
        if let Ok(mut s) = self.state.write() {
            *s = state.to_string();
        }
    }

    pub fn get_state(&self) -> String {
        self.state.read().map(|s| s.clone()).unwrap_or_else(|_| "unknown".to_string())
    }

    pub fn uptime_secs(&self) -> u64 {
        self.started_at.elapsed().as_secs()
    }
}
+6
-6
@@ -22,7 +22,7 @@ pub struct Transcriber {
|
|||||||
encoder: Session,
|
encoder: Session,
|
||||||
decoder: Session,
|
decoder: Session,
|
||||||
vocab: Vec<String>,
|
vocab: Vec<String>,
|
||||||
blank_id: i64,
|
blank_id: i32,
|
||||||
vocab_size: usize,
|
vocab_size: usize,
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -45,7 +45,7 @@ impl Transcriber {
|
|||||||
|
|
||||||
let vocab = load_vocab(&paths.vocab)?;
|
let vocab = load_vocab(&paths.vocab)?;
|
||||||
let vocab_size = vocab.len();
|
let vocab_size = vocab.len();
|
||||||
let blank_id = (vocab_size - 1) as i64; // <blk> is the last token
|
let blank_id = (vocab_size - 1) as i32; // <blk> is the last token
|
||||||
info!("Vocab loaded: {vocab_size} tokens, blank_id={blank_id}");
|
info!("Vocab loaded: {vocab_size} tokens, blank_id={blank_id}");
|
||||||
|
|
||||||
Ok(Self {
|
Ok(Self {
|
||||||
@@ -121,7 +121,7 @@ impl Transcriber {
|
|||||||
Ok((enc_data.to_vec(), feat_dim, encoded_length))
|
Ok((enc_data.to_vec(), feat_dim, encoded_length))
|
||||||
}
|
}
|
||||||
|
|
||||||
fn tdt_greedy_decode(&mut self, encoder_output: &[f32], feat_dim: usize, encoded_length: usize) -> Result<Vec<i64>> {
|
fn tdt_greedy_decode(&mut self, encoder_output: &[f32], feat_dim: usize, encoded_length: usize) -> Result<Vec<i32>> {
|
||||||
// Determine decoder LSTM state dimensions by inspecting input metadata
|
// Determine decoder LSTM state dimensions by inspecting input metadata
|
||||||
// Default fallback values
|
// Default fallback values
|
||||||
let mut state_shape: [usize; 3] = [1, 1, 640];
|
let mut state_shape: [usize; 3] = [1, 1, 640];
|
||||||
@@ -168,7 +168,7 @@ impl Transcriber {
|
|||||||
let frame = Array3::from_shape_vec([1, feat_dim, 1], frame_data)?;
|
let frame = Array3::from_shape_vec([1, feat_dim, 1], frame_data)?;
|
||||||
|
|
||||||
let targets = ndarray::Array2::from_shape_vec((1, 1), vec![prev_token])?;
|
let targets = ndarray::Array2::from_shape_vec((1, 1), vec![prev_token])?;
|
||||||
let target_length = ndarray::Array1::from_vec(vec![1i64]);
|
let target_length = ndarray::Array1::from_vec(vec![1i32]);
|
||||||
|
|
||||||
let outputs = self.decoder.run(vec![
|
let outputs = self.decoder.run(vec![
|
||||||
make_input("encoder_outputs", Value::from_array(frame)?.into_dyn()),
|
make_input("encoder_outputs", Value::from_array(frame)?.into_dyn()),
|
||||||
@@ -186,7 +186,7 @@ impl Transcriber {
|
|||||||
let token_logits = &output_data[..self.vocab_size];
|
let token_logits = &output_data[..self.vocab_size];
|
||||||
let duration_logits = &output_data[self.vocab_size..];
|
let duration_logits = &output_data[self.vocab_size..];
|
||||||
|
|
||||||
let token_id = argmax(token_logits) as i64;
|
let token_id = argmax(token_logits) as i32;
|
||||||
let duration = if !duration_logits.is_empty() {
|
let duration = if !duration_logits.is_empty() {
|
||||||
argmax(duration_logits)
|
argmax(duration_logits)
|
||||||
} else {
|
} else {
|
||||||
@@ -225,7 +225,7 @@ impl Transcriber {
|
|||||||
Ok(tokens)
|
Ok(tokens)
|
||||||
}
|
}
|
||||||
|
|
||||||
fn tokens_to_text(&self, tokens: &[i64]) -> String {
|
fn tokens_to_text(&self, tokens: &[i32]) -> String {
|
||||||
let mut text = String::new();
|
let mut text = String::new();
|
||||||
for &token_id in tokens {
|
for &token_id in tokens {
|
||||||
if token_id >= 0 && (token_id as usize) < self.vocab.len() {
|
if token_id >= 0 && (token_id as usize) < self.vocab.len() {
|
||||||
|
|||||||