Phase 3 sub-spec covering the alerts engine, notification channels, and UI (P3-05/06/07). Brainstorm ran 2026-05-04; all ten design decisions locked before this spec was written. Key decisions captured: - Hardcoded rule set, no operator-tunable thresholds in v1. Six rules: backup_failed, forget_failed, prune_failed, check_failed, stale_schedule, agent_offline. - Hybrid engine cadence: event hooks at MarkJobFinished + offline-sweeper for immediate triggers; one 60s ticker for stale-schedule detection + auto-resolution sweeps. - Auto-resolve when underlying condition clears; manual Resolve any time; Acknowledge as a separate I-have-seen-it intermediate state that does NOT close the alert. - v1 channels: native ntfy + webhook. Apprise + SMTP deferred. Channel scope is global only — no per-host or per-severity routing. - Webhook payload is one stable JSON envelope shape across raised / acknowledged / resolved / test events; ntfy uses the standard publish format with severity → priority mapping. - Per-channel Send Test Notification button hits the real send path with a synthetic info-severity event; inline green-tick / red-cross result. - Dedup by (host_id, kind, resolved_at IS NULL); last_seen_at bumped on every confirming tick so the UI can render still happening · Ns ago without re-notifying. - Top-level /alerts page; Settings shell with Notifications sub-tab. Per-host vitals Open alerts cell deep-links into filtered list. - Best-effort fire-and-forget delivery with 5s timeout; failures logged to a new notification_log table but never retried. Alert row in the DB is the source of truth. Migrations: - 0013 adds alerts.last_seen_at (column-level ALTER per CLAUDE.md) - 0014 adds notification_channels + notification_log tables Wireframe: _diag/p3-alerts-wireframe/wireframe.html
restic-manager
Self-hosted, browser-based, single-pane-of-glass for managing restic backups across a fleet of Linux and Windows endpoints.
Status: pre-alpha. Phase 0 (project bootstrap) complete; Phase 1 (MVP) in progress. See
spec.mdfor the design andtasks.mdfor the roadmap.
What it does (target)
- Central visibility into backup state for every endpoint
- Trigger any restic operation remotely (
backup,forget,prune,check,unlock,snapshots,stats,diff,restore) - Manage per-host backup schedules from the UI
- Live job progress streamed back to the UI
- Restore wizard (browse snapshots, pick paths, restore to original or alternate host)
- Repo health surfacing (size, dedup ratio, last check, lock state)
- Alerting on failure or staleness
- Cross-platform agent (Linux + Windows)
- Ransomware-resistant repo access via append-only credentials
Architecture (one-line summary)
A small Go control-plane on the Proxmox host, lightweight Go agents on each
endpoint that hold an outbound WebSocket to the control-plane, and a
restic/rest-server on Unraid that holds the actual backup data. The
control-plane never touches backup bytes.
Full architecture diagram and component breakdown:
spec.md §3.
Repository layout
cmd/server/ control-plane binary
cmd/agent/ endpoint agent binary
internal/api shared API types (REST + WS envelopes)
internal/server/ HTTP, WS, UI handlers
internal/agent/ service integration, restic runner, local scheduler
internal/restic restic CLI wrapper
internal/store SQLite persistence
internal/crypto secret encryption
internal/auth passwords, sessions, agent tokens
web/ server-rendered templates + static assets
deploy/ Dockerfile, docker-compose.yml, install scripts
design/ UI wireframes (Phase 0 design pass)
Local development
Requires Go 1.25+ (built and tested on 1.26). The floor is set by
modernc.org/sqlite v1.50.
make build # builds cmd/server and cmd/agent into ./bin
make test # runs go test ./...
make lint # runs golangci-lint
make run-server # runs the server (dev defaults)
License
PolyForm Noncommercial 1.0.0 — see LICENSE. Free for personal,
hobby, research, educational, governmental, and other noncommercial use.
Commercial use requires a separate license.