steve 6165e34f6f docs: P3 alerts design spec
Phase 3 sub-spec covering the alerts engine, notification channels, and
UI (P3-05/06/07). Brainstorm ran 2026-05-04; all ten design decisions
locked before this spec was written.

Key decisions captured:

- Hardcoded rule set, no operator-tunable thresholds in v1. Six rules:
  backup_failed, forget_failed, prune_failed, check_failed,
  stale_schedule, agent_offline.
- Hybrid engine cadence: event hooks at MarkJobFinished + offline-sweeper
  for immediate triggers; one 60s ticker for stale-schedule detection +
  auto-resolution sweeps.
- Auto-resolve when underlying condition clears; manual Resolve any time;
  Acknowledge as a separate I-have-seen-it intermediate state that does
  NOT close the alert.
- v1 channels: native ntfy + webhook. Apprise + SMTP deferred. Channel
  scope is global only — no per-host or per-severity routing.
- Webhook payload is one stable JSON envelope shape across raised /
  acknowledged / resolved / test events; ntfy uses the standard publish
  format with severity → priority mapping.
- Per-channel Send Test Notification button hits the real send path with
  a synthetic info-severity event; inline green-tick / red-cross result.
- Dedup by (host_id, kind, resolved_at IS NULL); last_seen_at bumped on
  every confirming tick so the UI can render still happening · Ns ago
  without re-notifying.
- Top-level /alerts page; Settings shell with Notifications sub-tab.
  Per-host vitals Open alerts cell deep-links into filtered list.
- Best-effort fire-and-forget delivery with 5s timeout; failures logged
  to a new notification_log table but never retried. Alert row in the DB
  is the source of truth.

Migrations:
- 0013 adds alerts.last_seen_at (column-level ALTER per CLAUDE.md)
- 0014 adds notification_channels + notification_log tables

Wireframe: _diag/p3-alerts-wireframe/wireframe.html
2026-05-04 18:39:26 +01:00
2026-05-04 18:39:26 +01:00
2026-05-01 00:03:59 +01:00
2026-05-01 00:03:59 +01:00
2026-05-02 11:12:58 +01:00
2026-05-02 11:12:58 +01:00
2026-05-01 00:03:59 +01:00

restic-manager

Self-hosted, browser-based, single-pane-of-glass for managing restic backups across a fleet of Linux and Windows endpoints.

Status: pre-alpha. Phase 0 (project bootstrap) complete; Phase 1 (MVP) in progress. See spec.md for the design and tasks.md for the roadmap.

What it does (target)

  • Central visibility into backup state for every endpoint
  • Trigger any restic operation remotely (backup, forget, prune, check, unlock, snapshots, stats, diff, restore)
  • Manage per-host backup schedules from the UI
  • Live job progress streamed back to the UI
  • Restore wizard (browse snapshots, pick paths, restore to original or alternate host)
  • Repo health surfacing (size, dedup ratio, last check, lock state)
  • Alerting on failure or staleness
  • Cross-platform agent (Linux + Windows)
  • Ransomware-resistant repo access via append-only credentials

Architecture (one-line summary)

A small Go control-plane on the Proxmox host, lightweight Go agents on each endpoint that hold an outbound WebSocket to the control-plane, and a restic/rest-server on Unraid that holds the actual backup data. The control-plane never touches backup bytes.

Full architecture diagram and component breakdown: spec.md §3.

Repository layout

cmd/server/        control-plane binary
cmd/agent/         endpoint agent binary
internal/api       shared API types (REST + WS envelopes)
internal/server/   HTTP, WS, UI handlers
internal/agent/    service integration, restic runner, local scheduler
internal/restic    restic CLI wrapper
internal/store     SQLite persistence
internal/crypto    secret encryption
internal/auth      passwords, sessions, agent tokens
web/               server-rendered templates + static assets
deploy/            Dockerfile, docker-compose.yml, install scripts
design/            UI wireframes (Phase 0 design pass)

Local development

Requires Go 1.25+ (built and tested on 1.26). The floor is set by modernc.org/sqlite v1.50.

make build           # builds cmd/server and cmd/agent into ./bin
make test            # runs go test ./...
make lint            # runs golangci-lint
make run-server      # runs the server (dev defaults)

License

PolyForm Noncommercial 1.0.0 — see LICENSE. Free for personal, hobby, research, educational, governmental, and other noncommercial use. Commercial use requires a separate license.

S
Description
No description provided
Readme 2.9 MiB
2026-05-09 12:58:56 +01:00
Languages
Go 68.6%
HTML 28.5%
CSS 1.4%
TypeScript 0.5%
Makefile 0.4%
Other 0.5%