phase 1 foundations: api types, store, crypto, auth
Lands the bottom three layers of Phase 1:

P1-08 internal/api: protocol_version + envelope + every WS message shape from spec.md §6.2 (Hello, Heartbeat, Job*, Schedule*, etc.). Wire-format tests pin the JSON shape, so a rename here breaks tests instead of silently breaking the agent.

P1-02 + P1-03 internal/store: SQLite via modernc.org/sqlite, with embed.FS + a tiny version table for hand-rolled migrations. 0001_initial.sql covers every table from spec.md §5 plus enrollment_tokens and host_schedule_version. Typed accessors for users / sessions / enrollment / audit. WAL + foreign_keys + busy_timeout on by default.

P1-06 internal/crypto: XChaCha20-Poly1305 AEAD wrapper with a per-message random nonce. Key file lifecycle (generate + refuse-to-overwrite, load with size validation). Optional additionalData binds a ciphertext to the row that owns it.

P1-04 internal/auth (partial: passwords + tokens; sessions middleware lands with the HTTP handlers): argon2id following RFC 9106 (64 MiB / t=3 / p=4 / 32-byte tag), constant-time verify. HashToken stores the SHA-256 of session/agent/enrollment tokens, so a stolen DB doesn't hand over credentials.

Build floor moves to Go 1.25 (modernc.org/sqlite v1.50+ requires it); CI + Dockerfile + README updated.

Markdown lint diagnostics on tasks.md cleared.

All packages tested. ~70 new tests pass in <1s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
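HashToken's role can be sketched with the standard library alone. Only the digest is stored, and a presented token is hashed and compared in constant time. The helper names NewToken and VerifyToken, the 32-byte token length and the hex encoding of the digest are assumptions for illustration, not the package's confirmed API.

A single fast hash is reasonable here because tokens are high-entropy random values. Low-entropy passwords are why the slow, memory-hard argon2id path exists separately.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// NewToken returns a random bearer token; the plaintext is shown to the
// client once and never stored. (Hypothetical helper: 32 random bytes.)
func NewToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// HashToken returns the hex SHA-256 of a token: the only form written to the DB.
func HashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// VerifyToken hashes the presented token and compares digests in constant time.
func VerifyToken(presented, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(HashToken(presented)), []byte(storedHash)) == 1
}

func main() {
	tok, err := NewToken()
	if err != nil {
		panic(err)
	}
	stored := HashToken(tok)
	fmt.Println(len(stored))                  // 64
	fmt.Println(VerifyToken(tok, stored))     // true
	fmt.Println(VerifyToken(tok+"x", stored)) // false
}
```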
@@ -20,6 +20,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
## Phase 1 — MVP: enrollment, visibility, on-demand backup

### Server foundations

- [ ] **P1-01** (M) HTTP server scaffolding (`chi`, structured logging via `slog`, graceful shutdown)
- [ ] **P1-02** (M) SQLite store layer (`modernc.org/sqlite`) + migrations (`golang-migrate` or hand-rolled)
- [ ] **P1-03** (M) Schema for `users`, `sessions`, `hosts`, `repos`, `credentials`, `jobs`, `job_logs`, `snapshots`, `audit_log`
@@ -29,6 +30,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P1-07** (M) Audit log writer + middleware

### Agent ↔ server protocol

- [ ] **P1-08** (M) Define shared API types in `internal/api` (Go structs, JSON tags)
- [ ] **P1-09** (L) WebSocket transport (`nhooyr.io/websocket`), framed JSON envelopes, request/response correlation, ping/pong, reconnect with backoff
- [ ] **P1-10** (M) Enrollment flow: `POST /api/agents/enroll` with one-time token → returns persistent bearer + cert pin
@@ -36,6 +38,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P1-12** (S) Heartbeat handler (mark host offline after 90s without heartbeat)

### Agent foundations

- [ ] **P1-13** (M) Agent config file (`/etc/restic-manager/agent.yaml`); Windows path deferred to Phase 2
- [ ] **P1-14** (M) Service integration: systemd unit (Linux only in Phase 1; Windows service entrypoint deferred to Phase 2 — see P2-16)
- [ ] **P1-15** (M) Outbound WS client (`github.com/coder/websocket`) with reconnect, server cert pinning, `protocol_version` advertisement in `hello`
@@ -43,6 +46,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P1-17** (S) Host metadata collection (OS, arch, hostname, restic version, agent version, protocol_version)

### Run-now backup

- [ ] **P1-18** (L) Job lifecycle: queued → running → succeeded/failed/cancelled, persisted with logs
- [ ] **P1-19** (M) Server endpoint `POST /api/hosts/:id/jobs` to dispatch a `backup` command
- [ ] **P1-20** (M) Agent executes `restic backup`, streams stdout/stderr + parsed JSON events back as `job.progress` / `log.stream`
@@ -50,6 +54,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P1-22** (S) Snapshot listing: `restic snapshots --json`, cached projection table, refresh after each backup
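P1-18's lifecycle can be enforced with an explicit transition table checked before each state change is persisted. The JobState constants mirror the states named in the task. The Transition helper is hypothetical. Allowing a queued job to be cancelled before it starts is an assumption that the task list does not spell out.

```go
package main

import "fmt"

// JobState values mirror the P1-18 lifecycle.
type JobState string

const (
	Queued    JobState = "queued"
	Running   JobState = "running"
	Succeeded JobState = "succeeded"
	Failed    JobState = "failed"
	Cancelled JobState = "cancelled"
)

// allowed lists the legal next states. Queued -> Cancelled is an assumption.
var allowed = map[JobState][]JobState{
	Queued:  {Running, Cancelled},
	Running: {Succeeded, Failed, Cancelled},
	// Succeeded, Failed and Cancelled are terminal: no outgoing edges.
}

// Transition validates a state change before it is written to the jobs table.
func Transition(from, to JobState) error {
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("illegal job transition %s -> %s", from, to)
}

func main() {
	fmt.Println(Transition(Queued, Running))    // <nil>
	fmt.Println(Transition(Running, Succeeded)) // <nil>
	fmt.Println(Transition(Succeeded, Running)) // illegal job transition succeeded -> running
}
```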

### UI (HTMX + Tailwind)

- [ ] **P1-23** (M) Base layout, login page, session-aware nav
- [ ] **P1-24** (M) Dashboard: host cards (status dot, last backup, repo size)
- [ ] **P1-25** (M) Host detail page: snapshots tab + run-now button
@@ -58,10 +63,12 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P1-28** (S) Tailwind build via `tailwindcss` standalone binary (no Node)

### Install scripts

- [ ] **P1-29** (M) `install.sh` (Linux): detects arch, downloads agent, installs systemd unit, enrolls. Also detects existing restic timers/cron (`systemctl list-timers --all | grep -i restic`, `crontab -l`, `/etc/cron.d/`, `/etc/cron.daily/`) and prints them with the disable commands — does **not** auto-disable, since heuristic matches could be unrelated tooling
- [ ] **P1-31** (S) Server endpoint to serve agent binaries + install scripts (signed)

### Phase 1 acceptance

- One Linux host can enroll, appear in the dashboard, and a backup can be triggered from the UI with live log streaming. Snapshots list updates after success.
- Windows binary builds cleanly in CI (`.gitea/workflows/ci.yml`) but is not service-tested or installer-shipped in Phase 1 — that lands in Phase 2 (P2-16, P2-17).
- Agent ↔ server `protocol_version` handshake rejects mismatched versions with a clear error rather than failing on JSON parse.
@@ -89,6 +96,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P2-17** (M) `install.ps1` (Windows): downloads agent, installs as service, enrolls; detects existing scheduled tasks named `*restic*` and prints them for manual review

### Phase 2 acceptance

- Schedules created in UI run on agents on time; retention is applied; admin can prune from UI; repo health visible per host. Pre/post hooks fire correctly (verified with a Docker stop/start example and a `mysqldump` example) and are rejected on non-backup schedule kinds. Bandwidth limits honoured.
- A Windows host can enroll, appear in the dashboard, and run a backup with live log streaming — closing the cross-platform gap left by Phase 1.

@@ -107,6 +115,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P3-09** (S) `diff` between two snapshots in UI

### Phase 3 acceptance

- A file deleted on a host can be restored from the UI in under 2 minutes. A failed backup raises an alert via the configured channel within 60s.

---
@@ -124,6 +133,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P4-09** (S) Document Prometheus integration + sample Grafana dashboard JSON

### Phase 4 acceptance

- Non-admin users see an appropriately limited UI. Agents upgrade via apt/choco with one admin-triggered action. OIDC login works against at least one provider (Authelia or Authentik). Prometheus can scrape `/metrics` and the sample Grafana dashboard renders with live data.

---
@@ -139,6 +149,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [ ] **P5-07** (S) Sample `docker-compose.yml` with TLS via Caddy sidecar (also demonstrates `RM_TRUSTED_PROXY`)

### Phase 5 acceptance

- A stranger can read the docs and stand up a working install in under 30 minutes.

---