spec/tasks: address pre-Phase-1 design feedback

Doc-only changes captured before any Phase 1 code lands.

spec.md:
- §4.1 nhooyr.io/websocket → github.com/coder/websocket (the maintained fork; the original is unmaintained)
- §4.1 RM_LISTEN documented as source of truth for the bind port; add RM_TRUSTED_PROXY env var for X-Forwarded-* handling behind Caddy/Traefik
- §4.2 Phase 1 ships Linux only; Windows binaries continue to build in CI to keep the codebase portable, but service integration + installer move to Phase 2
- §4.2 self-update via apt/choco, not bespoke signed binaries
- §5 add Host.protocol_version + Host.applied_schedule_version
- §6.2 lock protocol_version handshake semantics (clean error on mismatch, not weird JSON parse failures)
- §6.2 schedule reconciliation when server unreachable: agent keeps firing last-known-good indefinitely; server's view canonical on reconnect; UI surfaces drift via applied_schedule_version
- §6.2 schedule.set carries schedule_version; new schedule.ack agent→server message
- §10.1 cross-reference RM_LISTEN ↔ compose port mapping
- §14.3 hooks rejected at validation on non-backup schedule kinds

tasks.md:
- P1-14 / P1-30 (Windows service + install.ps1) → Phase 2 as P2-16 / P2-17
- P1-29 install.sh detects existing restic timers/cron and prints disable commands, doesn't auto-disable
- Phase 1 acceptance: drop Windows from end-to-end criterion, require windows cross-compile in CI
- P4-01 rewritten: package-manager-based update delivery
- P5-08 removed (duplicate of P4-08 Prometheus /metrics)
- Various references updated

No Go code changes; build still clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
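
The §6.2 reconciliation rules in the message above (agent keeps its last-known-good schedule, the server's view wins on reconnect, drift is surfaced through `applied_schedule_version`) could be sketched roughly as follows. This is only an illustration under assumptions: the `ScheduleSet` and `ScheduleAck` struct shapes, the `Cron` payload field, and the `HandleSet` / `HasDrift` names are hypothetical and do not come from spec.md.

```go
package main

import "fmt"

// ScheduleSet is a hypothetical shape for the server→agent schedule.set message.
type ScheduleSet struct {
	ScheduleVersion int    `json:"schedule_version"`
	Cron            string `json:"cron"` // illustrative payload field, not from the spec
}

// ScheduleAck is a hypothetical shape for the agent→server schedule.ack message.
type ScheduleAck struct {
	ScheduleVersion int `json:"schedule_version"`
}

// Agent remembers the last schedule it applied. While the server is
// unreachable it simply keeps firing this last-known-good schedule.
type Agent struct {
	applied ScheduleSet
}

// HandleSet applies whatever the server sends (the server's view is
// canonical) and returns an ack carrying the version that was applied.
func (a *Agent) HandleSet(msg ScheduleSet) ScheduleAck {
	a.applied = msg
	return ScheduleAck{ScheduleVersion: msg.ScheduleVersion}
}

// HasDrift reports whether the server's current schedule version differs
// from the version the host last acknowledged (Host.applied_schedule_version).
func HasDrift(serverVersion, appliedScheduleVersion int) bool {
	return serverVersion != appliedScheduleVersion
}

func main() {
	agent := &Agent{}
	ack := agent.HandleSet(ScheduleSet{ScheduleVersion: 3, Cron: "0 2 * * *"})
	fmt.Println("acked version:", ack.ScheduleVersion)

	// The server moved on to version 4 while the agent was offline.
	fmt.Println("drift:", HasDrift(4, ack.ScheduleVersion))
}
```

Applying every `schedule.set` unconditionally is one reading of "server's view canonical on reconnect"; a real agent might also persist the applied schedule to disk so it survives restarts.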
@@ -36,11 +36,11 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
 - [ ] **P1-12** (S) Heartbeat handler (mark host offline after 90s without heartbeat)

 ### Agent foundations
-- [ ] **P1-13** (M) Agent config file (`/etc/restic-manager/agent.yaml` / `%PROGRAMDATA%\restic-manager\agent.yaml`)
-- [ ] **P1-14** (M) Service integration: systemd unit + Windows service entrypoint
-- [ ] **P1-15** (M) Outbound WS client with reconnect, server cert pinning
+- [ ] **P1-13** (M) Agent config file (`/etc/restic-manager/agent.yaml`); Windows path deferred to Phase 2
+- [ ] **P1-14** (M) Service integration: systemd unit (Linux only in Phase 1; Windows service entrypoint deferred to Phase 2 — see P2-16)
+- [ ] **P1-15** (M) Outbound WS client (`github.com/coder/websocket`) with reconnect, server cert pinning, `protocol_version` advertisement in `hello`
 - [ ] **P1-16** (M) Restic wrapper: locate `restic` binary, run with `--json`, stream parsed events
-- [ ] **P1-17** (S) Host metadata collection (OS, arch, hostname, restic version, agent version)
+- [ ] **P1-17** (S) Host metadata collection (OS, arch, hostname, restic version, agent version, protocol_version)

 ### Run-now backup
 - [ ] **P1-18** (L) Job lifecycle: queued → running → succeeded/failed/cancelled, persisted with logs
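
For reference, the P1-18 job lifecycle (queued → running → succeeded/failed/cancelled) can be read as a small state machine. The sketch below is one possible encoding, not the project's implementation: the `JobState` type and `Transition` function are hypothetical names, and since the task line does not say whether a queued job may be cancelled directly, this sketch does not allow it.

```go
package main

import "fmt"

// JobState names follow the lifecycle listed in P1-18.
type JobState string

const (
	Queued    JobState = "queued"
	Running   JobState = "running"
	Succeeded JobState = "succeeded"
	Failed    JobState = "failed"
	Cancelled JobState = "cancelled"
)

// allowed maps each state to the states it may move to. Terminal states
// (succeeded, failed, cancelled) have no entry, so nothing follows them.
var allowed = map[JobState][]JobState{
	Queued:  {Running},
	Running: {Succeeded, Failed, Cancelled},
}

// Transition returns nil when moving from one state to another is permitted
// and a descriptive error otherwise.
func Transition(from, to JobState) error {
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("invalid job transition %s -> %s", from, to)
}

func main() {
	fmt.Println(Transition(Queued, Running))    // permitted
	fmt.Println(Transition(Succeeded, Running)) // rejected: succeeded is terminal
}
```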
@@ -58,12 +58,13 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
 - [ ] **P1-28** (S) Tailwind build via `tailwindcss` standalone binary (no Node)

 ### Install scripts
-- [ ] **P1-29** (M) `install.sh` (Linux): detects arch, downloads agent, installs systemd unit, enrolls
-- [ ] **P1-30** (M) `install.ps1` (Windows): downloads agent, installs as service, enrolls
+- [ ] **P1-29** (M) `install.sh` (Linux): detects arch, downloads agent, installs systemd unit, enrolls. Also detects existing restic timers/cron (`systemctl list-timers --all | grep -i restic`, `crontab -l`, `/etc/cron.d/`, `/etc/cron.daily/`) and prints them with the disable commands — does **not** auto-disable, since heuristic matches could be unrelated tooling
 - [ ] **P1-31** (S) Server endpoint to serve agent binaries + install scripts (signed)

 ### Phase 1 acceptance
-- One Linux + one Windows host can enroll, appear in the dashboard, and a backup can be triggered from the UI with live log streaming. Snapshots list updates after success.
+- One Linux host can enroll, appear in the dashboard, and a backup can be triggered from the UI with live log streaming. Snapshots list updates after success.
+- Windows binary builds cleanly in CI (`.gitea/workflows/ci.yml`) but is not service-tested or installer-shipped in Phase 1 — that lands in Phase 2 (P2-16, P2-17).
+- Agent ↔ server `protocol_version` handshake rejects mismatched versions with a clear error rather than failing on JSON parse.

 ---

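
The handshake criterion above (reject a mismatched `protocol_version` with a clear error instead of failing on JSON parse) suggests decoding only a minimal envelope before touching the rest of the message. A rough sketch, assuming a JSON `hello` message with `type` and `protocol_version` fields; the envelope layout, the `ServerProtocolVersion` value, and the `CheckHello` name are assumptions, not taken from spec.md.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ServerProtocolVersion is a hypothetical value for illustration.
const ServerProtocolVersion = 1

// ErrProtocolMismatch lets callers detect a version mismatch with errors.Is.
var ErrProtocolMismatch = errors.New("protocol version mismatch")

// helloEnvelope decodes only what the version check needs. Unknown fields
// from a newer agent are ignored by encoding/json, so they cannot cause a
// confusing parse failure before the version has been compared.
type helloEnvelope struct {
	Type            string `json:"type"`
	ProtocolVersion *int   `json:"protocol_version"`
}

// CheckHello validates the first message an agent sends.
func CheckHello(raw []byte) error {
	var h helloEnvelope
	if err := json.Unmarshal(raw, &h); err != nil {
		return fmt.Errorf("malformed hello: %w", err)
	}
	if h.Type != "hello" {
		return fmt.Errorf("expected hello, got %q", h.Type)
	}
	if h.ProtocolVersion == nil {
		return fmt.Errorf("%w: hello carries no protocol_version", ErrProtocolMismatch)
	}
	if *h.ProtocolVersion != ServerProtocolVersion {
		return fmt.Errorf("%w: agent speaks %d, server speaks %d",
			ErrProtocolMismatch, *h.ProtocolVersion, ServerProtocolVersion)
	}
	return nil
}

func main() {
	fmt.Println(CheckHello([]byte(`{"type":"hello","protocol_version":1}`)))
	fmt.Println(CheckHello([]byte(`{"type":"hello","protocol_version":2,"new_field":{}}`)))
}
```

Using a pointer for `ProtocolVersion` distinguishes "field absent" from "version 0", which matters if older agents predate the handshake.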
@@ -83,10 +84,13 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
 - [ ] **P2-12** (S) Bandwidth limit fields on schedule editor (`--limit-upload`, `--limit-download`); also overridable on run-now jobs
 - [ ] **P2-13** (M) Pre/post backup hooks: schema (`Schedule.pre_hook`, `Schedule.post_hook`, `Host.pre_hook_default`, `Host.post_hook_default`), encrypted at rest, admin-only edit, audit-logged
 - [ ] **P2-14** (M) Agent execution of hooks: configurable shell per host, `pre_hook` failure aborts backup, `post_hook` always runs with `RM_JOB_STATUS` env var, stdout/stderr captured into `JobLog` with prefix
-- [ ] **P2-15** (S) Hook editor UI on schedule + host pages, with sensible warnings (e.g. "this hook runs as the agent service user")
+- [ ] **P2-15** (S) Hook editor UI on schedule + host pages, with sensible warnings (e.g. "this hook runs as the agent service user"); validation enforces hooks only on `kind = backup` schedules (see spec.md §14.3)
+- [ ] **P2-16** (M) Windows service integration: agent runs under the Service Control Manager via `golang.org/x/sys/windows/svc`; install/uninstall/start/stop wired up
+- [ ] **P2-17** (M) `install.ps1` (Windows): downloads agent, installs as service, enrolls; detects existing scheduled tasks named `*restic*` and prints them for manual review

 ### Phase 2 acceptance
-- Schedules created in UI run on agents on time; retention is applied; admin can prune from UI; repo health visible per host. Pre/post hooks fire correctly (verified with a Docker stop/start example and a `mysqldump` example). Bandwidth limits honoured.
+- Schedules created in UI run on agents on time; retention is applied; admin can prune from UI; repo health visible per host. Pre/post hooks fire correctly (verified with a Docker stop/start example and a `mysqldump` example) and are rejected on non-backup schedule kinds. Bandwidth limits honoured.
+- A Windows host can enroll, appear in the dashboard, and run a backup with live log streaming — closing the cross-platform gap left by Phase 1.

 ---

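
The rule that hooks are only valid on `kind = backup` schedules (P2-15, spec.md §14.3) amounts to a small validation check. The sketch below assumes a simplified `Schedule` struct; the Go field names, the `ValidateHooks` function, and the `"prune"` kind used in the example are illustrative assumptions, with only `backup`, `pre_hook`, and `post_hook` coming from the task list.

```go
package main

import (
	"errors"
	"fmt"
)

// Schedule is a simplified stand-in for the real model.
type Schedule struct {
	Kind     string // "backup" comes from the task list; other kind names are illustrative
	PreHook  string // corresponds to Schedule.pre_hook
	PostHook string // corresponds to Schedule.post_hook
}

// ErrHooksNotAllowed is returned when a non-backup schedule carries a hook.
var ErrHooksNotAllowed = errors.New("hooks are only allowed on backup schedules")

// ValidateHooks rejects any pre or post hook on a schedule whose kind is
// not "backup". Schedules without hooks pass regardless of kind.
func ValidateHooks(s Schedule) error {
	if s.Kind == "backup" {
		return nil
	}
	if s.PreHook != "" || s.PostHook != "" {
		return fmt.Errorf("%w (kind = %q)", ErrHooksNotAllowed, s.Kind)
	}
	return nil
}

func main() {
	fmt.Println(ValidateHooks(Schedule{Kind: "backup", PreHook: "docker stop app"}))
	fmt.Println(ValidateHooks(Schedule{Kind: "prune", PostHook: "docker start app"}))
}
```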
@@ -107,10 +111,10 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.

 ---

-## Phase 4 — Self-update, RBAC polish, OIDC
+## Phase 4 — Update delivery, RBAC polish, OIDC

-- [ ] **P4-01** (L) Agent self-update: signed binary published by server, agent downloads, verifies, swaps, restarts
-- [ ] **P4-02** (M) Agent version reporting on dashboard; "update all" admin action
+- [ ] **P4-01** (M) Update delivery via OS package managers — host an apt repo (Linux) and Chocolatey package (Windows) on gitea releases. `restic-manager-agent update` is a thin wrapper over `apt-get install --only-upgrade restic-manager-agent` / `choco upgrade`. Trades flexibility for a much smaller security surface than bespoke signed binaries (see spec.md §4.2)
+- [ ] **P4-02** (M) Agent version reporting on dashboard: surface "agent N versions behind server"; "update all" admin action calls the package-manager wrapper on each host
 - [ ] **P4-03** (M) RBAC enforcement at API layer (admin / operator / viewer)
 - [ ] **P4-04** (S) User management UI (create/edit/disable, role assignment, password reset)
 - [ ] **P4-05** (L) OIDC login (generic provider config, group → role mapping)
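
The "thin wrapper" described in P4-01 could pick its package-manager command by operating system. The sketch below only builds the argument vector and does not execute anything. The `UpdateCommand` name is hypothetical, and because the task line does not name the Chocolatey package, the package name is left as a parameter rather than assumed.

```go
package main

import "fmt"

// UpdateCommand returns the package-manager invocation for the given GOOS
// value and package name. It does not run the command.
func UpdateCommand(goos, pkg string) ([]string, error) {
	switch goos {
	case "linux":
		// Matches the apt-get form quoted in P4-01.
		return []string{"apt-get", "install", "--only-upgrade", pkg}, nil
	case "windows":
		// P4-01 only says "choco upgrade"; the package name is supplied by the caller.
		return []string{"choco", "upgrade", pkg}, nil
	default:
		return nil, fmt.Errorf("no package-manager update path for %s", goos)
	}
}

func main() {
	argv, err := UpdateCommand("linux", "restic-manager-agent")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(argv)
}
```

A real wrapper would pass `runtime.GOOS` and hand the result to `os/exec`; returning the argument vector separately keeps the selection logic testable without root privileges.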
@@ -120,7 +124,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
 - [ ] **P4-09** (S) Document Prometheus integration + sample Grafana dashboard JSON

 ### Phase 4 acceptance
-- Non-admin users see an appropriately limited UI. Agents update themselves with one click. OIDC login works against at least one provider (Authelia or Authentik). Prometheus can scrape `/metrics` and the sample Grafana dashboard renders with live data.
+- Non-admin users see an appropriately limited UI. Agents upgrade via apt/choco with one admin-triggered action. OIDC login works against at least one provider (Authelia or Authentik). Prometheus can scrape `/metrics` and the sample Grafana dashboard renders with live data.

 ---

@@ -132,8 +136,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
 - [ ] **P5-04** (S) Demo screenshots / short Loom walkthrough in README
 - [ ] **P5-05** (S) `SECURITY.md` with disclosure process
 - [ ] **P5-06** (M) End-to-end test suite in CI (Playwright vs. compose stack with sibling Linux agent)
-- [ ] **P5-07** (S) Sample `docker-compose.yml` with TLS via Caddy sidecar
-- [ ] **P5-08** (S) Optional Prometheus `/metrics` endpoint
+- [ ] **P5-07** (S) Sample `docker-compose.yml` with TLS via Caddy sidecar (also demonstrates `RM_TRUSTED_PROXY`)

 ### Phase 5 acceptance
 - A stranger can read the docs and stand up a working install in under 30 minutes.