# restic-manager — Tasks

Tasks are grouped by phase. Each task has an ID for cross-referencing and, apart from the ongoing cross-cutting items, an estimated size (S/M/L). Phases 1–5 each close with acceptance criteria.

Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.

---

## Phase 0 — Project bootstrap

- [x] **P0-01** (S) Initialize Go module, `cmd/server`, `cmd/agent`, baseline `internal/` packages
- [x] **P0-02** (S) Add LICENSE (PolyForm Noncommercial 1.0.0), README stub, CONTRIBUTING placeholder
- [x] **P0-03** (S) Set up `golangci-lint`, `gofumpt`, `goimports`; pre-commit config
- [x] **P0-04** (S) ~~GitHub Actions~~ Gitea Actions: build matrix (linux amd64/arm64, windows amd64), unit tests, lint
- [x] **P0-05** (S) `Dockerfile.server` (multi-stage, distroless), `deploy/docker-compose.yml`
- [x] **P0-06** (S) Makefile / ~~`taskfile.yml`~~ with common targets (`build`, `test`, `run`, `release`)

---

## Phase 1 — MVP: enrollment, visibility, on-demand backup

### Server foundations

- [ ] **P1-01** (M) HTTP server scaffolding (`chi`, structured logging via `slog`, graceful shutdown)
- [ ] **P1-02** (M) SQLite store layer (`modernc.org/sqlite`) + migrations (`golang-migrate` or hand-rolled)
- [ ] **P1-03** (M) Schema for `users`, `sessions`, `hosts`, `repos`, `credentials`, `jobs`, `job_logs`, `snapshots`, `audit_log`
- [ ] **P1-04** (M) Auth: argon2id password hashing, login/logout, session cookies, CSRF middleware
- [ ] **P1-05** (S) First-run admin bootstrap (printed one-time setup token in server logs)
- [ ] **P1-06** (M) Secret encryption helper (AEAD with key from `RM_SECRET_KEY_FILE`)
- [ ] **P1-07** (M) Audit log writer + middleware
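P1-06 fixes only the inputs: an AEAD and a key read from `RM_SECRET_KEY_FILE`. A minimal sketch of such a helper, assuming AES-256-GCM, a key file that holds exactly 32 raw bytes, and ciphertexts stored as the random nonce followed by the sealed data. `SecretBox`, `Seal`, `Open`, and `LoadSecretBox` are hypothetical names, not existing project code.

```go
// Sketch for P1-06. Assumptions: AES-256-GCM as the AEAD, a 32-byte raw key
// file, and storage as nonce||ciphertext. All names are hypothetical.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"os"
)

// SecretBox encrypts and decrypts secrets with a single AEAD key.
type SecretBox struct {
	aead cipher.AEAD
}

// NewSecretBox builds a SecretBox from a 32-byte key (AES-256-GCM).
func NewSecretBox(key []byte) (*SecretBox, error) {
	if len(key) != 32 {
		return nil, fmt.Errorf("secret key must be 32 bytes, got %d", len(key))
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &SecretBox{aead: aead}, nil
}

// LoadSecretBox reads the key from the file named by RM_SECRET_KEY_FILE.
func LoadSecretBox() (*SecretBox, error) {
	path := os.Getenv("RM_SECRET_KEY_FILE")
	if path == "" {
		return nil, errors.New("RM_SECRET_KEY_FILE is not set")
	}
	key, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return NewSecretBox(key)
}

// Seal encrypts plaintext. aad binds the ciphertext to its context (for
// example table and row ID) so it cannot be moved to another row.
func (b *SecretBox) Seal(plaintext, aad []byte) ([]byte, error) {
	nonce := make([]byte, b.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Append the sealed bytes to the nonce so both are stored together.
	return b.aead.Seal(nonce, nonce, plaintext, aad), nil
}

// Open reverses Seal and fails if the ciphertext or aad was altered.
func (b *SecretBox) Open(ciphertext, aad []byte) ([]byte, error) {
	n := b.aead.NonceSize()
	if len(ciphertext) < n {
		return nil, errors.New("ciphertext too short")
	}
	return b.aead.Open(nil, ciphertext[:n], ciphertext[n:], aad)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	box, err := NewSecretBox(key)
	if err != nil {
		panic(err)
	}
	ct, err := box.Seal([]byte("repo-password"), []byte("credentials:1"))
	if err != nil {
		panic(err)
	}
	pt, err := box.Open(ct, []byte("credentials:1"))
	fmt.Println(string(pt), err) // repo-password <nil>
	_, err = box.Open(ct, []byte("credentials:2"))
	fmt.Println(err != nil) // true: mismatched aad is rejected
}
```

Passing a context string as associated data means a ciphertext copied into another row fails to decrypt, at the cost of supplying the same context on every read.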
### Agent ↔ server protocol

- [ ] **P1-08** (M) Define shared API types in `internal/api` (Go structs, JSON tags)
- [ ] **P1-09** (L) WebSocket transport (`nhooyr.io/websocket`), framed JSON envelopes, request/response correlation, ping/pong, reconnect with backoff
- [ ] **P1-10** (M) Enrollment flow: `POST /api/agents/enroll` with one-time token → returns persistent bearer + cert pin
- [ ] **P1-11** (M) Agent registration on connect (`hello` message → upsert host record, mark online)
- [ ] **P1-12** (S) Heartbeat handler (mark host offline after 90s without heartbeat)
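P1-09 calls for framed JSON envelopes with request/response correlation. The sketch below shows one possible envelope shape and an in-memory correlator that hands each reply to the caller waiting on it. The field names (`type`, `id`, `reply_to`, `payload`) and the `Correlator` type are assumptions, and the WebSocket layer itself (`nhooyr.io/websocket` in the plan) is left out so the example runs with the standard library alone.

```go
// Sketch for P1-09: a JSON envelope plus request/response correlation.
// Envelope fields and the Correlator type are hypothetical.
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"sync"
)

// Envelope is one frame on the socket. Requests carry a fresh ID; replies
// echo it in ReplyTo so the sender can match them up.
type Envelope struct {
	Type    string          `json:"type"`
	ID      string          `json:"id"`
	ReplyTo string          `json:"reply_to,omitempty"`
	Payload json.RawMessage `json:"payload,omitempty"`
}

// Correlator tracks in-flight requests by ID.
type Correlator struct {
	mu      sync.Mutex
	next    uint64
	pending map[string]chan Envelope
}

func NewCorrelator() *Correlator {
	return &Correlator{pending: make(map[string]chan Envelope)}
}

// NewRequest builds a request envelope and returns the channel its reply
// will arrive on.
func (c *Correlator) NewRequest(typ string, payload any) (Envelope, <-chan Envelope, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return Envelope{}, nil, err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.next++
	id := strconv.FormatUint(c.next, 10)
	ch := make(chan Envelope, 1) // buffered so Deliver never blocks
	c.pending[id] = ch
	return Envelope{Type: typ, ID: id, Payload: raw}, ch, nil
}

// Deliver hands a reply to its waiting caller. It returns false when the
// frame answers no pending request, so the read loop can dispatch it as a
// new message instead.
func (c *Correlator) Deliver(env Envelope) bool {
	c.mu.Lock()
	ch, ok := c.pending[env.ReplyTo]
	if ok {
		delete(c.pending, env.ReplyTo)
	}
	c.mu.Unlock()
	if ok {
		ch <- env
	}
	return ok
}

func main() {
	c := NewCorrelator()
	req, reply, err := c.NewRequest("snapshots.list", map[string]string{"repo": "r1"})
	if err != nil {
		panic(err)
	}
	frame, _ := json.Marshal(req)
	fmt.Println(string(frame)) // {"type":"snapshots.list","id":"1","payload":{"repo":"r1"}}

	// Simulate the agent's answer arriving on the read loop.
	c.Deliver(Envelope{Type: "snapshots.result", ID: "a-1", ReplyTo: req.ID})
	fmt.Println((<-reply).Type) // snapshots.result
}
```

A real client would also drop pending entries on timeout or disconnect, so callers waiting on a reply do not block forever while the agent reconnects.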
### Agent foundations

- [ ] **P1-13** (M) Agent config file (`/etc/restic-manager/agent.yaml` / `%PROGRAMDATA%\restic-manager\agent.yaml`)
- [ ] **P1-14** (M) Service integration: systemd unit + Windows service entrypoint
- [ ] **P1-15** (M) Outbound WS client with reconnect, server cert pinning
- [ ] **P1-16** (M) Restic wrapper: locate `restic` binary, run with `--json`, stream parsed events
- [ ] **P1-17** (S) Host metadata collection (OS, arch, hostname, restic version, agent version)
### Run-now backup

- [ ] **P1-18** (L) Job lifecycle: queued → running → succeeded/failed/cancelled, persisted with logs
- [ ] **P1-19** (M) Server endpoint `POST /api/hosts/:id/jobs` to dispatch a `backup` command
- [ ] **P1-20** (M) Agent executes `restic backup`, streams stdout/stderr + parsed JSON events back as `job.progress` / `log.stream`
- [ ] **P1-21** (M) Server persists log stream to `job_logs`, exposes `WS /api/jobs/:id/stream` for live tailing
- [ ] **P1-22** (S) Snapshot listing: `restic snapshots --json`, cached projection table, refresh after each backup
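The lifecycle in P1-18 is a small state machine. One way to enforce it is to validate every transition before it is written. The sketch assumes a queued job can also be cancelled before an agent picks it up, which the task list does not state; `JobState`, `Transition`, and `IsTerminal` are hypothetical names.

```go
// Sketch for P1-18: validating job state transitions.
// Names and the queued -> cancelled edge are assumptions.
package main

import "fmt"

type JobState string

const (
	Queued    JobState = "queued"
	Running   JobState = "running"
	Succeeded JobState = "succeeded"
	Failed    JobState = "failed"
	Cancelled JobState = "cancelled"
)

// allowed lists the legal next states. Terminal states have no entry.
var allowed = map[JobState][]JobState{
	Queued:  {Running, Cancelled},
	Running: {Succeeded, Failed, Cancelled},
}

// Transition returns an error unless from -> to is a legal move.
func Transition(from, to JobState) error {
	for _, s := range allowed[from] {
		if s == to {
			return nil
		}
	}
	return fmt.Errorf("illegal job transition %s -> %s", from, to)
}

// IsTerminal reports whether a state has no outgoing transitions.
func IsTerminal(s JobState) bool { return len(allowed[s]) == 0 }

func main() {
	fmt.Println(Transition(Queued, Running))    // <nil>
	fmt.Println(Transition(Succeeded, Running)) // illegal job transition succeeded -> running
	fmt.Println(IsTerminal(Failed))             // true
}
```

When persisting, a conditional update along the lines of `UPDATE jobs SET state = ? WHERE id = ? AND state = ?` would stop two racing writers (say, a cancel request and a completion event) from both succeeding; the column names there are illustrative.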
### UI (HTMX + Tailwind)

- [ ] **P1-23** (M) Base layout, login page, session-aware nav
- [ ] **P1-24** (M) Dashboard: host cards (status dot, last backup, repo size)
- [ ] **P1-25** (M) Host detail page: snapshots tab + run-now button
- [ ] **P1-26** (M) Live job log viewer (WS-driven, auto-scroll, cancel button)
- [ ] **P1-27** (S) "Add host" flow: generate token, copy install command snippet
- [ ] **P1-28** (S) Tailwind build via `tailwindcss` standalone binary (no Node)

### Install scripts

- [ ] **P1-29** (M) `install.sh` (Linux): detects arch, downloads agent, installs systemd unit, enrolls
- [ ] **P1-30** (M) `install.ps1` (Windows): downloads agent, installs as service, enrolls
- [ ] **P1-31** (S) Server endpoint to serve agent binaries + install scripts (signed)

### Phase 1 acceptance

- One Linux + one Windows host can enroll, appear in the dashboard, and a backup can be triggered from the UI with live log streaming. The snapshot list updates after a successful backup.

---

## Phase 2 — Scheduling, retention, repo operations

- [ ] **P2-01** (M) Schedule schema + CRUD API
- [ ] **P2-02** (L) Server-pushed schedule reconciliation (server is source of truth; agent applies)
- [ ] **P2-03** (M) Agent local scheduler (`robfig/cron/v3`); persists next-fire times across restarts
- [ ] **P2-04** (M) Schedule editor UI (paths, excludes, tags, cron, retention)
- [ ] **P2-05** (M) `forget` command with retention policy (keep-last/daily/weekly/monthly/yearly)
- [ ] **P2-06** (M) `prune` command (admin-only, uses non-append-only credential)
- [ ] **P2-07** (S) `check` command (random subset + `--read-data-subset`)
- [ ] **P2-08** (S) `unlock` command
- [ ] **P2-09** (M) Repo stats panel: size, dedup ratio, snapshot count, last check time, lock state
- [ ] **P2-10** (S) Run-now buttons for forget/prune/check/unlock on host detail page
- [ ] **P2-11** (S) Schedule "next run" / "last run" surfaced on host card
- [ ] **P2-12** (S) Bandwidth limit fields on schedule editor (`--limit-upload`, `--limit-download`); also overridable on run-now jobs
- [ ] **P2-13** (M) Pre/post backup hooks: schema (`Schedule.pre_hook`, `Schedule.post_hook`, `Host.pre_hook_default`, `Host.post_hook_default`), encrypted at rest, admin-only edit, audit-logged
- [ ] **P2-14** (M) Agent execution of hooks: configurable shell per host, `pre_hook` failure aborts backup, `post_hook` always runs with `RM_JOB_STATUS` env var, stdout/stderr captured into `JobLog` with prefix
- [ ] **P2-15** (S) Hook editor UI on schedule + host pages, with sensible warnings (e.g. "this hook runs as the agent service user")
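P2-05 and P2-12 both come down to building a `restic` argument list. A sketch, assuming zero means "unset" for each keep-* count and for each bandwidth cap. `--limit-upload` and `--limit-download` are restic global flags measured in KiB/s, and `--prune` is left off on purpose because P2-06 treats pruning as a separate admin-only operation. `Retention`, `Limits`, and `ForgetArgs` are hypothetical names.

```go
// Sketch for P2-05 / P2-12: building argv for `restic forget`.
// Type and function names are hypothetical.
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Retention holds the keep-* counts from the schedule; zero means unset.
type Retention struct {
	Last, Daily, Weekly, Monthly, Yearly int
}

// Limits holds optional bandwidth caps in KiB/s; zero means unlimited.
type Limits struct {
	UploadKiB, DownloadKiB int
}

// globalArgs returns restic global flags, placed before the subcommand here.
func globalArgs(l Limits) []string {
	var args []string
	if l.UploadKiB > 0 {
		args = append(args, "--limit-upload", strconv.Itoa(l.UploadKiB))
	}
	if l.DownloadKiB > 0 {
		args = append(args, "--limit-download", strconv.Itoa(l.DownloadKiB))
	}
	return args
}

// ForgetArgs builds the argument list for `restic forget`. It leaves out
// --prune because pruning is a separate, admin-only job (P2-06).
func ForgetArgs(r Retention, l Limits) ([]string, error) {
	keep := []struct {
		flag string
		n    int
	}{
		{"--keep-last", r.Last},
		{"--keep-daily", r.Daily},
		{"--keep-weekly", r.Weekly},
		{"--keep-monthly", r.Monthly},
		{"--keep-yearly", r.Yearly},
	}
	args := append(globalArgs(l), "forget", "--json")
	policy := 0
	for _, k := range keep {
		if k.n > 0 {
			args = append(args, k.flag, strconv.Itoa(k.n))
			policy++
		}
	}
	if policy == 0 {
		// Design choice for this sketch: an empty policy is a config error.
		return nil, errors.New("retention policy has no keep-* rules")
	}
	return args, nil
}

func main() {
	args, err := ForgetArgs(Retention{Last: 3, Daily: 7}, Limits{UploadKiB: 2048})
	fmt.Println(args, err) // [--limit-upload 2048 forget --json --keep-last 3 --keep-daily 7] <nil>
}
```

Passing the slice straight to `exec.Command` rather than building a shell string keeps user-entered values such as tags or paths from being interpreted by a shell.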
### Phase 2 acceptance

- Schedules created in the UI run on agents on time; retention is applied; admin can prune from the UI; repo health is visible per host. Pre/post hooks fire correctly (verified with a Docker stop/start example and a `mysqldump` example). Bandwidth limits are honoured.
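The hook rules in P2-14, which the Phase 2 acceptance check exercises with Docker and `mysqldump`, reduce to a short control flow: a failing `pre_hook` skips the backup, and the `post_hook` runs regardless with `RM_JOB_STATUS` set. A sketch under several assumptions: the per-host shell is given as an argv prefix such as `/bin/sh -c`, `RM_JOB_STATUS` carries the P1-18 state names, and a failing `post_hook` fails an otherwise successful job. `Shell`, `runHook`, and `RunWithHooks` are hypothetical names.

```go
// Sketch for P2-14: running pre/post hooks around a backup.
// Names, status values, and the post_hook failure rule are assumptions.
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Shell is the per-host argv prefix, e.g. {"/bin/sh", "-c"}.
type Shell []string

// runHook runs script through the host shell with extra environment
// variables and returns its combined output as prefixed lines.
func runHook(sh Shell, name, script string, env []string) ([]string, error) {
	args := append(append([]string{}, sh[1:]...), script)
	cmd := exec.Command(sh[0], args...)
	cmd.Env = append(os.Environ(), env...)
	var out bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &out
	err := cmd.Run()
	var lines []string
	for _, l := range strings.Split(out.String(), "\n") {
		if l != "" {
			lines = append(lines, "["+name+"] "+l)
		}
	}
	return lines, err
}

// RunWithHooks skips the backup when pre fails and always runs post,
// passing the outcome in RM_JOB_STATUS. Empty scripts are skipped.
func RunWithHooks(sh Shell, pre, post string, backup func() error, log func(string)) error {
	status := "succeeded"
	var jobErr error
	if pre != "" {
		lines, err := runHook(sh, "pre_hook", pre, nil)
		for _, l := range lines {
			log(l)
		}
		if err != nil {
			status, jobErr = "failed", fmt.Errorf("pre_hook failed, backup skipped: %w", err)
		}
	}
	if jobErr == nil {
		if err := backup(); err != nil {
			status, jobErr = "failed", err
		}
	}
	if post != "" {
		lines, err := runHook(sh, "post_hook", post, []string{"RM_JOB_STATUS=" + status})
		for _, l := range lines {
			log(l)
		}
		if err != nil && jobErr == nil {
			jobErr = fmt.Errorf("post_hook failed: %w", err)
		}
	}
	return jobErr
}

func main() {
	sh := Shell{"/bin/sh", "-c"}
	err := RunWithHooks(sh, "echo stopping app", `echo "job status: $RM_JOB_STATUS"`,
		func() error { return errors.New("restic exited with status 1") },
		func(l string) { fmt.Println(l) })
	fmt.Println("result:", err)
}
```

On Windows the same code would take a prefix such as `powershell.exe -NoProfile -Command` as the host shell.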
---

## Phase 3 — Restore, alerts, audit

- [ ] **P3-01** (L) Restore wizard backend: snapshot tree browse via `restic ls --json`, path picker, target selection
- [ ] **P3-02** (L) Restore wizard UI (multi-step: host → snapshot → paths → target → confirm)
- [ ] **P3-03** (M) Restore execution: `restic restore` invocation, progress streaming
- [ ] **P3-04** (L) Cross-host restore: target agent receives a temporary scoped read credential for source host's repo (single-job, auto-revoked); UI supports source→target path remapping; warns when source paths need root and target service user is non-root
- [ ] **P3-05** (M) Alert engine: rule evaluation loop (failed backup, stale schedule, agent offline, check failed)
- [ ] **P3-06** (M) Notification channels: webhook, ntfy, SMTP email
- [ ] **P3-07** (S) Alert UI: list, acknowledge, resolve
- [ ] **P3-08** (S) Audit log UI with filters (user, action, target, time range)
- [ ] **P3-09** (S) `diff` between two snapshots in UI
### Phase 3 acceptance

- A file deleted on a host can be restored from the UI in under 2 minutes. A failed backup raises an alert via the configured channel within 60s.

---

## Phase 4 — Self-update, RBAC polish, OIDC

- [ ] **P4-01** (L) Agent self-update: signed binary published by server, agent downloads, verifies, swaps, restarts
- [ ] **P4-02** (M) Agent version reporting on dashboard; "update all" admin action
- [ ] **P4-03** (M) RBAC enforcement at API layer (admin / operator / viewer)
- [ ] **P4-04** (S) User management UI (create/edit/disable, role assignment, password reset)
- [ ] **P4-05** (L) OIDC login (generic provider config, group → role mapping)
- [ ] **P4-06** (M) Repo size trend graphs (sparkline on host card, full chart on repo page)
- [ ] **P4-07** (S) Per-host tags + dashboard filtering by tag
- [ ] **P4-08** (M) Prometheus `/metrics` endpoint: per-host gauges (last backup timestamp, last backup status, repo size, snapshot count, agent online), server gauges (active alerts, build info), job duration histograms; protected by bearer token or IP allow-list
- [ ] **P4-09** (S) Document Prometheus integration + sample Grafana dashboard JSON
### Phase 4 acceptance

- Non-admin users see an appropriately limited UI. Agents update themselves with one click. OIDC login works against at least one provider (Authelia or Authentik). Prometheus can scrape `/metrics` and the sample Grafana dashboard renders with live data.

---

## Phase 5 — OSS readiness

- [ ] **P5-01** (M) Documentation site (mdBook or similar) with install, concepts, security model, screenshots
- [ ] **P5-02** (S) `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, issue + PR templates
- [ ] **P5-03** (S) Release automation: `goreleaser` for binaries + Docker image to GHCR
- [ ] **P5-04** (S) Demo screenshots / short Loom walkthrough in README
- [ ] **P5-05** (S) `SECURITY.md` with disclosure process
- [ ] **P5-06** (M) End-to-end test suite in CI (Playwright vs. compose stack with sibling Linux agent)
- [ ] **P5-07** (S) Sample `docker-compose.yml` with TLS via Caddy sidecar
- [ ] **P5-08** (S) Optional Prometheus `/metrics` endpoint

### Phase 5 acceptance

- A stranger can read the docs and stand up a working install in under 30 minutes.

---

## Cross-cutting / ongoing

- [ ] **X-01** Keep `CHANGELOG.md` updated (Keep-a-Changelog format)
- [ ] **X-02** Track restic version compatibility matrix
- [ ] **X-03** Periodic dependency updates (`dependabot` or `renovate`)
- [ ] **X-04** Threat-model review at the end of each phase