steve 25aa001135 phase 0: project bootstrap
P0-01 Go module + cmd/server + cmd/agent skeletons + internal/ tree
P0-02 LICENSE (PolyForm NC 1.0.0), README, CONTRIBUTING
P0-03 golangci-lint, pre-commit, .editorconfig, .gitignore
P0-04 Gitea Actions CI: test (race+coverage), lint, cross-platform build matrix
P0-05 Dockerfile.server (multi-stage, distroless/static), docker-compose.yml
P0-06 Makefile with build/test/lint/fmt/run/release targets

build, vet, test, and cross-compile to linux/{amd64,arm64} + windows/amd64
all verified locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:03:59 +01:00


restic-manager — Tasks

Tasks are grouped by phase. Each task has an ID for cross-referencing, an estimated size (S/M/L), and acceptance criteria.

Sizes: S = under a day, M = 1–3 days, L = 3–7 days.


Phase 0 — Project bootstrap

  • P0-01 (S) Initialize Go module, cmd/server, cmd/agent, baseline internal/ packages
  • P0-02 (S) Add LICENSE (PolyForm Noncommercial 1.0.0), README stub, CONTRIBUTING placeholder
  • P0-03 (S) Set up golangci-lint, gofumpt, goimports; pre-commit config
  • P0-04 (S) Gitea Actions: build matrix (linux amd64/arm64, windows amd64), unit tests, lint
  • P0-05 (S) Dockerfile.server (multi-stage, distroless), deploy/docker-compose.yml
  • P0-06 (S) Makefile / taskfile.yml with common targets (build, test, run, release)

Phase 1 — MVP: enrollment, visibility, on-demand backup

Server foundations

  • P1-01 (M) HTTP server scaffolding (chi, structured logging via slog, graceful shutdown)
  • P1-02 (M) SQLite store layer (modernc.org/sqlite) + migrations (golang-migrate or hand-rolled)
  • P1-03 (M) Schema for users, sessions, hosts, repos, credentials, jobs, job_logs, snapshots, audit_log
  • P1-04 (M) Auth: argon2id password hashing, login/logout, session cookies, CSRF middleware
  • P1-05 (S) First-run admin bootstrap (printed one-time setup token in server logs)
  • P1-06 (M) Secret encryption helper (AEAD with key from RM_SECRET_KEY_FILE)
  • P1-07 (M) Audit log writer + middleware
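
As a rough illustration of P1-06, the sketch below seals and opens a secret with an AEAD. The choice of AES-256-GCM, the nonce-prefixed layout, and the names `sealSecret` and `openSecret` are assumptions made for this example; the task only specifies "AEAD with key from RM_SECRET_KEY_FILE". The key here is random and stands in for the file contents.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 32-byte key (AES-256).
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("key must be 32 bytes")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealSecret encrypts plaintext and returns nonce || ciphertext || tag.
func sealSecret(key, plaintext []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst appends the ciphertext after the nonce.
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// openSecret reverses sealSecret and fails if the data was modified.
func openSecret(key, sealed []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := aead.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("sealed value too short")
	}
	return aead.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32) // stand-in for the contents of RM_SECRET_KEY_FILE
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := sealSecret(key, []byte("repo-password"))
	if err != nil {
		panic(err)
	}
	plain, err := openSecret(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

Storing the nonce alongside the ciphertext keeps each encrypted column self-describing, at the cost of 12 extra bytes per value.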

Agent ↔ server protocol

  • P1-08 (M) Define shared API types in internal/api (Go structs, JSON tags)
  • P1-09 (L) WebSocket transport (nhooyr.io/websocket), framed JSON envelopes, request/response correlation, ping/pong, reconnect with backoff
  • P1-10 (M) Enrollment flow: POST /api/agents/enroll with one-time token → returns persistent bearer + cert pin
  • P1-11 (M) Agent registration on connect (hello message → upsert host record, mark online)
  • P1-12 (S) Heartbeat handler (mark host offline after 90s without heartbeat)
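
The "framed JSON envelopes" and "request/response correlation" in P1-09 could look roughly like the following. This is a sketch only: the `Envelope` fields, the `Pending` type, and the message types `job.start` and `job.accepted` are hypothetical, and the WebSocket transport itself is left out so the example runs on the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Envelope is a hypothetical frame shape; the field names are assumptions.
type Envelope struct {
	ID      string          `json:"id"`
	ReplyTo string          `json:"reply_to,omitempty"`
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload,omitempty"`
}

// Pending tracks in-flight requests by ID so a reply can reach its caller.
type Pending struct {
	mu      sync.Mutex
	waiters map[string]chan Envelope
}

func NewPending() *Pending {
	return &Pending{waiters: make(map[string]chan Envelope)}
}

// Expect registers interest in the reply to request id.
func (p *Pending) Expect(id string) <-chan Envelope {
	ch := make(chan Envelope, 1) // buffered so Resolve never blocks
	p.mu.Lock()
	p.waiters[id] = ch
	p.mu.Unlock()
	return ch
}

// Resolve hands a reply to its waiter; it returns false if nobody is waiting.
func (p *Pending) Resolve(reply Envelope) bool {
	p.mu.Lock()
	ch, ok := p.waiters[reply.ReplyTo]
	delete(p.waiters, reply.ReplyTo)
	p.mu.Unlock()
	if !ok {
		return false
	}
	ch <- reply
	return true
}

func main() {
	pending := NewPending()
	req := Envelope{ID: "req-1", Type: "job.start", Payload: json.RawMessage(`{"kind":"backup"}`)}
	reply := pending.Expect(req.ID)

	wire, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	fmt.Println("would send:", string(wire))

	// Simulate the peer answering over the socket.
	pending.Resolve(Envelope{ID: "srv-7", ReplyTo: req.ID, Type: "job.accepted"})
	fmt.Println("got reply:", (<-reply).Type)
}
```

A real client would also need to drop waiters on timeout or disconnect so the map does not grow across reconnects.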

Agent foundations

  • P1-13 (M) Agent config file (/etc/restic-manager/agent.yaml / %PROGRAMDATA%\restic-manager\agent.yaml)
  • P1-14 (M) Service integration: systemd unit + Windows service entrypoint
  • P1-15 (M) Outbound WS client with reconnect, server cert pinning
  • P1-16 (M) Restic wrapper: locate restic binary, run with --json, stream parsed events
  • P1-17 (S) Host metadata collection (OS, arch, hostname, restic version, agent version)
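
For P1-16, one way to "stream parsed events" is to read restic's `--json` output line by line. The sketch below decodes only three fields that restic's backup output carries (`message_type`, `percent_done`, `snapshot_id`); the `Event` struct and the function names are made up for illustration, and silently skipping non-JSON lines is an assumption rather than a stated requirement.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Event holds the subset of restic's JSON fields this sketch cares about.
type Event struct {
	MessageType string  `json:"message_type"`
	PercentDone float64 `json:"percent_done"`
	SnapshotID  string  `json:"snapshot_id"`
}

// parseEvents reads newline-delimited JSON and calls emit for each event.
// Lines that do not decode are skipped; a real wrapper would probably
// forward them as raw log output instead.
func parseEvents(r io.Reader, emit func(Event)) error {
	sc := bufio.NewScanner(r) // note: default line limit is 64 KiB
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev Event
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			continue
		}
		emit(ev)
	}
	return sc.Err()
}

// parseString is a convenience wrapper that collects all events.
func parseString(s string) ([]Event, error) {
	var out []Event
	err := parseEvents(strings.NewReader(s), func(ev Event) {
		out = append(out, ev)
	})
	return out, err
}

func main() {
	sample := `{"message_type":"status","percent_done":0.5}
{"message_type":"summary","snapshot_id":"abc123"}`
	events, err := parseString(sample)
	if err != nil {
		panic(err)
	}
	for _, ev := range events {
		fmt.Printf("%s %.0f%% %s\n", ev.MessageType, ev.PercentDone*100, ev.SnapshotID)
	}
}
```

In the agent, the reader would be the stdout pipe of the restic process rather than a string.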

Run-now backup

  • P1-18 (L) Job lifecycle: queued → running → succeeded/failed/cancelled, persisted with logs
  • P1-19 (M) Server endpoint POST /api/hosts/:id/jobs to dispatch a backup command
  • P1-20 (M) Agent executes restic backup, streams stdout/stderr + parsed JSON events back as job.progress / log.stream
  • P1-21 (M) Server persists log stream to job_logs, exposes WS /api/jobs/:id/stream for live tailing
  • P1-22 (S) Snapshot listing: restic snapshots --json, cached projection table, refresh after each backup

UI (HTMX + Tailwind)

  • P1-23 (M) Base layout, login page, session-aware nav
  • P1-24 (M) Dashboard: host cards (status dot, last backup, repo size)
  • P1-25 (M) Host detail page: snapshots tab + run-now button
  • P1-26 (M) Live job log viewer (WS-driven, auto-scroll, cancel button)
  • P1-27 (S) "Add host" flow: generate token, copy install command snippet
  • P1-28 (S) Tailwind build via tailwindcss standalone binary (no Node)

Install scripts

  • P1-29 (M) install.sh (Linux): detects arch, downloads agent, installs systemd unit, enrolls
  • P1-30 (M) install.ps1 (Windows): downloads agent, installs as service, enrolls
  • P1-31 (S) Server endpoint to serve agent binaries + install scripts (signed)

Phase 1 acceptance

  • One Linux + one Windows host can enroll, appear in the dashboard, and a backup can be triggered from the UI with live log streaming. Snapshots list updates after success.

Phase 2 — Scheduling, retention, repo operations

  • P2-01 (M) Schedule schema + CRUD API
  • P2-02 (L) Server-pushed schedule reconciliation (server is source of truth; agent applies)
  • P2-03 (M) Agent local scheduler (robfig/cron/v3); persists next-fire times across restarts
  • P2-04 (M) Schedule editor UI (paths, excludes, tags, cron, retention)
  • P2-05 (M) forget command with retention policy (keep-last/daily/weekly/monthly/yearly)
  • P2-06 (M) prune command (admin-only, uses non-append-only credential)
  • P2-07 (S) check command (random subset + --read-data-subset)
  • P2-08 (S) unlock command
  • P2-09 (M) Repo stats panel: size, dedup ratio, snapshot count, last check time, lock state
  • P2-10 (S) Run-now buttons for forget/prune/check/unlock on host detail page
  • P2-11 (S) Schedule "next run" / "last run" surfaced on host card
  • P2-12 (S) Bandwidth limit fields on schedule editor (--limit-upload, --limit-download); also overridable on run-now jobs
  • P2-13 (M) Pre/post backup hooks: schema (Schedule.pre_hook, Schedule.post_hook, Host.pre_hook_default, Host.post_hook_default), encrypted at rest, admin-only edit, audit-logged
  • P2-14 (M) Agent execution of hooks: configurable shell per host, pre_hook failure aborts backup, post_hook always runs with RM_JOB_STATUS env var, stdout/stderr captured into JobLog with prefix
  • P2-15 (S) Hook editor UI on schedule + host pages, with sensible warnings (e.g. "this hook runs as the agent service user")
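
The hook ordering described in P2-14 can be sketched without touching a shell. In the example below the hooks and the backup are plain Go functions so the control flow is easy to test; the status strings `succeeded` and `failed`, the function name `runWithHooks`, and the decision that a post_hook error does not change the reported status are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDemo is a placeholder failure used by the example.
var ErrDemo = errors.New("demo failure")

// runWithHooks runs pre, then backup, then post.
// A pre failure skips the backup. post always runs and receives the final
// status, which the agent would export as RM_JOB_STATUS.
func runWithHooks(pre, backup func() error, post func(status string) error) (string, error) {
	var err error
	if pre != nil {
		if err = pre(); err != nil {
			err = fmt.Errorf("pre_hook: %w", err)
		}
	}
	if err == nil {
		if err = backup(); err != nil {
			err = fmt.Errorf("backup: %w", err)
		}
	}
	status := "succeeded"
	if err != nil {
		status = "failed"
	}
	if post != nil {
		if perr := post(status); perr != nil {
			// Report the post_hook error without altering the backup status.
			err = errors.Join(err, fmt.Errorf("post_hook: %w", perr))
		}
	}
	return status, err
}

func main() {
	backupRan := false
	status, err := runWithHooks(
		func() error { return ErrDemo },
		func() error { backupRan = true; return nil },
		func(status string) error {
			fmt.Println("post_hook sees RM_JOB_STATUS=" + status)
			return nil
		},
	)
	fmt.Println("status:", status, "backup ran:", backupRan, "err:", err)
}
```

Running the post hook even after a pre-hook failure matters for the Docker stop/start case: a container stopped by a half-finished pre hook still gets a chance to be started again.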

Phase 2 acceptance

  • Schedules created in UI run on agents on time; retention is applied; admin can prune from UI; repo health visible per host. Pre/post hooks fire correctly (verified with a Docker stop/start example and a mysqldump example). Bandwidth limits honoured.
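
To make the retention and bandwidth items (P2-05, P2-12) concrete, here is a sketch of turning stored settings into restic command-line arguments. The flags themselves (`--keep-last`, `--keep-daily`, `--keep-weekly`, `--keep-monthly`, `--keep-yearly`, `--limit-upload`, `--limit-download`) are restic's own; the `Retention` and `Limits` structs and the function names are hypothetical, and treating zero as "not set" is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Retention mirrors the keep-* policy fields; zero means "not set".
type Retention struct {
	KeepLast, KeepDaily, KeepWeekly, KeepMonthly, KeepYearly int
}

// Limits holds bandwidth caps in KiB/s, the unit restic expects.
type Limits struct {
	UploadKiB, DownloadKiB int
}

// limitArgs returns the global bandwidth flags for any restic command.
func limitArgs(l Limits) []string {
	var args []string
	if l.UploadKiB > 0 {
		args = append(args, "--limit-upload", strconv.Itoa(l.UploadKiB))
	}
	if l.DownloadKiB > 0 {
		args = append(args, "--limit-download", strconv.Itoa(l.DownloadKiB))
	}
	return args
}

// backupArgs builds the argument list for a rate-limited backup.
func backupArgs(paths []string, l Limits) []string {
	args := append(limitArgs(l), "backup", "--json")
	return append(args, paths...)
}

// forgetArgs builds the argument list for applying a retention policy.
func forgetArgs(r Retention) []string {
	args := []string{"forget"}
	add := func(flag string, n int) {
		if n > 0 {
			args = append(args, flag, strconv.Itoa(n))
		}
	}
	add("--keep-last", r.KeepLast)
	add("--keep-daily", r.KeepDaily)
	add("--keep-weekly", r.KeepWeekly)
	add("--keep-monthly", r.KeepMonthly)
	add("--keep-yearly", r.KeepYearly)
	return args
}

func main() {
	fmt.Println("restic", strings.Join(backupArgs([]string{"/etc", "/home"}, Limits{UploadKiB: 1024}), " "))
	fmt.Println("restic", strings.Join(forgetArgs(Retention{KeepDaily: 7, KeepWeekly: 4}), " "))
}
```

Building an argument slice rather than a shell string avoids quoting problems when backup paths contain spaces.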

Phase 3 — Restore, alerts, audit

  • P3-01 (L) Restore wizard backend: snapshot tree browse via restic ls --json, path picker, target selection
  • P3-02 (L) Restore wizard UI (multi-step: host → snapshot → paths → target → confirm)
  • P3-03 (M) Restore execution: restic restore invocation, progress streaming
  • P3-04 (L) Cross-host restore: target agent receives a temporary scoped read credential for source host's repo (single-job, auto-revoked); UI supports source→target path remapping; warns when source paths need root and target service user is non-root
  • P3-05 (M) Alert engine: rule evaluation loop (failed backup, stale schedule, agent offline, check failed)
  • P3-06 (M) Notification channels: webhook, ntfy, SMTP email
  • P3-07 (S) Alert UI: list, acknowledge, resolve
  • P3-08 (S) Audit log UI with filters (user, action, target, time range)
  • P3-09 (S) diff between two snapshots in UI
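
The rule evaluation loop in P3-05 might reduce to a pure function over per-host state, called on a timer. The sketch below covers three of the four listed rules. The 90-second offline threshold comes from P1-12; everything else (the `HostStatus` fields, the rule names, the 26-hour staleness window, and approximating "stale schedule" by the age of the last backup) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// HostStatus is a hypothetical snapshot of what the server knows about a host.
type HostStatus struct {
	Name          string
	LastHeartbeat time.Time
	LastBackupAt  time.Time
	LastBackupOK  bool
}

// Alert names the host and the rule that fired.
type Alert struct {
	Host, Rule string
}

// evaluate applies each rule to each host and returns the alerts that fire.
func evaluate(now time.Time, hosts []HostStatus, offlineAfter, staleAfter time.Duration) []Alert {
	var alerts []Alert
	for _, h := range hosts {
		if now.Sub(h.LastHeartbeat) > offlineAfter {
			alerts = append(alerts, Alert{h.Name, "agent_offline"})
		}
		if !h.LastBackupOK {
			alerts = append(alerts, Alert{h.Name, "backup_failed"})
		} else if now.Sub(h.LastBackupAt) > staleAfter {
			alerts = append(alerts, Alert{h.Name, "stale_schedule"})
		}
	}
	return alerts
}

// demo evaluates a fixed set of hosts at a fixed time.
func demo() []Alert {
	now := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC)
	hosts := []HostStatus{
		{"web01", now.Add(-10 * time.Second), now.Add(-1 * time.Hour), true},
		{"db01", now.Add(-5 * time.Minute), now.Add(-2 * time.Hour), true},
		{"nas01", now.Add(-5 * time.Second), now.Add(-30 * time.Minute), false},
		{"old01", now.Add(-1 * time.Second), now.Add(-72 * time.Hour), true},
	}
	return evaluate(now, hosts, 90*time.Second, 26*time.Hour)
}

func main() {
	for _, a := range demo() {
		fmt.Printf("%s: %s\n", a.Host, a.Rule)
	}
}
```

Passing `now` in as a parameter keeps the function deterministic, which makes the 60-second acceptance target easier to cover with unit tests.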

Phase 3 acceptance

  • A file deleted on a host can be restored from the UI in under 2 minutes. A failed backup raises an alert via the configured channel within 60s.

Phase 4 — Self-update, RBAC polish, OIDC

  • P4-01 (L) Agent self-update: signed binary published by server, agent downloads, verifies, swaps, restarts
  • P4-02 (M) Agent version reporting on dashboard; "update all" admin action
  • P4-03 (M) RBAC enforcement at API layer (admin / operator / viewer)
  • P4-04 (S) User management UI (create/edit/disable, role assignment, password reset)
  • P4-05 (L) OIDC login (generic provider config, group → role mapping)
  • P4-06 (M) Repo size trend graphs (sparkline on host card, full chart on repo page)
  • P4-07 (S) Per-host tags + dashboard filtering by tag
  • P4-08 (M) Prometheus /metrics endpoint: per-host gauges (last backup timestamp, last backup status, repo size, snapshot count, agent online), server gauges (active alerts, build info), job duration histograms; protected by bearer token or IP allow-list
  • P4-09 (S) Document Prometheus integration + sample Grafana dashboard JSON
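
For P4-03, a simple starting point is a table of minimum roles per action. The sketch assumes the three roles are strictly ordered (viewer < operator < admin) and uses invented action names; only the admin-only status of prune (P2-06) and hook editing (P2-13) is taken from the task list.

```go
package main

import "fmt"

// Role is ordered so that a higher value implies every lower permission.
// The zero value means "no role" and is denied everything.
type Role int

const (
	Viewer Role = iota + 1
	Operator
	Admin
)

// minRole maps hypothetical action names to the lowest role allowed.
var minRole = map[string]Role{
	"snapshot.list": Viewer,
	"job.run":       Operator, // assumption: operators may trigger backups
	"repo.prune":    Admin,    // admin-only per P2-06
	"hook.edit":     Admin,    // admin-only per P2-13
}

// Allowed reports whether a role may perform an action.
// Unknown actions are denied so that new endpoints fail closed.
func Allowed(r Role, action string) bool {
	need, ok := minRole[action]
	return ok && r >= need
}

func main() {
	for _, action := range []string{"snapshot.list", "job.run", "repo.prune", "unknown"} {
		fmt.Printf("operator %-14s allowed=%v\n", action, Allowed(Operator, action))
	}
}
```

In an HTTP server this check would typically sit in middleware that reads the role from the session before the handler runs.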

Phase 4 acceptance

  • Non-admin users see an appropriately limited UI. Agents update themselves with one click. OIDC login works against at least one provider (Authelia or Authentik). Prometheus can scrape /metrics and the sample Grafana dashboard renders with live data.
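
The "downloads, verifies" step of agent self-update (P4-01) could be sketched as a detached-signature check. The task list does not name a signature scheme, so Ed25519 over a SHA-256 digest is purely an assumption here, as are the function names. The key pair is generated on the spot for the demo; a real agent would ship with the server's public key pinned.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// signBinary produces a detached signature over the SHA-256 digest of binary.
func signBinary(priv ed25519.PrivateKey, binary []byte) []byte {
	digest := sha256.Sum256(binary)
	return ed25519.Sign(priv, digest[:])
}

// verifyBinary checks that sig matches binary under the given public key.
func verifyBinary(pub ed25519.PublicKey, binary, sig []byte) bool {
	digest := sha256.Sum256(binary)
	return ed25519.Verify(pub, digest[:], sig)
}

// demo signs a fake binary and verifies it, optionally after tampering.
func demo(tamper bool) bool {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	binary := []byte("pretend agent binary")
	sig := signBinary(priv, binary)
	if tamper {
		binary[0] ^= 0xff // flip bits to simulate a corrupted download
	}
	return verifyBinary(pub, binary, sig)
}

func main() {
	fmt.Println("untouched binary verifies:", demo(false))
	fmt.Println("tampered binary verifies:", demo(true))
}
```

The swap and restart steps should only run after verification succeeds, so a failed check leaves the existing agent binary in place.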

Phase 5 — OSS readiness

  • P5-01 (M) Documentation site (mdBook or similar) with install, concepts, security model, screenshots
  • P5-02 (S) CONTRIBUTING.md, CODE_OF_CONDUCT.md, issue + PR templates
  • P5-03 (S) Release automation: goreleaser for binaries + Docker image to GHCR
  • P5-04 (S) Demo screenshots / short Loom walkthrough in README
  • P5-05 (S) SECURITY.md with disclosure process
  • P5-06 (M) End-to-end test suite in CI (Playwright vs. compose stack with sibling Linux agent)
  • P5-07 (S) Sample docker-compose.yml with TLS via Caddy sidecar
  • P5-08 (S) Optional Prometheus /metrics endpoint (duplicates P4-08)

Phase 5 acceptance

  • A stranger can read the docs and stand up a working install in under 30 minutes.

Cross-cutting / ongoing

  • X-01 Keep CHANGELOG.md updated (Keep-a-Changelog format)
  • X-02 Track restic version compatibility matrix
  • X-03 Periodic dependency updates (dependabot or renovate)
  • X-04 Threat-model review at end of each phase