From 1d36dcd6683313c349e7fc1cdefb56aa52866e57 Mon Sep 17 00:00:00 2001 From: Steve Cliff Date: Sat, 9 May 2026 12:29:00 +0100 Subject: [PATCH] v1 readiness: CHANGELOG + threat model + first-run onboarding polish - CHANGELOG.md: Keep-a-Changelog format, v1.0.0 entry summarising what each phase delivered. - docs/threat-model.md: structured walkthrough of assets, actors, attack surfaces and residual risks; reviewed against v1.0.0. - cmd/server/main.go: at first-run startup, print a clickable $RM_BASE_URL/bootstrap URL alongside the existing one-shot bootstrap token (or a fallback hint when RM_BASE_URL is unset). - web/templates/pages/bootstrap.html: visible "Minimum 12 characters" hint under the password field so the rule is communicated before the operator submits. - tasks.md: close X-01, X-04, X-05 with notes. --- CHANGELOG.md | 89 ++++++++++++++++++++ cmd/server/main.go | 14 +++- docs/threat-model.md | 126 +++++++++++++++++++++++++++++ tasks.md | 6 +- web/templates/pages/bootstrap.html | 1 + 5 files changed, 231 insertions(+), 5 deletions(-) create mode 100644 CHANGELOG.md create mode 100644 docs/threat-model.md diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..e74b24d --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,89 @@ +# Changelog + +All notable changes to this project are documented here. +The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), +and the project follows [Semantic Versioning](https://semver.org/). + +## [Unreleased] + +## [1.0.0] - 2026-05-09 + +First tagged release. Six development phases brought the project from +empty repo to a self-hostable, multi-tenant restic backup orchestrator +with a web UI, JSON API, and self-updating agent fleet. + +### Phase 1 — MVP: enrolment, visibility, on-demand backup + +- HTTP server, SQLite store with migrations, AEAD-encrypted + credentials at rest, Argon2id password hashing, session cookies. 
+- WebSocket transport between server and agents (heartbeat, hello, + schedule fan-out, job log streaming). +- Agent install path for Linux (systemd unit + `install.sh`); one-time + enrolment tokens with embedded repo credentials. +- Run-now backup execution end-to-end, snapshot listing. +- Server-side encrypted repo creds pushed to the agent on hello. + +### Phase 2 — Scheduling, retention, repo operations + +- Source groups (paths + excludes + pre/post hooks + bandwidth caps) + decoupled from schedules; a schedule fires a source group. +- Cron-style schedules with retention policies, server-driven + reconciliation push and ack. +- `restic forget`, `prune`, `check`, `unlock` automation; periodic + maintenance ticker with per-host stagger. +- Pending-runs queue with backpressure (`max_concurrent_jobs` per + host). +- Repo stats panel on the host detail page (size, last-check, last- + prune, stale-lock banner). +- Auto-init of repos on first onboard with credential-failure surface + on the host detail page. +- Announce-and-approve enrolment path for hosts that don't have a + pre-minted token (Ed25519 fingerprint, operator approves). +- Windows agent: SCM service integration + `install.ps1` installer. +- Cross-platform alt-enrolment (announce flow on Windows). + +### Phase 3 — Restore, alerts, audit + +- Restore wizard: pick a snapshot, pick paths, pick a target + (in-place / new directory), live progress. +- Snapshot diff against parent. +- Alert engine: per-source-group dedup, severity tiers, ack / resolve. +- Live-refresh alerts table with severity cues. +- Audit log UI with filters, sort, CSV export, payload-detail modal. + +### Phase 4 — RBAC, OIDC, host tags + +- Role-based access control: viewer / operator / admin. +- User management UI (invite, role change, disable, password reset). +- Generic OIDC SSO with JIT user provisioning + role mapping. +- Per-host tags with chip-row filter on the dashboard. 
+ +### Phase 5 — OSS readiness + +- mdBook-rendered docs site at `docs/book/`. +- Contributor onboarding (CONTRIBUTING.md, security policy, license). +- Docker-only release pipeline + reference deployment compose file. +- Playwright e2e harness covering the smoke runbook. + +### Phase 6 — Update delivery + observability + +- Agent self-update: server-side channel pin per host, signed binary + fetch via the WS transport, atomic swap with rollback on failure. +- Fleet-wide update orchestration with per-host stagger and an admin + pause switch. +- Prometheus `/metrics` endpoint + Grafana dashboard JSON. +- Repo size trend per host (90-day rolling) on the host detail page. + +### Cross-cutting + +- Live dashboard with column sort, filters, free-text host search, + background-tab-aware live refresh (5s cadence). +- Pure-Go binary with embedded UI, no Node/CGO at runtime. +- Reproducible `-trimpath -ldflags="-s -w"` builds for + linux/amd64, linux/arm64, windows/amd64. +- Sharded CI (server-http / store / rest), pre-commit hooks (gofumpt, + go vet, golangci-lint). +- Threat model published (`docs/threat-model.md`). + +[Unreleased]: https://gitea.dcglab.co.uk/steve/restic-manager/compare/v1.0.0...HEAD +[1.0.0]: https://gitea.dcglab.co.uk/steve/restic-manager/releases/tag/v1.0.0 diff --git a/cmd/server/main.go b/cmd/server/main.go index 45f8f15..9da8c13 100644 --- a/cmd/server/main.go +++ b/cmd/server/main.go @@ -9,6 +9,7 @@ import ( "os" "os/signal" "path/filepath" + "strings" "syscall" "time" @@ -145,9 +146,18 @@ func run() error { // text exactly once; we hash it into BootstrapToken on the // server-side handler. 
fmt.Fprintln(os.Stderr, "================================================================") - fmt.Fprintln(os.Stderr, " FIRST RUN — bootstrap token (use within 1 hour, then it's gone):") + fmt.Fprintln(os.Stderr, " FIRST RUN — no admin user exists yet.") + if cfg.BaseURL != "" { + fmt.Fprintln(os.Stderr, " Open this URL in a browser to create the first administrator:") + fmt.Fprintln(os.Stderr, " "+strings.TrimRight(cfg.BaseURL, "/")+"/bootstrap") + } else { + fmt.Fprintln(os.Stderr, " Open the server URL in a browser; you'll be sent to /bootstrap.") + fmt.Fprintln(os.Stderr, " (Set RM_BASE_URL to have a clickable link printed here.)") + } + fmt.Fprintln(os.Stderr, "") + fmt.Fprintln(os.Stderr, " Headless? POST {token, username, password} to /api/bootstrap") + fmt.Fprintln(os.Stderr, " with this one-shot token (valid until first user is created):") fmt.Fprintln(os.Stderr, " "+token) - fmt.Fprintln(os.Stderr, " POST it to /api/bootstrap with {token, username, password}.") fmt.Fprintln(os.Stderr, "================================================================") } diff --git a/docs/threat-model.md b/docs/threat-model.md new file mode 100644 index 0000000..1772d2d --- /dev/null +++ b/docs/threat-model.md @@ -0,0 +1,126 @@ +# Threat model + +A short, structured walkthrough of the assets restic-manager +protects, the actors that interact with it, the attack surfaces +exposed, and the mitigations in place. This document is written for +operators considering a deployment and for contributors evaluating +security-sensitive changes. It is **not** a formal certification — +restic-manager has not been third-party audited. + +Last reviewed: **2026-05-09** (against v1.0.0). + +--- + +## 1. Assets + +In rough order of sensitivity: + +| Asset | Why it matters | +|---|---| +| **Restic repository passwords** | Decrypt every backup in the repo. Server holds them encrypted at rest; agents need plaintext at backup-time. | +| **Repository URLs with embedded credentials** (e.g. 
`rest:https://user:pass@host/repo`) | Same as above — read access to the repo is leak-equivalent to the password. | +| **Agent bearer tokens** | Long-lived credentials authenticating each agent → server WS. Compromise lets an attacker impersonate that host (push fake snapshots, ack fake schedule versions, exfiltrate repo creds the server pushes back). | +| **Server session cookies** | Browser-side session for human operators. Compromise = full UI access at the user's role for the cookie's TTL (24h). | +| **Database secret key** | Wraps every encrypted-at-rest field (repo creds, agent enrolment payloads). Disclosure of the file lets an attacker decrypt the stored repo creds, and through them the backups; rotation requires re-pushing creds to every agent. | +| **Bootstrap / setup tokens** | One-shot, time-limited; mint admin or invited-user accounts. | +| **Audit log** | Tamper-evident record of admin actions; read-only via UI. | +| **Backup data on the wire** | Restic itself encrypts on the agent before sending — see "out of scope". | + +--- + +## 2. Actors + +| Actor | Trust | +|---|---| +| **Anonymous internet** | Untrusted. Should not reach the server unless proxied behind auth (see deployment guide). | +| **Authenticated viewer** | Read-only on hosts/jobs/alerts/audit. | +| **Authenticated operator** | Add/remove hosts, edit schedules, run backups/restores, mint enrolment tokens, ack alerts. | +| **Authenticated admin** | All of the above plus user management, role changes, and fleet update controls. Admins do not get secret-key visibility. | +| **Agent** | Trusted to backup-and-report on its own host only. Cannot read other hosts' creds. Bearer-authenticated. | +| **Restic backend (rest-server / S3 / B2 / etc.)** | Out of scope for this document — assumed to authenticate the credentials presented and not collude. | + +--- + +## 3. Attack surfaces and mitigations + +### 3.1 First-run bootstrap + +- **Surface**: `/bootstrap` UI + `/api/bootstrap` JSON endpoint. 
+- **Risk**: race between server start and admin creation — an attacker who reaches the server first can claim admin. +- **Mitigations**: + - Bootstrap token printed to stderr exactly once; held in memory, not persisted. + - The UI form on `/bootstrap` uses the in-memory token automatically (no token field for the operator to type or expose). + - Both surfaces self-disable the moment any user row exists (`CountUsers > 0`). + - Token is also blanked from process memory after success (defence in depth). +- **Residual risk**: if an operator brings up the server on the public internet before reaching the bootstrap page, an attacker reaching `/bootstrap` first wins. **Recommendation**: bring the server up behind an existing trusted network or with the listener bound to `127.0.0.1` until first-run is complete. + +### 3.2 Local user accounts + +- **Surface**: `/login`, `/api/auth/login`. +- **Mitigations**: Argon2id password hashing with per-deployment params; constant-time password compare; session-cookie minting via `crypto/rand`; session rows hash-only (raw token only in cookie). +- **Rate limiting**: Currently not in place at the application layer — the project assumes a reverse proxy enforces login throttling. **Recommendation**: front the server with `caddy`/`nginx` rate-limit rules in production. +- **Password policy**: 12-character minimum on bootstrap and user-setup paths; no maximum, no rotation, no history. Sufficient for self-hosted ops; tighten in policy if a deployment requires it. + +### 3.3 OIDC SSO + +- **Surface**: `/auth/oidc/*` — generic OIDC client, JIT user provisioning. +- **Mitigations**: state + nonce per flow; role mapping is server-configured (claims trusted only to identify the user, not pick role); user-disabled gate runs after IdP success. +- **Residual risk**: misconfigured role-mapping rules can promote any IdP user to admin. **Recommendation**: review `cfg.OIDC.RoleMappings` carefully. 
+ +### 3.4 Agent enrolment + +- **Surface**: `/api/agents/enroll` (token-authenticated), `/api/agents/announce` (anonymous, then operator-approves). +- **Mitigations**: + - Token path: one-shot, hashed at rest, 1h TTL; agent receives a fresh long-lived bearer in the response. + - Announce path: agent supplies an Ed25519 public key; operator sees a fingerprint to confirm out-of-band before accepting. + - Bearer tokens are SHA-256 hashed in the DB. +- **Residual risk**: an attacker on the network between operator and target host who intercepts the install snippet can enrol *as* the target. The install script must be served over TLS in production (the docker-only deployment enables TLS by default; bare-metal deployers must configure their own). + +### 3.5 Agent → server WebSocket + +- **Surface**: persistent WS authenticated by agent bearer. +- **Mitigations**: bearer is presented per-connection; server pins the agent fingerprint for the announce flow; messages are envelope-typed and rejected if shape-invalid. +- **No payload-level signing** today — TLS is the integrity boundary. A man-in-the-middle with a valid cert chain could swap messages. **Recommendation**: pin the server cert via `RM_SERVER_CERT_PIN_SHA256` if running over a network you don't fully control. + +### 3.6 Repo credential lifecycle + +- Stored encrypted at rest under the AEAD secret key. +- Pushed to the agent over the WS on hello, on creds change, and on demand. +- Agent persists them encrypted (per-host secret key derived from a value known only to the agent). +- Logged surfaces use `restic.RedactURL()` to strip `user:pass@` from URLs before they reach `slog`. +- Plaintext form is constructed only at `exec.Command` time inside the agent, never stored on a struct field that could be slogged. + +### 3.7 Restore + +- Operators can restore to any path the agent (running as root) can write. +- Cross-host restore (host A's snapshot → host C) is **deferred** — see F-01. 
The current single-host restore does not require granting any cross-host privileges. + +### 3.8 Audit log + +- Append-only writes from the application; SQLite enforces no schema-level immutability. +- A compromise of the SQLite file (via OS-level access) can edit the audit log. **Recommendation**: ship audit entries to an append-only sink (syslog / Loki / Splunk) if tamper-evidence beyond the OS boundary is required. + +### 3.9 Self-update channel (P6) + +- Agents fetch new binaries via the WS transport from the server. +- Binaries are signature-checked by the agent against a key embedded in the existing agent (see `internal/fleetupdate/`). +- **Residual risk**: a server compromise lets the attacker push code to every agent (running as root). The signing-key compromise window is the same as the server compromise window because both live on the server. Splitting the signing key onto a separate signer is future work (not v1). + +--- + +## 4. Out of scope + +- **Restic itself** — its repository format, encryption, and backend protocol are upstream-trusted. +- **The host OS** — root compromise of a host obviously compromises that host's backups. +- **The backup destination** — restic-manager assumes the rest-server / object-store / SFTP target enforces its own auth. +- **Side-channel attacks** on the server process (RAM dump, process tracing). +- **Physical access** to the server's disk. + +--- + +## 5. Reporting + +Found something we missed? See `SECURITY.md` for the disclosure +process. Coordinated disclosure preferred; the project is +maintained by a small team and we'll respond as quickly as we +reasonably can. diff --git a/tasks.md b/tasks.md index 4599f2a..fc094a3 100644 --- a/tasks.md +++ b/tasks.md @@ -480,11 +480,11 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days. ## Cross-cutting / ongoing -- [ ] **X-01** Keep CHANGELOG.md updated (Keep-a-Changelog format) +- [x] **X-01** Keep CHANGELOG.md updated (Keep-a-Changelog format). 
✅ Landed: `CHANGELOG.md` at the repo root with a v1.0.0 entry summarising what each phase shipped, plus an empty Unreleased section to accumulate changes after the tag. Updated on each release going forward. - [ ] **X-02** Track restic version compatibility matrix - [ ] **X-03** Periodic dependency updates (`dependabot` or `renovate`) -- [ ] **X-04** Threat-model review at end of each phase -- [ ] **X-05** Proper first-run onboarding UI: admin shouldn't need to `curl` `/api/bootstrap` by hand. Render the bootstrap form on the same login page (extra "setup token" field shown only while no admin user exists, hidden after); on submit POST to `/api/bootstrap`, then drop straight into a session. Surface the one-time token from the server log somewhere copy-able (or print a clickable URL with the token in the query string at first-run). Also: relax the 12-char password floor for the first-run path or document it in the form so `admin` doesn't silently fail validation. +- [x] **X-04** Threat-model review at end of each phase. ✅ Landed: `docs/threat-model.md` covering assets, actors, attack surfaces (bootstrap, local accounts, OIDC, agent enrolment, agent ↔ server WS, credential lifecycle, restore, audit log, self-update channel), residual risks, and explicit out-of-scope items. Reviewed against v1.0.0 surface; refresh on each tagged release. +- [x] **X-05** Proper first-run onboarding UI. ✅ Landed: bootstrap form already lives at `/bootstrap` and `/login` redirects to it when no users exist (so an operator hitting the server in a browser is guided into setup automatically — the form takes username + password only, no token field needed because the server holds the in-memory token and applies it server-side). 
Improvements added here: at first-run startup the server now prints a clickable `$RM_BASE_URL/bootstrap` URL (or a fallback message when `RM_BASE_URL` is unset) alongside the existing one-shot token for headless `/api/bootstrap` use; the bootstrap form's password field shows an explicit "Minimum 12 characters" hint so the rule is visible before submission instead of failing on submit. --- diff --git a/web/templates/pages/bootstrap.html b/web/templates/pages/bootstrap.html index 5e3c744..e5b1274 100644 --- a/web/templates/pages/bootstrap.html +++ b/web/templates/pages/bootstrap.html @@ -36,6 +36,7 @@ +
Minimum 12 characters.
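
Reviewer note: the first-run banner change in `cmd/server/main.go` builds the printed link by trimming trailing slashes from the configured base URL and appending `/bootstrap`, with a fallback when `RM_BASE_URL` is unset. A minimal standalone sketch of that logic follows, in case it helps when testing the behaviour in isolation. The helper name `bootstrapURL` and the `rm.example.com` host are assumptions made for illustration and do not appear in the patch.

```go
package main

import (
	"fmt"
	"strings"
)

// bootstrapURL is a hypothetical helper (not part of the patch) that mirrors
// the banner logic: it returns the first-run URL for a configured base URL.
// The boolean is false when baseURL is empty, i.e. RM_BASE_URL is unset.
func bootstrapURL(baseURL string) (string, bool) {
	if baseURL == "" {
		return "", false
	}
	// TrimRight removes every trailing "/" so the join never yields "//bootstrap".
	return strings.TrimRight(baseURL, "/") + "/bootstrap", true
}

func main() {
	// Example inputs are illustrative only.
	for _, base := range []string{"https://rm.example.com/", "https://rm.example.com", ""} {
		if u, ok := bootstrapURL(base); ok {
			fmt.Println("open:", u)
		} else {
			fmt.Println("base URL unset: browse to the server and follow the redirect to /bootstrap")
		}
	}
}
```

Returning a boolean alongside the URL keeps the "unset" case explicit, which matches the two branches the banner prints.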