P5: OSS readiness — docs site, contributor onboarding, e2e harness

P5-01 — Documentation site under docs/book/ rendered with mdBook (downloaded via Makefile, same static-binary pattern as Tailwind). Structured chapters: getting started, concepts, operations, security, reference. `make docs` / `make docs-watch`. Generated output gitignored.

P5-02 — CONTRIBUTING.md rewritten from placeholder to a full guide. CODE_OF_CONDUCT.md adapted from the Contributor Covenant for a single-maintainer project. .gitea/issue_template/{bug,feature}.md and PULL_REQUEST_TEMPLATE.md.

P5-04 — Six README screenshots captured live from a fresh server bootstrap (login, empty dashboard, add-host, alerts, settings, audit log). README rewritten to centre the screenshot grid and link out to the docs site.

P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day default window), scope in/out, threat-model summary, operator hardening checklist. Mirrored as a docs-site chapter.

P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up server + sibling Linux agent (alpine + restic) + restic/rest-server. The agent uses announce-and-approve so Playwright can drive the full operator flow: bootstrap → login → accept pending → backup → verify terminal status. A second spec scrapes /metrics to assert the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every PR; local how-to in docs/e2e.md.
2026-05-07 23:56:02 +01:00
parent ff8a5dbead
commit bb4ed3502d
47 changed files with 2818 additions and 61 deletions
@@ -0,0 +1,35 @@
+# Reporting vulnerabilities
+
+The full disclosure policy lives in
+[`SECURITY.md`](https://gitea.dcglab.co.uk/steve/restic-manager/src/branch/main/SECURITY.md)
+at the repo root. The short version:
+
+- **Don't open a public issue.**
+- Send a Gitea private message to `steve` on
+  <https://gitea.dcglab.co.uk>, or email the address on the
+  maintainer's profile, with a subject like
+  `[SECURITY] restic-manager: <one-line summary>`.
+- Expect an acknowledgement within 3 working days; escalate
+  through the other channel if you don't get one.
+- Default disclosure window is **30 days from confirmed report
+  to public disclosure**, faster if a PoC is already
+  circulating, slower only by mutual agreement.
+
+## What to include
+
+Include a description of the issue and its impact, the affected
+component (server / agent / install script / docs), the version,
+and reproduction steps. A working PoC is welcome but not
+required; a credible threat model is enough.
+
+## In scope vs. out of scope
+
+See the full policy. Quick highlights:
+
+- **In scope:** server, agent, install scripts, docker image,
+  docker-compose reference, crypto choices, docs that lead to
+  insecure configs.
+- **Out of scope:** restic itself (report upstream), unpatched
+  third-party deps (report upstream first), abuse by an
+  authenticated admin (admins are designed to have full power),
+  and DoS on deployments without the recommended reverse proxy.
@@ -0,0 +1,72 @@
+# Hardening checklist
+
+A baseline for new deployments. Most of these are defaults; the
+list is here to make auditing easy.
+
+## Server
+
+- [ ] Reverse proxy in front, TLS terminating at the proxy
+      (Caddy/nginx/Traefik).
+- [ ] `RM_TRUSTED_PROXY` set to the proxy's CIDR.
+- [ ] `RM_BASE_URL` matches the public hostname and the cookie
+      scope you want.
+- [ ] `RM_COOKIE_SECURE=true` (the default; only set `false`
+      for local HTTP testing).
+- [ ] HTTP listener bound to **localhost** in the compose file,
+      not `0.0.0.0`. The reverse proxy is the only thing that
+      should reach it.
+- [ ] `secret.key` backed up separately from the database.
+- [ ] Bootstrap token consumed and the printed log line scrubbed
+      from any log archive.
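If you want to script the localhost item above, the Go sketch below checks the host side of a listen address or compose port mapping (for example `127.0.0.1:8080`). `bindIsLoopback` is a hypothetical helper and is not something restic-manager ships. It treats the literal name `localhost` as loopback rather than resolving it.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// bindIsLoopback (hypothetical helper) reports whether a listen address such
// as "127.0.0.1:8080" or "[::1]:8080" only accepts local connections.
func bindIsLoopback(listen string) (bool, error) {
	host, _, err := net.SplitHostPort(listen)
	if err != nil {
		return false, err
	}
	switch host {
	case "":
		return false, nil // ":8080" listens on every interface
	case "localhost":
		return true, nil // taken on trust; not resolved here
	}
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return false, fmt.Errorf("listen host %q: %w", host, err)
	}
	return addr.IsLoopback(), nil
}

func main() {
	for _, a := range []string{"127.0.0.1:8080", "0.0.0.0:8080", ":8080"} {
		ok, err := bindIsLoopback(a)
		fmt.Printf("%-16s loopback=%v err=%v\n", a, ok, err)
	}
}
```

Only `127.0.0.1:8080` passes. `0.0.0.0` and an empty host both expose the port on every interface.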
+
+## Authentication
+
+- [ ] Admin user has a password ≥ 12 characters (the floor).
+- [ ] OIDC enabled if you have an IdP — local password auth
+      stays as a break-glass.
+- [ ] Users who change roles or leave are disabled (not deleted),
+      which invalidates their sessions immediately.
+- [ ] The last-admin guard isn't tripped — there's always at
+      least one enabled admin user.
+
+## Repo credentials
+
+- [ ] Append-only credential set as the everyday cred for every
+      host.
+- [ ] Admin credential set only where prune cadence is enabled.
+- [ ] No credentials reused across hosts. Each host should have
+      its own credential pair so a single host compromise has a
+      single blast radius.
+- [ ] If using rest-server, the everyday user only reaches an
+      instance started with `--append-only` (the flag applies to
+      the whole server, not per user); the prune user is a
+      separate identity.
+
+## Agent
+
+- [ ] Agent runs as `root` (Linux) or `LocalSystem` (Windows)
+      **only when** the source paths require it. Otherwise pin
+      a service user that has read access to what's backed up
+      and nothing else.
+- [ ] systemd unit's sandboxing flags are intact
+      (`NoNewPrivileges`, `Protect*`, `MemoryDenyWriteExecute`).
+- [ ] Agent's config file `/etc/restic-manager/agent.yaml` is
+      mode `0600` and owned by the service user. The bearer
+      token lives in there.
+
+## Operations
+
+- [ ] Alerts wired to a real channel (webhook into Slack,
+      ntfy topic, SMTP) — not just sitting in the UI.
+- [ ] Test-fire each notification channel after configuring.
+- [ ] Audit-log retention is long enough to cover the operator's
+      incident-response window.
+- [ ] Prometheus endpoint, if enabled, gated by token AND CIDR
+      where practical (default is opt-in / off).
+
+## Recovery
+
+- [ ] A documented procedure for rotating a leaked agent bearer
+      (delete + re-enrol the host).
+- [ ] A test-restore done at least once, end-to-end, before
+      relying on the system in anger.
+- [ ] `secret.key` and the SQLite database covered by separate
+      backup paths, so no single backup holds both halves needed
+      to decrypt the stored repo credentials.
@@ -0,0 +1,110 @@
+# Threat model
+
+This page documents what restic-manager defends against, what it
+doesn't, and the trust assumptions a deployment is making. The
+canonical version lives in [`spec.md`](https://gitea.dcglab.co.uk/steve/restic-manager/src/branch/main/spec.md)
+§11; the summary here is shaped for operators rather than
+implementers.
+
+## Trust boundaries
+
+```
+┌──────────────────────────────────────────┐
+│  TRUSTED zone                            │
+│  ┌─────────────┐    ┌──────────────┐     │
+│  │  Operator's │    │   Reverse    │     │
+│  │   browser   │◄──►│    proxy     │     │  TLS terminates here
+│  └─────────────┘    └──────┬───────┘     │
+└────────────────────────────┼─────────────┘
+                             │ HTTP, plaintext
+                             │ (loopback or trusted LAN)
+┌────────────────────────────▼─────────────┐
+│  Server (control plane)                  │
+└────────────┬─────────────────────────────┘
+             │ outbound WebSocket (TLS to clients via proxy)
+             │ — bearer-authenticated
+┌────────────▼──────────────┐
+│  Agent (per host)         │  ◄── attacker model: assume one
+└────────────┬──────────────┘       endpoint can be compromised
+             │ subprocess
+             ▼
+   restic ──▶ repository (rest-server / S3 / SFTP / …)
+```
+
+## What we defend against
+
+### Network attacker between operator and server
+
+- HTTPS via the reverse proxy is the only operator-facing surface
+  on a sane deployment.
+- `RM_COOKIE_SECURE=true` (default) means the session cookie
+  refuses to ride a non-HTTPS connection.
+- `RM_TRUSTED_PROXY` gates whether `X-Forwarded-*` is honoured;
+  a request that bypasses the proxy can't spoof the client IP.
+
+### Compromised agent host
+
+- The agent's bearer token can dispatch commands **only on its
+  own host**. It can't read other hosts' state, dispatch jobs
+  on other hosts, or escalate within the control plane.
+- If you suspect a host compromise:
+  1. Delete the agent's host row via **Hosts → Delete**
+     (this cascades to the bearer hash).
+  2. Rotate the repo credential at the rest-server / object
+     store side.
+  3. Review the audit log, which lists every action that bearer
+     ever drove.
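The per-host scoping amounts to a lookup keyed by the bearer hash. The Go sketch below rests on assumptions. SHA-256 stands in for whatever hash the server really uses, and `registry`, `authorize` and `revoke` are hypothetical names. `revoke` models step 1: once the host row is gone, its bearer no longer matches anything.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

var (
	errUnknownBearer = errors.New("unknown bearer")
	errWrongHost     = errors.New("bearer is not valid for this host")
)

// registry (hypothetical) maps hex(sha256(bearer)) to the host ID the
// bearer was enrolled for; the plaintext bearer is never stored.
type registry map[string]string

func hashBearer(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// authorize lets a bearer act only on the host it was enrolled for.
func (reg registry) authorize(token, targetHost string) error {
	host, ok := reg[hashBearer(token)]
	if !ok {
		return errUnknownBearer
	}
	if host != targetHost {
		return errWrongHost
	}
	return nil
}

// revoke models deleting the host row: its bearer hash goes with it.
// (Deleting map entries while ranging over the map is safe in Go.)
func (reg registry) revoke(hostID string) {
	for h, owner := range reg {
		if owner == hostID {
			delete(reg, h)
		}
	}
}

func main() {
	reg := registry{hashBearer("tok-web01"): "web01"}
	fmt.Println(reg.authorize("tok-web01", "web01")) // <nil>
	fmt.Println(reg.authorize("tok-web01", "db01"))  // bearer is not valid for this host
	reg.revoke("web01")
	fmt.Println(reg.authorize("tok-web01", "web01")) // unknown bearer
}
```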
+
+### DB compromise without the secret key
+
+- Repo credentials are AEAD-encrypted at rest. A DB dump alone
+  doesn't expose them.
+- Agent bearer **hashes** are leaked; that's enough to
+  authenticate as any agent until you revoke. A rotation
+  procedure is just "delete + re-enrol" today.
+- Operator passwords are bcrypt-hashed; OIDC users have no
+  password to leak.
+- Session tokens are hashed; an attacker can't replay a
+  session from a DB dump.
+
+### DB compromise WITH the secret key
+
+The attacker can decrypt every credential. Treat
+`secret.key` with the same care as a password manager database.
+Back it up to a separate vault, not to the same Docker volume
+as the database.
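The following Go sketch makes "AEAD-encrypted at rest" concrete, using AES-256-GCM from the standard library. The docs don't name the algorithm, so GCM is an assumption, and so are the helper names `sealCredential`, `openCredential` and `newGCM`. The sketch covers both cases above. With the key, a stored value decrypts. With only a DB dump and no key, the attacker holds ciphertext that fails authentication.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newGCM (hypothetical) builds an AES-256-GCM AEAD from a 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, fmt.Errorf("want a 32-byte key, got %d bytes", len(key))
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealCredential returns nonce||ciphertext, so each stored value carries
// its own random nonce.
func sealCredential(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext plus auth tag to its first argument.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openCredential reverses sealCredential; any other key fails GCM's
// authentication check and yields no plaintext.
func openCredential(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(sealed) < n+gcm.Overhead() {
		return nil, fmt.Errorf("sealed value too short: %d bytes", len(sealed))
	}
	return gcm.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	key := make([]byte, 32) // stands in for the contents of secret.key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := sealCredential(key, []byte("repo-password"))
	if err != nil {
		panic(err)
	}
	pt, err := openCredential(key, sealed)
	fmt.Println(string(pt), err) // repo-password <nil>

	_, err = openCredential(make([]byte, 32), sealed) // DB dump, wrong key
	fmt.Println(err)                                   // cipher: message authentication failed
}
```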
+
+### Forget/prune as a DoS vector
+
+- The everyday backup credential cannot prune (append-only).
+- The admin credential is only pushed to the agent at the
+  moment of dispatch and discarded after the job ends.
+- Compromise of a single agent host does **not** grant prune
+  rights — at worst the attacker gets fresh write access until
+  the credential is rotated.
+
+### Operator-side typo or bad copy-paste
+
+- Repo credentials are stored encrypted; mis-typed creds fail
+  fast on the next `restic` invocation rather than silently
+  corrupting state.
+- NS-03 added auto-init: the first dispatched job after a
+  credential change runs `restic init`, surfaces the error eagerly
+  under the host's vitals strip if the creds are bad, and resets
+  the host's `repo_status` so the operator can retry without
+  hunting through job logs.
+
+## What we don't defend against
+
+- **Insider threat at the maintainer level.** A malicious
+  maintainer can publish a backdoored container; SBOM /
+  signing infrastructure (Phase 6 candidate) would help here
+  but isn't shipped today.
+- **Supply chain.** We pin module versions (`go.mod`, with
+  checksums in `go.sum`) and pin the Tailwind binary's release
+  tag, but a compromise in one of those upstreams would land here.
+- **Side-channel via restic itself.** A bug in restic that
+  enables snapshot-content disclosure is restic's problem; the
+  control plane doesn't see snapshot bytes either way.
+- **DoS via resource exhaustion** without the recommended
+  reverse-proxy / rate-limit in front. Don't expose the
+  server's HTTP port to the public internet directly.