Plan — P6-04 + P6-05 Prometheus metrics + Grafana dashboard

Spec: docs/superpowers/specs/2026-05-07-p6-04-05-prometheus-metrics-design.md

Step 1 — Config wiring

Add fields to internal/server/config/config.go:
- MetricsToken string (yaml metrics_token)
- MetricsTrustedCIDRs []string (yaml metrics_trusted_cidrs)
- method (c Config) MetricsAuthEnabled() bool, returning true iff at least one of the two fields is set.
Env loading: RM_METRICS_TOKEN and RM_METRICS_TRUSTED_CIDR (comma-separated list of CIDRs).
validate() extension: ensure each CIDR parses (reuse the same netip.ParsePrefix pattern that already validates TrustedProxies).
Tests: extend config_test.go covering both env vars + happy/sad CIDR.
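
A minimal sketch of the config additions, assuming a trimmed-down Config with only the two new fields. The helper names splitCIDRs and validateMetricsCIDRs are hypothetical; the real code would fold into the existing env loader and validate().

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// Config is a hypothetical stand-in for the real struct in
// internal/server/config; only the two new metrics fields are shown.
type Config struct {
	MetricsToken        string   `yaml:"metrics_token"`
	MetricsTrustedCIDRs []string `yaml:"metrics_trusted_cidrs"`
}

// MetricsAuthEnabled reports whether at least one auth mechanism is set.
func (c Config) MetricsAuthEnabled() bool {
	return c.MetricsToken != "" || len(c.MetricsTrustedCIDRs) > 0
}

// splitCIDRs (hypothetical helper) turns a comma-separated env value into
// a list, trimming whitespace and dropping empty entries.
func splitCIDRs(raw string) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// validateMetricsCIDRs (hypothetical helper) checks that every entry
// parses as a CIDR prefix.
func validateMetricsCIDRs(cidrs []string) error {
	for _, c := range cidrs {
		if _, err := netip.ParsePrefix(c); err != nil {
			return fmt.Errorf("metrics_trusted_cidrs: invalid CIDR %q: %w", c, err)
		}
	}
	return nil
}

func main() {
	cfg := Config{MetricsTrustedCIDRs: splitCIDRs("10.0.0.0/8, 192.168.1.0/24")}
	fmt.Println("auth enabled:", cfg.MetricsAuthEnabled())
	fmt.Println("validation error:", validateMetricsCIDRs(cfg.MetricsTrustedCIDRs))
}
```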

Registry struct: sync.Mutex, map[jobKey]*histogramState where jobKey = struct{kind,status string}.
ObserveJob(kind, status string, dur time.Duration): clamps negative durations to 0, takes the lock, then increments the matching bucket and updates sum and count.
Snapshot() Snapshot: copies state under the lock and returns a plain value type.
Snapshot carries the histogram rows (kind, status, buckets, sum, count) from the registry; the caller fills in the rest (host rows, alert counts, build info).
Render(w io.Writer, s Snapshot) error: emits the standard Prometheus text exposition format with stable line ordering. No external dependency; label values are escaped by hand per the format spec (backslash as \\, double quote as \", newline as \n).
Unit tests: golden render, concurrent observe, bucket boundaries.

New internal/server/http/metrics.go:
- (s *Server) handleMetrics(w, r): calls authoriseMetricsScrape, then gatherSnapshot(ctx), then metrics.Render.
- authoriseMetricsScrape(r, cfg) (ok bool, status int): pure helper. The bearer token is compared with subtle.ConstantTimeCompare. The CIDR check runs against r.RemoteAddr first, then against X-Forwarded-For if the request came through a trusted proxy. Mirror realIP's logic; the simplest path is to reuse the chi/middleware.RealIP-aware lookup the existing handlers already use.
- gatherSnapshot(ctx): assembles the snapshot from Store.ListHosts, Store.ListAlerts({Status:"open"}), the metrics registry, and version.Version/version.Commit/runtime.Version().
Route mounted in server.go only if s.deps.Cfg.MetricsAuthEnabled().
Deps grows a Metrics *metrics.Registry field; nil-tolerant in handlers.

internal/server/ws/handler.go:
- HandlerDeps grows Metrics *metrics.Registry.
- In the MsgJobFinished branch, after the existing GetJob lookup, call ObserveJob(job.Kind, p.Status.String(), p.FinishedAt.Sub(deref(job.StartedAt))). Skip the observation if job.StartedAt is nil (rare race).
cmd/server wires the registry into both Deps and HandlerDeps from a single instance.

internal/server/metrics/registry_test.go — observe + snapshot determinism.
internal/server/metrics/render_test.go — golden output for a fixed snapshot.
internal/server/http/metrics_test.go — auth matrix (six cases per the spec) using the existing newTestServer fixture pattern. Render snapshot includes ≥1 host so we exercise the gather path end-to-end.

docs/prometheus.md: how to enable the endpoint, scrape config, metric reference, and dashboard import.
deploy/grafana/restic-manager-dashboard.json: six-panel dashboard, hand-authored against the Grafana 11 dashboard schema (uid, schemaVersion, panels with targets[].expr, datasource as a variable). Validated manually by importing into Grafana. Since we can't run Grafana in CI, the automated structural sniff test only checks that the JSON parses and contains the expected panel titles and the datasource variable.
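
The structural sniff test could be as small as the following sketch. It assumes the dashboard JSON keeps panels at the top level and the datasource variable under templating.list with type "datasource". The function name checkDashboard and the panel titles in the example are hypothetical; the real titles come from the spec.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dashboard decodes only the fields the sniff test cares about.
type dashboard struct {
	UID           string `json:"uid"`
	SchemaVersion int    `json:"schemaVersion"`
	Panels        []struct {
		Title string `json:"title"`
	} `json:"panels"`
	Templating struct {
		List []struct {
			Name string `json:"name"`
			Type string `json:"type"`
		} `json:"list"`
	} `json:"templating"`
}

// checkDashboard (hypothetical helper) verifies that raw parses as JSON,
// contains every wanted panel title, and declares a datasource variable.
func checkDashboard(raw []byte, wantTitles []string) error {
	var d dashboard
	if err := json.Unmarshal(raw, &d); err != nil {
		return fmt.Errorf("parse dashboard: %w", err)
	}
	have := make(map[string]bool, len(d.Panels))
	for _, p := range d.Panels {
		have[p.Title] = true
	}
	for _, t := range wantTitles {
		if !have[t] {
			return fmt.Errorf("missing panel %q", t)
		}
	}
	for _, v := range d.Templating.List {
		if v.Type == "datasource" {
			return nil
		}
	}
	return fmt.Errorf("no datasource template variable")
}

func main() {
	// Inline stand-in for the real file; titles are placeholders.
	raw := []byte(`{
		"uid": "example", "schemaVersion": 1,
		"panels": [{"title": "Panel A"}, {"title": "Panel B"}],
		"templating": {"list": [{"name": "datasource", "type": "datasource"}]}
	}`)
	fmt.Println(checkDashboard(raw, []string{"Panel A", "Panel B"}))
}
```

In the real test the bytes would come from reading deploy/grafana/restic-manager-dashboard.json, with the six expected titles listed explicitly.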

Strike P6-04, P6-05 in tasks.md; add an "as shipped" note mirroring the prior P6 entries.
Run go vet ./..., go test ./..., make build.
Push branch (no PR per standing instruction).

CIDR check for proxied scrapes: easy to mis-implement, easy to mis-document. The handler test must exercise both the direct-hit path and the X-Forwarded-For path.
Histogram lock contention: every job finish takes the mutex. Job throughput is low (at most a few jobs per minute per host), and ObserveJob does only a couple of map lookups under the lock, so contention should be negligible in practice.
Dashboard JSON drift: Grafana's dashboard schema evolves between versions. Pinning schemaVersion and using only well-supported panel types (timeseries, stat, table) should keep the import working across recent versions.