The Dockerfile only set `-X main.version=...`, so docker-built binaries
left `internal/version.Version` at its default "dev". The update logic
(host_update.go:61, hosts.go:94, fleet_update.go:101 et al.) compares
against `internal/version.Version`, so a v1.0.0 host always looked
out-of-date to a v1.0.0 server, the chip never cleared, and pressing
"update" re-downloaded the same bundled binary on a loop.
Collapse the two version sources: drop the `var version/commit/date`
locals in cmd/{server,agent}/main.go, route everything through
internal/version (now also carrying Date), and have both the Dockerfile
and the Makefile set the same single set of -X flags. Verified
end-to-end: make build and docker build both emit binaries whose
--version reflects the build VERSION.
New internal/server/metrics package emits the legacy text/plain
exposition format directly, so we don't pull in
prometheus/client_golang. Endpoint is opt-in via RM_METRICS_TOKEN
and/or RM_METRICS_TRUSTED_CIDR; route is not mounted at all if
neither gate is set. Both gates ANDed when both configured.
Per-host gauges (online, last_backup_*, repo_size_bytes,
snapshot_count, open_alerts, repo_status), server gauges
(hosts_total/online, active_alerts by severity, build_info), and
an in-memory job-duration histogram observed from the existing
MsgJobFinished branch in the WS handler.
Docs in docs/prometheus.md (enable + scrape config + metric
reference + dashboard import). Sample dashboard at
deploy/grafana/restic-manager-dashboard.json - six panels,
Grafana schema 39, single Prometheus datasource variable.
Tests: golden render, concurrent observe, bucket boundaries in
the metrics package; auth matrix (no auth -> 404, token gate,
CIDR gate, both required) in the HTTP layer.
Smoke caught this: ProtectSystem=full mounts /usr read-only so the
agent couldn't write its own .new staging file or atomic-rename over
the running binary. Adding /usr/local/bin to ReadWritePaths is the
minimum diff that lets self-update work; the whole-dir grant is
required because os.Rename needs write on the parent directory.
Smoothes the rough edges that came up exercising a live deployment.
First-run bootstrap UI: /bootstrap renders a username + password form
that uses the in-memory token directly (operator no longer copies it
out of the log); /login redirects there while bootstrap is available.
Agent reliability: failJob synthetic envelopes so command.run early
returns no longer hang the server-side job; runtime probe of restic
restore --help drives --no-ownership instead of version sniffing
(0.18.x had it removed). Server unit re-shaped: ProtectSystem=full
plus ReadWritePaths=/etc/restic-manager, no ProtectHome — restore
can now write anywhere a user might want.
Restore wizard: default target is /root/rm-restore/<job-id>/ with
clearer help text. Re-init confirm input uses .field (was .input,
which doesn't exist — text was invisible).
NS-01 host delete: store DeleteHost, admin-band /hosts/{id}/delete
with hostname-confirm danger zone, audit, FK cascade, live WS close.
NS-02 enrollment-token recovery: outstanding-tokens panel on
/hosts/new, regenerate (preserves attachments) and revoke handlers
+ audit, store-level ListOutstandingEnrollmentTokens and
DeleteEnrollmentToken.
NS-03 repo init / probe surface: migration 0020 adds
hosts.repo_status + repo_status_error; WS handler projects every
init job's outcome onto the host row (idempotent already-initialised
collapses to ready); creds-save resets status and dispatches a fresh
probe; /hosts/{id}/repo/probe retry endpoint with banner.
NS-04 dashboard live + sort + filter: query-string filter
(q/status/repo_status/tag/sort/dir), 5s htmx live poll mirroring the
alerts pattern with a localStorage live toggle, sortable column
headers, filter row + clear.
Alerts page: ack'd-by line resolves user_id ULID to username.
Compose.yaml ignored — host-specific.
The reverse proxy is assumed to live outside this project (Caddy,
nginx, Traefik, whatever the operator already runs). The reference
compose stands up only the server: image-pinned via RM_VERSION,
named volume for operator state, localhost-bound so the proxy
reaches it on loopback.
docs/reverse-proxy.md covers what the proxy must forward — the
X-Forwarded-* headers, Host, and Connection: upgrade for the agent
WebSocket and live-log streams — plus the RM_TRUSTED_PROXY CIDR
rule that gates header trust. Worked examples for Caddy, nginx
(with the websocket upgrade map + 1h proxy_read_timeout for live
logs), and Traefik.
Single public deliverable per tag: a multi-arch server image, with
cross-compiled agent binaries + install scripts + the systemd unit
baked under /opt/restic-manager/dist/. The /agent/binary and
/install/* handlers fall back from <DataDir>/... to that read-only
path so a fresh container Just Works without first-run staging;
operators can still drop a custom build into <DataDir>/ to override
per-host.
Architecture rationale: agent distribution already routes through
the running server, so the release surface mirrors that — there's
no second source of truth to keep in sync.
Workflow .gitea/workflows/release.yml triggers on v*.*.* tag-push
(fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and
workflow_dispatch (snapshot tag only). Pushes to the Gitea
container registry on this instance.
Both binaries grow main.commit + main.date ldflag targets. Makefile
and Dockerfile fill them; release workflow forwards from gitea.sha
plus a UTC timestamp.
Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md
Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md
Three small follow-ups from review:
1. Restore target is now operator-editable. Default value is the
literal '\$HOME/rm-restore/<job-id>/' (agent expands \$HOME at
run time using os.UserHomeDir(); also handles \${HOME} and ~/
prefixes). Operator can replace with any absolute path.
- ui_restore.go validates the input is either absolute or starts
with one of the recognised prefixes; other env-var refs (\$PATH
etc.) are deliberately rejected so operator paths can't pick up
arbitrary agent env values.
- host_restore.html replaces the read-only mono-text display with
a real <input>; help text spells out that \$HOME resolves
agent-side and <job-id> is substituted on dispatch.
- install.sh + the systemd unit prep /root/rm-restore so the
default works under the sandbox: ReadWritePaths gains a soft
'-/root/rm-restore' entry (the '-' makes the bind-mount soft-fail
if missing, but install.sh pre-creates it root-owned 0700).
2. --no-ownership flag now gated on restic version. The flag was
added in restic 0.17 and 0.16 rejects it. Previously dropped it
wholesale — that meant new-dir restores silently preserved
ownership against design intent on 0.17+. Now the agent threads
its detected restic version (sysinfo already collects it) through
runner.Config -> restic.Env, and RunRestore appends --no-ownership
only when AtLeastVersion(0, 17) returns true. 0.16 hosts still
restore with original uid/gid; help text in the wizard explicitly
notes this. The previous 'Original ownership is preserved' copy
was wrong for new-dir mode and is corrected.
3. golangci-lint misspell locale switched US -> UK and the codebase
swept (73 corrections, mostly behaviour/serialise/recognise/honour).
Wire-format ErrorCode 'unauthorized' -> 'unauthorised' is a tiny
contract change but the agent doesn't parse those codes today and
no external API consumers exist yet. Tests passed before + after.
Tests:
- internal/restic/version_test.go covers Env.AtLeastVersion across
edge cases (empty, exact match, patch above, minor below, non-
numeric) and expandHome on \$HOME / \${HOME} / ~/, plus
pass-through for absolute paths and refusal of other env vars.
- ui_restore_test updated: TargetDir now starts '\$HOME/rm-restore/'
with the job_id substituted into the placeholder.
Live verified on the smoke env: default target restored to
/root/rm-restore/<job-id>/ as the agent's expanded \$HOME (2 files,
14 bytes); custom override '/tmp/custom-restore/<job-id>/' restored
into the agent's PrivateTmp namespace (1 file, 6 bytes); both jobs
'succeeded', exit 0.
Pwsh installer that detects arch, downloads
$Server/agent/binary?os=windows&arch=amd64 to
C:\Program Files\restic-manager\, runs the agent in -enroll-server
[+ -enroll-token] mode (token flow OR announce-and-approve), then
calls 'restic-manager-agent install' to register the SCM service.
Surfaces existing scheduled tasks named *restic* without disabling.
CLAUDE.md restage block updated to also stage install.ps1 alongside
install.sh.
Allow-list filter @system-service excludes some syscalls Go's
runtime + restic's file scanner reach for; init job died
immediately with "bad system call (core dumped)". CapabilityBounding
already constrains what root can do; the Protect*/Restrict* toggles
still cover network / kernel / mount / namespace. Net effect on the
threat model is negligible vs the operational cost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:
* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
with ReadWritePaths confined to /etc + /var/lib/restic-manager;
NoNewPrivileges blocks escalation. Install script no longer
creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
rationale (matches UrBackup / Veeam / Bareos defaults; trying to
back up "everything" as an unprivileged user creates silent skips
on /home, /root, /var/lib/* with no upside vs the threat model
the agent already implies).
* Init-repo end-to-end. New JobKind="init" wired through agent
runner, restic.Env.RunInit, server dispatcher, and a UI button
(red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
flips on init success, on backup success, or on a non-empty
snapshots.report. The "Run now" / "Init" / "Retry" branching now
drives both the dashboard host row and the host-detail panel.
Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
the safe create-new-then-rename pattern; first version corrupted
job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
affected DBs).
* rest-server creds embedded at exec time only. restic.Env gains
RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
inside envSlice() and never assigns it back to the struct, so
nothing slog-able ever sees the cleartext form. RedactURL helper
for any future surface that needs to log a URL safely. Both
helpers tested.
* Add-host UX. Repo password is now optional — server mints a
24-byte URL-safe random one and surfaces it once, alongside an
htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
the operator pastes one command on the rest-server host and one
on the endpoint. Result page also links the install snippet at
/install/install.sh (was /install.sh — 404'd before) and pipes
to bash (not sh — script uses set -o pipefail and other
bashisms; on Debian/Ubuntu sh is dash).
* Late-subscriber race in JobHub. A fast-failing job could finish
(DB write + Broadcast) before the browser's HX-Redirect → page
load → WS-connect path completed, so the JS sat forever waiting
on a job.finished that already passed. JobHub split into
Register + Send + Run; handleJobStream now subscribes first,
re-fetches the job, and sends a synthetic job.finished if the
state is already terminal.
* HTMX error visibility. New toast partial listens to
htmx:responseError and surfaces the response body as a
bottom-right toast — every server-side validation error now
becomes visible without per-handler JS wiring. Also handles
custom rm:toast events for future server-pushed notifications
via the HX-Trigger header. Themed via existing CSS vars.
* Dashboard rows are now whole-row clickable to host detail
(CSS card-link pattern: absolute-positioned anchor + .row-action
z-index restoration so the action button stays clickable).
"View →" on a running job links to /jobs/<id> rather than
/hosts/<id> since the row click already covers the host page.
* "Run first" / "Run first backup" → "Run now" everywhere for
consistency.
* runbook (docs/e2e-smoke.md) updated — live-log streaming step
now reflects P1-26; mentions the browser-driven Run-now flow.
* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
it up; .gitignore now excludes /_diag/ entirely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-hosted deployments already terminate TLS at Caddy/Traefik/nginx;
making the server do TLS too means double cert config, dual ACME
plumbing, and an untested code path. Drop RM_TLS_CERT/RM_TLS_KEY,
remove TLSEnabled() and the ListenAndServeTLS branch.
Replace the cookie's "Secure if TLS-in-process" check with a new
RM_COOKIE_SECURE flag (default true). Local HTTP-only testing sets
RM_COOKIE_SECURE=false; production is always behind a TLS proxy and
the cookie stays Secure.
Default port :8443 → :8080. docker-compose binds 127.0.0.1 only and
populates RM_TRUSTED_PROXY. spec.md §4.1/§10.1 rewritten with a
Caddyfile snippet and a hard "do not expose RM_LISTEN publicly"
warning. enrollResponse keeps cert_pin_sha256 in the shape but the
server can't introspect a cert it doesn't terminate — operator
pastes the proxy's hash into -cert-pin at install time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1-14 deploy/install/restic-manager-agent.service: standard systemd
unit with the usual hardening switches (NoNewPrivileges, Protect*,
RestrictRealtime, MemoryDenyWriteExecute). Restart=always with a
5s backoff. Runs as a dedicated unprivileged restic-manager-agent
user; the install script creates it.
P1-29 deploy/install/install.sh: arch detection (amd64/arm64), pulls
the agent binary from /agent/binary, creates the service user
+ dirs (/etc/restic-manager, /var/lib/restic-manager), runs
enrollment via `agent -enroll-server -enroll-token`, lays down
the systemd unit, enables and starts it.
Honours the spec's "detect, don't auto-disable" rule for existing
schedulers: scans systemd timers, /etc/cron.d/*, /etc/cron.daily/*,
root crontab for restic-named entries and prints them with the
exact disable command — operator decides.
P1-31 server endpoints to ship the agent installation payload:
GET /agent/binary?os=linux&arch=amd64 → serves
<DataDir>/agent-binaries/restic-manager-agent-linux-amd64
GET /install/<file> → serves
<DataDir>/install/<file>
Both endpoints reject path traversal and return 404 if the file
isn't published. Operators drop the binaries + service unit into
these directories at release time. Signed-bundle verification is
deferred to Phase 5 OSS readiness.
All tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the bottom three layers of Phase 1:
P1-08 internal/api: protocol_version + envelope + every WS message
shape from spec.md §6.2 (Hello, Heartbeat, Job*, Schedule*, etc).
Wire-format tests pin the JSON shape so a rename here breaks
tests instead of silently breaking the agent.
P1-02 + P1-03 internal/store: SQLite via modernc.org/sqlite,
embed.FS + a tiny version table for hand-rolled migrations.
0001_initial.sql covers every table from spec.md §5 plus
enrollment_tokens and host_schedule_version. Typed accessors
for users / sessions / enrollment / audit. WAL + foreign_keys
+ busy_timeout on by default.
P1-06 internal/crypto: XChaCha20-Poly1305 AEAD wrapper with
per-message random nonce. Key file lifecycle (generate +
refuse-to-overwrite, load with size validation). Optional
additionalData binds ciphertext to the row that owns it.
P1-04 internal/auth (partial — passwords + tokens; sessions
middleware lands with the HTTP handlers): argon2id following
RFC 9106 (64 MiB / t=3 / p=4 / 32B), constant-time verify.
HashToken stores SHA-256 of session/agent/enrollment tokens
so a stolen DB doesn't hand over credentials.
Build floor moves to Go 1.25 (modernc.org/sqlite v1.50+ requires
it); CI + Dockerfile + README updated. Markdown lint diagnostics
on tasks.md cleared.
All packages tested. ~70 new tests pass in <1s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>