Commit Graph

9 Commits

Author SHA1 Message Date
steve 7cc17813a9 p5-03: docker-only release path (drop goreleaser)
Single public deliverable per tag: a multi-arch server image, with
cross-compiled agent binaries + install scripts + the systemd unit
baked under /opt/restic-manager/dist/. The /agent/binary and
/install/* handlers fall back from <DataDir>/... to that read-only
path so a fresh container Just Works without first-run staging;
operators can still drop a custom build into <DataDir>/ to override
per-host.

Architecture rationale: agent distribution already routes through
the running server, so the release surface mirrors that — there's
no second source of truth to keep in sync.

Workflow .gitea/workflows/release.yml triggers on v*.*.* tag-push
(fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and
workflow_dispatch (snapshot tag only). Pushes to the Gitea
container registry on this instance.

Both binaries grow main.commit + main.date ldflag targets. Makefile
and Dockerfile fill them; release workflow forwards from gitea.sha
plus a UTC timestamp.

Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md
Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md
2026-05-05 15:18:48 +01:00
steve f0dfa689fe P3 follow-up: editable target dir, conditional --no-ownership, UK lint
Three small follow-ups from review:

1. Restore target is now operator-editable. Default value is the
   literal '\$HOME/rm-restore/<job-id>/' (agent expands \$HOME at
   run time using os.UserHomeDir(); also handles \${HOME} and ~/
   prefixes). Operator can replace with any absolute path.
   - ui_restore.go validates the input is either absolute or starts
     with one of the recognised prefixes; other env-var refs (\$PATH
     etc.) are deliberately rejected so operator paths can't pick up
     arbitrary agent env values.
   - host_restore.html replaces the read-only mono-text display with
     a real <input>; help text spells out that \$HOME resolves
     agent-side and <job-id> is substituted on dispatch.
   - install.sh + the systemd unit prep /root/rm-restore so the
     default works under the sandbox: ReadWritePaths gains a soft
     '-/root/rm-restore' entry (the '-' makes the bind-mount soft-fail
     if missing, but install.sh pre-creates it root-owned 0700).

2. --no-ownership flag now gated on restic version. The flag was
   added in restic 0.17 and 0.16 rejects it. Previously dropped it
   wholesale — that meant new-dir restores silently preserved
   ownership against design intent on 0.17+. Now the agent threads
   its detected restic version (sysinfo already collects it) through
   runner.Config -> restic.Env, and RunRestore appends --no-ownership
   only when AtLeastVersion(0, 17) returns true. 0.16 hosts still
   restore with original uid/gid; help text in the wizard explicitly
   notes this. The previous 'Original ownership is preserved' copy
   was wrong for new-dir mode and is corrected.

3. golangci-lint misspell locale switched US -> UK and the codebase
   swept (73 corrections, mostly behaviour/serialise/recognise/honour).
   Wire-format ErrorCode 'unauthorized' -> 'unauthorised' is a tiny
   contract change but the agent doesn't parse those codes today and
   no external API consumers exist yet. Tests passed before + after.

Tests:
- internal/restic/version_test.go covers Env.AtLeastVersion across
  edge cases (empty, exact match, patch above, minor below, non-
  numeric) and expandHome on \$HOME / \${HOME} / ~/, plus
  pass-through for absolute paths and refusal of other env vars.
- ui_restore_test updated: TargetDir now starts '\$HOME/rm-restore/'
  with the job_id substituted into the placeholder.

Live verified on the smoke env: default target restored to
/root/rm-restore/<job-id>/ as the agent's expanded \$HOME (2 files,
14 bytes); custom override '/tmp/custom-restore/<job-id>/' restored
into the agent's PrivateTmp namespace (1 file, 6 bytes); both jobs
'succeeded', exit 0.
2026-05-04 17:27:52 +01:00
steve 8ceb76c733 deploy: P2-17 install.ps1 (Windows installer)
Pwsh installer that detects arch, downloads
$Server/agent/binary?os=windows&arch=amd64 to
C:\Program Files\restic-manager\, runs the agent in -enroll-server
[+ -enroll-token] mode (token flow OR announce-and-approve), then
calls 'restic-manager-agent install' to register the SCM service.
Surfaces existing scheduled tasks named *restic* without disabling.

CLAUDE.md restage block updated to also stage install.ps1 alongside
install.sh.
2026-05-04 11:15:18 +01:00
steve c565a7abd1 agent unit: drop SystemCallFilter — was killing restic with SIGSYS
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Allow-list filter @system-service excludes some syscalls Go's
runtime + restic's file scanner reach for; init job died
immediately with "bad system call (core dumped)". CapabilityBounding
already constrains what root can do; the Protect*/Restrict* toggles
still cover network / kernel / mount / namespace. Net effect on the
threat model is negligible vs the operational cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:40:43 +01:00
steve ee3ee241ea P1 polish: agent-as-root, init-repo flow, rest creds passthrough, UX fixes
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:

* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
  drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
  with ReadWritePaths confined to /etc + /var/lib/restic-manager;
  NoNewPrivileges blocks escalation. Install script no longer
  creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
  rationale (matches UrBackup / Veeam / Bareos defaults; trying to
  back up "everything" as an unprivileged user creates silent skips
  on /home, /root, /var/lib/* with no upside vs the threat model
  the agent already implies).

* Init-repo end-to-end. New JobKind="init" wired through agent
  runner, restic.Env.RunInit, server dispatcher, and a UI button
  (red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
  flips on init success, on backup success, or on a non-empty
  snapshots.report. The "Run now" / "Init" / "Retry" branching now
  drives both the dashboard host row and the host-detail panel.
  Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
  the safe create-new-then-rename pattern; first version corrupted
  job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
  affected DBs).

* rest-server creds embedded at exec time only. restic.Env gains
  RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
  inside envSlice() and never assigns it back to the struct, so
  nothing slog-able ever sees the cleartext form. RedactURL helper
  for any future surface that needs to log a URL safely. Both
  helpers tested.

* Add-host UX. Repo password is now optional — server mints a
  24-byte URL-safe random one and surfaces it once, alongside an
  htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
  the operator pastes one command on the rest-server host and one
  on the endpoint. Result page also links the install snippet at
  /install/install.sh (was /install.sh — 404'd before) and pipes
  to bash (not sh — script uses set -o pipefail and other
  bashisms; on Debian/Ubuntu sh is dash).

* Late-subscriber race in JobHub. A fast-failing job could finish
  (DB write + Broadcast) before the browser's HX-Redirect → page
  load → WS-connect path completed, so the JS sat forever waiting
  on a job.finished that already passed. JobHub split into
  Register + Send + Run; handleJobStream now subscribes first,
  re-fetches the job, and sends a synthetic job.finished if the
  state is already terminal.

* HTMX error visibility. New toast partial listens to
  htmx:responseError and surfaces the response body as a
  bottom-right toast — every server-side validation error now
  becomes visible without per-handler JS wiring. Also handles
  custom rm:toast events for future server-pushed notifications
  via the HX-Trigger header. Themed via existing CSS vars.

* Dashboard rows are now whole-row clickable to host detail
  (CSS card-link pattern: absolute-positioned anchor + .row-action
  z-index restoration so the action button stays clickable).
  "View →" on a running job links to /jobs/<id> rather than
  /hosts/<id> since the row click already covers the host page.

* "Run first" / "Run first backup" → "Run now" everywhere for
  consistency.

* runbook (docs/e2e-smoke.md) updated — live-log streaming step
  now reflects P1-26; mentions the browser-driven Run-now flow.

* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
  it up; .gitignore now excludes /_diag/ entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:02:12 +01:00
steve 41a4043af3 server: drop in-process TLS — HTTP-only behind reverse proxy
Self-hosted deployments already terminate TLS at Caddy/Traefik/nginx;
making the server do TLS too means double cert config, dual ACME
plumbing, and an untested code path. Drop RM_TLS_CERT/RM_TLS_KEY,
remove TLSEnabled() and the ListenAndServeTLS branch.

Replace the cookie's "Secure if TLS-in-process" check with a new
RM_COOKIE_SECURE flag (default true). Local HTTP-only testing sets
RM_COOKIE_SECURE=false; production is always behind a TLS proxy and
the cookie stays Secure.

Default port :8443 → :8080. docker-compose binds 127.0.0.1 only and
populates RM_TRUSTED_PROXY. spec.md §4.1/§10.1 rewritten with a
Caddyfile snippet and a hard "do not expose RM_LISTEN publicly"
warning. enrollResponse keeps cert_pin_sha256 in the shape but the
server can't introspect a cert it doesn't terminate — operator
pastes the proxy's hash into -cert-pin at install time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:20:41 +01:00
steve e8eccd20c2 phase 1: agent install path — systemd unit, install.sh, asset endpoints
P1-14 deploy/install/restic-manager-agent.service: standard systemd
  unit with the usual hardening switches (NoNewPrivileges, Protect*,
  RestrictRealtime, MemoryDenyWriteExecute). Restart=always with a
  5s backoff. Runs as a dedicated unprivileged restic-manager-agent
  user; the install script creates it.

P1-29 deploy/install/install.sh: arch detection (amd64/arm64), pulls
  the agent binary from /agent/binary, creates the service user
  + dirs (/etc/restic-manager, /var/lib/restic-manager), runs
  enrollment via `agent -enroll-server -enroll-token`, lays down
  the systemd unit, enables and starts it.

  Honours the spec's "detect, don't auto-disable" rule for existing
  schedulers: scans systemd timers, /etc/cron.d/*, /etc/cron.daily/*,
  root crontab for restic-named entries and prints them with the
  exact disable command — operator decides.

P1-31 server endpoints to ship the agent installation payload:
  GET /agent/binary?os=linux&arch=amd64 → serves
    <DataDir>/agent-binaries/restic-manager-agent-linux-amd64
  GET /install/<file>                   → serves
    <DataDir>/install/<file>
  Both endpoints reject path traversal and return 404 if the file
  isn't published. Operators drop the binaries + service unit into
  these directories at release time. Signed-bundle verification is
  deferred to Phase 5 OSS readiness.

All tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:40:36 +01:00
steve c275f4ff4c phase 1 foundations: api types, store, crypto, auth
Lands the bottom three layers of Phase 1:

P1-08 internal/api: protocol_version + envelope + every WS message
  shape from spec.md §6.2 (Hello, Heartbeat, Job*, Schedule*, etc).
  Wire-format tests pin the JSON shape so a rename here breaks
  tests instead of silently breaking the agent.

P1-02 + P1-03 internal/store: SQLite via modernc.org/sqlite,
  embed.FS + a tiny version table for hand-rolled migrations.
  0001_initial.sql covers every table from spec.md §5 plus
  enrollment_tokens and host_schedule_version. Typed accessors
  for users / sessions / enrollment / audit. WAL + foreign_keys
  + busy_timeout on by default.

P1-06 internal/crypto: XChaCha20-Poly1305 AEAD wrapper with
  per-message random nonce. Key file lifecycle (generate +
  refuse-to-overwrite, load with size validation). Optional
  additionalData binds ciphertext to the row that owns it.

P1-04 internal/auth (partial — passwords + tokens; sessions
  middleware lands with the HTTP handlers): argon2id following
  RFC 9106 (64 MiB / t=3 / p=4 / 32B), constant-time verify.
  HashToken stores SHA-256 of session/agent/enrollment tokens
  so a stolen DB doesn't hand over credentials.

Build floor moves to Go 1.25 (modernc.org/sqlite v1.50+ requires
it); CI + Dockerfile + README updated. Markdown lint diagnostics
on tasks.md cleared.

All packages tested. ~70 new tests pass in <1s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:24:40 +01:00
steve c9368de904 phase 0: project bootstrap
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
P0-01 Go module + cmd/server + cmd/agent skeletons + internal/ tree
P0-02 LICENSE (PolyForm NC 1.0.0), README, CONTRIBUTING
P0-03 golangci-lint, pre-commit, .editorconfig, .gitignore
P0-04 Gitea Actions CI: test (race+coverage), lint, cross-platform build matrix
P0-05 Dockerfile.server (multi-stage, distroless/static), docker-compose.yml
P0-06 Makefile with build/test/lint/fmt/run/release targets

build, vet, test, and cross-compile to linux/{amd64,arm64} + windows/amd64
all verified locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:03:59 +01:00