Commit Graph

255 Commits

Author SHA1 Message Date
steve 22bcf69e6c http: expose GET /api/version 2026-05-06 21:39:13 +01:00
steve fe1ed49977 version: build-time version package + Makefile ldflags wiring 2026-05-06 21:38:35 +01:00
steve d24856866e plan: P6-01+02 implementation plan 2026-05-06 21:37:38 +01:00
steve 731f01a63e spec: P6-01+02 agent self-update + fleet update design 2026-05-06 21:20:00 +01:00
steve c80ca90efb tasks: rewrite P6-01/02 around server-bundled agent self-update
The original plan was apt repo + Chocolatey package. The P5-03 Docker
pivot bundled matching agent binaries into the server image and
exposes them via /agent/binary, so 'update agent' now collapses to
're-fetch from your own server'. No third-party packaging or signing
infra needed. P6-01 drops to S; P6-02 keeps the dashboard reporting
+ fleet-update UX but points at the new mechanism.
2026-05-06 21:08:22 +01:00
steve c32acc0332 ci(release): use DEV_TOKEN for registry login
Release / Build + push image (push) Successful in 6m58s
The auto-issued GITHUB_TOKEN lacks write:package scope on this Gitea
instance, so the v0.9.0 tag build failed at docker login. Switch to
the user-level DEV_TOKEN secret which has the correct scope.
2026-05-06 19:05:54 +01:00
steve 505a2d7a79 Merge pull request 'testing: bootstrap UI, agent reliability, NS-01..04 + alert username' (#18) from ns-batch-host-ops into main
Release / Build + push image (push) Failing after 2m5s
2026-05-05 21:09:17 +00:00
steve 3800b34a2b testing: bootstrap UI, agent reliability, NS-01..04 + alert username
CI / Test (rest) (pull_request) Successful in 29s
CI / Lint (pull_request) Successful in 32s
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Test (store) (pull_request) Successful in 1m22s
CI / Test (server-http) (pull_request) Successful in 1m30s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 41s
Smooths the rough edges that came up while exercising a live deployment.

First-run bootstrap UI: /bootstrap renders a username + password form
that uses the in-memory token directly (operator no longer copies it
out of the log); /login redirects there while bootstrap is available.

Agent reliability: failJob synthetic envelopes so command.run early
returns no longer hang the server-side job; runtime probe of restic
restore --help drives --no-ownership instead of version sniffing
(0.18.x had it removed). Server unit re-shaped: ProtectSystem=full
plus ReadWritePaths=/etc/restic-manager, no ProtectHome — restore
can now write anywhere a user might want.

Restore wizard: default target is /root/rm-restore/<job-id>/ with
clearer help text. Re-init confirm input uses .field (was .input,
which doesn't exist — text was invisible).

NS-01 host delete: store DeleteHost, admin-band /hosts/{id}/delete
with hostname-confirm danger zone, audit, FK cascade, live WS close.

NS-02 enrollment-token recovery: outstanding-tokens panel on
/hosts/new, regenerate (preserves attachments) and revoke handlers
+ audit, store-level ListOutstandingEnrollmentTokens and
DeleteEnrollmentToken.

NS-03 repo init / probe surface: migration 0020 adds
hosts.repo_status + repo_status_error; WS handler projects every
init job's outcome onto the host row (idempotent already-initialised
collapses to ready); creds-save resets status and dispatches a fresh
probe; /hosts/{id}/repo/probe retry endpoint with banner.

NS-04 dashboard live + sort + filter: query-string filter
(q/status/repo_status/tag/sort/dir), 5s htmx live poll mirroring the
alerts pattern with a localStorage live toggle, sortable column
headers, filter row + clear.

Alerts page: ack'd-by line resolves user_id ULID to username.

Compose.yaml ignored — host-specific.
2026-05-05 22:03:15 +01:00
steve b91fe56c83 Merge pull request 'P5-03 + P5-07: docker-only release path & reference deployment' (#17) from p5-03-docker-release into main
Reviewed-on: #17
2026-05-05 16:36:08 +00:00
steve d6f6d19bff p5-07: reference deployment (server-only compose + reverse-proxy docs)
CI / Test (store) (pull_request) Successful in 21s
CI / Test (rest) (pull_request) Successful in 38s
CI / Lint (pull_request) Successful in 33s
CI / Build (windows/amd64) (pull_request) Successful in 39s
CI / Test (server-http) (pull_request) Successful in 1m17s
CI / Build (linux/amd64) (pull_request) Successful in 23s
CI / Build (linux/arm64) (pull_request) Successful in 39s
The reverse proxy is assumed to live outside this project (Caddy,
nginx, Traefik, whatever the operator already runs). The reference
compose stands up only the server: image-pinned via RM_VERSION,
named volume for operator state, localhost-bound so the proxy
reaches it on loopback.

docs/reverse-proxy.md covers what the proxy must forward — the
X-Forwarded-* headers, Host, and Connection: upgrade for the agent
WebSocket and live-log streams — plus the RM_TRUSTED_PROXY CIDR
rule that gates header trust. Worked examples for Caddy, nginx
(with the websocket upgrade map + 1h proxy_read_timeout for live
logs), and Traefik.
2026-05-05 17:15:00 +01:00
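The RM_TRUSTED_PROXY rule in the commit above gates whether forwarded headers are believed at all. A minimal sketch of such a CIDR check follows; the function name, its signature, and the fail-closed behaviour on a bad rule are assumptions for illustration, not the project's actual code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// trustForwardedHeaders reports whether X-Forwarded-* headers sent by the
// peer at remoteAddr ("ip:port") should be honoured. trustedCIDR stands in
// for the RM_TRUSTED_PROXY value. The name and signature are hypothetical.
func trustForwardedHeaders(remoteAddr, trustedCIDR string) bool {
	prefix, err := netip.ParsePrefix(trustedCIDR)
	if err != nil {
		return false // assumption: an unparseable rule trusts nobody
	}
	peer, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return false
	}
	// Unmap turns IPv4-mapped IPv6 peers (::ffff:a.b.c.d) back into IPv4
	// so they can match an IPv4 prefix.
	return prefix.Contains(peer.Addr().Unmap())
}

func main() {
	fmt.Println(trustForwardedHeaders("127.0.0.1:51234", "127.0.0.0/8")) // true
	fmt.Println(trustForwardedHeaders("203.0.113.7:443", "127.0.0.0/8")) // false
}
```

With the localhost-bound compose described above, the proxy connects over loopback, so a loopback prefix would be the natural value in this sketch.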
steve 7cc17813a9 p5-03: docker-only release path (drop goreleaser)
Single public deliverable per tag: a multi-arch server image, with
cross-compiled agent binaries + install scripts + the systemd unit
baked under /opt/restic-manager/dist/. The /agent/binary and
/install/* handlers fall back from <DataDir>/... to that read-only
path so a fresh container Just Works without first-run staging;
operators can still drop a custom build into <DataDir>/ to override
per-host.

Architecture rationale: agent distribution already routes through
the running server, so the release surface mirrors that — there's
no second source of truth to keep in sync.

Workflow .gitea/workflows/release.yml triggers on v*.*.* tag-push
(fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and
workflow_dispatch (snapshot tag only). Pushes to the Gitea
container registry on this instance.

Both binaries grow main.commit + main.date ldflag targets. Makefile
and Dockerfile fill them; release workflow forwards from gitea.sha
plus a UTC timestamp.

Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md
Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md
2026-05-05 15:18:48 +01:00
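The tag fan-out described in the p5-03 commit above (:vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) can be sketched as a small pure function. This is an illustration only: the real logic lives in the release workflow, and the function name and error handling here are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// imageTags expands a git tag of the form vX.Y.Z into the image tags to
// push. Hypothetical helper; it only mirrors the rule stated in the commit.
func imageTags(gitTag string) ([]string, error) {
	if !strings.HasPrefix(gitTag, "v") {
		return nil, fmt.Errorf("tag %q does not start with v", gitTag)
	}
	parts := strings.Split(strings.TrimPrefix(gitTag, "v"), ".")
	if len(parts) != 3 {
		return nil, fmt.Errorf("tag %q is not of the form vX.Y.Z", gitTag)
	}
	nums := make([]uint64, 3)
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("tag %q: component %q is not a number", gitTag, p)
		}
		nums[i] = n
	}
	tags := []string{
		gitTag,                    // :vX.Y.Z
		parts[0] + "." + parts[1], // :X.Y
		parts[0],                  // :X
	}
	if nums[0] >= 1 {
		tags = append(tags, "latest") // only once MAJOR >= 1
	}
	return tags, nil
}

func main() {
	tags, _ := imageTags("v1.4.2")
	fmt.Println(tags) // [v1.4.2 1.4 1 latest]
	tags, _ = imageTags("v0.9.0")
	fmt.Println(tags) // [v0.9.0 0.9 0]
}
```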
steve 5ee58979fa Merge pull request 'P4-05: OIDC login (generic, JIT-provisioned)' (#16) from p4-05-oidc into main
Reviewed-on: #16
2026-05-05 13:46:23 +00:00
steve 4d90f72575 oidc: merge userinfo claims; tick P4-05 in tasks.md
CI / Test (rest) (pull_request) Successful in 40s
CI / Test (store) (pull_request) Successful in 37s
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Test (server-http) (pull_request) Successful in 1m10s
CI / Build (linux/amd64) (pull_request) Successful in 24s
CI / Build (linux/arm64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 58s
Authelia (and many other IdPs) only put `sub` in the ID token by
default, surfacing `preferred_username`/`email`/`groups` from the
userinfo endpoint. Fetch userinfo after id_token verification and
fold its claims into the parsed claim map; the id_token claims
remain authoritative on conflict so the signed assertion still
wins.

Live sweep against https://auth.dcglab.co.uk verified all four
flows: rm-admin → admin JIT, rm-operator → operator JIT (RBAC
denies admin pages), rm-viewer → viewer JIT (RBAC denies operator
pages), rm-other → no_role_match banner with no row created.
Returning rm-admin sign-in resolves to the same row by sub.
Screenshots in _diag/p4-05-sweep/.
2026-05-05 14:06:28 +01:00
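The claim merge in the oidc commit above has one rule: userinfo fills in what the ID token lacks, and the ID token wins on conflict. A minimal sketch, assuming both claim sets have already been decoded into maps (the function name is hypothetical):

```go
package main

import "fmt"

// mergeClaims folds userinfo claims into the ID token claims. Userinfo is
// written first and the ID token second, so the signed assertion overrides
// any key present in both.
func mergeClaims(idToken, userinfo map[string]any) map[string]any {
	merged := make(map[string]any, len(idToken)+len(userinfo))
	for k, v := range userinfo {
		merged[k] = v
	}
	for k, v := range idToken {
		merged[k] = v
	}
	return merged
}

func main() {
	idToken := map[string]any{"sub": "opaque-uuid"}
	userinfo := map[string]any{
		"sub":                "something-else",
		"preferred_username": "alice",
	}
	merged := mergeClaims(idToken, userinfo)
	fmt.Println(merged["sub"], merged["preferred_username"]) // opaque-uuid alice
}
```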
steve 3173f85b97 server: build OIDC client at startup; sweep oidc_state on alert tick 2026-05-05 13:45:52 +01:00
steve 962a5affea ui(users): oidc chip on list + readonly fields on edit for OIDC users 2026-05-05 13:42:57 +01:00
steve 885439b048 ui: login page — SSO button + oidc_error banner 2026-05-05 13:40:13 +01:00
steve c62d7d3ac3 http: local-login rejects auth_source='oidc' users 2026-05-05 13:37:07 +01:00
steve 86598d6357 http: logout — 303 to end_session_endpoint with id_token_hint for OIDC sessions 2026-05-05 13:34:47 +01:00
steve c55a75355a http: GET /auth/oidc/callback — JIT-provision, refresh, deny paths 2026-05-05 13:30:00 +01:00
steve f56844b5c6 http: GET /auth/oidc/login — generate state/PKCE, redirect to IdP 2026-05-05 13:26:06 +01:00
steve 878c82a328 oidc: test stub IdP + happy-path exchange test 2026-05-05 13:23:16 +01:00
steve e7d891c4fc oidc: client wrapper around go-oidc — discovery, exchange, claim parse 2026-05-05 13:20:08 +01:00
steve 5c844ad9b7 config: OIDCConfig — YAML + env overlay with defaults 2026-05-05 13:18:01 +01:00
steve 6006cad992 store: oidc_state CRUD + 5-minute cleanup 2026-05-05 13:15:45 +01:00
steve 7f8bd13a07 store: round-trip IDToken on sessions for RP-initiated logout 2026-05-05 13:14:27 +01:00
steve 805380f52d store: GetUserByOIDCSubject + scanUser auth_source/oidc_subject 2026-05-05 13:12:11 +01:00
steve c2581e56e8 store: extend User with AuthSource/OIDCSubject; Session with IDToken 2026-05-05 13:09:49 +01:00
steve dc89997307 store: migration 0019 — users.auth_source/oidc_subject + sessions.id_token + oidc_state 2026-05-05 13:08:15 +01:00
steve cdbd8eeb88 plan: P4-05 — OIDC login implementation plan
Bite-sized TDD tasks across 7 slices (A schema, B config, C OIDC
client core + stub IdP, D login + callback, E logout + local-login
rejection, F UI, G wiring + Authelia sweep). Each task is one
commit with concrete code blocks and test cases — no placeholders.

Refs spec at docs/superpowers/specs/2026-05-05-p4-05-oidc-design.md.
Authelia bundle for the sweep stashed at /tmp/rm-smoke/oidc.env.
2026-05-05 13:04:39 +01:00
steve bc19ad8804 spec: P4-05 — Authelia-specific defaults
Confirmed claim name from the lab IdP is 'groups' (not 'roles' as
the original spec assumed). Default the role_claim config field to
'groups' which also matches Keycloak and Authentik out of the box.
Add a 'display_name' field so the SSO button can read 'Sign in with
Authelia' rather than the generic 'SSO'.

Two new gotchas captured:
  - Authelia 4.39+ 'sub' is an opaque UUID, not username — the
    locked design already keys on sub + reads preferred_username
    for display, so this is just documentation.
  - end_session_endpoint isn't always published (Authelia config-
    dependent); the locked logout flow already degrades cleanly.
2026-05-05 12:56:16 +01:00
steve 814e49cb93 spec: P4-05 — OIDC login design
Brainstormed shape locked: JIT-provision local rows on first OIDC
sign-in (auth_source='oidc'), YAML-only config (no UI), 'roles'
claim with deny-on-no-match default, preferred_username with email
fallback, refuse on local-user collision, single provider, login
page shows SSO above password (break-glass), front-channel logout
only, role re-evaluation at login only.

Migration 0019: users.auth_source + users.oidc_subject (partial
unique index), sessions.id_token (for end_session id_token_hint),
oidc_state table for the OAuth round-trip state, swept on the
existing alert-engine tick.

Composes with the user-management work from P4-03/04: admin can
disable OIDC users like local; last-admin guard catches IdP role-
mapping mistakes; audit trail covers JIT-provision via
user.created with auth_source payload + new user.oidc_login /
user.oidc_login_blocked actions.

Out of scope (deferred): back-channel logout, multi-provider,
UI-driven role mapping, refresh tokens / mid-session re-eval.
2026-05-05 12:04:09 +01:00
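The deny-on-no-match default in the spec above means a user whose claim matches no configured role gets no local row at all. One way this could look is sketched below. The group names are borrowed from the sweep notes elsewhere in this log, while the mapping shape, the function name, and the choice to prefer the most privileged match are assumptions.

```go
package main

import "fmt"

// rolePriority lists roles from most to least privileged. Preferring the
// highest match when several groups apply is an assumption of this sketch.
var rolePriority = []string{"admin", "operator", "viewer"}

// resolveRole maps IdP group names to a local role. The second return value
// is false when nothing matches, which the caller treats as a denial.
func resolveRole(groups []string, groupToRole map[string]string) (string, bool) {
	matched := make(map[string]bool)
	for _, g := range groups {
		if role, ok := groupToRole[g]; ok {
			matched[role] = true
		}
	}
	for _, role := range rolePriority {
		if matched[role] {
			return role, true
		}
	}
	return "", false // deny: no row is created for this user
}

func main() {
	mapping := map[string]string{
		"rm-admin":    "admin",
		"rm-operator": "operator",
		"rm-viewer":   "viewer",
	}
	role, ok := resolveRole([]string{"rm-operator", "rm-viewer"}, mapping)
	fmt.Println(role, ok) // operator true
	_, ok = resolveRole([]string{"rm-other"}, mapping)
	fmt.Println(ok) // false
}
```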
steve 4b48925edf Merge pull request 'Phase 4 — P4-07: per-host tags + dashboard chip-row filter' (#15) from p4-07-host-tags into main
Reviewed-on: #15
2026-05-05 10:55:11 +00:00
steve 36fd9050fe ui(tags): edit-button label, Save-tags width, persistent help text
CI / Test (store) (pull_request) Successful in 44s
CI / Test (rest) (pull_request) Successful in 48s
CI / Test (server-http) (pull_request) Successful in 1m8s
CI / Lint (pull_request) Successful in 37s
CI / Build (windows/amd64) (pull_request) Successful in 39s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 26s
2026-05-05 11:23:36 +01:00
steve 89d4458866 feat(hosts): per-host tags edit + dashboard chip-row filter (P4-07) 2026-05-05 11:16:09 +01:00
steve 191f0f1c55 tasks: defer update delivery + observability to Phase 6
Pull the operator-experience polish out of Phase 4 so a working v1
ships sooner. Phase 4 keeps RBAC + user mgmt (already done), OIDC,
and host tags. Deferred items renumbered as P6-01..P6-05:

  P4-01 → P6-01  apt + Chocolatey update delivery
  P4-02 → P6-02  agent-version-behind-server tracking on dashboard
  P4-06 → P6-03  repo size trend graphs
  P4-08 → P6-04  Prometheus /metrics endpoint
  P4-09 → P6-05  Grafana dashboard JSON + integration docs

None of these gate getting the system into production. They land
after Phase 5 (OSS readiness) on the new Phase 6.

Phase 4 remaining: P4-05 (OIDC login) + P4-07 (per-host tags +
dashboard filtering).
2026-05-05 11:05:11 +01:00
steve 00b926b0a3 Merge pull request 'Phase 4 — P4-03/04: RBAC + user management' (#14) from p4-03-04-rbac-user-mgmt into main
Reviewed-on: #14
2026-05-05 10:01:43 +00:00
steve dfff6d1ef9 ui(users): banner explaining the disabled-username re-enable flow
CI / Test (rest) (pull_request) Successful in 29s
CI / Lint (pull_request) Successful in 32s
CI / Test (server-http) (pull_request) Successful in 1m9s
CI / Test (store) (pull_request) Successful in 1m13s
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 37s
2026-05-05 10:57:25 +01:00
steve 0415a96e27 ui(users): record last_login on /setup + sortable headers 2026-05-05 10:57:25 +01:00
steve d85e82110f tasks: tick P4-03/04 + sweep notes
Live Playwright + curl sweep on the smoke env exercised the full
user-management lifecycle:

  admin add user → setup link generated → curl-as-new-user fetches
  /setup (200, username on page) → POSTs password → 303 to / with
  Set-Cookie → 200 on dashboard, 200 on /settings/account,
  **403 on /settings/users** (admin-only) → admin disables → next
  request is **401** + session row count drops to 0 → audit log
  reflects user.created + user.setup_completed.

Three-role middleware enforces band gates; admin is fail-closed
default. Setup tokens are sha256-hashed at rest with 1h expiry;
expired tokens are swept on the alert engine's 60s tick. Last-admin
guard rejects disable + demote of the only enabled admin. Self-
service password change at /settings/account is reachable by every
role.
2026-05-05 10:57:25 +01:00
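The sweep notes above mention that setup tokens are sha256-hashed at rest with a 1h expiry. A minimal sketch of that check follows, assuming the stored form is a hex digest and that timestamps are Unix seconds; all names here are hypothetical, and the real schema may differ.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// setupTokenTTLSeconds is the 1h lifetime from the notes, in seconds.
const setupTokenTTLSeconds = 3600

// hashSetupToken returns the hex SHA-256 digest that would be stored in
// place of the raw token.
func hashSetupToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// setupTokenValid checks a presented token against the stored digest and
// rejects it once the TTL has elapsed. Times are Unix seconds (assumption).
func setupTokenValid(presented, storedHash string, issuedAt, now int64) bool {
	if now-issuedAt >= setupTokenTTLSeconds {
		return false
	}
	got := hashSetupToken(presented)
	// Constant-time comparison avoids leaking how many digest bytes match.
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	stored := hashSetupToken("example-token")
	fmt.Println(setupTokenValid("example-token", stored, 1000, 1000+60))   // true
	fmt.Println(setupTokenValid("example-token", stored, 1000, 1000+3600)) // false
	fmt.Println(setupTokenValid("wrong-token", stored, 1000, 1000+60))     // false
}
```

Storing only the digest means a leaked database row cannot be replayed as a setup link, since the raw token is never written down.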
steve d2cc4a802e alert: piggy-back expired-setup-token cleanup on the engine tick 2026-05-05 10:57:25 +01:00
steve c34a76393c ui: /settings/account self-service password change
Adds GET/POST handlers for /settings/account in the viewer band
(any authenticated user), account.html template with current-password
field suppressed when must_change_password is set, and audits the
change via AppendAudit.
2026-05-05 10:57:25 +01:00
steve 6ccc6c8c5e ui: /settings/users edit form + disable/enable/regenerate/force-logout 2026-05-05 10:57:25 +01:00
steve b0a5a76925 ui: /settings/users/new + /setup-link page
Adds handleUIUserNewGet, handleUIUserNewPost, handleUIUserSetupLinkGet
to ui_users.go; creates web/templates/pages/user_edit.html (multi-mode
new/edit/setup-link); wires three routes in the admin band of server.go.
2026-05-05 10:57:25 +01:00
steve 88f1959a6a ui: /settings/users list page 2026-05-05 10:57:25 +01:00
steve cae4147df6 http: POST /api/account/password — self-service password change 2026-05-05 10:57:25 +01:00
steve dbb8550936 http: regenerate setup link + force-logout 2026-05-05 10:57:25 +01:00
steve 90bcddb27e http: disable/enable user with last-admin guard + session kick 2026-05-05 10:57:25 +01:00
steve cd3c13e2c6 http: GET/PATCH /api/users/{id} with last-admin guard 2026-05-05 10:57:25 +01:00
steve a74dc33c1c http: POST /api/users — create + setup-token + audit 2026-05-05 10:57:25 +01:00
steve a985d45daa http: GET /api/users (list) 2026-05-05 10:57:25 +01:00