Commit Graph

138 Commits

Author SHA1 Message Date
steve 8be551349c ui: trend panel + range selector on host repo page 2026-05-07 19:10:59 +01:00
steve a48df77f40 ui: 30d repo-size sparkline on every dashboard host row 2026-05-07 19:02:35 +01:00
steve 871490b9d4 ws: record daily repo stats history alongside current upsert 2026-05-07 18:46:26 +01:00
steve 711d5e964c fix: project finished backup jobs onto host row + smoke path tweaks
CI / Test (store) (pull_request) Successful in 50s
CI / Test (rest) (pull_request) Successful in 1m5s
CI / Lint (pull_request) Successful in 24s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (windows/amd64) (pull_request) Successful in 43s
CI / Test (server-http) (pull_request) Successful in 1m51s
CI / Build (linux/arm64) (pull_request) Successful in 21s
The dashboard's 'Last backup' column reads hosts.last_backup_at /
last_backup_status, but the WS handler only updated hosts.repo_status
on job.finished — backup terminations were silently dropped. Add a
SetHostLastBackup store method and call it from the same job.finished
switch that already handles init jobs.

Also: CLAUDE.md restage block uses /tmp/rm-smoke (the original
default) but the actual dev env runs out of $HOME/smoke. Update the
paths in the doc to match.
2026-05-07 17:55:23 +01:00
steve ccaccd840a ui: dashboard hosts-behind tile + filter
- Add ?updates=behind query filter and the matching dashboardFilter
  field; round-trips through encode/parse.
- Compute UpdatesBehind on the dashboard view-model (online + version
  trailing the server) and surface as an amber hero tile that links
  to the filtered list.
- Test exercise covering the new filter case.
2026-05-06 22:20:54 +01:00
steve 94441a5371 ui: update chip + per-host button
- Surface UpdateAvailable + TargetVersion on the dashboard host row,
  the host_chrome header, and the JSON Host shape.
- New host_update_chip partial renders an amber out-of-date pill
  next to the agent-version display when the host's agent trails
  the server.
- Host detail right-rail gains an admin-only Update agent button
  (disabled when host is offline or already updating).
- New .update-chip and .btn-amber CSS tokens; tailwind output
  refreshed.
2026-05-06 22:20:40 +01:00
steve 3fa7be51a5 ui: fleet update page + endpoints
- POST /api/fleet/update, POST /api/fleet-updates/{id}/cancel,
  GET /api/fleet-updates/{id} (admin-only).
- GET /settings/fleet-update + /partial for htmx polling.
- Renders idle / running / terminal states with per-host progress.
- Tests cover happy path, derive-host-ids, conflict, cancel, get,
  and RBAC.
2026-05-06 22:20:03 +01:00
steve 6fd2a2ff77 p6-01/02: agent self-update + fleet update server cluster
- alert: update_failed (per-host, dedup=hostID) + fleet_update_halted
  (system-scoped, host_id NULL via new RaiseOrTouchSystem helper).
- ws: UpdateWatcher tracks in-flight command.update dispatches and
  reconciles them against incoming hello envelopes — success path
  marks the job succeeded and auto-resolves the alert; 90s timeout
  marks the job failed and raises update_failed.
- http: POST /api/hosts/{id}/update (admin-only JSON) + the HTMX
  /hosts/{id}/update form variant. Pre-checks: host exists, online,
  agent_version != current, no running update job. Refactored core
  into Server.dispatchHostUpdate so the fleet worker can share it
  without going through HTTP.
- fleetupdate: rolling worker iterating through host slots, halting
  on first failure and raising fleet_update_halted. Polling-based
  version-match (re-read hosts.agent_version every 1s up to 95s) —
  no extra plumbing into the WS hello path. At-most-one-running is
  enforced at the store layer (ErrFleetUpdateRunning).
- cmd/server: wire UpdateWatcher and FleetWorker into the main
  goroutine; the worker uses a small serverDispatcher adapter that
  delegates back into Server.DispatchHostUpdate.

Tests: watcher (success/timeout/mismatch/late-hello), HTTP endpoint
(happy + four pre-check branches + RBAC), worker (two-host happy,
timeout-halt, host-offline-halt, already-at-target skip, cancel
mid-run, double-Start guard).
2026-05-06 22:03:50 +01:00
steve 22bcf69e6c http: expose GET /api/version 2026-05-06 21:39:13 +01:00
steve 3800b34a2b testing: bootstrap UI, agent reliability, NS-01..04 + alert username
CI / Test (rest) (pull_request) Successful in 29s
CI / Lint (pull_request) Successful in 32s
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Test (store) (pull_request) Successful in 1m22s
CI / Test (server-http) (pull_request) Successful in 1m30s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 41s
Smoothes the rough edges that came up exercising a live deployment.

First-run bootstrap UI: /bootstrap renders a username + password form
that uses the in-memory token directly (operator no longer copies it
out of the log); /login redirects there while bootstrap is available.

Agent reliability: failJob synthetic envelopes so command.run early
returns no longer hang the server-side job; runtime probe of restic
restore --help drives --no-ownership instead of version sniffing
(0.18.x had it removed). Server unit re-shaped: ProtectSystem=full
plus ReadWritePaths=/etc/restic-manager, no ProtectHome — restore
can now write anywhere a user might want.

Restore wizard: default target is /root/rm-restore/<job-id>/ with
clearer help text. Re-init confirm input uses .field (was .input,
which doesn't exist — text was invisible).

NS-01 host delete: store DeleteHost, admin-band /hosts/{id}/delete
with hostname-confirm danger zone, audit, FK cascade, live WS close.

NS-02 enrollment-token recovery: outstanding-tokens panel on
/hosts/new, regenerate (preserves attachments) and revoke handlers
+ audit, store-level ListOutstandingEnrollmentTokens and
DeleteEnrollmentToken.

NS-03 repo init / probe surface: migration 0020 adds
hosts.repo_status + repo_status_error; WS handler projects every
init job's outcome onto the host row (idempotent already-initialised
collapses to ready); creds-save resets status and dispatches a fresh
probe; /hosts/{id}/repo/probe retry endpoint with banner.

NS-04 dashboard live + sort + filter: query-string filter
(q/status/repo_status/tag/sort/dir), 5s htmx live poll mirroring the
alerts pattern with a localStorage live toggle, sortable column
headers, filter row + clear.

Alerts page: ack'd-by line resolves user_id ULID to username.

Compose.yaml ignored — host-specific.
2026-05-05 22:03:15 +01:00
steve 7cc17813a9 p5-03: docker-only release path (drop goreleaser)
Single public deliverable per tag: a multi-arch server image, with
cross-compiled agent binaries + install scripts + the systemd unit
baked under /opt/restic-manager/dist/. The /agent/binary and
/install/* handlers fall back from <DataDir>/... to that read-only
path so a fresh container Just Works without first-run staging;
operators can still drop a custom build into <DataDir>/ to override
per-host.

Architecture rationale: agent distribution already routes through
the running server, so the release surface mirrors that — there's
no second source of truth to keep in sync.

Workflow .gitea/workflows/release.yml triggers on v*.*.* tag-push
(fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and
workflow_dispatch (snapshot tag only). Pushes to the Gitea
container registry on this instance.

Both binaries grow main.commit + main.date ldflag targets. Makefile
and Dockerfile fill them; release workflow forwards from gitea.sha
plus a UTC timestamp.

Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md
Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md
2026-05-05 15:18:48 +01:00
steve 4d90f72575 oidc: merge userinfo claims; tick P4-05 in tasks.md
CI / Test (rest) (pull_request) Successful in 40s
CI / Test (store) (pull_request) Successful in 37s
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Test (server-http) (pull_request) Successful in 1m10s
CI / Build (linux/amd64) (pull_request) Successful in 24s
CI / Build (linux/arm64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 58s
Authelia (and many other IdPs) only put `sub` in the ID token by
default, surfacing `preferred_username`/`email`/`groups` from the
userinfo endpoint. Fetch userinfo after id_token verification and
fold its claims into the parsed claim map; the id_token claims
remain authoritative on conflict so the signed assertion still
wins.

Live sweep against https://auth.dcglab.co.uk verified all four
flows: rm-admin → admin JIT, rm-operator → operator JIT (RBAC
denies admin pages), rm-viewer → viewer JIT (RBAC denies operator
pages), rm-other → no_role_match banner with no row created.
Returning rm-admin sign-in resolves to the same row by sub.
Screenshots in _diag/p4-05-sweep/.
2026-05-05 14:06:28 +01:00
steve 962a5affea ui(users): oidc chip on list + readonly fields on edit for OIDC users 2026-05-05 13:42:57 +01:00
steve 885439b048 ui: login page — SSO button + oidc_error banner 2026-05-05 13:40:13 +01:00
steve c62d7d3ac3 http: local-login rejects auth_source='oidc' users 2026-05-05 13:37:07 +01:00
steve 86598d6357 http: logout — 303 to end_session_endpoint with id_token_hint for OIDC sessions 2026-05-05 13:34:47 +01:00
steve c55a75355a http: GET /auth/oidc/callback — JIT-provision, refresh, deny paths 2026-05-05 13:30:00 +01:00
steve f56844b5c6 http: GET /auth/oidc/login — generate state/PKCE, redirect to IdP 2026-05-05 13:26:06 +01:00
steve 878c82a328 oidc: test stub IdP + happy-path exchange test 2026-05-05 13:23:16 +01:00
steve e7d891c4fc oidc: client wrapper around go-oidc — discovery, exchange, claim parse 2026-05-05 13:20:08 +01:00
steve 5c844ad9b7 config: OIDCConfig — YAML + env overlay with defaults 2026-05-05 13:18:01 +01:00
steve 89d4458866 feat(hosts): per-host tags edit + dashboard chip-row filter (P4-07) 2026-05-05 11:16:09 +01:00
steve dfff6d1ef9 ui(users): banner explaining the disabled-username re-enable flow
CI / Test (rest) (pull_request) Successful in 29s
CI / Lint (pull_request) Successful in 32s
CI / Test (server-http) (pull_request) Successful in 1m9s
CI / Test (store) (pull_request) Successful in 1m13s
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 37s
2026-05-05 10:57:25 +01:00
steve 0415a96e27 ui(users): record last_login on /setup + sortable headers 2026-05-05 10:57:25 +01:00
steve c34a76393c ui: /settings/account self-service password change
Adds GET/POST handlers for /settings/account in the viewer band
(any authenticated user), account.html template with current-password
field suppressed when must_change_password is set, and audits the
change via AppendAudit.
2026-05-05 10:57:25 +01:00
steve 6ccc6c8c5e ui: /settings/users edit form + disable/enable/regenerate/force-logout 2026-05-05 10:57:25 +01:00
steve b0a5a76925 ui: /settings/users/new + /setup-link page
Adds handleUIUserNewGet, handleUIUserNewPost, handleUIUserSetupLinkGet
to ui_users.go; creates web/templates/pages/user_edit.html (multi-mode
new/edit/setup-link); wires three routes in the admin band of server.go.
2026-05-05 10:57:25 +01:00
steve 88f1959a6a ui: /settings/users list page 2026-05-05 10:57:25 +01:00
steve cae4147df6 http: POST /api/account/password — self-service password change 2026-05-05 10:57:25 +01:00
steve dbb8550936 http: regenerate setup link + force-logout 2026-05-05 10:57:25 +01:00
steve 90bcddb27e http: disable/enable user with last-admin guard + session kick 2026-05-05 10:57:25 +01:00
steve cd3c13e2c6 http: GET/PATCH /api/users/{id} with last-admin guard 2026-05-05 10:57:25 +01:00
steve a74dc33c1c http: POST /api/users — create + setup-token + audit 2026-05-05 10:57:25 +01:00
steve a985d45daa http: GET /api/users (list) 2026-05-05 10:57:25 +01:00
steve 57a13f0759 http: POST /setup — set password, drop session, audit setup_completed
Replaces the 501 stub with the full handler: validates the token and
password, hashes and stores the password, deletes the setup token,
mints an 8-hour session cookie, appends a user.setup_completed audit
entry, and redirects to /. Adds TestSetupPostHappyPath covering the
full round-trip including normal-login verification after setup.
2026-05-05 10:57:24 +01:00
steve 8d4c4426b0 http: GET /setup landing page with expiry handling 2026-05-05 10:57:24 +01:00
steve cbdd94ca12 http: session/login reject disabled users; mid-session disable kicks immediately 2026-05-05 10:57:24 +01:00
steve c1e974aad9 http: re-group routes by role band, fail-closed admin default
Routes are now structured into Public / Viewer / Operator / Admin bands
using requireRole middleware. Job log stream and download moved into the
Viewer band. healthz moved from New() into routes() with the other
public endpoints.
2026-05-05 10:57:24 +01:00
steve 95aee73e2c http: gated test for admin-band reject of operator (lands fully in B4+E1) 2026-05-05 10:57:24 +01:00
steve f87ba29836 http: requireRole middleware + 403 forbidden page 2026-05-05 10:57:24 +01:00
steve 2073898c10 http: test helpers — makeUser, loginAs 2026-05-05 10:57:24 +01:00
steve 37a25beb14 http: roleAtLeast helper for the role hierarchy 2026-05-05 10:57:24 +01:00
steve ba425c9766 feat(audit): clickable column headers with asc/desc sort
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Lint (pull_request) Successful in 34s
CI / Build (linux/amd64) (pull_request) Successful in 23s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Test (linux/amd64) (pull_request) Successful in 3m41s
2026-05-05 08:15:22 +01:00
steve 1d0d994bc4 audit(csv): drop user_id and target_id columns 2026-05-05 08:05:41 +01:00
steve 489f831fc7 feat(audit): CSV export, absolute timestamps, payload modal 2026-05-05 08:00:53 +01:00
steve 3f36bcd0b0 feat(audit): P3-08 — audit log UI with filters 2026-05-05 07:49:25 +01:00
steve 9860b412f7 feat(alerts): live-refresh the table every 15s while the tab is visible
The alerts list is the one screen where staleness is genuinely
harmful — an operator can be looking at an Open tab that's already
been resolved by another admin or auto-resolved by the engine, and
take action on a row that no longer exists.

Add an htmx poll on just the table panel:

  hx-get        same URL with current querystring (filters preserved)
  hx-trigger    every 15s, only when document is visible (no idle CPU)
  hx-select     #alerts-table — pull this element out of the response
  hx-swap       outerHTML

Polling lives on the table div, not the page root, so the filter
strip and header don't flash on each tick. Header gains a small
'live ●' label so the polling is discoverable.

RefreshURL is r.URL.RequestURI() on the server side — keeps any
status/severity/host_id/q params intact across refreshes.

Other screens (dashboard, hosts, jobs) deliberately stay manual-
refresh per the project's anti-flicker stance.
2026-05-04 23:30:19 +01:00
steve a45c801884 feat(alerts): per-source-group dedup so two failing backups produce two alerts
Until now the open-alert key was (host_id, kind, resolved_at IS NULL).
A host with two source groups both failing collapsed onto one
backup_failed row — second failure bumped last_seen_at and
overwrote the message but never re-fan-out. Operators saw one
alert that appeared to flap, not two distinct broken things.

Schema changes (column-level ALTER, no rebuild):

- 0015 jobs.source_group_id (FK → source_groups, ON DELETE SET NULL,
  index). Populated for backup jobs in CreateJob.
- 0016 alerts.dedup_key (NOT NULL DEFAULT ''). The old alerts_open
  partial index gets dropped and replaced with a UNIQUE partial
  index on (host_id, kind, dedup_key) WHERE resolved_at IS NULL —
  the index is now the actual dedup primitive.

Plumbing:

- RaiseOrTouch / AutoResolve / Alert struct gain dedup_key.
- engine.JobFinishedEvent gains SourceGroupID; handleJobFinished
  passes it through for backup_failed only (forget/prune/check stay
  repo-scoped with key='').
- ws.handler reads SourceGroupID off the freshly-loaded job row.
- dispatchJobWithPayload gains a *string sourceGroupID arg; the
  per-group Run-now path and schedule.fire path pass &g.ID.

Test coverage: TestRaiseOrTouchDedupsPerSourceGroup proves two
distinct groups produce two distinct open alerts and that resolving
one does not auto-resolve the other.

Dev tool: cmd/_fake_alert gains -dedup-key flag.
2026-05-04 22:59:48 +01:00
steve feaeff217d feat(ntfy): support HTTP Basic auth alongside access tokens
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Lint (pull_request) Successful in 1m12s
CI / Test (linux/amd64) (pull_request) Successful in 1m18s
Self-hosted ntfy that doesn't expose a token-mint endpoint can still
authenticate over HTTP Basic. Add Username + Password fields to
NtfyConfig; the channel sends 'Authorization: Basic …' when token is
empty and username is set. Token wins when both are configured.

Form-side: two new optional fields next to the access token, with
the same write-only placeholder treatment as smtp_password (blank
on edit means 'keep stored value'). Username is round-tripped on
edit; password is masked.
2026-05-04 22:25:42 +01:00
steve cffad4b4f3 fix: enabled toggle — list-row click + edit-form save
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 24s
CI / Build (linux/arm64) (pull_request) Successful in 24s
CI / Lint (pull_request) Successful in 1m15s
CI / Test (linux/amd64) (pull_request) Successful in 1m36s
Two bugs in the channel-enabled affordance:

1. List-row toggle was a static span with no handler; the row's
   row-link overlay swallowed every click and routed to /edit. Add
   POST /settings/notifications/{id}/toggle backed by a new store
   method SetNotificationChannelEnabled, and turn the row toggle
   into an htmx-driven button that swaps in the new state. Use
   event.stopPropagation() on the toggle so it beats the row link.

2. Edit-form toggle visually flipped but the underlying checkbox
   reverted: the visual span lives inside the <label>, so clicking
   it fired the inline JS handler AND the label's native
   checkbox-toggle, cancelling out. Bind to the checkbox 'change'
   event instead and let the label do the toggling — the JS just
   mirrors check.checked into the .on class.
2026-05-04 22:21:45 +01:00