restic-manager

Author	SHA1	Message	Date
steve	c319c38038	nav: drop dead /repos top-level link (repos are per-host, accessed via host sub-tab)	2026-05-09 11:59:08 +01:00
steve	130b68226e	api: expose host.repo_status in /api/hosts JSON The dashboard renders init_running / init_failed / ready state based on host.repo_status, but the JSON endpoint dropped the field on its way out. The e2e test couldn't poll for repo readiness; reflect the same projection the UI uses.	2026-05-08 22:06:22 +01:00
steve	21567adb8e	runner tests: probe-exec setupScript to clear overlayfs ETXTBSY The original write-tmp-then-rename guard handles the ETXTBSY race on a vanilla filesystem, but inside the new ci-runner-go container our jobs land on overlayfs, which keeps a lagged "writable inode" view long enough to leak ETXTBSY into the exec the test does milliseconds later. After rename, probe-exec the file with a benign argument ("__rm_probe__" — every script's case statement falls through to a clean exit) until exec succeeds. Each script body is shaped `case "$1" in restore) ... ;; esac` so the probe is a no-op. 3s deadline keeps a stuck filesystem from hanging the suite.	2026-05-08 21:26:35 +01:00
steve	73e733be61	P6-04+05: Prometheus /metrics endpoint + Grafana dashboard New internal/server/metrics package emits the legacy text/plain exposition format directly, so we don't pull in prometheus/client_golang. Endpoint is opt-in via RM_METRICS_TOKEN and/or RM_METRICS_TRUSTED_CIDR; route is not mounted at all if neither gate is set. Both gates ANDed when both configured. Per-host gauges (online, last_backup_*, repo_size_bytes, snapshot_count, open_alerts, repo_status), server gauges (hosts_total/online, active_alerts by severity, build_info), and an in-memory job-duration histogram observed from the existing MsgJobFinished branch in the WS handler. Docs in docs/prometheus.md (enable + scrape config + metric reference + dashboard import). Sample dashboard at deploy/grafana/restic-manager-dashboard.json - six panels, Grafana schema 39, single Prometheus datasource variable. Tests: golden render, concurrent observe, bucket boundaries in the metrics package; auth matrix (no auth -> 404, token gate, CIDR gate, both required) in the HTTP layer.	2026-05-07 23:17:15 +01:00
steve	c4dc9e9119	ui+store: dashboard polish — repo size projection + header alignment - Project total_size_bytes onto hosts.repo_size_bytes inside the UpsertHostRepoStats transaction. The hosts row column has been unwritten since the initial schema in 0001, so the dashboard's Repo size cell has always rendered '—' even after backups. Now the column updates atomically alongside the host_repo_stats row, and FleetSummary's SUM(repo_size_bytes) becomes accurate too. - Right-align the Alerts column header so it sits over its right-aligned value (was floating left of column, ambiguous). - Add text-ink-mid to the 30d trend / Alerts / Tags headers so all column headers share the same brightness.	2026-05-07 22:55:21 +01:00
steve	7011510092	ui: chart polish — rotated y-axis labels, wider viewBox, single-day fallback - Add rotated 'Size' (left) and 'Snapshots' (right) axis titles in the chart's outer margins so the two y-axes are self-describing. - Bump the chart viewBox from 600x220 to 640x220 and lift padL from 56 to 72 so the rotated labels and byte tick numbers don't crowd. - Dedupe the X-axis labels for short windows (1 or 2 days collapsed the start/mid/end indices onto each other, stacking 'May 7' three times); the 1-day case now centres a single label, 2-day uses start+end only. - Pin a lone data dot to the chart centre instead of the left edge when len(days)==1, so it sits under the centred date label. Goldens regenerated.	2026-05-07 22:55:12 +01:00
steve	42eeabea9a	ui: per-host Jobs sub-tab; drop unused Settings stub Adds /hosts/{id}/jobs page listing recent jobs for the host (newest first, capped at 100) with click-through to /jobs/{id}. Converts the Jobs placeholder <div> to a real <a> nav link; removes the Settings stub entirely. Also registers durationHuman template func and a .jobs-row CSS grid to match the existing .schd-row idiom.	2026-05-07 22:49:10 +01:00
steve	7b390e9e5e	ws: synthesize job.finished from update watcher so browser stream wakes up	2026-05-07 20:32:48 +01:00
steve	2562b2c7b5	test: assert Trend panel renders on full repo page	2026-05-07 19:14:34 +01:00
steve	8be551349c	ui: trend panel + range selector on host repo page	2026-05-07 19:10:59 +01:00
steve	a48df77f40	ui: 30d repo-size sparkline on every dashboard host row	2026-05-07 19:02:35 +01:00
steve	70769f0841	web/sparkline: guard days[i] against shorter days slice in RenderChart	2026-05-07 18:58:33 +01:00
steve	ea74965830	web/sparkline: two-axis trend chart with hover dots	2026-05-07 18:55:31 +01:00
steve	9c209a952e	web/sparkline: inline-SVG sparkline renderer (empty / single / multi)	2026-05-07 18:50:23 +01:00
steve	871490b9d4	ws: record daily repo stats history alongside current upsert	2026-05-07 18:46:26 +01:00
steve	d317d2e561	store: history table helpers (upsert/list, COALESCE preserves prior values)	2026-05-07 18:43:20 +01:00
steve	00bfef0aee	store: migration 0023 host_repo_stats_history	2026-05-07 18:39:44 +01:00
steve	711d5e964c	fix: project finished backup jobs onto host row + smoke path tweaks The dashboard's 'Last backup' column reads hosts.last_backup_at / last_backup_status, but the WS handler only updated hosts.repo_status on job.finished — backup terminations were silently dropped. Add a SetHostLastBackup store method and call it from the same job.finished switch that already handles init jobs. Also: CLAUDE.md restage block uses /tmp/rm-smoke (the original default) but the actual dev env runs out of $HOME/smoke. Update the paths in the doc to match.	2026-05-07 17:55:23 +01:00
steve	ccaccd840a	ui: dashboard hosts-behind tile + filter - Add ?updates=behind query filter and the matching dashboardFilter field; round-trips through encode/parse. - Compute UpdatesBehind on the dashboard view-model (online + version trailing the server) and surface as an amber hero tile that links to the filtered list. - Test exercise covering the new filter case.	2026-05-06 22:20:54 +01:00
steve	94441a5371	ui: update chip + per-host button - Surface UpdateAvailable + TargetVersion on the dashboard host row, the host_chrome header, and the JSON Host shape. - New host_update_chip partial renders an amber out-of-date pill next to the agent-version display when the host's agent trails the server. - Host detail right-rail gains an admin-only Update agent button (disabled when host is offline or already updating). - New .update-chip and .btn-amber CSS tokens; tailwind output refreshed.	2026-05-06 22:20:40 +01:00
steve	3fa7be51a5	ui: fleet update page + endpoints - POST /api/fleet/update, POST /api/fleet-updates/{id}/cancel, GET /api/fleet-updates/{id} (admin-only). - GET /settings/fleet-update + /partial for htmx polling. - Renders idle / running / terminal states with per-host progress. - Tests cover happy path, derive-host-ids, conflict, cancel, get, and RBAC.	2026-05-06 22:20:03 +01:00
steve	6fd2a2ff77	p6-01/02: agent self-update + fleet update server cluster - alert: update_failed (per-host, dedup=hostID) + fleet_update_halted (system-scoped, host_id NULL via new RaiseOrTouchSystem helper). - ws: UpdateWatcher tracks in-flight command.update dispatches and reconciles them against incoming hello envelopes — success path marks the job succeeded and auto-resolves the alert; 90s timeout marks the job failed and raises update_failed. - http: POST /api/hosts/{id}/update (admin-only JSON) + the HTMX /hosts/{id}/update form variant. Pre-checks: host exists, online, agent_version != current, no running update job. Refactored core into Server.dispatchHostUpdate so the fleet worker can share it without going through HTTP. - fleetupdate: rolling worker iterating through host slots, halting on first failure and raising fleet_update_halted. Polling-based version-match (re-read hosts.agent_version every 1s up to 95s) — no extra plumbing into the WS hello path. At-most-one-running is enforced at the store layer (ErrFleetUpdateRunning). - cmd/server: wire UpdateWatcher and FleetWorker into the main goroutine; the worker uses a small serverDispatcher adapter that delegates back into Server.DispatchHostUpdate. Tests: watcher (success/timeout/mismatch/late-hello), HTTP endpoint (happy + four pre-check branches + RBAC), worker (two-host happy, timeout-halt, host-offline-halt, already-at-target skip, cancel mid-run, double-Start guard).	2026-05-06 22:03:50 +01:00
steve	d413896302	store: migrations 0021+0022 + fleet_updates CRUD	2026-05-06 21:47:54 +01:00
steve	74cf24c28b	agent: command.update handler + updater package (Linux + Windows)	2026-05-06 21:42:50 +01:00
steve	22bcf69e6c	http: expose GET /api/version	2026-05-06 21:39:13 +01:00
steve	fe1ed49977	version: build-time version package + Makefile ldflags wiring	2026-05-06 21:38:35 +01:00
steve	3800b34a2b	testing: bootstrap UI, agent reliability, NS-01..04 + alert username Smoothes the rough edges that came up exercising a live deployment. First-run bootstrap UI: /bootstrap renders a username + password form that uses the in-memory token directly (operator no longer copies it out of the log); /login redirects there while bootstrap is available. Agent reliability: failJob synthetic envelopes so command.run early returns no longer hang the server-side job; runtime probe of restic restore --help drives --no-ownership instead of version sniffing (0.18.x had it removed). Server unit re-shaped: ProtectSystem=full plus ReadWritePaths=/etc/restic-manager, no ProtectHome — restore can now write anywhere a user might want. Restore wizard: default target is /root/rm-restore/<job-id>/ with clearer help text. Re-init confirm input uses .field (was .input, which doesn't exist — text was invisible). NS-01 host delete: store DeleteHost, admin-band /hosts/{id}/delete with hostname-confirm danger zone, audit, FK cascade, live WS close. NS-02 enrollment-token recovery: outstanding-tokens panel on /hosts/new, regenerate (preserves attachments) and revoke handlers + audit, store-level ListOutstandingEnrollmentTokens and DeleteEnrollmentToken. NS-03 repo init / probe surface: migration 0020 adds hosts.repo_status + repo_status_error; WS handler projects every init job's outcome onto the host row (idempotent already-initialised collapses to ready); creds-save resets status and dispatches a fresh probe; /hosts/{id}/repo/probe retry endpoint with banner. NS-04 dashboard live + sort + filter: query-string filter (q/status/repo_status/tag/sort/dir), 5s htmx live poll mirroring the alerts pattern with a localStorage live toggle, sortable column headers, filter row + clear. Alerts page: ack'd-by line resolves user_id ULID to username. Compose.yaml ignored — host-specific.	2026-05-05 22:03:15 +01:00
steve	7cc17813a9	p5-03: docker-only release path (drop goreleaser) Single public deliverable per tag: a multi-arch server image, with cross-compiled agent binaries + install scripts + the systemd unit baked under /opt/restic-manager/dist/. The /agent/binary and /install/* handlers fall back from <DataDir>/... to that read-only path so a fresh container Just Works without first-run staging; operators can still drop a custom build into <DataDir>/ to override per-host. Architecture rationale: agent distribution already routes through the running server, so the release surface mirrors that — there's no second source of truth to keep in sync. Workflow .gitea/workflows/release.yml triggers on v..* tag-push (fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and workflow_dispatch (snapshot tag only). Pushes to the Gitea container registry on this instance. Both binaries grow main.commit + main.date ldflag targets. Makefile and Dockerfile fill them; release workflow forwards from gitea.sha plus a UTC timestamp. Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md	2026-05-05 15:18:48 +01:00
steve	4d90f72575	oidc: merge userinfo claims; tick P4-05 in tasks.md Authelia (and many other IdPs) only put `sub` in the ID token by default, surfacing `preferred_username`/`email`/`groups` from the userinfo endpoint. Fetch userinfo after id_token verification and fold its claims into the parsed claim map; the id_token claims remain authoritative on conflict so the signed assertion still wins. Live sweep against https://auth.dcglab.co.uk verified all four flows: rm-admin → admin JIT, rm-operator → operator JIT (RBAC denies admin pages), rm-viewer → viewer JIT (RBAC denies operator pages), rm-other → no_role_match banner with no row created. Returning rm-admin sign-in resolves to the same row by sub. Screenshots in _diag/p4-05-sweep/.	2026-05-05 14:06:28 +01:00
steve	3173f85b97	server: build OIDC client at startup; sweep oidc_state on alert tick	2026-05-05 13:45:52 +01:00
steve	962a5affea	ui(users): oidc chip on list + readonly fields on edit for OIDC users	2026-05-05 13:42:57 +01:00
steve	885439b048	ui: login page — SSO button + oidc_error banner	2026-05-05 13:40:13 +01:00
steve	c62d7d3ac3	http: local-login rejects auth_source='oidc' users	2026-05-05 13:37:07 +01:00
steve	86598d6357	http: logout — 303 to end_session_endpoint with id_token_hint for OIDC sessions	2026-05-05 13:34:47 +01:00
steve	c55a75355a	http: GET /auth/oidc/callback — JIT-provision, refresh, deny paths	2026-05-05 13:30:00 +01:00
steve	f56844b5c6	http: GET /auth/oidc/login — generate state/PKCE, redirect to IdP	2026-05-05 13:26:06 +01:00
steve	878c82a328	oidc: test stub IdP + happy-path exchange test	2026-05-05 13:23:16 +01:00
steve	e7d891c4fc	oidc: client wrapper around go-oidc — discovery, exchange, claim parse	2026-05-05 13:20:08 +01:00
steve	5c844ad9b7	config: OIDCConfig — YAML + env overlay with defaults	2026-05-05 13:18:01 +01:00
steve	6006cad992	store: oidc_state CRUD + 5-minute cleanup	2026-05-05 13:15:45 +01:00
steve	7f8bd13a07	store: round-trip IDToken on sessions for RP-initiated logout	2026-05-05 13:14:27 +01:00
steve	805380f52d	store: GetUserByOIDCSubject + scanUser auth_source/oidc_subject	2026-05-05 13:12:11 +01:00
steve	c2581e56e8	store: extend User with AuthSource/OIDCSubject; Session with IDToken	2026-05-05 13:09:49 +01:00
steve	dc89997307	store: migration 0019 — users.auth_source/oidc_subject + sessions.id_token + oidc_state	2026-05-05 13:08:15 +01:00
steve	89d4458866	feat(hosts): per-host tags edit + dashboard chip-row filter (P4-07)	2026-05-05 11:16:09 +01:00
steve	dfff6d1ef9	ui(users): banner explaining the disabled-username re-enable flow	2026-05-05 10:57:25 +01:00
steve	0415a96e27	ui(users): record last_login on /setup + sortable headers	2026-05-05 10:57:25 +01:00
steve	d2cc4a802e	alert: piggy-back expired-setup-token cleanup on the engine tick	2026-05-05 10:57:25 +01:00
steve	c34a76393c	ui: /settings/account self-service password change Adds GET/POST handlers for /settings/account in the viewer band (any authenticated user), account.html template with current-password field suppressed when must_change_password is set, and audits the change via AppendAudit.	2026-05-05 10:57:25 +01:00
steve	6ccc6c8c5e	ui: /settings/users edit form + disable/enable/regenerate/force-logout	2026-05-05 10:57:25 +01:00
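
Note on 73e733be61: the gating rule described in that commit (route not mounted when neither gate is set, both gates ANDed when both are configured) could be sketched roughly as below. This is an illustration only, not the code from internal/server/metrics. The names gate, newGate, mount and probe are hypothetical, and so are two details the commit message does not state: that the token arrives as an "Authorization: Bearer" header, and that a rejected request gets a 403.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// gate holds the two optional checks (hypothetical type, not from the repo).
type gate struct {
	token   string     // value of RM_METRICS_TOKEN; empty means not configured
	trusted *net.IPNet // parsed RM_METRICS_TRUSTED_CIDR; nil means not configured
}

func newGate(token, cidr string) (gate, error) {
	g := gate{token: token}
	if cidr != "" {
		_, n, err := net.ParseCIDR(cidr)
		if err != nil {
			return gate{}, err
		}
		g.trusted = n
	}
	return g, nil
}

// enabled reports whether at least one gate is configured.
func (g gate) enabled() bool { return g.token != "" || g.trusted != nil }

// allow applies every configured gate; all of them must pass (AND).
func (g gate) allow(r *http.Request) bool {
	if g.token != "" {
		// Assumption: the token is sent as a bearer token.
		got, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		if !ok || subtle.ConstantTimeCompare([]byte(got), []byte(g.token)) != 1 {
			return false
		}
	}
	if g.trusted != nil {
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			return false
		}
		ip := net.ParseIP(host)
		if ip == nil || !g.trusted.Contains(ip) {
			return false
		}
	}
	return true
}

// mount registers /metrics only when a gate is configured, so an
// unconfigured server answers 404 from the mux itself.
func mount(mux *http.ServeMux, g gate, h http.Handler) {
	if !g.enabled() {
		return
	}
	mux.Handle("/metrics", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.allow(r) {
			// Assumption: 403 on rejection; the real status is not stated.
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		h.ServeHTTP(w, r)
	}))
}

// probe builds a mux for the given settings and returns the status code
// a single GET /metrics would receive.
func probe(token, cidr, remoteAddr, bearer string) int {
	g, err := newGate(token, cidr)
	if err != nil {
		return -1
	}
	mux := http.NewServeMux()
	mount(mux, g, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# metrics would be rendered here")
	}))
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	req.RemoteAddr = remoteAddr
	if bearer != "" {
		req.Header.Set("Authorization", "Bearer "+bearer)
	}
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("no gates:       ", probe("", "", "10.0.0.5:1000", ""))
	fmt.Println("token only, ok: ", probe("s3cret", "", "10.0.0.5:1000", "s3cret"))
	fmt.Println("cidr only, ok:  ", probe("", "10.0.0.0/8", "10.0.0.5:1000", ""))
	fmt.Println("both, bad addr: ", probe("s3cret", "10.0.0.0/8", "192.168.1.1:1000", "s3cret"))
}
```

Leaving the route unmounted, rather than mounting it and rejecting every request, means a server with neither variable set does not reveal that a metrics endpoint exists, which matches the "no auth -> 404" case in the commit's test matrix.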

212 Commits