restic-manager

Author	SHA1	Message	Date
steve	6fd2a2ff77	p6-01/02: agent self-update + fleet update server cluster - alert: update_failed (per-host, dedup=hostID) + fleet_update_halted (system-scoped, host_id NULL via new RaiseOrTouchSystem helper). - ws: UpdateWatcher tracks in-flight command.update dispatches and reconciles them against incoming hello envelopes — success path marks the job succeeded and auto-resolves the alert; 90s timeout marks the job failed and raises update_failed. - http: POST /api/hosts/{id}/update (admin-only JSON) + the HTMX /hosts/{id}/update form variant. Pre-checks: host exists, online, agent_version != current, no running update job. Refactored core into Server.dispatchHostUpdate so the fleet worker can share it without going through HTTP. - fleetupdate: rolling worker iterating through host slots, halting on first failure and raising fleet_update_halted. Polling-based version-match (re-read hosts.agent_version every 1s up to 95s) — no extra plumbing into the WS hello path. At-most-one-running is enforced at the store layer (ErrFleetUpdateRunning). - cmd/server: wire UpdateWatcher and FleetWorker into the main goroutine; the worker uses a small serverDispatcher adapter that delegates back into Server.DispatchHostUpdate. Tests: watcher (success/timeout/mismatch/late-hello), HTTP endpoint (happy + four pre-check branches + RBAC), worker (two-host happy, timeout-halt, host-offline-halt, already-at-target skip, cancel mid-run, double-Start guard).	2026-05-06 22:03:50 +01:00
steve	22bcf69e6c	http: expose GET /api/version	2026-05-06 21:39:13 +01:00
steve	3800b34a2b	testing: bootstrap UI, agent reliability, NS-01..04 + alert username Smooths the rough edges that came up exercising a live deployment. First-run bootstrap UI: /bootstrap renders a username + password form that uses the in-memory token directly (operator no longer copies it out of the log); /login redirects there while bootstrap is available. Agent reliability: failJob synthetic envelopes so command.run early returns no longer hang the server-side job; runtime probe of restic restore --help drives --no-ownership instead of version sniffing (0.18.x had it removed). Server unit re-shaped: ProtectSystem=full plus ReadWritePaths=/etc/restic-manager, no ProtectHome — restore can now write anywhere a user might want. Restore wizard: default target is /root/rm-restore/<job-id>/ with clearer help text. Re-init confirm input uses .field (was .input, which doesn't exist — text was invisible). NS-01 host delete: store DeleteHost, admin-band /hosts/{id}/delete with hostname-confirm danger zone, audit, FK cascade, live WS close. NS-02 enrollment-token recovery: outstanding-tokens panel on /hosts/new, regenerate (preserves attachments) and revoke handlers + audit, store-level ListOutstandingEnrollmentTokens and DeleteEnrollmentToken. NS-03 repo init / probe surface: migration 0020 adds hosts.repo_status + repo_status_error; WS handler projects every init job's outcome onto the host row (idempotent already-initialised collapses to ready); creds-save resets status and dispatches a fresh probe; /hosts/{id}/repo/probe retry endpoint with banner. NS-04 dashboard live + sort + filter: query-string filter (q/status/repo_status/tag/sort/dir), 5s htmx live poll mirroring the alerts pattern with a localStorage live toggle, sortable column headers, filter row + clear. Alerts page: ack'd-by line resolves user_id ULID to username. Compose.yaml ignored — host-specific.	2026-05-05 22:03:15 +01:00
steve	7cc17813a9	p5-03: docker-only release path (drop goreleaser) Single public deliverable per tag: a multi-arch server image, with cross-compiled agent binaries + install scripts + the systemd unit baked under /opt/restic-manager/dist/. The /agent/binary and /install/* handlers fall back from <DataDir>/... to that read-only path so a fresh container Just Works without first-run staging; operators can still drop a custom build into <DataDir>/ to override per-host. Architecture rationale: agent distribution already routes through the running server, so the release surface mirrors that — there's no second source of truth to keep in sync. Workflow .gitea/workflows/release.yml triggers on v..* tag-push (fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and workflow_dispatch (snapshot tag only). Pushes to the Gitea container registry on this instance. Both binaries grow main.commit + main.date ldflag targets. Makefile and Dockerfile fill them; release workflow forwards from gitea.sha plus a UTC timestamp. Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md	2026-05-05 15:18:48 +01:00
steve	4d90f72575	oidc: merge userinfo claims; tick P4-05 in tasks.md Authelia (and many other IdPs) only put `sub` in the ID token by default, surfacing `preferred_username`/`email`/`groups` from the userinfo endpoint. Fetch userinfo after id_token verification and fold its claims into the parsed claim map; the id_token claims remain authoritative on conflict so the signed assertion still wins. Live sweep against https://auth.dcglab.co.uk verified all four flows: rm-admin → admin JIT, rm-operator → operator JIT (RBAC denies admin pages), rm-viewer → viewer JIT (RBAC denies operator pages), rm-other → no_role_match banner with no row created. Returning rm-admin sign-in resolves to the same row by sub. Screenshots in _diag/p4-05-sweep/.	2026-05-05 14:06:28 +01:00
steve	962a5affea	ui(users): oidc chip on list + readonly fields on edit for OIDC users	2026-05-05 13:42:57 +01:00
steve	885439b048	ui: login page — SSO button + oidc_error banner	2026-05-05 13:40:13 +01:00
steve	c62d7d3ac3	http: local-login rejects auth_source='oidc' users	2026-05-05 13:37:07 +01:00
steve	86598d6357	http: logout — 303 to end_session_endpoint with id_token_hint for OIDC sessions	2026-05-05 13:34:47 +01:00
steve	c55a75355a	http: GET /auth/oidc/callback — JIT-provision, refresh, deny paths	2026-05-05 13:30:00 +01:00
steve	f56844b5c6	http: GET /auth/oidc/login — generate state/PKCE, redirect to IdP	2026-05-05 13:26:06 +01:00
steve	878c82a328	oidc: test stub IdP + happy-path exchange test	2026-05-05 13:23:16 +01:00
steve	e7d891c4fc	oidc: client wrapper around go-oidc — discovery, exchange, claim parse	2026-05-05 13:20:08 +01:00
steve	5c844ad9b7	config: OIDCConfig — YAML + env overlay with defaults	2026-05-05 13:18:01 +01:00
steve	89d4458866	feat(hosts): per-host tags edit + dashboard chip-row filter (P4-07)	2026-05-05 11:16:09 +01:00
steve	dfff6d1ef9	ui(users): banner explaining the disabled-username re-enable flow	2026-05-05 10:57:25 +01:00
steve	0415a96e27	ui(users): record last_login on /setup + sortable headers	2026-05-05 10:57:25 +01:00
steve	c34a76393c	ui: /settings/account self-service password change Adds GET/POST handlers for /settings/account in the viewer band (any authenticated user), account.html template with current-password field suppressed when must_change_password is set, and audits the change via AppendAudit.	2026-05-05 10:57:25 +01:00
steve	6ccc6c8c5e	ui: /settings/users edit form + disable/enable/regenerate/force-logout	2026-05-05 10:57:25 +01:00
steve	b0a5a76925	ui: /settings/users/new + /setup-link page Adds handleUIUserNewGet, handleUIUserNewPost, handleUIUserSetupLinkGet to ui_users.go; creates web/templates/pages/user_edit.html (multi-mode new/edit/setup-link); wires three routes in the admin band of server.go.	2026-05-05 10:57:25 +01:00
steve	88f1959a6a	ui: /settings/users list page	2026-05-05 10:57:25 +01:00
steve	cae4147df6	http: POST /api/account/password — self-service password change	2026-05-05 10:57:25 +01:00
steve	dbb8550936	http: regenerate setup link + force-logout	2026-05-05 10:57:25 +01:00
steve	90bcddb27e	http: disable/enable user with last-admin guard + session kick	2026-05-05 10:57:25 +01:00
steve	cd3c13e2c6	http: GET/PATCH /api/users/{id} with last-admin guard	2026-05-05 10:57:25 +01:00
steve	a74dc33c1c	http: POST /api/users — create + setup-token + audit	2026-05-05 10:57:25 +01:00
steve	a985d45daa	http: GET /api/users (list)	2026-05-05 10:57:25 +01:00
steve	57a13f0759	http: POST /setup — set password, drop session, audit setup_completed Replaces the 501 stub with the full handler: validates the token and password, hashes and stores the password, deletes the setup token, mints an 8-hour session cookie, appends a user.setup_completed audit entry, and redirects to /. Adds TestSetupPostHappyPath covering the full round-trip including normal-login verification after setup.	2026-05-05 10:57:24 +01:00
steve	8d4c4426b0	http: GET /setup landing page with expiry handling	2026-05-05 10:57:24 +01:00
steve	cbdd94ca12	http: session/login reject disabled users; mid-session disable kicks immediately	2026-05-05 10:57:24 +01:00
steve	c1e974aad9	http: re-group routes by role band, fail-closed admin default Routes are now structured into Public / Viewer / Operator / Admin bands using requireRole middleware. Job log stream and download moved into the Viewer band. healthz moved from New() into routes() with the other public endpoints.	2026-05-05 10:57:24 +01:00
steve	95aee73e2c	http: gated test for admin-band reject of operator (lands fully in B4+E1)	2026-05-05 10:57:24 +01:00
steve	f87ba29836	http: requireRole middleware + 403 forbidden page	2026-05-05 10:57:24 +01:00
steve	2073898c10	http: test helpers — makeUser, loginAs	2026-05-05 10:57:24 +01:00
steve	37a25beb14	http: roleAtLeast helper for the role hierarchy	2026-05-05 10:57:24 +01:00
steve	ba425c9766	feat(audit): clickable column headers with asc/desc sort	2026-05-05 08:15:22 +01:00
steve	1d0d994bc4	audit(csv): drop user_id and target_id columns	2026-05-05 08:05:41 +01:00
steve	489f831fc7	feat(audit): CSV export, absolute timestamps, payload modal	2026-05-05 08:00:53 +01:00
steve	3f36bcd0b0	feat(audit): P3-08 — audit log UI with filters	2026-05-05 07:49:25 +01:00
steve	9860b412f7	feat(alerts): live-refresh the table every 15s while the tab is visible The alerts list is the one screen where staleness is genuinely harmful — an operator can be looking at an Open tab that's already been resolved by another admin or auto-resolved by the engine, and take action on a row that no longer exists. Add an htmx poll on just the table panel: hx-get same URL with current querystring (filters preserved) hx-trigger every 15s, only when document is visible (no idle CPU) hx-select #alerts-table — pull this element out of the response hx-swap outerHTML Polling lives on the table div, not the page root, so the filter strip and header don't flash on each tick. Header gains a small 'live ●' label so the polling is discoverable. RefreshURL is r.URL.RequestURI() on the server side — keeps any status/severity/host_id/q params intact across refreshes. Other screens (dashboard, hosts, jobs) deliberately stay manual- refresh per the project's anti-flicker stance.	2026-05-04 23:30:19 +01:00
steve	a45c801884	feat(alerts): per-source-group dedup so two failing backups produce two alerts Until now the open-alert key was (host_id, kind, resolved_at IS NULL). A host with two source groups both failing collapsed onto one backup_failed row — second failure bumped last_seen_at and overwrote the message but never re-fan-out. Operators saw one alert that appeared to flap, not two distinct broken things. Schema changes (column-level ALTER, no rebuild): - 0015 jobs.source_group_id (FK → source_groups, ON DELETE SET NULL, index). Populated for backup jobs in CreateJob. - 0016 alerts.dedup_key (NOT NULL DEFAULT ''). The old alerts_open partial index gets dropped and replaced with a UNIQUE partial index on (host_id, kind, dedup_key) WHERE resolved_at IS NULL — the index is now the actual dedup primitive. Plumbing: - RaiseOrTouch / AutoResolve / Alert struct gain dedup_key. - engine.JobFinishedEvent gains SourceGroupID; handleJobFinished passes it through for backup_failed only (forget/prune/check stay repo-scoped with key=''). - ws.handler reads SourceGroupID off the freshly-loaded job row. - dispatchJobWithPayload gains a *string sourceGroupID arg; the per-group Run-now path and schedule.fire path pass &g.ID. Test coverage: TestRaiseOrTouchDedupsPerSourceGroup proves two distinct groups produce two distinct open alerts and that resolving one does not auto-resolve the other. Dev tool: cmd/_fake_alert gains -dedup-key flag.	2026-05-04 22:59:48 +01:00
steve	feaeff217d	feat(ntfy): support HTTP Basic auth alongside access tokens Self-hosted ntfy that doesn't expose a token-mint endpoint can still authenticate over HTTP Basic. Add Username + Password fields to NtfyConfig; the channel sends 'Authorization: Basic …' when token is empty and username is set. Token wins when both are configured. Form-side: two new optional fields next to the access token, with the same write-only placeholder treatment as smtp_password (blank on edit means 'keep stored value'). Username is round-tripped on edit; password is masked.	2026-05-04 22:25:42 +01:00
steve	cffad4b4f3	fix: enabled toggle — list-row click + edit-form save Two bugs in the channel-enabled affordance: 1. List-row toggle was a static span with no handler; the row's row-link overlay swallowed every click and routed to /edit. Add POST /settings/notifications/{id}/toggle backed by a new store method SetNotificationChannelEnabled, and turn the row toggle into an htmx-driven button that swaps in the new state. Use event.stopPropagation() on the toggle so it beats the row link. 2. Edit-form toggle visually flipped but the underlying checkbox reverted: the visual span lives inside the <label>, so clicking it fired the inline JS handler AND the label's native checkbox-toggle, cancelling out. Bind to the checkbox 'change' event instead and let the label do the toggling — the JS just mirrors check.checked into the .on class.	2026-05-04 22:21:45 +01:00
steve	84e121bb9c	fix: read 'name' across all per-kind sub-forms when editing channels The channel form has three inputs all named 'name' (one per kind section: webhook / ntfy / smtp), but only the visible kind's input is filled in. PostForm.Get returns the first regardless of emptiness, so editing an ntfy or smtp channel always read '' from the (hidden, unfilled) webhook section's name input and rejected with 'name required'. Add firstNonEmpty helper that scans the slice for the first non-blank value. Same flavour of bug as the enabled checkbox fix in `6466f8c` — both fall out of having multiple inputs share a name across the per-kind sub-forms.	2026-05-04 22:16:59 +01:00
steve	6466f8c759	fix: read enabled checkbox correctly when paired with hidden=0 sibling The notification channel form has a <input hidden name=enabled value=0> plus a <input checkbox name=enabled value=1> so unchecking the box still submits 'enabled=0' (otherwise the field would just be absent). But Go's url.Values.Get returns the FIRST value, so even when the checkbox is ticked the handler read '0' and persisted enabled=false. Scan r.PostForm["enabled"] for any '1' instead. Caught during the sweep — all three test channels saved with enabled=0 even though the toggle visually rendered ON.	2026-05-04 21:00:54 +01:00
steve	9be3cead8e	fix: dispatch alert.acknowledged + alert.resolved on UI ack/resolve Spotted during the live Playwright sweep: clicking Acknowledge or Resolve updated the alert row but never fanned out a notification. The handlers went straight to Store.Acknowledge/Resolve, bypassing the hub. Add Engine.Acknowledge and Engine.Resolve that wrap the store call and dispatch the matching event to every enabled channel. The UI handlers prefer the engine path when wired, and fall back to the direct store call so unit tests that construct a Server without an engine still work. Use context.WithoutCancel for the goroutine dispatch — the request context is cancelled the instant the handler returns 204, so the naive 'go e.hub.Dispatch(ctx, ...)' was racing the response and losing the channel-list query with 'context canceled'.	2026-05-04 21:00:44 +01:00
steve	e0fbb8c980	ui: dashboard crit-alerts banner	2026-05-04 20:29:49 +01:00
steve	371fe734f3	ui: /settings/notifications list + edit form (3 kinds) Add settings.html (shell + sub-tab nav + conditional list/edit body), notifications.html and notification_edit.html (glob stubs), and the supporting CSS tokens (.ch-row, .ch-icon, .toggle, .kind-grid, .kind-card, .radio-pip, .test-pill) to input.css. Rebuild styles.css. Add ui_parse_test.go to catch template regressions at test time. The kind picker is JS-driven (no full page reload); the enabled toggle mirrors the existing visual toggle pattern; the test-notification button uses HTMX and renders the JSON response as a coloured pill client-side.	2026-05-04 20:25:06 +01:00
steve	d373d19647	ui: F1 — populate OpenAlerts in baseView so nav badge updates everywhere Flagged in review of `cd38b40`: the Alerts tab badge should show the open count from any page, not just /alerts. baseView now takes the request and queries store.ListAlerts(Status: "open") to fill view.OpenAlerts on every page render. All call sites updated.	2026-05-04 20:19:09 +01:00
steve	cd38b40516	ui: alerts list page + alert row partial + nav badge	2026-05-04 20:15:01 +01:00
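
Commit 84e121bb9c adds a firstNonEmpty helper for form fields that share a name across the per-kind sub-forms. The following is a minimal sketch of what such a helper might look like. The signature and the whitespace trimming are assumptions, not the repository's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first value that is not blank after trimming
// whitespace, or "" when every value is blank. Signature is assumed.
func firstNonEmpty(vals []string) string {
	for _, v := range vals {
		if strings.TrimSpace(v) != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Three inputs named "name": webhook (hidden, empty), ntfy (filled), smtp (empty).
	names := []string{"", "ops-ntfy", ""}
	fmt.Println(firstNonEmpty(names))
}
```

A handler would pass r.PostForm["name"] rather than calling PostForm.Get("name"), which stops at the empty webhook input.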
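
Commit 4d90f72575 folds userinfo claims into the parsed claim map while keeping the id_token claims authoritative on conflict. One way to express that precedence is to copy userinfo first and overwrite with the id_token second. The function name mergeClaims and the map[string]any representation are assumptions made for illustration.

```go
package main

import "fmt"

// mergeClaims returns a new map holding every userinfo claim, overwritten by
// any id_token claim with the same key, so the signed assertion wins.
func mergeClaims(idToken, userinfo map[string]any) map[string]any {
	out := make(map[string]any, len(idToken)+len(userinfo))
	for k, v := range userinfo {
		out[k] = v
	}
	for k, v := range idToken {
		out[k] = v
	}
	return out
}

func main() {
	idToken := map[string]any{"sub": "user-123"}
	userinfo := map[string]any{
		"sub":                "spoofed",
		"preferred_username": "alice",
		"email":              "alice@example.invalid",
	}
	merged := mergeClaims(idToken, userinfo)
	fmt.Println(merged["sub"], merged["preferred_username"], merged["email"])
}
```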
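
Commits 37a25beb14 and f87ba29836 introduce a roleAtLeast helper and a requireRole middleware over the viewer / operator / admin hierarchy. A minimal sketch of the comparison is shown below. The string role names, the rank table, and the signature are assumptions. Unknown roles are rejected here to match the fail-closed stance mentioned in c1e974aad9.

```go
package main

import "fmt"

// roleRank orders the roles from least to most privileged (assumed names).
var roleRank = map[string]int{
	"viewer":   1,
	"operator": 2,
	"admin":    3,
}

// roleAtLeast reports whether have is at least as privileged as want.
// Any role missing from roleRank fails closed.
func roleAtLeast(have, want string) bool {
	h, okHave := roleRank[have]
	w, okWant := roleRank[want]
	return okHave && okWant && h >= w
}

func main() {
	fmt.Println(roleAtLeast("admin", "operator"))
	fmt.Println(roleAtLeast("operator", "admin"))
	fmt.Println(roleAtLeast("guest", "viewer"))
}
```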

131 Commits