Commit Graph

219 Commits

Author SHA1 Message Date
steve 154b57a4cd store: extend User with AuthSource/OIDCSubject; Session with IDToken 2026-05-05 13:09:49 +01:00
steve c5b29b88b9 store: migration 0019 — users.auth_source/oidc_subject + sessions.id_token + oidc_state 2026-05-05 13:08:15 +01:00
steve 1df072a211 Merge pull request 'Phase 4 — P4-07: per-host tags + dashboard chip-row filter' (#15) from p4-07-host-tags into main
Reviewed-on: #15
2026-05-05 10:55:11 +00:00
steve 2421d5d389 ui(tags): edit-button label, Save-tags width, persistent help text 2026-05-05 11:23:36 +01:00
steve 168059ae45 feat(hosts): per-host tags edit + dashboard chip-row filter (P4-07) 2026-05-05 11:16:09 +01:00
steve c1426110e5 tasks: defer update delivery + observability to Phase 6
Pull the operator-experience polish out of Phase 4 so a working v1
ships sooner. Phase 4 keeps RBAC + user mgmt (already done), OIDC,
and host tags. Deferred items renumbered as P6-01..P6-05:

  P4-01 → P6-01  apt + Chocolatey update delivery
  P4-02 → P6-02  agent-version-behind-server tracking on dashboard
  P4-06 → P6-03  repo size trend graphs
  P4-08 → P6-04  Prometheus /metrics endpoint
  P4-09 → P6-05  Grafana dashboard JSON + integration docs

None of these gate getting the system into production. They land
after Phase 5 (OSS readiness) on the new Phase 6.

Phase 4 remaining: P4-05 (OIDC login) + P4-07 (per-host tags +
dashboard filtering).
2026-05-05 11:05:11 +01:00
steve 3922cbece2 Merge pull request 'Phase 4 — P4-03/04: RBAC + user management' (#14) from p4-03-04-rbac-user-mgmt into main
Reviewed-on: #14
2026-05-05 10:01:43 +00:00
steve 6295faad64 ui(users): banner explaining the disabled-username re-enable flow 2026-05-05 10:57:25 +01:00
steve 2d9e53b025 ui(users): record last_login on /setup + sortable headers 2026-05-05 10:57:25 +01:00
steve 0521a2169f tasks: tick P4-03/04 + sweep notes
Live Playwright + curl sweep on the smoke env exercised the full
user-management lifecycle:

  admin add user → setup link generated → curl-as-new-user fetches
  /setup (200, username on page) → POSTs password → 303 to / with
  Set-Cookie → 200 on dashboard, 200 on /settings/account,
  **403 on /settings/users** (admin-only) → admin disables → next
  request is **401** + session row count drops to 0 → audit log
  reflects user.created + user.setup_completed.

Three-role middleware enforces band gates; admin is fail-closed
default. Setup tokens are sha256-hashed at rest with 1h expiry;
expired tokens are swept on the alert engine's 60s tick. Last-admin
guard rejects disable + demote of the only enabled admin. Self-
service password change at /settings/account is reachable by every
role.
2026-05-05 10:57:25 +01:00
steve c98eb19adb alert: piggy-back expired-setup-token cleanup on the engine tick 2026-05-05 10:57:25 +01:00
steve 2dd8f3c3be ui: /settings/account self-service password change
Adds GET/POST handlers for /settings/account in the viewer band
(any authenticated user), account.html template with current-password
field suppressed when must_change_password is set, and audits the
change via AppendAudit.
2026-05-05 10:57:25 +01:00
steve 2f3292aebf ui: /settings/users edit form + disable/enable/regenerate/force-logout 2026-05-05 10:57:25 +01:00
steve 04a413eb55 ui: /settings/users/new + /setup-link page
Adds handleUIUserNewGet, handleUIUserNewPost, handleUIUserSetupLinkGet
to ui_users.go; creates web/templates/pages/user_edit.html (multi-mode
new/edit/setup-link); wires three routes in the admin band of server.go.
2026-05-05 10:57:25 +01:00
steve 211f11e460 ui: /settings/users list page 2026-05-05 10:57:25 +01:00
steve 426b06d43d http: POST /api/account/password — self-service password change 2026-05-05 10:57:25 +01:00
steve 18affc1f16 http: regenerate setup link + force-logout 2026-05-05 10:57:25 +01:00
steve 53016aee93 http: disable/enable user with last-admin guard + session kick 2026-05-05 10:57:25 +01:00
steve 9e044fd7b0 http: GET/PATCH /api/users/{id} with last-admin guard 2026-05-05 10:57:25 +01:00
steve 7c241c55d1 http: POST /api/users — create + setup-token + audit 2026-05-05 10:57:25 +01:00
steve e5f79902fd http: GET /api/users (list) 2026-05-05 10:57:25 +01:00
steve 81f2852eb1 http: POST /setup — set password, drop session, audit setup_completed
Replaces the 501 stub with the full handler: validates the token and
password, hashes and stores the password, deletes the setup token,
mints an 8-hour session cookie, appends a user.setup_completed audit
entry, and redirects to /. Adds TestSetupPostHappyPath covering the
full round-trip including normal-login verification after setup.
2026-05-05 10:57:24 +01:00
steve 0407aa420b http: GET /setup landing page with expiry handling 2026-05-05 10:57:24 +01:00
steve 56108ffc33 http: session/login reject disabled users; mid-session disable kicks immediately 2026-05-05 10:57:24 +01:00
steve c75777b60f http: re-group routes by role band, fail-closed admin default
Routes are now structured into Public / Viewer / Operator / Admin bands
using requireRole middleware. Job log stream and download moved into the
Viewer band. healthz moved from New() into routes() with the other
public endpoints.
2026-05-05 10:57:24 +01:00
steve 085fa9684b http: gated test for admin-band reject of operator (lands fully in B4+E1) 2026-05-05 10:57:24 +01:00
steve 529104b8e4 http: requireRole middleware + 403 forbidden page 2026-05-05 10:57:24 +01:00
steve 2ba561410f http: test helpers — makeUser, loginAs 2026-05-05 10:57:24 +01:00
steve 8727d6bacc http: roleAtLeast helper for the role hierarchy 2026-05-05 10:57:24 +01:00
steve e76a383813 store: DeleteSessionsByUserID for force-logout 2026-05-05 10:57:24 +01:00
steve 93d857d995 store: user_setup_tokens CRUD + cleanup-expired 2026-05-05 10:57:24 +01:00
steve dafdfcda3f store: lowercase username, email/disable helpers, last-admin count 2026-05-05 10:57:24 +01:00
steve c6fbe7c0e0 store: extend User struct with Email, DisabledAt, MustChangePassword 2026-05-05 10:57:24 +01:00
steve a1d307fafa store: migration 0018 — user_setup_tokens 2026-05-05 10:57:24 +01:00
steve 9712c65b04 store: migration 0017 — users.email, disabled_at, must_change_password 2026-05-05 10:57:24 +01:00
steve 45d4844937 Merge pull request 'ci: shard test job + cheap argon2 in test mode' (#13) from ci-faster-tests into main
Reviewed-on: #13
2026-05-05 07:44:30 +00:00
steve b2983aed52 ci: shard test job + cheap argon2 in test mode
Test job was wall-clocked by `internal/server/http` (~156s on the
self-hosted runner under -race). Two changes here cut that:

1. Matrix-shard the test job by package group: server-http, store,
   and "rest" (everything else, computed via `go list | grep -v`).
   Each shard runs on its own runner so the heavy package isn't
   CPU-starved by siblings.

2. `auth.HashPassword` drops to cheap argon2id params (8 KiB / 1
   iter / 1 lane) when `testing.Testing()` returns true. Production
   params are unchanged. VerifyPassword reads params from the
   encoded hash so cheap-params hashes verify identically — no test
   call sites need to change.
2026-05-05 08:40:50 +01:00
steve c1fd5a3526 Merge pull request 'feat(audit): P3-08 — audit log UI with filters, sort, CSV export, payload modal' (#12) from p3-08-audit-ui into main
Reviewed-on: #12
2026-05-05 07:17:25 +00:00
steve 4f66cc2b34 feat(audit): clickable column headers with asc/desc sort 2026-05-05 08:15:22 +01:00
steve deb8b874ca audit(csv): drop user_id and target_id columns 2026-05-05 08:05:41 +01:00
steve 86fe569ea0 feat(audit): CSV export, absolute timestamps, payload modal 2026-05-05 08:00:53 +01:00
steve 16c77a8cc5 feat(audit): P3-08 — audit log UI with filters 2026-05-05 07:49:25 +01:00
steve 4f94ddbbcb Merge pull request 'feat(alerts): live refresh table with toggle + severity colour cues' (#11) from alerts-live-refresh into main
Reviewed-on: #11
2026-05-05 06:42:21 +00:00
steve d46adabeec alerts: 5s polling cadence + live toggle + severity colour cues
Two operator-visible changes on /alerts:

1. Polling drops from 15s to 5s and gains a checkbox in the table
   header to turn live monitoring on/off. Choice is persisted in
   localStorage so it survives full-page navigations. The toggle
   state is woven into the htmx hx-trigger predicate, so flipping
   the checkbox just sets the flag and the next tick (or the
   absence of one) honours it — no attribute juggling, no
   htmx.process re-init. The dot dims to 0.3 opacity when paused
   so operators can see at a glance that they're looking at a
   stale view.

2. Severity dropdown options pick up the same oklch tints used by
   the row dots / left borders / kind chips. The kind column shows
   only the kind text, so without a colour cue the dropdown
   mentioned a concept (severity) that the table itself didn't
   render. Now the colours bridge the gap.

Note on <option> styling: Chrome and Firefox honour inline color:
on options; Safari ignores it. Acceptable degradation — falls back
to plain text, which is what we had.
2026-05-04 23:35:03 +01:00
steve 595656ed59 feat(alerts): live-refresh the table every 15s while the tab is visible
The alerts list is the one screen where staleness is genuinely
harmful — an operator can be looking at an Open tab that's already
been resolved by another admin or auto-resolved by the engine, and
take action on a row that no longer exists.

Add an htmx poll on just the table panel:

  hx-get        same URL with current querystring (filters preserved)
  hx-trigger    every 15s, only when document is visible (no idle CPU)
  hx-select     #alerts-table — pull this element out of the response
  hx-swap       outerHTML

Polling lives on the table div, not the page root, so the filter
strip and header don't flash on each tick. Header gains a small
'live ●' label so the polling is discoverable.

RefreshURL is r.URL.RequestURI() on the server side — keeps any
status/severity/host_id/q params intact across refreshes.

Other screens (dashboard, hosts, jobs) deliberately stay manual-
refresh per the project's anti-flicker stance.
2026-05-04 23:30:19 +01:00
steve 85c62741b5 feat(channels): include event verb in ntfy title + smtp subject (#10)
Co-authored-by: Steve Cliff <steve@devcloud.guru>
Co-committed-by: Steve Cliff <steve@devcloud.guru>
2026-05-04 22:25:38 +00:00
steve bc5ce12957 ui(alerts): clarify Acknowledge vs Resolve (#9)
Co-authored-by: Steve Cliff <steve@devcloud.guru>
Co-committed-by: Steve Cliff <steve@devcloud.guru>
2026-05-04 22:25:35 +00:00
steve 38090dd457 Merge pull request 'Phase 3 — Alerts: per-source-group dedup' (#8) from p3-alerts-dedup into main
Reviewed-on: #8
2026-05-04 22:11:08 +00:00
steve 350be3f19d feat(alerts): per-source-group dedup so two failing backups produce two alerts
Until now the open-alert key was (host_id, kind, resolved_at IS NULL).
A host with two source groups both failing collapsed onto one
backup_failed row — second failure bumped last_seen_at and
overwrote the message but never re-fan-out. Operators saw one
alert that appeared to flap, not two distinct broken things.

Schema changes (column-level ALTER, no rebuild):

- 0015 jobs.source_group_id (FK → source_groups, ON DELETE SET NULL,
  index). Populated for backup jobs in CreateJob.
- 0016 alerts.dedup_key (NOT NULL DEFAULT ''). The old alerts_open
  partial index gets dropped and replaced with a UNIQUE partial
  index on (host_id, kind, dedup_key) WHERE resolved_at IS NULL —
  the index is now the actual dedup primitive.

Plumbing:

- RaiseOrTouch / AutoResolve / Alert struct gain dedup_key.
- engine.JobFinishedEvent gains SourceGroupID; handleJobFinished
  passes it through for backup_failed only (forget/prune/check stay
  repo-scoped with key='').
- ws.handler reads SourceGroupID off the freshly-loaded job row.
- dispatchJobWithPayload gains a *string sourceGroupID arg; the
  per-group Run-now path and schedule.fire path pass &g.ID.

Test coverage: TestRaiseOrTouchDedupsPerSourceGroup proves two
distinct groups produce two distinct open alerts and that resolving
one does not auto-resolve the other.

Dev tool: cmd/_fake_alert gains -dedup-key flag.
2026-05-04 22:59:48 +01:00
steve 9d7a714102 Merge pull request 'Phase 3 — Alerts (P3-05/06/07)' (#7) from p3-alerts into main
Reviewed-on: #7
2026-05-04 21:51:16 +00:00