Commit Graph

219 Commits

Author SHA1 Message Date
steve dfff6d1ef9 ui(users): banner explaining the disabled-username re-enable flow
CI / Test (rest) (pull_request) Successful in 29s
CI / Lint (pull_request) Successful in 32s
CI / Test (server-http) (pull_request) Successful in 1m9s
CI / Test (store) (pull_request) Successful in 1m13s
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 37s
2026-05-05 10:57:25 +01:00
steve 0415a96e27 ui(users): record last_login on /setup + sortable headers 2026-05-05 10:57:25 +01:00
steve d85e82110f tasks: tick P4-03/04 + sweep notes
Live Playwright + curl sweep on the smoke env exercised the full
user-management lifecycle:

  admin add user → setup link generated → curl-as-new-user fetches
  /setup (200, username on page) → POSTs password → 303 to / with
  Set-Cookie → 200 on dashboard, 200 on /settings/account,
  **403 on /settings/users** (admin-only) → admin disables → next
  request is **401** + session row count drops to 0 → audit log
  reflects user.created + user.setup_completed.

Three-role middleware enforces band gates; admin is fail-closed
default. Setup tokens are sha256-hashed at rest with 1h expiry;
expired tokens are swept on the alert engine's 60s tick. Last-admin
guard rejects disable + demote of the only enabled admin. Self-
service password change at /settings/account is reachable by every
role.
2026-05-05 10:57:25 +01:00
steve d2cc4a802e alert: piggy-back expired-setup-token cleanup on the engine tick 2026-05-05 10:57:25 +01:00
steve c34a76393c ui: /settings/account self-service password change
Adds GET/POST handlers for /settings/account in the viewer band
(any authenticated user), account.html template with current-password
field suppressed when must_change_password is set, and audits the
change via AppendAudit.
2026-05-05 10:57:25 +01:00
steve 6ccc6c8c5e ui: /settings/users edit form + disable/enable/regenerate/force-logout 2026-05-05 10:57:25 +01:00
steve b0a5a76925 ui: /settings/users/new + /setup-link page
Adds handleUIUserNewGet, handleUIUserNewPost, handleUIUserSetupLinkGet
to ui_users.go; creates web/templates/pages/user_edit.html (multi-mode
new/edit/setup-link); wires three routes in the admin band of server.go.
2026-05-05 10:57:25 +01:00
steve 88f1959a6a ui: /settings/users list page 2026-05-05 10:57:25 +01:00
steve cae4147df6 http: POST /api/account/password — self-service password change 2026-05-05 10:57:25 +01:00
steve dbb8550936 http: regenerate setup link + force-logout 2026-05-05 10:57:25 +01:00
steve 90bcddb27e http: disable/enable user with last-admin guard + session kick 2026-05-05 10:57:25 +01:00
steve cd3c13e2c6 http: GET/PATCH /api/users/{id} with last-admin guard 2026-05-05 10:57:25 +01:00
steve a74dc33c1c http: POST /api/users — create + setup-token + audit 2026-05-05 10:57:25 +01:00
steve a985d45daa http: GET /api/users (list) 2026-05-05 10:57:25 +01:00
steve 57a13f0759 http: POST /setup — set password, drop session, audit setup_completed
Replaces the 501 stub with the full handler: validates the token and
password, hashes and stores the password, deletes the setup token,
mints an 8-hour session cookie, appends a user.setup_completed audit
entry, and redirects to /. Adds TestSetupPostHappyPath covering the
full round-trip including normal-login verification after setup.
2026-05-05 10:57:24 +01:00
steve 8d4c4426b0 http: GET /setup landing page with expiry handling 2026-05-05 10:57:24 +01:00
steve cbdd94ca12 http: session/login reject disabled users; mid-session disable kicks immediately 2026-05-05 10:57:24 +01:00
steve c1e974aad9 http: re-group routes by role band, fail-closed admin default
Routes are now structured into Public / Viewer / Operator / Admin bands
using requireRole middleware. Job log stream and download moved into the
Viewer band. healthz moved from New() into routes() with the other
public endpoints.
2026-05-05 10:57:24 +01:00
steve 95aee73e2c http: gated test for admin-band reject of operator (lands fully in B4+E1) 2026-05-05 10:57:24 +01:00
steve f87ba29836 http: requireRole middleware + 403 forbidden page 2026-05-05 10:57:24 +01:00
steve 2073898c10 http: test helpers — makeUser, loginAs 2026-05-05 10:57:24 +01:00
steve 37a25beb14 http: roleAtLeast helper for the role hierarchy 2026-05-05 10:57:24 +01:00
steve f0828782c1 store: DeleteSessionsByUserID for force-logout 2026-05-05 10:57:24 +01:00
steve 12391abef0 store: user_setup_tokens CRUD + cleanup-expired 2026-05-05 10:57:24 +01:00
steve 2c090171e5 store: lowercase username, email/disable helpers, last-admin count 2026-05-05 10:57:24 +01:00
steve bd08d8ca14 store: extend User struct with Email, DisabledAt, MustChangePassword 2026-05-05 10:57:24 +01:00
steve a7e53e0a64 store: migration 0018 — user_setup_tokens 2026-05-05 10:57:24 +01:00
steve ca170fedc5 store: migration 0017 — users.email, disabled_at, must_change_password 2026-05-05 10:57:24 +01:00
steve c9f230ce1d plan: P4-03/04 — RBAC + user management implementation plan
Bite-sized TDD tasks across 7 slices (A schema, B middleware,
C session re-validation, D setup-token flow, E user CRUD API,
F UI, G wiring + sweep). Each task is one commit with concrete
code blocks and test cases — no placeholders.

Refs spec at docs/superpowers/specs/2026-05-05-p4-03-04-rbac-user-mgmt-design.md.
2026-05-05 10:57:24 +01:00
steve 282258e837 spec: P4-03/04 — RBAC + user management design
Brainstormed shape locked: chi route-group middleware, fail-closed
admin default; setup-token flow with 1h single-use tokens
(sha256-hashed at rest, raw shown to admin once); disable-only user
lifecycle with last-admin guard; self-service /settings/account
password change for every role; email field on users (metadata
v1); session re-validation on every authenticated request so
disable / role change land immediately.

Locked decisions captured in §Role taxonomy, §Schema changes,
§Setup-token flow, §RBAC enforcement, §Last-admin self-protection.
Deferred items in §Out of scope (OIDC, SMTP email-the-link,
hard delete, lockout).

Migrations 0017 (users extensions) + 0018 (user_setup_tokens)
both column-level ALTERs per CLAUDE.md preference.
2026-05-05 10:57:24 +01:00
steve 4eab42a9c3 Merge pull request 'ci: shard test job + cheap argon2 in test mode' (#13) from ci-faster-tests into main
Reviewed-on: #13
2026-05-05 07:44:30 +00:00
steve 03e5ec31f1 ci: shard test job + cheap argon2 in test mode
CI / Test (store) (pull_request) Successful in 38s
CI / Test (rest) (pull_request) Successful in 48s
CI / Test (server-http) (pull_request) Successful in 1m10s
CI / Lint (pull_request) Successful in 33s
CI / Build (linux/amd64) (pull_request) Successful in 24s
CI / Build (windows/amd64) (pull_request) Successful in 48s
CI / Build (linux/arm64) (pull_request) Successful in 23s
Test job was wall-clocked by `internal/server/http` (~156s on the
self-hosted runner under -race). Two changes here cut that:

1. Matrix-shard the test job by package group: server-http, store,
   and "rest" (everything else, computed via `go list | grep -v`).
   Each shard runs on its own runner so the heavy package isn't
   CPU-starved by siblings.

2. `auth.HashPassword` drops to cheap argon2id params (8 KiB / 1
   iter / 1 lane) when `testing.Testing()` returns true. Production
   params are unchanged. VerifyPassword reads params from the
   encoded hash so cheap-params hashes verify identically — no test
   call sites need to change.
2026-05-05 08:40:50 +01:00
steve 6fd16ace81 Merge pull request 'feat(audit): P3-08 — audit log UI with filters, sort, CSV export, payload modal' (#12) from p3-08-audit-ui into main
Reviewed-on: #12
2026-05-05 07:17:25 +00:00
steve ba425c9766 feat(audit): clickable column headers with asc/desc sort
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Lint (pull_request) Successful in 34s
CI / Build (linux/amd64) (pull_request) Successful in 23s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Test (linux/amd64) (pull_request) Successful in 3m41s
2026-05-05 08:15:22 +01:00
steve 1d0d994bc4 audit(csv): drop user_id and target_id columns 2026-05-05 08:05:41 +01:00
steve 489f831fc7 feat(audit): CSV export, absolute timestamps, payload modal 2026-05-05 08:00:53 +01:00
steve 3f36bcd0b0 feat(audit): P3-08 — audit log UI with filters 2026-05-05 07:49:25 +01:00
steve cb3260b89c Merge pull request 'feat(alerts): live refresh table with toggle + severity colour cues' (#11) from alerts-live-refresh into main
Reviewed-on: #11
2026-05-05 06:42:21 +00:00
steve 8813e93317 alerts: 5s polling cadence + live toggle + severity colour cues
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Lint (pull_request) Successful in 38s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 23s
CI / Test (linux/amd64) (pull_request) Successful in 2m57s
Two operator-visible changes on /alerts:

1. Polling drops from 15s to 5s and gains a checkbox in the table
   header to turn live monitoring on/off. Choice is persisted in
   localStorage so it survives full-page navigations. The toggle
   state is woven into the htmx hx-trigger predicate, so flipping
   the checkbox just sets the flag and the next tick (or the
   absence of one) honours it — no attribute juggling, no
   htmx.process re-init. The dot dims to 0.3 opacity when paused
   so operators can see at a glance that they're looking at a
   stale view.

2. Severity dropdown options pick up the same oklch tints used by
   the row dots / left borders / kind chips. The kind column shows
   only the kind text, so without a colour cue the dropdown
   mentioned a concept (severity) that the table itself didn't
   render. Now the colours bridge the gap.

Note on <option> styling: Chrome and Firefox honour inline color:
on options; Safari ignores it. Acceptable degradation — falls back
to plain text, which is what we had.
2026-05-04 23:35:03 +01:00
steve 9860b412f7 feat(alerts): live-refresh the table every 15s while the tab is visible
The alerts list is the one screen where staleness is genuinely
harmful — an operator can be looking at an Open tab that's already
been resolved by another admin or auto-resolved by the engine, and
take action on a row that no longer exists.

Add an htmx poll on just the table panel:

  hx-get        same URL with current querystring (filters preserved)
  hx-trigger    every 15s, only when document is visible (no idle CPU)
  hx-select     #alerts-table — pull this element out of the response
  hx-swap       outerHTML

Polling lives on the table div, not the page root, so the filter
strip and header don't flash on each tick. Header gains a small
'live ●' label so the polling is discoverable.

RefreshURL is r.URL.RequestURI() on the server side — keeps any
status/severity/host_id/q params intact across refreshes.

Other screens (dashboard, hosts, jobs) deliberately stay manual-
refresh per the project's anti-flicker stance.
2026-05-04 23:30:19 +01:00
steve 1618094a26 feat(channels): include event verb in ntfy title + smtp subject (#10)
Co-authored-by: Steve Cliff <steve@devcloud.guru>
Co-committed-by: Steve Cliff <steve@devcloud.guru>
2026-05-04 22:25:38 +00:00
steve dd53c9e497 ui(alerts): clarify Acknowledge vs Resolve (#9)
Co-authored-by: Steve Cliff <steve@devcloud.guru>
Co-committed-by: Steve Cliff <steve@devcloud.guru>
2026-05-04 22:25:35 +00:00
steve 84814b1386 Merge pull request 'Phase 3 — Alerts: per-source-group dedup' (#8) from p3-alerts-dedup into main
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Build (linux/amd64) (pull_request) Successful in 23s
CI / Build (linux/arm64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 1m22s
CI / Test (linux/amd64) (pull_request) Successful in 1m28s
Reviewed-on: #8
2026-05-04 22:11:08 +00:00
steve a45c801884 feat(alerts): per-source-group dedup so two failing backups produce two alerts
Until now the open-alert key was (host_id, kind, resolved_at IS NULL).
A host with two source groups both failing collapsed onto one
backup_failed row — second failure bumped last_seen_at and
overwrote the message but never re-fan-out. Operators saw one
alert that appeared to flap, not two distinct broken things.

Schema changes (column-level ALTER, no rebuild):

- 0015 jobs.source_group_id (FK → source_groups, ON DELETE SET NULL,
  index). Populated for backup jobs in CreateJob.
- 0016 alerts.dedup_key (NOT NULL DEFAULT ''). The old alerts_open
  partial index gets dropped and replaced with a UNIQUE partial
  index on (host_id, kind, dedup_key) WHERE resolved_at IS NULL —
  the index is now the actual dedup primitive.

Plumbing:

- RaiseOrTouch / AutoResolve / Alert struct gain dedup_key.
- engine.JobFinishedEvent gains SourceGroupID; handleJobFinished
  passes it through for backup_failed only (forget/prune/check stay
  repo-scoped with key='').
- ws.handler reads SourceGroupID off the freshly-loaded job row.
- dispatchJobWithPayload gains a *string sourceGroupID arg; the
  per-group Run-now path and schedule.fire path pass &g.ID.

Test coverage: TestRaiseOrTouchDedupsPerSourceGroup proves two
distinct groups produce two distinct open alerts and that resolving
one does not auto-resolve the other.

Dev tool: cmd/_fake_alert gains -dedup-key flag.
2026-05-04 22:59:48 +01:00
steve 7792aadb94 Merge pull request 'Phase 3 — Alerts (P3-05/06/07)' (#7) from p3-alerts into main
Reviewed-on: #7
2026-05-04 21:51:16 +00:00
steve 2eac324cec chore: ignore cmd/_* dev binaries + Tailwind rebuild
CI / Build (windows/amd64) (pull_request) Successful in 21s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 1m13s
CI / Test (linux/amd64) (pull_request) Successful in 1m20s
cmd/_fake_alert and similar one-shot dev tools live under cmd/_*
where Go's build tooling skips them. Add an explicit gitignore line
so an accidental 'git add cmd/.' can't drag them into a release.

styles.css is the regenerated Tailwind output — picks up the new
ntfy basic-auth fields and the right-rail preview ids.
2026-05-04 22:49:46 +01:00
steve 3cdaee63d4 fix: payload-preview rail follows kind switcher
CI / Lint (pull_request) Successful in 32s
CI / Build (windows/amd64) (pull_request) Successful in 43s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Test (linux/amd64) (pull_request) Successful in 1m18s
CI / Build (linux/arm64) (pull_request) Successful in 43s
Right-rail preview was rendered server-side via {{if eq $f.Kind ...}},
so it stayed on whatever kind the page loaded with. Editing an SMTP
channel and flipping to ntfy in the picker left the email RFC 5322
sample on screen.

Render all three preview panels with id='preview-<kind>' (only the
matching one visible on first render) and toggle their .hidden class
in the kind-switcher JS alongside the field panels. Same pattern
used for fields-<kind>.
2026-05-04 22:40:46 +01:00
steve 7f2a9964db fix: move channel delete-panel out of edit form (nested form bug)
CI / Build (windows/amd64) (pull_request) Successful in 21s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Lint (pull_request) Successful in 1m11s
CI / Test (linux/amd64) (pull_request) Successful in 1m22s
The delete-panel <form action='.../delete'> was nested inside the
main <form action='.../edit'>. HTML doesn't allow nested forms —
browsers parse the inner form as if it didn't exist, so clicking
'Delete permanently' submitted the outer edit form to /edit
instead of /delete, leaving the channel intact.

Move the delete-panel block to a sibling of the main form. The
'Delete channel…' button still toggles its visibility via JS, the
panel still renders inside the page layout, and now its form
actually posts to the delete handler.
2026-05-04 22:35:58 +01:00
steve feaeff217d feat(ntfy): support HTTP Basic auth alongside access tokens
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Lint (pull_request) Successful in 1m12s
CI / Test (linux/amd64) (pull_request) Successful in 1m18s
Self-hosted ntfy that doesn't expose a token-mint endpoint can still
authenticate over HTTP Basic. Add Username + Password fields to
NtfyConfig; the channel sends 'Authorization: Basic …' when token is
empty and username is set. Token wins when both are configured.

Form-side: two new optional fields next to the access token, with
the same write-only placeholder treatment as smtp_password (blank
on edit means 'keep stored value'). Username is round-tripped on
edit; password is masked.
2026-05-04 22:25:42 +01:00
steve cffad4b4f3 fix: enabled toggle — list-row click + edit-form save
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 24s
CI / Build (linux/arm64) (pull_request) Successful in 24s
CI / Lint (pull_request) Successful in 1m15s
CI / Test (linux/amd64) (pull_request) Successful in 1m36s
Two bugs in the channel-enabled affordance:

1. List-row toggle was a static span with no handler; the row's
   row-link overlay swallowed every click and routed to /edit. Add
   POST /settings/notifications/{id}/toggle backed by a new store
   method SetNotificationChannelEnabled, and turn the row toggle
   into an htmx-driven button that swaps in the new state. Use
   event.stopPropagation() on the toggle so it beats the row link.

2. Edit-form toggle visually flipped but the underlying checkbox
   reverted: the visual span lives inside the <label>, so clicking
   it fired the inline JS handler AND the label's native
   checkbox-toggle, cancelling out. Bind to the checkbox 'change'
   event instead and let the label do the toggling — the JS just
   mirrors check.checked into the .on class.
2026-05-04 22:21:45 +01:00