Brainstormed shape locked: chi route-group middleware, fail-closed admin default; setup-token flow with 1h single-use tokens (sha256-hashed at rest, raw shown to admin once); disable-only user lifecycle with last-admin guard; self-service /settings/account password change for every role; email field on users (metadata v1); session re-validation on every authenticated request so disable / role change land immediately. Locked decisions captured in §Role taxonomy, §Schema changes, §Setup-token flow, §RBAC enforcement, §Last-admin self-protection. Deferred items in §Out of scope (OIDC, SMTP email-the-link, hard delete, lockout). Migrations 0017 (users extensions) + 0018 (user_setup_tokens) both column-level ALTERs per CLAUDE.md preference.
P4-03 / P4-04 — RBAC + User Management Design
Date: 2026-05-05
Status: brainstorm complete; ready for plan
Closes: P4-03 (RBAC enforcement at API layer), P4-04 (User management UI)
Goal
Enforce role-based access control at the HTTP layer (currently every authenticated user has admin powers) and ship the operator-facing screens for managing users, roles, and password lifecycle.
Architecture
Two coupled subsystems landing in one PR:
- RBAC enforcement: chi route-group middleware that gates each subtree by minimum role. Fail-closed default (admin) so a forgotten declaration doesn't accidentally widen access.
- User management: `/settings/users` sub-tab with list / add / edit / disable. Setup-link flow for new users (1-hour-expiry single-use token). Self-service password change at `/settings/account`.
The audit log already records actor + user_id on every mutation; new endpoints fold in naturally.
Role taxonomy
Locked. Three roles, hierarchical (admin ⊇ operator ⊇ viewer):
| Action | admin | operator | viewer |
|---|---|---|---|
| View dashboard / alerts / audit / hosts | ✓ | ✓ | ✓ |
| Trigger Run-now / Restore / Snapshot diff | ✓ | ✓ | ✗ |
| Acknowledge / resolve alerts | ✓ | ✓ | ✗ |
| Edit schedules / source groups / retention / hooks | ✓ | ✓ | ✗ |
| Add / remove hosts (enrolment, accept/reject pending) | ✓ | ✓ | ✗ |
| Cancel running jobs | ✓ | ✓ | ✗ |
| Edit repo credentials | ✓ | ✓ | ✗ |
| Edit notification channels | ✓ | ✗ | ✗ |
| Manage users | ✓ | ✗ | ✗ |
| Self password change (/settings/account) | ✓ | ✓ | ✓ |
The role enum already exists in the schema (CHECK (role IN ('admin','operator','viewer'))) and in internal/store/types.go. Bootstrap creates the first user as admin. Zero migration needed for existing installs.
Schema changes
All changes are additive: column-level ALTERs plus one new table and its indexes, no table rebuilds (CLAUDE.md prefers these over rebuilds; safe under foreign_keys=ON).
Migration 0017 — users extensions
ALTER TABLE users ADD COLUMN email TEXT;
ALTER TABLE users ADD COLUMN disabled_at TEXT;
ALTER TABLE users ADD COLUMN must_change_password INTEGER NOT NULL DEFAULT 0;
-- Username case-insensitive lookup. Existing rows are kept as-is;
-- normalisation only applies to new INSERTs (handled in Go).
CREATE UNIQUE INDEX users_username_lower ON users(LOWER(username));
Migration 0018 — user_setup_tokens
CREATE TABLE user_setup_tokens (
    user_id    TEXT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    token_hash TEXT NOT NULL, -- sha256(raw_token), hex
    expires_at TEXT NOT NULL,
    created_at TEXT NOT NULL,
    -- Nullable: ON DELETE SET NULL cannot be honoured on a NOT NULL column.
    created_by TEXT REFERENCES users(id) ON DELETE SET NULL
);
CREATE INDEX user_setup_tokens_expires ON user_setup_tokens(expires_at);
user_id is PRIMARY KEY, not just FOREIGN KEY — only one outstanding setup token per user. Regenerating supersedes the old via INSERT OR REPLACE.
RBAC enforcement
Middleware
// requireRole returns chi middleware that 403s any request whose
// session-resolved user doesn't meet the minimum role. Roles are
// hierarchical: admin > operator > viewer.
func (s *Server) requireRole(min store.Role) func(http.Handler) http.Handler
Hierarchy implemented as a small helper:
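// roleAtLeast reports whether have meets min. Unknown roles on either
// side fail closed: a missing map key must not compare as rank 0.
func roleAtLeast(have, min store.Role) bool {
	rank := map[store.Role]int{
		store.RoleViewer:   1,
		store.RoleOperator: 2,
		store.RoleAdmin:    3,
	}
	h, okHave := rank[have]
	m, okMin := rank[min]
	return okHave && okMin && h >= m
}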
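A minimal sketch of how `requireRole` could wrap a handler, using only the standard library (chi middleware has the same `func(http.Handler) http.Handler` shape). It assumes the session layer has already placed the user's role in the request context under a hypothetical `ctxKey`; the design's version is a method on `*Server` and may resolve the role differently.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Role mirrors store.Role; redeclared here so the sketch compiles on its own.
type Role string

const (
	RoleViewer   Role = "viewer"
	RoleOperator Role = "operator"
	RoleAdmin    Role = "admin"
)

var roleRank = map[Role]int{RoleViewer: 1, RoleOperator: 2, RoleAdmin: 3}

// roleAtLeast fails closed: an unknown role on either side never passes.
func roleAtLeast(have, min Role) bool {
	h, okHave := roleRank[have]
	m, okMin := roleRank[min]
	return okHave && okMin && h >= m
}

// ctxKey is a hypothetical context key under which the session layer stores the role.
type ctxKey struct{}

// requireRole returns middleware that responds 403 unless the role found in
// the request context meets the minimum.
func requireRole(min Role) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			role, _ := r.Context().Value(ctxKey{}).(Role)
			if !roleAtLeast(role, min) {
				http.Error(w, "forbidden", http.StatusForbidden)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// statusFor sends one request with the given role through an operator-gated handler.
func statusFor(role Role) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/hosts/1/run", nil)
	req = req.WithContext(context.WithValue(req.Context(), ctxKey{}, role))
	rec := httptest.NewRecorder()
	requireRole(RoleOperator)(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	for _, r := range []Role{RoleViewer, RoleOperator, RoleAdmin, "bogus"} {
		fmt.Println(r, statusFor(r))
	}
}
```

A request with no role in its context is treated like an unknown role and rejected, which keeps the middleware safe even if it is mounted ahead of the session layer by mistake.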
Route grouping in server.go
The existing /api and UI routes get re-grouped into three role bands plus a self-service group:
/api/* viewer-readable — GET endpoints anyone authenticated can hit
/api/* operator+ — mutating endpoints up to host/source-group/schedule level
/api/* admin-only — /api/users/*, channel CRUD
/api/account — self-service password change
/audit, /alerts, /hosts/{id}, etc. — viewer
/hosts/{id}/run, /alerts/{id}/ack — operator
/settings/users/*, /settings/notifications/* — admin
/settings/account — viewer (any authenticated)
Default at the bottom of routes() is admin (fail-closed). Any future endpoint that doesn't get explicitly placed lands in admin-only, surfacing the missing declaration as a permission error rather than a silent bypass.
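The fail-closed default can be illustrated without chi as a lookup that falls back to admin. This is only a stand-in for the route grouping: the `routeBands` table, the `minRoleFor` helper, and the HTTP methods attached to each path are assumptions made for the example, not part of server.go.

```go
package main

import "fmt"

type Role string

const (
	RoleViewer   Role = "viewer"
	RoleOperator Role = "operator"
	RoleAdmin    Role = "admin"
)

// routeBands lists routes that were placed explicitly. The entries are
// illustrative and the methods are assumed.
var routeBands = map[string]Role{
	"GET /audit":            RoleViewer,
	"GET /settings/account": RoleViewer,
	"POST /hosts/{id}/run":  RoleOperator,
	"POST /alerts/{id}/ack": RoleOperator,
}

// minRoleFor returns the declared band, or admin when nothing was declared.
func minRoleFor(route string) Role {
	if r, ok := routeBands[route]; ok {
		return r
	}
	return RoleAdmin // fail closed: an undeclared route is admin-only
}

func main() {
	fmt.Println(minRoleFor("GET /audit"))
	fmt.Println(minRoleFor("POST /api/some-new-endpoint"))
}
```

The point of the fallback is that forgetting a declaration produces a visible 403 for non-admins during testing instead of an endpoint that is quietly open to viewers.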
Per-handler nuance
One case looks as if it needs a handler-level check on top of the route gate but does not: GET /settings/users/{id}/edit is admin-only, while POST /api/account/password is open to viewers. Because the two live on different routes, the split-by-route already covers this; no per-handler overrides are expected in v1.
Out of scope of role middleware
- `/ws/agent` and `/api/agents/*`: agent bearer-token auth, separate chain
- `/healthz`: unauthenticated
- `/login`, `/logout`, `/bootstrap`, `/setup`: public
403 handling
- JSON endpoints: `{"error":"forbidden","code":"insufficient_role"}` with HTTP 403
- HTML endpoints: render a small "You don't have permission" panel inside the chrome (so the user keeps their nav and can move away), HTTP 403
- No audit row on 403: too noisy with normal users hitting URLs they don't have access to
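The two 403 shapes could be produced by one helper, sketched below. Choosing JSON by the `/api/` path prefix is an assumption of this example (the real server may key off the route group instead), and the HTML string is a placeholder for rendering the panel inside the normal page chrome.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// writeForbidden emits the 403 in the shape the caller expects. Selecting by
// the /api/ prefix is an assumption of this sketch.
func writeForbidden(w http.ResponseWriter, r *http.Request) {
	if strings.HasPrefix(r.URL.Path, "/api/") {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusForbidden)
		fmt.Fprint(w, `{"error":"forbidden","code":"insufficient_role"}`)
		return
	}
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	w.WriteHeader(http.StatusForbidden)
	// Placeholder for the permission panel rendered inside the page chrome.
	fmt.Fprint(w, `<main><p>You don't have permission to view this page.</p></main>`)
}

// forbiddenFor returns status, content type, and body for a GET on path.
func forbiddenFor(path string) (int, string, string) {
	rec := httptest.NewRecorder()
	writeForbidden(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	fmt.Println(forbiddenFor("/api/users"))
	fmt.Println(forbiddenFor("/settings/users"))
}
```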
Session re-validation
Sessions need to honour disabled_at and current role on every request, not just at login. The session-validation middleware reads the user row each request (single PK lookup, fast in SQLite). If disabled_at IS NOT NULL, the session is invalidated and the request 401s. This makes "disable user" and "force logout" effectively immediate.
Cost: one SELECT per authenticated request. SQLite handles this comfortably for the fleet sizes this codebase targets.
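The per-request re-read might look like the following sketch. The `UserStore` interface, the `session` cookie name, and the in-memory `memStore` are all hypothetical stand-ins for the existing session helper and store; only the behaviour (re-read, then 401 and invalidate when disabled) comes from the design.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// User holds only what the check needs. Disabled stands in for
// "disabled_at IS NOT NULL".
type User struct {
	ID       string
	Role     string
	Disabled bool
}

// UserStore is a hypothetical lookup interface; the real store API may differ.
type UserStore interface {
	UserBySession(sessionID string) (User, bool)
	DeleteSession(sessionID string)
}

type userKey struct{}

// revalidate re-reads the user on every request so that a disable or role
// change applies to the very next request rather than at next login.
func revalidate(st UserStore) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			c, err := r.Cookie("session") // cookie name is an assumption
			if err != nil {
				http.Error(w, "unauthenticated", http.StatusUnauthorized)
				return
			}
			u, ok := st.UserBySession(c.Value)
			if !ok || u.Disabled {
				st.DeleteSession(c.Value)
				http.Error(w, "unauthenticated", http.StatusUnauthorized)
				return
			}
			// Later role checks see the current role, not the login-time one.
			ctx := context.WithValue(r.Context(), userKey{}, u)
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

// memStore is an in-memory fake used only to exercise the middleware.
type memStore struct{ sessions map[string]*User }

func (m *memStore) UserBySession(id string) (User, bool) {
	u, ok := m.sessions[id]
	if !ok {
		return User{}, false
	}
	return *u, true
}

func (m *memStore) DeleteSession(id string) { delete(m.sessions, id) }

// statusWith sends one request carrying the given session cookie.
func statusWith(st UserStore, session string) int {
	h := revalidate(st)(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	req := httptest.NewRequest(http.MethodGet, "/alerts", nil)
	req.AddCookie(&http.Cookie{Name: "session", Value: session})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	bob := &User{ID: "u1", Role: "operator"}
	st := &memStore{sessions: map[string]*User{"s1": bob}}
	fmt.Println(statusWith(st, "s1")) // while enabled
	bob.Disabled = true               // admin disables bob mid-session
	fmt.Println(statusWith(st, "s1"))
}
```

Mounting this ahead of the role middleware gives the ordering the design asks for: unauthenticated or disabled users get 401 before any role comparison happens.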
Setup-token flow (replacing temp passwords)
Add user
- Admin clicks + Add user on `/settings/users`
- Form: username (required, lowercase-normalised), email (optional, validated), role (admin/operator/viewer)
- Server:
  - Validates username uniqueness (case-insensitive). On collision with a disabled user, return a 409 with `{"existing_user_id": "...", "disabled": true}` so the UI can pivot to a "re-enable existing user" prompt
  - On collision with an enabled user: 409 with a plain "username taken" error
  - Creates user row with `password_hash = ""`, `must_change_password = 1`, `disabled_at = NULL`
  - Generates 32 random bytes, hex-encodes → raw token (64 chars). Stores `sha256(token)` hex in `user_setup_tokens`. `expires_at = now + 1h`
  - Audit: `user.created`, payload `{"username": "...", "role": "...", "with_setup_token": true}`
- Server returns the admin to a one-time setup-link page: `/settings/users/{id}/setup-link`
  - Shows the URL `http(s)://<base>/setup?token=<raw>` with a Copy button
  - Countdown timer (live JS) showing time-to-expiry
  - Warning: "This is the only time you'll see this link. If you lose it, regenerate from the user edit page."
  - "Done" button → `/settings/users`
The raw token is never persisted server-side. Lost tokens require regeneration.
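Token generation as described above can be sketched with the standard library alone. The `SetupToken` type and function names are made up for the example, and hashing the 64-character hex string (rather than the 32 raw bytes) is one reading of "sha256(token)"; either works as long as creation and lookup agree.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// SetupToken pairs the raw value (shown once) with what gets stored.
type SetupToken struct {
	Raw       string    // 64 hex chars; never written to the database
	Hash      string    // hex SHA-256 of Raw; goes into token_hash
	ExpiresAt time.Time // creation time + 1h
}

// hashToken returns the hex-encoded SHA-256 of the raw token string.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// newSetupToken draws 32 random bytes and derives the stored hash and expiry.
func newSetupToken(now time.Time) (SetupToken, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return SetupToken{}, err
	}
	raw := hex.EncodeToString(buf)
	return SetupToken{Raw: raw, Hash: hashToken(raw), ExpiresAt: now.Add(time.Hour)}, nil
}

func main() {
	tok, err := newSetupToken(time.Now().UTC())
	if err != nil {
		panic(err)
	}
	// backup.example is a placeholder for the configured base URL.
	fmt.Println("setup URL: https://backup.example/setup?token=" + tok.Raw)
	fmt.Println("stored hash length:", len(tok.Hash))
}
```

A plain unsalted SHA-256 is reasonable here because the input is 256 bits of randomness rather than a human-chosen password, so there is nothing for a dictionary attack to exploit.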
Setup landing page (public, no auth required)
- User clicks the link, lands on `/setup?token=<raw>`
- Server hashes the token, looks up `user_setup_tokens` row, validates `expires_at > now`
- On invalid / expired: render an error page with a "Contact your administrator" message. Audit: `user.setup_token.expired` (no actor).
- On valid: render a password-set form: new password + confirm. Submit:
  - Validates password meets policy (min 12 chars, no other constraints in v1; same as bootstrap path)
  - Hashes via `auth.HashPassword` (existing helper)
  - Updates `users.password_hash`, sets `must_change_password = 0`
  - Deletes the `user_setup_tokens` row (single-use)
  - Logs the user in via the existing session helper
  - Audit: `user.setup_completed`, payload `{"user_id": "..."}`
  - Redirect to `/`
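The validate-then-consume logic of the landing page can be sketched against an in-memory map standing in for `user_setup_tokens`. The `tokenStore` type and its `lookup` / `redeem` methods are hypothetical names; `lookup` corresponds to the GET (validate only) and `redeem` to the POST (validate again, then delete).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

// tokenRow mirrors the columns of user_setup_tokens that redemption reads.
type tokenRow struct {
	UserID    string
	ExpiresAt time.Time
}

// tokenStore is an in-memory stand-in for the table, keyed by token_hash.
type tokenStore map[string]tokenRow

var errInvalidToken = errors.New("setup token invalid or expired")

func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// lookup validates a raw token without consuming it. Unknown and expired
// tokens return the same error so the page does not reveal which case hit.
func (s tokenStore) lookup(raw string, now time.Time) (tokenRow, error) {
	row, ok := s[hashToken(raw)]
	if !ok || !row.ExpiresAt.After(now) { // requires expires_at > now
		return tokenRow{}, errInvalidToken
	}
	return row, nil
}

// redeem validates again and then deletes the row, making the token single-use.
func (s tokenStore) redeem(raw string, now time.Time) (string, error) {
	row, err := s.lookup(raw, now)
	if err != nil {
		return "", err
	}
	delete(s, hashToken(raw))
	return row.UserID, nil
}

func main() {
	now := time.Date(2026, 5, 5, 12, 0, 0, 0, time.UTC)
	st := tokenStore{hashToken("raw-token"): {UserID: "u42", ExpiresAt: now.Add(time.Hour)}}
	fmt.Println(st.redeem("raw-token", now))
	fmt.Println(st.redeem("raw-token", now)) // second use fails
}
```

Against the real database, the password update and the token delete would presumably share one transaction, so a crash between the two cannot leave a usable token behind.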
Regenerate setup link (admin)
/settings/users/{id}/edit shows a "Regenerate setup link" button when must_change_password = 1. Clicking it:
- Generates a new token + hash, INSERT OR REPLACE on `user_setup_tokens`
- Returns the admin to the same one-time link page as the add-user flow
- Audit: `user.setup_token.regenerated`
Cleanup
Expired tokens linger in the DB until cleaned. Add a cheap sweep on the existing maintenance ticker: DELETE FROM user_setup_tokens WHERE expires_at < ?. Runs at the same cadence as the alert engine tick (60s). No new ticker needed.
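Because `expires_at` is a TEXT column, the `<` in the sweep is a string comparison, which only matches time order if every timestamp is written in one fixed-width, sortable format. The sketch below assumes UTC RFC 3339 at second precision; the format actually used by the store is not stated in this document, and `stamp`, `stampUnix`, and `sweep` are names invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// stamp renders a time the way this sketch assumes TEXT timestamps are
// written: UTC, second precision, fixed width.
func stamp(t time.Time) string { return t.UTC().Format(time.RFC3339) }

// stampUnix is a convenience for building stamps from Unix seconds.
func stampUnix(sec int64) string { return stamp(time.Unix(sec, 0)) }

// sweep removes rows whose expires_at sorts before the bound, mirroring
// DELETE FROM user_setup_tokens WHERE expires_at < ?. It returns the number
// of rows removed. The map is keyed by user_id.
func sweep(rows map[string]string, bound string) int {
	removed := 0
	for userID, expiresAt := range rows {
		if expiresAt < bound {
			delete(rows, userID)
			removed++
		}
	}
	return removed
}

func main() {
	rows := map[string]string{"u1": stampUnix(1000), "u2": stampUnix(5000)}
	fmt.Println(sweep(rows, stampUnix(3000)), len(rows))
}
```

Mixing formats (for example, local offsets or variable-length fractional seconds) would break the lexical ordering, so the same formatter should be used for both the INSERT and the sweep bound.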
Self-service password change
/settings/account
- Accessible to every authenticated user (any role)
- Form: current password + new password + confirm
- Server validates current password (re-uses login bcrypt comparison), updates hash, audits `user.password_changed`
- Special case: if `must_change_password = 1`, the current-password field is hidden / not required (covers the legacy "admin reset password" path if we ever add one; the current setup-token path doesn't use this)
The bootstrap user's password change uses this same page (no special case for "first admin").
User list / management UI
/settings/users (admin-only)
Settings · Users [3]
─────────────────────────────────────────────────
[ + Add user ] [ ] Show disabled
USERNAME EMAIL ROLE LAST LOGIN STATUS
alice alice@example.com admin 2 mins ago enabled
bob — operator 3 days ago enabled
charlie c@example.com viewer never setup pending ← if has open setup token
diane d@example.com operator 1 month ago disabled ← only when "Show disabled"
Actions per row: Edit · (Re-enable | Disable)
- "setup pending" badge for users with `must_change_password=1`; clicking the row goes to edit, which surfaces the regenerate-link button prominently
- "Show disabled" is a checkbox querystring filter (`?show_disabled=1`)
- Sort columns: clickable like the audit log (username, role, last_login). Reuse the same pattern (server-side sort + URL builder + glyph)
/settings/users/new (admin-only)
Single form: username + email (optional) + role. On submit the admin either lands on the setup-link page (success) or is returned to the form with an inline "username exists, re-enable existing?" panel (collision with a disabled user) or a red error (collision with an enabled user).
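The three submit outcomes can be sketched as a pure function over the existing users. The `createResult` shape, the 201 status for success, and the whitespace trimming are assumptions made for illustration; the design specifies only the 409 responses.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

type user struct {
	ID, Username string
	Disabled     bool
}

// createResult is a hypothetical shape for what the create handler reports.
type createResult struct {
	Status         int
	ExistingUserID string
	Disabled       bool
}

// checkUsername normalises the requested name and classifies any collision.
// Note: strings.ToLower folds Unicode, while SQLite's built-in LOWER() folds
// only ASCII, so non-ASCII usernames may need an explicit policy.
func checkUsername(existing []user, requested string) (string, createResult) {
	name := strings.ToLower(strings.TrimSpace(requested))
	for _, u := range existing {
		if strings.ToLower(u.Username) != name {
			continue
		}
		if u.Disabled {
			// UI pivots to the "re-enable existing user" prompt.
			return name, createResult{Status: http.StatusConflict, ExistingUserID: u.ID, Disabled: true}
		}
		return name, createResult{Status: http.StatusConflict} // plain "username taken"
	}
	return name, createResult{Status: http.StatusCreated} // 201 is assumed
}

func main() {
	users := []user{{ID: "u1", Username: "Alice"}, {ID: "u2", Username: "diane", Disabled: true}}
	for _, req := range []string{"ALICE", "Diane", "erin"} {
		name, res := checkUsername(users, req)
		fmt.Printf("%s -> %+v\n", name, res)
	}
}
```

The unique index on `LOWER(username)` remains the final arbiter; a pre-check like this only decides which message to show, and the INSERT can still fail if two admins submit the same name at once.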
/settings/users/{id}/edit (admin-only)
- Display-only block: id, created_at, last_login_at, status
- Editable: email, role
- Buttons:
  - "Regenerate setup link": only when `must_change_password = 1`
  - "Disable user": flips `disabled_at`; rejected if last enabled admin (server-side check). Confirmation modal with typed name to confirm.
  - "Re-enable user": clears `disabled_at`. No confirmation.
  - "Force logout": separate from disable; just kills the session but keeps the user enabled. Useful for "I think Bob's session was hijacked" without locking him out.
- Cancel / Save buttons at the bottom
/settings/users/{id}/setup-link (admin-only)
Renders the one-time link with copy button + countdown. Shown after add-user and after regenerate. Reload of this URL after the token is consumed: 410 Gone with a clear message.
/settings/account (any authenticated)
Self-service password change. Form-only page; no nav under Settings since most users will only see this one Settings page in v1.
API surface
GET /api/users admin — list (with ?show_disabled=1 filter)
POST /api/users admin — create user, returns user_id + setup_url
GET /api/users/{id} admin — read
PATCH /api/users/{id} admin — update email, role
POST /api/users/{id}/disable admin — set disabled_at; rejects last-admin
POST /api/users/{id}/enable admin — clear disabled_at
POST /api/users/{id}/regenerate-setup admin — new token, returns setup_url
POST /api/users/{id}/force-logout admin — kill all sessions for this user
POST /api/account/password any auth — self password change
GET /setup public — landing page (HTML form)
POST /setup public — submit new password
UI routes mirror the API but at /settings/users/....
Last-admin self-protection
Two operations that could lock everyone out are guarded:
- Disable user: rejected if the user is admin AND there are no other enabled admins
- Demote admin to operator/viewer: same check
Server-side enforcement (single SELECT on COUNT(*) FROM users WHERE role='admin' AND disabled_at IS NULL). UI hint: edit page disables the role dropdown's non-admin options + disable button when the user is the last admin, with a tooltip explaining why.
The bootstrap admin is just a regular admin row; this check covers it.
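Both guarded operations reduce to one question: does this change remove an enabled admin when only one exists? A minimal sketch, with a slice standing in for the `COUNT(*)` query and with `guardLastAdmin` and `errLastAdmin` as hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

type user struct {
	ID       string
	Role     string
	Disabled bool
}

var errLastAdmin = errors.New("cannot remove the last enabled admin")

// countsAsAdmin matches the WHERE clause: role='admin' AND disabled_at IS NULL.
func countsAsAdmin(u user) bool { return u.Role == "admin" && !u.Disabled }

// guardLastAdmin rejects a change (disable or demotion) that would leave the
// system with no enabled admin. before and after are the same user's state
// around the proposed update.
func guardLastAdmin(users []user, before, after user) error {
	if !countsAsAdmin(before) || countsAsAdmin(after) {
		return nil // the change does not remove an enabled admin
	}
	enabled := 0
	for _, u := range users {
		if countsAsAdmin(u) {
			enabled++
		}
	}
	if enabled <= 1 {
		return errLastAdmin
	}
	return nil
}

func main() {
	alice := user{ID: "a", Role: "admin"}
	users := []user{alice, {ID: "b", Role: "operator"}}

	demoted := alice
	demoted.Role = "operator"
	fmt.Println(guardLastAdmin(users, alice, demoted))
}
```

In the handler, the count and the update would presumably run in the same transaction so that two admins disabling each other concurrently cannot both pass the check; a rejection maps naturally to the 409 the acceptance criteria mention.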
Audit actions
New action strings introduced:
- `user.created`
- `user.updated` (email / role change)
- `user.disabled`
- `user.enabled`
- `user.password_changed`
- `user.setup_completed`
- `user.setup_token.regenerated`
- `user.setup_token.expired` (system-driven, on cleanup sweep)
- `user.force_logout`
All target_kind = user, target_id = the affected user's id. Existing payload conventions apply.
Ordering / dependencies
Slices in approximate landing order (writing-plans will firm this up):
- A. Schema: migrations 0017 + 0018, `Role` helper updates, store API extensions (email, disabled_at, must_change_password, setup_token CRUD, lowercase username constraints)
- B. RBAC middleware: `requireRole` + `roleAtLeast`, route re-grouping in server.go, 403 rendering for HTML + JSON
- C. Session re-validation: extend the existing session middleware to re-read user state per request, kick disabled users
- D. Setup-token flow: `/setup` GET+POST, the one-time link page after add-user
- E. User CRUD API: handlers + handlers' tests
- F. UI: `/settings/users` list, add, edit, setup-link page, account page
- G. Sweep: Playwright walk through the full lifecycle (add → setup link → user signs in → admin disables → user gets kicked → admin re-enables → user signs back in)
Each slice can land as its own commit on the branch. RBAC middleware (B) goes in before user CRUD so we don't ship an open /api/users/* even briefly.
Test strategy
- Store: `Set/GetSetupToken`, `EnableUser/DisableUser`, last-admin guard, lowercase-username uniqueness, expired-token cleanup
- HTTP middleware: `roleAtLeast` truth table; viewer hitting an operator route returns 403; disabled user gets 401 mid-session
- Setup flow integration: create user → fetch setup URL → land on `/setup?token=...` → POST password → user can log in → token row gone
- UI: existing Playwright sweep pattern, screenshots into `_diag/p4-03-04-sweep/`
Out of scope (deferred)
- OIDC (P4-05) — adds a parallel auth chain. This PR keeps the surface for it (role taxonomy, session middleware) but doesn't wire it.
- Email-the-setup-link — explicitly deferred. Easy follow-up because the SMTP channel client from P3-06 is already there.
- Hard delete — disable-only in v1; can add a typed-confirm "purge" later if it turns out to be needed.
- Password complexity / rotation policy — current minimum (12 chars) and no rotation; tighten later if/when policy demands.
- Lockout on failed login — a brute-force protection layer is its own task and orthogonal to RBAC.
- Audit on 403 — not in v1; revisit if compliance asks for it.
Risks / gotchas to watch
- Existing tests that assume "any logged-in user can hit any endpoint" will break. Audit the test fixtures: most use `loginAsAdmin`, which is fine; any tests currently exercising specific operator/viewer paths need explicit role assignment. (Quick grep suggests there aren't many; bootstrap-only.)
- Bootstrap user normalisation: the existing admin row's username is whatever it was set to at first run. The new lowercase-uniqueness index uses `LOWER(username)`, which makes the existing row implicitly lowercase-keyed for lookups. No data migration needed.
- Session middleware re-read cost: one SELECT per authenticated request. SQLite WAL handles this fine at expected fleet sizes; if it ever shows up on a profile we add a small in-memory cache keyed by session id with a 30s TTL.
- 403 vs 401 distinction: make sure unauthenticated requests still get 401 (login redirect) and authenticated-but-insufficient get 403. The middleware should compose: auth-required first, role-required second.
Acceptance
- An admin can add a user, copy the setup link, the new user can land on `/setup?token=...`, set a password, and reach `/`
- An expired token (>1h) on `/setup?token=...` shows the "contact your administrator" page
- Admin regenerates the link, old token is invalid, new token works
- Operator user can trigger Run-now but cannot reach `/settings/users` (403) and the Users tab in Settings is hidden in their nav
- Viewer user gets 403 on Run-now, 200 on dashboard / alerts / audit
- Admin disables a user mid-session: the user's next request is 401 and they're redirected to login
- Admin cannot disable themselves if they are the last enabled admin (server returns 409, UI button is greyed)
- Self-service password change at `/settings/account` works for every role
- All existing tests pass; new test suite covers role middleware, setup-token lifecycle, last-admin guard
Self-review notes
- ✅ All sections concrete, no TBD / TODO
- ✅ Schema migrations are column-level (CLAUDE.md compliance)
- ✅ Audit action vocabulary listed in one place; no string typos to drift
- ✅ Out-of-scope list explicit so reviewers can challenge what we aren't doing
- ✅ Last-admin guard handled both server-side and UI-hinted
- ✅ Token storage hashes the secret server-side; raw is shown to admin once and never again
- ✅ Session re-validation cost noted with a fallback if it shows up on a profile