# P4-03 / P4-04 — RBAC + User Management Design > **Date:** 2026-05-05 > **Status:** brainstorm complete; ready for plan > **Closes:** P4-03 (RBAC enforcement at API layer), P4-04 (User management UI) ## Goal Enforce role-based access control at the HTTP layer (currently every authenticated user has admin powers) and ship the operator-facing screens for managing users, roles, and password lifecycle. ## Architecture Two coupled subsystems landing in one PR: 1. **RBAC enforcement** — chi route-group middleware that gates each subtree by minimum role. Fail-closed default (admin) so a forgotten declaration doesn't accidentally widen access. 2. **User management** — `/settings/users` sub-tab with list / add / edit / disable. Setup-link flow for new users (1-hour-expiry single-use token). Self-service password change at `/settings/account`. The audit log already records actor + user_id on every mutation; new endpoints fold in naturally. ## Role taxonomy Locked. Three roles, hierarchical (admin ⊇ operator ⊇ viewer): | Action | admin | operator | viewer | |---|:-:|:-:|:-:| | View dashboard / alerts / audit / hosts | ✓ | ✓ | ✓ | | Trigger Run-now / Restore / Snapshot diff | ✓ | ✓ | ✗ | | Acknowledge / resolve alerts | ✓ | ✓ | ✗ | | Edit schedules / source groups / retention / hooks | ✓ | ✓ | ✗ | | Add / remove hosts (enrolment, accept/reject pending) | ✓ | ✓ | ✗ | | Cancel running jobs | ✓ | ✓ | ✗ | | Edit repo credentials | ✓ | ✓ | ✗ | | Edit notification channels | ✓ | ✗ | ✗ | | Manage users | ✓ | ✗ | ✗ | | Self password change (`/settings/account`) | ✓ | ✓ | ✓ | The role enum already exists in the schema (`CHECK (role IN ('admin','operator','viewer'))`) and in `internal/store/types.go`. Bootstrap creates the first user as admin. Zero migration needed for existing installs. ## Schema changes All column-level ALTERs (CLAUDE.md prefers these over rebuilds; safe under `foreign_keys=ON`). ### Migration 0017 — `users` extensions ```sql ALTER TABLE users ADD COLUMN email TEXT; ALTER TABLE users ADD COLUMN disabled_at TEXT; ALTER TABLE users ADD COLUMN must_change_password INTEGER NOT NULL DEFAULT 0; -- Username case-insensitive lookup. Existing rows are kept as-is; -- normalisation only applies to new INSERTs (handled in Go). CREATE UNIQUE INDEX users_username_lower ON users(LOWER(username)); ``` ### Migration 0018 — `user_setup_tokens` ```sql CREATE TABLE user_setup_tokens ( user_id TEXT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE, token_hash TEXT NOT NULL, -- sha256(raw_token), hex expires_at TEXT NOT NULL, created_at TEXT NOT NULL, created_by TEXT NOT NULL REFERENCES users(id) ON DELETE SET NULL ); CREATE INDEX user_setup_tokens_expires ON user_setup_tokens(expires_at); ``` `user_id` is PRIMARY KEY, not just FOREIGN KEY — only one outstanding setup token per user. Regenerating supersedes the old via `INSERT OR REPLACE`. ## RBAC enforcement ### Middleware ```go // requireRole returns chi middleware that 403s any request whose // session-resolved user doesn't meet the minimum role. Roles are // hierarchical: admin > operator > viewer. func (s *Server) requireRole(min store.Role) func(http.Handler) http.Handler ``` Hierarchy implemented as a small helper: ```go func roleAtLeast(have, min store.Role) bool { rank := map[store.Role]int{ store.RoleViewer: 1, store.RoleOperator: 2, store.RoleAdmin: 3, } return rank[have] >= rank[min] } ``` ### Route grouping in `server.go` The existing `/api` and UI routes get re-grouped into three role bands plus a self-service group: ``` /api/* viewer-readable — GET endpoints anyone authenticated can hit /api/* operator+ — mutating endpoints up to host/source-group/schedule level /api/* admin-only — /api/users/*, channel CRUD /api/account — self-service password change /audit, /alerts, /hosts/{id}, etc. — viewer /hosts/{id}/run, /alerts/{id}/ack — operator /settings/users/*, /settings/notifications/* — admin /settings/account — viewer (any authenticated) ``` Default at the bottom of `routes()` is admin (fail-closed). Any future endpoint that doesn't get explicitly placed lands in admin-only, surfacing the missing declaration as a permission error rather than a silent bypass. ### Per-handler nuance One existing case warrants a handler-level check on top of the route gate: `GET /settings/users/{id}/edit` is admin-only, but the `PUT /api/account/password` is viewer-OK. The split-by-route already covers this; no per-handler overrides expected in v1. ### Out of scope of role middleware - `/ws/agent` and `/api/agents/*` — agent bearer-token auth, separate chain - `/healthz` — unauthenticated - `/login`, `/logout`, `/bootstrap` — public ### 403 handling - JSON endpoints: `{"error":"forbidden","code":"insufficient_role"}` with HTTP 403 - HTML endpoints: render a small "You don't have permission" panel inside the chrome (so the user keeps their nav and can move away), HTTP 403 - **No audit row on 403** — too noisy with normal users hitting URLs they don't have access to ### Session re-validation Sessions need to honour `disabled_at` and current role on every request, not just at login. The session-validation middleware reads the user row each request (single PK lookup, fast in SQLite). If `disabled_at IS NOT NULL`, the session is invalidated and the request 401s. This makes "disable user" and "force logout" effectively immediate. Cost: one SELECT per authenticated request. SQLite handles this comfortably for the fleet sizes this codebase targets. ## Setup-token flow (replacing temp passwords) ### Add user 1. Admin clicks **+ Add user** on `/settings/users` 2. Form: username (required, lowercase-normalised), email (optional, validated), role (admin/operator/viewer) 3. Server: - Validates username uniqueness (case-insensitive). On collision with a *disabled* user, return a 409 with `{"existing_user_id": "...", "disabled": true}` so the UI can pivot to a "re-enable existing user" prompt - On collision with an enabled user: 409 with a plain "username taken" error - Creates user row with `password_hash = ""`, `must_change_password = 1`, `disabled_at = NULL` - Generates 32 random bytes, hex-encodes → raw token (64 chars). Stores `sha256(token)` hex in `user_setup_tokens`. `expires_at = now + 1h` - Audit: `user.created`, payload `{"username": "...", "role": "...", "with_setup_token": true}` 4. Server returns the admin to a one-time setup-link page: `/settings/users/{id}/setup-link` - Shows the URL `http(s):///setup?token=` with a Copy button - Countdown timer (live JS) showing time-to-expiry - Warning: "This is the only time you'll see this link. If you lose it, regenerate from the user edit page." - "Done" button → `/settings/users` The raw token is **never persisted** server-side. Lost tokens require regeneration. ### Setup landing page (public, no auth required) 1. User clicks the link, lands on `/setup?token=` 2. Server hashes the token, looks up `user_setup_tokens` row, validates `expires_at > now` 3. On invalid / expired: render an error page with a "Contact your administrator" message. Audit: `user.setup_token.expired` (no actor). 4. On valid: render a password-set form: `new password + confirm`. Submit: - Validates password meets policy (min 12 chars, no other constraints in v1 — same as bootstrap path) - Hashes via `auth.HashPassword` (existing helper) - Updates `users.password_hash`, sets `must_change_password = 0` - Deletes the `user_setup_tokens` row (single-use) - Logs the user in via the existing session helper - Audit: `user.setup_completed`, payload `{"user_id": "..."}` - Redirect to `/` ### Regenerate setup link (admin) `/settings/users/{id}/edit` shows a "Regenerate setup link" button when `must_change_password = 1`. Clicking it: 1. Generates a new token + hash, INSERT OR REPLACE on `user_setup_tokens` 2. Returns the admin to the same one-time link page as the add-user flow 3. Audit: `user.setup_token.regenerated` ### Cleanup Expired tokens linger in the DB until cleaned. Add a cheap sweep on the existing maintenance ticker: `DELETE FROM user_setup_tokens WHERE expires_at < ?`. Runs at the same cadence as the alert engine tick (60s). No new ticker needed. ## Self-service password change `/settings/account` - Accessible to every authenticated user (any role) - Form: `current password + new password + confirm` - Server validates current password (re-uses login bcrypt comparison), updates hash, audits `user.password_changed` - Special case: if `must_change_password = 1`, the current-password field is hidden / not required (covers the legacy "admin reset password" path if we ever add one — current setup-token path doesn't use this) The bootstrap user's password change uses this same page (no special case for "first admin"). ## User list / management UI ### `/settings/users` (admin-only) ``` Settings · Users [3] ───────────────────────────────────────────────── [ + Add user ] [ ] Show disabled USERNAME EMAIL ROLE LAST LOGIN STATUS alice alice@example.com admin 2 mins ago enabled bob — operator 3 days ago enabled charlie c@example.com viewer never setup pending ← if has open setup token diane d@example.com operator 1 month ago disabled ← only when "Show disabled" Actions per row: Edit · (Re-enable | Disable) ``` - "setup pending" badge for users with `must_change_password=1` — clicking the row goes to edit, which surfaces the regenerate-link button prominently - "Show disabled" is a checkbox querystring filter (`?show_disabled=1`) - Sort columns: clickable like the audit log (username, role, last_login). Reuse the same pattern (server-side sort + URL builder + glyph) ### `/settings/users/new` (admin-only) Single form: `username + email (optional) + role`. On submit → either landed on the setup-link page (success) or returned with an inline "username exists, re-enable existing?" panel (collision with disabled user) / red error (collision with enabled user). ### `/settings/users/{id}/edit` (admin-only) - Display-only block: id, created_at, last_login_at, status - **Editable**: email, role - **Buttons**: - "Regenerate setup link" — only when `must_change_password = 1` - "Disable user" — flips `disabled_at`; rejected if last enabled admin (server-side check). Confirmation modal with typed name to confirm. - "Re-enable user" — clears `disabled_at`. No confirmation. - "Force logout" — separate from disable; just kills the session but keeps the user enabled. Useful for "I think Bob's session was hijacked" without locking him out. - Cancel / Save buttons at the bottom ### `/settings/users/{id}/setup-link` (admin-only) Renders the one-time link with copy button + countdown. Shown after add-user and after regenerate. Reload of this URL after the token is consumed: 410 Gone with a clear message. ### `/settings/account` (any authenticated) Self-service password change. Form-only page; no nav under Settings since most users will only see this one Settings page in v1. ## API surface ``` GET /api/users admin — list (with ?show_disabled=1 filter) POST /api/users admin — create user, returns user_id + setup_url GET /api/users/{id} admin — read PATCH /api/users/{id} admin — update email, role POST /api/users/{id}/disable admin — set disabled_at; rejects last-admin POST /api/users/{id}/enable admin — clear disabled_at POST /api/users/{id}/regenerate-setup admin — new token, returns setup_url POST /api/users/{id}/force-logout admin — kill all sessions for this user POST /api/account/password any auth — self password change GET /setup public — landing page (HTML form) POST /setup public — submit new password ``` UI routes mirror the API but at `/settings/users/...`. ## Last-admin self-protection Two operations that could lock everyone out are guarded: - **Disable user**: rejected if the user is admin AND there are no other enabled admins - **Demote admin to operator/viewer**: same check Server-side enforcement (single SELECT on `COUNT(*) FROM users WHERE role='admin' AND disabled_at IS NULL`). UI hint: edit page disables the role dropdown's non-admin options + disable button when the user is the last admin, with a tooltip explaining why. The bootstrap admin is just a regular admin row; this check covers it. ## Audit actions New action strings introduced: - `user.created` - `user.updated` (email / role change) - `user.disabled` - `user.enabled` - `user.password_changed` - `user.setup_completed` - `user.setup_token.regenerated` - `user.setup_token.expired` (system-driven, on cleanup sweep) - `user.force_logout` All target_kind = `user`, target_id = the affected user's id. Existing payload conventions apply. ## Ordering / dependencies Slices in approximate landing order (writing-plans will firm this up): 1. **A. Schema** — migrations 0017 + 0018, `Role` helper updates, store API extensions (email, disabled_at, must_change_password, setup_token CRUD, lowercase username constraints) 2. **B. RBAC middleware** — `requireRole` + `roleAtLeast`, route re-grouping in server.go, 403 rendering for HTML + JSON 3. **C. Session re-validation** — extend the existing session middleware to re-read user state per request, kick disabled users 4. **D. Setup-token flow** — `/setup` GET+POST, the one-time link page after add-user 5. **E. User CRUD API** — handlers + handlers' tests 6. **F. UI** — `/settings/users` list, add, edit, setup-link page, account page 7. **G. Sweep** — Playwright walk through the full lifecycle (add → setup link → user signs in → admin disables → user gets kicked → admin re-enables → user signs back in) Each slice can land as its own commit on the branch. RBAC middleware (B) goes in *before* user CRUD so we don't ship an open `/api/users/*` even briefly. ## Test strategy - **Store**: `Set/GetSetupToken`, `EnableUser`/`DisableUser`, last-admin guard, lowercase-username uniqueness, expired-token cleanup - **HTTP middleware**: `roleAtLeast` truth table; viewer hitting an operator route returns 403; disabled user gets 401 mid-session - **Setup flow integration**: create user → fetch setup URL → land on `/setup?token=...` → POST password → user can log in → token row gone - **UI**: existing Playwright sweep pattern, screenshots into `_diag/p4-03-04-sweep/` ## Out of scope (deferred) - **OIDC** (P4-05) — adds a parallel auth chain. This PR keeps the surface for it (role taxonomy, session middleware) but doesn't wire it. - **Email-the-setup-link** — explicitly deferred. Easy follow-up because the SMTP channel client from P3-06 is already there. - **Hard delete** — disable-only in v1; can add a typed-confirm "purge" later if it turns out to be needed. - **Password complexity / rotation policy** — current minimum (12 chars) and no rotation; tighten later if/when policy demands. - **Lockout on failed login** — a brute-force protection layer is its own task and orthogonal to RBAC. - **Audit on 403** — not in v1; revisit if compliance asks for it. ## Risks / gotchas to watch - **Existing tests** that assume "any logged-in user can hit any endpoint" will break. Audit the test fixtures: most use `loginAsAdmin`, which is fine; any tests currently exercising specific operator/viewer paths need explicit role assignment. (Quick grep suggests there aren't many — bootstrap-only.) - **Bootstrap user normalisation** — the existing admin row's username is whatever it was set to at first run. The new lowercase-uniqueness index uses `LOWER(username)`, which makes the existing row implicitly lowercase-keyed for lookups. No data migration needed. - **Session middleware re-read cost** — one SELECT per authenticated request. SQLite WAL handles this fine at expected fleet sizes; if it ever shows up on a profile we add a small in-memory cache keyed by session id with a 30s TTL. - **403 vs 401 distinction** — make sure unauthenticated requests still get 401 (login redirect) and authenticated-but-insufficient get 403. The middleware should compose: auth-required first, role-required second. ## Acceptance - [ ] An admin can add a user, copy the setup link, the new user can land on `/setup?token=...`, set a password, and reach `/` - [ ] An expired token (>1h) on `/setup?token=...` shows the "contact your administrator" page - [ ] Admin regenerates the link, old token is invalid, new token works - [ ] Operator user can trigger Run-now but cannot reach `/settings/users` (403) and the Users tab in Settings is hidden in their nav - [ ] Viewer user gets 403 on Run-now, 200 on dashboard / alerts / audit - [ ] Admin disables a user mid-session — the user's next request is 401 and they're redirected to login - [ ] Admin cannot disable themselves if they are the last enabled admin (server returns 409, UI button is greyed) - [ ] Self-service password change at `/settings/account` works for every role - [ ] All existing tests pass; new test suite covers role middleware, setup-token lifecycle, last-admin guard ## Self-review notes - ✅ All sections concrete, no TBD / TODO - ✅ Schema migrations are column-level (CLAUDE.md compliance) - ✅ Audit action vocabulary listed in one place; no string typos to drift - ✅ Out-of-scope list explicit so reviewers can challenge what we *aren't* doing - ✅ Last-admin guard handled both server-side and UI-hinted - ✅ Token storage hashes the secret server-side; raw is shown to admin once and never again - ✅ Session re-validation cost noted with a fallback if it shows up on a profile