spec: P4-03/04 — RBAC + user management design
Brainstormed shape locked: chi route-group middleware, fail-closed admin default; setup-token flow with 1h single-use tokens (sha256-hashed at rest, raw shown to admin once); disable-only user lifecycle with last-admin guard; self-service /settings/account password change for every role; email field on users (metadata v1); session re-validation on every authenticated request so disable / role change land immediately. Locked decisions captured in §Role taxonomy, §Schema changes, §Setup-token flow, §RBAC enforcement, §Last-admin self-protection. Deferred items in §Out of scope (OIDC, SMTP email-the-link, hard delete, lockout). Migrations 0017 (users extensions) + 0018 (user_setup_tokens) both column-level ALTERs per CLAUDE.md preference.
This commit is contained in:
@@ -0,0 +1,340 @@
|
||||
# P4-03 / P4-04 — RBAC + User Management Design
|
||||
|
||||
> **Date:** 2026-05-05
|
||||
> **Status:** brainstorm complete; ready for plan
|
||||
> **Closes:** P4-03 (RBAC enforcement at API layer), P4-04 (User management UI)
|
||||
|
||||
## Goal
|
||||
|
||||
Enforce role-based access control at the HTTP layer (currently every authenticated user has admin powers) and ship the operator-facing screens for managing users, roles, and password lifecycle.
|
||||
|
||||
## Architecture
|
||||
|
||||
Two coupled subsystems landing in one PR:
|
||||
|
||||
1. **RBAC enforcement** — chi route-group middleware that gates each subtree by minimum role. Fail-closed default (admin) so a forgotten declaration doesn't accidentally widen access.
|
||||
2. **User management** — `/settings/users` sub-tab with list / add / edit / disable. Setup-link flow for new users (1-hour-expiry single-use token). Self-service password change at `/settings/account`.
|
||||
|
||||
The audit log already records actor + user_id on every mutation; new endpoints fold in naturally.
|
||||
|
||||
## Role taxonomy
|
||||
|
||||
Locked. Three roles, hierarchical (admin ⊇ operator ⊇ viewer):
|
||||
|
||||
| Action | admin | operator | viewer |
|
||||
|---|:-:|:-:|:-:|
|
||||
| View dashboard / alerts / audit / hosts | ✓ | ✓ | ✓ |
|
||||
| Trigger Run-now / Restore / Snapshot diff | ✓ | ✓ | ✗ |
|
||||
| Acknowledge / resolve alerts | ✓ | ✓ | ✗ |
|
||||
| Edit schedules / source groups / retention / hooks | ✓ | ✓ | ✗ |
|
||||
| Add / remove hosts (enrolment, accept/reject pending) | ✓ | ✓ | ✗ |
|
||||
| Cancel running jobs | ✓ | ✓ | ✗ |
|
||||
| Edit repo credentials | ✓ | ✓ | ✗ |
|
||||
| Edit notification channels | ✓ | ✗ | ✗ |
|
||||
| Manage users | ✓ | ✗ | ✗ |
|
||||
| Self password change (`/settings/account`) | ✓ | ✓ | ✓ |
|
||||
|
||||
The role enum already exists in the schema (`CHECK (role IN ('admin','operator','viewer'))`) and in `internal/store/types.go`. Bootstrap creates the first user as admin. Zero migration needed for existing installs.
|
||||
|
||||
## Schema changes
|
||||
|
||||
All column-level ALTERs (CLAUDE.md prefers these over rebuilds; safe under `foreign_keys=ON`).
|
||||
|
||||
### Migration 0017 — `users` extensions
|
||||
|
||||
```sql
|
||||
ALTER TABLE users ADD COLUMN email TEXT;
|
||||
ALTER TABLE users ADD COLUMN disabled_at TEXT;
|
||||
ALTER TABLE users ADD COLUMN must_change_password INTEGER NOT NULL DEFAULT 0;
|
||||
|
||||
-- Username case-insensitive lookup. Existing rows are kept as-is;
|
||||
-- normalisation only applies to new INSERTs (handled in Go).
|
||||
CREATE UNIQUE INDEX users_username_lower ON users(LOWER(username));
|
||||
```
|
||||
|
||||
### Migration 0018 — `user_setup_tokens`
|
||||
|
||||
```sql
|
||||
CREATE TABLE user_setup_tokens (
|
||||
user_id TEXT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
|
||||
token_hash TEXT NOT NULL, -- sha256(raw_token), hex
|
||||
expires_at TEXT NOT NULL,
|
||||
created_at TEXT NOT NULL,
|
||||
created_by TEXT NOT NULL REFERENCES users(id) ON DELETE SET NULL
|
||||
);
|
||||
|
||||
CREATE INDEX user_setup_tokens_expires ON user_setup_tokens(expires_at);
|
||||
```
|
||||
|
||||
`user_id` is PRIMARY KEY, not just FOREIGN KEY — only one outstanding setup token per user. Regenerating supersedes the old via `INSERT OR REPLACE`.
|
||||
|
||||
## RBAC enforcement
|
||||
|
||||
### Middleware
|
||||
|
||||
```go
|
||||
// requireRole returns chi middleware that 403s any request whose
|
||||
// session-resolved user doesn't meet the minimum role. Roles are
|
||||
// hierarchical: admin > operator > viewer.
|
||||
func (s *Server) requireRole(min store.Role) func(http.Handler) http.Handler
|
||||
```
|
||||
|
||||
Hierarchy implemented as a small helper:
|
||||
|
||||
```go
|
||||
func roleAtLeast(have, min store.Role) bool {
|
||||
rank := map[store.Role]int{
|
||||
store.RoleViewer: 1,
|
||||
store.RoleOperator: 2,
|
||||
store.RoleAdmin: 3,
|
||||
}
|
||||
return rank[have] >= rank[min]
|
||||
}
|
||||
```
|
||||
|
||||
### Route grouping in `server.go`
|
||||
|
||||
The existing `/api` and UI routes get re-grouped into three role bands plus a self-service group:
|
||||
|
||||
```
|
||||
/api/* viewer-readable — GET endpoints anyone authenticated can hit
|
||||
/api/* operator+ — mutating endpoints up to host/source-group/schedule level
|
||||
/api/* admin-only — /api/users/*, channel CRUD
|
||||
/api/account — self-service password change
|
||||
|
||||
/audit, /alerts, /hosts/{id}, etc. — viewer
|
||||
/hosts/{id}/run, /alerts/{id}/ack — operator
|
||||
/settings/users/*, /settings/notifications/* — admin
|
||||
/settings/account — viewer (any authenticated)
|
||||
```
|
||||
|
||||
Default at the bottom of `routes()` is admin (fail-closed). Any future endpoint that doesn't get explicitly placed lands in admin-only, surfacing the missing declaration as a permission error rather than a silent bypass.
|
||||
|
||||
### Per-handler nuance
|
||||
|
||||
One existing case warrants a handler-level check on top of the route gate: `GET /settings/users/{id}/edit` is admin-only, but the `PUT /api/account/password` is viewer-OK. The split-by-route already covers this; no per-handler overrides expected in v1.
|
||||
|
||||
### Out of scope of role middleware
|
||||
|
||||
- `/ws/agent` and `/api/agents/*` — agent bearer-token auth, separate chain
|
||||
- `/healthz` — unauthenticated
|
||||
- `/login`, `/logout`, `/bootstrap` — public
|
||||
|
||||
### 403 handling
|
||||
|
||||
- JSON endpoints: `{"error":"forbidden","code":"insufficient_role"}` with HTTP 403
|
||||
- HTML endpoints: render a small "You don't have permission" panel inside the chrome (so the user keeps their nav and can move away), HTTP 403
|
||||
- **No audit row on 403** — too noisy with normal users hitting URLs they don't have access to
|
||||
|
||||
### Session re-validation
|
||||
|
||||
Sessions need to honour `disabled_at` and current role on every request, not just at login. The session-validation middleware reads the user row each request (single PK lookup, fast in SQLite). If `disabled_at IS NOT NULL`, the session is invalidated and the request 401s. This makes "disable user" and "force logout" effectively immediate.
|
||||
|
||||
Cost: one SELECT per authenticated request. SQLite handles this comfortably for the fleet sizes this codebase targets.
|
||||
|
||||
## Setup-token flow (replacing temp passwords)
|
||||
|
||||
### Add user
|
||||
|
||||
1. Admin clicks **+ Add user** on `/settings/users`
|
||||
2. Form: username (required, lowercase-normalised), email (optional, validated), role (admin/operator/viewer)
|
||||
3. Server:
|
||||
- Validates username uniqueness (case-insensitive). On collision with a *disabled* user, return a 409 with `{"existing_user_id": "...", "disabled": true}` so the UI can pivot to a "re-enable existing user" prompt
|
||||
- On collision with an enabled user: 409 with a plain "username taken" error
|
||||
- Creates user row with `password_hash = ""`, `must_change_password = 1`, `disabled_at = NULL`
|
||||
- Generates 32 random bytes, hex-encodes → raw token (64 chars). Stores `sha256(token)` hex in `user_setup_tokens`. `expires_at = now + 1h`
|
||||
- Audit: `user.created`, payload `{"username": "...", "role": "...", "with_setup_token": true}`
|
||||
4. Server returns the admin to a one-time setup-link page: `/settings/users/{id}/setup-link`
|
||||
- Shows the URL `http(s)://<base>/setup?token=<raw>` with a Copy button
|
||||
- Countdown timer (live JS) showing time-to-expiry
|
||||
- Warning: "This is the only time you'll see this link. If you lose it, regenerate from the user edit page."
|
||||
- "Done" button → `/settings/users`
|
||||
|
||||
The raw token is **never persisted** server-side. Lost tokens require regeneration.
|
||||
|
||||
### Setup landing page (public, no auth required)
|
||||
|
||||
1. User clicks the link, lands on `/setup?token=<raw>`
|
||||
2. Server hashes the token, looks up `user_setup_tokens` row, validates `expires_at > now`
|
||||
3. On invalid / expired: render an error page with a "Contact your administrator" message. Audit: `user.setup_token.expired` (no actor).
|
||||
4. On valid: render a password-set form: `new password + confirm`. Submit:
|
||||
- Validates password meets policy (min 12 chars, no other constraints in v1 — same as bootstrap path)
|
||||
- Hashes via `auth.HashPassword` (existing helper)
|
||||
- Updates `users.password_hash`, sets `must_change_password = 0`
|
||||
- Deletes the `user_setup_tokens` row (single-use)
|
||||
- Logs the user in via the existing session helper
|
||||
- Audit: `user.setup_completed`, payload `{"user_id": "..."}`
|
||||
- Redirect to `/`
|
||||
|
||||
### Regenerate setup link (admin)
|
||||
|
||||
`/settings/users/{id}/edit` shows a "Regenerate setup link" button when `must_change_password = 1`. Clicking it:
|
||||
|
||||
1. Generates a new token + hash, INSERT OR REPLACE on `user_setup_tokens`
|
||||
2. Returns the admin to the same one-time link page as the add-user flow
|
||||
3. Audit: `user.setup_token.regenerated`
|
||||
|
||||
### Cleanup
|
||||
|
||||
Expired tokens linger in the DB until cleaned. Add a cheap sweep on the existing maintenance ticker: `DELETE FROM user_setup_tokens WHERE expires_at < ?`. Runs at the same cadence as the alert engine tick (60s). No new ticker needed.
|
||||
|
||||
## Self-service password change
|
||||
|
||||
`/settings/account`
|
||||
|
||||
- Accessible to every authenticated user (any role)
|
||||
- Form: `current password + new password + confirm`
|
||||
- Server validates current password (re-uses login bcrypt comparison), updates hash, audits `user.password_changed`
|
||||
- Special case: if `must_change_password = 1`, the current-password field is hidden / not required (covers the legacy "admin reset password" path if we ever add one — current setup-token path doesn't use this)
|
||||
|
||||
The bootstrap user's password change uses this same page (no special case for "first admin").
|
||||
|
||||
## User list / management UI
|
||||
|
||||
### `/settings/users` (admin-only)
|
||||
|
||||
```
|
||||
Settings · Users [3]
|
||||
─────────────────────────────────────────────────
|
||||
[ + Add user ] [ ] Show disabled
|
||||
|
||||
USERNAME EMAIL ROLE LAST LOGIN STATUS
|
||||
alice alice@example.com admin 2 mins ago enabled
|
||||
bob — operator 3 days ago enabled
|
||||
charlie c@example.com viewer never setup pending ← if has open setup token
|
||||
diane d@example.com operator 1 month ago disabled ← only when "Show disabled"
|
||||
|
||||
Actions per row: Edit · (Re-enable | Disable)
|
||||
```
|
||||
|
||||
- "setup pending" badge for users with `must_change_password=1` — clicking the row goes to edit, which surfaces the regenerate-link button prominently
|
||||
- "Show disabled" is a checkbox querystring filter (`?show_disabled=1`)
|
||||
- Sort columns: clickable like the audit log (username, role, last_login). Reuse the same pattern (server-side sort + URL builder + glyph)
|
||||
|
||||
### `/settings/users/new` (admin-only)
|
||||
|
||||
Single form: `username + email (optional) + role`. On submit → either landed on the setup-link page (success) or returned with an inline "username exists, re-enable existing?" panel (collision with disabled user) / red error (collision with enabled user).
|
||||
|
||||
### `/settings/users/{id}/edit` (admin-only)
|
||||
|
||||
- Display-only block: id, created_at, last_login_at, status
|
||||
- **Editable**: email, role
|
||||
- **Buttons**:
|
||||
- "Regenerate setup link" — only when `must_change_password = 1`
|
||||
- "Disable user" — flips `disabled_at`; rejected if last enabled admin (server-side check). Confirmation modal with typed name to confirm.
|
||||
- "Re-enable user" — clears `disabled_at`. No confirmation.
|
||||
- "Force logout" — separate from disable; just kills the session but keeps the user enabled. Useful for "I think Bob's session was hijacked" without locking him out.
|
||||
- Cancel / Save buttons at the bottom
|
||||
|
||||
### `/settings/users/{id}/setup-link` (admin-only)
|
||||
|
||||
Renders the one-time link with copy button + countdown. Shown after add-user and after regenerate. Reload of this URL after the token is consumed: 410 Gone with a clear message.
|
||||
|
||||
### `/settings/account` (any authenticated)
|
||||
|
||||
Self-service password change. Form-only page; no nav under Settings since most users will only see this one Settings page in v1.
|
||||
|
||||
## API surface
|
||||
|
||||
```
|
||||
GET /api/users admin — list (with ?show_disabled=1 filter)
|
||||
POST /api/users admin — create user, returns user_id + setup_url
|
||||
GET /api/users/{id} admin — read
|
||||
PATCH /api/users/{id} admin — update email, role
|
||||
POST /api/users/{id}/disable admin — set disabled_at; rejects last-admin
|
||||
POST /api/users/{id}/enable admin — clear disabled_at
|
||||
POST /api/users/{id}/regenerate-setup admin — new token, returns setup_url
|
||||
POST /api/users/{id}/force-logout admin — kill all sessions for this user
|
||||
|
||||
POST /api/account/password any auth — self password change
|
||||
GET /setup public — landing page (HTML form)
|
||||
POST /setup public — submit new password
|
||||
```
|
||||
|
||||
UI routes mirror the API but at `/settings/users/...`.
|
||||
|
||||
## Last-admin self-protection
|
||||
|
||||
Two operations that could lock everyone out are guarded:
|
||||
|
||||
- **Disable user**: rejected if the user is admin AND there are no other enabled admins
|
||||
- **Demote admin to operator/viewer**: same check
|
||||
|
||||
Server-side enforcement (single SELECT on `COUNT(*) FROM users WHERE role='admin' AND disabled_at IS NULL`). UI hint: edit page disables the role dropdown's non-admin options + disable button when the user is the last admin, with a tooltip explaining why.
|
||||
|
||||
The bootstrap admin is just a regular admin row; this check covers it.
|
||||
|
||||
## Audit actions
|
||||
|
||||
New action strings introduced:
|
||||
|
||||
- `user.created`
|
||||
- `user.updated` (email / role change)
|
||||
- `user.disabled`
|
||||
- `user.enabled`
|
||||
- `user.password_changed`
|
||||
- `user.setup_completed`
|
||||
- `user.setup_token.regenerated`
|
||||
- `user.setup_token.expired` (system-driven, on cleanup sweep)
|
||||
- `user.force_logout`
|
||||
|
||||
All target_kind = `user`, target_id = the affected user's id. Existing payload conventions apply.
|
||||
|
||||
## Ordering / dependencies
|
||||
|
||||
Slices in approximate landing order (writing-plans will firm this up):
|
||||
|
||||
1. **A. Schema** — migrations 0017 + 0018, `Role` helper updates, store API extensions (email, disabled_at, must_change_password, setup_token CRUD, lowercase username constraints)
|
||||
2. **B. RBAC middleware** — `requireRole` + `roleAtLeast`, route re-grouping in server.go, 403 rendering for HTML + JSON
|
||||
3. **C. Session re-validation** — extend the existing session middleware to re-read user state per request, kick disabled users
|
||||
4. **D. Setup-token flow** — `/setup` GET+POST, the one-time link page after add-user
|
||||
5. **E. User CRUD API** — handlers + handlers' tests
|
||||
6. **F. UI** — `/settings/users` list, add, edit, setup-link page, account page
|
||||
7. **G. Sweep** — Playwright walk through the full lifecycle (add → setup link → user signs in → admin disables → user gets kicked → admin re-enables → user signs back in)
|
||||
|
||||
Each slice can land as its own commit on the branch. RBAC middleware (B) goes in *before* user CRUD so we don't ship an open `/api/users/*` even briefly.
|
||||
|
||||
## Test strategy
|
||||
|
||||
- **Store**: `Set/GetSetupToken`, `EnableUser`/`DisableUser`, last-admin guard, lowercase-username uniqueness, expired-token cleanup
|
||||
- **HTTP middleware**: `roleAtLeast` truth table; viewer hitting an operator route returns 403; disabled user gets 401 mid-session
|
||||
- **Setup flow integration**: create user → fetch setup URL → land on `/setup?token=...` → POST password → user can log in → token row gone
|
||||
- **UI**: existing Playwright sweep pattern, screenshots into `_diag/p4-03-04-sweep/`
|
||||
|
||||
## Out of scope (deferred)
|
||||
|
||||
- **OIDC** (P4-05) — adds a parallel auth chain. This PR keeps the surface for it (role taxonomy, session middleware) but doesn't wire it.
|
||||
- **Email-the-setup-link** — explicitly deferred. Easy follow-up because the SMTP channel client from P3-06 is already there.
|
||||
- **Hard delete** — disable-only in v1; can add a typed-confirm "purge" later if it turns out to be needed.
|
||||
- **Password complexity / rotation policy** — current minimum (12 chars) and no rotation; tighten later if/when policy demands.
|
||||
- **Lockout on failed login** — a brute-force protection layer is its own task and orthogonal to RBAC.
|
||||
- **Audit on 403** — not in v1; revisit if compliance asks for it.
|
||||
|
||||
## Risks / gotchas to watch
|
||||
|
||||
- **Existing tests** that assume "any logged-in user can hit any endpoint" will break. Audit the test fixtures: most use `loginAsAdmin`, which is fine; any tests currently exercising specific operator/viewer paths need explicit role assignment. (Quick grep suggests there aren't many — bootstrap-only.)
|
||||
- **Bootstrap user normalisation** — the existing admin row's username is whatever it was set to at first run. The new lowercase-uniqueness index uses `LOWER(username)`, which makes the existing row implicitly lowercase-keyed for lookups. No data migration needed.
|
||||
- **Session middleware re-read cost** — one SELECT per authenticated request. SQLite WAL handles this fine at expected fleet sizes; if it ever shows up on a profile we add a small in-memory cache keyed by session id with a 30s TTL.
|
||||
- **403 vs 401 distinction** — make sure unauthenticated requests still get 401 (login redirect) and authenticated-but-insufficient get 403. The middleware should compose: auth-required first, role-required second.
|
||||
|
||||
## Acceptance
|
||||
|
||||
- [ ] An admin can add a user, copy the setup link, the new user can land on `/setup?token=...`, set a password, and reach `/`
|
||||
- [ ] An expired token (>1h) on `/setup?token=...` shows the "contact your administrator" page
|
||||
- [ ] Admin regenerates the link, old token is invalid, new token works
|
||||
- [ ] Operator user can trigger Run-now but cannot reach `/settings/users` (403) and the Users tab in Settings is hidden in their nav
|
||||
- [ ] Viewer user gets 403 on Run-now, 200 on dashboard / alerts / audit
|
||||
- [ ] Admin disables a user mid-session — the user's next request is 401 and they're redirected to login
|
||||
- [ ] Admin cannot disable themselves if they are the last enabled admin (server returns 409, UI button is greyed)
|
||||
- [ ] Self-service password change at `/settings/account` works for every role
|
||||
- [ ] All existing tests pass; new test suite covers role middleware, setup-token lifecycle, last-admin guard
|
||||
|
||||
## Self-review notes
|
||||
|
||||
- ✅ All sections concrete, no TBD / TODO
|
||||
- ✅ Schema migrations are column-level (CLAUDE.md compliance)
|
||||
- ✅ Audit action vocabulary listed in one place; no string typos to drift
|
||||
- ✅ Out-of-scope list explicit so reviewers can challenge what we *aren't* doing
|
||||
- ✅ Last-admin guard handled both server-side and UI-hinted
|
||||
- ✅ Token storage hashes the secret server-side; raw is shown to admin once and never again
|
||||
- ✅ Session re-validation cost noted with a fallback if it shows up on a profile
|
||||
Reference in New Issue
Block a user