Files
restic-manager/docs/superpowers/specs/2026-05-05-p4-03-04-rbac-user-mgmt-design.md
T
steve 282258e837 spec: P4-03/04 — RBAC + user management design
Brainstormed shape locked: chi route-group middleware, fail-closed
admin default; setup-token flow with 1h single-use tokens
(sha256-hashed at rest, raw shown to admin once); disable-only user
lifecycle with last-admin guard; self-service /settings/account
password change for every role; email field on users (metadata
v1); session re-validation on every authenticated request so
disable / role change land immediately.

Locked decisions captured in §Role taxonomy, §Schema changes,
§Setup-token flow, §RBAC enforcement, §Last-admin self-protection.
Deferred items in §Out of scope (OIDC, SMTP email-the-link,
hard delete, lockout).

Migrations 0017 (users extensions) + 0018 (user_setup_tokens)
both column-level ALTERs per CLAUDE.md preference.
2026-05-05 10:57:24 +01:00

18 KiB

P4-03 / P4-04 — RBAC + User Management Design

Date: 2026-05-05 Status: brainstorm complete; ready for plan Closes: P4-03 (RBAC enforcement at API layer), P4-04 (User management UI)

Goal

Enforce role-based access control at the HTTP layer (currently every authenticated user has admin powers) and ship the operator-facing screens for managing users, roles, and password lifecycle.

Architecture

Two coupled subsystems landing in one PR:

  1. RBAC enforcement — chi route-group middleware that gates each subtree by minimum role. Fail-closed default (admin) so a forgotten declaration doesn't accidentally widen access.
  2. User management/settings/users sub-tab with list / add / edit / disable. Setup-link flow for new users (1-hour-expiry single-use token). Self-service password change at /settings/account.

The audit log already records actor + user_id on every mutation; new endpoints fold in naturally.

Role taxonomy

Locked. Three roles, hierarchical (admin ⊇ operator ⊇ viewer):

Action admin operator viewer
View dashboard / alerts / audit / hosts
Trigger Run-now / Restore / Snapshot diff
Acknowledge / resolve alerts
Edit schedules / source groups / retention / hooks
Add / remove hosts (enrolment, accept/reject pending)
Cancel running jobs
Edit repo credentials
Edit notification channels
Manage users
Self password change (/settings/account)

The role enum already exists in the schema (CHECK (role IN ('admin','operator','viewer'))) and in internal/store/types.go. Bootstrap creates the first user as admin. Zero migration needed for existing installs.

Schema changes

All column-level ALTERs (CLAUDE.md prefers these over rebuilds; safe under foreign_keys=ON).

Migration 0017 — users extensions

ALTER TABLE users ADD COLUMN email TEXT;
ALTER TABLE users ADD COLUMN disabled_at TEXT;
ALTER TABLE users ADD COLUMN must_change_password INTEGER NOT NULL DEFAULT 0;

-- Username case-insensitive lookup. Existing rows are kept as-is;
-- normalisation only applies to new INSERTs (handled in Go).
CREATE UNIQUE INDEX users_username_lower ON users(LOWER(username));

Migration 0018 — user_setup_tokens

CREATE TABLE user_setup_tokens (
  user_id     TEXT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  token_hash  TEXT NOT NULL,           -- sha256(raw_token), hex
  expires_at  TEXT NOT NULL,
  created_at  TEXT NOT NULL,
  created_by  TEXT NOT NULL REFERENCES users(id) ON DELETE SET NULL
);

CREATE INDEX user_setup_tokens_expires ON user_setup_tokens(expires_at);

user_id is PRIMARY KEY, not just FOREIGN KEY — only one outstanding setup token per user. Regenerating supersedes the old via INSERT OR REPLACE.

RBAC enforcement

Middleware

// requireRole returns chi middleware that 403s any request whose
// session-resolved user doesn't meet the minimum role. Roles are
// hierarchical: admin > operator > viewer.
func (s *Server) requireRole(min store.Role) func(http.Handler) http.Handler

Hierarchy implemented as a small helper:

func roleAtLeast(have, min store.Role) bool {
    rank := map[store.Role]int{
        store.RoleViewer:   1,
        store.RoleOperator: 2,
        store.RoleAdmin:    3,
    }
    return rank[have] >= rank[min]
}

Route grouping in server.go

The existing /api and UI routes get re-grouped into three role bands plus a self-service group:

/api/* viewer-readable    — GET endpoints anyone authenticated can hit
/api/* operator+          — mutating endpoints up to host/source-group/schedule level
/api/* admin-only         — /api/users/*, channel CRUD
/api/account              — self-service password change

/audit, /alerts, /hosts/{id}, etc.   — viewer
/hosts/{id}/run, /alerts/{id}/ack    — operator
/settings/users/*, /settings/notifications/* — admin
/settings/account                    — viewer (any authenticated)

Default at the bottom of routes() is admin (fail-closed). Any future endpoint that doesn't get explicitly placed lands in admin-only, surfacing the missing declaration as a permission error rather than a silent bypass.

Per-handler nuance

One existing case warrants a handler-level check on top of the route gate: GET /settings/users/{id}/edit is admin-only, but the PUT /api/account/password is viewer-OK. The split-by-route already covers this; no per-handler overrides expected in v1.

Out of scope of role middleware

  • /ws/agent and /api/agents/* — agent bearer-token auth, separate chain
  • /healthz — unauthenticated
  • /login, /logout, /bootstrap — public

403 handling

  • JSON endpoints: {"error":"forbidden","code":"insufficient_role"} with HTTP 403
  • HTML endpoints: render a small "You don't have permission" panel inside the chrome (so the user keeps their nav and can move away), HTTP 403
  • No audit row on 403 — too noisy with normal users hitting URLs they don't have access to

Session re-validation

Sessions need to honour disabled_at and current role on every request, not just at login. The session-validation middleware reads the user row each request (single PK lookup, fast in SQLite). If disabled_at IS NOT NULL, the session is invalidated and the request 401s. This makes "disable user" and "force logout" effectively immediate.

Cost: one SELECT per authenticated request. SQLite handles this comfortably for the fleet sizes this codebase targets.

Setup-token flow (replacing temp passwords)

Add user

  1. Admin clicks + Add user on /settings/users
  2. Form: username (required, lowercase-normalised), email (optional, validated), role (admin/operator/viewer)
  3. Server:
    • Validates username uniqueness (case-insensitive). On collision with a disabled user, return a 409 with {"existing_user_id": "...", "disabled": true} so the UI can pivot to a "re-enable existing user" prompt
    • On collision with an enabled user: 409 with a plain "username taken" error
    • Creates user row with password_hash = "", must_change_password = 1, disabled_at = NULL
    • Generates 32 random bytes, hex-encodes → raw token (64 chars). Stores sha256(token) hex in user_setup_tokens. expires_at = now + 1h
    • Audit: user.created, payload {"username": "...", "role": "...", "with_setup_token": true}
  4. Server returns the admin to a one-time setup-link page: /settings/users/{id}/setup-link
    • Shows the URL http(s)://<base>/setup?token=<raw> with a Copy button
    • Countdown timer (live JS) showing time-to-expiry
    • Warning: "This is the only time you'll see this link. If you lose it, regenerate from the user edit page."
    • "Done" button → /settings/users

The raw token is never persisted server-side. Lost tokens require regeneration.

Setup landing page (public, no auth required)

  1. User clicks the link, lands on /setup?token=<raw>
  2. Server hashes the token, looks up user_setup_tokens row, validates expires_at > now
  3. On invalid / expired: render an error page with a "Contact your administrator" message. Audit: user.setup_token.expired (no actor).
  4. On valid: render a password-set form: new password + confirm. Submit:
    • Validates password meets policy (min 12 chars, no other constraints in v1 — same as bootstrap path)
    • Hashes via auth.HashPassword (existing helper)
    • Updates users.password_hash, sets must_change_password = 0
    • Deletes the user_setup_tokens row (single-use)
    • Logs the user in via the existing session helper
    • Audit: user.setup_completed, payload {"user_id": "..."}
    • Redirect to /

/settings/users/{id}/edit shows a "Regenerate setup link" button when must_change_password = 1. Clicking it:

  1. Generates a new token + hash, INSERT OR REPLACE on user_setup_tokens
  2. Returns the admin to the same one-time link page as the add-user flow
  3. Audit: user.setup_token.regenerated

Cleanup

Expired tokens linger in the DB until cleaned. Add a cheap sweep on the existing maintenance ticker: DELETE FROM user_setup_tokens WHERE expires_at < ?. Runs at the same cadence as the alert engine tick (60s). No new ticker needed.

Self-service password change

/settings/account

  • Accessible to every authenticated user (any role)
  • Form: current password + new password + confirm
  • Server validates current password (re-uses login bcrypt comparison), updates hash, audits user.password_changed
  • Special case: if must_change_password = 1, the current-password field is hidden / not required (covers the legacy "admin reset password" path if we ever add one — current setup-token path doesn't use this)

The bootstrap user's password change uses this same page (no special case for "first admin").

User list / management UI

/settings/users (admin-only)

Settings · Users [3]
─────────────────────────────────────────────────
[ + Add user ]                       [ ] Show disabled

USERNAME       EMAIL              ROLE      LAST LOGIN     STATUS
alice          alice@example.com  admin     2 mins ago     enabled
bob            —                  operator  3 days ago     enabled
charlie        c@example.com      viewer    never          setup pending  ← if has open setup token
diane          d@example.com      operator  1 month ago    disabled       ← only when "Show disabled"

Actions per row: Edit · (Re-enable | Disable)
  • "setup pending" badge for users with must_change_password=1 — clicking the row goes to edit, which surfaces the regenerate-link button prominently
  • "Show disabled" is a checkbox querystring filter (?show_disabled=1)
  • Sort columns: clickable like the audit log (username, role, last_login). Reuse the same pattern (server-side sort + URL builder + glyph)

/settings/users/new (admin-only)

Single form: username + email (optional) + role. On submit → either landed on the setup-link page (success) or returned with an inline "username exists, re-enable existing?" panel (collision with disabled user) / red error (collision with enabled user).

/settings/users/{id}/edit (admin-only)

  • Display-only block: id, created_at, last_login_at, status
  • Editable: email, role
  • Buttons:
    • "Regenerate setup link" — only when must_change_password = 1
    • "Disable user" — flips disabled_at; rejected if last enabled admin (server-side check). Confirmation modal with typed name to confirm.
    • "Re-enable user" — clears disabled_at. No confirmation.
    • "Force logout" — separate from disable; just kills the session but keeps the user enabled. Useful for "I think Bob's session was hijacked" without locking him out.
  • Cancel / Save buttons at the bottom

Renders the one-time link with copy button + countdown. Shown after add-user and after regenerate. Reload of this URL after the token is consumed: 410 Gone with a clear message.

/settings/account (any authenticated)

Self-service password change. Form-only page; no nav under Settings since most users will only see this one Settings page in v1.

API surface

GET    /api/users                        admin   — list (with ?show_disabled=1 filter)
POST   /api/users                        admin   — create user, returns user_id + setup_url
GET    /api/users/{id}                   admin   — read
PATCH  /api/users/{id}                   admin   — update email, role
POST   /api/users/{id}/disable           admin   — set disabled_at; rejects last-admin
POST   /api/users/{id}/enable            admin   — clear disabled_at
POST   /api/users/{id}/regenerate-setup  admin   — new token, returns setup_url
POST   /api/users/{id}/force-logout      admin   — kill all sessions for this user

POST   /api/account/password             any auth — self password change
GET    /setup                            public — landing page (HTML form)
POST   /setup                            public — submit new password

UI routes mirror the API but at /settings/users/....

Last-admin self-protection

Two operations that could lock everyone out are guarded:

  • Disable user: rejected if the user is admin AND there are no other enabled admins
  • Demote admin to operator/viewer: same check

Server-side enforcement (single SELECT on COUNT(*) FROM users WHERE role='admin' AND disabled_at IS NULL). UI hint: edit page disables the role dropdown's non-admin options + disable button when the user is the last admin, with a tooltip explaining why.

The bootstrap admin is just a regular admin row; this check covers it.

Audit actions

New action strings introduced:

  • user.created
  • user.updated (email / role change)
  • user.disabled
  • user.enabled
  • user.password_changed
  • user.setup_completed
  • user.setup_token.regenerated
  • user.setup_token.expired (system-driven, on cleanup sweep)
  • user.force_logout

All target_kind = user, target_id = the affected user's id. Existing payload conventions apply.

Ordering / dependencies

Slices in approximate landing order (writing-plans will firm this up):

  1. A. Schema — migrations 0017 + 0018, Role helper updates, store API extensions (email, disabled_at, must_change_password, setup_token CRUD, lowercase username constraints)
  2. B. RBAC middlewarerequireRole + roleAtLeast, route re-grouping in server.go, 403 rendering for HTML + JSON
  3. C. Session re-validation — extend the existing session middleware to re-read user state per request, kick disabled users
  4. D. Setup-token flow/setup GET+POST, the one-time link page after add-user
  5. E. User CRUD API — handlers + handlers' tests
  6. F. UI/settings/users list, add, edit, setup-link page, account page
  7. G. Sweep — Playwright walk through the full lifecycle (add → setup link → user signs in → admin disables → user gets kicked → admin re-enables → user signs back in)

Each slice can land as its own commit on the branch. RBAC middleware (B) goes in before user CRUD so we don't ship an open /api/users/* even briefly.

Test strategy

  • Store: Set/GetSetupToken, EnableUser/DisableUser, last-admin guard, lowercase-username uniqueness, expired-token cleanup
  • HTTP middleware: roleAtLeast truth table; viewer hitting an operator route returns 403; disabled user gets 401 mid-session
  • Setup flow integration: create user → fetch setup URL → land on /setup?token=... → POST password → user can log in → token row gone
  • UI: existing Playwright sweep pattern, screenshots into _diag/p4-03-04-sweep/

Out of scope (deferred)

  • OIDC (P4-05) — adds a parallel auth chain. This PR keeps the surface for it (role taxonomy, session middleware) but doesn't wire it.
  • Email-the-setup-link — explicitly deferred. Easy follow-up because the SMTP channel client from P3-06 is already there.
  • Hard delete — disable-only in v1; can add a typed-confirm "purge" later if it turns out to be needed.
  • Password complexity / rotation policy — current minimum (12 chars) and no rotation; tighten later if/when policy demands.
  • Lockout on failed login — a brute-force protection layer is its own task and orthogonal to RBAC.
  • Audit on 403 — not in v1; revisit if compliance asks for it.

Risks / gotchas to watch

  • Existing tests that assume "any logged-in user can hit any endpoint" will break. Audit the test fixtures: most use loginAsAdmin, which is fine; any tests currently exercising specific operator/viewer paths need explicit role assignment. (Quick grep suggests there aren't many — bootstrap-only.)
  • Bootstrap user normalisation — the existing admin row's username is whatever it was set to at first run. The new lowercase-uniqueness index uses LOWER(username), which makes the existing row implicitly lowercase-keyed for lookups. No data migration needed.
  • Session middleware re-read cost — one SELECT per authenticated request. SQLite WAL handles this fine at expected fleet sizes; if it ever shows up on a profile we add a small in-memory cache keyed by session id with a 30s TTL.
  • 403 vs 401 distinction — make sure unauthenticated requests still get 401 (login redirect) and authenticated-but-insufficient get 403. The middleware should compose: auth-required first, role-required second.

Acceptance

  • An admin can add a user, copy the setup link, the new user can land on /setup?token=..., set a password, and reach /
  • An expired token (>1h) on /setup?token=... shows the "contact your administrator" page
  • Admin regenerates the link, old token is invalid, new token works
  • Operator user can trigger Run-now but cannot reach /settings/users (403) and the Users tab in Settings is hidden in their nav
  • Viewer user gets 403 on Run-now, 200 on dashboard / alerts / audit
  • Admin disables a user mid-session — the user's next request is 401 and they're redirected to login
  • Admin cannot disable themselves if they are the last enabled admin (server returns 409, UI button is greyed)
  • Self-service password change at /settings/account works for every role
  • All existing tests pass; new test suite covers role middleware, setup-token lifecycle, last-admin guard

Self-review notes

  • All sections concrete, no TBD / TODO
  • Schema migrations are column-level (CLAUDE.md compliance)
  • Audit action vocabulary listed in one place; no string typos to drift
  • Out-of-scope list explicit so reviewers can challenge what we aren't doing
  • Last-admin guard handled both server-side and UI-hinted
  • Token storage hashes the secret server-side; raw is shown to admin once and never again
  • Session re-validation cost noted with a fallback if it shows up on a profile