spec: P4-05 — OIDC login design

Brainstormed shape locked: JIT-provision local rows on first OIDC
sign-in (auth_source='oidc'), YAML-only config (no UI), 'roles'
claim with deny-on-no-match default, preferred_username with email
fallback, refuse on local-user collision, single provider, login
page shows SSO above password (break-glass), front-channel logout
only, role re-evaluation at login only.

Migration 0019: users.auth_source + users.oidc_subject (partial
unique index), sessions.id_token (for end_session id_token_hint),
oidc_state table for the OAuth round-trip state, swept on the
existing alert-engine tick.

Composes with the user-management work from P4-03/04: admin can
disable OIDC users like local; last-admin guard catches IdP role-
mapping mistakes; audit trail covers JIT-provision via
user.created with auth_source payload + new user.oidc_login /
user.oidc_login_blocked actions.

Out of scope (deferred): back-channel logout, multi-provider,
UI-driven role mapping, refresh tokens / mid-session re-eval.
2026-05-05 12:04:09 +01:00
parent 4b48925edf
commit 814e49cb93
@@ -0,0 +1,212 @@
# P4-05 — OIDC Login Design
> **Date:** 2026-05-05
> **Status:** brainstorm complete; ready for plan
> **Closes:** P4-05 (OIDC login)
## Goal
Wire OpenID Connect authentication as a sign-in path alongside the existing local-user system, so a deployment that already has an IdP (Authelia, Authentik, Keycloak, Okta, Auth0, etc.) can use it for restic-manager logins.
## Architecture
OIDC sits on top of the local-user system rather than replacing it. The first time a user signs in via OIDC the server **just-in-time provisions** a local user row marked `auth_source='oidc'`, with role derived from the IdP's `roles` claim. Subsequent sign-ins look up the same row by stable `oidc_subject` and refresh role + email from the latest claims. Once the row exists it behaves like any other local user — admin can disable it, force-logout, see it in audit logs, etc. — except password-login is rejected because there's no password.
The Authorization Code flow (with PKCE) is implemented against the discovered well-known config of a single configured issuer. Logout is front-channel: clicking Sign out drops the local session and redirects the browser to the IdP's `end_session_endpoint` (when advertised). Back-channel logout is deferred.
## Locked decisions
| Decision | Pick |
|---|---|
| User lifecycle | **B** — JIT-provision local rows on first OIDC login (`auth_source='oidc'`, `oidc_subject`) |
| Role mapping config | **A** — YAML/env, claim name `roles`, default = deny on no-match |
| Username source | `preferred_username`, fallback to `email` |
| Username collision with existing local user | **Refuse** with clear remediation message |
| Provider config | **Single provider**: a `providers:` array can come later |
| Login page layout | SSO button **above** password form; password form labelled "or sign in with a local account" |
| OIDC users + password login | **Disabled**: `auth_source='oidc'` rows have empty `password_hash`; password form rejects them |
| Logout shape | **Front-channel only** — drop session + redirect to `end_session_endpoint` when advertised |
| Role re-evaluation | **At login only** — claims read at the OIDC callback; admin can disable mid-session locally |
## Schema changes
Migration 0019 — `users` extensions for OIDC bookkeeping:
```sql
ALTER TABLE users ADD COLUMN auth_source TEXT NOT NULL DEFAULT 'local'
    CHECK (auth_source IN ('local', 'oidc'));
ALTER TABLE users ADD COLUMN oidc_subject TEXT;
CREATE UNIQUE INDEX users_oidc_subject ON users(oidc_subject)
    WHERE oidc_subject IS NOT NULL;
```
Both changes are column-level ALTERs (the CLAUDE.md preference). The unique partial index defends the JIT-lookup invariant (one row per IdP subject) without blocking multiple rows with NULL `oidc_subject` (the local users).
## Configuration
```yaml
# server config — extend existing config struct
oidc:
  issuer: https://auth.example.com          # well-known config discovered from this
  client_id: restic-manager
  client_secret: ${RM_OIDC_CLIENT_SECRET}   # or via _FILE
  scopes: [openid, profile, email, roles]   # 'roles' usually means a custom scope
  role_claim: roles                         # default if absent
  role_mapping:
    rm-admins: admin
    rm-operators: operator
    rm-viewers: viewer
  # Optional; auto-derived from BaseURL if absent.
  redirect_url: https://rm.example.com/auth/oidc/callback
```
Env-var overrides: `RM_OIDC_ISSUER`, `RM_OIDC_CLIENT_ID`, `RM_OIDC_CLIENT_SECRET`, `RM_OIDC_CLIENT_SECRET_FILE`. Mapping is YAML-only (env doesn't fit a multi-key string→string map cleanly).
When `oidc.issuer` is empty or missing, OIDC is disabled (current behaviour). No restart-toggle UI; this is a deploy-time setting.
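The override rules above could be layered on top of the parsed YAML roughly as follows. This is a minimal sketch, not the project's actual config code: `OIDCConfig`, `applyEnv`, and `Enabled` are hypothetical names, and letting `RM_OIDC_CLIENT_SECRET_FILE` win over `RM_OIDC_CLIENT_SECRET` when both are set is an assumption the spec does not state.

```go
package main

import "strings"

// OIDCConfig mirrors the YAML block above (hypothetical struct; field names are illustrative).
type OIDCConfig struct {
	Issuer       string
	ClientID     string
	ClientSecret string
	RoleClaim    string
	RoleMapping  map[string]string // YAML-only, no env override
}

// applyEnv overlays the RM_OIDC_* variables on values already loaded from YAML.
// getenv and readFile are injected so tests need not touch the real environment.
func applyEnv(c OIDCConfig, getenv func(string) string, readFile func(string) ([]byte, error)) (OIDCConfig, error) {
	if v := getenv("RM_OIDC_ISSUER"); v != "" {
		c.Issuer = v
	}
	if v := getenv("RM_OIDC_CLIENT_ID"); v != "" {
		c.ClientID = v
	}
	if v := getenv("RM_OIDC_CLIENT_SECRET"); v != "" {
		c.ClientSecret = v
	}
	// Assumption: the _FILE variant takes precedence when both are set.
	if p := getenv("RM_OIDC_CLIENT_SECRET_FILE"); p != "" {
		b, err := readFile(p)
		if err != nil {
			return c, err
		}
		c.ClientSecret = strings.TrimSpace(string(b))
	}
	if c.RoleClaim == "" {
		c.RoleClaim = "roles" // documented default
	}
	return c, nil
}

// Enabled reports whether the OIDC routes should be mounted at all.
func (c OIDCConfig) Enabled() bool { return strings.TrimSpace(c.Issuer) != "" }
```

In the server, `os.Getenv` and `os.ReadFile` would be passed for the two function parameters.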
## Auth flow
### Login start
`GET /auth/oidc/login` — only mounted when OIDC is configured.
1. Generate `state` (32 random bytes) and `code_verifier` (64 random bytes), both encoded as unpadded base64url; compute `code_challenge = base64url(sha256(code_verifier))`, also unpadded, as PKCE's S256 method requires.
2. Store `(sha256(state), code_verifier, created_at)` in the `oidc_state` table (see "State table" below for the trade-off against an in-memory map with a 5-minute TTL).
3. Redirect to `<authorization_endpoint>?response_type=code&client_id=...&redirect_uri=...&scope=...&state=...&code_challenge=...&code_challenge_method=S256`.
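The three login-start steps can be sketched with the standard library alone. The function names (`randomToken`, `challengeS256`, `stateHash`, `authURL`) are hypothetical, and the sketch assumes the discovered `authorization_endpoint` carries no query string of its own.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"net/url"
	"strings"
)

// randomToken returns n random bytes as unpadded base64url
// (32 bytes -> 43 chars for state, 64 bytes -> 86 chars for the verifier).
func randomToken(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// challengeS256 derives the PKCE code_challenge for method S256 (RFC 7636).
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// stateHash is the hex digest persisted instead of the raw state value.
func stateHash(state string) string {
	sum := sha256.Sum256([]byte(state))
	return hex.EncodeToString(sum[:])
}

// authURL builds the redirect target for step 3.
func authURL(authEndpoint, clientID, redirectURI string, scopes []string, state, verifier string) string {
	q := url.Values{}
	q.Set("response_type", "code")
	q.Set("client_id", clientID)
	q.Set("redirect_uri", redirectURI)
	q.Set("scope", strings.Join(scopes, " "))
	q.Set("state", state)
	q.Set("code_challenge", challengeS256(verifier))
	q.Set("code_challenge_method", "S256")
	return authEndpoint + "?" + q.Encode()
}
```

`url.Values.Encode` sorts the parameters by key, so the order differs from the one listed in step 3; OAuth does not depend on parameter order.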
### Callback
`GET /auth/oidc/callback?code=...&state=...` — also OIDC-only mount.
1. Validate `state` against the stored value (one-shot — delete row on read). Reject if missing/expired/already used.
2. Exchange `code` + `code_verifier` for tokens at `token_endpoint`.
3. Validate the `id_token` JWT: signature against the JWKS endpoint, `iss`, `aud`, `exp`, `iat`, `nonce` (if used).
4. Extract `sub`, `preferred_username`, `email`, and the configured `role_claim` (default `roles`).
5. Pick username: `preferred_username` if non-empty, else `email`. Lowercase / trim per the existing local-user rules.
6. Pick role: first match in `role_mapping` against the array of role-claim values. **No match → deny with a clear error page**, no row created.
7. Look up user by `oidc_subject`. Three cases:
   - **Found**: refresh `email`, `role`, `last_login_at`. Don't touch `username` (changing it would break audit trails; if the IdP changes the username, that's an operator concern). Log `user.oidc_login`.
   - **Not found, username free**: INSERT row with `auth_source='oidc'`, `oidc_subject=<sub>`, `password_hash=''`, `must_change_password=0`. Log `user.created` with payload `{"auth_source":"oidc"}` + `user.oidc_login`.
   - **Not found, username taken by a local user**: render an error page reading "This OIDC user (`<sub>`) wants to sign in as `alice`, but a local user with that name already exists. Ask your administrator to either rename / remove the local user, or exclude this user from the OIDC mapping." Respond 403, no row created. Log `user.oidc_login_blocked`.
8. Set a session cookie (storing the `id_token` on the session row for later logout) and call `MarkUserLogin` (the existing helper).
9. Redirect to `/`.
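Steps 5 to 7 reduce to three small pure functions, which keeps the branching testable without a database. The sketch below makes two assumptions: "first match" means the first claim value, in claim order, that has a mapping entry, and any occupied username counts as a collision. All names are hypothetical.

```go
package main

import "strings"

// Action names the four claim-resolution branches.
type Action string

const (
	ActionDenyNoRole Action = "deny_no_role_match"
	ActionRefresh    Action = "refresh_existing"
	ActionProvision  Action = "provision_new"
	ActionBlockTaken Action = "block_username_taken"
)

// pickUsername implements step 5: preferred_username, else email, trimmed and lowercased.
func pickUsername(preferred, email string) string {
	u := strings.TrimSpace(preferred)
	if u == "" {
		u = strings.TrimSpace(email)
	}
	return strings.ToLower(u)
}

// pickRole implements step 6. Assumption: claim order decides when several values match.
func pickRole(claimRoles []string, mapping map[string]string) (string, bool) {
	for _, r := range claimRoles {
		if role, ok := mapping[r]; ok {
			return role, true
		}
	}
	return "", false
}

// decide implements the no-match denial plus the three lookup cases of step 7.
func decide(roleMatched, subjectFound, usernameTaken bool) Action {
	switch {
	case !roleMatched:
		return ActionDenyNoRole
	case subjectFound:
		return ActionRefresh
	case usernameTaken:
		return ActionBlockTaken
	default:
		return ActionProvision
	}
}
```

The handler would call `decide` after the `oidc_subject` lookup and the username check, then perform the matching write and audit entry.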
### Logout
`POST /logout` (existing handler) — augmented:
1. Look up the session before deletion (we need the user row to know if they're an OIDC user).
2. Delete the session as today.
3. If the user is `auth_source='oidc'` AND the discovered `end_session_endpoint` is non-empty → 303 to `<end_session_endpoint>?id_token_hint=<id_token>&post_logout_redirect_uri=<base>/login`. Otherwise → existing 303 to `/login`.
We need to keep the latest `id_token` per session to drive `id_token_hint`. Stash it in a new `sessions.id_token TEXT` column (one more column-level ALTER in migration 0019, alongside the user columns), populated only for OIDC sessions.
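The redirect decision in step 3 could look like the following sketch. `logoutRedirect` is a hypothetical helper; falling back to `/login` when the advertised endpoint fails to parse is an assumption chosen to match the "degrade gracefully" intent.

```go
package main

import (
	"net/url"
	"strings"
)

// logoutRedirect returns the 303 target after the local session has been deleted.
func logoutRedirect(authSource, endSessionEndpoint, idToken, baseURL string) string {
	if authSource != "oidc" || endSessionEndpoint == "" {
		return "/login"
	}
	u, err := url.Parse(endSessionEndpoint)
	if err != nil {
		return "/login" // assumption: never fail the logout over a bad discovery value
	}
	q := u.Query() // keeps any query parameters the IdP already advertises
	if idToken != "" {
		q.Set("id_token_hint", idToken)
	}
	q.Set("post_logout_redirect_uri", strings.TrimRight(baseURL, "/")+"/login")
	u.RawQuery = q.Encode()
	return u.String()
}
```

Building the query with `url.Values` also takes care of percent-encoding the token and the redirect URI.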
## State table
Two reasonable shapes for the short-lived state used during the OAuth round-trip:
- **In-memory map** with a 5-minute TTL sweeper. Simpler, but multi-process deployments lose it (there is no multi-process deployment today, but Phase 5 OSS readiness might add one).
- **`oidc_state` table** — `(state_hash PK, code_verifier, created_at)`, swept on the same 60s alert-engine tick that already handles setup-token cleanup.
I'll go with the **table**. Costs ~3 lines in the existing cleanup tick, behaves correctly under restarts, and survives a future scale-out. Migration 0019 includes:
```sql
CREATE TABLE oidc_state (
    state_hash    TEXT PRIMARY KEY,  -- sha256(state) hex; raw state never persisted
    code_verifier TEXT NOT NULL,
    created_at    TEXT NOT NULL
);
CREATE INDEX oidc_state_created ON oidc_state(created_at);
```
## Login-page UI
`/login` template branches based on `view.OIDCEnabled`:
- **OIDC off** → current layout (just the password form).
- **OIDC on** → a `Sign in with <provider name>` button at the top, then a faint divider line, then the existing password form labelled "Or sign in with a local account". Provider name comes from a new optional config `oidc.display_name` (defaults to "SSO").
Failed-OIDC redirects (no role match, username collision, IdP error) land on `/login?oidc_error=<reason>` with a small banner above the buttons.
## Audit actions
New entries in the action vocabulary:
- `user.oidc_login` (target_kind=user, target_id=user_id, payload `{"sub":"…"}`)
- `user.oidc_login_blocked` (target_kind=user, target_id=oidc_subject when no row was created, payload `{"username":"…", "reason":"username_taken|no_role_match|other"}`)
- `user.created` already exists; OIDC's first-time provisioning fires this with payload `{"auth_source":"oidc"}` so the audit log distinguishes admin-created from JIT-provisioned rows.
## User-management UI changes
Small additions, not new screens:
- **Users list** — Status column adds a small `oidc` chip when `auth_source='oidc'` so admin can see at a glance which rows came from JIT-provisioning. Sortable by auth_source via the same sortable-headers pattern (lands as a small follow-up if anyone asks; out of scope for v1).
- **Add user form**: unchanged in v1, since both auth paths stay open. If an `oidc.disable_local_users` flag becomes a real ask later, the form would be disabled when OIDC is the only auth path, with a hint: "User provisioning is handled by your OIDC provider; users appear here on first sign-in."
- **Edit user form** — when `auth_source='oidc'`:
  - Username field disabled (the username is fixed at provisioning; OIDC logins never update it, and renaming it would break audit trails)
  - Role dropdown disabled, with a hint: "Role is managed by your OIDC provider's `roles` claim mapping. Edit the mapping in server config to change."
  - Email field disabled (refreshed from IdP on each login)
- **Disable / Enable / Force logout** still work — disabling an OIDC user kicks their session and rejects future OIDC logins ("user disabled by administrator")
- **Regenerate setup link** hidden — there's no setup token for OIDC users
- **Login UI** — password form rejects users with `auth_source='oidc'` ("This account uses single sign-on. Click the SSO button above.")
## Middleware / handler changes
- **Routes**: new public-band entries `GET /auth/oidc/login`, `GET /auth/oidc/callback`. Skipped entirely when OIDC isn't configured (`s.deps.OIDC == nil`).
- **Logout handler** augmented to fetch the user row + decide between local logout (303 → `/login`) and OIDC logout (303 → `end_session_endpoint`).
- **Login handler** rejects `auth_source='oidc'` users with the SSO-prompt error.
- **Last-admin guard** — already covers OIDC users naturally because they live in the `users` table. The role-from-claims path could create a "every admin gets demoted to operator" situation if the IdP's claim mapping is wrong; the guard rejects that demotion at the moment it'd be applied (returns the user to the login page with `oidc_error=role_change_blocked` and audit entry; admin must fix the mapping or promote a local admin first).
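The last-admin check that the callback would consult before applying a claim-derived role can be sketched as a pure function. The `User` struct and `wouldRemoveLastAdmin` are hypothetical stand-ins for whatever the existing guard from P4-03/04 exposes.

```go
package main

// User carries only the fields the guard needs (illustrative, not the real model).
type User struct {
	ID      int64
	Role    string
	Enabled bool
}

// wouldRemoveLastAdmin reports whether changing targetID's role to newRole
// would leave the deployment without any enabled admin.
func wouldRemoveLastAdmin(users []User, targetID int64, newRole string) bool {
	if newRole == "admin" {
		return false // not a demotion
	}
	targetIsEnabledAdmin := false
	otherAdmins := 0
	for _, u := range users {
		if !u.Enabled || u.Role != "admin" {
			continue
		}
		if u.ID == targetID {
			targetIsEnabledAdmin = true
		} else {
			otherAdmins++
		}
	}
	return targetIsEnabledAdmin && otherAdmins == 0
}
```

When this returns true in the "Found" branch, the callback would skip the role refresh, write the audit entry, and redirect with `oidc_error=role_change_blocked`.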
## Implementation outline
1. **Schema** — migration 0019 (users.auth_source + oidc_subject, sessions.id_token, oidc_state table)
2. **Config** — extend `internal/server/config` with the OIDC block + env-var overrides; load JWKS lazily
3. **Discovery + JWKS** — small helper that fetches `<issuer>/.well-known/openid-configuration` once at startup, caches `authorization_endpoint`, `token_endpoint`, `end_session_endpoint`, `jwks_uri`. JWKS refreshed on first failed verification.
4. **Login start handler**: `/auth/oidc/login`
5. **Callback handler**: `/auth/oidc/callback`, with the four claim-resolution branches
6. **Logout handler augmentation** — branch on `auth_source`
7. **Login form rejection** — local-user password form rejects OIDC accounts
8. **State cleanup** — extend the alert engine's existing cleanup tick
9. **UI**: `oidc` chip on users list, disabled fields on edit-form for OIDC users, login page SSO button + error banner
10. **Tests** — config parse tests; happy-path callback test using a fake IdP (httptest server with a hand-rolled discovery doc + JWKS); username-collision test; no-role-match test; logout test
11. **Sweep** — full Playwright walk against an actual IdP (Authelia in a Docker container) — admin gets in via OIDC, role mapping works, logout redirects through IdP, OIDC user can't password-login
## Test strategy
The IdP is the hard part to test cleanly. Two layers:
- **Unit / integration tests** use a stub OIDC provider built into the test harness — `httptest.Server` exposing `.well-known/openid-configuration`, a token endpoint that signs minted JWTs with a test ECDSA key, and a JWKS endpoint serving the public key. This covers every code path without a real IdP. Pattern: each test mints its own claims and runs the callback against the stub.
- **Smoke env** runs against a real Authelia container (existing `compose.smoke.yaml`-style file or one-liner `docker run`) for the final sweep — confirms the discovery doc isn't being misread, real JWT verification works, real `end_session_endpoint` redirect works.
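The token-signing half of the stub provider needs nothing beyond the standard library. The sketch below mints an ES256-signed JWT and checks its signature; `mintES256`, `verifyES256`, and `newTestKey` are hypothetical helper names, and the verifier checks the signature only (no `iss`, `aud`, or expiry checks), so it is suited to tests rather than production validation.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"math/big"
	"strings"
)

var b64 = base64.RawURLEncoding

// newTestKey generates a throwaway P-256 key for the stub IdP.
func newTestKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// mintES256 signs claims as a compact JWS, as the stub token endpoint would.
func mintES256(key *ecdsa.PrivateKey, kid string, claims map[string]any) (string, error) {
	header, err := json.Marshal(map[string]string{"alg": "ES256", "typ": "JWT", "kid": kid})
	if err != nil {
		return "", err
	}
	payload, err := json.Marshal(claims)
	if err != nil {
		return "", err
	}
	signingInput := b64.EncodeToString(header) + "." + b64.EncodeToString(payload)
	digest := sha256.Sum256([]byte(signingInput))
	r, s, err := ecdsa.Sign(rand.Reader, key, digest[:])
	if err != nil {
		return "", err
	}
	// JWS ES256 signatures are r || s, each left-padded to 32 bytes.
	sig := make([]byte, 64)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return signingInput + "." + b64.EncodeToString(sig), nil
}

// verifyES256 checks the signature and returns the decoded claims.
func verifyES256(pub *ecdsa.PublicKey, token string) (map[string]any, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("malformed token")
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil || len(sig) != 64 {
		return nil, errors.New("bad signature encoding")
	}
	digest := sha256.Sum256([]byte(parts[0] + "." + parts[1]))
	r := new(big.Int).SetBytes(sig[:32])
	s := new(big.Int).SetBytes(sig[32:])
	if !ecdsa.Verify(pub, digest[:], r, s) {
		return nil, errors.New("signature mismatch")
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil {
		return nil, err
	}
	var claims map[string]any
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, err
	}
	return claims, nil
}
```

Each test can then mint its own claims (for example `{"sub": "...", "roles": ["rm-admins"]}`) and have the stub's token endpoint return the result as `id_token`.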
## Out of scope (deferred)
- **Multi-provider** support (`providers:` array)
- **Back-channel logout** (OpenID Connect Back-Channel Logout 1.0, an OpenID Foundation spec rather than an RFC); the schema doesn't block adding it later
- **UI-driven role mapping** (config-only in v1)
- **Refresh tokens / mid-session role re-evaluation** — login-only refresh in v1
- **`oidc.disable_local_users`** flag — both paths stay open in v1
- **OIDC user dashboard chip / badges** beyond the small `oidc` indicator on the users list
- **Per-user "auth source" filter on the users list** — sortable headers cover most of the use case
## Risks / gotchas
- **JWKS key rotation** — refresh on first failed verification is the standard fix; document the cache TTL (1h) in the config block.
- **Clock skew** — accept `iat`/`exp` with a 60s leeway; matches what most OIDC libraries do.
- **End-session 404 / not advertised** — degrade gracefully; just drop the session and 303 to `/login`. Don't 500 the logout because the IdP doesn't implement RP-initiated logout.
- **Username changes at the IdP** — silently keep the local username (matches our locked decision: subject is the stable key, username is display-only). Document.
- **Role claim is sometimes a string, sometimes an array, sometimes a comma-separated string** depending on the IdP; normalise into `[]string` before mapping. Authelia/Keycloak emit arrays; some custom setups emit strings; handle every shape.
- **Password-form bypass for OIDC users via /api/auth/login (JSON)** — same rejection rule applies, not just the HTML form.
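Two of the gotchas above, role-claim normalisation and clock-skew leeway, are small enough to sketch directly. `normalizeRoles` and `timesValid` are hypothetical names; the input type `any` assumes the claim arrives from `encoding/json` decoding into `map[string]any`, where arrays show up as `[]any`.

```go
package main

import "strings"

// normalizeRoles flattens the role claim into []string regardless of shape:
// a single string, a comma-separated string, or a JSON array.
func normalizeRoles(v any) []string {
	var raw []string
	switch t := v.(type) {
	case string:
		raw = strings.Split(t, ",")
	case []string:
		raw = t
	case []any:
		for _, e := range t {
			if s, ok := e.(string); ok { // non-string entries are ignored
				raw = append(raw, s)
			}
		}
	}
	out := make([]string, 0, len(raw))
	for _, s := range raw {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// timesValid applies the leeway to exp and iat; all values are Unix seconds.
func timesValid(iat, exp, now, leeway int64) bool {
	return now <= exp+leeway && iat <= now+leeway
}
```

A missing or unexpected claim type yields an empty slice, which falls through to the deny-on-no-match default.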
## Acceptance
- [ ] An OIDC user with `roles: ["rm-admins"]` can sign in, becomes an admin, is visible in `/settings/users` with an `oidc` chip
- [ ] Same user signing in again resolves to the same row (no duplicate)
- [ ] Same user with `roles: ["something-else"]` is denied, lands on `/login?oidc_error=no_role_match` with a banner, no row created
- [ ] OIDC user can't password-login through `/login` or `/api/auth/login`
- [ ] Admin disables an OIDC user → next OIDC login is rejected, existing session bounced (existing disable-mid-session)
- [ ] Sign out as an OIDC user → 303 to IdP's end-session URL (when advertised); no end-session URL → 303 to `/login`
- [ ] OIDC config absent → password login works exactly as today (zero behavioural change)
- [ ] Username collision: a local `alice` exists, OIDC user with `preferred_username=alice` and a different `sub` → blocked at sign-in with the clear error page
- [ ] Last-admin guard refuses to demote the only enabled admin even if the IdP's role mapping says otherwise
- [ ] All existing tests pass; new test suite covers the four claim-resolution branches and logout