P4-05: OIDC login (generic, JIT-provisioned) #16
@@ -0,0 +1,212 @@
|
||||
# P4-05 — OIDC Login Design
|
||||
|
||||
> **Date:** 2026-05-05
|
||||
> **Status:** brainstorm complete; ready for plan
|
||||
> **Closes:** P4-05 (OIDC login)
|
||||
|
||||
## Goal
|
||||
|
||||
Wire OpenID Connect authentication as a sign-in path alongside the existing local-user system, so a deployment that already has an IdP (Authelia, Authentik, Keycloak, Okta, Auth0, etc.) can use it for restic-manager logins.
|
||||
|
||||
## Architecture
|
||||
|
||||
OIDC sits on top of the local-user system rather than replacing it. The first time a user signs in via OIDC the server **just-in-time provisions** a local user row marked `auth_source='oidc'`, with role derived from the IdP's `roles` claim. Subsequent sign-ins look up the same row by stable `oidc_subject` and refresh role + email from the latest claims. Once the row exists it behaves like any other local user — admin can disable it, force-logout, see it in audit logs, etc. — except password-login is rejected because there's no password.
|
||||
|
||||
The Authorization Code flow (with PKCE) is implemented against the discovered well-known config of a single configured issuer. Logout is front-channel (RP-initiated): clicking Sign out drops the local session and redirects the browser to the IdP's `end_session_endpoint` (when advertised). Back-channel logout is deferred.
|
||||
|
||||
## Locked decisions
|
||||
|
||||
| Decision | Pick |
|
||||
|---|---|
|
||||
| User lifecycle | **B** — JIT-provision local rows on first OIDC login (`auth_source='oidc'`, `oidc_subject`) |
|
||||
| Role mapping config | **A** — YAML/env, claim name `roles`, default = deny on no-match |
|
||||
| Username source | `preferred_username`, fallback to `email` |
|
||||
| Username collision with existing local user | **Refuse** with clear remediation message |
|
||||
| Provider config | **Single provider** — `providers:` array can come later |
|
||||
| Login page layout | SSO button **above** password form; password form labelled "or sign in with a local account" |
|
||||
| OIDC users + password login | **Disabled** — `auth_source='oidc'` rows have empty `password_hash`; password form rejects them |
|
||||
| Logout shape | **Front-channel only** — drop session + redirect to `end_session_endpoint` when advertised |
|
||||
| Role re-evaluation | **At login only** — claims read at the OIDC callback; admin can disable mid-session locally |
|
||||
|
||||
## Schema changes
|
||||
|
||||
Migration 0019 — `users` extensions for OIDC bookkeeping:
|
||||
|
||||
```sql
|
||||
ALTER TABLE users ADD COLUMN auth_source TEXT NOT NULL DEFAULT 'local'
|
||||
CHECK (auth_source IN ('local', 'oidc'));
|
||||
ALTER TABLE users ADD COLUMN oidc_subject TEXT;
|
||||
|
||||
CREATE UNIQUE INDEX users_oidc_subject ON users(oidc_subject)
|
||||
WHERE oidc_subject IS NOT NULL;
|
||||
```
|
||||
|
||||
Both column-level ALTERs (CLAUDE.md preference). The unique partial index defends the JIT-lookup invariant (one row per IdP subject) without blocking multiple rows with NULL oidc_subject (the local users).
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
# server config — extend existing config struct
|
||||
oidc:
|
||||
issuer: https://auth.example.com # well-known config discovered from this
|
||||
client_id: restic-manager
|
||||
client_secret: ${RM_OIDC_CLIENT_SECRET} # or via _FILE
|
||||
scopes: [openid, profile, email, roles] # 'roles' usually means a custom scope
|
||||
role_claim: roles # default if absent
|
||||
role_mapping:
|
||||
rm-admins: admin
|
||||
rm-operators: operator
|
||||
rm-viewers: viewer
|
||||
# Optional — auto-derived from BaseURL if absent.
|
||||
redirect_url: https://rm.example.com/auth/oidc/callback
|
||||
```
|
||||
|
||||
Env-var overrides: `RM_OIDC_ISSUER`, `RM_OIDC_CLIENT_ID`, `RM_OIDC_CLIENT_SECRET`, `RM_OIDC_CLIENT_SECRET_FILE`. Mapping is YAML-only (env doesn't fit a multi-key string→string map cleanly).
|
||||
|
||||
When `oidc.issuer` is empty or missing, OIDC is disabled (current behaviour). There is no UI toggle; this is a deploy-time setting.
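
A minimal sketch of how the config block and env-var overrides above might be modelled. The struct, method names, and the injected `getenv`/`readFile` functions are hypothetical, and the precedence of `RM_OIDC_CLIENT_SECRET_FILE` over `RM_OIDC_CLIENT_SECRET` is an assumption that this document does not specify.

```go
package main

import "strings"

// OIDCConfig mirrors the YAML block above. Field names are illustrative.
type OIDCConfig struct {
	Issuer       string
	ClientID     string
	ClientSecret string
	Scopes       []string
	RoleClaim    string
	RoleMapping  map[string]string // YAML-only, no env override
	RedirectURL  string
}

// Enabled reports whether the OIDC routes should be mounted at all.
func (c OIDCConfig) Enabled() bool {
	return strings.TrimSpace(c.Issuer) != ""
}

// ApplyEnv overlays the documented env vars on top of the YAML values.
// getenv and readFile are injected so the logic is testable; in a real
// server they would be os.Getenv and os.ReadFile.
func (c *OIDCConfig) ApplyEnv(getenv func(string) string, readFile func(string) ([]byte, error)) error {
	if v := getenv("RM_OIDC_ISSUER"); v != "" {
		c.Issuer = v
	}
	if v := getenv("RM_OIDC_CLIENT_ID"); v != "" {
		c.ClientID = v
	}
	if v := getenv("RM_OIDC_CLIENT_SECRET"); v != "" {
		c.ClientSecret = v
	}
	// Assumption: the _FILE variant wins when both are set.
	if path := getenv("RM_OIDC_CLIENT_SECRET_FILE"); path != "" {
		b, err := readFile(path)
		if err != nil {
			return err
		}
		c.ClientSecret = strings.TrimSpace(string(b))
	}
	if c.RoleClaim == "" {
		c.RoleClaim = "roles" // documented default
	}
	return nil
}

// EffectiveRedirectURL derives the callback URL from the base URL when
// redirect_url is not set explicitly.
func (c OIDCConfig) EffectiveRedirectURL(baseURL string) string {
	if c.RedirectURL != "" {
		return c.RedirectURL
	}
	return strings.TrimRight(baseURL, "/") + "/auth/oidc/callback"
}
```

Injecting the environment lookup keeps the config parse tests free of process-level state.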
|
||||
|
||||
## Auth flow
|
||||
|
||||
### Login start
|
||||
|
||||
`GET /auth/oidc/login` — only mounted when OIDC is configured.
|
||||
|
||||
1. Generate `state` (32 random bytes) and `code_verifier` (64 random bytes), both encoded as unpadded base64url; compute `code_challenge = base64url(sha256(code_verifier))`, also unpadded, as RFC 7636 requires for `S256`.
|
||||
2. Store `(state, code_verifier, created_at)` in a new ephemeral table (or in memory with a 5-minute TTL; see "State table" below).
|
||||
3. Redirect to `<authorization_endpoint>?response_type=code&client_id=...&redirect_uri=...&scope=...&state=...&code_challenge=...&code_challenge_method=S256`.
|
||||
|
||||
### Callback
|
||||
|
||||
`GET /auth/oidc/callback?code=...&state=...` — also OIDC-only mount.
|
||||
|
||||
1. Validate `state` against the stored value (one-shot — delete row on read). Reject if missing/expired/already used.
|
||||
2. Exchange `code` + `code_verifier` for tokens at `token_endpoint`.
|
||||
3. Validate the `id_token` JWT: signature against the JWKS endpoint, `iss`, `aud`, `exp`, `iat`, `nonce` (if used).
|
||||
4. Extract `sub`, `preferred_username`, `email`, and the configured `role_claim` (default `roles`).
|
||||
5. Pick username: `preferred_username` if non-empty, else `email`. Lowercase / trim per the existing local-user rules.
|
||||
6. Pick role: first match in `role_mapping` against the array of role-claim values. **No match → deny with a clear error page**, no row created.
|
||||
7. Look up user by `oidc_subject`. Three cases:
|
||||
- **Found** — refresh `email`, `role`, `last_login_at`. Don't touch `username` (changing it would break audit trails; if the IdP changes the username, that's an operator concern). Log `user.oidc_login`.
|
||||
- **Not found, username free** — INSERT row with `auth_source='oidc'`, `oidc_subject=<sub>`, `password_hash=''`, `must_change_password=0`. Log `user.created` with payload `{"auth_source":"oidc"}` + `user.oidc_login`.
|
||||
- **Not found, username taken by a local user** — render an error page: "This OIDC user (`<sub>`) wants to sign in as `alice`, but a local user with that name already exists. Ask your administrator to either rename / remove the local user, or exclude this user from the OIDC mapping." 403, no row created. Log `user.oidc_login_blocked`.
|
||||
8. Set a session cookie and call `MarkUserLogin` (the existing helper).
|
||||
9. Redirect to `/`.
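
Steps 5 to 7 reduce to three small pure functions, sketched below without any database access. All names are hypothetical. Two assumptions are baked in: the "existing local-user rules" are taken to be trim plus lowercase, and "first match" is read as the first value in the claim array that has a mapping entry.

```go
package main

import "strings"

// outcome enumerates the callback branches.
type outcome int

const (
	denyNoRole     outcome = iota // step 6: no mapping matched
	loginExisting                 // step 7: oidc_subject found
	provisionNew                  // step 7: not found, username free
	blockCollision                // step 7: not found, username taken
)

// pickUsername implements step 5: preferred_username, else email.
func pickUsername(preferredUsername, email string) string {
	name := strings.TrimSpace(preferredUsername)
	if name == "" {
		name = strings.TrimSpace(email)
	}
	return strings.ToLower(name)
}

// pickRole implements step 6. It walks the claim values in order and
// returns the first one that appears in role_mapping.
func pickRole(claimRoles []string, mapping map[string]string) (string, bool) {
	for _, r := range claimRoles {
		if role, ok := mapping[r]; ok {
			return role, true
		}
	}
	return "", false
}

// resolve picks the branch once the lookups have been done. The role
// check comes first so that a denied login never creates a row.
func resolve(roleMatched, subjectKnown, usernameTaken bool) outcome {
	switch {
	case !roleMatched:
		return denyNoRole
	case subjectKnown:
		return loginExisting
	case usernameTaken:
		return blockCollision
	default:
		return provisionNew
	}
}
```

Keeping the branch selection separate from the SQL makes each branch easy to cover in table-driven tests.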
|
||||
|
||||
### Logout
|
||||
|
||||
`POST /logout` (existing handler) — augmented:
|
||||
|
||||
1. Look up the session before deletion (we need the user row to know if they're an OIDC user).
|
||||
2. Delete the session as today.
|
||||
3. If the user is `auth_source='oidc'` AND the discovered `end_session_endpoint` is non-empty → 303 to `<end_session_endpoint>?id_token_hint=<id_token>&post_logout_redirect_uri=<base>/login` (query values URL-encoded). Otherwise → existing 303 to `/login`.
|
||||
|
||||
We need to keep the latest `id_token` per session to drive `id_token_hint`. Stash it in a new `sessions.id_token TEXT` column (one column-level ALTER on migration 0019 alongside the user columns), populated only for OIDC sessions.
|
||||
|
||||
## State table
|
||||
|
||||
Two reasonable shapes for the short-lived state used during the OAuth round-trip:
|
||||
|
||||
- **In-memory map** with a 5-minute TTL sweeper. Simpler, but multi-process deployments lose it (no multi-process today, but Phase 5 OSS readiness might add it).
|
||||
- **`oidc_state` table** — `(state_hash PK, code_verifier, created_at)`, swept on the same 60s alert-engine tick that already handles setup-token cleanup.
|
||||
|
||||
I'll go with the **table**. Costs ~3 lines in the existing cleanup tick, behaves correctly under restarts, and survives a future scale-out. Migration 0019 includes:
|
||||
|
||||
```sql
|
||||
CREATE TABLE oidc_state (
|
||||
state_hash TEXT PRIMARY KEY, -- sha256(state) hex; raw state never persisted
|
||||
code_verifier TEXT NOT NULL,
|
||||
created_at TEXT NOT NULL
|
||||
);
|
||||
CREATE INDEX oidc_state_created ON oidc_state(created_at);
|
||||
```
|
||||
|
||||
## Login-page UI
|
||||
|
||||
`/login` template branches based on `view.OIDCEnabled`:
|
||||
|
||||
- **OIDC off** → current layout (just the password form).
|
||||
- **OIDC on** → a `Sign in with <provider name>` button at the top, then a faint divider line, then the existing password form labelled "or sign in with a local account". Provider name comes from a new optional config `oidc.display_name` (defaults to "SSO").
|
||||
|
||||
Failed-OIDC redirects (no role match, username collision, IdP error) land on `/login?oidc_error=<reason>` with a small banner above the buttons.
|
||||
|
||||
## Audit actions
|
||||
|
||||
New entries in the action vocabulary:
|
||||
|
||||
- `user.oidc_login` (target_kind=user, target_id=user_id, payload `{"sub":"…"}`)
|
||||
- `user.oidc_login_blocked` (target_kind=user, target_id=oidc_subject when no row was created, payload `{"username":"…", "reason":"username_taken|no_role_match|other"}`)
|
||||
- `user.created` already exists; OIDC's first-time provisioning fires this with payload `{"auth_source":"oidc"}` so the audit log distinguishes admin-created from JIT-provisioned rows.
|
||||
|
||||
## User-management UI changes
|
||||
|
||||
Small additions, not new screens:
|
||||
|
||||
- **Users list** — Status column adds a small `oidc` chip when `auth_source='oidc'` so admin can see at a glance which rows came from JIT-provisioning. Sortable by auth_source via the same sortable-headers pattern (lands as a small follow-up if anyone asks; out of scope for v1).
|
||||
- **Add user form** — unchanged in v1; both auth paths stay open. If it becomes a real ask, a later `oidc.disable_local_users` flag could disable the form when OIDC is the only auth path, with a hint: "User provisioning is handled by your OIDC provider; users appear here on first sign-in."
|
||||
- **Edit user form** — when `auth_source='oidc'`:
|
||||
- Username field disabled (the username is fixed at provisioning and never refreshed from the IdP, so renaming it would break audit trails)
|
||||
- Role dropdown disabled, with a hint: "Role is managed by your OIDC provider's `roles` claim mapping. Edit the mapping in server config to change."
|
||||
- Email field disabled (refreshed from IdP on each login)
|
||||
- **Disable / Enable / Force logout** still work — disabling an OIDC user kicks their session and rejects future OIDC logins ("user disabled by administrator")
|
||||
- **Regenerate setup link** hidden — there's no setup token for OIDC users
|
||||
- **Login UI** — password form rejects users with `auth_source='oidc'` ("This account uses single sign-on. Click the SSO button above.")
|
||||
|
||||
## Middleware / handler changes
|
||||
|
||||
- **Routes**: new public-band entries `GET /auth/oidc/login`, `GET /auth/oidc/callback`. Skipped entirely when OIDC isn't configured (`s.deps.OIDC == nil`).
|
||||
- **Logout handler** augmented to fetch the user row + decide between local logout (303 → `/login`) and OIDC logout (303 → `end_session_endpoint`).
|
||||
- **Login handler** rejects `auth_source='oidc'` users with the SSO-prompt error.
|
||||
- **Last-admin guard** — already covers OIDC users naturally because they live in the `users` table. The role-from-claims path could create an "every admin gets demoted to operator" situation if the IdP's claim mapping is wrong; the guard rejects that demotion at the moment it would be applied (it returns the user to the login page with `oidc_error=role_change_blocked` and writes an audit entry; an admin must fix the mapping or promote a local admin first).
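
The check that the last-admin guard performs on the OIDC role-refresh path can be sketched as a pure function. The `user` struct and function name are hypothetical; the existing guard may be shaped differently.

```go
package main

// user carries only the fields the guard needs.
type user struct {
	ID      int64
	Role    string
	Enabled bool
}

// wouldRemoveLastAdmin reports whether changing the role of user id to
// newRole would leave the system without any enabled admin.
func wouldRemoveLastAdmin(users []user, id int64, newRole string) bool {
	if newRole == "admin" {
		return false // not a demotion
	}
	targetIsEnabledAdmin := false
	otherEnabledAdmins := 0
	for _, u := range users {
		if u.Role != "admin" || !u.Enabled {
			continue
		}
		if u.ID == id {
			targetIsEnabledAdmin = true
		} else {
			otherEnabledAdmins++
		}
	}
	return targetIsEnabledAdmin && otherEnabledAdmins == 0
}
```

When this returns true in the callback, the role refresh is skipped and the login is bounced with `oidc_error=role_change_blocked`.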
|
||||
|
||||
## Implementation outline
|
||||
|
||||
1. **Schema** — migration 0019 (users.auth_source + oidc_subject, sessions.id_token, oidc_state table)
|
||||
2. **Config** — extend `internal/server/config` with the OIDC block + env-var overrides; load JWKS lazily
|
||||
3. **Discovery + JWKS** — small helper that fetches `<issuer>/.well-known/openid-configuration` once at startup, caches `authorization_endpoint`, `token_endpoint`, `end_session_endpoint`, `jwks_uri`. JWKS refreshed on first failed verification.
|
||||
4. **Login start handler** — `/auth/oidc/login`
|
||||
5. **Callback handler** — `/auth/oidc/callback`, with the four claim-resolution branches (no role match, existing subject, new subject with a free username, username collision)
|
||||
6. **Logout handler augmentation** — branch on `auth_source`
|
||||
7. **Login form rejection** — local-user password form rejects OIDC accounts
|
||||
8. **State cleanup** — extend the alert engine's existing cleanup tick
|
||||
9. **UI** — `oidc` chip on users list, disabled fields on edit-form for OIDC users, login page SSO button + error banner
|
||||
10. **Tests** — config parse tests; happy-path callback test using a fake IdP (httptest server with a hand-rolled discovery doc + JWKS); username-collision test; no-role-match test; logout test
|
||||
11. **Sweep** — full Playwright walk against an actual IdP (Authelia in a Docker container) — admin gets in via OIDC, role mapping works, logout redirects through IdP, OIDC user can't password-login
|
||||
|
||||
## Test strategy
|
||||
|
||||
The IdP is the hard part to test cleanly. Two layers:
|
||||
|
||||
- **Unit / integration tests** use a stub OIDC provider built into the test harness — `httptest.Server` exposing `.well-known/openid-configuration`, a token endpoint that signs minted JWTs with a test ECDSA key, and a JWKS endpoint serving the public key. This covers every code path without a real IdP. Pattern: each test mints its own claims and runs the callback against the stub.
|
||||
- **Smoke env** runs against a real Authelia container (existing `compose.smoke.yaml`-style file or one-liner `docker run`) for the final sweep — confirms the discovery doc isn't being misread, real JWT verification works, real `end_session_endpoint` redirect works.
|
||||
|
||||
## Out of scope (deferred)
|
||||
|
||||
- **Multi-provider** support (`providers:` array)
|
||||
- **Back-channel logout** (OpenID Connect Back-Channel Logout 1.0) — schema isn't blocked from adding it later
|
||||
- **UI-driven role mapping** (config-only in v1)
|
||||
- **Refresh tokens / mid-session role re-evaluation** — login-only refresh in v1
|
||||
- **`oidc.disable_local_users`** flag — both paths stay open in v1
|
||||
- **OIDC user dashboard chip / badges** beyond the small `oidc` indicator on the users list
|
||||
- **Per-user "auth source" filter on the users list** — sortable headers cover most of the use case
|
||||
|
||||
## Risks / gotchas
|
||||
|
||||
- **JWKS key rotation** — refresh on first failed verification is the standard fix; document the cache TTL (1h) in the config block.
|
||||
- **Clock skew** — accept `iat`/`exp` with a 60s leeway; matches what most OIDC libraries do.
|
||||
- **End-session 404 / not advertised** — degrade gracefully; just drop the session and 303 to `/login`. Don't 500 the logout because the IdP doesn't implement RP-initiated logout.
|
||||
- **Username changes at the IdP** — silently keep the local username (matches our locked decision: subject is the stable key, username is display-only). Document.
|
||||
- **Role claim is sometimes a string, sometimes an array, sometimes a comma-separated string** depending on IdP — normalise into `[]string` before mapping. Authelia/Keycloak emit arrays; some custom setups emit strings; handle all three shapes.
|
||||
- **Password-form bypass for OIDC users via /api/auth/login (JSON)** — same rejection rule applies, not just the HTML form.
|
||||
|
||||
## Acceptance
|
||||
|
||||
- [ ] An OIDC user with `roles: ["rm-admins"]` can sign in, becomes an admin, is visible in `/settings/users` with an `oidc` chip
|
||||
- [ ] Same user signing in again resolves to the same row (no duplicate)
|
||||
- [ ] Same user with `roles: ["something-else"]` is denied, lands on `/login?oidc_error=no_role_match` with a banner, no row created
|
||||
- [ ] OIDC user can't password-login through `/login` or `/api/auth/login`
|
||||
- [ ] Admin disables an OIDC user → next OIDC login is rejected, existing session bounced (existing disable-mid-session)
|
||||
- [ ] Sign out as an OIDC user → 303 to IdP's end-session URL (when advertised); no end-session URL → 303 to `/login`
|
||||
- [ ] OIDC config absent → password login works exactly as today (zero behavioural change)
|
||||
- [ ] Username collision: a local `alice` exists, OIDC user with `preferred_username=alice` and a different `sub` → blocked at sign-in with the clear error page
|
||||
- [ ] Last-admin guard refuses to demote the only enabled admin even if the IdP's role mapping says otherwise
|
||||
- [ ] All existing tests pass; new test suite covers the four claim-resolution branches and logout
|
||||