bc19ad8804
Confirmed claim name from the lab IdP is 'groups' (not 'roles' as
the original spec assumed). Default the role_claim config field to
'groups' which also matches Keycloak and Authentik out of the box.
Add a 'display_name' field so the SSO button can read 'Sign in with
Authelia' rather than the generic 'SSO'.
Two new gotchas captured:
- Authelia 4.39+ 'sub' is an opaque UUID, not username — the
locked design already keys on sub + reads preferred_username
for display, so this is just documentation.
- end_session_endpoint isn't always published (Authelia config-
dependent); the locked logout flow already degrades cleanly.
216 lines
16 KiB
Markdown
# P4-05 — OIDC Login Design

> **Date:** 2026-05-05
> **Status:** brainstorm complete; ready for plan
> **Closes:** P4-05 (OIDC login)

## Goal

Wire OpenID Connect authentication as a sign-in path alongside the existing local-user system, so a deployment that already has an IdP (Authelia, Authentik, Keycloak, Okta, Auth0, etc.) can use it for restic-manager logins.

## Architecture

OIDC sits on top of the local-user system rather than replacing it. The first time a user signs in via OIDC the server **just-in-time provisions** a local user row marked `auth_source='oidc'`, with its role derived from the configured role claim (`groups` by default). Subsequent sign-ins look up the same row by its stable `oidc_subject` and refresh role + email from the latest claims. Once the row exists it behaves like any other local user — admin can disable it, force-logout, see it in audit logs, etc. — except that password login is rejected because there's no password.

The Authorization Code flow (with PKCE) is implemented against the discovered well-known config of a single configured issuer. Logout is front-channel: clicking Sign out drops the local session and redirects the browser to the IdP's `end_session_endpoint` (when advertised). Back-channel logout is deferred.

## Locked decisions

| Decision | Pick |
|---|---|
| User lifecycle | **B** — JIT-provision local rows on first OIDC login (`auth_source='oidc'`, `oidc_subject`) |
| Role mapping config | **A** — YAML/env, claim name configurable (default `groups`, matching Authelia / Keycloak / Authentik), default = deny on no-match |
| Username source | `preferred_username`, fallback to `email` |
| Username collision with existing local user | **Refuse** with clear remediation message |
| Provider config | **Single provider** — `providers:` array can come later |
| Login page layout | SSO button **above** password form; password form labelled "or sign in with a local account" |
| OIDC users + password login | **Disabled** — `auth_source='oidc'` rows have empty `password_hash`; password form rejects them |
| Logout shape | **Front-channel only** — drop session + redirect to `end_session_endpoint` when advertised |
| Role re-evaluation | **At login only** — claims read at the OIDC callback; admin can disable mid-session locally |

## Schema changes

Migration 0019 — `users` extensions for OIDC bookkeeping:

```sql
ALTER TABLE users ADD COLUMN auth_source TEXT NOT NULL DEFAULT 'local'
    CHECK (auth_source IN ('local', 'oidc'));
ALTER TABLE users ADD COLUMN oidc_subject TEXT;

CREATE UNIQUE INDEX users_oidc_subject ON users(oidc_subject)
    WHERE oidc_subject IS NOT NULL;
```

Both column-level ALTERs (CLAUDE.md preference). The unique partial index defends the JIT-lookup invariant (one row per IdP subject) without blocking multiple rows with NULL oidc_subject (the local users).

## Configuration

```yaml
# server config — extend existing config struct
oidc:
  issuer: https://auth.example.com           # well-known config discovered from this
  client_id: restic-manager
  client_secret: ${RM_OIDC_CLIENT_SECRET}    # or via _FILE
  display_name: Authelia                     # button label "Sign in with <display_name>"; default "SSO"
  scopes: [openid, profile, email, groups]
  role_claim: groups                         # default if absent (matches Authelia / Keycloak / Authentik)
  role_mapping:
    rm-admins: admin
    rm-operators: operator
    rm-viewers: viewer
  # Optional — auto-derived from BaseURL if absent.
  redirect_url: https://rm.example.com/auth/oidc/callback
```

Env-var overrides: `RM_OIDC_ISSUER`, `RM_OIDC_CLIENT_ID`, `RM_OIDC_CLIENT_SECRET`, `RM_OIDC_CLIENT_SECRET_FILE`. Mapping is YAML-only (env doesn't fit a multi-key string→string map cleanly).

When `oidc.issuer` is empty or missing, OIDC is disabled (current behaviour). No restart-toggle UI; this is a deploy-time setting.

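The defaulting rules in the config block can be sketched in Go as follows. This is a minimal illustration, not the project's actual config code: the `OIDCConfig` type, its field names, and the `Enabled` / `WithDefaults` helpers are hypothetical, and YAML/env loading is left out so the snippet needs only the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// OIDCConfig mirrors the YAML block above. The type and field names are
// assumptions made for this sketch.
type OIDCConfig struct {
	Issuer       string
	ClientID     string
	ClientSecret string
	DisplayName  string
	Scopes       []string
	RoleClaim    string
	RoleMapping  map[string]string
	RedirectURL  string
}

// Enabled reports whether OIDC is configured: an empty issuer disables it.
func (c OIDCConfig) Enabled() bool {
	return strings.TrimSpace(c.Issuer) != ""
}

// WithDefaults returns a copy with the documented defaults filled in:
// display_name "SSO", role_claim "groups", redirect_url derived from baseURL.
func (c OIDCConfig) WithDefaults(baseURL string) OIDCConfig {
	if c.DisplayName == "" {
		c.DisplayName = "SSO"
	}
	if c.RoleClaim == "" {
		c.RoleClaim = "groups"
	}
	if c.RedirectURL == "" {
		c.RedirectURL = strings.TrimRight(baseURL, "/") + "/auth/oidc/callback"
	}
	return c
}

func main() {
	cfg := OIDCConfig{Issuer: "https://auth.example.com", ClientID: "restic-manager"}
	cfg = cfg.WithDefaults("https://rm.example.com")
	fmt.Println(cfg.Enabled(), cfg.DisplayName, cfg.RoleClaim, cfg.RedirectURL)
}
```

Keeping the defaults in one helper means the login page, the callback, and the tests all see the same effective values.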
## Auth flow

### Login start

`GET /auth/oidc/login` — only mounted when OIDC is configured.

1. Generate `state` (32 random bytes) and `code_verifier` (64 random bytes), both base64url-encoded without padding; compute `code_challenge = base64url(sha256(code_verifier))`, also unpadded, as PKCE's `S256` method requires.
2. Store `(sha256(state), code_verifier, created_at)` in a new ephemeral table (or in memory with a 5-minute TTL; see "State table" below).
3. Redirect to `<authorization_endpoint>?response_type=code&client_id=...&redirect_uri=...&scope=...&state=...&code_challenge=...&code_challenge_method=S256`.

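The login-start steps can be sketched with the Go standard library alone. The helper names (`randomToken`, `challengeS256`, `authURL`) are hypothetical, and the sketch assumes the discovered `authorization_endpoint` carries no query string of its own. `base64.RawURLEncoding` is the unpadded URL-safe alphabet that RFC 7636 calls for.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"net/url"
)

// randomToken returns n random bytes, base64url-encoded without padding.
func randomToken(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// challengeS256 derives the PKCE code_challenge for the S256 method.
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// authURL builds the authorization redirect. It assumes endpoint has no
// existing query string.
func authURL(endpoint, clientID, redirectURI, scope, state, challenge string) string {
	q := url.Values{}
	q.Set("response_type", "code")
	q.Set("client_id", clientID)
	q.Set("redirect_uri", redirectURI)
	q.Set("scope", scope)
	q.Set("state", state)
	q.Set("code_challenge", challenge)
	q.Set("code_challenge_method", "S256")
	return endpoint + "?" + q.Encode()
}

func main() {
	state, err := randomToken(32)
	if err != nil {
		panic(err)
	}
	verifier, err := randomToken(64)
	if err != nil {
		panic(err)
	}
	fmt.Println(authURL("https://auth.example.com/api/oidc/authorization",
		"restic-manager", "https://rm.example.com/auth/oidc/callback",
		"openid profile email groups", state, challengeS256(verifier)))
}
```

With these sizes the `state` is 43 characters and the `code_verifier` 86, inside the 43 to 128 character range PKCE allows for the verifier.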
### Callback

`GET /auth/oidc/callback?code=...&state=...` — also OIDC-only mount.

1. Validate `state` against the stored value (one-shot — delete row on read). Reject if missing/expired/already used.
2. Exchange `code` + `code_verifier` for tokens at `token_endpoint`.
3. Validate the `id_token` JWT: signature against the JWKS endpoint, `iss`, `aud`, `exp`, `iat`, `nonce` (if used).
4. Extract `sub`, `preferred_username`, `email`, and the configured `role_claim` (default `groups`).
5. Pick username: `preferred_username` if non-empty, else `email`. Lowercase / trim per the existing local-user rules.
6. Pick role: first match in `role_mapping` against the array of role-claim values. **No match → deny with a clear error page**, no row created.
7. Look up user by `oidc_subject`. Three cases:
   - **Found** — refresh `email`, `role`, `last_login_at`. Don't touch `username` (changing it would break audit trails; if the IdP changes the username, that's an operator concern). Log `user.oidc_login`.
   - **Not found, username free** — INSERT row with `auth_source='oidc'`, `oidc_subject=<sub>`, `password_hash=''`, `must_change_password=0`. Log `user.created` with payload `{"auth_source":"oidc"}` + `user.oidc_login`.
   - **Not found, username taken by a local user** — render an error page: "This OIDC user (`<sub>`) wants to sign in as `alice`, but a local user with that name already exists. Ask your administrator to either rename / remove the local user, or exclude this user from the OIDC mapping." 403, no row created. Log `user.oidc_login_blocked`.
8. Drop a session cookie + `MarkUserLogin` (the existing helper).
9. Redirect to `/`.

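Steps 5 and 6 of the callback can be sketched as two small pure functions. Both names are hypothetical. The sketch assumes the local-user rules amount to trim plus lowercase, and it reads "first match" as the first claim value, in the order the IdP sent them, that has an entry in `role_mapping`. The design does not state which order wins when a user belongs to several mapped groups, so treat that ordering as an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// pickUsername prefers preferred_username and falls back to email.
// Trim + lowercase is an assumed stand-in for the local-user rules.
func pickUsername(preferredUsername, email string) string {
	if u := strings.ToLower(strings.TrimSpace(preferredUsername)); u != "" {
		return u
	}
	return strings.ToLower(strings.TrimSpace(email))
}

// pickRole returns the mapped role for the first claim value found in
// mapping. ok is false when nothing matches, which means "deny".
func pickRole(claimValues []string, mapping map[string]string) (role string, ok bool) {
	for _, v := range claimValues {
		if r, found := mapping[v]; found {
			return r, true
		}
	}
	return "", false
}

func main() {
	mapping := map[string]string{
		"rm-admins":    "admin",
		"rm-operators": "operator",
		"rm-viewers":   "viewer",
	}
	role, ok := pickRole([]string{"staff", "rm-operators"}, mapping)
	fmt.Println(pickUsername(" Alice ", "alice@example.com"), role, ok)
}
```

If deterministic precedence matters (for example admin always beating viewer), iterating over an ordered list of mapping entries instead of the claim values would be one way to get it.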
### Logout

`POST /logout` (existing handler) — augmented:

1. Look up the session before deletion (we need the user row to know if they're an OIDC user).
2. Delete the session as today.
3. If the user is `auth_source='oidc'` AND the discovered `end_session_endpoint` is non-empty → 303 to `<end_session_endpoint>?id_token_hint=<id_token>&post_logout_redirect_uri=<base>/login`. Otherwise → existing 303 to `/login`.

We need to keep the latest `id_token` per session to drive `id_token_hint`. Stash it in a new `sessions.id_token TEXT` column (one column-level ALTER on migration 0019 alongside the user columns), populated only for OIDC sessions.

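The redirect decision in step 3 could look roughly like the function below. `logoutRedirect` and its parameters are hypothetical names. The sketch also falls back to `/login` when the advertised endpoint fails to parse, which goes slightly beyond what the text specifies but matches its "never 500 the logout" intent.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// logoutRedirect returns the 303 target after the local session is deleted.
func logoutRedirect(authSource, endSessionEndpoint, idToken, baseURL string) string {
	if authSource != "oidc" || endSessionEndpoint == "" {
		return "/login"
	}
	u, err := url.Parse(endSessionEndpoint)
	if err != nil {
		return "/login" // degrade to the local logout path
	}
	q := u.Query() // keep any query parameters the IdP already advertises
	if idToken != "" {
		q.Set("id_token_hint", idToken)
	}
	q.Set("post_logout_redirect_uri", strings.TrimRight(baseURL, "/")+"/login")
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(logoutRedirect("local", "", "", "https://rm.example.com"))
	fmt.Println(logoutRedirect("oidc", "https://auth.example.com/logout", "abc", "https://rm.example.com"))
}
```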
## State table

Two reasonable shapes for the short-lived state used during the OAuth round-trip:

- **In-memory map** with a 5-minute TTL sweeper. Simpler, but multi-process deployments lose it (no multi-process today, but Phase 5 OSS readiness might add it).
- **`oidc_state` table** — `(state_hash PK, code_verifier, created_at)`, swept on the same 60s alert-engine tick that already handles setup-token cleanup.

I'll go with the **table**. Costs ~3 lines in the existing cleanup tick, behaves correctly under restarts, and survives a future scale-out. Migration 0019 includes:

```sql
CREATE TABLE oidc_state (
    state_hash    TEXT PRIMARY KEY,   -- sha256(state) hex; raw state never persisted
    code_verifier TEXT NOT NULL,
    created_at    TEXT NOT NULL
);
CREATE INDEX oidc_state_created ON oidc_state(created_at);
```

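The one-shot lookup against `oidc_state` can be illustrated with an in-memory stand-in for the table. Everything here (`stateHash`, `memStore`, `consume`, Unix-second timestamps) is a hypothetical sketch of the semantics: hash the raw `state`, delete the row on read, and reject rows older than the TTL. The real implementation would run the equivalent SELECT and DELETE in one transaction.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stateHash is the primary key: hex(sha256(state)). The raw state is never stored.
func stateHash(state string) string {
	sum := sha256.Sum256([]byte(state))
	return hex.EncodeToString(sum[:])
}

type stateRow struct {
	CodeVerifier string
	CreatedAt    int64 // Unix seconds; the table stores TEXT, this is a simplification
}

// memStore stands in for the oidc_state table in this sketch.
type memStore map[string]stateRow

func (m memStore) put(state, verifier string, now int64) {
	m[stateHash(state)] = stateRow{CodeVerifier: verifier, CreatedAt: now}
}

// consume returns the verifier for state exactly once. The row is deleted
// even when it has expired, so a replayed state always fails.
func (m memStore) consume(state string, now, ttlSeconds int64) (string, bool) {
	key := stateHash(state)
	row, ok := m[key]
	if !ok {
		return "", false
	}
	delete(m, key)
	if now-row.CreatedAt > ttlSeconds {
		return "", false
	}
	return row.CodeVerifier, true
}

func main() {
	store := memStore{}
	store.put("state-1", "verifier-1", 1000)
	v, ok := store.consume("state-1", 1060, 300)
	fmt.Println(v, ok)
	_, ok = store.consume("state-1", 1061, 300)
	fmt.Println(ok)
}
```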
## Login-page UI

`/login` template branches based on `view.OIDCEnabled`:

- **OIDC off** → current layout (just the password form).
- **OIDC on** → a `Sign in with <provider name>` button at the top, then a faint divider line, then the existing password form labelled "Or sign in with a local account". Provider name comes from a new optional config `oidc.display_name` (defaults to "SSO").

Failed-OIDC redirects (no role match, username collision, IdP error) land on `/login?oidc_error=<reason>` with a small banner above the buttons.

## Audit actions

New entries in the action vocabulary:

- `user.oidc_login` (target_kind=user, target_id=user_id, payload `{"sub":"…"}`)
- `user.oidc_login_blocked` (target_kind=user, target_id=oidc_subject when no row was created, payload `{"username":"…", "reason":"username_taken|no_role_match|other"}`)
- `user.created` already exists; OIDC's first-time provisioning fires this with payload `{"auth_source":"oidc"}` so the audit log distinguishes admin-created from JIT-provisioned rows.

## User-management UI changes

Small additions, not new screens:

- **Users list** — Status column adds a small `oidc` chip when `auth_source='oidc'` so admin can see at a glance which rows came from JIT-provisioning. Sorting by `auth_source` via the existing sortable-headers pattern can land as a small follow-up if anyone asks; it is out of scope for v1.
- **Add user form** — would be disabled when OIDC is the only auth path, with a hint: "User provisioning is handled by your OIDC provider; users appear here on first sign-in." That needs an `oidc.disable_local_users` flag, which can be added later if it becomes a real ask. Out of scope for v1; both paths stay open.
- **Edit user form** — when `auth_source='oidc'`:
  - Username field disabled (the username is fixed at provisioning time and never refreshed from the IdP, which keeps audit trails stable)
  - Role dropdown disabled, with a hint: "Role is managed by your OIDC provider's `groups` claim mapping. Edit the mapping in server config to change."
  - Email field disabled (refreshed from IdP on each login)
- **Disable / Enable / Force logout** still work — disabling an OIDC user kicks their session and rejects future OIDC logins ("user disabled by administrator")
- **Regenerate setup link** hidden — there's no setup token for OIDC users
- **Login UI** — password form rejects users with `auth_source='oidc'` ("This account uses single sign-on. Click the SSO button above.")

## Middleware / handler changes

- **Routes**: new public-band entries `GET /auth/oidc/login`, `GET /auth/oidc/callback`. Skipped entirely when OIDC isn't configured (`s.deps.OIDC == nil`).
- **Logout handler** augmented to fetch the user row + decide between local logout (303 → `/login`) and OIDC logout (303 → `end_session_endpoint`).
- **Login handler** rejects `auth_source='oidc'` users with the SSO-prompt error.
- **Last-admin guard** — already covers OIDC users naturally because they live in the `users` table. A wrong claim mapping at the IdP could otherwise demote every admin to operator through the role-from-claims path; the guard rejects that demotion at the moment it would be applied, returning the user to the login page with `oidc_error=role_change_blocked` and writing an audit entry. The admin must fix the mapping or promote a local admin first.

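How the last-admin guard might evaluate a claim-driven role change is sketched below. The `user` struct and `demotionBlocked` are hypothetical; the existing guard in the codebase may be shaped differently. The point is only the rule: a change away from `admin` is blocked when the target is the sole enabled admin.

```go
package main

import "fmt"

// user is a minimal stand-in for a row in the users table.
type user struct {
	ID      int64
	Role    string
	Enabled bool
}

// demotionBlocked reports whether setting user id's role to newRole would
// leave the system without any enabled admin.
func demotionBlocked(users []user, id int64, newRole string) bool {
	if newRole == "admin" {
		return false // not a demotion
	}
	targetIsAdmin := false
	for _, u := range users {
		if !u.Enabled || u.Role != "admin" {
			continue
		}
		if u.ID == id {
			targetIsAdmin = true
		} else {
			return false // another enabled admin remains
		}
	}
	return targetIsAdmin
}

func main() {
	users := []user{{ID: 1, Role: "admin", Enabled: true}, {ID: 2, Role: "operator", Enabled: true}}
	fmt.Println(demotionBlocked(users, 1, "operator"))
	fmt.Println(demotionBlocked(users, 2, "viewer"))
}
```

In the callback this check would run before the "Found" branch writes the refreshed role, so a blocked change leaves the row untouched.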
## Implementation outline

1. **Schema** — migration 0019 (users.auth_source + oidc_subject, sessions.id_token, oidc_state table)
2. **Config** — extend `internal/server/config` with the OIDC block + env-var overrides; load JWKS lazily
3. **Discovery + JWKS** — small helper that fetches `<issuer>/.well-known/openid-configuration` once at startup, caches `authorization_endpoint`, `token_endpoint`, `end_session_endpoint`, `jwks_uri`. JWKS refreshed on first failed verification.
4. **Login start handler** — `/auth/oidc/login`
5. **Callback handler** — `/auth/oidc/callback`, with the four claim-resolution branches
6. **Logout handler augmentation** — branch on `auth_source`
7. **Login form rejection** — local-user password form rejects OIDC accounts
8. **State cleanup** — extend the alert engine's existing cleanup tick
9. **UI** — `oidc` chip on users list, disabled fields on edit-form for OIDC users, login page SSO button + error banner
10. **Tests** — config parse tests; happy-path callback test using a fake IdP (httptest server with a hand-rolled discovery doc + JWKS); username-collision test; no-role-match test; logout test
11. **Sweep** — full Playwright walk against an actual IdP (Authelia in a Docker container) — admin gets in via OIDC, role mapping works, logout redirects through IdP, OIDC user can't password-login

## Test strategy

The IdP is the hard part to test cleanly. Two layers:

- **Unit / integration tests** use a stub OIDC provider built into the test harness — `httptest.Server` exposing `.well-known/openid-configuration`, a token endpoint that signs minted JWTs with a test ECDSA key, and a JWKS endpoint serving the public key. This covers every code path without a real IdP. Pattern: each test mints its own claims and runs the callback against the stub.
- **Smoke env** runs against a real Authelia container (existing `compose.smoke.yaml`-style file or one-liner `docker run`) for the final sweep — confirms the discovery doc isn't being misread, real JWT verification works, real `end_session_endpoint` redirect works.

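The discovery half of the stub provider might start out like this. It is a partial sketch under stated assumptions: `newStubIdP`, `fetchDiscovery`, and the endpoint paths are made up for illustration, and the token endpoint, JWT signing, and JWKS handler are deliberately left out. The `advertiseEndSession` switch lets a test cover both logout branches.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// discoveryDoc holds the fields of the well-known document the server caches.
type discoveryDoc struct {
	Issuer                string `json:"issuer"`
	AuthorizationEndpoint string `json:"authorization_endpoint"`
	TokenEndpoint         string `json:"token_endpoint"`
	JWKSURI               string `json:"jwks_uri"`
	EndSessionEndpoint    string `json:"end_session_endpoint,omitempty"`
}

// newStubIdP starts a test server that serves only the discovery document.
// Paths such as /authorize and /jwks are placeholders for this sketch.
func newStubIdP(advertiseEndSession bool) *httptest.Server {
	mux := http.NewServeMux()
	srv := httptest.NewServer(mux)
	mux.HandleFunc("/.well-known/openid-configuration", func(w http.ResponseWriter, r *http.Request) {
		doc := discoveryDoc{
			Issuer:                srv.URL,
			AuthorizationEndpoint: srv.URL + "/authorize",
			TokenEndpoint:         srv.URL + "/token",
			JWKSURI:               srv.URL + "/jwks",
		}
		if advertiseEndSession {
			doc.EndSessionEndpoint = srv.URL + "/logout"
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(doc)
	})
	return srv
}

// fetchDiscovery reads and decodes the discovery document for issuer.
func fetchDiscovery(issuer string) (discoveryDoc, error) {
	var doc discoveryDoc
	resp, err := http.Get(issuer + "/.well-known/openid-configuration")
	if err != nil {
		return doc, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return doc, fmt.Errorf("discovery: unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&doc)
	return doc, err
}

func main() {
	srv := newStubIdP(true)
	defer srv.Close()
	doc, err := fetchDiscovery(srv.URL)
	fmt.Println(doc.EndSessionEndpoint != "", err)
}
```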
## Out of scope (deferred)

- **Multi-provider** support (`providers:` array)
- **Back-channel logout** (OpenID Connect Back-Channel Logout 1.0) — schema isn't blocked from adding it later
- **UI-driven role mapping** (config-only in v1)
- **Refresh tokens / mid-session role re-evaluation** — login-only refresh in v1
- **`oidc.disable_local_users`** flag — both paths stay open in v1
- **OIDC user dashboard chip / badges** beyond the small `oidc` indicator on the users list
- **Per-user "auth source" filter on the users list** — sortable headers cover most of the use case

## Risks / gotchas

- **JWKS key rotation** — refresh on first failed verification is the standard fix; document the cache TTL (1h) in the config block.
- **Clock skew** — accept `iat`/`exp` with a 60s leeway; matches what most OIDC libraries do.
- **End-session 404 / not advertised** — degrade gracefully; just drop the session and 303 to `/login`. Don't 500 the logout because the IdP doesn't implement RP-initiated logout.
- **Username changes at the IdP** — silently keep the local username (matches our locked decision: subject is the stable key, username is display-only). Document.
- **Role claim is sometimes a string, sometimes an array, sometimes a comma-separated string** depending on IdP — normalise into `[]string` before mapping. Authelia/Keycloak emit arrays; some custom setups emit strings; handle all three shapes.
- **Authelia `sub` is an opaque UUID, not the username** (Authelia 4.39+ default for new clients). Don't assume `sub` is human-readable; it's stable, but the display value is `preferred_username` or `email`. The locked design already keys lookups on `sub` and uses `preferred_username` for the display username, so this is just a correctness note.
- **`end_session_endpoint` may not be published** (Authelia doesn't advertise it for many configs). The locked logout flow already degrades to "drop session + redirect to /login" when the discovery doc lacks it; no extra config needed.
- **Password-form bypass for OIDC users via /api/auth/login (JSON)** — same rejection rule applies, not just the HTML form.

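The role-claim normalisation mentioned in the list above could be handled by one helper that accepts whatever `encoding/json` produces for the claim. `normaliseRoles` is a hypothetical name; the sketch assumes a single string is always split on commas, which would mis-handle a group name that itself contains a comma.

```go
package main

import (
	"fmt"
	"strings"
)

// normaliseRoles turns a decoded role claim into a clean []string.
// It accepts a single string, a comma-separated string, a []string, or the
// []any that encoding/json yields for a JSON array. Other types give an
// empty result, which the mapping step treats as "no match".
func normaliseRoles(claim any) []string {
	var raw []string
	switch v := claim.(type) {
	case string:
		raw = strings.Split(v, ",")
	case []string:
		raw = v
	case []any:
		for _, e := range v {
			if s, ok := e.(string); ok {
				raw = append(raw, s)
			}
		}
	}
	out := make([]string, 0, len(raw))
	for _, s := range raw {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	fmt.Println(normaliseRoles([]any{"rm-admins", "staff"}))
	fmt.Println(normaliseRoles("rm-operators, staff"))
	fmt.Println(normaliseRoles("rm-viewers"))
}
```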
## Acceptance

- [ ] An OIDC user with `groups: ["rm-admins"]` can sign in, becomes an admin, is visible in `/settings/users` with an `oidc` chip
- [ ] Same user signing in again resolves to the same row (no duplicate)
- [ ] Same user with `groups: ["something-else"]` is denied, lands on `/login?oidc_error=no_role_match` with a banner, no row created
- [ ] OIDC user can't password-login through `/login` or `/api/auth/login`
- [ ] Admin disables an OIDC user → next OIDC login is rejected, existing session bounced (existing disable-mid-session)
- [ ] Sign out as an OIDC user → 303 to IdP's end-session URL (when advertised); no end-session URL → 303 to `/login`
- [ ] OIDC config absent → password login works exactly as today (zero behavioural change)
- [ ] Username collision: a local `alice` exists, OIDC user with `preferred_username=alice` and a different `sub` → blocked at sign-in with the clear error page
- [ ] Last-admin guard refuses to demote the only enabled admin even if the IdP's role mapping says otherwise
- [ ] All existing tests pass; new test suite covers the four claim-resolution branches and logout