restic-manager/docs/superpowers/specs/2026-05-05-p4-05-oidc-design.md at 814e49cb93ef0956acae1caca56b0e1b1ab0abfd



steve 814e49cb93 spec: P4-05 — OIDC login design

Brainstormed shape locked: JIT-provision local rows on first OIDC
sign-in (auth_source='oidc'), YAML-only config (no UI), 'roles'
claim with deny-on-no-match default, preferred_username with email
fallback, refuse on local-user collision, single provider, login
page shows SSO above password (break-glass), front-channel logout
only, role re-evaluation at login only.

Migration 0019: users.auth_source + users.oidc_subject (partial
unique index), sessions.id_token (for end_session id_token_hint),
oidc_state table for the OAuth round-trip state, swept on the
existing alert-engine tick.

Composes with the user-management work from P4-03/04: admin can
disable OIDC users like local; last-admin guard catches IdP role-
mapping mistakes; audit trail covers JIT-provision via
user.created with auth_source payload + new user.oidc_login /
user.oidc_login_blocked actions.

Out of scope (deferred): back-channel logout, multi-provider,
UI-driven role mapping, refresh tokens / mid-session re-eval.

2026-05-05 12:04:09 +01:00



Date: 2026-05-05
Status: brainstorm complete; ready for plan
Closes: P4-05 (OIDC login)

Goal

Wire OpenID Connect authentication as a sign-in path alongside the existing local-user system, so a deployment that already has an IdP (Authelia, Authentik, Keycloak, Okta, Auth0, etc.) can use it for restic-manager logins.

Architecture

OIDC sits on top of the local-user system rather than replacing it. The first time a user signs in via OIDC the server just-in-time provisions a local user row marked auth_source='oidc', with role derived from the IdP's roles claim. Subsequent sign-ins look up the same row by stable oidc_subject and refresh role + email from the latest claims. Once the row exists it behaves like any other local user — admin can disable it, force-logout, see it in audit logs, etc. — except password-login is rejected because there's no password.

The Authorization Code flow (with PKCE) is implemented against the discovered well-known config of a single configured issuer. Front-channel logout: clicking Sign out drops the local session + redirects the browser to the IdP's end_session_endpoint (when advertised). Back-channel logout deferred.

Locked decisions

Decision	Pick
User lifecycle	B — JIT-provision local rows on first OIDC login (`auth_source='oidc'`, `oidc_subject`)
Role mapping config	A — YAML/env, claim name `roles`, default = deny on no-match
Username source	`preferred_username`, fallback to `email`
Username collision with existing local user	Refuse with clear remediation message
Provider config	Single provider — `providers:` array can come later
Login page layout	SSO button above password form; password form labelled "or sign in with a local account"
OIDC users + password login	Disabled — `auth_source='oidc'` rows have empty `password_hash`; password form rejects them
Logout shape	Front-channel only — drop session + redirect to `end_session_endpoint` when advertised
Role re-evaluation	At login only — claims read at the OIDC callback; admin can disable mid-session locally

Schema changes

Migration 0019 — users extensions for OIDC bookkeeping:

ALTER TABLE users ADD COLUMN auth_source TEXT NOT NULL DEFAULT 'local'
  CHECK (auth_source IN ('local', 'oidc'));
ALTER TABLE users ADD COLUMN oidc_subject TEXT;

CREATE UNIQUE INDEX users_oidc_subject ON users(oidc_subject)
  WHERE oidc_subject IS NOT NULL;

Both column-level ALTERs (CLAUDE.md preference). The unique partial index defends the JIT-lookup invariant (one row per IdP subject) without blocking multiple rows with NULL oidc_subject (the local users).

Configuration

# server config — extend existing config struct
oidc:
  issuer:        https://auth.example.com    # well-known config discovered from this
  client_id:     restic-manager
  client_secret: ${RM_OIDC_CLIENT_SECRET}    # or via _FILE
  scopes:        [openid, profile, email, roles]   # 'roles' usually means a custom scope
  role_claim:    roles                       # default if absent
  role_mapping:
    rm-admins:    admin
    rm-operators: operator
    rm-viewers:   viewer
  # Optional — auto-derived from BaseURL if absent.
  redirect_url:  https://rm.example.com/auth/oidc/callback

Env-var overrides: RM_OIDC_ISSUER, RM_OIDC_CLIENT_ID, RM_OIDC_CLIENT_SECRET, RM_OIDC_CLIENT_SECRET_FILE. Mapping is YAML-only (env doesn't fit a multi-key string→string map cleanly).

When oidc.issuer is empty or missing, OIDC is disabled (current behaviour). There is no UI toggle: this is a deploy-time setting, and changing it requires a restart.

Auth flow

GET /auth/oidc/login — only mounted when OIDC is configured.

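Generate state (32 random bytes) and code_verifier (64 random bytes), both encoded as unpadded base64url. Compute code_challenge = base64url(sha256(code_verifier)), as RFC 7636 requires for the S256 method.
Store (sha256(state), code_verifier, created_at) in the new oidc_state table. The raw state is never persisted. See "State table" below for why this uses a table rather than an in-memory map with a 5-minute TTL.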
Redirect to <authorization_endpoint>?response_type=code&client_id=...&redirect_uri=...&scope=...&state=...&code_challenge=...&code_challenge_method=S256.

Callback

GET /auth/oidc/callback?code=...&state=... — also OIDC-only mount.

Validate state against the stored value (one-shot — delete row on read). Reject if missing/expired/already used.
Exchange code + code_verifier for tokens at token_endpoint.
Validate the id_token JWT: the signature against the IdP's keys from the JWKS endpoint, then iss, aud, exp, iat, and nonce (if used).
Extract sub, preferred_username, email, and the configured role_claim (default roles).
Pick username: preferred_username if non-empty, else email. Lowercase / trim per the existing local-user rules.
Pick role: first match in role_mapping against the array of role-claim values. No match → deny with a clear error page, no row created.
Look up user by oidc_subject. Three cases:
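- Found: refresh email, role, last_login_at. Reject the login if the row is disabled ("user disabled by administrator"). Apply the last-admin guard before writing the new role. Don't touch username: changing it would break audit trails, and if the IdP changes the username, that's an operator concern. Log user.oidc_login.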
- Not found, username free — INSERT row with auth_source='oidc', oidc_subject=<sub>, password_hash='', must_change_password=0. Log user.created with payload {"auth_source":"oidc"} + user.oidc_login.
- Not found, username taken by a local user — render an error page: "This OIDC user (<sub>) wants to sign in as alice, but a local user with that name already exists. Ask your administrator to either rename / remove the local user, or exclude this user from the OIDC mapping." 403, no row created. Log user.oidc_login_blocked.
Issue a session cookie and store the id_token on the new session row, so that logout can send id_token_hint. Call MarkUserLogin (the existing helper).
Redirect to /.
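Once the id_token has been verified, the claim-resolution part of the callback reduces to a few pure functions. The sketch below makes two assumptions:

- "First match" means the first role-claim value, in claim order, that appears in role_mapping. Iterating the Go map instead would be nondeterministic.
- Lowercase-plus-trim stands in for the existing local-user rules.

The Resolution type, localUser and the helper names are hypothetical. The disabled-user check and the last-admin guard for the existing-row branch are not shown.

```go
package main

import (
	"errors"
	"strings"
)

// Resolution is the branch the callback takes after claims are verified.
type Resolution int

const (
	DenyNoRoleMatch Resolution = iota // no role_mapping hit: no row created
	LoginExisting                     // row found by oidc_subject: refresh email/role
	ProvisionNew                      // sub unknown, username free: JIT-insert
	BlockCollision                    // sub unknown, username already in use
)

// pickUsername prefers preferred_username and falls back to email.
func pickUsername(preferred, email string) (string, error) {
	for _, v := range []string{preferred, email} {
		if v = strings.ToLower(strings.TrimSpace(v)); v != "" {
			return v, nil
		}
	}
	return "", errors.New("id_token has neither preferred_username nor email")
}

// pickRole walks the claim values (not the map) so the result is deterministic.
func pickRole(claimValues []string, mapping map[string]string) (string, bool) {
	for _, v := range claimValues {
		if role, ok := mapping[v]; ok {
			return role, true
		}
	}
	return "", false
}

// localUser is a minimal stand-in for a users row.
type localUser struct {
	Username string
}

// resolve picks a branch: bySubject is the row found by oidc_subject and
// byName the row found by username (nil when absent).
func resolve(roleOK bool, bySubject, byName *localUser) Resolution {
	switch {
	case !roleOK:
		return DenyNoRoleMatch
	case bySubject != nil:
		return LoginExisting
	case byName == nil:
		return ProvisionNew
	default:
		return BlockCollision
	}
}
```

Checking the role before the lookup mirrors the order above: a user with no mapped role is denied before any row can be created.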

Logout

POST /logout (existing handler) — augmented:

Look up the session before deletion (we need the user row to know if they're an OIDC user).
Delete the session as today.
If the user is auth_source='oidc' AND the discovered end_session_endpoint is non-empty → 303 to <end_session_endpoint>?id_token_hint=<id_token>&post_logout_redirect_uri=<base>/login. Otherwise → existing 303 to /login.

We need to keep the latest id_token per session to drive id_token_hint. Stash it in a new sessions.id_token TEXT column (one column-level ALTER on migration 0019 alongside the user columns), populated only for OIDC sessions.

State table

Two reasonable shapes for the short-lived state used during the OAuth round-trip:

In-memory map with a 5-minute TTL sweeper. Simpler, but in-flight logins are lost on restart and the state isn't shared between processes. There's no multi-process deployment today, but Phase 5 OSS readiness might add one.
oidc_state table — (state_hash PK, code_verifier, created_at), swept on the same 60s alert-engine tick that already handles setup-token cleanup.

I'll go with the table. Costs ~3 lines in the existing cleanup tick, behaves correctly under restarts, and survives a future scale-out. Migration 0019 includes:

CREATE TABLE oidc_state (
  state_hash    TEXT PRIMARY KEY,    -- sha256(state) hex; raw state never persisted
  code_verifier TEXT NOT NULL,
  created_at    TEXT NOT NULL
);
CREATE INDEX oidc_state_created ON oidc_state(created_at);

Login page

/login template branches on view.OIDCEnabled:

OIDC off → current layout (just the password form).
OIDC on → a "Sign in with <provider name>" button at the top, then a faint divider line, then the existing password form labelled "Or sign in with a local account". The provider name comes from a new optional config key, oidc.display_name (defaults to "SSO").

Failed-OIDC redirects (no role match, username collision, IdP error) land on /login?oidc_error=<reason> with a small banner above the buttons.

Audit actions

New entries in the action vocabulary:

user.oidc_login (target_kind=user, target_id=user_id, payload {"sub":"…"})
user.oidc_login_blocked (target_kind=user, target_id=oidc_subject when no row was created, payload {"username":"…", "reason":"username_taken|no_role_match|other"})
user.created already exists; OIDC's first-time provisioning fires this with payload {"auth_source":"oidc"} so the audit log distinguishes admin-created from JIT-provisioned rows.
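The new actions and their payloads might be expressed in Go as shown below. The constant names, payload types and the auditEntry stand-in are hypothetical. Only the action strings, target_kind and the JSON keys come from the list above.

```go
package main

import "encoding/json"

// Action strings from the vocabulary above; the constant names are hypothetical.
const (
	ActionUserCreated          = "user.created"
	ActionUserOIDCLogin        = "user.oidc_login"
	ActionUserOIDCLoginBlocked = "user.oidc_login_blocked"
)

type oidcLoginPayload struct {
	Sub string `json:"sub"`
}

type oidcBlockedPayload struct {
	Username string `json:"username"`
	Reason   string `json:"reason"` // username_taken | no_role_match | other
}

type userCreatedPayload struct {
	AuthSource string `json:"auth_source"`
}

// auditEntry stands in for whatever the existing audit writer accepts.
type auditEntry struct {
	Action     string
	TargetKind string
	TargetID   string // user_id, or oidc_subject when no row was created
	Payload    json.RawMessage
}

func newAudit(action, targetID string, payload any) (auditEntry, error) {
	b, err := json.Marshal(payload)
	if err != nil {
		return auditEntry{}, err
	}
	return auditEntry{Action: action, TargetKind: "user", TargetID: targetID, Payload: b}, nil
}
```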

User-management UI changes

Small additions, not new screens:

Users list: the Status column adds a small oidc chip when auth_source='oidc', so an admin can see at a glance which rows came from JIT-provisioning. Sorting by auth_source via the same sortable-headers pattern is out of scope for v1 (a small follow-up if anyone asks).
Add user form: unchanged in v1, since both auth paths stay open. If a later oidc.disable_local_users flag becomes a real ask, the form would be disabled when OIDC is the only auth path, with a hint: "User provisioning is handled by your OIDC provider; users appear here on first sign-in."
Edit user form — when auth_source='oidc':
- Username field disabled (changing it would just be undone on next OIDC login)
- Role dropdown disabled, with a hint: "Role is managed by your OIDC provider's roles claim mapping. Edit the mapping in server config to change."
- Email field disabled (refreshed from IdP on each login)
- Disable / Enable / Force logout still work — disabling an OIDC user kicks their session and rejects future OIDC logins ("user disabled by administrator")
- Regenerate setup link hidden — there's no setup token for OIDC users
Login UI — password form rejects users with auth_source='oidc' ("This account uses single sign-on. Click the SSO button above.")

Middleware / handler changes

Routes: new public-band entries GET /auth/oidc/login, GET /auth/oidc/callback. Skipped entirely when OIDC isn't configured (s.deps.OIDC == nil).
Logout handler augmented to fetch the user row + decide between local logout (303 → /login) and OIDC logout (303 → end_session_endpoint).
Login handler rejects auth_source='oidc' users with the SSO-prompt error.
Last-admin guard: already covers OIDC users naturally because they live in the users table. A wrong role mapping at the IdP could otherwise demote every admin to operator. The guard rejects the demotion of the last enabled admin at the moment it would be applied. It returns the user to the login page with oidc_error=role_change_blocked and writes an audit entry. The admin must then fix the mapping or promote a local admin first.

Implementation outline

Schema — migration 0019 (users.auth_source + oidc_subject, sessions.id_token, oidc_state table)
Config — extend internal/server/config with the OIDC block + env-var overrides; load JWKS lazily
Discovery + JWKS — small helper that fetches <issuer>/.well-known/openid-configuration once at startup, caches authorization_endpoint, token_endpoint, end_session_endpoint, jwks_uri. JWKS refreshed on first failed verification.
Login start handler — /auth/oidc/login
Callback handler: /auth/oidc/callback, with the four claim-resolution branches (no role match, existing subject, new user, username collision)
Logout handler augmentation — branch on auth_source
Login form rejection — local-user password form rejects OIDC accounts
State cleanup — extend the alert engine's existing cleanup tick
UI — oidc chip on users list, disabled fields on edit-form for OIDC users, login page SSO button + error banner
Tests — config parse tests; happy-path callback test using a fake IdP (httptest server with a hand-rolled discovery doc + JWKS); username-collision test; no-role-match test; logout test
Sweep — full Playwright walk against an actual IdP (Authelia in a Docker container) — admin gets in via OIDC, role mapping works, logout redirects through IdP, OIDC user can't password-login

Test strategy

The IdP is the hard part to test cleanly. Two layers:

Unit / integration tests use a stub OIDC provider built into the test harness — httptest.Server exposing .well-known/openid-configuration, a token endpoint that signs minted JWTs with a test ECDSA key, and a JWKS endpoint serving the public key. This covers every code path without a real IdP. Pattern: each test mints its own claims and runs the callback against the stub.
Smoke env runs against a real Authelia container (existing compose.smoke.yaml-style file or one-liner docker run) for the final sweep — confirms the discovery doc isn't being misread, real JWT verification works, real end_session_endpoint redirect works.

Out of scope (deferred)

Multi-provider support (providers: array)
Back-channel logout (OpenID Connect Back-Channel Logout 1.0): the schema doesn't block adding it later
UI-driven role mapping (config-only in v1)
Refresh tokens / mid-session role re-evaluation — login-only refresh in v1
oidc.disable_local_users flag — both paths stay open in v1
OIDC user dashboard chip / badges beyond the small oidc indicator on the users list
Per-user "auth source" filter on the users list — sortable headers cover most of the use case

Risks / gotchas

JWKS key rotation — refresh on first failed verification is the standard fix; document the cache TTL (1h) in the config block.
Clock skew — accept iat/exp with a 60s leeway; matches what most OIDC libraries do.
end_session_endpoint not advertised in the discovery doc: degrade gracefully. Just drop the session and 303 to /login. Don't 500 the logout because the IdP doesn't implement RP-initiated logout.
Username changes at the IdP — silently keep the local username (matches our locked decision: subject is the stable key, username is display-only). Document.
Role claim shape varies by IdP: sometimes an array, sometimes a single string, sometimes a comma-separated string. Normalise it into []string before mapping. Authelia/Keycloak emit arrays and some custom setups emit strings, so handle all three shapes.
Password-form bypass for OIDC users via /api/auth/login (JSON) — same rejection rule applies, not just the HTML form.

Acceptance

An OIDC user with roles: ["rm-admins"] can sign in, becomes an admin, is visible in /settings/users with an oidc chip
Same user signing in again resolves to the same row (no duplicate)
Same user with roles: ["something-else"] is denied, lands on /login?oidc_error=no_role_match with a banner, no row created
OIDC user can't password-login through /login or /api/auth/login
Admin disables an OIDC user → next OIDC login is rejected and the existing session is bounced (via the existing disable-mid-session handling)
Sign out as an OIDC user → 303 to IdP's end-session URL (when advertised); no end-session URL → 303 to /login
OIDC config absent → password login works exactly as today (zero behavioural change)
Username collision: a local alice exists, OIDC user with preferred_username=alice and a different sub → blocked at sign-in with the clear error page
Last-admin guard refuses to demote the only enabled admin even if the IdP's role mapping says otherwise
All existing tests pass; new test suite covers the four claim-resolution branches and logout


P4-05 — OIDC Login Design