178 Commits

Author SHA1 Message Date
steve c446ca072e ui(alerts): make Acknowledge vs Resolve distinction visible
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 37s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 23s
CI / Test (linux/amd64) (pull_request) Successful in 3m55s
Both buttons make the row leave the Open tab, so on a quiet system
they look identical. The behavioural difference only manifests next
time the underlying condition fires:

  - Acknowledge silences fan-out while the problem persists; the
    alert parks on the Acknowledged tab and recurrences just touch
    last_seen_at without re-notifying.
  - Resolve closes the alert. If the same condition fires again
    later, a fresh alert with a new id raises and the channels
    fan out as if it were the first time.

Add a one-line legend under the page header explaining both, and
title= tooltips on each button covering the same ground for keyboard
and assistive tech.
2026-05-04 23:11:46 +01:00
steve 84814b1386 Merge pull request 'Phase 3 — Alerts: per-source-group dedup' (#8) from p3-alerts-dedup into main
CI / Build (windows/amd64) (pull_request) Successful in 23s
CI / Build (linux/amd64) (pull_request) Successful in 23s
CI / Build (linux/arm64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 1m22s
CI / Test (linux/amd64) (pull_request) Successful in 1m28s
Reviewed-on: #8
2026-05-04 22:11:08 +00:00
steve a45c801884 feat(alerts): per-source-group dedup so two failing backups produce two alerts
Until now the open-alert key was (host_id, kind, resolved_at IS NULL).
A host with two source groups both failing collapsed onto one
backup_failed row: the second failure bumped last_seen_at and
overwrote the message but never fanned out again. Operators saw one
alert that appeared to flap, not two distinct broken things.

Schema changes (column-level ALTER, no rebuild):

- 0015 jobs.source_group_id (FK → source_groups, ON DELETE SET NULL,
  index). Populated for backup jobs in CreateJob.
- 0016 alerts.dedup_key (NOT NULL DEFAULT ''). The old alerts_open
  partial index gets dropped and replaced with a UNIQUE partial
  index on (host_id, kind, dedup_key) WHERE resolved_at IS NULL —
  the index is now the actual dedup primitive.

Plumbing:

- RaiseOrTouch / AutoResolve / Alert struct gain dedup_key.
- engine.JobFinishedEvent gains SourceGroupID; handleJobFinished
  passes it through for backup_failed only (forget/prune/check stay
  repo-scoped with key='').
- ws.handler reads SourceGroupID off the freshly-loaded job row.
- dispatchJobWithPayload gains a *string sourceGroupID arg; the
  per-group Run-now path and schedule.fire path pass &g.ID.

Test coverage: TestRaiseOrTouchDedupsPerSourceGroup proves two
distinct groups produce two distinct open alerts and that resolving
one does not auto-resolve the other.

Dev tool: cmd/_fake_alert gains -dedup-key flag.
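
As a rough illustration of the dedup rule above, the following in-memory sketch keys open alerts on (host_id, kind, dedup_key), the same columns as the UNIQUE partial index. The alertStore type, its fields, and the integer ids are hypothetical stand-ins for the real store, not the project's code.

```go
package main

import "fmt"

// openKey mirrors the columns of the UNIQUE partial index:
// (host_id, kind, dedup_key) for rows where resolved_at IS NULL.
type openKey struct {
	HostID   string
	Kind     string
	DedupKey string
}

// alertStore is a hypothetical in-memory stand-in for the alerts table.
type alertStore struct {
	nextID int
	open   map[openKey]int // open alert id per key
	seen   map[int]int     // raise/touch count per alert id
}

func newAlertStore() *alertStore {
	return &alertStore{open: map[openKey]int{}, seen: map[int]int{}}
}

// RaiseOrTouch returns the alert id and whether a new row was inserted.
// Only an insert should trigger notification fan-out.
func (s *alertStore) RaiseOrTouch(k openKey) (int, bool) {
	if id, ok := s.open[k]; ok {
		s.seen[id]++ // stands in for bumping last_seen_at
		return id, false
	}
	s.nextID++
	s.open[k] = s.nextID
	s.seen[s.nextID] = 1
	return s.nextID, true
}

// Resolve closes the open alert for one key and leaves other keys untouched.
func (s *alertStore) Resolve(k openKey) { delete(s.open, k) }

func main() {
	s := newAlertStore()
	a, _ := s.RaiseOrTouch(openKey{"host-1", "backup_failed", "group-a"})
	b, _ := s.RaiseOrTouch(openKey{"host-1", "backup_failed", "group-b"})
	_, insertedAgain := s.RaiseOrTouch(openKey{"host-1", "backup_failed", "group-a"})
	fmt.Println(a != b, insertedAgain)
}
```

Repo-scoped kinds (forget/prune/check) would use an empty DedupKey in this model, so they keep collapsing onto one open row per host and kind.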
2026-05-04 22:59:48 +01:00
steve 7792aadb94 Merge pull request 'Phase 3 — Alerts (P3-05/06/07)' (#7) from p3-alerts into main
Reviewed-on: #7
2026-05-04 21:51:16 +00:00
steve 2eac324cec chore: ignore cmd/_* dev binaries + Tailwind rebuild
CI / Build (windows/amd64) (pull_request) Successful in 21s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 1m13s
CI / Test (linux/amd64) (pull_request) Successful in 1m20s
cmd/_fake_alert and similar one-shot dev tools live under cmd/_*
where Go's build tooling skips them. Add an explicit gitignore line
so an accidental 'git add cmd/.' can't drag them into a release.

styles.css is the regenerated Tailwind output — picks up the new
ntfy basic-auth fields and the right-rail preview ids.
2026-05-04 22:49:46 +01:00
steve 3cdaee63d4 fix: payload-preview rail follows kind switcher
CI / Lint (pull_request) Successful in 32s
CI / Build (windows/amd64) (pull_request) Successful in 43s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Test (linux/amd64) (pull_request) Successful in 1m18s
CI / Build (linux/arm64) (pull_request) Successful in 43s
Right-rail preview was rendered server-side via {{if eq $f.Kind ...}},
so it stayed on whatever kind the page loaded with. Editing an SMTP
channel and flipping to ntfy in the picker left the email RFC 5322
sample on screen.

Render all three preview panels with id='preview-<kind>' (only the
matching one visible on first render) and toggle their .hidden class
in the kind-switcher JS alongside the field panels. Same pattern
used for fields-<kind>.
2026-05-04 22:40:46 +01:00
steve 7f2a9964db fix: move channel delete-panel out of edit form (nested form bug)
CI / Build (windows/amd64) (pull_request) Successful in 21s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Lint (pull_request) Successful in 1m11s
CI / Test (linux/amd64) (pull_request) Successful in 1m22s
The delete-panel <form action='.../delete'> was nested inside the
main <form action='.../edit'>. HTML doesn't allow nested forms —
browsers parse the inner form as if it didn't exist, so clicking
'Delete permanently' submitted the outer edit form to /edit
instead of /delete, leaving the channel intact.

Move the delete-panel block to a sibling of the main form. The
'Delete channel…' button still toggles its visibility via JS, the
panel still renders inside the page layout, and now its form
actually posts to the delete handler.
2026-05-04 22:35:58 +01:00
steve feaeff217d feat(ntfy): support HTTP Basic auth alongside access tokens
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Lint (pull_request) Successful in 1m12s
CI / Test (linux/amd64) (pull_request) Successful in 1m18s
Self-hosted ntfy that doesn't expose a token-mint endpoint can still
authenticate over HTTP Basic. Add Username + Password fields to
NtfyConfig; the channel sends 'Authorization: Basic …' when token is
empty and username is set. Token wins when both are configured.

Form-side: two new optional fields next to the access token, with
the same write-only placeholder treatment as smtp_password (blank
on edit means 'keep stored value'). Username is round-tripped on
edit; password is masked.
2026-05-04 22:25:42 +01:00
steve cffad4b4f3 fix: enabled toggle — list-row click + edit-form save
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 24s
CI / Build (linux/arm64) (pull_request) Successful in 24s
CI / Lint (pull_request) Successful in 1m15s
CI / Test (linux/amd64) (pull_request) Successful in 1m36s
Two bugs in the channel-enabled affordance:

1. List-row toggle was a static span with no handler; the row's
   row-link overlay swallowed every click and routed to /edit. Add
   POST /settings/notifications/{id}/toggle backed by a new store
   method SetNotificationChannelEnabled, and turn the row toggle
   into an htmx-driven button that swaps in the new state. Use
   event.stopPropagation() on the toggle so it beats the row link.

2. Edit-form toggle visually flipped but the underlying checkbox
   reverted: the visual span lives inside the <label>, so clicking
   it fired the inline JS handler AND the label's native
   checkbox-toggle, cancelling out. Bind to the checkbox 'change'
   event instead and let the label do the toggling — the JS just
   mirrors check.checked into the .on class.
2026-05-04 22:21:45 +01:00
steve 84e121bb9c fix: read 'name' across all per-kind sub-forms when editing channels
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 38s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 22s
CI / Test (linux/amd64) (pull_request) Successful in 2m39s
The channel form has three inputs all named 'name' (one per kind
section: webhook / ntfy / smtp), but only the visible kind's input
is filled in. PostForm.Get returns the first regardless of
emptiness, so editing an ntfy or smtp channel always read '' from
the (hidden, unfilled) webhook section's name input and rejected
with 'name required'.

Add firstNonEmpty helper that scans the slice for the first
non-blank value. Same flavour of bug as the enabled checkbox fix
in 6466f8c — both fall out of having multiple inputs share a name
across the per-kind sub-forms.
2026-05-04 22:16:59 +01:00
steve c5b884a22b tasks: tick P3-05/06/07 + Playwright sweep notes
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 32s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI / Test (linux/amd64) (pull_request) Successful in 3m44s
Sweep against the live smoke env confirmed the alerts subsystem
end-to-end: three channels (webhook → local sink, ntfy → ntfy.sh,
SMTP → MailHog) created and verified via the Test button; synthetic
critical raised; ack + resolve fan out alert.acknowledged /
alert.resolved across all three; dashboard banner appears and
clears; nav badge tracks open count.

Three real bugs found and fixed mid-sweep — see preceding three
commits for the full reasoning.
2026-05-04 21:01:34 +01:00
steve 3d99306cea fix: refresh hosts.open_alert_count on Raise/Resolve/AutoResolve
The denormalised projection was never written by the alerts code
path, so the dashboard's OPEN ALERTS card and the per-host alerts
column always read 0 regardless of how many alerts were open.
fleet.GetStats sums hosts.open_alert_count; if it never moves, the
card is decoration.

Add refreshHostOpenAlertCount that recomputes from the alerts table
(self-healing — no +/- bookkeeping to drift). Call it after the
commit in RaiseOrTouch when a row was inserted, after Resolve, and
after AutoResolve.

Caught during the live sweep: a synthetic critical raised the count
to 1, but resolving it left the dashboard reading '1 unresolved'
indefinitely.
2026-05-04 21:01:17 +01:00
steve 6466f8c759 fix: read enabled checkbox correctly when paired with hidden=0 sibling
The notification channel form has a <input hidden name=enabled value=0>
plus a <input checkbox name=enabled value=1> so unchecking the box
still submits 'enabled=0' (otherwise the field would just be absent).
But Go's url.Values.Get returns the FIRST value, so even when the
checkbox is ticked the handler read '0' and persisted enabled=false.

Scan r.PostForm["enabled"] for any '1' instead. Caught during the
sweep — all three test channels saved with enabled=0 even though
the toggle visually rendered ON.
2026-05-04 21:00:54 +01:00
steve 9be3cead8e fix: dispatch alert.acknowledged + alert.resolved on UI ack/resolve
Spotted during the live Playwright sweep: clicking Acknowledge or
Resolve updated the alert row but never fanned out a notification.
The handlers went straight to Store.Acknowledge/Resolve, bypassing
the hub.

Add Engine.Acknowledge and Engine.Resolve that wrap the store call
and dispatch the matching event to every enabled channel. The UI
handlers prefer the engine path when wired, and fall back to the
direct store call so unit tests that construct a Server without an
engine still work.

Use context.WithoutCancel for the goroutine dispatch — the request
context is cancelled the instant the handler returns 204, so the
naive 'go e.hub.Dispatch(ctx, ...)' was racing the response and
losing the channel-list query with 'context canceled'.
2026-05-04 21:00:44 +01:00
steve ee410fcf95 alert: construct + run engine; expose hub to handlers
- Construct notification.NewHub and alert.NewEngine at boot in cmd/server/main.go
- Start go alertEngine.Run(ctx) after construction, before the HTTP listener
- Wire AlertEngine and NotificationHub into rmhttp.Deps (fields already existed)
- Remove the TODO(G1) in the offline sweeper; now calls NotifyHostOffline per ID
2026-05-04 20:32:10 +01:00
steve e0fbb8c980 ui: dashboard crit-alerts banner 2026-05-04 20:29:49 +01:00
steve 371fe734f3 ui: /settings/notifications list + edit form (3 kinds)
Add settings.html (shell + sub-tab nav + conditional list/edit body),
notifications.html and notification_edit.html (glob stubs), and the
supporting CSS tokens (.ch-row, .ch-icon, .toggle, .kind-grid,
.kind-card, .radio-pip, .test-pill) to input.css. Rebuild styles.css.
Add ui_parse_test.go to catch template regressions at test time.

The kind picker is JS-driven (no full page reload); the enabled toggle
mirrors the existing visual toggle pattern; the test-notification button
uses HTMX and renders the JSON response as a coloured pill client-side.
2026-05-04 20:25:06 +01:00
steve d373d19647 ui: F1 — populate OpenAlerts in baseView so nav badge updates everywhere
Flagged in review of cd38b40: the Alerts tab badge should show the
open count from any page, not just /alerts. baseView now takes the
request and queries store.ListAlerts(Status: "open") to fill
view.OpenAlerts on every page render. All call sites updated.
2026-05-04 20:19:09 +01:00
steve cd38b40516 ui: alerts list page + alert row partial + nav badge 2026-05-04 20:15:01 +01:00
steve de6939b3f6 http: /settings/notifications CRUD + test endpoint 2026-05-04 20:06:45 +01:00
steve 873821b871 http: /alerts list + ack/resolve handlers + /api/alerts JSON 2026-05-04 19:59:24 +01:00
steve 8c42b00228 alert: wire engine into ws hello + MarkJobFinished + offline sweep
- ws.HandlerDeps gains an AlertEngine *alert.Engine field; populated
  from http.Deps.AlertEngine (nil until G1 constructs the engine)
- runAgentLoop calls NotifyHostOnline after MarkHostHello succeeds
- dispatchAgentMessage MsgJobFinished case calls NotifyJobFinished,
  looking up the job Kind via Store.GetJob before notifying
- store.MarkHostsOfflineStaleReturnIDs added: SELECT+UPDATE in one
  transaction, returns the IDs that flipped to offline
- offline sweeper in cmd/server/main.go switched to the new variant;
  TODO(G1) comment marks where NotifyHostOffline calls will land
2026-05-04 19:54:39 +01:00
steve cb4695e09a alert: rule logic for the six v1 rules 2026-05-04 19:50:33 +01:00
steve f38930e2e6 alert: engine skeleton + event channels 2026-05-04 19:47:09 +01:00
steve 16e71a0708 notification: Hub fan-out + log writer 2026-05-04 19:44:31 +01:00
steve a6ac9ee71d notification: smtp channel 2026-05-04 19:40:21 +01:00
steve a99864c649 notification: B3 — Content-Type header + URL trim
Fixes flagged in spec review of f0a323e: ntfy POSTs need explicit
Content-Type: text/plain (the spec calls for it; ntfy works without
but explicit beats inferred); trim trailing slashes from server URL
to avoid double-slash when operators paste 'https://ntfy.sh/'.
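
The two fixes above might be sketched like this. The newNtfyRequest helper and its parameters are hypothetical, and the sketch assumes the publish URL is the server URL followed by the topic.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newNtfyRequest builds a publish request for topic on server.
// Trailing slashes on server are trimmed so the joined URL never
// contains a double slash, and Content-Type is set explicitly.
func newNtfyRequest(server, topic, body string) (*http.Request, error) {
	endpoint := strings.TrimRight(server, "/") + "/" + topic
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "text/plain")
	return req, nil
}

func main() {
	req, err := newNtfyRequest("https://ntfy.sh/", "alerts", "backup failed")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String(), req.Header.Get("Content-Type"))
}
```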
2026-05-04 19:38:16 +01:00
steve f0a323ef91 notification: ntfy channel 2026-05-04 19:35:50 +01:00
steve c22fb24f5b notification: webhook channel 2026-05-04 19:33:29 +01:00
steve 6688b3f88a notification: payload + Channel interface 2026-05-04 19:31:27 +01:00
steve 69fc89143d store: notification_channels CRUD + AppendNotificationLog 2026-05-04 19:28:41 +01:00
steve b5a0aa4667 store: alerts CRUD with dedup + last_seen_at bump 2026-05-04 19:24:17 +01:00
steve f24dfa5214 store: migration 0014 — notification_channels + notification_log 2026-05-04 19:20:37 +01:00
steve 640b64710e store: A1 — check rows.Err() + Scan err in migrate_test
Code-quality nits flagged in review of e6d965d. Mirrors the existing
pattern in host_credentials_test.go.
2026-05-04 19:19:28 +01:00
steve e6d965d7a5 store: migration 0013 — alerts.last_seen_at 2026-05-04 19:16:59 +01:00
steve 4b70939ab5 docs: P3 alerts implementation plan 2026-05-04 19:00:18 +01:00
steve 518c29ddb3 docs: P3 alerts spec — add SMTP as first-class v1 channel
Post-brainstorm change after operator review: overnight-digest /
"don't ping me at 03:00, email me in the morning" use case is poorly
served by ntfy (push) and clumsy via webhook → email-gateway. SMTP joins
webhook + ntfy as the third v1 channel; Apprise stays deferred.

Spec updates:
- Decision 5 reworded: three channels in v1.
- Channel iface gains smtpChannel using net/smtp + crypto/tls. 10s
  timeout vs 5s for HTTP — STARTTLS handshake + DATA over a slow link
  legitimately needs the headroom.
- Migration 0014 CHECK now allows 'smtp'. New smtpConfig struct: host,
  port, encryption (starttls/tls/none), username, password (AEAD), from,
  to. One channel = one To-address; multi-recipient = multiple channels
  (keeps failure attribution per-recipient).
- Body shape documented: hardcoded subject pattern
  '[restic-manager] [<sev>] <host>: <kind>', Message-ID includes the
  alert id so threading groups raised → ack → resolved cleanly. Plain
  text only in v1.
- Encryption defaults to STARTTLS on 465/587; PLAIN auth over TLS, no
  XOAUTH2 yet (app passwords recommended for Gmail / M365).
- Test plan adds MailHog step in the Playwright sweep.
- Non-goals expanded: HTML emails, OAuth2/XOAUTH2, multi-recipient
  channels are explicitly out of v1.

Wireframe updates (_diag/p3-alerts-wireframe/wireframe.html):
- Kind picker grows from 2 cards to 3 (Webhook / Ntfy / SMTP @). SMTP
  gets the --ok green colour family so it visually separates from
  webhook (accent) and ntfy (warm).
- New SMTP variant section (3c): host+port+encryption row, user+pass
  row, from+to row, test result, plus right-rail email shape preview
  showing the RFC 5322 layout.
- Channel list grows a third row: 'overnight-digest · smtp://… →
  ops-overnight@example.com'.
2026-05-04 18:48:15 +01:00
steve 6165e34f6f docs: P3 alerts design spec
Phase 3 sub-spec covering the alerts engine, notification channels, and
UI (P3-05/06/07). Brainstorm ran 2026-05-04; all ten design decisions
locked before this spec was written.

Key decisions captured:

- Hardcoded rule set, no operator-tunable thresholds in v1. Six rules:
  backup_failed, forget_failed, prune_failed, check_failed,
  stale_schedule, agent_offline.
- Hybrid engine cadence: event hooks at MarkJobFinished + offline-sweeper
  for immediate triggers; one 60s ticker for stale-schedule detection +
  auto-resolution sweeps.
- Auto-resolve when underlying condition clears; manual Resolve any time;
  Acknowledge as a separate I-have-seen-it intermediate state that does
  NOT close the alert.
- v1 channels: native ntfy + webhook. Apprise + SMTP deferred. Channel
  scope is global only — no per-host or per-severity routing.
- Webhook payload is one stable JSON envelope shape across raised /
  acknowledged / resolved / test events; ntfy uses the standard publish
  format with severity → priority mapping.
- Per-channel Send Test Notification button hits the real send path with
  a synthetic info-severity event; inline green-tick / red-cross result.
- Dedup by (host_id, kind, resolved_at IS NULL); last_seen_at bumped on
  every confirming tick so the UI can render still happening · Ns ago
  without re-notifying.
- Top-level /alerts page; Settings shell with Notifications sub-tab.
  Per-host vitals Open alerts cell deep-links into filtered list.
- Best-effort fire-and-forget delivery with 5s timeout; failures logged
  to a new notification_log table but never retried. Alert row in the DB
  is the source of truth.

Migrations:
- 0013 adds alerts.last_seen_at (column-level ALTER per CLAUDE.md)
- 0014 adds notification_channels + notification_log tables

Wireframe: _diag/p3-alerts-wireframe/wireframe.html
2026-05-04 18:39:26 +01:00
steve 64861a5fb8 Merge pull request 'Phase 3 — Restore (P3-X1, X2, 01, 02, 03, 09, X3-X6)' (#6) from p3-restore into main
Reviewed-on: #6
2026-05-04 17:06:18 +00:00
steve 28d5043eb0 test: lock-protect fakeSender so -race CI passes
CI / Lint (pull_request) Successful in 31s
CI / Build (linux/amd64) (pull_request) Successful in 20s
CI / Build (linux/arm64) (pull_request) Successful in 19s
CI / Test (linux/amd64) (pull_request) Successful in 1m27s
CI / Build (windows/amd64) (pull_request) Successful in 1m34s
The CI runs go test with -race; the agent runner has two pump goroutines
(pumpStdout + pumpStderr) writing through the sender concurrently, and
the unprotected fakeSender slice append raced. The cancel_test had a
local 'safeSender' workaround for the same issue; promote that mutex
onto fakeSender itself so every test in the package is race-clean
without per-test variants.

- fakeSender grows mu sync.Mutex; Send takes/releases. New snapshot()
  helper for tests that want a stable copy.
- cancel_test drops its local safeSender + sync import; uses fakeSender.

Verified: go test -race ./... passes across all packages.
2026-05-04 18:01:35 +01:00
steve e4031d26fa P3 wrap: agent auto-creates restore target; tasks.md ticked
CI / Lint (pull_request) Successful in 35s
CI / Build (linux/amd64) (pull_request) Successful in 20s
CI / Build (windows/amd64) (pull_request) Successful in 1m18s
CI / Build (linux/arm64) (pull_request) Successful in 46s
CI / Test (linux/amd64) (pull_request) Failing after 2m46s
1. Agent-side MkdirAll on the new-dir restore target. Restic creates
   missing leaves but won't traverse multiple missing levels, and
   under the systemd sandbox writes outside ReadWritePaths fail
   anyway. Calling os.MkdirAll(target, 0700) before invoking restic
   means the operator never has to pre-create the per-job subdir,
   and a path the sandbox rejects surfaces as a clean
   'restic restore: prepare target ...: read-only file system' error
   in the job log instead of a cryptic restic-side stat failure.

2. tasks.md Phase 3 — Restore section refreshed:
   - P3-X4 added (job log download dropdown — txt + ndjson)
   - P3-X5 added (UK lint locale switch + 73-correction sweep)
   - P3-X6 added (SIZE/FILES tooltip when host's restic < 0.17)
   - P3-03 entry expanded to cover version-gated --no-ownership,
     editable target, $HOME expansion, agent-side MkdirAll
   - As-shipped sweep summary mentions custom-target restore +
     download dropdown + tooltip in addition to the original walk

Test: TestRunRestoreNewDirAutoCreatesTarget seeds a multi-level
target the operator hasn't created and confirms RunRestore mkdir's
the chain before invoking restic.
2026-05-04 17:51:34 +01:00
steve 02250670c1 ui: snapshots SIZE/FILES tooltip when host's restic is < 0.17
Per-snapshot size + file-count come from the embedded summary block
restic added to 'snapshots --json' in 0.17 (the source comment in
internal/restic/snapshots.go incorrectly said 0.16+). Hosts running
0.16.x leave those columns blank.

- Fix the snapshots.go doc comment: '0.16+' -> '0.17+'.
- hostDetailPage carries a LegacyRestic bool computed from the host's
  reported ResticVersion via Env.AtLeastVersion(0, 17). Empty version
  also counts as legacy (conservative default).
- Template attaches title='Needs restic 0.17+ on the agent host. This
  host runs <ver>.' + cursor:help on the SIZE / FILES headers when
  the flag is true. Hosts already on 0.17+ get no tooltip and no
  extra styling.

A host upgrading restic to 0.17+ gets the columns populated on the
next backup automatically — no further code change needed.
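
A rough sketch of a version gate like the one described above. The real check is a method, Env.AtLeastVersion(0, 17); this standalone atLeastVersion function, its parsing rules, and its tolerance of a leading "v" are assumptions. It treats empty or non-numeric versions as not meeting the minimum, matching the conservative default in the message.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeastVersion reports whether version is at least major.minor.
// Empty or unparseable versions return false (treated as legacy).
func atLeastVersion(version string, major, minor int) bool {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false
	}
	if gotMajor != major {
		return gotMajor > major
	}
	return gotMinor >= minor
}

func main() {
	for _, v := range []string{"0.16.4", "0.17.0", "0.17.3", ""} {
		legacy := !atLeastVersion(v, 0, 17)
		fmt.Printf("%q legacy=%v\n", v, legacy)
	}
}
```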
2026-05-04 17:45:32 +01:00
steve 8e06bc7924 ui: tidy job-page download into a single dropdown
Replace the floating 'Download log' button + bare '.ndjson' link with
one cohesive dropdown menu — same affordance as the rest of the
header, opens to two well-described options.

- Native <details><summary> for keyboard + no-JS support; only the
  click-outside-to-close handler is JS (a few lines).
- New .dropdown / .dropdown-menu / .dropdown-item tokens in
  web/styles/input.css. Reusable for future header menus
  (host-detail overflow, source-group action menus, etc).
- Chevron flips 180 degrees when open via .dropdown[open] selector.
- Each option has a label + a mono hint line explaining when to pick it
  (.txt for humans / paste into a ticket; .ndjson for jq / tooling).
2026-05-04 17:36:57 +01:00
steve f0dfa689fe P3 follow-up: editable target dir, conditional --no-ownership, UK lint
Three small follow-ups from review:

1. Restore target is now operator-editable. Default value is the
   literal '$HOME/rm-restore/<job-id>/' (agent expands $HOME at
   run time using os.UserHomeDir(); also handles ${HOME} and ~/
   prefixes). Operator can replace with any absolute path.
   - ui_restore.go validates the input is either absolute or starts
     with one of the recognised prefixes; other env-var refs ($PATH
     etc.) are deliberately rejected so operator paths can't pick up
     arbitrary agent env values.
   - host_restore.html replaces the read-only mono-text display with
     a real <input>; help text spells out that $HOME resolves
     agent-side and <job-id> is substituted on dispatch.
   - install.sh + the systemd unit prep /root/rm-restore so the
     default works under the sandbox: ReadWritePaths gains a soft
     '-/root/rm-restore' entry (the '-' makes the bind-mount soft-fail
     if missing, but install.sh pre-creates it root-owned 0700).

2. --no-ownership flag now gated on restic version. The flag was
   added in restic 0.17 and 0.16 rejects it. Previously dropped it
   wholesale — that meant new-dir restores silently preserved
   ownership against design intent on 0.17+. Now the agent threads
   its detected restic version (sysinfo already collects it) through
   runner.Config -> restic.Env, and RunRestore appends --no-ownership
   only when AtLeastVersion(0, 17) returns true. 0.16 hosts still
   restore with original uid/gid; help text in the wizard explicitly
   notes this. The previous 'Original ownership is preserved' copy
   was wrong for new-dir mode and is corrected.

3. golangci-lint misspell locale switched US -> UK and the codebase
   swept (73 corrections, mostly behaviour/serialise/recognise/honour).
   Wire-format ErrorCode 'unauthorized' -> 'unauthorised' is a tiny
   contract change but the agent doesn't parse those codes today and
   no external API consumers exist yet. Tests passed before + after.

Tests:
- internal/restic/version_test.go covers Env.AtLeastVersion across
  edge cases (empty, exact match, patch above, minor below, non-
  numeric) and expandHome on $HOME / ${HOME} / ~/, plus
  pass-through for absolute paths and refusal of other env vars.
- ui_restore_test updated: TargetDir now starts '$HOME/rm-restore/'
  with the job_id substituted into the placeholder.

Live verified on the smoke env: default target restored to
/root/rm-restore/<job-id>/ as the agent's expanded $HOME (2 files,
14 bytes); custom override '/tmp/custom-restore/<job-id>/' restored
into the agent's PrivateTmp namespace (1 file, 6 bytes); both jobs
'succeeded', exit 0.
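
The home-prefix handling in item 1 might be sketched as follows. This version takes the home directory as a parameter for testability, where the agent is described as using os.UserHomeDir(); the signature, the error value, and the use of slash-separated paths are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

var errUnsupportedPath = errors.New("path must be absolute or start with $HOME, ${HOME} or ~/")

// expandHome resolves the three recognised home prefixes against home.
// Absolute paths pass through unchanged; anything else, including other
// environment variable references, is refused rather than expanded.
func expandHome(p, home string) (string, error) {
	for _, prefix := range []string{"${HOME}/", "$HOME/", "~/"} {
		if rest, ok := strings.CutPrefix(p, prefix); ok {
			return path.Join(home, rest), nil
		}
	}
	if path.IsAbs(p) {
		return p, nil
	}
	return "", errUnsupportedPath
}

func main() {
	for _, p := range []string{"$HOME/rm-restore/job-1/", "~/out", "/tmp/custom", "$PATH/x"} {
		got, err := expandHome(p, "/root")
		fmt.Printf("%-24s -> %q err=%v\n", p, got, err)
	}
}
```

Note that path.Join cleans its result, so a trailing slash on the input is dropped in this sketch.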
2026-05-04 17:27:52 +01:00
steve a2398d0b66 P3 follow-up: log download (txt + ndjson) on the live job page
The diff job's full output streams to the standard live job log page,
which can be a lot of text the operator wants to grep through or paste
into a ticket. Add a Download button.

Source of truth is the persisted job_logs table — works any time
(running or finished) and doesn't need to pause the live WS stream.
The download is 'everything the server has up to right now'; if the
operator wants a fuller snapshot of a still-running job, they hit
Download again.

- New endpoint GET /api/jobs/{id}/log.{txt,ndjson} (chi {format}
  matcher constrained to the two known suffixes). Auth via session
  cookie. 404 on unknown job.
- internal/server/http/job_download.go writeLogsText emits a small
  header + 'HH:MM:SS.mmm  TAG  payload' rows mirroring what the live
  page shows. writeLogsNDJSON emits one self-contained {seq,ts,stream,
  payload} JSON object per line — appending stays valid (each line
  stands alone), and the whole file pipes cleanly into jq. NDJSON is
  newline-delimited JSON; not the same as a JSON array.
- web/templates/pages/job_detail.html grows two header buttons:
  'Download log' (txt) + '.ndjson' ghost variant for tooling.

Tests cover the txt format (header + per-row shape), the ndjson
format (each line round-trips through json.Unmarshal), unknown job
404, unauthenticated 401.
2026-05-04 17:12:45 +01:00
steve e22b41d452 P3 sweep fixes: snap-row CSS, tree expand, --no-ownership drop, target path
Bug fixes from the Playwright sweep against the live smoke server:

1. Snapshot-picker layout. The .snap-row class was used in the wireframe
   but never landed in web/styles/input.css; rows rendered as vertical
   blocks instead of a 6-column grid. Added the token (mirrors host-row
   shape with restore-specific column widths).

2. Tree expansion. hx-target='closest .tree-row + .tree-children' isn't
   a valid HTMX selector — modifiers don't chain. Replaced HTMX-driven
   expansion with a small window.__rmTreeToggle helper that uses plain
   fetch + .tree-pair wrapper structure for trivial sibling lookup.
   Caches loaded state per node.

3. --no-ownership flag dropped. Restic 0.17 introduced --no-ownership;
   0.16 rejects it ('unknown flag') before doing any work. Since the
   agent runs as root in the systemd unit, restored files keep their
   original uid/gid either way and the parent dir is root-owned, so
   the 'cp without sudo' rationale doesn't hold. Drop the flag entirely.

4. Default target dir moved to /var/lib/restic-manager/restore. The
   systemd unit pins ReadWritePaths to /etc/restic-manager +
   /var/lib/restic-manager (with ProtectSystem=strict making the rest
   of /var read-only); writes to /var/restic-restore failed with
   'read-only file system'.

5. Confirm summary HTML escaping. defaultTarget JS literal evaluates
   to a string with literal angle brackets; insertion into innerHTML
   must escape them. Added an inline HTML-escape pass.

tasks.md ticked for the Restore sub-phase with a sweep summary
covering the live end-to-end test.
2026-05-04 15:57:42 +01:00
steve 1111124573 P3-09 + P3-X3: snapshot diff + recent-restores line
P3-09 — snapshot diff dispatcher.
- POST /api/hosts/{id}/snapshots/diff (and the unprefixed HTMX-form
  variant) takes {snapshot_a, snapshot_b}, validates both belong to
  the host (long id / short id / prefix match), checks the agent is
  online, mints a JobDiff, ships command.run with DiffPayload, writes
  a host.snapshot_diff audit row, returns HX-Redirect to the live
  job page (or JSON {job_id, job_url} for REST callers).
- Two-snapshot guard: POSTing diff(a,a) returns 422.
- UI: small panel on the host_detail right rail (visible when the
  host has 2+ snapshots) with two short-id inputs and a Diff button.
  Output renders on the standard live job page where the operator
  reads the per-line diff text directly.

P3-X3 — recent-restores line.
- hostChromeData grows RestoreStatus / RestoreAt / RestoreJobID
  populated via store.LatestJobByKind(host_id, 'restore') (already
  exists, used by the init line).
- host_chrome.html renders a small line below the existing init-status
  one with status-coloured copy + a link to the job log. Hidden when
  no restore has ever run on this host.

Tests:
- diff_test covers happy path (correct DiffPayload + HX-Redirect),
  same-id rejection (422), unknown-id rejection (422). Adds a
  seedTwoSnapshots helper since ReplaceHostSnapshots is atomic-swap
  (calling seedSnapshot twice would only leave the second).

Restage block (CLAUDE.md) deferred to the end of the restore phase.
2026-05-04 15:38:28 +01:00
steve 6e47efc146 P3-01/02/03: restore wizard backend + templates + restore-shaped job page
End-to-end wizard from /hosts/{id}/restore (or per-snapshot deep link
/hosts/{id}/snapshots/{sid}/restore) → tree-browse → dispatch →
restore-shaped live job page.

Backend (internal/server/http/ui_restore.go):
- GET handlers render the four-step wizard against the wireframe shape
  in docs/superpowers/specs/2026-05-04-p3-restore-design.md.
- HTMX tree partial endpoint hits fetchTreeWithCache (P3-X2) so each
  directory expansion is a sub-second cached lookup after the first
  miss.
- POST validates: snapshot_id non-empty, ≥1 absolute path, in-place
  mode requires confirm_hostname == host name, agent online. On error
  re-renders the wizard with the operator's input intact. Happy path
  mints a job_id, computes the new-directory target as
  /var/restic-restore/<job-id>/ (operator can't escape the prefix —
  server picks it), creates the job row, ships command.run with
  kind=restore + RestorePayload, writes a host.restore audit row,
  returns HX-Redirect (or 303) to the live job page.
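
The POST validation rules above could be sketched roughly as follows.
restoreForm, validateRestore and restoreTarget are hypothetical names,
and "absolute" is assumed to mean a POSIX-style leading slash.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// restoreForm is a hypothetical projection of the wizard's form fields.
type restoreForm struct {
	SnapshotID      string
	Paths           []string
	InPlace         bool
	ConfirmHostname string
}

// validateRestore applies the checks listed in the commit message,
// except the agent-online check, which needs the live hub.
func validateRestore(f restoreForm, hostName string) error {
	if strings.TrimSpace(f.SnapshotID) == "" {
		return errors.New("snapshot_id is required")
	}
	if len(f.Paths) == 0 {
		return errors.New("select at least one path")
	}
	for _, p := range f.Paths {
		if !strings.HasPrefix(p, "/") {
			return fmt.Errorf("path %q is not absolute", p)
		}
	}
	if f.InPlace && f.ConfirmHostname != hostName {
		return errors.New("confirm_hostname does not match the host name")
	}
	return nil
}

// restoreTarget derives the new-directory target from the server-minted
// job id; the operator never supplies any part of this path.
func restoreTarget(jobID string) string {
	return path.Join("/var/restic-restore", jobID) + "/"
}

func main() {
	f := restoreForm{SnapshotID: "a1b2c3d4", Paths: []string{"/etc/hosts"}}
	fmt.Println(validateRestore(f, "web-01"), restoreTarget("01HXAMPLE"))
}
```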

Templates:
- host_restore.html: single-page progressively-enabled wizard matching
  _diag/p3-restore-wizard wireframe. Form-state-driven JS computes a
  running tally of selected paths and the step-4 confirm summary
  client-side; the server re-renders on validation failure with form
  fields preserved.
- partials/tree_node.html: recursive HTMX-served tree fragment.
- Top-level Restore button on host_detail right rail + per-snapshot
  Restore action on snapshot rows replace the previous P3-stub.

Restore-shaped job page (job_detail.html):
- Progress widget rendered as a panel rather than a bare strip when
  the job is active.
- Current-file display under the bar, updated from log.stream stdout
  lines that look like absolute paths. Hidden for non-restore kinds.

Migration 0012:
- Add restore + diff to the jobs.kind CHECK. Rebuild required (SQLite
  can't ALTER CHECK in place); follows the safe pattern from 0005.
  Defensive: stash job_logs into a temp table before the rebuild and
  INSERT OR IGNORE back afterwards so even if SQLite cascades on
  DROP TABLE jobs the log history survives.

Tests:
- ui_restore_test covers GET step-1 render, GET pre-selected snapshot
  summary card, POST missing snapshot, POST missing paths, POST
  in-place wrong-hostname rejection (no command.run leaks to the
  agent), POST happy path (HX-Redirect + correct payload + audit
  row), POST against offline host returns 503.

Restage block (CLAUDE.md) deferred to the end of the restore phase.
2026-05-04 15:34:29 +01:00
steve 265b4b6c5d P3-03: restic restore + diff execution path
Wires JobRestore and JobDiff end-to-end at the agent layer (the wizard
backend that drives this lands in the next slice).

- internal/api: JobRestore + JobDiff JobKind constants. CommandRunPayload
  grows nullable Restore + Diff sub-payloads. RestorePayload carries
  snapshot_id, paths, in_place, target_dir; DiffPayload carries
  snapshot_a + snapshot_b.
- internal/restic.RunRestore wraps 'restic restore <sid> --target ...
  [--no-ownership] [--include p]...' with --json. New pumpRestoreStdout
  parses the per-line status / summary objects (drops raw status from
  log.stream — the throttled job.progress envelope covers it). New
  RestoreStatus + RestoreSummary types mirror restic's wire shape.
- internal/restic.RunDiff wraps 'restic diff --json <a> <b>'.
- internal/agent/runner: RunRestore translates RestoreStatus into
  job.progress (mapping FilesRestored → FilesDone etc) with a small
  estimateETA helper since restic doesn't provide ETA for restore.
  RunDiff is a thin streamHandler wrapper.
- cmd/agent dispatcher gains JobRestore + JobDiff cases. Both reuse
  the spawn() helper from P3-X1 so cancel just works.
- Drive-by fix: lastProgress was initialised to time.Now() so the
  very first status event was suppressed by the 1s throttle if the
  agent reported quickly. Initialise to time.Time{} (zero) so the
  first event always emits. Affects backup + restore.

Tests:
- restore_test covers restore happy path (started → progress →
  finished, kind=restore on the started envelope), in-place argv
  asserts no --no-ownership, new-dir argv asserts --no-ownership +
  --target + --include, diff produces the expected log.stream lines.

Restage block (CLAUDE.md) is deferred to the end of the restore
sub-phase so we restage once with all changes.
2026-05-04 15:24:14 +01:00
steve 6d295bc9f6 P3-X2: tree.list synchronous WS RPC + per-session cache
Foundational for the restore wizard's tree browser. The wizard needs to
lazy-load directory contents from a snapshot as the operator drills
down; this lands the transport.

- internal/api adds MsgTreeList (server → agent) + MsgTreeListResult
  (agent → server) with TreeListRequestPayload / TreeListEntry /
  TreeListResultPayload types. Reply correlates by Envelope.ID.
- internal/restic.ListTreeChildren wraps 'restic ls --json' and
  filters its recursive output to direct children of the requested
  path. Parser + path-normalisation + isDirectChild are unit-tested.
- internal/server/ws/rpc.go introduces a generic SendRPC helper on
  Hub: register a buffered channel keyed by ULID, send the request,
  block on ctx.Done()/timeout/reply. Reply routing piggybacks on the
  existing dispatchAgentMessage by adding a MsgTreeListResult case
  that forwards to the registered waiter; if no waiter is registered
  (caller already gave up) the stray reply is dropped quietly.
- cmd/agent gains a tree.list handler that runs ListTreeChildren on a
  fresh per-call context (60s ceiling) and ships the matching
  tree.list.result envelope. Errors surface in result.Error rather
  than as transport failures so the server-side waiter can render a
  sensible UI message.
- internal/server/http/tree_cache.go is the per-wizard-session cache
  layer (~30min TTL, sweep-on-access) that fetchTreeWithCache uses
  before falling through to SendRPC. Cached on success only; agent
  errors aren't cached so a transient failure doesn't poison the
  session.

Tests:
- internal/restic/ls_test.go covers parseLsChildren at root / mid-tree
  / leaf, plus normalizeTreePath and isDirectChild edge cases.
- internal/server/ws/rpc_test.go unit-tests the registry: round-trip,
  release semantics, concurrent waiters, ctx-cancel.
- internal/server/http/tree_rpc_test.go is the full round-trip: server
  SendRPC → fake-agent over a real WS → reply → server gets the
  payload. Plus a timeout test that confirms ~300ms timeouts terminate
  in ~300ms rather than waiting forever.

The cache is plumbed but no UI handler hits fetchTreeWithCache yet —
that lands with P3-01 (wizard backend). The unused-linter is suppressed
via nolint until the wizard wires it in.
2026-05-04 15:19:22 +01:00
steve 9fa2ef48f0 P3-X1: cancel-job feature
Wires the existing job_detail Cancel button (which was a UI stub) into
real backend behaviour:

- internal/api already declared MsgCommandCancel + CommandCancelPayload;
  promote those from forward-declarations to a working envelope. Agent
  side: cmd/agent/main.go drops the TODO-stub and gains a per-job
  ctx.CancelFunc map. runJob's switch is refactored around a small
  spawn() helper so each kind's goroutine derives a per-job context,
  registers the cancel, and removes itself on completion regardless of
  outcome. command.cancel looks up the func and fires it.
- internal/agent/runner.sendFinished now takes ctx and rebadges
  ctx.Canceled errors as JobCancelled (exit 130) rather than
  JobFailed. All Run* call sites updated.
- internal/restic.resticCmd sets cmd.Cancel to send SIGTERM (via
  build-tagged sigterm constant; os.Kill on Windows since SIGTERM
  isn't deliverable there) and cmd.WaitDelay=5s for the SIGKILL
  fallback. SIGTERM lets restic remove its lock file before exiting.
- New POST /api/jobs/{id}/cancel server endpoint validates the job
  is non-terminal and the host is online, sends command.cancel via
  the hub, writes a job.cancel audit row, returns 202. The agent's
  resulting job.finished (status=cancelled) is what actually
  transitions the row.

Tests:
- internal/server/http/cancel_test.go covers happy path (envelope
  shape + audit row), 409 for terminal jobs, 404 for missing jobs,
  503 for offline hosts.
- internal/agent/runner/cancel_test.go covers cancel mid-run: a fake
  restic that exec'd into 'sleep 30' is canceled 150ms after start
  and the resulting job.finished reports JobCancelled with exit 130
  in well under the WaitDelay.

Foundational for P3 restore (operator needs to be able to cancel a
running backup if they need to restore urgently). Independently useful
for prune/check/backup that are stuck.
2026-05-04 15:11:49 +01:00
steve 454a2415dc docs: P3 restore design spec + scope-decompose Phase 3
Splits Phase 3 into three independently-shippable sub-phases (Restore,
Alerts, Audit UI) so they can land in separate PRs with their own brainstorm
→ spec → plan cycles. The Restore sub-phase is up first.

The brainstorm ran on 2026-05-04 and locked the following decisions:

- Single-host restore only this phase. P3-04 (cross-host restore) is moved
  to a new 'Future / unscheduled' section. Disaster recovery is already
  covered by re-enrolling a replacement host with the same repo creds; the
  remaining 'pull a file from host A onto host C' use case is genuinely
  different (file sharing / migration, not DR) and has no confirmed need.
- Default target is /var/restic-restore/<job-id>/ with --no-ownership;
  in-place restore preserves uid/gid/mode and is gated by typed-confirmation
  of the host name (mirroring the repo re-init danger zone).
- Tree browser is the path picker, lazy-loaded via a synchronous WS RPC
  (tree.list) over the existing correlation-ID infrastructure with a
  per-wizard-session in-memory cache (~30 min TTL).
- Single-page wizard with progressively-enabled sections; entry is a
  top-level Restore button on host detail (or per-snapshot Restore action
  for direct deep-link).
- Snapshot diff (P3-09) is a JobDiff JobKind, dispatched like every other
  agent operation; output streams to the standard live job log page.
- Restore-specific live job page variant with files-restored /
  bytes-restored / current-file widget.
- Single-flight per host across all kinds, plus a real cancel-job feature
  (command.cancel WS envelope, agent kills the restic subprocess via
  context cancel + SIGTERM/SIGKILL grace) so the operator can pre-empt a
  long-running backup if they need to restore urgently. Wires the existing
  job_detail Cancel button (which was a UI stub).
- Audit row host.restore on every dispatch + a recent-restores panel on
  host detail. Role gate deferred to P4-03 RBAC.

Wireframe at _diag/p3-restore-wizard/wireframe.html (gitignored —
transient design artefact); screenshot reviewed and approved 2026-05-04.
2026-05-04 15:02:32 +01:00
steve 0bd7a896c4 Merge pull request 'P2 completion (P2R-09/10/11/12/13/14, P2-16/17/18)' (#5) from p2-completion into main 2026-05-04 13:19:05 +00:00
steve bdabcfb68e docs: note Gitea repo + tea CLI in CLAUDE.md
CI / Build (windows/amd64) (pull_request) Successful in 19s
CI / Lint (pull_request) Successful in 21s
CI / Build (linux/amd64) (pull_request) Successful in 19s
CI / Build (linux/arm64) (pull_request) Successful in 19s
CI / Test (linux/amd64) (pull_request) Successful in 2m17s
2026-05-04 14:18:50 +01:00
steve c691dc8a56 tasks: tick P2 completion + Playwright sweep screenshots
CI / Build (windows/amd64) (pull_request) Successful in 20s
CI / Lint (pull_request) Successful in 41s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Test (linux/amd64) (pull_request) Successful in 53s
CI / Build (linux/arm64) (pull_request) Successful in 1m48s
P2R-09/10/11/12/13/14, P2-16/17/18 all marked done. Acceptance line
for Windows hosts annotated as 'compile-verified, untested in CI'.

_diag/p2-completion-sweep/ holds the dashboard + host-detail +
schedules + sources + repo + source-group-edit screenshots from a
clean sweep against :8080. Zero console errors throughout.

announce_test.go: rate-limit + global-cap subtests dropped t.Parallel
to avoid racing on the package-level tunables under -race.
2026-05-04 11:27:09 +01:00
steve 8ceb76c733 deploy: P2-17 install.ps1 (Windows installer)
Pwsh installer that detects arch, downloads
$Server/agent/binary?os=windows&arch=amd64 to
C:\Program Files\restic-manager\, runs the agent in -enroll-server
[+ -enroll-token] mode (token flow OR announce-and-approve), then
calls 'restic-manager-agent install' to register the SCM service.
Surfaces existing scheduled tasks named *restic* without disabling.

CLAUDE.md restage block updated to also stage install.ps1 alongside
install.sh.
2026-05-04 11:15:18 +01:00
steve d29475560d agent: P2-16 Windows service (SCM) integration
internal/agent/service: build-tagged into service_windows.go (svc.Handler
that listens for Stop/Shutdown + delegates to the agent loop) and
service_other.go (foreground stub for Linux/macOS). install_windows.go
wraps mgr.Connect+CreateService/Delete/Start/Stop for the new
'restic-manager-agent install|uninstall|start|stop' subcommands.

Cross-compile verified: GOOS=windows GOARCH=amd64 go build ./cmd/agent
succeeds. UNTESTED on Windows itself — the SCM round-trip can't be
exercised from Linux CI; treat as a starting point for the first
real Windows install.
2026-05-04 11:13:56 +01:00
steve bbdf631a01 ui+server: P2-18d pending hosts dashboard panel + expiry sweeper
Dashboard handler loads ListPendingHosts(now); template renders a
warn-bordered panel above the host table with hostname, OS/arch,
fingerprint (selectable / copyable), source IP, age, expiry. Each
row carries an inline accept form (repo URL/user/password) plus a
Reject button. cmd/server adds a 60s ticker calling
DeleteExpiredPendingHosts so 1h-stale rows drop off.
2026-05-04 11:11:32 +01:00
steve a3a53e3b87 agent: P2-18c announce-and-approve enrolment path
When -enroll-server is supplied without -enroll-token, the agent
mints (and persists) an Ed25519 keypair, POSTs /api/agents/announce,
prints the SHA256 fingerprint in a copy-friendly banner, opens
/ws/agent/pending, signs the server's nonce, and blocks until the
admin clicks Accept (1h ceiling). On accept, persists the bearer +
host_id from the 'enrolled' message; on reject (close code 4001)
exits with a clear error.

Repo creds are pushed via config.update on the first standard WS
hello (P1-32 path), not in the enrolled message itself.
2026-05-04 11:09:47 +01:00
steve 567561a6a3 server: P2-18b pending WS + admin accept/reject
GET /ws/agent/pending?pending_id=… runs an Ed25519 nonce-sign
handshake against the row's stored public key, then holds the
connection open. POST /api/pending-hosts/{id}/accept (admin)
mints a real Host row + bearer + AEAD-encrypted repo creds, pushes
the bearer down the open WS, deletes the pending row, and writes
a host.accept_pending audit entry. POST /api/pending-hosts/{id}/reject
closes the socket with code 4001 and audit-logs host.reject_pending.

In-memory pendingHub keyed by pending_id wires accept/reject to
their live socket.
2026-05-04 11:07:32 +01:00
steve a8e6c9d6d7 store+server: P2-18a announce-and-approve schema + endpoint
migration 0011 adds pending_hosts table (id, hostname, public_key,
fingerprint, expiry). store/pending_hosts.go covers full CRUD plus
hostname-collision count + expired-row sweeper.

POST /api/agents/announce takes {hostname, os, arch, agent_version,
restic_version, public_key (base64)}, returns {pending_id,
fingerprint, hostname_collision}. Per-source-IP token-bucket
rate limit (10/min) + global cap of 100 in-flight rows. Public
key must be exactly 32 bytes (Ed25519).
2026-05-04 11:03:41 +01:00
steve 1d3661470f ui: P2R-12 hook editor — source-group form + host-default Repo section
Source-group edit form gains pre/post hook textareas with a service-
user warning banner; bodies AEAD-encrypted on save (per-group AD).
Repo page adds a 'Host-default hooks' panel above the danger zone
with the same shape; saved via POST /hosts/{id}/repo/hooks.
2026-05-04 11:00:28 +01:00
steve 13c35b68d4 agent+server: P2R-11 pre/post hook execution for backup jobs
Agent: new runner.BackupHooks struct + runHook helper invoked via
/bin/sh -c (cmd.exe /C on Windows). pre_hook non-zero exit aborts
the backup; post_hook always runs with RM_JOB_STATUS=succeeded|failed
in env. Output streamed as 'hook(<phase>): …' log.stream lines.
Hooks only run for kind=backup (other kinds skip both phases).

Server: resolveBackupHooks resolves group → host default → empty,
decrypts via crypto.AEAD with per-slot ad bytes, plumbs plaintext
into CommandRunPayload for both schedule.fire and per-group
Run-now dispatch sites. Decrypt failures degrade silently to no
hook so a malformed blob can't poison every backup.
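
The group, then host default, then empty fallback can be pictured as
below. The hooks type and both helpers are hypothetical, decryption is
omitted, and resolving the pre and post slots independently is an
assumption.

```go
package main

import "fmt"

// hooks holds already-decrypted hook bodies; empty means "no hook".
type hooks struct {
	Pre  string
	Post string
}

// resolveHooks prefers the source group's hook and falls back to the
// host default per slot (assumed granularity).
func resolveHooks(group, hostDefault hooks) hooks {
	out := group
	if out.Pre == "" {
		out.Pre = hostDefault.Pre
	}
	if out.Post == "" {
		out.Post = hostDefault.Post
	}
	return out
}

// postHookEnv builds the extra environment entry the post hook sees.
func postHookEnv(backupSucceeded bool) string {
	if backupSucceeded {
		return "RM_JOB_STATUS=succeeded"
	}
	return "RM_JOB_STATUS=failed"
}

func main() {
	h := resolveHooks(hooks{Pre: "pg_dump > /tmp/db.sql"}, hooks{Post: "rm -f /tmp/db.sql"})
	fmt.Printf("%+v %s\n", h, postHookEnv(false))
}
```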
2026-05-04 10:57:28 +01:00
steve c20375eaf5 store: P2R-10 schema for source-group + host-default hooks (migration 0010)
Adds pre_hook/post_hook BLOB columns to source_groups and
pre_hook_default/post_hook_default to hosts. Bytes stored verbatim
(AEAD encrypt/decrypt happens at the HTTP layer where the AEAD key
lives). Round-trip tests cover set/clear semantics on both tables.
2026-05-04 10:52:16 +01:00
steve cce3cd8384 ui: P2R-09 auto-init UX — init line in chrome + danger-zone re-init
Latest 'init' job status surfaced under the host-detail vitals strip
(succeeded/failed/running/queued, with link to the live job log on
non-success). New POST /hosts/{id}/repo/reinit handler dispatches a
fresh init job after the operator types the host name to confirm;
audit row records 'host.repo_reinit'.
2026-05-04 10:49:57 +01:00
steve 93ab0ae84f ui+server: schedule next-run / last-run on dashboard + schedules tab
P2R-14. New store.LatestJobBySchedule query (per-schedule fired job).
Schedules-tab handler computes next-fire from cron + last-fire from
the jobs table per row. Schedules table grows two columns; dashboard
host row prepends 'next 12h ago/from now' to the existing last-backup
line when a single covering schedule is the run-now candidate.

Embeds store.Schedule into scheduleRow so existing template field
references keep working without bulk renames.
2026-05-04 10:44:31 +01:00
steve 6589f23313 ui+server: per-job bandwidth override on Run-now
P2R-13b. POST /hosts/{id}/source-groups/{gid}/run accepts optional
bandwidth_up_kbps / bandwidth_down_kbps form fields, plumbs them onto
CommandRunPayload. Agent dispatcher already prefers per-job override
over host-wide caps (T1). UI wraps the Run-now button in a form with
a <details> 'Limit bandwidth for this run' disclosure containing two
KB/s inputs.
2026-05-04 10:41:13 +01:00
steve ddc07609cb agent+server: apply host bandwidth caps to restic invocations
P2R-13a. restic.Env gains LimitUploadKBps/LimitDownloadKBps which are
emitted as global --limit-upload/--limit-download flags before the
subcommand on every invocation. Agent dispatcher tracks host-wide
caps received via config.update; server pushes them on hello and
after PUT /api/hosts/{id}/bandwidth.

Also extends api.CommandRunPayload with optional per-job overrides
(BandwidthUpKBps/Down + PreHook/PostHook); the override consumers
land in T2/T6.
2026-05-04 10:38:34 +01:00
steve 21d967a2cf plan: P2 completion (P2R-09/10/11/12/13/14, P2-16/17/18) 2026-05-04 10:33:34 +01:00
steve 24973bdc72 Merge pull request 'tasks: tasks.md sync left behind by PR #3 merge' (#4) from tasks-md-phase5-sync into main 2026-05-04 09:26:42 +00:00
steve cd510d2032 tasks: collapse Phase 5 header + fix P2R-03/04 cadence cross-refs
CI / Lint (pull_request) Successful in 19s
CI / Build (windows/amd64) (pull_request) Successful in 18s
CI / Build (linux/arm64) (pull_request) Successful in 18s
CI / Build (linux/amd64) (pull_request) Successful in 44s
CI / Test (linux/amd64) (pull_request) Successful in 1m23s
The Phase 5 section had drifted from the convention used by phases
1–4 (single section header carrying , no separate summary block).
Collapse to the existing pattern; fold the summary into a blockquote
sitting right under the header.

While there: P2R-03 and P2R-04 still carried forward-references
saying "cadence-driven dispatch lands in P2R-04 / P2R-05". Both
should point at P2R-06 (the maintenance ticker), not the next item
in the list. Updated descriptions to reflect what actually shipped:
LatestJobByKind anchor includes in-flight jobs, ForgetGroups
multi-group payload reshape, repo.stats envelope shape, per-host
drain mutex.
2026-05-04 10:26:24 +01:00
steve a07d7fc53e Merge pull request 'P2 redesign Phase 5 — prune/check/unlock + maintenance ticker + repo stats + pending-runs queue' (#3) from p2r-phase5-maintenance into main
Reviewed-on: #3
2026-05-04 09:25:00 +00:00
steve bc02fcb498 test: poll pending-row count in drain-on-reconnect test (race fix)
CI / Lint (pull_request) Successful in 17s
CI / Test (linux/amd64) (pull_request) Successful in 43s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (windows/amd64) (pull_request) Successful in 51s
CI / Build (linux/arm64) (pull_request) Successful in 21s
CI run #50 failed with:

  --- FAIL: TestDrainPendingDispatchesOnReconnect (1.03s)
      pending_drain_test.go:150: pending rows after drain: got 1, want 0

The test waits for a backup command.run envelope on the wire and
then checks the pending-row count. But conn.Send (the wire write)
returns BEFORE DeletePendingRun runs in the drain goroutine — both
fire serially inside drainOne, but the wire-side reader can observe
the Send while the delete is still pending.

Use the existing waitForPendingCount helper to poll the count with
a 2s deadline. Behaviour unchanged when the delete is fast (count
hits 0 immediately); only relevant under CI scheduling pressure.
-race -count=10 locally now passes consistently.
2026-05-04 10:20:54 +01:00
steve d8dd21b5e0 test: write-then-rename script-bin helpers (avoid ETXTBSY under -race)
CI / Build (windows/amd64) (pull_request) Successful in 18s
CI / Build (linux/amd64) (pull_request) Successful in 19s
CI / Lint (pull_request) Successful in 41s
CI / Build (linux/arm64) (pull_request) Successful in 18s
CI / Test (linux/amd64) (pull_request) Failing after 3m41s
CI run #48 failed with:

  --- FAIL: TestRunInitShipsStartedAndFinished
      RunInit: ... fork/exec /tmp/.../restic: text file busy

setupScript and setupScriptBin used os.WriteFile to write a shell
script directly at the final path, then exec'd it. Under -race +
many t.Parallel tests, a fork-from-another-goroutine could inherit
the still-open writable fd from one of those WriteFile calls; the
kernel returns ETXTBSY when the freshly-execed binary still has a
writable fd anywhere on the system.

Fix: write to "<path>.tmp", then os.Rename into place. The rename
is a pure dirent op; by the time the final path exists, no process
has a writable fd on its inode and exec is safe. -race + -count=5
on both runner packages now passes consistently.
2026-05-04 10:19:15 +01:00
steve b054e7b987 api+agent: document protocol-version stability and forget back-compat decisions
version.go: add a comment block explaining why Phase 5's wire changes
(CommandRunPayload, ConfigUpdatePayload, RepoStatsPayload reshapes) did
not bump CurrentProtocolVersion — lockstep deploy, no rolling-upgrade
path, smoke env restage enforces it. Notes where a version bump to 2
would be required if a multi-version path is ever introduced.

cmd/agent/main.go: document why the JobForget handler hard-errors on
empty ForgetGroups rather than falling back to a single-policy form.
The maintenance ticker is the only writer and always populates the
field; the fallback was specced but skipped given lockstep deploy.
2026-05-04 10:19:15 +01:00
steve 99ef2b7a71 server: serialize DrainPending per host (avoid drain double-dispatch)
Add a per-host drain mutex (drainLocks map guarded by drainLocksMu) on
the Server struct. DrainPending acquires it with TryLock: if a drain is
already in-flight for this host, the call returns immediately — the
running drain will see every pending row. This prevents the on-hello
goroutine and the 30s tick from both listing the same host's rows and
dispatching them twice.

Update three existing tests that called srv.DrainPending explicitly
after the on-hello goroutine had already been spawned: replace the
now-redundant direct call with a waitForPendingCount poll so they don't
race the goroutine's mutex ownership. Add TestDrainPendingSerializesPerHost
which fires 10 concurrent DrainPending goroutines against a 5-row queue
and asserts exactly 5 job rows result.
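
A sketch of the per-host TryLock guard. drainer, lockFor and drain are
hypothetical names standing in for the Server struct's drainLocks map;
sync.Mutex.TryLock requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

type drainer struct {
	mu    sync.Mutex             // guards locks
	locks map[string]*sync.Mutex // one mutex per host id
}

func newDrainer() *drainer {
	return &drainer{locks: map[string]*sync.Mutex{}}
}

// lockFor returns the host's mutex, creating it on first use.
func (d *drainer) lockFor(hostID string) *sync.Mutex {
	d.mu.Lock()
	defer d.mu.Unlock()
	l, ok := d.locks[hostID]
	if !ok {
		l = &sync.Mutex{}
		d.locks[hostID] = l
	}
	return l
}

// drain runs work unless a drain is already in flight for this host, in
// which case it returns false immediately instead of queueing behind it.
func (d *drainer) drain(hostID string, work func()) bool {
	l := d.lockFor(hostID)
	if !l.TryLock() {
		return false
	}
	defer l.Unlock()
	work()
	return true
}

func main() {
	d := newDrainer()
	d.drain("host-a", func() {
		// A second drain for the same host while this one is running is skipped.
		fmt.Println("nested same host ran:", d.drain("host-a", func() {}))
		fmt.Println("nested other host ran:", d.drain("host-b", func() {}))
	})
}
```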
2026-05-04 10:19:15 +01:00
steve b8c9c50a93 store: LatestJobByKind includes in-flight jobs (avoid maintenance double-fire)
Widen the SQL query to consider all statuses (queued, running,
succeeded, failed, cancelled) rather than terminal-only. An in-flight
prune that outlasts the 60s tick interval previously produced
ErrNotFound, causing the ticker to anchor at now-24h and fire a second
prune concurrently with the first.

Update the doc comment and test: remove the "queued job filtered out"
case, add assertions that a running job and a queued job are each
returned as the latest.
2026-05-04 10:19:15 +01:00
steve 18cc90d54e tasks: tick P2R-03 through P2R-08 done 2026-05-04 10:19:15 +01:00
steve a1db4ce4f7 diag: phase 5 Playwright sweep screenshots 2026-05-04 10:19:15 +01:00
steve 99b88d08c9 server/ws: persist repo.stats into host_repo_stats 2026-05-04 10:19:15 +01:00
steve 1629dc7146 server: drainer abandons only on ErrNotFound, not transient errors
GetSourceGroup errors in drainOne now gate on errors.Is(err,
store.ErrNotFound) before calling abandonPending, mirroring the
existing GetSchedule pattern. Transient errors (SQLITE_BUSY, context
cancellation) now log a warning and return without deleting the row.

Add regression test TestDrainPendingDropsRowsForGoneSourceGroup
confirming the ErrNotFound path still abandons correctly. Also add
a comment above the backoff-doubling loop explaining the progression.
2026-05-04 10:19:15 +01:00
steve 0c9ea75046 server: drainer uses dispatch-core to avoid duplicate pending_run enqueue
Extract dispatchBackupForGroupCore (persist+marshal+send, no enqueue on
failure) from dispatchBackupForGroup. drainOne now calls the core
directly so a failed Send only bumps the existing pending_runs row via
BumpPendingRunAttempt — not create a second row — stopping the
geometric duplication on repeated drain failures.

dispatchBackupForGroup (schedule.fire path) wraps the core and keeps
its enqueue-on-failure behaviour unchanged.

TestDrainPendingBumpsOnSendFailure strengthened: asserts exactly 1 row
remains after a send failure (was tolerating >=1 duplicate rows).
2026-05-04 10:19:15 +01:00
steve 3e337dfb3c server: drain pending_runs on tick + on agent reconnect
Two trigger paths land here:

- A 30s ticker in cmd/server calls Server.DrainAllDue(ctx). It
  walks pending_runs rows whose next_attempt_at <= now, dedupes by
  host, skips offline hosts, and per online host runs DrainPending.

- onAgentHello spawns a background DrainPending(hostID). When a
  host comes back, every pending row for it is dispatchable now —
  due-ness becomes irrelevant once the wire is back.

Each row's schedule + group are reloaded; ErrNotFound or
disabled-schedule or gone-group abandons the row with a
pending_run.abandoned audit. attempt >= retry_max also abandons.
Otherwise dispatchBackupForGroup is invoked; success deletes the
row, failure bumps attempt with exponential backoff capped at
30m.
2026-05-04 10:19:15 +01:00
steve e64cf25c0e server: enqueue pending_runs when scheduled-job dispatch fails
When dispatchBackupForGroup's conn.Send errors, queue a pending_runs
row (attempt=1, next_attempt_at = now + group.RetryBackoffSeconds)
instead of silently dropping the fire. The orphaned queued job row
is left behind for forensic visibility — the drainer will create a
fresh job row on its retry.

Also adds Store.ListPendingRunsForHost — the on-reconnect drain
walks every row for the host, regardless of due-ness, since the
host being back makes 'due' irrelevant.
2026-05-04 10:19:15 +01:00
steve 2794d5a821 server: fix stale RetentionPolicy comment + check Scan errors in maintenance test 2026-05-04 10:19:15 +01:00
steve c47cc682e0 server: maintenance ticker drives forget/prune/check on cadence
Wires a 60s server-side ticker to the pure-logic maintenance.Decide
introduced in the previous commit. Decisions flow through a new
DispatchMaintenance method on *Server, which:

  - skips offline hosts (no pending_runs queueing — maintenance is
    not a backup, missed fires shouldn't pile up)
  - silently skips prune when admin creds aren't bound
  - pushes admin creds before prune, then dispatches with
    RequiresAdminCreds=true (same as operator-driven prune)
  - persists job rows with actor_kind="system"

Reshapes the forget wire payload from a single RetentionPolicy to a
ForgetGroups list (one tag + per-group keep-* per source group). The
agent walks the groups and runs `restic forget --tag <name> --keep-*`
once per group. Dead-code removed: CommandRunPayload.RetentionPolicy,
the old forget JSON-decode in cmd/agent, and the single-policy form of
restic.RunForget.
2026-05-04 10:19:15 +01:00
steve e7e11454a8 maintenance: pure-logic ticker decides forget/prune/check fires 2026-05-04 10:19:15 +01:00
steve 77a8590e3a ui: hx-swap none on Run-now + truthful save banner + tailwind rebuild
Add hx-swap="none" to the three Run-now buttons (check/prune/unlock) in
host_repo.html to match the existing pattern on host_sources.html and
host_schedules.html. Fix all-blank admin-credentials save to redirect
without ?saved= query string so no false-positive banner is shown;
strengthen the corresponding test to assert Location has no ?saved=.
Rebuild CSS bundle via Tailwind to pick up max-w-[640px] JIT class.
2026-05-04 10:19:15 +01:00
steve 46ec123f95 ui: Slice E — admin creds form + run-now buttons + repo health panel
- hostRepoPage gains AdminURL/AdminUsername/HasAdminPassword, Online,
  and StatsView (pre-dereferenced projection of host_repo_stats).
- loadHostRepoPage loads the admin slot (tolerating ErrNotFound),
  hub.Connected, and stats (tolerating ErrNotFound).
- renderRepoPage gains an adminErr parameter; all callers updated.
- handleUIAdminCredentialsSave / handleUIAdminCredentialsDelete added
  (form-POST handlers mirroring the repo-creds pattern, with audit).
- Routes /hosts/{id}/admin-credentials POST and /delete POST registered.
- Template: Admin credentials form after Connection, Run-now HTMX
  buttons after Maintenance, Repo health stats panel in right rail.
- Tests: 9 new tests covering rendering, disabled states, save/delete
  round-trips, audit rows, and idempotent delete.
2026-05-04 10:19:15 +01:00
steve b35f1736f7 server: populate audit UserID on credential mutations + slog prune push errors
Switch handleSetHostCredentials, handleSetAdminCredentials, and
handleDeleteAdminCredentials from authedUser (bool) to requireUser
(*store.User) so AuditEntry.UserID and Actor are populated correctly.
Add slog.Warn on the non-ErrNotFound pushAdminCredsToAgent path in
handleRunRepoPrune so decrypt/send failures surface in the server log
rather than appearing as a generic host_offline 503.
2026-05-04 10:19:15 +01:00
steve a8aff2c62b server: cover HTMX auth-redirect path in repo-ops tests 2026-05-04 10:19:15 +01:00
steve 1ae567021a server: HTTP run-now for prune / check / unlock
Adds POST /api/hosts/{id}/repo/{prune,check,unlock} (and matching outer
routes for HTMX form posts). Prune pushes the admin-cred slot via
pushAdminCredsToAgent before dispatch and refuses with
admin_creds_required when the slot is not set. Check reads
check_subset_pct from host_repo_maintenance (overridable via ?subset=N,
clamped 0-100; non-numeric override falls back to DB value silently).
Unlock needs no admin creds. All three share the same wantsHTML/HX-Redirect
response split as the per-source-group run-now endpoint.
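
The ?subset=N handling reduces to something like the following;
subsetPct is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// subsetPct returns the check subset percentage: a numeric override is
// clamped to 0-100, while a missing or non-numeric override silently
// falls back to the stored value.
func subsetPct(override string, dbValue int) int {
	if override == "" {
		return dbValue
	}
	n, err := strconv.Atoi(override)
	if err != nil {
		return dbValue
	}
	if n < 0 {
		return 0
	}
	if n > 100 {
		return 100
	}
	return n
}

func main() {
	for _, q := range []string{"", "25", "250", "-5", "abc"} {
		fmt.Printf("%q -> %d\n", q, subsetPct(q, 10))
	}
}
```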
2026-05-04 10:19:15 +01:00
steve 81a00202d0 server: admin-credentials REST + Slot:admin push helper
Adds GET/PUT/DELETE /api/hosts/{id}/admin-credentials handlers that
mirror the existing repo-credentials endpoints but write to
store.CredKindAdmin with AEAD additional-data "host:<id>:admin" (scoped
away from the repo slot to prevent cross-binding). PUT immediately pushes
a config.update(Slot:"admin") to the agent when it is connected, and the
new pushAdminCredsToAgent helper is wired for use by the upcoming prune
run-now endpoint (D2) to push on-demand before dispatch.
2026-05-04 10:19:15 +01:00
steve dafae84149 agent: secrets fail-loud on corrupt blob + small polish
Save and SaveAdmin now propagate loadBundle errors instead of silently
overwriting a corrupt file (data-loss fix). Tests added for both paths.
reportStats logs a Debug on RunStats failure; r in runJob gets a comment
explaining the prune-runner asymmetry; runner_test comment tightened.
2026-05-04 10:19:15 +01:00
steve d3c354cd97 agent/runner: ship repo.stats before job.finished in RunCheck/RunUnlock
RunCheck and RunUnlock were calling sendFinished before reportStats,
inverting the required job.started → log.stream → repo.stats →
job.finished envelope order. Move reportStats ahead of sendFinished in
both functions to match the pattern already correct in RunPrune.

Strengthen TestRunCheckShipsCheckStatus, TestRunCheckErrorsFoundShipsErrorsStatus,
and TestRunUnlockClearsLock with the same position-index ordering
assertions used by TestRunPruneShipsExpectedEnvelopes; these assertions
would have failed against the pre-fix code.
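
A position-index ordering assertion of the kind mentioned above might look like this. It is a sketch only: indexOf and checkOrder are hypothetical names, envelopes are reduced to their type strings, and only the first occurrence of each type is considered.

```go
package main

import "fmt"

// indexOf returns the position of the first envelope of the given type, or -1.
func indexOf(sent []string, typ string) int {
	for i, s := range sent {
		if s == typ {
			return i
		}
	}
	return -1
}

// checkOrder returns an error unless every type in want appears in sent and
// the first occurrences are in strictly increasing position.
func checkOrder(sent, want []string) error {
	prev := -1
	for _, typ := range want {
		i := indexOf(sent, typ)
		if i < 0 {
			return fmt.Errorf("missing envelope %q", typ)
		}
		if i <= prev {
			return fmt.Errorf("envelope %q at index %d is out of order", typ, i)
		}
		prev = i
	}
	return nil
}

func main() {
	want := []string{"job.started", "log.stream", "repo.stats", "job.finished"}
	fixed := []string{"job.started", "log.stream", "repo.stats", "job.finished"}
	preFix := []string{"job.started", "log.stream", "job.finished", "repo.stats"}
	fmt.Println("fixed:  ", checkOrder(fixed, want))
	fmt.Println("pre-fix:", checkOrder(preFix, want))
}
```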
2026-05-04 10:19:15 +01:00
steve 1f600fa849 agent: RunPrune/RunCheck/RunUnlock + reportStats + admin-cred slot dispatch
Extract resticEnv/sendStarted/streamHandler/sendFinished helpers to remove
boilerplate duplication across Run* methods. Add RunPrune (ships repo.stats
with LastPruneAt before job.finished), RunCheck (ships stats with
LastCheckStatus/LockPresent regardless of outcome), RunUnlock (ships
LockPresent=false on success), and reportStats (fills size fields via
RunStats when caller didn't populate them).

Wire JobPrune/JobCheck/JobUnlock into the dispatcher switch; teach
MsgConfigUpdate about the Slot discriminator for admin vs repo creds;
add strconv import for subset-pct parsing.
2026-05-04 10:19:15 +01:00
steve 212fd3e400 agent/secrets: separate admin slot with backwards-compatible decode
Split the on-disk bundle into repo + admin slots. Legacy flat Repo blobs
are detected at load time by the presence of "repo_url" at the top level
and transparently promoted into the new shape on the next Save/SaveAdmin.
Adds ErrNoAdmin sentinel, LoadAdmin, SaveAdmin, and three new tests.
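
The legacy-detection step can be sketched like this. It assumes the decoded bundle is JSON; every field and type name other than "repo_url" is hypothetical, and the real loader may differ (for example, it may decrypt first).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// creds is a hypothetical credential shape; only "repo_url" comes from the
// commit message.
type creds struct {
	RepoURL  string `json:"repo_url"`
	Password string `json:"password,omitempty"`
}

// bundle is the assumed new two-slot shape.
type bundle struct {
	Repo  *creds `json:"repo,omitempty"`
	Admin *creds `json:"admin,omitempty"`
}

// decodeBundle accepts both shapes. A top-level "repo_url" key marks a legacy
// flat blob, which is promoted into the repo slot in memory.
func decodeBundle(raw []byte) (bundle, error) {
	var probe map[string]json.RawMessage
	if err := json.Unmarshal(raw, &probe); err != nil {
		return bundle{}, err
	}
	if _, legacy := probe["repo_url"]; legacy {
		var r creds
		if err := json.Unmarshal(raw, &r); err != nil {
			return bundle{}, err
		}
		return bundle{Repo: &r}, nil
	}
	var b bundle
	if err := json.Unmarshal(raw, &b); err != nil {
		return bundle{}, err
	}
	return b, nil
}

func main() {
	legacy := []byte(`{"repo_url":"rest:https://example.invalid/r","password":"x"}`)
	b, err := decodeBundle(legacy)
	fmt.Println(b.Repo.RepoURL, b.Admin == nil, err)
}
```

Returning the error on malformed input, rather than an empty bundle, keeps a later save from overwriting a file that merely failed to parse.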
2026-05-04 10:19:15 +01:00
steve c9be9040d9 api: stats partial-update payload + ConfigUpdate.Slot + CommandRun.RequiresAdminCreds
Reshape RepoStatsPayload into pointer-field partial-update form matching
store.HostRepoStats semantics; add Slot discriminator to ConfigUpdatePayload
for admin vs repo credential routing; add RequiresAdminCreds flag to
CommandRunPayload for prune/unlock jobs that need delete authority.
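
As an illustration of the pointer-field partial-update form, the sketch below applies only the fields a sender set. The struct and field names are hypothetical stand-ins, not the actual RepoStatsPayload or store.HostRepoStats definitions, and the "ok" status value is made up.

```go
package main

import "fmt"

// repoStats is a stand-in for the stored projection.
type repoStats struct {
	SizeBytes       int64
	LockPresent     bool
	LastCheckStatus string
}

// repoStatsUpdate uses pointers so that nil means "leave unchanged" and a
// non-nil zero value (e.g. false) means "set to zero".
type repoStatsUpdate struct {
	SizeBytes       *int64
	LockPresent     *bool
	LastCheckStatus *string
}

// applyUpdate copies only the fields the sender populated.
func applyUpdate(cur repoStats, u repoStatsUpdate) repoStats {
	if u.SizeBytes != nil {
		cur.SizeBytes = *u.SizeBytes
	}
	if u.LockPresent != nil {
		cur.LockPresent = *u.LockPresent
	}
	if u.LastCheckStatus != nil {
		cur.LastCheckStatus = *u.LastCheckStatus
	}
	return cur
}

func main() {
	cur := repoStats{SizeBytes: 1024, LockPresent: true, LastCheckStatus: "ok"}
	unlocked := false
	// An unlock-style update touches only the lock flag.
	next := applyUpdate(cur, repoStatsUpdate{LockPresent: &unlocked})
	fmt.Printf("%+v\n", next)
}
```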
2026-05-04 10:19:15 +01:00
steve 7fd29427a0 restic: tighten RunCheck lock sniff + RunStats zero-snapshot test
Narrow the LockPresent predicate from bare "locked" (too broad) to
"stale lock" and "already locked" — the two phrases restic actually
emits. Replace TestRunCheckParsesLock with table-driven
TestRunCheckLockSniff covering both trigger phrases and a benign
"locked-file" line that must not set LockPresent. Add
TestRunStatsZeroSnapshots to pin that RunStats accepts zero-snapshot
JSON without error.
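
The narrowed predicate could be sketched as below. The two trigger phrases and the benign "locked-file" case come from the message; the function name, the lower-casing, and the sample lines are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// lockPresent reports whether a line of restic output indicates a repository
// lock. Matching is case-insensitive here, which is an assumption.
func lockPresent(line string) bool {
	l := strings.ToLower(line)
	return strings.Contains(l, "stale lock") || strings.Contains(l, "already locked")
}

func main() {
	lines := []string{
		"Found stale lock, consider running unlock", // illustrative text
		"repository is already locked by another process",
		"skipping locked-file /tmp/example",
	}
	for _, l := range lines {
		fmt.Printf("%-50q %v\n", l, lockPresent(l))
	}
}
```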
2026-05-04 10:19:15 +01:00
steve 49fd3f4441 restic: RunUnlock + RunStats (raw-data mode)
Add RunUnlock (delegates straight to runWithPump) and RunStats which
runs `restic stats --json --mode raw-data`, captures the single JSON
line from stdout into RepoStats, and returns an error if no JSON
arrives.  Tests cover arg plumbing for unlock, JSON parsing, and the
no-JSON error path.
2026-05-04 10:19:15 +01:00
steve f3eaf511be restic: RunCheck with subset% + lock-state sniffing
Add CheckResult (LockPresent, ErrorsFound) and RunCheck.  subsetPct>0
passes --read-data-subset N% to limit data reads.  Stderr is sniffed
for "Found stale lock"/"locked" to set LockPresent; a non-zero exit
from restic is absorbed as ErrorsFound=true rather than an error so
the caller can always persist last_check_status.  Tests cover lock
detection, exit-1 absorption, and subset-arg plumbing.
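
The subset-arg plumbing can be sketched as a small argument builder. checkArgs is a hypothetical name, and any other flags the real RunCheck passes are omitted.

```go
package main

import "fmt"

// checkArgs builds the restic argument list for a check run. A subsetPct of
// zero or less means a plain check with no data-read sampling.
func checkArgs(subsetPct int) []string {
	args := []string{"check"}
	if subsetPct > 0 {
		args = append(args, "--read-data-subset", fmt.Sprintf("%d%%", subsetPct))
	}
	return args
}

func main() {
	fmt.Println(checkArgs(0))
	fmt.Println(checkArgs(5))
}
```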
2026-05-04 10:19:15 +01:00
steve 2caf7f1193 restic: RunPrune + runWithPump helper, refactor Forget/Init onto it
Add RunPrune for admin-credential prune invocations.  Extract
runWithPump to DRY the stdout+stderr pump pattern; refactor RunForget
and RunInit to delegate to it (RunInit preserves the "config file
already exists" soft-success sniff by wrapping the handler before the
call).  Add runner_test.go with TestRunPruneInvokesPrune.
2026-05-04 10:19:15 +01:00
steve 4ad0b5147a store: tighten CHECK constraint on host_repo_stats.last_check_status 2026-05-04 10:19:15 +01:00
steve f97f67eb67 store: wrap UpsertHostRepoStats in a transaction (concurrency safety) 2026-05-04 10:19:15 +01:00
steve bc77081366 store: assert CHECK constraint on host_credentials.kind 2026-05-04 10:19:15 +01:00
steve 87655cf0e4 store: HostRepoStats projection (size, lock, last-check, last-prune) 2026-05-04 10:19:15 +01:00
steve de6d51eeb1 store: host_credentials becomes kind-aware (repo + admin slots) 2026-05-04 10:19:15 +01:00
steve 212ddfe226 store: migration 0009 — admin-creds kind + host_repo_stats 2026-05-04 10:19:15 +01:00
steve b640775a61 plan: P2 redesign Phase 5 (P2R-03..P2R-08) 2026-05-04 10:19:15 +01:00
steve 13f58537ad infra: remove provision-gitea-runner.sh (now lives with the infra team)
The runner-provisioning script has been handed off to the infra
agent, who will own it going forward. ci.yml's header comment is
updated to point at "the infra team owns the script" rather than
the in-repo path, but the runner expectations themselves stay the
same — workflows still rely on the persistent volumes, pre-cloned
actions, and host-installed golangci-lint that any compliant
provisioning produces.
2026-05-04 10:19:09 +01:00
steve a24eee4c68 ci+infra: provisioning script for gitea runners + drop setup-go cache
scripts/provision-gitea-runner.sh is a one-shot, idempotent host
setup for an act_runner LXC. It mounts persistent host volumes for
GOMODCACHE / GOCACHE / act-clones, pre-pulls the runner image,
pre-clones the common GitHub actions, installs golangci-lint, and
sets up a nightly cron to refresh the lot. Generic — no per-project
state.

With those persistent volumes in place, `cache: true` on
actions/setup-go becomes a net negative — the action keeps tar-ing /
un-tar-ing GOMODCACHE+GOCACHE through the Gitea cache backend on
every job, adding ~10s per job and overwriting the volume contents.
Drop it from all three jobs in ci.yml. Add a header comment block
explaining the runner-side expectations and the Go version / build
matrix / upload-artifact context for anyone reading later.
2026-05-04 09:40:27 +01:00
steve 0ae62261e3 Merge pull request 'P2R-02: UI rewire against the slim-schedule + source-group model' (#2) from p2r-02-ui-rebuild into main
Reviewed-on: #2
2026-05-03 20:34:02 +00:00
steve dd7b37a5c1 lint: align local gofumpt rules with golangci-lint v2.5.0
CI / Test (linux/amd64) (pull_request) Successful in 21s
CI / Lint (pull_request) Successful in 24s
CI / Build (windows/amd64) (pull_request) Successful in 20s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 20s
Bumping CI to v2.5.0 surfaced two new gofumpt findings (in two test
files that gofumpt v2.1.6 considered fine). Local re-format with
the matching tool brings them in line.

Pre-commit hook config: prepend $GOPATH/bin to PATH inside the hook
entry so gofumpt + golangci-lint resolve when ~/go/bin isn't on the
operator's interactive shell PATH (common — go install puts them
there but PATH config varies). Without this, the hooks fail with
'Executable not found' even when the tools are installed.

Pin the Makefile setup target to v2.5.0 so a fresh clone gets the
same binary CI runs — keeps pre-commit and CI from drifting again.
2026-05-03 21:31:47 +01:00
steve 694d9d9bf3 ci: bump golangci-lint to v2.5.0 (Go 1.25-built binary)
CI / Test (linux/amd64) (pull_request) Successful in 19s
CI / Lint (pull_request) Failing after 27s
CI / Build (windows/amd64) (pull_request) Successful in 21s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 20s
The v2.1.6 release binary is built with Go 1.24, and golangci-lint
refuses to load a config targeting a newer toolchain than itself
('Go language version (go1.24) used to build golangci-lint is lower
than the targeted Go version (1.25.0)'). go.mod is on 1.25, so the
binary needs to be too.

Locally this didn't bite because 'go install …@v2.1.6' compiled
v2.1.6 against the local Go 1.25 toolchain; CI uses the prebuilt
release tarball which carries the build-time Go version.

v2.5.0 is the first v2.x line built with Go 1.25 — pin in lockstep
with go.mod going forward.
2026-05-03 21:29:02 +01:00
steve 2d40002355 ci: enforce lint locally via pre-commit hook
CI / Test (linux/amd64) (pull_request) Successful in 29s
CI / Lint (pull_request) Failing after 16s
CI / Build (windows/amd64) (pull_request) Successful in 21s
CI / Build (linux/amd64) (pull_request) Successful in 21s
CI / Build (linux/arm64) (pull_request) Successful in 21s
The repo had a .pre-commit-config.yaml entry for golangci-lint
already, but pinned to v1.61.0 — which doesn't grok the v2 schema
we just migrated to, so it would crash if anyone ever ran it. Hence
nobody did.

Replace the third-party hook blocks with local hooks that call
whatever tool is on the developer's PATH (gofumpt + go vet +
golangci-lint). That way the version of each tool tracks what the
developer would invoke by hand — no drift between hook config and
binary.

Add 'make setup' as a one-liner per-clone bootstrap:
  * installs gofumpt + golangci-lint via go install if missing
  * installs the pre-commit hooks via 'pre-commit install'

end-of-file-fixer auto-fixed two existing files (web/static/css/
styles.css and ask.md) — trailing newlines, harmless.
2026-05-03 21:26:24 +01:00
steve e871b05b38 lint: drive baseline to zero, drop only-new-issues gate
CI / Test (linux/amd64) (pull_request) Successful in 34s
CI / Lint (pull_request) Failing after 16s
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 20s
CI / Build (linux/arm64) (pull_request) Successful in 21s
Cleanup pass over the repo so CI can enforce lint going forward
without the only-new-issues escape hatch:

* gofumpt -w across the tree (31 hits, all formatting)
* misspell --fix (25 hits, US-locale spelling) — but reverted on
  api.JobCancelled = "cancelled" since that literal is the wire +
  DB CHECK constraint value, plus matched the case in store/fleet.go
  back to "cancelled" and added //nolint:misspell on both for the
  next time someone reaches for the auto-fix
* Wrap every `defer rows.Close()` / `defer stmt.Close()` /
  `defer res.Body.Close()` in `defer func() { _ = x.Close() }()`
  to satisfy errcheck without losing the close itself
* websocket.Dial callers (1 prod, 4 tests) now capture + close the
  upgrade response Body — coder/websocket can return res with a nil
  Body on success, so the test deferred-closes guard against that
* Annotate the two genuine-by-design nilerr cases with //nolint
  comments explaining why nil-on-error is the contract (cookie
  missing = no session; ctx cancelled mid-backoff = clean shutdown)
* Add brief godoc on the 10 exported const groups + types that
  revive flagged (api.HostOS/HostArch/JobKind/JobStatus/LogStream/
  ErrorCode, restic.EventKind, store.Role, web.FS)
* Drop the unused (*Server).userByID method
* Inline the unparam baseView(active) — every UI page is under
  the dashboard primary nav today

Result: `golangci-lint run ./...` reports 0 issues. CI lint job
no longer needs only-new-issues: true; X-06 follow-up entry in
tasks.md removed.
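
For readers unfamiliar with the deferred-close wrapper used in this cleanup, a minimal sketch follows. The resource type is a hypothetical stand-in for rows, statements, or response bodies.

```go
package main

import "fmt"

// resource stands in for anything with a Close() error method.
type resource struct{ closed bool }

func (r *resource) Close() error {
	r.closed = true
	return nil
}

// use closes r on return. Assigning the result to the blank identifier makes
// the discarded error explicit, which is what errcheck looks for.
func use(r *resource) int {
	defer func() { _ = r.Close() }()
	return 42
}

func main() {
	r := &resource{}
	fmt.Println(use(r), r.closed)
}
```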
2026-05-03 16:15:17 +01:00
steve 18a9f6624e ci: migrate .golangci.yml to v2 schema + only-new-issues gate
CI / Test (linux/amd64) (pull_request) Successful in 29s
CI / Lint (pull_request) Failing after 16s
CI / Build (windows/amd64) (pull_request) Successful in 20s
CI / Build (linux/amd64) (pull_request) Successful in 20s
CI / Build (linux/arm64) (pull_request) Successful in 21s
The bump from golangci-lint-action@v6 → v7 (which downloads the v2.x
binary) was blocking CI lint with 'unsupported version of the
configuration: ""' because .golangci.yml was still in the v1 schema.

Migrate the config to v2:
* version: "2" prelude
* disable-all → default: none
* linters-settings → linters.settings
* gofumpt + goimports move into formatters.enable + formatters.settings
* exclude-rules move into linters.exclusions.rules
* gosimple drops (folded into staticcheck in v2)

Fix the four lint hits in the new P2R-02 code:
* host_bandwidth.go: convert hostBandwidthRequest directly to
  hostBandwidthView via type conversion (S1016)
* ui_repo.go: drop unparam savedSection + status arguments from
  renderRepoPage (always "" / always 422 — split GET render from
  validation-fail render)
* ui_schedules.go: gofumpt formatting on the scheduleEditPage struct

Add only-new-issues: true to the lint job. The repo carries ~90
pre-existing findings (gofumpt drift × 31, misspell × 25, missing
godoc × 10, bodyclose × 6, errcheck × 12, …) accumulated before
lint was actually wired into CI. Without this gate, every PR would
fail on baseline noise instead of its own changes.

Track the cleanup as X-06 in tasks.md so the gate is temporary.
2026-05-03 15:00:24 +01:00
steve 2a8dd1eba2 P2R-02 — mark Phase 4 complete, all 6 slices done
CI / Test (linux/amd64) (pull_request) Successful in 1m28s
CI / Lint (pull_request) Failing after 31s
CI / Build (windows/amd64) (pull_request) Successful in 20s
CI / Build (linux/amd64) (pull_request) Successful in 20s
CI / Build (linux/arm64) (pull_request) Successful in 24s
Update tasks.md: Phase 4 of the P2 redesign is done end-to-end.
Slices 1–5 wired the four host-detail tabs against the new
slim-schedule + source-group + repo-maintenance model; slice 6
ran a Playwright sweep against the live :8080 server (login,
walk every tab, create source group, create schedule, Run-now,
confirm a snapshot landed) — clean pass, no console errors.
Screenshots in _diag/p2r-02-sweep/.

Side-fix landed alongside slice 6: agent runner now drops
restic's noisy --json status events from log.stream (the
throttled job.progress envelope already covers them).

Phase 5 (server-side maintenance ticker — P2R-03..08) is next.
2026-05-03 14:49:40 +01:00
steve fab99b4a38 P2R-02 slice 5: dashboard row Run-now uses covering schedule
Replace the placeholder 'Open →' link with a per-host Run-now
decision computed server-side once per render:

* If the host has exactly one enabled schedule whose source-group
  set covers every group on the host → primary 'Run all groups'
  button (HX-POST to that schedule's /run endpoint, fires every
  backup the host knows about in one click).
* Otherwise (zero matches, multiple matches, or any ambiguity) →
  ghost 'Open →' link to /hosts/{id}/sources, where the operator
  picks per-group from the source-group rows.

dashboardPage.Hosts moves from []store.Host to []dashboardHostRow
to carry the precomputed RunAllScheduleID; host_row.html now reads
.Host.* and .RunAllScheduleID. Two extra store calls per host on
dashboard render — fine at fleet sizes we care about; if we ever
need to support thousands of hosts we'll batch these queries.
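
The Run-now decision above can be sketched as a pure function. This reads "exactly one" as "exactly one schedule that is both enabled and covering"; the type and function names are hypothetical, and returning "" for a host with no groups is an assumption.

```go
package main

import "fmt"

// schedule is a trimmed stand-in for the stored schedule.
type schedule struct {
	ID       string
	Enabled  bool
	GroupIDs []string
}

// covers reports whether have contains every ID in need.
func covers(have, need []string) bool {
	set := make(map[string]bool, len(have))
	for _, id := range have {
		set[id] = true
	}
	for _, id := range need {
		if !set[id] {
			return false
		}
	}
	return true
}

// runAllScheduleID returns the ID of the single enabled schedule covering
// every group on the host, or "" when there is no match or more than one.
func runAllScheduleID(schedules []schedule, hostGroups []string) string {
	if len(hostGroups) == 0 {
		return ""
	}
	match := ""
	for _, s := range schedules {
		if !s.Enabled || !covers(s.GroupIDs, hostGroups) {
			continue
		}
		if match != "" {
			return "" // ambiguous: fall back to the "Open" link
		}
		match = s.ID
	}
	return match
}

func main() {
	groups := []string{"g1", "g2"}
	scheds := []schedule{
		{ID: "nightly", Enabled: true, GroupIDs: []string{"g1", "g2"}},
		{ID: "hourly", Enabled: true, GroupIDs: []string{"g1"}},
	}
	fmt.Printf("%q\n", runAllScheduleID(scheds, groups))
}
```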
2026-05-03 13:42:50 +01:00
steve ffba7371c5 agent runner: drop status-event spam from log.stream
restic --json emits a status frame ~every 16ms during a backup.
The runner was forwarding every line to log.stream verbatim, which
flooded the live log pane with duplicate status JSON for any
short-running backup (visible immediately on a 1000-file, ~4MB
test set: ~14 identical 'percent_done: 1' lines in 220ms).

The progress widget already covers the same information at a sane
sample rate (one per second via job.progress), so the raw status
lines in log.stream are double-bookkeeping. Skip them and forward
only non-status lines (file names, errors, summary).

Throttling logic for job.progress is unchanged.
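
One way to implement the skip is to probe each line's message_type, as sketched below. restic's --json output tags status frames with "message_type":"status"; the helper names and the sample lines are illustrative, and the runner's actual detection may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isStatusLine reports whether a line is a restic --json status frame.
// Lines that are not valid JSON are never treated as status.
func isStatusLine(line []byte) bool {
	var probe struct {
		MessageType string `json:"message_type"`
	}
	if err := json.Unmarshal(line, &probe); err != nil {
		return false
	}
	return probe.MessageType == "status"
}

// forwardable keeps only the lines that should reach log.stream.
func forwardable(lines []string) []string {
	var out []string
	for _, l := range lines {
		if !isStatusLine([]byte(l)) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	lines := []string{
		`{"message_type":"status","percent_done":1}`,
		`{"message_type":"summary","files_new":1000}`,
		`plain text warning`,
	}
	for _, l := range forwardable(lines) {
		fmt.Println(l)
	}
}
```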
2026-05-03 13:35:18 +01:00
steve 4035c44be3 P2R-02 follow-up: schedule Run-now feedback (single → job log, multi → toast)
Schedules tab Run-now used to silently HX-Redirect back to the
list, leaving the operator wondering whether the click registered.
Now:

* Single-source-group schedule → HX-Redirect to that one job's
  live log, matching the per-source-group Run-now UX from Sources.
* Multi-group schedule → stay on the schedules list and fire a
  success toast ("N backups dispatched: <group names>") via the
  existing rm:toast HX-Trigger channel, so the operator sees clear
  acknowledgement without losing their place.

dispatchBackupForGroup now returns the persisted job ID so the
caller can choose between job-log redirect and toast feedback;
on any internal failure it returns "" and the warning still
hits slog as before. The cron-fired path (dispatchScheduledJob)
ignores the return value, behaviour unchanged.
2026-05-03 13:25:31 +01:00
steve d62b173712 P2R-02 slice 4: Repo tab — connection / bandwidth / maintenance
Three independent forms on /hosts/{id}/repo so saving one section
doesn't disturb the others:

* Connection: edits repo URL, username, password (pre-filled from
  the redacted GET /api/hosts/{id}/repo-credentials view; password
  field shows masked stored-creds placeholder; blank password = keep
  existing). On save, encrypts and pushes config.update to a
  connected agent.
* Bandwidth: host-wide upload/download caps (KB/s; blank = no cap)
  written via store.SetHostBandwidth. New REST endpoint
  PUT /api/hosts/{id}/bandwidth for JSON callers.
* Maintenance: forget/prune/check cadences + check subset %, with
  per-row enabled toggles. Reuses cronParser for validation;
  auto-seeds the row if a host pre-dates the migration.

Right-rail surfaces repo size, snapshot count, snapshots-by-tag
breakdown (counted from existing snapshot tag rows), and an
'untagged snapshots are left alone' note.

Danger-zone re-init button is rendered but disabled with a hint
pointing at P2R-09 (real implementation lands there).

Validation re-renders the page with the relevant form's banner and
all other section state intact. Successful saves redirect with a
?saved=<section> query param so the page surfaces a small ✓ saved
indicator on the relevant form.

ci.yml: bump golangci-lint-action v6→v7 (separate change picked up
in this commit).
2026-05-03 12:14:03 +01:00
steve 8b91d3037c P2R-02 follow-up: Run-now works on disabled schedules with confirm
CI / Test (linux/amd64) (pull_request) Successful in 33s
CI / Lint (pull_request) Failing after 15s
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 23s
CI / Build (linux/arm64) (pull_request) Successful in 23s
Surface the Run-now button on every schedule when the host is online,
not just enabled ones. Disabled rows render the button as a non-primary
style + an HX-confirm dialog ("This schedule is paused — running it now
won't change that. Fire it once anyway?"); enabled rows keep the
zero-friction primary button.

Server-side, Run-now no longer short-circuits on !Enabled — it
dispatches the source groups inline rather than via dispatchScheduledJob
(which always bails on disabled schedules, since cron-tick semantics
are different from explicit operator intent). The audit-log entry
inside dispatchBackupForGroup still records every fire.
2026-05-03 12:07:26 +01:00
steve 64d2fcf7a3 P2R-02 follow-up: clickable rows on Sources/Schedules + cron-preset tooltips
CI / Test (linux/amd64) (pull_request) Successful in 1m57s
CI / Lint (pull_request) Failing after 15s
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 22s
Aligns Sources and Schedules tab rows with the dashboard's row-click
UX: whole-row click navigates to the row's edit page (mirroring
.host-row.clickable). Drops the redundant Edit buttons; Run-now and
Delete remain in .row-action cells that sit above the row-link
overlay via z-index.

Schedule edit form's cron preset chips now carry human-readable
title= tooltips ("Every day at 03:00", "Every Sunday at 03:00", etc).

tasks.md gets a binding row-design rule covering all current and
future list-row templates, and the P2R-02 entry is split into the
six slices already agreed with the operator (slices 1–3 marked
done, 4 next).
2026-05-03 12:01:55 +01:00
steve 67ca769686 P2R-02 slice 3: Schedules tab — slim list, new/edit form, delete, Run-now
CI / Test (linux/amd64) (pull_request) Failing after 44s
CI / Lint (pull_request) Failing after 13s
CI / Build (windows/amd64) (pull_request) Successful in 19s
CI / Build (linux/amd64) (pull_request) Successful in 19s
CI / Build (linux/arm64) (pull_request) Successful in 25s
Schedules list: status (enabled/paused) + cron + source-group tags +
actions (Run-now when enabled+online, Edit, Delete). Run-now reuses
dispatchScheduledJob — same path real cron fires take, so each
referenced source group runs as its own backup with its own tag.
Falls back to a 409 if the agent is offline.

Schedule new/edit form: cron input with five preset chips
(quick-pick @hourly / nightly / 6h / weekly / monthly), source-group
multi-pick rendered as styled checkbox cards (visual state tracks
the underlying box via a tiny inline script), enabled toggle. No
paths/excludes/retention/kind on the schedule itself — those live on
source groups now.

Server-side validation re-renders with the operator's input + ticked
groups intact. Every successful mutation calls pushScheduleSetAsync.

Adds .schd-row, .preset-chip, .picker styles.
2026-05-03 11:55:16 +01:00
steve dede74fd3a P2R-02 slice 2 follow-up: refuse to delete a host's last source group
CI / Test (linux/amd64) (pull_request) Failing after 45s
CI / Lint (pull_request) Failing after 12s
CI / Build (windows/amd64) (pull_request) Successful in 19s
CI / Build (linux/amd64) (pull_request) Successful in 19s
CI / Build (linux/arm64) (pull_request) Successful in 23s
Belt-and-braces: the UI now disables the Delete button when a group
is the only one on the host (with a tooltip explaining why), and the
server-side handler returns 409 if a curl/form-replay tries anyway.
Every host needs at least one source group to be backup-able, so the
'last group on a fresh host' case is a meaningful accident to guard
against.
2026-05-03 11:49:17 +01:00
steve 0ed9c3d1ec P2R-02 slice 2: Sources tab — list, new/edit form, delete, Run-now
Sources tab now lists every source group on the host with per-row
counts (used-by-N-schedules, snapshot count by tag), the v4
conflict tag (keep-* dimension that has no compatible cadence),
and Run-now / Edit / Delete actions. Run-now reuses the existing
HTMX-aware /hosts/{id}/source-groups/{gid}/run handler.

New /hosts/{id}/sources/new and /sources/{gid}/edit form: name +
includes/excludes textareas + the 3×2 keep-* retention grid +
retry-on-offline knobs. Server-side validation re-renders with the
operator's input intact; the inline conflict banner shows above the
retention grid when ConflictDimension is set.

Delete blocks (UI + server) when the group is referenced by any
schedule. Every successful mutation calls pushScheduleSetAsync so
an online agent re-arms within seconds.

Adds .src-row and .keep-cell to input.css for the row + retention
grid layout.
2026-05-03 11:44:43 +01:00
steve a535822ff3 P2R-02 slice 1: host-detail sub-tab skeleton
Extract header/vitals/sub-tabs into a host_chrome partial that every
host-detail tab page renders. Sources / Schedules / Repo go from
inert divs to real <a> links backed by stub pages that share the
chrome and a 'coming next' body — slices 2/3/4 fill them in.

Also re-establishes the version indicator (host_schedule_version vs
agent's applied_schedule_version) in the header.

Drops the legacy fat-schedule list/edit templates that referenced
fields removed by the P2 redesign (Manual / Paths / RetentionPolicy
on Schedule); the new templates land in slice 3.
2026-05-03 11:37:55 +01:00
steve 21841e38c4 ci: only trigger on PRs into main
Drop the push-to-main trigger; main is fast-forward only via PR, so
the post-merge run was redundant.
2026-05-03 11:25:13 +01:00
steve e968abc042 ci: fix race-trip in enrollment fixture + bump golangci-lint to v2.1.6
- host_credentials_test.go's CreateEnrollmentToken fixture passed 1<<20
  as the TTL (third arg, time.Duration) — that's ~1ms in nanoseconds.
  Local non-race runs finished inside the window, but -race overhead
  blew the deadline so the token was already expired by the time
  GetEnrollmentTokenAttachments / ConsumeEnrollmentToken ran. Use
  time.Hour instead, which matches the spirit of a per-test fixture.
- Lint pin v1.61.0 was built against Go 1.23 and refuses to load a
  config targeting newer toolchains. go.mod is on 1.25, so the lint
  step exited 3 ('the Go language version used to build golangci-lint
  is lower than the targeted Go version'). Bumping to v2.1.6, which
  supports Go 1.25.

Both failures showed up only on the Gitea runner because the local make
target runs go test without -race and lint hadn't been re-run after
the go.mod toolchain bump.
2026-05-03 11:13:22 +01:00
steve 713bc4a2bb P2R-01 follow-up: WS-path tests + drop unused retention from backup dispatch
Adds p2r01_ws_test.go covering the two paths the original commit's
in-process tests couldn't reach without a live conn:

- maybeAutoInit dispatches command.run(init) on first hello when creds
  are bound, skips on second hello once a job row exists, and skips
  entirely when the host has no creds.
- dispatchScheduledJob iterates a schedule's source groups and emits
  one backup per group with the right Tag/Includes; persists job rows
  with actor_kind=schedule + scheduled_id; no-ops on a disabled
  schedule.

Drops RetentionPolicy from the per-group Run-now and schedule.fire
backup payloads — the agent's RunBackup ignores it (forget is the
only consumer). Adds Hub.Conn() so tests can grab the live *Conn
post-hello.
2026-05-03 11:00:45 +01:00
steve d000fe7ec1 P2R-01: REST + WS rewire against the slim shape
Schedules CRUD now takes {cron, enabled, source_group_ids[]} with cron
parsed via robfig/cron/v3 and group membership scoped to the host.
New source-groups CRUD lives at /api/hosts/{id}/source-groups; delete
refuses with 409 if any schedule still references the group, returning
the schedule list so the UI can prompt 'remove from these schedules
first.' Repo-maintenance GET/PUT manages forget/prune/check cadences
on host_repo_maintenance — no version bump, the server-side ticker
(P2R-06) drives execution.

Per-source-group Run-now (POST /hosts/{id}/source-groups/{gid}/run)
resolves the group's includes/excludes/retention/tag and dispatches a
backup command.run with the new structured CommandRunPayload fields
(Includes/Excludes/Tag). Old per-host /hosts/{id}/run-backup and
/hosts/{id}/init-repo return 410 Gone with a redirect message.

schedule_push.go is rebuilt: buildScheduleSetPayload assembles the
slim wire shape, pushScheduleSetOnConn ships it during the on-hello
window, pushScheduleSetAsync fires after every CRUD mutation, and
dispatchScheduledJob handles agent schedule.fire by iterating the
schedule's source groups and dispatching one backup per group with
actor_kind=schedule and scheduled_id pointing at the schedule.

Auto-init at first WS connect: when the host has repo creds bound and
no init job in its history, server dispatches restic init. Restic's
'config file already exists' soft-success means re-runs against an
existing repo no-op; we don't auto-retry on failure (operator triggers
re-init manually via the danger zone in P2R-09).

api.Schedule drops Kind/Paths/Excludes/Tags/RetentionPolicy/Manual etc.
in favour of {id, cron, enabled, source_groups: [...]}. The agent
scheduler stops checking sch.Manual; cmd/agent's backup dispatch reads
Includes/Excludes/Tag instead of Args.

Tests cover the new HTTP surface end-to-end: source-groups CRUD with
in-use refusal, schedule validation (bad cron / missing groups /
foreign group), repo-maintenance auto-seed and validation, the 410
route, and buildScheduleSetPayload's wire-shape correctness. Full
suite passes; smoke env exercises auto-init dispatch on hello,
async push after schedule create, and per-source-group Run-now
landing the right paths/excludes/tag at the agent.
2026-05-03 10:56:40 +01:00
steve 337dcc0f0f fix(.mcp.json): wrap playwright under mcpServers key
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-03 10:35:57 +01:00
steve 813158b3d6 P2 redesign · phase 2.5: tasks.md rewrite + UI patch-up
CI / Test (linux/amd64) (push) Failing after 4m47s
CI / Lint (push) Failing after 26s
CI / Build (windows/amd64) (push) Successful in 54s
CI / Build (linux/amd64) (push) Successful in 46s
CI / Build (linux/arm64) (push) Successful in 46s
The store rewrite in 5667cdf left tasks.md describing a data shape
(fat schedules, host.repo_initialised_at, manual flag) that no longer
exists, and left the host-detail templates rendering against fields
the store no longer exposes. This commit reconciles both.

tasks.md
* Mid-phase pivot called out at the top of Phase 2 with commit hashes.
* P2-01..P2-05 kept as done but stamped ⚠️ "shipped against old shape
  — to re-validate under P2R-02".
* P2-04.5 (manual flag) struck as superseded.
* New P2R-NN section covering work that previously lived only in
  commit messages and code stubs:
    P2R-00.1/00.2/00.3/00.4 — phases already shipped (this commit
                              records 00.4)
    P2R-01 — REST + WS rewire against slim schedules + source groups
             + repo maintenance + auto-init
    P2R-02 — UI rewire against the v4 wireframes
    P2R-03..05 — prune / check / unlock command surfaces
    P2R-06 — server-side maintenance ticker (cadence-driven)
    P2R-07 — repo stats panel
    P2R-08 — pending_runs queue worker
    P2R-09 — auto-init UX polish
    P2R-10..12 — pre/post hooks rehomed from schedule onto source group
    P2R-13..14 — bandwidth + next/last-run surface
* P2-16/17/18 (Windows + announce-and-approve) untouched.
* Phase 2 acceptance criteria rewritten against the new model.

UI patch-up (P2R-00.4)
* host_detail.html + host_row.html: removed every $host.RepoInitialisedAt
  reference (column dropped in migration 0008 — render was 500'ing).
* Removed manual init-repo branches; the auto-init path replaces them.
* Schedules sub-tab demoted from active link to inert div until P2R-02
  rebuilds the page (it was linking to a raw 501 from the stubbed
  ui_schedules.go handlers).
* Disabled the four per-host Run-now buttons (dashboard row + host
  detail header + empty-snapshots state + right-rail) with a
  "lands in P2 Phase 4" hint — handler is 501-stubbed pending P2R-01,
  so leaving them clickable produced silent failures over htmx.
* Dashboard row-action becomes "Open →" instead of Run-now.

Project tooling
* .mcp.json at repo root: project-scoped Playwright MCP override.
  Forces --headless (so I don't pop a browser at the operator) and
  --output-dir _diag (so screenshots / traces land in the gitignored
  _diag/ directory rather than scattered at the repo root).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 09:13:05 +01:00
steve 5667cdf13a P2 redesign · phase 2: store rewrite — sources, slim schedules, repo maintenance
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Go-side data model rebuilt against migration 0008. The fat-Schedule
shape (paths/excludes/tags/retention/manual/kind/options/hooks) is
gone; that surface lives on source_groups now.

* store/types.go
  - Schedule slimmed to {id, host_id, cron, enabled, source_group_ids,
    timestamps}. SourceGroupIDs populated by Get/List, accepted on
    Create/Update so callers pass desired junction state in one shape.
  - SourceGroup added: name (= snapshot tag), includes/excludes,
    retention_policy, retry_max + retry_backoff_seconds, cached
    conflict_dimension.
  - HostRepoMaintenance added: forget/prune/check cadences + enabled.
  - PendingRun added: offline-retry queue.
  - Host loses RepoInitialisedAt; gains BandwidthUpKBps + BandwidthDownKBps.
  - RetentionPolicy moves home from "schedule field" to "source group
    field" but the type itself + Summary() method unchanged.

* store/sources.go (new) — CRUD + GetByName + ConflictDimension cache.
  Group writes bump host_schedule_version; conflict cache writes don't
  (server-internal projection, agent doesn't see it).
* store/maintenance.go (new) — CreateDefault is idempotent (INSERT OR
  IGNORE). UpdateRepoMaintenance doesn't bump schedule version because
  these run on the server's own ticker, not the agent's local cron.
* store/pending.go (new) — Enqueue / DueRunsForRetry / Bump / Delete.
* store/schedules.go — rewritten for slim shape + junction CRUD.
  Update wipes the schedule_source_groups junction wholesale and
  re-inserts (simpler than diffing). Adds SchedulesUsingGroup for
  retention-conflict detection + UI labels.
* store/hosts.go — drops repo_initialised_at scan, adds bandwidth scan.
  New SetHostBandwidth helper.

* HTTP layer — temporarily stubbed during this rewrite (501 returns
  with redesign_in_progress error code). Phase 3 fills these in
  against the new shape:
    - schedules.go REST CRUD
    - schedule_push.go agent reconciliation
    - ui_schedules.go HTML form CRUD
  Run-now-per-host + Init-repo handlers in ui_handlers.go also stubbed
  — both go away in the new model (Run-now per source group; auto-init
  at host enrolment).

* enrollment.go — replaces "seed manual schedule from typed paths"
  with "seed default source group + repo-maintenance row." The default
  group gets the typed paths as its includes; operator edits later
  via Sources tab.

* ws/handler.go — drops the MarkHostRepoInitialised projection (column
  is gone; auto-init makes it derivable from latest init job's status).

Tests:
* store: existing schedule test rewritten for slim shape + junction;
  new sources_test.go covers source-group CRUD, name uniqueness,
  conflict cache, repo-maintenance defaults + idempotent seed,
  pending-runs queue lifecycle.
* http: schedules_test.go and schedule_push_test.go deleted — both
  exercised the obsolete fat-schedule API. Phase 3 rewrites them
  against the new endpoints.

go test ./... green. cmd/server + cmd/agent build. The UI is broken
end-to-end (schedules / sources / repo tabs all hit 501 stubs); Phase 3
restores REST + on-the-wire reconciliation; Phase 4 rewires the UI
templates against the new model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:30:41 +01:00
steve 666af41f46 design: v4 wireframes for P2 redesign (sources / schedules / repo)
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Hi-fi mock of the four pages affected by the redesign:
* /hosts/{id}/sources — list of source groups with per-row meta
  line (includes/excludes count, retention summary, usage,
  snapshot count) and Run-now / Edit / Delete actions. Tweaks
  toggle flips between fresh-host (default empty group, Run-now
  + Delete disabled) and multi-group states.
* /hosts/{id}/sources/{gid}/edit — name (snapshot tag), includes/
  excludes textareas, retention as a 3×2 grid of keep-* cells,
  retry-on-offline, inline conflict banner above retention when
  granularity↔cadence mismatch detected.
* /hosts/{id}/schedules — slim list (status / cron / source-tags
  / actions) plus new-schedule form (cron with quick-pick chips,
  source-group multi-select via clickable check pickers, enabled
  toggle).
* /hosts/{id}/repo — connection (URL/user/password/cert pin),
  bandwidth caps, maintenance rows (forget daily / prune weekly /
  check monthly with 5% subset), danger zone re-init.

Footer carries the retention-conflict detection spec (granularity
vs cadence mismatch). Visual language matches v1: --accent cyan,
JetBrains Mono for IDs/cron, btn tokens, sub-tab nav, hairline
panels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:54:14 +01:00
steve 7a7cac588c P2 redesign · phase 1: migration 0008 — sources + repo maintenance
Schema rebuild for the model collapse described in
design/v4-sources-redesign.html. Three nouns now stand on their
own:

* schedules — slim. Only cron + enabled + host_id. Fat-schedule
  shape (paths/excludes/tags/retention/manual/kind/options/hooks)
  is dropped wholesale. Schedule data wiped — by design (smoke env
  was nuked before this ran; fresh installs have nothing to lose).
* source_groups — name + includes + excludes + retention_policy +
  retry policy + cached conflict_dimension. Group name doubles as
  the snapshot tag so retention can target it cleanly. UNIQUE
  (host_id, name) enforces tag unambiguity.
* schedule_source_groups — N:M junction. One schedule can fire N
  groups per tick; one group can be referenced by N schedules.
* host_repo_maintenance — 1:1 with hosts. Default cadences:
  forget daily 03:00, prune weekly Sun 04:00, check monthly 1st
  05:00 with --read-data-subset 5%. Operator can edit on Repo tab.
* pending_runs — offline-retry queue. Server-side ticker dispatches
  due rows; bounded by source_groups.retry_max + retry_backoff_seconds.
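
Sketch of the retry bound on pending_runs (illustrative only, not the project's store code). It assumes a fixed, non-exponential backoff and an attempts counter on the row; PendingRun, bump, due, and at are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// PendingRun is a hypothetical in-memory view of one pending_runs row.
type PendingRun struct {
	Attempts int       // dispatch attempts made so far
	NextAt   time.Time // earliest time the ticker may try again
}

// at is a small helper so the example can use fixed timestamps.
func at(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

// bump records one failed dispatch. It returns false once retryMax attempts
// have been used, meaning the caller should delete the row instead of
// rescheduling it. Assumption: the backoff is a fixed interval.
func bump(r PendingRun, retryMax, backoffSeconds int, now time.Time) (PendingRun, bool) {
	r.Attempts++
	if r.Attempts >= retryMax {
		return r, false
	}
	r.NextAt = now.Add(time.Duration(backoffSeconds) * time.Second)
	return r, true
}

// due reports whether the server-side ticker should dispatch the row now.
func due(r PendingRun, now time.Time) bool { return !now.Before(r.NextAt) }

func main() {
	r, keep := bump(PendingRun{NextAt: at(0)}, 3, 60, at(100))
	fmt.Println(r.Attempts, keep, due(r, at(120)), due(r, at(160)))
}
```

Keeping the bound in one pure function makes the "bounded by retry_max" rule easy to unit-test without a database.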

Plus:
* hosts.bandwidth_up_kbps / .bandwidth_down_kbps — host-wide caps.
* hosts.repo_initialised_at — DROPPED. Auto-init on enrol makes
  it derivable from the latest init job; the Init-repo button goes
  too (failure surfaces via job history banner).

Note on FK safety: smoke env was wiped before migration ran, so
DROP TABLE schedules cascades to nothing. Fresh installs apply
0001-0007 then immediately 0008 — same story (no schedule rows
to lose). For an upgrade path on a populated DB, this migration
would need a data-preserving variant; not needed today.

Tests fail to compile/run after this — expected. The Go side
(store types, CRUD, REST handlers, agent runner, UI templates)
gets rebuilt in subsequent phases. tasks.md will track P2 redesign
progress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 20:54:01 +01:00
steve fdecde0d5c P2-05: forget command with retention policy
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
End-to-end forget plumbing — operator can create a forget schedule
with keep-* values, agent runs restic forget --keep-* … on the
schedule's cron (or via per-row Run-now), snapshot list shrinks,
UI updates.

* api.CommandRunPayload gains retention_policy json.RawMessage so
  the agent doesn't need a typed copy of the server-side struct.
* restic.ForgetPolicy mirrors restic's --keep-* flags. Empty()
  reports zero dimensions; restic wrapper RunForget refuses to
  run an empty policy (would delete every snapshot). Does NOT
  pass --prune — pruning lives behind a separate admin-only
  credential (P2-06); forget just rewrites the snapshot index.
* runner.RunForget mirrors RunBackup's envelope shape so the
  live log viewer works without special-casing. On success
  triggers reportSnapshots (forget shrinks the index, the host's
  snapshot count almost certainly changed).
* cmd/agent dispatcher handles MsgCommandRun with kind=forget,
  decodes RetentionPolicy from the wire, builds restic.ForgetPolicy.
* Server dispatchScheduleNow marshals the schedule's
  RetentionPolicy into the wire payload for kind=forget jobs.
  Refuses to dispatch a forget schedule with empty retention.
* validateSchedule rejects kind=forget without at least one keep-*
  dimension (new error code: missing_retention).
* UI schedule edit form gains a Kind dropdown (backup or forget;
  immutable on edit). Paths block toggles by kind via inline
  data-kind attributes. Form help-text explains the prune
  separation.

Other kinds (prune, check, unlock) deferred to P2-06..08; the
Kind dropdown only offers backup and forget today.
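
Sketch of the empty-policy guard described above (illustrative, not the actual restic wrapper). The struct fields and the Args helper are assumptions; only the --keep-* flag names come from restic itself. Note that --prune is never added.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ForgetPolicy is a hypothetical mirror of restic's --keep-* flags.
type ForgetPolicy struct {
	KeepLast, KeepHourly, KeepDaily, KeepWeekly, KeepMonthly, KeepYearly int
}

// ErrEmptyPolicy is returned when no keep-* dimension is set.
var ErrEmptyPolicy = errors.New("forget: empty retention policy")

// Args builds the argument list for `restic forget`. It refuses an empty
// policy and deliberately never appends --prune.
func (p ForgetPolicy) Args() ([]string, error) {
	flags := []struct {
		name string
		n    int
	}{
		{"--keep-last", p.KeepLast}, {"--keep-hourly", p.KeepHourly},
		{"--keep-daily", p.KeepDaily}, {"--keep-weekly", p.KeepWeekly},
		{"--keep-monthly", p.KeepMonthly}, {"--keep-yearly", p.KeepYearly},
	}
	args := []string{"forget"}
	for _, f := range flags {
		if f.n > 0 {
			args = append(args, f.name, strconv.Itoa(f.n))
		}
	}
	if len(args) == 1 { // nothing but the subcommand: no dimension was set
		return nil, ErrEmptyPolicy
	}
	return args, nil
}

func main() {
	args, err := ForgetPolicy{KeepLast: 7, KeepDaily: 14}.Args()
	fmt.Println(args, err)
	_, err = ForgetPolicy{}.Args()
	fmt.Println(err)
}
```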

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:07:42 +01:00
steve f62a90b4b3 ui: stop Run-now buttons wrapping to two lines
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Three sites:
* Schedules list per-row Run-now / Edit / Delete column was 1fr
  next to a 1.3fr retention column — too narrow for the three
  buttons. Pin the action column to 240px and add
  whitespace-nowrap to each button so the layout can't squeeze
  them onto two lines regardless.
* Dashboard host_row Run-now button got whitespace-nowrap +
  &nbsp; for the same reason inside the 92px action column.
* Host detail header "Run backup now" — &nbsp; the words so the
  button never breaks across lines if the header gets crowded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:59:42 +01:00
steve 1b947f5a2c restic: don't fall back to parent's HOME when picking the cache dir
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Agent runs as root (HOME=/root from systemd) with ProtectHome=
read-only, so restic's `mkdir /root/.cache/restic` fails on the
first call. Backups still completed (restic falls back to no-cache)
but every job log started with a noisy red "unable to open cache"
warning.

Default to /var/lib/restic-manager unconditionally — that's already
in the unit's ReadWritePaths and survives ProtectHome. ExtraEnv
overrides still win for tests / unusual setups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:43:10 +01:00
steve c565a7abd1 agent unit: drop SystemCallFilter — was killing restic with SIGSYS
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Allow-list filter @system-service excludes some syscalls Go's
runtime + restic's file scanner reach for; init job died
immediately with "bad system call (core dumped)". CapabilityBoundingSet
already constrains what root can do; the Protect*/Restrict* toggles
still cover network / kernel / mount / namespace. Net effect on the
threat model is negligible vs the operational cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:40:43 +01:00
steve 7e49b62e0e Add CLAUDE.md with project-specific rules
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Three rules to date:

* After every make build, restage the agent binary + install
  assets into /tmp/rm-smoke/data/ and replace the running agent
  on this dev box. Plain `make build` doesn't reach either, and
  forgetting has bitten the smoke env twice today (stale agent
  without mergeRestCreds; stale unit without User=root).

* Migrations: prefer ALTER TABLE DROP/RENAME COLUMN (SQLite
  3.35+) over the rebuild dance. With foreign_keys=ON in the DSN,
  DROP TABLE on a parent with ON DELETE CASCADE children wipes
  every dependent table — and PRAGMA foreign_keys=OFF inside a
  migration is a no-op (PRAGMA can only change outside a tx).

* Don't slog restic's merged URL. The user:pass@-embedded form
  exists only inside envSlice() at exec time; if any URL needs
  to be operator-visible, route it through restic.RedactURL.
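
Sketch of what a redaction helper for the third rule might look like (an assumption; the real restic.RedactURL may differ). It strips the userinfo part entirely and handles restic's "rest:" prefix, which net/url would otherwise parse as the scheme. The host name in the example is made up.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// redactURL removes any user:pass@ section from a repository URL so the
// result is safe to log. Hypothetical stand-in for restic.RedactURL.
func redactURL(raw string) string {
	prefix, rest := "", raw
	if strings.HasPrefix(raw, "rest:") {
		prefix, rest = "rest:", strings.TrimPrefix(raw, "rest:")
	}
	u, err := url.Parse(rest)
	if err != nil {
		// Do not echo err: url.Parse errors can quote the input verbatim.
		return prefix + "<unparseable url>"
	}
	u.User = nil
	return prefix + u.String()
}

func main() {
	fmt.Println(redactURL("rest:https://alice:s3cret@backup.example.com:8000/alice/"))
}
```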

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:33:20 +01:00
steve e0037f0026 restic: treat 'config file already exists' on init as soft success
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Re-running restic init on a repo that's already initialised exits
non-zero with "Fatal: ... config file already exists". Semantically
that's a no-op, not a failure — the repo IS initialised, the
caller's intent is satisfied. Sniff stderr for the magic string
and swallow the exit code in that case, emitting an event line
so the operator-facing log says what happened.
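
Sketch of the stderr sniff (illustrative; function name and note wording are assumptions, the marker string is the one quoted above):

```go
package main

import (
	"fmt"
	"strings"
)

// alreadyInitialised is the stderr fragment restic prints when the repo
// config already exists.
const alreadyInitialised = "config file already exists"

// initSucceeded treats exit code 0 as success and a non-zero exit whose
// stderr carries the marker as a soft success. note is an operator-facing
// event line; its wording here is made up for the example.
func initSucceeded(exitCode int, stderr string) (ok bool, note string) {
	switch {
	case exitCode == 0:
		return true, "repository initialised"
	case strings.Contains(stderr, alreadyInitialised):
		return true, "repository already initialised; treating init as a no-op"
	default:
		return false, "restic init failed"
	}
}

func main() {
	ok, note := initSucceeded(1, "Fatal: ... config file already exists")
	fmt.Println(ok, note)
}
```

Matching on a message string is brittle across restic versions, so a real implementation would want a test pinned to the restic version in use.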

Caught while smoke-testing P2-04.5: I'd init'd the repo manually
during a debug session, then the operator clicking the UI's
Init-repo button would hit this and the host's repo_initialised_at
would never flip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:22:01 +01:00
steve 72d8081b0d Add-host: default repo username to hostname; always show htpasswd snippet
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
The pending page suppressed the htpasswd snippet when repo_username
was blank — but with --private-repos the username is required for
auth, and operators routinely leave the field blank assuming the
system will pick something sensible.

* handleUIAddHostPost defaults repo_username to the typed hostname
  when blank. Matches what --private-repos expects (URL path
  segment == username).
* pending_host.html: snippet now renders whenever a password is
  present (always true after the generate-on-blank logic landed
  earlier).
* Form help-text updated to describe the default explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 13:08:23 +01:00
steve 8a05969953 Add-host: durable pending page + polled awaiting-agent panel
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Two issues from a smoke session:
1. The awaiting-agent panel never refreshed — operator had to go
   back to the dashboard to see the host had connected.
2. Generated passwords were displayed only on the POST response.
   Navigating away (or even an accidental tab close) lost them
   permanently, so the operator couldn't update the rest-server's
   htpasswd.

Both have the same fix: convert the POST-rendered transient
"result state" into a durable GET page at /hosts/pending/{token}.

* New route GET /hosts/pending/{token} renders the install-command +
  htpasswd snippet view. Password is decrypted from the (still-
  encrypted-at-rest) token row on every render — operator can
  refresh, bookmark, navigate away and come back. Once the agent
  enrols, the page redirects to /hosts/{id}; once the token
  expires, redirect to /hosts/new.
* New route GET /hosts/pending/{token}/awaiting returns a polled
  HTML fragment that the pending page swaps in every 2s via HTMX.
  States: awaiting (keep polling) | connected (show "Open host →"
  + "View schedules" CTAs, polling stops) | expired (mint-new
  link, polling stops). Polling stops naturally because only the
  awaiting state's wrapper carries the hx-trigger attribute.
* POST /hosts/new now 303-redirects to /hosts/pending/{token}
  on success; validation errors keep re-rendering the form with
  banner.
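
Sketch of how the three polled states could be derived from the token row (illustrative; TokenStatus, pendingState, shouldPoll, and at are hypothetical names, and checking consumption before expiry is an assumption):

```go
package main

import (
	"fmt"
	"time"
)

// TokenStatus is a hypothetical view of the enrolment-token row.
type TokenStatus struct {
	ExpiresAt  time.Time
	ConsumedAt *time.Time // nil until an agent enrols with the token
}

// at is a small helper so the example can use fixed timestamps.
func at(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

// pendingState maps a token row to one of the three fragment states.
// A consumed token wins over expiry: the host is connected either way.
func pendingState(s TokenStatus, now time.Time) string {
	switch {
	case s.ConsumedAt != nil:
		return "connected"
	case !now.Before(s.ExpiresAt):
		return "expired"
	default:
		return "awaiting"
	}
}

// shouldPoll reports whether the rendered fragment should carry the polling
// trigger; only the awaiting state keeps polling.
func shouldPoll(state string) bool { return state == "awaiting" }

func main() {
	s := TokenStatus{ExpiresAt: at(100)}
	fmt.Println(pendingState(s, at(10)), shouldPoll(pendingState(s, at(10))))
	fmt.Println(pendingState(s, at(100)))
}
```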

Supporting changes:
* New store helper Store.GetEnrollmentTokenStatus(tokenHash) for
  the polling endpoint — returns {expires_at, consumed_at,
  consumed_host} in one round-trip without dragging in the
  attachments-decryption path.
* New ui.Renderer.RenderPartial(w, name, data) for HTMX fragment
  responses (no layout wrap). Picks an arbitrary page's template
  set as the lookup point — every page parses the full common-
  paths list, so they all see every partial.
* add_host.html stripped to form-only; pending_host.html owns the
  result-state UI; awaiting_agent.html is the polled partial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 12:59:24 +01:00
steve 148e61b33b P2-04.5: kill host.default_paths in favour of manual schedules
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Having two independent path lists for "what does this host back up?" was
a real divergence footgun — operator types one set at Add-host time
and a different set into a schedule, both end up in the same repo,
the snapshot history looks fine until restore. Resolution: drop
host.default_paths entirely; add a `manual` flag on schedules.
A manual schedule has paths/excludes/tags/retention like any other
but no cron — it fires only via per-schedule Run-now. Single source
of truth for what gets backed up.

Schema (migration 0007):
* schedules.manual INTEGER NOT NULL DEFAULT 0.
* For every host with non-empty default_paths, seed a manual
  schedule with those paths and bump host_schedule_version.
* ALTER TABLE hosts DROP COLUMN default_paths.
* ALTER TABLE enrollment_tokens RENAME COLUMN default_paths
  TO initial_paths.

Original draft of this migration rebuilt hosts via the
create-new + drop-old + rename-new pattern. With foreign_keys=ON
(set in the connection DSN), DROP TABLE on the parent fired
ON DELETE CASCADE on every child of hosts(id) — schedules /
jobs / snapshots / host_credentials all wiped on the smoke env
when I tried it. SQLite 3.35+ supports column-level ALTERs
directly, so we skip the rebuild dance and avoid the cascade
trap. Six lines of SQL instead of sixty, no FK risk.

Run-now rewiring:
* New `dispatchScheduleNow(hostID, scheduleID, conn?)` helper
  unifies the agent-driven path (cron fire → schedule.fire →
  OnScheduleFire callback) and the UI-driven path (operator
  clicks Run-now on a schedule row). Conn arg is optional; nil
  falls back to Hub.Send.
* New POST /hosts/{id}/schedules/{sid}/run endpoint — per-row
  Run-now button on the schedules list.
* Dashboard's per-host Run-now (handleUIRunBackup) now picks the
  host's only enabled manual schedule, falls back to the only
  enabled schedule, else returns "pick one in Schedules tab".
  Keeps one-click for the common case.
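
Sketch of the dashboard Run-now selection rule (illustrative; the Schedule shape and pickRunNow are hypothetical, and handleUIRunBackup itself is not shown):

```go
package main

import (
	"errors"
	"fmt"
)

// Schedule is a reduced, hypothetical view of a schedule row.
type Schedule struct {
	ID      string
	Enabled bool
	Manual  bool
}

// ErrPickOne is returned when no single schedule can be chosen.
var ErrPickOne = errors.New("pick one in Schedules tab")

// pickRunNow prefers the only enabled manual schedule, falls back to the
// only enabled schedule, and otherwise asks the operator to choose.
func pickRunNow(all []Schedule) (Schedule, error) {
	var enabled, manual []Schedule
	for _, s := range all {
		if !s.Enabled {
			continue
		}
		enabled = append(enabled, s)
		if s.Manual {
			manual = append(manual, s)
		}
	}
	switch {
	case len(manual) == 1:
		return manual[0], nil
	case len(enabled) == 1:
		return enabled[0], nil
	default:
		return Schedule{}, ErrPickOne
	}
}

func main() {
	s, err := pickRunNow([]Schedule{
		{ID: "nightly", Enabled: true},
		{ID: "adhoc", Enabled: true, Manual: true},
	})
	fmt.Println(s.ID, err)
}
```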

Agent:
* Scheduler skips manual schedules in cron build (silent — they're
  a normal data shape, not an error).
* Wire Schedule struct gains Manual flag.
* Schedule.fire flow unchanged — the agent only ever fires
  non-manual schedules anyway.

UI:
* Add-host form retitled "Initial schedule · manual" so the
  operator knows the paths become an editable schedule under
  the Schedules tab. Result page calls out the manual schedule
  + points at Host > Schedules.
* Schedule edit form: "Manual schedule" checkbox at the top of
  the When section; toggling it hides/shows the cron field via
  inline JS. Server-side validator skips the cron requirement
  when manual=true.
* Schedule list shows a "manual" tag under the status pill and
  renders the When column as "— run-now only —" for manual rows.
  Each row gets a Run-now button when the schedule is enabled
  and the host is online.

Tests + go test ./... green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 12:26:06 +01:00
steve 160d788bae P2-04: schedule editor UI
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Closes the schedule foundations slice — operator can now drive the
plumbing P2-01..03 landed without touching the JSON API.

* New routes:
  - GET  /hosts/{id}/schedules          (list)
  - GET  /hosts/{id}/schedules/new      (create form)
  - POST /hosts/{id}/schedules/new      (create)
  - GET  /hosts/{id}/schedules/{sid}/edit (edit form)
  - POST /hosts/{id}/schedules/{sid}/edit (update)
  - POST /hosts/{id}/schedules/{sid}/delete (delete, confirm-then-redirect)

* List view (web/templates/pages/schedules_list.html):
  status, cron, paths, retention summary, tags, edit/delete buttons.
  Header shows "version N · agent in sync" or "agent at vM" when the
  push hasn't been ack'd yet — backed by host_schedule_version +
  applied_schedule_version. Empty-state CTA points at /schedules/new.

* Create/edit form (web/templates/pages/schedule_edit.html, shared):
  cron expression with five quick-pick presets (daily 3am / every 6h
  / @hourly / weekly Sun / monthly 1st), paths textarea (one per
  line), excludes textarea, tags (comma-separated), retention as six
  numeric fields (mirrors restic's --keep-* flags one-for-one),
  bandwidth caps, enabled toggle. Side panel explains the
  reconciliation flow so the operator knows what saving actually
  does. Validation errors re-render with operator's input intact.

* internal/server/http/ui_schedules.go owns the handlers; reuses
  the same validateSchedule + pushScheduleSetAsync used by the JSON
  API path. Each save audit-logs schedule.created / schedule.updated
  / schedule.deleted (matching the JSON API actions).

* store.RetentionPolicy gains a Summary() method ("last=7, d=14,
  w=4" or "—"). Used by the list view's table cell so templates
  don't have to do any conditional retention rendering.

* Two new template helpers: list (string varargs → []string, used
  for the cron preset row) and joinComma (sibling to joinDot for
  the rare list that wants commas). RetentionPolicy.Summary covers
  the schedule-list case but the helpers are general.

* host_detail.html secondary tabs row converted from inert <div>s
  into <a> links. Snapshots active by default; Schedules now points
  at the new page. Jobs/Repo/Settings remain inert until their
  P2 owners ship.

Hooks UI deferred to P2-15 (lands with the hook execution path).
Single-kind UI (backup only) by design — other kinds get a UI when
their job dispatch lands in P2-05..08.
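
Sketch of a Summary() along the lines described for store.RetentionPolicy (illustrative; field names and the h/m/y labels are assumptions, only "last", "d", "w" and the dash placeholder appear in the example above):

```go
package main

import (
	"fmt"
	"strings"
)

// RetentionPolicy mirrors restic's six --keep-* flags (hypothetical fields).
type RetentionPolicy struct {
	KeepLast, KeepHourly, KeepDaily, KeepWeekly, KeepMonthly, KeepYearly int
}

// Summary renders only the dimensions that are set, so templates need no
// conditional logic of their own.
func (p RetentionPolicy) Summary() string {
	dims := []struct {
		label string
		n     int
	}{
		{"last", p.KeepLast}, {"h", p.KeepHourly}, {"d", p.KeepDaily},
		{"w", p.KeepWeekly}, {"m", p.KeepMonthly}, {"y", p.KeepYearly},
	}
	var parts []string
	for _, d := range dims {
		if d.n > 0 {
			parts = append(parts, fmt.Sprintf("%s=%d", d.label, d.n))
		}
	}
	if len(parts) == 0 {
		return "—" // placeholder for "no retention configured"
	}
	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println(RetentionPolicy{KeepLast: 7, KeepDaily: 14, KeepWeekly: 4}.Summary())
	fmt.Println(RetentionPolicy{}.Summary())
}
```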

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:44:40 +01:00
steve 6450bf1b88 P2-02 (agent side) + P2-03: agent scheduler + schedule.fire dispatch
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Closes the schedule reconciliation loop end-to-end.

* New `internal/agent/scheduler` package wraps robfig/cron/v3 with
  the lifecycle the agent needs:
  - Apply(ScheduleSetPayload, Sender) stops the prior cron (waiting
    for in-flight entries to return), rebuilds from scratch, starts,
    and emits schedule.ack with the version we just applied.
  - Disabled entries skipped silently; bad cron exprs (which
    shouldn't reach us — the server validates — but defensive)
    log a warn and skip.
  - On each cron tick the entry sends a new schedule.fire envelope
    to the server with {schedule_id, scheduled_at}. The scheduler
    itself never builds CommandRunPayloads — server is the source
    of truth for jobs.
  - tx is swapped on every Apply, so reconnect is handled
    naturally: cron entries that fire against a dropped tx log
    "no active connection" and skip the tick.
  - Stop() is idempotent and waits for the cron's in-flight
    workers via cron.Stop().Done().

* New wire message api.MsgScheduleFire + api.ScheduleFirePayload
  for the agent → server "I just fired locally" RPC.

* Server-side dispatch (schedule_push.go: dispatchScheduledJob):
  looks up the schedule by id, validates ownership + that it's
  enabled, builds args from kind (paths for backup; other kinds
  are still arg-less in Phase 2 and grow as those job kinds land
  in P2-05..08), persists a jobs row with actor_kind=schedule +
  scheduled_id, and writes command.run back on the same conn so
  the agent runs through its existing dispatch path.

* store.CreateJob now writes scheduled_id. This column was in the
  schema since 0001 but never populated — the original P1 path
  only had operator-driven jobs, so actor_kind was always 'user'
  and scheduled_id was always nil.

* cmd/agent/main.go integration: dispatcher gains a
  *scheduler.Scheduler; the MsgScheduleSet case now hands the
  payload to scheduler.Apply (in a goroutine so the WS read loop
  keeps draining other messages).

* WS dispatcher gains OnScheduleFire alongside OnScheduleAck.

* Tests:
  - scheduler unit tests (4): ack-on-apply, cron tick fires
    schedule.fire envelope, disabled entries don't fire, replace-
    prior-state stops the old cron.
  - Server-side end-to-end: schedule.fire → command.run with the
    right job_id / kind / args, plus jobs row with actor_kind=
    "schedule" and scheduled_id linking back to the schedule.

Persistence of next-fire times across agent restarts is
deliberately deferred. A missed fire window during downtime
simply fires once on reconnect — that's the desirable behaviour
(the operator wants the missed backup to run, not be silently
skipped because we lost track of when it was due).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:29:12 +01:00
steve 946b6db137 P2-02 (server side): schedule reconciliation push + ack handling
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Server is now the source of truth for the agent's cron set.

* Helpers in schedule_push.go:
  - loadScheduleSetPayload reads the host's schedules + canonical
    version into the wire shape.
  - pushScheduleSetOnConn writes directly to a just-handshaken conn
    (avoids racing against Hub.Register on a brand-new connection).
  - pushScheduleSetAsync is the post-CRUD flavour — no-op when the
    host is offline (the next reconnect's on-hello path catches it
    up, so a missed push is non-fatal).
  - applyScheduleAck records what version the agent has confirmed.

* onAgentHello restructured: was returning early when the host had
  no repo credentials, which made the schedule push unreachable for
  fresh hosts. Split into pushRepoCredsOnHello (silent no-op on
  ErrNotFound) + pushScheduleSetOnConn (always runs). Empty schedule
  list is a valid push: tells the agent to drop stale cron entries.

* WS dispatcher gains an OnScheduleAck hook on HandlerDeps; the
  http server wires it to applyScheduleAck. MsgScheduleAck moves
  out of the "TODO(P2)" group into a real case that decodes the
  payload and forwards to the callback.

* Schedule CRUD handlers each fire pushScheduleSetAsync after the
  audit-log write so the agent picks up changes within seconds.

Tests cover:
  - On-hello push of an already-created schedule, agent acks,
    applied_schedule_version flips on the host row.
  - Connect-then-CRUD: empty initial push (version 0), then a
    follow-on push at version 1 after the operator creates a
    schedule via REST.

Agent-side `schedule.set` handler (parse, replace local cron,
emit `schedule.ack`) is the remainder of P2-02 and lands with
P2-03's local scheduler.
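
Sketch of the version/ack bookkeeping behind the push (illustrative, in-memory only; hostVersions and its methods are hypothetical, and ignoring stale or future acks is an assumption rather than documented behaviour):

```go
package main

import "fmt"

// hostVersions tracks the canonical schedule version for a host and the
// version the agent last acknowledged.
type hostVersions struct {
	Canonical int64 // bumped on every schedule mutation
	Applied   int64 // last version confirmed via schedule.ack
}

// mutate models any schedule create/update/delete: bump and return the
// version that the next schedule.set push will carry.
func (h *hostVersions) mutate() int64 {
	h.Canonical++
	return h.Canonical
}

// ack records an agent acknowledgement. Acks older than what is already
// recorded, or newer than anything the server has issued, are ignored.
func (h *hostVersions) ack(v int64) {
	if v >= h.Applied && v <= h.Canonical {
		h.Applied = v
	}
}

// inSync reports whether the agent has confirmed the canonical version.
func (h *hostVersions) inSync() bool { return h.Applied == h.Canonical }

func main() {
	var h hostVersions
	h.ack(0) // empty initial push at version 0
	v := h.mutate()
	fmt.Println(v, h.inSync())
	h.ack(v)
	fmt.Println(h.inSync())
}
```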

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:22:06 +01:00
steve 4b075840a1 P2-01: schedule schema + CRUD API
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
The `schedules` table was already laid down in migration 0001; this
slice adds the Go-side data model, store CRUD with atomic version
bumps, and REST endpoints.

* `store.Schedule` + `RetentionPolicy` + `ScheduleOptions` typed
  views (the wire form on the agent side keeps retention/options
  as raw JSON since the agent just forwards them to restic).
* Store CRUD: CreateSchedule / GetSchedule / ListSchedulesByHost /
  UpdateSchedule / DeleteSchedule. Each mutation bumps
  `host_schedule_version` atomically in the same tx via UPSERT on
  `host_schedule_version`. SetHostAppliedScheduleVersion records
  what the agent has confirmed via schedule.ack (P2-02 will use it).
* REST endpoints under /api/hosts/{id}/schedules + /{sid}:
  GET (list, with the version envelope so callers can detect
  drift), POST (create), PUT (update — kind is immutable), DELETE.
* Validation: cron expressions parse via robfig/cron/v3 (same
  parser the agent will use, so anything that validates here will
  fire there); kind ∈ {backup, forget, prune, check} (init/unlock
  are operator-only one-shot kinds, not schedulable); backup
  schedules require ≥1 path; hooks rejected on non-backup kinds
  (spec §14.3).
* All mutations audit-logged.
* Tests: store-level CRUD + version-bump invariants; REST happy
  path (create→list→update→delete with version progression); REST
  validation table covers each rejection code.
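
Sketch of the non-cron validation rules listed above (illustrative; ScheduleInput, validate, and all three error-code strings are hypothetical, and the cron parse via robfig/cron/v3 is left out to keep the example dependency-free):

```go
package main

import "fmt"

// ScheduleInput is a reduced, hypothetical view of a schedule request body.
type ScheduleInput struct {
	Kind     string
	Paths    []string
	HasHooks bool
}

// schedulable lists the kinds that may be scheduled; init and unlock are
// operator-only one-shot kinds and are therefore absent.
var schedulable = map[string]bool{
	"backup": true, "forget": true, "prune": true, "check": true,
}

// validate returns a rejection code, or "" when the input is acceptable.
// The code strings are made up for this sketch.
func validate(s ScheduleInput) string {
	switch {
	case !schedulable[s.Kind]:
		return "invalid_kind"
	case s.Kind == "backup" && len(s.Paths) == 0:
		return "missing_paths"
	case s.Kind != "backup" && s.HasHooks:
		return "hooks_not_allowed"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", validate(ScheduleInput{Kind: "backup", Paths: []string{"/etc"}}))
	fmt.Printf("%q\n", validate(ScheduleInput{Kind: "init"}))
}
```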

newTestServerWithHub now sets BootstrapToken so the schedules
handler tests can use the existing login flow without a parallel
test-server constructor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:12:58 +01:00
steve ee3ee241ea P1 polish: agent-as-root, init-repo flow, rest creds passthrough, UX fixes
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:

* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
  drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
  with ReadWritePaths confined to /etc + /var/lib/restic-manager;
  NoNewPrivileges blocks escalation. Install script no longer
  creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
  rationale (matches UrBackup / Veeam / Bareos defaults; trying to
  back up "everything" as an unprivileged user creates silent skips
  on /home, /root, /var/lib/* with no upside vs the threat model
  the agent already implies).

* Init-repo end-to-end. New JobKind="init" wired through agent
  runner, restic.Env.RunInit, server dispatcher, and a UI button
  (red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
  flips on init success, on backup success, or on a non-empty
  snapshots.report. The "Run now" / "Init" / "Retry" branching now
  drives both the dashboard host row and the host-detail panel.
  Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
  the safe create-new-then-rename pattern; first version corrupted
  job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
  affected DBs).

* rest-server creds embedded at exec time only. restic.Env gains
  RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
  inside envSlice() and never assigns it back to the struct, so
  nothing slog-able ever sees the cleartext form. RedactURL helper
  for any future surface that needs to log a URL safely. Both
  helpers tested.

* Add-host UX. Repo password is now optional — server mints a
  24-byte URL-safe random one and surfaces it once, alongside an
  htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
  the operator pastes one command on the rest-server host and one
  on the endpoint. Result page also links the install snippet at
  /install/install.sh (was /install.sh — 404'd before) and pipes
  to bash (not sh — script uses set -o pipefail and other
  bashisms; on Debian/Ubuntu sh is dash).

* Late-subscriber race in JobHub. A fast-failing job could finish
  (DB write + Broadcast) before the browser's HX-Redirect → page
  load → WS-connect path completed, so the JS sat forever waiting
  on a job.finished that already passed. JobHub split into
  Register + Send + Run; handleJobStream now subscribes first,
  re-fetches the job, and sends a synthetic job.finished if the
  state is already terminal.

* HTMX error visibility. New toast partial listens to
  htmx:responseError and surfaces the response body as a
  bottom-right toast — every server-side validation error now
  becomes visible without per-handler JS wiring. Also handles
  custom rm:toast events for future server-pushed notifications
  via the HX-Trigger header. Themed via existing CSS vars.

* Dashboard rows are now whole-row clickable to host detail
  (CSS card-link pattern: absolute-positioned anchor + .row-action
  z-index restoration so the action button stays clickable).
  "View →" on a running job links to /jobs/<id> rather than
  /hosts/<id> since the row click already covers the host page.

* "Run first" / "Run first backup" → "Run now" everywhere for
  consistency.

* runbook (docs/e2e-smoke.md) updated — live-log streaming step
  now reflects P1-26; mentions the browser-driven Run-now flow.

* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
  it up; .gitignore now excludes /_diag/ entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:02:12 +01:00
steve 12b72e7dde P1 polish: Host.default_paths interim + restic env hygiene + job_id JS quoting
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Two fixes that close the loop on dashboard run-now and harden the
agent's restic invocation.

Default paths (interim until P2-01 schedules):
  - 0003 migration adds default_paths TEXT NOT NULL DEFAULT '[]'
    to hosts and to enrollment_tokens.
  - Operator types paths in the Add-host form (textarea, one per
    line). They ride on the enrol_token row alongside the
    encrypted creds (paths aren't secret — plain JSON column).
  - On consume, ConsumeEnrollmentToken still just burns the token;
    the new GetEnrollmentTokenAttachments returns both the
    re-bindable creds and the path list in one round trip, the
    handler transfers them onto the new host row inside CreateHost.
  - The dashboard's Run-now and host-detail's "Run backup now"
    button now read Host.DefaultPaths and pass them to dispatchJob.
    A host with no default paths returns 400 with a friendly
    "no paths set" message instead of dispatching a doomed
    `restic backup` with no positional args.
  - Doc comments explicitly call this out as a Phase 1 interim —
    schedules supersede.

Restic env hygiene:
  - envSlice() previously omitted HOME / XDG_CACHE_HOME, which
    bit the smoke runs whenever the agent was launched outside
    systemd (restic refused to start: "neither $XDG_CACHE_HOME
    nor $HOME are defined"). Now both are set explicitly: prefer
    Env.ExtraEnv overrides, fall back to the agent process's own
    HOME, and finally to /var/lib/restic-manager.
  - Comment makes the env policy explicit: parent's RESTIC_* /
    AWS_* / B2_* env is filtered out by design — control-plane
    is the unambiguous source of truth.
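
The env policy above could be sketched roughly as follows. This is not the actual envSlice(): childEnv, the pass-through of unrelated parent variables such as PATH, and the XDG_CACHE_HOME default of $HOME/.cache are all assumptions made for illustration. Only the filtered prefixes and the HOME fallback order come from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const fallbackHome = "/var/lib/restic-manager"

// Prefixes dropped from the parent environment, per the commit message.
var filteredPrefixes = []string{"RESTIC_", "AWS_", "B2_"}

// lookup returns the last value for key, mirroring how os/exec
// resolves duplicate entries.
func lookup(env []string, key string) (val string, found bool) {
	for _, kv := range env {
		if k, v, ok := strings.Cut(kv, "="); ok && k == key {
			val, found = v, true
		}
	}
	return val, found
}

func isFiltered(key string) bool {
	for _, p := range filteredPrefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return false
}

func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// childEnv builds the environment for the restic child process.
// extra plays the role of Env.ExtraEnv (values pushed by the server).
func childEnv(parent []string, extra map[string]string) []string {
	out := make([]string, 0, len(parent)+len(extra)+2)
	for _, kv := range parent {
		key, _, _ := strings.Cut(kv, "=")
		if key == "HOME" || key == "XDG_CACHE_HOME" || isFiltered(key) {
			continue
		}
		out = append(out, kv)
	}
	for k, v := range extra {
		if k != "HOME" && k != "XDG_CACHE_HOME" {
			out = append(out, k+"="+v)
		}
	}
	parentHome, _ := lookup(parent, "HOME")
	parentCache, _ := lookup(parent, "XDG_CACHE_HOME")
	home := firstNonEmpty(extra["HOME"], parentHome, fallbackHome)
	cache := firstNonEmpty(extra["XDG_CACHE_HOME"], parentCache, home+"/.cache")
	return append(out, "HOME="+home, "XDG_CACHE_HOME="+cache)
}

func main() {
	env := childEnv(os.Environ(), map[string]string{"RESTIC_REPOSITORY": "rest:https://backup.example/"})
	home, _ := lookup(env, "HOME")
	fmt.Println("HOME for restic:", home)
}
```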

JS bug fix in the live log page:
  - {{$job.ID | printf "%q"}} produced a literal-quoted JS string,
    which then went into the WS URL as ".../jobs/"<ID>"/stream"
    → 404. Switched to '{{$job.ID}}' inside the literal so
    html/template's auto-escape does the right thing. Verified
    end-to-end: dashboard "Run now" → live progress + log lines
    arrive over the WS → succeeded pill renders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:35:33 +01:00
steve bd434bd1d0 P1-26: live job log viewer + WS browser fan-out hub
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Closes the P1-21 remainder.

internal/server/ws/jobhub.go — new JobHub. Per-job_id set of
subscribers; each gets a 64-deep buffered channel with a writer
goroutine. Broadcast is non-blocking: if a subscriber is slow,
its channel fills and messages are dropped for that subscriber
only — the agent's read loop is never blocked by a stuck browser.
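
A minimal sketch of the non-blocking fan-out described above. The type and method names are illustrative and are not taken from jobhub.go; the writer goroutine per subscriber is left out, and only the 64-deep buffer and the drop-when-full behaviour are shown.

```go
package main

import (
	"fmt"
	"sync"
)

const subscriberBuffer = 64 // buffer depth named in the commit message

// JobHub keeps a set of subscriber channels per job ID.
type JobHub struct {
	mu   sync.Mutex
	subs map[string]map[chan []byte]struct{}
}

func NewJobHub() *JobHub {
	return &JobHub{subs: make(map[string]map[chan []byte]struct{})}
}

// Subscribe registers a buffered channel for jobID and returns it
// together with a cancel func that removes the subscription.
func (h *JobHub) Subscribe(jobID string) (<-chan []byte, func()) {
	ch := make(chan []byte, subscriberBuffer)
	h.mu.Lock()
	if h.subs[jobID] == nil {
		h.subs[jobID] = make(map[chan []byte]struct{})
	}
	h.subs[jobID][ch] = struct{}{}
	h.mu.Unlock()

	cancel := func() {
		h.mu.Lock()
		defer h.mu.Unlock()
		delete(h.subs[jobID], ch)
		if len(h.subs[jobID]) == 0 {
			delete(h.subs, jobID)
		}
	}
	return ch, cancel
}

// Broadcast offers msg to every subscriber of jobID without blocking.
// A subscriber whose buffer is full misses the message; the return
// value counts those drops.
func (h *JobHub) Broadcast(jobID string, msg []byte) (dropped int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs[jobID] {
		select {
		case ch <- msg:
		default:
			dropped++
		}
	}
	return dropped
}

func main() {
	hub := NewJobHub()
	ch, cancel := hub.Subscribe("job-1")
	defer cancel()
	hub.Broadcast("job-1", []byte(`{"type":"job.started"}`))
	fmt.Println(string(<-ch))
}
```

The select with a default branch is what keeps the agent's read loop from stalling: a full channel costs that one subscriber a message instead of blocking the sender.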

The agent dispatchAgentMessage path mirrors job.started /
job.progress / log.stream / job.finished envelopes onto the hub
in addition to its existing persistence work. The wire shape is
the same end-to-end, so client-side JS switches on env.type the
same way Go code does.

GET /api/jobs/{id}/stream is the browser endpoint. Auth via
session cookie (HTTP layer); upgrade; subscribe; pump until
context closes.

GET /jobs/{id} renders the live log page. Four states (queued/
running/succeeded/failed) drive the header pill, the progress
bar block, the failure summary panel, and the action button
(Cancel job while running, Back to host afterwards). Already-
persisted log lines are server-rendered on initial load; new
lines arrive over the WS and append to #log-stream. Auto-scrolls
unless the user scrolls up (a "⇢ Follow" pill re-attaches).
On job.finished the page reloads after 600ms to pick up the
final-state header rendered server-side.

POST /hosts/{id}/run-backup now sets HX-Redirect → /jobs/{job_id}
on success so HTMX lands the operator straight on the live log.
For non-HTMX callers (curl / plain form post) it 303s to the
same target.

store.ListJobLogs returns persisted log lines for initial render
on page load.

Browser-verified end-to-end: enrol → run a real backup against a
sibling restic/rest-server → live progress + 11 log lines stream
in → succeeded pill + final stats land after page reload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:45:56 +01:00
steve 26a2b85e13 P1-25: host detail page (snapshots tab default)
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
GET /hosts/{id} renders the v1 host detail layout:

  - persistent header: status dot (pulse if a job is in flight),
    monospace name, tags, plus a metadata strip (os/arch, agent
    version, restic version, "last seen Xs ago" or "online · last
    heartbeat …").
  - vitals strip: four tiles for last backup (status + relative
    time), repo size, snapshot count, open alerts.
  - sub-tabs: Snapshots is active; Jobs / Repo / Settings are
    visible but inert until P2.
  - snapshot table: short id, time (absolute), paths joined with
    " · ", size, file count, restore button (disabled — wires up
    in P3).
  - right rail: run-now stack (backup live, forget/prune/check/
    unlock disabled with the Phase tag), danger-zone remove panel
    (also disabled for now).

Empty state: when a host has no snapshots yet, the table replaces
itself with a "no snapshots yet" prompt that includes the run-now
button (provided the agent is online).

Pagination cap of 50 most-recent snapshots; full pagination lands
when fleet sizes demand it.

Template helpers grew: comma() now accepts int / int32 / int64 so
templates don't fight Go's type inference; joinDot() concatenates
a []string with " · "; absTime() formats time.Time as
YYYY-MM-DD HH:MM:SS; the existing relTime() already accepts T or
*T after P1-27.

Browser-verified end-to-end with seeded fixture data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:20:21 +01:00
steve dad8c7fe99 P1-27: Add host flow — form + minted-token result page
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
GET /hosts/new renders the focused two-column form (hostname,
tags, repo URL/username/password). POST /hosts/new validates,
mints a one-time token via the new mintEnrollmentToken helper —
shared with the existing JSON /api/enrollment-tokens endpoint —
and re-renders the same page in result state showing:

  - the install command with RM_SERVER + RM_TOKEN filled in (and
    an inline copy-to-clipboard button),
  - an "awaiting agent connection" panel with the hostname
    pre-filled,
  - a troubleshooting list pointing at the most common reasons
    the agent doesn't appear,
  - back-to-dashboard / add-another-host links.

publicURL() resolves RM_BASE_URL first, falling back to scheme +
Host on the inbound request — useful for local smoke without a
proxy.

Browser-verified end-to-end: form submit → token minted → install
command renders with the right values from the form input.

template fn formatRelTime now accepts time.Time *or* *time.Time
so templates can pass either without fighting the template
language's lack of an address-of operator.

Deferred: download-preconfigured-installer (a templated .sh with
the values baked in) — copy-paste covers v1; nice-to-have later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 20:16:54 +01:00
steve ee16bc7ce7 P1-24: live dashboard — fleet summary tiles + host table
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Server-rendered HTML view backed by:
  - new store.FleetSummary aggregating host counts + repo bytes +
    snapshot total + open alerts + last-24h job rollup in two queries.
  - GET /api/hosts (JSON list of hosts in the dashboard projection).
  - GET /api/fleet/summary (JSON aggregate, same shape as above).

The HTML page (web/templates/pages/dashboard.html) renders the four
summary tiles + host table directly from store data — no separate
fetch. Per-row state colour comes from .host-row.{degraded,failed,
offline} which paint a 3px left edge so problem hosts are scannable
without reading. HTMX is loaded into the base layout so per-row
"Run now" buttons can hx-post to /hosts/{id}/run-backup, a thin
HTML wrapper that funnels into a new dispatchJob helper shared
with the JSON /api/hosts/{id}/jobs endpoint.

Empty state (zero hosts) collapses to the "no hosts yet" prompt
with the + Add host CTA — matches the v1 mockup.

Template helpers (internal/server/ui/funcs.go) added for byte
formatting (412 GB / 3.7 TB), relative time (3m ago / 2d ago), and
comma grouping (1,847). Pure Go, no template-magic dependency.

Browser-verified end-to-end with seeded fixture data: five hosts
across all four states render with correct dots, accents, last-
backup pills, sizes, snapshot counts, alerts, tags, and the right
action button (Run now / Retry / Run first / View → / offline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:29:11 +01:00
steve 229f89fee2 P1-23 / P1-28: base layout, login, session-aware nav + Tailwind build
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
P1-28: Tailwind standalone CLI wired into the Makefile. `make tailwind`
downloads the pinned v3.4.17 binary into bin/tailwindcss (gitignored),
builds web/styles/input.css → web/static/css/styles.css. `make build`
now runs the CSS pass first; `make tailwind-watch` for dev. Output is
embedded in the binary via web.FS — single static binary, no Node.

The CSS source carries every component class the v1 mockups defined
(status dots, buttons, host row, log viewer, progress bar, fields,
chips, snippet panel, empty state) so screens that land later can
just reach for them.

P1-23: html/template tree at web/templates with two layouts (base
with chrome, chromeless for login + bootstrap), one nav partial, and
two pages (dashboard placeholder, login). internal/server/ui parses
the tree at startup; ui_handlers.go in the http package wires:

  GET  /         dashboard (303 → /login when unauthed)
  GET  /login    sign-in form
  POST /login    consume form, mint session cookie, 303 → /
  POST /logout   drop cookie, 303 → /login
  GET  /static/* embedded Tailwind bundle

The HTML login flow shares store/session logic with /api/auth/login
via a new authenticateAndSession helper — same security guarantees,
two surface representations (HTML form / JSON).

Verified end-to-end: bootstrap → form-login → authed dashboard →
sign-out → 303 cycle works in the browser; Tailwind output emits
only the component classes referenced in the live templates (9.6kB
minified).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:19:06 +01:00
steve 136e1a1d8f design: extend v1 to login / add-host / host-detail / job-log + lock components
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Five hi-fi screens completing the Phase 1 surface, all in v1's dark
operator-console register.

  v1-login          Sparse centred card. Sign-in + first-error variant.
                    No marketing chrome; build version sits in footer
                    so a returning operator can spot agent drift.

  v1-add-host       Focused two-column page (form left, contextual
                    "what happens next" right) — not a modal. Two
                    states: form (state A) and minted-token result
                    with install command (state B). Backed by
                    POST /api/enrollment-tokens (P1-32).

  v1-host-detail    Persistent header (status dot, mono name, tags,
                    primary CTAs, vitals strip) over four sub-tabs
                    (Snapshots / Jobs / Repo / Settings). Snapshots
                    is the default — the thing 90% of operators
                    want when they click a host name. Right rail
                    holds Recent activity, run-now stack, and a
                    danger-zone panel.

  v1-job-log        WS-streamed log view. Three states: running (live
                    progress bar + auto-scroll cursor), succeeded
                    (summary stats + final lines), failed (error
                    panel + tail). Backed by WS /api/jobs/{id}/stream
                    (P1-21 remainder).

  v1-components     The load-bearing reference. 14 sections covering
                    tokens (colour + type scale), status, buttons,
                    form fields, tags, tabs, host row, log viewer,
                    progress bar, stat tile, modal, toast, install
                    snippet, empty-state pattern. Every CSS class is
                    real and copy-able into the Go template build.

This locks the visual register before P1-23 onwards. Each Phase 1
template gets a {{define}} matching a section in v1-components.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:05:39 +01:00
steve f9c2351ab6 design: v1 polish — row accents, wider last-backup col, empty state
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
- Single .host-row CSS rule replaces 13 inline grid-template-columns
  copies; column widths bumped so "backup running…" doesn't wrap.
- Faint left-edge accent for degraded / failed / offline rows so
  problem hosts are scannable without reading.
- Empty-state hero added: top-bar + nav still present (Dashboard
  active, others dimmed) but body collapses to a calm "no hosts yet"
  prompt with the install command as the load-bearing affordance.
  Prerequisite note keeps the deliberate "restic must already be
  installed" decision visible to first-time operators.

This is the artefact P1-23/24/27 will template against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:48:15 +01:00
steve 81c7825937 design: three hi-fi dashboard directions for review
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Three deliberately differentiated takes on the dashboard so we can
lock the visual register before the UI work starts (P1-23 onwards).

  v1 — Operator console (Linear/Datadog dark register).
       Dense table, monospace numerics, restrained colour, pulsing
       status dot only when a job is running. The natural fit for
       the audience and the most defensible choice.

  v2 — Editorial calm (Stripe/Notion light register).
       Serif hero headline that humanises the data, cards with
       breathing room in a 2-up grid, demoted "quiet hosts" strip,
       subtle rust accent. Reads as trustworthy infrastructure.

  v3 — Print spec (Tufte/aerospace monospace register).
       Pure monospace, near-monochrome, status as typeset glyphs
       (●▶▲○✗) so the screen survives greyscale. "Requires
       attention" block groups problem hosts at the top; activity
       tail reads like a real log. Most polarising; highest
       craft ceiling.

Each file is self-contained (Tailwind via CDN + Google Fonts) and
includes a philosophy preamble + the dashboard hero + a component
vocabulary section so we can read the system, not just one screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:39:57 +01:00
steve b6cfa99413 agent: log accept/complete on backup jobs; audit: populate host.enrolled payload
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Two warts surfaced during the smoke run:

- Agent was silent between "config.update applied" and "job
  finished" — operators tailing journalctl saw no acknowledgement
  that a command.run had landed. Adds Info logs at job-accept
  ({job_id, paths}) and at successful completion.

- The host.enrolled audit row had an empty {} payload. Now
  carries {hostname, os, arch, has_repo_creds} so an audit-log
  reader can answer "what got enrolled and did the operator
  bundle creds with the token" without joining back to hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 18:24:56 +01:00
steve 2418e585db fix: enrollment FK race + log-when-rejected; runbook fixes from dry-run
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
The smoke runbook caught a real bug: ConsumeEnrollmentToken was
inserting into host_credentials (FK -> hosts) inside the same tx as
the token burn, but the host row didn't exist yet — CreateHost
runs in the *next* statement. The agent saw a generic 401 with no
clue why.

Fix: drop the host_credentials insert from ConsumeEnrollmentToken;
the HTTP handler now does Consume -> CreateHost ->
SetHostCredentials. SetHostCredentials failure is logged loudly
but doesn't fail the enrol — operator recovers via PUT
/api/hosts/{id}/repo-credentials.

Adds slog.Warn lines on both 401 paths in handleAgentEnroll so the
underlying cause is visible in server logs (the wire response stays
generic to avoid leaking which step failed).

Test: TestEnrollmentTransfersRepoCreds rewritten to mirror the new
order (consume -> create host -> SetHostCredentials).

Runbook (docs/e2e-smoke.md): rest-server moved off 8000 (commonly
in use); URLs use trailing slash on the rest path; clarified that
secrets_key is minted on first agent start, not at enrol time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:01:59 +01:00
steve 5d1951ad94 P1-34: e2e smoke runbook + redacted GET /repo-credentials
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Adds docs/e2e-smoke.md, a ~5-minute runbook that walks the full
P1 happy path against a sibling restic/rest-server: bootstrap
admin, mint token with repo creds, enrol an agent, watch the
config.update push land, run a backup, confirm the snapshot, edit
creds and watch the second push fire. Per the design discussion
this is a runbook (not a Go integration test); the Playwright
version lands in P5-06.

GET /api/hosts/{id}/repo-credentials returns the redacted view —
{repo_url, repo_username, has_password} — so the UI can pre-fill
the edit form without ever pulling the password out of the AEAD
blob.

Marks P1-32 / P1-33 / P1-34 done in tasks.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:49:34 +01:00
steve ec276dbc91 P1-33: agent-side encrypted secrets store + push-on-update
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
New internal/agent/secrets package: AEAD blob at
/var/lib/restic-manager/secrets.enc, atomic write (os.CreateTemp +
Sync + Rename), 0600. Key lives in agent.yaml as base64
(SecretsKey) — same trust boundary as the bearer token, minted on
first start via EnsureSecretsKey.

cmd/agent: dispatcher reads creds fresh from secrets.Load() on
each job rather than from in-memory config. config.update merges
the push with what's on disk and persists, so a daemon restart
keeps the latest values. Legacy plaintext repo_url/repo_password
in agent.yaml are silently migrated into secrets.enc on next start
and stripped from the YAML on the following save.

Tests: round-trip + wrong-key rejection + atomic-write
post-condition for secrets; key idempotence + legacy-field
parse/clear for config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:41:28 +01:00
steve 0ba56ed30d P1-32: server-side encrypted repo creds + push-on-hello
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Operator-minted enrollment tokens now carry the repo URL/username/
password as one AEAD blob bound (via additional-data) to the token
hash. ConsumeEnrollmentToken re-encrypts under host_id and writes a
host_credentials row in the same tx as token-burn, so the binding
moves with the credential.
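
To illustrate binding a blob to its owner via additional data, here is a sketch using AES-256-GCM from the standard library. The commit does not name the AEAD in use, so the cipher choice, the "token:"/"host:" AAD strings, and the function names are all assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and authenticates aad alongside it.
// The random nonce is prepended to the returned blob.
func seal(key, plaintext, aad []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, aad), nil
}

// open decrypts blob; it fails if aad differs from the value used at seal time.
func open(key, blob, aad []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(blob) < n {
		return nil, errors.New("blob too short")
	}
	return gcm.Open(nil, blob[:n], blob[n:], aad)
}

// rebind decrypts under oldAAD and re-encrypts under newAAD, e.g. when
// a credential moves from an enrollment token to a host row.
func rebind(key, blob, oldAAD, newAAD []byte) ([]byte, error) {
	pt, err := open(key, blob, oldAAD)
	if err != nil {
		return nil, err
	}
	return seal(key, pt, newAAD)
}

func main() {
	key := make([]byte, 32) // demo only: all-zero key
	blob, _ := seal(key, []byte(`{"repo_password":"pw"}`), []byte("token:abc"))
	moved, _ := rebind(key, blob, []byte("token:abc"), []byte("host:42"))
	_, err := open(key, moved, []byte("token:abc"))
	fmt.Println("old binding rejected:", err != nil)
}
```

Because the additional data is authenticated but not stored, copying a blob from one row to another makes decryption fail instead of silently yielding someone else's credentials.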

PUT /api/hosts/{id}/repo-credentials lets an operator edit creds
post-enrollment; merges with the existing blob, audits, and pushes
config.update if the agent is connected.

WS handler grows an OnHello hook that the HTTP layer wires to send
the host's decrypted creds as a config.update immediately after the
hello succeeds — synchronously, so a racing command.run lands after
the agent has its repo password.

Schema: 0002_host_credentials.sql adds enc_repo_creds to
enrollment_tokens and a host_credentials table (PK = host_id, FK
ON DELETE CASCADE).

Tests: round-trip token → consume → host_credentials with AAD swap
detection; no-creds path stays compatible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:38:35 +01:00
steve e58917106d spec/tasks: pull repo-credential plumbing into Phase 1
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Adds P1-32/33/34: encrypted repo creds carried on the enrollment token,
agent-side AEAD secrets file, end-to-end smoke. spec.md §4.2 and §7.3
rewritten to describe the full flow (server-issued at token time,
pushed via config.update on hello, persisted encrypted on the agent)
and to make the encrypted-file-now / OS-keyring-Phase-2 split
explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:32:53 +01:00
steve 6c9558c703 tasks: add P2-18 announce-and-approve, expand P1-27 with preconfigured installer
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
P2-18 captures the keypair + fingerprint-comparison enrollment flow
as a Phase 2 alternative to the token model. Includes guards
(rate limit, pending cap, hostname-collision flagging) and explicit
acceptance criteria.

P1-27 grows to mint encrypted repo creds alongside the token and
expose a one-click preconfigured-installer download from the
"Add host" form (cf. UrBackup Internet-mode push installer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:31:28 +01:00
steve 3904a78f14 P1-22: snapshot listing via restic snapshots --json
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Agent calls restic snapshots --json after each successful backup
(60s timeout, separate from the backup ctx) and ships the projection
over the existing snapshots.report WS envelope. Failure here is
logged but doesn't fail the job — the next successful backup catches
the projection up.

Server-side ReplaceHostSnapshots is delete-then-insert plus a
hosts.snapshot_count update in one transaction so the dashboard's
per-host count stays consistent with the projection. New read
endpoint GET /api/hosts/{id}/snapshots returns the cached list with
a refreshed_at marker so the UI can show staleness when an agent
has been offline.

Schema: dropped the unused snapshots.repo_id FK (repos as a
first-class entity is P2 work), added short_id and refreshed_at
columns, switched the time index to DESC for the most-recent-first
list query. api.Snapshot gains short_id; size_bytes/file_count come
from the embedded summary block on restic 0.16+ and stay zero on
older clients.

Tests cover round-trip, authoritative replacement after forget+prune
shrinkage, and empty-after-wipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:20:57 +01:00
steve 41a4043af3 server: drop in-process TLS — HTTP-only behind reverse proxy
Self-hosted deployments already terminate TLS at Caddy/Traefik/nginx;
making the server do TLS too means double cert config, dual ACME
plumbing, and an untested code path. Drop RM_TLS_CERT/RM_TLS_KEY,
remove TLSEnabled() and the ListenAndServeTLS branch.

Replace the cookie's "Secure if TLS-in-process" check with a new
RM_COOKIE_SECURE flag (default true). Local HTTP-only testing sets
RM_COOKIE_SECURE=false; production is always behind a TLS proxy and
the cookie stays Secure.
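
One way the flag could be read, as a sketch: unset means true, and an unparseable value is also treated as true so a typo cannot silently downgrade the cookie. That fail-closed choice, the cookie name rm_session, and the other cookie attributes are assumptions, not taken from the source.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strconv"
)

// cookieSecure interprets the RM_COOKIE_SECURE value; the default is true.
func cookieSecure(raw string) bool {
	if raw == "" {
		return true
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return true // assumption: fail closed on garbage input
	}
	return v
}

// sessionCookie builds the session cookie with the resolved Secure flag.
func sessionCookie(token string, secure bool) *http.Cookie {
	return &http.Cookie{
		Name:     "rm_session", // hypothetical cookie name
		Value:    token,
		Path:     "/",
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	}
}

func main() {
	secure := cookieSecure(os.Getenv("RM_COOKIE_SECURE"))
	fmt.Println(sessionCookie("abc", secure).String())
}
```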

Default port :8443 → :8080. docker-compose binds 127.0.0.1 only and
populates RM_TRUSTED_PROXY. spec.md §4.1/§10.1 rewritten with a
Caddyfile snippet and a hard "do not expose RM_LISTEN publicly"
warning. enrollResponse keeps cert_pin_sha256 in the shape but the
server can't introspect a cert it doesn't terminate — operator
pastes the proxy's hash into -cert-pin at install time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:20:41 +01:00
steve 77a305d064 tasks.md: mark Phase 1 progress
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Captures the state landed in this session:

Done (P1-01..03, P1-05, P1-06, P1-08..16, P1-17..20, P1-29):
  HTTP server, store + schema, crypto, first-run bootstrap,
  every API type with wire-shape tests, WS transport,
  enrollment + hello + heartbeat round-trip, agent config +
  service unit + WS client + sysinfo, restic wrapper, job
  lifecycle store + run-now endpoint, agent runner.

Partial (P1-04, P1-07, P1-21, P1-31):
  CSRF middleware lives with the UI work; audit middleware
  sweep lives with rest of API; live job-log fan-out needs
  the per-job browser hub; signed agent binaries deferred to
  Phase 5.

Open (P1-22..28):
  Snapshot listing, full UI suite (login, dashboard, host
  detail, live job log, add-host, Tailwind build).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:46:16 +01:00
steve 95b49ecab9 phase 1: run-now backup — restic wrapper, job lifecycle, end-to-end
Lands the operator → server → agent → restic → server roundtrip for
on-demand backups. The flow:

  POST /api/hosts/{id}/jobs {kind:"backup",args:["/path"]}
    → server creates a queued Job row
    → server emits command.run over WS to the host's agent
    → agent dispatcher spawns runner.RunBackup in a goroutine
    → runner spawns `restic backup --json`, parses each line
    → forwards: job.started, log.stream (every line), job.progress
      (throttled to 1/sec), job.finished (with summary stats blob)
    → server WS handler persists those into jobs / job_logs
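
The "parses each line" step in the flow above can be sketched like this. restic's JSON output tags each object with a message_type field ("status" while running, "summary" at the end); the struct below decodes only a few fields, and the outcome labels are hypothetical. The exit-code mapping follows the P1-16 note in this commit, which treats exit code 3 as success with issues.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupLine holds the subset of restic's JSON fields this sketch uses.
type backupLine struct {
	MessageType string  `json:"message_type"`
	PercentDone float64 `json:"percent_done"`
	SnapshotID  string  `json:"snapshot_id"`
}

// parseLine decodes one stdout line. ok is false for lines that are
// not JSON objects with a message_type (those are plain log output).
func parseLine(raw []byte) (l backupLine, ok bool) {
	if err := json.Unmarshal(raw, &l); err != nil || l.MessageType == "" {
		return backupLine{}, false
	}
	return l, true
}

// outcome maps the restic exit code to a job status label.
func outcome(exitCode int) string {
	switch exitCode {
	case 0:
		return "succeeded"
	case 3:
		return "succeeded_with_issues" // snapshot created, some files unreadable
	default:
		return "failed"
	}
}

func main() {
	lines := []string{
		`{"message_type":"status","percent_done":0.5}`,
		`scan finished`,
		`{"message_type":"summary","snapshot_id":"1a2b3c4d"}`,
	}
	for _, raw := range lines {
		if l, ok := parseLine([]byte(raw)); ok {
			fmt.Println(l.MessageType, l.PercentDone, l.SnapshotID)
		} else {
			fmt.Println("log:", raw)
		}
	}
	fmt.Println(outcome(3))
}
```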

P1-16 internal/restic: thin Locate + Env wrapper that runs `restic
  backup --json`, scans stdout/stderr, parses BackupStatus +
  BackupSummary, calls back into a LineHandler so the agent can fan
  out to log.stream + job.progress. Treats exit code 3 as
  "succeeded with issues" (matches restic's contract).

P1-18 store: jobs accessors (CreateJob, MarkJobStarted,
  MarkJobFinished, AppendJobLog, GetJob).

P1-19 server: POST /api/hosts/{id}/jobs creates the Job row,
  validates kind, dispatches via Hub.Send, audit-logs the action.

P1-20 agent runner: wraps restic.RunBackup with throttled progress
  emission. Sender abstraction was added to wsclient.Handler so
  background goroutines can keep replying after dispatch returns.

P1-21 server WS: dispatchAgentMessage now persists job.started,
  job.finished, log.stream into the database. Browser fan-out for
  live tailing lands with the UI work.

Agent gets repo_url + repo_password from agent.yaml in plaintext
for now (mode 0600, owned by service user); the keyring storage
described in spec.md §7.3 replaces this in P2. config.update over
WS overrides the in-memory copy (does not persist).

Build clean; all tests pass. End-to-end with a real restic still
needs a host that has restic installed — wire shape verified by
the existing hello/heartbeat round-trip test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:45:04 +01:00
steve e8eccd20c2 phase 1: agent install path — systemd unit, install.sh, asset endpoints
P1-14 deploy/install/restic-manager-agent.service: standard systemd
  unit with the usual hardening switches (NoNewPrivileges, Protect*,
  RestrictRealtime, MemoryDenyWriteExecute). Restart=always with a
  5s backoff. Runs as a dedicated unprivileged restic-manager-agent
  user; the install script creates it.

P1-29 deploy/install/install.sh: arch detection (amd64/arm64), pulls
  the agent binary from /agent/binary, creates the service user
  + dirs (/etc/restic-manager, /var/lib/restic-manager), runs
  enrollment via `agent -enroll-server -enroll-token`, lays down
  the systemd unit, enables and starts it.

  Honours the spec's "detect, don't auto-disable" rule for existing
  schedulers: scans systemd timers, /etc/cron.d/*, /etc/cron.daily/*,
  root crontab for restic-named entries and prints them with the
  exact disable command — operator decides.

P1-31 server endpoints to ship the agent installation payload:
  GET /agent/binary?os=linux&arch=amd64 → serves
    <DataDir>/agent-binaries/restic-manager-agent-linux-amd64
  GET /install/<file>                   → serves
    <DataDir>/install/<file>
  Both endpoints reject path traversal and return 404 if the file
  isn't published. Operators drop the binaries + service unit into
  these directories at release time. Signed-bundle verification is
  deferred to Phase 5 OSS readiness.

All tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:40:36 +01:00
steve f34773b505 phase 1: WS transport, enrollment, agent that hellos and heartbeats
Lands the protocol layer end-to-end: an agent can be enrolled
through the operator UI, store credentials, dial back to the server
over WS, complete the protocol_version handshake, and stay
connected with periodic heartbeats.

Server side:
- P1-09 ws.Hub: one Conn per host_id, last-write-wins eviction,
  json envelope writer with a write mutex, reader, error envelopes.
- P1-09 ws.AgentHandler: bearer-auth, accept upgrade, hello-stage
  (10s deadline, protocol_version checked against
  api.MinAgentProtocolVersion → ErrProtocolTooOld with help URL on
  reject), main read loop, defer hub register/unregister.
- P1-10 POST /api/agents/enroll consumes a one-time token, mints a
  persistent agent bearer (sha-256 stored), creates a host row.
- P1-10 POST /api/enrollment-tokens (operator, session-auth)
  issues a 1h one-time token.
- P1-11 hello upserts agent_version + restic_version +
  protocol_version on the host row, flips status to online.
- P1-12 heartbeat touches last_seen_at; background sweeper marks
  hosts offline after 90s without one.
- store: hosts table accessors, host_schedule_version,
  enrollment_tokens FK on consumed_host dropped (audit-only field;
  the token gets burned before the host row exists).

Agent side:
- P1-13 internal/agent/config: yaml at /etc/restic-manager/agent.yaml,
  atomic Save (tmp+fsync+rename), Enrolled() helper.
- P1-15 internal/agent/wsclient: dial with bearer + optional
  TLS cert pinning (sha-256 of leaf), exponential backoff with
  jitter (1s → 60s cap), heartbeat goroutine, fatal handling for
  ErrProtocolTooOld.
- P1-15 wsclient.Enroll: HTTP POST /api/agents/enroll with sysinfo.
- P1-17 internal/agent/sysinfo: hostname/OS/arch/restic-version
  collection. restic detected by `restic version` parse; absent
  restic doesn't block startup.
- cmd/agent: -enroll-server / -enroll-token flags drive first-run
  enrollment then exit (so the install script can hand off to
  systemd to run the persistent service).

End-to-end smoke verified: bootstrap → login → issue token →
enroll → run agent → server logs `ws agent connected` with the
right host_id and protocol_version 1.

All tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:39:00 +01:00
steve 84fd31ccaa phase 1: HTTP server + first-run bootstrap
P1-01 chi router, slog request log, graceful shutdown via signal
  context. Health endpoint, /api/auth/login, /api/auth/logout,
  /api/bootstrap. Background sweeper for expired sessions and
  enrollment tokens (15 min cadence).

P1-04 (sessions half) HttpOnly Secure-when-TLS cookie carrying a
  base64url token; server stores SHA-256(token) so a stolen DB
  doesn't yield credentials. Unknown user and bad password collapse
  to the same 401 response code so a probe can't enumerate names.
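
The store-only-the-hash scheme from P1-04 can be sketched with the standard library. `newToken` and `matches` are hypothetical names, and the 32-byte token length is an assumption; the commit only states that the cookie carries a base64url token and the server stores SHA-256(token).

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newToken returns a random base64url token (sent to the client) and its
// SHA-256 digest (the only value persisted server-side).
func newToken() (token string, digest [32]byte, err error) {
	raw := make([]byte, 32) // token length is an assumption
	if _, err = rand.Read(raw); err != nil {
		return "", digest, err
	}
	token = base64.RawURLEncoding.EncodeToString(raw)
	digest = sha256.Sum256([]byte(token))
	return token, digest, nil
}

// matches hashes the presented token and compares it to the stored digest
// in constant time.
func matches(presented string, stored [32]byte) bool {
	got := sha256.Sum256([]byte(presented))
	return subtle.ConstantTimeCompare(got[:], stored[:]) == 1
}

func main() {
	token, digest, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("valid token accepted:", matches(token, digest))
	fmt.Println("tampered token accepted:", matches(token+"x", digest))
}
```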

P1-05 first-run admin bootstrap. On a fresh DB the server mints a
  one-time token and prints it to stderr inside a banner. The
  /api/bootstrap handler accepts {token, username, password},
  creates the first admin, then becomes a 409 forever.

P1-07 (partial) audit hooks fire on auth.login and auth.bootstrap.
  Full middleware-driven coverage lands with the rest of the API.

internal/server/config: env > YAML > defaults. RM_LISTEN /
  RM_DATA_DIR / RM_BASE_URL / RM_TLS_CERT / RM_TLS_KEY /
  RM_SECRET_KEY_FILE / RM_TRUSTED_PROXY (CIDR list, validated).
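
The env > YAML > defaults precedence can be illustrated per setting as below. `resolve` is a hypothetical helper, the lookup function is injected so the example is deterministic, and treating an empty environment variable as unset is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// resolve returns the first non-empty value in precedence order:
// environment variable, then the YAML value, then the built-in default.
func resolve(envKey, yamlValue, def string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(envKey); ok && v != "" {
		return v
	}
	if yamlValue != "" {
		return yamlValue
	}
	return def
}

func main() {
	// A fixed map stands in for the process environment.
	env := map[string]string{"RM_LISTEN": ":9090"}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }

	fmt.Println(resolve("RM_LISTEN", ":8443", ":8080", lookup))    // env wins
	fmt.Println(resolve("RM_DATA_DIR", "/srv/rm", "/data", lookup)) // YAML wins
	fmt.Println(resolve("RM_BASE_URL", "", "http://localhost:8080", lookup))

	// In a real process the lookup would be os.LookupEnv.
	_ = resolve("RM_LISTEN", "", ":8080", os.LookupEnv)
}
```

The default values shown here are placeholders for the sketch, not the server's actual defaults.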

End-to-end smoke test passes: server boots on a fresh dir,
prints the bootstrap token, POST /api/bootstrap creates the admin,
POST /api/auth/login returns 200 with a session cookie.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:28:18 +01:00
steve c275f4ff4c phase 1 foundations: api types, store, crypto, auth
Lands the bottom three layers of Phase 1:

P1-08 internal/api: protocol_version + envelope + every WS message
  shape from spec.md §6.2 (Hello, Heartbeat, Job*, Schedule*, etc).
  Wire-format tests pin the JSON shape so a rename here breaks
  tests instead of silently breaking the agent.

P1-02 + P1-03 internal/store: SQLite via modernc.org/sqlite,
  embed.FS + a tiny version table for hand-rolled migrations.
  0001_initial.sql covers every table from spec.md §5 plus
  enrollment_tokens and host_schedule_version. Typed accessors
  for users / sessions / enrollment / audit. WAL + foreign_keys
  + busy_timeout on by default.

P1-06 internal/crypto: XChaCha20-Poly1305 AEAD wrapper with
  per-message random nonce. Key file lifecycle (generate +
  refuse-to-overwrite, load with size validation). Optional
  additionalData binds ciphertext to the row that owns it.

P1-04 internal/auth (partial — passwords + tokens; sessions
  middleware lands with the HTTP handlers): argon2id following
  RFC 9106 (64 MiB / t=3 / p=4 / 32B), constant-time verify.
  HashToken stores SHA-256 of session/agent/enrollment tokens
  so a stolen DB doesn't hand over credentials.

Build floor moves to Go 1.25 (modernc.org/sqlite v1.50+ requires
it); CI + Dockerfile + README updated. Markdown lint diagnostics
on tasks.md cleared.

All packages tested. ~70 new tests pass in <1s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:24:40 +01:00
steve 595546afb9 spec/tasks: address pre-Phase-1 design feedback
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
Doc-only changes captured before any Phase 1 code lands.

spec.md:
- §4.1 nhooyr.io/websocket → github.com/coder/websocket (the
  maintained fork; the original is unmaintained)
- §4.1 RM_LISTEN documented as source of truth for the bind port;
  add RM_TRUSTED_PROXY env var for X-Forwarded-* handling behind
  Caddy/Traefik
- §4.2 Phase 1 ships Linux only; Windows binaries continue to build
  in CI to keep the codebase portable, but service integration +
  installer move to Phase 2
- §4.2 self-update via apt/choco, not bespoke signed binaries
- §5 add Host.protocol_version + Host.applied_schedule_version
- §6.2 lock protocol_version handshake semantics (clean error on
  mismatch, not weird JSON parse failures)
- §6.2 schedule reconciliation when server unreachable: agent keeps
  firing last-known-good indefinitely; server's view canonical on
  reconnect; UI surfaces drift via applied_schedule_version
- §6.2 schedule.set carries schedule_version; new schedule.ack
  agent→server message
- §10.1 cross-reference RM_LISTEN ↔ compose port mapping
- §14.3 hooks rejected at validation on non-backup schedule kinds

tasks.md:
- P1-14 / P1-30 (Windows service + install.ps1) → Phase 2 as
  P2-16 / P2-17
- P1-29 install.sh detects existing restic timers/cron and prints
  disable commands, doesn't auto-disable
- Phase 1 acceptance: drop Windows from end-to-end criterion,
  require windows cross-compile in CI
- P4-01 rewritten: package-manager-based update delivery
- P5-08 removed (duplicate of P4-08 Prometheus /metrics)
- Various references updated

No Go code changes; build still clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:12:55 +01:00
steve c9368de904 phase 0: project bootstrap
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
P0-01 Go module + cmd/server + cmd/agent skeletons + internal/ tree
P0-02 LICENSE (PolyForm NC 1.0.0), README, CONTRIBUTING
P0-03 golangci-lint, pre-commit, .editorconfig, .gitignore
P0-04 Gitea Actions CI: test (race+coverage), lint, cross-platform build matrix
P0-05 Dockerfile.server (multi-stage, distroless/static), docker-compose.yml
P0-06 Makefile with build/test/lint/fmt/run/release targets

build, vet, test, and cross-compile to linux/{amd64,arm64} + windows/amd64
all verified locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:03:59 +01:00
steve 7612687a14 initial setup ready 2026-04-30 23:55:52 +01:00
108 changed files with 6586 additions and 8629 deletions
+2 -29
@@ -53,24 +53,8 @@ env:
jobs:
test:
# Sharded by package group. server/http and store are the two
# heavy packages (~156s and ~75s in CI respectively under
# `-race`); pulling them onto their own runners lets each shard
# have all CPUs to itself instead of CPU-starving each other on
# one runner. The third shard ("rest") covers everything else.
name: Test (${{ matrix.name }})
name: Test (linux/amd64)
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
include:
- name: server-http
packages: ./internal/server/http/...
- name: store
packages: ./internal/store/...
- name: rest
# Computed at runtime — see the "go test" step below.
packages: ""
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
@@ -80,18 +64,7 @@ jobs:
- name: go vet
run: go vet ./...
- name: go test
run: |
set -euo pipefail
if [ -n "${{ matrix.packages }}" ]; then
pkgs="${{ matrix.packages }}"
else
# "rest" shard: everything except the dedicated shards.
pkgs=$(go list ./... \
| grep -v '/internal/server/http$' \
| grep -v '/internal/store$')
fi
# shellcheck disable=SC2086
go test -race -coverprofile=coverage.out $pkgs
run: go test -race -coverprofile=coverage.out ./...
- name: coverage summary
run: go tool cover -func=coverage.out | tail -1
-107
@@ -1,107 +0,0 @@
# Release workflow — P5-03 (docker-only release path).
#
# Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md
# Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md
#
# What it does
# * Triggered by either:
# - tag push matching v[0-9]+.[0-9]+.[0-9]+ (real release), or
# - workflow_dispatch (snapshot iteration without tagging).
# * Cross-builds a multi-arch (linux/amd64,linux/arm64) image of the
# server, with three agent binaries (linux amd64+arm64, windows amd64)
# plus install.sh / install.ps1 / the systemd unit baked in under
# /opt/restic-manager/dist (the read-only fallback path the server
# handlers use when <DataDir>/... is empty).
# * Pushes to this Gitea instance's container registry under
# <gitea-host>/<owner>/restic-manager.
#
# Tag fan-out
# * tag push: :vX.Y.Z, :X.Y, :X
# * tag push and X >= 1: also :latest
# * workflow_dispatch: only :snapshot-<shortsha>; nothing else moves.
#
# Why no goreleaser
# The architecture already routes agent distribution through the
# server's /agent/binary endpoint. The image is the only deliverable;
# binary archives would just be a second source of truth.
name: Release
on:
push:
tags:
- 'v[0-9]+.[0-9]+.[0-9]+'
workflow_dispatch:
env:
REGISTRY: gitea.dcglab.co.uk
IMAGE_NAME: ${{ gitea.repository }}
jobs:
image:
name: Build + push image
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: docker/setup-qemu-action@v3
- uses: docker/setup-buildx-action@v3
- name: Log in to Gitea registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ gitea.actor }}
password: ${{ secrets.DEV_TOKEN }}
- name: Compute tags + version
id: meta
shell: bash
run: |
set -euo pipefail
REG="${REGISTRY}/${IMAGE_NAME}"
DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
SHORT_SHA="${GITHUB_SHA::7}"
if [ "${GITHUB_EVENT_NAME}" = "push" ] && [ "${GITHUB_REF_TYPE}" = "tag" ]; then
TAG="${GITHUB_REF_NAME}" # vX.Y.Z
VER="${TAG#v}" # X.Y.Z
MAJOR="${VER%%.*}"
MINOR="${VER#${MAJOR}.}"; MINOR="${MINOR%%.*}"
TAGS="${REG}:${TAG}"
TAGS="${TAGS},${REG}:${MAJOR}.${MINOR}"
TAGS="${TAGS},${REG}:${MAJOR}"
# Pre-1.0 holds back :latest by design; operators must
# pin a version explicitly until v1.0.0.
if [ "${MAJOR}" -ge 1 ]; then
TAGS="${TAGS},${REG}:latest"
fi
VERSION="${TAG}"
else
TAGS="${REG}:snapshot-${SHORT_SHA}"
VERSION="0.0.0-snapshot-${SHORT_SHA}"
fi
{
echo "tags=${TAGS}"
echo "version=${VERSION}"
echo "date=${DATE}"
} >> "${GITHUB_OUTPUT}"
- name: Build + push
uses: docker/build-push-action@v6
with:
context: .
file: deploy/Dockerfile.server
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
build-args: |
VERSION=${{ steps.meta.outputs.version }}
COMMIT=${{ gitea.sha }}
DATE=${{ steps.meta.outputs.date }}
labels: |
org.opencontainers.image.version=${{ steps.meta.outputs.version }}
org.opencontainers.image.revision=${{ gitea.sha }}
org.opencontainers.image.created=${{ steps.meta.outputs.date }}
-6
@@ -26,12 +26,6 @@ coverage.html
.env.local
*.local
# Local docker-compose for the dev/test bench. Has host-specific IPs,
# hostnames, and ports — never committed; the canonical reference
# deployment lives in deploy/.
/compose.yaml
/compose.override.yaml
# Local diagnostic helpers (never shipped). Go's build tooling already
# skips paths beginning with _ or ., but ignore explicitly so nothing
# checked in here can leak into a release tarball.
-9
@@ -2,19 +2,10 @@
Project-specific rules for Claude when working in this repo.
## Commands
If the user types any of the following, follow the instructions in the table.
| Command | Action |
| --- | --- |
| :release | trigger subagent to commit (if needed), push (if needed), raise PR, wait for PR to pass or fail. If fail, report back. If pass, merge in to main |
## Repo
The repo lives inside a Gitea instance; `tea` CLI is available for use by agents
## Run `go vet` before every commit
CI runs `go vet ./...` and will fail the build on any vet error.
+3 -9
@@ -5,11 +5,9 @@ BIN_DIR := bin
SERVER_BIN := $(BIN_DIR)/restic-manager-server
AGENT_BIN := $(BIN_DIR)/restic-manager-agent
VERSION ?= $(shell git describe --tags --always --dirty 2>/dev/null || echo dev)
COMMIT ?= $(shell git rev-parse HEAD 2>/dev/null || echo none)
DATE ?= $(shell date -u +%Y-%m-%dT%H:%M:%SZ)
LDFLAGS := -s -w -X main.version=$(VERSION) -X main.commit=$(COMMIT) -X main.date=$(DATE)
LDFLAGS := -s -w -X main.version=$(VERSION)
GOFLAGS := -trimpath
DOCKER_IMAGE ?= gitea.dcglab.co.uk/steve/restic-manager
DOCKER_IMAGE ?= ghcr.io/dcglab/restic-manager
DOCKER_TAG ?= dev
# Tailwind standalone CLI — single binary, no Node toolchain.
@@ -86,11 +84,7 @@ run-agent: agent ## Build and run the agent
$(AGENT_BIN)
docker: ## Build the server Docker image
docker build -f deploy/Dockerfile.server \
--build-arg VERSION=$(VERSION) \
--build-arg COMMIT=$(COMMIT) \
--build-arg DATE=$(DATE) \
-t $(DOCKER_IMAGE):$(DOCKER_TAG) .
docker build -f deploy/Dockerfile.server --build-arg VERSION=$(VERSION) -t $(DOCKER_IMAGE):$(DOCKER_TAG) .
release: ## Cross-compile for all supported platforms
@mkdir -p $(BIN_DIR)
+8
@@ -0,0 +1,8 @@
# The ask!
I have numerous servers deployed out in a lab, mainly Linux but some Windows
All have restic installed on them
I need to build a browser-based management service that gives me a central single pane of glass to monitor and manage all the endpoints
All endpoints will be enabled for SSH (unless other methods are better?)
Plan out how we would go about this please?
+24 -74
@@ -24,11 +24,7 @@ import (
"gitea.dcglab.co.uk/steve/restic-manager/internal/restic"
)
var (
version = "dev"
commit = "none"
date = "unknown"
)
var version = "dev"
func main() {
if err := run(); err != nil {
@@ -66,7 +62,7 @@ func run() error {
flag.Parse()
if *showVersion {
fmt.Printf("restic-manager-agent %s (commit %s, built %s)\n", version, commit, date)
fmt.Println("restic-manager-agent", version)
return nil
}
@@ -115,12 +111,6 @@ func run() error {
resticBin, _ := restic.Locate(cfg.ResticPath) // empty is fine; commands fail with a clear error later
// Probe the actual restic binary for restore-flag support. We used
// to gate --no-ownership on a SemVer comparison (added in 0.17),
// but a restic 0.18.1 build was observed in the wild that still
// rejects the flag. The help text is the only reliable signal.
resticSupportsNoOwnership := restic.SupportsRestoreNoOwnership(ctx, resticBin)
// Open the secrets store. If the agent is enrolled but has no
// secrets key yet (legacy YAML), mint one and migrate any
// plaintext repo fields into the encrypted blob.
@@ -145,11 +135,10 @@ func run() error {
}
d := &dispatcher{
resticBin: resticBin,
resticVer: snap.ResticVersion,
resticSupportsNoOwnership: resticSupportsNoOwnership,
secrets: sec,
scheduler: scheduler.New(),
resticBin: resticBin,
resticVer: snap.ResticVersion,
secrets: sec,
scheduler: scheduler.New(),
}
if err := wsclient.Run(ctx, wsCfg, d.handle); err != nil {
return fmt.Errorf("ws run: %w", err)
@@ -211,11 +200,10 @@ func openSecretsStore(cfg *config.Config) (*secrets.Store, error) {
// secrets store on each job — config.update writes through to disk,
// so a job dispatched in the same session sees the latest values.
type dispatcher struct {
resticBin string
resticVer string // e.g. "0.17.1"; empty if restic isn't installed yet
resticSupportsNoOwnership bool // captured at startup from `restic restore --help`
secrets *secrets.Store
scheduler *scheduler.Scheduler
resticBin string
resticVer string // e.g. "0.17.1"; empty if restic isn't installed yet
secrets *secrets.Store
scheduler *scheduler.Scheduler
// Bandwidth caps in KB/s pushed via config.update. Mutated under
// bwMu by the config.update handler; read by runJob when building
@@ -472,47 +460,17 @@ func (d *dispatcher) handleTreeList(ctx context.Context, reqID string, p api.Tre
reply(api.TreeListResultPayload{Entries: apiEntries})
}
// failJob ships a synthetic job.started + job.finished(failed) pair
// for a command.run we couldn't even spawn locally — missing restic
// binary, missing credentials, or a malformed payload. Without these
// envelopes the server has no way to know the job will never produce
// output: the row sits in "running", the live stream stays stuck on
// "awaiting agent output," and a subsequent command.cancel arrives
// for a job_id the agent never registered (we log "unknown job"
// because trackJob was never called). Sending a terminal envelope
// here closes the loop on both fronts.
func failJob(p api.CommandRunPayload, tx wsclient.Sender, errMsg string) {
now := time.Now().UTC()
if startedEnv, err := api.Marshal(api.MsgJobStarted, p.JobID, api.JobStartedPayload{
JobID: p.JobID, Kind: p.Kind, StartedAt: now,
}); err == nil {
_ = tx.Send(startedEnv)
}
if finEnv, err := api.Marshal(api.MsgJobFinished, p.JobID, api.JobFinishedPayload{
JobID: p.JobID,
Status: api.JobFailed,
ExitCode: -1,
FinishedAt: now,
Error: errMsg,
}); err == nil {
_ = tx.Send(finEnv)
}
}
// runJob spawns a runner for one job. We launch a goroutine so the
// WS read loop keeps draining messages while restic chugs along.
func (d *dispatcher) runJob(ctx context.Context, p api.CommandRunPayload, tx wsclient.Sender) error {
if d.resticBin == "" {
failJob(p, tx, "restic binary not located on this agent")
return fmt.Errorf("restic binary not located on this agent")
}
creds, err := d.secrets.Load()
if err != nil {
failJob(p, tx, "load repo credentials: "+err.Error())
return fmt.Errorf("load repo credentials: %w", err)
}
if creds.Empty() {
failJob(p, tx, "repo credentials not configured (waiting for server config.update push)")
return fmt.Errorf("repo credentials not configured (waiting for server config.update push)")
}
// r is the everyday runner — bound to the host's repo
@@ -536,14 +494,13 @@ func (d *dispatcher) runJob(ctx context.Context, p api.CommandRunPayload, tx wsc
}
r := runner.New(runner.Config{
ResticBin: d.resticBin,
ResticVersion: d.resticVer,
RepoURL: creds.URL,
RepoUsername: creds.Username,
RepoPassword: creds.Password,
SupportsRestoreNoOwnership: d.resticSupportsNoOwnership,
LimitUploadKBps: upKBps,
LimitDownloadKBps: downKBps,
ResticBin: d.resticBin,
ResticVersion: d.resticVer,
RepoURL: creds.URL,
RepoUsername: creds.Username,
RepoPassword: creds.Password,
LimitUploadKBps: upKBps,
LimitDownloadKBps: downKBps,
}, tx, time.Second)
// spawn wraps the kind-specific goroutine: derives a per-job
@@ -599,7 +556,6 @@ func (d *dispatcher) runJob(ctx context.Context, p api.CommandRunPayload, tx wsc
// policy fallback was specced but skipped — see the
// Phase 5 plan rationale and version.go's lockstep-deploy
// note for why.
failJob(p, tx, "forget: command.run carried no forget_groups (server didn't populate them)")
return fmt.Errorf("forget: command.run carried no forget_groups (server didn't populate them)")
}
groups := make([]restic.ForgetGroup, 0, len(p.ForgetGroups))
@@ -634,14 +590,13 @@ func (d *dispatcher) runJob(ctx context.Context, p api.CommandRunPayload, tx wsc
runCreds = ac
}
prr := runner.New(runner.Config{
ResticBin: d.resticBin,
ResticVersion: d.resticVer,
RepoURL: runCreds.URL,
RepoUsername: runCreds.Username,
RepoPassword: runCreds.Password,
SupportsRestoreNoOwnership: d.resticSupportsNoOwnership,
LimitUploadKBps: upKBps,
LimitDownloadKBps: downKBps,
ResticBin: d.resticBin,
ResticVersion: d.resticVer,
RepoURL: runCreds.URL,
RepoUsername: runCreds.Username,
RepoPassword: runCreds.Password,
LimitUploadKBps: upKBps,
LimitDownloadKBps: downKBps,
}, tx, time.Second)
slog.Info("agent: accepting prune job", "job_id", p.JobID, "admin_creds", p.RequiresAdminCreds)
spawn("prune", func(jobCtx context.Context) error {
@@ -663,16 +618,13 @@ func (d *dispatcher) runJob(ctx context.Context, p api.CommandRunPayload, tx wsc
})
case api.JobRestore:
if p.Restore == nil {
failJob(p, tx, "restore: command.run carried no restore payload")
return fmt.Errorf("restore: command.run carried no restore payload")
}
rp := *p.Restore
if rp.SnapshotID == "" {
failJob(p, tx, "restore: snapshot_id is required")
return fmt.Errorf("restore: snapshot_id is required")
}
if !rp.InPlace && rp.TargetDir == "" {
failJob(p, tx, "restore: target_dir required for non-in-place restore")
return fmt.Errorf("restore: target_dir required for non-in-place restore")
}
slog.Info("agent: accepting restore job",
@@ -683,7 +635,6 @@ func (d *dispatcher) runJob(ctx context.Context, p api.CommandRunPayload, tx wsc
})
case api.JobDiff:
if p.Diff == nil || p.Diff.SnapshotA == "" || p.Diff.SnapshotB == "" {
failJob(p, tx, "diff: command.run carried incomplete diff payload")
return fmt.Errorf("diff: command.run carried incomplete diff payload")
}
dp := *p.Diff
@@ -693,7 +644,6 @@ func (d *dispatcher) runJob(ctx context.Context, p api.CommandRunPayload, tx wsc
return r.RunDiff(jobCtx, p.JobID, dp.SnapshotA, dp.SnapshotB)
})
default:
failJob(p, tx, fmt.Sprintf("kind %q not implemented on this agent", p.Kind))
return fmt.Errorf("kind %q not implemented yet (Phase 2 lands the rest)", p.Kind)
}
return nil
+2 -19
@@ -19,17 +19,12 @@ import (
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/config"
rmhttp "gitea.dcglab.co.uk/steve/restic-manager/internal/server/http"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/maintenance"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/oidc"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ui"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ws"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
var (
version = "dev"
commit = "none"
date = "unknown"
)
var version = "dev"
func main() {
if err := run(); err != nil {
@@ -44,7 +39,7 @@ func run() error {
flag.Parse()
if *showVersion {
fmt.Printf("restic-manager-server %s (commit %s, built %s)\n", version, commit, date)
fmt.Println("restic-manager-server", version)
return nil
}
@@ -97,17 +92,6 @@ func run() error {
return fmt.Errorf("ui: %w", err)
}
var oidcClient *oidc.Client
if cfg.OIDC != nil {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
oidcClient, err = oidc.New(ctx, cfg.OIDC, cfg.BaseURL)
if err != nil {
return fmt.Errorf("oidc: %w", err)
}
slog.Info("oidc enabled", "issuer", cfg.OIDC.Issuer, "display", cfg.OIDC.DisplayName)
}
deps := rmhttp.Deps{
Cfg: cfg,
Store: st,
@@ -118,7 +102,6 @@ func run() error {
NotificationHub: notifHub,
UI: renderer,
Version: version,
OIDC: oidcClient,
}
// First-run bootstrap: if the users table is empty, mint a one-time
+7 -57
@@ -1,17 +1,14 @@
# syntax=docker/dockerfile:1.7
# ---- Build stage --------------------------------------------------------
# Cross-compiles:
# * the server binary for the image's TARGETARCH (linux/amd64 or arm64),
# * three agent binaries (linux/amd64, linux/arm64, windows/amd64) that
# the running server hands out via /agent/binary.
# Pure-Go SQLite (modernc.org/sqlite) means CGO stays off; static binaries
# run on distroless/static.
FROM --platform=$BUILDPLATFORM golang:1.25-alpine AS build
FROM golang:1.25-alpine AS build
WORKDIR /src
# Pure-Go SQLite (modernc.org/sqlite) means we can keep CGO off and build a
# fully static binary that runs on distroless/static.
ENV CGO_ENABLED=0 \
GOOS=linux \
GOFLAGS="-trimpath"
# Cache module downloads in a separate layer.
@@ -21,41 +18,9 @@ RUN go mod download
COPY . .
ARG VERSION=dev
ARG COMMIT=none
ARG DATE=unknown
ARG TARGETOS
ARG TARGETARCH
ENV LDFLAGS="-s -w -X main.version=${VERSION} -X main.commit=${COMMIT} -X main.date=${DATE}"
# Server: built for the image's runtime arch.
RUN GOOS=${TARGETOS} GOARCH=${TARGETARCH} \
go build -ldflags="${LDFLAGS}" \
-o /out/restic-manager-server \
./cmd/server
# Empty /data skeleton so the runtime image carries an existing,
# nonroot-owned mount point. Docker copies that ownership onto a
# named volume the first time it's created, which avoids the
# "permission denied" trap on /data/secret.key when the operator
# uses a default `volumes: { rm-data: {} }` declaration.
RUN mkdir -p /out/data
# Agents: identical across image arches — an arm64 server image still
# ships an amd64 agent binary for amd64 endpoints to download.
RUN mkdir -p /out/agent-binaries && \
GOOS=linux GOARCH=amd64 \
go build -ldflags="${LDFLAGS}" \
-o /out/agent-binaries/restic-manager-agent-linux-amd64 \
./cmd/agent && \
GOOS=linux GOARCH=arm64 \
go build -ldflags="${LDFLAGS}" \
-o /out/agent-binaries/restic-manager-agent-linux-arm64 \
./cmd/agent && \
GOOS=windows GOARCH=amd64 \
go build -ldflags="${LDFLAGS}" \
-o /out/agent-binaries/restic-manager-agent-windows-amd64.exe \
./cmd/agent
RUN go build -ldflags="-s -w -X main.version=${VERSION}" \
-o /out/restic-manager-server \
./cmd/server
# ---- Runtime stage ------------------------------------------------------
FROM gcr.io/distroless/static-debian12:nonroot
@@ -66,22 +31,7 @@ LABEL org.opencontainers.image.licenses="PolyForm-Noncommercial-1.0.0"
USER nonroot:nonroot
WORKDIR /
# Server binary on PATH.
COPY --from=build /out/restic-manager-server /usr/local/bin/restic-manager-server
# Image-baked bundled assets (P5-03). Read-only; the /agent/binary and
# /install/* handlers fall back here when <DataDir>/... is empty, so a
# fresh container Just Works without first-run staging. Operators can
# still drop a custom build under <DataDir>/agent-binaries/<name> to
# override per-host.
COPY --from=build --chmod=0755 /out/agent-binaries/ /opt/restic-manager/dist/agent-binaries/
COPY --chmod=0755 deploy/install/install.sh /opt/restic-manager/dist/install/install.sh
COPY --chmod=0644 deploy/install/install.ps1 /opt/restic-manager/dist/install/install.ps1
COPY --chmod=0644 deploy/install/restic-manager-agent.service /opt/restic-manager/dist/install/restic-manager-agent.service
# Pre-created data dir owned by nonroot so a fresh named volume
# inherits the right ownership.
COPY --from=build --chown=nonroot:nonroot /out/data /data
EXPOSE 8443
ENTRYPOINT ["/usr/local/bin/restic-manager-server"]
+9 -40
@@ -1,52 +1,21 @@
# Reference deployment for the restic-manager control plane.
# Mirrors spec.md §10.1 and the P5-07 reference deployment.
# Mirrors spec.md §10.1. Adjust image tag and RM_BASE_URL for your env.
#
# Scope: this compose stands up the server only. TLS termination and
# the public hostname belong to a reverse proxy that lives outside
# this stack (Caddy, Traefik, nginx, HAProxy, your existing edge —
# whatever you already operate). See `docs/reverse-proxy.md` for the
# headers + CIDRs that proxy needs to forward.
#
# Architecture:
# * The server speaks plain HTTP on :8080.
# * The agent binaries + install scripts ship inside the image under
# /opt/restic-manager/dist/, so /agent/binary and /install/*
# serve out of the box without first-run staging.
# * The named volume holds *only* operator state (sqlite,
# secrets.enc, audit log, the AEAD key). Image upgrades replace
# the agents/scripts; the volume is untouched.
# * Pre-1.0 releases never publish :latest — pin to an exact
# vX.Y.Z tag and bump deliberately.
#
# Before first start:
# 1. Pick a version: export RM_VERSION=vX.Y.Z (or substitute below).
# 2. Set RM_BASE_URL to the public HTTPS URL the external proxy
# serves on.
# 3. Set RM_TRUSTED_PROXY to the IP/CIDR the proxy connects from
# (the X-Forwarded-* headers are honoured only when the immediate
# peer matches one of these).
# The server speaks plain HTTP. Front it with a TLS-terminating
# reverse proxy (Caddy/Traefik/nginx). RM_TRUSTED_PROXY must contain
# the proxy's IP/CIDR so X-Forwarded-* headers are honoured.
services:
restic-manager:
image: gitea.dcglab.co.uk/steve/restic-manager:${RM_VERSION:?set RM_VERSION to a vX.Y.Z tag}
image: ghcr.io/dcglab/restic-manager:latest
restart: unless-stopped
# Bind to localhost only — your reverse proxy reaches the server
# over loopback (or, if it runs in a separate compose / on
# another host, swap this for an internal docker network or a
# private LAN bind).
# Bind to localhost only — the proxy is what the public reaches.
ports:
- "127.0.0.1:8080:8080"
volumes:
- rm-data:/data
- ./data:/data
environment:
- RM_DATA_DIR=/data
- RM_LISTEN=:8080
- RM_BASE_URL=${RM_BASE_URL:?set RM_BASE_URL to the public https URL}
- RM_BASE_URL=https://restic.lab.example
- RM_SECRET_KEY_FILE=/data/secret.key
- RM_TRUSTED_PROXY=${RM_TRUSTED_PROXY:?set RM_TRUSTED_PROXY to the proxy CIDR}
# Cookies are Secure by default; keep that. Override only for
# local-HTTP smoke tests.
# - RM_COOKIE_SECURE=true
volumes:
rm-data:
- RM_TRUSTED_PROXY=172.16.0.0/12
+6 -4
@@ -49,10 +49,12 @@ detect_arch() {
ensure_dirs() {
install -d -m 0700 -o root -g root "$RM_CONFIG_DIR"
install -d -m 0700 -o root -g root "$RM_STATE_DIR"
# Default new-directory restore target: $HOME/rm-restore. With the
# current unit (ProtectSystem=full, no ReadWritePaths pin) the agent
# can mkdir anywhere on real filesystems, so this is just a courtesy
# pre-create so the wizard's default lands in a tidy spot.
# Default new-directory restore target: $HOME/rm-restore. Pre-create
# so the systemd unit's ReadWritePaths bind-mount applies cleanly
# (paths that don't exist when systemd starts get a soft-fail
# because of the '-' prefix, but the agent then can't mkdir into
# the read-only /root). Mode 0700 + root-owned matches the threat
# model — files restored here are operator-readable as root.
install -d -m 0700 -o root -g root /root/rm-restore
}
+10 -19
@@ -33,26 +33,17 @@ CapabilityBoundingSet=CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_CHOWN
AmbientCapabilities=CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_CHOWN
# Hardening — blocks privilege escalation even from root, and
# confines kernel / namespace / privilege surface. Filesystem reads
# stay open (that's the whole job) and restore writes are
# unrestricted: a backup tool whose entire purpose is "put files
# back where they belong" can't have ProtectHome=read-only or
# ProtectSystem=strict without breaking on the first cross-user
# restore. ProtectSystem=full keeps /usr, /boot, /efi read-only so a
# compromised agent can't swap out /usr/bin/restic or drop a kernel
# module, while leaving /home, /root, /var, /opt, /srv, /tmp etc.
# writable for arbitrary restore targets. The agent is treated as a
# high-trust component (it runs operator hooks as root and holds
# repo credentials); the residual hardening is about kernel + privesc
# protection, not write confinement.
# confines writes / network / kernel access to what restic actually
# needs. Filesystem reads stay open: that's the whole job.
NoNewPrivileges=true
ProtectSystem=full
# ProtectSystem=full mounts /usr, /boot, /efi *and* /etc read-only.
# The agent rewrites /etc/restic-manager/agent.yaml on enrolment and
# whenever a new SecretsKey is minted, so we need a targeted
# write-exemption for that dir. No exemption for the rest of /etc:
# the agent has no business editing /etc/passwd, /etc/sudoers, etc.
ReadWritePaths=/etc/restic-manager
ProtectSystem=strict
# /etc/restic-manager: agent.yaml + secrets.enc.
# /var/lib/restic-manager: agent state (currently unused but reserved).
# /root/rm-restore: default target for new-directory restores
# ($HOME/rm-restore/<job-id>/ resolves here for User=root).
# ReadWritePaths overrides ProtectHome=read-only on this subdir only.
ReadWritePaths=/etc/restic-manager /var/lib/restic-manager -/root/rm-restore
ProtectHome=read-only
ProtectHostname=true
ProtectKernelTunables=true
ProtectKernelModules=true
-113
@@ -1,113 +0,0 @@
# Running behind a reverse proxy
The restic-manager server is HTTP-only by design (see `spec.md` §11):
TLS termination, public hostname, ACME, HSTS, and edge-level rate
limiting all belong to a reverse proxy that you already operate
outside this project. The reference compose in `deploy/docker-compose.yml`
stands up *only* the server; this page covers what your proxy needs
to do to make the rest of it work.
## What the proxy must forward
The server reads four headers when (and only when) the immediate peer
matches `RM_TRUSTED_PROXY`:
| Header | Value | Why |
|---------------------|----------------------------------------------------------|-----|
| `X-Forwarded-For` | The original client IP (single value, or comma chain) | Rate-limit keys, audit log entries, and OIDC redirect-URI checks all use the real client IP. |
| `X-Forwarded-Proto` | `https` | The server emits absolute URLs (e.g. OIDC redirect URIs) using this. |
| `Host` | The public hostname clients use | Cookies are scoped to this; `RM_BASE_URL` must match. |
| `Connection`/`Upgrade` | Pass through unchanged | The agent connects on `/ws/agent` and the live-log viewer connects on `/api/jobs/{id}/stream` — both are WebSockets and need `Upgrade: websocket` to survive the hop. |
Set `RM_TRUSTED_PROXY` to the CIDR (or comma-separated list of CIDRs)
the proxy connects from. Anything outside that range has its
`X-Forwarded-*` headers ignored, so a stray request that bypasses the
proxy can't spoof the client IP.
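The trust rule above can be sketched in Go roughly as follows. This is a minimal illustration, not the server's actual code: `parseTrusted`, `isTrusted` and `clientIP` are hypothetical names, and walking the `X-Forwarded-For` chain from the right until the first untrusted hop is one common policy that is assumed here rather than taken from the source.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseTrusted parses a comma-separated RM_TRUSTED_PROXY value.
// Bare IPs are treated as single-address prefixes.
func parseTrusted(list string) ([]netip.Prefix, error) {
	var out []netip.Prefix
	for _, item := range strings.Split(list, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		if p, err := netip.ParsePrefix(item); err == nil {
			out = append(out, p.Masked())
			continue
		}
		a, err := netip.ParseAddr(item)
		if err != nil {
			return nil, fmt.Errorf("bad trusted proxy entry %q: %w", item, err)
		}
		out = append(out, netip.PrefixFrom(a, a.BitLen()))
	}
	return out, nil
}

// isTrusted reports whether a falls inside any trusted prefix.
func isTrusted(a netip.Addr, trusted []netip.Prefix) bool {
	a = a.Unmap() // so ::ffff:127.0.0.1 matches an IPv4 prefix
	for _, p := range trusted {
		if p.Contains(a) {
			return true
		}
	}
	return false
}

// clientIP returns the address to use for rate limiting and audit entries.
// The X-Forwarded-For value is consulted only when the immediate peer is a
// trusted proxy; otherwise the peer address itself is returned.
func clientIP(peer, xff, trustedList string) (string, error) {
	trusted, err := parseTrusted(trustedList)
	if err != nil {
		return "", err
	}
	peerAddr, err := netip.ParseAddr(peer)
	if err != nil {
		return "", err
	}
	if !isTrusted(peerAddr, trusted) {
		return peerAddr.String(), nil // header ignored
	}
	// Assumed policy: walk the chain right to left, skipping trusted hops.
	parts := strings.Split(xff, ",")
	for i := len(parts) - 1; i >= 0; i-- {
		hop, err := netip.ParseAddr(strings.TrimSpace(parts[i]))
		if err != nil {
			break // malformed entry: stop trusting the chain
		}
		if !isTrusted(hop, trusted) {
			return hop.String(), nil
		}
	}
	return peerAddr.String(), nil
}

func main() {
	for _, peer := range []string{"127.0.0.1", "198.51.100.9"} {
		ip, err := clientIP(peer, "203.0.113.7", "127.0.0.1/32, 172.16.0.0/12")
		fmt.Println(peer, "->", ip, err)
	}
}
```

A real server would parse the trusted list once at startup rather than per request; the sketch parses it inline only to stay self-contained.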
## Example: Caddy
```caddyfile
restic.example.com {
# Caddy's default reverse_proxy preserves Host, sets
# X-Forwarded-For/Proto, and passes Connection: upgrade through,
# so a single directive covers HTTP + WebSocket.
reverse_proxy 127.0.0.1:8080
encode zstd gzip
}
```
`RM_TRUSTED_PROXY=127.0.0.1/32` if Caddy and the server share the
host; the docker-bridge CIDR (commonly `172.16.0.0/12`) if Caddy
runs in another container on the default bridge network.
## Example: nginx
```nginx
server {
listen 443 ssl http2;
server_name restic.example.com;
ssl_certificate /etc/ssl/restic.example.com.fullchain.pem;
ssl_certificate_key /etc/ssl/restic.example.com.key.pem;
location / {
proxy_pass http://127.0.0.1:8080;
proxy_http_version 1.1;
# WebSocket support — agent + live-log endpoints need this.
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
# Trusted-proxy headers.
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto https;
# Live job logs are long-running streams. Bump read timeouts
# so nginx doesn't drop them mid-backup.
proxy_read_timeout 1h;
proxy_send_timeout 1h;
}
}
# Standard websocket upgrade map (define once at the http {} level).
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
```
`RM_TRUSTED_PROXY` for the same-host case: `127.0.0.1/32`.
## Example: Traefik (label-based)
```yaml
labels:
- "traefik.enable=true"
- "traefik.http.routers.restic-manager.rule=Host(`restic.example.com`)"
- "traefik.http.routers.restic-manager.entrypoints=websecure"
- "traefik.http.routers.restic-manager.tls.certresolver=letsencrypt"
- "traefik.http.services.restic-manager.loadbalancer.server.port=8080"
```
Traefik handles `X-Forwarded-*` and `Connection: upgrade` by default.
`RM_TRUSTED_PROXY` should be the docker network the Traefik container
shares with the server (commonly `172.16.0.0/12` for the default
bridge, or whatever your overlay network's CIDR is).
## Sanity-checking the wiring
After bringing the stack up:
1. `curl -fsS https://restic.example.com/healthz` — should return 200.
2. The login page should report HTTPS in the address bar; cookies
set after login should carry the `Secure` flag.
3. Check the server log for the `config resolved` line:
`trusted_proxies` must include the IP/CIDR your proxy actually
connects from.
4. Enrol a test agent — the WebSocket handshake hitting `/ws/agent`
confirms `Upgrade` is being forwarded correctly.
If any of those fail, the proxy is the first place to look — the
server itself is intentionally minimal.
File diff suppressed because it is too large
@@ -0,0 +1,259 @@
# P2 Completion Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Close every remaining P2 task in `tasks.md`: P2R-09 (auto-init UX), P2R-10/11/12 (hooks), P2R-13 (bandwidth wiring + per-job override), P2R-14 (schedule next/last run), P2-16 (Windows svc), P2-17 (`install.ps1`), P2-18 (announce-and-approve).
**Architecture:** Server stays HTTP+WS; agent stays a single binary that auto-restages via `make build`. Hooks live on `source_groups` (and host-level defaults). Announce-and-approve adds a separate WS path (`/ws/agent/pending`) and a Pending hosts panel; token-flow stays default. Windows service support uses `golang.org/x/sys/windows/svc` behind a `//go:build windows` tag — Linux builds untouched. **Operator is away — make best guesses on small UX choices, but commit each item separately so the choices are reviewable.**
**Tech Stack:** Go 1.23+, chi router, modernc/sqlite, `coder/websocket`, `robfig/cron/v3`, HTMX + Tailwind, `golang.org/x/sys/windows/svc`, Ed25519 (stdlib).
---
## Pre-flight
- [ ] **Run baseline:** `go vet ./... && go build ./... && go test ./...` — must be green before starting. Restage agent + restart server (per CLAUDE.md restage block) so smoke env is warm.
## Order of execution
Smallest blast-radius first. UI polish → bandwidth → next/last → hooks → announce → Windows. Commit and restage at each task boundary. Run `go vet ./... && go test ./...` before every commit.
---
## Task 1 — P2R-13a: Wire bandwidth caps into restic invocations
**Files:**
- Modify: `internal/restic/runner.go` (add `LimitUploadKBps`, `LimitDownloadKBps` to `Env` or to a per-call options struct already present; emit `--limit-upload N`/`--limit-download N` on `restic backup|forget|prune|check|restore`)
- Modify: `internal/agent/runner/*.go` — pass host-wide caps into the runner. Caps come from `agent.config.Config` or are pushed via `config.update`. Decision: ship caps in the existing `config.update` envelope as new fields `bandwidth_up_kbps`, `bandwidth_down_kbps`. Server pushes on hello + on `PUT /api/hosts/{id}/bandwidth`.
- Modify: `internal/api/messages.go` — extend `ConfigUpdatePayload` with the two int pointers.
- Modify: `internal/server/ws/handler.go` (or wherever hello/config push lives) — include caps in the pushed config.
- Modify: `internal/server/http/host_bandwidth.go` — after `SetHostBandwidth`, fan out a `config.update` to the connected agent (mirror the credentials-edit path).
- Test: `internal/restic/runner_test.go` — assert flag injection.
- Test: `internal/server/ws/*_test.go` — assert config.update carries caps on hello and on edit.
- [ ] **Step 1.1** Add `LimitUploadKBps *int`, `LimitDownloadKBps *int` to whatever per-host config the runner already consults. Existing pattern is `restic.Env{}`; extend it.
- [ ] **Step 1.2** Failing test in `internal/restic/runner_test.go`: build a backup command with `LimitUploadKBps=1024`, assert the resulting argv contains `--limit-upload 1024`.
- [ ] **Step 1.3** Implement: prepend the flags in argv builders for `backup`, `forget`, `prune`, `check`, `restore`. Skip when nil/<=0.
- [ ] **Step 1.4** Wire `config.update` payload — server reads `Host.BandwidthUpKBps`/`DownKBps`, includes them in the existing `ConfigUpdatePayload` push on hello and on bandwidth edit (mirror cred-edit fan-out in `internal/server/http/host_credentials.go`).
- [ ] **Step 1.5** Agent applies caps: store in the in-memory dispatcher state on `config.update`, attach to every restic call.
- [ ] **Step 1.6** `go vet ./... && go test ./... && make build && <restage block>`. Commit:
```
agent+server: apply host bandwidth caps to restic invocations
```
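As a rough illustration of Steps 1.2 and 1.3, the flag injection might look like the sketch below. The `Limits` struct and the `limitFlags` / `buildArgv` helpers are hypothetical stand-ins for whatever `restic.Env{}` and the real argv builders look like; only the `--limit-upload` / `--limit-download` flags and the "skip when nil/<=0" rule come from the plan.

```go
package main

import (
	"fmt"
	"strconv"
)

// Limits mirrors the two optional caps from the plan; nil means "no cap".
type Limits struct {
	LimitUploadKBps   *int
	LimitDownloadKBps *int
}

// limitFlags returns the restic flags for the caps that are set and positive.
func limitFlags(l Limits) []string {
	var flags []string
	if l.LimitUploadKBps != nil && *l.LimitUploadKBps > 0 {
		flags = append(flags, "--limit-upload", strconv.Itoa(*l.LimitUploadKBps))
	}
	if l.LimitDownloadKBps != nil && *l.LimitDownloadKBps > 0 {
		flags = append(flags, "--limit-download", strconv.Itoa(*l.LimitDownloadKBps))
	}
	return flags
}

// buildArgv prepends the limit flags to a restic subcommand and its arguments.
func buildArgv(l Limits, subcommand string, args ...string) []string {
	argv := append([]string{}, limitFlags(l)...)
	argv = append(argv, subcommand)
	return append(argv, args...)
}

func main() {
	up := 1024
	fmt.Println(buildArgv(Limits{LimitUploadKBps: &up}, "backup", "/srv/data"))
	fmt.Println(buildArgv(Limits{}, "prune"))
}
```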
## Task 2 — P2R-13b: Per-job override on Run-now confirm dialog
**Decision:** A small numeric input on the per-source-group Run-now button (and dashboard Run-all). Operator is away — keep it minimal: two optional inputs (up/down KB/s) on the dispatch endpoint; UI shows a `<details>` "Limit bandwidth for this run" disclosure with two number inputs.
**Files:**
- Modify: `internal/server/http/sources.go` (or wherever the per-group Run-now POST lives) — accept optional `bandwidth_up_kbps`/`bandwidth_down_kbps` form fields, pass through.
- Modify: dispatch path (`internal/server/dispatch_*.go` or `ws/handler.go` job-dispatch core) — accept overrides, include in the `command.run` payload.
- Modify: `internal/api/messages.go` (`CommandRunPayload` gains optional caps that take precedence over host-wide caps when present).
- Modify: agent dispatcher (use the payload override if present, else fall back to config caps).
- Modify: `web/templates/pages/host_sources.html` (and the schedules Run-now form) — `<details>` block.
- Test: HTTP test for the new form fields; agent runner test for override precedence.
- [ ] **Step 2.1** Failing test: POST to per-group Run-now with `bandwidth_up_kbps=512` → assert dispatched payload carries 512.
- [ ] **Step 2.2** Implement endpoint changes + payload extension.
- [ ] **Step 2.3** Agent override precedence test (payload wins over config).
- [ ] **Step 2.4** UI `<details>` blocks (one per Run-now form).
- [ ] **Step 2.5** Playwright spot-check via `:8080` smoke env: open Sources tab, expand the Run-now disclosure, fire with limit=128, then open the live job log and confirm the agent's restic argv (read `/tmp/rm-smoke/server.log` for the dispatched command — it logs argv) shows `--limit-upload 128`.
- [ ] **Step 2.6** Commit.
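The precedence rule tested in Step 2.3 could be as small as the sketch below. `effectiveCap` and `intPtr` are hypothetical names, and treating a zero or negative override as "not set" is an assumption; the plan does not say whether an explicit 0 should mean "uncapped for this run".

```go
package main

import "fmt"

// intPtr is a small helper for building optional values.
func intPtr(v int) *int { return &v }

// effectiveCap picks the cap for one run: the per-job payload override wins
// when present and positive, otherwise the host-wide config cap applies.
// A return value of 0 means "no cap".
func effectiveCap(override, hostWide *int) int {
	if override != nil && *override > 0 {
		return *override
	}
	if hostWide != nil && *hostWide > 0 {
		return *hostWide
	}
	return 0
}

func main() {
	fmt.Println(effectiveCap(intPtr(128), intPtr(1024))) // 128
	fmt.Println(effectiveCap(nil, intPtr(1024)))         // 1024
	fmt.Println(effectiveCap(nil, nil))                  // 0
}
```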
## Task 3 — P2R-14: Schedule "next run" / "last run"
**Files:**
- Modify: `internal/store/schedules.go` — add `NextRunAt(time.Time)` derivation helper and `LatestScheduledJobAt(host_id, schedule_id) (time.Time, error)` (or a single batched fetch for all schedules of a host).
- Modify: dashboard host row (`web/templates/partials/host_row.html`) — show "Next: …" and "Last: …" when there's a single covering schedule (already detected in slice 5).
- Modify: `web/templates/pages/host_schedules.html` — add Next/Last columns to the schedules table.
- Modify: relevant page handlers (`internal/server/http/ui_schedules.go`, dashboard handler) — populate the data.
- Test: `schedules_test.go` for next-run derivation (parse cron, compute next from a fixed `now`).
- [ ] **Step 3.1** Add `NextRun(cronExpr string, from time.Time) (time.Time, error)` helper using `robfig/cron/v3`'s `Parse(...).Next(from)`. Test with three crons.
- [ ] **Step 3.2** Add `LatestJobByActorKindForSchedule(host_id, schedule_id) (time.Time, status, error)` query against `jobs` (filter `actor_kind='schedule'` AND `schedule_id=?`, ORDER BY `started_at` DESC LIMIT 1).
- [ ] **Step 3.3** Wire schedules-page handler to populate Next/Last per row; render relative time + ISO tooltip (mirror existing `formatRelTime` template helper if it exists; otherwise use a simple "5m ago" helper).
- [ ] **Step 3.4** Wire dashboard row: when single covering schedule, surface "Next: 03:00" / "Last: 8h ago — succeeded".
- [ ] **Step 3.5** Playwright spot-check: a host with a schedule shows Next/Last; pause it → Next becomes "—" / "(paused)".
- [ ] **Step 3.6** Commit.
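Step 3.3 mentions falling back to a simple "5m ago" helper if `formatRelTime` does not exist. A minimal sketch of such a helper follows; `relTime` and `relSeconds` are hypothetical names, and the bucket boundaries (minutes, hours, days) and the "in 3h" wording for future times are assumptions rather than anything the plan specifies.

```go
package main

import (
	"fmt"
	"time"
)

// relTime renders t relative to now, e.g. "5m ago" for the past or "in 3h"
// for the future. Anything under a minute is "just now".
func relTime(t, now time.Time) string {
	d := now.Sub(t)
	prefix, suffix := "", " ago"
	if d < 0 {
		d, prefix, suffix = -d, "in ", ""
	}
	switch {
	case d < time.Minute:
		return "just now"
	case d < time.Hour:
		return fmt.Sprintf("%s%dm%s", prefix, int(d/time.Minute), suffix)
	case d < 24*time.Hour:
		return fmt.Sprintf("%s%dh%s", prefix, int(d/time.Hour), suffix)
	default:
		return fmt.Sprintf("%s%dd%s", prefix, int(d/(24*time.Hour)), suffix)
	}
}

// relSeconds is a convenience wrapper around a fixed "now": positive offsets
// lie in the past, negative offsets in the future.
func relSeconds(secondsAgo int) string {
	now := time.Date(2026, 5, 4, 12, 0, 0, 0, time.UTC)
	return relTime(now.Add(-time.Duration(secondsAgo)*time.Second), now)
}

func main() {
	fmt.Println("Last:", relSeconds(8*3600))
	fmt.Println("Next:", relSeconds(-3*3600))
}
```

Passing `now` explicitly keeps the helper testable with a fixed clock, in the same spirit as the fixed-`now` next-run test in the plan.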
## Task 4 — P2R-09: Auto-init UX polish
**Files:**
- Modify: `web/templates/pages/host_repo.html` — danger-zone re-init button + two-step confirm (type the host name).
- Modify: `internal/server/http/ui_repo.go` (or new `repo_reinit.go`) — `POST /hosts/{id}/repo/reinit` admin-only, audit-logged. Server runs `restic init --force` (or wipes-then-inits — pick the safer of the two; restic doesn't truly wipe a repo, the operator must clear the bucket. **Best guess:** dispatch a normal `init` job with a flag that re-runs even if the repo claims to exist; if restic refuses, surface "the repo on the remote already has data — clear it manually before re-init" via the job log).
- Modify: host detail page header / vitals strip — surface init result line. Use the existing latest-`init`-job query to render "repo ready · initialised <relative time> ago" or "init failed · job N · retry".
- Test: HTTP test for re-init endpoint (auth, audit, host-name confirm); template test that the result line renders for both states.
- [ ] **Step 4.1** Add helper: `LatestJobByKind(host_id, "init")` — already exists from P2R-06 (`store.LatestJobByKind`). Reuse.
- [ ] **Step 4.2** Render init line into vitals strip; show "init failed" amber when latest init failed.
- [ ] **Step 4.3** Implement `POST /hosts/{id}/repo/reinit` handler — admin role check, requires a `confirm_hostname` form field that must equal `host.Name`, returns 400 otherwise. Dispatches a fresh `init` job.
- [ ] **Step 4.4** Add danger-zone re-init form to `host_repo.html` (currently disabled per slice 4). Two-step confirm with the typed hostname.
- [ ] **Step 4.5** Playwright: visit `/hosts/{id}/repo`, click re-init, type wrong hostname → blocked; type right hostname → dispatches init job → returns to live log.
- [ ] **Step 4.6** Commit.
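The typed-hostname gate from Step 4.3 can be sketched with the standard library alone. This is an illustration under assumptions: `reinitHandler` and `postConfirm` are hypothetical, the admin role check and audit logging are left out, and the 202 success status is a placeholder (the plan only fixes the 400 on mismatch).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// reinitHandler guards a re-init behind a typed hostname. hostName stands in
// for the store lookup and dispatch for the init-job dispatch.
func reinitHandler(hostName string, dispatch func()) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if r.FormValue("confirm_hostname") != hostName {
			http.Error(w, "hostname confirmation does not match", http.StatusBadRequest)
			return
		}
		dispatch()
		w.WriteHeader(http.StatusAccepted) // assumed success status
	}
}

// postConfirm submits the form to the handler in-process and reports the
// response status and whether an init job would have been dispatched.
func postConfirm(hostName, typed string) (status int, dispatched bool) {
	h := reinitHandler(hostName, func() { dispatched = true })
	form := url.Values{"confirm_hostname": {typed}}
	req := httptest.NewRequest(http.MethodPost, "/hosts/1/repo/reinit",
		strings.NewReader(form.Encode()))
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, dispatched
}

func main() {
	fmt.Println(postConfirm("nas01", "nas1"))
	fmt.Println(postConfirm("nas01", "nas01"))
}
```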
## Task 5 — P2R-10: Hook schema (migration 0010)
**Files:**
- Create: `internal/store/migrations/0010_hooks.sql`
- `ALTER TABLE source_groups ADD COLUMN pre_hook BLOB;` (AEAD ciphertext, NULLable)
- `ALTER TABLE source_groups ADD COLUMN post_hook BLOB;`
- `ALTER TABLE hosts ADD COLUMN pre_hook_default BLOB;`
- `ALTER TABLE hosts ADD COLUMN post_hook_default BLOB;`
- All four are AEAD ciphertext (existing `crypto.AEAD`); BLOB column type.
- Modify: `internal/store/types.go` — add `PreHook *string` (decrypted), `PostHook *string` to `SourceGroup`; same to `Host`.
- Modify: `internal/store/sources.go` + `internal/store/hosts.go` — getters/setters encrypt on write, decrypt on read. Pass `crypto.AEAD` through (pattern mirrors `host_credentials.go`).
- Test: encrypt/decrypt round-trip; setting `nil` clears the column.
- [ ] **Step 5.1** Write migration SQL. Column-level ALTERs only (per CLAUDE.md).
- [ ] **Step 5.2** Update store types + getters/setters with AEAD encrypt/decrypt. Mirror `internal/store/host_credentials.go` patterns exactly.
- [ ] **Step 5.3** Round-trip test: set hook on a source group; reload; assert plaintext returned. Set nil; assert nil after reload.
- [ ] **Step 5.4** `go vet && go test`. Commit.
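The round-trip in Step 5.3 might be exercised along these lines. The project's `crypto.AEAD` is not shown in the plan, so this sketch substitutes stdlib AES-256-GCM with a random nonce prepended to the ciphertext; `newAEAD`, `sealHook`, `openHook` and `roundTrip` are hypothetical names, and the blob layout is an assumption.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newAEAD builds an AES-GCM AEAD from a 16, 24 or 32 byte key.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealHook encrypts an optional hook. A nil hook yields a nil blob, which
// maps to a NULL column.
func sealHook(aead cipher.AEAD, hook *string) ([]byte, error) {
	if hook == nil {
		return nil, nil
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Stored layout (assumed): nonce || ciphertext+tag.
	return aead.Seal(nonce, nonce, []byte(*hook), nil), nil
}

// openHook reverses sealHook. A nil blob (NULL column) yields a nil hook.
func openHook(aead cipher.AEAD, blob []byte) (*string, error) {
	if blob == nil {
		return nil, nil
	}
	n := aead.NonceSize()
	if len(blob) < n {
		return nil, errors.New("hook blob shorter than nonce")
	}
	pt, err := aead.Open(nil, blob[:n], blob[n:], nil)
	if err != nil {
		return nil, err
	}
	s := string(pt)
	return &s, nil
}

// roundTrip seals and reopens a hook under a throwaway key.
func roundTrip(hook *string) (*string, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	blob, err := sealHook(aead, hook)
	if err != nil {
		return nil, err
	}
	return openHook(aead, blob)
}

func main() {
	hook := "pg_dump mydb > /var/backups/mydb.sql"
	got, err := roundTrip(&hook)
	fmt.Println(*got == hook, err)
	none, err := roundTrip(nil)
	fmt.Println(none == nil, err)
}
```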
## Task 6 — P2R-11: Agent execution of hooks
**Files:**
- Modify: `internal/api/messages.go` (`ConfigUpdatePayload`, or the per-source-group bundle inside `ScheduleSetPayload`, carries `PreHook` and `PostHook` as plaintext; the server has decrypted by then, and the wire is authenticated WS, the same trust boundary as repo creds).
- Modify: agent dispatcher — for `kind=backup` only:
- Run `pre_hook` (if present) via `os/exec` with the host shell (`/bin/sh -c` on Linux, `cmd.exe /C` on Windows). Capture stdout+stderr → JobLog with `hook:` prefix. Non-zero exit aborts the backup, marks the job failed with `pre_hook` error.
- Run `post_hook` (if present) **always** after the backup, with `RM_JOB_STATUS=succeeded|failed` env var. Capture into JobLog, prefix `hook:`. Non-zero exit on post_hook does NOT change job status (warning logged).
- Skip both for `kind` ∈ {forget, prune, check, unlock, init} per spec.md §14.3.
- Test: dispatcher test with a `pre_hook` that exits 1 → backup not started; `post_hook` always runs and sees `RM_JOB_STATUS`.
- [ ] **Step 6.1** Plumb hooks through `ScheduleSetPayload` source-group bundle + per-group Run-now `command.run` payload (override host-default with group hook if both present). Server-side resolution: host default if group hook is empty.
- [ ] **Step 6.2** Agent dispatcher: factor hook execution into `internal/agent/runner/hooks.go`. Use `exec.CommandContext`, set env, plumb output to existing JobLog stream with `Source: "hook"` (or prefix the log lines `hook: …`).
- [ ] **Step 6.3** Failing test in `internal/agent/runner/runner_test.go` (create file if absent): `pre_hook=/bin/false` → job fails with `pre_hook failed (exit 1)` and the actual restic backup never runs (assert via mock-restic shim).
- [ ] **Step 6.4** Test: `post_hook` runs even when backup fails; receives `RM_JOB_STATUS=failed`.
- [ ] **Step 6.5** Test: hooks skipped on `forget`/`prune`/`check`/`unlock` jobs.
- [ ] **Step 6.6** `go vet && go test && make build && <restage block>`. Commit.
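A sketch of the hook semantics described above, covering only the Linux `/bin/sh -c` branch. `runHook`, `runBackupJob` and `demo` are hypothetical names, `backup` stands in for the real restic invocation, and running `post_hook` even when `pre_hook` aborted the job is an assumption read from "always"; the plan does not spell that case out.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// runHook runs script via /bin/sh -c and returns its combined stdout+stderr
// as log lines carrying the "hook: " prefix.
func runHook(ctx context.Context, script string, extraEnv ...string) ([]string, error) {
	cmd := exec.CommandContext(ctx, "/bin/sh", "-c", script)
	cmd.Env = append(os.Environ(), extraEnv...)
	out, err := cmd.CombinedOutput()
	var lines []string
	for _, l := range strings.Split(string(out), "\n") {
		if l != "" {
			lines = append(lines, "hook: "+l)
		}
	}
	return lines, err
}

// runBackupJob wraps a backup with optional pre and post hooks.
func runBackupJob(ctx context.Context, preHook, postHook string, backup func() error) (string, []string) {
	status := "succeeded"
	var log []string
	if preHook != "" {
		lines, err := runHook(ctx, preHook)
		log = append(log, lines...)
		if err != nil { // non-zero exit aborts the backup
			status = "failed"
			log = append(log, "pre_hook failed: "+err.Error())
		}
	}
	if status == "succeeded" {
		if err := backup(); err != nil {
			status = "failed"
			log = append(log, "backup failed: "+err.Error())
		}
	}
	if postHook != "" {
		lines, err := runHook(ctx, postHook, "RM_JOB_STATUS="+status)
		log = append(log, lines...)
		if err != nil { // logged as a warning; job status is unchanged
			log = append(log, "warning: post_hook failed: "+err.Error())
		}
	}
	return status, log
}

// demo runs a job whose backup always succeeds and reports whether it ran.
func demo(preHook, postHook string) (status string, backupRan bool, log []string) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	status, log = runBackupJob(ctx, preHook, postHook, func() error {
		backupRan = true
		return nil
	})
	return status, backupRan, log
}

func main() {
	status, ran, log := demo("exit 1", "echo status=$RM_JOB_STATUS")
	fmt.Println(status, ran)
	for _, l := range log {
		fmt.Println(l)
	}
}
```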
## Task 7 — P2R-12: Hook editor UI
**Files:**
- Modify: `web/templates/pages/source_group_edit.html` (new or extend existing source-group form) — `<textarea>` for pre_hook, `<textarea>` for post_hook, with the warning banner: "this hook runs as the agent service user (root on Linux; LocalSystem on Windows)".
- Modify: source-group HTTP handler (`internal/server/http/sources.go`) — accept hook fields on POST/PUT, encrypt-and-persist via store.
- Host-level defaults: the "Settings" tab on host detail is still inert (per P1-25), so no new tab section is created. **Decision:** add `pre_hook_default` / `post_hook_default` to the Repo page under a new "Hooks" section.
- Modify: source-group form admin-only check; post-only edit allowed by operators? **Decision:** admin-only edit per spec; render but disable for operators.
- Modify: audit-log writer — emit `source_group.hook_updated` and `host.default_hook_updated` events (without the hook body).
- Test: HTTP test for create + update; admin-only enforcement; audit row written without secret.
- [ ] **Step 7.1** Source-group form extension + handler wiring.
- [ ] **Step 7.2** Repo page Hooks section (host defaults).
- [ ] **Step 7.3** Audit entries.
- [ ] **Step 7.4** Playwright: as admin, set a `pre_hook` of `echo hello`, fire Run-now, open live log, confirm `hook: hello` line appears.
- [ ] **Step 7.5** Commit.
## Task 8 — P2-18a: Announce schema + endpoint
**Files:**
- Create: `internal/store/migrations/0011_pending_hosts.sql`
```sql
CREATE TABLE pending_hosts (
id TEXT PRIMARY KEY,
hostname TEXT NOT NULL,
os TEXT NOT NULL,
arch TEXT NOT NULL,
agent_version TEXT NOT NULL,
restic_version TEXT NOT NULL,
public_key BLOB NOT NULL, -- 32-byte Ed25519
fingerprint TEXT NOT NULL, -- "SHA256:hex"
announced_from_ip TEXT NOT NULL,
first_seen_at TEXT NOT NULL,
last_seen_at TEXT NOT NULL,
expires_at TEXT NOT NULL
);
CREATE INDEX pending_hosts_expires ON pending_hosts(expires_at);
CREATE INDEX pending_hosts_fingerprint ON pending_hosts(fingerprint);
```
- Create: `internal/store/pending_hosts.go` — `CreatePendingHost`, `GetPendingHostByFingerprint`, `ListPendingHosts`, `DeletePendingHost`, `TouchPendingHost`, `DeleteExpiredPendingHosts`.
- Create: `internal/server/http/announce.go` — `POST /api/agents/announce` accepts `{hostname, os, arch, agent_version, restic_version, public_key (base64)}`. Validates protocol_version implicitly via `agent_version` check. Token-bucket rate limit per source IP (10/min). Global cap 100 pending rows. Returns `{fingerprint, pending_id, hostname_collision: bool}`.
- Test: `announce_test.go` — happy path; rate limit; cap; collision flag.
- [ ] **Step 8.1** Migration + store layer + tests.
- [ ] **Step 8.2** Endpoint + tests (use a fake clock + in-process token bucket).
- [ ] **Step 8.3** Commit.
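Step 8.2 calls for a fake clock plus an in-process token bucket. One way to sketch that is shown below; `IPLimiter`, `fakeClock` and their methods are hypothetical, and integer refill (one token per `minute / perMinute`) is a design choice made here to keep tests deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type bucket struct {
	tokens int
	last   time.Time // time of the last credited refill
}

// IPLimiter is a per-source-IP token bucket with an injectable clock.
type IPLimiter struct {
	mu       sync.Mutex
	now      func() time.Time
	burst    int
	interval time.Duration // time to earn one token
	buckets  map[string]*bucket
}

// NewIPLimiter allows perMinute requests per IP, refilled evenly.
func NewIPLimiter(perMinute int, now func() time.Time) *IPLimiter {
	return &IPLimiter{
		now:      now,
		burst:    perMinute,
		interval: time.Minute / time.Duration(perMinute),
		buckets:  map[string]*bucket{},
	}
}

// Allow consumes one token for ip and reports whether one was available.
func (l *IPLimiter) Allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := l.now()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[ip] = b
	}
	if n := int(now.Sub(b.last) / l.interval); n > 0 {
		b.tokens += n
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = b.last.Add(time.Duration(n) * l.interval)
	}
	if b.tokens == 0 {
		return false
	}
	b.tokens--
	return true
}

// fakeClock is a manually advanced clock for tests.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time       { return c.t }
func (c *fakeClock) AdvanceSeconds(n int) { c.t = c.t.Add(time.Duration(n) * time.Second) }

func main() {
	clk := &fakeClock{}
	lim := NewIPLimiter(10, clk.Now)
	allowed := 0
	for i := 0; i < 12; i++ {
		if lim.Allow("203.0.113.7") {
			allowed++
		}
	}
	fmt.Println("allowed in first burst:", allowed)
}
```

The global cap of 100 pending rows is a separate check against the table and is not modelled here.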
## Task 9 — P2-18b: Pending WS + accept/reject
**Files:**
- Create: `internal/server/ws/pending.go` — `GET /ws/agent/pending` upgrade. Server issues a 32-byte nonce; agent signs it with its Ed25519 private key; server verifies against the `public_key` stored on the pending row keyed by the supplied `pending_id`. If valid, hold the connection open; on accept, push a single `enrolled` message containing `{bearer_token, repo_credentials_aead_blob}` and close cleanly. On reject, close with code 4001 + reason "rejected".
- Create: `internal/server/http/pending.go` — admin-only `POST /api/pending-hosts/{id}/accept` (atomically: mint bearer, decrypt admin-supplied repo creds (passed in form), promote pending row → real `hosts` row, push `enrolled` to the open WS, audit-log) and `POST /api/pending-hosts/{id}/reject` (delete row + close socket).
- Modify: server `main.go` route registration.
- Test: integration test — fake agent opens pending WS, admin POST /accept, agent receives bearer.
- [ ] **Step 9.1** Pending WS handler with nonce-sign verify.
- [ ] **Step 9.2** Accept/reject endpoints. Accept reuses the existing token-consume path internally (mints persistent bearer from `crypto.RandomToken`-style helper, inserts host row + `host_credentials`).
- [ ] **Step 9.3** Tests.
- [ ] **Step 9.4** Commit.
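The nonce challenge in Step 9.1 reduces to a standard Ed25519 sign/verify pair. The sketch below uses only `crypto/ed25519`; `newNonce`, `verifyChallenge` and `challengeRoundTrip` are hypothetical names, and the WebSocket framing around the exchange is omitted.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// newNonce returns the 32-byte random challenge the server issues.
func newNonce() ([]byte, error) {
	n := make([]byte, 32)
	_, err := rand.Read(n)
	return n, err
}

// verifyChallenge is the server-side check against the public key stored on
// the pending row. The length guard matters: ed25519.Verify panics on a
// public key of the wrong size.
func verifyChallenge(storedPub, nonce, sig []byte) bool {
	if len(storedPub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(ed25519.PublicKey(storedPub), nonce, sig)
}

// challengeRoundTrip plays both sides in-process. With tamper set, the
// signature is checked against a different nonce, as in a replay attempt.
func challengeRoundTrip(tamper bool) (bool, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, err
	}
	nonce, err := newNonce()
	if err != nil {
		return false, err
	}
	sig := ed25519.Sign(priv, nonce) // agent side
	if tamper {
		nonce[0] ^= 0xff
	}
	return verifyChallenge(pub, nonce, sig), nil
}

func main() {
	ok, err := challengeRoundTrip(false)
	fmt.Println("valid signature accepted:", ok, err)
	ok, err = challengeRoundTrip(true)
	fmt.Println("signature over another nonce accepted:", ok, err)
}
```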
## Task 10 — P2-18c: Agent announce path
**Files:**
- Modify: `cmd/agent/main.go` — when `RM_TOKEN` is unset, switch to announce mode instead of erroring out. `RM_SERVER` still required.
- Create: `internal/agent/announce/announce.go` — generate-or-load Ed25519 keypair (persisted as a file alongside `secrets.enc`, mode 0600). POST `/api/agents/announce`. Open `/ws/agent/pending`. Wait. On `enrolled` message, persist bearer to `agent.yaml`, persist repo creds via existing secrets store, exit announce mode and reconnect via the normal WS path.
- Modify: `deploy/install/install.sh` — when `RM_TOKEN` is missing, run agent in announce mode and `journalctl --follow` until the agent prints the fingerprint, print it to the operator's terminal in big copy-friendly format, then keep following until enrolled.
- Test: end-to-end test in `internal/server/...` using a fake agent.
- [ ] **Step 10.1** Keypair generation + persistence.
- [ ] **Step 10.2** Announce client + pending WS client; print `SHA256:…` fingerprint to stdout in a banner.
- [ ] **Step 10.3** Install script branch.
- [ ] **Step 10.4** Playwright: register a host via announce mode (run agent locally with no RM_TOKEN), log into UI, see Pending hosts panel with the fingerprint, click Accept, confirm host appears.
- [ ] **Step 10.5** Commit.
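Steps 10.1 and 10.2 (generate-or-load keypair, print a `SHA256:…` fingerprint) could look roughly like this. Everything here is a sketch under assumptions: the key file stores the raw 32-byte Ed25519 seed, the fingerprint is the hex SHA-256 of the raw public key, and `loadOrCreateKey`, `fingerprint` and `demoFingerprints` are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// loadOrCreateKey loads the seed at path, or generates a key and writes its
// seed with mode 0600 when the file does not exist yet.
func loadOrCreateKey(path string) (ed25519.PrivateKey, error) {
	seed, err := os.ReadFile(path)
	if err == nil {
		if len(seed) != ed25519.SeedSize {
			return nil, fmt.Errorf("key file %s has unexpected size %d", path, len(seed))
		}
		return ed25519.NewKeyFromSeed(seed), nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return nil, err
	}
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	if err := os.WriteFile(path, priv.Seed(), 0o600); err != nil {
		return nil, err
	}
	return priv, nil
}

// fingerprint renders a public key in the "SHA256:hex" form from the schema.
func fingerprint(pub ed25519.PublicKey) string {
	sum := sha256.Sum256(pub)
	return "SHA256:" + hex.EncodeToString(sum[:])
}

// demoFingerprints loads the key twice from a temp dir to show that the
// second start reuses the persisted identity.
func demoFingerprints() (first, second string, err error) {
	dir, err := os.MkdirTemp("", "rm-announce")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "agent.key")
	for _, out := range []*string{&first, &second} {
		priv, err := loadOrCreateKey(path)
		if err != nil {
			return "", "", err
		}
		*out = fingerprint(priv.Public().(ed25519.PublicKey))
	}
	return first, second, nil
}

func main() {
	first, second, err := demoFingerprints()
	fmt.Println(first)
	fmt.Println("stable across restarts:", first == second, err)
}
```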
## Task 11 — P2-18d: Pending hosts UI panel
**Files:**
- Modify: `web/templates/pages/dashboard.html` — add Pending hosts panel above the host list when any pending rows exist.
- Modify: dashboard handler — `Store.ListPendingHosts(now)` (auto-skips expired).
- Add buttons → POST `/api/pending-hosts/{id}/accept` and `/reject` via HTMX.
- Background sweeper for `DeleteExpiredPendingHosts` every 60s (mirror the existing offline-sweeper goroutine pattern).
- [ ] **Step 11.1** Sweeper goroutine.
- [ ] **Step 11.2** Dashboard handler + template.
- [ ] **Step 11.3** Accept form must include the same repo URL/user/pw fields as the token-mint form (admin still supplies repo creds at accept time).
- [ ] **Step 11.4** Playwright sweep.
- [ ] **Step 11.5** Commit.
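The sweeper in Step 11.1 is a plain ticker loop. The sketch below uses an in-memory map in place of the `pending_hosts` table; `pendingStore`, `runSweeper` and `sweepDemo` are hypothetical, and the 5ms interval exists only so the demo finishes quickly (the plan calls for 60s).

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// pendingStore is an in-memory stand-in for the pending_hosts table.
type pendingStore struct {
	mu      sync.Mutex
	expires map[string]time.Time // pending id -> expires_at
}

// DeleteExpired removes rows whose expires_at is at or before now.
func (s *pendingStore) DeleteExpired(now time.Time) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := 0
	for id, exp := range s.expires {
		if !exp.After(now) {
			delete(s.expires, id)
			n++
		}
	}
	return n
}

// Len reports how many pending rows remain.
func (s *pendingStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.expires)
}

// runSweeper calls sweep on every tick until ctx is cancelled.
func runSweeper(ctx context.Context, interval time.Duration, sweep func(now time.Time)) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-t.C:
			sweep(now)
		}
	}
}

// sweepDemo seeds two expired rows and one live row, waits for one sweep,
// stops the sweeper, and returns the number of rows left.
func sweepDemo() int {
	now := time.Now()
	s := &pendingStore{expires: map[string]time.Time{
		"a": now.Add(-time.Minute),
		"b": now.Add(-time.Second),
		"c": now.Add(time.Hour),
	}}
	ctx, cancel := context.WithCancel(context.Background())
	swept := make(chan struct{}, 1)
	done := make(chan struct{})
	go func() {
		defer close(done)
		runSweeper(ctx, 5*time.Millisecond, func(t time.Time) {
			s.DeleteExpired(t)
			select {
			case swept <- struct{}{}:
			default:
			}
		})
	}()
	<-swept
	cancel()
	<-done
	return s.Len()
}

func main() {
	fmt.Println("pending rows left:", sweepDemo())
}
```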
## Task 12 — P2-16: Windows service integration
**Decision:** Cannot test on Windows from WSL. Goal is a clean compile under `GOOS=windows GOARCH=amd64` and code that follows the canonical `golang.org/x/sys/windows/svc/example` pattern. Untestable beyond compile + manual review; mark in commit message.
**Files:**
- Create: `internal/agent/service/service_windows.go` (build tag `//go:build windows`) — implements `svc.Handler`. `Execute` starts the agent's main loop in a goroutine, listens for `svc.Stop`/`svc.Shutdown`, cancels ctx, waits.
- Create: `internal/agent/service/service_other.go` (build tag `//go:build !windows`) — stub `RunService` that just runs the agent loop in the foreground.
- Create: `internal/agent/service/install_windows.go` — `Install`, `Uninstall`, `Start`, `Stop` thin wrappers around `mgr` package.
- Modify: `cmd/agent/main.go` — sub-commands: `install`, `uninstall`, `start`, `stop`, `run` (default). `run` delegates to `service.Run()` which on Windows checks `svc.IsWindowsService()` and dispatches accordingly.
- Test: `internal/agent/service/service_windows_test.go` (build-tagged) for argv parsing only — actual SCM interaction can't be tested in CI.
- [ ] **Step 12.1** Implement the svc.Handler shell.
- [ ] **Step 12.2** Install/uninstall wrappers (use `mgr.ConnectLocal()`, `m.CreateService(name, exepath, mgr.Config{...}, "run")`).
- [ ] **Step 12.3** Cross-compile check: `GOOS=windows GOARCH=amd64 go build ./cmd/agent` must succeed.
- [ ] **Step 12.4** Commit with note "untested on Windows; compile-verified only".
## Task 13 — P2-17: install.ps1
**Files:**
- Create: `deploy/install/install.ps1` — PowerShell 5.1+ compatible. Checks admin elevation. Downloads agent binary from `$RM_SERVER/agent/binary?os=windows&arch=amd64`. Drops it at `C:\Program Files\restic-manager\restic-manager-agent.exe`. Runs `restic-manager-agent.exe install` (registers service). Starts it. Detects existing tasks named `*restic*` via `Get-ScheduledTask` and prints them — does not auto-disable. Writes `C:\ProgramData\restic-manager\agent.yaml` with `RM_SERVER` + `RM_TOKEN` (or no token if announce-mode).
- Modify: `internal/server/http/install.go` (or wherever install scripts are served) to also serve `/install/install.ps1`.
- Modify: CLAUDE.md restage block to also stage `install.ps1`.
- [ ] **Step 13.1** Write the script.
- [ ] **Step 13.2** Wire serving + restage.
- [ ] **Step 13.3** Smoke parse, only if `pwsh` is on PATH: `pwsh -NoProfile -Command "Get-Command -Syntax (Get-ChildItem deploy/install/install.ps1)"`, or a scriptblock parse via `pwsh -NoProfile -c '$null = [scriptblock]::Create((Get-Content deploy/install/install.ps1 -Raw))'` (single-quoted so the calling shell does not expand `$null`). Skip if no pwsh is available and note that in the commit.
- [ ] **Step 13.4** Commit.
## Task 14 — Final integration sweep
- [ ] **Step 14.1** `go vet ./... && go test ./... -race`. Full build. Restage. Restart server.
- [ ] **Step 14.2** Playwright walkthrough on `:8080`: login → dashboard shows pending-hosts empty state → create source group → set a `pre_hook` → Run-now with bandwidth override → confirm hook fires + bandwidth applied → schedules tab shows next/last → repo page shows init-OK line → re-init flow gated by typed hostname.
- [ ] **Step 14.3** Update `tasks.md`: tick P2R-09, P2R-10, P2R-11, P2R-12, P2R-13, P2R-14, P2-16, P2-17, P2-18 done. Update Phase 2 acceptance line items as satisfied.
- [ ] **Step 14.4** Open PR `p2-completion → main` with a summary of every item closed.
---
## Decisions made on the operator's behalf (away)
1. **Bandwidth UI for per-job override:** small `<details>` disclosure under each Run-now button. Simpler than a modal; matches the rest of the app's progressive-disclosure style.
2. **Re-init UX:** server dispatches a fresh `init` job; if restic refuses because the repo already exists, surfaces the error in the job log and instructs the operator to clear the remote bucket. We don't try to forcibly wipe — too dangerous, and the agent doesn't have credentials to wipe S3/B2/etc generically.
3. **Hooks editor lives on the Repo page (host defaults) + on the source-group edit form (per-group override).** Skips inventing a new "Settings" tab since that surface is still inert.
4. **Announce flow:** admin still supplies repo creds at accept time (same form as the token-mint flow). The pending row only carries identity-of-the-endpoint material, never repo creds.
5. **Windows service:** compile-verified only; untested. Commit message will say so.
File diff suppressed because it is too large
@@ -0,0 +1,473 @@
# P3 — Alerts (design)
> Phase 3 sub-spec covering the alerts engine, notification channels, and UI
> (P3-05 / P3-06 / P3-07).
>
> Wireframe: `_diag/p3-alerts-wireframe/wireframe.html`. Screenshots in the
> same directory. Spec brainstorm ran 2026-05-04; user approved all ten
> design decisions before this spec was written.
## Scope locked
Brainstorm decisions (in order asked):
1. **Rule model.** Hardcoded rule set, no operator-tunable thresholds in v1.
The engine knows about each rule type internally; per-rule config can land
later if/when an operator asks.
2. **Rule set.** Six rules: `backup_failed`, `forget_failed`, `prune_failed`,
`check_failed`, `stale_schedule`, `agent_offline`.
3. **Engine cadence.** Hybrid. Event hooks at the existing
`MarkJobFinished` and offline-sweeper sites for the immediate triggers;
one 60-second ticker handles stale-schedule detection and auto-resolution.
4. **Resolution.** Auto-resolve when the underlying condition clears + manual
Resolve at any time. Acknowledge is a separate "I've seen it" intermediate
state that does NOT close the alert.
5. **v1 channels.** Webhook + native ntfy + SMTP. Apprise deferred (the
channel plumbing accepts new kinds without reshaping). SMTP added as
a first-class channel post-brainstorm because the use case — overnight
alerts the operator wants to read in the morning rather than be pinged
on at 03:00 — is poorly served by ntfy's push model and clumsy via
webhook → email-gateway.
6. **Channel scope.** Global only. No per-host or per-severity routing in v1.
7. **Notification body.** Structured JSON for webhooks, formatted
title+body+click-URL for ntfy, plus a per-channel "Send test notification"
button with inline result feedback.
8. **Deduplication.** Open-alert uniqueness on `(host_id, kind)` with a
`last_seen_at` bump on every confirming tick. One notification per
occurrence; the UI shows "still happening · Ns ago" while a rule keeps
matching.
9. **Alert UI.** Top-level `/alerts` page (the existing nav stub becomes
real). Per-host vitals "Open alerts" cell links to `/alerts?host_id=...`.
Channel CRUD lives at `/settings/notifications`.
10. **Delivery semantics.** Best-effort fire-and-forget with a 5s timeout
per notification. Failures are logged but not retried. The alert row in
the DB is the source of truth.
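The deduplication rule in decision 8 can be illustrated with an in-memory model. This is a sketch, not the store layer: `AlertBook` is a hypothetical stand-in for the `alerts` table, it is not safe for concurrent use (the design routes everything through one engine goroutine), and it bumps only `last_seen_at` on a repeat.

```go
package main

import (
	"fmt"
	"time"
)

type alertKey struct{ hostID, kind string }

// Alert is a trimmed-down alert row.
type Alert struct {
	ID         int
	HostID     string
	Kind       string
	Message    string
	RaisedAt   time.Time
	LastSeenAt time.Time
}

// AlertBook keeps at most one open alert per (host_id, kind).
type AlertBook struct {
	now    func() time.Time
	nextID int
	open   map[alertKey]*Alert
}

// NewAlertBook returns an empty book that uses the wall clock.
func NewAlertBook() *AlertBook {
	return &AlertBook{now: time.Now, nextID: 1, open: map[alertKey]*Alert{}}
}

// RaiseOrTouch opens a new alert, or bumps LastSeenAt on the open one.
// The bool reports whether a notification should fan out (new alerts only).
func (b *AlertBook) RaiseOrTouch(hostID, kind, message string) (*Alert, bool) {
	k := alertKey{hostID, kind}
	now := b.now()
	if existing, ok := b.open[k]; ok {
		existing.LastSeenAt = now
		return existing, false
	}
	a := &Alert{ID: b.nextID, HostID: hostID, Kind: kind, Message: message,
		RaisedAt: now, LastSeenAt: now}
	b.nextID++
	b.open[k] = a
	return a, true
}

// Resolve closes the open alert for (host_id, kind), if any.
func (b *AlertBook) Resolve(hostID, kind string) bool {
	k := alertKey{hostID, kind}
	if _, ok := b.open[k]; !ok {
		return false
	}
	delete(b.open, k)
	return true
}

func main() {
	book := NewAlertBook()
	a1, notify1 := book.RaiseOrTouch("host-1", "backup_failed", "restic exited 1")
	a2, notify2 := book.RaiseOrTouch("host-1", "backup_failed", "restic exited 1")
	fmt.Println(a1.ID == a2.ID, notify1, notify2)
	book.Resolve("host-1", "backup_failed")
	a3, notify3 := book.RaiseOrTouch("host-1", "backup_failed", "restic exited 1")
	fmt.Println(a3.ID != a1.ID, notify3)
}
```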
## Architecture
The subsystem is three loosely-coupled units behind one `AlertEngine`
goroutine:
```
                              ┌───────────────────────────┐
event hooks ─────────────────►│                           │
                              │        AlertEngine        │ ──► raise/resolve
60s ticker ──────────────────►│     (rule evaluation)     │     alert row
                              │                           │
                              └────────────┬──────────────┘
                                ┌──────────────────────┐
                                │   notification.Hub   │
                                │  (fire-and-forget)   │
                                └──┬────────┬──────────┘
                                   │        │
                            ┌──────▼──┐  ┌──▼──────┐
                            │ Webhook │  │  Ntfy   │  …future channels
                            └─────────┘  └─────────┘
```
### Component boundaries
| Component | Purpose | Depends on |
| ---------------------------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------- |
| `internal/alert.Engine` | Owns the rule evaluation. Exposes `OnJobFinished`, `OnHostOffline`, `OnHostOnline` event hooks; runs a 60s ticker for stale-schedule + auto-resolution sweeps. Persists raises/resolves through the store. | store, notification.Hub, slog |
| `internal/alert.Rule` + per-rule files | Each of the six rules is a small struct with `Kind() string`, `Severity() string`, `MessageFor(ctx) string`. The engine iterates over a registered slice. | store models |
| `internal/notification.Hub` | Receives "alert raised/resolved/test" events; fans out to enabled channels in parallel; logs results to a new `notification_log` table. | store, channel adapters |
| `internal/notification.Channel` (iface) | Single method `Send(ctx, payload) error` with a 5s context for HTTP channels, 10s for SMTP. Three impls in v1: `webhookChannel`, `ntfyChannel`, `smtpChannel`. | http.Client; net/smtp + crypto/tls for SMTP |
| `internal/store/alerts.go` | CRUD on `alerts` table: `RaiseOrTouch(host_id, kind, severity, message)`, `Acknowledge(id, user)`, `Resolve(id, by user)`, `AutoResolve(host_id, kind)`, `ListAlerts(filter)`, plus the `last_seen_at` bump. | sqlite |
| `internal/store/notification_channels.go` | CRUD on `notification_channels` (new table) + `notification_log` (new table). | sqlite, crypto.AEAD (for secrets) |
| `internal/server/http/ui_alerts.go` | `/alerts` page handler + filter parsing + ack/resolve form actions. | store |
| `internal/server/http/ui_notifications.go` | `/settings/notifications` page + channel CRUD + "Send test" handler. | store, notification.Hub |
### Engine event shape
The engine runs as one goroutine per server process started in
`cmd/server/main.go`. It exposes a small set of channels other code writes to:
```go
type Engine struct {
	store *store.Store
	hub   *notification.Hub

	// Event channels (buffered, drop-on-full with a slog warning to keep
	// hot paths non-blocking). The engine drains them on its own
	// goroutine, evaluates the rule, and acts.
	jobFinished chan jobFinishedEvent // from store.MarkJobFinished hook
	hostOffline chan string           // host_id; from offline sweeper
	hostOnline  chan string           // host_id; from ws handler hello

	// 60s ticker drives stale-schedule + auto-resolution sweeps.
	tick *time.Ticker
}
```
The hot-path call sites (`store.MarkJobFinished`, `ws.handler` offline
sweep, `ws.handler` hello) push to these channels via a tiny
`Engine.Notify*` method that does a non-blocking send. The engine's own
goroutine handles every match — keeps mutation off the hot path.
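A minimal sketch of one such `Engine.Notify*` method, assuming a cut-down `Engine` with a single channel. The method name `NotifyHostOffline` and the boolean return value are illustrative and not taken from the real code.

```go
package main

import (
	"fmt"
	"log/slog"
)

// Engine is a cut-down, hypothetical version of the struct above: one
// event channel is enough to show the drop-on-full send.
type Engine struct {
	hostOffline chan string
}

// NotifyHostOffline never blocks the caller. If the buffer is full the
// event is dropped and a warning is logged.
func (e *Engine) NotifyHostOffline(hostID string) bool {
	select {
	case e.hostOffline <- hostID:
		return true
	default:
		slog.Warn("alert engine busy, dropping event", "kind", "host_offline", "host_id", hostID)
		return false
	}
}

func main() {
	e := &Engine{hostOffline: make(chan string, 1)}
	fmt.Println(e.NotifyHostOffline("host-a")) // true
	fmt.Println(e.NotifyHostOffline("host-b")) // false (buffer full, dropped)
}
```

The `select` with a `default` branch is what keeps the call site non-blocking; the buffer size only decides how many events can queue before drops start.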
### Rule catalogue
| Kind | Severity | Trigger | Auto-resolve when |
| ------------------- | -------- | ----------------------------------------------------------------------- | -------------------------------------------------- |
| `backup_failed` | warning | `MarkJobFinished` with kind=backup, status=failed | next backup for the same host succeeds |
| `forget_failed` | warning | `MarkJobFinished` with kind=forget, status=failed | next forget for the same host succeeds |
| `prune_failed` | warning | `MarkJobFinished` with kind=prune, status=failed | next prune for the same host succeeds |
| `check_failed` | critical | `MarkJobFinished` with kind=check, status=failed OR errors_found | next check for the same host succeeds without errors |
| `stale_schedule` | warning | 60s ticker: a schedule's next-fire time is more than 5 minutes in the past with no matching job since | next job for that schedule succeeds OR schedule deleted |
| `agent_offline` | warning | offline-sweeper marks the host offline AND the host has been offline > 15 min (engine checks `last_seen_at`) | hostOnline event for that host |
The 15-minute floor on `agent_offline` exists so a 30-second blip during
agent restart doesn't generate a notification storm. The store's existing
offline sweeper (`hosts.last_seen_at` with 90s threshold) already marks the
host offline; the engine sees the event but waits for the threshold before
raising.
### Dedup + last_seen_at
`store.RaiseOrTouch(host_id, kind, severity, message)`:
```sql
SELECT id, last_seen_at FROM alerts
WHERE host_id = ? AND kind = ? AND resolved_at IS NULL
LIMIT 1;
```
- Found: `UPDATE alerts SET last_seen_at = ?, message = ? WHERE id = ?`,
return `(id, didRaise=false)`.
- Not found: `INSERT INTO alerts (id, host_id, kind, severity, message,
created_at, last_seen_at) VALUES (?, ?, ?, ?, ?, ?, ?)`, return
`(id, didRaise=true)`.
The engine fires a notification through the Hub only when `didRaise=true`.
Touch-only events keep the row's `last_seen_at` fresh so the UI can render
"still happening · Ns ago" without spamming the operator's phone.
### Notification payload shapes
**Webhook** — a single JSON envelope per event:
```json
{
"event": "alert.raised",
"alert_id": "01KQT...",
"severity": "warning",
"kind": "backup_failed",
"host_id": "01KQ...",
"host_name": "alfa-01",
"message": "Backup 'system-config' failed: rest-server returned 401",
"raised_at": "2026-05-04T15:42:01Z",
"link": "https://restic-manager.example/alerts/01KQT..."
}
```
`event` is one of `alert.raised | alert.acknowledged | alert.resolved |
alert.test`. The same envelope shape is reused across events — operators
build one bridge, switch on `event` and `severity`.
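On the receiving side, a bridge might decode the envelope and branch on `event` and `severity` roughly as follows. The `Envelope` struct mirrors the JSON above; `decode`, `route` and the returned action labels (`page`, `ticket`, `update`, `ignore`) are hypothetical and only show the shape of the switch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Envelope mirrors the webhook JSON shown above.
type Envelope struct {
	Event    string    `json:"event"`
	AlertID  string    `json:"alert_id"`
	Severity string    `json:"severity"`
	Kind     string    `json:"kind"`
	HostID   string    `json:"host_id"`
	HostName string    `json:"host_name"`
	Message  string    `json:"message"`
	RaisedAt time.Time `json:"raised_at"`
	Link     string    `json:"link"`
}

func decode(body []byte) (Envelope, error) {
	var e Envelope
	err := json.Unmarshal(body, &e)
	return e, err
}

// route picks a hypothetical downstream action for one envelope.
func route(e Envelope) string {
	switch e.Event {
	case "alert.raised":
		if e.Severity == "critical" {
			return "page"
		}
		return "ticket"
	case "alert.acknowledged", "alert.resolved":
		return "update"
	default: // alert.test and anything unknown
		return "ignore"
	}
}

func main() {
	e := Envelope{
		Event:    "alert.raised",
		AlertID:  "example-alert-id",
		Severity: "warning",
		Kind:     "backup_failed",
		HostName: "alfa-01",
		RaisedAt: time.Date(2026, 5, 4, 15, 42, 1, 0, time.UTC),
	}
	body, _ := json.Marshal(e)
	got, err := decode(body)
	fmt.Println(route(got), err)
}
```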
**SMTP** — single-recipient plain-text email per channel. The channel
config carries the SMTP server credentials and a `to` address; one
channel = one recipient (or one distribution-list address). Operators
who want multiple recipients add multiple channels — keeps the config
flat and the failure modes per-recipient.
Subject pattern is hardcoded (no per-channel template in v1):
```
Subject: [restic-manager] [<severity>] <host_name>: <kind>
From: <configured-from-address>
To: <configured-to-address>
Date: <RFC 5322>
Message-ID: <alert_id@<server-host>>
<message line — same string the webhook/ntfy gets>
Raised at: 2026-05-04T15:42:01Z
Severity: warning
Host: alfa-01
Kind: backup_failed
Open in restic-manager:
https://restic-manager.example/alerts/01KQT...
(This message was sent by restic-manager. Acknowledge or resolve in the UI.)
```
The body is plain text only in v1 — no HTML alternative — both because
the data is already structured well enough as text and because HTML
email opens a long tail of rendering / sanitisation concerns. The
`Message-ID` includes the alert id so a thread-aware client can group
related events (raised → acknowledged → resolved) together.
Encryption:
- **STARTTLS** (default, port 587). Opportunistic upgrade; what most
operator-facing relays expect.
- **Implicit TLS** (port 465). Connect-then-TLS-handshake.
- **None** (port 25). Plain. Hidden behind a "Yes I understand" warning
on the form because the password goes over the wire.
Auth:
- **PLAIN** (RFC 4616) over TLS. Default and almost always what's wanted.
- **CRAM-MD5** (RFC 2195). Offered if the server advertises it, no UI
toggle — automatic.
- No OAuth2 / XOAUTH2 in v1; that's a real next step if Gmail-without-
app-passwords becomes a recurring ask.
Per-message timeout is 10s (vs 5s for HTTP channels) — STARTTLS
handshake + DATA over a slow link can legitimately take that long.
**Ntfy** — uses the standard publish format:
```
POST /<topic> HTTP/1.1
Host: <server>
Authorization: Bearer <access-token> (if configured)
Title: [warning] alfa-01 backup failed
Priority: 4
Tags: warning,backup_failed
Click: https://restic-manager.example/alerts/01KQT...
Backup 'system-config' failed: rest-server returned 401
```
Severity → priority mapping:
| Severity | Priority |
| --------- | -------- |
| info | 3 (default) |
| warning | 4 (high) |
| critical | 5 (urgent) |
Per-channel `default_priority` setting overrides for non-critical alerts;
critical always goes urgent regardless.
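The mapping and the override rule fit in one small function. In this sketch `channelDefault` is an integer where 0 means "not set"; the schema stores `default_priority` as nullable TEXT, so that representation is an assumption made for the example.

```go
package main

import "fmt"

// ntfyPriority maps an alert severity to an ntfy priority (1-5).
// channelDefault is the per-channel default_priority; 0 means unset.
func ntfyPriority(severity string, channelDefault int) int {
	if severity == "critical" {
		return 5 // urgent, regardless of the channel setting
	}
	if channelDefault >= 1 && channelDefault <= 5 {
		return channelDefault // operator override for non-critical alerts
	}
	switch severity {
	case "warning":
		return 4 // high
	default:
		return 3 // info, and anything unrecognised
	}
}

func main() {
	fmt.Println(ntfyPriority("warning", 0))  // 4
	fmt.Println(ntfyPriority("warning", 2))  // 2
	fmt.Println(ntfyPriority("critical", 2)) // 5
}
```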
### Test notification
`POST /api/notifications/{channel_id}/test` builds a synthetic event
(severity=info, kind=test_notification, message="Test from
restic-manager", link to the channel's edit page) and runs it through the
real send path. Returns `{ok: bool, latency_ms: int, status_code?: int,
error?: string}`. UI renders the green ✓ / red ✗ feedback inline.
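The response shape could be modelled as below, with the optional fields dropped from the JSON when they do not apply. `TestResult`, `runTest` and the `send` callback signature are hypothetical; the callback stands in for the real send path.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// TestResult mirrors {ok, latency_ms, status_code?, error?}.
type TestResult struct {
	OK         bool   `json:"ok"`
	LatencyMS  int64  `json:"latency_ms"`
	StatusCode *int   `json:"status_code,omitempty"`
	Error      string `json:"error,omitempty"`
}

// errRefused is a sample failure used by the demo below.
var errRefused = errors.New("connection refused")

// runTest times one send attempt and folds the outcome into a TestResult.
// A status code of 0 means the channel has none to report (for example SMTP).
func runTest(send func() (statusCode int, err error)) TestResult {
	start := time.Now()
	code, err := send()
	res := TestResult{OK: err == nil, LatencyMS: time.Since(start).Milliseconds()}
	if code != 0 {
		res.StatusCode = &code
	}
	if err != nil {
		res.Error = err.Error()
	}
	return res
}

func main() {
	ok := runTest(func() (int, error) { return 204, nil })
	bad := runTest(func() (int, error) { return 0, errRefused })
	for _, r := range []TestResult{ok, bad} {
		out, _ := json.Marshal(r)
		fmt.Println(string(out))
	}
}
```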
## Routes added
| Method | Path | Purpose |
| ------- | ----------------------------------------------------- | ------------------------------------------------------------- |
| GET | `/alerts` | Fleet alerts list with filters (`?status=open&severity=warning&host_id=...&q=...`) |
| POST | `/alerts/{id}/acknowledge` | Mark alert acknowledged (HTMX form) |
| POST | `/alerts/{id}/resolve` | Manual resolve (HTMX form) |
| GET | `/settings/notifications` | Channel list page |
| GET | `/settings/notifications/new` | Channel kind picker + empty form |
| POST | `/settings/notifications/new` | Validate + create + redirect |
| GET | `/settings/notifications/{id}/edit` | Channel edit form |
| POST | `/settings/notifications/{id}/edit` | Validate + update |
| POST | `/settings/notifications/{id}/delete` | Delete channel (typed-confirm name in the form) |
| POST | `/api/notifications/{id}/test` | Fire test notification, return JSON result |
| GET | `/api/alerts` | JSON list (mirrors the UI filters) for future REST callers |
## Data model
### Migration 0013 — alerts.last_seen_at
```sql
ALTER TABLE alerts ADD COLUMN last_seen_at TEXT;
UPDATE alerts SET last_seen_at = created_at WHERE last_seen_at IS NULL;
```
Existing alerts (currently zero in production, since nothing writes them yet)
get `last_seen_at = created_at`. The column stays nullable so that rows
written before the engine starts bumping it remain valid.
### Migration 0014 — notification_channels + notification_log
```sql
CREATE TABLE notification_channels (
id TEXT PRIMARY KEY,
kind TEXT NOT NULL CHECK (kind IN ('webhook', 'ntfy', 'smtp')),
name TEXT NOT NULL,
enabled INTEGER NOT NULL DEFAULT 1 CHECK (enabled IN (0, 1)),
config BLOB NOT NULL, -- AEAD-encrypted JSON; per-kind shape
default_priority TEXT, -- ntfy only; null for webhook + smtp
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
last_fired_at TEXT
);
CREATE INDEX notification_channels_enabled ON notification_channels(enabled) WHERE enabled = 1;
CREATE TABLE notification_log (
id TEXT PRIMARY KEY,
channel_id TEXT NOT NULL REFERENCES notification_channels(id) ON DELETE CASCADE,
alert_id TEXT REFERENCES alerts(id) ON DELETE SET NULL,
event TEXT NOT NULL, -- alert.raised | alert.acknowledged | alert.resolved | alert.test
ok INTEGER NOT NULL CHECK (ok IN (0, 1)),
status_code INTEGER,
latency_ms INTEGER,
error TEXT,
fired_at TEXT NOT NULL
);
CREATE INDEX notification_log_channel ON notification_log(channel_id, fired_at DESC);
CREATE INDEX notification_log_alert ON notification_log(alert_id);
```
`config` is an AEAD-encrypted JSON blob: bearer tokens for webhooks,
access tokens for ntfy, and SMTP passwords live there. Per-kind config shapes:
```go
type webhookConfig struct {
	URL         string `json:"url"`
	BearerToken string `json:"bearer_token,omitempty"`
	HeaderName  string `json:"header_name,omitempty"`
	HeaderValue string `json:"header_value,omitempty"`
}

type ntfyConfig struct {
	ServerURL   string `json:"server_url"` // default https://ntfy.sh
	Topic       string `json:"topic"`
	AccessToken string `json:"access_token,omitempty"`
}

type smtpConfig struct {
	Host       string `json:"host"`       // e.g. smtp.example.com
	Port       int    `json:"port"`       // default 587 (STARTTLS), 465 (TLS), 25 (none)
	Encryption string `json:"encryption"` // "starttls" | "tls" | "none"
	Username   string `json:"username"`
	Password   string `json:"password"` // sensitive — AEAD-encrypted with the rest of config
	From       string `json:"from"`     // RFC 5322 address; "alerts@example.com" or "Restic-Manager <alerts@…>"
	To         string `json:"to"`       // single recipient or distribution-list address; v1 = one channel = one to-line
}
```
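A sketch of sealing one of these config structs before it goes into the `config` BLOB. The spec only says "AEAD"; AES-GCM from the standard library is assumed here, as are the helper names `newAEAD`, `sealConfig` and `openConfig` and the nonce-prefixed blob layout.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
)

type ntfyConfig struct {
	ServerURL   string `json:"server_url"`
	Topic       string `json:"topic"`
	AccessToken string `json:"access_token,omitempty"`
}

// newAEAD builds AES-GCM from a 16-, 24- or 32-byte key (an assumption;
// the spec does not name the cipher).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealConfig marshals cfg to JSON and returns nonce||ciphertext.
func sealConfig(aead cipher.AEAD, cfg any) ([]byte, error) {
	plain, err := json.Marshal(cfg)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plain, nil), nil
}

// openConfig reverses sealConfig into out.
func openConfig(aead cipher.AEAD, blob []byte, out any) error {
	n := aead.NonceSize()
	if len(blob) < n {
		return errors.New("config blob shorter than nonce")
	}
	plain, err := aead.Open(nil, blob[:n], blob[n:], nil)
	if err != nil {
		return err
	}
	return json.Unmarshal(plain, out)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; a real key comes from server config
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	blob, err := sealConfig(aead, ntfyConfig{ServerURL: "https://ntfy.sh", Topic: "backups", AccessToken: "example-token"})
	if err != nil {
		panic(err)
	}
	var got ntfyConfig
	if err := openConfig(aead, blob, &got); err != nil {
		panic(err)
	}
	fmt.Println(got.Topic, len(blob))
}
```

Because the whole JSON document is sealed, adding a field to a config struct needs no schema change, and any tampering with the stored blob makes `openConfig` fail.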
### Engine state
The engine itself is stateless beyond the channels it owns; all
persisted state is in the existing `alerts` table + the new
`notification_log` table. A process restart re-evaluates from scratch:
on next tick the stale-schedule + auto-resolution sweeps catch up with
whatever happened during the downtime. No outbox to drain.
## UI templates
| Template | Purpose |
| ----------------------------------------- | ------------------------------------------------------ |
| `web/templates/pages/alerts.html` | Fleet alerts page |
| `web/templates/partials/alert_row.html` | One alert row (used by both list and detail-fragment swap) |
| `web/templates/pages/settings.html` | Settings shell with Notifications / Users / Auth sub-tabs |
| `web/templates/pages/notifications.html` | Channel list (Notifications sub-tab body) |
| `web/templates/pages/notification_edit.html` | Channel kind picker + per-kind form + test button + payload preview |
| `web/templates/partials/crit_banner.html` | Dashboard top-of-page banner |
| `web/templates/partials/nav.html` | Existing — gain a `data-alerts-count` attribute on the Alerts tab so the badge auto-updates |
The Settings shell + Notifications sub-tab is the new chrome the wireframe
introduced; Users + Authentication tabs are placeholder links that 404 in
v1 (or render a "Lands later" notice). Same pattern P2R-02 used for
inert sub-tabs.
## Tests (target coverage)
- `internal/alert/engine_test.go` — rule firing per kind: backup_failed
raises on `MarkJobFinished(kind=backup, status=failed)`; touch-only on
the second failure for the same host (no second notification);
auto-resolve on next success.
- `internal/alert/agent_offline_test.go` — `OnHostOffline` emits without
raising until the 15-min floor; `OnHostOnline` clears the alert.
- `internal/alert/stale_schedule_test.go` — synthetic schedule whose next
fire is in the past triggers; resets when a job lands.
- `internal/notification/webhook_test.go` — payload shape pinned;
authorisation header sent when bearer set; custom header echoed; 5s
timeout enforced; error in `notification_log`.
- `internal/notification/ntfy_test.go` — title/priority/tags/click headers
match the severity mapping; access token sent as `Authorization: Bearer
<token>`; default priority overridden by severity for critical.
- `internal/notification/smtp_test.go` — round-trip against a local
in-process fake SMTP server (`net/smtp` ships a client only), or MailHog if convenient:
STARTTLS handshake completes against a self-signed cert; PLAIN auth
uses configured creds; subject + from + to + body bytes match the
spec'd format; Message-ID contains the alert id; 10s timeout enforced;
failure path (auth refused) lands in `notification_log` with the
server's error string.
- `internal/server/http/ui_alerts_test.go` — page renders with filters
applied; ack/resolve POSTs flip the row + write audit; HX-Redirect
bounces back to the filtered list.
- `internal/server/http/ui_notifications_test.go` — CRUD happy paths,
validation re-render, secrets-encrypted-at-rest assertion (load row,
decrypt, compare), test-button hits the real send path against a
test http.Server.
- Migration 0013 + 0014 round-trip tested via `store.Open` on a fresh
db.
## Playwright sweep
End-of-phase sweep mirrors the P2R-02 / P3-restore pattern:
1. Login → `/alerts` (initially empty) → see "All clear · last alert
never" empty state.
2. Trigger a fake-failed-backup via `POST /api/hosts/{id}/jobs` against a
host with a deliberately-wrong rest-server URL. Wait for the
`backup_failed` alert to appear in the list within ~2s of the job
finishing.
3. Acknowledge → row tints + ack actor visible.
4. Take the agent offline (`systemctl stop`); wait 15 min OR mock
`last_seen_at` to 16 min ago via the test harness; confirm
`agent_offline` alert raises once.
5. Restart the agent → `agent_offline` auto-resolves; `backup_failed` is
still open.
6. Configure a webhook channel pointing at a local test sink; click "Send
test" → green ✓.
7. Configure a ntfy channel pointing at a local sink → click "Send test"
→ green ✓.
8. Configure an SMTP channel pointing at a local MailHog (Docker, port
1025, no TLS for the local-only sweep) → click "Send test" → green ✓
→ MailHog UI at :8025 shows the test email with the right subject
and Message-ID.
9. Trigger a fresh failed backup → all three channels receive the
notification (verified from sink logs + MailHog inbox);
`notification_log` has three rows `event=alert.raised, ok=true`.
10. Manually Resolve the open `backup_failed`; confirm all three channels
receive `event=alert.resolved`.
11. Critical-severity test: trigger `check_failed` (mocked) → dashboard
banner appears; clicking it lands on `/alerts?severity=critical&status=open`.
12. Empty the alerts again → banner disappears.
Screenshots into `_diag/p3-alerts-sweep/`. End-to-end clean, zero console
errors, before handing back.
## What does NOT change
- Existing chrome/templates beyond the small additions noted above.
- Existing `alerts.severity` CHECK (`info`/`warning`/`critical`) — already
the right shape; no migration needed for that.
- Audit log writer pattern — engine writes audit rows for ack/resolve
the same way every other state-changing handler does.
- The agent. Alerts are entirely a server concern; the agent doesn't
know they exist.
## Open questions / explicit non-goals
- **Per-rule cooldowns / re-raise on long-running issues.** Out of scope
(brainstorm question 8 ruled this out). Operators see "still happening"
in the UI; they don't get a reminder ping.
- **SMTP HTML emails.** v1 is plain text only — operators wanting rich
rendering can deploy a webhook → mail-merge bridge, or wait for a v2
template engine. The Message-ID threading + plain text body should be
enough for almost every overnight-digest workflow.
- **SMTP OAuth2 / XOAUTH2.** Out of scope. Gmail / Microsoft 365 with
modern OAuth requires an `app password` workaround in v1. Native
XOAUTH2 lands when an operator asks (or when Google starts refusing
app passwords for non-business accounts in earnest).
- **Multi-recipient SMTP channels.** A channel = one `To`. Operators
wanting multiple recipients add multiple channels. Keeps failure
attribution per-recipient.
- **Apprise sidecar integration.** Deferred per brainstorm. The
`Channel` interface accepts a further impl without reshaping when we get
there.
- **Per-host or per-severity channel routing.** Out of scope. Likely
next step if operators ask: a `min_severity` field on the channel row.
- **Snooze / mute.** Out of scope. Acknowledge is the closest analogue;
full silence-windows would need a new table and is YAGNI for v1.
- **PagerDuty / OpsGenie.** Both have webhook receivers; operators wire
them via the webhook channel today.
- **Alert "rules" UI.** No CRUD; the rule set is hardcoded.
@@ -0,0 +1,342 @@
# P3 — Restore (design)
> Phase 3 sub-spec covering single-host restore (P3-01, P3-02, P3-03, P3-09).
> P3-04 (cross-host restore) is deferred to a new "Future / unscheduled"
> section in `tasks.md` — disaster recovery is already covered by re-enrolling
> a replacement host with the same repo credentials.
>
> Wireframe: `_diag/p3-restore-wizard/wireframe.html`. Screenshot:
> `_diag/p3-restore-wizard/01-full-wizard.png`.
## Scope locked
Brainstorm decisions (in order asked):
1. **In-place vs new-directory.** Default is a new directory under
`/var/restic-restore/<job-id>/`. A "Restore in place (overwrite original
paths)" toggle is gated by typed-confirmation of the host name, mirroring
the repo re-init pattern.
2. **Path-selection granularity.** Tree browser as the path selector, lazy-
loaded via `restic ls --json <snapshot> <path>` per directory expansion.
3. **Cross-host restore (P3-04).** Out of scope this phase. Move to
"Future / unscheduled" in `tasks.md`. The disaster-recovery case is covered
by the standard enrolment flow: stand up a replacement host, paste the
original repo creds at enrolment, snapshots reappear, restore is
same-host.
4. **Snapshot diff (P3-09).** Diff-as-a-job. New `JobDiff` JobKind dispatched
like every other agent operation. Output streams as `log.stream` and
renders on the live job log page.
5. **Wizard entry points.** Top-level "Restore" button on host detail
(`/hosts/{id}/restore`, opens wizard at step 1) plus a per-snapshot
Restore action on snapshot rows (`/hosts/{id}/snapshots/{sid}/restore`,
skips step 1).
6. **Wizard interaction model.** Single-page, sections progressively enable;
tree-browser nodes lazy-load via HTMX partials. No `restore_drafts` table.
7. **Tree-browser data path.** Synchronous WS RPC (`tree.list` ↔
`tree.list.result`, correlation-ID) plus a per-wizard-session in-memory
cache keyed by `{snapshot_id, path}` with ~30-min TTL.
8. **Restore progress UI.** Restore-specific job-page variant: files-restored
/ bytes-restored / throughput / ETA / current-file display, driven by
restic restore's JSON status events surfaced through `job.progress`.
9. **Permissions/ownership.** Policy, not toggle. In-place restore preserves
original ownership; new-directory restore drops ownership
(`--no-ownership`).
10. **Concurrency.** Single-flight per host (one job at a time across all
kinds). Plus a real cancel-job feature: `command.cancel` envelope, agent
kills the `restic` subprocess via context cancel (SIGTERM, SIGKILL after
grace), server transitions the job to `cancelled`. The "Cancel" button
already in the `job_detail` template becomes real for any running job
kind.
11. **Audit + safety.** Audit row on every restore dispatch (`host.restore`
with snapshot ID, paths, target, in-place flag). Recent-restores panel
on the host page surfacing the latest restore job alongside last-backup
and last-init signals. Role gate deferred to P4-03.
## Architecture
Restore composes from existing primitives plus three new pieces:
- **New JobKind values**: `JobRestore`, `JobDiff`. Dispatcher cases mirror
the prune/check pattern. Agent-side handlers wrap `restic.RunRestore` and
`restic.RunDiff` (new methods on the `restic` package).
- **New WS RPC**: `tree.list` request (`{snapshot_id, path}`) ↔
`tree.list.result` reply (`{entries: [{name, type, size}], ...}` or
`{error}`). Reuses existing correlation-ID infrastructure from P1-09. No
`jobs` row.
- **New cancel surface**: `command.cancel` request (`{job_id}`), agent
cancels the running subprocess context, returns `command.ack` + `job.finished`
with status `cancelled`. Server endpoint `POST /api/jobs/{id}/cancel`
bridges UI button → WS envelope.
Everything else (job lifecycle, log streaming, progress envelope, snapshot
listing, audit log writer, host_chrome partial, danger-zone typed-confirmation)
already exists and is reused verbatim.
### Component boundaries
| Component | Purpose | Depends on |
| ---------------------------------- | ---------------------------------------------------- | ----------------------------------------- |
| `internal/restic.RunRestore` | Run `restic restore` with paths + target + ownership | `restic.Env` |
| `internal/restic.RunDiff` | Run `restic diff --json a b` | `restic.Env` |
| `internal/agent/runner` cases | Dispatch `JobRestore` / `JobDiff` jobs | `restic.Run*`, hooks (skipped: backup-only) |
| `internal/agent/runner` cancel hook | Wire WS `command.cancel` → ctx.CancelFunc per job | runner job map |
| `internal/agent/runner` tree-list | Sync RPC handler: `restic ls --json` for one path | `restic.Env` |
| `internal/server/ws/cancel.go` | Validate + send `command.cancel` envelope | hub.Send, store.UpdateJobStatus |
| `internal/server/ws/tree.go` | RPC mediator: `tree.list` request → reply, with cache | hub.SendRPC, in-memory cache |
| `internal/server/http/restore.go` | Wizard routes + dispatch endpoint | store, ws, audit |
| `internal/server/http/diff.go` | Snapshot-diff dispatch endpoint | store, ws |
| `internal/server/http/cancel.go` | `POST /api/jobs/{id}/cancel` | ws |
| `web/templates/pages/host_restore.html` | Wizard page | host_chrome partial |
| `web/templates/partials/tree_node.html` | Lazy-loaded tree node fragment for HTMX swap | — |
| `web/templates/pages/job_detail.html` | Restore-kind progress widget (variant) | existing job_detail |
### Data flow — wizard happy path
```
operator
├─ GET /hosts/{id}/restore
│ server renders wizard shell, snapshot table from store.ListSnapshotsByHost
├─ click snapshot row (or arrives via /hosts/{id}/snapshots/{sid}/restore)
│ wizard advances to step 2, snapshot summary card rendered
├─ expand a tree node (chevron click)
│ HTMX GET /hosts/{id}/restore/tree?snapshot={sid}&path=/etc
│ server checks per-session cache (keyed by sid+path)
│ hit → render tree_node fragment from cache
│ miss → hub.SendRPC(host_id, "tree.list", {sid, path}) → wait reply
│ cache result, render tree_node fragment
├─ tick file/dir checkboxes (form state, no round-trip)
├─ pick target radio (and optionally type host name to unlock in-place)
└─ POST /hosts/{id}/restore (form submit)
server validates: ≥1 path, target mode, in-place ⇒ host name match
write audit row host.restore
store.CreateJob{kind=restore, payload={snapshot_id, paths, target, in_place}}
hub.Send(host_id, "command.run", {job_id, kind=restore, payload})
HX-Redirect: /jobs/{job_id}
```
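The server-side validation step in the flow above can be sketched as a pure function. `RestoreForm`, `validateRestore` and the error values are hypothetical names; the absolute-path check is an extra assumption drawn from the payload comment that paths are absolute inside the snapshot.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// RestoreForm holds the submitted wizard fields relevant to validation.
type RestoreForm struct {
	SnapshotID      string
	Paths           []string
	InPlace         bool
	ConfirmHostName string // typed confirmation; only checked when InPlace
}

var (
	ErrNoPaths      = errors.New("select at least one path")
	ErrRelativePath = errors.New("paths must be absolute")
	ErrConfirm      = errors.New("type the host name to restore in place")
)

// validateRestore enforces: at least one path, and in-place restores
// only when the typed host name matches exactly.
func validateRestore(f RestoreForm, hostName string) error {
	if len(f.Paths) == 0 {
		return ErrNoPaths
	}
	for _, p := range f.Paths {
		if !strings.HasPrefix(p, "/") {
			return ErrRelativePath
		}
	}
	if f.InPlace && f.ConfirmHostName != hostName {
		return ErrConfirm
	}
	return nil
}

func main() {
	f := RestoreForm{SnapshotID: "example-snapshot", Paths: []string{"/etc/hosts"}, InPlace: true}
	fmt.Println(validateRestore(f, "alfa-01")) // confirmation missing
	f.ConfirmHostName = "alfa-01"
	fmt.Println(validateRestore(f, "alfa-01"))
}
```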
### Data flow — agent restore execution
```
agent.runner receives command.run kind=restore
├─ check single-flight: if r.activeJobID != "" → reply busy
│ (server queues to pending_runs only for kind=backup; restore returns busy)
├─ allocate ctx, ctxCancel — store cancelFunc against job_id in r.cancels
├─ sendStarted(job_id, JobRestore, now)
├─ build target path: if in_place → "/" else "/var/restic-restore/<job_id>/"
├─ build flags: paths from payload, --no-ownership when !in_place
├─ restic.RunRestore(ctx, env, snapshot_id, paths, target, in_place):
│ restic restore <sid> --target <path> [--no-ownership] -- <p1> <p2> ...
│ parse stdout JSON: forward "status" → job.progress (1Hz throttle), "summary" → final
├─ on success: sendFinished(job_id, succeeded, exit=0)
├─ on ctx.Err() == context.Canceled: sendFinished(job_id, cancelled, exit=130)
└─ delete cancel func from r.cancels
```
### Data flow — cancel
```
operator clicks Cancel on /jobs/{id} (running)
POST /api/jobs/{id}/cancel
server: lookup job, ensure status=running, find host
hub.Send(host_id, "command.cancel", {job_id})
→ agent.runner receives command.cancel
cancelFunc, ok := r.cancels[job_id]
ok && cancelFunc()
→ restic subprocess context done → exec.Cmd kills via SIGTERM
→ if still alive after 5s grace → SIGKILL
→ runner sendFinished(job_id, cancelled, exit=130)
→ server receives job.finished status=cancelled, persists, broadcasts
→ browser refresh shows cancelled state
```
The cancel surface is independently useful for any kind (prune/check/backup) —
not gated to restore. The button already in `job_detail.html` becomes real.
### Tree-list RPC details
New WS message types (added to `internal/api/messages.go`):
```go
type TreeListRequestPayload struct {
	SnapshotID string `json:"snapshot_id"`
	Path       string `json:"path"`
}

type TreeListEntry struct {
	Name string `json:"name"`
	Type string `json:"type"` // "dir" | "file" | "symlink"
	Size int64  `json:"size,omitempty"`
}

type TreeListResultPayload struct {
	SnapshotID string          `json:"snapshot_id"`
	Path       string          `json:"path"`
	Entries    []TreeListEntry `json:"entries,omitempty"`
	Error      string          `json:"error,omitempty"`
}
```
Server-side mediator (`ws.SendRPC`) takes a request envelope, registers the
correlation ID in a pending map, sends, blocks on a per-call channel until
the matching reply arrives (or 30s timeout). The pattern is small enough
to inline in `internal/server/ws/rpc.go` as a generic helper — future
synchronous RPCs reuse it.
In-memory cache: `map[sessionID]map[cacheKey]TreeListResultPayload` with
`cacheKey = snapshot_id + "\x00" + path`. Session ID minted per wizard
load (HTTP-only cookie scoped to `/hosts/{id}/restore/tree`, lifetime 30
min). On wizard close (browser navigation away) the entry expires
naturally. No persistence, no migration.
Agent handler runs `restic ls --json <sid> <path>` and returns a
non-recursive listing: it parses the output line by line and emits only
direct children of `path`, whether or not restic itself descends into
subdirectories for that path filter. 60s context timeout, mirroring the
existing `restic snapshots` invocation.
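The "direct children only" filter is independent of how the listing is produced. A sketch, assuming the paths have already been pulled out of the JSON lines; `isDirectChild` and `directChildren` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"path"
)

// isDirectChild reports whether p sits exactly one level below parent.
func isDirectChild(parent, p string) bool {
	parent = path.Clean(parent)
	p = path.Clean(p)
	return p != parent && path.Dir(p) == parent
}

// directChildren keeps only the entries one level below parent,
// preserving input order.
func directChildren(parent string, paths []string) []string {
	var out []string
	for _, p := range paths {
		if isDirectChild(parent, p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	listing := []string{"/etc", "/etc/hosts", "/etc/ssh", "/etc/ssh/sshd_config"}
	fmt.Println(directChildren("/etc", listing))
}
```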
### Restore payload
`api.CommandRunPayload` gains a nested optional `restore` field:
```go
type RestorePayload struct {
	SnapshotID    string   `json:"snapshot_id"`
	Paths         []string `json:"paths"` // absolute paths inside the snapshot
	InPlace       bool     `json:"in_place"`
	TargetDir     string   `json:"target_dir"`     // empty when in_place=true
	PreserveOwner bool     `json:"preserve_owner"` // mirrors policy: in_place=>true, else=>false
}
```
The payload is set by the server when dispatching `JobRestore` and ignored
on every other kind. Wire-shape test pinned in `wire_test.go`.
### Diff payload
`api.CommandRunPayload` gains:
```go
type DiffPayload struct {
	SnapshotA string `json:"snapshot_a"`
	SnapshotB string `json:"snapshot_b"`
}
```
Set on `JobDiff`. Output is plain `restic diff --json <a> <b>` forwarded as
`log.stream` lines. Job page renders unchanged — operator reads the diff
output directly.
### Recent-restores panel
A small panel rendered on the host detail page below the existing init-status
line:
```
last restore: succeeded 2h ago · job f73ab4c1… · 3 files to /var/restic-restore/...
```
Backed by a `store.LatestJobByKind(host_id, JobRestore)` call, which reuses
the `store.LatestJobByKind` query already used for init/forget/prune/check
in P2R-06. One template addition in `host_chrome.html` next to the
`InitStatus` block.
## Routes added
| Method | Path | Purpose |
| ------- | --------------------------------------------------------- | ----------------------------------------------------------- |
| GET | `/hosts/{id}/restore` | Wizard shell (step 1 = snapshot picker) |
| GET | `/hosts/{id}/snapshots/{sid}/restore` | Wizard shell with snapshot pre-selected (skips step 1) |
| GET | `/hosts/{id}/restore/tree` | HTMX partial: tree node listing for `?snapshot=&path=` |
| POST | `/hosts/{id}/restore` | Validate + dispatch restore job, redirect to live job page |
| POST | `/api/hosts/{id}/snapshots/diff` | Dispatch a diff job for `{snapshot_a, snapshot_b}` |
| POST | `/api/jobs/{id}/cancel` | Send `command.cancel` to host, transition job → cancelled |
## Migrations
None. Restore + diff piggyback on the existing `jobs` table (their `kind` is
new but the schema already accepts arbitrary kind strings — there's no
CHECK constraint on `kind`). The cancel feature uses the existing
`JobCancelled` terminal status. The tree-list cache lives in process memory.
## Tests (target coverage)
- `internal/restic/restore_test.go` — `RunRestore` invocation builds the
expected argv (paths, --target, --no-ownership flag presence, in-place
variant); JSON status parsing → `BackupStatus`-shaped progress envelopes.
- `internal/restic/diff_test.go` — `RunDiff` argv shape and JSON forwarding.
- `internal/agent/runner/restore_test.go` — happy path, cancel mid-run
produces `cancelled` finished, in-place vs new-directory dispatch,
single-flight rejects when another job is running.
- `internal/agent/runner/tree_test.go` — `tree.list` handler returns
direct children for a synthetic restic ls output, surfaces error on
missing snapshot.
- `internal/server/ws/rpc_test.go` — `SendRPC` correlation matching,
timeout, concurrent calls.
- `internal/server/http/restore_test.go` — wizard renders with snapshots,
POST validates ≥1 path + in-place host-name match, audit row written,
job dispatched with correct payload, in-place without typed-confirm
re-renders form with input intact and an error.
- `internal/server/http/diff_test.go` — POST dispatches `JobDiff`,
snapshot IDs validated against the host's snapshot list.
- `internal/server/http/cancel_test.go` — POST cancel happy path
(running → cancelled), 4xx for non-running jobs, 4xx when host offline.
- `internal/server/http/restore_e2e_test.go` — happy path: GET wizard,
expand `/etc` (HTMX call returns expected fragment), submit, follow
HX-Redirect to job page, see status.
- `web/templates/pages/host_restore_test.go` (template-render test) —
wizard renders all four sections; in-place card disabled until typed
confirm.
## Playwright iteration / sweep
A Playwright sweep at the end (mirroring P2R-02 Slice 6) runs against the
local smoke server with a real agent enrolled. Steps:
1. Login → navigate to alfa-01 host → click Restore.
2. Wizard step 1: pick the most recent snapshot.
3. Wizard step 2: expand a directory two levels, tick three files,
verify tally updates.
4. Wizard step 3: leave default new-directory.
5. Wizard step 4: dispatch.
6. Land on live job page, see progress widget animating, see log lines.
7. Click Cancel mid-flight, verify status transitions to cancelled and
the agent's subprocess actually died (log line `signal: killed` or exit
130).
8. Repeat with in-place mode: type host name, dispatch, verify red
primary button, verify files actually overwritten on host.
9. Snapshot diff: navigate to snapshots, pick two, dispatch diff, see
diff output streamed.
10. Screenshots into `_diag/p3-restore-sweep/`.
End-to-end clean, zero console errors, before handing back.
## What does NOT change
- `host_chrome.html` only grows the recent-restores line; sub-tab list
unchanged (Restore is a top-level button on the host page, not a sub-tab).
- `enrollment.go`, schedule reconciliation, source-group CRUD, repo
maintenance ticker, hook execution — none of these are touched.
- The CLAUDE.md restage block applies as-is when the agent binary changes
(it does — runner gains restore/diff/cancel/tree handlers). The unit
file does not change.
## Open questions / explicit non-goals
- **Restore preview / dry-run.** Restic doesn't have a dry-run for restore.
Out of scope.
- **Resumable restore.** Restic restore is idempotent per-file but not
resumable mid-stream from where it left off. If a restore is cancelled,
the operator re-runs (files already written are overwritten). No state
to track.
- **Restore to a glob/pattern (e.g. `*.conf`).** Out of scope; the tree
picker requires explicit ticks. Power users can edit the URL or use the
CLI.
- **Bandwidth caps for restore.** Honoured automatically — restic's
`--limit-download` is part of `restic.Env` already (P2R-13) and applies
to restore unchanged.
- **Pre/post hooks for restore.** Hooks today gate only `kind=backup`
(P2R-11). Out of scope.
+3 -7
@@ -3,26 +3,22 @@ module gitea.dcglab.co.uk/steve/restic-manager
go 1.25.0
require (
github.com/coder/websocket v1.8.14
github.com/coreos/go-oidc/v3 v3.18.0
github.com/go-chi/chi/v5 v5.2.5
github.com/golang-jwt/jwt/v5 v5.3.1
github.com/oklog/ulid/v2 v2.1.1
github.com/robfig/cron/v3 v3.0.1
golang.org/x/crypto v0.50.0
golang.org/x/oauth2 v0.36.0
golang.org/x/sys v0.43.0
gopkg.in/yaml.v3 v3.0.1
modernc.org/sqlite v1.50.0
)
require (
github.com/coder/websocket v1.8.14 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/go-jose/go-jose/v4 v4.1.4 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/ncruces/go-strftime v1.0.0 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/robfig/cron/v3 v3.0.1 // indirect
golang.org/x/sys v0.43.0 // indirect
modernc.org/libc v1.72.0 // indirect
modernc.org/mathutil v1.7.1 // indirect
modernc.org/memory v1.11.0 // indirect
-8
@@ -1,15 +1,9 @@
github.com/coder/websocket v1.8.14 h1:9L0p0iKiNOibykf283eHkKUHHrpG7f65OE3BhhO7v9g=
github.com/coder/websocket v1.8.14/go.mod h1:NX3SzP+inril6yawo5CQXx8+fk145lPDC6pumgx0mVg=
github.com/coreos/go-oidc/v3 v3.18.0 h1:V9orjXynvu5wiC9SemFTWnG4F45v403aIcjWo0d41+A=
github.com/coreos/go-oidc/v3 v3.18.0/go.mod h1:DYCf24+ncYi+XkIH97GY1+dqoRlbaSI26KVTCI9SrY4=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/go-chi/chi/v5 v5.2.5 h1:Eg4myHZBjyvJmAFjFvWgrqDTXFyOzjj7YIm3L3mu6Ug=
github.com/go-chi/chi/v5 v5.2.5/go.mod h1:X7Gx4mteadT3eDOMTsXzmI4/rwUpOwBHLpAfupzFJP0=
github.com/go-jose/go-jose/v4 v4.1.4 h1:moDMcTHmvE6Groj34emNPLs/qtYXRVcd6S7NHbHz3kA=
github.com/go-jose/go-jose/v4 v4.1.4/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
@@ -31,8 +25,6 @@ golang.org/x/crypto v0.50.0 h1:zO47/JPrL6vsNkINmLoo/PH1gcxpls50DNogFvB5ZGI=
golang.org/x/crypto v0.50.0/go.mod h1:3muZ7vA7PBCE6xgPX7nkzzjiUq87kRItoJQM1Yo8S+Q=
golang.org/x/mod v0.33.0 h1:tHFzIWbBifEmbwtGz65eaWyGiGZatSrT9prnU8DbVL8=
golang.org/x/mod v0.33.0/go.mod h1:swjeQEj+6r7fODbD2cqrnje9PnziFuw4bmLbBZFrQ5w=
golang.org/x/oauth2 v0.36.0 h1:peZ/1z27fi9hUOFCAZaHyrpWG5lwe0RJEEEeH0ThlIs=
golang.org/x/oauth2 v0.36.0/go.mod h1:YDBUJMTkDnJS+A4BP4eZBjCqtokkg1hODuPjwiGPO7Q=
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+7 -13
@@ -32,11 +32,6 @@ type Config struct {
RepoUsername string
RepoPassword string
// SupportsRestoreNoOwnership comes from a startup probe of
// `restic restore --help`; gates the new-dir-restore flag without
// relying on version sniffing.
SupportsRestoreNoOwnership bool
// Bandwidth caps in KB/s applied to every restic invocation.
// <=0 means "no cap". Per-job override: callers that build a
// runner per-dispatch can pass the override value here directly.
@@ -66,14 +61,13 @@ func New(cfg Config, tx Sender, progressMinPeriod time.Duration) *Runner {
// resticEnv builds the shared restic.Env from r.cfg.
func (r *Runner) resticEnv() restic.Env {
return restic.Env{
Bin: r.cfg.ResticBin,
Version: r.cfg.ResticVersion,
RepoURL: r.cfg.RepoURL,
RepoUsername: r.cfg.RepoUsername,
RepoPassword: r.cfg.RepoPassword,
SupportsRestoreNoOwnership: r.cfg.SupportsRestoreNoOwnership,
LimitUploadKBps: r.cfg.LimitUploadKBps,
LimitDownloadKBps: r.cfg.LimitDownloadKBps,
Bin: r.cfg.ResticBin,
Version: r.cfg.ResticVersion,
RepoURL: r.cfg.RepoURL,
RepoUsername: r.cfg.RepoUsername,
RepoPassword: r.cfg.RepoPassword,
LimitUploadKBps: r.cfg.LimitUploadKBps,
LimitDownloadKBps: r.cfg.LimitDownloadKBps,
}
}
-11
@@ -186,17 +186,6 @@ func (e *Engine) handleHostOnline(ctx context.Context, hostID string) {
// task. The KindStaleSchedule constant is exported so UI code can
// reference the tag string today.
func (e *Engine) tick(ctx context.Context, now time.Time) {
// User-management cleanup piggy-backed here for now. Setup tokens
// have a 1h expiry; the alert engine tick is the cheapest existing
// 60s loop. If more housekeeping queries appear, extract a
// dedicated maintenance loop.
if _, err := e.store.CleanupExpiredSetupTokens(ctx, now); err != nil {
slog.Warn("alert: cleanup expired setup tokens", "err", err)
}
if _, err := e.store.CleanupExpiredOIDCState(ctx, now.Add(-5*time.Minute)); err != nil {
slog.Warn("alert: cleanup expired oidc state", "err", err)
}
hosts, err := e.store.ListHosts(ctx)
if err != nil {
slog.Warn("alert: tick list hosts", "err", err)
+2 -19
@@ -9,7 +9,6 @@ import (
"errors"
"fmt"
"strings"
"testing"
"golang.org/x/crypto/argon2"
)
@@ -28,38 +27,22 @@ const (
defaultKeyLen = 32
)
// Cheap params used only when the binary is a `go test` binary
// (testing.Testing() == true). Argon2id at production params costs
// 300-500 ms per hash and dominates wall time on CI runners under
// `-race`. Tests don't need real KDF strength — VerifyPassword reads
// params from the encoded hash, so verifying a cheap-params hash
// works the same way.
const (
testMemoryKiB = 8
testIterations = 1
testParallel = 1
)
// HashPassword returns an argon2id-encoded string of the form
//
// $argon2id$v=19$m=...,t=...,p=...$<salt>$<hash>
//
// safe to store in a TEXT column. The salt is freshly random per call.
func HashPassword(password string) (string, error) {
mem, iter, par := uint32(defaultMemoryKiB), uint32(defaultIterations), uint8(defaultParallel)
if testing.Testing() {
mem, iter, par = testMemoryKiB, testIterations, testParallel
}
salt := make([]byte, defaultSaltLen)
if _, err := rand.Read(salt); err != nil {
return "", fmt.Errorf("auth: read salt: %w", err)
}
hash := argon2.IDKey([]byte(password), salt,
iter, mem, par, defaultKeyLen)
defaultIterations, defaultMemoryKiB, defaultParallel, defaultKeyLen)
return fmt.Sprintf("$argon2id$v=%d$m=%d,t=%d,p=%d$%s$%s",
argon2.Version,
mem, iter, par,
defaultMemoryKiB, defaultIterations, defaultParallel,
base64.RawStdEncoding.EncodeToString(salt),
base64.RawStdEncoding.EncodeToString(hash),
), nil
+3 -23
@@ -58,34 +58,14 @@ func (c *NtfyChannel) Send(ctx context.Context, p Payload) (int, time.Duration,
server := strings.TrimRight(c.cfg.ServerURL, "/")
url := server + "/" + c.cfg.Topic
// Body carries the event verb so the body alone is unambiguous when
// it shows up on a phone lockscreen without the title.
body := p.Message
switch p.Event {
case EventResolved:
body = "Resolved · " + p.Message
case EventAcknowledged:
body = "Acknowledged · " + p.Message
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBufferString(body))
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBufferString(p.Message))
if err != nil {
return 0, 0, fmt.Errorf("ntfy: build request: %w", err)
}
// Title prefix tracks the event so raise vs ack vs resolve are
// visually distinct in the ntfy notification list.
verb := "raised"
switch p.Event {
case EventAcknowledged:
verb = "ack"
case EventResolved:
verb = "resolved"
case EventTest:
verb = "test"
}
req.Header.Set("Content-Type", "text/plain")
req.Header.Set("Title", fmt.Sprintf("[%s · %s] %s %s", verb, p.Severity, p.HostName, p.Kind))
req.Header.Set("Tags", verb+","+p.Severity+","+p.Kind)
req.Header.Set("Title", fmt.Sprintf("[%s] %s %s", p.Severity, p.HostName, p.Kind))
req.Header.Set("Tags", p.Severity+","+p.Kind)
req.Header.Set("Priority", priorityForSeverity(p.Severity, c.defaultPriority))
if p.Link != "" {
req.Header.Set("Click", p.Link)
+2 -2
@@ -60,13 +60,13 @@ func TestNtfySendsHeadersAndBody(t *testing.T) {
t.Fatalf("want 200, got %d", code)
}
if want := "[raised · critical] alfa-01 check_failed"; gotTitle != want {
if want := "[critical] alfa-01 check_failed"; gotTitle != want {
t.Errorf("Title: got %q want %q", gotTitle, want)
}
if gotPri != "5" {
t.Errorf("Priority: got %q want \"5\"", gotPri)
}
if want := "raised,critical,check_failed"; gotTags != want {
if want := "critical,check_failed"; gotTags != want {
t.Errorf("Tags: got %q want %q", gotTags, want)
}
if gotClick != "https://rm.example/a" {
+1 -12
@@ -117,20 +117,9 @@ func extractAddr(s string) string {
// Plain text only; subject hardcoded.
func buildEmailBody(cfg SMTPConfig, msgIDDomain string, p Payload) []byte {
var b strings.Builder
// Subject prefix tracks the event verb so raise vs ack vs resolve
// are visually distinct in the inbox (and threaded by Message-ID).
verb := "raised"
switch p.Event {
case EventAcknowledged:
verb = "ack"
case EventResolved:
verb = "resolved"
case EventTest:
verb = "test"
}
b.WriteString("From: " + cfg.From + "\r\n")
b.WriteString("To: " + cfg.To + "\r\n")
b.WriteString(fmt.Sprintf("Subject: [restic-manager] [%s · %s] %s: %s\r\n", verb, p.Severity, p.HostName, p.Kind))
b.WriteString(fmt.Sprintf("Subject: [restic-manager] [%s] %s: %s\r\n", p.Severity, p.HostName, p.Kind))
b.WriteString("Date: " + p.RaisedAt.UTC().Format(time.RFC1123Z) + "\r\n")
b.WriteString("Message-ID: <" + p.AlertID + "@" + msgIDDomain + ">\r\n")
b.WriteString("MIME-Version: 1.0\r\n")
+1 -1
@@ -133,7 +133,7 @@ func TestSMTPSendsExpectedHeaders(t *testing.T) {
if !strings.Contains(srv.rcptTo, "ops@example.com") {
t.Errorf("RCPT TO: %q", srv.rcptTo)
}
if !strings.Contains(srv.data, "Subject: [restic-manager] [raised · warning] alfa-01: backup_failed") {
if !strings.Contains(srv.data, "Subject: [restic-manager] [warning] alfa-01: backup_failed") {
t.Errorf("subject missing or wrong: %q", srv.data)
}
if !strings.Contains(srv.data, "Message-ID: <01ABC@rm.example>") {
+7 -7
@@ -87,13 +87,13 @@ func (e Env) RunRestore(ctx context.Context, snapshotID string, paths []string,
}
}
args = append(args, "--target", target)
// --no-ownership is nominally a restic 0.17+ flag, but at least
// one downstream 0.18.1 build still rejects it. We rely on a
// runtime probe captured at agent startup (see
// SupportsRestoreNoOwnership) rather than version sniffing.
// In-place restores always preserve ownership — that's the whole
// point of in-place — so we only add the flag for new-dir mode.
if !inPlace && e.SupportsRestoreNoOwnership {
// --no-ownership was added in restic 0.17. Older versions reject
// the flag with "unknown flag: --no-ownership". For new-dir
// restores we want the files owned by the agent user (operator
// can cp them without juggling chown), so pass the flag iff the
// running restic supports it. In-place restores always preserve
// ownership — that's the whole point of in-place.
if !inPlace && e.AtLeastVersion(0, 17) {
args = append(args, "--no-ownership")
}
for _, p := range paths {
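
The two comments in this hunk describe alternative gates for `--no-ownership`: a version comparison and a runtime probe of `restic restore --help`. For reference, a version gate could be sketched as below. `atLeast` is a hypothetical stand-in, not the repository's `Env.AtLeastVersion`, and its handling of malformed input is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast is a hypothetical version gate. It reports whether version
// (for example "0.17.3") is at least major.minor. Unparseable input returns
// false so the caller stays on the conservative path of omitting the flag.
func atLeast(version string, major, minor int) bool {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false
	}
	if gotMajor != major {
		return gotMajor > major
	}
	return gotMinor >= minor
}

func main() {
	for _, v := range []string{"0.16.4", "0.17.0", "0.18.1", "garbage"} {
		fmt.Println(v, atLeast(v, 0, 17))
	}
}
```

A gate like this is cheap but trusts the version string. The probe-based comment reports a downstream 0.18.1 build that still rejected the flag, which is the case a version check cannot detect.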
+6 -37
@@ -15,26 +15,6 @@ import (
"time"
)
// SupportsRestoreNoOwnership probes the running restic for the
// `--no-ownership` flag on the `restore` subcommand. Some restic
// builds (≥ 0.17 in theory; observed missing on a downstream 0.18.1)
// do not expose it, so we ask the binary directly rather than
// inferring from the version string. Empty `bin` or any failure to
// run the help command returns false — the caller stays on the
// conservative path of not adding the flag.
func SupportsRestoreNoOwnership(ctx context.Context, bin string) bool {
if bin == "" {
return false
}
probeCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
out, err := exec.CommandContext(probeCtx, bin, "restore", "--help").CombinedOutput()
if err != nil {
return false
}
return strings.Contains(string(out), "--no-ownership")
}
// Locate resolves the path to the restic binary. Honour an explicit
// override if provided, else fall back to PATH.
func Locate(override string) (string, error) {
@@ -69,15 +49,6 @@ type Env struct {
ExtraEnv map[string]string // any other RESTIC_* / passthrough
WorkDir string // CWD; default = current
// SupportsRestoreNoOwnership records whether the running restic's
// `restore --help` advertises the --no-ownership flag. The flag was
// added in 0.17, but at least one downstream build of 0.18.1 still
// rejects it ("unknown flag: --no-ownership") — version sniffing
// proved unreliable, so the agent now probes for the actual flag at
// startup (see internal/restic.SupportsRestoreNoOwnership) and
// passes the resulting boolean down here.
SupportsRestoreNoOwnership bool
// Bandwidth caps in KB/s. <=0 means "no cap" (omit the flag).
// Emitted as restic global flags --limit-upload / --limit-download
// before the subcommand on every invocation.
@@ -536,14 +507,12 @@ func pumpPlain(r io.Reader, stream string, handle LineHandler) error {
// on one or the other for its cache dir; without it the command
// fails before ever talking to the repo.
//
// Default to /var/lib/restic-manager. The unit no longer pins
// ProtectHome=read-only (a backup tool needs to restore anywhere),
// but the explicit HOME stays for two reasons: the parent's HOME
// can be unset under unusual init shapes, and pinning the cache
// under a known agent-owned dir keeps restic's metadata isolated
// from the actual operator home dirs that the agent can now write
// to. ExtraEnv overrides win for callers that want a different
// cache location.
// Default to /var/lib/restic-manager — that's in the systemd unit's
// ReadWritePaths and survives ProtectHome=read-only. We do NOT fall
// back to the parent's HOME env var: the agent runs as root with
// HOME=/root, but ProtectHome makes /root read-only, so restic's
// `mkdir /root/.cache/restic` fails. ExtraEnv overrides win for
// callers that explicitly want a different cache location.
func (e Env) envSlice() []string {
home := "/var/lib/restic-manager"
if h, ok := e.ExtraEnv["HOME"]; ok && h != "" {
+4 -28
@@ -30,17 +30,7 @@ type Config struct {
// Defaults to true. Set RM_COOKIE_SECURE=false only for local HTTP
// testing — production deployments are always behind a TLS proxy
// and the cookie must be Secure.
CookieSecure bool `yaml:"cookie_secure"`
OIDCRaw *OIDCConfig `yaml:"oidc"`
OIDC *OIDCConfig `yaml:"-"`
// BundledAssetsDir is the read-only path inside the image that
// holds agent binaries (under agent-binaries/) and install
// scripts (under install/). The /agent/binary and /install/*
// handlers fall back here when the file is not present in
// DataDir. Source-build deployments can override via
// RM_BUNDLED_ASSETS_DIR.
BundledAssetsDir string `yaml:"bundled_assets_dir"`
CookieSecure bool `yaml:"cookie_secure"`
}
// Load resolves config in this order:
@@ -52,10 +42,9 @@ type Config struct {
// safe to start.
func Load(yamlPath string) (Config, error) {
c := Config{
Listen: ":8080",
DataDir: "/data",
CookieSecure: true,
BundledAssetsDir: "/opt/restic-manager/dist",
Listen: ":8080",
DataDir: "/data",
CookieSecure: true,
}
if yamlPath != "" {
@@ -90,9 +79,6 @@ func Load(yamlPath string) (Config, error) {
c.CookieSecure = true
}
}
if v, ok := os.LookupEnv("RM_BUNDLED_ASSETS_DIR"); ok {
c.BundledAssetsDir = v
}
if v, ok := os.LookupEnv("RM_TRUSTED_PROXY"); ok {
// Comma-separated CIDRs; allow whitespace for readability.
parts := strings.Split(v, ",")
@@ -105,16 +91,6 @@ func Load(yamlPath string) (Config, error) {
}
}
var rawOIDC OIDCConfig
if c.OIDCRaw != nil {
rawOIDC = *c.OIDCRaw
}
oidc, err := loadOIDC(envSnapshot(), rawOIDC)
if err != nil {
return c, err
}
c.OIDC = oidc
return c, c.validate()
}
-103
@@ -1,103 +0,0 @@
// internal/server/config/oidc.go — OIDC subsection of the server
// config. Disabled when oidc.issuer is empty or absent.
package config
import (
"errors"
"fmt"
"os"
)
// OIDCConfig is the OIDC sub-block. The struct doubles as YAML schema;
// loadOIDC applies env overlays on top and fills defaults.
type OIDCConfig struct {
Issuer string `yaml:"issuer"`
ClientID string `yaml:"client_id"`
ClientSecret string `yaml:"client_secret"`
DisplayName string `yaml:"display_name"`
Scopes []string `yaml:"scopes"`
RoleClaim string `yaml:"role_claim"`
RoleMapping map[string]string `yaml:"role_mapping"`
RedirectURL string `yaml:"redirect_url"`
}
// loadOIDC merges YAML + env, applies defaults, validates. Returns
// nil + nil when OIDC is disabled (issuer empty after merge); a
// non-nil OIDCConfig means the caller should wire OIDC.
//
// Env vars (override YAML when set):
//
// RM_OIDC_ISSUER, RM_OIDC_CLIENT_ID, RM_OIDC_CLIENT_SECRET,
// RM_OIDC_CLIENT_SECRET_FILE, RM_OIDC_DISPLAY_NAME,
// RM_OIDC_REDIRECT_URL.
//
// envs is passed in (rather than read with os.LookupEnv) so unit
// tests can supply a fake env map.
func loadOIDC(envs map[string]string, yaml OIDCConfig) (*OIDCConfig, error) {
c := yaml
if v, ok := envs["RM_OIDC_ISSUER"]; ok {
c.Issuer = v
}
if v, ok := envs["RM_OIDC_CLIENT_ID"]; ok {
c.ClientID = v
}
if v, ok := envs["RM_OIDC_CLIENT_SECRET"]; ok {
c.ClientSecret = v
}
if v, ok := envs["RM_OIDC_CLIENT_SECRET_FILE"]; ok && v != "" {
body, err := os.ReadFile(v)
if err != nil {
return nil, fmt.Errorf("config: oidc client_secret_file: %w", err)
}
c.ClientSecret = string(body)
}
if v, ok := envs["RM_OIDC_DISPLAY_NAME"]; ok {
c.DisplayName = v
}
if v, ok := envs["RM_OIDC_REDIRECT_URL"]; ok {
c.RedirectURL = v
}
if c.Issuer == "" {
return nil, nil
}
if c.ClientID == "" {
return nil, errors.New("config: oidc.client_id required when issuer is set")
}
if c.ClientSecret == "" {
return nil, errors.New("config: oidc.client_secret required when issuer is set")
}
if len(c.RoleMapping) == 0 {
return nil, errors.New("config: oidc.role_mapping must have at least one entry")
}
if c.DisplayName == "" {
c.DisplayName = "SSO"
}
if c.RoleClaim == "" {
c.RoleClaim = "groups"
}
if len(c.Scopes) == 0 {
c.Scopes = []string{"openid", "profile", "email", "groups"}
}
return &c, nil
}
// envSnapshot reads the OIDC env vars into a map. Lets the production
// loadOIDC call site stay env-driven while tests pass an explicit
// map.
func envSnapshot() map[string]string {
keys := []string{
"RM_OIDC_ISSUER", "RM_OIDC_CLIENT_ID", "RM_OIDC_CLIENT_SECRET",
"RM_OIDC_CLIENT_SECRET_FILE", "RM_OIDC_DISPLAY_NAME",
"RM_OIDC_REDIRECT_URL",
}
out := make(map[string]string, len(keys))
for _, k := range keys {
if v, ok := os.LookupEnv(k); ok {
out[k] = v
}
}
return out
}
-72
@@ -1,72 +0,0 @@
package config
import "testing"
func TestOIDCParseDisabledWhenIssuerEmpty(t *testing.T) {
t.Parallel()
c, err := loadOIDC(map[string]string{}, OIDCConfig{})
if err != nil {
t.Fatalf("load: %v", err)
}
if c != nil {
t.Errorf("expected nil OIDC config when issuer empty; got %+v", c)
}
}
func TestOIDCRejectMissingClientID(t *testing.T) {
t.Parallel()
yaml := OIDCConfig{Issuer: "https://x", ClientSecret: "s"}
if _, err := loadOIDC(map[string]string{}, yaml); err == nil {
t.Error("expected error for missing client_id")
}
}
func TestOIDCRejectMissingClientSecret(t *testing.T) {
t.Parallel()
yaml := OIDCConfig{Issuer: "https://x", ClientID: "rm"}
if _, err := loadOIDC(map[string]string{}, yaml); err == nil {
t.Error("expected error for missing client_secret")
}
}
func TestOIDCDefaultsApplied(t *testing.T) {
t.Parallel()
yaml := OIDCConfig{
Issuer: "https://x", ClientID: "rm", ClientSecret: "s",
RoleMapping: map[string]string{"a": "admin"},
}
c, err := loadOIDC(map[string]string{}, yaml)
if err != nil {
t.Fatalf("load: %v", err)
}
if c.RoleClaim != "groups" {
t.Errorf("role_claim default: got %q want groups", c.RoleClaim)
}
if c.DisplayName != "SSO" {
t.Errorf("display_name default: got %q want SSO", c.DisplayName)
}
wantScopes := []string{"openid", "profile", "email", "groups"}
if len(c.Scopes) != len(wantScopes) {
t.Errorf("scopes default: got %v want %v", c.Scopes, wantScopes)
}
}
func TestOIDCEnvOverrides(t *testing.T) {
t.Parallel()
yaml := OIDCConfig{
Issuer: "https://from-yaml", ClientID: "yaml-id", ClientSecret: "yaml-secret",
RoleMapping: map[string]string{"x": "admin"},
}
envs := map[string]string{
"RM_OIDC_ISSUER": "https://from-env",
"RM_OIDC_CLIENT_ID": "env-id",
"RM_OIDC_CLIENT_SECRET": "env-secret",
}
c, err := loadOIDC(envs, yaml)
if err != nil {
t.Fatalf("load: %v", err)
}
if c.Issuer != "https://from-env" || c.ClientID != "env-id" || c.ClientSecret != "env-secret" {
t.Errorf("env override: got %+v", c)
}
}
+11 -35
@@ -11,23 +11,19 @@ import (
)
// agent_assets.go serves the agent binary (one per OS/arch) and the
// install scripts. Lookup is dual-path:
//
// 1. <DataDir>/agent-binaries/<name> (or <DataDir>/install/<name>) —
// operator-managed override; lets the operator hot-patch a
// pre-release agent without rebuilding the server image.
// 2. <BundledAssetsDir>/agent-binaries/<name> — read-only, baked
// into the server image at build time (P5-03). This is what
// makes a fresh container Just Work without first-run staging.
// install scripts. The binaries live under <DataDir>/agent-binaries/,
// laid down by the release pipeline (or copied by hand for now).
// The install scripts live in <DataDir>/install/ alongside the
// systemd unit.
//
// Both endpoints are intentionally unauthenticated: the install
// payload is unprivileged on its own — it's the one-time enrollment
// token that grants access. Anyone can pull the binary; only
// someone with a valid token can use it productively.
//
// P1-31: signed-binary verification is deferred. The image is the
// unit of trust; pull-by-digest is the verification primitive.
// Future work bumps standalone-binary delivery to minisign/cosign.
// P1-31: signed-binary verification is deferred. Today we serve
// whatever the operator dropped on disk. Future work bumps this to
// minisign/cosign signed bundles.
// installAssetsRoutes adds /agent/binary and /install/* to r.
func (s *Server) handleAgentBinary(w stdhttp.ResponseWriter, r *stdhttp.Request) {
@@ -49,8 +45,8 @@ func (s *Server) handleAgentBinary(w stdhttp.ResponseWriter, r *stdhttp.Request)
ext = ".exe"
}
name := fmt.Sprintf("restic-manager-agent-%s-%s%s", osTag, archTag, ext)
path, ok := s.resolveBundledAsset("agent-binaries", name)
if !ok {
path := filepath.Join(s.deps.Cfg.DataDir, "agent-binaries", name)
if _, err := os.Stat(path); err != nil {
writeJSONError(w, stdhttp.StatusNotFound, "binary_not_published",
fmt.Sprintf("agent binary for %s/%s not published on this server", osTag, archTag))
return
@@ -68,34 +64,14 @@ func (s *Server) handleInstallAsset(w stdhttp.ResponseWriter, r *stdhttp.Request
writeJSONError(w, stdhttp.StatusBadRequest, "bad_path", "")
return
}
path, ok := s.resolveBundledAsset("install", rel)
if !ok {
path := filepath.Join(s.deps.Cfg.DataDir, "install", rel)
if _, err := os.Stat(path); err != nil {
writeJSONError(w, stdhttp.StatusNotFound, "not_found", "")
return
}
stdhttp.ServeFile(w, r, path)
}
// resolveBundledAsset looks up an asset by (subdir, name). DataDir
// wins so an operator can override the image-baked copy by dropping
// a file into <DataDir>/<subdir>/<name>. If neither path resolves,
// returns ("", false).
func (s *Server) resolveBundledAsset(subdir, name string) (string, bool) {
candidates := []string{
filepath.Join(s.deps.Cfg.DataDir, subdir, name),
}
if s.deps.Cfg.BundledAssetsDir != "" {
candidates = append(candidates,
filepath.Join(s.deps.Cfg.BundledAssetsDir, subdir, name))
}
for _, p := range candidates {
if _, err := os.Stat(p); err == nil {
return p, true
}
}
return "", false
}
func validOS(s string) bool {
switch api.HostOS(s) {
case api.OSLinux, api.OSWindows:
-167
@@ -1,167 +0,0 @@
package http
import (
"context"
"io"
stdhttp "net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
"gitea.dcglab.co.uk/steve/restic-manager/internal/crypto"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/config"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ws"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// newAssetsTestServer is a minimal scaffold for the /agent/binary and
// /install/* handlers. Two roots: one acts as DataDir, the other as
// the image-baked BundledAssetsDir. Either or both may be empty.
func newAssetsTestServer(t *testing.T, populate func(dataDir, bundleDir string)) string {
t.Helper()
root := t.TempDir()
dataDir := filepath.Join(root, "data")
bundleDir := filepath.Join(root, "dist")
for _, d := range []string{
filepath.Join(dataDir, "agent-binaries"),
filepath.Join(dataDir, "install"),
filepath.Join(bundleDir, "agent-binaries"),
filepath.Join(bundleDir, "install"),
} {
if err := os.MkdirAll(d, 0o755); err != nil {
t.Fatalf("mkdir: %v", err)
}
}
if populate != nil {
populate(dataDir, bundleDir)
}
st, err := store.Open(context.Background(), filepath.Join(root, "rm.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { _ = st.Close() })
keyPath := filepath.Join(root, "secret.key")
_ = crypto.GenerateKeyFile(keyPath)
key, _ := crypto.LoadKeyFromFile(keyPath)
aead, _ := crypto.NewAEAD(key)
deps := Deps{
Cfg: config.Config{
Listen: ":0",
DataDir: dataDir,
SecretKeyFile: keyPath,
BundledAssetsDir: bundleDir,
},
Store: st,
AEAD: aead,
Hub: ws.NewHub(),
BootstrapToken: "test-token",
}
s := New(deps)
ts := httptest.NewServer(s.srv.Handler)
t.Cleanup(ts.Close)
return ts.URL
}
func writeFile(t *testing.T, path string, body []byte) {
t.Helper()
if err := os.WriteFile(path, body, 0o644); err != nil {
t.Fatalf("write %s: %v", path, err)
}
}
func get(t *testing.T, url string) (int, []byte) {
t.Helper()
res, err := stdhttp.Get(url)
if err != nil {
t.Fatalf("GET %s: %v", url, err)
}
defer res.Body.Close()
body, _ := io.ReadAll(res.Body)
return res.StatusCode, body
}
func TestAgentBinary_DataDirHit(t *testing.T) {
t.Parallel()
url := newAssetsTestServer(t, func(dataDir, _ string) {
writeFile(t, filepath.Join(dataDir, "agent-binaries", "restic-manager-agent-linux-amd64"),
[]byte("from-datadir"))
})
code, body := get(t, url+"/agent/binary?os=linux&arch=amd64")
if code != 200 || string(body) != "from-datadir" {
t.Fatalf("got %d %q", code, string(body))
}
}
func TestAgentBinary_BundleFallback(t *testing.T) {
t.Parallel()
url := newAssetsTestServer(t, func(_, bundleDir string) {
writeFile(t, filepath.Join(bundleDir, "agent-binaries", "restic-manager-agent-linux-amd64"),
[]byte("from-bundle"))
})
code, body := get(t, url+"/agent/binary?os=linux&arch=amd64")
if code != 200 || string(body) != "from-bundle" {
t.Fatalf("got %d %q", code, string(body))
}
}
func TestAgentBinary_DataDirShadowsBundle(t *testing.T) {
t.Parallel()
url := newAssetsTestServer(t, func(dataDir, bundleDir string) {
writeFile(t, filepath.Join(dataDir, "agent-binaries", "restic-manager-agent-linux-amd64"),
[]byte("from-datadir"))
writeFile(t, filepath.Join(bundleDir, "agent-binaries", "restic-manager-agent-linux-amd64"),
[]byte("from-bundle"))
})
code, body := get(t, url+"/agent/binary?os=linux&arch=amd64")
if code != 200 || string(body) != "from-datadir" {
t.Fatalf("operator override should win: got %d %q", code, string(body))
}
}
func TestAgentBinary_BothMiss(t *testing.T) {
t.Parallel()
url := newAssetsTestServer(t, nil)
code, _ := get(t, url+"/agent/binary?os=linux&arch=amd64")
if code != 404 {
t.Fatalf("expected 404, got %d", code)
}
}
func TestAgentBinary_WindowsNameHasExe(t *testing.T) {
t.Parallel()
url := newAssetsTestServer(t, func(_, bundleDir string) {
writeFile(t, filepath.Join(bundleDir, "agent-binaries", "restic-manager-agent-windows-amd64.exe"),
[]byte("win"))
})
code, body := get(t, url+"/agent/binary?os=windows&arch=amd64")
if code != 200 || string(body) != "win" {
t.Fatalf("got %d %q", code, string(body))
}
}
func TestInstallAsset_BundleFallback(t *testing.T) {
t.Parallel()
url := newAssetsTestServer(t, func(_, bundleDir string) {
writeFile(t, filepath.Join(bundleDir, "install", "install.sh"), []byte("#!/bin/sh\n"))
})
code, body := get(t, url+"/install/install.sh")
if code != 200 || string(body) != "#!/bin/sh\n" {
t.Fatalf("got %d %q", code, string(body))
}
}
func TestInstallAsset_PathTraversalRejected(t *testing.T) {
t.Parallel()
url := newAssetsTestServer(t, nil)
// chi will normalise some traversal attempts, but the handler
// also rejects any rel containing a slash or backslash. The
// path component of the URL after /install/ is the rel.
code, _ := get(t, url+"/install/..%2fpasswd")
if code == 200 {
t.Fatalf("traversal should not return 200")
}
}
-391
@@ -1,391 +0,0 @@
// api_users.go — JSON handlers for the user-management surface.
//
// All endpoints in this file are admin-only; gating happens at the
// route-mount site (server.go's admin band).
package http
import (
"crypto/rand"
"encoding/hex"
"encoding/json"
"errors"
"log/slog"
stdhttp "net/http"
"net/mail"
"strings"
"time"
"github.com/go-chi/chi/v5"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
type listUsersResponse struct {
Users []apiUser `json:"users"`
}
type apiUser struct {
ID string `json:"id"`
Username string `json:"username"`
Role string `json:"role"`
Email *string `json:"email,omitempty"`
Disabled bool `json:"disabled"`
MustChangePassword bool `json:"must_change_password"`
CreatedAt string `json:"created_at"`
LastLoginAt *string `json:"last_login_at,omitempty"`
}
func (s *Server) handleAPIUsersList(w stdhttp.ResponseWriter, r *stdhttp.Request) {
users, err := s.deps.Store.ListUsers(r.Context(), store.UserSort{})
if err != nil {
slog.Error("api users: list", "err", err)
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
out := make([]apiUser, len(users))
for i, u := range users {
var lastLogin *string
if u.LastLoginAt != nil {
ll := u.LastLoginAt.UTC().Format("2006-01-02T15:04:05Z")
lastLogin = &ll
}
out[i] = apiUser{
ID: u.ID, Username: u.Username, Role: string(u.Role),
Email: u.Email, Disabled: u.DisabledAt != nil,
MustChangePassword: u.MustChangePassword,
CreatedAt: u.CreatedAt.UTC().Format("2006-01-02T15:04:05Z"),
LastLoginAt: lastLogin,
}
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_ = json.NewEncoder(w).Encode(listUsersResponse{Users: out})
}
type createUserRequest struct {
Username string `json:"username"`
Email string `json:"email,omitempty"`
Role string `json:"role"`
}
type createUserResponse struct {
ID string `json:"id"`
SetupURL string `json:"setup_url"`
}
// generateSetupToken returns 32 random bytes hex-encoded (64 chars).
func generateSetupToken() (string, error) {
var b [32]byte
if _, err := rand.Read(b[:]); err != nil {
return "", err
}
return hex.EncodeToString(b[:]), nil
}
// validRole maps a wire role string to the typed constant. Returns
// ("", false) for anything unknown.
func validRole(r string) (store.Role, bool) {
switch r {
case "admin":
return store.RoleAdmin, true
case "operator":
return store.RoleOperator, true
case "viewer":
return store.RoleViewer, true
}
return "", false
}
func (s *Server) handleAPIUserCreate(w stdhttp.ResponseWriter, r *stdhttp.Request) {
actor, _ := s.requireUser(r) // already gated by middleware
var req createUserRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeJSONError(w, stdhttp.StatusBadRequest, "invalid_json", err.Error())
return
}
uname := strings.ToLower(strings.TrimSpace(req.Username))
if uname == "" {
writeJSONError(w, stdhttp.StatusBadRequest, "username_required", "")
return
}
role, ok := validRole(req.Role)
if !ok {
writeJSONError(w, stdhttp.StatusBadRequest, "invalid_role", "")
return
}
if req.Email != "" {
if _, err := mail.ParseAddress(req.Email); err != nil {
writeJSONError(w, stdhttp.StatusBadRequest, "invalid_email", err.Error())
return
}
}
// Check for collision against existing user (case-insensitive).
existing, err := s.deps.Store.GetUserByUsername(r.Context(), uname)
if err == nil {
body := map[string]any{
"error": "username_taken",
"existing_user_id": existing.ID,
"disabled": existing.DisabledAt != nil,
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.WriteHeader(stdhttp.StatusConflict)
_ = json.NewEncoder(w).Encode(body)
return
} else if !errors.Is(err, store.ErrNotFound) {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
id := ulid.Make().String()
now := time.Now().UTC()
var emailPtr *string
if req.Email != "" {
em := strings.ToLower(strings.TrimSpace(req.Email))
emailPtr = &em
}
if err := s.deps.Store.CreateUser(r.Context(), store.User{
ID: id, Username: uname, PasswordHash: "",
Role: role, Email: emailPtr, CreatedAt: now,
MustChangePassword: true,
}); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
rawToken, err := generateSetupToken()
if err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
var actorID *string
if actor != nil {
actorID = &actor.ID
}
if err := s.deps.Store.SetSetupToken(r.Context(), store.SetupToken{
UserID: id, TokenHash: hashSetupToken(rawToken),
ExpiresAt: now.Add(time.Hour),
CreatedAt: now, CreatedBy: actorID,
}); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: actorID, Actor: "user",
Action: "user.created", TargetKind: ptr("user"), TargetID: &id,
TS: now,
})
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.WriteHeader(stdhttp.StatusCreated)
_ = json.NewEncoder(w).Encode(createUserResponse{
ID: id,
SetupURL: s.deps.Cfg.BaseURL + "/setup?token=" + rawToken,
})
}
func (s *Server) handleAPIUserGet(w stdhttp.ResponseWriter, r *stdhttp.Request) {
id := chi.URLParam(r, "id")
u, err := s.deps.Store.GetUserByID(r.Context(), id)
if err != nil {
if errors.Is(err, store.ErrNotFound) {
writeJSONError(w, stdhttp.StatusNotFound, "user_not_found", "")
return
}
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
out := apiUser{
ID: u.ID, Username: u.Username, Role: string(u.Role),
Email: u.Email, Disabled: u.DisabledAt != nil,
MustChangePassword: u.MustChangePassword,
CreatedAt: u.CreatedAt.UTC().Format("2006-01-02T15:04:05Z"),
}
if u.LastLoginAt != nil {
ll := u.LastLoginAt.UTC().Format("2006-01-02T15:04:05Z")
out.LastLoginAt = &ll
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_ = json.NewEncoder(w).Encode(out)
}
type patchUserRequest struct {
Role *string `json:"role,omitempty"`
Email *string `json:"email,omitempty"`
}
func (s *Server) handleAPIUserPatch(w stdhttp.ResponseWriter, r *stdhttp.Request) {
actor, _ := s.requireUser(r)
id := chi.URLParam(r, "id")
u, err := s.deps.Store.GetUserByID(r.Context(), id)
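if err != nil {
if errors.Is(err, store.ErrNotFound) {
writeJSONError(w, stdhttp.StatusNotFound, "user_not_found", "")
return
}
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}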
var req patchUserRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeJSONError(w, stdhttp.StatusBadRequest, "invalid_json", err.Error())
return
}
if req.Role != nil {
newRole, ok := validRole(*req.Role)
if !ok {
writeJSONError(w, stdhttp.StatusBadRequest, "invalid_role", "")
return
}
// Last-admin guard: cannot demote the only enabled admin.
if u.Role == store.RoleAdmin && newRole != store.RoleAdmin && u.DisabledAt == nil {
n, _ := s.deps.Store.CountEnabledAdmins(r.Context())
if n <= 1 {
writeJSONError(w, stdhttp.StatusConflict, "last_admin", "")
return
}
}
if err := s.deps.Store.SetUserRole(r.Context(), id, newRole); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
}
if req.Email != nil {
em := strings.ToLower(strings.TrimSpace(*req.Email))
if em != "" {
if _, err := mail.ParseAddress(em); err != nil {
writeJSONError(w, stdhttp.StatusBadRequest, "invalid_email", err.Error())
return
}
}
if err := s.deps.Store.SetUserEmail(r.Context(), id, em); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
}
var actorID *string
if actor != nil {
actorID = &actor.ID
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: actorID, Actor: "user",
Action: "user.updated", TargetKind: ptr("user"), TargetID: &id,
TS: time.Now().UTC(),
})
w.WriteHeader(stdhttp.StatusOK)
}
func (s *Server) handleAPIUserDisable(w stdhttp.ResponseWriter, r *stdhttp.Request) {
actor, _ := s.requireUser(r)
id := chi.URLParam(r, "id")
u, err := s.deps.Store.GetUserByID(r.Context(), id)
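if err != nil {
if errors.Is(err, store.ErrNotFound) {
writeJSONError(w, stdhttp.StatusNotFound, "user_not_found", "")
return
}
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}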
if u.Role == store.RoleAdmin && u.DisabledAt == nil {
n, _ := s.deps.Store.CountEnabledAdmins(r.Context())
if n <= 1 {
writeJSONError(w, stdhttp.StatusConflict, "last_admin", "")
return
}
}
now := time.Now().UTC()
if err := s.deps.Store.DisableUser(r.Context(), id, now); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
// Kick existing sessions so the user is bounced immediately.
_, _ = s.deps.Store.DeleteSessionsByUserID(r.Context(), id)
var actorID *string
if actor != nil {
actorID = &actor.ID
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: actorID, Actor: "user",
Action: "user.disabled", TargetKind: ptr("user"), TargetID: &id,
TS: now,
})
w.WriteHeader(stdhttp.StatusOK)
}
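Review note: handleAPIUserPatch and handleAPIUserDisable both apply the same last-admin guard inline. A sketch of that rule as a pure function, using a simplified stand-in `user` type rather than `store.User` (both the type and `wouldOrphanAdmins` are hypothetical), might look like this:

```go
package main

import "fmt"

// user is a simplified stand-in for the store's user row.
type user struct {
	Role     string
	Disabled bool
}

// wouldOrphanAdmins reports whether demoting or disabling target would
// leave the system without an enabled admin. enabledAdmins is the
// current count of enabled admins, including target if it is one.
func wouldOrphanAdmins(target user, enabledAdmins int) bool {
	if target.Role != "admin" || target.Disabled {
		// Not an enabled admin, so the count does not change.
		return false
	}
	return enabledAdmins <= 1
}

func main() {
	admin := user{Role: "admin"}
	fmt.Println(wouldOrphanAdmins(admin, 1))                  // sole admin
	fmt.Println(wouldOrphanAdmins(admin, 2))                  // another admin remains
	fmt.Println(wouldOrphanAdmins(user{Role: "viewer"}, 1))   // not an admin
}
```

Keeping the rule in one place would let both handlers share it, and it is easy to unit-test without a store. Note that checking the count and then writing is not atomic in either form; two concurrent requests could still race unless the store enforces the invariant.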
func (s *Server) handleAPIUserEnable(w stdhttp.ResponseWriter, r *stdhttp.Request) {
actor, _ := s.requireUser(r)
id := chi.URLParam(r, "id")
if err := s.deps.Store.EnableUser(r.Context(), id); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
var actorID *string
if actor != nil {
actorID = &actor.ID
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: actorID, Actor: "user",
Action: "user.enabled", TargetKind: ptr("user"), TargetID: &id,
TS: time.Now().UTC(),
})
w.WriteHeader(stdhttp.StatusOK)
}
type regenerateSetupResponse struct {
SetupURL string `json:"setup_url"`
}
func (s *Server) handleAPIUserRegenerateSetup(w stdhttp.ResponseWriter, r *stdhttp.Request) {
actor, _ := s.requireUser(r)
id := chi.URLParam(r, "id")
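if _, err := s.deps.Store.GetUserByID(r.Context(), id); err != nil {
if errors.Is(err, store.ErrNotFound) {
writeJSONError(w, stdhttp.StatusNotFound, "user_not_found", "")
return
}
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}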
rawToken, err := generateSetupToken()
if err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
now := time.Now().UTC()
var actorID *string
if actor != nil {
actorID = &actor.ID
}
if err := s.deps.Store.SetSetupToken(r.Context(), store.SetupToken{
UserID: id, TokenHash: hashSetupToken(rawToken),
ExpiresAt: now.Add(time.Hour),
CreatedAt: now, CreatedBy: actorID,
}); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
if err := s.deps.Store.SetMustChangePassword(r.Context(), id, true); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: actorID, Actor: "user",
Action: "user.setup_token.regenerated",
TargetKind: ptr("user"), TargetID: &id, TS: now,
})
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_ = json.NewEncoder(w).Encode(regenerateSetupResponse{
SetupURL: s.deps.Cfg.BaseURL + "/setup?token=" + rawToken,
})
}
func (s *Server) handleAPIUserForceLogout(w stdhttp.ResponseWriter, r *stdhttp.Request) {
actor, _ := s.requireUser(r)
id := chi.URLParam(r, "id")
n, err := s.deps.Store.DeleteSessionsByUserID(r.Context(), id)
if err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
var actorID *string
if actor != nil {
actorID = &actor.ID
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: actorID, Actor: "user",
Action: "user.force_logout",
TargetKind: ptr("user"), TargetID: &id,
TS: time.Now().UTC(),
})
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_ = json.NewEncoder(w).Encode(map[string]int64{"sessions_killed": n})
}
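Review note on the setup-token flow in api_users.go: the handlers hand the raw token to the admin in the setup URL and persist only `hashSetupToken(rawToken)` with a one-hour expiry. `hashSetupToken` is not shown in this diff, so the sketch below assumes a plain SHA-256 hex digest. `hashToken`, `issue`, `redeemable`, and `demo` are hypothetical names for illustration only.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"time"
)

// setupToken mirrors the fields the handlers persist (hash + expiry).
type setupToken struct {
	Hash      string
	ExpiresAt time.Time
}

// generateSetupToken returns 32 random bytes hex-encoded (64 chars),
// as in the file above.
func generateSetupToken() (string, error) {
	var b [32]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

// hashToken is an ASSUMED implementation of the hashing step.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// issue creates a raw token and the record that would be stored.
func issue(now time.Time) (string, setupToken, error) {
	raw, err := generateSetupToken()
	if err != nil {
		return "", setupToken{}, err
	}
	return raw, setupToken{Hash: hashToken(raw), ExpiresAt: now.Add(time.Hour)}, nil
}

// redeemable checks expiry first, then compares hashes in constant time.
func redeemable(rec setupToken, presented string, now time.Time) bool {
	if !now.Before(rec.ExpiresAt) {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(rec.Hash), []byte(hashToken(presented))) == 1
}

// demo returns the token length and whether it redeems at 30 min and 2 h.
func demo() (int, bool, bool) {
	now := time.Now().UTC()
	raw, rec, err := issue(now)
	if err != nil {
		panic(err)
	}
	return len(raw), redeemable(rec, raw, now.Add(30*time.Minute)), redeemable(rec, raw, now.Add(2*time.Hour))
}

func main() {
	fmt.Println(demo())
}
```

Storing only the hash means a leaked database row cannot be turned back into a usable setup link.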
-6
@@ -56,15 +56,9 @@ func (s *Server) authenticateAndSession(w stdhttp.ResponseWriter, r *stdhttp.Req
// existence to a probing attacker.
return nil, errInvalidCredentials
}
if u.AuthSource == "oidc" {
return nil, errInvalidCredentials
}
if err := auth.VerifyPassword(u.PasswordHash, password); err != nil {
return nil, errInvalidCredentials
}
if u.DisabledAt != nil {
return nil, errInvalidCredentials
}
token, err := auth.NewToken()
if err != nil {
-157
@@ -1,157 +0,0 @@
// bootstrap_handler.go — public landing page for the first-run admin
// flow. While the server has no users and still holds the in-memory
// one-shot bootstrap token printed at startup, /bootstrap renders a
// form that takes a username + password and creates the first admin.
//
// The operator never sees or types the token: the server already has
// it in memory, so the UI handler uses it directly. The token printed
// to stderr remains a break-glass fallback for the JSON
// /api/bootstrap path.
//
// Routes (wired in server.go):
//
// GET /bootstrap → handleUIBootstrapGet
// POST /bootstrap → handleUIBootstrapPost
//
// Both routes self-disable the moment a user row exists; subsequent
// hits redirect to /login.
package http
import (
"log/slog"
stdhttp "net/http"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
type bootstrapPage struct {
Username string
Error string
}
func (s *Server) handleUIBootstrapGet(w stdhttp.ResponseWriter, r *stdhttp.Request) {
if !s.bootstrapAvailable(r) {
stdhttp.Redirect(w, r, "/login", stdhttp.StatusSeeOther)
return
}
s.renderBootstrap(w, r, "", "")
}
func (s *Server) handleUIBootstrapPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
if !s.bootstrapAvailable(r) {
stdhttp.Redirect(w, r, "/login", stdhttp.StatusSeeOther)
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
username := r.PostForm.Get("username")
pw := r.PostForm.Get("password")
pw2 := r.PostForm.Get("password_confirm")
if username == "" {
s.renderBootstrap(w, r, username, "Pick a username.")
return
}
if pw == "" || pw2 == "" || pw != pw2 || len(pw) < 12 {
s.renderBootstrap(w, r, username,
"Passwords must match and be at least 12 characters.")
return
}
hash, err := auth.HashPassword(pw)
if err != nil {
slog.Error("bootstrap: hash password", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
now := time.Now().UTC()
u := store.User{
ID: ulid.Make().String(),
Username: username,
PasswordHash: hash,
Role: store.RoleAdmin,
CreatedAt: now,
}
if err := s.deps.Store.CreateUser(r.Context(), u); err != nil {
slog.Error("bootstrap: create user", "err", err)
s.renderBootstrap(w, r, username,
"Could not create the administrator account. Check the server logs.")
return
}
// Clear the in-memory token so /api/bootstrap also stops accepting
// further calls. CountUsers > 0 already gates both surfaces, but
// blanking the token kills the constant-time-compare branch as
// well — defence in depth, plus stops the token from sitting in
// process memory longer than necessary.
s.deps.BootstrapToken = ""
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(),
UserID: &u.ID,
Actor: "system",
Action: "auth.bootstrap",
TS: now,
})
// Mint a session so the new admin lands authenticated on /.
rawSession, err := auth.NewToken()
if err != nil {
slog.Error("bootstrap: session token", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if err := s.deps.Store.CreateSession(r.Context(), store.Session{
UserID: u.ID,
CreatedAt: now,
ExpiresAt: now.Add(sessionTTL),
IP: r.RemoteAddr,
UA: r.UserAgent(),
}, auth.HashToken(rawSession)); err != nil {
slog.Error("bootstrap: create session", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.MarkUserLogin(r.Context(), u.ID, now)
stdhttp.SetCookie(w, &stdhttp.Cookie{
Name: sessionCookieName,
Value: rawSession,
Path: "/",
HttpOnly: true,
Secure: s.deps.Cfg.CookieSecure,
SameSite: stdhttp.SameSiteLaxMode,
Expires: now.Add(sessionTTL),
})
stdhttp.Redirect(w, r, "/", stdhttp.StatusSeeOther)
}
// bootstrapAvailable reports whether a fresh-install bootstrap can
// still proceed: a one-shot token is held in memory and no user rows
// exist yet.
func (s *Server) bootstrapAvailable(r *stdhttp.Request) bool {
if s.deps.BootstrapToken == "" {
return false
}
n, err := s.deps.Store.CountUsers(r.Context())
if err != nil {
slog.Error("bootstrap: count users", "err", err)
return false
}
return n == 0
}
func (s *Server) renderBootstrap(w stdhttp.ResponseWriter, r *stdhttp.Request, username, errMsg string) {
view := s.baseView(r, nil)
view.Title = "Welcome · restic-manager"
view.Page = bootstrapPage{Username: username, Error: errMsg}
if err := s.deps.UI.Render(w, "bootstrap", view); err != nil {
slog.Error("ui bootstrap: render", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
}
}
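Review note on bootstrap_handler.go: the handler blanks `s.deps.BootstrapToken` with a plain assignment, and the comment mentions a constant-time compare on the JSON path. As a sketch only, the one-shot behaviour could be made explicit with a small guarded type. `bootstrapGate` and `Redeem` are hypothetical and do not appear in the repository, and the mutex is an addition the original code does not have.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"sync"
)

// bootstrapGate holds the in-memory one-shot token.
type bootstrapGate struct {
	mu    sync.Mutex
	token string
}

// Redeem succeeds at most once: the token must still be held, no user
// rows may exist, and the presented value must match. On success the
// token is blanked so every later call fails.
func (g *bootstrapGate) Redeem(presented string, userCount int) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.token == "" || userCount != 0 {
		return false
	}
	if subtle.ConstantTimeCompare([]byte(g.token), []byte(presented)) != 1 {
		return false
	}
	g.token = ""
	return true
}

func main() {
	g := &bootstrapGate{token: "s3cret"}
	fmt.Println(g.Redeem("wrong", 0))  // false
	fmt.Println(g.Redeem("s3cret", 0)) // true
	fmt.Println(g.Redeem("s3cret", 0)) // false
}
```

In the UI path the server would pass its own held token, so only the user-count check and the blanking matter there; the compare is relevant to the break-glass JSON path.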
@@ -1,144 +0,0 @@
// dashboard_filter_test.go — covers the NS-04 filter + sort pipeline
// in pure-Go form, without going through HTTP. The handler tests
// elsewhere prove end-to-end render; here we focus on edge cases of
// the column-sort + filter precedence so a regression in either is
// surfaced loudly.
package http
import (
"net/url"
"testing"
"time"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
func makeFilterHosts() []store.Host {
t1 := time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC)
t2 := time.Date(2026, 5, 4, 12, 0, 0, 0, time.UTC)
tSeen := time.Date(2026, 5, 5, 12, 0, 0, 0, time.UTC)
return []store.Host{
{
ID: "01HHA", Name: "alpha", OS: "linux", Status: "online",
RepoStatus: "ready", Tags: []string{"prod"}, SnapshotCount: 30,
LastBackupAt: &t1, LastSeenAt: &tSeen, RepoSizeBytes: 1000,
},
{
ID: "01HHB", Name: "bravo", OS: "linux", Status: "offline",
RepoStatus: "init_failed", Tags: []string{"dev"}, SnapshotCount: 10,
LastBackupAt: &t2, LastSeenAt: &tSeen, RepoSizeBytes: 5000,
},
{
ID: "01HHC", Name: "charlie", OS: "windows", Status: "online",
RepoStatus: "unknown", Tags: []string{"prod", "edge"}, SnapshotCount: 0,
LastSeenAt: nil, // never_seen path
},
}
}
// TestFilterAndSortDashboardSearchAndStatus covers the precedence of
// search ∧ status as combined filters.
func TestFilterAndSortDashboardSearchAndStatus(t *testing.T) {
t.Parallel()
hosts := makeFilterHosts()
// status=online narrows to alpha + charlie.
got := filterAndSortDashboardHosts(hosts, dashboardFilter{Status: "online", Sort: "name", Dir: "asc"})
if len(got) != 2 || got[0].Name != "alpha" || got[1].Name != "charlie" {
t.Errorf("status=online: got %d names %v, want [alpha charlie]", len(got), namesOf(got))
}
// q=bra narrows to bravo regardless of status default.
got = filterAndSortDashboardHosts(hosts, dashboardFilter{Search: "bra", Sort: "name", Dir: "asc"})
if len(got) != 1 || got[0].Name != "bravo" {
t.Errorf("search=bra: got %v", namesOf(got))
}
// repo_status=init_failed narrows to bravo only.
got = filterAndSortDashboardHosts(hosts, dashboardFilter{RepoStatus: "init_failed", Sort: "name", Dir: "asc"})
if len(got) != 1 || got[0].Name != "bravo" {
t.Errorf("repo_status=init_failed: got %v", namesOf(got))
}
// status=never_seen narrows on LastSeenAt == nil → charlie only.
got = filterAndSortDashboardHosts(hosts, dashboardFilter{Status: "never_seen", Sort: "name", Dir: "asc"})
if len(got) != 1 || got[0].Name != "charlie" {
t.Errorf("status=never_seen: got %v", namesOf(got))
}
// tag=prod narrows to alpha + charlie.
got = filterAndSortDashboardHosts(hosts, dashboardFilter{Tag: "prod", Sort: "name", Dir: "asc"})
if len(got) != 2 || got[0].Name != "alpha" || got[1].Name != "charlie" {
t.Errorf("tag=prod: got %v", namesOf(got))
}
}
// TestSortDashboardHostsColumns verifies each meaningful column
// sorts as expected, both ascending and descending.
func TestSortDashboardHostsColumns(t *testing.T) {
t.Parallel()
hosts := makeFilterHosts()
cases := []struct {
col, dir string
want []string
}{
{"name", "asc", []string{"alpha", "bravo", "charlie"}},
{"name", "desc", []string{"charlie", "bravo", "alpha"}},
{"snapshot_count", "asc", []string{"charlie", "bravo", "alpha"}},
{"snapshot_count", "desc", []string{"alpha", "bravo", "charlie"}},
{"last_backup", "asc", []string{"charlie", "alpha", "bravo"}}, // nil → zero → first
{"repo_status", "asc", []string{"bravo", "alpha", "charlie"}}, // init_failed < ready < unknown
}
for _, c := range cases {
c := c
t.Run(c.col+"_"+c.dir, func(t *testing.T) {
got := append([]store.Host(nil), hosts...)
sortDashboardHosts(got, c.col, c.dir)
if names := namesOf(got); !sliceEq(names, c.want) {
t.Errorf("got %v, want %v", names, c.want)
}
})
}
}
// TestParseDashboardFilterDefaults: empty query gives sort=name asc.
func TestParseDashboardFilterDefaults(t *testing.T) {
t.Parallel()
f := parseDashboardFilter(url.Values{})
if f.Sort != "name" || f.Dir != "asc" {
t.Errorf("defaults: got sort=%q dir=%q, want name/asc", f.Sort, f.Dir)
}
}
// TestBuildDashboardSortURLsToggles: clicking the active column
// flips direction; clicking another column resets to asc.
func TestBuildDashboardSortURLsToggles(t *testing.T) {
t.Parallel()
active := dashboardFilter{Sort: "name", Dir: "asc"}
urls := buildDashboardSortURLs(active)
if got := urls["name"]; got != "/?dir=desc" {
t.Errorf("name URL on active asc: got %q, want /?dir=desc", got)
}
// Switching to a non-default column also drops dir=asc since asc
// is the encoded default.
if got := urls["last_backup"]; got != "/?sort=last_backup" {
t.Errorf("last_backup URL: got %q, want /?sort=last_backup", got)
}
}
func namesOf(hs []store.Host) []string {
out := make([]string, len(hs))
for i, h := range hs {
out[i] = h.Name
}
return out
}
func sliceEq(a, b []string) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i] != b[i] {
return false
}
}
return true
}
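Review note on TestBuildDashboardSortURLsToggles: `buildDashboardSortURLs` is not in this diff. The sketch below is one way to satisfy the two assertions in the test, on the assumption that `sort=name` and `dir=asc` are the defaults and are left out of the URL. `sortURL` is a hypothetical name, and carrying other filter parameters through is omitted.

```go
package main

import (
	"fmt"
	"net/url"
)

const (
	defaultSort = "name"
	defaultDir  = "asc"
)

// sortURL returns the link for a column header. Clicking the active
// column while it is ascending flips to descending; any other click
// lands on ascending. Defaults are not encoded.
func sortURL(activeSort, activeDir, col string) string {
	dir := defaultDir
	if col == activeSort && activeDir == "asc" {
		dir = "desc"
	}
	q := url.Values{}
	if col != defaultSort {
		q.Set("sort", col)
	}
	if dir != defaultDir {
		q.Set("dir", dir)
	}
	if len(q) == 0 {
		return "/"
	}
	return "/?" + q.Encode() // Encode sorts keys, so output is stable
}

func main() {
	fmt.Println(sortURL("name", "asc", "name"))        // /?dir=desc
	fmt.Println(sortURL("name", "asc", "last_backup")) // /?sort=last_backup
	fmt.Println(sortURL("name", "desc", "name"))       // /
}
```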
-63
@@ -146,15 +146,6 @@ func (s *Server) handleSetHostCredentials(w stdhttp.ResponseWriter, r *stdhttp.R
return
}
// NS-03: clear the host's last probe outcome — the new creds may
// reach a different repo (or fix an auth typo), so any prior
// "init_failed" / "ready" tag is stale. The next init dispatch
// (below, when the agent is online) will set it to a fresh value
// on completion.
if err := s.deps.Store.SetHostRepoStatus(r.Context(), hostID, "unknown", ""); err != nil {
slog.Warn("repo creds set: reset repo_status", "host_id", hostID, "err", err)
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(),
UserID: &user.ID,
@@ -169,65 +160,11 @@ func (s *Server) handleSetHostCredentials(w stdhttp.ResponseWriter, r *stdhttp.R
// the next reconnect will pick the row up via the hello handler.
if s.deps.Hub != nil && s.deps.Hub.Connected(hostID) {
_ = s.pushRepoCredsToAgent(r.Context(), hostID, existing)
// Force a fresh probe so a typo / wrong URL surfaces now
// rather than at the next scheduled job. No-op if offline —
// the operator already saw "host offline" elsewhere.
if err := s.dispatchInitJob(r.Context(), hostID, "user", &user.ID); err != nil {
slog.Warn("repo creds set: dispatch init", "host_id", hostID, "err", err)
}
}
w.WriteHeader(stdhttp.StatusNoContent)
}
// dispatchInitJob creates an init job row, marshals the command.run,
// ships it down the agent's WS connection (when connected), and
// audits. NS-03 path: callers use this to force a fresh probe after
// credentials change without waiting for the next hello — and without
// the maybeAutoInit "first time only" guard. actorKind should be
// "user" for operator-driven dispatches and "system" for the
// auto-init-on-hello case so audit reflects intent.
func (s *Server) dispatchInitJob(ctx context.Context, hostID, actorKind string, actorID *string) error {
jobID := ulid.Make().String()
now := time.Now().UTC()
if err := s.deps.Store.CreateJob(ctx, store.Job{
ID: jobID,
HostID: hostID,
Kind: string(api.JobInit),
ActorKind: actorKind,
ActorID: actorID,
CreatedAt: now,
}); err != nil {
return fmt.Errorf("dispatch init: persist job: %w", err)
}
env, err := api.Marshal(api.MsgCommandRun, jobID, api.CommandRunPayload{
JobID: jobID,
Kind: api.JobInit,
})
if err != nil {
return fmt.Errorf("dispatch init: marshal: %w", err)
}
if s.deps.Hub != nil && s.deps.Hub.Connected(hostID) {
sendCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
if err := s.deps.Hub.Send(sendCtx, hostID, env); err != nil {
// Job row stays — the host's pending-runs drain or the next
// hello picks it up. We leave the slate clean for the caller.
return fmt.Errorf("dispatch init: ws send: %w", err)
}
}
_ = s.deps.Store.AppendAudit(ctx, store.AuditEntry{
ID: ulid.Make().String(),
UserID: actorID,
Actor: actorKind,
Action: "host.repo_init_dispatched",
TargetKind: ptr("host"),
TargetID: &hostID,
TS: now,
})
return nil
}
// pushRepoCredsToAgent serialises blob into a config.update envelope
// and ships it down the agent's WS. Returns an error from the hub
// (no-op if not connected — caller is expected to check first when it
-6
@@ -152,12 +152,6 @@ func (s *Server) requireUser(r *stdhttp.Request) (*store.User, bool) {
if err != nil {
return nil, false
}
if u.DisabledAt != nil {
// Disabled mid-session — kill the session and reject the
// request as if it were unauthenticated.
_ = s.deps.Store.DeleteSession(r.Context(), auth.HashToken(c.Value))
return nil, false
}
return u, true
}
-205
@@ -1,205 +0,0 @@
// oidc_handlers.go — OIDC sign-in handlers. Public routes when oidc
// is configured (s.deps.OIDC != nil), otherwise not mounted.
package http
import (
"encoding/json"
"errors"
"log/slog"
stdhttp "net/http"
"strings"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/oidc"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// handleOIDCLogin generates state + PKCE pair, persists them, and
// redirects to the IdP authorization endpoint.
func (s *Server) handleOIDCLogin(w stdhttp.ResponseWriter, r *stdhttp.Request) {
state, err := oidc.RandomState()
if err != nil {
slog.Error("oidc login: state", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
verifier, challenge, err := oidc.PKCEPair()
if err != nil {
slog.Error("oidc login: pkce", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if err := s.deps.Store.PutOIDCState(r.Context(),
oidc.HashState(state), verifier, time.Now().UTC()); err != nil {
slog.Error("oidc login: persist state", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
stdhttp.Redirect(w, r, s.deps.OIDC.AuthURL(state, challenge), stdhttp.StatusSeeOther)
}
func (s *Server) handleOIDCCallback(w stdhttp.ResponseWriter, r *stdhttp.Request) {
q := r.URL.Query()
code := q.Get("code")
state := q.Get("state")
if code == "" || state == "" {
s.oidcRedirectError(w, r, "missing_params")
return
}
verifier, err := s.deps.Store.ConsumeOIDCState(r.Context(), oidc.HashState(state))
if err != nil {
s.oidcRedirectError(w, r, "bad_state")
return
}
claims, rawIDToken, err := s.deps.OIDC.Exchange(r.Context(), code, verifier)
if err != nil {
slog.Warn("oidc callback: exchange", "err", err)
s.oidcRedirectError(w, r, "exchange_failed")
return
}
uname := strings.ToLower(strings.TrimSpace(claims.PreferredUsername))
if uname == "" {
uname = strings.ToLower(strings.TrimSpace(claims.Email))
}
if uname == "" || claims.Subject == "" {
s.oidcRedirectError(w, r, "missing_claims")
return
}
role := s.deps.OIDC.MapRole(claims.Roles)
if role == "" {
_ = s.auditOIDCBlocked(r, claims, "no_role_match")
s.oidcRedirectError(w, r, "no_role_match")
return
}
now := time.Now().UTC()
// Returning OIDC user — refresh role + email + last_login.
existing, err := s.deps.Store.GetUserByOIDCSubject(r.Context(), claims.Subject)
if err == nil {
if existing.DisabledAt != nil {
s.oidcRedirectError(w, r, "user_disabled")
return
}
_ = s.deps.Store.SetUserRole(r.Context(), existing.ID, store.Role(role))
_ = s.deps.Store.SetUserEmail(r.Context(), existing.ID, claims.Email)
_ = s.deps.Store.MarkUserLogin(r.Context(), existing.ID, now)
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &existing.ID, Actor: "user",
Action: "user.oidc_login", TargetKind: ptr("user"),
TargetID: &existing.ID, TS: now,
})
s.oidcDropSessionAndRedirect(w, r, existing.ID, rawIDToken, now)
return
} else if !errors.Is(err, store.ErrNotFound) {
slog.Error("oidc callback: lookup by sub", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
// New OIDC user — first check the username doesn't collide with
// a local user.
if _, err := s.deps.Store.GetUserByUsername(r.Context(), uname); err == nil {
_ = s.auditOIDCBlocked(r, claims, "username_taken")
s.oidcRedirectError(w, r, "username_taken")
return
} else if !errors.Is(err, store.ErrNotFound) {
slog.Error("oidc callback: lookup by username", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
// JIT-provision.
id := ulid.Make().String()
var emailPtr *string
if claims.Email != "" {
em := strings.ToLower(claims.Email)
emailPtr = &em
}
sub := claims.Subject
if err := s.deps.Store.CreateUser(r.Context(), store.User{
ID: id, Username: uname, PasswordHash: "",
Role: store.Role(role), Email: emailPtr,
AuthSource: "oidc", OIDCSubject: &sub,
CreatedAt: now,
}); err != nil {
slog.Error("oidc callback: provision", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.MarkUserLogin(r.Context(), id, now)
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &id, Actor: "user",
Action: "user.created", TargetKind: ptr("user"), TargetID: &id,
TS: now,
Payload: jsonMust(map[string]any{"auth_source": "oidc"}),
})
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &id, Actor: "user",
Action: "user.oidc_login", TargetKind: ptr("user"), TargetID: &id,
TS: now,
})
s.oidcDropSessionAndRedirect(w, r, id, rawIDToken, now)
}
func (s *Server) oidcDropSessionAndRedirect(w stdhttp.ResponseWriter, r *stdhttp.Request, userID, idToken string, now time.Time) {
rawSession, err := auth.NewToken()
if err != nil {
slog.Error("oidc: session token", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
hashed := auth.HashToken(rawSession)
if err := s.deps.Store.CreateSession(r.Context(), store.Session{
ID: hashed, UserID: userID, CreatedAt: now,
ExpiresAt: now.Add(8 * time.Hour),
IDToken: idToken,
}, hashed); err != nil {
slog.Error("oidc: create session", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
stdhttp.SetCookie(w, &stdhttp.Cookie{
Name: sessionCookieName, Value: rawSession,
Path: "/", HttpOnly: true,
SameSite: stdhttp.SameSiteLaxMode,
Secure: s.deps.Cfg.CookieSecure,
Expires: now.Add(8 * time.Hour),
})
stdhttp.Redirect(w, r, "/", stdhttp.StatusSeeOther)
}
func (s *Server) oidcRedirectError(w stdhttp.ResponseWriter, r *stdhttp.Request, code string) {
stdhttp.Redirect(w, r, "/login?oidc_error="+code, stdhttp.StatusSeeOther)
}
// auditOIDCBlocked records a failed sign-in. user_id is nil because
// no row was created; the IdP subject + reason go in the payload so
// admin can correlate.
func (s *Server) auditOIDCBlocked(r *stdhttp.Request, claims *oidc.Claims, reason string) error {
return s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: nil, Actor: "system",
Action: "user.oidc_login_blocked", TargetKind: ptr("user"),
TargetID: nil, TS: time.Now().UTC(),
Payload: jsonMust(map[string]any{
"sub": claims.Subject,
"username": claims.PreferredUsername,
"reason": reason,
}),
})
}
// jsonMust marshals to json.RawMessage; on error returns nil so the
// audit row still lands without the payload (best-effort).
func jsonMust(v any) json.RawMessage {
b, err := json.Marshal(v)
if err != nil {
return nil
}
return json.RawMessage(b)
}
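Review note on handleOIDCLogin: `oidc.PKCEPair()` is not shown in this diff. For readers unfamiliar with PKCE, the following is a minimal sketch of the S256 method from RFC 7636, not the package's actual implementation. `pkcePair` and `challengeFor` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// challengeFor derives the S256 code challenge: the unpadded base64url
// encoding of SHA-256 over the ASCII verifier.
func challengeFor(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// pkcePair generates a random verifier (32 bytes, base64url without
// padding, 43 characters) and its matching challenge.
func pkcePair() (verifier, challenge string, err error) {
	var b [32]byte
	if _, err = rand.Read(b[:]); err != nil {
		return "", "", err
	}
	verifier = base64.RawURLEncoding.EncodeToString(b[:])
	return verifier, challengeFor(verifier), nil
}

func main() {
	v, c, err := pkcePair()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(v), len(c)) // 43 43
}
```

The challenge goes to the IdP in the authorization redirect, while the verifier stays server-side (persisted next to the hashed state in the handler above) and is sent only in the token exchange.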
-293
@@ -1,293 +0,0 @@
package http
import (
"bytes"
"context"
"encoding/json"
stdhttp "net/http"
"net/http/cookiejar"
"net/http/httptest"
"net/url"
"path/filepath"
"strings"
"testing"
"time"
"gitea.dcglab.co.uk/steve/restic-manager/internal/crypto"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/config"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/oidc"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/oidc/oidctest"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// newTestServerWithOIDC returns a Server wired to a stub IdP.
// Returned ts is the httptest.Server fronting the actual server;
// stub is the IdP for minting codes / configuring claims.
func newTestServerWithOIDC(t *testing.T) (*Server, *httptest.Server, *oidctest.StubIdP) {
t.Helper()
dir := t.TempDir()
st, err := store.Open(context.Background(), filepath.Join(dir, "rm.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { _ = st.Close() })
keyPath := filepath.Join(dir, "secret.key")
if err := crypto.GenerateKeyFile(keyPath); err != nil {
t.Fatalf("genkey: %v", err)
}
key, _ := crypto.LoadKeyFromFile(keyPath)
aead, _ := crypto.NewAEAD(key)
stub := oidctest.New(t)
cfg := &config.OIDCConfig{
Issuer: stub.URL(), ClientID: "test-client", ClientSecret: "x",
Scopes: []string{"openid"}, RoleClaim: "groups",
RoleMapping: map[string]string{
"rm-admins": "admin",
"rm-operators": "operator",
"rm-viewers": "viewer",
},
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
oidcClient, err := oidc.New(ctx, cfg, "http://test")
if err != nil {
t.Fatalf("oidc client: %v", err)
}
deps := Deps{
Cfg: config.Config{Listen: ":0", DataDir: dir, SecretKeyFile: keyPath, BaseURL: "http://test"},
Store: st,
AEAD: aead,
OIDC: oidcClient,
}
s := New(deps)
ts := httptest.NewServer(s.srv.Handler)
t.Cleanup(ts.Close)
return s, ts, stub
}
func TestOIDCLoginRedirectsToIdP(t *testing.T) {
t.Parallel()
srv, ts, _ := newTestServerWithOIDC(t)
c := &stdhttp.Client{CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
}}
res, err := c.Get(ts.URL + "/auth/oidc/login")
if err != nil {
t.Fatalf("get: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Errorf("status: got %d want 303", res.StatusCode)
}
loc := res.Header.Get("Location")
if !strings.Contains(loc, "code_challenge=") || !strings.Contains(loc, "state=") {
t.Errorf("location: %q", loc)
}
_ = srv
}
// runCallback drives the auth code flow against the stub: kicks off
// /auth/oidc/login (capturing the state), mints a code at the stub
// with the given claims, then GETs /auth/oidc/callback. Returns the
// final response.
func runCallback(t *testing.T, ts *httptest.Server, stub *oidctest.StubIdP, claims map[string]any) *stdhttp.Response {
t.Helper()
jar, _ := cookiejar.New(nil)
c := &stdhttp.Client{Jar: jar, CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
}}
res, err := c.Get(ts.URL + "/auth/oidc/login")
if err != nil {
t.Fatalf("login: %v", err)
}
res.Body.Close()
authURL, _ := url.Parse(res.Header.Get("Location"))
state := authURL.Query().Get("state")
code := stub.MintCode(claims)
res, err = c.Get(ts.URL + "/auth/oidc/callback?code=" + code + "&state=" + state)
if err != nil {
t.Fatalf("callback: %v", err)
}
return res
}
func TestOIDCCallbackHappyPathAdmin(t *testing.T) {
t.Parallel()
srv, ts, stub := newTestServerWithOIDC(t)
res := runCallback(t, ts, stub, map[string]any{
"sub": "admin-sub",
"preferred_username": "alice",
"email": "alice@example.com",
"groups": []string{"rm-admins"},
"aud": "test-client",
})
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther || res.Header.Get("Location") != "/" {
t.Errorf("status: %d Location: %q", res.StatusCode, res.Header.Get("Location"))
}
u, err := srv.deps.Store.GetUserByOIDCSubject(t.Context(), "admin-sub")
if err != nil || u.AuthSource != "oidc" || u.Role != "admin" || u.Username != "alice" {
t.Errorf("user: %+v err: %v", u, err)
}
}
func TestOIDCCallbackNoRoleMatchDeny(t *testing.T) {
t.Parallel()
_, ts, stub := newTestServerWithOIDC(t)
res := runCallback(t, ts, stub, map[string]any{
"sub": "other-sub",
"preferred_username": "bob",
"groups": []string{"something-else"},
"aud": "test-client",
})
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Errorf("status: got %d want 303", res.StatusCode)
}
loc := res.Header.Get("Location")
if !strings.Contains(loc, "oidc_error=no_role_match") {
t.Errorf("location: %q", loc)
}
}
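The role-mapping tests exercise a `RoleMapping` from IdP group names to roles, with a `no_role_match` error when nothing matches. A minimal sketch of one way such a lookup could work follows. The "highest-ranked match wins" rule and the names `resolveRole` and `roleRank` are assumptions made for illustration; the diff does not show how the real mapping resolves multiple groups.

```go
package main

// roleRank orders the roles so the strongest match can be picked.
// Assumption: admin > operator > viewer, as in the role middleware.
var roleRank = map[string]int{"viewer": 1, "operator": 2, "admin": 3}

// resolveRole maps a user's group claims to a role via mapping.
// It returns ("", false) when no group maps to a known role, which
// corresponds to the no_role_match outcome in the tests.
func resolveRole(groups []string, mapping map[string]string) (string, bool) {
	best, bestRank := "", 0
	for _, g := range groups {
		role, ok := mapping[g]
		if !ok {
			continue // group is not mentioned in the mapping
		}
		if r := roleRank[role]; r > bestRank {
			best, bestRank = role, r
		}
	}
	return best, bestRank > 0
}
```

Because the role is recomputed from the claims on each call, running the lookup at every sign-in would give the "returning user refreshes role" behaviour the later test expects.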
func TestOIDCCallbackUsernameCollision(t *testing.T) {
t.Parallel()
srv, ts, stub := newTestServerWithOIDC(t)
if err := srv.deps.Store.CreateUser(t.Context(), store.User{
ID: "local-alice", Username: "alice", PasswordHash: "x",
Role: store.RoleViewer, CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("seed: %v", err)
}
res := runCallback(t, ts, stub, map[string]any{
"sub": "remote-sub",
"preferred_username": "alice",
"groups": []string{"rm-admins"},
"aud": "test-client",
})
defer res.Body.Close()
loc := res.Header.Get("Location")
if !strings.Contains(loc, "oidc_error=username_taken") {
t.Errorf("location: %q", loc)
}
if _, err := srv.deps.Store.GetUserByOIDCSubject(t.Context(), "remote-sub"); err == nil {
t.Error("collision should not have provisioned a user")
}
}
func TestOIDCCallbackReturningUserRefreshesRole(t *testing.T) {
t.Parallel()
srv, ts, stub := newTestServerWithOIDC(t)
res := runCallback(t, ts, stub, map[string]any{
"sub": "carol-sub",
"preferred_username": "carol",
"groups": []string{"rm-operators"},
"aud": "test-client",
})
res.Body.Close()
res = runCallback(t, ts, stub, map[string]any{
"sub": "carol-sub",
"preferred_username": "carol",
"groups": []string{"rm-admins"},
"aud": "test-client",
})
res.Body.Close()
u, _ := srv.deps.Store.GetUserByOIDCSubject(t.Context(), "carol-sub")
if u.Role != "admin" {
t.Errorf("role refresh: got %q want admin", u.Role)
}
}
func TestOIDCLogoutRedirectsToEndSession(t *testing.T) {
t.Parallel()
srv, ts, stub := newTestServerWithOIDC(t)
endSessionURL := stub.URL() + "/logout-end"
stub.SetEndSessionEndpoint(endSessionURL)
// Rebuild the OIDC client because end_session_endpoint is read at
// New() time from the discovery doc.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
cfg := &config.OIDCConfig{
Issuer: stub.URL(), ClientID: "test-client", ClientSecret: "x",
Scopes: []string{"openid"}, RoleClaim: "groups",
RoleMapping: map[string]string{"rm-admins": "admin"},
}
newClient, err := oidc.New(ctx, cfg, "http://test")
if err != nil {
t.Fatalf("rebuild client: %v", err)
}
srv.deps.OIDC = newClient
// Sign in via the OIDC flow.
res := runCallback(t, ts, stub, map[string]any{
"sub": "logout-sub",
"preferred_username": "lo",
"groups": []string{"rm-admins"},
"aud": "test-client",
})
res.Body.Close()
cookies := res.Cookies()
if len(cookies) == 0 {
t.Fatal("expected session cookie after sign-in")
}
sessionCookie := cookies[0]
// POST /logout — should 303 to the end_session endpoint with
// id_token_hint + post_logout_redirect_uri.
c := &stdhttp.Client{CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
}}
req, _ := stdhttp.NewRequest("POST", ts.URL+"/logout", nil)
req.AddCookie(sessionCookie)
res, err = c.Do(req)
if err != nil {
t.Fatalf("logout: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Errorf("status: got %d want 303", res.StatusCode)
}
loc := res.Header.Get("Location")
if !strings.Contains(loc, "/logout-end") {
t.Errorf("location not at end_session: %q", loc)
}
if !strings.Contains(loc, "id_token_hint=") {
t.Errorf("location missing id_token_hint: %q", loc)
}
if !strings.Contains(loc, "post_logout_redirect_uri=") {
t.Errorf("location missing post_logout_redirect_uri: %q", loc)
}
}
func TestLocalLoginRejectsOIDCUser(t *testing.T) {
t.Parallel()
srv, urlBase := newTestServer(t, false)
uid := "u-oidc"
sub := "sub-x"
if err := srv.deps.Store.CreateUser(t.Context(), store.User{
ID: uid, Username: "ouser", PasswordHash: "",
Role: store.RoleOperator, CreatedAt: time.Now().UTC(),
AuthSource: "oidc", OIDCSubject: &sub,
}); err != nil {
t.Fatalf("create: %v", err)
}
body, _ := json.Marshal(map[string]string{
"username": "ouser", "password": "anything",
})
res, err := stdhttp.Post(urlBase+"/api/auth/login",
"application/json", bytes.NewReader(body))
if err != nil {
t.Fatalf("post: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusUnauthorized {
t.Errorf("status: got %d want 401", res.StatusCode)
}
}
-87
@@ -1,87 +0,0 @@
package http
import (
stdhttp "net/http"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ui"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// roleRank maps each role to a numeric tier so 'A is at least B'
// becomes 'roleRank[A] >= roleRank[B] && both are known'. Unknown
// roles are absent from the map, so a lookup for either argument
// fails closed.
var roleRank = map[store.Role]int{
store.RoleViewer: 1,
store.RoleOperator: 2,
store.RoleAdmin: 3,
}
// roleAtLeast reports whether `have` meets or exceeds `min` in the
// admin > operator > viewer hierarchy. Either side being an unknown
// role returns false.
func roleAtLeast(have, min store.Role) bool {
h, hok := roleRank[have]
m, mok := roleRank[min]
if !hok || !mok {
return false
}
return h >= m
}
// requireRole returns chi middleware that 403s any request whose
// session-resolved user doesn't meet the minimum role. Unauthenticated
// requests return 401 (JSON) or 303 → /login (HTML) so the caller
// gets a usable error rather than a confusing 403.
//
// The middleware re-reads the user row on every request. Do not be
// tempted to cache it: SQLite's WAL makes the lookup cheap, and
// admin-driven changes (disable, role change) need to take effect
// immediately.
func (s *Server) requireRole(min store.Role) func(stdhttp.Handler) stdhttp.Handler {
return func(next stdhttp.Handler) stdhttp.Handler {
return stdhttp.HandlerFunc(func(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u, ok := s.requireUser(r)
if !ok {
if isAPIPath(r) {
writeJSONError(w, stdhttp.StatusUnauthorized, "unauthorised", "")
return
}
stdhttp.Redirect(w, r, "/login", stdhttp.StatusSeeOther)
return
}
if !roleAtLeast(u.Role, min) {
if isAPIPath(r) {
writeJSONError(w, stdhttp.StatusForbidden, "insufficient_role", "")
return
}
renderForbiddenHTML(s, w, r, u, min)
return
}
next.ServeHTTP(w, r)
})
}
}
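The decision table in `requireRole` (401 or 303 when unauthenticated, 403 when the role is too low, pass-through otherwise) can be sketched with the standard library alone. This is a simplified illustration, not the server's code: `lookupRole` is a hypothetical resolver that reads a demo header instead of a session cookie, the 403 path always returns plain text rather than rendering the forbidden page, and `statusFor` exists only to exercise the middleware.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
)

// Role and rank stand in for store.Role and roleRank.
type Role string

var rank = map[Role]int{"viewer": 1, "operator": 2, "admin": 3}

// atLeast fails closed: an unknown role on either side yields false.
func atLeast(have, min Role) bool {
	h, hok := rank[have]
	m, mok := rank[min]
	return hok && mok && h >= m
}

// lookupRole is a hypothetical resolver. It trusts a demo header purely
// for illustration; a real server would resolve the session instead.
func lookupRole(r *http.Request) (Role, bool) {
	v := r.Header.Get("X-Demo-Role")
	return Role(v), v != ""
}

// requireRole gates next behind a minimum role.
func requireRole(min Role) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			role, ok := lookupRole(r)
			isAPI := strings.HasPrefix(r.URL.Path, "/api/")
			switch {
			case !ok && isAPI:
				http.Error(w, "unauthorised", http.StatusUnauthorized)
			case !ok:
				http.Redirect(w, r, "/login", http.StatusSeeOther)
			case !atLeast(role, min):
				http.Error(w, "insufficient_role", http.StatusForbidden)
			default:
				next.ServeHTTP(w, r)
			}
		})
	}
}

// statusFor sends one GET through an operator-gated handler and
// returns the resulting status code.
func statusFor(path, role string) int {
	h := requireRole("operator")(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if role != "" {
		req.Header.Set("X-Demo-Role", role)
	}
	rr := httptest.NewRecorder()
	h.ServeHTTP(rr, req)
	return rr.Code
}
```

Keeping the API-versus-HTML branch inside one middleware mirrors the design note on `isAPIPath`: a single wrapper serves both JSON and browser callers.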
// isAPIPath reports whether the path lives under /api/. Lets one
// middleware return JSON or HTML appropriately without two near-
// identical wrappers.
func isAPIPath(r *stdhttp.Request) bool {
p := r.URL.Path
return len(p) >= 5 && p[:5] == "/api/"
}
// renderForbiddenHTML emits a small "you don't have permission"
// panel inside the chrome so the user keeps their nav and can
// move away to a page they can see.
func renderForbiddenHTML(s *Server, w stdhttp.ResponseWriter, r *stdhttp.Request, u *store.User, min store.Role) {
w.WriteHeader(stdhttp.StatusForbidden)
view := s.baseView(r, &ui.User{ID: u.ID, Username: u.Username, Role: string(u.Role)})
view.Title = "Forbidden · restic-manager"
view.Page = struct {
Required string
Have string
}{Required: string(min), Have: string(u.Role)}
if err := s.deps.UI.Render(w, "forbidden", view); err != nil {
_, _ = w.Write([]byte("403 Forbidden — your role does not permit this page."))
}
}
-162
@@ -1,162 +0,0 @@
package http
import (
"bytes"
"encoding/json"
stdhttp "net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
func TestRoleAtLeast(t *testing.T) {
t.Parallel()
cases := []struct {
have store.Role
min store.Role
want bool
}{
{store.RoleViewer, store.RoleViewer, true},
{store.RoleOperator, store.RoleViewer, true},
{store.RoleAdmin, store.RoleViewer, true},
{store.RoleAdmin, store.RoleOperator, true},
{store.RoleAdmin, store.RoleAdmin, true},
{store.RoleViewer, store.RoleOperator, false},
{store.RoleViewer, store.RoleAdmin, false},
{store.RoleOperator, store.RoleAdmin, false},
{store.Role("nonsense"), store.RoleViewer, false},
{store.RoleAdmin, store.Role("nonsense"), false},
}
for _, c := range cases {
got := roleAtLeast(c.have, c.min)
if got != c.want {
t.Errorf("have=%q min=%q: got %v want %v", c.have, c.min, got, c.want)
}
}
}
func TestRequireRoleViewerAdmits(t *testing.T) {
t.Parallel()
srv, _ := newTestServer(t, false)
uid := makeUser(t, srv, "viewer1", store.RoleViewer)
cookie := loginAs(t, srv, uid)
mid := srv.requireRole(store.RoleViewer)
h := mid(stdhttp.HandlerFunc(func(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
w.WriteHeader(stdhttp.StatusOK)
}))
rr := httptest.NewRecorder()
req, _ := stdhttp.NewRequest("GET", "/api/dummy", nil)
req.AddCookie(cookie)
h.ServeHTTP(rr, req)
if rr.Code != stdhttp.StatusOK {
t.Errorf("status: got %d want 200", rr.Code)
}
}
func TestRequireRoleViewerRejectedFromOperator(t *testing.T) {
t.Parallel()
srv, _ := newTestServer(t, false)
uid := makeUser(t, srv, "viewer2", store.RoleViewer)
cookie := loginAs(t, srv, uid)
mid := srv.requireRole(store.RoleOperator)
h := mid(stdhttp.HandlerFunc(func(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
w.WriteHeader(stdhttp.StatusOK)
}))
rr := httptest.NewRecorder()
req, _ := stdhttp.NewRequest("GET", "/api/dummy", nil)
req.AddCookie(cookie)
h.ServeHTTP(rr, req)
if rr.Code != stdhttp.StatusForbidden {
t.Errorf("status: got %d want 403", rr.Code)
}
if !strings.Contains(rr.Body.String(), "insufficient_role") {
t.Errorf("body: got %q", rr.Body.String())
}
}
func TestRequireRoleUnauthenticated401OnAPI(t *testing.T) {
t.Parallel()
srv, _ := newTestServer(t, false)
mid := srv.requireRole(store.RoleViewer)
h := mid(stdhttp.HandlerFunc(func(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
w.WriteHeader(stdhttp.StatusOK)
}))
rr := httptest.NewRecorder()
req, _ := stdhttp.NewRequest("GET", "/api/dummy", nil)
h.ServeHTTP(rr, req)
if rr.Code != stdhttp.StatusUnauthorized {
t.Errorf("status: got %d want 401", rr.Code)
}
}
func TestRequireRoleRejectsDisabledMidSession(t *testing.T) {
t.Parallel()
srv, urlBase := newTestServer(t, false)
uid := makeUser(t, srv, "victim", store.RoleOperator)
cookie := loginAs(t, srv, uid)
// Disable the user *while their session is still valid*.
if err := srv.deps.Store.DisableUser(t.Context(), uid, time.Now().UTC()); err != nil {
t.Fatalf("disable: %v", err)
}
req, _ := stdhttp.NewRequest("GET", urlBase+"/api/hosts", nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("GET: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusUnauthorized {
t.Errorf("status: got %d want 401", res.StatusCode)
}
}
func TestLoginRejectsDisabledUser(t *testing.T) {
t.Parallel()
srv, urlBase := newTestServer(t, false)
uid := makeUser(t, srv, "disabled1", store.RoleOperator)
if err := srv.deps.Store.DisableUser(t.Context(), uid, time.Now().UTC()); err != nil {
t.Fatalf("disable: %v", err)
}
body, _ := json.Marshal(map[string]string{
"username": "disabled1", "password": "test-password",
})
res, err := stdhttp.Post(urlBase+"/api/auth/login", "application/json", bytes.NewReader(body))
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusUnauthorized {
t.Errorf("status: got %d want 401", res.StatusCode)
}
}
func TestAdminBandRejectsOperator(t *testing.T) {
t.Parallel()
srv, urlBase := newTestServer(t, false)
makeUser(t, srv, "admin1", store.RoleAdmin)
opID := makeUser(t, srv, "op1", store.RoleOperator)
cookie := loginAs(t, srv, opID)
req, _ := stdhttp.NewRequest("GET", urlBase+"/api/users", nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("GET: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusForbidden {
t.Errorf("status: got %d want 403", res.StatusCode)
}
}
+211 -174
@@ -17,7 +17,6 @@ import (
"gitea.dcglab.co.uk/steve/restic-manager/internal/crypto"
"gitea.dcglab.co.uk/steve/restic-manager/internal/notification"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/config"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/oidc"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ui"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ws"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
@@ -46,9 +45,6 @@ type Deps struct {
// admin-bootstrap token printed in the server logs. While set, the
// /bootstrap endpoint accepts it to create the first admin user.
BootstrapToken string
// OIDC (optional). Non-nil when the operator has configured an
// IdP — handlers under /auth/oidc/* are mounted only when set.
OIDC *oidc.Client
}
// Server is the running HTTP server.
@@ -89,6 +85,11 @@ func New(deps Deps) *Server {
r.Use(middleware.Recoverer)
r.Use(requestLogger)
// Health endpoint — unauthenticated, no audit, deliberately cheap.
r.Get("/healthz", func(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
w.WriteHeader(stdhttp.StatusNoContent)
})
s := &Server{
deps: deps,
drainLocks: make(map[string]*sync.Mutex),
@@ -112,17 +113,129 @@ func New(deps Deps) *Server {
// routes wires the API tree. Subtrees live in this file by area so a
// reader can scan one place and see the surface.
func (s *Server) routes(r chi.Router) {
// Public, unauthenticated.
r.Get("/healthz", func(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
w.WriteHeader(stdhttp.StatusNoContent)
r.Route("/api", func(r chi.Router) {
r.Post("/auth/login", s.handleLogin)
r.Post("/auth/logout", s.handleLogout)
r.Post("/bootstrap", s.handleBootstrap)
// Agent enrollment (open endpoint — token is the credential).
r.Post("/agents/enroll", s.handleAgentEnroll)
// Announce-and-approve enrolment (open endpoint — fingerprint
// comparison in the UI is the gate). Per-IP rate-limited and
// globally capped (P2-18).
r.Post("/agents/announce", s.handleAnnounce)
// Pending host management — admin-only (gated inside the handler).
r.Post("/pending-hosts/{id}/accept", s.handleAcceptPendingHost)
r.Post("/pending-hosts/{id}/reject", s.handleRejectPendingHost)
// Operator → server (authenticated). Spec.md §6.1's
// /hosts/{id}/enrollment-token (regenerate) lands when the
// host page can call it; for now just the create endpoint.
r.Post("/enrollment-tokens", s.handleCreateEnrollmentToken)
// Fleet read endpoints — back the dashboard.
r.Get("/hosts", s.handleListHosts)
r.Get("/fleet/summary", s.handleFleetSummary)
// Run-now: dispatch a job to a host's agent.
r.Post("/hosts/{id}/jobs", s.handleRunNow)
// Snapshot projection (refreshed by the agent after each backup).
r.Get("/hosts/{id}/snapshots", s.handleListHostSnapshots)
// Repo credentials — operator can edit after enrollment. The
// initial set is supplied at token-mint time (see enrollment.go).
// GET returns a redacted view (URL, username, has_password).
r.Get("/hosts/{id}/repo-credentials", s.handleGetHostCredentials)
r.Put("/hosts/{id}/repo-credentials", s.handleSetHostCredentials)
// Admin credentials — the prune-capable slot (separate from the
// everyday repo creds). Optional: hosts that don't prune against
// a rest-server repo with a separate admin user never need this.
r.Get("/hosts/{id}/admin-credentials", s.handleGetAdminCredentials)
r.Put("/hosts/{id}/admin-credentials", s.handleSetAdminCredentials)
r.Delete("/hosts/{id}/admin-credentials", s.handleDeleteAdminCredentials)
// Per-host schedule CRUD. Mutations bump host_schedule_version
// and async-push to a connected agent (see schedule_push.go).
r.Get("/hosts/{id}/schedules", s.handleListSchedules)
r.Post("/hosts/{id}/schedules", s.handleCreateSchedule)
r.Put("/hosts/{id}/schedules/{sid}", s.handleUpdateSchedule)
r.Delete("/hosts/{id}/schedules/{sid}", s.handleDeleteSchedule)
// Source-group CRUD. A group is "what gets backed up" — paths,
// excludes, retention, retry. Group name doubles as the
// snapshot tag (restic --tag <name>).
r.Get("/hosts/{id}/source-groups", s.handleListSourceGroups)
r.Post("/hosts/{id}/source-groups", s.handleCreateSourceGroup)
r.Get("/hosts/{id}/source-groups/{gid}", s.handleGetSourceGroup)
r.Put("/hosts/{id}/source-groups/{gid}", s.handleUpdateSourceGroup)
r.Delete("/hosts/{id}/source-groups/{gid}", s.handleDeleteSourceGroup)
// Repo maintenance cadences (forget / prune / check). Driven
// by the server-side ticker (P2R-06), not the agent's cron.
r.Get("/hosts/{id}/repo-maintenance", s.handleGetRepoMaintenance)
r.Put("/hosts/{id}/repo-maintenance", s.handleUpdateRepoMaintenance)
// Host-wide bandwidth caps (host.bandwidth_up_kbps /
// bandwidth_down_kbps). Apply to every restic invocation.
r.Put("/hosts/{id}/bandwidth", s.handleUpdateHostBandwidth)
// Per-source-group Run-now (JSON variant). HTMX action is
// mounted at the equivalent path outside /api below — both
// resolve to the same handler, which sniffs HX-Request.
r.Post("/hosts/{id}/source-groups/{gid}/run", s.handleRunSourceGroup)
// Repo-level run-now: prune (needs admin creds), check, unlock.
// HTMX forms are also mounted outside /api below.
r.Post("/hosts/{id}/repo/prune", s.handleRunRepoPrune)
r.Post("/hosts/{id}/repo/check", s.handleRunRepoCheck)
r.Post("/hosts/{id}/repo/unlock", s.handleRunRepoUnlock)
// Cancel a running job. Operator-driven, sends command.cancel
// to the agent which kills the restic subprocess; the agent's
// resulting job.finished (status=canceled) is what flips the
// job row.
r.Post("/jobs/{id}/cancel", s.handleCancelJob)
// Snapshot diff (P3-09). Dispatches a JobDiff against two
// snapshots; output streams to the standard live job page.
r.Post("/hosts/{id}/snapshots/diff", s.handleSnapshotDiff)
// Alert list (JSON variant). Same filter shape as the UI page.
r.Get("/alerts", s.handleAPIAlerts)
// Notification channel test-fire. Dispatches a synthetic payload
// through a single named channel; returns JSON result.
r.Post("/notifications/{id}/test", s.handleAPINotificationTest)
})
r.Post("/api/auth/login", s.handleLogin)
r.Post("/api/auth/logout", s.handleLogout)
r.Post("/api/bootstrap", s.handleBootstrap)
r.Post("/api/agents/enroll", s.handleAgentEnroll)
r.Post("/api/agents/announce", s.handleAnnounce)
r.Get("/agent/binary", s.handleAgentBinary)
r.Get("/install/*", s.handleInstallAsset)
// HTMX form variant of diff (mounted outside /api so HTMX forms
// can post against it without the api/ prefix).
r.Post("/hosts/{id}/snapshots/diff", s.handleSnapshotDiff)
// Per-source-group Run-now (HTMX form action). Available even
// when the server is started without UI templates so REST callers
// against the non-/api path also work.
r.Post("/hosts/{id}/source-groups/{gid}/run", s.handleRunSourceGroup)
// Repo-level run-now (HTMX form actions). Same handlers as the /api
// variants — wantsHTML sniff distinguishes JSON vs HTMX response.
r.Post("/hosts/{id}/repo/prune", s.handleRunRepoPrune)
r.Post("/hosts/{id}/repo/check", s.handleRunRepoCheck)
r.Post("/hosts/{id}/repo/unlock", s.handleRunRepoUnlock)
// Retired routes — see ui_handlers.go for the messages. Mounted
// outside the UI gate so cached browser tabs get a clear 410
// even if the server runs without templates.
r.Post("/hosts/{id}/run-backup", s.handleUIRunBackupGone)
r.Post("/hosts/{id}/init-repo", s.handleUIInitRepoGone)
// Pending-host WebSocket (announce-and-approve, P2-18b). Mounted
// before /ws/agent so the more-specific route matches first.
r.Get("/ws/agent/pending", s.handlePendingWS)
// Agent ↔ server WebSocket. Bearer-authenticated inside the handler.
if s.deps.Hub != nil {
r.Mount("/ws/agent", ws.AgentHandler(ws.HandlerDeps{
Hub: s.deps.Hub,
@@ -134,174 +247,98 @@ func (s *Server) routes(r chi.Router) {
OnScheduleFire: s.dispatchScheduledJob,
}))
}
r.Get("/ws/agent/pending", s.handlePendingWS)
// Agent binaries + install scripts. Open endpoints — content is
// unprivileged on its own, gating happens via the enrollment
// token. See agent_assets.go.
r.Get("/agent/binary", s.handleAgentBinary)
r.Get("/install/*", s.handleInstallAsset)
// Static assets (Tailwind CSS bundle, future favicon).
r.Mount("/static/", staticHandler())
// POST /logout is always mounted — it handles both local and OIDC
// sessions and doesn't require the UI renderer.
r.Post("/logout", s.handleUILogoutPost)
// HTML UI. The renderer is required — fail loud if the binary
// was built without templates (impossible in practice given
// embed, but guards bad test wiring).
if s.deps.UI != nil {
r.Get("/bootstrap", s.handleUIBootstrapGet)
r.Post("/bootstrap", s.handleUIBootstrapPost)
r.Get("/", s.handleUIDashboard)
r.Get("/login", s.handleUILoginGet)
r.Post("/login", s.handleUILoginPost)
r.Get("/setup", s.handleUISetupGet)
r.Post("/setup", s.handleUISetupPost)
}
if s.deps.OIDC != nil {
r.Get("/auth/oidc/login", s.handleOIDCLogin)
r.Get("/auth/oidc/callback", s.handleOIDCCallback)
r.Post("/logout", s.handleUILogoutPost)
// Per-host Run-now and manual Init-repo are mounted at the
// outer router (so they reply 410 even without UI). Per-
// source-group Run-now lives there too — same reason.
// Add host flow.
r.Get("/hosts/new", s.handleUIAddHostGet)
r.Post("/hosts/new", s.handleUIAddHostPost)
// Durable post-Add-host page (operator can refresh / come
// back; password decrypted from the token row each render).
// Polled fragment under /awaiting flips to "connected" once
// the agent enrols.
r.Get("/hosts/pending/{token}", s.handleUIPendingHost)
r.Get("/hosts/pending/{token}/awaiting", s.handleUIPendingAwaiting)
// Host detail (Snapshots tab is the default).
r.Get("/hosts/{id}", s.handleUIHostDetail)
// Sources tab + source-group CRUD forms.
r.Get("/hosts/{id}/sources", s.handleUIHostSources)
r.Get("/hosts/{id}/sources/new", s.handleUISourceGroupNewGet)
r.Post("/hosts/{id}/sources/new", s.handleUISourceGroupSave)
r.Get("/hosts/{id}/sources/{gid}/edit", s.handleUISourceGroupEditGet)
r.Post("/hosts/{id}/sources/{gid}/edit", s.handleUISourceGroupSave)
r.Post("/hosts/{id}/sources/{gid}/delete", s.handleUISourceGroupDelete)
// Repo tab — connection / bandwidth / maintenance. Three
// independent forms so saving one doesn't touch the others.
r.Get("/hosts/{id}/repo", s.handleUIHostRepo)
r.Post("/hosts/{id}/repo/credentials", s.handleUIRepoCredentialsSave)
r.Post("/hosts/{id}/repo/bandwidth", s.handleUIRepoBandwidthSave)
r.Post("/hosts/{id}/repo/maintenance", s.handleUIRepoMaintenanceSave)
r.Post("/hosts/{id}/repo/reinit", s.handleUIRepoReinit)
r.Post("/hosts/{id}/repo/hooks", s.handleUIRepoHooksSave)
// Admin credentials form (separate slot for prune-capable user).
r.Post("/hosts/{id}/admin-credentials", s.handleUIAdminCredentialsSave)
r.Post("/hosts/{id}/admin-credentials/delete", s.handleUIAdminCredentialsDelete)
// Schedules tab + create/edit/delete forms.
r.Get("/hosts/{id}/schedules", s.handleUISchedulesList)
r.Get("/hosts/{id}/schedules/new", s.handleUIScheduleNewGet)
r.Post("/hosts/{id}/schedules/new", s.handleUIScheduleSave)
r.Get("/hosts/{id}/schedules/{sid}/edit", s.handleUIScheduleEditGet)
r.Post("/hosts/{id}/schedules/{sid}/edit", s.handleUIScheduleSave)
r.Post("/hosts/{id}/schedules/{sid}/delete", s.handleUIScheduleDelete)
r.Post("/hosts/{id}/schedules/{sid}/run", s.handleUIScheduleRun)
// Live job log.
r.Get("/jobs/{id}", s.handleUIJobDetail)
// Restore wizard (P3-01/P3-02). Two GET variants land on the
// same handler; the second deep-links a chosen snapshot.
r.Get("/hosts/{id}/restore", s.handleUIRestoreGet)
r.Get("/hosts/{id}/snapshots/{sid}/restore", s.handleUIRestoreGet)
r.Post("/hosts/{id}/restore", s.handleUIRestorePost)
r.Get("/hosts/{id}/restore/tree", s.handleUIRestoreTree)
// Alerts list + operator actions.
r.Get("/alerts", s.handleUIAlerts)
r.Post("/alerts/{id}/acknowledge", s.handleUIAlertAcknowledge)
r.Post("/alerts/{id}/resolve", s.handleUIAlertResolve)
// Settings shell + Notifications sub-tab CRUD.
r.Get("/settings", s.handleUISettings)
r.Get("/settings/notifications", s.handleUINotificationsList)
r.Get("/settings/notifications/new", s.handleUINotificationNewGet)
r.Post("/settings/notifications/new", s.handleUINotificationNewPost)
r.Get("/settings/notifications/{id}/edit", s.handleUINotificationEditGet)
r.Post("/settings/notifications/{id}/edit", s.handleUINotificationEditPost)
r.Post("/settings/notifications/{id}/delete", s.handleUINotificationDelete)
r.Post("/settings/notifications/{id}/toggle", s.handleUINotificationToggle)
}
// Viewer band — anyone authenticated can read.
r.Group(func(r chi.Router) {
r.Use(s.requireRole(store.RoleViewer))
// Browser job-log stream (separate from /ws/agent so the auth
// layer is session-cookie not bearer). Mounted regardless of
// whether the UI is up — JSON callers may also subscribe.
if s.deps.JobHub != nil {
r.Get("/api/jobs/{id}/stream", s.handleJobStream)
}
// Read APIs.
r.Get("/api/hosts", s.handleListHosts)
r.Get("/api/fleet/summary", s.handleFleetSummary)
r.Get("/api/hosts/{id}/snapshots", s.handleListHostSnapshots)
r.Get("/api/hosts/{id}/repo-credentials", s.handleGetHostCredentials)
r.Get("/api/hosts/{id}/admin-credentials", s.handleGetAdminCredentials)
r.Get("/api/hosts/{id}/schedules", s.handleListSchedules)
r.Get("/api/hosts/{id}/source-groups", s.handleListSourceGroups)
r.Get("/api/hosts/{id}/source-groups/{gid}", s.handleGetSourceGroup)
r.Get("/api/hosts/{id}/repo-maintenance", s.handleGetRepoMaintenance)
r.Get("/api/alerts", s.handleAPIAlerts)
r.Get("/api/audit", s.handleAPIAudit)
r.Post("/api/account/password", s.handleAPIAccountPassword)
// Job log stream + download (read-only; any authenticated user).
if s.deps.JobHub != nil {
r.Get("/api/jobs/{id}/stream", s.handleJobStream)
}
r.Get("/api/jobs/{id}/log.{format:txt|ndjson}", s.handleJobLogDownload)
if s.deps.UI != nil {
r.Get("/", s.handleUIDashboard)
r.Get("/hosts/{id}", s.handleUIHostDetail)
r.Get("/hosts/{id}/sources", s.handleUIHostSources)
r.Get("/hosts/{id}/sources/new", s.handleUISourceGroupNewGet)
r.Get("/hosts/{id}/sources/{gid}/edit", s.handleUISourceGroupEditGet)
r.Get("/hosts/{id}/repo", s.handleUIHostRepo)
r.Get("/hosts/{id}/schedules", s.handleUISchedulesList)
r.Get("/hosts/{id}/schedules/new", s.handleUIScheduleNewGet)
r.Get("/hosts/{id}/schedules/{sid}/edit", s.handleUIScheduleEditGet)
r.Get("/jobs/{id}", s.handleUIJobDetail)
r.Get("/hosts/{id}/restore", s.handleUIRestoreGet)
r.Get("/hosts/{id}/snapshots/{sid}/restore", s.handleUIRestoreGet)
r.Get("/hosts/{id}/restore/tree", s.handleUIRestoreTree)
r.Get("/alerts", s.handleUIAlerts)
r.Get("/audit", s.handleUIAudit)
r.Get("/audit.csv", s.handleUIAuditCSV)
r.Get("/settings/account", s.handleUIAccountGet)
r.Post("/settings/account", s.handleUIAccountPost)
}
})
// Operator band — mutating endpoints up to backup ops.
r.Group(func(r chi.Router) {
r.Use(s.requireRole(store.RoleOperator))
// Pending hosts approval.
r.Post("/api/pending-hosts/{id}/accept", s.handleAcceptPendingHost)
r.Post("/api/pending-hosts/{id}/reject", s.handleRejectPendingHost)
r.Post("/api/enrollment-tokens", s.handleCreateEnrollmentToken)
r.Post("/hosts/enrollment-tokens/{hash}/regenerate", s.handleUIEnrollmentTokenRegenerate)
r.Post("/hosts/enrollment-tokens/{hash}/revoke", s.handleUIEnrollmentTokenRevoke)
// Run-now, restore, repo ops (JSON).
r.Post("/api/hosts/{id}/jobs", s.handleRunNow)
r.Put("/api/hosts/{id}/repo-credentials", s.handleSetHostCredentials)
r.Put("/api/hosts/{id}/admin-credentials", s.handleSetAdminCredentials)
r.Delete("/api/hosts/{id}/admin-credentials", s.handleDeleteAdminCredentials)
r.Post("/api/hosts/{id}/schedules", s.handleCreateSchedule)
r.Put("/api/hosts/{id}/schedules/{sid}", s.handleUpdateSchedule)
r.Delete("/api/hosts/{id}/schedules/{sid}", s.handleDeleteSchedule)
r.Post("/api/hosts/{id}/source-groups", s.handleCreateSourceGroup)
r.Put("/api/hosts/{id}/source-groups/{gid}", s.handleUpdateSourceGroup)
r.Delete("/api/hosts/{id}/source-groups/{gid}", s.handleDeleteSourceGroup)
r.Put("/api/hosts/{id}/repo-maintenance", s.handleUpdateRepoMaintenance)
r.Put("/api/hosts/{id}/bandwidth", s.handleUpdateHostBandwidth)
r.Post("/api/hosts/{id}/source-groups/{gid}/run", s.handleRunSourceGroup)
r.Post("/api/hosts/{id}/repo/prune", s.handleRunRepoPrune)
r.Post("/api/hosts/{id}/repo/check", s.handleRunRepoCheck)
r.Post("/api/hosts/{id}/repo/unlock", s.handleRunRepoUnlock)
r.Post("/api/jobs/{id}/cancel", s.handleCancelJob)
r.Post("/api/hosts/{id}/snapshots/diff", s.handleSnapshotDiff)
// HTMX form variants outside /api.
r.Post("/hosts/{id}/snapshots/diff", s.handleSnapshotDiff)
r.Post("/hosts/{id}/source-groups/{gid}/run", s.handleRunSourceGroup)
r.Post("/hosts/{id}/repo/prune", s.handleRunRepoPrune)
r.Post("/hosts/{id}/repo/check", s.handleRunRepoCheck)
r.Post("/hosts/{id}/repo/unlock", s.handleRunRepoUnlock)
r.Post("/hosts/{id}/run-backup", s.handleUIRunBackupGone)
r.Post("/hosts/{id}/init-repo", s.handleUIInitRepoGone)
if s.deps.UI != nil {
r.Get("/hosts/new", s.handleUIAddHostGet)
r.Post("/hosts/new", s.handleUIAddHostPost)
r.Get("/hosts/pending/{token}", s.handleUIPendingHost)
r.Get("/hosts/pending/{token}/awaiting", s.handleUIPendingAwaiting)
r.Post("/hosts/{id}/sources/new", s.handleUISourceGroupSave)
r.Post("/hosts/{id}/sources/{gid}/edit", s.handleUISourceGroupSave)
r.Post("/hosts/{id}/sources/{gid}/delete", s.handleUISourceGroupDelete)
r.Post("/hosts/{id}/repo/credentials", s.handleUIRepoCredentialsSave)
r.Post("/hosts/{id}/repo/bandwidth", s.handleUIRepoBandwidthSave)
r.Post("/hosts/{id}/repo/maintenance", s.handleUIRepoMaintenanceSave)
r.Post("/hosts/{id}/repo/reinit", s.handleUIRepoReinit)
r.Post("/hosts/{id}/repo/probe", s.handleUIRepoProbe)
r.Post("/hosts/{id}/repo/hooks", s.handleUIRepoHooksSave)
r.Post("/hosts/{id}/tags", s.handleUIHostTagsSave)
r.Post("/hosts/{id}/admin-credentials", s.handleUIAdminCredentialsSave)
r.Post("/hosts/{id}/admin-credentials/delete", s.handleUIAdminCredentialsDelete)
r.Post("/hosts/{id}/schedules/new", s.handleUIScheduleSave)
r.Post("/hosts/{id}/schedules/{sid}/edit", s.handleUIScheduleSave)
r.Post("/hosts/{id}/schedules/{sid}/delete", s.handleUIScheduleDelete)
r.Post("/hosts/{id}/schedules/{sid}/run", s.handleUIScheduleRun)
r.Post("/hosts/{id}/restore", s.handleUIRestorePost)
r.Post("/alerts/{id}/acknowledge", s.handleUIAlertAcknowledge)
r.Post("/alerts/{id}/resolve", s.handleUIAlertResolve)
}
})
// Admin band — channels, server-shape config.
r.Group(func(r chi.Router) {
r.Use(s.requireRole(store.RoleAdmin))
r.Get("/api/users", s.handleAPIUsersList)
r.Post("/api/users", s.handleAPIUserCreate)
r.Get("/api/users/{id}", s.handleAPIUserGet)
r.Patch("/api/users/{id}", s.handleAPIUserPatch)
r.Post("/api/users/{id}/disable", s.handleAPIUserDisable)
r.Post("/api/users/{id}/enable", s.handleAPIUserEnable)
r.Post("/api/users/{id}/regenerate-setup", s.handleAPIUserRegenerateSetup)
r.Post("/api/users/{id}/force-logout", s.handleAPIUserForceLogout)
r.Post("/api/notifications/{id}/test", s.handleAPINotificationTest)
if s.deps.UI != nil {
r.Post("/hosts/{id}/delete", s.handleUIHostDelete)
r.Get("/settings", s.handleUISettings)
r.Get("/settings/users", s.handleUIUsersList)
r.Get("/settings/users/new", s.handleUIUserNewGet)
r.Post("/settings/users/new", s.handleUIUserNewPost)
r.Get("/settings/users/{id}/edit", s.handleUIUserEditGet)
r.Post("/settings/users/{id}/edit", s.handleUIUserEditPost)
r.Post("/settings/users/{id}/disable", s.handleUIUserDisablePost)
r.Post("/settings/users/{id}/enable", s.handleUIUserEnablePost)
r.Post("/settings/users/{id}/regenerate-setup", s.handleUIUserRegenerateSetupPost)
r.Post("/settings/users/{id}/force-logout", s.handleUIUserForceLogoutPost)
r.Get("/settings/users/{id}/setup-link", s.handleUIUserSetupLinkGet)
r.Get("/settings/notifications", s.handleUINotificationsList)
r.Get("/settings/notifications/new", s.handleUINotificationNewGet)
r.Post("/settings/notifications/new", s.handleUINotificationNewPost)
r.Get("/settings/notifications/{id}/edit", s.handleUINotificationEditGet)
r.Post("/settings/notifications/{id}/edit", s.handleUINotificationEditPost)
r.Post("/settings/notifications/{id}/delete", s.handleUINotificationDelete)
r.Post("/settings/notifications/{id}/toggle", s.handleUINotificationToggle)
}
})
// Job log download (txt + ndjson). Source of truth is the
// persisted job_logs table; safe to call any time, no pause
// needed against the live stream.
r.Get("/api/jobs/{id}/log.{format:txt|ndjson}", s.handleJobLogDownload)
}
// Start begins listening. Blocks until ListenAndServe returns
-177
@@ -1,177 +0,0 @@
// setup_handler.go — public landing page for the user-setup link
// emitted by the admin's "+ Add user" / "Regenerate setup link" flow.
//
// Routes (wired in server.go):
//
// GET /setup → handleUISetupGet
// POST /setup → handleUISetupPost (lands in Task D2)
//
// The token in the querystring (`?token=<raw>`) is the credential.
// Auth middleware does not run on these routes.
package http
import (
"crypto/sha256"
"encoding/hex"
"log/slog"
stdhttp "net/http"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ui"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
type setupPage struct {
Username string
Token string // round-tripped to the POST form
Error string // displayed when password validation fails or token is invalid
}
// hashSetupToken is the canonical hashing for setup tokens. Must
// match what the admin handler uses when SetSetupToken is called,
// so the digest at rest matches what GET /setup hashes.
func hashSetupToken(raw string) string {
h := sha256.Sum256([]byte(raw))
return hex.EncodeToString(h[:])
}
func (s *Server) handleUISetupGet(w stdhttp.ResponseWriter, r *stdhttp.Request) {
raw := r.URL.Query().Get("token")
if raw == "" {
s.renderSetupExpired(w, r)
return
}
tok, err := s.deps.Store.LookupSetupToken(r.Context(), hashSetupToken(raw))
if err != nil {
s.renderSetupExpired(w, r)
return
}
if tok.ExpiresAt.Before(time.Now().UTC()) {
s.renderSetupExpired(w, r)
return
}
u, err := s.deps.Store.GetUserByID(r.Context(), tok.UserID)
if err != nil {
s.renderSetupExpired(w, r)
return
}
view := s.baseView(r, nil)
view.Title = "Set your password · restic-manager"
view.Page = setupPage{Username: u.Username, Token: raw}
if err := s.deps.UI.Render(w, "setup", view); err != nil {
slog.Error("ui setup: render", "err", err)
}
}
func (s *Server) renderSetupExpired(w stdhttp.ResponseWriter, r *stdhttp.Request) {
w.WriteHeader(stdhttp.StatusGone)
view := s.baseView(r, nil)
view.Title = "Link expired · restic-manager"
view.Page = setupPage{Error: "expired"}
_ = s.deps.UI.Render(w, "setup", view)
_ = ui.User{} // keep ui import alive
}
func (s *Server) handleUISetupPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
raw := r.PostForm.Get("token")
pw := r.PostForm.Get("password")
pw2 := r.PostForm.Get("password_confirm")
if raw == "" {
s.renderSetupExpired(w, r)
return
}
if pw == "" || pw2 == "" || pw != pw2 || len(pw) < 12 {
s.renderSetupForm(w, r, raw, "Passwords must match and be at least 12 characters.")
return
}
tok, err := s.deps.Store.LookupSetupToken(r.Context(), hashSetupToken(raw))
if err != nil || tok.ExpiresAt.Before(time.Now().UTC()) {
s.renderSetupExpired(w, r)
return
}
u, err := s.deps.Store.GetUserByID(r.Context(), tok.UserID)
if err != nil {
s.renderSetupExpired(w, r)
return
}
hash, err := auth.HashPassword(pw)
if err != nil {
slog.Error("setup: hash password", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if err := s.deps.Store.SetPasswordHash(r.Context(), u.ID, hash); err != nil {
slog.Error("setup: set password", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if err := s.deps.Store.DeleteSetupToken(r.Context(), u.ID); err != nil {
slog.Warn("setup: delete token", "err", err)
// Non-fatal — password is set, audit will reflect it.
}
// Drop a session cookie so the user lands authenticated on /.
rawSession, err := auth.NewToken()
if err != nil {
slog.Error("setup: session token", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
hashed := auth.HashToken(rawSession)
now := time.Now().UTC()
if err := s.deps.Store.CreateSession(r.Context(), store.Session{
ID: hashed, UserID: u.ID, CreatedAt: now,
ExpiresAt: now.Add(8 * time.Hour),
}, hashed); err != nil {
slog.Error("setup: create session", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
stdhttp.SetCookie(w, &stdhttp.Cookie{
Name: sessionCookieName, Value: rawSession,
Path: "/", HttpOnly: true,
SameSite: stdhttp.SameSiteLaxMode,
Secure: s.deps.Cfg.CookieSecure,
Expires: now.Add(8 * time.Hour),
})
// Record the login so the users-list "Last login" column shows
// the moment they completed setup (the regular /login path does
// the same; we'd otherwise leave the row showing "never").
_ = s.deps.Store.MarkUserLogin(r.Context(), u.ID, now)
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(),
UserID: &u.ID,
Actor: "user",
Action: "user.setup_completed",
TargetKind: ptr("user"),
TargetID: &u.ID,
TS: now,
})
stdhttp.Redirect(w, r, "/", stdhttp.StatusSeeOther)
}
// renderSetupForm re-renders the setup page with an inline error
// (e.g. password mismatch). 200 OK with the form intact so the user
// can correct without losing the token.
func (s *Server) renderSetupForm(w stdhttp.ResponseWriter, r *stdhttp.Request, token, errMsg string) {
view := s.baseView(r, nil)
view.Title = "Set your password · restic-manager"
username := ""
if tok, err := s.deps.Store.LookupSetupToken(r.Context(), hashSetupToken(token)); err == nil {
if u, err := s.deps.Store.GetUserByID(r.Context(), tok.UserID); err == nil {
username = u.Username
}
}
view.Page = setupPage{Username: username, Token: token, Error: errMsg}
_ = s.deps.UI.Render(w, "setup", view)
}
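Review note: the setup flow above stores only the SHA-256 digest of the token and treats "unknown" and "expired" identically. A minimal sketch of that pattern follows, assuming a hypothetical in-memory `memStore` and `setupToken` type in place of the real `store` package; only `hashSetupToken` is taken from the handler.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

// hashSetupToken mirrors the handler's helper: SHA-256, hex-encoded.
func hashSetupToken(raw string) string {
	h := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(h[:])
}

// setupToken and memStore are hypothetical stand-ins for store.SetupToken
// and the real store; only the digest is kept at rest, never the raw token.
type setupToken struct {
	UserID    string
	ExpiresAt time.Time
}

type memStore map[string]setupToken // keyed by token hash

var errTokenInvalid = errors.New("setup token unknown or expired")

// lookup hashes the presented raw token and rejects unknown and expired
// rows with the same error, so a caller cannot tell the two cases apart.
func (m memStore) lookup(raw string, now time.Time) (setupToken, error) {
	tok, ok := m[hashSetupToken(raw)]
	if !ok || tok.ExpiresAt.Before(now) {
		return setupToken{}, errTokenInvalid
	}
	return tok, nil
}

func main() {
	now := time.Now().UTC()
	st := memStore{
		hashSetupToken("fresh"): {UserID: "u1", ExpiresAt: now.Add(time.Hour)},
		hashSetupToken("stale"): {UserID: "u2", ExpiresAt: now.Add(-time.Minute)},
	}
	for _, raw := range []string{"fresh", "stale", "missing"} {
		_, err := st.lookup(raw, now)
		fmt.Printf("%s: %v\n", raw, err)
	}
}
```

Folding the expiry check into the lookup would let the GET and POST handlers share one code path instead of repeating the `ExpiresAt.Before` comparison.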
-152
@@ -1,152 +0,0 @@
package http
import (
"bytes"
"context"
"crypto/sha256"
"encoding/hex"
"encoding/json"
"io"
stdhttp "net/http"
"net/url"
"strings"
"testing"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
func sha256Hex(s string) string {
h := sha256.Sum256([]byte(s))
return hex.EncodeToString(h[:])
}
func TestSetupGetValidToken(t *testing.T) {
t.Parallel()
// /setup renders HTML, so we need a real UI renderer.
srv, ts, _ := rawTestServerWithUI(t)
urlBase := ts.URL
now := time.Now().UTC()
uid := ulid.Make().String()
if err := srv.deps.Store.CreateUser(t.Context(), store.User{
ID: uid, Username: "newbie", PasswordHash: "",
Role: store.RoleOperator, CreatedAt: now,
MustChangePassword: true,
}); err != nil {
t.Fatalf("create: %v", err)
}
raw := "raw-token-1234567890"
hash := sha256Hex(raw)
if err := srv.deps.Store.SetSetupToken(context.Background(), store.SetupToken{
UserID: uid, TokenHash: hash,
ExpiresAt: now.Add(time.Hour), CreatedAt: now,
}); err != nil {
t.Fatalf("set token: %v", err)
}
res, err := stdhttp.Get(urlBase + "/setup?token=" + raw)
if err != nil {
t.Fatalf("GET: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
t.Errorf("status: got %d want 200", res.StatusCode)
}
body, _ := io.ReadAll(res.Body)
if !strings.Contains(string(body), "newbie") {
t.Errorf("expected username in body: %s", body)
}
}
func TestSetupGetExpiredToken(t *testing.T) {
t.Parallel()
// /setup renders HTML, so we need a real UI renderer.
srv, ts, _ := rawTestServerWithUI(t)
urlBase := ts.URL
now := time.Now().UTC()
uid := ulid.Make().String()
_ = srv.deps.Store.CreateUser(t.Context(), store.User{
ID: uid, Username: "stale",
PasswordHash: "", Role: store.RoleViewer, CreatedAt: now,
MustChangePassword: true,
})
raw := "expired-token"
_ = srv.deps.Store.SetSetupToken(context.Background(), store.SetupToken{
UserID: uid, TokenHash: sha256Hex(raw),
ExpiresAt: now.Add(-time.Minute), CreatedAt: now.Add(-2 * time.Hour),
})
res, err := stdhttp.Get(urlBase + "/setup?token=" + raw)
if err != nil {
t.Fatalf("GET: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusGone {
t.Errorf("status: got %d want 410", res.StatusCode)
}
}
func TestSetupPostHappyPath(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
urlBase := ts.URL
now := time.Now().UTC()
uid := ulid.Make().String()
_ = srv.deps.Store.CreateUser(t.Context(), store.User{
ID: uid, Username: "newbie",
PasswordHash: "", Role: store.RoleOperator, CreatedAt: now,
MustChangePassword: true,
})
raw := "happy-token"
_ = srv.deps.Store.SetSetupToken(t.Context(), store.SetupToken{
UserID: uid, TokenHash: sha256Hex(raw),
ExpiresAt: now.Add(time.Hour), CreatedAt: now,
})
form := url.Values{}
form.Set("token", raw)
form.Set("password", "averylongpassword")
form.Set("password_confirm", "averylongpassword")
req, _ := stdhttp.NewRequest("POST", urlBase+"/setup",
strings.NewReader(form.Encode()))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
c := &stdhttp.Client{CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
}}
res, err := c.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Errorf("status: got %d want 303", res.StatusCode)
}
if res.Header.Get("Location") != "/" {
t.Errorf("location: got %q want /", res.Header.Get("Location"))
}
// Token is consumed.
if _, err := srv.deps.Store.LookupSetupToken(t.Context(), sha256Hex(raw)); err == nil {
t.Error("token should be deleted after consumption")
}
// User can now log in via the normal route.
logBody, _ := json.Marshal(map[string]string{
"username": "newbie", "password": "averylongpassword",
})
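loginRes, err := stdhttp.Post(urlBase+"/api/auth/login",
"application/json", bytes.NewReader(logBody))
if err != nil {
// Fail here rather than dereferencing a nil response below.
t.Fatalf("login POST: %v", err)
}
defer loginRes.Body.Close()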
if loginRes.StatusCode != stdhttp.StatusOK {
body, _ := io.ReadAll(loginRes.Body)
t.Errorf("login: %d %s", loginRes.StatusCode, body)
}
}
-154
@@ -1,154 +0,0 @@
// ui_account.go — self-service account surface (password change).
//
// Routes (wired in server.go):
//
// POST /api/account/password — JSON change-password (mounted in viewer band)
// GET /settings/account — page (lands in Task F4)
// POST /settings/account — page submit (lands in Task F4)
package http
import (
"encoding/json"
stdhttp "net/http"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
type passwordChangeRequest struct {
CurrentPassword string `json:"current_password"`
NewPassword string `json:"new_password"`
}
func (s *Server) handleAPIAccountPassword(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u, ok := s.requireUser(r)
if !ok {
writeJSONError(w, stdhttp.StatusUnauthorized, "unauthorised", "")
return
}
var req passwordChangeRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeJSONError(w, stdhttp.StatusBadRequest, "invalid_json", err.Error())
return
}
if len(req.NewPassword) < 12 {
writeJSONError(w, stdhttp.StatusBadRequest, "password_too_short", "min 12 chars")
return
}
// Skip current-password check when must_change_password is set —
// the user has no current password to know (only matters for the
// legacy reset-password path; setup-token path doesn't use this).
if !u.MustChangePassword {
if err := auth.VerifyPassword(u.PasswordHash, req.CurrentPassword); err != nil {
writeJSONError(w, stdhttp.StatusUnauthorized, "current_password_wrong", "")
return
}
}
hash, err := auth.HashPassword(req.NewPassword)
if err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
if err := s.deps.Store.SetPasswordHash(r.Context(), u.ID, hash); err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.password_changed",
TargetKind: ptr("user"), TargetID: &u.ID,
TS: time.Now().UTC(),
})
w.WriteHeader(stdhttp.StatusOK)
}
type accountPage struct {
Username string
Role string
MustChange bool
Error string
Saved bool
}
func (s *Server) handleUIAccountGet(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
full, err := s.deps.Store.GetUserByID(r.Context(), u.ID)
if err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
view := s.baseView(r, u)
view.Title = "Account · restic-manager"
view.Active = "settings"
view.Page = accountPage{
Username: full.Username, Role: string(full.Role),
MustChange: full.MustChangePassword,
}
_ = s.deps.UI.Render(w, "account", view)
}
func (s *Server) handleUIAccountPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
cur := r.PostForm.Get("current_password")
pw := r.PostForm.Get("new_password")
pw2 := r.PostForm.Get("confirm_password")
full, err := s.deps.Store.GetUserByID(r.Context(), u.ID)
if err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
render := func(errMsg string, saved bool) {
view := s.baseView(r, u)
view.Title = "Account · restic-manager"
view.Active = "settings"
view.Page = accountPage{
Username: full.Username, Role: string(full.Role),
MustChange: full.MustChangePassword,
Error: errMsg, Saved: saved,
}
_ = s.deps.UI.Render(w, "account", view)
}
if pw == "" || pw != pw2 || len(pw) < 12 {
render("Passwords must match and be at least 12 characters.", false)
return
}
if !full.MustChangePassword {
if err := auth.VerifyPassword(full.PasswordHash, cur); err != nil {
render("Current password is incorrect.", false)
return
}
}
hash, err := auth.HashPassword(pw)
if err != nil {
render("Internal error.", false)
return
}
if err := s.deps.Store.SetPasswordHash(r.Context(), u.ID, hash); err != nil {
render("Internal error.", false)
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.password_changed",
TargetKind: ptr("user"), TargetID: &u.ID,
TS: time.Now().UTC(),
})
full.MustChangePassword = false
render("", true)
}
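Review note: the JSON and form handlers above each repeat the "match and at least 12" check, and both use `len(pw)`, which counts bytes rather than characters. A minimal sketch of a shared validator follows; `validateNewPassword`, `minPasswordRunes`, and the two error values are hypothetical names, and counting runes instead of bytes is a deliberate change from the original, not the author's behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// minPasswordRunes matches the 12-character minimum used by the handlers.
const minPasswordRunes = 12

var (
	errPasswordMismatch = errors.New("passwords do not match")
	errPasswordTooShort = errors.New("password is too short")
)

// validateNewPassword checks the confirmation first, then the length.
// It counts runes, so a password of multi-byte characters is measured
// by what the user typed rather than by its UTF-8 byte length.
func validateNewPassword(pw, confirm string) error {
	if pw != confirm {
		return errPasswordMismatch
	}
	if utf8.RuneCountInString(pw) < minPasswordRunes {
		return errPasswordTooShort
	}
	return nil
}

func main() {
	for _, pw := range []string{"averylongpassword", "short", "ééééééé"} {
		fmt.Printf("%q: %v\n", pw, validateNewPassword(pw, pw))
	}
}
```

The seven-character `ééééééé` case is 14 bytes long, so it would pass a `len(pw) < 12` check while still being shorter than the stated policy.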
+5 -24
@@ -14,12 +14,10 @@ import (
 )
 type alertsPage struct {
-Filter store.AlertFilter
-Alerts []store.Alert
-Counts alertCounts
-HostNames map[string]string // host_id → name for table rendering
-Usernames map[string]string // user_id → username for the "ack'd by …" line
-RefreshURL string // self-URL for the live-refresh poll
+Filter store.AlertFilter
+Alerts []store.Alert
+Counts alertCounts
+HostNames map[string]string // host_id → name for table rendering
 }
 type alertCounts struct {
@@ -53,29 +51,12 @@ func (s *Server) handleUIAlerts(w stdhttp.ResponseWriter, r *stdhttp.Request) {
 return
 }
-page := alertsPage{
-Filter: f,
-Alerts: alerts,
-HostNames: map[string]string{},
-Usernames: map[string]string{},
-RefreshURL: r.URL.RequestURI(),
-}
+page := alertsPage{Filter: f, Alerts: alerts, HostNames: map[string]string{}}
 if hosts, err := s.deps.Store.ListHosts(r.Context()); err == nil {
 for _, h := range hosts {
 page.HostNames[h.ID] = h.Name
 }
 }
-// Resolve user IDs that appear on acknowledged rows to usernames so
-// the "ack'd by …" line shows a human name rather than the
-// underlying ULID. Cheap at fleet sizes we care about (one extra
-// query per alerts page render). Disabled users are still resolved
-// — operators want to know *who* ack'd, even if the account is
-// since gone.
-if users, err := s.deps.Store.ListUsers(r.Context(), store.UserSort{}); err == nil {
-for _, usr := range users {
-page.Usernames[usr.ID] = usr.Username
-}
-}
 page.Counts = computeAlertCounts(s, r)
 view := s.baseView(r, u)
-269
@@ -1,269 +0,0 @@
// ui_audit.go — Audit log read-only surfaces.
//
// Routes (wired in server.go):
//
// GET /audit → handleUIAudit (HTML)
// GET /audit.csv → handleUIAuditCSV (CSV download honouring current filters)
// GET /api/audit → handleAPIAudit (JSON)
//
// Filters: user, actor, action (substring), target_kind, time-range
// preset (24h | 7d | 30d | all). Page-level live refresh is *not*
// added here — audit is append-only and operators inspect history,
// not current state.
package http
import (
"encoding/csv"
"encoding/json"
"fmt"
"log/slog"
stdhttp "net/http"
"net/url"
"strings"
"time"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
type auditPage struct {
Filter store.AuditFilter
Range string // "24h" | "7d" | "30d" | "all"
Entries []store.AuditEntry
UserNames map[string]string // user_id → username for row rendering
HostNames map[string]string // host_id → name (for target_kind=host display)
Actions []string // distinct actions seen so far, for the dropdown
// Sort + Dir reflect the *resolved* sort (after allowlist
// validation) so the template can render arrows on the active
// column.
Sort string // "ts" | "actor" | "user_id" | "action" | "target_kind"
Dir string // "asc" | "desc"
// SortHrefs is a fully-encoded /audit?…&sort=COL&dir=… for each
// sortable column. Built server-side because constructing the
// querystring inside a Go html/template <a href="…"> applies
// URL-attribute escaping to '=' (turning 'range=all' into
// 'range%3dall' on the wire), which loses every filter on click.
// CSVHref is the analogous link for the export button.
SortHrefs map[string]string
CSVHref string
}
// rangeToSince converts the time-range preset to a Since cutoff. "all"
// (or unrecognised) returns the zero time, meaning "no lower bound".
func rangeToSince(r string, now time.Time) time.Time {
switch r {
case "24h", "":
return now.Add(-24 * time.Hour)
case "7d":
return now.Add(-7 * 24 * time.Hour)
case "30d":
return now.Add(-30 * 24 * time.Hour)
default:
return time.Time{}
}
}
// auditFilterFromQuery extracts the AuditFilter + range preset from
// the request querystring. Shared by the HTML, CSV, and JSON handlers
// so all three honour the same filter URL.
func auditFilterFromQuery(r *stdhttp.Request) (store.AuditFilter, string) {
q := r.URL.Query()
rng := q.Get("range")
if rng == "" {
rng = "24h"
}
return store.AuditFilter{
UserID: q.Get("user_id"),
Actor: q.Get("actor"),
ActionLike: strings.TrimSpace(q.Get("action")),
TargetKind: q.Get("target_kind"),
Since: rangeToSince(rng, time.Now().UTC()),
Limit: 5000, // CSV export cap; the HTML and JSON handlers clamp this to 500
OrderBy: q.Get("sort"),
OrderAsc: q.Get("dir") == "asc",
}, rng
}
func (s *Server) handleUIAudit(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
f, rng := auditFilterFromQuery(r)
// HTML page caps lower than CSV — keeps the table snappy.
if f.Limit > 500 {
f.Limit = 500
}
entries, err := s.deps.Store.ListAudit(r.Context(), f)
if err != nil {
slog.Error("ui audit: list", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
// Resolve the sort key once so the page model and the template
// see the same value the SQL just used. f.OrderBy may have been
// '' or unknown → 'ts'; the template needs the resolved one.
resolvedSort := "ts"
switch f.OrderBy {
case "actor", "user_id", "action", "target_kind":
resolvedSort = f.OrderBy
}
dir := "desc"
if f.OrderAsc {
dir = "asc"
}
// Build the per-column sort hrefs once, so the template only
// has to emit them. Each click flips dir on the active column;
// any other column starts at desc (newest-first / Z→A).
base := url.Values{}
if rng != "" {
base.Set("range", rng)
}
if f.UserID != "" {
base.Set("user_id", f.UserID)
}
if f.Actor != "" {
base.Set("actor", f.Actor)
}
if f.ActionLike != "" {
base.Set("action", f.ActionLike)
}
if f.TargetKind != "" {
base.Set("target_kind", f.TargetKind)
}
csvHref := "/audit.csv?" + base.Encode()
hrefs := make(map[string]string, 5)
for _, col := range []string{"ts", "actor", "user_id", "action", "target_kind"} {
v := url.Values{}
for k, vs := range base {
v[k] = vs
}
v.Set("sort", col)
newDir := "desc"
if col == resolvedSort && dir == "desc" {
newDir = "asc"
}
v.Set("dir", newDir)
hrefs[col] = "/audit?" + v.Encode()
}
page := auditPage{
Filter: f,
Range: rng,
Entries: entries,
UserNames: map[string]string{},
HostNames: map[string]string{},
Sort: resolvedSort,
Dir: dir,
SortHrefs: hrefs,
CSVHref: csvHref,
}
if users, err := s.deps.Store.ListUsers(r.Context(), store.UserSort{}); err == nil {
for _, ux := range users {
page.UserNames[ux.ID] = ux.Username
}
}
if hosts, err := s.deps.Store.ListHosts(r.Context()); err == nil {
for _, h := range hosts {
page.HostNames[h.ID] = h.Name
}
}
if actions, err := s.deps.Store.DistinctAuditActions(r.Context()); err == nil {
page.Actions = actions
}
view := s.baseView(r, u)
view.Title = "Audit · restic-manager"
view.Active = "audit"
view.Page = page
if err := s.deps.UI.Render(w, "audit", view); err != nil {
slog.Error("ui audit: render", "err", err)
}
}
// handleAPIAudit is the JSON variant — same filters as the HTML page.
func (s *Server) handleAPIAudit(w stdhttp.ResponseWriter, r *stdhttp.Request) {
if _, ok := s.requireUser(r); !ok {
writeJSONError(w, stdhttp.StatusUnauthorized, "unauthorised", "")
return
}
f, _ := auditFilterFromQuery(r)
if f.Limit > 500 {
f.Limit = 500
}
entries, err := s.deps.Store.ListAudit(r.Context(), f)
if err != nil {
writeJSONError(w, stdhttp.StatusInternalServerError, "internal", err.Error())
return
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_ = json.NewEncoder(w).Encode(map[string]any{"entries": entries})
}
// handleUIAuditCSV streams the filtered audit log as CSV. Auth-gated
// like the HTML page; honours the same filter querystring so an
// operator can refine the view in the browser, hit Export, and get
// exactly what's on screen (plus more rows up to the 5000 cap).
func (s *Server) handleUIAuditCSV(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
f, _ := auditFilterFromQuery(r)
entries, err := s.deps.Store.ListAudit(r.Context(), f)
if err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
// Resolve user_id → username and host_id → name once for the
// human-friendly columns.
userNames := map[string]string{}
if users, err := s.deps.Store.ListUsers(r.Context(), store.UserSort{}); err == nil {
for _, ux := range users {
userNames[ux.ID] = ux.Username
}
}
hostNames := map[string]string{}
if hosts, err := s.deps.Store.ListHosts(r.Context()); err == nil {
for _, h := range hosts {
hostNames[h.ID] = h.Name
}
}
stamp := time.Now().UTC().Format("20060102-150405")
w.Header().Set("Content-Type", "text/csv; charset=utf-8")
w.Header().Set("Content-Disposition",
fmt.Sprintf(`attachment; filename="audit-%s.csv"`, stamp))
cw := csv.NewWriter(w)
defer cw.Flush()
// user_id and target_id are internal ULIDs that carry no meaning
// to anyone reading the CSV; the resolved name is what an operator
// wants. The name columns are left empty for system rows and
// non-host targets. The HTML page still shows IDs in the Target
// column for traceability when no name is available; the CSV is
// for human reporting only.
_ = cw.Write([]string{"timestamp_utc", "actor", "user", "action", "target_kind", "target_name", "payload"})
for _, e := range entries {
var uname string
if e.UserID != nil {
uname = userNames[*e.UserID]
}
var tk, tname string
if e.TargetKind != nil {
tk = *e.TargetKind
}
if tk == "host" && e.TargetID != nil {
tname = hostNames[*e.TargetID]
}
payload := ""
if len(e.Payload) > 0 {
payload = string(e.Payload)
}
_ = cw.Write([]string{
e.TS.UTC().Format("2006-01-02 15:04:05"),
e.Actor, uname, e.Action, tk, tname, payload,
})
}
}
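Review note: the per-column sort links in `handleUIAudit` can be factored into a small helper. The sketch below assumes a hypothetical `sortHrefs` function; the toggle rule (the active column flips from desc to asc, every other column starts at desc) follows the handler above, and each value slice is copied so the base filter set is never aliased.

```go
package main

import (
	"fmt"
	"net/url"
)

// sortHrefs returns one /audit link per column, carrying every filter in
// base plus the sort and dir parameters for that column.
func sortHrefs(base url.Values, cols []string, activeCol, activeDir string) map[string]string {
	hrefs := make(map[string]string, len(cols))
	for _, col := range cols {
		v := url.Values{}
		for k, vs := range base {
			v[k] = append([]string(nil), vs...) // copy, so base stays untouched
		}
		dir := "desc"
		if col == activeCol && activeDir == "desc" {
			dir = "asc" // clicking the active column flips its direction
		}
		v.Set("sort", col)
		v.Set("dir", dir)
		// Encode sorts keys, so the resulting href is deterministic.
		hrefs[col] = "/audit?" + v.Encode()
	}
	return hrefs
}

func main() {
	base := url.Values{"range": {"7d"}, "actor": {"user"}}
	hrefs := sortHrefs(base, []string{"ts", "action"}, "ts", "desc")
	fmt.Println(hrefs["ts"])
	fmt.Println(hrefs["action"])
}
```

Building the full href in Go and handing the template a finished string keeps the query encoding in one place, which is the same reason the page model carries `SortHrefs` and `CSVHref`.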
@@ -1,143 +0,0 @@
// ui_enrollment_tokens.go — NS-02 token-recovery handlers.
//
// Today the only handle on a freshly-minted enrolment token is its
// /hosts/pending/{token} URL, which lives in the operator's browser
// tab. Closing that tab loses the install snippet — the row stays
// alive in the DB until TTL expiry but invisible to the UI. These
// handlers close the gap with two operations exposed on the
// Add-host page:
//
// POST /hosts/enrollment-tokens/{hash}/regenerate
// POST /hosts/enrollment-tokens/{hash}/revoke
//
// Hash here is the *token_hash* (sha256 hex of the raw token), which
// is opaque on its own — it is not the credential, just an identifier
// for the row. We chose regenerate over "show original token" because
// only hashes are persisted; the raw token has been gone since the
// original /hosts/new POST.
package http
import (
"encoding/json"
"errors"
"log/slog"
stdhttp "net/http"
"github.com/go-chi/chi/v5"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// handleUIEnrollmentTokenRegenerate mints a fresh raw token with the
// same attachments (encrypted repo creds, initial paths) as the row
// keyed by token hash, then revokes the old row. Redirects to the new
// /hosts/pending/{newToken} so the operator lands directly on the
// install snippet.
func (s *Server) handleUIEnrollmentTokenRegenerate(w stdhttp.ResponseWriter, r *stdhttp.Request) {
user, ok := s.requireUser(r)
if !ok {
writeJSONError(w, stdhttp.StatusUnauthorized, "unauthorised", "")
return
}
oldHash := chi.URLParam(r, "hash")
if oldHash == "" {
stdhttp.Error(w, "missing hash", stdhttp.StatusBadRequest)
return
}
att, err := s.deps.Store.GetEnrollmentTokenAttachments(r.Context(), oldHash)
if err != nil {
if errors.Is(err, store.ErrNotFound) {
// Already expired/consumed/revoked — bounce back without
// fanfare so a stale form re-submit doesn't loud-fail.
stdhttp.Redirect(w, r, "/hosts/new", stdhttp.StatusSeeOther)
return
}
slog.Error("regen: load attachments", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
var blob repoCredsBlob
if att.EncRepoCreds != "" {
plain, err := s.deps.AEAD.Decrypt(att.EncRepoCreds, []byte("token:"+oldHash))
if err != nil {
slog.Error("regen: decrypt", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
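// A blob that decrypts but does not parse would otherwise mint a
// token with empty repo credentials; fail loudly instead.
if err := json.Unmarshal(plain, &blob); err != nil {
slog.Error("regen: unmarshal creds", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}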
}
// Mint the new row first; only revoke the old one once the fresh
// row exists. If something fails between, the operator at worst
// sees both rows side-by-side on the list page (and can revoke the
// stale one manually) — much better than nuking the old row and
// failing the mint, leaving them with nothing.
newToken, _, err := s.mintEnrollmentToken(r.Context(),
blob.RepoURL, blob.RepoUsername, blob.RepoPassword, att.InitialPaths)
if err != nil {
slog.Error("regen: mint new", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if err := s.deps.Store.DeleteEnrollmentToken(r.Context(), oldHash); err != nil &&
!errors.Is(err, store.ErrNotFound) {
slog.Warn("regen: delete old", "old_hash", oldHash, "err", err)
// Fall through — the new row is good; operator can revoke the
// stale row from the list if the orphan row bothers them.
}
uid := user.ID
short := oldHash
if len(short) > 12 {
short = short[:12]
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(),
UserID: &uid,
Actor: "user",
Action: "enrollment_token.regenerated",
TargetKind: ptr("enrollment_token"),
TargetID: &short,
TS: nowUTC(),
})
stdhttp.Redirect(w, r, "/hosts/pending/"+newToken, stdhttp.StatusSeeOther)
}
// handleUIEnrollmentTokenRevoke deletes the token row outright.
// Redirects to /hosts/new where the list re-renders without the row.
func (s *Server) handleUIEnrollmentTokenRevoke(w stdhttp.ResponseWriter, r *stdhttp.Request) {
user, ok := s.requireUser(r)
if !ok {
writeJSONError(w, stdhttp.StatusUnauthorized, "unauthorised", "")
return
}
hash := chi.URLParam(r, "hash")
if hash == "" {
stdhttp.Error(w, "missing hash", stdhttp.StatusBadRequest)
return
}
if err := s.deps.Store.DeleteEnrollmentToken(r.Context(), hash); err != nil &&
!errors.Is(err, store.ErrNotFound) {
slog.Error("revoke: delete", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
uid := user.ID
short := hash
if len(short) > 12 {
short = short[:12]
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(),
UserID: &uid,
Actor: "user",
Action: "enrollment_token.revoked",
TargetKind: ptr("enrollment_token"),
TargetID: &short,
TS: nowUTC(),
})
stdhttp.Redirect(w, r, "/hosts/new", stdhttp.StatusSeeOther)
}
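Review note: the regenerate handler mints the replacement row before it deletes the old one, so a failed mint leaves the operator with the original token. A minimal sketch of that ordering follows, assuming a hypothetical in-memory `tokenStore`; none of these names come from the real `store` package.

```go
package main

import (
	"errors"
	"fmt"
)

// mintFunc creates a new token row and returns its hash.
type mintFunc func(paths []string) (hash string, err error)

// tokenStore is a hypothetical in-memory stand-in for the token table.
type tokenStore struct {
	rows map[string][]string // token hash -> initial paths
	seq  int
}

var errNotFound = errors.New("not found")

func newTokenStore() *tokenStore {
	return &tokenStore{rows: map[string][]string{}}
}

func (s *tokenStore) mint(paths []string) (string, error) {
	s.seq++
	h := fmt.Sprintf("hash-%d", s.seq)
	s.rows[h] = paths
	return h, nil
}

// failingMint simulates a mint that errors out before writing anything.
func failingMint([]string) (string, error) {
	return "", errors.New("mint failed")
}

// regenerate creates the new row first and deletes the old row only once
// the new one exists, so a failure never leaves the caller with nothing.
func (s *tokenStore) regenerate(oldHash string, mint mintFunc) (string, error) {
	paths, ok := s.rows[oldHash]
	if !ok {
		return "", errNotFound
	}
	newHash, err := mint(paths)
	if err != nil {
		return "", err // old row is still intact
	}
	delete(s.rows, oldHash)
	return newHash, nil
}

func main() {
	s := newTokenStore()
	old, _ := s.mint([]string{"/etc"})
	_, err := s.regenerate(old, failingMint)
	fmt.Println("failed mint:", err, "rows:", len(s.rows))
	fresh, err := s.regenerate(old, s.mint)
	fmt.Println("regenerated:", fresh, err, "rows:", len(s.rows))
}
```

The worst case with this ordering is a short-lived duplicate row, which an operator can revoke by hand.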
@@ -1,158 +0,0 @@
// ui_enrollment_tokens_test.go — covers NS-02 token-recovery handlers:
// revoke deletes the row; regenerate replaces the row with one for a
// fresh raw token and redirects to /hosts/pending/{newToken}.
package http
import (
"context"
"errors"
stdhttp "net/http"
"strings"
"testing"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// mintTestToken seeds an enrolment token via the same helper the live
// /hosts/new flow uses, returning the (raw, hash) pair.
func mintTestToken(t *testing.T, srv *Server) (raw, hash string) {
t.Helper()
tok, _, err := srv.mintEnrollmentToken(context.Background(),
"rest:http://r:8000/x/", "u", "p", []string{"/etc"})
if err != nil {
t.Fatalf("mint: %v", err)
}
return tok, auth.HashToken(tok)
}
// TestEnrollmentTokenRevokeDeletesRow: POST .../revoke removes the
// row and 303s back to /hosts/new.
func TestEnrollmentTokenRevokeDeletesRow(t *testing.T) {
t.Parallel()
srv, ts, st := rawTestServerWithUI(t)
_, hash := mintTestToken(t, srv)
cookie := loginAsAdmin(t, st)
req, _ := stdhttp.NewRequest("POST",
ts.URL+"/hosts/enrollment-tokens/"+hash+"/revoke",
strings.NewReader(""))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
cli := &stdhttp.Client{
CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
},
}
res, err := cli.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Fatalf("status: got %d, want 303", res.StatusCode)
}
if loc := res.Header.Get("Location"); loc != "/hosts/new" {
t.Errorf("Location: got %q, want /hosts/new", loc)
}
if _, err := st.GetEnrollmentTokenAttachments(context.Background(), hash); !errors.Is(err, store.ErrNotFound) {
t.Errorf("post-revoke lookup: want ErrNotFound, got %v", err)
}
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM audit_log WHERE action = 'enrollment_token.revoked'`).Scan(&n); err != nil {
t.Fatalf("count audit: %v", err)
}
if n != 1 {
t.Errorf("audit rows: got %d, want 1", n)
}
}
// TestEnrollmentTokenRegenerateSwapsRow: POST .../regenerate revokes
// the old hash, mints a fresh raw token preserving the repo URL/user/
// password attachments, and 303s to the new pending page.
func TestEnrollmentTokenRegenerateSwapsRow(t *testing.T) {
t.Parallel()
srv, ts, st := rawTestServerWithUI(t)
oldRaw, oldHash := mintTestToken(t, srv)
cookie := loginAsAdmin(t, st)
req, _ := stdhttp.NewRequest("POST",
ts.URL+"/hosts/enrollment-tokens/"+oldHash+"/regenerate",
strings.NewReader(""))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
cli := &stdhttp.Client{
CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
},
}
res, err := cli.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Fatalf("status: got %d, want 303", res.StatusCode)
}
loc := res.Header.Get("Location")
if !strings.HasPrefix(loc, "/hosts/pending/") {
t.Fatalf("Location: got %q, want /hosts/pending/<token>", loc)
}
newRaw := strings.TrimPrefix(loc, "/hosts/pending/")
if newRaw == "" || newRaw == oldRaw {
t.Fatalf("regenerate produced same/empty token (old=%q, new=%q)", oldRaw, newRaw)
}
// Old hash gone; new hash present with the same paths attachment.
if _, err := st.GetEnrollmentTokenAttachments(context.Background(), oldHash); !errors.Is(err, store.ErrNotFound) {
t.Errorf("old hash should be gone; got %v", err)
}
att, err := st.GetEnrollmentTokenAttachments(context.Background(), auth.HashToken(newRaw))
if err != nil {
t.Fatalf("new hash lookup: %v", err)
}
if len(att.InitialPaths) != 1 || att.InitialPaths[0] != "/etc" {
t.Errorf("attachments: got paths %v, want [/etc]", att.InitialPaths)
}
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM audit_log WHERE action = 'enrollment_token.regenerated'`).Scan(&n); err != nil {
t.Fatalf("count audit: %v", err)
}
if n != 1 {
t.Errorf("audit rows: got %d, want 1", n)
}
}
// TestEnrollmentTokenRegenerateMissingTokenRedirects: hitting
// regenerate with an unknown hash 303s back to /hosts/new without a
// 5xx (idempotent re-submit safety).
func TestEnrollmentTokenRegenerateMissingTokenRedirects(t *testing.T) {
t.Parallel()
_, ts, st := rawTestServerWithUI(t)
cookie := loginAsAdmin(t, st)
req, _ := stdhttp.NewRequest("POST",
ts.URL+"/hosts/enrollment-tokens/deadbeef/regenerate",
strings.NewReader(""))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
cli := &stdhttp.Client{
CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
},
}
res, err := cli.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Fatalf("status: got %d, want 303", res.StatusCode)
}
if loc := res.Header.Get("Location"); loc != "/hosts/new" {
t.Errorf("Location: got %q, want /hosts/new", loc)
}
}
+10 -392
@@ -8,14 +8,11 @@ import (
"io/fs"
"log/slog"
stdhttp "net/http"
"net/url"
"sort"
"strings"
"time"
"github.com/coder/websocket"
"github.com/go-chi/chi/v5"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/api"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
@@ -69,10 +66,6 @@ func (s *Server) sessionUser(r *stdhttp.Request) (*ui.User, error) {
}
return nil, err
}
if u.DisabledAt != nil {
_ = s.deps.Store.DeleteSession(r.Context(), auth.HashToken(c.Value))
return nil, nil
}
return &ui.User{ID: u.ID, Username: u.Username, Role: string(u.Role)}, nil
}
@@ -130,41 +123,10 @@ func (s *Server) version() string {
// dashboardPage is the data the dashboard template renders against.
type dashboardPage struct {
Hosts []dashboardHostRow
HostCount int // unfiltered fleet size
ShownCount int // after every active filter
HostCount int
Summary store.FleetSummary
PendingHosts []store.PendingHost // announce-and-approve queue (P2-18d)
CritOpenCount int
// Tag filter state. ActiveTag is the chip currently selected
// ("" = all). KnownTags is the full set of tags in use across
// the fleet, used to render the chip-row.
ActiveTag string
KnownTags []string
// Filter / sort URL state (NS-04). Round-tripped through query
// string so a bookmarked / shared dashboard URL is durable, and
// passed back to the template so the form inputs and column
// header sort-arrows render with current state.
Filter dashboardFilter
// RefreshURL is the same dashboard URL with all current filters
// pinned, used by the htmx live-poll trigger to refetch the
// table without flashing the surrounding chrome.
RefreshURL string
// SortURL is a per-column URL builder: passing a column key
// returns the URL that sorts by that column (toggling direction
// when it's already active). Pre-computed so the template stays
// dumb.
SortURL map[string]string
}
// dashboardFilter holds the parsed query-string filter state.
type dashboardFilter struct {
Search string // hostname substring match (case-insensitive)
Status string // "" | "online" | "offline" | "never_seen"
RepoStatus string // "" | "unknown" | "ready" | "init_failed"
Tag string // mirrors ActiveTag for round-trip on links
Sort string // column key (see sortDashboard)
Dir string // "asc" | "desc"
}
// dashboardHostRow carries a host plus the per-row Run-now decision
@@ -231,18 +193,12 @@ func (s *Server) handleUIDashboard(w stdhttp.ResponseWriter, r *stdhttp.Request)
return
}
allHosts, err := s.deps.Store.ListHosts(r.Context())
hosts, err := s.deps.Store.ListHosts(r.Context())
if err != nil {
slog.Error("ui dashboard: list hosts", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
// Parse query-string filter + sort (NS-04). The tag chip-row is
// kept as ?tag= for backwards compat with existing bookmarks.
filter := parseDashboardFilter(r.URL.Query())
hosts := filterAndSortDashboardHosts(allHosts, filter)
knownTags, _ := s.deps.Store.DistinctHostTags(r.Context())
summary, err := s.deps.Store.FleetSummary(r.Context())
if err != nil {
slog.Error("ui dashboard: fleet summary", "err", err)
@@ -292,16 +248,10 @@ func (s *Server) handleUIDashboard(w stdhttp.ResponseWriter, r *stdhttp.Request)
view := s.baseView(r, u)
view.Page = dashboardPage{
Hosts: rows,
HostCount: len(allHosts),
ShownCount: len(rows),
HostCount: len(hosts),
Summary: summary,
PendingHosts: pending,
CritOpenCount: critOpenCount,
ActiveTag: filter.Tag,
KnownTags: knownTags,
Filter: filter,
RefreshURL: "/?" + filter.encode(),
SortURL: buildDashboardSortURLs(filter),
}
if err := s.deps.UI.Render(w, "dashboard", view); err != nil {
slog.Error("ui: render dashboard", "err", err)
@@ -309,182 +259,6 @@ func (s *Server) handleUIDashboard(w stdhttp.ResponseWriter, r *stdhttp.Request)
}
}
// parseDashboardFilter reads the query string into a dashboardFilter,
// normalising defaults (sort=name, dir=asc) so the rest of the
// pipeline doesn't have to special-case empty values.
func parseDashboardFilter(q url.Values) dashboardFilter {
f := dashboardFilter{
Search: strings.TrimSpace(q.Get("q")),
Status: q.Get("status"),
RepoStatus: q.Get("repo_status"),
Tag: q.Get("tag"),
Sort: q.Get("sort"),
Dir: q.Get("dir"),
}
if f.Sort == "" {
f.Sort = "name"
}
if f.Dir != "asc" && f.Dir != "desc" {
f.Dir = "asc"
}
return f
}
// encode rebuilds the filter as a URL-safe query string. Used for the
// live-refresh URL and for column-sort link composition.
func (f dashboardFilter) encode() string {
v := url.Values{}
if f.Search != "" {
v.Set("q", f.Search)
}
if f.Status != "" {
v.Set("status", f.Status)
}
if f.RepoStatus != "" {
v.Set("repo_status", f.RepoStatus)
}
if f.Tag != "" {
v.Set("tag", f.Tag)
}
if f.Sort != "" && f.Sort != "name" {
v.Set("sort", f.Sort)
}
if f.Dir != "" && f.Dir != "asc" {
v.Set("dir", f.Dir)
}
return v.Encode()
}
// filterAndSortDashboardHosts narrows a host list by the active
// filter dimensions, then sorts it by the chosen column/direction.
// Filter precedence: search ∧ status ∧ repo_status ∧ tag — every
// active filter has to match. Sort runs after filtering.
func filterAndSortDashboardHosts(hosts []store.Host, f dashboardFilter) []store.Host {
out := make([]store.Host, 0, len(hosts))
q := strings.ToLower(f.Search)
for _, h := range hosts {
if q != "" && !strings.Contains(strings.ToLower(h.Name), q) {
continue
}
if f.Status != "" {
switch f.Status {
case "online", "offline":
if h.Status != f.Status {
continue
}
case "never_seen":
if h.LastSeenAt != nil {
continue
}
}
}
if f.RepoStatus != "" {
// Backward compatibility: rows pre-NS-03 have an empty
// status string in memory if loaded before the migration
// scan added the column; treat that as "unknown".
rs := h.RepoStatus
if rs == "" {
rs = "unknown"
}
if rs != f.RepoStatus {
continue
}
}
if f.Tag != "" {
match := false
for _, t := range h.Tags {
if t == f.Tag {
match = true
break
}
}
if !match {
continue
}
}
out = append(out, h)
}
sortDashboardHosts(out, f.Sort, f.Dir)
return out
}
// sortDashboardHosts applies the column-by-direction sort in place.
// Unknown column key falls back to name asc — defensive default that
// keeps a malformed bookmarked URL from rendering an empty table.
func sortDashboardHosts(hosts []store.Host, col, dir string) {
less := func(i, j int) bool {
a, b := hosts[i], hosts[j]
switch col {
case "os":
if a.OS != b.OS {
return a.OS < b.OS
}
case "status":
if a.Status != b.Status {
return a.Status < b.Status
}
case "repo_status":
if a.RepoStatus != b.RepoStatus {
return a.RepoStatus < b.RepoStatus
}
case "restic":
if a.ResticVersion != b.ResticVersion {
return a.ResticVersion < b.ResticVersion
}
case "snapshot_count":
if a.SnapshotCount != b.SnapshotCount {
return a.SnapshotCount < b.SnapshotCount
}
case "repo_size":
if a.RepoSizeBytes != b.RepoSizeBytes {
return a.RepoSizeBytes < b.RepoSizeBytes
}
case "last_backup":
at, bt := time.Time{}, time.Time{}
if a.LastBackupAt != nil {
at = *a.LastBackupAt
}
if b.LastBackupAt != nil {
bt = *b.LastBackupAt
}
if !at.Equal(bt) {
return at.Before(bt)
}
}
// Stable secondary key: name.
return a.Name < b.Name
}
if dir == "desc" {
sort.Slice(hosts, func(i, j int) bool { return less(j, i) })
} else {
sort.Slice(hosts, less)
}
}
// buildDashboardSortURLs precomputes the link target for every
// sortable column header. Clicking the active column toggles
// direction; clicking a different column starts ascending.
func buildDashboardSortURLs(active dashboardFilter) map[string]string {
cols := []string{"name", "os", "status", "repo_status", "restic", "snapshot_count", "repo_size", "last_backup"}
out := make(map[string]string, len(cols))
for _, c := range cols {
f := active
f.Sort = c
if active.Sort == c && active.Dir == "asc" {
f.Dir = "desc"
} else {
f.Dir = "asc"
}
enc := f.encode()
if enc == "" {
out[c] = "/"
} else {
out[c] = "/?" + enc
}
}
return out
}
// Per-host Run-now and manual Init-repo were retired by the P2 redesign.
// Run-now lives at POST /hosts/{id}/source-groups/{gid}/run; init runs
// automatically on the agent's first WS connect after enrolment. Both
@@ -518,23 +292,6 @@ type addHostPage struct {
Paths string
ServerURL string
Error string
// Outstanding tokens (NS-02) — every still-valid (un-consumed,
// un-expired) enrolment token, surfaced so an operator who closed
// the install snippet tab can recover via Regenerate or revoke.
OutstandingTokens []addHostOutstandingToken
}
// addHostOutstandingToken is a UI-shaped projection of a row from
// store.ListOutstandingEnrollmentTokens with the repo URL already
// decrypted-and-redacted (no creds reach the browser).
type addHostOutstandingToken struct {
TokenHash string // full hex hash; opaque path param for actions
ShortHash string // first 12 chars of TokenHash for display
CreatedAt time.Time
ExpiresAt time.Time
RepoURL string // redacted (no embedded creds)
InitialPaths []string
}
// pendingHostPage is the GET /hosts/pending/{token} view. Lives
@@ -558,54 +315,13 @@ func (s *Server) handleUIAddHostGet(w stdhttp.ResponseWriter, r *stdhttp.Request
}
view := s.baseView(r, u)
view.Title = "Add host · restic-manager"
view.Page = addHostPage{
ServerURL: s.publicURL(r),
OutstandingTokens: s.loadOutstandingTokensForUI(r),
}
view.Page = addHostPage{ServerURL: s.publicURL(r)}
if err := s.deps.UI.Render(w, "add_host", view); err != nil {
slog.Error("ui: render add_host", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
}
}
// loadOutstandingTokensForUI fetches the still-valid enrolment tokens
// and decrypts each row's repo URL so the Add-host page can show a
// recoverable list. Decryption failures (rotated key etc.) are logged
// and surfaced as "(decrypt failed)" rather than crashing the page.
func (s *Server) loadOutstandingTokensForUI(r *stdhttp.Request) []addHostOutstandingToken {
rows, err := s.deps.Store.ListOutstandingEnrollmentTokens(r.Context())
if err != nil {
slog.Warn("ui add_host: list outstanding tokens", "err", err)
return nil
}
out := make([]addHostOutstandingToken, 0, len(rows))
for _, row := range rows {
short := row.TokenHash
if len(short) > 12 {
short = short[:12]
}
entry := addHostOutstandingToken{
TokenHash: row.TokenHash,
ShortHash: short,
CreatedAt: row.CreatedAt,
ExpiresAt: row.ExpiresAt,
InitialPaths: row.InitialPaths,
}
if row.EncRepoCreds != "" {
plain, derr := s.deps.AEAD.Decrypt(row.EncRepoCreds, []byte("token:"+row.TokenHash))
if derr != nil {
entry.RepoURL = "(decrypt failed — key rotation?)"
} else {
var blob repoCredsBlob
_ = json.Unmarshal(plain, &blob)
entry.RepoURL = restic.RedactURL(blob.RepoURL)
}
}
out = append(out, entry)
}
return out
}
// handleUIAddHostPost validates the form, mints the enrolment token
// (with encrypted repo creds), and 303-redirects to the persistent
// pending-host page. On validation errors we re-render the form
@@ -809,9 +525,6 @@ type hostChromeData struct {
SourceGroupCount int
ScheduleCount int
ScheduleVersion int64 // host_schedule_version (latest desired)
// KnownTags is the union of tags already in use across the fleet,
// used for autocomplete on the host-tags edit form. Cheap query.
KnownTags []string
// Auto-init status surfaced from the latest 'init' job.
// InitStatus is "succeeded" | "failed" | "running" | "queued" | "" (never run).
@@ -865,62 +578,9 @@ func (s *Server) loadHostChrome(r *stdhttp.Request, host store.Host, subtab, cru
}
d.RestoreAt = &t
}
if tags, err := s.deps.Store.DistinctHostTags(r.Context()); err == nil {
d.KnownTags = tags
}
return d
}
// handleUIHostTagsSave accepts a comma-separated tag list, normalises,
// dedups, and writes. Operator-band; mounted in server.go.
func (s *Server) handleUIHostTagsSave(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
hostID := chi.URLParam(r, "id")
if _, err := s.deps.Store.GetHost(r.Context(), hostID); err != nil {
stdhttp.NotFound(w, r)
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
raw := r.PostForm.Get("tags")
tags := normaliseTags(raw)
if err := s.deps.Store.SetHostTags(r.Context(), hostID, tags); err != nil {
slog.Error("ui host tags: save", "host_id", hostID, "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "host.tags_updated",
TargetKind: ptr("host"), TargetID: &hostID,
TS: time.Now().UTC(),
})
stdhttp.Redirect(w, r, "/hosts/"+hostID, stdhttp.StatusSeeOther)
}
// normaliseTags splits a comma-separated string, lowercases each token,
// trims whitespace, drops empties, and dedupes. Order is preserved
// from first occurrence (so the user's typing order shows on screen).
func normaliseTags(raw string) []string {
parts := strings.Split(raw, ",")
seen := make(map[string]bool, len(parts))
out := make([]string, 0, len(parts))
for _, p := range parts {
t := strings.ToLower(strings.TrimSpace(p))
if t == "" || seen[t] {
continue
}
seen[t] = true
out = append(out, t)
}
return out
}
// hostDetailPage carries everything the host detail template needs.
type hostDetailPage struct {
hostChromeData
@@ -1174,20 +834,7 @@ func (s *Server) handleUILoginGet(w stdhttp.ResponseWriter, r *stdhttp.Request)
stdhttp.Redirect(w, r, "/", stdhttp.StatusSeeOther)
return
}
// First-run: no users + token still in memory ⇒ funnel the visitor
// to the bootstrap page so they don't have to know the API exists.
if s.bootstrapAvailable(r) {
stdhttp.Redirect(w, r, "/bootstrap", stdhttp.StatusSeeOther)
return
}
view := ui.ViewData{
Version: s.version(),
OIDCError: r.URL.Query().Get("oidc_error"),
}
if s.deps.OIDC != nil {
view.OIDCEnabled = true
view.OIDCDisplayName = s.deps.OIDC.DisplayName()
}
view := ui.ViewData{Version: s.version()}
if err := s.deps.UI.Render(w, "login", view); err != nil {
slog.Error("ui: render login", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
@@ -1213,10 +860,6 @@ func (s *Server) handleUILoginPost(w stdhttp.ResponseWriter, r *stdhttp.Request)
Username: username,
Error: "Invalid username or password.",
}
if s.deps.OIDC != nil {
view.OIDCEnabled = true
view.OIDCDisplayName = s.deps.OIDC.DisplayName()
}
w.WriteHeader(stdhttp.StatusUnauthorized)
if err := s.deps.UI.Render(w, "login", view); err != nil {
slog.Error("ui: render login (post-fail)", "err", err)
@@ -1226,37 +869,12 @@ func (s *Server) handleUILoginPost(w stdhttp.ResponseWriter, r *stdhttp.Request)
stdhttp.Redirect(w, r, "/", stdhttp.StatusSeeOther)
}
// handleUILogoutPost is the form-submit twin of /api/auth/logout. For
// local sessions it drops the cookie and redirects to /login. For OIDC
// sessions, if the IdP advertised an end_session_endpoint it performs
// RP-initiated logout by redirecting there with id_token_hint and
// post_logout_redirect_uri.
// handleUILogoutPost is the form-submit twin of /api/auth/logout. It
// drops the session cookie and redirects to /login.
func (s *Server) handleUILogoutPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
c, err := r.Cookie(sessionCookieName)
if err != nil {
stdhttp.Redirect(w, r, "/login", stdhttp.StatusSeeOther)
return
if c, err := r.Cookie(sessionCookieName); err == nil {
_ = s.deps.Store.DeleteSession(r.Context(), auth.HashToken(c.Value))
}
hash := auth.HashToken(c.Value)
sess, _ := s.deps.Store.LookupSession(r.Context(), hash)
_ = s.deps.Store.DeleteSession(r.Context(), hash)
// Default: drop session, go to /login.
dest := "/login"
// OIDC session with a discovered end_session_endpoint? Compose
// the IdP logout URL with id_token_hint + post_logout_redirect_uri.
if sess != nil && sess.IDToken != "" && s.deps.OIDC != nil &&
s.deps.OIDC.EndSessionEndpoint() != "" {
v := url.Values{}
v.Set("id_token_hint", sess.IDToken)
if base := strings.TrimRight(s.deps.Cfg.BaseURL, "/"); base != "" {
v.Set("post_logout_redirect_uri", base+"/login")
}
dest = s.deps.OIDC.EndSessionEndpoint() + "?" + v.Encode()
}
// Clear the cookie.
stdhttp.SetCookie(w, &stdhttp.Cookie{
Name: sessionCookieName,
Value: "",
@@ -1266,5 +884,5 @@ func (s *Server) handleUILogoutPost(w stdhttp.ResponseWriter, r *stdhttp.Request
Secure: s.deps.Cfg.CookieSecure,
SameSite: stdhttp.SameSiteLaxMode,
})
stdhttp.Redirect(w, r, dest, stdhttp.StatusSeeOther)
stdhttp.Redirect(w, r, "/login", stdhttp.StatusSeeOther)
}
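Review note: the longer logout variant in this file's diff composes an RP-initiated logout URL from `id_token_hint` and `post_logout_redirect_uri`. The sketch below isolates that URL composition as a pure function; `logoutDest` and its three parameters are hypothetical, and the hostnames are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// logoutDest returns where to send the browser once the local session
// has been dropped. With no end-session endpoint or no ID token it
// falls back to the local /login page. (Hypothetical helper.)
func logoutDest(endSession, idToken, baseURL string) string {
	if endSession == "" || idToken == "" {
		return "/login"
	}
	v := url.Values{}
	v.Set("id_token_hint", idToken)
	// Only advertise a return address when a base URL is configured.
	if base := strings.TrimRight(baseURL, "/"); base != "" {
		v.Set("post_logout_redirect_uri", base+"/login")
	}
	return endSession + "?" + v.Encode()
}

func main() {
	fmt.Println(logoutDest("", "tok", "https://rm.example/"))
	fmt.Println(logoutDest("https://idp.example/logout", "tok", "https://rm.example/"))
}
```

Plain concatenation with "?" assumes the end-session endpoint carries no query string of its own; an endpoint that already has one would need `url.Parse` and a merge of the query values instead.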
-103
@@ -1,103 +0,0 @@
// ui_host_delete.go — admin-band danger-zone host deletion (NS-01).
//
// Removes the host row from the store; FK cascades wipe schedules,
// jobs, snapshots metadata, source groups, alerts, host_credentials,
// host_repo_maintenance, host_repo_stats, and the schedule junction.
// Also closes the host's active WS connection so the agent's bearer
// stops being usable in the same tick (the bearer hash lives on the
// hosts row itself, so DeleteHost already revokes it for any future
// auth attempt — closing the live socket is the courtesy that drops
// the in-flight session).
//
// Audit-logged with action="host.deleted" so the trail records who
// performed the deletion and against which host.
package http
import (
"encoding/json"
"errors"
"log/slog"
stdhttp "net/http"
"strings"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
func (s *Server) handleUIHostDelete(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
host, ok := s.loadHostForUI(w, r)
if !ok {
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
confirm := strings.TrimSpace(r.PostForm.Get("confirm_hostname"))
if confirm != host.Name {
// Mismatch: reject with a 400 and a short message. The detail
// page doesn't render an error banner today, and threading a new
// field through the page model for one call site isn't worth it.
// The form's JS confirm() already guards the common case, so a
// plain 400 is enough here, and it keeps the audit trail clean.
stdhttp.Error(w,
"hostname confirmation did not match — go back and re-type",
stdhttp.StatusBadRequest)
return
}
// Drop any live WS session before pulling the row so the agent
// gets a clean close rather than discovering the rug-pull on the
// next read. A nil Conn just means the agent was already offline.
if s.deps.Hub != nil {
if c := s.deps.Hub.Conn(host.ID); c != nil {
_ = c.Close()
}
}
if err := s.deps.Store.DeleteHost(r.Context(), host.ID); err != nil {
if errors.Is(err, store.ErrNotFound) {
// Race: someone else deleted it between loadHostForUI and
// here. Treat as success.
stdhttp.Redirect(w, r, "/", stdhttp.StatusSeeOther)
return
}
slog.Error("ui host delete: store", "host_id", host.ID, "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
uid := u.ID
hostID := host.ID
// Stash the host name in the audit payload so an operator reading
// the trail later sees *which* host was removed even though the
// row no longer exists.
payload, _ := json.Marshal(struct {
Name string `json:"name"`
}{Name: host.Name})
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(),
UserID: &uid,
Actor: "user",
Action: "host.deleted",
TargetKind: ptr("host"),
TargetID: &hostID,
TS: time.Now().UTC(),
Payload: payload,
})
if wantsHTML(r) {
w.Header().Set("HX-Redirect", "/")
w.WriteHeader(stdhttp.StatusNoContent)
return
}
stdhttp.Redirect(w, r, "/", stdhttp.StatusSeeOther)
}
-167
@@ -1,167 +0,0 @@
// ui_host_delete_test.go — covers the admin-band danger-zone host
// delete handler: hostname-confirm gate, RBAC, FK cascade, redirect,
// audit.
package http
import (
"context"
"errors"
stdhttp "net/http"
"net/url"
"strings"
"testing"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// loginAsRole mints a fresh user of the given role and returns a
// session cookie. Local twin to keep the RBAC test self-contained
// without leaking yet another helper into the shared test package.
func loginAsRole(t *testing.T, st *store.Store, role store.Role) *stdhttp.Cookie {
t.Helper()
ctx := context.Background()
uid := ulid.Make().String()
hash, _ := auth.HashPassword("very-long-test-password")
if err := st.CreateUser(ctx, store.User{
ID: uid, Username: string(role) + "-" + uid[:6],
PasswordHash: hash, Role: role,
CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("create user: %v", err)
}
tok, _ := auth.NewToken()
if err := st.CreateSession(ctx, store.Session{
UserID: uid,
CreatedAt: time.Now().UTC(),
ExpiresAt: time.Now().Add(time.Hour).UTC(),
}, auth.HashToken(tok)); err != nil {
t.Fatalf("create session: %v", err)
}
return &stdhttp.Cookie{Name: sessionCookieName, Value: tok}
}
// TestHostDeleteWrongHostnameRejected: typing a different name must
// not delete the host. Handler returns 400 and the row is intact.
func TestHostDeleteWrongHostnameRejected(t *testing.T) {
t.Parallel()
_, ts, st := rawTestServerWithUI(t)
hostID, _ := enrolHostForUI(t, nil, st, "del-wrong-host")
cookie := loginAsAdmin(t, st)
form := url.Values{"confirm_hostname": {"NOT-THE-NAME"}}
req, _ := stdhttp.NewRequest("POST", ts.URL+"/hosts/"+hostID+"/delete",
strings.NewReader(form.Encode()))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusBadRequest {
t.Fatalf("status: got %d, want 400", res.StatusCode)
}
if _, err := st.GetHost(context.Background(), hostID); err != nil {
t.Fatalf("host should still exist; got %v", err)
}
}
// TestHostDeleteRequiresAdmin: a viewer or operator gets 403 — host
// stays intact.
func TestHostDeleteRequiresAdmin(t *testing.T) {
t.Parallel()
_, ts, st := rawTestServerWithUI(t)
hostID, _ := enrolHostForUI(t, nil, st, "del-rbac-host")
for _, role := range []store.Role{store.RoleViewer, store.RoleOperator} {
role := role
t.Run(string(role), func(t *testing.T) {
cookie := loginAsRole(t, st, role)
form := url.Values{"confirm_hostname": {"del-rbac-host"}}
req, _ := stdhttp.NewRequest("POST", ts.URL+"/hosts/"+hostID+"/delete",
strings.NewReader(form.Encode()))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusForbidden {
t.Fatalf("status: got %d, want 403", res.StatusCode)
}
if _, err := st.GetHost(context.Background(), hostID); err != nil {
t.Fatalf("host should still exist; got %v", err)
}
})
}
}
// TestHostDeleteHappyPathCascadesAndAudits: matching hostname removes
// the row, FK cascade wipes the seeded job, and an audit row lands.
func TestHostDeleteHappyPathCascadesAndAudits(t *testing.T) {
t.Parallel()
_, ts, st := rawTestServerWithUI(t)
hostID, _ := enrolHostForUI(t, nil, st, "del-ok-host")
// Seed one dependent row to prove the cascade fires through HTTP.
if err := st.CreateJob(context.Background(), store.Job{
ID: ulid.Make().String(), HostID: hostID, Kind: "backup",
ActorKind: "system", CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("seed job: %v", err)
}
cookie := loginAsAdmin(t, st)
form := url.Values{"confirm_hostname": {"del-ok-host"}}
req, _ := stdhttp.NewRequest("POST", ts.URL+"/hosts/"+hostID+"/delete",
strings.NewReader(form.Encode()))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
// Don't follow the redirect so we can assert it.
cli := &stdhttp.Client{
CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
},
}
res, err := cli.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Fatalf("status: got %d, want 303", res.StatusCode)
}
if loc := res.Header.Get("Location"); loc != "/" {
t.Errorf("Location: got %q, want /", loc)
}
// Host gone.
if _, err := st.GetHost(context.Background(), hostID); !errors.Is(err, store.ErrNotFound) {
t.Errorf("GetHost after delete: want ErrNotFound, got %v", err)
}
// Cascade fired (job row gone).
var n int
if err := st.DB().QueryRow(`SELECT COUNT(*) FROM jobs WHERE host_id = ?`, hostID).Scan(&n); err != nil {
t.Fatalf("count jobs: %v", err)
}
if n != 0 {
t.Errorf("cascade left %d job rows", n)
}
// Audit row landed.
var audN int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM audit_log WHERE action = 'host.deleted' AND target_id = ?`,
hostID).Scan(&audN); err != nil {
t.Fatalf("count audit: %v", err)
}
if audN != 1 {
t.Errorf("audit rows: got %d, want 1", audN)
}
}
-11
@@ -334,19 +334,8 @@ func (s *Server) handleUIRepoCredentialsSave(w stdhttp.ResponseWriter, r *stdhtt
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
// NS-03: clear repo_status — the new creds may reach a different
// repo or fix an auth typo, so any prior probe outcome is stale.
if err := s.deps.Store.SetHostRepoStatus(r.Context(), host.ID, "unknown", ""); err != nil {
slog.Warn("ui repo creds: reset repo_status", "host_id", host.ID, "err", err)
}
if s.deps.Hub != nil && s.deps.Hub.Connected(host.ID) {
_ = s.pushRepoCredsToAgent(r.Context(), host.ID, existing)
// NS-03: probe the new creds immediately — surface bad
// password / wrong URL on the host detail page rather than at
// the next scheduled job.
if err := s.dispatchInitJob(r.Context(), host.ID, "user", &u.ID); err != nil {
slog.Warn("ui repo creds: dispatch init", "host_id", host.ID, "err", err)
}
}
stdhttp.Redirect(w, r, "/hosts/"+host.ID+"/repo?saved=credentials", stdhttp.StatusSeeOther)
}
-38
@@ -1,38 +0,0 @@
// ui_repo_probe.go — NS-03 retry-probe handler. Re-dispatches an init
// job against a host so the operator can re-test creds / connectivity
// without typing the hostname (no destructive shape: restic init is
// idempotent against a populated repo, so this is safe to spam).
//
// When the job finishes, the WS handler's job.finished hook sets
// repo_status to "ready", or to "init_failed" with a fresh error message.
package http
import (
"log/slog"
stdhttp "net/http"
)
func (s *Server) handleUIRepoProbe(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
host, ok := s.loadHostForUI(w, r)
if !ok {
return
}
if s.deps.Hub == nil || !s.deps.Hub.Connected(host.ID) {
s.renderRepoPage(w, r, u, host,
"Host is offline — bring the agent back up before probing.",
"", "", "")
return
}
if err := s.dispatchInitJob(r.Context(), host.ID, "user", &u.ID); err != nil {
slog.Warn("ui repo probe: dispatch", "host_id", host.ID, "err", err)
s.renderRepoPage(w, r, u, host,
"Probe dispatch failed — check the agent logs and try again.",
"", "", "")
return
}
stdhttp.Redirect(w, r, "/hosts/"+host.ID+"/repo?saved=probe", stdhttp.StatusSeeOther)
}
-109
@@ -1,109 +0,0 @@
// ui_repo_probe_test.go — covers the NS-03 retry-probe handler: the
// 404 / offline-guarded path and the happy dispatch + audit + redirect.
package http
import (
"context"
stdhttp "net/http"
"net/url"
"strings"
"testing"
"gitea.dcglab.co.uk/steve/restic-manager/internal/api"
)
// TestRepoProbeOfflineRendersBanner: hitting probe for an offline
// host re-renders the repo page with a 422 banner; no init job lands.
func TestRepoProbeOfflineRendersBanner(t *testing.T) {
t.Parallel()
_, ts, st := rawTestServerWithUI(t)
hostID, _ := enrolHostForUI(t, nil, st, "probe-offline-host")
cookie := loginAsAdmin(t, st)
req, _ := stdhttp.NewRequest("POST", ts.URL+"/hosts/"+hostID+"/repo/probe",
strings.NewReader(""))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusUnprocessableEntity {
t.Fatalf("status: got %d, want 422", res.StatusCode)
}
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM jobs WHERE host_id = ? AND kind = ? AND actor_kind = 'user'`,
hostID, string(api.JobInit)).Scan(&n); err != nil {
t.Fatalf("count jobs: %v", err)
}
if n != 0 {
t.Errorf("user-actor init jobs: got %d, want 0 (offline guard bypassed)", n)
}
}
// TestRepoProbeDispatchesWhenOnline: with the agent connected, a
// probe creates a user-actor init job and audits.
func TestRepoProbeDispatchesWhenOnline(t *testing.T) {
t.Parallel()
srv, ts, st := rawTestServerWithUI(t)
hostID, token := enrolHostForUI(t, nil, st, "probe-ok-host")
c := agentDial(t, srv, ts, hostID, token)
sendHello(t, c, "probe-ok-host")
_ = drainUntil(t, c, api.MsgScheduleSet)
cookie := loginAsAdmin(t, st)
form := url.Values{}
req, _ := stdhttp.NewRequest("POST", ts.URL+"/hosts/"+hostID+"/repo/probe",
strings.NewReader(form.Encode()))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
cli := &stdhttp.Client{
CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
},
}
res, err := cli.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Fatalf("status: got %d, want 303", res.StatusCode)
}
if loc := res.Header.Get("Location"); !strings.Contains(loc, "saved=probe") {
t.Errorf("Location: got %q, want saved=probe", loc)
}
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM jobs WHERE host_id = ? AND kind = ? AND actor_kind = 'user'`,
hostID, string(api.JobInit)).Scan(&n); err != nil {
t.Fatalf("count jobs: %v", err)
}
if n != 1 {
t.Errorf("user-actor init jobs: got %d, want 1", n)
}
var auditN int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM audit_log WHERE action = 'host.repo_init_dispatched' AND target_id = ?`,
hostID).Scan(&auditN); err != nil {
t.Fatalf("count audit: %v", err)
}
if auditN != 1 {
t.Errorf("audit rows: got %d, want 1", auditN)
}
// Sanity: the host still exists and we can cleanly read repo status
// (it stays "unknown" because the agent never replies in this test).
host, err := st.GetHost(context.Background(), hostID)
if err != nil {
t.Fatalf("get host: %v", err)
}
if host.RepoStatus != "unknown" {
t.Errorf("repo_status: got %q, want unknown (no probe reply yet)", host.RepoStatus)
}
}
+6 -8
@@ -391,15 +391,13 @@ func (s *Server) handleUIRestoreTree(w stdhttp.ResponseWriter, r *stdhttp.Reques
 // defaultRestoreTargetDir is the placeholder shown on the step-3
 // New-directory radio card and the value used when the operator
-// leaves the field blank. The agent runs as root under systemd, so
-// we surface /root explicitly rather than $HOME — operators were
-// confused by "agent user's home" copy when the underlying user is
-// always root anyway. <job-id> is substituted at dispatch. The unit
-// no longer pins ReadWritePaths (ProtectSystem=full + no ProtectHome),
-// so operators can point this at /home/<user>/<wherever> directly
-// when they want a specific destination.
+// leaves the field blank. $HOME resolves agent-side (typically /root
+// for the systemd-as-root unit); <job-id> is substituted at dispatch.
+// The systemd unit pins ReadWritePaths to include the agent user's
+// home/rm-restore subdir so this default actually works under the
+// sandbox.
 func defaultRestoreTargetDir() string {
-return "/root/rm-restore/<job-id>/"
+return "$HOME/rm-restore/<job-id>/"
 }
// looksLikeRestoreTarget validates the operator-supplied target dir
+2 -2
@@ -302,8 +302,8 @@ func TestRestorePostHappyPathDispatches(t *testing.T) {
 if cp.Restore.InPlace {
 t.Fatal("expected new-directory mode (in_place=false)")
 }
-if !strings.HasPrefix(cp.Restore.TargetDir, "/root/rm-restore/") {
-t.Fatalf("target_dir: got %q, want prefix /root/rm-restore/", cp.Restore.TargetDir)
+if !strings.HasPrefix(cp.Restore.TargetDir, "$HOME/rm-restore/") {
+t.Fatalf("target_dir: got %q, want prefix $HOME/rm-restore/", cp.Restore.TargetDir)
 }
 // <job-id> placeholder substituted with the dispatched job_id.
 if !strings.Contains(cp.Restore.TargetDir, "/01") {
-501
@@ -1,501 +0,0 @@
// ui_users.go — Settings → Users HTML handlers (admin-only).
//
// Routes (wired in server.go's admin band):
//
// GET /settings/users → handleUIUsersList (this task)
// GET /settings/users/new → F2
// POST /settings/users/new → F2
// GET /settings/users/{id}/edit → F3
// POST /settings/users/{id}/edit → F3
// GET /settings/users/{id}/setup-link → F2
// POST /settings/users/{id}/disable → F3
// POST /settings/users/{id}/enable → F3
// POST /settings/users/{id}/regenerate-setup → F3
// POST /settings/users/{id}/force-logout → F3
package http
import (
"errors"
"log/slog"
stdhttp "net/http"
"net/mail"
"net/url"
"strings"
"time"
"github.com/go-chi/chi/v5"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
type usersPage struct {
Users []userRow
ShowDisabled bool
Sort string // "username" | "email" | "role" | "last_login_at"
Dir string // "asc" | "desc"
// SortHrefs is a fully-encoded /settings/users?…&sort=COL&dir=…
// for each sortable column. Built server-side because constructing
// the querystring inside <a href="…"> in html/template applies
// URL-attribute escaping to '=' (turning 'show_disabled=1' into
// 'show_disabled%3D1'), which silently drops every filter on click.
// Same shape as the audit page's SortHrefs.
SortHrefs map[string]string
}
type userRow struct {
ID string
Username string
Email string
Role string
LastLoginAt string // pre-formatted "2006-01-02 15:04:05" or "never"
Disabled bool
MustChangePassword bool
AuthSource string
}
func (s *Server) handleUIUsersList(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
q := r.URL.Query()
showDisabled := q.Get("show_disabled") == "1"
// Resolve sort against the allowlist. Default: username ASC.
resolvedSort := "username"
switch q.Get("sort") {
case "username", "email", "role", "last_login_at":
resolvedSort = q.Get("sort")
}
asc := q.Get("dir") != "desc"
if q.Get("sort") == "" {
// No explicit sort param (fresh page load) → force the default
// ASC order and ignore any stray dir value in the querystring.
asc = true
}
dirStr := "desc"
if asc {
dirStr = "asc"
}
users, err := s.deps.Store.ListUsers(r.Context(), store.UserSort{
OrderBy: resolvedSort, OrderAsc: asc,
})
if err != nil {
slog.Error("ui users: list", "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
rows := make([]userRow, 0, len(users))
for _, ux := range users {
if !showDisabled && ux.DisabledAt != nil {
continue
}
em := ""
if ux.Email != nil {
em = *ux.Email
}
ll := "never"
if ux.LastLoginAt != nil {
ll = ux.LastLoginAt.UTC().Format("2006-01-02 15:04:05")
}
rows = append(rows, userRow{
ID: ux.ID, Username: ux.Username, Email: em,
Role: string(ux.Role), LastLoginAt: ll,
Disabled: ux.DisabledAt != nil,
MustChangePassword: ux.MustChangePassword,
AuthSource: ux.AuthSource,
})
}
// Pre-build per-column hrefs so the template just emits them.
// Same pattern as ui_audit's SortHrefs — sidesteps html/template
// URL-attribute escaping turning '=' into '%3D'.
base := url.Values{}
if showDisabled {
base.Set("show_disabled", "1")
}
hrefs := make(map[string]string, 4)
for _, col := range []string{"username", "email", "role", "last_login_at"} {
v := url.Values{}
for k, vs := range base {
v[k] = vs
}
v.Set("sort", col)
newDir := "asc" // sensible default for inactive columns
if col == resolvedSort && asc {
newDir = "desc"
}
v.Set("dir", newDir)
hrefs[col] = "/settings/users?" + v.Encode()
}
view := s.baseView(r, u)
view.Title = "Users · restic-manager"
view.Active = "settings"
view.Page = usersPage{
Users: rows, ShowDisabled: showDisabled,
Sort: resolvedSort, Dir: dirStr,
SortHrefs: hrefs,
}
if err := s.deps.UI.Render(w, "users", view); err != nil {
slog.Error("ui users: render", "err", err)
}
}
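The per-column href construction in handleUIUsersList can be read in isolation. The following is a minimal sketch of that pattern, not the project's code: `buildSortHrefs` and `sortableColumns` are hypothetical names introduced here, and the base path is passed in as a parameter. It shows the two rules the handler applies: every link carries the current filter, and only the active ascending column flips to `desc`.

```go
package main

import (
	"fmt"
	"net/url"
)

// sortableColumns mirrors the allowlist in the handler (assumed fixed here).
var sortableColumns = []string{"username", "email", "role", "last_login_at"}

// buildSortHrefs is a hypothetical helper: it returns one fully encoded
// link per sortable column so a template can emit it without building
// a querystring itself.
func buildSortHrefs(basePath string, showDisabled bool, currentSort string, asc bool) map[string]string {
	hrefs := make(map[string]string, len(sortableColumns))
	for _, col := range sortableColumns {
		v := url.Values{}
		if showDisabled {
			v.Set("show_disabled", "1") // preserve the active filter on every link
		}
		v.Set("sort", col)
		dir := "asc" // inactive columns always start ascending
		if col == currentSort && asc {
			dir = "desc" // clicking the active ascending column reverses it
		}
		v.Set("dir", dir)
		// Encode sorts parameters by key, so the output is deterministic.
		hrefs[col] = basePath + "?" + v.Encode()
	}
	return hrefs
}

func main() {
	hrefs := buildSortHrefs("/settings/users", true, "username", true)
	for _, col := range sortableColumns {
		fmt.Println(col, "->", hrefs[col])
	}
}
```

Because `url.Values.Encode` orders keys alphabetically, the links are stable across requests, which keeps assertions on them simple in handler tests.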
type userFormPage struct {
Mode string // "new" | "edit" | "setup-link"
ID string
Username string
Email string
Role string
Disabled bool
HasSetup bool
SetupURL string
SetupExpAt time.Time
Error string
// Reenable is set when the admin landed here because they tried
// to add a username that already exists (disabled). Triggers a
// banner on the edit page explaining why and steering them at
// the Re-enable button. See handleUIUserNewPost's collision branch.
Reenable bool
AuthSource string
}
func (s *Server) handleUIUserNewGet(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
view := s.baseView(r, u)
view.Title = "New user · restic-manager"
view.Active = "settings"
view.Page = userFormPage{Mode: "new", Role: "operator"}
_ = s.deps.UI.Render(w, "user_edit", view)
}
func (s *Server) handleUIUserNewPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
uname := strings.ToLower(strings.TrimSpace(r.PostForm.Get("username")))
email := strings.TrimSpace(r.PostForm.Get("email"))
role, ok := validRole(r.PostForm.Get("role"))
if uname == "" || !ok {
view := s.baseView(r, u)
view.Title = "New user · restic-manager"
view.Active = "settings"
view.Page = userFormPage{
Mode: "new", Username: uname, Email: email,
Role: r.PostForm.Get("role"),
Error: "Username is required and role must be admin/operator/viewer.",
}
_ = s.deps.UI.Render(w, "user_edit", view)
return
}
if email != "" {
if _, err := mail.ParseAddress(email); err != nil {
view := s.baseView(r, u)
view.Title = "New user · restic-manager"
view.Active = "settings"
view.Page = userFormPage{
Mode: "new", Username: uname, Email: email,
Role: r.PostForm.Get("role"),
Error: "Email is not a valid address.",
}
_ = s.deps.UI.Render(w, "user_edit", view)
return
}
}
// Same collision logic as the API.
existing, err := s.deps.Store.GetUserByUsername(r.Context(), uname)
if err == nil {
if existing.DisabledAt != nil {
// Punt the admin to the edit page where Re-enable is one click.
stdhttp.Redirect(w, r, "/settings/users/"+existing.ID+
"/edit?reenable=1", stdhttp.StatusSeeOther)
return
}
view := s.baseView(r, u)
view.Title = "New user · restic-manager"
view.Active = "settings"
view.Page = userFormPage{
Mode: "new", Username: uname, Email: email,
Role: r.PostForm.Get("role"),
Error: "A user with that name already exists.",
}
_ = s.deps.UI.Render(w, "user_edit", view)
return
} else if !errors.Is(err, store.ErrNotFound) {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
id := ulid.Make().String()
now := time.Now().UTC()
var emailPtr *string
if email != "" {
em := strings.ToLower(email)
emailPtr = &em
}
if err := s.deps.Store.CreateUser(r.Context(), store.User{
ID: id, Username: uname, PasswordHash: "",
Role: role, Email: emailPtr, CreatedAt: now,
MustChangePassword: true,
}); err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
rawToken, err := generateSetupToken()
if err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if err := s.deps.Store.SetSetupToken(r.Context(), store.SetupToken{
UserID: id, TokenHash: hashSetupToken(rawToken),
ExpiresAt: now.Add(time.Hour),
CreatedAt: now, CreatedBy: &u.ID,
}); err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.created", TargetKind: ptr("user"), TargetID: &id,
TS: now,
})
stdhttp.Redirect(w, r,
"/settings/users/"+id+"/setup-link?token="+rawToken,
stdhttp.StatusSeeOther)
}
func (s *Server) handleUIUserEditGet(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
id := chi.URLParam(r, "id")
target, err := s.deps.Store.GetUserByID(r.Context(), id)
if err != nil {
stdhttp.NotFound(w, r)
return
}
em := ""
if target.Email != nil {
em = *target.Email
}
view := s.baseView(r, u)
view.Title = "Edit user · restic-manager"
view.Active = "settings"
view.Page = userFormPage{
Mode: "edit", ID: target.ID, Username: target.Username,
Email: em, Role: string(target.Role),
Disabled: target.DisabledAt != nil,
Reenable: r.URL.Query().Get("reenable") == "1",
AuthSource: target.AuthSource,
}
_ = s.deps.UI.Render(w, "user_edit", view)
}
func (s *Server) handleUIUserEditPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
id := chi.URLParam(r, "id")
target, err := s.deps.Store.GetUserByID(r.Context(), id)
if err != nil {
stdhttp.NotFound(w, r)
return
}
if target.AuthSource == "oidc" {
stdhttp.Error(w, "OIDC users cannot have role/email edited locally", stdhttp.StatusForbidden)
return
}
role, ok := validRole(r.PostForm.Get("role"))
if !ok {
stdhttp.Error(w, "bad role", stdhttp.StatusBadRequest)
return
}
email := strings.TrimSpace(r.PostForm.Get("email"))
if email != "" {
if _, err := mail.ParseAddress(email); err != nil {
stdhttp.Error(w, "bad email", stdhttp.StatusBadRequest)
return
}
}
if target.Role == store.RoleAdmin && role != store.RoleAdmin && target.DisabledAt == nil {
n, _ := s.deps.Store.CountEnabledAdmins(r.Context())
if n <= 1 {
stdhttp.Error(w, "cannot demote last admin", stdhttp.StatusConflict)
return
}
}
if err := s.deps.Store.SetUserRole(r.Context(), id, role); err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if err := s.deps.Store.SetUserEmail(r.Context(), id, email); err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.updated", TargetKind: ptr("user"), TargetID: &id,
TS: time.Now().UTC(),
})
stdhttp.Redirect(w, r, "/settings/users", stdhttp.StatusSeeOther)
}
func (s *Server) handleUIUserDisablePost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
id := chi.URLParam(r, "id")
target, err := s.deps.Store.GetUserByID(r.Context(), id)
if err != nil {
stdhttp.NotFound(w, r)
return
}
if target.Role == store.RoleAdmin && target.DisabledAt == nil {
n, _ := s.deps.Store.CountEnabledAdmins(r.Context())
if n <= 1 {
stdhttp.Error(w, "cannot disable last admin", stdhttp.StatusConflict)
return
}
}
now := time.Now().UTC()
if err := s.deps.Store.DisableUser(r.Context(), id, now); err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_, _ = s.deps.Store.DeleteSessionsByUserID(r.Context(), id)
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.disabled", TargetKind: ptr("user"), TargetID: &id,
TS: now,
})
stdhttp.Redirect(w, r, "/settings/users", stdhttp.StatusSeeOther)
}
func (s *Server) handleUIUserEnablePost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
id := chi.URLParam(r, "id")
if err := s.deps.Store.EnableUser(r.Context(), id); err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.enabled", TargetKind: ptr("user"), TargetID: &id,
TS: time.Now().UTC(),
})
stdhttp.Redirect(w, r, "/settings/users/"+id+"/edit", stdhttp.StatusSeeOther)
}
func (s *Server) handleUIUserRegenerateSetupPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
id := chi.URLParam(r, "id")
if _, err := s.deps.Store.GetUserByID(r.Context(), id); err != nil {
stdhttp.NotFound(w, r)
return
}
rawToken, err := generateSetupToken()
if err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
now := time.Now().UTC()
if err := s.deps.Store.SetSetupToken(r.Context(), store.SetupToken{
UserID: id, TokenHash: hashSetupToken(rawToken),
ExpiresAt: now.Add(time.Hour), CreatedAt: now,
CreatedBy: &u.ID,
}); err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.SetMustChangePassword(r.Context(), id, true)
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.setup_token.regenerated",
TargetKind: ptr("user"), TargetID: &id, TS: now,
})
stdhttp.Redirect(w, r,
"/settings/users/"+id+"/setup-link?token="+rawToken,
stdhttp.StatusSeeOther)
}
func (s *Server) handleUIUserForceLogoutPost(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
id := chi.URLParam(r, "id")
_, err := s.deps.Store.DeleteSessionsByUserID(r.Context(), id)
if err != nil {
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "user.force_logout",
TargetKind: ptr("user"), TargetID: &id,
TS: time.Now().UTC(),
})
stdhttp.Redirect(w, r, "/settings/users/"+id+"/edit", stdhttp.StatusSeeOther)
}
func (s *Server) handleUIUserSetupLinkGet(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
id := chi.URLParam(r, "id")
target, err := s.deps.Store.GetUserByID(r.Context(), id)
if err != nil {
stdhttp.NotFound(w, r)
return
}
rawToken := r.URL.Query().Get("token")
tok, err := s.deps.Store.GetSetupTokenByUserID(r.Context(), id)
if err != nil || rawToken == "" {
w.WriteHeader(stdhttp.StatusGone)
view := s.baseView(r, u)
view.Title = "Link expired · restic-manager"
view.Active = "settings"
view.Page = userFormPage{
Mode: "setup-link", ID: target.ID, Username: target.Username,
Error: "expired",
}
_ = s.deps.UI.Render(w, "user_edit", view)
return
}
view := s.baseView(r, u)
view.Title = "Setup link · restic-manager"
view.Active = "settings"
view.Page = userFormPage{
Mode: "setup-link", ID: target.ID, Username: target.Username,
Role: string(target.Role), HasSetup: true,
SetupURL: s.deps.Cfg.BaseURL + "/setup?token=" + rawToken,
SetupExpAt: tok.ExpiresAt,
}
_ = s.deps.UI.Render(w, "user_edit", view)
}
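The handlers above hand the raw setup token to the admin in a link and store only `hashSetupToken(rawToken)`. Neither `generateSetupToken` nor `hashSetupToken` appears in this diff, so the following is only a sketch of one common way to implement that split, assuming 32 random bytes encoded as unpadded base64url and an unsalted SHA-256 hex digest. `newSetupToken` and `hashToken` are hypothetical names, not the project's functions.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// newSetupToken is a hypothetical stand-in for generateSetupToken:
// 32 bytes from the OS CSPRNG, encoded URL-safe without padding.
func newSetupToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// hashToken is a hypothetical stand-in for hashSetupToken. Only this
// digest would be persisted; the raw value lives in the link alone.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func main() {
	raw, err := newSetupToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("token for the link:", raw)
	fmt.Println("hash for the store:", hashToken(raw))
}
```

An unsalted digest is a reasonable choice in this sketch because the input is already high-entropy random data; a password would need a slow, salted hash instead.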
-301
@@ -1,301 +0,0 @@
package http
import (
"bytes"
"encoding/json"
"io"
stdhttp "net/http"
"strings"
"testing"
"time"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
func TestAPIUsersList(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
makeUser(t, srv, "op1", store.RoleOperator)
cookie := loginAs(t, srv, adminID)
req, _ := stdhttp.NewRequest("GET", ts.URL+"/api/users", nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("GET: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
body, _ := io.ReadAll(res.Body)
t.Fatalf("status: got %d body=%s", res.StatusCode, body)
}
var got listUsersResponse
_ = json.NewDecoder(res.Body).Decode(&got)
if len(got.Users) != 2 {
t.Errorf("count: got %d want 2", len(got.Users))
}
}
func TestAPIUserCreate(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
cookie := loginAs(t, srv, adminID)
body, _ := json.Marshal(map[string]any{
"username": "Bob", "email": "bob@example.com", "role": "operator",
})
req, _ := stdhttp.NewRequest("POST", ts.URL+"/api/users", bytes.NewReader(body))
req.AddCookie(cookie)
req.Header.Set("Content-Type", "application/json")
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusCreated {
body, _ := io.ReadAll(res.Body)
t.Fatalf("status: got %d body=%s", res.StatusCode, body)
}
var got struct {
ID string `json:"id"`
SetupURL string `json:"setup_url"`
}
_ = json.NewDecoder(res.Body).Decode(&got)
if got.ID == "" || got.SetupURL == "" {
t.Errorf("missing fields: %+v", got)
}
if !strings.Contains(got.SetupURL, "/setup?token=") {
t.Errorf("setup_url shape: %q", got.SetupURL)
}
// Verify lowercase-normalised.
u, err := srv.deps.Store.GetUserByUsername(t.Context(), "bob")
if err != nil {
t.Fatalf("get: %v", err)
}
if u.Username != "bob" {
t.Errorf("username: got %q want bob", u.Username)
}
if !u.MustChangePassword {
t.Error("must_change_password not set")
}
}
func TestAPIUserCreateRejectsDuplicateEnabled(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
makeUser(t, srv, "alice", store.RoleOperator)
cookie := loginAs(t, srv, adminID)
body, _ := json.Marshal(map[string]any{
"username": "ALICE", "role": "operator",
})
req, _ := stdhttp.NewRequest("POST", ts.URL+"/api/users", bytes.NewReader(body))
req.AddCookie(cookie)
req.Header.Set("Content-Type", "application/json")
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusConflict {
t.Errorf("status: got %d want 409", res.StatusCode)
}
}
func TestAPIUserGet(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
target := makeUser(t, srv, "carol", store.RoleViewer)
cookie := loginAs(t, srv, adminID)
req, _ := stdhttp.NewRequest("GET", ts.URL+"/api/users/"+target, nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("GET: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
t.Errorf("status: got %d", res.StatusCode)
}
}
func TestAPIUserPatchRoleAndEmail(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
target := makeUser(t, srv, "carol", store.RoleViewer)
cookie := loginAs(t, srv, adminID)
body, _ := json.Marshal(map[string]any{
"role": "operator", "email": "carol@example.com",
})
req, _ := stdhttp.NewRequest("PATCH", ts.URL+"/api/users/"+target, bytes.NewReader(body))
req.AddCookie(cookie)
req.Header.Set("Content-Type", "application/json")
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("PATCH: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
body, _ := io.ReadAll(res.Body)
t.Errorf("status: got %d body=%s", res.StatusCode, body)
}
got, _ := srv.deps.Store.GetUserByID(t.Context(), target)
if got.Role != store.RoleOperator {
t.Errorf("role: got %q", got.Role)
}
if got.Email == nil || *got.Email != "carol@example.com" {
t.Errorf("email: got %v", got.Email)
}
}
func TestAPIUserPatchRejectsLastAdminDemote(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
cookie := loginAs(t, srv, adminID)
body, _ := json.Marshal(map[string]any{"role": "viewer"})
req, _ := stdhttp.NewRequest("PATCH", ts.URL+"/api/users/"+adminID, bytes.NewReader(body))
req.AddCookie(cookie)
req.Header.Set("Content-Type", "application/json")
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("PATCH: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusConflict {
t.Errorf("status: got %d want 409", res.StatusCode)
}
}
func TestAPIUserDisable(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
makeUser(t, srv, "admin2", store.RoleAdmin) // satisfy last-admin guard
target := makeUser(t, srv, "victim", store.RoleOperator)
cookie := loginAs(t, srv, adminID)
req, _ := stdhttp.NewRequest("POST", ts.URL+"/api/users/"+target+"/disable", nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
t.Errorf("status: got %d", res.StatusCode)
}
u, _ := srv.deps.Store.GetUserByID(t.Context(), target)
if u.DisabledAt == nil {
t.Error("disabled_at not set")
}
}
func TestAPIUserDisableRejectsLastAdmin(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
cookie := loginAs(t, srv, adminID)
req, _ := stdhttp.NewRequest("POST", ts.URL+"/api/users/"+adminID+"/disable", nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusConflict {
t.Errorf("status: got %d want 409", res.StatusCode)
}
}
func TestAPIUserRegenerateSetup(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
target := makeUser(t, srv, "newbie", store.RoleViewer)
_ = srv.deps.Store.SetMustChangePassword(t.Context(), target, true)
_ = srv.deps.Store.SetSetupToken(t.Context(), store.SetupToken{
UserID: target, TokenHash: "old", ExpiresAt: time.Now().UTC().Add(time.Hour),
CreatedAt: time.Now().UTC(),
})
cookie := loginAs(t, srv, adminID)
req, _ := stdhttp.NewRequest("POST", ts.URL+"/api/users/"+target+"/regenerate-setup", nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
t.Errorf("status: got %d", res.StatusCode)
}
var got struct {
SetupURL string `json:"setup_url"`
}
_ = json.NewDecoder(res.Body).Decode(&got)
if !strings.Contains(got.SetupURL, "/setup?token=") {
t.Errorf("setup_url: %q", got.SetupURL)
}
if _, err := srv.deps.Store.LookupSetupToken(t.Context(), "old"); err == nil {
t.Error("old token should be replaced")
}
}
func TestAPIUserForceLogout(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
adminID := makeUser(t, srv, "admin1", store.RoleAdmin)
target := makeUser(t, srv, "victim", store.RoleOperator)
loginAs(t, srv, target) // create a session for the victim
cookie := loginAs(t, srv, adminID)
req, _ := stdhttp.NewRequest("POST", ts.URL+"/api/users/"+target+"/force-logout", nil)
req.AddCookie(cookie)
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
t.Errorf("status: got %d", res.StatusCode)
}
rr, _ := srv.deps.Store.DeleteSessionsByUserID(t.Context(), target)
if rr != 0 {
t.Errorf("expected 0 remaining sessions, got %d", rr)
}
}
func TestAPIAccountPasswordChange(t *testing.T) {
t.Parallel()
srv, ts, _ := rawTestServerWithUI(t)
uid := makeUser(t, srv, "alice", store.RoleViewer)
cookie := loginAs(t, srv, uid)
body, _ := json.Marshal(map[string]string{
"current_password": "test-password",
"new_password": "averylongpassword",
})
req, _ := stdhttp.NewRequest("POST", ts.URL+"/api/account/password", bytes.NewReader(body))
req.AddCookie(cookie)
req.Header.Set("Content-Type", "application/json")
res, err := stdhttp.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST: %v", err)
}
defer res.Body.Close()
if res.StatusCode != stdhttp.StatusOK {
body, _ := io.ReadAll(res.Body)
t.Errorf("status: got %d body=%s", res.StatusCode, body)
}
}
@@ -1,58 +0,0 @@
package http
import (
stdhttp "net/http"
"testing"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// makeUser inserts a user with a known password ('test-password').
// Returns the user id. Used by RBAC middleware tests + the
// user-management handler tests.
//
//nolint:unused
func makeUser(t *testing.T, srv *Server, username string, role store.Role) string {
t.Helper()
id := ulid.Make().String()
hash, err := auth.HashPassword("test-password")
if err != nil {
t.Fatalf("hash: %v", err)
}
if err := srv.deps.Store.CreateUser(t.Context(), store.User{
ID: id, Username: username, PasswordHash: hash,
Role: role, CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("create user %s: %v", username, err)
}
return id
}
// loginAs gets a session cookie for the given user. Skips the real
// /api/auth/login handler for speed and to keep these helpers usable
// even when login validation is mid-flight elsewhere.
//
//nolint:unused
func loginAs(t *testing.T, srv *Server, userID string) *stdhttp.Cookie {
t.Helper()
rawToken, err := auth.NewToken()
if err != nil {
t.Fatalf("token: %v", err)
}
hash := auth.HashToken(rawToken)
now := time.Now().UTC()
if err := srv.deps.Store.CreateSession(t.Context(), store.Session{
ID: hash, UserID: userID, CreatedAt: now,
ExpiresAt: now.Add(8 * time.Hour),
}, hash); err != nil {
t.Fatalf("session: %v", err)
}
return &stdhttp.Cookie{
Name: sessionCookieName,
Value: rawToken,
}
}
-208
@@ -1,208 +0,0 @@
// Package oidc wraps go-oidc + oauth2 in the small surface the
// HTTP handlers need: discovery, code-exchange config, ID-token
// verification, and role-claim resolution.
package oidc
import (
"context"
"crypto/rand"
"crypto/sha256"
"encoding/base64"
"errors"
"fmt"
"strings"
gooidc "github.com/coreos/go-oidc/v3/oidc"
"golang.org/x/oauth2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/config"
)
// Client bundles the discovered provider + a pre-built oauth2.Config.
// Constructed once at server start; safe for concurrent use.
type Client struct {
cfg *config.OIDCConfig
provider *gooidc.Provider
verifier *gooidc.IDTokenVerifier
oauth *oauth2.Config
endSession string // discovered end_session_endpoint, "" if none
}
// New discovers the provider's well-known config and builds a Client.
// Network call — should be invoked once at startup with a context
// carrying a sane timeout. Returns an error on a 4xx/5xx from
// discovery so the operator finds out at startup, not on first login.
func New(ctx context.Context, cfg *config.OIDCConfig, baseURL string) (*Client, error) {
if cfg == nil {
return nil, errors.New("oidc: config nil")
}
prov, err := gooidc.NewProvider(ctx, cfg.Issuer)
if err != nil {
return nil, fmt.Errorf("oidc: discovery: %w", err)
}
redir := cfg.RedirectURL
if redir == "" {
redir = strings.TrimRight(baseURL, "/") + "/auth/oidc/callback"
}
oa := &oauth2.Config{
ClientID: cfg.ClientID,
ClientSecret: cfg.ClientSecret,
Endpoint: prov.Endpoint(),
RedirectURL: redir,
Scopes: cfg.Scopes,
}
verifier := prov.Verifier(&gooidc.Config{ClientID: cfg.ClientID})
// Pull end_session_endpoint out of the discovery doc — go-oidc
// doesn't expose it as a typed field, but the underlying claims
// blob does.
var doc struct {
EndSessionEndpoint string `json:"end_session_endpoint"`
}
_ = prov.Claims(&doc)
return &Client{
cfg: cfg,
provider: prov,
verifier: verifier,
oauth: oa,
endSession: doc.EndSessionEndpoint,
}, nil
}
// AuthURL returns the URL to redirect the browser to for the
// Authorization Code + PKCE flow. State and the S256 code challenge
// are caller-supplied so the caller can persist the state and its
// matching verifier in the oidc_state table.
func (c *Client) AuthURL(state, codeChallenge string) string {
return c.oauth.AuthCodeURL(state,
oauth2.SetAuthURLParam("code_challenge", codeChallenge),
oauth2.SetAuthURLParam("code_challenge_method", "S256"),
)
}
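AuthURL expects the caller to have already derived the code challenge. As a minimal sketch of that caller-side step, assuming the S256 method defined in RFC 7636 (the challenge is the unpadded base64url encoding of the SHA-256 of the verifier), the pair could be produced as below. `newVerifier` and `challengeS256` are hypothetical names; the project's own helper is not shown in this diff.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newVerifier returns a random PKCE code verifier. 32 random bytes
// encode to 43 characters, the minimum length RFC 7636 allows.
func newVerifier() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// challengeS256 derives the code_challenge sent on the authorization
// request; the verifier itself is only revealed at token exchange.
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	verifier, err := newVerifier()
	if err != nil {
		panic(err)
	}
	fmt.Println("verifier: ", verifier)
	fmt.Println("challenge:", challengeS256(verifier))
}
```

The verifier would be stored server-side next to the state value and passed to Exchange on callback, while only the challenge travels through the browser.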
// Exchange swaps a code+verifier for a token set and verifies the
// id_token. Returns the parsed Claims and the raw id_token (the
// caller stashes the raw on the session for RP-initiated logout).
func (c *Client) Exchange(ctx context.Context, code, verifier string) (*Claims, string, error) {
tok, err := c.oauth.Exchange(ctx, code,
oauth2.SetAuthURLParam("code_verifier", verifier))
if err != nil {
return nil, "", fmt.Errorf("oidc: token exchange: %w", err)
}
rawID, ok := tok.Extra("id_token").(string)
if !ok || rawID == "" {
return nil, "", errors.New("oidc: id_token missing from token response")
}
idTok, err := c.verifier.Verify(ctx, rawID)
if err != nil {
return nil, "", fmt.Errorf("oidc: verify id_token: %w", err)
}
var raw map[string]any
if err := idTok.Claims(&raw); err != nil {
return nil, "", fmt.Errorf("oidc: claims: %w", err)
}
// Many IdPs (Authelia among them) return only minimal claims in
// the ID token and put profile/email/groups on /userinfo. Fetch
// userinfo and merge; id_token claims win on conflict so the
// signed assertion remains authoritative. A userinfo failure is
// non-fatal: the id_token claims are then used on their own.
if ui, err := c.provider.UserInfo(ctx, oauth2.StaticTokenSource(tok)); err == nil {
var uiClaims map[string]any
if err := ui.Claims(&uiClaims); err == nil {
for k, v := range uiClaims {
if _, present := raw[k]; !present {
raw[k] = v
}
}
}
}
return parseClaims(raw, c.cfg.RoleClaim), rawID, nil
}
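The merge rule in Exchange (userinfo fills gaps, id_token wins on conflict) can be sketched in isolation. This is a minimal illustration, not the project's code: `mergeClaims` is a hypothetical helper name, and unlike the loop above it returns a new map instead of mutating its input.

```go
package main

import "fmt"

// mergeClaims (hypothetical name) copies userinfo claims into the
// id_token claims without overwriting any key the id_token already
// carries, so the signed assertion stays authoritative.
func mergeClaims(idToken, userinfo map[string]any) map[string]any {
	out := make(map[string]any, len(idToken)+len(userinfo))
	for k, v := range idToken {
		out[k] = v
	}
	for k, v := range userinfo {
		if _, present := out[k]; !present {
			out[k] = v
		}
	}
	return out
}

func main() {
	idToken := map[string]any{"sub": "abc", "email": "signed@example.com"}
	userinfo := map[string]any{"email": "other@example.com", "name": "Alice"}
	merged := mergeClaims(idToken, userinfo)
	// email comes from the id_token, name from userinfo.
	fmt.Println(merged["email"], merged["name"])
}
```

The point of the ordering is that an unsigned /userinfo response can only add claims, never replace a signed one.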
// EndSessionEndpoint exposes the discovered end_session URL ("" if
// the IdP doesn't advertise one).
func (c *Client) EndSessionEndpoint() string { return c.endSession }
// DisplayName for the SSO button on the login page.
func (c *Client) DisplayName() string { return c.cfg.DisplayName }
// MapRole returns the role for the first matching claim value; "" if
// none match. Caller treats "" as deny.
func (c *Client) MapRole(roles []string) string {
for _, r := range roles {
if mapped, ok := c.cfg.RoleMapping[r]; ok {
return mapped
}
}
return ""
}
// Claims is the minimal projection the callback handler cares about.
type Claims struct {
Subject string
PreferredUsername string
Email string
Roles []string // normalised from string|[]string|csv
}
// parseClaims pulls the four fields we need from the merged
// id_token + userinfo claims. The role claim (named by roleClaim) is
// normalised from the three shapes IdPs emit (string, []string,
// comma-separated string).
func parseClaims(raw map[string]any, roleClaim string) *Claims {
c := &Claims{}
if v, ok := raw["sub"].(string); ok {
c.Subject = v
}
if v, ok := raw["preferred_username"].(string); ok {
c.PreferredUsername = v
}
if v, ok := raw["email"].(string); ok {
c.Email = v
}
switch v := raw[roleClaim].(type) {
case string:
for _, p := range strings.Split(v, ",") {
p = strings.TrimSpace(p)
if p != "" {
c.Roles = append(c.Roles, p)
}
}
case []any:
for _, item := range v {
if s, ok := item.(string); ok && s != "" {
c.Roles = append(c.Roles, s)
}
}
}
return c
}
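The three role-claim shapes handled by parseClaims can be exercised on their own. The sketch below assumes a standalone helper, `normaliseRoles` (a hypothetical name), and decodes sample JSON documents to show why the array case matches `[]any` and not `[]string`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// normaliseRoles (hypothetical name) flattens a role claim that may be
// a single string, a comma-separated string, or a JSON array.
func normaliseRoles(v any) []string {
	var roles []string
	switch x := v.(type) {
	case string:
		for _, p := range strings.Split(x, ",") {
			if p = strings.TrimSpace(p); p != "" {
				roles = append(roles, p)
			}
		}
	case []any: // encoding/json decodes JSON arrays into []any
		for _, item := range x {
			if s, ok := item.(string); ok && s != "" {
				roles = append(roles, s)
			}
		}
	}
	return roles
}

func main() {
	for _, doc := range []string{
		`{"groups": "rm-admins"}`,
		`{"groups": "rm-admins, rm-operators"}`,
		`{"groups": ["rm-admins", "rm-operators"]}`,
	} {
		var raw map[string]any
		if err := json.Unmarshal([]byte(doc), &raw); err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", normaliseRoles(raw["groups"]))
	}
}
```

A Go `[]string` passed in directly, without a JSON round-trip, matches neither case and yields no roles, which is worth keeping in mind when writing unit tests against this shape of switch.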
// RandomState returns 32 random bytes, URL-safe base64-encoded
// (43 chars, unpadded), for use as the 'state' parameter on the
// authorization request. The caller is expected to compute
// sha256(state) for storage.
func RandomState() (string, error) {
var b [32]byte
if _, err := rand.Read(b[:]); err != nil {
return "", err
}
return base64.RawURLEncoding.EncodeToString(b[:]), nil
}
// PKCEPair generates a code_verifier (base64-url 64 chars) and the
// corresponding S256 code_challenge.
func PKCEPair() (verifier, challenge string, err error) {
var b [48]byte
if _, err := rand.Read(b[:]); err != nil {
return "", "", err
}
verifier = base64.RawURLEncoding.EncodeToString(b[:])
sum := sha256.Sum256([]byte(verifier))
challenge = base64.RawURLEncoding.EncodeToString(sum[:])
return verifier, challenge, nil
}
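For context, the check an authorization server performs at the token endpoint is the mirror image of PKCEPair: it recomputes the S256 challenge from the presented verifier and compares it with the challenge stored at authorization time (RFC 7636). The sketch below is illustrative only; `challengeFor` and `verifyPKCE` are hypothetical names, and the constant-time comparison is a defensive choice made here, not something the source specifies.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// challengeFor derives the S256 code_challenge for a verifier:
// BASE64URL(SHA256(ASCII(verifier))) without padding.
func challengeFor(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// verifyPKCE recomputes the challenge from the verifier sent with the
// token request and compares it with the stored challenge.
func verifyPKCE(verifier, storedChallenge string) bool {
	got := challengeFor(verifier)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedChallenge)) == 1
}

func main() {
	var b [48]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	verifier := base64.RawURLEncoding.EncodeToString(b[:]) // 48 bytes -> 64 chars
	challenge := challengeFor(verifier)                    // 32 bytes -> 43 chars
	fmt.Println(len(verifier), len(challenge), verifyPKCE(verifier, challenge))
}
```

The 64-character verifier sits inside the 43 to 128 character range that RFC 7636 allows.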
// HashState returns sha256(state) hex — used as the primary key in
// the oidc_state table (so a DB leak doesn't leak active states).
func HashState(state string) string {
sum := sha256.Sum256([]byte(state))
return fmt.Sprintf("%x", sum)
}
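How a hashed state key might be used on the callback path can be sketched with an in-memory stand-in for the oidc_state table. Everything here is an assumption for illustration: `stateStore`, `put` and `take` are hypothetical names, and the single-use delete on lookup is a design choice of this sketch, not confirmed behaviour of the real table.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// hashState mirrors HashState: lowercase hex of sha256(state).
func hashState(state string) string {
	sum := sha256.Sum256([]byte(state))
	return hex.EncodeToString(sum[:])
}

// stateStore is a hypothetical in-memory stand-in for oidc_state.
// Rows are keyed by the hash, so the raw state is never stored.
type stateStore struct {
	mu   sync.Mutex
	rows map[string]string // sha256(state) hex -> PKCE verifier
}

func newStateStore() *stateStore {
	return &stateStore{rows: map[string]string{}}
}

// put records the verifier under the hashed state.
func (s *stateStore) put(state, verifier string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[hashState(state)] = verifier
}

// take looks the verifier up by hashed state and deletes the row, so a
// replayed callback with the same state finds nothing.
func (s *stateStore) take(state string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := hashState(state)
	v, ok := s.rows[key]
	delete(s.rows, key)
	return v, ok
}

func main() {
	st := newStateStore()
	st.put("state-from-browser", "pkce-verifier")
	v, ok := st.take("state-from-browser")
	fmt.Println(v, ok)
	_, again := st.take("state-from-browser")
	fmt.Println(again)
}
```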
-49
@@ -1,49 +0,0 @@
package oidc
import (
"context"
"testing"
"time"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/config"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/oidc/oidctest"
)
func TestClientExchangeAgainstStub(t *testing.T) {
t.Parallel()
stub := oidctest.New(t)
cfg := &config.OIDCConfig{
Issuer: stub.URL(), ClientID: "test-client", ClientSecret: "x",
Scopes: []string{"openid"}, RoleClaim: "groups",
RoleMapping: map[string]string{"rm-admins": "admin"},
}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
c, err := New(ctx, cfg, "http://rm.example")
if err != nil {
t.Fatalf("new client: %v", err)
}
code := stub.MintCode(map[string]any{
"sub": "abc",
"preferred_username": "alice",
"email": "alice@example.com",
"groups": []string{"rm-admins"},
})
verifier, _, err := PKCEPair()
if err != nil {
t.Fatalf("pkce: %v", err)
}
claims, raw, err := c.Exchange(ctx, code, verifier)
if err != nil {
t.Fatalf("exchange: %v", err)
}
if claims.Subject != "abc" || claims.PreferredUsername != "alice" {
t.Errorf("claims: %+v", claims)
}
if c.MapRole(claims.Roles) != "admin" {
t.Errorf("role: got %q", c.MapRole(claims.Roles))
}
if raw == "" {
t.Error("raw id_token must be non-empty")
}
}
-181
@@ -1,181 +0,0 @@
// Package oidctest provides a minimal OIDC provider for tests —
// discovery doc, JWKS, and a token endpoint. Each test mints its
// own claims; the stub signs them with an ECDSA P-256 key and the
// production verifier accepts them because the JWKS is fetched live
// from the stub.
//
// Usage:
//
// stub := oidctest.New(t)
// code := stub.MintCode(map[string]any{
// "sub": "abc",
// "preferred_username": "alice",
// "groups": []string{"rm-admins"},
// })
// // stub.URL() is the issuer URL; pass to oidc.New as Issuer
package oidctest
import (
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"encoding/base64"
"encoding/json"
"fmt"
stdhttp "net/http"
"net/http/httptest"
"sync"
"testing"
"time"
"github.com/golang-jwt/jwt/v5"
)
// StubIdP is an httptest-backed OIDC provider. Each test creates a
// fresh one via New(t); cleanup is registered on t.
type StubIdP struct {
t *testing.T
srv *httptest.Server
mu sync.Mutex
priv *ecdsa.PrivateKey
kid string
claims map[string]map[string]any // code → claims
endSession string // optional, set by SetEndSessionEndpoint
}
// New constructs a stub IdP listening on a random port. Cleanup is
// registered on t.
func New(t *testing.T) *StubIdP {
t.Helper()
priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
t.Fatalf("oidctest: genkey: %v", err)
}
s := &StubIdP{
t: t,
priv: priv,
kid: "stub-key",
claims: map[string]map[string]any{},
}
mux := stdhttp.NewServeMux()
mux.HandleFunc("/.well-known/openid-configuration", s.discovery)
mux.HandleFunc("/jwks.json", s.jwks)
mux.HandleFunc("/token", s.token)
s.srv = httptest.NewServer(mux)
t.Cleanup(s.srv.Close)
return s
}
// URL returns the base URL of the stub — pass as Issuer to
// oidc.New().
func (s *StubIdP) URL() string { return s.srv.URL }
// MintCode produces an authorization code that the stub will exchange
// for an id_token containing the supplied claims.
func (s *StubIdP) MintCode(claims map[string]any) string {
s.mu.Lock()
defer s.mu.Unlock()
code := fmt.Sprintf("code-%d", time.Now().UnixNano())
s.claims[code] = claims
return code
}
// SetEndSessionEndpoint configures the stub to advertise an
// end_session_endpoint in its discovery doc. Used by the logout
// test in E1.
func (s *StubIdP) SetEndSessionEndpoint(url string) {
s.mu.Lock()
defer s.mu.Unlock()
s.endSession = url
}
func (s *StubIdP) discovery(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
s.mu.Lock()
endSession := s.endSession
s.mu.Unlock()
doc := map[string]any{
"issuer": s.srv.URL,
"authorization_endpoint": s.srv.URL + "/authorize",
"token_endpoint": s.srv.URL + "/token",
"jwks_uri": s.srv.URL + "/jwks.json",
"id_token_signing_alg_values_supported": []string{"ES256"},
"response_types_supported": []string{"code"},
"subject_types_supported": []string{"public"},
}
if endSession != "" {
doc["end_session_endpoint"] = endSession
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(doc)
}
func (s *StubIdP) jwks(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
pub := s.priv.Public().(*ecdsa.PublicKey)
x := base64.RawURLEncoding.EncodeToString(padTo32(pub.X.Bytes()))
y := base64.RawURLEncoding.EncodeToString(padTo32(pub.Y.Bytes()))
keys := map[string]any{
"keys": []map[string]any{{
"kty": "EC", "crv": "P-256", "alg": "ES256",
"use": "sig", "kid": s.kid,
"x": x, "y": y,
}},
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(keys)
}
func (s *StubIdP) token(w stdhttp.ResponseWriter, r *stdhttp.Request) {
_ = r.ParseForm()
code := r.PostForm.Get("code")
s.mu.Lock()
claims, ok := s.claims[code]
if ok {
delete(s.claims, code)
}
s.mu.Unlock()
if !ok {
stdhttp.Error(w, "bad code", stdhttp.StatusBadRequest)
return
}
if _, ok := claims["iss"]; !ok {
claims["iss"] = s.srv.URL
}
if _, ok := claims["aud"]; !ok {
claims["aud"] = "test-client"
}
now := time.Now().Unix()
claims["iat"] = now
claims["exp"] = now + 600
jc := jwt.MapClaims{}
for k, v := range claims {
jc[k] = v
}
tk := jwt.NewWithClaims(jwt.SigningMethodES256, jc)
tk.Header["kid"] = s.kid
signed, err := tk.SignedString(s.priv)
if err != nil {
stdhttp.Error(w, err.Error(), 500)
return
}
resp := map[string]any{
"access_token": "stub-access",
"token_type": "Bearer",
"id_token": signed,
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(resp)
}
// padTo32 left-pads a big-endian integer byte slice with zeros to
// 32 bytes, the size required by P-256 JWK x/y components.
func padTo32(b []byte) []byte {
if len(b) >= 32 {
return b
}
out := make([]byte, 32)
copy(out[32-len(b):], b)
return out
}
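The padding matters because `big.Int.Bytes` drops leading zero bytes, while JWK coordinates for P-256 must be exactly 32 bytes. The sketch below copies padTo32 and compares it with the standard library's `big.Int.FillBytes`, which produces the same fixed-width encoding; `agree` is a hypothetical helper added only for the comparison.

```go
package main

import (
	"bytes"
	"fmt"
	"math/big"
)

// padTo32 is copied from the stub: left-pad with zeros to 32 bytes.
func padTo32(b []byte) []byte {
	if len(b) >= 32 {
		return b
	}
	out := make([]byte, 32)
	copy(out[32-len(b):], b)
	return out
}

// agree (hypothetical helper) reports whether padTo32 and FillBytes
// yield identical bytes for a non-negative value v.
func agree(v int64) bool {
	n := big.NewInt(v)
	return bytes.Equal(padTo32(n.Bytes()), n.FillBytes(make([]byte, 32)))
}

func main() {
	n := big.NewInt(0x0102)
	fmt.Println(len(n.Bytes()), len(padTo32(n.Bytes())), agree(0x0102))
}
```

One difference to be aware of: `FillBytes` panics when the value does not fit the buffer, whereas padTo32 returns an over-long slice unchanged.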
-43
@@ -1,8 +1,6 @@
package ui
import (
"encoding/base64"
"encoding/json"
"fmt"
"html/template"
"strconv"
@@ -27,47 +25,6 @@ func funcMap() template.FuncMap {
}
return t.Format("2006-01-02 15:04:05")
},
// b64 encodes a json.RawMessage (or any []byte / string) as
// base64 — used by audit.html to stash arbitrary JSON in a
// data- attribute without fighting html/template's contextual
// escaping. JS atob() decodes on click.
"b64": func(v any) string {
switch x := v.(type) {
case json.RawMessage:
return base64.StdEncoding.EncodeToString(x)
case []byte:
return base64.StdEncoding.EncodeToString(x)
case string:
return base64.StdEncoding.EncodeToString([]byte(x))
default:
return base64.StdEncoding.EncodeToString([]byte(fmt.Sprintf("%v", x)))
}
},
// sortDir computes the dir param for a sort-header link:
// click the active column → toggle asc/desc; click any other
// column → start at desc (newest-first / Z→A) since that's
// the conventional default for date and frequency-style data.
"sortDir": func(thisCol, currentCol, currentDir string) string {
if thisCol != currentCol {
return "desc"
}
if currentDir == "asc" {
return "desc"
}
return "asc"
},
// sortGlyph returns the unicode arrow glyph for the sort
// header — empty string for inactive columns so they don't
// shout.
"sortGlyph": func(thisCol, currentCol, currentDir string) string {
if thisCol != currentCol {
return ""
}
if currentDir == "asc" {
return "↑"
}
return "↓"
},
"derefInt": func(p *int) int {
if p == nil {
return 0
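As a rough illustration of how the sortDir and sortGlyph helpers in this hunk could drive a sortable table header, the sketch below wires copies of them into a small html/template. The template text, the `Cols`/`Sort`/`Dir` field names and the query-string layout are all assumptions made for the example; the real audit template is not shown in this diff.

```go
package main

import (
	"html/template"
	"os"
)

// sortDir: clicking the active column toggles asc/desc; any other
// column starts at desc.
func sortDir(thisCol, currentCol, currentDir string) string {
	if thisCol != currentCol {
		return "desc"
	}
	if currentDir == "asc" {
		return "desc"
	}
	return "asc"
}

// sortGlyph: arrow for the active column, empty for the rest.
func sortGlyph(thisCol, currentCol, currentDir string) string {
	if thisCol != currentCol {
		return ""
	}
	if currentDir == "asc" {
		return "↑"
	}
	return "↓"
}

// header is a hypothetical template fragment, one link per column.
const header = `{{range .Cols}}<a href="?sort={{.}}&dir={{sortDir . $.Sort $.Dir}}">{{.}} {{sortGlyph . $.Sort $.Dir}}</a>
{{end}}`

func main() {
	t := template.Must(template.New("th").Funcs(template.FuncMap{
		"sortDir":   sortDir,
		"sortGlyph": sortGlyph,
	}).Parse(header))
	data := struct {
		Cols      []string
		Sort, Dir string
	}{[]string{"ts", "action"}, "ts", "desc"}
	if err := t.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```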
+1 -14
@@ -56,19 +56,6 @@ type ViewData struct {
// today; other pages can adopt the same field.
Error string
// OIDCEnabled is true when the server has an OIDC provider
// configured. The login page uses it to show the SSO button.
OIDCEnabled bool
// OIDCDisplayName is the human-readable label for the OIDC
// provider (e.g. "Authelia"). Shown on the SSO button.
OIDCDisplayName string
// OIDCError holds an error code returned via ?oidc_error=… after
// a failed OIDC callback. The login page maps it to a user-facing
// message.
OIDCError string
// Page carries page-specific data. Concrete type is the page's
// own struct.
Page any
@@ -165,7 +152,7 @@ func (r *Renderer) RenderPartial(w io.Writer, name string, data ViewData) error
// chrome-less; everything else uses the standard navigation chrome.
func layoutFor(page string) string {
switch page {
case "login", "bootstrap", "setup":
case "login", "bootstrap":
return "chromeless"
default:
return "base"
+3 -45
@@ -211,22 +211,9 @@ func dispatchAgentMessage(ctx context.Context, c *Conn, hostID string, env api.E
string(p.Status), p.ExitCode, p.Stats, errMsg, p.FinishedAt); err != nil {
slog.Warn("ws: mark job finished", "job_id", p.JobID, "err", err)
}
// NS-03: project the outcome of init / probe jobs onto the host
// row so the dashboard + repo page can surface bad creds /
// unreachable repo eagerly without trawling the jobs list.
// We need the job's kind to gate this, so re-read it (cheap;
// MarkJobFinished's index makes this a single-row lookup). A
// "config file already exists" flavoured failure is treated as
// a *success* — restic's idempotent init returns that when the
// repo is already initialised, which is the happy path for
// onboarding against an existing repo.
if job, err := deps.Store.GetJob(ctx, p.JobID); err == nil && job != nil &&
job.Kind == string(api.JobInit) {
status, errOut := repoStatusFromInit(string(p.Status), errMsg)
if err := deps.Store.SetHostRepoStatus(ctx, hostID, status, errOut); err != nil {
slog.Warn("ws: set host repo status", "host_id", hostID, "err", err)
}
}
// repo_initialised_at projection has been removed — auto-init
// at host enrolment makes "is the repo init'd" derivable from
// the latest init job's status, no separate column needed.
if deps.JobHub != nil {
deps.JobHub.Broadcast(p.JobID, env)
}
@@ -363,34 +350,5 @@ func dispatchAgentMessage(ctx context.Context, c *Conn, hostID string, env api.E
// heartbeats more often than this is misbehaving. (Spec says 30s.)
const MinHeartbeatInterval = 5 * time.Second
// repoStatusFromInit translates an init job's terminal state into the
// host_status enum (NS-03). Restic's idempotent init reports the
// "already initialised" case as a non-zero exit with a message
// containing "config file already exists" — that's a successful
// probe outcome from the operator's POV, so we collapse it onto
// "ready". Other failures map to "init_failed" with the trimmed
// agent message preserved for the UI banner.
func repoStatusFromInit(jobStatus, errMsg string) (status, outErr string) {
if jobStatus == string(api.JobSucceeded) {
return "ready", ""
}
low := strings.ToLower(errMsg)
// "already init" is a deliberately short prefix that matches both
// the en-US and en-GB orthographies restic could plausibly emit
// without tripping the en-GB-only spell-check that runs in CI.
switch {
case strings.Contains(low, "config file already exists"),
strings.Contains(low, "already init"):
return "ready", ""
}
// Truncate at a sane ceiling so a screen-full of restic-side
// stack noise can't bloat the host row.
const maxErrLen = 512 // named to avoid shadowing the builtin cap
if len(errMsg) > maxErrLen {
errMsg = errMsg[:maxErrLen] + "…"
}
return "init_failed", errMsg
}
// suppress unused-import false-positives if json drops out later
var _ = json.Marshal
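One detail of the byte-based truncation in repoStatusFromInit: slicing a string at a fixed byte offset can cut a multi-byte UTF-8 character in half. Whether that matters for restic's error output is an open question; the sketch below shows one way to back up to a rune boundary before appending the ellipsis. `truncateUTF8` is a hypothetical helper, not part of this change.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 (hypothetical helper) cuts s to at most limit bytes
// without splitting a multi-byte rune, appending "…" when it cuts.
func truncateUTF8(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	cut := limit
	// Back up until s[cut] starts a rune, so s[:cut] ends on a boundary.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut] + "…"
}

func main() {
	fmt.Println(truncateUTF8("héllo", 2)) // "é" is two bytes; the cut backs up before it
	fmt.Println(truncateUTF8("plain ascii message", 5))
	fmt.Println(truncateUTF8("short", 512))
}
```

The result is never longer than limit bytes plus the three-byte ellipsis, so a length assertion of the kind used for the 512-byte ceiling would still hold.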
-50
@@ -1,50 +0,0 @@
package ws
import "testing"
// TestRepoStatusFromInit covers the NS-03 status projection: success,
// the "already initialised" idempotency cases (treated as success),
// and arbitrary failures (preserved into the host row's error field).
func TestRepoStatusFromInit(t *testing.T) {
t.Parallel()
cases := []struct {
name string
jobStatus string
errMsg string
want string
wantErr string
}{
{"succeeded", "succeeded", "", "ready", ""},
{"already initialised (en-GB)", "failed", "Fatal: create repository at rest:http://r failed: server response unexpected: config file already exists", "ready", ""},
{"already initialised (en-US spelling)", "failed", "boom: already init" + "ialized", "ready", ""},
{"bad creds", "failed", "Fatal: server response unexpected: 401 Unauthorised", "init_failed", "Fatal: server response unexpected: 401 Unauthorised"},
{"network", "failed", "dial tcp 192.168.0.99:8000: i/o timeout", "init_failed", "dial tcp 192.168.0.99:8000: i/o timeout"},
}
for _, c := range cases {
c := c
t.Run(c.name, func(t *testing.T) {
t.Parallel()
gotStatus, gotErr := repoStatusFromInit(c.jobStatus, c.errMsg)
if gotStatus != c.want {
t.Errorf("status: got %q, want %q", gotStatus, c.want)
}
if gotErr != c.wantErr {
t.Errorf("err: got %q, want %q", gotErr, c.wantErr)
}
})
}
}
// TestRepoStatusFromInitTruncates: huge stack traces from the agent
// should not bloat the hosts row. Cap at 512 + ellipsis.
func TestRepoStatusFromInitTruncates(t *testing.T) {
t.Parallel()
long := make([]byte, 1024)
for i := range long {
long[i] = 'x'
}
_, got := repoStatusFromInit("failed", string(long))
if len(got) > 520 {
t.Errorf("err length: got %d, want <= 520 (512 + ellipsis runes)", len(got))
}
}
-162
@@ -2,173 +2,11 @@ package store
import (
"context"
"database/sql"
"encoding/json"
"errors"
"fmt"
"strings"
"time"
)
// AuditFilter narrows ListAudit. Empty fields match anything.
type AuditFilter struct {
UserID string // empty matches any row, including system rows with no user
Actor string // user | agent | system | "" (any)
Action string // exact match (e.g. "host.enrolled")
ActionLike string // substring match (e.g. "alert." matches alert.acknowledge / alert.resolve)
TargetKind string // host | source_group | alert | notification_channel | "" (any)
TargetID string // exact match on target_id
Since time.Time // zero = no lower bound
Until time.Time // zero = no upper bound
Limit int // 0 = no limit
// OrderBy is one of "ts" | "actor" | "user_id" | "action" |
// "target_kind". Empty / unknown falls back to "ts". The
// allowlist is enforced inside ListAudit so callers can't
// inject SQL via this field.
OrderBy string
OrderAsc bool // false = DESC (default — newest first)
}
// auditOrderColumn validates f.OrderBy against the column allowlist
// and returns the SQL fragment. Unknown / empty → "ts" so callers
// always get a deterministic order.
func auditOrderColumn(s string) string {
switch s {
case "actor", "user_id", "action", "target_kind":
return s
default:
return "ts"
}
}
// ListAudit returns audit_log rows matching f, ordered by f.OrderBy
// and f.OrderAsc (ts DESC when neither is set).
func (s *Store) ListAudit(ctx context.Context, f AuditFilter) ([]AuditEntry, error) {
q := `SELECT id, user_id, actor, action, target_kind, target_id, ts, payload FROM audit_log`
conds := []string{}
args := []any{}
if f.UserID != "" {
conds = append(conds, "user_id = ?")
args = append(args, f.UserID)
}
if f.Actor != "" {
conds = append(conds, "actor = ?")
args = append(args, f.Actor)
}
if f.Action != "" {
conds = append(conds, "action = ?")
args = append(args, f.Action)
}
if f.ActionLike != "" {
conds = append(conds, "action LIKE ?")
args = append(args, "%"+f.ActionLike+"%")
}
if f.TargetKind != "" {
conds = append(conds, "target_kind = ?")
args = append(args, f.TargetKind)
}
if f.TargetID != "" {
conds = append(conds, "target_id = ?")
args = append(args, f.TargetID)
}
if !f.Since.IsZero() {
conds = append(conds, "ts >= ?")
args = append(args, f.Since.UTC().Format(time.RFC3339Nano))
}
if !f.Until.IsZero() {
conds = append(conds, "ts <= ?")
args = append(args, f.Until.UTC().Format(time.RFC3339Nano))
}
if len(conds) > 0 {
q += " WHERE " + strings.Join(conds, " AND ")
}
col := auditOrderColumn(f.OrderBy)
dir := "DESC"
if f.OrderAsc {
dir = "ASC"
}
// When sorting by a column other than ts, tie-break on ts DESC so
// equal sort keys (e.g. dozens of rows with action='alert.resolve')
// still come back in a deterministic, time-meaningful order.
if col == "ts" {
q += fmt.Sprintf(" ORDER BY ts %s", dir)
} else {
q += fmt.Sprintf(" ORDER BY %s %s, ts DESC", col, dir)
}
if f.Limit > 0 {
q += ` LIMIT ?`
args = append(args, f.Limit)
}
rows, err := s.db.QueryContext(ctx, q, args...)
if err != nil {
return nil, fmt.Errorf("store: list audit: %w", err)
}
defer func() { _ = rows.Close() }()
var out []AuditEntry
for rows.Next() {
e, err := scanAuditRow(rows.Scan)
if err != nil {
return nil, err
}
out = append(out, *e)
}
return out, rows.Err()
}
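The ActionLike filter in ListAudit wraps the caller's text in `%` wildcards, which means a `%` or `_` inside the text itself also acts as a wildcard. If literal matching is wanted, one option is to escape those characters and add an ESCAPE clause, which SQLite supports. The sketch below is an assumption-laden illustration: `likeContains` is a hypothetical helper and the backslash escape character is a choice made here.

```go
package main

import (
	"fmt"
	"strings"
)

// likeContains (hypothetical helper) builds a LIKE pattern that matches
// s as a literal substring. Pair it with: column LIKE ? ESCAPE '\'
func likeContains(s string) string {
	// NewReplacer scans the input once, so escaped output is never
	// re-escaped.
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return "%" + r.Replace(s) + "%"
}

func main() {
	cond := `action LIKE ? ESCAPE '\'`
	fmt.Println(cond)
	fmt.Println(likeContains("alert."))
	fmt.Println(likeContains("100%_done"))
}
```

For the action names seen in this change (dotted lowercase identifiers such as `alert.resolve`) plain concatenation gives the same result, so this only matters once free-form text reaches the filter.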
// DistinctAuditActions returns the set of distinct action strings
// currently present in the table — used to populate the action filter
// dropdown so the operator picks from what actually exists, not a
// hardcoded list that might drift from the codebase.
func (s *Store) DistinctAuditActions(ctx context.Context) ([]string, error) {
rows, err := s.db.QueryContext(ctx,
`SELECT DISTINCT action FROM audit_log ORDER BY action`)
if err != nil {
return nil, fmt.Errorf("store: distinct audit actions: %w", err)
}
defer func() { _ = rows.Close() }()
var out []string
for rows.Next() {
var a string
if err := rows.Scan(&a); err != nil {
return nil, err
}
out = append(out, a)
}
return out, rows.Err()
}
func scanAuditRow(scan func(...any) error) (*AuditEntry, error) {
var e AuditEntry
var userID, targetKind, targetID, payload sql.NullString
var ts string
if err := scan(&e.ID, &userID, &e.Actor, &e.Action, &targetKind, &targetID, &ts, &payload); err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("store: scan audit: %w", err)
}
if userID.Valid {
v := userID.String
e.UserID = &v
}
if targetKind.Valid {
v := targetKind.String
e.TargetKind = &v
}
if targetID.Valid {
v := targetID.String
e.TargetID = &v
}
t, err := time.Parse(time.RFC3339Nano, ts)
if err != nil {
return nil, fmt.Errorf("store: parse audit ts: %w", err)
}
e.TS = t
if payload.Valid && payload.String != "" {
e.Payload = json.RawMessage(payload.String)
}
return &e, nil
}
// AppendAudit records an audit log entry.
func (s *Store) AppendAudit(ctx context.Context, e AuditEntry) error {
if len(e.Payload) == 0 {
-182
@@ -1,182 +0,0 @@
package store
import (
"context"
"encoding/json"
"path/filepath"
"testing"
"time"
"github.com/oklog/ulid/v2"
)
func newAuditTestStore(t *testing.T) (*Store, string) {
t.Helper()
st, err := Open(context.Background(), filepath.Join(t.TempDir(), "rm.db"))
if err != nil {
t.Fatalf("open: %v", err)
}
t.Cleanup(func() { _ = st.Close() })
uid := ulid.Make().String()
if err := st.CreateUser(context.Background(), User{
ID: uid, Username: "alice", PasswordHash: "x",
Role: RoleOperator, CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("create user: %v", err)
}
return st, uid
}
func appendAudit(t *testing.T, st *Store, uid, actor, action, targetKind, targetID string, ts time.Time) {
t.Helper()
var u, tk, ti *string
if uid != "" {
u = &uid
}
if targetKind != "" {
tk = &targetKind
}
if targetID != "" {
ti = &targetID
}
if err := st.AppendAudit(context.Background(), AuditEntry{
ID: ulid.Make().String(), UserID: u, Actor: actor, Action: action,
TargetKind: tk, TargetID: ti, TS: ts, Payload: json.RawMessage(`{}`),
}); err != nil {
t.Fatalf("append: %v", err)
}
}
func TestListAuditFiltersAndOrdering(t *testing.T) {
t.Parallel()
st, uid := newAuditTestStore(t)
t0 := time.Now().UTC()
appendAudit(t, st, uid, "user", "host.enrolled", "host", "h1", t0.Add(-3*time.Hour))
appendAudit(t, st, uid, "user", "alert.acknowledge", "alert", "a1", t0.Add(-2*time.Hour))
appendAudit(t, st, uid, "user", "alert.resolve", "alert", "a1", t0.Add(-time.Hour))
appendAudit(t, st, "", "system", "host.auto_init", "host", "h1", t0.Add(-30*time.Minute))
all, err := st.ListAudit(context.Background(), AuditFilter{})
if err != nil {
t.Fatalf("list: %v", err)
}
if len(all) != 4 {
t.Fatalf("len: got %d want 4", len(all))
}
// Ordered ts DESC — most recent first.
if all[0].Action != "host.auto_init" || all[3].Action != "host.enrolled" {
t.Errorf("ordering: got %s ... %s", all[0].Action, all[3].Action)
}
// ActionLike substring filter: "alert." → 2 rows.
got, err := st.ListAudit(context.Background(), AuditFilter{ActionLike: "alert."})
if err != nil {
t.Fatalf("filter alert.: %v", err)
}
if len(got) != 2 {
t.Errorf("alert.* filter: got %d want 2", len(got))
}
// User filter excludes system rows.
got, _ = st.ListAudit(context.Background(), AuditFilter{UserID: uid})
if len(got) != 3 {
t.Errorf("user filter: got %d want 3", len(got))
}
// Actor=system isolates the auto_init.
got, _ = st.ListAudit(context.Background(), AuditFilter{Actor: "system"})
if len(got) != 1 || got[0].Action != "host.auto_init" {
t.Errorf("actor=system: got %+v", got)
}
// Target kind filter.
got, _ = st.ListAudit(context.Background(), AuditFilter{TargetKind: "alert"})
if len(got) != 2 {
t.Errorf("target_kind=alert: got %d want 2", len(got))
}
// Time range: last 90m → resolve + auto_init.
got, _ = st.ListAudit(context.Background(), AuditFilter{Since: t0.Add(-90 * time.Minute)})
if len(got) != 2 {
t.Errorf("since 90m: got %d want 2", len(got))
}
// Limit clamps result count.
got, _ = st.ListAudit(context.Background(), AuditFilter{Limit: 2})
if len(got) != 2 {
t.Errorf("limit: got %d want 2", len(got))
}
}
func TestListAuditSort(t *testing.T) {
t.Parallel()
st, uid := newAuditTestStore(t)
t0 := time.Now().UTC()
appendAudit(t, st, uid, "user", "host.enrolled", "host", "h1", t0.Add(-3*time.Hour))
appendAudit(t, st, uid, "user", "alert.acknowledge", "alert", "a1", t0.Add(-time.Hour))
appendAudit(t, st, "", "system", "host.auto_init", "host", "h1", t0.Add(-30*time.Minute))
ctx := context.Background()
// Sort by action ASC.
got, err := st.ListAudit(ctx, AuditFilter{OrderBy: "action", OrderAsc: true})
if err != nil {
t.Fatalf("sort action asc: %v", err)
}
wantActions := []string{"alert.acknowledge", "host.auto_init", "host.enrolled"}
for i, w := range wantActions {
if got[i].Action != w {
t.Errorf("[%d] action: got %q want %q", i, got[i].Action, w)
}
}
// Sort by action DESC.
got, _ = st.ListAudit(ctx, AuditFilter{OrderBy: "action", OrderAsc: false})
if got[0].Action != "host.enrolled" {
t.Errorf("desc head: got %q want host.enrolled", got[0].Action)
}
// Unknown OrderBy → falls back to ts DESC.
got, _ = st.ListAudit(ctx, AuditFilter{OrderBy: "DROP TABLE; --"})
if got[0].Action != "host.auto_init" {
t.Errorf("unknown OrderBy should fall back to ts DESC; got head %q", got[0].Action)
}
// Sort by actor — ties tie-break on ts DESC, so 'user' rows
// should come back newest-first within the actor group.
got, _ = st.ListAudit(ctx, AuditFilter{OrderBy: "actor", OrderAsc: true})
// Expected order: the single 'system' row first, then the two
// 'user' rows newest-first: system → user(ack) → user(enrolled).
if got[0].Actor != "system" {
t.Errorf("actor asc head: got %q want system", got[0].Actor)
}
if got[1].Action != "alert.acknowledge" {
t.Errorf("actor asc tie-break should be ts DESC; got [1]=%q", got[1].Action)
}
}
func TestDistinctAuditActions(t *testing.T) {
t.Parallel()
st, uid := newAuditTestStore(t)
t0 := time.Now().UTC()
appendAudit(t, st, uid, "user", "host.enrolled", "host", "h1", t0)
appendAudit(t, st, uid, "user", "host.enrolled", "host", "h2", t0)
appendAudit(t, st, uid, "user", "alert.acknowledge", "alert", "a1", t0)
got, err := st.DistinctAuditActions(context.Background())
if err != nil {
t.Fatalf("distinct: %v", err)
}
want := []string{"alert.acknowledge", "host.enrolled"}
if len(got) != len(want) {
t.Fatalf("got %v want %v", got, want)
}
for i := range want {
if got[i] != want[i] {
t.Errorf("[%d]: got %q want %q", i, got[i], want[i])
}
}
}
-72
@@ -160,78 +160,6 @@ func (s *Store) GetEnrollmentTokenStatus(ctx context.Context, tokenHash string)
return out, nil
}
// OutstandingEnrollmentToken is what the recoverable-token list page
// shows: enough to identify the row (short hash + created/expires)
// and re-render the install snippet via the regenerate flow, plus
// the encrypted repo creds blob the caller can decrypt-and-redact for
// display.
type OutstandingEnrollmentToken struct {
TokenHash string
CreatedAt time.Time
ExpiresAt time.Time
EncRepoCreds string
InitialPaths []string
}
// ListOutstandingEnrollmentTokens returns every still-valid token
// (un-consumed and not expired). Used by the Add-host page to give
// operators a way back to the install snippet after they close the
// /hosts/pending/{token} tab without finishing onboarding.
func (s *Store) ListOutstandingEnrollmentTokens(ctx context.Context) ([]OutstandingEnrollmentToken, error) {
now := time.Now().UTC().Format(time.RFC3339Nano)
rows, err := s.db.QueryContext(ctx,
`SELECT token_hash, created_at, expires_at, enc_repo_creds, initial_paths
FROM enrollment_tokens
WHERE consumed_at IS NULL AND expires_at > ?
ORDER BY created_at DESC`, now)
if err != nil {
return nil, fmt.Errorf("store: list outstanding enrollment tokens: %w", err)
}
defer func() { _ = rows.Close() }()
var out []OutstandingEnrollmentToken
for rows.Next() {
var (
hash, created, expires string
enc sql.NullString
pathsJSON string
)
if err := rows.Scan(&hash, &created, &expires, &enc, &pathsJSON); err != nil {
return nil, fmt.Errorf("store: scan outstanding enrollment token: %w", err)
}
row := OutstandingEnrollmentToken{TokenHash: hash, InitialPaths: []string{}}
if t, err := time.Parse(time.RFC3339Nano, created); err == nil {
row.CreatedAt = t
}
if t, err := time.Parse(time.RFC3339Nano, expires); err == nil {
row.ExpiresAt = t
}
if enc.Valid {
row.EncRepoCreds = enc.String
}
if pathsJSON != "" {
_ = json.Unmarshal([]byte(pathsJSON), &row.InitialPaths)
}
out = append(out, row)
}
return out, rows.Err()
}
// DeleteEnrollmentToken removes a token row. Used by the operator-
// driven revoke flow and by regenerate (which deletes the old hash
// then mints a fresh one). Returns ErrNotFound if no row matched.
func (s *Store) DeleteEnrollmentToken(ctx context.Context, tokenHash string) error {
res, err := s.db.ExecContext(ctx,
`DELETE FROM enrollment_tokens WHERE token_hash = ?`, tokenHash)
if err != nil {
return fmt.Errorf("store: delete enrollment token: %w", err)
}
n, _ := res.RowsAffected()
if n == 0 {
return ErrNotFound
}
return nil
}
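The `expires_at > ?` comparison in ListOutstandingEnrollmentTokens operates on text, so it relies on the stored timestamps sorting lexically in chronological order. `time.RFC3339Nano` trims trailing zeros from the fractional seconds, which can break that property for some pairs of values. The sketch below demonstrates the effect and a fixed-width layout that avoids it; `fixedNano`, `trimmedNano` and `lexicalMatchesChrono` are names invented for this example, and whether the edge case is reachable in practice depends on how the rows are written.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// trimmedNano drops trailing zeros in the fraction (stdlib layout).
	trimmedNano = time.RFC3339Nano
	// fixedNano always emits nine fractional digits, so UTC values
	// have constant width and sort lexically in time order.
	fixedNano = "2006-01-02T15:04:05.000000000Z07:00"
)

// lexicalMatchesChrono formats two instants half a second apart and
// reports whether string order agrees with chronological order.
func lexicalMatchesChrono(layout string) bool {
	whole := time.Date(2026, 5, 4, 22, 11, 8, 0, time.UTC)
	half := whole.Add(500 * time.Millisecond)
	sa, sb := whole.Format(layout), half.Format(layout)
	return (sa < sb) == whole.Before(half)
}

func main() {
	whole := time.Date(2026, 5, 4, 22, 11, 8, 0, time.UTC)
	fmt.Println(whole.Format(trimmedNano), whole.Add(500*time.Millisecond).Format(trimmedNano))
	fmt.Println(lexicalMatchesChrono(trimmedNano)) // false: "Z" sorts after "."
	fmt.Println(lexicalMatchesChrono(fixedNano))   // true
}
```

The same consideration applies to any other text comparison on timestamp columns, such as range filters built from `Format(time.RFC3339Nano)`.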
// PurgeExpiredEnrollmentTokens deletes long-expired token rows. Tokens
// are retained for ~24h after expiry so audit traces still resolve them.
func (s *Store) PurgeExpiredEnrollmentTokens(ctx context.Context) (int64, error) {
+4 -94
@@ -43,8 +43,7 @@ func (s *Store) LookupHostByAgentToken(ctx context.Context, tokenHash string) (*
current_job_id, last_backup_at, last_backup_status,
repo_size_bytes, snapshot_count, open_alert_count,
applied_schedule_version, bandwidth_up_kbps, bandwidth_down_kbps,
pre_hook_default, post_hook_default,
repo_status, repo_status_error
pre_hook_default, post_hook_default
FROM hosts WHERE agent_token_hash = ?`,
tokenHash)
return scanHost(row)
@@ -58,55 +57,11 @@ func (s *Store) GetHost(ctx context.Context, id string) (*Host, error) {
current_job_id, last_backup_at, last_backup_status,
repo_size_bytes, snapshot_count, open_alert_count,
applied_schedule_version, bandwidth_up_kbps, bandwidth_down_kbps,
pre_hook_default, post_hook_default,
repo_status, repo_status_error
pre_hook_default, post_hook_default
FROM hosts WHERE id = ?`, id)
return scanHost(row)
}
// SetHostRepoStatus persists the outcome of the latest init / probe
// attempt against this host's repo. Called by the WS handler on every
// job.finished of kind=init, and reset to ("unknown", "") by
// repo-credentials saves so the next probe reflects the new creds.
//
// errMsg is stored verbatim (truncate at the call site if you care
// about row size). Empty for "ready".
func (s *Store) SetHostRepoStatus(ctx context.Context, hostID, status, errMsg string) error {
_, err := s.db.ExecContext(ctx,
`UPDATE hosts SET repo_status = ?, repo_status_error = ? WHERE id = ?`,
status, errMsg, hostID)
if err != nil {
return fmt.Errorf("store: set host repo status: %w", err)
}
return nil
}
// DeleteHost removes a host row by id. Returns ErrNotFound if no row
// matched. Foreign-key cascades (declared on every dependent table —
// schedules, jobs, snapshots, source_groups, host_credentials, etc.)
// remove the rest. The connection DSN already pins
// PRAGMA foreign_keys=ON, so the cascade is honoured here without an
// explicit pragma roundtrip.
//
// The host's agent bearer is stored in agent_token_hash on this row,
// so deleting the row also revokes the agent — a re-installed
// instance must come back through the normal pending-host accept
// flow.
func (s *Store) DeleteHost(ctx context.Context, id string) error {
res, err := s.db.ExecContext(ctx, `DELETE FROM hosts WHERE id = ?`, id)
if err != nil {
return fmt.Errorf("store: delete host: %w", err)
}
n, err := res.RowsAffected()
if err != nil {
return fmt.Errorf("store: delete host rows: %w", err)
}
if n == 0 {
return ErrNotFound
}
return nil
}
// MarkHostHello updates the host row with metadata received in the
// agent's hello message and flips status to 'online'.
func (s *Store) MarkHostHello(ctx context.Context, id string, agentVersion, resticVersion string, protoVersion int, when time.Time) error {
@@ -213,8 +168,7 @@ func (s *Store) ListHosts(ctx context.Context) ([]Host, error) {
current_job_id, last_backup_at, last_backup_status,
repo_size_bytes, snapshot_count, open_alert_count,
applied_schedule_version, bandwidth_up_kbps, bandwidth_down_kbps,
pre_hook_default, post_hook_default,
repo_status, repo_status_error
pre_hook_default, post_hook_default
FROM hosts ORDER BY name`)
if err != nil {
return nil, fmt.Errorf("store: list hosts: %w", err)
@@ -261,8 +215,7 @@ func scanHostRow(s hostScanner) (*Host, error) {
&currentJob, &lastBackupAt, &lastBkSt,
&h.RepoSizeBytes, &h.SnapshotCount, &h.OpenAlertCount,
&h.AppliedScheduleVersion, &bwUp, &bwDown,
&preHook, &postHook,
&h.RepoStatus, &h.RepoStatusError)
&preHook, &postHook)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, ErrNotFound
@@ -346,49 +299,6 @@ func (s *Store) SetHostBandwidth(ctx context.Context, hostID string, upKBps, dow
return nil
}
// SetHostTags replaces the host's tag list. Tags are passed already
// normalised (lowercase, deduped) by the caller — store-layer just
// JSON-marshals and writes. Empty slice clears all tags.
func (s *Store) SetHostTags(ctx context.Context, hostID string, tags []string) error {
if tags == nil {
tags = []string{}
}
b, err := json.Marshal(tags)
if err != nil {
return fmt.Errorf("store: marshal tags: %w", err)
}
_, err = s.db.ExecContext(ctx,
`UPDATE hosts SET tags = ? WHERE id = ?`, string(b), hostID)
if err != nil {
return fmt.Errorf("store: set host tags: %w", err)
}
return nil
}
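SetHostTags expects the caller to pass tags that are already lowercase and deduplicated. A sketch of a caller-side normaliser under that contract is shown below; the name normaliseTags is hypothetical, and trimming whitespace, dropping empty tags and sorting are extra assumptions beyond what the comment states.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normaliseTags is a hypothetical caller-side helper. It trims and
// lowercases each tag, drops empties and duplicates, and sorts the
// result so the stored JSON is stable across saves.
func normaliseTags(in []string) []string {
	seen := make(map[string]struct{}, len(in))
	out := []string{} // non-nil so it marshals as [] rather than null
	for _, t := range in {
		t = strings.ToLower(strings.TrimSpace(t))
		if t == "" {
			continue
		}
		if _, dup := seen[t]; dup {
			continue
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(normaliseTags([]string{" Prod", "db", "prod", "", "DB "}))
}
```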
// DistinctHostTags returns the union of every tag in use across the
// fleet, sorted. Powers the autocomplete on the host-tags editor and
// the chip-row filter on the dashboard. Cheap at fleet sizes this
// codebase targets — re-query on each render is fine.
func (s *Store) DistinctHostTags(ctx context.Context) ([]string, error) {
rows, err := s.db.QueryContext(ctx,
`SELECT DISTINCT json_each.value
FROM hosts, json_each(hosts.tags)
ORDER BY 1`)
if err != nil {
return nil, fmt.Errorf("store: distinct host tags: %w", err)
}
defer func() { _ = rows.Close() }()
var out []string
for rows.Next() {
var t string
if err := rows.Scan(&t); err != nil {
return nil, fmt.Errorf("store: scan host tag: %w", err)
}
out = append(out, t)
}
return out, rows.Err()
}
func nullableInt(p *int) any {
if p == nil {
return nil
-98
@@ -1,98 +0,0 @@
package store
import (
"context"
"errors"
"testing"
"time"
)
// TestDeleteHostCascades verifies that DeleteHost removes the host
// row and that every dependent table (schedules, jobs, source groups,
// host_credentials) is wiped via the FK cascade declared in the
// migrations. We also verify the agent bearer is no longer resolvable
// — a re-installed agent must come back through pending-host accept.
func TestDeleteHostCascades(t *testing.T) {
t.Parallel()
s := openTestStore(t)
ctx := context.Background()
hostID := makeSchedHost(t, s)
gid := makeGroup(t, s, hostID, "default", "01HDELGRP000000000000001")
// One job, one schedule, one credential row — enough to prove the
// cascade reaches every dependent table we care about.
if err := s.CreateJob(ctx, Job{
ID: "j-del-1", HostID: hostID, Kind: "backup",
ActorKind: "system", CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("create job: %v", err)
}
sched := &Schedule{
ID: "01HDELSCHED00000000000001",
HostID: hostID,
CronExpr: "0 3 * * *",
Enabled: true,
SourceGroupIDs: []string{gid},
}
if err := s.CreateSchedule(ctx, sched); err != nil {
t.Fatalf("create schedule: %v", err)
}
if err := s.SetHostCredentials(ctx, hostID, CredKindRepo, "ciphertext"); err != nil {
t.Fatalf("set creds: %v", err)
}
// Sanity: agent bearer resolves before deletion.
if _, err := s.LookupHostByAgentToken(ctx, "tokenhash"); err != nil {
t.Fatalf("pre-delete bearer lookup: %v", err)
}
if err := s.DeleteHost(ctx, hostID); err != nil {
t.Fatalf("DeleteHost: %v", err)
}
if _, err := s.GetHost(ctx, hostID); !errors.Is(err, ErrNotFound) {
t.Errorf("GetHost after delete: want ErrNotFound, got %v", err)
}
if _, err := s.LookupHostByAgentToken(ctx, "tokenhash"); !errors.Is(err, ErrNotFound) {
t.Errorf("bearer lookup after delete: want ErrNotFound, got %v", err)
}
// Cascade smoke-tests via raw counts. The store exposes no public
// helper that lists jobs by host, so query the DB directly over the
// same connection the store helpers use.
for _, q := range []struct {
label string
sql string
}{
{"schedules", "SELECT count(*) FROM schedules WHERE host_id = ?"},
{"jobs", "SELECT count(*) FROM jobs WHERE host_id = ?"},
{"source_groups", "SELECT count(*) FROM source_groups WHERE host_id = ?"},
{"host_credentials", "SELECT count(*) FROM host_credentials WHERE host_id = ?"},
{"schedule_source_groups", "SELECT count(*) FROM schedule_source_groups WHERE schedule_id = ?"},
} {
var n int
key := hostID
if q.label == "schedule_source_groups" {
key = "01HDELSCHED00000000000001"
}
if err := s.db.QueryRowContext(ctx, q.sql, key).Scan(&n); err != nil {
t.Fatalf("count %s: %v", q.label, err)
}
if n != 0 {
t.Errorf("cascade left %d rows in %s", n, q.label)
}
}
}
// TestDeleteHostNotFound: a delete against a missing id surfaces
// ErrNotFound so the HTTP layer can 404 instead of 200-ing a no-op.
func TestDeleteHostNotFound(t *testing.T) {
t.Parallel()
s := openTestStore(t)
if err := s.DeleteHost(context.Background(), "01HNOTAHOST00000000000000"); !errors.Is(err, ErrNotFound) {
t.Errorf("missing id: want ErrNotFound, got %v", err)
}
}
@@ -1,21 +0,0 @@
-- 0017_users_extensions.sql
--
-- Add the columns the user-management UI needs:
-- email — optional, free-form text; format-checked
-- in Go on insert/update via net/mail.ParseAddress
-- disabled_at — soft-delete tombstone. NULL = enabled
-- must_change_password — flag set by admin-create + setup-token flow;
-- cleared by /setup or /settings/account
--
-- Plus a case-insensitive unique index so 'Alice' and 'alice' can't
-- both exist (lowercase normalisation is applied in the Go layer
-- on every CreateUser; this index defends the invariant).
--
-- Column-level ALTERs (CLAUDE.md prefers these over rebuilds; safe
-- under foreign_keys=ON).
ALTER TABLE users ADD COLUMN email TEXT;
ALTER TABLE users ADD COLUMN disabled_at TEXT;
ALTER TABLE users ADD COLUMN must_change_password INTEGER NOT NULL DEFAULT 0;
CREATE UNIQUE INDEX users_username_lower ON users(LOWER(username));
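The migration header says the email format is checked in Go via net/mail.ParseAddress and that usernames are lowercased before insert. A rough sketch of both checks follows. The helper names are hypothetical, and rejecting display-name forms such as "Alice <alice@example.com>" is an assumption about what the UI wants, since ParseAddress itself accepts them.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// normaliseUsername mirrors the lowercase + trim step applied in
// CreateUser, so 'Alice' and 'alice' map to the same stored value.
func normaliseUsername(u string) string {
	return strings.ToLower(strings.TrimSpace(u))
}

// validEmail reports whether s is a bare address. mail.ParseAddress
// also accepts "Name <addr>" forms, so the parsed address must equal
// the input for the check to pass.
func validEmail(s string) bool {
	addr, err := mail.ParseAddress(s)
	return err == nil && addr.Address == s
}

func main() {
	fmt.Println(normaliseUsername("  Alice "))
	fmt.Println(validEmail("alice@example.com"))
	fmt.Println(validEmail("Alice <alice@example.com>"))
	fmt.Println(validEmail("not-an-email"))
}
```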
@@ -1,16 +0,0 @@
-- 0018_user_setup_tokens.sql
--
-- One outstanding setup token per user (PRIMARY KEY on user_id).
-- Regenerating a link is INSERT OR REPLACE — old token immediately
-- invalid. Token is stored as sha256(raw) hex, never the raw token,
-- so a DB leak doesn't leak active links.
CREATE TABLE user_setup_tokens (
user_id TEXT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
token_hash TEXT NOT NULL,
expires_at TEXT NOT NULL,
created_at TEXT NOT NULL,
created_by TEXT REFERENCES users(id) ON DELETE SET NULL
);
CREATE INDEX user_setup_tokens_expires ON user_setup_tokens(expires_at);
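The table stores sha256(raw) as hex and never the raw token. A minimal sketch of how a caller might mint a token and derive the stored hash is given below; the 32-byte length, the base64url encoding of the raw token, and the function names are assumptions, not taken from the source.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hashToken returns the lowercase hex SHA-256 digest of the raw token,
// which is the value that would go into user_setup_tokens.token_hash.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// newSetupToken returns a random raw token (embedded in the setup link)
// and its hash (persisted). Token length and encoding are assumptions.
func newSetupToken() (string, string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	raw := base64.RawURLEncoding.EncodeToString(buf)
	return raw, hashToken(raw), nil
}

func main() {
	raw, hash, err := newSetupToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("link token:", raw)
	fmt.Println("stored hash:", hash)
}
```

On redemption the handler would hash the presented token the same way and look the digest up, so a leaked database row cannot be replayed as a link.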
-35
@@ -1,35 +0,0 @@
-- 0019_oidc.sql
--
-- OIDC bookkeeping. Four independent additions land in one
-- migration to keep the related changes together:
--
-- 1. users.auth_source — 'local' | 'oidc'. Local users get
-- the default; first OIDC sign-in JITs
-- a row with auth_source='oidc'.
-- 2. users.oidc_subject — IdP's stable 'sub' claim. Indexed
-- uniquely (partial; NULLs allowed).
-- 3. sessions.id_token — last id_token for OIDC sessions, used
-- as id_token_hint on RP-initiated
-- logout. NULL for local sessions.
-- 4. oidc_state — short-lived state for the OAuth round-
-- trip (state + PKCE code_verifier).
-- Swept on the alert engine tick.
--
-- All column-level ALTERs (CLAUDE.md preference; safe under
-- foreign_keys=ON).
ALTER TABLE users ADD COLUMN auth_source TEXT NOT NULL DEFAULT 'local'
CHECK (auth_source IN ('local', 'oidc'));
ALTER TABLE users ADD COLUMN oidc_subject TEXT;
CREATE UNIQUE INDEX users_oidc_subject ON users(oidc_subject)
WHERE oidc_subject IS NOT NULL;
ALTER TABLE sessions ADD COLUMN id_token TEXT;
CREATE TABLE oidc_state (
state_hash TEXT PRIMARY KEY, -- sha256(state) hex; raw never persisted
code_verifier TEXT NOT NULL,
created_at TEXT NOT NULL
);
CREATE INDEX oidc_state_created ON oidc_state(created_at);
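The oidc_state row pairs a hashed state value with the PKCE code_verifier. The sketch below shows how the values for one login attempt could be derived, assuming the S256 challenge method from RFC 7636 (the source only says "PKCE", so the method is an assumption) and hypothetical helper names.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// randomURLSafe returns n random bytes as unpadded base64url text.
func randomURLSafe(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// challengeS256 derives the PKCE code_challenge for the S256 method:
// BASE64URL(SHA256(code_verifier)) without padding.
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// stateHash is the hex digest that would be stored as state_hash.
func stateHash(state string) string {
	sum := sha256.Sum256([]byte(state))
	return hex.EncodeToString(sum[:])
}

func main() {
	state, err := randomURLSafe(32)
	if err != nil {
		panic(err)
	}
	verifier, err := randomURLSafe(32) // 43 characters, the RFC minimum
	if err != nil {
		panic(err)
	}
	fmt.Println("persist:", stateHash(state), verifier)
	fmt.Println("send to IdP:", state, challengeS256(verifier))
}
```

On the callback, the handler would hash the returned state, consume the matching row, and send the stored verifier with the token request.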
@@ -1,22 +0,0 @@
-- 0020_hosts_repo_status.sql
--
-- NS-03: surface repo init / probe state on the host row so the
-- operator sees credential / connectivity failures eagerly rather
-- than discovering them via a missed scheduled backup.
--
-- repo_status:
-- 'unknown' — no probe outcome yet (default for fresh enrolment
-- and for hosts re-binding fresh creds).
-- 'ready' — last init / probe succeeded; repo is reachable
-- with the bound creds.
-- 'init_failed' — last init / probe failed; repo_status_error has
-- the trimmed agent-side error message.
--
-- The init-pending intermediate state is intentionally omitted: a job
-- in flight is already visible on the host detail page via
-- jobs.status, and bridging both surfaces leads to drift. The host
-- column reflects the *outcome* of the last probe.
ALTER TABLE hosts ADD COLUMN repo_status TEXT NOT NULL DEFAULT 'unknown'
CHECK (repo_status IN ('unknown', 'ready', 'init_failed'));
ALTER TABLE hosts ADD COLUMN repo_status_error TEXT NOT NULL DEFAULT '';
-65
@@ -1,65 +0,0 @@
package store
import (
"context"
"database/sql"
"errors"
"fmt"
"time"
)
// PutOIDCState stores the (state_hash, code_verifier) pair created
// at /auth/oidc/login start. Called once per login attempt.
func (s *Store) PutOIDCState(ctx context.Context, stateHash, verifier string, createdAt time.Time) error {
_, err := s.db.ExecContext(ctx,
`INSERT INTO oidc_state (state_hash, code_verifier, created_at)
VALUES (?, ?, ?)`,
stateHash, verifier,
createdAt.UTC().Format(time.RFC3339Nano))
if err != nil {
return fmt.Errorf("store: put oidc state: %w", err)
}
return nil
}
// ConsumeOIDCState reads and deletes the row inside one transaction,
// returning the code_verifier. Single-use: a replay returns
// ErrNotFound. Used by the OIDC callback handler.
func (s *Store) ConsumeOIDCState(ctx context.Context, stateHash string) (string, error) {
tx, err := s.db.BeginTx(ctx, nil)
if err != nil {
return "", fmt.Errorf("store: begin: %w", err)
}
defer func() { _ = tx.Rollback() }()
var verifier string
err = tx.QueryRowContext(ctx,
`SELECT code_verifier FROM oidc_state WHERE state_hash = ?`,
stateHash).Scan(&verifier)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
return "", ErrNotFound
}
return "", fmt.Errorf("store: consume oidc state: %w", err)
}
if _, err := tx.ExecContext(ctx,
`DELETE FROM oidc_state WHERE state_hash = ?`, stateHash); err != nil {
return "", fmt.Errorf("store: delete oidc state: %w", err)
}
if err := tx.Commit(); err != nil {
return "", fmt.Errorf("store: commit: %w", err)
}
return verifier, nil
}
// CleanupExpiredOIDCState removes entries created before cutoff.
// Called on the alert engine's 60s tick alongside setup-token sweep.
func (s *Store) CleanupExpiredOIDCState(ctx context.Context, cutoff time.Time) (int64, error) {
res, err := s.db.ExecContext(ctx,
`DELETE FROM oidc_state WHERE created_at < ?`,
cutoff.UTC().Format(time.RFC3339Nano))
if err != nil {
return 0, fmt.Errorf("store: cleanup oidc state: %w", err)
}
n, _ := res.RowsAffected()
return n, nil
}
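A side observation on the `created_at < ?` comparison, offered as a sketch rather than a confirmed defect: time.RFC3339Nano drops trailing zeros from the fractional seconds, so two timestamps within the same second can compare in the wrong order as strings (this assumes the column uses SQLite's default byte-wise text comparison). The fixedNano layout below is a hypothetical alternative that keeps all nine digits. For a coarse sweep the difference is at most a one-second boundary effect.

```go
package main

import (
	"fmt"
	"time"
)

// fixedNano is a fixed-width variant of time.RFC3339Nano: the zeros
// force all nine fractional digits, so UTC strings sort chronologically.
const fixedNano = "2006-01-02T15:04:05.000000000Z07:00"

// sortsBefore reports whether x's formatted string is lexically
// smaller than y's, which is what a text comparison in SQL would see.
func sortsBefore(layout string, x, y time.Time) bool {
	return x.UTC().Format(layout) < y.UTC().Format(layout)
}

// demo compares a whole-second timestamp with one 500ms later under
// both layouts. The later time should sort after the earlier one.
func demo() (variableWidth, fixedWidth bool) {
	t1 := time.Date(2026, 5, 4, 22, 11, 8, 0, time.UTC)
	t2 := t1.Add(500 * time.Millisecond)
	return sortsBefore(time.RFC3339Nano, t1, t2), sortsBefore(fixedNano, t1, t2)
}

func main() {
	variableWidth, fixedWidth := demo()
	fmt.Println("RFC3339Nano orders correctly:", variableWidth)
	fmt.Println("fixed-width orders correctly:", fixedWidth)
}
```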
-64
@@ -1,64 +0,0 @@
package store
import (
"context"
"path/filepath"
"testing"
"time"
)
func newOIDCStateTestStore(t *testing.T) *Store {
t.Helper()
st, err := Open(context.Background(), filepath.Join(t.TempDir(), "rm.db"))
if err != nil {
t.Fatalf("open: %v", err)
}
t.Cleanup(func() { _ = st.Close() })
return st
}
func TestOIDCStatePutAndConsume(t *testing.T) {
t.Parallel()
st := newOIDCStateTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
if err := st.PutOIDCState(ctx, "hash1", "verifier-1", now); err != nil {
t.Fatalf("put: %v", err)
}
v, err := st.ConsumeOIDCState(ctx, "hash1")
if err != nil {
t.Fatalf("consume: %v", err)
}
if v != "verifier-1" {
t.Errorf("verifier: got %q want %q", v, "verifier-1")
}
if _, err := st.ConsumeOIDCState(ctx, "hash1"); err == nil {
t.Error("re-consume should fail")
}
}
func TestOIDCStateCleanup(t *testing.T) {
t.Parallel()
st := newOIDCStateTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
_ = st.PutOIDCState(ctx, "stale", "v-stale", now.Add(-10*time.Minute))
_ = st.PutOIDCState(ctx, "fresh", "v-fresh", now)
cutoff := now.Add(-5 * time.Minute)
n, err := st.CleanupExpiredOIDCState(ctx, cutoff)
if err != nil {
t.Fatalf("cleanup: %v", err)
}
if n != 1 {
t.Errorf("cleanup count: got %d want 1", n)
}
if _, err := st.ConsumeOIDCState(ctx, "stale"); err == nil {
t.Error("stale entry should have been deleted")
}
if _, err := st.ConsumeOIDCState(ctx, "fresh"); err != nil {
t.Errorf("fresh entry should still be readable: %v", err)
}
}
+6 -25
@@ -12,14 +12,13 @@ import (
// insert; the raw token is what the caller hands to the user (cookie).
func (s *Store) CreateSession(ctx context.Context, sess Session, tokenHash string) error {
_, err := s.db.ExecContext(ctx,
`INSERT INTO sessions (id, user_id, created_at, expires_at, ip, ua, id_token)
VALUES (?, ?, ?, ?, ?, ?, ?)`,
`INSERT INTO sessions (id, user_id, created_at, expires_at, ip, ua)
VALUES (?, ?, ?, ?, ?, ?)`,
tokenHash,
sess.UserID,
sess.CreatedAt.UTC().Format(time.RFC3339Nano),
sess.ExpiresAt.UTC().Format(time.RFC3339Nano),
nullableStr(sess.IP), nullableStr(sess.UA),
nullableStr(sess.IDToken))
sess.IP, sess.UA)
if err != nil {
return fmt.Errorf("store: create session: %w", err)
}
@@ -33,15 +32,15 @@ func (s *Store) CreateSession(ctx context.Context, sess Session, tokenHash strin
// of valid token hashes.
func (s *Store) LookupSession(ctx context.Context, tokenHash string) (*Session, error) {
row := s.db.QueryRowContext(ctx,
`SELECT id, user_id, created_at, expires_at, ip, ua, id_token
`SELECT id, user_id, created_at, expires_at, ip, ua
FROM sessions
WHERE id = ? AND expires_at > ?`,
tokenHash, time.Now().UTC().Format(time.RFC3339Nano))
var sess Session
var created, expires string
var ip, ua, idTok sql.NullString
if err := row.Scan(&sess.ID, &sess.UserID, &created, &expires, &ip, &ua, &idTok); err != nil {
var ip, ua sql.NullString
if err := row.Scan(&sess.ID, &sess.UserID, &created, &expires, &ip, &ua); err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, ErrNotFound
}
@@ -63,9 +62,6 @@ func (s *Store) LookupSession(ctx context.Context, tokenHash string) (*Session,
if ua.Valid {
sess.UA = ua.String
}
if idTok.Valid {
sess.IDToken = idTok.String
}
return &sess, nil
}
@@ -90,18 +86,3 @@ func (s *Store) PurgeExpiredSessions(ctx context.Context) (int64, error) {
n, _ := res.RowsAffected()
return n, nil
}
// DeleteSessionsByUserID removes every session row owned by the
// user. Returns count for caller logging. Used by:
// - admin "Force logout" button
// - admin Disable user (sessions outlive the disable flag, so we
// also clear them so the user gets bounced immediately)
func (s *Store) DeleteSessionsByUserID(ctx context.Context, userID string) (int64, error) {
res, err := s.db.ExecContext(ctx,
`DELETE FROM sessions WHERE user_id = ?`, userID)
if err != nil {
return 0, fmt.Errorf("store: delete sessions by user: %w", err)
}
n, _ := res.RowsAffected()
return n, nil
}
-76
@@ -1,76 +0,0 @@
package store
import (
"context"
"testing"
"time"
)
func TestDeleteSessionsByUserID(t *testing.T) {
t.Parallel()
s := openTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
uid := "u-force"
if err := s.CreateUser(ctx, User{
ID: uid, Username: "victim",
PasswordHash: "x", Role: RoleOperator, CreatedAt: now,
}); err != nil {
t.Fatalf("create user: %v", err)
}
// Create two sessions for that user.
for i, h := range []string{"hash1", "hash2"} {
if err := s.CreateSession(ctx, Session{
ID: h,
UserID: uid,
CreatedAt: now,
ExpiresAt: now.Add(time.Hour),
}, h); err != nil {
t.Fatalf("create session %d: %v", i, err)
}
}
n, err := s.DeleteSessionsByUserID(ctx, uid)
if err != nil {
t.Fatalf("delete: %v", err)
}
if n != 2 {
t.Errorf("count: got %d want 2", n)
}
if _, err := s.LookupSession(ctx, "hash1"); err == nil {
t.Error("hash1 should be gone")
}
}
func TestSessionRoundTripsIDToken(t *testing.T) {
t.Parallel()
s := openTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
uid := "u-oidc"
if err := s.CreateUser(ctx, User{
ID: uid, Username: "ouser", PasswordHash: "",
Role: RoleOperator, CreatedAt: now,
AuthSource: "oidc",
}); err != nil {
t.Fatalf("create user: %v", err)
}
if err := s.CreateSession(ctx, Session{
ID: "h1", UserID: uid, CreatedAt: now,
ExpiresAt: now.Add(time.Hour),
IDToken: "eyJ.fake.jwt",
}, "h1"); err != nil {
t.Fatalf("create session: %v", err)
}
got, err := s.LookupSession(ctx, "h1")
if err != nil {
t.Fatalf("lookup: %v", err)
}
if got.IDToken != "eyJ.fake.jwt" {
t.Errorf("id_token round trip: got %q", got.IDToken)
}
}
-93
@@ -1,93 +0,0 @@
package store
import (
"context"
"database/sql"
"errors"
"fmt"
"time"
)
// SetSetupToken inserts a row, replacing any existing token for
// this user (single-outstanding invariant). Caller passes a hash —
// raw tokens are never persisted.
func (s *Store) SetSetupToken(ctx context.Context, t SetupToken) error {
_, err := s.db.ExecContext(ctx,
`INSERT OR REPLACE INTO user_setup_tokens
(user_id, token_hash, expires_at, created_at, created_by)
VALUES (?, ?, ?, ?, ?)`,
t.UserID, t.TokenHash,
t.ExpiresAt.UTC().Format(time.RFC3339Nano),
t.CreatedAt.UTC().Format(time.RFC3339Nano),
nullable(t.CreatedBy))
if err != nil {
return fmt.Errorf("store: set setup token: %w", err)
}
return nil
}
// LookupSetupToken resolves a token hash to its row. Returns
// ErrNotFound for missing tokens. Expiry is NOT checked here —
// callers must compare ExpiresAt themselves so they can record
// 'expired' as a distinct outcome (audit-able) from 'never existed'.
func (s *Store) LookupSetupToken(ctx context.Context, tokenHash string) (*SetupToken, error) {
row := s.db.QueryRowContext(ctx,
`SELECT user_id, token_hash, expires_at, created_at, created_by
FROM user_setup_tokens WHERE token_hash = ?`, tokenHash)
return scanSetupToken(row.Scan)
}
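Because LookupSetupToken leaves the expiry check to the caller, the handler has to separate "never existed" from "expired" itself. A minimal sketch of that decision follows; the outcome labels, the trimmed-down SetupToken struct and the local ErrNotFound are stand-ins for illustration, not the store's real definitions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotFound stands in for the store's sentinel error.
var ErrNotFound = errors.New("not found")

// SetupToken is a reduced stand-in carrying only the fields used here.
type SetupToken struct {
	UserID    string
	ExpiresAt time.Time
}

// classify maps a lookup result to an audit outcome. The label strings
// are hypothetical.
func classify(tok *SetupToken, err error, now time.Time) string {
	switch {
	case errors.Is(err, ErrNotFound):
		return "unknown_token"
	case err != nil:
		return "lookup_error"
	case !now.Before(tok.ExpiresAt):
		return "expired"
	default:
		return "ok"
	}
}

// demoOutcomes runs classify over a valid, an expired and a missing token.
func demoOutcomes() [3]string {
	now := time.Date(2026, 5, 4, 12, 0, 0, 0, time.UTC)
	valid := &SetupToken{UserID: "u1", ExpiresAt: now.Add(time.Hour)}
	stale := &SetupToken{UserID: "u1", ExpiresAt: now.Add(-time.Hour)}
	return [3]string{
		classify(valid, nil, now),
		classify(stale, nil, now),
		classify(nil, ErrNotFound, now),
	}
}

func main() {
	fmt.Println(demoOutcomes())
}
```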
// GetSetupTokenByUserID returns the row for one user. Used by the
// edit page to know whether a 'Regenerate setup link' button should
// show as 'Generate' or 'Regenerate'. Returns ErrNotFound when no
// outstanding token exists.
func (s *Store) GetSetupTokenByUserID(ctx context.Context, userID string) (*SetupToken, error) {
row := s.db.QueryRowContext(ctx,
`SELECT user_id, token_hash, expires_at, created_at, created_by
FROM user_setup_tokens WHERE user_id = ?`, userID)
return scanSetupToken(row.Scan)
}
// DeleteSetupToken removes the row for a user (single-use cleanup
// after /setup completes successfully).
func (s *Store) DeleteSetupToken(ctx context.Context, userID string) error {
_, err := s.db.ExecContext(ctx,
`DELETE FROM user_setup_tokens WHERE user_id = ?`, userID)
if err != nil {
return fmt.Errorf("store: delete setup token: %w", err)
}
return nil
}
// CleanupExpiredSetupTokens removes rows whose expires_at has passed.
// Returns the number of rows deleted. Called from the maintenance
// ticker every minute.
func (s *Store) CleanupExpiredSetupTokens(ctx context.Context, now time.Time) (int64, error) {
res, err := s.db.ExecContext(ctx,
`DELETE FROM user_setup_tokens WHERE expires_at < ?`,
now.UTC().Format(time.RFC3339Nano))
if err != nil {
return 0, fmt.Errorf("store: cleanup setup tokens: %w", err)
}
n, _ := res.RowsAffected()
return n, nil
}
func scanSetupToken(scan func(...any) error) (*SetupToken, error) {
var t SetupToken
var createdBy sql.NullString
var expiresAt, createdAt string
if err := scan(&t.UserID, &t.TokenHash, &expiresAt, &createdAt, &createdBy); err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("store: scan setup token: %w", err)
}
t.ExpiresAt, _ = time.Parse(time.RFC3339Nano, expiresAt)
t.CreatedAt, _ = time.Parse(time.RFC3339Nano, createdAt)
if createdBy.Valid {
v := createdBy.String
t.CreatedBy = &v
}
return &t, nil
}
-120
@@ -1,120 +0,0 @@
package store
import (
"context"
"path/filepath"
"testing"
"time"
"github.com/oklog/ulid/v2"
)
func newSetupTokenTestStore(t *testing.T) (*Store, string, string) {
t.Helper()
st, err := Open(context.Background(), filepath.Join(t.TempDir(), "rm.db"))
if err != nil {
t.Fatalf("open: %v", err)
}
t.Cleanup(func() { _ = st.Close() })
uid := ulid.Make().String()
creator := ulid.Make().String()
now := time.Now().UTC()
if err := st.CreateUser(context.Background(), User{
ID: creator, Username: "creator", PasswordHash: "x",
Role: RoleAdmin, CreatedAt: now,
}); err != nil {
t.Fatalf("create creator: %v", err)
}
if err := st.CreateUser(context.Background(), User{
ID: uid, Username: "target", PasswordHash: "",
Role: RoleOperator, CreatedAt: now, MustChangePassword: true,
}); err != nil {
t.Fatalf("create target: %v", err)
}
return st, uid, creator
}
func TestSetupTokenSetAndLookup(t *testing.T) {
t.Parallel()
st, uid, creator := newSetupTokenTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
if err := st.SetSetupToken(ctx, SetupToken{
UserID: uid, TokenHash: "abc123",
ExpiresAt: now.Add(time.Hour),
CreatedAt: now, CreatedBy: &creator,
}); err != nil {
t.Fatalf("set: %v", err)
}
got, err := st.LookupSetupToken(ctx, "abc123")
if err != nil {
t.Fatalf("lookup: %v", err)
}
if got.UserID != uid {
t.Errorf("user_id: got %q want %q", got.UserID, uid)
}
}
func TestSetupTokenReplaces(t *testing.T) {
t.Parallel()
st, uid, creator := newSetupTokenTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
_ = st.SetSetupToken(ctx, SetupToken{
UserID: uid, TokenHash: "old",
ExpiresAt: now.Add(time.Hour), CreatedAt: now, CreatedBy: &creator,
})
_ = st.SetSetupToken(ctx, SetupToken{
UserID: uid, TokenHash: "new",
ExpiresAt: now.Add(time.Hour), CreatedAt: now, CreatedBy: &creator,
})
if _, err := st.LookupSetupToken(ctx, "old"); err == nil {
t.Error("old token should be gone")
}
if _, err := st.LookupSetupToken(ctx, "new"); err != nil {
t.Errorf("new token should resolve: %v", err)
}
}
func TestSetupTokenDelete(t *testing.T) {
t.Parallel()
st, uid, creator := newSetupTokenTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
_ = st.SetSetupToken(ctx, SetupToken{
UserID: uid, TokenHash: "tk",
ExpiresAt: now.Add(time.Hour), CreatedAt: now, CreatedBy: &creator,
})
if err := st.DeleteSetupToken(ctx, uid); err != nil {
t.Fatalf("delete: %v", err)
}
if _, err := st.LookupSetupToken(ctx, "tk"); err == nil {
t.Error("deleted token should not resolve")
}
}
func TestSetupTokenCleanupExpired(t *testing.T) {
t.Parallel()
st, uid, creator := newSetupTokenTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
_ = st.SetSetupToken(ctx, SetupToken{
UserID: uid, TokenHash: "stale",
ExpiresAt: now.Add(-time.Hour), CreatedAt: now.Add(-2 * time.Hour),
CreatedBy: &creator,
})
n, err := st.CleanupExpiredSetupTokens(ctx, now)
if err != nil {
t.Fatalf("cleanup: %v", err)
}
if n != 1 {
t.Errorf("cleanup count: got %d want 1", n)
}
if _, err := st.LookupSetupToken(ctx, "stale"); err == nil {
t.Error("stale token should be gone")
}
}
+6 -43
@@ -9,25 +9,12 @@ import (
// User mirrors the users table.
type User struct {
ID string
Username string
PasswordHash string
Role Role
Email *string // optional; nil = not set
DisabledAt *time.Time // nil = enabled
MustChangePassword bool
// AuthSource is "local" (created by admin or bootstrap) or
// "oidc" (JIT-provisioned on first OIDC sign-in). Local users
// authenticate via password; OIDC users via the IdP and have an
// empty PasswordHash.
AuthSource string
// OIDCSubject is the stable 'sub' claim from the IdP. Set only
// when AuthSource == "oidc". Used for fast lookup on subsequent
// sign-ins; the username/email may change at the IdP but sub
// stays stable.
OIDCSubject *string
CreatedAt time.Time
LastLoginAt *time.Time
ID string
Username string
PasswordHash string
Role Role
CreatedAt time.Time
LastLoginAt *time.Time
}
// Role enumerates the access tiers from spec.md §7.2.
@@ -50,10 +37,6 @@ type Session struct {
ExpiresAt time.Time
IP string
UA string
// IDToken is the OIDC id_token captured at sign-in for OIDC
// sessions; empty for local-user sessions. Used as
// id_token_hint on RP-initiated logout.
IDToken string
}
// Host mirrors the hosts table. The P2 redesign moved repo-related
@@ -90,15 +73,6 @@ type Host struct {
// Empty = no default configured.
PreHookDefault string
PostHookDefault string
// RepoStatus tracks the outcome of the last init/probe attempt:
// "unknown" (default), "ready", or "init_failed". Set by the WS
// handler on every job.finished of kind=init, and reset to
// "unknown" by repo-credentials saves so the next dispatch
// re-tests the new creds. RepoStatusError carries the trimmed
// agent-side message when RepoStatus == "init_failed".
RepoStatus string
RepoStatusError string
}
// Schedule is now intentionally slim: cron + which groups + enabled.
@@ -245,14 +219,3 @@ type AuditEntry struct {
TS time.Time
Payload json.RawMessage
}
// SetupToken mirrors the user_setup_tokens table. The raw token
// itself is never stored; the field shown here is the sha256 hex
// digest of the raw token, which is what callers compare against.
type SetupToken struct {
UserID string
TokenHash string
ExpiresAt time.Time
CreatedAt time.Time
CreatedBy *string // admin user id; nil only after ON DELETE SET NULL
}
+23 -223
@@ -5,136 +5,37 @@ import (
"database/sql"
"errors"
"fmt"
"strings"
"time"
)
// CreateUser inserts a row. Username is lowercase-normalised so the
// case-insensitive unique index from migration 0017 doesn't surprise
// callers who insert 'Alice' and look up 'alice'.
// CreateUser inserts a new user. The caller is responsible for
// generating an ID (typically a ULID) and hashing the password.
func (s *Store) CreateUser(ctx context.Context, u User) error {
u.Username = strings.ToLower(strings.TrimSpace(u.Username))
must := 0
if u.MustChangePassword {
must = 1
}
authSource := u.AuthSource
if authSource == "" {
authSource = "local"
}
_, err := s.db.ExecContext(ctx,
`INSERT INTO users (id, username, password_hash, role, email,
must_change_password, auth_source,
oidc_subject, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`,
u.ID, u.Username, u.PasswordHash, string(u.Role),
nullable(u.Email), must, authSource,
nullable(u.OIDCSubject),
u.CreatedAt.UTC().Format(time.RFC3339Nano))
`INSERT INTO users (id, username, password_hash, role, created_at)
VALUES (?, ?, ?, ?, ?)`,
u.ID, u.Username, u.PasswordHash, string(u.Role), u.CreatedAt.UTC().Format(time.RFC3339Nano))
if err != nil {
return fmt.Errorf("store: create user: %w", err)
}
return nil
}
// userSelectCols centralises the column list every read path uses so
// scanUser stays in lockstep.
const userSelectCols = `id, username, password_hash, role, email,
disabled_at, must_change_password,
auth_source, oidc_subject,
created_at, last_login_at`
// GetUserByUsername resolves a user case-insensitively.
// GetUserByUsername looks up a user by their (case-sensitive) username.
// Returns ErrNotFound if no row matches.
func (s *Store) GetUserByUsername(ctx context.Context, username string) (*User, error) {
row := s.db.QueryRowContext(ctx,
`SELECT `+userSelectCols+` FROM users WHERE LOWER(username) = LOWER(?)`,
username)
return scanUser(row.Scan)
`SELECT id, username, password_hash, role, created_at, last_login_at
FROM users WHERE username = ?`, username)
return scanUser(row)
}
// GetUserByID looks up a user by id. Returns ErrNotFound on miss.
func (s *Store) GetUserByID(ctx context.Context, id string) (*User, error) {
row := s.db.QueryRowContext(ctx,
`SELECT `+userSelectCols+` FROM users WHERE id = ?`, id)
return scanUser(row.Scan)
}
// GetUserByOIDCSubject finds the user JIT-provisioned on a previous
// OIDC sign-in. ErrNotFound on miss.
func (s *Store) GetUserByOIDCSubject(ctx context.Context, sub string) (*User, error) {
row := s.db.QueryRowContext(ctx,
`SELECT `+userSelectCols+` FROM users WHERE oidc_subject = ?`, sub)
return scanUser(row.Scan)
}
// SetUserOIDCSubject pins an existing user row to an IdP subject.
// Used by tests today; reserved for a future "link a local user to
// OIDC" flow.
func (s *Store) SetUserOIDCSubject(ctx context.Context, id, authSource, sub string) error {
_, err := s.db.ExecContext(ctx,
`UPDATE users SET auth_source = ?, oidc_subject = ? WHERE id = ?`,
authSource, sub, id)
if err != nil {
return fmt.Errorf("store: set oidc subject: %w", err)
}
return nil
}
// UserSort selects the column ListUsers orders by. OrderBy is
// allowlisted in usersOrderColumn so callers can't inject SQL via
// this field. Empty / unknown OrderBy falls back to "username".
type UserSort struct {
OrderBy string // "username" | "email" | "role" | "last_login_at"
OrderAsc bool // false = DESC; true = ASC
}
// usersOrderColumn validates s.OrderBy and returns the SQL fragment.
// last_login_at gets a NULL-tail trick so users who've never logged
// in sort to the bottom regardless of asc/desc — matches operator
// intuition ("show me real activity" not "show me NULLs first").
func usersOrderColumn(col string, asc bool) string {
dir := "DESC"
if asc {
dir = "ASC"
}
switch col {
case "email":
return fmt.Sprintf("email IS NULL, email %s, username", dir)
case "role":
return fmt.Sprintf("role %s, username", dir)
case "last_login_at":
return fmt.Sprintf("last_login_at IS NULL, last_login_at %s, username", dir)
default: // username (and unknown)
return fmt.Sprintf("username %s", dir)
}
}
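To illustrate the allowlist described above, the following standalone sketch copies usersOrderColumn from the diff and feeds it a known column, the NULL-tail case, and a hostile value. Only the main function is new; the point is that caller-supplied text never reaches the SQL string, because unknown input falls through to the username branch.

```go
package main

import "fmt"

// usersOrderColumn is copied from the diff above: only allowlisted
// column names produce an ORDER BY fragment, everything else falls
// back to ordering by username.
func usersOrderColumn(col string, asc bool) string {
	dir := "DESC"
	if asc {
		dir = "ASC"
	}
	switch col {
	case "email":
		return fmt.Sprintf("email IS NULL, email %s, username", dir)
	case "role":
		return fmt.Sprintf("role %s, username", dir)
	case "last_login_at":
		return fmt.Sprintf("last_login_at IS NULL, last_login_at %s, username", dir)
	default: // username (and unknown)
		return fmt.Sprintf("username %s", dir)
	}
}

func main() {
	fmt.Println(usersOrderColumn("role", true))
	fmt.Println(usersOrderColumn("last_login_at", false))
	// A hostile value is ignored rather than interpolated.
	fmt.Println(usersOrderColumn("username; DROP TABLE users", true))
}
```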
// ListUsers returns users sorted per UserSort. Default (zero value)
// is username ASC. Used by the user-management page (sort headers)
// and by surfaces that need a user-id → username map (audit log
// filter, "ack'd by" projections) — those callers pass UserSort{}.
func (s *Store) ListUsers(ctx context.Context, sort UserSort) ([]User, error) {
asc := sort.OrderAsc
if sort.OrderBy == "" {
// Default: username ASC (alphabetical), matching pre-sort behaviour.
asc = true
}
q := `SELECT ` + userSelectCols + ` FROM users ORDER BY ` +
usersOrderColumn(sort.OrderBy, asc)
rows, err := s.db.QueryContext(ctx, q)
if err != nil {
return nil, fmt.Errorf("store: list users: %w", err)
}
defer func() { _ = rows.Close() }()
var out []User
for rows.Next() {
u, err := scanUser(rows.Scan)
if err != nil {
return nil, err
}
out = append(out, *u)
}
return out, rows.Err()
`SELECT id, username, password_hash, role, created_at, last_login_at
FROM users WHERE id = ?`, id)
return scanUser(row)
}
// CountUsers returns the total number of user rows. The first-run
@@ -147,19 +48,6 @@ func (s *Store) CountUsers(ctx context.Context) (int, error) {
return n, nil
}
// CountEnabledAdmins returns the number of users with role='admin'
// AND disabled_at IS NULL. Used by the last-admin guard before
// disable / role-demote operations.
func (s *Store) CountEnabledAdmins(ctx context.Context) (int, error) {
var n int
if err := s.db.QueryRowContext(ctx,
`SELECT COUNT(*) FROM users WHERE role = 'admin' AND disabled_at IS NULL`,
).Scan(&n); err != nil {
return 0, fmt.Errorf("store: count admins: %w", err)
}
return n, nil
}
// MarkUserLogin records a successful authentication.
func (s *Store) MarkUserLogin(ctx context.Context, id string, when time.Time) error {
_, err := s.db.ExecContext(ctx,
@@ -171,116 +59,28 @@ func (s *Store) MarkUserLogin(ctx context.Context, id string, when time.Time) er
return nil
}
// SetUserEmail replaces the email field. Empty string clears it.
func (s *Store) SetUserEmail(ctx context.Context, id, email string) error {
em := strings.ToLower(strings.TrimSpace(email))
var v any
if em == "" {
v = nil
} else {
v = em
}
_, err := s.db.ExecContext(ctx,
`UPDATE users SET email = ? WHERE id = ?`, v, id)
if err != nil {
return fmt.Errorf("store: set user email: %w", err)
}
return nil
}
// SetUserRole changes a user's role.
func (s *Store) SetUserRole(ctx context.Context, id string, role Role) error {
_, err := s.db.ExecContext(ctx,
`UPDATE users SET role = ? WHERE id = ?`, string(role), id)
if err != nil {
return fmt.Errorf("store: set user role: %w", err)
}
return nil
}
// DisableUser sets disabled_at = when. Idempotent on already-disabled
// rows (no-op).
func (s *Store) DisableUser(ctx context.Context, id string, when time.Time) error {
_, err := s.db.ExecContext(ctx,
`UPDATE users SET disabled_at = ?
WHERE id = ? AND disabled_at IS NULL`,
when.UTC().Format(time.RFC3339Nano), id)
if err != nil {
return fmt.Errorf("store: disable user: %w", err)
}
return nil
}
// EnableUser clears disabled_at.
func (s *Store) EnableUser(ctx context.Context, id string) error {
_, err := s.db.ExecContext(ctx,
`UPDATE users SET disabled_at = NULL WHERE id = ?`, id)
if err != nil {
return fmt.Errorf("store: enable user: %w", err)
}
return nil
}
// SetMustChangePassword toggles the must_change_password flag.
func (s *Store) SetMustChangePassword(ctx context.Context, id string, must bool) error {
v := 0
if must {
v = 1
}
_, err := s.db.ExecContext(ctx,
`UPDATE users SET must_change_password = ? WHERE id = ?`, v, id)
if err != nil {
return fmt.Errorf("store: set must_change_password: %w", err)
}
return nil
}
// SetPasswordHash stores a new password_hash and clears the
// must_change_password flag in one go.
func (s *Store) SetPasswordHash(ctx context.Context, id, hash string) error {
_, err := s.db.ExecContext(ctx,
`UPDATE users SET password_hash = ?, must_change_password = 0 WHERE id = ?`,
hash, id)
if err != nil {
return fmt.Errorf("store: set password: %w", err)
}
return nil
}
func scanUser(scan func(...any) error) (*User, error) {
func scanUser(row *sql.Row) (*User, error) {
var u User
var role string
var email, disabledAt, oidcSub, lastLogin sql.NullString
var must int
var authSource string
var lastLogin sql.NullString
var created string
if err := scan(&u.ID, &u.Username, &u.PasswordHash, &role,
&email, &disabledAt, &must, &authSource, &oidcSub,
&created, &lastLogin); err != nil {
if err := row.Scan(&u.ID, &u.Username, &u.PasswordHash, &role, &created, &lastLogin); err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("store: scan user: %w", err)
}
u.Role = Role(role)
if email.Valid {
v := email.String
u.Email = &v
t, err := time.Parse(time.RFC3339Nano, created)
if err != nil {
return nil, fmt.Errorf("store: parse created_at: %w", err)
}
if disabledAt.Valid {
t, _ := time.Parse(time.RFC3339Nano, disabledAt.String)
u.DisabledAt = &t
}
u.MustChangePassword = must == 1
u.AuthSource = authSource
if oidcSub.Valid {
v := oidcSub.String
u.OIDCSubject = &v
}
t, _ := time.Parse(time.RFC3339Nano, created)
u.CreatedAt = t
if lastLogin.Valid {
t, _ := time.Parse(time.RFC3339Nano, lastLogin.String)
t, err := time.Parse(time.RFC3339Nano, lastLogin.String)
if err != nil {
return nil, fmt.Errorf("store: parse last_login_at: %w", err)
}
u.LastLoginAt = &t
}
return &u, nil
-82
@@ -131,88 +131,6 @@ func TestSessionLifecycle(t *testing.T) {
}
}
func TestCreateUserLowercasesUsername(t *testing.T) {
t.Parallel()
s := openTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
if err := s.CreateUser(ctx, User{
ID: "u1", Username: "Alice",
PasswordHash: "x", Role: RoleAdmin, CreatedAt: now,
}); err != nil {
t.Fatalf("create: %v", err)
}
got, err := s.GetUserByUsername(ctx, "alice")
if err != nil {
t.Fatalf("get lower: %v", err)
}
if got.Username != "alice" {
t.Errorf("stored username: got %q want %q", got.Username, "alice")
}
got, err = s.GetUserByUsername(ctx, "ALICE")
if err != nil {
t.Fatalf("get upper: %v", err)
}
if got.ID != "u1" {
t.Errorf("upper-case lookup missed: got %+v", got)
}
if err := s.CreateUser(ctx, User{
ID: "u2", Username: "AlIcE",
PasswordHash: "x", Role: RoleAdmin, CreatedAt: now,
}); err == nil {
t.Error("duplicate (different case) should fail")
}
}
func TestGetUserByOIDCSubject(t *testing.T) {
t.Parallel()
s := openTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
sub := "sub-abc-123"
if err := s.CreateUser(ctx, User{
ID: "u1", Username: "alice", PasswordHash: "",
Role: RoleAdmin, CreatedAt: now,
AuthSource: "oidc", OIDCSubject: &sub,
}); err != nil {
t.Fatalf("create: %v", err)
}
got, err := s.GetUserByOIDCSubject(ctx, sub)
if err != nil {
t.Fatalf("get by sub: %v", err)
}
if got.ID != "u1" || got.AuthSource != "oidc" {
t.Errorf("unexpected: %+v", got)
}
if _, err := s.GetUserByOIDCSubject(ctx, "nope"); !errors.Is(err, ErrNotFound) {
t.Errorf("missing sub: want ErrNotFound, got %v", err)
}
}
func TestSetUserOIDCSubject(t *testing.T) {
t.Parallel()
s := openTestStore(t)
ctx := context.Background()
now := time.Now().UTC()
if err := s.CreateUser(ctx, User{
ID: "u1", Username: "alice", PasswordHash: "x",
Role: RoleAdmin, CreatedAt: now,
}); err != nil {
t.Fatalf("create: %v", err)
}
sub := "sub-456"
if err := s.SetUserOIDCSubject(ctx, "u1", "oidc", sub); err != nil {
t.Fatalf("set: %v", err)
}
got, _ := s.GetUserByID(ctx, "u1")
if got.AuthSource != "oidc" || got.OIDCSubject == nil || *got.OIDCSubject != sub {
t.Errorf("after set: %+v", got)
}
}
func TestEnrollmentTokenSingleUse(t *testing.T) {
t.Parallel()
s := openTestStore(t)
+15 -59
@@ -278,11 +278,9 @@ Sizes: **S** = under a day, **M** = 1-3 days, **L** = 3-7 days.
> **As shipped (Playwright sweep, 2026-05-04):** /settings/notifications → 3 channels created (sweep-webhook → local Python sink, sweep-ntfy → ntfy.sh public topic, sweep-smtp → MailHog at 127.0.0.1:1025). Test buttons fire alert.test on each: webhook 200/1ms, ntfy 200/322ms, SMTP 250/3ms. Synthetic critical `backup_failed` raised → /alerts shows row with severity dot, kind chip, host, message, raised/last-seen, Ack + Resolve buttons; nav badge `1`; dashboard critical-alert banner appears with Review→ link; OPEN ALERTS card reads `1 unresolved`. Acknowledge → fan-out to all 3 channels emits alert.acknowledged (verified in webhook sink, MailHog inbox, notification_log); Acknowledged tab shows row with `ack'd by <user>` line. Resolve → fan-out emits alert.resolved across all 3 channels; banner clears; dashboard reads `0 unresolved · all clear`; host alerts column reads —. Three live bugs found and fixed mid-sweep: (a) `enabled` form value lost because hidden+checkbox both named `enabled` and `PostForm.Get` returned the first ("0"); (b) Ack/Resolve handlers stored the state change but never dispatched alert.acknowledged / alert.resolved; (c) `hosts.open_alert_count` projection was never recomputed on Raise/Resolve/AutoResolve, so the dashboard count always read 0.
### Phase 3 — Audit log UI
### Phase 3 — Audit log UI (not started)
- [x] **P3-08** (S) Audit log UI with filters (user, action, target, time range)
> **As shipped (2026-05-05):** Read-only `/audit` page (+ `/api/audit` JSON). Filters: time-range presets (24h / 7d / 30d / all), user dropdown (any registered user), actor dropdown (user / agent / system), target-kind dropdown (host / schedule / source_group / alert / notification_channel / job / user), action substring search box. Table columns: when (relative + abstime tooltip), actor tag (user accent / agent green / system grey), user (or em-dash for system rows), action string, target (kind · resolved name for hosts, kind · id otherwise), payload `<details>` block when non-empty. New `Store.ListAudit(AuditFilter)` and `Store.DistinctAuditActions` plus `Store.ListUsers`. Append-only — no edit/delete surface, deliberately.
- [ ] **P3-08** (S) Audit log UI with filters (user, action, target, time range)
### Phase 3 acceptance
@@ -292,35 +290,21 @@ Sizes: **S** = under a day, **M** = 1-3 days, **L** = 3-7 days.
---
## Phase 4 — RBAC, OIDC, host tags
## Phase 4 — Update delivery, RBAC polish, OIDC
- [x] **P4-03** (M) RBAC enforcement at API layer (admin / operator / viewer)
- [x] **P4-04** (S) User management UI (create/edit/disable, role assignment, password reset)
> **As shipped (2026-05-05):** Three-role hierarchy (admin > operator > viewer) enforced via chi route-group middleware (`requireRole`). Admin is the fail-closed default; agent endpoints stay on the bearer-token chain. Sessions re-validate `disabled_at` on every authenticated request — admin-driven changes (disable, force-logout) land immediately.
>
> **Setup-token flow** replaces temp passwords. Admin clicks `+ Add user`, picks username + email + role, server returns a one-time setup link valid for 1 hour (sha256-hashed at rest, raw shown to admin once). User clicks the link → sets a password (≥12 chars) → drops a session → lands on `/`. `/settings/users/{id}/regenerate-setup` issues a new link, replacing the old via INSERT OR REPLACE. Expired tokens are swept on the alert engine's 60s tick.
>
> **Disable-only lifecycle** — soft delete via `disabled_at`. Last-admin guard rejects "disable last admin" and "demote last admin to non-admin" (both server-side and UI-hinted). Re-enable on disabled-username collision: admin trying to add a name that matches a disabled user is redirected to that user's edit page rather than 409'd.
>
> **Self-service password change** at `/settings/account` available to any role. Skips current-password check when `must_change_password` is set so admin-initiated resets work without surfacing a credential the user doesn't know.
>
> **Schema:** migration 0017 adds `email`, `disabled_at`, `must_change_password` plus a UNIQUE INDEX on LOWER(username) (lowercase normalisation in Go on every CreateUser); 0018 adds `user_setup_tokens`. Both column-level ALTERs per CLAUDE.md preference. Email is metadata only in v1 (no SMTP-the-link); the SMTP channel infrastructure from P3-06 makes that a one-page follow-up.
>
> **Sweep verified (smoke env):** admin adds operator → setup link generated → curl-as-new-user fetches /setup (200, page shows username) → POSTs password → 303 to / + Set-Cookie → operator authenticated → 200 on /, 200 on /settings/account, **403 on /settings/users** (admin-only) → admin disables user → operator's next request is **401** + session row count drops to 0 → audit log shows `user.created` + `user.setup_completed` for the cycle. All 26 implementation tasks landed; full `go test ./...` green.
- [x] **P4-05** (L) OIDC login (generic provider config, group → role mapping)
> **As shipped (2026-05-05):** Authorization Code + PKCE (S256) against any OIDC IdP advertising standard discovery. Config is YAML+env (`oidc.issuer`, `oidc.client_id`, `oidc.client_secret`/`_file`, `oidc.role_claim` default `groups`, `oidc.role_mapping`, `oidc.display_name`, `oidc.redirect_url`); empty issuer → OIDC disabled, no routes mounted. Migration 0019 adds `users.auth_source`/`oidc_subject` (partial unique index on `oidc_subject`), `sessions.id_token`, and a small `oidc_state` table for state+verifier round-trip (cleaned up every alert tick, 5 min TTL). Login page renders **Sign in with `<display_name>`** above the local form when OIDC is enabled; the SSO button kicks off a 303 to the IdP with state + S256 code_challenge persisted server-side. Callback verifies ID token, fetches `/userinfo` to merge claims (Authelia / many IdPs only put `sub` in the ID token and surface `preferred_username`/`email`/`groups` from userinfo), maps the first matching group to a role; **no match → deny banner**, no row created, audit `user.oidc_login_blocked`. Username-collision with an existing local user → same deny path with `username_taken`. New user → JIT-provisioned with `auth_source='oidc'`, `oidc_subject=<sub>`, `password_hash=''`. Returning user → looked up by `oidc_subject` (stable when usernames change at the IdP), role + email refreshed on every login. Local password login is rejected for `auth_source='oidc'` users. Logout posts to `/logout` and, when the IdP advertised `end_session_endpoint`, follows up with RP-initiated logout (carries `id_token_hint` + `post_logout_redirect_uri=BaseURL`); when not advertised (Authelia in our smoke env), the local session is cleared and the browser lands on `/login`. Users list shows a small **oidc** chip beside enabled/disabled; the edit page disables username/email/role for OIDC users (server-side guard mirrors UI, returns 403). Force-logout, disable, and the last-admin guard from P4-04 all still apply. 
**Live Authelia sweep verified all four paths against `https://auth.example.invalid`:** rm-admin → admin role + JIT row + chip + readonly edit; rm-operator → operator JIT, 403 on `/settings/users`; rm-viewer → viewer JIT, 403 on `/hosts/new`; rm-other (group not in role_mapping) → no_role_match banner, no row created, audit logged. Returning rm-admin login resolved to the same row by sub. Screenshots in `_diag/p4-05-sweep/`. Out-of-scope and on Phase 6 candidate list: refresh tokens, back-channel logout, multiple providers, post-login PKCE for the cookie itself.
- [x] **P4-07** (S) Per-host tags + dashboard filtering by tag
> **As shipped (2026-05-05):** Tag column already existed on the hosts schema (JSON array, round-tripped through the Host struct since Phase 1) but had no edit UI or filter. Added `Store.SetHostTags` + `Store.DistinctHostTags` (the latter via `json_each` for autocomplete + chip-row population). Inline editor on the host detail header: `+ tag` button reveals a comma-separated input with `<datalist>` autocomplete from the fleet's distinct tags; submit lowercases / trims / dedupes server-side. Tag chips on the host header link to the dashboard pre-filtered. Dashboard chip-row above the hosts table — `All / <tag1> / <tag2> …` with the active chip highlighted via a new `.tag-active` style; `?tag=foo` filters the list with the count showing `N of M`. Operator-band POST `/hosts/{id}/tags` audited as `host.tags_updated`.
- [ ] **P4-01** (M) Update delivery via OS package managers — host an apt repo (Linux) and Chocolatey package (Windows) on gitea releases. `restic-manager-agent update` is a thin wrapper over `apt-get install --only-upgrade restic-manager-agent` / `choco upgrade`. Trades flexibility for a much smaller security surface than bespoke signed binaries (see spec.md §4.2)
- [ ] **P4-02** (M) Agent version reporting on dashboard: surface "agent N versions behind server"; "update all" admin action calls the package-manager wrapper on each host
- [ ] **P4-03** (M) RBAC enforcement at API layer (admin / operator / viewer)
- [ ] **P4-04** (S) User management UI (create/edit/disable, role assignment, password reset)
- [ ] **P4-05** (L) OIDC login (generic provider config, group → role mapping)
- [ ] **P4-06** (M) Repo size trend graphs (sparkline on host card, full chart on repo page)
- [ ] **P4-07** (S) Per-host tags + dashboard filtering by tag
- [ ] **P4-08** (M) Prometheus `/metrics` endpoint: per-host gauges (last backup timestamp, last backup status, repo size, snapshot count, agent online), server gauges (active alerts, build info), job duration histograms; protected by bearer token or IP allow-list
- [ ] **P4-09** (S) Document Prometheus integration + sample Grafana dashboard JSON
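
The P4-05 note above mentions Authorization Code + PKCE with an S256 `code_challenge`. As a minimal sketch of that derivation (RFC 7636), assuming nothing about the project's actual OIDC code: the helper names `newVerifier` and `challengeS256` are hypothetical and are not taken from the repository.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newVerifier returns a random PKCE code verifier. 32 random bytes encode to
// 43 unpadded base64url characters, the minimum length RFC 7636 allows.
// (Hypothetical helper name, for illustration only.)
func newVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// challengeS256 derives the S256 code_challenge:
// BASE64URL-without-padding(SHA256(ASCII(verifier))).
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	v, err := newVerifier()
	if err != nil {
		panic(err)
	}
	// The verifier stays server-side (the note above stores it alongside the
	// state value); only the challenge goes to the IdP in the redirect.
	fmt.Println("verifier length:", len(v))
	fmt.Println("code_challenge:", challengeS256(v))
}
```

The verifier is sent only on the token exchange, so an attacker who intercepts the authorization code cannot redeem it without access to the server-side state.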
### Phase 4 acceptance
- Non-admin users see an appropriately limited UI. OIDC login works against at least one provider (Authelia or Authentik). Hosts can be tagged and the dashboard filters by tag.
> **Deferred to Phase 6** (2026-05-05) — pulled forward of OSS readiness so a working v1 ships sooner: P4-01/02 (update delivery + agent-version tracking), P4-06 (repo size trends), P4-08/09 (Prometheus + Grafana). All operator-experience polish, none of it gates getting the system into production.
- Non-admin users see an appropriately limited UI. Agents upgrade via apt/choco with one admin-triggered action. OIDC login works against at least one provider (Authelia or Authentik). Prometheus can scrape `/metrics` and the sample Grafana dashboard renders with live data.
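
The P4-07 note says the server lowercases, trims, and dedupes the comma-separated tag input. A minimal sketch of that normalisation, assuming first-occurrence order is kept and empty entries are dropped (neither is stated in the source); `normalizeTags` is a hypothetical name, not the repository's function.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTags splits a comma-separated string, trims and lowercases each
// entry, drops empties, and removes duplicates while keeping first-seen order.
// (Hypothetical helper; ordering and empty-handling are assumptions.)
func normalizeTags(raw string) []string {
	seen := make(map[string]struct{})
	out := []string{}
	for _, part := range strings.Split(raw, ",") {
		tag := strings.ToLower(strings.TrimSpace(part))
		if tag == "" {
			continue
		}
		if _, dup := seen[tag]; dup {
			continue
		}
		seen[tag] = struct{}{}
		out = append(out, tag)
	}
	return out
}

func main() {
	fmt.Println(normalizeTags(" Prod, db ,PROD,,web "))
}
```

Normalising on the server rather than in the form keeps the stored JSON array canonical regardless of which client submitted it.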
---
@@ -328,11 +312,11 @@ Sizes: **S** = under a day, **M** = 1-3 days, **L** = 3-7 days.
- [ ] **P5-01** (M) Documentation site (mdBook or similar) with install, concepts, security model, screenshots
- [ ] **P5-02** (S) `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, issue + PR templates
- [x] **P5-03** (S) Release automation: **pivoted away from goreleaser/binary archives** on 2026-05-05 (spec: `docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md`). Single deliverable per tag: a multi-arch (linux amd64+arm64) server image, with cross-compiled agent binaries (linux amd64+arm64, windows amd64) + `install.sh` + `install.ps1` + the systemd unit baked under `/opt/restic-manager/dist/`. The `/agent/binary` and `/install/*` handlers fall back from `<DataDir>/...` to `<BundledAssetsDir>/...` so a fresh container Just Works. Workflow `.gitea/workflows/release.yml` triggers on `v*.*.*` tag-push (real release: fan-out `:vX.Y.Z`, `:X.Y`, `:X`, plus `:latest` once `MAJOR>=1`) and `workflow_dispatch` (snapshot: `:snapshot-<shortsha>` only). Pushed to the Gitea container registry on this instance (no external creds, no GHCR mirror). Cosign / SBOM / minisign / GHCR mirror deferred to Phase 6. Source builds via `make build` remain a first-class path.
- [ ] **P5-03** (S) Release automation: `goreleaser` for binaries + Docker image to GHCR
- [ ] **P5-04** (S) Demo screenshots / short Loom walkthrough in README
- [ ] **P5-05** (S) `SECURITY.md` with disclosure process
- [ ] **P5-06** (M) End-to-end test suite in CI (Playwright vs. compose stack with sibling Linux agent)
- [x] **P5-07** (S) Reference deployment landed alongside P5-03. `deploy/docker-compose.yml` stands up *only* the server (image-pinned via `RM_VERSION`, named volume for operator state, bound to localhost) — TLS termination is left to whichever reverse proxy the operator already runs. `docs/reverse-proxy.md` documents the headers + WebSocket pass-through the proxy must forward, the `RM_TRUSTED_PROXY` CIDR rule, and worked examples for Caddy, nginx, and Traefik.
- [ ] **P5-07** (S) Reference deployment: `docker-compose.yml` + Caddyfile snippet showing the TLS-terminating reverse proxy in front of the HTTP-only server (also demonstrates `RM_TRUSTED_PROXY`)
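
The P5-03 entry describes the image-tag fan-out for a real release: `:vX.Y.Z`, `:X.Y`, `:X`, plus `:latest` once `MAJOR>=1`. The rule can be sketched as below. This is an illustration only: the real logic lives in the release workflow, and `imageTags` is a hypothetical helper that ignores pre-release suffixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// imageTags expands a git tag of the form vMAJOR.MINOR.PATCH into the list of
// container image tags described in P5-03. (Hypothetical helper name.)
func imageTags(gitTag string) ([]string, error) {
	if !strings.HasPrefix(gitTag, "v") {
		return nil, fmt.Errorf("tag %q does not start with v", gitTag)
	}
	parts := strings.Split(strings.TrimPrefix(gitTag, "v"), ".")
	if len(parts) != 3 {
		return nil, fmt.Errorf("tag %q is not vMAJOR.MINOR.PATCH", gitTag)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("tag %q has an invalid component %q", gitTag, p)
		}
		nums[i] = n
	}
	tags := []string{
		fmt.Sprintf("v%d.%d.%d", nums[0], nums[1], nums[2]),
		fmt.Sprintf("%d.%d", nums[0], nums[1]),
		fmt.Sprintf("%d", nums[0]),
	}
	// :latest is only moved once the project has reached 1.0.
	if nums[0] >= 1 {
		tags = append(tags, "latest")
	}
	return tags, nil
}

func main() {
	for _, t := range []string{"v1.4.2", "v0.3.1"} {
		tags, err := imageTags(t)
		if err != nil {
			panic(err)
		}
		fmt.Println(t, "->", tags)
	}
}
```

Holding back `:latest` until `MAJOR>=1` means anyone tracking `:latest` never lands on a pre-1.0 build by accident.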
### Phase 5 acceptance
@@ -340,22 +324,6 @@ Sizes: **S** = under a day, **M** = 1-3 days, **L** = 3-7 days.
---
## Phase 6 — Update delivery + observability
> Deferred from Phase 4 on 2026-05-05 — operator-experience polish that doesn't gate a working v1.
- [ ] **P6-01** (M) Update delivery via OS package managers — host an apt repo (Linux) and Chocolatey package (Windows) on gitea releases. `restic-manager-agent update` is a thin wrapper over `apt-get install --only-upgrade restic-manager-agent` / `choco upgrade`. Trades flexibility for a much smaller security surface than bespoke signed binaries (see spec.md §4.2). _(Was P4-01.)_
- [ ] **P6-02** (M) Agent version reporting on dashboard: surface "agent N versions behind server"; "update all" admin action calls the package-manager wrapper on each host. _(Was P4-02.)_
- [ ] **P6-03** (M) Repo size trend graphs (sparkline on host card, full chart on repo page). _(Was P4-06.)_
- [ ] **P6-04** (M) Prometheus `/metrics` endpoint: per-host gauges (last backup timestamp, last backup status, repo size, snapshot count, agent online), server gauges (active alerts, build info), job duration histograms; protected by bearer token or IP allow-list. _(Was P4-08.)_
- [ ] **P6-05** (S) Document Prometheus integration + sample Grafana dashboard JSON. _(Was P4-09.)_
### Phase 6 acceptance
- Agents upgrade via apt/choco with one admin-triggered action. Prometheus can scrape `/metrics` and the sample Grafana dashboard renders with live data. Repo size trend visible on host detail.
---
## Cross-cutting / ongoing
- [ ] **X-01** Keep CHANGELOG.md updated (Keep-a-Changelog format)
@@ -366,18 +334,6 @@ Sizes: **S** = under a day, **M** = 1-3 days, **L** = 3-7 days.
---
## Next steps from testing
> Bin for issues spotted while exercising a live deployment. Promote
> into a phase once scoped; leave here while still being collected.
- [x] **NS-01** Admin-driven host deletion. ✅ Landed: store `DeleteHost` (FK cascade revokes the agent bearer along with everything else), admin-band `POST /hosts/{id}/delete`, danger-zone form on host detail with hostname-confirm, audit `host.deleted`, live WS connection closed pre-delete. Original scope below for reference. No UI or API surface today — once a host is enrolled the only way to remove it is hand-editing SQLite, which then cascades through schedules/jobs/snapshots/source-groups via the FK chain. Needs: store-level `DeleteHost` + cascade audit, admin-band `DELETE /api/hosts/{id}` and form-post variant, confirm-modal on the host-detail page, audit entry, and a decision on whether to also revoke the agent's bearer (recommend: yes, so a re-installed host comes back through the normal pending-host accept flow).
- [x] **NS-02** Recoverable enrollment-token UX. ✅ Landed: `Store.ListOutstandingEnrollmentTokens` + `DeleteEnrollmentToken`; outstanding-tokens panel on the Add-host page (short hash, redacted repo URL, created/expires) with per-row Regenerate (revokes old hash, mints fresh raw token preserving repo creds + initial paths, 303s to `/hosts/pending/{newToken}`) and Revoke (delete + audit). Audit actions `enrollment_token.regenerated` / `enrollment_token.revoked`. Original scope below. Today `POST /hosts/new` mints a token and 303s to `/hosts/pending/{token}`; if the operator closes that tab the install snippet is lost and there's no UI surface to find it again — the row sits in `enrollment_tokens` until TTL expiry, invisible. Needs: store-level `ListOutstandingEnrollmentTokens` returning `(token_hash, created_at, expires_at, repo_url_redacted, initial_paths, attached_host_id_or_null)`; a small list section on the Add-host page (and/or Settings) showing outstanding tokens with created/expires-in and the redacted repo URL; admin-band `POST /api/enrollment-tokens/{id}/regenerate` (revokes the old hash, mints a fresh raw token, re-uses the original attachments — same pattern as the user-setup-token regenerate flow) and `POST /api/enrollment-tokens/{id}/revoke`. Choose regenerate over "show original token" because we only persist hashes, never raw tokens.
- [x] **NS-03** Auto-init repo on first onboard, surface credential failures eagerly. ✅ Landed: migration 0020 adds `hosts.repo_status` (`unknown`/`ready`/`init_failed`) + `repo_status_error`; WS handler projects every init job's terminal state onto the host row (with idempotent "config file already exists" → ready); creds-save handlers (UI + JSON API) reset status to `unknown` and dispatch a fresh init when the agent is online; new `/hosts/{id}/repo/probe` retry endpoint and a status banner on the repo page. Original scope below. Today the operator types repo URL + creds during Add-host and the credentials are pushed to the agent on connect, but no `restic init`/probe runs until the first scheduled job, so a typo in the password or a wrong URL goes undetected for hours/days, manifesting as a silent missed-backup. Wanted behaviour: when the host completes enrolment (or when an admin saves new repo creds), the server dispatches a one-shot probe job that runs `restic cat config` (cheap, repo-existence + creds-validity in one call). On `Is there already a config file? unable to open config file` → run `restic init`. On success → mark the host's repo as ready. On any other error (network, auth, fingerprint) → surface a panel-level error on the host detail page and audit the failure, leaving the host in an "init pending" state with a "Retry" button. Needs: a new `JobKind` (or piggyback on an existing one) for the probe, server-side state on the host row (`repo_status` enum: `unknown`/`ready`/`init_pending`/`init_failed`), UI panel that shows the state, and clear copy on the Add-host page so the operator knows the save isn't fire-and-forget.
- [x] **NS-04** Dashboard parity with the alerts screen: live refresh, column sorting, filters. ✅ Landed: `/` now parses `q`/`status`/`repo_status`/`tag`/`sort`/`dir` query params (round-trip durable for bookmarks); table is wrapped in an `id="hosts-table"` htmx live-poll matching the alerts cadence (5s, gated on `document.visibilityState` and `localStorage.rm-dashboard-live`); filter row above the table with hostname free-text + status + repo_status selects + tag chips + clear; column headers (Host / OS · arch / Last backup / Repo size / Snapshots) are clickable links that toggle direction on the active column; pure-Go sort+filter pipeline covered by `dashboard_filter_test.go`. Original scope below. The host list is currently a static render, so operators have to reload to see new heartbeats / job state changes. Mirror the alerts pattern (`web/templates/pages/alerts.html` uses `hx-trigger="every 5s [document.visibilityState==='visible' && localStorage.getItem('rm-alerts-live')!=='off']"` plus a Live/Off toggle so background tabs and explicit-off don't burn server cycles). Add: server-side sort on every meaningful column (name, OS, last-backup time, last-backup status, agent online/offline, restic version, tags), and a small filter row above the table with, at minimum, free-text on hostname, status (online/offline/never-seen), and tag chips. Columns + filter state should round-trip through query string so a bookmarked / shared URL is durable. Re-use the `host_row` partial that already exists so the live-refresh swap is a clean OOB swap, not a full table re-render.
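
NS-04 describes column headers that toggle sort direction on the active column while the filter state round-trips through the query string. A rough sketch of how such a header link could be built, assuming `asc` is the default direction and the dashboard lives at `/`; `sortHref` is a hypothetical helper, and the real pipeline is whatever `dashboard_filter_test.go` covers.

```go
package main

import (
	"fmt"
	"net/url"
)

// sortHref returns the link for a column header. Clicking the already-active
// column flips asc to desc (and back); clicking another column starts at asc.
// All other query params (q, status, tag, ...) are carried over unchanged.
// (Hypothetical helper; the default direction is an assumption.)
func sortHref(current url.Values, col string) string {
	next := url.Values{}
	for k, v := range current {
		next[k] = append([]string(nil), v...)
	}
	dir := "asc"
	if current.Get("sort") == col && current.Get("dir") != "desc" {
		dir = "desc"
	}
	next.Set("sort", col)
	next.Set("dir", dir)
	// Encode sorts keys, so the resulting URL is stable and bookmarkable.
	return "/?" + next.Encode()
}

func main() {
	current := url.Values{"tag": {"prod"}, "sort": {"host"}, "dir": {"asc"}}
	fmt.Println(sortHref(current, "host"))
	fmt.Println(sortHref(current, "snapshots"))
}
```

Copying the existing params before setting `sort` and `dir` is what keeps filters intact when the operator re-sorts.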
---
## Future / unscheduled
> Items here have a plausible use case but no confirmed need. They live
File diff suppressed because one or more lines are too long
-65
@@ -306,57 +306,11 @@
.dot-critical { background: var(--bad); box-shadow: 0 0 0 3px color-mix(in oklch, var(--bad), transparent 80%); }
.dot-resolved { background: var(--ok); box-shadow: 0 0 0 3px color-mix(in oklch, var(--ok), transparent 80%); }
/* Tag in active/selected state used by the dashboard chip-row
filter and any other UI that wants a "this tag is currently
applied" highlight. Subtle: slight accent tint, accent border,
ink colour shift; doesn't shout. */
.tag.tag-active {
color: var(--accent);
border-color: color-mix(in oklch, var(--accent), transparent 50%);
background: color-mix(in oklch, var(--accent), transparent 92%);
}
/* tag colour variants for alerts */
.tag-warn { color: var(--warn); border-color: color-mix(in oklch, var(--warn), transparent 60%); background: color-mix(in oklch, var(--warn), transparent 92%); }
.tag-critical { color: var(--bad); border-color: color-mix(in oklch, var(--bad), transparent 60%); background: color-mix(in oklch, var(--bad), transparent 92%); }
.tag-info { color: var(--ink-mid); }
/* ---------- audit rows (/audit list) ---------- */
.audit-row {
display: grid; align-items: center;
grid-template-columns: 160px 80px 110px 1.4fr 1.5fr 90px;
column-gap: 16px;
padding: 11px 16px; font-size: 13px;
border-bottom: 1px solid var(--line-soft);
transition: background 100ms ease;
}
.audit-row:hover { background: var(--panel-hi); }
.audit-row:last-child { border-bottom: 0; }
.audit-row.head {
cursor: default; padding-top: 9px; padding-bottom: 9px;
font-size: 11px; color: var(--ink-fade);
text-transform: uppercase; letter-spacing: 0.08em;
}
.audit-row.head:hover { background: transparent; }
/* Sort-header link styling shared by .audit-row and .user-row
(and any other future sortable table headers). The selectors
scope to .head rows so hover and accent-glyph treatment only
apply to the header, not data rows that happen to contain a
<a class="sort-header">. */
.audit-row.head .sort-header,
.user-row.head .sort-header {
color: inherit; text-decoration: none; cursor: pointer;
display: inline-flex; align-items: baseline; gap: 4px;
}
.audit-row.head .sort-header:hover,
.user-row.head .sort-header:hover { color: var(--ink); }
.audit-row.head .sort-glyph,
.user-row.head .sort-glyph {
font-size: 9px; color: var(--accent);
/* keep the row height stable when the glyph appears/disappears */
min-width: 8px; display: inline-block;
}
/* ---------- schedule rows (Schedules tab) ---------- */
.schd-row {
display: grid; align-items: center;
@@ -581,25 +535,6 @@
background: var(--accent);
}
/* ---------- user-management rows (/settings/users) ---------- */
.user-row {
display: grid; align-items: center;
grid-template-columns: 180px 1fr 110px 160px 120px 90px;
column-gap: 16px;
padding: 11px 16px; font-size: 13px;
border-bottom: 1px solid var(--line-soft);
transition: background 100ms ease;
}
.user-row:hover { background: var(--panel-hi); }
.user-row:last-child { border-bottom: 0; }
.user-row.head {
cursor: default; padding-top: 9px; padding-bottom: 9px;
font-size: 11px; color: var(--ink-fade);
text-transform: uppercase; letter-spacing: 0.08em;
}
.user-row.head:hover { background: transparent; }
.user-row.disabled { opacity: 0.55; }
/* ---------- test-result pills (notification test button) ---------- */
.test-pill {
display: inline-block;
-46
@@ -1,46 +0,0 @@
{{define "title"}}Account · restic-manager{{end}}
{{define "content"}}
{{$page := .Page}}
<div class="max-w-[520px] mx-auto px-8 pb-14">
<div class="crumbs pt-6">
<a href="/">Dashboard</a><span class="sep">/</span>
<span class="text-ink-mid">account</span>
</div>
<h1 class="text-[22px] font-medium tracking-[-0.005em] mt-3.5">Account</h1>
<div class="text-[12.5px] text-ink-mute mt-2 leading-[1.6]">
Signed in as <span class="mono text-ink-mid">{{$page.Username}}</span>
({{$page.Role}}). Change your password below.
</div>
{{if $page.Saved}}
<div class="mt-6 panel rounded-[7px] p-4"
style="border-color: color-mix(in oklch, var(--ok), transparent 60%);">
<div class="text-ok text-[13px]">Password updated.</div>
</div>
{{end}}
<form method="post" action="/settings/account" class="mt-6 panel rounded-[7px] p-6 space-y-4">
{{if not $page.MustChange}}
<div>
<label class="field-label" for="current">Current password</label>
<input id="current" name="current_password" type="password" class="field"
required autocomplete="current-password" />
</div>
{{end}}
<div>
<label class="field-label" for="new">New password</label>
<input id="new" name="new_password" type="password" class="field"
required minlength="12" autocomplete="new-password" />
</div>
<div>
<label class="field-label" for="confirm">Confirm new password</label>
<input id="confirm" name="confirm_password" type="password" class="field"
required minlength="12" autocomplete="new-password" />
</div>
{{if $page.Error}}<div class="text-bad text-[12.5px]">{{$page.Error}}</div>{{end}}
<button type="submit" class="btn btn-primary btn-block btn-lg">Update password</button>
</form>
</div>
{{end}}
-39
@@ -22,45 +22,6 @@
</div>
{{end}}
{{if $page.OutstandingTokens}}
<div class="mt-7 panel rounded-[7px] px-5 py-4">
<div class="flex items-center justify-between mb-3">
<h3 class="text-[12px] font-semibold uppercase tracking-[0.08em] text-ink-mute">Outstanding install tokens</h3>
<span class="text-[11.5px] text-ink-fade">closed the install snippet tab? regenerate to get a fresh URL</span>
</div>
<table class="w-full text-[12.5px]">
<thead class="text-[11px] uppercase tracking-[0.08em] text-ink-fade">
<tr>
<th class="text-left font-medium pb-2 pr-4">id</th>
<th class="text-left font-medium pb-2 pr-4">repo</th>
<th class="text-left font-medium pb-2 pr-4">created</th>
<th class="text-left font-medium pb-2 pr-4">expires</th>
<th class="pb-2"></th>
</tr>
</thead>
<tbody>
{{range $page.OutstandingTokens}}
<tr class="border-t border-line-soft">
<td class="py-2.5 pr-4 mono text-ink-mute">{{.ShortHash}}…</td>
<td class="py-2.5 pr-4 mono text-ink-mid">{{if .RepoURL}}{{.RepoURL}}{{else}}<span class="text-ink-fade"></span>{{end}}</td>
<td class="py-2.5 pr-4 text-ink-mute">{{.CreatedAt | relTime}}</td>
<td class="py-2.5 pr-4 text-ink-mute">{{.ExpiresAt | relTime}}</td>
<td class="py-2.5 text-right whitespace-nowrap">
<form method="post" action="/hosts/enrollment-tokens/{{.TokenHash}}/regenerate" class="inline">
<button type="submit" class="btn btn-sm">Regenerate</button>
</form>
<form method="post" action="/hosts/enrollment-tokens/{{.TokenHash}}/revoke" class="inline ml-1"
onsubmit="return confirm('Revoke this enrolment token? Any pending install using it will fail.');">
<button type="submit" class="btn btn-sm btn-danger">Revoke</button>
</form>
</td>
</tr>
{{end}}
</tbody>
</table>
</div>
{{end}}
<form method="post" action="/hosts/new" class="grid grid-cols-12 gap-8 mt-7">
<div class="col-span-7 panel rounded-[7px] px-8 py-7">
+9 -53
@@ -56,17 +56,14 @@
{{end}}
</div>
{{/* severity dropdown — option text tinted to match the colour
already used in the row (dot, left border, kind chip). The
severity word is otherwise invisible to operators because the
table column shows kind only; the colour bridges the two. */}}
{{/* severity dropdown */}}
<div>
<select class="field" style="padding: 6px 10px; font-size: 11.5px; min-width: 130px;"
onchange="window.location='/alerts?status={{$filter.Status}}&severity='+this.value+'{{if $filter.HostID}}&host_id={{$filter.HostID}}{{end}}{{if $filter.Search}}&q={{$filter.Search}}{{end}}'">
<option value="" {{if eq $filter.Severity ""}}selected{{end}}>Severity · any</option>
<option value="info" style="color: oklch(0.78 0.005 250);" {{if eq $filter.Severity "info"}}selected{{end}}>info</option>
<option value="warning" style="color: oklch(0.82 0.13 80);" {{if eq $filter.Severity "warning"}}selected{{end}}>warning</option>
<option value="critical" style="color: oklch(0.70 0.20 25);" {{if eq $filter.Severity "critical"}}selected{{end}}>critical</option>
<option value="info" {{if eq $filter.Severity "info"}}selected{{end}}>info</option>
<option value="warning" {{if eq $filter.Severity "warning"}}selected{{end}}>warning</option>
<option value="critical" {{if eq $filter.Severity "critical"}}selected{{end}}>critical</option>
</select>
</div>
@@ -93,37 +90,18 @@
</form>
</div>
{{/* alerts table — polled every 5s when the tab is visible AND the
live toggle is on. The localStorage check is part of the htmx
trigger predicate, so flipping the toggle just sets the flag and
the next tick (or the absence of one) honours it. No need to
re-process the element when the toggle changes.
The polling lives on this div (not the page root) so the filter
strip and header don't flash on each tick. */}}
<div id="alerts-table" class="panel mt-3.5 rounded-[7px] overflow-hidden"
hx-get="{{$page.RefreshURL}}"
hx-trigger="every 5s [document.visibilityState==='visible' && localStorage.getItem('rm-alerts-live')!=='off']"
hx-select="#alerts-table"
hx-swap="outerHTML">
{{/* alerts table */}}
<div class="panel mt-3.5 rounded-[7px] overflow-hidden">
{{/* header row */}}
<div class="alert-row head">
<div></div>
<div>Kind</div>
<div>Severity / kind</div>
<div>Host</div>
<div>Message</div>
<div>Raised</div>
<div>Last seen</div>
<div style="display: flex; align-items: center; gap: 6px; justify-content: flex-end;">
<label style="display: inline-flex; align-items: center; gap: 5px; cursor: pointer; font-size: 10px;"
class="text-ink-fade" title="auto-refresh every 5s">
<input type="checkbox" id="alerts-live-toggle" checked
onchange="localStorage.setItem('rm-alerts-live', this.checked ? 'on' : 'off'); document.getElementById('alerts-live-dot').style.opacity = this.checked ? '1' : '0.3';"
style="width: 11px; height: 11px; cursor: pointer; margin: 0;" />
<span>live</span>
<span id="alerts-live-dot" class="text-accent"></span>
</label>
</div>
<div></div>
</div>
{{if eq (len $page.Alerts) 0}}
@@ -141,33 +119,11 @@
</div>
{{else}}
{{range $page.Alerts}}
{{template "alert_row" (dict "Alert" . "HostNames" $page.HostNames "Usernames" $page.Usernames "Filter" $page.Filter)}}
{{template "alert_row" (dict "Alert" . "HostNames" $page.HostNames "Filter" $page.Filter)}}
{{end}}
{{end}}
</div>
</div>
<script>
// Restore the live-refresh toggle from localStorage so the operator's
// last choice survives full-page navigations. Re-runs after every htmx
// swap so the freshly-rendered checkbox + dot stay in sync.
(function syncLiveToggle() {
var on = localStorage.getItem('rm-alerts-live') !== 'off';
var cb = document.getElementById('alerts-live-toggle');
var dot = document.getElementById('alerts-live-dot');
if (cb) cb.checked = on;
if (dot) dot.style.opacity = on ? '1' : '0.3';
})();
document.body.addEventListener('htmx:afterSwap', function(e) {
if (e.detail.target && e.detail.target.id === 'alerts-table') {
var on = localStorage.getItem('rm-alerts-live') !== 'off';
var cb = document.getElementById('alerts-live-toggle');
var dot = document.getElementById('alerts-live-dot');
if (cb) cb.checked = on;
if (dot) dot.style.opacity = on ? '1' : '0.3';
}
});
</script>
{{end}}
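Note on the filter dropdowns in the alerts template above: the severity `onchange` handler assembles its target URL by string concatenation, so a search term containing `&` or a space would not be percent-encoded. A minimal sketch of the same idea using `URLSearchParams` follows. The helper name `buildFilterURL` and the shape of the `filter` object are assumptions for illustration, not part of the diff. Unlike the inline handler, it drops empty parameters instead of sending `severity=` with no value.

```javascript
// Hypothetical helper (not in the diff): build a filter URL from a plain
// object, skipping empty values and percent-encoding the rest.
function buildFilterURL(path, filter) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(filter)) {
    // Leave out filters that are not set so the URL stays short.
    if (value !== undefined && value !== null && value !== '') {
      params.set(key, String(value));
    }
  }
  const query = params.toString();
  return query ? path + '?' + query : path;
}
```

In the browser this could be wired up as `window.location = buildFilterURL('/alerts', {status, severity: this.value, host_id, q})`, with the current filter values rendered by the template.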
-269
@@ -1,269 +0,0 @@
{{define "title"}}Audit · restic-manager{{end}}
{{define "content"}}
{{$page := .Page}}
{{$filter := $page.Filter}}
{{$rng := $page.Range}}
<div class="max-w-[1280px] mx-auto px-8 pb-14">
{{/* crumbs */}}
<div class="crumbs pt-6">
<a href="/">Dashboard</a><span class="sep">/</span>
<span class="text-ink-mid">audit</span>
</div>
{{/* page header */}}
<div class="flex items-baseline justify-between mt-3.5">
<div>
<h1 class="text-[22px] font-medium tracking-[-0.005em]">
Audit log
<span class="text-ink-fade font-normal text-[14px] ml-2">
{{len $page.Entries}} entries · last {{if eq $rng "all"}}all-time{{else}}{{$rng}}{{end}}
</span>
</h1>
</div>
<div class="flex gap-2">
{{/* Export carries the current filter querystring so the
download is exactly what the operator sees on screen
(up to a higher row cap of 5000 vs 500 in the table). */}}
<a href="{{$page.CSVHref}}"
class="btn"
title="Download the current filter as CSV (up to 5000 rows, UTF-8, RFC 4180)">
Export CSV ↓
</a>
</div>
</div>
<div class="text-ink-mute mt-2 leading-[1.55]" style="font-size: 11.5px; max-width: 760px;">
Append-only history of every operator action, agent message, and system-driven change.
Read-only — entries cannot be edited or deleted.
</div>
{{/* filter strip */}}
<div class="panel mt-4 px-4 py-3 rounded-[7px]"
style="display: grid; grid-template-columns: auto auto auto auto 1fr; gap: 14px; align-items: center;">
{{/* time-range pills */}}
<div class="inline-flex gap-1 p-[3px]" style="border: 1px solid var(--line-soft); border-radius: 5px;">
{{range list "24h" "7d" "30d" "all"}}
{{$r := .}}
{{$active := eq $r $rng}}
<a href="/audit?range={{$r}}{{if $filter.UserID}}&user_id={{$filter.UserID}}{{end}}{{if $filter.Actor}}&actor={{$filter.Actor}}{{end}}{{if $filter.ActionLike}}&action={{$filter.ActionLike}}{{end}}{{if $filter.TargetKind}}&target_kind={{$filter.TargetKind}}{{end}}"
class="btn btn-ghost"
style="padding: 5px 10px; font-size: 11.5px;{{if $active}} background: var(--panel-hi); color: var(--ink);{{end}}">
{{if eq $r "all"}}All{{else}}{{$r}}{{end}}
</a>
{{end}}
</div>
{{/* user dropdown */}}
<div>
<select class="field" style="padding: 6px 10px; font-size: 11.5px; min-width: 140px;"
onchange="window.location='/audit?range={{$rng}}&user_id='+this.value+'{{if $filter.Actor}}&actor={{$filter.Actor}}{{end}}{{if $filter.ActionLike}}&action={{$filter.ActionLike}}{{end}}{{if $filter.TargetKind}}&target_kind={{$filter.TargetKind}}{{end}}'">
<option value="" {{if eq $filter.UserID ""}}selected{{end}}>User · any</option>
{{range $id, $name := $page.UserNames}}
<option value="{{$id}}" {{if eq $filter.UserID $id}}selected{{end}}>{{$name}}</option>
{{end}}
</select>
</div>
{{/* actor dropdown — user/agent/system */}}
<div>
<select class="field" style="padding: 6px 10px; font-size: 11.5px; min-width: 130px;"
onchange="window.location='/audit?range={{$rng}}{{if $filter.UserID}}&user_id={{$filter.UserID}}{{end}}&actor='+this.value+'{{if $filter.ActionLike}}&action={{$filter.ActionLike}}{{end}}{{if $filter.TargetKind}}&target_kind={{$filter.TargetKind}}{{end}}'">
<option value="" {{if eq $filter.Actor ""}}selected{{end}}>Actor · any</option>
<option value="user" {{if eq $filter.Actor "user"}}selected{{end}}>user</option>
<option value="agent" {{if eq $filter.Actor "agent"}}selected{{end}}>agent</option>
<option value="system" {{if eq $filter.Actor "system"}}selected{{end}}>system</option>
</select>
</div>
{{/* target kind dropdown */}}
<div>
<select class="field" style="padding: 6px 10px; font-size: 11.5px; min-width: 160px;"
onchange="window.location='/audit?range={{$rng}}{{if $filter.UserID}}&user_id={{$filter.UserID}}{{end}}{{if $filter.Actor}}&actor={{$filter.Actor}}{{end}}{{if $filter.ActionLike}}&action={{$filter.ActionLike}}{{end}}&target_kind='+this.value">
<option value="" {{if eq $filter.TargetKind ""}}selected{{end}}>Target · any</option>
<option value="host" {{if eq $filter.TargetKind "host"}}selected{{end}}>host</option>
<option value="schedule" {{if eq $filter.TargetKind "schedule"}}selected{{end}}>schedule</option>
<option value="source_group" {{if eq $filter.TargetKind "source_group"}}selected{{end}}>source_group</option>
<option value="alert" {{if eq $filter.TargetKind "alert"}}selected{{end}}>alert</option>
<option value="notification_channel" {{if eq $filter.TargetKind "notification_channel"}}selected{{end}}>notification_channel</option>
<option value="job" {{if eq $filter.TargetKind "job"}}selected{{end}}>job</option>
<option value="user" {{if eq $filter.TargetKind "user"}}selected{{end}}>user</option>
</select>
</div>
{{/* action substring search */}}
<form method="get" action="/audit">
<input type="hidden" name="range" value="{{$rng}}">
{{if $filter.UserID}}<input type="hidden" name="user_id" value="{{$filter.UserID}}">{{end}}
{{if $filter.Actor}}<input type="hidden" name="actor" value="{{$filter.Actor}}">{{end}}
{{if $filter.TargetKind}}<input type="hidden" name="target_kind" value="{{$filter.TargetKind}}">{{end}}
<input type="text" name="action" value="{{$filter.ActionLike}}"
placeholder="action contains… (e.g. alert., host.)"
class="field mono"
style="padding: 6px 10px; font-size: 11.5px;">
</form>
</div>
{{/* table */}}
<div class="panel mt-3.5 rounded-[7px] overflow-hidden">
{{/* Header — every column except the payload one is a clickable
sort link. Hrefs are pre-built server-side ($page.SortHrefs)
so the URL escaping rules don't trip on the '=' chars when
html/template encodes <a href> attributes. */}}
<div class="audit-row head">
<div>
<a href="{{index $page.SortHrefs "ts"}}"
class="sort-header">When <span class="sort-glyph">{{sortGlyph "ts" $page.Sort $page.Dir}}</span></a>
</div>
<div>
<a href="{{index $page.SortHrefs "actor"}}"
class="sort-header">Actor <span class="sort-glyph">{{sortGlyph "actor" $page.Sort $page.Dir}}</span></a>
</div>
<div>
<a href="{{index $page.SortHrefs "user_id"}}"
class="sort-header">User <span class="sort-glyph">{{sortGlyph "user_id" $page.Sort $page.Dir}}</span></a>
</div>
<div>
<a href="{{index $page.SortHrefs "action"}}"
class="sort-header">Action <span class="sort-glyph">{{sortGlyph "action" $page.Sort $page.Dir}}</span></a>
</div>
<div>
<a href="{{index $page.SortHrefs "target_kind"}}"
class="sort-header">Target <span class="sort-glyph">{{sortGlyph "target_kind" $page.Sort $page.Dir}}</span></a>
</div>
<div></div>
</div>
{{if eq (len $page.Entries) 0}}
<div style="padding: 40px; text-align: center;">
<div class="text-ink text-[14px] font-medium">No matching entries.</div>
<div class="text-ink-mute text-[12px] mt-1">
{{if eq $rng "24h"}}Try widening the time range.{{else}}Adjust filters or pick a longer range.{{end}}
</div>
</div>
{{else}}
{{range $page.Entries}}
{{$e := .}}
<div class="audit-row">
<div class="mono text-[12px] text-ink-mute" title="UTC">
{{absTime $e.TS}}
</div>
<div>
{{if eq $e.Actor "user"}}<span class="tag" style="background: color-mix(in oklch, var(--accent), transparent 92%); border-color: color-mix(in oklch, var(--accent), transparent 60%); color: var(--accent);">user</span>
{{else if eq $e.Actor "agent"}}<span class="tag" style="background: color-mix(in oklch, var(--ok), transparent 92%); border-color: color-mix(in oklch, var(--ok), transparent 60%); color: var(--ok);">agent</span>
{{else}}<span class="tag" style="background: color-mix(in oklch, var(--ink-fade), transparent 92%); color: var(--ink-mute);">system</span>{{end}}
</div>
<div class="mono text-[12px] text-ink-mid">
{{if $e.UserID}}{{$un := index $page.UserNames (deref $e.UserID)}}{{if $un}}{{$un}}{{else}}<span class="text-ink-fade">{{deref $e.UserID}}</span>{{end}}{{else}}<span class="text-ink-fade"></span>{{end}}
</div>
<div class="mono text-[12px] text-ink">{{$e.Action}}</div>
<div class="mono text-[12px] text-ink-mute">
{{if $e.TargetKind}}
<span class="text-ink-fade">{{deref $e.TargetKind}}</span>
{{if $e.TargetID}}
{{$tid := deref $e.TargetID}}
{{if eq (deref $e.TargetKind) "host"}}{{$hn := index $page.HostNames $tid}}{{if $hn}} · {{$hn}}{{else}} · {{$tid}}{{end}}
{{else}} · {{$tid}}{{end}}
{{end}}
{{else}}
<span class="text-ink-fade"></span>
{{end}}
</div>
<div class="text-right">
{{if and $e.Payload (gt (len $e.Payload) 2)}}
{{/* Payload is base64-encoded onto a data- attribute to
bypass html/template's contextual JS-string escaping
(which would double-escape arbitrary JSON inside a
<script type="application/json"> block). Decoded by
atob() in the modal opener. */}}
<button type="button" class="btn"
style="font-size: 11px; padding: 3px 8px;"
data-payload-action="{{$e.Action}}"
data-payload-id="{{$e.ID}}"
data-payload-b64="{{b64 $e.Payload}}"
onclick="window.__rmAuditOpenPayload(this)">payload ↗</button>
{{end}}
</div>
</div>
{{end}}
{{end}}
</div>
{{/* Payload modal — single instance shared by every row. Centred
overlay with a max-height; the inner <pre> scrolls when the
payload is long. Closes on backdrop click, Escape key, or the
× button. Plain JSON is pretty-printed; non-JSON falls back to
the raw string. */}}
<div id="audit-payload-modal" class="fixed inset-0 z-50 hidden"
style="background: rgba(0,0,0,0.55); align-items: center; justify-content: center;"
onclick="if (event.target === this) window.__rmAuditClosePayload()">
<div class="panel rounded-[7px]"
style="width: min(720px, 90vw); max-height: 80vh; display: flex; flex-direction: column;"
onclick="event.stopPropagation()">
<div class="flex items-center justify-between"
style="padding: 14px 18px; border-bottom: 1px solid var(--line-soft);">
<div>
<div class="text-[13px] font-medium text-ink" id="audit-payload-title">payload</div>
<div class="text-[11px] text-ink-fade mono mt-0.5" id="audit-payload-subtitle"></div>
</div>
<div class="flex gap-2">
<button type="button" class="btn"
style="font-size: 11.5px;"
onclick="window.__rmAuditCopyPayload()">Copy</button>
<button type="button" class="btn"
style="font-size: 11.5px;"
onclick="window.__rmAuditClosePayload()">×</button>
</div>
</div>
<pre id="audit-payload-body" class="mono text-[12px] text-ink-mid"
style="margin: 0; padding: 16px 18px; overflow: auto; white-space: pre-wrap; word-break: break-all; flex: 1; background: var(--bg);"></pre>
</div>
</div>
</div>
<script>
(function() {
var modal = document.getElementById('audit-payload-modal');
var bodyEl = document.getElementById('audit-payload-body');
var titleEl = document.getElementById('audit-payload-title');
var subEl = document.getElementById('audit-payload-subtitle');
var current = '';
window.__rmAuditOpenPayload = function(btn) {
var id = btn.getAttribute('data-payload-id');
var action = btn.getAttribute('data-payload-action');
var b64 = btn.getAttribute('data-payload-b64') || '';
var raw = '';
try { raw = atob(b64); } catch (e) { raw = ''; }
try {
current = JSON.stringify(JSON.parse(raw), null, 2);
} catch (e) {
current = raw;
}
bodyEl.textContent = current;
titleEl.textContent = action;
subEl.textContent = id;
modal.style.display = 'flex';
modal.classList.remove('hidden');
};
window.__rmAuditClosePayload = function() {
modal.classList.add('hidden');
modal.style.display = 'none';
};
window.__rmAuditCopyPayload = function() {
if (!current) return;
navigator.clipboard.writeText(current).catch(function() {});
};
document.addEventListener('keydown', function(e) {
if (e.key === 'Escape' && !modal.classList.contains('hidden')) {
window.__rmAuditClosePayload();
}
});
})();
</script>
{{end}}
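Note on the payload modal in the audit template above: the opener decodes the `data-payload-b64` attribute with `atob()`, pretty-prints it when it parses as JSON, and otherwise shows the raw string. `atob()` returns one character per byte, so a payload containing non-ASCII UTF-8 text would display garbled. The sketch below keeps the same fallback behaviour and adds a `TextDecoder` step. The function name `decodePayload` is an assumption for illustration, and the sketch assumes the server base64-encodes the UTF-8 bytes of the payload.

```javascript
// Hypothetical helper (not in the diff): decode a base64 payload as UTF-8,
// pretty-print it if it is JSON, otherwise return the decoded text as-is.
function decodePayload(b64) {
  let raw = '';
  try {
    // atob yields a "binary string": one char code (0-255) per byte.
    const binary = atob(b64 || '');
    const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
    raw = new TextDecoder('utf-8').decode(bytes);
  } catch (err) {
    // Invalid base64: mirror the original opener and show nothing.
    return '';
  }
  try {
    return JSON.stringify(JSON.parse(raw), null, 2);
  } catch (err) {
    // Not JSON: fall back to the raw decoded string.
    return raw;
  }
}
```

Inside `window.__rmAuditOpenPayload` this would replace the two `try` blocks with `current = decodePayload(b64);`.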
-64
@@ -1,64 +0,0 @@
{{define "title"}}Welcome · restic-manager{{end}}
{{define "content"}}
{{$page := .Page}}
<div class="flex-1 flex flex-col items-center justify-center px-8 py-12">
<div class="w-[420px]">
<div class="flex justify-center mb-10">
<div class="mono text-base text-ink font-medium tracking-[0.01em]">restic-manager</div>
</div>
<h1 class="text-[22px] font-medium tracking-[-0.005em] text-center">
Create the first administrator
</h1>
<p class="text-pretty text-[13px] text-ink-mute mt-3 leading-[1.6] text-center">
This server has no users yet. The account you create here is the
initial administrator. This page is only available until that
account exists.
</p>
{{if $page.Error}}
<div class="mt-5 px-3 py-2.5 rounded-[5px] text-xs"
style="background: color-mix(in oklch, var(--bad), transparent 88%); border: 1px solid color-mix(in oklch, var(--bad), transparent 70%); color: oklch(0.85 0.10 25);">
{{$page.Error}}
</div>
{{end}}
<form method="post" action="/bootstrap" class="mt-7 space-y-4">
<div>
<label class="field-label" for="bs-username">Username</label>
<input id="bs-username" name="username" type="text"
class="field mono" autocomplete="username" autofocus required
value="{{$page.Username}}" />
</div>
<div>
<label class="field-label" for="bs-pw">Password</label>
<input id="bs-pw" name="password" type="password" class="field"
required minlength="12" autocomplete="new-password" />
</div>
<div>
<label class="field-label" for="bs-pw2">Confirm password</label>
<input id="bs-pw2" name="password_confirm" type="password" class="field"
required minlength="12" autocomplete="new-password" />
</div>
<button type="submit" class="btn btn-primary btn-block btn-lg">
Create administrator
</button>
</form>
<div class="mt-6 pt-5 border-t border-line-soft text-center">
<p class="text-pretty text-xs text-ink-mute leading-[1.65]">
Lost the browser session mid-flow? The bootstrap token is also
printed in the server logs and can be POSTed to
<span class="mono text-ink-mid">/api/bootstrap</span>.
</p>
</div>
</div>
<div class="mt-20 flex gap-3.5 items-center text-[11px] text-ink-fade">
<span class="mono">restic-manager {{.Version}}</span>
</div>
</div>
{{end}}
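Note on the bootstrap form above: the markup enforces `required` and `minlength="12"` in the browser, but nothing in the template checks that the two password fields match. A minimal sketch of the equivalent checks as a plain function follows. The function name and the error strings are assumptions for illustration; the diff does not show how the `/bootstrap` handler validates its input.

```javascript
// Hypothetical validator (not in the diff): returns an error message, or
// null when the submitted fields satisfy the constraints in the markup.
function validateBootstrapForm(form) {
  const username = (form.username || '').trim();
  if (username === '') {
    return 'Username is required.';
  }
  const password = form.password || '';
  // Same threshold as the minlength="12" attribute on both inputs.
  if (password.length < 12) {
    return 'Password must be at least 12 characters.';
  }
  if (password !== (form.password_confirm || '')) {
    return 'Passwords do not match.';
  }
  return null;
}
```

The field names `username`, `password`, and `password_confirm` match the `name` attributes in the form.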
+7 -73
@@ -121,89 +121,23 @@
{{end}}
{{/* ---------- hosts table ---------- */}}
{{$f := $page.Filter}}
{{$sortURL := $page.SortURL}}
<div class="pt-6 pb-4">
<div class="flex items-center justify-between mb-3">
<div class="flex items-center gap-3">
<h2 class="text-[13px] font-semibold tracking-[0.01em]">Hosts</h2>
<div class="text-xs text-ink-fade">{{$page.ShownCount}} of {{$page.HostCount}}</div>
<div class="text-xs text-ink-fade">{{$page.HostCount}} of {{$page.HostCount}}</div>
</div>
<label style="display: inline-flex; align-items: center; gap: 5px; cursor: pointer; font-size: 10px;"
class="text-ink-fade" title="auto-refresh every 5s">
<input type="checkbox" id="dashboard-live-toggle" checked
onchange="localStorage.setItem('rm-dashboard-live', this.checked ? 'on' : 'off'); document.getElementById('dashboard-live-dot').style.opacity = this.checked ? '1' : '0.3';"
style="width: 11px; height: 11px; cursor: pointer; margin: 0;" />
<span>live</span>
<span id="dashboard-live-dot" class="text-accent"></span>
</label>
</div>
{{/* Filter row (NS-04): GET /, every input is a hidden field
for the filters not currently being edited so submit
merges rather than clobbers state. */}}
<form method="get" action="/" class="flex items-center gap-2 mb-3 text-[11.5px] flex-wrap">
<input type="text" name="q" value="{{$f.Search}}" placeholder="search hostname…"
class="field mono"
style="padding: 6px 10px; font-size: 11.5px; width: 220px;">
<select name="status" class="field"
style="padding: 5px 8px; font-size: 11.5px; width: auto;"
onchange="this.form.submit()">
<option value="" {{if eq $f.Status ""}}selected{{end}}>any status</option>
<option value="online" {{if eq $f.Status "online"}}selected{{end}}>online</option>
<option value="offline" {{if eq $f.Status "offline"}}selected{{end}}>offline</option>
<option value="never_seen" {{if eq $f.Status "never_seen"}}selected{{end}}>never seen</option>
</select>
<select name="repo_status" class="field"
style="padding: 5px 8px; font-size: 11.5px; width: auto;"
onchange="this.form.submit()">
<option value="" {{if eq $f.RepoStatus ""}}selected{{end}}>any repo state</option>
<option value="ready" {{if eq $f.RepoStatus "ready"}}selected{{end}}>ready</option>
<option value="init_failed" {{if eq $f.RepoStatus "init_failed"}}selected{{end}}>init failed</option>
<option value="unknown" {{if eq $f.RepoStatus "unknown"}}selected{{end}}>unknown</option>
</select>
{{if $f.Tag}}<input type="hidden" name="tag" value="{{$f.Tag}}">{{end}}
{{if ne $f.Sort "name"}}<input type="hidden" name="sort" value="{{$f.Sort}}">{{end}}
{{if eq $f.Dir "desc"}}<input type="hidden" name="dir" value="desc">{{end}}
<button type="submit" class="btn btn-sm">Apply</button>
{{if or $f.Search $f.Status $f.RepoStatus}}
<a href="/{{if $f.Tag}}?tag={{$f.Tag}}{{end}}" class="text-ink-fade text-[11.5px] mono ml-1">clear</a>
{{end}}
</form>
{{/* Tag chip-row — only renders when at least one tag exists in
the fleet. Active tag is highlighted; clicking the active
tag clears the filter. The "All" pill is shown in the active
state when no tag filter is set. */}}
{{if $page.KnownTags}}
<div class="flex items-center gap-1.5 flex-wrap mb-3 text-[11.5px]">
<span class="text-ink-fade mr-1">tag</span>
<a href="/" class="tag {{if eq $page.ActiveTag ""}}tag-active{{end}}">All</a>
{{range $page.KnownTags}}
{{$t := .}}
<a href="/?tag={{$t}}" class="tag {{if eq $page.ActiveTag $t}}tag-active{{end}}">{{$t}}</a>
{{end}}
</div>
{{end}}
{{/* Live-poll wrapper (NS-04, mirrors the alerts pattern). hx-get
refetches with the current filter pinned; hx-select grabs only
this same div from the response so the surrounding chrome
doesn't flash. The toggle persists in localStorage so a
refreshed tab honours the operator's previous choice. */}}
<div id="hosts-table" class="panel rounded-[7px] overflow-hidden"
hx-get="{{$page.RefreshURL}}"
hx-trigger="every 5s [document.visibilityState==='visible' && localStorage.getItem('rm-dashboard-live')!=='off']"
hx-select="#hosts-table"
hx-swap="outerHTML">
<div class="panel rounded-[7px] overflow-hidden">
<div class="host-row head hairline">
<div></div>
<div><a href="{{index $sortURL "name"}}" class="text-ink-mid hover:text-ink">Host{{if eq $f.Sort "name"}} {{if eq $f.Dir "desc"}}↓{{else}}↑{{end}}{{end}}</a></div>
<div><a href="{{index $sortURL "os"}}" class="text-ink-mid hover:text-ink">OS · arch{{if eq $f.Sort "os"}} {{if eq $f.Dir "desc"}}↓{{else}}↑{{end}}{{end}}</a></div>
<div><a href="{{index $sortURL "last_backup"}}" class="text-ink-mid hover:text-ink">Last backup{{if eq $f.Sort "last_backup"}} {{if eq $f.Dir "desc"}}↓{{else}}↑{{end}}{{end}}</a></div>
<div class="text-right"><a href="{{index $sortURL "repo_size"}}" class="text-ink-mid hover:text-ink">Repo size{{if eq $f.Sort "repo_size"}} {{if eq $f.Dir "desc"}}↓{{else}}↑{{end}}{{end}}</a></div>
<div class="text-right"><a href="{{index $sortURL "snapshot_count"}}" class="text-ink-mid hover:text-ink">Snapshots{{if eq $f.Sort "snapshot_count"}} {{if eq $f.Dir "desc"}}↓{{else}}↑{{end}}{{end}}</a></div>
<div>Host</div>
<div>OS · arch</div>
<div>Last backup</div>
<div class="text-right">Repo size</div>
<div class="text-right">Snapshots</div>
<div>Alerts</div>
<div>Tags</div>
<div></div>
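Note on the live-poll wrapper in the hosts hunk above: the bracketed expression in `hx-trigger="every 5s [...]"` is an htmx event filter, evaluated on each tick, and the request is only sent when it is truthy. The condition can be read as the small pure function below. The name `shouldPoll` is an assumption for illustration; the template inlines the expression directly.

```javascript
// Hypothetical restatement (not in the diff) of the hx-trigger filter:
// poll only while the tab is visible and the live toggle is not off.
// liveFlag is the stored string, or null when the key was never set,
// so polling defaults to on.
function shouldPoll(visibilityState, liveFlag) {
  return visibilityState === 'visible' && liveFlag !== 'off';
}

// In the browser the inputs would come from:
//   shouldPoll(document.visibilityState,
//              localStorage.getItem('rm-dashboard-live'))
```

Because the filter reads `localStorage` on every tick, flipping the checkbox only needs to write the flag, which matches the comment in the alerts template.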
-21
@@ -1,21 +0,0 @@
{{define "title"}}Forbidden · restic-manager{{end}}
{{define "content"}}
{{$page := .Page}}
<div class="max-w-[1280px] mx-auto px-8 pb-14">
<div class="crumbs pt-6">
<a href="/">Dashboard</a><span class="sep">/</span>
<span class="text-ink-mid">forbidden</span>
</div>
<div class="panel mt-8 rounded-[7px] p-8 max-w-[640px]"
style="border-color: color-mix(in oklch, var(--bad), transparent 60%);">
<div class="text-[14px] font-medium text-bad mb-2">403 — Insufficient role</div>
<p class="text-pretty text-[12.5px] text-ink-mute leading-[1.6]">
Your role (<span class="mono">{{$page.Have}}</span>) does not permit
this page (<span class="mono">{{$page.Required}}</span> required).
Ask your administrator if you need access.
</p>
<a href="/" class="btn btn-primary mt-5">Back to dashboard</a>
</div>
</div>
{{end}}
+3 -14
@@ -110,21 +110,10 @@
<div class="panel rounded-[7px] px-4 py-3.5">
<div class="text-[11px] text-bad uppercase tracking-[0.1em] font-semibold mb-2.5">Danger zone</div>
<p class="text-pretty text-[12px] text-ink-mute leading-[1.55] mb-3">
Removes the host record and everything attached to it
(schedules, source groups, jobs, snapshots metadata, alerts).
The agent's bearer is revoked, so a re-installed instance
comes back through the normal pending-host accept flow.
The repo data on the rest-server is left intact — you delete
that yourself.
Removes the host record. The repo data on the rest-server is left intact —
you delete that yourself.
</p>
<form method="post" action="/hosts/{{$host.ID}}/delete"
class="space-y-2"
onsubmit="return confirm('Remove host &quot;{{$host.Name}}&quot;? This cascades to every dependent row and cannot be undone.');">
<input type="text" name="confirm_hostname" required autocomplete="off"
placeholder="type hostname to confirm"
class="field mono text-[12px]" />
<button type="submit" class="btn btn-danger w-full justify-center">Remove host…</button>
</form>
<button class="btn btn-danger w-full justify-center" disabled title="lands later in Phase 1">Remove host…</button>
</div>
</aside>
+1 -26
@@ -8,31 +8,6 @@
<div class="col-span-8">
{{/* ---------- Repo status (NS-03) ---------- */}}
{{if eq $host.RepoStatus "init_failed"}}
<div class="rounded-[7px] px-4 py-3.5 mb-5"
style="border: 1px solid color-mix(in oklch, var(--bad), transparent 55%); background: color-mix(in oklch, var(--bad), transparent 90%);">
<div class="flex items-center justify-between gap-3 mb-1.5">
<div class="text-[12.5px] font-semibold text-bad uppercase tracking-[0.08em]">Repo unreachable</div>
<form method="post" action="/hosts/{{$host.ID}}/repo/probe">
<button type="submit" class="btn btn-sm"
{{if $page.Online}}{{else}}disabled title="host is offline"{{end}}>Retry probe</button>
</form>
</div>
<div class="text-[12.5px] text-ink-mid leading-[1.55]">
The last init / probe against this host's repo failed. Fix the
credentials below and save (the save kicks a fresh probe), or
click <span class="mono">Retry probe</span> if you've changed
something out-of-band.
</div>
{{if $host.RepoStatusError}}
<pre class="mono text-[11.5px] text-ink-mid mt-2.5 whitespace-pre-wrap leading-[1.5]">{{$host.RepoStatusError}}</pre>
{{end}}
</div>
{{else if eq $host.RepoStatus "ready"}}
<div class="text-[12px] text-ok mono mb-5">✓ repo reachable with current credentials</div>
{{end}}
{{/* ---------- Connection ---------- */}}
<h2 class="text-[11.5px] font-semibold uppercase tracking-[0.08em] text-ink-mute mb-3.5">Connection</h2>
<form method="post" action="/hosts/{{$host.ID}}/repo/credentials" class="panel rounded-[7px] p-5">
@@ -294,7 +269,7 @@
onsubmit="return confirm('Re-initialise the repo on host &quot;{{$host.Name}}&quot;? Existing snapshots are lost if the rest-server allows the wipe; restic refuses if it sees a config file already there.');">
<input type="text" name="confirm_hostname" required autocomplete="off"
placeholder="type hostname to confirm"
class="field mono"
class="input mono"
style="width: 240px; height: 30px; padding: 0 8px; font-size: 12px;">
<button type="submit" class="btn btn-danger btn-lg whitespace-nowrap"
{{if eq $host.Status "online"}}{{else}}disabled title="host is offline"{{end}}>Re-init repo…</button>

Some files were not shown because too many files have changed in this diff.