testing: bootstrap UI, agent reliability, NS-01..04 + alert username #18

Merged
steve merged 1 commit from ns-batch-host-ops into main 2026-05-05 22:09:18 +01:00

Round of post-onboarding-test polish. Five themes:

First-run bootstrap UI

  • New /bootstrap page (chrome-less, mirrors /login style) with a username + password form. Uses the in-memory bootstrap token directly — operator never sees or types it.
  • /login redirects to /bootstrap while a bootstrap token is held and there are 0 users.
  • Token printed to stderr stays as a break-glass for the JSON /api/bootstrap path.
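
The redirect rule above can be sketched as a small handler. This is a minimal illustration, not the project's code: `bootstrapState`, `available`, and `loginHandler` are hypothetical names, and the 303 status is an assumption (the description does not say which redirect code is used).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bootstrapState is a hypothetical snapshot of the two facts the redirect
// depends on: whether an in-memory token is held and how many users exist.
type bootstrapState struct {
	tokenPresent bool
	userCount    int
}

// available reports whether first-run bootstrap is still possible.
func (s bootstrapState) available() bool {
	return s.tokenPresent && s.userCount == 0
}

// loginHandler sends the browser to /bootstrap while bootstrap is available;
// otherwise it would render the normal login form.
func loginHandler(current func() bootstrapState) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if current().available() {
			// Assumption: 303 See Other; the real status code is not stated.
			http.Redirect(w, r, "/bootstrap", http.StatusSeeOther)
			return
		}
		fmt.Fprintln(w, "login form would render here")
	}
}

func main() {
	fresh := func() bootstrapState {
		return bootstrapState{tokenPresent: true, userCount: 0}
	}
	rec := httptest.NewRecorder()
	loginHandler(fresh)(rec, httptest.NewRequest(http.MethodGet, "/login", nil))
	fmt.Println(rec.Code, rec.Header().Get("Location"))
}
```

Once the first user exists, `available` returns false and `/login` behaves normally again, even if the token is still held in memory.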

Agent reliability

  • failJob helper emits a synthetic job.started + job.finished{failed} for every early-return path in runJob (missing restic, missing creds, malformed payload). Previously the server left the job in running and command.cancel later hit "unknown job".
  • restic.SupportsRestoreNoOwnership probes restic restore --help at agent startup and gates --no-ownership instead of version-sniffing — 0.18.x removed the flag, so the SemVer gate misfired.
  • Systemd unit reshaped to ProtectSystem=full + targeted ReadWritePaths=/etc/restic-manager, no ProtectHome — restore can now land anywhere on a real filesystem.
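
A rough sketch of the failJob idea described above, assuming a simplified envelope shape. `Envelope`, its fields, and the `send` callback are hypothetical stand-ins; only the event names `job.started` and `job.finished` and the `failed` status come from the description.

```go
package main

import "fmt"

// Envelope is a hypothetical, simplified stand-in for the agent's wire message.
type Envelope struct {
	Type   string // "job.started" or "job.finished"
	JobID  string
	Status string // only set on job.finished
	Error  string // human-readable reason, only set on failure
}

// failJob builds the synthetic started/finished pair for a job that could
// not actually begin, so the server never leaves it in the running state.
func failJob(jobID, reason string) []Envelope {
	return []Envelope{
		{Type: "job.started", JobID: jobID},
		{Type: "job.finished", JobID: jobID, Status: "failed", Error: reason},
	}
}

// runJob shows one early-return path (missing restic binary) routed
// through failJob instead of returning silently.
func runJob(jobID, resticPath string, send func(Envelope)) {
	if resticPath == "" {
		for _, e := range failJob(jobID, "restic binary not found") {
			send(e)
		}
		return
	}
	send(Envelope{Type: "job.started", JobID: jobID})
	// The real backup or restore work would run here.
}

func main() {
	runJob("job-1", "", func(e Envelope) { fmt.Printf("%+v\n", e) })
}
```

Emitting both envelopes means the server sees a normal lifecycle ending in failure, so a later cancel has nothing left to target.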

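The capability probe from the Agent reliability list could look roughly like this. It is a sketch under assumptions: `helpMentionsFlag` and `supportsRestoreNoOwnership` are illustrative names (the real helper is `restic.SupportsRestoreNoOwnership`, whose implementation is not shown in this PR), and treating a failed probe as "unsupported" is a design guess.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// helpMentionsFlag reports whether the help text lists flag as its own
// token, so a longer flag sharing the same prefix does not count.
func helpMentionsFlag(helpText, flag string) bool {
	for _, field := range strings.Fields(helpText) {
		token := strings.TrimRight(field, ",")
		if token == flag || strings.HasPrefix(token, flag+"=") {
			return true
		}
	}
	return false
}

// supportsRestoreNoOwnership runs `<resticBin> restore --help` and checks
// the output for --no-ownership instead of comparing version numbers.
func supportsRestoreNoOwnership(resticBin string) bool {
	out, err := exec.Command(resticBin, "restore", "--help").CombinedOutput()
	if err != nil {
		// Assumption: if the probe cannot run, do not pass the flag.
		return false
	}
	return helpMentionsFlag(string(out), "--no-ownership")
}

func main() {
	args := []string{"restore", "latest", "--target", "/tmp/restore-demo"}
	if supportsRestoreNoOwnership("restic") {
		args = append(args, "--no-ownership")
	}
	fmt.Println(args)
}
```

Probing the binary keeps the gate correct even when a flag appears or disappears between releases, which is the failure mode the SemVer check ran into.
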
Restore wizard polish

  • Default target is /root/rm-restore/<job-id>/ with shorter help text.
  • Re-init confirm input swapped to .field (was .input, which doesn't exist in the stylesheet — typed text was invisible against the dark theme).

NS-01..NS-04

  • NS-01 host delete: Store.DeleteHost (FK cascade revokes the agent bearer with everything else), admin-band POST /hosts/{id}/delete, danger-zone form on host detail with hostname-confirm, host.deleted audit, live WS close.
  • NS-02 enrollment-token recovery: Store.ListOutstandingEnrollmentTokens + DeleteEnrollmentToken; outstanding-tokens panel on the Add-host page (short hash, redacted repo URL, created/expires); operator-band Regenerate (revokes old hash, mints fresh raw token preserving repo creds + initial paths, 303s to /hosts/pending/{newToken}) and Revoke (delete + audit).
  • NS-03 repo init / probe surface: migration 0020 adds hosts.repo_status + repo_status_error; WS handler projects every init job's terminal state onto the host row (idempotent "already initialised" collapses to ready); creds-save resets status and dispatches a fresh probe; /hosts/{id}/repo/probe retry endpoint with status banner on the repo page.
  • NS-04 dashboard live + sort + filter: / parses q / status / repo_status / tag / sort / dir query params (round-trip durable); 5s htmx live poll mirroring the alerts pattern with a localStorage live toggle; sortable column headers; filter row + clear.
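
To illustrate what "round-trip durable" query params might mean for NS-04, here is a minimal sketch. The parameter names come from the list above; `dashboardQuery`, `parseDashboardQuery`, `encode`, the `asc` default for `dir`, and the sample values are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// dashboardQuery holds the six parameters named in NS-04.
type dashboardQuery struct {
	Q, Status, RepoStatus, Tag, Sort, Dir string
}

// parseDashboardQuery reads the parameters from a raw query string.
// Assumption: any dir other than "desc" is normalised to "asc".
func parseDashboardQuery(raw string) (dashboardQuery, error) {
	v, err := url.ParseQuery(raw)
	if err != nil {
		return dashboardQuery{}, err
	}
	dq := dashboardQuery{
		Q:          v.Get("q"),
		Status:     v.Get("status"),
		RepoStatus: v.Get("repo_status"),
		Tag:        v.Get("tag"),
		Sort:       v.Get("sort"),
		Dir:        v.Get("dir"),
	}
	if dq.Dir != "desc" {
		dq.Dir = "asc"
	}
	return dq, nil
}

// encode writes the non-empty fields back out, so links built from the
// parsed state (sort headers, the live poll URL) keep the active filters.
func (dq dashboardQuery) encode() string {
	v := url.Values{}
	set := func(key, val string) {
		if val != "" {
			v.Set(key, val)
		}
	}
	set("q", dq.Q)
	set("status", dq.Status)
	set("repo_status", dq.RepoStatus)
	set("tag", dq.Tag)
	set("sort", dq.Sort)
	set("dir", dq.Dir)
	return v.Encode()
}

func main() {
	// Sample values are illustrative only.
	dq, err := parseDashboardQuery("q=web&tag=prod&sort=hostname&dir=desc")
	if err != nil {
		fmt.Println("bad query:", err)
		return
	}
	fmt.Println(dq.encode())
}
```

Parsing the encoded string again yields the same struct, which is what lets a 5s poll re-request the page without losing the operator's filter and sort choices.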

Alerts page

  • ack'd by … line resolves the acknowledged_by user_id ULID to the actual username (falls back to raw id if the user has been deleted).
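
The username fallback above amounts to a lookup with a default. A minimal sketch, assuming the usernames are already loaded into a map keyed by user id; `ackLabel` and the sample ids are hypothetical.

```go
package main

import "fmt"

// ackLabel returns the username for an acknowledged_by id, or the raw id
// when the user is no longer known (for example, after deletion).
func ackLabel(usernames map[string]string, userID string) string {
	if name, ok := usernames[userID]; ok && name != "" {
		return name
	}
	return userID
}

func main() {
	// Illustrative ids only; real values would be ULIDs from the store.
	users := map[string]string{"01HZX0000000000000000000AA": "steve"}
	fmt.Println(ackLabel(users, "01HZX0000000000000000000AA"))
	fmt.Println(ackLabel(users, "01HZX0000000000000000000BB"))
}
```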

Misc

  • compose.yaml added to .gitignore — host-specific dev/test bench file (canonical reference deployment lives in deploy/).

Test plan

  • go vet ./... clean
  • go test ./... clean
  • Live tested end-to-end on test bench: fresh container → /bootstrap → admin → onboard host → run-now without restic surfaces a failed job (not a hang) → restore to /home/<user>/... succeeds → host delete from danger zone → outstanding-token recovery → dashboard filter/sort/live refresh
steve added 1 commit 2026-05-05 22:03:46 +01:00
testing: bootstrap UI, agent reliability, NS-01..04 + alert username
CI / Test (rest) (pull_request) Successful in 29s
CI / Lint (pull_request) Successful in 32s
CI / Build (windows/amd64) (pull_request) Successful in 22s
CI / Test (store) (pull_request) Successful in 1m22s
CI / Test (server-http) (pull_request) Successful in 1m30s
CI / Build (linux/amd64) (pull_request) Successful in 22s
CI / Build (linux/arm64) (pull_request) Successful in 41s
3800b34a2b
Smooths the rough edges that came up exercising a live deployment.

First-run bootstrap UI: /bootstrap renders a username + password form
that uses the in-memory token directly (operator no longer copies it
out of the log); /login redirects there while bootstrap is available.

Agent reliability: failJob synthetic envelopes so command.run early
returns no longer hang the server-side job; runtime probe of restic
restore --help drives --no-ownership instead of version sniffing
(0.18.x had it removed). Server unit re-shaped: ProtectSystem=full
plus ReadWritePaths=/etc/restic-manager, no ProtectHome — restore
can now write anywhere a user might want.

Restore wizard: default target is /root/rm-restore/<job-id>/ with
clearer help text. Re-init confirm input uses .field (was .input,
which doesn't exist — text was invisible).

NS-01 host delete: store DeleteHost, admin-band /hosts/{id}/delete
with hostname-confirm danger zone, audit, FK cascade, live WS close.

NS-02 enrollment-token recovery: outstanding-tokens panel on
/hosts/new, regenerate (preserves attachments) and revoke handlers
+ audit, store-level ListOutstandingEnrollmentTokens and
DeleteEnrollmentToken.

NS-03 repo init / probe surface: migration 0020 adds
hosts.repo_status + repo_status_error; WS handler projects every
init job's outcome onto the host row (idempotent already-initialised
collapses to ready); creds-save resets status and dispatches a fresh
probe; /hosts/{id}/repo/probe retry endpoint with banner.

NS-04 dashboard live + sort + filter: query-string filter
(q/status/repo_status/tag/sort/dir), 5s htmx live poll mirroring the
alerts pattern with a localStorage live toggle, sortable column
headers, filter row + clear.

Alerts page: ack'd-by line resolves user_id ULID to username.

compose.yaml ignored: host-specific.
steve merged commit 505a2d7a79 into main 2026-05-05 22:09:18 +01:00
Reference: steve/restic-manager#18