P3 sweep fixes: snap-row CSS, tree expand, --no-ownership drop, target path
Bug fixes from the Playwright sweep against the live smoke server:
1. Snapshot-picker layout. The .snap-row class was used in the wireframe
but never landed in web/styles/input.css; rows rendered as vertical
blocks instead of a 6-column grid. Added the token (mirrors host-row
shape with restore-specific column widths).
2. Tree expansion. hx-target='closest .tree-row + .tree-children' isn't
a valid HTMX selector — modifiers don't chain. Replaced HTMX-driven
expansion with a small window.__rmTreeToggle helper that uses plain
fetch + .tree-pair wrapper structure for trivial sibling lookup.
Caches loaded state per node.
3. --no-ownership flag dropped. Restic 0.17 introduced --no-ownership;
0.16 rejects it ('unknown flag') before doing any work. Since the
agent runs as root in the systemd unit, restored files keep their
original uid/gid either way and the parent dir is root-owned, so
the 'cp without sudo' rationale doesn't hold. Drop the flag entirely.
4. Default target dir moved to /var/lib/restic-manager/restore. The
systemd unit pins ReadWritePaths to /etc/restic-manager +
/var/lib/restic-manager (with ProtectSystem=strict making the rest
of /var read-only); writes to /var/restic-restore failed with
'read-only file system'.
5. Confirm summary HTML escaping. defaultTarget JS literal evaluates
to a string with literal angle brackets; insertion into innerHTML
must escape them. Added an inline HTML-escape pass.
tasks.md ticked for the Restore sub-phase with a sweep summary
covering the live end-to-end test.
This commit is contained in:
@@ -246,18 +246,24 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
|
||||
> doesn't have a confirmed need yet, so it's moved to the **Future /
|
||||
> unscheduled** section at the end of this file.
|
||||
|
||||
### Phase 3 — Restore (in progress, brand `p3-restore`)
|
||||
### Phase 3 — Restore ✅
|
||||
|
||||
> Spec: `docs/superpowers/specs/2026-05-04-p3-restore-design.md`.
|
||||
> Wireframe: `_diag/p3-restore-wizard/wireframe.html`.
|
||||
> Sweep screenshots: `_diag/p3-restore-sweep/`.
|
||||
> Shipped on branch `p3-restore`.
|
||||
|
||||
- [ ] **P3-X1** (S) Cancel-job feature. New `command.cancel` WS envelope; agent tracks per-job ctx.CancelFunc and kills the running `restic` subprocess (SIGTERM, SIGKILL after 5s grace); server endpoint `POST /api/jobs/{id}/cancel` bridges UI → WS; the existing UI Cancel button on `/jobs/{id}` becomes real for any running kind. Foundational — restore depends on it.
|
||||
- [ ] **P3-X2** (S) Tree-list synchronous WS RPC. New `tree.list` request / `tree.list.result` reply on the existing correlation-ID infra; agent runs `restic ls --json <sid> <path>` per call; server-side mediator `ws.SendRPC` + per-wizard-session in-memory cache (~30-min TTL).
|
||||
- [ ] **P3-01** (L) Restore wizard backend: tree browse via `tree.list` RPC (P3-X2), path picker validation, target selection (new-dir vs in-place + typed-confirm), dispatch endpoint `POST /hosts/{id}/restore`, audit row `host.restore`.
|
||||
- [ ] **P3-02** (L) Restore wizard UI: single-page progressively-enabled four-step form at `/hosts/{id}/restore` (and pre-selected variant `/hosts/{id}/snapshots/{sid}/restore`); tree-browser HTMX partials. Top-level "Restore" button on host detail.
|
||||
- [ ] **P3-03** (M) Restore execution: `restic.RunRestore` (paths, --target, --no-ownership for new-dir; preserves ownership for in-place); agent dispatcher case `JobRestore`; restore-specific job page variant with files-restored / bytes-restored / throughput / ETA / current-file widget.
|
||||
- [ ] **P3-09** (S) `diff` between two snapshots in UI: `JobDiff` JobKind, `restic.RunDiff`, `POST /api/hosts/{id}/snapshots/diff` dispatcher, snapshot-picker UI on Snapshots tab to pick A+B; output streams as `log.stream` to the standard live job log page.
|
||||
- [ ] **P3-X3** (S) Recent-restores panel on host detail: small line below the existing init-status, surfacing latest `JobRestore` outcome (succeeded N hours ago / failed → live log link). Backed by `store.LatestJobByKind(host_id, JobRestore)`.
|
||||
- [x] **P3-X1** (S) Cancel-job feature. `command.cancel` WS envelope; agent tracks per-job ctx.CancelFunc and kills the running `restic` subprocess via context cancel (SIGTERM, SIGKILL after 5s grace via `cmd.Cancel` + `cmd.WaitDelay`); server endpoint `POST /api/jobs/{id}/cancel` bridges UI → WS; the existing UI Cancel button on `/jobs/{id}` is now real for any running kind. Sandbox-aware: `internal/restic/cancel_{unix,windows}.go` build-tags pick SIGTERM on POSIX vs `os.Kill` on Windows (which can't deliver SIGTERM). Tests: cancel mid-run via 'sleep 30' fake-restic returns JobCancelled with exit 130 in <200ms.
|
||||
- [x] **P3-X2** (S) Tree-list synchronous WS RPC. `MsgTreeList` ↔ `MsgTreeListResult` with `Envelope.ID` correlation; generic `Hub.SendRPC` helper (registry of buffered channels keyed by ULID, ctx-cancel + timeout aware). `internal/restic.ListTreeChildren` wraps `restic ls --json` and filters its recursive output to direct children. Server-side `treeCache` is per-wizard-session (keyed by session cookie + host + snapshot + path) with a 30-min TTL and lazy sweep.
|
||||
- [x] **P3-01** (L) Restore wizard backend (`internal/server/http/ui_restore.go`). GET handlers render the four-step wizard against the wireframe. HTMX/fetch tree partial endpoint hits `fetchTreeWithCache`. POST validates: snapshot_id, ≥1 absolute path, in-place ⇒ confirm_hostname == host name, agent online; on error re-renders with operator's input intact. Happy path mints job_id, target = `/var/lib/restic-manager/restore/<job-id>` (server-picked, agent's writable dir under the systemd sandbox's `ReadWritePaths`), creates job row, ships `command.run` with `RestorePayload`, writes `host.restore` audit row, returns HX-Redirect (or 303) to the live job page.
|
||||
- [x] **P3-02** (L) Wizard UI templates (`web/templates/pages/host_restore.html` + `partials/tree_node.html`). Single-page progressively-enabled four-step form. Form-state-driven JS computes a running tally + step-4 confirm summary client-side. Tree expansion uses plain fetch (not HTMX) for simpler target lookup; loaded-state cached per node. Top-level Restore button on host detail right rail + per-snapshot Restore action on snapshot rows. New `.snap-row` token in `web/styles/input.css`.
|
||||
- [x] **P3-03** (M) Restore execution. `restic.RunRestore` builds `restore <sid> --target <dir> [--include p]...` with --json; new `pumpRestoreStdout` parses status + summary objects. `--no-ownership` is **not** passed — restic 0.17 added that flag, 0.16 errors out, and the agent's systemd unit runs as root anyway so the original "cp without sudo" rationale doesn't hold (parent dir is root-owned regardless). `runner.RunRestore` translates `RestoreStatus` into `job.progress` (mapping FilesRestored → FilesDone, etc.); agent dispatcher case `JobRestore` reuses the `spawn()` helper from P3-X1 so cancel works. Restore-shaped job-detail variant with current-file display under the progress bar.
|
||||
- [x] **P3-09** (S) `diff` between two snapshots. `JobDiff` JobKind + `restic.RunDiff` + `runner.RunDiff`; `POST /api/hosts/{id}/snapshots/diff` (and HTMX-form variant on the unprefixed path) dispatcher with two-snapshot guard + per-host snapshot-list validation; UI panel on host detail right rail (visible when 2+ snapshots) with two short-id inputs + Diff button. Output streams as log.stream to the standard live job log page.
|
||||
- [x] **P3-X3** (S) Recent-restores line on host detail. `hostChromeData` grows `RestoreStatus` / `RestoreAt` / `RestoreJobID` populated via `store.LatestJobByKind(host_id, 'restore')` (already exists from P2R). `host_chrome.html` renders a small line below the init-status one with status-coloured copy + a link to the job log. Hidden when no restore has ever run on this host.
|
||||
|
||||
> **Migration 0012** widens the `jobs.kind` CHECK constraint to include `restore` and `diff`. Rebuild required (SQLite can't ALTER CHECK in place); follows the safe pattern from 0005, with a defensive temp-table backup of `job_logs` so the cascade-trap that bit migration 0007 wouldn't take the log history with it.
|
||||
|
||||
> **As shipped (Playwright sweep against the live smoke env, 2026-05-04):** login → host detail → Restore button → wizard step 1 picks snapshot a1ac4006 (most recent) → tree drill-down `/home/steve/test` (3 lazy loads) → tick `file1` + `file2` → step 4 confirm summary populated → dispatch → live job page with running progress widget → restore succeeds, files land on disk at `/var/lib/restic-manager/restore/<job-id>/home/steve/test/file{1,2}`. Snapshot diff between `a1ac4006` and `5f78c788` → diff job page, statistics output streamed (738 bytes added, 0 removed). Recent-restores line on host detail reads "last restore · succeeded 28s ago · job log →".
|
||||
|
||||
### Phase 3 — Alerts (not started)
|
||||
|
||||
|
||||
Reference in New Issue
Block a user