Phase 3 — Restore (P3-X1, X2, 01, 02, 03, 09, X3-X6) #6

Merged
steve merged 13 commits from p3-restore into main 2026-05-04 18:06:18 +01:00
3 changed files with 51 additions and 4 deletions
Showing only changes of commit e4031d26fa - Show all commits
+31
View File
@@ -2,6 +2,8 @@ package runner
import ( import (
"context" "context"
"os"
"path/filepath"
"strings" "strings"
"testing" "testing"
@@ -188,6 +190,35 @@ esac
} }
} }
// TestRunRestoreNewDirAutoCreatesTarget: a new-directory restore
// should mkdir the requested target chain before invoking restic, so
// operators don't have to pre-create the per-job subdir.
func TestRunRestoreNewDirAutoCreatesTarget(t *testing.T) {
t.Parallel()
bin := setupScript(t, `
case "$1" in
restore)
echo '{"message_type":"summary","seconds_elapsed":0,"total_files":0,"files_restored":0,"total_bytes":0,"bytes_restored":0}'
;;
esac
`)
tx := &fakeSender{}
r := New(Config{ResticBin: bin}, tx, 0)
// Multi-level path the operator hasn't created yet.
target := filepath.Join(t.TempDir(), "deep", "deeper", "deepest")
if err := r.RunRestore(context.Background(), "job-rmkdir", "abc",
[]string{"/etc/foo"}, false, target); err != nil {
t.Fatalf("RunRestore: %v", err)
}
if st, err := os.Stat(target); err != nil {
t.Fatalf("expected target dir to exist: %v", err)
} else if !st.IsDir() {
t.Fatalf("expected directory, got %v", st.Mode())
}
}
// TestRunDiffShipsLogLines: diff output is forwarded as log.stream. // TestRunDiffShipsLogLines: diff output is forwarded as log.stream.
func TestRunDiffShipsLogLines(t *testing.T) { func TestRunDiffShipsLogLines(t *testing.T) {
t.Parallel() t.Parallel()
+13 -2
View File
@@ -67,13 +67,24 @@ func (e Env) RunRestore(ctx context.Context, snapshotID string, paths []string,
target = "/" target = "/"
} else { } else {
// Expand $HOME / ${HOME} / leading ~/ in the operator-supplied // Expand $HOME / ${HOME} / leading ~/ in the operator-supplied
// path, using the agent's own HOME (which under the systemd // path, using the agent's own HOME (typically /root for the
// unit is the agent user's home — typically /root for the
// User=root unit). The expansion runs agent-side so the // User=root unit). The expansion runs agent-side so the
// operator can specify a portable default like // operator can specify a portable default like
// $HOME/rm-restore/<job-id>/ in the wizard without the server // $HOME/rm-restore/<job-id>/ in the wizard without the server
// needing to know which user the agent runs as. // needing to know which user the agent runs as.
target = expandHome(target) target = expandHome(target)
// Ensure the target directory exists. Restic itself creates
// missing leaves but won't traverse multiple missing levels
// (and we don't want the operator to have to pre-create the
// per-job subdir). 0700 keeps the data root-only — the agent
// runs as root, and operators who want a different mode can
// chmod after the fact. If MkdirAll fails (operator typed a
// path inside a read-only sandbox mount, ENOSPC, etc.) we
// surface a clean error rather than letting restic fail with
// something cryptic.
if err := os.MkdirAll(target, 0o700); err != nil {
return nil, fmt.Errorf("restic restore: prepare target %q: %w", target, err)
}
} }
args = append(args, "--target", target) args = append(args, "--target", target)
// --no-ownership was added in restic 0.17. Older versions reject // --no-ownership was added in restic 0.17. Older versions reject
+7 -2
View File
@@ -257,13 +257,18 @@ Sizes: **S** = under a day, **M** = 13 days, **L** = 37 days.
- [x] **P3-X2** (S) Tree-list synchronous WS RPC. `MsgTreeList``MsgTreeListResult` with `Envelope.ID` correlation; generic `Hub.SendRPC` helper (registry of buffered channels keyed by ULID, ctx-cancel + timeout aware). `internal/restic.ListTreeChildren` wraps `restic ls --json` and filters its recursive output to direct children. Server-side `treeCache` is per-wizard-session (keyed by session cookie + host + snapshot + path) with a 30-min TTL and lazy sweep. - [x] **P3-X2** (S) Tree-list synchronous WS RPC. `MsgTreeList``MsgTreeListResult` with `Envelope.ID` correlation; generic `Hub.SendRPC` helper (registry of buffered channels keyed by ULID, ctx-cancel + timeout aware). `internal/restic.ListTreeChildren` wraps `restic ls --json` and filters its recursive output to direct children. Server-side `treeCache` is per-wizard-session (keyed by session cookie + host + snapshot + path) with a 30-min TTL and lazy sweep.
- [x] **P3-01** (L) Restore wizard backend (`internal/server/http/ui_restore.go`). GET handlers render the four-step wizard against the wireframe. HTMX/fetch tree partial endpoint hits `fetchTreeWithCache`. POST validates: snapshot_id, ≥1 absolute path, in-place ⇒ confirm_hostname == host name, agent online; on error re-renders with operator's input intact. Happy path mints job_id, target = `/var/lib/restic-manager/restore/<job-id>` (server-picked, agent's writable dir under the systemd sandbox's `ReadWritePaths`), creates job row, ships `command.run` with `RestorePayload`, writes `host.restore` audit row, returns HX-Redirect (or 303) to the live job page. - [x] **P3-01** (L) Restore wizard backend (`internal/server/http/ui_restore.go`). GET handlers render the four-step wizard against the wireframe. HTMX/fetch tree partial endpoint hits `fetchTreeWithCache`. POST validates: snapshot_id, ≥1 absolute path, in-place ⇒ confirm_hostname == host name, agent online; on error re-renders with operator's input intact. Happy path mints job_id, target = `/var/lib/restic-manager/restore/<job-id>` (server-picked, agent's writable dir under the systemd sandbox's `ReadWritePaths`), creates job row, ships `command.run` with `RestorePayload`, writes `host.restore` audit row, returns HX-Redirect (or 303) to the live job page.
- [x] **P3-02** (L) Wizard UI templates (`web/templates/pages/host_restore.html` + `partials/tree_node.html`). Single-page progressively-enabled four-step form. Form-state-driven JS computes a running tally + step-4 confirm summary client-side. Tree expansion uses plain fetch (not HTMX) for simpler target lookup; loaded-state cached per node. Top-level Restore button on host detail right rail + per-snapshot Restore action on snapshot rows. New `.snap-row` token in `web/styles/input.css`. - [x] **P3-02** (L) Wizard UI templates (`web/templates/pages/host_restore.html` + `partials/tree_node.html`). Single-page progressively-enabled four-step form. Form-state-driven JS computes a running tally + step-4 confirm summary client-side. Tree expansion uses plain fetch (not HTMX) for simpler target lookup; loaded-state cached per node. Top-level Restore button on host detail right rail + per-snapshot Restore action on snapshot rows. New `.snap-row` token in `web/styles/input.css`.
- [x] **P3-03** (M) Restore execution. `restic.RunRestore` builds `restore <sid> --target <dir> [--include p]...` with --json; new `pumpRestoreStdout` parses status + summary objects. `--no-ownership` is **not** passed — restic 0.17 added that flag, 0.16 errors out, and the agent's systemd unit runs as root anyway so the original "cp without sudo" rationale doesn't hold (parent dir is root-owned regardless). `runner.RunRestore` translates `RestoreStatus` into `job.progress` (mapping FilesRestored → FilesDone, etc.); agent dispatcher case `JobRestore` reuses the `spawn()` helper from P3-X1 so cancel works. Restore-shaped job-detail variant with current-file display under the progress bar. - [x] **P3-03** (M) Restore execution. `restic.RunRestore` builds `restore <sid> --target <dir> [--include p]...` with --json; new `pumpRestoreStdout` parses status + summary objects. `--no-ownership` is gated on the agent's restic version via `Env.AtLeastVersion(0, 17)` — the flag was added in 0.17 and 0.16 rejects it. Restic version is threaded through `runner.Config.ResticVersion` from the agent's sysinfo snapshot. New-dir target is operator-editable (default `$HOME/rm-restore/<job-id>/`); agent expands `$HOME` / `${HOME}` / `~/` at run time and calls `os.MkdirAll` on the target chain so the operator never has to pre-create the per-job subdir. `runner.RunRestore` translates `RestoreStatus` into `job.progress` (mapping FilesRestored → FilesDone, etc.); agent dispatcher case `JobRestore` reuses the `spawn()` helper from P3-X1 so cancel works. Restore-shaped job-detail variant with current-file display under the progress bar.
- [x] **P3-09** (S) `diff` between two snapshots. `JobDiff` JobKind + `restic.RunDiff` + `runner.RunDiff`; `POST /api/hosts/{id}/snapshots/diff` (and HTMX-form variant on the unprefixed path) dispatcher with two-snapshot guard + per-host snapshot-list validation; UI panel on host detail right rail (visible when 2+ snapshots) with two short-id inputs + Diff button. Output streams as log.stream to the standard live job log page. - [x] **P3-09** (S) `diff` between two snapshots. `JobDiff` JobKind + `restic.RunDiff` + `runner.RunDiff`; `POST /api/hosts/{id}/snapshots/diff` (and HTMX-form variant on the unprefixed path) dispatcher with two-snapshot guard + per-host snapshot-list validation; UI panel on host detail right rail (visible when 2+ snapshots) with two short-id inputs + Diff button. Output streams as log.stream to the standard live job log page.
- [x] **P3-X3** (S) Recent-restores line on host detail. `hostChromeData` grows `RestoreStatus` / `RestoreAt` / `RestoreJobID` populated via `store.LatestJobByKind(host_id, 'restore')` (already exists from P2R). `host_chrome.html` renders a small line below the init-status one with status-coloured copy + a link to the job log. Hidden when no restore has ever run on this host. - [x] **P3-X3** (S) Recent-restores line on host detail. `hostChromeData` grows `RestoreStatus` / `RestoreAt` / `RestoreJobID` populated via `store.LatestJobByKind(host_id, 'restore')` (already exists from P2R). `host_chrome.html` renders a small line below the init-status one with status-coloured copy + a link to the job log. Hidden when no restore has ever run on this host.
- [x] **P3-X4** (S) Job log download (txt + ndjson). New `GET /api/jobs/{id}/log.{txt|ndjson}` endpoint backed by the persisted `job_logs` table — works any time (running or finished) without pausing the live WS stream because the source is the DB, not the live socket. Plain-text format mirrors the on-screen "HH:MM:SS.mmm TAG payload" shape with a small `# job ... · kind ... · status ...` header; ndjson emits one self-contained `{seq,ts,stream,payload}` JSON object per line for `jq` / tooling. Surfaced as a single header dropdown on the live job page (`details/summary`-driven, native keyboard support, click-outside-to-close). New reusable `.dropdown` / `.dropdown-menu` / `.dropdown-item` tokens in `web/styles/input.css`.
- [x] **P3-X5** (S) UK lint locale + sweep. `.golangci.yml` misspell locale switched US → UK and the codebase swept (~73 corrections — behaviour, serialise, recognise, honour, initialise, enrol, unauthorised, etc.). Wire `ErrorCode` value `"unauthorized"``"unauthorised"` is a tiny contract change but the agent doesn't parse those codes today and no external clients exist yet.
- [x] **P3-X6** (S) Snapshot SIZE/FILES tooltip on host detail. The per-snapshot summary block was added by restic 0.17 (the source comment in `internal/restic/snapshots.go` incorrectly said 0.16+); on 0.16 hosts the columns render `—`. `hostDetailPage.LegacyRestic` (computed via `Env.AtLeastVersion(0, 17)`) drives a `title="Needs restic 0.17+ on the agent host. This host runs <ver>."` + `cursor: help` on the column headers, hidden once the host upgrades.
> **Migration 0012** widens the `jobs.kind` CHECK constraint to include `restore` and `diff`. Rebuild required (SQLite can't ALTER CHECK in place); follows the safe pattern from 0005, with a defensive temp-table backup of `job_logs` so the cascade-trap that bit migration 0007 wouldn't take the log history with it. > **Migration 0012** widens the `jobs.kind` CHECK constraint to include `restore` and `diff`. Rebuild required (SQLite can't ALTER CHECK in place); follows the safe pattern from 0005, with a defensive temp-table backup of `job_logs` so the cascade-trap that bit migration 0007 wouldn't take the log history with it.
> **As shipped (Playwright sweep against the live smoke env, 2026-05-04):** login → host detail → Restore button → wizard step 1 picks snapshot a1ac4006 (most recent) → tree drill-down `/home/steve/test` (3 lazy loads) → tick `file1` + `file2` → step 4 confirm summary populated → dispatch → live job page with running progress widget → restore succeeds, files land on disk at `/var/lib/restic-manager/restore/<job-id>/home/steve/test/file{1,2}`. Snapshot diff between `a1ac4006` and `5f78c788` → diff job page, statistics output streamed (738 bytes added, 0 removed). Recent-restores line on host detail reads "last restore · succeeded 28s ago · job log →". > **install.sh + systemd unit:** the install script now pre-creates `/root/rm-restore` (root-owned 0700) so the default new-dir restore target works under the sandbox out of the box; the unit's `ReadWritePaths` gains `-/root/rm-restore` (soft-fail prefix). Existing installs need a re-run of `install.sh` to pick up the new dir; new operator-typed targets are auto-created by the agent at job time.
> **As shipped (Playwright sweep against the live smoke env, 2026-05-04):** login → host detail → Restore button → wizard step 1 picks snapshot a1ac4006 (most recent) → tree drill-down `/home/steve/test` (3 lazy loads) → tick `file1` + `file2` → step 4 confirm summary populated → dispatch → live job page with running progress widget → restore succeeds, files land on disk at `/root/rm-restore/<job-id>/home/steve/test/file{1,2}` (default `$HOME/rm-restore/<job-id>/` after agent-side expansion). Custom-target restore to `/tmp/custom-restore/<job-id>/` lands inside the agent's `PrivateTmp` namespace. Snapshot diff between `a1ac4006` and `5f78c788` → diff job page, statistics output streamed (738 bytes added, 0 removed). Recent-restores line on host detail reads "last restore · succeeded 28s ago · job log →". Download dropdown serves both `.txt` and `.ndjson` with correct `Content-Type` + `Content-Disposition`. SIZE/FILES tooltip "Needs restic 0.17+ on the agent host. This host runs 0.16.4." renders on column hover.
### Phase 3 — Alerts (not started) ### Phase 3 — Alerts (not started)