diff --git a/docs/superpowers/plans/2026-05-04-p2-completion.md b/docs/superpowers/plans/2026-05-04-p2-completion.md
new file mode 100644
index 0000000..1bc93f2
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-04-p2-completion.md
@@ -0,0 +1,259 @@
+# P2 Completion Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Close every remaining P2 task in `tasks.md`: P2R-09 (auto-init UX), P2R-10/11/12 (hooks), P2R-13 (bandwidth wiring + per-job override), P2R-14 (schedule next/last run), P2-16 (Windows svc), P2-17 (`install.ps1`), P2-18 (announce-and-approve).
+
+**Architecture:** Server stays HTTP+WS; agent stays a single binary that auto-restages via `make build`. Hooks live on `source_groups` (and host-level defaults). Announce-and-approve adds a separate WS path (`/ws/agent/pending`) and a Pending hosts panel; token-flow stays default. Windows service support uses `golang.org/x/sys/windows/svc` behind a `//go:build windows` tag — Linux builds untouched. **Operator is away — make best guesses on small UX choices, but commit each item separately so the choices are reviewable.**
+
+**Tech Stack:** Go 1.23+, chi router, modernc/sqlite, `coder/websocket`, `robfig/cron/v3`, HTMX + Tailwind, `golang.org/x/sys/windows/svc`, Ed25519 (stdlib).
+
+---
+
+## Pre-flight
+
+- [ ] **Run baseline:** `go vet ./... && go build ./... && go test ./...` — must be green before starting. Restage agent + restart server (per CLAUDE.md restage block) so smoke env is warm.
+
+## Order of execution
+
+Smallest blast-radius first. UI polish → bandwidth → next/last → hooks → announce → Windows. Commit and restage at each task boundary. Run `go vet ./... && go test ./...` before every commit.
+
+---
+
+## Task 1 — P2R-13a: Wire bandwidth caps into restic invocations
+
+**Files:**
+- Modify: `internal/restic/runner.go` (add `LimitUploadKBps`, `LimitDownloadKBps` to `Env` or to a per-call options struct already present; emit `--limit-upload N`/`--limit-download N` on `restic backup|forget|prune|check|restore`)
+- Modify: `internal/agent/runner/*.go` — pass host-wide caps into the runner. Caps come from `agent.config.Config` or are pushed via `config.update`. Decision: ship caps in the existing `config.update` envelope as new fields `bandwidth_up_kbps`, `bandwidth_down_kbps`. Server pushes on hello + on `PUT /api/hosts/{id}/bandwidth`.
+- Modify: `internal/api/messages.go` — extend `ConfigUpdatePayload` with the two int pointers.
+- Modify: `internal/server/ws/handler.go` (or wherever hello/config push lives) — include caps in the pushed config.
+- Modify: `internal/server/http/host_bandwidth.go` — after `SetHostBandwidth`, fan out a `config.update` to the connected agent (mirror the credentials-edit path).
+- Test: `internal/restic/runner_test.go` — assert flag injection.
+- Test: `internal/server/ws/*_test.go` — assert config.update carries caps on hello and on edit.
+
+- [ ] **Step 1.1** Add `LimitUploadKBps *int`, `LimitDownloadKBps *int` to whatever per-host config the runner already consults. Existing pattern is `restic.Env{}`; extend it.
+- [ ] **Step 1.2** Failing test in `internal/restic/runner_test.go`: build a backup command with `LimitUploadKBps=1024`, assert the resulting argv contains `--limit-upload 1024`.
+- [ ] **Step 1.3** Implement: prepend the flags in argv builders for `backup`, `forget`, `prune`, `check`, `restore`. Skip when nil/<=0.
+- [ ] **Step 1.4** Wire `config.update` payload — server reads `Host.BandwidthUpKBps`/`DownKBps`, includes them in the existing `ConfigUpdatePayload` push on hello and on bandwidth edit (mirror cred-edit fan-out in `internal/server/http/host_credentials.go`).
+- [ ] **Step 1.5** Agent applies caps: store in the in-memory dispatcher state on `config.update`, attach to every restic call.
+- [ ] **Step 1.6** `go vet ./... && go test ./... && make build && `. Commit:
+```
+agent+server: apply host bandwidth caps to restic invocations
+```
+
+## Task 2 — P2R-13b: Per-job override on Run-now confirm dialog
+
+**Decision:** A small numeric input on the per-source-group Run-now button (and dashboard Run-all). Operator is away — keep it minimal: two optional inputs (up/down KB/s) on the dispatch endpoint; UI shows a "Limit bandwidth for this run" disclosure with two number inputs.
+
+**Files:**
+- Modify: `internal/server/http/sources.go` (or wherever the per-group Run-now POST lives) — accept optional `bandwidth_up_kbps`/`bandwidth_down_kbps` form fields, pass through.
+- Modify: dispatch path (`internal/server/dispatch_*.go` or `ws/handler.go` job-dispatch core) — accept overrides, include in the `command.run` payload.
+- Modify: `internal/api/messages.go` — `CommandRunPayload` gains optional caps that take precedence over host-wide caps when present.
+- Modify: agent dispatcher — use the payload override if present, else fall back to config caps.
+- Modify: `web/templates/pages/host_sources.html` (and the schedules Run-now form) — disclosure block.
+- Test: HTTP test for the new form fields; agent runner test for override precedence.
+
+- [ ] **Step 2.1** Failing test: POST to per-group Run-now with `bandwidth_up_kbps=512` → assert dispatched payload carries 512.
+- [ ] **Step 2.2** Implement endpoint changes + payload extension.
+- [ ] **Step 2.3** Agent override precedence test (payload wins over config).
+- [ ] **Step 2.4** UI disclosure blocks (one per Run-now form).
+- [ ] **Step 2.5** Playwright spot-check via `:8080` smoke env: open Sources tab, expand the Run-now disclosure, fire with limit=128, then open the live job log and confirm the agent's restic argv (read `/tmp/rm-smoke/server.log` for the dispatched command — it logs argv) shows `--limit-upload 128`.
+- [ ] **Step 2.6** Commit.
+
+## Task 3 — P2R-14: Schedule "next run" / "last run"
+
+**Files:**
+- Modify: `internal/store/schedules.go` — add `NextRunAt(time.Time)` derivation helper and `LatestScheduledJobAt(host_id, schedule_id) (time.Time, error)` (or a single batched fetch for all schedules of a host).
+- Modify: dashboard host row (`web/templates/partials/host_row.html`) — show "Next: …" and "Last: …" when there's a single covering schedule (already detected in slice 5).
+- Modify: `web/templates/pages/host_schedules.html` — add Next/Last columns to the schedules table.
+- Modify: relevant page handlers (`internal/server/http/ui_schedules.go`, dashboard handler) — populate the data.
+- Test: `schedules_test.go` for next-run derivation (parse cron, compute next from a fixed `now`).
+
+- [ ] **Step 3.1** Add `NextRun(cronExpr string, from time.Time) (time.Time, error)` helper using `robfig/cron/v3`'s `Parse(...).Next(from)`. Test with three crons.
+- [ ] **Step 3.2** Add `LatestJobByActorKindForSchedule(host_id, schedule_id) (time.Time, status, error)` query against `jobs` (filter `actor_kind='schedule'` AND `schedule_id=?`, ORDER BY `started_at` DESC LIMIT 1).
+- [ ] **Step 3.3** Wire schedules-page handler to populate Next/Last per row; render relative time + ISO tooltip (mirror existing `formatRelTime` template helper if it exists; otherwise use a simple "5m ago" helper).
+- [ ] **Step 3.4** Wire dashboard row: when single covering schedule, surface "Next: 03:00" / "Last: 8h ago — succeeded".
+- [ ] **Step 3.5** Playwright spot-check: a host with a schedule shows Next/Last; pause it → Next becomes "—" / "(paused)".
+- [ ] **Step 3.6** Commit.
+
+## Task 4 — P2R-09: Auto-init UX polish
+
+**Files:**
+- Modify: `web/templates/pages/host_repo.html` — danger-zone re-init button + two-step confirm (type the host name).
+- Modify: `internal/server/http/ui_repo.go` (or new `repo_reinit.go`) — `POST /hosts/{id}/repo/reinit` admin-only, audit-logged. Server runs `restic init --force` (or wipes-then-inits — pick the safer of the two; restic doesn't truly wipe a repo, the operator must clear the bucket. **Best guess:** dispatch a normal `init` job with a flag that re-runs even if the repo claims to exist; if restic refuses, surface "the repo on the remote already has data — clear it manually before re-init" via the job log).
+- Modify: host detail page header / vitals strip — surface init result line. Use the existing latest-`init`-job query to render "repo ready · initialised ago" or "init failed · job N · retry".
+- Test: HTTP test for re-init endpoint (auth, audit, host-name confirm); template test that the result line renders for both states.
+
+- [ ] **Step 4.1** Add helper: `LatestJobByKind(host_id, "init")` — already exists from P2R-06 (`store.LatestJobByKind`). Reuse.
+- [ ] **Step 4.2** Render init line into vitals strip; show "init failed" amber when latest init failed.
+- [ ] **Step 4.3** Implement `POST /hosts/{id}/repo/reinit` handler — admin role check, requires a `confirm_hostname` form field that must equal `host.Name`, returns 400 otherwise. Dispatches a fresh `init` job.
+- [ ] **Step 4.4** Add danger-zone re-init form to `host_repo.html` (currently disabled per slice 4). Two-step confirm with the typed hostname.
+- [ ] **Step 4.5** Playwright: visit `/hosts/{id}/repo`, click re-init, type wrong hostname → blocked; type right hostname → dispatches init job → returns to live log.
+- [ ] **Step 4.6** Commit.
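The checks in Step 4.3 can be sketched as below. This is an editorial illustration, not part of the diff. The `host` type, the `isAdmin` and `dispatchInit` callbacks, and the `newDemoHandler` and `postReinit` helpers are hypothetical names rather than repository identifiers. The plan only fixes the 400 on a hostname mismatch; the 403 and 202 status codes are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// host is a hypothetical stand-in for the plan's store.Host.
type host struct{ Name string }

// reinitHandler sketches Step 4.3: admin-only, typed-hostname confirm,
// 400 on mismatch. isAdmin and dispatchInit are illustrative callbacks.
func reinitHandler(h host, isAdmin func(*http.Request) bool, dispatchInit func() error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if !isAdmin(r) {
			// Assumed status; the plan only says "admin role check".
			http.Error(w, "admin role required", http.StatusForbidden)
			return
		}
		// FormValue parses the urlencoded body on first use.
		if r.FormValue("confirm_hostname") != h.Name {
			http.Error(w, "confirm_hostname does not match host name", http.StatusBadRequest)
			return
		}
		if err := dispatchInit(); err != nil {
			http.Error(w, "dispatch failed", http.StatusInternalServerError)
			return
		}
		// Assumed status; a redirect to the live job log may fit the UI better.
		w.WriteHeader(http.StatusAccepted)
	})
}

// newDemoHandler wires the handler with fixed callbacks for demonstration.
func newDemoHandler(name string, admin bool) http.Handler {
	return reinitHandler(host{Name: name},
		func(*http.Request) bool { return admin },
		func() error { return nil })
}

// postReinit submits one form to the handler and returns the status code.
func postReinit(h http.Handler, confirm string) int {
	form := url.Values{"confirm_hostname": {confirm}}
	req := httptest.NewRequest(http.MethodPost, "/hosts/1/repo/reinit", strings.NewReader(form.Encode()))
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newDemoHandler("db01", true)
	wrong := postReinit(h, "db-01")
	right := postReinit(h, "db01")
	fmt.Println(wrong, right)
}
```

Comparing the typed value against the stored host name on the server keeps the confirmation from depending on client-side checks alone, which is what the HTTP test listed under **Files** would exercise.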
+
+## Task 5 — P2R-10: Hook schema (migration 0010)
+
+**Files:**
+- Create: `internal/store/migrations/0010_hooks.sql`
+  - `ALTER TABLE source_groups ADD COLUMN pre_hook BLOB;` (AEAD ciphertext, NULLable)
+  - `ALTER TABLE source_groups ADD COLUMN post_hook BLOB;`
+  - `ALTER TABLE hosts ADD COLUMN pre_hook_default BLOB;`
+  - `ALTER TABLE hosts ADD COLUMN post_hook_default BLOB;`
+  - All four are AEAD ciphertext (existing `crypto.AEAD`); BLOB column type.
+- Modify: `internal/store/types.go` — add `PreHook *string` (decrypted), `PostHook *string` to `SourceGroup`; same to `Host`.
+- Modify: `internal/store/sources.go` + `internal/store/hosts.go` — getters/setters encrypt on write, decrypt on read. Pass `crypto.AEAD` through (pattern mirrors `host_credentials.go`).
+- Test: encrypt/decrypt round-trip; setting `nil` clears the column.
+
+- [ ] **Step 5.1** Write migration SQL. Column-level ALTERs only (per CLAUDE.md).
+- [ ] **Step 5.2** Update store types + getters/setters with AEAD encrypt/decrypt. Mirror `internal/store/host_credentials.go` patterns exactly.
+- [ ] **Step 5.3** Round-trip test: set hook on a source group; reload; assert plaintext returned. Set nil; assert nil after reload.
+- [ ] **Step 5.4** `go vet && go test`. Commit.
+
+## Task 6 — P2R-11: Agent execution of hooks
+
+**Files:**
+- Modify: `internal/api/messages.go` — `ConfigUpdatePayload` (or the per-source-group bundle inside `ScheduleSetPayload`) carries `PreHook`, `PostHook` plaintext (server has decrypted by then; wire is authenticated WS, same trust boundary as repo creds).
+- Modify: agent dispatcher — for `kind=backup` only:
+  - Run `pre_hook` (if present) via `os/exec` with the host shell (`/bin/sh -c` on Linux, `cmd.exe /C` on Windows). Capture stdout+stderr → JobLog with `hook:` prefix. Non-zero exit aborts the backup, marks the job failed with `pre_hook` error.
+  - Run `post_hook` (if present) **always** after the backup, with `RM_JOB_STATUS=succeeded|failed` env var. Capture into JobLog, prefix `hook:`. Non-zero exit on post_hook does NOT change job status (warning logged).
+- Skip both for `kind` ∈ {forget, prune, check, unlock, init} per spec.md §14.3.
+- Test: dispatcher test with a `pre_hook` that exits 1 → backup not started; `post_hook` always runs and sees `RM_JOB_STATUS`.
+
+- [ ] **Step 6.1** Plumb hooks through `ScheduleSetPayload` source-group bundle + per-group Run-now `command.run` payload (override host-default with group hook if both present). Server-side resolution: host default if group hook is empty.
+- [ ] **Step 6.2** Agent dispatcher: factor hook execution into `internal/agent/runner/hooks.go`. Use `exec.CommandContext`, set env, plumb output to existing JobLog stream with `Source: "hook"` (or prefix the log lines `hook: …`).
+- [ ] **Step 6.3** Failing test in `internal/agent/runner/runner_test.go` (create file if absent): `pre_hook=/bin/false` → job fails with `pre_hook failed (exit 1)` and the actual restic backup never runs (assert via mock-restic shim).
+- [ ] **Step 6.4** Test: `post_hook` runs even when backup fails; receives `RM_JOB_STATUS=failed`.
+- [ ] **Step 6.5** Test: hooks skipped on `forget`/`prune`/`check`/`unlock` jobs.
+- [ ] **Step 6.6** `go vet && go test && make build && `. Commit.
+
+## Task 7 — P2R-12: Hook editor UI
+
+**Files:**
+- Modify: `web/templates/pages/source_group_edit.html` (new or extend existing source-group form) — `