P2 Completion Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Close every remaining P2 task in tasks.md: P2R-09 (auto-init UX), P2R-10/11/12 (hooks), P2R-13 (bandwidth wiring + per-job override), P2R-14 (schedule next/last run), P2-16 (Windows svc), P2-17 (install.ps1), P2-18 (announce-and-approve).

Architecture: Server stays HTTP+WS; agent stays a single binary that auto-restages via make build. Hooks live on source_groups (and host-level defaults). Announce-and-approve adds a separate WS path (/ws/agent/pending) and a Pending hosts panel; token-flow stays default. Windows service support uses golang.org/x/sys/windows/svc behind a //go:build windows tag — Linux builds untouched. Operator is away — make best guesses on small UX choices, but commit each item separately so the choices are reviewable.

Tech Stack: Go 1.23+, chi router, modernc/sqlite, coder/websocket, robfig/cron/v3, HTMX + Tailwind, golang.org/x/sys/windows/svc, Ed25519 (stdlib).


Pre-flight

  • Run baseline: go vet ./... && go build ./... && go test ./... — must be green before starting. Restage agent + restart server (per CLAUDE.md restage block) so smoke env is warm.

Order of execution

Smallest blast-radius first. UI polish → bandwidth → next/last → hooks → announce → Windows. Commit and restage at each task boundary. Run go vet ./... && go test ./... before every commit.


Task 1 — P2R-13a: Wire bandwidth caps into restic invocations

Files:

  • Modify: internal/restic/runner.go (add LimitUploadKBps, LimitDownloadKBps to Env or to a per-call options struct already present; emit --limit-upload N/--limit-download N on restic backup|forget|prune|check|restore)

  • Modify: internal/agent/runner/*.go — pass host-wide caps into the runner. Caps come from agent.config.Config or are pushed via config.update. Decision: ship caps in the existing config.update envelope as new fields bandwidth_up_kbps, bandwidth_down_kbps. Server pushes on hello + on PUT /api/hosts/{id}/bandwidth.

  • Modify: internal/api/messages.go — extend ConfigUpdatePayload with the two int pointers.

  • Modify: internal/server/ws/handler.go (or wherever hello/config push lives) — include caps in the pushed config.

  • Modify: internal/server/http/host_bandwidth.go — after SetHostBandwidth, fan out a config.update to the connected agent (mirror the credentials-edit path).

  • Test: internal/restic/runner_test.go — assert flag injection.

  • Test: internal/server/ws/*_test.go — assert config.update carries caps on hello and on edit.

  • Step 1.1 Add LimitUploadKBps *int, LimitDownloadKBps *int to whatever per-host config the runner already consults. Existing pattern is restic.Env{}; extend it.

  • Step 1.2 Failing test in internal/restic/runner_test.go: build a backup command with LimitUploadKBps=1024, assert the resulting argv contains --limit-upload 1024.

  • Step 1.3 Implement: prepend the flags in argv builders for backup, forget, prune, check, restore. Skip when nil/<=0.

  • Step 1.4 Wire config.update payload — server reads Host.BandwidthUpKBps/DownKBps, includes them in the existing ConfigUpdatePayload push on hello and on bandwidth edit (mirror cred-edit fan-out in internal/server/http/host_credentials.go).

  • Step 1.5 Agent applies caps: store in the in-memory dispatcher state on config.update, attach to every restic call.

  • Step 1.6 go vet ./... && go test ./... && make build && <restage block>. Commit:

agent+server: apply host bandwidth caps to restic invocations
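
A minimal sketch of the flag injection from Steps 1.2 and 1.3, assuming the caps arrive as two optional ints. The helper names limitArgs, buildArgv and intPtr are hypothetical and do not reflect the real internal/restic/runner.go API; only the --limit-upload and --limit-download flags come from the plan.

```go
package main

import (
	"fmt"
	"strconv"
)

// limitArgs returns the restic bandwidth flags for the given caps.
// A nil or non-positive cap emits no flag, as Step 1.3 requires.
func limitArgs(upKBps, downKBps *int) []string {
	var args []string
	if upKBps != nil && *upKBps > 0 {
		args = append(args, "--limit-upload", strconv.Itoa(*upKBps))
	}
	if downKBps != nil && *downKBps > 0 {
		args = append(args, "--limit-download", strconv.Itoa(*downKBps))
	}
	return args
}

// buildArgv prepends the limit flags to a subcommand and its own arguments.
// (Hypothetical helper; the real argv builders may be shaped differently.)
func buildArgv(subcommand string, upKBps, downKBps *int, rest ...string) []string {
	argv := append([]string{}, limitArgs(upKBps, downKBps)...)
	argv = append(argv, subcommand)
	return append(argv, rest...)
}

func intPtr(v int) *int { return &v }

func main() {
	fmt.Println(buildArgv("backup", intPtr(1024), nil, "/etc"))
	fmt.Println(buildArgv("prune", nil, nil))
}
```

Keeping the nil/<=0 check in one helper means every argv builder (backup, forget, prune, check, restore) shares the same skip rule.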

Task 2 — P2R-13b: Per-job override on Run-now confirm dialog

Decision: A small numeric input on the per-source-group Run-now button (and dashboard Run-all). Operator is away — keep it minimal: two optional inputs (up/down KB/s) on the dispatch endpoint; UI shows a <details> "Limit bandwidth for this run" disclosure with two number inputs.

Files:

  • Modify: internal/server/http/sources.go (or wherever the per-group Run-now POST lives) — accept optional bandwidth_up_kbps/bandwidth_down_kbps form fields, pass through.

  • Modify: dispatch path (internal/server/dispatch_*.go or ws/handler.go job-dispatch core) — accept overrides, include in the command.run payload.

  • Modify: internal/api/messages.go: CommandRunPayload gains optional caps that take precedence over host-wide caps when present.

  • Modify: agent dispatcher: use the payload override if present, else fall back to the config caps.

  • Modify: web/templates/pages/host_sources.html (and the schedules Run-now form) — <details> block.

  • Test: HTTP test for the new form fields; agent runner test for override precedence.

  • Step 2.1 Failing test: POST to per-group Run-now with bandwidth_up_kbps=512 → assert dispatched payload carries 512.

  • Step 2.2 Implement endpoint changes + payload extension.

  • Step 2.3 Agent override precedence test (payload wins over config).

  • Step 2.4 UI <details> blocks (one per Run-now form).

  • Step 2.5 Playwright spot-check via :8080 smoke env: open Sources tab, expand the Run-now disclosure, fire with limit=128, then open the live job log and confirm the agent's restic argv (read /tmp/rm-smoke/server.log for the dispatched command — it logs argv) shows --limit-upload 128.

  • Step 2.6 Commit.
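
The override precedence from Steps 2.1 to 2.3 could look roughly like this. It is a sketch under assumptions: parseCap and effectiveCap are hypothetical names, and treating an empty form field as "no override" is a guess at the intended behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCap turns an optional form value into an optional cap.
// Empty input means "no override"; anything else must be a positive integer.
func parseCap(raw string) (*int, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return nil, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return nil, fmt.Errorf("invalid bandwidth cap %q", raw)
	}
	return &n, nil
}

// effectiveCap applies the Task 2 precedence: the payload override wins,
// otherwise the host-wide cap (which may itself be nil) is used.
func effectiveCap(override, hostCap *int) *int {
	if override != nil {
		return override
	}
	return hostCap
}

func main() {
	hostCap := 1024
	override, err := parseCap("512")
	if err != nil {
		panic(err)
	}
	fmt.Println(*effectiveCap(override, &hostCap))
	fmt.Println(*effectiveCap(nil, &hostCap))
}
```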

Task 3 — P2R-14: Schedule "next run" / "last run"

Files:

  • Modify: internal/store/schedules.go: add a NextRun(cronExpr string, from time.Time) (time.Time, error) derivation helper and a LatestJobByActorKindForSchedule(host_id, schedule_id) lookup (or a single batched fetch for all schedules of a host). Signatures are fixed in Steps 3.1 and 3.2.

  • Modify: dashboard host row (web/templates/partials/host_row.html) — show "Next: …" and "Last: …" when there's a single covering schedule (already detected in slice 5).

  • Modify: web/templates/pages/host_schedules.html — add Next/Last columns to the schedules table.

  • Modify: relevant page handlers (internal/server/http/ui_schedules.go, dashboard handler) — populate the data.

  • Test: schedules_test.go for next-run derivation (parse cron, compute next from a fixed now).

  • Step 3.1 Add NextRun(cronExpr string, from time.Time) (time.Time, error) helper using robfig/cron/v3's Parse(...).Next(from). Test with three crons.

  • Step 3.2 Add LatestJobByActorKindForSchedule(host_id, schedule_id) (time.Time, status, error) query against jobs (filter actor_kind='schedule' AND schedule_id=?, ORDER BY started_at DESC LIMIT 1).

  • Step 3.3 Wire schedules-page handler to populate Next/Last per row; render relative time + ISO tooltip (mirror existing formatRelTime template helper if it exists; otherwise use a simple "5m ago" helper).

  • Step 3.4 Wire dashboard row: when single covering schedule, surface "Next: 03:00" / "Last: 8h ago — succeeded".

  • Step 3.5 Playwright spot-check: a host with a schedule shows Next/Last; pause it → Next becomes "—" / "(paused)".

  • Step 3.6 Commit.

Task 4 — P2R-09: Auto-init UX polish

Files:

  • Modify: web/templates/pages/host_repo.html — danger-zone re-init button + two-step confirm (type the host name).

  • Modify: internal/server/http/ui_repo.go (or new repo_reinit.go): POST /hosts/{id}/repo/reinit, admin-only and audit-logged. restic init has no documented --force flag and restic does not wipe an existing repo, so the operator must clear the bucket. Best guess: dispatch a normal init job with a flag that re-runs even if the repo claims to exist; if restic refuses, surface "the repo on the remote already has data; clear it manually before re-init" via the job log.

  • Modify: host detail page header / vitals strip — surface init result line. Use the existing latest-init-job query to render "repo ready · initialised ago" or "init failed · job N · retry".

  • Test: HTTP test for re-init endpoint (auth, audit, host-name confirm); template test that the result line renders for both states.

  • Step 4.1 Add helper: LatestJobByKind(host_id, "init") — already exists from P2R-06 (store.LatestJobByKind). Reuse.

  • Step 4.2 Render init line into vitals strip; show "init failed" amber when latest init failed.

  • Step 4.3 Implement POST /hosts/{id}/repo/reinit handler — admin role check, requires a confirm_hostname form field that must equal host.Name, returns 400 otherwise. Dispatches a fresh init job.

  • Step 4.4 Add danger-zone re-init form to host_repo.html (currently disabled per slice 4). Two-step confirm with the typed hostname.

  • Step 4.5 Playwright: visit /hosts/{id}/repo, click re-init, type wrong hostname → blocked; type right hostname → dispatches init job → returns to live log.

  • Step 4.6 Commit.
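
The typed-hostname gate from Step 4.3 can be sketched with net/http alone. The chi routing, admin role check and audit logging are left out here; reinitHandler, postForm and the dispatch callback are hypothetical stand-ins, and the 202 success status is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// reinitHandler rejects the request with 400 unless the confirm_hostname
// form field equals hostName. dispatch stands in for "dispatch a fresh
// init job" and is only called after the confirmation matches.
func reinitHandler(hostName string, dispatch func()) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if r.FormValue("confirm_hostname") != hostName {
			http.Error(w, "hostname confirmation does not match", http.StatusBadRequest)
			return
		}
		dispatch()
		w.WriteHeader(http.StatusAccepted)
	}
}

// postForm drives the handler in-process and returns the response status.
func postForm(h http.Handler, confirm string) int {
	form := url.Values{"confirm_hostname": {confirm}}
	req := httptest.NewRequest(http.MethodPost, "/hosts/1/repo/reinit", strings.NewReader(form.Encode()))
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	dispatched := 0
	h := reinitHandler("db-01", func() { dispatched++ })
	wrong := postForm(h, "db-02")
	right := postForm(h, "db-01")
	fmt.Println(wrong, right, dispatched)
}
```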

Task 5 — P2R-10: Hook schema (migration 0010)

Files:

  • Create: internal/store/migrations/0010_hooks.sql

    • ALTER TABLE source_groups ADD COLUMN pre_hook BLOB; (AEAD ciphertext, NULLable)
    • ALTER TABLE source_groups ADD COLUMN post_hook BLOB;
    • ALTER TABLE hosts ADD COLUMN pre_hook_default BLOB;
    • ALTER TABLE hosts ADD COLUMN post_hook_default BLOB;
    • All four are AEAD ciphertext (existing crypto.AEAD); BLOB column type.
  • Modify: internal/store/types.go — add PreHook *string (decrypted), PostHook *string to SourceGroup; same to Host.

  • Modify: internal/store/sources.go + internal/store/hosts.go — getters/setters encrypt on write, decrypt on read. Pass crypto.AEAD through (pattern mirrors host_credentials.go).

  • Test: encrypt/decrypt round-trip; setting nil clears the column.

  • Step 5.1 Write migration SQL. Column-level ALTERs only (per CLAUDE.md).

  • Step 5.2 Update store types + getters/setters with AEAD encrypt/decrypt. Mirror internal/store/host_credentials.go patterns exactly.

  • Step 5.3 Round-trip test: set hook on a source group; reload; assert plaintext returned. Set nil; assert nil after reload.

  • Step 5.4 go vet && go test. Commit.
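
The round-trip behaviour that Step 5.3 tests, sketched against the standard library's cipher.AEAD interface. The project's own crypto.AEAD wrapper is not shown in this plan, so AES-256-GCM with a nonce-prefixed blob is an assumed stand-in, and sealHook, openHook and newDemoAEAD are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// sealHook encrypts a hook body for storage. A nil hook maps to a nil blob
// (SQL NULL), which is how "setting nil clears the column" is modelled here.
func sealHook(aead cipher.AEAD, hook *string) ([]byte, error) {
	if hook == nil {
		return nil, nil
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Blob layout (assumed): nonce || ciphertext+tag.
	return aead.Seal(nonce, nonce, []byte(*hook), nil), nil
}

// openHook reverses sealHook; a nil blob decrypts to a nil hook.
func openHook(aead cipher.AEAD, blob []byte) (*string, error) {
	if blob == nil {
		return nil, nil
	}
	if len(blob) < aead.NonceSize() {
		return nil, errors.New("hook blob shorter than nonce")
	}
	nonce, ct := blob[:aead.NonceSize()], blob[aead.NonceSize():]
	pt, err := aead.Open(nil, nonce, ct, nil)
	if err != nil {
		return nil, err
	}
	s := string(pt)
	return &s, nil
}

// newDemoAEAD builds AES-256-GCM from a throwaway random key (demo only).
func newDemoAEAD() (cipher.AEAD, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func main() {
	aead, err := newDemoAEAD()
	if err != nil {
		panic(err)
	}
	hook := "systemctl stop myapp"
	blob, _ := sealHook(aead, &hook)
	back, _ := openHook(aead, blob)
	fmt.Println(*back == hook)
}
```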

Task 6 — P2R-11: Agent execution of hooks

Files:

  • Modify: internal/api/messages.go: ConfigUpdatePayload (or the per-source-group bundle inside ScheduleSetPayload) carries PreHook, PostHook plaintext (server has decrypted by then; wire is authenticated WS, same trust boundary as repo creds).

  • Modify: agent dispatcher — for kind=backup only:

    • Run pre_hook (if present) via os/exec with the host shell (/bin/sh -c on Linux, cmd.exe /C on Windows). Capture stdout+stderr → JobLog with hook: prefix. Non-zero exit aborts the backup, marks the job failed with pre_hook error.
    • Run post_hook (if present) always after the backup, with RM_JOB_STATUS=succeeded|failed env var. Capture into JobLog, prefix hook:. Non-zero exit on post_hook does NOT change job status (warning logged).
  • Skip both for kind ∈ {forget, prune, check, unlock, init} per spec.md §14.3.

  • Test: dispatcher test with a pre_hook that exits 1 → backup not started; post_hook always runs and sees RM_JOB_STATUS.

  • Step 6.1 Plumb hooks through ScheduleSetPayload source-group bundle + per-group Run-now command.run payload (override host-default with group hook if both present). Server-side resolution: host default if group hook is empty.

  • Step 6.2 Agent dispatcher: factor hook execution into internal/agent/runner/hooks.go. Use exec.CommandContext, set env, plumb output to existing JobLog stream with Source: "hook" (or prefix the log lines hook: …).

  • Step 6.3 Failing test in internal/agent/runner/runner_test.go (create file if absent): pre_hook=/bin/false → job fails with pre_hook failed (exit 1) and the actual restic backup never runs (assert via mock-restic shim).

  • Step 6.4 Test: post_hook runs even when backup fails; receives RM_JOB_STATUS=failed.

  • Step 6.5 Test: hooks skipped on forget/prune/check/unlock jobs.

  • Step 6.6 go vet && go test && make build && <restage block>. Commit.

Task 7 — P2R-12: Hook editor UI

Files:

  • Modify: web/templates/pages/source_group_edit.html (new or extend existing source-group form) — <textarea> for pre_hook, <textarea> for post_hook, with the warning banner: "this hook runs as the agent service user (root on Linux; LocalSystem on Windows)".

  • Modify: source-group HTTP handler (internal/server/http/sources.go) — accept hook fields on POST/PUT, encrypt-and-persist via store.

  • Modify: Repo page: add pre_hook_default / post_hook_default under a new "Hooks" section. Decision: host defaults live here rather than in a new host-detail "Settings" tab section, because Settings is still inert (per P1-25).

  • Modify: source-group form: admin-only edit check. Decision: hook editing is admin-only per spec; render the fields for operators but disable them.

  • Modify: audit-log writer — emit source_group.hook_updated and host.default_hook_updated events (without the hook body).

  • Test: HTTP test for create + update; admin-only enforcement; audit row written without secret.

  • Step 7.1 Source-group form extension + handler wiring.

  • Step 7.2 Repo page Hooks section (host defaults).

  • Step 7.3 Audit entries.

  • Step 7.4 Playwright: as admin, set a pre_hook of echo hello, fire Run-now, open live log, confirm hook: hello line appears.

  • Step 7.5 Commit.

Task 8 — P2-18a: Announce schema + endpoint

Files:

  • Create: internal/store/migrations/0011_pending_hosts.sql

    CREATE TABLE pending_hosts (
      id                 TEXT PRIMARY KEY,
      hostname           TEXT NOT NULL,
      os                 TEXT NOT NULL,
      arch               TEXT NOT NULL,
      agent_version      TEXT NOT NULL,
      restic_version     TEXT NOT NULL,
      public_key         BLOB NOT NULL,             -- 32-byte Ed25519
      fingerprint        TEXT NOT NULL,             -- "SHA256:hex"
      announced_from_ip  TEXT NOT NULL,
      first_seen_at      TEXT NOT NULL,
      last_seen_at       TEXT NOT NULL,
      expires_at         TEXT NOT NULL
    );
    CREATE INDEX pending_hosts_expires ON pending_hosts(expires_at);
    CREATE INDEX pending_hosts_fingerprint ON pending_hosts(fingerprint);
    
  • Create: internal/store/pending_hosts.go: CreatePendingHost, GetPendingHostByFingerprint, ListPendingHosts, DeletePendingHost, TouchPendingHost, DeleteExpiredPendingHosts.

  • Create: internal/server/http/announce.go: POST /api/agents/announce accepts {hostname, os, arch, agent_version, restic_version, public_key (base64)}. Validates protocol_version implicitly via agent_version check. Token-bucket rate limit per source IP (10/min). Global cap 100 pending rows. Returns {fingerprint, pending_id, hostname_collision: bool}.

  • Test: announce_test.go — happy path; rate limit; cap; collision flag.

  • Step 8.1 Migration + store layer + tests.

  • Step 8.2 Endpoint + tests (use a fake clock + in-process token bucket).

  • Step 8.3 Commit.

Task 9 — P2-18b: Pending WS + accept/reject

Files:

  • Create: internal/server/ws/pending.go: GET /ws/agent/pending upgrade. Server issues a 32-byte nonce; agent signs it with its Ed25519 private key; server verifies against the public_key stored on the pending row keyed by the supplied pending_id. If valid, hold the connection open; on accept, push a single enrolled message containing {bearer_token, repo_credentials_aead_blob} and close cleanly. On reject, close with code 4001 + reason "rejected".

  • Create: internal/server/http/pending.go — admin-only POST /api/pending-hosts/{id}/accept (atomically: mint bearer, decrypt admin-supplied repo creds (passed in form), promote pending row → real hosts row, push enrolled to the open WS, audit-log) and POST /api/pending-hosts/{id}/reject (delete row + close socket).

  • Modify: server main.go route registration.

  • Test: integration test — fake agent opens pending WS, admin POST /accept, agent receives bearer.

  • Step 9.1 Pending WS handler with nonce-sign verify.

  • Step 9.2 Accept/reject endpoints. Accept reuses the existing token-consume path internally (mints persistent bearer from crypto.RandomToken-style helper, inserts host row + host_credentials).

  • Step 9.3 Tests.

  • Step 9.4 Commit.
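
The nonce-sign verification in Step 9.1 reduces to three small functions on top of crypto/ed25519. This is a sketch with hypothetical names (newNonce, signNonce, verifyNonce, demoKeypair); the WebSocket framing and the pending_id lookup are not shown.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// newNonce returns the 32-byte challenge the server sends to the agent.
func newNonce() ([]byte, error) {
	n := make([]byte, 32)
	_, err := rand.Read(n)
	return n, err
}

// signNonce is the agent side: sign the challenge with the private key.
func signNonce(priv ed25519.PrivateKey, nonce []byte) []byte {
	return ed25519.Sign(priv, nonce)
}

// verifyNonce is the server side: check the signature against the public
// key stored on the pending row. The length check matters because
// ed25519.Verify panics on a public key of the wrong size.
func verifyNonce(storedPub, nonce, sig []byte) bool {
	if len(storedPub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(ed25519.PublicKey(storedPub), nonce, sig)
}

// demoKeypair generates a throwaway keypair for the example.
func demoKeypair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func main() {
	pub, priv := demoKeypair()
	nonce, err := newNonce()
	if err != nil {
		panic(err)
	}
	sig := signNonce(priv, nonce)
	fmt.Println("valid:", verifyNonce(pub, nonce, sig))
}
```

Issuing a fresh nonce per connection is what prevents a captured signature from being replayed on a later upgrade.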

Task 10 — P2-18c: Agent announce path

Files:

  • Modify: cmd/agent/main.go — when RM_TOKEN is unset, switch to announce mode instead of erroring out. RM_SERVER still required.

  • Create: internal/agent/announce/announce.go — generate-or-load Ed25519 keypair (persisted as a file alongside secrets.enc, mode 0600). POST /api/agents/announce. Open /ws/agent/pending. Wait. On enrolled message, persist bearer to agent.yaml, persist repo creds via existing secrets store, exit announce mode and reconnect via the normal WS path.

  • Modify: deploy/install/install.sh — when RM_TOKEN is missing, run agent in announce mode and journalctl --follow until the agent prints the fingerprint, print it to the operator's terminal in big copy-friendly format, then keep following until enrolled.

  • Test: end-to-end test in internal/server/... using a fake agent.

  • Step 10.1 Keypair generation + persistence.

  • Step 10.2 Announce client + pending WS client; print SHA256:… fingerprint to stdout in a banner.

  • Step 10.3 Install script branch.

  • Step 10.4 Playwright: register a host via announce mode (run agent locally with no RM_TOKEN), log into UI, see Pending hosts panel with the fingerprint, click Accept, confirm host appears.

  • Step 10.5 Commit.

Task 11 — P2-18d: Pending hosts UI panel

Files:

  • Modify: web/templates/pages/dashboard.html — add Pending hosts panel above the host list when any pending rows exist.

  • Modify: dashboard handler — Store.ListPendingHosts(now) (auto-skips expired).

  • Add buttons → POST /api/pending-hosts/{id}/accept and /reject via HTMX.

  • Background sweeper for DeleteExpiredPendingHosts every 60s (mirror the existing offline-sweeper goroutine pattern).

  • Step 11.1 Sweeper goroutine.

  • Step 11.2 Dashboard handler + template.

  • Step 11.3 Accept form must include the same repo URL/user/pw fields as the token-mint form (admin still supplies repo creds at accept time).

  • Step 11.4 Playwright sweep.

  • Step 11.5 Commit.

Task 12 — P2-16: Windows service integration

Decision: Cannot test on Windows from WSL. Goal is a clean compile under GOOS=windows GOARCH=amd64 and code that follows the canonical golang.org/x/sys/windows/svc/example pattern. Untestable beyond compile + manual review; mark in commit message.

Files:

  • Create: internal/agent/service/service_windows.go (build tag //go:build windows) — implements svc.Handler. Execute starts the agent's main loop in a goroutine, listens for svc.Stop/svc.Shutdown, cancels ctx, waits.

  • Create: internal/agent/service/service_other.go (build tag //go:build !windows) — stub RunService that just runs the agent loop in the foreground.

  • Create: internal/agent/service/install_windows.go: Install, Uninstall, Start, Stop thin wrappers around the mgr package (golang.org/x/sys/windows/svc/mgr).

  • Modify: cmd/agent/main.go — sub-commands: install, uninstall, start, stop, run (default). run delegates to service.Run() which on Windows checks svc.IsWindowsService() and dispatches accordingly.

  • Test: internal/agent/service/service_windows_test.go (build-tagged) for argv parsing only — actual SCM interaction can't be tested in CI.

  • Step 12.1 Implement the svc.Handler shell.

  • Step 12.2 Install/uninstall wrappers (use mgr.Connect() for the local service control manager, then m.CreateService(name, exepath, mgr.Config{...}, "run")).

  • Step 12.3 Cross-compile check: GOOS=windows GOARCH=amd64 go build ./cmd/agent must succeed.

  • Step 12.4 Commit with note "untested on Windows; compile-verified only".

Task 13 — P2-17: install.ps1

Files:

  • Create: deploy/install/install.ps1 — PowerShell 5.1+ compatible. Checks admin elevation. Downloads agent binary from $RM_SERVER/agent/binary?os=windows&arch=amd64. Drops it at C:\Program Files\restic-manager\restic-manager-agent.exe. Runs restic-manager-agent.exe install (registers service). Starts it. Detects existing tasks named *restic* via Get-ScheduledTask and prints them — does not auto-disable. Writes C:\ProgramData\restic-manager\agent.yaml with RM_SERVER + RM_TOKEN (or no token if announce-mode).

  • Modify: internal/server/http/install.go (or wherever install scripts are served) to also serve /install/install.ps1.

  • Modify: CLAUDE.md restage block to also stage install.ps1.

  • Step 13.1 Write the script.

  • Step 13.2 Wire serving + restage.

  • Step 13.3 Smoke parse, if pwsh is on PATH: pwsh -NoProfile -Command '$null = [scriptblock]::Create((Get-Content deploy/install/install.ps1 -Raw))', which raises a parse error on invalid syntax (single quotes keep the calling shell from expanding $null). Alternative: pwsh -NoProfile -Command "Get-Command -Syntax (Get-ChildItem deploy/install/install.ps1)". Skip if no pwsh is available and note that in the commit.

  • Step 13.4 Commit.

Task 14 — Final integration sweep

  • Step 14.1 go vet ./... && go test ./... -race. Full build. Restage. Restart server.
  • Step 14.2 Playwright walkthrough on :8080: login → dashboard shows pending-hosts empty state → create source group → set a pre_hook → Run-now with bandwidth override → confirm hook fires + bandwidth applied → schedules tab shows next/last → repo page shows init-OK line → re-init flow gated by typed hostname.
  • Step 14.3 Update tasks.md: tick P2R-09, P2R-10, P2R-11, P2R-12, P2R-13, P2R-14, P2-16, P2-17, P2-18 done. Update Phase 2 acceptance line items as satisfied.
  • Step 14.4 Open PR p2-completion → main with a summary of every item closed.

Decisions made on the operator's behalf (away)

  1. Bandwidth UI for per-job override: small <details> disclosure under each Run-now button. Simpler than a modal; matches the rest of the app's progressive-disclosure style.
  2. Re-init UX: server dispatches a fresh init job; if restic refuses because the repo already exists, surfaces the error in the job log and instructs the operator to clear the remote bucket. We don't try to forcibly wipe — too dangerous, and the agent doesn't have credentials to wipe S3/B2/etc generically.
  3. Hooks editor lives on the Repo page (host defaults) + on the source-group edit form (per-group override). Skips inventing a new "Settings" tab since that surface is still inert.
  4. Announce flow: admin still supplies repo creds at accept time (same form as the token-mint flow). The pending row only carries identity-of-the-endpoint material, never repo creds.
  5. Windows service: compile-verified only; untested. Commit message will say so.