P2 Completion Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Close every remaining P2 task in tasks.md: P2R-09 (auto-init UX), P2R-10/11/12 (hooks), P2R-13 (bandwidth wiring + per-job override), P2R-14 (schedule next/last run), P2-16 (Windows svc), P2-17 (install.ps1), P2-18 (announce-and-approve).
Architecture: Server stays HTTP+WS; agent stays a single binary that auto-restages via make build. Hooks live on source_groups (and host-level defaults). Announce-and-approve adds a separate WS path (/ws/agent/pending) and a Pending hosts panel; token-flow stays default. Windows service support uses golang.org/x/sys/windows/svc behind a //go:build windows tag — Linux builds untouched. Operator is away — make best guesses on small UX choices, but commit each item separately so the choices are reviewable.
Tech Stack: Go 1.23+, chi router, modernc/sqlite, coder/websocket, robfig/cron/v3, HTMX + Tailwind, golang.org/x/sys/windows/svc, Ed25519 (stdlib).
Pre-flight
- Run baseline: go vet ./... && go build ./... && go test ./... must be green before starting. Restage the agent and restart the server (per the CLAUDE.md restage block) so the smoke env is warm.
Order of execution
Smallest blast-radius first. UI polish → bandwidth → next/last → hooks → announce → Windows. Commit and restage at each task boundary. Run go vet ./... && go test ./... before every commit.
Task 1 — P2R-13a: Wire bandwidth caps into restic invocations
Files:
- Modify internal/restic/runner.go: add LimitUploadKBps, LimitDownloadKBps to Env (or to a per-call options struct if one is already present); emit --limit-upload N / --limit-download N on restic backup|forget|prune|check|restore.
- Modify internal/agent/runner/*.go: pass host-wide caps into the runner. Caps come from agent.config.Config or are pushed via config.update. Decision: ship caps in the existing config.update envelope as new fields bandwidth_up_kbps, bandwidth_down_kbps. The server pushes them on hello and on PUT /api/hosts/{id}/bandwidth.
- Modify internal/api/messages.go: extend ConfigUpdatePayload with the two int pointers.
- Modify internal/server/ws/handler.go (or wherever the hello/config push lives): include the caps in the pushed config.
- Modify internal/server/http/host_bandwidth.go: after SetHostBandwidth, fan out a config.update to the connected agent (mirror the credentials-edit path).
- Test internal/restic/runner_test.go: assert flag injection.
- Test internal/server/ws/*_test.go: assert config.update carries the caps on hello and on edit.
- Step 1.1 Add LimitUploadKBps *int, LimitDownloadKBps *int to whatever per-host config the runner already consults. The existing pattern is restic.Env{}; extend it.
- Step 1.2 Failing test in internal/restic/runner_test.go: build a backup command with LimitUploadKBps=1024 and assert the resulting argv contains --limit-upload 1024.
- Step 1.3 Implement: prepend the flags in the argv builders for backup, forget, prune, check, restore. Skip when nil or <= 0.
- Step 1.4 Wire the config.update payload: the server reads Host.BandwidthUpKBps/DownKBps and includes them in the existing ConfigUpdatePayload push on hello and on bandwidth edit (mirror the cred-edit fan-out in internal/server/http/host_credentials.go).
- Step 1.5 Agent applies caps: store them in the in-memory dispatcher state on config.update and attach them to every restic call.
- Step 1.6 go vet ./... && go test ./... && make build && <restage block>. Commit:
  agent+server: apply host bandwidth caps to restic invocations
Task 2 — P2R-13b: Per-job override on Run-now confirm dialog
Decision: A small numeric input on the per-source-group Run-now button (and dashboard Run-all). Operator is away — keep it minimal: two optional inputs (up/down KB/s) on the dispatch endpoint; UI shows a <details> "Limit bandwidth for this run" disclosure with two number inputs.
Files:
- Modify internal/server/http/sources.go (or wherever the per-group Run-now POST lives): accept optional bandwidth_up_kbps/bandwidth_down_kbps form fields and pass them through.
- Modify the dispatch path (internal/server/dispatch_*.go, or the job-dispatch core in ws/handler.go): accept the overrides and include them in the command.run payload.
- Modify internal/api/messages.go: CommandRunPayload gains optional caps that take precedence over the host-wide caps when present.
- Modify the agent dispatcher: use the payload override if present, else fall back to the config caps.
- Modify web/templates/pages/host_sources.html (and the schedules Run-now form): add the <details> block.
- Test: HTTP test for the new form fields; agent runner test for override precedence.
- Step 2.1 Failing test: POST to the per-group Run-now with bandwidth_up_kbps=512 → assert the dispatched payload carries 512.
- Step 2.2 Implement the endpoint changes + payload extension.
- Step 2.3 Agent override precedence test (payload wins over config).
- Step 2.4 UI <details> blocks (one per Run-now form).
- Step 2.5 Playwright spot-check via the :8080 smoke env: open the Sources tab, expand the Run-now disclosure, fire with limit=128, then open the live job log and confirm the agent's restic argv shows --limit-upload 128 (read /tmp/rm-smoke/server.log for the dispatched command; it logs argv).
- Step 2.6 Commit.
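A minimal sketch of the override handling in Steps 2.1 to 2.3, assuming hypothetical Caps, parseOptionalKBps and effectiveCaps names. It resolves each direction independently, so a run that overrides only the upload cap still inherits the host-wide download cap; whether the real dispatcher should do that or treat the override as all-or-nothing is a design choice the plan leaves open.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Caps holds optional per-direction limits in KiB/s; nil means "not set".
type Caps struct {
	UpKBps   *int
	DownKBps *int
}

// parseOptionalKBps turns a form value into an optional cap; a blank field means no override.
func parseOptionalKBps(v string) (*int, error) {
	v = strings.TrimSpace(v)
	if v == "" {
		return nil, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 0 {
		return nil, fmt.Errorf("invalid bandwidth value %q", v)
	}
	return &n, nil
}

// pick returns override when set, otherwise fallback.
func pick(override, fallback *int) *int {
	if override != nil {
		return override
	}
	return fallback
}

// effectiveCaps gives the command.run override precedence over host-wide caps, per direction.
func effectiveCaps(runOverride, host Caps) Caps {
	return Caps{
		UpKBps:   pick(runOverride.UpKBps, host.UpKBps),
		DownKBps: pick(runOverride.DownKBps, host.DownKBps),
	}
}

func main() {
	hostCap := 1024
	up, err := parseOptionalKBps("512") // bandwidth_up_kbps from the Run-now form
	if err != nil {
		panic(err)
	}
	c := effectiveCaps(Caps{UpKBps: up}, Caps{UpKBps: &hostCap, DownKBps: &hostCap})
	fmt.Println(*c.UpKBps, *c.DownKBps)
}
```

Combined with Step 1.3's "skip when <= 0" rule, an explicit override of 0 would lift the cap for that run; reject 0 in the parser if that is not intended.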
Task 3 — P2R-14: Schedule "next run" / "last run"
Files:
- Modify internal/store/schedules.go: add a NextRunAt (time.Time) derivation helper and LatestScheduledJobAt(host_id, schedule_id) (time.Time, error) (or a single batched fetch for all schedules of a host).
- Modify the dashboard host row (web/templates/partials/host_row.html): show "Next: …" and "Last: …" when there's a single covering schedule (already detected in slice 5).
- Modify web/templates/pages/host_schedules.html: add Next/Last columns to the schedules table.
- Modify the relevant page handlers (internal/server/http/ui_schedules.go, dashboard handler): populate the data.
- Test schedules_test.go: next-run derivation (parse cron, compute next from a fixed now).
- Step 3.1 Add a NextRun(cronExpr string, from time.Time) (time.Time, error) helper using robfig/cron/v3: parse with cron.ParseStandard(cronExpr) (or with the same cron.Parser the scheduler already uses, so both agree on the field layout) and return the schedule's Next(from). Test with three crons.
- Step 3.2 Add a LatestJobByActorKindForSchedule(host_id, schedule_id) (time.Time, status, error) query against jobs (WHERE host_id=? AND actor_kind='schedule' AND schedule_id=? ORDER BY started_at DESC LIMIT 1).
- Step 3.3 Wire the schedules-page handler to populate Next/Last per row; render relative time + an ISO tooltip (mirror the existing formatRelTime template helper if it exists; otherwise use a simple "5m ago" helper).
- Step 3.4 Wire the dashboard row: when there is a single covering schedule, surface "Next: 03:00" / "Last: 8h ago · succeeded".
- Step 3.5 Playwright spot-check: a host with a schedule shows Next/Last; pause it → Next shows a dash placeholder / "(paused)".
- Step 3.6 Commit.
Task 4 — P2R-09: Auto-init UX polish
Files:
- Modify web/templates/pages/host_repo.html: danger-zone re-init button + two-step confirm (type the host name).
- Modify internal/server/http/ui_repo.go (or a new repo_reinit.go): POST /hosts/{id}/repo/reinit, admin-only, audit-logged. restic init has no --force option and restic never wipes an existing repository, so the operator must clear the bucket. Best guess: dispatch a normal init job with a flag that makes it run even if the repo is already marked initialised; if restic refuses because the remote already holds a repository, surface "the repo on the remote already has data; clear it manually before re-init" via the job log.
- Modify the host detail page header / vitals strip: surface the init result line. Use the existing latest-init-job query to render "repo ready · initialised ago" or "init failed · job N · retry".
- Test: HTTP test for the re-init endpoint (auth, audit, host-name confirm); template test that the result line renders for both states.
- Step 4.1 Reuse the existing helper: LatestJobByKind(host_id, "init") already exists from P2R-06 (store.LatestJobByKind), so no new query is needed.
- Step 4.2 Render the init line into the vitals strip; show "init failed" in amber when the latest init failed.
- Step 4.3 Implement the POST /hosts/{id}/repo/reinit handler: admin role check; require a confirm_hostname form field that must equal host.Name, returning 400 otherwise; dispatch a fresh init job.
- Step 4.4 Add the danger-zone re-init form to host_repo.html (currently disabled per slice 4). Two-step confirm with the typed hostname.
- Step 4.5 Playwright: visit /hosts/{id}/repo, click re-init, type the wrong hostname → blocked; type the right hostname → dispatches an init job → returns to the live log.
- Step 4.6 Commit.
Task 5 — P2R-10: Hook schema (migration 0010)
Files:
- Create internal/store/migrations/0010_hooks.sql:
    ALTER TABLE source_groups ADD COLUMN pre_hook BLOB;  -- AEAD ciphertext, NULLable
    ALTER TABLE source_groups ADD COLUMN post_hook BLOB;
    ALTER TABLE hosts ADD COLUMN pre_hook_default BLOB;
    ALTER TABLE hosts ADD COLUMN post_hook_default BLOB;
  All four are AEAD ciphertext (existing crypto.AEAD) in NULLable BLOB columns.
- Modify internal/store/types.go: add PreHook *string (decrypted) and PostHook *string to SourceGroup; same on Host.
- Modify internal/store/sources.go + internal/store/hosts.go: getters/setters encrypt on write and decrypt on read. Pass crypto.AEAD through (the pattern mirrors host_credentials.go).
- Test: encrypt/decrypt round-trip; setting nil clears the column.
- Step 5.1 Write the migration SQL. Column-level ALTERs only (per CLAUDE.md).
- Step 5.2 Update store types + getters/setters with AEAD encrypt/decrypt. Mirror the internal/store/host_credentials.go patterns exactly.
- Step 5.3 Round-trip test: set a hook on a source group; reload; assert the plaintext is returned. Set nil; assert nil after reload.
- Step 5.4 go vet && go test. Commit.
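The nil-clears-the-column contract from Step 5.3 can be pinned down independently of the store. This sketch uses stdlib AES-GCM as a stand-in for the project's crypto.AEAD (whose API may differ), stores nonce || ciphertext in the BLOB, and binds each value to its table, column and row through associated data so a ciphertext copied into another row fails to decrypt. That binding is an added assumption, not something the plan specifies; sealHook and openHook are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealHook encrypts an optional hook body. nil maps to a nil BLOB (SQL NULL),
// which is how setting a hook to nil clears the column.
func sealHook(aead cipher.AEAD, plaintext *string, ad []byte) ([]byte, error) {
	if plaintext == nil {
		return nil, nil
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends to nonce, so the stored value is nonce || ciphertext || tag.
	return aead.Seal(nonce, nonce, []byte(*plaintext), ad), nil
}

// openHook reverses sealHook; a NULL column (nil blob) decodes to a nil hook.
func openHook(aead cipher.AEAD, blob, ad []byte) (*string, error) {
	if blob == nil {
		return nil, nil
	}
	ns := aead.NonceSize()
	if len(blob) < ns {
		return nil, errors.New("hook ciphertext too short")
	}
	pt, err := aead.Open(nil, blob[:ns], blob[ns:], ad)
	if err != nil {
		return nil, err
	}
	s := string(pt)
	return &s, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	ad := []byte("source_groups.pre_hook:sg-1") // table.column:row-id
	hook := "pg_dump app > /var/backups/app.sql"
	blob, err := sealHook(aead, &hook, ad)
	if err != nil {
		panic(err)
	}
	got, err := openHook(aead, blob, ad)
	if err != nil {
		panic(err)
	}
	fmt.Println(*got == hook)
}
```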
Task 6 — P2R-11: Agent execution of hooks
Files:
- Modify internal/api/messages.go: ConfigUpdatePayload (or the per-source-group bundle inside ScheduleSetPayload) carries PreHook, PostHook as plaintext (the server has decrypted them by then; the wire is authenticated WS, the same trust boundary as repo creds).
- Modify the agent dispatcher, for kind=backup only:
  - Run pre_hook (if present) via os/exec with the host shell (/bin/sh -c on Linux, cmd.exe /C on Windows). Capture stdout+stderr into JobLog with a hook: prefix. A non-zero exit aborts the backup and marks the job failed with a pre_hook error.
  - Always run post_hook (if present) after the backup, with RM_JOB_STATUS=succeeded|failed in its environment. Capture into JobLog with the hook: prefix. A non-zero exit from post_hook does NOT change the job status (a warning is logged).
- Skip both for kind ∈ {forget, prune, check, unlock, init} per spec.md §14.3.
- Test: dispatcher test with a pre_hook that exits 1 → backup not started; post_hook always runs and sees RM_JOB_STATUS.
- Step 6.1 Plumb hooks through the ScheduleSetPayload source-group bundle + the per-group Run-now command.run payload. Resolution happens server-side: the group hook overrides the host default when both are present, and the host default applies when the group hook is empty.
- Step 6.2 Agent dispatcher: factor hook execution into internal/agent/runner/hooks.go. Use exec.CommandContext, set env, and plumb output to the existing JobLog stream with Source: "hook" (or prefix the log lines hook: …).
- Step 6.3 Failing test in internal/agent/runner/runner_test.go (create the file if absent): pre_hook=/bin/false → job fails with pre_hook failed (exit 1) and the actual restic backup never runs (assert via a mock-restic shim).
- Step 6.4 Test: post_hook runs even when the backup fails and receives RM_JOB_STATUS=failed.
- Step 6.5 Test: hooks are skipped on forget/prune/check/unlock/init jobs.
- Step 6.6 go vet && go test && make build && <restage block>. Commit.
Task 7 — P2R-12: Hook editor UI
Files:
- Modify web/templates/pages/source_group_edit.html (new, or extend the existing source-group form): a <textarea> for pre_hook and a <textarea> for post_hook, with the warning banner: "this hook runs as the agent service user (root on Linux; LocalSystem on Windows)".
- Modify the source-group HTTP handler (internal/server/http/sources.go): accept the hook fields on POST/PUT; encrypt and persist via the store.
- Modify web/templates/pages/host_repo.html: add pre_hook_default/post_hook_default under a new "Hooks" section. Decision: the host-detail Settings tab is still inert (per P1-25), so host defaults go on the Repo page rather than in a new tab or sub-tab.
- Modify the source-group form: admin-only check. Decision: admin-only edit per spec; render the hook fields but disable them for operators.
- Modify the audit-log writer: emit source_group.hook_updated and host.default_hook_updated events (without the hook body).
- Test: HTTP test for create + update; admin-only enforcement; audit row written without the secret.
- Step 7.1 Source-group form extension + handler wiring.
- Step 7.2 Repo page Hooks section (host defaults).
- Step 7.3 Audit entries.
- Step 7.4 Playwright: as admin, set a pre_hook of echo hello, fire Run-now, open the live log, and confirm a hook: hello line appears.
- Step 7.5 Commit.
Task 8 — P2-18a: Announce schema + endpoint
Files:
- Create internal/store/migrations/0011_pending_hosts.sql:
    CREATE TABLE pending_hosts (
      id                TEXT PRIMARY KEY,
      hostname          TEXT NOT NULL,
      os                TEXT NOT NULL,
      arch              TEXT NOT NULL,
      agent_version     TEXT NOT NULL,
      restic_version    TEXT NOT NULL,
      public_key        BLOB NOT NULL,  -- 32-byte Ed25519
      fingerprint       TEXT NOT NULL,  -- "SHA256:hex"
      announced_from_ip TEXT NOT NULL,
      first_seen_at     TEXT NOT NULL,
      last_seen_at      TEXT NOT NULL,
      expires_at        TEXT NOT NULL
    );
    CREATE INDEX pending_hosts_expires ON pending_hosts(expires_at);
    CREATE INDEX pending_hosts_fingerprint ON pending_hosts(fingerprint);
- Create internal/store/pending_hosts.go: CreatePendingHost, GetPendingHostByFingerprint, ListPendingHosts, DeletePendingHost, TouchPendingHost, DeleteExpiredPendingHosts.
- Create internal/server/http/announce.go: POST /api/agents/announce accepts {hostname, os, arch, agent_version, restic_version, public_key (base64)}. Validates protocol_version implicitly via the agent_version check. Token-bucket rate limit per source IP (10/min). Global cap of 100 pending rows. Returns {fingerprint, pending_id, hostname_collision: bool}.
- Test announce_test.go: happy path; rate limit; cap; collision flag.
- Step 8.1 Migration + store layer + tests.
- Step 8.2 Endpoint + tests (use a fake clock + an in-process token bucket).
- Step 8.3 Commit.
Task 9 — P2-18b: Pending WS + accept/reject
Files:
- Create internal/server/ws/pending.go: GET /ws/agent/pending upgrade. The server issues a 32-byte nonce; the agent signs it with its Ed25519 private key; the server verifies the signature against the public_key stored on the pending row keyed by the supplied pending_id. If valid, hold the connection open; on accept, push a single enrolled message containing {bearer_token, repo_credentials_aead_blob} and close cleanly. On reject, close with code 4001 and reason "rejected".
- Create internal/server/http/pending.go: admin-only POST /api/pending-hosts/{id}/accept (atomically: mint a bearer; encrypt the admin-supplied repo creds, which arrive in the form; promote the pending row → a real hosts row; push enrolled to the open WS; audit-log) and POST /api/pending-hosts/{id}/reject (delete the row + close the socket).
- Modify the server main.go route registration.
- Test: integration test. A fake agent opens the pending WS, an admin POSTs /accept, and the agent receives a bearer.
- Step 9.1 Pending WS handler with nonce-sign verify.
- Step 9.2 Accept/reject endpoints. Accept reuses the existing token-consume path internally (mints a persistent bearer from a crypto.RandomToken-style helper, inserts the host row + host_credentials).
- Step 9.3 Tests.
- Step 9.4 Commit.
Task 10 — P2-18c: Agent announce path
Files:
- Modify cmd/agent/main.go: when RM_TOKEN is unset, switch to announce mode instead of erroring out. RM_SERVER is still required.
- Create internal/agent/announce/announce.go: generate-or-load an Ed25519 keypair (persisted as a file alongside secrets.enc, mode 0600). POST /api/agents/announce. Open /ws/agent/pending. Wait. On the enrolled message, persist the bearer to agent.yaml, persist the repo creds via the existing secrets store, exit announce mode, and reconnect via the normal WS path.
- Modify deploy/install/install.sh: when RM_TOKEN is missing, run the agent in announce mode and journalctl --follow it until the agent prints the fingerprint; print the fingerprint to the operator's terminal in a big, copy-friendly format, then keep following until enrolled.
- Test: end-to-end test in internal/server/... using a fake agent.
- Step 10.1 Keypair generation + persistence.
- Step 10.2 Announce client + pending WS client; print the SHA256:… fingerprint to stdout in a banner.
- Step 10.3 Install script branch.
- Step 10.4 Playwright: register a host via announce mode (run the agent locally with no RM_TOKEN), log into the UI, see the Pending hosts panel with the fingerprint, click Accept, and confirm the host appears.
- Step 10.5 Commit.
Task 11 — P2-18d: Pending hosts UI panel
Files:
- Modify web/templates/pages/dashboard.html: add a Pending hosts panel above the host list when any pending rows exist.
- Modify the dashboard handler: Store.ListPendingHosts(now) (auto-skips expired rows).
- Add Accept/Reject buttons that POST to /api/pending-hosts/{id}/accept and /reject via HTMX.
- Background sweeper: run DeleteExpiredPendingHosts every 60s (mirror the existing offline-sweeper goroutine pattern).
- Step 11.1 Sweeper goroutine.
- Step 11.2 Dashboard handler + template.
- Step 11.3 The Accept form must include the same repo URL/user/pw fields as the token-mint form (the admin still supplies repo creds at accept time).
- Step 11.4 Playwright sweep.
- Step 11.5 Commit.
Task 12 — P2-16: Windows service integration
Decision: Cannot test on Windows from WSL. Goal is a clean compile under GOOS=windows GOARCH=amd64 and code that follows the canonical golang.org/x/sys/windows/svc/example pattern. Untestable beyond compile + manual review; mark in commit message.
Files:
- Create internal/agent/service/service_windows.go (build tag //go:build windows): implements svc.Handler. Execute starts the agent's main loop in a goroutine, listens for svc.Stop/svc.Shutdown, cancels ctx, and waits.
- Create internal/agent/service/service_other.go (build tag //go:build !windows): a stub Run (the same service.Run() entry point cmd/agent calls) that just runs the agent loop in the foreground.
- Create internal/agent/service/install_windows.go: Install, Uninstall, Start, Stop as thin wrappers around the mgr package. Pair them with //go:build !windows stubs that return an "unsupported on this OS" error, so cmd/agent still compiles on Linux.
- Modify cmd/agent/main.go: sub-commands install, uninstall, start, stop, run (default). run delegates to service.Run(), which on Windows checks svc.IsWindowsService() and dispatches accordingly.
- Test internal/agent/service/service_windows_test.go (build-tagged): argv parsing only; actual SCM interaction can't be tested in CI.
- Step 12.1 Implement the svc.Handler shell.
- Step 12.2 Install/uninstall wrappers (use mgr.Connect(), then m.CreateService(name, exepath, mgr.Config{...}, "run")).
- Step 12.3 Cross-compile check: GOOS=windows GOARCH=amd64 go build ./cmd/agent must succeed.
- Step 12.4 Commit with the note "untested on Windows; compile-verified only".
Task 13 — P2-17: install.ps1
Files:
- Create deploy/install/install.ps1: PowerShell 5.1+ compatible. Checks admin elevation. Downloads the agent binary from $RM_SERVER/agent/binary?os=windows&arch=amd64 and drops it at C:\Program Files\restic-manager\restic-manager-agent.exe. Writes C:\ProgramData\restic-manager\agent.yaml with RM_SERVER + RM_TOKEN (or no token in announce mode) before the service first starts. Runs restic-manager-agent.exe install (registers the service) and starts it. Detects existing scheduled tasks named *restic* via Get-ScheduledTask and prints them, but does not auto-disable them.
- Modify internal/server/http/install.go (or wherever install scripts are served) to also serve /install/install.ps1.
- Modify the CLAUDE.md restage block to also stage install.ps1.
- Step 13.1 Write the script.
- Step 13.2 Wire serving + restage.
- Step 13.3 Smoke parse, if pwsh is on PATH: pwsh -NoProfile -Command '$e = $null; $null = [System.Management.Automation.Language.Parser]::ParseFile("deploy/install/install.ps1", [ref]$null, [ref]$e); if ($e) { $e; exit 1 }'. Skip if no pwsh is available, and note that in the commit.
- Step 13.4 Commit.
Task 14 — Final integration sweep
- Step 14.1 go vet ./... && go test -race ./..., then a full build. Restage. Restart the server.
- Step 14.2 Playwright walkthrough on :8080: login → dashboard shows no pending hosts → create a source group → set a pre_hook → Run-now with a bandwidth override → confirm the hook fires + the bandwidth cap is applied → schedules tab shows next/last → repo page shows the init-OK line → re-init flow is gated by the typed hostname.
- Step 14.3 Update tasks.md: tick P2R-09, P2R-10, P2R-11, P2R-12, P2R-13, P2R-14, P2-16, P2-17, P2-18 as done. Mark the Phase 2 acceptance line items as satisfied.
- Step 14.4 Open PR p2-completion → main with a summary of every item closed.
Decisions made on the operator's behalf (away)
- Bandwidth UI for per-job override: a small <details> disclosure under each Run-now button. Simpler than a modal; matches the rest of the app's progressive-disclosure style.
- Re-init UX: the server dispatches a fresh init job; if restic refuses because the repo already exists, the error surfaces in the job log along with an instruction to clear the remote bucket. We don't try to forcibly wipe: it is too dangerous, and the agent doesn't have credentials to wipe S3/B2/etc. generically.
- Hooks editor lives on the Repo page (host defaults) + on the source-group edit form (per-group override). Skips inventing a new "Settings" tab since that surface is still inert.
- Announce flow: admin still supplies repo creds at accept time (same form as the token-mint flow). The pending row only carries identity-of-the-endpoint material, never repo creds.
- Windows service: compile-verified only; untested. Commit message will say so.