P2 redesign · phase 2.5: tasks.md rewrite + UI patch-up
CI / Test (linux/amd64) (push) Failing after 4m47s
CI / Lint (push) Failing after 26s
CI / Build (windows/amd64) (push) Successful in 54s
CI / Build (linux/amd64) (push) Successful in 46s
CI / Build (linux/arm64) (push) Successful in 46s

The store rewrite in 5667cdf left tasks.md describing a data shape
(fat schedules, host.repo_initialised_at, manual flag) that no longer
exists, and left the host-detail templates rendering against fields
the store no longer exposes. This commit reconciles both.

tasks.md
* Mid-phase pivot called out at the top of Phase 2 with commit hashes.
* P2-01..P2-05 kept as done but stamped ⚠️ "shipped against old shape
  — to re-validate under P2R-02".
* P2-04.5 (manual flag) struck as superseded.
* New P2R-NN section covering work that previously lived only in
  commit messages and code stubs:
    P2R-00.1/00.2/00.3/00.4 — phases already shipped (this commit
                              records 00.4)
    P2R-01 — REST + WS rewire against slim schedules + source groups
             + repo maintenance + auto-init
    P2R-02 — UI rewire against the v4 wireframes
    P2R-03..05 — prune / check / unlock command surfaces
    P2R-06 — server-side maintenance ticker (cadence-driven)
    P2R-07 — repo stats panel
    P2R-08 — pending_runs queue worker
    P2R-09 — auto-init UX polish
    P2R-10..12 — pre/post hooks rehomed from schedule onto source group
    P2R-13..14 — bandwidth + next/last-run surface
* P2-16/17/18 (Windows + announce-and-approve) untouched.
* Phase 2 acceptance criteria rewritten against the new model.

UI patch-up (P2R-00.4)
* host_detail.html + host_row.html: removed every $host.RepoInitialisedAt
  reference (column dropped in migration 0008 — render was 500'ing).
* Removed manual init-repo branches; the auto-init path replaces them.
* Schedules sub-tab demoted from active link to inert div until P2R-02
  rebuilds the page (it was linking to a raw 501 from the stubbed
  ui_schedules.go handlers).
* Disabled the four per-host Run-now buttons (dashboard row + host
  detail header + empty-snapshots state + right-rail) with a
  "lands in P2 Phase 4" hint — handler is 501-stubbed pending P2R-01,
  so leaving them clickable produced silent failures over htmx.
* Dashboard row-action becomes "Open →" instead of Run-now.

Project tooling
* .mcp.json at repo root: project-scoped Playwright MCP override.
  Forces --headless (so I don't pop a browser at the operator) and
  --output-dir _diag (so screenshots / traces land in the gitignored
  _diag/ directory rather than scattered at the repo root).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-03 09:13:05 +01:00
parent 5667cdf13a
commit 813158b3d6
4 changed files with 111 additions and 78 deletions
+11
View File
@@ -0,0 +1,11 @@
{
"playwright": {
"command": "npx",
"args": [
"@playwright/mcp@latest",
"--headless",
"--output-dir",
"_diag"
]
}
}
+93 -21
View File
@@ -97,24 +97,91 @@ Sizes: **S** = under a day, **M** = 13 days, **L** = 37 days.
## Phase 2 — Scheduling, retention, repo operations
- [x] **P2-01** (M) Schedule schema + CRUD API. `schedules` table was already laid down in 0001; this slice adds `store.Schedule`/`RetentionPolicy`/`ScheduleOptions` types, `CreateSchedule` / `GetSchedule` / `ListSchedulesByHost` / `UpdateSchedule` / `DeleteSchedule` / `GetHostScheduleVersion` / `SetHostAppliedScheduleVersion` (mutations bump `host_schedule_version` atomically in-tx), and REST endpoints `GET|POST /api/hosts/{id}/schedules` + `PUT|DELETE /api/hosts/{id}/schedules/{sid}`. Validation: cron-expr parses via `robfig/cron/v3` (same parser the agent will use, so anything that validates here will fire there); kind ∈ {backup, forget, prune, check} (init/unlock are operator-only); backup schedules require ≥1 path; hooks rejected on non-backup kinds (spec §14.3). Mutations audit-logged. Server + store tests cover the happy path, validation, and version bumps.
- [x] **P2-02** (L) Server-pushed schedule reconciliation. `pushScheduleSet*` helpers (on-hello + async post-CRUD flavours), wiring in `onAgentHello` (always pushes, even when the host has no repo creds yet), `pushScheduleSetAsync` called from Create/Update/Delete handlers (no-op when the host is offline; on-hello catches up). `MsgScheduleAck` handled in the WS dispatcher: `OnScheduleAck` callback persists `applied_schedule_version`. Agent-side `schedule.set` handler ships in P2-03; the server side now has parity tests.
- [x] **P2-03** (M) Agent local scheduler. New `internal/agent/scheduler` package wraps `robfig/cron/v3``Apply(ScheduleSetPayload, Sender)` stops the prior cron (waits for in-flight entries), rebuilds from scratch (skipping disabled entries + skipping bad cron exprs with a warn log), starts, and emits `schedule.ack`. On a tick the entry sends a new `schedule.fire` envelope to the server with `{schedule_id, scheduled_at}`. The server's `OnScheduleFire` callback (`dispatchScheduledJob`) looks up the schedule, builds args from kind, persists a `jobs` row with `actor_kind=schedule` + `scheduled_id`, and ships `command.run` back on the same conn — agent runs the job through the existing dispatcher. Tx is swapped on every Apply so reconnect is handled naturally (cron entries that fire against a dropped tx log + skip the tick). `CreateJob` now writes `scheduled_id`; this column was in the schema since 0001 but never populated. Tests: scheduler unit tests cover ack-on-apply, cron tick → fire envelope, disabled-entries silent, replace-prior-state stops the old cron. Server-side end-to-end test covers fire → command.run with the right job_id/kind/args, plus jobs row with actor_kind=schedule + scheduled_id linking back. **Deferred:** persistence of next-fire times across agent restarts (a missed fire window during downtime simply fires once on reconnect — desirable behaviour).
- [x] **P2-04.5** (S) Manual schedules — kill `host.default_paths`. Two independent path lists (host.default_paths fed Run-now while schedule.paths fed cron) was a real footgun that could put divergent file sets in the same repo. Replaced with a `manual` flag on schedules: same data shape, no cron, fires only via Run-now. Migration 0007 drops `host.default_paths` (ALTER TABLE DROP COLUMN — no rebuild dance, the original draft used the parent-table-rebuild pattern and FK cascade wiped every dependent table on the smoke env), seeds a manual schedule from any non-empty default_paths, and renames `enrollment_tokens.default_paths``initial_paths`. Add-host form retitled "Initial schedule · manual" so the operator knows where the paths land. Per-schedule Run-now button (`POST /hosts/{id}/schedules/{sid}/run`) reuses the same `dispatchScheduleNow` path used by `schedule.fire`. Dashboard's per-host Run-now picks the host's only enabled manual schedule, then falls back to the only enabled schedule, else returns "pick one in Schedules tab" — keeps one-click for the common case. Schedule edit form gains a "Manual schedule" toggle that hides the cron field when checked. Agent skips manual schedules in cron build. Validator allows missing cron when manual=true.
- [x] **P2-04** (M) Schedule editor UI. New "Schedules" sub-tab on host detail (header + run-now panel preserved across the snapshot/schedule pages). List view shows status, cron, paths, retention summary (`store.RetentionPolicy.Summary()` renders "last=7, d=14, w=4"), tags, and edit/delete buttons. The header carries a "version N · agent in sync / agent at vM" indicator backed by `host_schedule_version` + `applied_schedule_version`. Create/edit form covers cron expr (with quick-pick presets), paths textarea, excludes textarea, tags (comma-separated), retention (six numeric inputs mirroring restic's `--keep-*` flags), bandwidth caps, enabled toggle. Form validation re-renders with the operator's typed input still in place. Each save fires `pushScheduleSetAsync` so an online agent re-arms within a few seconds. Hooks UI deferred to P2-15 (lands when the hook execution path does).
- [x] **P2-05** (M) `forget` command with retention policy. Wire: `api.CommandRunPayload` gains `retention_policy json.RawMessage`. Restic wrapper: `restic.ForgetPolicy` mirrors restic's `--keep-*` flags + `args()` helper; `restic.Env.RunForget` rejects empty policy (would delete every snapshot), runs `restic forget --json --keep-* …`. Crucially does **not** pass `--prune` — pruning lives behind a separate admin-only credential (P2-06); forget just rewrites the snapshot index. Agent runner: `RunForget(jobID, policy)` mirrors `RunBackup`'s envelope shape; on success runs `reportSnapshots` since forget shrinks the host's snapshot list. Agent dispatcher: `case api.JobForget:` decodes retention from the wire and dispatches. Server `dispatchScheduleNow`: for `kind=forget` schedules, marshals `RetentionPolicy` into the wire payload (refuses if empty). UI: schedule edit form gains a Kind dropdown (Backup or Forget; immutable on edit), Paths block hides for forget, validator rejects forget without retention. Agent skips manual-mode forget schedules in cron (same as backup). End-to-end: operator creates a forget schedule with keep-last=N → cron fires → `schedule.fire` → server resolves schedule + retention → `command.run` with `kind=forget` + retention payload → agent runs `restic forget --keep-last N` → snapshots refresh → UI reflects shrunk count. Run-now button on the schedule row also dispatches.
- [ ] **P2-06** (M) `prune` command (admin-only, uses non-append-only credential)
- [ ] **P2-07** (S) `check` command (random subset + `--read-data-subset`)
- [ ] **P2-08** (S) `unlock` command
- [ ] **P2-09** (M) Repo stats panel: size, dedup ratio, snapshot count, last check time, lock state
- [ ] **P2-10** (S) Run-now buttons for forget/prune/check/unlock on host detail page
- [ ] **P2-11** (S) Schedule "next run" / "last run" surfaced on host card
- [ ] **P2-12** (S) Bandwidth limit fields on schedule editor (`--limit-upload`, `--limit-download`); also overridable on run-now jobs
- [ ] **P2-13** (M) Pre/post backup hooks: schema (`Schedule.pre_hook`, `Schedule.post_hook`, `Host.pre_hook_default`, `Host.post_hook_default`), encrypted at rest, admin-only edit, audit-logged
- [ ] **P2-14** (M) Agent execution of hooks: configurable shell per host, `pre_hook` failure aborts backup, `post_hook` always runs with `RM_JOB_STATUS` env var, stdout/stderr captured into `JobLog` with prefix
- [ ] **P2-15** (S) Hook editor UI on schedule + host pages, with sensible warnings (e.g. "this hook runs as the agent service user"); validation enforces hooks only on `kind = backup` schedules (see spec.md §14.3)
- [ ] **P2-16** (M) Windows service integration: agent runs under the Service Control Manager via `golang.org/x/sys/windows/svc`; install/uninstall/start/stop wired up
- [ ] **P2-17** (M) `install.ps1` (Windows): downloads agent, installs as service, enrolls; detects existing scheduled tasks named `*restic*` and prints them for manual review
> **Mid-phase pivot — "P2 redesign" (commits `7a7cac5`, `666af41`, `5667cdf`).**
> The original P2 plan put paths/excludes/retention/manual/kind/options on
> `Schedule` and one repo per host. After landing P2-01..P2-05 against that
> shape, the data model was rewritten: schedules are slim (cron + which
> `source_groups`); paths/excludes/retention/retry live on `source_group`
> (also doubles as the snapshot tag); forget/prune/check cadences live on
> `host_repo_maintenance` and run on a server-side ticker, not the agent
> cron; `pending_runs` queues offline retries; `host.repo_initialised_at`
> is gone (auto-init at enrolment). The redesign is captured below as
> `P2R-NN` items. Items P2-01..P2-05 stay marked done because the work
> shipped, but they're labelled ⚠️ **shipped against old shape — behaviour
> to be re-validated under P2R-02 after UI rewire**. P2-04.5 (`manual`
> flag) is dropped wholesale. P2-06..P2-15 are reframed below to point at
> their new homes; P2-16/17/18 are unaffected by the redesign.
### Original P2 work — shipped (against pre-redesign shape)
- [x] ⚠️ **P2-01** (M) Schedule schema + CRUD API (fat-schedule shape) — superseded by P2R-01.
- [x] ⚠️ **P2-02** (L) Server-pushed schedule reconciliation (fat-schedule shape) — re-validate under P2R-01.
- [x] ⚠️ **P2-03** (M) Agent local scheduler (`internal/agent/scheduler`, `robfig/cron/v3`, `schedule.fire` envelope, `dispatchScheduledJob`). The cron loop + ack/fire round-trip stay; the payload it carries reshapes under P2R-01.
- [x] ⚠️ **P2-04** (M) Schedule editor UI (fat-schedule form: paths/excludes/tags/retention/bandwidth on the schedule itself) — superseded by P2R-02.
- ~~**P2-04.5** Manual schedules / kill `host.default_paths`~~ — superseded; the `manual` flag concept is gone, Run-now lives on source groups (see P2R-01 / P2R-02).
- [x] ⚠️ **P2-05** (M) `forget` command with retention policy. Wire payload (`CommandRunPayload.retention_policy`) and restic wrapper (`restic.ForgetPolicy`, `RunForget`) are still correct; what changes under P2R-03 is **where retention comes from** (source_group, not schedule) and **who dispatches** (server-side maintenance ticker for cadence; per-source-group Run-now for ad-hoc).
### P2 redesign — Phase 1 ✅
- [x] **P2R-00.1** (M) Migration 0008 — sources + repo maintenance. Adds `source_groups`, `schedule_source_groups` junction, `host_repo_maintenance`, `pending_runs`, `host.bandwidth_up_kbps` / `bandwidth_down_kbps`. Drops `host.repo_initialised_at`. Slim-schedule columns dropped from `schedules`. Column-level ALTERs only — no table rebuilds (FK cascade trap, see CLAUDE.md). Commit `7a7cac5`.
- [x] **P2R-00.2** (M) v4 wireframes for sources / schedules / repo. Hi-fi mocks of `/hosts/{id}/sources`, `/sources/{gid}/edit` (with retention-conflict banner), slim `/schedules`, `/repo` (connection / bandwidth / maintenance / re-init). Commit `666af41`.
### P2 redesign — Phase 2 ✅
- [x] **P2R-00.3** (L) Go-side store rewrite against migration 0008. New types: `SourceGroup`, `HostRepoMaintenance`, `PendingRun`. `Schedule` slimmed to `{id, host_id, cron, enabled, source_group_ids, timestamps}`. `RetentionPolicy` moves from schedule field → source group field (type unchanged). `Host` loses `RepoInitialisedAt`, gains bandwidth caps. New files: `store/sources.go`, `store/maintenance.go`, `store/pending.go`. `store/schedules.go` rewritten for slim shape + junction CRUD. `enrollment.go` seeds a default source group + repo-maintenance row instead of a manual schedule. `ws/handler.go` drops `MarkHostRepoInitialised`. HTTP layer + UI templates **temporarily 501-stubbed** with `redesign_in_progress` — this is what P2R-01 / P2R-02 fill back in. Tests for the obsolete fat-schedule API deleted. Commit `5667cdf`.
- [x] **P2R-00.4** (S) Host-detail UI patched up enough to render: `RepoInitialisedAt` template refs removed, manual init-repo branches stripped, dead Schedules sub-tab demoted to inert div (matches Jobs/Repo/Settings), broken Run-now buttons disabled with P2-Phase-4 hints. Stop-gap until P2R-02 lands the real surface.
### P2 redesign — Phase 3 (REST + WS rewire) — TODO
- [ ] **P2R-01** (L) HTTP/WS layer against the slim shape:
- **Schedules REST CRUD**: `GET|POST /api/hosts/{id}/schedules`, `PUT|DELETE /api/hosts/{id}/schedules/{sid}`. Body shape is `{cron, enabled, source_group_ids[]}` — paths/excludes/retention/kind/manual all go away. Junction wiped + re-inserted on every update (per `store.UpdateSchedule`). Validation: cron parses via `robfig/cron/v3`; ≥1 `source_group_ids`; all referenced groups belong to the host.
- **Source-groups REST CRUD**: `GET|POST /api/hosts/{id}/source-groups`, `GET|PUT|DELETE /api/hosts/{id}/source-groups/{gid}`. Body: `{name, includes[], excludes[], retention_policy, retry_max, retry_backoff_seconds}`. Name uniqueness per host. Refuse delete if `SchedulesUsingGroup(gid)` is non-empty (return the schedule list so UI can show "remove from these schedules first"). Mutations bump `host_schedule_version`.
- **Repo-maintenance REST**: `GET|PUT /api/hosts/{id}/repo-maintenance`. Body: `{forget_cadence, prune_cadence, check_cadence, check_subset_pct, enabled}`. Server-side ticker drives execution (P2R-04), so updates here do **not** bump `host_schedule_version`.
- **Per-source-group Run-now**: `POST /hosts/{id}/source-groups/{gid}/run`. Reuses the existing `dispatchScheduleNow`-style path; agent receives a normal `command.run` carrying the resolved includes/excludes/retention from the group. This replaces the old per-host `/hosts/{id}/run-backup` endpoint (kept around as a 410-Gone with a hint pointing to source groups).
- **`schedule_push.go` reconciliation**: rebuild `pushScheduleSet*` to ship the new wire format (`ScheduleSetPayload` carries `[{schedule_id, cron, enabled, source_groups: [{name, includes, excludes, retention, retry_*}]}]` — agent doesn't need to know `source_group_id`, just the resolved bundle). On-hello + async-on-CRUD flavours; ack still persists `applied_schedule_version`.
- **Auto-init at enrolment**: server dispatches `restic init` on first WS connect (was P2-old "Init repo" button — now invisible to the operator). On success: emit a normal job row with `kind=init` so the audit trail still shows it. On `init` returning "config file already exists" (e.g. re-enrolment against an existing repo): treat as soft success per existing restic-wrapper behaviour.
- **Tests**: rewrite the deleted `schedules_test.go` and `schedule_push_test.go` against new endpoints; new `source_groups_test.go`, `repo_maintenance_test.go`, `auto_init_test.go`. End-to-end: enrol → server pushes creds → server dispatches init → agent runs it → schedule reconcile fires → operator hits per-source-group Run-now → backup runs → snapshots refresh.
### P2 redesign — Phase 4 (UI rewire, against v4 wireframes) — TODO
- [ ] **P2R-02** (L) UI templates rebuilt against the new model:
- `/hosts/{id}/sources` — list of source groups with per-row meta (includes/excludes count, retention summary via `RetentionPolicy.Summary()`, usage = which schedules reference this group, snapshot count for `tag = group.name`). Run-now / Edit / Delete actions per row.
- `/hosts/{id}/sources/{gid}/edit` (and `/sources/new`) — name (= snapshot tag), includes/excludes textareas, retention as a 3×2 keep-* grid, retry-on-offline, inline conflict banner above retention when granularity ↔ cadence mismatch detected (uses `SourceGroup.conflict_dimension` cache).
- `/hosts/{id}/schedules` — slim list (status / cron / source-tags / actions) plus new-schedule form (cron with quick-pick chips, source-group multi-select via clickable check pickers, enabled toggle).
- `/hosts/{id}/repo` — connection (URL/user/password — pre-filled from `GET /api/hosts/{id}/repo-credentials` redacted view; cert pin), bandwidth caps (host-wide), maintenance rows (forget/prune/check cadence + check subset %), danger-zone re-init.
- **Re-enable the four host-detail sub-tabs** (Snapshots is already live; Schedules / Sources / Repo become real links again; Settings stays inert until later). Drop the stop-gap inert-div hack from P2R-00.4.
- **Per-source-group Run-now buttons** replace today's per-host `Run backup now` buttons (right-rail + dashboard row + empty-snapshots state). Dashboard row's Run-now becomes either "Run all groups" (if exactly one schedule covers all groups) or "Open →" (multi-group hosts).
- Header "version N · agent in sync / agent at vM" indicator preserved (still backed by `host_schedule_version` + `applied_schedule_version`).
- Form validation re-renders with the operator's typed input intact (mirror P2-04's behaviour). Each save fires `pushScheduleSetAsync` so an online agent re-arms within seconds.
### P2 redesign — Phase 5 (server-side maintenance ticker) — TODO
- [ ] **P2R-03** (M) `prune` command end-to-end. Restic wrapper (`restic.RunPrune`), agent dispatcher (`case api.JobPrune:`), wire envelope. **Admin-only credential**: a second `host_credentials` row keyed by `host_id` + `kind=admin` carries the non-append-only username/password; server pushes it via `config.update` only when dispatching a prune job, and the agent's secrets store keeps it in a separate slot from the everyday append-only creds. UI: prune row on the Repo page. Operator-triggered Run-now via `POST /hosts/{id}/repo/prune`. Cadence-driven dispatch lands in P2R-04.
- [ ] **P2R-04** (M) `check` command end-to-end (`restic check --read-data-subset N%`). Wrapper + dispatcher + wire. UI: check row on the Repo page (with the subset % slider). Operator Run-now via `POST /hosts/{id}/repo/check`. Cadence-driven dispatch lands in P2R-05.
- [ ] **P2R-05** (S) `unlock` command end-to-end (`restic unlock`). Operator-only — no cadence. `POST /hosts/{id}/repo/unlock`. Repo page surfaces lock state from the most recent `check` (which warns about stale locks).
- [ ] **P2R-06** (M) Server-side maintenance ticker. Cron-style loop on the server reads `host_repo_maintenance` rows, dispatches `forget` / `prune` / `check` jobs against the right host on the configured cadence (last-run timestamps tracked per kind on the maintenance row). Independent of the agent's local cron — the agent's cron only handles backup schedules now. Skips offline hosts (queues to `pending_runs` instead — see P2R-08). Handles ticker restarts cleanly (no-op if a job of the same kind ran inside the cadence window).
- [ ] **P2R-07** (S) Repo stats panel on the Repo page: size, dedup ratio, snapshot count, last-check timestamp + result, lock state, last-prune timestamp + bytes-freed. Backed by parsing `restic stats --json` output that the agent ships periodically (piggyback on the existing snapshots-report path).
- [ ] **P2R-08** (M) Pending-runs queue worker. On agent reconnect, server drains `pending_runs` rows for that host and re-dispatches them in order. Bump backoff per `pending_run.attempt_count`; drop rows that have exceeded the source-group's `retry_max`. Audit-logged. Smoke-tested by stopping the agent, running maintenance ticker so cadence misses, restarting agent, watching the queue drain.
### P2 redesign — Phase 6 (auto-init follow-up) — TODO
- [ ] **P2R-09** (S) Auto-init UX polish. Surface init result on host detail (small "repo ready · initialised by you on …" line; or "init failed — see job N · retry" if init failed). Re-init button on Repo page danger zone wipes then re-runs init (admin only, audit-logged, two-step confirm with the host name typed in).
### Pre/post hooks (rehomed onto source groups) — TODO
- [ ] **P2R-10** (M) Hook schema: `source_group.pre_hook`, `source_group.post_hook`, `host.pre_hook_default`, `host.post_hook_default`. Encrypted at rest (existing `crypto.AEAD`). Admin-only edit. Audit-logged.
- [ ] **P2R-11** (M) Agent execution of hooks: configurable shell per host. `pre_hook` failure aborts the backup. `post_hook` always runs with `RM_JOB_STATUS` env var. Stdout/stderr captured into `JobLog` with a `hook:` prefix. Hooks only run for `kind=backup` jobs (forget/prune/check/unlock skip them, per spec.md §14.3).
- [ ] **P2R-12** (S) Hook editor UI on source-group edit page (per-group override) and host Settings tab (host-wide default). Validation rejects non-backup contexts. Warning banner: "this hook runs as the agent service user (root on Linux; LocalSystem on Windows)".
### Bandwidth + niceties (rehomed onto host + source groups) — TODO
- [ ] **P2R-13** (S) Bandwidth limit fields. Host-wide caps (`Host.BandwidthUpKBps`, `BandwidthDownKBps` — schema is in 0008 already, just needs UI on the Repo page) applied to every restic invocation. Per-job override on Run-now (override field on the Run-now confirm dialog). Maps to `restic --limit-upload` / `--limit-download`.
- [ ] **P2R-14** (S) Schedule "next run" / "last run" surfaced on host card (dashboard row) + on the Schedules tab. "Next run" computed server-side from cron + now; "last run" from the most recent job with `actor_kind=schedule` for any schedule that uses any of the host's source groups.
### Cross-platform + alt-enrolment (unchanged by redesign) — TODO
- [ ] **P2-16** (M) Windows service integration: agent runs under the Service Control Manager via `golang.org/x/sys/windows/svc`; install/uninstall/start/stop wired up.
- [ ] **P2-17** (M) `install.ps1` (Windows): downloads agent, installs as service, enrolls; detects existing scheduled tasks named `*restic*` and prints them for manual review.
- [ ] **P2-18** (L) Announce-and-approve enrollment (second enrollment mode, alongside the token flow that ships in Phase 1):
- Agent run with no `RM_TOKEN` generates a local Ed25519 keypair (persisted alongside the encrypted secrets blob), then `POST /api/agents/announce` with `{hostname, os, arch, agent_version, restic_version, public_key}`. Server stores a `pending_hosts` row (`public_key`, `fingerprint = sha256(public_key)`, `announced_from_ip`, `first_seen_at`, `last_seen_at`, `expires_at = now+1h`). Hostname collisions with existing or other pending rows are flagged in the response so the install script can warn loudly on the endpoint terminal.
- Agent then opens a long-poll/WS to `/ws/agent/pending` authenticated by signing a server-issued nonce with its private key — proves possession of the key tied to the pending row. Connection stays open; agent waits.
@@ -125,9 +192,13 @@ Sizes: **S** = under a day, **M** = 13 days, **L** = 37 days.
### Phase 2 acceptance
- Schedules created in UI run on agents on time; retention is applied; admin can prune from UI; repo health visible per host. Pre/post hooks fire correctly (verified with a Docker stop/start example and a `mysqldump` example) and are rejected on non-backup schedule kinds. Bandwidth limits honoured.
- A Windows host can enroll, appear in the dashboard, and run a backup with live log streaming — closing the cross-platform gap left by Phase 1.
- A Linux host can enroll via announce-and-approve: operator runs the install script with no token, sees a fingerprint in the terminal, finds the matching pending row in the UI, clicks accept, and the host is fully credentialled and online without further endpoint interaction. Rejecting a pending row leaves the agent process exited cleanly with a clear log line. Rate-limit and pending-cap guards verified under a synthetic flood.
- A host can be onboarded end-to-end with no manual REST: enrol → auto-init runs → operator opens host → creates source group(s) → attaches them to one or more schedules → schedule fires on time → backup runs against the right paths with the right retention → snapshots tagged by group name appear in UI.
- Operator can hit Run-now per source group from any of: dashboard row (single-group host), source group row, snapshot empty-state.
- Server-side maintenance ticker drives forget/prune/check at the configured cadences, independent of agent cron. Offline hosts queue to `pending_runs` and drain on reconnect.
- Pre/post hooks fire correctly per source group, fail loudly on `pre_hook` errors, run `post_hook` with `RM_JOB_STATUS`. Rejected on non-backup kinds.
- Bandwidth limits honoured (host-wide default + per-run override).
- A Windows host can enrol, appear in the dashboard, and run a backup with live log streaming.
- A Linux host can enrol via announce-and-approve, with fingerprint-comparison gate enforced. Rate-limit + pending-cap guards verified.
---
@@ -189,3 +260,4 @@ Sizes: **S** = under a day, **M** = 13 days, **L** = 37 days.
- [ ] **X-02** Track restic version compatibility matrix
- [ ] **X-03** Periodic dependency updates (`dependabot` or `renovate`)
- [ ] **X-04** Threat-model review at end of each phase
- [ ] **X-05** Proper first-run onboarding UI: admin shouldn't need to `curl` `/api/bootstrap` by hand. Render the bootstrap form on the same login page (extra "setup token" field shown only while no admin user exists, hidden after); on submit POST to `/api/bootstrap`, then drop straight into a session. Surface the one-time token from the server log somewhere copy-able (or print a clickable URL with the token in the query string at first-run). Also: relax the 12-char password floor for the first-run path or document it in the form so `admin` doesn't silently fail validation.
+6 -42
View File
@@ -38,20 +38,7 @@
</div>
</div>
<div class="flex items-center gap-2">
{{if eq $host.Status "offline"}}
<button class="btn" disabled title="agent is offline">Run&nbsp;backup&nbsp;now</button>
{{else if not $host.RepoInitialisedAt}}
<button class="btn btn-danger"
hx-post="/hosts/{{$host.ID}}/init-repo"
hx-swap="none"
hx-disabled-elt="this"
title="restic repo not yet initialised — run this once before the first backup">Initialise repo</button>
{{else}}
<button class="btn btn-primary"
hx-post="/hosts/{{$host.ID}}/run-backup"
hx-swap="none"
hx-disabled-elt="this">Run&nbsp;backup&nbsp;now</button>
{{end}}
<button class="btn" disabled title="per-source-group Run-now lands in P2 Phase 4">Run&nbsp;backup&nbsp;now</button>
<button class="btn">Edit credentials</button>
<button class="btn btn-ghost text-base px-2.5"></button>
</div>
@@ -93,7 +80,7 @@
{{/* ---------- secondary tabs ---------- */}}
<div class="flex items-end mt-1.5">
<a class="sub-tab active" href="/hosts/{{$host.ID}}">Snapshots <span class="mono text-ink-fade text-[11px] ml-1">{{comma $host.SnapshotCount}}</span></a>
<a class="sub-tab" href="/hosts/{{$host.ID}}/schedules">Schedules</a>
<div class="sub-tab" title="schedules UI lands in P2 Phase 4">Schedules</div>
<div class="sub-tab">Jobs</div>
<div class="sub-tab">Repo</div>
<div class="sub-tab">Settings</div>
@@ -118,21 +105,9 @@
<p class="text-pretty text-ink-mute text-[13px] mt-2 mx-auto max-w-[440px] leading-[1.65]">
Once a backup completes, the agent will refresh this list automatically.
</p>
{{if ne $host.Status "offline"}}
<div class="mt-5">
{{if not $host.RepoInitialisedAt}}
<button class="btn btn-danger"
hx-post="/hosts/{{$host.ID}}/init-repo"
hx-swap="none"
hx-disabled-elt="this">Initialise repo</button>
{{else}}
<button class="btn btn-primary"
hx-post="/hosts/{{$host.ID}}/run-backup"
hx-swap="none"
hx-disabled-elt="this">Run now</button>
{{end}}
</div>
{{end}}
<div class="mt-5">
<button class="btn" disabled title="per-source-group Run-now lands in P2 Phase 4">Run now</button>
</div>
</div>
{{else}}
<div class="hairline grid items-baseline px-4 py-2.5 text-[11px] text-ink-fade uppercase tracking-[0.08em]"
@@ -176,18 +151,7 @@
<div class="panel rounded-[7px] px-4 py-3.5">
<div class="text-[11px] text-ink-fade uppercase tracking-[0.1em] mb-2.5">Run-now</div>
<div class="flex flex-col gap-1.5">
{{if not $host.RepoInitialisedAt}}
<button class="btn justify-start w-full text-bad font-medium {{if eq $host.Status "offline"}}opacity-50 cursor-not-allowed pointer-events-none{{end}}"
hx-post="/hosts/{{$host.ID}}/init-repo"
hx-swap="none"
hx-disabled-elt="this"
title="restic repo not yet initialised — click to run `restic init` once">init</button>
{{end}}
<button class="btn justify-start w-full {{if or (eq $host.Status "offline") (not $host.RepoInitialisedAt)}}opacity-50 cursor-not-allowed pointer-events-none{{end}}"
hx-post="/hosts/{{$host.ID}}/run-backup"
hx-swap="none"
hx-disabled-elt="this"
{{if not $host.RepoInitialisedAt}}title="initialise the repo first"{{end}}>backup</button>
<button class="btn justify-start w-full" disabled title="per-source-group Run-now lands in P2 Phase 4">backup <span class="text-[10px] text-ink-fade ml-1.5">P2</span></button>
<button class="btn justify-start w-full" disabled title="lands with P2-05">forget <span class="text-[10px] text-ink-fade ml-1.5">P2</span></button>
<button class="btn justify-start w-full" disabled title="lands with P2-06">prune <span class="text-[10px] text-ink-fade ml-1.5">admin</span></button>
<button class="btn justify-start w-full" disabled title="lands with P2-07">check <span class="text-[10px] text-ink-fade ml-1.5">P2</span></button>
+1 -15
View File
@@ -51,22 +51,8 @@
<span class="mono text-xs text-ink-fade">offline</span>
{{- else if .CurrentJobID -}}
<a href="/jobs/{{deref .CurrentJobID}}" class="btn btn-ghost">View job →</a>
{{- else if not .RepoInitialisedAt -}}
<button class="btn btn-danger"
hx-post="/hosts/{{.ID}}/init-repo"
hx-swap="none"
hx-disabled-elt="this"
title="restic repo not yet initialised">Init repo</button>
{{- else if eq (deref .LastBackupStatus) "failed" -}}
<button class="btn"
hx-post="/hosts/{{.ID}}/run-backup"
hx-swap="none"
hx-disabled-elt="this">Retry</button>
{{- else -}}
<button class="btn whitespace-nowrap"
hx-post="/hosts/{{.ID}}/run-backup"
hx-swap="none"
hx-disabled-elt="this">Run&nbsp;now</button>
<a href="/hosts/{{.ID}}" class="btn btn-ghost whitespace-nowrap" title="per-source-group Run-now lands in P2 Phase 4 — open the host">Open →</a>
{{- end -}}
</div>
</div>