tasks: tick P2 completion + Playwright sweep screenshots

P2R-09/10/11/12/13/14 and P2-16/17/18 are all marked done. The acceptance line for Windows hosts is annotated as 'compile-verified, untested in CI'.

_diag/p2-completion-sweep/ holds the dashboard, host-detail, schedules, sources, repo, and source-group-edit screenshots from a clean sweep against :8080, with zero console errors throughout.

announce_test.go: the rate-limit and global-cap tests drop t.Parallel to avoid racing on the package-level tunables under -race.
@@ -111,10 +111,9 @@ func TestAnnounceHostnameCollisionFlag(t *testing.T) {
 }
 
 func TestAnnounceRateLimit(t *testing.T) {
-	t.Parallel()
+	// Not t.Parallel — mutates the package-level announceMaxPerMin
+	// var, which would otherwise race other announce tests.
 	_, url, _ := newTestServerWithHub(t)
-	// Lower the limit for the duration of this test (the limiter is
-	// per-server-instance so we don't disturb parallel tests).
 	prev := announceMaxPerMin
 	announceMaxPerMin = 2
 	t.Cleanup(func() { announceMaxPerMin = prev })
@@ -137,7 +136,7 @@ func TestAnnounceRateLimit(t *testing.T) {
 }
 
 func TestAnnounceGlobalCap(t *testing.T) {
-	t.Parallel()
+	// Not t.Parallel — mutates the package-level announceGlobalCap.
 	_, url, st := newTestServerWithHub(t)
 	prev := announceGlobalCap
 	announceGlobalCap = 1
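Both tests above save a package-level tunable, lower it, and restore it through `t.Cleanup`. A minimal sketch of that pattern as a reusable helper follows. The names `overrideInt`, `cleaner`, and `fakeT` are hypothetical and do not appear in this commit; `fakeT` exists only so the sketch runs outside `go test`. A real `*testing.T` satisfies `cleaner` as-is.

```go
package main

import "fmt"

// announceMaxPerMin stands in for the package-level tunable named in the diff.
// The default of 10 is an assumption for this sketch.
var announceMaxPerMin = 10

// cleaner is the subset of *testing.T that the helper needs.
type cleaner interface {
	Cleanup(func())
}

// overrideInt sets *p to v and registers a cleanup that restores the old value.
// Callers must not use t.Parallel: the write to *p is visible package-wide.
func overrideInt(c cleaner, p *int, v int) {
	prev := *p
	*p = v
	c.Cleanup(func() { *p = prev })
}

// fakeT records cleanups so the sketch can run without the testing package.
type fakeT struct{ cleanups []func() }

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

// finish runs cleanups last-in, first-out, mirroring the testing package.
func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

func main() {
	t := &fakeT{}
	overrideInt(t, &announceMaxPerMin, 2)
	fmt.Println("during test:", announceMaxPerMin)
	t.finish()
	fmt.Println("after test:", announceMaxPerMin)
}
```

The helper does not make the mutation safe under `t.Parallel`; it only guarantees the restore. Moving the tunables onto the server instance would remove the shared state, but that is a larger change than this commit makes.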
@@ -178,26 +178,26 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
 - [x] **P2R-07** (S) Repo stats panel on the Repo page: total size, raw size, last-check timestamp + status (color-coded), last-prune timestamp, stale-lock banner. Backed by `restic stats --json --mode raw-data` that the agent ships in a `repo.stats` envelope after every backup / check / prune / unlock; persisted via `Store.UpsertHostRepoStats` into a new `host_repo_stats` projection table.
 - [x] **P2R-08** (M) Pending-runs queue worker. Scheduled backup fires that race an agent disconnect queue to `pending_runs`. Drained on a 30s server-side tick **and** on agent reconnect (via `onAgentHello`); per-host TryLock mutex prevents the two paths double-dispatching the same row. Exponential backoff capped at 30 minutes; abandons rows that exceed the source-group's `retry_max` (audit-logged) or whose schedule/group has genuinely been deleted.
 
-### P2 redesign — Phase 6 (auto-init follow-up) — TODO
+### P2 redesign — Phase 6 (auto-init follow-up) ✅
 
-- [ ] **P2R-09** (S) Auto-init UX polish. Surface init result on host detail (small "repo ready · initialised by you on …" line; or "init failed — see job N · retry" if init failed). Re-init button on Repo page danger zone wipes then re-runs init (admin only, audit-logged, two-step confirm with the host name typed in).
+- [x] **P2R-09** (S) Auto-init UX polish. Latest `init` job status surfaced under the host-detail vitals strip (succeeded/failed/running/queued, with link to the live job log on non-success). Danger-zone `POST /hosts/{id}/repo/reinit` dispatches a fresh init job after the operator types the host name to confirm; audit row records `host.repo_reinit`.
 
-### Pre/post hooks (rehomed onto source groups) — TODO
+### Pre/post hooks (rehomed onto source groups) ✅
 
-- [ ] **P2R-10** (M) Hook schema: `source_group.pre_hook`, `source_group.post_hook`, `host.pre_hook_default`, `host.post_hook_default`. Encrypted at rest (existing `crypto.AEAD`). Admin-only edit. Audit-logged.
-- [ ] **P2R-11** (M) Agent execution of hooks: configurable shell per host. `pre_hook` failure aborts the backup. `post_hook` always runs with `RM_JOB_STATUS` env var. Stdout/stderr captured into `JobLog` with a `hook:` prefix. Hooks only run for `kind=backup` jobs (forget/prune/check/unlock skip them, per spec.md §14.3).
-- [ ] **P2R-12** (S) Hook editor UI on source-group edit page (per-group override) and host Settings tab (host-wide default). Validation rejects non-backup contexts. Warning banner: "this hook runs as the agent service user (root on Linux; LocalSystem on Windows)".
+- [x] **P2R-10** (M) Hook schema: migration 0010 adds `pre_hook`/`post_hook` BLOB columns to `source_groups` and `pre_hook_default`/`post_hook_default` to `hosts`. Bytes stored verbatim — AEAD encrypt/decrypt at the HTTP layer (per-slot AD bytes). Round-trip tests cover set/clear semantics on both tables.
+- [x] **P2R-11** (M) Agent execution of hooks: `runner.BackupHooks` + `runHook` helper invoked via `/bin/sh -c` (`cmd.exe /C` on Windows). pre_hook non-zero exit aborts the backup; post_hook always runs with `RM_JOB_STATUS=succeeded|failed` in env. Output streamed as `hook(<phase>): …` log.stream lines. Hooks only run for `kind=backup`. Server side resolves group → host default → empty and ships plaintext on the WS payload (decrypt at HTTP layer).
+- [x] **P2R-12** (S) Hook editor UI: source-group edit form gains pre/post hook textareas with the service-user warning banner; bodies AEAD-encrypted on save (per-group AD). Repo page adds a host-default Hooks panel with the same shape; saved via `POST /hosts/{id}/repo/hooks`.
 
-### Bandwidth + niceties (rehomed onto host + source groups) — TODO
+### Bandwidth + niceties (rehomed onto host + source groups) ✅
 
-- [ ] **P2R-13** (S) Bandwidth limit fields. Host-wide caps (`Host.BandwidthUpKBps`, `BandwidthDownKBps` — schema is in 0008 already, just needs UI on the Repo page) applied to every restic invocation. Per-job override on Run-now (override field on the Run-now confirm dialog). Maps to `restic --limit-upload` / `--limit-download`.
-- [ ] **P2R-14** (S) Schedule "next run" / "last run" surfaced on host card (dashboard row) + on the Schedules tab. "Next run" computed server-side from cron + now; "last run" from the most recent job with `actor_kind=schedule` for any schedule that uses any of the host's source groups.
+- [x] **P2R-13** (S) Bandwidth limit fields. `restic.Env` gains `LimitUploadKBps`/`LimitDownloadKBps`, emitted as `--limit-upload`/`--limit-download` global flags before the subcommand on every invocation. Agent dispatcher tracks host-wide caps received via `config.update`; server pushes them on hello and after `PUT /api/hosts/{id}/bandwidth`. Per-job override on the per-source-group Run-now form (collapsed `<details>` "Limit bandwidth for this run" with two KB/s inputs); override wins over host caps.
+- [x] **P2R-14** (S) Schedule "next run" / "last run". New `store.LatestJobBySchedule` query. Schedules tab grows two columns (Next derived from cron via `robfig/cron/v3.Parse(...).Next`, Last from latest `actor_kind=schedule` job). Dashboard host row prepends `next 12h ago/from now` when a single covering schedule is the run-now candidate.
 
-### Cross-platform + alt-enrolment (unchanged by redesign) — TODO
+### Cross-platform + alt-enrolment ✅
 
-- [ ] **P2-16** (M) Windows service integration: agent runs under the Service Control Manager via `golang.org/x/sys/windows/svc`; install/uninstall/start/stop wired up.
-- [ ] **P2-17** (M) `install.ps1` (Windows): downloads agent, installs as service, enrolls; detects existing scheduled tasks named `*restic*` and prints them for manual review.
-- [ ] **P2-18** (L) Announce-and-approve enrollment (second enrollment mode, alongside the token flow that ships in Phase 1):
+- [x] **P2-16** (M) Windows service integration: `internal/agent/service` (build-tagged) implements `svc.Handler`; new `restic-manager-agent install|uninstall|start|stop|run` subcommands wrap the SCM via `golang.org/x/sys/windows/svc/mgr`. Cross-compile verified (`GOOS=windows GOARCH=amd64 go build ./cmd/agent`); **untested on Windows itself** — Linux CI can't exercise the SCM round-trip.
+- [x] **P2-17** (M) `install.ps1` (Windows): pwsh installer that detects arch, downloads `$Server/agent/binary?os=windows&arch=amd64`, runs the agent in `-enroll-server` (+ optional `-enroll-token`) mode (token flow OR announce-and-approve), then registers the service via `restic-manager-agent install`. Surfaces existing scheduled tasks named `*restic*` without disabling. Served by the existing `GET /install/*` handler; restage block in CLAUDE.md updated.
+- [x] **P2-18** (L) Announce-and-approve enrolment (second enrolment mode):
   - Agent run with no `RM_TOKEN` generates a local Ed25519 keypair (persisted alongside the encrypted secrets blob), then `POST /api/agents/announce` with `{hostname, os, arch, agent_version, restic_version, public_key}`. Server stores a `pending_hosts` row (`public_key`, `fingerprint = sha256(public_key)`, `announced_from_ip`, `first_seen_at`, `last_seen_at`, `expires_at = now+1h`). Hostname collisions with existing or other pending rows are flagged in the response so the install script can warn loudly on the endpoint terminal.
   - Agent then opens a long-poll/WS to `/ws/agent/pending` authenticated by signing a server-issued nonce with its private key — proves possession of the key tied to the pending row. Connection stays open; agent waits.
   - Install script prints the fingerprint on the endpoint's terminal in a copy-friendly form (e.g. `SHA256:ab12…cd34`) and tells the operator to compare it to the one shown in the UI before clicking accept.
@@ -205,6 +205,20 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
   - Server-side guards: per-source-IP rate limit on `/api/agents/announce` (token-bucket, e.g. 10/min); global cap on pending rows (e.g. 100); pending rows auto-expire after 1h; duplicate-hostname pending rows allowed but visually flagged in UI; accepting one does **not** auto-reject the others (admin sees them all and decides — defends against the "attacker announces first, real host second" race).
   - Token-based enrollment (Phase 1) remains the default and is unchanged; announce-and-approve is opt-in for interactive installs. Docs explicitly call out that the fingerprint comparison step is what makes this flow safe — without it, this is no better than trusting `hostname` over the wire.
+
+> **As shipped:** migration 0011 + `store/pending_hosts.go` cover the table.
+> `POST /api/agents/announce` (rate-limited 10/min/IP, global cap 100 in-flight rows)
+> returns `{pending_id, fingerprint, hostname_collision}`. `GET /ws/agent/pending`
+> runs the Ed25519 nonce-sign handshake. Admin POSTs to
+> `/api/pending-hosts/{id}/accept|reject` (audit-logged as
+> `host.accept_pending`/`host.reject_pending`). Dashboard panel renders the queue
+> with a copyable fingerprint + inline accept form (URL/user/password). 60s
+> server ticker sweeps expired rows. Agent: `cmd/agent/announce.go` mints +
+> persists an Ed25519 keypair into `agent.yaml`'s `announce_key` field; runs
+> automatically when `-enroll-server` is supplied without `-enroll-token`. The
+> install scripts haven't been updated to surface the printed fingerprint
+> beyond the agent's own banner — the operator reads it from the install
+> script's stdout.
 
 ### Phase 2 acceptance
 
 - A host can be onboarded end-to-end with no manual REST: enrol → auto-init runs → operator opens host → creates source group(s) → attaches them to one or more schedules → schedule fires on time → backup runs against the right paths with the right retention → snapshots tagged by group name appear in UI.
@@ -212,7 +226,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
 - Server-side maintenance ticker drives forget/prune/check at the configured cadences, independent of agent cron. Offline hosts queue to `pending_runs` and drain on reconnect.
 - Pre/post hooks fire correctly per source group, fail loudly on `pre_hook` errors, run `post_hook` with `RM_JOB_STATUS`. Rejected on non-backup kinds.
 - Bandwidth limits honoured (host-wide default + per-run override).
-- A Windows host can enrol, appear in the dashboard, and run a backup with live log streaming.
+- A Windows host can enrol, appear in the dashboard, and run a backup with live log streaming. **Not validated in CI:** Linux runners cannot exercise the SCM round-trip; the `service_windows.go`/`install.ps1` pieces compile cleanly under `GOOS=windows GOARCH=amd64` but the first real Windows install will be the first end-to-end test.
 - A Linux host can enrol via announce-and-approve, with fingerprint-comparison gate enforced. Rate-limit + pending-cap guards verified.
 
 ---
File diff suppressed because one or more lines are too long