restic-manager

Author	SHA1	Message	Date
steve	aba0b7e177	server: fix stale RetentionPolicy comment + check Scan errors in maintenance test	2026-05-04 10:19:15 +01:00
steve	14b703be58	server: maintenance ticker drives forget/prune/check on cadence Wires a 60s server-side ticker to the pure-logic maintenance.Decide introduced in the previous commit. Decisions flow through a new DispatchMaintenance method on Server, which: - skips offline hosts (no pending_runs queueing — maintenance is not a backup, missed fires shouldn't pile up) - silently skips prune when admin creds aren't bound - pushes admin creds before prune, then dispatches with RequiresAdminCreds=true (same as operator-driven prune) - persists job rows with actor_kind="system" Reshapes the forget wire payload from a single RetentionPolicy to a ForgetGroups list (one tag + per-group keep- per source group). The agent walks the groups and runs `restic forget --tag <name> --keep-*` once per group. Dead-code removed: CommandRunPayload.RetentionPolicy, the old forget JSON-decode in cmd/agent, and the single-policy form of restic.RunForget.	2026-05-04 10:19:15 +01:00
steve	ae96983877	maintenance: pure-logic ticker decides forget/prune/check fires	2026-05-04 10:19:15 +01:00
steve	6f204a6877	ui: hx-swap none on Run-now + truthful save banner + tailwind rebuild Add hx-swap="none" to the three Run-now buttons (check/prune/unlock) in host_repo.html to match the existing pattern on host_sources.html and host_schedules.html. Fix all-blank admin-credentials save to redirect without ?saved= query string so no false-positive banner is shown; strengthen the corresponding test to assert Location has no ?saved=. Rebuild CSS bundle via Tailwind to pick up max-w-[640px] JIT class.	2026-05-04 10:19:15 +01:00
steve	c5b52df7ed	ui: Slice E — admin creds form + run-now buttons + repo health panel - hostRepoPage gains AdminURL/AdminUsername/HasAdminPassword, Online, and StatsView (pre-dereferenced projection of host_repo_stats). - loadHostRepoPage loads the admin slot (tolerating ErrNotFound), hub.Connected, and stats (tolerating ErrNotFound). - renderRepoPage gains an adminErr parameter; all callers updated. - handleUIAdminCredentialsSave / handleUIAdminCredentialsDelete added (form-POST handlers mirroring the repo-creds pattern, with audit). - Routes /hosts/{id}/admin-credentials POST and /delete POST registered. - Template: Admin credentials form after Connection, Run-now HTMX buttons after Maintenance, Repo health stats panel in right rail. - Tests: 9 new tests covering rendering, disabled states, save/delete round-trips, audit rows, and idempotent delete.	2026-05-04 10:19:15 +01:00
steve	e2d94bf3a2	server: populate audit UserID on credential mutations + slog prune push errors Switch handleSetHostCredentials, handleSetAdminCredentials, and handleDeleteAdminCredentials from authedUser (bool) to requireUser (*store.User) so AuditEntry.UserID and Actor are populated correctly. Add slog.Warn on the non-ErrNotFound pushAdminCredsToAgent path in handleRunRepoPrune so decrypt/send failures surface in the server log rather than appearing as a generic host_offline 503.	2026-05-04 10:19:15 +01:00
steve	c5f401e99b	server: cover HTMX auth-redirect path in repo-ops tests	2026-05-04 10:19:15 +01:00
steve	69abc40786	server: HTTP run-now for prune / check / unlock Adds POST /api/hosts/{id}/repo/{prune,check,unlock} (and matching outer routes for HTMX form posts). Prune pushes the admin-cred slot via pushAdminCredsToAgent before dispatch and refuses with admin_creds_required when the slot is not set. Check reads check_subset_pct from host_repo_maintenance (overridable via ?subset=N, clamped 0-100; non-numeric override falls back to DB value silently). Unlock needs no admin creds. All three share the same wantsHTML/HX-Redirect response split as the per-source-group run-now endpoint.	2026-05-04 10:19:15 +01:00
steve	35f07c3cee	server: admin-credentials REST + Slot:admin push helper Adds GET/PUT/DELETE /api/hosts/{id}/admin-credentials handlers that mirror the existing repo-credentials endpoints but write to store.CredKindAdmin with AEAD additional-data "host:<id>:admin" (scoped away from the repo slot to prevent cross-binding). PUT immediately pushes a config.update(Slot:"admin") to the agent when it is connected, and the new pushAdminCredsToAgent helper is wired for use by the upcoming prune run-now endpoint (D2) to push on-demand before dispatch.	2026-05-04 10:19:15 +01:00
steve	a110e3c00c	agent: secrets fail-loud on corrupt blob + small polish Save and SaveAdmin now propagate loadBundle errors instead of silently overwriting a corrupt file (data-loss fix). Tests added for both paths. reportStats logs a Debug on RunStats failure; r in runJob gets a comment explaining the prune-runner asymmetry; runner_test comment tightened.	2026-05-04 10:19:15 +01:00
steve	22adde36b3	agent/runner: ship repo.stats before job.finished in RunCheck/RunUnlock RunCheck and RunUnlock were calling sendFinished before reportStats, inverting the required job.started → log.stream → repo.stats → job.finished envelope order. Move reportStats ahead of sendFinished in both functions to match the pattern already correct in RunPrune. Strengthen TestRunCheckShipsCheckStatus, TestRunCheckErrorsFoundShipsErrorsStatus, and TestRunUnlockClearsLock with the same position-index ordering assertions used by TestRunPruneShipsExpectedEnvelopes; these assertions would have failed against the pre-fix code.	2026-05-04 10:19:15 +01:00
steve	57bf9690f2	agent: RunPrune/RunCheck/RunUnlock + reportStats + admin-cred slot dispatch Extract resticEnv/sendStarted/streamHandler/sendFinished helpers to remove boilerplate duplication across Run* methods. Add RunPrune (ships repo.stats with LastPruneAt before job.finished), RunCheck (ships stats with LastCheckStatus/LockPresent regardless of outcome), RunUnlock (ships LockPresent=false on success), and reportStats (fills size fields via RunStats when caller didn't populate them). Wire JobPrune/JobCheck/JobUnlock into the dispatcher switch; teach MsgConfigUpdate about the Slot discriminator for admin vs repo creds; add strconv import for subset-pct parsing.	2026-05-04 10:19:15 +01:00
steve	c1237583bd	agent/secrets: separate admin slot with backwards-compatible decode Split the on-disk bundle into repo + admin slots. Legacy flat Repo blobs are detected at load time by the presence of "repo_url" at the top level and transparently promoted into the new shape on the next Save/SaveAdmin. Adds ErrNoAdmin sentinel, LoadAdmin, SaveAdmin, and three new tests.	2026-05-04 10:19:15 +01:00
steve	0c3c907de8	api: stats partial-update payload + ConfigUpdate.Slot + CommandRun.RequiresAdminCreds Reshape RepoStatsPayload into pointer-field partial-update form matching store.HostRepoStats semantics; add Slot discriminator to ConfigUpdatePayload for admin vs repo credential routing; add RequiresAdminCreds flag to CommandRunPayload for prune/unlock jobs that need delete authority.	2026-05-04 10:19:15 +01:00
steve	e93eb2a060	restic: tighten RunCheck lock sniff + RunStats zero-snapshot test Narrow the LockPresent predicate from bare "locked" (too broad) to "stale lock" and "already locked" — the two phrases restic actually emits. Replace TestRunCheckParsesLock with table-driven TestRunCheckLockSniff covering both trigger phrases and a benign "locked-file" line that must not set LockPresent. Add TestRunStatsZeroSnapshots to pin that RunStats accepts zero-snapshot JSON without error.	2026-05-04 10:19:15 +01:00
steve	485f4322cb	restic: RunUnlock + RunStats (raw-data mode) Add RunUnlock (delegates straight to runWithPump) and RunStats which runs `restic stats --json --mode raw-data`, captures the single JSON line from stdout into RepoStats, and returns an error if no JSON arrives. Tests cover arg plumbing for unlock, JSON parsing, and the no-JSON error path.	2026-05-04 10:19:15 +01:00
steve	b24faf6de7	restic: RunCheck with subset% + lock-state sniffing Add CheckResult (LockPresent, ErrorsFound) and RunCheck. subsetPct>0 passes --read-data-subset N% to limit data reads. Stderr is sniffed for "Found stale lock"/"locked" to set LockPresent; a non-zero exit from restic is absorbed as ErrorsFound=true rather than an error so the caller can always persist last_check_status. Tests cover lock detection, exit-1 absorption, and subset-arg plumbing.	2026-05-04 10:19:15 +01:00
steve	9b790bbade	restic: RunPrune + runWithPump helper, refactor Forget/Init onto it Add RunPrune for admin-credential prune invocations. Extract runWithPump to DRY the stdout+stderr pump pattern; refactor RunForget and RunInit to delegate to it (RunInit preserves the "config file already exists" soft-success sniff by wrapping the handler before the call). Add runner_test.go with TestRunPruneInvokesPrune.	2026-05-04 10:19:15 +01:00
steve	11cbc2fb7f	store: tighten CHECK constraint on host_repo_stats.last_check_status	2026-05-04 10:19:15 +01:00
steve	5200e44536	store: wrap UpsertHostRepoStats in a transaction (concurrency safety)	2026-05-04 10:19:15 +01:00
steve	84a8c060b6	store: assert CHECK constraint on host_credentials.kind	2026-05-04 10:19:15 +01:00
steve	cfe25b9799	store: HostRepoStats projection (size, lock, last-check, last-prune)	2026-05-04 10:19:15 +01:00
steve	f801fdf65b	store: host_credentials becomes kind-aware (repo + admin slots)	2026-05-04 10:19:15 +01:00
steve	9f2cb18e42	store: migration 0009 — admin-creds kind + host_repo_stats	2026-05-04 10:19:15 +01:00
steve	e73c4bd96c	infra: remove provision-gitea-runner.sh (now lives with the infra team) The runner-provisioning script has been handed off to the infra agent, who will own it going forward. ci.yml's header comment is updated to point at "the infra team owns the script" rather than the in-repo path, but the runner expectations themselves stay the same — workflows still rely on the persistent volumes, pre-cloned actions, and host-installed golangci-lint that any compliant provisioning produces.	2026-05-04 10:19:09 +01:00
steve	bd460d7532	ci+infra: provisioning script for gitea runners + drop setup-go cache scripts/provision-gitea-runner.sh is a one-shot, idempotent host setup for an act_runner LXC. It mounts persistent host volumes for GOMODCACHE / GOCACHE / act-clones, pre-pulls the runner image, pre-clones the common GitHub actions, installs golangci-lint, and sets up a nightly cron to refresh the lot. Generic — no per-project state. With those persistent volumes in place, `cache: true` on actions/setup-go becomes a net negative — the action keeps tar-ing / un-tar-ing GOMODCACHE+GOCACHE through the Gitea cache backend on every job, adding ~10s per job and overwriting the volume contents. Drop it from all three jobs in ci.yml. Add a header comment block explaining the runner-side expectations and the Go version / build matrix / upload-artifact context for anyone reading later.	2026-05-04 09:40:27 +01:00
steve	2ba2c9c7db	Merge pull request 'P2R-02: UI rewire against the slim-schedule + source-group model' (#2) from p2r-02-ui-rebuild into main Reviewed-on: #2	2026-05-03 20:34:02 +00:00
steve	380931b3a8	lint: align local gofumpt rules with golangci-lint v2.5.0 Bumping CI to v2.5.0 surfaced two new gofumpt findings (in two test files that gofumpt v2.1.6 considered fine). Local re-format with the matching tool brings them in line. Pre-commit hook config: prepend $GOPATH/bin to PATH inside the hook entry so gofumpt + golangci-lint resolve when ~/go/bin isn't on the operator's interactive shell PATH (common — go install puts them there but PATH config varies). Without this, the hooks fail with 'Executable not found' even when the tools are installed. Pin the Makefile setup target to v2.5.0 so a fresh clone gets the same binary CI runs — keeps pre-commit and CI from drifting again.	2026-05-03 21:31:47 +01:00
steve	d9c8da139c	ci: bump golangci-lint to v2.5.0 (Go 1.25-built binary) The v2.1.6 release binary is built with Go 1.24, and golangci-lint refuses to load a config targeting a newer toolchain than itself ('Go language version (go1.24) used to build golangci-lint is lower than the targeted Go version (1.25.0)'). go.mod is on 1.25, so the binary needs to be too. Locally this didn't bite because 'go install …@v2.1.6' compiled v2.1.6 against the local Go 1.25 toolchain; CI uses the prebuilt release tarball which carries the build-time Go version. v2.5.0 is the first v2.x line built with Go 1.25 — pin in lockstep with go.mod going forward.	2026-05-03 21:29:02 +01:00
steve	174bdae750	ci: enforce lint locally via pre-commit hook The repo had a .pre-commit-config.yaml entry for golangci-lint already, but pinned to v1.61.0 — which doesn't grok the v2 schema we just migrated to, so it would crash if anyone ever ran it. Hence nobody did. Replace the third-party hook blocks with local hooks that call whatever tool is on the developer's PATH (gofumpt + go vet + golangci-lint). That way the version of each tool tracks what the developer would invoke by hand — no drift between hook config and binary. Add 'make setup' as a one-liner per-clone bootstrap: * installs gofumpt + golangci-lint via go install if missing * installs the pre-commit hooks via 'pre-commit install' end-of-file-fixer auto-fixed two existing files (web/static/css/ styles.css and ask.md) — trailing newlines, harmless.	2026-05-03 21:26:24 +01:00
steve	b6f8de1dcc	lint: drive baseline to zero, drop only-new-issues gate Cleanup pass over the repo so CI can enforce lint going forward without the only-new-issues escape hatch: * gofumpt -w across the tree (31 hits, all formatting) * misspell --fix (25 hits, US-locale spelling) — but reverted on api.JobCancelled = "cancelled" since that literal is the wire + DB CHECK constraint value, plus matched the case in store/fleet.go back to "cancelled" and added //nolint:misspell on both for the next time someone reaches for the auto-fix * Wrap every `defer rows.Close()` / `defer stmt.Close()` / `defer res.Body.Close()` in `defer func() { _ = .Close() }()` to satisfy errcheck without losing the close itself * websocket.Dial callers (1 prod, 4 tests) now capture + close the upgrade response Body — coder/websocket can return res with a nil Body on success, so the test deferred-closes guard against that * Annotate the two genuine-by-design nilerr cases with //nolint comments explaining why nil-on-error is the contract (cookie missing = no session; ctx cancelled mid-backoff = clean shutdown) * Add brief godoc on the 10 exported const groups + types that revive flagged (api.HostOS/HostArch/JobKind/JobStatus/LogStream/ ErrorCode, restic.EventKind, store.Role, web.FS) * Drop the unused (Server).userByID method Inline the unparam baseView(active) — every UI page is under the dashboard primary nav today Result: `golangci-lint run ./...` reports 0 issues. CI lint job no longer needs only-new-issues: true; X-06 follow-up entry in tasks.md removed.	2026-05-03 16:15:17 +01:00
steve	41c3ec7c6f	ci: migrate .golangci.yml to v2 schema + only-new-issues gate The bump from golangci-lint-action@v6 → v7 (which downloads the v2.x binary) was blocking CI lint with 'unsupported version of the configuration: ""' because .golangci.yml was still in the v1 schema. Migrate the config to v2: * version: "2" prelude * disable-all → default: none * linters-settings → linters.settings * gofumpt + goimports move into formatters.enable + formatters.settings * exclude-rules move into linters.exclusions.rules * gosimple drops (folded into staticcheck in v2) Fix the four lint hits in the new P2R-02 code: * host_bandwidth.go: convert hostBandwidthRequest directly to hostBandwidthView via type conversion (S1016) * ui_repo.go: drop unparam savedSection + status arguments from renderRepoPage (always "" / always 422 — split GET render from validation-fail render) * ui_schedules.go: gofumpt formatting on the scheduleEditPage struct Add only-new-issues: true to the lint job. The repo carries ~90 pre-existing findings (gofumpt drift × 31, misspell × 25, missing godoc × 10, bodyclose × 6, errcheck × 12, …) accumulated before lint was actually wired into CI. Without this gate, every PR would fail on baseline noise instead of its own changes. Track the cleanup as X-06 in tasks.md so the gate is temporary.	2026-05-03 15:00:24 +01:00
steve	8b57b8a06d	P2R-02 ✅ — mark Phase 4 complete, all 6 slices done Update tasks.md: Phase 4 of the P2 redesign is done end-to-end. Slice 1–5 wired the four host-detail tabs against the new slim-schedule + source-group + repo-maintenance model; slice 6 ran a Playwright sweep against the live :8080 server (login, walk every tab, create source group, create schedule, Run-now, confirm a snapshot landed) — clean pass, no console errors. Screenshots in _diag/p2r-02-sweep/. Side-fix landed alongside slice 6: agent runner now drops restic's noisy --json status events from log.stream (the throttled job.progress envelope already covers them). Phase 5 (server-side maintenance ticker — P2R-03..08) is next.	2026-05-03 14:49:40 +01:00
steve	a4823193e7	P2R-02 slice 5: dashboard row Run-now uses covering schedule Replace the placeholder 'Open →' link with a per-host Run-now decision computed server-side once per render: * If the host has exactly one enabled schedule whose source-group set covers every group on the host → primary 'Run all groups' button (HX-POST to that schedule's /run endpoint, fires every backup the host knows about in one click). * Otherwise (zero matches, multiple matches, or any ambiguity) → ghost 'Open →' link to /hosts/{id}/sources, where the operator picks per-group from the source-group rows. dashboardPage.Hosts moves from []store.Host to []dashboardHostRow to carry the precomputed RunAllScheduleID; host_row.html now reads .Host.* and .RunAllScheduleID. Two extra store calls per host on dashboard render — fine at fleet sizes we care about; if we ever need to support thousands of hosts we'll batch these queries.	2026-05-03 13:42:50 +01:00
steve	5f2845c331	agent runner: drop status-event spam from log.stream restic --json emits a status frame ~every 16ms during a backup. The runner was forwarding every line to log.stream verbatim, which flooded the live log pane with duplicate status JSON for any short-running backup (visible immediately on a 1000-file, ~4MB test set: ~14 identical 'percent_done: 1' lines in 220ms). The progress widget already covers the same information at a sane sample rate (one per second via job.progress), so the raw status lines in log.stream are double-bookkeeping. Skip them and forward only non-status lines (file names, errors, summary). Throttling logic for job.progress is unchanged.	2026-05-03 13:35:18 +01:00
steve	e45f75598f	P2R-02 follow-up: schedule Run-now feedback (single → job log, multi → toast) Schedules tab Run-now used to silently HX-Redirect back to the list, leaving the operator wondering whether the click registered. Now: * Single-source-group schedule → HX-Redirect to that one job's live log, matching the per-source-group Run-now UX from Sources. * Multi-group schedule → stay on the schedules list and fire a success toast ("N backups dispatched: <group names>") via the existing rm:toast HX-Trigger channel, so the operator sees clear acknowledgement without losing their place. dispatchBackupForGroup now returns the persisted job ID so the caller can choose between job-log redirect and toast feedback; on any internal failure it returns "" and the warning still hits slog as before. The cron-fired path (dispatchScheduledJob) ignores the return value, behaviour unchanged.	2026-05-03 13:25:31 +01:00
steve	9ac5088fde	P2R-02 slice 4: Repo tab — connection / bandwidth / maintenance Three independent forms on /hosts/{id}/repo so saving one section doesn't disturb the others: * Connection: edits repo URL, username, password (pre-filled from the redacted GET /api/hosts/{id}/repo-credentials view; password field shows masked stored-creds placeholder; blank password = keep existing). On save, encrypts and pushes config.update to a connected agent. * Bandwidth: host-wide upload/download caps (KB/s; blank = no cap) written via store.SetHostBandwidth. New REST endpoint PUT /api/hosts/{id}/bandwidth for JSON callers. * Maintenance: forget/prune/check cadences + check subset %, with per-row enabled toggles. Reuses cronParser for validation; auto-seeds the row if a host pre-dates the migration. Right-rail surfaces repo size, snapshot count, snapshots-by-tag breakdown (counted from existing snapshot tag rows), and an 'untagged snapshots are left alone' note. Danger-zone re-init button is rendered but disabled with a hint pointing at P2R-09 (real implementation lands there). Validation re-renders the page with the relevant form's banner and all other section state intact. Successful saves redirect with a ?saved=<section> query param so the page surfaces a small ✓ saved indicator on the relevant form. ci.yml: bump golangci-lint-action v6→v7 (separate change picked up in this commit).	2026-05-03 12:14:03 +01:00
steve	0b70da2955	P2R-02 follow-up: Run-now works on disabled schedules with confirm Surface the Run-now button on every schedule when the host is online, not just enabled ones. Disabled rows render the button as a non-primary style + a HX-confirm dialog ("This schedule is paused — running it now won't change that. Fire it once anyway?"); enabled rows keep the zero-friction primary button. Server-side, Run-now no longer short-circuits on !Enabled — it dispatches the source groups inline rather than via dispatchScheduledJob (which always bails on disabled schedules, since cron-tick semantics are different from explicit operator intent). The audit-log entry inside dispatchBackupForGroup still records every fire.	2026-05-03 12:07:26 +01:00
steve	54528b9b15	P2R-02 follow-up: clickable rows on Sources/Schedules + cron-preset tooltips Aligns Sources and Schedules tab rows with the dashboard's row-click UX: whole-row click navigates to the row's edit page (mirroring .host-row.clickable). Drops the redundant Edit buttons; Run-now and Delete remain in .row-action cells that sit above the row-link overlay via z-index. Schedule edit form's cron preset chips now carry human-readable title= tooltips ("Every day at 03:00", "Every Sunday at 03:00", etc). tasks.md gets a binding row-design rule covering all current and future list-row templates, and the P2R-02 entry is split into the six slices already agreed with the operator (slices 1–3 marked done, 4 next).	2026-05-03 12:01:55 +01:00
steve	8d993ac77c	P2R-02 slice 3: Schedules tab — slim list, new/edit form, delete, Run-now Schedules list: status (enabled/paused) + cron + source-group tags + actions (Run-now when enabled+online, Edit, Delete). Run-now reuses dispatchScheduledJob — same path real cron fires take, so each referenced source group runs as its own backup with its own tag. Falls back to a 409 if the agent is offline. Schedule new/edit form: cron input with five preset chips (quick-pick @hourly / nightly / 6h / weekly / monthly), source-group multi-pick rendered as styled checkbox cards (visual state tracks the underlying box via a tiny inline script), enabled toggle. No paths/excludes/retention/kind on the schedule itself — those live on source groups now. Server-side validation re-renders with the operator's input + ticked groups intact. Every successful mutation calls pushScheduleSetAsync. Adds .schd-row, .preset-chip, .picker styles.	2026-05-03 11:55:16 +01:00
steve	27a995e812	P2R-02 slice 2 follow-up: refuse to delete a host's last source group Belt-and-braces: the UI now disables the Delete button when a group is the only one on the host (with a tooltip explaining why), and the server-side handler returns 409 if a curl/form-replay tries anyway. Every host needs at least one source group to be backup-able, so the 'last group on a fresh host' case is a meaningful accident to guard against.	2026-05-03 11:49:17 +01:00
steve	da9ed4c3d4	P2R-02 slice 2: Sources tab — list, new/edit form, delete, Run-now Sources tab now lists every source group on the host with per-row counts (used-by-N-schedules, snapshot count by tag), the v4 conflict tag (keep-* dimension that has no compatible cadence), and Run-now / Edit / Delete actions. Run-now reuses the existing HTMX-aware /hosts/{id}/source-groups/{gid}/run handler. New /hosts/{id}/sources/new and /sources/{gid}/edit form: name + includes/excludes textareas + the 3×2 keep-* retention grid + retry-on-offline knobs. Server-side validation re-renders with the operator's input intact; the inline conflict banner shows above the retention grid when ConflictDimension is set. Delete blocks (UI + server) when the group is referenced by any schedule. Every successful mutation calls pushScheduleSetAsync so an online agent re-arms within seconds. Adds .src-row and .keep-cell to input.css for the row + retention grid layout.	2026-05-03 11:44:43 +01:00
steve	079b4bed70	P2R-02 slice 1: host-detail sub-tab skeleton Extract header/vitals/sub-tabs into a host_chrome partial that every host-detail tab page renders. Sources / Schedules / Repo go from inert divs to real <a> links backed by stub pages that share the chrome and a 'coming next' body — slices 2/3/4 fill them in. Also re-establishes the version indicator (host_schedule_version vs agent's applied_schedule_version) in the header. Drops the legacy fat-schedule list/edit templates that referenced fields removed by the P2 redesign (Manual / Paths / RetentionPolicy on Schedule); the new templates land in slice 3.	2026-05-03 11:37:55 +01:00
steve	84914fd6c5	ci: only trigger on PRs into main Drop the push-to-main trigger; main is fast-forward only via PR, so the post-merge run was redundant.	2026-05-03 11:25:13 +01:00
steve	c019633b77	ci: fix race-trip in enrollment fixture + bump golangci-lint to v2.1.6 - host_credentials_test.go's CreateEnrollmentToken fixture passed 1<<20 as the TTL (third arg, time.Duration) — that's ~1ms in nanoseconds. Local non-race runs finished inside the window, but -race overhead blew the deadline so the token was already expired by the time GetEnrollmentTokenAttachments / ConsumeEnrollmentToken ran. Use time.Hour instead, which matches the spirit of a per-test fixture. - Lint pin v1.61.0 was built against Go 1.23 and refuses to load a config targeting newer toolchains. go.mod is on 1.25, so the lint step exited 3 ('the Go language version used to build golangci-lint is lower than the targeted Go version'). Bumping to v2.1.6, which supports Go 1.25. Both failures showed up only on the Gitea runner because local make target runs go test without -race and lint hadn't been re-run after the go.mod toolchain bump.	2026-05-03 11:13:22 +01:00
steve	d692272d10	P2R-01 follow-up: WS-path tests + drop unused retention from backup dispatch Adds p2r01_ws_test.go covering the two paths the original commit's in-process tests couldn't reach without a live conn: - maybeAutoInit dispatches command.run(init) on first hello when creds are bound, skips on second hello once a job row exists, and skips entirely when the host has no creds. - dispatchScheduledJob iterates a schedule's source groups and emits one backup per group with the right Tag/Includes; persists job rows with actor_kind=schedule + scheduled_id; no-ops on a disabled schedule. Drops RetentionPolicy from the per-group Run-now and schedule.fire backup payloads — the agent's RunBackup ignores it (forget is the only consumer). Adds Hub.Conn() so tests can grab the live *Conn post-hello.	2026-05-03 11:00:45 +01:00
steve	ec0bf0f6c3	P2R-01: REST + WS rewire against the slim shape Schedules CRUD now takes {cron, enabled, source_group_ids[]} with cron parsed via robfig/cron/v3 and group membership scoped to the host. New source-groups CRUD lives at /api/hosts/{id}/source-groups; delete refuses with 409 if any schedule still references the group, returning the schedule list so the UI can prompt 'remove from these schedules first.' Repo-maintenance GET/PUT manages forget/prune/check cadences on host_repo_maintenance — no version bump, the server-side ticker (P2R-06) drives execution. Per-source-group Run-now (POST /hosts/{id}/source-groups/{gid}/run) resolves the group's includes/excludes/retention/tag and dispatches a backup command.run with the new structured CommandRunPayload fields (Includes/Excludes/Tag). Old per-host /hosts/{id}/run-backup and /hosts/{id}/init-repo return 410 Gone with a redirect message. schedule_push.go is rebuilt: buildScheduleSetPayload assembles the slim wire shape, pushScheduleSetOnConn ships it during the on-hello window, pushScheduleSetAsync fires after every CRUD mutation, and dispatchScheduledJob handles agent schedule.fire by iterating the schedule's source groups and dispatching one backup per group with actor_kind=schedule and scheduled_id pointing at the schedule. Auto-init at first WS connect: when the host has repo creds bound and no init job in its history, server dispatches restic init. Restic's 'config file already exists' soft-success means re-runs against an existing repo no-op; we don't auto-retry on failure (operator triggers re-init manually via the danger zone in P2R-09). api.Schedule drops Kind/Paths/Excludes/Tags/RetentionPolicy/Manual etc. in favour of {id, cron, enabled, source_groups: [...]}. The agent scheduler stops checking sch.Manual; cmd/agent's backup dispatch reads Includes/Excludes/Tag instead of Args. 
Tests cover the new HTTP surface end-to-end: source-groups CRUD with in-use refusal, schedule validation (bad cron / missing groups / foreign group), repo-maintenance auto-seed and validation, the 410 route, and buildScheduleSetPayload's wire-shape correctness. Full suite passes; smoke env exercises auto-init dispatch on hello, async push after schedule create, and per-source-group Run-now landing the right paths/excludes/tag at the agent.	2026-05-03 10:56:40 +01:00
steve	0735038ea8	fix(.mcp.json): wrap playwright under mcpServers key Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-03 10:35:57 +01:00
steve	e6657c23ff	P2 redesign · phase 2.5: tasks.md rewrite + UI patch-up The store rewrite in `e7eea7a` left tasks.md describing a data shape (fat schedules, host.repo_initialised_at, manual flag) that no longer exists, and left the host-detail templates rendering against fields the store no longer exposes. This commit reconciles both. tasks.md * Mid-phase pivot called out at the top of Phase 2 with commit hashes. * P2-01..P2-05 kept as done but stamped ⚠️ "shipped against old shape — to re-validate under P2R-02". * P2-04.5 (manual flag) struck as superseded. * New P2R-NN section covering work that previously lived only in commit messages and code stubs: P2R-00.1/00.2/00.3/00.4 — phases already shipped (this commit records 00.4) P2R-01 — REST + WS rewire against slim schedules + source groups + repo maintenance + auto-init P2R-02 — UI rewire against the v4 wireframes P2R-03..05 — prune / check / unlock command surfaces P2R-06 — server-side maintenance ticker (cadence-driven) P2R-07 — repo stats panel P2R-08 — pending_runs queue worker P2R-09 — auto-init UX polish P2R-10..12 — pre/post hooks rehomed from schedule onto source group P2R-13..14 — bandwidth + next/last-run surface * P2-16/17/18 (Windows + announce-and-approve) untouched. * Phase 2 acceptance criteria rewritten against the new model. UI patch-up (P2R-00.4) * host_detail.html + host_row.html: removed every $host.RepoInitialisedAt reference (column dropped in migration 0008 — render was 500'ing). * Removed manual init-repo branches; the auto-init path replaces them. * Schedules sub-tab demoted from active link to inert div until P2R-02 rebuilds the page (it was linking to a raw 501 from the stubbed ui_schedules.go handlers). * Disabled the four per-host Run-now buttons (dashboard row + host detail header + empty-snapshots state + right-rail) with a "lands in P2 Phase 4" hint — handler is 501-stubbed pending P2R-01, so leaving them clickable produced silent failures over htmx. 
* Dashboard row-action becomes "Open →" instead of Run-now. Project tooling * .mcp.json at repo root: project-scoped Playwright MCP override. Forces --headless (so I don't pop a browser at the operator) and --output-dir _diag (so screenshots / traces land in the gitignored _diag/ directory rather than scattered at the repo root). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 09:13:05 +01:00
steve	e7eea7afac	P2 redesign · phase 2: store rewrite — sources, slim schedules, repo maintenance Go-side data model rebuilt against migration 0008. The fat-Schedule shape (paths/excludes/tags/retention/manual/kind/options/hooks) is gone; that surface lives on source_groups now. * store/types.go - Schedule slimmed to {id, host_id, cron, enabled, source_group_ids, timestamps}. SourceGroupIDs populated by Get/List, accepted on Create/Update so callers pass desired junction state in one shape. - SourceGroup added: name (= snapshot tag), includes/excludes, retention_policy, retry_max + retry_backoff_seconds, cached conflict_dimension. - HostRepoMaintenance added: forget/prune/check cadences + enabled. - PendingRun added: offline-retry queue. - Host loses RepoInitialisedAt; gains BandwidthUpKBps + BandwidthDownKBps. - RetentionPolicy moves home from "schedule field" to "source group field" but the type itself + Summary() method unchanged. * store/sources.go (new) — CRUD + GetByName + ConflictDimension cache. Group writes bump host_schedule_version; conflict cache writes don't (server-internal projection, agent doesn't see it). * store/maintenance.go (new) — CreateDefault is idempotent (INSERT OR IGNORE). UpdateRepoMaintenance doesn't bump schedule version because these run on the server's own ticker, not the agent's local cron. * store/pending.go (new) — Enqueue / DueRunsForRetry / Bump / Delete. * store/schedules.go — rewritten for slim shape + junction CRUD. Update wipes the schedule_source_groups junction wholesale and re-inserts (simpler than diffing). Adds SchedulesUsingGroup for retention-conflict detection + UI labels. * store/hosts.go — drops repo_initialised_at scan, adds bandwidth scan. New SetHostBandwidth helper. * HTTP layer — temporarily stubbed during this rewrite (501 returns with redesign_in_progress error code). 
Phase 3 fills these in against the new shape: - schedules.go REST CRUD - schedule_push.go agent reconciliation - ui_schedules.go HTML form CRUD Run-now-per-host + Init-repo handlers in ui_handlers.go also stubbed — both go away in the new model (Run-now per source group; auto-init at host enrolment). * enrollment.go — replaces "seed manual schedule from typed paths" with "seed default source group + repo-maintenance row." The default group gets the typed paths as its includes; operator edits later via Sources tab. * ws/handler.go — drops the MarkHostRepoInitialised projection (column is gone; auto-init makes it derivable from latest init job's status). Tests: * store: existing schedule test rewritten for slim shape + junction; new sources_test.go covers source-group CRUD, name uniqueness, conflict cache, repo-maintenance defaults + idempotent seed, pending-runs queue lifecycle. * http: schedules_test.go and schedule_push_test.go deleted — both exercised the obsolete fat-schedule API. Phase 3 rewrites them against the new endpoints. go test ./... green. cmd/server + cmd/agent build. The UI is broken end-to-end (schedules / sources / repo tabs all hit 501 stubs); Phase 3 restores REST + on-the-wire reconciliation; Phase 4 rewires the UI templates against the new model. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 21:30:41 +01:00
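
Note on c019633b77 (race-trip in the enrollment fixture): the message says the fixture passed 1<<20 as a time.Duration TTL, which is roughly 1ms. A minimal sketch of why, assuming nothing about the real fixture beyond that description; fixtureTTL, fixedTTL and expiredAfter are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// fixtureTTL mirrors the value described in the commit message. An untyped
// constant used where a time.Duration is expected counts nanoseconds.
func fixtureTTL() time.Duration {
	return 1 << 20
}

// fixedTTL is the replacement value the commit message names.
func fixedTTL() time.Duration {
	return time.Hour
}

// expiredAfter reports whether a token with the given TTL has expired once
// elapsed time has passed since it was created (hypothetical helper).
func expiredAfter(ttl, elapsed time.Duration) bool {
	return elapsed >= ttl
}

func main() {
	fmt.Println(fixtureTTL()) // prints 1.048576ms

	// Assume -race overhead stretches the gap between create and consume to 5ms.
	slow := 5 * time.Millisecond
	fmt.Println(expiredAfter(fixtureTTL(), slow)) // prints true
	fmt.Println(expiredAfter(fixedTTL(), slow))   // prints false
}
```

The 5ms figure is an assumption chosen to make the point; any delay above about 1.05ms expires the original token.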


243 Commits
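
Note on 69abc40786 (HTTP run-now for check): the message describes the subset percentage as read from host_repo_maintenance, overridable via ?subset=N, clamped to 0-100, with a non-numeric override silently falling back to the stored value. The rule could be sketched roughly as below. resolveSubsetPct is a hypothetical helper, not the server's actual code, and leaving the stored value unclamped is an assumption:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// resolveSubsetPct picks the check subset percentage for a run-now request.
// rawQuery is the request's query string; stored stands in for the
// check_subset_pct value loaded from the database (hypothetical signature).
func resolveSubsetPct(rawQuery string, stored int) int {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return stored // unparseable query: treat as no override
	}
	raw := q.Get("subset")
	if raw == "" {
		return stored // no override supplied
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return stored // non-numeric override falls back silently
	}
	// Clamp a numeric override into the 0-100 range.
	if n < 0 {
		return 0
	}
	if n > 100 {
		return 100
	}
	return n
}

func main() {
	fmt.Println(resolveSubsetPct("subset=150", 25)) // prints 100
	fmt.Println(resolveSubsetPct("subset=abc", 25)) // prints 25
	fmt.Println(resolveSubsetPct("", 25))           // prints 25
}
```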