restic-manager

Author	SHA1	Message	Date
steve	6a171596f1	P2-05: forget command with retention policy End-to-end forget plumbing — operator can create a forget schedule with keep-* values, agent runs restic forget --keep-* … on the schedule's cron (or via per-row Run-now), snapshot list shrinks, UI updates. * api.CommandRunPayload gains retention_policy json.RawMessage so the agent doesn't need a typed copy of the server-side struct. * restic.ForgetPolicy mirrors restic's --keep-* flags. Empty() reports zero dimensions; restic wrapper RunForget refuses to run an empty policy (would delete every snapshot). Does NOT pass --prune — pruning lives behind a separate admin-only credential (P2-06); forget just rewrites the snapshot index. * runner.RunForget mirrors RunBackup's envelope shape so the live log viewer works without special-casing. On success triggers reportSnapshots (forget shrinks the index, the host's snapshot count almost certainly changed). * cmd/agent dispatcher handles MsgCommandRun with kind=forget, decodes RetentionPolicy from the wire, builds restic.ForgetPolicy. * Server dispatchScheduleNow marshals the schedule's RetentionPolicy into the wire payload for kind=forget jobs. Refuses to dispatch a forget schedule with empty retention. * validateSchedule rejects kind=forget without at least one keep-* dimension (new error code: missing_retention). * UI schedule edit form gains a Kind dropdown (backup or forget; immutable on edit). Paths block toggles by kind via inline data-kind attributes. Form help-text explains the prune separation. Other kinds (prune, check, unlock) deferred to P2-06..08; the Kind dropdown only offers backup and forget today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 14:07:42 +01:00
steve	8fb1c100fd	P2-04.5: kill host.default_paths in favour of manual schedules Two independent path lists for "what does this host back up?" was a real divergence footgun — operator types one set at Add-host time and a different set into a schedule, both end up in the same repo, the snapshot history looks fine until restore. Resolution: drop host.default_paths entirely; add a `manual` flag on schedules. A manual schedule has paths/excludes/tags/retention like any other but no cron — it fires only via per-schedule Run-now. Single source of truth for what gets backed up. Schema (migration 0007): * schedules.manual INTEGER NOT NULL DEFAULT 0. * For every host with non-empty default_paths, seed a manual schedule with those paths and bump host_schedule_version. * ALTER TABLE hosts DROP COLUMN default_paths. * ALTER TABLE enrollment_tokens RENAME COLUMN default_paths TO initial_paths. Original draft of this migration rebuilt hosts via the create-new + drop-old + rename-new pattern. With foreign_keys=ON (set in the connection DSN), DROP TABLE on the parent fired ON DELETE CASCADE on every child of hosts(id) — schedules / jobs / snapshots / host_credentials all wiped on the smoke env when I tried it. SQLite 3.35+ supports column-level ALTERs directly, so we skip the rebuild dance and avoid the cascade trap. Six lines of SQL instead of sixty, no FK risk. Run-now rewiring: * New `dispatchScheduleNow(hostID, scheduleID, conn?)` helper unifies the agent-driven path (cron fire → schedule.fire → OnScheduleFire callback) and the UI-driven path (operator clicks Run-now on a schedule row). Conn arg is optional; nil falls back to Hub.Send. * New POST /hosts/{id}/schedules/{sid}/run endpoint — per-row Run-now button on the schedules list. * Dashboard's per-host Run-now (handleUIRunBackup) now picks the host's only enabled manual schedule, falls back to the only enabled schedule, else returns "pick one in Schedules tab". Keeps one-click for the common case. 
Agent: * Scheduler skips manual schedules in cron build (silent — they're a normal data shape, not an error). * Wire Schedule struct gains Manual flag. * Schedule.fire flow unchanged — the agent only ever fires non-manual schedules anyway. UI: * Add-host form retitled "Initial schedule · manual" so the operator knows the paths become an editable schedule under the Schedules tab. Result page calls out the manual schedule + points at Host > Schedules. * Schedule edit form: "Manual schedule" checkbox at the top of the When section; toggling it hides/shows the cron field via inline JS. Server-side validator skips the cron requirement when manual=true. * Schedule list shows a "manual" tag under the status pill and renders the When column as "— run-now only —" for manual rows. Each row gets a Run-now button when the schedule is enabled and the host is online. Tests + go test ./... green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 12:26:06 +01:00
steve	608962441b	P2-02 (agent side) + P2-03: agent scheduler + schedule.fire dispatch Closes the schedule reconciliation loop end-to-end. * New `internal/agent/scheduler` package wraps robfig/cron/v3 with the lifecycle the agent needs: - Apply(ScheduleSetPayload, Sender) stops the prior cron (waiting for in-flight entries to return), rebuilds from scratch, starts, and emits schedule.ack with the version we just applied. - Disabled entries skipped silently; bad cron exprs (which shouldn't reach us — the server validates — but defensive) log a warn and skip. - On each cron tick the entry sends a new schedule.fire envelope to the server with {schedule_id, scheduled_at}. The scheduler itself never builds CommandRunPayloads — server is the source of truth for jobs. - tx is swapped on every Apply, so reconnect is handled naturally: cron entries that fire against a dropped tx log "no active connection" and skip the tick. - Stop() is idempotent and waits for the cron's in-flight workers via cron.Stop().Done(). * New wire message api.MsgScheduleFire + api.ScheduleFirePayload for the agent → server "I just fired locally" RPC. * Server-side dispatch (schedule_push.go: dispatchScheduledJob): looks up the schedule by id, validates ownership + that it's enabled, builds args from kind (paths for backup; other kinds are still arg-less in Phase 2 and grow as those job kinds land in P2-05..08), persists a jobs row with actor_kind=schedule + scheduled_id, and writes command.run back on the same conn so the agent runs through its existing dispatch path. * store.CreateJob now writes scheduled_id. This column was in the schema since 0001 but never populated — the original P1 path only had operator-driven jobs, so actor_kind was always 'user' and scheduled_id was always nil. * cmd/agent/main.go integration: dispatcher gains a scheduler.Scheduler; the MsgScheduleSet case now hands the payload to scheduler.Apply (in a goroutine so the WS read loop keeps draining other messages). 
WS dispatcher gains OnScheduleFire alongside OnScheduleAck. * Tests: - scheduler unit tests (4): ack-on-apply, cron tick fires schedule.fire envelope, disabled entries don't fire, replace-prior-state stops the old cron. - Server-side end-to-end: schedule.fire → command.run with the right job_id / kind / args, plus jobs row with actor_kind="schedule" and scheduled_id linking back to the schedule. Persistence of next-fire times across agent restarts is deliberately deferred. A missed fire window during downtime simply fires once on reconnect; that's the desirable behaviour (the operator wants the missed backup to run, not be silently skipped because we lost track of when it was due). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:29:12 +01:00
steve	a086b0eb75	P2-02 (server side): schedule reconciliation push + ack handling Server is now the source of truth for the agent's cron set. * Helpers in schedule_push.go: - loadScheduleSetPayload reads the host's schedules + canonical version into the wire shape. - pushScheduleSetOnConn writes directly to a just-handshaken conn (avoids racing against Hub.Register on a brand-new connection). - pushScheduleSetAsync is the post-CRUD flavour — no-op when the host is offline (the next reconnect's on-hello path catches it up, so a missed push is non-fatal). - applyScheduleAck records what version the agent has confirmed. * onAgentHello restructured: was returning early when the host had no repo credentials, which made the schedule push unreachable for fresh hosts. Split into pushRepoCredsOnHello (silent no-op on ErrNotFound) + pushScheduleSetOnConn (always runs). Empty schedule list is a valid push: tells the agent to drop stale cron entries. * WS dispatcher gains an OnScheduleAck hook on HandlerDeps; the http server wires it to applyScheduleAck. MsgScheduleAck moves out of the "TODO(P2)" group into a real case that decodes the payload and forwards to the callback. * Schedule CRUD handlers each fire pushScheduleSetAsync after the audit-log write so the agent picks up changes within seconds. Tests cover: - On-hello push of an already-created schedule, agent acks, applied_schedule_version flips on the host row. - Connect-then-CRUD: empty initial push (version 0), then a follow-on push at version 1 after the operator creates a schedule via REST. Agent-side `schedule.set` handler (parse, replace local cron, emit `schedule.ack`) is the remainder of P2-02 and lands with P2-03's local scheduler. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:22:06 +01:00

4 Commits