P2-02 (agent side) + P2-03: agent scheduler + schedule.fire dispatch
Closes the schedule reconciliation loop end-to-end.
* New `internal/agent/scheduler` package wraps robfig/cron/v3 with
  the lifecycle the agent needs:
  - Apply(ScheduleSetPayload, Sender) stops the prior cron (waiting
    for in-flight entries to return), rebuilds from scratch, starts,
    and emits schedule.ack with the version it just applied.
  - Disabled entries are skipped silently; bad cron expressions
    (which shouldn't reach the agent, since the server validates,
    but we stay defensive) log a warning and are skipped.
  - On each cron tick the entry sends a new schedule.fire envelope
    to the server with {schedule_id, scheduled_at}. The scheduler
    itself never builds CommandRunPayloads; the server is the source
    of truth for jobs.
  - tx is swapped on every Apply, so reconnect is handled
    naturally: cron entries that fire against a dropped tx log
    "no active connection" and skip the tick.
  - Stop() is idempotent and waits for the cron's in-flight
    workers via cron.Stop().Done().
* New wire message api.MsgScheduleFire + api.ScheduleFirePayload
  for the agent → server "I just fired locally" RPC.
* Server-side dispatch (schedule_push.go: dispatchScheduledJob):
  looks up the schedule by id, validates ownership and that it's
  enabled, builds args from kind (paths for backup; other kinds
  are still arg-less in Phase 2 and grow as those job kinds land
  in P2-05..08), persists a jobs row with actor_kind=schedule +
  scheduled_id, and writes command.run back on the same conn so
  the agent runs it through its existing dispatch path.
* store.CreateJob now writes scheduled_id. This column has been in
  the schema since 0001 but was never populated: the original P1
  path only had operator-driven jobs, so actor_kind was always
  'user' and scheduled_id was always nil.
* cmd/agent/main.go integration: dispatcher gains a
  *scheduler.Scheduler; the MsgScheduleSet case now hands the
  payload to scheduler.Apply (in a goroutine so the WS read loop
  keeps draining other messages).
* WS dispatcher gains OnScheduleFire alongside OnScheduleAck.
* Tests:
  - scheduler unit tests (4): ack-on-apply, cron tick fires
    schedule.fire envelope, disabled entries don't fire,
    replace-prior-state stops the old cron.
  - Server-side end-to-end: schedule.fire → command.run with the
    right job_id / kind / args, plus jobs row with
    actor_kind="schedule" and scheduled_id linking back to the
    schedule.
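The server-side checks listed above (ownership, enabled, args derived from kind) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in schedule_push.go: the `Schedule` struct, its field names, the `[]string` args type, and `argsForFire` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Schedule is a hypothetical, trimmed-down view of a stored schedule.
type Schedule struct {
	ID      string
	HostID  string // the agent that owns this schedule (assumed field)
	Kind    string // e.g. "backup"
	Enabled bool
	Paths   []string
}

var (
	errNotOwned = errors.New("schedule does not belong to this agent")
	errDisabled = errors.New("schedule is disabled")
)

// argsForFire validates a schedule.fire against the stored schedule and
// returns the args for the resulting job. Only "backup" carries args
// (its paths); every other kind is arg-less, as in Phase 2.
func argsForFire(s Schedule, agentHostID string) ([]string, error) {
	if s.HostID != agentHostID {
		return nil, errNotOwned
	}
	if !s.Enabled {
		return nil, errDisabled
	}
	switch s.Kind {
	case "backup":
		// Copy so the caller cannot mutate the stored schedule.
		return append([]string(nil), s.Paths...), nil
	default:
		return nil, nil
	}
}

func main() {
	s := Schedule{ID: "sched-1", HostID: "host-a", Kind: "backup", Enabled: true, Paths: []string{"/etc", "/home"}}

	args, err := argsForFire(s, "host-a")
	fmt.Println(args, err)

	_, err = argsForFire(s, "host-b")
	fmt.Println(err)
}
```

Rejecting the fire before any jobs row is written keeps a stale or misrouted schedule.fire from creating orphan jobs.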
Persistence of next-fire times across agent restarts is
deliberately deferred. A missed fire window during downtime
simply fires once on reconnect, which is the desired behaviour
(the operator wants the missed backup to run, not be silently
skipped because we lost track of when it was due).
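The tx swap described in the scheduler bullets can be sketched with the standard library alone. This is an illustration under assumptions, not the package's code: `Sender`, `fireTarget`, `swap`, `fire`, and `recordingSender` are hypothetical names, the payload's Go field types are guesses, a dropped connection is modelled as a nil sender, and the real scheduler drives ticks through robfig/cron/v3 rather than direct calls.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Sender is a hypothetical stand-in for the connection-bound sender (tx).
type Sender interface {
	Send(msgType string, payload any) error
}

// ScheduleFirePayload carries the two fields named in the commit message.
// The Go field types are assumptions.
type ScheduleFirePayload struct {
	ScheduleID  string    `json:"schedule_id"`
	ScheduledAt time.Time `json:"scheduled_at"`
}

var errNoConn = errors.New("no active connection")

// fireTarget holds whichever sender the most recent Apply installed.
type fireTarget struct {
	mu sync.Mutex
	tx Sender
}

// swap installs a new sender; pass nil to model a dropped connection.
func (f *fireTarget) swap(tx Sender) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.tx = tx
}

// fire is what a cron entry would call on each tick. It never builds a
// command itself; it only reports the tick to the server.
func (f *fireTarget) fire(scheduleID string, at time.Time) error {
	f.mu.Lock()
	tx := f.tx
	f.mu.Unlock()
	if tx == nil {
		return errNoConn // the caller logs this and skips the tick
	}
	return tx.Send("schedule.fire", ScheduleFirePayload{ScheduleID: scheduleID, ScheduledAt: at})
}

// recordingSender captures payloads so the behaviour can be observed.
type recordingSender struct{ sent []ScheduleFirePayload }

func (r *recordingSender) Send(_ string, payload any) error {
	if p, ok := payload.(ScheduleFirePayload); ok {
		r.sent = append(r.sent, p)
	}
	return nil
}

func main() {
	target := &fireTarget{}
	at := time.Date(2025, 1, 2, 3, 0, 0, 0, time.UTC)

	if err := target.fire("sched-1", at); err != nil {
		fmt.Println("skip tick:", err)
	}

	rec := &recordingSender{}
	target.swap(rec)
	if err := target.fire("sched-1", at); err != nil {
		fmt.Println("unexpected:", err)
	}
	fmt.Println("delivered:", len(rec.sent))
}
```

Reading the sender under the mutex and sending outside it keeps a slow write from blocking a concurrent swap.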
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+12 -2
@@ -13,6 +13,7 @@ import (
 
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/agent/config"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/agent/runner"
+	"gitea.dcglab.co.uk/steve/restic-manager/internal/agent/scheduler"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/agent/secrets"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/agent/sysinfo"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/agent/wsclient"
@@ -103,6 +104,7 @@ func run() error {
 	d := &dispatcher{
 		resticBin: resticBin,
 		secrets:   sec,
+		scheduler: scheduler.New(),
 	}
 	if err := wsclient.Run(ctx, wsCfg, d.handle); err != nil {
 		return fmt.Errorf("ws run: %w", err)
@@ -166,6 +168,7 @@ func openSecretsStore(cfg *config.Config) (*secrets.Store, error) {
 type dispatcher struct {
 	resticBin string
 	secrets   *secrets.Store
+	scheduler *scheduler.Scheduler
 }
 
 func (d *dispatcher) handle(ctx context.Context, env api.Envelope, tx wsclient.Sender) error {
@@ -182,8 +185,15 @@ func (d *dispatcher) handle(ctx context.Context, env api.Envelope, tx wsclient.S
 		slog.Info("ws agent: command.cancel received (cancellation lands in P2)", "id", env.ID)
 
 	case api.MsgScheduleSet:
-		// TODO(P2): apply the schedule.
-		slog.Info("ws agent: schedule.set received (handled in P2)", "id", env.ID)
+		var p api.ScheduleSetPayload
+		if err := env.UnmarshalPayload(&p); err != nil {
+			return fmt.Errorf("schedule.set: %w", err)
+		}
+		// scheduler.Apply rebuilds the local cron from scratch and
+		// emits schedule.ack via tx. Async-safe: tx may have to wait
+		// briefly on the connection's writeMu, but the read loop
+		// keeps draining other messages.
+		go d.scheduler.Apply(p, tx)
 
 	case api.MsgConfigUpdate:
 		var p api.ConfigUpdatePayload
||||