server: drain pending_runs on tick + on agent reconnect
Two trigger paths land here:

- A 30s ticker in cmd/server calls Server.DrainAllDue(ctx). It walks pending_runs rows whose next_attempt_at <= now, dedupes by host, skips offline hosts, and runs DrainPending for each online host.
- onAgentHello spawns a background DrainPending(hostID). When a host comes back, every pending row for it is dispatchable now; due-ness becomes irrelevant once the wire is back.

Each row's schedule and group are reloaded. ErrNotFound, a disabled schedule, or a gone group abandons the row with a pending_run.abandoned audit. attempt >= retry_max also abandons. Otherwise dispatchBackupForGroup is invoked: success deletes the row, and failure bumps attempt with exponential backoff capped at 30m.
@@ -411,6 +411,11 @@ func (s *Server) onAgentHello(ctx context.Context, hostID string, conn *ws.Conn)
 	// just no-ops. Skipped silently when the host has no creds yet —
 	// the next hello after the operator binds creds will dispatch.
 	s.maybeAutoInit(ctx, hostID, conn)
+	// Drain any pending runs that accumulated while this host was
+	// offline. Use a fresh context — the hello-bound ctx is short-lived,
+	// and the drain may take seconds across many rows. A non-blocking
+	// goroutine keeps the hello path snappy.
+	go s.DrainPending(context.Background(), hostID)
 }
 
 // maybeAutoInit dispatches a `restic init` job iff the host has no