server: drain pending_runs on tick + on agent reconnect

Two trigger paths land here:

- A 30s ticker in cmd/server calls Server.DrainAllDue(ctx). It walks
  pending_runs rows whose next_attempt_at <= now, dedupes by host,
  skips offline hosts, and per online host runs DrainPending.
- onAgentHello spawns a background DrainPending(hostID). When a host
  comes back, every pending row for it is dispatchable now; due-ness
  becomes irrelevant once the wire is back.

Each row's schedule + group are reloaded; ErrNotFound or
disabled-schedule or gone-group abandons the row with a
pending_run.abandoned audit. attempt >= retry_max also abandons.
Otherwise dispatchBackupForGroup is invoked; success deletes the row,
failure bumps attempt with exponential backoff capped at 30m.
2026-05-03 23:57:08 +01:00
parent e64cf25c0e
commit 3e337dfb3c
4 changed files with 604 additions and 0 deletions
@@ -411,6 +411,11 @@ func (s *Server) onAgentHello(ctx context.Context, hostID string, conn *ws.Conn)
 	// just no-ops. Skipped silently when the host has no creds yet —
 	// the next hello after the operator binds creds will dispatch.
 	s.maybeAutoInit(ctx, hostID, conn)
+	// Drain any pending runs that accumulated while this host was
+	// offline. Use a fresh context — the hello-bound ctx is short-lived,
+	// and the drain may take seconds across many rows. A non-blocking
+	// goroutine keeps the hello path snappy.
+	go s.DrainPending(context.Background(), hostID)
 }

 // maybeAutoInit dispatches a `restic init` job iff the host has no