server: drainer uses dispatch-core to avoid duplicate pending_run enqueue

Extract dispatchBackupForGroupCore (persist + marshal + send, no enqueue on failure) from dispatchBackupForGroup. drainOne now calls the core directly, so a failed Send only bumps the existing pending_runs row via BumpPendingRunAttempt instead of also creating a second row. Previously each failed drain left the bumped row plus a freshly enqueued one, which caused geometric duplication on repeated drain failures. dispatchBackupForGroup (the schedule.fire path) wraps the core and keeps its enqueue-on-failure behaviour unchanged. TestDrainPendingBumpsOnSendFailure is strengthened: it now asserts that exactly 1 row remains after a send failure (it previously tolerated one or more duplicate rows).
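A minimal, self-contained Go model of the row-count effect described above. It is a sketch, not the server code: `pendingRun`, `fakeStore`, `dispatchCore`, `dispatchWithEnqueue`, `drainPass` and `simulate` are hypothetical stand-ins for store.PendingRun, the pending_runs table, dispatchBackupForGroupCore, dispatchBackupForGroup and the drainer loop. It assumes Send always fails and ignores retry_max, backoff timing and persistence.

```go
package main

import (
	"errors"
	"fmt"
)

// pendingRun is a hypothetical stand-in for store.PendingRun.
type pendingRun struct {
	ID      int
	Attempt int
}

// fakeStore is a hypothetical in-memory stand-in for the pending_runs table.
type fakeStore struct {
	nextID int
	rows   []*pendingRun
}

// enqueue appends a brand-new pending row (attempt 0).
func (s *fakeStore) enqueue() {
	s.nextID++
	s.rows = append(s.rows, &pendingRun{ID: s.nextID})
}

// bump plays the role of BumpPendingRunAttempt: it updates an existing row.
func (s *fakeStore) bump(p *pendingRun) { p.Attempt++ }

var errSend = errors.New("send failed")

// dispatchCore models dispatchBackupForGroupCore: try to send, report
// failure, never enqueue.
func dispatchCore(send func() error) (string, error) {
	if err := send(); err != nil {
		return "", err
	}
	return "job-1", nil
}

// dispatchWithEnqueue models dispatchBackupForGroup: the core plus an
// enqueue of a fresh pending row when the send fails.
func dispatchWithEnqueue(s *fakeStore, send func() error) string {
	jobID, err := dispatchCore(send)
	if err != nil {
		s.enqueue()
	}
	return jobID
}

// drainPass runs one drainOne-like step for every row that existed at the
// start of the pass. useCore selects the post-fix behaviour.
func drainPass(s *fakeStore, useCore bool, send func() error) {
	snapshot := append([]*pendingRun(nil), s.rows...)
	for _, p := range snapshot {
		var jobID string
		if useCore {
			jobID, _ = dispatchCore(send)
		} else {
			jobID = dispatchWithEnqueue(s, send)
		}
		if jobID == "" {
			s.bump(p) // the drainer always bumps the row it was draining
		}
	}
}

// simulate starts from one pending row and returns the row count after
// each of `passes` drain passes (index 0 is the initial count).
func simulate(useCore bool, passes int) []int {
	s := &fakeStore{}
	s.enqueue()
	alwaysFail := func() error { return errSend }
	counts := []int{len(s.rows)}
	for i := 0; i < passes; i++ {
		drainPass(s, useCore, alwaysFail)
		counts = append(counts, len(s.rows))
	}
	return counts
}

func main() {
	fmt.Println("before:", simulate(false, 4))
	fmt.Println("after:", simulate(true, 4))
}
```

Running it prints `before: [1 2 4 8 16]` and `after: [1 1 1 1 1]`: when the drainer goes through the enqueueing wrapper, every failing row spawns another, while calling the core leaves a single row whose attempt count grows.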
2026-05-04 00:01:42 +01:00
parent a9185424d3
commit 1e212db24e
3 changed files with 67 additions and 53 deletions
@@ -81,14 +81,15 @@ func (s *Server) drainOne(ctx context.Context, conn *ws.Conn, p store.PendingRun
 		s.abandonPending(ctx, p, "retry_max exceeded")
 		return
 	}
-	jobID := s.dispatchBackupForGroup(ctx, conn, p.HostID, p.ScheduleID, g, p.ScheduledAt)
+	// Calls dispatchBackupForGroupCore (not dispatchBackupForGroup) so a
+	// failed Send doesn't double-enqueue: dispatchBackupForGroup's
+	// enqueue-on-failure path would create a NEW pending_runs row while
+	// this function already bumps the EXISTING row via
+	// BumpPendingRunAttempt, producing geometric duplicates on repeated
+	// failures.
+	jobID, _ := s.dispatchBackupForGroupCore(ctx, conn, p.HostID, p.ScheduleID, g, p.ScheduledAt)
 	if jobID == "" {
 		// Send failed again. Bump attempt with exponential backoff.
-		// Note: dispatchBackupForGroup's failure path *also* enqueues a
-		// fresh pending_runs row (G1.1). That's a duplicate, but harmless:
-		// it'll be drained the same way and either succeed or hit
-		// retry_max alongside this one. The bump below preserves this
-		// row's history (attempt count, last error) for forensics.
 		baseBackoff := time.Duration(g.RetryBackoffSeconds) * time.Second
 		if baseBackoff <= 0 {
 			baseBackoff = 60 * time.Second