store: LatestJobByKind includes in-flight jobs (avoid maintenance double-fire)
Widen the SQL query to consider all statuses (queued, running, succeeded, failed, cancelled) rather than terminal-only. An in-flight prune that outlasts the 60s tick interval previously produced ErrNotFound, causing the ticker to anchor at now-24h and fire a second prune concurrently with the first. Update the doc comment and test: remove the "queued job filtered out" case, add assertions that a running job and a queued job are each returned as the latest.
@@ -193,20 +193,18 @@ func (s *Store) GetJob(ctx context.Context, id string) (*Job, error) {
 	return &j, nil
 }

-// LatestJobByKind returns the most recent terminal job (status in
-// 'succeeded','failed','cancelled' — UK spelling matches the wire/DB
-// literal, see api.JobCancelled) of the given kind for the host, or
+// LatestJobByKind returns the most recent job (any status, including
+// queued and running) of the given kind for the host, or
 // (nil, ErrNotFound) if no such job exists. Used by the maintenance
 // ticker to compute "last fire" anchors for the cron-due check;
-// queued and running jobs are excluded so an in-flight run doesn't
-// suppress its own cron tick from firing. //nolint:misspell // wire format
+// in-flight jobs MUST be considered or a long-running prune (>60s)
+// would re-fire on the next tick while the first is still running.
 func (s *Store) LatestJobByKind(ctx context.Context, hostID, kind string) (*Job, error) {
 	row := s.db.QueryRowContext(ctx,
 		`SELECT id, host_id, kind, status, scheduled_id, actor_kind, actor_id,
 		        started_at, finished_at, exit_code, stats, error, created_at
 		   FROM jobs
 		  WHERE host_id = ? AND kind = ?
-		    AND status IN ('succeeded','failed','cancelled')
 		  ORDER BY created_at DESC
 		  LIMIT 1`, hostID, kind)
 	var (