Phase 3 — Alerts: per-source-group dedup #8
Summary
Stacks on top of #7 (p3-alerts).
Until now alert dedup keyed on (host_id, kind), so two source groups
failing on one host collapsed onto a single open backup_failed row:
the second failure touched last_seen_at and overwrote the message but
fired no fan-out. Operators saw one apparently-flapping alert
instead of two distinct broken things.
This PR widens the open-alert key to (host_id, kind, dedup_key)
where dedup_key is the source-group id for backup_failed and ''
for the host-scoped alerts (forget/prune/check stay repo-scoped,
agent_offline/stale_schedule are already one-per-host).
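The key choice described above can be sketched in Go as follows. This is an illustration only, not code from the PR: openKey and dedupKeyFor are hypothetical names, and the rule "source-group id for backup_failed, empty string otherwise" is taken directly from the summary.

```go
package main

import "fmt"

// Alert kinds named in the PR description.
const (
	KindBackupFailed = "backup_failed"
	KindAgentOffline = "agent_offline"
)

// openKey is a hypothetical in-memory stand-in for the widened
// open-alert key (host_id, kind, dedup_key).
type openKey struct {
	HostID   string
	Kind     string
	DedupKey string
}

// dedupKeyFor is a hypothetical helper: it returns the source-group id
// for backup_failed alerts and "" for every other kind, so host-scoped
// alerts keep behaving as one-per-host.
func dedupKeyFor(kind string, sourceGroupID *string) string {
	if kind == KindBackupFailed && sourceGroupID != nil {
		return *sourceGroupID
	}
	return ""
}

func main() {
	a, b := "sg-a", "sg-b"

	k1 := openKey{HostID: "host-1", Kind: KindBackupFailed, DedupKey: dedupKeyFor(KindBackupFailed, &a)}
	k2 := openKey{HostID: "host-1", Kind: KindBackupFailed, DedupKey: dedupKeyFor(KindBackupFailed, &b)}
	fmt.Println(k1 == k2) // false: two groups on one host no longer collide

	o1 := openKey{HostID: "host-1", Kind: KindAgentOffline, DedupKey: dedupKeyFor(KindAgentOffline, &a)}
	o2 := openKey{HostID: "host-1", Kind: KindAgentOffline, DedupKey: dedupKeyFor(KindAgentOffline, nil)}
	fmt.Println(o1 == o2) // true: host-scoped kinds ignore the group id
}
```

Falling back to '' when no group id is supplied keeps the old (host_id, kind) behaviour for any caller that does not know about source groups.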
Changes
0015_jobs_source_group_id.sql: FK to source_groups, indexed.
0016_alerts_dedup_key.sql: drops the old alerts_open partial index and
replaces it with a UNIQUE partial index on
(host_id, kind, dedup_key) WHERE resolved_at IS NULL. The index is the
dedup primitive now.
RaiseOrTouch / AutoResolve / Alert struct gain dedup_key.
engine.JobFinishedEvent gains SourceGroupID; handleJobFinished threads
it through for backup only.
ws.handler.go reads it off the freshly-loaded job row.
dispatchJobWithPayload gains a *string sourceGroupID arg; per-group
Run-now (run_group.go) and schedule.fire pass &g.ID.
Test plan
go test ./...
TestRaiseOrTouchDedupsPerSourceGroup proves the multi-group case
one host via cmd/_fake_alert -dedup-key, verify two open rows
on /alerts and two fan-outs
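The expected outcome of the multi-group case (two open rows, two fan-outs) can be modelled with a small in-memory sketch. Everything here is hypothetical: the store type, its map standing in for the UNIQUE partial index over open alerts, and the raiseOrTouch method are illustrative and are not the PR's RaiseOrTouch implementation. It also assumes that a repeat failure on an already-open key still only touches the row without fanning out, as the summary describes for the old key.

```go
package main

import "fmt"

// key mirrors (host_id, kind, dedup_key); the map below plays the role
// of the unique index over rows WHERE resolved_at IS NULL.
type key struct{ hostID, kind, dedupKey string }

type alert struct {
	message string
	touches int
}

// store models only open (unresolved) alerts and counts fan-outs.
type store struct {
	open    map[key]*alert
	fanOuts int
}

func newStore() *store { return &store{open: make(map[key]*alert)} }

// raiseOrTouch opens a new alert and fans out when the key is free;
// otherwise it overwrites the message on the existing open alert and
// does not fan out. It reports whether a new alert was opened.
func (s *store) raiseOrTouch(hostID, kind, dedupKey, msg string) bool {
	k := key{hostID, kind, dedupKey}
	if a, ok := s.open[k]; ok {
		a.message = msg
		a.touches++
		return false
	}
	s.open[k] = &alert{message: msg}
	s.fanOuts++
	return true
}

func main() {
	s := newStore()
	s.raiseOrTouch("host-1", "backup_failed", "sg-a", "group A failed")
	s.raiseOrTouch("host-1", "backup_failed", "sg-b", "group B failed")
	s.raiseOrTouch("host-1", "backup_failed", "sg-a", "group A failed again")

	fmt.Println(len(s.open), s.fanOuts) // 2 2
}
```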
Notes
Base is p3-alerts; merge that one first or merge both as a pair.