Phase 3 — Alerts (P3-05/06/07) #7
Summary
Phase-3 Alerts subsystem: hardcoded rule engine, three notification
channels (webhook + ntfy + SMTP), and the UI surfaces for review,
acknowledgement, and resolve.
- Rules: backup_failed, forget_failed, prune_failed, check_failed (critical), agent_offline, plus a stale_schedule placeholder. Engine runs as a goroutine with a 60s ticker plus event-driven hooks (job finished, host up/down).
- Channels implement notification.Channel; notification.Hub fans every event out to all enabled channels in parallel and writes a notification_log row per dispatch (status code, latency, error).
- Channel config is stored encrypted in notification_channels.config; the associated data binds the ciphertext to the row id, so swapping config between rows is rejected.
- /alerts list with status/severity/host filters, JSON variant at /api/alerts, ack + resolve handlers, dashboard critical-alerts banner, nav badge.
- /settings/notifications channel CRUD with per-kind sub-forms, Test button that fires a synthetic alert.test payload.
Sweep findings (live Playwright run, 2026-05-04)
Three real bugs caught and fixed mid-sweep (see commits 9be3cea, 6466f8c, 3d99306):
- Acknowledge and resolve did not dispatch alert.acknowledged / alert.resolved. Added Engine.Acknowledge / Engine.Resolve wrappers; handlers now route through the engine. Also detached the goroutine context with context.WithoutCancel so the dispatch survives the 204 response.
- Channels were saved with enabled=0 even when the toggle was on: the hidden input and the checkbox were both named enabled, and PostForm.Get returned the first value ("0"). Switched to a slice scan helper.
- The hosts.open_alert_count projection was never written by the alerts code path, so the dashboard's OPEN ALERTS card and per-host alerts column always read 0. Added refreshHostOpenAlertCount (recomputes from the alerts table, so it is self-healing) and called it from RaiseOrTouch (when a row was inserted), Resolve, and AutoResolve.

End-to-end verified: 3 channels created + Test fired (webhook 200/1ms,
ntfy 200/322ms, SMTP 250/3ms via local MailHog) → synthetic critical
raised → /alerts list, nav badge, dashboard banner, OPEN ALERTS card
all populate → Acknowledge fans out across all 3 channels → Resolve
fans out across all 3 channels and clears the banner + count.
Test plan
- go test ./... (passes locally)
- [restic-manager] [info] (test): test_notification
- backup_failed alert appears on /alerts at warning severity, banner does NOT show (warning), nav badge increments, all enabled channels receive alert.raised
- alert.acknowledged fanned out
- alert.resolved fanned out
- agent_offline raised at warning; reconnect → auto-resolved

Two bugs in the channel-enabled affordance:

1. List-row toggle was a static span with no handler; the row's row-link overlay swallowed every click and routed to /edit. Add POST /settings/notifications/{id}/toggle backed by a new store method SetNotificationChannelEnabled, and turn the row toggle into an htmx-driven button that swaps in the new state. Use event.stopPropagation() on the toggle so it beats the row link.
2. Edit-form toggle visually flipped but the underlying checkbox reverted: the visual span lives inside the <label>, so clicking it fired the inline JS handler AND the label's native checkbox-toggle, cancelling out. Bind to the checkbox 'change' event instead and let the label do the toggling; the JS just mirrors check.checked into the .on class.

Right-rail preview was rendered server-side via {{if eq $f.Kind ...}}, so it stayed on whatever kind the page loaded with. Editing an SMTP channel and flipping to ntfy in the picker left the email RFC 5322 sample on screen. Render all three preview panels with id='preview-<kind>' (only the matching one visible on first render) and toggle their .hidden class in the kind-switcher JS alongside the field panels. Same pattern used for fields-<kind>.