docs: P3 alerts spec — add SMTP as first-class v1 channel
Post-brainstorm change after operator review: the overnight-digest use case ("don't ping me at 03:00, email me in the morning") is poorly served by ntfy (push) and clumsy via webhook → email-gateway. SMTP joins webhook + ntfy as the third v1 channel; Apprise stays deferred.

Spec updates:
- Decision 5 reworded: three channels in v1.
- Channel iface gains smtpChannel using net/smtp + crypto/tls, with a 10s timeout vs 5s for HTTP; the STARTTLS handshake + DATA over a slow link legitimately needs the headroom.
- Migration 0014 CHECK now allows 'smtp'. New smtpConfig struct: host, port, encryption (starttls/tls/none), username, password (AEAD), from, to. One channel = one To-address; multi-recipient = multiple channels (keeps failure attribution per-recipient).
- Body shape documented: hardcoded subject pattern '[restic-manager] [<sev>] <host>: <kind>'; the Message-ID includes the alert id so threading groups raised → ack → resolved cleanly. Plain text only in v1.
- Encryption defaults to STARTTLS on 587 (implicit TLS on 465, none on 25); PLAIN auth over TLS, no XOAUTH2 yet (app passwords recommended for Gmail / M365).
- Test plan adds a MailHog step to the Playwright sweep.
- Non-goals expanded: HTML emails, OAuth2/XOAUTH2 and multi-recipient channels are explicitly out of v1.

Wireframe updates (_diag/p3-alerts-wireframe/wireframe.html):
- Kind picker grows from 2 cards to 3 (Webhook / Ntfy / SMTP @). SMTP gets the --ok green colour family so it visually separates from webhook (accent) and ntfy (warm).
- New SMTP variant section (3c): host+port+encryption row, user+pass row, from+to row, test result, plus a right-rail email shape preview showing the RFC 5322 layout.
- Channel list grows a third row: 'overnight-digest · smtp://… → ops-overnight@example.com'.
@@ -22,8 +22,12 @@ Brainstorm decisions (in order asked):
 4. **Resolution.** Auto-resolve when the underlying condition clears + manual
 Resolve at any time. Acknowledge is a separate "I've seen it" intermediate
 state that does NOT close the alert.
-5. **v1 channels.** Webhook + native ntfy. Apprise deferred (the channel
-plumbing accepts new kinds without reshaping).
+5. **v1 channels.** Webhook + native ntfy + SMTP. Apprise deferred (the
+channel plumbing accepts new kinds without reshaping). SMTP added as
+a first-class channel post-brainstorm because the use case — overnight
+alerts the operator wants to read in the morning rather than be pinged
+on at 03:00 — is poorly served by ntfy's push model and clumsy via
+webhook → email-gateway.
 6. **Channel scope.** Global only. No per-host or per-severity routing in v1.
 7. **Notification body.** Structured JSON for webhooks, formatted
 title+body+click-URL for ntfy, plus a per-channel "Send test notification"
@@ -70,7 +74,7 @@ goroutine:
 | `internal/alert.Engine` | Owns the rule evaluation. Exposes `OnJobFinished`, `OnHostOffline`, `OnHostOnline` event hooks; runs a 60s ticker for stale-schedule + auto-resolution sweeps. Persists raises/resolves through the store. | store, notification.Hub, slog |
 | `internal/alert.Rule` + per-rule files | Each of the six rules is a small struct with `Kind() string`, `Severity() string`, `MessageFor(ctx) string`. The engine iterates over a registered slice. | store models |
 | `internal/notification.Hub` | Receives "alert raised/resolved/test" events; fans out to enabled channels in parallel; logs results to a new `notification_log` table. | store, channel adapters |
-| `internal/notification.Channel` (iface) | Single method `Send(ctx, payload) error` with a 5s context. Two impls in v1: `webhookChannel`, `ntfyChannel`. | http.Client |
+| `internal/notification.Channel` (iface) | Single method `Send(ctx, payload) error` with a 5s context for HTTP channels, 10s for SMTP. Three impls in v1: `webhookChannel`, `ntfyChannel`, `smtpChannel`. | http.Client; net/smtp + crypto/tls for SMTP |
 | `internal/store/alerts.go` | CRUD on `alerts` table: `RaiseOrTouch(host_id, kind, severity, message)`, `Acknowledge(id, user)`, `Resolve(id, by user)`, `AutoResolve(host_id, kind)`, `ListAlerts(filter)`, plus the `last_seen_at` bump. | sqlite |
 | `internal/store/notification_channels.go` | CRUD on `notification_channels` (new table) + `notification_log` (new table). | sqlite, crypto.AEAD (for secrets) |
 | `internal/server/http/ui_alerts.go` | `/alerts` page handler + filter parsing + ack/resolve form actions. | store |
@@ -162,6 +166,58 @@ Touch-only events keep the row's `last_seen_at` fresh so the UI can render
 alert.test`. The same envelope shape is reused across events — operators
 build one bridge, switch on `event` and `severity`.
 
+**SMTP** — single-recipient plain-text email per channel. The channel
+config carries the SMTP server credentials and a `to` address; one
+channel = one recipient (or one distribution-list address). Operators
+who want multiple recipients add multiple channels; this keeps the config
+flat and the failure modes per-recipient.
+
+Subject pattern is hardcoded (no per-channel template in v1):
+
+```
+Subject: [restic-manager] [<severity>] <host_name>: <kind>
+From: <configured-from-address>
+To: <configured-to-address>
+Date: <RFC 5322 date-time>
+Message-ID: <alert_id@<server-host>>
+
+<message line — same string the webhook/ntfy gets>
+
+—
+Raised at: 2026-05-04T15:42:01Z
+Severity: warning
+Host: alfa-01
+Kind: backup_failed
+
+Open in restic-manager:
+https://restic-manager.example/alerts/01KQT...
+
+(This message was sent by restic-manager. Acknowledge or resolve in the UI.)
+```
+
+The body is plain text only in v1 — no HTML alternative — both because
+the data is already structured well enough as text and because HTML
+email opens a long tail of rendering / sanitisation concerns. The
+`Message-ID` includes the alert id so a thread-aware client can group
+related events (raised → acknowledged → resolved) together.
+
+Encryption:
+- **STARTTLS** (default, port 587). Opportunistic upgrade; what most
+operator-facing relays expect.
+- **Implicit TLS** (port 465). TLS handshake immediately on connect.
+- **None** (port 25). No encryption. Hidden behind a "Yes I understand" warning
+on the form because the password goes over the wire in cleartext.
+
+Auth:
+- **PLAIN** (RFC 4616) over TLS. Default and almost always what's wanted.
+- **CRAM-MD5** (RFC 2195). Used automatically if the server advertises it;
+no UI toggle.
+- No OAuth2 / XOAUTH2 in v1; that's a real next step if Gmail-without-
+app-passwords becomes a recurring ask.
+
+Per-message timeout is 10s (vs 5s for HTTP channels) — STARTTLS
+handshake + DATA over a slow link can legitimately take that long.
+
 **Ntfy** — uses the standard publish format:
 
 ```
@@ -229,11 +285,11 @@ with rows from the alert-engine-pre-bump period.
 ```sql
 CREATE TABLE notification_channels (
 id TEXT PRIMARY KEY,
-kind TEXT NOT NULL CHECK (kind IN ('webhook', 'ntfy')),
+kind TEXT NOT NULL CHECK (kind IN ('webhook', 'ntfy', 'smtp')),
 name TEXT NOT NULL,
 enabled INTEGER NOT NULL DEFAULT 1 CHECK (enabled IN (0, 1)),
 config BLOB NOT NULL, -- AEAD-encrypted JSON; per-kind shape
-default_priority TEXT, -- ntfy only; null for others
+default_priority TEXT, -- ntfy only; null for webhook + smtp
 created_at TEXT NOT NULL,
 updated_at TEXT NOT NULL,
 last_fired_at TEXT
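For the `config BLOB` column, "AEAD-encrypted JSON" could look like the sketch below. The spec only names `crypto.AEAD`. Several choices here are assumptions:

- AES-256-GCM as the cipher.
- A layout with the nonce prepended to the ciphertext.
- Binding the row's `kind` as additional data, so a blob copied onto a row of a different kind fails to decrypt.
- The helper names `sealConfig` and `openConfig`.

Key storage and rotation are out of scope here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
)

// newAEAD builds AES-GCM; a 32-byte key selects AES-256.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealConfig JSON-encodes cfg and encrypts it with the row's kind bound in as
// additional data. Output layout: nonce || ciphertext+tag.
func sealConfig(key []byte, kind string, cfg any) ([]byte, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	plain, err := json.Marshal(cfg)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plain, []byte(kind)), nil
}

// openConfig reverses sealConfig and decodes the JSON into out.
func openConfig(key []byte, kind string, blob []byte, out any) error {
	aead, err := newAEAD(key)
	if err != nil {
		return err
	}
	n := aead.NonceSize()
	if len(blob) < n {
		return errors.New("config blob too short")
	}
	plain, err := aead.Open(nil, blob[:n], blob[n:], []byte(kind))
	if err != nil {
		return err // wrong key, tampered blob, or kind mismatch
	}
	return json.Unmarshal(plain, out)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; a real key comes from server config
	blob, err := sealConfig(key, "smtp", map[string]string{"host": "smtp.example.com", "password": "s3cret"})
	if err != nil {
		panic(err)
	}
	var cfg map[string]string
	fmt.Println(openConfig(key, "smtp", blob, &cfg), cfg["host"])
	fmt.Println(openConfig(key, "ntfy", blob, &cfg) != nil) // kind mismatch is rejected
}
```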
@@ -273,6 +329,16 @@ type ntfyConfig struct {
 Topic string `json:"topic"`
 AccessToken string `json:"access_token,omitempty"`
 }
+
+type smtpConfig struct {
+Host string `json:"host"` // e.g. smtp.example.com
+Port int `json:"port"` // default 587 (STARTTLS), 465 (TLS), 25 (none)
+Encryption string `json:"encryption"` // "starttls" | "tls" | "none"
+Username string `json:"username"`
+Password string `json:"password"` // sensitive — AEAD-encrypted with the rest of config
+From string `json:"from"` // RFC 5322 address; "alerts@example.com" or "Restic-Manager <alerts@…>"
+To string `json:"to"` // single recipient or distribution-list address; v1 = one channel = one to-line
+}
 ```
 
 ### Engine state
@@ -316,6 +382,13 @@ inert sub-tabs.
 - `internal/notification/ntfy_test.go` — title/priority/tags/click headers
 match the severity mapping; access token sent as `Authorization: Bearer
 <token>`; default priority overridden by severity for critical.
+- `internal/notification/smtp_test.go` — round-trip against a local
+in-process fake server (stdlib `net/smtp` has no server side), or MailHog if convenient:
+STARTTLS handshake completes against a self-signed cert; PLAIN auth
+uses configured creds; subject + from + to + body bytes match the
+spec'd format; Message-ID contains the alert id; 10s timeout enforced;
+failure path (auth refused) lands in `notification_log` with the
+server's error string.
 - `internal/server/http/ui_alerts_test.go` — page renders with filters
 applied; ack/resolve POSTs flip the row + write audit; HX-Redirect
 bounces back to the filtered list.
@@ -346,14 +419,18 @@ End-of-phase sweep mirrors the P2R-02 / P3-restore pattern:
 test" → green ✓.
 7. Configure a ntfy channel pointing at a local sink → click "Send test"
 → green ✓.
-8. Trigger a fresh failed backup → both channels receive the notification
-(verified from sink logs); `notification_log` has two rows
-`event=alert.raised, ok=true`.
-9. Manually Resolve the open `backup_failed`; confirm both channels
-receive `event=alert.resolved`.
-10. Critical-severity test: trigger `check_failed` (mocked) → dashboard
+8. Configure an SMTP channel pointing at a local MailHog (Docker, port
+1025, no TLS for the local-only sweep) → click "Send test" → green ✓
+→ MailHog UI at :8025 shows the test email with the right subject
+and Message-ID.
+9. Trigger a fresh failed backup → all three channels receive the
+notification (verified from sink logs + MailHog inbox);
+`notification_log` has three rows `event=alert.raised, ok=true`.
+10. Manually Resolve the open `backup_failed`; confirm all three channels
+receive `event=alert.resolved`.
+11. Critical-severity test: trigger `check_failed` (mocked) → dashboard
 banner appears; clicking it lands on `/alerts?severity=critical&status=open`.
-11. Empty the alerts again → banner disappears.
+12. Empty the alerts again → banner disappears.
 
 Screenshots into `_diag/p3-alerts-sweep/`. End-to-end clean, zero console
 errors, before handing back.
@@ -373,8 +450,17 @@ errors, before handing back.
 - **Per-rule cooldowns / re-raise on long-running issues.** Out of scope
 (brainstorm question 8 ruled this out). Operators see "still happening"
 in the UI; they don't get a reminder ping.
-- **SMTP / email channel.** Out of scope. Operators wanting email today
-can chain webhook → email-gateway; native SMTP can land later.
+- **SMTP HTML emails.** v1 is plain text only — operators wanting rich
+rendering can deploy a webhook → mail-merge bridge, or wait for a v2
+template engine. The Message-ID threading + plain text body should be
+enough for almost every overnight-digest workflow.
+- **SMTP OAuth2 / XOAUTH2.** Out of scope. Gmail / Microsoft 365 with
+modern OAuth requires an `app password` workaround in v1. Native
+XOAUTH2 lands when an operator asks (or when Google starts refusing
+app passwords for non-business accounts in earnest).
+- **Multi-recipient SMTP channels.** A channel = one `To`. Operators
+wanting multiple recipients add multiple channels. Keeps failure
+attribution per-recipient.
 - **Apprise sidecar integration.** Deferred per brainstorm. The
 `Channel` interface accepts a third impl without reshaping when we get
 there.