# Alerts and notifications

restic-manager raises alerts on conditions that need human attention. The alert engine evaluates rules on a 60s tick and on every job-finished / host-online event.

## Built-in alert kinds

| Kind                  | Trigger | Severity |
|-----------------------|---------|----------|
| `backup_failed`       | A backup job ends in `failed` or `cancelled` | warning |
| `forget_failed`       | A forget job ends in `failed` | warning |
| `prune_failed`        | A prune job ends in `failed` | critical |
| `check_failed`        | A check job ends in `failed` | critical |
| `agent_offline`       | A host has been offline more than 90s past its heartbeat cadence | warning |
| `stale_schedule`      | A schedule's "last run" is more than 1.5 × its interval ago | warning |
| `update_failed`       | An agent self-update reported a failure or the agent didn't reconnect within 90s | warning |
| `fleet_update_halted` | The rolling fleet-update worker stopped on a failure | critical |

Each alert has a `dedup_key`, so re-firing the same condition just bumps `last_seen_at`: the operator gets one row per condition, not a thousand.

## Lifecycle

```
raised ──acknowledge──▶ acknowledged ──resolve──▶ resolved
  │                          │
  └────────auto-resolve──────┘   (e.g. agent_offline auto-resolves on agent_online)
```

- **Acknowledge** says "I've seen this, stop notifying about it".
- **Resolve** says "the underlying condition is gone".
- Some alerts auto-resolve when the condition clears (`agent_offline` is the canonical example).

## Notification channels

Configure under **Settings → Notifications**. Each channel can subscribe to all alerts or filter by severity.

### Webhook

Posts a JSON envelope to a URL of your choice. Useful for piping into Slack via an Incoming Webhook URL or into your own alerting tooling.

### ntfy

Pushes a plain-text alert to an [ntfy.sh](https://ntfy.sh/) topic. Configure the topic URL, plus an optional bearer token if you self-host with auth.

### SMTP

Plain SMTP (with optional TLS).
Configure host, port, username, password, and the recipient list.

## Test fire

Each channel exposes a **Test fire** button that dispatches a single synthetic alert through the channel without touching the alert engine. Use this when you've added a channel and want to verify connectivity before the next real failure happens.

## What gets logged

Every alert raise / acknowledge / resolve writes an audit log entry. The audit log UI at **Settings → Audit log** filters by user, action, target, and time range, which is useful for the post-incident "who clicked acknowledge on the prune-failure alert" question.
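
## Sketch: dedup and lifecycle

To make the `dedup_key` behaviour, the raised / acknowledged / resolved lifecycle, and the `stale_schedule` threshold described above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not restic-manager's actual implementation: the `Alert` and `AlertStore` classes, the `first_seen_at` field, the `is_stale` helper, and the `kind:host` key format are all hypothetical. Only `dedup_key`, `last_seen_at`, the three states, and the 1.5 × interval rule come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Alert:
    kind: str
    dedup_key: str
    state: str               # "raised", "acknowledged", or "resolved"
    first_seen_at: datetime  # hypothetical field, not named in the docs
    last_seen_at: datetime


class AlertStore:
    """Hypothetical in-memory store holding one open row per dedup_key."""

    def __init__(self):
        self.open = {}    # dedup_key -> Alert that is raised or acknowledged
        self.closed = []  # resolved alerts, kept for history

    def raise_alert(self, kind, dedup_key, now):
        alert = self.open.get(dedup_key)
        if alert is not None:
            # Same condition fired again: bump the timestamp, add no new row.
            alert.last_seen_at = now
            return alert
        alert = Alert(kind, dedup_key, "raised", now, now)
        self.open[dedup_key] = alert
        return alert

    def acknowledge(self, dedup_key):
        alert = self.open[dedup_key]
        if alert.state == "raised":
            alert.state = "acknowledged"
        return alert

    def resolve(self, dedup_key):
        # Used for both manual resolve and auto-resolve; works from
        # either the raised or the acknowledged state.
        alert = self.open.pop(dedup_key, None)
        if alert is not None:
            alert.state = "resolved"
            self.closed.append(alert)
        return alert


def is_stale(last_run, interval, now):
    """stale_schedule rule: last run is more than 1.5 x interval ago."""
    return now - last_run > 1.5 * interval


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    store = AlertStore()
    key = "agent_offline:host-1"  # hypothetical key format
    store.raise_alert("agent_offline", key, t0)
    store.raise_alert("agent_offline", key, t0 + timedelta(seconds=60))
    print(len(store.open))           # still a single row for this condition
    print(store.resolve(key).state)  # e.g. auto-resolve when the agent returns
```

In this sketch a resolved alert leaves the open set, so a later recurrence of the same condition would start a fresh row; whether restic-manager behaves the same way is not stated on this page.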