docs(spec): clarify staleness vs job-failure alerting for asleep hosts

2026-06-15 20:42:00 +01:00
parent 0c3a0844e4
commit 261b83ec26
1 changed files with 8 additions and 2 deletions
@@ -37,8 +37,14 @@ Let an operator mark a host as **not** always-on. Such a host:
   whether it missed a scheduled backup and — if so — triggers a
   catch-up backup automatically.
 4. Still raises a *staleness* alert if it has genuinely gone too long
-   without any backup (a host left in a drawer), and still raises normal
-   job-failure alerts for backups that run and fail.
+   without any backup (a host left in a drawer). This is the only
+   alert covering an asleep host: while the agent is offline no job
+   runs, so there is no failure to detect — staleness is the safety
+   net for "no backups are happening at all."
+5. Leaves normal job-failure alerting untouched: a backup that
+   actually runs (scheduled or catch-up) and fails alerts as it does
+   today. Failures can only occur while the agent is online and
+   executing restic.

 Default behaviour is unchanged for the entire existing fleet.