docs(spec): clarify staleness vs job-failure alerting for asleep hosts

This commit is contained in:
2026-06-15 20:42:00 +01:00
parent 0c3a0844e4
commit 261b83ec26
@@ -37,8 +37,14 @@ Let an operator mark a host as **not** always-on. Such a host:
whether it missed a scheduled backup and — if so — triggers a whether it missed a scheduled backup and — if so — triggers a
catch-up backup automatically. catch-up backup automatically.
4. Still raises a *staleness* alert if it has genuinely gone too long 4. Still raises a *staleness* alert if it has genuinely gone too long
without any backup (a host left in a drawer), and still raises normal without any backup (a host left in a drawer). This is the only
job-failure alerts for backups that run and fail. alert covering an asleep host: while the agent is offline no job
runs, so there is no failure to detect — staleness is the safety
net for "no backups are happening at all."
5. Leaves normal job-failure alerting untouched: a backup that
actually runs (scheduled or catch-up) and fails alerts as it does
today. Failures can only occur while the agent is online and
executing restic.
Default behaviour is unchanged for the entire existing fleet. Default behaviour is unchanged for the entire existing fleet.