docs(spec): clarify staleness vs job-failure alerting for asleep hosts
@@ -37,8 +37,14 @@ Let an operator mark a host as **not** always-on. Such a host:
 whether it missed a scheduled backup and — if so — triggers a
 catch-up backup automatically.
 4. Still raises a *staleness* alert if it has genuinely gone too long
-without any backup (a host left in a drawer), and still raises normal
-job-failure alerts for backups that run and fail.
+without any backup (a host left in a drawer). This is the only
+alert covering an asleep host: while the agent is offline no job
+runs, so there is no failure to detect — staleness is the safety
+net for "no backups are happening at all."
+5. Leaves normal job-failure alerting untouched: a backup that
+actually runs (scheduled or catch-up) and fails alerts as it does
+today. Failures can only occur while the agent is online and
+executing restic.
 
 Default behaviour is unchanged for the entire existing fleet.
 
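For illustration only, the split between items 4 and 5 could be sketched as two independent checks. This is a minimal sketch, not the project's implementation: `HostStatus`, its fields, `alerts_for`, and the 14-day threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class HostStatus:
    # All field names here are hypothetical, chosen for this sketch.
    last_successful_backup: datetime
    # Outcome of the most recent job that actually ran since the last check.
    # None means no job ran at all (for example, the agent was offline).
    last_job_succeeded: Optional[bool] = None


def alerts_for(host: HostStatus, now: datetime, max_age: timedelta) -> List[str]:
    alerts = []
    # Item 4: staleness depends only on the age of the last good backup,
    # so it still fires for a host whose agent never comes online.
    if now - host.last_successful_backup > max_age:
        alerts.append("staleness")
    # Item 5: a job-failure alert requires a job that ran and failed;
    # "no job ran" (None) never produces one.
    if host.last_job_succeeded is False:
        alerts.append("job-failure")
    return alerts


now = datetime(2024, 1, 31)
drawer_host = HostStatus(last_successful_backup=datetime(2024, 1, 1))
print(alerts_for(drawer_host, now, timedelta(days=14)))  # ['staleness']
```

Keeping the two checks separate mirrors the spec's reasoning: an offline agent cannot report a failure, so only the age-based check can cover it.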