Always-On vs intermittent host mode (laptops): suppress offline noise, catch up missed backups #31

Merged
steve merged 17 commits from feat-laptop-host-mode into main 2026-06-15 23:01:04 +01:00

Lets an operator mark a host as not always-on (laptop/workstation) so that it stops generating offline-alert noise when it legitimately sleeps, shows a calm state instead, and automatically catches up on a backup it missed while away.

What it does

  • hosts.always_on column (migration 0024, default 1 → no change for the existing fleet; intermittent is opt-in).
  • Alerts: intermittent hosts never raise agent_offline; instead the previously-dead stale_schedule alert is wired up for them at a 7-day threshold (only when they have an enabled schedule), resolved on the next successful backup or on mode toggle.
  • Catch-up: ~60s after an intermittent host reconnects, a server-side scheduler dispatches a backup for any enabled schedule whose window elapsed while it slept (overdue = cron.Next(lastBackup) <= now), guarded against double-dispatch via a real queued/running-backup check.
  • Toggle: POST /hosts/{id}/mode (operator-band, audited host.mode_updated); clears open offline/staleness alerts so the next sweep re-settles.
  • UI: intermittent offline hosts render a grey asleep · … · will catch up on return state instead of red "offline"; the host-detail header groups tags + presence into boxed pills with click-to-edit, a 24x7/Free presence chip, and a simplified out of date agent chip.

Design + plan: docs/specs/2026-06-15-always-on-host-mode-design.md, docs/plans/2026-06-15-always-on-host-mode.md. Tracked as NS-08 in tasks.md.

Test plan

  • go test ./internal/store/ ./internal/alert/ green (always_on round-trip, offline suppression, staleness raise+resolve, mode-change resolve)
  • go test ./internal/server/http/ -run 'TestScheduleOverdue|TestRunCatchup|TestHostModeSaveToggle' green
  • go vet ./... clean
  • Manual: toggle a host to intermittent → 24x7 chip becomes Free, offline host shows 'asleep', reconnect triggers catch-up

Note: this branch does not include the stale-dated sparkline-test fix (that's in PR #30); CI here may show that pre-existing failure until #30 lands in main and this branch is updated.

steve added 16 commits 2026-06-15 22:59:18 +01:00
ui(host header): boxed tags/presence pills, click-to-edit, simplified out-of-date chip
CI / Test (rest) (pull_request) Successful in 41s
CI / Test (store) (pull_request) Successful in 1m16s
CI / Lint (pull_request) Successful in 41s
CI / Build (windows/amd64) (pull_request) Successful in 14s
CI / Build (linux/arm64) (pull_request) Successful in 15s
e2e / Playwright vs docker-compose (pull_request) Failing after 11s
CI / Build (linux/amd64) (pull_request) Successful in 50s
CI / Test (server-http) (pull_request) Failing after 2m53s
39030a3bbe
steve added 1 commit 2026-06-15 23:00:59 +01:00
Merge branch 'main' into feat-laptop-host-mode
CI / Test (rest) (pull_request) Successful in 1m6s
CI / Lint (pull_request) Successful in 18s
CI / Build (windows/amd64) (pull_request) Successful in 12s
CI / Build (linux/amd64) (pull_request) Successful in 14s
CI / Test (store) (pull_request) Successful in 1m8s
CI / Build (linux/arm64) (pull_request) Successful in 11s
e2e / Playwright vs docker-compose (pull_request) Failing after 10s
CI / Test (server-http) (pull_request) Successful in 2m52s
e17932d797
steve merged commit d8fd4110b0 into main 2026-06-15 23:01:04 +01:00
steve deleted branch feat-laptop-host-mode 2026-06-15 23:01:04 +01:00
Reference: steve/restic-manager#31