# Always-On vs Intermittent Host Mode — Implementation Plan > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **Goal:** Let an operator mark a host as not-always-on so it stops raising offline alerts when it legitimately sleeps, renders a calm "asleep" state, auto-catches-up a missed backup ~1 minute after it reconnects, and still raises a long-threshold staleness alert if it goes too long with no backup. **Architecture:** A thin policy + presentation layer over the existing online/offline state machine. A new `hosts.always_on` boolean (default 1 = today's behaviour) gates three behaviours: offline-alert suppression + a 7-day staleness alert in the alert engine; an in-memory catch-up scheduler in the HTTP server armed on agent hello and fired from the existing 30s tick; and an "asleep" UI state plus a 24×7 chip. Online/offline tracking, heartbeat, and `pending_runs` are untouched. **Tech Stack:** Go, SQLite (modernc), `github.com/robfig/cron/v3` (already a dependency), Go `html/template`, Tailwind-in-`input.css`. **Spec:** `docs/specs/2026-06-15-always-on-host-mode-design.md` --- ## File Structure - **Create** `internal/store/migrations/0024_hosts_always_on.sql` — add the column. - **Modify** `internal/store/types.go` — add `Host.AlwaysOn bool`. - **Modify** `internal/store/hosts.go` — add `always_on` to the 3 host SELECTs + `scanHostRow`; add `SetHostAlwaysOn`. - **Create** `internal/store/hosts_always_on_test.go` — round-trip + default test. - **Modify** `internal/alert/engine.go` — suppress offline for intermittent hosts; staleness sweep; resolve staleness on backup success. - **Modify** `internal/alert/rules.go` — exported `ResolveKind` helper for the toggle handler; staleness threshold constant. - **Create** `internal/alert/intermittent_test.go` — suppression + staleness + resolve tests. - **Create** `internal/server/http/catchup.go` — overdue helper + in-memory catch-up scheduler. - **Create** `internal/server/http/catchup_test.go` — overdue table tests. - **Modify** `internal/server/http/server.go` — catch-up map fields on `Server`, init in `New`. - **Modify** `internal/server/http/host_credentials.go` — arm catch-up in `onAgentHello`. - **Modify** `cmd/server/main.go` — call `srv.RunCatchupsDue` on the pending-drain tick. - **Modify** `internal/server/http/ui_handlers.go` — `handleUIHostModeSave` handler. - **Modify** `internal/server/http/server.go` (routes) — mount `POST /hosts/{id}/mode`. - **Modify** `web/styles/input.css` — `dot-asleep` token. - **Modify** `web/templates/partials/host_row.html` — asleep dot + text. - **Modify** `web/templates/partials/host_chrome.html` — asleep dot/last-seen, 24×7 chip, mode toggle form. - **Modify** `tasks.md` — record the feature. --- ## Task 1: Schema + store field for `always_on` **Files:** - Create: `internal/store/migrations/0024_hosts_always_on.sql` - Modify: `internal/store/types.go:62-102` (Host struct) - Modify: `internal/store/hosts.go` (3 SELECTs at lines 41-48, 56-63, 224-231; `scanHostRow` at 261-334) - Test: `internal/store/hosts_always_on_test.go` - [ ] **Step 1: Write the migration** Create `internal/store/migrations/0024_hosts_always_on.sql`: ```sql -- 0024: distinguish always-on (24x7 server) hosts from intermittent -- hosts (laptops/workstations that legitimately sleep). Default 1 so -- every existing and future host keeps today's offline/alert -- semantics unless explicitly opted out. Column-level ALTER per the -- repo's migration rules (no table rebuild — hosts has inbound FKs). ALTER TABLE hosts ADD COLUMN always_on INTEGER NOT NULL DEFAULT 1; ``` - [ ] **Step 2: Add the struct field** In `internal/store/types.go`, add to the `Host` struct (after `RepoStatusError` at line 101): ```go // AlwaysOn is true for 24x7 server hosts (the default). When false // the host is intermittent (laptop/workstation): offline alerts are // suppressed, the UI shows an "asleep" state, and a missed backup is // caught up ~1 min after reconnect. See the always-on-host-mode spec. AlwaysOn bool ``` - [ ] **Step 3: Thread `always_on` through reads** In `internal/store/hosts.go`, append `, always_on` to the SELECT column list in all three queries: `LookupHostByAgentToken` (line 47), `GetHost` (line 62), and `ListHosts` (line 230). Each currently ends `repo_status, repo_status_error` — change to `repo_status, repo_status_error, always_on`. Then in `scanHostRow` (line 261), add scanning. Add a local var and the scan target. Change the `Scan(...)` call's final args from `&h.RepoStatus, &h.RepoStatusError)` to `&h.RepoStatus, &h.RepoStatusError, &alwaysOn)` and declare `var alwaysOn int` in the var block, then after the existing post-scan assignments add: ```go h.AlwaysOn = alwaysOn != 0 ``` (SQLite stores the boolean as INTEGER; scan into int then compare to avoid driver bool-coercion surprises.) - [ ] **Step 4: Add `SetHostAlwaysOn`** In `internal/store/hosts.go`, after `SetHostTags` (line 379), add: ```go // SetHostAlwaysOn flips the host's always-on flag. true = 24x7 server // (default); false = intermittent host (laptop). See the // always-on-host-mode spec. func (s *Store) SetHostAlwaysOn(ctx context.Context, hostID string, alwaysOn bool) error { v := 0 if alwaysOn { v = 1 } _, err := s.db.ExecContext(ctx, `UPDATE hosts SET always_on = ? WHERE id = ?`, v, hostID) if err != nil { return fmt.Errorf("store: set host always_on: %w", err) } return nil } ``` - [ ] **Step 5: Write the round-trip test** Create `internal/store/hosts_always_on_test.go`. Use the existing test harness pattern — check a sibling test (e.g. `internal/store/hosts_test.go`) for the `newTestStore`/`testStore` helper name and the host-creation helper, and mirror it exactly. The test body: ```go package store import ( "context" "testing" "time" ) func TestHostAlwaysOnDefaultAndToggle(t *testing.T) { ctx := context.Background() st := newTestStore(t) // mirror the helper used by hosts_test.go h := Host{ ID: "h-always-on", Name: "lap", OS: "linux", Arch: "amd64", ProtocolVersion: 1, EnrolledAt: time.Now().UTC(), } if err := st.CreateHost(ctx, h, "tok-hash", "pin"); err != nil { t.Fatalf("create host: %v", err) } got, err := st.GetHost(ctx, h.ID) if err != nil { t.Fatalf("get host: %v", err) } if !got.AlwaysOn { t.Fatalf("new host should default to always_on=true, got false") } if err := st.SetHostAlwaysOn(ctx, h.ID, false); err != nil { t.Fatalf("set always_on: %v", err) } got, err = st.GetHost(ctx, h.ID) if err != nil { t.Fatalf("get host 2: %v", err) } if got.AlwaysOn { t.Fatalf("expected always_on=false after toggle, got true") } // ListHosts must surface the same value. hosts, err := st.ListHosts(ctx) if err != nil { t.Fatalf("list hosts: %v", err) } if len(hosts) != 1 || hosts[0].AlwaysOn { t.Fatalf("ListHosts should report always_on=false, got %+v", hosts) } } ``` - [ ] **Step 6: Run the test (expect FAIL first if written before code, else PASS)** Run: `go test ./internal/store/ -run TestHostAlwaysOnDefaultAndToggle -v` Expected: PASS once Steps 1-4 are in. If you wrote the test first, it fails to compile on `AlwaysOn` / `SetHostAlwaysOn` — that is the expected red. - [ ] **Step 7: Commit** ```bash go vet ./internal/store/... git add internal/store/migrations/0024_hosts_always_on.sql internal/store/types.go internal/store/hosts.go internal/store/hosts_always_on_test.go git commit -m "feat(store): add hosts.always_on flag (default on)" ``` --- ## Task 2: Overdue computation helper This is a pure function so it can be unit-tested in isolation before the scheduler wires it up. It lives in the new `catchup.go` (the scheduler will follow in Task 3, same file). **Files:** - Create: `internal/server/http/catchup.go` - Test: `internal/server/http/catchup_test.go` - [ ] **Step 1: Write the failing test** Create `internal/server/http/catchup_test.go`: ```go package http import ( "testing" "time" ) func TestScheduleOverdue(t *testing.T) { mustParse := func(s string) time.Time { t.Helper() v, err := time.Parse(time.RFC3339, s) if err != nil { t.Fatalf("parse %q: %v", s, err) } return v } daily := "0 2 * * *" // 02:00 every day cases := []struct { name string cron string lastBackup *time.Time now time.Time want bool }{ { name: "never backed up is overdue", cron: daily, lastBackup: nil, now: mustParse("2026-06-15T09:00:00Z"), want: true, }, { name: "missed last nights window", cron: daily, lastBackup: ptrTime(mustParse("2026-06-13T02:05:00Z")), now: mustParse("2026-06-15T09:00:00Z"), want: true, }, { name: "backed up after the most recent window", cron: daily, lastBackup: ptrTime(mustParse("2026-06-15T02:05:00Z")), now: mustParse("2026-06-15T09:00:00Z"), want: false, }, { name: "unparseable cron is never overdue", cron: "not a cron", lastBackup: nil, now: mustParse("2026-06-15T09:00:00Z"), want: false, }, } for _, c := range cases { t.Run(c.name, func(t *testing.T) { got := scheduleOverdue(c.cron, c.lastBackup, c.now) if got != c.want { t.Fatalf("scheduleOverdue(%q, %v, %v) = %v, want %v", c.cron, c.lastBackup, c.now, got, c.want) } }) } } func ptrTime(t time.Time) *time.Time { return &t } ``` - [ ] **Step 2: Run the test to verify it fails** Run: `go test ./internal/server/http/ -run TestScheduleOverdue -v` Expected: FAIL — `undefined: scheduleOverdue`. - [ ] **Step 3: Implement `scheduleOverdue`** Create `internal/server/http/catchup.go` with the helper (the scheduler methods are added in Task 3): ```go // catchup.go — server-side catch-up for intermittent (non-always-on) // hosts. When such a host reconnects we wait a short settle window, // then dispatch a backup for any schedule whose window elapsed while // the host was asleep. This is separate from pending_runs: a host that // was asleep never fired its local cron, so no pending row exists. package http import ( "time" ) // scheduleOverdue reports whether a schedule's most recent expected // fire is newer than the host's last successful backup — i.e. a window // passed with no backup. A nil lastBackup means "never backed up" and // is always overdue (provided the cron parses). An unparseable cron is // treated as not-overdue so a bad expression can never trigger a // surprise dispatch. Uses the same cronParser the agent's scheduler // and schedule validation use, so interpretation is identical. func scheduleOverdue(cronExpr string, lastBackup *time.Time, now time.Time) bool { sched, err := cronParser.Parse(cronExpr) if err != nil { return false } if lastBackup == nil { return true } next := sched.Next(*lastBackup) return !next.After(now) } ``` - [ ] **Step 4: Run the test to verify it passes** Run: `go test ./internal/server/http/ -run TestScheduleOverdue -v` Expected: PASS (all four sub-cases). - [ ] **Step 5: Commit** ```bash go vet ./internal/server/http/... git add internal/server/http/catchup.go internal/server/http/catchup_test.go git commit -m "feat(catchup): scheduleOverdue helper for missed-window detection" ``` --- ## Task 3: Catch-up scheduler (arm on hello, fire on tick) **Files:** - Modify: `internal/server/http/server.go:68-93` (Server struct), `:96-112` (New) - Modify: `internal/server/http/catchup.go` (add scheduler methods) - Modify: `internal/server/http/host_credentials.go:463-486` (onAgentHello) - Modify: `cmd/server/main.go:228-229` (pending-drain tick case) - [ ] **Step 1: Add catch-up state to the Server struct** In `internal/server/http/server.go`, add fields to `Server` (after `treeCache` at line 92): ```go // catchupDueAt tracks intermittent hosts that reconnected and are // in their settle window. Keyed hostID → earliest time to evaluate // catch-up. Best-effort + in-memory: a server restart simply re-arms // on the next hello. Guarded by catchupMu. catchupMu sync.Mutex catchupDueAt map[string]time.Time ``` Add `"time"` to the imports if not already present (check the import block). - [ ] **Step 2: Initialise the map in New** In `New` (line 106), add to the `&Server{...}` literal: ```go catchupDueAt: make(map[string]time.Time), ``` - [ ] **Step 3: Add scheduler methods to catchup.go** Append to `internal/server/http/catchup.go`. Add `"context"`, `"log/slog"` to its imports: ```go // catchupSettle is how long after a reconnect we wait before evaluating // catch-up, so a laptop that wakes briefly and sleeps again doesn't // trigger a backup it can't finish. ~1 minute per the spec. const catchupSettle = 60 * time.Second // ArmCatchup records that an intermittent host just reconnected and // should be evaluated for a missed backup after the settle window. // No-op for always-on hosts (caller passes only intermittent hosts). // Re-arming overwrites the timer (debounce — flapping doesn't stack). func (s *Server) ArmCatchup(hostID string, now time.Time) { s.catchupMu.Lock() defer s.catchupMu.Unlock() if s.catchupDueAt == nil { s.catchupDueAt = make(map[string]time.Time) } s.catchupDueAt[hostID] = now.Add(catchupSettle) } // dueCatchups returns the hostIDs whose settle window has elapsed and // removes them from the map. Caller evaluates each. func (s *Server) dueCatchups(now time.Time) []string { s.catchupMu.Lock() defer s.catchupMu.Unlock() var due []string for id, at := range s.catchupDueAt { if !now.Before(at) { due = append(due, id) delete(s.catchupDueAt, id) } } return due } // RunCatchupsDue is the tick entrypoint. For each host past its settle // window it dispatches a backup for every enabled schedule that is // overdue. Skips hosts that bounced back offline, that are already // running/queued a job, or that turned out to be always-on. func (s *Server) RunCatchupsDue(ctx context.Context) { if s.deps.Hub == nil { return } now := time.Now().UTC() for _, hostID := range s.dueCatchups(now) { s.runCatchup(ctx, hostID, now) } } // runCatchup evaluates and dispatches catch-up backups for a single // host. Exported logic kept here so RunCatchupsDue reads cleanly. func (s *Server) runCatchup(ctx context.Context, hostID string, now time.Time) { conn := s.deps.Hub.Conn(hostID) if conn == nil { return // bounced offline during the settle window; re-arms on next hello } host, err := s.deps.Store.GetHost(ctx, hostID) if err != nil { slog.Warn("catchup: load host", "host_id", hostID, "err", err) return } if host.AlwaysOn { return // mode flipped during settle window } if host.CurrentJobID != nil { return // a job is already running; don't pile on } schedules, err := s.deps.Store.ListSchedulesByHost(ctx, hostID) if err != nil { slog.Warn("catchup: list schedules", "host_id", hostID, "err", err) return } for _, sc := range schedules { if !sc.Enabled || len(sc.SourceGroupIDs) == 0 { continue } if !scheduleOverdue(sc.CronExpr, host.LastBackupAt, now) { continue } for _, gid := range sc.SourceGroupIDs { g, err := s.deps.Store.GetSourceGroup(ctx, hostID, gid) if err != nil { slog.Warn("catchup: load source group", "host_id", hostID, "schedule_id", sc.ID, "group_id", gid, "err", err) continue } if _, derr := s.dispatchBackupForGroupCore(ctx, conn, hostID, sc.ID, g, now); derr != nil { // Send failed — host dropped again. Re-arm so the next // reconnect retries; stop processing this host. s.ArmCatchup(hostID, now) return } slog.Info("catchup: dispatched missed backup", "host_id", hostID, "schedule_id", sc.ID, "group", g.Name) } } } ``` - [ ] **Step 4: Arm catch-up on agent hello** In `internal/server/http/host_credentials.go`, in `onAgentHello` (line 463), after the `go s.DrainPending(...)` line (485), add: ```go // Intermittent hosts that just reconnected may have slept through a // backup window. Arm a catch-up evaluation after a settle delay; the // pending-drain tick fires it. Always-on hosts never need this. if host, err := s.deps.Store.GetHost(ctx, hostID); err == nil && !host.AlwaysOn { s.ArmCatchup(hostID, time.Now().UTC()) } ``` Verify `time` is already imported in this file (it is — used elsewhere). If not, add it. - [ ] **Step 5: Fire catch-up from the pending-drain tick** In `cmd/server/main.go`, in the `case <-pendingDrainTick.C:` block (line 228), change: ```go case <-pendingDrainTick.C: srv.DrainAllDue(ctx) ``` to: ```go case <-pendingDrainTick.C: srv.DrainAllDue(ctx) srv.RunCatchupsDue(ctx) ``` - [ ] **Step 6: Build and vet** Run: `go build ./... && go vet ./...` Expected: clean build, no vet errors. - [ ] **Step 7: Commit** ```bash git add internal/server/http/server.go internal/server/http/catchup.go internal/server/http/host_credentials.go cmd/server/main.go git commit -m "feat(catchup): arm on hello, fire missed-window backups on tick" ``` --- ## Task 4: Alert engine — suppress offline + staleness alert **Files:** - Modify: `internal/alert/engine.go:121-153` (handleJobFinished), `:155-174` (handleHostOffline), `:188-216` (tick) - Modify: `internal/alert/rules.go:13-39` (constants), add exported resolve helper - Test: `internal/alert/intermittent_test.go` - [ ] **Step 1: Add the staleness threshold constant** In `internal/alert/engine.go`, add near the top of the file (after imports, before `JobFinishedEvent`): ```go // staleBackupThreshold is how long an intermittent host may go without // a successful backup before we raise a stale_schedule alert. Global // constant for v1 (may become per-host later). Only intermittent hosts // are evaluated — always-on hosts' stale_schedule stays a no-op. const staleBackupThreshold = 7 * 24 * time.Hour ``` - [ ] **Step 2: Suppress the offline alert for intermittent hosts** In `handleHostOffline` (line 155), after loading the host and the existing `if host.LastSeenAt == nil { return }` guard, add a mode check. Change: ```go if host.LastSeenAt == nil { return } if time.Since(*host.LastSeenAt) < e.agentOfflineFloor { return } ``` to: ```go // Intermittent hosts (laptops) legitimately disappear — never raise // agent_offline for them. The stale_schedule sweep in tick() is the // only staleness signal for these hosts. if !host.AlwaysOn { return } if host.LastSeenAt == nil { return } if time.Since(*host.LastSeenAt) < e.agentOfflineFloor { return } ``` - [ ] **Step 3: Suppress offline + add staleness in the tick sweep** In `tick` (line 188), the host loop currently raises agent_offline for every offline host. Replace the loop body (lines 205-214) with: ```go for _, h := range hosts { // Intermittent hosts: suppress agent_offline entirely; instead // raise stale_schedule when they have gone too long with no // successful backup AND they have at least one enabled schedule // to be measured against. A nil LastBackupAt (never backed up) // has no baseline — onboarding/repo_status covers that case. if !h.AlwaysOn { if h.LastBackupAt == nil { continue } if now.Sub(*h.LastBackupAt) < staleBackupThreshold { continue } hasEnabled, err := e.hostHasEnabledSchedule(ctx, h.ID) if err != nil || !hasEnabled { continue } e.raiseAndNotify(ctx, h.ID, KindStaleSchedule, "", "warning", fmt.Sprintf("No backup in %s (threshold %s)", roundDur(now.Sub(*h.LastBackupAt)), staleBackupThreshold), now) continue } // Always-on hosts: existing agent_offline re-evaluation. if h.Status != "offline" || h.LastSeenAt == nil { continue } if now.Sub(*h.LastSeenAt) >= e.agentOfflineFloor { e.raiseAndNotify(ctx, h.ID, KindAgentOffline, "", "warning", fmt.Sprintf("Agent offline for %s (threshold %s)", roundDur(now.Sub(*h.LastSeenAt)), e.agentOfflineFloor), now) } } ``` Delete the trailing `// Stale-schedule sweep — no-op in v1.` comment at line 215. - [ ] **Step 4: Add the `hostHasEnabledSchedule` helper** In `internal/alert/engine.go`, add at the end of the file: ```go // hostHasEnabledSchedule reports whether the host has at least one // enabled backup schedule — the precondition for a stale_schedule // alert (no schedule = no backup expectation to measure against). func (e *Engine) hostHasEnabledSchedule(ctx context.Context, hostID string) (bool, error) { schedules, err := e.store.ListSchedulesByHost(ctx, hostID) if err != nil { return false, err } for _, sc := range schedules { if sc.Enabled { return true, nil } } return false, nil } ``` - [ ] **Step 5: Resolve staleness on a successful backup** In `handleJobFinished` (line 146), the `case "succeeded":` currently resolves only the job-kind alert. For a successful backup, also clear any open stale_schedule. Change: ```go case "succeeded": e.resolveAndNotify(ctx, ev.HostID, kind, dedupKey, ev.When) } ``` to: ```go case "succeeded": e.resolveAndNotify(ctx, ev.HostID, kind, dedupKey, ev.When) if ev.Kind == "backup" { // A fresh backup clears staleness for intermittent hosts. e.resolveAndNotify(ctx, ev.HostID, KindStaleSchedule, "", ev.When) } } ``` - [ ] **Step 6: Add an exported mode-change resolve hook** The HTTP toggle handler (Task 5) needs to clear stale alerts when an operator changes a host's mode. Add to `internal/alert/rules.go` (after `Resolve`, around line 100): ```go // ResolveOnModeChange clears any open agent_offline and stale_schedule // alerts for a host whose always-on flag was just toggled. The next // 60s tick re-raises whichever still applies under the new mode, so // this is a self-correcting "wipe and let the sweep settle" call. // Safe to invoke from the HTTP layer (it only touches the store + hub). func (e *Engine) ResolveOnModeChange(ctx context.Context, hostID string, when time.Time) { e.resolveAndNotify(ctx, hostID, KindAgentOffline, "", when) e.resolveAndNotify(ctx, hostID, KindStaleSchedule, "", when) } ``` - [ ] **Step 7: Write the engine tests** Create `internal/alert/intermittent_test.go`. First inspect an existing engine test (e.g. grep `internal/alert/*_test.go` for how `NewEngine` is constructed with a test store + hub, and the helper that creates a host + schedule). Mirror those helpers. The tests to write: ```go package alert import ( "context" "testing" "time" ) // Mirror the construction helpers used by the existing engine tests // (newTestEngine / test store / host+schedule seeding). Replace the // placeholder helpers below with the real ones from this package's // existing _test.go files. func TestIntermittentHostSuppressesOfflineAlert(t *testing.T) { ctx := context.Background() e, st := newTestEngine(t) // mirror existing helper hostID := seedHost(t, st, false /* alwaysOn */) // last seen well past the floor touchHostSeen(t, st, hostID, time.Now().Add(-2*time.Hour)) markHostOffline(t, st, hostID) e.handleHostOffline(ctx, hostID) if n := openAlertCount(t, st, hostID, KindAgentOffline); n != 0 { t.Fatalf("intermittent host should not raise agent_offline, got %d", n) } } func TestAlwaysOnHostStillRaisesOfflineAlert(t *testing.T) { ctx := context.Background() e, st := newTestEngine(t) hostID := seedHost(t, st, true /* alwaysOn */) touchHostSeen(t, st, hostID, time.Now().Add(-2*time.Hour)) markHostOffline(t, st, hostID) e.handleHostOffline(ctx, hostID) if n := openAlertCount(t, st, hostID, KindAgentOffline); n != 1 { t.Fatalf("always-on host should raise agent_offline, got %d", n) } } func TestStalenessAlertForIntermittentHost(t *testing.T) { ctx := context.Background() e, st := newTestEngine(t) hostID := seedHost(t, st, false) seedEnabledSchedule(t, st, hostID) // "0 2 * * *" with a source group setLastBackup(t, st, hostID, time.Now().Add(-8*24*time.Hour)) e.tick(ctx, time.Now().UTC()) if n := openAlertCount(t, st, hostID, KindStaleSchedule); n != 1 { t.Fatalf("expected one stale_schedule alert, got %d", n) } // A successful backup clears it. e.handleJobFinished(ctx, JobFinishedEvent{ HostID: hostID, JobID: "j1", Kind: "backup", Status: "succeeded", When: time.Now().UTC(), }) if n := openAlertCount(t, st, hostID, KindStaleSchedule); n != 0 { t.Fatalf("stale_schedule should resolve after backup, got %d", n) } } func TestNoStalenessWithoutEnabledSchedule(t *testing.T) { ctx := context.Background() e, st := newTestEngine(t) hostID := seedHost(t, st, false) setLastBackup(t, st, hostID, time.Now().Add(-8*24*time.Hour)) // no schedule seeded e.tick(ctx, time.Now().UTC()) if n := openAlertCount(t, st, hostID, KindStaleSchedule); n != 0 { t.Fatalf("no schedule => no staleness alert, got %d", n) } } ``` > **Note for the implementer:** the `newTestEngine`, `seedHost`, `touchHostSeen`, `markHostOffline`, `openAlertCount`, `seedEnabledSchedule`, `setLastBackup` helpers must be replaced with the real equivalents in this package's existing tests. If a needed seeding helper doesn't exist, write it using the `store` methods directly (`CreateHost`, `SetHostAlwaysOn`, `CreateSchedule`, `SetHostLastBackup`, `MarkHostsOfflineStale`, `ListAlerts`). Do NOT invent store methods — all required ones exist as of Task 1. - [ ] **Step 8: Run the tests** Run: `go test ./internal/alert/ -v` Expected: PASS for all four new tests plus the existing suite. - [ ] **Step 9: Commit** ```bash go vet ./internal/alert/... git add internal/alert/engine.go internal/alert/rules.go internal/alert/intermittent_test.go git commit -m "feat(alert): suppress offline + add staleness alert for intermittent hosts" ``` --- ## Task 5: HTTP toggle handler + route **Files:** - Modify: `internal/server/http/ui_handlers.go` (new handler near `handleUIHostTagsSave` at line 954) - Modify: `internal/server/http/server.go:281` (route mount) - [ ] **Step 1: Add the handler** In `internal/server/http/ui_handlers.go`, after `handleUIHostTagsSave` (line 984), add: ```go // handleUIHostModeSave flips a host's always-on flag. Checkbox present // in the form (value any) => always-on; absent => intermittent. // Operator-band; mounted in server.go. On change we clear open // offline/staleness alerts via the engine so the next sweep re-raises // only what still applies under the new mode. func (s *Server) handleUIHostModeSave(w stdhttp.ResponseWriter, r *stdhttp.Request) { u := s.requireUIUser(w, r) if u == nil { return } hostID := chi.URLParam(r, "id") if _, err := s.deps.Store.GetHost(r.Context(), hostID); err != nil { stdhttp.NotFound(w, r) return } if err := r.ParseForm(); err != nil { stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest) return } alwaysOn := r.PostForm.Get("always_on") != "" if err := s.deps.Store.SetHostAlwaysOn(r.Context(), hostID, alwaysOn); err != nil { slog.Error("ui host mode: save", "host_id", hostID, "err", err) stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError) return } if s.deps.AlertEngine != nil { s.deps.AlertEngine.ResolveOnModeChange(r.Context(), hostID, time.Now().UTC()) } _ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{ ID: ulid.Make().String(), UserID: &u.ID, Actor: "user", Action: "host.mode_updated", TargetKind: ptr("host"), TargetID: &hostID, TS: time.Now().UTC(), }) stdhttp.Redirect(w, r, "/hosts/"+hostID, stdhttp.StatusSeeOther) } ``` - [ ] **Step 2: Mount the route** In `internal/server/http/server.go`, next to the tags route (line 281): ```go r.Post("/hosts/{id}/tags", s.handleUIHostTagsSave) ``` add directly below: ```go r.Post("/hosts/{id}/mode", s.handleUIHostModeSave) ``` (Confirm it lands in the same operator-band route group as `/hosts/{id}/tags` — same indentation/block.) - [ ] **Step 3: Build and vet** Run: `go build ./... && go vet ./...` Expected: clean. - [ ] **Step 4: Write a handler test** Add to the existing UI-handler test file (grep `internal/server/http/*_test.go` for the harness that builds a `Server` + does form POSTs against `/hosts/{id}/tags`; mirror it). The test posts to `/hosts/{id}/mode` with and without the `always_on` field and asserts the stored flag: ```go func TestHandleUIHostModeSave(t *testing.T) { srv, st, sess := newUITestServer(t) // mirror tags-save test harness hostID := seedHostForUI(t, st) // mirror existing host seeding // Uncheck: form without always_on => intermittent. postForm(t, srv, sess, "/hosts/"+hostID+"/mode", map[string]string{}) if h, _ := st.GetHost(context.Background(), hostID); h.AlwaysOn { t.Fatalf("expected always_on=false after empty post") } // Check: form with always_on=on => always-on. postForm(t, srv, sess, "/hosts/"+hostID+"/mode", map[string]string{"always_on": "on"}) if h, _ := st.GetHost(context.Background(), hostID); !h.AlwaysOn { t.Fatalf("expected always_on=true after checked post") } } ``` > Replace `newUITestServer`/`seedHostForUI`/`postForm` with the real harness helpers from the existing UI handler tests. - [ ] **Step 5: Run the test** Run: `go test ./internal/server/http/ -run TestHandleUIHostModeSave -v` Expected: PASS. - [ ] **Step 6: Commit** ```bash git add internal/server/http/ui_handlers.go internal/server/http/server.go internal/server/http/*_test.go git commit -m "feat(http): host mode toggle handler + route (host.mode_updated)" ``` --- ## Task 6: UI — asleep state, 24×7 chip, mode toggle **Files:** - Modify: `web/styles/input.css` (dot-asleep token) - Modify: `web/templates/partials/host_row.html` - Modify: `web/templates/partials/host_chrome.html` - [ ] **Step 1: Add the `dot-asleep` CSS token** In `web/styles/input.css`, find the `.dot-offline` definition (grep for `dot-offline`) and add a sibling `.dot-asleep` rule. Match the existing dot pattern; use a calm grey-blue distinct from offline's grey/red. Example (adapt colours to the file's existing tokens): ```css .dot-asleep { background: var(--ink-fade); opacity: 0.6; } ``` > Inspect the neighbouring `.dot-offline` / `.dot-degraded` rules first and follow their exact shape (size, border, etc.); only the colour/opacity should differ. - [ ] **Step 2: Rebuild CSS if the project precompiles it** Check the Makefile for a CSS build step (grep `css` in `Makefile`). If present, run it (e.g. `make css`). If the server serves `input.css` directly, skip. - [ ] **Step 3: Asleep dot + text in host_row.html** In `web/templates/partials/host_row.html`, change the status-dot block (lines 6-14). Replace the `{{- else if eq $h.Status "offline" -}}` dot branch: ```html {{- else if eq $h.Status "offline" -}} ``` with: ```html {{- else if eq $h.Status "offline" -}} {{- if $h.AlwaysOn -}} {{- else -}} {{- end -}} ``` Then change the last-seen text branch (lines 28-29): ```html {{- else if eq $h.Status "offline" -}} last seen {{relTime $h.LastSeenAt}} ``` to: ```html {{- else if eq $h.Status "offline" -}} {{- if $h.AlwaysOn -}} last seen {{relTime $h.LastSeenAt}} {{- else -}} asleep · {{relTime $h.LastSeenAt}} · will catch up on return {{- end -}} ``` And the row-action label (lines 55-56): ```html {{- if eq $h.Status "offline" -}} offline ``` to: ```html {{- if eq $h.Status "offline" -}} {{if $h.AlwaysOn}}offline{{else}}asleep{{end}} ``` - [ ] **Step 4: Asleep dot + last-seen in host_chrome.html** In `web/templates/partials/host_chrome.html`, change the offline dot branch (lines 36-37): ```html {{else if eq $host.Status "offline"}} ``` to: ```html {{else if eq $host.Status "offline"}} {{if $host.AlwaysOn}} {{else}} {{end}} ``` And the last-seen line (lines 90-94): ```html {{if eq $host.Status "offline"}} last seen {{relTime $host.LastSeenAt}} {{else}} online · last heartbeat {{relTime $host.LastSeenAt}} {{end}} ``` to: ```html {{if eq $host.Status "offline"}} {{if $host.AlwaysOn}} last seen {{relTime $host.LastSeenAt}} {{else}} asleep · last seen {{relTime $host.LastSeenAt}} · will catch up on return {{end}} {{else}} online · last heartbeat {{relTime $host.LastSeenAt}} {{end}} ``` - [ ] **Step 5: Add the 24×7 chip + mode toggle to host_chrome.html** In the header tags block (lines 42-48), after the tags `edit/add tags` button and before the closing `` at line 48, add the chip (shown only when always-on) and a small toggle button mirroring the tags-editor reveal pattern: ```html {{if $host.AlwaysOn}}24×7{{end}} ``` Then add the toggle form right after the tags `