Always-On vs Intermittent Host Mode — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Let an operator mark a host as not-always-on so it stops raising offline alerts when it legitimately sleeps, renders a calm "asleep" state, auto-catches-up a missed backup ~1 minute after it reconnects, and still raises a long-threshold staleness alert if it goes too long with no backup.
Architecture: A thin policy + presentation layer over the existing online/offline state machine. A new hosts.always_on boolean (default 1 = today's behaviour) gates three behaviours: offline-alert suppression + a 7-day staleness alert in the alert engine; an in-memory catch-up scheduler in the HTTP server armed on agent hello and fired from the existing 30s tick; and an "asleep" UI state plus a 24×7 chip. Online/offline tracking, heartbeat, and pending_runs are untouched.
Tech Stack: Go, SQLite (modernc), github.com/robfig/cron/v3 (already a dependency), Go html/template, Tailwind-in-input.css.
Spec: docs/specs/2026-06-15-always-on-host-mode-design.md
File Structure
- Create internal/store/migrations/0024_hosts_always_on.sql: add the column.
- Modify internal/store/types.go: add Host.AlwaysOn bool.
- Modify internal/store/hosts.go: add always_on to the 3 host SELECTs + scanHostRow; add SetHostAlwaysOn.
- Create internal/store/hosts_always_on_test.go: round-trip + default test.
- Modify internal/alert/engine.go: staleness threshold constant; suppress offline for intermittent hosts; staleness sweep; resolve staleness on backup success.
- Modify internal/alert/rules.go: exported ResolveOnModeChange helper for the toggle handler.
- Create internal/alert/intermittent_test.go: suppression + staleness + resolve tests.
- Create internal/server/http/catchup.go: overdue helper + in-memory catch-up scheduler.
- Create internal/server/http/catchup_test.go: overdue table tests.
- Modify internal/server/http/server.go: catch-up map fields on Server, init in New.
- Modify internal/server/http/host_credentials.go: arm catch-up in onAgentHello.
- Modify cmd/server/main.go: call srv.RunCatchupsDue on the pending-drain tick.
- Modify internal/server/http/ui_handlers.go: handleUIHostModeSave handler.
- Modify internal/server/http/server.go (routes): mount POST /hosts/{id}/mode.
- Modify web/styles/input.css: dot-asleep token.
- Modify web/templates/partials/host_row.html: asleep dot + text.
- Modify web/templates/partials/host_chrome.html: asleep dot/last-seen, 24×7 chip, mode toggle form.
- Modify tasks.md: record the feature.
Task 1: Schema + store field for always_on
Files:
- Create: internal/store/migrations/0024_hosts_always_on.sql
- Modify: internal/store/types.go:62-102 (Host struct)
- Modify: internal/store/hosts.go (3 SELECTs at lines 41-48, 56-63, 224-231; scanHostRow at 261-334)
- Test: internal/store/hosts_always_on_test.go
- Step 1: Write the migration
Create internal/store/migrations/0024_hosts_always_on.sql:
-- 0024: distinguish always-on (24x7 server) hosts from intermittent
-- hosts (laptops/workstations that legitimately sleep). Default 1 so
-- every existing and future host keeps today's offline/alert
-- semantics unless explicitly opted out. Column-level ALTER per the
-- repo's migration rules (no table rebuild — hosts has inbound FKs).
ALTER TABLE hosts ADD COLUMN always_on INTEGER NOT NULL DEFAULT 1;
- Step 2: Add the struct field
In internal/store/types.go, add to the Host struct (after RepoStatusError at line 101):
// AlwaysOn is true for 24x7 server hosts (the default). When false
// the host is intermittent (laptop/workstation): offline alerts are
// suppressed, the UI shows an "asleep" state, and a missed backup is
// caught up ~1 min after reconnect. See the always-on-host-mode spec.
AlwaysOn bool
- Step 3: Thread always_on through reads
In internal/store/hosts.go, append , always_on to the SELECT column list in all three queries: LookupHostByAgentToken (line 47), GetHost (line 62), and ListHosts (line 230). Each currently ends repo_status, repo_status_error — change to repo_status, repo_status_error, always_on.
Then in scanHostRow (line 261), declare var alwaysOn int in the var block, change the Scan(...) call's final args from &h.RepoStatus, &h.RepoStatusError) to &h.RepoStatus, &h.RepoStatusError, &alwaysOn), and after the existing post-scan assignments add:
h.AlwaysOn = alwaysOn != 0
(SQLite stores the boolean as INTEGER; scan into int then compare to avoid driver bool-coercion surprises.)
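The 0/1 convention described above can be isolated into two small helpers. This is a minimal sketch, not code from the repository: boolToInt and intToBool are hypothetical names, and the plan itself inlines the same logic in scanHostRow and SetHostAlwaysOn.

```go
package main

import "fmt"

// boolToInt maps a Go bool onto the 0/1 INTEGER convention used for
// the always_on column. Hypothetical helper, for illustration only.
func boolToInt(b bool) int {
	if b {
		return 1
	}
	return 0
}

// intToBool treats any non-zero value as true, matching the
// `alwaysOn != 0` comparison used after the scan.
func intToBool(v int) bool { return v != 0 }

func main() {
	for _, b := range []bool{true, false} {
		stored := boolToInt(b)
		fmt.Println(b, "->", stored, "->", intToBool(stored))
	}
}
```

Keeping the conversion explicit on both the write and the read side means the stored value never depends on how a particular SQLite driver coerces booleans.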
- Step 4: Add SetHostAlwaysOn
In internal/store/hosts.go, after SetHostTags (line 379), add:
// SetHostAlwaysOn flips the host's always-on flag. true = 24x7 server
// (default); false = intermittent host (laptop). See the
// always-on-host-mode spec.
func (s *Store) SetHostAlwaysOn(ctx context.Context, hostID string, alwaysOn bool) error {
v := 0
if alwaysOn {
v = 1
}
_, err := s.db.ExecContext(ctx,
`UPDATE hosts SET always_on = ? WHERE id = ?`, v, hostID)
if err != nil {
return fmt.Errorf("store: set host always_on: %w", err)
}
return nil
}
- Step 5: Write the round-trip test
Create internal/store/hosts_always_on_test.go. Use the existing test harness pattern — check a sibling test (e.g. internal/store/hosts_test.go) for the newTestStore/testStore helper name and the host-creation helper, and mirror it exactly. The test body:
package store
import (
"context"
"testing"
"time"
)
func TestHostAlwaysOnDefaultAndToggle(t *testing.T) {
ctx := context.Background()
st := newTestStore(t) // mirror the helper used by hosts_test.go
h := Host{
ID: "h-always-on", Name: "lap", OS: "linux", Arch: "amd64",
ProtocolVersion: 1, EnrolledAt: time.Now().UTC(),
}
if err := st.CreateHost(ctx, h, "tok-hash", "pin"); err != nil {
t.Fatalf("create host: %v", err)
}
got, err := st.GetHost(ctx, h.ID)
if err != nil {
t.Fatalf("get host: %v", err)
}
if !got.AlwaysOn {
t.Fatalf("new host should default to always_on=true, got false")
}
if err := st.SetHostAlwaysOn(ctx, h.ID, false); err != nil {
t.Fatalf("set always_on: %v", err)
}
got, err = st.GetHost(ctx, h.ID)
if err != nil {
t.Fatalf("get host 2: %v", err)
}
if got.AlwaysOn {
t.Fatalf("expected always_on=false after toggle, got true")
}
// ListHosts must surface the same value.
hosts, err := st.ListHosts(ctx)
if err != nil {
t.Fatalf("list hosts: %v", err)
}
if len(hosts) != 1 || hosts[0].AlwaysOn {
t.Fatalf("ListHosts should report always_on=false, got %+v", hosts)
}
}
- Step 6: Run the test (expect FAIL first if written before code, else PASS)
Run: go test ./internal/store/ -run TestHostAlwaysOnDefaultAndToggle -v
Expected: PASS once Steps 1-4 are in. If you wrote the test first, it fails to compile on AlwaysOn / SetHostAlwaysOn — that is the expected red.
- Step 7: Commit
go vet ./internal/store/...
git add internal/store/migrations/0024_hosts_always_on.sql internal/store/types.go internal/store/hosts.go internal/store/hosts_always_on_test.go
git commit -m "feat(store): add hosts.always_on flag (default on)"
Task 2: Overdue computation helper
This is a pure function so it can be unit-tested in isolation before the scheduler wires it up. It lives in the new catchup.go (the scheduler will follow in Task 3, same file).
Files:
- Create: internal/server/http/catchup.go
- Test: internal/server/http/catchup_test.go
- Step 1: Write the failing test
Create internal/server/http/catchup_test.go:
package http
import (
"testing"
"time"
)
func TestScheduleOverdue(t *testing.T) {
mustParse := func(s string) time.Time {
t.Helper()
v, err := time.Parse(time.RFC3339, s)
if err != nil {
t.Fatalf("parse %q: %v", s, err)
}
return v
}
daily := "0 2 * * *" // 02:00 every day
cases := []struct {
name string
cron string
lastBackup *time.Time
now time.Time
want bool
}{
{
name: "never backed up is overdue",
cron: daily, lastBackup: nil,
now: mustParse("2026-06-15T09:00:00Z"),
want: true,
},
{
name: "missed last nights window",
cron: daily,
lastBackup: ptrTime(mustParse("2026-06-13T02:05:00Z")),
now: mustParse("2026-06-15T09:00:00Z"),
want: true,
},
{
name: "backed up after the most recent window",
cron: daily,
lastBackup: ptrTime(mustParse("2026-06-15T02:05:00Z")),
now: mustParse("2026-06-15T09:00:00Z"),
want: false,
},
{
name: "unparseable cron is never overdue",
cron: "not a cron",
lastBackup: nil,
now: mustParse("2026-06-15T09:00:00Z"),
want: false,
},
}
for _, c := range cases {
t.Run(c.name, func(t *testing.T) {
got := scheduleOverdue(c.cron, c.lastBackup, c.now)
if got != c.want {
t.Fatalf("scheduleOverdue(%q, %v, %v) = %v, want %v",
c.cron, c.lastBackup, c.now, got, c.want)
}
})
}
}
func ptrTime(t time.Time) *time.Time { return &t }
- Step 2: Run the test to verify it fails
Run: go test ./internal/server/http/ -run TestScheduleOverdue -v
Expected: FAIL — undefined: scheduleOverdue.
- Step 3: Implement scheduleOverdue
Create internal/server/http/catchup.go with the helper (the scheduler methods are added in Task 3):
// catchup.go — server-side catch-up for intermittent (non-always-on)
// hosts. When such a host reconnects we wait a short settle window,
// then dispatch a backup for any schedule whose window elapsed while
// the host was asleep. This is separate from pending_runs: a host that
// was asleep never fired its local cron, so no pending row exists.
package http
import (
"time"
)
// scheduleOverdue reports whether a schedule's most recent expected
// fire is newer than the host's last successful backup — i.e. a window
// passed with no backup. A nil lastBackup means "never backed up" and
// is always overdue (provided the cron parses). An unparseable cron is
// treated as not-overdue so a bad expression can never trigger a
// surprise dispatch. Uses the same cronParser the agent's scheduler
// and schedule validation use, so interpretation is identical.
func scheduleOverdue(cronExpr string, lastBackup *time.Time, now time.Time) bool {
sched, err := cronParser.Parse(cronExpr)
if err != nil {
return false
}
if lastBackup == nil {
return true
}
next := sched.Next(*lastBackup)
return !next.After(now)
}
- Step 4: Run the test to verify it passes
Run: go test ./internal/server/http/ -run TestScheduleOverdue -v
Expected: PASS (all four sub-cases).
- Step 5: Commit
go vet ./internal/server/http/...
git add internal/server/http/catchup.go internal/server/http/catchup_test.go
git commit -m "feat(catchup): scheduleOverdue helper for missed-window detection"
Task 3: Catch-up scheduler (arm on hello, fire on tick)
Files:
- Modify: internal/server/http/server.go:68-93 (Server struct), :96-112 (New)
- Modify: internal/server/http/catchup.go (add scheduler methods)
- Modify: internal/server/http/host_credentials.go:463-486 (onAgentHello)
- Modify: cmd/server/main.go:228-229 (pending-drain tick case)
- Step 1: Add catch-up state to the Server struct
In internal/server/http/server.go, add fields to Server (after treeCache at line 92):
// catchupDueAt tracks intermittent hosts that reconnected and are
// in their settle window. Keyed hostID → earliest time to evaluate
// catch-up. Best-effort + in-memory: a server restart simply re-arms
// on the next hello. Guarded by catchupMu.
catchupMu sync.Mutex
catchupDueAt map[string]time.Time
Add "sync" and "time" to the imports if not already present (check the import block); the new fields need both.
- Step 2: Initialise the map in New
In New (line 106), add to the &Server{...} literal:
catchupDueAt: make(map[string]time.Time),
- Step 3: Add scheduler methods to catchup.go
Append to internal/server/http/catchup.go. Add "context", "log/slog" to its imports:
// catchupSettle is how long after a reconnect we wait before evaluating
// catch-up, so a laptop that wakes briefly and sleeps again doesn't
// trigger a backup it can't finish. ~1 minute per the spec.
const catchupSettle = 60 * time.Second
// ArmCatchup records that an intermittent host just reconnected and
// should be evaluated for a missed backup after the settle window.
// It does not check the host's mode: callers must pass only
// intermittent hosts (runCatchup re-checks AlwaysOn before dispatch).
// Re-arming overwrites the timer (debounce: flapping doesn't stack).
func (s *Server) ArmCatchup(hostID string, now time.Time) {
s.catchupMu.Lock()
defer s.catchupMu.Unlock()
if s.catchupDueAt == nil {
s.catchupDueAt = make(map[string]time.Time)
}
s.catchupDueAt[hostID] = now.Add(catchupSettle)
}
// dueCatchups returns the hostIDs whose settle window has elapsed and
// removes them from the map. Caller evaluates each.
func (s *Server) dueCatchups(now time.Time) []string {
s.catchupMu.Lock()
defer s.catchupMu.Unlock()
var due []string
for id, at := range s.catchupDueAt {
if !now.Before(at) {
due = append(due, id)
delete(s.catchupDueAt, id)
}
}
return due
}
// RunCatchupsDue is the tick entrypoint. For each host past its settle
// window it dispatches a backup for every enabled schedule that is
// overdue. Skips hosts that bounced back offline, that are already
// running/queued a job, or that turned out to be always-on.
func (s *Server) RunCatchupsDue(ctx context.Context) {
if s.deps.Hub == nil {
return
}
now := time.Now().UTC()
for _, hostID := range s.dueCatchups(now) {
s.runCatchup(ctx, hostID, now)
}
}
// runCatchup evaluates and dispatches catch-up backups for a single
// host. Split out so RunCatchupsDue reads cleanly.
func (s *Server) runCatchup(ctx context.Context, hostID string, now time.Time) {
conn := s.deps.Hub.Conn(hostID)
if conn == nil {
return // bounced offline during the settle window; re-arms on next hello
}
host, err := s.deps.Store.GetHost(ctx, hostID)
if err != nil {
slog.Warn("catchup: load host", "host_id", hostID, "err", err)
return
}
if host.AlwaysOn {
return // mode flipped during settle window
}
if host.CurrentJobID != nil {
return // a job is already running; don't pile on
}
schedules, err := s.deps.Store.ListSchedulesByHost(ctx, hostID)
if err != nil {
slog.Warn("catchup: list schedules", "host_id", hostID, "err", err)
return
}
for _, sc := range schedules {
if !sc.Enabled || len(sc.SourceGroupIDs) == 0 {
continue
}
if !scheduleOverdue(sc.CronExpr, host.LastBackupAt, now) {
continue
}
for _, gid := range sc.SourceGroupIDs {
g, err := s.deps.Store.GetSourceGroup(ctx, hostID, gid)
if err != nil {
slog.Warn("catchup: load source group",
"host_id", hostID, "schedule_id", sc.ID, "group_id", gid, "err", err)
continue
}
if _, derr := s.dispatchBackupForGroupCore(ctx, conn, hostID, sc.ID, g, now); derr != nil {
// Send failed, so the host most likely dropped again. Re-arm
// so a later tick retries after another settle window (a new
// hello re-arms as well); stop processing this host.
s.ArmCatchup(hostID, now)
return
}
slog.Info("catchup: dispatched missed backup",
"host_id", hostID, "schedule_id", sc.ID, "group", g.Name)
}
}
}
- Step 4: Arm catch-up on agent hello
In internal/server/http/host_credentials.go, in onAgentHello (line 463), after the go s.DrainPending(...) line (485), add:
// Intermittent hosts that just reconnected may have slept through a
// backup window. Arm a catch-up evaluation after a settle delay; the
// pending-drain tick fires it. Always-on hosts never need this.
if host, err := s.deps.Store.GetHost(ctx, hostID); err == nil && !host.AlwaysOn {
s.ArmCatchup(hostID, time.Now().UTC())
}
Verify time is already imported in this file (it is — used elsewhere). If not, add it.
- Step 5: Fire catch-up from the pending-drain tick
In cmd/server/main.go, in the case <-pendingDrainTick.C: block (line 228), change:
case <-pendingDrainTick.C:
srv.DrainAllDue(ctx)
to:
case <-pendingDrainTick.C:
srv.DrainAllDue(ctx)
srv.RunCatchupsDue(ctx)
- Step 6: Build and vet
Run: go build ./... && go vet ./...
Expected: clean build, no vet errors.
- Step 7: Commit
git add internal/server/http/server.go internal/server/http/catchup.go internal/server/http/host_credentials.go cmd/server/main.go
git commit -m "feat(catchup): arm on hello, fire missed-window backups on tick"
Task 4: Alert engine — suppress offline + staleness alert
Files:
- Modify: internal/alert/engine.go:121-153 (handleJobFinished), :155-174 (handleHostOffline), :188-216 (tick)
- Modify: internal/alert/rules.go:13-39 (constants), add exported resolve helper
- Test: internal/alert/intermittent_test.go
- Step 1: Add the staleness threshold constant
In internal/alert/engine.go, add near the top of the file (after imports, before JobFinishedEvent):
// staleBackupThreshold is how long an intermittent host may go without
// a successful backup before we raise a stale_schedule alert. Global
// constant for v1 (may become per-host later). Only intermittent hosts
// are evaluated — always-on hosts' stale_schedule stays a no-op.
const staleBackupThreshold = 7 * 24 * time.Hour
- Step 2: Suppress the offline alert for intermittent hosts
In handleHostOffline (line 155), after loading the host and the existing if host.LastSeenAt == nil { return } guard, add a mode check. Change:
if host.LastSeenAt == nil {
return
}
if time.Since(*host.LastSeenAt) < e.agentOfflineFloor {
return
}
to:
// Intermittent hosts (laptops) legitimately disappear — never raise
// agent_offline for them. The stale_schedule sweep in tick() is the
// only staleness signal for these hosts.
if !host.AlwaysOn {
return
}
if host.LastSeenAt == nil {
return
}
if time.Since(*host.LastSeenAt) < e.agentOfflineFloor {
return
}
- Step 3: Suppress offline + add staleness in the tick sweep
In tick (line 188), the host loop currently raises agent_offline for every offline host. Replace the loop body (lines 205-214) with:
for _, h := range hosts {
// Intermittent hosts: suppress agent_offline entirely; instead
// raise stale_schedule when they have gone too long with no
// successful backup AND they have at least one enabled schedule
// to be measured against. A nil LastBackupAt (never backed up)
// has no baseline — onboarding/repo_status covers that case.
if !h.AlwaysOn {
if h.LastBackupAt == nil {
continue
}
if now.Sub(*h.LastBackupAt) < staleBackupThreshold {
continue
}
hasEnabled, err := e.hostHasEnabledSchedule(ctx, h.ID)
if err != nil || !hasEnabled {
continue
}
e.raiseAndNotify(ctx, h.ID, KindStaleSchedule, "", "warning",
fmt.Sprintf("No backup in %s (threshold %s)",
roundDur(now.Sub(*h.LastBackupAt)), staleBackupThreshold), now)
continue
}
// Always-on hosts: existing agent_offline re-evaluation.
if h.Status != "offline" || h.LastSeenAt == nil {
continue
}
if now.Sub(*h.LastSeenAt) >= e.agentOfflineFloor {
e.raiseAndNotify(ctx, h.ID, KindAgentOffline, "", "warning",
fmt.Sprintf("Agent offline for %s (threshold %s)",
roundDur(now.Sub(*h.LastSeenAt)), e.agentOfflineFloor), now)
}
}
Delete the trailing // Stale-schedule sweep — no-op in v1. comment at line 215.
- Step 4: Add the hostHasEnabledSchedule helper
In internal/alert/engine.go, add at the end of the file:
// hostHasEnabledSchedule reports whether the host has at least one
// enabled backup schedule — the precondition for a stale_schedule
// alert (no schedule = no backup expectation to measure against).
func (e *Engine) hostHasEnabledSchedule(ctx context.Context, hostID string) (bool, error) {
schedules, err := e.store.ListSchedulesByHost(ctx, hostID)
if err != nil {
return false, err
}
for _, sc := range schedules {
if sc.Enabled {
return true, nil
}
}
return false, nil
}
- Step 5: Resolve staleness on a successful backup
In handleJobFinished (line 146), the case "succeeded": currently resolves only the job-kind alert. For a successful backup, also clear any open stale_schedule. Change:
case "succeeded":
e.resolveAndNotify(ctx, ev.HostID, kind, dedupKey, ev.When)
}
to:
case "succeeded":
e.resolveAndNotify(ctx, ev.HostID, kind, dedupKey, ev.When)
if ev.Kind == "backup" {
// A fresh backup clears staleness for intermittent hosts.
e.resolveAndNotify(ctx, ev.HostID, KindStaleSchedule, "", ev.When)
}
}
- Step 6: Add an exported mode-change resolve hook
The HTTP toggle handler (Task 5) needs to clear stale alerts when an operator changes a host's mode. Add to internal/alert/rules.go (after Resolve, around line 100):
// ResolveOnModeChange clears any open agent_offline and stale_schedule
// alerts for a host whose always-on flag was just toggled. The next
// 60s tick re-raises whichever still applies under the new mode, so
// this is a self-correcting "wipe and let the sweep settle" call.
// Safe to invoke from the HTTP layer (it only touches the store + hub).
func (e *Engine) ResolveOnModeChange(ctx context.Context, hostID string, when time.Time) {
e.resolveAndNotify(ctx, hostID, KindAgentOffline, "", when)
e.resolveAndNotify(ctx, hostID, KindStaleSchedule, "", when)
}
- Step 7: Write the engine tests
Create internal/alert/intermittent_test.go. First inspect an existing engine test: grep internal/alert/*_test.go for how NewEngine is constructed with a test store + hub, and for the helper that creates a host + schedule. Mirror those helpers. The tests to write:
package alert
import (
"context"
"testing"
"time"
)
// Mirror the construction helpers used by the existing engine tests
// (newTestEngine / test store / host+schedule seeding). Replace the
// placeholder helpers below with the real ones from this package's
// existing _test.go files.
func TestIntermittentHostSuppressesOfflineAlert(t *testing.T) {
ctx := context.Background()
e, st := newTestEngine(t) // mirror existing helper
hostID := seedHost(t, st, false /* alwaysOn */)
// last seen well past the floor
touchHostSeen(t, st, hostID, time.Now().Add(-2*time.Hour))
markHostOffline(t, st, hostID)
e.handleHostOffline(ctx, hostID)
if n := openAlertCount(t, st, hostID, KindAgentOffline); n != 0 {
t.Fatalf("intermittent host should not raise agent_offline, got %d", n)
}
}
func TestAlwaysOnHostStillRaisesOfflineAlert(t *testing.T) {
ctx := context.Background()
e, st := newTestEngine(t)
hostID := seedHost(t, st, true /* alwaysOn */)
touchHostSeen(t, st, hostID, time.Now().Add(-2*time.Hour))
markHostOffline(t, st, hostID)
e.handleHostOffline(ctx, hostID)
if n := openAlertCount(t, st, hostID, KindAgentOffline); n != 1 {
t.Fatalf("always-on host should raise agent_offline, got %d", n)
}
}
func TestStalenessAlertForIntermittentHost(t *testing.T) {
ctx := context.Background()
e, st := newTestEngine(t)
hostID := seedHost(t, st, false)
seedEnabledSchedule(t, st, hostID) // "0 2 * * *" with a source group
setLastBackup(t, st, hostID, time.Now().Add(-8*24*time.Hour))
e.tick(ctx, time.Now().UTC())
if n := openAlertCount(t, st, hostID, KindStaleSchedule); n != 1 {
t.Fatalf("expected one stale_schedule alert, got %d", n)
}
// A successful backup clears it.
e.handleJobFinished(ctx, JobFinishedEvent{
HostID: hostID, JobID: "j1", Kind: "backup",
Status: "succeeded", When: time.Now().UTC(),
})
if n := openAlertCount(t, st, hostID, KindStaleSchedule); n != 0 {
t.Fatalf("stale_schedule should resolve after backup, got %d", n)
}
}
func TestNoStalenessWithoutEnabledSchedule(t *testing.T) {
ctx := context.Background()
e, st := newTestEngine(t)
hostID := seedHost(t, st, false)
setLastBackup(t, st, hostID, time.Now().Add(-8*24*time.Hour))
// no schedule seeded
e.tick(ctx, time.Now().UTC())
if n := openAlertCount(t, st, hostID, KindStaleSchedule); n != 0 {
t.Fatalf("no schedule => no staleness alert, got %d", n)
}
}
Note for the implementer: the newTestEngine, seedHost, touchHostSeen, markHostOffline, openAlertCount, seedEnabledSchedule, and setLastBackup helpers must be replaced with the real equivalents in this package's existing tests. If a needed seeding helper doesn't exist, write it using the store methods directly (CreateHost, SetHostAlwaysOn, CreateSchedule, SetHostLastBackup, MarkHostsOfflineStale, ListAlerts). Do NOT invent store methods; all required ones exist as of Task 1.
- Step 8: Run the tests
Run: go test ./internal/alert/ -v
Expected: PASS for all four new tests plus the existing suite.
- Step 9: Commit
go vet ./internal/alert/...
git add internal/alert/engine.go internal/alert/rules.go internal/alert/intermittent_test.go
git commit -m "feat(alert): suppress offline + add staleness alert for intermittent hosts"
Task 5: HTTP toggle handler + route
Files:
- Modify: internal/server/http/ui_handlers.go (new handler near handleUIHostTagsSave at line 954)
- Modify: internal/server/http/server.go:281 (route mount)
- Step 1: Add the handler
In internal/server/http/ui_handlers.go, after handleUIHostTagsSave (line 984), add:
// handleUIHostModeSave flips a host's always-on flag. A non-empty
// always_on form field (a checked checkbox) => always-on; absent => intermittent.
// Operator-band; mounted in server.go. On change we clear open
// offline/staleness alerts via the engine so the next sweep re-raises
// only what still applies under the new mode.
func (s *Server) handleUIHostModeSave(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
hostID := chi.URLParam(r, "id")
if _, err := s.deps.Store.GetHost(r.Context(), hostID); err != nil {
stdhttp.NotFound(w, r)
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
alwaysOn := r.PostForm.Get("always_on") != ""
if err := s.deps.Store.SetHostAlwaysOn(r.Context(), hostID, alwaysOn); err != nil {
slog.Error("ui host mode: save", "host_id", hostID, "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if s.deps.AlertEngine != nil {
s.deps.AlertEngine.ResolveOnModeChange(r.Context(), hostID, time.Now().UTC())
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "host.mode_updated",
TargetKind: ptr("host"), TargetID: &hostID,
TS: time.Now().UTC(),
})
stdhttp.Redirect(w, r, "/hosts/"+hostID, stdhttp.StatusSeeOther)
}
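The handler relies on a browser omitting an unchecked checkbox from the submitted form. A minimal sketch of that rule using only the standard library follows; alwaysOnFromForm is a hypothetical helper, and the /hosts/h1/mode path is just a sample URL.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// alwaysOnFromForm applies the handler's rule: any non-empty always_on
// value means always-on; a missing field means intermittent.
func alwaysOnFromForm(body string) (bool, error) {
	r := httptest.NewRequest(http.MethodPost, "/hosts/h1/mode", strings.NewReader(body))
	r.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	if err := r.ParseForm(); err != nil {
		return false, err
	}
	return r.PostForm.Get("always_on") != "", nil
}

func main() {
	checked, _ := alwaysOnFromForm("always_on=on")
	unchecked, _ := alwaysOnFromForm("")
	fmt.Println(checked, unchecked)
}
```

One consequence of this design is that any POST to the route without the field, including a scripted one, switches the host to intermittent, so the route should stay behind the same operator-band checks as the tags route.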
- Step 2: Mount the route
In internal/server/http/server.go, next to the tags route (line 281):
r.Post("/hosts/{id}/tags", s.handleUIHostTagsSave)
add directly below:
r.Post("/hosts/{id}/mode", s.handleUIHostModeSave)
(Confirm it lands in the same operator-band route group as /hosts/{id}/tags — same indentation/block.)
- Step 3: Build and vet
Run: go build ./... && go vet ./...
Expected: clean.
- Step 4: Write a handler test
Add to the existing UI-handler test file (grep internal/server/http/*_test.go for the harness that builds a Server + does form POSTs against /hosts/{id}/tags; mirror it). The test posts to /hosts/{id}/mode with and without the always_on field and asserts the stored flag:
func TestHandleUIHostModeSave(t *testing.T) {
srv, st, sess := newUITestServer(t) // mirror tags-save test harness
hostID := seedHostForUI(t, st) // mirror existing host seeding
// Uncheck: form without always_on => intermittent.
postForm(t, srv, sess, "/hosts/"+hostID+"/mode", map[string]string{})
if h, _ := st.GetHost(context.Background(), hostID); h.AlwaysOn {
t.Fatalf("expected always_on=false after empty post")
}
// Check: form with always_on=on => always-on.
postForm(t, srv, sess, "/hosts/"+hostID+"/mode", map[string]string{"always_on": "on"})
if h, _ := st.GetHost(context.Background(), hostID); !h.AlwaysOn {
t.Fatalf("expected always_on=true after checked post")
}
}
Replace newUITestServer / seedHostForUI / postForm with the real harness helpers from the existing UI handler tests.
- Step 5: Run the test
Run: go test ./internal/server/http/ -run TestHandleUIHostModeSave -v
Expected: PASS.
- Step 6: Commit
git add internal/server/http/ui_handlers.go internal/server/http/server.go internal/server/http/*_test.go
git commit -m "feat(http): host mode toggle handler + route (host.mode_updated)"
Task 6: UI — asleep state, 24×7 chip, mode toggle
Files:
- Modify: web/styles/input.css (dot-asleep token)
- Modify: web/templates/partials/host_row.html
- Modify: web/templates/partials/host_chrome.html
- Step 1: Add the dot-asleep CSS token
In web/styles/input.css, find the .dot-offline definition (grep for dot-offline) and add a sibling .dot-asleep rule. Match the existing dot pattern; use a calm grey-blue distinct from offline's grey/red. Example (adapt colours to the file's existing tokens):
.dot-asleep { background: var(--ink-fade); opacity: 0.6; }
Inspect the neighbouring .dot-offline / .dot-degraded rules first and follow their exact shape (size, border, etc.); only the colour/opacity should differ.
- Step 2: Rebuild CSS if the project precompiles it
Check the Makefile for a CSS build step (grep css in Makefile). If present, run it (e.g. make css). If the server serves input.css directly, skip.
- Step 3: Asleep dot + text in host_row.html
In web/templates/partials/host_row.html, change the status-dot block (lines 6-14). Replace the {{- else if eq $h.Status "offline" -}} dot branch:
{{- else if eq $h.Status "offline" -}}
<span class="dot dot-offline"></span>
with:
{{- else if eq $h.Status "offline" -}}
{{- if $h.AlwaysOn -}}
<span class="dot dot-offline"></span>
{{- else -}}
<span class="dot dot-asleep"></span>
{{- end -}}
Then change the last-seen text branch (lines 28-29):
{{- else if eq $h.Status "offline" -}}
<span class="text-ink-mute">last seen <span class="mono">{{relTime $h.LastSeenAt}}</span></span>
to:
{{- else if eq $h.Status "offline" -}}
{{- if $h.AlwaysOn -}}
<span class="text-ink-mute">last seen <span class="mono">{{relTime $h.LastSeenAt}}</span></span>
{{- else -}}
<span class="text-ink-mute">asleep · <span class="mono">{{relTime $h.LastSeenAt}}</span> · will catch up on return</span>
{{- end -}}
And the row-action label (lines 55-56):
{{- if eq $h.Status "offline" -}}
<span class="mono text-xs text-ink-fade">offline</span>
to:
{{- if eq $h.Status "offline" -}}
<span class="mono text-xs text-ink-fade">{{if $h.AlwaysOn}}offline{{else}}asleep{{end}}</span>
- Step 4: Asleep dot + last-seen in host_chrome.html
In web/templates/partials/host_chrome.html, change the offline dot branch (lines 36-37):
{{else if eq $host.Status "offline"}}
<span class="dot dot-offline"></span>
to:
{{else if eq $host.Status "offline"}}
{{if $host.AlwaysOn}}
<span class="dot dot-offline"></span>
{{else}}
<span class="dot dot-asleep"></span>
{{end}}
And the last-seen line (lines 90-94):
{{if eq $host.Status "offline"}}
<span>last seen <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span></span>
{{else}}
<span>online · last heartbeat <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span></span>
{{end}}
to:
{{if eq $host.Status "offline"}}
{{if $host.AlwaysOn}}
<span>last seen <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span></span>
{{else}}
<span>asleep · last seen <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span> · will catch up on return</span>
{{end}}
{{else}}
<span>online · last heartbeat <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span></span>
{{end}}
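The nested branch used in both templates can be checked in isolation with html/template. The sketch below is illustrative only: the host struct, the renderDot helper, and the dot-online class in the final arm are assumptions, while dot-offline and dot-asleep are the classes named in this task.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// host carries just the two fields the dot branch reads. Hypothetical type.
type host struct {
	Status   string
	AlwaysOn bool
}

// dotTmpl is a cut-down version of the status-dot branch; the online
// arm and its dot-online class are assumptions for the example.
var dotTmpl = template.Must(template.New("dot").Parse(
	`{{if eq .Status "offline"}}{{if .AlwaysOn}}<span class="dot dot-offline"></span>` +
		`{{else}}<span class="dot dot-asleep"></span>{{end}}` +
		`{{else}}<span class="dot dot-online"></span>{{end}}`))

// renderDot executes the template into a string.
func renderDot(h host) string {
	var b strings.Builder
	if err := dotTmpl.Execute(&b, h); err != nil {
		return "error: " + err.Error()
	}
	return b.String()
}

func main() {
	fmt.Println(renderDot(host{Status: "offline", AlwaysOn: false}))
	fmt.Println(renderDot(host{Status: "offline", AlwaysOn: true}))
}
```

A table test over the three states would catch a misplaced {{end}} earlier than the manual smoke run does.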
- Step 5: Add the 24×7 chip + mode toggle to host_chrome.html
In the header tags block (lines 42-48), after the tags edit/add tags button and before the closing </div> at line 48, add the chip (shown only when always-on) and a small toggle button mirroring the tags-editor reveal pattern:
{{if $host.AlwaysOn}}<span class="tag" title="Expected online 24×7 — offline raises an alert">24×7</span>{{end}}
<button type="button" class="text-ink-fade text-[11px] hover:text-ink-mid whitespace-nowrap"
style="padding: 2px 8px; border: 1px dashed var(--line); border-radius: 3px; cursor: pointer;"
onclick="document.getElementById('mode-edit-{{$host.ID}}').classList.toggle('hidden')"
title="Change presence mode">presence</button>
Then add the toggle form right after the tags <form> block (after line 82, before the <div class="flex items-center gap-3 mt-3 ..."> at line 83):
{{/* Presence-mode editor — hidden by default; toggled by the
"presence" button. Checkbox present => always-on (24×7);
unchecked => intermittent (laptop): no offline alerts, shows
"asleep", auto-catches-up a missed backup on reconnect. */}}
<form id="mode-edit-{{$host.ID}}" method="post"
action="/hosts/{{$host.ID}}/mode"
class="hidden mt-3" style="max-width: 640px;">
<label class="flex items-center gap-2 text-[12px] text-ink-mid">
<input type="checkbox" name="always_on" value="on" {{if $host.AlwaysOn}}checked{{end}} />
Always On — expected online 24×7
</label>
<div class="field-help">
Uncheck for an intermittent host (laptop/workstation): it won’t
raise offline alerts when asleep, shows an “asleep” state, and
catches up a missed backup ~1 minute after it reconnects.
</div>
<button type="submit" class="btn btn-primary mt-2 whitespace-nowrap">Save presence</button>
</form>
- Step 6: Verify templates parse
Run: go build ./... && go test ./internal/server/... -run Template -v (if a template-render test exists; otherwise rely on the smoke run in Step 7). At minimum: go build ./... must pass.
- Step 7: Manual smoke (per CLAUDE.md smoke targets)
make smoke-deploy
Then in a browser (or Playwright): open the dashboard and a host detail page. Toggle a host to intermittent via the "presence" control, confirm the 24×7 chip disappears, and confirm an offline/sleeping intermittent host renders the grey "asleep · … · will catch up on return" line instead of red "offline". Toggle back and confirm the chip returns.
- Step 8: Commit
git add web/styles/input.css web/templates/partials/host_row.html web/templates/partials/host_chrome.html
git commit -m "feat(ui): asleep state, 24×7 chip, presence toggle for host mode"
Task 7: Record in tasks.md + final verification
Files:
- Modify: tasks.md
- Step 1: Add a tasks.md entry
Add a [x] entry under "Next steps from testing" in tasks.md (mirroring the NS-07 style — one line + a short "As shipped" note) describing the always-on/intermittent host mode: always_on column (default on), offline-alert suppression + 7-day staleness alert for intermittent hosts, settle-then-catch-up on reconnect, and the asleep UI + 24×7 chip + presence toggle.
- Step 2: Full verification
go vet ./...
go test ./...
Expected: vet clean, all tests green.
- Step 3: Commit
git add tasks.md
git commit -m "docs(tasks): record always-on/intermittent host mode"
Self-Review notes
- Spec coverage: §1 data model → Task 1. §2 mechanics unchanged → no task needed (verified untouched). §3 alerts (suppress offline, staleness, resolve-on-backup, resolve-on-toggle) → Task 4 + Task 5 Step 1. §4 catch-up (arm on hello, settle, per-schedule overdue, dispatch, guards) → Tasks 2-3. §5 UI (dot-asleep, asleep text, 24×7 chip, toggle) → Task 6. Testing → tests in Tasks 1-5. Out-of-scope items respected (global 7d const, reconnect-only, no agent-side cron, always-on stale_schedule untouched).
- Type consistency: scheduleOverdue(cronExpr string, *time.Time, time.Time) bool, ArmCatchup(hostID string, now time.Time), RunCatchupsDue(ctx), SetHostAlwaysOn(ctx, hostID, bool), ResolveOnModeChange(ctx, hostID, when), and Host.AlwaysOn bool are used consistently across tasks.
- No invented store methods: every store call (GetHost, ListSchedulesByHost, GetSourceGroup, SetHostLastBackup, ListAlerts, AppendAudit) exists in the current tree, as do the non-store dispatchBackupForGroupCore and Hub.Conn/Connected; SetHostAlwaysOn is the only new store method and is defined in Task 1.
- Test helper caveat: the alert and HTTP handler tests reference package-local helpers (newTestEngine, newUITestServer, etc.) that must be matched to the real names in existing _test.go files at implementation time; this is flagged inline in each task.