16 Commits

Author SHA1 Message Date
steve 0fbacf9f98 docs(changelog): v1.1.0 (always-on host mode) + retroactive v1.0.1
CI / Test (rest) (pull_request) Successful in 10s
CI / Lint (pull_request) Successful in 16s
CI / Build (windows/amd64) (pull_request) Successful in 11s
CI / Build (linux/amd64) (pull_request) Successful in 12s
CI / Build (linux/arm64) (pull_request) Successful in 11s
CI / Test (store) (pull_request) Successful in 1m5s
e2e / Playwright vs docker-compose (pull_request) Failing after 9s
CI / Test (server-http) (pull_request) Failing after 2m43s
2026-06-15 23:07:43 +01:00
steve d8fd4110b0 Merge pull request 'Always-On vs intermittent host mode (laptops): suppress offline noise, catch up missed backups' (#31) from feat-laptop-host-mode into main
Reviewed-on: #31
2026-06-15 23:01:03 +01:00
steve e17932d797 Merge branch 'main' into feat-laptop-host-mode
CI / Test (rest) (pull_request) Successful in 1m6s
CI / Lint (pull_request) Successful in 18s
CI / Build (windows/amd64) (pull_request) Successful in 12s
CI / Build (linux/amd64) (pull_request) Successful in 14s
CI / Test (store) (pull_request) Successful in 1m8s
CI / Build (linux/arm64) (pull_request) Successful in 11s
e2e / Playwright vs docker-compose (pull_request) Failing after 10s
CI / Test (server-http) (pull_request) Successful in 2m52s
2026-06-15 23:00:56 +01:00
steve 39030a3bbe ui(host header): boxed tags/presence pills, click-to-edit, simplified out-of-date chip
CI / Test (rest) (pull_request) Successful in 41s
CI / Test (store) (pull_request) Successful in 1m16s
CI / Lint (pull_request) Successful in 41s
CI / Build (windows/amd64) (pull_request) Successful in 14s
CI / Build (linux/arm64) (pull_request) Successful in 15s
e2e / Playwright vs docker-compose (pull_request) Failing after 11s
CI / Build (linux/amd64) (pull_request) Successful in 50s
CI / Test (server-http) (pull_request) Failing after 2m53s
2026-06-15 22:58:38 +01:00
steve a30f824a3c Merge pull request 'Tidy: fix stale-dated sparkline test + gitignore agent worktrees' (#30) from tidy-sparkline-test-and-gitignore into main
Reviewed-on: #30
2026-06-15 22:32:53 +01:00
steve 239d55b65b test(dashboard): use relative dates so sparkline test doesn't age out of the 30-day window
CI / Test (store) (pull_request) Successful in 8s
CI / Test (rest) (pull_request) Successful in 45s
CI / Lint (pull_request) Successful in 33s
CI / Build (windows/amd64) (pull_request) Successful in 44s
CI / Build (linux/amd64) (pull_request) Successful in 47s
CI / Build (linux/arm64) (pull_request) Successful in 45s
CI / Test (server-http) (pull_request) Successful in 2m26s
e2e / Playwright vs docker-compose (pull_request) Successful in 2m50s
2026-06-15 22:15:07 +01:00
steve 74e5b75380 chore: gitignore .claude/worktrees (transient agent worktrees) 2026-06-15 22:14:36 +01:00
steve 9371b7b777 fix(catchup): guard on real in-flight backup check; add scheduler tests 2026-06-15 21:45:01 +01:00
steve 10b2518323 docs(tasks): record NS-08 always-on/intermittent host mode 2026-06-15 21:30:23 +01:00
steve 6694dfdc3a fix(ui): rebuild CSS bundle so dot-asleep ships to the browser 2026-06-15 21:27:33 +01:00
steve f88f2cc1f2 feat(ui): asleep state, 24×7 chip, presence toggle for host mode 2026-06-15 21:22:42 +01:00
steve 1a07fbb217 feat(http): host mode toggle handler + route (host.mode_updated) 2026-06-15 21:17:57 +01:00
steve 9e6524788f refactor(alert): refresh stale_schedule docs; log tick schedule errors; add mode-change + never-backed-up tests 2026-06-15 21:15:35 +01:00
steve 25c55e5e4d feat(alert): suppress offline + add staleness alert for intermittent hosts 2026-06-15 21:09:39 +01:00
steve e408de9610 refactor(catchup): drop dead nil-guard; document per-host baseline limitation 2026-06-15 21:06:37 +01:00
steve 5c4e0275d9 feat(catchup): arm on hello, fire missed-window backups on tick 2026-06-15 21:02:04 +01:00
20 changed files with 972 additions and 29 deletions
+3
@@ -49,3 +49,6 @@ coverage.html
# Local-only planning / scratch — never committed.
/ask.md
/docs/superpowers/
# Claude Code agent worktrees (transient, harness-created).
/.claude/worktrees/
+38
@@ -6,6 +6,44 @@ and the project follows [Semantic Versioning](https://semver.org/).
## [Unreleased]
## [1.1.0] - 2026-06-15
### Added
- **Always-On vs intermittent host mode.** A host can now be marked as
not always-on — for laptops/workstations that legitimately sleep,
travel, or shut down outside hours. An intermittent host no longer
raises "agent offline" alerts when it disappears; instead it shows a
calm "asleep" state in the UI ("asleep · last seen … · will catch up
on return") and is covered by a longer-horizon staleness alert (raised
only when it has an enabled schedule and no successful backup in 7
days). When such a host reconnects, the server waits a short settle
window and then automatically dispatches any scheduled backup whose
window elapsed while it was asleep. Toggle per host from the host
detail page (operator-band, audited as `host.mode_updated`). New and
existing hosts default to always-on, so current fleets are unaffected.
### Changed
- Host-detail header redesign: tags and presence are grouped into
labelled, boxed pills with click-to-edit; presence shows a `24x7` /
`Free` chip; the agent "out of date" indicator is simplified (the full
version detail remains in the Agent-update panel and on hover).
- Relative timestamps ("2h ago") now tick client-side, so a tab left
open no longer shows a stale value as wall-clock time moves on.
- Release and CI container images are now published to and pulled from
the zot OCI registry (`docker.dcglab.co.uk`).
## [1.0.1] - 2026-05-09
### Fixed
- Build version is now single-sourced from `internal/version`, and the
server Dockerfile's ldflags were corrected so docker-built binaries
report their real version. Previously `internal/version.Version` stayed
at its "dev" default in docker images, which made every host look
permanently out-of-date to the update logic.
## [1.0.0] - 2026-05-09
First tagged release. Six development phases brought the project from
+1
@@ -227,6 +227,7 @@ func run() error {
}
case <-pendingDrainTick.C:
srv.DrainAllDue(ctx)
srv.RunCatchupsDue(ctx)
case <-pendingExpiryTick.C:
if n, err := st.DeleteExpiredPendingHosts(ctx, time.Now().UTC()); err == nil && n > 0 {
slog.Info("expired pending hosts swept", "n", n)
+64 -6
@@ -22,6 +22,12 @@ import (
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// staleBackupThreshold is how long an intermittent host may go without
// a successful backup before we raise a stale_schedule alert. Global
// constant for v1 (may become per-host later). Only intermittent hosts
// are evaluated — always-on hosts' stale_schedule stays a no-op.
const staleBackupThreshold = 7 * 24 * time.Hour
// JobFinishedEvent carries everything the engine needs to evaluate
// the failed-X rules. Pushed via Engine.NotifyJobFinished from the
// MarkJobFinished site.
@@ -149,6 +155,10 @@ func (e *Engine) handleJobFinished(ctx context.Context, ev JobFinishedEvent) {
fmt.Sprintf("%s job %s failed", ev.Kind, ev.JobID), ev.When)
case "succeeded":
e.resolveAndNotify(ctx, ev.HostID, kind, dedupKey, ev.When)
if ev.Kind == "backup" {
// A fresh backup clears staleness for intermittent hosts.
e.resolveAndNotify(ctx, ev.HostID, KindStaleSchedule, "", ev.When)
}
}
}
@@ -157,6 +167,12 @@ func (e *Engine) handleHostOffline(ctx context.Context, hostID string) {
if err != nil {
return
}
// Intermittent hosts (laptops) legitimately disappear — never raise
// agent_offline for them. The stale_schedule sweep in tick() is the
// only staleness signal for these hosts.
if !host.AlwaysOn {
return
}
// Apply the 15-min floor — raise only when last_seen_at is older
// than agentOfflineFloor. A nil last_seen_at (host enrolled but
// never connected) is treated as "now" so we don't raise
@@ -180,11 +196,9 @@ func (e *Engine) handleHostOnline(ctx context.Context, hostID string) {
// tick is the 60-second sweep. Responsibilities:
// 1. Re-evaluate agent_offline for every offline host that may have
// crossed the floor between explicit events.
// 2. Stale-schedule detection — declared in the spec but intentionally
// left as a no-op in v1. The precise "expected to have fired but
// didn't" trigger requires a store helper that lands in a later
// task. The KindStaleSchedule constant is exported so UI code can
// reference the tag string today.
// 2. Stale-schedule detection for intermittent hosts — raises
// stale_schedule when LastBackupAt is older than 7 days and the
// host has an enabled schedule. Always-on hosts are excluded.
func (e *Engine) tick(ctx context.Context, now time.Time) {
// User-management cleanup piggy-backed here for now. Setup tokens
// have a 1h expiry; the alert engine tick is the cheapest existing
@@ -203,6 +217,35 @@ func (e *Engine) tick(ctx context.Context, now time.Time) {
return
}
for _, h := range hosts {
// Intermittent hosts: suppress agent_offline entirely; instead
// raise stale_schedule when they have gone too long with no
// successful backup AND they have at least one enabled schedule
// to be measured against. A nil LastBackupAt (never backed up)
// has no baseline — onboarding/repo_status covers that case.
if !h.AlwaysOn {
if h.LastBackupAt == nil {
continue
}
if now.Sub(*h.LastBackupAt) < staleBackupThreshold {
continue
}
hasEnabled, err := e.hostHasEnabledSchedule(ctx, h.ID)
if err != nil {
slog.Warn("alert: tick list schedules", "host_id", h.ID, "err", err)
continue
}
if !hasEnabled {
continue
}
e.raiseAndNotify(ctx, h.ID, KindStaleSchedule, "", "warning",
fmt.Sprintf("No backup in %s (threshold %s)",
roundDur(now.Sub(*h.LastBackupAt)), staleBackupThreshold), now)
// Resolution is handled in handleJobFinished on a successful
// backup (and ResolveOnModeChange on toggle) — the tick only
// raises, it does not auto-resolve.
continue
}
// Always-on hosts: existing agent_offline re-evaluation.
if h.Status != "offline" || h.LastSeenAt == nil {
continue
}
@@ -212,7 +255,6 @@ func (e *Engine) tick(ctx context.Context, now time.Time) {
roundDur(now.Sub(*h.LastSeenAt)), e.agentOfflineFloor), now)
}
}
// Stale-schedule sweep — no-op in v1. See KindStaleSchedule doc comment.
}
// roundDur returns a human-readable duration string, rounding to the
@@ -224,3 +266,19 @@ func roundDur(d time.Duration) string {
}
return d.Round(time.Minute).String()
}
// hostHasEnabledSchedule reports whether the host has at least one
// enabled backup schedule — the precondition for a stale_schedule
// alert (no schedule = no backup expectation to measure against).
func (e *Engine) hostHasEnabledSchedule(ctx context.Context, hostID string) (bool, error) {
schedules, err := e.store.ListSchedulesByHost(ctx, hostID)
if err != nil {
return false, err
}
for _, sc := range schedules {
if sc.Enabled {
return true, nil
}
}
return false, nil
}
+255
@@ -0,0 +1,255 @@
package alert
import (
"context"
"testing"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// TestIntermittentHostSuppressesOfflineAlert checks that handleHostOffline
// does NOT raise agent_offline for a host with AlwaysOn=false.
func TestIntermittentHostSuppressesOfflineAlert(t *testing.T) {
t.Parallel()
eng, st, hostID := setupEngine(t)
ctx := context.Background()
// Make the host intermittent.
if err := st.SetHostAlwaysOn(ctx, hostID, false); err != nil {
t.Fatalf("SetHostAlwaysOn: %v", err)
}
// Give it a stale last_seen_at well past the floor.
if _, err := st.DB().Exec(
`UPDATE hosts SET last_seen_at = ?, status = ? WHERE id = ?`,
time.Now().UTC().Add(-2*time.Hour).Format(time.RFC3339Nano),
"offline",
hostID,
); err != nil {
t.Fatalf("update last_seen_at: %v", err)
}
eng.handleHostOffline(ctx, hostID)
open, _ := st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
if len(open) != 0 {
t.Fatalf("expected 0 open alerts for intermittent host; got %d: %+v", len(open), open)
}
}
// TestAlwaysOnHostStillRaisesOfflineAlert checks that always-on hosts still
// get an agent_offline alert when offline past the floor.
func TestAlwaysOnHostStillRaisesOfflineAlert(t *testing.T) {
t.Parallel()
eng, st, hostID := setupEngine(t)
ctx := context.Background()
// always_on=true is the default, but be explicit.
if err := st.SetHostAlwaysOn(ctx, hostID, true); err != nil {
t.Fatalf("SetHostAlwaysOn: %v", err)
}
// Give it a stale last_seen_at well past the 15m floor.
if _, err := st.DB().Exec(
`UPDATE hosts SET last_seen_at = ?, status = ? WHERE id = ?`,
time.Now().UTC().Add(-2*time.Hour).Format(time.RFC3339Nano),
"offline",
hostID,
); err != nil {
t.Fatalf("update last_seen_at: %v", err)
}
eng.handleHostOffline(ctx, hostID)
open, _ := st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
if len(open) != 1 || open[0].Kind != KindAgentOffline {
t.Fatalf("expected 1 agent_offline alert; got %d: %+v", len(open), open)
}
}
// TestStalenessAlertForIntermittentHost checks that tick raises stale_schedule
// for an intermittent host whose last backup is older than 7 days AND has an
// enabled schedule. Also verifies that a succeeded backup clears the alert.
func TestStalenessAlertForIntermittentHost(t *testing.T) {
t.Parallel()
eng, st, hostID := setupEngine(t)
ctx := context.Background()
// Make intermittent.
if err := st.SetHostAlwaysOn(ctx, hostID, false); err != nil {
t.Fatalf("SetHostAlwaysOn: %v", err)
}
// Create a source group to attach the schedule to.
sgID := ulid.Make().String()
if err := st.CreateSourceGroup(ctx, &store.SourceGroup{
ID: sgID,
HostID: hostID,
Name: "default",
Includes: []string{"/home"},
}); err != nil {
t.Fatalf("CreateSourceGroup: %v", err)
}
// Create an enabled schedule pointing at the source group.
schedID := ulid.Make().String()
if err := st.CreateSchedule(ctx, &store.Schedule{
ID: schedID,
HostID: hostID,
CronExpr: "0 2 * * *",
Enabled: true,
SourceGroupIDs: []string{sgID},
}); err != nil {
t.Fatalf("CreateSchedule: %v", err)
}
// Set last_backup_at to 8 days ago.
eightDaysAgo := time.Now().UTC().Add(-8 * 24 * time.Hour)
if err := st.SetHostLastBackup(ctx, hostID, "succeeded", eightDaysAgo); err != nil {
t.Fatalf("SetHostLastBackup: %v", err)
}
eng.tick(ctx, time.Now().UTC())
open, _ := st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
var staleCount int
for _, a := range open {
if a.Kind == KindStaleSchedule {
staleCount++
}
}
if staleCount != 1 {
t.Fatalf("expected 1 stale_schedule alert after tick; got %d (all open: %+v)", staleCount, open)
}
// A succeeded backup should clear the stale_schedule alert.
eng.handleJobFinished(ctx, JobFinishedEvent{
HostID: hostID,
JobID: ulid.Make().String(),
Kind: "backup",
Status: "succeeded",
SourceGroupID: sgID,
When: time.Now().UTC(),
})
open, _ = st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
for _, a := range open {
if a.Kind == KindStaleSchedule {
t.Fatalf("expected stale_schedule to be resolved after backup succeeded; still open: %+v", a)
}
}
}
// TestNoStalenessWithoutEnabledSchedule checks that no stale_schedule is
// raised for an intermittent host with a stale backup but no enabled schedule.
func TestNoStalenessWithoutEnabledSchedule(t *testing.T) {
t.Parallel()
eng, st, hostID := setupEngine(t)
ctx := context.Background()
// Make intermittent.
if err := st.SetHostAlwaysOn(ctx, hostID, false); err != nil {
t.Fatalf("SetHostAlwaysOn: %v", err)
}
// Set last_backup_at to 8 days ago — stale — but no schedule.
eightDaysAgo := time.Now().UTC().Add(-8 * 24 * time.Hour)
if err := st.SetHostLastBackup(ctx, hostID, "succeeded", eightDaysAgo); err != nil {
t.Fatalf("SetHostLastBackup: %v", err)
}
eng.tick(ctx, time.Now().UTC())
open, _ := st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
for _, a := range open {
if a.Kind == KindStaleSchedule {
t.Fatalf("expected no stale_schedule without an enabled schedule; got: %+v", a)
}
}
}
// TestResolveOnModeChangeClearsOfflineAlert checks that ResolveOnModeChange
// clears an open agent_offline alert when a host's mode is toggled.
func TestResolveOnModeChangeClearsOfflineAlert(t *testing.T) {
t.Parallel()
eng, st, hostID := setupEngine(t)
ctx := context.Background()
// Make always-on and set it offline with a stale last_seen_at.
if err := st.SetHostAlwaysOn(ctx, hostID, true); err != nil {
t.Fatalf("SetHostAlwaysOn: %v", err)
}
if _, err := st.DB().Exec(
`UPDATE hosts SET last_seen_at = ?, status = ? WHERE id = ?`,
time.Now().UTC().Add(-2*time.Hour).Format(time.RFC3339Nano),
"offline",
hostID,
); err != nil {
t.Fatalf("update last_seen_at: %v", err)
}
// Raise the offline alert.
eng.handleHostOffline(ctx, hostID)
open, _ := st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
if len(open) != 1 || open[0].Kind != KindAgentOffline {
t.Fatalf("expected 1 agent_offline alert before mode change; got %d: %+v", len(open), open)
}
// Toggle mode — should clear the alert.
eng.ResolveOnModeChange(ctx, hostID, time.Now().UTC())
open, _ = st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
for _, a := range open {
if a.Kind == KindAgentOffline {
t.Fatalf("expected agent_offline to be resolved after mode change; still open: %+v", a)
}
}
}
// TestNoStalenessWhenNeverBackedUp checks that no stale_schedule alert is
// raised for an intermittent host that has never backed up (nil LastBackupAt).
func TestNoStalenessWhenNeverBackedUp(t *testing.T) {
t.Parallel()
eng, st, hostID := setupEngine(t)
ctx := context.Background()
// Make intermittent.
if err := st.SetHostAlwaysOn(ctx, hostID, false); err != nil {
t.Fatalf("SetHostAlwaysOn: %v", err)
}
// Create a source group and an enabled schedule — but do NOT set LastBackupAt.
sgID := ulid.Make().String()
if err := st.CreateSourceGroup(ctx, &store.SourceGroup{
ID: sgID,
HostID: hostID,
Name: "default",
Includes: []string{"/home"},
}); err != nil {
t.Fatalf("CreateSourceGroup: %v", err)
}
schedID := ulid.Make().String()
if err := st.CreateSchedule(ctx, &store.Schedule{
ID: schedID,
HostID: hostID,
CronExpr: "0 2 * * *",
Enabled: true,
SourceGroupIDs: []string{sgID},
}); err != nil {
t.Fatalf("CreateSchedule: %v", err)
}
eng.tick(ctx, time.Now().UTC())
open, _ := st.ListAlerts(ctx, store.AlertFilter{Status: "open", HostID: hostID})
for _, a := range open {
if a.Kind == KindStaleSchedule {
t.Fatalf("expected no stale_schedule when never backed up; got: %+v", a)
}
}
}
+14 -4
@@ -27,10 +27,10 @@ const (
// integrity is at risk) when a check job fails.
KindCheckFailed = "check_failed"
// KindStaleSchedule is declared for completeness but intentionally
// left as a no-op in v1. The precise "expected to have fired but
// didn't" logic requires a store helper that lands in a follow-up
// task. Ask the team before implementing.
// KindStaleSchedule is raised for intermittent (non-always-on) hosts
// when their last successful backup is older than staleBackupThreshold
// (7 days) and they have at least one enabled schedule. Resolved on
// backup success or when the host is switched to always-on mode.
KindStaleSchedule = "stale_schedule"
// KindAgentOffline is raised when a host's last_seen_at is older
@@ -122,6 +122,16 @@ func alertPayload(ctx context.Context, st *store.Store, ev notification.Event, a
}
}
// ResolveOnModeChange clears any open agent_offline and stale_schedule
// alerts for a host whose always-on flag was just toggled. The next
// 60s tick re-raises whichever still applies under the new mode, so
// this is a self-correcting "wipe and let the sweep settle" call.
// Safe to invoke from the HTTP layer (it only touches the store + hub).
func (e *Engine) ResolveOnModeChange(ctx context.Context, hostID string, when time.Time) {
e.resolveAndNotify(ctx, hostID, KindAgentOffline, "", when)
e.resolveAndNotify(ctx, hostID, KindStaleSchedule, "", when)
}
// resolveAndNotify clears the open (or acknowledged) alert matching
// (host_id, kind, dedup_key) via store.AutoResolve, then fires
// alert.resolved for the row(s) actually closed. Best-effort —
+112
@@ -6,6 +6,8 @@
package http
import (
"context"
"log/slog"
"time"
)
@@ -27,3 +29,113 @@ func scheduleOverdue(cronExpr string, lastBackup *time.Time, now time.Time) bool
next := sched.Next(*lastBackup)
return !next.After(now)
}
// catchupSettle is how long after a reconnect we wait before evaluating
// catch-up, so a laptop that wakes briefly and sleeps again doesn't
// trigger a backup it can't finish. ~1 minute per the spec.
const catchupSettle = 60 * time.Second
// ArmCatchup records that an intermittent host just reconnected and
// should be evaluated for a missed backup after the settle window.
// It does not check the host mode itself: callers pass only intermittent
// hosts, and runCatchup re-checks AlwaysOn before dispatching.
// Re-arming overwrites the timer (debounce — flapping doesn't stack).
func (s *Server) ArmCatchup(hostID string, now time.Time) {
s.catchupMu.Lock()
defer s.catchupMu.Unlock()
s.catchupDueAt[hostID] = now.Add(catchupSettle)
}
// dueCatchups returns the hostIDs whose settle window has elapsed and
// removes them from the map. Caller evaluates each.
func (s *Server) dueCatchups(now time.Time) []string {
s.catchupMu.Lock()
defer s.catchupMu.Unlock()
var due []string
for id, at := range s.catchupDueAt {
if !now.Before(at) {
due = append(due, id)
delete(s.catchupDueAt, id)
}
}
return due
}
// RunCatchupsDue is the tick entrypoint. For each host past its settle
// window it dispatches a backup for every enabled schedule that is
// overdue. It skips hosts that dropped offline again, that already have
// a backup queued or running, or that were switched to always-on during
// the settle window.
func (s *Server) RunCatchupsDue(ctx context.Context) {
if s.deps.Hub == nil {
return
}
now := time.Now().UTC()
for _, hostID := range s.dueCatchups(now) {
s.runCatchup(ctx, hostID, now)
}
}
// runCatchup evaluates and dispatches catch-up backups for a single
// host. Kept separate so RunCatchupsDue reads cleanly.
func (s *Server) runCatchup(ctx context.Context, hostID string, now time.Time) {
conn := s.deps.Hub.Conn(hostID)
if conn == nil {
return // bounced offline during the settle window; re-arms on next hello
}
host, err := s.deps.Store.GetHost(ctx, hostID)
if err != nil {
slog.Warn("catchup: load host", "host_id", hostID, "err", err)
return
}
if host.AlwaysOn {
return // mode flipped during settle window
}
// Skip if a backup is already queued or running for this host —
// don't pile a catch-up on top of in-flight work. (hosts.current_job_id
// is not maintained, so we check the jobs table directly.)
active, err := s.deps.Store.HasActiveBackupJob(ctx, hostID)
if err != nil {
slog.Warn("catchup: check active backup", "host_id", hostID, "err", err)
return
}
if active {
return
}
schedules, err := s.deps.Store.ListSchedulesByHost(ctx, hostID)
if err != nil {
slog.Warn("catchup: list schedules", "host_id", hostID, "err", err)
return
}
// NOTE: overdue is measured against host.LastBackupAt, which is the
// most recent *successful backup of any schedule* on this host — not
// a per-schedule timestamp. For the common intermittent host (a
// single backup schedule) this is exact. With multiple schedules of
// different cadences, a recent backup from one schedule can mask
// another schedule's missed window. Acceptable for v1; revisit with
// per-schedule last-success tracking if multi-cadence laptops appear.
for _, sc := range schedules {
if !sc.Enabled || len(sc.SourceGroupIDs) == 0 {
continue
}
if !scheduleOverdue(sc.CronExpr, host.LastBackupAt, now) {
continue
}
for _, gid := range sc.SourceGroupIDs {
g, err := s.deps.Store.GetSourceGroup(ctx, hostID, gid)
if err != nil {
slog.Warn("catchup: load source group",
"host_id", hostID, "schedule_id", sc.ID, "group_id", gid, "err", err)
continue
}
if _, derr := s.dispatchBackupForGroupCore(ctx, conn, hostID, sc.ID, g, now); derr != nil {
// Send failed for this group — host may have dropped
// again. Earlier groups in this batch were already
// dispatched; re-arm so a later reconnect re-evaluates
// any still-overdue schedules.
s.ArmCatchup(hostID, now)
return
}
slog.Info("catchup: dispatched missed backup",
"host_id", hostID, "schedule_id", sc.ID, "group", g.Name)
}
}
}
@@ -0,0 +1,246 @@
// catchup_scheduler_test.go — integration tests for the catch-up scheduler.
package http
import (
"context"
"testing"
"time"
"github.com/oklog/ulid/v2"
"gitea.dcglab.co.uk/steve/restic-manager/internal/api"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// TestRunCatchupDispatchesOverdue verifies four properties of the
// catch-up scheduler in separate sub-tests sharing no state.
func TestRunCatchupDispatchesOverdue(t *testing.T) {
t.Parallel()
// --- 1. Overdue host with connected agent → backup dispatched -------
t.Run("overdue_dispatch", func(t *testing.T) {
t.Parallel()
srv, ts, st := rawTestServer(t)
hostID, token := enrolHostForWS(t, srv, st, "catchup-overdue")
if err := st.SetHostAlwaysOn(context.Background(), hostID, false); err != nil {
t.Fatalf("set always_on: %v", err)
}
// Last backup ~8 days ago → schedule overdue.
eightDaysAgo := time.Now().UTC().Add(-8 * 24 * time.Hour)
if err := st.SetHostLastBackup(context.Background(), hostID, "succeeded", eightDaysAgo); err != nil {
t.Fatalf("set last backup: %v", err)
}
if err := st.CreateJob(context.Background(), store.Job{
ID: ulid.Make().String(), HostID: hostID, Kind: "init",
ActorKind: "system", CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("seed init: %v", err)
}
gid := ulid.Make().String()
if err := st.CreateSourceGroup(context.Background(), &store.SourceGroup{
ID: gid, HostID: hostID, Name: "home", Includes: []string{"/home"},
}); err != nil {
t.Fatalf("source group: %v", err)
}
sid := ulid.Make().String()
if err := st.CreateSchedule(context.Background(), &store.Schedule{
ID: sid, HostID: hostID, CronExpr: "0 2 * * *", Enabled: true,
SourceGroupIDs: []string{gid},
}); err != nil {
t.Fatalf("schedule: %v", err)
}
c := agentDial(t, srv, ts, hostID, token)
sendHello(t, c, "catchup-overdue")
_ = drainUntil(t, c, api.MsgScheduleSet)
// Arm with a past time so the settle window is already elapsed.
srv.ArmCatchup(hostID, time.Now().UTC().Add(-2*time.Minute))
srv.RunCatchupsDue(context.Background())
// Give the dispatch goroutine a moment to write the job row.
time.Sleep(100 * time.Millisecond)
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM jobs WHERE host_id = ? AND kind = 'backup'`, hostID).Scan(&n); err != nil {
t.Fatalf("count: %v", err)
}
if n < 1 {
t.Errorf("overdue host: want ≥1 backup job, got %d", n)
}
})
// --- 2. Not overdue → no dispatch -----------------------------------
t.Run("not_overdue_no_dispatch", func(t *testing.T) {
t.Parallel()
srv, ts, st := rawTestServer(t)
hostID, token := enrolHostForWS(t, srv, st, "catchup-notoverdue")
if err := st.SetHostAlwaysOn(context.Background(), hostID, false); err != nil {
t.Fatalf("set always_on: %v", err)
}
// Last backup just now → not overdue.
now := time.Now().UTC()
if err := st.SetHostLastBackup(context.Background(), hostID, "succeeded", now); err != nil {
t.Fatalf("set last backup: %v", err)
}
if err := st.CreateJob(context.Background(), store.Job{
ID: ulid.Make().String(), HostID: hostID, Kind: "init",
ActorKind: "system", CreatedAt: now,
}); err != nil {
t.Fatalf("seed init: %v", err)
}
gid := ulid.Make().String()
if err := st.CreateSourceGroup(context.Background(), &store.SourceGroup{
ID: gid, HostID: hostID, Name: "home", Includes: []string{"/home"},
}); err != nil {
t.Fatalf("source group: %v", err)
}
sid := ulid.Make().String()
if err := st.CreateSchedule(context.Background(), &store.Schedule{
ID: sid, HostID: hostID, CronExpr: "0 2 * * *", Enabled: true,
SourceGroupIDs: []string{gid},
}); err != nil {
t.Fatalf("schedule: %v", err)
}
c := agentDial(t, srv, ts, hostID, token)
sendHello(t, c, "catchup-notoverdue")
_ = drainUntil(t, c, api.MsgScheduleSet)
srv.ArmCatchup(hostID, time.Now().UTC().Add(-2*time.Minute))
srv.RunCatchupsDue(context.Background())
time.Sleep(100 * time.Millisecond)
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM jobs WHERE host_id = ? AND kind = 'backup'`, hostID).Scan(&n); err != nil {
t.Fatalf("count: %v", err)
}
if n != 0 {
t.Errorf("not-overdue host: want 0 backup jobs, got %d", n)
}
})
// --- 3. Active backup in flight → no new dispatch -------------------
t.Run("active_backup_blocks_dispatch", func(t *testing.T) {
t.Parallel()
srv, ts, st := rawTestServer(t)
hostID, token := enrolHostForWS(t, srv, st, "catchup-active")
if err := st.SetHostAlwaysOn(context.Background(), hostID, false); err != nil {
t.Fatalf("set always_on: %v", err)
}
eightDaysAgo := time.Now().UTC().Add(-8 * 24 * time.Hour)
if err := st.SetHostLastBackup(context.Background(), hostID, "succeeded", eightDaysAgo); err != nil {
t.Fatalf("set last backup: %v", err)
}
if err := st.CreateJob(context.Background(), store.Job{
ID: ulid.Make().String(), HostID: hostID, Kind: "init",
ActorKind: "system", CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("seed init: %v", err)
}
gid := ulid.Make().String()
if err := st.CreateSourceGroup(context.Background(), &store.SourceGroup{
ID: gid, HostID: hostID, Name: "home", Includes: []string{"/home"},
}); err != nil {
t.Fatalf("source group: %v", err)
}
sid := ulid.Make().String()
if err := st.CreateSchedule(context.Background(), &store.Schedule{
ID: sid, HostID: hostID, CronExpr: "0 2 * * *", Enabled: true,
SourceGroupIDs: []string{gid},
}); err != nil {
t.Fatalf("schedule: %v", err)
}
// Seed a queued backup job — this is "already in flight".
if err := st.CreateJob(context.Background(), store.Job{
ID: ulid.Make().String(), HostID: hostID, Kind: "backup",
ActorKind: "schedule", CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("seed queued backup: %v", err)
}
c := agentDial(t, srv, ts, hostID, token)
sendHello(t, c, "catchup-active")
_ = drainUntil(t, c, api.MsgScheduleSet)
srv.ArmCatchup(hostID, time.Now().UTC().Add(-2*time.Minute))
srv.RunCatchupsDue(context.Background())
time.Sleep(100 * time.Millisecond)
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM jobs WHERE host_id = ? AND kind = 'backup'`, hostID).Scan(&n); err != nil {
t.Fatalf("count: %v", err)
}
// Count must still be exactly 1 — no second job added.
if n != 1 {
t.Errorf("active backup guard: want 1 job (the seeded one), got %d", n)
}
})
// --- 4. Disconnected host → no dispatch -----------------------------
t.Run("disconnected_no_dispatch", func(t *testing.T) {
t.Parallel()
srv, _, st := rawTestServer(t)
hostID, _ := enrolHostForWS(t, srv, st, "catchup-disconnected")
if err := st.SetHostAlwaysOn(context.Background(), hostID, false); err != nil {
t.Fatalf("set always_on: %v", err)
}
eightDaysAgo := time.Now().UTC().Add(-8 * 24 * time.Hour)
if err := st.SetHostLastBackup(context.Background(), hostID, "succeeded", eightDaysAgo); err != nil {
t.Fatalf("set last backup: %v", err)
}
if err := st.CreateJob(context.Background(), store.Job{
ID: ulid.Make().String(), HostID: hostID, Kind: "init",
ActorKind: "system", CreatedAt: time.Now().UTC(),
}); err != nil {
t.Fatalf("seed init: %v", err)
}
gid := ulid.Make().String()
if err := st.CreateSourceGroup(context.Background(), &store.SourceGroup{
ID: gid, HostID: hostID, Name: "home", Includes: []string{"/home"},
}); err != nil {
t.Fatalf("source group: %v", err)
}
sid := ulid.Make().String()
if err := st.CreateSchedule(context.Background(), &store.Schedule{
ID: sid, HostID: hostID, CronExpr: "0 2 * * *", Enabled: true,
SourceGroupIDs: []string{gid},
}); err != nil {
t.Fatalf("schedule: %v", err)
}
// Host is NOT connected — no agentDial.
srv.ArmCatchup(hostID, time.Now().UTC().Add(-2*time.Minute))
srv.RunCatchupsDue(context.Background())
time.Sleep(100 * time.Millisecond)
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM jobs WHERE host_id = ? AND kind = 'backup'`, hostID).Scan(&n); err != nil {
t.Fatalf("count: %v", err)
}
if n != 0 {
t.Errorf("disconnected host: want 0 backup jobs, got %d", n)
}
})
}
+6
@@ -483,6 +483,12 @@ func (s *Server) onAgentHello(ctx context.Context, hostID string, conn *ws.Conn)
// and the drain may take seconds across many rows. A non-blocking
// goroutine keeps the hello path snappy.
go s.DrainPending(context.Background(), hostID)
// Intermittent hosts that just reconnected may have slept through a
// backup window. Arm a catch-up evaluation after a settle delay; the
// pending-drain tick fires it. Always-on hosts never need this.
if host, err := s.deps.Store.GetHost(ctx, hostID); err == nil && !host.AlwaysOn {
s.ArmCatchup(hostID, time.Now().UTC())
}
}
// maybeAutoInit dispatches a `restic init` job iff the host has no
+14 -5
@@ -90,6 +90,13 @@ type Server struct {
// directories (P3-X2). Pre-allocated in New so the lazy-init
// race is impossible.
treeCache *treeCache
// catchupDueAt tracks intermittent hosts that reconnected and are
// in their settle window. Keyed hostID → earliest time to evaluate
// catch-up. Best-effort + in-memory: a server restart simply re-arms
// on the next hello. Guarded by catchupMu.
catchupMu sync.Mutex
catchupDueAt map[string]time.Time
}
// New builds a configured but not-yet-started server.
@@ -104,11 +111,12 @@ func New(deps Deps) *Server {
r.Use(requestLogger)
s := &Server{
deps: deps,
drainLocks: make(map[string]*sync.Mutex),
announceRL: newAnnounceLimiter(),
pendingHub: newPendingHub(),
treeCache: newTreeCache(),
deps: deps,
drainLocks: make(map[string]*sync.Mutex),
announceRL: newAnnounceLimiter(),
pendingHub: newPendingHub(),
treeCache: newTreeCache(),
catchupDueAt: make(map[string]time.Time),
}
s.routes(r)
@@ -279,6 +287,7 @@ func (s *Server) routes(r chi.Router) {
r.Post("/hosts/{id}/repo/probe", s.handleUIRepoProbe)
r.Post("/hosts/{id}/repo/hooks", s.handleUIRepoHooksSave)
r.Post("/hosts/{id}/tags", s.handleUIHostTagsSave)
r.Post("/hosts/{id}/mode", s.handleUIHostModeSave)
r.Post("/hosts/{id}/admin-credentials", s.handleUIAdminCredentialsSave)
r.Post("/hosts/{id}/admin-credentials/delete", s.handleUIAdminCredentialsDelete)
r.Post("/hosts/{id}/schedules/new", s.handleUIScheduleSave)
@@ -49,8 +49,14 @@ func TestDashboard_HostRowSparklineRendersWithHistory(t *testing.T) {
hostID := makeHost(t, st, "h-spark")
ctx := context.Background()
// Two history points → polyline must render.
for i, day := range []string{"2026-05-05", "2026-05-06"} {
// Two history points → polyline must render. Use dates relative to
// now so the points always fall inside the dashboard's rolling
// 30-day window (ui_handlers.go: since = now-30d); hard-coded dates
// silently age out of the window and break this test over time.
for i, day := range []string{
time.Now().UTC().AddDate(0, 0, -2).Format("2006-01-02"),
time.Now().UTC().AddDate(0, 0, -1).Format("2006-01-02"),
} {
v := int64(100 + i*50)
if err := st.UpsertHostRepoStatsHistory(ctx, hostID, day,
store.HostRepoStats{TotalSizeBytes: &v}, time.Now().UTC()); err != nil {
+37
@@ -983,6 +983,43 @@ func (s *Server) handleUIHostTagsSave(w stdhttp.ResponseWriter, r *stdhttp.Reque
stdhttp.Redirect(w, r, "/hosts/"+hostID, stdhttp.StatusSeeOther)
}
// handleUIHostModeSave flips a host's always-on flag. A non-empty
// always_on form field (the checked checkbox) => always-on; absent or
// empty => intermittent. Operator-band; mounted in server.go. After a
// successful save we clear open offline/staleness alerts via the engine
// so the next sweep re-raises only what still applies under the new mode.
func (s *Server) handleUIHostModeSave(w stdhttp.ResponseWriter, r *stdhttp.Request) {
u := s.requireUIUser(w, r)
if u == nil {
return
}
hostID := chi.URLParam(r, "id")
if _, err := s.deps.Store.GetHost(r.Context(), hostID); err != nil {
stdhttp.NotFound(w, r)
return
}
if err := r.ParseForm(); err != nil {
stdhttp.Error(w, "bad request", stdhttp.StatusBadRequest)
return
}
alwaysOn := r.PostForm.Get("always_on") != ""
if err := s.deps.Store.SetHostAlwaysOn(r.Context(), hostID, alwaysOn); err != nil {
slog.Error("ui host mode: save", "host_id", hostID, "err", err)
stdhttp.Error(w, "internal", stdhttp.StatusInternalServerError)
return
}
if s.deps.AlertEngine != nil {
s.deps.AlertEngine.ResolveOnModeChange(r.Context(), hostID, time.Now().UTC())
}
_ = s.deps.Store.AppendAudit(r.Context(), store.AuditEntry{
ID: ulid.Make().String(), UserID: &u.ID, Actor: "user",
Action: "host.mode_updated",
TargetKind: ptr("host"), TargetID: &hostID,
TS: time.Now().UTC(),
})
stdhttp.Redirect(w, r, "/hosts/"+hostID, stdhttp.StatusSeeOther)
}
// normaliseTags splits a comma-separated string, lowercases each token,
// trims whitespace, drops empties, and dedupes. Order is preserved
// from first occurrence (so the user's typing order shows on screen).
+88
@@ -0,0 +1,88 @@
// ui_host_mode_test.go — covers handleUIHostModeSave: toggling a
// host's always-on flag via POST /hosts/{id}/mode.
package http
import (
"context"
stdhttp "net/http"
"net/url"
"strings"
"testing"
)
// TestHostModeSaveToggle verifies the checkbox-absent ⇒ intermittent
// and checkbox-present ⇒ always-on semantics, and that the audit row
// lands for each request.
func TestHostModeSaveToggle(t *testing.T) {
t.Parallel()
_, ts, st := rawTestServerWithUI(t)
hostID, _ := enrolHostForUI(t, nil, st, "mode-toggle-host")
cookie := loginAsAdmin(t, st)
cli := &stdhttp.Client{
CheckRedirect: func(*stdhttp.Request, []*stdhttp.Request) error {
return stdhttp.ErrUseLastResponse
},
}
// --- POST with no always_on field => intermittent ---
form := url.Values{}
req, _ := stdhttp.NewRequest("POST", ts.URL+"/hosts/"+hostID+"/mode",
strings.NewReader(form.Encode()))
req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req.AddCookie(cookie)
res, err := cli.Do(req)
if err != nil {
t.Fatalf("do: %v", err)
}
_ = res.Body.Close()
if res.StatusCode != stdhttp.StatusSeeOther {
t.Fatalf("status: got %d, want 303", res.StatusCode)
}
if loc := res.Header.Get("Location"); loc != "/hosts/"+hostID {
t.Errorf("Location: got %q, want /hosts/%s", loc, hostID)
}
got, err := st.GetHost(context.Background(), hostID)
if err != nil {
t.Fatalf("GetHost: %v", err)
}
if got.AlwaysOn {
t.Errorf("AlwaysOn after empty form: got true, want false")
}
// --- POST with always_on=on => always-on ---
form2 := url.Values{"always_on": {"on"}}
req2, _ := stdhttp.NewRequest("POST", ts.URL+"/hosts/"+hostID+"/mode",
strings.NewReader(form2.Encode()))
req2.Header.Set("Content-Type", "application/x-www-form-urlencoded")
req2.AddCookie(cookie)
res2, err := cli.Do(req2)
if err != nil {
t.Fatalf("do: %v", err)
}
_ = res2.Body.Close()
if res2.StatusCode != stdhttp.StatusSeeOther {
t.Fatalf("status: got %d, want 303", res2.StatusCode)
}
got2, err := st.GetHost(context.Background(), hostID)
if err != nil {
t.Fatalf("GetHost: %v", err)
}
if !got2.AlwaysOn {
t.Errorf("AlwaysOn after always_on=on: got false, want true")
}
// Audit rows must exist (one per request).
var n int
if err := st.DB().QueryRow(
`SELECT COUNT(*) FROM audit_log WHERE action = 'host.mode_updated' AND target_id = ?`,
hostID).Scan(&n); err != nil {
t.Fatalf("count audit: %v", err)
}
if n != 2 {
t.Errorf("audit rows: got %d, want 2", n)
}
}
+16
@@ -270,6 +270,22 @@ func (s *Store) LatestJobByKind(ctx context.Context, hostID, kind string) (*Job,
return &j, nil
}
// HasActiveBackupJob reports whether the host has a backup job that is
// still queued or running. The catch-up scheduler uses this to avoid
// dispatching a duplicate backup alongside one already in flight
// (hosts.current_job_id is not maintained, so this is the authoritative
// in-flight check).
func (s *Store) HasActiveBackupJob(ctx context.Context, hostID string) (bool, error) {
var exists bool
err := s.db.QueryRowContext(ctx,
`SELECT EXISTS(SELECT 1 FROM jobs WHERE host_id = ? AND kind = 'backup' AND status IN ('queued','running'))`,
hostID).Scan(&exists)
if err != nil {
return false, fmt.Errorf("store: has active backup job: %w", err)
}
return exists, nil
}
// HasJobOfKind reports whether any job of the given kind exists for
// this host, regardless of status. Used by the auto-init path on
// agent hello to decide whether to dispatch a fresh `restic init` —
+1
@@ -499,6 +499,7 @@ Sizes: **S** = under a day, **M** = 1–3 days, **L** = 3–7 days.
- [x] **NS-05** Drop redundant `actions/setup-go` from `.gitea/workflows/ci.yml`. ✅ Already gone — verified `.gitea/workflows/ci.yml` has zero `actions/setup-go@v5` invocations and no `GO_VERSION` env; the file's header comment now documents that the runner image (`gitea.dcglab.co.uk/steve/ci-runner-go`) is the single source of truth for the Go version. Closing as done; no further code change needed.
- [x] **NS-06** Remove the permanently-disabled "Run backup now" button from `web/templates/partials/host_chrome.html`. ✅ Landed: dropped the disabled tombstone button from the host header action row; only "Edit credentials" + the ⋯ menu remain. Per-source-group Run-now on `/hosts/{id}/sources` is the only path now. No e2e change needed — `smoke.spec.ts` does not assert on host_chrome's button row.
- [x] **NS-07** Relative timestamps go stale on long-open tabs. ✅ Landed: `formatRelTime` now wraps its label in `<time data-rel-ts=…>` and both layouts (`base.html`, `chromeless.html`) carry a small ticker that re-renders every 30s, so a page rendered an hour ago no longer keeps showing "2h ago" when the wall-clock truth is "3h ago". Covered by `funcs_test.go`. The bug: every relative label was computed once at server render and never updated client-side, so a job-detail page left open drifted further from reality the longer it sat.
- [x] **NS-08** Always-On vs intermittent host mode. ✅ Landed: a host can now be marked not-always-on (laptop/workstation) so it stops generating offline-alert noise when it legitimately sleeps. Migration 0024 adds `hosts.always_on` (default 1 = today's 24×7 behaviour; intermittent is strictly opt-in). The alert engine suppresses `agent_offline` for intermittent hosts and instead wires up the previously-dead `stale_schedule` alert for them — raised at a 7-day global threshold when the host has an enabled schedule and a stale last backup, resolved on the next successful backup. A new server-side catch-up scheduler (`internal/server/http/catchup.go`) arms on agent hello and fires from the existing 30s pending-drain tick: ~60s after an intermittent host reconnects it dispatches a backup for any enabled schedule whose window elapsed while asleep (overdue = `cron.Next(lastBackup) <= now`, reusing the shared `cronParser`), guarded against firing when the host bounced offline, flipped to always-on, or already has a job running. Overdue is measured against the per-host `LastBackupAt` (exact for the common single-schedule laptop; a known coarseness for multi-cadence hosts, documented in code). Operator toggle via `POST /hosts/{id}/mode` (audited `host.mode_updated`), which also clears open offline/staleness alerts so the next sweep re-settles. UI: intermittent offline hosts render a calm grey `asleep · <relTime> · will catch up on return` state (new `.dot-asleep`) instead of red "offline"; a `24×7` chip shows only for always-on hosts; a "presence" inline toggle on the host header. Design + plan in `docs/specs/2026-06-15-always-on-host-mode-design.md` and `docs/plans/2026-06-15-always-on-host-mode.md`. Spec §2 (online/offline mechanics) deliberately left untouched. Out of scope for v1: per-host staleness thresholds, continuous (non-reconnect) overdue evaluation, per-schedule last-success tracking.
- [x] **NS-04** Dashboard parity with the alerts screen: live refresh, column sorting, filters. ✅ Landed: `/` now parses `q`/`status`/`repo_status`/`tag`/`sort`/`dir` query params (round-trip durable for bookmarks); table is wrapped in an `id="hosts-table"` htmx live-poll matching the alerts cadence (5s, gated on `document.visibilityState` and `localStorage.rm-dashboard-live`); filter row above the table with hostname free-text + status + repo_status selects + tag chips + clear; column headers (Host / OS · arch / Last backup / Repo size / Snapshots) are clickable links that toggle direction on the active column; pure-Go sort+filter pipeline covered by `dashboard_filter_test.go`. Original scope below. The host list is currently a static render; operators have to reload to see new heartbeats / job state changes. Mirror the alerts pattern (`web/templates/pages/alerts.html` uses `hx-trigger="every 5s [document.visibilityState==='visible' && localStorage.getItem('rm-alerts-live')!=='off']"` plus a Live/Off toggle so background tabs and explicit-off don't burn server cycles). Add: server-side sort on every meaningful column (name, OS, last-backup time, last-backup status, agent online/offline, restic version, tags), and a small filter row above the table with, at minimum, free-text on hostname, status (online/offline/never-seen), and tag chips. Columns + filter state should round-trip through the query string so a bookmarked / shared URL is durable. Re-use the `host_row` partial that already exists so the live-refresh swap is a clean OOB swap, not a full table re-render.
---
File diff suppressed because one or more lines are too long
+12
@@ -70,6 +70,7 @@
.dot-online { background: var(--ok); box-shadow: 0 0 0 3px color-mix(in oklch, var(--ok), transparent 80%); }
.dot-degraded { background: var(--warn); box-shadow: 0 0 0 3px color-mix(in oklch, var(--warn), transparent 80%); }
.dot-offline { background: var(--off); }
.dot-asleep { background: var(--ink-fade); opacity: 0.6; }
.dot-failed { background: var(--bad); box-shadow: 0 0 0 3px color-mix(in oklch, var(--bad), transparent 80%); }
.pulse { animation: rm-pulse 2.4s ease-in-out infinite; }
@keyframes rm-pulse {
@@ -195,6 +196,17 @@
}
.tag-removable .x { color: var(--ink-fade); cursor: pointer; padding-left: 2px; }
/* ---------- header meta groups (boxed tags / presence pills) ---------- */
.meta-group {
display: inline-flex; align-items: center; gap: 6px;
font-size: 11px; line-height: 1; padding: 3px 9px;
border: 1px solid var(--line); border-radius: 5px;
background: color-mix(in oklch, var(--ink), transparent 95%);
}
.meta-group .meta-label { color: var(--ink-mute); }
.meta-group .meta-val { color: var(--ink-mid); text-decoration: none; }
.meta-group a.meta-val:hover { color: var(--ink); text-decoration: underline; }
/* ---------- form fields ---------- */
.field-label { font-size: 12px; color: var(--ink-mid); margin-bottom: 6px; display: block; }
.field-help { font-size: 12px; color: var(--ink-mute); margin-top: 6px; line-height: 1.55; }
+44 -7
@@ -34,17 +34,32 @@
{{else if eq $host.Status "degraded"}}
<span class="dot dot-degraded"></span>
{{else if eq $host.Status "offline"}}
<span class="dot dot-offline"></span>
{{if $host.AlwaysOn}}
<span class="dot dot-offline"></span>
{{else}}
<span class="dot dot-asleep"></span>
{{end}}
{{else}}
<span class="dot dot-failed"></span>
{{end}}
<h1 class="mono text-[26px] font-medium tracking-[0.005em] text-ink">{{$host.Name}}</h1>
<div class="flex gap-1.5 items-center">
{{range $host.Tags}}<a href="/?tag={{.}}" class="tag" title="filter dashboard by this tag">{{.}}</a>{{end}}
<button type="button" class="text-ink-fade text-[11px] hover:text-ink-mid whitespace-nowrap"
style="padding: 2px 8px; border: 1px dashed var(--line); border-radius: 3px; cursor: pointer;"
<div class="flex items-center gap-2.5">
{{/* tags group pill — click the "tags" label to edit; the tag
values still filter the dashboard by that tag. */}}
<span class="meta-group">
<span class="meta-label cursor-pointer hover:text-ink"
onclick="document.getElementById('tags-edit-{{$host.ID}}').classList.toggle('hidden')"
title="Edit tags">{{if $host.Tags}}edit tags{{else}}add tags{{end}}</button>
title="Edit tags">tags</span>
{{range $host.Tags}}<a href="/?tag={{.}}" class="meta-val" title="filter dashboard by this tag">{{.}}</a>{{end}}
{{if not $host.Tags}}<span class="meta-val"></span>{{end}}
</span>
{{/* presence group pill — click anywhere to edit. */}}
<span class="meta-group cursor-pointer"
onclick="document.getElementById('mode-edit-{{$host.ID}}').classList.toggle('hidden')"
title="Change presence mode">
<span class="meta-label">presence</span>
<span class="meta-val">{{if $host.AlwaysOn}}24x7{{else}}Free{{end}}</span>
</span>
</div>
{{if gt $page.ScheduleVersion 0}}
<span class="mono text-[11px] text-ink-mute ml-2">
@@ -80,6 +95,24 @@
</div>
<div class="field-help">Comma-separated. Lowercased automatically.</div>
</form>
{{/* Presence-mode editor: hidden by default; toggled by clicking the
"presence" pill. Checkbox checked => always-on (24×7);
unchecked => intermittent (laptop): no offline alerts, shows
"asleep", auto-catches-up a missed backup on reconnect. */}}
<form id="mode-edit-{{$host.ID}}" method="post"
action="/hosts/{{$host.ID}}/mode"
class="hidden mt-3" style="max-width: 640px;">
<label class="flex items-center gap-2 text-[12px] text-ink-mid">
<input type="checkbox" name="always_on" value="on" {{if $host.AlwaysOn}}checked{{end}} />
Always On — expected online 24×7
</label>
<div class="field-help">
Uncheck for an intermittent host (laptop/workstation): it won't
raise offline alerts when asleep, shows an "asleep" state, and
catches up a missed backup ~1 minute after it reconnects.
</div>
<button type="submit" class="btn btn-primary mt-2 whitespace-nowrap">Save presence</button>
</form>
<div class="flex items-center gap-3 mt-3 text-[13px] text-ink-mute">
<span class="mono text-ink-mid">{{$host.OS}}/{{$host.Arch}}</span>
<span class="text-ink-fade">·</span>
@@ -88,7 +121,11 @@
<span>restic <span class="mono text-ink-mid">{{if $host.ResticVersion}}{{$host.ResticVersion}}{{else}}—{{end}}</span></span>
<span class="text-ink-fade">·</span>
{{if eq $host.Status "offline"}}
<span>last seen <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span></span>
{{if $host.AlwaysOn}}
<span>last seen <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span></span>
{{else}}
<span>asleep · last seen <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span> · will catch up on return</span>
{{end}}
{{else}}
<span>online · last heartbeat <span class="mono text-ink-mid">{{relTime $host.LastSeenAt}}</span></span>
{{end}}
+11 -3
@@ -8,7 +8,11 @@
{{- else if eq $h.Status "degraded" -}}
<span class="dot dot-degraded"></span>
{{- else if eq $h.Status "offline" -}}
<span class="dot dot-offline"></span>
{{- if $h.AlwaysOn -}}
<span class="dot dot-offline"></span>
{{- else -}}
<span class="dot dot-asleep"></span>
{{- end -}}
{{- else -}}
<span class="dot dot-failed"></span>
{{- end -}}
@@ -26,7 +30,11 @@
{{- else if eq (deref $h.LastBackupStatus) "cancelled" -}}
<span class="text-warn">cancelled</span> · <span class="mono">{{relTime $h.LastBackupAt}}</span>
{{- else if eq $h.Status "offline" -}}
<span class="text-ink-mute">last seen <span class="mono">{{relTime $h.LastSeenAt}}</span></span>
{{- if $h.AlwaysOn -}}
<span class="text-ink-mute">last seen <span class="mono">{{relTime $h.LastSeenAt}}</span></span>
{{- else -}}
<span class="text-ink-mute">asleep · <span class="mono">{{relTime $h.LastSeenAt}}</span> · will catch up on return</span>
{{- end -}}
{{- else -}}
<span class="text-ink-fade italic">never run</span>
{{- end -}}
@@ -53,7 +61,7 @@
</div>
<div class="text-right row-action">
{{- if eq $h.Status "offline" -}}
<span class="mono text-xs text-ink-fade">offline</span>
<span class="mono text-xs text-ink-fade">{{if $h.AlwaysOn}}offline{{else}}asleep{{end}}</span>
{{- else if $h.CurrentJobID -}}
<a href="/jobs/{{deref $h.CurrentJobID}}" class="btn btn-ghost">View job →</a>
{{- else if .RunAllScheduleID -}}
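The template hunks above branch on `Status` and then on `AlwaysOn` to choose between the `dot-offline` and `dot-asleep` classes. A minimal sketch of that branching with the standard `text/template` package follows. The `hostView` type and `renderDotClass` function are illustrative assumptions, and the leading `online` branch is inferred from the `.dot-online` CSS class rather than shown in the diff.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// hostView carries only the two fields the branching needs.
// Hypothetical type for illustration.
type hostView struct {
	Status   string
	AlwaysOn bool
}

// dotClassTmpl reproduces the nested if/else shape of the partials;
// the trim markers strip the layout whitespace between branches.
const dotClassTmpl = `
{{- if eq .Status "online" -}}dot-online
{{- else if eq .Status "degraded" -}}dot-degraded
{{- else if eq .Status "offline" -}}
  {{- if .AlwaysOn -}}dot-offline{{- else -}}dot-asleep{{- end -}}
{{- else -}}dot-failed
{{- end -}}`

var dotClass = template.Must(template.New("dot").Parse(dotClassTmpl))

// renderDotClass executes the template and returns the chosen class.
func renderDotClass(h hostView) (string, error) {
	var sb strings.Builder
	if err := dotClass.Execute(&sb, h); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, h := range []hostView{
		{Status: "offline", AlwaysOn: true},
		{Status: "offline", AlwaysOn: false},
		{Status: "degraded", AlwaysOn: false},
	} {
		class, err := renderDotClass(h)
		fmt.Println(class, err)
	}
	// Prints:
	// dot-offline <nil>
	// dot-asleep <nil>
	// dot-degraded <nil>
}
```

Only the `offline` branch consults `AlwaysOn`, so an intermittent host that is online or degraded renders exactly like an always-on one.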
+1 -1
@@ -7,5 +7,5 @@
Hidden entirely when UpdateAvailable is false.
*/}}
{{define "host_update_chip"}}
{{if .UpdateAvailable}}<span class="update-chip" title="Agent at {{.Host.AgentVersion}}; server at {{.TargetVersion}}">out of date · {{.Host.AgentVersion}} → {{.TargetVersion}}</span>{{end}}
{{if .UpdateAvailable}}<span class="update-chip" title="Agent at {{.Host.AgentVersion}}; server at {{.TargetVersion}}">out of date</span>{{end}}
{{end}}