331 Commits

Author SHA1 Message Date
steve 6c6b962e24 Merge pull request 'De-flake TestDrainPendingSerializesPerHost (CI stability)' (#33) from fix-flaky-server-http-tests into main
Reviewed-on: #33
2026-06-16 15:44:47 +01:00
steve e64075d5d7 test(pending-drain): de-flake TestDrainPendingSerializesPerHost
CI / Test (store) (pull_request) Successful in 8s
CI / Test (rest) (pull_request) Successful in 12s
CI / Build (windows/amd64) (pull_request) Successful in 15s
CI / Lint (pull_request) Successful in 19s
CI / Build (linux/amd64) (pull_request) Successful in 12s
CI / Build (linux/arm64) (pull_request) Successful in 44s
CI / Test (server-http) (pull_request) Successful in 2m55s
e2e / Playwright vs docker-compose (pull_request) Successful in 2m45s
Keep the test WS client actively reading (a real agent always is) so
the server-side conn stays registered under parallel load, and drain to
completion via condition polling instead of asserting one-shot
completeness. The conn could be dropped/unregistered under CI load,
making DrainPending correctly no-op (conn==nil) and the test observe a
partial/empty drain. -race confirms no production data race; the
exactly-5-jobs assertion (proving the per-host mutex blocks
double-dispatch) is unchanged. Verified: 0 failures over 25 loaded runs
+ 4 -race iterations.
2026-06-16 13:29:47 +01:00
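The "drain to completion via condition polling" idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's test code: `waitFor` and the `drained` counter are hypothetical names, and the real test observes the server's job state rather than a local counter.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitFor polls cond every interval until it returns true or timeout elapses.
// It reports whether the condition was met in time.
func waitFor(cond func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Hypothetical stand-in for the number of jobs the drain has dispatched.
	var drained atomic.Int32
	go func() {
		for i := 0; i < 5; i++ {
			time.Sleep(5 * time.Millisecond)
			drained.Add(1)
		}
	}()

	// Poll until the drain has finished instead of asserting right after one call.
	ok := waitFor(func() bool { return drained.Load() == 5 }, 2*time.Second, 10*time.Millisecond)
	fmt.Println("drained to completion:", ok, "jobs:", drained.Load())
}
```

Polling against a deadline tolerates slow CI runners, while the final count check can stay exact, which matches the commit's note that the exactly-5-jobs assertion is unchanged.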
steve 0f5110f3d9 Merge pull request 'Release v1.1.0 — CHANGELOG' (#32) from release-v1.1.0 into main
Release / Build + push image (push) Successful in 3m39s
v1.1.0
2026-06-16 07:32:00 +01:00
steve 0fbacf9f98 docs(changelog): v1.1.0 (always-on host mode) + retroactive v1.0.1
CI / Test (rest) (pull_request) Successful in 10s
CI / Lint (pull_request) Successful in 16s
CI / Build (windows/amd64) (pull_request) Successful in 11s
CI / Build (linux/amd64) (pull_request) Successful in 12s
CI / Build (linux/arm64) (pull_request) Successful in 11s
CI / Test (store) (pull_request) Successful in 1m5s
e2e / Playwright vs docker-compose (pull_request) Failing after 9s
CI / Test (server-http) (pull_request) Failing after 2m43s
2026-06-15 23:07:43 +01:00
steve d8fd4110b0 Merge pull request 'Always-On vs intermittent host mode (laptops): suppress offline noise, catch up missed backups' (#31) from feat-laptop-host-mode into main
Reviewed-on: #31
2026-06-15 23:01:03 +01:00
steve e17932d797 Merge branch 'main' into feat-laptop-host-mode
CI / Test (rest) (pull_request) Successful in 1m6s
CI / Lint (pull_request) Successful in 18s
CI / Build (windows/amd64) (pull_request) Successful in 12s
CI / Build (linux/amd64) (pull_request) Successful in 14s
CI / Test (store) (pull_request) Successful in 1m8s
CI / Build (linux/arm64) (pull_request) Successful in 11s
e2e / Playwright vs docker-compose (pull_request) Failing after 10s
CI / Test (server-http) (pull_request) Successful in 2m52s
2026-06-15 23:00:56 +01:00
steve 39030a3bbe ui(host header): boxed tags/presence pills, click-to-edit, simplified out-of-date chip
CI / Test (rest) (pull_request) Successful in 41s
CI / Test (store) (pull_request) Successful in 1m16s
CI / Lint (pull_request) Successful in 41s
CI / Build (windows/amd64) (pull_request) Successful in 14s
CI / Build (linux/arm64) (pull_request) Successful in 15s
e2e / Playwright vs docker-compose (pull_request) Failing after 11s
CI / Build (linux/amd64) (pull_request) Successful in 50s
CI / Test (server-http) (pull_request) Failing after 2m53s
2026-06-15 22:58:38 +01:00
steve a30f824a3c Merge pull request 'Tidy: fix stale-dated sparkline test + gitignore agent worktrees' (#30) from tidy-sparkline-test-and-gitignore into main
Reviewed-on: #30
2026-06-15 22:32:53 +01:00
steve 239d55b65b test(dashboard): use relative dates so sparkline test doesn't age out of the 30-day window
CI / Test (store) (pull_request) Successful in 8s
CI / Test (rest) (pull_request) Successful in 45s
CI / Lint (pull_request) Successful in 33s
CI / Build (windows/amd64) (pull_request) Successful in 44s
CI / Build (linux/amd64) (pull_request) Successful in 47s
CI / Build (linux/arm64) (pull_request) Successful in 45s
CI / Test (server-http) (pull_request) Successful in 2m26s
e2e / Playwright vs docker-compose (pull_request) Successful in 2m50s
2026-06-15 22:15:07 +01:00
steve 74e5b75380 chore: gitignore .claude/worktrees (transient agent worktrees) 2026-06-15 22:14:36 +01:00
steve 9371b7b777 fix(catchup): guard on real in-flight backup check; add scheduler tests 2026-06-15 21:45:01 +01:00
steve 10b2518323 docs(tasks): record NS-08 always-on/intermittent host mode 2026-06-15 21:30:23 +01:00
steve 6694dfdc3a fix(ui): rebuild CSS bundle so dot-asleep ships to the browser 2026-06-15 21:27:33 +01:00
steve f88f2cc1f2 feat(ui): asleep state, 24×7 chip, presence toggle for host mode 2026-06-15 21:22:42 +01:00
steve 1a07fbb217 feat(http): host mode toggle handler + route (host.mode_updated) 2026-06-15 21:17:57 +01:00
steve 9e6524788f refactor(alert): refresh stale_schedule docs; log tick schedule errors; add mode-change + never-backed-up tests 2026-06-15 21:15:35 +01:00
steve 25c55e5e4d feat(alert): suppress offline + add staleness alert for intermittent hosts 2026-06-15 21:09:39 +01:00
steve e408de9610 refactor(catchup): drop dead nil-guard; document per-host baseline limitation 2026-06-15 21:06:37 +01:00
steve 5c4e0275d9 feat(catchup): arm on hello, fire missed-window backups on tick 2026-06-15 21:02:04 +01:00
steve 7aaafceab5 feat(catchup): scheduleOverdue helper for missed-window detection 2026-06-15 20:58:17 +01:00
steve 4c9641b6ed fix(store): SetHostAlwaysOn returns ErrNotFound; test agent-token lookup path 2026-06-15 20:56:59 +01:00
steve ff65d39f25 feat(store): add hosts.always_on flag (default on) 2026-06-15 20:53:13 +01:00
steve 9d16e3f7e3 docs(plan): always-on vs intermittent host mode implementation plan 2026-06-15 20:48:16 +01:00
steve 261b83ec26 docs(spec): clarify staleness vs job-failure alerting for asleep hosts 2026-06-15 20:42:00 +01:00
steve 0c3a0844e4 docs(spec): always-on vs intermittent host mode design 2026-06-15 20:37:45 +01:00
steve 2dae61f678 Merge pull request 'fix(ui): tick relative timestamps client-side so long-open tabs don't go stale' (#29) from fix-stale-reltime into main
Reviewed-on: #29
2026-06-15 20:19:59 +01:00
steve 55cb8909c7 docs(tasks): record NS-07 client-side relTime ticker fix
CI / Test (rest) (pull_request) Successful in 1m46s
CI / Test (store) (pull_request) Successful in 2m4s
CI / Lint (pull_request) Successful in 34s
CI / Build (windows/amd64) (pull_request) Successful in 45s
CI / Build (linux/amd64) (pull_request) Successful in 46s
CI / Test (server-http) (pull_request) Failing after 3m32s
CI / Build (linux/arm64) (pull_request) Successful in 47s
e2e / Playwright vs docker-compose (pull_request) Successful in 2m43s
2026-06-15 20:19:32 +01:00
steve 06748f5582 Merge pull request 'ui(relTime): tick relative timestamps client-side' (#28) from fix-stale-reltime into main
Release / Build + push image (push) Successful in 3m52s
Reviewed-on: #28
v1.0.1
2026-05-15 20:14:08 +00:00
steve a4d705db6b Merge branch 'main' into fix-stale-reltime
CI / Test (store) (pull_request) Successful in 1m15s
CI / Lint (pull_request) Successful in 19s
CI / Build (windows/amd64) (pull_request) Successful in 25s
CI / Test (server-http) (pull_request) Successful in 2m2s
CI / Test (rest) (pull_request) Successful in 2m12s
CI / Build (linux/amd64) (pull_request) Successful in 26s
CI / Build (linux/arm64) (pull_request) Successful in 26s
e2e / Playwright vs docker-compose (pull_request) Successful in 2m59s
2026-05-15 20:05:45 +00:00
steve c6f73f790d ci: pull ci-runner-go from zot registry 2026-05-15 19:51:02 +00:00
steve 068f08d96d ci: migrate release workflow to zot registry 2026-05-15 19:50:50 +00:00
steve 28ef9750d3 ui(relTime): tick relative timestamps client-side so long-open tabs don't freeze
CI / Test (rest) (pull_request) Successful in 9s
CI / Test (store) (pull_request) Successful in 6s
CI / Build (windows/amd64) (pull_request) Successful in 8s
CI / Build (linux/amd64) (pull_request) Successful in 7s
CI / Lint (pull_request) Successful in 19s
CI / Build (linux/arm64) (pull_request) Successful in 7s
e2e / Playwright vs docker-compose (pull_request) Successful in 1m26s
CI / Test (server-http) (pull_request) Successful in 2m34s
formatRelTime now wraps its label in <time data-rel-ts=...>, and
both layouts include a small ticker that re-renders every 30s.
Without this, a job-detail page rendered an hour ago kept showing
'2h ago' when the wall-clock truth was '3h ago'.
2026-05-10 07:37:03 +01:00
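A rough server-side sketch of the approach described above: render the label, but also embed the raw timestamp so a client-side ticker can recompute it. The function names `relLabel` and `relTimeTag` are hypothetical, and storing Unix seconds in `data-rel-ts` is an assumption, since the commit message elides the attribute's actual value format.

```go
package main

import "fmt"

// relLabel renders a coarse relative label for the gap between two Unix timestamps.
func relLabel(tsUnix, nowUnix int64) string {
	secs := nowUnix - tsUnix
	switch {
	case secs < 60:
		return "just now"
	case secs < 3600:
		return fmt.Sprintf("%dm ago", secs/60)
	case secs < 86400:
		return fmt.Sprintf("%dh ago", secs/3600)
	default:
		return fmt.Sprintf("%dd ago", secs/86400)
	}
}

// relTimeTag wraps the label in a <time> element that carries the raw timestamp.
// Assumption: data-rel-ts holds Unix seconds; the real format is not shown in the log.
func relTimeTag(tsUnix, nowUnix int64) string {
	return fmt.Sprintf(`<time data-rel-ts="%d">%s</time>`, tsUnix, relLabel(tsUnix, nowUnix))
}

func main() {
	var ts int64 = 1000
	fmt.Println(relTimeTag(ts, ts+2*3600)) // label as first rendered
	fmt.Println(relTimeTag(ts, ts+3*3600)) // what a ticker would show an hour later
}
```

Because the element keeps the timestamp rather than only the rendered text, a ticker in the page can re-derive the label on each interval without another request to the server.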
steve f4db0b17e8 Merge pull request 'fix(version): single-source internal/version, fix dockerfile ldflags' (#27) from fix-version-ldflags into main
Release / Build + push image (push) Successful in 3m58s
2026-05-09 14:26:50 +00:00
steve 8afda7cd8c fix(version): use internal/version as single source for build constants
CI / Test (store) (pull_request) Successful in 5s
CI / Test (rest) (pull_request) Successful in 9s
CI / Build (windows/amd64) (pull_request) Successful in 7s
CI / Test (server-http) (pull_request) Successful in 17s
CI / Build (linux/amd64) (pull_request) Successful in 7s
CI / Lint (pull_request) Successful in 19s
CI / Build (linux/arm64) (pull_request) Successful in 14s
e2e / Playwright vs docker-compose (pull_request) Successful in 1m27s
The Dockerfile only set `-X main.version=...`, so docker-built binaries
left `internal/version.Version` at its default "dev". The update logic
(host_update.go:61, hosts.go:94, fleet_update.go:101 et al.) compares
against `internal/version.Version`, so a v1.0.0 host always looked
out-of-date to a v1.0.0 server, the chip never cleared, and pressing
"update" re-downloaded the same bundled binary on a loop.

Collapse the two version sources: drop the `var version/commit/date`
locals in cmd/{server,agent}/main.go, route everything through
internal/version (now also carrying Date), and have both the Dockerfile
and the Makefile set the same single set of -X flags. Verified
end-to-end: make build and docker build both emit binaries whose
--version reflects the build VERSION.
2026-05-09 15:20:13 +01:00
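The failure mode described above can be illustrated with a small sketch. It assumes the out-of-date check reduces to a string comparison, which may well be a simplification of the real logic. `outOfDate` is a hypothetical helper, and the package-level `Version` here only stands in for `internal/version.Version`.

```go
package main

import "fmt"

// Version stands in for internal/version.Version. Its default is "dev";
// a release build is expected to override it at link time with something like
//
//	go build -ldflags "-X <module>/internal/version.Version=v1.0.0"
//
// where <module> is the repository's module path (not shown in the log).
var Version = "dev"

// outOfDate reports whether an agent's version differs from the server's.
// Assumption: plain inequality is enough to show the bug; the real check may differ.
func outOfDate(agentVersion, serverVersion string) bool {
	return agentVersion != serverVersion
}

func main() {
	// Before the fix: the -X flag targeted main.version, so Version stayed "dev".
	fmt.Println("stale with default:", outOfDate("v1.0.0", Version))

	// After the fix: the single -X flag sets the variable the comparison reads.
	Version = "v1.0.0"
	fmt.Println("stale after fix:", outOfDate("v1.0.0", Version))
}
```

Keeping one variable as the target of every `-X` flag means the value printed by `--version` and the value used in update comparisons cannot drift apart.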
steve 123e4f4915 scrub: remove docs/superpowers and ask.md; gitignore them
These were never meant for the public repo. Wiped from history in
the same change set via git-filter-repo.
2026-05-09 14:23:29 +01:00
steve 7b035a8f09 Merge pull request 'v1 readiness: CHANGELOG + threat model + first-run onboarding polish' (#26) from v1-readiness into main
Release / Build + push image (push) Successful in 2m16s
Reviewed-on: #26
v1.0.0
2026-05-09 11:52:33 +00:00
steve 7a813cacd3 first-run: keep 'bootstrap token' phrase so e2e log-scraper still matches
The CI e2e workflow greps for 'bootstrap token' in server logs to capture
the one-shot token. The earlier reword dropped that phrase; restore it on
the headless-instructions line so .gitea/workflows/e2e.yml step 'Capture
bootstrap token from server logs' keeps matching.
2026-05-09 12:49:40 +01:00
steve 1d36dcd668 v1 readiness: CHANGELOG + threat model + first-run onboarding polish
- CHANGELOG.md: Keep-a-Changelog format, v1.0.0 entry summarising
  what each phase delivered.
- docs/threat-model.md: structured walkthrough of assets, actors,
  attack surfaces and residual risks; reviewed against v1.0.0.
- cmd/server/main.go: at first-run startup, print a clickable
  $RM_BASE_URL/bootstrap URL alongside the existing one-shot
  bootstrap token (or a fallback hint when RM_BASE_URL is unset).
- web/templates/pages/bootstrap.html: visible "Minimum 12 characters"
  hint under the password field so the rule is communicated
  before the operator submits.
- tasks.md: close X-01, X-04, X-05 with notes.
2026-05-09 12:29:00 +01:00
steve 755840d9ff Merge pull request 'docs: AI-agent host onboarding guide' (#25) from temp-onboarding into main
Reviewed-on: #25
2026-05-09 11:22:54 +00:00
steve cc638f6456 Added new AI-focused document for host onboarding 2026-05-09 12:18:42 +01:00
steve e046be98b2 Merge pull request 'Cleanup: NS-05/NS-06 + drop dead /repos nav link' (#24) from ns-05-06-cleanup into main
Reviewed-on: #24
2026-05-09 11:11:36 +00:00
steve a9c47deb26 nav: drop dead /repos top-level link (repos are per-host, accessed via host sub-tab) 2026-05-09 11:59:08 +01:00
steve 8a7706407d tasks: close NS-05 (setup-go already gone) + NS-06 (drop Run-backup tombstone button) 2026-05-09 11:55:21 +01:00
steve 3101024d1a tasks: queue NS-05 (drop setup-go) + NS-06 (drop disabled Run-backup button)
Two small follow-ups noted while working through the
p5-oss-readiness CI-runner switch:

* NS-05 — actions/setup-go is now redundant; ci-runner-go ships
  Go on PATH and re-downloading on every job costs ~5s a shard.
* NS-06 — host_chrome's per-host "Run backup now" button is a
  permanently-disabled tombstone; remove it so the chrome stops
  advertising an action that no longer exists.
2026-05-08 22:26:59 +01:00
steve 7f98524cfa Merge pull request 'P5: OSS readiness — docs site, contributor onboarding, e2e harness' (#23) from p5-oss-readiness into main
Reviewed-on: #23
2026-05-08 21:22:38 +00:00
steve 41def51977 e2e: dispatch backup via source-group API
Per-host Run-backup is gone — the host_chrome partial still
renders the button but it's hard-disabled with a tooltip
pointing to per-source-group Run-now. The smoke test was
clicking that disabled button and waiting forever for a URL
change that would never happen.

Replace the navigation-based dispatch with two API calls:
create a source group covering the agent's /source mount,
then POST to /api/hosts/{id}/source-groups/{gid}/run. The
backup-status assertion at the end is unchanged — host record
is still the source of truth.
2026-05-08 22:16:57 +01:00
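For illustration, the run-dispatch call named above might look like this from a Go client. This is a sketch under several assumptions: IDs are treated as opaque strings, any 2xx response counts as success, and authentication is omitted. The helper names `runPath` and `dispatchRun` are hypothetical, the server in `main` is a local stub rather than the real application, and the source-group creation call is left out because the log does not give its endpoint.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// runPath builds the per-source-group run endpoint quoted in the commit message.
func runPath(hostID, groupID string) string {
	return "/api/hosts/" + url.PathEscape(hostID) +
		"/source-groups/" + url.PathEscape(groupID) + "/run"
}

// dispatchRun POSTs to the run endpoint and treats any 2xx status as success.
// Assumption: the real API may require auth headers or a request body.
func dispatchRun(client *http.Client, baseURL, hostID, groupID string) error {
	resp, err := client.Post(baseURL+runPath(hostID, groupID), "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("run dispatch failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Stub server standing in for the real one; it accepts only POST on the run path.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != runPath("h1", "g1") {
			http.NotFound(w, r)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	}))
	defer srv.Close()

	fmt.Println("dispatch error:", dispatchRun(srv.Client(), srv.URL, "h1", "g1"))
}
```

Dispatching through the API avoids depending on a rendered button's enabled state, which is the problem the commit describes with the navigation-based approach.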
steve b9439da467 api: expose host.repo_status in /api/hosts JSON
The dashboard renders init_running / init_failed / ready state
based on host.repo_status, but the JSON endpoint dropped the
field on its way out. The e2e test couldn't poll for repo
readiness; reflect the same projection the UI uses.
2026-05-08 22:06:22 +01:00
steve 5925d09e8b e2e: wait for repo_status=ready and bump test timeout
Two issues uncovered by the page-snapshot dump after the agent
state-dir fix:

* The host page server-renders `Run backup now` as disabled
  while repo_status != ready, and the page has no live-refresh
  on that field. The test was navigating right after status
  flipped to 'online' but before auto-init had completed (~3s
  later), so the rendered HTML still showed init_running and
  the click was a no-op. Wait for repo_status === 'ready'
  before navigating.

* playwright.config.ts pinned the per-test timeout at 60s,
  but the test itself uses 60s + 120s of internal waits.
  Bump to 240s so the test fails on real regressions instead
  of timing out on its own internal budget.

Renamed the test description away from "under a minute" since
it overpromises against the new timeout. The performance SLO
belongs in a separate test if we want to assert it.
2026-05-08 22:00:24 +01:00
steve cc6844605f e2e: fix agent state-dir to /var/lib/restic-manager
The agent writes its encrypted secrets blob to
$DefaultSecretsPath (/var/lib/restic-manager/secrets.enc) but
the e2e fixtures created and mounted a directory at
/var/lib/restic-manager-agent — name mismatch. Result: every
`config.update` push failed with 'create tmp: no such file or
directory', the auto-init never got the repo creds, the host
landed in init_failed, and the smoke test couldn't kick off a
backup (the Run backup button is disabled while
repo_status != ready).

Align the compose volume mount and the Dockerfile mkdir on
/var/lib/restic-manager so they match the production install
script + the agent's own default.
2026-05-08 21:53:35 +01:00
steve 4cd36d83e3 ui: show pending-hosts panel even when fleet is otherwise empty
The dashboard's empty-state ("No hosts yet.") was gated on
HostCount == 0 alone, which hid the pending-hosts panel — and
the inline accept form — for the most common first-run scenario:
operator just installed an agent that announced, the fleet has
zero accepted hosts, and the only thing the operator needs to do
is review fingerprint + click Accept.

Tighten the gate so the empty state only shows when there are
truly zero hosts and zero pending announces. With a pending
host, fall through to the regular dashboard layout so the
approval queue is visible and actionable.

Caught by the e2e enrol-via-announce smoke test (now unblocked
on PR #23).
2026-05-08 21:47:31 +01:00
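The tightened gate described above amounts to a two-part condition. A minimal sketch, assuming a view model with a pending-announce count: `HostCount` is named in the commit, while `dashboardData`, `PendingCount`, and `showEmptyState` are hypothetical names.

```go
package main

import "fmt"

// dashboardData is a hypothetical view model for the dashboard template.
type dashboardData struct {
	HostCount    int // accepted hosts (named in the commit message)
	PendingCount int // pending announces (assumed field name)
}

// showEmptyState reports whether the "No hosts yet." empty state should render.
// It is true only when there are no accepted hosts and nothing awaiting approval.
func showEmptyState(d dashboardData) bool {
	return d.HostCount == 0 && d.PendingCount == 0
}

func main() {
	fmt.Println(showEmptyState(dashboardData{HostCount: 0, PendingCount: 0})) // truly empty fleet
	fmt.Println(showEmptyState(dashboardData{HostCount: 0, PendingCount: 1})) // first agent just announced
}
```

With a pending host present the function returns false, so the regular layout renders and the approval queue stays visible.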