restic-manager

Author	SHA1	Message	Date
steve	6c6b962e24	Merge pull request 'De-flake TestDrainPendingSerializesPerHost (CI stability)' (#33) from fix-flaky-server-http-tests into main Reviewed-on: #33	2026-06-16 15:44:47 +01:00
steve	e64075d5d7	test(pending-drain): de-flake TestDrainPendingSerializesPerHost Keep the test WS client actively reading (a real agent always is) so the server-side conn stays registered under parallel load, and drain to completion via condition polling instead of asserting one-shot completeness. The conn could be dropped/unregistered under CI load, making DrainPending correctly no-op (conn==nil) and the test observe a partial/empty drain. -race confirms no production data race; the exactly-5-jobs assertion (proving the per-host mutex blocks double-dispatch) is unchanged. Verified: 0 failures over 25 loaded runs + 4 -race iterations.	2026-06-16 13:29:47 +01:00
steve	0f5110f3d9	Merge pull request 'Release v1.1.0 — CHANGELOG' (#32) from release-v1.1.0 into main (tag: v1.1.0)	2026-06-16 07:32:00 +01:00
steve	0fbacf9f98	docs(changelog): v1.1.0 (always-on host mode) + retroactive v1.0.1	2026-06-15 23:07:43 +01:00
steve	d8fd4110b0	Merge pull request 'Always-On vs intermittent host mode (laptops): suppress offline noise, catch up missed backups' (#31) from feat-laptop-host-mode into main Reviewed-on: #31	2026-06-15 23:01:03 +01:00
steve	e17932d797	Merge branch 'main' into feat-laptop-host-mode	2026-06-15 23:00:56 +01:00
steve	39030a3bbe	ui(host header): boxed tags/presence pills, click-to-edit, simplified out-of-date chip	2026-06-15 22:58:38 +01:00
steve	a30f824a3c	Merge pull request 'Tidy: fix stale-dated sparkline test + gitignore agent worktrees' (#30) from tidy-sparkline-test-and-gitignore into main Reviewed-on: #30	2026-06-15 22:32:53 +01:00
steve	239d55b65b	test(dashboard): use relative dates so sparkline test doesn't age out of the 30-day window	2026-06-15 22:15:07 +01:00
steve	74e5b75380	chore: gitignore .claude/worktrees (transient agent worktrees)	2026-06-15 22:14:36 +01:00
steve	9371b7b777	fix(catchup): guard on real in-flight backup check; add scheduler tests	2026-06-15 21:45:01 +01:00
steve	10b2518323	docs(tasks): record NS-08 always-on/intermittent host mode	2026-06-15 21:30:23 +01:00
steve	6694dfdc3a	fix(ui): rebuild CSS bundle so dot-asleep ships to the browser	2026-06-15 21:27:33 +01:00
steve	f88f2cc1f2	feat(ui): asleep state, 24×7 chip, presence toggle for host mode	2026-06-15 21:22:42 +01:00
steve	1a07fbb217	feat(http): host mode toggle handler + route (host.mode_updated)	2026-06-15 21:17:57 +01:00
steve	9e6524788f	refactor(alert): refresh stale_schedule docs; log tick schedule errors; add mode-change + never-backed-up tests	2026-06-15 21:15:35 +01:00
steve	25c55e5e4d	feat(alert): suppress offline + add staleness alert for intermittent hosts	2026-06-15 21:09:39 +01:00
steve	e408de9610	refactor(catchup): drop dead nil-guard; document per-host baseline limitation	2026-06-15 21:06:37 +01:00
steve	5c4e0275d9	feat(catchup): arm on hello, fire missed-window backups on tick	2026-06-15 21:02:04 +01:00
steve	7aaafceab5	feat(catchup): scheduleOverdue helper for missed-window detection	2026-06-15 20:58:17 +01:00
steve	4c9641b6ed	fix(store): SetHostAlwaysOn returns ErrNotFound; test agent-token lookup path	2026-06-15 20:56:59 +01:00
steve	ff65d39f25	feat(store): add hosts.always_on flag (default on)	2026-06-15 20:53:13 +01:00
steve	9d16e3f7e3	docs(plan): always-on vs intermittent host mode implementation plan	2026-06-15 20:48:16 +01:00
steve	261b83ec26	docs(spec): clarify staleness vs job-failure alerting for asleep hosts	2026-06-15 20:42:00 +01:00
steve	0c3a0844e4	docs(spec): always-on vs intermittent host mode design	2026-06-15 20:37:45 +01:00
steve	2dae61f678	Merge pull request 'fix(ui): tick relative timestamps client-side so long-open tabs don't go stale' (#29) from fix-stale-reltime into main Reviewed-on: #29	2026-06-15 20:19:59 +01:00
steve	55cb8909c7	docs(tasks): record NS-07 client-side relTime ticker fix	2026-06-15 20:19:32 +01:00
steve	06748f5582	Merge pull request 'ui(relTime): tick relative timestamps client-side' (#28) from fix-stale-reltime into main Reviewed-on: #28 (tag: v1.0.1)	2026-05-15 20:14:08 +00:00
steve	a4d705db6b	Merge branch 'main' into fix-stale-reltime	2026-05-15 20:05:45 +00:00
steve	c6f73f790d	ci: pull ci-runner-go from zot registry	2026-05-15 19:51:02 +00:00
steve	068f08d96d	ci: migrate release workflow to zot registry	2026-05-15 19:50:50 +00:00
steve	28ef9750d3	ui(relTime): tick relative timestamps client-side so long-open tabs don't freeze formatRelTime now wraps its label in <time data-rel-ts=...>, and both layouts include a small ticker that re-renders every 30s. Without this, a job-detail page rendered an hour ago kept showing '2h ago' when the wall-clock truth was '3h ago'.	2026-05-10 07:37:03 +01:00
steve	f4db0b17e8	Merge pull request 'fix(version): single-source internal/version, fix dockerfile ldflags' (#27) from fix-version-ldflags into main	2026-05-09 14:26:50 +00:00
steve	8afda7cd8c	fix(version): use internal/version as single source for build constants The Dockerfile only set `-X main.version=...`, so docker-built binaries left `internal/version.Version` at its default "dev". The update logic (host_update.go:61, hosts.go:94, fleet_update.go:101 et al.) compares against `internal/version.Version`, so a v1.0.0 host always looked out-of-date to a v1.0.0 server, the chip never cleared, and pressing "update" re-downloaded the same bundled binary on a loop. Collapse the two version sources: drop the `var version/commit/date` locals in cmd/{server,agent}/main.go, route everything through internal/version (now also carrying Date), and have both the Dockerfile and the Makefile set the same single set of -X flags. Verified end-to-end: make build and docker build both emit binaries whose --version reflects the build VERSION.	2026-05-09 15:20:13 +01:00
steve	123e4f4915	scrub: remove docs/superpowers and ask.md; gitignore them These were never meant for the public repo. Wiped from history in the same change set via git-filter-repo.	2026-05-09 14:23:29 +01:00
steve	7b035a8f09	Merge pull request 'v1 readiness: CHANGELOG + threat model + first-run onboarding polish' (#26) from v1-readiness into main Reviewed-on: #26 (tag: v1.0.0)	2026-05-09 11:52:33 +00:00
steve	7a813cacd3	first-run: keep 'bootstrap token' phrase so e2e log-scraper still matches The CI e2e workflow greps for 'bootstrap token' in server logs to capture the one-shot token. The earlier reword dropped that phrase; restore it on the headless-instructions line so .gitea/workflows/e2e.yml step 'Capture bootstrap token from server logs' keeps matching.	2026-05-09 12:49:40 +01:00
steve	1d36dcd668	v1 readiness: CHANGELOG + threat model + first-run onboarding polish - CHANGELOG.md: Keep-a-Changelog format, v1.0.0 entry summarising what each phase delivered. - docs/threat-model.md: structured walkthrough of assets, actors, attack surfaces and residual risks; reviewed against v1.0.0. - cmd/server/main.go: at first-run startup, print a clickable $RM_BASE_URL/bootstrap URL alongside the existing one-shot bootstrap token (or a fallback hint when RM_BASE_URL is unset). - web/templates/pages/bootstrap.html: visible "Minimum 12 characters" hint under the password field so the rule is communicated before the operator submits. - tasks.md: close X-01, X-04, X-05 with notes.	2026-05-09 12:29:00 +01:00
steve	755840d9ff	Merge pull request 'docs: AI-agent host onboarding guide' (#25) from temp-onboarding into main Reviewed-on: #25	2026-05-09 11:22:54 +00:00
steve	cc638f6456	Added new AI focused document for host onboarding	2026-05-09 12:18:42 +01:00
steve	e046be98b2	Merge pull request 'Cleanup: NS-05/NS-06 + drop dead /repos nav link' (#24) from ns-05-06-cleanup into main Reviewed-on: #24	2026-05-09 11:11:36 +00:00
steve	a9c47deb26	nav: drop dead /repos top-level link (repos are per-host, accessed via host sub-tab)	2026-05-09 11:59:08 +01:00
steve	8a7706407d	tasks: close NS-05 (setup-go already gone) + NS-06 (drop Run-backup tombstone button)	2026-05-09 11:55:21 +01:00
steve	3101024d1a	tasks: queue NS-05 (drop setup-go) + NS-06 (drop disabled Run-backup button) Two small follow-ups noted while working through the p5-oss-readiness CI-runner switch: * NS-05 — actions/setup-go is now redundant; ci-runner-go ships Go on PATH and re-downloading on every job costs ~5s a shard. * NS-06 — host_chrome's per-host "Run backup now" button is a permanently-disabled tombstone; remove it so the chrome stops advertising an action that no longer exists.	2026-05-08 22:26:59 +01:00
steve	7f98524cfa	Merge pull request 'P5: OSS readiness — docs site, contributor onboarding, e2e harness' (#23) from p5-oss-readiness into main Reviewed-on: #23	2026-05-08 21:22:38 +00:00
steve	41def51977	e2e: dispatch backup via source-group API Per-host Run-backup is gone — the host_chrome partial still renders the button but it's hard-disabled with a tooltip pointing to per-source-group Run-now. The smoke test was clicking that disabled button and waiting forever for a URL change that would never happen. Replace the navigation-based dispatch with two API calls: create a source group covering the agent's /source mount, then POST to /api/hosts/{id}/source-groups/{gid}/run. The backup-status assertion at the end is unchanged — host record is still the source of truth.	2026-05-08 22:16:57 +01:00
steve	b9439da467	api: expose host.repo_status in /api/hosts JSON The dashboard renders init_running / init_failed / ready state based on host.repo_status, but the JSON endpoint dropped the field on its way out. The e2e test couldn't poll for repo readiness; reflect the same projection the UI uses.	2026-05-08 22:06:22 +01:00
steve	5925d09e8b	e2e: wait for repo_status=ready and bump test timeout Two issues uncovered by the page-snapshot dump after the agent state-dir fix: * The host page server-renders `Run backup now` as disabled while repo_status != ready, and the page has no live-refresh on that field. The test was navigating right after status flipped to 'online' but before auto-init had completed (~3s later), so the rendered HTML still showed init_running and the click was a no-op. Wait for repo_status === 'ready' before navigating. * playwright.config.ts pinned the per-test timeout at 60s, but the test itself uses 60s + 120s of internal waits. Bump to 240s so the test fails on real regressions instead of timing out on its own internal budget. Renamed the test description away from "under a minute" since it overpromises against the new timeout. The performance SLO belongs in a separate test if we want to assert it.	2026-05-08 22:00:24 +01:00
steve	cc6844605f	e2e: fix agent state-dir to /var/lib/restic-manager The agent writes its encrypted secrets blob to $DefaultSecretsPath (/var/lib/restic-manager/secrets.enc) but the e2e fixtures created and mounted a directory at /var/lib/restic-manager-agent — name mismatch. Result: every `config.update` push failed with 'create tmp: no such file or directory', the auto-init never got the repo creds, the host landed in init_failed, and the smoke test couldn't kick off a backup (the Run backup button is disabled while repo_status != ready). Align the compose volume mount and the Dockerfile mkdir on /var/lib/restic-manager so they match the production install script + the agent's own default.	2026-05-08 21:53:35 +01:00
steve	4cd36d83e3	ui: show pending-hosts panel even when fleet is otherwise empty The dashboard's empty-state ("No hosts yet.") was gated on HostCount == 0 alone, which hid the pending-hosts panel — and the inline accept form — for the most common first-run scenario: operator just installed an agent that announced, the fleet has zero accepted hosts, and the only thing the operator needs to do is review fingerprint + click Accept. Tighten the gate so the empty state only shows when there are truly zero hosts and zero pending announces. With a pending host, fall through to the regular dashboard layout so the approval queue is visible and actionable. Caught by the e2e enrol-via-announce smoke test (now unblocked on PR #23).	2026-05-08 21:47:31 +01:00
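
Commit 4cd36d83e3 tightens the dashboard's empty-state gate so a pending announce keeps the approval queue visible. A minimal Go sketch of that condition, under stated assumptions: `HostCount` is named in the commit message, while `dashboardView`, `PendingCount`, and `showEmptyState` are hypothetical names used only for illustration and are not taken from the repository.

```go
package main

import "fmt"

// dashboardView is a hypothetical view model for the dashboard page.
// HostCount comes from the commit message; PendingCount is an assumed
// name for the number of pending (not yet accepted) host announces.
type dashboardView struct {
	HostCount    int
	PendingCount int
}

// showEmptyState reports whether the "No hosts yet." empty state should
// render. Before the fix the gate was HostCount == 0 alone, which hid the
// pending-hosts panel on first run; the tightened gate also requires that
// no announces are waiting for approval.
func showEmptyState(v dashboardView) bool {
	return v.HostCount == 0 && v.PendingCount == 0
}

func main() {
	// First-run scenario: one agent has announced, nothing accepted yet.
	firstRun := dashboardView{HostCount: 0, PendingCount: 1}
	fmt.Println(showEmptyState(firstRun)) // false: fall through to the regular layout

	// Truly empty fleet: no hosts and no pending announces.
	fmt.Println(showEmptyState(dashboardView{})) // true
}
```

With the old gate the first case would have returned true and hidden the inline accept form, which is the behaviour the commit describes.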


331 Commits