restic-manager

Author	SHA1	Message	Date
steve	8ef681f3f4	tasks: queue NS-05 (drop setup-go) + NS-06 (drop disabled Run-backup button) Two small follow-ups noted while working through the p5-oss-readiness CI-runner switch: * NS-05 — actions/setup-go is now redundant; ci-runner-go ships Go on PATH and re-downloading on every job costs ~5s a shard. * NS-06 — host_chrome's per-host "Run backup now" button is a permanently-disabled tombstone; remove it so the chrome stops advertising an action that no longer exists.	2026-05-08 22:26:59 +01:00
steve	7be2e4c5b0	Merge pull request 'P5: OSS readiness — docs site, contributor onboarding, e2e harness' (#23) from p5-oss-readiness into main Reviewed-on: #23	2026-05-08 21:22:38 +00:00
steve	ea9941b9ec	e2e: dispatch backup via source-group API Per-host Run-backup is gone — the host_chrome partial still renders the button but it's hard-disabled with a tooltip pointing to per-source-group Run-now. The smoke test was clicking that disabled button and waiting forever for a URL change that would never happen. Replace the navigation-based dispatch with two API calls: create a source group covering the agent's /source mount, then POST to /api/hosts/{id}/source-groups/{gid}/run. The backup-status assertion at the end is unchanged — host record is still the source of truth.	2026-05-08 22:16:57 +01:00
steve	130b68226e	api: expose host.repo_status in /api/hosts JSON The dashboard renders init_running / init_failed / ready state based on host.repo_status, but the JSON endpoint dropped the field on its way out. The e2e test couldn't poll for repo readiness; reflect the same projection the UI uses.	2026-05-08 22:06:22 +01:00
steve	ccd7c2f2fd	e2e: wait for repo_status=ready and bump test timeout Two issues uncovered by the page-snapshot dump after the agent state-dir fix: * The host page server-renders `Run backup now` as disabled while repo_status != ready, and the page has no live-refresh on that field. The test was navigating right after status flipped to 'online' but before auto-init had completed (~3s later), so the rendered HTML still showed init_running and the click was a no-op. Wait for repo_status === 'ready' before navigating. * playwright.config.ts pinned the per-test timeout at 60s, but the test itself uses 60s + 120s of internal waits. Bump to 240s so the test fails on real regressions instead of timing out on its own internal budget. Renamed the test description away from "under a minute" since it overpromises against the new timeout. The performance SLO belongs in a separate test if we want to assert it.	2026-05-08 22:00:24 +01:00
steve	51fe1946b7	e2e: fix agent state-dir to /var/lib/restic-manager The agent writes its encrypted secrets blob to $DefaultSecretsPath (/var/lib/restic-manager/secrets.enc) but the e2e fixtures created and mounted a directory at /var/lib/restic-manager-agent — name mismatch. Result: every `config.update` push failed with 'create tmp: no such file or directory', the auto-init never got the repo creds, the host landed in init_failed, and the smoke test couldn't kick off a backup (the Run backup button is disabled while repo_status != ready). Align the compose volume mount and the Dockerfile mkdir on /var/lib/restic-manager so they match the production install script + the agent's own default.	2026-05-08 21:53:35 +01:00
steve	523ac4137a	ui: show pending-hosts panel even when fleet is otherwise empty The dashboard's empty-state ("No hosts yet.") was gated on HostCount == 0 alone, which hid the pending-hosts panel — and the inline accept form — for the most common first-run scenario: operator just installed an agent that announced, the fleet has zero accepted hosts, and the only thing the operator needs to do is review fingerprint + click Accept. Tighten the gate so the empty state only shows when there are truly zero hosts and zero pending announces. With a pending host, fall through to the regular dashboard layout so the approval queue is visible and actionable. Caught by the e2e enrol-via-announce smoke test (now unblocked on PR #23).	2026-05-08 21:47:31 +01:00
steve	74be681b4b	e2e: dump error-context.md to log on failure + bump upload-artifact The Playwright run produces error-context.md per failed test with a full DOM snapshot — useful for triaging UI test failures without round-tripping through downloaded artifacts. Cat it into the workflow log on failure. Also bump actions/upload-artifact v3 → v4. v3 uploads still return success on this Gitea runner but the artifacts don't surface through the API or UI; v4 is the correct version per the workflow header note.	2026-05-08 21:41:38 +01:00
steve	e14dd82f20	e2e: extract Playwright report via docker cp instead of bind mount When the runner job runs inside a container, compose's relative `./playwright/playwright-report` resolves to a path that exists only inside the runner container, so the host's docker daemon silently bind-mounts an empty dir and the report never lands anywhere we can read. Drop the bind mounts; keep the playwright container around (--name e2e-pw, no --rm); after the test, `docker cp` the report and traces out into the runner's workspace volume so upload-artifact has something real to upload. The new test-results directory (Playwright traces, screenshots, videos) is also included so failure post-mortem doesn't need a re-run.	2026-05-08 21:36:09 +01:00
steve	21567adb8e	runner tests: probe-exec setupScript to clear overlayfs ETXTBSY The original write-tmp-then-rename guard handles the ETXTBSY race on a vanilla filesystem, but inside the new ci-runner-go container our jobs land on overlayfs, which keeps a lagged "writable inode" view long enough to leak ETXTBSY into the exec the test does milliseconds later. After rename, probe-exec the file with a benign argument ("__rm_probe__" — every script's case statement falls through to a clean exit) until exec succeeds. Each script body is shaped `case "$1" in restore) ... ;; esac` so the probe is a no-op. 3s deadline keeps a stuck filesystem from hanging the suite.	2026-05-08 21:26:35 +01:00
steve	084ddd56ba	ci: force bash as default shell in container jobs When jobs run with `container:` set, Gitea Actions defaults to `sh -e` (dash on Ubuntu), so `set -euo pipefail` fails with "Illegal option -o pipefail". Pinning bash workflow-wide matches what the runner used pre-container and keeps existing scripts portable.	2026-05-08 21:10:33 +01:00
steve	dedc653256	ci: run jobs in ci-runner-go container Pin every job to gitea.dcglab.co.uk/steve/ci-runner-go:2026-05-08 so Go, Node, and Docker tooling are already installed when the job starts. Drops three actions/setup-go invocations from ci.yml (redundant — Go is on PATH) and inherits Buildx + Compose v2 in e2e.yml and release.yml without per-job apt-installs. Recipe lives in steve/ci. Bump the date pin in lockstep across the three workflows when picking up a fresher image (e.g. when the Go floor moves).	2026-05-08 21:06:38 +01:00
steve	60e9197c24	e2e: build playwright image with --profile test --pull Without --profile test, `docker compose build` skips the playwright service (profiles: [test]) and the image is built on-demand by `compose run` instead. Across CI runs the Gitea runner caches the resulting tag, so a Dockerfile FROM bump (v1.50.0 → v1.59.1) is masked by the cached image — the container ends up with old browser binaries and Playwright's own version-mismatch check fails the suite. Pull base images on every build so the FROM tag wins.	2026-05-08 20:15:21 +01:00
steve	a3f134bcd6	e2e: pin Playwright to 1.59.1 `@playwright/test` was loose-pinned to ^1.50.0; npm resolved it to 1.59.1 inside the runner image, which only ships browser binaries for 1.50.0. Pin both the package and the docker image to v1.59.1 so deps and binaries stay aligned.	2026-05-08 20:09:17 +01:00
steve	17b9ee08b7	e2e: run health probe + Playwright on the compose network Gitea's act-style runners execute workflow steps inside a runner container, so compose's host port-publish (127.0.0.1:8080:8080) is not reachable from the steps. PR #23's e2e job timed out waiting for the server even though the container was up and listening. Move both the health probe and the Playwright run onto rmnet so they address the server as http://server:8080: * health probe: docker run --rm --network e2e_rmnet curlimages/curl * Playwright: new mcr.microsoft.com/playwright-based image, added as a profile-gated `playwright` service in compose.e2e.yml, invoked via `docker compose run --rm playwright`. Drops the setup-node + npm install runner steps.	2026-05-08 20:08:23 +01:00
steve	89537d417a	P5: OSS readiness — docs site, contributor onboarding, e2e harness P5-01 — Documentation site under docs/book/ rendered with mdBook (downloaded via Makefile, same static-binary pattern as Tailwind). Structured chapters: getting started, concepts, operations, security, reference. `make docs` / `make docs-watch`. Generated output gitignored. P5-02 — CONTRIBUTING.md rewritten from placeholder to a full guide. CODE_OF_CONDUCT.md adapted from Contributor Covenant for a single-maintainer project. .gitea/issue_template/{bug,feature}.md and PULL_REQUEST_TEMPLATE.md. P5-04 — Six README screenshots captured live from a fresh server bootstrap (login, empty dashboard, add-host, alerts, settings, audit log). README rewritten to centre the screenshot grid and link out to the docs site. P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day default window), scope in/out, threat-model summary, operator hardening checklist. Mirrored as a docs-site chapter. P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up server + sibling Linux agent (alpine + restic) + restic/rest-server. Agent uses announce-and-approve so Playwright can drive the full operator flow: bootstrap → login → accept pending → backup → verify terminal status. Second spec scrapes /metrics to assert the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every PR; local how-to in docs/e2e.md.	2026-05-08 20:08:23 +01:00
steve	a252b25854	Merge pull request 'spec+plan: P6-04/05 prometheus /metrics + Grafana dashboard' (#22) from p6-04-05-prometheus-metrics into main Reviewed-on: #22	2026-05-08 18:31:57 +00:00
steve	73e733be61	P6-04+05: Prometheus /metrics endpoint + Grafana dashboard New internal/server/metrics package emits the legacy text/plain exposition format directly, so we don't pull in prometheus/client_golang. Endpoint is opt-in via RM_METRICS_TOKEN and/or RM_METRICS_TRUSTED_CIDR; route is not mounted at all if neither gate is set. Both gates ANDed when both configured. Per-host gauges (online, last_backup_*, repo_size_bytes, snapshot_count, open_alerts, repo_status), server gauges (hosts_total/online, active_alerts by severity, build_info), and an in-memory job-duration histogram observed from the existing MsgJobFinished branch in the WS handler. Docs in docs/prometheus.md (enable + scrape config + metric reference + dashboard import). Sample dashboard at deploy/grafana/restic-manager-dashboard.json - six panels, Grafana schema 39, single Prometheus datasource variable. Tests: golden render, concurrent observe, bucket boundaries in the metrics package; auth matrix (no auth -> 404, token gate, CIDR gate, both required) in the HTTP layer.	2026-05-07 23:17:15 +01:00
steve	70ff554402	spec+plan: P6-04/05 prometheus /metrics + Grafana dashboard	2026-05-07 23:07:30 +01:00
steve	39dcda4e9e	Merge pull request 'P6-03 repo size trend + agent-update UI fix + dashboard polish' (#21) from tidy-up-last-backup-projection into main Reviewed-on: #21	2026-05-07 22:00:03 +00:00
steve	1b9b23f205	smoke env: systemd --user unit + Make targets so the dev server outlives shell tool boundaries Spent half an evening fighting a smoke server that kept getting SIGTERM'd mid-iteration. Root cause: backgrounded processes spawned from sandboxed shell tool calls don't outlive the parent — even with nohup + disown. Fix: hand the server to user-systemd as a transient unit so its lifecycle is owned by the user's session, not by whichever bash subprocess started it. New Make targets: `make smoke-restart` (build server + (re)launch as systemd --user unit); `make smoke-status` (show unit status); `make smoke-logs` (tail $HOME/smoke/server.log); `make smoke-stop` (stop the unit); `make smoke-deploy` (full rebuild + restage agent assets + restart). Documents the workflow in CLAUDE.md so the next session doesn't relitigate.	2026-05-07 22:55:36 +01:00
steve	c4dc9e9119	ui+store: dashboard polish — repo size projection + header alignment - Project total_size_bytes onto hosts.repo_size_bytes inside the UpsertHostRepoStats transaction. The hosts row column has been unwritten since the initial schema in 0001, so the dashboard's Repo size cell has always rendered '—' even after backups. Now the column updates atomically alongside the host_repo_stats row, and FleetSummary's SUM(repo_size_bytes) becomes accurate too. - Right-align the Alerts column header so it sits over its right-aligned value (was floating left of column, ambiguous). - Add text-ink-mid to the 30d trend / Alerts / Tags headers so all column headers share the same brightness.	2026-05-07 22:55:21 +01:00
steve	7011510092	ui: chart polish — rotated y-axis labels, wider viewBox, single-day fallback - Add rotated 'Size' (left) and 'Snapshots' (right) axis titles in the chart's outer margins so the two y-axes are self-describing. - Bump the chart viewBox from 600x220 to 640x220 and lift padL from 56 to 72 so the rotated labels and byte tick numbers don't crowd. - Dedupe the X-axis labels for short windows (1 or 2 days collapsed the start/mid/end indices onto each other, stacking 'May 7' three times); the 1-day case now centres a single label, 2-day uses start+end only. - Pin a lone data dot to the chart centre instead of the left edge when len(days)==1, so it sits under the centred date label. Goldens regenerated.	2026-05-07 22:55:12 +01:00
steve	42eeabea9a	ui: per-host Jobs sub-tab; drop unused Settings stub Adds /hosts/{id}/jobs page listing recent jobs for the host (newest first, capped at 100) with click-through to /jobs/{id}. Converts the Jobs placeholder <div> to a real <a> nav link; removes the Settings stub entirely. Also registers durationHuman template func and a .jobs-row CSS grid to match the existing .schd-row idiom.	2026-05-07 22:49:10 +01:00
steve	7b390e9e5e	ws: synthesize job.finished from update watcher so browser stream wakes up	2026-05-07 20:32:48 +01:00
steve	afd15c6990	tasks: P6-03 done, repo size trend graphs	2026-05-07 19:20:05 +01:00
steve	2562b2c7b5	test: assert Trend panel renders on full repo page	2026-05-07 19:14:34 +01:00
steve	8be551349c	ui: trend panel + range selector on host repo page	2026-05-07 19:10:59 +01:00
steve	a48df77f40	ui: 30d repo-size sparkline on every dashboard host row	2026-05-07 19:02:35 +01:00
steve	70769f0841	web/sparkline: guard days[i] against shorter days slice in RenderChart	2026-05-07 18:58:33 +01:00
steve	ea74965830	web/sparkline: two-axis trend chart with hover dots	2026-05-07 18:55:31 +01:00
steve	9c209a952e	web/sparkline: inline-SVG sparkline renderer (empty / single / multi)	2026-05-07 18:50:23 +01:00
steve	871490b9d4	ws: record daily repo stats history alongside current upsert	2026-05-07 18:46:26 +01:00
steve	d317d2e561	store: history table helpers (upsert/list, COALESCE preserves prior values)	2026-05-07 18:43:20 +01:00
steve	00bfef0aee	store: migration 0023 host_repo_stats_history	2026-05-07 18:39:44 +01:00
steve	363bdff85b	plan: P6-03 repo size trend implementation	2026-05-07 18:15:06 +01:00
steve	20425b3360	spec: P6-03 repo size trend (sparkline + chart) design	2026-05-07 18:09:25 +01:00
steve	9c098e773b	Merge pull request 'tidy: project finished backup jobs onto host row + smoke doc tweaks' (#20) from tidy-up-last-backup-projection into main Reviewed-on: #20	2026-05-07 16:58:16 +00:00
steve	711d5e964c	fix: project finished backup jobs onto host row + smoke path tweaks The dashboard's 'Last backup' column reads hosts.last_backup_at / last_backup_status, but the WS handler only updated hosts.repo_status on job.finished — backup terminations were silently dropped. Add a SetHostLastBackup store method and call it from the same job.finished switch that already handles init jobs. Also: CLAUDE.md restage block uses /tmp/rm-smoke (the original default) but the actual dev env runs out of $HOME/smoke. Update the paths in the doc to match.	2026-05-07 17:55:23 +01:00
steve	39657355be	Merge pull request 'P6-01 + P6-02: agent self-update + fleet update' (#19) from p6-agent-self-update into main Reviewed-on: #19	2026-05-07 16:49:25 +00:00
steve	0bd075c2a3	tasks: mark P6-01 + P6-02 done with as-shipped block	2026-05-06 22:33:33 +01:00
steve	83d97a27cc	agent unit: allow writes to /usr/local/bin for self-update Smoke caught this: ProtectSystem=full mounts /usr read-only so the agent couldn't write its own .new staging file or atomic-rename over the running binary. Adding /usr/local/bin to ReadWritePaths is the minimum diff that lets self-update work; the whole-dir grant is required because os.Rename needs write on the parent directory.	2026-05-06 22:32:50 +01:00
steve	ccaccd840a	ui: dashboard hosts-behind tile + filter - Add ?updates=behind query filter and the matching dashboardFilter field; round-trips through encode/parse. - Compute UpdatesBehind on the dashboard view-model (online + version trailing the server) and surface as an amber hero tile that links to the filtered list. - Test exercise covering the new filter case.	2026-05-06 22:20:54 +01:00
steve	94441a5371	ui: update chip + per-host button - Surface UpdateAvailable + TargetVersion on the dashboard host row, the host_chrome header, and the JSON Host shape. - New host_update_chip partial renders an amber out-of-date pill next to the agent-version display when the host's agent trails the server. - Host detail right-rail gains an admin-only Update agent button (disabled when host is offline or already updating). - New .update-chip and .btn-amber CSS tokens; tailwind output refreshed.	2026-05-06 22:20:40 +01:00
steve	3fa7be51a5	ui: fleet update page + endpoints - POST /api/fleet/update, POST /api/fleet-updates/{id}/cancel, GET /api/fleet-updates/{id} (admin-only). - GET /settings/fleet-update + /partial for htmx polling. - Renders idle / running / terminal states with per-host progress. - Tests cover happy path, derive-host-ids, conflict, cancel, get, and RBAC.	2026-05-06 22:20:03 +01:00
steve	6fd2a2ff77	p6-01/02: agent self-update + fleet update server cluster - alert: update_failed (per-host, dedup=hostID) + fleet_update_halted (system-scoped, host_id NULL via new RaiseOrTouchSystem helper). - ws: UpdateWatcher tracks in-flight command.update dispatches and reconciles them against incoming hello envelopes — success path marks the job succeeded and auto-resolves the alert; 90s timeout marks the job failed and raises update_failed. - http: POST /api/hosts/{id}/update (admin-only JSON) + the HTMX /hosts/{id}/update form variant. Pre-checks: host exists, online, agent_version != current, no running update job. Refactored core into Server.dispatchHostUpdate so the fleet worker can share it without going through HTTP. - fleetupdate: rolling worker iterating through host slots, halting on first failure and raising fleet_update_halted. Polling-based version-match (re-read hosts.agent_version every 1s up to 95s) — no extra plumbing into the WS hello path. At-most-one-running is enforced at the store layer (ErrFleetUpdateRunning). - cmd/server: wire UpdateWatcher and FleetWorker into the main goroutine; the worker uses a small serverDispatcher adapter that delegates back into Server.DispatchHostUpdate. Tests: watcher (success/timeout/mismatch/late-hello), HTTP endpoint (happy + four pre-check branches + RBAC), worker (two-host happy, timeout-halt, host-offline-halt, already-at-target skip, cancel mid-run, double-Start guard).	2026-05-06 22:03:50 +01:00
steve	d413896302	store: migrations 0021+0022 + fleet_updates CRUD	2026-05-06 21:47:54 +01:00
steve	74cf24c28b	agent: command.update handler + updater package (Linux + Windows)	2026-05-06 21:42:50 +01:00
steve	22bcf69e6c	http: expose GET /api/version	2026-05-06 21:39:13 +01:00
steve	fe1ed49977	version: build-time version package + Makefile ldflags wiring	2026-05-06 21:38:35 +01:00


303 Commits