Keep the test WS client actively reading (a real agent always is) so
the server-side conn stays registered under parallel load, and drain to
completion via condition polling instead of asserting one-shot
completeness. Previously the conn could be dropped/unregistered under CI
load, so DrainPending correctly no-oped (conn==nil) and the test observed
a partial/empty drain. -race confirms no production data race; the
exactly-5-jobs assertion (proving the per-host mutex blocks
double-dispatch) is unchanged. Verified: 0 failures over 25 loaded runs
+ 4 -race iterations.
formatRelTime now wraps its label in <time data-rel-ts=...>, and
both layouts include a small ticker that re-renders every 30s.
Without this, a job-detail page rendered an hour ago kept showing
'2h ago' when the wall-clock truth was '3h ago'.
The Dockerfile only set `-X main.version=...`, so docker-built binaries
left `internal/version.Version` at its default "dev". The update logic
(host_update.go:61, hosts.go:94, fleet_update.go:101 et al.) compares
against `internal/version.Version`, so a v1.0.0 host always looked
out-of-date to a v1.0.0 server, the chip never cleared, and pressing
"update" re-downloaded the same bundled binary on a loop.
Collapse the two version sources: drop the `var version/commit/date`
locals in cmd/{server,agent}/main.go, route everything through
internal/version (now also carrying Date), and have both the Dockerfile
and the Makefile set the same single set of -X flags. Verified
end-to-end: make build and docker build both emit binaries whose
--version reflects the build VERSION.
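
A single-file sketch of the one-source-of-truth idea. In the real tree the variables would live in internal/version; they are declared in package main here only so the example runs on its own. The "none" and "unknown" defaults and the output format are assumptions; only the "dev" default for Version comes from the text above.

```go
package main

import "fmt"

// Link-time overridable build metadata. With this single-file layout the
// linker flag would be, for example:
//
//	go build -ldflags "-X main.Version=v1.0.0" .
//
// In the real layout the -X path would instead name the
// internal/version package under the module path.
var (
	Version = "dev"
	Commit  = "none"    // assumed default
	Date    = "unknown" // assumed default
)

// versionString is what a --version flag could print (hypothetical format).
func versionString() string {
	return fmt.Sprintf("%s (commit %s, built %s)", Version, Commit, Date)
}

func main() {
	fmt.Println(versionString())
}
```

With only one set of variables, every build path that sets the same -X flags yields the same value for both --version and the update comparison.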
The CI e2e workflow greps for 'bootstrap token' in server logs to capture
the one-shot token. The earlier reword dropped that phrase; restore it on
the headless-instructions line so .gitea/workflows/e2e.yml step 'Capture
bootstrap token from server logs' keeps matching.
- CHANGELOG.md: Keep-a-Changelog format, v1.0.0 entry summarising
  what each phase delivered.
- docs/threat-model.md: structured walkthrough of assets, actors,
  attack surfaces and residual risks; reviewed against v1.0.0.
- cmd/server/main.go: at first-run startup, print a clickable
  $RM_BASE_URL/bootstrap URL alongside the existing one-shot
  bootstrap token (or a fallback hint when RM_BASE_URL is unset).
- web/templates/pages/bootstrap.html: visible "Minimum 12 characters"
  hint under the password field so the rule is communicated
  before the operator submits.
- tasks.md: close X-01, X-04, X-05 with notes.
Two small follow-ups noted while working through the
p5-oss-readiness CI-runner switch:
* NS-05: actions/setup-go is now redundant; ci-runner-go ships
  Go on PATH and re-downloading on every job costs ~5s a shard.
* NS-06: host_chrome's per-host "Run backup now" button is a
  permanently-disabled tombstone; remove it so the chrome stops
  advertising an action that no longer exists.
Per-host Run-backup is gone — the host_chrome partial still
renders the button but it's hard-disabled with a tooltip
pointing to per-source-group Run-now. The smoke test was
clicking that disabled button and waiting forever for a URL
change that would never happen.
Replace the navigation-based dispatch with two API calls:
create a source group covering the agent's /source mount,
then POST to /api/hosts/{id}/source-groups/{gid}/run. The
backup-status assertion at the end is unchanged: the host record
is still the source of truth.
The dashboard renders init_running / init_failed / ready state
based on host.repo_status, but the JSON endpoint dropped the
field on its way out, so the e2e test couldn't poll for repo
readiness. Expose repo_status in the JSON response using the
same projection the UI uses.
Two issues uncovered by the page-snapshot dump after the agent
state-dir fix:
* The host page server-renders `Run backup now` as disabled
  while repo_status != ready, and the page has no live-refresh
  on that field. The test was navigating right after status
  flipped to 'online' but before auto-init had completed (~3s
  later), so the rendered HTML still showed init_running and
  the click was a no-op. Wait for repo_status === 'ready'
  before navigating.
* playwright.config.ts pinned the per-test timeout at 60s,
  but the test itself uses 60s + 120s (180s total) of internal
  waits. Bump to 240s so the test fails on real regressions
  instead of timing out on its own internal budget.
Renamed the test description away from "under a minute" since
it overpromises against the new timeout. The performance SLO
belongs in a separate test if we want to assert it.
The agent writes its encrypted secrets blob to
$DefaultSecretsPath (/var/lib/restic-manager/secrets.enc) but
the e2e fixtures created and mounted a directory at
/var/lib/restic-manager-agent — name mismatch. Result: every
`config.update` push failed with 'create tmp: no such file or
directory', the auto-init never got the repo creds, the host
landed in init_failed, and the smoke test couldn't kick off a
backup (the Run backup button is disabled while
repo_status != ready).
Align the compose volume mount and the Dockerfile mkdir on
/var/lib/restic-manager so they match the production install
script and the agent's own default.
The dashboard's empty-state ("No hosts yet.") was gated on
HostCount == 0 alone, which hid the pending-hosts panel (and
the inline accept form) in the most common first-run scenario:
the operator has just installed an agent that has announced
itself, the fleet has zero accepted hosts, and all the operator
needs to do is review the fingerprint and click Accept.
Tighten the gate so the empty state only shows when there are
truly zero hosts and zero pending announces. With a pending
host, fall through to the regular dashboard layout so the
approval queue is visible and actionable.
Caught by the e2e enrol-via-announce smoke test (now unblocked
on PR #23).
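
The tightened gate can be sketched as a small predicate. The function and parameter names are assumptions; only the HostCount == 0 condition and the pending-announce check come from the description above.

```go
package main

import "fmt"

// showEmptyState reports whether the "No hosts yet." empty state should
// render. It requires both zero accepted hosts and zero pending announces,
// so a waiting approval is never hidden behind the empty state.
func showEmptyState(hostCount, pendingCount int) bool {
	return hostCount == 0 && pendingCount == 0
}

func main() {
	fmt.Println(showEmptyState(0, 0)) // fresh install, nothing announced
	fmt.Println(showEmptyState(0, 1)) // first agent announced, awaiting accept
	fmt.Println(showEmptyState(3, 0)) // established fleet
}
```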