P5: OSS readiness — docs site, contributor onboarding, e2e harness #23

Merged
steve merged 14 commits from p5-oss-readiness into main 2026-05-08 22:22:39 +01:00
Owner

P5-01 — Documentation site under docs/book/ rendered with mdBook
(downloaded via Makefile, same static-binary pattern as Tailwind).
Structured chapters: getting started, concepts, operations,
security, reference. make docs / make docs-watch. Generated
output gitignored.

P5-02 — CONTRIBUTING.md rewritten from placeholder to a full
guide. CODE_OF_CONDUCT.md adapted from Contributor Covenant for a
single-maintainer project. .gitea/issue_template/{bug,feature}.md
and PULL_REQUEST_TEMPLATE.md.

P5-04 — Six README screenshots captured live from a fresh server
bootstrap (login, empty dashboard, add-host, alerts, settings,
audit log). README rewritten to centre the screenshot grid and
link out to the docs site.

P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day
default window), scope in/out, threat-model summary, operator
hardening checklist. Mirrored as a docs-site chapter.

P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up
server + sibling Linux agent (alpine + restic) + restic/rest-server.
Agent uses announce-and-approve so Playwright can drive the full
operator flow: bootstrap → login → accept pending → backup →
verify terminal status. Second spec scrapes /metrics to assert
the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every
PR; local how-to in docs/e2e.md.

P5-01 — Documentation site under docs/book/ rendered with mdBook (downloaded via Makefile, same static-binary pattern as Tailwind). Structured chapters: getting started, concepts, operations, security, reference. `make docs` / `make docs-watch`. Generated output gitignored. P5-02 — CONTRIBUTING.md rewritten from placeholder to a full guide. CODE_OF_CONDUCT.md adapted from Contributor Covenant for a single-maintainer project. .gitea/issue_template/{bug,feature}.md and PULL_REQUEST_TEMPLATE.md. P5-04 — Six README screenshots captured live from a fresh server bootstrap (login, empty dashboard, add-host, alerts, settings, audit log). README rewritten to centre the screenshot grid and link out to the docs site. P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day default window), scope in/out, threat-model summary, operator hardening checklist. Mirrored as a docs-site chapter. P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up server + sibling Linux agent (alpine + restic) + restic/rest-server. Agent uses announce-and-approve so Playwright can drive the full operator flow: bootstrap → login → accept pending → backup → verify terminal status. Second spec scrapes /metrics to assert the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every PR; local how-to in docs/e2e.md.
steve added 3 commits 2026-05-08 20:09:24 +01:00
P5-01 — Documentation site under docs/book/ rendered with mdBook
(downloaded via Makefile, same static-binary pattern as Tailwind).
Structured chapters: getting started, concepts, operations,
security, reference. `make docs` / `make docs-watch`. Generated
output gitignored.

P5-02 — CONTRIBUTING.md rewritten from placeholder to a full
guide. CODE_OF_CONDUCT.md adapted from Contributor Covenant for a
single-maintainer project. .gitea/issue_template/{bug,feature}.md
and PULL_REQUEST_TEMPLATE.md.

P5-04 — Six README screenshots captured live from a fresh server
bootstrap (login, empty dashboard, add-host, alerts, settings,
audit log). README rewritten to centre the screenshot grid and
link out to the docs site.

P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day
default window), scope in/out, threat-model summary, operator
hardening checklist. Mirrored as a docs-site chapter.

P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up
server + sibling Linux agent (alpine + restic) + restic/rest-server.
Agent uses announce-and-approve so Playwright can drive the full
operator flow: bootstrap → login → accept pending → backup →
verify terminal status. Second spec scrapes /metrics to assert
the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every
PR; local how-to in docs/e2e.md.
Gitea's act-style runners execute workflow steps inside a runner
container, so compose's host port-publish (127.0.0.1:8080:8080) is
not reachable from the steps. PR #23's e2e job timed out waiting
for the server even though the container was up and listening.

Move both the health probe and the Playwright run onto rmnet so
they address the server as http://server:8080:

* health probe: docker run --rm --network e2e_rmnet curlimages/curl
* Playwright: new mcr.microsoft.com/playwright-based image, added
  as a profile-gated `playwright` service in compose.e2e.yml,
  invoked via `docker compose run --rm playwright`. Drops the
  setup-node + npm install runner steps.
e2e: pin Playwright to 1.59.1
CI / Test (rest) (pull_request) Successful in 34s
CI / Test (store) (pull_request) Successful in 54s
CI / Lint (pull_request) Successful in 26s
CI / Build (windows/amd64) (pull_request) Successful in 26s
CI / Build (linux/amd64) (pull_request) Successful in 25s
CI / Build (linux/arm64) (pull_request) Successful in 25s
e2e / Playwright vs docker-compose (pull_request) Failing after 1m36s
CI / Test (server-http) (pull_request) Successful in 3m19s
a3f134bcd6
`@playwright/test` was loose-pinned to ^1.50.0; npm resolved it
to 1.59.1 inside the runner image, which only ships browser
binaries for 1.50.0. Pin both the package and the docker image
to v1.59.1 so deps and binaries stay aligned.
steve force-pushed p5-oss-readiness from 1af02f4495 to a3f134bcd6 2026-05-08 20:09:24 +01:00 Compare
steve added 1 commit 2026-05-08 20:15:24 +01:00
e2e: build playwright image with --profile test --pull
CI / Test (server-http) (pull_request) Successful in 22s
CI / Test (store) (pull_request) Successful in 22s
CI / Lint (pull_request) Successful in 27s
CI / Build (windows/amd64) (pull_request) Successful in 25s
CI / Build (linux/amd64) (pull_request) Successful in 25s
CI / Build (linux/arm64) (pull_request) Successful in 24s
CI / Test (rest) (pull_request) Successful in 1m19s
e2e / Playwright vs docker-compose (pull_request) Failing after 4m51s
60e9197c24
Without --profile test, `docker compose build` skips the
playwright service (profiles: [test]) and the image is built
on-demand by `compose run` instead. Across CI runs the Gitea
runner caches the resulting tag, so a Dockerfile FROM bump
(v1.50.0 → v1.59.1) is masked by the cached image — the
container ends up with old browser binaries and Playwright's
own version-mismatch check fails the suite. Pull base images
on every build so the FROM tag wins.
steve added 1 commit 2026-05-08 21:06:39 +01:00
ci: run jobs in ci-runner-go container
CI / Test (rest) (pull_request) Failing after 40s
CI / Test (store) (pull_request) Failing after 40s
CI / Lint (pull_request) Successful in 21s
CI / Build (windows/amd64) (pull_request) Successful in 26s
CI / Test (server-http) (pull_request) Failing after 1m19s
CI / Build (linux/amd64) (pull_request) Successful in 27s
CI / Build (linux/arm64) (pull_request) Successful in 27s
e2e / Playwright vs docker-compose (pull_request) Failing after 5m18s
dedc653256
Pin every job to gitea.dcglab.co.uk/steve/ci-runner-go:2026-05-08
so Go, Node, and Docker tooling are already installed when the
job starts. Drops three actions/setup-go invocations from ci.yml
(redundant — Go is on PATH) and inherits Buildx + Compose v2 in
e2e.yml and release.yml without per-job apt-installs.

Recipe lives in steve/ci. Bump the date pin in lockstep across
the three workflows when picking up a fresher image (e.g. when
the Go floor moves).
steve added 1 commit 2026-05-08 21:10:35 +01:00
ci: force bash as default shell in container jobs
CI / Test (rest) (pull_request) Failing after 56s
CI / Test (store) (pull_request) Successful in 37s
CI / Lint (pull_request) Successful in 17s
CI / Test (server-http) (pull_request) Successful in 2m0s
CI / Build (windows/amd64) (pull_request) Successful in 26s
CI / Build (linux/amd64) (pull_request) Successful in 28s
CI / Build (linux/arm64) (pull_request) Successful in 26s
e2e / Playwright vs docker-compose (pull_request) Failing after 3m47s
084ddd56ba
When jobs run with `container:` set, Gitea Actions defaults to
`sh -e` (dash on Ubuntu), so `set -euo pipefail` fails with
"Illegal option -o pipefail". Pinning bash workflow-wide
matches what the runner used pre-container and keeps existing
scripts portable.
steve force-pushed p5-oss-readiness from 7793767625 to 4f1ca2fed8 2026-05-08 21:26:24 +01:00 Compare
steve added 1 commit 2026-05-08 21:26:37 +01:00
runner tests: probe-exec setupScript to clear overlayfs ETXTBSY
CI / Test (rest) (pull_request) Successful in 7s
CI / Test (server-http) (pull_request) Successful in 1m37s
CI / Test (store) (pull_request) Successful in 5s
CI / Lint (pull_request) Successful in 21s
CI / Build (windows/amd64) (pull_request) Successful in 10s
CI / Build (linux/arm64) (pull_request) Successful in 9s
CI / Build (linux/amd64) (pull_request) Successful in 1m2s
e2e / Playwright vs docker-compose (pull_request) Failing after 5m0s
21567adb8e
The original write-tmp-then-rename guard handles the ETXTBSY race
on a vanilla filesystem, but inside the new ci-runner-go
container our jobs land on overlayfs, which keeps a lagged
"writable inode" view long enough to leak ETXTBSY into the
exec the test does milliseconds later.

After rename, probe-exec the file with a benign argument
("__rm_probe__" — every script's case statement falls through
to a clean exit) until exec succeeds. Each script body is shaped
`case "$1" in restore) ... ;; esac` so the probe is a no-op.
3s deadline keeps a stuck filesystem from hanging the suite.
steve force-pushed p5-oss-readiness from 4f1ca2fed8 to 21567adb8e 2026-05-08 21:26:37 +01:00 Compare
steve added 1 commit 2026-05-08 21:36:12 +01:00
e2e: extract Playwright report via docker cp instead of bind mount
CI / Test (server-http) (pull_request) Successful in 6s
CI / Test (store) (pull_request) Successful in 5s
CI / Build (windows/amd64) (pull_request) Successful in 7s
CI / Lint (pull_request) Successful in 19s
CI / Build (linux/amd64) (pull_request) Successful in 7s
CI / Build (linux/arm64) (pull_request) Successful in 8s
CI / Test (rest) (pull_request) Successful in 51s
e2e / Playwright vs docker-compose (pull_request) Failing after 3m30s
e14dd82f20
When the runner job runs inside a container, compose's relative
`./playwright/playwright-report` resolves to a path that exists
only inside the runner container, so the host's docker daemon
silently bind-mounts an empty dir and the report never lands
anywhere we can read.

Drop the bind mounts; keep the playwright container around
(--name e2e-pw, no --rm); after the test, `docker cp` the
report and traces out into the runner's workspace volume so
upload-artifact has something real to upload. The new test-results
directory (Playwright traces, screenshots, videos) is also
included so failure post-mortem doesn't need a re-run.
steve added 1 commit 2026-05-08 21:41:42 +01:00
e2e: dump error-context.md to log on failure + bump upload-artifact
CI / Test (server-http) (pull_request) Successful in 7s
CI / Test (store) (pull_request) Successful in 6s
CI / Test (rest) (pull_request) Successful in 13s
CI / Build (windows/amd64) (pull_request) Successful in 9s
CI / Build (linux/arm64) (pull_request) Successful in 8s
CI / Lint (pull_request) Successful in 19s
CI / Build (linux/amd64) (pull_request) Successful in 15s
e2e / Playwright vs docker-compose (pull_request) Failing after 3m37s
74be681b4b
The Playwright run produces error-context.md per failed test
with a full DOM snapshot — useful for triaging UI test failures
without round-tripping through downloaded artifacts. Cat it
into the workflow log on failure.

Also bump actions/upload-artifact v3 → v4. v3 uploads still
return success on this Gitea runner but the artifacts don't
surface through the API or UI; v4 is the correct version per
the workflow header note.
steve added 1 commit 2026-05-08 21:47:35 +01:00
ui: show pending-hosts panel even when fleet is otherwise empty
CI / Test (store) (pull_request) Successful in 6s
CI / Lint (pull_request) Successful in 20s
CI / Build (windows/amd64) (pull_request) Successful in 10s
CI / Test (rest) (pull_request) Successful in 41s
CI / Build (linux/amd64) (pull_request) Successful in 8s
CI / Build (linux/arm64) (pull_request) Successful in 8s
CI / Test (server-http) (pull_request) Successful in 3m8s
e2e / Playwright vs docker-compose (pull_request) Failing after 3m28s
523ac4137a
The dashboard's empty-state ("No hosts yet.") was gated on
HostCount == 0 alone, which hid the pending-hosts panel — and
the inline accept form — for the most common first-run scenario:
operator just installed an agent that announced, the fleet has
zero accepted hosts, and the only thing the operator needs to do
is review fingerprint + click Accept.

Tighten the gate so the empty state only shows when there are
truly zero hosts and zero pending announces. With a pending
host, fall through to the regular dashboard layout so the
approval queue is visible and actionable.

Caught by the e2e enrol-via-announce smoke test (now unblocked
on PR #23).
steve added 1 commit 2026-05-08 21:53:38 +01:00
e2e: fix agent state-dir to /var/lib/restic-manager
CI / Test (store) (pull_request) Successful in 6s
CI / Test (rest) (pull_request) Successful in 17s
CI / Lint (pull_request) Successful in 19s
CI / Build (linux/amd64) (pull_request) Successful in 8s
CI / Build (linux/arm64) (pull_request) Successful in 9s
CI / Build (windows/amd64) (pull_request) Successful in 57s
CI / Test (server-http) (pull_request) Successful in 1m30s
e2e / Playwright vs docker-compose (pull_request) Failing after 3m26s
51fe1946b7
The agent writes its encrypted secrets blob to
$DefaultSecretsPath (/var/lib/restic-manager/secrets.enc) but
the e2e fixtures created and mounted a directory at
/var/lib/restic-manager-agent — name mismatch. Result: every
`config.update` push failed with 'create tmp: no such file or
directory', the auto-init never got the repo creds, the host
landed in init_failed, and the smoke test couldn't kick off a
backup (the Run backup button is disabled while
repo_status != ready).

Align the compose volume mount and the Dockerfile mkdir on
/var/lib/restic-manager so they match the production install
script + the agent's own default.
steve added 1 commit 2026-05-08 22:00:28 +01:00
e2e: wait for repo_status=ready and bump test timeout
CI / Test (rest) (pull_request) Successful in 8s
CI / Test (store) (pull_request) Successful in 8s
CI / Test (server-http) (pull_request) Successful in 12s
CI / Build (windows/amd64) (pull_request) Successful in 9s
CI / Lint (pull_request) Successful in 20s
CI / Build (linux/arm64) (pull_request) Successful in 9s
CI / Build (linux/amd64) (pull_request) Successful in 17s
e2e / Playwright vs docker-compose (pull_request) Failing after 4m7s
ccd7c2f2fd
Two issues uncovered by the page-snapshot dump after the agent
state-dir fix:

* The host page server-renders `Run backup now` as disabled
  while repo_status != ready, and the page has no live-refresh
  on that field. The test was navigating right after status
  flipped to 'online' but before auto-init had completed (~3s
  later), so the rendered HTML still showed init_running and
  the click was a no-op. Wait for repo_status === 'ready'
  before navigating.

* playwright.config.ts pinned the per-test timeout at 60s,
  but the test itself uses 60s + 120s of internal waits.
  Bump to 240s so the test fails on real regressions instead
  of timing out on its own internal budget.

Renamed the test description away from "under a minute" since
it overpromises against the new timeout. The performance SLO
belongs in a separate test if we want to assert it.
steve added 1 commit 2026-05-08 22:06:28 +01:00
api: expose host.repo_status in /api/hosts JSON
CI / Test (rest) (pull_request) Successful in 11s
CI / Test (store) (pull_request) Successful in 10s
CI / Build (windows/amd64) (pull_request) Successful in 15s
CI / Lint (pull_request) Successful in 19s
CI / Build (linux/arm64) (pull_request) Successful in 7s
CI / Build (linux/amd64) (pull_request) Successful in 15s
CI / Test (server-http) (pull_request) Successful in 1m30s
e2e / Playwright vs docker-compose (pull_request) Failing after 6m36s
130b68226e
The dashboard renders init_running / init_failed / ready state
based on host.repo_status, but the JSON endpoint dropped the
field on its way out. The e2e test couldn't poll for repo
readiness; reflect the same projection the UI uses.
steve added 1 commit 2026-05-08 22:16:59 +01:00
e2e: dispatch backup via source-group API
CI / Test (rest) (pull_request) Successful in 7s
CI / Test (store) (pull_request) Successful in 6s
CI / Build (windows/amd64) (pull_request) Successful in 8s
CI / Lint (pull_request) Successful in 18s
CI / Build (linux/amd64) (pull_request) Successful in 7s
CI / Build (linux/arm64) (pull_request) Successful in 8s
e2e / Playwright vs docker-compose (pull_request) Successful in 1m27s
CI / Test (server-http) (pull_request) Successful in 3m3s
ea9941b9ec
Per-host Run-backup is gone — the host_chrome partial still
renders the button but it's hard-disabled with a tooltip
pointing to per-source-group Run-now. The smoke test was
clicking that disabled button and waiting forever for a URL
change that would never happen.

Replace the navigation-based dispatch with two API calls:
create a source group covering the agent's /source mount,
then POST to /api/hosts/{id}/source-groups/{gid}/run. The
backup-status assertion at the end is unchanged — host record
is still the source of truth.
steve merged commit 7be2e4c5b0 into main 2026-05-08 22:22:39 +01:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: steve/restic-manager#23