Commit Graph

18 Commits

Author SHA1 Message Date
steve a8026608ae ci: force bash as default shell in container jobs
When jobs run with `container:` set, Gitea Actions defaults to
`sh -e` (dash on Ubuntu), so `set -euo pipefail` fails with
"Illegal option -o pipefail". Pinning bash workflow-wide
matches what the runner used pre-container and keeps existing
scripts portable.
2026-05-08 21:10:33 +01:00
steve 6c23bdbe63 ci: run jobs in ci-runner-go container
Pin every job to gitea.dcglab.co.uk/steve/ci-runner-go:2026-05-08
so Go, Node, and Docker tooling are already installed when the
job starts. Drops three actions/setup-go invocations from ci.yml
(redundant — Go is on PATH) and inherits Buildx + Compose v2 in
e2e.yml and release.yml without per-job apt-installs.

Recipe lives in steve/ci. Bump the date pin in lockstep across
the three workflows when picking up a fresher image (e.g. when
the Go floor moves).
2026-05-08 21:06:38 +01:00
steve a087321570 e2e: build playwright image with --profile test --pull
Without --profile test, `docker compose build` skips the
playwright service (profiles: [test]) and the image is built
on-demand by `compose run` instead. Across CI runs the Gitea
runner caches the resulting tag, so a Dockerfile FROM bump
(v1.50.0 → v1.59.1) is masked by the cached image — the
container ends up with old browser binaries and Playwright's
own version-mismatch check fails the suite. Pull base images
on every build so the FROM tag wins.
2026-05-08 20:15:21 +01:00
steve af2cb292b8 e2e: run health probe + Playwright on the compose network
Gitea's act-style runners execute workflow steps inside a runner
container, so compose's host port-publish (127.0.0.1:8080:8080) is
not reachable from the steps. PR #23's e2e job timed out waiting
for the server even though the container was up and listening.

Move both the health probe and the Playwright run onto rmnet so
they address the server as http://server:8080:

* health probe: docker run --rm --network e2e_rmnet curlimages/curl
* Playwright: new mcr.microsoft.com/playwright-based image, added
  as a profile-gated `playwright` service in compose.e2e.yml,
  invoked via `docker compose run --rm playwright`. Drops the
  setup-node + npm install runner steps.
2026-05-08 20:08:23 +01:00
steve bb4ed3502d P5: OSS readiness — docs site, contributor onboarding, e2e harness
P5-01 — Documentation site under docs/book/ rendered with mdBook
(downloaded via Makefile, same static-binary pattern as Tailwind).
Structured chapters: getting started, concepts, operations,
security, reference. `make docs` / `make docs-watch`. Generated
output gitignored.

P5-02 — CONTRIBUTING.md rewritten from placeholder to a full
guide. CODE_OF_CONDUCT.md adapted from Contributor Covenant for a
single-maintainer project. .gitea/issue_template/{bug,feature}.md
and PULL_REQUEST_TEMPLATE.md.

P5-04 — Six README screenshots captured live from a fresh server
bootstrap (login, empty dashboard, add-host, alerts, settings,
audit log). README rewritten to centre the screenshot grid and
link out to the docs site.

P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day
default window), scope in/out, threat-model summary, operator
hardening checklist. Mirrored as a docs-site chapter.

P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up
server + sibling Linux agent (alpine + restic) + restic/rest-server.
Agent uses announce-and-approve so Playwright can drive the full
operator flow: bootstrap → login → accept pending → backup →
verify terminal status. Second spec scrapes /metrics to assert
the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every
PR; local how-to in docs/e2e.md.
2026-05-08 20:08:23 +01:00
steve ab7fee0ae7 ci(release): use DEV_TOKEN for registry login
Release / Build + push image (push) Successful in 3m57s
The auto-issued GITHUB_TOKEN lacks write:package scope on this Gitea
instance, so the v0.9.0 tag build failed at docker login. Switch to
the user-level DEV_TOKEN secret which has the correct scope.
2026-05-06 19:05:54 +01:00
steve fb978ad10c p5-03: docker-only release path (drop goreleaser)
Single public deliverable per tag: a multi-arch server image, with
cross-compiled agent binaries + install scripts + the systemd unit
baked under /opt/restic-manager/dist/. The /agent/binary and
/install/* handlers fall back from <DataDir>/... to that read-only
path so a fresh container Just Works without first-run staging;
operators can still drop a custom build into <DataDir>/ to override
per-host.

Architecture rationale: agent distribution already routes through
the running server, so the release surface mirrors that — there's
no second source of truth to keep in sync.

Workflow .gitea/workflows/release.yml triggers on v*.*.* tag-push
(fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and
workflow_dispatch (snapshot tag only). Pushes to the Gitea
container registry on this instance.

Both binaries grow main.commit + main.date ldflag targets. Makefile
and Dockerfile fill them; release workflow forwards from gitea.sha
plus a UTC timestamp.

Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md
Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md
2026-05-05 15:18:48 +01:00
steve b2983aed52 ci: shard test job + cheap argon2 in test mode
Test job was wall-clocked by `internal/server/http` (~156s on the
self-hosted runner under -race). Two changes here cut that:

1. Matrix-shard the test job by package group: server-http, store,
   and "rest" (everything else, computed via `go list | grep -v`).
   Each shard runs on its own runner so the heavy package isn't
   CPU-starved by siblings.

2. `auth.HashPassword` drops to cheap argon2id params (8 KiB / 1
   iter / 1 lane) when `testing.Testing()` returns true. Production
   params are unchanged. VerifyPassword reads params from the
   encoded hash so cheap-params hashes verify identically — no test
   call sites need to change.
2026-05-05 08:40:50 +01:00
steve e73c4bd96c infra: remove provision-gitea-runner.sh (now lives with the infra team)
The runner-provisioning script has been handed off to the infra
agent, who will own it going forward. ci.yml's header comment is
updated to point at "the infra team owns the script" rather than
the in-repo path, but the runner expectations themselves stay the
same — workflows still rely on the persistent volumes, pre-cloned
actions, and host-installed golangci-lint that any compliant
provisioning produces.
2026-05-04 10:19:09 +01:00
steve bd460d7532 ci+infra: provisioning script for gitea runners + drop setup-go cache
scripts/provision-gitea-runner.sh is a one-shot, idempotent host
setup for an act_runner LXC. It mounts persistent host volumes for
GOMODCACHE / GOCACHE / act-clones, pre-pulls the runner image,
pre-clones the common GitHub actions, installs golangci-lint, and
sets up a nightly cron to refresh the lot. Generic — no per-project
state.

With those persistent volumes in place, `cache: true` on
actions/setup-go becomes a net negative — the action keeps tar-ing /
un-tar-ing GOMODCACHE+GOCACHE through the Gitea cache backend on
every job, adding ~10s per job and overwriting the volume contents.
Drop it from all three jobs in ci.yml. Add a header comment block
explaining the runner-side expectations and the Go version / build
matrix / upload-artifact context for anyone reading later.
2026-05-04 09:40:27 +01:00
steve d9c8da139c ci: bump golangci-lint to v2.5.0 (Go 1.25-built binary)
The v2.1.6 release binary is built with Go 1.24, and golangci-lint
refuses to load a config targeting a newer toolchain than itself
('Go language version (go1.24) used to build golangci-lint is lower
than the targeted Go version (1.25.0)'). go.mod is on 1.25, so the
binary needs to be too.

Locally this didn't bite because 'go install …@v2.1.6' compiled
v2.1.6 against the local Go 1.25 toolchain; CI uses the prebuilt
release tarball which carries the build-time Go version.

v2.5.0 is the first v2.x line built with Go 1.25 — pin in lockstep
with go.mod going forward.
2026-05-03 21:29:02 +01:00
steve b6f8de1dcc lint: drive baseline to zero, drop only-new-issues gate
Cleanup pass over the repo so CI can enforce lint going forward
without the only-new-issues escape hatch:

* gofumpt -w across the tree (31 hits, all formatting)
* misspell --fix (25 hits, US-locale spelling) — but reverted on
  api.JobCancelled = "cancelled" since that literal is the wire +
  DB CHECK constraint value, plus matched the case in store/fleet.go
  back to "cancelled" and added //nolint:misspell on both for the
  next time someone reaches for the auto-fix
* Wrap every `defer rows.Close()` / `defer stmt.Close()` /
  `defer res.Body.Close()` in `defer func() { _ = .Close() }()`
  to satisfy errcheck without losing the close itself
* websocket.Dial callers (1 prod, 4 tests) now capture + close the
  upgrade response Body — coder/websocket can return res with a nil
  Body on success, so the test deferred-closes guard against that
* Annotate the two genuine-by-design nilerr cases with //nolint
  comments explaining why nil-on-error is the contract (cookie
  missing = no session; ctx cancelled mid-backoff = clean shutdown)
* Add brief godoc on the 10 exported const groups + types that
  revive flagged (api.HostOS/HostArch/JobKind/JobStatus/LogStream/
  ErrorCode, restic.EventKind, store.Role, web.FS)
* Drop the unused (*Server).userByID method
* Inline the unparam baseView(active) — every UI page is under
  the dashboard primary nav today

Result: `golangci-lint run ./...` reports 0 issues. CI lint job
no longer needs only-new-issues: true; X-06 follow-up entry in
tasks.md removed.
2026-05-03 16:15:17 +01:00
steve 41c3ec7c6f ci: migrate .golangci.yml to v2 schema + only-new-issues gate
The bump from golangci-lint-action@v6 → v7 (which downloads the v2.x
binary) was blocking CI lint with 'unsupported version of the
configuration: ""' because .golangci.yml was still in the v1 schema.

Migrate the config to v2:
* version: "2" prelude
* disable-all → default: none
* linters-settings → linters.settings
* gofumpt + goimports move into formatters.enable + formatters.settings
* exclude-rules move into linters.exclusions.rules
* gosimple drops (folded into staticcheck in v2)

Fix the four lint hits in the new P2R-02 code:
* host_bandwidth.go: convert hostBandwidthRequest directly to
  hostBandwidthView via type conversion (S1016)
* ui_repo.go: drop unparam savedSection + status arguments from
  renderRepoPage (always "" / always 422 — split GET render from
  validation-fail render)
* ui_schedules.go: gofumpt formatting on the scheduleEditPage struct

Add only-new-issues: true to the lint job. The repo carries ~90
pre-existing findings (gofumpt drift × 31, misspell × 25, missing
godoc × 10, bodyclose × 6, errcheck × 12, …) accumulated before
lint was actually wired into CI. Without this gate, every PR would
fail on baseline noise instead of its own changes.

Track the cleanup as X-06 in tasks.md so the gate is temporary.
2026-05-03 15:00:24 +01:00
steve 9ac5088fde P2R-02 slice 4: Repo tab — connection / bandwidth / maintenance
Three independent forms on /hosts/{id}/repo so saving one section
doesn't disturb the others:

* Connection: edits repo URL, username, password (pre-filled from
  the redacted GET /api/hosts/{id}/repo-credentials view; password
  field shows masked stored-creds placeholder; blank password = keep
  existing). On save, encrypts and pushes config.update to a
  connected agent.
* Bandwidth: host-wide upload/download caps (KB/s; blank = no cap)
  written via store.SetHostBandwidth. New REST endpoint
  PUT /api/hosts/{id}/bandwidth for JSON callers.
* Maintenance: forget/prune/check cadences + check subset %, with
  per-row enabled toggles. Reuses cronParser for validation;
  auto-seeds the row if a host pre-dates the migration.

Right-rail surfaces repo size, snapshot count, snapshots-by-tag
breakdown (counted from existing snapshot tag rows), and an
'untagged snapshots are left alone' note.

Danger-zone re-init button is rendered but disabled with a hint
pointing at P2R-09 (real implementation lands there).

Validation re-renders the page with the relevant form's banner and
all other section state intact. Successful saves redirect with a
?saved=<section> query param so the page surfaces a small ✓ saved
indicator on the relevant form.

ci.yml: bump golangci-lint-action v6→v7 (separate change picked up
in this commit).
2026-05-03 12:14:03 +01:00
steve 84914fd6c5 ci: only trigger on PRs into main
Drop the push-to-main trigger; main is fast-forward only via PR, so
the post-merge run was redundant.
2026-05-03 11:25:13 +01:00
steve c019633b77 ci: fix race-trip in enrollment fixture + bump golangci-lint to v2.1.6
- host_credentials_test.go's CreateEnrollmentToken fixture passed 1<<20
  as the TTL (third arg, time.Duration) — that's ~1ms in nanoseconds.
  Local non-race runs finished inside the window, but -race overhead
  blew the deadline so the token was already expired by the time
  GetEnrollmentTokenAttachments / ConsumeEnrollmentToken ran. Use
  time.Hour instead, which matches the spirit of a per-test fixture.
- Lint pin v1.61.0 was built against Go 1.23 and refuses to load a
  config targeting newer toolchains. go.mod is on 1.25, so the lint
  step exited 3 ('the Go language version used to build golangci-lint
  is lower than the targeted Go version'). Bumping to v2.1.6, which
  supports Go 1.25.

Both failures showed up only on the Gitea runner because local make
target runs go test without -race and lint hadn't been re-run after
the go.mod toolchain bump.
2026-05-03 11:13:22 +01:00
steve f55747a281 phase 1 foundations: api types, store, crypto, auth
Lands the bottom three layers of Phase 1:

P1-08 internal/api: protocol_version + envelope + every WS message
  shape from spec.md §6.2 (Hello, Heartbeat, Job*, Schedule*, etc).
  Wire-format tests pin the JSON shape so a rename here breaks
  tests instead of silently breaking the agent.

P1-02 + P1-03 internal/store: SQLite via modernc.org/sqlite,
  embed.FS + a tiny version table for hand-rolled migrations.
  0001_initial.sql covers every table from spec.md §5 plus
  enrollment_tokens and host_schedule_version. Typed accessors
  for users / sessions / enrollment / audit. WAL + foreign_keys
  + busy_timeout on by default.

P1-06 internal/crypto: XChaCha20-Poly1305 AEAD wrapper with
  per-message random nonce. Key file lifecycle (generate +
  refuse-to-overwrite, load with size validation). Optional
  additionalData binds ciphertext to the row that owns it.

P1-04 internal/auth (partial — passwords + tokens; sessions
  middleware lands with the HTTP handlers): argon2id following
  RFC 9106 (64 MiB / t=3 / p=4 / 32B), constant-time verify.
  HashToken stores SHA-256 of session/agent/enrollment tokens
  so a stolen DB doesn't hand over credentials.

Build floor moves to Go 1.25 (modernc.org/sqlite v1.50+ requires
it); CI + Dockerfile + README updated. Markdown lint diagnostics
on tasks.md cleared.

All packages tested. ~70 new tests pass in <1s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:24:40 +01:00
steve 25aa001135 phase 0: project bootstrap
P0-01 Go module + cmd/server + cmd/agent skeletons + internal/ tree
P0-02 LICENSE (PolyForm NC 1.0.0), README, CONTRIBUTING
P0-03 golangci-lint, pre-commit, .editorconfig, .gitignore
P0-04 Gitea Actions CI: test (race+coverage), lint, cross-platform build matrix
P0-05 Dockerfile.server (multi-stage, distroless/static), docker-compose.yml
P0-06 Makefile with build/test/lint/fmt/run/release targets

build, vet, test, and cross-compile to linux/{amd64,arm64} + windows/amd64
all verified locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:03:59 +01:00