p5-03: docker-only release path (drop goreleaser)

Single public deliverable per tag: a multi-arch server image, with
cross-compiled agent binaries + install scripts + the systemd unit
baked under /opt/restic-manager/dist/. The /agent/binary and
/install/* handlers fall back from <DataDir>/... to that read-only
path so a fresh container Just Works without first-run staging;
operators can still drop a custom build into <DataDir>/ to override
per-host.

Architecture rationale: agent distribution already routes through
the running server, so the release surface mirrors that — there's
no second source of truth to keep in sync.

Workflow .gitea/workflows/release.yml triggers on v*.*.* tag-push
(fan-out :vX.Y.Z / :X.Y / :X, plus :latest once MAJOR>=1) and
workflow_dispatch (snapshot tag only). Pushes to the Gitea
container registry on this instance.

Both binaries grow main.commit + main.date ldflag targets. Makefile
and Dockerfile fill them; release workflow forwards from gitea.sha
plus a UTC timestamp.

Spec : docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md
Plan : docs/superpowers/plans/2026-05-05-p5-03-docker-only-release.md
2026-05-05 15:18:48 +01:00
parent 5ee58979fa
commit 7cc17813a9
11 changed files with 752 additions and 29 deletions
@@ -0,0 +1,131 @@
# P5-03 implementation plan — Docker-only release
Spec: `docs/superpowers/specs/2026-05-05-p5-03-docker-only-release.md`.
Branch: `p5-03-docker-release`. Do not auto-open a PR (see CLAUDE.md
memory: CI runs are expensive on the self-hosted cluster).
---
## Slice 1 — Server config + handler fallback
**Goal:** server can serve agent binaries / install scripts from a
read-only "bundled assets" path when `<DataDir>` doesn't have them.
1. `internal/server/config/config.go` (or wherever `Cfg` lives) gains
a `BundledAssetsDir string` field, defaulting to
`/opt/restic-manager/dist`. Wire from `RM_BUNDLED_ASSETS_DIR` env
var, mirroring the existing env-var conventions.
2. `internal/server/http/agent_assets.go`:
- `handleAgentBinary`: try `<DataDir>/agent-binaries/<name>`
first; on `os.Stat` ENOENT, try
`<BundledAssetsDir>/agent-binaries/<name>`; on second ENOENT,
existing 404.
- `handleInstallAsset`: same dual-path, with `install/` subpath.
3. Tests in `internal/server/http/agent_assets_test.go` (new file):
- DataDir hit serves DataDir bytes.
- DataDir miss + bundled hit serves bundled bytes.
- DataDir hit shadows bundled.
- Both miss → 404 + existing error envelope.
- Path-traversal still rejected for `install/*` (regression check).
**Verify:** `go vet ./...` + `go test ./internal/server/http/...`.
---
## Slice 2 — Version ldflags on both binaries
1. `cmd/server/main.go`: keep `var version`, add
`var commit = "none"` and `var date = "unknown"`. Surface via
existing version-log line.
2. `cmd/agent/main.go`: same three vars. Agent already reports
`agent_version` in the WS hello — extend to include commit if
it's already plumbed through `internal/api`; otherwise leave the
commit out of the wire and just log it on startup.
3. `Makefile`: extend the `make build` `-ldflags` to set all three
from `git describe --tags --always` + `git rev-parse HEAD` +
UTC timestamp. Source-build users get real values, not "dev".
4. `deploy/Dockerfile.server`: add `ARG COMMIT=none` and
`ARG DATE=unknown`; pass through `-ldflags`.
**Verify:** `make build && ./bin/restic-manager-server -version`
(or whatever the existing flag is) prints non-`dev` values.
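As a rough sketch of how the three values might surface, the snippet below formats them into one line. The helper name `versionLine` and the output format are assumptions for illustration; the existing version-log line may look different.

```go
package main

import "fmt"

// Defaults seen by a plain `go build`. Release builds override them with
// -ldflags "-X main.version=... -X main.commit=... -X main.date=...".
var (
	version = "dev"
	commit  = "none"
	date    = "unknown"
)

// versionLine formats the build-time values for a startup log line or a
// version flag. Hypothetical helper; the format is an assumption.
func versionLine(name string) string {
	return fmt.Sprintf("%s %s (commit %s, built %s)", name, version, commit, date)
}

func main() {
	// With the defaults above this prints:
	// restic-manager-server dev (commit none, built unknown)
	fmt.Println(versionLine("restic-manager-server"))
}
```

Because `-X` only rewrites package-level string variables, keeping all three as plain `string` vars in `package main` is what makes the `main.version` / `main.commit` / `main.date` targets work.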
---
## Slice 3 — Dockerfile bakes agents + install assets
1. Build stage cross-compiles three agents:
```dockerfile
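# Pin the first agent target explicitly so it does not inherit GOARCH
# from the server build or the build platform.
ENV GOOS=linux GOARCH=amd64
RUN go build -trimpath -ldflags="-s -w \
-X main.version=${VERSION} -X main.commit=${COMMIT} -X main.date=${DATE}" \
-o /out/agent/restic-manager-agent-linux-amd64 ./cmd/agent
ENV GOARCH=arm64
# "..." below stands for the same -trimpath/-ldflags arguments as above.
RUN go build ... -o /out/agent/restic-manager-agent-linux-arm64 ./cmd/agent
ENV GOOS=windows GOARCH=amd64
RUN go build ... -o /out/agent/restic-manager-agent-windows-amd64.exe ./cmd/agent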
```
(Reset `GOOS`/`GOARCH` between layers via `ENV`. Server build
stays at `GOOS=linux GOARCH=$TARGETARCH`.)
2. Final stage `COPY --from=build`:
- `/out/restic-manager-server` → `/usr/local/bin/`
- `/out/agent/*` → `/opt/restic-manager/dist/agent-binaries/`
- `deploy/install/install.sh` →
`/opt/restic-manager/dist/install/install.sh`
- `deploy/install/install.ps1` →
`/opt/restic-manager/dist/install/install.ps1`
- `deploy/install/restic-manager-agent.service` →
`/opt/restic-manager/dist/install/restic-manager-agent.service`
3. Set `--chmod=0755` on the agent binaries and `install.sh`,
`--chmod=0644` on the unit file and `install.ps1`. Distroless
final stage runs as `nonroot`; bundled assets are readable by
anyone (mode `o+r`), so the user switch doesn't break reads.
**Verify:**
```sh
docker build -f deploy/Dockerfile.server -t rm:dev .
docker run --rm -d -p 18080:8080 \
  -e RM_LISTEN=:8080 -e RM_DATA_DIR=/data \
  -e RM_BASE_URL=http://127.0.0.1:18080 \
  -v rm-test:/data rm:dev
curl -fsSL "http://127.0.0.1:18080/agent/binary?os=linux&arch=amd64" | wc -c
curl -fsSL "http://127.0.0.1:18080/install/install.sh" | head -1
```
Both should succeed against a fresh volume (no operator staging).
---
## Slice 4 — Release workflow
`.gitea/workflows/release.yml` per the spec. Two jobs:
1. **`image`**: checkout → setup-qemu → setup-buildx → login → compute
tags → buildx build+push.
2. (Future) `release-notes`: stub left as a TODO comment for now.
Operator can hand-write release notes via the Gitea UI on first
cut.
The `compute tags` shell step is the only non-trivial bit; tested
inline by running the script with mocked `GITHUB_REF_TYPE` /
`GITHUB_REF_NAME` env vars before committing.
**Verify on first dispatch:** trigger `workflow_dispatch` from the
Gitea UI, check the runner produces `:snapshot-<shortsha>` and pushes
a multi-arch image.
---
## Slice 5 — Tasks.md + commit + push
1. `tasks.md`: tick P5-03; add a one-line note that goreleaser was
dropped in favour of Docker-only after a 2026-05-05 design pass
(link the spec).
2. `git add -A && git commit -m "p5-03: docker-only release path"`
(no Co-Authored-By trailer — CLAUDE.md rule).
3. `git push -u origin p5-03-docker-release`.
4. **Stop.** Do not open a PR. Wait for operator review.
@@ -0,0 +1,229 @@
# P5-03 — Docker-only release path
**Status:** approved 2026-05-05. Pivots P5-03 away from `goreleaser` +
binary archives toward a single Docker image as the only public
deliverable.
## Goal
One artifact per tag: the `restic-manager` server image, multi-arch
(linux amd64 + arm64), published to the Gitea container registry of
this self-hosted instance. The image bakes in cross-compiled agent
binaries (linux amd64, linux arm64, windows amd64), the install
scripts, and the systemd unit at a read-only image path. The running
server distributes those agents and scripts via its existing
`/agent/binary` and `/install/*` endpoints; operators on N hosts never
download a release artifact directly.
Source builds via `make build` remain a first-class path for anyone
who wants binaries.
## Non-goals
- Standalone binary archives (`.tar.gz`, `.zip`) on the release page.
- darwin / windows-arm64 agent targets — neither is service-tested.
- `goreleaser`. Not used.
- `cosign`, `SBOM`, `in-toto`, `minisign`. Re-promote when we ship
binaries outside an image (Phase 6 candidate).
- GHCR / GitHub mirror. Single source of truth = Gitea.
## Decisions captured (with one-line rationale)
| ID | Decision | Why |
|----|----------|-----|
| D1 | One artifact: server Docker image | Architecture already routes agent distribution through the server (`/agent/binary`); release surface should mirror that. |
| D2 | Trigger: `tag-push` (`v*.*.*`) **plus** `workflow_dispatch` | Tag for real cuts; dispatch for snapshot iteration without polluting tag history. |
| D3 | Build matrix: linux amd64+arm64 server image; agent cross-compiles for linux amd64+arm64+windows amd64 | Mirrors the existing CI build matrix; nothing ships that hasn't been service-tested. |
| D4 | Image-baked, separate path (`/opt/restic-manager/dist/`); HTTP handler reads `<DataDir>/...` first, falls back to `/opt/...` | Volume stays purely operator state; image content is immutable per tag; eliminates the smoke-env "stale agent" footgun in production. |
| D5 | Tag fan-out: `vX.Y.Z`, `X.Y`, `X`, `latest` — but `latest` is held back until `v1.0.0` | Standard rolling-minor pattern; pre-1.0 forces explicit pinning. |
| D6 | Snapshot tag: `:snapshot-<shortsha>`, never moves `latest` | Operator can never accidentally pull an unblessed build. |
| D7 | Version embedding via `-ldflags`: `main.version`, `main.commit`, `main.date` on both `cmd/server` and `cmd/agent` | Server already had `version`; add `commit`/`date` to both for parity and traceability. |
| D8 | Registry: Gitea container registry on this instance, under `<host>/<owner>/restic-manager` | One source of truth, no external creds. |
| D9 | Integrity: a `SHA256SUMS` file + the manifest digest in the release notes; nothing else | Image is the unit of trust; pull-by-digest is the verification primitive. |
| D10 | P1-31 (signed binaries) stays deferred | Re-promote the day we ship binaries outside an image. |
## Image layout
Multi-stage Dockerfile (extends today's `deploy/Dockerfile.server`):
```
build stage (golang:1.25-alpine):
  cross-compile cmd/server for $TARGETARCH (linux)
  cross-compile cmd/agent for linux/amd64
  cross-compile cmd/agent for linux/arm64
  cross-compile cmd/agent for windows/amd64
  (CGO_ENABLED=0 throughout; pure-Go SQLite)
final stage (gcr.io/distroless/static-debian12:nonroot):
  /usr/local/bin/restic-manager-server    (matches image arch)
  /opt/restic-manager/dist/agent-binaries/
    restic-manager-agent-linux-amd64
    restic-manager-agent-linux-arm64
    restic-manager-agent-windows-amd64.exe
  /opt/restic-manager/dist/install/
    install.sh
    install.ps1
    restic-manager-agent.service
```
`/opt/restic-manager/dist/` is owned by `root:root`, mode `0755` for
directories, `0755` for `install.sh` (executable by convention; a
`curl ... | sh` install does not itself need the bit) and `0644` for the
unit file and `install.ps1`. The agent binaries are mode `0755`.
`<DataDir>` keeps holding only operator state: `restic-manager.db`,
`secret.key`, `secrets.enc`, `audit/`, `tls/`. Nothing the image
owns gets written into the volume.
## Server-side handler change
`internal/server/http/agent_assets.go` today reads from
`<DataDir>/agent-binaries/<name>` and `<DataDir>/install/<name>`.
Change: if the file isn't present in `<DataDir>`, fall back to
`/opt/restic-manager/dist/<subpath>/<name>`. The fallback path is a
new server-config field defaulted to `/opt/restic-manager/dist`,
overridable via `RM_BUNDLED_ASSETS_DIR` for tests and source-build
deployments. If neither path resolves, return 404 (existing
`binary_not_published` / `not_found` body unchanged).
This means:
- A fresh container without any operator-staged overrides serves the
baked-in agents. No first-run setup needed.
- An operator can still drop a custom-built agent into
`<DataDir>/agent-binaries/` to override the image's copy (handy for
pre-release agent testing without rebuilding the server image).
- Source-build dev (`bin/restic-manager-server` running out of the
working tree) still works exactly as today — the fallback dir is
configurable, and the `<DataDir>` path remains the primary lookup.
Tests cover four cases: (a) DataDir hit, (b) fallback hit, (c) DataDir
hit shadows fallback, (d) neither — 404.
## Versioning
Both binaries grow `commit` and `date` ldflag-targets next to the
existing `version`:
```go
var (
	version = "dev"
	commit  = "none"
	date    = "unknown"
)
```
Dockerfile keeps `ARG VERSION=dev` and gains `ARG COMMIT=none` and
`ARG DATE=unknown`, matching the Go defaults (an empty default would
overwrite them with `""` via `-X`); the `go build` line passes all
three via `-ldflags`. The release workflow fills them from
`${{ gitea.ref_name }}`, `${{ gitea.sha }}`, and a UTC ISO-8601 timestamp.
Snapshot builds (workflow_dispatch) compute
`VERSION=0.0.0-snapshot-${SHORTSHA}` and tag the image as
`:snapshot-${SHORTSHA}` only. They never touch `latest` or any
`vX.Y.Z` tag.
## Workflow (`.gitea/workflows/release.yml`)
```yaml
name: Release
on:
  push:
    tags: ['v[0-9]+.[0-9]+.[0-9]+']
  workflow_dispatch:
env:
  IMAGE: gitea.dcglab.co.uk/${{ gitea.repository }}
jobs:
  image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: gitea.dcglab.co.uk
          username: ${{ gitea.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: compute tags
        id: meta
        run: |
          # tag-push → :vX.Y.Z, :X.Y, :X (plus :latest only if X >= 1)
          # dispatch → :snapshot-<shortsha>
          ...
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: deploy/Dockerfile.server
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          build-args: |
            VERSION=${{ steps.meta.outputs.version }}
            COMMIT=${{ gitea.sha }}
            DATE=${{ steps.meta.outputs.date }}
```
The `compute tags` step:
- For `push:tags`: extract `vMAJOR.MINOR.PATCH`. Always emit
`:vMAJOR.MINOR.PATCH`, `:MAJOR.MINOR`, `:MAJOR`. Emit `:latest`
only when `MAJOR >= 1`.
- For `workflow_dispatch`: emit `:snapshot-<shortsha>`. Nothing else.
No release-asset upload step yet: the push to the Gitea container
registry is the deliverable. A future iteration may attach a `SHA256SUMS` file
to a Gitea release object once `tea release create` is wired in;
that's not in scope for the first cut.
## Tests / verification
1. `go vet ./...` (CLAUDE.md rule, runs locally pre-commit).
2. `go test ./internal/server/http/...` covers the new fallback
logic.
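3. Local manual smoke: `docker build -f deploy/Dockerfile.server .`
produces an image; `docker run --rm -p 8080:8080 <image>` (plus
whatever `RM_*` settings the server needs) starts the server with the
port published;
`curl "http://127.0.0.1:8080/agent/binary?os=linux&arch=amd64"`
serves bytes (the URL is quoted so the shell does not treat `&` as a
background operator); `curl http://127.0.0.1:8080/install/install.sh`
serves the script.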
4. Release workflow itself is exercised on first tag-push; until
then, `workflow_dispatch` is the smoke test.
## Operator-facing changes
- `README.md` install snippet becomes
`docker run -v rm-data:/var/lib/restic-manager ...
gitea.dcglab.co.uk/<owner>/restic-manager:vX.Y.Z`. Pre-1.0
releases are pinned by exact tag; no `:latest` is published.
- The CLAUDE.md "restage" block is dev-only (smoke env runs the
server out of `bin/`). Production users on the image never see
it.
- `RM_BUNDLED_ASSETS_DIR` is documented in the server config
reference (defaults to `/opt/restic-manager/dist`).
## Risks / footguns
- **Image size growth.** Three agent binaries (~15-20 MB each
stripped) add roughly 45-60 MB. Acceptable; we're already shipping a
distroless server. Watch the trajectory once Phase 4 alerting is
in.
- **Dockerfile cross-compile multiplies build time** on the runner.
Pure-Go means each leg is just a `go build`; total stage time
should stay under 60s on the self-hosted runner.
- **`ARG VERSION` leakage.** The current Dockerfile already accepts
`ARG VERSION=dev`; we're tightening, not loosening.
- **Operator overriding `<DataDir>/agent-binaries/<name>`** with a
stale binary will silently shadow the image's copy. Documented in
the server config reference; this is a feature (lets operators
hot-patch a pre-release agent) not a bug.
## Out of scope (tracked for follow-up)
- Cosign / SBOM / in-toto provenance — defer to Phase 6 with the rest
of the supply-chain hardening.
- GHCR mirror — defer until P5-01 docs site goes public.
- `tea release create` integration — pending until we have something
worth attaching beyond the image digest.