restic-manager/docs/e2e-smoke.md
steve ee3ee241ea
P1 polish: agent-as-root, init-repo flow, rest creds passthrough, UX fixes
Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:

* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
  drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
  with ReadWritePaths confined to /etc + /var/lib/restic-manager;
  NoNewPrivileges blocks escalation. Install script no longer
  creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
  rationale (matches UrBackup / Veeam / Bareos defaults; trying to
  back up "everything" as an unprivileged user creates silent skips
  on /home, /root, /var/lib/* with no upside vs the threat model
  the agent already implies).

* Init-repo end-to-end. New JobKind="init" wired through agent
  runner, restic.Env.RunInit, server dispatcher, and a UI button
  (red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
  flips on init success, on backup success, or on a non-empty
  snapshots.report. The "Run now" / "Init" / "Retry" branching now
  drives both the dashboard host row and the host-detail panel.
  Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
  the safe create-new-then-rename pattern; first version corrupted
  job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
  affected DBs).

* rest-server creds embedded at exec time only. restic.Env gains
  RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
  inside envSlice() and never assigns it back to the struct, so
  nothing slog-able ever sees the cleartext form. RedactURL helper
  for any future surface that needs to log a URL safely. Both
  helpers tested.

* Add-host UX. Repo password is now optional — server mints a
  24-byte URL-safe random one and surfaces it once, alongside an
  htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
  the operator pastes one command on the rest-server host and one
  on the endpoint. Result page also links the install snippet at
  /install/install.sh (was /install.sh — 404'd before) and pipes
  to bash (not sh — script uses set -o pipefail and other
  bashisms; on Debian/Ubuntu sh is dash).

* Late-subscriber race in JobHub. A fast-failing job could finish
  (DB write + Broadcast) before the browser's HX-Redirect → page
  load → WS-connect path completed, so the JS sat forever waiting
  on a job.finished that already passed. JobHub split into
  Register + Send + Run; handleJobStream now subscribes first,
  re-fetches the job, and sends a synthetic job.finished if the
  state is already terminal.

* HTMX error visibility. New toast partial listens to
  htmx:responseError and surfaces the response body as a
  bottom-right toast — every server-side validation error now
  becomes visible without per-handler JS wiring. Also handles
  custom rm:toast events for future server-pushed notifications
  via the HX-Trigger header. Themed via existing CSS vars.

* Dashboard rows are now whole-row clickable to host detail
  (CSS card-link pattern: absolute-positioned anchor + .row-action
  z-index restoration so the action button stays clickable).
  "View →" on a running job links to /jobs/<id> rather than
  /hosts/<id> since the row click already covers the host page.

* "Run first" / "Run first backup" → "Run now" everywhere for
  consistency.

* runbook (docs/e2e-smoke.md) updated — live-log streaming step
  now reflects P1-26; mentions the browser-driven Run-now flow.

* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
  it up; .gitignore now excludes /_diag/ entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:02:12 +01:00


# End-to-end smoke test (P1-34)
A runbook for verifying the Phase 1 happy path against a real
`restic/rest-server`. Run this on any Linux host with Docker; nothing
here touches your real Proxmox cluster or Unraid storage.
The test exercises:

1. Operator mints an enrollment token **with repo creds** (P1-32).
2. Agent enrols, server burns the token, `host_credentials` row lands.
3. Agent connects over WS, server pushes `config.update` containing
   the decrypted creds **before** the agent sees any command.
4. Agent persists creds into `secrets.enc` (P1-33).
5. Run-now backup against the live `restic/rest-server`.
6. `snapshots.report` updates the per-host projection.
7. `GET /api/hosts/{id}/snapshots` returns the new snapshot.

Total time: ~5 minutes on a warm machine.

---
## Prereqs
- Docker + Docker Compose
- `restic` v0.16+ on the host running the agent (the agent does **not**
install it; that's a deliberate design choice — see spec §4.2)
- `curl`, `jq`
## Layout
Everything lives under `/tmp/rm-smoke/`. Nothing escapes it; remove the
directory to clean up.
```
/tmp/rm-smoke/
├── compose.yaml          # rest-server + control-plane
├── data/                 # control-plane SQLite + secret key
│   └── agent-binaries/   # built agent binaries served by /agent/binary
├── rest/                 # rest-server data volume
│   └── htpasswd
└── agent/                # this host plays the part of an endpoint
    ├── etc/              # → bind-mounted as /etc/restic-manager
    └── var-lib/          # → bind-mounted as /var/lib/restic-manager
```
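The directories in that tree can be created up front. The sketch below is one way to do it; the function name `make_smoke_layout` is hypothetical, and it only creates directories (files such as `compose.yaml` are written in later steps).

```sh
# Create the smoke-test directory skeleton under a given root.
# Sketch only: mirrors the directories in the tree above and
# creates no files.
make_smoke_layout() {
    root=${1:-/tmp/rm-smoke}
    mkdir -p \
        "$root/data/agent-binaries" \
        "$root/rest" \
        "$root/agent/etc" \
        "$root/agent/var-lib"
}
```

Calling `make_smoke_layout` with no argument uses `/tmp/rm-smoke`, so the later steps' paths line up unchanged.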
## 1. Build the binaries
```sh
mkdir -p /tmp/rm-smoke/data/agent-binaries
cd ~/src/restic-manager
make build
cp bin/restic-manager-agent /tmp/rm-smoke/data/agent-binaries/restic-manager-agent-linux-amd64
```
The server's `/agent/binary?os=linux&arch=amd64` resolves to that path.
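
A quick check that the copy landed where the server will look can save a confusing download failure later. This is a sketch: the helper names are hypothetical, and the `restic-manager-agent-<os>-<arch>` pattern is inferred from the `cp` command above rather than from the server source.

```sh
# Print the path expected for an OS/arch pair.
# The file-name pattern is an inference from the cp command above.
agent_binary_path() {
    printf '%s/restic-manager-agent-%s-%s\n' "$1" "$2" "$3"
}

# Succeed only if that file exists as a regular file.
check_agent_binary() {
    path=$(agent_binary_path "$1" "$2" "$3")
    if [ ! -f "$path" ]; then
        echo "agent binary not found: $path" >&2
        return 1
    fi
}
```

For this runbook the call would be `check_agent_binary /tmp/rm-smoke/data/agent-binaries linux amd64`.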
## 2. Compose the stack
`/tmp/rm-smoke/compose.yaml`:
```yaml
services:
  rest-server:
    image: restic/rest-server:latest
    restart: unless-stopped
    environment:
      - OPTIONS=--no-auth   # smoke-test only; real deploys use --append-only + htpasswd
    ports:
      # Mapped to 8100 because most dev boxes already have something
      # on 8000. Use any free port; just keep the URLs below in sync.
      - "127.0.0.1:8100:8000"
    volumes:
      - ./rest:/data
  control-plane:
    image: ghcr.io/dcglab/restic-manager:dev   # or build locally; see §1
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - ./data:/data
    environment:
      - RM_LISTEN=:8080
      - RM_DATA_DIR=/data
      - RM_BASE_URL=http://127.0.0.1:8080
      - RM_SECRET_KEY_FILE=/data/secret.key
      - RM_COOKIE_SECURE=false   # smoke-test only; we're on plain HTTP
```
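Start the stack with `docker compose -f /tmp/rm-smoke/compose.yaml up -d`.
For a local-only smoke run, start just the `rest-server` service (append
`rest-server` to that command) and run the control plane straight from
the binary instead of the image, pointing it at `/tmp/rm-smoke/data`: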
```sh
RM_LISTEN=:8080 RM_DATA_DIR=/tmp/rm-smoke/data \
RM_SECRET_KEY_FILE=/tmp/rm-smoke/data/secret.key \
RM_COOKIE_SECURE=false \
./bin/restic-manager-server
```
Either way, watch stderr for the **bootstrap token** — printed on first
run, used in the next step.
## 3. Bootstrap the admin account
```sh
BOOTSTRAP_TOKEN='<paste from server logs>'
curl -s -X POST http://127.0.0.1:8080/api/bootstrap \
-H 'content-type: application/json' \
-d "{\"token\":\"$BOOTSTRAP_TOKEN\",\"username\":\"admin\",\"password\":\"correct horse battery staple\"}"
```
## 4. Mint an enrollment token (with repo creds)
```sh
curl -s -c /tmp/rm-smoke/cookies -X POST http://127.0.0.1:8080/api/auth/login \
-H 'content-type: application/json' \
-d '{"username":"admin","password":"correct horse battery staple"}'
ENROLL=$(curl -s -b /tmp/rm-smoke/cookies -X POST http://127.0.0.1:8080/api/enrollment-tokens \
-H 'content-type: application/json' \
-d '{
"hostname":"smoke-host",
"repo_url":"rest:http://127.0.0.1:8100/smoke/",
"repo_username":"",
"repo_password":"smoke-pw"
}')
TOKEN=$(echo "$ENROLL" | jq -r .token)
echo "token: $TOKEN"
```
If the server rejects with `missing_field`, you forgot
`repo_url`/`repo_password` — both are required (P1-32).
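
When a request fails, `jq -r .token` prints the literal string `null` instead of failing, so `$TOKEN` can look set while being useless. A small guard catches that early. This is a sketch and `require_value` is a hypothetical helper name; it applies equally to `$HOST_ID` and `$JOB_ID` in later steps.

```sh
# Fail fast when a value extracted with `jq -r` is empty or the
# literal string "null" (what jq prints for a missing field).
require_value() {
    name=$1
    value=$2
    if [ -z "$value" ] || [ "$value" = "null" ]; then
        echo "$name is empty; check the previous response" >&2
        return 1
    fi
}
```

Used as `require_value TOKEN "$TOKEN" || exit 1` straight after the `jq` line.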
## 5. Initialise the rest-server repo
`restic/rest-server` will lazy-create the path on first write, but
restic itself wants the repo initialised:
```sh
RESTIC_PASSWORD=smoke-pw \
restic -r rest:http://127.0.0.1:8100/smoke/ init
```
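Re-running the smoke test against an existing `rest/` directory makes a plain `init` fail because the repo already has a config. One way to make the step repeatable is to probe with `restic cat config` first. This is a sketch, not part of the project; `init_repo_if_needed` is a hypothetical name, and it expects `RESTIC_PASSWORD` to be exported already.

```sh
# Initialise the repo only if it has no readable config yet.
# `restic cat config` exits non-zero when the repository is missing
# or cannot be opened, which this sketch treats as "not initialised".
# If the probe failed for another reason (e.g. wrong password),
# the subsequent init fails too and reports the real error.
init_repo_if_needed() {
    repo=$1
    if restic -r "$repo" cat config >/dev/null 2>&1; then
        echo "repo already initialised: $repo"
    else
        restic -r "$repo" init
    fi
}
```

For this runbook: `export RESTIC_PASSWORD=smoke-pw`, then `init_repo_if_needed rest:http://127.0.0.1:8100/smoke/`.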
## 6. Pretend to be a fresh endpoint
The agent writes `agent.yaml` under `/tmp/rm-smoke/agent/etc` and
`secrets.enc` under `/tmp/rm-smoke/agent/var-lib`. Pointing it at
those directories keeps the smoke run isolated from your real
`/etc/restic-manager`.
```sh
mkdir -p /tmp/rm-smoke/agent/etc /tmp/rm-smoke/agent/var-lib
CONFIG=/tmp/rm-smoke/agent/etc/agent.yaml
# Pre-write the secrets path so we don't hit the system default.
cat > "$CONFIG" <<EOF
secrets_path: /tmp/rm-smoke/agent/var-lib/secrets.enc
EOF
# Enroll. This call talks to the server, returns the persistent
# bearer, and writes server_url/host_id/agent_token back into
# agent.yaml. secrets_key is minted on the first normal start
# (step 7), and secrets.enc is empty until the first
# config.update push lands.
./bin/restic-manager-agent \
-config "$CONFIG" \
-enroll-server http://127.0.0.1:8080 \
-enroll-token "$TOKEN"
# Read off the host_id for later steps.
HOST_ID=$(grep host_id "$CONFIG" | awk '{print $2}' | tr -d '"')
echo "host id: $HOST_ID"
```
After enrolment, `agent.yaml` should contain `host_id:` (a ULID),
`agent_token:`, and `server_url:`. It will **not** contain
`secrets_key:` yet — that's minted on the first non-enroll start
of the agent (next step). It should **not** contain `repo_url:`
or `repo_password:` (those never appear in plaintext on disk).
```sh
cat "$CONFIG"
```
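The eyeball check above can also be scripted. The sketch below assumes the keys appear as top-level `key:` lines in `agent.yaml`; `check_enrolled_config` is a hypothetical helper, not something the agent ships.

```sh
# Check agent.yaml after enrolment: required keys present, repo
# credentials absent. Assumes required keys are top-level scalars.
check_enrolled_config() {
    cfg=$1
    for key in host_id agent_token server_url; do
        grep -q "^$key:" "$cfg" || { echo "missing $key" >&2; return 1; }
    done
    # Credentials must not appear anywhere in the file, nested or not.
    for key in repo_url repo_password; do
        if grep -q "$key:" "$cfg"; then
            echo "unexpected plaintext $key" >&2
            return 1
        fi
    done
}
```

Run it as `check_enrolled_config "$CONFIG"`; a non-zero exit means the enrolment did not leave the file in the state described above.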
## 7. Run the agent
In a second terminal:
```sh
./bin/restic-manager-agent -config /tmp/rm-smoke/agent/etc/agent.yaml
```
You should see, in order:
```
agent starting host_id=01H… server=http://127.0.0.1:8080 …
ws agent connected protocol_version=…
ws agent: repo credentials updated via config.update
```
That last line confirms slices 1 and 2 of P1-32/33: the server pushed
the encrypted creds, and the agent decrypted them, persisted them to
`secrets.enc`, and is now ready to back up. `secrets.enc` should now
exist with mode 0600, and `agent.yaml` should now also contain a
freshly minted `secrets_key:` (base64-encoded 32 bytes).
```sh
ls -l /tmp/rm-smoke/agent/var-lib/secrets.enc
```
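To turn the mode check into a pass/fail result rather than a visual inspection, the permission string from `ls -ld` can be compared directly. A minimal sketch; `is_owner_only` is a hypothetical helper name.

```sh
# Succeed only if FILE is a regular file with mode 0600,
# i.e. `ls -ld` shows exactly -rw------- in the first ten columns.
is_owner_only() {
    perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    if [ "$perms" != "-rw-------" ]; then
        echo "$1: permissions are '$perms', expected -rw-------" >&2
        return 1
    fi
}
```

Here that would be `is_owner_only /tmp/rm-smoke/agent/var-lib/secrets.enc`.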
## 8. Run a backup
Back in the first terminal:
```sh
JOB=$(curl -s -b /tmp/rm-smoke/cookies -X POST \
"http://127.0.0.1:8080/api/hosts/$HOST_ID/jobs" \
-H 'content-type: application/json' \
-d '{"kind":"backup","args":["/etc/hostname","/etc/os-release"]}')
JOB_ID=$(echo "$JOB" | jq -r .job_id)
echo "job: $JOB_ID"
```
The agent terminal will show restic chugging through two tiny files;
the server terminal will log the lifecycle (`mark job started` /
`mark job finished` / `snapshots refreshed count=1`).
For a browser-driven version of the same flow, log in at
`http://127.0.0.1:8080/` and click **Run now** on the host row — the
button posts to `/hosts/{id}/run-backup` and the response sets
`HX-Redirect` to the live log page, which subscribes to
`/api/jobs/{id}/stream` (P1-26) and tails `job.progress` / `log.stream`
until `job.finished` flips it to the final header.
## 9. Confirm the snapshot
```sh
curl -s -b /tmp/rm-smoke/cookies \
"http://127.0.0.1:8080/api/hosts/$HOST_ID/snapshots" | jq
```
Expect one snapshot with the two paths and a non-zero `size_bytes`.
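
If this query runs before the backup job has finished, the snapshot will not be listed yet. A generic retry loop avoids sprinkling `sleep` calls through the script. This is a sketch; `retry` is a hypothetical helper, not part of the project.

```sh
# Run a command until it succeeds, up to MAX attempts, sleeping
# DELAY seconds between attempts. Returns 0 on the first success,
# otherwise the exit status of the final attempt.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        status=$?
        if [ "$attempt" -ge "$max" ]; then
            return "$status"
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

One way to use it is to wrap the `curl` call above in a function that exits non-zero while no snapshot is listed, then call `retry 10 2 have_snapshot`, where `have_snapshot` is that hypothetical wrapper. How the wrapper detects "no snapshot yet" depends on the response shape, which this runbook does not spell out.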
## 10. Verify the redacted credential view (sanity)
```sh
curl -s -b /tmp/rm-smoke/cookies \
"http://127.0.0.1:8080/api/hosts/$HOST_ID/repo-credentials" | jq
```
Expect `{"repo_url":"rest:http://127.0.0.1:8100/smoke/","has_password":true}`.
The password is never returned over this endpoint.
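
The same expectation can be asserted in a script. The sketch below strips whitespace so that compact and pretty-printed bodies both match; `assert_redacted` is a hypothetical helper, and the leaked field name `repo_password` is an assumption borrowed from the request bodies used elsewhere in this runbook.

```sh
# Fail if a JSON response body appears to carry the repo password,
# or if it does not report has_password:true.
# Plain string matching on a whitespace-stripped body, not a JSON parse.
assert_redacted() {
    body=$(printf '%s' "$1" | tr -d ' \t\n\r')
    case $body in
        *'"repo_password"'*)
            echo "password field present in response" >&2
            return 1 ;;
    esac
    case $body in
        *'"has_password":true'*) return 0 ;;
    esac
    echo "has_password:true not found in response" >&2
    return 1
}
```

Pass it the raw response, for example `assert_redacted "$(curl -s -b /tmp/rm-smoke/cookies "http://127.0.0.1:8080/api/hosts/$HOST_ID/repo-credentials")"`.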
## 11. Edit creds + verify push-on-update
```sh
curl -s -b /tmp/rm-smoke/cookies -X PUT \
"http://127.0.0.1:8080/api/hosts/$HOST_ID/repo-credentials" \
-H 'content-type: application/json' \
-d '{"repo_password":"new-smoke-pw"}'
```
Agent terminal should log `repo credentials updated via config.update`
again. (Backups will then fail until you also update the rest-server
auth — but that proves the push path is live.)
## Cleanup
```sh
docker compose -f /tmp/rm-smoke/compose.yaml down -v
rm -rf /tmp/rm-smoke
```
---
## What this runbook does NOT cover
These are intentionally out of scope for Phase 1; revisit when the
relevant tasks land:
- TLS termination at a reverse proxy (covered by P5-07 reference deployment)
- Append-only restic creds + separate prune credential (P2-06)
- Cancellation (P2)
- Schedule-driven backups (P2-01 onwards)
- Windows agent (P2-16/17)