restic-manager/docs/e2e-smoke.md
steve ee3ee241ea
P1 polish: agent-as-root, init-repo flow, rest creds passthrough, UX fixes
Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:

* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
  drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
  with ReadWritePaths confined to /etc + /var/lib/restic-manager;
  NoNewPrivileges blocks escalation. Install script no longer
  creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
  rationale (matches UrBackup / Veeam / Bareos defaults; trying to
  back up "everything" as an unprivileged user creates silent skips
  on /home, /root, /var/lib/* with no upside vs the threat model
  the agent already implies).

* Init-repo end-to-end. New JobKind="init" wired through agent
  runner, restic.Env.RunInit, server dispatcher, and a UI button
  (red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
  flips on init success, on backup success, or on a non-empty
  snapshots.report. The "Run now" / "Init" / "Retry" branching now
  drives both the dashboard host row and the host-detail panel.
  Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
  the safe create-new-then-rename pattern; first version corrupted
  job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
  affected DBs).

* rest-server creds embedded at exec time only. restic.Env gains
  RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
  inside envSlice() and never assigns it back to the struct, so
  nothing slog-able ever sees the cleartext form. RedactURL helper
  for any future surface that needs to log a URL safely. Both
  helpers tested.

* Add-host UX. Repo password is now optional — server mints a
  24-byte URL-safe random one and surfaces it once, alongside an
  htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
  the operator pastes one command on the rest-server host and one
  on the endpoint. Result page also links the install snippet at
  /install/install.sh (was /install.sh — 404'd before) and pipes
  to bash (not sh — script uses set -o pipefail and other
  bashisms; on Debian/Ubuntu sh is dash).

* Late-subscriber race in JobHub. A fast-failing job could finish
  (DB write + Broadcast) before the browser's HX-Redirect → page
  load → WS-connect path completed, so the JS sat forever waiting
  on a job.finished that already passed. JobHub split into
  Register + Send + Run; handleJobStream now subscribes first,
  re-fetches the job, and sends a synthetic job.finished if the
  state is already terminal.

* HTMX error visibility. New toast partial listens to
  htmx:responseError and surfaces the response body as a
  bottom-right toast — every server-side validation error now
  becomes visible without per-handler JS wiring. Also handles
  custom rm:toast events for future server-pushed notifications
  via the HX-Trigger header. Themed via existing CSS vars.

* Dashboard rows are now whole-row clickable to host detail
  (CSS card-link pattern: absolute-positioned anchor + .row-action
  z-index restoration so the action button stays clickable).
  "View →" on a running job links to /jobs/<id> rather than
  /hosts/<id> since the row click already covers the host page.

* "Run first" / "Run first backup" → "Run now" everywhere for
  consistency.

* runbook (docs/e2e-smoke.md) updated — live-log streaming step
  now reflects P1-26; mentions the browser-driven Run-now flow.

* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
  it up; .gitignore now excludes /_diag/ entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:02:12 +01:00

End-to-end smoke test (P1-34)

A runbook for verifying the Phase 1 happy path against a real restic/rest-server. Run this on any Linux host with Docker; nothing here touches your real Proxmox cluster or Unraid storage.

The test exercises:

  1. Operator mints an enrollment token with repo creds (P1-32).
  2. Agent enrols, server burns the token, host_credentials row lands.
  3. Agent connects over WS, server pushes config.update containing the decrypted creds before the agent sees any command.
  4. Agent persists creds into secrets.enc (P1-33).
  5. Run-now backup against the live restic/rest-server.
  6. snapshots.report updates the per-host projection.
  7. GET /api/hosts/{id}/snapshots returns the new snapshot.

Total time: ~5 minutes on a warm machine.


Prereqs

  • Docker + Docker Compose
  • restic v0.16+ on the host running the agent (the agent does not install it; that's a deliberate design choice — see spec §4.2)
  • curl, jq
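
A quick preflight can catch a missing tool before step 1. The sketch below is not part of the repository: missing_tools and version_ge are hypothetical helper names, and the version comparison assumes a sort that supports -V (GNU coreutils or busybox).

```bash
# Preflight sketch (hypothetical helpers, not shipped with restic-manager).

# missing_tools TOOL...: report each tool not found on PATH; fail if any is missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}

# version_ge A B: succeed when dotted version A is >= B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
#   missing_tools docker restic curl jq &&
#     version_ge "$(restic version | awk '{print $2}')" 0.16
```

Docker Compose is usually a plugin rather than a binary on PATH, so it is worth checking separately with docker compose version.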

Layout

Everything lives under /tmp/rm-smoke/. Nothing escapes it; remove the directory to clean up.

/tmp/rm-smoke/
├── compose.yaml            # rest-server + control-plane
├── data/                   # control-plane SQLite + secret key
│   └── agent-binaries/     # built agent binaries served by /agent/binary
├── rest/                   # rest-server data volume
│   └── htpasswd
└── agent/                  # this host plays the part of an endpoint
    ├── etc/                # stands in for /etc/restic-manager
    └── var-lib/            # stands in for /var/lib/restic-manager
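
The directory skeleton above can be created in one go. A minimal sketch, assuming the directory names shown in the tree; make_layout is a hypothetical helper, and the root is a parameter so the same function also works against a throwaway directory.

```bash
# make_layout ROOT: create the smoke-test directory skeleton under ROOT.
# compose.yaml itself is written in step 2, so only directories are made here.
make_layout() {
  local root="$1"
  mkdir -p \
    "$root/data/agent-binaries" \
    "$root/rest" \
    "$root/agent/etc" \
    "$root/agent/var-lib"
}

# Example:
#   make_layout /tmp/rm-smoke
```

The mkdir -p calls in the later steps stay harmless after this, since -p does not complain about existing directories.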

1. Build the binaries

mkdir -p /tmp/rm-smoke/data/agent-binaries
cd ~/src/restic-manager
make build
cp bin/restic-manager-agent /tmp/rm-smoke/data/agent-binaries/restic-manager-agent-linux-amd64

The server's /agent/binary?os=linux&arch=amd64 resolves to that path.

2. Compose the stack

/tmp/rm-smoke/compose.yaml:

services:
  rest-server:
    image: restic/rest-server:latest
    restart: unless-stopped
    environment:
      - OPTIONS=--no-auth   # smoke-test only; real deploys use --append-only + htpasswd
    ports:
      # Mapped to 8100 because most dev boxes already have something
      # on 8000. Use any free port; just keep the URLs below in sync.
      - "127.0.0.1:8100:8000"
    volumes:
      - ./rest:/data

  control-plane:
    image: ghcr.io/dcglab/restic-manager:dev   # or build locally; see §1
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - ./data:/data
    environment:
      - RM_LISTEN=:8080
      - RM_DATA_DIR=/data
      - RM_BASE_URL=http://127.0.0.1:8080
      - RM_SECRET_KEY_FILE=/data/secret.key
      - RM_COOKIE_SECURE=false   # smoke-test only — we're on plain HTTP

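Bring the stack up with docker compose -f /tmp/rm-smoke/compose.yaml up -d. For a local-only smoke run, start only rest-server that way (append rest-server to the command), skip the control-plane image, and run the server straight from the binary instead, pointing it at /tmp/rm-smoke/data: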

RM_LISTEN=:8080 RM_DATA_DIR=/tmp/rm-smoke/data \
RM_SECRET_KEY_FILE=/tmp/rm-smoke/data/secret.key \
RM_COOKIE_SECURE=false \
./bin/restic-manager-server

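Either way, watch the server's stderr for the bootstrap token (for the container, docker compose -f /tmp/rm-smoke/compose.yaml logs control-plane). It is printed on first run and used in the next step.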
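
Before the bootstrap call in the next step, it can help to confirm that both services are answering. The helper below is a sketch with a hypothetical name (wait_for_http); it treats any HTTP response as "up", so it makes no assumption about which paths the servers expose.

```bash
# wait_for_http URL [ATTEMPTS]: poll URL until it returns any HTTP response.
# Without -f, curl exits 0 on any status code, so a 401 or 404 still counts as up.
wait_for_http() {
  local url="$1" attempts="${2:-30}" tries=0
  until curl -s -o /dev/null --max-time 2 "$url"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$attempts" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
  done
}

# Example:
#   wait_for_http http://127.0.0.1:8100/ && wait_for_http http://127.0.0.1:8080/
```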

3. Bootstrap the admin account

BOOTSTRAP_TOKEN='<paste from server logs>'
curl -s -X POST http://127.0.0.1:8080/api/bootstrap \
  -H 'content-type: application/json' \
  -d "{\"token\":\"$BOOTSTRAP_TOKEN\",\"username\":\"admin\",\"password\":\"correct horse battery staple\"}"
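
Hand-escaped quotes inside -d break as soon as the token or password contains a double quote or a backslash. One way around that, sketched here with a hypothetical bootstrap_body helper, is to let jq build the body; the field names are the ones used in the request above.

```bash
# bootstrap_body TOKEN USERNAME PASSWORD: emit the bootstrap request body as JSON.
# jq does the quoting, so special characters in any argument are escaped correctly.
bootstrap_body() {
  jq -cn --arg token "$1" --arg username "$2" --arg password "$3" \
    '{token: $token, username: $username, password: $password}'
}

# Example:
#   curl -s -X POST http://127.0.0.1:8080/api/bootstrap \
#     -H 'content-type: application/json' \
#     -d "$(bootstrap_body "$BOOTSTRAP_TOKEN" admin 'correct horse battery staple')"
```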

4. Mint an enrollment token (with repo creds)

curl -s -c /tmp/rm-smoke/cookies -X POST http://127.0.0.1:8080/api/auth/login \
  -H 'content-type: application/json' \
  -d '{"username":"admin","password":"correct horse battery staple"}'

ENROLL=$(curl -s -b /tmp/rm-smoke/cookies -X POST http://127.0.0.1:8080/api/enrollment-tokens \
  -H 'content-type: application/json' \
  -d '{
    "hostname":"smoke-host",
    "repo_url":"rest:http://127.0.0.1:8100/smoke/",
    "repo_username":"",
    "repo_password":"smoke-pw"
  }')
TOKEN=$(echo "$ENROLL" | jq -r .token)
echo "token: $TOKEN"

If the server rejects with missing_field, you forgot repo_url/repo_password — both are required (P1-32).
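
As written, jq -r .token prints the literal string null when the request was rejected, and the agent would then try to enrol with it. A small guard, sketched below under the assumption that a successful response carries a non-null token field (as the command above implies), fails loudly instead. require_token is a hypothetical name, and the shape of the error body is not assumed; it is simply echoed back.

```bash
# require_token: read the enrollment response on stdin and print .token.
# Fails, showing the raw response, when the field is absent, null, or the
# body is not JSON at all.
require_token() {
  local body
  body=$(cat)
  if ! printf '%s' "$body" | jq -er '.token // empty' 2>/dev/null; then
    echo "no token in response: $body" >&2
    return 1
  fi
}

# Example:
#   TOKEN=$(printf '%s' "$ENROLL" | require_token) || exit 1
```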

5. Initialise the rest-server repo

restic/rest-server will lazy-create the path on first write, but restic itself wants the repo initialised:

RESTIC_PASSWORD=smoke-pw \
restic -r rest:http://127.0.0.1:8100/smoke/ init
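
restic init fails when the repository already has a config, which gets in the way when this step is re-run against an existing rest/ directory. A sketch of an idempotent variant follows; ensure_repo is a hypothetical helper, and it uses restic cat config as the existence probe.

```bash
# ensure_repo REPO: initialise REPO only when it has no readable config yet.
# Expects RESTIC_PASSWORD to be exported by the caller.
ensure_repo() {
  local repo="$1"
  if restic -r "$repo" cat config >/dev/null 2>&1; then
    echo "already initialised: $repo"
  else
    restic -r "$repo" init
  fi
}

# Example:
#   export RESTIC_PASSWORD=smoke-pw
#   ensure_repo rest:http://127.0.0.1:8100/smoke/
```

Note that the probe also fails on a wrong password or an unreachable server; in those cases the init call that follows reports the underlying error.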

6. Pretend to be a fresh endpoint

The agent will write agent.yaml under /tmp/rm-smoke/agent/etc and secrets.enc under /tmp/rm-smoke/agent/var-lib. Pointing both paths there keeps the smoke run isolated from your real /etc/restic-manager.

mkdir -p /tmp/rm-smoke/agent/etc /tmp/rm-smoke/agent/var-lib
CONFIG=/tmp/rm-smoke/agent/etc/agent.yaml

# Pre-write the secrets path so we don't hit the system default.
cat > "$CONFIG" <<EOF
secrets_path: /tmp/rm-smoke/agent/var-lib/secrets.enc
EOF

# Enroll. This call talks to the server, obtains the persistent
# bearer token, and writes server_url/host_id/agent_token back
# into agent.yaml. secrets_key is not written at this point, and
# secrets.enc is empty until the first config.update push lands.
./bin/restic-manager-agent \
  -config "$CONFIG" \
  -enroll-server http://127.0.0.1:8080 \
  -enroll-token "$TOKEN"

# Read off the host_id for later steps.
HOST_ID=$(awk '$1 == "host_id:" {print $2}' "$CONFIG" | tr -d '"')
echo "host id: $HOST_ID"

After enrolment, agent.yaml should contain host_id: (a ULID), agent_token:, and server_url:. It will not contain secrets_key: yet — that's minted on the first non-enroll start of the agent (next step). It should not contain repo_url: or repo_password: (those never appear in plaintext on disk).

cat "$CONFIG"
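
The expectations above can be checked mechanically rather than by eye. This is a sketch only: check_agent_yaml is a hypothetical helper, and it assumes agent.yaml is a flat file of top-level key: value lines, which is how the keys are described here.

```bash
# check_agent_yaml FILE: verify the post-enrolment keys are present and that
# no repo credentials ended up in the file in plaintext.
check_agent_yaml() {
  local file="$1" key rc=0
  for key in host_id agent_token server_url; do
    if ! grep -q "^${key}:" "$file"; then
      echo "missing key: $key" >&2
      rc=1
    fi
  done
  for key in repo_url repo_password; do
    if grep -q "^${key}:" "$file"; then
      echo "unexpected plaintext key: $key" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   check_agent_yaml "$CONFIG" && echo "agent.yaml looks right"
```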

7. Run the agent

In a second terminal:

./bin/restic-manager-agent -config /tmp/rm-smoke/agent/etc/agent.yaml

You should see, in order:

agent starting host_id=01H… server=http://127.0.0.1:8080 …
ws agent connected protocol_version=…
ws agent: repo credentials updated via config.update

That last line confirms slices 1 and 2 of P1-32/33: the server pushed the creds, and the agent persisted them to secrets.enc and is now ready to back up. secrets.enc should now exist with mode 0600. agent.yaml should now also contain a freshly minted secrets_key: (base64-encoded 32 bytes).

ls -l /tmp/rm-smoke/agent/var-lib/secrets.enc
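
Reading the mode off ls -l works, but a scripted run may prefer a hard failure. A minimal sketch, with assert_mode as a hypothetical helper; it relies on stat -c, which exists in GNU coreutils and busybox but not in BSD stat.

```bash
# assert_mode FILE MODE: fail unless FILE exists and has exactly the octal MODE.
assert_mode() {
  local file="$1" want="$2" got
  got=$(stat -c '%a' "$file") || return 1
  if [ "$got" != "$want" ]; then
    echo "$file has mode $got, expected $want" >&2
    return 1
  fi
}

# Example:
#   assert_mode /tmp/rm-smoke/agent/var-lib/secrets.enc 600
#   grep -q '^secrets_key:' /tmp/rm-smoke/agent/etc/agent.yaml
```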

8. Run a backup

Back in the first terminal:

JOB=$(curl -s -b /tmp/rm-smoke/cookies -X POST \
  "http://127.0.0.1:8080/api/hosts/$HOST_ID/jobs" \
  -H 'content-type: application/json' \
  -d '{"kind":"backup","args":["/etc/hostname","/etc/os-release"]}')
JOB_ID=$(echo "$JOB" | jq -r .job_id)
echo "job: $JOB_ID"

The agent terminal will show restic chugging through two tiny files; the server terminal will log the lifecycle (mark job started / mark job finished / snapshots refreshed count=1).
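
The POST returns as soon as the job is queued, so a scripted run has to wait before checking for the snapshot. A generic retry wrapper is one way to do that; retry and has_snapshot below are hypothetical names, and the commented example assumes the snapshots endpoint returns a JSON array, which this runbook does not state.

```bash
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping DELAY seconds
# between tries; fail once CMD has failed ATTEMPTS times.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (response shape is an assumption):
#   has_snapshot() {
#     curl -sf -b /tmp/rm-smoke/cookies \
#       "http://127.0.0.1:8080/api/hosts/$HOST_ID/snapshots" |
#       jq -e 'length > 0' >/dev/null
#   }
#   retry 30 1 has_snapshot
```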

For a browser-driven version of the same flow, log in at http://127.0.0.1:8080/ and click Run now on the host row. The button posts to /hosts/{id}/run-backup, and the response sets HX-Redirect to the live log page. That page subscribes to /api/jobs/{id}/stream (P1-26) and tails job.progress / log.stream until job.finished flips it to the final header.

9. Confirm the snapshot

curl -s -b /tmp/rm-smoke/cookies \
  "http://127.0.0.1:8080/api/hosts/$HOST_ID/snapshots" | jq

Expect one snapshot with the two paths and a non-zero size_bytes.

10. Verify the redacted credential view (sanity)

curl -s -b /tmp/rm-smoke/cookies \
  "http://127.0.0.1:8080/api/hosts/$HOST_ID/repo-credentials" | jq

Expect {"repo_url":"rest:http://127.0.0.1:8100/smoke/","has_password":true}. The password is never returned over this endpoint.
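
The same expectation can be turned into an assertion. The sketch below uses a hypothetical check_redacted helper and only the field names shown in the expected response; treating a repo_password field as the sign of a leak is an assumption about how a regression would look.

```bash
# check_redacted: read the repo-credentials response on stdin and succeed only
# if has_password is true and no repo_password field is present.
check_redacted() {
  jq -e '.has_password == true and (has("repo_password") | not)' >/dev/null
}

# Example:
#   curl -s -b /tmp/rm-smoke/cookies \
#     "http://127.0.0.1:8080/api/hosts/$HOST_ID/repo-credentials" | check_redacted
```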

11. Edit creds + verify push-on-update

curl -s -b /tmp/rm-smoke/cookies -X PUT \
  "http://127.0.0.1:8080/api/hosts/$HOST_ID/repo-credentials" \
  -H 'content-type: application/json' \
  -d '{"repo_password":"new-smoke-pw"}'

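The agent terminal should log repo credentials updated via config.update again. (Backups will then fail until the repository side is updated to match, since the repo was initialised with smoke-pw in step 5 and this rest-server runs with --no-auth. That failure is expected; the log line is what proves the push path is live.)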

Cleanup

docker compose -f /tmp/rm-smoke/compose.yaml down -v
rm -rf /tmp/rm-smoke

What this runbook does NOT cover

These are intentionally out of scope for Phase 1; revisit when the relevant tasks land:

  • TLS termination at a reverse proxy (covered by P5-07 reference deployment)
  • Append-only restic creds + separate prune credential (P2-06)
  • Cancellation (P2)
  • Schedule-driven backups (P2-01 onwards)
  • Windows agent (P2-16/17)