From cc638f6456b9046c9cba1e75f2c8153dc5f803fb Mon Sep 17 00:00:00 2001
From: Steve Cliff
Date: Sat, 9 May 2026 12:18:42 +0100
Subject: [PATCH] Added new AI focused document for host onboarding

---
 docs/agent-host-onboarding.md | 249 ++++++++++++++++++++++++++++++++++
 1 file changed, 249 insertions(+)
 create mode 100644 docs/agent-host-onboarding.md

diff --git a/docs/agent-host-onboarding.md b/docs/agent-host-onboarding.md
new file mode 100644
index 0000000..5c306fd
--- /dev/null
+++ b/docs/agent-host-onboarding.md
@@ -0,0 +1,249 @@
+# Onboarding a new host — agent instructions
+
+How an automation agent (with a username + password for the
+restic-manager server) brings a new host fully online.
+
+The flow is two roles:
+
+- **Controller side**: the agent calls JSON APIs on the
+  restic-manager server. Needs network reach to the server, plus
+  username/password.
+- **Target side**: the host being onboarded runs the install
+  script, which calls back to the server with the one-time token.
+
+If the agent is *both* sides (e.g. it can SSH into the target),
+it does steps 1–2 against the server and steps 3–4 against the
+target. If the agent only controls the server, it stops at
+step 2 and hands the install snippet to whoever owns the target.
+
+---
+
+## Conventions
+
+- Base URL: `$RM_SERVER` (e.g. `https://restic.lab.example`).
+- Session cookie jar: persist `rm_session` between calls.
+- All request/response bodies are JSON unless noted.
+- On any non-2xx, response body is
+  `{"code": "...", "message": "..."}`.
+
+---
+
+## 1. Login
+
+```
+POST $RM_SERVER/api/auth/login
+Content-Type: application/json
+
+{"username": "...", "password": "..."}
+```
+
+→ 200 with `{"user_id": "...", "role": "..."}` and a `Set-Cookie:
+rm_session=...` (HttpOnly, 24h TTL). Persist the cookie; reuse
+it on every subsequent call.
+
+Required role for the next step: **operator** or **admin**.
+A viewer-only login can read but cannot mint tokens.
+
+Session expires at 24h. On 401 from a later call, re-login.
+
+---
+
+## 2. Mint an enrolment token
+
+```
+POST $RM_SERVER/api/enrollment-tokens
+Cookie: rm_session=...
+Content-Type: application/json
+
+{
+  "hostname": "newhost.example",
+  "tags": ["prod", "london"],  // optional
+  "repo_url": "rest:https://rest.example/newhost",
+  "repo_username": "...",  // optional, for rest-server / S3
+  "repo_password": "...",  // optional
+  "initial_paths": ["/etc", "/home", "/var/lib"]  // optional; default source group
+}
+```
+
+→ 200 with:
+
+```json
+{ "token": "", "expires_at": "2026-05-09T..." }
+```
+
+**Capture `token` immediately — the server only stores its hash
+and will never return the raw value again.** TTL is 1 hour.
+
+The repo creds you provided are encrypted under the token hash
+and pre-attached to the host. The agent will fetch and store
+them at enrol-time; you will not need to push them again.
+
+If you lose the token before the install runs, mint a new one
+(the existing one becomes irrelevant; you can leave it to expire
+or revoke it via the UI).
+
+---
+
+## 3. Install on the target host
+
+The install script is hosted by the server itself. Running on the
+target:
+
+### Linux
+
+```
+curl -fsSL $RM_SERVER/install/install.sh | \
+  sudo RM_SERVER=$RM_SERVER RM_TOKEN= bash
+```
+
+What it does, end-to-end:
+
+1. detects arch (amd64 / arm64)
+2. downloads `$RM_SERVER/agent/binary?os=linux&arch=` to
+   `/usr/local/bin/restic-manager-agent`
+3. creates `/etc/restic-manager/` and `/var/lib/restic-manager/`
+   (root:root, 0700)
+4. calls `POST /api/agents/enroll` with the token; server returns
+   the persistent agent bearer + `host_id`, written to
+   `/etc/restic-manager/agent.env`
+5. installs the systemd unit, `daemon-reload`, `enable --now`
+6. surfaces any pre-existing restic cron/timer entries so the
+   operator can decide whether to disable them (script does
+   *not* touch them automatically)
+
+The script is idempotent. Re-running on an already-enrolled host
+is a no-op unless `RM_FORCE_REENROLL=1`.
+
+The agent runs as **root** by design — fleet backup needs to
+read every file on the system. See
+`deploy/install/restic-manager-agent.service` for rationale.
+
+### Windows
+
+```
+iwr $RM_SERVER/install/install.ps1 -UseBasicParsing | iex
+# (or download + run; needs an elevated PowerShell)
+# Required env: $env:RM_SERVER, $env:RM_TOKEN
+```
+
+Same flow, lays down a Windows service instead of a systemd unit.
+
+### Manual / non-script enrolment
+
+If the install script can't be used, the wire-level enrol call is:
+
+```
+POST $RM_SERVER/api/agents/enroll
+Content-Type: application/json
+
+{
+  "token": "",
+  "hostname": "newhost.example",
+  "os": "linux",  // linux | windows
+  "arch": "amd64",  // amd64 | arm64
+  "agent_version": "...",
+  "restic_version": "..."
+}
+```
+
+→ 200 with
+`{"host_id": "...", "agent_token": "...", "cert_pin_sha256": "..."}`.
+
+The agent_token goes into `/etc/restic-manager/agent.env` as
+`RM_AGENT_TOKEN=...`; subsequent agent → server traffic uses
+`Authorization: Bearer $RM_AGENT_TOKEN`.
+
+---
+
+## 4. Verify the host is healthy
+
+Poll until both conditions are true. Cap at ~5 minutes.
+
+```
+GET $RM_SERVER/api/hosts
+Cookie: rm_session=...
+```
+
+→ array of host objects. Find the one with the matching hostname
+and check:
+
+- `"status": "online"` — agent connected to the WS heartbeat
+- `"repo_status": "ready"` — `restic init` (or existing-config
+  detection) completed successfully
+
+If `repo_status` settles on `"init_failed"`, the repo creds are
+wrong or the repo URL is unreachable from the target. Inspect
+the matching job log:
+
+```
+GET $RM_SERVER/api/hosts//jobs (most recent init job)
+GET $RM_SERVER/api/jobs/ (full output)
+```
+
+Fix the creds with a creds-update call (see Settings → Repo on
+the UI for the exact route — currently form-only) or revoke the
+host and start over.
+
+---
+
+## 5. (Optional) configure schedules
+
+A new host gets one default source group covering `initial_paths`
+(or `/etc`,`/home` if you didn't pass any) and **no schedule**.
+Backups won't run until either:
+
+- a schedule is attached (cron expression, retention, etc.), or
+- you trigger an on-demand run via the source-group "Run now"
+  endpoint.
+
+These are not yet exposed cleanly as JSON-only routes; if the
+agent needs them, look at `internal/server/http/schedules*.go`
+and `internal/server/http/source_groups*.go` — most are
+JSON-capable, some are form-only with HTML 303 responses.
+
+---
+
+## Failure modes — quick reference
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| `401` on `/api/enrollment-tokens` | session expired or viewer role | re-login as operator+ |
+| install.sh fails at "enrol": HTTP 410 | token expired (>1h) or already used | mint a fresh token |
+| Host shows `status=offline` after install | systemd unit didn't start; firewall blocks WS | `systemctl status restic-manager-agent`, check `$RM_SERVER` reachability |
+| `repo_status=init_failed` | bad repo creds or URL | inspect init job log; fix creds; retry probe via `/hosts/{id}/repo/probe` |
+| Token list grows with stale rows | normal — they expire at 1h | optional cleanup via `/hosts/enrollment-tokens/{hash}/revoke` |
+
+---
+
+## Minimum reproducible script
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+: "${RM_SERVER:?}" "${RM_USER:?}" "${RM_PASS:?}" "${RM_HOSTNAME:?}" \
+  "${RM_REPO_URL:?}" "${RM_REPO_USER:?}" "${RM_REPO_PASS:?}"
+
+JAR=$(mktemp)
+trap 'rm -f "$JAR"' EXIT
+
+# 1. login (body built with jq so quotes or backslashes in the credentials stay valid JSON)
+curl -fsS -c "$JAR" -H 'Content-Type: application/json' \
+  -d "$(jq -nc --arg u "$RM_USER" --arg p "$RM_PASS" '{username:$u, password:$p}')" \
+  "$RM_SERVER/api/auth/login" >/dev/null
+
+# 2. mint token
+TOKEN=$(curl -fsS -b "$JAR" -H 'Content-Type: application/json' \
+  -d "$(jq -nc \
+    --arg h "$RM_HOSTNAME" --arg u "$RM_REPO_USER" \
+    --arg p "$RM_REPO_PASS" --arg r "$RM_REPO_URL" \
+    '{hostname:$h, repo_url:$r, repo_username:$u, repo_password:$p}')" \
+  "$RM_SERVER/api/enrollment-tokens" | jq -r .token)
+
+# 3. emit the install snippet for the target machine
+cat <