docs: AI-agent host onboarding guide #25

Merged
steve merged 1 commits from temp-onboarding into main 2026-05-09 12:22:55 +01:00
+249
@@ -0,0 +1,249 @@
# Onboarding a new host — agent instructions
How an automation agent (with a username + password for the
restic-manager server) brings a new host fully online.
The flow has two roles:
- **Controller side**: the agent calls JSON APIs on the
restic-manager server. Needs network reach to the server, plus
username/password.
- **Target side**: the host being onboarded runs the install
script, which calls back to the server with the one-time token.
If the agent is *both* sides (e.g. it can SSH into the target),
it does steps 1-2 against the server and steps 3-4 against the
target. If the agent only controls the server, it stops at
step 2 and hands the install snippet to whoever owns the target.
---
## Conventions
- Base URL: `$RM_SERVER` (e.g. `https://restic.lab.example`).
- Session cookie jar: persist `rm_session` between calls.
- All request/response bodies are JSON unless noted.
- On any non-2xx, response body is
`{"code": "...", "message": "..."}`.
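The cookie-jar and error-body conventions above can be wrapped in one small helper. This is a minimal sketch, not part of restic-manager: the function name `rm_api` and the `RM_JAR` variable are hypothetical, and it only assumes what the conventions state (JSON bodies, a `rm_session` cookie, a JSON error body on any non-2xx).

```bash
#!/usr/bin/env bash
# rm_api METHOD PATH [JSON_BODY]: hypothetical helper, not part of restic-manager.
# Expects RM_SERVER (base URL) and RM_JAR (cookie jar file) to be set by the caller.
rm_api() {
  local method=$1 path=$2 body=${3:-} out status
  if [ -n "$body" ]; then
    out=$(curl -sS -X "$method" -b "$RM_JAR" -c "$RM_JAR" -w '\n%{http_code}' \
      -H 'Content-Type: application/json' -d "$body" "$RM_SERVER$path") || return 2
  else
    out=$(curl -sS -X "$method" -b "$RM_JAR" -c "$RM_JAR" -w '\n%{http_code}' \
      "$RM_SERVER$path") || return 2
  fi
  status=$(printf '%s\n' "$out" | tail -n 1)   # last line: HTTP status code
  out=$(printf '%s\n' "$out" | sed '$d')       # everything before it: response body
  case $status in
    2??) printf '%s\n' "$out" ;;                                    # success: body on stdout
    *)   printf 'HTTP %s: %s\n' "$status" "$out" >&2; return 1 ;;   # error body on stderr
  esac
}
```

With this in place, `rm_api GET /api/hosts` prints the response body on success, and on failure prints the status plus the server's `{"code": ..., "message": ...}` body to stderr and returns non-zero.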
---
## 1. Login
```
POST $RM_SERVER/api/auth/login
Content-Type: application/json
{"username": "...", "password": "..."}
```
→ 200 with `{"user_id": "...", "role": "..."}` and a `Set-Cookie:
rm_session=...` (HttpOnly, 24h TTL). Persist the cookie; reuse
it on every subsequent call.
Required role for the next step: **operator** or **admin**.
A viewer-only login can read but cannot mint tokens.
The session expires after 24h. On a 401 from a later call, log in again.
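The "log in again on 401" rule can be folded into the request path so callers never see an expired session. The sketch below is one way to do it, under assumptions: `rm_login`, `rm_get`, `RM_JAR`, and `RM_LOGIN_JSON` are hypothetical names, and `RM_LOGIN_JSON` is expected to already hold a valid JSON login body.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. Expect RM_SERVER, RM_JAR (cookie jar file) and
# RM_LOGIN_JSON (the already-encoded login body) to be set by the caller.
rm_login() {
  curl -fsS -c "$RM_JAR" -H 'Content-Type: application/json' \
    -d "$RM_LOGIN_JSON" "$RM_SERVER/api/auth/login" >/dev/null
}

# rm_get PATH: GET with the session cookie; on a 401, log in again and retry once.
rm_get() {
  local path=$1 tmp status attempt
  tmp=$(mktemp) || return 2
  for attempt in 1 2; do
    status=$(curl -sS -o "$tmp" -w '%{http_code}' -b "$RM_JAR" "$RM_SERVER$path") \
      || { rm -f "$tmp"; return 2; }
    # Only a 401 on the first attempt triggers a re-login; anything else ends the loop.
    [ "$status" = 401 ] && [ "$attempt" = 1 ] || break
    rm_login || { rm -f "$tmp"; return 2; }
  done
  cat "$tmp"; rm -f "$tmp"
  case $status in 2??) return 0 ;; *) return 1 ;; esac
}
```

The retry is capped at one so that wrong credentials produce a single failed login rather than a loop.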
---
## 2. Mint an enrolment token
```
POST $RM_SERVER/api/enrollment-tokens
Cookie: rm_session=...
Content-Type: application/json
{
"hostname": "newhost.example",
"tags": ["prod", "london"], // optional
"repo_url": "rest:https://rest.example/newhost",
"repo_username": "...", // optional, for rest-server / S3
"repo_password": "...", // optional
"initial_paths": ["/etc", "/home", "/var/lib"] // optional; default source group
}
```
→ 200 with:
```json
{ "token": "<RAW_ONE_TIME_TOKEN>", "expires_at": "2026-05-09T..." }
```
**Capture `token` immediately — the server only stores its hash
and will never return the raw value again.** TTL is 1 hour.
The repo creds you provided are encrypted under the token hash
and pre-attached to the host. The restic-manager agent on the target
fetches and stores them at enrol time; you will not need to push
them again.
If you lose the token before the install runs, mint a new one
(the existing one becomes irrelevant; you can leave it to expire
or revoke it via the UI).
---
## 3. Install on the target host
The install script is hosted by the server itself. Run it on the
target:
### Linux
```
curl -fsSL $RM_SERVER/install/install.sh | \
sudo RM_SERVER=$RM_SERVER RM_TOKEN=<RAW_ONE_TIME_TOKEN> bash
```
What it does, end-to-end:
1. detects arch (amd64 / arm64)
2. downloads `$RM_SERVER/agent/binary?os=linux&arch=<arch>` to
`/usr/local/bin/restic-manager-agent`
3. creates `/etc/restic-manager/` and `/var/lib/restic-manager/`
(root:root, 0700)
4. calls `POST /api/agents/enroll` with the token; server returns
the persistent agent bearer + `host_id`, written to
`/etc/restic-manager/agent.env`
5. installs the systemd unit, `daemon-reload`, `enable --now`
6. surfaces any pre-existing restic cron/timer entries so the
operator can decide whether to disable them (script does
*not* touch them automatically)
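Steps 1 and 2 of the list above can be sketched as follows. This is not the real `install.sh`: the function names are hypothetical, and the mapping from `uname -m` values to `amd64` / `arm64` is an assumption about how the script detects the architecture.

```bash
#!/usr/bin/env bash
# rm_arch [MACHINE]: map a `uname -m` value to the arch names the guide lists.
# Hypothetical helper; the real install.sh may detect the arch differently.
rm_arch() {
  case ${1:-$(uname -m)} in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: ${1:-$(uname -m)}" >&2; return 1 ;;
  esac
}

# rm_binary_url SERVER [MACHINE]: build the download URL from step 2.
rm_binary_url() {
  local arch
  arch=$(rm_arch "${2:-}") || return 1
  printf '%s/agent/binary?os=linux&arch=%s\n' "$1" "$arch"
}
```

For example, `rm_binary_url "$RM_SERVER"` on an x86_64 host prints `$RM_SERVER/agent/binary?os=linux&arch=amd64`, and an unrecognised machine type fails instead of guessing.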
The script is idempotent. Re-running on an already-enrolled host
is a no-op unless `RM_FORCE_REENROLL=1`.
The agent runs as **root** by design — fleet backup needs to
read every file on the system. See
`deploy/install/restic-manager-agent.service` for rationale.
### Windows
```
iwr $RM_SERVER/install/install.ps1 -UseBasicParsing | iex
# (or download + run; needs an elevated PowerShell)
# Required env: $env:RM_SERVER, $env:RM_TOKEN
```
Same flow, but it installs a Windows service instead of a systemd unit.
### Manual / non-script enrolment
If the install script can't be used, the wire-level enrol call is:
```
POST $RM_SERVER/api/agents/enroll
Content-Type: application/json
{
"token": "<RAW_ONE_TIME_TOKEN>",
"hostname": "newhost.example",
"os": "linux", // linux | windows
"arch": "amd64", // amd64 | arm64
"agent_version": "...",
"restic_version": "..."
}
```
→ 200 with
`{"host_id": "...", "agent_token": "...", "cert_pin_sha256": "..."}`.
The agent_token goes into `/etc/restic-manager/agent.env` as
`RM_AGENT_TOKEN=...`; subsequent agent → server traffic uses
`Authorization: Bearer $RM_AGENT_TOKEN`.
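When enrolling by hand, the bearer still has to be stored and reused the way the paragraph above describes. A minimal sketch, assuming the target directory already exists; the function names are hypothetical, and the `0600` file mode is an assumption (the guide only specifies `0700` on the directory).

```bash
#!/usr/bin/env bash
# write_agent_env FILE TOKEN: store the bearer so only the owner can read it.
# Hypothetical helper; the 0600 mode is an assumption, not documented behaviour.
write_agent_env() {
  local file=$1 token=$2
  # umask covers a newly created file; chmod covers a file that already existed.
  ( umask 077 && printf 'RM_AGENT_TOKEN=%s\n' "$token" > "$file" ) || return 1
  chmod 600 "$file"
}

# agent_auth_header FILE: print the header used for agent -> server traffic.
agent_auth_header() {
  local token
  token=$(sed -n 's/^RM_AGENT_TOKEN=//p' "$1") || return 1
  [ -n "$token" ] || return 1
  printf 'Authorization: Bearer %s\n' "$token"
}
```

A manual request from the target could then look like `curl -H "$(agent_auth_header /etc/restic-manager/agent.env)" ...`, with the route left to whatever the agent actually calls.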
---
## 4. Verify the host is healthy
Poll until both conditions are true. Cap at ~5 minutes.
```
GET $RM_SERVER/api/hosts
Cookie: rm_session=...
```
→ array of host objects. Find the one with the matching hostname
and check:
- `"status": "online"`: the agent is connected to the WS heartbeat
- `"repo_status": "ready"`: `restic init` (or existing-config
detection) completed successfully
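The poll-with-a-cap loop can be sketched as two small functions. This is an illustration under assumptions: `host_ready` and `poll_until` are hypothetical names, and the host object is assumed to expose its name in a `hostname` field (the guide only says to match on hostname).

```bash
#!/usr/bin/env bash
# host_ready HOSTNAME: read the /api/hosts array on stdin; succeed only when the
# matching host is online and its repo is ready. The "hostname" field name is
# an assumption about the response shape.
host_ready() {
  jq -e --arg h "$1" \
    'any(.[]; .hostname == $h and .status == "online" and .repo_status == "ready")' \
    >/dev/null
}

# poll_until TIMEOUT_S INTERVAL_S CMD...: rerun CMD until it succeeds or time is up.
poll_until() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  until "$@"; do
    waited=$((waited + interval))
    if [ "$waited" -gt "$timeout" ]; then
      return 1                      # gave up: the cap was reached
    fi
    sleep "$interval"
  done
}

# Example wiring (not run here); RM_JAR and RM_HOSTNAME are caller-provided:
#   check() { curl -fsS -b "$RM_JAR" "$RM_SERVER/api/hosts" | host_ready "$RM_HOSTNAME"; }
#   poll_until 300 10 check || echo "host not healthy after 5 minutes" >&2
```

Keeping the readiness test separate from the loop makes it easy to add an early exit later, for example stopping as soon as `repo_status` reports `init_failed`.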
If `repo_status` settles on `"init_failed"`, the repo creds are
wrong or the repo URL is unreachable from the target. Inspect
the matching job log:
```
GET $RM_SERVER/api/hosts/<host_id>/jobs (most recent init job)
GET $RM_SERVER/api/jobs/<job_id> (full output)
```
Fix the creds with a creds-update call (see Settings → Repo on
the UI for the exact route — currently form-only) or revoke the
host and start over.
---
## 5. (Optional) configure schedules
A new host gets one default source group covering `initial_paths`
(or `/etc`, `/home` if you didn't pass any) and **no schedule**.
Backups won't run until either:
- a schedule is attached (cron expression, retention, etc.), or
- you trigger an on-demand run via the source-group "Run now"
endpoint.
These are not yet exposed cleanly as JSON-only routes; if the
agent needs them, look at `internal/server/http/schedules*.go`
and `internal/server/http/source_groups*.go` — most are JSON-
capable, some are form-only with HTML 303 responses.
---
## Failure modes — quick reference
| Symptom | Likely cause | Fix |
|---|---|---|
| `401` on `/api/enrollment-tokens` | session expired or viewer role | re-login as operator+ |
| install.sh fails at "enrol": HTTP 410 | token expired (>1h) or already used | mint a fresh token |
| Host shows `status=offline` after install | systemd unit didn't start; firewall blocks WS | `systemctl status restic-manager-agent`, check `$RM_SERVER` reachability |
| `repo_status=init_failed` | bad repo creds or URL | inspect init job log; fix creds; retry probe via `/hosts/{id}/repo/probe` |
| Token list grows with stale rows | normal — they expire at 1h | optional cleanup via `/hosts/enrollment-tokens/{hash}/revoke` |
---
## Minimum reproducible script
```bash
#!/usr/bin/env bash
set -euo pipefail
: "${RM_SERVER:?}" "${RM_USER:?}" "${RM_PASS:?}" "${RM_HOSTNAME:?}" \
"${RM_REPO_URL:?}" "${RM_REPO_USER:?}" "${RM_REPO_PASS:?}"
JAR=$(mktemp)
trap 'rm -f "$JAR"' EXIT
# 1. login (build the body with jq so quotes in the credentials stay valid JSON)
curl -fsS -c "$JAR" -H 'Content-Type: application/json' \
-d "$(jq -nc --arg u "$RM_USER" --arg p "$RM_PASS" \
'{username:$u, password:$p}')" \
"$RM_SERVER/api/auth/login" >/dev/null
# 2. mint token (jq -e exits non-zero if .token is missing or null)
TOKEN=$(curl -fsS -b "$JAR" -H 'Content-Type: application/json' \
-d "$(jq -nc \
--arg h "$RM_HOSTNAME" --arg u "$RM_REPO_USER" \
--arg p "$RM_REPO_PASS" --arg r "$RM_REPO_URL" \
'{hostname:$h, repo_url:$r, repo_username:$u, repo_password:$p}')" \
"$RM_SERVER/api/enrollment-tokens" | jq -er .token)
# 3. emit the install snippet for the target machine
cat <<EOF
Run on $RM_HOSTNAME (as root):
curl -fsSL $RM_SERVER/install/install.sh | \\
sudo RM_SERVER=$RM_SERVER RM_TOKEN=$TOKEN bash
EOF
```