Merge pull request 'docs: AI-agent host onboarding guide' (#25) from temp-onboarding into main
Reviewed-on: #25
This commit was merged in pull request #25.
@@ -0,0 +1,249 @@
# Onboarding a new host: agent instructions

How an automation agent (with a username and password for the
restic-manager server) brings a new host fully online.

The flow involves two roles:

- **Controller side**: the agent calls JSON APIs on the
  restic-manager server. It needs network reach to the server, plus
  a username and password.
- **Target side**: the host being onboarded runs the install
  script, which calls back to the server with the one-time token.

If the agent is *both* sides (e.g. it can SSH into the target),
it runs steps 1–2 against the server, step 3 on the target, and
then step 4 (a poll of the server API) to confirm the result.
If the agent only controls the server, it stops at
step 2 and hands the install snippet to whoever owns the target.

---

## Conventions

- Base URL: `$RM_SERVER` (e.g. `https://restic.lab.example`).
- Session cookie jar: persist `rm_session` between calls.
- All request/response bodies are JSON unless noted.
- On any non-2xx, response body is
  `{"code": "...", "message": "..."}`.
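
The conventions above could be wrapped in one small shell helper. This is only a sketch: the function name `rm_call` and the `RM_JAR` variable are hypothetical and not part of restic-manager, and it assumes `curl` is installed and `RM_SERVER` is set. It keeps the cookie jar up to date on every call and, on a non-2xx status, prints the JSON error body to stderr and returns non-zero.

```bash
# Hypothetical helper: rm_call METHOD PATH [JSON_BODY]
# Assumes RM_SERVER (base URL) and RM_JAR (cookie jar file) are set.
rm_call() {
  local method path body out status nl
  method=$1
  path=$2
  body=${3:-}
  nl='
'
  # Build the curl argument list; -w appends the HTTP status on its own line.
  set -- -sS -X "$method" -b "$RM_JAR" -c "$RM_JAR" \
         -H 'Content-Type: application/json' -w '\n%{http_code}'
  if [ -n "$body" ]; then
    set -- "$@" -d "$body"
  fi
  out=$(curl "$@" "$RM_SERVER$path") || return 1
  status=${out##*"$nl"}   # last line: HTTP status code
  out=${out%"$nl"*}       # everything before it: response body
  case $status in
    2??) printf '%s\n' "$out" ;;
    *)   printf 'HTTP %s: %s\n' "$status" "$out" >&2; return 1 ;;
  esac
}
```

With this in place, `rm_call GET /api/hosts` prints the response body on success, and a failed call leaves the server's `code`/`message` pair on stderr for the agent's log.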

---

## 1. Login

```
POST $RM_SERVER/api/auth/login
Content-Type: application/json

{"username": "...", "password": "..."}
```

→ 200 with `{"user_id": "...", "role": "..."}` and a
`Set-Cookie: rm_session=...` header (HttpOnly, 24h TTL). Persist the
cookie and reuse it on every subsequent call.

Required role for the next step: **operator** or **admin**.
A viewer-only login can read but cannot mint tokens.

The session expires after 24h. On a 401 from a later call, log in again.
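
A minimal sketch of the login step with an up-front role check, so a viewer-only account fails here rather than at the mint call. The function names `role_can_mint` and `rm_login` and the `RM_JAR` variable are hypothetical; the sketch assumes `curl` and `jq` are installed and that `RM_SERVER`, `RM_USER` and `RM_PASS` are set.

```bash
# Roles allowed to mint enrolment tokens, per the text above.
role_can_mint() {
  case ${1:-} in
    operator|admin) return 0 ;;
    *)              return 1 ;;
  esac
}

# Hypothetical helper: log in, store rm_session in $RM_JAR, check the role.
rm_login() {
  local body resp role
  # jq builds the JSON so special characters in the password are escaped.
  body=$(jq -nc --arg u "$RM_USER" --arg p "$RM_PASS" \
            '{username:$u, password:$p}') || return 1
  # -d @- makes curl read the request body from stdin.
  resp=$(printf '%s' "$body" |
         curl -fsS -c "$RM_JAR" -H 'Content-Type: application/json' \
              -d @- "$RM_SERVER/api/auth/login") || return 1
  role=$(printf '%s' "$resp" | jq -r .role) || return 1
  if ! role_can_mint "$role"; then
    echo "role '$role' cannot mint enrolment tokens" >&2
    return 1
  fi
}
```

Calling `rm_login` again is also a reasonable reaction to a 401 on a later request, since the session may simply have expired.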

---

## 2. Mint an enrolment token

```
POST $RM_SERVER/api/enrollment-tokens
Cookie: rm_session=...
Content-Type: application/json

{
  "hostname": "newhost.example",
  "tags": ["prod", "london"],                      // optional
  "repo_url": "rest:https://rest.example/newhost",
  "repo_username": "...",                          // optional, for rest-server / S3
  "repo_password": "...",                          // optional
  "initial_paths": ["/etc", "/home", "/var/lib"]   // optional; default source group
}
```

The `//` annotations are for the reader only; JSON has no comment
syntax, so leave them out of the real request body.

→ 200 with:

```json
{ "token": "<RAW_ONE_TIME_TOKEN>", "expires_at": "2026-05-09T..." }
```

**Capture `token` immediately: the server stores only its hash
and will never return the raw value again.** The TTL is 1 hour.

The repo creds you provided are encrypted under the token hash
and pre-attached to the host. The restic-manager agent on the target
fetches and stores them at enrol time; you will not need to push
them again.

If you lose the token before the install runs, mint a new one.
The old one becomes irrelevant; leave it to expire
or revoke it via the UI.
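
Because the raw token is returned only once, it is worth validating and storing it the moment it arrives. The sketch below is illustrative and the function names `require_token` and `save_secret` are hypothetical. It guards against one easy mistake: `jq -r .token` prints the literal string `null` when the field is missing, which would otherwise end up in the install snippet.

```bash
# Hypothetical helper: reject an empty or "null" token, else echo it back.
require_token() {
  local token
  token=${1:-}
  if [ -z "$token" ] || [ "$token" = null ]; then
    echo "mint response did not contain a token" >&2
    return 1
  fi
  printf '%s\n' "$token"
}

# Write a secret to FILE; a newly created file gets mode 0600 via the umask.
save_secret() {
  local value file
  value=$1
  file=$2
  ( umask 077 && printf '%s\n' "$value" > "$file" )
}
```

A caller might use it as `TOKEN=$(require_token "$(printf '%s' "$response" | jq -r .token)")`, where `$response` holds the body returned by the mint call.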

---

## 3. Install on the target host

The server itself hosts the install script. Run on the target:

### Linux

```bash
curl -fsSL "$RM_SERVER/install/install.sh" | \
  sudo RM_SERVER="$RM_SERVER" RM_TOKEN='<RAW_ONE_TIME_TOKEN>' bash
```

What it does, end-to-end:

1. detects the architecture (amd64 / arm64)
2. downloads `$RM_SERVER/agent/binary?os=linux&arch=<arch>` to
   `/usr/local/bin/restic-manager-agent`
3. creates `/etc/restic-manager/` and `/var/lib/restic-manager/`
   (root:root, 0700)
4. calls `POST /api/agents/enroll` with the token; the server returns
   the persistent agent bearer token and `host_id`, which are written to
   `/etc/restic-manager/agent.env`
5. installs the systemd unit, then runs `daemon-reload` and `enable --now`
6. surfaces any pre-existing restic cron/timer entries so the
   operator can decide whether to disable them (the script does
   *not* touch them automatically)
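
Steps 1 and 2 of the list above can be sketched as follows. This is not the actual `install.sh`, whose implementation is not shown here; `rm_arch` and `rm_binary_url` are hypothetical names, and the set of `uname -m` values handled is an assumption.

```bash
# Map `uname -m` output (or an explicit argument) to amd64 / arm64.
rm_arch() {
  local machine
  machine=${1:-$(uname -m)}
  case $machine in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}

# Build the agent download URL from step 2: rm_binary_url SERVER ARCH
rm_binary_url() {
  printf '%s/agent/binary?os=linux&arch=%s\n' "$1" "$2"
}
```

For example, `curl -fsSL "$(rm_binary_url "$RM_SERVER" "$(rm_arch)")" -o /usr/local/bin/restic-manager-agent` would fetch the binary for the current machine.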

The script is idempotent. Re-running it on an already-enrolled host
is a no-op unless `RM_FORCE_REENROLL=1` is set.

The agent runs as **root** by design: fleet backup needs to
read every file on the system. See
`deploy/install/restic-manager-agent.service` for the rationale.

### Windows

```
# Needs an elevated PowerShell.
# Required env: $env:RM_SERVER, $env:RM_TOKEN
iwr "$env:RM_SERVER/install/install.ps1" -UseBasicParsing | iex
# (or download the script and run it)
```

Same flow, but it lays down a Windows service instead of a systemd unit.

### Manual / non-script enrolment

If the install script can't be used, the wire-level enrol call is:

```
POST $RM_SERVER/api/agents/enroll
Content-Type: application/json

{
  "token": "<RAW_ONE_TIME_TOKEN>",
  "hostname": "newhost.example",
  "os": "linux",              // linux | windows
  "arch": "amd64",            // amd64 | arm64
  "agent_version": "...",
  "restic_version": "..."
}
```

→ 200 with
`{"host_id": "...", "agent_token": "...", "cert_pin_sha256": "..."}`.

The `agent_token` goes into `/etc/restic-manager/agent.env` as
`RM_AGENT_TOKEN=...`; subsequent agent → server traffic uses
`Authorization: Bearer $RM_AGENT_TOKEN`.
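
A rough sketch of the last step, for anyone doing the enrolment by hand. The helper names `write_agent_env` and `agent_auth_header` are hypothetical. Only the `RM_AGENT_TOKEN` line is documented above, so the real file may hold further keys that this sketch would overwrite.

```bash
# Hypothetical helper: write_agent_env TOKEN [FILE]
# FILE defaults to the documented path; pass another path to try it without root.
write_agent_env() {
  local token file
  token=$1
  file=${2:-/etc/restic-manager/agent.env}
  # A newly created file gets mode 0600 because of the umask.
  ( umask 077 && printf 'RM_AGENT_TOKEN=%s\n' "$token" > "$file" )
}

# Read the token back and print the header used for agent -> server calls.
agent_auth_header() {
  local file line
  file=${1:-/etc/restic-manager/agent.env}
  line=$(grep '^RM_AGENT_TOKEN=' "$file") || return 1
  printf 'Authorization: Bearer %s\n' "${line#RM_AGENT_TOKEN=}"
}
```

The header line can then be passed to `curl` with `-H "$(agent_auth_header)"`.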

---

## 4. Verify the host is healthy

Poll until both conditions below are true. Cap the wait at ~5 minutes.

```
GET $RM_SERVER/api/hosts
Cookie: rm_session=...
```

→ array of host objects. Find the one with the matching hostname
and check:

- `"status": "online"`: the agent is connected to the WS heartbeat
- `"repo_status": "ready"`: `restic init` (or existing-config
  detection) completed successfully
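
The poll can be sketched as a loop with a deadline. Several things here are assumptions: the function names, the `RM_JAR`, `RM_WAIT_SECS` and `RM_POLL_SECS` variables, and in particular the `hostname` field name inside each host object, which the text above implies but does not spell out. `curl` and `jq` are assumed to be installed.

```bash
# Print "<status> <repo_status>" for HOSTNAME (assumes a `hostname` field).
host_state() {
  curl -fsS -b "$RM_JAR" "$RM_SERVER/api/hosts" |
    jq -r --arg h "$1" \
       '.[] | select(.hostname == $h) | "\(.status) \(.repo_status)"'
}

# wait_healthy HOSTNAME
# Returns 0 when online + ready, 2 on init_failed, 1 on timeout.
wait_healthy() {
  local host deadline state
  host=$1
  deadline=$(( $(date +%s) + ${RM_WAIT_SECS:-300} ))   # ~5 minute cap
  while [ "$(date +%s)" -lt "$deadline" ]; do
    state=$(host_state "$host") || state=               # tolerate a failed poll
    case $state in
      'online ready')  return 0 ;;
      *' init_failed') echo "repo init failed on $host" >&2; return 2 ;;
    esac
    sleep "${RM_POLL_SECS:-5}"
  done
  echo "timed out waiting for $host" >&2
  return 1
}
```

Returning a distinct code for `init_failed` lets the caller stop polling early and go straight to the job log instead of waiting out the full cap.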

If `repo_status` settles on `"init_failed"`, the repo creds are
wrong or the repo URL is unreachable from the target. Inspect
the matching job log:

```
GET $RM_SERVER/api/hosts/<host_id>/jobs    (most recent init job)
GET $RM_SERVER/api/jobs/<job_id>           (full output)
```

Fix the creds with a creds-update call (see Settings → Repo in
the UI for the exact route; it is currently form-only) or revoke the
host and start over.

---

## 5. (Optional) configure schedules

A new host gets one default source group covering `initial_paths`
(or `/etc`, `/home` if you didn't pass any) and **no schedule**.
Backups won't run until either:

- a schedule is attached (cron expression, retention, etc.), or
- you trigger an on-demand run via the source-group "Run now"
  endpoint.

These are not yet exposed cleanly as JSON-only routes. If the
agent needs them, look at `internal/server/http/schedules*.go`
and `internal/server/http/source_groups*.go`: most are
JSON-capable, some are form-only with HTML 303 responses.

---

## Failure modes: quick reference

| Symptom | Likely cause | Fix |
|---|---|---|
| `401` on `/api/enrollment-tokens` | session expired or viewer role | re-login as operator+ |
| `install.sh` fails at "enrol" with HTTP 410 | token expired (>1h) or already used | mint a fresh token |
| Host shows `status=offline` after install | systemd unit didn't start; firewall blocks WS | `systemctl status restic-manager-agent`, check `$RM_SERVER` reachability |
| `repo_status=init_failed` | bad repo creds or URL | inspect init job log; fix creds; retry probe via `/hosts/{id}/repo/probe` |
| Token list grows with stale rows | normal; they expire after 1h | optional cleanup via `/hosts/enrollment-tokens/{hash}/revoke` |
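
An agent can encode the table as a small dispatch function so each failure maps to one next step. This is a sketch only: `next_action`, its step names (`mint`, `enrol`, `host`) and the keywords it prints are all hypothetical labels, not anything the server defines.

```bash
# next_action STEP SIGNAL
# STEP is mint | enrol | host; SIGNAL is an HTTP status or a host state.
next_action() {
  case "$1:$2" in
    mint:401)         echo relogin ;;            # session expired or viewer role
    enrol:410)        echo remint ;;             # token expired (>1h) or already used
    host:offline)     echo check-agent ;;        # systemd unit or firewall
    host:init_failed) echo inspect-init-job ;;   # bad repo creds or URL
    *)                echo abort ;;              # not in the table: hand to a human
  esac
}
```

Anything the table does not cover falls through to `abort`, which keeps the agent from retrying blindly on an unknown error.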

---

## Minimum reproducible script

```bash
#!/usr/bin/env bash
set -euo pipefail
: "${RM_SERVER:?}" "${RM_USER:?}" "${RM_PASS:?}" "${RM_HOSTNAME:?}" \
  "${RM_REPO_URL:?}" "${RM_REPO_USER:?}" "${RM_REPO_PASS:?}"

JAR=$(mktemp)
trap 'rm -f "$JAR"' EXIT

# 1. login (jq builds the body so quotes or backslashes in the
#    credentials cannot break the JSON)
curl -fsS -c "$JAR" -H 'Content-Type: application/json' \
  -d "$(jq -nc --arg u "$RM_USER" --arg p "$RM_PASS" \
        '{username:$u, password:$p}')" \
  "$RM_SERVER/api/auth/login" >/dev/null

# 2. mint token
TOKEN=$(curl -fsS -b "$JAR" -H 'Content-Type: application/json' \
  -d "$(jq -nc \
        --arg h "$RM_HOSTNAME" --arg u "$RM_REPO_USER" \
        --arg p "$RM_REPO_PASS" --arg r "$RM_REPO_URL" \
        '{hostname:$h, repo_url:$r, repo_username:$u, repo_password:$p}')" \
  "$RM_SERVER/api/enrollment-tokens" | jq -r .token)

# 3. emit the install snippet for the target machine
cat <<EOF
Run on $RM_HOSTNAME (as root):

  curl -fsSL $RM_SERVER/install/install.sh | \\
    sudo RM_SERVER=$RM_SERVER RM_TOKEN=$TOKEN bash
EOF
```