Onboarding a new host — agent instructions

How an automation agent (with a username + password for the restic-manager server) brings a new host fully online.

The flow involves two roles:

  • Controller side: the agent calls JSON APIs on the restic-manager server. Needs network reach to the server, plus username/password.
  • Target side: the host being onboarded runs the install script, which calls back to the server with the one-time token.

If the agent plays both roles (e.g. it can SSH into the target), it runs steps 1–2 against the server and steps 3–4 on the target. If the agent only controls the server, it stops at step 2 and hands the install snippet to whoever owns the target.


Conventions

  • Base URL: $RM_SERVER (e.g. https://restic.lab.example).
  • Session cookie jar: persist rm_session between calls.
  • All request/response bodies are JSON unless noted.
  • On any non-2xx, response body is {"code": "...", "message": "..."}.
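These conventions can be wrapped in a single helper. The following is a minimal sketch, assuming curl is installed; rm_call and RM_JAR are illustrative names chosen here, not part of restic-manager.

```shell
# rm_call METHOD PATH [JSON_BODY]
# Hypothetical helper (not part of restic-manager): sends one request using
# the cookie jar in $RM_JAR, prints the body on 2xx, and reports the error
# body ({"code": ..., "message": ...}) on stderr otherwise.
rm_call() {
  local method path body out status
  method=$1; path=$2; body=${3-}
  # -w appends the status code as a final line so it can be split off below.
  if [ -n "$body" ]; then
    out=$(curl -sS -X "$method" -b "$RM_JAR" -c "$RM_JAR" -w '\n%{http_code}' \
            -H 'Content-Type: application/json' -d "$body" \
            "$RM_SERVER$path") || return 1
  else
    out=$(curl -sS -X "$method" -b "$RM_JAR" -c "$RM_JAR" -w '\n%{http_code}' \
            "$RM_SERVER$path") || return 1
  fi
  status=$(printf '%s\n' "$out" | tail -n 1)   # last line: HTTP status
  body=$(printf '%s\n' "$out" | sed '$d')      # everything before it: body
  case $status in
    2??) printf '%s\n' "$body" ;;
    *)   printf 'HTTP %s: %s\n' "$status" "$body" >&2; return 1 ;;
  esac
}

# Usage (after logging in):
#   rm_call GET /api/hosts
```

Keeping the status check in one place means every later step fails loudly with the server's own error message instead of passing an error body downstream.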

1. Login

POST $RM_SERVER/api/auth/login
Content-Type: application/json

{"username": "...", "password": "..."}

→ 200 with {"user_id": "...", "role": "..."} and a Set-Cookie: rm_session=... (HttpOnly, 24h TTL). Persist the cookie; reuse it on every subsequent call.

Required role for the next step: operator or admin. A viewer-only login can read but cannot mint tokens.

The session expires after 24 hours. On a 401 from a later call, log in again.
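The role check and the re-login rule can be sketched as follows. This assumes curl and jq are installed and that GET /api/hosts is readable by every role; can_mint, rm_login, ensure_session and RM_JAR are hypothetical names for this example.

```shell
# Hypothetical helpers; RM_SERVER, RM_JAR, RM_USER and RM_PASS are assumed
# to be set by the caller.

# can_mint ROLE: succeeds for the roles allowed to mint enrolment tokens.
can_mint() {
  case $1 in
    operator|admin) return 0 ;;
    *)              return 1 ;;
  esac
}

# rm_login: logs in, stores rm_session in $RM_JAR, and rejects accounts
# whose role cannot mint tokens (for example viewer).
rm_login() {
  local role
  role=$(curl -fsS -c "$RM_JAR" -H 'Content-Type: application/json' \
           -d "$(jq -nc --arg u "$RM_USER" --arg p "$RM_PASS" \
                  '{username: $u, password: $p}')" \
           "$RM_SERVER/api/auth/login" | jq -r '.role')
  can_mint "$role" || { echo "role '$role' cannot mint tokens" >&2; return 1; }
}

# ensure_session: probes an authenticated route and logs in again on 401.
ensure_session() {
  local status
  status=$(curl -sS -o /dev/null -w '%{http_code}' -b "$RM_JAR" \
             "$RM_SERVER/api/hosts")
  if [ "$status" = 401 ]; then rm_login; fi
}
```

Calling ensure_session before each step keeps a long-running agent working across the 24-hour session boundary.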


2. Mint an enrolment token

POST $RM_SERVER/api/enrollment-tokens
Cookie: rm_session=...
Content-Type: application/json

{
  "hostname":      "newhost.example",
  "tags":          ["prod", "london"],          // optional
  "repo_url":      "rest:https://rest.example/newhost",
  "repo_username": "...",                        // optional, for rest-server / S3
  "repo_password": "...",                        // optional
  "initial_paths": ["/etc", "/home", "/var/lib"] // optional; default source group
}

→ 200 with:

{ "token": "<RAW_ONE_TIME_TOKEN>", "expires_at": "2026-05-09T..." }

Capture token immediately — the server only stores its hash and will never return the raw value again. TTL is 1 hour.

The repo creds you provided are encrypted under the token hash and pre-attached to the host. The agent will fetch and store them at enrol-time; you will not need to push them again.

If you lose the token before the install runs, mint a new one (the existing one becomes irrelevant; you can leave it to expire or revoke it via the UI).
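Because the raw token is returned only once, it helps to build the request body safely and to persist the token the moment it arrives. A minimal sketch, assuming jq is installed; mint_body and save_token are hypothetical helper names, and tags and initial_paths are left out for brevity.

```shell
# mint_body HOSTNAME REPO_URL [REPO_USERNAME [REPO_PASSWORD]]
# Builds the request body with jq (so quotes are escaped) and leaves out
# optional credentials that were not given.
mint_body() {
  jq -nc --arg h "$1" --arg r "$2" --arg u "${3-}" --arg p "${4-}" '
    {hostname: $h, repo_url: $r}
    + (if $u != "" then {repo_username: $u} else {} end)
    + (if $p != "" then {repo_password: $p} else {} end)'
}

# save_token FILE: reads the mint response on stdin and stores the raw token
# in FILE (owner-only permissions when the file is newly created).
# Fails if the response carries no token, e.g. an error body.
save_token() {
  local file token
  file=$1
  token=$(jq -er '.token // empty') || return 1
  ( umask 077 && printf '%s\n' "$token" >"$file" )
}

# Usage (the file name is an example):
#   mint_body "$RM_HOSTNAME" "$RM_REPO_URL" "$RM_REPO_USER" "$RM_REPO_PASS" |
#     curl -fsS -b "$RM_JAR" -H 'Content-Type: application/json' -d @- \
#       "$RM_SERVER/api/enrollment-tokens" |
#     save_token ./enrol.token
```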


3. Install on the target host

The install script is hosted by the server itself. Run it on the target:

Linux

curl -fsSL $RM_SERVER/install/install.sh | \
  sudo RM_SERVER=$RM_SERVER RM_TOKEN=<RAW_ONE_TIME_TOKEN> bash

What it does, end-to-end:

  1. detects arch (amd64 / arm64)
  2. downloads $RM_SERVER/agent/binary?os=linux&arch=<arch> to /usr/local/bin/restic-manager-agent
  3. creates /etc/restic-manager/ and /var/lib/restic-manager/ (root:root, 0700)
  4. calls POST /api/agents/enroll with the token; server returns the persistent agent bearer + host_id, written to /etc/restic-manager/agent.env
  5. installs the systemd unit, daemon-reload, enable --now
  6. surfaces any pre-existing restic cron/timer entries so the operator can decide whether to disable them (script does not touch them automatically)
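Steps 1 and 2 of this list can be sketched as below. How install.sh really detects the architecture is not shown in this document, so the uname -m mapping and the function names agent_arch and agent_url are assumptions.

```shell
# agent_arch [MACHINE]: maps `uname -m` output to the arch names used in the
# download URL (amd64 / arm64). The aliases handled here are an assumption.
agent_arch() {
  local machine
  machine=${1:-$(uname -m)}
  case $machine in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}

# agent_url SERVER ARCH: download URL for the Linux agent binary.
agent_url() {
  printf '%s/agent/binary?os=linux&arch=%s\n' "$1" "$2"
}

# Usage on the target (as root):
#   curl -fsSL -o /usr/local/bin/restic-manager-agent \
#     "$(agent_url "$RM_SERVER" "$(agent_arch)")"
#   chmod 0755 /usr/local/bin/restic-manager-agent
```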

The script is idempotent. Re-running on an already-enrolled host is a no-op unless RM_FORCE_REENROLL=1.
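One way such an idempotency guard could work is sketched below. It treats a non-empty agent.env as the sign of an enrolled host; that criterion and the name should_enroll are assumptions, since the actual check inside install.sh is not documented here.

```shell
# should_enroll [ENV_FILE]
# Hypothetical guard: enrol when no agent.env exists yet, or when the caller
# sets RM_FORCE_REENROLL=1. The real install.sh may use a different test.
should_enroll() {
  local env_file
  env_file=${1:-/etc/restic-manager/agent.env}
  [ "${RM_FORCE_REENROLL:-0}" = 1 ] || [ ! -s "$env_file" ]
}

# Usage:
#   if should_enroll; then echo "enrolling"; else echo "already enrolled"; fi
```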

The agent runs as root by design — fleet backup needs to read every file on the system. See deploy/install/restic-manager-agent.service for rationale.

Windows

# Run in an elevated PowerShell. Required env: $env:RM_SERVER, $env:RM_TOKEN
iwr "$env:RM_SERVER/install/install.ps1" -UseBasicParsing | iex
# (or download the script and run it)

Same flow, lays down a Windows service instead of a systemd unit.

Manual / non-script enrolment

If the install script can't be used, the wire-level enrol call is:

POST $RM_SERVER/api/agents/enroll
Content-Type: application/json

{
  "token":          "<RAW_ONE_TIME_TOKEN>",
  "hostname":       "newhost.example",
  "os":             "linux",                  // linux | windows
  "arch":           "amd64",                  // amd64 | arm64
  "agent_version":  "...",
  "restic_version": "..."
}

→ 200 with {"host_id": "...", "agent_token": "...", "cert_pin_sha256": "..."}.

The agent_token goes into /etc/restic-manager/agent.env as RM_AGENT_TOKEN=...; subsequent agent → server traffic uses Authorization: Bearer $RM_AGENT_TOKEN.
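For a manual enrolment, storing the bearer can be sketched as below. Only the RM_AGENT_TOKEN key named above is written; the install script may store further values (such as the host_id) under names this document does not give, so treat write_agent_env as a hypothetical starting point.

```shell
# write_agent_env AGENT_TOKEN [DIR]
# Writes RM_AGENT_TOKEN to DIR/agent.env with owner-only permissions.
# DIR defaults to /etc/restic-manager; run as root for the default path.
write_agent_env() {
  local token dir
  token=$1; dir=${2:-/etc/restic-manager}
  (
    umask 077   # new directory 0700, new file 0600
    mkdir -p "$dir" &&
      printf 'RM_AGENT_TOKEN=%s\n' "$token" >"$dir/agent.env"
  )
}

# Usage, with the enrol response saved in enroll.json (example file name):
#   write_agent_env "$(jq -er '.agent_token' enroll.json)"
```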


4. Verify the host is healthy

Poll until both conditions are true. Cap at ~5 minutes.

GET $RM_SERVER/api/hosts
Cookie: rm_session=...

→ array of host objects. Find the one with the matching hostname and check:

  • "status": "online" (agent connected to the WS heartbeat)
  • "repo_status": "ready" (restic init, or existing-config detection, completed successfully)

If repo_status settles on "init_failed", the repo creds are wrong or the repo URL is unreachable from the target. Inspect the matching job log:

GET $RM_SERVER/api/hosts/<host_id>/jobs   (most recent init job)
GET $RM_SERVER/api/jobs/<job_id>          (full output)

Fix the creds with a creds-update call (see Settings → Repo on the UI for the exact route — currently form-only) or revoke the host and start over.
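The polling in this section can be sketched as a loop with a five-minute cap. It assumes curl and jq are installed and a logged-in cookie jar in $RM_JAR; host_state, wait_for_host, the 5-second interval and the return codes are choices made for this example.

```shell
# host_state HOSTNAME: reads the /api/hosts array on stdin and prints
# "<status> <repo_status>" for the matching host (nothing if it is absent).
host_state() {
  jq -r --arg h "$1" \
    'first(.[] | select(.hostname == $h) | "\(.status) \(.repo_status)")'
}

# wait_for_host HOSTNAME [TIMEOUT_SECONDS [INTERVAL_SECONDS]]
# Returns 0 once the host is online and its repo is ready, 2 as soon as
# repo_status is init_failed, and 1 if the timeout (default 300 s) passes.
wait_for_host() {
  local host timeout interval elapsed hosts
  host=$1; timeout=${2:-300}; interval=${3:-5}; elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # A failed request counts as "not ready yet" and is retried.
    hosts=$(curl -fsS -b "$RM_JAR" "$RM_SERVER/api/hosts") || hosts='[]'
    case $(printf '%s' "$hosts" | host_state "$host") in
      'online ready')  return 0 ;;
      *' init_failed') return 2 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Usage:
#   wait_for_host newhost.example || echo "host did not become healthy" >&2
```

Returning a distinct code for init_failed lets the caller go straight to the init job log instead of waiting out the full timeout.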


5. (Optional) configure schedules

A new host gets one default source group covering initial_paths (or /etc and /home if you passed none) and no schedule. Backups won't run until either:

  • a schedule is attached (cron expression, retention, etc.), or
  • you trigger an on-demand run via the source-group "Run now" endpoint.

These are not yet exposed cleanly as JSON-only routes; if the agent needs them, look at internal/server/http/schedules*.go and internal/server/http/source_groups*.go. Most are JSON-capable; some are form-only and respond with an HTTP 303 redirect.


Failure modes — quick reference

| Symptom | Likely cause | Fix |
|---|---|---|
| 401 on /api/enrollment-tokens | session expired or viewer role | re-login as operator+ |
| install.sh fails at "enrol": HTTP 410 | token expired (>1h) or already used | mint a fresh token |
| Host shows status=offline after install | systemd unit didn't start; firewall blocks WS | systemctl status restic-manager-agent, check $RM_SERVER reachability |
| repo_status=init_failed | bad repo creds or URL | inspect init job log; fix creds; retry probe via /hosts/{id}/repo/probe |
| Token list grows with stale rows | normal; they expire after 1h | optional cleanup via /hosts/enrollment-tokens/{hash}/revoke |

Minimum reproducible script

#!/usr/bin/env bash
set -euo pipefail
: "${RM_SERVER:?}" "${RM_USER:?}" "${RM_PASS:?}" "${RM_HOSTNAME:?}" \
  "${RM_REPO_URL:?}" "${RM_REPO_USER:?}" "${RM_REPO_PASS:?}"

JAR=$(mktemp)
trap 'rm -f "$JAR"' EXIT

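# 1. login (body built with jq so quotes or backslashes in the credentials are escaped)
curl -fsS -c "$JAR" -H 'Content-Type: application/json' \
  -d "$(jq -nc --arg u "$RM_USER" --arg p "$RM_PASS" '{username:$u, password:$p}')" \
  "$RM_SERVER/api/auth/login" >/dev/null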

# 2. mint token
TOKEN=$(curl -fsS -b "$JAR" -H 'Content-Type: application/json' \
  -d "$(jq -nc \
        --arg h "$RM_HOSTNAME" --arg u "$RM_REPO_USER" \
        --arg p "$RM_REPO_PASS" --arg r "$RM_REPO_URL" \
        '{hostname:$h, repo_url:$r, repo_username:$u, repo_password:$p}')" \
  "$RM_SERVER/api/enrollment-tokens" | jq -r .token)

# 3. emit the install snippet for the target machine
cat <<EOF
Run on $RM_HOSTNAME (as root):

  curl -fsSL $RM_SERVER/install/install.sh | \\
    sudo RM_SERVER=$RM_SERVER RM_TOKEN=$TOKEN bash
EOF
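
When the agent also controls the target over SSH (the both-roles case from the introduction), it can run the install command itself instead of printing it. A minimal sketch, assuming key-based SSH access and passwordless sudo on the target; shquote and install_cmd are hypothetical helpers that quote the values so they survive the remote shell.

```shell
# shquote STRING: wraps STRING in single quotes, escaping embedded quotes,
# so it is passed through a shell unchanged.
shquote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# install_cmd SERVER TOKEN: prints the Linux install one-liner from step 3
# with every value shell-quoted.
install_cmd() {
  printf 'curl -fsSL %s | sudo RM_SERVER=%s RM_TOKEN=%s bash\n' \
    "$(shquote "$1/install/install.sh")" "$(shquote "$1")" "$(shquote "$2")"
}

# Usage (the SSH user is an assumption):
#   ssh "admin@$RM_HOSTNAME" "$(install_cmd "$RM_SERVER" "$TOKEN")"
```

Note that a failed download in this pipeline leaves bash with empty input, so the remote command can still exit 0; checking the host status afterwards (step 4) remains the reliable signal.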