89537d417a
P5-01 — Documentation site under docs/book/ rendered with mdBook
(downloaded via Makefile, same static-binary pattern as Tailwind).
Structured chapters: getting started, concepts, operations,
security, reference. `make docs` / `make docs-watch`. Generated
output gitignored.
P5-02 — CONTRIBUTING.md rewritten from placeholder to a full
guide. CODE_OF_CONDUCT.md adapted from Contributor Covenant for a
single-maintainer project. .gitea/issue_template/{bug,feature}.md
and PULL_REQUEST_TEMPLATE.md.
P5-04 — Six README screenshots captured live from a fresh server
bootstrap (login, empty dashboard, add-host, alerts, settings,
audit log). README rewritten to centre the screenshot grid and
link out to the docs site.
P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day
default window), scope in/out, threat-model summary, operator
hardening checklist. Mirrored as a docs-site chapter.
P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up
server + sibling Linux agent (alpine + restic) + restic/rest-server.
Agent uses announce-and-approve so Playwright can drive the full
operator flow: bootstrap → login → accept pending → backup →
verify terminal status. Second spec scrapes /metrics to assert
the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every
PR; local how-to in docs/e2e.md.
51 lines
1.8 KiB
Markdown
# Updating agents

Server updates are a `docker compose pull && docker compose up -d` away.
Agents update via the control plane.

## Single-host update

Each host's detail page shows an **Update agent** button when
the agent's reported version is older than the server's.
Clicking it starts this sequence:

1. The server dispatches a `command.update` to that host.
2. The agent fetches the appropriate binary from
   `$RM_SERVER/agent/binary?os=…&arch=…` to
   `<binary-path>.new`.
3. The agent copies the running binary to `<binary-path>.old`
   (one revision back, in case rollback is needed).
4. The agent atomically renames `.new` over the running binary.
5. The agent exits cleanly. systemd's `Restart=always` (or the
   Windows SCM) brings the process back on the new binary.

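The on-disk part of the steps above (stage to `.new`, back up to `.old`, rename into place) can be sketched roughly as follows. This is an illustration only, not the agent's actual code: `stage_and_swap` is a hypothetical helper, the download is replaced by a `new_bytes` argument, and POSIX rename semantics are assumed.

```python
import os
import shutil
import stat
from pathlib import Path


def stage_and_swap(binary_path: Path, new_bytes: bytes) -> None:
    """Hypothetical helper mirroring the stage/backup/rename steps."""
    new_path = binary_path.with_name(binary_path.name + ".new")
    old_path = binary_path.with_name(binary_path.name + ".old")

    # Stage the new binary next to the live one, so the later rename
    # stays on a single filesystem (required for it to be atomic).
    new_path.write_bytes(new_bytes)
    new_path.chmod(new_path.stat().st_mode | stat.S_IXUSR)

    # Keep one revision back in case rollback is needed.
    shutil.copy2(binary_path, old_path)

    # os.replace() is an atomic rename on POSIX filesystems: a reader
    # sees either the old binary or the new one, never a partial file.
    os.replace(new_path, binary_path)
```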
The server then waits up to 90 seconds for a hello at the
target version. If one arrives, it marks the update succeeded;
if the agent doesn't reconnect at the expected version in time,
it marks the update **failed** and raises an `update_failed`
alert.

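The server-side outcome rule can be pictured as a deadline check. The sketch below is a guess at the shape of that logic, not the server's implementation: `await_update`, the polling loop, and the `latest_hello_version` callback are all assumptions made for illustration; only the 90-second window and the succeeded/failed outcomes come from the text.

```python
import time
from typing import Callable, Optional

UPDATE_TIMEOUT_S = 90.0  # window stated in the text above


def await_update(
    target_version: str,
    latest_hello_version: Callable[[], Optional[str]],
    timeout_s: float = UPDATE_TIMEOUT_S,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "succeeded" or "failed" (hypothetical status strings)."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        # A hello at any other version (or no hello yet) keeps waiting.
        if latest_hello_version() == target_version:
            return "succeeded"
        sleep(poll_s)
    # Deadline passed: the caller would raise the `update_failed` alert.
    return "failed"
```

Injecting `clock` and `sleep` keeps the sketch testable without waiting 90 real seconds; a real server may well use an event-driven timer instead of polling.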
## Fleet update

The admin-only **Settings → Fleet update** page drives a rolling
update across every host in the fleet:

- One host at a time.
- Wait for hello-with-target-version (max 95s).
- On any host failing, **halt** the rollout, raise a
  `fleet_update_halted` alert, leave the rest of the fleet on
  the old version. No surprise mass-failures.

You can cancel an in-progress fleet update; the worker stops
after the current host finishes.

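The rollout rules above (serial, halt on first failure, cancel only between hosts) fit a small loop. The following is a minimal sketch under assumptions: `rolling_update`, `RolloutResult`, and both callbacks are hypothetical names, and `update_host` stands in for "dispatch the update and wait for the hello at the target version".

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class RolloutResult:
    updated: List[str] = field(default_factory=list)
    halted_on: Optional[str] = None  # host that failed, if any
    cancelled: bool = False


def rolling_update(
    hosts: Iterable[str],
    update_host: Callable[[str], bool],
    cancel_requested: Callable[[], bool] = lambda: False,
) -> RolloutResult:
    result = RolloutResult()
    for host in hosts:
        # Cancellation is only checked between hosts, so the host in
        # flight always finishes first.
        if cancel_requested():
            result.cancelled = True
            break
        if update_host(host):  # blocks until success or timeout
            result.updated.append(host)
        else:
            # First failure halts the rollout; remaining hosts stay on
            # the old version. The caller would raise the
            # `fleet_update_halted` alert here.
            result.halted_on = host
            break
    return result
```

For example, `rolling_update(["a", "b", "c"], lambda h: h != "b")` updates `a`, halts on `b`, and never touches `c`.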
## TLS and corruption

Updates rely on the reverse proxy's TLS to detect corruption in
transit. There's no separate sha256 verification step; we
chose the simpler model on the basis that the same TLS already
gates every other byte the server hands to the agent.

If you'd like a separate signature step before applying updates,
that's a future-phase enhancement (see `tasks.md` Phase 6
candidates).