P1 polish: agent-as-root, init-repo flow, rest creds passthrough, UX fixes
Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:
* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
with ReadWritePaths confined to /etc/restic-manager and /var/lib/restic-manager;
NoNewPrivileges blocks escalation. Install script no longer
creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
rationale (matches UrBackup / Veeam / Bareos defaults; trying to
back up "everything" as an unprivileged user creates silent skips
on /home, /root, /var/lib/* with no upside vs the threat model
the agent already implies).
* Init-repo end-to-end. New JobKind="init" wired through agent
runner, restic.Env.RunInit, server dispatcher, and a UI button
(red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
flips on init success, on backup success, or on a non-empty
snapshots.report. The "Run now" / "Init" / "Retry" branching now
drives both the dashboard host row and the host-detail panel.
Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
the safe create-new-then-rename pattern; first version corrupted
job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
affected DBs).
* rest-server creds embedded at exec time only. restic.Env gains
RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
inside envSlice() and never assigns it back to the struct, so
nothing slog-able ever sees the cleartext form. RedactURL helper
for any future surface that needs to log a URL safely. Both
helpers tested.
* Add-host UX. Repo password is now optional — server mints a
24-byte URL-safe random one and surfaces it once, alongside an
htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
the operator pastes one command on the rest-server host and one
on the endpoint. Result page also links the install snippet at
/install/install.sh (was /install.sh — 404'd before) and pipes
to bash (not sh — script uses set -o pipefail and other
bashisms; on Debian/Ubuntu sh is dash).
* Late-subscriber race in JobHub. A fast-failing job could finish
(DB write + Broadcast) before the browser's HX-Redirect → page
load → WS-connect path completed, so the JS sat forever waiting
on a job.finished that already passed. JobHub split into
Register + Send + Run; handleJobStream now subscribes first,
re-fetches the job, and sends a synthetic job.finished if the
state is already terminal.
* HTMX error visibility. New toast partial listens to
htmx:responseError and surfaces the response body as a
bottom-right toast — every server-side validation error now
becomes visible without per-handler JS wiring. Also handles
custom rm:toast events for future server-pushed notifications
via the HX-Trigger header. Themed via existing CSS vars.
* Dashboard rows are now whole-row clickable to host detail
(CSS card-link pattern: absolute-positioned anchor + .row-action
z-index restoration so the action button stays clickable).
"View →" on a running job links to /jobs/<id> rather than
/hosts/<id> since the row click already covers the host page.
* "Run first" / "Run first backup" → "Run now" everywhere for
consistency.
* runbook (docs/e2e-smoke.md) updated — live-log streaming step
now reflects P1-26; mentions the browser-driven Run-now flow.
* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
it up; .gitignore now excludes /_diag/ entirely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
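
The add-host flow in the bullets above (server-minted password plus a one-line htpasswd command) can be approximated in shell. This is a minimal sketch, not the server's Go implementation: it assumes "24-byte" means 24 random bytes encoded as unpadded base64url, and the function names, the htpasswd file path and the username are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only. mint_password and htpasswd_snippet are hypothetical names;
# the real server generates the password itself and renders the snippet.

# mint_password: 24 random bytes as unpadded base64url (always 32 characters).
mint_password() {
    head -c 24 /dev/urandom | base64 | tr '+/' '-_' | tr -d '=\n'
}

# htpasswd_snippet PASS FILE USER: print (never run) the command the operator
# pastes on the rest-server host. -B selects bcrypt, -i reads the password
# from stdin so it does not appear in the process arguments of htpasswd.
htpasswd_snippet() {
    printf "echo '%s' | htpasswd -B -i %s %s\n" "$1" "$2" "$3"
}

pass=$(mint_password)
# /path/to/.htpasswd and host1 are placeholders, not values from the project.
htpasswd_snippet "$pass" /path/to/.htpasswd host1
```

Because the base64url alphabet contains no quote or shell metacharacters, the generated password can be wrapped in single quotes without further escaping.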
+14 -26
@@ -2,19 +2,23 @@
 # install.sh — Linux installer for the restic-manager agent.
 #
 # Usage (paste in shell):
-#   curl -fsSL https://restic.lab.example/install.sh | \
-#     sudo RM_SERVER=https://restic.lab.example RM_TOKEN=<one-time-token> sh
+#   curl -fsSL https://restic.lab.example/install/install.sh | \
+#     sudo RM_SERVER=https://restic.lab.example RM_TOKEN=<one-time-token> bash
 #
 # What it does:
 #   1. detects arch (amd64 / arm64)
 #   2. fetches the matching agent binary from the server
-#   3. creates the restic-manager-agent service user
-#   4. lays down /etc/restic-manager/, /var/lib/restic-manager/
-#   5. enrolls (POST /api/agents/enroll) using RM_TOKEN
-#   6. installs the systemd unit, enables, starts
-#   7. surfaces (but does NOT disable) any existing restic timers /
+#   3. lays down /etc/restic-manager/, /var/lib/restic-manager/ (root:root, 0700)
+#   4. enrolls (POST /api/agents/enroll) using RM_TOKEN
+#   5. installs the systemd unit, enables, starts
+#   6. surfaces (but does NOT disable) any existing restic timers /
 #      cron entries so the operator can decide what to do
 #
+# The agent runs as root. See restic-manager-agent.service for the
+# rationale (in short: a fleet-backup tool must read every file on
+# the system; trying to do that unprivileged buys very little
+# security and creates large UX cliffs).
+#
 # Idempotent — safe to re-run; will refuse if already enrolled
 # unless RM_FORCE_REENROLL=1 is set.

@@ -25,8 +29,6 @@ set -euo pipefail
 : "${RM_INSTALL_PREFIX:=/usr/local/bin}"
 : "${RM_CONFIG_DIR:=/etc/restic-manager}"
 : "${RM_STATE_DIR:=/var/lib/restic-manager}"
-: "${RM_USER:=restic-manager-agent}"
-: "${RM_GROUP:=restic-manager-agent}"
 : "${RM_FORCE_REENROLL:=0}"

 require_root() {
@@ -44,21 +46,9 @@ detect_arch() {
     esac
 }

-ensure_user() {
-    if ! getent group "$RM_GROUP" >/dev/null; then
-        groupadd --system "$RM_GROUP"
-    fi
-    if ! getent passwd "$RM_USER" >/dev/null; then
-        useradd --system --gid "$RM_GROUP" \
-            --home-dir "$RM_STATE_DIR" --no-create-home \
-            --shell /usr/sbin/nologin \
-            "$RM_USER"
-    fi
-}
-
 ensure_dirs() {
-    install -d -m 0750 -o "$RM_USER" -g "$RM_GROUP" "$RM_CONFIG_DIR"
-    install -d -m 0750 -o "$RM_USER" -g "$RM_GROUP" "$RM_STATE_DIR"
+    install -d -m 0700 -o root -g root "$RM_CONFIG_DIR"
+    install -d -m 0700 -o root -g root "$RM_STATE_DIR"
 }

 detect_existing_schedulers() {
@@ -121,8 +111,7 @@ enroll_agent() {
     fi

     echo "==> Enrolling agent with $RM_SERVER"
-    sudo -u "$RM_USER" \
-        "$RM_INSTALL_PREFIX/restic-manager-agent" \
+    "$RM_INSTALL_PREFIX/restic-manager-agent" \
         -config "$cfg" \
         -enroll-server "$RM_SERVER" \
         -enroll-token "$RM_TOKEN"
@@ -142,7 +131,6 @@ install_unit() {

 main() {
     require_root
-    ensure_user
     ensure_dirs
     download_agent
     detect_existing_schedulers

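The tightened `ensure_dirs` step (mode 0700, re-runnable) can be exercised without touching the real system. The sketch below assumes a Linux host with GNU or BusyBox `stat`, redirects both directories into a scratch tree, and leaves out `-o root -g root` so that it runs unprivileged. The scratch layout is hypothetical. `RM_CONFIG_DIR` and `RM_STATE_DIR` are the installer's own variable names.

```shell
#!/bin/sh
# Sketch: run an unprivileged copy of ensure_dirs in a scratch directory and
# report the resulting modes. Ownership flags are deliberately omitted here.
set -eu

scratch=$(mktemp -d)
RM_CONFIG_DIR="$scratch/etc/restic-manager"
RM_STATE_DIR="$scratch/var/lib/restic-manager"

ensure_dirs() {
    # install -d creates missing parents and applies -m to the final directory.
    install -d -m 0700 "$RM_CONFIG_DIR"
    install -d -m 0700 "$RM_STATE_DIR"
}

ensure_dirs
ensure_dirs   # a second run succeeds unchanged, matching the idempotency claim

config_mode=$(stat -c '%a' "$RM_CONFIG_DIR")
state_mode=$(stat -c '%a' "$RM_STATE_DIR")
echo "config=$config_mode state=$state_mode"

rm -rf "$scratch"
```

On a real host the installer additionally sets root:root ownership, which this sketch cannot check without privileges.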
@@ -10,20 +10,34 @@ ExecStart=/usr/local/bin/restic-manager-agent -config /etc/restic-manager/agent.
Restart=always
RestartSec=5

# Run as a dedicated unprivileged user; the install script creates it.
User=restic-manager-agent
Group=restic-manager-agent
# The agent runs as root. A fleet-backup tool needs to read every
# file on the system regardless of DAC permissions; running as a
# dedicated unprivileged user means either silent skips on /home,
# /root, /var/lib/<other-daemons>, or operators having to add the
# service user to every group whose files they want backed up. Both
# are worse than the threat model already implies (the agent holds
# repo credentials, executes arbitrary restic, and runs operator-
# defined hooks — its blast radius is already large).
#
# The mitigation is aggressive systemd sandboxing of the root
# process: drop all capabilities except the few we need, deny
# writes outside our state dirs, and forbid privilege escalation.
User=root
Group=root

# The agent reads its config and writes a small state file there.
# Anything else is read-only.
ReadWritePaths=/etc/restic-manager /var/lib/restic-manager
# CAP_DAC_READ_SEARCH lets us read any file regardless of DAC perms
# (the "backup everything" capability). CAP_DAC_OVERRIDE is needed
# during restore for chown/chmod to recreate ownership. Drop the
# rest — root in this process means "can read", not "can do".
CapabilityBoundingSet=CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_CHOWN
AmbientCapabilities=CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_CHOWN

# Hardening — restic itself needs filesystem read access to whatever
# paths it's backing up; we don't lock that down here. But everything
# else gets the standard systemd sandboxing toggles.
# Hardening — blocks privilege escalation even from root, and
# confines writes / network / kernel access to what restic actually
# needs. Filesystem reads stay open: that's the whole job.
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/etc/restic-manager /var/lib/restic-manager
ProtectHome=read-only
ProtectHostname=true
ProtectKernelTunables=true
@@ -31,12 +45,16 @@ ProtectKernelModules=true
ProtectKernelLogs=true
ProtectControlGroups=true
ProtectClock=true
PrivateTmp=true
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictRealtime=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources @reboot @swap @module @raw-io

[Install]
WantedBy=multi-user.target
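
Since the unit now runs as root, its capability allowlist is the main guard against privilege creep, and a small lint could catch accidental additions in review. The following is a sketch under stated assumptions: `extra_caps` and the allowlist variable are hypothetical helpers, the unit text is a trimmed stand-in for the real file, and the parser ignores systemd's `~` inversion and empty-assignment reset semantics.

```shell
#!/bin/sh
# Sketch: fail review if a unit file grants a capability outside an allowlist.
set -eu

allowed="CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_CHOWN"

# bounding_set FILE: print the values of every CapabilityBoundingSet= line.
bounding_set() {
    sed -n 's/^CapabilityBoundingSet=//p' "$1"
}

# extra_caps FILE: print each capability in the unit that is not in $allowed.
extra_caps() {
    for cap in $(bounding_set "$1"); do
        case " $allowed " in
            *" $cap "*) ;;                  # expected capability
            *) printf '%s\n' "$cap" ;;      # anything else is reported
        esac
    done
}

unit=$(mktemp)
cat >"$unit" <<'EOF'
[Service]
User=root
CapabilityBoundingSet=CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_CHOWN
NoNewPrivileges=true
EOF

extra=$(extra_caps "$unit")
if [ -z "$extra" ]; then
    echo "ok"
else
    echo "unexpected capabilities: $extra"
fi
```

On a host with systemd, `systemd-analyze security restic-manager-agent.service` gives a broader exposure report for the installed unit. The lint above only covers the bounding set.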