P1 polish: agent-as-root, init-repo flow, rest creds passthrough, UX fixes

Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:

* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
with ReadWritePaths confined to /etc + /var/lib/restic-manager;
NoNewPrivileges blocks escalation. Install script no longer
creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
rationale (matches UrBackup / Veeam / Bareos defaults; trying to
back up "everything" as an unprivileged user creates silent skips
on /home, /root, /var/lib/* with no upside vs the threat model
the agent already implies).
* Init-repo end-to-end. New JobKind="init" wired through agent
runner, restic.Env.RunInit, server dispatcher, and a UI button
(red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
flips on init success, on backup success, or on a non-empty
snapshots.report. The "Run now" / "Init" / "Retry" branching now
drives both the dashboard host row and the host-detail panel.
Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
the safe create-new-then-rename pattern; first version corrupted
job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
affected DBs).
* rest-server creds embedded at exec time only. restic.Env gains
RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
inside envSlice() and never assigns it back to the struct, so
nothing slog-able ever sees the cleartext form. RedactURL helper
for any future surface that needs to log a URL safely. Both
helpers tested.
* Add-host UX. Repo password is now optional — server mints a
24-byte URL-safe random one and surfaces it once, alongside an
htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
the operator pastes one command on the rest-server host and one
on the endpoint. Result page also links the install snippet at
/install/install.sh (was /install.sh — 404'd before) and pipes
to bash (not sh — script uses set -o pipefail and other
bashisms; on Debian/Ubuntu sh is dash).
* Late-subscriber race in JobHub. A fast-failing job could finish
(DB write + Broadcast) before the browser's HX-Redirect → page
load → WS-connect path completed, so the JS sat forever waiting
on a job.finished that already passed. JobHub split into
Register + Send + Run; handleJobStream now subscribes first,
re-fetches the job, and sends a synthetic job.finished if the
state is already terminal.
* HTMX error visibility. New toast partial listens to
htmx:responseError and surfaces the response body as a
bottom-right toast — every server-side validation error now
becomes visible without per-handler JS wiring. Also handles
custom rm:toast events for future server-pushed notifications
via the HX-Trigger header. Themed via existing CSS vars.
* Dashboard rows are now whole-row clickable to host detail
(CSS card-link pattern: absolute-positioned anchor + .row-action
z-index restoration so the action button stays clickable).
"View →" on a running job links to /jobs/<id> rather than
/hosts/<id> since the row click already covers the host page.
* "Run first" / "Run first backup" → "Run now" everywhere for
consistency.
* runbook (docs/e2e-smoke.md) updated — live-log streaming step
now reflects P1-26; mentions the browser-driven Run-now flow.
* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
it up; .gitignore now excludes /_diag/ entirely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
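
The credentials bullet above describes building the user:pass@ URL only at exec time and redacting anything that gets logged. A minimal sketch of that idea follows. It is not the commit's code: `withRestCreds` and `redactURL` are hypothetical stand-ins for `mergeRestCreds()` and `RedactURL`, whose real signatures are not shown on this page, and the sketch assumes the repo string uses restic's `rest:` prefix.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// restPrefix is the prefix restic uses for rest-server repositories.
const restPrefix = "rest:"

// withRestCreds returns repo with user:pass embedded as URL userinfo.
// Hypothetical stand-in for mergeRestCreds; repos without the rest:
// prefix and empty usernames pass through unchanged.
func withRestCreds(repo, user, pass string) (string, error) {
	if user == "" || !strings.HasPrefix(repo, restPrefix) {
		return repo, nil
	}
	u, err := url.Parse(strings.TrimPrefix(repo, restPrefix))
	if err != nil {
		return "", fmt.Errorf("parse repo URL: %w", err)
	}
	u.User = url.UserPassword(user, pass)
	return restPrefix + u.String(), nil
}

// redactURL masks any password in repo so the result is safe to log.
// Hypothetical stand-in for RedactURL.
func redactURL(repo string) string {
	prefix := ""
	if strings.HasPrefix(repo, restPrefix) {
		prefix = restPrefix
		repo = strings.TrimPrefix(repo, restPrefix)
	}
	u, err := url.Parse(repo)
	if err != nil {
		// Never echo an unparseable string: it may still contain the secret.
		return prefix + "<unparseable repo URL>"
	}
	// URL.Redacted replaces the password (if any) with "xxxxx".
	return prefix + u.Redacted()
}

func main() {
	// The merged value is computed at the point of use and not stored back.
	merged, err := withRestCreds("rest:https://backup.example.com:8000/host1/", "alice", "s3cret")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	env := []string{"RESTIC_REPOSITORY=" + merged}
	fmt.Println("env entries built:", len(env))
	fmt.Println("safe to log:", redactURL(merged))
}
```

Keeping the merged URL in a local variable, rather than assigning it to a struct field, is what stops a later structured-log call on the struct from seeing the cleartext form.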
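
The late-subscriber fix in the commit message (subscribe first, re-fetch the job, send a synthetic job.finished if it is already terminal) can be sketched as below. This is an illustration under assumptions, not the JobHub code: the `hub` type, `subscribe`, and the `lookup` callback are hypothetical, and only the terminal status names are taken from the jobs.status CHECK constraint in this commit.

```go
package main

import (
	"fmt"
	"sync"
)

const eventFinished = "job.finished"

// hub fans events out to registered subscribers (hypothetical, far
// smaller than the real JobHub).
type hub struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func newHub() *hub { return &hub{subs: make(map[chan string]struct{})} }

// register adds a buffered subscriber channel and returns it.
func (h *hub) register() chan string {
	ch := make(chan string, 8)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// broadcast delivers ev to every current subscriber and drops it for
// subscribers whose buffer is full.
func (h *hub) broadcast(ev string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- ev:
		default:
		}
	}
}

// isTerminal reports whether a job status can no longer change.
func isTerminal(status string) bool {
	return status == "succeeded" || status == "failed" || status == "cancelled"
}

// subscribe registers first and only then re-reads the job status.
// A job that finished before registration gets a synthetic event; a job
// that finishes after registration is delivered by broadcast. A job
// finishing between the two steps may produce both, so consumers must
// tolerate a duplicate job.finished.
func subscribe(h *hub, lookup func() string) chan string {
	ch := h.register()
	if isTerminal(lookup()) {
		select {
		case ch <- eventFinished:
		default:
		}
	}
	return ch
}

func main() {
	h := newHub()
	h.broadcast(eventFinished) // the job fails fast; nobody is listening yet
	ch := subscribe(h, func() string { return "failed" })
	fmt.Println("late subscriber received:", <-ch)
}
```

The ordering is the whole point: checking the status before registering would reopen the window in which the real event can slip past unseen.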
@@ -50,7 +50,7 @@ func (s *Store) LookupHostByAgentToken(ctx context.Context, tokenHash string) (*
 		       enrolled_at, last_seen_at, status, repo_id, tags,
 		       current_job_id, last_backup_at, last_backup_status,
 		       repo_size_bytes, snapshot_count, open_alert_count,
-		       applied_schedule_version, default_paths
+		       applied_schedule_version, default_paths, repo_initialised_at
 		FROM hosts WHERE agent_token_hash = ?`,
 		tokenHash)
 	return scanHost(row)
@@ -63,7 +63,7 @@ func (s *Store) GetHost(ctx context.Context, id string) (*Host, error) {
 		       enrolled_at, last_seen_at, status, repo_id, tags,
 		       current_job_id, last_backup_at, last_backup_status,
 		       repo_size_bytes, snapshot_count, open_alert_count,
-		       applied_schedule_version, default_paths
+		       applied_schedule_version, default_paths, repo_initialised_at
 		FROM hosts WHERE id = ?`, id)
 	return scanHost(row)
 }
@@ -124,7 +124,7 @@ func (s *Store) ListHosts(ctx context.Context) ([]Host, error) {
 		       enrolled_at, last_seen_at, status, repo_id, tags,
 		       current_job_id, last_backup_at, last_backup_status,
 		       repo_size_bytes, snapshot_count, open_alert_count,
-		       applied_schedule_version, default_paths
+		       applied_schedule_version, default_paths, repo_initialised_at
 		FROM hosts ORDER BY name`)
 	if err != nil {
 		return nil, fmt.Errorf("store: list hosts: %w", err)
@@ -163,13 +163,14 @@ func scanHostRow(s hostScanner) (*Host, error) {
 		enrolled     string
 		tags         string
 		defaultPaths string
+		repoInitAt   sql.NullString
 	)
 	err := s.Scan(&h.ID, &h.Name, &h.OS, &h.Arch,
 		&h.AgentVersion, &h.ResticVersion, &h.ProtocolVersion,
 		&enrolled, &lastSeen, &h.Status, &repoID, &tags,
 		&currentJob, &lastBackupAt, &lastBkSt,
 		&h.RepoSizeBytes, &h.SnapshotCount, &h.OpenAlertCount,
-		&h.AppliedScheduleVersion, &defaultPaths)
+		&h.AppliedScheduleVersion, &defaultPaths, &repoInitAt)
 	if err != nil {
 		if errors.Is(err, sql.ErrNoRows) {
 			return nil, ErrNotFound
@@ -213,5 +214,28 @@ func scanHostRow(s hostScanner) (*Host, error) {
 	if defaultPaths != "" {
 		_ = json.Unmarshal([]byte(defaultPaths), &h.DefaultPaths)
 	}
+	if repoInitAt.Valid {
+		t, err := time.Parse(time.RFC3339Nano, repoInitAt.String)
+		if err != nil {
+			return nil, fmt.Errorf("store: parse repo_initialised_at: %w", err)
+		}
+		h.RepoInitialisedAt = &t
+	}
 	return &h, nil
 }
+
+// MarkHostRepoInitialised sets repo_initialised_at to `when` if it is
+// currently NULL. Idempotent: re-firing for an already-initialised
+// host is a no-op (we never want to clobber the original timestamp).
+// Returns true if the row was updated, false if it was already set.
+func (s *Store) MarkHostRepoInitialised(ctx context.Context, hostID string, when time.Time) (bool, error) {
+	res, err := s.db.ExecContext(ctx,
+		`UPDATE hosts SET repo_initialised_at = ?
+		 WHERE id = ? AND repo_initialised_at IS NULL`,
+		when.UTC().Format(time.RFC3339Nano), hostID)
+	if err != nil {
+		return false, fmt.Errorf("store: mark repo initialised: %w", err)
+	}
+	n, _ := res.RowsAffected()
+	return n > 0, nil
+}
@@ -0,0 +1,15 @@
+-- 0004_repo_initialised.sql
+--
+-- Track whether a host's restic repo has been initialised. Set when:
+--   1. an `init` job succeeds, OR
+--   2. any backup job succeeds (proves the repo exists), OR
+--   3. a snapshots.report arrives with at least one snapshot.
+--
+-- Once set, never cleared by code, only by the operator deleting the
+-- host or wiping the column manually if they re-pointed the agent at
+-- a different (empty) repo. The UI keys off NULL/non-NULL to decide
+-- whether to surface the red "Initialise repo" affordance in the
+-- run-now panel.
+
+ALTER TABLE hosts
+    ADD COLUMN repo_initialised_at TEXT;
@@ -0,0 +1,47 @@
+-- 0005_jobs_init_kind.sql
+--
+-- Add 'init' to the jobs.kind CHECK constraint so the operator can
+-- dispatch a `restic init` job from the UI before the first backup.
+-- SQLite can't ALTER a CHECK in place, so we rebuild the table.
+--
+-- Rebuild pattern note: we create jobs_new (with the wider CHECK),
+-- copy data over, DROP the original jobs table, then ALTER RENAME
+-- jobs_new TO jobs. This avoids the trap of renaming the original
+-- first: with legacy_alter_table=OFF (the modern default), a rename
+-- propagates into FK references in dependent tables (e.g.
+-- job_logs.job_id), leaving them pointing at the temporary name even
+-- after we drop it. Migration 0006 cleans up the orphan FK left by
+-- the first version of this migration on already-affected DBs.
+
+PRAGMA foreign_keys = OFF;
+
+CREATE TABLE jobs_new (
+    id           TEXT PRIMARY KEY,
+    host_id      TEXT NOT NULL REFERENCES hosts(id) ON DELETE CASCADE,
+    kind         TEXT NOT NULL CHECK (kind IN ('backup','init','forget','prune','check','unlock')),
+    status       TEXT NOT NULL CHECK (status IN ('queued','running','succeeded','failed','cancelled')),
+    scheduled_id TEXT REFERENCES schedules(id) ON DELETE SET NULL,
+    actor_kind   TEXT NOT NULL CHECK (actor_kind IN ('user','schedule','system')),
+    actor_id     TEXT,
+    started_at   TEXT,
+    finished_at  TEXT,
+    exit_code    INTEGER,
+    stats        TEXT,
+    error        TEXT,
+    created_at   TEXT NOT NULL
+);
+
+INSERT INTO jobs_new
+    SELECT id, host_id, kind, status, scheduled_id, actor_kind, actor_id,
+           started_at, finished_at, exit_code, stats, error, created_at
+    FROM jobs;
+
+DROP TABLE jobs;
+
+ALTER TABLE jobs_new RENAME TO jobs;
+
+CREATE INDEX jobs_host_id ON jobs(host_id);
+CREATE INDEX jobs_status ON jobs(status);
+CREATE INDEX jobs_created_at ON jobs(created_at);
+
+PRAGMA foreign_keys = ON;
@@ -0,0 +1,33 @@
+-- 0006_fix_job_logs_fk.sql
+--
+-- The first version of migration 0005 rebuilt the jobs table by
+-- renaming the original to jobs_old before dropping it. SQLite (with
+-- legacy_alter_table=OFF, the modern default) propagated that rename
+-- into the FK declaration of job_logs.job_id, which now points at
+-- jobs_old, a table that no longer exists. INSERTs into job_logs
+-- fail with "no such table: main.jobs_old (1)".
+--
+-- Rebuild job_logs using the safe pattern: create job_logs_new with
+-- a clean FK to jobs, copy rows, drop the broken job_logs, rename
+-- job_logs_new to job_logs. Renaming job_logs_new is safe because
+-- nothing references it.
+
+PRAGMA foreign_keys = OFF;
+
+CREATE TABLE job_logs_new (
+    job_id  TEXT NOT NULL REFERENCES jobs(id) ON DELETE CASCADE,
+    seq     INTEGER NOT NULL,
+    ts      TEXT NOT NULL,
+    stream  TEXT NOT NULL CHECK (stream IN ('stdout','stderr','event')),
+    payload TEXT NOT NULL,
+    PRIMARY KEY (job_id, seq)
+);
+
+INSERT INTO job_logs_new (job_id, seq, ts, stream, payload)
+    SELECT job_id, seq, ts, stream, payload FROM job_logs;
+
+DROP TABLE job_logs;
+
+ALTER TABLE job_logs_new RENAME TO job_logs;
+
+PRAGMA foreign_keys = ON;
@@ -62,6 +62,12 @@ type Host struct {
 	// operator hits "Run now" without supplying paths. Phase 1
 	// interim; schedules (P2-01) supersede this.
 	DefaultPaths []string
+	// RepoInitialisedAt is non-nil once we've confirmed the host's
+	// repo has been initialised: either the operator clicked the
+	// init button, or a backup succeeded, or snapshots.report came
+	// back non-empty. The host detail run-now panel shows a red
+	// "Initialise repo" affordance while this is nil.
+	RepoInitialisedAt *time.Time
 }
 
 // EnrollmentToken is the issuer's view of a one-time token. The