main
7 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
02e4ef7544 |
testing: bootstrap UI, agent reliability, NS-01..04 + alert username
Smoothes the rough edges that came up exercising a live deployment.
First-run bootstrap UI: /bootstrap renders a username + password form
that uses the in-memory token directly (operator no longer copies it
out of the log); /login redirects there while bootstrap is available.
Agent reliability: failJob synthetic envelopes so command.run early
returns no longer hang the server-side job; runtime probe of restic
restore --help drives --no-ownership instead of version sniffing
(0.18.x had it removed). Server unit re-shaped: ProtectSystem=full
plus ReadWritePaths=/etc/restic-manager, no ProtectHome — restore
can now write anywhere a user might want.
Restore wizard: default target is /root/rm-restore/<job-id>/ with
clearer help text. Re-init confirm input uses .field (was .input,
which doesn't exist — text was invisible).
NS-01 host delete: store DeleteHost, admin-band /hosts/{id}/delete
with hostname-confirm danger zone, audit, FK cascade, live WS close.
NS-02 enrollment-token recovery: outstanding-tokens panel on
/hosts/new, regenerate (preserves attachments) and revoke handlers
+ audit, store-level ListOutstandingEnrollmentTokens and
DeleteEnrollmentToken.
NS-03 repo init / probe surface: migration 0020 adds
hosts.repo_status + repo_status_error; WS handler projects every
init job's outcome onto the host row (idempotent already-initialised
collapses to ready); creds-save resets status and dispatches a fresh
probe; /hosts/{id}/repo/probe retry endpoint with banner.
NS-04 dashboard live + sort + filter: query-string filter
(q/status/repo_status/tag/sort/dir), 5s htmx live poll mirroring the
alerts pattern with a localStorage live toggle, sortable column
headers, filter row + clear.
Alerts page: ack'd-by line resolves user_id ULID to username.
Compose.yaml ignored — host-specific.
|
||
|
|
c5777122db |
Add-host: default repo username to hostname; always show htpasswd snippet
The pending page suppressed the htpasswd snippet when repo_username was blank — but with --private-repos the username is required for auth, and operators routinely leave the field blank assuming the system will pick something sensible. * handleUIAddHostPost defaults repo_username to the typed hostname when blank. Matches what --private-repos expects (URL path segment == username). * pending_host.html: snippet now renders whenever a password is present (always true after the generate-on-blank logic landed earlier). * Form help-text updated to describe the default explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c1f85da55f |
Add-host: durable pending page + polled awaiting-agent panel
Two issues from a smoke session:
1. The awaiting-agent panel never refreshed — operator had to go
back to the dashboard to see the host had connected.
2. Generated passwords were displayed only on the POST response.
Navigating away (or even an accidental tab close) lost them
permanently, so the operator couldn't update the rest-server's
htpasswd.
Both are the same fix: convert the POST-rendered transient
"result state" into a durable GET page at /hosts/pending/{token}.
* New route GET /hosts/pending/{token} renders the install-command +
htpasswd snippet view. Password is decrypted from the (still-
encrypted-at-rest) token row on every render — operator can
refresh, bookmark, navigate away and come back. Once the agent
enrols, the page redirects to /hosts/{id}; once the token
expires, redirect to /hosts/new.
* New route GET /hosts/pending/{token}/awaiting returns a polled
HTML fragment that the pending page swaps in every 2s via HTMX.
States: awaiting (keep polling) | connected (show "Open host →"
+ "View schedules" CTAs, polling stops) | expired (mint-new
link, polling stops). Polling stops naturally because only the
awaiting state's wrapper carries the hx-trigger attribute.
* POST /hosts/new now 303-redirects to /hosts/pending/{token}
on success; validation errors keep re-rendering the form with
banner.
Supporting changes:
* New store helper Store.GetEnrollmentTokenStatus(tokenHash) for
the polling endpoint — returns {expires_at, consumed_at,
consumed_host} in one round-trip without dragging in the
attachments-decryption path.
* New ui.Renderer.RenderPartial(w, name, data) for HTMX fragment
responses (no layout wrap). Picks an arbitrary page's template
set as the lookup point — every page parses the full common-
paths list, so they all see every partial.
* add_host.html stripped to form-only; pending_host.html owns the
result-state UI; awaiting_agent.html is the polled partial.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8fb1c100fd |
P2-04.5: kill host.default_paths in favour of manual schedules
Two independent path lists for "what does this host back up?" was
a real divergence footgun — operator types one set at Add-host time
and a different set into a schedule, both end up in the same repo,
the snapshot history looks fine until restore. Resolution: drop
host.default_paths entirely; add a `manual` flag on schedules.
A manual schedule has paths/excludes/tags/retention like any other
but no cron — it fires only via per-schedule Run-now. Single source
of truth for what gets backed up.
Schema (migration 0007):
* schedules.manual INTEGER NOT NULL DEFAULT 0.
* For every host with non-empty default_paths, seed a manual
schedule with those paths and bump host_schedule_version.
* ALTER TABLE hosts DROP COLUMN default_paths.
* ALTER TABLE enrollment_tokens RENAME COLUMN default_paths
TO initial_paths.
Original draft of this migration rebuilt hosts via the
create-new + drop-old + rename-new pattern. With foreign_keys=ON
(set in the connection DSN), DROP TABLE on the parent fired
ON DELETE CASCADE on every child of hosts(id) — schedules /
jobs / snapshots / host_credentials all wiped on the smoke env
when I tried it. SQLite 3.35+ supports column-level ALTERs
directly, so we skip the rebuild dance and avoid the cascade
trap. Six lines of SQL instead of sixty, no FK risk.
Run-now rewiring:
* New `dispatchScheduleNow(hostID, scheduleID, conn?)` helper
unifies the agent-driven path (cron fire → schedule.fire →
OnScheduleFire callback) and the UI-driven path (operator
clicks Run-now on a schedule row). Conn arg is optional; nil
falls back to Hub.Send.
* New POST /hosts/{id}/schedules/{sid}/run endpoint — per-row
Run-now button on the schedules list.
* Dashboard's per-host Run-now (handleUIRunBackup) now picks the
host's only enabled manual schedule, falls back to the only
enabled schedule, else returns "pick one in Schedules tab".
Keeps one-click for the common case.
Agent:
* Scheduler skips manual schedules in cron build (silent — they're
a normal data shape, not an error).
* Wire Schedule struct gains Manual flag.
* Schedule.fire flow unchanged — the agent only ever fires
non-manual schedules anyway.
UI:
* Add-host form retitled "Initial schedule · manual" so the
operator knows the paths become an editable schedule under
the Schedules tab. Result page calls out the manual schedule
+ points at Host > Schedules.
* Schedule edit form: "Manual schedule" checkbox at the top of
the When section; toggling it hides/shows the cron field via
inline JS. Server-side validator skips the cron requirement
when manual=true.
* Schedule list shows a "manual" tag under the status pill and
renders the When column as "— run-now only —" for manual rows.
Each row gets a Run-now button when the schedule is enabled
and the host is online.
Tests + go test ./... green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c8ead66f08 |
P1 polish: agent-as-root, init-repo flow, rest creds passthrough, UX fixes
Cohesive batch from a smoke-test session against a real rest-server.
Themed bullets:
* Agent runs as root, sandboxed via systemd. CapabilityBoundingSet
drops to CAP_DAC_READ_SEARCH + restore caps; ProtectSystem=strict
with ReadWritePaths confined to /etc + /var/lib/restic-manager;
NoNewPrivileges blocks escalation. Install script no longer
creates a service user. spec.md §4.2 / §14.1 / §14.3 explain the
rationale (matches UrBackup / Veeam / Bareos defaults; trying to
back up "everything" as an unprivileged user creates silent skips
on /home, /root, /var/lib/* with no upside vs the threat model
the agent already implies).
* Init-repo end-to-end. New JobKind="init" wired through agent
runner, restic.Env.RunInit, server dispatcher, and a UI button
(red "Initialise repo" in the run-now panel). hosts.repo_initialised_at
flips on init success, on backup success, or on a non-empty
snapshots.report. The "Run now" / "Init" / "Retry" branching now
drives both the dashboard host row and the host-detail panel.
Migrations 0004 (column), 0005 (jobs.kind CHECK widened — using
the safe create-new-then-rename pattern; first version corrupted
job_logs.job_id FK), 0006 (cleans up job_logs FK on already-
affected DBs).
* rest-server creds embedded at exec time only. restic.Env gains
RepoUsername; mergeRestCreds() builds the user:pass@-prefixed URL
inside envSlice() and never assigns it back to the struct, so
nothing slog-able ever sees the cleartext form. RedactURL helper
for any future surface that needs to log a URL safely. Both
helpers tested.
* Add-host UX. Repo password is now optional — server mints a
24-byte URL-safe random one and surfaces it once, alongside an
htpasswd snippet ("echo PASS | htpasswd -B -i ... USERNAME") so
the operator pastes one command on the rest-server host and one
on the endpoint. Result page also links the install snippet at
/install/install.sh (was /install.sh — 404'd before) and pipes
to bash (not sh — script uses set -o pipefail and other
bashisms; on Debian/Ubuntu sh is dash).
* Late-subscriber race in JobHub. A fast-failing job could finish
(DB write + Broadcast) before the browser's HX-Redirect → page
load → WS-connect path completed, so the JS sat forever waiting
on a job.finished that already passed. JobHub split into
Register + Send + Run; handleJobStream now subscribes first,
re-fetches the job, and sends a synthetic job.finished if the
state is already terminal.
* HTMX error visibility. New toast partial listens to
htmx:responseError and surfaces the response body as a
bottom-right toast — every server-side validation error now
becomes visible without per-handler JS wiring. Also handles
custom rm:toast events for future server-pushed notifications
via the HX-Trigger header. Themed via existing CSS vars.
* Dashboard rows are now whole-row clickable to host detail
(CSS card-link pattern: absolute-positioned anchor + .row-action
z-index restoration so the action button stays clickable).
"View →" on a running job links to /jobs/<id> rather than
/hosts/<id> since the row click already covers the host page.
* "Run first" / "Run first backup" → "Run now" everywhere for
consistency.
* runbook (docs/e2e-smoke.md) updated — live-log streaming step
now reflects P1-26; mentions the browser-driven Run-now flow.
* _diag/dump-creds — moved out of cmd/ so go build doesn't pick
it up; .gitignore now excludes /_diag/ entirely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8aa635f0c1 |
P1 polish: Host.default_paths interim + restic env hygiene + job_id JS quoting
Two fixes that close the loop on dashboard run-now and harden the
agent's restic invocation.
Default paths (interim until P2-01 schedules):
- 0003 migration adds default_paths TEXT NOT NULL DEFAULT '[]'
to hosts and to enrollment_tokens.
- Operator types paths in the Add-host form (textarea, one per
line). They ride on the enrol_token row alongside the
encrypted creds (paths aren't secret — plain JSON column).
- On consume, ConsumeEnrollmentToken still just burns the token;
the new GetEnrollmentTokenAttachments returns both the
re-bindable creds and the path list in one round trip, the
handler transfers them onto the new host row inside CreateHost.
- The dashboard's Run-now and host-detail's "Run backup now"
button now read Host.DefaultPaths and pass them to dispatchJob.
A host with no default paths returns 400 with a friendly
"no paths set" message instead of dispatching a doomed
`restic backup` with no positional args.
- Doc comments explicitly call this out as a Phase 1 interim —
schedules supersede.
Restic env hygiene:
- envSlice() previously omitted HOME / XDG_CACHE_HOME, which
bit the smoke runs whenever the agent was launched outside
systemd (restic refused to start: "neither $XDG_CACHE_HOME
nor $HOME are defined"). Now both are set explicitly: prefer
Env.ExtraEnv overrides, fall back to the agent process's own
HOME, and finally to /var/lib/restic-manager.
- Comment makes the env policy explicit: parent's RESTIC_* /
AWS_* / B2_* env is filtered out by design — control-plane
is the unambiguous source of truth.
JS bug fix in the live log page:
- {{$job.ID | printf "%q"}} produced a literal-quoted JS string,
which then went into the WS URL as ".../jobs/"<ID>"/stream"
→ 404. Switched to '{{$job.ID}}' inside the literal so
html/template's auto-escape does the right thing. Verified
end-to-end: dashboard "Run now" → live progress + log lines
arrive over the WS → succeeded pill renders.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9795492f2e |
P1-27: Add host flow — form + minted-token result page
GET /hosts/new renders the focused two-column form (hostname,
tags, repo URL/username/password). POST /hosts/new validates,
mints a one-time token via the new mintEnrollmentToken helper —
shared with the existing JSON /api/enrollment-tokens endpoint —
and re-renders the same page in result state showing:
- the install command with RM_SERVER + RM_TOKEN filled in (and
an inline copy-to-clipboard button),
- an "awaiting agent connection" panel with the hostname
pre-filled,
- a troubleshooting list pointing at the most common reasons
the agent doesn't appear,
- back-to-dashboard / add-another-host links.
publicURL() resolves RM_BASE_URL first, falling back to scheme +
Host on the inbound request — useful for local smoke without a
proxy.
Browser-verified end-to-end: form submit → token minted → install
command renders with the right values from the form input.
template fn formatRelTime now accepts time.Time *or* *time.Time
so templates can pass either without fighting Go's lack of an
address-of operator.
Deferred: download-preconfigured-installer (a templated .sh with
the values baked in) — copy-paste covers v1; nice-to-have later.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|