ee16bc7ce7775d1b7d68e93a25028cbce18bfe4c
13 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ee16bc7ce7 |
P1-24: live dashboard — fleet summary tiles + host table
Server-rendered HTML view backed by:
- new store.FleetSummary aggregating host counts + repo bytes +
snapshot total + open alerts + last-24h job rollup in two queries.
- GET /api/hosts (JSON list of hosts in the dashboard projection).
- GET /api/fleet/summary (JSON aggregate, same shape as above).
The HTML page (web/templates/pages/dashboard.html) renders the four
summary tiles + host table directly from store data — no separate
fetch. Per-row state colour comes from .host-row.{degraded,failed,
offline} which paint a 3px left edge so problem hosts are scannable
without reading. HTMX is loaded into the base layout so per-row
"Run now" buttons can hx-post to /hosts/{id}/run-backup, a thin
HTML wrapper that funnels into a new dispatchJob helper shared
with the JSON /api/hosts/{id}/jobs endpoint.
Empty state (zero hosts) collapses to the "no hosts yet" prompt
with the + Add host CTA — matches the v1 mockup.
Template helpers (internal/server/ui/funcs.go) added for byte
formatting (412 GB / 3.7 TB), relative time (3m ago / 2d ago), and
comma grouping (1,847). Pure Go, no template-magic dependency.
Browser-verified end-to-end with seeded fixture data: five hosts
across all four states render with correct dots, accents, last-
backup pills, sizes, snapshot counts, alerts, tags, and the right
action button (Run now / Retry / Run first / View → / offline).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
229f89fee2 |
P1-23 / P1-28: base layout, login, session-aware nav + Tailwind build
P1-28: Tailwind standalone CLI wired into the Makefile. `make tailwind` downloads the pinned v3.4.17 binary into bin/tailwindcss (gitignored), builds web/styles/input.css → web/static/css/styles.css. `make build` now runs the CSS pass first; `make tailwind-watch` for dev. Output is embedded in the binary via web.FS — single static binary, no Node. The CSS source carries every component class the v1 mockups defined (status dots, buttons, host row, log viewer, progress bar, fields, chips, snippet panel, empty state) so screens that land later can just reach for them. P1-23: html/template tree at web/templates with two layouts (base with chrome, chromeless for login + bootstrap), one nav partial, and two pages (dashboard placeholder, login). internal/server/ui parses the tree at startup; ui_handlers.go in the http package wires: GET / dashboard (303 → /login when unauthed) GET /login sign-in form POST /login consume form, mint session cookie, 303 → / POST /logout drop cookie, 303 → /login GET /static/* embedded Tailwind bundle The HTML login flow shares store/session logic with /api/auth/login via a new authenticateAndSession helper — same security guarantees, two surface representations (HTML form / JSON). Verified end-to-end: bootstrap → form-login → authed dashboard → sign-out → 303 cycle works in the browser; Tailwind output emits only the component classes referenced in the live templates (9.6kB minified). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b6cfa99413 |
agent: log accept/complete on backup jobs; audit: populate host.enrolled payload
Two warts surfaced during the smoke run:
- Agent was silent between "config.update applied" and "job
finished" — operators tailing journalctl saw no acknowledgement
that a command.run had landed. Adds Info logs at job-accept
({job_id, paths}) and at successful completion.
- The host.enrolled audit row had an empty {} payload. Now
carries {hostname, os, arch, has_repo_creds} so an audit-log
reader can answer "what got enrolled and did the operator
bundle creds with the token" without joining back to hosts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2418e585db |
fix: enrollment FK race + log-when-rejected; runbook fixes from dry-run
The smoke runbook caught a real bug: ConsumeEnrollmentToken was
inserting into host_credentials (FK -> hosts) inside the same tx as
the token burn, but the host row didn't exist yet — CreateHost
runs in the *next* statement. The agent saw a generic 401 with no
clue why.
Fix: drop the host_credentials insert from ConsumeEnrollmentToken;
the HTTP handler now does Consume -> CreateHost ->
SetHostCredentials. SetHostCredentials failure is logged loudly
but doesn't fail the enrol — operator recovers via PUT
/api/hosts/{id}/repo-credentials.
Adds slog.Warn lines on both 401 paths in handleAgentEnroll so the
underlying cause is visible in server logs (the wire response stays
generic to avoid leaking which step failed).
Test: TestEnrollmentTransfersRepoCreds rewritten to mirror the new
order (consume -> create host -> SetHostCredentials).
Runbook (docs/e2e-smoke.md): rest-server moved off 8000 (commonly
in use); URLs use trailing slash on the rest path; clarified that
secrets_key is minted on first agent start, not at enrol time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5d1951ad94 |
P1-34: e2e smoke runbook + redacted GET /repo-credentials
Adds docs/e2e-smoke.md — an ~5-minute runbook that walks the full
P1 happy path against a sibling restic/rest-server: bootstrap
admin, mint token with repo creds, enrol an agent, watch the
config.update push land, run a backup, confirm the snapshot, edit
creds and watch the second push fire. Per the design discussion
this is a runbook (not a Go integration test); the Playwright
version lands in P5-06.
GET /api/hosts/{id}/repo-credentials returns the redacted view —
{repo_url, repo_username, has_password} — so the UI can pre-fill
the edit form without ever pulling the password out of the AEAD
blob.
Marks P1-32 / P1-33 / P1-34 done in tasks.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
0ba56ed30d |
P1-32: server-side encrypted repo creds + push-on-hello
Operator-minted enrollment tokens now carry the repo URL/username/
password as one AEAD blob bound (via additional-data) to the token
hash. ConsumeEnrollmentToken re-encrypts under host_id and writes a
host_credentials row in the same tx as token-burn, so the binding
moves with the credential.
PUT /api/hosts/{id}/repo-credentials lets an operator edit creds
post-enrollment; merges with the existing blob, audits, and pushes
config.update if the agent is connected.
WS handler grows an OnHello hook that the HTTP layer wires to send
the host's decrypted creds as a config.update immediately after the
hello succeeds — synchronously, so a racing command.run lands after
the agent has its repo password.
Schema: 0002_host_credentials.sql adds enc_repo_creds to
enrollment_tokens and a host_credentials table (PK = host_id, FK
ON DELETE CASCADE).
Tests: round-trip token → consume → host_credentials with AAD swap
detection; no-creds path stays compatible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3904a78f14 |
P1-22: snapshot listing via restic snapshots --json
Agent calls restic snapshots --json after each successful backup
(60s timeout, separate from the backup ctx) and ships the projection
over the existing snapshots.report WS envelope. Failure here is
logged but doesn't fail the job — the next successful backup catches
the projection up.
Server-side ReplaceHostSnapshots is delete-then-insert plus a
hosts.snapshot_count update in one transaction so the dashboard's
per-host count stays consistent with the projection. New read
endpoint GET /api/hosts/{id}/snapshots returns the cached list with
a refreshed_at marker so the UI can show staleness when an agent
has been offline.
Schema: dropped the unused snapshots.repo_id FK (repos as a
first-class entity is P2 work), added short_id and refreshed_at
columns, switched the time index to DESC for the most-recent-first
list query. api.Snapshot gains short_id; size_bytes/file_count come
from the embedded summary block on restic 0.16+ and stay zero on
older clients.
Tests cover round-trip, authoritative replacement after forget+prune
shrinkage, and empty-after-wipe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
41a4043af3 |
server: drop in-process TLS — HTTP-only behind reverse proxy
Self-hosted deployments already terminate TLS at Caddy/Traefik/nginx; making the server do TLS too means double cert config, dual ACME plumbing, and an untested code path. Drop RM_TLS_CERT/RM_TLS_KEY, remove TLSEnabled() and the ListenAndServeTLS branch. Replace the cookie's "Secure if TLS-in-process" check with a new RM_COOKIE_SECURE flag (default true). Local HTTP-only testing sets RM_COOKIE_SECURE=false; production is always behind a TLS proxy and the cookie stays Secure. Default port :8443 → :8080. docker-compose binds 127.0.0.1 only and populates RM_TRUSTED_PROXY. spec.md §4.1/§10.1 rewritten with a Caddyfile snippet and a hard "do not expose RM_LISTEN publicly" warning. enrollResponse keeps cert_pin_sha256 in the shape but the server can't introspect a cert it doesn't terminate — operator pastes the proxy's hash into -cert-pin at install time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
95b49ecab9 |
phase 1: run-now backup — restic wrapper, job lifecycle, end-to-end
Lands the operator → server → agent → restic → server roundtrip for
on-demand backups. The flow:
POST /api/hosts/{id}/jobs {kind:"backup",args:["/path"]}
→ server creates a queued Job row
→ server emits command.run over WS to the host's agent
→ agent dispatcher spawns runner.RunBackup in a goroutine
→ runner spawns `restic backup --json`, parses each line
→ forwards: job.started, log.stream (every line), job.progress
(throttled to 1/sec), job.finished (with summary stats blob)
→ server WS handler persists those into jobs / job_logs
P1-16 internal/restic: thin Locate + Env wrapper that runs `restic
backup --json`, scans stdout/stderr, parses BackupStatus +
BackupSummary, calls back into a LineHandler so the agent can fan
out to log.stream + job.progress. Treats exit code 3 as
"succeeded with issues" (matches restic's contract).
P1-18 store: jobs accessors (CreateJob, MarkJobStarted,
MarkJobFinished, AppendJobLog, GetJob).
P1-19 server: POST /api/hosts/{id}/jobs creates the Job row,
validates kind, dispatches via Hub.Send, audit-logs the action.
P1-20 agent runner: wraps restic.RunBackup with throttled progress
emission. Sender abstraction was added to wsclient.Handler so
background goroutines can keep replying after dispatch returns.
P1-21 server WS: dispatchAgentMessage now persists job.started,
job.finished, log.stream into the database. Browser fan-out for
live tailing lands with the UI work.
Agent gets repo_url + repo_password from agent.yaml in plaintext
for now (mode 0600, owned by service user); spec.md §7.3's keyring
storage moves there in P2. config.update over WS overrides the
in-memory copy (does not persist).
Build clean; all tests pass. End-to-end with a real restic still
needs a host that has restic installed — wire shape verified by
the existing hello/heartbeat round-trip test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e8eccd20c2 |
phase 1: agent install path — systemd unit, install.sh, asset endpoints
P1-14 deploy/install/restic-manager-agent.service: standard systemd
unit with the usual hardening switches (NoNewPrivileges, Protect*,
RestrictRealtime, MemoryDenyWriteExecute). Restart=always with a
5s backoff. Runs as a dedicated unprivileged restic-manager-agent
user; the install script creates it.
P1-29 deploy/install/install.sh: arch detection (amd64/arm64), pulls
the agent binary from /agent/binary, creates the service user
+ dirs (/etc/restic-manager, /var/lib/restic-manager), runs
enrollment via `agent -enroll-server -enroll-token`, lays down
the systemd unit, enables and starts it.
Honours the spec's "detect, don't auto-disable" rule for existing
schedulers: scans systemd timers, /etc/cron.d/*, /etc/cron.daily/*,
root crontab for restic-named entries and prints them with the
exact disable command — operator decides.
P1-31 server endpoints to ship the agent installation payload:
GET /agent/binary?os=linux&arch=amd64 → serves
<DataDir>/agent-binaries/restic-manager-agent-linux-amd64
GET /install/<file> → serves
<DataDir>/install/<file>
Both endpoints reject path traversal and return 404 if the file
isn't published. Operators drop the binaries + service unit into
these directories at release time. Signed-bundle verification is
deferred to Phase 5 OSS readiness.
All tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f34773b505 |
phase 1: WS transport, enrollment, agent that hellos and heartbeats
Lands the protocol layer end-to-end: an agent can be enrolled through the operator UI, store credentials, dial back to the server over WS, complete the protocol_version handshake, and stay connected with periodic heartbeats. Server side: - P1-09 ws.Hub: one Conn per host_id, last-write-wins eviction, json envelope writer with a write mutex, reader, error envelopes. - P1-09 ws.AgentHandler: bearer-auth, accept upgrade, hello-stage (10s deadline, protocol_version checked against api.MinAgentProtocolVersion → ErrProtocolTooOld with help URL on reject), main read loop, defer hub register/unregister. - P1-10 POST /api/agents/enroll consumes a one-time token, mints a persistent agent bearer (sha-256 stored), creates a host row. - P1-10 POST /api/enrollment-tokens (operator, session-auth) issues a 1h one-time token. - P1-11 hello upserts agent_version + restic_version + protocol_version on the host row, flips status to online. - P1-12 heartbeat touches last_seen_at; background sweeper marks hosts offline after 90s without one. - store: hosts table accessors, host_schedule_version, enrollment_tokens FK on consumed_host dropped (audit-only field; the token gets burned before the host row exists). Agent side: - P1-13 internal/agent/config: yaml at /etc/restic-manager/agent.yaml, atomic Save (tmp+fsync+rename), Enrolled() helper. - P1-15 internal/agent/wsclient: dial with bearer + optional TLS cert pinning (sha-256 of leaf), exponential backoff with jitter (1s → 60s cap), heartbeat goroutine, fatal handling for ErrProtocolTooOld. - P1-15 wsclient.Enroll: HTTP POST /api/agents/enroll with sysinfo. - P1-17 internal/agent/sysinfo: hostname/OS/arch/restic-version collection. restic detected by `restic version` parse; absent restic doesn't block startup. - cmd/agent: -enroll-server / -enroll-token flags drive first-run enrollment then exit (so the install script can hand off to systemd to run the persistent service). End-to-end smoke verified: bootstrap → login → issue token → enroll → run agent → server logs `ws agent connected` with the right host_id and protocol_version 1. All tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
84fd31ccaa |
phase 1: HTTP server + first-run bootstrap
P1-01 chi router, slog request log, graceful shutdown via signal
context. Health endpoint, /api/auth/login, /api/auth/logout,
/api/bootstrap. Background sweeper for expired sessions and
enrollment tokens (15 min cadence).
P1-04 (sessions half) HttpOnly Secure-when-TLS cookie carrying a
base64url token; server stores SHA-256(token) so a stolen DB
doesn't yield credentials. Unknown user and bad password collapse
to the same 401 response code so a probe can't enumerate names.
P1-05 first-run admin bootstrap. On a fresh DB the server mints a
one-time token and prints it to stderr inside a banner. The
/api/bootstrap handler accepts {token, username, password},
creates the first admin, then becomes a 409 forever.
P1-07 (partial) audit hooks fire on auth.login and auth.bootstrap.
Full middleware-driven coverage lands with the rest of the API.
internal/server/config: env > YAML > defaults. RM_LISTEN /
RM_DATA_DIR / RM_BASE_URL / RM_TLS_CERT / RM_TLS_KEY /
RM_SECRET_KEY_FILE / RM_TRUSTED_PROXY (CIDR list, validated).
End-to-end smoke test passes: server boots on a fresh dir,
prints the bootstrap token, POST /api/bootstrap creates the admin,
POST /api/auth/login returns 200 with a session cookie.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c9368de904 |
phase 0: project bootstrap
P0-01 Go module + cmd/server + cmd/agent skeletons + internal/ tree
P0-02 LICENSE (PolyForm NC 1.0.0), README, CONTRIBUTING
P0-03 golangci-lint, pre-commit, .editorconfig, .gitignore
P0-04 Gitea Actions CI: test (race+coverage), lint, cross-platform build matrix
P0-05 Dockerfile.server (multi-stage, distroless/static), docker-compose.yml
P0-06 Makefile with build/test/lint/fmt/run/release targets
build, vet, test, and cross-compile to linux/{amd64,arm64} + windows/amd64
all verified locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|