phase 1: WS transport, enrollment, agent that hellos and heartbeats

Lands the protocol layer end-to-end: an agent can be enrolled
through the operator UI, store credentials, dial back to the server
over WS, complete the protocol_version handshake, and stay
connected with periodic heartbeats.

Server side:
- P1-09 ws.Hub: one Conn per host_id, last-write-wins eviction,
  JSON envelope writer with a write mutex, reader, error envelopes.
- P1-09 ws.AgentHandler: bearer-auth, accept upgrade, hello-stage
  (10s deadline, protocol_version checked against
  api.MinAgentProtocolVersion → ErrProtocolTooOld with help URL on
  reject), main read loop, defer hub register/unregister.
- P1-10 POST /api/agents/enroll consumes a one-time token, mints a
  persistent agent bearer (sha-256 stored), creates a host row.
- P1-10 POST /api/enrollment-tokens (operator, session-auth)
  issues a 1h one-time token.
- P1-11 hello upserts agent_version + restic_version +
  protocol_version on the host row, flips status to online.
- P1-12 heartbeat touches last_seen_at; background sweeper marks
  hosts offline after 90s without one.
- store: hosts table accessors, host_schedule_version,
  enrollment_tokens FK on consumed_host dropped (audit-only field;
  the token gets burned before the host row exists).
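
The "sha-256 stored" part of P1-10 can be sketched as below. This is a
minimal illustration, not the code in this commit: mintBearer,
hashBearer and verifyBearer are hypothetical names, and the 32-byte
token length and hex encoding are assumptions. The idea is that only
the digest is persisted, so a leaked hosts table does not yield usable
bearers.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
)

// mintBearer returns a fresh random bearer token plus the hex SHA-256
// digest that would be stored in its place. Hypothetical helper; the
// 32-byte length is an assumption.
func mintBearer() (token, digest string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	token = hex.EncodeToString(raw)
	return token, hashBearer(token), nil
}

// hashBearer computes the hex-encoded SHA-256 digest of a token.
func hashBearer(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// verifyBearer hashes the presented token and compares it with the
// stored digest in constant time.
func verifyBearer(presented, storedDigest string) bool {
	got := hashBearer(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedDigest)) == 1
}
```

The plaintext token would be returned to the agent once at enrollment
and never written to the database.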

Agent side:
- P1-13 internal/agent/config: YAML at /etc/restic-manager/agent.yaml,
  atomic Save (tmp+fsync+rename), Enrolled() helper.
- P1-15 internal/agent/wsclient: dial with bearer + optional
  TLS cert pinning (sha-256 of leaf), exponential backoff with
  jitter (1s → 60s cap), heartbeat goroutine, fatal handling for
  ErrProtocolTooOld.
- P1-15 wsclient.Enroll: HTTP POST /api/agents/enroll with sysinfo.
- P1-17 internal/agent/sysinfo: hostname/OS/arch/restic-version
  collection. restic detected by `restic version` parse; absent
  restic doesn't block startup.
- cmd/agent: -enroll-server / -enroll-token flags drive first-run
  enrollment then exit (so the install script can hand off to
  systemd to run the persistent service).

End-to-end smoke verified: bootstrap → login → issue token →
enroll → run agent → server logs `ws agent connected` with the
right host_id and protocol_version 1.

All tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:39:00 +01:00
parent df2c584b23
commit 9cc0caff1e
18 changed files with 1670 additions and 14 deletions
@@ -0,0 +1,78 @@
// Package sysinfo collects host metadata at agent startup: OS, arch,
// hostname, restic version. The agent sends this in `hello` so the
// server's Host row stays current.
package sysinfo

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strings"
	"time"

	"gitea.dcglab.co.uk/steve/restic-manager/internal/api"
)

// Snapshot is the bundle of metadata reported in `hello`.
type Snapshot struct {
	Hostname        string
	OS              api.HostOS
	Arch            api.HostArch
	ResticVersion   string // empty when restic was not found or not parseable
	ProtocolVersion int
	BootTime        time.Time // left at its zero value by Collect
}

// Collect probes the running host. resticPath, if non-empty,
// overrides PATH lookup.
func Collect(ctx context.Context, resticPath string) (Snapshot, error) {
	hn, err := os.Hostname()
	if err != nil {
		return Snapshot{}, fmt.Errorf("sysinfo: hostname: %w", err)
	}
	osTag := api.HostOS(runtime.GOOS)
	archTag := api.HostArch(runtime.GOARCH)

	resticVer, _ := detectResticVersion(ctx, resticPath) // empty on failure is fine

	return Snapshot{
		Hostname:        hn,
		OS:              osTag,
		Arch:            archTag,
		ResticVersion:   resticVer,
		ProtocolVersion: api.CurrentProtocolVersion,
	}, nil
}

// detectResticVersion runs `restic version` and parses the first line.
// Output looks like:
//
//	restic 0.17.1 compiled with go1.22.5 on linux/amd64
//
// Returns the version token (e.g. "0.17.1") or "" if restic isn't
// found. We never block startup on a missing restic: the operator
// might not have installed it yet, and the agent should still be
// able to connect and report.
func detectResticVersion(ctx context.Context, override string) (string, error) {
	bin := override
	if bin == "" {
		var err error
		bin, err = exec.LookPath("restic")
		if err != nil {
			return "", err
		}
	}

	// Bound the probe so a hung binary cannot stall agent startup.
	versionCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	out, err := exec.CommandContext(versionCtx, bin, "version").Output()
	if err != nil {
		return "", err
	}

	first := strings.SplitN(strings.TrimSpace(string(out)), "\n", 2)[0]
	parts := strings.Fields(first)
	if len(parts) >= 2 && parts[0] == "restic" {
		return parts[1], nil
	}
	return "", fmt.Errorf("sysinfo: unrecognised restic version output: %q", first)
}
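
The parsing step of detectResticVersion can be exercised without restic
installed if it is pulled out into a pure function. The sketch below
mirrors the logic above; parseResticVersion is a hypothetical name that
does not exist in this commit.

```go
package main

import "strings"

// parseResticVersion extracts the version token from the first line of
// `restic version` output. It reports false when the line does not
// start with "restic" followed by at least one more field.
// Hypothetical helper, not part of the commit.
func parseResticVersion(out string) (string, bool) {
	first := strings.SplitN(strings.TrimSpace(out), "\n", 2)[0]
	parts := strings.Fields(first)
	if len(parts) >= 2 && parts[0] == "restic" {
		return parts[1], true
	}
	return "", false
}
```

Given the sample line from the comment above, this returns "0.17.1" and
true; empty or unrelated output returns "" and false.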