phase 1: run-now backup — restic wrapper, job lifecycle, end-to-end

Lands the operator → server → agent → restic → server roundtrip for
on-demand backups. The flow:

  POST /api/hosts/{id}/jobs {kind:"backup",args:["/path"]}
    → server creates a queued Job row
    → server emits command.run over WS to the host's agent
    → agent dispatcher spawns runner.RunBackup in a goroutine
    → runner spawns `restic backup --json`, parses each line
    → forwards: job.started, log.stream (every line), job.progress
      (throttled to 1/sec), job.finished (with summary stats blob)
    → server WS handler persists those into jobs / job_logs

P1-16 internal/restic: thin Locate + Env wrapper that runs `restic
  backup --json`, scans stdout/stderr, parses BackupStatus +
  BackupSummary, calls back into a LineHandler so the agent can fan
  out to log.stream + job.progress. Treats exit code 3 as
  "succeeded with issues" (matches restic's contract).
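
A minimal sketch of the line handling described above, not the code
in internal/restic. The JSON field names (message_type, percent_done,
files_done, bytes_done, files_new, total_bytes_processed, snapshot_id)
follow restic's documented `backup --json` output; the LineHandler
shape and the handleLine / exitOK helpers are assumptions made for
illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackupStatus holds a subset of the fields in restic's "status" message.
type BackupStatus struct {
	PercentDone float64 `json:"percent_done"`
	FilesDone   uint64  `json:"files_done"`
	BytesDone   uint64  `json:"bytes_done"`
}

// BackupSummary holds a subset of the fields in restic's "summary" message.
type BackupSummary struct {
	FilesNew            uint64 `json:"files_new"`
	TotalBytesProcessed uint64 `json:"total_bytes_processed"`
	SnapshotID          string `json:"snapshot_id"`
}

// LineHandler is a hypothetical callback set; the real type in
// internal/restic may look different.
type LineHandler struct {
	OnLine    func(line string)
	OnStatus  func(BackupStatus)
	OnSummary func(BackupSummary)
}

// handleLine forwards the raw line, then decodes it if it is a
// status or summary message.
func handleLine(line []byte, h LineHandler) {
	if h.OnLine != nil {
		h.OnLine(string(line))
	}
	var head struct {
		MessageType string `json:"message_type"`
	}
	if err := json.Unmarshal(line, &head); err != nil {
		return // non-JSON output (e.g. stderr text) is only logged
	}
	switch head.MessageType {
	case "status":
		var st BackupStatus
		if json.Unmarshal(line, &st) == nil && h.OnStatus != nil {
			h.OnStatus(st)
		}
	case "summary":
		var sum BackupSummary
		if json.Unmarshal(line, &sum) == nil && h.OnSummary != nil {
			h.OnSummary(sum)
		}
	}
}

// exitOK reports whether a `restic backup` exit code counts as
// success. Code 3 means some source data could not be read but a
// snapshot was still created.
func exitOK(code int) (ok bool, withIssues bool) {
	return code == 0 || code == 3, code == 3
}

func main() {
	h := LineHandler{
		OnStatus:  func(s BackupStatus) { fmt.Printf("progress %.0f%%\n", s.PercentDone*100) },
		OnSummary: func(s BackupSummary) { fmt.Println("snapshot", s.SnapshotID) },
	}
	handleLine([]byte(`{"message_type":"status","percent_done":0.5,"files_done":3}`), h)
	handleLine([]byte(`{"message_type":"summary","files_new":3,"snapshot_id":"abc123"}`), h)
	ok, issues := exitOK(3)
	fmt.Println(ok, issues)
}
```

Sniffing message_type first keeps unknown message types and plain
stderr text flowing to the log callback without a decode error.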

P1-18 store: jobs accessors (CreateJob, MarkJobStarted,
  MarkJobFinished, AppendJobLog, GetJob).

P1-19 server: POST /api/hosts/{id}/jobs creates the Job row,
  validates kind, dispatches via Hub.Send, audit-logs the action.
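
A rough sketch of the kind validation step, under assumptions: the
JobRequest type, parseJobRequest, and the allowedKinds table are
hypothetical names, and "backup" is the only kind this commit
mentions. The request body in the flow above is shorthand; on the
wire the keys would be quoted JSON strings.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// JobRequest mirrors the POST body from the commit message:
// {"kind":"backup","args":["/path"]}. The type name is hypothetical.
type JobRequest struct {
	Kind string   `json:"kind"`
	Args []string `json:"args"`
}

var errUnknownKind = errors.New("unknown job kind")

// allowedKinds is an assumption: only "backup" appears in this commit.
var allowedKinds = map[string]bool{"backup": true}

// parseJobRequest decodes the body and rejects kinds the server does
// not know, before any Job row would be created.
func parseJobRequest(body string) (JobRequest, error) {
	var req JobRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return JobRequest{}, fmt.Errorf("decode job request: %w", err)
	}
	if !allowedKinds[req.Kind] {
		return JobRequest{}, fmt.Errorf("%w: %q", errUnknownKind, req.Kind)
	}
	return req, nil
}

func main() {
	req, err := parseJobRequest(`{"kind":"backup","args":["/path"]}`)
	fmt.Println(req.Kind, req.Args, err)

	_, err = parseJobRequest(`{"kind":"format-disk","args":[]}`)
	fmt.Println(err)
}
```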

P1-20 agent runner: wraps restic.RunBackup with throttled progress
  emission. Sender abstraction was added to wsclient.Handler so
  background goroutines can keep replying after dispatch returns.
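
One way the throttling could look, as a sketch only: Envelope is a
stand-in for api.Envelope, and progressThrottle, recorder, and
simulate are hypothetical names. The injectable clock exists purely
so the example is deterministic. The background goroutine keeps
using the Sender after the code that started it has returned, which
is the point of the abstraction.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Envelope is a simplified stand-in for api.Envelope.
type Envelope struct {
	Type    string
	Payload string
}

// Sender has the same shape as the interface added in this commit,
// using the stand-in Envelope.
type Sender interface {
	Send(env Envelope) error
}

// progressThrottle forwards at most one job.progress per interval.
type progressThrottle struct {
	tx       Sender
	interval time.Duration
	now      func() time.Time // injectable clock (assumption, for testing)

	mu   sync.Mutex
	last time.Time
}

// Progress sends the payload unless one was already sent within the
// interval. It reports whether a message was sent.
func (p *progressThrottle) Progress(payload string) (bool, error) {
	p.mu.Lock()
	now := p.now()
	if !p.last.IsZero() && now.Sub(p.last) < p.interval {
		p.mu.Unlock()
		return false, nil
	}
	p.last = now
	p.mu.Unlock()
	return true, p.tx.Send(Envelope{Type: "job.progress", Payload: payload})
}

// recorder is a Sender that stores what it is given.
type recorder struct {
	mu   sync.Mutex
	sent []Envelope
}

func (r *recorder) Send(env Envelope) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.sent = append(r.sent, env)
	return nil
}

// simulate feeds five updates 400ms apart (on a fake clock) from a
// background goroutine and returns what reached the Sender.
func simulate() []Envelope {
	rec := &recorder{}
	clock := time.Unix(0, 0)
	th := &progressThrottle{
		tx:       rec,
		interval: time.Second,
		now:      func() time.Time { return clock },
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < 5; i++ {
			th.Progress(fmt.Sprintf("%d%%", i*20))
			clock = clock.Add(400 * time.Millisecond)
		}
	}()
	<-done
	return rec.sent
}

func main() {
	for _, env := range simulate() {
		fmt.Println(env.Type, env.Payload)
	}
}
```

A throttle like this drops intermediate updates, including possibly
the last one; that is acceptable here because job.finished carries
the final summary.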

P1-21 server WS: dispatchAgentMessage now persists job.started,
  job.finished, log.stream into the database. Browser fan-out for
  live tailing lands with the UI work.

Agent gets repo_url + repo_password from agent.yaml in plaintext
for now (mode 0600, owned by service user); the credentials move to
the keyring storage described in spec.md §7.3 in P2. config.update
over WS overrides the in-memory copy (does not persist).

Build clean; all tests pass. An end-to-end run against a real restic
is still outstanding: it needs a host that has restic installed. Only
the wire shape is verified, by the existing hello/heartbeat round-trip
test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:45:04 +01:00
parent 24ab071702
commit a7c6a6e09c
10 changed files with 811 additions and 29 deletions
+38 -5
@@ -19,6 +19,7 @@ import (
 	stdhttp "net/http"
 	"net/url"
 	"strings"
+	"sync"
 	"time"
 
 	"github.com/coder/websocket"
@@ -36,10 +37,19 @@ type Config struct {
 	HelloPayload api.HelloPayload
 }
 
-// Handler is invoked for every server-sent message. The agent's main
-// program supplies one that knows how to dispatch command.run etc.
-// to the runner package.
-type Handler func(ctx context.Context, env api.Envelope) error
+// Sender is what handlers use to push agent → server messages
+// (job.progress, job.finished, log.stream, command.result, …).
+// Returned by the WS client to the dispatch handler. Write operations
+// serialise behind a single mutex on the conn; concurrent calls are
+// safe.
+type Sender interface {
+	Send(env api.Envelope) error
+}
+
+// Handler is invoked for every server-sent message. tx lets the
+// handler push replies back; it is valid only for the lifetime of
+// the connection (calls fail if the agent has reconnected since).
+type Handler func(ctx context.Context, env api.Envelope, tx Sender) error
 
 // Run keeps the agent connected indefinitely. Returns when ctx is
 // cancelled. Errors during a single connection attempt are logged and
@@ -107,6 +117,8 @@ func connectOnce(ctx context.Context, cfg Config, handle Handler) error {
 	}
 	slog.Info("ws agent connected", "server", wsURL)
 
+	tx := &connSender{conn: conn, ctx: ctx}
+
 	// Heartbeat goroutine.
 	heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
 	defer cancelHeartbeat()
@@ -138,13 +150,34 @@ func connectOnce(ctx context.Context, cfg Config, handle Handler) error {
 			continue
 		}
 		if handle != nil {
-			if err := handle(ctx, env); err != nil {
+			if err := handle(ctx, env, tx); err != nil {
 				slog.Warn("ws agent: handler returned error", "type", env.Type, "err", err)
 			}
 		}
 	}
 }
 
+// connSender is the per-connection Sender. Goroutines beyond the
+// read loop (e.g. a backup running in its own goroutine) keep a
+// reference to one of these for the duration of their work.
+type connSender struct {
+	conn *websocket.Conn
+	ctx  context.Context
+	mu   sync.Mutex
+}
+
+func (s *connSender) Send(env api.Envelope) error {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	raw, err := json.Marshal(env)
+	if err != nil {
+		return err
+	}
+	writeCtx, cancel := context.WithTimeout(s.ctx, 30*time.Second)
+	defer cancel()
+	return s.conn.Write(writeCtx, websocket.MessageText, raw)
+}
+
 func heartbeatLoop(ctx context.Context, conn *websocket.Conn, period time.Duration) {
 	t := time.NewTicker(period)
 	defer t.Stop()