p6-01/02: agent self-update + fleet update server cluster
- alert: update_failed (per-host, dedup=hostID) + fleet_update_halted
(system-scoped, host_id NULL via new RaiseOrTouchSystem helper).
- ws: UpdateWatcher tracks in-flight command.update dispatches and
reconciles them against incoming hello envelopes — success path
marks the job succeeded and auto-resolves the alert; 90s timeout
marks the job failed and raises update_failed.
- http: POST /api/hosts/{id}/update (admin-only JSON) + the HTMX
/hosts/{id}/update form variant. Pre-checks: host exists, online,
agent_version != current, no running update job. Refactored core
into Server.DispatchHostUpdate so the fleet worker can share it
without going through HTTP.
- fleetupdate: rolling worker iterating through host slots, halting
on first failure and raising fleet_update_halted. Polling-based
version-match (re-read hosts.agent_version every 1s up to 95s) —
no extra plumbing into the WS hello path. At-most-one-running is
enforced at the store layer (ErrFleetUpdateRunning).
- cmd/server: wire UpdateWatcher and FleetWorker into the main
goroutine; the worker uses a small serverDispatcher adapter that
delegates back into Server.DispatchHostUpdate.
Tests: watcher (success/timeout/mismatch/late-hello), HTTP endpoint
(happy + four pre-check branches + RBAC), worker (two-host happy,
timeout-halt, host-offline-halt, already-at-target skip, cancel
mid-run, double-Start guard).
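
For reference, the four pre-check branches exercised by the HTTP tests could be expressed along these lines. This is a minimal sketch, not the handler from this commit: the Host struct, the sentinel errors and checkUpdatable are hypothetical, and it assumes "current" in the http bullet means the server's own version.

```go
package main

import (
	"errors"
	"fmt"
)

// Host is a hypothetical projection of the hosts row; only the fields the
// pre-checks read are included.
type Host struct {
	ID           string
	Online       bool
	AgentVersion string
}

// Hypothetical sentinel errors, one per pre-check branch.
var (
	ErrHostNotFound   = errors.New("host not found")
	ErrHostOffline    = errors.New("host offline")
	ErrAlreadyCurrent = errors.New("agent already at server version")
	ErrUpdateInFlight = errors.New("update job already running")
)

// checkUpdatable applies the pre-checks in the order the commit message
// lists them: host exists, online, version differs, no running update job.
func checkUpdatable(h *Host, serverVersion string, updateRunning bool) error {
	switch {
	case h == nil:
		return ErrHostNotFound
	case !h.Online:
		return ErrHostOffline
	case h.AgentVersion == serverVersion:
		return ErrAlreadyCurrent
	case updateRunning:
		return ErrUpdateInFlight
	}
	return nil
}

func main() {
	h := &Host{ID: "h1", Online: true, AgentVersion: "1.0.0"}
	fmt.Println(checkUpdatable(h, "1.1.0", false)) // <nil>
	fmt.Println(checkUpdatable(h, "1.0.0", false)) // agent already at server version
}
```

Keeping the checks in one pure function is one way to let both the JSON and HTMX variants, and a fleet worker, share the same ordering without going through HTTP.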
@@ -16,6 +16,7 @@ import (
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/api"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/auth"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
+	"gitea.dcglab.co.uk/steve/restic-manager/internal/version"
 )
 
 // HandlerDeps is the set of collaborators the agent WS handler needs.
@@ -26,6 +27,9 @@ type HandlerDeps struct {
 	// AlertEngine receives job-finished and host-online events so the
 	// alert engine can evaluate its rules. Optional; nil = no-op.
 	AlertEngine *alert.Engine
+	// UpdateWatcher reconciles in-flight agent-update dispatches against
+	// hello envelopes. Optional; nil = no-op.
+	UpdateWatcher *UpdateWatcher
 	// OnHello is called once per successful hello, after the host row
 	// has been touched and the conn registered. Used by the HTTP
 	// layer to push host_credentials down as a config.update before
@@ -147,6 +151,9 @@ func runAgentLoop(ctx context.Context, c *Conn, hostID string, deps HandlerDeps)
 	if deps.AlertEngine != nil {
 		deps.AlertEngine.NotifyHostOnline(hostID)
 	}
+	if deps.UpdateWatcher != nil {
+		deps.UpdateWatcher.OnHello(ctx, hostID, helloPayload.AgentVersion, version.Version)
+	}
 
 	deps.Hub.Register(hostID, c)
 	defer deps.Hub.Unregister(hostID, c)