Files
restic-manager/internal/server/http/server.go
T
steve 6450bf1b88
CI / Test (linux/amd64) (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Build (windows/amd64) (push) Has been cancelled
CI / Build (linux/amd64) (push) Has been cancelled
CI / Build (linux/arm64) (push) Has been cancelled
P2-02 (agent side) + P2-03: agent scheduler + schedule.fire dispatch
Closes the schedule reconciliation loop end-to-end.

* New `internal/agent/scheduler` package wraps robfig/cron/v3 with
  the lifecycle the agent needs:
  - Apply(ScheduleSetPayload, Sender) stops the prior cron (waiting
    for in-flight entries to return), rebuilds from scratch, starts,
    and emits schedule.ack with the version we just applied.
  - Disabled entries skipped silently; bad cron exprs (which
    shouldn't reach us — the server validates — but defensive)
    log a warn and skip.
  - On each cron tick the entry sends a new schedule.fire envelope
    to the server with {schedule_id, scheduled_at}. The scheduler
    itself never builds CommandRunPayloads — server is the source
    of truth for jobs.
  - tx is swapped on every Apply, so reconnect is handled
    naturally: cron entries that fire against a dropped tx log
    "no active connection" and skip the tick.
  - Stop() is idempotent and waits for the cron's in-flight
    workers via cron.Stop().Done().

* New wire message api.MsgScheduleFire + api.ScheduleFirePayload
  for the agent → server "I just fired locally" RPC.

* Server-side dispatch (schedule_push.go: dispatchScheduledJob):
  looks up the schedule by id, validates ownership + that it's
  enabled, builds args from kind (paths for backup; other kinds
  are still arg-less in Phase 2 and grow as those job kinds land
  in P2-05..08), persists a jobs row with actor_kind=schedule +
  scheduled_id, and writes command.run back on the same conn so
  the agent runs through its existing dispatch path.

* store.CreateJob now writes scheduled_id. This column was in the
  schema since 0001 but never populated — the original P1 path
  only had operator-driven jobs, so actor_kind was always 'user'
  and scheduled_id was always nil.

* cmd/agent/main.go integration: dispatcher gains a
  *scheduler.Scheduler; the MsgScheduleSet case now hands the
  payload to scheduler.Apply (in a goroutine so the WS read loop
  keeps draining other messages).

* WS dispatcher gains OnScheduleFire alongside OnScheduleAck.

* Tests:
  - scheduler unit tests (4): ack-on-apply, cron tick fires
    schedule.fire envelope, disabled entries don't fire, replace-
    prior-state stops the old cron.
  - Server-side end-to-end: schedule.fire → command.run with the
    right job_id / kind / args, plus jobs row with actor_kind=
    "schedule" and scheduled_id linking back to the schedule.

Persistence of next-fire times across agent restarts is
deliberately deferred. A missed fire window during downtime
simply fires once on reconnect — that's the desirable behaviour
(the operator wants the missed backup to run, not be silently
skipped because we lost track of when it was due).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:29:12 +01:00

188 lines
6.4 KiB
Go

// Package http hosts the chi-based REST handlers for the control
// plane. The Server type owns the router, the handlers, and the
// graceful-shutdown lifecycle.
package http
import (
"context"
"errors"
stdhttp "net/http"
"time"
"github.com/go-chi/chi/v5"
"github.com/go-chi/chi/v5/middleware"
"gitea.dcglab.co.uk/steve/restic-manager/internal/crypto"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/config"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ui"
"gitea.dcglab.co.uk/steve/restic-manager/internal/server/ws"
"gitea.dcglab.co.uk/steve/restic-manager/internal/store"
)
// Deps bundles every collaborator the HTTP server depends on. Wired up
// in cmd/server; tests pass a pared-down Deps with fakes.
type Deps struct {
Cfg config.Config
Store *store.Store
AEAD *crypto.AEAD
Hub *ws.Hub
JobHub *ws.JobHub
UI *ui.Renderer
// Version is the binary's build version, surfaced in the chrome.
// Empty falls back to "dev".
Version string
// BootstrapToken (optional, populated only on first run) is the raw
// admin-bootstrap token printed in the server logs. While set, the
// /bootstrap endpoint accepts it to create the first admin user.
BootstrapToken string
}
// Server is the running HTTP server.
type Server struct {
srv *stdhttp.Server
deps Deps
}
// New builds a configured but not-yet-started server.
func New(deps Deps) *Server {
r := chi.NewRouter()
// Built-in middleware: request ID for log correlation, recovery
// (don't crash the process on a panic in a handler), realIP iff a
// trusted proxy is configured.
r.Use(middleware.RequestID)
r.Use(middleware.Recoverer)
r.Use(requestLogger)
// Health endpoint — unauthenticated, no audit, deliberately cheap.
r.Get("/healthz", func(w stdhttp.ResponseWriter, _ *stdhttp.Request) {
w.WriteHeader(stdhttp.StatusNoContent)
})
s := &Server{deps: deps}
s.routes(r)
s.srv = &stdhttp.Server{
Addr: deps.Cfg.Listen,
Handler: r,
ReadHeaderTimeout: 10 * time.Second,
IdleTimeout: 60 * time.Second,
// Long write timeout — WS upgrades and live log streams need it.
WriteTimeout: 0,
}
return s
}
// routes wires the API tree. Subtrees live in this file by area so a
// reader can scan one place and see the surface.
func (s *Server) routes(r chi.Router) {
r.Route("/api", func(r chi.Router) {
r.Post("/auth/login", s.handleLogin)
r.Post("/auth/logout", s.handleLogout)
r.Post("/bootstrap", s.handleBootstrap)
// Agent enrollment (open endpoint — token is the credential).
r.Post("/agents/enroll", s.handleAgentEnroll)
// Operator → server (authenticated). Spec.md §6.1's
// /hosts/{id}/enrollment-token (regenerate) lands when the
// host page can call it; for now just the create endpoint.
r.Post("/enrollment-tokens", s.handleCreateEnrollmentToken)
// Fleet read endpoints — back the dashboard.
r.Get("/hosts", s.handleListHosts)
r.Get("/fleet/summary", s.handleFleetSummary)
// Run-now: dispatch a job to a host's agent.
r.Post("/hosts/{id}/jobs", s.handleRunNow)
// Snapshot projection (refreshed by the agent after each backup).
r.Get("/hosts/{id}/snapshots", s.handleListHostSnapshots)
// Repo credentials — operator can edit after enrollment. The
// initial set is supplied at token-mint time (see enrollment.go).
// GET returns a redacted view (URL, username, has_password).
r.Get("/hosts/{id}/repo-credentials", s.handleGetHostCredentials)
r.Put("/hosts/{id}/repo-credentials", s.handleSetHostCredentials)
// Per-host schedule CRUD. Mutations bump host_schedule_version;
// the agent sync path (P2-02) picks up the new version on the
// next reconciliation tick.
r.Get("/hosts/{id}/schedules", s.handleListSchedules)
r.Post("/hosts/{id}/schedules", s.handleCreateSchedule)
r.Put("/hosts/{id}/schedules/{sid}", s.handleUpdateSchedule)
r.Delete("/hosts/{id}/schedules/{sid}", s.handleDeleteSchedule)
})
// Agent ↔ server WebSocket. Bearer-authenticated inside the handler.
if s.deps.Hub != nil {
r.Mount("/ws/agent", ws.AgentHandler(ws.HandlerDeps{
Hub: s.deps.Hub,
Store: s.deps.Store,
JobHub: s.deps.JobHub,
OnHello: s.onAgentHello,
OnScheduleAck: s.applyScheduleAck,
OnScheduleFire: s.dispatchScheduledJob,
}))
}
// Agent binaries + install scripts. Open endpoints — content is
// unprivileged on its own, gating happens via the enrollment
// token. See agent_assets.go.
r.Get("/agent/binary", s.handleAgentBinary)
r.Get("/install/*", s.handleInstallAsset)
// Static assets (Tailwind CSS bundle, future favicon).
r.Mount("/static/", staticHandler())
// HTML UI. The renderer is required — fail loud if the binary
// was built without templates (impossible in practice given
// embed, but guards bad test wiring).
if s.deps.UI != nil {
r.Get("/", s.handleUIDashboard)
r.Get("/login", s.handleUILoginGet)
r.Post("/login", s.handleUILoginPost)
r.Post("/logout", s.handleUILogoutPost)
// HTMX action endpoint for "Run now" buttons on the dashboard.
r.Post("/hosts/{id}/run-backup", s.handleUIRunBackup)
// HTMX action endpoint for the red "Initialise repo" button
// shown in the run-now panel until the repo is confirmed init'd.
r.Post("/hosts/{id}/init-repo", s.handleUIInitRepo)
// Add host flow.
r.Get("/hosts/new", s.handleUIAddHostGet)
r.Post("/hosts/new", s.handleUIAddHostPost)
// Host detail (Snapshots tab is the default).
r.Get("/hosts/{id}", s.handleUIHostDetail)
// Live job log.
r.Get("/jobs/{id}", s.handleUIJobDetail)
}
// Browser job-log stream (separate from /ws/agent so the auth
// layer is session-cookie not bearer). Mounted regardless of
// whether the UI is up — JSON callers may also subscribe.
if s.deps.JobHub != nil {
r.Get("/api/jobs/{id}/stream", s.handleJobStream)
}
}
// Start begins listening. Blocks until ListenAndServe returns
// (typically only on Shutdown). The server is HTTP-only by design;
// production deployments terminate TLS at a reverse proxy in front.
func (s *Server) Start() error {
err := s.srv.ListenAndServe()
if errors.Is(err, stdhttp.ErrServerClosed) {
return nil
}
return err
}
// Shutdown stops accepting new connections and waits up to ctx.Deadline
// for in-flight handlers to finish.
func (s *Server) Shutdown(ctx context.Context) error {
return s.srv.Shutdown(ctx)
}
// Addr returns the configured listen address. Useful in tests when
// the caller passes :0 to get a random port.
func (s *Server) Addr() string { return s.srv.Addr }