2026-05-07 23:00:04 +01:00
1 changed files with 223 additions and 0 deletions
@@ -0,0 +1,223 @@
+# P6-03 — Repo size trend graphs
+
+Sparkline on the dashboard host row + full chart on the host repo
+page, both showing repo growth over time. Closes the last
+operator-visibility gap in Phase 6 alongside Prometheus metrics
+(P6-04).
+
+## Goals
+
+- Operators can see at a glance whether a host's repo is growing,
+  stable, or shrinking, without leaving the dashboard.
+- A second screen on the repo page exposes the same data over a
+  longer window with a snapshot-count overlay so retention
+  behaviour can be eyeballed against size.
+- Zero new client-side dependencies; matches the existing
+  HTMX + server-rendered idiom used everywhere else in the UI.
+
+## Non-goals
+
+- No backfill of historical data. Trend lights up with whatever
+  the agents report from the day this ships.
+- No per-source-group breakdown — repo-level only.
+- No alerting on growth rate (dedicated to a future ticket if a
+  user asks).
+- No JSON API surface. Prometheus exposure is P6-04, separate.
+
+## Decisions taken in brainstorming
+
+- **Metrics:** `total_size_bytes` (sparkline + chart) and
+  `snapshot_count` (chart only). Raw size dropped as redundant.
+- **Cadence:** one row per `(host_id, UTC date)`, last-write-wins
+  per column. Bounded at ~365 rows/host/year regardless of job
+  frequency.
+- **Backfill:** none. Pure forward-fill from launch day.
+- **Rendering:** server-rendered inline SVG, no JS library.
+- **Spans:** sparkline fixed at 30 days; chart has `30d | 90d | 1y`
+  range selector, server-rendered swap.
+
+## Schema
+
+New migration `internal/store/migrations/0023_host_repo_stats_history.sql`:
+
+```sql
+CREATE TABLE host_repo_stats_history (
+  host_id           TEXT NOT NULL REFERENCES hosts(id) ON DELETE CASCADE,
+  day               TEXT NOT NULL,        -- 'YYYY-MM-DD' UTC
+  total_size_bytes  INTEGER,              -- nullable; partial patches don't overwrite
+  snapshot_count    INTEGER,              -- nullable
+  recorded_at       TEXT NOT NULL,        -- RFC3339Nano of last write touching this row
+  PRIMARY KEY (host_id, day)
+);
+CREATE INDEX host_repo_stats_history_host_day
+  ON host_repo_stats_history(host_id, day DESC);
+```
+
+FK cascade matches every other host-scoped table; deleting a host
+through `Store.DeleteHost` (NS-01) wipes its history automatically.
+
+## Write path
+
+Hook the existing `MsgRepoStats` handler in
+`internal/server/ws/handler.go` (around line 319). After the
+existing `UpsertHostRepoStats(ctx, hostID, patch)` call, append:
+
+```go
+day := time.Now().UTC().Format("2006-01-02")
+if err := deps.Store.UpsertHostRepoStatsHistory(ctx, hostID, day, patch); err != nil {
+    slog.Warn("ws: upsert host repo stats history", "host_id", hostID, "err", err)
+}
+```
+
+A history-write failure is logged and dropped — never blocks the
+main upsert. The partial-update contract that
+`UpsertHostRepoStats` already implements is preserved at the
+history layer:
+
+```sql
+INSERT INTO host_repo_stats_history (host_id, day, total_size_bytes, snapshot_count, recorded_at)
+VALUES (?, ?, ?, ?, ?)
+ON CONFLICT(host_id, day) DO UPDATE SET
+  total_size_bytes = COALESCE(excluded.total_size_bytes, host_repo_stats_history.total_size_bytes),
+  snapshot_count   = COALESCE(excluded.snapshot_count,   host_repo_stats_history.snapshot_count),
+  recorded_at      = excluded.recorded_at;
+```
+
+This is critical: the agent's prune handler in
+`internal/agent/runner/runner.go:318` emits a stats patch that
+only carries `LastPruneAt`. Without `COALESCE`, that prune ack
+would null out a `total_size_bytes` we'd already captured from a
+backup earlier the same day.
+
+## Read path
+
+Two new helpers in `internal/store/host_repo_stats_history.go`:
+
+```go
+type RepoStatsHistoryPoint struct {
+    Day            time.Time   // 00:00:00 UTC
+    TotalSizeBytes *int64
+    SnapshotCount  *int64
+}
+
+func (s *Store) ListHostRepoStatsHistory(
+    ctx context.Context, hostID string, since time.Time,
+) ([]RepoStatsHistoryPoint, error)
+```
+
+Returns rows ordered by `day` ascending where at least one metric
+is non-null. The renderer connects available points with a
+straight line — there is no explicit gap representation. A host
+that was offline for a week shows a single segment spanning the
+gap, which is the right visual: the repo state didn't change.
+
+## Rendering
+
+New package `internal/web/sparkline`. Pure Go, no template
+dependency:
+
+```go
+type Series struct {
+    Name   string
+    Points []float64    // nil-points represented as math.NaN
+    Stroke string       // CSS color
+}
+
+func RenderSparkline(points []float64, width, height int) template.HTML
+func RenderChart(series []Series, days []time.Time, opts ChartOpts) template.HTML
+```
+
+`RenderChart` produces a 600×220 SVG with:
+
+- Light horizontal gridlines (4 bands).
+- Two y-axes: bytes (left, blue) and count (right, amber). Each
+  series is normalised against its own axis.
+- X-axis labels at start, midpoint, and end of the window.
+- Per-point `<circle>` with a `<title>` for hover tooltips —
+  accessible by default, no JS.
+- Empty state: faint dashed baseline + centered "no data yet"
+  text.
+
+Sparkline is 80×20, single blue polyline, single `<title>` on the
+group element showing `"current → 30d ago"`.
+
+Two new partials:
+
+- `web/templates/partials/repo_size_sparkline.html`
+- `web/templates/partials/repo_size_chart.html`
+
+Both call into the renderer with the appropriate opts. No
+inline `<style>` — colours come from existing Tailwind palette
+classes already used elsewhere (`text-blue-500`, `text-amber-500`).
+
+## UI placement
+
+### Dashboard host row
+
+`web/templates/partials/host_row.html` gains one `<td>` between
+the existing "Repo size" cell and "Snapshots" cell. Width ≈ 88px.
+Cell renders the sparkline partial; if `len(points) < 2` the cell
+shows "—" centred (matches the existing no-data idiom for
+last-backup time in the same partial).
+
+The dashboard's existing 5-second htmx live-refresh
+(`hx-trigger="every 5s ..."` from NS-04) re-renders this cell
+along with the rest of the row. No extra polling.
+
+### Host repo page
+
+`web/templates/pages/host_repo.html` gains a "Trend" panel
+inserted between the existing summary panel and the maintenance
+panel. Panel contains:
+
+- Range pills `30d | 90d | 1y` (anchor links with
+  `hx-get="/hosts/{id}/repo/trend?range=…"` and
+  `hx-target="#repo-trend-chart" hx-swap="outerHTML"`).
+- The chart partial wrapped in `<div id="repo-trend-chart">`.
+- A small legend strip below the chart.
+
+## Endpoints
+
+- `GET /hosts/{id}/repo/trend?range=30d|90d|1y` — admin/operator,
+  htmx fragment, returns the chart partial. Auth reuses the
+  existing host-scoped middleware on the `/hosts/{id}` family.
+  Invalid `range` falls back to 30d.
+
+No new admin-only surface — anyone with read access to the host
+can see the trend.
+
+## Testing
+
+- `internal/store/host_repo_stats_history_test.go` — upsert
+  merges partial patches without nulling; ordering; since-day
+  filter; cascade on host delete.
+- `internal/web/sparkline/sparkline_test.go` — golden SVG files
+  for: empty input, single point, full 30-day series, mixed
+  null points. Goldens live under `testdata/`.
+- `internal/server/http/ui_repo_test.go` — trend panel renders
+  with seeded history; range selector swaps server-side; empty
+  state.
+- `internal/server/http/ui_dashboard_test.go` — host row sparkline
+  cell present and renders SVG when points exist, "—" when not.
+- Smoke after build: dashboard row shows sparkline once two days
+  of data exist; repo page chart toggles cleanly between ranges.
+
+## Migration / rollout
+
+- Schema migration is additive — no risk to existing tables.
+- Write path is best-effort; on schema issue the main repo-stats
+  upsert is unaffected.
+- No agent change required, so no fleet update needed.
+
+## Acceptance
+
+- After two days of operation, the dashboard sparkline shows a
+  visible line for any host that has run a backup or
+  maintenance op on both days.
+- Host repo page renders the trend panel with the snapshot-count
+  overlay; range selector switches view without a full page
+  reload.
+- `go test ./...` and `go vet ./...` clean.
+- Smoke env exercise: backup → sparkline updates; range pills
+  swap; FK cascade verified by deleting a host and checking the
+  history table.