restic-manager/docs/superpowers/specs/2026-05-07-p6-03-repo-size-trend-design.md

P6-03 — Repo size trend graphs

Sparkline on the dashboard host row + full chart on the host repo page, both showing repo growth over time. Closes the last operator-visibility gap in Phase 6 alongside Prometheus metrics (P6-04).

Goals

  • Operators can see at a glance whether a host's repo is growing, stable, or shrinking, without leaving the dashboard.
  • A second screen on the repo page exposes the same data over a longer window with a snapshot-count overlay so retention behaviour can be eyeballed against size.
  • Zero new client-side dependencies; matches the existing HTMX + server-rendered idiom used everywhere else in the UI.

Non-goals

  • No backfill of historical data. Trend lights up with whatever the agents report from the day this ships.
  • No per-source-group breakdown — repo-level only.
  • No alerting on growth rate (deferred to a future ticket if a user asks).
  • No JSON API surface. Prometheus exposure is P6-04, separate.

Decisions taken in brainstorming

  • Metrics: total_size_bytes (sparkline + chart) and snapshot_count (chart only). Raw size dropped as redundant.
  • Cadence: one row per (host_id, UTC date), last-write-wins per column. Bounded at ~365 rows/host/year regardless of job frequency.
  • Backfill: none. History accrues only from launch day onward.
  • Rendering: server-rendered inline SVG, no JS library.
  • Spans: sparkline fixed at 30 days; chart has 30d | 90d | 1y range selector, server-rendered swap.

Schema

New migration internal/store/migrations/0023_host_repo_stats_history.sql:

CREATE TABLE host_repo_stats_history (
  host_id           TEXT NOT NULL REFERENCES hosts(id) ON DELETE CASCADE,
  day               TEXT NOT NULL,        -- 'YYYY-MM-DD' UTC
  total_size_bytes  INTEGER,              -- nullable; partial patches don't overwrite
  snapshot_count    INTEGER,              -- nullable
  recorded_at       TEXT NOT NULL,        -- RFC3339Nano of last write touching this row
  PRIMARY KEY (host_id, day)
);
CREATE INDEX host_repo_stats_history_host_day
  ON host_repo_stats_history(host_id, day DESC);

FK cascade matches every other host-scoped table; deleting a host through Store.DeleteHost (NS-01) wipes its history automatically.

Write path

Hook the existing MsgRepoStats handler in internal/server/ws/handler.go (around line 319). After the existing UpsertHostRepoStats(ctx, hostID, patch) call, append:

day := time.Now().UTC().Format("2006-01-02")
if err := deps.Store.UpsertHostRepoStatsHistory(ctx, hostID, day, patch); err != nil {
    slog.Warn("ws: upsert host repo stats history", "host_id", hostID, "err", err)
}

A history-write failure is logged and dropped — never blocks the main upsert. The partial-update contract that UpsertHostRepoStats already implements is preserved at the history layer:

INSERT INTO host_repo_stats_history (host_id, day, total_size_bytes, snapshot_count, recorded_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(host_id, day) DO UPDATE SET
  total_size_bytes = COALESCE(excluded.total_size_bytes, host_repo_stats_history.total_size_bytes),
  snapshot_count   = COALESCE(excluded.snapshot_count,   host_repo_stats_history.snapshot_count),
  recorded_at      = excluded.recorded_at;

This is critical: the agent's prune handler in internal/agent/runner/runner.go:318 emits a stats patch that only carries LastPruneAt. Without COALESCE, that prune ack would null out a total_size_bytes we'd already captured from a backup earlier the same day.
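
The per-column merge can be modelled in plain Go to make the intended behaviour easy to unit-test without a database. This is only an illustrative sketch: historyRow, coalesce, and mergePatch are hypothetical names, and the real merge happens inside the SQL statement above.

```go
package main

// historyRow models the two metric columns of one (host_id, day) row.
// A nil pointer stands for SQL NULL.
type historyRow struct {
	TotalSizeBytes *int64
	SnapshotCount  *int64
}

// coalesce mirrors COALESCE(excluded.col, existing.col): the incoming
// value wins only when it is non-nil, otherwise the stored value is kept.
func coalesce(incoming, existing *int64) *int64 {
	if incoming != nil {
		return incoming
	}
	return existing
}

// mergePatch applies a partial patch to an existing row column by column,
// the same way the ON CONFLICT ... DO UPDATE clause does.
func mergePatch(existing, patch historyRow) historyRow {
	return historyRow{
		TotalSizeBytes: coalesce(patch.TotalSizeBytes, existing.TotalSizeBytes),
		SnapshotCount:  coalesce(patch.SnapshotCount, existing.SnapshotCount),
	}
}
```

Under this model a prune ack is a patch with both metrics nil, so merging it leaves the row's earlier total_size_bytes and snapshot_count untouched.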

Read path

Two new helpers in internal/store/host_repo_stats_history.go:

type RepoStatsHistoryPoint struct {
    Day            time.Time   // 00:00:00 UTC
    TotalSizeBytes *int64
    SnapshotCount  *int64
}

func (s *Store) ListHostRepoStatsHistory(
    ctx context.Context, hostID string, since time.Time,
) ([]RepoStatsHistoryPoint, error)

Returns rows ordered by day ascending where at least one metric is non-null. The renderer connects available points with a straight line — there is no explicit gap representation. A host that was offline for a week shows a single segment spanning the gap, which is the right visual: the repo state didn't change.

Rendering

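New package internal/web/sparkline. Pure Go with no template execution; the only html/template dependency is the template.HTML return type, which lets the partials embed the SVG unescaped: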

type Series struct {
    Name   string
    Points []float64    // missing points represented as math.NaN()
    Stroke string       // CSS color
}

func RenderSparkline(points []float64, width, height int) template.HTML
func RenderChart(series []Series, days []time.Time, opts ChartOpts) template.HTML

RenderChart produces a 600×220 SVG with:

  • Light horizontal gridlines (4 bands).
  • Two y-axes: bytes (left, blue) and count (right, amber). Each series is normalised against its own axis.
  • X-axis labels at start, midpoint, and end of the window.
  • Per-point <circle> with a <title> for hover tooltips — accessible by default, no JS.
  • Empty state: faint dashed baseline + centered "no data yet" text.

Sparkline is 80×20, single blue polyline, single <title> on the group element showing "current → 30d ago".

Two new partials:

  • web/templates/partials/repo_size_sparkline.html
  • web/templates/partials/repo_size_chart.html

Both call into the renderer with the appropriate opts. No inline <style> — colours come from existing Tailwind palette classes already used elsewhere (text-blue-500, text-amber-500).

UI placement

Dashboard host row

web/templates/partials/host_row.html gains one <td> between the existing "Repo size" cell and "Snapshots" cell. Width ≈ 88px. Cell renders the sparkline partial; if len(points) < 2 the cell shows "—" centred (matches the existing no-data idiom for last-backup time in the same partial).

The dashboard's existing 5-second htmx live-refresh (hx-trigger="every 5s ..." from NS-04) re-renders this cell along with the rest of the row. No extra polling.

Host repo page

web/templates/pages/host_repo.html gains a "Trend" panel inserted between the existing summary panel and the maintenance panel. Panel contains:

  • Range pills 30d | 90d | 1y (anchor links with hx-get="/hosts/{id}/repo/trend?range=…" and hx-target="#repo-trend-chart" hx-swap="outerHTML").
  • The chart partial wrapped in <div id="repo-trend-chart">.
  • A small legend strip below the chart.

Endpoints

  • GET /hosts/{id}/repo/trend?range=30d|90d|1y — admin/operator, htmx fragment, returns the chart partial. Auth reuses the existing host-scoped middleware on the /hosts/{id} family. Invalid range falls back to 30d.

No new admin-only surface — anyone with read access to the host can see the trend.

Testing

  • internal/store/host_repo_stats_history_test.go — upsert merges partial patches without nulling; ordering; since-day filter; cascade on host delete.
  • internal/web/sparkline/sparkline_test.go — golden SVG files for: empty input, single point, full 30-day series, mixed null points. Goldens live under testdata/.
  • internal/server/http/ui_repo_test.go — trend panel renders with seeded history; range selector swaps server-side; empty state.
  • internal/server/http/ui_dashboard_test.go — host row sparkline cell present and renders SVG when points exist, "—" when not.
  • Smoke after build: dashboard row shows sparkline once two days of data exist; repo page chart toggles cleanly between ranges.

Migration / rollout

  • Schema migration is additive — no risk to existing tables.
  • Write path is best-effort; if the history table is missing or the history upsert fails, the main repo-stats upsert is unaffected.
  • No agent change required, so no fleet update needed.

Acceptance

  • After two days of operation, the dashboard sparkline shows a visible line for any host that has run a backup or maintenance op on both days.
  • Host repo page renders the trend panel with the snapshot-count overlay; range selector switches view without a full page reload.
  • go test ./... and go vet ./... clean.
  • Smoke env exercise: backup → sparkline updates; range pills swap; FK cascade verified by deleting a host and checking the history table.