# P6-03 — Repo size trend graphs
Sparkline on the dashboard host row + full chart on the host repo
page, both showing repo growth over time. Closes the last
operator-visibility gap in Phase 6 alongside Prometheus metrics
(P6-04).
## Goals
- Operators can see at a glance whether a host's repo is growing,
stable, or shrinking, without leaving the dashboard.
- A second screen on the repo page exposes the same data over a
longer window with a snapshot-count overlay so retention
behaviour can be eyeballed against size.
- Zero new client-side dependencies; matches the existing
HTMX + server-rendered idiom used everywhere else in the UI.
## Non-goals
- No backfill of historical data. Trend lights up with whatever
the agents report from the day this ships.
- No per-source-group breakdown — repo-level only.
- No alerting on growth rate (deferred to a future ticket if a
user asks).
- No JSON API surface. Prometheus exposure is P6-04, separate.
## Decisions taken in brainstorming
- **Metrics:** `total_size_bytes` (sparkline + chart) and
`snapshot_count` (chart only). Raw size dropped as redundant.
- **Cadence:** one row per `(host_id, UTC date)`, last-write-wins
per column. Bounded at ~365 rows/host/year regardless of job
frequency.
- **Backfill:** none. Pure forward-fill from launch day.
- **Rendering:** server-rendered inline SVG, no JS library.
- **Spans:** sparkline fixed at 30 days; chart has `30d | 90d | 1y`
range selector, server-rendered swap.
## Schema
New migration `internal/store/migrations/0023_host_repo_stats_history.sql`:
```sql
CREATE TABLE host_repo_stats_history (
    host_id          TEXT NOT NULL REFERENCES hosts(id) ON DELETE CASCADE,
    day              TEXT NOT NULL,  -- 'YYYY-MM-DD' UTC
    total_size_bytes INTEGER,        -- nullable; partial patches don't overwrite
    snapshot_count   INTEGER,        -- nullable
    recorded_at      TEXT NOT NULL,  -- RFC3339Nano of last write touching this row
    PRIMARY KEY (host_id, day)
);
CREATE INDEX host_repo_stats_history_host_day
    ON host_repo_stats_history(host_id, day DESC);
```
FK cascade matches every other host-scoped table; deleting a host
through `Store.DeleteHost` (NS-01) wipes its history automatically.
## Write path
Hook the existing `MsgRepoStats` handler in
`internal/server/ws/handler.go` (around line 319). After the
existing `UpsertHostRepoStats(ctx, hostID, patch)` call, append:
```go
day := time.Now().UTC().Format("2006-01-02")
if err := deps.Store.UpsertHostRepoStatsHistory(ctx, hostID, day, patch); err != nil {
	slog.Warn("ws: upsert host repo stats history", "host_id", hostID, "err", err)
}
```
A history-write failure is logged and dropped — never blocks the
main upsert. The partial-update contract that
`UpsertHostRepoStats` already implements is preserved at the
history layer:
```sql
INSERT INTO host_repo_stats_history
    (host_id, day, total_size_bytes, snapshot_count, recorded_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(host_id, day) DO UPDATE SET
    total_size_bytes = COALESCE(excluded.total_size_bytes, host_repo_stats_history.total_size_bytes),
    snapshot_count   = COALESCE(excluded.snapshot_count, host_repo_stats_history.snapshot_count),
    recorded_at      = excluded.recorded_at;
```
This is critical: the agent's prune handler in
`internal/agent/runner/runner.go:318` emits a stats patch that
only carries `LastPruneAt`. Without `COALESCE`, that prune ack
would null out a `total_size_bytes` we'd already captured from a
backup earlier the same day.
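
The per-column merge rule can be illustrated in plain Go, independent of the database. This is an in-memory model of the `COALESCE` upsert under stated assumptions, not the store code: `historyRow`, `statsPatch`, `coalesce`, and `mergePatch` are hypothetical names introduced only for this sketch.

```go
package main

import "fmt"

// historyRow is a hypothetical stand-in for one host_repo_stats_history row;
// a nil pointer plays the role of SQL NULL.
type historyRow struct {
	TotalSizeBytes *int64
	SnapshotCount  *int64
	RecordedAt     string
}

// statsPatch is a hypothetical partial update; nil fields were not reported.
type statsPatch struct {
	TotalSizeBytes *int64
	SnapshotCount  *int64
}

// coalesce mirrors SQL COALESCE(a, b) for two nullable integers.
func coalesce(a, b *int64) *int64 {
	if a != nil {
		return a
	}
	return b
}

// mergePatch applies the same rule as the ON CONFLICT clause: a reported
// value wins, an absent value leaves the stored one untouched, and
// recorded_at is always overwritten.
func mergePatch(row historyRow, p statsPatch, recordedAt string) historyRow {
	return historyRow{
		TotalSizeBytes: coalesce(p.TotalSizeBytes, row.TotalSizeBytes),
		SnapshotCount:  coalesce(p.SnapshotCount, row.SnapshotCount),
		RecordedAt:     recordedAt,
	}
}

func main() {
	size := int64(1 << 30)
	row := mergePatch(historyRow{}, statsPatch{TotalSizeBytes: &size}, "2026-05-07T02:00:00Z")
	// A prune ack carries neither metric; the size captured earlier survives.
	row = mergePatch(row, statsPatch{}, "2026-05-07T04:00:00Z")
	fmt.Println(*row.TotalSizeBytes, row.SnapshotCount == nil, row.RecordedAt)
}
```

A store-level test for the real upsert would presumably assert the same three things: the earlier size is kept, the unreported count stays null, and `recorded_at` moves forward.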
## Read path
Two new helpers in `internal/store/host_repo_stats_history.go`:
```go
type RepoStatsHistoryPoint struct {
	Day            time.Time // 00:00:00 UTC
	TotalSizeBytes *int64
	SnapshotCount  *int64
}

func (s *Store) ListHostRepoStatsHistory(
	ctx context.Context, hostID string, since time.Time,
) ([]RepoStatsHistoryPoint, error)
```
Returns rows ordered by `day` ascending where at least one metric
is non-null. The renderer connects available points with a
straight line — there is no explicit gap representation. A host
that was offline for a week shows a single segment spanning the
gap, which is the right visual: the repo state didn't change.
## Rendering
New package `internal/web/sparkline`. Pure Go with no dependency
on the template files; it only returns `template.HTML`:
```go
type Series struct {
	Name   string
	Points []float64 // missing points are represented as math.NaN()
	Stroke string    // CSS color
}

func RenderSparkline(points []float64, width, height int) template.HTML
func RenderChart(series []Series, days []time.Time, opts ChartOpts) template.HTML
```
`RenderChart` produces a 600×220 SVG with:
- Light horizontal gridlines (4 bands).
- Two y-axes: bytes (left, blue) and count (right, amber). Each
series is normalised against its own axis.
- X-axis labels at start, midpoint, and end of the window.
- Per-point `<circle>` with a `<title>` for hover tooltips —
accessible by default, no JS.
- Empty state: faint dashed baseline + centered "no data yet"
text.
Sparkline is 80×20, single blue polyline, single `<title>` on the
group element showing `"current → 30d ago"`.
Two new partials:
- `web/templates/partials/repo_size_sparkline.html`
- `web/templates/partials/repo_size_chart.html`
Both call into the renderer with the appropriate opts. No
inline `<style>` — colours come from existing Tailwind palette
classes already used elsewhere (`text-blue-500`, `text-amber-500`).
## UI placement
### Dashboard host row
`web/templates/partials/host_row.html` gains one `<td>` between
the existing "Repo size" cell and "Snapshots" cell. Width ≈ 88px.
Cell renders the sparkline partial; if `len(points) < 2` the cell
shows "—" centred (matches the existing no-data idiom for
last-backup time in the same partial).
The dashboard's existing 5-second htmx live-refresh
(`hx-trigger="every 5s ..."` from NS-04) re-renders this cell
along with the rest of the row. No extra polling.
### Host repo page
`web/templates/pages/host_repo.html` gains a "Trend" panel
inserted between the existing summary panel and the maintenance
panel. Panel contains:
- Range pills `30d | 90d | 1y` (anchor links with
`hx-get="/hosts/{id}/repo/trend?range=…"` and
`hx-target="#repo-trend-chart" hx-swap="outerHTML"`).
- The chart partial wrapped in `<div id="repo-trend-chart">`.
- A small legend strip below the chart.
## Endpoints
- `GET /hosts/{id}/repo/trend?range=30d|90d|1y` — admin/operator,
htmx fragment, returns the chart partial. Auth reuses the
existing host-scoped middleware on the `/hosts/{id}` family.
Invalid `range` falls back to 30d.
No new admin-only surface — anyone with read access to the host
can see the trend.
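
The fallback rule for the `range` parameter is small enough to sketch directly. `rangeDays` is a hypothetical helper name, and mapping `1y` to 365 days is an assumption.

```go
package main

import "fmt"

// rangeDays (hypothetical) maps the `range` query value to a window length
// in days. Anything unrecognised, including the empty string, falls back
// to the 30-day default.
func rangeDays(raw string) int {
	switch raw {
	case "90d":
		return 90
	case "1y":
		return 365 // assumption: a year is treated as 365 days
	default:
		return 30
	}
}

func main() {
	for _, raw := range []string{"30d", "90d", "1y", "", "7d"} {
		fmt.Printf("%q -> %d days\n", raw, rangeDays(raw))
	}
}
```

Falling back silently instead of returning a 400 keeps a hand-edited or stale URL rendering a usable chart.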
## Testing
- `internal/store/host_repo_stats_history_test.go` — upsert
merges partial patches without nulling; ordering; since-day
filter; cascade on host delete.
- `internal/web/sparkline/sparkline_test.go` — golden SVG files
for: empty input, single point, full 30-day series, mixed
null points. Goldens live under `testdata/`.
- `internal/server/http/ui_repo_test.go` — trend panel renders
with seeded history; range selector swaps server-side; empty
state.
- `internal/server/http/ui_dashboard_test.go` — host row sparkline
cell present and renders SVG when points exist, "—" when not.
- Smoke test after build: dashboard row shows sparkline once two
days of data exist; repo page chart toggles cleanly between ranges.
## Migration / rollout
- Schema migration is additive — no risk to existing tables.
- The history write is best-effort; if it fails (for example on a
schema issue), the main repo-stats upsert is unaffected.
- No agent change required, so no fleet update needed.
## Acceptance
- After two days of operation, the dashboard sparkline shows a
visible line for any host that has run a backup or
maintenance op on both days.
- Host repo page renders the trend panel with the snapshot-count
overlay; range selector switches view without a full page
reload.
- `go test ./...` and `go vet ./...` clean.
- Smoke env exercise: backup → sparkline updates; range pills
swap; FK cascade verified by deleting a host and checking the
history table.