# Repo maintenance

Backups go in; without maintenance, repos grow forever and eventually fall over. restic-manager runs three maintenance operations on a per-host cadence:

| Command  | What it does | Default cadence |
|----------|--------------|-----------------|
| `forget` | Marks snapshots eligible for removal per the retention policy attached to each source group. Cheap; runs append-only. | Daily after the last backup of the day |
| `prune`  | Reclaims space from the repo. Requires the **admin** credential (write+delete). | Weekly, off-peak |
| `check`  | Verifies repo integrity. Sub-options surface lock state. | Weekly, with `--read-data-subset N%` to sample pack files |

A new field on each host row, `host_repo_maintenance`, holds the cron expressions and last-fire anchors. The maintenance ticker on the server runs every 60s, finds hosts whose next-fire is due, and dispatches the right command. The agent's local cron is **only** for backups.

## Why server-side and not agent-side?

The agent's cron knows about backups because backups are per-source-group. Maintenance is per-repo, not per-source-group, so doing it server-side keeps the per-host wiring simple:

- One ticker, not N agent crons to keep in sync.
- Cancelling a maintenance dispatch is just "don't dispatch the next one"; there is no agent-side state to clean up.
- Skipping offline hosts is trivial (no queue; only scheduled *backups* queue into `pending_runs`).

## Forget and the multi-group payload

A single `forget` job can target several source groups at once. The wire envelope (`ForgetGroups`) carries one entry per group, each with its retention policy. The agent runs N `restic forget --tag --keep-...` invocations in sequence, streams their output, and reports a single terminal status.

## Prune and the admin credential

Prune mutates the repo. The everyday append-only credential **cannot** prune; that is the whole point of append-only.
restic-manager keeps a second slot per host (`kind = 'admin'`) for the credential that can.

When a prune is dispatched (cadence-driven or operator-driven):

1. Server pushes the admin credential to the agent in a fresh `config.update`.
2. Agent runs `restic prune` with the merged credential.
3. Job finishes; agent discards the admin credential from its in-memory secrets store.

The server never logs the merged URL (see [Credentials](./credentials.md)).

## Check and lock state

`restic check` warns about stale locks when it finds them. The agent ships every check's output back as a `repo.stats` envelope and a stream of log lines; if a stale lock is detected, the **Repo** page surfaces a banner with an **Unlock** button. The operator-only `unlock` command runs `restic unlock` and clears the banner.

`unlock` has no cadence: it's a manual action, never automatic. Auto-unlocking would mask the cause (probably a previously crashed long-running operation) and risk corrupting an operation the operator has merely lost track of.

## Repo stats

After every backup, check, prune, and unlock, the agent runs `restic stats --json --mode raw-data` and ships the result as a `repo.stats` envelope. The server stores this in `host_repo_stats` (latest only) and `host_repo_stats_history` (one row per host per day, last-write-wins per column, so a prune-only patch never nulls a backup-time size).

The host detail page surfaces:

- Total size + raw size in the vitals strip.
- Last-check timestamp + colour-coded status.
- Last-prune timestamp.
- 30/90-day repo size trend chart.
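
The last-write-wins-per-column rule for `host_repo_stats_history` can be illustrated with a minimal sketch. It is not the server's actual implementation: the function name `merge_stats_patch` and the column names (`total_size`, `raw_size`, `last_prune_at`) are hypothetical, and the values are made up. The idea is only that a patch overwrites a column when it carries a value for it, and leaves the stored value alone otherwise.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def merge_stats_patch(existing: Optional[Row], patch: Row) -> Row:
    """Merge a partial stats patch into the day's row, column by column.

    A column in the patch replaces the stored value only when the patch
    carries a non-None value for it; absent or None columns leave the
    stored value untouched.
    """
    merged: Row = dict(existing or {})
    for column, value in patch.items():
        if value is not None:
            merged[column] = value
    return merged


# Hypothetical column names and values, for illustration only.
after_backup = merge_stats_patch(None, {"total_size": 1200, "raw_size": 1500})

# A patch that only knows about the prune timestamp does not null the sizes.
after_prune = merge_stats_patch(
    after_backup,
    {"total_size": None, "raw_size": None, "last_prune_at": "2024-01-07T03:00:00Z"},
)
print(after_prune)
```

In a SQL-backed store the same effect is commonly achieved with an upsert that wraps each column in `COALESCE(new, old)`; whether restic-manager does it that way is an assumption, not something this page states.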