P5: OSS readiness — docs site, contributor onboarding, e2e harness
P5-01 — Documentation site under docs/book/ rendered with mdBook
(downloaded via Makefile, same static-binary pattern as Tailwind).
Structured chapters: getting started, concepts, operations,
security, reference. `make docs` / `make docs-watch`. Generated
output gitignored.
P5-02 — CONTRIBUTING.md rewritten from placeholder to a full
guide. CODE_OF_CONDUCT.md adapted from Contributor Covenant for a
single-maintainer project. .gitea/issue_template/{bug,feature}.md
and PULL_REQUEST_TEMPLATE.md.
P5-04 — Six README screenshots captured live from a fresh server
bootstrap (login, empty dashboard, add-host, alerts, settings,
audit log). README rewritten to centre the screenshot grid and
link out to the docs site.
P5-05 — SECURITY.md with disclosure policy (3-day ack, 30-day
default window), scope in/out, threat-model summary, operator
hardening checklist. Mirrored as a docs-site chapter.
P5-06 — End-to-end test harness. e2e/compose.e2e.yml brings up
server + sibling Linux agent (alpine + restic) + restic/rest-server.
Agent uses announce-and-approve so Playwright can drive the full
operator flow: bootstrap → login → accept pending → backup →
verify terminal status. Second spec scrapes /metrics to assert
the P6-04 endpoint surface. .gitea/workflows/e2e.yml runs on every
PR; local how-to in docs/e2e.md.
# Schedules and source groups

Two related but separable ideas:

- A **source group** is a named bundle of "what to back up":
  include paths, exclude patterns, retention policy, retry
  configuration, and optional pre/post hooks. The group's name is
  used as the restic snapshot tag, so retention can target it
  with `restic forget --tag <name>`.
- A **schedule** is a cron expression that, when it fires,
  triggers a backup of one or more source groups on a host.

Decoupling them means you can have one schedule covering several
groups (e.g. `0 1 * * *` running both `system` and `data`), while
each group keeps its own retention without duplicating policy
across schedules.

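A minimal Go sketch of the decoupled model. The `SourceGroup`, `Schedule`, and `BackupJob` types and the `jobsForFiring` helper are hypothetical names, not the project's own. It shows one schedule firing expanding into one job per referenced group, with each job tagged by its group's name.

```go
package main

import "fmt"

// Retention holds a group's keep_* policy (field set is illustrative).
type Retention struct {
	KeepLast, KeepDaily, KeepWeekly, KeepMonthly int
}

// SourceGroup answers "what to back up"; Name doubles as the restic snapshot tag.
type SourceGroup struct {
	ID        string
	Name      string
	Includes  []string
	Excludes  []string
	Retention Retention
}

// Schedule answers only "when" and "which groups".
type Schedule struct {
	Cron           string
	Enabled        bool
	SourceGroupIDs []string
}

// BackupJob is one unit of work created by a firing (hypothetical shape).
type BackupJob struct {
	GroupID string
	Tag     string
	Paths   []string
	Exclude []string
}

// jobsForFiring expands one firing into one backup job per referenced group.
// A disabled schedule yields nothing; an unknown group ID is an error.
func jobsForFiring(s Schedule, groups map[string]SourceGroup) ([]BackupJob, error) {
	if !s.Enabled {
		return nil, nil
	}
	jobs := make([]BackupJob, 0, len(s.SourceGroupIDs))
	for _, id := range s.SourceGroupIDs {
		g, ok := groups[id]
		if !ok {
			return nil, fmt.Errorf("schedule references unknown source group %q", id)
		}
		jobs = append(jobs, BackupJob{GroupID: g.ID, Tag: g.Name, Paths: g.Includes, Exclude: g.Excludes})
	}
	return jobs, nil
}
```

Because retention lives on the group, adding `system` to a second schedule does not copy any policy. The schedule only contributes another firing.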
## Source group anatomy

```yaml
name: data            # also used as the restic snapshot tag
includes:
  - /var/lib/postgresql
  - /home
excludes:
  - /home/*/.cache
  - /home/*/Downloads
retention:
  keep_last: 7
  keep_daily: 14
  keep_weekly: 4
  keep_monthly: 6
retry_max: 3
retry_backoff_seconds: 600
# A non-zero exit aborts the backup.
pre_hook: |
  pg_dump -U postgres -F c -f /var/lib/postgresql/dumps/all.dump
# Always runs, with RM_JOB_STATUS=succeeded|failed set.
post_hook: |
  rm -f /var/lib/postgresql/dumps/all.dump
```
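Because the group name doubles as the snapshot tag, a retention pass for one group can be expressed as a single `restic forget` call. This Go sketch uses a hypothetical `forgetArgs` helper and treats a zero `keep_*` value as unset. `--tag` and the `--keep-*` flags are restic's own.

```go
package main

import "strconv"

// Retention mirrors the keep_* keys in the YAML above; zero means "not set".
type Retention struct {
	KeepLast, KeepHourly, KeepDaily, KeepWeekly, KeepMonthly int
}

// forgetArgs builds a `restic forget` argument list scoped to one source
// group via its snapshot tag. Only non-zero keep_* values become flags.
func forgetArgs(groupName string, r Retention) []string {
	args := []string{"forget", "--tag", groupName}
	add := func(flag string, n int) {
		if n > 0 {
			args = append(args, flag, strconv.Itoa(n))
		}
	}
	add("--keep-last", r.KeepLast)
	add("--keep-hourly", r.KeepHourly)
	add("--keep-daily", r.KeepDaily)
	add("--keep-weekly", r.KeepWeekly)
	add("--keep-monthly", r.KeepMonthly)
	return args
}
```

For the YAML above this yields `forget --tag data --keep-last 7 --keep-daily 14 --keep-weekly 4 --keep-monthly 6`. Pruning is deliberately left out, since `prune` is its own job kind.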

### Conflict detection

If your retention policy says `keep_hourly: 24` but no schedule
covering this group fires more often than once a day, the UI
surfaces a **conflict-dimension banner** ("`hourly` won't be
honoured — no schedule fires more often than once a day"). The
flag is stored on the source group (`conflict_dimension`) and
refreshed whenever a schedule or group changes.

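One way to compute the `hourly` dimension is sketched below in Go under a simplifying assumption. Instead of parsing with `robfig/cron/v3`, it treats a plain 5-field expression as firing at most once a day only when both its minute and hour fields are single numbers. The helper names are hypothetical. Descriptors such as `@daily` are rejected rather than interpreted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// firesSubDaily reports whether a standard 5-field cron expression can fire
// more than once per day. Heuristic: a single plain number in both the minute
// and hour fields means at most once a day; anything else (*, lists, ranges,
// steps) is treated as sub-daily.
func firesSubDaily(expr string) (bool, error) {
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return false, fmt.Errorf("expected 5 cron fields, got %d in %q", len(fields), expr)
	}
	_, minErr := strconv.Atoi(fields[0])
	_, hourErr := strconv.Atoi(fields[1])
	return minErr != nil || hourErr != nil, nil
}

// hourlyConflict is true when keep_hourly is set but none of the schedules
// covering the group fires more than once a day (including when there are none).
func hourlyConflict(keepHourly int, coveringCrons []string) (bool, error) {
	if keepHourly == 0 {
		return false, nil
	}
	for _, c := range coveringCrons {
		sub, err := firesSubDaily(c)
		if err != nil {
			return false, err
		}
		if sub {
			return false, nil
		}
	}
	return true, nil
}
```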
### Hooks

`pre_hook` and `post_hook` run on the agent host inside
`/bin/sh -c` (`cmd.exe /C` on Windows). Their output is streamed
back to the live job log as `hook(<phase>): …` lines.

- A non-zero `pre_hook` exit aborts the backup.
- `post_hook` always runs, with `RM_JOB_STATUS=succeeded|failed`
  in the environment. Use it for cleanup that must happen
  whether or not the backup succeeded.
- Hooks run only for `kind=backup` jobs, never for `forget`,
  `prune`, `check`, etc.
- Hook bodies are AEAD-encrypted at rest, with encryption handled
  at the HTTP layer; the agent receives them as plaintext over
  the WebSocket channel.

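The text does not say which AEAD construction is used. As an illustration only, this Go sketch assumes AES-256-GCM from the standard library. It binds a hypothetical record ID as associated data, so a ciphertext copied onto another row fails to open, and it stores the nonce in front of the ciphertext. Key storage and rotation are out of scope.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// sealHook encrypts a hook body; output layout is nonce || ciphertext+tag.
func sealHook(key []byte, recordID, plaintext string) ([]byte, error) {
	block, err := aes.NewCipher(key) // 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext+tag to the nonce; recordID is authenticated, not encrypted.
	return gcm.Seal(nonce, nonce, []byte(plaintext), []byte(recordID)), nil
}

// openHook reverses sealHook. In the flow above this is the step that runs
// before the plaintext is sent to the agent.
func openHook(key []byte, recordID string, sealed []byte) (string, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	if len(sealed) < gcm.NonceSize() {
		return "", errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	pt, err := gcm.Open(nil, nonce, ct, []byte(recordID))
	if err != nil {
		return "", err
	}
	return string(pt), nil
}
```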
A "host default" pair of hooks lives on the host itself; a
source group's own hooks override them when set.

## Schedule anatomy

```yaml
cron: "0 2 * * *"
enabled: true
source_group_ids:
  - <gid for "data">
  - <gid for "system">
```

Slim by design: a schedule says **when** and **which groups**.
Everything else (paths, retention, hooks) lives on the groups.

The agent's local cron fires the schedule. If the WebSocket is
down at fire time, the server queues the firing into
`pending_runs` and drains it on the next agent reconnect, so a
short network blip won't lose the backup.

### Last / next run

For every schedule, the schedules tab shows "next" (computed by
parsing the cron expression with `robfig/cron/v3`) and "last"
(the latest `actor_kind=schedule` job in the `jobs` table). The
dashboard host row also shows `next 12h ago/from now` when a
single covering schedule is the run-now candidate.

## Bandwidth limits

Two places set restic's `--limit-upload` / `--limit-download`:

1. **Host-wide caps** on the host row (`bandwidth_up_kbps`,
   `bandwidth_down_kbps`). These are pushed to the agent on hello
   and after `PUT /api/hosts/{id}/bandwidth`, and apply to every
   restic invocation on the host.
2. **Per-job overrides** on the per-source-group Run-now form.
   These win over the host caps for the lifetime of that one job.

If neither is set, restic runs unthrottled.
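The precedence rule is sketched below in Go with hypothetical `Limits`, `JobOverride`, and `limitArgs` names. It assumes that 0 means "no host cap", that a nil per-job field means "not overridden", and that the stored number is handed to restic unchanged. restic interprets both flags in KiB/s, so if the `_kbps` columns actually hold kilobits per second, a conversion would be needed.

```go
package main

import "strconv"

// Limits holds host-wide caps; 0 means "no cap" (assumption).
type Limits struct {
	UpKbps, DownKbps int
}

// JobOverride holds optional Run-now values; nil means "not overridden".
type JobOverride struct {
	UpKbps, DownKbps *int
}

// pick returns the per-job value when set, otherwise the host-wide cap.
// An explicit per-job 0 therefore lifts the host cap for that job.
func pick(override *int, host int) int {
	if override != nil {
		return *override
	}
	return host
}

// limitArgs resolves per-job override > host cap > unthrottled and renders
// restic's global --limit-upload / --limit-download flags.
func limitArgs(host Limits, job JobOverride) []string {
	var args []string
	if up := pick(job.UpKbps, host.UpKbps); up > 0 {
		args = append(args, "--limit-upload", strconv.Itoa(up))
	}
	if down := pick(job.DownKbps, host.DownKbps); down > 0 {
		args = append(args, "--limit-download", strconv.Itoa(down))
	}
	return args
}
```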