restic-manager/CLAUDE.md
steve d000fe7ec1 P2R-01: REST + WS rewire against the slim shape
Schedules CRUD now takes {cron, enabled, source_group_ids[]} with cron
parsed via robfig/cron/v3 and group membership scoped to the host.
New source-groups CRUD lives at /api/hosts/{id}/source-groups; delete
refuses with 409 if any schedule still references the group, returning
the schedule list so the UI can prompt 'remove from these schedules
first.' Repo-maintenance GET/PUT manages forget/prune/check cadences
on host_repo_maintenance — no version bump, the server-side ticker
(P2R-06) drives execution.

Per-source-group Run-now (POST /hosts/{id}/source-groups/{gid}/run)
resolves the group's includes/excludes/retention/tag and dispatches a
backup command.run with the new structured CommandRunPayload fields
(Includes/Excludes/Tag). Old per-host /hosts/{id}/run-backup and
/hosts/{id}/init-repo return 410 Gone with a redirect message.

schedule_push.go is rebuilt: buildScheduleSetPayload assembles the
slim wire shape, pushScheduleSetOnConn ships it during the on-hello
window, pushScheduleSetAsync fires after every CRUD mutation, and
dispatchScheduledJob handles agent schedule.fire by iterating the
schedule's source groups and dispatching one backup per group with
actor_kind=schedule and scheduled_id pointing at the schedule.

Auto-init at first WS connect: when the host has repo creds bound and
no init job in its history, server dispatches restic init. Restic's
'config file already exists' soft-success means re-runs against an
existing repo no-op; we don't auto-retry on failure (operator triggers
re-init manually via the danger zone in P2R-09).

api.Schedule drops Kind/Paths/Excludes/Tags/RetentionPolicy/Manual etc.
in favour of {id, cron, enabled, source_groups: [...]}. The agent
scheduler stops checking sch.Manual; cmd/agent's backup dispatch reads
Includes/Excludes/Tag instead of Args.

Tests cover the new HTTP surface end-to-end: source-groups CRUD with
in-use refusal, schedule validation (bad cron / missing groups /
foreign group), repo-maintenance auto-seed and validation, the 410
route, and buildScheduleSetPayload's wire-shape correctness. Full
suite passes; smoke env exercises auto-init dispatch on hello,
async push after schedule create, and per-source-group Run-now
landing the right paths/excludes/tag at the agent.
2026-05-03 10:56:40 +01:00


# CLAUDE.md
Project-specific rules for Claude when working in this repo.
## No `Co-Authored-By` trailers on commits
Don't add `Co-Authored-By: Claude ...` (or any other co-author
trailer) to commit messages in this repo. The README will make it
plain that the project is heavily spec-coded, so per-commit
attribution is just noise.
## After building a new binary, also stage it for the smoke env
The smoke / dev environment runs the server out of `bin/` directly,
but the **agent** is fetched by the install script from the server's
`<DataDir>/agent-binaries/` directory, and the **systemd unit** + the
**install script** are fetched from `<DataDir>/install/`. Plain
`make build` doesn't touch any of those — the source-of-truth files
in the working tree (`deploy/install/*`, `bin/restic-manager-agent`)
must be copied into `/tmp/rm-smoke/data/...` *and* the running agent
on this dev host needs replacing if the change touches agent code or
the unit file.

This has bitten the smoke env twice (stale agent without
`mergeRestCreds`; stale unit without `User=root` + capabilities).
Both produced confusing test failures that looked like bugs in the
new code but were actually "old binary still running."

**Rule: after every `make build`, run the full restage block before
asking the operator to test.**
```sh
# 1. Restage what the install script serves (binary + unit + script).
cp bin/restic-manager-agent \
   /tmp/rm-smoke/data/agent-binaries/restic-manager-agent-linux-amd64
cp deploy/install/install.sh \
   /tmp/rm-smoke/data/install/install.sh
cp deploy/install/restic-manager-agent.service \
   /tmp/rm-smoke/data/install/restic-manager-agent.service

# 2. Replace the running agent on this dev box and restart the
#    service. Skip only when the change is server-side only AND
#    doesn't include a unit-file edit.
sudo -n install -m 0755 bin/restic-manager-agent \
   /usr/local/bin/restic-manager-agent
sudo -n install -m 0644 deploy/install/restic-manager-agent.service \
   /etc/systemd/system/restic-manager-agent.service
sudo -n systemctl daemon-reload
sudo -n systemctl restart restic-manager-agent

# 3. The server runs from the working tree; restart it manually
#    after a build that touches server code:
pkill -f restic-manager-server
RM_LISTEN=:8080 RM_DATA_DIR=/tmp/rm-smoke/data \
   RM_BASE_URL=http://127.0.0.1:8080 \
   RM_SECRET_KEY_FILE=/tmp/rm-smoke/data/secret.key \
   RM_COOKIE_SECURE=false \
   ./bin/restic-manager-server >> /tmp/rm-smoke/server.log 2>&1 &
```
A `make smoke-deploy` target that bundles all of this would be a
good follow-up.
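
One way to catch the "old binary still running" class of failure early is to compare what the smoke env will serve against the working tree. The sketch below is a hypothetical helper, not something that exists in the repo: `check_staged` and the `SMOKE_DATA` override are made-up names, and it only covers the three files staged in step 1 of the restage block (not the installed agent or the running server).

```sh
# Hypothetical helper: report any staged file that differs from its
# working-tree source. SMOKE_DATA is an assumed override; the default
# matches the /tmp/rm-smoke/data paths used in the restage block.
check_staged() {
    data_dir="${SMOKE_DATA:-/tmp/rm-smoke/data}"
    stale=0
    # Each line below is "<source in working tree> <path under data dir>".
    while read -r src dst; do
        if ! cmp -s "$src" "$data_dir/$dst"; then
            echo "STALE: $data_dir/$dst (source: $src)"
            stale=1
        fi
    done <<EOF
bin/restic-manager-agent agent-binaries/restic-manager-agent-linux-amd64
deploy/install/install.sh install/install.sh
deploy/install/restic-manager-agent.service install/restic-manager-agent.service
EOF
    return "$stale"
}
```

Run it from the repo root after restaging; a non-zero exit means at least one staged file is missing or out of date.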
## Migrations: prefer column-level ALTERs over table rebuilds
SQLite ≥ 3.35 supports `ALTER TABLE ... DROP COLUMN`, and
`ALTER TABLE ... RENAME COLUMN` has been available since 3.25. Use
them. The "rename-old + create-new + copy + drop-old" pattern is
unsafe in this codebase because the connection DSN sets
`PRAGMA foreign_keys=ON`, and `DROP TABLE` on a parent with
`ON DELETE CASCADE` children runs an implicit `DELETE` that
**empties every dependent table**. We hit this in the first draft
of migration 0007 and lost the entire smoke env's
schedules / jobs / snapshots / host_credentials.

`PRAGMA foreign_keys = OFF` inside a migration is a no-op: SQLite
ignores that PRAGMA while a transaction is open, and migrations run
in one. So the cascade trap can't be defused that way; just avoid
the rebuild pattern when there are inbound FKs.
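
The trap can be reproduced in a throwaway in-memory database. This is a sketch that assumes the `sqlite3` CLI is installed; the `parent` / `child` tables and the `cascade_demo` name are illustrative and are not the project's schema.

```sh
# Illustrative only: made-up tables, in-memory database, sqlite3 CLI assumed.
cascade_demo() {
    sqlite3 :memory: <<'SQL'
PRAGMA foreign_keys = ON;
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id) ON DELETE CASCADE
);
INSERT INTO parent VALUES (1);
INSERT INTO child  VALUES (10, 1);

BEGIN;
PRAGMA foreign_keys = OFF;   -- ignored: a transaction is open
PRAGMA foreign_keys;         -- prints 1 (still enforced)

-- The unsafe order: rename-old, create-new, copy, drop-old.
ALTER TABLE parent RENAME TO parent_old;
CREATE TABLE parent (id INTEGER PRIMARY KEY);
INSERT INTO parent SELECT id FROM parent_old;
DROP TABLE parent_old;       -- implicit DELETE cascades into child
COMMIT;

SELECT count(*) FROM child;  -- prints 0: the child row is gone
SQL
}
```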

If a column-level ALTER won't do what you need (e.g. tightening a
CHECK), use the safe rebuild order: **create new with a temp name
→ copy → DROP old → ALTER new RENAME TO old**. Never rename the
original first; SQLite rewrites the FK references in dependent
tables to follow the rename, which leaves them dangling after the
eventual drop. The `DROP` step still cascades, so this order is
only for tables without inbound `ON DELETE CASCADE` FKs.
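
A minimal sketch of the safe order, assuming the `sqlite3` CLI is installed and using a made-up `items` table with no inbound FKs (the table, its columns, and the `safe_rebuild_demo` name are all hypothetical). It tightens a CHECK and keeps the existing row:

```sh
# Illustrative only: made-up table, in-memory database, sqlite3 CLI assumed.
safe_rebuild_demo() {
    sqlite3 :memory: <<'SQL'
PRAGMA foreign_keys = ON;
CREATE TABLE items (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
INSERT INTO items VALUES (1, 'done');

BEGIN;
-- 1. Create the new shape under a temporary name.
CREATE TABLE items_new (
    id    INTEGER PRIMARY KEY,
    state TEXT NOT NULL CHECK (state IN ('queued', 'running', 'done'))
);
-- 2. Copy the rows across (fails here if any row violates the new CHECK).
INSERT INTO items_new SELECT id, state FROM items;
-- 3. Drop the original, which still carries its original name.
DROP TABLE items;
-- 4. Move the new table into place.
ALTER TABLE items_new RENAME TO items;
COMMIT;

SELECT id, state FROM items;  -- prints 1|done
SQL
}
```

Because the original table is never renamed, nothing that references `items` by name gets rewritten along the way.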
## Don't slog the merged rest-server URL
`restic.Env.RepoURL` is bare (no creds). The `user:pass@`-embedded
form is built only inside `envSlice()` at the moment of
`exec.Command` and is fed straight to the subprocess. Never store
it on a struct field. Never pass it to `slog`. If a URL needs to
appear in any operator-readable surface, run it through
`restic.RedactURL()` first — that mirrors restic's own `***`
substitution.
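
As a smoke-env sanity check, a log file can be scanned for URLs that still carry inline credentials. The helper below is a hypothetical sketch, not part of the repo: `log_is_clean` is a made-up name, the regex is a rough heuristic for `scheme://user:pass@`, and it treats a password of `***` as already redacted.

```sh
# Hypothetical check: succeed if FILE has no URL with unredacted inline
# credentials; otherwise print the offending line numbers to stderr
# (never the matched text) and fail.
log_is_clean() {
    leaks="$(grep -noE '[a-z][a-z0-9+.-]*://[^[:space:]/@:]+:[^[:space:]/@]+@' "$1" |
        grep -vF ':***@' | cut -d: -f1 | uniq)"
    [ -z "$leaks" ] && return 0
    # $leaks is left unquoted on purpose so line numbers join with spaces.
    echo "possible credentials on line(s): $(echo $leaks)" >&2
    return 1
}
```

For example, `log_is_clean /tmp/rm-smoke/server.log` after a smoke run would flag any line where the merged URL slipped into the log.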