restic-manager/CLAUDE.md
steve ec0bf0f6c3 P2R-01: REST + WS rewire against the slim shape
Schedules CRUD now takes {cron, enabled, source_group_ids[]} with cron
parsed via robfig/cron/v3 and group membership scoped to the host.
New source-groups CRUD lives at /api/hosts/{id}/source-groups; delete
refuses with 409 if any schedule still references the group, returning
the schedule list so the UI can prompt 'remove from these schedules
first.' Repo-maintenance GET/PUT manages forget/prune/check cadences
on host_repo_maintenance — no version bump, the server-side ticker
(P2R-06) drives execution.

Per-source-group Run-now (POST /hosts/{id}/source-groups/{gid}/run)
resolves the group's includes/excludes/retention/tag and dispatches a
backup command.run with the new structured CommandRunPayload fields
(Includes/Excludes/Tag). Old per-host /hosts/{id}/run-backup and
/hosts/{id}/init-repo return 410 Gone with a redirect message.

schedule_push.go is rebuilt: buildScheduleSetPayload assembles the
slim wire shape, pushScheduleSetOnConn ships it during the on-hello
window, pushScheduleSetAsync fires after every CRUD mutation, and
dispatchScheduledJob handles agent schedule.fire by iterating the
schedule's source groups and dispatching one backup per group with
actor_kind=schedule and scheduled_id pointing at the schedule.

Auto-init at first WS connect: when the host has repo creds bound and
no init job in its history, server dispatches restic init. Restic's
'config file already exists' soft-success means re-runs against an
existing repo no-op; we don't auto-retry on failure (operator triggers
re-init manually via the danger zone in P2R-09).

api.Schedule drops Kind/Paths/Excludes/Tags/RetentionPolicy/Manual etc.
in favour of {id, cron, enabled, source_groups: [...]}. The agent
scheduler stops checking sch.Manual; cmd/agent's backup dispatch reads
Includes/Excludes/Tag instead of Args.

Tests cover the new HTTP surface end-to-end: source-groups CRUD with
in-use refusal, schedule validation (bad cron / missing groups /
foreign group), repo-maintenance auto-seed and validation, the 410
route, and buildScheduleSetPayload's wire-shape correctness. Full
suite passes; smoke env exercises auto-init dispatch on hello,
async push after schedule create, and per-source-group Run-now
landing the right paths/excludes/tag at the agent.
2026-05-03 10:56:40 +01:00

CLAUDE.md

Project-specific rules for Claude when working in this repo.

No Co-Authored-By trailers on commits

Don't add Co-Authored-By: Claude ... (or any other co-author trailer) to commit messages in this repo. The README will make it plain that the project is heavily spec-coded, so per-commit attribution is just noise.

After building a new binary, also stage it for the smoke env

The smoke / dev environment runs the server out of bin/ directly, but the agent is fetched by the install script from the server's <DataDir>/agent-binaries/ directory, and the systemd unit + the install script are fetched from <DataDir>/install/. Plain make build doesn't touch any of those. The source-of-truth files in the working tree (deploy/install/*, bin/restic-manager-agent) must be copied into /tmp/rm-smoke/data/..., and the running agent on this dev host must be replaced whenever the change touches agent code or the unit file.

This has bitten the smoke env twice (stale agent without mergeRestCreds; stale unit without User=root + capabilities). Both produced confusing test failures that looked like bugs in the new code but were actually "old binary still running."

Rule: after every make build, run the full restage block before asking the operator to test.

# 1. Restage what the install script serves (binary + unit + script).
cp bin/restic-manager-agent \
   /tmp/rm-smoke/data/agent-binaries/restic-manager-agent-linux-amd64
cp deploy/install/install.sh \
   /tmp/rm-smoke/data/install/install.sh
cp deploy/install/restic-manager-agent.service \
   /tmp/rm-smoke/data/install/restic-manager-agent.service

# 2. Replace the running agent on this dev box and restart the
#    service. Skip only when the change is server-side only AND
#    doesn't include a unit-file edit.
sudo -n install -m 0755 bin/restic-manager-agent \
                        /usr/local/bin/restic-manager-agent
sudo -n install -m 0644 deploy/install/restic-manager-agent.service \
                        /etc/systemd/system/restic-manager-agent.service
sudo -n systemctl daemon-reload
sudo -n systemctl restart restic-manager-agent

# 3. The server runs from the working tree; restart it manually
#    after a build that touches server code:
pkill -f restic-manager-server
RM_LISTEN=:8080 RM_DATA_DIR=/tmp/rm-smoke/data \
RM_BASE_URL=http://127.0.0.1:8080 \
RM_SECRET_KEY_FILE=/tmp/rm-smoke/data/secret.key \
RM_COOKIE_SECURE=false \
./bin/restic-manager-server >> /tmp/rm-smoke/server.log 2>&1 &

A make smoke-deploy target that bundles all of this would be a good follow-up.

Migrations: prefer column-level ALTERs over table rebuilds

SQLite ≥ 3.35 supports ALTER TABLE ... DROP COLUMN and ALTER TABLE ... RENAME COLUMN (the latter has been available since 3.25). Use them. The "rename-old + create-new + copy + drop-old" pattern is unsafe in this codebase because the connection DSN sets PRAGMA foreign_keys=ON: DROP TABLE on a parent performs an implicit DELETE FROM first, and ON DELETE CASCADE then deletes the rows in every dependent table. We hit this in migration 0007 (first draft) and lost the entire smoke env's schedules / jobs / snapshots / host_credentials.

PRAGMA foreign_keys = OFF inside a migration is a no-op — that PRAGMA can only change outside a transaction, and migrations run in one. So the cascade-trap can't be defused that way; just avoid the rebuild pattern when there are inbound FKs.

If a column-level ALTER won't do what you need (e.g. tightening a CHECK), use the safe rebuild order: create new with a temp name → copy → DROP old → ALTER new RENAME TO old. Never rename the original first; that propagates the rename into dependent FKs and leaves them dangling after the eventual drop.
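
The safe order can be sketched as a list of statements. This is illustrative only: rebuildSteps, the _new suffix, and the widgets table are hypothetical and not taken from the repo's migrations, and the sketch assumes the table has no inbound ON DELETE CASCADE children (otherwise the DROP still cascades, as described above).

```go
package main

import (
	"fmt"
	"strings"
)

// rebuildSteps (hypothetical helper) returns the statements for the safe
// rebuild order: create under a temp name, copy, drop old, rename new.
// The original table is never renamed, so FKs that reference it by name
// are not rewritten to point at a table that is about to be dropped.
func rebuildSteps(table, columnDefs string, copyCols []string) []string {
	tmp := table + "_new" // temp name; the suffix is an assumption
	cols := strings.Join(copyCols, ", ")
	return []string{
		fmt.Sprintf("CREATE TABLE %s (%s)", tmp, columnDefs),
		fmt.Sprintf("INSERT INTO %s (%s) SELECT %s FROM %s", tmp, cols, cols, table),
		fmt.Sprintf("DROP TABLE %s", table),
		fmt.Sprintf("ALTER TABLE %s RENAME TO %s", tmp, table),
	}
}

func main() {
	// Example: tightening a CHECK on a made-up table.
	steps := rebuildSteps(
		"widgets",
		"id INTEGER PRIMARY KEY, state TEXT NOT NULL CHECK (state IN ('on','off'))",
		[]string{"id", "state"},
	)
	for _, s := range steps {
		fmt.Println(s + ";")
	}
}
```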

Don't slog the merged rest-server URL

restic.Env.RepoURL is bare (no creds). The user:pass@-embedded form is built only inside envSlice() at the moment of exec.Command and is fed straight to the subprocess. Never store it on a struct field. Never pass it to slog. If a URL needs to appear in any operator-readable surface, run it through restic.RedactURL() first — that mirrors restic's own *** substitution.
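
For orientation, a redaction helper in this spirit might look like the sketch below. It is not the repo's restic.RedactURL: redactURL is a hypothetical stand-in built on the standard net/url package, and the handling of the rest: prefix and the example host are assumptions. The password is spliced in by hand because net/url would percent-encode the asterisks in userinfo.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// redactURL (hypothetical stand-in for restic.RedactURL) replaces the
// password in a repository URL with ***. URLs without a password are
// returned unchanged; anything that cannot be parsed is replaced with a
// placeholder so a malformed URL never leaks credentials.
func redactURL(raw string) string {
	prefix, rest := "", raw
	if strings.HasPrefix(raw, "rest:") {
		prefix, rest = "rest:", raw[len("rest:"):]
	}
	u, err := url.Parse(rest)
	if err != nil {
		return "<unparseable URL>"
	}
	if u.User == nil {
		return raw
	}
	if _, hasPass := u.User.Password(); !hasPass {
		return raw
	}
	user := url.User(u.User.Username()).String() // escaped username only
	u.User = nil
	s := u.String()
	i := strings.Index(s, "//")
	if i < 0 {
		return "<unparseable URL>"
	}
	return prefix + s[:i+2] + user + ":***@" + s[i+2:]
}

func main() {
	fmt.Println(redactURL("rest:https://alice:s3cret@backup.example.com:8000/repo"))
	// prints "rest:https://alice:***@backup.example.com:8000/repo"
}
```

Redacting at the logging call site keeps the credentialed form confined to the exec.Command environment, which is the property the rule above protects.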