restic-manager/CLAUDE.md
steve 1b9b23f205
smoke env: systemd --user unit + Make targets so the dev server outlives shell tool boundaries
Spent half an evening fighting a smoke server that kept getting SIGTERM'd
mid-iteration. Root cause: backgrounded processes spawned from sandboxed
shell tool calls don't outlive the parent — even with nohup + disown.

Fix: hand the server to user-systemd as a transient unit so its lifecycle
is owned by the user's session, not by whichever bash subprocess started it.
New Make targets:

  make smoke-restart   build server + (re)launch as systemd --user unit
  make smoke-status    show unit status
  make smoke-logs      tail $HOME/smoke/server.log
  make smoke-stop      stop the unit
  make smoke-deploy    full rebuild + restage agent assets + restart

Documents the workflow in CLAUDE.md so the next session doesn't relitigate.
2026-05-07 22:55:36 +01:00


# CLAUDE.md
Project-specific rules for Claude when working in this repo.
## Commands
If the user types any of the following, follow the instructions in the table.
| Command | Action |
| --- | --- |
| :release | Trigger a subagent to commit (if needed), push (if needed), raise a PR, and wait for the PR checks to pass or fail. If they fail, report back. If they pass, merge into main. |
## Repo
The repo lives inside a Gitea instance; the `tea` CLI is available for use by agents.
## Run `go vet` before every commit
CI runs `go vet ./...` and will fail the build on any vet error.
Run it locally before staging a commit and fix anything it flags.
A common one is `res, _ := client.Do(req); defer res.Body.Close()`:
if the call fails, `res` may be nil and the deferred close
panics. Always check the error before touching `res`.
## No `Co-Authored-By` trailers on commits
Don't add `Co-Authored-By: Claude ...` (or any other co-author
trailer) to commit messages in this repo. The README will make it
plain that the project is heavily spec-coded, so per-commit
attribution is just noise.
## After building a new binary, also stage it for the smoke env
The smoke / dev environment runs the server out of `bin/` directly,
but the **agent** is fetched by the install script from the server's
`<DataDir>/agent-binaries/` directory, and the **systemd unit** + the
**install script** are fetched from `<DataDir>/install/`. Plain
`make build` doesn't touch any of those — the source-of-truth files
in the working tree (`deploy/install/*`, `bin/restic-manager-agent`)
must be copied into `$HOME/smoke/data/...` *and* the running agent
on this dev host needs replacing if the change touches agent code or
the unit file.
This has bitten the smoke env twice (stale agent without
`mergeRestCreds`; stale unit without `User=root` + capabilities).
Both produced confusing test failures that looked like bugs in the
new code but were actually "old binary still running."
**Rule: after every `make build`, run the full restage block before
asking the operator to test.**
```sh
# 1. Restage what the install script serves (binary + unit + script).
cp bin/restic-manager-agent \
  "$HOME/smoke/data/agent-binaries/restic-manager-agent-linux-amd64"
cp deploy/install/install.sh \
  "$HOME/smoke/data/install/install.sh"
cp deploy/install/install.ps1 \
  "$HOME/smoke/data/install/install.ps1"
cp deploy/install/restic-manager-agent.service \
  "$HOME/smoke/data/install/restic-manager-agent.service"
# 2. Replace the running agent on this dev box and restart the
#    service. Skip only when the change is server-side only AND
#    doesn't include a unit-file edit.
sudo -n install -m 0755 bin/restic-manager-agent \
  /usr/local/bin/restic-manager-agent
sudo -n install -m 0644 deploy/install/restic-manager-agent.service \
  /etc/systemd/system/restic-manager-agent.service
sudo -n systemctl daemon-reload
sudo -n systemctl restart restic-manager-agent
# 3. The server runs from the working tree; restart it after a build
#    that touches server code. From a sandboxed shell tool call, use
#    `make smoke-restart` instead: a process backgrounded with `&`
#    is reaped when the tool call exits.
pkill -f restic-manager-server
RM_LISTEN=:8080 RM_DATA_DIR="$HOME/smoke/data" \
  RM_BASE_URL=http://127.0.0.1:8080 \
  RM_SECRET_KEY_FILE="$HOME/smoke/data/secret.key" \
  RM_COOKIE_SECURE=false \
  ./bin/restic-manager-server >> "$HOME/smoke/server.log" 2>&1 &
```
## Smoke server: use the Make targets, not raw `nohup`
The smoke server runs as a transient `systemd --user` unit named
`restic-manager-smoke.service` so it survives any sandbox or
process-group boundary that would otherwise SIGTERM a backgrounded
process. Use the Make targets:
```
make smoke-restart   # rebuild server + (re)launch as systemd --user unit
make smoke-status    # systemctl --user status
make smoke-logs      # tail $HOME/smoke/server.log
make smoke-stop      # stop the unit
make smoke-deploy    # full rebuild + restage agent assets + restart
```
`./bin/restic-manager-server &` from inside a Bash tool call gets
reaped when the tool exits, so don't do that. If the unit fails to
start, check `systemctl --user status restic-manager-smoke` and
`$HOME/smoke/server.log` for the diagnosis.
`smoke-deploy` does NOT touch `/usr/local/bin/restic-manager-agent`
on this dev box; if your change requires the live agent here to
update, run the agent restage block above by hand.
## Migrations: prefer column-level ALTERs over table rebuilds
SQLite ≥ 3.35 supports `ALTER TABLE ... DROP COLUMN` and
`ALTER TABLE ... RENAME COLUMN`. Use them. The
"rename-old + create-new + copy + drop-old" pattern is unsafe in
this codebase because the connection DSN sets
`PRAGMA foreign_keys=ON`. With enforcement on, `DROP TABLE` first
runs an implicit `DELETE FROM` on the table, so dropping a parent
with `ON DELETE CASCADE` children **deletes every row in every
dependent table**. We hit this in the first draft of migration 0007
and lost the entire smoke env's schedules / jobs / snapshots /
host_credentials.
`PRAGMA foreign_keys = OFF` inside a migration is a no-op — that
PRAGMA can only change outside a transaction, and migrations run
in one. So the cascade-trap can't be defused that way; just avoid
the rebuild pattern when there are inbound FKs.
If a column-level ALTER won't do what you need (e.g. tightening a
CHECK), use the safe rebuild order: **create new with a temp name
→ copy → DROP old → ALTER new RENAME TO old**. Never rename the
original first; that propagates the rename into dependent FKs and
leaves them dangling after the eventual drop.
## Don't slog the merged rest-server URL
`restic.Env.RepoURL` is bare (no creds). The `user:pass@`-embedded
form is built only inside `envSlice()` at the moment of
`exec.Command` and is fed straight to the subprocess. Never store
it on a struct field. Never pass it to `slog`. If a URL needs to
appear in any operator-readable surface, run it through
`restic.RedactURL()` first — that mirrors restic's own `***`
substitution.
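A minimal sketch of the redact-before-logging rule above. `redactURL` here is a hypothetical stand-in for `restic.RedactURL()`, whose actual implementation is not shown in this file; the URL, username, and password are made-up values:

```go
package main

import (
	"log/slog"
	"strings"
)

// redactURL (hypothetical) replaces the password in a
// scheme://user:pass@host/... URL with "***". URLs without a
// password are returned unchanged. It assumes any '/' or '@' in
// the credentials is percent-encoded, as a valid URL requires.
func redactURL(raw string) string {
	schemeEnd := strings.Index(raw, "://")
	if schemeEnd < 0 {
		return raw
	}
	rest := raw[schemeEnd+3:]

	// The authority part runs up to the first '/', if there is one.
	authority := rest
	if slash := strings.IndexByte(rest, '/'); slash >= 0 {
		authority = rest[:slash]
	}
	at := strings.LastIndexByte(authority, '@')
	if at < 0 {
		return raw // no userinfo at all
	}
	userinfo := authority[:at]
	colon := strings.IndexByte(userinfo, ':')
	if colon < 0 {
		return raw // username only, nothing to hide
	}
	return raw[:schemeEnd+3] + userinfo[:colon+1] + "***" + rest[at:]
}

func main() {
	merged := "https://alice:s3cret@backup.example.com:8000/repo"
	// Only the redacted form ever reaches the logger.
	slog.Info("starting backup", "repo", redactURL(merged))
}
```

The sketch uses plain string slicing rather than `net/url`, because `url.URL.String()` would percent-encode a literal `***` password and `url.URL.Redacted()` substitutes `xxxxx` instead.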