Spent half an evening fighting a smoke server that kept getting SIGTERM'd mid-iteration. Root cause: backgrounded processes spawned from sandboxed shell tool calls don't outlive the parent, even with nohup + disown. Fix: hand the server to user-systemd as a transient unit so its lifecycle is owned by the user's session, not by whichever bash subprocess started it.

New Make targets:
  make smoke-restart   build server + (re)launch as systemd --user unit
  make smoke-status    show unit status
  make smoke-logs      tail $HOME/smoke/server.log
  make smoke-stop      stop the unit
  make smoke-deploy    full rebuild + restage agent assets + restart

Documents the workflow in CLAUDE.md so the next session doesn't relitigate.
CLAUDE.md
Project-specific rules for Claude when working in this repo.
Commands
If the user types any of the following, follow the instructions in the table.
| Command | Action |
|---|---|
| :release | Trigger a subagent to commit (if needed), push (if needed), raise a PR, and wait for the PR to pass or fail. If it fails, report back. If it passes, merge into main. |
Repo
The repo lives inside a Gitea instance; tea CLI is available for use by agents
Run go vet before every commit
CI runs go vet ./... and will fail the build on any vet error.
Run it locally before staging a commit and fix anything it flags.
A common one is res, _ := client.Do(...); defer res.Body.Close().
When the call returns a non-nil error, res may be nil and the
deferred close panics. Always check the error before touching res.
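A minimal sketch of the safe ordering, for reference. The names fetchBody and stubClient are hypothetical and exist only for this illustration; the stub transport is there so the example runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc adapts a function to http.RoundTripper (illustration only).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// stubClient returns a client that either fails every request or answers "ok".
// Hypothetical helper so the example needs no real server.
func stubClient(fail bool) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		if fail {
			return nil, errors.New("connection refused (simulated)")
		}
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader("ok")),
			Request:    r,
		}, nil
	})}
}

// fetchBody checks err BEFORE deferring the close, so a nil res is never touched.
func fetchBody(client *http.Client, url string) (string, error) {
	res, err := client.Get(url)
	if err != nil {
		// res may be nil here; return without dereferencing it.
		return "", fmt.Errorf("GET %s: %w", url, err)
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", fmt.Errorf("read body: %w", err)
	}
	return string(body), nil
}

func main() {
	body, err := fetchBody(stubClient(false), "http://example.invalid/")
	fmt.Println("success case:", body, err)

	_, err = fetchBody(stubClient(true), "http://example.invalid/")
	fmt.Println("failure case returns an error instead of panicking:", err != nil)
}
```

The only thing that matters here is the order inside fetchBody: the error check comes first, the defer second.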
No Co-Authored-By trailers on commits
Don't add Co-Authored-By: Claude ... (or any other co-author
trailer) to commit messages in this repo. The README will make it
plain that the project is heavily spec-coded, so per-commit
attribution is just noise.
After building a new binary, also stage it for the smoke env
The smoke / dev environment runs the server out of bin/ directly,
but the agent is fetched by the install script from the server's
<DataDir>/agent-binaries/ directory, and the systemd unit + the
install script are fetched from <DataDir>/install/. Plain
make build doesn't touch any of those — the source-of-truth files
in the working tree (deploy/install/*, bin/restic-manager-agent)
must be copied into $HOME/smoke/data/... and the running agent
on this dev host needs replacing if the change touches agent code or
the unit file.
This has bitten the smoke env twice (stale agent without
mergeRestCreds; stale unit without User=root + capabilities).
Both produced confusing test failures that looked like bugs in the
new code but were actually "old binary still running."
Rule: after every make build, run the full restage block before
asking the operator to test.
# 1. Restage what the install script serves (binary + unit + script).
cp bin/restic-manager-agent \
    $HOME/smoke/data/agent-binaries/restic-manager-agent-linux-amd64
cp deploy/install/install.sh \
    $HOME/smoke/data/install/install.sh
cp deploy/install/install.ps1 \
    $HOME/smoke/data/install/install.ps1
cp deploy/install/restic-manager-agent.service \
    $HOME/smoke/data/install/restic-manager-agent.service
# 2. Replace the running agent on this dev box and restart the
#    service. Skip only when the change is server-side only AND
#    doesn't include a unit-file edit.
sudo -n install -m 0755 bin/restic-manager-agent \
    /usr/local/bin/restic-manager-agent
sudo -n install -m 0644 deploy/install/restic-manager-agent.service \
    /etc/systemd/system/restic-manager-agent.service
sudo -n systemctl daemon-reload
sudo -n systemctl restart restic-manager-agent
# 3. The server runs from the working tree; restart it after a build
#    that touches server code. From inside a sandboxed tool call, use
#    make smoke-restart instead (see the smoke server section below):
#    a server backgrounded with & there is reaped when the tool exits.
pkill -f restic-manager-server
RM_LISTEN=:8080 RM_DATA_DIR=$HOME/smoke/data \
    RM_BASE_URL=http://127.0.0.1:8080 \
    RM_SECRET_KEY_FILE=$HOME/smoke/data/secret.key \
    RM_COOKIE_SECURE=false \
    ./bin/restic-manager-server >> $HOME/smoke/server.log 2>&1 &
Smoke server: use the Make targets, not raw nohup
The smoke server runs as a transient systemd --user unit named
restic-manager-smoke.service so it survives any sandbox or
process-group boundary that would otherwise SIGTERM a backgrounded
process. Use the Make targets:
make smoke-restart # rebuild server + (re)launch as systemd --user unit
make smoke-status # systemctl --user status
make smoke-logs # tail $HOME/smoke/server.log
make smoke-stop # stop the unit
make smoke-deploy # full rebuild + restage agent assets + restart
Running ./bin/restic-manager-server & from inside a Bash tool call
gets the server reaped when the tool exits, so don't do that. If the
unit fails to start, systemctl --user status restic-manager-smoke and
$HOME/smoke/server.log have the diagnosis.
smoke-deploy does NOT touch /usr/local/bin/restic-manager-agent
on this dev box; if your change requires the live agent here to
update, run the agent restage block above by hand.
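For orientation, here is a rough sketch of the kind of systemd-run command line a target like smoke-restart could assemble. The repo's Makefile is the source of truth and may differ; the function name smokeArgv, the choice of --setenv and --property flags, and the append: log redirection (which needs a reasonably recent systemd) are assumptions made for this illustration. The environment variables and paths are the ones used in the manual restart block earlier in this file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// smokeUnit is the transient unit name used in this document.
const smokeUnit = "restic-manager-smoke"

// smokeArgv builds a systemd-run argv that launches the server as a
// transient user unit. Hypothetical helper: the real Makefile may differ.
func smokeArgv(home, serverBin string) []string {
	dataDir := filepath.Join(home, "smoke", "data")
	logFile := filepath.Join(home, "smoke", "server.log")
	env := []string{
		"RM_LISTEN=:8080",
		"RM_DATA_DIR=" + dataDir,
		"RM_BASE_URL=http://127.0.0.1:8080",
		"RM_SECRET_KEY_FILE=" + filepath.Join(dataDir, "secret.key"),
		"RM_COOKIE_SECURE=false",
	}

	argv := []string{"systemd-run", "--user", "--unit=" + smokeUnit}
	for _, kv := range env {
		argv = append(argv, "--setenv="+kv)
	}
	// Assumption: append both streams to the same log the Make targets tail.
	argv = append(argv,
		"--property=StandardOutput=append:"+logFile,
		"--property=StandardError=append:"+logFile,
		serverBin,
	)
	return argv
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot resolve home directory:", err)
		os.Exit(1)
	}
	// Use an absolute path so the unit does not depend on a working directory.
	bin, err := filepath.Abs("bin/restic-manager-server")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot resolve binary path:", err)
		os.Exit(1)
	}
	// Print only; actually launching is left to the Make targets.
	fmt.Println(strings.Join(smokeArgv(home, bin), " "))
}
```

The sketch only prints the command. The point is that the process is started by the user's systemd instance, so it is not a child of the tool call's shell.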
Migrations: prefer column-level ALTERs over table rebuilds
SQLite ≥ 3.35 supports ALTER TABLE ... DROP COLUMN and
ALTER TABLE ... RENAME COLUMN. Use them. The
"rename-old + create-new + copy + drop-old" pattern is unsafe in
this codebase because the connection DSN sets
PRAGMA foreign_keys=ON, and DROP TABLE on a parent with
ON DELETE CASCADE children wipes every dependent table. We
hit this in migration 0007 (first draft) and lost the entire
smoke env's schedules / jobs / snapshots / host_credentials.
PRAGMA foreign_keys = OFF inside a migration is a no-op — that
PRAGMA can only change outside a transaction, and migrations run
in one. So the cascade-trap can't be defused that way; just avoid
the rebuild pattern when there are inbound FKs.
If a column-level ALTER won't do what you need (e.g. tightening a CHECK), use the safe rebuild order: create new with a temp name → copy → DROP old → ALTER new RENAME TO old. Never rename the original first; that propagates the rename into dependent FKs and leaves them dangling after the eventual drop.
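The safe rebuild order above can be sketched as a small helper that emits the four statements in sequence. This is an illustration only: rebuildStatements, the _new suffix, and the example_settings table are hypothetical, and the helper does not execute anything. As noted earlier in this section, the DROP still cascades if the table has inbound ON DELETE CASCADE foreign keys, so this applies only to tables without them.

```go
package main

import (
	"fmt"
	"strings"
)

// rebuildStatements returns the SQL for the safe rebuild order:
// create new under a temp name, copy, drop old, rename new to old.
// Hypothetical helper for illustration; it never renames the original.
func rebuildStatements(table, columnDefs string, copyCols []string) []string {
	tmp := table + "_new" // temp name for the replacement table
	cols := strings.Join(copyCols, ", ")
	return []string{
		fmt.Sprintf("CREATE TABLE %s (%s)", tmp, columnDefs),
		fmt.Sprintf("INSERT INTO %s (%s) SELECT %s FROM %s", tmp, cols, cols, table),
		fmt.Sprintf("DROP TABLE %s", table),
		fmt.Sprintf("ALTER TABLE %s RENAME TO %s", tmp, table),
	}
}

func main() {
	// Example: tightening a CHECK on a hypothetical table with no inbound FKs.
	stmts := rebuildStatements(
		"example_settings",
		"id INTEGER PRIMARY KEY, mode TEXT NOT NULL CHECK (mode IN ('on', 'off'))",
		[]string{"id", "mode"},
	)
	for _, s := range stmts {
		fmt.Println(s + ";")
	}
}
```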
Don't slog the merged rest-server URL
restic.Env.RepoURL is bare (no creds). The user:pass@-embedded
form is built only inside envSlice() at the moment of
exec.Command and is fed straight to the subprocess. Never store
it on a struct field. Never pass it to slog. If a URL needs to
appear in any operator-readable surface, run it through
restic.RedactURL() first — that mirrors restic's own ***
substitution.
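To make the redaction idea concrete, here is a stand-in sketch built on the standard library. It is not the repo's restic.RedactURL(): redactForLog is a hypothetical name, and it uses url.URL.Redacted(), which masks the password as xxxxx rather than restic's ***. The example URL and credentials are made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactForLog returns a form of raw that is safe for operator-readable
// output. Hypothetical stand-in for restic.RedactURL(); the password is
// replaced by the standard library's "xxxxx" placeholder.
func redactForLog(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		// Never echo an unparseable input: it may still contain credentials.
		return "<unparseable URL>"
	}
	return u.Redacted()
}

func main() {
	fmt.Println(redactForLog("http://backup:s3cret@127.0.0.1:8000/repo"))
	// prints "http://backup:xxxxx@127.0.0.1:8000/repo"

	fmt.Println(redactForLog("http://127.0.0.1:8000/repo"))
	// prints "http://127.0.0.1:8000/repo"
}
```

Returning a fixed placeholder on parse failure is deliberate: falling back to the raw string would put the credentials straight back into the log.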