steve a28bda2031 smoke env: systemd --user unit + Make targets so the dev server outlives shell tool boundaries
Spent half an evening fighting a smoke server that kept getting SIGTERM'd
mid-iteration. Root cause: backgrounded processes spawned from sandboxed
shell tool calls don't outlive the parent — even with nohup + disown.

Fix: hand the server to user-systemd as a transient unit so its lifecycle
is owned by the user's session, not by whichever bash subprocess started it.
New Make targets:

  make smoke-restart   build server + (re)launch as systemd --user unit
  make smoke-status    show unit status
  make smoke-logs      tail $HOME/smoke/server.log
  make smoke-stop      stop the unit
  make smoke-deploy    full rebuild + restage agent assets + restart

Documents the workflow in CLAUDE.md so the next session doesn't relitigate.
2026-05-07 22:55:36 +01:00


CLAUDE.md

Project-specific rules for Claude when working in this repo.

Commands

If the user types any of the following, follow the instructions in the table.

Command    Action
:release   Trigger a subagent to commit (if needed), push (if needed), raise a PR, and wait for the PR to pass or fail. If it fails, report back. If it passes, merge into main.

Repo

The repo lives inside a Gitea instance; the tea CLI is available for use by agents.

Run go vet before every commit

CI runs go vet ./... and fails the build on any vet error. Run it locally before staging a commit and fix anything it flags. A common one is res, _ := client.Do(...); defer res.Body.Close(): the error is discarded, so when the request fails res may be nil and the deferred close panics. Always check the error before touching res.

No Co-Authored-By trailers on commits

Don't add Co-Authored-By: Claude ... (or any other co-author trailer) to commit messages in this repo. The README will make it plain that the project is heavily spec-coded, so per-commit attribution is just noise.

After building a new binary, also stage it for the smoke env

The smoke / dev environment runs the server out of bin/ directly, but the agent is fetched by the install script from the server's <DataDir>/agent-binaries/ directory, and the systemd unit + the install script are fetched from <DataDir>/install/. Plain make build doesn't touch any of those — the source-of-truth files in the working tree (deploy/install/*, bin/restic-manager-agent) must be copied into $HOME/smoke/data/... and the running agent on this dev host needs replacing if the change touches agent code or the unit file.

This has bitten the smoke env twice (stale agent without mergeRestCreds; stale unit without User=root + capabilities). Both produced confusing test failures that looked like bugs in the new code but were actually "old binary still running."

Rule: after every make build, run the full restage block before asking the operator to test.

# 1. Restage what the install script serves (binary + unit + script).
cp bin/restic-manager-agent \
   $HOME/smoke/data/agent-binaries/restic-manager-agent-linux-amd64
cp deploy/install/install.sh \
   $HOME/smoke/data/install/install.sh
cp deploy/install/install.ps1 \
   $HOME/smoke/data/install/install.ps1
cp deploy/install/restic-manager-agent.service \
   $HOME/smoke/data/install/restic-manager-agent.service

# 2. Replace the running agent on this dev box and restart the
#    service. Skip only when the change is server-side only AND
#    doesn't include a unit-file edit.
sudo -n install -m 0755 bin/restic-manager-agent \
                        /usr/local/bin/restic-manager-agent
sudo -n install -m 0644 deploy/install/restic-manager-agent.service \
                        /etc/systemd/system/restic-manager-agent.service
sudo -n systemctl daemon-reload
sudo -n systemctl restart restic-manager-agent

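# 3. The server runs from the working tree; restart it manually
#    after a build that touches server code. From a sandboxed tool
#    call this backgrounded process gets reaped, so use
#    make smoke-restart there instead of this block: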
pkill -f restic-manager-server
RM_LISTEN=:8080 RM_DATA_DIR=$HOME/smoke/data \
RM_BASE_URL=http://127.0.0.1:8080 \
RM_SECRET_KEY_FILE=$HOME/smoke/data/secret.key \
RM_COOKIE_SECURE=false \
./bin/restic-manager-server >> $HOME/smoke/server.log 2>&1 &

Smoke server: use the Make targets, not raw nohup

The smoke server runs as a transient systemd --user unit named restic-manager-smoke.service so it survives any sandbox or process-group boundary that would otherwise SIGTERM a backgrounded process. Use the Make targets:

make smoke-restart   # rebuild server + (re)launch as systemd --user unit
make smoke-status    # systemctl --user status
make smoke-logs      # tail $HOME/smoke/server.log
make smoke-stop      # stop the unit
make smoke-deploy    # full rebuild + restage agent assets + restart

Running ./bin/restic-manager-server & from inside a Bash tool call gets the process reaped when the tool exits, so don't do that. If the unit fails to start, systemctl --user status restic-manager-smoke and $HOME/smoke/server.log have the diagnosis.

smoke-deploy does NOT touch /usr/local/bin/restic-manager-agent on this dev box; if your change requires the live agent here to update, run the agent restage block above by hand.

Migrations: prefer column-level ALTERs over table rebuilds

SQLite ≥ 3.35 supports ALTER TABLE ... DROP COLUMN and ALTER TABLE ... RENAME COLUMN. Use them. The "rename-old + create-new + copy + drop-old" pattern is unsafe in this codebase because the connection DSN sets PRAGMA foreign_keys=ON, so DROP TABLE on a parent runs an implicit DELETE and every ON DELETE CASCADE child loses its dependent rows. We hit this in the first draft of migration 0007 and lost the smoke env's entire schedules / jobs / snapshots / host_credentials data.

PRAGMA foreign_keys = OFF inside a migration is a no-op — that PRAGMA can only change outside a transaction, and migrations run in one. So the cascade-trap can't be defused that way; just avoid the rebuild pattern when there are inbound FKs.

If a column-level ALTER won't do what you need (e.g. tightening a CHECK), use the safe rebuild order: create new with a temp name → copy → DROP old → ALTER new RENAME TO old. Never rename the original first; that propagates the rename into dependent FKs and leaves them dangling after the eventual drop.

Don't slog the merged rest-server URL

restic.Env.RepoURL is bare (no creds). The user:pass@-embedded form is built only inside envSlice() at the moment of exec.Command and is fed straight to the subprocess. Never store it on a struct field. Never pass it to slog. If a URL needs to appear in any operator-readable surface, run it through restic.RedactURL() first — that mirrors restic's own *** substitution.
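For illustration, a redaction helper along these lines could look like the sketch below. redactURL is a hypothetical stand-in, not the repo's restic.RedactURL, whose actual implementation may differ; the handling of a leading rest: prefix is also an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// redactURL is a hypothetical stand-in for restic.RedactURL. It replaces
// any password in the userinfo part with "***" and leaves the rest alone.
func redactURL(raw string) string {
	// Assumption: rest-server URLs may carry restic's "rest:" prefix,
	// which net/url would otherwise parse as an opaque URL.
	prefix, rest := "", raw
	if strings.HasPrefix(raw, "rest:") {
		prefix, rest = "rest:", strings.TrimPrefix(raw, "rest:")
	}
	u, err := url.Parse(rest)
	if err != nil {
		// Never echo an unparseable string that might hold a secret.
		return prefix + "***"
	}
	if u.User == nil {
		return raw
	}
	if _, hasPassword := u.User.Password(); !hasPassword {
		return raw
	}
	// URL.Redacted swaps the password for "xxxxx"; rewrite that marker
	// to "***". Setting "***" directly would be percent-encoded.
	return prefix + strings.Replace(u.Redacted(), ":xxxxx@", ":***@", 1)
}

func main() {
	fmt.Println(redactURL("rest:http://user:s3cret@127.0.0.1:8000/repo"))
	fmt.Println(redactURL("http://127.0.0.1:8000/repo"))
}
```

A log call would then take redactURL(merged) rather than the merged string itself; the bare RepoURL needs no redaction because it never carries credentials.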