runner tests: probe-exec setupScript to clear overlayfs ETXTBSY

The original write-tmp-then-rename guard handles the ETXTBSY race
on a vanilla filesystem. Inside the new ci-runner-go container,
however, our jobs land on overlayfs, whose "writable inode"
bookkeeping lags the close long enough to leak ETXTBSY into the
exec that the test performs milliseconds later.

After the rename, probe-exec the file with a benign argument
("__rm_probe__") until exec succeeds. Each script body is shaped
`case "$1" in restore) ... ;; esac`, so an unknown argument falls
through to a clean exit and the probe is a no-op. A 3s deadline
keeps a stuck filesystem from hanging the suite.
2026-05-08 21:17:18 +01:00
parent 084ddd56ba
commit 21567adb8e
+34 -7
@@ -2,10 +2,14 @@ package runner
 import (
 	"context"
+	"errors"
 	"os"
+	"os/exec"
 	"path/filepath"
 	"sync"
+	"syscall"
 	"testing"
+	"time"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/api"
 	"gitea.dcglab.co.uk/steve/restic-manager/internal/restic"
@@ -43,13 +47,22 @@ func (s *fakeSender) snapshot() []api.Envelope {
 // setupScript writes a shell script (without shebang) to a temp dir,
 // names it "restic", makes it executable, and returns the path.
 //
-// Writes to "<path>.tmp" then renames into place. The rename is what
-// makes this race-free: under -race + many t.Parallel tests, a
-// fork-from-another-goroutine can inherit the writable fd from
+// Writes to "<path>.tmp" then renames into place. The rename is the
+// usual guard against ETXTBSY: under -race + many t.Parallel tests,
+// a fork-from-another-goroutine can inherit the writable fd from
 // os.WriteFile before close completes, and exec'ing the file then
-// returns ETXTBSY ("text file busy"). Once the rename lands, the
-// final path is a fresh dirent pointing at an inode that has no
-// writable fd open anywhere — exec is safe.
+// returns ETXTBSY ("text file busy"). The renamed dirent points at
+// an inode that has no writable fd open anywhere — exec is safe on
+// a vanilla filesystem.
+//
+// On overlayfs (every job that runs inside a `container:` block on
+// our Gitea runner), the rename can briefly leak ETXTBSY anyway —
+// the upper layer's "writable inode" bookkeeping lags the userspace
+// close. To make the helper deterministic across environments, we
+// probe-exec the file with a benign argument until exec succeeds,
+// then return. Each script body has a `case "$1" in ... esac` shape
+// where unknown args fall through to a clean exit, so the probe is
+// a no-op from the test's point of view.
 func setupScript(t *testing.T, body string) string {
 	t.Helper()
 	dir := t.TempDir()
@@ -61,7 +74,21 @@ func setupScript(t *testing.T, body string) string {
 	if err := os.Rename(tmp, final); err != nil {
 		t.Fatalf("setupScript: rename: %v", err)
 	}
-	return final
+	deadline := time.Now().Add(3 * time.Second)
+	for {
+		err := exec.Command(final, "__rm_probe__").Run()
+		if err == nil {
+			return final
+		}
+		if !errors.Is(err, syscall.ETXTBSY) {
+			t.Fatalf("setupScript: probe exec: %v", err)
+		}
+		if time.Now().After(deadline) {
+			t.Fatalf("setupScript: %s still ETXTBSY after 3s", final)
+		}
+		time.Sleep(10 * time.Millisecond)
+	}
 }
 // firstEnvOfType returns the first envelope with the given type, or