Harden test harness: a local vitest run forked 30+ real daemons and OOM-reaped prod kaval #1375

@srid

What happened

During an autonomous /be run on a workstation that also runs production kolu + kaval (PR #1374), the agent ran the kaval unit suite locally for fast feedback:

packages/kaval/node_modules/.bin/vitest run --root packages/kaval   # ×2

packages/kaval/src/socketDaemon.test.ts spawns real kaval daemon processes (node --import <tsx loader> bin.ts) + real kaval-tui processes — ~30+ real processes across the two runs (fingerprint: 182 leftover /tmp/kaval-e2e-* dirs). The tests are isolated (per-test mkdtemp sockets, PID-scoped kills — they never touch the prod $XDG_RUNTIME_DIR/kaval-* socket), so nothing killed prod kaval directly. But the process/memory spike OOM-reaped the user's long-lived production kaval (a crash, not a kill). The agent should have run these on CI / a pu box (where they pass green), never on the workstation.

Root cause

The kaval test binary forks real daemons with no gate inside the binary. Existing guards wrap commands (the /test skill routes e2e to pu boxes; dev-server uses random ports; do.md forbids bare just dev) — but a bare vitest --root <pkg> reaches past the command to the binary, bypassing all of them. The fix must live in the test binary, because only that is independent of how it's invoked, which harness runs it, or which directory it's in.

Two traps (found by an adversarial review pass)

  1. kaval was just where it landed — the danger surface is wider. The same bypass works on:
    • packages/surface-daemon/src/daemonMain.test.ts, pidGate.test.ts — fork real node -e
    • packages/surface-daemon-supervisor/src/waitForPidGone.test.ts — spawns real sleep 30
    • packages/kaval-tui/src/{attach,create}.test.ts — fork ~6 real shell PTYs via terminal.spawn
    • packages/tests/support/hooks.ts (Cucumber) — spawns a real kolu server + Chromium + Xvfb (gigabytes if run locally)
  2. Gating on CI is broken here. odu runs unit tests as nix develop -c pnpm test:unit with no CI env exported. describe.runIf(process.env.CI) would silently skip daemon tests in CI too (coverage → 0), or, if widened to honor a locally-set CI, re-arm the danger on the workstation. CI is the wrong key.
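
To make the second trap concrete, here is a minimal sketch of what a CI-keyed gate evaluates to in the two environments described above. The names `Env`, `ciGateRuns`, `oduUnitRun` and `workstationWithCI` are hypothetical, and the value `"true"` for a locally set CI is an assumption; only the fact that odu exports no CI variable comes from this issue.

```typescript
type Env = Record<string, string | undefined>;

// Roughly what describe.runIf(process.env.CI) decides: run only if CI is truthy.
function ciGateRuns(env: Env): boolean {
  return Boolean(env.CI);
}

// Per this issue, the odu unit run exports no CI variable.
const oduUnitRun: Env = {};

// Assumption: a workstation shell where someone has set CI by hand.
const workstationWithCI: Env = { CI: "true" };

// In CI the daemon tests are skipped; on the workstation they are armed.
const outcomes = {
  odu: ciGateRuns(oduUnitRun),
  workstation: ciGateRuns(workstationWithCI),
};
```

Both outcomes are the opposite of what is wanted, which is why the gate needs a dedicated key.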

The fix that closes the class (impossible-by-construction)

  • 1. A shared gate helper, describeDaemon(...), keyed only on KOLU_DAEMON_TESTS=1, default OFF, applied at every real-spawn test site (the files above). A bare vitest run on any package then forks nothing, regardless of invocation path.
  • 2. A meta-test/lint that fails the suite if any *.test.ts forks a real process without routing through that gate (grep for spawn(process.execPath, node --import, servePtyHostOverUnixSocket, real ssh/nix, and PTY-forking terminal.spawn). This makes the guarantee structural rather than a matter of discipline: a future spawning test can't exist ungated.
  • 3. CI's own just recipe exports KOLU_DAEMON_TESTS=1 (CI runs just test-unit → full coverage), verified against a real odu unit log that the daemon it(...) lines actually execute.
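
A minimal sketch of the gate in layer 1, assuming the spawns live inside hooks and it(...) callbacks rather than at suite top level. Only describeDaemon and KOLU_DAEMON_TESTS=1 come from this issue; `Env`, `DescribeLike`, `daemonTestsEnabled` and `makeDescribeDaemon` are hypothetical names chosen for illustration. The runner's describe is passed in so the sketch does not depend on any particular test framework.

```typescript
type Env = Record<string, string | undefined>;

// The minimal shape this sketch needs from a test runner's describe.
interface DescribeLike {
  (name: string, body: () => void): void;
  skip(name: string, body: () => void): void;
}

// Default OFF: only the exact string "1" opts in. CI is deliberately not consulted.
function daemonTestsEnabled(env: Env): boolean {
  return env.KOLU_DAEMON_TESTS === "1";
}

// Builds a describe-shaped function that registers the suite as skipped
// unless the opt-in is set, so its hooks and test callbacks are not executed.
function makeDescribeDaemon(describe: DescribeLike, env: Env) {
  return (name: string, body: () => void): void => {
    if (daemonTestsEnabled(env)) {
      describe(name, body);
    } else {
      describe.skip(name, body);
    }
  };
}
```

In a test file this might be wired as `const describeDaemon = makeDescribeDaemon(describe, process.env)` using vitest's `describe`, though whether vitest's type is accepted without a cast has not been checked here. Keying on an exact `"1"` keeps stray values such as an empty string from enabling the daemon tests.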

Defense-in-depth (guardrails, never the barrier)

  • 4. PreToolUse Bash hook — authored in the apm source (so just ai::apm doesn't clobber it) — that denies local vitest / pnpm test:unit / cucumber and steers the agent to CI/pu. Bypassable + harness-scoped, so a tripwire, not a wall. (Do not trigger on "is a kaval live" — the resource spike harms even with no kaval up, and a detector can't tell prod kaval from a just dev one or the test's own /tmp gates → false blocks that train people to disable it.)
  • 5. Ergonomic default: just test-unit stays fork-free (the safe reach); just test-daemon (sets the opt-in) is documented as CI/pu-only.
  • 6. Doc rule keyed on a property ("never run a suite that forks real OS processes on the workstation") pointing at the lint — not a hand-maintained file list that rots.
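
The decision at the core of the layer 4 hook could look roughly like the sketch below. It covers only the command classification; how the result is reported back to the harness is left out because that protocol is harness-specific and not described in this issue. `DENIED` and `denyReason` are hypothetical names, and the regular expressions are a best-effort guess at matching the three command families named above.

```typescript
// Hypothetical deny-list for the three command families named in layer 4.
const DENIED: { pattern: RegExp; label: string }[] = [
  // Matches a bare "vitest" or a path ending in /vitest, followed by args or end of line.
  { pattern: /(^|[\s\/;&|])vitest(\s|$)/, label: "vitest" },
  { pattern: /\bpnpm\s+(run\s+)?test:unit\b/, label: "pnpm test:unit" },
  { pattern: /(^|[\s\/;&|])cucumber(-js)?(\s|$)/, label: "cucumber" },
];

// Returns a steering message if the command should be denied, otherwise null.
function denyReason(command: string): string | null {
  for (const { pattern, label } of DENIED) {
    if (pattern.test(command)) {
      return `Refusing local ${label}: run this suite on CI or a pu box instead.`;
    }
  }
  return null;
}
```

The matching is deliberately coarse and keyed on the command text alone, in line with the note above about not probing for a live kaval. It will also flag harmless strings such as `echo vitest`, which is acceptable for a tripwire but is one more reason it cannot be the barrier.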

Net: layers 1–3 make the workstation safe by construction even with a careless agent or a new harness; 4–6 are insurance.
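
The layer 2 meta-test could be sketched as a pure function over file contents, shown below. The patterns are taken from the grep list in item 2, except "real ssh/nix", which is omitted because no single token captures it. `TestFile`, `SPAWN_PATTERNS`, `GATE_PATTERN` and `findUngatedSpawns` are hypothetical names, and checking for a describeDaemon( call anywhere in the file is a simplifying assumption; a real lint would likely need to confirm the spawn sits inside the gated suite.

```typescript
interface TestFile {
  path: string;
  source: string;
}

// Tokens that indicate a real process is forked (from item 2's grep list).
const SPAWN_PATTERNS: RegExp[] = [
  /spawn\(\s*process\.execPath/,
  /node --import/,
  /servePtyHostOverUnixSocket/,
  /terminal\.spawn\(/,
];

// Assumption: a file counts as gated if it calls describeDaemon( at all.
const GATE_PATTERN = /\bdescribeDaemon\(/;

// Returns the paths of *.test.ts files that spawn without going through the gate.
function findUngatedSpawns(files: TestFile[]): string[] {
  return files
    .filter((f) => f.path.endsWith(".test.ts"))
    .filter((f) => SPAWN_PATTERNS.some((p) => p.test(f.source)))
    .filter((f) => !GATE_PATTERN.test(f.source))
    .map((f) => f.path);
}
```

A meta-test would read the repository's test files, pass them in, and fail if the result is non-empty. If the meta-test itself is a *.test.ts file containing these pattern literals, it would need to exclude itself, since its own source matches.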


Filed from the /be run on PR #1374 (P2.5 frontDaemonOverStdio). Design produced + adversarially verified via an ultracode workflow; full verdict in the session transcript.
