Symptom
pegaflow-server's http_cleanup_hang_repro integration test aborts with SIGABRT even on a clean master checkout (cce4946, v0.22.8), so it currently blocks the cargo test --release pre-commit hook on this dev box regardless of branch.
$ cargo test --release --no-default-features --features cuda-13,rdma \
-p pegaflow-server --test http_cleanup_hang_repro
running 1 test
Fatal Python error: _PyThreadState_Attach: non-NULL old thread state
Python runtime state: initialized
Current thread 0x00007bb9f1fff6c0 (most recent call first):
File ".../site-packages/torch/cuda/__init__.py", line 638 in set_device
Thread 0x00007bb9c21ff6c0 (most recent call first):
<no Python frame>
Extension modules: numpy..., torch._C, ... (total: 13)
error: test failed
Caused by:
process didn't exit successfully: ... (signal: 6, SIGABRT)
Environment
- master @ cce4946 (chore: bump version to 0.22.8), reproduced in a pristine git worktree
- rustc 1.95.0, pyo3 0.28
- torch 2.11.0+cu130 (root repo .venv, Python 3.13.5 from uv)
- Run env:
  LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:<uv cpython-3.13.5>/lib (libpython is needed by the test binaries, which embed Python via PyO3; cuda-12.8 lib64 is the profile default matching the driver)
- Single 16 GB GPU box
Analysis
_PyThreadState_Attach: non-NULL old thread state means CPython aborted because an OS thread tried to attach a thread state while it already had one attached. In other words, two code paths (PyO3's GIL management in the embedded interpreter, and torch.cuda.set_device called during engine GPU registration) disagree about who owns the thread state. The second thread, with <no Python frame>, suggests that the cleanup/registration race the test was written to reproduce now trips a CPython invariant before it reaches the original hang scenario.
Likely interactions to investigate:
- pyo3 0.28 attach/detach semantics vs Python 3.13 (the hard abort in _PyThreadState_Attach is relatively new CPython behavior; 3.12 tolerated more).
- The test embeds Python and drives CUDA registration from multiple Tokio/OS threads; if any path calls into Python without going through Python::attach (or holds a cached thread state across threads), 3.13 aborts instead of deadlocking.
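The invariant behind the abort can be modeled in a few lines of std-only Rust, which may help when auditing the test's call paths. This is a simulation under stated assumptions, not CPython or PyO3 code: CURRENT_TSTATE, raw_attach, raw_detach, AttachGuard and the two entry functions are hypothetical names, the thread-local stands in for CPython's per-thread "current thread state", and the guard only imitates the reentrancy that Python::attach is assumed to provide. It shows why two layers that each attach unconditionally trip the check, while nested guard-style entries on the same thread do not.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for CPython's per-OS-thread current thread state.
    // 0 means nothing is attached; any other value names an attached state.
    static CURRENT_TSTATE: Cell<u64> = Cell::new(0);
}

/// Models the check that fails in `_PyThreadState_Attach`: attaching while this
/// thread already has a thread state is fatal. Returns Err instead of aborting
/// so the behavior can be observed.
fn raw_attach(tstate: u64) -> Result<(), String> {
    CURRENT_TSTATE.with(|cur| {
        if cur.get() != 0 {
            return Err(format!(
                "non-NULL old thread state (attached: {}, new: {})",
                cur.get(),
                tstate
            ));
        }
        cur.set(tstate);
        Ok(())
    })
}

/// Clears this thread's slot (the counterpart of detaching).
fn raw_detach() {
    CURRENT_TSTATE.with(|cur| cur.set(0));
}

/// Guard-style entry point, in the spirit of (not copied from) PyO3's
/// `Python::attach`: it attaches only if nothing is attached yet, and only the
/// guard that actually attached detaches on drop, so nesting is harmless.
struct AttachGuard {
    attached_here: bool,
}

impl AttachGuard {
    fn enter(tstate: u64) -> AttachGuard {
        let already = CURRENT_TSTATE.with(|cur| cur.get() != 0);
        if !already {
            raw_attach(tstate).expect("slot is empty, attach cannot fail");
        }
        AttachGuard { attached_here: !already }
    }
}

impl Drop for AttachGuard {
    fn drop(&mut self) {
        if self.attached_here {
            raw_detach();
        }
    }
}

/// Two layers that each assume they own the thread state and attach
/// unconditionally: the second attach hits the invariant.
fn unguarded_nested_entry() -> Result<(), String> {
    raw_attach(1)?; // e.g. the embedding layer
    let second = raw_attach(2); // e.g. another library attaching its own state
    raw_detach();
    second
}

/// The same nesting routed through the guard: the inner entry reuses the state
/// the outer one attached. Returns the state id observed inside.
fn guarded_nested_entry() -> u64 {
    let _outer = AttachGuard::enter(1);
    let _inner = AttachGuard::enter(2);
    CURRENT_TSTATE.with(|cur| cur.get())
}
```

Because the slot is per thread, a freshly spawned thread starts with nothing attached and has to attach for itself (std::thread::spawn(guarded_nested_entry) also observes state 1). That is the reason a thread state cached on one Tokio/OS thread cannot simply be reused on another.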
Impact
The prek run / pre-commit cargo test --release hook cannot pass on this box, on any branch, until this is fixed or the test is quarantined behind a marker.
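If the test is quarantined rather than fixed, the usual Rust marker is the #[ignore] attribute: plain cargo test (and therefore the pre-commit hook) skips ignored tests, while they can still be run on demand with -- --ignored. A sketch, assuming the test lives in the integration-test file behind --test http_cleanup_hang_repro; the function name below is hypothetical, so keep the real name and body.

```rust
// Hypothetical function name; keep the real test's name and body.
// `cargo test` skips ignored tests by default; run this one explicitly with:
//   cargo test --release --no-default-features --features cuda-13,rdma \
//     -p pegaflow-server --test http_cleanup_hang_repro -- --ignored
#[test]
#[ignore = "SIGABRT on Python 3.13: _PyThreadState_Attach: non-NULL old thread state"]
fn http_cleanup_hang_repro() {
    // existing test body, unchanged
}
```

The reason string is printed next to the skipped test in cargo test output, which keeps the quarantine visible until the underlying thread-state issue is resolved.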