http_cleanup_hang_repro aborts with 'Fatal Python error: _PyThreadState_Attach: non-NULL old thread state' on master #339

Description

@xiaguan

Symptom

pegaflow-server's http_cleanup_hang_repro integration test aborts with SIGABRT on a clean master checkout (cce4946, v0.22.8). As a result, the cargo test --release pre-commit hook currently fails on this dev box regardless of branch.

$ cargo test --release --no-default-features --features cuda-13,rdma \
    -p pegaflow-server --test http_cleanup_hang_repro

running 1 test
Fatal Python error: _PyThreadState_Attach: non-NULL old thread state
Python runtime state: initialized

Current thread 0x00007bb9f1fff6c0 (most recent call first):
  File ".../site-packages/torch/cuda/__init__.py", line 638 in set_device

Thread 0x00007bb9c21ff6c0 (most recent call first):
  <no Python frame>

Extension modules: numpy..., torch._C, ... (total: 13)
error: test failed
Caused by:
  process didn't exit successfully: ... (signal: 6, SIGABRT)

Environment

  • master @ cce4946 (chore: bump version to 0.22.8), reproduced in a pristine git worktree
  • rustc 1.95.0, pyo3 0.28
  • torch 2.11.0+cu130 (root repo .venv, Python 3.13.5 from uv)
  • Run env: LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:<uv cpython-3.13.5>/lib (libpython is needed by the test binaries that embed Python via PyO3; cuda-12.8 lib64 is the profile default and matches the driver)
  • Single 16 GB GPU box

Analysis

_PyThreadState_Attach: non-NULL old thread state means CPython aborted because a thread tried to attach a thread state while it already had one attached. In other words, two code paths (PyO3's GIL management in the embedded interpreter, and torch.cuda.set_device called during engine GPU registration) disagree about who owns the thread state. The second thread, which shows <no Python frame>, suggests that the cleanup/registration race this test was written to reproduce now trips a CPython invariant before it ever reaches the original hang scenario.
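To make the invariant concrete, here is a minimal std-only Rust model (not CPython or PyO3 code) of the per-thread rule that _PyThreadState_Attach enforces in 3.13: each OS thread may have at most one attached thread state, and attaching a second one without detaching first is fatal. Where CPython calls Py_FatalError (and the process gets SIGABRT), the model returns an error instead. The names `attach`, `detach`, and `AttachError` are hypothetical.

```rust
use std::cell::Cell;

// Toy model, not CPython or PyO3 code: the thread-state id currently
// attached to this OS thread, if any.
thread_local! {
    static ATTACHED: Cell<Option<u64>> = const { Cell::new(None) };
}

#[derive(Debug, PartialEq)]
pub enum AttachError {
    // Real CPython 3.13 calls Py_FatalError("non-NULL old thread state") here.
    NonNullOldThreadState { old: u64, new: u64 },
}

/// Attach `tstate` to the calling thread; fails if one is already attached.
pub fn attach(tstate: u64) -> Result<(), AttachError> {
    ATTACHED.with(|slot| match slot.get() {
        Some(old) => Err(AttachError::NonNullOldThreadState { old, new: tstate }),
        None => {
            slot.set(Some(tstate));
            Ok(())
        }
    })
}

/// Detach whatever thread state is attached to the calling thread.
pub fn detach() -> Option<u64> {
    ATTACHED.with(|slot| slot.take())
}
```

Attaching 1 and then 2 on the same thread yields `NonNullOldThreadState { old: 1, new: 2 }`, while a freshly spawned thread starts unattached. That is the shape of the crash here: a thread that already holds a thread state enters an attach path a second time.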

Likely interactions to investigate:

  • pyo3 0.28 attach/detach semantics vs. Python 3.13: the hard abort in _PyThreadState_Attach is relatively recent CPython behavior, and 3.12 tolerated more of these mismatches.
  • The test embeds Python and drives CUDA registration from multiple Tokio/OS threads; if any path calls into Python without going through Python::attach (or holds a cached thread state across threads), 3.13 aborts instead of deadlocking.
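One way to rule out these failure modes is to route every entry into Python through a reentrant, thread-bound scope, so nested calls on the same thread reuse the existing attachment instead of attaching again, and the attachment can never be carried to another thread. The sketch below models that discipline with a per-thread depth counter. PyO3 keeps a similar per-thread GIL count, but `AttachGuard`, `depth`, and `transitions` are hypothetical and are not PyO3's implementation.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Hypothetical per-thread bookkeeping: live guard nesting depth, and how many
// real attach/detach transitions happened (for inspection in tests).
thread_local! {
    static DEPTH: Cell<usize> = const { Cell::new(0) };
    static TRANSITIONS: Cell<usize> = const { Cell::new(0) };
}

/// RAII guard: only the outermost guard on a thread performs the real attach,
/// so nested entries never attach twice. The raw-pointer marker makes the
/// guard !Send, so it cannot be moved (cached) onto another thread.
pub struct AttachGuard {
    _not_send: PhantomData<*const ()>,
}

impl AttachGuard {
    pub fn acquire() -> Self {
        DEPTH.with(|d| {
            if d.get() == 0 {
                // Real code would attach the interpreter thread state here.
                TRANSITIONS.with(|t| t.set(t.get() + 1));
            }
            d.set(d.get() + 1);
        });
        AttachGuard { _not_send: PhantomData }
    }
}

impl Drop for AttachGuard {
    fn drop(&mut self) {
        DEPTH.with(|d| {
            let remaining = d.get() - 1;
            d.set(remaining);
            if remaining == 0 {
                // Real code would detach the thread state here.
                TRANSITIONS.with(|t| t.set(t.get() + 1));
            }
        });
    }
}

/// Current nesting depth on the calling thread.
pub fn depth() -> usize {
    DEPTH.with(|d| d.get())
}

/// Number of real attach + detach transitions on the calling thread.
pub fn transitions() -> usize {
    TRANSITIONS.with(|t| t.get())
}
```

An outer guard with a nested inner guard produces exactly one attach and one detach on that thread; each worker thread gets its own counter and must acquire its own guard.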

Impact

  • prek run / pre-commit cargo test --release hook cannot pass on this box, on any branch, until this is fixed or the test is quarantined behind a marker.
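If the test is quarantined, Rust's built-in marker is the `#[ignore]` attribute: a plain `cargo test --release` (and therefore the hook) reports the test as ignored instead of running it, while passing `-- --ignored` (or `-- --include-ignored`) to cargo test still runs it on demand. A minimal sketch follows; the function name is hypothetical and stands in for the real test in the http_cleanup_hang_repro target.

```rust
// Quarantine sketch: `cargo test --release` skips this test and reports it as
// ignored (recent toolchains also print the reason string). The function name
// is hypothetical; keep the real one from the http_cleanup_hang_repro target.
#[test]
#[ignore = "SIGABRT: _PyThreadState_Attach non-NULL old thread state (#339)"]
fn cleanup_hang_repro() {
    // ... existing test body, unchanged ...
}
```

To run it explicitly while debugging: cargo test --release --no-default-features --features cuda-13,rdma -p pegaflow-server --test http_cleanup_hang_repro -- --ignored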
