Commit c93b4ed

fix(python): restore deterministic teardown for async-callback machinery
PRs #2007/#2008 fixed the TM-PY-030 deadlocks and exit crash by making teardown hands-off (shutdown_background, no loop close), trading deterministic cleanup for forward progress. This commit restores determinism while the interpreter is alive and keeps hands-off behavior only where CPython makes determinism impossible (interpreter finalization).

Protocol:

- An atexit handler registered at module import sets INTERPRETER_AT_EXIT. atexit runs at the very start of Py_FinalizeEx, strictly before the phase in which native threads may no longer attach, so the flag cleanly separates 'interpreter alive' from 'process exiting'.
- Each private-loop callback runs as a published asyncio.Task. Teardown cancels it via call_soon_threadsafe(task.cancel), and a closing flag rejects queued-but-unstarted items, so joins are bounded by cooperative cancellation instead of the full callback duration.
- PyPrivateAsyncLoop::shutdown (Drop) joins its worker thread, which closes its asyncio loop before exiting, so fds are freed before drop returns.
- PyRuntime::drop joins the tokio blocking pool again (the pre-#2007 semantics) instead of calling shutdown_background.
- Every join runs through join_without_gil, which detaches first when PyGILState_Check reports that the dropping thread is attached. This removes the GIL deadlock rather than avoiding the join.
- Pyclass Drop impls (ScriptedTool/Bash/BashTool) cancel in-flight callbacks through an engine registry of live per-session loops before the rt field drop joins the pool.
- At exit, threads skip Python entirely (flag check, no Python::attach), the runtime falls back to shutdown_background, and the OS reclaims resources.

Verification:

- New tests/test_teardown_determinism.py: exact native-thread-count and fd-count stability across tool churn (joins are synchronous in drop), bounded cancellation of abandoned callbacks, and 10x subprocess interpreter-exit checks for both clean and abandoned-callback exits.
- Race-sensitive suites looped 20x, concurrent stress (8 threads x 40 mixed iterations incl. timeout+drop churn) looped 10x, langgraph example 40x: zero hangs, zero aborts.
- Full bashkit-python suite: 705 passed, 1 skipped. just pre-pr green.
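
A rough Python analogue of the private-loop teardown, assuming a single asyncio loop owned by one worker thread. `PrivateAsyncLoop`, `submit`, `shutdown` and `closed` here are hypothetical names, not the PyPrivateAsyncLoop API. Callbacks run as published tasks, a closing flag rejects work that was queued but never started, cancellation goes through call_soon_threadsafe(task.cancel), and the join returns only after the worker has closed the loop.

```python
import asyncio
import concurrent.futures
import threading
from typing import Any, Callable, Coroutine, Set


class PrivateAsyncLoop:
    """Hypothetical per-session loop: one worker thread owns one asyncio loop."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._lock = threading.Lock()
        self._closing = False
        self._tasks: Set[asyncio.Task] = set()
        # daemon=True so a loop that is never shut down cannot block exit.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        try:
            self._loop.run_forever()
            # Cancel anything still pending and let it unwind cooperatively.
            pending = asyncio.all_tasks(self._loop)
            for task in pending:
                task.cancel()
            if pending:
                self._loop.run_until_complete(
                    asyncio.gather(*pending, return_exceptions=True))
        finally:
            self._loop.close()  # fds released before the worker thread exits

    def submit(self, make_coro: Callable[[], Coroutine[Any, Any, Any]]
               ) -> concurrent.futures.Future:
        result: concurrent.futures.Future = concurrent.futures.Future()

        def start() -> None:  # runs on the loop thread
            with self._lock:
                if self._closing:  # queued before shutdown, never started
                    result.cancel()
                    return
                task = self._loop.create_task(make_coro())
                self._tasks.add(task)  # publish so shutdown() can cancel it

            def finish(t: asyncio.Task) -> None:
                with self._lock:
                    self._tasks.discard(t)
                if result.done():
                    return
                if t.cancelled():
                    result.cancel()
                elif t.exception() is not None:
                    result.set_exception(t.exception())
                else:
                    result.set_result(t.result())

            task.add_done_callback(finish)

        self._loop.call_soon_threadsafe(start)
        return result

    def shutdown(self) -> None:
        with self._lock:
            first_call = not self._closing
            self._closing = True
            tasks = list(self._tasks)
        if first_call:
            for task in tasks:
                self._loop.call_soon_threadsafe(task.cancel)
            self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()  # bounded by cancellation, not callback duration

    @property
    def closed(self) -> bool:
        return self._loop.is_closed() and not self._thread.is_alive()
```

Setting the closing flag and snapshotting the task set under one lock keeps a callback from slipping in between the two steps; anything still pending when the loop stops is cancelled and drained before close().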
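
The ordering constraint in the Drop impls (cancel in-flight callbacks first, then join the pool) can be illustrated with standard-library stand-ins. Everything here is hypothetical: `SessionLoop` models a per-session loop whose in-flight work unblocks on cancellation, `EngineRegistry` tracks live loops in a WeakSet, and a ThreadPoolExecutor plays the part of the tokio blocking pool.

```python
import threading
import weakref
from concurrent.futures import Future, ThreadPoolExecutor


class SessionLoop:
    """Hypothetical per-session loop: in-flight work waits until cancelled."""

    def __init__(self) -> None:
        self._cancelled = threading.Event()

    def wait_until_cancelled(self, timeout: float) -> bool:
        # Stands in for a callback that only finishes early when cancelled.
        return self._cancelled.wait(timeout)

    def cancel_in_flight(self) -> None:
        self._cancelled.set()


class EngineRegistry:
    """Tracks live session loops without keeping them alive."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._live: "weakref.WeakSet[SessionLoop]" = weakref.WeakSet()

    def register(self, loop: SessionLoop) -> None:
        with self._lock:
            self._live.add(loop)

    def cancel_all(self) -> int:
        with self._lock:
            loops = list(self._live)
        for loop in loops:
            loop.cancel_in_flight()
        return len(loops)


class Tool:
    """Hypothetical tool object whose close() mirrors the Drop ordering."""

    def __init__(self) -> None:
        self.registry = EngineRegistry()
        self.pool = ThreadPoolExecutor(max_workers=2)  # stand-in for the blocking pool

    def run_callback(self, timeout: float = 30.0) -> "Future[bool]":
        loop = SessionLoop()
        self.registry.register(loop)
        # The queued work item holds a bound method, keeping `loop` alive
        # (and registered) until the job finishes.
        return self.pool.submit(loop.wait_until_cancelled, timeout)

    def close(self) -> None:
        self.registry.cancel_all()     # 1. cancel in-flight callbacks first
        self.pool.shutdown(wait=True)  # 2. then join the pool deterministically
```

Swapping the two lines in close() would make shutdown(wait=True) block for up to the full 30 s callback timeout, which is the unbounded join the registry exists to avoid.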
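
The repeated interpreter-exit checks can be driven by a small subprocess harness. This sketch assumes that "exits cleanly" means return code 0 within a timeout; `exits_cleanly` is a hypothetical helper, and the script it runs would be whichever clean or abandoned-callback scenario is under test.

```python
import subprocess
import sys


def exits_cleanly(script: str, runs: int = 10, timeout_s: float = 30.0) -> bool:
    """Run `script` in fresh interpreters; fail on hang, abort, or non-zero exit."""
    for _ in range(runs):
        try:
            proc = subprocess.run(
                [sys.executable, "-c", script],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # a hang at exit shows up as a timeout
        if proc.returncode != 0:
            return False  # on POSIX, negative codes mean death by signal
    return True
```

A hang surfaces as TimeoutExpired and an abort as a negative return code (the negated signal number), so both failure modes the commit guards against are caught.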
1 parent 7a03f7c commit c93b4ed

5 files changed

Lines changed: 498 additions & 74 deletions
