Commit c93b4ed
fix(python): restore deterministic teardown for async-callback machinery
PRs #2007/#2008 fixed the TM-PY-030 deadlocks and exit crash by making
teardown hands-off (shutdown_background, no loop close), trading
deterministic cleanup for forward progress. This restores determinism
while the interpreter is alive and keeps hands-off behavior only where
CPython makes determinism impossible (interpreter finalization).
Protocol:
- An atexit handler registered at module import sets INTERPRETER_AT_EXIT.
atexit handlers run early in Py_FinalizeEx, strictly before the
phase in which native threads may no longer attach, so the flag cleanly
separates 'interpreter alive' from 'process exiting'.
- Each private-loop callback runs as a published asyncio.Task. Teardown
cancels it via call_soon_threadsafe(task.cancel) and a closing flag
rejects queued-but-unstarted items, so joins are bounded by cooperative
cancellation instead of full callback duration.
- PyPrivateAsyncLoop::shutdown (Drop) joins its worker thread, which
closes its asyncio loop before exiting, so fds are freed before drop
returns.
- PyRuntime::drop joins the tokio blocking pool again (the pre-#2007
semantics) instead of shutdown_background.
- Every join runs through join_without_gil: detach first when
PyGILState_Check reports the dropping thread attached. This removes the
GIL deadlock rather than avoiding the join.
- Pyclass Drop impls (ScriptedTool/Bash/BashTool) cancel in-flight
callbacks through an engine registry of live per-session loops before
the rt field drop joins the pool.
- At exit: threads skip Python entirely (flag check, no Python::attach),
runtime falls back to shutdown_background, OS reclaims resources.
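
For illustration only, the flag, cancellation, and join steps of the protocol above can be sketched in pure Python. This is not the Rust/PyO3 code from this commit: PrivateLoop, _interpreter_at_exit, and the outcomes list are hypothetical names, and the GIL detach (join_without_gil) has no equivalent here because Thread.join already releases the GIL while it waits.

```python
import asyncio
import atexit
import threading

# Hypothetical analogue of INTERPRETER_AT_EXIT: set once atexit handlers run.
_interpreter_at_exit = threading.Event()
atexit.register(_interpreter_at_exit.set)


class PrivateLoop:
    """Hypothetical stand-in for a private asyncio loop owned by one worker thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._closing = False
        self._tasks = set()   # "published" tasks that teardown can cancel
        self.outcomes = []    # appended on the loop thread only
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self._loop)
        try:
            self._loop.run_forever()
        finally:
            # The worker closes its own loop, so fds are freed before join() returns.
            self._loop.close()

    def submit(self, coro_fn):
        """Queue coro_fn() as a task; returns False once teardown has begun."""
        if self._closing:
            return False

        def start():
            if self._closing:  # queued but not yet started: reject
                self.outcomes.append("rejected")
                return
            task = self._loop.create_task(coro_fn())
            self._tasks.add(task)
            task.add_done_callback(self._finished)

        self._loop.call_soon_threadsafe(start)
        return True

    def _finished(self, task):
        self._tasks.discard(task)
        if task.cancelled():
            self.outcomes.append("cancelled")
        elif task.exception() is not None:
            self.outcomes.append("failed")
        else:
            self.outcomes.append("finished")

    def _cancel_all_then_stop(self):
        pending = [t for t in self._tasks if not t.done()]
        if not pending:
            self._loop.stop()
            return
        for t in pending:
            t.cancel()
        # Stop only after every task has processed its cancellation.
        waiter = asyncio.gather(*pending, return_exceptions=True)
        waiter.add_done_callback(lambda _f: self._loop.stop())

    def shutdown(self, timeout=5.0):
        """Deterministic teardown while the interpreter is alive; hands-off at exit."""
        if _interpreter_at_exit.is_set():
            return False  # at exit: no cancel, no join; the OS reclaims resources
        if not self._closing:
            self._closing = True
            self._loop.call_soon_threadsafe(self._cancel_all_then_stop)
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

In this sketch the join is bounded by how quickly the callback honors cancellation, not by how long it would otherwise run; a callback that swallows CancelledError would only be bounded by the timeout.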
Verification:
- New tests/test_teardown_determinism.py: exact native-thread-count and
fd-count stability across tool churn (joins are synchronous in drop),
bounded cancellation of abandoned callbacks, and 10x subprocess
interpreter-exit checks for both clean and abandoned-callback exits.
- Race-sensitive suites looped 20x, concurrent stress (8 threads x 40
mixed iterations incl. timeout+drop churn) looped 10x, langgraph
example 40x: zero hangs, zero aborts.
- Full bashkit-python suite: 705 passed, 1 skipped. just pre-pr green.
1 parent 7a03f7c commit c93b4ed
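
As a rough sketch of the thread-count and fd-count stability check listed under Verification (not the contents of tests/test_teardown_determinism.py): the helper names below are hypothetical, the fd count assumes a /proc/self/fd or /dev/fd directory, and threading.active_count() only sees Python-level threads, so a real check on Rust-side threads would need a native count instead.

```python
import os
import threading


def open_fd_count():
    """Count this process's open fds; assumes /proc/self/fd or /dev/fd exists."""
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            return len(os.listdir(path))
    raise RuntimeError("no fd directory available on this platform")


def assert_stable_across_churn(make_and_drop, rounds=20):
    """Fail if any create/drop cycle leaves behind a thread or an fd."""
    make_and_drop()  # warm-up, so lazily created resources are not counted as leaks
    threads0 = threading.active_count()
    fds0 = open_fd_count()
    for _ in range(rounds):
        make_and_drop()
        # Exact equality is only meaningful if teardown is synchronous.
        assert threading.active_count() == threads0
        assert open_fd_count() == fds0


def churn_once():
    """Hypothetical stand-in for creating and dropping a tool: one thread, one pipe."""
    r, w = os.pipe()
    t = threading.Thread(target=os.read, args=(r, 1))
    t.start()
    os.write(w, b"x")
    t.join()      # synchronous join, mirroring "joins are synchronous in drop"
    os.close(r)
    os.close(w)
```

Calling assert_stable_across_churn(churn_once) passes because each cycle joins its thread and closes both pipe ends before returning; a cycle that left either behind would trip the exact-equality assertions.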
5 files changed
Lines changed: 498 additions & 74 deletions
File tree
- crates/bashkit-python
- src
- tests
- specs