Add opt-in cancel_futures_on_exit to ThreadSensitiveContext #560
Open
junodak wants to merge 3 commits into
added 3 commits
May 21, 2026 14:21
Default ThreadSensitiveContext.__aexit__ calls executor.shutdown() with the implicit wait=True. When sync middleware in the context calls back into the event loop via SyncToAsync, the request-scoped pool ends up waiting on a future the loop itself must produce: a textbook cross-call deadlock that shows up in production as never-ending requests and stuck health probes (see django#545, django#495, django#458).

This change adds an opt-in escape hatch that switches the shutdown call to shutdown(wait=False, cancel_futures=True): pending futures are cancelled, the loop is freed, and running workers finish in the background.

Default behaviour is preserved; users can opt in three ways:

1. constructor argument:
       async with ThreadSensitiveContext(cancel_futures_on_exit=True): ...
2. subclass attribute (handy when ThreadSensitiveContext is instantiated by framework code you cannot easily reach):
       class MyTSCtx(ThreadSensitiveContext):
           cancel_futures_on_exit = True
3. environment variable for process-wide opt-in:
       ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT=1

Tests cover the historic graceful-drain default plus all three opt-in paths.
- Extract _cancel_futures_on_exit_default_from_env() so tests can exercise env parsing without a subprocess (asgiref has no existing subprocess test convention).
- Replace bool | None with Optional[bool] to keep Python 3.9 compatibility.
- Document that the env var is read once at import time; runtime changes require the constructor argument or a subclass attribute.
- Add parametrized env-parser test and explicit-False-override test.
- Add CHANGELOG.txt Unreleased entry.
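The env-parsing helper described in the commit above could look roughly like the sketch below. The function name, the accepted truthy spellings, and the MODULE_DEFAULT constant are assumptions made for illustration; only the variable name and the value 1 come from this PR. Taking the environment as an optional mapping is what lets a test exercise the parser without a subprocess.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT"

# Assumption: the accepted spellings are illustrative; the PR only shows "1".
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def cancel_futures_default_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True when the opt-in variable holds a truthy value.

    Hypothetical helper: passing a plain dict makes the parser testable
    in-process, while the default reads the real environment.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() in _TRUTHY


# Evaluated once at import time, so later changes to os.environ are not seen.
MODULE_DEFAULT = cancel_futures_default_from_env()
```

Because MODULE_DEFAULT is computed at import, changing the variable at runtime has no effect in this sketch, which matches the documented limitation.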
Problem
ThreadSensitiveContext.__aexit__ calls executor.shutdown() with the implicit wait=True. When sync middleware inside the context invokes SyncToAsync to call back into the event loop, the request-scoped pool ends up waiting on a future the loop itself must produce:
- executor.shutdown(), waiting for worker threads to drain
- current_thread_executor.run_until_future, waiting on a future the loop must run
Neither side can make progress. This is the failure mode reported in #545, #495, and #458.
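The shape of this mutual wait can be reproduced with a plain ThreadPoolExecutor, independent of asgiref. The sketch below is a simplified stand-in, not asgiref's actual code path: blocked_shutdown and loop_future are hypothetical names, and the worker's timeout exists only so the demo terminates. Without that bound, both sides would wait forever.

```python
import concurrent.futures
import time


def blocked_shutdown(worker_timeout: float = 0.3):
    """Return (worker outcome, seconds spent inside shutdown)."""
    # Stands in for work that only the thread calling shutdown() could complete.
    loop_future: concurrent.futures.Future = concurrent.futures.Future()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def worker() -> str:
        try:
            # Real code would wait with no timeout; the bound keeps the demo finite.
            return loop_future.result(timeout=worker_timeout)
        except concurrent.futures.TimeoutError:
            return "gave up waiting"

    pending = executor.submit(worker)
    started = time.monotonic()
    executor.shutdown()  # implicit wait=True: blocks until worker() returns
    elapsed = time.monotonic() - started
    # Only reachable after shutdown() returns, which is too late for the worker.
    loop_future.set_result("done")
    return pending.result(), elapsed
```

The caller spends roughly the whole worker timeout inside shutdown(), and the worker never sees the result it was waiting for.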
Proposal
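Add an opt-in cancel_futures_on_exit flag that switches the shutdown call to executor.shutdown(wait=False, cancel_futures=True). The default is unchanged.
Three ways to opt in:
1. Constructor argument: ThreadSensitiveContext(cancel_futures_on_exit=True)
2. Subclass attribute: cancel_futures_on_exit = True on a ThreadSensitiveContext subclass
3. Environment variable: ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT=1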
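One way the opt-in surfaces could resolve into a single flag is sketched below. SketchContext, AlwaysCancel, and ENV_DEFAULT are hypothetical stand-ins, not the code in this PR, and the precedence shown (constructor argument, then subclass attribute, then process-wide default) is an assumption. The cancel_futures parameter of Executor.shutdown requires Python 3.9 or later.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Hypothetical stand-in for the process-wide default read from the environment.
ENV_DEFAULT = False


class SketchContext:
    """Hypothetical stand-in; not asgiref's ThreadSensitiveContext."""

    # None means "no subclass override; fall back to the process-wide default".
    cancel_futures_on_exit: Optional[bool] = None

    def __init__(self, cancel_futures_on_exit: Optional[bool] = None) -> None:
        if cancel_futures_on_exit is not None:
            # Constructor argument wins, including an explicit False.
            self.cancel_futures_on_exit = cancel_futures_on_exit
        elif self.cancel_futures_on_exit is None:
            self.cancel_futures_on_exit = ENV_DEFAULT
        self.executor = ThreadPoolExecutor(max_workers=1)

    async def __aenter__(self) -> "SketchContext":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        if self.cancel_futures_on_exit:
            # Opt-in path: cancel queued work and return without joining workers.
            self.executor.shutdown(wait=False, cancel_futures=True)
        else:
            # Historic default: graceful drain with the implicit wait=True.
            self.executor.shutdown()


class AlwaysCancel(SketchContext):
    # Subclass opt-in, for when framework code instantiates the context.
    cancel_futures_on_exit = True
```

Under these assumptions, AlwaysCancel(cancel_futures_on_exit=False) still drains gracefully, because the explicit constructor value overrides the subclass attribute.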
Trade-off
With the flag on, pending futures are cancelled and shutdown returns immediately. Running worker threads cannot be interrupted mid-syscall in Python, so they finish in the background; their results are discarded because the context is already gone, and any resources they own are released on thread exit. Graceful drain is traded for liveness.
Tests
tests/test_sync.py covers the historic default plus each of the three opt-in surfaces. The rest of the suite is unaffected.
Production observation
In our deployment (Django ASGI + Daphne + ninja API, sync middleware chain underneath an async view) this deadlock manifests as the pod becoming unresponsive: no /internal/health response, no API requests served. Kubernetes' liveness probe eventually SIGKILLs the container, so there is no permanent outage, but every cycle costs the configured failure threshold (5 minutes in our setup) of full service unavailability. Over an 8-day window we observed 19/19 pod restarts following this exact pattern, on average every 5.4 hours.
After applying the same change as a monkey-patch in our service, the cycle stopped under identical load. Filing this PR so users in similar situations have an upstream-blessed way out instead of patching asgiref in place.
References