Add opt-in cancel_futures_on_exit to ThreadSensitiveContext by junodak · Pull Request #560 · django/asgiref

junodak · 2026-05-21T06:21:24Z

Problem

ThreadSensitiveContext.__aexit__ calls executor.shutdown() with the implicit wait=True. When sync middleware inside the context invokes SyncToAsync to call back into the event loop, the request-scoped pool ends up waiting on a future the loop itself must produce:

the main loop blocks on executor.shutdown(), waiting for worker threads to drain
the worker threads block on current_thread_executor.run_until_future, waiting on a future the loop must run

Neither side can make progress. This is the failure mode reported in #545, #495, and #458.
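The circular wait described above can be imitated with the standard library alone. The following is a minimal sketch, not asgiref's code: `loop_future` and `demo_circular_wait` are hypothetical names, the calling thread stands in for the event loop, and a timeout is added so the example terminates instead of hanging.

```python
import concurrent.futures as cf


def demo_circular_wait(timeout=0.2):
    # Stands in for work that only the "loop" (the calling thread) can complete.
    loop_future = cf.Future()
    executor = cf.ThreadPoolExecutor(max_workers=1)

    def worker():
        # The worker blocks on a result the caller has not produced yet.
        try:
            return loop_future.result(timeout=timeout)
        except cf.TimeoutError:
            return "worker timed out waiting on the loop"

    task = executor.submit(worker)

    # The "loop" blocks here until the worker returns. Without the timeout
    # above, neither side would ever make progress.
    executor.shutdown(wait=True)

    # Only reached after the worker gave up, so this result arrives too late.
    loop_future.set_result("too late")
    return task.result()


print(demo_circular_wait())
```

In the real failure there is no timeout on the worker side, which is why the wait never ends.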

Proposal

Add an opt-in cancel_futures_on_exit flag that switches the shutdown call to executor.shutdown(wait=False, cancel_futures=True). The default is unchanged.

Three ways to opt in:

constructor argument

```
async with ThreadSensitiveContext(cancel_futures_on_exit=True):
    ...
```

subclass attribute (useful when the context is instantiated by framework code you cannot easily reach)
```
class MyTSCtx(ThreadSensitiveContext):
    cancel_futures_on_exit = True
```

environment variable for process-wide opt-in

```
ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT=1
```
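One way the three opt-in surfaces could fit together is sketched below. This is an illustration under assumptions, not the code in this PR: `SketchContext`, `env_default`, and `shutdown_kwargs` are hypothetical names, the set of accepted truthy strings is a guess, and the precedence (constructor argument over subclass attribute over environment default) is assumed rather than taken from the diff.

```python
import os
from typing import Optional

ENV_VAR = "ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT"


def env_default(environ=None) -> bool:
    # Assumption: these strings count as "enabled"; the real parser may differ.
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "").strip().lower() in {"1", "true", "yes", "on"}


class SketchContext:
    """Hypothetical stand-in; NOT asgiref's ThreadSensitiveContext."""

    # Evaluated once, when the class body runs at import time.
    cancel_futures_on_exit: bool = env_default()

    def __init__(self, cancel_futures_on_exit: Optional[bool] = None):
        # None means "not specified": fall back to the class attribute.
        # An explicit True or False overrides both subclass and env defaults.
        if cancel_futures_on_exit is not None:
            self.cancel_futures_on_exit = cancel_futures_on_exit

    def shutdown_kwargs(self) -> dict:
        # Keyword arguments that would be passed to executor.shutdown().
        if self.cancel_futures_on_exit:
            return {"wait": False, "cancel_futures": True}
        return {"wait": True}


class AlwaysCancel(SketchContext):
    # Subclass attribute opt-in.
    cancel_futures_on_exit = True


print(AlwaysCancel().shutdown_kwargs())
print(AlwaysCancel(cancel_futures_on_exit=False).shutdown_kwargs())
```

Using `Optional[bool]` with a `None` default lets an explicit `False` be told apart from "not specified", so a caller can opt out even when a subclass or the environment opts in.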

Trade-off

With the flag on, pending futures are cancelled and shutdown returns immediately. Running worker threads cannot be interrupted mid-syscall in Python, so they finish in the background; their results are discarded because the context is already gone, and any resources they own are released on thread exit. Graceful drain is traded for liveness.
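The trade-off can be observed directly on a plain `ThreadPoolExecutor` (the `cancel_futures` argument requires Python 3.9 or later). This is a small sketch with hypothetical names (`demo_cancel_on_shutdown`, `running_job`), using a single worker so that one job is running and one is still queued when shutdown is called.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_cancel_on_shutdown():
    started = threading.Event()
    release = threading.Event()
    executor = ThreadPoolExecutor(max_workers=1)

    def running_job():
        started.set()
        release.wait(timeout=5)  # simulate work that is already in flight
        return "finished in background"

    running = executor.submit(running_job)
    started.wait(timeout=5)  # make sure the only worker is occupied
    pending = executor.submit(lambda: "never runs")  # stays in the queue

    # Returns immediately: queued work is cancelled, running work is not.
    executor.shutdown(wait=False, cancel_futures=True)
    pending_cancelled = pending.cancelled()
    still_running = not running.done()

    release.set()  # let the in-flight job complete after shutdown returned
    return pending_cancelled, still_running, running.result(timeout=5)


print(demo_cancel_on_shutdown())
```

The queued job is cancelled, while the job that had already started keeps running and completes after `shutdown()` has returned.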

Tests

tests/test_sync.py covers the historic default plus each of the three opt-in surfaces. The rest of the suite is unaffected.

Production observation

In our deployment (Django ASGI + Daphne + ninja API, sync middleware chain underneath an async view) this deadlock manifests as the pod becoming unresponsive: no /internal/health response, no API requests served. Kubernetes' liveness probe eventually SIGKILLs the container, so there is no permanent outage, but every cycle costs the configured failure threshold (5 minutes in our setup) of full service unavailability. Over an 8-day window we observed 19/19 pod restarts following this exact pattern, on average every 5.4 hours.

After applying the same change as a monkey-patch in our service, the cycle stopped under identical load. Filing this PR so users in similar situations have an upstream-blessed way out instead of patching asgiref in place.

References

#545 Nested AsyncToSync and SyncToAsync results in blocking code
#495 sync_to_async detects deadlock in case of issue in loop.run_in_executor
#458 Issue in sync.py's SyncToAsync class as new ThreadPoolExecutor executors with daemon threads getting created for requests (ThreadPoolExecutor accumulation in SyncToAsync)

Default ThreadSensitiveContext.__aexit__ calls executor.shutdown() with the implicit wait=True. When sync middleware in the context calls back into the event loop via SyncToAsync, the request-scoped pool ends up waiting on a future the loop itself must produce: a textbook cross-call deadlock that shows up in production as never-ending requests and stuck health probes (see django#545, django#495, django#458).

This change adds an opt-in escape hatch that switches the shutdown call to shutdown(wait=False, cancel_futures=True): pending futures are cancelled, the loop is freed, running workers finish in the background.

Default behaviour is preserved; users can opt in three ways:

1. constructor argument: async with ThreadSensitiveContext(cancel_futures_on_exit=True): ...
2. subclass attribute (handy when ThreadSensitiveContext is instantiated by framework code you cannot easily reach): class MyTSCtx(ThreadSensitiveContext): cancel_futures_on_exit = True
3. environment variable for process-wide opt-in: ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT=1

Tests cover the historic graceful-drain default plus all three opt-in paths.

- Extract _cancel_futures_on_exit_default_from_env() so tests can exercise env parsing without a subprocess (asgiref has no existing subprocess test convention).
- Replace bool | None with Optional[bool] to keep Python 3.9 compatibility.
- Document that the env var is read once at import time; runtime changes require the constructor argument or a subclass attribute.
- Add parametrized env-parser test, explicit-False-override test.
- Add CHANGELOG.txt Unreleased entry.

junhokim added 3 commits May 21, 2026 14:21

Apply black/isort formatting

d51c925

junodak marked this pull request as ready for review May 21, 2026 09:34

junodak wants to merge 3 commits into django:main from junodak:feature/threadsensitive-cancel-on-exit-opt-in
