Add opt-in cancel_futures_on_exit to ThreadSensitiveContext #560

Open
junodak wants to merge 3 commits into django:main from junodak:feature/threadsensitive-cancel-on-exit-opt-in



@junodak junodak commented May 21, 2026

Problem

ThreadSensitiveContext.__aexit__ calls executor.shutdown() with the implicit wait=True. When sync middleware inside the context invokes SyncToAsync to call back into the event loop, the request-scoped pool ends up waiting on a future the loop itself must produce:

  • the main loop blocks on executor.shutdown(), waiting for worker threads to drain
  • the worker threads block on current_thread_executor.run_until_future, waiting on a future the loop must run

Neither side can make progress. This is the failure mode reported in #545, #495, and #458.
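The shape of the cycle can be reproduced without asgiref. The sketch below is a stand-in built only on `asyncio` and `concurrent.futures`, not asgiref's actual code path: a coroutine blocks the loop thread in `executor.shutdown(wait=True)` while the worker waits for a result only the loop can produce. The helper name `demo_blocked_loop` and the timeout are illustrative assumptions; the timeout exists purely so the demonstration terminates instead of hanging.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def demo_blocked_loop(timeout=0.5):
    """Return what the worker observed while the loop thread was blocked."""
    outcome = {}

    async def main():
        loop = asyncio.get_running_loop()
        executor = ThreadPoolExecutor(max_workers=1)

        async def callback():
            return "done"

        def worker():
            # The worker asks the loop to run a coroutine and waits for it.
            fut = asyncio.run_coroutine_threadsafe(callback(), loop)
            try:
                outcome["result"] = fut.result(timeout=timeout)
            except FutureTimeout:
                # The loop never got a chance to run the coroutine.
                outcome["result"] = "timed out"

        executor.submit(worker)
        # Blocking call made on the loop thread: the loop cannot service
        # the worker's request until the worker has already given up.
        executor.shutdown(wait=True)

    asyncio.run(main())
    return outcome["result"]
```

Without the timeout on `fut.result()`, neither thread would ever return, which is the situation described above.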

Proposal

Add an opt-in cancel_futures_on_exit flag that switches the shutdown call to executor.shutdown(wait=False, cancel_futures=True). The default is unchanged.

Three ways to opt in:

  • constructor argument
    async with ThreadSensitiveContext(cancel_futures_on_exit=True):
        ...
  • subclass attribute (useful when the context is instantiated by framework code you cannot easily reach)
    class MyTSCtx(ThreadSensitiveContext):
        cancel_futures_on_exit = True
  • environment variable for process-wide opt-in
    ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT=1
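One plausible way the three surfaces could resolve against each other is sketched below. This is a hypothetical stand-in, not the code in this PR: `SketchContext`, `env_default`, `shutdown_kwargs`, the set of truthy spellings, and the precedence (constructor argument over class attribute over environment default) are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT"


def env_default(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: these spellings count as "on"; the PR only shows "1".
    return environ.get(ENV_VAR, "").strip().lower() in {"1", "true", "yes", "on"}


class SketchContext:
    """Hypothetical stand-in for ThreadSensitiveContext (illustration only)."""

    # Evaluated once, when the class body runs (i.e. at import time).
    cancel_futures_on_exit: bool = env_default()

    def __init__(self, cancel_futures_on_exit: Optional[bool] = None):
        # None means "not specified": fall back to the class attribute.
        if cancel_futures_on_exit is not None:
            self.cancel_futures_on_exit = cancel_futures_on_exit

    def shutdown_kwargs(self) -> dict:
        # Arguments that would be passed to executor.shutdown() on exit.
        if self.cancel_futures_on_exit:
            return {"wait": False, "cancel_futures": True}
        return {"wait": True}


class OptedIn(SketchContext):
    # Subclass-level opt-in, independent of the environment.
    cancel_futures_on_exit = True
```

Under these assumptions, `OptedIn().shutdown_kwargs()` selects the non-blocking shutdown, while `OptedIn(cancel_futures_on_exit=False)` falls back to the graceful drain because an explicit `False` overrides the subclass attribute.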
    

Trade-off

With the flag on, pending futures are cancelled and shutdown returns immediately. Python has no way to forcibly interrupt a running worker thread, so work already in progress finishes in the background; its result is discarded because the context is already gone, and any resources the thread owns are released when it exits. Graceful drain is traded for liveness.
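The behaviour of `shutdown(wait=False, cancel_futures=True)` can be observed on a plain `ThreadPoolExecutor` (the `cancel_futures` argument requires Python 3.9+). The following is a minimal sketch using only the standard library; `demo_shutdown` is an illustrative name, and a single-worker pool is assumed so that the second task is guaranteed to stay queued.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_shutdown():
    started = threading.Event()
    release = threading.Event()
    executor = ThreadPoolExecutor(max_workers=1)

    def slow():
        started.set()      # signal that this task is now running
        release.wait()     # stay busy until the caller lets us go
        return "finished"

    running = executor.submit(slow)
    queued = executor.submit(lambda: "never runs")

    started.wait()  # make sure `slow` has left the queue and is executing

    # Returns immediately: queued work is cancelled, running work is not.
    executor.shutdown(wait=False, cancel_futures=True)
    queued_cancelled = queued.cancelled()

    release.set()  # the running task completes in the background
    return queued_cancelled, running.result(timeout=5)
```

The queued future ends up cancelled, while the task that was already running still completes once it is released; nothing interrupts it.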

Tests

tests/test_sync.py covers the historic default plus each of the three opt-in surfaces. The rest of the suite is unaffected.

Production observation

In our deployment (Django ASGI + Daphne + ninja API, sync middleware chain underneath an async view) this deadlock manifests as the pod becoming unresponsive: no /internal/health response, no API requests served. Kubernetes' liveness probe eventually SIGKILLs the container, so there is no permanent outage, but every cycle costs the configured failure threshold (5 minutes in our setup) of full service unavailability. Over an 8-day window we observed 19/19 pod restarts following this exact pattern, on average every 5.4 hours.

After applying the same change as a monkey-patch in our service, the cycle stopped under identical load. Filing this PR so users in similar situations have an upstream-blessed way out instead of patching asgiref in place.

junhokim added 3 commits May 21, 2026 14:21
By default, ThreadSensitiveContext.__aexit__ calls executor.shutdown() with
the implicit wait=True. When sync middleware in the context calls back into
the event loop via SyncToAsync, the request-scoped pool ends up waiting on a
future the loop itself must produce: a textbook cross-call deadlock that
shows up in production as never-ending requests and stuck health probes
(see django#545, django#495, django#458).

This change adds an opt-in escape hatch that switches the shutdown call to
shutdown(wait=False, cancel_futures=True): pending futures are cancelled,
the loop is freed, and running workers finish in the background. Default
behaviour is preserved; users can opt in three ways:

  1. constructor argument:
     async with ThreadSensitiveContext(cancel_futures_on_exit=True): ...
  2. subclass attribute (handy when ThreadSensitiveContext is instantiated
     by framework code you cannot easily reach):
     class MyTSCtx(ThreadSensitiveContext): cancel_futures_on_exit = True
  3. environment variable for process-wide opt-in:
     ASGIREF_CANCEL_FUTURES_ON_THREADSENSITIVE_EXIT=1

Tests cover the historic graceful-drain default plus all three opt-in
paths.

- Extract _cancel_futures_on_exit_default_from_env() so tests can exercise
  env parsing without a subprocess (asgiref has no existing subprocess test
  convention).
- Replace bool | None with Optional[bool] to keep Python 3.9 compatibility.
- Document that the env var is read once at import time; runtime changes
  require the constructor argument or a subclass attribute.
- Add parametrized env-parser test, explicit-False-override test.
- Add CHANGELOG.txt Unreleased entry.
@junodak junodak marked this pull request as ready for review May 21, 2026 09:34