fix: run blocking requests download in thread pool to unfreeze parallel downloads #982

Open
assafw wants to merge 1 commit into nathom:dev from assafw:dev

assafw commented Jun 7, 2026

Problem

fast_async_download called the synchronous requests library directly on the asyncio event loop. Every concurrent download froze whenever one of them was establishing a connection or reading data, because requests.get() and each iter_content() read block the calling thread, and here that thread is the one running the event loop.

The # noqa: ASYNC100/ASYNC101 suppressions in the original code indicate this was a known lint warning that was silenced rather than addressed.

The await asyncio.sleep(0) call yielded to the event loop after roughly every 1 MB written, but between those yields the loop stayed blocked for the full duration of every read.

Root Cause

# Before — blocks the entire event loop on every call
async def fast_async_download(path, url, headers, callback):
    with requests.get(url, stream=True) as resp:   # blocks event loop
        for chunk in resp.iter_content(...):
            ...
            await asyncio.sleep(0)  # only yields every ~1MB, too late
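
The stall can be reproduced without any network access. The sketch below is an illustration, not streamrip code: time.sleep stands in for a blocking requests read, and largest_stall, on_loop, in_thread and compare are hypothetical names. It measures the longest pause in a heartbeat task while a worker runs directly on the loop and, for comparison, while the same call runs in the default executor.

```python
import asyncio
import time


async def largest_stall(run_worker, delay=0.3):
    """Return the longest gap (seconds) between heartbeat ticks while the worker runs."""
    ticks = []

    async def heartbeat():
        # Records a timestamp roughly every 10 ms as long as the loop is free.
        while True:
            ticks.append(time.monotonic())
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)   # let the heartbeat start ticking
    await run_worker(delay)
    await asyncio.sleep(0.05)   # record a few ticks after the worker ends
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


async def on_loop(delay):
    # Stand-in for a blocking read made on the event loop thread.
    time.sleep(delay)


async def in_thread(delay):
    # Same blocking call, handed to the loop's default thread pool.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, time.sleep, delay)


def compare():
    blocked = asyncio.run(largest_stall(on_loop))
    threaded = asyncio.run(largest_stall(in_thread))
    return blocked, threaded


if __name__ == "__main__":
    blocked, threaded = compare()
    print(f"longest stall on loop:   {blocked:.3f}s")
    print(f"longest stall in thread: {threaded:.3f}s")
```

The first figure should be at least the sleep duration, since no heartbeat tick can run while the loop thread is asleep. The second depends on scheduling but should stay near the heartbeat interval.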

Fix

Split into a plain blocking function (_do_download) and offload it to the default thread pool via run_in_executor. This keeps the event loop free so all concurrent downloads proceed without blocking each other.

The large 131 KB (2**17 bytes) chunk size from the original implementation is preserved. It was a deliberate fix for a separate issue: aiohttp + aiofiles yielded to the event loop every 1 KB, which made downloads CPU-bound and capped total throughput at ~10 MB/s.

Progress callbacks are dispatched back to the event loop via call_soon_threadsafe instead of being called from the worker thread. This keeps updates to rich's Live display thread-safe (it must only be updated from the main thread).

# After: blocking work runs in a worker thread, event loop stays free
import asyncio

import requests

def _do_download(path, url, headers, callback, loop):
    chunk_size = 2**17  # 131 KB, preserves the original throughput fix
    with open(path, "wb") as file:
        with requests.get(url, headers=headers, allow_redirects=True, stream=True) as resp:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                file.write(chunk)
                # Run the progress callback on the event loop thread, not this worker thread.
                loop.call_soon_threadsafe(callback, len(chunk))

async def fast_async_download(path, url, headers, callback):
    # get_running_loop() is the preferred way to fetch the loop from inside a coroutine.
    loop = asyncio.get_running_loop()
    # None selects the loop's default executor.
    await loop.run_in_executor(None, _do_download, path, url, headers, callback, loop)
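
The callback dispatch can be checked in isolation. The following is a minimal sketch, assuming nothing about streamrip beyond the pattern above: produce_chunks and run_demo are hypothetical names, and a list of integers stands in for downloaded chunk sizes. It records which thread each progress callback actually runs on.

```python
import asyncio
import threading


def produce_chunks(sizes, callback, loop):
    """Runs in a worker thread and hands each chunk size to the event loop thread."""
    for size in sizes:
        loop.call_soon_threadsafe(callback, size)


async def run_demo(sizes):
    loop = asyncio.get_running_loop()
    seen = []  # (size, thread id) pairs recorded by the callback

    def on_progress(size):
        seen.append((size, threading.get_ident()))

    await loop.run_in_executor(None, produce_chunks, sizes, on_progress, loop)
    await asyncio.sleep(0)  # defensive: give any still-queued callbacks a turn
    return seen, threading.get_ident()


if __name__ == "__main__":
    seen, loop_thread = asyncio.run(run_demo([3, 5, 7]))
    for size, thread_id in seen:
        print(size, "on loop thread" if thread_id == loop_thread else "on worker thread")
```

Every callback is recorded with the thread id of the coroutine that started the work, so a progress display updated from on_progress is never touched by a worker thread.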

What's preserved

  • requests + large chunks for throughput (not reverted to aiohttp)
  • Semaphore-based concurrency limits (max_connections / concurrency config)
  • Rate limiter (requests_per_minute config) — applied at the API layer before download starts
  • All 59 existing passing tests continue to pass
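
How a semaphore limit composes with run_in_executor can be sketched as follows. This is not streamrip's code: PeakCounter, fake_download and download_all are hypothetical names, time.sleep stands in for the blocking download, and the limit argument plays the role of a max_connections-style setting. Because each job holds the semaphore while its blocking work sits in the thread pool, the number of simultaneous worker threads never exceeds the limit.

```python
import asyncio
import threading
import time


class PeakCounter:
    """Thread-safe counter that remembers the highest number of concurrent users."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc_info):
        with self._lock:
            self.active -= 1


def fake_download(counter, seconds):
    # Stand-in for the blocking download body; runs in a worker thread.
    with counter:
        time.sleep(seconds)


async def download_all(n_jobs, limit, seconds=0.02):
    semaphore = asyncio.Semaphore(limit)
    counter = PeakCounter()
    loop = asyncio.get_running_loop()

    async def one_job():
        # The semaphore is held for the whole time the job occupies a thread.
        async with semaphore:
            await loop.run_in_executor(None, fake_download, counter, seconds)

    await asyncio.gather(*(one_job() for _ in range(n_jobs)))
    return counter.peak


if __name__ == "__main__":
    print("peak concurrent downloads:", asyncio.run(download_all(8, 3)))
```

One design point to keep in mind: the default executor has its own cap on worker threads, so a semaphore limit larger than that cap would not add parallelism.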

…el downloads

fast_async_download was using the synchronous requests library directly on the
event loop, blocking all concurrent downloads whenever a connection was active.
Offload to run_in_executor so the event loop stays free, and dispatch progress
callbacks via call_soon_threadsafe to keep rich's Live display thread-safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>