perf: Improve download throughput with 128KB buffer and SpooledTempFile caching for tar.bz2 downloads #2248

pm-ju wants to merge 3 commits into conda:main

Conversation
It looks like there is no benefit at high concurrency? Or did I read the benchmarks incorrectly?
Yeah, you read the benchmarks right! The gap between streaming and spooling definitely shrinks as we crank up the concurrency. The reason for this is that the benchmark uses a local test server: at 32 concurrent threads, the local machine just gets completely CPU-bound trying to run 32 bzip2 decompressions at once, so the download side is never the bottleneck. In a real-world scenario though (downloading over an actual network), the network dominates and the CPU has headroom, so the spooling approach should keep its advantage.
So I see a couple of downsides to this approach. One is that this now first downloads the entire file and only then starts decoding; for large files this means that downloading and extracting are no longer concurrent. The other is that it's unclear from the benchmark what the difference is between scenario 1 and 4. The code seems to be the same?
Ah, actually you were right.

To fix that and actually get parallel download+extract back, while still having a buffer for slow decompression, we're now spawning two concurrent tasks connected by a 5MB pipe.

If decompression starts lagging, the pipe fills up to 5MB and naturally back-pressures the download task so we don't blow up memory. If the network drops, extraction just waits. I ran the benchmarks again locally against the local test server.

What do you think of the new benchmarks?
Description
This PR improves rattler's download throughput by removing a constraint that serialized download and extraction, and by increasing the default read buffer sizes.
Previously, `tar.bz2` downloads were constrained by single-threaded inline decompression, meaning the request stream would throttle whenever CPU-bound bzip2 decompression could not keep up. Furthermore, many parts of the pipeline used the `tokio` default `8KB` buffer instead of a more efficient chunk size.

This PR:

- Introduces a `SpooledTempFile` caching strategy for `extract_tar_bz2` over HTTP URLs, decoupling the download from the CPU extraction bottleneck by buffering.
- Uses `std::cmp::max` to apply a `128KB` buffer (via `DEFAULT_BUF_SIZE`) consistently across HTTP range reads in `sparse.rs`, standard local decoding in `reqwest/tokio.rs`, and overall byte streaming in `full_download.rs`.

Fixes #1007
How Has This Been Tested?
Included new tests via

`cargo bench --bench extraction_benchmark --package rattler_package_streaming --features reqwest --no-fail-fast`

using a `test_server` to verify extraction metrics. Tests ran locally via the Rust axum internal server.

Benchmark Results
AI Disclosure
Tools: Gemini, Claude
Checklist: