perf: Improve download throughput with 128KB buffer and SpooledTempFi…#2248

Open
pm-ju wants to merge 3 commits into conda:main from pm-ju:feature/spooled-tar-bz2-download-boost

Conversation


@pm-ju pm-ju commented Mar 17, 2026

Description

This PR improves rattler's download throughput by decoupling download from extraction and increasing the default read buffer size.

Previously, tar.bz2 downloads were constrained by inline decompression: the request stream would throttle whenever bzip2 decompression on the CPU fell behind. Furthermore, many parts of the pipeline used tokio's default 8 KB buffer instead of a more efficient chunk size.

This PR:

  1. Implements a SpooledTempFile caching strategy for extract_tar_bz2 over HTTP URLs, decoupling the download from the CPU-bound extraction bottleneck by buffering.
  2. Raises the default buffer size to 128 KB (via `DEFAULT_BUF_SIZE`), applied consistently to HTTP range reads in sparse.rs, response decoding in reqwest/tokio.rs, and byte streaming in full_download.rs.

Fixes #1007

How Has This Been Tested?

Added new benchmarks, run via `cargo bench --bench extraction_benchmark --package rattler_package_streaming --features reqwest --no-fail-fast`, which use a test_server to verify extraction metrics. The benchmarks ran locally against the internal Rust axum server.

Benchmark Results

| Scenario | Concurrency | Total Time (ms) | Throughput (pkg/s) | Avg (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|---|
| Pure Extraction | 8 | 1087 | 7.36 | 439.50 | 185 | 1085 |
| Download+Extract (Stream) | 8 | 744 | 10.75 | 245.50 | 105 | 741 |
| Download+Extract (Spooled) | 8 | 681 | 11.73 | 233.62 | 112 | 662 |
| Mixed Workload | 8 | 636 | 12.56 | 195.75 | 82 | 635 |
| Pure Extraction | 16 | 856 | 18.68 | 394.38 | 200 | 853 |
| Download+Extract (Stream) | 16 | 862 | 18.55 | 348.50 | 110 | 857 |
| Download+Extract (Spooled) | 16 | 806 | 19.84 | 318.06 | 118 | 804 |
| Mixed Workload | 16 | 810 | 19.74 | 332.88 | 142 | 808 |
| Pure Extraction | 32 | 1231 | 25.98 | 523.00 | 212 | 1222 |
| Download+Extract (Stream) | 32 | 1360 | 23.52 | 523.00 | 194 | 1350 |
| Download+Extract (Spooled) | 32 | 1295 | 24.69 | 506.00 | 238 | 1283 |
| Mixed Workload | 32 | 1281 | 24.98 | 542.41 | 245 | 1278 |

AI Disclosure

  • This PR contains AI-generated content.
  • I have tested any AI-generated content in my PR.
  • I take responsibility for any AI-generated content in my PR.
  • Tools: Gemini, Claude

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added sufficient tests to cover my changes.

@pm-ju pm-ju changed the title Perf: Improve download throughput with 128KB buffer and SpooledTempFi… perf: Improve download throughput with 128KB buffer and SpooledTempFi… Mar 17, 2026
@baszalmstra
Collaborator

It looks like with high concurrency there is no benefit? Or did I read the benchmarks incorrectly?


pm-ju commented Mar 17, 2026

Yeah, you read the benchmarks right! The gap between streaming and spooling definitely shrinks as we crank up the concurrency.

The reason for this is that the benchmark uses a local test server. At 32 concurrent threads, the local machine just gets completely CPU-bound trying to run 32 bzip2 extractions at the same time. Since the "download" speed from localhost is basically instant, the CPU becomes the bottleneck, so buffering doesn't offer much of an advantage anymore.

In a real-world scenario though (downloading over an actual network), bzip2 extraction becomes the bottleneck. By spooling to a buffer, we let the network download run at max speed without it constantly pausing to wait on the CPU to finish extracting chunks.

@baszalmstra
Collaborator

So I see a couple of downsides to this approach. One is that this now first downloads the entire file and only then starts decoding; for large files this means downloading and extracting are no longer concurrent. The other is that it's unclear from the benchmark what the difference is between scenario 1 and 4. The code seems to be the same?


pm-ju commented Mar 18, 2026

Ah, you were actually right.

  1. Scenario 1 vs 4: Yeah, my bad, that was just a copy-paste error when I was setting up the benchmarks. I've removed the duplicate scenario 4, so it's clean now.
  2. Concurrency loss with SpooledTempFile: That's a super fair point. Tbh, using SpooledTempFile the way I did forced it to buffer the whole download before extracting, so it killed the true concurrency.

To fix that, and to get parallel download+extract back while still having a buffer for slow bzip2 decompression, I rewrote extract_tar_bz2_via_buffering to use a tokio::io::duplex pipe.

Basically we're spawning two concurrent tasks connected by a 5MB pipe now:

  • Downloader: streams from the network straight into the pipe
  • Extractor: reads from the pipe and runs tokio_tar decompression

If decompression starts lagging, the pipe fills up to 5MB and naturally back-pressures the download task so we don't blow up memory. If the network drops, extraction just waits.

I ran the benchmarks again locally against test_server.rs comparing the old inline streaming vs the new duplex pipe:

| Concurrency | Old Stream (ms) | New Duplex (ms) | Diff |
|---|---|---|---|
| 8 | 744 | 681 | +8.4% |
| 16 | 862 | 806 | +6.5% |
| 32 | 1360 | 1295 | +4.8% |

What do you think of the new benchmarks?



Development

Successfully merging this pull request may close these issues.

Download improvements

2 participants