perf: Don't sort GDB index symbols#2087

Merged
davidlattimore merged 1 commit into main from push-tokxrrrpvkms on Jun 17, 2026

@davidlattimore

It looks to me like it's not actually necessary to sort the symbols. I checked the output of lld and it didn't have sorted symbols. @lapla-cogito, I assume that sorting was only done to make the output deterministic? This PR removes the sorting, but keeps the output deterministic anyway. Note, we remove some uses of par_bridge, since the output order of par_bridge is non-deterministic.
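As a rough illustration of how output can stay deterministic without a sort, here is a minimal sketch using only the standard library. It is not wild's code: `scan_object` and `scan_all_in_input_order` are hypothetical names, and wild itself uses rayon rather than `std::thread`. The idea is that work runs in parallel, but results are concatenated in input order rather than in completion order (the latter is what an unordered bridge such as `par_bridge` gives you).

```rust
use std::thread;

/// Hypothetical stand-in for scanning one object file for index symbols.
fn scan_object(object_id: u32) -> Vec<String> {
    (0..3).map(|i| format!("obj{object_id}_sym{i}")).collect()
}

/// Scans objects on up to `num_threads` threads, then concatenates the
/// per-chunk results in input order. The output therefore does not depend
/// on which thread finishes first, and no sort is needed.
fn scan_all_in_input_order(object_ids: &[u32], num_threads: usize) -> Vec<String> {
    // Chunk size must be non-zero, even for empty input.
    let chunk_size = object_ids.len().div_ceil(num_threads.max(1)).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, keeping the handles in chunk order.
        let handles: Vec<_> = object_ids
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .flat_map(|&id| scan_object(id))
                        .collect::<Vec<String>>()
                })
            })
            .collect();
        // Join in chunk order, not completion order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("scan thread panicked"))
            .collect()
    })
}
```

Calling `scan_all_in_input_order(&ids, 4)` and `scan_all_in_input_order(&ids, 1)` on the same `ids` yields the same vector, which is the property that makes the sort unnecessary for reproducibility.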

This gives about a 10% speedup to a --gdb-index benchmark. It also unlocks further parallelism speedups that I'll send in later PRs.

Benchmark 1 (89 runs): /home/david/save/mold-gdb-index/run-with env-rand /home/d/wild-builds/2026-06-17.cg1.b --gdb-index
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           337ms ± 4.14ms     329ms …  348ms          0 ( 0%)        0%
  peak_rss           7.53MB ±  127KB    7.27MB … 7.82MB          0 ( 0%)        0%
  cpu_cycles         9.35G  ±  166M     8.49G  … 9.60G           3 ( 3%)        0%
  instructions       11.4G  ± 25.0M     11.3G  … 11.4G           3 ( 3%)        0%
  cache_references    307M  ± 2.52M      304M  …  319M           8 ( 9%)        0%
  cache_misses       94.7M  ± 1.06M     88.7M  … 96.1M           4 ( 4%)        0%
  branch_misses      25.3M  ± 95.0K     25.2M  … 25.8M           1 ( 1%)        0%
Benchmark 2 (100 runs): /home/david/save/mold-gdb-index/run-with env-rand target/cg1/wild --gdb-index
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           301ms ± 4.25ms     293ms …  319ms          1 ( 1%)        ⚡- 10.8% ±  0.4%
  peak_rss           7.52MB ±  124KB    7.22MB … 7.82MB          4 ( 4%)          -  0.1% ±  0.5%
  cpu_cycles         9.08G  ±  142M     8.44G  … 9.28G           7 ( 7%)        ⚡-  2.9% ±  0.5%
  instructions       11.0G  ± 22.0M     10.9G  … 11.1G           1 ( 1%)        ⚡-  3.1% ±  0.1%
  cache_references    286M  ± 2.23M      281M  …  293M           1 ( 1%)        ⚡-  7.0% ±  0.2%
  cache_misses       88.8M  ±  752K     85.8M  … 90.1M           4 ( 4%)        ⚡-  6.2% ±  0.3%
  branch_misses      23.0M  ± 76.7K     22.8M  … 23.3M          10 (10%)        ⚡-  9.0% ±  0.1%

@lapla-cogito self-requested a review June 17, 2026 21:51

@lapla-cogito left a comment

The reason I was sorting the symbols was to ensure build reproducibility (similar to what mold does).

Comment thread libwild/src/gdb_index.rs
  let all_scans = per_object.into_iter().flatten().collect_vec();

- let num_buckets = rayon::current_num_threads();
+ let num_buckets = 32;

@lapla-cogito

Ah, I see. By fixing the number of buckets rather than deriving it from the thread count, the bucket assignment for each symbol becomes deterministic, which preserves reproducibility. This is likely preferable in most scenarios: it may add some overhead or leave some threads underutilized on certain systems, but those drawbacks are probably acceptable.

@davidlattimore

Yep, exactly. Even with substantially more buckets than we need, this PR still shows a performance win: a 4.9% improvement with 2 threads and 3.1% with 1 thread. If systems with 64 or 128 cores become common, we might want to increase this further, but I suspect they're not all that common yet.
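The bucket discussion in this thread can be sketched as follows. This is only an illustration under stated assumptions: `fnv1a`, `bucket_for`, and `partition` are hypothetical helpers, and FNV-1a is a stand-in for whatever hash wild actually uses. Only the constant 32 comes from the diff. Because the bucket index is `hash % num_buckets`, a bucket count tied to `rayon::current_num_threads()` would place the same symbol in different buckets on machines with different core counts, whereas a fixed count gives the same layout everywhere.

```rust
/// Fixed bucket count, matching the value chosen in the diff.
const NUM_BUCKETS: usize = 32;

/// 64-bit FNV-1a hash. Chosen here only because it is simple and has no
/// per-process random seed; it is an assumption, not wild's actual hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Maps a symbol name to a bucket. The result depends only on the name and
/// on `num_buckets`, never on how many threads happen to be available.
fn bucket_for(symbol: &str, num_buckets: usize) -> usize {
    (fnv1a(symbol.as_bytes()) % num_buckets as u64) as usize
}

/// Distributes symbols into buckets, preserving input order inside each
/// bucket so that no sorting is required afterwards.
fn partition<'a>(symbols: &[&'a str], num_buckets: usize) -> Vec<Vec<&'a str>> {
    let mut buckets = vec![Vec::new(); num_buckets];
    for &symbol in symbols {
        buckets[bucket_for(symbol, num_buckets)].push(symbol);
    }
    buckets
}
```

Each bucket can then be handed to a separate worker. With fewer than 32 threads some workers process several buckets, which is the small overhead mentioned above.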

@davidlattimore merged commit 0db969c into main Jun 17, 2026
22 checks passed
@davidlattimore deleted the push-tokxrrrpvkms branch June 17, 2026 23:18