perf: Don't sort GDB index symbols#2087
Conversation
lapla-cogito
left a comment
There was a problem hiding this comment.
The reason I was sorting the symbols was to ensure build reproducibility (similar to what mold does).
| let all_scans = per_object.into_iter().flatten().collect_vec(); | ||
|
|
||
| let num_buckets = rayon::current_num_threads(); | ||
| let num_buckets = 32; |
There was a problem hiding this comment.
Ah, I see. By fixing the number of threads, the bucket assignment for each symbol becomes deterministic, which preserves reproducibility. Indeed, this is likely preferable in many scenarios. While it may introduce some overhead or leave certain threads underutilized on some systems, those drawbacks are probably within an acceptable range.
There was a problem hiding this comment.
Yep, exactly. Even with substantially more buckets that we need, this PR still shows a performance win. With 2 threads, we see 4.9% improvement, with 1 thread, 3.1%. I guess if systems with 64 or 128 cores become common, then we might want to increase this further, but I suspect that's not currently all that common.
It looks to me like it's not actually necessary to sort the symbols. I checked the output of lld and it didn't have sorted symbols. @lapla-cogito, I assume that sorting was only done to make the output deterministic? This PR removes the sorting, but keeps the output deterministic anyway. Note, we remove some uses of
par_bridge, since the output order ofpar_bridgeis non-deterministic.This gives about a 10% speedup to a
--gdb-indexbenchmark. It also unlocks further parallelism speedups that I'll send in later PRs.