perf: stabilize SymbolSlab inlining + reduce encode allocations#213
Merged
cberner merged 2 commits intocberner:masterfrom Mar 15, 2026
Merged
Conversation
Closed
ecb3133 to
cecd2d0
Compare
Promote get(), get_mut(), and get_pair_mut() from #[inline] to #[inline(always)]. These are the PI solver's hottest accessors (called millions of times during decode). Under lto=fat + codegen-units=1, the soft #[inline] hint leaves LLVM free to outline them when nearby code changes, causing up to 24% performance swings in unrelated hot paths.
Contributor
Author
|
@codex review |
cecd2d0 to
985e073
Compare
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
cberner
approved these changes
Mar 12, 2026
Owner
cberner
left a comment
There was a problem hiding this comment.
lgtm, but I had a couple minor comments
| let S = num_ldpc_symbols(source_block.len() as u32); | ||
| let H = num_hdpc_symbols(source_block.len() as u32); | ||
|
|
||
| debug_assert_eq!(source_block.symbol_size(), symbol_size); |
Owner
There was a problem hiding this comment.
Should this be an assert!() rather than debug_assert!()?
Comment on lines
+107
to
+110
| debug_assert!( | ||
| self.mapping.is_none(), | ||
| "as_bytes called with active mapping" | ||
| ); |
Owner
There was a problem hiding this comment.
It seems like this should be assert!(). It'd be a pretty serious bug if the mapping is Some, ya?
54a5920 to
d11dd86
Compare
Contributor
Author
|
Moved asserts to runtime, not just debug builds. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
get,get_mut,get_pair_mut) use#[inline](soft hint). Underlto = "fat"+codegen-units = 1, LLVM outlines them when the impl block grows. Adding a single dead-code method to SymbolSlab causes a ~24% roundtrip regression from binary layout shifts alone.SourceBlockEncoderstores source symbols as individualSymbolobjects in aVec<Symbol>, each owning a separateVec<u8>. Building aSourceBlockEncodercopies every source byte twice: once into theVec<Symbol>, then again into the intermediateSymbolSlab.How
Commit 1: stabilize SymbolSlab accessor inlining under LTO
get(),get_mut(), andget_pair_mut()from#[inline]to#[inline(always)].physical_index()was already#[inline(always)].Commit 2: reduce encode source symbol allocations
SymbolSlab::from_bytes()to construct a slab directly from a contiguous byte slice with padding.create_symbols()to return aSymbolSlabinstead ofVec<Symbol>, eliminating the per-symbol heap allocation.create_d(), bulk-copy source bytes into the intermediate slab viacopy_block_from()instead of iterating per-symbol.Benchmarks (Zen4, EPYC 9654P, symbol_size=1280)
Back-to-back runs, cooldowns between switches. Criterion uses 100 samples per metric.
codec_benchmark(criterion, reliable)Comparison is this branch (both commits) vs the inline-only baseline, measured back-to-back:
* Criterion's cached value used for comparison (reported as -17.6% change). Inline-only and master encode times were comparable (~13 us).
End-to-end vs master (separate back-to-back run):
decode_benchmark(5% overhead, single-run harness)decode_benchmark(0% overhead, single-run harness)Tests
cargo clippy --all --all-targets -- -Dwarnings— cleancargo test --all— 60 passed, 4 ignoredcargo build --features benchmarking,serde_support— cleancargo test --features benchmarking— cleanCommit history
985e073perf: stabilize SymbolSlab accessor inlining under LTO (src/symbol_slab.rs)54a5920perf: reduce encode source symbol allocations (src/encoder.rs,src/symbol_slab.rs)Notes
#[inline(always)]on the accessors, addingfrom_bytes()to SymbolSlab triggers the codegen fragility described above, causing a ~24% roundtrip regression from layout shifts.