
perf: stabilize SymbolSlab inlining + reduce encode allocations#213

Merged
cberner merged 2 commits into cberner:master from virtuallynathan:perf/exp-stack-encode-decode-foundation on Mar 15, 2026

Conversation

@virtuallynathan (Contributor) commented Mar 9, 2026

Why

  1. Codegen fragility: The SymbolSlab accessor methods (get, get_mut, get_pair_mut) use #[inline] (soft hint). Under lto = "fat" + codegen-units = 1, LLVM outlines them when the impl block grows. Adding a single dead-code method to SymbolSlab causes a ~24% roundtrip regression from binary layout shifts alone.
  2. Redundant copies in encode: SourceBlockEncoder stores source symbols as individual Symbol objects in a Vec<Symbol>, each owning a separate Vec<u8>. Building a SourceBlockEncoder copies every source byte twice: once into the Vec<Symbol>, then again into the intermediate SymbolSlab.

How

Commit 1: stabilize SymbolSlab accessor inlining under LTO

  • Promote get(), get_mut(), and get_pair_mut() from #[inline] to #[inline(always)].
  • These are the PI solver's hottest accessors. physical_index() was already #[inline(always)].
  • 1 file, 3 lines changed.
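As a rough sketch of the change (field names and layout here are illustrative, not the crate's actual `src/symbol_slab.rs` code), the fix is purely the attribute promotion on the hot accessors:

```rust
// Minimal sketch of a SymbolSlab with the promoted inlining attributes.
// The struct layout is a guess for illustration; only the attribute
// change (#[inline] -> #[inline(always)]) mirrors the actual commit.
pub struct SymbolSlab {
    data: Vec<u8>,
    symbol_size: usize,
}

impl SymbolSlab {
    pub fn new(num_symbols: usize, symbol_size: usize) -> Self {
        SymbolSlab {
            data: vec![0; num_symbols * symbol_size],
            symbol_size,
        }
    }

    // Was #[inline]: a soft hint LLVM may ignore once the impl block grows.
    // #[inline(always)] pins the decision so unrelated edits can't shift codegen.
    #[inline(always)]
    pub fn get(&self, i: usize) -> &[u8] {
        &self.data[i * self.symbol_size..(i + 1) * self.symbol_size]
    }

    #[inline(always)]
    pub fn get_mut(&mut self, i: usize) -> &mut [u8] {
        let s = self.symbol_size;
        &mut self.data[i * s..(i + 1) * s]
    }

    // Returns two disjoint mutable symbols; split_at_mut satisfies the
    // borrow checker without unsafe code.
    #[inline(always)]
    pub fn get_pair_mut(&mut self, a: usize, b: usize) -> (&mut [u8], &mut [u8]) {
        assert!(a < b);
        let s = self.symbol_size;
        let (left, right) = self.data.split_at_mut(b * s);
        (&mut left[a * s..(a + 1) * s], &mut right[..s])
    }
}
```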

Commit 2: reduce encode source symbol allocations

  • Add SymbolSlab::from_bytes() to construct a slab directly from a contiguous byte slice with padding.
  • Change create_symbols() to return a SymbolSlab instead of Vec<Symbol>, eliminating the per-symbol heap allocation.
  • In create_d(), bulk-copy source bytes into the intermediate slab via copy_block_from() instead of iterating per-symbol.
  • 2 files, +53/-37 lines.
  • Public API unchanged.
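The shape of `from_bytes()` can be sketched as follows (a hypothetical reconstruction, not the PR's actual code): one allocation and one bulk copy replace the per-symbol `Vec<u8>` allocations, with the final partial symbol zero-padded to `symbol_size`:

```rust
// Hypothetical sketch of SymbolSlab::from_bytes(). The real implementation
// in src/symbol_slab.rs may differ; this only illustrates the
// one-allocation, one-copy construction described above.
pub struct SymbolSlab {
    data: Vec<u8>,
    symbol_size: usize,
}

impl SymbolSlab {
    /// Build a slab directly from a contiguous source block, zero-padding
    /// the tail so every symbol is exactly `symbol_size` bytes long.
    pub fn from_bytes(bytes: &[u8], symbol_size: usize) -> Self {
        let num_symbols = bytes.len().div_ceil(symbol_size);
        let mut data = vec![0u8; num_symbols * symbol_size];
        data[..bytes.len()].copy_from_slice(bytes); // single bulk copy
        SymbolSlab { data, symbol_size }
    }

    pub fn num_symbols(&self) -> usize {
        self.data.len() / self.symbol_size
    }

    pub fn get(&self, i: usize) -> &[u8] {
        &self.data[i * self.symbol_size..(i + 1) * self.symbol_size]
    }
}
```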

Benchmarks (Zen4, EPYC 9654P, symbol_size=1280)

Back-to-back runs with cooldowns between switches. Criterion uses 100 samples per metric.

codec_benchmark (criterion, reliable)

Comparison is this branch (both commits) vs the inline-only baseline, measured back-to-back:

| Metric | inline-only baseline | this branch | Delta |
|---|---|---|---|
| encode 10KB | 15.86 us* | 13.07 us | -17.6% |
| roundtrip 10KB | 14.42 us | 11.49 us | -20.3% |
| roundtrip repair 10KB | 48.41 us | 45.61 us | -5.8% |

* Criterion's cached value used for comparison (reported as -17.6% change). Inline-only and master encode times were comparable (~13 us).

End-to-end vs master (separate back-to-back run):

| Metric | master | this branch | Estimated total delta |
|---|---|---|---|
| encode 10KB | 13.46 us | 13.07 us | -3% |
| roundtrip 10KB | 14.64 us | 11.49 us | -21% |
| roundtrip repair 10KB | 48.93 us | 45.61 us | -7% |

decode_benchmark (5% overhead, single-run harness)

| K | master | this branch | Delta |
|---|---|---|---|
| 10 | 3,122 | 3,084 | -1.2% |
| 100 | 3,906 | 3,906 | 0.0% |
| 250 | 4,192 | 4,353 | +3.8% |
| 500 | 4,883 | 5,207 | +6.6% |
| 1,000 | 4,859 | 5,374 | +10.6% |
| 2,000 | 4,680 | 4,859 | +3.8% |
| 5,000 | 4,156 | 4,173 | +0.4% |
| 10,000 | 3,081 | 3,475 | +12.8% |
| 20,000 | 2,128 | 1,885 | -11.4% |
| 50,000 | 1,270 | 1,405 | +10.6% |

Commit 2 only touches encoder code. Decode variation is from the #[inline(always)] in commit 1 and single-run noise.

decode_benchmark (0% overhead, single-run harness)

| K | master | this branch | Delta |
|---|---|---|---|
| 10 | 3,303 | 2,959 | -10.4% |
| 100 | 4,110 | 4,029 | -2.0% |
| 250 | 3,693 | 4,353 | +17.9% |
| 500 | 4,066 | 4,217 | +3.7% |
| 1,000 | 3,876 | 4,197 | +8.3% |
| 2,000 | 3,983 | 4,250 | +6.7% |
| 5,000 | 3,299 | 3,742 | +13.4% |
| 10,000 | 2,933 | 3,277 | +11.7% |
| 20,000 | 2,128 | 1,961 | -7.8% |
| 50,000 | 1,372 | 1,449 | +5.6% |

Single-run harness. The 0% overhead path does not exercise the PI solver, so variation is noise.

Tests

  • cargo clippy --all --all-targets -- -Dwarnings — clean
  • cargo test --all — 60 passed, 4 ignored
  • cargo build --features benchmarking,serde_support — clean
  • cargo test --features benchmarking — clean

Commit history

  1. 985e073 perf: stabilize SymbolSlab accessor inlining under LTO (src/symbol_slab.rs)
  2. 54a5920 perf: reduce encode source symbol allocations (src/encoder.rs, src/symbol_slab.rs)

Notes

  • Commit 1 is prerequisite for commit 2. Without #[inline(always)] on the accessors, adding from_bytes() to SymbolSlab triggers the codegen fragility described above, causing a ~24% roundtrip regression from layout shifts.
  • The decode-side copy reduction from the original version of this PR was dropped — it showed marginal improvement indistinguishable from noise and added complexity.

@virtuallynathan changed the title from "perf: reduce encoder allocations and trim decoder output copies" to "perf: cut encoder allocation overhead and decoder output copies" on Mar 9, 2026
@virtuallynathan mentioned this pull request on Mar 9, 2026
@virtuallynathan marked this pull request as draft on March 9, 2026 04:41
@virtuallynathan force-pushed the perf/exp-stack-encode-decode-foundation branch from ecb3133 to cecd2d0 on March 9, 2026 05:30
@virtuallynathan changed the title from "perf: cut encoder allocation overhead and decoder output copies" to "perf: reduce encode allocations and stabilize SymbolSlab inlining" on Mar 9, 2026
Promote get(), get_mut(), and get_pair_mut() from #[inline] to
#[inline(always)]. These are the PI solver's hottest accessors (called
millions of times during decode). Under lto=fat + codegen-units=1,
the soft #[inline] hint leaves LLVM free to outline them when
nearby code changes, causing up to 24% performance swings in
unrelated hot paths.
@virtuallynathan (Contributor, Author) commented:

@codex review

@virtuallynathan force-pushed the perf/exp-stack-encode-decode-foundation branch from cecd2d0 to 985e073 on March 9, 2026 05:40
@virtuallynathan changed the title from "perf: reduce encode allocations and stabilize SymbolSlab inlining" to "perf: stabilize SymbolSlab accessor inlining under LTO" on Mar 9, 2026
@chatgpt-codex-connector commented:
Codex Review: Didn't find any major issues. Hooray!


@virtuallynathan marked this pull request as ready for review on March 9, 2026 05:44
@virtuallynathan changed the title from "perf: stabilize SymbolSlab accessor inlining under LTO" to "perf: stabilize SymbolSlab inlining + reduce encode allocations" on Mar 9, 2026
@cberner (Owner) left a comment:
lgtm, but I had a couple minor comments

Comment thread src/encoder.rs Outdated
```rust
let S = num_ldpc_symbols(source_block.len() as u32);
let H = num_hdpc_symbols(source_block.len() as u32);

debug_assert_eq!(source_block.symbol_size(), symbol_size);
```
@cberner (Owner):
Should this be an assert!() rather than debug_assert!()?

Comment thread src/symbol_slab.rs Outdated
Comment on lines +107 to +110
```rust
debug_assert!(
    self.mapping.is_none(),
    "as_bytes called with active mapping"
);
```
@cberner (Owner):
It seems like this should be assert!(). It'd be a pretty serious bug if the mapping is Some, ya?

@virtuallynathan force-pushed the perf/exp-stack-encode-decode-foundation branch from 54a5920 to d11dd86 on March 15, 2026 04:08
@virtuallynathan (Contributor, Author) commented:
Moved asserts to runtime, not just debug builds.
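The promotion the review asked for is mechanical; as a minimal illustration (a standalone sketch, not the crate's actual `as_bytes` code), swapping `debug_assert!` for `assert!` means the invariant is also enforced in release builds:

```rust
// Illustrative only: debug_assert! compiles away in release builds,
// while assert! always runs, so a Some(mapping) reaching this check
// fails loudly in production too rather than silently proceeding.
fn check_no_mapping(mapping: &Option<Vec<u8>>) {
    // Was: debug_assert!(mapping.is_none(), ...);
    assert!(mapping.is_none(), "as_bytes called with active mapping");
}
```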

@cberner merged commit df75b75 into cberner:master on Mar 15, 2026
2 checks passed