Commit 5880331
perf(fse): kill iterator overhead in build_decoding_table, write decode via set_len (#293)
* fix(fse): use spare_capacity_mut + MaybeUninit instead of premature set_len
Copilot flagged the prior P2 change as UB: `set_len(table_size)`
before the write loop made `self.decode` logically contain
`table_size` initialised entries, but they were uninitialised
memory. The error path (`nb > accuracy_log`) returned with that
broken state, and the outer `reset()`/Drop would have run on
uninit entries — unsound per the Vec contract.
Fix: write via `spare_capacity_mut()` (typed `&mut [MaybeUninit<E>]`)
and call `set_len(table_size)` only AFTER the loop completes.
Error path returns with `decode.len() == 0` (set by the preceding
`clear()`), so no uninitialised entry is observable. The hot-loop
codegen is unchanged — `MaybeUninit::write` lowers to the same
strided store sequence the raw pointer version was emitting; the
soundness is now machine-checkable.
`E: FseEntry` has no `Drop` impl, so abandoning the partially written
`MaybeUninit` slots on the error path neither leaks nor drops anything.
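The write-then-`set_len` ordering described above can be sketched as follows. This is a minimal illustration, not the crate's code: `Entry`, `fill_table`, and the `symbol & 7` bit-count rule are hypothetical stand-ins for the real `FseEntry` type and table-building logic.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the decode-table entry; `Copy` implies no `Drop`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entry {
    symbol: u8,
    num_bits: u8,
}

/// Fills `decode` with `table_size` entries built from `spread`.
/// On error, `decode` is left empty, so no uninitialised entry is observable.
fn fill_table(
    decode: &mut Vec<Entry>,
    spread: &[u8],
    table_size: usize,
    accuracy_log: u8,
) -> Result<(), String> {
    decode.clear(); // len == 0 from here until the final set_len
    decode.reserve(table_size); // guarantees capacity >= table_size

    // Typed view of the uninitialised tail; writing through it is safe code.
    let slots: &mut [MaybeUninit<Entry>] = &mut decode.spare_capacity_mut()[..table_size];

    for (slot, &symbol) in slots.iter_mut().zip(spread[..table_size].iter()) {
        // Placeholder rule (an assumption), not the real FSE bit-count computation.
        let num_bits = symbol & 7;
        if num_bits > accuracy_log {
            // Early return: len is still 0, the written slots are simply abandoned.
            return Err(format!("nb {num_bits} > accuracy_log {accuracy_log}"));
        }
        slot.write(Entry { symbol, num_bits });
    }

    // SAFETY: both zipped slices have exactly `table_size` elements, so the loop
    // initialised slots 0..table_size, and capacity >= table_size was reserved.
    unsafe { decode.set_len(table_size) };
    Ok(())
}
```

Calling `fill_table` with a symbol that fails the check returns `Err` with `decode.len() == 0`, which is the property the fix relies on.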
* perf(decode): drop #[cold] on do_offset_history_repcode for hot workloads (#294)
Issue #279 round-1 mispredict diagnosis attributed 15.42% of decoder
mispredicts to `do_offset_history_repcode`, with 27.80% landing on
the `pushq %rax` function entry — call/ret BTB pressure from the
never-inlined boundary. `#[inline(never)]` was dropped in earlier
rounds; `#[cold]` was kept to preserve out-of-line layout for
low-entropy blocks where the prior «drop both» variant regressed
+15.9% on L14.
The z000033 L-5 decode flamegraph surfaces this helper at 1.93%
self-time despite the `#[cold]` label — the cold-bias attribute
itself blocks LLVM from inlining even at hot call sites where the
call/ret + BTB cost dominates the body work. The body is small (RULES
lookup + 6 branchless cmov) and inlining duplicates a tight scalar
sequence into the seq decoder's per-sequence loop.
Drop `#[cold]` and let the inline cost-model see the full picture.
Cold callers don't pay anything they weren't paying before — their
total cost was already dominated by surrounding rare-path work, not
this helper.
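For readers unfamiliar with what a repcode helper does, here is a rough sketch of the Zstandard repeat-offset rule written as a small function with no `#[cold]` or `#[inline(never)]` attribute, so the compiler's cost model decides whether to inline it. `resolve_offset` and its signature are hypothetical; the actual `do_offset_history_repcode` uses a RULES lookup and branchless selects, which this sketch does not reproduce.

```rust
/// Resolves a Zstandard offset value against the three-entry repeat-offset
/// history, updates the history in place, and returns the match offset.
/// Precondition (assumed): `offset_value >= 1`.
fn resolve_offset(history: &mut [u32; 3], offset_value: u32, literal_length: u32) -> u32 {
    debug_assert!(offset_value >= 1);

    if offset_value > 3 {
        // New offset: push it to the front, shift the older entries down.
        let actual = offset_value - 3;
        history[2] = history[1];
        history[1] = history[0];
        history[0] = actual;
        return actual;
    }

    // Repcode: with a zero literal length the meaning shifts by one.
    let index = if literal_length == 0 { offset_value } else { offset_value - 1 };

    let actual = match index {
        0 => return history[0], // most recent offset reused, history unchanged
        1 => history[1],
        2 => history[2],
        // index 3: most recent offset minus one byte. Validation of corrupt
        // input (a zero result) is out of scope for this sketch.
        _ => history[0].wrapping_sub(1),
    };

    // index 1 swaps the first two entries; index 2 and 3 rotate all three.
    if index != 1 {
        history[2] = history[1];
    }
    history[1] = history[0];
    history[0] = actual;
    actual
}
```

The body is a handful of loads, compares, and stores, which is why a call/ret boundary around it can cost more than the work itself in a per-sequence loop.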
* fix(fse): index slice instead of take() before unsafe set_len
`spread.iter().take(table_size)` silently runs fewer iterations
if `spread.len() < table_size` — which can only happen if a future
refactor breaks the upstream `spread.resize(table_size, 0)`
invariant, but the failure mode is severe: the loop body would
leave the `[loop_count..table_size)` slots in `decode`'s spare
capacity uninitialised, and the post-loop `set_len(table_size)`
would then claim those uninitialised entries as initialised — UB.
Switch to `spread[..table_size].iter()`. A length mismatch now
panics with a clear bounds-check error BEFORE the unsafe set_len
runs, surfacing the broken invariant immediately instead of
turning it into silent memory unsafety.
1 parent 8065eb2 commit 5880331
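The difference between `take()` and slice indexing in the last fix can be shown in isolation. The two functions below are hypothetical helpers that only count iterations; they assume nothing about the real loop body.

```rust
/// Iterates with `take`: silently stops early when `spread` is too short.
fn count_with_take(spread: &[u8], table_size: usize) -> usize {
    spread.iter().take(table_size).count()
}

/// Iterates over an indexed slice: panics with a bounds-check error
/// when `spread.len() < table_size`, before any later unsafe code can run.
fn count_with_slice(spread: &[u8], table_size: usize) -> usize {
    spread[..table_size].iter().count()
}
```

With a 3-element `spread` and `table_size == 5`, `count_with_take` returns 3 without any signal, while `count_with_slice` panics at the slice expression. In the table builder, that panic happens before `set_len(table_size)`, so a broken invariant can never turn into uninitialised entries being claimed as initialised.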
2 files changed
Lines changed: 64 additions & 26 deletions