perf: borrowed in-place over-window scan for dfast + row + btlazy2 by polaz · Pull Request #422 · structured-world/structured-zstd

polaz · 2026-06-15T11:37:24Z

Summary

Extend the borrowed (no-copy) one-shot scan to over-window inputs across all in-place-capable backends: Dfast, Row, and btlazy2 (the binary-tree backend, L13-15). Previously over-window inputs fell back to the owned path, which copies the whole input into the history mirror (__memmove_avx_unaligned_erms); on large over-window streams that copy dominates L1 misses and the cycle gap vs C.

Each backend applies a per-position candidate cap, window_low = abs_ip - advertised_window (saturating), so an over-window in-place scan never emits an unresolvable offset. This mirrors upstream zstd's continuous-index + windowLow one-shot behaviour. The cap leaves output byte-identical on the owned and borrowed-in-window paths (there it collapses to the existing eviction floor or saturates to 0), so only the new over-window borrowed path changes behaviour.
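The cap can be sketched as a pure function. This is a minimal illustration, not the crate's code: `window_low` and `candidate_ok` are hypothetical names, and the formula is the one quoted later for the Row/HC candidate checks, `history_abs_start.max(abs_pos.saturating_sub(max_window_size))`, with `history_abs_start` assumed to be 0 in borrowed mode.

```rust
/// Hypothetical helper: lowest absolute position a match candidate may occupy
/// when matching at `abs_pos`. `history_abs_start` is the owned eviction floor
/// (assumed 0 in borrowed mode); `max_window_size` is the advertised window.
fn window_low(history_abs_start: usize, abs_pos: usize, max_window_size: usize) -> usize {
    // saturating_sub: while abs_pos is still inside the first window, the cap is 0.
    history_abs_start.max(abs_pos.saturating_sub(max_window_size))
}

/// A candidate is usable only if it lies in [low, abs_pos): not evicted,
/// not beyond the advertised window, and strictly behind the cursor.
fn candidate_ok(cand_pos: usize, abs_pos: usize, low: usize) -> bool {
    cand_pos >= low && cand_pos < abs_pos
}
```

When the owned floor already sits at or above `abs_pos - window`, `max()` returns the floor unchanged, which is why only the over-window borrowed case (floor 0, cursor past the first window) sees a different bound.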

btlazy2 (binary-tree) borrowed

Upstream zstd runs the BT matcher in place through a windowLow stop in the walk loop (ZSTD_insertBt1 zstd_opt.c:489, ZSTD_insertBtAndGetAllMatches zstd_opt.c:724), not through eviction. Our BT walk/insert already carry the same stop, so tree depth stays bounded regardless of eviction. The BT byte source (bt_insert_* + emit_optimal_plan) is now routed through live_history() (reborrow-then-raw-ptr), so the borrowed path reads the caller's input instead of the empty owned mirror; btlazy2 borrowed output is then byte-identical to owned.
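The reborrow-then-raw-ptr pattern exists because a slice obtained from `live_history(&self)` borrows the whole table, which blocks mutating the tree in the same scope. Below is a simplified, hypothetical sketch: `TableSketch`, its fields, and `insert_and_read` are illustrative, not the crate's types. The slice is turned into `(ptr, len)` before the tree is written, then rebuilt for the read. This is only sound because nothing resizes `history` or frees the borrowed buffer in between.

```rust
/// Hypothetical stand-in for the match table: bytes come either from an owned
/// history mirror or from a borrowed caller buffer stored as (ptr, len).
struct TableSketch {
    history: Vec<u8>,
    history_start: usize,
    borrowed_input: Option<(*const u8, usize)>,
    tree: Vec<u32>,
}

impl TableSketch {
    /// Borrowed-aware view of the live bytes (same role as `live_history()`).
    fn live_history(&self) -> &[u8] {
        match self.borrowed_input {
            // SAFETY (assumed): whoever set `borrowed_input` keeps the buffer alive.
            Some((ptr, len)) => unsafe { std::slice::from_raw_parts(ptr, len) },
            None => &self.history[self.history_start..],
        }
    }

    /// Mutates the tree, then reads the byte at `pos` from the live region.
    fn insert_and_read(&mut self, pos: usize) -> u8 {
        // Reborrow, then drop to a raw pointer so no borrow of `self` is held.
        let (ptr, len) = {
            let live = self.live_history();
            (live.as_ptr(), live.len())
        };
        // Writing the tree is allowed here because `live` is already out of scope.
        let slot = pos % self.tree.len();
        self.tree[slot] = pos as u32;
        debug_assert!(pos < len, "position outside the live region");
        // SAFETY: `history` was not resized and the borrowed buffer is still alive.
        let live = unsafe { std::slice::from_raw_parts(ptr, len) };
        live[pos]
    }
}
```

Reading `&history[history_start..]` directly in borrowed mode would see the empty mirror, which is the failure mode the commit message below describes.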

Pre-split cache-locality fix

The borrowed path matches in place on the caller's input, so the pre-split fingerprint (optimal_block_size → split_block_by_chunks) becomes the first touch of each 128 KiB region: a cache-cold, sampling-strided read with interleaved random writes into the events table. That latency-bound pattern costs ~3× an ERMS streaming read of the same bytes (measured: pre-split memset/histogram 1.1% owned → 15.3% borrowed, the source of an early +45% regression on btlazy2). The owned path never pays this because its history-mirror copy already warmed the bytes. A single bandwidth-bound sequential warm pass per pre-split window restores that warmth without the copy's write half, gated to exactly the conditions under which the splitter reads the block (pre-split level, full 128 KiB remaining, savings >= 3).
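A warm pass of this kind can be sketched as a sequential, one-byte-per-cache-line walk whose result is fed to `std::hint::black_box` so the loads are not optimised out (`black_box` is a best-effort hint). This is a hedged sketch: `warm_presplit_window` matches the helper name in the walkthrough, but its body, the `should_warm` gate, and the meaning of the `savings` argument are assumptions built from the conditions listed above.

```rust
use std::hint::black_box;

/// Pre-split window size quoted in the description (128 KiB).
const PRESPLIT_WINDOW: usize = 128 * 1024;
/// Assumed cache-line size (64 bytes on x86_64).
const CACHE_LINE: usize = 64;

/// Touch one byte per cache line, front to back, so the hardware prefetcher
/// can stream the window in before the strided fingerprint pass reads it.
fn warm_presplit_window(window: &[u8]) -> u8 {
    let acc = window.iter().step_by(CACHE_LINE).fold(0u8, |acc, &b| acc ^ b);
    black_box(acc)
}

/// Hypothetical gate: warm only when the splitter will actually read the
/// block (pre-split level, a full window remaining, savings >= 3).
fn should_warm(presplit_level: bool, remaining: usize, savings: u32) -> bool {
    presplit_level && remaining >= PRESPLIT_WINDOW && savings >= 3
}
```

The walk is read-only, which is the point: it reproduces the cache warmth of the owned path's memmove without its write half.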

Results (i9-9900K, `perf stat -e cycles`, over-window fixtures)

Dfast L3: 2.22× faster (19.46G → 8.75G cycles), now beats C (9.02G); L1-misses 3.38G → 0.84G.
Row L5 (greedy): 1.60× faster (20.65G → 12.90G); L9 (lazy): 1.14×; L1-misses −74% both.
btlazy2 L15: +45% regression eliminated. In-window 8 MiB: borrowed 4.80G vs owned 4.82G (parity, slightly ahead). Over-window 11.5 MiB: borrowed 111.6G vs owned 115.2G, ~3.1% fewer cycles. Ratio unchanged.

The win scales inversely with search cost (cheap-search backends gain most; the memmove is a smaller fraction of heavier parsers). For btlazy2 the BT search dominates, so the throughput edge is small, but borrowed also removes the 2× window history-mirror allocation.
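The headline ratios can be re-derived from the quoted cycle counts. The helper names below are illustrative. The sketch also shows that the btlazy2 over-window figure is a ~3.1% reduction in cycles, equivalent to roughly a 1.032× speedup.

```rust
/// Speedup as reported above: cycles before / cycles after.
fn speedup(cycles_before: f64, cycles_after: f64) -> f64 {
    cycles_before / cycles_after
}

/// Fraction of cycles (or misses) removed: (before - after) / before.
fn cycle_reduction(cycles_before: f64, cycles_after: f64) -> f64 {
    (cycles_before - cycles_after) / cycles_before
}
```

For example, Dfast L3 gives `speedup(19.46, 8.75)` ≈ 2.22, and its L1-miss drop `cycle_reduction(3.38, 0.84)` ≈ 0.75.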

The OPTIMAL parsers (btopt/btultra/btultra2, L16-22) stay on the owned path: their cost-based DP is sensitive to candidate quality and the borrowed continuous-index scan produced ratio-worse candidates that fell outside the ffi bound. Tracked separately.

Correctness

borrowed_oneshot_matches_owned_and_roundtrips covers L3 / L5 / L9 / L15 over-window (3 MiB) cases byte-identical to the owned (evicting) baseline + roundtrip.
periodic_stream_not_oversplit guards the btlazy2 borrowed byte-source against the BT over-split failure mode.
Full suite (839 tests, --features dict_builder) + cross_validation green on x86_64 (avx2) and aarch64.

Testing

cargo nextest run -p structured-zstd --features dict_builder — 839 passed.

Summary by CodeRabbit

New Features
- Added borrowed (zero-copy) input scanning for the Dfast, Row, and btlazy2 backends, enabling in-place processing without copying the input into the internal history mirror.
- Extended no-copy mode eligibility to additional compression levels beyond Fast variants.
Performance
- Added memory cache warming for pre-split window processing to optimize compression throughput.
Bug Fixes
- Strengthened window bounds validation for match candidates to ensure correctness with configurable window sizes.

The borrowed (no-copy) one-shot scan was gated to in-window inputs for Dfast; over-window inputs fell back to the owned path, which copies the whole input into the history mirror. On large over-window streams that memmove dominates: ~75% of L1-dcache misses and a 2.16x cycle gap vs C (C matches in place on src). Add a per-position window_low = abs_ip - advertised_window candidate bound to the borrowed dfast scan (owned eviction's history_abs_start already provides this; borrowed-in-window saturates to 0, so both stay byte-identical) and drop the input_len <= window_size gate. Over-window dfast now matches in place, mirroring C's continuous-index + windowLow one-shot behaviour and avoiding the input->mirror copy.

Extend the borrowed (no-copy) one-shot scan to the Row backend (greedy L5, lazy L9-12), mirroring the Dfast over-window fix: over-window inputs were copied into the owned history mirror (the input->mirror memmove that dominates L1 misses + the cycle gap vs C). Add borrowed_input/borrowed_block state + set_borrowed_window/stage_borrowed_block/start_matching_borrowed/skip_matching_borrowed to RowMatchGenerator; set_borrowed_window zeroes history_abs_start so positions stay absolute input offsets. The candidate lower bound becomes window_low = history_abs_start.max(abs_pos - max_window_size): owned picks history_abs_start (byte-identical), borrowed (history_abs_start = 0) picks the window cap so an over-window in-place scan never emits an unresolvable offset. live_history()/get_last_space()/current_block_range()/trailing-literals all read the borrowed input in place. borrowed_eligible gates on the resolved backend (Row), not the strategy tag, so HashChain/BT (no borrowed scan yet) stay owned. Driver wires set_borrowed_window/block + start/skip routing for Row. borrowed_oneshot_matches_owned_and_roundtrips now covers L5/L9 over-window byte-identical to owned.
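The borrowed-window state can be modelled with a lifetime instead of raw pointers. In this hedged sketch, `RowStateSketch` and `trailing_literals` are hypothetical. The method names follow the commit message. The owned `current_block_range` uses the `history_abs_start + window_size - current_len` formula the Greptile summary mentions, and the sketch assumes `history[0]` sits at absolute offset `history_abs_start`.

```rust
use std::ops::Range;

/// Hypothetical model of the borrowed-window state on a match generator.
struct RowStateSketch<'a> {
    history: Vec<u8>,         // owned history mirror
    history_abs_start: usize, // absolute offset of history[0] (assumed)
    window_size: usize,       // bytes currently held in the owned mirror
    current_len: usize,       // length of the current owned block
    borrowed_input: Option<&'a [u8]>,
    borrowed_block: Option<Range<usize>>,
}

impl<'a> RowStateSketch<'a> {
    fn set_borrowed_window(&mut self, input: &'a [u8]) {
        self.borrowed_input = Some(input);
        // Positions become absolute offsets into the caller's input.
        self.history_abs_start = 0;
    }

    fn stage_borrowed_block(&mut self, block: Range<usize>) {
        self.borrowed_block = Some(block);
    }

    /// Absolute range of the block being matched.
    fn current_block_range(&self) -> Range<usize> {
        match &self.borrowed_block {
            Some(r) => r.clone(),
            None => {
                let start = self.history_abs_start + self.window_size - self.current_len;
                start..start + self.current_len
            }
        }
    }

    /// Bytes that positions index into: the caller's input, or the owned mirror.
    fn live_history(&self) -> &[u8] {
        match self.borrowed_input {
            Some(input) => input,
            None => &self.history,
        }
    }

    /// Trailing literals from `literals_start` to the block end, read in place.
    fn trailing_literals(&self, literals_start: usize) -> &[u8] {
        let end = self.current_block_range().end;
        &self.live_history()[literals_start - self.history_abs_start..end - self.history_abs_start]
    }
}
```

Because every accessor goes through `current_block_range()` and `live_history()`, the parse bodies do not need to know which mode they are in.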

coderabbitai · 2026-06-15T11:37:34Z

Warning

Review limit reached

@polaz, we couldn't start this review because you've reached your PR review rate limit.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 5d70fec1-5c96-4033-aea9-a475fa203ec4

📥 Commits

Reviewing files that changed from the base of the PR and between a2044af and 069ed79.

📒 Files selected for processing (8)

zstd/src/encoding/dfast/mod.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/encoding/hc/mod.rs
zstd/src/encoding/levels/fastest.rs
zstd/src/encoding/match_generator.rs
zstd/src/encoding/match_table/storage.rs
zstd/src/encoding/row/mod.rs
zstd/src/tests/roundtrip_integrity.rs

📝 Walkthrough

Walkthrough

The PR extends the borrowed (no-copy) one-shot compression path from Simple/Dfast to Row and HashChain/BinaryTree backends. It adds borrowed_input/borrowed_block state to MatchTable, RowMatchGenerator, and HcMatchGenerator, introduces a borrowed_supported() predicate on MatchGeneratorDriver, tightens advertised-window candidate lower bounds across dfast/hc/row match finders, and adds a cache-warm helper in the frame compressor.

Changes

Borrowed no-copy scan for all backends

| Layer | File(s) | Summary |
|---|---|---|
| MatchTable borrowed-window state and APIs | `zstd/src/encoding/match_table/storage.rs` | Adds `borrowed_input`/`borrowed_block` fields and `set_borrowed_window`/`clear_borrowed_window`/`stage_borrowed_block`/`current_block_range` APIs; updates `live_history()`, `get_last_space()`, `reset()`, `skip_matching()`, and `emit_optimal_plan()` to serve borrowed slices instead of owned history. |
| MatchGeneratorDriver eligibility and dispatch widening | `zstd/src/encoding/match_generator.rs` | Makes `active_backend()` pub(crate), adds `borrowed_supported()`, and extends `set_borrowed_window`/`clear_borrowed_window`/`set_borrowed_block`/`start_matching`/`skip_matching_with_hint` to dispatch to Row (greedy/lazy) and HashChain (lazy + BT optimal) in addition to existing Simple/Dfast paths. |
| RowMatchGenerator borrowed-window support | `zstd/src/encoding/row/mod.rs` | Adds borrowed state fields and management methods, updates greedy/lazy parse-body macros to use `current_block_range()`/`live_history()` for block sizing and trailing-literal slicing, fixes row-probe `window_low` bound for borrowed mode, and adds `start_matching_borrowed`/`skip_matching_borrowed` entrypoints. |
| HcMatchGenerator borrowed support and BT/HC macro fixes | `zstd/src/encoding/match_generator.rs`, `zstd/src/encoding/hc/mod.rs` | Adds borrowed-facing helpers to `HcMatchGenerator`; converts BT/HC loop macros to read from `table.live_history()` and derive block range via `table.current_block_range()`; tightens `chain_candidates`/`repcode_candidate`/`hash_chain_candidate` lower bounds to `history_abs_start.max(abs_pos.saturating_sub(max_window_size))`. |
| Dfast advertised-window candidate bounds | `zstd/src/encoding/dfast/mod.rs` | Introduces `advertised_window` in the fast-loop preamble, computes per-cursor `wlow0`/`wlow1` bounds for borrowed mode, and updates all four candidate validity checks to use these bounds. |
| Frame compressor eligibility, cache-warm, and fast-path assertion | `zstd/src/encoding/frame_compressor.rs`, `zstd/src/encoding/levels/fastest.rs` | Adds `warm_presplit_window` (64-byte-stride cache touch), delegates `borrowed_eligible` to `matcher.borrowed_supported()`, gates the warm-up call on pre-splitter conditions, and replaces an `assert!` with `debug_assert!(state.matcher.borrowed_supported())`. |
| Roundtrip integrity test expansion | `zstd/src/tests/roundtrip_integrity.rs` | Adds `Level(5)`, `Level(9)`, and `Level(15)` to the borrowed-vs-owned roundtrip test's non-Fast level coverage. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

structured-world/structured-zstd#318: Directly modifies the borrowed one-shot encode path in frame_compressor.rs/levels/fastest.rs, the same eligibility/staging logic this PR rewires.
structured-world/structured-zstd#335: Modifies RowMatchGenerator and match_generator.rs Row dispatch at the same data structures and functions this PR extends with borrowed-window support.
structured-world/structured-zstd#390: Reshapes the same dfast fast-loop candidate validity checks that this PR updates for borrowed-mode window bounds.

Poem

🐇 Hopping through history, no copy in sight,
All backends now borrow the frames just right.
The window is tight, the bounds tightly keyed,
Row, HashChain, and BTree — all borrowed indeed!
Cache warm, eligibility clean,
The fluffiest no-copy path ever seen! 🌟

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main optimization: extending borrowed in-place over-window scanning to Dfast, Row, and Btlazy2 backends, which is the primary focus of the changes across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


codecov · 2026-06-15T11:39:52Z

Codecov Report

❌ Patch coverage is 91.84549% with 19 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| zstd/src/encoding/match_generator.rs | 77.33% | 17 Missing ⚠️ |
| zstd/src/encoding/dfast/mod.rs | 83.33% | 1 Missing ⚠️ |
| zstd/src/encoding/match_table/storage.rs | 97.95% | 1 Missing ⚠️ |


Extend the borrowed (no-copy) over-window scan to the HashChain backend's btlazy2 binary-tree parser (L13-15). Over-window inputs were copied into the owned history mirror; now matched in place via the per-position window_low cap. Key fix: the BT-tree body macros (bt_insert_step_no_rebase_body, bt_insert_and_collect_matches_body) and emit_optimal_plan read the live region via `live_history()` (borrowed-aware, reborrow-then-raw-ptr so the slice holds no borrow while the tree mutates) instead of `&history[history_start..]` directly — the direct read returned the EMPTY borrowed mirror, so the tree found no matches and the encoder over-split (periodic_stream_not_oversplit: l15 184 -> 318994). MatchTable gains the borrowed-window scaffolding (borrowed_input/block, set_borrowed_window zeroing history_abs_start, stage, current_block_range, borrowed-aware live_history/get_last_space) + window_low on the HC chain candidate checks. The OPTIMAL parsers (BtOpt/BtUltra/BtUltra2, L16-22) stay owned: their cost-based DP is sensitive to candidate quality and the borrowed continuous-index scan yields ratio-worse candidates (diverged from owned + fell outside the ffi ratio bound). borrowed_supported is the single gate. borrowed_oneshot_matches_owned_and_roundtrips covers L15 over-window byte-identical to owned; periodic_stream_not_oversplit + 839 tests pass.

polaz · 2026-06-15T12:15:28Z

@coderabbitai review

coderabbitai · 2026-06-15T12:15:35Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

The borrowed over-window path matches in place on the caller's input, so the pre-split fingerprint is the first touch of each 128 KiB region: a cache-cold, sampling-strided read with interleaved random writes into the events table (latency-bound, ~3x an ERMS streaming read of the same bytes). The owned path never pays this because its history-mirror copy already warmed the bytes. Restore that warmth without the copy's write half via one bandwidth-bound sequential pass per pre-split window, gated to exactly the conditions under which the splitter reads the block.

polaz · 2026-06-15T14:13:23Z

@coderabbitai full review

coderabbitai · 2026-06-15T14:13:30Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

zstd/src/encoding/dfast/mod.rs (1)
922-926: ⚠️ Potential issue | 🔴 Critical

Borrowed mode missing advertised window bounds in tail probe.

probe_tail_ip0_only uses history_abs_start (which is 0 in borrowed mode per scan_source line 1192) for candidate validity checks, but doesn't apply the wlow bounds that the fast loop body computes. For over-window borrowed scans, this could emit matches with offset > advertised_window:

Line 924: cand_pos >= history_abs_start → always true when history_abs_start = 0

Line 964: same issue for short-hash probe

The fast loop (lines 1924–1932) correctly computes wlow = abs_ip.saturating_sub(advertised_window) for borrowed mode, but this function skips that check entirely. Since this function is called at the end of every block (when ip1 exceeds the readable region), an over-window input could emit an unresolvable offset from this path.

The fix is to apply the same wlow bounds logic used in the fast loop:
```diff
 fn probe_tail_ip0_only(
     &self,
     current_abs_start: usize,
     current_len: usize,
     ip0: usize,
     literals_start: usize,
+    borrowed: bool,
 ) -> Option<MatchCandidate> {
     ...
     let abs_ip0 = current_abs_start + ip0;
+    let advertised_window = self.max_window_size;
+    let wlow = if borrowed {
+        abs_ip0.saturating_sub(advertised_window)
+    } else {
+        history_abs_start
+    };
     ...
     // Long-hash probe
     if idxl0 != DFAST_EMPTY_SLOT {
         let cand_pos = position_base + (idxl0 as usize) - 1;
-        if cand_pos >= history_abs_start && cand_pos < abs_ip0 {
+        if cand_pos >= wlow && cand_pos < abs_ip0 {
     ...
     // Short-hash probe
     if idxs0 != DFAST_EMPTY_SLOT {
         let cand_pos_s = position_base + (idxs0 as usize) - 1;
-        if cand_pos_s >= history_abs_start && cand_pos_s < abs_ip0 {
+        if cand_pos_s >= wlow && cand_pos_s < abs_ip0 {
```

Then update the call site at line 1813 to pass $borrowed:

```diff
 if let Some(committed) =
-    $self.probe_tail_ip0_only($current_abs_start, $current_len, ip0, literals_start)
+    $self.probe_tail_ip0_only($current_abs_start, $current_len, ip0, literals_start, $borrowed)
```
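To see why the tail path needed the bound, here is a hedged reduction of just the candidate check. `tail_probe_accepts` is a hypothetical name; hashing and match extension are omitted. With `history_abs_start = 0` in borrowed mode, the old `cand_pos >= history_abs_start` test accepts a candidate 4,000,000 bytes behind the cursor even when the advertised window is 1 MiB, while the `wlow` form rejects it.

```rust
/// Hypothetical reduction of the tail-probe validity check after the fix.
fn tail_probe_accepts(
    cand_pos: usize,
    abs_ip0: usize,
    history_abs_start: usize,
    max_window_size: usize,
    borrowed: bool,
) -> bool {
    // Borrowed: cap at the advertised window. Owned: keep the eviction floor.
    let wlow = if borrowed {
        abs_ip0.saturating_sub(max_window_size)
    } else {
        history_abs_start
    };
    cand_pos >= wlow && cand_pos < abs_ip0
}
```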
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/dfast/mod.rs` around lines 922 - 926, The
probe_tail_ip0_only function fails to enforce advertised window bounds in
borrowed mode scans, allowing matches with offset > advertised_window to be
emitted. In borrowed mode, history_abs_start is 0, making the check at line 924
(cand_pos >= history_abs_start) always true and unable to filter out-of-window
candidates. The same issue exists at line 964 for short-hash probes. Fix this by
computing wlow bounds using abs_ip.saturating_sub(advertised_window) and
applying it to validate candidates at both probe locations, matching the logic
used in the fast loop body. Additionally, update the call site at line 1813 to
pass the $borrowed parameter to enable this bounds checking within the function.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/levels/fastest.rs`:
- Around line 289-298: The debug_assert checking borrowed_supported() is
currently positioned only within the compressed branch, allowing unsupported
borrowed dispatch to proceed unchecked for RLE/raw-fast block paths. Move the
debug_assert macro call that validates state.matcher.borrowed_supported() from
its current location in the compressed branch to the function entry point,
before the branch selection logic. This ensures that debug builds catch every
invalid caller regardless of whether the code takes the compressed, RLE, or
raw-fast path.

In `@zstd/src/encoding/match_generator.rs`:
- Around line 1504-1520: The borrowed_supported() predicate in the match
statement correctly rejects BtOpt, BtUltra, and BtUltra2 for HashChain, but
internal callers at line 1609 can still stage borrowed blocks for these
unsupported configurations because the assertion there does not properly
validate the predicate. Add search-awareness to the borrowed block staging logic
by ensuring that all callsites where borrowed blocks are staged (at lines
1609-1617 and 2942-2954) properly gate their operations with a check against
borrowed_supported(), preventing BtOpt/BtUltra/BtUltra2 from ever staging
borrowed blocks regardless of the BackendTag.

In `@zstd/src/encoding/match_table/storage.rs`:
- Around line 2069-2078: The unsafe slice creation using from_raw_parts trusts
that current_len matches the actual length returned by get_last_space() without
validation, which could lead to undefined behavior if a mismatch occurs. Add a
debug_assert statement before the unsafe block to verify that current_len equals
the length of the space obtained from get_last_space(), ensuring that misuse is
caught in debug builds without incurring runtime cost in release builds.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: c1b00dec-5708-46b3-b901-1586fcaab437

📥 Commits

Reviewing files that changed from the base of the PR and between a2044af and 789ba37.

📒 Files selected for processing (8)

zstd/src/encoding/dfast/mod.rs
zstd/src/encoding/frame_compressor.rs
zstd/src/encoding/hc/mod.rs
zstd/src/encoding/levels/fastest.rs
zstd/src/encoding/match_generator.rs
zstd/src/encoding/match_table/storage.rs
zstd/src/encoding/row/mod.rs
zstd/src/tests/roundtrip_integrity.rs

dfast probe_tail_ip0_only (runs once per block at the iend boundary) validated candidates only against history_abs_start, which is 0 in borrowed mode, so an over-window borrowed scan could emit a match with offset > advertised_window from the tail path (an unresolvable offset). Apply the same wlow = abs_ip - advertised_window bound the fast loop body already uses, threaded via a borrowed flag. Also hardens the borrowed staging invariants:

- borrowed_supported() is search-aware: the HashChain backend admits the lazy CHAIN parser and btlazy2 only, never the optimal BT parsers.
- set_borrowed_block asserts borrowed_supported() (was a backend-only check); the borrowed BinaryTree dispatch drops the unreachable optimal arms (optimal always takes the owned path via borrowed_eligible).
- compress_block_encoded_borrowed checks the invariant at function entry so RLE / raw-fast / compressed paths are all covered.
- emit_optimal_plan debug_asserts current_len <= get_last_space().len() before the unchecked from_raw_parts slice.

839 tests pass; optimal levels stay byte-identical (owned path untouched).
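A search-aware gate of this shape can be sketched with two enums. This is an assumption-labelled illustration: the `Backend` and `Search` enums are hypothetical stand-ins for the crate's backend tag and strategy. The `Search` variants borrow zstd's strategy names (fast through btultra2). The backend-to-strategy mapping is not modelled, only the rule that HashChain never admits the optimal BT parsers.

```rust
/// Hypothetical backend tag.
#[derive(Clone, Copy, Debug)]
enum Backend {
    Simple,
    Dfast,
    Row,
    HashChain,
}

/// Hypothetical search strategy, named after zstd's strategies.
#[derive(Clone, Copy, Debug)]
enum Search {
    Fast,
    Dfast,
    Greedy,
    Lazy,
    Lazy2,
    BtLazy2,
    BtOpt,
    BtUltra,
    BtUltra2,
}

/// Single gate: the optimal BT parsers (cost-based DP) always take the owned path.
fn borrowed_supported(backend: Backend, search: Search) -> bool {
    match backend {
        Backend::Simple | Backend::Dfast | Backend::Row => true,
        Backend::HashChain => {
            !matches!(search, Search::BtOpt | Search::BtUltra | Search::BtUltra2)
        }
    }
}
```

Keeping the check in one predicate lets both `borrowed_eligible` and the staging assertions call the same function, which is the invariant these hardening changes describe.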

greptile-apps · 2026-06-15T15:09:18Z

Greptile Summary

This PR extends the borrowed (zero-copy, in-place) one-shot scan to over-window inputs for the Dfast, Row (greedy L5 + lazy L9), and btlazy2 (L13-15) backends, eliminating the memmove-dominated history-mirror copy on large over-window inputs. A sequential cache-warming pass is added before the pre-split fingerprint on the borrowed path to avoid the 3x penalty of a cold strided read.

Per-position window_low cap (abs_pos - advertised_window) replaces the raw history_abs_start floor for borrowed mode across Dfast, Row, and HC, ensuring no over-window offset is ever emitted.
btlazy2 BT byte-source is made borrowed-aware by routing concat/concat_full through live_history(), with current_block_range() replacing the manual history_abs_start + window_size - current_len formula throughout.
borrowed_supported() is the single source of truth for which backend+search combinations take the borrowed path; optimal BT parsers (L16-22) remain on the owned path.

Confidence Score: 5/5

Safe to merge; the borrowed scan is correctly gated by borrowed_supported(), the window-low cap prevents unresolvable offsets, and borrowed/owned byte-identity is validated by 839 passing tests including over-window cases for all three new backends.

The core correctness invariant is enforced at every candidate site across Dfast, Row, HC, and the BT walk. The two findings are a maintenance safety gap in clear_borrowed_window and a loop-invariant sub-expression in the HC candidate walk, neither of which affects current correctness.

match_generator.rs (clear_borrowed_window) and hc/mod.rs (candidate-walk window-low expression) are worth a second look, but neither contains a correctness defect in the current code.

Important Files Changed

| Filename | Overview |
|---|---|
| zstd/src/encoding/dfast/mod.rs | Adds `borrowed: bool` to `probe_tail_ip0_only` and introduces per-position `wlow`/`wlow0`/`wlow1` window-low bounds; correctly uses `saturating_sub(advertised_window)` for borrowed mode and falls back to `history_abs_start` for owned. |
| zstd/src/encoding/frame_compressor.rs | Simplifies `borrowed_eligible` to delegate to `borrowed_supported()`; adds `warm_presplit_window` correctly gated inside the borrowed-only block loop. |
| zstd/src/encoding/hc/mod.rs | Strengthens candidate bounds with `history_abs_start.max(abs_pos.saturating_sub(max_window_size))`; the window-low expression is position-invariant but recomputed per candidate iteration. |
| zstd/src/encoding/match_generator.rs | Adds `borrowed_supported()`, extends borrowed dispatch to Row and HashChain; `clear_borrowed_window` uses a `_ => {}` catch-all that would silently skip clearing for future BackendTag variants. |
| zstd/src/encoding/match_table/storage.rs | Adds `borrowed_input`/`borrowed_block` fields and the full borrowed-window API with correct ownership, safety docs, and reset-clearing. |
| zstd/src/encoding/row/mod.rs | Mirrors the MatchTable borrowed-window API on RowMatchGenerator; updates greedy/lazy parse macros to use `current_block_range()` and `live_history()`; adds `window_low` cap and borrowed entry points. |
| zstd/src/encoding/levels/fastest.rs | Replaces the hard assert against L16-22 with `debug_assert!(borrowed_supported())` to allow btlazy2 and Row through the borrowed path. |
| zstd/src/tests/roundtrip_integrity.rs | Extends `borrowed_oneshot_matches_owned_and_roundtrips` with L5, L9, and L15 over-window borrowed cases covering all three new backends. |

Reviews (2): Last reviewed commit: "refactor(encode): drop redundant borrowe..."

The borrowed BinaryTree arm in start_matching re-staged the block on hc_matcher().table, but set_borrowed_block already staged the same range there for the HashChain backend (and borrowed_pending, which gates this arm, is set only by set_borrowed_block). The re-stage was a no-op overwrite with an identical range; remove it and note where the stage actually happens.

coderabbitai · 2026-06-15T16:13:06Z

✅ Action performed

Full review finished.

polaz added 2 commits June 15, 2026 13:58

polaz changed the title ~~perf: borrowed in-place over-window scan for dfast + row~~ perf: borrowed in-place over-window scan for dfast + row + btlazy2 Jun 15, 2026

polaz mentioned this pull request Jun 15, 2026

perf(huff0): hoist bit-stream state into the encode loop #423

Merged


polaz merged commit 1ad404a into main Jun 15, 2026
28 checks passed

polaz deleted the perf/dfast-speed-microopt branch June 15, 2026 15:38

sw-release-bot Bot mentioned this pull request Jun 15, 2026

chore: release v0.0.40 #419

Merged

coderabbitai Bot mentioned this pull request Jun 15, 2026

perf(encode): close small-window dict compress gap (hash-chain matchfinder + borrowed dict kernel) #426

Merged
