Skip to content

perf: borrowed in-place over-window scan for dfast + row + btlazy2#422

Merged
polaz merged 6 commits into
mainfrom
perf/dfast-speed-microopt
Jun 15, 2026
Merged

perf: borrowed in-place over-window scan for dfast + row + btlazy2#422
polaz merged 6 commits into
mainfrom
perf/dfast-speed-microopt

Conversation

@polaz

@polaz polaz commented Jun 15, 2026

Copy link
Copy Markdown
Member

Summary

Extend the borrowed (no-copy) one-shot scan to over-window inputs across all in-place-capable backends: Dfast, Row, and btlazy2 (the binary-tree backend, L13-15). Previously over-window inputs fell back to the owned path, which copies the whole input into the history mirror (__memmove_avx_unaligned_erms); on large over-window streams that copy dominates L1 misses and the cycle gap vs C.

Each backend applies a per-position window_low = abs_ip - advertised_window candidate cap so an over-window in-place scan never emits an unresolvable offset, mirroring upstream zstd's continuous-index + windowLow one-shot behaviour. The cap is byte-identical for owned and borrowed-in-window (it collapses to the existing eviction floor / saturates to 0), so only the new over-window borrowed path changes behaviour.

btlazy2 (binary-tree) borrowed

The BT tree is done in-place by upstream zstd via a windowLow stop in the walk loop (ZSTD_insertBt1 zstd_opt.c:489, ZSTD_insertBtAndGetAllMatches zstd_opt.c:724), not via eviction — our BT walk/insert already carry the same stop, so the tree depth stays bounded regardless of eviction. The BT byte-source (bt_insert_* + emit_optimal_plan) was routed through live_history() (reborrow-then-raw-ptr) so the borrowed mirror is read correctly; btlazy2 borrowed is then byte-identical to owned.

Pre-split cache-locality fix

The borrowed path matches in place on the caller's input, so the pre-split fingerprint (optimal_block_sizesplit_block_by_chunks) becomes the first touch of each 128 KiB region: a cache-cold, sampling-strided read with interleaved random writes into the events table. That latency-bound pattern costs ~3× an ERMS streaming read of the same bytes (measured: pre-split memset/histogram 1.1% owned → 15.3% borrowed, the source of an early +45% regression on btlazy2). The owned path never pays this because its history-mirror copy already warmed the bytes. A single bandwidth-bound sequential warm pass per pre-split window restores that warmth without the copy's write half, gated to exactly the conditions under which the splitter reads the block (pre-split level, full 128 KiB remaining, savings >= 3).

Results (i9-9900K, perf stat -e cycles, over-window fixtures)

  • Dfast L3: 2.22× faster (19.46G → 8.75G cycles), now beats C (9.02G); L1-misses 3.38G → 0.84G.
  • Row L5 (greedy): 1.60× faster (20.65G → 12.90G); L9 (lazy): 1.14×; L1-misses −74% both.
  • btlazy2 L15: +45% regression eliminated. In-window 8 MiB: borrowed 4.80G vs owned 4.82G (parity, slightly ahead). Over-window 11.5 MiB: borrowed 111.6G vs owned 115.2G = ~3.1% faster. Ratio unchanged.

The win scales inversely with search cost (cheap-search backends gain most; the memmove is a smaller fraction of heavier parsers). For btlazy2 the BT search dominates, so the throughput edge is small, but borrowed also removes the 2× window history-mirror allocation.

The OPTIMAL parsers (btopt/btultra/btultra2, L16-22) stay on the owned path: their cost-based DP is sensitive to candidate quality and the borrowed continuous-index scan produced ratio-worse candidates that fell outside the ffi bound. Tracked separately.

Correctness

  • borrowed_oneshot_matches_owned_and_roundtrips covers L3 / L5 / L9 / L15 over-window (3 MiB) cases byte-identical to the owned (evicting) baseline + roundtrip.
  • periodic_stream_not_oversplit guards the btlazy2 borrowed byte-source against the BT over-split failure mode.
  • Full suite (839 tests, --features dict_builder) + cross_validation green on x86_64 (avx2) and aarch64.

Testing

cargo nextest run -p structured-zstd --features dict_builder — 839 passed.

Summary by CodeRabbit

  • New Features

    • Added borrowed (zero-copy) input scanning support across all compression backends, enabling efficient in-place processing without internal buffer copying.
    • Extended no-copy mode eligibility to additional compression levels beyond Fast variants.
  • Performance

    • Added memory cache warming for pre-split window processing to optimize compression throughput.
  • Bug Fixes

    • Strengthened window bounds validation for match candidates to ensure correctness with configurable window sizes.

polaz added 2 commits June 15, 2026 13:58
The borrowed (no-copy) one-shot scan was gated to in-window inputs for
Dfast; over-window inputs fell back to the owned path, which copies the
whole input into the history mirror. On large over-window streams that
memmove dominates: ~75% of L1-dcache misses and a 2.16x cycle gap vs C
(C matches in place on src). Add a per-position window_low =
abs_ip - advertised_window candidate bound to the borrowed dfast scan
(owned eviction's history_abs_start already provides this; borrowed-in-
window saturates to 0, so both stay byte-identical) and drop the
input_len <= window_size gate. Over-window dfast now matches in place,
mirroring C's continuous-index + windowLow one-shot behaviour and
avoiding the input->mirror copy.
Extend the borrowed (no-copy) one-shot scan to the Row backend (greedy
L5, lazy L9-12), mirroring the Dfast over-window fix: over-window inputs
were copied into the owned history mirror (the input->mirror memmove that
dominates L1 misses + the cycle gap vs C). Add borrowed_input/borrowed_block
state + set_borrowed_window/stage_borrowed_block/start_matching_borrowed/
skip_matching_borrowed to RowMatchGenerator; set_borrowed_window zeroes
history_abs_start so positions stay absolute input offsets. The candidate
lower bound becomes window_low = history_abs_start.max(abs_pos -
max_window_size): owned picks history_abs_start (byte-identical), borrowed
(history_abs_start = 0) picks the window cap so an over-window in-place scan
never emits an unresolvable offset. live_history()/get_last_space()/
current_block_range()/trailing-literals all read the borrowed input in
place. borrowed_eligible gates on the resolved backend (Row), not the
strategy tag, so HashChain/BT (no borrowed scan yet) stay owned. Driver
wires set_borrowed_window/block + start/skip routing for Row.
borrowed_oneshot_matches_owned_and_roundtrips now covers L5/L9 over-window
byte-identical to owned.
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@polaz, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 1 hour, 53 minutes, and 40 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 5d70fec1-5c96-4033-aea9-a475fa203ec4

📥 Commits

Reviewing files that changed from the base of the PR and between a2044af and 069ed79.

📒 Files selected for processing (8)
  • zstd/src/encoding/dfast/mod.rs
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/hc/mod.rs
  • zstd/src/encoding/levels/fastest.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/match_table/storage.rs
  • zstd/src/encoding/row/mod.rs
  • zstd/src/tests/roundtrip_integrity.rs
📝 Walkthrough

Walkthrough

The PR extends the borrowed (no-copy) one-shot compression path from Simple/Dfast to Row and HashChain/BinaryTree backends. It adds borrowed_input/borrowed_block state to MatchTable, RowMatchGenerator, and HcMatchGenerator, introduces a borrowed_supported() predicate on MatchGeneratorDriver, tightens advertised-window candidate lower bounds across dfast/hc/row match finders, and adds a cache-warm helper in the frame compressor.

Changes

Borrowed no-copy scan for all backends

Layer / File(s) Summary
MatchTable borrowed-window state and APIs
zstd/src/encoding/match_table/storage.rs
Adds borrowed_input/borrowed_block fields and set_borrowed_window/clear_borrowed_window/stage_borrowed_block/current_block_range APIs; updates live_history(), get_last_space(), reset(), skip_matching(), and emit_optimal_plan() to serve borrowed slices instead of owned history.
MatchGeneratorDriver eligibility and dispatch widening
zstd/src/encoding/match_generator.rs
Makes active_backend() pub(crate), adds borrowed_supported(), and extends set_borrowed_window/clear_borrowed_window/set_borrowed_block/start_matching/skip_matching_with_hint to dispatch to Row (greedy/lazy) and HashChain (lazy + BT optimal) in addition to existing Simple/Dfast paths.
RowMatchGenerator borrowed-window support
zstd/src/encoding/row/mod.rs
Adds borrowed state fields and management methods, updates greedy/lazy parse-body macros to use current_block_range()/live_history() for block sizing and trailing-literal slicing, fixes row-probe window_low bound for borrowed mode, and adds start_matching_borrowed/skip_matching_borrowed entrypoints.
HcMatchGenerator borrowed support and BT/HC macro fixes
zstd/src/encoding/match_generator.rs, zstd/src/encoding/hc/mod.rs
Adds borrowed-facing helpers to HcMatchGenerator; converts BT/HC loop macros to read from table.live_history() and derive block range via table.current_block_range(); tightens chain_candidates/repcode_candidate/hash_chain_candidate lower bounds to history_abs_start.max(abs_pos.saturating_sub(max_window_size)).
Dfast advertised-window candidate bounds
zstd/src/encoding/dfast/mod.rs
Introduces advertised_window in the fast-loop preamble, computes per-cursor wlow0/wlow1 bounds for borrowed mode, and updates all four candidate validity checks to use these bounds.
Frame compressor eligibility, cache-warm, and fast-path assertion
zstd/src/encoding/frame_compressor.rs, zstd/src/encoding/levels/fastest.rs
Adds warm_presplit_window (64-byte-stride cache touch), delegates borrowed_eligible to matcher.borrowed_supported(), gates the warm-up call on pre-splitter conditions, and replaces an assert! with debug_assert!(state.matcher.borrowed_supported()).
Roundtrip integrity test expansion
zstd/src/tests/roundtrip_integrity.rs
Adds Level(5), Level(9), and Level(15) to the borrowed-vs-owned roundtrip test's non-Fast level coverage.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐇 Hopping through history, no copy in sight,
All backends now borrow the frames just right.
The window is tight, the bounds tightly keyed,
Row, HashChain, and BTree — all borrowed indeed!
Cache warm, eligibility clean,
The fluffiest no-copy path ever seen! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main optimization: extending borrowed in-place over-window scanning to Dfast, Row, and Btlazy2 backends, which is the primary focus of the changes across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/dfast-speed-microopt

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 91.84549% with 19 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
zstd/src/encoding/match_generator.rs 77.33% 17 Missing ⚠️
zstd/src/encoding/dfast/mod.rs 83.33% 1 Missing ⚠️
zstd/src/encoding/match_table/storage.rs 97.95% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Extend the borrowed (no-copy) over-window scan to the HashChain backend's
btlazy2 binary-tree parser (L13-15). Over-window inputs were copied into
the owned history mirror; now matched in place via the per-position
window_low cap.

Key fix: the BT-tree body macros (bt_insert_step_no_rebase_body,
bt_insert_and_collect_matches_body) and emit_optimal_plan read the live
region via `live_history()` (borrowed-aware, reborrow-then-raw-ptr so the
slice holds no borrow while the tree mutates) instead of
`&history[history_start..]` directly — the direct read returned the EMPTY
borrowed mirror, so the tree found no matches and the encoder over-split
(periodic_stream_not_oversplit: l15 184 -> 318994). MatchTable gains the
borrowed-window scaffolding (borrowed_input/block, set_borrowed_window
zeroing history_abs_start, stage, current_block_range, borrowed-aware
live_history/get_last_space) + window_low on the HC chain candidate checks.

The OPTIMAL parsers (BtOpt/BtUltra/BtUltra2, L16-22) stay owned: their
cost-based DP is sensitive to candidate quality and the borrowed
continuous-index scan yields ratio-worse candidates (diverged from owned +
fell outside the ffi ratio bound). borrowed_supported is the single gate.

borrowed_oneshot_matches_owned_and_roundtrips covers L15 over-window
byte-identical to owned; periodic_stream_not_oversplit + 839 tests pass.
@polaz

polaz commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

The borrowed over-window path matches in place on the caller's input, so
the pre-split fingerprint is the first touch of each 128 KiB region: a
cache-cold, sampling-strided read with interleaved random writes into the
events table (latency-bound, ~3x an ERMS streaming read of the same bytes).
The owned path never pays this because its history-mirror copy already
warmed the bytes. Restore that warmth without the copy's write half via one
bandwidth-bound sequential pass per pre-split window, gated to exactly the
conditions under which the splitter reads the block.
@polaz polaz changed the title perf: borrowed in-place over-window scan for dfast + row perf: borrowed in-place over-window scan for dfast + row + btlazy2 Jun 15, 2026
@polaz

polaz commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
zstd/src/encoding/dfast/mod.rs (1)

922-926: ⚠️ Potential issue | 🔴 Critical

Borrowed mode missing advertised window bounds in tail probe.

probe_tail_ip0_only uses history_abs_start (which is 0 in borrowed mode per scan_source line 1192) for candidate validity checks, but doesn't apply the wlow bounds that the fast loop body computes. For over-window borrowed scans, this could emit matches with offset > advertised_window:

  • Line 924: cand_pos >= history_abs_start → always true when history_abs_start = 0
  • Line 964: same issue for short-hash probe

The fast loop (lines 1924–1932) correctly computes wlow = abs_ip.saturating_sub(advertised_window) for borrowed mode, but this function skips that check entirely. Since this function is called at the end of every block (when ip1 exceeds the readable region), an over-window input could emit an unresolvable offset from this path.

The fix is to apply the same wlow bounds logic used in the fast loop:

 fn probe_tail_ip0_only(
     &self,
     current_abs_start: usize,
     current_len: usize,
     ip0: usize,
     literals_start: usize,
+    borrowed: bool,
 ) -> Option<MatchCandidate> {
     ...
     let abs_ip0 = current_abs_start + ip0;
+    let advertised_window = self.max_window_size;
+    let wlow = if borrowed {
+        abs_ip0.saturating_sub(advertised_window)
+    } else {
+        history_abs_start
+    };
     ...
     // Long-hash probe
     if idxl0 != DFAST_EMPTY_SLOT {
         let cand_pos = position_base + (idxl0 as usize) - 1;
-        if cand_pos >= history_abs_start && cand_pos < abs_ip0 {
+        if cand_pos >= wlow && cand_pos < abs_ip0 {
     ...
     // Short-hash probe
     if idxs0 != DFAST_EMPTY_SLOT {
         let cand_pos_s = position_base + (idxs0 as usize) - 1;
-        if cand_pos_s >= history_abs_start && cand_pos_s < abs_ip0 {
+        if cand_pos_s >= wlow && cand_pos_s < abs_ip0 {

Then update the call site at line 1813 to pass $borrowed:

 if let Some(committed) =
-    $self.probe_tail_ip0_only($current_abs_start, $current_len, ip0, literals_start)
+    $self.probe_tail_ip0_only($current_abs_start, $current_len, ip0, literals_start, $borrowed)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@zstd/src/encoding/dfast/mod.rs` around lines 922 - 926, The
probe_tail_ip0_only function fails to enforce advertised window bounds in
borrowed mode scans, allowing matches with offset > advertised_window to be
emitted. In borrowed mode, history_abs_start is 0, making the check at line 924
(cand_pos >= history_abs_start) always true and unable to filter out-of-window
candidates. The same issue exists at line 964 for short-hash probes. Fix this by
computing wlow bounds using abs_ip.saturating_sub(advertised_window) and
applying it to validate candidates at both probe locations, matching the logic
used in the fast loop body. Additionally, update the call site at line 1813 to
pass the $borrowed parameter to enable this bounds checking within the function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@zstd/src/encoding/levels/fastest.rs`:
- Around line 289-298: The debug_assert checking borrowed_supported() is
currently positioned only within the compressed branch, allowing unsupported
borrowed dispatch to proceed unchecked for RLE/raw-fast block paths. Move the
debug_assert macro call that validates state.matcher.borrowed_supported() from
its current location in the compressed branch to the function entry point,
before the branch selection logic. This ensures that debug builds catch every
invalid caller regardless of whether the code takes the compressed, RLE, or
raw-fast path.

In `@zstd/src/encoding/match_generator.rs`:
- Around line 1504-1520: The borrowed_supported() predicate in the match
statement correctly rejects BtOpt, BtUltra, and BtUltra2 for HashChain, but
internal callers at line 1609 can still stage borrowed blocks for these
unsupported configurations because the assertion there does not properly
validate the predicate. Add search-awareness to the borrowed block staging logic
by ensuring that all callsites where borrowed blocks are staged (at lines
1609-1617 and 2942-2954) properly gate their operations with a check against
borrowed_supported(), preventing BtOpt/BtUltra/BtUltra2 from ever staging
borrowed blocks regardless of the BackendTag.

In `@zstd/src/encoding/match_table/storage.rs`:
- Around line 2069-2078: The unsafe slice creation using from_raw_parts trusts
that current_len matches the actual length returned by get_last_space() without
validation, which could lead to undefined behavior if a mismatch occurs. Add a
debug_assert statement before the unsafe block to verify that current_len equals
the length of the space obtained from get_last_space(), ensuring that misuse is
caught in debug builds without incurring runtime cost in release builds.

---

Outside diff comments:
In `@zstd/src/encoding/dfast/mod.rs`:
- Around line 922-926: The probe_tail_ip0_only function fails to enforce
advertised window bounds in borrowed mode scans, allowing matches with offset >
advertised_window to be emitted. In borrowed mode, history_abs_start is 0,
making the check at line 924 (cand_pos >= history_abs_start) always true and
unable to filter out-of-window candidates. The same issue exists at line 964 for
short-hash probes. Fix this by computing wlow bounds using
abs_ip.saturating_sub(advertised_window) and applying it to validate candidates
at both probe locations, matching the logic used in the fast loop body.
Additionally, update the call site at line 1813 to pass the $borrowed parameter
to enable this bounds checking within the function.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: c1b00dec-5708-46b3-b901-1586fcaab437

📥 Commits

Reviewing files that changed from the base of the PR and between a2044af and 789ba37.

📒 Files selected for processing (8)
  • zstd/src/encoding/dfast/mod.rs
  • zstd/src/encoding/frame_compressor.rs
  • zstd/src/encoding/hc/mod.rs
  • zstd/src/encoding/levels/fastest.rs
  • zstd/src/encoding/match_generator.rs
  • zstd/src/encoding/match_table/storage.rs
  • zstd/src/encoding/row/mod.rs
  • zstd/src/tests/roundtrip_integrity.rs

Comment thread zstd/src/encoding/levels/fastest.rs Outdated
Comment thread zstd/src/encoding/match_generator.rs
Comment thread zstd/src/encoding/match_table/storage.rs
dfast probe_tail_ip0_only (runs once per block at the iend boundary)
validated candidates only against history_abs_start, which is 0 in
borrowed mode, so an over-window borrowed scan could emit a match with
offset > advertised_window from the tail path (an unresolvable offset).
Apply the same wlow = abs_ip - advertised_window bound the fast loop body
already uses, threaded via a borrowed flag.

Also hardens the borrowed staging invariants:
- borrowed_supported() is search-aware: the HashChain backend admits the
  lazy CHAIN parser and btlazy2 only, never the optimal BT parsers.
- set_borrowed_block asserts borrowed_supported() (was a backend-only
  check); the borrowed BinaryTree dispatch drops the unreachable optimal
  arms (optimal always takes the owned path via borrowed_eligible).
- compress_block_encoded_borrowed checks the invariant at function entry
  so RLE / raw-fast / compressed paths are all covered.
- emit_optimal_plan debug_asserts current_len <= get_last_space().len()
  before the unchecked from_raw_parts slice.

839 tests pass; optimal levels stay byte-identical (owned path untouched).
@greptile-apps

greptile-apps Bot commented Jun 15, 2026

Copy link
Copy Markdown

Greptile Summary

This PR extends the borrowed (zero-copy, in-place) one-shot scan to over-window inputs for the Dfast, Row (greedy L5 + lazy L9), and btlazy2 (L13-15) backends, eliminating the memmove-dominated history-mirror copy on large over-window inputs. A sequential cache-warming pass is added before the pre-split fingerprint on the borrowed path to avoid the 3x penalty of a cold strided read.

  • Per-position window_low cap (abs_pos - advertised_window) replaces the raw history_abs_start floor for borrowed mode across Dfast, Row, and HC, ensuring no over-window offset is ever emitted.
  • btlazy2 BT byte-source is made borrowed-aware by routing concat/concat_full through live_history(), with current_block_range() replacing the manual history_abs_start + window_size - current_len formula throughout.
  • borrowed_supported() is the single source of truth for which backend+search combinations take the borrowed path; optimal BT parsers (L16-22) remain on the owned path.

Confidence Score: 5/5

Safe to merge; the borrowed scan is correctly gated by borrowed_supported(), the window-low cap prevents unresolvable offsets, and borrowed/owned byte-identity is validated by 839 passing tests including over-window cases for all three new backends.

The core correctness invariant is enforced at every candidate site across Dfast, Row, HC, and the BT walk. The two findings are a maintenance safety gap in clear_borrowed_window and a loop-invariant sub-expression in the HC candidate walk, neither of which affects current correctness.

match_generator.rs (clear_borrowed_window) and hc/mod.rs (candidate-walk window-low expression) are worth a second look, but neither contains a correctness defect in the current code.

Important Files Changed

Filename Overview
zstd/src/encoding/dfast/mod.rs Adds borrowed: bool to probe_tail_ip0_only and introduces per-position wlow/wlow0/wlow1 window-low bounds; correctly uses saturating_sub(advertised_window) for borrowed mode and falls back to history_abs_start for owned.
zstd/src/encoding/frame_compressor.rs Simplifies borrowed_eligible to delegate to borrowed_supported(); adds warm_presplit_window correctly gated inside the borrowed-only block loop.
zstd/src/encoding/hc/mod.rs Strengthens candidate bounds with history_abs_start.max(abs_pos.saturating_sub(max_window_size)); the window-low expression is position-invariant but recomputed per candidate iteration.
zstd/src/encoding/match_generator.rs Adds borrowed_supported(), extends borrowed dispatch to Row and HashChain; clear_borrowed_window uses a _ => {} catch-all that would silently skip clearing for future BackendTag variants.
zstd/src/encoding/match_table/storage.rs Adds borrowed_input/borrowed_block fields and the full borrowed-window API with correct ownership, safety docs, and reset-clearing.
zstd/src/encoding/row/mod.rs Mirrors the MatchTable borrowed-window API on RowMatchGenerator; updates greedy/lazy parse macros to use current_block_range() and live_history(); adds window_low cap and borrowed entry points.
zstd/src/encoding/levels/fastest.rs Replaces the hard assert against L16-22 with debug_assert!(borrowed_supported()) to allow btlazy2 and Row through the borrowed path.
zstd/src/tests/roundtrip_integrity.rs Extends borrowed_oneshot_matches_owned_and_roundtrips with L5, L9, and L15 over-window borrowed cases covering all three new backends.

Reviews (2): Last reviewed commit: "refactor(encode): drop redundant borrowe..." | Re-trigger Greptile

Comment thread zstd/src/encoding/match_generator.rs Outdated
The borrowed BinaryTree arm in start_matching re-staged the block on
hc_matcher().table, but set_borrowed_block already staged the same range
there for the HashChain backend (and borrowed_pending, which gates this
arm, is set only by set_borrowed_block). The re-stage was a no-op
overwrite with an identical range; remove it and note where the stage
actually happens.
@polaz polaz merged commit 1ad404a into main Jun 15, 2026
28 checks passed
@polaz polaz deleted the perf/dfast-speed-microopt branch June 15, 2026 15:38
@sw-release-bot sw-release-bot Bot mentioned this pull request Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant