Skip to content

Commit 4fa5f27

Browse files
committed
refactor(bench): preserve decode RSS window + plain FfiMemTracker arithmetic
Three small bench corrections: - Decode RSS sampling: `rust_window.finish()` now runs while `target` and `decoder` are still alive. Previously they sat in an inner block that dropped them before `finish` took its final on-thread sample — on sub-poll-interval payloads (small inputs decoding faster than the 250 µs sampler), the background poller can miss the spike entirely and the post-drop final sample underreports actual decode footprint. - `FfiMemTracker::current` updates with plain `+=` / `-=` instead of saturating arithmetic. A single-CCtx measurement bench cannot reach `usize::MAX` of live state under any realistic libzstd workload; if the counter ever did over/underflow that's a real bug (alloc/free imbalance or mis-routed opaque pointer) and a panic surfaces it instead of silently freezing at the saturation bound. - Code comments at the FFI-peak read sites document the intentional asymmetry: the FFI tracker observes ONLY libzstd's customMem requests, NOT the Rust-owned output/chunk/target buffers the FFI helpers also allocate. Both sides allocate output via the system allocator and the metric is the libzstd-internal cost on top, so wrapping the whole FFI call in an RSS window would inflate the comparison with bookkeeping the Rust path also pays.
1 parent 021a806 commit 4fa5f27

1 file changed

Lines changed: 46 additions & 11 deletions

File tree

zstd/benches/compare_ffi.rs

Lines changed: 46 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -280,8 +280,15 @@ unsafe extern "C" fn ffi_alloc(
280280
}
281281
// SAFETY: opaque came from `FfiMemTracker::custom_mem`; per-CCtx
282282
// single-threaded access.
283+
//
284+
// Plain `+=`/`-=` (no saturating arithmetic): a single-CCtx
285+
// measurement bench cannot realistically reach `usize::MAX` bytes
286+
// of live libzstd state. If the counter ever did overflow, that's
287+
// a real bug (alloc/free imbalance, mis-routed opaque pointer)
288+
// and the panic surfaces it instead of silently freezing the
289+
// counter at its max.
283290
let tracker = unsafe { &mut *(opaque as *mut FfiMemTracker) };
284-
tracker.current = tracker.current.saturating_add(size);
291+
tracker.current += size;
285292
if tracker.current > tracker.peak {
286293
tracker.peak = tracker.current;
287294
}
@@ -306,9 +313,11 @@ unsafe extern "C" fn ffi_free(opaque: *mut core::ffi::c_void, address: *mut core
306313
let layout = Layout::from_size_align(size + FFI_HEADER_BYTES, FFI_ALIGN)
307314
.expect("layout round-trips from ffi_alloc");
308315
// SAFETY: opaque is the same FfiMemTracker that `ffi_alloc` saw
309-
// for this CCtx; single-threaded per-CCtx.
316+
// for this CCtx; single-threaded per-CCtx. Plain `-=` for the same
317+
// reason `ffi_alloc` uses plain `+=`: a free without a matching
318+
// alloc would be a real bug, and panicking surfaces it.
310319
let tracker = unsafe { &mut *(opaque as *mut FfiMemTracker) };
311-
tracker.current = tracker.current.saturating_sub(size);
320+
tracker.current -= size;
312321
// SAFETY: header_ptr + layout matches the pair from `ffi_alloc`.
313322
unsafe { std::alloc::dealloc(header_ptr, layout) };
314323
}
@@ -543,6 +552,20 @@ fn bench_compress(c: &mut Criterion) {
543552
// malloc/free precisely. Same `ffi_encode_to_vec` the
544553
// timing loop below calls — only the customMem opt-in
545554
// differs.
555+
//
556+
// Intentional asymmetry vs the Rust side: `ffi_tracker.peak`
557+
// counts ONLY libzstd's customMem requests, NOT the
558+
// Rust-owned `output: Vec<u8>` and `chunk: Vec<u8>`
559+
// inside `ffi_encode_to_vec`. The Rust path's
560+
// `compress_to_vec` allocates its output via the same
561+
// system allocator, so on the FFI side we want only
562+
// the libzstd-internal hash/chain/workspace memory —
563+
// the apples-to-apples comparison for "what does
564+
// libzstd cost over the Rust crate". Wrapping the
565+
// whole FFI call in an RSS window would conflate the
566+
// two and inflate the FFI metric with bookkeeping the
567+
// Rust side doesn't pay either. See `emit_memory_report`
568+
// for the full asymmetry note.
546569
let mut ffi_tracker = FfiMemTracker::default();
547570
let ffi_compressed =
548571
ffi_encode_to_vec(&scenario.bytes[..], level.ffi_level, Some(&mut ffi_tracker));
@@ -640,16 +663,28 @@ fn bench_decompress_source(
640663
// sampling; FFI uses a per-DCtx customMem tracker. Both
641664
// observe the SAME decode call the timing loop below runs —
642665
// only the memory hook differs.
666+
// `rust_window.finish()` MUST run before `target`/`decoder` drop
667+
// so the final on-thread RSS sample sees their pages still
668+
// resident. With the previous inner-block scope they were
669+
// dropped first, the allocator could (and on macOS does) return
670+
// memory to the OS before `finish` sampled — undercounting peak.
643671
let rust_window = rss::PeakWindow::start();
644-
{
645-
let mut target = vec![0u8; expected_len];
646-
let mut decoder = FrameDecoder::new();
647-
let written = decoder.decode_all(compressed, &mut target).unwrap();
648-
assert_eq!(written, expected_len);
649-
black_box(target);
650-
}
672+
let mut target = vec![0u8; expected_len];
673+
let mut decoder = FrameDecoder::new();
674+
let written = decoder.decode_all(compressed, &mut target).unwrap();
675+
assert_eq!(written, expected_len);
651676
let rust_peak_rss_delta_bytes = rust_window.finish();
652-
677+
black_box(target);
678+
679+
// Same intentional asymmetry as the compress path: the FFI
680+
// peak counts ONLY libzstd's customMem requests (DCtx workspace
681+
// + window buffer), NOT the Rust-owned `ffi_target` output
682+
// slice. The Rust decode path allocates its own `target` via
683+
// the system allocator and we want both sides to compare
684+
// libzstd-internal vs FrameDecoder-internal memory, not output
685+
// bookkeeping. Wrapping the FFI decode in an RSS window would
686+
// double-count the Rust-side output that's identical between
687+
// both paths. See `emit_memory_report` for details.
653688
let mut ffi_tracker = FfiMemTracker::default();
654689
let mut ffi_target = vec![0u8; expected_len];
655690
let written = ffi_decompress_into(compressed, &mut ffi_target, Some(&mut ffi_tracker));

0 commit comments

Comments
 (0)