perf: borrowed in-place over-window scan for dfast + row + btlazy2 #2023

Workflow file for this run

name: CI
on:
push:
branches: [main]
# Docs-only / dashboard-only commits don't touch code, tests, or
# bench inputs, so there is no reason to spin up the full lint → test
# (3 OS) → cross-i686 → msrv → codecov → fuzz → 18-shard bench pipeline.
# The dashboard's `index.html` republish path lives in
# `pages-only.yml`, which is gated on the inverse of these same paths.
paths-ignore:
- '**.md'
- '.github/bench-dashboard/**'
- 'docs/**'
- 'LICENSE*'
- '.gitignore'
pull_request:
branches: [main]
paths-ignore:
- '**.md'
- '.github/bench-dashboard/**'
- 'docs/**'
- 'LICENSE*'
- '.gitignore'
# Manual trigger: lets the wasm dashboard shard (`bench-wasm`) be
# refreshed on demand for changes the push path doesn't auto-gate on
# — e.g. a wasm bench-harness / parser edit that moves the published
# numbers without touching the `rust_core` paths that gate the shard.
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
# Keep cancellation for PR churn, but never cancel main pushes mid-flight:
# benchmark baseline + gh-pages merge must complete atomically.
cancel-in-progress: ${{ !(github.event_name == 'push' && github.ref == 'refs/heads/main') }}
env:
CARGO_TERM_COLOR: always
jobs:
changes:
# Cheap path probe (~10s) consumed by the bench pipeline gate below.
# Bench numbers (Rust vs C FFI) can only move when the core crate,
# the workspace manifests that pin deps / profiles, or the toolchain
# file change. A push touching ONLY wasm / npm / ts / js / CI / docs
# cannot move them, so `bench-matrix` (and its whole downstream
# cascade) is skipped on such pushes — see the gate on that job.
name: Detect changed paths
runs-on: ubuntu-latest
outputs:
rust_core: ${{ steps.filter.outputs.rust_core }}
steps:
- uses: actions/checkout@v6
- uses: dorny/paths-filter@v4
id: filter
with:
filters: |
# `rust_core` gates BOTH bench pipelines (native `bench-matrix`
# and the wasm-vs-bokuweb `bench-wasm` shard). The wasm payload
# compiles the core crate to wasm32, so the numbers it tracks
# move exactly when the core crate / manifests / toolchain move,
# NOT when the wasm-crate / npm glue changes (that path is
# validated by the always-on `wasm` job instead). A push that
# touches only wasm / npm / ts / js / CI / docs cannot move the
# compression behaviour, so both bench cascades skip it.
rust_core:
- 'zstd/**'
- 'Cargo.toml'
- 'Cargo.lock'
- 'rust-toolchain.toml'
lint:
timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
with:
components: rustfmt, clippy
- uses: Swatinem/rust-cache@v2
- name: Format
run: cargo fmt --all -- --check
- name: Clippy
run: cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings
- name: Clippy (bench_internals)
run: cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals --benches -- -D warnings
- name: Gate — compare_ffi/memory must not pull bench_internals
# Rationale: both `compare_ffi` (timing) and `compare_ffi_memory`
# (peak alloc) benchmark structured-zstd against libzstd. If
# `bench_internals` widens our crate's public API surface
# (visibility changes for `BitReaderReversed` etc.), the Rust
# side ships compiled differently than what a real downstream
# consumer uses — biasing every cross-side comparison. Keep
# bench_internals strictly out of both benches' feature sets.
#
# tomllib (Python 3.11+ stdlib) handles both single-line and
# multi-line TOML arrays, so a future `required-features` array
# reformat can't sneak `bench_internals` past the gate.
run: |
python3 - <<'PY'
import sys, tomllib
with open("zstd/Cargo.toml", "rb") as f:
cargo = tomllib.load(f)
benches = {
bench.get("name"): bench.get("required-features", [])
for bench in cargo.get("bench", [])
}
violators = [
name for name in ("compare_ffi", "compare_ffi_memory")
if "bench_internals" in benches.get(name, [])
]
if violators:
for name in violators:
print(f"::error::{name} must NOT require bench_internals — would bias Rust-vs-FFI parity")
sys.exit(1)
PY
- name: Gate — bench instrumentation must not leak into zstd/src/
# Rationale: bench-only memory observation (TrackingAllocator,
# customMem hooks) lives in zstd/benches/. Anything
# bench-instrumentation-shaped in zstd/src/ would bloat the
# published crate. Comments are OK; identifier references in
# actual code are not.
#
# The second `rg -v` filters out comment-only lines so a doc
# comment referencing these names (e.g. "see bench's
# `TrackingAllocator`") doesn't trip the gate. Matches
# `path:line:` followed by leading whitespace and a Rust
# comment prefix (`//`, `///`, `//!`, `/*`, ` *`).
run: |
leaked=$(rg -n --no-heading \
'TrackingAllocator|ALLOC_PEAK|ALLOC_CURRENT|TRACKING_ENABLED|ZSTD_customMem|customMem\(' \
zstd/src/ \
| rg -v '^[^:]+:[0-9]+:\s*(//|/\*|\*)' || true)
if [ -n "$leaked" ]; then
echo "$leaked"
echo "::error::bench-only instrumentation symbols leaked into zstd/src/"
exit 1
fi
c-abi:
# Builds the libzstd-compatible C ABI front end and verifies it is a real
# drop-in: vendored headers match upstream verbatim, the cdylib advertises
# SONAME libzstd.so.1, every declared symbol is exported, and a genuine C
# consumer links + round-trips through the vendored header.
needs: lint
timeout-minutes: 12
runs-on: ubuntu-latest
env:
UPSTREAM_TAG: v1.5.7
steps:
- uses: actions/checkout@v6
with:
# Build + test + header-diff job (no push); don't leave the token
# in the checkout's git config.
persist-credentials: false
- uses: dtolnay/rust-toolchain@stable
with:
components: clippy
- uses: Swatinem/rust-cache@v2
with:
prefix-key: c-abi
- name: Clippy (c-api)
run: cargo clippy -p structured-zstd-c --all-targets -- -D warnings
- name: Unit + ABI tests
run: cargo test -p structured-zstd-c
- name: 32-bit toolchain (i686)
# Upstream libzstd ships on 32-bit platforms; the drop-in C ABI must
# keep building and passing there too (size_t error encoding, struct
# layouts, and overflow guards all change shape at 32-bit usize).
run: |
rustup target add i686-unknown-linux-gnu
sudo apt-get update
sudo apt-get install -y gcc-multilib
- name: Clippy (c-api, i686)
run: cargo clippy -p structured-zstd-c --all-targets --target i686-unknown-linux-gnu -- -D warnings
- name: Unit + ABI tests (i686)
run: cargo test -p structured-zstd-c --target i686-unknown-linux-gnu
- name: Vendored headers match upstream verbatim
# The headers are copied byte-for-byte from the pinned upstream tag; a
# diff means someone edited a vendored header (forbidden) or the pin
# moved without re-vendoring.
run: |
set -euo pipefail
base="https://raw.githubusercontent.com/facebook/zstd/${UPSTREAM_TAG}/lib"
for h in zstd.h zdict.h zstd_errors.h; do
curl --retry 5 --retry-all-errors --retry-delay 2 -fsSL "${base}/${h}" -o "/tmp/${h}.upstream"
if ! diff -u "/tmp/${h}.upstream" "c-api/include/${h}"; then
echo "::error::c-api/include/${h} diverges from upstream ${UPSTREAM_TAG}"
exit 1
fi
done
echo "vendored headers identical to upstream ${UPSTREAM_TAG}"
- name: Build cdylib + staticlib
run: cargo build -p structured-zstd-c
- name: SONAME is libzstd.so.1
run: |
set -euo pipefail
so=target/debug/libstructured_zstd.so
soname=$(readelf -d "$so" | sed -n 's/.*SONAME.*\[\(.*\)\]/\1/p')
echo "SONAME=$soname"
test "$soname" = "libzstd.so.1"
- name: All declared symbols are exported
run: |
set -euo pipefail
so=target/debug/libstructured_zstd.so
exported=$(nm -D --defined-only "$so" | awk '{print $NF}')
missing=0
for sym in \
ZSTD_compress ZSTD_decompress ZSTD_compressBound \
ZSTD_getFrameContentSize ZSTD_findFrameCompressedSize \
ZSTD_isError ZSTD_getErrorCode ZSTD_getErrorName ZSTD_getErrorString \
ZSTD_minCLevel ZSTD_maxCLevel ZSTD_defaultCLevel \
ZSTD_versionNumber ZSTD_versionString \
ZSTD_createCCtx ZSTD_freeCCtx ZSTD_createDCtx ZSTD_freeDCtx \
ZSTD_compressCCtx ZSTD_decompressDCtx ZSTD_sizeof_CCtx ZSTD_sizeof_DCtx \
ZSTD_frameHeaderSize ZSTD_getFrameHeader ZSTD_getFrameHeader_advanced \
ZSTD_findDecompressedSize ZSTD_decompressBound \
ZDICT_trainFromBuffer ZDICT_finalizeDictionary ZDICT_getDictID \
ZDICT_getDictHeaderSize ZDICT_isError ZDICT_getErrorName \
ZSTD_CCtx_loadDictionary ZSTD_CCtx_loadDictionary_byReference \
ZSTD_CCtx_loadDictionary_advanced ZSTD_CCtx_refCDict \
ZSTD_CCtx_refPrefix ZSTD_CCtx_refPrefix_advanced \
ZSTD_DCtx_loadDictionary ZSTD_DCtx_loadDictionary_byReference \
ZSTD_DCtx_loadDictionary_advanced ZSTD_DCtx_refDDict ZSTD_DCtx_refPrefix \
ZSTD_DCtx_refPrefix_advanced \
ZSTD_compress_usingCDict ZSTD_decompress_usingDDict \
ZSTD_createCDict ZSTD_createCDict_byReference ZSTD_freeCDict \
ZSTD_createDDict ZSTD_createDDict_byReference ZSTD_freeDDict \
ZSTD_sizeof_CDict ZSTD_sizeof_DDict \
ZSTD_getDictID_fromCDict ZSTD_getDictID_fromDDict \
ZSTD_compress_usingDict ZSTD_decompress_usingDict \
ZSTD_createCDict_advanced ZSTD_createDDict_advanced \
ZSTD_compress_usingCDict_advanced \
ZSTD_getDictID_fromDict ZSTD_getDictID_fromFrame \
ZSTD_estimateCCtxSize ZSTD_estimateCCtxSize_usingCParams \
ZSTD_estimateCStreamSize_usingCParams \
ZSTD_estimateDCtxSize ZSTD_estimateCStreamSize ZSTD_estimateDStreamSize \
ZSTD_getDecompressedSize ZSTD_sizeof_CStream ZSTD_sizeof_DStream \
ZDICT_trainFromBuffer_fastCover ZDICT_optimizeTrainFromBuffer_fastCover; do
if ! grep -qx "$sym" <<<"$exported"; then
echo "::error::symbol $sym not exported from $so"
missing=1
fi
done
test "$missing" -eq 0
- name: pkg-config reports upstream version
run: |
set -euo pipefail
# Validate every discovered libzstd.pc, not just the first hit: a
# restored cache can leave a stale file ahead of the fresh build.
mapfile -t pcs < <(find target -type f -name libzstd.pc)
test "${#pcs[@]}" -gt 0
for pc in "${pcs[@]}"; do
echo "checking $pc"; cat "$pc"
grep -qx "Version: 1.5.7" "$pc"
done
- name: Real C consumer links + round-trips
run: |
set -euo pipefail
# The cdylib advertises SONAME libzstd.so.1, so a consumer records a
# NEEDED dependency on that name; provide it via a symlink. Also
# expose the canonical `libzstd.so` link name and link with `-lzstd`,
# exactly the path a real drop-in C consumer uses.
ln -sf libstructured_zstd.so target/debug/libzstd.so.1
ln -sf libstructured_zstd.so target/debug/libzstd.so
cc -std=c11 -Wall -Wextra -Ic-api/include c-api/tests/c_consumer.c \
-Ltarget/debug -lzstd -o /tmp/c_consumer
LD_LIBRARY_PATH=target/debug /tmp/c_consumer
- name: musl static drop-in builds + exports symbols
# musl is a std target (not no-std); its default `+crt-static` profile
# makes the static archive `libstructured_zstd.a` the canonical drop-in
# for Alpine / fully-static binaries (the cdylib is dropped under
# crt-static, which is expected). Verify the archive builds and carries
# the exported wrappers.
run: |
set -euo pipefail
rustup target add x86_64-unknown-linux-musl
cargo build -p structured-zstd-c --target x86_64-unknown-linux-musl
a=target/x86_64-unknown-linux-musl/debug/libstructured_zstd.a
test -f "$a"
for sym in ZSTD_compress ZSTD_decompress ZSTD_versionNumber ZSTD_getFrameHeader; do
nm "$a" | grep -qE " T ${sym}$" || { echo "::error::$sym missing from musl staticlib"; exit 1; }
done
echo "musl staticlib OK ($(du -h "$a" | cut -f1))"
test:
needs: lint
timeout-minutes: 15
strategy:
# Each OS hits its own runner-image regressions (macos-latest currently
# ships Homebrew rustup without `cargo`/`rustc` shims under
# `~/.cargo/bin`, so plain `cargo …` resolves to `rustup-init`). Run
# every OS so one glitchy image doesn't mask the others.
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
- name: Prepend toolchain bin to PATH (macos shim workaround)
if: runner.os == 'macOS'
# macos-latest preinstalled rustup ships via Homebrew without the
# `~/.cargo/bin/{cargo,rustc}` proxy shims. `rustup run stable cargo`
# works for the outer call but the cargo it launches then invokes
# `rustc -vV` through the same broken proxy. Putting the toolchain's
# actual `bin/` ahead on PATH gives every nested invocation the
# real binaries.
run: |
TC="$(rustup show active-toolchain | awk '{print $1}')"
echo "$HOME/.rustup/toolchains/$TC/bin" >> "$GITHUB_PATH"
- uses: taiki-e/install-action@nextest
- uses: Swatinem/rust-cache@v2
with:
prefix-key: ${{ runner.os }}-cargo
- name: Test
working-directory: zstd
run: cargo nextest run --profile ci -p structured-zstd --features hash,std,dict_builder
- name: Doc tests
run: cargo test --doc -p structured-zstd --features hash,std,dict_builder
cross-i686:
needs: lint
timeout-minutes: 15
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
- uses: taiki-e/install-action@nextest
- name: Install i686 target
run: rustup target add i686-unknown-linux-gnu
- name: Install 32-bit libs
run: sudo apt-get update && sudo apt-get install -y gcc-multilib
- uses: Swatinem/rust-cache@v2
with:
prefix-key: cross-i686
- name: Test (i686)
working-directory: zstd
run: cargo nextest run --profile ci -p structured-zstd --features hash,std,dict_builder --target i686-unknown-linux-gnu
msrv:
needs: lint
timeout-minutes: 15
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
with:
toolchain: "1.92.0"
- uses: taiki-e/install-action@nextest
- uses: Swatinem/rust-cache@v2
with:
prefix-key: msrv
- name: Test (MSRV)
working-directory: zstd
run: cargo nextest run --profile ci -p structured-zstd --features hash,std,dict_builder
no-std:
# Build-only smoke test that the crate compiles under no_std + alloc.
# The decoder ships #![no_std] with optional `std` feature gating the
# io::Read/Write impls + runtime CPUID detection; without this gate
# nothing catches a regression where a new `use std::...` slips into
# a hot decode path and silently breaks downstream embedded users.
# Two configurations cover the realistic deployment surface:
# alloc-only (zero features) and alloc + xxhash content checksum.
# Clippy runs cargo's compile checks plus linting, so a separate
# `cargo check` pass would just duplicate work.
needs: lint
timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
with:
components: clippy
- uses: Swatinem/rust-cache@v2
with:
prefix-key: no-std
- name: Clippy (no_std)
working-directory: zstd
run: cargo clippy -p structured-zstd --no-default-features -- -D warnings
- name: Clippy (no_std + hash)
working-directory: zstd
run: cargo clippy -p structured-zstd --no-default-features --features hash -- -D warnings
# Embedded-minimal kernel trim: scalar kernel only, all SIMD tiers and
# their dispatch trampolines compiled out. Guards the kernel_* feature
# gating so a future change can't silently break the trimmed build.
- name: Clippy (embedded — kernel_scalar only)
working-directory: zstd
run: cargo clippy -p structured-zstd --no-default-features --features kernel_scalar,hash -- -D warnings
- name: Clippy (embedded_minimal — kernel_scalar + std)
working-directory: zstd
run: cargo clippy -p structured-zstd --no-default-features --features kernel_scalar,std,hash -- -D warnings
wasm:
# #347 / #348: the wasm32 simd128 kernel tier + the npm package
# (@structured-world/structured-zstd). Builds both payloads (simd128 +
# scalar), then runs the Node format cross-check against the C reference
# (@bokuweb/zstd-wasm): our frames decode there and vice versa. `+simd128`
# is scoped to the wasm32 target via `--config` so host build-scripts don't
# warn. Cheap (wasm build + node), so it runs on every PR.
needs: lint
timeout-minutes: 15
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
with:
# This job builds + tests only (no push); don't leave the token in
# the checkout's git config.
persist-credentials: false
- uses: dtolnay/rust-toolchain@stable
with:
targets: wasm32-unknown-unknown
components: clippy
- uses: Swatinem/rust-cache@v2
with:
prefix-key: wasm
- name: Clippy (wasm32 simd128)
run: cargo clippy -p structured-zstd -p structured-zstd-wasm --target wasm32-unknown-unknown --no-default-features --features kernel_simd128 --config 'target.wasm32-unknown-unknown.rustflags=["-C","target-feature=+simd128"]' -- -D warnings
- name: Clippy (wasm32 scalar)
run: cargo clippy -p structured-zstd -p structured-zstd-wasm --target wasm32-unknown-unknown --no-default-features --features kernel_simd128 -- -D warnings
- name: Install wasm-pack
uses: taiki-e/install-action@v2
with:
tool: wasm-pack
- uses: actions/setup-node@v6
with:
node-version: "24"
- name: Build npm package (both payloads + types)
working-directory: zstd-wasm/npm
run: |
npm ci || npm install
npm run build
- name: Format cross-check vs C reference (@bokuweb/zstd-wasm)
working-directory: zstd-wasm/bench
run: |
npm ci || npm install
npm test
- name: wasm size budget
run: |
for p in simd scalar; do
sz=$(stat -c %s "zstd-wasm/npm/$p/structured_zstd_wasm_bg.wasm")
echo "$p payload: $sz bytes"
# Generous ceiling (~768 KiB) over the ~550 KiB baseline; bump
# deliberately if a feature legitimately grows the module.
if [ "$sz" -gt 786432 ]; then
echo "::error::$p .wasm is $sz bytes, over the 768 KiB budget"
exit 1
fi
done
bench-wasm:
# #366: continuous wasm dashboard shard. The native bench matrix runs
# a prebuilt criterion binary (Rust vs C FFI); this shard instead
# builds the npm payloads and runs `node zstd-wasm/bench/bench.mjs`,
# which measures our two wasm tiers (simd128 + scalar) against the
# most popular npm competitor (@bokuweb/zstd-wasm). Its REPORT* lines
# feed a dedicated wasm section on the gh-pages dashboard so the
# wasm speed/ratio vs bokuweb is tracked over time, not just checked
# locally before an npm publish.
#
# Runs on push-to-main (consistent with the rest of the bench pipeline
# being push-only, #362) when `rust_core` changed, the SAME gate as
# the native `bench-matrix` cascade: the wasm payload compiles the core
# crate to wasm32, so its numbers move when the Rust compressor changes,
# not when the wasm-crate / npm glue changes. A manual
# `workflow_dispatch` bypasses the path gate. It publishes only
# `benchmark-wasm.json` to gh-pages — `index.html` stays owned by
# `benchmark-pages` / `pages-only.yml` — so the two publishers write
# disjoint files and never race for the same blob.
name: Bench wasm vs bokuweb (dashboard shard)
needs: [lint, changes]
if: github.event_name == 'workflow_dispatch' || (github.event_name == 'push' && github.ref == 'refs/heads/main' && needs.changes.outputs.rust_core == 'true')
timeout-minutes: 30
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
with:
targets: wasm32-unknown-unknown
- uses: Swatinem/rust-cache@v2
with:
prefix-key: wasm-bench
- name: Install wasm-pack
uses: taiki-e/install-action@v2
with:
tool: wasm-pack
- uses: actions/setup-node@v6
with:
node-version: "24"
# `wasm-pack build --release` runs the `-Oz` wasm-opt pass, for which it
# downloads a pinned binaryen tarball from GitHub releases at build time.
# That download is network-flaky (transient 5xx / rate-limit) and was
# failing the whole dashboard shard. Cache wasm-pack's tool dir so the
# tarball is fetched at most once across runs.
- name: Cache wasm-pack binaryen download
uses: actions/cache@v5
with:
path: ~/.cache/.wasm-pack
key: wasm-pack-binaryen-${{ runner.os }}
- name: Build npm payloads (simd128 + scalar)
working-directory: zstd-wasm/npm
run: |
npm ci || npm install
# Retry the build: on a cold binaryen cache the wasm-opt tarball
# download can transiently fail. Hard-fail after 3 attempts so a real
# (non-transient) breakage still fails the shard.
attempt=0
until npm run build; do
attempt=$((attempt + 1))
if [ "$attempt" -ge 3 ]; then
echo "wasm build failed after $attempt attempts" >&2
exit 1
fi
echo "wasm build attempt $attempt failed (likely transient binaryen fetch); retrying..." >&2
sleep 15
done
- name: Run wasm bench (capture REPORT* lines)
working-directory: zstd-wasm/bench
# `pipefail` + `tee`: bench.mjs exits non-zero on any round-trip
# failure, which must fail the shard. `tee` keeps the captured
# stdout for the parser regardless of pass/fail.
run: |
npm ci || npm install
set -o pipefail
node bench.mjs | tee "$GITHUB_WORKSPACE/wasm-bench-raw.txt"
- name: Parse REPORT* lines into dashboard records
env:
WASM_BENCH_RAW_FILE: ${{ github.workspace }}/wasm-bench-raw.txt
STRUCTURED_ZSTD_BENCH_GENERATED_AT: ${{ github.event.head_commit.timestamp || github.event.repository.updated_at }}
run: python3 .github/scripts/parse-wasm-bench.py
- name: Upload wasm bench artifact
uses: actions/upload-artifact@v7
with:
name: benchmark-wasm-run
path: benchmark-wasm-run.json
if-no-files-found: error
retention-days: 7
- name: Generate bot token
id: bot-token
uses: actions/create-github-app-token@v3
with:
app-id: ${{ secrets.RELEASER_APP_ID }}
private-key: ${{ secrets.RELEASER_APP_PRIVATE_KEY }}
- name: Checkout gh-pages with push token
uses: actions/checkout@v6
with:
ref: gh-pages
token: ${{ steps.bot-token.outputs.token }}
path: gh-pages
- name: Merge into persisted wasm timeseries and publish
env:
WASM_RUN_FILE: ${{ github.workspace }}/benchmark-wasm-run.json
WASM_EXISTING_FILE: ${{ github.workspace }}/gh-pages/dev/bench/benchmark-wasm.json
WASM_OUTPUT_FILE: ${{ github.workspace }}/gh-pages/dev/bench/benchmark-wasm.json
run: |
mkdir -p gh-pages/dev/bench
python3 .github/scripts/merge-wasm-bench.py
cd gh-pages
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
git add dev/bench/benchmark-wasm.json
if git diff --cached --quiet; then
echo "No wasm timeseries change to publish."
exit 0
fi
git commit -m "chore(bench): publish wasm vs bokuweb dashboard data"
# Disjoint-file publishers can still collide on a concurrent
# gh-pages push; rebase onto the latest remote tip and retry a
# few times before giving up.
for attempt in 1 2 3; do
if git push origin gh-pages; then
echo "Published on attempt $attempt."
exit 0
fi
echo "Push rejected (attempt $attempt); rebasing on remote gh-pages."
git pull --rebase origin gh-pages
done
echo "::error::failed to publish wasm timeseries after 3 attempts"
exit 1
codecov:
needs: lint
timeout-minutes: 15
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@nightly
with:
components: llvm-tools-preview
- uses: Swatinem/rust-cache@v2
- uses: taiki-e/install-action@cargo-llvm-cov
- run: cargo llvm-cov -p structured-zstd --features hash,std,dict_builder --lcov --output-path lcov.info
working-directory: zstd
- uses: codecov/codecov-action@v7
with:
files: zstd/lcov.info
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
fuzz:
# Short-budget libFuzzer smoke run across all targets in a single
# runner. The asan+sancov sub-build of `structured-zstd` is the
# dominant cost (~2 min); running the five targets sequentially in
# one job amortises that build (cargo cache reuse drops targets
# 2..5 to a few seconds each) and halves the compute vs a per-
# target matrix. Wall-clock stays bounded by the bench matrix,
# which runs in parallel and takes far longer.
#
# The repo ships a regression corpus in `zstd/fuzz/artifacts/`;
# for every target cargo-fuzz replays that corpus first (any old
# crash that resurfaces fails the job), then runs
# `-max_total_time` seconds of fresh fuzzing on top.
name: Fuzz smoke
needs: lint
timeout-minutes: 20
runs-on: ubuntu-latest
env:
# Override `rust-toolchain.toml` (which pins stable) so `cargo fuzz`
# — which requires `-Z sanitizer` from nightly — gets a nightly
# compiler inside the fuzz crate sub-build.
RUSTUP_TOOLCHAIN: nightly
# Explicit target: the prebuilt `cargo-fuzz` binary installed via
# taiki-e/install-action is statically linked against musl and its
# `default_target()` probe picks `x86_64-unknown-linux-musl`,
# which fails because libFuzzer's AddressSanitizer cannot link
# against a static libc (`sanitizer is incompatible with
# statically linked libc`). Pinning the gnu target sidesteps the
# probe and matches the toolchain rustc actually has stdlib for.
FUZZ_TARGET_TRIPLE: x86_64-unknown-linux-gnu
# Single source of truth for the fuzz target inventory; both the
# corpus replay step and the fresh-fuzz step iterate over this.
FUZZ_TARGETS: "decode encode interop huff0 fse"
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@nightly
- uses: Swatinem/rust-cache@v2
with:
prefix-key: fuzz
workspaces: zstd/fuzz
- name: Install cargo-fuzz
uses: taiki-e/install-action@v2
with:
tool: cargo-fuzz
- name: Replay regression corpus
# Drive the existing crash artifacts through each target so
# any reintroduction of a previously-fixed bug fails CI on
# the same input that originally surfaced it.
working-directory: zstd/fuzz
run: |
for target in $FUZZ_TARGETS; do
if [ -d "artifacts/$target" ] && [ -n "$(ls artifacts/$target 2>/dev/null)" ]; then
echo "Replaying $(ls artifacts/$target | wc -l) regression artifacts for $target"
cargo fuzz run --target "$FUZZ_TARGET_TRIPLE" "$target" artifacts/"$target"/*
else
echo "No regression artifacts for $target — skipping replay"
fi
done
- name: Fuzz (90s per target)
working-directory: zstd/fuzz
# `-max_total_time` is libFuzzer's own per-target time cap. The job's
# `timeout-minutes` is the hard ceiling on top of it, covering the
# case where a target wedges in setup before libFuzzer's timer starts.
run: |
for target in $FUZZ_TARGETS; do
echo "::group::Fuzz $target"
cargo fuzz run --target "$FUZZ_TARGET_TRIPLE" "$target" -- -max_total_time=90 -timeout=30
echo "::endgroup::"
done
- name: Upload new crash artifacts on failure
if: failure()
uses: actions/upload-artifact@v7
with:
name: fuzz-artifacts
path: zstd/fuzz/artifacts/
if-no-files-found: ignore
retention-days: 14
bench-matrix:
# Canonical bench target inventory. Every downstream bench job
# (`bench-build`, `benchmark`, `benchmark-aggregate`,
# `benchmark-pages`) consumes this so a new target ID is added
# in exactly one place. `build_setup` runs only on `bench-build`;
# `runtime_setup` runs only on the bench shards (e.g. `i686-gnu`
# needs gcc-multilib's 32-bit loader at runtime, while `x86_64-musl`
# only needs `musl-tools` for the build).
#
# The bench pipeline runs ONLY on push-to-main, never on PRs (#362).
# Benchmarks on GitHub free runners (~6-8x slower than a dev box) cost
# ~20 min per (target, L22) shard, and the pre-merge regression gate they
# used to feed added runner load for little signal. `main` is the
# immutable historical record, so perf is tracked post-merge on the
# `dev/bench` dashboard, where the numbers are published. This single gate
# cascades through `needs:` so EVERY downstream bench job
# (`bench-build`, `benchmark`, `benchmark-aggregate`, `benchmark-pages`)
# stays green-skipped on PRs without each needing its own filter.
# See #164 (sharding) and #362 (push-only gate).
#
# Second gate: `needs.changes.outputs.rust_core`. A push to main that
# touched ONLY wasm / npm / ts / js / CI / docs cannot move the
# Rust-vs-FFI numbers, so the bench pipeline is skipped on it (the
# published dashboard snapshot stays the last Rust-core baseline).
# Same cascade — skipping `bench-matrix` skips every downstream bench job.
name: Resolve bench target matrix
needs: [lint, changes]
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && needs.changes.outputs.rust_core == 'true'
runs-on: ubuntu-latest
outputs:
targets: ${{ steps.set.outputs.targets }}
ids_csv: ${{ steps.set.outputs.ids_csv }}
shards: ${{ steps.set.outputs.shards }}
shards_csv: ${{ steps.set.outputs.shards_csv }}
steps:
- id: set
env:
# Drives the shard plan below. On a `pull_request` only the two
# canonical levels (`level_3_dfast` = donor default,
# `level_22_btultra2` = max compression) are bundled into a single
# shard per target: three shards total. That branch is currently
# unreachable, because the job-level `if` above admits only pushes
# to main (#362). On a `push: main` (post-merge), one shard per
# strategy group runs: six groups (`fast-neg`, `fast-dfast`,
# `lazy-lower`, `lazy-upper`, `btopt`, `btultra2`; the fast and lazy
# families are split to keep the worst-case per-shard wall under
# the 120-min CI cap) × three targets = 18 shards, so the
# published gh-pages snapshot keeps full coverage for the
# dashboard + tagged baselines (#164).
EVENT_NAME: ${{ github.event_name }}
run: |
cat > targets.json <<'EOF'
[
{
"id": "x86_64-gnu",
"target_triple": "x86_64-unknown-linux-gnu",
"build_setup": "",
"runtime_setup": "",
"timeout_minutes": 120
},
{
"id": "i686-gnu",
"target_triple": "i686-unknown-linux-gnu",
"build_setup": "sudo apt-get update && sudo apt-get install -y gcc-multilib libc6-dev-i386",
"runtime_setup": "sudo apt-get update && sudo apt-get install -y libc6-dev-i386",
"timeout_minutes": 120
},
{
"id": "x86_64-musl",
"target_triple": "x86_64-unknown-linux-musl",
"build_setup": "sudo apt-get update && sudo apt-get install -y musl-tools",
"runtime_setup": "",
"timeout_minutes": 120
}
]
EOF
targets_compact=$(jq -c . targets.json)
ids_csv=$(jq -r '[.[].id] | join(",")' targets.json)
echo "targets=$targets_compact" >> "$GITHUB_OUTPUT"
echo "ids_csv=$ids_csv" >> "$GITHUB_OUTPUT"
echo "Bench targets: $ids_csv"
# Shard plan: each entry runs a comma-separated set of
# levels through one bench binary invocation via the
# `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` env var. `id` drives
# the artifact name (`benchmark-shard-<target>-<id>`) and
# the per-file suffix in the markdown / JSON outputs.
#
# PR event = single shard covering the two canonical levels
# so reviewers see ratio + speed + memory deltas on the
# default-level path (level_3_dfast) and the max-compression
# path (level_22_btultra2) within minutes. Strategy groups
# mirror `clevels.h` + `StrategyTag::for_level` so an
# entire strategy's levels share a runner — keeps per-job
# build overhead amortised across the levels of that family.
if [ "$EVENT_NAME" = "pull_request" ]; then
cat > shards.json <<'EOF'
[
{
"id": "pr-canonical",
"levels": "level_3_dfast,level_22_btultra2"
}
]
EOF
else
cat > shards.json <<'EOF'
[
{
"id": "fast-neg",
"levels": "level_-7_fast,level_-6_fast,level_-5_fast,level_-4_fast,level_-3_fast,level_-2_fast"
},
{
"id": "fast-dfast",
"levels": "level_-1_fast,level_1_fast,level_2_fast,level_3_dfast,level_4_dfast,level_5_greedy"
},
{
"id": "lazy-lower",
"levels": "level_6_lazy,level_7_lazy,level_8_lazy,level_9_lazy,level_10_lazy"
},
{
"id": "lazy-upper",
"levels": "level_11_lazy,level_12_lazy,level_13_lazy,level_14_lazy,level_15_lazy"
},
{
"id": "btopt",
"levels": "level_16_btopt,level_17_btopt,level_1_fast_ldm,level_22_btultra2_ldm,level_1_fast_ldm_dict,level_22_btultra2_ldm_dict"
},
{
"id": "btultra2",
"levels": "level_18_btultra,level_19_btultra2,level_20_btultra2,level_21_btultra2,level_22_btultra2"
}
]
EOF
fi
shards_compact=$(jq -c . shards.json)
shards_csv=$(jq -r '[.[].id] | join(",")' shards.json)
echo "shards=$shards_compact" >> "$GITHUB_OUTPUT"
echo "shards_csv=$shards_csv" >> "$GITHUB_OUTPUT"
echo "Bench shards ($EVENT_NAME): $shards_csv"
bench-build:
# Build the bench binaries (criterion `compare_ffi` plus
# `compare_ffi_memory`) once per target. Every downstream bench shard
# (target × strategy group) downloads them via the
# `bench-binary-<target>` artifact and runs them directly, with no
# rebuild per shard. Saves ~4-7 min on each of the 18 shard runners.
name: Build bench binary (${{ matrix.bench.id }})
needs: [lint, bench-matrix]
timeout-minutes: 20
strategy:
fail-fast: false
matrix:
bench: ${{ fromJSON(needs.bench-matrix.outputs.targets) }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
- name: Install benchmark target
run: rustup target add ${{ matrix.bench.target_triple }}
- name: Install build toolchain dependencies
if: matrix.bench.build_setup != ''
run: ${{ matrix.bench.build_setup }}
- uses: Swatinem/rust-cache@v2
with:
prefix-key: bench-${{ matrix.bench.id }}
- name: Build compare_ffi + compare_ffi_memory bench binaries
env:
CC_x86_64_unknown_linux_musl: musl-gcc
# The donor zstd-sys C library uses runtime feature detection
# (is_x86_feature_detected!-equivalent) so it transparently
# picks up BMI2/AVX2/etc. on the runner. Pure-Rust hot paths
# gate intrinsics on COMPILE-time cfg!(target_feature = ...)
# and the default rustc x86_64 target ships with SSE2 only.
# Without explicit target selection the bench compares
# "donor with full ISA" vs "us with SSE2 baseline" — not
# apples-to-apples.
#
# Use a DETERMINISTIC baseline (x86-64-v3 = AVX2 + BMI1/BMI2 +
# FMA + LZCNT + MOVBE, roughly the Haswell-era, 2013+, feature
# level) ONLY for x86_64 targets via target.<triple>.rustflags. NOT
# target-cpu=native: that picks whatever CPU the BUILD runner
# has, which (a) varies across github-runners, (b) crashes
# with SIGILL when a bench shard runner lacks features the
# build runner had, and (c) is meaningless for cross-compile
# targets like i686-unknown-linux-gnu.
#
# i686 / non-x86 targets keep the default rustc baseline;
# both x86_64 triples (gnu and musl) get x86-64-v3 below.
# Enabling x86-64-v3 measured +8.5% on
# decompress/level_-1_fast/decodecorpus-z000033/c_stream on
# i9-9900K; the win comes from _bzhi_u64-backed
# mask_lower_bits in the FSE state-update hot path.
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS: "-C target-cpu=x86-64-v3"
CARGO_TARGET_X86_64_UNKNOWN_LINUX_MUSL_RUSTFLAGS: "-C target-cpu=x86-64-v3"
run: |
# `--no-run` builds without executing; `--message-format=json`
# exposes the resolved binary path in the build log so we
# can ship just the executable to shards. Building BOTH
# bench binaries in one cargo invocation reuses dependency
# compilation between them (zstd-sys, criterion, etc.).
cargo bench --bench compare_ffi --bench compare_ffi_memory \
-p structured-zstd --features dict_builder \
--target ${{ matrix.bench.target_triple }} --no-run \
--message-format=json > build.log
mkdir -p bench-binary
for name in compare_ffi compare_ffi_memory; do
bin_path=$(jq -r --arg n "$name" 'select(.executable != null and (.target.name == $n)) | .executable' build.log | tail -1)
if [ -z "$bin_path" ] || [ ! -x "$bin_path" ]; then
echo "ERROR: failed to locate $name binary in cargo output" >&2
jq -r 'select(.executable != null) | "\(.target.name) \(.executable)"' build.log >&2
exit 1
fi
cp "$bin_path" "bench-binary/$name"
chmod +x "bench-binary/$name"
echo "$name size: $(wc -c < "bench-binary/$name") bytes"
done
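# Illustrative sketch of the kind of build.log line the jq filter above
# selects (an abridged cargo `compiler-artifact` message; the path and
# hash are placeholders, not real output):
#   {"reason":"compiler-artifact","target":{"name":"compare_ffi","kind":["bench"]},
#    "executable":"<target-dir>/<triple>/release/deps/compare_ffi-<hash>"}
# Library artifacts carry "executable":null, which is why the filter
# requires `.executable != null` before comparing `.target.name`;
# `tail -1` then keeps the last match if more than one line qualifies.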
- name: Upload bench binary
uses: actions/upload-artifact@v7
with:
name: bench-binary-${{ matrix.bench.id }}
path: bench-binary/
if-no-files-found: error
retention-days: 7
benchmark:
name: Bench ${{ matrix.bench.id }} / ${{ matrix.shard.id }}
needs: [bench-build, bench-matrix]
timeout-minutes: ${{ matrix.bench.timeout_minutes }}
strategy:
# Matrix split target × level. The pre-built binary from
# `bench-build` is what each shard executes, so the runtime
# budget per shard is purely the criterion measurement +
# post-processing. `level22` on i686 is still the natural
# bottleneck (~20 min); every other (target, level) combo
# finishes well under 10 min.
fail-fast: false
matrix:
bench: ${{ fromJSON(needs.bench-matrix.outputs.targets) }}
# Shard plan is resolved in `bench-matrix.outputs.shards`.
# Each shard owns one strategy-grouped level bundle (PR runs
# a single `pr-canonical` shard with level_3 + level_22; main
# runs six strategy groups, see #164 for the fast/lazy
# split rationale). `shard.levels` is a CSV that we forward
# into `STRUCTURED_ZSTD_BENCH_LEVEL_FILTER` so the bench
# binary iterates the requested levels in one process.
shard: ${{ fromJSON(needs.bench-matrix.outputs.shards) }}
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
# Needed for: corpus files referenced via `env!("CARGO_MANIFEST_DIR")`
# inside the bench binary, the run-benchmarks.sh script, and
# the python post-processor.
- name: Install target runtime dependencies
if: matrix.bench.runtime_setup != ''
run: ${{ matrix.bench.runtime_setup }}
- name: Download pre-built bench binary
uses: actions/download-artifact@v8
with:
name: bench-binary-${{ matrix.bench.id }}
path: bench-binary
- name: Mark bench binaries executable
# The artifact upload/download round trip does not preserve the
# executable bit (downloaded files land as mode 0644). Both binaries shipped
# in the artifact need +x — the memory binary's `-x` check in
# run-benchmarks.sh would otherwise reject it on main pushes.
run: |
chmod +x bench-binary/compare_ffi
if [ -f bench-binary/compare_ffi_memory ]; then
chmod +x bench-binary/compare_ffi_memory
fi
- name: Run benchmarks (filtered to shard's levels)
env:
STRUCTURED_ZSTD_BENCH_TARGET: ${{ matrix.bench.id }}
STRUCTURED_ZSTD_BENCH_TRIPLE: ${{ matrix.bench.target_triple }}
STRUCTURED_ZSTD_BENCH_GENERATED_AT: ${{ github.event_name == 'pull_request' && github.event.pull_request.updated_at || github.event.head_commit.timestamp || github.event.repository.updated_at }}
STRUCTURED_ZSTD_BENCH_LEVEL_FILTER: ${{ matrix.shard.levels }}
# run-benchmarks.sh: re-exec this binary instead of `cargo bench`.
STRUCTURED_ZSTD_BENCH_BIN: ${{ github.workspace }}/bench-binary/compare_ffi
# Memory bench runs only on main pushes — its TrackingAllocator
# measures peak alloc bytes precisely but adds per-allocation
# overhead, so we don't want it on every PR review cycle. On
# main pushes (`event_name == 'push'`) the second binary is
# invoked sequentially by run-benchmarks.sh and its REPORT_MEM
# lines feed the dashboard's `peak_alloc_bytes` metric.
STRUCTURED_ZSTD_BENCH_MEMORY_BIN: ${{ github.event_name == 'push' && format('{0}/bench-binary/compare_ffi_memory', github.workspace) || '' }}
# The prebuilt bench binary is launched directly (not via cargo),
# so `CARGO_MANIFEST_DIR` is unset at runtime and the
# `env::var("CARGO_MANIFEST_DIR")` lookup fails inside it.
# Without this override, `load_decode_corpus_scenario()` falls
# back to the synthetic 1 MiB corpus and the bench label silently
# flips from `decodecorpus-z000033` to `decodecorpus-synthetic-1m`,
# making dashboards diverge from a baseline produced via
# `cargo bench`. Point the binary at the checkout's real fixture.
STRUCTURED_ZSTD_BENCH_CORPUS_PATH: ${{ github.workspace }}/zstd/decodecorpus_files/z000033
run: bash .github/scripts/run-benchmarks.sh
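# Local reproduction sketch (assumptions: a bench binary already built and
# copied to ./bench-binary/compare_ffi, the repo root as working directory,
# and a script that tolerates the remaining variables being unset, which
# is not confirmed here):
#   STRUCTURED_ZSTD_BENCH_LEVEL_FILTER="level_3_dfast,level_22_btultra2" \
#   STRUCTURED_ZSTD_BENCH_BIN="$PWD/bench-binary/compare_ffi" \
#   STRUCTURED_ZSTD_BENCH_CORPUS_PATH="$PWD/zstd/decodecorpus_files/z000033" \
#   bash .github/scripts/run-benchmarks.sh
# Leaving STRUCTURED_ZSTD_BENCH_MEMORY_BIN empty mirrors the PR path, where
# the memory bench is skipped.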
- name: Rename benchmark outputs for matrix artifact
run: |
mv benchmark-results.json benchmark-results.${{ matrix.bench.id }}.${{ matrix.shard.id }}.json
mv benchmark-report.md benchmark-report.${{ matrix.bench.id }}.${{ matrix.shard.id }}.md
mv benchmark-delta.json benchmark-delta.${{ matrix.bench.id }}.${{ matrix.shard.id }}.json
mv benchmark-delta.md benchmark-delta.${{ matrix.bench.id }}.${{ matrix.shard.id }}.md
mv benchmark-relative.json benchmark-relative.${{ matrix.bench.id }}.${{ matrix.shard.id }}.json
- name: Upload benchmark shard artifacts
uses: actions/upload-artifact@v7
with:
name: benchmark-shard-${{ matrix.bench.id }}-${{ matrix.shard.id }}
path: |
benchmark-results.${{ matrix.bench.id }}.${{ matrix.shard.id }}.json
benchmark-report.${{ matrix.bench.id }}.${{ matrix.shard.id }}.md
benchmark-delta.${{ matrix.bench.id }}.${{ matrix.shard.id }}.json
benchmark-delta.${{ matrix.bench.id }}.${{ matrix.shard.id }}.md
benchmark-relative.${{ matrix.bench.id }}.${{ matrix.shard.id }}.json
if-no-files-found: error
# Intermediate inputs to `benchmark-aggregate`; match the
# 7-day retention used for `bench-binary-*`.
retention-days: 7
benchmark-aggregate:
name: Aggregate benchmark shards per target
needs: [benchmark, bench-matrix]
timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Generate bot token
id: bot-token
if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
uses: actions/create-github-app-token@v3
with:
app-id: ${{ secrets.RELEASER_APP_ID }}
private-key: ${{ secrets.RELEASER_APP_PRIVATE_KEY }}
- name: Download benchmark shard artifacts
uses: actions/download-artifact@v8
with:
pattern: benchmark-shard-*
path: benchmark-artifacts
- name: Aggregate level shards into per-target files
env:
AGGREGATE_TARGETS: ${{ needs.bench-matrix.outputs.ids_csv }}
run: python3 .github/scripts/aggregate-bench-levels.py
- name: Upload aggregated benchmark artifact
# Single combined artifact carrying per-target consolidated
# files. `merge-benchmarks.py` rglob's the download root so
# it picks them up regardless of subdir layout; this lets
# bench-pages download one artifact instead of one per target.
uses: actions/upload-artifact@v7
with:
name: benchmark-aggregated
path: |
benchmark-results.*.json
benchmark-report.*.md
benchmark-delta.*.json
benchmark-delta.*.md
benchmark-relative.*.json
if-no-files-found: error
retention-days: 7
# Save baseline (main push only). Intentionally NO `fail-on-alert`
# and NO `comment-on-alert` — this step only records the new baseline
# for the dashboard. The pre-merge PR regression gate was removed in
# #362 (it forced the full bench matrix on every PR for little signal);
# perf is now surfaced post-merge via the `dev/bench` dashboard. Adding
# an alert/fail path here would re-create the stuck-baseline cascade
# from #158: a regression on main push would fail the step before
# `save-data-file` ran, freezing the baseline indefinitely.
- name: Save benchmark baseline (main push only)
if: steps.bot-token.outputs.token != '' && github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: benchmark-action/github-action-benchmark@v1
with:
name: "structured-zstd vs C FFI (x86_64-gnu)"
tool: customSmallerIsBetter
output-file-path: benchmark-results.x86_64-gnu.json
github-token: ${{ steps.bot-token.outputs.token }}
auto-push: true
save-data-file: true
comment-on-alert: false
fail-on-alert: false
alert-threshold: "130%"
benchmark-data-dir-path: dev/bench
benchmark-pages:
name: Publish benchmark pages payloads
needs: benchmark-aggregate
# Push-to-main only — the whole bench pipeline is gated there now (#362);
# the PR branch of this filter was dead once `bench-matrix` stopped
# running on PRs (cascade-skip), and PR runs never published anyway.
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
timeout-minutes: 20
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Generate bot token
id: bot-token
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: actions/create-github-app-token@v3
with:
app-id: ${{ secrets.RELEASER_APP_ID }}
private-key: ${{ secrets.RELEASER_APP_PRIVATE_KEY }}
- name: Checkout gh-pages with push token
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: actions/checkout@v6
with:
ref: gh-pages
token: ${{ steps.bot-token.outputs.token }}
path: gh-pages
# Only the aggregated per-target files feed merge-benchmarks.py.
# The level shards (benchmark-shard-*) are intermediate inputs
# to `benchmark-aggregate` and would otherwise pollute
# merge-benchmarks.py's per-target name extraction with
# `<target>.<level>` keys, so we download just the single
# combined `benchmark-aggregated` artifact here.
- uses: actions/download-artifact@v8
with:
name: benchmark-aggregated
path: benchmark-artifacts
- name: Merge relative/delta payloads
run: python3 .github/scripts/merge-benchmarks.py
- name: Publish benchmark reports to gh-pages
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: |
mkdir -p gh-pages/dev/bench
cp merged/benchmark-report.md gh-pages/dev/bench/benchmark-report.md
cp merged/benchmark-delta.json gh-pages/dev/bench/benchmark-delta.json
cp merged/benchmark-delta.md gh-pages/dev/bench/benchmark-delta.md
cp merged/benchmark-summary.md gh-pages/dev/bench/benchmark-summary.md
cp merged/benchmark-delta-summary.md gh-pages/dev/bench/benchmark-delta-summary.md
cp merged/benchmark-relative.json gh-pages/dev/bench/benchmark-relative.json
cp .github/bench-dashboard/index.html gh-pages/dev/bench/index.html
cd gh-pages
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
git add dev/bench/index.html dev/bench/benchmark-report.md dev/bench/benchmark-delta.json dev/bench/benchmark-delta.md dev/bench/benchmark-relative.json dev/bench/benchmark-summary.md dev/bench/benchmark-delta-summary.md
git diff --cached --quiet || git commit -m "chore(bench): publish benchmark reports"
git push origin gh-pages