Benchmark script to wrap dtest #202
Run bazel build //library/tests:dtest before benchmarking when --build is passed; DRY_RUN prints the build command without executing it. Co-authored-by: Cursor <cursoragent@cursor.com>
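A minimal sketch of how a dry-run guard around the build step could look. The helper name `run_cmd` is hypothetical, and treating any non-empty `DRY_RUN` as "on" is an assumption; the actual script may test the variable differently.

```shell
# Hypothetical helper: print the command when DRY_RUN is non-empty,
# otherwise execute it. Not taken from benchmark.sh itself.
run_cmd() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"      # show what would run, nothing is executed
  else
    "$@"                    # run the command with its arguments intact
  fi
}

# With DRY_RUN set, only the build command line is printed.
DRY_RUN=1
run_cmd bazel build //library/tests:dtest
```

Routing every external command through one helper keeps the dry-run output identical to what a real run would execute.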
Arguments after -- are forwarded to every dtest invocation (e.g. thread count and -r); benchmark -n remains the repeat count before --. Co-authored-by: Cursor <cursoragent@cursor.com>
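The forwarding described here can be sketched as follows. This is an illustration under assumptions, not the script's code: `run_benchmark` is a hypothetical name, and `DTEST` is a stand-in that echoes its arguments so the sketch runs without a real binary.

```shell
# Stand-in for the dtest binary (assumption: the real script calls the built binary).
DTEST="echo dtest"

# Hypothetical sketch: consume benchmark options up to "--",
# then pass everything after it to each dtest invocation.
run_benchmark() {
  repeats=1
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --) shift; break ;;                       # the rest belongs to dtest
      -n) repeats=$2; shift 2 ;;                # benchmark repeat count
      *)  echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  i=0
  while [ "$i" -lt "$repeats" ]; do
    $DTEST "$@"                                 # "$@" now holds only the forwarded args
    i=$((i + 1))
  done
}

# Two repeats; "-n 4 -r" after "--" reaches dtest untouched.
run_benchmark -n 2 -- -n 4 -r
```

Because parsing stops at the first `--`, a `-n` on either side of it is unambiguous.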
Avoids clashing with dtest -n; DRY_RUN now prints commands only without fake timing rows or summary. Co-authored-by: Cursor <cursoragent@cursor.com>
Clarify default vs env overrides for --repeats and --max-deals; error now says 10^n <= N to match filtering. Co-authored-by: Cursor <cursoragent@cursor.com>
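One way to read the `10^n <= N` rule is as a filter over power-of-ten hand-file sizes. The sketch below assumes that reading; the function name `sizes_up_to` and the exact error wording are hypothetical.

```shell
# Hypothetical sketch: list the sizes 1, 10, 100, ... that do not exceed max-deals.
sizes_up_to() {
  max=$1
  case "$max" in
    ''|*[!0-9]*) echo "max-deals must be a non-negative integer" >&2; return 2 ;;
  esac
  size=1
  out=""
  while [ "$size" -le "$max" ]; do
    out="$out $size"
    size=$((size * 10))
  done
  if [ -z "$out" ]; then
    echo "no hand file satisfies 10^n <= $max" >&2
    return 1
  fi
  echo "${out# }"           # drop the leading space
}

sizes_up_to 100
sizes_up_to 50
```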
Use --branch/--compare (and BRANCH/COMPARE env) instead of dtest1/dtest2; widen ver/file columns and align summary output. Co-authored-by: Cursor <cursoragent@cursor.com>
Format speedup as a fixed-width string so the trailing x stays in-column; give note a 15-char field. Co-authored-by: Cursor <cursoragent@cursor.com>
Validate repeats, skip binary checks in dry-run, run branch before compare, and tighten dtest timing parse so the cmp/branch summary is easier to read. Co-authored-by: Cursor <cursoragent@cursor.com>
Pull request overview
Adds a new benchmark.sh helper script to benchmark library/tests/dtest across solver modes and hand-file sizes, with optional building and binary-to-binary comparison.
Changes:
- Introduces a Bash benchmark runner that iterates solver/file combinations and prints per-run timing rows.
- Adds optional `--build`, `--compare`, `--repeats`, and `--max-deals` controls plus `--` pass-through args to `dtest`.
- Adds an aggregated comparison summary when a second binary is provided.
Use a portable mktemp template, normalize dtest "zero" timings, warn only when user/sys are missing, and label the compare summary as avg user ms. Co-authored-by: Cursor <cursoragent@cursor.com>
Run solve before calc and largest hand files first to reduce warmup bias, aggregate summary on avg_user per hand, and add --reverse to run compare before branch when comparing two binaries. Co-authored-by: Cursor <cursoragent@cursor.com>
Parse Number of hands and compute user/hands when Avg user time is missing (e.g. zero user time), replacing the broken Copilot autofix. Co-authored-by: Cursor <cursoragent@cursor.com>
Show n/a/equal for 0/0, inf/branch faster when compare is nonzero and branch is zero, and keep 0.00x/compare faster when only compare is zero. Co-authored-by: Cursor <cursoragent@cursor.com>
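The zero-handling cases listed in this commit can be sketched with `awk`, taking the ratio as compare over branch. The function name `speedup` and the notes printed in the ordinary nonzero case are assumptions added for illustration; only the three zero cases come from the commit message.

```shell
# Hypothetical sketch. Usage: speedup <branch_ms> <compare_ms>
# Prints "<ratio> <note>", with the ratio computed as compare / branch.
speedup() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    if (b == 0 && c == 0)  print "n/a equal"             # 0/0: nothing to compare
    else if (b == 0)       print "inf branch faster"     # branch rounded to zero
    else if (c == 0)       print "0.00x compare faster"  # compare rounded to zero
    else {
      # Assumed labels for the ordinary case.
      note = (c > b) ? "branch faster" : ((c < b) ? "compare faster" : "equal")
      printf "%.2fx %s\n", c / b, note
    }
  }'
}

speedup 0 0
speedup 0 12
speedup 12 0
speedup 10 20
```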
Run branch then compare (or reverse) for each repeat before advancing, so paired timings see the same CPU warmth instead of all branch runs first. Co-authored-by: Cursor <cursoragent@cursor.com>
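The paired ordering can be sketched as a single loop over repeats, with both binaries run inside each iteration. `run_pairs` and `run_one` are hypothetical names, and `run_one` only prints a label here instead of invoking dtest.

```shell
# Stand-in for one timed dtest invocation (assumption: the real script runs a binary).
run_one() { echo "run $2: $1"; }

# Hypothetical sketch. Usage: run_pairs <repeats> <reverse: 0 or 1>
run_pairs() {
  repeats=$1
  reverse=$2
  i=1
  while [ "$i" -le "$repeats" ]; do
    if [ "$reverse" = 1 ]; then
      run_one compare "$i"; run_one branch "$i"
    else
      run_one branch "$i"; run_one compare "$i"
    fi
    i=$((i + 1))            # advance only after both binaries have run
  done
}

run_pairs 2 0
```

Interleaving means each pair of measurements is taken close together in time, so slow drift in machine state affects both sides roughly equally.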
zzcgumn left a comment
Interesting idea. I am not sure about adding this to the CI run, as we don't yet have an idea of how large a performance regression we can tolerate.
I did a couple of comparisons like this manually while refactoring from 2.9 to 3.0. The main problem I found was methods that had previously been inlined but were not inlined in my release candidate code.
I had not considered adding this to the CI run. I like the idea! Something simpler could work for CI: wrap dtest and fail if the runtime for any of list1/10/100 or calc 1/10/100 becomes slower by more than a threshold. There's a fair bit of random variation from run to run, though, and some non-random variation that depends on run order. On my Mac, the current version speeds up by more than 10% when the machine is "warm", in particular when running solve after many calc runs.
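A rough sketch of the threshold check floated above, assuming timings in milliseconds and a percentage threshold. `check_regression` is a hypothetical name and nothing like it exists in this PR; given the run-to-run noise mentioned, a real gate would likely need averaged timings and a generous threshold.

```shell
# Hypothetical sketch. Usage: check_regression <baseline_ms> <current_ms> <threshold_pct>
# Returns non-zero when current is slower than baseline by more than the threshold.
check_regression() {
  awk -v base="$1" -v cur="$2" -v pct="$3" 'BEGIN {
    slower = (base > 0 && (cur - base) / base * 100 > pct) ? 1 : 0
    exit slower
  }'
}

if check_regression 100 105 10; then
  echo "within threshold"
else
  echo "regression"
fi
```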
This reverts commit 5626ea7.
Assume summary avg_user values are always positive and compute u2/u1 directly instead of guarding a zero branch average. Co-authored-by: Cursor <cursoragent@cursor.com>
Add --details to opt into per-run timing rows; branch-only runs still print the full table as before. Co-authored-by: Cursor <cursoragent@cursor.com>
On a tty, print per-run timing rows while --compare runs then erase them (including the table header) before the summary; use --details to keep rows. Co-authored-by: Cursor <cursoragent@cursor.com>
The ratio column is self-explanatory without the inline note. Co-authored-by: Cursor <cursoragent@cursor.com>
Default 0.5% (--epsilon / EPSILON) marks branch and compare as equal when avg user times differ by less than that relative threshold. Co-authored-by: Cursor <cursoragent@cursor.com>
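A sketch of one possible relative-threshold test. The commit does not say which value the difference is taken relative to, so dividing by the larger of the two averages is an assumption, as is the function name `within_epsilon`.

```shell
# Hypothetical sketch. Usage: within_epsilon <branch_ms> <compare_ms> [epsilon_pct]
# Succeeds when the two averages differ by less than epsilon percent
# of the larger one (assumed definition of "relative").
within_epsilon() {
  awk -v b="$1" -v c="$2" -v eps="${3:-0.5}" 'BEGIN {
    d = b - c
    if (d < 0) d = -d
    m = (b > c) ? b : c
    if (m == 0 || d / m * 100 < eps) exit 0   # both zero also counts as equal
    exit 1
  }'
}

if within_epsilon 1000 1004; then echo "equal"; else echo "different"; fi
if within_epsilon 1000 1010; then echo "equal"; else echo "different"; fi
```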
Show the compare-mode tolerance flag in the header and help examples. Co-authored-by: Cursor <cursoragent@cursor.com>
Document that zero or skewed averages can come from per-interval int ms rounding and point to accumulating microseconds in TestTimer.cpp. Co-authored-by: Cursor <cursoragent@cursor.com>
Say rounding to zero rather than a rounding error. Co-authored-by: Cursor <cursoragent@cursor.com>
Sample output:
Compare against a binary from another branch or v2.9. Comparing against a branch could be automated; as it stands, I build a dtest in another branch, rename it, and move it to dds/..