
Benchmark script to wrap dtest#202

Merged
tameware merged 25 commits into dds-bridge:develop from tameware:benchmark on Jun 21, 2026

tameware (Collaborator) commented Jun 19, 2026

Sample output:

$ ./benchmark.sh 
DDS dtest benchmark
===================
branch:      /Users/adamw/src/dds/bazel-bin/library/tests/dtest
hands dir:   /Users/adamw/src/dds/hands
max_deals:   100
files:       list100.txt list10.txt list1.txt
git branch:  benchmark
repeats:     1

solver file              ver  user_ms   sys_ms   avg_user  ratio run
------ ------------- ------- -------- -------- ---------- ------ ---
solve  list100.txt    branch      284     2428       2.84   8.55 1/1
solve  list10.txt     branch       47      133       4.70   2.83 1/1
solve  list1.txt      branch       15       15      15.00   1.00 1/1
calc   list100.txt    branch     1457    14900      14.57  10.23 1/1
calc   list10.txt     branch      237     1178      23.70   4.97 1/1
calc   list1.txt      branch       45      149      45.00   3.31 1/1

Completed 6 runs (6 expected).

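You can also compare against a binary built from another branch or from v2.9. Comparing against a branch could be automated; for now I build dtest in another branch, rename it, and move it to dds/.. (the parent of the dds checkout), which is why the example below passes ../dtest_solver_context_reuse: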

$ ./benchmark.sh --build --compare ../dtest_solver_context_reuse --max-deals 100
Building //library/tests:dtest...
Starting local Bazel server (9.1.0 Homebrew) and connecting to it...
INFO: Analyzed target //library/tests:dtest (126 packages loaded, 1424 targets configured).
INFO: Found 1 target...
Target //library/tests:dtest up-to-date:
  bazel-bin/library/tests/dtest
INFO: Elapsed time: 3.827s, Critical Path: 0.02s
INFO: 1 process: 96 action cache hit, 1 internal.
INFO: Build completed successfully, 1 total action
DDS dtest benchmark
===================
branch:      /Users/adamw/src/dds/bazel-bin/library/tests/dtest
compare:     ../dtest_solver_context_reuse
run order:   branch, compare
hands dir:   /Users/adamw/src/dds/hands
max_deals:   100
files:       list100.txt list10.txt list1.txt
git branch:  benchmark
repeats:     1

solver file              ver  user_ms   sys_ms   avg_user  ratio run
------ ------------- ------- -------- -------- ---------- ------ ---
solve  list100.txt    branch      300     2417       3.00   8.06 1/1
solve  list100.txt   compare      246     2450       2.46   9.96 1/1
solve  list10.txt     branch       47      127       4.70   2.70 1/1
solve  list10.txt    compare       50      134       5.00   2.68 1/1
solve  list1.txt      branch       15       15      15.00   1.00 1/1
solve  list1.txt     compare       16       16      16.00   1.00 1/1
calc   list100.txt    branch     1443    14842      14.43  10.29 1/1
calc   list100.txt   compare     1408    14731      14.08  10.46 1/1
calc   list10.txt     branch      240     1123      24.00   4.68 1/1
calc   list10.txt    compare      241     1173      24.10   4.87 1/1
calc   list1.txt      branch       46      147      46.00   3.20 1/1
calc   list1.txt     compare       45      147      45.00   3.27 1/1

Summary (branch vs compare, avg user ms; cmp/branch > 1 => branch faster)
==============================================================================
solver file           compare_avg   branch_avg cmp/branch note           
------ ------------- ------------ ------------ ---------- ---------------
solve  list100.txt           2.46         3.00      0.82x compare faster 
solve  list10.txt            5.00         4.70      1.06x branch faster  
solve  list1.txt            16.00        15.00      1.07x branch faster  
calc   list100.txt          14.08        14.43      0.98x compare faster 
calc   list10.txt           24.10        24.00      1.00x branch faster  
calc   list1.txt            45.00        46.00      0.98x compare faster 

Completed 12 runs (12 expected).
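The manual step described above (build dtest in another branch, rename it, move it to dds/..) could be scripted roughly as follows. This is a sketch under assumptions: `build_compare_binary`, the `dds-worktree-<branch>` directory and the `dtest_<branch>` naming are hypothetical, it relies on `git worktree`, and it assumes the other branch builds the same `//library/tests:dtest` target. With `DRY_RUN=1` it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of benchmark.sh: build dtest from another
# branch in a separate git worktree and copy it next to the checkout as
# ../dtest_<branch>, ready to pass to --compare.
build_compare_binary() {
  local branch=$1
  local repo_root=${2:-$PWD}   # assumed: called from the dds checkout
  local worktree="$repo_root/../dds-worktree-$branch"
  local dest="$repo_root/../dtest_$branch"
  local target=//library/tests:dtest

  if [[ -n ${DRY_RUN:-} ]]; then
    # Print the commands only.
    printf '%s\n' \
      "git -C $repo_root worktree add $worktree $branch" \
      "(cd $worktree && bazel build $target)" \
      "cp -f $worktree/bazel-bin/library/tests/dtest $dest"
    return 0
  fi

  # Bazel output files are typically read-only, so cp -f lets a rebuild
  # replace an older copy at $dest.
  git -C "$repo_root" worktree add "$worktree" "$branch" &&
    (cd "$worktree" && bazel build "$target") &&
    cp -f "$worktree/bazel-bin/library/tests/dtest" "$dest" &&
    printf '%s\n' "$dest"
}
```

For a branch named `solver_context_reuse`, this would produce `../dtest_solver_context_reuse`, the name used in the run above. The worktree can be removed afterwards with `git worktree remove`.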

tameware and others added 11 commits June 17, 2026 22:45
Run bazel build //library/tests:dtest before benchmarking when --build is
passed; DRY_RUN prints the build command without executing it.

Co-authored-by: Cursor <cursoragent@cursor.com>
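The `--build` behaviour described in this commit can be sketched as below. `maybe_build` and the `BUILD` variable (set when `--build` is passed) are assumptions for illustration; only `DRY_RUN` and the Bazel target come from the PR.

```shell
# Hypothetical sketch of the --build step: BUILD is set when --build is
# passed; with DRY_RUN set, print the Bazel command instead of running it.
maybe_build() {
  local build_cmd=(bazel build //library/tests:dtest)
  [[ -n ${BUILD:-} ]] || return 0        # --build not given: nothing to do
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "${build_cmd[*]}"      # show the command only
    return 0
  fi
  echo "Building //library/tests:dtest..."
  "${build_cmd[@]}"
}
```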
Arguments after -- are forwarded to every dtest invocation (e.g. thread
count and -r); benchmark -n remains the repeat count before --.

Co-authored-by: Cursor <cursoragent@cursor.com>
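Splitting the script's own options from the dtest pass-through arguments at `--` might look like this. It is a sketch: only `--repeats` and `--max-deals` are handled, `parse_args` and `DTEST_ARGS` are hypothetical names, and it uses `--repeats` (the name the later commits refer to) rather than `-n`, which the next commit drops to avoid clashing with dtest's own `-n`. Defaults come from the `REPEATS` and `MAX_DEALS` environment variables when set.

```shell
# Hypothetical sketch: parse benchmark options, then keep everything after
# "--" unchanged in DTEST_ARGS for each dtest invocation.
parse_args() {
  REPEATS=${REPEATS:-1}
  MAX_DEALS=${MAX_DEALS:-100}
  DTEST_ARGS=()
  while (($#)); do
    case $1 in
      --repeats|--max-deals)
        if (($# < 2)); then echo "$1 needs a value" >&2; return 2; fi
        if [[ $1 == --repeats ]]; then REPEATS=$2; else MAX_DEALS=$2; fi
        shift 2 ;;
      --) shift; DTEST_ARGS=("$@"); break ;;
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
  done
  # The repeat count must be a positive integer.
  if [[ ! $REPEATS =~ ^[1-9][0-9]*$ ]]; then
    echo "--repeats must be a positive integer: $REPEATS" >&2
    return 2
  fi
}
```

Each dtest call would then append `"${DTEST_ARGS[@]}"` to its own arguments, which keeps pass-through words intact even if they contain spaces.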
Avoids clashing with dtest -n; DRY_RUN now prints commands only without fake timing rows or summary.

Co-authored-by: Cursor <cursoragent@cursor.com>
Clarify default vs env overrides for --repeats and --max-deals; error now says 10^n <= N to match filtering.

Co-authored-by: Cursor <cursoragent@cursor.com>
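The `10^n <= N` rule mentioned in this commit amounts to selecting `list1.txt`, `list10.txt`, `list100.txt`, and so on while the deal count does not exceed `--max-deals`. A sketch, assuming the `list<10^n>.txt` naming seen in the sample output and listing the largest file first as the sample does. `select_files` is a hypothetical name, and it does not check that the files exist in the hands dir.

```shell
# Hypothetical sketch: list the listN.txt files with N = 10^n <= max_deals,
# largest first. Does not check that the files exist in the hands dir.
select_files() {
  local max_deals=$1 n=1
  local files=()
  if [[ ! $max_deals =~ ^[0-9]+$ ]]; then
    echo "error: --max-deals must be a non-negative integer" >&2
    return 1
  fi
  while ((n <= max_deals)); do
    files=("list$n.txt" "${files[@]}")   # prepend, so the largest ends up first
    n=$((n * 10))
  done
  if ((${#files[@]} == 0)); then
    echo "error: no list file satisfies 10^n <= $max_deals" >&2
    return 1
  fi
  printf '%s\n' "${files[@]}"
}
```

With the default of 100 this yields `list100.txt list10.txt list1.txt`, the `files:` line in the sample output.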
Use --branch/--compare (and BRANCH/COMPARE env) instead of dtest1/dtest2; widen ver/file columns and align summary output.

Co-authored-by: Cursor <cursoragent@cursor.com>
Format speedup as a fixed-width string so the trailing x stays in-column; give note a 15-char field.

Co-authored-by: Cursor <cursoragent@cursor.com>
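Formatting the ratio as a string first and then padding the whole string is what keeps the trailing `x` inside its column. A sketch with a hypothetical `format_summary_row`; the field widths were chosen to reproduce the summary rows in the sample output above.

```shell
# Hypothetical sketch of one summary row: format the ratio (e.g. "0.82x") as a
# string first, then right-align the whole string in a 10-char column so the
# "x" stays in-column; the note gets a 15-char field.
format_summary_row() {
  local solver=$1 file=$2 cmp_avg=$3 branch_avg=$4 ratio=$5 note=$6
  local speedup
  speedup=$(printf '%.2fx' "$ratio")
  printf '%-6s %-13s %12.2f %12.2f %10s %-15s\n' \
    "$solver" "$file" "$cmp_avg" "$branch_avg" "$speedup" "$note"
}
```

Formatting the string first also makes it easy to place non-numeric values such as `n/a` or `inf` in the same column.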
Validate repeats, skip binary checks in dry-run, run branch before compare,
and tighten dtest timing parse so the cmp/branch summary is easier to read.

Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new benchmark.sh helper script to benchmark library/tests/dtest across solver modes and hand-file sizes, with optional building and binary-to-binary comparison.

Changes:

  • Introduces a Bash benchmark runner that iterates solver/file combinations and prints per-run timing rows.
  • Adds optional --build, --compare, --repeats, and --max-deals controls plus -- pass-through args to dtest.
  • Adds an aggregated comparison summary when a second binary is provided.

Comment thread benchmark.sh Outdated
Comment thread benchmark.sh
Comment thread benchmark.sh
Comment thread benchmark.sh Outdated
tameware and others added 2 commits June 19, 2026 15:46
Use a portable mktemp template, normalize dtest "zero" timings, warn only
when user/sys are missing, and label the compare summary as avg user ms.

Co-authored-by: Cursor <cursoragent@cursor.com>
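A portable way to create a scratch file is to give `mktemp` an explicit template ending in `X`s, which both GNU coreutils and BSD/macOS `mktemp` accept; `-t`, by contrast, means different things in the two implementations. A sketch; the `dds-benchmark` prefix and the cleanup trap are illustrative, not necessarily what benchmark.sh does.

```shell
# Explicit template: accepted by GNU and BSD/macOS mktemp alike.
# (The dds-benchmark prefix is a hypothetical choice.)
tmp=$(mktemp "${TMPDIR:-/tmp}/dds-benchmark.XXXXXX") || exit 1
# Remove the scratch file however the script exits.
trap 'rm -f "$tmp"' EXIT
# e.g. capture one dtest run's output in "$tmp", then parse timings from it.
```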
Run solve before calc and largest hand files first to reduce warmup bias,
aggregate summary on avg_user per hand, and add --reverse to run compare
before branch when comparing two binaries.

Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread benchmark.sh
tameware self-assigned this Jun 19, 2026
tameware requested a review from zzcgumn June 19, 2026 22:30
Parse Number of hands and compute user/hands when Avg user time is
missing (e.g. zero user time), replacing the broken Copilot autofix.

Co-authored-by: Cursor <cursoragent@cursor.com>
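The fallback in this commit is plain arithmetic: when the average is absent, divide total user ms by the number of hands. In the sample output, for instance, 284 user ms over the 100 hands of list100.txt gives the 2.84 shown under avg_user. A sketch with a hypothetical `avg_user` helper that takes already-parsed values; parsing dtest's output is not shown, since its exact format is not given here.

```shell
# Hypothetical helper: use dtest's reported average when present, otherwise
# compute user_ms / hands with two decimals, as in the avg_user column.
avg_user() {
  local reported=$1 user_ms=$2 hands=$3
  if [[ -n $reported ]]; then
    printf '%s\n' "$reported"
    return 0
  fi
  awk -v u="$user_ms" -v h="$hands" \
    'BEGIN { if (h > 0) printf "%.2f\n", u / h; else print "n/a" }'
}
```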
tameware marked this pull request as ready for review June 21, 2026 15:03
tameware requested a review from Copilot June 21, 2026 15:03

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread benchmark.sh Outdated
tameware and others added 2 commits June 21, 2026 11:58
Show n/a/equal for 0/0, inf/branch faster when compare is nonzero and
branch is zero, and keep 0.00x/compare faster when only compare is zero.

Co-authored-by: Cursor <cursoragent@cursor.com>
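The zero cases in this commit form a small decision table. A sketch, assuming the inputs are the compare and branch averages as printed. `ratio_and_note` is a hypothetical name, and reporting an exact tie between nonzero values as `equal` is a choice made here, not something the commit specifies.

```shell
# Hypothetical sketch of the summary ratio with the zero cases from this commit:
#   0/0                 -> n/a, equal
#   branch 0, compare>0 -> inf, branch faster
#   otherwise           -> cmp/branch; > 1 means branch faster
ratio_and_note() {
  awk -v c="$1" -v b="$2" 'BEGIN {
    if (c == 0 && b == 0) { print "n/a equal"; exit }
    if (b == 0)           { print "inf branch faster"; exit }
    r = c / b
    note = (r > 1) ? "branch faster" : ((r < 1) ? "compare faster" : "equal")
    printf "%.2fx %s\n", r, note
  }'
}
```

Only compare being zero falls through to `0.00x compare faster`. A later commit in this PR assumes positive averages and drops the zero guard.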
Run branch then compare (or reverse) for each repeat before advancing,
so paired timings see the same CPU warmth instead of all branch runs first.

Co-authored-by: Cursor <cursoragent@cursor.com>
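The resulting run order (solve before calc, largest hand file first, and branch and compare paired within each repeat) can be expressed as nested loops. A sketch that only prints the plan; `print_run_plan` and `REVERSE` (standing in for `--reverse`) are hypothetical names.

```shell
# Hypothetical sketch: print the run plan, solve before calc, files in the
# order given (largest first), and for each repeat branch then compare
# (compare then branch when REVERSE is set, standing in for --reverse).
print_run_plan() {
  local repeats=$1
  shift
  local files=("$@")
  local order=(branch compare)
  if [[ -n ${REVERSE:-} ]]; then order=(compare branch); fi
  local solver file ver r
  for solver in solve calc; do
    for file in "${files[@]}"; do
      for ((r = 1; r <= repeats; r++)); do
        for ver in "${order[@]}"; do
          printf '%s %s %s %d/%d\n' "$solver" "$file" "$ver" "$r" "$repeats"
        done
      done
    done
  done
}
```

With one repeat and the three sample files this yields 12 runs in the same order as the sample output above.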

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread benchmark.sh Outdated

zzcgumn (Collaborator) left a comment

Interesting idea. I am not sure about adding this to the CI run, as we don't yet know how large a performance regression we can tolerate.

I did a couple of comparisons like this manually while refactoring from 2.9 to 3.0. The main problem I found was methods that had been inlined but were not in my release candidate code.

tameware (Collaborator, Author) commented

I had not considered adding this to the CI run, but I like the idea! Something simpler could work for CI: wrap dtest and fail if the runtime of solve or calc on any of list1/list10/list100 gets slower by more than a threshold. There's a fair bit of random variation from run to run, though, and some systematic variation that depends on run order. On my Mac, the current version runs more than 10% faster when the machine is "warm", in particular when solve runs after many calc runs.
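A minimal sketch of the simpler CI check suggested here, assuming baseline and current avg user times (ms per hand) are already available. `check_regression` and its default 10% threshold are hypothetical; given the run-to-run and warm-machine variation described above, a real gate would likely need several repeats or a looser threshold to avoid false failures.

```shell
# Hypothetical CI gate, not part of benchmark.sh: return 1 (and report) when
# the current avg user time exceeds the baseline by more than THRESHOLD percent.
check_regression() {
  local label=$1 baseline=$2 current=$3 threshold=${4:-10}
  if awk -v b="$baseline" -v c="$current" -v t="$threshold" \
      'BEGIN { exit !(c > b * (1 + t / 100)) }'; then
    printf 'REGRESSION %s: %s -> %s ms/hand (more than %s%% slower)\n' \
      "$label" "$baseline" "$current" "$threshold" >&2
    return 1
  fi
}
```

A CI job could call it once per solver/file pair, e.g. `check_regression "calc list100.txt" 14.08 14.43 || failed=1`, and exit nonzero if any call failed.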

tameware and others added 7 commits June 21, 2026 15:11
Assume summary avg_user values are always positive and compute u2/u1
directly instead of guarding a zero branch average.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add --details to opt into per-run timing rows; branch-only runs still
print the full table as before.

Co-authored-by: Cursor <cursoragent@cursor.com>
On a tty, print per-run timing rows while --compare runs then erase them
(including the table header) before the summary; use --details to keep rows.

Co-authored-by: Cursor <cursoragent@cursor.com>
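Erasing the per-run rows on a terminal can be done with standard ANSI escape sequences: `ESC[1A` moves the cursor up one line and `ESC[2K` clears that line. A sketch under assumptions: `DETAILS` stands in for `--details`, `print_transient` and `erase_transient` are hypothetical names, and the table header would also be printed through `print_transient` so that it is erased with the rows.

```shell
# Hypothetical sketch: on a terminal (and without --details, here DETAILS),
# count transient rows as they are printed, then erase them before the summary.
TRANSIENT_LINES=0

print_transient() {
  printf '%s\n' "$1"
  if [[ -t 1 && -z ${DETAILS:-} ]]; then
    TRANSIENT_LINES=$((TRANSIENT_LINES + 1))
  fi
}

erase_transient() {
  while ((TRANSIENT_LINES > 0)); do
    printf '\033[1A\033[2K'   # ESC[1A: cursor up one line; ESC[2K: clear it
    TRANSIENT_LINES=$((TRANSIENT_LINES - 1))
  done
}
```

When stdout is not a terminal (for example when piped to a file), nothing is counted and every row is kept.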
The ratio column is self-explanatory without the inline note.

Co-authored-by: Cursor <cursoragent@cursor.com>
Default 0.5% (--epsilon / EPSILON) marks branch and compare as equal when
avg user times differ by less than that relative threshold.

Co-authored-by: Cursor <cursoragent@cursor.com>
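The epsilon rule can be sketched as a relative-difference test. Two assumptions here are not stated in the commit: that EPSILON is given in percent (0.5 meaning 0.5%) and that the difference is taken relative to the branch average. `compare_note` is a hypothetical name; both averages are assumed positive, as an earlier commit in this batch does.

```shell
# Hypothetical sketch of the --epsilon rule. Assumptions: EPSILON is in
# percent (default 0.5) and the difference is relative to the branch
# average; both averages are positive.
compare_note() {
  local cmp=$1 branch=$2 eps=${3:-${EPSILON:-0.5}}
  awk -v c="$cmp" -v b="$branch" -v e="$eps" 'BEGIN {
    r = c / b
    d = (r > 1) ? r - 1 : 1 - r           # |cmp/branch - 1|
    if (d * 100 < e)  note = "equal"
    else if (r > 1)   note = "branch faster"
    else              note = "compare faster"
    printf "%.2fx %s\n", r, note
  }'
}
```

Under these assumptions the calc list10.txt row in the sample summary (24.10 vs 24.00, about a 0.42% difference) would be reported as equal.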
Show the compare-mode tolerance flag in the header and help examples.

Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread benchmark.sh
Comment thread benchmark.sh Outdated

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread benchmark.sh Outdated
tameware and others added 2 commits June 21, 2026 17:07
Document that zero or skewed averages can come from per-interval int ms
rounding and point to accumulating microseconds in TestTimer.cpp.

Co-authored-by: Cursor <cursoragent@cursor.com>
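The effect described here is easy to see with integer arithmetic. The numbers below (100 intervals of 400 microseconds) are made up for illustration and are not taken from TestTimer.cpp; the sketch assumes each interval is truncated to whole milliseconds before being summed (rounding to nearest would also give 0 for 0.4 ms).

```shell
# Illustration with made-up numbers: 100 timed intervals of 400 microseconds.
intervals=100
interval_us=400
sum_ms=0   # per-interval whole milliseconds, summed
sum_us=0   # microseconds summed, converted once at the end
for ((i = 0; i < intervals; i++)); do
  sum_ms=$(( sum_ms + interval_us / 1000 ))   # 400 / 1000 truncates to 0
  sum_us=$(( sum_us + interval_us ))
done
echo "summed per-interval ms: $sum_ms ms"              # prints 0 ms
echo "summed microseconds:    $(( sum_us / 1000 )) ms" # prints 40 ms
```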
Say rounding to zero rather than a rounding error.

Co-authored-by: Cursor <cursoragent@cursor.com>
tameware merged commit 3127749 into dds-bridge:develop Jun 21, 2026
4 checks passed

3 participants