feat: add test_helpers module (error_utils, test_utils) behind test_utlis flag by naor-starkware · Pull Request #2381 · starkware-libs/cairo-vm

naor-starkware · 2026-04-06T20:35:23Z

TITLE

Description

Description of the pull request changes and motivation.

Checklist

Linked to Github Issue
Unit tests added
Integration tests added.
This change requires new documentation.
- Documentation has been added/updated.
- CHANGELOG has been updated.


…on_runner flag

- Create vm/src/test_helpers/ with error_utils.rs and test_utils.rs
- Move from cairo_test_suite/ (fix filename typo: utlis → utils)
- Fix crate:: import paths (were cairo_vm:: when outside the crate)
- Fix $crate in macro_export macro (clippy::crate_in_macro_def)
- Simplify load_cairo_program! path using with_file_name()
- Gate module behind function_runner feature in lib.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
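The `with_file_name()` simplification listed in this commit message presumably builds on `std::path::Path::with_file_name`, which replaces the last component of a path. A minimal sketch of that idea follows; the `sibling_program_path` helper and the file names are hypothetical and are not taken from the PR.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not from the PR): given the path of a source file,
/// return the path of a sibling file in the same directory.
fn sibling_program_path(source_file: &str, program_name: &str) -> PathBuf {
    // `with_file_name` keeps the parent directory and swaps only the final
    // component, so no manual `parent()` + `join()` is needed.
    Path::new(source_file).with_file_name(program_name)
}
```

For example, `sibling_program_path("vm/src/tests/math_test.rs", "math.json")` yields `vm/src/tests/math.json`.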

…ram! and error_utils checkers Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add AlwaysFailConversion helper + 2 tests for assert_mr_eq! unwrap_or_else panic branch (no-message and message variants)
- Allow clippy::result_large_err on hint_err test helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… noise Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…y function name error

#[macro_export] macros containing closures (|x| ...) cause llvm-cov to emit a "function name is empty" error. Replaced unwrap_or_else(|e| panic!(...)) with match expressions to eliminate closures from macro expansions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
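The closure-to-`match` rewrite described in this commit message can be sketched as follows. The `convert_or_panic!` macro and the `small_u8` function are hypothetical illustrations, not the PR's actual code; the point is only that the expansion contains a `match` and no closure.

```rust
/// Hypothetical macro (not the PR's code): converts a value with `TryInto`
/// and panics on failure. The closure form would be
/// `.unwrap_or_else(|e| panic!(...))`; the `match` below behaves the same
/// without putting a closure into the macro expansion.
macro_rules! convert_or_panic {
    ($value:expr, $target:ty) => {{
        let converted: Result<$target, _> = ::std::convert::TryInto::try_into($value);
        match converted {
            Ok(v) => v,
            Err(e) => panic!("conversion failed: {:?}", e),
        }
    }};
}

/// Small caller so the macro is exercised: narrows an `i32` to a `u8`.
fn small_u8(x: i32) -> u8 {
    convert_or_panic!(x, u8)
}
```

Calling `small_u8(200)` returns `200`, while `small_u8(300)` panics because `300` does not fit in a `u8`.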

Follow-up to dropping the function_runner feature flag. Gate test_helpers module and function_runner module under test_utils, and update the doc comment in function_runner.rs accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

naor-starkware · 2026-04-06T20:35:37Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

github-actions · 2026-04-06T20:42:56Z

**Hyper Threading Benchmark results**




hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
  Time (mean ± σ):     22.849 s ±  0.164 s    [User: 22.198 s, System: 0.649 s]
  Range (min … max):   22.733 s … 22.965 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 1
  Time (mean ± σ):     22.818 s ±  0.015 s    [User: 22.157 s, System: 0.658 s]
  Range (min … max):   22.807 s … 22.829 s    2 runs
 
Summary
  hyper_threading_pr threads: 1 ran
    1.00 ± 0.01 times faster than hyper_threading_main threads: 1




hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
  Time (mean ± σ):     12.376 s ±  0.028 s    [User: 22.389 s, System: 0.668 s]
  Range (min … max):   12.356 s … 12.396 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 2
  Time (mean ± σ):     12.331 s ±  0.067 s    [User: 22.442 s, System: 0.664 s]
  Range (min … max):   12.284 s … 12.378 s    2 runs
 
Summary
  hyper_threading_pr threads: 2 ran
    1.00 ± 0.01 times faster than hyper_threading_main threads: 2




hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
  Time (mean ± σ):      9.576 s ±  0.142 s    [User: 35.698 s, System: 0.781 s]
  Range (min … max):    9.475 s …  9.677 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 4
  Time (mean ± σ):      9.961 s ±  0.344 s    [User: 35.035 s, System: 0.738 s]
  Range (min … max):    9.718 s … 10.205 s    2 runs
 
Summary
  hyper_threading_main threads: 4 ran
    1.04 ± 0.04 times faster than hyper_threading_pr threads: 4




hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
  Time (mean ± σ):      9.900 s ±  0.135 s    [User: 35.448 s, System: 0.818 s]
  Range (min … max):    9.804 s …  9.995 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 6
  Time (mean ± σ):      9.641 s ±  0.013 s    [User: 35.812 s, System: 0.819 s]
  Range (min … max):    9.632 s …  9.651 s    2 runs
 
Summary
  hyper_threading_pr threads: 6 ran
    1.03 ± 0.01 times faster than hyper_threading_main threads: 6




hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
  Time (mean ± σ):      9.487 s ±  0.072 s    [User: 36.030 s, System: 0.794 s]
  Range (min … max):    9.437 s …  9.538 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 8
  Time (mean ± σ):      9.385 s ±  0.019 s    [User: 36.152 s, System: 0.758 s]
  Range (min … max):    9.372 s …  9.399 s    2 runs
 
Summary
  hyper_threading_pr threads: 8 ran
    1.01 ± 0.01 times faster than hyper_threading_main threads: 8




hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
  Time (mean ± σ):      9.687 s ±  0.257 s    [User: 36.190 s, System: 0.844 s]
  Range (min … max):    9.505 s …  9.868 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 16
  Time (mean ± σ):      9.526 s ±  0.124 s    [User: 35.971 s, System: 0.830 s]
  Range (min … max):    9.438 s …  9.614 s    2 runs
 
Summary
  hyper_threading_pr threads: 16 ran
    1.02 ± 0.03 times faster than hyper_threading_main threads: 16

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov · 2026-04-06T20:58:41Z

Codecov Report

❌ Patch coverage is 95.72650% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.07%. Comparing base (f7ac327) to head (9b1b8f5).

Files with missing lines	Patch %	Lines
vm/src/test_helpers/error_utils.rs	95.18%	4 Missing ⚠️
vm/src/test_helpers/test_utils.rs	96.87%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #2381    +/-   ##
========================================
  Coverage   96.07%   96.07%            
========================================
  Files         105      107     +2     
  Lines       37737    37852   +115     
========================================
+ Hits        36254    36366   +112     
- Misses       1483     1486     +3


github-actions · 2026-04-06T21:08:46Z

Benchmark Results for unmodified programs 🚀

Command	Mean [s]	Min [s]	Max [s]	Relative
`base big_factorial`	2.133 ± 0.025	2.110	2.187	1.01 ± 0.01
`head big_factorial`	2.121 ± 0.008	2.105	2.129	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base big_fibonacci`	2.063 ± 0.015	2.049	2.100	1.00
`head big_fibonacci`	2.069 ± 0.019	2.051	2.116	1.00 ± 0.01

Command	Mean [s]	Min [s]	Max [s]	Relative
`base blake2s_integration_benchmark`	7.551 ± 0.197	7.402	8.054	1.01 ± 0.03
`head blake2s_integration_benchmark`	7.460 ± 0.064	7.400	7.621	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base compare_arrays_200000`	2.202 ± 0.017	2.176	2.230	1.00
`head compare_arrays_200000`	2.206 ± 0.012	2.190	2.228	1.00 ± 0.01

Command	Mean [s]	Min [s]	Max [s]	Relative
`base dict_integration_benchmark`	1.440 ± 0.005	1.434	1.446	1.00 ± 0.01
`head dict_integration_benchmark`	1.435 ± 0.008	1.426	1.450	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base field_arithmetic_get_square_benchmark`	1.234 ± 0.013	1.225	1.268	1.00 ± 0.01
`head field_arithmetic_get_square_benchmark`	1.233 ± 0.010	1.224	1.248	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base integration_builtins`	7.549 ± 0.044	7.505	7.646	1.00
`head integration_builtins`	7.578 ± 0.020	7.550	7.616	1.00 ± 0.01

Command	Mean [s]	Min [s]	Max [s]	Relative
`base keccak_integration_benchmark`	7.644 ± 0.030	7.603	7.714	1.00
`head keccak_integration_benchmark`	7.676 ± 0.030	7.613	7.706	1.00 ± 0.01

Command	Mean [s]	Min [s]	Max [s]	Relative
`base linear_search`	2.191 ± 0.009	2.179	2.205	1.00
`head linear_search`	2.235 ± 0.060	2.179	2.394	1.02 ± 0.03

Command	Mean [s]	Min [s]	Max [s]	Relative
`base math_cmp_and_pow_integration_benchmark`	1.520 ± 0.018	1.508	1.564	1.00
`head math_cmp_and_pow_integration_benchmark`	1.526 ± 0.008	1.518	1.548	1.00 ± 0.01

Command	Mean [s]	Min [s]	Max [s]	Relative
`base math_integration_benchmark`	1.483 ± 0.011	1.468	1.503	1.00
`head math_integration_benchmark`	1.484 ± 0.013	1.470	1.512	1.00 ± 0.01

Command	Mean [s]	Min [s]	Max [s]	Relative
`base memory_integration_benchmark`	1.234 ± 0.004	1.231	1.241	1.00
`head memory_integration_benchmark`	1.238 ± 0.027	1.224	1.313	1.00 ± 0.02

Command	Mean [s]	Min [s]	Max [s]	Relative
`base operations_with_data_structures_benchmarks`	1.552 ± 0.008	1.542	1.572	1.00
`head operations_with_data_structures_benchmarks`	1.572 ± 0.028	1.550	1.642	1.01 ± 0.02

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`base pedersen`	536.9 ± 2.4	533.3	541.5	1.00
`head pedersen`	537.1 ± 4.4	533.8	548.9	1.00 ± 0.01

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`base poseidon_integration_benchmark`	622.8 ± 7.5	614.3	635.5	1.00
`head poseidon_integration_benchmark`	629.2 ± 16.0	611.4	649.9	1.01 ± 0.03

Command	Mean [s]	Min [s]	Max [s]	Relative
`base secp_integration_benchmark`	1.854 ± 0.025	1.833	1.920	1.00
`head secp_integration_benchmark`	1.862 ± 0.024	1.823	1.912	1.00 ± 0.02

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`base set_integration_benchmark`	668.6 ± 3.0	664.6	674.8	1.00 ± 0.01
`head set_integration_benchmark`	668.3 ± 3.2	664.4	676.4	1.00

Command	Mean [s]	Min [s]	Max [s]	Relative
`base uint256_integration_benchmark`	4.252 ± 0.019	4.226	4.285	1.00
`head uint256_integration_benchmark`	4.262 ± 0.019	4.237	4.287	1.00 ± 0.01

YairVaknin-starkware

@YairVaknin-starkware reviewed 9 files and all commit messages, and made 4 comments.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on naor-starkware).

vm/src/test_helpers/error_utils.rs line 44 at r2 (raw file):

/// Type alias for check functions that validate test results.
pub type VmCheck<T> = fn(&std::result::Result<T, CairoRunError>);

can just be Result, right? pls look for similar instances where you can shorten.

Code quote:

&std::result::Result<T, CairoRunError>
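A minimal sketch of the shortening suggested here, assuming no module-level `Result` alias shadows the prelude one. `CairoRunError` below is a stand-in type defined only so the sketch compiles on its own, and `VmCheckLong`, `expect_ok`, and `checks_are_interchangeable` are hypothetical names.

```rust
// Stand-in for cairo-vm's `CairoRunError`; the real type lives in the crate.
#[derive(Debug)]
struct CairoRunError(String);

/// Before: fully qualified path to `Result`.
type VmCheckLong<T> = fn(&std::result::Result<T, CairoRunError>);

/// After: the prelude's `Result` names exactly the same type.
type VmCheck<T> = fn(&Result<T, CairoRunError>);

/// Hypothetical checker used to show both aliases accept the same function.
fn expect_ok(res: &Result<u32, CairoRunError>) {
    assert!(res.is_ok(), "expected Ok, got {:?}", res);
}

fn checks_are_interchangeable() -> bool {
    let long: VmCheckLong<u32> = expect_ok;
    // Assigning one alias to the other compiles because they are one type.
    let short: VmCheck<u32> = long;
    short(&Ok(7));
    true
}
```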

vm/src/test_helpers/error_utils.rs line 62 at r2 (raw file):

}

/// Asserts that the result is `HintError::AssertNotEqualFail`.

These funcs have very repetitive boilerplate. Pls extract a single func that these funcs invoke, which takes the res and the predicate u check.

Code quote:

/// Asserts that the result is `HintError::AssertNotEqualFail`.
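One way the requested extraction could look, as a sketch under stated assumptions: the error enums below are simplified stand-ins (the real cairo-vm variants carry data), and `expect_error_matching` is a hypothetical name for the shared helper.

```rust
use std::fmt::Debug;

// Simplified stand-ins for the crate's error types.
#[derive(Debug)]
enum HintError {
    AssertNotEqualFail,
    AssertNotZero,
}

#[derive(Debug)]
enum CairoRunError {
    Hint(HintError),
    Other,
}

/// Hypothetical shared helper: panics unless `res` is an `Err` that
/// satisfies `predicate`. `description` only feeds the panic message.
fn expect_error_matching<T: Debug>(
    res: &Result<T, CairoRunError>,
    description: &str,
    predicate: impl Fn(&CairoRunError) -> bool,
) {
    match res {
        Err(e) if predicate(e) => {}
        other => panic!("expected {}, got {:?}", description, other),
    }
}

/// Each public checker shrinks to a single call with its own predicate.
fn expect_hint_assert_not_equal<T: Debug>(res: &Result<T, CairoRunError>) {
    expect_error_matching(res, "HintError::AssertNotEqualFail", |e| {
        matches!(e, CairoRunError::Hint(HintError::AssertNotEqualFail))
    });
}

fn expect_hint_assert_not_zero<T: Debug>(res: &Result<T, CairoRunError>) {
    expect_error_matching(res, "HintError::AssertNotZero", |e| {
        matches!(e, CairoRunError::Hint(HintError::AssertNotZero))
    });
}
```

Passing the predicate as a closure keeps the panic-message formatting in one place, so adding a new checker means adding only a pattern.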

vm/src/test_helpers/error_utils.rs line 220 at r2 (raw file):

    }

    /// `expect_hint_assert_not_zero` does not panic on `HintError::AssertNotZero`.

these error tests could be parameterized using rstest to reduce a lot of repetitive boilerplate.
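The parameterization idea can be illustrated without extra dependencies by a table-driven loop. This is a stand-in for rstest, where each row would instead be a `#[case(...)]` on a single `#[rstest]` test function. The types are simplified stand-ins and `cases` / `run_all_cases` are hypothetical names.

```rust
// Simplified stand-in for the crate's `HintError`.
#[derive(Debug, Clone, PartialEq)]
enum HintError {
    AssertNotEqualFail,
    AssertNotZero,
}

type Check = fn(&Result<(), HintError>);

fn expect_hint_assert_not_equal(res: &Result<(), HintError>) {
    assert!(matches!(res, Err(HintError::AssertNotEqualFail)), "got {:?}", res);
}

fn expect_hint_assert_not_zero(res: &Result<(), HintError>) {
    assert!(matches!(res, Err(HintError::AssertNotZero)), "got {:?}", res);
}

/// One row per case: the checker and the error it must accept.
fn cases() -> Vec<(Check, HintError)> {
    vec![
        (expect_hint_assert_not_equal as Check, HintError::AssertNotEqualFail),
        (expect_hint_assert_not_zero as Check, HintError::AssertNotZero),
    ]
}

/// Runs every checker against its matching error; returns the case count.
fn run_all_cases() -> usize {
    let cases = cases();
    for (check, err) in &cases {
        check(&Err(err.clone()));
    }
    cases.len()
}
```

Adding a checker then costs one table row rather than a new test function.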

vm/src/test_helpers/test_utils.rs line 48 at r2 (raw file):

            Ok(v) => v,
            Err(e) => panic!("conversion to MaybeRelocatable failed: {e:?}"),
        };

pls factor out the conversion logic of both cases into a single func which also enforces coercion into MaybeRelocatable (currently it will work for any right that can be try_into'd into left's type).

Code quote:

        let right_mr = match ($right).try_into() {
            Ok(v) => v,
            Err(e) => panic!("conversion to MaybeRelocatable failed: {e:?}"),
        };
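A sketch of the single conversion function requested above, under loud assumptions: `MaybeRelocatable` here is a simplified stand-in (the real type wraps a field element and a relocatable struct), and `to_maybe_relocatable` and `assert_mr_eq` are hypothetical names. Fixing the target type in the trait bound is what forces both sides to become `MaybeRelocatable`.

```rust
use std::convert::TryInto;
use std::fmt::Debug;

// Simplified stand-in for cairo-vm's `MaybeRelocatable`.
#[derive(Debug, Clone, PartialEq)]
enum MaybeRelocatable {
    Int(i64),
    Relocatable(isize, usize),
}

impl From<i64> for MaybeRelocatable {
    fn from(v: i64) -> Self {
        MaybeRelocatable::Int(v)
    }
}

impl From<(isize, usize)> for MaybeRelocatable {
    fn from(v: (isize, usize)) -> Self {
        MaybeRelocatable::Relocatable(v.0, v.1)
    }
}

/// Hypothetical helper: the bound names `MaybeRelocatable` explicitly, so a
/// value convertible only into some other type is rejected at compile time.
fn to_maybe_relocatable<V>(value: V) -> MaybeRelocatable
where
    V: TryInto<MaybeRelocatable>,
    <V as TryInto<MaybeRelocatable>>::Error: Debug,
{
    match value.try_into() {
        Ok(v) => v,
        Err(e) => panic!("conversion to MaybeRelocatable failed: {:?}", e),
    }
}

/// Both operands go through the same conversion before comparison.
fn assert_mr_eq<L, R>(left: L, right: R)
where
    L: TryInto<MaybeRelocatable>,
    <L as TryInto<MaybeRelocatable>>::Error: Debug,
    R: TryInto<MaybeRelocatable>,
    <R as TryInto<MaybeRelocatable>>::Error: Debug,
{
    assert_eq!(to_maybe_relocatable(left), to_maybe_relocatable(right));
}
```

With this shape, `assert_mr_eq((1isize, 2usize), MaybeRelocatable::Relocatable(1, 2))` passes, and the conversion logic lives in one place instead of being duplicated per operand.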

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nt::Internal

These errors arrive wrapped as Hint(Internal(...)) since they originate inside hint execution, not as bare VirtualMachineError variants. Remove now-unused expect_vm_error helper and vm_err test helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

naor-starkware

@naor-starkware made 4 comments.
Reviewable status: 7 of 10 files reviewed, 4 unresolved discussions (waiting on YairVaknin-starkware).

vm/src/test_helpers/error_utils.rs line 44 at r2 (raw file):

Previously, YairVaknin-starkware wrote…

can just be Result, right? pls look for similar instances where you can shorten.

Done.

vm/src/test_helpers/error_utils.rs line 62 at r2 (raw file):

Previously, YairVaknin-starkware wrote…

These funcs have very repetitive boilerplate. Pls extract a single func that these funcs invoke, which takes the res and the predicate u check.

Done.

vm/src/test_helpers/error_utils.rs line 220 at r2 (raw file):

Previously, YairVaknin-starkware wrote…

these error tests could be parameterized using rstest to reduce a lot of repetitive boilerplate.

Done.

vm/src/test_helpers/test_utils.rs line 48 at r2 (raw file):

Previously, YairVaknin-starkware wrote…

pls factor out the conversion logic of both cases into a single func which also enforces coercion into MaybeRelocatable (currently it will work for any right that can be try_into'd into left's type).

Done.

naor-starkware and others added 9 commits April 6, 2026 20:39

test(test_helpers): add unit tests for assert_mr_eq!, load_cairo_program! and error_utils checkers

0edf8b8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

style: cargo fmt

298b0c0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: update CHANGELOG for PR #2378

39fcf0f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: mark load_cairo_program! example as ignore to suppress llvm-cov noise

b4cb481

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: make function_runner module and its methods public

5e2dd11

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

This was referenced Apr 6, 2026

feat(makefile,ci): add cairo_test_suite_programs target and CI integration #2380

Open

feat: add math cairo tests under vm/src/tests/cairo_test_suite #2379

Open

naor-starkware changed the title ~~feat: add test_helpers module (error_utils, test_utils) behind function_runner flag~~ feat: add test_helpers module (error_utils, test_utils) behind test_utlis flag Apr 6, 2026

chore: update CHANGELOG PR link from #2378 to #2381

516a2a5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

naor-starkware marked this pull request as ready for review April 9, 2026 06:06

YairVaknin-starkware requested changes Apr 9, 2026


naor-starkware and others added 3 commits April 9, 2026 12:13

refactor: replace assert_mr_eq! macro with generic function

53a159f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: apply cargo fmt

3e693fb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

naor-starkware commented Apr 9, 2026

naor-starkware wants to merge 13 commits into main from naor/feat/add_test_helpers

naor-starkware commented Apr 6, 2026 • edited by phil-starkware
