Benchmark Interleaving #73

@nvzqz

Description

Divan 0.1 uses a simple serial scheduling algorithm in which each benchmark is executed in presentation order. This approach has worked “well enough” for 0.1 as a proof of concept, but its results have proven unreliable. For example, simply renaming benchmarks has caused false performance regressions because the rename changed the execution order.

Benchmarks become unreliable because of systemic noise from the execution environment. This noise can come from other processes (resource sharing), other benchmarks (cache state), and benchmarking right after compiling (thermal throttling).

To improve reliability, Divan 0.2 will use random interleaving to reduce inter-benchmark variance by spreading out systemic noise across benchmarks. Interleaving order is randomized to improve fairness between benchmarks.

The following table illustrates the differences in execution order:

| Order                | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  |
|----------------------|----|----|----|----|----|----|----|----|----|
| Serial               | A₁ | A₂ | A₃ | B₁ | B₂ | B₃ | C₁ | C₂ | C₃ |
| Interleaved          | A₁ | B₁ | C₁ | A₂ | B₂ | C₂ | A₃ | B₃ | C₃ |
| Interleaved (Random) | C₁ | A₁ | B₁ | C₂ | A₂ | C₃ | B₂ | A₃ | B₃ |
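A schedule like the one above can be generated by shuffling the benchmarks within each round of samples. The sketch below is illustrative only, not Divan's actual scheduler: `interleave_order` and the inline LCG-based Fisher–Yates shuffle are assumed names chosen to keep the example dependency-free, and the real implementation may not shuffle strictly round by round.

```rust
/// Minimal deterministic PRNG (a 64-bit LCG) so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// Produce a randomly interleaved order: each round runs one sample of every
/// benchmark, with the order within the round shuffled independently.
fn interleave_order(benchmarks: &[&str], samples: usize, seed: u64) -> Vec<String> {
    let mut rng = Lcg(seed);
    let mut order = Vec::with_capacity(benchmarks.len() * samples);
    for round in 1..=samples {
        let mut indices: Vec<usize> = (0..benchmarks.len()).collect();
        // Fisher–Yates shuffle of this round's benchmarks.
        for i in (1..indices.len()).rev() {
            let j = (rng.next_u64() % (i as u64 + 1)) as usize;
            indices.swap(i, j);
        }
        for &idx in &indices {
            order.push(format!("{}{}", benchmarks[idx], round));
        }
    }
    order
}

fn main() {
    // 3 benchmarks × 3 samples, e.g. ["B1", "C1", "A1", "A2", ...].
    println!("{:?}", interleave_order(&["A", "B", "C"], 3, 42));
}
```

Shuffling per round (rather than shuffling the full schedule once) keeps samples of each benchmark evenly spread in time, so slow environmental drift affects all benchmarks roughly equally.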

How interleaving is accomplished depends on the type of benchmark function:

  • Simple context-free functions are run once for each sample.
  • Functions that have local context and use Bencher will be paused between samples using stackful coroutines provided by the corosensei crate.
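The second case can be illustrated without a coroutine crate: the key requirement is that a benchmark's local context stays alive while control returns to the scheduler between samples. The sketch below models that with a plain struct rather than corosensei's stackful coroutines (which is what Divan 0.2 actually uses); `ResumableBench` and its methods are assumed names for illustration only.

```rust
/// A benchmark with local context that runs one sample at a time.
/// Stands in for a coroutine that suspends after each sample.
struct ResumableBench<C, F: FnMut(&mut C)> {
    context: C,      // local state kept alive between samples
    sample_fn: F,    // body executed once per sample
    samples_run: usize,
}

impl<C, F: FnMut(&mut C)> ResumableBench<C, F> {
    fn new(context: C, sample_fn: F) -> Self {
        Self { context, sample_fn, samples_run: 0 }
    }

    /// Run exactly one sample, then return control to the scheduler.
    /// A real stackful coroutine would instead suspend mid-function,
    /// preserving the benchmark's entire stack frame.
    fn run_one_sample(&mut self) {
        (self.sample_fn)(&mut self.context);
        self.samples_run += 1;
    }
}

fn main() {
    // Two benchmarks with independent local context, interleaved
    // one sample at a time by a trivial round-robin scheduler.
    let mut a = ResumableBench::new(Vec::<u64>::new(), |v: &mut Vec<u64>| {
        v.push(v.len() as u64)
    });
    let mut b = ResumableBench::new(0u64, |n: &mut u64| *n += 1);
    for _ in 0..3 {
        a.run_one_sample();
        b.run_one_sample();
    }
    assert_eq!(a.samples_run, 3);
    assert_eq!(b.context, 3);
}
```

A struct like this cannot pause in the middle of an arbitrary function body, which is why stackful coroutines are needed for real Bencher-based benchmarks: they let the scheduler suspend and resume the function without restructuring the user's code.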
