Divan 0.1 uses a simple serial scheduling algorithm where each benchmark is executed in presentation order. This approach worked "well enough" for 0.1's purpose as a proof of concept, but the results have proven unreliable. For example, simply renaming benchmarks has caused false performance regressions because renaming changed the execution order.
Benchmarks become unreliable because of systemic noise from the execution environment. This noise can come from other processes (resource sharing), other benchmarks (cache state), and benchmarking right after compiling (thermal throttling).
To improve reliability, Divan 0.2 will use random interleaving to reduce inter-benchmark variance by spreading out systemic noise across benchmarks. Interleaving order is randomized to improve fairness between benchmarks.
The following table illustrates the differences in execution order:
| Order                | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  |
|----------------------|----|----|----|----|----|----|----|----|----|
| Serial               | A₁ | A₂ | A₃ | B₁ | B₂ | B₃ | C₁ | C₂ | C₃ |
| Interleaved          | A₁ | B₁ | C₁ | A₂ | B₂ | C₂ | A₃ | B₃ | C₃ |
| Interleaved (Random) | C₁ | A₁ | B₁ | C₂ | A₂ | C₃ | B₂ | A₃ | B₃ |
How interleaving is accomplished depends on the type of benchmark function:
- Simple context-free functions are run once for each sample.
- Functions that have local context and use `Bencher` are paused between samples using stackful coroutines, provided by [`corosensei`].
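The coroutine approach can be sketched with `corosensei`'s public API: the benchmark body runs inside a coroutine, suspends after each sample so its local state stays alive on the coroutine's stack, and the scheduler resumes it later. The types, sample loop, and `run_one_sample` helper below are illustrative assumptions, not Divan's internals.

```rust
use corosensei::{Coroutine, CoroutineResult};

fn main() {
    // The benchmark body lives inside a stackful coroutine. Local context
    // (here, `buf`) survives across suspensions without being rebuilt.
    let mut bench: Coroutine<(), u64, ()> = Coroutine::new(|yielder, ()| {
        let mut buf = Vec::new(); // local context kept alive between samples

        for sample in 0..3u64 {
            // run_one_sample(&mut buf); // hypothetical: time one sample here
            buf.push(sample);

            // Pause here; the scheduler may now run samples of other
            // benchmarks before resuming this one.
            yielder.suspend(sample);
        }
    });

    // The scheduler resumes the coroutine once per sample slot.
    while let CoroutineResult::Yield(sample) = bench.resume(()) {
        println!("completed sample {sample}");
    }
}
```

The key property is that suspension is transparent to the benchmark author: the function is written as a straight-line loop, while the scheduler is free to interleave other benchmarks between its samples.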