The `fp_reduction_latency` benchmarks were the very first benchmark, optimization, and primitive code tested in Laser. Unfortunately, they are currently very confusing and should be reorganized into three groups:

1. Multiple accumulators: https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_bench.nim and https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_packed_accum.nim
2. Raw vector intrinsics measurements: https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_packed_sse.nim and https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_sse_bench.nim
3. Measuring the max/min implementation: https://github.com/numforge/laser/blob/af191c086b4a98c49049ecf18f5519dc6856cc77/benchmarks/fp_reduction_latency/reduction_max_bench.nim

This reorganization should take into account https://github.com/nim-lang/Nim/issues/9514