Add MicroBenchmark for Small Trip Count Loop vectorization #404

Open
Stylie777 wants to merge 3 commits into llvm:main from Stylie777:users/Stylie777/Small-TC-Loop-Vectorization

Stylie777 commented May 14, 2026

For targets where getMinTripCountTailFoldingThreshold returns a value greater than zero, llvm/llvm-project#195823 has enabled better vectorization of loops where applicable. This micro benchmark is intended to show the impact of these changes on the relevant targets.

For targets where getMinTripCountTailFoldingThreshold returns zero, there will be no effect on runtime when comparing scalar vs vector.

Assisted-by: Codex

Comment on lines +76 to +78
g_small_loop_trip_count_sum ^= checksum(B);
benchmark::DoNotOptimize(g_small_loop_trip_count_sum);
State.SetItemsProcessed(State.iterations() * 5);
Contributor

Would be good to comment why this is needed

Author

It's not. I missed this when first reviewing the Codex-generated benchmark. I've removed it.

B[I] = A[I] + static_cast<Ty>(1);
}

NOINLINE void loopTc5I64InterleaveCount2Vector(const uint64_t *__restrict A,
Contributor

Is there a reason to not use the templated version for this one as well?

Author

No, there isn't. I have made it consistent now.
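As an illustration of what a templated interleave-count-2 kernel could look like, here is a minimal sketch. It is not the code from the PR: the macro names SKETCH_VECTORIZE_IC2 and SKETCH_NOINLINE and the function name loopTc5InterleaveCount2Vector are hypothetical, and the PR's own LOOP_VECTORIZE_ENABLE and NOINLINE macros are not shown in this thread, so their definitions are assumed here. The sketch relies on Clang's loop hint pragma; on other compilers the hint expands to nothing and the loop is compiled normally.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the PR's macros (their real definitions are not
// visible in this thread). The loop hint is only emitted for Clang.
#if defined(__clang__)
#define SKETCH_VECTORIZE_IC2                                                   \
  _Pragma("clang loop vectorize(enable) interleave_count(2)")
#else
#define SKETCH_VECTORIZE_IC2
#endif
#define SKETCH_NOINLINE __attribute__((noinline))

// One template covers every element type, so the i64 interleave-count-2 case
// no longer needs its own hand-written function.
template <typename Ty>
SKETCH_NOINLINE void loopTc5InterleaveCount2Vector(const Ty *__restrict A,
                                                   Ty *__restrict B) {
  SKETCH_VECTORIZE_IC2
  for (uint64_t I = 0; I != 5; ++I)
    B[I] = A[I] + static_cast<Ty>(1);
}
```

With this shape, the i64 variant is just the instantiation loopTc5InterleaveCount2Vector<uint64_t>, and other widths come for free.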

BENCHMARK_TEMPLATE(benchTc5Scalar, uint16_t)->Name("tc5/i16/scalar");
BENCHMARK_TEMPLATE(benchTc5Vector, uint32_t)->Name("tc5/i32/vector");
BENCHMARK_TEMPLATE(benchTc5Scalar, uint32_t)->Name("tc5/i32/scalar");
BENCHMARK_TEMPLATE(benchTc5Vector, uint64_t)->Name("tc5/i64/vector");
Contributor

I think the potential worst case would be i64 with TC=3. Could you also cover this?

Author

I have added cases for all data types for TC=3 for full coverage.
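One way to cover several trip counts without duplicating kernels is to make the trip count a template parameter alongside the element type. The sketch below is an assumption about how that could be structured, not the PR's actual code: loopVector, loopScalar, variantsAgree, and the SKETCH_* macros are hypothetical names, and the loop hints are Clang-specific (they expand to nothing elsewhere).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop-hint macros; only emitted when compiling with Clang.
#if defined(__clang__)
#define SKETCH_VECTORIZE _Pragma("clang loop vectorize(enable)")
#define SKETCH_SCALAR                                                          \
  _Pragma("clang loop vectorize(disable) interleave(disable)")
#else
#define SKETCH_VECTORIZE
#define SKETCH_SCALAR
#endif
#define SKETCH_NOINLINE __attribute__((noinline))

// Vector variant: the trip count TC is a compile-time constant, so TC=3 and
// TC=5 are just two instantiations of the same kernel.
template <typename Ty, uint64_t TC>
SKETCH_NOINLINE void loopVector(const Ty *__restrict A, Ty *__restrict B) {
  SKETCH_VECTORIZE
  for (uint64_t I = 0; I != TC; ++I)
    B[I] = A[I] + static_cast<Ty>(1);
}

// Scalar baseline: same body, with vectorization and interleaving disabled.
template <typename Ty, uint64_t TC>
SKETCH_NOINLINE void loopScalar(const Ty *__restrict A, Ty *__restrict B) {
  SKETCH_SCALAR
  for (uint64_t I = 0; I != TC; ++I)
    B[I] = A[I] + static_cast<Ty>(1);
}

// Sanity check that both variants compute the same result for a given
// element type and trip count.
template <typename Ty, uint64_t TC> bool variantsAgree() {
  Ty A[TC], V[TC], S[TC];
  for (uint64_t I = 0; I != TC; ++I) {
    A[I] = static_cast<Ty>(I * 7 + 3);
    V[I] = S[I] = 0;
  }
  loopVector<Ty, TC>(A, V);
  loopScalar<Ty, TC>(A, S);
  for (uint64_t I = 0; I != TC; ++I)
    if (V[I] != S[I] || V[I] != static_cast<Ty>(A[I] + 1))
      return false;
  return true;
}
```

The i64, TC=3 case raised above would then be loopVector<uint64_t, 3> against loopScalar<uint64_t, 3>.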

NOINLINE void loopTc5Vector(const Ty *__restrict A, Ty *__restrict B) {
  LOOP_VECTORIZE_ENABLE
  for (uint64_t I = 0; I != 5; ++I)
    B[I] = A[I] + static_cast<Ty>(1);
Contributor

This is a case where there is basically no overhead for the vector code compared to the scalar code.

Would be good to also include cases where the vector code has some overhead compared to the scalar code, e.g. some scalarization.

Author

I have added an example that has scalarization in the loop. If this is not what you meant, please let me know!
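The example the author added is not shown in this thread. As a purely hypothetical illustration of a small-trip-count loop that may incur scalarization overhead, the sketch below uses an element-wise integer division: on targets without a vector integer divide instruction, the vectorizer will typically have to scalarize the divide, though whether that happens depends on the target and its cost model. The names loopTc5DivVector and the SKETCH_* macros are assumptions, and the loop hint is Clang-specific.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop-hint macro; only emitted when compiling with Clang.
#if defined(__clang__)
#define SKETCH_VECTORIZE _Pragma("clang loop vectorize(enable)")
#else
#define SKETCH_VECTORIZE
#endif
#define SKETCH_NOINLINE __attribute__((noinline))

// Element-wise integer division at trip count 5. Many targets have no vector
// integer divide, so a vectorized version may extract each lane, divide in
// scalar registers, and re-insert the results (assumption: target-dependent).
// Callers must make sure every divisor in D is non-zero.
template <typename Ty>
SKETCH_NOINLINE void loopTc5DivVector(const Ty *__restrict A,
                                      const Ty *__restrict D,
                                      Ty *__restrict B) {
  SKETCH_VECTORIZE
  for (uint64_t I = 0; I != 5; ++I)
    B[I] = A[I] / D[I];
}
```

Comparing this kernel against a scalar twin with vectorization disabled would show whether the extra lane traffic outweighs any benefit at such a low trip count.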
