feat: support runtime length for vectorized functions with compile-time max bound #1777
Open
AraxTheCoder wants to merge 1 commit into data61:master
Description: Runtime Vectorization Length Support
This pull request introduces the ability to execute vectorized instructions and functions based on a runtime-determined length.
While the maximum possible length must still be defined at compile time so that instructions can be generated, this change allows for significantly more efficient execution when the actual number of required operations is only known at runtime.
High-Level API
The functionality is exposed through an extension of the vectorize decorator, which now supports the optional named argument active_length.
Comparison & Motivation
To illustrate the necessity of this change, consider the performance and resource trade-offs of different approaches when processing a fraction of a large array at runtime:
Current Framework State (Timer 3):
To handle dynamic lengths today, one must process all max_length elements. This introduces no additional compile-time overhead but results in significant data-volume and computation overhead, as unused operations are still executed.
Manual Mapping (Pre-compilation):
Another workaround is "pre-compiling" the vectorized instruction/function for every possible length and mapping them at runtime to the corresponding function. While this avoids data-volume overhead, it causes a compile-time overhead that scales linearly with the maximum possible length.
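The pre-compilation workaround can be pictured with a small plain-Python sketch. It does not use the framework: `compile_for_length`, `variants`, and `run` are hypothetical names, and "compiling" is stood in for by building a closure. The point is only that one variant is needed per possible length, so the table grows linearly with the maximum.

```python
MAX_LENGTH = 8  # assumed compile-time upper bound, chosen for illustration


def compile_for_length(n):
    """Hypothetical stand-in for compiling a vectorized function of fixed size n."""
    def kernel(data):
        # Processes exactly n elements, like a statically sized vector operation.
        return [x * x for x in data[:n]]
    return kernel


# One specialized variant per possible length: the table has MAX_LENGTH entries.
variants = {n: compile_for_length(n) for n in range(1, MAX_LENGTH + 1)}


def run(data, runtime_length):
    """Pick the variant that matches the length known only at runtime."""
    return variants[runtime_length](data)


result = run(list(range(MAX_LENGTH)), 3)
print(len(variants), result)
```

Only the selected variant runs, so no unused elements are processed, but every variant had to be generated up front.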
The Proposed Solution (Timer 5):
By using the active_length parameter, the optimal performance and data-volume of the static baseline (Timer 4) is matched. This eliminates the need to process unused elements without increasing compilation times.
Preprocessed Data & Batching
A notable characteristic of this approach is that the exact amount of preprocessed data required is not known at compile time. To keep the number of communication rounds as low as in the static baseline, the batch size must be adjusted. For the test program above, a batch size of 1,000,000 was used to verify optimal performance; however, users need to tune this value based on their specific requirements when using this feature.
I tried to keep the changes within the Instruction and Processor files as minimal as possible.
Crucially, the standard behavior of the vectorize decorator should remain unchanged when the active_length extension is not utilized.
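As a rough illustration of the intended semantics, here is a toy model in plain Python. It is not the framework's vectorize decorator and does not reflect how this PR is implemented: `toy_vectorize`, its `max_length` parameter, and the `ops_executed` counter are assumptions made up for this sketch, and returning only the active results is a simplification. It shows a bounded decorator that processes only `active_length` elements when the argument is given and behaves as before when it is omitted.

```python
def toy_vectorize(max_length, active_length=None):
    """Toy stand-in (not the real decorator): apply op element-wise up to a bound."""
    def decorator(op):
        def wrapper(values):
            if len(values) > max_length:
                raise ValueError("input exceeds the compile-time maximum")
            # Without active_length, process the whole input (unchanged behaviour).
            n = len(values) if active_length is None else active_length
            if not 0 <= n <= len(values):
                raise ValueError("active_length out of range")
            wrapper.ops_executed = n  # record how many operations actually ran
            return [op(v) for v in values[:n]]
        wrapper.ops_executed = 0
        return wrapper
    return decorator


data = list(range(10))


@toy_vectorize(max_length=10)
def square_all(x):
    return x * x


@toy_vectorize(max_length=10, active_length=3)
def square_some(x):
    return x * x


full = square_all(data)
partial = square_some(data)
print(square_all.ops_executed, square_some.ops_executed)
```

In this model the default path executes one operation per input element, while the bounded path executes only as many as `active_length` requests.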