Fold pipelineable downstream ops into merge pipelines after GROUP BY and TOP_N #898
Draft
Jedi18 wants to merge 3 commits into
…and TOP_N. Avoid an extra pipeline boundary when downstream operators can run in the same GPU task as the merge step. Block fusion when a structural sink (ORDER BY, HASH_JOIN, etc.) or a later pipeline reads from the fused output. Skip UNGROUPED_AGGREGATE where the same physical op is both partial sink and merge source, which breaks partial→merge repository wiring.
Cover GROUP BY and TOP_N fusion, structural sinks that block fusion (ORDER BY, HASH JOIN), and UNGROUPED aggregate exclusion.
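The blocking rules in the commit message can be sketched as a small decision function. This is a toy model, not the Sirius planner: `SplitKind`, `OpKind`, `Downstream`, `FuseDecision`, and `decide_fusion` are all hypothetical names invented for illustration, and the set of structural sinks is limited to the ones named above.

```cpp
#include <vector>

// Hypothetical stand-ins; these are NOT the real Sirius types.
enum class SplitKind { GroupBy, TopN, UngroupedAggregate };
enum class OpKind { Projection, Filter, ResultCollector, OrderBy, HashJoin, HashGroupBy };

enum class FuseDecision {
    Fuse,
    SkipUngroupedAggregate,
    BlockedByStructuralSink,
    BlockedByLaterReader
};

// Downstream pipeline that follows the merge step.
struct Downstream {
    std::vector<OpKind> ops;
    bool read_by_later_pipeline;  // true if another pipeline consumes its output
};

// Assumption: only the sinks named in the PR text are modeled here.
bool is_structural_sink(OpKind kind) {
    return kind == OpKind::OrderBy || kind == OpKind::HashJoin ||
           kind == OpKind::HashGroupBy;
}

FuseDecision decide_fusion(SplitKind split, const Downstream& downstream) {
    // Same physical op is partial sink and merge source, so never fuse.
    if (split == SplitKind::UngroupedAggregate) {
        return FuseDecision::SkipUngroupedAggregate;
    }
    // A structural sink needs its own pipeline boundary.
    for (OpKind kind : downstream.ops) {
        if (is_structural_sink(kind)) {
            return FuseDecision::BlockedByStructuralSink;
        }
    }
    // A later reader still depends on the boundary being there.
    if (downstream.read_by_later_pipeline) {
        return FuseDecision::BlockedByLaterReader;
    }
    return FuseDecision::Fuse;
}
```

In this model, a `GROUP BY` followed by a dead-end `PROJECTION → RESULT_COLLECTOR` fuses, while the same downstream behind an ungrouped aggregate, or any downstream containing `ORDER BY`, does not.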
Collaborator:
The query planner is undergoing major refactoring and …

Jedi18 (Collaborator, Author):
Thanks for letting me know @bwyogatama, I'll put this PR on hold till then I guess.
Summary

After `HASH_GROUP_BY`/`TOP_N` splitting, Sirius previously created a small merge pipeline (`PARTITION → MERGE_*`) and then a separate downstream pipeline for streaming ops (typically `RESULT_COLLECTOR`). That extra pipeline boundary is unnecessary when downstream is a dead end.

This PR folds pipelineable downstream operators into the merge pipeline when safe, so merge and collector run in a single `gpu_pipeline_task`:

Before: `PARTITION → MERGE_GROUP_BY | RESULT_COLLECTOR`
After: `PARTITION → MERGE_GROUP_BY → RESULT_COLLECTOR`

Fusion is applied from `split_group_aggregate_sink` and `split_top_n_sink` via `attach_merge_and_fuse_downstream`, with three guards in `try_fuse_downstream_pipelineable`:

- The downstream pipeline must not end in a structural sink (`ORDER BY`, `HASH_JOIN`, nested `GROUP BY`, etc.) that needs its own `split_*` path
- No later pipeline may read from the fused output (e.g. `GROUP BY → PROJECTION → HASH_JOIN`); fusion would remove a pipeline boundary that join wiring still depends on
- Only pipelineable operators are folded (`!is_source() && !is_sink()`)

`UNGROUPED_AGGREGATE` is explicitly excluded: the same physical op is both the partial-aggregate sink and the merge pipeline source. Folding downstream broke partial→merge repository wiring and produced wrong results (e.g. `avg` off by orders of magnitude).

Adds `absorbed_pipelines_` so absorbed downstream pipelines are skipped in `split_pipelines()` and not double-scheduled.

Also includes regression tests in `test_pipeline_merge_fusion.cpp`.
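To illustrate why an absorbed set prevents double scheduling, here is a minimal sketch of the bookkeeping. It is a toy model under stated assumptions: `Pipeline`, `ToyPlanner`, `fold`, and `schedulable` are hypothetical names, operators are plain strings, and only the member name `absorbed_pipelines_` is borrowed from the PR description; the real `split_pipelines()` logic is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pipeline: an ordered list of operator names.
struct Pipeline {
    std::vector<std::string> ops;
};

// Toy planner (not the Sirius class) that tracks absorbed pipelines.
class ToyPlanner {
public:
    std::size_t add(Pipeline pipeline) {
        pipelines_.push_back(std::move(pipeline));
        return pipelines_.size() - 1;
    }

    // Append the downstream operators to the merge pipeline and mark the
    // downstream pipeline as absorbed so it is not scheduled on its own.
    void fold(std::size_t merge_id, std::size_t downstream_id) {
        assert(merge_id != downstream_id);
        std::vector<std::string>& merge_ops = pipelines_.at(merge_id).ops;
        const std::vector<std::string>& down_ops = pipelines_.at(downstream_id).ops;
        merge_ops.insert(merge_ops.end(), down_ops.begin(), down_ops.end());
        absorbed_pipelines_.insert(downstream_id);
    }

    // Return the pipelines that should still be scheduled, skipping absorbed ones.
    std::vector<Pipeline> schedulable() const {
        std::vector<Pipeline> result;
        for (std::size_t id = 0; id < pipelines_.size(); ++id) {
            if (absorbed_pipelines_.count(id) == 0) {
                result.push_back(pipelines_[id]);
            }
        }
        return result;
    }

private:
    std::vector<Pipeline> pipelines_;
    std::set<std::size_t> absorbed_pipelines_;
};
```

Without the skip in `schedulable()`, the collector in this model would run twice: once inside the fused merge pipeline and once as its original standalone pipeline.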