Skip to content

Fold pipelineable downstream ops into merge pipelines after GROUP BY and TOP_N#898

Draft
Jedi18 wants to merge 3 commits into
sirius-db:devfrom
Jedi18:fix_merge_pipelining
Draft

Fold pipelineable downstream ops into merge pipelines after GROUP BY and TOP_N#898
Jedi18 wants to merge 3 commits into
sirius-db:devfrom
Jedi18:fix_merge_pipelining

Conversation

@Jedi18

@Jedi18 Jedi18 commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

After HASH_GROUP_BY / TOP_N splitting, Sirius previously created a small merge pipeline (PARTITION → MERGE_*) and then a separate downstream pipeline for streaming ops (typically RESULT_COLLECTOR). That extra pipeline boundary is unnecessary when downstream is a dead end.

This PR folds pipelineable downstream operators into the merge pipeline when safe, so merge and collector run in a single gpu_pipeline_task:

Before: PARTITION → MERGE_GROUP_BY | RESULT_COLLECTOR
After: PARTITION → MERGE_GROUP_BY → RESULT_COLLECTOR

Fusion is applied from split_group_aggregate_sink and split_top_n_sink via attach_merge_and_fuse_downstream, with three guards in try_fuse_downstream_pipelineable:

  • Guard 1: block when a direct downstream pipeline ends in a structural sink (ORDER BY, HASH_JOIN, nested GROUP BY, etc.) that needs its own split_* path
  • Guard 2: block when a later structural pipeline reads from the candidate downstream’s output (e.g. GROUP BY → PROJECTION → HASH_JOIN) — fusion would remove a pipeline boundary that join wiring still depends on
  • Guard 3: only absorb pipelines whose operators are plain intermediates (!is_source() && !is_sink())

UNGROUPED_AGGREGATE is explicitly excluded: the same physical op is both the partial-aggregate sink and the merge pipeline source. Folding downstream broke partial→merge repository wiring and produced wrong results (e.g. avg off by orders of magnitude).

Adds absorbed_pipelines_ so absorbed downstream pipelines are skipped in split_pipelines() and not double-scheduled.

Also includes regression tests in test_pipeline_merge_fusion.cpp

Jedi18 added 3 commits June 7, 2026 22:51
…and TOP_N.

Avoid an extra pipeline boundary when downstream operators can run in the same
GPU task as the merge step. Block fusion when a structural sink (ORDER BY,
HASH_JOIN, etc.) or a later pipeline reads from the fused output. Skip
UNGROUPED_AGGREGATE where the same physical op is both partial sink and merge
source, which breaks partial→merge repository wiring.
Cover GROUP BY and TOP_N fusion, structural sinks that block fusion (ORDER BY,
HASH JOIN), and UNGROUPED aggregate exclusion.
@bwyogatama

Copy link
Copy Markdown
Collaborator

The query planner is undergoing major refactoring and split_group_aggregate_sink and split_top_n_sink will be deleted after this PR #790.
So I would work on this after that PR is in.

@Jedi18

Jedi18 commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks for letting me know @bwyogatama, I'll put this PR on hold till then I guess

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants