Skip to content

fix: preserve subquery structure when unparsing SubqueryAlias over Ag…#21099

Merged
kosiew merged 6 commits intoapache:mainfrom
yonatan-sevenai:main
Apr 8, 2026
Merged

fix: preserve subquery structure when unparsing SubqueryAlias over Ag…#21099
kosiew merged 6 commits intoapache:mainfrom
yonatan-sevenai:main

Conversation

@yonatan-sevenai
Copy link
Copy Markdown
Contributor

@yonatan-sevenai yonatan-sevenai commented Mar 22, 2026

When the SQL unparser encountered a SubqueryAlias node whose direct child was an Aggregate (or other clause-building plan like Window, Sort, Limit, Union), it would flatten the subquery into a simple table alias, losing the aggregate entirely.

For example, a plan representing:
SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m

would unparse to:
SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would call subquery_alias_inner_query_and_columns (which only unwraps Projection children) and unparse_table_scan_pushdown (which only handles TableScan/SubqueryAlias/Projection). When both returned nothing useful for an Aggregate child, the code recursed directly into the Aggregate, merging its GROUP BY into the outer SELECT instead of wrapping it in a derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union), emit it as a derived subquery via self.derive() with the alias always attached, rather than falling through to the recursive path that would flatten it.

Which issue does this PR close?

Rationale for this change

The SQL unparser silently drops subquery structure when a SubqueryAlias node directly wraps an Aggregate (or Window, Sort, Limit, Union). For example, a plan representing

SELECT ... FROM j1 JOIN (SELECT max(id) FROM j2) AS b ...

unparses to

SELECT ... FROM j1 JOIN j2 AS b ...

losing the aggregate entirely. This produces semantically incorrect SQL.

What changes are included in this PR?

In the SubqueryAlias handler within select_to_sql_recursively (datafusion/sql/src/unparser/plan.rs):

  • Added an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union) and cannot be reduced to a table scan, emit it as a derived subquery (SELECT ...) AS alias via self.derive() instead of recursing into the child and flattening it.
  • Added a helper requires_derived_subquery() that identifies plan types requiring this treatment.

Are these changes tested?

Yes. A new test test_unparse_manual_join_with_subquery_aggregate is added that constructs a SubqueryAlias > Aggregate plan (without an intermediate Projection) and asserts the unparsed SQL preserves the MAX() aggregate function call. This test fails without the fix. All current unparser tests succeed without modification

Are there any user-facing changes?

No.

…gregate

When the SQL unparser encountered a SubqueryAlias node whose direct
child was an Aggregate (or other clause-building plan like Window, Sort,
Limit, Union), it would flatten the subquery into a simple table alias,
losing the aggregate entirely.

For example, a plan representing:
  SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m

would unparse to:
  SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would
call subquery_alias_inner_query_and_columns (which only unwraps
Projection children) and unparse_table_scan_pushdown (which only handles
TableScan/SubqueryAlias/Projection). When both returned nothing useful
for an Aggregate child, the code recursed directly into the Aggregate,
merging its GROUP BY into the outer SELECT instead of wrapping it in a
derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a
plan type that builds its own SELECT clauses (Aggregate, Window, Sort,
Limit, Union), emit it as a derived subquery via self.derive() with the
alias always attached, rather than falling through to the recursive
path that would flatten it.
@github-actions github-actions bot added the sql SQL Planner label Mar 22, 2026
yonatan-sevenai and others added 2 commits March 27, 2026 12:37
…gregate

When the SQL unparser encountered a SubqueryAlias node whose direct
child was an Aggregate (or other clause-building plan like Window, Sort,
Limit, Union), it would flatten the subquery into a simple table alias,
losing the aggregate entirely.

For example, a plan representing:
  SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m

would unparse to:
  SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would
call subquery_alias_inner_query_and_columns (which only unwraps
Projection children) and unparse_table_scan_pushdown (which only handles
TableScan/SubqueryAlias/Projection). When both returned nothing useful
for an Aggregate child, the code recursed directly into the Aggregate,
merging its GROUP BY into the outer SELECT instead of wrapping it in a
derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a
plan type that builds its own SELECT clauses (Aggregate, Window, Sort,
Limit, Union), emit it as a derived subquery via self.derive() with the
alias always attached, rather than falling through to the recursive
path that would flatten it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yonatan-sevenai
Thanks for the work here. This is heading in the right direction, but there is still a correctness issue around how the alias guard is applied. I think we need a small adjustment to ensure parser-generated subqueries are handled safely.

@yonatan-sevenai yonatan-sevenai requested a review from kosiew April 4, 2026 02:23
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@yonatan-sevenai
Copy link
Copy Markdown
Contributor Author

yonatan-sevenai commented Apr 4, 2026

lgtm

Thanks @kosiew . What's the next step to get it merged?

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 7, 2026

I kicked off the CI tests

@yonatan-sevenai
Copy link
Copy Markdown
Contributor Author

I kicked off the CI tests

Merged in Master to make merge easier, but you might need to rerun CI.

@kosiew kosiew added this pull request to the merge queue Apr 8, 2026
@kosiew
Copy link
Copy Markdown
Contributor

kosiew commented Apr 8, 2026

@yonatan-sevenai
thanks again for contributing

Merged via the queue into apache:main with commit cdddd76 Apr 8, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQL Unparser loses subquery structure when SubqueryAlias directly wraps an Aggregate (PR to follow)

3 participants