feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys by xiedeyantu · Pull Request #21362 · apache/datafusion

xiedeyantu · 2026-04-04T15:11:31Z

Which issue does this PR close?

Closes #21361 (Optimize ORDER BY by Pruning Functionally Redundant Sort Keys).

Rationale for this change

This PR adds functional-dependency-based simplification for ORDER BY clauses. When an earlier sort key already functionally determines a later key, the later key is redundant and can be removed without changing query semantics. This reduces unnecessary sorting work and avoids carrying extra sort keys through planning and execution.
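A minimal sketch of the pruning idea, assuming sort keys are plain column names and functional dependencies are given as determinant/dependent column lists. The `FunctionalDependency` struct and both function names are hypothetical and are not the DataFusion API; sort direction and null handling are ignored here.

```rust
use std::collections::HashSet;

/// Hypothetical model of a functional dependency: the columns in
/// `determinant` together fix the value of every column in `dependent`.
struct FunctionalDependency {
    determinant: Vec<&'static str>,
    dependent: Vec<&'static str>,
}

/// Returns every column that is functionally determined by `seed`
/// (including the seed columns themselves).
fn closure(
    seed: &HashSet<&'static str>,
    fds: &[FunctionalDependency],
) -> HashSet<&'static str> {
    let mut known = seed.clone();
    loop {
        let before = known.len();
        for fd in fds {
            if fd.determinant.iter().all(|c| known.contains(c)) {
                known.extend(fd.dependent.iter().copied());
            }
        }
        // Stop once a full pass adds nothing new.
        if known.len() == before {
            return known;
        }
    }
}

/// Drops each sort key that is already determined by the keys kept before it.
fn prune_redundant_sort_keys(
    keys: &[&'static str],
    fds: &[FunctionalDependency],
) -> Vec<&'static str> {
    let mut kept: Vec<&'static str> = Vec::new();
    let mut determined: HashSet<&'static str> = HashSet::new();
    for &key in keys {
        if determined.contains(key) {
            // Rows that tie on the earlier keys also tie on this key.
            continue;
        }
        kept.push(key);
        let seed: HashSet<&'static str> = kept.iter().copied().collect();
        determined = closure(&seed, fds);
    }
    kept
}
```

With a single dependency `c1 -> c2`, the key list `[c1, c2]` shrinks to `[c1]`, while `[c2, c1]` is left unchanged because `c2` appears before the column that determines it.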

What changes are included in this PR?

This PR extends the existing functional dependency utilities with a helper for pruning redundant sort keys, and wires that helper into eliminate_duplicated_expr so Sort nodes can be simplified during optimization. It also adds regression coverage for both the positive case, where a trailing sort key is removed, and the negative case, where sort order prevents pruning.

Are these changes tested?

Yes. I added unit tests covering:

removal of a functionally redundant trailing ORDER BY key
preservation of ordering when the dependent column appears before its determinant

I also ran cargo test -p datafusion-optimizer eliminate_duplicated_expr -- --nocapture successfully, and cargo fmt --all passes.

Are there any user-facing changes?

Yes, but only in query planning behavior. Some queries with redundant ORDER BY keys may produce simpler plans and run more efficiently. There are no public API changes.

neilconway

Looks good overall! A few minor suggestions.

datafusion/common/src/functional_dependencies.rs

datafusion/sqllogictest/test_files/window.slt

datafusion/optimizer/src/eliminate_duplicated_expr.rs

xiedeyantu · 2026-04-06T03:43:06Z

> Looks good overall! A few minor suggestions.

@neilconway Thank you for your comments and suggestions; I think the scenario you mentioned—changing order by deptno, total_sal, abs(deptno) to order by deptno, abs(deptno)—is an excellent point. We should definitely support this; our previous restrictions were indeed a bit too strict. We could even go a step further in the future by leveraging injective functions for additional optimizations (though it might be better to implement this in a separate PR). Regarding the rest of your comments, I have attempted to address each one by submitting a corresponding commit. Could you please take another look and review them? Thank you once again for your thorough reviews of every single one of my PRs!
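The `deptno, total_sal, abs(deptno)` scenario can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `SortKey` and `prune_with_expressions` are hypothetical names, dependencies are limited to single-column pairs with no transitive reasoning, and expression keys such as `abs(deptno)` are always kept rather than analyzed.

```rust
/// Hypothetical sort-key model: a bare column or any other expression.
#[derive(Debug, Clone, PartialEq)]
enum SortKey {
    Column(&'static str),
    Expr(&'static str),
}

/// `fds` lists (determinant column, dependent column) pairs,
/// for example ("deptno", "total_sal").
fn prune_with_expressions(
    keys: &[SortKey],
    fds: &[(&'static str, &'static str)],
) -> Vec<SortKey> {
    let mut kept: Vec<SortKey> = Vec::new();
    for key in keys {
        let redundant = match key {
            // A column is redundant if an earlier kept column equals it
            // or directly determines it.
            SortKey::Column(col) => kept.iter().any(|earlier| match earlier {
                SortKey::Column(det) => det == col || fds.contains(&(*det, *col)),
                SortKey::Expr(_) => false,
            }),
            // Expressions are conservatively kept in this sketch.
            SortKey::Expr(_) => false,
        };
        if !redundant {
            kept.push(key.clone());
        }
    }
    kept
}
```

Given the pair `("deptno", "total_sal")`, the keys `deptno, total_sal, abs(deptno)` reduce to `deptno, abs(deptno)`: the expression key no longer blocks pruning of the column between it and its determinant.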

neilconway

Thanks for iterating on this!

Can you "resolve" comment threads for review comments you believe have been addressed, please?

datafusion/common/src/functional_dependencies.rs

datafusion/optimizer/src/eliminate_duplicated_expr.rs

datafusion/common/src/functional_dependencies.rs

xiedeyantu · 2026-04-09T13:47:30Z

> Thanks for iterating on this!
>
> Can you "resolve" comment threads for review comments you believe have been addressed, please?

@neilconway My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I didn't click the button. Going forward, I will follow your suggestion. Thank you!

neilconway · 2026-04-09T13:49:21Z

> My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I didn't click the button.

I think it's simpler if the PR submitter just proactively "resolves" comments they believe have been resolved; if the reviewer disagrees, they can always reopen the comment thread.

neilconway

Looks good to me! Nice work.

@alamb PR looks reasonable to me.

xiedeyantu · 2026-04-09T13:52:31Z

> > My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I didn't click the button.
>
> I think it's simpler if the PR submitter just proactively "resolves" comments they believe have been resolved; if the reviewer disagrees, they can always reopen the comment thread.

I completely agree; this will make it much easier for the reviewer to conduct the next round of reviews. Thank you!

alamb

I am not sure about the logic in this one. Thanks @neilconway and @xiedeyantu

alamb · 2026-04-10T20:02:50Z

datafusion/sqllogictest/test_files/order.slt

 ----
 logical_plan
-01)Sort: table_with_ordered_pk.c1 ASC NULLS LAST, table_with_ordered_pk.c2 ASC NULLS LAST
+01)Sort: table_with_ordered_pk.c1 ASC NULLS LAST


I don't understand this change -- the query requires ORDER BY c1, c2 but now the query only sorts on c1. The primary key on c1 means there are no duplicates, but how does that ensure it is also ordered by c2?

For example, what about:

INSERT INTO table VALUES (1,2);
INSERT INTO table VALUES (2,1);

That is still ordered by c1, but if you don't also sort on c2, you'll end up with the wrong sort.

neilconway · Apr 10, 2026

Hmm, the optimization seems sound to me. If c1 functionally determines c2, we know that each distinct c1 value is associated with exactly one c2 value. So sorting by c1 is sufficient; adding in c2 as a tiebreaker / secondary sort key is never useful.

In the example, if c1 is a PK of table, sorting by c1 is sufficient to get the right sort order -- there will never be two rows with the same c1 value.

xiedeyantu · Apr 11, 2026

@alamb I fully agree with Neil's explanation: a functional dependency guarantees exactly this property. For instance, an injective function inherently exhibits the characteristics of a functional dependency. For ORDER BY, the position of each key also matters: whether a key can be pruned is evaluated against the sequence of the original sort keys (as explained in the accompanying code). This ensures that the semantics remain correct after the redundant ORDER BY keys have been eliminated.
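The argument in this thread can be checked on the two rows from the example above. The sketch below assumes a hypothetical two-column table stored as `(c1, c2)` tuples with `c1` unique; the function names are illustrative only.

```rust
/// Sorts rows by c1 only (the pruned sort).
fn sort_by_c1(mut rows: Vec<(i32, i32)>) -> Vec<(i32, i32)> {
    rows.sort_by_key(|&(c1, _)| c1);
    rows
}

/// Sorts rows by c1, then by c2 (the original sort).
fn sort_by_c1_c2(mut rows: Vec<(i32, i32)>) -> Vec<(i32, i32)> {
    rows.sort_by_key(|&(c1, c2)| (c1, c2));
    rows
}
```

For the rows `(1,2)` and `(2,1)`, both functions return `[(1, 2), (2, 1)]`. Because no two rows share a `c1` value, the `c2` component of the composite key is never consulted as a tiebreaker. If `c1` were neither unique nor a determinant of `c2`, the two sorts could differ and the pruning would not apply.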

feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys

3a57d0b

github-actions bot added the optimizer (Optimizer rules) and common (Related to common crate) labels Apr 4, 2026

fix existing cases

dc15b02

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Apr 4, 2026

neilconway reviewed Apr 5, 2026

xiedeyantu added 5 commits April 6, 2026 10:43

support simplify order by deptno, total_sal, abs(deptno) to order by deptno, abs(deptno)

77308cc

change unique_exprs conditional to an assertion

f998b1a

update test doc

8725d5c

changed code as your suggestions

17b1bd4

fix test

fb3c0b9

neilconway reviewed Apr 9, 2026

addressed comments

e6e430f

neilconway approved these changes Apr 9, 2026

Merge branch 'main' into sortkey

c50dc1b

alamb reviewed Apr 10, 2026

neilconway mentioned this pull request Apr 10, 2026

Incorrect query results for GROUP BY with UNIQUE constraint #21507

Open
