emit subselect if it is needed #1732
Deploying datachain with
| Latest commit: | 2a24b1c |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://6a3d116e.datachain-2g6.pages.dev |
| Branch Preview URL: | https://persist-fixes-1.datachain-2g6.pages.dev |
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
This PR updates DatasetQuery.apply_steps() planning so SQL generation inserts subquery boundaries when later clauses must operate on the materialized output of earlier “finalizing” clauses (e.g., LIMIT/OFFSET/DISTINCT/GROUP BY), fixing incorrect clause ordering/semantics that show up in persisted queries.
Changes:
- Introduce a `QueryState` tracker and boundary emission (`emit_boundary`) to wrap in-flight queries in a subselect when needed.
- Update clause-step implementations (filter/order_by/limit/offset/distinct/group_by) to consult/update `QueryState` and emit boundaries for correct semantics.
- Add unit tests covering boundary-requiring clause sequences (e.g., filter after group_by/limit/offset/distinct; distinct after limit; order_by after limit/offset; count after limit/distinct).
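To make the boundary idea concrete, here is a minimal, string-based sketch. It is not the code from this PR: `ToyQuery` and the `apply_*` helpers are hypothetical names invented for illustration, and only the method name `emit_boundary` is borrowed from the description above. It assumes a "finalizing" clause means LIMIT or DISTINCT, and it wraps the in-flight query in a subselect before a later clause that must see the finalized rows.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyQuery:
    """Hypothetical stand-in for an in-flight query plus its tracked state."""

    source: str
    where: list = field(default_factory=list)
    limit: Optional[int] = None
    distinct: bool = False

    def finalized(self) -> bool:
        # True once a clause has been applied that later clauses must not "leak" past.
        return self.limit is not None or self.distinct

    def emit_boundary(self) -> "ToyQuery":
        # Wrap everything built so far in a subselect and start from a clean state.
        return ToyQuery(source=f"({self.to_sql()}) AS sq")

    def to_sql(self) -> str:
        sql = "SELECT DISTINCT *" if self.distinct else "SELECT *"
        sql += f" FROM {self.source}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql


def apply_filter(q: ToyQuery, condition: str) -> ToyQuery:
    # A filter after LIMIT/DISTINCT must act on the already-finalized rows.
    if q.finalized():
        q = q.emit_boundary()
    q.where.append(condition)
    return q


def apply_limit(q: ToyQuery, n: int) -> ToyQuery:
    # A second LIMIT must not overwrite the first one.
    if q.limit is not None:
        q = q.emit_boundary()
    q.limit = n
    return q


def apply_distinct(q: ToyQuery) -> ToyQuery:
    # DISTINCT after LIMIT must deduplicate only the limited rows.
    if q.limit is not None:
        q = q.emit_boundary()
    q.distinct = True
    return q


limit_then_distinct = apply_distinct(apply_limit(ToyQuery("t"), 3)).to_sql()
filter_then_limit = apply_limit(apply_filter(ToyQuery("t"), "x > 1"), 3).to_sql()
print(limit_then_distinct)
print(filter_then_limit)
```

In this sketch a filter followed by a limit stays a single flat SELECT, while a limit followed by distinct gets a subselect, which mirrors the "only when needed" wording in the PR title.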
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/datachain/query/dataset.py | Adds query-state tracking and uses it during step application to force subquery boundaries for correct SQL semantics. |
| tests/unit/lib/test_datachain.py | Adds a focused test suite to validate stage-boundary behavior across common clause orderings. |
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Fixes a few open issues in queries where `persist` is required.
Usually it means we don't generate the right SQL. For example, a `limit` followed by `distinct` puts the LIMIT after the DISTINCT in the generated SQL, while it should run an additional SELECT DISTINCT over the rows that remain after the limit is applied.
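The difference is easy to reproduce outside datachain. The following is a small sketch using Python's built-in `sqlite3` module; the table `t` and its contents are made up for illustration and the SQL is written by hand, not generated by datachain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany(
    "INSERT INTO t (id, x) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3)],
)

# Flat query: DISTINCT is evaluated first, LIMIT is applied to the distinct values.
flat = conn.execute("SELECT DISTINCT x FROM t ORDER BY x LIMIT 2").fetchall()

# Subselect: LIMIT picks the first two rows, then DISTINCT runs over those rows only.
nested = conn.execute(
    "SELECT DISTINCT x FROM (SELECT x FROM t ORDER BY id LIMIT 2) ORDER BY x"
).fetchall()

print(flat)    # [(1,), (2,)]
print(nested)  # [(1,)]
conn.close()
```

The flat form returns two distinct values, while the nested form returns the single distinct value found in the first two rows, which is the behavior a limit followed by distinct is meant to have.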