emit subselect if it is needed #1732

Open
shcheklein wants to merge 13 commits into main from persist-fixes-1

Conversation

@shcheklein
Contributor

Fixes a few open issues in queries where persist is required.

Usually this means we don't generate the right SQL (e.g. limit.distinct adds the LIMIT after DISTINCT in the SQL, while it should apply an additional SELECT DISTINCT after the limit is applied), etc.
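
To make the limit-then-distinct case concrete, here is a minimal illustration using SQLite through Python's standard `sqlite3` module. It is not datachain code: the table `t`, its columns, and its rows are made up for the example. It only shows why the flat SQL and the subselect form can return different rows.

```python
import sqlite3

# In-memory table with a duplicated value in the first two rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany(
    "INSERT INTO t (id, v) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b")],
)

# Flat query: DISTINCT is evaluated first, then LIMIT trims the deduplicated rows.
flat = [row[0] for row in conn.execute(
    "SELECT DISTINCT v FROM t ORDER BY v LIMIT 2"
)]

# Subselect: LIMIT picks the first two rows, then DISTINCT deduplicates only those.
nested = [row[0] for row in conn.execute(
    "SELECT DISTINCT v FROM (SELECT v FROM t ORDER BY id LIMIT 2)"
)]

print(flat)
print(nested)
```

The second query is the shape a chain such as limit followed by distinct is expected to produce, since the distinct must see only the rows that survived the limit.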

@shcheklein shcheklein self-assigned this Apr 26, 2026
@shcheklein shcheklein marked this pull request as draft April 26, 2026 02:05
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 26, 2026

Deploying datachain with Cloudflare Pages

Latest commit: 2a24b1c
Status: ✅  Deploy successful!
Preview URL: https://6a3d116e.datachain-2g6.pages.dev
Branch Preview URL: https://persist-fixes-1.datachain-2g6.pages.dev

@codecov

codecov Bot commented Apr 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates DatasetQuery.apply_steps() planning so SQL generation inserts subquery boundaries when later clauses must operate on the materialized output of earlier “finalizing” clauses (e.g., LIMIT/OFFSET/DISTINCT/GROUP BY), fixing incorrect clause ordering/semantics that show up in persisted queries.

Changes:

  • Introduce a QueryState tracker and boundary emission (emit_boundary) to wrap in-flight queries in a subselect when needed.
  • Update clause-step implementations (filter/order_by/limit/offset/distinct/group_by) to consult/update QueryState and emit boundaries for correct semantics.
  • Add unit tests covering boundary-requiring clause sequences (e.g., filter after group_by/limit/offset/distinct; distinct after limit; order_by after limit/offset; count after limit/distinct).
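
The boundary idea in the list above can be sketched roughly as follows. This is a simplified illustration, not the code in `src/datachain/query/dataset.py`: only the names `QueryState` and `emit_boundary` come from the review summary, while the fields, the `apply_*` helpers, the `sq` alias, and the string-based SQL rendering are assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class QueryState:
    """Hypothetical tracker of which clauses the in-flight query already has."""
    source: str
    where: List[str] = field(default_factory=list)
    distinct: bool = False
    limit: Optional[int] = None
    finalized: bool = False  # True once a LIMIT or DISTINCT has been applied

    def render(self) -> str:
        sql = "SELECT DISTINCT *" if self.distinct else "SELECT *"
        sql += f" FROM {self.source}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        return sql


def emit_boundary(state: QueryState) -> QueryState:
    # Wrap the in-flight query in a subselect and start from a clean state.
    return QueryState(source=f"({state.render()}) AS sq")


def apply_filter(state: QueryState, cond: str) -> QueryState:
    # A filter after a finalizing clause must see that clause's output.
    if state.finalized:
        state = emit_boundary(state)
    return replace(state, where=state.where + [cond])


def apply_limit(state: QueryState, n: int) -> QueryState:
    # A second LIMIT must not overwrite the first one.
    if state.limit is not None:
        state = emit_boundary(state)
    return replace(state, limit=n, finalized=True)


def apply_distinct(state: QueryState) -> QueryState:
    # DISTINCT after LIMIT must deduplicate only the limited rows.
    if state.limit is not None:
        state = emit_boundary(state)
    return replace(state, distinct=True, finalized=True)


print(apply_distinct(apply_limit(QueryState("t"), 2)).render())
print(apply_filter(apply_limit(QueryState("t"), 2), "x > 1").render())
print(apply_limit(apply_filter(QueryState("t"), "x > 1"), 2).render())
```

In this sketch a filter applied before the limit stays in the same SELECT, while a filter or distinct applied after the limit triggers a subselect, which mirrors the clause sequences the new unit tests are described as covering.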

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/datachain/query/dataset.py | Adds query-state tracking and uses it during step application to force subquery boundaries for correct SQL semantics. |
| tests/unit/lib/test_datachain.py | Adds a focused test suite to validate stage-boundary behavior across common clause orderings. |

Comment thread tests/unit/lib/test_datachain.py Outdated
Comment thread src/datachain/query/dataset.py Outdated
Comment thread src/datachain/query/dataset.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


Comment thread tests/unit/lib/test_datachain.py Outdated
Comment thread src/datachain/query/dataset.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread src/datachain/query/dataset.py Outdated
@shcheklein shcheklein requested a review from Copilot April 29, 2026 00:34
@shcheklein shcheklein marked this pull request as ready for review April 29, 2026 00:34
@shcheklein shcheklein requested a review from a team April 29, 2026 00:34
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread src/datachain/data_storage/warehouse.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread tests/unit/lib/test_datachain.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread src/datachain/query/dataset.py Outdated
Comment thread src/datachain/data_storage/warehouse.py Outdated
Comment thread src/datachain/query/dataset.py Outdated
Comment thread src/datachain/query/dataset.py
Comment thread src/datachain/query/dataset.py
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Comment thread src/datachain/query/dataset.py Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


3 participants