Unify scan_plan across parquet and DuckDB-native scans by kevkrist · Pull Request #869 · sirius-db/sirius

kevkrist · 2026-06-03T19:45:36Z

Routes the DuckDB-native GPU scan through the same scan_plan abstraction the parquet scan already uses. The native bind data now carries a shared scan_plan instead of parallel projected_cols/projected_types vectors: the walker and decoder read plan.data_columns, filters resolve through plan.batch_position_by_column_id, and output is reshaped via assemble_scan_output instead of a hand-rolled projection. scan_plan gains optional is_rowid + per-column type fields (behind build_scan_plan_options) so the native decoder can synthesize rowid and resolve types up front; the parquet path is unchanged. Also fixes a latent crash when a filter references the rowid column.

bwyogatama · 2026-06-05T04:23:53Z

This PR touches a bunch of files that i deleted, which makes my rebase more difficult. Let's wait for #871 to merge before merging this

Route the DuckDB-native GPU scan through the same scan_plan abstraction the parquet scan uses, replacing its parallel projected_cols / projected_types vectors and ad-hoc "keep first N columns" projection with the shared plan + assemble_scan_output. - scan_plan::data_column gains optional is_rowid + logical-type fields, and build_scan_plan gains build_scan_plan_options (decode_rowid_columns + type_for) so the native path routes synthesized rowid columns into data_columns / output_layout and carries per-column types. Parquet is unchanged: default options reproduce today's behavior. - duckdb_native_ingestible_table_info carries a shared scan_plan; the projected_column struct and the now-dead output_types / names / projection_ids fields are removed. - The pipeline converter builds the plan once via build_scan_plan. - The walker and decoder consume plan.data_columns directly. - The ingestible builds filters over plan.batch_position_by_column_id and reshapes output via assemble_scan_output (needs_output_assembly), the same path parquet uses. Also fixes a latent rowid-filter crash: convert_table_filters_to_expression resolved a filter column's type via returned_types.at(primary_idx), out of bounds for rowid's sentinel primary index. Rowid's filter type now resolves as BIGINT directly. Tests: walker unit tests migrated to the scan_plan signature; new test_build_scan_plan.cpp covers rowid routing, the parquet default drop, the missing-type throw, pure-filter columns, and reordered projection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The unified scan_plan carried two per-column fields (is_rowid, type) that only the duckdb-native reader populates; parquet/pinned left them at defaults. Fold them into a single std::optional<reader_decode_info> so the "type and is_rowid travel together" invariant is enforced by construction instead of a runtime loop, and so the native-only nature is self-documenting (reader_info present => reader-typed path; empty => the reader resolves types itself). - data_column::{is_rowid,type} -> data_column::reader_info (reader_decode_info{type, is_rowid}); reader_decode_info cannot exist without a type, so the build-time check now only verifies presence. - Native decoder/walker/ingestible read reader_info->{type,is_rowid}; duckdb_column_metadata's own is_rowid is unchanged. - Note the future direction: when a second reader needs per-column decode metadata (ORC/Arrow), promote reader_info to a std::variant with each format's arm defined outside scan_plan. Tests updated to the new shape. [scan] suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bwyogatama · 2026-06-12T18:51:07Z

@kevkrist is this still an ongoing work?

kevkrist · 2026-06-12T21:55:50Z

@bwyogatama I will rebase and evaluate whether it serves any purpose after gpu_ingestible

bwyogatama · 2026-06-12T21:56:58Z

Yeah and this is not P0 so don't worry too much about it

kevkrist marked this pull request as ready for review June 3, 2026 19:46

kevkrist force-pushed the scan-plan-unification branch from ed00455 to 19fc910 Compare June 10, 2026 15:47

kevkrist changed the title ~~Scan plan unification + duckdb scan manager metadata parsing~~ Unify scan_plan across parquet and DuckDB-native scans Jun 10, 2026

kevkrist enabled auto-merge June 10, 2026 16:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify scan_plan across parquet and DuckDB-native scans#869

Unify scan_plan across parquet and DuckDB-native scans#869
kevkrist wants to merge 2 commits into
sirius-db:devfrom
kevkrist:scan-plan-unification

kevkrist commented Jun 3, 2026 •

edited

Loading

Uh oh!

bwyogatama commented Jun 5, 2026

Uh oh!

bwyogatama commented Jun 12, 2026

Uh oh!

kevkrist commented Jun 12, 2026

Uh oh!

bwyogatama commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

kevkrist commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bwyogatama commented Jun 5, 2026

Uh oh!

bwyogatama commented Jun 12, 2026

Uh oh!

kevkrist commented Jun 12, 2026

Uh oh!

bwyogatama commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kevkrist commented Jun 3, 2026 •

edited

Loading