
Added ARRAY data type (fixed-sized lists) #901

Open
yayen-lin wants to merge 7 commits into sirius-db:dev from yayen-lin:sirius-array-type



@yayen-lin (Contributor) commented Jun 8, 2026

Add ARRAY (fixed-size list) data type support

Summary

  • Maps DuckDB's ARRAY type to cuDF LIST in the type system.
  • cuCascade handles the H2D transfer and memory management for the LIST column.
  • list_offset_fixup.cpp/.hpp: a temporary workaround (see #147) that recasts LIST offsets back to INT32 after cuCascade's H2D transfer promotes them to INT64.

Changes

Type system

  • src/include/helper/logical_type.hpp: add the ARRAY logical type.
  • src/helper/type_conversions.cpp: ARRAY to/from DuckDB conversion.
  • src/include/cudf/cudf_utils.hpp: map ARRAY to/from cuDF LIST.
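A minimal sketch of the type mapping, assuming (as the Summary states) that ARRAY lowers to a cuDF LIST. The enums, `LogicalType`, `to_cudf_type`, and `from_cudf_type` below are hypothetical stand-ins, not the real contents of logical_type.hpp or cudf_utils.hpp. What it illustrates: the forward mapping is many-to-one (LIST and ARRAY both become LIST), so the reverse mapping needs the fixed array size from somewhere else, here an explicit argument, to tell ARRAY apart from LIST.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for the Sirius logical type ids and cuDF type ids.
enum class LogicalTypeId { INTEGER, BIGINT, DOUBLE, VARCHAR, LIST, ARRAY };
enum class CudfTypeId { INT32, INT64, FLOAT64, STRING, LIST };

struct LogicalType {
  LogicalTypeId id;
  std::uint32_t array_size = 0;  // Elements per row; only meaningful for ARRAY.
};

// Forward mapping: LIST and ARRAY both become a cuDF LIST column.
CudfTypeId to_cudf_type(const LogicalType& type) {
  switch (type.id) {
    case LogicalTypeId::INTEGER: return CudfTypeId::INT32;
    case LogicalTypeId::BIGINT: return CudfTypeId::INT64;
    case LogicalTypeId::DOUBLE: return CudfTypeId::FLOAT64;
    case LogicalTypeId::VARCHAR: return CudfTypeId::STRING;
    case LogicalTypeId::LIST:
    case LogicalTypeId::ARRAY: return CudfTypeId::LIST;
  }
  throw std::invalid_argument("unsupported logical type");
}

// Reverse mapping: a cuDF LIST is read back as ARRAY only when the caller
// supplies the fixed size (e.g. taken from the original DuckDB column type).
LogicalType from_cudf_type(CudfTypeId id,
                           std::optional<std::uint32_t> array_size = std::nullopt) {
  switch (id) {
    case CudfTypeId::INT32: return {LogicalTypeId::INTEGER};
    case CudfTypeId::INT64: return {LogicalTypeId::BIGINT};
    case CudfTypeId::FLOAT64: return {LogicalTypeId::DOUBLE};
    case CudfTypeId::STRING: return {LogicalTypeId::VARCHAR};
    case CudfTypeId::LIST:
      if (array_size) return {LogicalTypeId::ARRAY, *array_size};
      return {LogicalTypeId::LIST};
  }
  throw std::invalid_argument("unsupported cuDF type");
}
```

Passing the size explicitly keeps the cuDF side free of ARRAY-specific metadata; the cost is that every read-back site must know the original DuckDB type.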

Scan / read-back

  • src/op/scan/duckdb_scan_task.cpp: lower a DuckDB ARRAY vector into a cuDF LIST column.
  • src/op/result/host_table_chunk_reader.{hpp,cpp}: read a cuDF LIST column back into a DuckDB ARRAY vector.
  • src/op/scan/duckdb_native_metadata.cpp: classify ARRAY as a nested type in the native-scan viability switch.
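Because every row of a DuckDB ARRAY has the same length, lowering it into a LIST only needs synthesized offsets, and reading it back only needs a per-row length check. The sketch below assumes the flattened child values are copied as-is (i.e. that DuckDB's ARRAY child vector holds exactly `array_size` entries per row, NULL rows included) and uses plain `std::vector` in place of cuDF columns. `make_fixed_size_offsets` and `offsets_match_array_size` are hypothetical names, not functions in duckdb_scan_task.cpp or host_table_chunk_reader.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Builds LIST offsets for num_rows fixed-size arrays of array_size elements:
// row i covers child elements [i * array_size, (i + 1) * array_size).
std::vector<std::int32_t> make_fixed_size_offsets(std::size_t num_rows,
                                                  std::size_t array_size) {
  constexpr auto kMax =
      static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
  std::vector<std::int32_t> offsets(num_rows + 1);
  for (std::size_t i = 0; i <= num_rows; ++i) {
    // Reject inputs whose offsets would not fit in INT32 (checked before multiplying).
    if (array_size != 0 && i > kMax / array_size) {
      throw std::overflow_error("fixed-size list offsets exceed INT32 range");
    }
    offsets[i] = static_cast<std::int32_t>(i * array_size);
  }
  return offsets;
}

// Read-back check: a LIST column can be written into an ARRAY vector only if
// every row has exactly array_size elements. Only consecutive differences are
// compared, so a sliced column whose first offset is nonzero is still accepted.
bool offsets_match_array_size(const std::vector<std::int32_t>& offsets,
                              std::size_t array_size) {
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    const std::int64_t length =
        static_cast<std::int64_t>(offsets[i]) - offsets[i - 1];
    if (length < 0 || static_cast<std::size_t>(length) != array_size) return false;
  }
  return true;
}
```

For example, three rows of a 4-element ARRAY yield offsets {0, 4, 8, 12}.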

LIST offset fixup

  • src/helper/list_offset_fixup.cpp, src/include/cudf/list_offset_fixup.hpp: recast LIST offsets back to INT32 after cuCascade's H2D promotes them to INT64.
  • src/include/pipeline/batch_lock_utils.hpp: apply the fixup after the H2D conversion.
  • CMakeLists.txt: add list_offset_fixup.cpp and tests to the build.
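The fixup described above amounts to narrowing each LIST offset from INT64 back to INT32, presumably because the rest of the pipeline expects cuDF's default `cudf::size_type` (INT32) offsets. A minimal host-side sketch, assuming a safe recast should reject offsets outside the INT32 range rather than silently truncate them; `narrow_list_offsets` is a hypothetical name, and the sketch works on `std::vector` rather than the cuDF columns that list_offset_fixup.cpp actually handles.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Narrows INT64 LIST offsets (as produced by the H2D promotion) back to INT32,
// throwing instead of truncating if any offset is out of range.
std::vector<std::int32_t> narrow_list_offsets(const std::vector<std::int64_t>& offsets64) {
  std::vector<std::int32_t> offsets32;
  offsets32.reserve(offsets64.size());
  for (std::int64_t off : offsets64) {
    if (off < 0 || off > std::numeric_limits<std::int32_t>::max()) {
      throw std::overflow_error("LIST offset does not fit in INT32");
    }
    offsets32.push_back(static_cast<std::int32_t>(off));
  }
  return offsets32;
}
```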

Tests

  • test/cpp/helper/test_logical_type.cpp: ARRAY logical type tests.
  • test/cpp/helper/test_cudf_utils.cpp: ARRAY to cuDF LIST mapping tests.
  • test/cpp/integration/test_gpu_execution_array.cpp: end-to-end ARRAY scan tests (deferred, see below).

What's deferred

  • End-to-end GPU scan of ARRAY columns is out of scope for this PR. The DuckDB-native scan path rejects nested types, so an ARRAY query cannot yet run transparently on the GPU.
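Classifying ARRAY as nested is what routes ARRAY queries away from the native scan, since the viability check rejects any nested column. A sketch of that rule, using a hypothetical `TypeId` enum and `native_scan_viable` function rather than the actual switch in duckdb_native_metadata.cpp:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical type ids; LIST, STRUCT, MAP, UNION, and ARRAY are the nested
// types considered in this sketch.
enum class TypeId { INTEGER, BIGINT, DOUBLE, VARCHAR, LIST, STRUCT, MAP, UNION, ARRAY };

bool is_nested(TypeId id) {
  switch (id) {
    case TypeId::LIST:
    case TypeId::STRUCT:
    case TypeId::MAP:
    case TypeId::UNION:
    case TypeId::ARRAY:  // The classification this PR adds.
      return true;
    default:
      return false;
  }
}

// The native scan is viable only if no column has a nested type.
bool native_scan_viable(const std::vector<TypeId>& column_types) {
  return std::none_of(column_types.begin(), column_types.end(), is_nested);
}
```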

@mbrobbel (Member) left a comment

Thanks @yayen-lin

Review comment threads on:
  • src/helper/list_offset_fixup.cpp
  • src/include/helper/logical_type.hpp
  • test/cpp/helper/test_logical_type.cpp

@yayen-lin (Contributor, Author) commented

Hi Matthijs, thank you for taking the time to review this PR; I really appreciate it. Please take a look at my response and the new commit when you have a chance.

@yayen-lin force-pushed the sirius-array-type branch from aeaa6ae to 5692747 on June 11, 2026 02:43
@yayen-lin (Contributor, Author) commented

Hey Matt, good afternoon,

I was looking at the failed CI job. My understanding is that it exercises the native scan path, where Sirius reads directly from DuckDB's on-disk storage.

Could you let me know whether I should get that working, or whether it's fine to leave it as is for now?

Thanks,
Andy

@bwyogatama (Collaborator) commented Jun 12, 2026

Can we add a PR description? Also, I believe @kevkrist has solved that issue. If you rebase, that will probably fix the CI.

@yayen-lin force-pushed the sirius-array-type branch from 673f0d6 to 5824a98 on June 12, 2026 18:10
@yayen-lin (Contributor, Author) commented

Thank you, Bobbi. I just rebased onto the latest dev.

Also, just a heads-up: I added the hidden [.] tag to the test (test_gpu_execution_array.cpp) so it is skipped in default test runs, since GPU decoding of the ARRAY type is out of scope for this PR.

@bwyogatama (Collaborator) commented

Okay, so the biggest problem with this PR is that it uses the CPU DuckDB scan path, which will be deprecated. You need to use the GPU DuckDB scan path; otherwise we will lose this feature as soon as the CPU path is deprecated. The DuckDB GPU scan path is served through duckdb_native_gpu_ingestible.

@yayen-lin (Contributor, Author) commented

Got it, thank you for your feedback, Bobbi. I will work on that!


4 participants