refactor(scan): generalize batch coalescer into coalescing_gpu_ingestible #944
Open
kevkrist wants to merge 2 commits into
…ible; port parquet + consumer-side multi-GPU

Extract the duckdb-native scan's batch coalescing into a reusable base so the native and parquet scans share one path:

- New coalescing_gpu_ingestible base owns the sealed consume_next_input -> coalesce -> emit_batch pipeline over format-blind coalescing_unit / carrier / payload. Rename duckdb_native_batch_coalescer.hpp -> batch_coalescer.hpp.
- Add group_key partition affinity to the coalescer, and fix would_close to close a batch on group_key mismatch (it previously never did, making the affinity a no-op).
- Port parquet_gpu_ingestible onto the base: the producer emits one coalescing_carrier of per-row-group units (group_key = hive partition values), and make_batch rebuilds a parquet_split_info, merging consecutive same-file units back into multi-row-group slices.
- Multi-GPU: the device id is stamped consumer-side in emit_batch (the one site a real decode batch exists). prepare_for_query dispatches via dynamic_cast<coalescing_gpu_ingestible*>, so parquet gets round-robin GPU assignment for the first time and native's prior one-GPU default is fixed, with no producer-side double-stamp.

Tests: coalescer group_key cases, parquet same-file-merge and round-robin device-id cases; full Catch2 suite (incl. TPC-H-parquet and hive-partition integration tests) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
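For reviewers who want a concrete picture of the same-file merge that make_batch performs, here is a minimal sketch. The types row_group_unit and file_slice and the function merge_same_file are hypothetical stand-ins; the real parquet_split_info layout is not shown in this description, so treat this as an illustration of the idea rather than the code in the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one per-row-group coalescing unit.
struct row_group_unit {
  std::string file;
  int row_group;
};

// Hypothetical stand-in for a multi-row-group slice of a single file.
struct file_slice {
  std::string file;
  std::vector<int> row_groups;
};

// Fold consecutive units that share a file into one slice. Only adjacent
// units are merged, so the order of units within the batch is preserved.
inline std::vector<file_slice> merge_same_file(const std::vector<row_group_unit>& units) {
  std::vector<file_slice> slices;
  for (const auto& u : units) {
    assert(u.row_group >= 0);  // row-group indices are non-negative
    if (slices.empty() || slices.back().file != u.file) {
      slices.push_back(file_slice{u.file, {}});
    }
    slices.back().row_groups.push_back(u.row_group);
  }
  return slices;
}
```

Note that a file which reappears after a different file starts a new slice, which matches the "consecutive" wording in the commit message.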
5b682f9 to
1f39589
Compare
What
Extracts the duckdb-native scan's batch coalescing into a reusable base so the native and
parquet scans share one path, and moves multi-GPU device assignment to the consumer side for
both formats.
Changes
- coalescing_gpu_ingestible base owns the sealed consume_next_input → coalesce → emit_batch pipeline over format-blind coalescing_unit / coalescing_carrier / coalescing_payload. Renames duckdb_native_batch_coalescer.hpp → batch_coalescer.hpp.
- Parquet: the producer emits one coalescing_carrier of per-row-group units (group_key = hive partition values); make_batch rebuilds a parquet_split_info, merging consecutive same-file units back into multi-row-group slices.
- Multi-GPU: the device id is stamped consumer-side in emit_batch. prepare_for_query dispatches via dynamic_cast<coalescing_gpu_ingestible*>.

Only the cargo differs between the two formats; the coalescing and balancing machinery is shared:

| | native | parquet |
|---|---|---|
| payload | duckdb_native_payload | parquet_payload (a row_group_slice) |
| make_batch builds | duckdb_native_split_info | parquet_split_info (+ same-file merge) |
| group_key | | hive partition values |
| device id stamped in | emit_batch (base) | emit_batch (base), same |

Diagram 1: how the multi-GPU policy is installed on the ingestible
The policy is installed once, at query-setup time, onto the ingestible itself (the consumer),
and then consumed per batch, at run time, in emit_batch.

Both native and parquet take the left branch, so they are stamped identically. The provider
arm (right branch) is now used only by non-coalescing formats.
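The install-once, consume-per-batch split can be sketched roughly as follows. Only coalescing_gpu_ingestible, prepare_for_query, emit_batch, and the dynamic_cast dispatch come from this PR; the gpu_ingestible base, the batch struct, set_num_gpus, and the counters are hypothetical simplifications. It also assumes the "left branch" is the case where the cast succeeds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal interface; the real base class is richer.
struct gpu_ingestible {
  virtual ~gpu_ingestible() = default;
};

// Hypothetical decode batch carrying only the stamped device id.
struct batch {
  int device_id = -1;
};

// Simplified model of the coalescing base: it holds the multi-GPU policy
// and stamps every batch it emits.
struct coalescing_gpu_ingestible : gpu_ingestible {
  void set_num_gpus(int n) {  // hypothetical setter, called once at setup
    assert(n > 0);
    num_gpus_ = n;
  }

  batch emit_batch() {
    batch b;
    b.device_id = next_++ % num_gpus_;  // round-robin, consumer side
    return b;
  }

 private:
  int num_gpus_ = 1;
  int next_ = 0;
};

// Setup-time dispatch. Returns true when the policy was installed on the
// consumer; false means the caller falls back to the provider arm.
inline bool prepare_for_query(gpu_ingestible& ing, int num_gpus) {
  if (auto* c = dynamic_cast<coalescing_gpu_ingestible*>(&ing)) {
    c->set_num_gpus(num_gpus);
    return true;
  }
  return false;
}
```

Because the stamp happens where the batch is created, a producer never needs to know how many GPUs exist, which is what removes the producer-side double-stamp.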
Diagram 2: pull-based coalescing on the consumer
Producers run in parallel on the scan pool and push carriers into a blocking split_connector.
The single consumer pulls carriers on demand, packs their units into the coalescer, and pulls
cap-sized batches back out, so early batches start decoding while later carriers are still being
walked.
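The packing step, including the would_close fix for group_key mismatch, might look roughly like this. It is a sketch under assumptions: the byte-based cap, the fields of coalescing_unit, and the add / pop_ready names are hypothetical; only would_close, has_ready, flush, and group_key are named in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical unit: a partition key plus a size used against the cap.
struct coalescing_unit {
  std::string group_key;
  std::size_t bytes;
};

class batch_coalescer {
 public:
  explicit batch_coalescer(std::size_t cap) : cap_(cap) { assert(cap > 0); }

  // True when `u` must not join the open batch: either it belongs to a
  // different group, or it would push the batch past the cap.
  bool would_close(const coalescing_unit& u) const {
    if (open_.empty()) return false;
    return open_.front().group_key != u.group_key || open_bytes_ + u.bytes > cap_;
  }

  void add(coalescing_unit u) {
    if (would_close(u)) seal();
    open_bytes_ += u.bytes;
    open_.push_back(std::move(u));
  }

  bool has_ready() const { return !ready_.empty(); }

  std::vector<coalescing_unit> pop_ready() {
    assert(has_ready());
    auto b = std::move(ready_.front());
    ready_.pop_front();
    return b;
  }

  // Seal the trailing partial batch once no more input will arrive.
  void flush() {
    if (!open_.empty()) seal();
  }

 private:
  void seal() {
    ready_.push_back(std::move(open_));
    open_.clear();  // reset the moved-from vector to a known empty state
    open_bytes_ = 0;
  }

  std::size_t cap_;
  std::size_t open_bytes_ = 0;
  std::vector<coalescing_unit> open_;
  std::deque<std::vector<coalescing_unit>> ready_;
};
```

Without the group_key comparison in would_close, units from different partitions would share a batch whenever they fit under the cap, which is the no-op behaviour the commit message describes.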
Why pull-based: the consumer never blocks the producers. get_next_split() is the only wait
point, and it returns as soon as any carrier is ready. has_ready() is checked first each
iteration, so queued batches drain before the consumer waits again. The final partial batch is
served by flush() only after the connector is closed and drained, so a single-split scan
never drops its one batch. Carriers are intermediate transport, discarded right after their units
are unpacked, which is exactly why the device id is stamped on the coalesced batch in
emit_batch, not on the carrier.
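The ordering of has_ready(), get_next_split(), and flush() is easiest to see in a reduced consumer loop. This is a single-threaded sketch: the split_connector here is a non-blocking stand-in that returns std::nullopt once closed and drained, the coalescer caps by unit count, and the unit / carrier / batch aliases and next_batch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

using unit = int;                   // hypothetical payload
using carrier = std::vector<unit>;  // units produced together
using batch = std::vector<unit>;

// Stand-in for the blocking split_connector: nullopt means the connector
// is closed and drained. The real one would block until a carrier arrives.
struct split_connector {
  std::deque<carrier> queue;
  std::optional<carrier> get_next_split() {
    if (queue.empty()) return std::nullopt;
    carrier c = std::move(queue.front());
    queue.pop_front();
    return c;
  }
};

// Count-capped coalescer, reduced to the calls the loop needs.
struct coalescer {
  std::size_t cap;
  batch open;
  std::deque<batch> ready;

  void add(unit u) {
    assert(cap > 0);
    open.push_back(u);
    if (open.size() == cap) seal();
  }
  bool has_ready() const { return !ready.empty(); }
  batch pop() {
    batch b = std::move(ready.front());
    ready.pop_front();
    return b;
  }
  void flush() {
    if (!open.empty()) seal();
  }
  void seal() {
    ready.push_back(std::move(open));
    open.clear();
  }
};

// One pull: the next batch, or nullopt when the scan is exhausted.
inline std::optional<batch> next_batch(split_connector& conn, coalescer& co) {
  for (;;) {
    if (co.has_ready()) return co.pop();  // drain queued batches first
    auto c = conn.get_next_split();       // the only wait point
    if (!c) {                             // closed and drained
      co.flush();                         // serve the final partial batch
      if (co.has_ready()) return co.pop();
      return std::nullopt;
    }
    for (unit u : *c) co.add(u);          // unpack; the carrier is then dropped
  }
}
```

With a single three-unit carrier and a cap of two, the first pull yields a full batch and the second yields the one-unit remainder via flush(), so the trailing partial batch is not lost.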
🤖 Generated with Claude Code