Commit 855af4a
Avoid concat_batches on hash join build side
Hash join currently concatenates the full build side into a single
RecordBatch. When the build side is large, this copies all of its
data and fails once StringArray data exceeds 2GB.
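For context, a minimal sketch of why concatenation can hit a hard limit, assuming the failure comes from the signed 32-bit offsets that StringArray uses to address its value bytes. The helper name `fits_in_i32_offsets` and the batch sizes are hypothetical and only illustrate the arithmetic; this is not code from the commit.

```rust
/// One mebibyte, in bytes.
const MIB: u64 = 1024 * 1024;

/// Returns true if string data with the given per-batch byte sizes could be
/// concatenated into one array addressed by signed 32-bit offsets.
/// (Hypothetical helper, for illustration only.)
fn fits_in_i32_offsets(batch_bytes: &[u64]) -> bool {
    let total: u64 = batch_bytes.iter().sum();
    // The last offset equals the total byte length, so it must fit in an i32.
    i32::try_from(total).is_ok()
}

fn main() {
    // Each 800 MiB batch is fine on its own.
    assert!(fits_in_i32_offsets(&[800 * MIB]));
    // Three of them concatenated exceed i32::MAX bytes.
    assert!(!fits_in_i32_offsets(&[800 * MIB, 800 * MIB, 800 * MIB]));
    println!("i32::MAX = {} bytes", i32::MAX);
}
```

Keeping the batches separate sidesteps this, because each batch's offsets only need to cover that batch's own bytes.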
This PR stores build-side batches separately as Vec<RecordBatch> and
uses composite (u64) indices (batch_idx << 32 | row_idx) to index
into them. Build-side batches are coalesced to target_batch_size
before hash map construction so that build_batch_from_indices hits
the fast SingleBatch/take path for small-to-medium build sides.
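The composite index scheme can be sketched as follows. This is an illustration under assumptions, not the commit's implementation: `pack_index`, `unpack_index`, and `gather` are hypothetical names, and plain `Vec<i64>` values stand in for the columns of each RecordBatch.

```rust
/// Pack a (batch index, row index) pair into one u64: batch_idx << 32 | row_idx.
fn pack_index(batch_idx: u32, row_idx: u32) -> u64 {
    ((batch_idx as u64) << 32) | (row_idx as u64)
}

/// Split a composite index back into (batch index, row index).
fn unpack_index(composite: u64) -> (u32, u32) {
    ((composite >> 32) as u32, composite as u32)
}

/// Gather values addressed by composite indices from separately stored batches.
fn gather(batches: &[Vec<i64>], indices: &[u64]) -> Vec<i64> {
    if let [only] = batches {
        // Single-batch fast path (analogous to a `take`): the batch index is
        // always 0, so the low 32 bits alone identify the row.
        return indices.iter().map(|&i| only[(i as u32) as usize]).collect();
    }
    // Multi-batch path (analogous to an `interleave`): decode both halves.
    indices
        .iter()
        .map(|&i| {
            let (batch_idx, row_idx) = unpack_index(i);
            batches[batch_idx as usize][row_idx as usize]
        })
        .collect()
}

fn main() {
    let idx = pack_index(3, 17);
    assert_eq!(idx, (3u64 << 32) | 17);
    assert_eq!(unpack_index(idx), (3, 17));

    let batches = vec![vec![10, 11, 12], vec![20, 21]];
    let indices = [pack_index(1, 1), pack_index(0, 2), pack_index(1, 0)];
    assert_eq!(gather(&batches, &indices), vec![21, 12, 20]);

    let single = vec![vec![5, 6, 7]];
    assert_eq!(gather(&single, &[pack_index(0, 2), pack_index(0, 0)]), vec![7, 5]);
    println!("{:?}", gather(&batches, &indices));
}
```

With 32 bits for each half, this layout addresses up to 2^32 batches of up to 2^32 rows each, and a single-batch build side never has to decode the upper half.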
Key changes:
- Store Vec<RecordBatch> instead of one concatenated batch in JoinLeftData
- Use interleave for multi-batch gathering, take for single-batch
- Element-wise key comparison in equal_rows_arr avoids materializing
  intermediate arrays
- Coalesce build-side batches in collect_left_input
- Coalesce batches in CoalescePartitionsExec output
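The coalescing step in the list above can be sketched roughly as below. This is a simplified illustration, not the logic in collect_left_input or CoalescePartitionsExec: the `coalesce` function is hypothetical, `Vec<i64>` stands in for a RecordBatch, and the real code may apply different thresholds or avoid copying row by row.

```rust
/// Merge many small batches into batches of `target_batch_size` rows.
/// Only the final batch may be smaller. (Hypothetical helper.)
fn coalesce(batches: Vec<Vec<i64>>, target_batch_size: usize) -> Vec<Vec<i64>> {
    assert!(target_batch_size > 0, "target_batch_size must be positive");
    let mut out = Vec::new();
    let mut current: Vec<i64> = Vec::with_capacity(target_batch_size);
    for batch in batches {
        for value in batch {
            current.push(value);
            if current.len() == target_batch_size {
                // The buffer is full: emit it and start a fresh one.
                let full = std::mem::replace(
                    &mut current,
                    Vec::with_capacity(target_batch_size),
                );
                out.push(full);
            }
        }
    }
    // Flush any remaining rows as a final, smaller batch.
    if !current.is_empty() {
        out.push(current);
    }
    out
}

fn main() {
    let input = vec![vec![1], vec![2, 3], vec![4, 5, 6], vec![7]];
    let merged = coalesce(input, 4);
    assert_eq!(merged, vec![vec![1, 2, 3, 4], vec![5, 6, 7]]);

    // A build side no larger than the target collapses into a single batch,
    // which is the case where a single-batch fast path would apply.
    let small = coalesce(vec![vec![1, 2], vec![3]], 8);
    assert_eq!(small.len(), 1);
    println!("{:?}", merged);
}
```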
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 5ba06ac commit 855af4a
File tree
10 files changed, +943 -254 lines changed
- datafusion
- core/src/datasource
- physical_plan
- physical-plan/src
- joins
- hash_join