Commit 69a4b4e
Pass file metadata to IngestExternalFile to improve ingestion latency (#14837)
Summary:
External file ingestion (`DB::IngestExternalFile[s]`) re-opens every SST file and scans it -- footer, properties, index, filter, and the first/last data blocks -- to recompute the boundary keys, sequence-number bounds, and table properties before committing. For cold, I/O-bound files this scan dominates ingest latency, even when the file is moved/linked rather than copied.
This change lets a caller skip that work when it already has the file's metadata. `SstFileWriter::Finish()` now returns a `PreparedFileInfo` with the file size, table properties, and prepared internal `smallest`/`largest` bounds, and a new `IngestExternalFileArg::file_infos` field carries one `PreparedFileInfo` per file into `IngestExternalFiles()`. When set, ingestion reuses that metadata instead of re-opening and scanning each file. The file is still copied/linked, and the checksum is still verified when `verify_checksums_before_ingest` is set (the fast path opens the file only for that). Point-key and range-deletion bounds are folded into the same prepared bound pair, and user-defined-timestamp files (including the "UDT in Memtables only" format, whose boundary keys carry no timestamp) are handled.
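The folding of point-key and range-deletion bounds into one pair can be pictured with the minimal sketch below. It is an illustration only, not the code from this change: `Bounds`, `ExtendBounds`, and `FoldBounds` are hypothetical names, keys are plain strings compared bytewise rather than RocksDB internal keys compared with the column family's comparator, and the exclusive end key of a range tombstone is ignored.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a prepared boundary pair; not a RocksDB type.
struct Bounds {
  std::string smallest;
  std::string largest;
};

// Widen `acc` so that it also covers `other`. Keys are compared bytewise via
// std::string::compare; real ingestion would use the configured comparator.
inline void ExtendBounds(std::optional<Bounds>& acc, const Bounds& other) {
  assert(other.smallest.compare(other.largest) <= 0);
  if (!acc) {
    acc = other;
    return;
  }
  if (other.smallest.compare(acc->smallest) < 0) acc->smallest = other.smallest;
  if (other.largest.compare(acc->largest) > 0) acc->largest = other.largest;
}

// Fold point-key bounds and range-deletion bounds into a single pair.
// Either input may be absent (e.g. a file with no range deletions).
inline std::optional<Bounds> FoldBounds(const std::optional<Bounds>& point_keys,
                                        const std::optional<Bounds>& range_dels) {
  std::optional<Bounds> out;
  if (point_keys) ExtendBounds(out, *point_keys);
  if (range_dels) ExtendBounds(out, *range_dels);
  return out;
}
```

With point keys spanning `"b"`..`"m"` and range deletions spanning `"a"`..`"k"`, the folded pair is `"a"`..`"m"`; when neither input is present, no bounds are produced.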
Internally, ingestion-job metadata acquisition was split into `GetIngestedFileInfoFromFileInfo` (reuse caller metadata) and `GetIngestedFileInfoFromFile` (open + scan). Prepared boundary updates use RocksDB comparators, and the timestamp-stripping path pads prepared bounds back to the internal timestamp shape before installing the file.
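As a rough picture of what padding a bound "back to the internal timestamp shape" means, the sketch below inserts a placeholder timestamp between the user key and the 8-byte internal-key trailer. It is a simplified model under stated assumptions, not the helper used by this change: `PadWithMinTimestamp` and `kTrailerSize` are hypothetical names, and the minimum timestamp is assumed to be encoded as all-zero bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed internal-key layout: user key followed by an 8-byte trailer that
// packs the sequence number and value type.
constexpr std::size_t kTrailerSize = 8;

// Return a copy of `internal_key` with `ts_sz` zero bytes inserted between
// the user key and the trailer. Zero bytes stand in for the minimum
// timestamp (an assumption made for this sketch).
inline std::string PadWithMinTimestamp(const std::string& internal_key,
                                       std::size_t ts_sz) {
  assert(internal_key.size() >= kTrailerSize);
  const std::size_t user_key_len = internal_key.size() - kTrailerSize;
  std::string out;
  out.reserve(internal_key.size() + ts_sz);
  out.append(internal_key, 0, user_key_len);             // user key
  out.append(ts_sz, '\0');                               // placeholder timestamp
  out.append(internal_key, user_key_len, kTrailerSize);  // trailer, unchanged
  return out;
}
```

A timestamp-free bound of `n` bytes becomes `n + ts_sz` bytes, with the user key and trailer untouched; a `ts_sz` of 0 returns the key unchanged.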
## Benchmark Results
`db_bench ingestexternalfile` benchmark, release build, on SSD (btrfs). Files are linked (`move_files`) so the measurement isolates the ingest path rather than file-copy throughput.
Each generated SST was `121,883,007` bytes (`116.24 MiB`). The benchmark used 1M keys/SST and 5 ingest batches per run.
| Batch size | Files/run | Config | Prepare P50 | Run P50 | Total ingestion P50 | Total P50 drop |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | 50 | Baseline | 20.667 ms | 5.500 ms | 26.167 ms | -- |
| 10 | 50 | `file_info=true` | 7.838 ms | 9.826 ms | 17.664 ms | 32.5% |
| 30 | 150 | Baseline | 59.278 ms | 20.667 ms | 79.945 ms | -- |
| 30 | 150 | `file_info=true` | 18.000 ms | 31.167 ms | 49.167 ms | 38.5% |
| 50 | 250 | Baseline | 92.500 ms | 37.250 ms | 129.750 ms | -- |
| 50 | 250 | `file_info=true` | 29.416 ms | 62.500 ms | 91.916 ms | 29.2% |
The prepared metadata path cuts `Prepare()` P50 substantially at every batch size. `Run()` is slower with `file_info=true`: the baseline path opens and scans the table during `Prepare()`, which warms OS/block-cache state before the later table-cache open in `Run()`, whereas the prepared path skips that scan, so the open in `Run()` starts cold. A syscall trace of the `file_info=true` path showed that the remaining prepare time is mostly spent in link/sync syscalls.
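The "Total ingestion P50" and "Total P50 drop" columns in the table can be reproduced from the Prepare and Run columns. The helpers below are a small arithmetic sketch with hypothetical names (`TotalMs`, `PercentDrop`, `RoundTo1`); they are not part of `db_bench`.

```cpp
#include <cassert>
#include <cmath>

// Total ingestion time is the sum of the Prepare and Run P50 values (ms).
inline double TotalMs(double prepare_ms, double run_ms) {
  return prepare_ms + run_ms;
}

// Relative reduction of `with_info_ms` versus `baseline_ms`, in percent.
inline double PercentDrop(double baseline_ms, double with_info_ms) {
  assert(baseline_ms > 0.0);
  return 100.0 * (baseline_ms - with_info_ms) / baseline_ms;
}

// Round to one decimal place, matching the precision used in the table.
inline double RoundTo1(double x) { return std::round(x * 10.0) / 10.0; }
```

For the batch-size-10 rows, `TotalMs(20.667, 5.500)` is 26.167 ms and `TotalMs(7.838, 9.826)` is 17.664 ms, so `PercentDrop` gives roughly 32.5%. The same calculation yields about 38.5% and 29.2% for batch sizes 30 and 50.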
Benchmark args: `--benchmarks=ingestexternalfile --num=1000000 --value_size=100 --compression_type=none --ingest_external_file_batch_size=<10|30|50> --ingest_external_file_num_batches=5 --disable_auto_compactions=1 --statistics --ingest_external_file_use_file_info=<false|true>`.
Differential Revision: D1077212611
Parent commit: 828f6d1
15 files changed
Lines changed: 913 additions & 138 deletions
File tree
- db_stress_tool
- db
- db_impl
- include/rocksdb
- table
- tools
- unreleased_history/public_api_changes