Add experimental embedded blob SST support #14851
Open
pdillinger wants to merge 1 commit into
Summary:
Add EXPERIMENTAL embedded blob SST support for SstFileWriter through OpenWithEmbeddedBlobs(). Eligible large values are written as same-file blob records in a strict prefix of a block-based SST, while table entries store same-file BlobIndex references that readers resolve for Get, MultiGet, and iteration, including mixed embedded and non-embedded wide-column values.

The first Codex implementation put much more of this feature directly in SstFileWriter, including table-builder timing and entry buffering concerns. This updated design is less intrusive generally: SstFileWriter selects the mode, while a BlockBasedTableBuilder wrapper owns prefix writing and replay into the normal builder. That keeps ownership closer to the table-building layer and is more open to possible generalization beyond SstFileWriter if that ever becomes useful. Regardless, this experimental feature is expected only to have niche applications.

Embedded blob record bounds are stored in a dedicated SST metaindex entry so readers have an explicit sanity/corruption-checking locator for the record range, separate from table block layout assumptions. Blob count and payload-byte totals are stored as an auxiliary table property for diagnostics and can be ignored by readers.

Same-file BlobIndex references use blob file number 0 as the marker. That value also serves as the invalid blob-file-number sentinel in broader metadata code, but the meanings do not conflict when used carefully: only the embedded-SST reader/writer path interprets 0 as same-file, while generic file-metadata paths continue to reject it as invalid. Using 1 would be worse because legacy "stackable" BlobDB can use low blob file numbers, including 1, so reserving it would collide with real blob files.

Compression options remain in the public API as placeholders, but embedded blob compression support is deferred. Integrating compression with BlockBasedTableBuilder while avoiding copied CompressAndVerifyBlock-style logic is tricky enough to deserve a separate, focused PR.

Test Plan:
- Added BlobIndexTest.SameFileBlobIndex and BlobGarbageMeterTest.SameFileBlobIndex coverage for same-file BlobIndex encoding, display, recognition, and ignoring same-file references in blob-garbage accounting.
- Extended FileMetaDataTest.UpdateBoundariesBlobIndex to preserve the generic zero-file-number corruption check while keeping same-file embedded blob semantics at the table-reader/writer layers.
- Added SstFileReader embedded blob coverage for round-trip Get, MultiGet, and iterator reads; format_version gating; ignored placeholder compression options; the 2048-byte default min_blob_size; wide-column mixed embedded/non-embedded values; early append error surfacing; and reader tolerance for ignored bytes before and after the embedded blob record prefix.
- Evaluated normal iterator CPU risk with DEBUG_LEVEL=0 PORTABLE=0 db_bench binaries against ../rocksdb_main on a non-embedded, cached 5M-key DB. Long forward scans (seekrandom --seek_nexts=65536), short forward scans (--seek_nexts=100), reverse long scans, and readseq showed no forward iterator CPU regression; reverse long scans were flat in cycles/op with a small instruction-count increase.
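For readers who want a concrete picture of the two meanings of blob file number 0 and of the metaindex-based range check described in the summary, here is a minimal standalone sketch. It is not the code in this PR: `BlobRef`, `EmbeddedBlobRange`, `IsValidForFileMetadata`, `IsSameFile`, and `ResolveSameFileBlob` are hypothetical names, and the record layout is reduced to a raw byte range with no headers or checksums.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// All names here are hypothetical stand-ins, not RocksDB's actual API.
constexpr uint64_t kSameFileBlobFileNumber = 0;  // marker value named in the summary

// Stand-in for a decoded BlobIndex reference.
struct BlobRef {
  uint64_t file_number;
  uint64_t offset;
  uint64_t size;
};

// Stand-in for the record bounds a reader would load from the metaindex entry.
struct EmbeddedBlobRange {
  uint64_t begin;  // first byte of the embedded blob record range
  uint64_t end;    // one past the last byte of the range
};

// Generic file-metadata layer: 0 stays an invalid blob file number.
inline bool IsValidForFileMetadata(const BlobRef& ref) {
  return ref.file_number != 0;
}

// Embedded-SST reader layer: 0 is read as "the blob lives in this same file".
inline bool IsSameFile(const BlobRef& ref) {
  return ref.file_number == kSameFileBlobFileNumber;
}

// Resolve a same-file reference, rejecting anything outside the recorded range.
inline std::optional<std::string_view> ResolveSameFileBlob(
    std::string_view file, const EmbeddedBlobRange& range, const BlobRef& ref) {
  if (!IsSameFile(ref)) return std::nullopt;
  // The recorded range itself must be sane for this file.
  if (range.begin > range.end || range.end > file.size()) return std::nullopt;
  // The reference must start inside the range ...
  if (ref.offset < range.begin || ref.offset > range.end) return std::nullopt;
  // ... and must not run past its end (written to avoid integer overflow).
  if (ref.size > range.end - ref.offset) return std::nullopt;
  assert(ref.offset + ref.size <= file.size());
  return file.substr(static_cast<std::size_t>(ref.offset),
                     static_cast<std::size_t>(ref.size));
}
```

The point of the sketch is that the same numeric value is judged by two different predicates depending on the layer, and that the reader trusts the explicitly recorded range rather than inferring it from where table blocks happen to start.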
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D108564468.
| Check | Count |
|---|---|
| cppcoreguidelines-pro-type-member-init | 1 |
| cppcoreguidelines-special-member-functions | 1 |
| Total | 2 |
Details
table/block_based/block_based_table_builder.cc (2 warning(s))
table/block_based/block_based_table_builder.cc:3141:7: warning: class 'EmbeddedBlobBlockBasedTableBuilder' defines a non-default destructor but does not define a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
table/block_based/block_based_table_builder.cc:3379:5: warning: uninitialized record type: 'trailer' [cppcoreguidelines-pro-type-member-init]
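As a general illustration of how these two clang-tidy checks are commonly satisfied, here is a small standalone sketch. It does not reproduce the code at the flagged lines: `PrefixWriter`, `Trailer`, and `MakeTrailer` are hypothetical stand-ins, and the real type of `trailer` in the PR may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical class with a non-default destructor. Declaring all of the
// copy and move operations (here as deleted) is one common way to satisfy
// cppcoreguidelines-special-member-functions.
class PrefixWriter {
 public:
  PrefixWriter() = default;
  ~PrefixWriter() { /* release resources here */ }

  PrefixWriter(const PrefixWriter&) = delete;
  PrefixWriter& operator=(const PrefixWriter&) = delete;
  PrefixWriter(PrefixWriter&&) = delete;
  PrefixWriter& operator=(PrefixWriter&&) = delete;
};

static_assert(!std::is_move_constructible<PrefixWriter>::value,
              "moves are explicitly disabled");

// Hypothetical record type standing in for the flagged 'trailer' variable.
struct Trailer {
  uint8_t type;
  uint32_t checksum;
};

inline Trailer MakeTrailer(uint8_t type, uint32_t checksum) {
  // Value-initialization ({}) zeroes every member, which is what
  // cppcoreguidelines-pro-type-member-init asks for on a local record.
  Trailer trailer{};
  assert(trailer.type == 0 && trailer.checksum == 0);
  trailer.type = type;
  trailer.checksum = checksum;
  return trailer;
}
```

Whether deleting or defaulting the move operations is the right choice depends on how the wrapper is owned; deleting them is the conservative option when the object is only ever held through a pointer.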