Add experimental embedded blob SST support by pdillinger · Pull Request #14851 · facebook/rocksdb

pdillinger · 2026-06-14T22:11:08Z

Summary:
Add EXPERIMENTAL embedded blob SST support for SstFileWriter through OpenWithEmbeddedBlobs(). Eligible large values are written as same-file blob records in a strict prefix of a block-based SST, while table entries store same-file BlobIndex references that readers resolve for Get, MultiGet, and iteration, including mixed embedded and non-embedded wide-column values.
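To make the write/read split concrete, here is a minimal toy model of the idea, not the PR's code. All `Toy*` names are hypothetical, the "file" is just an in-memory string, and the eligibility rule (`value.size() >= min_blob_size`) is an assumption borrowed from how `min_blob_size` is usually described. Large values are appended to a blob prefix and the table entry keeps only an (offset, size) reference; small values stay inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical table entry: either an inline value or a same-file reference
// (offset, size) into the blob prefix. This is NOT the RocksDB BlobIndex class.
struct ToyTableEntry {
  bool is_same_file_ref = false;
  std::string inline_value;  // used when !is_same_file_ref
  uint64_t blob_offset = 0;  // used when is_same_file_ref
  uint64_t blob_size = 0;    // used when is_same_file_ref
};

class ToyEmbeddedBlobFile {
 public:
  // 2048 mirrors the default min_blob_size named in the test plan.
  explicit ToyEmbeddedBlobFile(size_t min_blob_size = 2048)
      : min_blob_size_(min_blob_size) {}

  void Put(const std::string& key, const std::string& value) {
    ToyTableEntry entry;
    if (value.size() >= min_blob_size_) {  // assumed eligibility rule
      entry.is_same_file_ref = true;
      entry.blob_offset = blob_prefix_.size();
      entry.blob_size = value.size();
      blob_prefix_.append(value);  // blob record goes into the prefix
    } else {
      entry.inline_value = value;  // small value stays in the table entry
    }
    entries_[key] = entry;
  }

  // Returns false if the key is absent; otherwise resolves inline values
  // directly and same-file references through the blob prefix.
  bool Get(const std::string& key, std::string* value) const {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    const ToyTableEntry& entry = it->second;
    if (!entry.is_same_file_ref) {
      *value = entry.inline_value;
      return true;
    }
    assert(entry.blob_offset + entry.blob_size <= blob_prefix_.size());
    value->assign(blob_prefix_, entry.blob_offset, entry.blob_size);
    return true;
  }

  bool IsEmbedded(const std::string& key) const {
    auto it = entries_.find(key);
    return it != entries_.end() && it->second.is_same_file_ref;
  }

  size_t BlobPrefixSize() const { return blob_prefix_.size(); }

 private:
  size_t min_blob_size_;
  std::string blob_prefix_;                       // stands in for the SST prefix
  std::map<std::string, ToyTableEntry> entries_;  // stands in for table entries
};
```

Putting a 10-byte and a 4096-byte value into a default-constructed `ToyEmbeddedBlobFile` leaves the first inline and the second in the prefix, and `Get` returns both unchanged.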

The first Codex implementation put much more of this feature directly in SstFileWriter, including table-builder timing and entry buffering concerns. The updated design is less intrusive overall: SstFileWriter selects the mode, while a BlockBasedTableBuilder wrapper owns prefix writing and replay into the normal builder. That keeps ownership closer to the table-building layer and leaves the door open to generalization beyond SstFileWriter if that ever becomes useful. Regardless, this experimental feature is expected to have only niche applications.

Embedded blob record bounds are stored in a dedicated SST metaindex entry so readers have an explicit sanity/corruption-checking locator for the record range, separate from table block layout assumptions. Blob count and payload-byte totals are stored as an auxiliary table property for diagnostics and can be ignored by readers.
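As an illustration of the kind of sanity check an explicit record range allows, the sketch below validates that a same-file reference lies entirely inside a recorded range. The struct and function names are hypothetical and the half-open `[begin, end)` convention is an assumption; the actual metaindex encoding is not shown in this description.

```cpp
#include <cstdint>

// Hypothetical decoded form of the embedded blob record bounds, assumed here
// to be a half-open byte range [begin, end) within the SST file.
struct ToyBlobRecordRange {
  uint64_t begin;
  uint64_t end;
};

// Returns true only if [offset, offset + size) lies within the range.
// Written to avoid overflow in offset + size for corrupted inputs.
bool ToyRefWithinRange(const ToyBlobRecordRange& range, uint64_t offset,
                       uint64_t size) {
  if (range.begin > range.end) return false;                      // malformed range
  if (offset < range.begin || offset > range.end) return false;   // starts outside
  return size <= range.end - offset;                              // ends inside
}
```

A reader following this pattern could report corruption for any reference that fails the check, independent of where index or data blocks happen to sit in the file.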

Same-file BlobIndex references use blob file number 0 as the marker. That value also serves as the invalid blob-file-number sentinel in broader metadata code, but the meanings do not conflict when used carefully: only the embedded-SST reader/writer path interprets 0 as same-file, while generic file-metadata paths continue to reject it as invalid. Using 1 would be worse because legacy "stackable" BlobDB can use low blob file numbers, including 1, so reserving it would collide with real blob files.
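The layering argument can be sketched as two separate checks that look at the same number. Everything below is a hypothetical illustration: `kToySameFileMarker`, `ToyClassifyForEmbeddedSst`, and `ToyGenericMetadataAccepts` are made-up names, and only the value 0 and its two meanings come from the description above.

```cpp
#include <cstdint>

// Both constants are 0 by design; which meaning applies depends on the layer.
constexpr uint64_t kToyInvalidBlobFileNumber = 0;  // generic metadata sentinel
constexpr uint64_t kToySameFileMarker = 0;         // embedded-SST marker

enum class ToyRefKind { kSameFile, kExternalBlobFile };

// Embedded-SST reader/writer layer: 0 means "the blob lives in this SST".
ToyRefKind ToyClassifyForEmbeddedSst(uint64_t blob_file_number) {
  return blob_file_number == kToySameFileMarker ? ToyRefKind::kSameFile
                                                : ToyRefKind::kExternalBlobFile;
}

// Generic file-metadata layer: 0 is still rejected as an invalid file number.
bool ToyGenericMetadataAccepts(uint64_t blob_file_number) {
  return blob_file_number != kToyInvalidBlobFileNumber;
}
```

With this split, file number 1 stays an ordinary external blob file in both layers, which is the property that reserving 1 as the marker would have broken.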

Compression options remain in the public API as placeholders, but embedded blob compression support is deferred. Integrating compression with BlockBasedTableBuilder while avoiding copied CompressAndVerifyBlock-style logic is tricky enough to deserve a separate, focused PR.

Test Plan:

- Added BlobIndexTest.SameFileBlobIndex and BlobGarbageMeterTest.SameFileBlobIndex coverage for same-file BlobIndex encoding, display, recognition, and ignoring same-file references in blob-garbage accounting.
- Extended FileMetaDataTest.UpdateBoundariesBlobIndex to preserve the generic zero-file-number corruption check while keeping same-file embedded blob semantics at the table-reader/writer layers.
- Added SstFileReader embedded blob coverage for round-trip Get, MultiGet, and iterator reads; format_version gating; ignored placeholder compression options; the 2048-byte default min_blob_size; wide-column mixed embedded/non-embedded values; early append error surfacing; and reader tolerance for ignored bytes before and after the embedded blob record prefix.
- Evaluated normal iterator CPU risk with DEBUG_LEVEL=0 PORTABLE=0 db_bench binaries against ../rocksdb_main on a non-embedded, cached 5M-key DB. Long forward scans (seekrandom --seek_nexts=65536), short forward scans (--seek_nexts=100), reverse long scans, and readseq showed no forward iterator CPU regression; reverse long scans were flat in cycles/op with a small instruction-count increase.


meta-codesync · 2026-06-14T22:11:46Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D108564468.

github-actions · 2026-06-14T22:20:53Z

⚠️ clang-tidy: 2 warning(s) on changed lines

Completed in 492.7s.

Summary by check

| Check | Count |
| --- | --- |
| `cppcoreguidelines-pro-type-member-init` | 1 |
| `cppcoreguidelines-special-member-functions` | 1 |
| Total | 2 |

Details

table/block_based/block_based_table_builder.cc (2 warning(s))

table/block_based/block_based_table_builder.cc:3141:7: warning: class 'EmbeddedBlobBlockBasedTableBuilder' defines a non-default destructor but does not define a move constructor or a move assignment operator [cppcoreguidelines-special-member-functions]
table/block_based/block_based_table_builder.cc:3379:5: warning: uninitialized record type: 'trailer' [cppcoreguidelines-pro-type-member-init]
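For readers unfamiliar with these two checks, the sketch below shows the usual shape of a fix on a made-up class. `ToyBuilderWrapper`, `MakeZeroedTrailer`, and the 5-byte trailer size are hypothetical and are not taken from `block_based_table_builder.cc`; how the PR actually addresses the warnings is not stated here.

```cpp
#include <array>
#include <string>
#include <type_traits>
#include <vector>

// cppcoreguidelines-special-member-functions: a class with a non-default
// destructor should state what happens on copy and move. Deleting all four
// operations is one common choice for a non-copyable, non-movable wrapper.
class ToyBuilderWrapper {
 public:
  ToyBuilderWrapper() = default;
  ~ToyBuilderWrapper() { buffered_.clear(); }  // non-default destructor

  ToyBuilderWrapper(const ToyBuilderWrapper&) = delete;
  ToyBuilderWrapper& operator=(const ToyBuilderWrapper&) = delete;
  ToyBuilderWrapper(ToyBuilderWrapper&&) = delete;
  ToyBuilderWrapper& operator=(ToyBuilderWrapper&&) = delete;

 private:
  std::vector<std::string> buffered_;
};

// cppcoreguidelines-pro-type-member-init: value-initialize a local record
// with {} so its bytes are zeroed instead of indeterminate.
std::array<char, 5> MakeZeroedTrailer() {
  std::array<char, 5> trailer{};  // hypothetical size, chosen for illustration
  return trailer;
}
```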

pdillinger requested a review from xingbowang June 14, 2026 22:11

meta-cla Bot added the CLA Signed label Jun 14, 2026

pdillinger wants to merge 1 commit into facebook:main from pdillinger:sst_writer_embedded_blob
