feat(encryption) [2/N] Support encryption: Add streaming encryption/decryption#2286
feat(encryption) [2/N] Support encryption: Add streaming encryption/decryption#2286xanderbailey wants to merge 3 commits intoapache:mainfrom
Conversation
5ce9c66 to
aed1755
Compare
|
cc: @mbutrovich |
|
@blackmwk any chance of a review here maybe? |
|
|
||
| // The buffer may be empty (for an empty file) or contain a partial block; | ||
| // either way it is encrypted and written as the final block (matching Java behavior). | ||
| let final_block = std::mem::take(&mut self.buffer); |
There was a problem hiding this comment.
If close() is called when self.buffer is empty, encrypt_and_write_block is called unconditionally:
self.encrypt_and_write_block(&final_block).await?;AesGcmCipher::encrypt with empty input produces 28 bytes (12-byte nonce + 16-byte GCM tag), so this writes a spurious empty encrypted block.
Java's https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/encryption/AesGcmOutputStream.java#L145-L146 has a guard that skips this case:
if (positionInPlainBlock == 0 && currentBlockIndex != 0) {
return;
}This means a file encrypted in Rust with block-aligned data will be 28 bytes longer than the same file encrypted in Java, breaking the stated byte-compatibility goal.
Suggested fix:
if !self.buffer.is_empty() || self.block_index == 0 {
let final_block = std::mem::take(&mut self.buffer);
self.encrypt_and_write_block(&final_block).await?;
}The logic is basically an inversion of the Java guard.
There was a problem hiding this comment.
Might want a test for this scenario where you write exactly the block-aligned amount, then try to close a file with nothing left in the buffer and make sure we don't add an empty block.
There was a problem hiding this comment.
Will take a look into this when I get home but this looks like a very good catch @mbutrovich, thanks!
| pub const GCM_STREAM_HEADER_LENGTH: u32 = 8; | ||
|
|
||
| /// Minimum valid AGS1 stream length (header + one empty block). | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
Can this be #[cfg(test)] instead?
Which issue does this PR close?
Part of #2034
What changes are included in this PR?
Summary
This PR adds AGS1 stream encryption/decryption support for Iceberg, building on the core crypto primitives merged in #2026. It implements the block-based AES-GCM stream format used by Iceberg for encrypting manifest lists and manifest files, byte-compatible with Java's
AesGcmInputStream/AesGcmOutputStream.Motivation
The previous PR provided low-level
AesGcmCipherencrypt/decrypt operations. This PR layers the AGS1 streaming format on top, enabling encryption of arbitrarily large files with random-access read support, this is a prerequisite for encrypting Iceberg metadata and data files.Changes
New Module:
encryption/stream.rs— AGS1 stream format implementationAesGcmFileRead: ImplementsFileReadfor transparent random-access decryption. Maps plaintext byte ranges to encrypted blocks, reads and decrypts them in a single I/O call, and returns the requested plaintext slice.AesGcmFileWrite: ImplementsFileWritefor transparent streaming encryption. Buffers plaintext, emits encrypted AGS1 blocks when full, and finalizes on close.stream_block_aad(): Constructs per-block AAD matching Java'sCiphers.streamBlockAAD().PLAIN_BLOCK_SIZE(1 MiB),CIPHER_BLOCK_SIZE,GCM_STREAM_MAGIC, etc.New Module:
encryption/file_decryptor.rs— File-level decryption helperAesGcmFileDecryptor: Holds decryption material (DEK + AAD prefix) and wraps aFileReadfor transparent decryption.New Module:
encryption/file_encryptor.rs— File-level encryption helperAesGcmFileEncryptor: Write-side counterpart; wraps aFileWritefor transparent encryption.AGS1 File Format:
Each block's AAD:
aad_prefix || block_index (4 bytes, LE)Java Compatibility
AesGcmInputStreamAesGcmFileReadAesGcmOutputStreamAesGcmFileWriteCiphers.streamBlockAAD()stream_block_aad()Ciphers.PLAIN_BLOCK_SIZEPLAIN_BLOCK_SIZEFuture Work
Upcoming PRs will add:
KeyManagementClienttrait)EncryptionManagerimplementationInputFile/OutputFileAre these changes tested?
Yes