feat(encryption) [2/N] Support encryption: Add streaming encryption/decryption #2286

Open
xanderbailey wants to merge 3 commits into apache:main from xanderbailey:xb/streaming_encryption


@xanderbailey
Contributor

Which issue does this PR close?

Part of #2034

What changes are included in this PR?

Summary

This PR adds AGS1 stream encryption/decryption support for Iceberg, building on the core crypto primitives merged in #2026. It implements the block-based AES-GCM stream format used by Iceberg for encrypting manifest lists and manifest files, byte-compatible with Java's AesGcmInputStream / AesGcmOutputStream.

Motivation

The previous PR provided low-level AesGcmCipher encrypt/decrypt operations. This PR layers the AGS1 streaming format on top, enabling encryption of arbitrarily large files with random-access read support. This is a prerequisite for encrypting Iceberg metadata and data files.

Changes

New Module: encryption/stream.rs — AGS1 stream format implementation

  • AesGcmFileRead: Implements FileRead for transparent random-access decryption. Maps plaintext byte ranges to encrypted blocks, reads and decrypts them in a single I/O call, and returns the requested plaintext slice.
  • AesGcmFileWrite: Implements FileWrite for transparent streaming encryption. Buffers plaintext, emits encrypted AGS1 blocks when full, and finalizes on close.
  • stream_block_aad(): Constructs per-block AAD matching Java's Ciphers.streamBlockAAD().
  • Constants: PLAIN_BLOCK_SIZE (1 MiB), CIPHER_BLOCK_SIZE, GCM_STREAM_MAGIC, etc.
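
The range mapping that AesGcmFileRead performs can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `BlockSpan` and `block_span` are hypothetical names, as are `NONCE_LENGTH` and `GCM_TAG_LENGTH`, and the sizes are taken from the AGS1 layout described in this description (8-byte header, 12-byte nonce, 16-byte tag, 1 MiB plain blocks).

```rust
use std::ops::Range;

// Sizes taken from the AGS1 layout in this PR description.
const PLAIN_BLOCK_SIZE: u64 = 1024 * 1024;
const NONCE_LENGTH: u64 = 12; // assumed constant name
const GCM_TAG_LENGTH: u64 = 16; // assumed constant name
const CIPHER_BLOCK_SIZE: u64 = PLAIN_BLOCK_SIZE + NONCE_LENGTH + GCM_TAG_LENGTH;
const GCM_STREAM_HEADER_LENGTH: u64 = 8;

/// Hypothetical description of the encrypted region that covers a plaintext range.
struct BlockSpan {
    /// Index of the first block touched by the range.
    first_block: u64,
    /// Index of the last block touched by the range (inclusive).
    last_block: u64,
    /// Offset in the encrypted file where `first_block` begins.
    encrypted_start: u64,
    /// Upper bound for the encrypted end offset; a real reader would clamp
    /// this to the encrypted file length because the last block may be short.
    encrypted_end_max: u64,
    /// Plaintext bytes to skip at the start of the first decrypted block.
    skip: u64,
}

/// Maps a non-empty plaintext byte range to the encrypted blocks covering it.
fn block_span(plain: Range<u64>) -> Option<BlockSpan> {
    if plain.start >= plain.end {
        return None; // nothing to read
    }
    let first_block = plain.start / PLAIN_BLOCK_SIZE;
    let last_block = (plain.end - 1) / PLAIN_BLOCK_SIZE;
    Some(BlockSpan {
        first_block,
        last_block,
        encrypted_start: GCM_STREAM_HEADER_LENGTH + first_block * CIPHER_BLOCK_SIZE,
        encrypted_end_max: GCM_STREAM_HEADER_LENGTH + (last_block + 1) * CIPHER_BLOCK_SIZE,
        skip: plain.start % PLAIN_BLOCK_SIZE,
    })
}
```

Reading `encrypted_start..encrypted_end_max` (clamped to the file length) in one call and then decrypting block by block would yield the requested slice after dropping the first `skip` plaintext bytes.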

New Module: encryption/file_decryptor.rs — File-level decryption helper

  • AesGcmFileDecryptor: Holds decryption material (DEK + AAD prefix) and wraps a FileRead for transparent decryption.

New Module: encryption/file_encryptor.rs — File-level encryption helper

  • AesGcmFileEncryptor: Write-side counterpart; wraps a FileWrite for transparent encryption.

AGS1 File Format:

[Header: 8 bytes]
  "AGS1" magic (4 bytes) + plain block size u32 LE (4 bytes)
[Block 0..N]
  Nonce (12 bytes) + Ciphertext (up to 1 MiB) + GCM Tag (16 bytes)

Each block's AAD: aad_prefix || block_index (4 bytes, LE)
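
As an illustration of the layout above, the header and per-block AAD could be built along these lines. The signatures are assumptions made for this sketch and may differ from the functions in this PR; `ags1_header` is a hypothetical helper name.

```rust
/// Builds the 8-byte AGS1 header: "AGS1" magic followed by the plain block
/// size as a little-endian u32. Hypothetical helper for illustration.
fn ags1_header(plain_block_size: u32) -> [u8; 8] {
    let mut header = [0u8; 8];
    header[..4].copy_from_slice(b"AGS1");
    header[4..].copy_from_slice(&plain_block_size.to_le_bytes());
    header
}

/// Builds the per-block AAD: aad_prefix || block_index (4 bytes, LE).
/// The signature is assumed; only the byte layout comes from the description.
fn stream_block_aad(aad_prefix: &[u8], block_index: u32) -> Vec<u8> {
    let mut aad = Vec::with_capacity(aad_prefix.len() + 4);
    aad.extend_from_slice(aad_prefix);
    aad.extend_from_slice(&block_index.to_le_bytes());
    aad
}
```

Binding the block index into the AAD means a block moved to a different position in the stream fails GCM authentication on decrypt.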

Java Compatibility

| Java Class | Rust Implementation |
| --- | --- |
| AesGcmInputStream | AesGcmFileRead |
| AesGcmOutputStream | AesGcmFileWrite |
| Ciphers.streamBlockAAD() | stream_block_aad() |
| Ciphers.PLAIN_BLOCK_SIZE | PLAIN_BLOCK_SIZE |

Future Work

Upcoming PRs will add:

  1. Key management interfaces (KeyManagementClient trait)
  2. EncryptionManager implementation
  3. Key metadata serialization
  4. Integration with InputFile / OutputFile

Are these changes tested?

Yes

xanderbailey force-pushed the xb/streaming_encryption branch from 5ce9c66 to aed1755 (March 25, 2026 09:49)
@xanderbailey
Contributor Author

cc: @mbutrovich

mbutrovich self-requested a review (March 25, 2026 15:07)
@xanderbailey
Contributor Author

@blackmwk any chance of a review here maybe?


// The buffer may be empty (for an empty file) or contain a partial block;
// either way it is encrypted and written as the final block (matching Java behavior).
let final_block = std::mem::take(&mut self.buffer);
Collaborator

close() calls encrypt_and_write_block unconditionally, even when self.buffer is empty:

self.encrypt_and_write_block(&final_block).await?;

AesGcmCipher::encrypt with empty input produces 28 bytes (12-byte nonce + 16-byte GCM tag), so this writes a spurious empty encrypted block.

Java's AesGcmOutputStream (https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/encryption/AesGcmOutputStream.java#L145-L146) has a guard that skips this case:

if (positionInPlainBlock == 0 && currentBlockIndex != 0) {
  return;
}

This means a file encrypted in Rust with block-aligned data will be 28 bytes longer than the same file encrypted in Java, breaking the stated byte-compatibility goal.

Suggested fix:

if !self.buffer.is_empty() || self.block_index == 0 {
  let final_block = std::mem::take(&mut self.buffer);
  self.encrypt_and_write_block(&final_block).await?;
}

The logic is basically an inversion of the Java guard.

Collaborator

Might want a test for this scenario where you write exactly the block-aligned amount, then try to close a file with nothing left in the buffer and make sure we don't add an empty block.
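
A size check along these lines could back such a test. This is only a sketch of the arithmetic, assuming the guard semantics described in this thread (an empty file still gets one empty block, block-aligned data gets no trailing empty block); the function names are hypothetical and the real test would compare against the bytes actually written by AesGcmFileWrite.

```rust
const PLAIN_BLOCK_SIZE: u64 = 1024 * 1024;
const NONCE_LENGTH: u64 = 12; // assumed constant name
const GCM_TAG_LENGTH: u64 = 16; // assumed constant name
const GCM_STREAM_HEADER_LENGTH: u64 = 8;

/// Per-block overhead: nonce plus GCM tag (28 bytes).
const BLOCK_OVERHEAD: u64 = NONCE_LENGTH + GCM_TAG_LENGTH;

/// Number of blocks when the final empty block is skipped for
/// block-aligned data but still written for an empty file.
fn expected_block_count(plain_len: u64) -> u64 {
    let blocks = (plain_len + PLAIN_BLOCK_SIZE - 1) / PLAIN_BLOCK_SIZE;
    blocks.max(1)
}

/// Encrypted length with the guard in place.
fn expected_encrypted_len(plain_len: u64) -> u64 {
    GCM_STREAM_HEADER_LENGTH + plain_len + expected_block_count(plain_len) * BLOCK_OVERHEAD
}

/// Encrypted length if close() always writes a final block, modelling the
/// behavior flagged in this review (full blocks flushed eagerly, then one
/// unconditional final block).
fn unguarded_encrypted_len(plain_len: u64) -> u64 {
    let blocks = plain_len / PLAIN_BLOCK_SIZE + 1;
    GCM_STREAM_HEADER_LENGTH + plain_len + blocks * BLOCK_OVERHEAD
}
```

For exactly one block of plaintext, the two functions differ by 28 bytes, which is the spurious nonce-plus-tag block; for non-aligned lengths they agree.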

Contributor Author

Will take a look into this when I get home but this looks like a very good catch @mbutrovich, thanks!

pub const GCM_STREAM_HEADER_LENGTH: u32 = 8;

/// Minimum valid AGS1 stream length (header + one empty block).
#[allow(dead_code)]
Collaborator

Can this be #[cfg(test)] instead?
