
Implementing BasicEncryptor with the optimized TypedBuffers#226

Merged
avalerio-tkd merged 8 commits into main from av_typelist_optimizing_067 on Mar 9, 2026

@avalerio-tkd

Collaborator
  • Added a new BasicXOREncryptor implementation (a transitional name before renaming it to BasicEncryptor).
  • Implemented BasicXOREncryptor with typed buffer optimizations.
  • Added methods to TypedBuffer to get the write span for an element during SetElement calls.
  • Added getter methods to TypedBuffer for its attributes.
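The write-span idea can be illustrated with a minimal sketch. It assumes a buffer of fixed-size elements; `FixedSizeBuffer`, `WriteSpan`, and `XorIntoElement` are hypothetical names, and the real TypedBuffer uses `tcb::span` rather than the raw pointer/size pair shown here. The point is that the encryptor writes each XORed element directly into the buffer's storage instead of building a temporary and copying it in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal mutable byte view; the real TypedBuffer uses tcb::span.
struct WriteSpan {
    uint8_t* data;
    size_t size;
};

// Hypothetical stand-in for a TypedBuffer holding fixed-size elements.
class FixedSizeBuffer {
public:
    FixedSizeBuffer(size_t num_elements, size_t element_size)
        : element_size_(element_size), bytes_(num_elements * element_size) {
        assert(element_size > 0);
    }

    // Writable view of the bytes backing element `position`, so a caller can
    // fill the element in place instead of copying from a temporary.
    WriteSpan GetWriteSpanForElement(size_t position) {
        assert(position < GetNumElements());
        return WriteSpan{bytes_.data() + position * element_size_, element_size_};
    }

    // Attribute getters.
    size_t GetElementSize() const { return element_size_; }
    size_t GetNumElements() const { return bytes_.size() / element_size_; }
    size_t GetRawBufferSize() const { return bytes_.size(); }
    const std::vector<uint8_t>& Bytes() const { return bytes_; }

private:
    size_t element_size_;
    std::vector<uint8_t> bytes_;
};

// Hypothetical encryptor step: XOR `plaintext` (element_size bytes) with a
// repeating `key` and write the result straight into the element's span.
void XorIntoElement(FixedSizeBuffer& out, size_t position,
                    const uint8_t* plaintext, const std::vector<uint8_t>& key) {
    assert(!key.empty());
    WriteSpan span = out.GetWriteSpanForElement(position);
    for (size_t i = 0; i < span.size; ++i) {
        span.data[i] = static_cast<uint8_t>(plaintext[i] ^ key[i % key.size()]);
    }
}
```

Because the span points into the buffer's own storage, no per-element allocation is needed; the trade-off is that callers must not hold the span past any operation that could reallocate the buffer.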

…re renaming to BasicEncryptor).

- Implemented BasicXOREncryptor with typed buffer optimizations.
- Added methods to TypedBuffer to get the write span for an element during SetElement calls.
- Added getter methods to TypedBuffer for its attributes.
- Updated unit tests.
- Included a temporary XorEncryptorInterface during the migration.

@argmarco-tkd argmarco-tkd left a comment

Collaborator


Thank you for this. Overall the code seems correct, but I do have concerns about code complexity (I left a few specific comments).

Comment thread src/processing/encryptors/encryptor_utils.h Outdated
Comment thread src/processing/encryptors/encryptor_utils_test.cpp
Comment thread src/processing/typed_buffer.h
Comment thread src/processing/typed_buffer.h
Comment thread src/processing/typed_buffer.h
<< " user=" << user_id_ << " key=" << key_id_
<< " datatype=" << dbps::enum_utils::to_string(datatype_) << std::endl;

return std::visit([&](const auto& input_buffer) -> std::vector<uint8_t> {

Collaborator


Can we simplify this code (e.g., remove the use of lambdas)? BasicEncryptor is supposed to be an example encryptor, and this implementation makes it hard to read.

@avalerio-tkd avalerio-tkd Mar 8, 2026

Collaborator Author


Great comment. Totally agree on keeping BasicEncryptor as readable as possible.

However, we can't remove this one: the visit is required by the TypedBuffer overloading. There are workarounds, but they all end up doing a visit somewhere; they just move it elsewhere. This is a well-known boilerplate pattern for accessing variant types in C++.
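As the reply notes, dispatching over a variant needs a `std::visit` somewhere. A minimal sketch of one way to keep that visit small, assuming the input is a `std::variant` of per-type buffers: the lambda becomes a one-line dispatch and the per-type logic moves into named functions. `TypedBufferSketch`, `AnyTypedBuffer`, `SerializeElements`, and `Serialize` are hypothetical names, not the project's API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for TypedBuffer instantiated over one element type.
template <typename T>
struct TypedBufferSketch {
    std::vector<T> elements;
};

// Hypothetical variant over the supported element types.
using AnyTypedBuffer = std::variant<TypedBufferSketch<int32_t>,
                                    TypedBufferSketch<double>,
                                    TypedBufferSketch<std::string>>;

// Per-type logic as ordinary named functions: fixed-width types are copied bytewise...
template <typename T>
std::vector<uint8_t> SerializeElements(const TypedBufferSketch<T>& buffer) {
    static_assert(std::is_trivially_copyable_v<T>, "fixed-width types only");
    std::vector<uint8_t> out(buffer.elements.size() * sizeof(T));
    if (!out.empty()) {
        std::memcpy(out.data(), buffer.elements.data(), out.size());
    }
    return out;
}

// ...and strings are concatenated (no length prefix, to keep the sketch short).
inline std::vector<uint8_t> SerializeElements(const TypedBufferSketch<std::string>& buffer) {
    std::vector<uint8_t> out;
    for (const std::string& s : buffer.elements) {
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}

// The single dispatch point: std::visit calls the lambda with whichever
// alternative the variant holds, and overload resolution picks the function.
std::vector<uint8_t> Serialize(const AnyTypedBuffer& input) {
    return std::visit(
        [](const auto& buffer) { return SerializeElements(buffer); }, input);
}
```

The visit does not disappear; it is reduced to a single line whose only job is to select an overload. Whether that reads better than an inline lambda body is the legibility trade-off discussed in this thread.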

Collaborator Author


Discussed offline. Added a TODO note in-place for this and will address it in a followup cleanup.

encrypted_bytes, kFixedHeaderLength,
RawBytesFixedSizedCodec{header.element_size}};

auto decrypt_fixed_into = [&](auto output) {

Collaborator


Similar to the encrypt path: can we optimize for legibility here?

Collaborator Author


Moved it to a separate helper function.
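A sketch of what moving the `decrypt_fixed_into` lambda into a named helper might look like. `DecryptFixedInto`, `Int16Output`, and the repeating-key XOR scheme are assumptions for illustration; the actual helper's name and signature are not shown in this thread. The helper takes the output by value, fills it one fixed-size element at a time, and returns it, mirroring the lambda's `return output;`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decrypts fixed-size elements from `encrypted` (the
// payload after the header) into `output` using a repeating XOR key.
// `Output` is any type with SetElement(position, const uint8_t*, size).
template <typename Output>
Output DecryptFixedInto(Output output,
                        const std::vector<uint8_t>& encrypted,
                        size_t element_size,
                        const std::vector<uint8_t>& key) {
    assert(element_size > 0 && encrypted.size() % element_size == 0);
    assert(!key.empty());
    std::vector<uint8_t> element(element_size);
    const size_t num_elements = encrypted.size() / element_size;
    for (size_t pos = 0; pos < num_elements; ++pos) {
        for (size_t i = 0; i < element_size; ++i) {
            const size_t offset = pos * element_size + i;
            element[i] = static_cast<uint8_t>(encrypted[offset] ^ key[offset % key.size()]);
        }
        output.SetElement(pos, element.data(), element_size);
    }
    return output;
}

// Hypothetical output type: decodes 2-byte little-endian integers.
struct Int16Output {
    std::vector<uint16_t> values;
    void SetElement(size_t pos, const uint8_t* bytes, size_t size) {
        assert(size == 2);
        if (values.size() <= pos) values.resize(pos + 1);
        values[pos] = static_cast<uint16_t>(bytes[0] | (bytes[1] << 8));
    }
};
```

A named template makes the parameters explicit at the call site, whereas the lambda captured them implicitly with `[&]`.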

return output;
};

// TODO: This is leaking Parquet-specific types into the encryptor, which should be agnostic of Parquet.

Collaborator


Thanks for the callout. Why was this not needed before?

Collaborator Author


The previous implementation had the same dependency; it was just harder to spot because it was used indirectly through a helper function in parquet_utils. I didn't notice it either.

I have a possible solution in mind that we can discuss offline. The gist is that we can add a type annotation to the output. It could come from the Codec that generates the output and could be as simple as a unique byte value. It would be protected by the version check if the Codec code changes.
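A minimal sketch of the proposed type annotation, assuming a two-byte header: a format version followed by a one-byte codec tag. `CodecTag`, `kFormatVersion`, `WriteHeader`, `ReadCodecTag`, and the `kRawBytesVariableSized` tag are hypothetical; only `RawBytesFixedSizedCodec` appears in the PR. The decoder reads the tag instead of inspecting Parquet types, and a version mismatch or unknown tag is rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical format version; assumed to be bumped whenever Codec output changes.
constexpr uint8_t kFormatVersion = 1;

// Hypothetical tags: one unique byte per Codec that produced the payload.
enum class CodecTag : uint8_t {
    kRawBytesFixedSized = 0x01,
    kRawBytesVariableSized = 0x02,
};

// Header layout assumed here: [version][codec tag].
std::vector<uint8_t> WriteHeader(CodecTag tag) {
    return {kFormatVersion, static_cast<uint8_t>(tag)};
}

// Returns the tag only when the version matches and the tag is known,
// so a payload written by a different Codec version is rejected, not misread.
std::optional<CodecTag> ReadCodecTag(const std::vector<uint8_t>& bytes) {
    if (bytes.size() < 2 || bytes[0] != kFormatVersion) return std::nullopt;
    switch (bytes[1]) {
        case static_cast<uint8_t>(CodecTag::kRawBytesFixedSized):
            return CodecTag::kRawBytesFixedSized;
        case static_cast<uint8_t>(CodecTag::kRawBytesVariableSized):
            return CodecTag::kRawBytesVariableSized;
        default:
            return std::nullopt;
    }
}
```

With the tag carried in the payload, the decrypt path could pick a codec from the header alone, which is what would keep Parquet-specific types out of the encryptor.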

Collaborator Author


Discussed offline. The TODO note captures the pending item. I'll leave it as-is for this PR and address it in a follow-up.

Comment thread src/processing/typed_buffer.h Outdated
Comment thread src/processing/typed_buffer.h Outdated
void SetRawElement(size_t position, tcb::span<const uint8_t> raw);

// Getters for immediately available properties.
size_t GetSpanSize() const { return elements_span_.size(); }

Collaborator


GetSpanSize seems like a leak of implementation details. Could a simple `GetSize()` do?

Collaborator Author


Renamed to GetRawBufferSize(). GetSize() is ambiguous: it is too close to other values related to ElementSize.
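To illustrate why a bare `GetSize()` would be ambiguous: in a buffer of variable-size elements, the element count and the raw byte size diverge. A hypothetical sketch; `VariableSizeBufferSketch` and its 4-byte little-endian length prefix are assumptions, not the project's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical variable-size buffer: each element is stored as a 4-byte
// little-endian length prefix followed by its bytes.
class VariableSizeBufferSketch {
public:
    void Append(const std::string& value) {
        assert(value.size() <= UINT32_MAX);
        const uint32_t len = static_cast<uint32_t>(value.size());
        for (int i = 0; i < 4; ++i) {
            raw_.push_back(static_cast<uint8_t>(len >> (8 * i)));
        }
        raw_.insert(raw_.end(), value.begin(), value.end());
        ++num_elements_;
    }

    // Number of logical elements stored.
    size_t GetNumElements() const { return num_elements_; }

    // Total bytes in the backing storage, length prefixes included.
    size_t GetRawBufferSize() const { return raw_.size(); }

private:
    std::vector<uint8_t> raw_;
    size_t num_elements_ = 0;
};
```

Here "size" could mean two elements or eleven bytes, which is the ambiguity the more specific name avoids.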

@avalerio-tkd avalerio-tkd left a comment

Collaborator Author


Thanks for the review and the comments. I'll follow up offline so we can decide on the code complexity vs. efficiency trade-offs; the review raised valid concerns on both.

- Added protection to GetWriteSpanForElement to prevent an object copy by misbehaved callers.
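The PR does not show how this protection was implemented. One way to get it in C++, sketched here under that assumption, is to delete the copy operations so an accidental copy fails to compile, and to ref-qualify the accessor so a write span cannot be taken from a temporary. `MoveOnlyBuffer` and `WriteSpan` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal mutable byte view.
struct WriteSpan { uint8_t* data; size_t size; };

// Hypothetical buffer made move-only: `auto copy = buffer;` fails to compile,
// so a caller cannot silently write element bytes into a discarded copy.
class MoveOnlyBuffer {
public:
    MoveOnlyBuffer(size_t num_elements, size_t element_size)
        : element_size_(element_size), bytes_(num_elements * element_size) {}

    MoveOnlyBuffer(const MoveOnlyBuffer&) = delete;
    MoveOnlyBuffer& operator=(const MoveOnlyBuffer&) = delete;
    MoveOnlyBuffer(MoveOnlyBuffer&&) = default;
    MoveOnlyBuffer& operator=(MoveOnlyBuffer&&) = default;

    // Callable only on lvalues, so the span never points into a temporary.
    WriteSpan GetWriteSpanForElement(size_t position) & {
        assert((position + 1) * element_size_ <= bytes_.size());
        return WriteSpan{bytes_.data() + position * element_size_, element_size_};
    }
    WriteSpan GetWriteSpanForElement(size_t position) && = delete;

    uint8_t ByteAt(size_t i) const { return bytes_.at(i); }

private:
    size_t element_size_;
    std::vector<uint8_t> bytes_;
};

static_assert(!std::is_copy_constructible_v<MoveOnlyBuffer>);
static_assert(std::is_move_constructible_v<MoveOnlyBuffer>);
```

Moving the buffer transfers the underlying storage, so bytes written through a span before the move remain visible in the moved-to object.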
@avalerio-tkd

Collaborator Author

Updated after offline discussions. @argmarco-tkd could you PTAL?

@argmarco-tkd argmarco-tkd left a comment

Collaborator


Thanks for the discussion and the changes. LGTM. Ship it!

- Added cached current_element_size_ to TypedBuffer iterator.
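A hypothetical sketch of an iterator that caches the current element's size. It assumes elements stored as a 4-byte little-endian length prefix followed by the payload; `LengthPrefixedIterator` is an illustrative name, and the real iterator's layout may differ. The length is decoded once per position and reused by both dereference and advance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical iterator over elements stored as [4-byte LE length][bytes].
// current_element_size_ caches the decoded length of the element at offset_,
// so operator* and operator++ don't each re-read the prefix.
class LengthPrefixedIterator {
public:
    LengthPrefixedIterator(const std::vector<uint8_t>& raw, size_t offset)
        : raw_(&raw), offset_(offset) { CacheCurrentSize(); }

    std::string operator*() const {
        const uint8_t* begin = raw_->data() + offset_ + kPrefix;
        return std::string(begin, begin + current_element_size_);
    }

    LengthPrefixedIterator& operator++() {
        offset_ += kPrefix + current_element_size_;
        CacheCurrentSize();
        return *this;
    }

    bool operator!=(const LengthPrefixedIterator& other) const {
        return offset_ != other.offset_;
    }

private:
    static constexpr size_t kPrefix = 4;

    void CacheCurrentSize() {
        current_element_size_ = 0;
        if (offset_ + kPrefix > raw_->size()) return;  // at or past the end
        for (size_t i = 0; i < kPrefix; ++i) {
            current_element_size_ |= static_cast<size_t>((*raw_)[offset_ + i]) << (8 * i);
        }
        assert(offset_ + kPrefix + current_element_size_ <= raw_->size());
    }

    const std::vector<uint8_t>* raw_;
    size_t offset_;
    size_t current_element_size_ = 0;
};
```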
@avalerio-tkd avalerio-tkd merged commit 9afea2d into main Mar 9, 2026
2 checks passed
@avalerio-tkd avalerio-tkd deleted the av_typelist_optimizing_067 branch March 17, 2026 02:31