Support Selective Metafield Population: Enable only _hoodie_commit_time for Incremental Reads #17959
Replies: 3 comments
+1, the gains are remarkable:
Created an issue to track this: #18383. The issue includes a detailed design for a new config.
Closing the discussion as we have an agreement on the change.
Context
Hudi currently supports an all-or-nothing approach for metafield population. Users can either populate all metafields or disable them entirely.
When metafields are disabled, Hudi operations still function correctly because certain fields, such as _hoodie_record_key, are virtualized: they are generated on the fly during operations like upserts, so the field does not need to be physically stored.
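As a rough illustration of what "virtualized" means here, the sketch below derives a record key from data columns at read or write time instead of reading a stored value. This is plain Python, not Hudi code; the `virtual_record_key` helper, the sample row, and the `field:value` key format are assumptions made for the example.

```python
from typing import Any, Mapping, Sequence


def virtual_record_key(row: Mapping[str, Any], key_fields: Sequence[str]) -> str:
    """Derive a record key from data columns instead of reading a stored one.

    Hypothetical helper: the output format is illustrative only.
    """
    if not key_fields:
        raise ValueError("at least one key field is required")
    if len(key_fields) == 1:
        # Single key field: the key is just that column's value.
        return str(row[key_fields[0]])
    # Several key fields: join them as name:value pairs.
    return ",".join(f"{name}:{row[name]}" for name in key_fields)


row = {"order_id": 42, "region": "eu", "amount": 9.5}
single_key = virtual_record_key(row, ["order_id"])
composite_key = virtual_record_key(row, ["region", "order_id"])
```

Because the key is a pure function of columns that are already in the row, nothing is lost by not storing it. The commit time has no such source in the data columns.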
However, incremental reads rely on the _hoodie_commit_time metafield to identify and read delta changes at the record level. Unlike _hoodie_record_key, this field cannot be derived from the data columns, so it must be physically stored for incremental queries to track changes per record.
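To make that dependency concrete, here is a minimal sketch of what an incremental read does conceptually: keep only the records whose _hoodie_commit_time is later than a starting instant. It is plain Python rather than Hudi's implementation; the `incremental_read` helper and the sample records are hypothetical, and instants are assumed to be fixed-width timestamp strings so that string comparison follows chronological order.

```python
from typing import Dict, List

COMMIT_TIME = "_hoodie_commit_time"


def incremental_read(records: List[Dict[str, str]], begin_instant: str) -> List[Dict[str, str]]:
    """Return the records committed strictly after begin_instant (hypothetical helper)."""
    if any(COMMIT_TIME not in record for record in records):
        # Without the stored commit time there is nothing to filter on.
        raise ValueError(f"records are missing {COMMIT_TIME}")
    return [record for record in records if record[COMMIT_TIME] > begin_instant]


records = [
    {"id": "a", COMMIT_TIME: "20240101000000"},
    {"id": "b", COMMIT_TIME: "20240102000000"},
    {"id": "c", COMMIT_TIME: "20240103000000"},
]
changed = incremental_read(records, "20240101000000")
changed_ids = [record["id"] for record in changed]
```

If the records carried no commit time at all, the helper could only fail, which mirrors the situation of a table written with metafields disabled.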
Problem Statement
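Users face a trade-off between storage efficiency and incremental read capabilities: populating all metafields keeps incremental reads working but stores every metafield, while disabling them saves storage but drops the _hoodie_commit_time field that incremental reads rely on.
The gap: there is no option to selectively enable only the metafields that cannot be virtualized or are critical for certain query patterns (like _hoodie_commit_time for incremental reads) while excluding others that can be generated on the fly.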
This results in unnecessary storage overhead for users who need incremental reads but don't require the other metafields to be physically stored.
Proposed Solution
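Introduce selective metafield population that allows users to enable only the metafields they need (for example, only _hoodie_commit_time for incremental reads) while leaving the others unpopulated.
This would provide a middle ground between the two extremes of the current all-or-nothing approach, optimizing for both storage efficiency and functional requirements.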
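The idea can be sketched as a write-side step that attaches only the selected metafields to each row. This is an illustration in plain Python, not the design from #18383: the `populate_meta_fields` helper and its `enabled` parameter are hypothetical, and the actual config name is not reproduced here. The five field names are Hudi's standard metafields.

```python
from typing import Any, Dict, Iterable, Mapping

ALL_META_FIELDS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def populate_meta_fields(
    row: Mapping[str, Any],
    meta_values: Mapping[str, str],
    enabled: Iterable[str],
) -> Dict[str, Any]:
    """Return the row with only the enabled metafields attached (hypothetical helper)."""
    enabled_set = set(enabled)
    unknown = enabled_set - set(ALL_META_FIELDS)
    if unknown:
        raise ValueError(f"unknown metafields: {sorted(unknown)}")
    # Metafields first, in their canonical order, then the data columns.
    stored: Dict[str, Any] = {
        name: meta_values[name] for name in ALL_META_FIELDS if name in enabled_set
    }
    stored.update(row)
    return stored


meta_values = {
    "_hoodie_commit_time": "20240102000000",
    "_hoodie_commit_seqno": "20240102000000_0_1",
    "_hoodie_record_key": "42",
    "_hoodie_partition_path": "region=eu",
    "_hoodie_file_name": "example-file.parquet",
}
stored_row = populate_meta_fields(
    {"id": 42, "amount": 9.5}, meta_values, enabled=["_hoodie_commit_time"]
)
```

With only _hoodie_commit_time enabled, the stored row carries one metafield instead of five, and an incremental read can still filter on it.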