[flink] Support stream read Chain Table #8262
Open
yunfengzhou-hub wants to merge 1 commit into
Purpose
Chain Table (`chain-table.enabled=true`) separates data into a `snapshot` branch (batch-imported full partitions) and a `delta` branch (incremental updates). Prior to this change, streaming read was not supported because the standard `DataTableStreamScan` is unaware of the two-branch architecture.
This PR introduces `ChainTableFileStoreTable` (a wrapper over `FallbackReadFileStoreTable`) and `ChainTableStreamScan`, which implements a two-phase streaming scan: Phase 1 does a full load by reading delta data pinned to the current snapshot and merging snapshot files for overlapping partitions; Phase 2 incrementally monitors the delta branch only, returning `DataSplit(isStreaming=true)` for changelog passthrough. The snapshot-pinning strategy makes the Phase 1 / Phase 2 boundary deterministic: there is no overlap or data loss regardless of concurrent commits.
Tests
Added `FlinkChainTableITCase` with 16 tests (all passing, ~75s):
- `changelog-producer=input`
- `WHERE` predicate forwarding, `withShard` forwarding
- `scan.mode=latest` bypass, `changelog-producer=none` rejection
- `restore(id, scanAll=true)` and `restore(null, scanAll=true)` state reset
- `chain-partition-keys` group partition streaming
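
The snapshot-pinning idea described under Purpose can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in this PR: `TwoPhaseScanSketch`, its fields, and its `plan()` method are hypothetical names and not Paimon APIs, the delta branch is reduced to an in-memory append-only list, and the merge with snapshot-branch files in Phase 1 is left out. It assumes the boundary is pinned when the scan is created, which may differ from what `ChainTableStreamScan` actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: these names are hypothetical and are not Paimon classes. */
public class TwoPhaseScanSketch {

    /** Hypothetical append-only delta log: entry i is the change committed by snapshot i + 1. */
    private final List<String> deltaLog;

    /** Snapshot id pinned when the scan is created; phase 1 reads nothing past it. */
    private final int pinnedSnapshot;

    /** Next log index to read in phase 2; -1 while phase 1 has not run yet. */
    private int nextIndex = -1;

    public TwoPhaseScanSketch(List<String> deltaLog) {
        this.deltaLog = deltaLog;
        this.pinnedSnapshot = deltaLog.size();
    }

    /** Returns the next batch of changes; an empty list means nothing new was committed. */
    public List<String> plan() {
        int from;
        int to;
        if (nextIndex < 0) {
            // Phase 1: full load of everything up to and including the pinned snapshot.
            from = 0;
            to = pinnedSnapshot;
        } else {
            // Phase 2: only changes committed after the last batch that was returned.
            from = nextIndex;
            to = deltaLog.size();
        }
        nextIndex = to;
        return new ArrayList<>(deltaLog.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>(List.of("a", "b"));
        TwoPhaseScanSketch scan = new TwoPhaseScanSketch(log);
        log.add("c"); // a commit that lands after pinning but before the full load

        System.out.println(scan.plan()); // phase 1: [a, b]
        System.out.println(scan.plan()); // phase 2: [c]
        System.out.println(scan.plan()); // nothing new: []
    }
}
```

In this model the commit `c` arrives between pinning and the full load, yet it is emitted exactly once, by the first incremental call. Because both phases share the same boundary index, no entry is read twice and none is skipped.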