Support Concurrent Clustering of Small MOR File Groups with Upserts Using pendingReplacedFileIdMap #18433
suryaprasanna
started this conversation in
New Feature Ideas
Replies: 3 comments 1 reply
-
CC @danny0405
-
Thanks for the discussion (since I was also curious how we could prevent clustering <-> upsert conflicts for frequent writes to MOR).
-
Thanks for writing this up. I have a few concerns with respect to the proposal:
-
I would like to discuss a design for supporting concurrent clustering and upserts for MOR tables using a new structure called pendingReplacedFileIdMap in the FileSystemView.
Problem:
Assume there are file groups F1 and F2 with the following file structure.
F1: 1 base file + 2 log files
F2: 1 base file + 3 log files
Assume clustering selects file groups F1 and F2 and plans to replace them with a new file group, F3.
Another motivation is write amplification.
Proposal
Store a pendingReplacedFileIdMap in the FileSystemView interface.
The map would look something like F1 -> F3 and F2 -> F3, so the requested clustering instant would need to carry this mapping. This requires a change in the clustering plan version.
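To make the shape of the map concrete, here is a minimal Java sketch. It is only an illustration under my own assumptions: the class `PendingReplacedFileIdMapSketch` and its methods are hypothetical names, not existing Hudi APIs, and file IDs are modeled as plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not an existing Hudi class.
final class PendingReplacedFileIdMapSketch {
    // Replaced file ID -> file ID that the pending clustering plans to create.
    private final Map<String, String> pendingReplacedFileIdMap = new HashMap<>();

    // Record the mappings of one requested clustering plan, e.g. F1 -> F3 and F2 -> F3.
    void addPendingClustering(Iterable<String> replacedFileIds, String targetFileId) {
        for (String fileId : replacedFileIds) {
            String previous = pendingReplacedFileIdMap.putIfAbsent(fileId, targetFileId);
            if (previous != null && !previous.equals(targetFileId)) {
                // Assumption: a file group belongs to at most one pending clustering plan.
                throw new IllegalStateException(
                    fileId + " is already pending replacement by " + previous);
            }
        }
    }

    // Returns the pending replacement for a file ID, if any.
    Optional<String> getPendingReplacement(String fileId) {
        return Optional.ofNullable(pendingReplacedFileIdMap.get(fileId));
    }

    // Drop all mappings that point at the given target, e.g. once the
    // clustering completes or is rolled back.
    void removePendingClustering(String targetFileId) {
        pendingReplacedFileIdMap.values().removeIf(targetFileId::equals);
    }
}
```

How the map is actually populated from the clustering plan, and when entries are cleared, would depend on the plan version change discussed above.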
Expected behavior
Writer
If clustering plans to replace F1 and F2 with F3, then new writes should be redirected to F3 instead of continuing to write to F1 and F2. We may not be able to achieve this in a straightforward manner, so let us consider all the scenarios and what happens in each of them.
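As a rough illustration of the redirection idea, the sketch below re-routes already-tagged records through the map. Everything here is an assumption for illustration: `WriterRedirectSketch` and `routeByTargetFileId` are hypothetical names, and records are reduced to a record key plus the file ID returned by the index lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an existing Hudi class.
final class WriterRedirectSketch {
    // taggedRecords: record key -> file ID found by the index lookup.
    // Returns: target file ID -> record keys to write there.
    static Map<String, List<String>> routeByTargetFileId(
            Map<String, String> taggedRecords,
            Map<String, String> pendingReplacedFileIdMap) {
        Map<String, List<String>> routed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : taggedRecords.entrySet()) {
            String taggedFileId = entry.getValue();
            // Redirect to the pending replacement if one exists, otherwise
            // keep writing to the original file group.
            String target = pendingReplacedFileIdMap.getOrDefault(taggedFileId, taggedFileId);
            routed.computeIfAbsent(target, k -> new ArrayList<>()).add(entry.getKey());
        }
        return routed;
    }
}
```

With F1 -> F3 and F2 -> F3 pending, records tagged to F1 or F2 end up grouped under F3, while records tagged to any other file group are left where they are.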
Spark (Batch flow)
Flink (Streaming flow)
I do not have complete knowledge of Flink internals, in particular which operator is best suited for this change. I will leave that to the Flink experts to decide, but overall the following scenarios can happen:
I am not sure whether we want to persist pendingReplacedFileIdMap in the operator for seamless translation of F1 -> F3 and F2 -> F3.
Reader
Read Optimized
Read-optimized queries are easy to handle, as they return the base files directly.
Snapshot reads:
Snapshot reads might need some changes: while F1 and F2 are being replaced by F3, the F3 base file may not have been created yet, but updates to records in F1 and F2 might already be present in F3's log files.
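One possible way to picture this is the sketch below, which overlays the F3 log records on top of the merged contents of F1 and F2. This is my own reading of the scenario, not a settled design: `SnapshotReadSketch` and `mergeSnapshot` are hypothetical names, records are reduced to key -> value pairs, and it assumes that records in F3's log files are always newer than what F1 and F2 hold.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an existing Hudi class.
final class SnapshotReadSketch {
    // replacedSlices: for each replaced file group (F1, F2), its already merged
    //                 base + log contents as record key -> value.
    // targetLogRecords: updates found in the log files of the pending target (F3).
    static Map<String, String> mergeSnapshot(
            List<Map<String, String>> replacedSlices,
            Map<String, String> targetLogRecords) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> slice : replacedSlices) {
            merged.putAll(slice);
        }
        // Assumption: F3 log records were written after clustering was
        // scheduled, so they take precedence over F1/F2 contents.
        merged.putAll(targetLogRecords);
        return merged;
    }
}
```

Once the F3 base file exists and the clustering commit completes, the reader would presumably go back to reading F3 as a normal file slice.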