[KYUUBI #7126][LINEAGE] Support merge into syntax in row level catalog #7127
yabola wants to merge 7 commits into
private def extractInstructionOutputs(instruction: Expression): Seq[Expression] = {
  instruction match {
    case p if p.nodeName == "Split" => getField[Seq[Expression]](p, "otherOutput")
According to Spark's implementation, we should use otherOutput:
https://github.com/apache/spark/blob/46b6ccbd93c4fe5c2b72f730a776a2739bdbc7b4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala#L467-L470
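To make the point about otherOutput concrete, here is a minimal sketch of the selection logic. It uses simplified stand-in types rather than Spark's real MergeRows instruction classes: `Keep`, `Split`, and `InstructionOutputs` below are hypothetical names for illustration, and columns are plain strings instead of Catalyst expressions. The assumption is that, for a split update, `otherOutput` is the row that carries the assigned values and is therefore the one relevant to lineage.

```scala
// Hypothetical, simplified stand-ins for merge instructions (not Spark's classes).
sealed trait Instruction
final case class Keep(condition: String, output: Seq[String]) extends Instruction
final case class Split(
    condition: String,
    output: Seq[String],
    otherOutput: Seq[String]) extends Instruction

object InstructionOutputs {
  // Pick the output row that lineage extraction should follow.
  // Assumption: for Split, the second row (otherOutput) holds the assigned values.
  def extract(instruction: Instruction): Seq[String] = instruction match {
    case Split(_, _, otherOutput) => otherOutput
    case Keep(_, output) => output
  }
}
```

The snippet in the PR matches on `nodeName` and reads the field reflectively instead of pattern matching on the class; the sketch only illustrates which field is chosen, not how it is accessed.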
val LOCAL_TABLE_IDENTIFIER = "__local__"
val METADATA_COL_ATTR_KEY = "__metadata_col"
val ORIGINAL_ROW_ID_VALUE_PREFIX: String = "__original_row_id_"
private val LOG = LoggerFactory.getLogger(classOf[LineageParser])
import org.apache.spark.internal.Logging
... with Logging
Sorry, I wasn't using this LOG, so I deleted it. Also, the trait LineageParser doesn't seem to work with Logging.
override def catalogName: String = {
  if (SPARK_RUNTIME_VERSION <= "3.1") {
    "org.apache.spark.sql.connector.InMemoryTableCatalog"
  } else if (SPARK_RUNTIME_VERSION <= "3.2") {
We only support Spark 3.3 and above now.
Kyuubi Spark Listener Extension
I see that the lineage plugin can support 3.1.
We have officially removed support for Spark versions prior to 3.2. Some code and docs are left over and still need cleaning up; that code is unreachable since the CI for it was removed.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #7127    +/-   ##
=======================================
  Coverage    0.00%   0.00%
=======================================
  Files         697     700     +3
  Lines       43203   43490   +287
  Branches     5854    5894    +40
=======================================
- Misses      43203   43490   +287
  .map(extractInstructionOutputs)
val nextColumnsLineage = ListMap(p.output.indices.map { index =>
  val keyAttr = p.output(index)
  val instructionOutputs = instructionsOutputs.map(_(index))
Is this index match always correct?
Sorry, I am busy these days. I'll confirm later.
@wForget I think index match should always be correct.
- ResolveRowLevelCommandAssignments#alignActions will align the length of the assignments in the merge-into actions to match the target table output.
- RewriteMergeIntoTable will rewrite the merge-into operation as either WriteDelta or ReplaceData only when the plan has already been aligned.
- I have reviewed all the places where mergeRows is referenced:
  - buildReplaceDataPlan -> target table output + metadataAttrs
  - buildWriteDeltaPlan -> operationTypeAttr + target table output + metadataAttrs + originalRowIdAttrs
  - RewriteMergeIntoTable -> assignments (target table output)
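The index-matching argument above can be sketched as follows. This is a simplified illustration, not the PR's code: `IndexAlignedLineage` and `build` are hypothetical names, columns are plain strings, and each instruction output position is modeled as the set of source columns feeding it. It assumes, as described above, that every instruction output has already been aligned to the same length as the node's output.

```scala
import scala.collection.immutable.ListMap

object IndexAlignedLineage {
  // output: the node's output columns, in order.
  // instructionsOutputs: one row per instruction; position i of each row holds
  // the source columns that feed output column i.
  def build(
      output: Seq[String],
      instructionsOutputs: Seq[Seq[Set[String]]]): ListMap[String, Set[String]] = {
    // The index lookup is only safe when every instruction row is aligned.
    require(
      instructionsOutputs.forall(_.length == output.length),
      "every instruction output must have the same length as the node output")
    ListMap(output.indices.map { index =>
      // Union the sources that all instructions contribute at this position.
      output(index) -> instructionsOutputs.flatMap(_(index)).toSet
    }: _*)
  }
}
```

The `require` call makes the alignment assumption explicit: if a row were shorter than the output, the index lookup would fail, which is the risk the review question was asking about.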
Thank you for your explanation, I will continue to review this PR later.
@wForget Hi, please review the code when you have time. I have another PR that needs to solve a case for a subquery (not related to this PR, but there will be code conflicts).
Thanks, merged to master.
Why are the changes needed?
In catalogs that support the row-level interface (Iceberg, etc.), merge into is rewritten as WriteDelta or ReplaceData by a rule. We should support extracting lineage relationships from these plan types.
How was this patch tested?
Added new tests for the row-level catalog.
Was this patch authored or co-authored using generative AI tooling?
no