
[KYUUBI #7126][LINEAGE] Support merge into syntax in row level catalog#7127

Closed
yabola wants to merge 7 commits into apache:master from yabola:master-listener

@yabola commented Jul 7, 2025

Contributor

Why are the changes needed?

In catalogs that support the row-level operation interface (Iceberg, etc.), MERGE INTO is rewritten into WriteDelta or ReplaceData by a rule. We should support extracting lineage relationships from these plans.
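
To make the two rewrite targets concrete, here is a toy model of the dispatch. It assumes, as in Spark's row-level rewrite, that a table whose row-level operation can write deltas gets a WriteDelta plan and any other table gets a ReplaceData plan. RowLevelOperation, DeltaOperation, GroupOperation and RewriteSketch are hypothetical stand-ins, not Spark classes; the only point is that lineage extraction must recognize both plan shapes and descend into the same kind of child query.

```scala
// Hypothetical stand-ins, not Spark classes.
sealed trait RowLevelOperation
// Models an operation that can write deltas (e.g. merge-on-read).
case object DeltaOperation extends RowLevelOperation
// Models a group-based operation that rewrites whole data groups (e.g. copy-on-write).
case object GroupOperation extends RowLevelOperation

// Both rewritten plans write to the target table from a child query;
// the child query is where column lineage has to be extracted from.
sealed trait RewrittenPlan {
  def table: String
  def query: String
}
final case class WriteDelta(table: String, query: String) extends RewrittenPlan
final case class ReplaceData(table: String, query: String) extends RewrittenPlan

object RewriteSketch {
  // Picks the plan shape from the kind of row-level operation the table provides.
  def rewriteMerge(table: String, query: String, op: RowLevelOperation): RewrittenPlan =
    op match {
      case DeltaOperation => WriteDelta(table, query)
      case GroupOperation => ReplaceData(table, query)
    }

  // A lineage parser can treat both shapes uniformly: target table plus child query.
  def lineageInput(plan: RewrittenPlan): (String, String) = plan match {
    case WriteDelta(t, q)  => (t, q)
    case ReplaceData(t, q) => (t, q)
  }
}
```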

How was this patch tested?

Added new tests for the row-level catalog.

Was this patch authored or co-authored using generative AI tooling?

no


private def extractInstructionOutputs(instruction: Expression): Seq[Expression] = {
instruction match {
case p if p.nodeName == "Split" => getField[Seq[Expression]](p, "otherOutput")
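
The excerpt reads Split instructions through their otherOutput field. Below is a plain-Scala model of that dispatch, assuming MergeRows semantics where Keep carries one output row, Split emits two rows (output and otherOutput) with the second one carrying the new values, and Discard emits nothing. The case classes are hypothetical stand-ins; the excerpt itself matches on nodeName and reads fields reflectively with getField rather than pattern matching on types.

```scala
// Hypothetical stand-ins for MergeRows instructions; expressions are plain strings here.
sealed trait Instruction
final case class Keep(output: Seq[String]) extends Instruction
final case class Split(output: Seq[String], otherOutput: Seq[String]) extends Instruction
case object Discard extends Instruction

object InstructionOutputs {
  // Returns the expressions whose lineage flows into the target columns.
  def extract(instruction: Instruction): Seq[String] = instruction match {
    // Assumption: otherOutput is the row carrying the new (inserted) values.
    case Split(_, otherOutput) => otherOutput
    case Keep(output)          => output
    // A discarded row contributes no output expressions.
    case Discard               => Seq.empty
  }
}
```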

Contributor Author

val LOCAL_TABLE_IDENTIFIER = "__local__"
val METADATA_COL_ATTR_KEY = "__metadata_col"
val ORIGINAL_ROW_ID_VALUE_PREFIX: String = "__original_row_id_"
private val LOG = LoggerFactory.getLogger(classOf[LineageParser])

Member

import org.apache.spark.internal.Logging

... with Logging

Contributor Author

Sorry, I didn't use this LOG, so I deleted it. Also, the trait LineageParser doesn't seem to work with Logging.
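
The constants METADATA_COL_ATTR_KEY and ORIGINAL_ROW_ID_VALUE_PREFIX suggest the parser must tell real table columns apart from helper attributes that the row-level rewrite adds. A minimal sketch, assuming metadata columns are flagged with the __metadata_col key and original row IDs are named with the __original_row_id_ prefix. Attr and LineageColumnFilter are hypothetical; in Spark the metadata flag lives in the attribute's metadata rather than in a Set.

```scala
// Hypothetical attribute model: a name plus the metadata keys it carries.
final case class Attr(name: String, metadataKeys: Set[String] = Set.empty)

object LineageColumnFilter {
  // Values copied from the excerpt above.
  val METADATA_COL_ATTR_KEY = "__metadata_col"
  val ORIGINAL_ROW_ID_VALUE_PREFIX: String = "__original_row_id_"

  // Assumption: metadata columns carry the metadata key, and original row IDs
  // are recognizable by their name prefix.
  def isTableColumn(attr: Attr): Boolean =
    !attr.metadataKeys.contains(METADATA_COL_ATTR_KEY) &&
      !attr.name.startsWith(ORIGINAL_ROW_ID_VALUE_PREFIX)

  // Keeps only the attributes that should appear as lineage target columns.
  def tableColumns(output: Seq[Attr]): Seq[Attr] = output.filter(isTableColumn)
}
```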

override def catalogName: String = {
if (SPARK_RUNTIME_VERSION <= "3.1") {
"org.apache.spark.sql.connector.InMemoryTableCatalog"
} else if (SPARK_RUNTIME_VERSION <= "3.2") {

Member

We only support Spark 3.3 and above now.

Contributor Author

Kyuubi Spark Listener Extension

I see the lineage plugin can still support Spark 3.1.

Member

We have officially removed support for Spark versions prior to 3.2. Some leftover code and docs still need to be cleaned up; such code is unreachable since the corresponding CI was removed.
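
Whichever minimum version applies, a version gate like the one in catalogName is only safe if versions compare numerically rather than as text ("3.10" sorts before "3.2" lexicographically). A minimal sketch of a major/minor comparison with hypothetical names (SparkVersion, VersionGate) and an assumed minimum of 3.3 taken from the comment above; how SPARK_RUNTIME_VERSION itself compares is not shown in this excerpt.

```scala
// Hypothetical version type compared numerically on (major, minor).
final case class SparkVersion(major: Int, minor: Int) extends Ordered[SparkVersion] {
  def compare(that: SparkVersion): Int =
    if (major != that.major) Integer.compare(major, that.major)
    else Integer.compare(minor, that.minor)
}

object VersionGate {
  // Parses "3.5", "3.5.1" or "3.4.0-SNAPSHOT"; only major and minor are kept.
  def parse(version: String): SparkVersion = {
    val parts = version.split('.')
    SparkVersion(parts(0).toInt, parts(1).takeWhile(_.isDigit).toInt)
  }

  // Assumed minimum supported version.
  val Minimum: SparkVersion = SparkVersion(3, 3)

  def isSupported(runtimeVersion: String): Boolean = parse(runtimeVersion) >= Minimum
}
```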

@pan3793 requested a review from wForget July 7, 2025 10:25
@yabola force-pushed the master-listener branch from 4b1933d to 295941a
@yabola force-pushed the master-listener branch from 295941a to 85de5b0
@codecov-commenter commented Jul 7, 2025

Codecov Report

❌ Patch coverage is 0% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (cad5a39) to head (e39d0d9).
⚠️ Report is 137 commits behind head on master.

Files with missing lines                                  Patch %   Lines
...in/lineage/helper/SparkSQLLineageParseHelper.scala     0.00%     36 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #7127    +/-   ##
=======================================
  Coverage    0.00%   0.00%            
=======================================
  Files         697     700     +3     
  Lines       43203   43490   +287     
  Branches     5854    5894    +40     
=======================================
- Misses      43203   43490   +287     

.map(extractInstructionOutputs)
val nextColumnsLineage = ListMap(p.output.indices.map { index =>
val keyAttr = p.output(index)
val instructionOutputs = instructionsOutputs.map(_(index))
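
The excerpt pairs each output column of the plan with the expression that every instruction produces at the same position. A simplified sketch of that idea, modeling each expression as the set of source columns it references and assuming instructions with empty outputs (such as a discard) are skipped; MergeRowsLineage and its signature are hypothetical.

```scala
import scala.collection.immutable.ListMap

object MergeRowsLineage {
  // Each expression is modeled by the set of source columns it references.
  type Expr = Set[String]

  // For every output position, union the references of the expression that each
  // instruction produces at that same position (hypothetical signature).
  def columnsLineage(
      output: Seq[String],
      instructionsOutputs: Seq[Seq[Expr]]): ListMap[String, Set[String]] = {
    // Instructions that emit no row contribute nothing.
    val rows = instructionsOutputs.filter(_.nonEmpty)
    require(
      rows.forall(_.size == output.size),
      "instruction outputs must be aligned with the MergeRows output")
    ListMap(output.indices.map { index =>
      output(index) -> rows.map(_(index)).foldLeft(Set.empty[String])(_ ++ _)
    }: _*)
  }
}
```

For example, in MERGE INTO t USING s ON t.id = s.id WHEN MATCHED THEN UPDATE SET name = s.name WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name), the aligned UPDATE output is roughly (t.id, s.name) and the INSERT output is (s.id, s.name), so id maps to {t.id, s.id} and name to {s.name}.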

Member

Is this index match always correct?

Contributor Author

Sorry, I've been busy these days. I'll confirm later.

@yabola Jul 14, 2025

Contributor Author

@wForget I think the index matching should always be correct:

  1. ResolveRowLevelCommandAssignments#alignActions aligns the assignments in the MERGE INTO actions so that their length matches the target table output.
  2. RewriteMergeIntoTable rewrites the MERGE INTO operation into either WriteDelta or ReplaceData only after the plan has already been aligned.
  3. I have reviewed all the places where mergeRows is referenced:
     - buildReplaceDataPlan -> target table output + metadataAttrs
     - buildWriteDeltaPlan -> operationTypeAttr + target table output + metadataAttrs + originalRowIdAttrs
     - RewriteMergeIntoTable -> assignments (target table output)
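
The layouts listed above can be checked with a small model: because every aligned instruction output is built in the same column order as the MergeRows output, position i refers to the same column everywhere, and only positions holding target-table columns matter for lineage. A sketch with hypothetical attribute names (op_type, meta_file, row_id_id); only the ordering follows the list above, and MergeRowsLayout is not a Spark or Kyuubi API.

```scala
object MergeRowsLayout {
  // ReplaceData: target table output followed by metadata attributes.
  def replaceDataOutput(target: Seq[String], metadata: Seq[String]): Seq[String] =
    target ++ metadata

  // WriteDelta: operation type, target table output, metadata, original row IDs.
  def writeDeltaOutput(
      operationType: String,
      target: Seq[String],
      metadata: Seq[String],
      originalRowIds: Seq[String]): Seq[String] =
    (operationType +: target) ++ metadata ++ originalRowIds

  // Positions in a MergeRows output that hold target-table columns; because every
  // aligned instruction output uses the same order, these positions are valid
  // indexes into each instruction output as well.
  def targetPositions(output: Seq[String], target: Seq[String]): Seq[Int] =
    output.indices.filter(i => target.contains(output(i)))
}
```

Note that in the WriteDelta layout the target columns start at position 1 rather than 0, which is why matching by position within one plan is safe while assuming fixed offsets across plan types is not.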

Member

Thank you for your explanation, I will continue to review this PR later.

Contributor Author

@wForget Hi~ Please review the code when you have time. I have another PR that fixes a subquery case (unrelated to this PR, but the two will have code conflicts).

@wForget left a comment

Member

LGTM, thanks.

@wForget added this to the v1.11.0 milestone Aug 13, 2025
@wForget closed this in fb2b7ef Aug 13, 2025

@wForget commented Aug 13, 2025

Member

Thanks, merged to master

4 participants