Skip to content

[BatchUpdate 3/4] feat(dao): add batch orchestration pipeline for multi-aspect upsert#608

Open
jphui wants to merge 1 commit intomasterfrom
jhui/batch-orchestration
Open

[BatchUpdate 3/4] feat(dao): add batch orchestration pipeline for multi-aspect upsert#608
jphui wants to merge 1 commit intomasterfrom
jhui/batch-orchestration

Conversation

@jphui
Copy link
Copy Markdown
Contributor

@jphui jphui commented Apr 7, 2026

Summary

Third PR in the batch upsert series (split from #598). Adds the batch orchestration layer in BaseLocalDAO that coordinates multi-aspect upsert as a single transaction.

What this adds:

  • addManyBatch(urn, aspects, auditStamp, trackingContext) — public entry point for batch upsert of raw aspects
  • addManyBatchWithCallbacks(urn, lambdas, auditStamp, trackingContext) — entry point accepting AspectUpdateLambda objects (supports collection aspects + IngestionParams)
  • addManyBatchInternal() — orchestration: runs shouldUpdateAspect() filtering, delegates to batchUpsertAspects(), emits MAEs, executes aspect callbacks, enforces retention
  • EbeanLocalDAO.batchUpsertAspects() — delegates to EbeanLocalAccess.batchUpsert() (from PR feat: add batch SQL layer for multi-aspect upsert and batch deletion #607)

How it works:

  1. Caller provides URN + list of aspects (or AspectUpdateLambdas)
  2. For each aspect, resolve the latest value and apply shouldUpdateAspect() filtering
  3. Collect aspects that pass filtering into a single batchUpsert() call (one SQL roundtrip)
  4. For each successfully upserted aspect, emit MAE and execute aspect callbacks
  5. Enforce retention policies per aspect

Depends on:

Testing Done

BaseLocalDAOTest (unit tests, Mockito)

Test Purpose
testAddManyBatchMAEEmission Verifies MAE events are emitted for each aspect after batch upsert
testAddManyBatchMAEEmissionWithEqualitySkip MAE is NOT emitted when old value equals new value (equality skip)
testAddManyBatchPreUpdateHook Pre-update hooks fire for each aspect in the batch
testAddManyBatchPostUpdateHook Post-update hooks fire for each aspect in the batch
testAddManyBatchTransactionBehavior Entire batch executes within a single transaction
testAddManyBatchWithTrackingContext Tracking context is propagated through to MAE emission
testAddManyBatchRejectsDuplicateAspects Throws IllegalArgumentException if same aspect class appears twice
testAddManyBatchReturnsAllAspects Returns union entries for all aspects, including unchanged ones
testAddManyBatchWithSingleAspectCallback Aspect callback transforms value before persistence and MAE
testAddManyBatchWithSingleAspectCallbackSkipped Callback returning skipProcessing=true excludes the aspect from write
testAddManyBatchWithTwoAspectsWithCallbacksNeitherSkipped Both aspects have callbacks, both are transformed and written
testAddManyBatchWithTwoAspectsWithCallbacksOneSkipped Two aspects with callbacks, one skipped — only non-skipped is written
testAddManyBatchWithMixedCallbackRegistration One aspect has callback (transformed), other has none (written as-is)

EbeanLocalDAOTest (integration tests, embedded MariaDB)

Test Purpose
testAddManyBatch End-to-end: batch upsert persists multiple aspects to DB and they are retrievable
testAddManyBatchUpsertBehavior Second batch call updates existing rows (upsert semantics, not insert-only)
testAddManyBatchWithEqualitySkip Aspect unchanged from DB value is skipped — no DB write, no MAE
testAddManyBatchWithBackfillSkip Backfill event with older emit time is skipped
testAddManyBatchMixedEqualityChanges Batch with mix of changed and unchanged aspects — only changed ones written
testAddManyBatchRelationshipsUpdateWithOldValues Relationship updates receive correct old values for diff computation
testAddManyBatchAllAspectsRejectedByEquality All aspects equal — no SQL write executed, no MAEs emitted
testAddManyBatchRelationshipsNotWrittenWhenEqualityCheckFails Relationships are NOT updated for equality-skipped aspects
testAddManyBatchWithSingleAspectCallbackPersistedToDB Callback-transformed value is what gets persisted to DB
testAddManyBatchWithSingleAspectCallbackSkipped Callback skip excludes aspect from DB write
testAddManyBatchWithTwoAspectsWithCallbacksNeitherSkipped Both callback-transformed aspects persisted to DB
testAddManyBatchWithTwoAspectsWithCallbacksOneSkipped One callback-skipped, one written — DB contains only the written one
testAddManyBatchWithMixedCallbackRegistration Mixed callback registration — both aspects persisted correctly
testAddManyBatchWithSingleLambdaFunctionPersistedToDB Lambda Function Registry transforms value before persistence
testAddManyBatchWithTwoLambdaFunctionsNeitherSkipped Two aspects with lambda functions, both transformed and written
testAddManyBatchWithMixedLambdaRegistration One aspect has lambda function (transformed), other written as-is
testAddManyBatchWithLambdaMergingOldValue Lambda receives old DB value and merges with new value
testAddManyBatchComprehensiveWithAllCodePaths Comprehensive test exercising callbacks + lambdas + equality skip + backfill in one batch

Checklist

  • The PR conforms to DataHub's Contributing Guideline
  • Links to related issues (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

…ti-aspect upsert

Add addManyBatch/addManyBatchWithCallbacks to BaseLocalDAO for transactional
multi-aspect upsert with MAE emission, callback execution, and retention
enforcement. EbeanLocalDAO delegates batch SQL to EbeanLocalAccess.batchUpsert.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.13253% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.79%. Comparing base (a08aa98) to head (8e7eb27).

Files with missing lines Patch % Lines
...n/java/com/linkedin/metadata/dao/BaseLocalDAO.java 81.57% 4 Missing and 10 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #608      +/-   ##
============================================
+ Coverage     65.56%   65.79%   +0.23%     
- Complexity     1759     1776      +17     
============================================
  Files           144      144              
  Lines          6847     6909      +62     
  Branches        829      840      +11     
============================================
+ Hits           4489     4546      +57     
+ Misses         1996     1994       -2     
- Partials        362      369       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.


// STEP 3: Execute batch SQL (1 query) - only for changed aspects
if (!contextsToWrite.isEmpty()) {
boolean isTestMode = contextsToWrite.stream()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one aspect has isTestMode=true and another has isTestMode=false, the entire batch runs in test mode
^^ I guess it makes sense. Do we flag this to caller somehow/somewhere so they're aware of this behavior, or better still never mix them in a batch?

Copy link
Copy Markdown
Contributor

@chmeng0 chmeng0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants