Fix: truncate destination table on FULL_TABLE replication to prevent …#6077
Open
abbasikov wants to merge 1 commit intomage-ai:masterfrom
…stale data accumulation
Author
The test_web_server_windows (3.10) failure is unrelated to this change. It is a GitHub Actions artifact upload conflict (409 error) caused by parallel jobs uploading with the same artifact name. All actual tests passed.
Member
Have you tested this with a real Mage data integration pipeline?
1: What the issue was
FULL_TABLE replication is supposed to be a full replace: the source is the single source of truth. But when
the destination table already existed, the code only ran ALTER TABLE (to add new columns) and then
INSERT, so rows deleted in the source were never removed from the destination, and stale data
accumulated indefinitely.
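The accumulation can be reproduced in miniature. The sketch below is not Mage code: it uses an in-memory SQLite database, a hypothetical `users` table, and an upsert (`INSERT OR REPLACE`) as an assumed stand-in for the destination's insert step. SQLite has no TRUNCATE TABLE, so `DELETE FROM` without a WHERE clause takes its place.

```python
import sqlite3


def sync_full_table(conn, source_rows, truncate_first):
    """Load every source row into the hypothetical `users` destination table."""
    cur = conn.cursor()
    if truncate_first:
        # SQLite has no TRUNCATE TABLE; an unfiltered DELETE empties the table.
        cur.execute("DELETE FROM users")
    # Assumption: the destination upserts on the primary key.
    cur.executemany(
        "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", source_rows
    )
    conn.commit()


def run(truncate_first):
    """Sync twice; the source drops id 2 between the two syncs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    sync_full_table(conn, [(1, "a"), (2, "b")], truncate_first)
    sync_full_table(conn, [(1, "a")], truncate_first)  # id 2 deleted upstream
    ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
    conn.close()
    return ids


print(run(truncate_first=False))  # stale id 2 survives
print(run(truncate_first=True))   # destination matches the source
```

Without the truncate step the destination still holds id 2 after the second sync; with it, the destination contains only what the source contains.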
The suggested solution was an opt-in toggle, but the correct semantics of FULL_TABLE replication (per the Singer spec) are always truncate-and-reload, so the behavior should not be optional. The Singer specification defines FULL_TABLE as a complete replacement of the destination table on every sync, and making truncation optional would be incorrect behavior. Our implementation is therefore more correct than what the issue author suggested.
2: What was changed
In build_query_strings(), when table_exists is true and replication_method == FULL_TABLE, we now
prepend a TRUNCATE TABLE command before the inserts. The ALTER TABLE path is preserved for the
INCREMENTAL and LOG_BASED methods via the else branch.
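The branching described above can be sketched roughly as follows. This is a simplified illustration, not the actual diff: the function name `build_query_strings_sketch`, its parameters, and the generated SQL strings are all hypothetical and only mirror the structure stated in this description.

```python
from typing import Dict, List

FULL_TABLE = "FULL_TABLE"


def build_query_strings_sketch(
    table_name: str,
    table_exists: bool,
    replication_method: str,
    columns: Dict[str, str],
    new_columns: Dict[str, str],
    insert_commands: List[str],
) -> List[str]:
    """Return the SQL commands to run for one sync (hypothetical signature)."""
    commands: List[str] = []
    if not table_exists:
        # First sync: create the table from the full column mapping.
        column_defs = ", ".join(
            f"{name} {col_type}" for name, col_type in columns.items()
        )
        commands.append(f"CREATE TABLE {table_name} ({column_defs})")
    elif replication_method == FULL_TABLE:
        # Existing table, full replace: empty it before reloading.
        commands.append(f"TRUNCATE TABLE {table_name}")
    else:
        # INCREMENTAL and LOG_BASED: keep existing rows, add any new columns.
        for name, col_type in new_columns.items():
            commands.append(
                f"ALTER TABLE {table_name} ADD COLUMN {name} {col_type}"
            )
    # Inserts always come last, so TRUNCATE (when present) precedes them.
    commands.extend(insert_commands)
    return commands
```

Called with `table_exists=True` and `replication_method="FULL_TABLE"`, the returned list starts with the TRUNCATE TABLE command followed by the inserts; with `"INCREMENTAL"` it starts with the ALTER TABLE commands instead.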