[FLINK-39412][cdc-common] Skip duplicate columns in AddColumnEvent to ensure idempotency by dangerousfeng · Pull Request #4370 · apache/flink-cdc

dangerousfeng · 2026-04-09T03:58:39Z

Summary

When recovering from a checkpoint/savepoint, binlog events may be replayed, causing AddColumnEvent to be applied for columns that already exist in the cached schema. This leads to duplicate field names in RowType, which throws:

java.lang.IllegalArgumentException: Field names must be unique. Found duplicates: [valid_date]
    at org.apache.flink.cdc.common.types.RowType.validateFields(RowType.java:158)
    at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processElement(...)
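
The failure mode can be reproduced in isolation. The following is a minimal sketch with plain Java collections, not the Flink CDC classes: `naiveAddColumn` and `validateFields` are hypothetical stand-ins, and the error message is only modeled on the stack trace above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Simplified reproduction; these are NOT the Flink CDC classes. */
final class ReplayDuplicateSketch {

    /** Naive add: appends without checking whether the name is already present. */
    static List<String> naiveAddColumn(List<String> fieldNames, String added) {
        List<String> result = new ArrayList<>(fieldNames);
        result.add(added);
        return result;
    }

    /** Hypothetical uniqueness check, modeled on the message in the stack trace. */
    static void validateFields(List<String> fieldNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : fieldNames) {
            // Set.add returns false when the name was already seen.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        if (!duplicates.isEmpty()) {
            throw new IllegalArgumentException(
                    "Field names must be unique. Found duplicates: " + duplicates);
        }
    }
}
```

Applying the same add twice, as a replay after recovery would, leaves two `valid_date` entries in the list, and the validation step then rejects it.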

Root Cause

SchemaUtils.applyAddColumnEvent() adds columns without checking whether a column with the same name already exists. Although isSchemaChangeEventRedundant() exists as a utility, PreTransformOperator.cacheChangeSchema() does not call it before applying schema changes.

Fix

Added an idempotency check in SchemaUtils.applyAddColumnEvent() that skips columns already present in the schema.
This is the most defensive place for the fix, since it protects every caller of applySchemaChangeEvent(), not just PreTransformOperator.
The existingColumnNames set is also maintained across iterations, so an event that adds multiple columns is handled correctly.
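
The idea behind the fix can be sketched as follows. This is a simplified model, not the code in SchemaUtils.java: columns are reduced to their names, `NamedColumnWithPosition` and `applyAddColumns` are hypothetical, and only FIRST and LAST positions are handled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-in for the schema model; these are NOT the Flink CDC classes. */
final class IdempotentAddColumnSketch {

    enum Position { FIRST, LAST }

    /** Hypothetical pairing of a column name with where it should be inserted. */
    static final class NamedColumnWithPosition {
        final String name;
        final Position position;

        NamedColumnWithPosition(String name, Position position) {
            this.name = name;
            this.position = position;
        }
    }

    /** Returns a new column list; names already present are skipped, not re-added. */
    static List<String> applyAddColumns(
            List<String> columns, List<NamedColumnWithPosition> addedColumns) {
        List<String> result = new ArrayList<>(columns);
        // Seed the set with the current schema, then keep it updated inside the loop
        // so that a name repeated within one event is also caught.
        Set<String> existingColumnNames = new HashSet<>(columns);
        for (NamedColumnWithPosition added : addedColumns) {
            // Set.add returns false when the name is already present: skip it.
            if (!existingColumnNames.add(added.name)) {
                continue;
            }
            if (added.position == Position.FIRST) {
                result.add(0, added.name);
            } else {
                result.add(added.name);
            }
        }
        return result;
    }
}
```

In this sketch, applying the same event a second time returns an unchanged list, and an event that names the same column twice adds it only once.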

Changes

SchemaUtils.java: Added duplicate column name check before adding columns
SchemaUtilsTest.java: Added test cases for duplicate AddColumnEvent in both LAST and FIRST positions

Test plan

Added unit tests for duplicate AddColumnEvent with LAST position
Added unit tests for duplicate AddColumnEvent with FIRST position
All existing SchemaUtilsTest tests pass (5/5)


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Improve schema-change replay safety by making AddColumnEvent application idempotent, preventing duplicate field names in recovered/replayed binlog scenarios.

Changes:

Skip adding columns whose names already exist when applying AddColumnEvent
Track existing column names across a single AddColumnEvent application
Add unit tests covering duplicate AddColumnEvent behavior for LAST and FIRST

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
flink-cdc-common/src/main/java/org/apache/flink/cdc/common/utils/SchemaUtils.java	Adds duplicate-name guard to make `applyAddColumnEvent` idempotent during replay/recovery.
flink-cdc-common/src/test/java/org/apache/flink/cdc/common/utils/SchemaUtilsTest.java	Adds regression tests ensuring duplicate adds are ignored for `LAST` and `FIRST`.


Copilot · 2026-04-09T10:14:29Z

flink-cdc-common/src/main/java/org/apache/flink/cdc/common/utils/SchemaUtils.java

+        Set<String> existingColumnNames =
+                columns.stream()
+                        .map(Column::getName)
+                        .collect(Collectors.toCollection(HashSet::new));
        for (AddColumnEvent.ColumnWithPosition columnWithPosition : event.getAddedColumns()) {
+            // Skip adding the column if it already exists in the schema to ensure idempotency.
+            // This can happen when schema change events are replayed after a failover recovery.
+            if (existingColumnNames.contains(columnWithPosition.getAddColumn().getName())) {
+                continue;
+            }


Skipping duplicates purely by name can hide a real schema conflict (e.g., replay vs. a different upstream AddColumnEvent that reuses the same column name but with a different type/nullable/default/comment). To keep idempotency while avoiding silent corruption, consider: if the name exists, look up the existing Column and compare with columnWithPosition.getAddColumn(); only skip when they are equivalent, otherwise throw an informative exception.
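
One way this suggestion could look is sketched below. It is a simplified model rather than a patch against SchemaUtils.java: `SimpleColumn` and `addColumn` are hypothetical, equivalence is defined by the sketch's own `equals` (name and type only), and the choice of exception type is an assumption.

```java
import java.util.Map;
import java.util.Objects;

/** Simplified stand-in for the schema model; these are NOT the Flink CDC classes. */
final class ConflictAwareAddColumnSketch {

    /** Hypothetical minimal column: a name plus a type string. */
    static final class SimpleColumn {
        final String name;
        final String type;

        SimpleColumn(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SimpleColumn)) {
                return false;
            }
            SimpleColumn other = (SimpleColumn) o;
            return name.equals(other.name) && type.equals(other.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, type);
        }

        @Override
        public String toString() {
            return name + " " + type;
        }
    }

    /**
     * Adds the column if absent, skips an identical replay, and rejects a same-name
     * column whose definition differs.
     */
    static void addColumn(Map<String, SimpleColumn> schema, SimpleColumn added) {
        SimpleColumn existing = schema.get(added.name);
        if (existing == null) {
            schema.put(added.name, added);
        } else if (!existing.equals(added)) {
            throw new IllegalStateException(
                    "Column '" + added.name + "' already exists as [" + existing
                            + "] but the event adds [" + added + "]");
        }
        // Otherwise the event is an exact replay and there is nothing to do.
    }
}
```

The trade-off is that a genuine conflict now fails the job instead of being silently ignored, so the comparison would need to cover every attribute that matters (type, nullability, default, comment) to avoid false conflicts on replay.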

Copilot · 2026-04-09T10:14:29Z

flink-cdc-common/src/test/java/org/apache/flink/cdc/common/utils/SchemaUtilsTest.java

+        // add duplicate column should be ignored (idempotency)
+        addedColumns = new ArrayList<>();
+        addedColumns.add(
+                new AddColumnEvent.ColumnWithPosition(
+                        Column.physicalColumn("col3", DataTypes.STRING()),
+                        AddColumnEvent.ColumnPosition.LAST,
+                        null));
+        addColumnEvent = new AddColumnEvent(tableId, addedColumns);
+        schema = SchemaUtils.applySchemaChangeEvent(schema, addColumnEvent);


Current tests cover replay of a single-column AddColumnEvent. Since the implementation now keeps existingColumnNames across iterations, add a unit test where a single AddColumnEvent contains the same column name twice (e.g., two ColumnWithPosition entries for col3), and assert only one column is added. This directly validates the new intra-event dedup behavior.

github-actions bot added the common label Apr 9, 2026

dangerousfeng changed the title ~~[FLINK-xxxxx][cdc-common] Skip duplicate columns in AddColumnEvent to ensure idempotency~~ [FLINK-39412][cdc-common] Skip duplicate columns in AddColumnEvent to ensure idempotency Apr 9, 2026

dangerousfeng force-pushed the fix-duplicate-add-column-event branch from 6a7977f to 1f1b0f0 on April 9, 2026 04:29

dangerousfeng force-pushed the fix-duplicate-add-column-event branch from 1f1b0f0 to 0ff420f on April 9, 2026 05:58

lvyanquan requested a review from Copilot April 9, 2026 10:07

Copilot AI reviewed Apr 9, 2026

dangerousfeng wants to merge 1 commit into apache:master from dangerousfeng:fix-duplicate-add-column-event
