[FLINK-39409][connect/postgres] Support tables without primary key for incremental snapshot #4367
JNSimba wants to merge 4 commits into apache:master from
Conversation
Pull request overview
Adds support for PostgreSQL tables without primary keys when running in incremental snapshot mode, aligning behavior with the existing MySQL approach.
Changes:
- Extend incremental snapshot split-key extraction and output buffering to handle Debezium records with null keys (no-PK tables).
- Add PostgreSQL-side validation to require REPLICA IDENTITY FULL for no-PK tables.
- Add end-to-end Postgres IT coverage and new DDL for a no-PK test table.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/test/resources/ddl/inventory_no_pk.sql | Adds a no-PK test table and seed data (with REPLICA IDENTITY FULL). |
| flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/test/java/org/apache/flink/cdc/connectors/postgres/table/PostgreSQLConnectorITCase.java | Adds IT cases for no-PK tables with/without chunk key. |
| flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/PostgresDialect.java | Validates REPLICA IDENTITY FULL for no-PK tables during schema discovery. |
| flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/utils/SplitKeyUtils.java | Extracts chunk key from value (before/after) when record key is null. |
| flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/reader/external/JdbcSourceFetchTaskContext.java | Supports buffer rewrite/merge logic for no-PK by using before/after structs as buffer keys. |
| flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/reader/external/IncrementalSourceScanFetcher.java | Uses value after struct as initial snapshot buffer key when record key is null. |
Comments suppressed due to low confidence (1)
flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/reader/external/JdbcSourceFetchTaskContext.java:142
- Typo/grammar in exception message: "the the record" should be corrected to "the record" (and/or rephrase for clarity).
throw new IllegalStateException(
String.format(
"Data change record shouldn't use READ operation, the the record is %s.",
changeRecord));
    public Map<TableId, TableChange> discoverDataCollectionSchemas(JdbcSourceConfig sourceConfig) {
        final List<TableId> capturedTableIds = discoverDataCollections(sourceConfig);

        try (JdbcConnection jdbc = openJdbcConnection(sourceConfig)) {
            // fetch table schemas
            Map<TableId, TableChange> tableSchemas = queryTableSchema(jdbc, capturedTableIds);
            // validate REPLICA IDENTITY for tables without primary key
            validateReplicaIdentityForNoPkTables(jdbc, tableSchemas);
            return tableSchemas;
discoverDataCollectionSchemas always calls validateReplicaIdentityForNoPkTables(...). If REPLICA IDENTITY FULL is only required for incremental snapshot mode, this should be gated by the corresponding config; otherwise the validation/error message should be updated to avoid implying this is incremental-snapshot-only.
    private String queryReplicaIdentity(JdbcConnection jdbc, TableId tableId) throws SQLException {
        String query =
                String.format(
                        "SELECT relreplident FROM pg_class c "
                                + "JOIN pg_namespace n ON c.relnamespace = n.oid "
                                + "WHERE n.nspname = '%s' AND c.relname = '%s'",
                        tableId.schema(), tableId.table());
queryReplicaIdentity(...) builds SQL by string interpolation of schema/table names. This can break for identifiers containing quotes/case-sensitive names and is also vulnerable to SQL injection if table identifiers come from user configuration; prefer a parameterized query and/or proper identifier quoting.
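To illustrate the parameterized variant the comment suggests, here is a minimal, hedged sketch (not the PR's actual code): the `ReplicaIdentityQuery` class name and `queryReplicaIdentity` helper are hypothetical, and schema/table names are bound as JDBC parameters rather than interpolated into the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ReplicaIdentityQuery {
    // Parameterized lookup: identifiers are bound as values via '?', so quoted,
    // case-sensitive, or maliciously crafted names cannot break the SQL text.
    static final String QUERY =
            "SELECT relreplident FROM pg_class c "
                    + "JOIN pg_namespace n ON c.relnamespace = n.oid "
                    + "WHERE n.nspname = ? AND c.relname = ?";

    static String queryReplicaIdentity(Connection conn, String schema, String table)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, schema);
            ps.setString(2, table);
            try (ResultSet rs = ps.executeQuery()) {
                // Returning null makes the "table not found" case explicit for the caller.
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

Note that `pg_catalog` comparisons match the stored (folded) identifier, so the caller must pass the name exactly as catalogued, without extra quoting.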
                    if (rs.next()) {
                        result[0] = rs.getString(1);
                    }
                });
queryReplicaIdentity(...) returns null when the table isn't found; the caller then throws the REPLICA IDENTITY FULL exception, which becomes misleading. Consider explicitly handling the "no row returned" case (e.g., throw a dedicated exception indicating the table lookup failed).
Suggested change:
                });
        if (result[0] == null) {
            throw new FlinkRuntimeException(
                    String.format(
                            "Failed to query replica identity for table '%s.%s': table was not found.",
                            tableId.schema(), tableId.table()));
        }
public abstract class JdbcSourceFetchTaskContext implements FetchTask.Context {

    private static final Logger LOG = LoggerFactory.getLogger(JdbcSourceFetchTaskContext.class);
LOG is declared but never used in this class; please remove it (or use it) to avoid unused-field/static-analysis violations.
    /**
     * For tables without primary key, the chunk key is not in the record key (which is null).
     * Instead, extract it from the value's after struct (for CREATE/READ) or before struct (for
     * UPDATE/DELETE).
     */
    public static Struct getStructContainingChunkKey(SourceRecord record) {
        Struct value = (Struct) record.value();
        Envelope.Operation op =
                Envelope.Operation.forCode(value.getString(Envelope.FieldName.OPERATION));
        if (op == Envelope.Operation.CREATE || op == Envelope.Operation.READ) {
            return value.getStruct(Envelope.FieldName.AFTER);
        } else {
            return value.getStruct(Envelope.FieldName.BEFORE);
        }
getStructContainingChunkKey(...) is only used inside SplitKeyUtils; consider making it private to avoid expanding the public API surface unnecessarily.
       (105,'hammer','14oz carpenter''s hammer',0.875),
       (106,'hammer','16oz carpenter''s hammer',1.0),
       (107,'rocks','box of assorted rocks',5.3),
       (108,'jacket','water resistent black wind breaker',0.1),
Spelling: "water resistent" should be "water resistant" in test data to avoid propagating typos into expected results/examples.
Suggested change:
-       (108,'jacket','water resistent black wind breaker',0.1),
+       (108,'jacket','water resistant black wind breaker',0.1),
| "INSERT INTO inventory.products_no_pk VALUES (110,'jacket','water resistent white wind breaker',0.2);"); | ||
| statement.execute( | ||
| "INSERT INTO inventory.products_no_pk VALUES (111,'scooter','Big 2-wheel scooter',5.18);"); | ||
| statement.execute( | ||
| "UPDATE inventory.products_no_pk SET description='new water resistent white wind breaker', weight='0.5' WHERE id=110;"); |
Spelling: "water resistent" should be "water resistant" in the inserted test row (and adjust the subsequent UPDATE string accordingly).
| "INSERT INTO inventory.products_no_pk VALUES (110,'jacket','water resistent white wind breaker',0.2);"); | |
| statement.execute( | |
| "INSERT INTO inventory.products_no_pk VALUES (111,'scooter','Big 2-wheel scooter',5.18);"); | |
| statement.execute( | |
| "UPDATE inventory.products_no_pk SET description='new water resistent white wind breaker', weight='0.5' WHERE id=110;"); | |
| "INSERT INTO inventory.products_no_pk VALUES (110,'jacket','water resistant white wind breaker',0.2);"); | |
| statement.execute( | |
| "INSERT INTO inventory.products_no_pk VALUES (111,'scooter','Big 2-wheel scooter',5.18);"); | |
| statement.execute( | |
| "UPDATE inventory.products_no_pk SET description='new water resistant white wind breaker', weight='0.5' WHERE id=110;"); |
| "INSERT INTO inventory.products_no_pk VALUES (110,'jacket','water resistent white wind breaker',0.2);"); | ||
| statement.execute( | ||
| "INSERT INTO inventory.products_no_pk VALUES (111,'scooter','Big 2-wheel scooter',5.18);"); | ||
| statement.execute( | ||
| "UPDATE inventory.products_no_pk SET description='new water resistent white wind breaker', weight='0.5' WHERE id=110;"); |
Spelling: "water resistent" should be "water resistant" in the UPDATE statement string to keep test data clean/consistent.
| "INSERT INTO inventory.products_no_pk VALUES (110,'jacket','water resistent white wind breaker',0.2);"); | |
| statement.execute( | |
| "INSERT INTO inventory.products_no_pk VALUES (111,'scooter','Big 2-wheel scooter',5.18);"); | |
| statement.execute( | |
| "UPDATE inventory.products_no_pk SET description='new water resistent white wind breaker', weight='0.5' WHERE id=110;"); | |
| "INSERT INTO inventory.products_no_pk VALUES (110,'jacket','water resistant white wind breaker',0.2);"); | |
| statement.execute( | |
| "INSERT INTO inventory.products_no_pk VALUES (111,'scooter','Big 2-wheel scooter',5.18);"); | |
| statement.execute( | |
| "UPDATE inventory.products_no_pk SET description='new water resistant white wind breaker', weight='0.5' WHERE id=110;"); |
Summary
This PR adds support for PostgreSQL tables without primary keys in incremental snapshot mode,
following the same approach as MySQL (PR #2150).
Changes
JDBC Base Layer (flink-cdc-base):
- SplitKeyUtils.getSplitKey(): Handle a null record key by extracting the chunk key from the value's after/before struct instead of the record key
- IncrementalSourceScanFetcher.pollWithBuffer(): Use the after struct as the buffer key when the record key is null (no-PK table)
- JdbcSourceFetchTaskContext.rewriteOutputBuffer(): Support no-PK merge logic; for tables without a primary key, use the before/after struct as the buffer key for CREATE/UPDATE/DELETE operations
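The null-key fallback in SplitKeyUtils.getSplitKey() can be sketched as follows. This is a simplified illustration, not the PR's code: the Debezium SourceRecord/Struct types are replaced by plain maps, and `SplitKeySketch`/`extractChunkKey` are hypothetical names. Debezium's envelope operation codes ("c" create, "r" read, "u" update, "d" delete) determine whether the after or before image carries the chunk key.

```java
import java.util.Map;

public class SplitKeySketch {
    /**
     * Extract the chunk key for a change record. For PK tables the record key
     * carries it; for no-PK tables (key == null) fall back to the row image in
     * the value: after for create/read, before for update/delete.
     */
    static Object extractChunkKey(Map<String, Object> key,
                                  Map<String, Object> after,
                                  Map<String, Object> before,
                                  String op,
                                  String chunkKeyColumn) {
        if (key != null) {
            return key.get(chunkKeyColumn);
        }
        Map<String, Object> row =
                ("c".equals(op) || "r".equals(op)) ? after : before;
        return row.get(chunkKeyColumn);
    }
}
```

With REPLICA IDENTITY FULL, the before image is guaranteed to contain every column, so the fallback always finds the configured chunk key column.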
PostgreSQL Connector (flink-connector-postgres-cdc):
- PostgresDialect: Validate that tables without a primary key have REPLICA IDENTITY FULL set, querying pg_class.relreplident and failing fast with a clear error message if not

Tests:
- products_no_pk table DDL (without PK, with REPLICA IDENTITY FULL)
- testNoPKTableWithChunkKey: End-to-end test for a no-PK table with scan.incremental.snapshot.chunk.key-column configured
- testNoPKTableWithoutChunkKey: Verify ValidationException is thrown when the chunk key column is not specified

Design Decisions
- The Map<Struct, SourceRecord> buffer type in FetchTask.Context.rewriteOutputBuffer() is kept unchanged to avoid breaking MongoDB and other connectors. For no-PK tables, the full row struct (all columns via REPLICA IDENTITY FULL) serves as the buffer key.
- When a change record crosses split boundaries, only at-least-once semantics can be guaranteed (consistent with MySQL behavior).
- REPLICA IDENTITY FULL is required so Postgres records the full before image in the WAL for UPDATE/DELETE events; the connector validates this at startup.
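The full-row-as-buffer-key decision can be illustrated with a minimal sketch. This is not the PR's implementation: `NoPkBufferSketch` is a hypothetical class, and the Kafka Connect Struct is replaced here by a List of column values, which, like Struct, has value-based equals/hashCode and can therefore serve as a map key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NoPkBufferSketch {
    // Buffer keyed by the full row image. With REPLICA IDENTITY FULL every
    // column is present in before/after, so two events for the same row
    // produce equal keys and are merged/removed correctly.
    static Map<List<Object>, String> buffer = new LinkedHashMap<>();

    /** CREATE/READ/UPDATE: the later change for the same row image wins. */
    static void upsert(List<Object> rowImage, String record) {
        buffer.put(rowImage, record);
    }

    /** DELETE: drop the buffered snapshot record matching the before image. */
    static void delete(List<Object> beforeImage) {
        buffer.remove(beforeImage);
    }
}
```

For example, a snapshot READ of row (108, 'jacket', 0.1) followed by a DELETE whose before image carries the same column values leaves the buffer empty, mirroring the rewriteOutputBuffer() merge behavior described above.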