Transactional handling for Debezium PG CDC #81

Open
shawkins wants to merge 3 commits into jwplayer:master from shawkins:transactional

@shawkins

Some of the changes here could be teased apart, but it is probably useful to see what they are working towards together. This was done for a POC of honoring the transactional metadata produced by Debezium, in particular for PostgreSQL.

With a config that includes new serdes and a transactions topic, such as:

```yaml
topics:
  default:
    acks: "all"
    auto.offset.reset: "earliest"
    bootstrap.servers: "localhost:9092"
    client.id: "southpaw"
    group.id: "southpaw"
    enable.auto.commit: false
    key.serde.class: "com.jwplayer.southpaw.serde.DebeziumJsonSerde"
    schema.registry.url: "http://localhost:80"
    topic.class: "com.jwplayer.southpaw.topic.KafkaTopic"
    value.serde.class: "com.jwplayer.southpaw.serde.DebeziumJsonSerde"
  CustomerWithAddresses:
    compression.type: "snappy"
    jackson.serde.class: "com.jwplayer.southpaw.json.DenormalizedRecord"
    key.serde.class: "org.apache.kafka.common.serialization.Serdes$ByteArraySerde"
    topic.class: "com.jwplayer.southpaw.topic.KafkaTopic"
    topic.name: "customers-with-addresses"
    value.serde.class: "com.jwplayer.southpaw.serde.JacksonSerde"
  customer:
    topic.name: "dbserver1.inventory.customers"
  address:
    topic.name: "dbserver1.inventory.addresses"
  transactions:
    topic.name: "dbserver1.transaction"
    value.serde.class: "com.jwplayer.southpaw.serde.JsonSerde"
    key.serde.class: "com.jwplayer.southpaw.serde.JsonSerde"
    persistent: false
```

one can consume the events from https://github.com/debezium/debezium-examples/tree/master/kstreams-fk-join, with the connector configured with "provide.transaction.metadata": "true", and emit denormalizations that are consistent with the transaction boundaries. If transaction metadata is not available, processing degrades to the normal eventually consistent behavior. Please reach out if something like that is of interest.
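
To make the idea of transaction-consistent emission concrete, here is a minimal sketch of one way to group change events by transaction; it is not the code in this PR. It relies on two things Debezium provides when transaction metadata is enabled: each change event carries a `transaction` block with an `id`, and the transaction topic publishes an `END` event with an `event_count`. The class `TransactionBuffer`, its method names, and the use of plain strings for events are hypothetical, and the sketch assumes every table touched by a transaction is consumed, so that the buffered count can reach `event_count`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: holds change events until their transaction is complete. */
public class TransactionBuffer {
    // Change events seen so far, keyed by Debezium transaction id.
    private final Map<String, List<String>> pending = new HashMap<>();
    // event_count from the END event, keyed by transaction id.
    private final Map<String, Long> expectedCounts = new HashMap<>();

    /**
     * Records a change event. A null txId means the event has no transaction
     * block, so it is passed through immediately (eventually consistent mode).
     */
    public List<String> onChangeEvent(String txId, String event) {
        if (txId == null) {
            return Collections.singletonList(event);
        }
        pending.computeIfAbsent(txId, k -> new ArrayList<>()).add(event);
        return releaseIfComplete(txId);
    }

    /** Records the END event for a transaction along with its event_count. */
    public List<String> onTransactionEnd(String txId, long eventCount) {
        expectedCounts.put(txId, eventCount);
        return releaseIfComplete(txId);
    }

    // Releases the buffered events once END has been seen and all events arrived.
    private List<String> releaseIfComplete(String txId) {
        Long expected = expectedCounts.get(txId);
        List<String> buffered = pending.getOrDefault(txId, Collections.emptyList());
        if (expected == null || buffered.size() < expected) {
            return Collections.emptyList();
        }
        pending.remove(txId);
        expectedCounts.remove(txId);
        return new ArrayList<>(buffered);
    }
}
```

Because the data topics and the transaction topic are consumed independently, the sketch accepts the `END` event either before or after the last change event of the transaction.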

The earliest commit tries to avoid redundant or unnecessary deserialization by holding onto the deserialized value and by making retrieval of the old value for filtering optional. It also adds support for wrapped Debezium JSON CDC events.
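
The "hold onto the deserialized value" idea can be illustrated with a small memoizing wrapper. This is only a sketch of the general technique, not the code in the commit; the class `LazyValue` and its constructor arguments are hypothetical, and it is not thread-safe.

```java
import java.util.function.Function;

/** Hypothetical sketch: deserializes the raw bytes at most once and caches the result. */
public class LazyValue<T> {
    private final byte[] raw;
    private final Function<byte[], T> deserializer;
    private T value;
    private boolean deserialized;

    public LazyValue(byte[] raw, Function<byte[], T> deserializer) {
        this.raw = raw;
        this.deserializer = deserializer;
    }

    /** Returns the deserialized value, invoking the deserializer only on first use. */
    public T get() {
        if (!deserialized) {
            value = deserializer.apply(raw);
            deserialized = true;
        }
        return value;
    }
}
```

A caller that never needs the value (for example, when old-value filtering is disabled) never pays for deserialization, and a caller that needs it several times pays only once.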

The next commit makes the polling loop tighter, to avoid setting or incurring a polling timeout on topics that don't change much.

Let me know if you want separate PRs for those changes.
