
fix(consumer): avoid broker race in response feeder #3486

Open
DCjanus wants to merge 3 commits into IBM:main from DCjanus:fix/consumer-broker-race-repro

Contributor

@DCjanus DCjanus commented Mar 30, 2026

Background

A previous CI run in PR #3419 exposed a -race failure in the "Unit Testing with Go oldstable" job, caused by concurrent access to partitionConsumer.broker.

Root Cause

responseFeeder() reads child.broker, while dispatcher() can write the same field during abort or redispatch. The two run in different goroutines with no synchronization on that field, which is the race reported by -race.

The response path does not really need the current value of child.broker. It only needs to know which brokerConsumer produced the response that is being handled.

Fix Approach

Pass that brokerConsumer together with the feeder response.

This removes the read of child.broker from responseFeeder() and keeps the ack / resubscribe path tied to the broker that produced the in-flight response.
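
The pattern can be sketched in isolation as follows. This is only an illustration under simplifying assumptions, not the code in this PR: the source, feederMsg, feed, and demo names are hypothetical stand-ins, and the real brokerConsumer and partitionConsumer types carry far more state.

```go
package main

import "fmt"

// source is a hypothetical stand-in for a brokerConsumer:
// the component that produced a given response.
type source struct {
	id   int
	acks chan int // receives this source's id for each acknowledged response
}

// feederMsg pairs a response payload with the source that produced it,
// so the receiving goroutine never has to look up a shared "current" field.
type feederMsg struct {
	payload string
	from    *source
}

// feed acknowledges each message to the source carried in the message itself.
// It reads no state that another goroutine may be rewriting.
func feed(in <-chan feederMsg, done chan<- struct{}) {
	for msg := range in {
		msg.from.acks <- msg.from.id
	}
	close(done)
}

// demo simulates a redispatch between two in-flight responses and returns
// the ids that the two sources received as acknowledgements.
func demo() []int {
	a := &source{id: 1, acks: make(chan int, 1)}
	b := &source{id: 2, acks: make(chan int, 1)}
	in := make(chan feederMsg)
	done := make(chan struct{})
	go feed(in, done)

	current := a // dispatcher-owned state, touched only on this goroutine
	in <- feederMsg{payload: "r1", from: current}
	current = b // "redispatch" to another source
	in <- feederMsg{payload: "r2", from: current}
	close(in)
	<-done

	return []int{<-a.acks, <-b.acks}
}

func main() {
	fmt.Println(demo())
}
```

In this sketch each acknowledgement goes back to the source that produced the response, even though the current source changed between the two sends, and the feeding goroutine never reads the dispatcher-owned variable.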

Testing

This PR was validated in three commits.

The first commit only adds the regression test plus a temporary workflow that runs just TestPartitionConsumerBrokerRace under -race, and is expected to fail. The second commit only contains the fix and is expected to make that workflow pass. The third commit removes the temporary workflow after the failure and recovery have both been demonstrated.

This sequence is intended to prove two things explicitly:

  • the new regression test really detects the race in CI
  • the follow-up fix really removes the race that the new test exercises

Relevant CI jobs:

@DCjanus DCjanus changed the title from "test(consumer): add broker race reproduction" to "fix(consumer): avoid broker race in response feeder" Mar 31, 2026
@DCjanus DCjanus force-pushed the fix/consumer-broker-race-repro branch from ad4363f to 78e19d7 Compare March 31, 2026 08:10
DCjanus added 3 commits April 1, 2026 00:15
Add a focused race test that fails against the current implementation and a temporary PR workflow that runs only this reproduction.

Signed-off-by: DCjanus <DCjanus@dcjanus.com>
Pass the brokerConsumer context alongside each fetch response so responseFeeder can acknowledge and resubscribe without reading child.broker concurrently with dispatcher.

Signed-off-by: DCjanus <DCjanus@dcjanus.com>
The focused reproduction workflow was only needed to demonstrate the failing test and its fix while validating the split commit history.

Signed-off-by: DCjanus <DCjanus@dcjanus.com>
@DCjanus DCjanus force-pushed the fix/consumer-broker-race-repro branch from 78e19d7 to f0887a1 Compare March 31, 2026 16:19
@DCjanus DCjanus marked this pull request as ready for review March 31, 2026 16:39