fix(consumer): avoid broker race in response feeder#3486
Open
fix(consumer): avoid broker race in response feeder#3486
Conversation
ad4363f to
78e19d7
Compare
Add a focused race test that fails against the current implementation and a temporary PR workflow that runs only this reproduction. Signed-off-by: DCjanus <DCjanus@dcjanus.com>
Pass the brokerConsumer context alongside each fetch response so responseFeeder can acknowledge and resubscribe without reading child.broker concurrently with dispatcher. Signed-off-by: DCjanus <DCjanus@dcjanus.com>
The focused reproduction workflow was only needed to demonstrate the failing test and its fix while validating the split commit history. Signed-off-by: DCjanus <DCjanus@dcjanus.com>
78e19d7 to
f0887a1
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
A previous CI run in PR #3419 exposed a
-racefailure inUnit Testing with Go oldstablecaused by concurrent access topartitionConsumer.broker.Root Cause
responseFeeder()readschild.broker, butdispatcher()can change the same field during abort or redispatch. That is the race reported by-race.The response path does not really need the current value of
child.broker. It only needs to know whichbrokerConsumerproduced the response that is being handled.Fix Approach
Pass that
brokerConsumertogether with the feeder response.This removes the read of
child.brokerfromresponseFeeder()and keeps the ack / resubscribe path tied to the broker that produced the in-flight response.Testing
This PR was validated in three commits.
The first commit only adds the regression test plus a temporary workflow that runs just
TestPartitionConsumerBrokerRaceunder-race, and is expected to fail. The second commit only contains the fix and is expected to make that workflow pass. The third commit removes the temporary workflow after the failure and recovery have both been demonstrated.This sequence is intended to prove two things explicitly:
Relevant CI jobs:
Unit Testing with Go oldstablefrom PR #3419