Fix isClosing data race #418
Open
dnerdy wants to merge 4 commits into
Conversation
This PR fixes the deadlock introduced in #402. The deadlock occurred because the forwardWebSocketData function was holding a lock on the subscription map (in the listenWebSocket goroutine) while Unsubscribe was blocked attempting to acquire the same lock. The forwardWebSocketData function was itself blocked attempting to send data on the interfaceChan channel, which had no readers in the test being run.

After fixing the locking (by making sure the lock isn't held when forwardDataFunc is called), a race condition cropped up between sending on interfaceChan (in forwardWebSocketData) and closing interfaceChan in Unsubscribe. This was fixed by making the listenWebSocket goroutine the "owner" of the interfaceChan channel: it is now the only goroutine that sends on and closes the channel. Other goroutines signal that interfaceChan should be closed by setting `_hasBeenUnsubscribed` on the subscription, which is protected by a short-held lock.

Note that there's a similar data race involving isClosing on webSocketClient. isClosing is set when Close is called (while holding a lock) but read in listenWebSocket without a lock. A separate PR will fix this race (so the `-race` flag can be added when running tests).
This PR fixes a data race related to the webSocketClient isClosing field. Before this change, the field was set in the Close method while holding a lock but read in the listenWebSocket method without holding a lock. The point of the isClosing flag was to prevent sending on errChan after close. Instead of having a hidden dependency between a flag and the errChan channel state, this PR replaces isClosing with a flag named exitListenWebSocket; when the flag is set, listenWebSocket closes errChan itself. listenWebSocket is the only goroutine that writes to errChan, so there's no longer any possibility of writing on a closed channel: the listenWebSocket goroutine now effectively "owns" errChan.
dnerdy
commented
May 21, 2026
conn WSConn
connParams map[string]interface{}
Dialer Dialer
conn WSConn
Contributor
Author
Struct fields were reordered due to the following lint error:
graphql/websocket.go:46:22: fieldalignment: struct with 104 pointer bytes could be 80 (govet)
benjaminjkraft
approved these changes
May 23, 2026
Collaborator
benjaminjkraft
left a comment
Nice, thanks for tracking this down!
w.exitListenWebSocketMu.Lock()
if w.exitListenWebSocket {
	close(w.errChan)
	return
Collaborator
Do we need to unlock here?
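One way to make the question moot, sketched under assumptions (the `listener` type and the `exitRequested` and `step` names are hypothetical, and this is not necessarily what the PR ended up doing): read the flag in a small helper that uses `defer`, so the mutex is released on every return path, including the early return.

```go
package main

import (
	"fmt"
	"sync"
)

// listener is a hypothetical stand-in for the fields used in the snippet above.
type listener struct {
	errChan chan error

	mu   sync.Mutex
	exit bool // guarded by mu
}

// exitRequested holds the mutex only for the duration of the read.
// The deferred Unlock runs no matter how the function returns.
func (l *listener) exitRequested() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.exit
}

// step models one iteration of the listen loop.
// It returns false when the loop should stop.
func (l *listener) step() bool {
	if l.exitRequested() {
		close(l.errChan)
		return false // the mutex is already released here
	}
	return true
}

// demo takes the early-return path and then checks that the mutex is free.
func demo() bool {
	l := &listener{errChan: make(chan error), exit: true}
	l.step()
	free := l.mu.TryLock() // succeeds only if step left the mutex unlocked
	if free {
		l.mu.Unlock()
	}
	return free
}

func main() {
	fmt.Println("mutex released after early return:", demo())
}
```

The alternative is an explicit `Unlock` call before the `return`; the helper simply makes it harder to forget when more return paths are added later.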
benjaminjkraft pushed a commit that referenced this pull request
May 23, 2026
This PR fixes the deadlock introduced in #402. The deadlock occurred because the forwardWebSocketData function was holding a lock on the subscription map (in the listenWebSocket goroutine) while Unsubscribe was blocked attempting to acquire the same lock. The forwardWebSocketData function was itself blocked attempting to send data on the interfaceChan channel, which had no readers in the test being run.

After fixing the locking (by making sure the lock isn't held when forwardDataFunc is called), a race condition cropped up between sending on interfaceChan (in forwardWebSocketData) and closing interfaceChan in Unsubscribe. This was fixed by making the listenWebSocket goroutine the "owner" of the interfaceChan channel: it is now the only goroutine that sends on and closes the channel. Other goroutines signal that interfaceChan should be closed by setting `_hasBeenUnsubscribed` on the subscription, which is protected by a short-held lock.

Note that there's a similar data race involving isClosing on webSocketClient. isClosing is set when Close is called (while holding a lock) but read in listenWebSocket without a lock. #418 fixes this race (so the `-race` flag can be added when running tests).

<!-- Thanks for your contribution! Check out the [contributing docs](https://github.com/Khan/genqlient/blob/main/docs/CONTRIBUTING.md) for more on contributing to genqlient. -->

I have:
- [x] Written a clear PR title and description (above)
- [x] Signed the [Khan Academy CLA](https://www.khanacademy.org/r/cla)
- [ ] Added tests covering my changes, if applicable
- [ ] Included a link to the issue fixed, if applicable
- [x] Included documentation, for new features
- [ ] Added an entry to the changelog