Skip to content

Fix deadloack introduced in #402#417

Merged
benjaminjkraft merged 3 commits into
mainfrom
fix-402-deadlock
May 23, 2026
Merged

Fix deadloack introduced in #402#417
benjaminjkraft merged 3 commits into
mainfrom
fix-402-deadlock

Conversation

@dnerdy
Copy link
Copy Markdown
Contributor

@dnerdy dnerdy commented May 20, 2026

This PR fixes the deadlock introduced in #402. The deadlock was occurring because the forwardWebSocketData function was holding a lock on the subscription map (in the listenWebSocket goroutine) while Unsubscribe was blocked attempting to aquire the same lock. The forwardWebSocketData function was also blocked attempting to send data on the interfaceChan channel, which had no readers in the test being run.

After fixing the locking (by making sure the lock isn't held when forwardDataFunc is called), a race condition cropped up between sending on interfaceChan (in forwardWebSocketData) and closing interfaceChan in Unsubscribe. This was fixed by making the listenWebSocket goroutine the "owner" of the interfaceChan channel -- it is now the only goroutine that sends on and closes the channel. Other goroutines singal interfaceChan should be closed by setting _hasBeenUnsubscribed on the subscription, which is protected by a short-held lock.

Note that there's a similar data race involving isClosing on webSocketClient. isClosing is set when Close is called (while holding a lock) but read in listenWebSocket without a lock. #418 fixes this race (so the -race flag can be added when running tests).

I have:

  • Written a clear PR title and description (above)
  • Signed the Khan Academy CLA
  • Added tests covering my changes, if applicable
  • Included a link to the issue fixed, if applicable
  • Included documentation, for new features
  • Added an entry to the changelog

This PR fixes the deadlock introduced in #402. The deadlock was occurring
because the forwardWebSocketData function was holding a lock on the
subscription map (in the listenWebSocket goroutine) while Unsubscribe was
blocked attempting to aquire the same lock. The forwardWebSocketData function
was also blocked attempting to send data on the interfaceChan channel, which
had no readers in the test being run.

After fixing the locking (by making sure the lock isn't held when
forwardDataFunc is called), a race condition cropped up between sending on
interfaceChan (in forwardWebSocketData) and closing interfaceChan in
Unsubscribe. This was fixed by making the listenWebSocket goroutine the "owner"
of the interfaceChan channel -- it is now the only goroutine that sends on and
closes the channel. Other goroutines singal interfaceChan should be closed by
setting `_hasBeenUnsubscribed` on the channel, which is protected by a
short-held lock.

Note that there's a similar data race involving isClosing on webSocketClient.
isClosing is set when Close is called (while holding a lock) but read in
listenWebSocket without a lock. A separate PR will fix this race (so the
`-race` flag can be added when running tests).
Comment thread graphql/subscription.go
hasBeenUnsubscribed bool

// Hold when accessing _hasBeenUnsubscribed
hasBeenUnsubscribedMu sync.Mutex
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the copylocks (below) is a real problem -- you have to make the mutex a pointer and then hae a constructor for this struct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code to use *subscription in the subscription map, which should work as well.

Copy link
Copy Markdown
Member

@csilvers csilvers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking this on! I'll leave it to ben to review, my head always gets fuzzy thinking about locking.

Copy link
Copy Markdown
Collaborator

@benjaminjkraft benjaminjkraft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@benjaminjkraft benjaminjkraft merged commit f0f1687 into main May 23, 2026
10 checks passed
@benjaminjkraft benjaminjkraft deleted the fix-402-deadlock branch May 23, 2026 18:51
@HaraldNordgren
Copy link
Copy Markdown
Contributor

Thanks a lot, @dnerdy!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants