Skip to content

fix(tm2/rpc): don't panic when txDispatcher subscription closes on shutdown#5561

Open
thehowl wants to merge 3 commits intognolang:masterfrom
thehowl:dev/morgan/fix-txdispatcher-panic
Open

fix(tm2/rpc): don't panic when txDispatcher subscription closes on shutdown#5561
thehowl wants to merge 3 commits intognolang:masterfrom
thehowl:dev/morgan/fix-txdispatcher-panic

Conversation

@thehowl
Copy link
Copy Markdown
Member

@thehowl thehowl commented Apr 21, 2026

Summary

Fixes a CI flake manifesting as panic: txDispatcher subscription unexpectedly closed (e.g. this run).

The txDispatcher.listenRoutine panicked any time its events subscription channel was closed. But that channel is closed legitimately by events.SubscribeFilteredOn during event-switch shutdown: when the listener callback's unbuffered send would block and evsw.Quit() has already fired, the callback takes the <-evsw.Quit() branch and close(ch)es the subscription (see tm2/pkg/events/subscribe.go:42-73).

The race surfaces during node teardown because:

  1. Node.OnStop stops the event switch (n.evsw.Stop()) but never stops the package-level gTxDispatcher.
  2. eventSwitch.FireEvent has no stop check — a reactor that hasn't finished draining can still fire an EventTx after Stop, invoking the dispatcher's callback while evsw.Quit() is closed.

Fix

  1. Treat a closed subscription as clean shutdown (tm2/pkg/bft/rpc/core/mempool.go): return from listenRoutine and asynchronously Stop() the dispatcher instead of panicking. Pending getTxResult waiters time out normally.
  2. Stop gTxDispatcher during node teardown (tm2/pkg/bft/node/node.go, tm2/pkg/bft/rpc/core/pipe.go): add rpccore.Stop() and call it from Node.OnStop before evsw.Stop, so in the normal case the listenRoutine exits via its own Quit channel. Also stop any previous dispatcher when SetEventSwitch is called again (prevents goroutine leaks across tests reusing the package globals).
  3. Regression tests (tm2/pkg/bft/rpc/core/mempool_test.go): one test closes the subscription directly; the other reproduces the full event-switch shutdown race. The second test reliably produces the original panic when reverted against the old code.

Test plan

  • go test ./tm2/pkg/bft/rpc/core/... -short -count=30 — new regression tests pass 30/30
  • go test ./tm2/pkg/bft/rpc/... ./tm2/pkg/bft/node/... — no regressions
  • go test ./gno.land/pkg/integration/... -run TestMalformedTypeURL — passes
  • Temporarily reverting the listenRoutine change makes the new TestTxDispatcher_EventSwitchShutdown fail with the original panic, confirming the test exercises the exact CI failure

…utdown

The txDispatcher's listenRoutine used to panic with "txDispatcher
subscription unexpectedly closed" whenever its events subscription channel
was closed. That channel is closed legitimately by
events.SubscribeFilteredOn during event-switch shutdown: when the listener
callback's unbuffered send would block and evsw.Quit() has already fired,
the callback takes the <-evsw.Quit() branch and close(ch)es the
subscription.

This hit integration test CI as a flake (panic: txDispatcher subscription
unexpectedly closed) because Node.OnStop stops the event switch without
first stopping the package-level gTxDispatcher, and eventSwitch.FireEvent
has no stop check — so a late EventTx fired during teardown can still
invoke the listener callback and race evsw.Quit() to close the channel.

Treat the closed subscription as a clean shutdown signal: stop the
dispatcher and return. Additionally stop gTxDispatcher in Node.OnStop
before evsw.Stop so the goroutine exits via its own Quit channel in the
normal case, and guard SetEventSwitch against leaking a previous
dispatcher when the package globals are re-initialized.

Regression tests exercise both the directly-closed-subscription path and
the full event-switch shutdown race.
@Gno2D2
Copy link
Copy Markdown
Collaborator

Gno2D2 commented Apr 21, 2026

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)
Read More

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

🟢 Maintainers must be able to edit this pull request (more info)

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Automated Checks
Maintainers must be able to edit this pull request (more info)

If

🟢 Condition met
└── 🟢 And
    ├── 🟢 The base branch matches this pattern: ^master$
    └── 🟢 The pull request was created from a fork (head branch repo: thehowl/gno)

Then

🟢 Requirement satisfied
└── 🟢 Maintainer can modify this pull request

Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
tm2/pkg/bft/rpc/core/pipe.go 60.00% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@thehowl thehowl marked this pull request as draft April 21, 2026 13:34
@thehowl thehowl marked this pull request as ready for review April 21, 2026 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

📦 🌐 tendermint v2 Issues or PRs tm2 related

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants