fix(gnoland): detect validator changes via EndTxHook for same-block processing#5556
thehowl wants to merge 6 commits into gnolang:master
🛠 PR Checks Summary: All automated checks passed. ✅
Codecov Report: ❌ Patch coverage is
…rocessing

Replace the in-memory event collector (which listened on EventSwitch post-commit, causing a one-block delay and losing state on restart) with an EndTxHook that scans ABCI events synchronously during DeliverTx.

The EndBlocker now calls GetChanges(req.Height, req.Height) using the current block height, querying changes from the deliver-state context, which reflects all uncommitted txs in the block. Because EndBlock(N) returns ValidatorUpdates that Tendermint persists in its own state DB, the validator set survives restarts without any special recovery logic.

Deletes the generic collector[T] / events.go infrastructure entirely. Adds a gnorpc testscript command and a txtar integration test that confirms validators added via GovDAO survive a node restart.

Co-authored-by: Omar Sy <syomar.pro@gmail.com>
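A minimal, self-contained sketch of the same-block flow this commit describes. The `Event` and `ValidatorUpdate` structs, the `app` type and the `getChanges` callback are hypothetical simplifications, not the real gno.land types; `getChanges` stands in for the `GetChanges` query against the deliver-state context, and the full `gno.land/...` package path is assumed. What it illustrates is that a validator event seen during `DeliverTx` at height N produces updates from `EndBlock(N)` itself rather than from N+1.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for a Gno event carried in a
// tx's ABCI result.
type Event struct {
	Type    string
	PkgPath string
}

// ValidatorUpdate is a simplified stand-in for an ABCI validator update.
type ValidatorUpdate struct {
	Address string
	Power   int64
}

// valRealm is the realm whose events signal a validator-set change.
const valRealm = "gno.land/r/sys/validators/v2"

// app holds the per-block flag shared by the end-tx hook and the end-blocker.
type app struct {
	hasValEvent bool
	// getChanges stands in for GetChanges, which queries the validators realm
	// through the deliver-state context (uncommitted writes included).
	getChanges func(from, to int64) []ValidatorUpdate
}

// endTxHook runs synchronously after each DeliverTx and flags validator events.
func (a *app) endTxHook(events []Event) {
	for _, ev := range events {
		if ev.PkgPath == valRealm && (ev.Type == "ValidatorAdded" || ev.Type == "ValidatorRemoved") {
			a.hasValEvent = true
			return
		}
	}
}

// endBlocker queries changes for the current height, so updates made by txs
// in block N are returned from EndBlock(N) instead of EndBlock(N+1).
func (a *app) endBlocker(height int64) []ValidatorUpdate {
	if !a.hasValEvent {
		return nil
	}
	a.hasValEvent = false // reset so the next block starts clean
	return a.getChanges(height, height)
}

func main() {
	// pending simulates the realm's uncommitted state: changes keyed by height.
	pending := map[int64][]ValidatorUpdate{}
	a := &app{getChanges: func(from, to int64) []ValidatorUpdate {
		var out []ValidatorUpdate
		for h := from; h <= to; h++ {
			out = append(out, pending[h]...)
		}
		return out
	}}

	// Block 5: a GovDAO tx adds a validator and emits ValidatorAdded.
	pending[5] = append(pending[5], ValidatorUpdate{Address: "g1new", Power: 1})
	a.endTxHook([]Event{{Type: "ValidatorAdded", PkgPath: valRealm}})

	fmt.Println(a.endBlocker(5)) // [{g1new 1}]
	fmt.Println(a.endBlocker(6)) // []
}
```

Resetting the flag inside the end-blocker keeps one block's events from leaking into the next; the follow-up commit replaces this shared flag with a pull over block events.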
Force-pushed from 4fcf79c to d7d62c6.
… closure flag

Add BlockEvents() []abci.Event to BaseApp: accumulated from successful DeliverTx calls, reset at BeginBlock. The EndBlocker now reads block events directly via the endBlockerApp interface rather than relying on a shared bool captured in a closure between EndTxHook and EndBlocker.

This removes the hasValEvent flag and the EndTxHook validator-detection logic, replacing them with a clean pull model. The CommitGnoTransactionStore call remains in EndTxHook as before.

Co-authored-by: Omar Sy <syomar.pro@gmail.com>
…ock-same-block-validators
I'd assumed applying the consensus change at N+1 was a protocol requirement. Re-reading, it looks like the N+1 delay was a side effect of the

If that's right, this PR still lives in the same substrate #5488 flags: realm write → event → scan → cross-realm

Possible sketch (may be wrong): Native function callable only from

```go
// assertCaller, valRealm and valKeeper are hypothetical helpers.
func setValidator(ctx sdk.Context, addr crypto.Address, pubKey crypto.PubKey, power int64) {
	assertCaller(ctx, valRealm)
	// Build the update from the arguments before validating and queueing it.
	update := abci.ValidatorUpdate{Address: addr, PubKey: pubKey, Power: power}
	if err := valKeeper.Validate(ctx, update); err != nil {
		panic(err) // caught by runTx → tx fails, node stays up
	}
	valKeeper.QueueUpdate(ctx, update)
}
```

Validation would run inside the tx's panic-recovery scope, so an invalid change fails that tx rather than crashing every node at

```go
return abci.ResponseEndBlock{ValidatorUpdates: valKeeper.Drain(ctx)}
```

WDYT?
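A runnable toy version of the queue-and-drain idea in the comment above, with hypothetical names throughout: `valKeeper`, `setValidator`, `runTx` and the simplified `ValidatorUpdate` are stand-ins, not existing gno.land APIs. It shows the property being argued for: validation panics inside a recover scope, so a bad update fails only its own tx, and `EndBlock` just drains whatever valid updates were queued. For simplicity the queue lives in plain memory; a real implementation would need it in the tx-scoped store so that a failure later in the same tx rolls it back.

```go
package main

import (
	"errors"
	"fmt"
)

// ValidatorUpdate is a simplified stand-in for abci.ValidatorUpdate.
type ValidatorUpdate struct {
	Address string
	Power   int64 // 0 removes the validator
}

// valKeeper is a hypothetical keeper: txs validate and queue updates, and
// EndBlock drains the queue.
type valKeeper struct {
	queued []ValidatorUpdate
}

// Validate rejects updates that would be invalid at the consensus layer.
func (k *valKeeper) Validate(u ValidatorUpdate) error {
	if u.Address == "" {
		return errors.New("empty validator address")
	}
	if u.Power < 0 {
		return errors.New("negative voting power")
	}
	return nil
}

// QueueUpdate records a validated update for this block.
// NOTE: not rolled back on later tx failure, unlike a store-backed queue.
func (k *valKeeper) QueueUpdate(u ValidatorUpdate) { k.queued = append(k.queued, u) }

// Drain returns all queued updates and empties the queue (called from EndBlock).
func (k *valKeeper) Drain() []ValidatorUpdate {
	out := k.queued
	k.queued = nil
	return out
}

// setValidator panics on invalid input, like the native function in the comment.
func setValidator(k *valKeeper, u ValidatorUpdate) {
	if err := k.Validate(u); err != nil {
		panic(err)
	}
	k.QueueUpdate(u)
}

// runTx mimics the panic-recovery scope of tx execution: a panic fails the tx
// and is reported as an error instead of crashing the node.
func runTx(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("tx failed: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	k := &valKeeper{}
	fmt.Println(runTx(func() { setValidator(k, ValidatorUpdate{Address: "g1abc", Power: -1}) }))
	// tx failed: negative voting power
	fmt.Println(runTx(func() { setValidator(k, ValidatorUpdate{Address: "g1abc", Power: 10}) }))
	// <nil>
	fmt.Println(k.Drain()) // [{g1abc 10}]: what EndBlock would return as ValidatorUpdates
}
```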
Yes, that's my understanding too. It was done that way because of how the EventSwitch worked, but really there was no need for the event switch all along.

Here's my understanding, and what prompted me to propose this PR:

Your proposal seems to me to add complexity, and I don't understand your criticism of this PR's approach.
It is more of a proposal. My point was that since we're already refactoring this, it seemed like a good moment to also address #5488. The approach I suggested would fix both.

Looks like #5485 was already attempting to move in the direction of using parameters.

#5485 is the way :)
Fixes the validator-changes-lost-on-restart bug (originally reported in #5469) with a cleaner approach: instead of an in-memory event collector that listened post-commit, we hook into `EndTxHook`, which fires synchronously during `DeliverTx`.

What changes
- `EndTxHook` scans each delivered tx's ABCI events for `ValidatorAdded`/`ValidatorRemoved` from `r/sys/validators/v2`. If any are found, a `hasValEvent` flag is set.
- `EndBlocker` checks the flag (and resets it). If set, it calls `GetChanges(req.Height, req.Height)`, using the current block height rather than the previously committed height.
- The `deliverState` context holds all uncommitted writes from this block's txs, so `QueryEval` sees the validator changes immediately, in the same block they were made.
- `EndBlock(N)` now returns `ValidatorUpdates` containing the changes from block N, which Tendermint persists in its own state DB. On restart, Tendermint restores the validator set from that DB, so no special recovery code is needed.
- Removed `events.go`: the generic `collector[T]` type is gone entirely.

Why this is better than the firstBlock flag (from #5469)
The firstBlock approach would query `GetChanges(lastHeight, lastHeight)` on the first block after restart, which queries for changes already applied to consensus. The `EndTxHook` approach eliminates the one-block delay entirely and makes restarts inherently correct without any special-case logic.

ADR
`gno.land/adr/pr5469_restart_validator_changes.md`

Testing
- `TestEndBlocker` unit tests pass.
- `restart_validators.txtar` integration test: adds a validator via GovDAO, restarts the node, then uses `gnorpc validators -wait` to confirm the validator appears in the consensus set.

Note
This PR includes the `gnorpc` testscript command from #5469 (by @omarsy), which was needed for the integration test. If #5469 is merged first, that part can be dropped.

Generated with Claude Code. AI assistance disclosed per AGENTS.md.