Nodes halt on dust registration load #1352

@chrispalaskas

Description

Context & versions:

Steps to reproduce:

  1. Deploy https://github.com/midnightntwrk/midnight-node/releases/tag/node-1.0.0-toolkit-1.0.0-runtime-1.0.0-rc.2 on perfnet
  2. Observe that the network is progressing fine (stabilized, per cnight observation)
  3. Register more than 3 but fewer than 10 seeds with the MN Toolkit

Actual behavior:

Some nodes (BP + RPC + Gateway) stall block production and remain stuck until rebooted.
The mempool holds those transactions for a long time.
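
To confirm the mempool backlog on a stuck node, the pending extrinsics can be queried over the standard Substrate JSON-RPC interface (`author_pendingExtrinsics`). A minimal sketch for building the request; the default RPC port 9944 is an assumption about this deployment, not confirmed from its config:

```python
import json

# Hedged sketch: build a JSON-RPC payload to list pending extrinsics on a
# Substrate-based node. `author_pendingExtrinsics` is a standard Substrate
# RPC method; port 9944 below is the Substrate default (an assumption here).
def rpc_payload(method: str, params=None) -> str:
    return json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    )

# Usage, e.g. from a shell:
#   curl -s -H 'Content-Type: application/json' \
#        -d "$(python3 -c 'print(rpc_payload("author_pendingExtrinsics"))')" \
#        http://localhost:9944
```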

Expected behavior:

Registrations are processed without stalling any node; I used to register 20 seeds at a time with no issues.

Log analysis:

| Time | Event |
| --- | --- |
| 15:46:54 | Imported #1086 |
| 15:47:00 | Imported #1087 |
| 15:47:06 | Imported #1088 |
| 15:47:12 | Mike tries to author block #1089; starts consensus session on top of #1088 |
| 15:47:16 | Discarding proposal for slot 296059072; block production took too long (authoring failed after ~4 seconds) |
| 15:47:35 | Upload bandwidth drops from ~14 kiB/s to 0.1 kiB/s |
| 15:47:40 | Still Idle at #1088, upload near zero |
| 15:47:48 | Last token bridge observation (slot processing stops) |
| 15:47:50 | Transitions to "Preparing 0.0 bps", target=#1094, stuck at #1088 |
| 15:47:50 to 16:03:25 | 16 minutes stuck; best stays at #1088, target reaches #1235 (147 blocks behind) |
| 16:00:20 | Download bandwidth also drops from ~10 kiB/s to ~1.8 kiB/s |
| 16:03:26 | Reserved peer disconnected (12D3KooWHXV...); peers starting to give up on Mike |
Root cause: same as Leo, stuck executing block #1089.
The pattern is identical to Leo's, but Mike has an extra clue: the failed block-authoring attempt.

Mike was selected to author block #1089 at slot 296059072. It started the consensus session at 15:47:12 on top of block #1088.

Block production took too long — Aura discarded the proposal at 15:47:16, ~4 seconds later. This means the runtime couldn't finish executing the block's extrinsics within the slot deadline.
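
The ~4-second figure is consistent with the authoring deadline being a fixed fraction of the slot: the log shows a 6-second block cadence (#1086..#1088 import 6 s apart), and Substrate commonly reserves 2/3 of the slot for proposing. That 2/3 proportion is an assumption here, not confirmed from this node's configuration. A back-of-the-envelope sketch:

```python
# Hedged sketch of Aura-style slot arithmetic.
# SLOT_DURATION_S is inferred from the log cadence (blocks 6 s apart);
# PROPOSAL_PORTION is a common Substrate default, assumed, not confirmed.
SLOT_DURATION_S = 6
PROPOSAL_PORTION = 2 / 3

def slot_for(unix_time_s: int) -> int:
    """Slot number is wall-clock time divided by the slot duration."""
    return unix_time_s // SLOT_DURATION_S

def proposal_budget_s() -> float:
    """Time the author has to build a block before the proposal is discarded."""
    return SLOT_DURATION_S * PROPOSAL_PORTION

# Under these assumptions the budget is ~4 s, matching the ~4 s between
# starting the consensus session (15:47:12) and the discard (15:47:16).
```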

After that, Mike also can't import block #1089 from another node — it enters the same "Preparing 0.0 bps" stall as Leo.

It's worse for Mike — the logs span 16+ minutes and the node falls 147 blocks behind (target=#1235), with no recovery. DB writes stop at 15:47:30. Token bridge observations stop at 15:47:48. Eventually even download bandwidth degrades (~16:00:20) and a reserved peer disconnects (16:03:26).
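
The stall signature above (best pinned at #1088 while target climbs) is straightforward to detect mechanically from the informant lines. A minimal sketch, assuming the standard Substrate informant format (`Preparing 0.0 bps, target=#N (...), best: #M (...)`); the exact line shape is an assumption:

```python
import re

# Hedged sketch: detect the "best pinned, target climbing" stall from node
# logs. The assumed line shape follows Substrate's informant output, e.g.
#   "Preparing 0.0 bps, target=#1235 (40 peers), best: #1088 (0x1a2b...)"
INFORMANT = re.compile(r"target=#(\d+).*?best: #(\d+)")

def blocks_behind(line: str):
    """Return target - best for an informant line, or None if it has no sync info."""
    m = INFORMANT.search(line)
    if not m:
        return None
    target, best = map(int, m.groups())
    return target - best

def is_stalled(lines, threshold=10):
    """Flag a stall when the sync gap is growing and exceeds `threshold` blocks."""
    gaps = [g for g in map(blocks_behind, lines) if g is not None]
    return len(gaps) >= 2 and gaps[-1] > gaps[0] and gaps[-1] >= threshold
```

Fed the two extremes from the timeline (target=#1094 then target=#1235, best stuck at #1088), this reports a gap growing from 6 to 147 blocks.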
