fix(bump-builder): chunk the bulk MINED fan-out to stay under Kafka MaxMessageBytes#221

Merged
galt-tr merged 1 commit into main from fix/bump-builder-stuck-tx-recovery
Jun 24, 2026
@galt-tr galt-tr commented Jun 24, 2026

Contributor

Context

markMinedAndPublish coalesced a block's MINED fan-out into a single bulk event carrying every txid. A ~27k-tx block serialized to ~1.85 MB — over the 1 MiB default Producer.MaxMessageBytes — so PublishBulk failed and the MINED event was silently dropped for large blocks: the DB status was still MINED, but SSE/webhook subscribers never saw it.
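The size problem can be reproduced with a rough estimate. The sketch below assumes the bulk event is JSON-encoded and carries txids as 64-character hex strings; the `bulkEvent` struct, its field names, and `encodedSize` are hypothetical stand-ins, since the real payload type and wire format are not shown in this PR. Under those assumptions each txid costs about 67 bytes on the wire, which puts a 27k-tx event well over 1 MiB and a 5000-txid event well under it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// bulkEvent is a hypothetical stand-in for the bulk MINED payload; the real
// struct and its wire encoding are not shown in this PR description.
type bulkEvent struct {
	Status string   `json:"status"`
	TxIDs  []string `json:"txids"`
}

// defaultMaxMessageBytes mirrors the 1 MiB default cited in the Context section.
const defaultMaxMessageBytes = 1 << 20

// encodedSize returns the JSON-encoded size of a bulk event carrying n
// placeholder txids (64 hex characters each).
func encodedSize(n int) int {
	txids := make([]string, n)
	for i := range txids {
		txids[i] = strings.Repeat("a", 64)
	}
	b, err := json.Marshal(bulkEvent{Status: "MINED", TxIDs: txids})
	if err != nil {
		panic(err) // not expected for plain strings
	}
	return len(b)
}

func main() {
	for _, n := range []int{5000, 27000} {
		size := encodedSize(n)
		fmt.Printf("%d txids: %d bytes, fits=%v\n", n, size, size <= defaultMaxMessageBytes)
	}
}
```

The real event presumably carries more envelope fields than this sketch, so the absolute numbers will differ slightly from the ~1.85 MB and ~340 KB figures quoted in this PR.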

Change

Split the txid list into chunks of at most maxTxIDsPerBulkEvent (5000) txids and publish one event per chunk (~340 KB each), comfortably under the limit. Subscribers already fan out per-tx from TxIDs[], so multiple events per block are additive and harmless.
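A minimal sketch of the chunking step, for illustration only: `chunkTxIDs` is a hypothetical helper name, and the merged code may structure the loop differently (for example by slicing inline inside markMinedAndPublish). Only the constant name and its value of 5000 come from this PR.

```go
package main

import "fmt"

// maxTxIDsPerBulkEvent matches the chunk size named in this PR.
const maxTxIDsPerBulkEvent = 5000

// chunkTxIDs splits txids into consecutive slices of at most size elements.
// Hypothetical helper; the merged implementation is not shown here.
func chunkTxIDs(txids []string, size int) [][]string {
	if size <= 0 || len(txids) == 0 {
		return nil
	}
	chunks := make([][]string, 0, (len(txids)+size-1)/size)
	for start := 0; start < len(txids); start += size {
		end := start + size
		if end > len(txids) {
			end = len(txids)
		}
		// The three-index slice caps capacity so that appending to one
		// chunk cannot overwrite the start of the next.
		chunks = append(chunks, txids[start:end:end])
	}
	return chunks
}

func main() {
	txids := make([]string, 27000)
	for i := range txids {
		txids[i] = fmt.Sprintf("%064x", i)
	}
	// Each chunk would become one bulk MINED event.
	for i, c := range chunkTxIDs(txids, maxTxIDsPerBulkEvent) {
		fmt.Printf("event %d: %d txids\n", i, len(c))
	}
}
```

For a 27,000-tx block this yields six events: five full chunks and one of 2,000 txids. Because the chunks share the backing array of the input, no txids are copied.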

Notes

  • Scoped to just this fix. The original PR also carried a watchdog/zero-STUMP recovery workstream (WS3a/WS3b); those were dropped because #207 ("feat(bump-builder): detect and recover silently-dropped STUMPs") already solves that with a cleaner design (processed_at as the finalize/recovery signal + expectedSubtreeIndices). Rebased onto current main.
  • Test added (builder_bulk_mined_test.go); go build/vet (incl. build tags), golangci-lint, and the bump_builder suite pass.

@galt-tr galt-tr requested a review from mrz1836 as a code owner June 24, 2026 01:50
@github-actions github-actions Bot added the bug-P3 Lowest rated bug, affects nearly none or low-impact label Jun 24, 2026
@github-actions github-actions Bot added the size/XL Very large change (>500 lines) label Jun 24, 2026
@mrz1836 mrz1836 assigned galt-tr and unassigned mrz1836 Jun 24, 2026
@galt-tr galt-tr force-pushed the fix/bump-builder-stuck-tx-recovery branch from 55a62ca to 0ca00b8 Compare June 24, 2026 15:43
…axMessageBytes

markMinedAndPublish coalesced a block's MINED fan-out into a single bulk event
carrying every txid. A ~27k-tx block serialized to ~1.85 MB, over the 1 MiB
default Producer.MaxMessageBytes, so PublishBulk failed and the MINED event was
silently dropped for large blocks — the DB status was still MINED, but
SSE/webhook subscribers never saw it.

Split the txid list into <=maxTxIDsPerBulkEvent (5000) chunks, one event each
(~340 KB), comfortably under the limit. Subscribers already fan out per-tx from
TxIDs[], so multiple events per block are additive and harmless.

(WS3a/WS3b from the original PR are dropped — superseded by #207's
expected-STUMP recovery model.)
@galt-tr galt-tr force-pushed the fix/bump-builder-stuck-tx-recovery branch from 0ca00b8 to e0b845e Compare June 24, 2026 15:44
@galt-tr galt-tr changed the title from "fix(bump-builder): recover stuck txs from late/missed STUMPs + chunk large MINED events" to "fix(bump-builder): chunk the bulk MINED fan-out to stay under Kafka MaxMessageBytes" Jun 24, 2026
@galt-tr galt-tr merged commit 3004c38 into main Jun 24, 2026
29 checks passed
@galt-tr galt-tr deleted the fix/bump-builder-stuck-tx-recovery branch June 24, 2026 15:56