Add WeakBoundedBTreeMap and use it for nomination-pools SubPools::with_era#12303

Open
cirko33 wants to merge 2 commits into master from cirko33-weak-bounded-btree-map-subpools-with-era

@cirko33 cirko33 commented Jun 8, 2026

Member

What does this PR do?

Introduces a WeakBoundedBTreeMap storage type in frame-support and uses it
for the nomination-pools SubPools::with_era map to prevent loss of unbonding
data when the bonding duration shrinks at runtime.

WeakBoundedBTreeMap

The map counterpart of the existing WeakBoundedVec. It behaves like a
BoundedBTreeMap except that the bound S is not strictly enforced on
decoding:

  • Decoding a map with more entries than S succeeds and logs a warning,
    instead of failing.
  • All mutating operations (try_insert, try_mutate, ...) still respect the
    bound, so the map is compacted back to within S on the next mutation.
  • It encodes identically to BoundedBTreeMap / BTreeMap, so it is a drop-in,
    storage-compatible replacement requiring no migration.
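
The decode/mutate asymmetry described above can be illustrated with a minimal, std-only sketch. This is not the PR's implementation: `WeakMap`, `from_decoded`, and the const-generic bound are hypothetical stand-ins (the real type takes a bound `S` and plugs into the codec), and the sketch only models "oversized input is accepted" and "inserting a new key past the bound is rejected".

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a weakly bounded map. The bound is a const generic here
/// purely for illustration; the type in the PR is parameterised differently.
#[derive(Debug, Clone, PartialEq)]
pub struct WeakMap<K: Ord, V, const S: usize>(BTreeMap<K, V>);

impl<K: Ord, V, const S: usize> WeakMap<K, V, S> {
    /// Models decoding (hypothetical name): an oversized map is accepted
    /// and only a warning is emitted.
    pub fn from_decoded(inner: BTreeMap<K, V>) -> Self {
        if inner.len() > S {
            eprintln!("warning: decoded {} entries, bound is {}", inner.len(), S);
        }
        Self(inner)
    }

    /// Models a mutating operation: updating an existing key is always fine,
    /// but adding a new key is rejected unless there is room under the bound.
    pub fn try_insert(&mut self, key: K, value: V) -> Result<Option<V>, (K, V)> {
        if self.0.contains_key(&key) || self.0.len() < S {
            Ok(self.0.insert(key, value))
        } else {
            Err((key, value))
        }
    }

    /// Number of entries currently held (may exceed `S` after decoding).
    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```

With a bound of 3, a map built from five decoded entries keeps all five, while a subsequent `try_insert` of a sixth key returns `Err`.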

The bug it fixes

SubPools::with_era is bounded by TotalUnbondingPools, which is
runtime-dynamic: it is derived from T::StakeAdapter::bonding_duration().
When the effective bonding duration shrinks (e.g. when nominator fast unbonding
is enabled), the bound shrinks with it.

With the previous strict BoundedBTreeMap, any pool whose historical
with_era map already held more entries than the new, smaller bound could no
longer be decoded:

  • SubPoolsStorage::get silently returned None ("Corrupted state").
  • The next unbond would unwrap_or_default() and overwrite the entry,
    permanently destroying the pool's per-era unbonding accounting (the
    underlying staking-ledger chunks survived, but the pool could no longer
    attribute them).
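
A rough model of this failure mode, using only the standard library. The names `strict_get` and `unbond` here are simplified stand-ins for the storage read and the pallet call, and each per-era pool is reduced to a single balance; the point is only to show how a failed strict decode followed by `unwrap_or_default()` drops the historical entries.

```rust
use std::collections::BTreeMap;

type Era = u32;
type Balance = u128;

/// Models a strict bounded decode: it fails (returns `None`) as soon as the
/// stored map holds more entries than the current bound.
fn strict_get(stored: &BTreeMap<Era, Balance>, bound: usize) -> Option<BTreeMap<Era, Balance>> {
    (stored.len() <= bound).then(|| stored.clone())
}

/// Models the read-modify-write in `unbond`: read the map, fall back to an
/// empty default if the read failed, add the new era, write the result back.
fn unbond(stored: &mut BTreeMap<Era, Balance>, bound: usize, era: Era, amount: Balance) {
    let mut pools = strict_get(stored, bound).unwrap_or_default();
    *pools.entry(era).or_default() += amount;
    // Overwrites whatever was in storage, including entries that failed to decode.
    *stored = pools;
}
```

Starting from ten stored eras, calling `unbond` with a bound of 32 leaves eleven entries, whereas calling it with a bound of 6 leaves only the newly inserted era.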

The fix

  • with_era now uses WeakBoundedBTreeMap, so the oversized historical map
    decodes intact (no funds lost). The read/remove paths (withdraw_unbonded,
    on_slash, PoolMember::total_balance) operate on it correctly.
  • The one mutation that inserts a new era — unbond — now first calls
    SubPools::merge_oldest_until_fits, which folds the oldest pools into
    no_era until there is room. (maybe_merge_pools only merges by era age
    and does not free a slot when stale entries are far-future-dated.) This makes
    the first post-shrink unbond succeed instead of hitting the defensive
    NotEnoughSpaceInUnbondPool path.
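
As a sketch of the folding step, assuming each per-era pool can be reduced to a single balance (the pallet's pool type carries more than that), a helper in the spirit of `merge_oldest_until_fits` might look like the following. The struct layout and the `bound` parameter are simplifications for illustration, not the code in this PR.

```rust
use std::collections::BTreeMap;

type Era = u32;
type Balance = u128;

/// Simplified sub-pools: one merged bucket plus per-era buckets.
#[derive(Debug, Default, PartialEq)]
struct SubPools {
    no_era: Balance,
    with_era: BTreeMap<Era, Balance>,
}

impl SubPools {
    /// Fold the oldest per-era pools into `no_era` until one more entry
    /// fits under `bound`. Nothing is dropped: balances only move buckets.
    fn merge_oldest_until_fits(&mut self, bound: usize) {
        while self.with_era.len() >= bound {
            match self.with_era.pop_first() {
                // `BTreeMap` is ordered by key, so the first entry is the oldest era.
                Some((_era, balance)) => self.no_era += balance,
                None => break,
            }
        }
    }
}
```

With ten eras of 100 each and a bound of 6, the five oldest eras are folded into `no_era`, leaving five per-era entries and room for the new insert, while the total across both buckets is unchanged.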

The change is loss-less, a no-op under a stable bonding duration, and the
encoded representation is unchanged.

Changes

  • frame-support: new storage::weak_bounded_btree_map::WeakBoundedBTreeMap,
    re-exported from the crate root and pallet_prelude, plus Sealed impl.
  • pallet-nomination-pools: SubPools::with_era switched to
    WeakBoundedBTreeMap; new merge_oldest_until_fits helper called in
    unbond; tests added.

…with_era`

Adds a `WeakBoundedBTreeMap` to `frame-support`, the map counterpart of the
existing `WeakBoundedVec`. Its bound `S` is not strictly enforced on decoding:
a map encoded with more entries than `S` decodes successfully (logging a
warning) instead of failing. All mutating operations still respect the bound,
so the map compacts back to within `S` on the next mutation, and it encodes
identically to `BoundedBTreeMap`/`BTreeMap` (drop-in, storage-compatible).
Uses it for nomination-pools `SubPools::with_era`, which is bounded by the
runtime-dynamic `TotalUnbondingPools` (derived from `bonding_duration()`).
When the effective bonding duration shrinks at runtime, the previous strict
`BoundedBTreeMap` made any oversized historical `with_era` map undecodable:
`SubPoolsStorage::get` silently returned `None` and the next `unbond`
overwrote the entry, destroying per-era unbonding accounting. With the weak
variant the oversized map decodes intact, and `unbond` first calls
`SubPools::merge_oldest_until_fits` to fold the oldest pools into `no_era`
so the insert succeeds instead of hitting `NotEnoughSpaceInUnbondPool`.
Loss-less, no-op under a stable bound, and no storage migration required.
@cirko33 cirko33 requested review from Ank4n and sigurpol June 8, 2026 19:09
@cirko33 cirko33 self-assigned this Jun 8, 2026
@cirko33 cirko33 requested a review from a team as a code owner June 8, 2026 19:09
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/27160712457
Failed job name: fmt

@sigurpol sigurpol requested a review from andreitrand June 9, 2026 07:30
@sigurpol sigurpol added the T2-pallets This PR/Issue is related to a particular pallet. label Jun 9, 2026

Contributor

It might be better to make the weak bounded btree map as a separate PR?

Member Author

I agree, @sigurpol wdyt?

Contributor

Yes, I agree (I have commented here as well), but I wanted to explore an alternative that decouples the bound from the flag so the bound never shrinks, as I mentioned in the original issue's comment. I haven't had time to try it yet to see if it makes sense 😅
TL;DR: have bonding_duration sized to MaxUnbonding = BondingDuration + PostUnbondingPoolsWindow constant instead of the dynamic TotalUnbondingPools, so the bound doesn't shrink on the flip.
This would be a quick win for 2.3.1 before flipping the flag. Then, in parallel, we can introduce the weak bounded map so we can live happily even if governance lowers BondingDuration.

Contributor

If we want to have 2.3.1 very soon to unlock the nominator flip, maybe it's OK to deploy the static max for bonding duration and then have the weak map in a follow-up runtime upgrade.

@Ank4n Ank4n Jun 9, 2026

Contributor

> TL;DR have bonding_duration sized to MaxUnbonding = BondingDuration + PostUnbondingPoolsWindow constant instead of the dynamic TotalUnbondingPools. So the bound doesn't shrink on the flip.
> This would be a quick win for 2.3.1 before flipping the flag. Then in parallel we can introduce the weak bounded map to live happy vs governance lowering BondingDuration.

I think this is a great idea, and probably better even in the long term: decouple MaxUnbonding from BondingDuration.
Concretely: introduce a fixed MaxUnbondingPools constant (e.g. 32 = current 28 + 4) and use it in both places that currently derive from bonding_duration:

  1. The with_era bound (TotalUnbondingPools).
  2. The merge window in maybe_merge_pools. Today it merges at current_era - PostUnbondingPoolsWindow, which collapses post-flip: with bonding_duration dropping to 2, members' clean per-era pools get merged into no_era in only 6 eras compared to 32 before, and any later withdrawal dissolves against the skewed no_era ratio (plus potential slashes from other eras). A decoupled MaxUnbonding keeps pools on their correct per-era ratio for ~32 eras regardless of bonding duration.

This should also get rid of PostUnbondingPoolsWindow.
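
To make the 6-versus-32 figure concrete, here is a small arithmetic sketch. It assumes a per-era pool is keyed by its unlock era (`unbond` era plus bonding duration) and becomes eligible for merging once `current_era - window` reaches that key, which is one reading of the description above. `merge_era_fixed` is a hypothetical illustration of the decoupled variant, not an existing function.

```rust
type Era = u32;

/// Era at which a per-era pool becomes eligible for merging into `no_era`
/// when the merge cutoff is `current_era - window` (assumed current rule).
fn merge_era_dynamic(unbond_call_era: Era, bonding_duration: Era, window: Era) -> Era {
    // The per-era pool is keyed by the era in which its funds unlock.
    let unlock_era = unbond_call_era + bonding_duration;
    // It is merged once the cutoff catches up with that key.
    unlock_era + window
}

/// Hypothetical decoupled rule: pools stay unmerged for a fixed number of
/// eras after the `unbond` call, independent of the bonding duration.
fn merge_era_fixed(unbond_call_era: Era, max_unbonding_pools: Era) -> Era {
    unbond_call_era + max_unbonding_pools
}
```

For an `unbond` at era 100 with a window of 4, the dynamic rule gives era 132 with a bonding duration of 28 and era 106 with a bonding duration of 2; the fixed rule with 32 gives era 132 in both cases.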

Contributor

The runtime level ugly patch would be
PostUnbondingPoolsWindow = if AreNominatorsSlashable { 4 } else { 30 }

Will get same behaviour as above.
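
A quick check of why this patch would match the fixed-constant proposal, as plain functions rather than runtime configuration. The 28 and 2 bonding durations are taken from the numbers quoted earlier in this thread and are assumptions about the runtime in question; the function names are illustrative only.

```rust
/// The patched window from the comment above, as a plain function.
fn post_unbonding_pools_window(nominators_slashable: bool) -> u32 {
    if nominators_slashable { 4 } else { 30 }
}

/// Assumed bonding durations (from the thread): 28 eras while nominators
/// are slashable, 2 eras after the flip.
fn bonding_duration(nominators_slashable: bool) -> u32 {
    if nominators_slashable { 28 } else { 2 }
}

/// The dynamic bound: bonding duration plus the post-unbonding window.
fn total_unbonding_pools(nominators_slashable: bool) -> u32 {
    bonding_duration(nominators_slashable) + post_unbonding_pools_window(nominators_slashable)
}
```

Under these assumptions the sum is 32 on both sides of the flip, so the `with_era` bound would not shrink.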

@sigurpol sigurpol Jun 9, 2026

Contributor

Some extra notes:

Contributor

True about PostUnbondingPoolsWindow being a breaking change. On second thought, PostUnbondingPoolsWindow and MaxUnbondingPools are basically the same, so we can just keep the old name. Wdyt @sigurpol @cirko33?

@cirko33 cirko33 Jun 10, 2026

Member Author

> True about PostUnbondingPoolsWindow being a breaking change. On second thought, PostUnbondingPoolsWindow and MaxUnbondingPools are basically same, so we can just keep the old name. Wdyt @sigurpol @cirko33 ?

I think we can do the following:

  • For now, upgrade only the logic and numbers, with comments explaining why the name didn't change
  • Make a separate PR renaming the parameter for a future release

Contributor

The name can stay I think. PostUnbondingPoolsWindow kinda indicates the same, the excess post bonding period that we keep the pools unmerged.

@@ -0,0 +1,424 @@
// This file is part of Substrate.

Contributor

We should probably define WeakBoundedBTreeMap in the bounded-collections crate, similarly to WeakBoundedVec and its other siblings (extend the crate and re-export here):

bounded::{BoundedBTreeMap, BoundedBTreeSet, BoundedSlice, BoundedVec, WeakBoundedVec},
for sp-runtime

TIL we have also https://github.com/paritytech/bounded-collections-next

- pub with_era: BoundedBTreeMap<EraIndex, UnbondPool<T>, TotalUnbondingPools<T>>,
+ /// [`WeakBoundedBTreeMap`] allows decoding if `TotalUnbondingPools` shrinks, preventing loss
+ /// of unbonding data.
+ pub with_era: WeakBoundedBTreeMap<EraIndex, UnbondPool<T>, TotalUnbondingPools<T>>,

Member

This won't have a MaxEncodedLen anymore if we use a Weak type, so the PoV benchmarking can no longer get the worst case.

Generally we only use Weak types as an intermediary solution, not as a permanent one. Not sure if there could be a permanent migration that automatically migrates these if the length is too long.
Depends on the effort to do that.

Contributor

I believe we can live w/o weak types here as per thread #12303 (comment)

Labels

T2-pallets This PR/Issue is related to a particular pallet.

4 participants