Statement store index optimization, stage 2a by s0me0ne-unkn0wn · Pull Request #12304 · paritytech/polkadot-sdk

s0me0ne-unkn0wn · 2026-06-08T19:33:47Z

This PR implements stage 2a of the optimizations proposed in #10910 (Move Read Index to Disk with LRU Cache).

As a notable difference from the original issue, the evicted index has also been moved to disk (AFAIR, it used to consume ~150 MiB of RAM with 4M statements).

alexggh · 2026-06-09T14:21:21Z

It would be interesting to have some performance benchmarks here, to better understand the penalty of keeping these indexes in the database vs in memory.

s0me0ne-unkn0wn · 2026-06-09T19:30:02Z

@alexggh This is just the "cost" side benchmark (without the "benefit" side — requires some memory measurement harness)

| Benchmark | RAM index | Disk index | Slowdown |
|---|---|---|---|
| **Reads: point** | | | |
| `has_statement` (1k/10k/50k) | ~19 ns | ~117 ns | ~6× (flat across size) |
| `statement_lookup` (by hash) | 6.83 ms | 8.55 ms | 1.25× |
| **Reads: topic queries** | | | |
| `cache/hit` | 2.00 µs | 10.4 µs | ~5× |
| `cache/miss` | 3.29 µs | 19.4 µs | ~6× |
| `read_scaling/broadcasts` 1k | 31.9 µs | 188 µs | 5.9× |
| `read_scaling/broadcasts` 10k | 344 µs | 2.73 ms | 7.9× |
| `read_scaling/broadcasts` 50k | 1.93 ms | 18.9 ms | 9.8× |
| `broadcasts` (1k store, 640 concurrent) | 16.4 ms | 227 ms | ~14× |
| `posted` | 16.5 ms | 25.6 ms | 1.6× |
| `statements_all` | 166 ms | 257 ms | 1.55× |
| **Writes** | | | |
| `submit` | 10.7 ms | 21.7 ms | ~2× |
| `remove` | 10.6 ms | 22.4 ms | 2.1× |
| `maintain` | 9.8 ms | 19.4 ms | 2.0× |
| **Mixed / concurrency** | | | |
| `mixed_workload` | 119 ms | 861 ms | 7.2× |
| `contention_read_under_write` | 47.5 ms | 439 ms | 9.2× |
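
For reference, the slowdown column is the disk-index time divided by the RAM-index time. A minimal sketch that recomputes it for the three `read_scaling/broadcasts` rows, with timings copied from the table and converted to µs (the names `slowdown`, `BROADCAST_ROWS` and `broadcast_slowdowns` are illustrative, not part of the PR):

```rust
/// Ratio of disk-index time to RAM-index time (both in the same unit).
fn slowdown(ram: f64, disk: f64) -> f64 {
    disk / ram
}

/// `read_scaling/broadcasts` rows from the table: (store size, RAM µs, disk µs).
const BROADCAST_ROWS: [(u32, f64, f64); 3] = [
    (1_000, 31.9, 188.0),
    (10_000, 344.0, 2_730.0),
    (50_000, 1_930.0, 18_900.0),
];

/// Slowdown per store size, rounded to one decimal place.
fn broadcast_slowdowns() -> Vec<(u32, f64)> {
    BROADCAST_ROWS
        .iter()
        .map(|&(size, ram, disk)| (size, (slowdown(ram, disk) * 10.0).round() / 10.0))
        .collect()
}
```

The ratio grows with store size, which is the trend the analysis that follows refers to.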

Some analysis by our common friend follows:

What it means

- Writes ~2×: expected, since index mutations are folded into the same commit, plus the extra columns. Moderate.
- Reads 5–14×, and worse as the store grows (`broadcasts`: 6×→10× as size goes 1k→50k). At 50k a topic query goes from 1.9 ms to 18.9 ms.
- Why `broadcasts` is 14× but `posted` is only 1.6×: both are MatchAll, but `broadcasts(&[t0, t1])` intersects three sets. It materializes the smallest (a topic set, ~100) and then does a disk `get_size` membership probe into each of the other sets (including the large `dec_key = None` set) for every candidate: ~200 disk probes per query × 640 concurrent. `posted(&[], key)` has no topics, hence no membership probes, just body fetches. So the dominant read cost is O(smallest_set × number_of_other_sets) disk probes in MatchAll.
- Even a cache hit is ~5× slower than RAM (10 µs vs 2 µs): on a hit for the topic set, the membership check against the `dec_key` set still goes to disk.
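
To make the probe count concrete, here is a rough model of the MatchAll intersection described above. It is only a sketch: `DiskSet` and `match_all` are hypothetical stand-ins, not the PR's types, and a counter takes the place of real `get_size` calls.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an on-disk index set.
struct DiskSet {
    members: HashSet<u32>,
}

impl DiskSet {
    /// Membership check that counts one simulated disk probe per call.
    fn contains(&self, id: u32, probes: &mut usize) -> bool {
        *probes += 1;
        self.members.contains(&id)
    }
}

/// Intersect all sets: materialize the smallest one, then probe every
/// other set for each candidate. Returns the sorted matches and the
/// number of probes issued.
fn match_all(sets: &[DiskSet]) -> (Vec<u32>, usize) {
    let mut probes = 0;
    let Some(smallest) = (0..sets.len()).min_by_key(|&i| sets[i].members.len()) else {
        return (Vec::new(), 0);
    };
    let mut out = Vec::new();
    for &id in &sets[smallest].members {
        let mut in_all = true;
        for (i, set) in sets.iter().enumerate() {
            // Stop probing as soon as one set rejects the candidate.
            if i != smallest && !set.contains(id, &mut probes) {
                in_all = false;
                break;
            }
        }
        if in_all {
            out.push(id);
        }
    }
    out.sort_unstable();
    (out, probes)
}
```

With a smallest set of 100 candidates that all appear in the two other sets, this issues 100 × 2 = 200 probes, and 640 such queries issue 128,000. When candidates are rejected early the count is lower, so smallest_set × number_of_other_sets is an upper bound.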

Actionable (surfaced by the benchmark)

On a cache hit for the "other" sets in iterate_with_match_all, use the cached set (in-RAM contains) instead of a db.get_size per candidate. Only the smallest set is cached today; caching and intersecting the other sets in RAM on a hit should bring broadcasts / cache-hit reads close to stage 1. This is a direct read-path optimization (and matters for stage 2b too).
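
A minimal sketch of that idea, assuming the cache can hand out a whole set by some identifier. `SetId`, `StmtId`, `Disk` and `contains_cached_first` are hypothetical names for illustration; the real `iterate_with_match_all` and cache types in the PR may look quite different.

```rust
use std::collections::{HashMap, HashSet};

type SetId = u32; // hypothetical identifier for a topic / dec_key set
type StmtId = u64; // hypothetical stand-in for a statement hash

/// Stand-in for the on-disk index; counts how many probes reach "disk".
struct Disk {
    sets: HashMap<SetId, HashSet<StmtId>>,
    probes: usize,
}

impl Disk {
    fn contains(&mut self, set: SetId, id: StmtId) -> bool {
        self.probes += 1;
        self.sets.get(&set).map_or(false, |s| s.contains(&id))
    }
}

/// Membership check that prefers a cached copy of the set and only
/// falls back to a disk probe when the set is not cached.
fn contains_cached_first(
    cache: &HashMap<SetId, HashSet<StmtId>>,
    disk: &mut Disk,
    set: SetId,
    id: StmtId,
) -> bool {
    match cache.get(&set) {
        Some(cached) => cached.contains(&id), // in-RAM lookup, no disk probe
        None => disk.contains(set, id),
    }
}
```

Whether caching the large `dec_key = None` set is affordable is a separate question that this sketch does not address.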

Caveats

- Reduced sampling (`--sample-size 10`) → directional, not certified; the multi-threaded benchmarks are noisy.
- Local temp DB with a hot OS page cache → on real disk the absolutes are higher, but the ratios are indicative.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cla-bot-2021 · 2026-06-09T19:33:02Z

User @claude, please sign the CLA here.

alexggh · 2026-06-10T07:47:40Z

> @alexggh This is just the "cost" side benchmark (without the "benefit" side — requires some memory measurement harness)

We do need the benefit side as well: both an analysis of how much we save and some benchmark to try to prove it.

alexggh · 2026-06-10T07:52:56Z

+		let key_set = IndexSet::DecKey(key);
+		for topic in topics {
+			let prefix = topic[..].to_vec();
+			let mut iter =


Shouldn't this use the cache index somehow?

alexggh · 2026-06-10T07:59:58Z

 		let mut result = Vec::new();
-		let query_index = self.query_index.read();
-		self.collect_statements_locked(key, topic_filter, &query_index, &mut result, f)?;
+		let mut query_index = self.query_index.lock();


This seems like a big problem, because now you can't run any queries in parallel?

s0me0ne-unkn0wn · 2026-06-10

I'm ready to discuss ways around that. The problem is that the LRU cache needs to update recency on a hit, so it naturally needs `&mut self`. Interior mutability may be the way, but let me know if you have other ideas.
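
One possible shape for the interior-mutability route, as a sketch only. It is an approximate LRU where a hit takes just the read lock and bumps an atomic timestamp, so concurrent readers do not serialize. `SharedLru` and its methods are hypothetical; it uses `std::sync::RwLock` to stay self-contained, which may not be the lock type the store uses, and eviction is a linear scan, which a real cache would want to avoid.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

struct Entry<V> {
    value: V,
    last_used: AtomicU64, // logical timestamp, updated through `&self`
}

/// Approximate LRU: hits take only the read lock and bump an atomic
/// timestamp; inserts take the write lock and evict the stalest entry.
pub struct SharedLru<V> {
    capacity: usize,
    clock: AtomicU64,
    map: RwLock<HashMap<u64, Entry<V>>>,
}

impl<V: Clone> SharedLru<V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { capacity, clock: AtomicU64::new(0), map: RwLock::new(HashMap::new()) }
    }

    fn tick(&self) -> u64 {
        self.clock.fetch_add(1, Ordering::Relaxed)
    }

    /// Lookup through `&self`; concurrent readers do not block each other.
    pub fn get(&self, key: u64) -> Option<V> {
        let map = self.map.read().unwrap();
        let entry = map.get(&key)?;
        entry.last_used.store(self.tick(), Ordering::Relaxed);
        Some(entry.value.clone())
    }

    /// Insert, evicting the least recently used entry when full (O(n) scan).
    pub fn insert(&self, key: u64, value: V) {
        let mut map = self.map.write().unwrap();
        if !map.contains_key(&key) && map.len() >= self.capacity {
            let victim = map
                .iter()
                .min_by_key(|(_, e)| e.last_used.load(Ordering::Relaxed))
                .map(|(k, _)| *k);
            if let Some(k) = victim {
                map.remove(&k);
            }
        }
        let stamp = self.tick();
        map.insert(key, Entry { value, last_used: AtomicU64::new(stamp) });
    }
}
```

The trade-off is that recency is only approximate under contention (relaxed atomics, no ordered list), in exchange for queries no longer needing an exclusive lock on the whole index.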

alexggh · 2026-06-10T08:01:47Z

@@ -1261,11 +1532,25 @@ impl StatementStore for Store {
 	}

 	fn has_statement(&self, hash: &Hash) -> bool {


This is on the hot path of everything. I don't expect we can go to disk here too often; I would have expected to hit an LRU cache more often than not.

s0me0ne-unkn0wn added 2 commits June 8, 2026 21:25

Statement store index optimization, stage 2a

d356070

Create pr_12304.prdoc

f44c41e

s0me0ne-unkn0wn added the T0-node This PR/Issue is related to the topic “node”. label Jun 8, 2026

s0me0ne-unkn0wn requested review from AndreiEres, DenzelPenzel and alexggh June 8, 2026 19:37

Improve benchmarks

0031d30

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexggh reviewed Jun 10, 2026

s0me0ne-unkn0wn wants to merge 3 commits into master from s0me0ne/statement-store-read-index-on-disk

2 participants
