Statement store index optimization, stage 2a #12304
It would be interesting to have some performance benchmarks here, to better understand the penalty for keeping these indexes in memory vs. in the database.
@alexggh This is just the "cost" side benchmark (without the "benefit" side — requires some memory measurement harness)
Some analysis by our common friend follows:
What it means
Actionable (surfaced by the benchmark)
On a cache hit for the "other" sets in
Caveats
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
We do need the benefit side as well: both an analysis of how much we save and some benchmark to try to prove it.
let key_set = IndexSet::DecKey(key);
for topic in topics {
    let prefix = topic[..].to_vec();
    let mut iter =
Shouldn't this use the cache index somehow?
let mut result = Vec::new();
let query_index = self.query_index.read();
self.collect_statements_locked(key, topic_filter, &query_index, &mut result, f)?;
let mut query_index = self.query_index.lock();
This seems like a big problem, because now you can't run any queries in parallel?
I'm ready to discuss ways around that. The problem is that the LRU cache needs to update recency on every hit, so it naturally needs `&mut self`. Interior mutability may be the way, but let me know if you have other ideas.
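One possible shape for the interior-mutability idea, as a minimal sketch rather than what this PR does: split the cache into independently locked shards so that a hit only takes a short per-shard lock, and expose `get` through `&self`. The names `ShardedLru` and `LruShard` are hypothetical, the sketch uses only the standard library, and eviction is a naive O(n) scan over a recency counter to keep it short.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// One shard: a small LRU keyed by a monotonically increasing tick.
/// (Hypothetical helper type, not taken from the PR.)
struct LruShard<K, V> {
    capacity: usize,
    tick: u64,
    entries: HashMap<K, (V, u64)>,
}

impl<K: Hash + Eq + Clone, V: Clone> LruShard<K, V> {
    /// Returns a clone of the value and refreshes its recency.
    fn get(&mut self, key: &K) -> Option<V> {
        self.tick += 1;
        let tick = self.tick;
        self.entries.get_mut(key).map(|(value, last_used)| {
            *last_used = tick;
            value.clone()
        })
    }

    /// Inserts a value, evicting the least recently used entry if full.
    fn put(&mut self, key: K, value: V) {
        self.tick += 1;
        if !self.entries.contains_key(&key) && self.entries.len() >= self.capacity {
            // Naive O(n) scan for the oldest entry; fine for a sketch.
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (_, last_used))| *last_used)
                .map(|(k, _)| k.clone());
            if let Some(oldest) = oldest {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key, (value, self.tick));
    }
}

/// LRU cache usable through `&self`: each shard has its own mutex,
/// so lookups for keys in different shards do not block each other.
pub struct ShardedLru<K, V> {
    shards: Vec<Mutex<LruShard<K, V>>>,
}

impl<K: Hash + Eq + Clone, V: Clone> ShardedLru<K, V> {
    pub fn new(shards: usize, capacity_per_shard: usize) -> Self {
        let shards = (0..shards.max(1))
            .map(|_| {
                Mutex::new(LruShard {
                    capacity: capacity_per_shard.max(1),
                    tick: 0,
                    entries: HashMap::new(),
                })
            })
            .collect();
        Self { shards }
    }

    /// Picks the shard responsible for `key` by hashing it.
    fn shard(&self, key: &K) -> &Mutex<LruShard<K, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        &self.shards[(hasher.finish() as usize) % self.shards.len()]
    }

    pub fn get(&self, key: &K) -> Option<V> {
        self.shard(key).lock().unwrap().get(key)
    }

    pub fn put(&self, key: K, value: V) {
        let shard = self.shard(&key);
        shard.lock().unwrap().put(key, value);
    }
}
```

With this shape the outer index could stay behind a shared read lock while only the recency bookkeeping is serialized, and only per shard. Whether that is enough depends on how long the disk fallback runs while a lock is held, which this sketch does not model.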
@@ -1261,11 +1532,25 @@ impl StatementStore for Store {
}

fn has_statement(&self, hash: &Hash) -> bool {
This is on the hot path of everything. I don't expect we can afford to go to disk here too often; I would have expected this to hit an LRU cache more often than not.
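To make the concern concrete, here is a rough sketch of a membership check that consults a small in-memory set of recently confirmed hashes before falling back to disk. Everything in it is an assumption for illustration: `CachedLookup`, `RecentHashes`, and the `disk_contains` closure are hypothetical stand-ins, `StatementHash` stands in for the store's `Hash` type, and eviction is plain FIFO rather than true LRU. Only positive results are cached, since a cached "not found" would go stale as soon as the statement is submitted.

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::Mutex;

/// Stand-in for the store's statement hash type (assumed to be 32 bytes).
pub type StatementHash = [u8; 32];

/// Bounded set of hashes recently confirmed to exist (FIFO eviction).
struct RecentHashes {
    capacity: usize,
    order: VecDeque<StatementHash>,
    present: HashSet<StatementHash>,
}

impl RecentHashes {
    fn remember(&mut self, hash: StatementHash) {
        if !self.present.insert(hash) {
            return; // already tracked
        }
        self.order.push_back(hash);
        if self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.present.remove(&oldest);
            }
        }
    }

    fn forget(&mut self, hash: &StatementHash) {
        if self.present.remove(hash) {
            self.order.retain(|h| h != hash);
        }
    }
}

/// Wraps a (hypothetical) disk lookup with an in-memory positive cache.
pub struct CachedLookup<F> {
    disk_contains: F,
    recent: Mutex<RecentHashes>,
}

impl<F: Fn(&StatementHash) -> bool> CachedLookup<F> {
    pub fn new(capacity: usize, disk_contains: F) -> Self {
        Self {
            disk_contains,
            recent: Mutex::new(RecentHashes {
                capacity: capacity.max(1),
                order: VecDeque::new(),
                present: HashSet::new(),
            }),
        }
    }

    /// Fast path: answer from memory. Slow path: ask the disk index
    /// and remember a positive answer for next time.
    pub fn has_statement(&self, hash: &StatementHash) -> bool {
        if self.recent.lock().unwrap().present.contains(hash) {
            return true;
        }
        let found = (self.disk_contains)(hash);
        if found {
            self.recent.lock().unwrap().remember(*hash);
        }
        found
    }

    /// Must be called when a statement is removed or expires,
    /// otherwise the cache would keep reporting it as present.
    pub fn forget(&self, hash: &StatementHash) {
        self.recent.lock().unwrap().forget(hash);
    }
}
```

Repeated checks for unknown hashes still reach the disk on every call in this sketch; if those dominate the hot path, something like a Bloom filter in front of the disk lookup might be worth measuring.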
This PR implements stage 2a of the optimizations proposed in #10910 (Move Read Index to Disk with LRU Cache).
As a notable difference from the original issue, the `evicted` index has also been moved to disk (AFAIR, it used to consume ~150 MiB of RAM with 4M statements).