YTDB-635: Index ordered match #880
Code Review
This pull request introduces an index-ordered MATCH traversal optimization to improve query performance when results are sorted by a property on a target vertex. Key additions include IndexOrderedEdgeStep for optimized edge traversal, a cost-based heuristic model (IndexOrderedCostModel) to choose between index scans and in-memory sorting, and RidFilteredIndexValuesStep for efficient filtered index scans. The MatchExecutionPlanner was updated to detect these optimization opportunities and suppress or optimize OrderByStep accordingly. Review feedback suggests extracting a helper method for entity loading to reduce code duplication and replacing fragile string-based AST inspection with direct structure checks in the planner.
...ain/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/IndexOrderedEdgeStep.java
...in/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchExecutionPlanner.java
...in/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchExecutionPlanner.java
Test Count Gate Results: ✅ No baseline available yet; gate skipped (first run).
Coverage Gate Results
Thresholds: 85% line, 70% branch
Line Coverage: ✅ 86.5% (688/795 lines)
Branch Coverage: ✅ 70.6% (377/534 branches)
@sandrawar why do we have 301 fewer tests on this branch? That is a huge decrease.
d4507f8 to 0d6c1f8
d048143 to 2fc7b6c
That was an error in the test count; it is fixed by rebasing onto develop.
0d6c1f8 to ca6ccc6
6e4849a to 5f165ff
JMH LDBC Benchmark Comparison
[Result tables not preserved: Single-Thread Results, Multi-Thread Results, Scalability (MT/ST ratio)]
Hi @sandrawar, please profile the regressions using async-profiler on a Hetzner CCX33 node and find out what caused them.
ca6ccc6 to 083ce48
5f165ff to 0aadc7f
…hoosing path moved to execution from planning phase
0aadc7f to 7289036
42135e6 to 393e055
design.md
PR Title:
YTDB-635: Index ordered match
Motivation:
MATCH queries with ORDER BY … LIMIT K on an edge target property currently load all edge targets into memory, sort them, and take the top K. For LDBC queries like IS2 (a Person's recent messages), this means loading all 500 messages to return the latest 20 — dominated by random I/O on records that are immediately discarded.
When a single-field index exists on the ORDER BY property, we can scan the index in sort order and use a bitmap filter (RidSet from the source's LinkBag) to skip non-matching entries at near-zero cost. For the IS2 case this reduces the work from 500 random record loads + in-memory sort to ~20 loads and zero sort. For multi-field ORDER BY (e.g., IC2: creationDate DESC, messageId ASC), the index scan provides primary-key ordering and a bounded heap with early termination handles the secondary sort. This covers queries like IC2, IC7, IC8, IC9, IS3, and IS7.
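The single-field case can be illustrated with a minimal sketch. It is not the code from this PR: `IndexOrderedScanSketch`, `IndexEntry`, and `topK` are hypothetical names, the index is modeled as a list already in ORDER BY order, and a plain `Set<String>` stands in for the RidSet built from the source's LinkBag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class IndexOrderedScanSketch {

    /** Hypothetical index entry: the indexed ORDER BY key and the rid it points to. */
    public static final class IndexEntry {
        public final long key;
        public final String rid;

        public IndexEntry(long key, String rid) {
            this.key = key;
            this.rid = rid;
        }
    }

    /**
     * Walks index entries that are already in ORDER BY order, keeps only rids
     * that belong to the source vertex's edge targets, and stops as soon as
     * `limit` matches are found. No sort is needed because the index order
     * is the result order.
     */
    public static List<String> topK(
            List<IndexEntry> indexInSortOrder, Set<String> sourceRids, int limit) {
        List<String> result = new ArrayList<>();
        for (IndexEntry entry : indexInSortOrder) {
            if (result.size() >= limit) {
                break; // early termination: LIMIT satisfied
            }
            // Cheap membership test; only matching entries would be loaded.
            if (sourceRids.contains(entry.rid)) {
                result.add(entry.rid);
            }
        }
        return result;
    }
}
```

In this sketch only the rids that pass the membership test would ever be loaded as records, which is where the saving over load-all-and-sort comes from.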
The optimization also supports multi-source queries (multiple source vertices) with four execution modes depending on whether the source has a WHERE filter and whether the source alias appears in RETURN. A cost model compares index scan vs load-all-and-sort and falls back transparently when the index scan is not profitable.
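A rough sketch of how such a comparison could look is shown here. It is an assumption-laden illustration, not the PR's `IndexOrderedCostModel`: the class name `CostModelSketch`, the unit costs, and both formulas are hypothetical, chosen only to show the shape of the decision (index scan wins when LIMIT is small relative to the number of edge targets).

```java
public class CostModelSketch {

    // Hypothetical unit costs; none of these values come from the PR.
    static final double RECORD_LOAD_COST = 100.0; // one random record load
    static final double INDEX_ENTRY_COST = 1.0;   // visiting one index entry
    static final double COMPARE_COST = 0.1;       // one comparison while sorting

    /** Cost of loading every edge target and sorting in memory (n loads + n log2 n compares). */
    static double loadAllAndSortCost(long edgeTargets) {
        double n = Math.max(edgeTargets, 1);
        return n * RECORD_LOAD_COST + n * (Math.log(n) / Math.log(2)) * COMPARE_COST;
    }

    /**
     * Cost of scanning the index in order until `limit` matches are found.
     * Assumes matching rids are spread uniformly through the index.
     */
    static double indexScanCost(long edgeTargets, long indexSize, long limit) {
        if (edgeTargets <= 0) {
            return Double.POSITIVE_INFINITY; // nothing to match: never prefer the scan
        }
        double k = Math.min(limit, edgeTargets);
        double selectivity = (double) edgeTargets / Math.max(indexSize, edgeTargets);
        double entriesScanned = Math.min(indexSize, k / selectivity);
        return entriesScanned * INDEX_ENTRY_COST + k * RECORD_LOAD_COST;
    }

    /** Chooses the index scan only when it is estimated to be cheaper. */
    static boolean useIndexScan(long edgeTargets, long indexSize, long limit) {
        return indexScanCost(edgeTargets, indexSize, limit) < loadAllAndSortCost(edgeTargets);
    }
}
```

With these made-up constants, 500 edge targets in an index of 1,000,000 entries favors the index scan at LIMIT 20 but falls back to load-all-and-sort at LIMIT 500.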