YTDB-604: Lazy RID-only iteration for MATCH traversal steps#863

Open
sandrawar wants to merge 3 commits into develop from lazy-result-match-traversal

@sandrawar
Collaborator

PR Title:

YTDB-604: Lazy RID-only iteration for MATCH traversal steps

Motivation:

Async-profiler data from LDBC IC5 (128K+ traversals) showed that the MATCH engine loads every intermediate vertex from storage (loadEntity()) even when only the RID is needed for traversal to the next hop. This causes unnecessary disk I/O, deserialization (EntityImpl.deserializeProperties() — 1.45% CPU), and GC pressure from short-lived ResultInternal objects wrapping full entities.

Most intermediate MATCH steps only need the RID — properties are only read at the final projection (RETURN post.title). By deferring loadEntity() to first property access, we skip I/O entirely for vertices that are just traversal waypoints or get rejected by downstream WHERE filters.
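A minimal Java sketch of deferring the load until the first property access. `LazyResultSketch`, `Rid`, `LazyResult` and the `loader` function are hypothetical stand-ins, not YouTrackDB's `ResultInternal` or `loadEntity()`; the sketch only assumes that a result can be built from a RID plus some way to load the entity later.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-ins for illustration; not YouTrackDB classes. */
final class LazyResultSketch {
  /** A record id: cluster id plus position within the cluster. */
  record Rid(int clusterId, long position) {}

  /** Holds only a RID until a property is read, then loads the entity exactly once. */
  static final class LazyResult {
    private final Rid rid;
    // Stands in for a storage read plus deserialization (the role loadEntity() plays).
    private final Function<Rid, Map<String, Object>> loader;
    private Map<String, Object> properties; // null until the first getProperty() call

    LazyResult(Rid rid, Function<Rid, Map<String, Object>> loader) {
      this.rid = rid;
      this.loader = loader;
    }

    /** Available immediately; never calls the loader. */
    Rid getIdentity() {
      return rid;
    }

    /** The first call triggers the load; later calls reuse the cached properties. */
    Object getProperty(String name) {
      if (properties == null) {
        properties = loader.apply(rid);
      }
      return properties.get(name);
    }
  }

  public static void main(String[] args) {
    LazyResult result = new LazyResult(new Rid(12, 34), rid -> {
      System.out.println("loading " + rid);
      return Map.of("title", "Hello");
    });
    System.out.println(result.getIdentity());        // no load yet
    System.out.println(result.getProperty("title")); // triggers the single load
  }
}
```

A vertex that is only a traversal waypoint never has `getProperty()` called on it, so in this sketch its loader never runs.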

The fix adds ridIterator() to VertexFromLinkBagIterable, which yields bare RecordId objects from the LinkBag without touching storage. MatchEdgeTraverser.toExecutionStream() uses this path for VertexFromLinkBagIterable results. ResultInternal's existing lazy loading handles the rest — getIdentity() returns the RID immediately, getProperty() triggers loadEntity() on first access.
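The difference between the eager and the RID-only iteration paths can be sketched as follows, assuming the link bag behaves like an in-memory list of RIDs. `VertexIterable` only mimics the shape of `VertexFromLinkBagIterable`; its fields, constructor and loader are hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-ins for illustration; not YouTrackDB classes. */
final class RidIteratorSketch {
  record Rid(int clusterId, long position) {}

  /** Mimics the two iteration paths of a vertex iterable backed by a link bag. */
  static final class VertexIterable<V> implements Iterable<V> {
    private final List<Rid> linkBag;       // neighbour RIDs as stored in the link bag
    private final Function<Rid, V> loader; // stands in for a storage read + deserialization

    VertexIterable(List<Rid> linkBag, Function<Rid, V> loader) {
      this.linkBag = List.copyOf(linkBag); // immutable copy: iterators cannot modify it
      this.loader = loader;
    }

    /** Eager path: every element is loaded before it is handed out. */
    @Override
    public Iterator<V> iterator() {
      Iterator<Rid> rids = linkBag.iterator();
      return new Iterator<>() {
        @Override
        public boolean hasNext() {
          return rids.hasNext();
        }

        @Override
        public V next() {
          return loader.apply(rids.next());
        }
      };
    }

    /** RID-only path: hands out the stored RIDs and never calls the loader. */
    Iterator<Rid> ridIterator() {
      return linkBag.iterator();
    }
  }

  public static void main(String[] args) {
    List<Rid> bag = List.of(new Rid(10, 1), new Rid(10, 2));
    VertexIterable<String> vertices = new VertexIterable<>(bag, rid -> "loaded " + rid);
    vertices.ridIterator().forEachRemaining(System.out::println); // RIDs only, no loads
    vertices.forEach(System.out::println);                        // loads each vertex
  }
}
```

Keeping `iterator()` unchanged and adding a separate method mirrors the PR's point that non-MATCH consumers still receive loaded vertices.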

Class and RID pre-filters are preserved (both operate on the RID, so no I/O is needed). No behavioral change for non-MATCH consumers: iterator() still returns loaded Vertex objects.
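A sketch of why both pre-filters can run without I/O, under two assumptions: a class filter can be answered from the RID's cluster id (the set of cluster ids that belong to the class), and a RID filter is a plain equality check. `RidPreFilterSketch`, `filtered` and `toList` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical sketch; not the YouTrackDB implementation. */
final class RidPreFilterSketch {
  record Rid(int clusterId, long position) {}

  /**
   * Wraps a RID iterator and skips RIDs that fail either pre-filter.
   * A null filter argument means "no filter". Both checks read only the RID.
   */
  static Iterator<Rid> filtered(Iterator<Rid> source, Set<Integer> classClusterIds, Rid requiredRid) {
    return new Iterator<>() {
      private Rid nextRid; // next RID that passed both filters, or null if not found yet

      @Override
      public boolean hasNext() {
        while (nextRid == null && source.hasNext()) {
          Rid candidate = source.next();
          boolean classOk = classClusterIds == null || classClusterIds.contains(candidate.clusterId());
          boolean ridOk = requiredRid == null || requiredRid.equals(candidate);
          if (classOk && ridOk) {
            nextRid = candidate;
          }
        }
        return nextRid != null;
      }

      @Override
      public Rid next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        Rid result = nextRid;
        nextRid = null;
        return result;
      }
    };
  }

  static List<Rid> toList(Iterator<Rid> it) {
    List<Rid> out = new ArrayList<>();
    it.forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) {
    List<Rid> bag = List.of(new Rid(10, 1), new Rid(20, 2), new Rid(11, 3));
    // Keep only RIDs whose cluster belongs to the target class (clusters 10 and 11 here).
    System.out.println(toList(filtered(bag.iterator(), Set.of(10, 11), null)));
  }
}
```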

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a crucial performance optimization for the database's MATCH engine. By implementing a lazy loading strategy for intermediate traversal steps, the system now avoids materializing full Result objects when only Record IDs are needed. This change drastically cuts down on disk I/O, CPU cycles spent on deserialization, and memory overhead from short-lived objects, leading to more efficient execution of complex graph traversals, especially in scenarios with large datasets.

Highlights

  • Performance Enhancement: Introduced lazy RID-only iteration for the MATCH engine's traversal steps, significantly reducing unnecessary disk I/O, deserialization, and garbage collection pressure by deferring entity loading.
  • New ridIterator(): Added a ridIterator() method to VertexFromLinkBagIterable that directly yields RecordId objects from the LinkBag, applying filters without loading full entities from storage.
  • MatchEdgeTraverser Integration: Updated MatchEdgeTraverser.toExecutionStream() to leverage the new ridIterator() for VertexFromLinkBagIterable, ensuring that intermediate MATCH steps benefit from the lazy loading mechanism.
  • Non-Disruptive Change: Confirmed that existing class and RID pre-filters remain effective and I/O-free, and that the change does not affect non-MATCH consumers, who continue to receive fully loaded Vertex objects.

@sandrawar changed the title from "YTDB-604: MATCH engine materializes full Result objects for intermedi…" to "YTDB-604: Lazy RID-only iteration for MATCH traversal steps" on Mar 25, 2026
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant optimization for the MATCH query engine by implementing a "RID-only path" for VertexFromLinkBagIterable. A new ridIterator() method is added to VertexFromLinkBagIterable which allows iterating over RecordId objects directly from a LinkBag without loading the full entities, applying class and RID filters. The MatchEdgeTraverser.toExecutionStream method is updated to utilize this new iterator, enabling lazy loading of entities in MATCH traversals. Comprehensive unit tests have been added to validate the functionality and lazy-loading behavior of the new ridIterator() and its integration with the MATCH execution stream. I have no feedback to provide as there were no review comments.

@github-actions

github-actions bot commented Mar 25, 2026

Test Count Gate Results

Tolerance: 5% drop allowed per module

Overall: ✅ 18060 tests (baseline: 18048, +12)

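| Module | Baseline | Current | Change | Status |
|---|---|---|---|---|
| core | 7503 | 7515 | +12 | |
| docker-tests | 1891 | 1891 | +0 | |
| embedded | 1931 | 1931 | +0 | |
| examples | 3 | 3 | +0 | |
| gremlin-annotations | 30 | 30 | +0 | |
| jmh-ldbc | 39 | 39 | +0 | |
| server | 5504 | 5504 | +0 | |
| tests | 1147 | 1147 | +0 | |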
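A sketch of the per-module check, assuming the gate passes a module when its current test count is at least 95% of its baseline; the exact rule used by the CI workflow is not shown in this report. `TestCountGateSketch` is a hypothetical name, and integer arithmetic avoids floating-point edge cases at exactly 5%.

```java
/** Hypothetical sketch of a per-module test-count gate; not the actual CI script. */
final class TestCountGateSketch {
  static final int TOLERANCE_PERCENT = 5; // allowed drop per module

  /** True when current >= baseline * (100 - TOLERANCE_PERCENT) / 100, in integer arithmetic. */
  static boolean passes(int baseline, int current) {
    return current * 100L >= baseline * (100L - TOLERANCE_PERCENT);
  }

  public static void main(String[] args) {
    System.out.println(passes(7503, 7515)); // true: core gained 12 tests
    System.out.println(passes(1891, 1796)); // false: 95 fewer tests is a drop of about 5.02%
  }
}
```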

@github-actions

github-actions bot commented Mar 25, 2026

Coverage Gate Results

Thresholds: 85% line, 70% branch

Line Coverage: ✅ 100.0% (32/32 lines)

| File | Coverage | Uncovered Lines |
|---|---|---|
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexFromLinkBagIterable.java | ✅ 100.0% (23/23) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchEdgeTraverser.java | ✅ 100.0% (7/7) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchFieldTraverser.java | ✅ 100.0% (1/1) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchReverseEdgeTraverser.java | ✅ 100.0% (1/1) | - |

Branch Coverage: ✅ 100.0% (20/20 branches)

| File | Coverage | Lines with Uncovered Branches |
|---|---|---|
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexFromLinkBagIterable.java | ✅ 100.0% (16/16) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchEdgeTraverser.java | ✅ 100.0% (4/4) | - |
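The two coverage thresholds can be sketched as a single check, assuming each percentage is simply covered / total and that a change with no branches passes trivially (an assumption, not something this report states). `CoverageGateSketch` and its methods are hypothetical.

```java
/** Hypothetical sketch of the coverage thresholds; not the actual CI script. */
final class CoverageGateSketch {
  static final int MIN_LINE_PERCENT = 85;
  static final int MIN_BRANCH_PERCENT = 70;

  /** covered/total >= minPercent/100, compared in integer arithmetic; zero items counts as a pass. */
  static boolean meets(int covered, int total, int minPercent) {
    return total == 0 || covered * 100L >= (long) total * minPercent;
  }

  static boolean passes(int coveredLines, int totalLines, int coveredBranches, int totalBranches) {
    return meets(coveredLines, totalLines, MIN_LINE_PERCENT)
        && meets(coveredBranches, totalBranches, MIN_BRANCH_PERCENT);
  }

  public static void main(String[] args) {
    System.out.println(passes(32, 32, 20, 20)); // true: the totals reported above
  }
}
```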

@sandrawar force-pushed the lazy-result-match-traversal branch 3 times, most recently from 3039c3f to ce57cae on April 2, 2026 06:54
@github-actions

github-actions bot commented Apr 3, 2026

JMH LDBC Benchmark Comparison

Base: 244d8c7275 (fork-point with develop) | Head: 2587baaffa
Summary: 🔴 4 regression(s), 🟢 6 improvement(s) (>±5% threshold)

Single-Thread Results

| Benchmark | Base ops/s | Base err | Head ops/s | Head err | Δ% |
|---|---|---|---|---|---|
| ic10_friendRecommendation | 0.136 | ±4.0% | 0.148 | ±5.2% | +8.1% 🟢 |
| ic11_jobReferral | 38.8 | ±1.6% | 40.7 | ±2.2% | +5.0% |
| ic12_expertSearch | 22.8 | ±1.4% | 23.6 | ±2.2% | +3.5% |
| ic13_shortestPath | 4,189 | ±1.3% | 4,301 | ±1.6% | +2.7% |
| ic1_transitiveFriends | 44.1 | ±1.9% | 47.0 | ±2.0% | +6.5% 🟢 |
| ic2_recentFriendMessages | 233.4 | ±2.4% | 227.4 | ±1.8% | -2.6% |
| ic3_friendsInCountries | 0.171 | ±3.8% | 0.175 | ±6.5% | +2.4% |
| ic4_newTopics | 4.0 | ±6.0% | 2.0 | ±10.9% | -49.8% 🔴 |
| ic5_newGroups | 0.095 | ±24.4% | 0.093 | ±25.9% | -1.7% |
| ic6_tagCoOccurrence | 3.8 | ±3.0% | 3.9 | ±3.2% | +1.6% |
| ic7_recentLikers | 63.8 | ±2.5% | 107.3 | ±3.4% | +68.1% 🟢 |
| ic8_recentReplies | 982.4 | ±1.2% | 966.3 | ±0.9% | -1.6% |
| ic9_recentFofMessages | 1.3 | ±1.1% | 1.3 | ±1.4% | +1.4% |
| is1_personProfile | 56,585 | ±2.1% | 57,179 | ±0.8% | +1.1% |
| is2_personPosts | 579.4 | ±1.0% | 565.3 | ±0.9% | -2.4% |
| is3_personFriends | 15,642 | ±2.2% | 16,111 | ±2.8% | +3.0% |
| is4_messageContent | 78,892 | ±1.2% | 78,278 | ±2.1% | -0.8% |
| is5_messageCreator | 72,476 | ±1.0% | 73,053 | ±0.5% | +0.8% |
| is6_messageForum | 49,143 | ±1.1% | 49,470 | ±1.8% | +0.7% |
| is7_messageReplies | 2,898 | ±1.0% | 6,071 | ±1.3% | +109.5% 🟢 |

Multi-Thread Results

| Benchmark | Base ops/s | Base err | Head ops/s | Head err | Δ% |
|---|---|---|---|---|---|
| ic10_friendRecommendation | 0.662 | ±1.9% | 0.638 | ±1.9% | -3.6% |
| ic11_jobReferral | 197.2 | ±2.9% | 199.2 | ±2.2% | +1.0% |
| ic12_expertSearch | 129.5 | ±0.9% | 123.8 | ±1.7% | -4.5% |
| ic13_shortestPath | 21,306 | ±2.9% | 21,340 | ±2.5% | +0.2% |
| ic1_transitiveFriends | 241.3 | ±0.6% | 229.0 | ±2.4% | -5.1% 🔴 |
| ic2_recentFriendMessages | 1,180 | ±0.8% | 1,110 | ±1.3% | -5.9% 🔴 |
| ic3_friendsInCountries | 0.745 | ±1.5% | 0.738 | ±1.3% | -0.9% |
| ic4_newTopics | 16.3 | ±1.6% | 8.0 | ±7.5% | -50.7% 🔴 |
| ic5_newGroups | 0.431 | ±1.2% | 0.422 | ±2.9% | -2.0% |
| ic6_tagCoOccurrence | 19.5 | ±1.3% | 18.8 | ±1.4% | -3.7% |
| ic7_recentLikers | 313.1 | ±1.4% | 474.9 | ±1.2% | +51.7% 🟢 |
| ic8_recentReplies | 5,172 | ±0.9% | 5,046 | ±1.0% | -2.4% |
| ic9_recentFofMessages | 6.9 | ±2.4% | 6.7 | ±3.4% | -2.9% |
| is1_personProfile | 256,566 | ±2.3% | 265,238 | ±1.7% | +3.4% |
| is2_personPosts | 2,988 | ±0.6% | 2,931 | ±0.9% | -1.9% |
| is3_personFriends | 78,888 | ±3.1% | 80,639 | ±3.7% | +2.2% |
| is4_messageContent | 352,026 | ±2.6% | 364,779 | ±1.2% | +3.6% |
| is5_messageCreator | 321,584 | ±2.2% | 330,917 | ±1.6% | +2.9% |
| is6_messageForum | 220,457 | ±2.7% | 228,318 | ±1.6% | +3.6% |
| is7_messageReplies | 14,980 | ±0.8% | 28,383 | ±0.7% | +89.5% 🟢 |
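The Δ% column and the 🔴/🟢 markers can be approximated as below, assuming Δ% = (head - base) / base × 100 and that a row is flagged only when the change strictly exceeds ±5% (the unmarked +5.0% row suggests a strict comparison, but the rule is an assumption). Recomputing from the rounded ops/s shown in the tables can differ slightly from the reported Δ%: ic4_newTopics single-thread gives -50.0% instead of -49.8%. `BenchmarkDeltaSketch` is a hypothetical name.

```java
/** Hypothetical sketch of how Δ% and the ±5% markers could be derived. */
final class BenchmarkDeltaSketch {
  enum Verdict { REGRESSION, IMPROVEMENT, UNCHANGED }

  /** Relative change of head vs. base throughput, in percent. */
  static double deltaPercent(double baseOpsPerSec, double headOpsPerSec) {
    return (headOpsPerSec - baseOpsPerSec) / baseOpsPerSec * 100.0;
  }

  /** Flags only changes strictly beyond ±thresholdPercent. */
  static Verdict classify(double deltaPercent, double thresholdPercent) {
    if (deltaPercent > thresholdPercent) {
      return Verdict.IMPROVEMENT;
    }
    if (deltaPercent < -thresholdPercent) {
      return Verdict.REGRESSION;
    }
    return Verdict.UNCHANGED;
  }

  public static void main(String[] args) {
    double is7 = deltaPercent(2898, 6071); // is7_messageReplies, single-thread
    System.out.println(is7 + " " + classify(is7, 5.0));
  }
}
```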

Scalability (MT/ST ratio)

| Benchmark | Base ratio | Head ratio | Δ% |
|---|---|---|---|
| ic10_friendRecommendation | 4.85x | 4.32x | -10.9% |
| ic11_jobReferral | 5.08x | 4.89x | -3.8% |
| ic12_expertSearch | 5.68x | 5.24x | -7.7% |
| ic13_shortestPath | 5.09x | 4.96x | -2.5% |
| ic1_transitiveFriends | 5.48x | 4.88x | -10.9% |
| ic2_recentFriendMessages | 5.05x | 4.88x | -3.4% |
| ic3_friendsInCountries | 4.35x | 4.21x | -3.2% |
| ic4_newTopics | 4.07x | 4.00x | -1.8% |
| ic5_newGroups | 4.53x | 4.52x | -0.3% |
| ic6_tagCoOccurrence | 5.11x | 4.84x | -5.3% |
| ic7_recentLikers | 4.91x | 4.43x | -9.8% |
| ic8_recentReplies | 5.26x | 5.22x | -0.8% |
| ic9_recentFofMessages | 5.45x | 5.21x | -4.2% |
| is1_personProfile | 4.53x | 4.64x | +2.3% |
| is2_personPosts | 5.16x | 5.19x | +0.6% |
| is3_personFriends | 5.04x | 5.01x | -0.8% |
| is4_messageContent | 4.46x | 4.66x | +4.4% |
| is5_messageCreator | 4.44x | 4.53x | +2.1% |
| is6_messageForum | 4.49x | 4.62x | +2.9% |
| is7_messageReplies | 5.17x | 4.68x | -9.6% |
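The scalability ratio is MT ops/s divided by ST ops/s, and its Δ% compares the head ratio with the base ratio. A sketch assuming exactly that definition (`ScalabilitySketch` is a hypothetical name): with the rounded ic7_recentLikers values it reproduces 4.91x, 4.43x and -9.8%, but rows with very small ops/s drift because the inputs are rounded (ic10_friendRecommendation base gives 0.662 / 0.136 ≈ 4.87x rather than 4.85x).

```java
import java.util.Locale;

/** Hypothetical sketch of the MT/ST scalability columns. */
final class ScalabilitySketch {
  /** Multi-thread throughput divided by single-thread throughput. */
  static double ratio(double mtOpsPerSec, double stOpsPerSec) {
    return mtOpsPerSec / stOpsPerSec;
  }

  /** Change of the head ratio relative to the base ratio, in percent. */
  static double ratioDeltaPercent(double baseRatio, double headRatio) {
    return (headRatio - baseRatio) / baseRatio * 100.0;
  }

  public static void main(String[] args) {
    // ic7_recentLikers, using the rounded ops/s from the ST and MT tables
    double base = ratio(313.1, 63.8);
    double head = ratio(474.9, 107.3);
    // prints "4.91x 4.43x -9.8%"
    System.out.printf(Locale.ROOT, "%.2fx %.2fx %+.1f%%%n", base, head, ratioDeltaPercent(base, head));
  }
}
```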

@andrii0lomakin
Collaborator

Hi @sandrawar, please profile the regressions with async-profiler on a Hetzner CCX33 node and find out what caused them.

@sandrawar force-pushed the lazy-result-match-traversal branch from ce57cae to eaf7698 on April 3, 2026 11:54
…lter exists

The unconditional ridIterator() path caused ~50% regression on IC4 and
smaller regressions on IC1/IC2 MT because the WHERE filter forces loading
every entity anyway, making the lazy ResultInternal path more expensive
than eager VertexFromLinkBagIterator loading (extra isBlob() schema
lookups per entity). Now ridOnlyPath=true only when filter==null.
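The rule in this commit message can be sketched as a single branch on the filter. `PathSelectionSketch`, `Row` and the loader are hypothetical; the sketch only shows that with no WHERE filter nothing is loaded during traversal, while with a filter each entity is loaded once, eagerly, because evaluating the filter would force the load anyway.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of choosing between the RID-only and the eager path. */
final class PathSelectionSketch {
  record Rid(int clusterId, long position) {}

  /** loadedEntity is null on the RID-only path (loading is deferred). */
  record Row(Rid rid, Object loadedEntity) {}

  static List<Row> traverse(List<Rid> linkBag, Predicate<Object> whereFilter, Function<Rid, Object> loader) {
    boolean ridOnlyPath = whereFilter == null; // lazy path only when there is no filter
    List<Row> out = new ArrayList<>();
    for (Rid rid : linkBag) {
      if (ridOnlyPath) {
        out.add(new Row(rid, null)); // defer loading to first property access
      } else {
        Object entity = loader.apply(rid); // the filter needs properties: load once, eagerly
        if (whereFilter.test(entity)) {
          out.add(new Row(rid, entity));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Rid> bag = List.of(new Rid(10, 1), new Rid(10, 2));
    // No WHERE filter: RID-only rows, nothing loaded yet.
    System.out.println(traverse(bag, null, rid -> "vertex " + rid));
  }
}
```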