YTDB-604: Lazy RID-only iteration for MATCH traversal steps by sandrawar · Pull Request #863 · JetBrains/youtrackdb

sandrawar · 2026-03-25T14:50:13Z

PR Title:

YTDB-604: Lazy RID-only iteration for MATCH traversal steps

Motivation:

Async-profiler data from LDBC IC5 (128K+ traversals) showed that the MATCH engine loads every intermediate vertex from storage (loadEntity()) even when only the RID is needed for traversal to the next hop. This causes unnecessary disk I/O, deserialization (EntityImpl.deserializeProperties() — 1.45% CPU), and GC pressure from short-lived ResultInternal objects wrapping full entities.

Most intermediate MATCH steps only need the RID — properties are only read at the final projection (RETURN post.title). By deferring loadEntity() to first property access, we skip I/O entirely for vertices that are just traversal waypoints or get rejected by downstream WHERE filters.

The fix adds ridIterator() to VertexFromLinkBagIterable, which yields bare RecordId objects from the LinkBag without touching storage. MatchEdgeTraverser.toExecutionStream() uses this path for VertexFromLinkBagIterable results. ResultInternal's existing lazy loading handles the rest — getIdentity() returns the RID immediately, getProperty() triggers loadEntity() on first access.
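The deferred-load behaviour described above can be illustrated with a minimal sketch. This is not YouTrackDB's ResultInternal: the names `LazyResultSketch`, `LazyResult`, `Rid`, and the `loader` function standing in for loadEntity() are all hypothetical, chosen only to show a result that hands out its identity for free and loads on first property access.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of deferred entity loading; not YouTrackDB's ResultInternal. */
final class LazyResultSketch {

  /** Stand-in for a record id: cluster id plus position. */
  record Rid(int clusterId, long position) {}

  /** Holds only a RID until a property is requested. */
  static final class LazyResult {
    private final Rid rid;
    private final Function<Rid, Map<String, Object>> loader; // stand-in for loadEntity()
    private Map<String, Object> loaded; // stays null until the first property access

    LazyResult(Rid rid, Function<Rid, Map<String, Object>> loader) {
      this.rid = rid;
      this.loader = loader;
    }

    Rid getIdentity() {
      return rid; // no storage access
    }

    Object getProperty(String name) {
      if (loaded == null) {
        loaded = loader.apply(rid); // the first access pays the load cost
      }
      return loaded.get(name);
    }
  }

  public static void main(String[] args) {
    AtomicInteger loads = new AtomicInteger();
    Function<Rid, Map<String, Object>> storage = rid -> {
      loads.incrementAndGet();
      return Map.of("title", "post-" + rid.position());
    };

    LazyResult waypoint = new LazyResult(new Rid(12, 0), storage);
    LazyResult projected = new LazyResult(new Rid(12, 1), storage);

    waypoint.getIdentity();          // traversal waypoint: nothing is loaded
    projected.getProperty("title");  // projection: triggers one load
    projected.getProperty("title");  // already loaded: no second load

    System.out.println("loads = " + loads.get()); // prints "loads = 1"
  }
}
```

In this sketch the waypoint never reaches the loader, which is the saving the PR is after for vertices that are only stepped through.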

Class and RID pre-filters are preserved (both operate on the RID, no I/O needed).
No behavioral change for non-MATCH consumers: iterator() still returns loaded Vertex objects.
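
As a rough illustration of RID-only iteration with pre-filters, the sketch below filters a list of RIDs without any load step. It is not the PR's code: `RidIteratorSketch`, `FilteringRidIterator`, and `drain` are hypothetical names, the LinkBag is replaced by a plain `List`, and modelling the class filter as a set of accepted cluster ids is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of RID-only iteration with pre-filters; not the PR's code. */
final class RidIteratorSketch {

  /** Stand-in for a record id. */
  record Rid(int clusterId, long position) {}

  /** Yields only the RIDs that pass the pre-filter; there is no load step at all. */
  static final class FilteringRidIterator implements Iterator<Rid> {
    private final Iterator<Rid> source;
    private final Predicate<Rid> preFilter;
    private Rid lookahead; // next accepted RID, or null if not yet found

    FilteringRidIterator(Iterator<Rid> source, Predicate<Rid> preFilter) {
      this.source = source;
      this.preFilter = preFilter;
    }

    @Override
    public boolean hasNext() {
      while (lookahead == null && source.hasNext()) {
        Rid candidate = source.next();
        if (preFilter.test(candidate)) {
          lookahead = candidate;
        }
      }
      return lookahead != null;
    }

    @Override
    public Rid next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      Rid result = lookahead;
      lookahead = null;
      return result;
    }
  }

  /**
   * Assumption: the class filter is modelled as a set of accepted cluster ids and the
   * RID filter as a set of accepted RIDs; null means "no filter".
   */
  static Iterator<Rid> ridIterator(
      List<Rid> linkBag, Set<Integer> acceptedClusterIds, Set<Rid> acceptedRids) {
    Predicate<Rid> classFilter =
        rid -> acceptedClusterIds == null || acceptedClusterIds.contains(rid.clusterId());
    Predicate<Rid> ridFilter = rid -> acceptedRids == null || acceptedRids.contains(rid);
    return new FilteringRidIterator(linkBag.iterator(), classFilter.and(ridFilter));
  }

  /** Collects the remaining elements of an iterator into a list. */
  static List<Rid> drain(Iterator<Rid> it) {
    List<Rid> out = new ArrayList<>();
    it.forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) {
    List<Rid> linkBag = List.of(new Rid(10, 1), new Rid(11, 2), new Rid(10, 3));
    // Keep only cluster 10, and of those only the RID at position 3.
    System.out.println(drain(ridIterator(linkBag, Set.of(10), Set.of(new Rid(10, 3)))));
  }
}
```

Both predicates read only fields of the RID itself, which is why filtering of this kind needs no I/O.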

gemini-code-assist · 2026-03-25T14:50:38Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a crucial performance optimization for the database's MATCH engine. By implementing a lazy loading strategy for intermediate traversal steps, the system now avoids materializing full Result objects when only Record IDs are needed. This change drastically cuts down on disk I/O, CPU cycles spent on deserialization, and memory overhead from short-lived objects, leading to more efficient execution of complex graph traversals, especially in scenarios with large datasets.

Highlights

Performance Enhancement: Introduced lazy RID-only iteration for the MATCH engine's traversal steps, significantly reducing unnecessary disk I/O, deserialization, and garbage collection pressure by deferring entity loading.
New ridIterator(): Added a ridIterator() method to VertexFromLinkBagIterable that directly yields RecordId objects from the LinkBag, applying filters without loading full entities from storage.
MatchEdgeTraverser Integration: Updated MatchEdgeTraverser.toExecutionStream() to leverage the new ridIterator() for VertexFromLinkBagIterable, ensuring that intermediate MATCH steps benefit from the lazy loading mechanism.
Non-Disruptive Change: Confirmed that existing class and RID pre-filters remain effective and I/O-free, and that the change does not affect non-MATCH consumers, who continue to receive fully loaded Vertex objects.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant optimization for the MATCH query engine by implementing a "RID-only path" for VertexFromLinkBagIterable. A new ridIterator() method is added to VertexFromLinkBagIterable which allows iterating over RecordId objects directly from a LinkBag without loading the full entities, applying class and RID filters. The MatchEdgeTraverser.toExecutionStream method is updated to utilize this new iterator, enabling lazy loading of entities in MATCH traversals. Comprehensive unit tests have been added to validate the functionality and lazy-loading behavior of the new ridIterator() and its integration with the MATCH execution stream. I have no feedback to provide as there were no review comments.

github-actions · 2026-03-25T15:39:35Z

Test Count Gate Results

Tolerance: 5% drop allowed per module

Overall: ✅ 18060 tests (baseline: 18048, +12)

Module	Baseline	Current	Change	Status
`core`	7503	7515	+12	✅
`docker-tests`	1891	1891	+0	✅
`embedded`	1931	1931	+0	✅
`examples`	3	3	+0	✅
`gremlin-annotations`	30	30	+0	✅
`jmh-ldbc`	39	39	+0	✅
`server`	5504	5504	+0	✅
`tests`	1147	1147	+0	✅

github-actions · 2026-03-25T15:39:35Z

Coverage Gate Results

Thresholds: 85% line, 70% branch

Line Coverage: ✅ 100.0% (32/32 lines)

File	Coverage	Uncovered Lines
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexFromLinkBagIterable.java`	✅ 100.0% (23/23)	-
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchEdgeTraverser.java`	✅ 100.0% (7/7)	-
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchFieldTraverser.java`	✅ 100.0% (1/1)	-
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchReverseEdgeTraverser.java`	✅ 100.0% (1/1)	-

Branch Coverage: ✅ 100.0% (20/20 branches)

File	Coverage	Lines with Uncovered Branches
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexFromLinkBagIterable.java`	✅ 100.0% (16/16)	-
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchEdgeTraverser.java`	✅ 100.0% (4/4)	-

github-actions · 2026-04-03T00:03:30Z

JMH LDBC Benchmark Comparison

Base: 244d8c7275 (fork-point with develop) | Head: 2587baaffa
Summary: 🔴 4 regression(s), 🟢 6 improvement(s) (>±5% threshold)

Single-Thread Results

Benchmark	Base ops/s	Base err	Head ops/s	Head err	Δ%
ic10_friendRecommendation	0.136	±4.0%	0.148	±5.2%	+8.1% 🟢
ic11_jobReferral	38.8	±1.6%	40.7	±2.2%	+5.0%
ic12_expertSearch	22.8	±1.4%	23.6	±2.2%	+3.5%
ic13_shortestPath	4,189	±1.3%	4,301	±1.6%	+2.7%
ic1_transitiveFriends	44.1	±1.9%	47.0	±2.0%	+6.5% 🟢
ic2_recentFriendMessages	233.4	±2.4%	227.4	±1.8%	-2.6%
ic3_friendsInCountries	0.171	±3.8%	0.175	±6.5%	+2.4%
ic4_newTopics	4.0	±6.0%	2.0	±10.9%	-49.8% 🔴
ic5_newGroups	0.095	±24.4%	0.093	±25.9%	-1.7%
ic6_tagCoOccurrence	3.8	±3.0%	3.9	±3.2%	+1.6%
ic7_recentLikers	63.8	±2.5%	107.3	±3.4%	+68.1% 🟢
ic8_recentReplies	982.4	±1.2%	966.3	±0.9%	-1.6%
ic9_recentFofMessages	1.3	±1.1%	1.3	±1.4%	+1.4%
is1_personProfile	56,585	±2.1%	57,179	±0.8%	+1.1%
is2_personPosts	579.4	±1.0%	565.3	±0.9%	-2.4%
is3_personFriends	15,642	±2.2%	16,111	±2.8%	+3.0%
is4_messageContent	78,892	±1.2%	78,278	±2.1%	-0.8%
is5_messageCreator	72,476	±1.0%	73,053	±0.5%	+0.8%
is6_messageForum	49,143	±1.1%	49,470	±1.8%	+0.7%
is7_messageReplies	2,898	±1.0%	6,071	±1.3%	+109.5% 🟢

Multi-Thread Results

Benchmark	Base ops/s	Base err	Head ops/s	Head err	Δ%
ic10_friendRecommendation	0.662	±1.9%	0.638	±1.9%	-3.6%
ic11_jobReferral	197.2	±2.9%	199.2	±2.2%	+1.0%
ic12_expertSearch	129.5	±0.9%	123.8	±1.7%	-4.5%
ic13_shortestPath	21,306	±2.9%	21,340	±2.5%	+0.2%
ic1_transitiveFriends	241.3	±0.6%	229.0	±2.4%	-5.1% 🔴
ic2_recentFriendMessages	1,180	±0.8%	1,110	±1.3%	-5.9% 🔴
ic3_friendsInCountries	0.745	±1.5%	0.738	±1.3%	-0.9%
ic4_newTopics	16.3	±1.6%	8.0	±7.5%	-50.7% 🔴
ic5_newGroups	0.431	±1.2%	0.422	±2.9%	-2.0%
ic6_tagCoOccurrence	19.5	±1.3%	18.8	±1.4%	-3.7%
ic7_recentLikers	313.1	±1.4%	474.9	±1.2%	+51.7% 🟢
ic8_recentReplies	5,172	±0.9%	5,046	±1.0%	-2.4%
ic9_recentFofMessages	6.9	±2.4%	6.7	±3.4%	-2.9%
is1_personProfile	256,566	±2.3%	265,238	±1.7%	+3.4%
is2_personPosts	2,988	±0.6%	2,931	±0.9%	-1.9%
is3_personFriends	78,888	±3.1%	80,639	±3.7%	+2.2%
is4_messageContent	352,026	±2.6%	364,779	±1.2%	+3.6%
is5_messageCreator	321,584	±2.2%	330,917	±1.6%	+2.9%
is6_messageForum	220,457	±2.7%	228,318	±1.6%	+3.6%
is7_messageReplies	14,980	±0.8%	28,383	±0.7%	+89.5% 🟢

Scalability (MT/ST ratio)

Benchmark	Base ratio	Head ratio	Δ%
ic10_friendRecommendation	4.85x	4.32x	-10.9%
ic11_jobReferral	5.08x	4.89x	-3.8%
ic12_expertSearch	5.68x	5.24x	-7.7%
ic13_shortestPath	5.09x	4.96x	-2.5%
ic1_transitiveFriends	5.48x	4.88x	-10.9%
ic2_recentFriendMessages	5.05x	4.88x	-3.4%
ic3_friendsInCountries	4.35x	4.21x	-3.2%
ic4_newTopics	4.07x	4.00x	-1.8%
ic5_newGroups	4.53x	4.52x	-0.3%
ic6_tagCoOccurrence	5.11x	4.84x	-5.3%
ic7_recentLikers	4.91x	4.43x	-9.8%
ic8_recentReplies	5.26x	5.22x	-0.8%
ic9_recentFofMessages	5.45x	5.21x	-4.2%
is1_personProfile	4.53x	4.64x	+2.3%
is2_personPosts	5.16x	5.19x	+0.6%
is3_personFriends	5.04x	5.01x	-0.8%
is4_messageContent	4.46x	4.66x	+4.4%
is5_messageCreator	4.44x	4.53x	+2.1%
is6_messageForum	4.49x	4.62x	+2.9%
is7_messageReplies	5.17x	4.68x	-9.6%
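
For readers checking the tables by hand, the sketch below shows one way the Δ% and MT/ST columns can be reproduced. The CI script that generated the report is not shown in this PR, so the formulas (relative change against the base, and multi-thread ops/s divided by single-thread ops/s) are assumptions inferred from the column headers and the 5% threshold; `BenchmarkDeltaSketch` and its methods are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical sketch of how the report's derived columns could be computed. */
final class BenchmarkDeltaSketch {

  /** Relative change of head against base, in percent (assumed formula). */
  static double deltaPercent(double base, double head) {
    return (head - base) / base * 100.0;
  }

  /** Scalability as multi-thread throughput divided by single-thread throughput. */
  static double scalability(double multiThreadOps, double singleThreadOps) {
    return multiThreadOps / singleThreadOps;
  }

  /** Flags a change only when it lies outside the +/- threshold. */
  static String classify(double deltaPercent, double thresholdPercent) {
    if (deltaPercent > thresholdPercent) {
      return "improvement";
    }
    if (deltaPercent < -thresholdPercent) {
      return "regression";
    }
    return "within threshold";
  }

  public static void main(String[] args) {
    // is7_messageReplies, taken from the tables above.
    double stDelta = deltaPercent(2898, 6071);
    System.out.println(String.format(
        Locale.ROOT, "ST delta: %+.1f%% (%s)", stDelta, classify(stDelta, 5.0)));
    System.out.println(String.format(
        Locale.ROOT, "base ratio: %.2fx", scalability(14980, 2898)));
    System.out.println(String.format(
        Locale.ROOT, "head ratio: %.2fx", scalability(28383, 6071)));
  }
}
```

With the is7_messageReplies inputs this prints +109.5%, 5.17x, and 4.68x, matching the reported rows. Other rows can differ in the last digit because the report shows rounded ops/s values.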

andrii0lomakin · 2026-04-03T02:34:58Z

Hi @sandrawar, please profile the regressions using async-profiler on a Hetzner CCX33 node and find out what caused them.

…lter exists

The unconditional ridIterator() path caused a ~50% regression on IC4 and smaller regressions on IC1/IC2 MT: when a WHERE filter is present, it forces loading every entity anyway, which makes the lazy ResultInternal path more expensive than eager VertexFromLinkBagIterator loading (extra isBlob() schema lookups per entity). Now ridOnlyPath=true only when filter==null.
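
The gating rule from the commit message can be sketched as below. Only the condition `ridOnlyPath = (filter == null)` comes from the message; the class `RidOnlyPathGateSketch`, the `IterationPath` enum, and the `choosePath` method are hypothetical and do not reflect how MatchEdgeTraverser is actually structured.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of the "RID-only path only without a filter" rule. */
final class RidOnlyPathGateSketch {

  enum IterationPath { RID_ONLY, EAGER_VERTEX }

  /**
   * Mirrors the rule stated in the commit message: take the lazy RID-only path
   * only when there is no WHERE filter, because a filter would force every
   * entity to be loaded anyway.
   */
  static IterationPath choosePath(Predicate<Object> whereFilter) {
    boolean ridOnlyPath = (whereFilter == null);
    return ridOnlyPath ? IterationPath.RID_ONLY : IterationPath.EAGER_VERTEX;
  }

  public static void main(String[] args) {
    Predicate<Object> someFilter = candidate -> candidate != null;
    System.out.println(choosePath(null));       // prints "RID_ONLY"
    System.out.println(choosePath(someFilter)); // prints "EAGER_VERTEX"
  }
}
```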

sandrawar changed the title ~~YTDB-604: MATCH engine materializes full Result objects for intermedi…~~ YTDB-604: Lazy RID-only iteration for MATCH traversal steps Mar 25, 2026

gemini-code-assist bot reviewed Mar 25, 2026

sandrawar requested review from andrii0lomakin March 25, 2026 15:16

sandrawar force-pushed the lazy-result-match-traversal branch 3 times, most recently from 3039c3f to ce57cae · April 2, 2026 06:54

YTDB-604: MATCH engine materializes full Result objects for intermedi…

eaf7698

…ate steps

sandrawar force-pushed the lazy-result-match-traversal branch from ce57cae to eaf7698 · April 3, 2026 11:54

sandrawar added 2 commits April 3, 2026 14:22

YTDB-604: add missing VertexFromLinkBagIterable import

2587baa

YTDB-604: Lazy RID-only iteration for MATCH traversal steps #863
sandrawar wants to merge 3 commits into develop from lazy-result-match-traversal

2 participants
