YTDB-631: Cons list path #866

Open
sandrawar wants to merge 4 commits into develop from cons-list-path

@sandrawar
Collaborator

@sandrawar sandrawar commented Mar 26, 2026

PR Title:

YTDB-631: Replace ArrayList path copies with immutable cons-cell list in MATCH WHILE traversal

Motivation:

During recursive MATCH WHILE traversals (e.g., IS2 query following REPLY_OF chains), the engine copies the entire path ArrayList at every recursion level:

List<Result> newPath = new ArrayList<>();
if (pathToHere != null) newPath.addAll(pathToHere);
newPath.add(origin);

For a chain of depth D, this produces O(D²) element copies (1+2+3+...+D). Profiling showed 3,527 ArrayList allocation samples attributed to this pattern. With 20 messages each traversing 5-hop Comment→Post chains, that's ~300 unnecessary element copies (20 × (1+2+3+4+5)) per query.
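
As a rough check of the 1+2+...+D figure, the sketch below replays the copy-on-extend pattern from the snippet above and counts how many elements get written. `CopyCost` and `elementWrites` are hypothetical names used only for this illustration; they are not part of the PR or the engine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: replays the copy-on-extend
// pattern and counts element writes (copied elements plus the appended one).
final class CopyCost {
  static long elementWrites(int depth) {
    long writes = 0;
    List<Integer> pathToHere = null;
    for (int level = 1; level <= depth; level++) {
      List<Integer> newPath = new ArrayList<>();
      if (pathToHere != null) {
        newPath.addAll(pathToHere);   // copies every ancestor again
        writes += pathToHere.size();
      }
      newPath.add(level);             // appends the current origin
      writes++;
      pathToHere = newPath;
    }
    return writes;
  }
}
```

Under these assumptions, `CopyCost.elementWrites(5)` comes to 15, so 20 such chains account for about 300 element writes, which matches the order of magnitude quoted above.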
This PR introduces PathNode — an immutable cons-cell record that shares structure with ancestor paths:

record PathNode(@Nonnull Result value, @Nullable PathNode prev, int depth) {
  List<Result> toList() { /* materialize only when needed */ }
}

Appending is O(1) instead of O(depth). Paths sharing a common prefix reuse the same node chain in memory. Materialization to List (via toList()) is deferred and only happens when the user declares a pathAlias, which IS2 does not, so for IS2 the list is never built at all.
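
To illustrate the O(1) append and the prefix sharing described above, here is a minimal sketch. It is not the PR's PathNode: `PathNodeSketch`, the `of` and `append` helpers, the generic type parameter (standing in for Result), and the convention that depth starts at 1 are all assumptions made for this example.

```java
// Hypothetical stand-in for the PR's PathNode, generic so it runs on its own.
record PathNodeSketch<T>(T value, PathNodeSketch<T> prev, int depth) {

  // Starts a new path with a single element (assumed depth convention: 1).
  static <T> PathNodeSketch<T> of(T value) {
    return new PathNodeSketch<>(value, null, 1);
  }

  // O(1): allocates one node that points back at the existing chain.
  PathNodeSketch<T> append(T next) {
    return new PathNodeSketch<>(next, this, depth + 1);
  }
}

final class SharingDemo {
  // Two branches extended from the same prefix reference the same prefix node.
  static boolean branchesSharePrefix() {
    PathNodeSketch<String> prefix = PathNodeSketch.of("post").append("comment");
    PathNodeSketch<String> left = prefix.append("reply-a");
    PathNodeSketch<String> right = prefix.append("reply-b");
    return left.prev() == right.prev() && left.depth() == 3 && right.depth() == 3;
  }
}
```

Because nodes are never mutated after construction, sharing the prefix between branches is safe without copying.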

Risk is low: the change is contained within MatchEdgeTraverser.executeTraversal() and the PathNode is a simple, immutable record. All existing MATCH WHILE tests (including pathAlias and diamond-graph dedup scenarios) pass unchanged.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant performance improvement for recursive MATCH WHILE traversals by refactoring how paths are managed. Instead of repeatedly copying ArrayList instances, an immutable cons-cell PathNode record is now used. This change drastically reduces memory allocations and CPU overhead by enabling constant-time path appending and structural sharing, while also deferring the materialization of paths into lists until they are actually needed by the user.

Highlights

  • Performance Optimization: Replaced ArrayList path copying, which costs O(D²) element copies over a depth-D chain, with an immutable cons-cell PathNode whose append is O(1) for MATCH WHILE traversals, significantly reducing element copies and memory allocations.
  • New Data Structure: Introduced PathNode, an immutable record that represents a path as a linked list, enabling structural sharing of common path prefixes.
  • Deferred Materialization: Materialization of the path into a List<Result> is now deferred until explicitly required (e.g., when a pathAlias is read), avoiding unnecessary list constructions.
  • Code Simplification: Simplified the path building logic within MatchEdgeTraverser by leveraging the PathNode record's O(1) append operation.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the path tracking mechanism in MATCH WHILE traversals by introducing an immutable cons-cell list, PathNode, to replace the previous ArrayList-based approach. This change aims to improve performance by making path appending an O(1) operation and enabling structural sharing. The PathNode is materialized into a List only when explicitly required. A review comment highlights a potential StackOverflowError in the recursive toList() implementation of PathNode for very deep traversal paths, suggesting an iterative approach for improved safety and performance.
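
One way the iterative approach suggested in the review could look is sketched below. This is an assumption about the shape of the fix, not the code in the PR: `DeepPathNode` and `DeepPathDemo` are hypothetical names, and the generic type parameter stands in for Result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for PathNode with a loop-based toList().
record DeepPathNode<T>(T value, DeepPathNode<T> prev, int depth) {

  // Walks the prev links in a loop, so stack usage stays constant
  // no matter how deep the traversal path is.
  List<T> toList() {
    List<T> out = new ArrayList<>(depth);
    for (DeepPathNode<T> node = this; node != null; node = node.prev) {
      out.add(node.value);          // collected leaf-to-root
    }
    Collections.reverse(out);       // return in root-to-leaf order
    return out;
  }
}

final class DeepPathDemo {
  // Builds a chain holding 0, 1, ..., length - 1 (root first).
  static DeepPathNode<Integer> chain(int length) {
    DeepPathNode<Integer> node = null;
    for (int i = 0; i < length; i++) {
      node = new DeepPathNode<>(i, node, i + 1);
    }
    return node;
  }
}
```

With this shape, materializing a chain of a few hundred thousand nodes only costs heap for the result list, whereas a recursive walk of the same chain would need one stack frame per node.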

@sandrawar sandrawar changed the title from "Cons list path" to "YTDB-631: Cons list path" Mar 26, 2026
@github-actions

github-actions bot commented Mar 26, 2026

Test Count Gate Results

✅ No baseline available yet — gate skipped (first run).

@github-actions

Coverage Gate Results

Thresholds: 85% line, 70% branch

Line Coverage: ✅ 100.0% (10/10 lines)

File Coverage Uncovered Lines
core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchEdgeTraverser.java ✅ 100.0% (2/2) -
core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/PathNode.java ✅ 100.0% (8/8) -

Branch Coverage: ✅ 100.0% (4/4 branches)

File Coverage Lines with Uncovered Branches
core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchEdgeTraverser.java ✅ 100.0% (2/2) -
core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/PathNode.java ✅ 100.0% (2/2) -

@sandrawar sandrawar requested review from andrii0lomakin and removed request for andrii0lomakin March 26, 2026 15:30
@sandrawar sandrawar force-pushed the cons-list-path branch 2 times, most recently from 5e89ac0 to 910deee Compare April 2, 2026 06:54
@github-actions

github-actions bot commented Apr 2, 2026

JMH LDBC Benchmark Comparison

Base: 244d8c7275 (fork-point with develop) | Head: 60af410992
Summary: 🔴 2 regression(s), 🟢 2 improvement(s) (>±5% threshold)

Single-Thread Results

Benchmark Base ops/s Base err Head ops/s Head err Δ%
ic10_friendRecommendation 0.135 ±5.8% 0.136 ±4.2% +1.0%
ic11_jobReferral 38.6 ±2.5% 38.0 ±1.9% -1.6%
ic12_expertSearch 22.3 ±1.9% 23.7 ±1.1% +6.6% 🟢
ic13_shortestPath 3,583 ±2.4% 3,649 ±1.1% +1.9%
ic1_transitiveFriends 42.5 ±2.1% 42.1 ±4.2% -1.0%
ic2_recentFriendMessages 216.4 ±7.5% 215.0 ±5.4% -0.7%
ic3_friendsInCountries 0.150 ±4.2% 0.145 ±8.1% -3.3%
ic4_newTopics 4.1 ±4.2% 4.1 ±2.5% +1.0%
ic5_newGroups 0.094 ±23.5% 0.091 ±23.6% -3.5%
ic6_tagCoOccurrence 3.7 ±3.2% 3.7 ±2.3% +0.3%
ic7_recentLikers 60.3 ±3.1% 57.4 ±3.4% -4.9%
ic8_recentReplies 960.3 ±1.3% 939.6 ±0.7% -2.2%
ic9_recentFofMessages 1.2 ±2.0% 1.2 ±1.9% +1.0%
is1_personProfile 53,873 ±1.9% 55,399 ±2.4% +2.8%
is2_personPosts 559.6 ±0.8% 546.8 ±1.4% -2.3%
is3_personFriends 14,419 ±3.0% 15,453 ±5.3% +7.2% 🟢
is4_messageContent 76,757 ±1.7% 77,317 ±1.7% +0.7%
is5_messageCreator 70,514 ±1.2% 70,887 ±1.6% +0.5%
is6_messageForum 48,152 ±2.4% 47,264 ±1.5% -1.8%
is7_messageReplies 2,872 ±1.5% 2,913 ±1.3% +1.4%

Multi-Thread Results

Benchmark Base ops/s Base err Head ops/s Head err Δ%
ic10_friendRecommendation 0.613 ±6.1% 0.601 ±2.7% -1.9%
ic11_jobReferral 183.9 ±2.6% 172.1 ±3.9% -6.4% 🔴
ic12_expertSearch 121.8 ±2.5% 123.8 ±1.5% +1.7%
ic13_shortestPath 17,729 ±4.5% 17,618 ±4.5% -0.6%
ic1_transitiveFriends 226.4 ±1.6% 216.0 ±2.2% -4.6%
ic2_recentFriendMessages 1,001 ±3.4% 990.4 ±2.1% -1.0%
ic3_friendsInCountries 0.690 ±3.5% 0.676 ±2.7% -2.0%
ic4_newTopics 15.3 ±6.0% 14.4 ±5.3% -5.6% 🔴
ic5_newGroups 0.411 ±5.1% 0.407 ±6.0% -0.8%
ic6_tagCoOccurrence 17.4 ±1.7% 17.3 ±1.9% -0.6%
ic7_recentLikers 275.1 ±3.8% 268.3 ±2.3% -2.5%
ic8_recentReplies 4,844 ±1.2% 4,674 ±2.0% -3.5%
ic9_recentFofMessages 6.2 ±3.6% 6.2 ±4.0% +0.1%
is1_personProfile 233,143 ±1.1% 235,515 ±1.7% +1.0%
is2_personPosts 2,846 ±1.2% 2,788 ±2.0% -2.0%
is3_personFriends 70,886 ±2.6% 73,012 ±2.2% +3.0%
is4_messageContent 313,665 ±1.8% 312,485 ±1.3% -0.4%
is5_messageCreator 288,088 ±1.1% 281,061 ±4.5% -2.4%
is6_messageForum 197,535 ±1.6% 203,281 ±1.1% +2.9%
is7_messageReplies 14,175 ±1.2% 14,174 ±1.8% -0.0%

Scalability (MT/ST ratio)

Benchmark Base ratio Head ratio Δ%
ic10_friendRecommendation 4.55x 4.42x -2.9%
ic11_jobReferral 4.77x 4.53x -4.9%
ic12_expertSearch 5.47x 5.22x -4.6%
ic13_shortestPath 4.95x 4.83x -2.4%
ic1_transitiveFriends 5.32x 5.13x -3.6%
ic2_recentFriendMessages 4.62x 4.61x -0.4%
ic3_friendsInCountries 4.61x 4.67x +1.3%
ic4_newTopics 3.74x 3.49x -6.6%
ic5_newGroups 4.38x 4.50x +2.8%
ic6_tagCoOccurrence 4.67x 4.62x -1.0%
ic7_recentLikers 4.56x 4.68x +2.5%
ic8_recentReplies 5.04x 4.97x -1.4%
ic9_recentFofMessages 5.04x 5.00x -0.9%
is1_personProfile 4.33x 4.25x -1.8%
is2_personPosts 5.09x 5.10x +0.3%
is3_personFriends 4.92x 4.72x -3.9%
is4_messageContent 4.09x 4.04x -1.1%
is5_messageCreator 4.09x 3.96x -3.0%
is6_messageForum 4.10x 4.30x +4.8%
is7_messageReplies 4.94x 4.87x -1.4%
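
For readers checking the tables by hand, the sketch below shows how the Δ% and MT/ST columns can be recomputed. The formulas are an assumption inferred from the numbers (relative change against base, and multi-thread ops/s divided by single-thread ops/s); the report does not state them, and `BenchDelta` is a hypothetical helper, not part of the benchmark tooling.

```java
// Hypothetical helper for recomputing the report's derived columns.
final class BenchDelta {

  // Relative change of head against base, in percent (assumed formula).
  static double deltaPercent(double baseOps, double headOps) {
    return (headOps - baseOps) / baseOps * 100.0;
  }

  // Applies a symmetric threshold such as the report's ±5%.
  static String classify(double deltaPercent, double thresholdPercent) {
    if (deltaPercent > thresholdPercent) {
      return "improvement";
    }
    if (deltaPercent < -thresholdPercent) {
      return "regression";
    }
    return "within threshold";
  }

  // Scalability ratio: multi-thread throughput over single-thread throughput.
  static double scalability(double multiThreadOps, double singleThreadOps) {
    return multiThreadOps / singleThreadOps;
  }
}
```

Recomputing from the rounded table values can differ slightly from the published column (for example ic12_expertSearch single-thread gives about +6.3% rather than +6.6%), which suggests the report works from unrounded measurements. It may also be worth noting that the multi-thread ic4_newTopics delta (-5.6%) is comparable in size to that benchmark's reported error (±6.0% base, ±5.3% head).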

@andrii0lomakin
Collaborator

Hi @sandrawar, please profile the regressions with async-profiler on a Hetzner CCX 33 node and find out what caused them.
