feat: add performance report by wesley-weiming · Pull Request #1 · tiktok/project-impact-graph

wesley-weiming · 2023-12-06T08:12:41Z

This PR adds a performance report.
It also supports automatically generating a project-impact-graph.yaml for a given number of projects and dependencies, so you can run the test yourself.
In the performance test, we found that using "minimatch" for glob matching degrades performance: for example, with 2,000 projects and 10,000 dependencies, calculating 100 paths takes minutes. So we need to change the algorithm's path matching method and use string matching instead of glob matching. This means that fields such as includedGlobs and excludedGlobs in the file schema must also drop the glob representation.
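For readers who want to reproduce an input of a given size, here is a minimal sketch of a generator. It is not the code in src/performance/index.ts: the function names, the `project_N` and `projects/folder_N/` naming, and the seeded random number generator are all assumptions made for illustration. Only the field names come from the YAML in this PR. It returns a plain object rather than writing YAML so that it has no dependencies.

```typescript
// Hypothetical shapes: the field names follow the YAML in this PR,
// everything else in this sketch is an assumption.
interface ProjectEntry {
  includedGlobs: string[];
  excludedGlobs: string[];
  dependentProjects: string[];
}

interface ImpactGraph {
  projects: Record<string, ProjectEntry>;
}

// Small seeded generator so the same input can be regenerated exactly.
function createRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

function generateGraph(projectCount: number, edgeCount: number, seed: number = 1): ImpactGraph {
  const maxEdges = projectCount * (projectCount - 1);
  if (edgeCount > maxEdges) {
    throw new Error(`At most ${maxEdges} distinct edges fit in ${projectCount} projects`);
  }
  const random = createRandom(seed);
  const projects: Record<string, ProjectEntry> = {};
  for (let i = 0; i < projectCount; i++) {
    projects[`project_${i}`] = {
      includedGlobs: [`projects/folder_${i}/`],
      excludedGlobs: [`projects/folder_${i}/README.md`],
      dependentProjects: []
    };
  }
  // Each number in [0, maxEdges) encodes one ordered pair (from, to) with from !== to.
  const chosen = new Set<number>();
  while (chosen.size < edgeCount) {
    let k = Math.floor(random() * maxEdges);
    // On a collision, step to the next free pair so the loop always terminates.
    while (chosen.has(k)) k = (k + 1) % maxEdges;
    chosen.add(k);
    const from = Math.floor(k / (projectCount - 1));
    const offset = k % (projectCount - 1);
    const to = offset < from ? offset : offset + 1;
    const entry = projects[`project_${from}`];
    if (entry) entry.dependentProjects.push(`project_${to}`);
  }
  return { projects };
}

const benchmarkGraph = generateGraph(2000, 10000);
console.log(Object.keys(benchmarkGraph.projects).length); // 2000
```

This sketch does not try to avoid dependency cycles. Whether the real generator does is not stated in this thread.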

CLAassistant · 2023-12-06T08:12:48Z

All committers have signed the CLA.

chengcyber · 2023-12-07T01:47:38Z

src/project-impact-graph.yaml

    A:
        includedGlobs:
-            - projects/folder_A/**
+            - projects/folder_A/


Could you elaborate on the syntax used here?

    projects:
        A:
            includedGlobs:
                - projects/folder_A/
            excludedGlobs:
                - projects/folder_A/README.md
            dependentProjects:
                - G

This describes a project named 'A'. 'includedGlobs' specifies the files that belong to project 'A'. 'excludedGlobs' is a subset of 'includedGlobs' and specifies which paths in project 'A' should be filtered out. 'dependentProjects' lists the projects that directly depend on 'A'.

Because we replaced glob matching with startsWith, the folder path projects/folder_A/ is used here to represent project A.

It feels like these are no longer included/excludedGlobs. They are included/excludedPrefix now.

You are right, it is no longer appropriate to keep calling them 'glob'.
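To make the prefix semantics concrete, here is a minimal sketch of how a path could be tested against one project using startsWith. The interface and function names are hypothetical and this is not the implementation in src/index.ts. It only assumes what the thread says: entries are plain path prefixes, and exclusions are applied after inclusions.

```typescript
// Hypothetical shape: field names follow the YAML above.
interface PrefixProject {
  includedGlobs: string[]; // treated as plain path prefixes
  excludedGlobs: string[]; // prefixes removed from the included set
}

function pathBelongsToProject(filePath: string, project: PrefixProject): boolean {
  const included = project.includedGlobs.some((prefix) => filePath.startsWith(prefix));
  if (!included) return false;
  return !project.excludedGlobs.some((prefix) => filePath.startsWith(prefix));
}

const projectA: PrefixProject = {
  includedGlobs: ['projects/folder_A/'],
  excludedGlobs: ['projects/folder_A/README.md']
};

console.log(pathBelongsToProject('projects/folder_A/src/index.ts', projectA)); // true
console.log(pathBelongsToProject('projects/folder_A/README.md', projectA)); // false
console.log(pathBelongsToProject('projects/folder_AB/src/index.ts', projectA)); // false
```

The last call shows why the trailing slash in projects/folder_A/ matters: without it, the prefix would also match the sibling folder projects/folder_AB/.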

chengcyber · 2023-12-07T01:49:47Z

src/performance/index.ts

@@ -0,0 +1,148 @@
+import path from 'path';


Could you add a README to teach us how to run the performance test?

My guess is that we run this file with node. Is that right?

Even better, you could include the performance results along with information about the environment they were run in.

Of course, let me add a new commit

chengcyber · 2023-12-07T02:23:42Z

src/project-impact-graph.yaml

@@ -2,11 +2,11 @@ globalExcludedGlobs:
    - OWNERS


Do these file names still work? The implementation has been changed to match with startsWith.

    OWNERS
    build.sh
    bootstrap.sh

These represent public configuration files in the root directory. OWNERS means repoRootDir/OWNERS, which still works with startsWith.
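A short sketch of what this means in practice, assuming every path is relative to the repository root and uses "/" separators. The function and variable names are hypothetical. The results follow from how String.prototype.startsWith behaves, not from the actual implementation.

```typescript
// Assumption: paths are repo-root relative, so the prefix "OWNERS" is anchored at the root.
function isGloballyExcluded(filePath: string, globalExcludedPrefixes: string[]): boolean {
  return globalExcludedPrefixes.some((prefix) => filePath.startsWith(prefix));
}

const rootLevelExclusions: string[] = ['OWNERS', 'build.sh', 'bootstrap.sh'];

console.log(isGloballyExcluded('OWNERS', rootLevelExclusions)); // true
console.log(isGloballyExcluded('projects/folder_A/OWNERS', rootLevelExclusions)); // false
console.log(isGloballyExcluded('OWNERS.md', rootLevelExclusions)); // true
```

The second call shows a difference from a glob such as **/OWNERS: a prefix only matches the root-level file. The third call shows that a bare prefix also matches any root-level name that merely begins with OWNERS, which may or may not be intended.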

octogonz · 2023-12-11T23:17:56Z

src/index.ts

@@ -4,7 +4,6 @@
 import fs from 'fs';
 import yaml from 'yaml';
 import _ from 'lodash';


Is lodash used in any inner loops? Some time ago a perf investigation showed that Lodash's algorithms are often extremely inefficient due to so many layers of abstractions in its code base.

It is not used in loops. lodash is only used twice in a complete calculation process (its cloneDeep API is used to clone the graph structure).

octogonz · 2023-12-18T21:39:17Z

README.md

+| 3000      | 100000    | 1          | 1          | 9.46s       |
+| 3000      | 100000    | 10         | 10         | 10.533s     |
+| 3000      | 100000    | 100        | 100        | 11.029s     |
+| 3000      | 100000    | 1000       | 1000       | 11.984s     |


@wesley-weiming

In https://github.com/tiktok/project-impact-graph/pull/3/files I've added some instrumentation to count the number of times each part of the loop is executed.

Here's one of the test cases:

    [
      {
        nodeCount: 2000,
        edgeCount: 10000,
        pathCountA: 1000,
        pathCountB: 1000,
        hasImpactIntersection: true,
        executeTime: '1.513s'
      }
    ]
    {
      _integrateExcludedGlobs: 1,
      _integrateExcludedGlobs2: 2000,
      _validatePaths: 2,
      _validatePaths2: 4008000,
      lookUpProjectNamesByPathList: 2,
      lookUpProjectNamesByPathList2: 4000000,
      lookUpProjectNamesByPathList3: 4000000,
      lookUpProjectNamesByPathList4: 5780,
      getProjectImpactByProjectNames: 2,
      getProjectImpactByProjectNames2: 4000,
      getProjectImpactByProjectNames3: 24000,
      hasImpactIntersection: 1
    }

The nodeCount is the number of projects, and pathCountA and pathCountB are the "before" and "after" lists of paths from the diff. The bottleneck of this algorithm seems to be the inner loop that executes 4,000,000 times, that is (pathCountA + pathCountB) * nodeCount = 2,000 * 2,000, which is O(numberOfPaths * numberOfProjects).
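The count can be reproduced with a small sketch of a naive lookup that tests every path against every project. This is not the instrumented code from PR #3: the names below are hypothetical, and the synthetic folder layout is an assumption chosen so that each path belongs to exactly one project.

```typescript
// Hypothetical shape for this illustration only.
interface CountedProject {
  name: string;
  includedPrefixes: string[];
}

function lookUpNaively(
  paths: string[],
  projects: CountedProject[]
): { owners: Set<string>; checks: number } {
  const owners = new Set<string>();
  let checks = 0;
  for (const filePath of paths) {
    // Every project is visited for every path, so the work grows as paths * projects.
    for (const project of projects) {
      checks++;
      if (project.includedPrefixes.some((prefix) => filePath.startsWith(prefix))) {
        owners.add(project.name);
      }
    }
  }
  return { owners, checks };
}

const sampleProjects: CountedProject[] = Array.from({ length: 2000 }, (_, i) => ({
  name: `project_${i}`,
  includedPrefixes: [`projects/folder_${i}/`]
}));
const samplePaths: string[] = Array.from(
  { length: 2000 },
  (_, i) => `projects/folder_${i}/src/index.ts`
);

const naiveResult = lookUpNaively(samplePaths, sampleProjects);
console.log(naiveResult.checks); // 4000000
```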

But notice that our project paths have a well-behaved structure, for example:

    INCLUDE apps/my-app/**
    EXCLUDE apps/my-app/README.md
    INCLUDE apps/my-app2/**
    EXCLUDE apps/my-app2/README.md
    INCLUDE libraries/my-lib3/**
    EXCLUDE libraries/my-lib3/README.md
    INCLUDE libraries/my-lib4/**
    EXCLUDE libraries/my-lib4/README.md
    INCLUDE libraries/my-lib4/bad-nested-project/**
    EXCLUDE libraries/my-lib4/bad-nested-project/README.md

Even if we permit projects to be nested under other project folders, the prefixes of these globs still form a tree. For an example input path libraries/my-lib3/src/index.ts, imagine an O(n*log(n)) algorithm that would cheaply walk down libraries -> my-lib3, and then need to test only 2 globs ** and README.md.

This idea is similar to rush-lib/src/logic/LookupByPath.ts.
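The tree walk described above can be sketched as a trie keyed by folder name. This is only an illustration of the idea, not the rush-lib LookupByPath API and not code from this repository. The class, method, and project names are hypothetical, and it assumes prefix-style entries (folder paths ending in "/") rather than full globs.

```typescript
// Hypothetical node shape: one node per folder name along a project prefix.
interface PrefixTrieNode {
  children: Map<string, PrefixTrieNode>;
  project?: { name: string; excludedPrefixes: string[] };
}

class ProjectPrefixTrie {
  private readonly root: PrefixTrieNode = { children: new Map() };

  public add(folderPrefix: string, name: string, excludedPrefixes: string[]): void {
    let node = this.root;
    for (const segment of folderPrefix.split('/').filter((s) => s.length > 0)) {
      let child = node.children.get(segment);
      if (child === undefined) {
        child = { children: new Map() };
        node.children.set(segment, child);
      }
      node = child;
    }
    node.project = { name, excludedPrefixes };
  }

  // Returns every project whose folder contains the file, outermost first.
  public findOwners(filePath: string): string[] {
    const owners: string[] = [];
    // Walk only the directory segments; the last segment is the file name.
    const folders = filePath.split('/').slice(0, -1);
    let node = this.root;
    for (const folder of folders) {
      const child = node.children.get(folder);
      if (child === undefined) break;
      const project = child.project;
      // Only the exclusions of projects found along the walk are tested.
      if (project && !project.excludedPrefixes.some((p) => filePath.startsWith(p))) {
        owners.push(project.name);
      }
      node = child;
    }
    return owners;
  }
}

const trie = new ProjectPrefixTrie();
trie.add('libraries/my-lib3/', 'my-lib3', ['libraries/my-lib3/README.md']);
trie.add('libraries/my-lib4/', 'my-lib4', ['libraries/my-lib4/README.md']);
trie.add('libraries/my-lib4/bad-nested-project/', 'bad-nested-project', [
  'libraries/my-lib4/bad-nested-project/README.md'
]);

console.log(trie.findOwners('libraries/my-lib3/src/index.ts')); // [ 'my-lib3' ]
console.log(trie.findOwners('libraries/my-lib3/README.md')); // []
console.log(trie.findOwners('libraries/my-lib4/bad-nested-project/src/index.ts'));
// [ 'my-lib4', 'bad-nested-project' ]
```

In this sketch the cost of one lookup depends on the depth of the path and on the exclusions of the projects found along the way, not on the total number of projects. Returning both the outer and the nested project for a nested file mirrors what startsWith would do with these prefixes. Whether that is the desired behavior for nested projects is a design decision this thread leaves open.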

wesley-weiming added 2 commits December 6, 2023 15:51

feat: add performance report

0f062ef

feat: replace glob matching with string matching

d3ebe8d

chengcyber reviewed Dec 7, 2023

chore: add performance test guide to README

eb9ff10

octogonz reviewed Dec 11, 2023

chore: transform performance data to markdown table

e0930c6

octogonz mentioned this pull request Dec 18, 2023

Line coverage instrumentation [DO NOT MERGE] #3

Draft

octogonz reviewed Dec 18, 2023

wesley-weiming wants to merge 4 commits into main from feat/add-performance-report
