Reconcile bare/rich component identities in ComponentsFound by AMaini503 · Pull Request #1760 · microsoft/component-detection

AMaini503 · 2026-04-02T22:37:08Z

Within a single detector, components registered under bare Ids (no DownloadUrl/SourceUrl) are now merged into their rich counterparts sharing the same BaseId.

Changes:

ComponentRecorder.GetDetectedComponents(): group by BaseId, merge bare metadata (licenses, suppliers, containers) into all rich entries
DefaultGraphTranslationService.GatherSetOfDetectedComponentsUnmerged(): extend graph lookup to match on BaseId so rich components absorb graph data (roots, ancestors, devDep, scope, file paths) from bare-Id graphs
Tests for both reconciliation levels

Addresses work item #2372676.

github-actions · 2026-04-02T22:41:28Z

👋 Hi! It looks like you modified some files in the Detectors folder.
You may need to bump the detector versions if any of the following scenarios apply:

The detector detects more or fewer components than before
The detector generates different parent/child graph relationships than before
The detector generates different devDependencies values than before

If none of the above scenarios apply, feel free to ignore this comment 🙂

Copilot

Pull request overview

This PR updates the orchestrator’s component aggregation so that, within a single detector run, components registered under a “bare” identity (no provenance URLs) are reconciled into their corresponding “rich” identities (same BaseId, but with DownloadUrl/SourceUrl). This ensures ComponentsFound reflects a single coherent identity while retaining important metadata and graph-derived fields.

Changes:

Reconcile detected components by BaseId, merging bare metadata into rich entries and dropping the bare entries.
Extend graph translation to associate graph data with components by either Id or BaseId, so rich components pick up graph info from bare-id graphs.
Add unit tests covering reconciliation behavior at both the ComponentRecorder and graph translation layers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
`src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs`	Groups detected components by `BaseId` and merges bare metadata into rich components.
`src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs`	Applies graph roots/ancestors/dev-dep/scope/locations using `Id` or `BaseId` matching.
`test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs`	Adds tests validating bare/rich reconciliation behavior for component aggregation.
`test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs`	Adds tests validating rich components absorb graph data from bare-id graphs.

src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs

grvillic · 2026-04-04T00:26:49Z

....ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs

                    // Information about each component is relative to all of the graphs it is present in, so we take all graphs containing a given component and apply the graph data.
-                    foreach (var graphKvp in dependencyGraphsByLocation.Where(x => x.Value.Contains(component.Component.Id)))
+                    // Also match on BaseId to pick up graphs that registered this component under a bare Id (before reconciliation promoted it to a rich Id).
+                    foreach (var graphKvp in dependencyGraphsByLocation.Where(x => x.Value.Contains(component.Component.Id) || (component.Component.Id != component.Component.BaseId && x.Value.Contains(component.Component.BaseId))))


this line is a bit difficult to follow, can we make a private utility method that returns true when applicable.

Good call. Extracted to a GraphContainsComponent helper with a doc comment explaining the BaseId fallback logic.

Copilot

Pull request overview

This PR addresses identity fragmentation in ComponentsFound when the same package is registered with both a bare Id (no DownloadUrl/SourceUrl) and a rich Id (with URL data) within a single detector, by reconciling on BaseId so rich components absorb metadata/graph enrichment from their bare counterparts.

Changes:

Reconcile ComponentRecorder.GetDetectedComponents() output by grouping on BaseId and merging bare component metadata into all rich entries.
Extend DefaultGraphTranslationService enrichment to match dependency graphs by either full Id or BaseId, allowing rich entries to pick up roots/ancestors/scope/devDep/locations from bare-Id graphs.
Add unit tests and supporting spike/sample artifacts documenting npm lockfile behavior and identity reconciliation rationale.

Show a summary per file

File	Description
src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs	Reconciles bare vs rich detected components by `BaseId` and merges selected metadata into rich entries.
src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs	Updates graph enrichment to treat rich components as present when graphs contain the bare `BaseId`.
test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs	Adds tests validating bare→rich subsumption and metadata merge semantics in `GetDetectedComponents()`.
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs	Adds tests ensuring graph-derived data is transferred from bare-Id graphs to rich components.
docs/component-identity-reconciliation-design.md	Design doc describing reconciliation points and semantics for `ComponentsFound` vs `DependencyGraphs`.
docs/component-identity-merging.md	Expanded design discussion and scenarios for bare/rich merging behavior.
docs/npm-detector-spike-plan.md	Work breakdown and analysis plan for npm detector metadata population (spike).
docs/npm-detector-spike-findings.md	Spike findings summarizing lockfile field availability and detector gaps.
docs/npm-lockfile-samples/README.md	Documentation of npm lockfile structural differences and detector read paths.
docs/npm-lockfile-samples/v1-lockfile-sample.json	Trimmed v1 lockfile excerpt used as documentation reference.
docs/npm-lockfile-samples/v2-lockfile-sample.json	Trimmed v2 lockfile excerpt used as documentation reference.
docs/npm-lockfile-samples/v3-lockfile-sample.json	Trimmed v3 lockfile excerpt used as documentation reference.
test-npm-spike/package.json	Spike project input for npm v3 lockfile generation/repro.
test-npm-spike/package-lock.json	Spike npm v3 lockfile (full) captured for repro.
test-npm-spike/baseline-output/GovCompDisc_Log_20260313102052505_25872.log	Captured run log artifact for the spike.
test-npm-spike/baseline-output/ScanManifest_20260313102052512.json	Captured scan manifest artifact for the spike run.
test-npm-spike-v1/package.json	Spike project input for npm v2 lockfile generation/repro.
test-npm-spike-v1/package-lock.json	Spike npm v2 lockfile (full) captured for repro.
test-npm-spike-v1/baseline-output/GovCompDisc_Log_20260313102109400_18928.log	Captured run log artifact for the spike.
test-npm-spike-v1/baseline-output/ScanManifest_20260313102109408.json	Captured scan manifest artifact for the spike run.
test-npm-spike-v1-only/package.json	Spike project input for npm v1 lockfile generation/repro.
test-npm-spike-v1-only/package-lock.json	Spike npm v1 lockfile (full) captured for repro.
test-npm-spike-v1-only/baseline-output/GovCompDisc_Log_20260313102122587_32472.log	Captured run log artifact for the spike.
test-npm-spike-v1-only/baseline-output/ScanManifest_20260313102122594.json	Captured scan manifest artifact for the spike run.

Copilot's findings

Files not reviewed (3)

test-npm-spike-v1-only/package-lock.json: Language not supported
test-npm-spike-v1/package-lock.json: Language not supported
test-npm-spike/package-lock.json: Language not supported

Files reviewed: 16/24 changed files
Comments generated: 1

Copilot · 2026-04-06T18:39:16Z

src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs

+
+        target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);
+        target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);
+    }
+


When subsuming/merging DetectedComponent instances, MergeComponentMetadata currently merges container metadata + LicensesConcluded/Suppliers, but does not merge FilePaths (custom locations added by detectors via AddComponentFilePath) or TargetFrameworks (added in RegisterUsage). Because bare entries (and duplicate Id entries) can be dropped during reconciliation, those locations/frameworks can be lost from ComponentsFound. Consider unioning source.FilePaths into target.FilePaths and merging TargetFrameworks similarly to the existing MergeTargetFrameworks logic so this metadata survives reconciliation.

Suggested change

target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);

target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);

}

target.FilePaths = MergeAndNormalizeStrings(target.FilePaths, source.FilePaths, StringComparer.Ordinal);

target.TargetFrameworks = MergeAndNormalizeStrings(target.TargetFrameworks, source.TargetFrameworks, StringComparer.OrdinalIgnoreCase);

target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);

target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);

}

private static IList<string> MergeAndNormalizeStrings(IList<string> target, IList<string> source, StringComparer comparer)

{

if (target == null && source == null)

{

return null;

}

var merged = new HashSet<string>(comparer);

if (target != null)

{

merged.UnionWith(target.Where(x => x != null));

}

if (source != null)

{

merged.UnionWith(source.Where(x => x != null));

}

return merged.OrderBy(x => x, comparer).ToList();

}

Not applicable here. FilePaths and TargetFrameworks are not populated at this stage - they get filled downstream in GatherSetOfDetectedComponentsUnmerged from graph data. The original GetDetectedComponents code also only merged licenses, suppliers, and containers, so this is consistent with existing behavior.

Within a single detector, components registered under bare Ids (no DownloadUrl/SourceUrl) are now merged into their rich counterparts sharing the same BaseId. Changes: - ComponentRecorder.GetDetectedComponents(): group by BaseId, merge bare metadata (licenses, suppliers, containers) into all rich entries - DefaultGraphTranslationService.GatherSetOfDetectedComponentsUnmerged(): extend graph lookup to match on BaseId so rich components absorb graph data (roots, ancestors, devDep, scope, file paths) from bare-Id graphs - Tests for both reconciliation levels Addresses work item #2372676. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 2, 2026 22:37

AMaini503 requested a review from a team as a code owner April 2, 2026 22:37

AMaini503 requested a review from schmittjoseph April 2, 2026 22:37

Copilot started reviewing on behalf of AMaini503 April 2, 2026 22:37 View session

AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from eefbb60 to 0ec6835 Compare April 2, 2026 22:41

Copilot AI reviewed Apr 2, 2026

View reviewed changes

AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from 0ec6835 to f1ee3e5 Compare April 2, 2026 23:03

grvillic reviewed Apr 4, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings April 6, 2026 18:33

AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from f1ee3e5 to 699abfc Compare April 6, 2026 18:33

Copilot started reviewing on behalf of AMaini503 April 6, 2026 18:33 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from 699abfc to b06a7d2 Compare April 6, 2026 18:43

grvillic approved these changes Apr 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconcile bare/rich component identities in ComponentsFound#1760

Reconcile bare/rich component identities in ComponentsFound#1760
AMaini503 wants to merge 1 commit intomainfrom
user/aamaini/component-identity-merging-level1

AMaini503 commented Apr 2, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

grvillic Apr 4, 2026

Uh oh!

AMaini503 Apr 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

AMaini503 Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-        target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);
-        target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);
-    }
+        target.FilePaths = MergeAndNormalizeStrings(target.FilePaths, source.FilePaths, StringComparer.Ordinal);
+        target.TargetFrameworks = MergeAndNormalizeStrings(target.TargetFrameworks, source.TargetFrameworks, StringComparer.OrdinalIgnoreCase);
+        target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);
+        target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);
+    }
+    private static IList<string> MergeAndNormalizeStrings(IList<string> target, IList<string> source, StringComparer comparer)
+    {
+        if (target == null && source == null)
+        {
+            return null;
+        }
+        var merged = new HashSet<string>(comparer);
+        if (target != null)
+        {
+            merged.UnionWith(target.Where(x => x != null));
+        }
+        if (source != null)
+        {
+            merged.UnionWith(source.Where(x => x != null));
+        }
+        return merged.OrderBy(x => x, comparer).ToList();
+    }

Conversation

AMaini503 commented Apr 2, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

grvillic Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

AMaini503 Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

AMaini503 Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants