Skip to content

Reconcile bare/rich component identities in ComponentsFound#1760

Open
AMaini503 wants to merge 1 commit intomainfrom
user/aamaini/component-identity-merging-level1
Open

Reconcile bare/rich component identities in ComponentsFound#1760
AMaini503 wants to merge 1 commit intomainfrom
user/aamaini/component-identity-merging-level1

Conversation

@AMaini503
Copy link
Copy Markdown
Contributor

Within a single detector, components registered under bare Ids (no DownloadUrl/SourceUrl) are now merged into their rich counterparts sharing the same BaseId.

Changes:

  • ComponentRecorder.GetDetectedComponents(): group by BaseId, merge bare metadata (licenses, suppliers, containers) into all rich entries
  • DefaultGraphTranslationService.GatherSetOfDetectedComponentsUnmerged(): extend graph lookup to match on BaseId so rich components absorb graph data (roots, ancestors, devDep, scope, file paths) from bare-Id graphs
  • Tests for both reconciliation levels

Addresses work item #2372676.

Copilot AI review requested due to automatic review settings April 2, 2026 22:37
@AMaini503 AMaini503 requested a review from a team as a code owner April 2, 2026 22:37
@AMaini503 AMaini503 requested a review from schmittjoseph April 2, 2026 22:37
@AMaini503 AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from eefbb60 to 0ec6835 Compare April 2, 2026 22:41
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

👋 Hi! It looks like you modified some files in the Detectors folder.
You may need to bump the detector versions if any of the following scenarios apply:

  • The detector detects more or fewer components than before
  • The detector generates different parent/child graph relationships than before
  • The detector generates different devDependencies values than before

If none of the above scenarios apply, feel free to ignore this comment 🙂

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the orchestrator’s component aggregation so that, within a single detector run, components registered under a “bare” identity (no provenance URLs) are reconciled into their corresponding “rich” identities (same BaseId, but with DownloadUrl/SourceUrl). This ensures ComponentsFound reflects a single coherent identity while retaining important metadata and graph-derived fields.

Changes:

  • Reconcile detected components by BaseId, merging bare metadata into rich entries and dropping the bare entries.
  • Extend graph translation to associate graph data with components by either Id or BaseId, so rich components pick up graph info from bare-id graphs.
  • Add unit tests covering reconciliation behavior at both the ComponentRecorder and graph translation layers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs Groups detected components by BaseId and merges bare metadata into rich components.
src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs Applies graph roots/ancestors/dev-dep/scope/locations using Id or BaseId matching.
test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs Adds tests validating bare/rich reconciliation behavior for component aggregation.
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs Adds tests validating rich components absorb graph data from bare-id graphs.

@AMaini503 AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from 0ec6835 to f1ee3e5 Compare April 2, 2026 23:03
// Information about each component is relative to all of the graphs it is present in, so we take all graphs containing a given component and apply the graph data.
foreach (var graphKvp in dependencyGraphsByLocation.Where(x => x.Value.Contains(component.Component.Id)))
// Also match on BaseId to pick up graphs that registered this component under a bare Id (before reconciliation promoted it to a rich Id).
foreach (var graphKvp in dependencyGraphsByLocation.Where(x => x.Value.Contains(component.Component.Id) || (component.Component.Id != component.Component.BaseId && x.Value.Contains(component.Component.BaseId))))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is a bit difficult to follow, can we make a private utility method that returns true when applicable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Extracted to a GraphContainsComponent helper with a doc comment explaining the BaseId fallback logic.

Copilot AI review requested due to automatic review settings April 6, 2026 18:33
@AMaini503 AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from f1ee3e5 to 699abfc Compare April 6, 2026 18:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses identity fragmentation in ComponentsFound when the same package is registered with both a bare Id (no DownloadUrl/SourceUrl) and a rich Id (with URL data) within a single detector, by reconciling on BaseId so rich components absorb metadata/graph enrichment from their bare counterparts.

Changes:

  • Reconcile ComponentRecorder.GetDetectedComponents() output by grouping on BaseId and merging bare component metadata into all rich entries.
  • Extend DefaultGraphTranslationService enrichment to match dependency graphs by either full Id or BaseId, allowing rich entries to pick up roots/ancestors/scope/devDep/locations from bare-Id graphs.
  • Add unit tests and supporting spike/sample artifacts documenting npm lockfile behavior and identity reconciliation rationale.
Show a summary per file
File Description
src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs Reconciles bare vs rich detected components by BaseId and merges selected metadata into rich entries.
src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs Updates graph enrichment to treat rich components as present when graphs contain the bare BaseId.
test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs Adds tests validating bare→rich subsumption and metadata merge semantics in GetDetectedComponents().
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs Adds tests ensuring graph-derived data is transferred from bare-Id graphs to rich components.
docs/component-identity-reconciliation-design.md Design doc describing reconciliation points and semantics for ComponentsFound vs DependencyGraphs.
docs/component-identity-merging.md Expanded design discussion and scenarios for bare/rich merging behavior.
docs/npm-detector-spike-plan.md Work breakdown and analysis plan for npm detector metadata population (spike).
docs/npm-detector-spike-findings.md Spike findings summarizing lockfile field availability and detector gaps.
docs/npm-lockfile-samples/README.md Documentation of npm lockfile structural differences and detector read paths.
docs/npm-lockfile-samples/v1-lockfile-sample.json Trimmed v1 lockfile excerpt used as documentation reference.
docs/npm-lockfile-samples/v2-lockfile-sample.json Trimmed v2 lockfile excerpt used as documentation reference.
docs/npm-lockfile-samples/v3-lockfile-sample.json Trimmed v3 lockfile excerpt used as documentation reference.
test-npm-spike/package.json Spike project input for npm v3 lockfile generation/repro.
test-npm-spike/package-lock.json Spike npm v3 lockfile (full) captured for repro.
test-npm-spike/baseline-output/GovCompDisc_Log_20260313102052505_25872.log Captured run log artifact for the spike.
test-npm-spike/baseline-output/ScanManifest_20260313102052512.json Captured scan manifest artifact for the spike run.
test-npm-spike-v1/package.json Spike project input for npm v2 lockfile generation/repro.
test-npm-spike-v1/package-lock.json Spike npm v2 lockfile (full) captured for repro.
test-npm-spike-v1/baseline-output/GovCompDisc_Log_20260313102109400_18928.log Captured run log artifact for the spike.
test-npm-spike-v1/baseline-output/ScanManifest_20260313102109408.json Captured scan manifest artifact for the spike run.
test-npm-spike-v1-only/package.json Spike project input for npm v1 lockfile generation/repro.
test-npm-spike-v1-only/package-lock.json Spike npm v1 lockfile (full) captured for repro.
test-npm-spike-v1-only/baseline-output/GovCompDisc_Log_20260313102122587_32472.log Captured run log artifact for the spike.
test-npm-spike-v1-only/baseline-output/ScanManifest_20260313102122594.json Captured scan manifest artifact for the spike run.

Copilot's findings

Files not reviewed (3)
  • test-npm-spike-v1-only/package-lock.json: Language not supported
  • test-npm-spike-v1/package-lock.json: Language not supported
  • test-npm-spike/package-lock.json: Language not supported
  • Files reviewed: 16/24 changed files
  • Comments generated: 1

Comment on lines +110 to +114

target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);
target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);
}

Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When subsuming/merging DetectedComponent instances, MergeComponentMetadata currently merges container metadata + LicensesConcluded/Suppliers, but does not merge FilePaths (custom locations added by detectors via AddComponentFilePath) or TargetFrameworks (added in RegisterUsage). Because bare entries (and duplicate Id entries) can be dropped during reconciliation, those locations/frameworks can be lost from ComponentsFound. Consider unioning source.FilePaths into target.FilePaths and merging TargetFrameworks similarly to the existing MergeTargetFrameworks logic so this metadata survives reconciliation.

Suggested change
target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);
target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);
}
target.FilePaths = MergeAndNormalizeStrings(target.FilePaths, source.FilePaths, StringComparer.Ordinal);
target.TargetFrameworks = MergeAndNormalizeStrings(target.TargetFrameworks, source.TargetFrameworks, StringComparer.OrdinalIgnoreCase);
target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded);
target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers);
}
private static IList<string> MergeAndNormalizeStrings(IList<string> target, IList<string> source, StringComparer comparer)
{
if (target == null && source == null)
{
return null;
}
var merged = new HashSet<string>(comparer);
if (target != null)
{
merged.UnionWith(target.Where(x => x != null));
}
if (source != null)
{
merged.UnionWith(source.Where(x => x != null));
}
return merged.OrderBy(x => x, comparer).ToList();
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not applicable here. FilePaths and TargetFrameworks are not populated at this stage - they get filled downstream in GatherSetOfDetectedComponentsUnmerged from graph data. The original GetDetectedComponents code also only merged licenses, suppliers, and containers, so this is consistent with existing behavior.

Within a single detector, components registered under bare Ids (no
DownloadUrl/SourceUrl) are now merged into their rich counterparts
sharing the same BaseId.

Changes:
- ComponentRecorder.GetDetectedComponents(): group by BaseId, merge
  bare metadata (licenses, suppliers, containers) into all rich entries
- DefaultGraphTranslationService.GatherSetOfDetectedComponentsUnmerged():
  extend graph lookup to match on BaseId so rich components absorb
  graph data (roots, ancestors, devDep, scope, file paths) from
  bare-Id graphs
- Tests for both reconciliation levels

Addresses work item #2372676.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@AMaini503 AMaini503 force-pushed the user/aamaini/component-identity-merging-level1 branch from 699abfc to b06a7d2 Compare April 6, 2026 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants