Reconcile bare/rich component identities in ComponentsFound#1760
Reconcile bare/rich component identities in ComponentsFound#1760
Conversation
eefbb60 to
0ec6835
Compare
|
👋 Hi! It looks like you modified some files in the
If none of the above scenarios apply, feel free to ignore this comment 🙂 |
There was a problem hiding this comment.
Pull request overview
This PR updates the orchestrator’s component aggregation so that, within a single detector run, components registered under a “bare” identity (no provenance URLs) are reconciled into their corresponding “rich” identities (same BaseId, but with DownloadUrl/SourceUrl). This ensures ComponentsFound reflects a single coherent identity while retaining important metadata and graph-derived fields.
Changes:
- Reconcile detected components by
BaseId, merging bare metadata into rich entries and dropping the bare entries. - Extend graph translation to associate graph data with components by either
IdorBaseId, so rich components pick up graph info from bare-id graphs. - Add unit tests covering reconciliation behavior at both the
ComponentRecorderand graph translation layers.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs |
Groups detected components by BaseId and merges bare metadata into rich components. |
src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs |
Applies graph roots/ancestors/dev-dep/scope/locations using Id or BaseId matching. |
test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs |
Adds tests validating bare/rich reconciliation behavior for component aggregation. |
test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs |
Adds tests validating rich components absorb graph data from bare-id graphs. |
src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs
Show resolved
Hide resolved
src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs
Outdated
Show resolved
Hide resolved
0ec6835 to
f1ee3e5
Compare
| // Information about each component is relative to all of the graphs it is present in, so we take all graphs containing a given component and apply the graph data. | ||
| foreach (var graphKvp in dependencyGraphsByLocation.Where(x => x.Value.Contains(component.Component.Id))) | ||
| // Also match on BaseId to pick up graphs that registered this component under a bare Id (before reconciliation promoted it to a rich Id). | ||
| foreach (var graphKvp in dependencyGraphsByLocation.Where(x => x.Value.Contains(component.Component.Id) || (component.Component.Id != component.Component.BaseId && x.Value.Contains(component.Component.BaseId)))) |
There was a problem hiding this comment.
this line is a bit difficult to follow, can we make a private utility method that returns true when applicable.
There was a problem hiding this comment.
Good call. Extracted to a GraphContainsComponent helper with a doc comment explaining the BaseId fallback logic.
f1ee3e5 to
699abfc
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses identity fragmentation in ComponentsFound when the same package is registered with both a bare Id (no DownloadUrl/SourceUrl) and a rich Id (with URL data) within a single detector, by reconciling on BaseId so rich components absorb metadata/graph enrichment from their bare counterparts.
Changes:
- Reconcile
ComponentRecorder.GetDetectedComponents()output by grouping onBaseIdand merging bare component metadata into all rich entries. - Extend
DefaultGraphTranslationServiceenrichment to match dependency graphs by either full Id orBaseId, allowing rich entries to pick up roots/ancestors/scope/devDep/locations from bare-Id graphs. - Add unit tests and supporting spike/sample artifacts documenting npm lockfile behavior and identity reconciliation rationale.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.ComponentDetection.Common/DependencyGraph/ComponentRecorder.cs | Reconciles bare vs rich detected components by BaseId and merges selected metadata into rich entries. |
| src/Microsoft.ComponentDetection.Orchestrator/Services/GraphTranslation/DefaultGraphTranslationService.cs | Updates graph enrichment to treat rich components as present when graphs contain the bare BaseId. |
| test/Microsoft.ComponentDetection.Common.Tests/ComponentRecorderTests.cs | Adds tests validating bare→rich subsumption and metadata merge semantics in GetDetectedComponents(). |
| test/Microsoft.ComponentDetection.Orchestrator.Tests/Services/DefaultGraphTranslationServiceTests.cs | Adds tests ensuring graph-derived data is transferred from bare-Id graphs to rich components. |
| docs/component-identity-reconciliation-design.md | Design doc describing reconciliation points and semantics for ComponentsFound vs DependencyGraphs. |
| docs/component-identity-merging.md | Expanded design discussion and scenarios for bare/rich merging behavior. |
| docs/npm-detector-spike-plan.md | Work breakdown and analysis plan for npm detector metadata population (spike). |
| docs/npm-detector-spike-findings.md | Spike findings summarizing lockfile field availability and detector gaps. |
| docs/npm-lockfile-samples/README.md | Documentation of npm lockfile structural differences and detector read paths. |
| docs/npm-lockfile-samples/v1-lockfile-sample.json | Trimmed v1 lockfile excerpt used as documentation reference. |
| docs/npm-lockfile-samples/v2-lockfile-sample.json | Trimmed v2 lockfile excerpt used as documentation reference. |
| docs/npm-lockfile-samples/v3-lockfile-sample.json | Trimmed v3 lockfile excerpt used as documentation reference. |
| test-npm-spike/package.json | Spike project input for npm v3 lockfile generation/repro. |
| test-npm-spike/package-lock.json | Spike npm v3 lockfile (full) captured for repro. |
| test-npm-spike/baseline-output/GovCompDisc_Log_20260313102052505_25872.log | Captured run log artifact for the spike. |
| test-npm-spike/baseline-output/ScanManifest_20260313102052512.json | Captured scan manifest artifact for the spike run. |
| test-npm-spike-v1/package.json | Spike project input for npm v2 lockfile generation/repro. |
| test-npm-spike-v1/package-lock.json | Spike npm v2 lockfile (full) captured for repro. |
| test-npm-spike-v1/baseline-output/GovCompDisc_Log_20260313102109400_18928.log | Captured run log artifact for the spike. |
| test-npm-spike-v1/baseline-output/ScanManifest_20260313102109408.json | Captured scan manifest artifact for the spike run. |
| test-npm-spike-v1-only/package.json | Spike project input for npm v1 lockfile generation/repro. |
| test-npm-spike-v1-only/package-lock.json | Spike npm v1 lockfile (full) captured for repro. |
| test-npm-spike-v1-only/baseline-output/GovCompDisc_Log_20260313102122587_32472.log | Captured run log artifact for the spike. |
| test-npm-spike-v1-only/baseline-output/ScanManifest_20260313102122594.json | Captured scan manifest artifact for the spike run. |
Copilot's findings
Files not reviewed (3)
- test-npm-spike-v1-only/package-lock.json: Language not supported
- test-npm-spike-v1/package-lock.json: Language not supported
- test-npm-spike/package-lock.json: Language not supported
- Files reviewed: 16/24 changed files
- Comments generated: 1
|
|
||
| target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded); | ||
| target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers); | ||
| } | ||
|
|
There was a problem hiding this comment.
When subsuming/merging DetectedComponent instances, MergeComponentMetadata currently merges container metadata + LicensesConcluded/Suppliers, but does not merge FilePaths (custom locations added by detectors via AddComponentFilePath) or TargetFrameworks (added in RegisterUsage). Because bare entries (and duplicate Id entries) can be dropped during reconciliation, those locations/frameworks can be lost from ComponentsFound. Consider unioning source.FilePaths into target.FilePaths and merging TargetFrameworks similarly to the existing MergeTargetFrameworks logic so this metadata survives reconciliation.
| target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded); | |
| target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers); | |
| } | |
| target.FilePaths = MergeAndNormalizeStrings(target.FilePaths, source.FilePaths, StringComparer.Ordinal); | |
| target.TargetFrameworks = MergeAndNormalizeStrings(target.TargetFrameworks, source.TargetFrameworks, StringComparer.OrdinalIgnoreCase); | |
| target.LicensesConcluded = MergeAndNormalizeLicenses(target.LicensesConcluded, source.LicensesConcluded); | |
| target.Suppliers = MergeAndNormalizeSuppliers(target.Suppliers, source.Suppliers); | |
| } | |
| private static IList<string> MergeAndNormalizeStrings(IList<string> target, IList<string> source, StringComparer comparer) | |
| { | |
| if (target == null && source == null) | |
| { | |
| return null; | |
| } | |
| var merged = new HashSet<string>(comparer); | |
| if (target != null) | |
| { | |
| merged.UnionWith(target.Where(x => x != null)); | |
| } | |
| if (source != null) | |
| { | |
| merged.UnionWith(source.Where(x => x != null)); | |
| } | |
| return merged.OrderBy(x => x, comparer).ToList(); | |
| } |
There was a problem hiding this comment.
Not applicable here. FilePaths and TargetFrameworks are not populated at this stage - they get filled downstream in GatherSetOfDetectedComponentsUnmerged from graph data. The original GetDetectedComponents code also only merged licenses, suppliers, and containers, so this is consistent with existing behavior.
Within a single detector, components registered under bare Ids (no DownloadUrl/SourceUrl) are now merged into their rich counterparts sharing the same BaseId. Changes: - ComponentRecorder.GetDetectedComponents(): group by BaseId, merge bare metadata (licenses, suppliers, containers) into all rich entries - DefaultGraphTranslationService.GatherSetOfDetectedComponentsUnmerged(): extend graph lookup to match on BaseId so rich components absorb graph data (roots, ancestors, devDep, scope, file paths) from bare-Id graphs - Tests for both reconciliation levels Addresses work item #2372676. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
699abfc to
b06a7d2
Compare
Within a single detector, components registered under bare Ids (no DownloadUrl/SourceUrl) are now merged into their rich counterparts sharing the same BaseId.
Changes:
Addresses work item #2372676.