[AI Generated] BugFix: Handle missing cache topology in lscpu output for verify_l3_cache #4458

Open
Gnandeep99 wants to merge 4 commits into microsoft:main from Gnandeep99:bugfix/l3-cache-skip-unavailable-topology_010526_141205

@Gnandeep99 (Collaborator)

Summary

On confidential VMs (e.g. Standard_DC2ads_v5), lscpu --extended=cpu,node,socket,cache reports "-" in the CACHE column instead of the usual L1d:L1i:L2:L3 cache-ID format. This caused verify_l3_cache to fail with an assertion error. The fix adds a secondary regex that accepts the "-" format and raises SkippedException when cache topology is not exposed to the guest.
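The two CACHE formats can be illustrated with a minimal parsing sketch. It assumes each data row of lscpu --extended=cpu,node,socket,cache has the shape CPU NODE SOCKET CACHE, where CACHE is either four colon-separated IDs or a single "-". The regexes, the CpuCacheInfo dataclass, and parse_lscpu_extended are hypothetical illustrations, not LISA's actual Lscpu implementation; UNKNOWN_CACHE_ID reuses the name introduced in a later commit with the -1 value from the original diff, and ValueError stands in for LISA's LisaException to keep the sketch self-contained.

```python
import re
from dataclasses import dataclass
from typing import List

# Sentinel for "cache ID not reported"; -1 matches the value in the original diff.
UNKNOWN_CACHE_ID = -1

# Primary format: CPU NODE SOCKET L1d:L1i:L2:L3, e.g. "  1    0      0 1:1:1:0"
_WITH_CACHE = re.compile(
    r"^\s*(?P<cpu>\d+)\s+(?P<node>\d+)\s+(?P<socket>\d+)\s+"
    r"(?P<l1d>\d+):(?P<l1i>\d+):(?P<l2>\d+):(?P<l3>\d+)\s*$"
)
# Fallback format: cache topology hidden, so the CACHE column is a single "-"
_NO_CACHE = re.compile(
    r"^\s*(?P<cpu>\d+)\s+(?P<node>\d+)\s+(?P<socket>\d+)\s+-\s*$"
)


@dataclass
class CpuCacheInfo:
    cpu: int
    numa_node: int
    socket: int
    l1_data_cache: int
    l1_instruction_cache: int
    l2_cache: int
    l3_cache: int


def parse_lscpu_extended(output: str) -> List[CpuCacheInfo]:
    """Parse 'lscpu --extended=cpu,node,socket,cache' output (header row first)."""
    rows = [line for line in output.splitlines() if line.strip()]
    result: List[CpuCacheInfo] = []
    for line in rows[1:]:  # skip the header row
        match = _WITH_CACHE.match(line)
        if match:
            l1d, l1i, l2, l3 = (int(match[g]) for g in ("l1d", "l1i", "l2", "l3"))
        else:
            match = _NO_CACHE.match(line)
            if not match:
                # A real exception, not assert, so python -O cannot strip the check.
                raise ValueError(f"unexpected lscpu line: {line!r}")
            l1d = l1i = l2 = l3 = UNKNOWN_CACHE_ID
        result.append(
            CpuCacheInfo(
                cpu=int(match["cpu"]),
                numa_node=int(match["node"]),
                socket=int(match["socket"]),
                l1_data_cache=l1d,
                l1_instruction_cache=l1i,
                l2_cache=l2,
                l3_cache=l3,
            )
        )
    return result


sample = "CPU NODE SOCKET L1d:L1i:L2:L3\n  0    0      0 0:0:0:0\n  1    0      0 1:1:1:0\n"
hidden = "CPU NODE SOCKET L1d:L1i:L2:L3\n  0    0      0 -\n  1    0      0 -\n"
print([c.l3_cache for c in parse_lscpu_extended(sample)])  # [0, 0]
print([c.l3_cache for c in parse_lscpu_extended(hidden)])  # [-1, -1]
```

Trying the primary pattern first and falling back to the "-" pattern keeps the common case unchanged, while any row matching neither still fails loudly instead of being silently dropped.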

Validation Results

Image | VM Size | Result
Canonical ubuntu-24_04-lts server 24.04.202408210 | Standard_D2ds_v5 | PASSED
Canonical 0001-com-ubuntu-confidential-vm-jammy 22_04-lts-cvm 22.04.202604150 | Standard_DC2ads_v5 | SKIPPED (expected)

Copilot AI review requested due to automatic review settings May 1, 2026 21:53

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates LISA’s CPU cache/NUMA validation to handle lscpu --extended=cpu,node,socket,cache outputs where cache topology is hidden (reported as -), which occurs on some confidential VM sizes. It extends the lscpu parser to accept the alternate format and skips verify_l3_cache when cache topology cannot be verified.

Changes:

  • Extend Lscpu.get_cpu_info() parsing to accept CACHE values of - and return sentinel values for cache IDs.
  • Update verify_l3_cache and its helper path to raise SkippedException when cache topology isn’t exposed.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
lisa/tools/lscpu.py | Adds a secondary parse path for lscpu --extended output where cache IDs are not provided.
lisa/microsoft/testsuites/core/cpu.py | Skips L3 cache mapping verification when lscpu reports no cache topology.

Comment thread: lisa/tools/lscpu.py (outdated)
Comment on lines +313 to +317
    l1_data_cache=-1,
    l1_instruction_cache=-1,
    l2_cache=-1,
    l3_cache=-1,
)

Copilot AI May 1, 2026

Using literal -1 as a sentinel for unknown cache IDs makes the meaning easy to miss and spreads a magic value across the codebase (the tests also check for -1). Please define a named constant (e.g., UNKNOWN_CACHE_ID) or switch the cache fields to Optional[int] and use None for "unknown" so callers can reliably detect this state without hardcoding -1.

Copilot uses AI. Check for mistakes.
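The Optional[int] alternative the reviewer mentions could look roughly like this sketch. The field names l1_data_cache and l3_cache follow the snippet above, but CpuCacheIds and is_l3_known are hypothetical names, and this is not the design the PR adopted (a later commit introduces a UNKNOWN_CACHE_ID constant instead).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuCacheIds:
    cpu: int
    # None means lscpu printed "-" and the ID is unknown; no magic -1 is needed.
    l1_data_cache: Optional[int] = None
    l3_cache: Optional[int] = None

    @property
    def is_l3_known(self) -> bool:
        # Compare with None explicitly: 0 is a valid cache ID.
        return self.l3_cache is not None


visible = CpuCacheIds(cpu=0, l1_data_cache=0, l3_cache=0)
hidden = CpuCacheIds(cpu=1)
print(visible.is_l3_known, hidden.is_l3_known)  # True False
```

One pitfall with None as the marker: a truthiness check such as "if cpu.l3_cache:" would misclassify the perfectly valid ID 0 as unknown, so callers need an explicit "is None" comparison. With mypy's default strict-Optional checking, arithmetic or comparisons on an unchecked Optional[int] are also flagged, which is the "callers can reliably detect this state" benefit the comment describes.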
Comment thread: lisa/tools/lscpu.py (outdated)
)
)
continue
assert False, f"lscpu NUMA node mapping is not in expected format: {item}"

Copilot AI May 1, 2026

The fallback failure path uses a bare assert False to signal an unexpected lscpu format. Python strips assert statements when the interpreter runs with -O, which would silently skip this check and return incomplete data. Please raise a real exception (e.g., LisaException) or use assertpy (assert_that(...).described_as(...).is_true()) so the failure is always enforced.

Suggested change
Replace:
assert False, f"lscpu NUMA node mapping is not in expected format: {item}"
With:
raise LisaException(
    "lscpu NUMA node mapping is not in the expected format: "
    f"{item}. Verify the output of "
    "'lscpu --extended=cpu,node,socket,cache' on the target node "
    "and update the parser if the format has changed."
)
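The -O behavior described in this comment is easy to confirm. This sketch runs an assert False statement in a child interpreter with and without -O; SNIPPET and run_snippet are illustrative names. It removes PYTHONOPTIMIZE from the child's environment so that the unoptimized run is not affected by that variable.

```python
import os
import subprocess
import sys

# The assert mimics the original fallback path; the print shows whether execution continued.
SNIPPET = "assert False, 'unexpected lscpu format'\nprint('reached past the assert')"


def run_snippet(optimize: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a fresh interpreter, optionally with -O."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    cmd = [sys.executable] + (["-O"] if optimize else []) + ["-c", SNIPPET]
    return subprocess.run(cmd, capture_output=True, text=True, env=env)


normal = run_snippet(optimize=False)
optimized = run_snippet(optimize=True)
print(normal.returncode, "AssertionError" in normal.stderr)  # 1 True
print(optimized.returncode, optimized.stdout.strip())  # 0 reached past the assert
```

Under -O the assert disappears and execution falls through, which is exactly how a parser would return incomplete data without any error. An explicit raise is never removed by the interpreter.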

Comment thread: lisa/microsoft/testsuites/core/cpu.py (outdated)
Comment on lines +127 to +132
if any(cpu.l3_cache == -1 for cpu in cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information (likely a confidential VM "
        "or a VM size that does not expose cache topology to the guest)."
    )

Copilot AI May 1, 2026

The skip condition uses any(cpu.l3_cache == -1 ...), but the comment says lscpu reports "-" for all cache values. If only a subset of CPUs has unknown cache IDs (mixed output), any() will skip and potentially hide a real parsing/host issue. Consider using all(...) for the expected "no cache info at all" case, and treat a mixed state as a test failure (or raise a more specific exception).

Suggested change
Replace:
if any(cpu.l3_cache == -1 for cpu in cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information (likely a confidential VM "
        "or a VM size that does not expose cache topology to the guest)."
    )
With:
unknown_l3_cache_count = sum(1 for cpu in cpu_info if cpu.l3_cache == -1)
if unknown_l3_cache_count == len(cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information (likely a confidential VM "
        "or a VM size that does not expose cache topology to the guest)."
    )
if unknown_l3_cache_count:
    raise LisaException(
        "Inconsistent L3 cache topology reported by lscpu: "
        f"{unknown_l3_cache_count} of {len(cpu_info)} CPUs have unknown "
        "L3 cache IDs while others have valid values. Investigate lscpu "
        "parsing or host cache-topology exposure on this VM."
    )
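The any()/all() distinction amounts to a three-way decision: verify, skip, or fail. A minimal sketch, assuming plain per-CPU L3 IDs instead of LISA's CPU info objects and returning strings instead of raising SkippedException or LisaException (both simplifications are for illustration only; classify_l3_topology is a hypothetical name). One edge case is worth noting: all() on an empty sequence is True, and count == len also holds for 0 == 0, so an empty cpu_info list would be reported as "cache topology not exposed" unless it is guarded separately.

```python
from typing import Sequence

UNKNOWN_CACHE_ID = -1  # assumed sentinel value, as in the original diff


def classify_l3_topology(l3_ids: Sequence[int]) -> str:
    """Return "verify", "skip", or "fail" for a list of per-CPU L3 cache IDs."""
    if not l3_ids:
        # all([]) is True and 0 == len([]), so without this guard an empty
        # parse result would be skipped instead of reported as a failure.
        return "fail"
    unknown = sum(1 for cache_id in l3_ids if cache_id == UNKNOWN_CACHE_ID)
    if unknown == len(l3_ids):
        return "skip"  # topology entirely hidden, e.g. on some confidential VMs
    if unknown:
        return "fail"  # mixed state: some CPUs report IDs, others do not
    return "verify"


print(classify_l3_topology([0, 0, 1, 1]))  # verify
print(classify_l3_topology([-1, -1]))  # skip
print(classify_l3_topology([0, -1]))  # fail
```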

Comment thread: lisa/microsoft/testsuites/core/cpu.py (outdated)
Comment on lines +312 to +316
if any(cpu.l3_cache == -1 for cpu in cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information."
    )

Copilot AI May 1, 2026

Same as above: using any(cpu.l3_cache == -1 ...) will skip even if only some CPUs have unknown cache IDs. If the intent is to skip only when cache topology is entirely hidden, switch to all(...) and fail on mixed/partial cache data so real regressions aren't masked.

Suggested change
Replace:
if any(cpu.l3_cache == -1 for cpu in cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information."
    )
With:
if all(cpu.l3_cache == -1 for cpu in cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information."
    )
if any(cpu.l3_cache == -1 for cpu in cpu_info):
    raise LisaException(
        "Cache topology is partially exposed on this VM: some CPUs "
        "report unknown L3 cache IDs while others do not. Verify the "
        "guest cache topology reporting and investigate inconsistent "
        "lscpu output before rerunning the test."
    )

@Gnandeep99 force-pushed the bugfix/l3-cache-skip-unavailable-topology_010526_141205 branch from eb32522 to 9a11295 on May 1, 2026 22:00
…y handling

- Introduce UNKNOWN_CACHE_ID constant instead of magic -1 sentinel

- Replace the bare assert False with LisaException so the failure can't be stripped by python -O

- Use full-population check (count == len) and raise LisaException on partial/mixed cache state, so real regressions aren't masked by an any() skip
Copilot AI review requested due to automatic review settings June 9, 2026 06:39

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines 322 to +337
cpu_info = node.tools[Lscpu].get_cpu_info()
unknown_l3_cache_count = sum(
    1 for cpu in cpu_info if cpu.l3_cache == UNKNOWN_CACHE_ID
)
if unknown_l3_cache_count == len(cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information."
    )
if unknown_l3_cache_count:
    raise LisaException(
        "Inconsistent L3 cache topology reported by lscpu: "
        f"{unknown_l3_cache_count} of {len(cpu_info)} CPUs have unknown "
        "L3 cache IDs while others have valid values. Investigate lscpu "
        "parsing or host cache-topology exposure on this VM."
    )
…g cache-topology helper

Fixes flake8 C901 (complexity 16 > max 15) introduced by previous commit.
Copilot AI review requested due to automatic review settings June 9, 2026 06:49
… parameter

Fixes mypy type-arg error: use list[Any] to match existing helper signatures.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

3 participants