Skip to content

netvsp & net_mana - instrument VTL0 & host calls at risk of hanging#3170

Open
ben-zen wants to merge 5 commits intomicrosoft:mainfrom
ben-zen:netvsp-span-guest-calls
Open

netvsp & net_mana - instrument VTL0 & host calls at risk of hanging#3170
ben-zen wants to merge 5 commits intomicrosoft:mainfrom
ben-zen:netvsp-span-guest-calls

Conversation

@ben-zen
Copy link
Copy Markdown
Contributor

@ben-zen ben-zen commented Apr 1, 2026

There's a bunch of async operations with long or no timeout which are at risk of hanging the NetVSP code; adding spans to those actions makes them more obvious in tracing, with additional context to track down miscreant calls or make a misbehaving VM more obvious.

@ben-zen ben-zen requested a review from a team as a code owner April 1, 2026 00:20
Copilot AI review requested due to automatic review settings April 1, 2026 00:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves observability around NetVSP and MANA endpoint operations that can take a long time or potentially hang, by adding tracing spans with relevant identifiers (instance/vport/VF IDs). This helps correlate stalls in async shutdown/reconfiguration paths when debugging misbehaving VMs or host/VTL0 interactions.

Changes:

  • Add an info_span around NetVSP coordinator endpoint shutdown.
  • Add info_span instrumentation around MANA endpoint teardown steps (RX disable, vport resource destruction, queue drain wait).
  • Add info_span instrumentation around Underhill VTL0 VF offer/remove notification calls, plus an explicit log when VTL0 bus isn’t present.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
vm/devices/net/netvsp/src/lib.rs Instruments coordinator endpoint stop().await with an info span including instance_id.
vm/devices/net/net_mana/src/lib.rs Adds tracing::Instrument and spans around endpoint stop/cleanup operations with vport_id.
openhcl/underhill_core/src/emuplat/netvsp.rs Adds spans around VTL0 VF removal notice and VF offer call to better pinpoint stalls and add context.

Copy link
Copy Markdown
Contributor

@erfrimod erfrimod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to trace one more thing, but otherwise ready to go. :)

@benhillis benhillis added the enhancement New feature or request label Apr 6, 2026
take Brian's better comment.

Co-authored-by: Brian Perkins <Brian-Perkins@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 7, 2026 18:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment on lines +873 to 882
} else {
tracing::info!(
vtl2_vfid,
vtl0_vfid = vtl0_vfid_from_bus_control(&self.vtl0_bus_control),
tracing::info!(
vtl2_vfid,
%vtl0_bus_control,
"Ignoring VTL0 device request from guest"
);
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else branch has a malformed tracing::info! invocation (a second tracing::info! starts before the first is closed), which will not compile. Collapse this into a single tracing::info! call that includes all desired fields (vtl2_vfid, vtl0_vfid, %vtl0_bus_control) and the message, and ensure the macro invocation is properly closed.

Copilot uses AI. Check for mistakes.
Comment on lines +1338 to +1346
self.coordinator
.task_mut()
.endpoint
.stop()
.instrument(tracing::info_span!(
"stopping coordinator endpoint",
instance_id = %self.instance_id,
))
.await;
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses .instrument(...), which requires the tracing::Instrument trait to be in scope for method resolution. If this module doesn’t already use tracing::Instrument;, the build will fail—add that import alongside the other use statements (or use UFCS).

Copilot uses AI. Check for mistakes.
Comment on lines +465 to +471
if let Err(err) = self
.send_vf_state_change_notifications()
.instrument(tracing::info_span!(
"sending VTL0 VF removal notice",
vtl2_vfid,
vtl0_bus = %bus_control))
.await
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses .instrument(...), which requires the tracing::Instrument trait to be in scope for method resolution. If this module doesn’t already use tracing::Instrument;, compilation will fail—add the import near the top-level use list (or switch to UFCS to avoid relying on the trait being in scope).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants