Skip to content

burette: add TAP-backed network throughput test#3174

Open
jstarks wants to merge 2 commits intomicrosoft:mainfrom
jstarks:perf_tests_tap
Open

burette: add TAP-backed network throughput test#3174
jstarks wants to merge 2 commits intomicrosoft:mainfrom
jstarks:perf_tests_tap

Conversation

@jstarks
Copy link
Copy Markdown
Member

@jstarks jstarks commented Apr 1, 2026

The Consomme userspace networking stack is convenient but does not exercise the kernel datapath, making it a poor predictor of real-world VM networking performance. Add a TAP backend so burette can measure iperf3 throughput over a kernel TAP device in an isolated user/network namespace, giving a more realistic comparison point without requiring root privileges.

The TAP backend runs an iperf3 helper as a mesh child process that calls unshare(CLONE_NEWUSER | CLONE_NEWNET) while still single-threaded, creates and configures the TAP device inside the namespace, then serves iperf3 server instances via mesh RPC. The guest side detects NICs by MAC address rather than relying on device enumeration order, since VMBus GUID ordering is nondeterministic.

TapHandle in net_backend_resources is changed from carrying a device name string to a pre-opened OwnedFd, resolving an existing TODO and letting the opener (openvmm_entry or the burette helper) control namespace and permission details. The tap module is gated on cfg(unix) accordingly.

Select the backend with --backend consomme|tap alongside the existing --nic vmbus|virtio-net flag.

Copilot AI review requested due to automatic review settings April 1, 2026 18:50
@github-actions github-actions bot added the unsafe Related to unsafe code label Apr 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2026

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a TAP-backed networking mode to the burette iperf3 throughput benchmark so it can exercise the host kernel datapath (via a TAP device in an isolated user+net namespace), and updates the TAP resource plumbing to pass a pre-opened fd instead of a device-name string.

Changes:

  • Switch TapHandle (net backend resources) to carry a pre-opened OwnedFd, and update net_tap resolver + openvmm_entry parsers to open TAP fds at the caller.
  • Extend burette’s network test to support --backend consomme|tap, launching an iperf3 mesh helper subprocess (with a Linux unshare variant) and adding TAP NIC wiring + MAC-based NIC detection in the guest.
  • Add new burette helper module + Linux-only deps to support namespace/TAP setup and mesh RPC control of iperf3 servers.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
vm/devices/net/net_tap/src/resolver.rs Updates TAP endpoint resolver to wrap a pre-opened TAP fd.
vm/devices/net/net_backend_resources/src/lib.rs Changes TAP handle to OwnedFd and gates TAP module behind cfg(unix).
petri/burette/src/tests/network.rs Adds NetBackend selection, helper-driven iperf3 server execution, and Linux TAP backend VM/guest setup.
petri/burette/src/main.rs Adds helper subprocess entrypoints and exposes --backend CLI flag.
petri/burette/src/iperf_helper.rs New mesh RPC helper subprocess for iperf3 server management + Linux TAP namespace setup.
petri/burette/Cargo.toml Adds mesh + Linux-only networking/TAP-related dependencies.
openvmm/openvmm_entry/src/ttrpc/mod.rs Opens TAP device by name and passes the resulting fd via TapHandle.
openvmm/openvmm_entry/src/lib.rs Opens TAP device by name for CLI endpoint config and passes the fd via TapHandle.
openvmm/openvmm_entry/Cargo.toml Adds net_tap dependency on Unix.
Cargo.lock Updates lockfile for new dependency edges.

Comment on lines +360 to +363

// Brief delay to let the server bind.
std::thread::sleep(std::time::Duration::from_millis(500));

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_iperf3 is async but uses std::thread::sleep, which blocks the executor thread and can skew perf timings or stall unrelated async work. Use an async delay (e.g., pal_async::timer::PolledTimer with a stored/cloned DefaultDriver) or another non-blocking readiness mechanism (poll for port open) instead of blocking sleep.

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +107
IperfRequest::SetupTap(rpc) => {
rpc.handle(async |config| {
linux::setup_tap_device(&config.name, &config.cidr)
.expect("failed to set up TAP device")
})
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TAP setup RPC handler uses expect("failed to set up TAP device"), which will panic the helper process on common failures (permissions, /dev/net/tun missing, ioctl errors). Return an error to the caller instead so the parent can surface a useful message and tear down cleanly.

Copilot uses AI. Check for mistakes.
@jstarks jstarks marked this pull request as ready for review April 1, 2026 21:53
@jstarks jstarks requested a review from a team as a code owner April 1, 2026 21:53
Copilot AI review requested due to automatic review settings April 1, 2026 21:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment on lines +489 to +492
c.with_custom_config(|config| match nic {
super::NicBackend::Vmbus => add_tap_nic(config, tap_fd),
super::NicBackend::VirtioNet => add_virtio_tap_nic(config, tap_fd),
})
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tap_fd (OwnedFd) is moved into both match arms inside the with_custom_config closure, which won’t compile because the compiler can’t prove only one branch executes. Restructure so the fd is consumed exactly once (e.g., pick the closure/function based on nic outside the match, or use an if/match that moves tap_fd in a single expression).

Suggested change
c.with_custom_config(|config| match nic {
super::NicBackend::Vmbus => add_tap_nic(config, tap_fd),
super::NicBackend::VirtioNet => add_virtio_tap_nic(config, tap_fd),
})
let add_nic = match nic {
super::NicBackend::Vmbus => add_tap_nic,
super::NicBackend::VirtioNet => add_virtio_tap_nic,
};
c.with_custom_config(move |config| add_nic(config, tap_fd))

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +41
guid.workspace = true
libc.workspace = true
net_backend_resources.workspace = true
net_tap.workspace = true
netvsp_resources.workspace = true
virtio_resources.workspace = true
vm_resource.workspace = true
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guid, virtio_resources, and vm_resource are already in [dependencies], but they’re repeated under [target.'cfg(target_os = "linux")'.dependencies]. This duplication is unnecessary and can be removed (or, if the intent is to make them Linux-only, they should be removed from the non-target section instead).

Suggested change
guid.workspace = true
libc.workspace = true
net_backend_resources.workspace = true
net_tap.workspace = true
netvsp_resources.workspace = true
virtio_resources.workspace = true
vm_resource.workspace = true
libc.workspace = true
net_backend_resources.workspace = true
net_tap.workspace = true
netvsp_resources.workspace = true

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

Copilot AI review requested due to automatic review settings April 2, 2026 00:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

}
#[cfg(target_os = "linux")]
NetBackend::Tap => {
let tap_fd = tap_fd.unwrap();
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tap_fd.unwrap() will panic if TAP setup failed or if logic changes in the future. Since this code is part of the test harness setup path, return a structured error instead of panicking (e.g., convert the Option to an error with context).

Suggested change
let tap_fd = tap_fd.unwrap();
let tap_fd = tap_fd
.context("TAP backend selected but TAP fd was not provided")?;

Copilot uses AI. Check for mistakes.
@jstarks jstarks marked this pull request as draft April 2, 2026 02:14
Copilot AI review requested due to automatic review settings April 2, 2026 17:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Comment on lines +205 to +206
// Consomme uses this for iperf3; TAP currently uses Alpine + apk
// since the TAP NIC setup needs networking tools not in the erofs.
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the TAP backend “uses Alpine + apk”, but this test still boots linux_direct (build_firmware), and the TAP guest networking setup uses ip/ifconfig/udhcpc from the initrd shell. Update/remove this comment so it reflects the actual guest image/tooling path to avoid misleading future maintainers.

Suggested change
// Consomme uses this for iperf3; TAP currently uses Alpine + apk
// since the TAP NIC setup needs networking tools not in the erofs.
// Consomme uses this for iperf3. The TAP backend still boots via
// `linux_direct` (`build_firmware`) and uses networking tools
// (ip/ifconfig/udhcpc) from the initrd shell rather than from this erofs.

Copilot uses AI. Check for mistakes.
Comment on lines +99 to +105
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
return Err(format!(
"iperf3 exited with {}: {}",
output.status,
stderr.trim()
));
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper treats any non-zero iperf3 server exit status as a hard error and discards stdout, but the previous in-process implementation explicitly noted that the server may exit non-zero while still producing valid JSON results. To avoid flakiness, consider parsing stdout when it’s non-empty (and log/warn on the status) instead of failing solely on !status.success().

Copilot uses AI. Check for mistakes.
@benhillis benhillis added the testing Related to our automated tests and infrastructure label Apr 6, 2026
Copilot AI review requested due to automatic review settings April 7, 2026 17:26
@jstarks jstarks marked this pull request as ready for review April 7, 2026 17:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Comment on lines +393 to +397
// Ask the helper to start an iperf3 server (blocks until client done).
let json_future = state.iperf_requests.call_failable(
crate::iperf_helper::IperfRequest::RunIperf3,
crate::iperf_helper::Iperf3Args {
args: vec![
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call_failable starts the helper-side iperf3 server immediately, but if the guest client fails before connecting (e.g., routing/DNS issue), the helper will block in iperf3 -s -1 indefinitely. Dropping json_future won’t cancel the RPC, so the helper can get stuck and future sub-tests/teardown Stop won’t be processed. Add a timeout/idle-timeout and ensure the server process is terminated/awaited even when the guest step errors (e.g., run server under a cancellable task and always join it in a drop/finally path).

Copilot uses AI. Check for mistakes.
Comment on lines +146 to +150
let helper_mesh =
mesh_process::Mesh::new("iperf-helper".to_string()).context("failed to create mesh")?;

let (ready_send, ready_recv) = mesh::oneshot();
helper_mesh
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any step after launching the helper fails (e.g., TAP setup, VM boot, guest networking), setup() returns early and the Mesh is dropped without an explicit shutdown(). On Unix mesh_process::Mesh doesn’t auto-kill children, so the helper process can be leaked and keep running. Consider using a guard to call shutdown() on all error paths (and/or send Stop) once the helper is launched.

Copilot uses AI. Check for mistakes.
Comment on lines +133 to +137
let ret = unsafe { libc::unshare(libc::CLONE_NEWUSER | libc::CLONE_NEWNET) };
if ret != 0 {
let err = std::io::Error::last_os_error();
eprintln!("unshare(CLONE_NEWUSER | CLONE_NEWNET) failed: {err}");
std::process::exit(1);
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On unshare() failure the helper exits before connecting to mesh, so the parent currently errors with a generic “iperf helper did not respond”. It would be more actionable to propagate a specific error back to the parent (especially for EPERM when unprivileged user namespaces are disabled, or when /dev/net/tun is restricted) so burette run --backend tap fails with a clear message.

Copilot uses AI. Check for mistakes.
Comment on lines +94 to +103
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
tracing::warn!(
status = %output.status,
stderr = %stderr.trim(),
"iperf3 server exited non-zero (may still have valid JSON)"
);
}

Ok(String::from_utf8_lossy(&output.stdout).into_owned())
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RunIperf3 always returns stdout even when iperf3 exits non-zero. When the bind fails (port in use, permission issue, etc.) stdout is often empty and the parent reports “server produced no output”, losing the real cause. Consider returning an Err that includes exit status + stderr when the command fails (or at least when stdout is empty), so failures are diagnosable without digging through helper logs.

Suggested change
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
tracing::warn!(
status = %output.status,
stderr = %stderr.trim(),
"iperf3 server exited non-zero (may still have valid JSON)"
);
}
Ok(String::from_utf8_lossy(&output.stdout).into_owned())
let stdout = String::from_utf8_lossy(&output.stdout).into_owned();
let stderr = String::from_utf8_lossy(&output.stderr).into_owned();
if !output.status.success() {
tracing::warn!(
status = %output.status,
stderr = %stderr.trim(),
"iperf3 server exited non-zero"
);
let stderr = stderr.trim();
let stdout = stdout.trim();
let mut message =
format!("iperf3 exited with status {}.", output.status);
if !stderr.is_empty() {
message.push_str(" stderr: ");
message.push_str(stderr);
}
if !stdout.is_empty() {
message.push_str(" stdout: ");
message.push_str(stdout);
}
return Err(message);
}
Ok(stdout)

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 7, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Related to our automated tests and infrastructure unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants