storage: add usermode storvsc disk backend for OpenHCL #3193
juantian8seattle wants to merge 8 commits into microsoft:main from
Conversation
This PR modifies files containing `unsafe` code. For more on why we check whole files, instead of just diffs, check out the Rustonomicon.
Pull request overview
This PR adds a new OpenHCL storage path for servicing guest SCSI I/O in VTL2 userspace using a new disk_storvsc DiskIo backend backed by storvsc_driver over VMBus UIO, and wires it into Underhill behind the OPENHCL_STORVSC_USERMODE opt-in flag.
Changes:
- Add new `disk_storvsc` crate implementing `DiskIo` by formatting and issuing SCSI CDBs through `storvsc_driver`.
- Extend `storvsc_driver` with DMA buffer support, resize listeners, and save/restore state handling.
- Integrate a new `StorvscManager`/resolver into `underhill_core`, update servicing save/restore, and add VTL2 settings routing for VScsi devices.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| vm/vmcore/guestmem/src/lib.rs | Adds LockedPages::va() accessor used by DMA/driver plumbing. |
| vm/devices/vmbus/vmbus_user_channel/src/lib.rs | Makes ring sizes configurable when opening UIO-backed channels. |
| vm/devices/storage/storvsp/src/lib.rs | Sets storvsp channel type to message-mode pipe for the SCSI interface. |
| vm/devices/storage/storvsp_protocol/src/lib.rs | Adds Inspect derive to protocol types for diagnostics. |
| vm/devices/storage/storvsp_protocol/Cargo.toml | Adds inspect dependency to support protocol inspection. |
| vm/devices/storage/storvsc_driver/src/test_helpers.rs | Updates test harness to new request shape (GPN list + resize listener channel). |
| vm/devices/storage/storvsc_driver/src/lib.rs | Major driver enhancements: DMA buffers, resize listeners, save/restore, request API changes. |
| vm/devices/storage/storvsc_driver/Cargo.toml | Adds dependencies needed for DMA, tracing, events, and save/restore. |
| vm/devices/storage/scsi_defs/src/lib.rs | Improves SCSI struct types (e.g., big-endian wrappers, typed opcodes). |
| vm/devices/storage/disk_storvsc/src/lib.rs | New DiskIo backend implementing I/O via SCSI CDBs over usermode storvsc. |
| vm/devices/storage/disk_storvsc/Cargo.toml | Declares the new disk_storvsc crate and its dependencies. |
| openhcl/underhill_core/src/worker.rs | Adds storvsc_usermode env config and instantiates StorvscManager. |
| openhcl/underhill_core/src/storvsc_manager.rs | New actor-style manager + disk resolver + sysfs UIO claiming + save/restore wiring. |
| openhcl/underhill_core/src/servicing.rs | Adds servicing saved-state plumbing for storvsc at mesh(10004). |
| openhcl/underhill_core/src/options.rs | Adds OPENHCL_STORVSC_USERMODE option parsing and config propagation. |
| openhcl/underhill_core/src/lib.rs | Wires new option field and module into worker launch path. |
| openhcl/underhill_core/src/dispatch/vtl2_settings_worker.rs | Routes VScsi devices to StorvscDiskResolver when usermode storvsc is enabled. |
| openhcl/underhill_core/src/dispatch/mod.rs | Adds storvsc manager lifecycle (shutdown + save state) to LoadedVm. |
| openhcl/underhill_core/Cargo.toml | Adds disk_storvsc, storvsc_driver, and storvsp_protocol dependencies. |
| openhcl/openhcl_boot/src/main.rs | Temporarily forces OPENHCL_STORVSC_USERMODE=1 in kernel cmdline for CI testing. |
| Cargo.toml | Adds disk_storvsc to workspace members and workspace dependencies. |
| Cargo.lock | Locks new crate and dependency graph updates. |
vm/vmcore/guestmem/src/lib.rs
Outdated
```rust
pub fn va(&self) -> u64 {
    self.pages.first().unwrap().0 as u64
}
```

`LockedPages::va()` can panic when `gpns` is empty because it unconditionally unwraps `self.pages.first()`. Since `lock_gpns()` can return an empty `LockedPages` (`gpns` slice length 0), consider returning `Option<u64>`/`Result<u64, _>` or explicitly documenting/enforcing a non-empty invariant before exposing this accessor.

Suggested change:

```rust
pub fn va(&self) -> Option<u64> {
    self.pages.first().map(|page| page.0 as u64)
}
```
```rust
    new_request_receiver,
    add_resize_listener_receiver,
)?;
storvsc.negotiate().await.unwrap();
```

`storvsc.negotiate().await.unwrap()` will panic on protocol negotiation failure. This is a host-facing boundary and should return an error instead (propagate with `?` and map into `StorvscErrorInner`), so OpenHCL doesn't crash on unexpected or malformed host responses.

Suggested change:

```rust
storvsc.negotiate().await?;
```
```rust
if byte_len == 0 {}
let payload_bytes = payload.as_bytes();
let start_page: u64 = gpa_start / PAGE_SIZE as u64;
let end_page: u64 = (gpa_start + (byte_len + PAGE_SIZE - 1) as u64) / PAGE_SIZE as u64;
let gpas: Vec<u64> = (start_page..end_page).collect();
let pages =
    PagedRange::new(gpa_start as usize % PAGE_SIZE, byte_len, gpas.as_slice()).unwrap();
// Use caller-provided GPNs directly instead of computing a synthetic
// contiguous range. DMA allocations may have non-contiguous pages.
// gpn_offset handles sub-page-aligned guest buffers (e.g., 512-byte
// offset within first page).
let pages = PagedRange::new(gpn_offset, byte_len, gpns).unwrap();
```

`PagedRange::new(...).unwrap()` can panic if `gpn_offset`/`byte_len` are inconsistent with the provided `gpns` slice (e.g., empty `gpns`, offset >= page size, length too large). Please convert this to fallible error handling and return a `StorvscError` instead of panicking; also remove the empty `if byte_len == 0 {}` no-op (callers already gate on `byte_len > 0`).
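To illustrate the fallible handling this comment asks for, here is a minimal sketch. The names (`PAGE_SIZE`, `RangeError`, `validate_paged_range`) are illustrative stand-ins, not the crate's actual `PagedRange`/`StorvscError` API:

```rust
// Illustrative sketch only: check the invariants the range constructor
// enforces and return an error instead of panicking. PAGE_SIZE matches
// the 4K pages assumed elsewhere in this PR.
const PAGE_SIZE: usize = 4096;

#[derive(Debug, PartialEq)]
pub struct RangeError;

pub fn validate_paged_range(
    gpn_offset: usize,
    byte_len: usize,
    gpns: &[u64],
) -> Result<(), RangeError> {
    // The starting offset must fall within the first page.
    if gpn_offset >= PAGE_SIZE {
        return Err(RangeError);
    }
    // offset + len must be covered by the supplied page list;
    // checked_mul/checked_add guard against arithmetic overflow.
    let covered = gpns.len().checked_mul(PAGE_SIZE).ok_or(RangeError)?;
    let end = gpn_offset.checked_add(byte_len).ok_or(RangeError)?;
    if end > covered {
        return Err(RangeError);
    }
    Ok(())
}
```

A caller would map `RangeError` into the driver's own error type (e.g., a `StorvscError` variant) before returning.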
```rust
// Match completion against pending transactions
match self
    .transactions
    .get_mut(completion.transaction_id as usize)
{
    Some(t) => Ok(t),
    None => Err(StorvscError(StorvscErrorInner::PacketError(
        PacketError::UnexpectedTransaction(completion.transaction_id),
    ))),
}?
.cancel(StorvscCompleteReason::UnitAttention);
```

The UNIT ATTENTION path cancels the pending transaction but never removes it from `self.transactions` (it uses `get_mut()` and then `.cancel(...)`). This will leak slab entries over time and prevent transaction IDs from being reused, eventually growing memory and possibly failing new requests. Consider using `remove()` (or equivalent) when completing/cancelling a transaction.

Suggested change:

```rust
// Match completion against pending transactions and remove the
// cancelled transaction so the slot can be reused.
let transaction_id = completion.transaction_id as usize;
let mut transaction = if self.transactions.contains(transaction_id) {
    Ok(self.transactions.remove(transaction_id))
} else {
    Err(StorvscError(StorvscErrorInner::PacketError(
        PacketError::UnexpectedTransaction(completion.transaction_id),
    )))
}?;
transaction.cancel(StorvscCompleteReason::UnitAttention);
```
```rust
// Match completion against pending transactions
match self
    .transactions
    .get_mut(completion.transaction_id as usize)
{
    Some(t) => Ok(t),
    None => Err(StorvscError(StorvscErrorInner::PacketError(
        PacketError::UnexpectedTransaction(completion.transaction_id),
    ))),
}?
.complete(result);
}
```

The normal completion path calls `.complete(result)` on a transaction found via `self.transactions.get_mut(...)`, but does not remove the entry afterward. This causes unbounded growth of the `Slab<PendingOperation>` and can eventually break I/O. Consider removing the transaction from the slab when the completion is handled.
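The remove-on-complete pattern this comment suggests can be sketched with a plain map standing in for the driver's `Slab<PendingOperation>` (all names here are illustrative):

```rust
use std::collections::HashMap;

// Illustrative stand-in: completing a transaction removes it from the
// pending table in one step, so IDs are reclaimed instead of leaking.
pub struct PendingTable {
    pending: HashMap<usize, u64>, // u64 stands in for PendingOperation
}

impl PendingTable {
    pub fn new() -> Self {
        Self { pending: HashMap::new() }
    }

    pub fn insert(&mut self, id: usize, op: u64) {
        self.pending.insert(id, op);
    }

    // remove() both resolves the completion and frees the slot; a second
    // completion for the same ID then surfaces as an error rather than
    // silently touching a stale entry.
    pub fn complete(&mut self, id: usize) -> Result<u64, usize> {
        self.pending.remove(&id).ok_or(id)
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```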
```rust
let cdb = scsi_defs::Unmap {
    operation_code: ScsiOp::UNMAP,
    allocation_length: (size_of::<scsi_defs::UnmapBlockDescriptor>() as u16).into(),
    ..FromZeros::new_zeroed()
};
```

UNMAP CDB `allocation_length` should be the total parameter list length (header + descriptors), but it's currently set to only `size_of::<UnmapBlockDescriptor>()`. This likely makes UNMAP fail (the device will read an incomplete parameter list). Consider setting the CDB length and the data-out `byte_len` to the actual list size (and only sending that many bytes).
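The length arithmetic behind this comment (the SBC spec's 8-byte UNMAP parameter list header plus 16-byte block descriptors) can be sketched as a helper; the constants and helper name are illustrative, not from `scsi_defs`:

```rust
// UNMAP parameter list sizes per the SCSI SBC spec.
const UNMAP_HEADER_LEN: u16 = 8; // UnmapListHeader equivalent
const UNMAP_DESCRIPTOR_LEN: u16 = 16; // UnmapBlockDescriptor equivalent

/// Returns (CDB ALLOCATION LENGTH, header DATA LENGTH,
/// header BLOCK DESCRIPTOR DATA LENGTH) for `n` descriptors.
pub fn unmap_lengths(n: u16) -> (u16, u16, u16) {
    let total = UNMAP_HEADER_LEN + n * UNMAP_DESCRIPTOR_LEN;
    // DATA LENGTH excludes its own 2-byte field.
    (total, total - 2, n * UNMAP_DESCRIPTOR_LEN)
}
```

With one descriptor this yields an allocation length of 24 bytes, not 16, and the same 24 bytes should be used as the request's data-out length.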
```rust
tracing::error!(
    error = e.as_ref() as &dyn std::error::Error,
    "failed to refetch capacity on resize"
);
```

On capacity refetch failure, `wait_resize()` logs and continues without awaiting the resize event (or any backoff), which can spin in a tight loop if the host keeps returning errors. Consider awaiting `listen` (or adding a small delay/backoff) on error to avoid CPU burn.

Suggested change:

```rust
);
listen.await;
```
```rust
return Ok(Resource::new(StorvscDiskConfig {
    instance_guid: controller_instance_id,
    lun: sub_device_path as u8,
```

`sub_device_path` is a `u32` but is cast to `u8` for `lun` without validation. If `sub_device_path > 255`, this will silently truncate and route I/O to the wrong LUN. Consider validating the range and returning an error if it doesn't fit in `u8`.

Suggested change:

```rust
let lun = u8::try_from(sub_device_path).context(format!(
    "VScsi sub_device_path {} does not fit in a u8 LUN",
    sub_device_path
))?;
return Ok(Resource::new(StorvscDiskConfig {
    instance_guid: controller_instance_id,
    lun,
```
```rust
"rdinit=/underhill-init",
// Default to user-mode NVMe driver.
"OPENHCL_NVME_VFIO=1",
// TODO(juantian): TEMP -- enable usermode storvsc to validate via CI pipeline.
// MUST be removed before merge. See PR description.
"OPENHCL_STORVSC_USERMODE=1",
```

This hard-codes `OPENHCL_STORVSC_USERMODE=1` into the boot command line, which enables the feature by default and contradicts the PR's stated opt-in behavior. Please remove this before merge (or gate it behind a test-only/dev-only configuration) so production images remain opt-in via environment only.
```rust
if !self.save_restore_supported {
    async {
        join_all(self.drivers.drain().map(|(guid, driver)| {
            let guid_str = guid.to_string();
            async move {
                driver
                    .stop()
                    .instrument(tracing::info_span!(
                        "shutdown_storvsc_driver",
                        guid = guid_str
                    ))
                    .await
            }
        }))
        .await
    }
    .instrument(join_span)
    .await;
}
```

On shutdown, drivers are only stopped when `!save_restore_supported`. If `save_restore_supported` is true, this exits the manager without explicitly stopping the per-controller `StorvscDriver` tasks, which can leak work/resources and diverges from the NVMe manager shutdown pattern. Consider stopping drivers unconditionally on shutdown (save/restore support should affect servicing behavior, not shutdown cleanup).

Suggested change (drop the condition and stop unconditionally):

```rust
async {
    join_all(self.drivers.drain().map(|(guid, driver)| {
        let guid_str = guid.to_string();
        async move {
            driver
                .stop()
                .instrument(tracing::info_span!(
                    "shutdown_storvsc_driver",
                    guid = guid_str
                ))
                .await
        }
    }))
    .await
}
.instrument(join_span)
.await;
```
```rust
let state = s.state_mut().unwrap();

// Cancel pending operations with save/restore reason.
for mut transaction in state.inner.transactions.drain() {
```

`state.inner.transactions` is a `slab::Slab<PendingOperation>`, and `Slab::drain()` yields `(usize, PendingOperation)` items. The current loop binds the tuple to `transaction` and then calls `transaction.cancel(...)`, which won't compile. Destructure the drain items (e.g., `for (_id, mut op) in ...`) before calling `cancel`.

Suggested change:

```rust
for (_id, mut transaction) in state.inner.transactions.drain() {
```
```rust
let cdb = scsi_defs::Unmap {
    operation_code: ScsiOp::UNMAP,
    allocation_length: ((size_of::<scsi_defs::UnmapListHeader>()
        + size_of::<scsi_defs::UnmapBlockDescriptor>())
        as u16)
        .into(),
    ..FromZeros::new_zeroed()
};

let unmap_param_list = scsi_defs::UnmapListHeader {
    data_length: ((size_of::<scsi_defs::UnmapListHeader>() - 2
        + size_of::<scsi_defs::UnmapBlockDescriptor>()) as u16)
        .into(),
    block_descriptor_data_length: (size_of::<scsi_defs::UnmapBlockDescriptor>() as u16)
        .into(),
    ..FromZeros::new_zeroed()
};

let unmap_descriptor = scsi_defs::UnmapBlockDescriptor {
    start_lba: sector.into(),
    lba_count: u32::try_from(count)
        .map_err(|_| DiskError::InvalidInput)?
        .into(),
    ..FromZeros::new_zeroed()
};

// At this time we cannot allocate contiguous pages, but this could be done without an
// assert if we could guarantee that the allocation is contiguous.
const_assert!(
    (size_of::<scsi_defs::UnmapListHeader>() + size_of::<scsi_defs::UnmapBlockDescriptor>())
        as u64
        <= PAGE_SIZE_4K
);
let data_out_size = PAGE_SIZE_4K as usize;
let data_out = match self.driver.allocate_dma_buffer(data_out_size) {
    Ok(buf) => buf,
    Err(err) => {
        tracing::error!(
            error = err.to_string(),
            "Unable to allocate DMA buffer for UNMAP"
        );
        return Err(DiskError::Io(std::io::Error::other(err)));
    }
};
data_out.write_at(0, unmap_param_list.as_bytes());
data_out.write_at(
    size_of::<scsi_defs::UnmapListHeader>(),
    unmap_descriptor.as_bytes(),
);

self.send_scsi_request(
    cdb.as_bytes(),
    cdb.operation_code,
    data_out.pfns(),
    data_out_size,
    false,
    0,
)
```

UNMAP builds a parameter list whose length matches the CDB `allocation_length`, but the request sends `byte_len = data_out_size` (a full 4K page). This makes `ScsiRequest.data_transfer_length` larger than the UNMAP parameter list length, which can cause protocol/target errors and adds unnecessary transfer. Send only the actual parameter list length (header + descriptor), even if the backing DMA allocation is a full page.
```rust
// TODO(juantian): TEMP -- enable usermode storvsc to validate via CI pipeline.
// MUST be removed before merge. See PR description.
"OPENHCL_STORVSC_USERMODE=1",
```

This hard-codes `OPENHCL_STORVSC_USERMODE=1` into the default kernel command line, which contradicts the stated "opt-in only" behavior and would enable usermode storvsc by default for all boots that use this image. Please remove this before merge (or gate it behind a test-only build/feature) so the default remains disabled unless the environment variable is explicitly set at runtime.

Suggested change: delete these lines.
```rust
async fn fetch_capacity(&self) -> anyhow::Result<DiskCapacity> {
    // Must fit in a single page -- DMA allocations may not be
    // physically contiguous across page boundaries.
    const_assert!(size_of::<scsi_defs::ReadCapacity16Data>() as u64 <= PAGE_SIZE_4K);
    // Must fit in a single page -- DMA allocations may not be
    // physically contiguous across page boundaries.
    const_assert!(size_of::<scsi_defs::ReadCapacityData>() as u64 <= PAGE_SIZE_4K);
    let data_in_size = PAGE_SIZE_4K as usize;
```

`size_of::<T>()` is used throughout this module but `size_of` is never imported/qualified, which will fail to compile. Add `use core::mem::size_of;` (or qualify calls as `core::mem::size_of`).
```rust
// Cancel pending operations with save/restore reason.
for mut transaction in state.inner.transactions.drain() {
    transaction.cancel(StorvscCompleteReason::SaveRestore);
}

Ok(StorvscDriverSavedState {
    version: state.version.major_minor,
    has_negotiated: state.has_negotiated,
```

`stop()` (and `save()`) stop/remove the worker task but leave `new_request_sender`/`add_resize_listener_sender` intact. If a caller uses an `Arc<StorvscDriver<_>>` after stop/save begins, `send_request()` can enqueue a request with a live completion sender and then await forever because the worker is no longer draining the request channel. Consider clearing/closing the senders during stop/save or gating `send_request`/`add_resize_listener` on an explicit running-state check so they fail fast when the task is not running.

Suggested change:

```rust
let version = state.version.major_minor;
let has_negotiated = state.has_negotiated;
// Cancel pending operations with save/restore reason.
for mut transaction in state.inner.transactions.drain() {
    transaction.cancel(StorvscCompleteReason::SaveRestore);
}
// Remove the stopped worker state so its request/listener receivers
// are dropped and future sends fail fast instead of queueing work
// that no task will ever drain.
s.remove();
Ok(StorvscDriverSavedState {
    version,
    has_negotiated,
```
```rust
};

// If CHECK CONDITION with sense UNIT ATTENTION, then notify any resize listeners and
// resend this request
```

The comment says the UNIT ATTENTION path will "resend this request", but the implementation cancels the pending operation with `StorvscCompleteReason::UnitAttention` and does not resend. Either update the comment to match behavior (caller must retry) or implement the resend logic here.

Suggested change:

```rust
// cancel the pending request with UnitAttention so the caller can retry it.
```
```rust
// Default to user-mode NVMe driver.
"OPENHCL_NVME_VFIO=1",
// TODO(juantian): TEMP -- enable usermode storvsc to validate via CI pipeline.
// MUST be removed before merge. See PR description.
"OPENHCL_STORVSC_USERMODE=1",
// The next three items reduce the memory overhead of the storvsc driver.
```

This hard-codes `OPENHCL_STORVSC_USERMODE=1` into the default kernel command line, which contradicts the PR's "opt-in only / not enabled by default" requirement. Please remove this before merge (or gate it behind an explicit test-only feature/build flag) so production boots remain opt-in via environment.
```rust
// TODO: Add unit tests for wait_resize -- cover error retry with
// listen.await backoff, and capacity change detection.
async fn wait_resize(&self, sector_count: u64) -> u64 {
    loop {
        let listen = self.resize_event.listen();
        // Refetch capacity from host (we're in async context here)
        let capacity = match self.fetch_capacity().await {
            Ok(c) => c,
            Err(e) => {
                tracing::error!(
                    error = e.as_ref() as &dyn std::error::Error,
                    "failed to refetch capacity on resize"
                );
                listen.await;
                continue;
            }
        };
        if capacity.num_sectors != sector_count {
            break capacity.num_sectors;
        }
        listen.await;
    }
}
```

disk_storvsc introduces substantial new disk I/O behavior (bounce-buffer vs zero-copy, retry logic on CancelledRetry, capacity refresh on resize), but the crate has no unit tests. Adding targeted tests (e.g., for `send_scsi_request` retry semantics and `wait_resize` behavior) would help prevent regressions.

Suggested change (factor the decision into a testable helper):

```rust
async fn wait_resize(&self, sector_count: u64) -> u64 {
    loop {
        let listen = self.resize_event.listen();
        let next = match self.fetch_capacity().await {
            Ok(capacity) => {
                wait_resize_next_action(sector_count, Ok(capacity.num_sectors))
            }
            Err(e) => {
                tracing::error!(
                    error = e.as_ref() as &dyn std::error::Error,
                    "failed to refetch capacity on resize"
                );
                wait_resize_next_action(sector_count, Err(()))
            }
        };
        match next {
            WaitResizeAction::ReturnUpdatedCapacity(num_sectors) => break num_sectors,
            WaitResizeAction::WaitForNotification => listen.await,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaitResizeAction {
    WaitForNotification,
    ReturnUpdatedCapacity(u64),
}

fn wait_resize_next_action(
    sector_count: u64,
    observed_sector_count: Result<u64, ()>,
) -> WaitResizeAction {
    match observed_sector_count {
        Ok(observed_sector_count) if observed_sector_count != sector_count => {
            WaitResizeAction::ReturnUpdatedCapacity(observed_sector_count)
        }
        Ok(_) | Err(()) => WaitResizeAction::WaitForNotification,
    }
}

#[cfg(test)]
mod tests {
    use super::WaitResizeAction;
    use super::wait_resize_next_action;
    use test_with_tracing::test;

    #[test]
    fn wait_resize_retries_after_capacity_refresh_error() {
        assert_eq!(
            wait_resize_next_action(1024, Err(())),
            WaitResizeAction::WaitForNotification
        );
    }

    #[test]
    fn wait_resize_waits_when_capacity_is_unchanged() {
        assert_eq!(
            wait_resize_next_action(1024, Ok(1024)),
            WaitResizeAction::WaitForNotification
        );
    }

    #[test]
    fn wait_resize_returns_when_capacity_changes() {
        assert_eq!(
            wait_resize_next_action(1024, Ok(2048)),
            WaitResizeAction::ReturnUpdatedCapacity(2048)
        );
    }
}
```
```rust
let state = s.state_mut().unwrap();

// Cancel pending operations with save/restore reason.
for mut transaction in state.inner.transactions.drain() {
```

`Slab::drain()` yields `(usize, PendingOperation)` in slab 0.4.x. Iterating as `for mut transaction in ...drain()` will not compile (tuple has no `cancel`). Destructure the tuple (or use `drain().for_each`) so you call `cancel(...)` on the `PendingOperation` value.

Suggested change:

```rust
for (_, mut transaction) in state.inner.transactions.drain() {
```
```rust
    has_negotiated: state.has_negotiated,
})
} else {
    // Task was not running, so not state to save
```

Comment typo: "Task was not running, so not state to save" should be corrected (e.g., "no state to save").

Suggested change:

```rust
// Task was not running, so no state to save
```
```rust
// TODO: Add unit tests for wait_resize -- cover error retry with
// listen.await backoff, and capacity change detection.
async fn wait_resize(&self, sector_count: u64) -> u64 {
    loop {
        let listen = self.resize_event.listen();
        // Refetch capacity from host (we're in async context here)
        let capacity = match self.fetch_capacity_10().await {
            Ok(c) => c,
            Err(e) => {
                tracing::error!(
                    error = e.as_ref() as &dyn std::error::Error,
                    "failed to refetch capacity on resize"
                );
                listen.await;
                continue;
            }
        };
        if capacity.num_sectors != sector_count {
            break capacity.num_sectors;
        }
        listen.await;
    }
```

`wait_resize()` always refetches capacity via READ_CAPACITY(10) (`fetch_capacity_10`). For disks that required READ_CAPACITY(16) initially (>= 2 TiB / last LBA == 0xFFFF_FFFF), this will cap the reported size and prevent detecting large resizes correctly. Use the same capacity query strategy as construction (prefer 16 for non-optical, fall back to 10) or branch based on `device_type`/a cached capability flag.
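The device-type branch this comment suggests could be sketched as below (SCSI peripheral device type 0x05 is CD/DVD; the enum and function names are illustrative, not the crate's API):

```rust
#[derive(Debug, PartialEq)]
pub enum CapacityCmd {
    ReadCapacity10,
    ReadCapacity16,
}

// Optical devices keep READ CAPACITY(10); everything else prefers (16)
// so disks >= 2 TiB (which report last LBA 0xFFFF_FFFF in the (10)
// variant) resize correctly.
pub fn capacity_command(peripheral_device_type: u8) -> CapacityCmd {
    match peripheral_device_type {
        0x05 => CapacityCmd::ReadCapacity10, // CD/DVD
        _ => CapacityCmd::ReadCapacity16,
    }
}
```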
```rust
// Check cache first (bounce resolver may have already created it).
{
    let cache = self.disk_cache.lock().unwrap();
    if let Some(disk) = cache.get(&key) {
        return Ok(ResolvedDisk::new(disk_storvsc::StorvscDiskBounce::new(
            disk.clone(),
```

These `lock().unwrap()` calls can panic if the mutex is ever poisoned. Since this runs on an async path (disk resolution) and is potentially on a trust boundary, prefer handling `PoisonError` (e.g., recover via `into_inner()` and log) or use a non-poisoning lock implementation so a previous panic can't turn into a persistent crash.
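One way to make these lock sites panic-proof is to recover from `PoisonError` via `into_inner()`; a small stand-alone sketch (the helper name is illustrative):

```rust
use std::sync::{Mutex, MutexGuard};

// Acquire the lock even if a previous holder panicked. The data may be
// mid-update, but for a read-mostly cache that is preferable to turning
// one past panic into a persistent crash on every later access.
pub fn lock_recovering<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}
```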
```rust
// Step 2: Unbind from hv_storvsc (if currently bound there).
match std::fs::write("/sys/bus/vmbus/drivers/hv_storvsc/unbind", &device_id) {
    Ok(()) => {
        tracing::info!(
            %instance_guid,
            "unbound SCSI channel from hv_storvsc for usermode relay"
        );
    }
    Err(e) => {
        // ENODEV means the device isn't on hv_storvsc -- maybe it's
        // already on UIO or unbound. Not an error.
        tracing::debug!(
            %instance_guid,
            error = %e,
            "hv_storvsc unbind skipped (device may not be bound there)"
        );
    }
}

// Step 3: Bind to uio_hv_generic.
match std::fs::write("/sys/bus/vmbus/drivers/uio_hv_generic/bind", &device_id) {
    Ok(()) => {
        tracing::info!(
            %instance_guid,
            "bound SCSI channel to UIO for usermode relay"
        );
    }
    Err(e) => {
        // EBUSY means already bound to UIO -- fine.
        tracing::debug!(
            %instance_guid,
            error = %e,
            "UIO bind skipped (device may already be bound)"
        );
    }
```

The unbind/bind steps log all errors at `debug!` without checking expected vs unexpected error kinds. This can make real failures (e.g., EPERM/ENOENT) easy to miss while still leading to later hard failures when opening the UIO device. Consider matching on `ErrorKind`/`raw_os_error()` (e.g., treat ENODEV/EBUSY as debug, but warn/error for others) so operational issues surface in normal logs.
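The errno triage this comment describes could look like the following sketch (the errno constants are Linux's values, inlined here to keep the example dependency-free; the helper name is illustrative):

```rust
use std::io;

// Expected benign errors for the sysfs unbind/bind dance:
// ENODEV: device not bound to this driver; EBUSY: already bound.
// Values are Linux errno numbers.
const ENODEV: i32 = 19;
const EBUSY: i32 = 16;

pub fn is_expected_bind_error(e: &io::Error) -> bool {
    matches!(e.raw_os_error(), Some(ENODEV) | Some(EBUSY))
}
```

Callers would log at `debug!` when this returns true and at `warn!`/`error!` otherwise, so EPERM/ENOENT surface in normal logs instead of only failing later at UIO open.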
```rust
let mut buf_pos = 0;
let vpd_header = data_in.read_obj::<scsi_defs::VpdPageHeader>(0);
buf_pos += size_of::<scsi_defs::VpdPageHeader>();
while buf_pos < vpd_header.page_length as usize + 4 {
    let designator_header =
        data_in.read_obj::<scsi_defs::VpdIdentificationDescriptor>(buf_pos);
    match designator_header.identifiertype {
        scsi_defs::VPD_IDENTIFIER_TYPE_FCPH_NAME => {
            // VpdNaaId includes VpdIdentificationDescriptor as its
            // first field (`scsi_defs::VpdNaaId::header`), so read
            // the full struct from the descriptor start position.
            let designator_naa =
                data_in.read_obj::<scsi_defs::VpdNaaId>(buf_pos);
            let mut created_disk_id = [0u8; 16];
            created_disk_id[0] = designator_naa.ouid_msb;
            created_disk_id[1..3]
                .copy_from_slice(designator_naa.ouid_middle.as_slice());
            created_disk_id[3] = designator_naa.ouid_lsb;
            created_disk_id[4..]
                .copy_from_slice(designator_naa.vendor_specific_id.as_slice());
            return Ok(Some(created_disk_id));
        }
        _ => {
            buf_pos += size_of::<scsi_defs::VpdIdentificationDescriptor>()
                + designator_header.identifier_length as usize;
        }
```

`MemoryBlock::read_obj()` uses unchecked slice indexing and will panic on out-of-bounds. Here, `vpd_header.page_length` and `designator_header.identifier_length` come from the host/device (untrusted) and can make `buf_pos` exceed the allocated 4K buffer, leading to a crash. Please add explicit bounds checks against `data_in_size` (and validate `page_length`/descriptor sizes) before calling `read_obj`, or parse using fallible methods that return an error instead of panicking.
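A bounds-checked accessor along the lines this comment asks for, operating on a plain byte slice (`read_checked` is an illustrative stand-in for a fallible `read_obj`):

```rust
// Validate an untrusted (pos, len) pair against the DMA buffer before
// touching it, guarding both integer overflow and out-of-bounds reads.
pub fn read_checked(buf: &[u8], pos: usize, len: usize) -> Result<&[u8], &'static str> {
    // checked_add rejects pos + len overflowing usize.
    let end = pos.checked_add(len).ok_or("length overflows")?;
    // slice::get returns None (not a panic) when the range is out of bounds.
    buf.get(pos..end).ok_or("descriptor exceeds buffer")
}
```

The VPD loop would call this with each descriptor's position and size, bailing out with an error when a device-supplied `page_length`/`identifier_length` runs past the 4K buffer.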
Add disk_storvsc, a DiskIo backend that handles guest SCSI I/O in VTL2 userspace via storvsc_driver over VMBus UIO. When enabled, OpenHCL intercepts VScsi controller channels and translates block I/O into SCSI CDBs, bypassing the kernel hv_storvsc driver for those controllers. Opt-in via OPENHCL_STORVSC_USERMODE=1 environment variable.
- Propagate negotiate() errors via ? instead of silently ignoring
- Log decode errors and PagedRange failure inputs for diagnostics
- Fix VPD NAA ID parsing to read from descriptor start
- Add backoff on wait_resize error instead of busy-spinning
- Validate sub_device_path with u8::try_from for LUN routing
- Fix UNMAP allocation_length to include header + descriptor
- Remove unused LockedPages::va() (dead code from prior draft)
- Remove resize_listeners from inspect (internal bookkeeping)
- Stop all drivers unconditionally on shutdown (deep defensive)
- Fix ScsiOp::INQUIRY type in fuzz target
- Add TODO for resize integration tests
…m gate

Remove ChannelType::Pipe from storvsp SCSI offer -- incorrectly ported from eric135 draft. Windows storvsp uses flags=0 for SCSI (Default). This broke all UEFI VMM tests with "Unrecognized operation COMPLETE_IO". Add cfg(unix) to disk_storvsc (depends on Linux-only vmbus_user_channel). Fix UNMAP to send actual param list length instead of full page size.
Set has_negotiated=true after the explicit negotiate() call in StorvscDriver::run(). Without this, StorvscState::run() sees has_negotiated==false and calls negotiate() a second time. storvsp rejects the second BEGIN_INITIALIZATION because it is already in Ready state, returning INVALID_DEVICE_STATE. This breaks all storvsp tests that go through the UIO channel path.
Skip disk-specific metadata queries for optical devices (device_type 0x05). DVD/CD-ROM only needs capacity for read I/O -- SimpleScsiDvd handles SCSI protocol (INQUIRY, MODE_SENSE, VPD) itself and only delegates read/eject.

- Split fetch_capacity into fetch_capacity_10/16 for clarity
- Optical path: READ_CAPACITY(10), disk_id=None, read_only=true, unmap=0
- Disk path: READ_CAPACITY(16->10 fallback), VPD, MODE_SENSE, Block Limits
- Fix misleading comment in storvsc_driver (resend -> cancel so caller can retry)
…icrosoft#11)

Root cause: the IDE hard drive CommandBuffer uses heap-allocated memory with fake GPNs [0,1,2,...,15]. When StorvscDisk uses the GPA-direct (non-bounce) path for IDE port I/O, these fake GPNs are sent to host storvsp, which writes disk data to guest pages 0-15 (the IVT/BDA area) instead of the IDE command buffer. PCAT firmware reads garbage and hangs.

Fix: two resolvers for one shared disk. IDE accel (VMBus SCSI, real GPNs) gets GPA-direct zero-copy; IDE direct (port I/O, fake GPNs) gets forced bounce buffers.

- disk_storvsc: Extract read/write_vectored_inner with a force_bounce param. Add a StorvscDiskBounce wrapper (Arc<StorvscDisk> + force_bounce flag) that implements DiskIo -- delegates everything, overriding read/write to force bounce when needed.
- storvsc_manager: Add disk_cache on StorvscDiskResolver (shared between config types via Clone). Add a StorvscDiskBounceConfig resource type with its own resolver that wraps the cached Arc<StorvscDisk> in the bounce wrapper. Both resolvers produce StorvscDiskBounce (force_bounce=false for accel, true for direct).
- ide_resources: Add an ide_direct_disk_type field to GuestMedia::Disk for carrying the alternate bounce disk resource alongside the main one.
- vtl2_settings_worker: In make_ide_disk_config, when storvsc usermode is active and the disk is VScsi-backed, set ide_direct_disk_type to a StorvscDiskBounceConfig with the same controller GUID and LUN.
- worker.rs: Register both StorvscDiskConfig and StorvscDiskBounceConfig resolvers from a single shared StorvscDiskResolver instance. In IDE disk wiring, resolve ide_direct_disk_type (if present) for DriveMedia::hard_disk while using the main disk_type for the IDE accelerator storvsp.
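The delegating-wrapper pattern described above can be sketched with a simplified stand-in trait (all names and the trait surface are hypothetical, not the actual `DiskIo` API):

```rust
use std::sync::Arc;

// Hypothetical stand-in for the disk I/O trait. The return value here
// just encodes which path was taken: 1 = bounce buffer, 0 = GPA-direct.
trait BlockIo {
    fn read(&self, sector: u64, force_bounce: bool) -> u64;
}

struct StorvscDiskSketch;

impl BlockIo for StorvscDiskSketch {
    fn read(&self, _sector: u64, force_bounce: bool) -> u64 {
        if force_bounce { 1 } else { 0 }
    }
}

// Wrapper holding the shared disk plus a force_bounce flag. It delegates
// to the inner disk but pins reads to the bounce path when flagged, so
// port-I/O callers with fake GPNs can never hit the GPA-direct path.
struct BounceWrapper {
    inner: Arc<StorvscDiskSketch>,
    force_bounce: bool,
}

impl BlockIo for BounceWrapper {
    fn read(&self, sector: u64, _force_bounce: bool) -> u64 {
        self.inner.read(sector, self.force_bounce)
    }
}
```

Both resolvers can then hand out the same `Arc`-shared disk with different flag values.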
- Fix comment typo in storvsc_driver save() ("not state" -> "no state")
- Use device_type-aware capacity fetch in wait_resize() to handle >2TB disks correctly (READ_CAPACITY(16) for disk, (10) for optical)
- Replace lock().unwrap() with lock().expect() for diagnostic messages on mutex poison (4 sites in storvsc_manager resolver cache)
- Differentiate expected vs unexpected errors in claim_vmbus_device_for_uio: ENODEV/EBUSY logged at debug (expected), other errors at warn
- Add bounds checking on VPD Device Identification parsing to guard against untrusted host-provided page_length values
- Fix disk_storvsc Cargo.toml dependency ordering per openvmm conventions
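The bounds check on the host-provided VPD length can be sketched as follows (layout simplified to a 4-byte page header with a big-endian length field per SPC; this is not the actual parser):

```rust
// Sketch: parse a VPD page while bounds-checking the host-provided
// page length before trusting it. Byte 0 is the device type/qualifier,
// byte 1 the page code, bytes 2..4 the big-endian length of the
// descriptor list that follows the 4-byte header.
fn vpd_descriptor_list(buf: &[u8]) -> Option<&[u8]> {
    if buf.len() < 4 {
        return None;
    }
    let page_length = u16::from_be_bytes([buf[2], buf[3]]) as usize;
    // Reject lengths that run past the buffer we actually received.
    let end = 4usize.checked_add(page_length)?;
    if end > buf.len() {
        return None;
    }
    Some(&buf[4..end])
}
```

A compromised or buggy host can report any `page_length`, so the parser must never index past the received buffer.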
```rust
const_assert!(size_of::<scsi_defs::ReadCapacityData>() as u64 <= PAGE_SIZE_4K);
let data_in_size = PAGE_SIZE_4K as usize;
```
`size_of` is used unqualified throughout this new file, but `std::mem::size_of` is not imported. This will fail to compile; either add `use std::mem::size_of;` near the imports or qualify each call as `std::mem::size_of::<T>()`.
```diff
@@ -181,15 +181,19 @@ pub fn open_uio_device(instance_id: &Guid) -> Result<File, Error> {
 pub fn channel(
     driver: &(impl Driver + ?Sized),
     file: File,
```
Changing the signature of the public `channel()` function is a breaking API change for downstream crates. Consider re-introducing the previous `channel(driver, file)` as a wrapper that calls the new function with `None, None` (and optionally deprecate the old signature) to preserve compatibility while still enabling configurable ring sizes.
```suggestion
    file: File,
) -> Result<RawAsyncChannel<MappedRingMem>, Error> {
    channel_with_ring_sizes(driver, file, None, None)
}

/// Opens a channel with a file from [`open_uio_device`] and optional ring sizes.
pub fn channel_with_ring_sizes(
    driver: &(impl Driver + ?Sized),
    file: File,
```
```rust
{
    Ok(resp) if resp.scsi_status == ScsiStatus::GOOD => {
        let block_limits = data_in.read_obj::<scsi_defs::VpdBlockLimitsDescriptor>(0);
        Ok(block_limits.optimal_unmap_granularity.into())
    }
    Ok(resp) => {
        // Device doesn't support Block Limits VPD (e.g., DVD).
        // CHECK_CONDITION with ILLEGAL REQUEST is expected here.
        tracing::debug!(
            scsi_status = ?resp.scsi_status,
            "Block Limits VPD not supported, unmap disabled"
        );
        Ok(0)
    }
```
`fetch_optimal_unmap_sectors()` treats any non-GOOD SCSI status as "not supported" and silently disables unmap. That can mask real device/transport errors. Prefer checking for the specific expected failure (e.g., CHECK_CONDITION + ILLEGAL REQUEST / relevant sense data) and return an error for other statuses so callers can surface genuine failures.
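A sketch of the narrower check this comment suggests, using the fixed-format sense data layout from SPC (the sense key is the low nibble of byte 2; ILLEGAL REQUEST is 0x5). Names and the exact response shape are hypothetical:

```rust
// Sketch: distinguish the expected "VPD page not supported" failure from
// genuine device/transport errors, so only the former disables unmap.
const CHECK_CONDITION: u8 = 0x02;
const SENSE_KEY_ILLEGAL_REQUEST: u8 = 0x5;

fn is_unsupported_vpd(scsi_status: u8, sense: &[u8]) -> bool {
    scsi_status == CHECK_CONDITION
        && sense
            .get(2)
            .map_or(false, |b| b & 0x0f == SENSE_KEY_ILLEGAL_REQUEST)
}
```

Callers would return `Ok(0)` when this predicate holds and propagate an error otherwise.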
```rust
/// in GPA Direct packets (zero-copy).
use_bounce_buffer: bool,
// Pre-fetched metadata: queried once at construction to avoid block_on in
// sync DiskIo methods. Capacity is refetched on resize via wait_resize().
```
The struct comment says capacity is “refetched on resize via wait_resize()”, but wait_resize() currently only returns the updated sector count and never updates the cached self.sector_count/self.sector_size. If DiskIo consumers expect sector_count() to reflect the new size after wait_resize() completes, this will remain stale. Consider using interior mutability (e.g., atomics or a mutex) to update cached capacity when a resize is detected, or adjust the comment/contract so callers know they must use the return value and not the cached getter.
```suggestion
// sync DiskIo methods. These cached values are not updated by wait_resize();
// callers must use the value returned by wait_resize() after a resize.
```
```rust
async fn wait_resize(&self, sector_count: u64) -> u64 {
    loop {
        let listen = self.resize_event.listen();
        // Refetch capacity from host (we're in async context here)
```
The struct comment says capacity is “refetched on resize via wait_resize()”, but wait_resize() currently only returns the updated sector count and never updates the cached self.sector_count/self.sector_size. If DiskIo consumers expect sector_count() to reflect the new size after wait_resize() completes, this will remain stale. Consider using interior mutability (e.g., atomics or a mutex) to update cached capacity when a resize is detected, or adjust the comment/contract so callers know they must use the return value and not the cached getter.
```suggestion
//
// This method refetches capacity from the host until it changes from the
// caller-provided `sector_count`, then returns the updated sector count.
// It does not update any cached capacity fields on `self`; callers must
// use the returned value rather than assuming cached getters changed.
async fn wait_resize(&self, sector_count: u64) -> u64 {
    loop {
        let listen = self.resize_event.listen();
        // Refetch capacity from the host (we're in async context here) and
        // compare it against the caller's current sector count.
```
```rust
if force_bounce || self.use_bounce_buffer {
    let dma_buf = self
        .driver
        .allocate_dma_buffer(buffers.len())
        .map_err(|e| DiskError::Io(std::io::Error::other(e)))?;
```
The bounce-buffer path allocates a fresh DMA buffer and an intermediate Vec<u8> on every read, which can add significant overhead for I/O-heavy workloads. Consider introducing a reusable buffer pool (or per-disk scratch buffers sized to max request) so reads/writes avoid repeated heap allocations and copies where possible.
```rust
if result.is_ok() {
    let mut data = vec![0u8; buffers.len()];
    dma_buf.read_at(0, &mut data);
    let mut writer = buffers.writer();
    writer.write(&data)?;
}
```
The bounce-buffer path allocates a fresh DMA buffer and an intermediate Vec<u8> on every read, which can add significant overhead for I/O-heavy workloads. Consider introducing a reusable buffer pool (or per-disk scratch buffers sized to max request) so reads/writes avoid repeated heap allocations and copies where possible.
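A minimal std-only sketch of the pooling idea (the real code would hand out DMA-capable buffers rather than plain `Vec`s, and would likely bound the pool size):

```rust
use std::sync::Mutex;

// Sketch: a reusable scratch-buffer pool. take() hands out a buffer of
// `len` bytes, reusing a previously returned allocation when one with
// sufficient capacity is available, instead of heap-allocating per I/O.
struct BufferPool {
    free: Mutex<Vec<Vec<u8>>>,
}

impl BufferPool {
    fn new() -> Self {
        Self {
            free: Mutex::new(Vec::new()),
        }
    }

    fn take(&self, len: usize) -> Vec<u8> {
        let mut free = self.free.lock().unwrap();
        match free.iter().position(|b| b.capacity() >= len) {
            Some(i) => {
                let mut buf = free.swap_remove(i);
                buf.clear();
                buf.resize(len, 0);
                buf
            }
            None => vec![0u8; len],
        }
    }

    fn give_back(&self, buf: Vec<u8>) {
        self.free.lock().unwrap().push(buf);
    }
}
```

Per-disk pools sized to the maximum request would avoid both the repeated allocation and cross-disk lock contention.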
```rust
let mut num_tries = 0;
loop {
    match self
        .driver
        .send_request(&request, buf_gpns, byte_len, gpn_offset)
        .await
```
The retry loop for CancelledRetry has no backoff/yield and can immediately re-issue requests repeatedly. If the underlying condition persists briefly (e.g., during servicing churn), this can create a tight loop and log/CPU pressure. Consider adding a small async delay or at least an executor yield between retries (and include retry count in logs) to make retry behavior more stable.
```rust
            err,
        ))),
        StorvscErrorKind::CancelledRetry => {
            if num_tries < MAX_RETRIES {
```
The retry loop for CancelledRetry has no backoff/yield and can immediately re-issue requests repeatedly. If the underlying condition persists briefly (e.g., during servicing churn), this can create a tight loop and log/CPU pressure. Consider adding a small async delay or at least an executor yield between retries (and include retry count in logs) to make retry behavior more stable.
```suggestion
if num_tries < MAX_RETRIES {
    tracelimit::error_ratelimited!(
        error = &err as &dyn std::error::Error,
        retry = num_tries + 1,
        max_retries = MAX_RETRIES,
        "SCSI request cancelled; yielding before retry"
    );
    tokio::task::yield_now().await;
```
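A bounded exponential backoff schedule is one alternative to a bare yield between retries. A sketch where the base and cap values are placeholders, not the driver's actual policy:

```rust
use std::time::Duration;

// Sketch: compute a capped exponential delay per retry attempt, so a
// persistently failing request backs off instead of re-issuing in a
// tight loop. Base 1ms, doubling per attempt, capped at 100ms.
fn backoff_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 1;
    const CAP_MS: u64 = 100;
    let ms = BASE_MS.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(ms.min(CAP_MS))
}
```

The retry loop would sleep for `backoff_delay(num_tries)` (via whatever timer the executor provides) before re-sending, and log the attempt number alongside the error.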
```rust
let lun = u8::try_from(device.sub_device_path).ok();
lun.map(|lun| {
    Resource::new(StorvscDiskBounceConfig {
        instance_guid: device.vmbus_instance_id,
        lun,
    })
})
```
This silently drops the bounce-disk configuration if sub_device_path doesn’t fit in u8 (.ok()), which makes failures harder to diagnose. Since the non-bounce storvsc path already treats out-of-range LUNs as an error, it would be more consistent to propagate an error here too (or reuse the already-validated LUN) so IDE direct-path routing can’t silently fall back to the unsafe/non-bounce behavior.
```suggestion
let lun = u8::try_from(device.sub_device_path).context(
    "VScsi IDE direct-path routing requires sub_device_path to fit in u8",
)?;
Some(Resource::new(StorvscDiskBounceConfig {
    instance_guid: device.vmbus_instance_id,
    lun,
}))
```
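The validation the comment asks for can be sketched as a small helper that fails loudly instead of silently dropping the configuration (hypothetical; the real code would use the crate's error type rather than `String`):

```rust
// Sketch: validate sub_device_path once, returning an error rather than
// None so a too-large LUN cannot silently skip the bounce-disk config.
fn lun_from_sub_device_path(sub_device_path: u32) -> Result<u8, String> {
    u8::try_from(sub_device_path)
        .map_err(|_| format!("sub_device_path {sub_device_path} does not fit in u8"))
}
```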
Summary
Add `disk_storvsc`, a `DiskIo` backend that handles guest SCSI I/O in VTL2 userspace via `storvsc_driver` over VMBus UIO. When enabled, OpenHCL intercepts VScsi controller channels and translates block I/O into SCSI CDBs, bypassing the kernel `hv_storvsc` driver for relay controllers. Opt-in via the `OPENHCL_STORVSC_USERMODE=1` environment variable. Not enabled by default.
What's included
New crate: `disk_storvsc`
- `DiskIo` implementation using SCSI CDBs over `storvsc_driver`

`storvsc_driver` enhancements

OpenHCL integration (`underhill_core`)
- `StorvscManager`: actor-based manager following the `NvmeManager` pattern
- `StorvscDiskResolver`: resolves `StorvscDiskConfig` from VTL2 settings
- `claim_vmbus_device_for_uio`
- `vtl2_settings_worker` mesh (10004)

Supporting changes
- `scsi_defs`: additional SCSI structs for CDB construction
- `storvsp_protocol`: `Inspect` derive on protocol types
- `vmbus_user_channel`: configurable ring buffer sizes for `message_pipe`
- `guestmem`: `LockedPages::va()` accessor

Testing
- `cargo test -p storvsc_driver` -- 3 unit tests pass
- `openvmm_openhcl_linux_x64_storvsp*` VMM tests pass with STORVSC_USERMODE=1

Related issues
Closes #3094, closes #3095, closes #3096
Ref #273
Notes for reviewers
`OPENHCL_STORVSC_USERMODE=1` is set in `openhcl_boot/src/main.rs` for testing. Will be removed before merge (the feature is opt-in only via env var).