Summary
Filesystem operations on OCI images are roughly 10x slower than in Docker containers. The current approach passes the root filesystem through virtio-fs (FUSE passthrough), so every file operation crosses the FUSE → virtio → host boundary. We should explore Docker Desktop's approach of assembling OCI layers onto a block device.
Current approach
OCI image layers are stored on the host filesystem and exposed to the guest via virtio-fs FUSE passthrough. OverlayFS semantics are implemented in userspace on the host side (crates/filesystem/lib/backends/).
Proposed approach
Explore assembling OCI layers onto a block device (QCOW2 or raw) with a real filesystem (ext4), then passing it to the guest as a virtio-blk device. This gives kernel-to-kernel filesystem access inside the guest, bypassing FUSE overhead entirely.
Docker Desktop takes this approach on macOS. All image layers live on a persistent block device (Docker.raw), and overlayfs runs natively inside the VM kernel.
Options to explore
- Pre-assemble: Write OCI layers to a raw block device at pull time, then attach it as a virtio-blk device at boot
- QCOW2 COW backing chains: Use QCOW2's native copy-on-write with backing files to represent layers without duplicating data
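For the pre-assemble option, the core step is flattening the ordered layers into a single tree before writing it to the block device. The sketch below illustrates only that merge step, assuming the OCI whiteout conventions (a `.wh.<name>` entry deletes `<name>` from lower layers, and `.wh..wh..opq` hides all lower-layer children of its directory). `Entry`, `flatten`, and the in-memory map standing in for the ext4 tree are hypothetical names for illustration, not existing code in this repository; real layers are tar archives with full metadata (modes, owners, xattrs, hardlinks) that this ignores.

```rust
use std::collections::BTreeMap;

/// One file in a layer, reduced to path + contents (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub path: String,
    pub data: Vec<u8>,
}

const WHITEOUT_PREFIX: &str = ".wh.";
const OPAQUE_MARKER: &str = ".wh..wh..opq";

/// Split "a/b/c" into ("a/b", "c"); a top-level name has an empty parent.
fn split_parent(path: &str) -> (&str, &str) {
    match path.rfind('/') {
        Some(i) => (&path[..i], &path[i + 1..]),
        None => ("", path),
    }
}

/// Remove `path` itself and everything beneath it.
fn remove_tree(root: &mut BTreeMap<String, Vec<u8>>, path: &str) {
    let prefix = format!("{}/", path);
    root.retain(|k, _| k.as_str() != path && !k.starts_with(&prefix));
}

/// Remove everything beneath `dir`; an empty `dir` means the root.
fn remove_children(root: &mut BTreeMap<String, Vec<u8>>, dir: &str) {
    if dir.is_empty() {
        root.clear();
        return;
    }
    let prefix = format!("{}/", dir);
    root.retain(|k, _| !k.starts_with(&prefix));
}

/// Apply layers bottom-to-top and return the merged tree.
pub fn flatten(layers: &[Vec<Entry>]) -> BTreeMap<String, Vec<u8>> {
    let mut root = BTreeMap::new();
    for layer in layers {
        // Pass 1: whiteouts only affect content from lower layers,
        // so process them before adding this layer's own files.
        for e in layer {
            let (parent, name) = split_parent(&e.path);
            if name == OPAQUE_MARKER {
                remove_children(&mut root, parent);
            } else if let Some(target) = name.strip_prefix(WHITEOUT_PREFIX) {
                let full = if parent.is_empty() {
                    target.to_string()
                } else {
                    format!("{}/{}", parent, target)
                };
                remove_tree(&mut root, &full);
            }
        }
        // Pass 2: add or overwrite regular entries; markers are not kept.
        for e in layer {
            let (_, name) = split_parent(&e.path);
            if !name.starts_with(WHITEOUT_PREFIX) {
                root.insert(e.path.clone(), e.data.clone());
            }
        }
    }
    root
}
```

With QCOW2 backing chains, the same per-layer logic would presumably run once per layer against an overlay image whose backing file is the previous layer's image, rather than once against a single flattened disk.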
Moving the root filesystem to a block device would also fix a number of filesystem compatibility issues and sources of flakiness, such as:
- Case sensitivity edge cases
- RLIMIT_NOFILE not behaving the way users expect
- Stat overrides themselves not being virtualizable
Context
A user reported a ~10x filesystem performance gap vs Docker and worked around it by implementing their own BlockDeviceMount. We should close this gap for the default OCI image path.