
Improve OCI image filesystem performance using block device backing #499

@appcypher

Description

Summary

Filesystem performance for OCI images is ~10x slower than Docker containers. The current approach passes the root filesystem through virtio-fs (FUSE passthrough), where every file operation crosses the FUSE → virtio → host boundary. We should explore Docker's approach of assembling OCI layers onto a block device.

Current approach

OCI image layers are stored on the host filesystem and exposed to the guest via virtio-fs FUSE passthrough. OverlayFS semantics are implemented in userspace on the host side (crates/filesystem/lib/backends/).
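For intuition on where the overhead comes from, here is a minimal, hypothetical sketch (not the actual crates/filesystem code) of overlay-style lookup done in host userspace, with OCI whiteout handling; every guest file operation pays for a walk like this plus a FUSE → virtio round trip:

```python
import os

WHITEOUT_PREFIX = ".wh."  # OCI layer whiteout marker


def resolve(layers, relpath):
    """Resolve relpath against an ordered list of layer dirs (topmost first).

    Sketch only: the real backend also handles opaque dirs, metadata
    copy-up, and caching, but the per-operation layer walk is the same shape.
    """
    name = os.path.basename(relpath)
    parent = os.path.dirname(relpath)
    for layer in layers:
        # A whiteout in an upper layer hides the entry in all lower layers.
        if os.path.exists(os.path.join(layer, parent, WHITEOUT_PREFIX + name)):
            return None
        candidate = os.path.join(layer, relpath)
        if os.path.lexists(candidate):
            return candidate
    return None
```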

Proposed approach

Explore assembling OCI layers onto a block device (QCOW2 or raw) with a real filesystem (ext4), then passing it to the guest as a virtio-blk device. This gives kernel-to-kernel filesystem access inside the guest, bypassing FUSE overhead entirely.
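As a sketch of what pre-assembly on the host could look like (paths and the fixed image size are placeholder assumptions; real code would size the image from the layers and honor whiteouts during extraction), the command sequence might be:

```python
def preassemble_cmds(layer_tars, image_path, mnt, size="10G"):
    """Build a host-side command sequence that assembles OCI layer tarballs
    into a raw ext4 image the guest can then mount via virtio-blk.

    Returns the commands as argv lists without executing them.
    """
    cmds = [
        ["truncate", "-s", size, image_path],   # sparse raw image
        ["mkfs.ext4", "-q", image_path],        # real filesystem, no FUSE
        ["mount", "-o", "loop", image_path, mnt],
    ]
    # Extract layers bottom-up so upper layers overwrite lower ones.
    for tar in layer_tars:
        cmds.append(["tar", "-xf", tar, "-C", mnt])
    cmds.append(["umount", mnt])
    return cmds
```

Inside the guest, the device then shows up as an ordinary block device and the guest kernel mounts ext4 directly, which is the kernel-to-kernel path described above.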

Docker Desktop takes this approach on macOS. All image layers live on a persistent block device (Docker.raw), and overlayfs runs natively inside the VM kernel.

Options to explore

  1. Pre-assemble: Write OCI layers to a raw block device at pull time, mount as virtio-blk at boot
  2. QCOW2 COW backing chains: Use QCOW2's native copy-on-write with backing files to represent layers without duplicating data
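For option 2, a sketch of how a QCOW2 backing chain could represent the layer stack (file names are hypothetical, and populating each layer's contents, e.g. via qemu-nbd plus tar extraction, is omitted):

```python
def qcow2_chain_cmds(layer_images, container_image):
    """Build qemu-img commands that chain per-layer QCOW2 files via backing
    files, so shared layers are stored once and each container gets a thin
    writable COW overlay on top.

    Returns the commands as argv lists without executing them.
    """
    cmds = []
    backing = None
    for img in layer_images:
        cmd = ["qemu-img", "create", "-f", "qcow2"]
        if backing is not None:
            # -F states the backing format explicitly, as newer qemu-img requires.
            cmd += ["-b", backing, "-F", "qcow2"]
        cmds.append(cmd + [img])
        backing = img
    # Writable top overlay that becomes the container's root filesystem.
    cmds.append(["qemu-img", "create", "-f", "qcow2",
                 "-b", backing, "-F", "qcow2", container_image])
    return cmds
```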

Implementing this should also fix a class of filesystem compatibility issues and flaky behavior, such as:

  • Case-sensitivity edge cases
  • RLIMIT_NOFILE not behaving the way users expect
  • stat overrides themselves not being virtualizable

Context

A user reported a ~10x filesystem performance gap versus Docker and worked around it by implementing their own BlockDeviceMount. We should close this gap for the default OCI image path.
