212 changes: 212 additions & 0 deletions contrib/design-docs/cache-uncompressed-machine-image-design-doc.md
@@ -0,0 +1,212 @@
# Change Request

## **Short Summary**

Speed up `podman machine init` by keeping a decompressed base image in the cache directory (created at pull time), so subsequent inits copy/clone the base instead of re-decompressing. Introduce `podman machine rm --cache` to handle cleanups and cache rotation.

## **Objective**

`podman machine init` is slow for several reasons, one of which is that the VM disk image is decompressed on every invocation, even when the compressed image is already cached locally. When a user runs `podman machine rm` followed by `podman machine init`, the decompressed disk is destroyed and recreated from scratch.

This proposal eliminates redundant decompression by caching the decompressed base image alongside the compressed blob, which should substantially reduce warm init times. It also outlines possible future improvements, such as a `podman machine pull` command that could further reduce init times, or at least move the slow work to the pull phase, where users are more likely to expect a longer wait.

Contributor

why bother keeping a compressed version?

Contributor Author

I didn't want to break any existing behavior by removing a file, but I am totally open to removing the compressed blob.

Contributor

I think it is logical to just work uncompressed, because then you don't have to fiddle with the UI. That said, you will need to think about the migration of someone who has compressed cache files today as you think this over. I'd also be curious what other folks think.

Contributor

we could just migrate to uncompressed only - I believe the penalty for this is just an extra cache miss the first time the user init's after this change.

Contributor Author

Then I think it's settled. There is no breaking change: if the compressed file is needed, it will just be redownloaded.

Member

> we could just migrate to uncompressed only - I believe the penalty for this is just an extra cache miss the first time the user init's after this change.

There is no penalty at all in theory: updating to a new podman version is required to use this new feature, and that always implies a cache miss. With each new podman version you need to download the new image for that version anyway.


## **Detailed Description:**

### High-level approach

- Decompress the image at pull time and save it next to the compressed blob in the cache directory.
- Add a new `podman machine rm --cache` flag to allow explicit cache pair removal.

Contributor

If you don't keep a compressed version, then this is not needed?

- Add an `ImageDigest` field to `MachineConfig` to track provenance and enable refcounting.

Member

The refcounting references totally confuse me. What are you trying to count?

If we do real reflink copies of files, there is nothing to count: the file is a copy and the filesystem handles the deduplication internally, so userspace does not need to know anything about it?


```
Registry (quay.io/podman/machine-os:5.4)
|
| pull (identified by artifact digest)
v
Compressed blob: cache/{digest}.{format}.zst
|
| decompress (at init time)
v
Decompressed base: cache/{digest}.{format}
|
| copy/clone (at init time)
v
Per-machine image: {datadir}/{name}-{arch}.{format}
```

The cache is now a pair of files, one compressed and one decompressed:

- `cache/{digest}.{format}.zst`: Compressed blob pulled from registry
- `cache/{digest}.{format}`: Decompressed base, ready to copy
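
As a rough illustration of this layout, the two paths could be derived from the digest and format as sketched below. The `cachePair` type and `cachePairFor` helper are hypothetical names chosen for this example, not existing podman code, and `abc123` is a placeholder digest.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cachePair holds the two cache entries for one image digest.
// Hypothetical type, used only for illustration.
type cachePair struct {
	Compressed   string // blob as pulled from the registry
	Decompressed string // base image, ready to copy or clone
}

// cachePairFor builds both paths following the
// cache/{digest}.{format}[.zst] naming described above.
func cachePairFor(cacheDir, digest, format string) cachePair {
	base := filepath.Join(cacheDir, digest+"."+format)
	return cachePair{Compressed: base + ".zst", Decompressed: base}
}

func main() {
	// "abc123" is a placeholder, not a real artifact digest.
	pair := cachePairFor("cache", "abc123", "raw")
	fmt.Println(pair.Compressed)
	fmt.Println(pair.Decompressed)
}
```

Deriving both names from one base keeps the pair in sync: the compressed name is always the decompressed name plus `.zst`.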

### Behavior: Old vs New

#### `podman machine init`

The logic is simple: during init, decompress the cached blob and save the result as a separate file in the `cache` directory. If no compressed file is found, pull and decompress. If only the decompressed file is missing, decompress only.
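
The decision described above can be sketched as a small pure function. This is only an illustration: the `initStep` type and `nextInitStep` helper are hypothetical, and the "copy" branch for an existing decompressed base is inferred from the Short Summary rather than spelled out in this paragraph.

```go
package main

import "fmt"

// initStep names the first piece of work init has to do.
// Hypothetical type, not part of podman.
type initStep string

const (
	stepPullAndDecompress initStep = "pull+decompress"
	stepDecompress        initStep = "decompress"
	stepCopy              initStep = "copy"
)

// nextInitStep picks the cheapest step that still yields a
// decompressed base in the cache.
func nextInitStep(hasCompressed, hasDecompressed bool) initStep {
	switch {
	case hasDecompressed:
		// Assumption: a cached base is reused as-is and only copied/cloned.
		return stepCopy
	case hasCompressed:
		return stepDecompress
	default:
		return stepPullAndDecompress
	}
}

func main() {
	for _, c := range []struct{ compressed, decompressed bool }{
		{false, false}, {true, false}, {true, true},
	} {
		fmt.Printf("compressed=%v decompressed=%v -> %s\n",
			c.compressed, c.decompressed, nextInitStep(c.compressed, c.decompressed))
	}
}
```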


#### `podman machine rm`

Deletes the machine image and ignition file and preserves the cache. Use `--cache` to also remove the cache pair.

| Command | Image | Ignition | Cache pair |
| --------------------------------- | ------- | -------- | --------------------------------------------------------- |
| `machine rm` (current) | Deleted | Deleted | Kept |
| `machine rm` (proposed) | Deleted | Deleted | Kept |
| `machine rm --cache` (new) | Deleted | Deleted | Deleted (unless refcount > 0; use `--force` to override) |
| `machine rm --cache` (VM gone) | N/A | N/A | All orphan cache files listed and deleted on confirmation |
| `machine rm --save-image` | Kept | Deleted | Kept |
| `machine rm --save-ignition` | Deleted | Kept | Kept |
| `machine rm --cache --save-image` | Kept | Deleted | Deleted (unless refcount > 0; use `--force` to override) |
| `machine reset` | Deleted | Deleted | Deleted |

When `--cache` is used, a new confirmation prompt shows cache files in a dedicated section:

```
$ podman machine rm --cache
The following files will be deleted:

podman-machine-default.json
podman-machine-default.sock
...

The following cache files will be deleted:

91d1e51d...qcow2.zst
91d1e51d...qcow2
Are you sure you want to continue? [y/N] y
```

When the machine does not exist but `--cache` is specified, cache files are still listed and removed:

Member

I think UI-wise this seems rather confusing. Should there not be a different command to clean the cache?

The entire podman machine command line is already confusing enough, as no argument means the default machine. So someone who just wants to clean the cache cannot do that.

I think a dedicated command would solve this much more cleanly and make the docs much more straightforward, e.g. `podman machine cache`, which by default just shows the cache file location and size, and removes the cache when `--remove` is added?


```
$ podman machine rm --cache
podman-machine-default: VM does not exist

The following cache files will be deleted:

91d1e51d...qcow2.zst
91d1e51d...qcow2
Are you sure you want to continue? [y/N] y
```

If no cache files exist, the output is:

```
$ podman machine rm --cache
podman-machine-default: VM does not exist
No cache files to remove.
```

#### Cache rotation (during `podman machine init`)

| Step | Current | Proposed |
| --------------------- | ---------------------------------- | ---------------------------------------------------------------------- |
| Trigger | Cache miss (new version available) | Same |
| Snapshot | `os.ReadDir(cache/)` before pull | Same |
| Pull new | Download new `.zst` | Same |
| Clean old | Wipe all snapshotted files | Same (now also wipes old decompressed base since it lives in same dir) |
| New decompressed base | N/A | Created after new `.zst` is pulled (before old files are cleaned) |
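
The ordering in the table (snapshot, pull, create the new base, then clean) can be sketched as follows. This is a minimal illustration, not the existing podman code: `rotateCache`, `staleEntries`, and the `fetch` callback are hypothetical, and the check that excludes freshly created files from deletion is an extra safety guard added for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// staleEntries returns the snapshotted names that are not among the
// freshly created files.
func staleEntries(snapshot, fresh []string) []string {
	keep := make(map[string]bool, len(fresh))
	for _, f := range fresh {
		keep[f] = true
	}
	var stale []string
	for _, s := range snapshot {
		if !keep[s] {
			stale = append(stale, s)
		}
	}
	return stale
}

// rotateCache snapshots cacheDir, runs fetch (expected to pull the new
// blob, create the new decompressed base, and return their file names),
// and only then removes the old entries.
func rotateCache(cacheDir string, fetch func() ([]string, error)) error {
	entries, err := os.ReadDir(cacheDir)
	if err != nil {
		return err
	}
	snapshot := make([]string, 0, len(entries))
	for _, e := range entries {
		snapshot = append(snapshot, e.Name())
	}
	fresh, err := fetch()
	if err != nil {
		return err // the old cache is left untouched on failure
	}
	for _, name := range staleEntries(snapshot, fresh) {
		if err := os.Remove(filepath.Join(cacheDir, name)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "cache")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	// Placeholder file names standing in for an old and a new base image.
	if err := os.WriteFile(filepath.Join(dir, "old.raw"), nil, 0o600); err != nil {
		panic(err)
	}
	err = rotateCache(dir, func() ([]string, error) {
		return []string{"new.raw"}, os.WriteFile(filepath.Join(dir, "new.raw"), nil, 0o600)
	})
	if err != nil {
		panic(err)
	}
	entries, _ := os.ReadDir(dir)
	for _, e := range entries {
		fmt.Println(e.Name())
	}
}
```

Cleaning only after the new files exist means a failed pull or decompression leaves the previous cache usable.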

#### `podman machine reset`

| | Current | Proposed |
| ------ | ---------------------------------------- | ---------------- |
| Effect | Wipes entire data dir, config dir, cache | Same (no change) |

### Config change

Add `ImageDigest` field to `MachineConfig`:

```go
type MachineConfig struct {
// ... existing fields ...
ImageDigest string `json:"ImageDigest,omitempty"`
}
```

The field is set during init from the resolved OCI artifact digest. It is used by:

- `rm`: to locate cache files (`cache/{digest}.*`) and do refcount check
- `init`: to verify cached base is current

### Provider considerations

| Provider | Format | Reflink support | Benefit |
| -------- | ------ | ------------------------------ | ---------------------------------------------------------------------------- |
| AppleHV | .raw | APFS clonefile (near-instant) | High |
| LibKrun | .raw | APFS clonefile | High |
| HyperV | .vhdx | NTFS: no reflink, regular copy | High (copy still faster than decompress) |
| QEMU | .qcow2 | btrfs/xfs reflink | Medium (qcow2 is smaller) |
| WSL | .tar | N/A (used for wsl --import) | Medium (caches decompressed tarball, skips download+decompress on re-import) |
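
The copy/clone step could be structured as "try a clone, fall back to a regular copy", which covers both the reflink-capable rows and the NTFS row. The sketch below uses only the standard library, so the actual clone operation is passed in as a function. `cloneFunc`, `cloneOrCopy`, and `errCloneUnsupported` are hypothetical names, and a real implementation would plug in a platform-specific call (for example APFS clonefile or a Linux reflink ioctl).

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// cloneFunc stands in for a platform-specific reflink/clonefile call.
type cloneFunc func(src, dst string) error

var errCloneUnsupported = errors.New("clone not supported on this filesystem")

// copyFile performs a regular byte-for-byte copy.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// cloneOrCopy tries the clone first and falls back to a regular copy.
// The boolean reports whether the clone path was taken.
func cloneOrCopy(src, dst string, clone cloneFunc) (bool, error) {
	if clone != nil {
		if err := clone(src, dst); err == nil {
			return true, nil
		}
	}
	return false, copyFile(src, dst)
}

func main() {
	dir, err := os.MkdirTemp("", "clone")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "base.raw")
	dst := filepath.Join(dir, "machine.raw")
	if err := os.WriteFile(src, []byte("disk"), 0o600); err != nil {
		panic(err)
	}
	// Simulate a filesystem without reflink support.
	noClone := func(_, _ string) error { return errCloneUnsupported }
	cloned, err := cloneOrCopy(src, dst, noClone)
	fmt.Println("cloned:", cloned, "err:", err)
}
```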

### Key implementation files

| File | Change |
| --------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `pkg/machine/ocipull/ociartifact.go` | After pull+unpack, decompress to cache (not per-machine path). On cache hit, return decompressed base path. |
| `pkg/machine/shim/host.go` | Init: copy/clone decompressed base to `mc.ImagePath`. Rm: add cache deletion with refcount check. |
| `pkg/machine/shim/diskpull/diskpull.go` | Route to copy-from-cache when decompressed base exists |
| `pkg/machine/vmconfigs/config.go` | Add `ImageDigest` field |
| `pkg/machine/vmconfigs/machine.go` | `Remove()`: add cache pair deletion logic with refcount |
| `cmd/podman/machine/rm.go` | Add `--cache` flag |
| `pkg/machine/config.go` | Add `Cache` (or `RemoveCache`) to `RemoveOptions` |

## **Use cases**

- **Repeated init/rm cycle**: A developer who destroys and recreates machines during testing benefits from short `init` times. With the decompressed base cached, `podman machine init` completes in a few seconds.
- **Better cache management**: Users gain better control over reclaiming disk space and managing machine files through a dedicated `--cache` flag.

## **Target Podman Release**

After Podman 6

## **Link(s)**

- [RUN-4473](https://redhat.atlassian.net/browse/RUN-4473) — Jira tracking issue

## **Stakeholders**

- [x] Podman Users
- [x] Podman Developers
- [ ] Buildah Users
- [ ] Buildah Developers
- [ ] Skopeo Users
- [ ] Skopeo Developers
- [x] Podman Desktop
- [ ] CRI-O
- [ ] Storage library
- [ ] Image library
- [ ] Common library
- [ ] Netavark and aardvark-dns

## **Assignee(s)**

@inknos

## **Impacts**

### **CLI**

- New `--cache` flag on `podman machine rm` to force deletion of the cache pair (compressed + decompressed).
- Confirmation prompt updated to show cache files in a dedicated section when `--cache` is used.
- No changes to `podman machine init` CLI interface; the optimization is transparent.

### **Libpod**

- New `ImageDigest` field in `MachineConfig` to track image provenance.
- Add `Cache` field to `RemoveOptions` to support `--cache`.

## **Further Description (Optional):**

### Future improvements

- Add a `podman machine pull` command that pulls and decompresses the cache. Options could be:
  - `podman machine pull`: new command
  - `podman machine pull --no-decompress-cache`: pull only (if the default is to pull and decompress)
  - `podman machine pull --decompress-cache`: pull and decompress (if the default is pull only)

## **Test Descriptions (Optional):**

<!-- How will this feature be tested? -->