machine: design doc for uncompressed image cache#28857

Open
inknos wants to merge 2 commits into podman-container-tools:main from inknos:run-4473-design-doc

inknos commented Jun 4, 2026

Contributor

Fixes: https://redhat.atlassian.net/browse/RUN-4473

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match
    the sign-off email address. See CONTRIBUTING.md
    for more information.
  • Referenced issues using Fixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed)
  • Documentation has been updated (or no documentation changes are needed)
  • All commits pass make validatepr (format/lint checks)
  • Release note entered in the section below (or None if no user-facing changes)

Does this PR introduce a user-facing change?



`podman machine init` is slow for several reasons, one of which is that the VM disk image is decompressed on every invocation, even when the compressed image is already cached locally. When a user runs `podman machine rm` followed by `podman machine init`, the decompressed disk is destroyed and recreated from scratch.

This proposal eliminates redundant decompression by caching the decompressed base image alongside the compressed blob, drastically reducing warm init times. It also outlines possible future improvements, such as a `podman machine pull` command that could further reduce init times, or at least move the slow work to a phase where the user expects to wait, namely the pull.
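A minimal sketch of the "decompress once, reuse on warm init" idea described above. It is not podman's implementation: `ensure_decompressed` and the file names are hypothetical, and gzip is used only as a stand-in from the Python standard library for whatever compression the real images use.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def ensure_decompressed(compressed: Path) -> tuple[Path, bool]:
    """Return (path of the decompressed image, whether decompression ran)."""
    raw = compressed.with_suffix("")  # e.g. disk.raw.gz -> disk.raw
    if raw.exists():
        return raw, False  # warm cache: skip decompression entirely
    # Write to a temporary name first so an interrupted run never leaves
    # a truncated file that a later init would mistake for a valid cache.
    tmp = raw.with_name(raw.name + ".tmp")
    with gzip.open(compressed, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    tmp.replace(raw)
    return raw, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        blob = Path(cache_dir) / "disk.raw.gz"
        with gzip.open(blob, "wb") as f:
            f.write(b"pretend this is a disk image")
        print(ensure_decompressed(blob))  # cold: decompression runs
        print(ensure_decompressed(blob))  # warm: cached file is reused
```

The second call returns immediately because the decompressed file already sits next to the blob, which is the saving the proposal is after.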

Contributor

why bother keeping a compressed version?

Contributor Author

I didn't want to break any existing behavior by removing a file, but I am fully open to removing the compressed blob.

Contributor

I think it is logical to work with uncompressed images only, because then you don't have to fiddle with the UI. That said, you will need to think about migration for users who have compressed cache files today. I'd also be curious what other folks think.

Contributor

We could just migrate to uncompressed only. I believe the only penalty is an extra cache miss the first time the user runs init after this change.

Contributor Author

Then I think it's settled. There is no breaking change: if the compressed file is needed, it will just be redownloaded.

Member

> we could just migrate to uncompressed only - I believe the penalty for this is just an extra cache miss the first time the user init's after this change.

There is no penalty at all in theory: updating to a new podman version is required to use this feature, and that always implies a cache miss. With each new podman version you need to download the new image for that version anyway.

### High-level approach

- Decompress the image at pull time and save it next to the compressed blob in the cache directory.
- Add a new `podman machine rm --cache` flag to allow explicit cache pair removal.

Contributor

If you don't keep a compressed version, then this is not needed?

@ashley-cui

Contributor

Will using reflink affect podman machine start times, since machine start is guaranteed to modify the disk? If so, would it be better to just do a full copy during init instead? I feel like I'd prefer a slightly snappier first start over init

@baude

baude commented Jun 4, 2026

Contributor

> Will using reflink affect podman machine start times, since machine start is guaranteed to modify the disk? If so, would it be better to just do a full copy during init instead? I feel like I'd prefer a slightly snappier first start over init

Excellent points. I don't recall the exact answer (I fiddled with this at one time), but IIRC it also depended on the platform (Windows, Darwin, Linux).

@inknos

inknos commented Jun 4, 2026

Contributor Author

> Will using reflink affect podman machine start times, since machine start is guaranteed to modify the disk? If so, would it be better to just do a full copy during init instead? I feel like I'd prefer a slightly snappier first start over init

Yup, makes a lot of sense. Start is when it needs to be fast. Furthermore, init time could be cut down further if we split out the pull + extract step. I think this change would be especially beneficial for Podman Desktop folks, who I believe could do the setup steps well before the user starts the machine.

@Luap99

Luap99 commented Jun 5, 2026

Member

> > Will using reflink affect podman machine start times, since machine start is guaranteed to modify the disk? If so, would it be better to just do a full copy during init instead? I feel like I'd prefer a slightly snappier first start over init
>
> yup, makes a lot of sense. start is when it needs to be fast. furthermore, the init time could be additionally cut down if we split the pull + extract step. I think this change would be beneficial especially for Podman Desktop folks, which I believe could do the setup steps way before the user starts the machine.

I would not argue based on assumptions; this should be measured if we want to decide on it.

The changes `podman machine start` makes to the image contents are minimal, so how many extra writes would that produce? A lot less than a full copy on each init.
And time-wise, it should only pay the cost once, not on each start.

At least looking at our CI tests (and our historical flakiness), I think a reflink copy would help a ton there, for example by speeding up the copies and thus the overall tests.
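For readers unfamiliar with the reflink-versus-full-copy trade-off discussed in this thread, here is a rough sketch of "try a reflink, fall back to a full copy". It is an illustration only, not what podman does: `clone_or_copy` is a hypothetical helper, only the Linux `FICLONE` ioctl is attempted, and the request number is the value used on x86 and ARM (other architectures may encode it differently). Other platforms have their own cloning mechanisms, which this sketch does not cover.

```python
import os
import shutil
import sys
import tempfile

try:
    import fcntl  # POSIX only; missing on Windows
except ImportError:
    fcntl = None

# Linux FICLONE ioctl request number, _IOW(0x94, 9, int), as encoded on x86 and ARM.
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> str:
    """Copy src to dst, trying a reflink first. Returns "reflink" or "copy"."""
    if fcntl is not None and sys.platform.startswith("linux"):
        with open(src, "rb") as s, open(dst, "wb") as d:
            try:
                # Ask the filesystem to share src's extents with dst.
                fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
                return "reflink"
            except OSError:
                pass  # filesystem cannot reflink here: fall through to a copy
    shutil.copyfile(src, dst)  # full copy of the file contents
    return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "base.raw")
        dst = os.path.join(tmp, "machine.raw")
        with open(src, "wb") as f:
            f.write(b"\0" * 4096)
        print(clone_or_copy(src, dst))
```

Timing both paths on each supported platform, for init as well as for the first start, would be one way to get the measurements asked for above.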

```
Are you sure you want to continue? [y/N] y
```

When the machine does not exist but `--cache` is specified, cache files are still listed and removed:

Member

UI-wise this seems rather confusing. Shouldn't there be a different command to clean the cache?

The entire podman machine command line is already confusing enough, as no argument means the default machine. So someone who just wants to clean the cache cannot do that.

I think a dedicated command would solve this much more cleanly and make the docs much more straightforward, e.g. `podman machine cache`, which by default just shows the cache file location and size, and removes the cache when `--remove` is added?


- Decompress the image at pull time and save it next to the compressed blob in the cache directory.
- Add a new `podman machine rm --cache` flag to allow explicit cache pair removal.
- Add an `ImageDigest` field to `MachineConfig` to track provenance and enable refcounting.

Member

The refcounting references totally confuse me. What are you trying to count?

If we do real reflink copies of files, there is nothing to count: the file is a copy and the filesystem handles the deduplication internally, so userspace does not need to know anything about it?

@inknos

inknos commented Jun 8, 2026

Contributor Author

I updated the design doc with a second commit.

  • it introduces podman machine cache for managing the cache
  • uses uncompressed cache only
  • copies on init

edit:

for clarification, we could add reflink copies in this design doc or later

- introduce `podman machine cache`
- use uncompressed cache only
- copy image on init

Signed-off-by: Nicola Sella <nsella@redhat.com>
inknos force-pushed the run-4473-design-doc branch from 6a4ad2a to 1abe5e7 (June 8, 2026 05:19)

| Command | Effect |
| --- | --- |
| `podman machine cache` | List cached base images (digest, format, size) |
| `podman machine cache --remove` | Delete all cached base images after confirmation |
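
A rough sketch of what the two operations in the table could amount to on disk. Everything here is an assumption made for illustration: the helper names, and a cache layout where each file is named `<digest>.<format>`. The confirmation prompt is left out.

```python
import tempfile
from pathlib import Path


def list_cache(cache_dir: Path) -> list[tuple[str, str, int]]:
    """Return (digest, format, size in bytes) for each cached image, sorted by name."""
    entries = []
    for path in sorted(cache_dir.iterdir()):
        if not path.is_file():
            continue
        # Hypothetical naming scheme: "<digest>.<format>".
        digest, _, fmt = path.name.partition(".")
        entries.append((digest, fmt, path.stat().st_size))
    return entries


def remove_cache(cache_dir: Path) -> int:
    """Delete every cached image and return the number of bytes freed."""
    freed = 0
    for path in sorted(cache_dir.iterdir()):
        if path.is_file():
            freed += path.stat().st_size
            path.unlink()
    return freed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        (cache / "abc123.raw").write_bytes(b"\0" * 1024)
        for digest, fmt, size in list_cache(cache):
            print(digest, fmt, size)
        print("freed", remove_cache(cache), "bytes")
```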

Contributor

I am not sure about the `--remove` flag, since `podman machine cache` lists. Maybe create `podman machine cache prune` instead?

Contributor Author

> I am not sure about --remove flag since podman machine cache lists. Maybe create podman machine cache prune?

Depending on the complexity of the subcommand, `prune` might be worth it, e.g. in a future scenario where we might have `prune --all` or `prune --outdated`.

Contributor

That sounds good. We can start with just pruning.

@ashley-cui

Contributor

Since the cache only manages the default podman machine image, I think it's overkill to have a machine cache command. This seems like an internal detail that users shouldn't have to worry about. If they really need to clean the cache, or something breaks, then I'd recommend just telling them to run `podman machine reset`.

A separate machine cache command could be discussed if we ever decide on caching user-created machine images, which introduces a lot more complexity that we might not want, and might end up looking like a machine images command. I think that would warrant another future design doc, and seems out of scope for the current speedup.
