
Stargz Memory consumption #2275

@dgaponcic

Description


Hi,

We are deploying stargz on a cluster hosting GitLab runners. This cluster runs user workloads, which means the pods are short-lived (from a couple of minutes to a couple of hours) and the images used are always different.

The images can be quite large, ranging from 70 GB to 120 GB compressed, with 20 to 50 layers.

The behaviour we are experiencing:

  1. When we run the first pod on a fresh node, the stargz memory utilization grows to around 0.6G for one of the user images.
  2. Once the pod finishes running, the memory consumption stays the same. I think this is because the mounts are preserved.
  3. When a new workload is scheduled on the same node, the memory consumption grows again, usually to around 1G-1.3G depending on the new image.

I think some of the memory is eventually released, but when we leave the cluster running for a couple of days with many back-to-back workloads, stargz consumes more and more memory until it exhausts the node.
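For tracking this growth across workloads, a node-side sketch like the following can help. It uses only /proc, so it works regardless of the install; the process name `containerd-stargz-grpc` is an assumption here and should be swapped for the fuse-manager process name when that setup is in use.

```shell
#!/bin/sh
# Diagnostic sketch: report the snapshotter's resident memory and the number
# of live FUSE mounts, so growth can be sampled before/after each workload.
# The process name "containerd-stargz-grpc" is an assumption -- substitute
# the fuse-manager process name when running in that mode.

# Sum VmRSS (KiB) over all processes whose executable matches the given name,
# using /proc/<pid>/cmdline (avoids the 15-char comm truncation in /proc).
rss_kb() {
  total=0
  for d in /proc/[0-9]*; do
    cmd=$(tr '\0' ' ' < "$d/cmdline" 2>/dev/null | awk '{print $1}')
    [ "$(basename "${cmd:-x}")" = "$1" ] || continue
    rss=$(awk '/^VmRSS:/{print $2; exit}' "$d/status" 2>/dev/null)
    total=$((total + ${rss:-0}))
  done
  echo "$total"
}

# Count live FUSE mounts; preserved stargz layer mounts stay listed here
# after the pod that used them has finished.
fuse_mounts() {
  grep -c fuse /proc/mounts || true
}

echo "snapshotter RSS (KiB): $(rss_kb containerd-stargz-grpc)"
echo "live FUSE mounts: $(fuse_mounts)"
```

Sampling these two numbers after each pod completes makes it easy to see whether memory tracks the count of preserved mounts or keeps growing independently of it.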

Is there a way to let all the user pods run without killing the node after a while? I would like to preserve the mounts only while pods are running, not for pods that have already completed, because otherwise we kill the nodes. I tried using fuse-manager, but the behaviour is the same; the memory is just consumed by the fuse-manager process instead of the stargz one.

Restarting the stargz process doesn't release the memory either.

Can you advise?

Best regards,
Diana
