Router default memory limit (500Mi) causes OOMKilled in real cluster deployments #858

@pugafran

Problem

The current vllm-stack Helm chart (e.g. version 0.1.9) sets the default router memory limit to:

routerSpec:
  resources:
    limits:
      memory: 500Mi

In real cluster scenarios, this leads to:

  • OOMKilled
  • Exit code 137
  • Router entering CrashLoopBackOff

500Mi is insufficient for typical production-stack usage.


Observation

The GitHub repository appears to already contain an increased memory default for the router, but the published Helm chart (0.1.9) still renders 500Mi.

Running:

helm show values vllm/vllm-stack --version 0.1.9

still shows 500Mi.

So the Helm release does not yet include the updated default.


Suggested Fix

Increase the default router memory limit to at least:

requests:
  memory: 1Gi
limits:
  memory: 2Gi

or remove the hard memory limit and leave it configurable.


Workaround

Users can override the value manually in their values.yaml, but the default leads to immediate instability in many setups.
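
For reference, a minimal sketch of such an override, assuming the chart also accepts `requests` under the same `routerSpec.resources` key path as the default limit quoted above. The 1Gi / 2Gi figures are the values suggested in this issue, not measured requirements, and the release name `vllm` in the comment is a placeholder:

```yaml
# values.yaml (sketch): raise the router memory settings.
# The routerSpec.resources.limits.memory path comes from the chart default shown above;
# the requests entry is assumed to live under the same resources block.
routerSpec:
  resources:
    requests:
      memory: 1Gi
    limits:
      memory: 2Gi

# Apply with (release name "vllm" is a placeholder):
#   helm upgrade --install vllm vllm/vllm-stack --version 0.1.9 -f values.yaml
```

After the upgrade, the router pod should be recreated with the new limit; whether 2Gi is enough depends on the workload.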


Thanks for the great project!
