Skip to content

[ZenCPU] Changes with respect to docker build and relevant cpu tests#38703

Open
Chinmay-Kulkarni-AMD wants to merge 6 commits into
vllm-project:mainfrom
Chinmay-Kulkarni-AMD:amd-cpu-ci-changes
Open

[ZenCPU] Changes with respect to docker build and relevant cpu tests#38703
Chinmay-Kulkarni-AMD wants to merge 6 commits into
vllm-project:mainfrom
Chinmay-Kulkarni-AMD:amd-cpu-ci-changes

Conversation

@Chinmay-Kulkarni-AMD

@Chinmay-Kulkarni-AMD Chinmay-Kulkarni-AMD commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

Purpose

Add AMD Zen CPU CI infrastructure and fix test compatibility for the AMD Zen CPU platform.

AMD Zen CPUs do not support float16 compute natively (zen_cpu.py only advertises
[bfloat16, float32] as supported dtypes). Models whose native dtype is float16
are automatically converted to bfloat16 at load time. This causes numerical
divergence for certain models:

  • bigscience/bloom-560m: ALiBi attention produces degenerate (all-identical)
    logprob distributions in bfloat16, causing complete output divergence from
    HuggingFace reference. The model works correctly in float16 and float32.
  • Qwen/Qwen2.5-Math-PRM-7B: Test hardcodes dtype="half", which is
    unsupported on AMD Zen CPUs.

This PR:

  1. Adds Buildkite CI steps for AMD Zen CPUs (kernel, compatibility, generation/pooling,
    distributed tests), mirroring the existing Intel CPU test matrix.
  2. Adds a vllm-zen-test Docker build target for AMD Zen CPU test images.
  3. Adds AMD CPU image build script and test runner script.
  4. Skips bloom-560m and Qwen2.5-Math-PRM-7B tests on AMD Zen CPUs with
    clear reason annotations.

Test Plan

Run the following on an AMD Zen CPU machine:

# Generation tests
bash .buildkite/scripts/hardware_ci/run-amd-cpu-test.sh 30m \
  "pytest -x -v -s tests/models/language/generation -m cpu_model"

# Pooling tests
bash .buildkite/scripts/hardware_ci/run-amd-cpu-test.sh 30m \
  "pytest -x -v -s tests/models/language/pooling -m cpu_model"

# Kernel tests
bash .buildkite/scripts/hardware_ci/run-amd-cpu-test.sh 20m \
  "pytest -x -v -s tests/model_executor/test_cpu_unquantized_gemm_dispatch.py"

# Compatibility tests
bash .buildkite/scripts/hardware_ci/run-amd-cpu-test.sh 20m \
  "pytest -x -v -s tests/test_zen_cpu_platform_detection.py"

Test Result

Generation tests: 14 passed, 2 skipped (bloom-560m), 80 deselected

==== 14 passed, 2 skipped, 80 deselected, 94 warnings in 654.79s (0:10:54) =====
Pooling tests: 5 passed, 2 skipped (Qwen2.5-Math-PRM-7B), 78 deselected

===== 5 passed, 2 skipped, 78 deselected, 33 warnings in 468.85s (0:07:48) =====
  • Models with torch_dtype=float16 in their HF config are correctly downcast to bfloat16 with a log warning: Your device 'cpu' doesn't support torch.float16. Falling back to torch.bfloat16 for compatibility.
  • bloom-560m: ALiBi attention produces degenerate logprobs in bfloat16 (passes in float16 and float32)
  • Qwen2.5-Math-PRM-7B: test hardcodes dtype="half", unsupported on Zen CPUs
  • All other models (gpt2, gemma-1.1-2b-it, gemma-2-2b-it, Llama-3.2-1B-Instruct, opt-125m, Qwen2.5-0.5B-Instruct, TitanML/tiny-mixtral) pass correctly in bfloat16.
  • Models with torch_dtype=bfloat16 or torch_dtype=float32 are unaffected.
  • Test cases added specifically for Zen CPU and zentorch run and pass successfully.
tests/test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_detects_amd_with_avx512 PASSED [25%]
tests/test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_for_amd_without_avx512 PASSED [50%]
tests/test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_for_intel_with_avx512 PASSED [75%]
tests/test_zen_cpu_platform_detection.py::test_is_amd_zen_cpu_returns_false_when_cpuinfo_missing PASSED [100%]

tests/model_executor/test_cpu_unquantized_gemm_dispatch.py::test_dispatch_cpu_unquantized_gemm_uses_zentorch_on_zen PASSED [50%]
tests/model_executor/test_cpu_unquantized_gemm_dispatch.py::test_dispatch_cpu_unquantized_gemm_zen_remove_weight PASSED [100%]

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@Chinmay-Kulkarni-AMD Chinmay-Kulkarni-AMD marked this pull request as draft April 1, 2026 07:52
@mergify mergify Bot added ci/build cpu Related to CPU backends labels Apr 1, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive CI support for AMD Zen CPUs, adding new Buildkite test pipelines, a dedicated Docker image build process, and specific test skips for models requiring float16 compute. The review feedback highlights several critical improvements for CI robustness and efficiency: ensuring that changes to C++ source and build files trigger the relevant tests, avoiding redundant Docker builds by leveraging pre-built images, using unique job identifiers in image tags to prevent race conditions, and enhancing security by removing unnecessary privileged execution flags in the test environment.

Comment thread .buildkite/hardware_tests/cpu.yaml Outdated
Comment thread .buildkite/hardware_tests/cpu.yaml Outdated
# allow to bind to different cores
CORE_RANGE=${CORE_RANGE:-48-95}
NUMA_NODE=${NUMA_NODE:-1}
IMAGE_NAME="cpu-test-$NUMA_NODE"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using a static IMAGE_NAME based only on the NUMA_NODE can lead to race conditions if multiple CI jobs run concurrently on the same agent and share the same Docker daemon. It is safer to include a unique identifier like BUILDKITE_JOB_ID in the tag.

Suggested change
IMAGE_NAME="cpu-test-$NUMA_NODE"
IMAGE_NAME="cpu-test-$NUMA_NODE-$BUILDKITE_JOB_ID"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefix the name with amd.
IMAGE_NAME="amd-cpu-text-$NUMA_NODE-$BUILDKITE_JOB_ID"


# building the docker image
echo "--- :docker: Building Docker image"
docker build --progress plain --tag "$IMAGE_NAME" --target vllm-zen-test -f docker/Dockerfile.cpu .

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This script performs a full docker build every time it is executed. Since there are multiple test steps in cpu.yaml calling this script, the image is rebuilt multiple times per CI run, which is highly inefficient. The script should ideally pull and use the image already built and pushed by the image-build-amd-cpu step in the pipeline.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the structure similar to the existing image-build-cpu.sh file

docker build --progress plain --tag "$IMAGE_NAME" --target vllm-zen-test -f docker/Dockerfile.cpu .

# Run the image, setting --shm-size=4g for tensor parallel.
docker run --rm --cpuset-cpus="$CORE_RANGE" --cpuset-mems="$NUMA_NODE" -v ~/.cache/huggingface:/root/.cache/huggingface --privileged=true -e HF_TOKEN -e VLLM_CPU_KVCACHE_SPACE=16 -e VLLM_CPU_CI_ENV=1 -e VLLM_CPU_SIM_MULTI_NUMA=1 --shm-size=4g "$IMAGE_NAME" \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Running the container with --privileged=true is a significant security risk and should be avoided unless absolutely necessary. The required functionality for CPU pinning (--cpuset-cpus, --cpuset-mems) and shared memory (--shm-size) does not require elevated privileges. Please remove this flag or replace it with specific capabilities if needed.

Suggested change
docker run --rm --cpuset-cpus="$CORE_RANGE" --cpuset-mems="$NUMA_NODE" -v ~/.cache/huggingface:/root/.cache/huggingface --privileged=true -e HF_TOKEN -e VLLM_CPU_KVCACHE_SPACE=16 -e VLLM_CPU_CI_ENV=1 -e VLLM_CPU_SIM_MULTI_NUMA=1 --shm-size=4g "$IMAGE_NAME" \
docker run --rm --cpuset-cpus="$CORE_RANGE" --cpuset-mems="$NUMA_NODE" -v ~/.cache/huggingface:/root/.cache/huggingface -e HF_TOKEN -e VLLM_CPU_KVCACHE_SPACE=16 -e VLLM_CPU_CI_ENV=1 -e VLLM_CPU_SIM_MULTI_NUMA=1 --shm-size=4g "$IMAGE_NAME" \

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the structure similar to the existing image-build-cpu.sh file

@amukho

amukho commented Apr 1, 2026

Copy link
Copy Markdown

Adding @andy-neuma @tlrmchlsmth for reviews

@Chinmay-Kulkarni-AMD Chinmay-Kulkarni-AMD marked this pull request as ready for review April 3, 2026 06:44
@mergify

mergify Bot commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chinmay-Kulkarni-AMD.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment thread .buildkite/hardware_tests/cpu.yaml Outdated
- label: AMD-CPU-Kernel Tests
depends_on: []
soft_fail: false
device: amd-zen-cpu-inference

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update to match "zen5" ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the required change.

@Chinmay-Kulkarni-AMD Chinmay-Kulkarni-AMD force-pushed the amd-cpu-ci-changes branch 2 times, most recently from 4fda385 to 3e7343d Compare May 6, 2026 07:11
@mergify

mergify Bot commented May 23, 2026

Copy link
Copy Markdown
Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Chinmay-Kulkarni-AMD.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 23, 2026
@amukho

amukho commented May 27, 2026

Copy link
Copy Markdown

hi @AndreasKaratzas we have raised this PR to enable vLLM CPU CI on Zen CPU instances. Can you please review and help with the merge?
hi @Chinmay-Kulkarni-AMD can you please rebase this change?
This PR is also dependent on vllm-project/ci-infra#343 by @andy-neuma where he is adding a Zen5 CPU instance in the buildkite infra of vLLM

@AndreasKaratzas AndreasKaratzas left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would advise an RFC for the addition of a new platform, the Zen platform, on CI. It is clear that you want to contribute on this stack, and hence this is not going to be your last contribution. So the wider community should probably advise on how to do that. For example my opinion is to separate from "CPU" platform as much as possible (trust me you'll thank me later). But this is exactly a matter of personal opinion and I am not the only committer. So probably an RFC would be the way to go.

Comment thread docker/Dockerfile.cpu Outdated
Comment on lines +253 to +280

######################### ZEN CPU TEST IMAGE #########################
FROM vllm-openai-zen AS vllm-zen-test

WORKDIR /vllm-workspace

COPY --from=vllm-test-deps /vllm-workspace/requirements/test/cpu.txt requirements/test/cpu.txt

RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install -r requirements/test/cpu.txt

ADD ./tests/ ./tests/
ADD ./examples/ ./examples/
ADD ./benchmarks/ ./benchmarks/
ADD ./vllm/collect_env.py .
ADD ./docker/ ./docker/
ADD ./.buildkite/ ./.buildkite/

RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install -e tests/vllm_test_utils

# enable fast downloads from hf (for testing)
ENV HF_XET_HIGH_PERFORMANCE=1
# increase timeout for hf downloads (for testing)
ENV HF_HUB_DOWNLOAD_TIMEOUT=60


ENTRYPOINT []

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this should be in its own dockerfile. This is a standalone stage not inheriting from any other in this dockerfile, so that is why I am thinking that it does not belong in the same dockerfile.

Also I think that for this exact reason, there should be an RFC first. I'm positive for this and this PR (with a bit of modification) is a good start, but the process would be an RFC to lay out the plan of adding Zen CPUs in CI and the areas that Zen is going to be busy with (I remember for example the zentorch quantized PR which is very recent).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Andreas, the restructuring of files and snippets into zen specific files is done and pushed. RFC is also created for introducing the Zen platform into the vLLM and vLLM CI which can be viewed here: #44421

Comment thread .buildkite/hardware_tests/cpu.yaml Outdated
pytest -x -v -s tests/kernels/test_awq_int4_to_int8.py"

- label: AMD-CPU-Kernel Tests
depends_on: []

@AndreasKaratzas AndreasKaratzas May 27, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From one engineer to another, I am thinking that it's best if you start separating Zen test yamls/dockerfiles/scripts from the CPU platform. It will be very hard to do it later on, and trust me it will be inevitable. Also, keep the naming as "zen" even for the filenames instead of "amd" since AMD can mislead to GPU deployment too (for example one of the file names in this PR that has this "amd" is the introduced run-amd-cpu-test.sh -- btw the first thing that you might tell me is "why your team (ROCm) are calling it AMD then and not ROCm or something"? Well, the answer is in my second sentence above).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the zen specific code and the build snippets are carved out into "zen"-specific files.

Comment thread .buildkite/hardware_tests/cpu.yaml Outdated
bash .buildkite/scripts/hardware_ci/run-cpu-distributed-smoke-test.sh dp_tp"

- label: AMD-CPU-Distributed Tests
depends_on: []

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you define a Dockerfile.zen make sure that you add that docker image build in the dependency list here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently keeping the parity with the cpu.yaml. All of the tests are currently building images locally and using them to run the tests. Do you suggest to build the image once and use it across every test unlike the current cpu.yaml?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, take a look at vllm/.buildkite/test_areas/entrypoints.yaml line 75. For AMD GPU builds we have to let the pipeline generator know that we depend on the amd docker build rather than the cuda one. I assume that you would need to do the same. But I might be wrong. Check it out and lmk :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zen.yaml follows the hardware_tests/cpu.yaml model: no_plugin: true steps where the run script builds the image locally and runs it, so there's no pre-built image to depend on — that's why depends_on: []. The entrypoints.yaml example is the docker-plugin model where the step pulls a pre-built image, hence the image-build-amd dependency. I can switch Zen to that pull-based model (depend on image-build-zen-cpu + pull instead of local build) if you'd prefer the build-once approach — it'd diverge from cpu.yaml but avoid the redundant build. However, explicit testing would be required if docker pull "$REGISTRY/$REPO:$BUILDKITE_COMMIT-zen-cpu" works as expected, if we want to build once and use it across every test.

Is it possible to make sure that the current setup works properly and then push the "build once and pull for every test run" feature?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that for Zen you need to invoke your own image cause of zen specific features like zentorch, i m also not sure i understand the "local build" idea. CI workflow for GPUs works like this:

bootstrap --> docker build on buildkite --> test groups begin after they pull the correct container from the previous step

That is for both upstream CI and AMD ROCm CI. What is the workflow for CPU CI? Does the main CPU image contain the dependencies you need for Zen tests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I went with the local-build model is that I have no way to confirm if the zen5 agents on zen5 queue configuration have ECR pull credentials. Let me lay out how the current setup works.

The Zen test steps build the image in-step, with no_plugin: true — there's no pre-built image to depend on:

# .buildkite/hardware_tests/zen.yaml
- label: Zen-CPU-Kernel Tests
  depends_on: []
  soft_fail: false
  device: zen5
  no_plugin: true
  ...
  commands:
    - |
      bash .buildkite/scripts/hardware_ci/run-zen-cpu-test.sh 20m "
      pytest -x -v -s tests/model_executor/test_cpu_unquantized_gemm_dispatch.py"

The runner builds the CPU base + Zen image from the checkout and runs it — no docker pull of a registry image anywhere:

# .buildkite/scripts/hardware_ci/run-zen-cpu-test.sh
# Build CPU base image (Dockerfile.zen layers on top of this).
docker build --progress plain \
    --tag "$CPU_BASE_IMAGE" \
    --target vllm-openai \
    -f docker/Dockerfile.cpu .
# Build the Zen CPU test image on top of the CPU base.
docker build --progress plain \
    --build-arg BASE_IMAGE="$CPU_BASE_IMAGE" \
    --tag "$IMAGE_NAME" \
    --target vllm-zen-test \
    -f docker/Dockerfile.zen .
# Run tests inside the just-built image.
docker run --rm ... "$IMAGE_NAME" \
        timeout "$TIMEOUT_VAL" bash -c "... ${TEST_COMMAND}"

Because everything is built from the repo, this needs no ECR pull credentials on the zen5 agents, so it doesn't depend on registry access being wired up on that queue. This is the same model hardware_tests/cpu.yaml uses :

# .buildkite/hardware_tests/cpu.yaml
- label: CPU-Kernel Tests
  depends_on: []
  device: intel_cpu
  no_plugin: true
  commands:
    - |
      bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 30m "..."
# .buildkite/scripts/hardware_ci/run-cpu-test.sh
docker build --progress plain --tag "$IMAGE_NAME" --target vllm-test -f docker/Dockerfile.cpu .
docker run --rm ... "$IMAGE_NAME" timeout "$TIMEOUT_VAL" bash -c "... ${TEST_COMMAND}"

To confirm your point directly: yes, Zen needs its own image — the main CPU image has no zentorch (grep zentorch docker/Dockerfile.cpu returns nothing); zentorch is installed only in Dockerfile.zen, layered on top of the CPU vllm-openai base.

The build-once-pull model you described (build → push → tests pull) is cleaner, and the pieces for it already exist — image_build_zen_cpu.sh builds and pushes the -zen-cpu image:

# .buildkite/image_build/image_build_zen_cpu.sh
ZEN_TAG="$REGISTRY/$REPO:$BUILDKITE_COMMIT-zen-cpu"
...
docker build --file docker/Dockerfile.zen \
  --build-arg BASE_IMAGE="$CPU_BASE_TAG" \
  --tag "$ZEN_TAG" \
  --target vllm-zen-test \
  --progress plain .
docker push "$ZEN_TAG"

To switch the tests to consume it, I'd add depends_on: [image-build-zen-cpu] to the steps and change the runner from the two docker builds above to docker pull "$REGISTRY/$REPO:$BUILDKITE_COMMIT-zen-cpu". The only blocker is that this needs the zen5 agents to be able to pull from ECR, which I can't verify from my side (the always-pull: true in amd_zen5_plugin_template from ci-infra#343 suggests they're meant to).

Comment on lines +13 to +14
# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this is in parity (near-exact verbatim) with the image_build_cpu.sh script. Word of warning: Since you are going to be setting the infra side, be very careful of tokens and login credentials here. You must make sure that they are protected such that if there is a PR up that attempts to print them during CI evaluation, you are not compromised. Make sure and be very careful with your infra team about this. I know that with Github secrets it would have been easy, but with Buildkite agents, I don't know the process to be honest with you, so I can just raise a word of warning.

docker build --progress plain --tag "$IMAGE_NAME" --target vllm-zen-test -f docker/Dockerfile.cpu .

# Run the image, setting --shm-size=4g for tensor parallel.
docker run --rm --cpuset-cpus="$CORE_RANGE" --cpuset-mems="$NUMA_NODE" -v ~/.cache/huggingface:/root/.cache/huggingface --privileged=true -e HF_TOKEN -e VLLM_CPU_KVCACHE_SPACE=16 -e VLLM_CPU_CI_ENV=1 -e VLLM_CPU_SIM_MULTI_NUMA=1 --shm-size=4g "$IMAGE_NAME" \

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same warning for the HF token that I gave in a different spot for AWS credentials.

Signed-off-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
Signed-off-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
…erence tests

Signed-off-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
Signed-off-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
Signed-off-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
…vidual files

Signed-off-by: Chinmay Kulkarni <Chinmay.Kulkarni@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build cpu Related to CPU backends

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants