183 commits
b8b49e2
Bump actions/github-script from 8.0.0 to 9.0.0 (#39667)
dependabot[bot] Jun 2, 2026
e4a2e58
[MRV2] Remove assignment of graph_pool in cudagraph_utils (#44338)
WoosukKwon Jun 2, 2026
e9e08c4
[Bugfix] Cache the EAGLE/MTP lookahead block in the SWA prefix-cache …
ivanium Jun 2, 2026
5577811
[Misc] Remove stray empty file (#44350)
MatthewBonanni Jun 2, 2026
e15f202
[ModelRunnerV2] Avoid pipeline parallel bubbles (#42187)
njhill Jun 2, 2026
3099de3
[Kernel][MoE] Add GELU_TANH to CPU, CUTLASS, and WNA16 MoE backends (…
lesj0610 Jun 2, 2026
0917a00
Fix sparse NCCL weight transfer test construction (#44345)
bedeks Jun 2, 2026
8b3b71e
[CI/Build] Bump flashinfer to v0.6.12 (#44036)
vadiklyutiy Jun 2, 2026
a4ac746
[MoE/b12x] Accept W4A16 (kNvfp4Static, None) in FlashInferB12xExperts…
ECMGit Jun 2, 2026
bd98e97
[Misc] Remove dead VLLM_RPC_TIMEOUT env var and fix profiling doc tha…
DaoyuanLi2816 Jun 3, 2026
b254e04
[DSV4] Minor cleanup for DeepseekV4MegaMoEExperts (#44367)
WoosukKwon Jun 3, 2026
ca17b6b
[Perf] Apply single-pass min_larger finding and binary search in Trit…
cakeng Jun 3, 2026
969aec4
[Bugfix] Fix Deepseek v4 non-mega-moe model init error (#44356)
wzhao18 Jun 3, 2026
27a93cd
[docker] Stop using extra-index-url for flashinfer-jit-cache (#44366)
khluu Jun 3, 2026
02a0149
[Platform] Add is_cumem_allocator_available (#43838)
wangxiyuan Jun 3, 2026
4454a18
[ROCm][CI] Fix stale wvSplitK GEMM fallback test for N=5 (#44368)
JartX Jun 3, 2026
7b476c8
[ROCm][CI] Skip fp8 reload tests on gfx90a (MI250) (#44369)
JartX Jun 3, 2026
53b88d1
[CI] Reject out-of-vocabulary before they reach the GPU logprob path…
AndreasKaratzas Jun 3, 2026
e670638
[CI] Add missing vllm/parser/ CI trigger and fix test_parse.py (#44352)
sfeng33 Jun 3, 2026
3f0a91b
Nit Changes in Tiered KV Offload (#44293)
rshavitt Jun 3, 2026
597bc15
fix: resolve CUTLASS fmin compatibility for DeepSeek-V4 init (#44236)
Oxygen56 Jun 3, 2026
f020435
[Bugfix] fix crash in postprocess for null tool args (#43862)
william-rom Jun 3, 2026
e0081ef
[Benchmark] Enable reasoning-model (thinking) benchmarking via `--cha…
qiching Jun 3, 2026
71df063
Enable perf_token_group_quant/_C_stable_libtorch for ROCm (#42758)
charlifu Jun 3, 2026
87954eb
[ROCm][CI] Optimize ROCm Docker build: registry cache, DeepEP, and ci…
AndreasKaratzas Jun 3, 2026
9af53a3
[Perf] Add tuned selective_state_update configs for H200 and RTX PRO …
Majid-Taheri Jun 3, 2026
7268457
[KV Offloading] Enable HMA models for Tiering Offloading (#44287)
varun-sundar-rabindranath Jun 3, 2026
4aaed4c
[Rust Frontend] Add server router extension hook (#43774)
NolanHo Jun 3, 2026
6550ff1
[Rust Frontend] Add dynamic LoRA endpoints (#43778)
Xunzhuo Jun 3, 2026
449be4f
[Rust Frontend] Fix several hf chat template rendering issues (#44311)
BugenZhao Jun 3, 2026
0e2b131
[Doc] Update ViT CUDA graph interfaces (#44388)
shen-shanshan Jun 3, 2026
ace95c9
[Bugfix] Update TrtLLM MoE routing methods (#44347)
wzhao18 Jun 3, 2026
209709a
[Bugfix] Fix unstreamed tool call args dropped in Responses API strea…
sfeng33 Jun 3, 2026
02564b4
[XPU]fallback to TRITON_ATTN for vit attn on xpu when use float32 dty…
yma11 Jun 3, 2026
1fa9ea0
[Perf] Triton fast path for small CPU→GPU `swap_blocks_batch` in the …
Etelis Jun 3, 2026
95b1615
[Perf] Improve multimodal item handling from O(n) to O(log n) per ste…
andylolu2 Jun 3, 2026
823d271
[Attention][CPU] Standardize kv layout to blocks first (#44393)
bigPYJ1151 Jun 3, 2026
3d76f39
[SharedOffloadRegion] Align blocks to page-size (#43689)
varun-sundar-rabindranath Jun 3, 2026
309385a
[Rust Frontend] Add /server_info to Rust frontend (#43942)
Xunzhuo Jun 3, 2026
e523267
[XPU] Add XPU block-scaled W8A8 fp8 path (#39968)
xwu-intel Jun 3, 2026
e3e132d
[Refactor] Suppress SyntaxWarning from ast.literal_eval in tool parse…
sfeng33 Jun 3, 2026
27f1d34
[Frontend][Responses API] Move developer-to-system conversion into HF…
chaunceyjiang Jun 3, 2026
ec8d60b
[Model Runner V2] Use FlashInfer sampler (#42472)
njhill Jun 3, 2026
4d1fd13
[CI/Build] Fix LoRA testing (#44425)
jeejeelee Jun 3, 2026
df7252c
[CI] Align PD tests to HMA on by default (#44174)
NickLucche Jun 3, 2026
0c6631f
[KVCache] Support Pluggable KVCacheSpec (#37505)
MengqingCao Jun 3, 2026
51e0c57
fix(config): validate max_num_scheduled_tokens >= 0 on all paths (#44…
Oxygen56 Jun 3, 2026
0a5cbf6
Handle spinloop ext load failure gracefully (#43659)
pschlan-amd Jun 3, 2026
59d0236
[10b/n] Migrate custom all-reduce, DeepSeek V4 fused MLA, MiniMax red…
cleonard530 Jun 3, 2026
5b2a2be
[ROCm][CI] Move Model Executor test step from MI250 to MI300 (gfx942)…
JartX Jun 3, 2026
2b91012
[Refactor] Remove dead code fp quant (#44122)
yewentao256 Jun 3, 2026
271328e
[LoRA] Fix dedup for post-replacement module aliases (#44413)
linitra24 Jun 3, 2026
a248b45
[Model] Add Gemma4 Unified (encoder-free) support (#44429)
lucianommartins Jun 3, 2026
dad95e3
[Feature] Support batch invariant rms norm with residual (#42453)
yewentao256 Jun 3, 2026
2b237c7
[Bugfix] Honor tool_choice="none" in Chat Completions streaming (#42752)
hoobnn Jun 3, 2026
91945b6
[Bug Fix][Model Runner V2][Spec Decode] Warmup & capture with differe…
TheEpicDolphin Jun 3, 2026
6bad553
[Minor] Remove FlashInfer version check in topk_topp_sampler (#44442)
WoosukKwon Jun 3, 2026
bdbf08f
Bump actions/stale from 10.1.1 to 10.2.0 (#35078)
dependabot[bot] Jun 3, 2026
128adab
[Bugfix] Fix Gemma4 MTP block_table batch_size mismatch under concurr…
Dymasik Jun 4, 2026
0414d75
[XPU] skip unapplied UT in test_gpu_model_runner.py (#44289)
yma11 Jun 4, 2026
ceb0111
[Model Runner V2][Spec Decode] Add Gemma4 MTP support (#43241)
TheEpicDolphin Jun 4, 2026
0c1e6f6
[Bugfix] Fix VLLMNotFoundError when using LoRA adapter name in poolin…
wanghenshui Jun 4, 2026
b58e082
[KV Connector] Update lmcache kv_offloading_backend to use LMCacheMPC…
maobaolong Jun 4, 2026
f25952e
[MM][Perf][CG] Support ViT full CUDA graph for InternVL (#41759)
oguzhankir Jun 4, 2026
e6018c6
[Refactor] Remove dead code in tests and parallel_state (#41471)
yewentao256 Jun 4, 2026
f0cd590
optimize the compressor 128 split cutedsl kernel (#44230)
Jie-Fang Jun 4, 2026
4f423bd
[EPLB] Nixl communicator optimization. Zero-copy transfers (#41633)
ilmarkov Jun 4, 2026
5e2af28
[CI] Resolve release V2 docker build after ROCm CI wheels change (#44…
AndreasKaratzas Jun 4, 2026
b4b4aaa
[Inductor] Fast-path Inductor fallback for vllm::*/vllm_aiter::* cust…
okorzh-amd Jun 4, 2026
d01d0b4
[Frontend] Consolidate online serving utils. (#44479)
noooop Jun 4, 2026
22c2e87
[CI] Reverted gitignore changes (#44497)
AndreasKaratzas Jun 4, 2026
a618356
[Prefix Caching] DeepSeekv4 - Support selective prefix-cache retentio…
wzhao18 Jun 4, 2026
1bdc60e
Fix Kimi-K2.5 FlashInfer ViT metadata (#44493)
Kevin-XiongC Jun 4, 2026
d0975a4
[perf] Add gemma RMS AR fusion (#42646)
jiahanc Jun 4, 2026
9061935
[Attention] Mamba attention module refactor - LINEAR (#43556)
wangxiyuan Jun 4, 2026
4b87b3e
[Bugfix] fix EVS for qwen3-vl (#44205)
garrygale Jun 4, 2026
e68988a
Refactor CT NVFP4 linear to use a single class (#42443)
dsikka Jun 4, 2026
f35b557
Add GH token to docs build pre run check (#44534)
hmellor Jun 4, 2026
9354fb1
[Bugfix][Compile] Guard per_token_group_fp8_quant lookup on non-CUDA …
QiliangCui2023 Jun 4, 2026
68f5e56
[PD][Nixl] Mamba prefix caching mode support (#42554)
NickLucche Jun 4, 2026
0c96dd6
[ROCm] Bump fastsafetensors to v0.3.2 from PyPI, remove git source bu…
wjabbour Jun 4, 2026
6f68ca3
[ROCm][CI] Stabilize memory-release in the Hybrid model generation te…
AndreasKaratzas Jun 4, 2026
3e77036
[ROCm][CI] Specifying time outs for the lm eval models (#44255)
AndreasKaratzas Jun 4, 2026
b5235fc
[DSv4] Adding TRTLLM gen attention kernel (#43827)
zyongye Jun 4, 2026
06ee2d8
[Quant] Support compressed-tensors WNA8O8Int linears and WNInt embedd…
mgoin Jun 4, 2026
b21443e
Add model support for granite speech plus (#43519)
zvik Jun 4, 2026
3dbb4e0
[Bugfix] MiniCPM-V-4.6 video inference crash: placeholder count misma…
tc-mb Jun 4, 2026
4cc78c9
[Core] Freeze garbage collector in workers after model initialization…
tlrmchlsmth Jun 4, 2026
99ef652
[Bugfix] Reject non-positive values for ParallelConfig int knobs (#44…
jwzheng96 Jun 4, 2026
06f9463
[ROCm][CI] Add test for Aiter unified attn kernel (#44436)
divakar-amd Jun 4, 2026
3da29aa
[DOC] Add INT8 W4A8 docs and Arm's supported quantization schemes (#3…
fadara01 Jun 4, 2026
8d9536a
[Misc] Add unit tests for pooler head classes (#44471)
taneem-ibrahim Jun 4, 2026
439203d
[Bugfix] Fix test_cutlass_moe.py (#44380)
bnellnm Jun 4, 2026
a947f7a
[Kernel][Test] Extend lightning_attn and awq_triton kernel tests to X…
adobrzyn Jun 4, 2026
38fd240
use split_group for pytorch process group creation (#41980)
tushar00jain Jun 4, 2026
41a4829
[Logs Refactor] Optimize shutdown logs, easier to follow and consiste…
yewentao256 Jun 4, 2026
a55fccf
[mamba] unify KDA conv states into one cache to match 2-state SSM lay…
ZJY0516 Jun 4, 2026
b7c5baf
fix: keep DeepSeek V4 RoPE cache on inv_freq device (#43926)
galletas1712 Jun 4, 2026
62d6f06
[Rust Frontend] Skip loading multimodal processor if `--language-mode…
BugenZhao Jun 5, 2026
063ce98
[XPU][MoE] support block_fp8_moe on xpu (#42139)
zufangzhu Jun 5, 2026
56aff0d
[10/n] Migrate cuda_view and silu_and_mul_per_block_quant kernels to …
cleonard530 Jun 5, 2026
4efd6ff
[DSV4] Refactor DeepseekV4Attention (#44569)
WoosukKwon Jun 5, 2026
da1daf4
[Bugfix] Exclude vision embedder from quantization in Gemma4 Unified …
lucianommartins Jun 5, 2026
96229fa
[KVConnector][1/N] PP-aware handshake aggregation and intermediate-PP…
zixi-qi Jun 5, 2026
c505cd9
[CI/Build] Disable CPU-Compatibility Tests (#44605)
bigPYJ1151 Jun 5, 2026
165b786
[ROCM] [FEAT] Integrate Aiter hipBLASLt GEMM online tuning (#40426)
hanlin12-AMD Jun 5, 2026
b4a6f26
[ROCm][perf] Use workspace manager for sparse indexer allocations (#4…
tuukkjs Jun 5, 2026
ef3af56
Fix `LLM.wait_for_completion` output type docstring (#44617)
viiccwen Jun 5, 2026
ca73293
[Bugfix][Rust Frontend] Fix UTF-8 char-boundary panic in incremental …
Sunt-ing Jun 5, 2026
6542d48
[Bugfix] Fix test_invocations flaky failure with newer openai SDK (#4…
XuZhou26 Jun 5, 2026
d2f70da
fix: pad dummy run query_start_loc (#44603)
UranusSeven Jun 5, 2026
d61d856
[Bugfix] Update mistral tokenizer test for continue_final_message fix…
XuZhou26 Jun 5, 2026
e64237a
[Rust Frontend] Support include_reasoning=false (#44391)
ricky-chaoju Jun 5, 2026
d98b8f3
[NixlConnector] Initiate deprecation cycle for `kv_both` role (#43874)
NickLucche Jun 5, 2026
efc347f
docs: fix tokenizer optimization typo (#44066)
chunyang-wen Jun 5, 2026
8a83e6f
[Rust Frontend] Batch auto-abort requests by engine (#44591)
HueCodes Jun 5, 2026
7fe7800
[BUG] Fix FP64 Gumbel precision coverage (#43150)
tianyu-z Jun 5, 2026
62215e7
Remove KV cache scale boilerplate from model weight loading methods (…
hmellor Jun 5, 2026
bbb6c27
[Bugfix] Fix gemma4 crash on CPU: guard mem_get_info call (#44615)
adhithyamulticoreware Jun 5, 2026
02d2da0
[DSV4] Move more ops out of eager breakpoint (#44561)
WoosukKwon Jun 5, 2026
6a11d72
[Reasoning][Structured Outputs] Add Command A plus tags for structura…
rishitdholakia13 Jun 5, 2026
c66b198
[CI] Bump mistral-common (#44649)
hmellor Jun 5, 2026
a80af24
Speed up docs build (#44635)
hmellor Jun 5, 2026
ef0df7d
[CI] Bump mypy version `1.19.1` -> `1.20.2` (#44647)
hmellor Jun 5, 2026
7f003a1
Support MiniCPMV batched preprocessing (#44609)
yma11 Jun 5, 2026
6a89457
Add objectstore as a secondary tier to multi-tier kv cache offloading…
effi-ofer Jun 5, 2026
aa6fb8a
[Bugfix] [ROCm] [Critical] fallback to regular abi for ROCm (#44648)
tjtanaa Jun 5, 2026
91e17d4
Fix sarvam forward compatibility with transformers v5 (#38804)
Vikrantpalle Jun 5, 2026
b593396
Upgrade tpu-inference to v0.21.0 (#44621)
CienetStingLin Jun 5, 2026
703fb17
[Bugfix] GPT-OSS instruction rendering (#44330)
yzong-rh Jun 5, 2026
e28e369
Make Mergify comment less spammy (#44666)
hmellor Jun 5, 2026
c73b0d0
[Core][Engine] allow DP ray placement groups to be set on specific no…
walterbm Jun 5, 2026
4200f62
[ROCm][GPT-OSS] Fuse RoPE + static Q FP8 quant on fused RoPE+KV path …
akii96 Jun 5, 2026
f6a708a
[Doc] Add Llama-3.2-3B-Instruct to batch-invariance tested models (#4…
DaoyuanLi2816 Jun 5, 2026
a50e675
[Cohere] fix RoutingMethodType (#44021)
Terrencezzj Jun 5, 2026
4765f0f
[Bugfix] Fix `sequence_parallel_chunk_impl` custom op aliasing its in…
vadiklyutiy Jun 5, 2026
2f27c9a
Preserve layout-changing clones (#44574)
mikekg Jun 6, 2026
c8beda4
[Rust Frontend] Add Phi-4 mini JSON tool parser (#44213)
devin-lai Jun 6, 2026
ec0a31d
[Bugfix][Kernel] Fix mHC fused-RMSNorm big-fuse miscompile for hidden…
zyongye Jun 6, 2026
eafbb06
[Misc] Replaced asserts with proper exceptions to improve UX for pool…
taneem-ibrahim Jun 6, 2026
f87df1d
[Bugfix][MoE] Snapshot max_cudagraph_capture_size into FusedMoEConfig…
aoshen02 Jun 6, 2026
c9b4b18
[Bugfix][Voxtral] Add fetch_audio to MistralCommonFeatureExtractor (t…
Yadan-Wei Jun 6, 2026
00d1fb7
[Bugfix][ROCm] `ApplyRotaryEmb`: fall back to native when flash_attn …
amd-fuweiy Jun 6, 2026
67d3792
[Bugfix] Fix Qwen3.5-FP8 nightly fail. Guard fused_add_rms_norm input…
vadiklyutiy Jun 6, 2026
fa27d4e
[PERF] [Qwen3.5] Split mixed prefill+decode batches: route decodes to…
vadiklyutiy Jun 6, 2026
062b05f
[ROCm][Perf] Fused MoE W4A16 HIP kernel for AMD RDNA3 (gfx1100) (#44075)
JartX Jun 6, 2026
3b3d528
[BugFix] Resolve multiple async kv load deadlock (#44560)
njhill Jun 6, 2026
bc5745a
[ROCm][MLA] Replace torch.cat in sparse-MLA forward_mqa with fused co…
maeehart Jun 6, 2026
2a983c7
[DSV4] Decouple DS V4 Sparse MLA Metadata from DS V3.2 (#44699)
WoosukKwon Jun 7, 2026
8109664
[XPU] Support cpu kv offloading and tiering offloading on XPU platfo…
chaojun-zhang Jun 7, 2026
3bb4697
[XPU][Feature] transparent sleep mode support for XPU platform (#37149)
yma11 Jun 7, 2026
6181e80
[XPU] add xpu branch in compressed_tensors_moe_w4a4_mxfp4 (#44540)
zufangzhu Jun 7, 2026
9c7f774
[Bugfix] Fix benchmark_moe.py after inplace mechanism removal (#44041)
qyYue1389 Jun 7, 2026
32f34d3
[feature] add index share feature for DSA MTP (#44420)
JaredforReal Jun 7, 2026
1505b3d
[Cohere] Enable Cohere Mini Code model and update Command A-plus test…
Terrencezzj Jun 7, 2026
6ac6920
[videoloader] implement glm46v video loader (#44417)
JaredforReal Jun 7, 2026
51ef688
[Bugfix][Mooncake] Fix per-group block_size/block_hash and group_idx …
ivanium Jun 7, 2026
15652a6
[Doc] Fix multimodal torch.compile troubleshooting to not use removed…
DaoyuanLi2816 Jun 7, 2026
f0f6805
[CI] Stabilize the multi-audio OpenAI server path (#44051)
AndreasKaratzas Jun 7, 2026
66ecfd0
[Dependency] Remove stale cuDNN frontend upper bound (#42599)
mmangkad Jun 7, 2026
3d3ba46
Modify torch dependency in xpu.txt (#43087)
BramVanroy Jun 7, 2026
228bcc4
[ROCm][Kernel] Enable permute_cols for ROCm (#44674)
charlifu Jun 7, 2026
4dcd10e
[1/N][KV-Cache Layout Refactor] Refactor DSV4 KV cache config constru…
LucasWilkinson Jun 7, 2026
2ed0a96
[Kernel][Test] Make kernel tests for mamba dual-HW (CUDA + XPU) (#42736)
adobrzyn Jun 8, 2026
6124a98
[Bugfix] Fix FunASR-Nano crash during initialization (#44215)
SunskyXH Jun 8, 2026
5633405
Added extra_repr() to pooler classes to improve debuggability (#44805)
taneem-ibrahim Jun 8, 2026
303916e
[Bugfix]: Fix assertion in MambaManager.allocate_slots() (#39562)
Holworth Jun 8, 2026
eebce65
[XPU]feat: add DeepSeek-V4 XPU attention decode path (#42953)
majian4work Jun 8, 2026
8fb0274
[MM][CG] Simplify ViT CUDA graph interfaces (#44484)
shen-shanshan Jun 8, 2026
54c660c
[XPU][Minor] format moe kernel name and add in kernel list (#44771)
yma11 Jun 8, 2026
967c5c3
[ROCm][CI] Stage C mirrors (#42793)
AndreasKaratzas Jun 8, 2026
d9ff7e4
[ROCm][CI] Stabilizing teardown and timeout of flaky tests to prevent…
AndreasKaratzas Jun 8, 2026
94fcdd0
[XPU][CI] Add more test cases in Intel GPU CI (#43663)
zxd1997066 Jun 8, 2026
469f3dc
[BugFix] Use served model name in gemma4 audio-tower error message (#…
llsj14 Jun 8, 2026
3c0b443
[Rust Frontend] Add /pause, /resume, /is_paused endpoints (#44499)
sahilsGit Jun 8, 2026
fa662b1
[XPU] Cap topk/topp Triton BLOCK_SIZE to 4096 to fix Top-p mask diffe…
chaojun-zhang Jun 8, 2026
d5fe994
[CPU][Spec Decode] Warn about throughput loss when libiomp5 is not pr…
jmamou Jun 8, 2026
5add018
[Connector] Remove `P2pNcclConnector` (#44854)
NickLucche Jun 8, 2026
980796c
[CI/Build][CPU] Fix flaky CI image build failure and unexpected warni…
bigPYJ1151 Jun 8, 2026
93ee4cd
[CI] Consolidate multimodal entrypoint tests. (#44819)
noooop Jun 8, 2026
ac3409d
[Benchmark] Auto-detect and correct client/server tokenizer mismatch …
akii96 Jun 8, 2026
753e9d5
[Quantization] add online fp8 ptpc (#44132)
walterbm Jun 8, 2026
a3e798b
Merge commit '753e9d55e6' into merge-from-upstream
eble-amd Jun 22, 2026
23 changes: 23 additions & 0 deletions .buildkite/ci_config_rocm.yaml
@@ -0,0 +1,23 @@
+name: vllm_rocm_ci
+job_dirs:
+  - ".buildkite/hardware_tests"
+run_all_patterns:
+  - "docker/Dockerfile.rocm"
+  - "docker/Dockerfile.rocm_base"
+  - "docker/ci-rocm.hcl"
+  - "docker/docker-bake-rocm.hcl"
+  - ".buildkite/hardware_tests/amd.yaml"
+  - ".buildkite/scripts/ci-bake-rocm.sh"
+  - ".buildkite/scripts/hardware_ci/run-amd-test.py"
+  - ".buildkite/scripts/hardware_ci/run-amd-test.sh"
+  - "CMakeLists.txt"
+  - "requirements/common.txt"
+  - "requirements/rocm.txt"
+  - "requirements/build/rocm.txt"
+  - "requirements/test/rocm.txt"
+  - "setup.py"
+  - "csrc/"
+  - "cmake/"
+run_all_exclude_patterns:
+  - "csrc/cpu/"
+  - "cmake/cpu_extension.cmake"
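
Note on `run_all_patterns` / `run_all_exclude_patterns`: the diff does not show how the pipeline generator evaluates these lists. The sketch below illustrates one plausible reading, under the assumption that a pattern matches any changed path that starts with it and that an excluded path never forces a full run. The function names and the abridged pattern lists are hypothetical and only serve the illustration.

```python
# Abridged copies of the lists in ci_config_rocm.yaml (illustration only).
RUN_ALL_PATTERNS = [
    "docker/Dockerfile.rocm",
    "CMakeLists.txt",
    "setup.py",
    "csrc/",
    "cmake/",
]
RUN_ALL_EXCLUDE_PATTERNS = [
    "csrc/cpu/",
    "cmake/cpu_extension.cmake",
]


def matches(path, patterns):
    """Assumption: a pattern matches when the changed path starts with it."""
    return any(path.startswith(pattern) for pattern in patterns)


def should_run_all(changed_files):
    """Return True if any changed file hits a run-all pattern and is not excluded."""
    return any(
        matches(path, RUN_ALL_PATTERNS)
        and not matches(path, RUN_ALL_EXCLUDE_PATTERNS)
        for path in changed_files
    )
```

Under these assumptions, a change to a CUDA/HIP kernel under `csrc/` would trigger every ROCm job, while a change confined to `csrc/cpu/` would not.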
99 changes: 65 additions & 34 deletions .buildkite/hardware_tests/amd.yaml
@@ -1,42 +1,73 @@
-group: Hardware - AMD Build
+group: Hardware - AMD Build
 steps:
-  - label: "AMD: :docker: build image"
-    key: image-build-amd
+  # Ensure ci_base is up-to-date before building the test image.
+  # Compares a content hash of ci_base-affecting files against the remote
+  # image label. If hashes match the build is skipped (< 30 s); if they
+  # differ ci_base is rebuilt and pushed automatically.
+  - label: "AMD: :docker: ensure ci_base"
+    key: ensure-ci-base-amd
     depends_on: []
     device: amd_cpu
     no_plugin: true
     commands:
-      - >
-        docker build
-        --build-arg max_jobs=16
-        --build-arg REMOTE_VLLM=1
-        --build-arg ARG_PYTORCH_ROCM_ARCH='gfx90a;gfx942;gfx950'
-        --build-arg VLLM_BRANCH=$BUILDKITE_COMMIT
-        --tag "rocm/vllm-ci:${BUILDKITE_COMMIT}"
-        -f docker/Dockerfile.rocm
-        --target test
-        --no-cache
-        --progress plain .
-      - |
-        docker run --rm --network=none --entrypoint /bin/bash "rocm/vllm-ci:${BUILDKITE_COMMIT}" -ec '
-        if [ ! -d /vllm-workspace ]; then echo Missing directory: /vllm-workspace >&2; exit 1; fi
-        if [ ! -d /vllm-workspace/tests ]; then echo Missing directory: /vllm-workspace/tests >&2; exit 1; fi
-        if [ ! -d /vllm-workspace/src/vllm ]; then echo Missing directory: /vllm-workspace/src/vllm >&2; exit 1; fi
-        if [ ! -x /vllm-workspace/src/vllm/vllm-rs ]; then echo Missing executable: /vllm-workspace/src/vllm/vllm-rs >&2; exit 1; fi
-        command -v python3
-        command -v uv
-        command -v pytest
-        if ! command -v amd-smi >/dev/null 2>&1 && ! command -v rocminfo >/dev/null 2>&1; then
-          echo No ROCm CLI found in image >&2
-          exit 1
+      - bash .buildkite/scripts/ci-bake-rocm.sh ci-base-rocm-ci-with-deps
+    env:
+      DOCKER_BUILDKIT: "1"
+      VLLM_BAKE_FILE: "docker/docker-bake-rocm.hcl"
+      PYTORCH_ROCM_ARCH: "gfx90a;gfx942;gfx950"
+      REMOTE_VLLM: "1"
+      VLLM_BRANCH: "$BUILDKITE_COMMIT"
+    retry:
+      automatic:
+        - exit_status: -1 # Agent was lost
+          limit: 1
+        - exit_status: -10 # Agent was lost
+          limit: 1
+
+  - label: "AMD: :docker: build test image and artifacts"
+    key: image-build-amd
+    depends_on:
+      - ensure-ci-base-amd
+    device: amd_cpu
+    no_plugin: true
+    commands:
+      - |
+        if [[ "${ROCM_CI_ARTIFACT_ONLY:-0}" == "1" ]]; then
+          echo "ROCM_CI_ARTIFACT_ONLY=1; building ROCm wheel artifact only"
+          IMAGE_TAG="" bash .buildkite/scripts/ci-bake-rocm.sh test-rocm-ci-with-artifacts
+        else
+          bash .buildkite/scripts/ci-bake-rocm.sh test-rocm-ci-with-wheel
         fi
-        python3 - <<PY
-        import torch, vllm
-        print(torch.__version__)
-        print(vllm.__version__)
-        PY
-        echo AMD image smoke OK
-        '
-      - docker push "rocm/vllm-ci:${BUILDKITE_COMMIT}"
+      - |
+        docker run --rm --network=none --entrypoint /bin/bash "rocm/vllm-ci:${BUILDKITE_COMMIT}" -ec '
+        if [ ! -d /vllm-workspace ]; then echo Missing directory: /vllm-workspace >&2; exit 1; fi
+        if [ ! -d /vllm-workspace/tests ]; then echo Missing directory: /vllm-workspace/tests >&2; exit 1; fi
+        if [ ! -d /vllm-workspace/src/vllm ]; then echo Missing directory: /vllm-workspace/src/vllm >&2; exit 1; fi
+        if [ ! -x /vllm-workspace/src/vllm/vllm-rs ]; then echo Missing executable: /vllm-workspace/src/vllm/vllm-rs >&2; exit 1; fi
+        command -v python3
+        command -v uv
+        command -v pytest
+        if ! command -v amd-smi >/dev/null 2>&1 && ! command -v rocminfo >/dev/null 2>&1; then
+          echo No ROCm CLI found in image >&2
+          exit 1
+        fi
+        python3 - <<PY
+        import torch, vllm
+        print(torch.__version__)
+        print(vllm.__version__)
+        PY
+        echo AMD image smoke OK
+        '
     env:
       DOCKER_BUILDKIT: "1"
+      VLLM_BAKE_FILE: "docker/docker-bake-rocm.hcl"
+      PYTORCH_ROCM_ARCH: "gfx90a;gfx942;gfx950"
+      IMAGE_TAG: "rocm/vllm-ci:$BUILDKITE_COMMIT"
+      REMOTE_VLLM: "1"
+      VLLM_BRANCH: "$BUILDKITE_COMMIT"
+    retry:
+      automatic:
+        - exit_status: -1 # Agent was lost
+          limit: 1
+        - exit_status: -10 # Agent was lost
+          limit: 1
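
Note on the "ensure ci_base" step: its comment says a content hash of the ci_base-affecting files is compared against a label on the remote image. The implementation lives in `ci-bake-rocm.sh`, which this diff view does not show, so the following is only a minimal sketch of the idea. The function names, the use of SHA-256, and the way the remote label is obtained are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional


def content_hash(files: Dict[str, bytes]) -> str:
    """Hash file paths and contents in sorted order so the result is stable.

    `files` maps a repository path to that file's bytes (hypothetical input;
    a real script would read the files from disk).
    """
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(files[path])
        digest.update(b"\0")  # separator between entries
    return digest.hexdigest()


def needs_rebuild(local_hash: str, remote_label: Optional[str]) -> bool:
    """Rebuild when the remote image has no hash label or the hashes differ."""
    return remote_label is None or remote_label != local_hash
```

With this scheme an unchanged set of input files yields the same hash as the pushed image's label, so the step can exit early; any edit to one of the inputs changes the hash and forces a rebuild.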
25 changes: 13 additions & 12 deletions .buildkite/hardware_tests/cpu.yaml
@@ -28,18 +28,19 @@ steps:
           pytest -x -v -s tests/kernels/quantization/test_cpu_fp8_scaled_mm.py
           pytest -x -v -s tests/kernels/mamba/cpu/test_cpu_gdn_ops.py"

-  - label: CPU-Compatibility Tests
-    depends_on: []
-    device: intel_cpu
-    no_plugin: true
-    source_file_dependencies:
-      - cmake/cpu_extension.cmake
-      - setup.py
-      - vllm/platforms/cpu.py
-    commands:
-      - |
-        bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 20m "
-          bash .buildkite/scripts/hardware_ci/run-cpu-compatibility-test.sh"
+  # Note: SDE can't be downloaded from CI host because of AWS WAF
+  # - label: CPU-Compatibility Tests
+  #   depends_on: []
+  #   device: intel_cpu
+  #   no_plugin: true
+  #   source_file_dependencies:
+  #     - cmake/cpu_extension.cmake
+  #     - setup.py
+  #     - vllm/platforms/cpu.py
+  #   commands:
+  #     - |
+  #       bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 20m "
+  #         bash .buildkite/scripts/hardware_ci/run-cpu-compatibility-test.sh"

   - label: CPU-Language Generation and Pooling Model Tests
     depends_on: []
22 changes: 22 additions & 0 deletions .buildkite/intel_jobs/basic_correctness.yaml
@@ -0,0 +1,22 @@
+group: Basic Correctness
+depends_on:
+  - image-build-xpu
+steps:
+  - label: XPU Sleep Mode
+    timeout_in_minutes: 30
+    device: intel_gpu
+    no_plugin: true
+    working_dir: "."
+    env:
+      REGISTRY: "public.ecr.aws/q9t5s3a7"
+      REPO: "vllm-ci-test-repo"
+      VLLM_TEST_DEVICE: "xpu"
+    source_file_dependencies:
+      - vllm/
+      - tests/basic_correctness/test_cumem.py
+    commands:
+      - >-
+        bash .buildkite/scripts/hardware_ci/run-intel-test.sh
+        'cd tests &&
+        export VLLM_WORKER_MULTIPROC_METHOD=spawn &&
+        pytest -v -s basic_correctness/test_cumem.py::test_end_to_end'
23 changes: 23 additions & 0 deletions .buildkite/intel_jobs/expert_parallelism_intel.yaml
@@ -0,0 +1,23 @@
+group: Expert Parallelism
+depends_on:
+  - image-build-xpu
+steps:
+  - label: EPLB Algorithm
+    key: eplb-algorithm
+    timeout_in_minutes: 45
+    device: intel_gpu
+    no_plugin: true
+    working_dir: "."
+    env:
+      REGISTRY: "public.ecr.aws/q9t5s3a7"
+      REPO: "vllm-ci-test-repo"
+      VLLM_TEST_DEVICE: "xpu"
+    source_file_dependencies:
+      - vllm/distributed/eplb
+      - tests/distributed/test_eplb_algo.py
+      - tests/distributed/test_eplb_utils.py
+    commands:
+      - >-
+        bash .buildkite/scripts/hardware_ci/run-intel-test.sh
+        'cd tests &&
+        pytest -v -s distributed/test_eplb_algo.py'
136 changes: 134 additions & 2 deletions .buildkite/intel_jobs/misc_intel.yaml
@@ -38,7 +38,17 @@ steps:
       REPO: "vllm-ci-test-repo"
       VLLM_TEST_DEVICE: "xpu"
     source_file_dependencies:
-      - vllm/
+      - vllm/config/
+      - vllm/distributed/
+      - vllm/engine/
+      - vllm/inputs/
+      - vllm/logger.py
+      - vllm/model_executor/
+      - vllm/platforms/
+      - vllm/sampling_params.py
+      - vllm/transformers_utils/
+      - vllm/utils/
+      - vllm/v1/
       - tests/v1/sample
       - tests/v1/logits_processors
       - tests/v1/test_oracle.py
@@ -52,4 +62,126 @@ steps:
         pytest -v -s v1/logits_processors --ignore=v1/logits_processors/test_custom_online.py --ignore=v1/logits_processors/test_custom_offline.py &&
         pytest -v -s v1/test_oracle.py &&
         pytest -v -s v1/test_request.py &&
-        pytest -v -s v1/test_outputs.py'
+        pytest -v -s v1/test_outputs.py &&
+        pytest -v -s v1/sample/test_topk_topp_sampler.py'
+
+  - label: XPU CPU Offload
+    timeout_in_minutes: 60
+    device: intel_gpu
+    no_plugin: true
+    working_dir: "."
+    env:
+      REGISTRY: "public.ecr.aws/q9t5s3a7"
+      REPO: "vllm-ci-test-repo"
+      VLLM_TEST_DEVICE: "xpu"
+    source_file_dependencies:
+      - vllm/
+      - vllm/v1/kv_offload/
+      - vllm/v1/kv_connector/
+      - tests/v1/kv_offload/
+      - tests/v1/kv_connector/unit/test_offloading_connector.py
+    commands:
+      - >-
+        bash .buildkite/scripts/hardware_ci/run-intel-test.sh
+        'export VLLM_WORKER_MULTIPROC_METHOD=spawn &&
+        cd tests &&
+        pytest -v -s v1/kv_offload &&
+        pytest -v -s v1/kv_connector/unit/test_offloading_connector.py'
+
+  - label: Regression
+    key: regression
+    timeout_in_minutes: 30
+    device: intel_gpu
+    no_plugin: true
+    working_dir: "."
+    env:
+      REGISTRY: "public.ecr.aws/q9t5s3a7"
+      REPO: "vllm-ci-test-repo"
+      VLLM_TEST_DEVICE: "xpu"
+    source_file_dependencies:
+      - vllm/config/
+      - vllm/distributed/
+      - vllm/engine/
+      - vllm/inputs/
+      - vllm/model_executor/
+      - vllm/multimodal/
+      - vllm/platforms/
+      - vllm/sampling_params.py
+      - vllm/transformers_utils/
+      - vllm/utils/
+      - vllm/v1/
+      - tests/test_regression
+    commands:
+      - >-
+        bash .buildkite/scripts/hardware_ci/run-intel-test.sh
+        'pip install modelscope &&
+        cd tests &&
+        pytest -v -s test_regression.py'
+
+  - label: Metrics, Tracing (2 GPUs)
+    key: metrics-tracing-2-gpus
+    timeout_in_minutes: 30
+    num_devices: 2
+    device: intel_gpu
+    no_plugin: true
+    working_dir: "."
+    env:
+      REGISTRY: "public.ecr.aws/q9t5s3a7"
+      REPO: "vllm-ci-test-repo"
+      VLLM_TEST_DEVICE: "xpu"
+    source_file_dependencies:
+      - vllm/config/
+      - vllm/distributed/
+      - vllm/engine/
+      - vllm/inputs/
+      - vllm/model_executor/
+      - vllm/multimodal/
+      - vllm/platforms/
+      - vllm/sampling_params.py
+      - vllm/tracing/
+      - vllm/transformers_utils/
+      - vllm/utils/
+      - vllm/v1/
+      - tests/v1/tracing
+    commands:
+      - >-
+        bash .buildkite/scripts/hardware_ci/run-intel-test.sh
+        'pip install opentelemetry-sdk\>=1.26.0 opentelemetry-api\>=1.26.0 opentelemetry-exporter-otlp\>=1.26.0 opentelemetry-semantic-conventions-ai\>=0.4.1 &&
+        cd tests &&
+        pytest -v -s v1/tracing'
+
+  - label: Async Engine, Inputs, Utils, Worker
+    key: async-engine-inputs-utils-worker
+    timeout_in_minutes: 30
+    device: intel_gpu
+    no_plugin: true
+    working_dir: "."
+    env:
+      REGISTRY: "public.ecr.aws/q9t5s3a7"
+      REPO: "vllm-ci-test-repo"
+      VLLM_TEST_DEVICE: "xpu"
+    source_file_dependencies:
+      - vllm/assets/
+      - vllm/config/
+      - vllm/distributed/
+      - vllm/engine/
+      - vllm/inputs/
+      - vllm/model_executor/
+      - vllm/multimodal/
+      - vllm/platforms/
+      - vllm/sampling_params.py
+      - vllm/tokenizers/
+      - vllm/transformers_utils/
+      - vllm/utils/
+      - vllm/v1/
+      - tests/detokenizer
+      - tests/multimodal
+      - tests/utils_
+    commands:
+      - >-
+        bash .buildkite/scripts/hardware_ci/run-intel-test.sh
+        'cd tests &&
+        pip install av &&
+        pytest -v -s detokenizer &&
+        pytest -v -s -m "not cpu_test" ./multimodal &&
+        pytest -v -s utils_ --ignore=utils_/test_mem_utils.py'
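
Note on `source_file_dependencies`: several Intel steps in this diff swap the catch-all `vllm/` entry for specific subdirectories. The sketch below shows how such lists could gate which steps run for a given change, assuming a step is triggered when any changed path starts with one of its entries. That matching rule, the function name, the abridged dependency lists, and the example file paths are assumptions for illustration; the real pipeline generator is not part of this diff.

```python
# Abridged dependency lists taken from steps in this diff (illustration only).
STEP_DEPENDENCIES = {
    "XPU Sleep Mode": ["vllm/", "tests/basic_correctness/test_cumem.py"],
    "EPLB Algorithm": [
        "vllm/distributed/eplb",
        "tests/distributed/test_eplb_algo.py",
        "tests/distributed/test_eplb_utils.py",
    ],
    "Regression": [
        "vllm/config/",
        "vllm/distributed/",
        "vllm/v1/",
        "tests/test_regression",
    ],
}


def triggered_steps(changed_files, step_dependencies):
    """Return labels of steps whose dependency prefixes match a changed file.

    Assumption: a dependency entry matches when a changed path starts with it.
    """
    return [
        label
        for label, deps in step_dependencies.items()
        if any(path.startswith(dep) for path in changed_files for dep in deps)
    ]
```

Under this reading, a change outside the listed subdirectories (for example a hypothetical `vllm/entrypoints/llm.py`) would still trigger a step that depends on all of `vllm/`, but not the steps with the narrowed lists.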