tests: sync v1 kv cache tests for vllm 0.23 by xzh25 · Pull Request #292 · MetaX-MACA/vLLM-metax

xzh25 · 2026-06-21T14:22:47Z

Summary

sync v1 KV cache CPU tests with the vLLM 0.23 API surface
update shared test imports for moved config/logprob/input-processing helpers
mark the synced pure CPU KV cache tests to skip global accelerator cleanup, avoiding MetaX torch 2.8 torch.accelerator.empty_cache teardown failures

Validation

Run on MetaX C500 remote workspace /data/vllm-metax-contribution/src/vLLM-metax with isolated vLLM 0.23 environment /data/vllm-metax-contribution/envs/vllm023:

git diff --check
python -m py_compile   tests/conftest.py   tests/models/registry.py   tests/models/utils.py   tests/v1/core/test_kv_cache_utils.py   tests/v1/core/test_single_type_kv_cache_manager.py
PYTHONPATH="$VLLM_METAX_SRC:${PYTHONPATH:-}" python -m pytest --collect-only -q   tests/v1/core/test_kv_cache_utils.py   tests/v1/core/test_single_type_kv_cache_manager.py   --tb=short
PYTHONPATH="$VLLM_METAX_SRC:${PYTHONPATH:-}" python -m pytest -q   tests/v1/core/test_kv_cache_utils.py::test_none_hash   tests/v1/core/test_single_type_kv_cache_manager.py   --tb=short --disable-warnings

Results:

git diff --check: passed
py_compile: passed
collect-only: 66 tests collected
CPU smoke: 10 passed, 5 warnings in 0.06s

Signed-off-by: xzh25 <xzh25@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request refactors and expands the test suite for the KV cache management system, updating imports and adding comprehensive tests for multi-worker configurations, pipeline parallel sharding, prompt embeddings, and auto-fitting of the maximum model length. Feedback on the changes highlights multiple instances in the test suite where ModelConfig is instantiated without its required first positional argument (model), which will lead to TypeError exceptions during test execution.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-21T14:25:59Z

+
+
+def test_get_kv_cache_configs_multiple_workers():
+    model_config = ModelConfig(max_model_len=16)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=16)

model_config = ModelConfig("dummy", max_model_len=16)

gemini-code-assist · 2026-06-21T14:25:59Z

+    ids=["symmetric", "asymmetric"],
+)
+def test_get_kv_cache_configs_pp_sharding(asymmetric_memory):
+    model_config = ModelConfig(max_model_len=512)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=512)

model_config = ModelConfig("dummy", max_model_len=512)

gemini-code-assist · 2026-06-21T14:25:59Z

-def test_get_kv_cache_config():
+def test_get_kv_cache_config_one_worker():
    # pass max_model_len to pass check_enough_kv_cache_memory
    model_config = ModelConfig(max_model_len=16)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=16)

model_config = ModelConfig("dummy", max_model_len=16)

gemini-code-assist · 2026-06-21T14:25:59Z

+
+def test_get_kv_cache_configs_attention_free():
+    kv_cache_specs: dict[str, KVCacheSpec] = {}
+    vllm_config = VllmConfig(model_config=ModelConfig(max_model_len=16))


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

vllm_config = VllmConfig(model_config=ModelConfig(max_model_len=16))

vllm_config = VllmConfig(model_config=ModelConfig("dummy", max_model_len=16))

gemini-code-assist · 2026-06-21T14:26:00Z

+def test_auto_fit_max_model_len():
+    """Test that max_model_len=-1 auto-fits to available GPU memory."""
+    # Create config with original_max_model_len=-1 to trigger auto-fit
+    model_config = ModelConfig(max_model_len=1024)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=1024)

model_config = ModelConfig("dummy", max_model_len=1024)

gemini-code-assist · 2026-06-21T14:26:00Z

+def test_auto_fit_max_model_len_with_hybrid():
+    """Test that auto-fit works with hybrid KV cache specs."""
+    # Create config with original_max_model_len=-1 to trigger auto-fit
+    model_config = ModelConfig(max_model_len=8192)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=8192)

model_config = ModelConfig("dummy", max_model_len=8192)

gemini-code-assist · 2026-06-21T14:26:00Z

+
+def test_auto_fit_max_model_len_not_triggered():
+    """Test that auto-fit is not triggered when original_max_model_len is not -1."""
+    model_config = ModelConfig(max_model_len=16)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=16)

model_config = ModelConfig("dummy", max_model_len=16)

gemini-code-assist · 2026-06-21T14:26:00Z

+    the raw `available_memory`. Without this, auto-fit could pick a
+    max_model_len that no longer fits once `num_gpu_blocks_override` is applied.
+    """
+    model_config = ModelConfig(max_model_len=16384)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=16384)

model_config = ModelConfig("dummy", max_model_len=16384)

gemini-code-assist · 2026-06-21T14:26:00Z

+    `available_memory`. Without this, startup could accept a max_model_len
+    that does not actually fit in `num_gpu_blocks_override` blocks.
+    """
+    model_config = ModelConfig(max_model_len=16384)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=16384)

model_config = ModelConfig("dummy", max_model_len=16384)

gemini-code-assist · 2026-06-21T14:26:00Z

+    to False (i.e. HMA remains enabled) when no other condition requires it
+    to be disabled.
+    """
+    model_config = ModelConfig(max_model_len=16)


The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".

Suggested change

model_config = ModelConfig(max_model_len=16)

model_config = ModelConfig("dummy", max_model_len=16)

Signed-off-by: xzh25 <xzh25@users.noreply.github.com>

xzh25 · 2026-06-21T15:12:03Z

Addressed the Gemini review feedback on ModelConfig(max_model_len=...) in tests/v1/core/test_kv_cache_utils.py.

Changes:

Replaced validation-triggering ModelConfig/VllmConfig construction in lightweight KV cache utility tests with local SimpleNamespace helpers.
Kept the helper limited to fields read by KV cache config/memory sizing paths.
Removed the temporary dummy model approach so tests do not depend on HF model validation.

Validation on the MetaX remote vLLM 0.23 environment:

git diff --check: passed
python -m py_compile tests/v1/core/test_kv_cache_utils.py: passed
Review-targeted KV cache utils cases: 11 passed
Previous smoke coverage (test_none_hash + test_single_type_kv_cache_manager.py): 10 passed

Remote log: /data/vllm-metax-contribution/ops/logs/058_verify_pr292_modelconfig_followup_scheduler_tokens.txt

xzh25 · 2026-06-22T01:14:58Z

Added a stronger clean-worktree validation pass for this PR on the MetaX C500 remote environment.

Validation details:

Verified PR head from a clean worktree: 425a76942c6ee48ccb78a1460d22fdb54d366f17
Device/runtime: MetaX C500, torch==2.8.0+metax3.5.3.9, vllm==0.23.0
git diff --check: passed
py_compile for both target files: passed
Full target collection: 66 tests collected
Unfiltered full pytest reached 33 passed, then blocked in urllib3 while resolving external HuggingFace model config for existing ModelConfig("Qwen/Qwen1.5-7B", ...) tests.
Local CPU-safe target run excluding only those external model-config resolution cases: 63 passed, 3 deselected, 5 warnings

The excluded cases are the existing real-model config tests (test_estimate_max_model_len and test_get_max_concurrency_for_kv_cache_config), not the lightweight KV cache utility paths changed by this PR.

tests: sync v1 kv cache tests for vllm 0.23

666e202

Signed-off-by: xzh25 <xzh25@users.noreply.github.com>

gemini-code-assist Bot reviewed Jun 21, 2026

View reviewed changes

tests: avoid model validation in kv cache utils

425a769

Signed-off-by: xzh25 <xzh25@users.noreply.github.com>



		def test_get_kv_cache_configs_multiple_workers():
		model_config = ModelConfig(max_model_len=16)

	model_config = ModelConfig(max_model_len=16)
	model_config = ModelConfig("dummy", max_model_len=16)

	vllm_config = VllmConfig(model_config=ModelConfig(max_model_len=16))
	vllm_config = VllmConfig(model_config=ModelConfig("dummy", max_model_len=16))

Uh oh!

Conversation

xzh25 commented Jun 21, 2026

Summary

Validation

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

xzh25 commented Jun 21, 2026

Uh oh!

xzh25 commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant