tests: sync v1 kv cache tests for vllm 0.23 #292
Signed-off-by: xzh25 <xzh25@users.noreply.github.com>
Code Review
This pull request refactors and expands the test suite for the KV cache management system, updating imports and adding comprehensive tests for multi-worker configurations, pipeline parallel sharding, prompt embeddings, and auto-fitting of the maximum model length. Feedback on the changes highlights multiple instances in the test suite where ModelConfig is instantiated without its required first positional argument (model), which will lead to TypeError exceptions during test execution.
| def test_get_kv_cache_configs_multiple_workers(): | ||
| model_config = ModelConfig(max_model_len=16) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=16) | |
| model_config = ModelConfig("dummy", max_model_len=16) |
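The failure mode described in this comment can be sketched with a stand-in class. This is only an illustration, assuming (as the review states) that `model` has no default value; `StandInModelConfig` and `build_without_model` are hypothetical names, not vLLM's actual `ModelConfig`, and the real signature in vLLM 0.23 should be checked before relying on it.

```python
from dataclasses import dataclass


@dataclass
class StandInModelConfig:
    """Hypothetical stand-in for illustration only, NOT vLLM's ModelConfig."""

    model: str  # assumed required: no default value
    max_model_len: int = 2048  # arbitrary default chosen for this sketch


def build_without_model():
    """Try to construct the config without `model` and return the error, if any."""
    try:
        StandInModelConfig(max_model_len=16)
    except TypeError as exc:
        return exc
    return None


# Omitting a required field makes the generated __init__ raise TypeError.
err = build_without_model()

# Passing a placeholder name, as the suggestion does, constructs normally.
config = StandInModelConfig("dummy", max_model_len=16)
```

If the real constructor supplies a default for `model`, the original call would not raise and the suggested change would be optional.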
| ids=["symmetric", "asymmetric"], | ||
| ) | ||
| def test_get_kv_cache_configs_pp_sharding(asymmetric_memory): | ||
| model_config = ModelConfig(max_model_len=512) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=512) | |
| model_config = ModelConfig("dummy", max_model_len=512) |
| def test_get_kv_cache_config(): | ||
| def test_get_kv_cache_config_one_worker(): | ||
| # pass max_model_len to pass check_enough_kv_cache_memory | ||
| model_config = ModelConfig(max_model_len=16) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=16) | |
| model_config = ModelConfig("dummy", max_model_len=16) |
| def test_get_kv_cache_configs_attention_free(): | ||
| kv_cache_specs: dict[str, KVCacheSpec] = {} | ||
| vllm_config = VllmConfig(model_config=ModelConfig(max_model_len=16)) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| vllm_config = VllmConfig(model_config=ModelConfig(max_model_len=16)) | |
| vllm_config = VllmConfig(model_config=ModelConfig("dummy", max_model_len=16)) |
| def test_auto_fit_max_model_len(): | ||
| """Test that max_model_len=-1 auto-fits to available GPU memory.""" | ||
| # Create config with original_max_model_len=-1 to trigger auto-fit | ||
| model_config = ModelConfig(max_model_len=1024) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=1024) | |
| model_config = ModelConfig("dummy", max_model_len=1024) |
| def test_auto_fit_max_model_len_with_hybrid(): | ||
| """Test that auto-fit works with hybrid KV cache specs.""" | ||
| # Create config with original_max_model_len=-1 to trigger auto-fit | ||
| model_config = ModelConfig(max_model_len=8192) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=8192) | |
| model_config = ModelConfig("dummy", max_model_len=8192) |
| def test_auto_fit_max_model_len_not_triggered(): | ||
| """Test that auto-fit is not triggered when original_max_model_len is not -1.""" | ||
| model_config = ModelConfig(max_model_len=16) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=16) | |
| model_config = ModelConfig("dummy", max_model_len=16) |
| the raw `available_memory`. Without this, auto-fit could pick a | ||
| max_model_len that no longer fits once `num_gpu_blocks_override` is applied. | ||
| """ | ||
| model_config = ModelConfig(max_model_len=16384) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=16384) | |
| model_config = ModelConfig("dummy", max_model_len=16384) |
| `available_memory`. Without this, startup could accept a max_model_len | ||
| that does not actually fit in `num_gpu_blocks_override` blocks. | ||
| """ | ||
| model_config = ModelConfig(max_model_len=16384) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=16384) | |
| model_config = ModelConfig("dummy", max_model_len=16384) |
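The docstring quoted above concerns whether a `max_model_len` fits in `num_gpu_blocks_override` blocks. A rough sketch of that kind of check follows, assuming one sequence of `max_model_len` tokens and a fixed `block_size` tokens per block. The helper `fits_in_block_budget` and the block size of 16 are hypothetical; this is a simplification and not vLLM's actual memory check.

```python
def fits_in_block_budget(max_model_len: int, block_size: int, num_blocks: int) -> bool:
    """Return True if one sequence of max_model_len tokens fits in num_blocks blocks.

    Hypothetical helper: it ignores per-layer and per-group details that a
    real KV cache manager has to account for.
    """
    # Ceiling division: a partially filled block still occupies a whole block.
    blocks_needed = -(-max_model_len // block_size)
    return blocks_needed <= num_blocks


# With an assumed block size of 16 tokens, 16384 tokens need exactly 1024 blocks.
exact_fit = fits_in_block_budget(16384, block_size=16, num_blocks=1024)
one_short = fits_in_block_budget(16384, block_size=16, num_blocks=1023)
```

The point of the tests under review is that this comparison should use the overridden block count rather than a count derived from the raw `available_memory`.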
| to False (i.e. HMA remains enabled) when no other condition requires it | ||
| to be disabled. | ||
| """ | ||
| model_config = ModelConfig(max_model_len=16) |
The ModelConfig constructor requires model (the model name/path) as its first positional argument. Instantiating it without this argument will raise a TypeError when the test is executed. Please provide a dummy model name like "dummy".
| model_config = ModelConfig(max_model_len=16) | |
| model_config = ModelConfig("dummy", max_model_len=16) |
Signed-off-by: xzh25 <xzh25@users.noreply.github.com>
Addressed the Gemini review feedback on Changes:
Validation on the MetaX remote vLLM 0.23 environment:
Remote log:
Added a stronger clean-worktree validation pass for this PR on the MetaX C500 remote environment. Validation details:
The excluded cases are the existing real-model config tests ( |
Summary
torch.accelerator.empty_cache teardown failures
Validation
Run on MetaX C500 remote workspace /data/vllm-metax-contribution/src/vLLM-metax with isolated vLLM 0.23 environment /data/vllm-metax-contribution/envs/vllm023:
Results:
git diff --check: passed
py_compile: passed
66 tests collected
10 passed, 5 warnings in 0.06s