Fix vLLM vision GRPO compatibility for issue #4081 #520
danielhanchen wants to merge 6 commits into main from
Conversation
Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical compatibility issues within the vLLM integration layer, specifically targeting GRPO vision notebook runs. It introduces necessary adjustments to handle expected methods on the worker manager and normalizes complex token payloads during multimodal decoding, ensuring robust operation of vision models with vLLM.

Highlights
Activity
Code Review
This pull request introduces two fixes to improve vLLM compatibility for vision models, specifically for GRPO runs. The first change adds a fallback implementation for supports_tower_connector_lora in the WorkerLoRAManager to support older vLLM versions. The second change patches vLLM's _seq2text function to normalize various nested token payload formats before decoding. The changes are logical and well-targeted. I've provided a couple of suggestions in unsloth_zoo/vllm_utils.py to enhance the robustness and maintainability of the patching logic.
```python
if isinstance(payload, str):
    return payload
```
The function _extract_token_ids has an inconsistent return type, as it can return a list of integers, None, or a string. This mixed responsibility makes the logic complex and potentially buggy, especially for list processing. For example, _extract_token_ids(['hello']) would return None because the recursive call for 'hello' returns a string, which is not a list.
To make the function's responsibility clearer (to only extract token IDs), it's better for it to only return a list of integers or None. String payloads can be handled by the wrapper function unsloth_seq2text.
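A minimal sketch of that suggested shape (hypothetical helper name; tensors and ndarrays handled duck-typed via `.tolist()` so the sketch has no torch/numpy dependency): the helper returns only `list[int]` or `None`, and the calling wrapper keeps responsibility for string payloads.

```python
def extract_token_ids(payload):
    """Normalize known payload shapes to a flat list[int]; return None otherwise.

    Hypothetical simplified version following the review suggestion: strings
    are deliberately NOT handled here -- the wrapper passes them through.
    """
    # torch.Tensor and np.ndarray both expose .tolist()
    if hasattr(payload, "tolist"):
        payload = payload.tolist()
    if isinstance(payload, int):
        return [payload]
    if isinstance(payload, dict):
        # Assumed key names based on common vLLM payload fields.
        for key in ("token_ids", "input_ids", "prompt_token_ids"):
            if key in payload:
                return extract_token_ids(payload[key])
        return None
    if isinstance(payload, (list, tuple)):
        if all(isinstance(x, int) for x in payload):
            return [int(x) for x in payload]
        # One level of nesting: list[list[int]] -> flat list[int]
        if all(isinstance(x, (list, tuple)) for x in payload):
            flat = []
            for sub in payload:
                ids = extract_token_ids(sub)
                if ids is None:
                    return None
                flat.extend(ids)
            return flat
    return None
```

With this shape, `extract_token_ids(["hello"])` cleanly returns `None` (the caller falls back to the original payload) instead of relying on mixed return types.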
```python
        pass

    mm_processor._seq2text = unsloth_seq2text
except:
```
Using a bare except: can hide unexpected errors and makes debugging difficult. It's better to catch a specific exception, like ImportError if you're only concerned about the module not being found, or at least Exception to avoid catching system-exiting exceptions like KeyboardInterrupt.
```diff
- except:
+ except Exception:
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7375b071e9
```python
adapter_manager = getattr(self, "_adapter_manager", None)
return bool(
    adapter_manager is not None and
    getattr(adapter_manager, "supports_tower_connector_lora", False)
```
Call tower-connector capability hook instead of casting it
This returns bool(getattr(adapter_manager, "supports_tower_connector_lora", False)), which evaluates to True whenever the attribute is a bound method, regardless of what that method would return. In environments where the adapter manager implements supports_tower_connector_lora() but returns False, the worker will incorrectly advertise support and can route into unsupported vision-LoRA paths at runtime.
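One way to address this, sketched under the assumption of the surrounding method shown in the diff: call the hook when the attribute is callable instead of truth-testing the bound method.

```python
def supports_tower_connector_lora(self) -> bool:
    adapter_manager = getattr(self, "_adapter_manager", None)
    if adapter_manager is None:
        return False
    supports = getattr(adapter_manager, "supports_tower_connector_lora", False)
    # A bound method is always truthy; invoke it to get the real answer.
    if callable(supports):
        return bool(supports())
    return bool(supports)
```

This keeps the `False` default for managers that lack the attribute while respecting a method that explicitly returns `False`.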
```python
else:
    return set(self._adapter_manager.list_adapters())

def supports_tower_connector_lora(self) -> bool:
```
I see where this comes from. Is this a necessity or a nice-to-have?
From the looks of it, this is primarily aimed at mm_proj and similar module LoRAs, and it is somewhat experimental according to their docs.
unsloth_zoo/vllm_utils.py
Outdated
```python
if getattr(original_seq2text, "__unsloth_patched_seq2text__", False):
    return

def _extract_token_ids(payload, depth = 0):
```
Looks like we're heavily over-complicating this. We should ideally check for the 3-4 known shapes:
list[int], list[list[int]], and tensors. All the depth handling seems overly complicated.
danielhanchen left a comment
Review Summary
The two patches are necessary -- supports_tower_connector_lora prevents AttributeError in vLLM v1 vision code, and the _seq2text normalizer handles non-standard payload types that upstream _seq2text (which only accepts str | list[int]) cannot process.
Validated _extract_token_ids with unit tests covering: list[int], torch.Tensor, np.ndarray, dict extraction, nested list flattening, string passthrough, single int, tuple, empty list, np.integer scalars. All pass.
Issues
1. supports_mm default should be False not True (vllm_lora_worker_manager.py:281)
getattr(adapter_manager, "supports_mm", True) assumes multimodal support when the attribute is absent. Upstream vLLM sets supports_mm as an explicit instance attribute -- if it's missing, the adapter manager likely doesn't support multimodal. Default should be False to be conservative.
2. _extract_token_ids over-engineered (vllm_utils.py:384-431)
Agree with @Datta0's comment. The depth-8 recursion and 8 preferred dict keys are complex for what should handle 3-4 known types: torch.Tensor, np.ndarray, list[list[int]], and dict with token_ids/input_ids. A simpler version would be more maintainable and less likely to mask bugs by silently normalizing unexpected inputs.
3. Inconsistent return type from _extract_token_ids
The function returns list[int], None, or str. The string path causes: _extract_token_ids(["a", "b"]) returns None because recursive calls return strings which fail isinstance(found, list). Not a crash (wrapper falls back to original seq), but it means multi-element string lists silently bypass normalization.
4. Bare except: on line 444 -- should be except Exception: to avoid catching KeyboardInterrupt/SystemExit. Consistent with codebase convention but easy to fix.
5. Issue #4081 reference -- Issue #4081 is about a tensor shape mismatch (828 vs 824) in compute_loss. The errors this PR fixes (missing supports_tower_connector_lora + nested payloads in _seq2text) are different. The PR description should clarify the relationship.
- Change supports_mm getattr default from True to False: when the adapter manager lacks the attribute, conservatively assume no multimodal support instead of assuming it.
- Simplify _extract_token_ids to handle only known payload types: torch.Tensor, np.ndarray, int/np.integer, flat list[int], one-level nested list[list[int]], tuple equivalents, and dict with token_ids/input_ids/prompt_token_ids keys. Removes depth-8 recursion, 8 preferred dict keys, and the exhaustive dict value scan.
- Add fallback import: try vllm.multimodal.processing.processor (v0.15.0+), fall back to vllm.multimodal.processing (v0.10.2-v0.14.x). The package path only exists from v0.15.0 onwards.
- Detect the use_cache parameter via inspect.signature before calling the original _seq2text. The use_cache kwarg was added in v0.12.0; passing it on v0.10.2-v0.11.2 would raise TypeError.
- Use **kwargs forwarding so the wrapper stays compatible with any future signature changes.
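The use_cache detection and **kwargs forwarding can be sketched as a wrapper factory. `make_seq2text_wrapper` and `normalize` are hypothetical names for illustration; the real patch would also apply the versioned import fallback and install the wrapper on the vLLM module.

```python
import inspect

def make_seq2text_wrapper(original_seq2text, normalize):
    """Wrap a _seq2text-style function, normalizing payloads before decode.

    Inspects the original's signature once so that `use_cache` (added in
    newer vLLM versions) is never forwarded to older signatures.
    """
    has_use_cache = "use_cache" in inspect.signature(original_seq2text).parameters

    def unsloth_seq2text(tokenizer, seq, **kwargs):
        normalized = normalize(seq)
        if normalized is not None:
            seq = normalized  # fall back to the raw seq when normalization declines
        if not has_use_cache:
            kwargs.pop("use_cache", None)  # old signature: drop the unknown kwarg
        return original_seq2text(tokenizer, seq, **kwargs)

    # Marker so repeated patching can detect an already-wrapped function.
    unsloth_seq2text.__unsloth_patched_seq2text__ = True
    return unsloth_seq2text
```

Forwarding everything else through `**kwargs` means the wrapper does not need updating when upstream adds further keyword parameters.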
```python
except (ImportError, ModuleNotFoundError):
    # v0.10.2 - v0.14.x uses a flat module
    import vllm.multimodal.processing as mm_processor
pass
```
```python
                return _extract_token_ids(value)
        return None
    return None
pass
```
```python
    if _has_use_cache:
        return original_seq2text(tokenizer, normalized_seq, **kwargs)
    return original_seq2text(tokenizer, normalized_seq)
pass
```
```python
    mm_processor._seq2text = unsloth_seq2text
except:
    pass
pass
```
```python
def patch_vllm_multimodal_seq2text():
    return
pass
```
```python
    unsloth_seq2text.__unsloth_patched_seq2text__ = True
    mm_processor._seq2text = unsloth_seq2text
except:
```
```python
    if all(isinstance(x, (int, np.integer)) for x in payload):
        return [int(x) for x in payload]
    # One level of nesting: list[list[int]] -> list[int]
    if all(isinstance(x, (list, tuple)) for x in payload):
```
Is there a good reason to do that? Isn't that like merging tokens of different, unrelated sequences?
Summary
This fixes two vLLM compatibility issues that broke GRPO vision notebook runs referenced from unsloth issue unslothai/unsloth#4081:

- WorkerLoRAManager.supports_tower_connector_lora() fallback in unsloth_zoo/vllm_lora_worker_manager.py.
- _seq2text decode handling in unsloth_zoo/vllm_utils.py to normalize nested token payloads (dict/list/tensor) before decode.

Why
GRPO vision runs were failing with:

- Missing supports_tower_connector_lora on the worker manager path
- _seq2text unable to decode non-standard nested token payloads

Both failure paths are vLLM integration edges that should be handled in the Zoo patch layer.
Validation
Reproduced and reran issue notebooks with max_steps=5 where needed:

- qwen2_5_7b_vl_grpo_postfix4: PASS (train_runtime=197.8613s)
- qwen3_vl_8b_vision_grpo_postfix1: PASS (train_runtime=105.9599s)
- openenv_gpt_oss_20b_postfix2: PASS (train_runtime=817.5183s)

Logs and rerun summary:

- 4_prs_check/logs/issue4081/postfix_reruns_summary.txt
- 4_prs_check/logs/issue4081/*.log

Notes
This PR is intentionally scoped to the vLLM compatibility layer and keeps behavior unchanged when payloads are already in standard token-id form.