
Fix vLLM vision GRPO compatibility for issue #4081 #520

Open

danielhanchen wants to merge 6 commits into main from fix/issue-4081-vllm-vision-grpo

Conversation

@danielhanchen
Contributor

Summary

This PR fixes two vLLM compatibility issues that broke the GRPO vision notebook runs referenced in unsloth issue unslothai/unsloth#4081.

  1. Add WorkerLoRAManager.supports_tower_connector_lora() fallback in unsloth_zoo/vllm_lora_worker_manager.py.
  2. Patch multimodal _seq2text decode handling in unsloth_zoo/vllm_utils.py to normalize nested token payloads (dict/list/tensor) before decode.

Why

GRPO vision runs were failing with:

  • missing supports_tower_connector_lora on the worker manager path
  • multimodal decode path receiving nested/non-flat token payloads

Both failure paths are vLLM integration edges that should be handled in the Zoo patch layer.
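For context, the worker-manager side of the fix follows the Zoo's usual runtime-patching pattern. The sketch below illustrates that pattern with a stand-in class rather than vLLM's real WorkerLoRAManager; only the method name supports_tower_connector_lora and the getattr-based fallback shape come from this PR, the rest is illustrative:

```python
class WorkerLoRAManager:
    """Stand-in for vLLM's worker LoRA manager (illustration only)."""
    _adapter_manager = None

def _supports_tower_connector_lora(self) -> bool:
    # Conservative fallback: report support only when an adapter manager
    # is attached and itself advertises the capability.
    adapter_manager = getattr(self, "_adapter_manager", None)
    return bool(
        adapter_manager is not None
        and getattr(adapter_manager, "supports_tower_connector_lora", False)
    )

# Patch only when the method is genuinely missing, so vLLM versions that
# already ship it keep their own implementation.
if not hasattr(WorkerLoRAManager, "supports_tower_connector_lora"):
    WorkerLoRAManager.supports_tower_connector_lora = _supports_tower_connector_lora
```

Guarding the assignment with hasattr keeps the patch a no-op on vLLM builds that already provide the method.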

Validation

Reproduced and reran issue notebooks with max_steps=5 where needed:

  • qwen2_5_7b_vl_grpo_postfix4: PASS (train_runtime=197.8613s)
  • qwen3_vl_8b_vision_grpo_postfix1: PASS (train_runtime=105.9599s)
  • openenv_gpt_oss_20b_postfix2: PASS (train_runtime=817.5183s)

Logs and rerun summary:

  • 4_prs_check/logs/issue4081/postfix_reruns_summary.txt
  • 4_prs_check/logs/issue4081/*.log

Notes

This PR is intentionally scoped to the vLLM compatibility layer and keeps behavior unchanged when payloads are already in standard token-id form.

@gemini-code-assist
Contributor

Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical compatibility issues within the vLLM integration layer, specifically targeting GRPO vision notebook runs. It introduces necessary adjustments to handle expected methods on the worker manager and normalizes complex token payloads during multimodal decoding, ensuring robust operation of vision models with vLLM.

Highlights

  • vLLM WorkerLoRAManager Compatibility: Added a supports_tower_connector_lora() method to WorkerLoRAManager to ensure compatibility with vLLM v1 vision code, providing a fallback for older adapter manager implementations.
  • Multimodal Decode Handling: Implemented a patch for vLLM's multimodal _seq2text function to normalize nested token payloads (dictionaries, lists, tensors) before decoding, resolving issues with non-flat token inputs.


Changelog
  • unsloth_zoo/vllm_lora_worker_manager.py
    • Added the supports_tower_connector_lora method to WorkerLoRAManager to align with vLLM v1 vision requirements.
  • unsloth_zoo/vllm_utils.py
    • Introduced patch_vllm_multimodal_seq2text to preprocess and flatten nested token payloads for vLLM's multimodal _seq2text function.
    • Integrated the new patch_vllm_multimodal_seq2text into the main patch_vllm function to ensure its application.
Activity
  • The pull request author, danielhanchen, created this PR to fix vLLM compatibility issues for GRPO vision notebook runs, referencing unsloth issue #4081.
  • Validation was performed by reproducing and rerunning issue notebooks (qwen2_5_7b_vl_grpo_postfix4, qwen3_vl_8b_vision_grpo_postfix1, openenv_gpt_oss_20b_postfix2) with max_steps=5, all of which passed.
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces two fixes to improve vLLM compatibility for vision models, specifically for GRPO runs. The first change adds a fallback implementation for supports_tower_connector_lora in the WorkerLoRAManager to support older vLLM versions. The second change patches vLLM's _seq2text function to normalize various nested token payload formats before decoding. The changes are logical and well-targeted. I've provided a couple of suggestions in unsloth_zoo/vllm_utils.py to enhance the robustness and maintainability of the patching logic.

Comment on lines +387 to +388
```python
if isinstance(payload, str):
    return payload
```
Contributor


Severity: medium

The function _extract_token_ids has an inconsistent return type, as it can return a list of integers, None, or a string. This mixed responsibility makes the logic complex and potentially buggy, especially for list processing. For example, _extract_token_ids(['hello']) would return None because the recursive call for 'hello' returns a string, which is not a list.

To make the function's responsibility clearer (to only extract token IDs), it's better for it to only return a list of integers or None. String payloads can be handled by the wrapper function unsloth_seq2text.

```python
pass

mm_processor._seq2text = unsloth_seq2text
except:
```
Contributor


Severity: medium

Using a bare except: can hide unexpected errors and makes debugging difficult. It's better to catch a specific exception, like ImportError if you're only concerned about the module not being found, or at least Exception to avoid catching system-exiting exceptions like KeyboardInterrupt.

Suggested change:

```diff
-except:
+except Exception:
```


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7375b071e9


```python
adapter_manager = getattr(self, "_adapter_manager", None)
return bool(
    adapter_manager is not None and
    getattr(adapter_manager, "supports_tower_connector_lora", False)
)
```


P1: Call tower-connector capability hook instead of casting it

This returns bool(getattr(adapter_manager, "supports_tower_connector_lora", False)), which evaluates to True whenever the attribute is a bound method, regardless of what that method would return. In environments where the adapter manager implements supports_tower_connector_lora() but returns False, the worker will incorrectly advertise support and can route into unsupported vision-LoRA paths at runtime.
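A fix along these lines would invoke the attribute when it is callable rather than truth-testing it. A sketch (enclosing class omitted; _adapter_manager as in the PR's snippet):

```python
def supports_tower_connector_lora(self) -> bool:
    adapter_manager = getattr(self, "_adapter_manager", None)
    if adapter_manager is None:
        return False
    capability = getattr(adapter_manager, "supports_tower_connector_lora", False)
    # A bound method is always truthy; call it to get the real answer.
    if callable(capability):
        return bool(capability())
    return bool(capability)
```

The callable check keeps plain boolean attributes working while fixing the bound-method case Codex describes.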


```python
else:
    return set(self._adapter_manager.list_adapters())

def supports_tower_connector_lora(self) -> bool:
```
Collaborator


I see where this comes from. Is this a necessity, or just nice to have?
From the looks of it, this is primarily aimed at LoRA on mm_proj-style modules, and it is experimental according to their docs.

```python
if getattr(original_seq2text, "__unsloth_patched_seq2text__", False):
    return

def _extract_token_ids(payload, depth = 0):
```
Collaborator


Looks like we're heavily overcomplicating this. We should ideally check for the 3-4 known shapes: list[int], list[list[int]], and tensor. All the depth-based recursion seems overly complicated.

Contributor Author

@danielhanchen left a comment


Review Summary

The two patches are necessary -- supports_tower_connector_lora prevents AttributeError in vLLM v1 vision code, and the _seq2text normalizer handles non-standard payload types that upstream _seq2text (which only accepts str | list[int]) cannot process.

Validated _extract_token_ids with unit tests covering: list[int], torch.Tensor, np.ndarray, dict extraction, nested list flattening, string passthrough, single int, tuple, empty list, np.integer scalars. All pass.

Issues

1. supports_mm default should be False not True (vllm_lora_worker_manager.py:281)
getattr(adapter_manager, "supports_mm", True) assumes multimodal support when the attribute is absent. Upstream vLLM sets supports_mm as an explicit instance attribute -- if it's missing, the adapter manager likely doesn't support multimodal. Default should be False to be conservative.

2. _extract_token_ids over-engineered (vllm_utils.py:384-431)
Agree with @Datta0's comment. The depth-8 recursion and 8 preferred dict keys are complex for what should handle 3-4 known types: torch.Tensor, np.ndarray, list[list[int]], and dict with token_ids/input_ids. A simpler version would be more maintainable and less likely to mask bugs by silently normalizing unexpected inputs.

3. Inconsistent return type from _extract_token_ids
The function returns list[int], None, or str. The string path means _extract_token_ids(["a", "b"]) returns None, because the recursive calls return strings, which fail the isinstance(found, list) check. Not a crash (the wrapper falls back to the original seq), but multi-element string lists silently bypass normalization.

4. Bare except: on line 444 -- should be except Exception: to avoid catching KeyboardInterrupt/SystemExit. Consistent with codebase convention but easy to fix.

5. Issue #4081 reference -- Issue #4081 is about a tensor shape mismatch (828 vs 824) in compute_loss. The errors this PR fixes (missing supports_tower_connector_lora + nested payloads in _seq2text) are different. The PR description should clarify the relationship.

- Change supports_mm getattr default from True to False: when the
  adapter manager lacks the attribute, conservatively assume no
  multimodal support instead of assuming it.

- Simplify _extract_token_ids to handle only known payload types:
  torch.Tensor, np.ndarray, int/np.integer, flat list[int],
  one-level nested list[list[int]], tuple equivalents, and dict
  with token_ids/input_ids/prompt_token_ids keys. Removes depth-8
  recursion, 8 preferred dict keys, and exhaustive dict value scan.
- Add fallback import: try vllm.multimodal.processing.processor
  (v0.15.0+), fall back to vllm.multimodal.processing (v0.10.2-v0.14.x).
  The package path only exists from v0.15.0 onwards.

- Detect use_cache parameter via inspect.signature before calling
  original _seq2text. The use_cache kwarg was added in v0.12.0;
  passing it on v0.10.2-v0.11.2 would raise TypeError.

- Use **kwargs forwarding so the wrapper stays compatible with any
  future signature changes.
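The use_cache detection described in the last two bullets can be sketched as follows; make_patched_seq2text is an illustrative helper name, and the wrapped functions stand in for vLLM's _seq2text:

```python
import inspect

def make_patched_seq2text(original_seq2text):
    # Probe once: does the wrapped function accept `use_cache`?
    # (Per the notes above, the kwarg exists only from vLLM v0.12.0.)
    params = inspect.signature(original_seq2text).parameters
    _has_use_cache = "use_cache" in params

    def unsloth_seq2text(tokenizer, seq, **kwargs):
        normalized_seq = seq  # payload normalization elided in this sketch
        if _has_use_cache:
            return original_seq2text(tokenizer, normalized_seq, **kwargs)
        # Older vLLM: forwarding use_cache would raise TypeError.
        kwargs.pop("use_cache", None)
        return original_seq2text(tokenizer, normalized_seq)

    return unsloth_seq2text
```

Probing the signature once at patch time, rather than on every call, keeps the wrapper cheap while staying compatible with either calling convention.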
```python
except (ImportError, ModuleNotFoundError):
    # v0.10.2 - v0.14.x uses a flat module
    import vllm.multimodal.processing as mm_processor
pass
```

```python
return _extract_token_ids(value)
return None
return None
pass
```

```python
if _has_use_cache:
    return original_seq2text(tokenizer, normalized_seq, **kwargs)
return original_seq2text(tokenizer, normalized_seq)
pass
```

```python
mm_processor._seq2text = unsloth_seq2text
except:
pass
pass
```

```python
def patch_vllm_multimodal_seq2text():
    return
pass
```

```python
unsloth_seq2text.__unsloth_patched_seq2text__ = True
mm_processor._seq2text = unsloth_seq2text
except:
```
```python
if all(isinstance(x, (int, np.integer)) for x in payload):
    return [int(x) for x in payload]
# One level of nesting: list[list[int]] -> list[int]
if all(isinstance(x, (list, tuple)) for x in payload):
```
Collaborator


Is there any good reason to do that? Isn't it effectively merging tokens of different, unrelated sequences?
