
Fix vLLM vision GRPO compatibility for issue #4081 #520

Open

danielhanchen wants to merge 6 commits into main from fix/issue-4081-vllm-vision-grpo

Conversation

@danielhanchen
Contributor

Summary

This PR fixes two vLLM compatibility issues that broke the GRPO vision notebook runs referenced in unsloth issue unslothai/unsloth#4081.

  1. Add WorkerLoRAManager.supports_tower_connector_lora() fallback in unsloth_zoo/vllm_lora_worker_manager.py.
  2. Patch multimodal _seq2text decode handling in unsloth_zoo/vllm_utils.py to normalize nested token payloads (dict/list/tensor) before decode.

Why

GRPO vision runs were failing with:

  • missing supports_tower_connector_lora on the worker manager path
  • multimodal decode path receiving nested/non-flat token payloads

Both failure paths are vLLM integration edges that should be handled in the Zoo patch layer.
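For context, the worker-manager side of the fix follows the Zoo's usual runtime-patching pattern. The sketch below illustrates that pattern with a stand-in class rather than vLLM's real WorkerLoRAManager; only the method name supports_tower_connector_lora and the getattr-based fallback shape come from this PR, the rest is illustrative:

```python
class WorkerLoRAManager:
    """Stand-in for vLLM's worker LoRA manager (illustration only)."""
    _adapter_manager = None

def _supports_tower_connector_lora(self) -> bool:
    # Conservative fallback: report support only when an adapter manager
    # is attached and itself advertises the capability.
    adapter_manager = getattr(self, "_adapter_manager", None)
    return bool(
        adapter_manager is not None
        and getattr(adapter_manager, "supports_tower_connector_lora", False)
    )

# Patch only when the method is genuinely missing, so vLLM versions that
# already ship it keep their own implementation.
if not hasattr(WorkerLoRAManager, "supports_tower_connector_lora"):
    WorkerLoRAManager.supports_tower_connector_lora = _supports_tower_connector_lora
```

Guarding the assignment with hasattr keeps the patch a no-op on vLLM builds that already provide the method.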

Validation

Reproduced and reran issue notebooks with max_steps=5 where needed:

  • qwen2_5_7b_vl_grpo_postfix4: PASS (train_runtime=197.8613s)
  • qwen3_vl_8b_vision_grpo_postfix1: PASS (train_runtime=105.9599s)
  • openenv_gpt_oss_20b_postfix2: PASS (train_runtime=817.5183s)

Logs and rerun summary:

  • 4_prs_check/logs/issue4081/postfix_reruns_summary.txt
  • 4_prs_check/logs/issue4081/*.log

Notes

This PR is intentionally scoped to the vLLM compatibility layer and keeps behavior unchanged when payloads are already in standard token-id form.

@gemini-code-assist
Contributor

Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical compatibility issues within the vLLM integration layer, specifically targeting GRPO vision notebook runs. It introduces necessary adjustments to handle expected methods on the worker manager and normalizes complex token payloads during multimodal decoding, ensuring robust operation of vision models with vLLM.

Highlights

  • vLLM WorkerLoRAManager Compatibility: Added a supports_tower_connector_lora() method to WorkerLoRAManager to ensure compatibility with vLLM v1 vision code, providing a fallback for older adapter manager implementations.
  • Multimodal Decode Handling: Implemented a patch for vLLM's multimodal _seq2text function to normalize nested token payloads (dictionaries, lists, tensors) before decoding, resolving issues with non-flat token inputs.


Changelog
  • unsloth_zoo/vllm_lora_worker_manager.py
    • Added the supports_tower_connector_lora method to WorkerLoRAManager to align with vLLM v1 vision requirements.
  • unsloth_zoo/vllm_utils.py
    • Introduced patch_vllm_multimodal_seq2text to preprocess and flatten nested token payloads for vLLM's multimodal _seq2text function.
    • Integrated the new patch_vllm_multimodal_seq2text into the main patch_vllm function to ensure its application.
Activity
  • The pull request author, danielhanchen, created this PR to fix vLLM compatibility issues for GRPO vision notebook runs, referencing unsloth issue #4081.
  • Validation was performed by reproducing and rerunning issue notebooks (qwen2_5_7b_vl_grpo_postfix4, qwen3_vl_8b_vision_grpo_postfix1, openenv_gpt_oss_20b_postfix2) with max_steps=5, all of which passed.
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces two fixes to improve vLLM compatibility for vision models, specifically for GRPO runs. The first change adds a fallback implementation for supports_tower_connector_lora in the WorkerLoRAManager to support older vLLM versions. The second change patches vLLM's _seq2text function to normalize various nested token payload formats before decoding. The changes are logical and well-targeted. I've provided a couple of suggestions in unsloth_zoo/vllm_utils.py to enhance the robustness and maintainability of the patching logic.

Comment on lines +387 to +388
```python
if isinstance(payload, str):
    return payload
```
Contributor


Severity: medium

The function _extract_token_ids has an inconsistent return type, as it can return a list of integers, None, or a string. This mixed responsibility makes the logic complex and potentially buggy, especially for list processing. For example, _extract_token_ids(['hello']) would return None because the recursive call for 'hello' returns a string, which is not a list.

To make the function's responsibility clearer (to only extract token IDs), it's better for it to only return a list of integers or None. String payloads can be handled by the wrapper function unsloth_seq2text.

```python
pass

mm_processor._seq2text = unsloth_seq2text
except:
```
Contributor


Severity: medium

Using a bare except: can hide unexpected errors and makes debugging difficult. It's better to catch a specific exception, like ImportError if you're only concerned about the module not being found, or at least Exception to avoid catching system-exiting exceptions like KeyboardInterrupt.

Suggested change:

```diff
-except:
+except Exception:
```


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7375b071e9


```python
adapter_manager = getattr(self, "_adapter_manager", None)
return bool(
    adapter_manager is not None and
    getattr(adapter_manager, "supports_tower_connector_lora", False)
)
```


P1: Call tower-connector capability hook instead of casting it

This returns bool(getattr(adapter_manager, "supports_tower_connector_lora", False)), which evaluates to True whenever the attribute is a bound method, regardless of what that method would return. In environments where the adapter manager implements supports_tower_connector_lora() but returns False, the worker will incorrectly advertise support and can route into unsupported vision-LoRA paths at runtime.
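A fix along these lines would invoke the attribute when it is callable rather than truth-testing it. A sketch (enclosing class omitted; _adapter_manager as in the PR's snippet):

```python
def supports_tower_connector_lora(self) -> bool:
    adapter_manager = getattr(self, "_adapter_manager", None)
    if adapter_manager is None:
        return False
    capability = getattr(adapter_manager, "supports_tower_connector_lora", False)
    # A bound method is always truthy; call it to get the real answer.
    if callable(capability):
        return bool(capability())
    return bool(capability)
```

The callable check keeps plain boolean attributes working while fixing the bound-method case Codex describes.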


```python
else:
    return set(self._adapter_manager.list_adapters())

def supports_tower_connector_lora(self) -> bool:
```
Collaborator


I see where this comes from. Is this a necessity, or just nice to have?
From the looks of it, this is primarily aimed at LoRA on mm_proj-style modules, and it is experimental according to their docs.

```python
if getattr(original_seq2text, "__unsloth_patched_seq2text__", False):
    return

def _extract_token_ids(payload, depth = 0):
```
Collaborator


Looks like we're heavily overcomplicating this. We should ideally check for the 3-4 known shapes: list[int], list[list[int]], and tensor. All the depth-based recursion seems overly complicated.

Contributor Author

@danielhanchen left a comment


Review Summary

The two patches are necessary -- supports_tower_connector_lora prevents AttributeError in vLLM v1 vision code, and the _seq2text normalizer handles non-standard payload types that upstream _seq2text (which only accepts str | list[int]) cannot process.

Validated _extract_token_ids with unit tests covering: list[int], torch.Tensor, np.ndarray, dict extraction, nested list flattening, string passthrough, single int, tuple, empty list, np.integer scalars. All pass.

Issues

1. supports_mm default should be False not True (vllm_lora_worker_manager.py:281)
getattr(adapter_manager, "supports_mm", True) assumes multimodal support when the attribute is absent. Upstream vLLM sets supports_mm as an explicit instance attribute -- if it's missing, the adapter manager likely doesn't support multimodal. Default should be False to be conservative.

2. _extract_token_ids over-engineered (vllm_utils.py:384-431)
Agree with @Datta0's comment. The depth-8 recursion and 8 preferred dict keys are complex for what should handle 3-4 known types: torch.Tensor, np.ndarray, list[list[int]], and dict with token_ids/input_ids. A simpler version would be more maintainable and less likely to mask bugs by silently normalizing unexpected inputs.

3. Inconsistent return type from _extract_token_ids
The function returns list[int], None, or str. The string path means _extract_token_ids(["a", "b"]) returns None, because the recursive calls return strings, which fail the isinstance(found, list) check. Not a crash (the wrapper falls back to the original seq), but multi-element string lists silently bypass normalization.

4. Bare except: on line 444 -- should be except Exception: to avoid catching KeyboardInterrupt/SystemExit. Consistent with codebase convention but easy to fix.

5. Issue #4081 reference -- Issue #4081 is about a tensor shape mismatch (828 vs 824) in compute_loss. The errors this PR fixes (missing supports_tower_connector_lora + nested payloads in _seq2text) are different. The PR description should clarify the relationship.

- Change supports_mm getattr default from True to False: when the
  adapter manager lacks the attribute, conservatively assume no
  multimodal support instead of assuming it.

- Simplify _extract_token_ids to handle only known payload types:
  torch.Tensor, np.ndarray, int/np.integer, flat list[int],
  one-level nested list[list[int]], tuple equivalents, and dict
  with token_ids/input_ids/prompt_token_ids keys. Removes depth-8
  recursion, 8 preferred dict keys, and exhaustive dict value scan.
- Add fallback import: try vllm.multimodal.processing.processor
  (v0.15.0+), fall back to vllm.multimodal.processing (v0.10.2-v0.14.x).
  The package path only exists from v0.15.0 onwards.

- Detect use_cache parameter via inspect.signature before calling
  original _seq2text. The use_cache kwarg was added in v0.12.0;
  passing it on v0.10.2-v0.11.2 would raise TypeError.

- Use **kwargs forwarding so the wrapper stays compatible with any
  future signature changes.
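The use_cache detection described in the last two bullets can be sketched as follows; make_patched_seq2text is an illustrative helper name, and the wrapped functions stand in for vLLM's _seq2text:

```python
import inspect

def make_patched_seq2text(original_seq2text):
    # Probe once: does the wrapped function accept `use_cache`?
    # (Per the notes above, the kwarg exists only from vLLM v0.12.0.)
    params = inspect.signature(original_seq2text).parameters
    _has_use_cache = "use_cache" in params

    def unsloth_seq2text(tokenizer, seq, **kwargs):
        normalized_seq = seq  # payload normalization elided in this sketch
        if _has_use_cache:
            return original_seq2text(tokenizer, normalized_seq, **kwargs)
        # Older vLLM: forwarding use_cache would raise TypeError.
        kwargs.pop("use_cache", None)
        return original_seq2text(tokenizer, normalized_seq)

    return unsloth_seq2text
```

Probing the signature once at patch time, rather than on every call, keeps the wrapper cheap while staying compatible with either calling convention.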
```python
except (ImportError, ModuleNotFoundError):
    # v0.10.2 - v0.14.x uses a flat module
    import vllm.multimodal.processing as mm_processor
pass
```

```python
return _extract_token_ids(value)
return None
return None
pass
```

```python
if _has_use_cache:
    return original_seq2text(tokenizer, normalized_seq, **kwargs)
return original_seq2text(tokenizer, normalized_seq)
pass
```

```python
mm_processor._seq2text = unsloth_seq2text
except:
pass
pass
```

```python
def patch_vllm_multimodal_seq2text():
    return
pass
```

```python
unsloth_seq2text.__unsloth_patched_seq2text__ = True
mm_processor._seq2text = unsloth_seq2text
except:
```
```python
if all(isinstance(x, (int, np.integer)) for x in payload):
    return [int(x) for x in payload]
# One level of nesting: list[list[int]] -> list[int]
if all(isinstance(x, (list, tuple)) for x in payload):
```
Collaborator


Is there any good reason to do that? Isn't it effectively merging tokens of different, unrelated sequences?
