[studio] Fix VLM detection for transformers v5 #4868

Closed
Datta0 wants to merge 1 commit into unslothai:main from Datta0:studio_vlm_fixes

Conversation

Datta0 (Collaborator) commented Apr 6, 2026

Fixes: #4859

Datta0 (Collaborator, Author) commented Apr 6, 2026

#4859

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the VLM (Vision Language Model) detection logic by introducing a centralized _is_vlm_config helper and a fallback mechanism that fetches raw config.json metadata from the Hugging Face Hub or local paths when standard loading fails. It also updates the subprocess-based vision check to return None on failure, allowing the detection logic to proceed to the metadata fallback. Review feedback suggests expanding the list of excluded non-VLM model types (such as T5 and BART) to prevent false positives and renaming a test case to clarify that it refers to Transformers version 5 rather than the T5 model architecture.
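
For orientation, a minimal sketch of what a unified helper along these lines could look like. The name _is_vlm_config, the vision-signal keys (vision_config, img_processor, image_token_index, image_token_id), and _VLM_ARCH_SUFFIXES appear in the PR; the body below is illustrative rather than the actual diff, and _VISION_SIGNAL_KEYS is a hypothetical name:

_VLM_ARCH_SUFFIXES = ("ForConditionalGeneration", "ForVisionText2Text")
_VISION_SIGNAL_KEYS = ("vision_config", "img_processor", "image_token_index", "image_token_id")

def _is_vlm_config(config) -> bool:
    # Accept both AutoConfig objects and raw config.json dicts.
    as_dict = config if isinstance(config, dict) else config.to_dict()
    # Explicit vision signals are the strongest evidence.
    if any(as_dict.get(key) is not None for key in _VISION_SIGNAL_KEYS):
        return True
    # Otherwise fall back to an architecture-suffix heuristic.
    architectures = as_dict.get("architectures") or []
    return any(
        isinstance(arch, str) and arch.endswith(_VLM_ARCH_SUFFIXES)
        for arch in architectures
    )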

"cogvlm2",
"minicpmv",
}
_AUDIO_ONLY_MODEL_TYPES = {"csm", "whisper"}

Severity: medium

The list of excluded model types should be expanded. The ForConditionalGeneration architecture suffix is used by many non-vision Seq2Seq models (such as T5, BART, Marian, etc.), which leads to false positives in VLM detection. Renaming this to a more general _NON_VLM_MODEL_TYPES and including common Seq2Seq families is recommended to improve detection accuracy.

Suggested change
_AUDIO_ONLY_MODEL_TYPES = {"csm", "whisper"}
_NON_VLM_MODEL_TYPES = {"csm", "whisper", "t5", "bart", "marian", "pegasus", "blenderbot", "m2m_100"}

Comment on lines +628 to 629
if model_type in _AUDIO_ONLY_MODEL_TYPES:
return False

Severity: medium

Update the exclusion check to use the renamed and expanded list of non-VLM model types to prevent misidentifying standard Seq2Seq models as vision models.

Suggested change
if model_type in _AUDIO_ONLY_MODEL_TYPES:
return False
if model_type in _NON_VLM_MODEL_TYPES:
return False

}
assert model_config._is_vlm_config(config) is False

def test_is_vision_model_falls_back_to_raw_metadata_for_t5_models(self):

Severity: medium

The test name is ambiguous because 't5' typically refers to the T5 model family, whereas here it refers to models requiring Transformers v5. Renaming it to test_is_vision_model_falls_back_to_raw_metadata_for_v5_models would clarify that the test is about version-based fallbacks rather than the T5 architecture.

Suggested change
def test_is_vision_model_falls_back_to_raw_metadata_for_t5_models(self):
def test_is_vision_model_falls_back_to_raw_metadata_for_v5_models(self):
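
For illustration, one possible shape for the renamed test, assuming _is_vlm_config accepts raw config.json dicts as described above; the model_type, architecture name, and config values below are hypothetical:

def test_is_vision_model_falls_back_to_raw_metadata_for_v5_models(self):
    # Hypothetical example: a raw config.json dict for a model that
    # needs transformers v5, carrying an explicit vision signal.
    raw_config = {
        "model_type": "qwen3_5_vl",  # illustrative value
        "architectures": ["Qwen3_5VLForConditionalGeneration"],
        "vision_config": {"hidden_size": 1280},
    }
    assert model_config._is_vlm_config(raw_config) is True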

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbfa817091


Comment on lines +631 to +633
if architectures:
if any(x.endswith(_VLM_ARCH_SUFFIXES) for x in architectures):
return True

P2: Restrict conditional-generation suffix before marking VLM

The new _is_vlm_config helper treats any architecture ending with ForConditionalGeneration as vision, which is too broad for the new raw-metadata fallback path in is_vision_model. When load_model_config fails (common with custom/remote-code repos), text-only seq2seq configs can now be classified as vision even if they have no vision_config/img_processor/image_token_index, which can route normal models into vision-specific training/inference flows. This regression is introduced by applying the suffix heuristic to raw config.json dictionaries without an additional vision signal.
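
One way to tighten the heuristic along these lines would be to require an explicit vision signal alongside the broad suffix. A sketch, with _suffix_implies_vlm as a hypothetical helper name:

def _suffix_implies_vlm(as_dict: dict) -> bool:
    # Only trust the broad ForConditionalGeneration suffix when the raw
    # config.json also carries an explicit vision key, so that text-only
    # seq2seq models (t5, bart, ...) are not misclassified.
    architectures = as_dict.get("architectures") or []
    has_suffix = any(
        isinstance(a, str) and a.endswith("ForConditionalGeneration")
        for a in architectures
    )
    has_vision_signal = any(
        as_dict.get(k) is not None
        for k in ("vision_config", "img_processor", "image_token_index", "image_token_id")
    )
    return has_suffix and has_vision_signal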


chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e26a88e121


Comment on lines +682 to +683
if architectures and any(x.endswith("ForVisionText2Text") for x in architectures):
return True

P2: Validate architecture entries before suffix matching

In the new raw-config.json fallback path, _is_vlm_config assumes every architectures element is a string and directly calls x.endswith(...). If a custom/partial config contains a non-string entry (for example null), this raises AttributeError and bubbles out of is_vision_model because the transformers-5 fallback branch does not wrap _is_vlm_config in a try/except. That turns a recoverable detection miss into a hard failure (e.g., /models/config can return 500) instead of returning False/None.
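
A defensive variant of the quoted check, wrapped here in a hypothetical helper for self-containedness, might look like this:

def _architectures_imply_vision(raw_config: dict) -> bool:
    # Skip non-string entries (e.g. null in a hand-edited config.json)
    # instead of letting x.endswith(...) raise AttributeError.
    architectures = raw_config.get("architectures") or []
    return any(
        isinstance(x, str) and x.endswith("ForVisionText2Text")
        for x in architectures
    )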


Fix VLM detection for models requiring transformers v5 (Qwen3.5, Gemma4)
in Unsloth Studio. These models were incorrectly classified as text-only
because the subprocess check failed and no fallback existed.

Changes:
- Add _is_vlm_config() helper for unified VLM detection across config
  types (AutoConfig objects and raw JSON dicts)
- Add _load_model_config_metadata() as raw config.json fallback when
  both in-process AutoConfig and subprocess detection fail
- For needs_transformers_5 models: try subprocess first, fall back to
  raw config.json metadata on transient failure
- Replace ForConditionalGeneration architecture suffix heuristic with
  explicit vision signals (vision_config, img_processor, image_token_index,
  image_token_id) to eliminate seq2seq false positives
- Add comprehensive _VLM_MODEL_TYPES safety net for known VLM model types
- Add _classify_detection_error() for permanent vs transient error
  classification (EntryNotFoundError, RepositoryNotFoundError, etc.)
- Update _VISION_CHECK_SCRIPT subprocess to match new detection logic
- Preserve vision detection cache and error classification from unslothai#4853
- Add tests for _is_vlm_config and raw metadata fallback path

Fixes: unslothai#4859
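
To make the commit's raw-metadata fallback concrete, here is a rough sketch of what _load_model_config_metadata() could do. The function name comes from the commit message above; the body is a sketch under the assumption that the file can be fetched with huggingface_hub's hf_hub_download, and the real implementation may differ:

import json
import os

from huggingface_hub import hf_hub_download

def _load_model_config_metadata(model_name: str) -> dict | None:
    # Raw config.json fallback: used when both in-process AutoConfig
    # loading and the subprocess check fail (e.g. the model requires
    # transformers v5 but the running environment ships v4).
    try:
        local_path = os.path.join(model_name, "config.json")
        if os.path.isfile(local_path):
            config_path = local_path
        else:
            config_path = hf_hub_download(model_name, "config.json")
        with open(config_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except Exception:
        # A failed fetch is a detection miss, not a hard error.
        return None
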
rolandtannous (Collaborator) commented Apr 7, 2026

@Datta0 this conflicts with #4878, which already solves the Gemma4 and Qwen3.5 issues in studio and which I just merged. It also segregates the handling into two separate transformers v5 versions (5.3.0 and 5.5.0). If there are no additional bits in this PR beyond fixing training and inference for these two model families, then maybe we should close this one.

Datta0 (Collaborator, Author) commented Apr 8, 2026

Closing this, as the above-mentioned PR seems to handle 5.3 vs 5.5 as well.

Datta0 closed this Apr 8, 2026


Development

Successfully merging this pull request may close these issues.

[Bug] Can't train Qwen3.5 or Gemma4 on multimodal datasets in Unsloth Studio
