fix(cache): pass textOnly to getSessionsConfig so is_pipeline_cached skips vision encoder by s-zx · Pull Request #1608 · huggingface/transformers.js

s-zx · 2026-03-26T17:01:38Z

Summary

is_pipeline_cached() returns false after successfully loading a text-generation pipeline for models like onnx-community/gemma-3-4b-it-ONNX because it checks for vision_encoder ONNX files that were never downloaded.

Root Cause

getSessionsConfig() forwarded only 2 arguments to the sessions factory, so the textOnly parameter was always undefined. The ImageTextToText / ImageAudioTextToText session factories include vision_encoder / audio_encoder when textOnly is falsy — but from_pretrained() correctly detects cross-architecture loading and sets textOnly = true to skip those files.

| Path | `textOnly` computed? | `vision_encoder` included for text-gen? |
| --- | --- | --- |
| `from_pretrained` (actual load) | Yes | No |
| `get_model_files` (cache check) | No (was always `undefined`) | Yes |
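
The mismatch between the two paths can be reproduced in isolation. The following is a minimal sketch, not the library's code: `imageTextToTextSessions` and `getSessionsConfigBeforeFix` are hypothetical stand-ins, and the session names other than `vision_encoder` are assumptions chosen for illustration.

```javascript
// Hypothetical, simplified session factory. The real factories live in
// modeling_utils.js and differ in detail; only the "include the encoder when
// textOnly is falsy" rule is taken from the PR description.
function imageTextToTextSessions(config, options, textOnly) {
  const sessions = ["embed_tokens", "decoder_model_merged"];
  // The encoder is included whenever textOnly is falsy, including undefined.
  if (!textOnly) sessions.push("vision_encoder");
  return sessions;
}

// Pre-fix shape of the wrapper (assumed): it has no textOnly parameter, so the
// factory always receives undefined as its third argument.
function getSessionsConfigBeforeFix(factory, config, options = {}) {
  return factory(config, options);
}

const config = { architectures: ["Gemma3ForConditionalGeneration"] };

// Load path: textOnly = true reaches the factory, so no encoder is requested.
const loaded = imageTextToTextSessions(config, {}, true);

// Cache-check path: goes through the wrapper, so the encoder is still expected.
const expected = getSessionsConfigBeforeFix(imageTextToTextSessions, config);

// Sessions the cache check looks for that were never downloaded.
const missing = expected.filter((name) => !loaded.includes(name));
console.log(missing); // [ 'vision_encoder' ]
```

Because the cache check expects a file the load never fetched, it reports the pipeline as not cached.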

Fix

- Add textOnly parameter to getSessionsConfig() and forward it to the sessions factory
- In get_model_files(), detect cross-architecture loading (same logic as resolveTypeConfig) and pass textOnly = true when appropriate
- Export resolveTypeConfig for reuse
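
The steps above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the code in this PR: the `SESSION_FACTORIES` table, the `expectedSessions` function, and the guard clauses inside `isTextOnlyConfig` are hypothetical; only the final `endsWith('ForConditionalGeneration')` check and the four-parameter `getSessionsConfig` signature appear in the diff.

```javascript
// Hypothetical factory table keyed by a model-type name (the library uses its
// own model-type constants; a string key is used here for simplicity).
const SESSION_FACTORIES = {
  ImageTextToText: (config, options, textOnly) => {
    const sessions = ["embed_tokens", "decoder_model_merged"];
    if (!textOnly) sessions.push("vision_encoder");
    return sessions;
  },
};

// Assumed detection rule: a ForCausalLM class loading a checkpoint whose
// native architecture is a ForConditionalGeneration model.
function isTextOnlyConfig(modelName, config) {
  const nativeArch = config?.architectures?.[0];
  if (typeof nativeArch !== "string") return false;
  if (!modelName.endsWith("ForCausalLM")) return false;
  return nativeArch.endsWith("ForConditionalGeneration");
}

// Post-fix wrapper: textOnly is accepted as a fourth parameter and forwarded.
function getSessionsConfig(modelType, config, options = {}, textOnly = false) {
  return SESSION_FACTORIES[modelType](config, options, textOnly);
}

// Cache-check side (stand-in for get_model_files): compute textOnly the same
// way the load path does, then ask for the matching session list.
function expectedSessions(modelName, modelType, config) {
  const textOnly = isTextOnlyConfig(modelName, config);
  return getSessionsConfig(modelType, config, {}, textOnly);
}

const gemmaConfig = { architectures: ["Gemma3ForConditionalGeneration"] };
const textGen = expectedSessions("Gemma3ForCausalLM", "ImageTextToText", gemmaConfig);
const multimodal = expectedSessions("Gemma3ForConditionalGeneration", "ImageTextToText", gemmaConfig);
console.log(textGen.includes("vision_encoder"), multimodal.includes("vision_encoder"));
```

With the flag computed on both paths, the cache check and the actual load agree on which files a text-generation pipeline needs.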

Test plan

- is_pipeline_cached("text-generation", "onnx-community/gemma-3-4b-it-ONNX", { device: "webgpu", dtype: "q4f16" }) now returns true after the pipeline has been loaded
- Models without cross-architecture loading (e.g. standard encoder-only) are unaffected

Closes #1606

…skips vision encoder

is_pipeline_cached incorrectly returns false for text-generation pipelines on models like gemma-3-4b-it-ONNX because get_model_files checks for vision_encoder files that text-generation never downloads.

Root cause: getSessionsConfig only forwarded two arguments to the sessions factory, so the textOnly parameter was always undefined. The ImageTextToText and ImageAudioTextToText factories include vision_encoder/audio_encoder when textOnly is falsy.

- Add textOnly parameter to getSessionsConfig and forward it to the sessions factory
- Export resolveTypeConfig for reuse
- Compute textOnly in get_model_files using the same cross-architecture detection as from_pretrained (ForCausalLM loading a ForConditionalGeneration model)

Closes huggingface#1606

xenova · 2026-03-26T17:18:53Z

Thanks for the PR 👍

Cross-architecture loading detection is now duplicated in two places. Can you make sure that the model registry logic uses the same helper functions as defined in modeling utils?

s-zx · 2026-03-27T20:19:14Z

Good point! I've extracted the cross-architecture detection into a shared isTextOnlyConfig() helper in modeling_utils.js and updated get_model_files.js to use it instead of duplicating the logic. Pushed in f2c8b0a.

xenova · 2026-03-28T00:44:59Z

packages/transformers/src/models/modeling_utils.js

+    return nativeArch.endsWith('ForConditionalGeneration');
+}
+
+export function getSessionsConfig(modelType, config, options = {}, textOnly = false) {


you also messed up the location of the isTextOnlyConfig function (it is in between getSessionsConfig and its jsdoc)

this also duplicates logic between this and resolveTypeConfig

xenova · 2026-03-28T00:46:16Z

packages/transformers/src/utils/model_registry/get_model_files.js

+    // Use the shared helper to detect cross-architecture loading (e.g.
+    // ForCausalLM loading a ForConditionalGeneration model). In text-only
+    // mode the sessions factory skips vision/audio encoder files.


no need for these comments :)

HuggingFaceDocBuilderDev · 2026-03-28T15:20:16Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova · 2026-03-29T17:40:03Z

packages/transformers/src/models/modeling_utils.js

 * @returns {{ typeConfig: Object, textOnly: boolean, modelType: number|undefined }}
 */
-function resolveTypeConfig(modelName, config) {
+export function resolveTypeConfig(modelName, config) {


Not used elsewhere? It was probably meant to be exported and used above.

xenova · 2026-03-29T19:51:15Z

Inspired by this PR, I opened #1614, which is a more robust fix for this problem. I added you as a co-author for the inspiration, so I'll close this PR.

…ration pipeline (#1614)

* Add unit test for text-generation on multimodal model
* add more multimodal text-generation unit tests
* Exclude certain sessions when loading multimodal models with text-generation task inspired by #1608

  Co-Authored-By: zxshen <zshen339@gatech.edu>
* simplify multimodal text-generation pipeline logic
* invert logic to keep non-model files
* cleanup

---------

Co-authored-by: zxshen <zshen339@gatech.edu>

refactor: extract isTextOnlyConfig helper to avoid duplicated detection

f2c8b0a

Per review feedback, move the cross-architecture detection into a shared isTextOnlyConfig() helper in modeling_utils.js instead of duplicating the logic in get_model_files.js.

xenova added a commit that referenced this pull request Mar 29, 2026

Exclude certain sessions when loading multimodal models with text-gen…

3acec8b

…eration task inspired by #1608 Co-Authored-By: zxshen <zshen339@gatech.edu>

xenova mentioned this pull request Mar 29, 2026

Fix ModelRegistry calls when loading multimodal models with text-generation pipeline #1614

Merged

xenova closed this Mar 29, 2026

s-zx wants to merge 2 commits into huggingface:main from s-zx:fix/1606-is-pipeline-cached-task-filter
