Skip to content

fix(cache): pass textOnly to getSessionsConfig so is_pipeline_cached skips vision encoder#1608

Closed
s-zx wants to merge 2 commits intohuggingface:mainfrom
s-zx:fix/1606-is-pipeline-cached-task-filter
Closed

fix(cache): pass textOnly to getSessionsConfig so is_pipeline_cached skips vision encoder#1608
s-zx wants to merge 2 commits intohuggingface:mainfrom
s-zx:fix/1606-is-pipeline-cached-task-filter

Conversation

@s-zx
Copy link
Copy Markdown
Contributor

@s-zx s-zx commented Mar 26, 2026

Summary

is_pipeline_cached() returns false after successfully loading a text-generation pipeline for models like onnx-community/gemma-3-4b-it-ONNX because it checks for vision_encoder ONNX files that were never downloaded.

Root Cause

getSessionsConfig() forwarded only 2 arguments to the sessions factory, so the textOnly parameter was always undefined. The ImageTextToText / ImageAudioTextToText session factories include vision_encoder / audio_encoder when textOnly is falsy — but from_pretrained() correctly detects cross-architecture loading and sets textOnly = true to skip those files.

Path textOnly computed? vision_encoder included for text-gen?
from_pretrained (actual load) YES NO
get_model_files (cache check) NO (was always undefined) YES

Fix

  1. Add textOnly parameter to getSessionsConfig() and forward it to the sessions factory
  2. In get_model_files(), detect cross-architecture loading (same logic as resolveTypeConfig) and pass textOnly = true when appropriate
  3. Export resolveTypeConfig for reuse

Test plan

  • is_pipeline_cached("text-generation", "onnx-community/gemma-3-4b-it-ONNX", { device: "webgpu", dtype: "q4f16" }) now returns true after the pipeline has been loaded
  • Models without cross-architecture loading (e.g. standard encoder-only) are unaffected

Closes #1606

…skips vision encoder

is_pipeline_cached incorrectly returns false for text-generation
pipelines on models like gemma-3-4b-it-ONNX because get_model_files
checks for vision_encoder files that text-generation never downloads.

Root cause: getSessionsConfig only forwarded two arguments to the
sessions factory, so the textOnly parameter was always undefined.
The ImageTextToText and ImageAudioTextToText factories include
vision_encoder/audio_encoder when textOnly is falsy.

- Add textOnly parameter to getSessionsConfig and forward it to the
  sessions factory
- Export resolveTypeConfig for reuse
- Compute textOnly in get_model_files using the same cross-architecture
  detection as from_pretrained (ForCausalLM loading a
  ForConditionalGeneration model)

Closes huggingface#1606
@xenova
Copy link
Copy Markdown
Collaborator

xenova commented Mar 26, 2026

Thanks for the PR 👍

cross-architecture loading detection is now duplicated in two placed. can you make sure that the model registry logic uses the same helper functions as defined in modeling utils?

Per review feedback, move the cross-architecture detection into a
shared isTextOnlyConfig() helper in modeling_utils.js instead of
duplicating the logic in get_model_files.js.
@s-zx
Copy link
Copy Markdown
Contributor Author

s-zx commented Mar 27, 2026

Good point! I've extracted the cross-architecture detection into a shared isTextOnlyConfig() helper in modeling_utils.js and updated get_model_files.js to use it instead of duplicating the logic. Pushed in f2c8b0a.

return nativeArch.endsWith('ForConditionalGeneration');
}

export function getSessionsConfig(modelType, config, options = {}, textOnly = false) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you also messed up the location of the isTextOnlyConfig function (it is in between getSessionsConfig and its jsdoc)

Copy link
Copy Markdown
Collaborator

@xenova xenova Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also duplicates logic between this and resolveTypeConfig

Comment on lines +98 to +100
// Use the shared helper to detect cross-architecture loading (e.g.
// ForCausalLM loading a ForConditionalGeneration model). In text-only
// mode the sessions factory skips vision/audio encoder files.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for these comments :)

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* @returns {{ typeConfig: Object, textOnly: boolean, modelType: number|undefined }}
*/
function resolveTypeConfig(modelName, config) {
export function resolveTypeConfig(modelName, config) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used elsewhere? Probably meant to be exported and usage above.

xenova added a commit that referenced this pull request Mar 29, 2026
…eration task

inspired by #1608

Co-Authored-By: zxshen <zshen339@gatech.edu>
@xenova
Copy link
Copy Markdown
Collaborator

xenova commented Mar 29, 2026

inspired by this PR, I opened #1614, which is a more robust fix of this problem. I added you as a co-author for the inspiration, so I'll close this PR

@xenova xenova closed this Mar 29, 2026
xenova added a commit that referenced this pull request Mar 29, 2026
…ration pipeline (#1614)

* Add unit test for text-generation on multimodal model

* add more multimodal text-generation unit tests

* Exclude certain sessions when loading multimodal models with text-generation task

inspired by #1608

Co-Authored-By: zxshen <zshen339@gatech.edu>

* simplify multimodal text-generation pipeline logic

* invert logic to keep non-model files

* cleanup

---------

Co-authored-by: zxshen <zshen339@gatech.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

is_pipeline_cached

4 participants