fix: Idefics3 encoder cache panic when do_image_splitting is enabled#2074
Open
romnn wants to merge 1 commit intoEricLBuehler:masterfrom
Open
fix: Idefics3 encoder cache panic when do_image_splitting is enabled#2074romnn wants to merge 1 commit intoEricLBuehler:masterfrom
romnn wants to merge 1 commit intoEricLBuehler:masterfrom
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The per-image encoder cache in
Idefics3Model::forward_innerpanics withunwrap()onNonewhendo_image_splittingis enabled in the preprocessor config (the default for granite-docling-258M and other Idefics3 models).Root Cause
When
do_image_splitting=true, theIdefics3ImageProcessorsplits each user image into N sub-images (grid tiles + one global resize). For example, a 1239x1753 document page withlongest_edge=512produces 13 sub-images (4x3 grid + 1 global).However,
image_hashesis computed per original user image (one hash each), not per sub-image. This creates a mismatch:pixel_values.dim(0) = 13(sub-images after splitting)image_hashes.len() = 1(one hash for the original image)The encoder cache code allocates
vec![None; 13]but only fills index 0 (from the single hash), leaving indices 1-12 asNone. The subsequentper_image.into_iter().map(|t| t.unwrap())panics on the firstNone.Fix
Skip per-image encoder caching when
image_hashes.len() != pixel_values.dim(0), falling back to the existing batch processing path which handles any number of images correctly. Per-image caching is semantically invalid in this case anyway — different sub-images from the same original would all map to the same hash key.Reproduction
Any Idefics3 model with
do_image_splitting: trueinpreprocessor_config.json(the default) will panic when given an image larger thanmax_image_size.longest_edge(typically 512px). This includesibm-granite/granite-docling-258Mwith any document page image.