
fix: Idefics3 encoder cache panic when do_image_splitting is enabled #2074

Open
romnn wants to merge 1 commit into EricLBuehler:master from romnn:fix/idefics3-encoder-cache-image-splitting


romnn commented Apr 8, 2026

Summary

The per-image encoder cache in Idefics3Model::forward_inner panics by calling unwrap() on None when do_image_splitting is enabled in the preprocessor config (the default for granite-docling-258M and other Idefics3 models).

Root Cause

When do_image_splitting=true, the Idefics3ImageProcessor splits each user image into N sub-images: a grid of tiles plus one globally resized copy. For example, a 1239x1753 document page with longest_edge=512 produces 13 sub-images (a 4x3 grid of tiles + 1 global image).
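The tile count follows from ceiling division over the image dimensions. The sketch below is a simplification, assuming square tiles of longest_edge pixels cut directly from the original dimensions; the actual Idefics3ImageProcessor resizes the image before tiling, so this approximation may not match it for every size. sub_image_count is a hypothetical helper, not part of the codebase.

```rust
/// Hypothetical helper: approximate sub-image count for Idefics3-style
/// splitting. Assumes square tiles of `tile` pixels taken from the original
/// dimensions (the real processor resizes the image first).
fn sub_image_count(width: u32, height: u32, tile: u32) -> u32 {
    // Images that already fit in a single tile are not split.
    if width <= tile && height <= tile {
        return 1;
    }
    let cols = width.div_ceil(tile);
    let rows = height.div_ceil(tile);
    // rows x cols grid tiles plus one globally resized image.
    rows * cols + 1
}

fn main() {
    // 1239x1753 page with 512px tiles: 4 rows x 3 cols + 1 global.
    println!("{}", sub_image_count(1239, 1753, 512)); // 13
}
```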

However, image_hashes is computed per original user image (one hash each), not per sub-image. This creates a mismatch:

  • pixel_values.dim(0) = 13 (sub-images after splitting)
  • image_hashes.len() = 1 (one hash for the original image)

The encoder cache code allocates vec![None; 13] but fills only index 0 (from the single hash), leaving indices 1-12 as None. The subsequent per_image.into_iter().map(|t| t.unwrap()) then panics on the first None, at index 1.

pixel_values: [13, C, H, W]   ← 13 sub-images after splitting
image_hashes: [hash_0]        ← 1 hash for the original image
per_image:    [Some, None, None, None, None, None, None, None, None, None, None, None, None]
                     ↑ unwrap() panics here
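The mismatch can be reproduced outside the model in a few lines of plain Rust. This is a hypothetical, simplified model of the pattern described above (fill_slots and the u64 stand-in for an embedding are illustrative, not the actual forward_inner code). It also shows that collecting into Option<Vec<_>> would report the gap as None rather than panicking.

```rust
/// Hypothetical model of the failing pattern: one slot per sub-image, but
/// only as many writes as there are hashes (one per original image).
fn fill_slots(num_sub_images: usize, hashes: &[u64]) -> Vec<Option<u64>> {
    let mut per_image: Vec<Option<u64>> = vec![None; num_sub_images];
    for (i, h) in hashes.iter().enumerate() {
        // Stand-in for "look up or compute the embedding for this hash".
        per_image[i] = Some(*h);
    }
    per_image
}

fn main() {
    // 13 sub-images after splitting, 1 hash for the original image.
    let per_image = fill_slots(13, &[0xABCD]);
    let missing = per_image.iter().filter(|t| t.is_none()).count();
    println!("missing slots: {missing}"); // missing slots: 12

    // per_image.into_iter().map(|t| t.unwrap()) would panic at index 1.
    // Collecting into Option<Vec<_>> reports the gap as None instead.
    let all: Option<Vec<u64>> = per_image.into_iter().collect();
    println!("complete: {}", all.is_some()); // complete: false
}
```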

Fix

Skip per-image encoder caching when image_hashes.len() != pixel_values.dim(0), falling back to the existing batch processing path, which handles any number of images correctly. Per-image caching is semantically invalid in this case anyway: different sub-images from the same original image would all map to the same hash key.
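A minimal sketch of the guard, assuming a toy in-memory cache keyed by hash. encode_with_cache, the Vec<f32> stand-ins for images and embeddings, and the closure encoder are all hypothetical; the real change lives in Idefics3Model::forward_inner and operates on tensors.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of the guard. `images` stands in for the rows of
/// pixel_values and `encode` for the vision encoder (one output per input).
fn encode_with_cache<F>(
    images: &[Vec<f32>],
    hashes: &[u64],
    cache: &mut HashMap<u64, Vec<f32>>,
    encode: F,
) -> Vec<Vec<f32>>
where
    F: Fn(&[Vec<f32>]) -> Vec<Vec<f32>>,
{
    // Hash count differs from sub-image count: hashes cannot key individual
    // sub-images, so skip the per-image cache and encode the whole batch.
    if hashes.len() != images.len() {
        return encode(images);
    }
    // One hash per image: reuse cached embeddings, encode only misses.
    let mut out = Vec::with_capacity(images.len());
    for (img, h) in images.iter().zip(hashes) {
        let emb = cache
            .entry(*h)
            .or_insert_with(|| encode(std::slice::from_ref(img)).remove(0))
            .clone();
        out.push(emb);
    }
    out
}

fn main() {
    // Toy "encoder": one embedding per input, here just the pixel sum.
    let encode = |batch: &[Vec<f32>]| -> Vec<Vec<f32>> {
        batch.iter().map(|p| vec![p.iter().sum::<f32>()]).collect()
    };
    let mut cache = HashMap::new();

    // 13 sub-images, 1 hash: batch fallback, no panic, nothing cached.
    let tiles: Vec<Vec<f32>> = (0..13).map(|i| vec![i as f32]).collect();
    let out = encode_with_cache(&tiles, &[42], &mut cache, &encode);
    println!("{} embeddings, {} cached", out.len(), cache.len()); // 13 embeddings, 0 cached

    // Matching counts: per-image cache path.
    let imgs: Vec<Vec<f32>> = vec![vec![1.0, 2.0], vec![4.0]];
    let out = encode_with_cache(&imgs, &[1, 2], &mut cache, &encode);
    println!("{:?}, {} cached", out, cache.len()); // [[3.0], [4.0]], 2 cached
}
```

Comparing the two lengths keeps the change small; hashing each tile separately would be an alternative if sub-images ever need to be cached individually.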

Reproduction

Any Idefics3 model with do_image_splitting: true in preprocessor_config.json (the default) will panic when given an image larger than max_image_size.longest_edge (typically 512px). This includes ibm-granite/granite-docling-258M with any document page image.

let model = MultimodalModelBuilder::new("ibm-granite/granite-docling-258M")
    .build().await?;
// Send any image > 512px on either dimension → panic
