[diffusiongemma-26B-A4B-it] Support other input modalities #5482

Description

@kamalrajkannan78

Current loader supports text-to-text only. Follow-up to add the model's other input modalities.

  • Image (vision) — supportable in the loader now; the transformers impl has the vision path (vision_tower, embed_vision, pixel_values).
  • Video — listed on the model card, but not wired into the current transformers (5.12.0) DiffusionGemmaEncoderModel (modeling_diffusion_gemma.py:975 says it "doesn't support audio or video inputs"), so it needs transformers support first.
  • Audio — not supported by the model; out of scope.
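
A rough sketch of how the loader could gate inputs according to the list above. Everything here is hypothetical: `Modality`, `SUPPORT_STATUS`, `check_modalities`, and the `image_enabled` flag are illustrative names, not part of the existing loader or of transformers. The status strings only mirror what this issue states.

```python
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


# Status of each input modality as described in this issue.
# Hypothetical structure; the real loader may organize this differently.
SUPPORT_STATUS = {
    Modality.TEXT: "supported",
    Modality.IMAGE: "planned",       # vision path exists in the transformers impl
    Modality.VIDEO: "blocked",       # needs transformers support first
    Modality.AUDIO: "out_of_scope",  # not supported by the model
}


def check_modalities(requested, image_enabled=False):
    """Return the requested modalities if all are accepted; raise otherwise.

    `image_enabled` is an assumed switch for the follow-up image work.
    """
    allowed = {Modality.TEXT}
    if image_enabled:
        allowed.add(Modality.IMAGE)

    rejected = [m for m in requested if m not in allowed]
    if rejected:
        details = ", ".join(f"{m.value} ({SUPPORT_STATUS[m]})" for m in rejected)
        raise NotImplementedError(f"Unsupported input modalities: {details}")
    return list(requested)
```

With a check like this, a request that includes video fails early with a message naming the blocked modality, instead of failing deeper in the model's forward pass.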
