[diffusiongemma-26B-A4B-it] Support other input modalities

Current loader supports text-to-text only. Follow-up to add the model's other input modalities.

- **Image (vision)**: supportable in the loader now; the transformers implementation has the vision path (`vision_tower`, `embed_vision`, `pixel_values`).
- **Video**: listed on the model card, but **not wired in the current `transformers` (5.12.0) `DiffusionGemmaEncoderModel`** (`modeling_diffusion_gemma.py:975`: "doesn't support audio or video inputs"), so it needs transformers support first.
- **Audio**: not supported by the model; out of scope.
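
As a rough illustration of the status list above, a loader could gate requested inputs with a small lookup table. This is only a sketch: `Support`, `MODALITY_STATUS`, and `unsupported_modalities` are hypothetical names invented here, not part of the loader or of `transformers`.

```python
from enum import Enum


class Support(Enum):
    """Hypothetical status labels mirroring the list in this issue."""
    AVAILABLE = "already supported by the loader"
    READY = "supportable in the loader now"
    BLOCKED_UPSTREAM = "needs transformers support first"
    OUT_OF_SCOPE = "not supported by the model"


# Assumed mapping, taken from the bullet list above.
MODALITY_STATUS = {
    "text": Support.AVAILABLE,
    "image": Support.READY,
    "video": Support.BLOCKED_UPSTREAM,
    "audio": Support.OUT_OF_SCOPE,
}


def unsupported_modalities(requested):
    """Return {modality: reason} for each requested modality the loader cannot serve."""
    servable = {Support.AVAILABLE, Support.READY}
    rejected = {}
    for name in requested:
        status = MODALITY_STATUS.get(name)
        if status is None:
            rejected[name] = "unknown modality"
        elif status not in servable:
            rejected[name] = status.value
    return rejected
```

Calling `unsupported_modalities(["text", "image", "video"])` would flag only `video`, with the reason that it needs transformers support first.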

[diffusiongemma-26B-A4B-it] Support other input modalities #5482
