feat: add audio content blocks for Gemma multimodal prompts in model_mm by cliu1003 · Pull Request #2240 · microsoft/onnxruntime-genai

cliu1003 · 2026-06-24T08:42:28Z

Description

GetUserContent() in examples/c/src/common.cpp only emitted image and
text content blocks for the Gemma-style structured-content path. Audio
inputs (num_audios) were silently ignored, so no {"type":"audio"} block
was added and the chat template never rendered an <|audio|> marker.

As a result, for Gemma-4 audio inference the audio soft tokens were appended
at the very front of the templated prompt (via the fallback in
ProcessGemma4Prompt), outside the user turn. The model therefore did not
associate the audio with the request and replied with things like
"Please provide the audio you would like me to transcribe."

This PR emits one {"type":"audio"} block per audio clip so the chat
template inserts the <|audio|> marker at the correct position within the
user turn, allowing the audio soft tokens to be expanded in place.
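
To illustrate why the marker position matters, here is a minimal sketch of in-place expansion with a front-of-prompt fallback. It is not the actual ProcessGemma4Prompt implementation: the helper name `ExpandAudioMarkers` and the soft-token string passed to it are assumptions made for illustration. Only the `<|audio|>` marker and the fallback behaviour come from the description above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (NOT the real ProcessGemma4Prompt).
// Replaces every "<|audio|>" marker in `prompt` with `soft_tokens`.
// If the prompt contains no marker, the soft tokens are placed at the very
// front of the prompt, mimicking the fallback described in this PR.
std::string ExpandAudioMarkers(const std::string& prompt,
                               const std::string& soft_tokens) {
  const std::string marker = "<|audio|>";
  std::string out;
  std::size_t pos = 0;
  bool found = false;

  while (true) {
    const std::size_t hit = prompt.find(marker, pos);
    if (hit == std::string::npos) break;
    out.append(prompt, pos, hit - pos);  // copy text before the marker
    out += soft_tokens;                  // expand the marker in place
    pos = hit + marker.size();
    found = true;
  }
  out.append(prompt, pos, std::string::npos);  // copy the remaining tail

  // Fallback: no marker was rendered, so the audio ends up outside the user turn.
  return found ? out : soft_tokens + out;
}
```

With a marker inside the user turn, the soft tokens land next to the request text. Without one, they are placed before the turn begins, which matches the failure mode reported above.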

Changes

examples/c/src/common.cpp: add a loop that appends N {"type":"audio"}
blocks (one per audio clip) in the Gemma-style structured-content branch.
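
The change can be sketched roughly as follows. This is a simplified stand-in, not the code in common.cpp: the function name `BuildStructuredContent`, its parameters, the `EscapeJson` helper, and the block order (images, then audio, then text) are all assumptions. The real GetUserContent() may build its JSON differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal JSON string escaper: handles quotes, backslashes and
// newlines only. Other control characters are not handled in this sketch.
std::string EscapeJson(const std::string& s) {
  std::string out;
  for (char c : s) {
    if (c == '"' || c == '\\') {
      out += '\\';
      out += c;
    } else if (c == '\n') {
      out += "\\n";
    } else {
      out += c;
    }
  }
  return out;
}

// Hypothetical stand-in for the Gemma-style structured-content branch.
// Emits one {"type":"image"} block per image, one {"type":"audio"} block per
// audio clip (the addition made in this PR), then a single text block.
std::string BuildStructuredContent(int num_images, int num_audios,
                                   const std::string& text) {
  assert(num_images >= 0 && num_audios >= 0);
  std::string content = "[";
  for (int i = 0; i < num_images; ++i) content += R"({"type":"image"},)";
  for (int i = 0; i < num_audios; ++i) content += R"({"type":"audio"},)";
  content += R"({"type":"text","text":")" + EscapeJson(text) + "\"}]";
  return content;
}
```

When `num_audios` is 0 the audio loop does not run, so the output is the same as before the change. That is consistent with the commit message's note that Gemma-3 text/vision-only usage is unaffected.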

GetUserContent only emitted image and text blocks for the Gemma-style structured content path, so audio inputs were never inserted into the chat template. As a result the rendered prompt had no <|audio|> marker and Gemma-4 audio soft tokens were appended at the very front of the templated string, causing the model to ignore the audio (e.g. replying "Please provide the audio..."). Emit one {"type":"audio"} block per audio clip so the chat template inserts the <|audio|> marker in the correct position within the user turn. No effect on Gemma-3 since num_audios is 0 for text/vision-only usage.

Copilot

Pull request overview

This PR updates the C multimodal example prompt construction so Gemma-style structured-content messages include audio content blocks, allowing chat templates to place the audio marker(s) within the user turn (instead of relying on fallback prompt processing).

Changes:

Extend GetUserContent() structured-content branch to append one {"type":"audio"} block per audio clip.
Clarify the structured-content comment to reflect Gemma-3/Gemma-4 intent.

kunal-vaishnavi · 2026-06-24T08:59:23Z

Thank you for your contribution!

cliu1003 · 2026-06-25T02:05:43Z

Hi @kunal-vaishnavi, some failed checks are blocking the merge. Could you please re-run them? Thanks.

cliu1003 · 2026-06-26T02:45:29Z

Hi @kunal-vaishnavi, thanks for your help. One check is still failing. Could you please re-run it? Thanks.

Copilot AI review requested due to automatic review settings June 24, 2026 08:42

cliu1003 requested a review from a team as a code owner June 24, 2026 08:42

Copilot started reviewing on behalf of cliu1003 June 24, 2026 08:42

Copilot AI reviewed Jun 24, 2026

Comment thread examples/c/src/common.cpp

Comment thread examples/c/src/common.cpp

kunal-vaishnavi previously approved these changes Jun 24, 2026

cliu1003 dismissed kunal-vaishnavi’s stale review via 7a041ce June 24, 2026 09:03

kunal-vaishnavi reviewed Jun 24, 2026

Comment thread examples/c/src/common.cpp Outdated

cliu1003 force-pushed the feat/model_mm_support_audio_in_gemma branch from 7a041ce to 0f470fc Compare June 24, 2026 09:12

kunal-vaishnavi approved these changes Jun 24, 2026

kunal-vaishnavi enabled auto-merge (squash) June 24, 2026 09:22

cliu1003 wants to merge 1 commit into microsoft:main from cliu1003:feat/model_mm_support_audio_in_gemma
