webgpu: bypass manual mRoPE for text-only Qwen3.5 when GQA fuses RoPE by qjia7 · Pull Request #2245 · microsoft/onnxruntime-genai

qjia7 · 2026-06-26T08:42:26Z

Summary

For text-only Qwen3.5, multimodal RoPE (mRoPE) collapses to standard 1D RoPE:
Qwen3_5TextRotaryEmbedding expands a 2D position_ids into 3 identical axes
and apply_interleaved_mrope returns freqs[0] unchanged. The manual mRoPE
subgraph (Shape → Expand → interleaved cos/sin caches → custom kernel) is
therefore equivalent to a plain fused-RoPE pass inside GQA.
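The collapse can be checked numerically with a small pure-Python sketch. This is not the code from the model or from this PR: rope_freqs and interleave_mrope are hypothetical helpers, the interleaving pattern is a simplified version of what apply_interleaved_mrope is described as doing, and the sizes (half_dim=7, mrope_section=[3, 2, 2]) are illustrative assumptions.

```python
import copy

def rope_freqs(positions, half_dim, base=10000.0):
    """Standard 1D RoPE angles: position * inverse frequency (hypothetical helper)."""
    inv_freq = [base ** (-i / half_dim) for i in range(half_dim)]
    return [[p * f for f in inv_freq] for p in positions]

def interleave_mrope(freqs, mrope_section):
    """Simplified interleaved mRoPE: start from axis 0 and overwrite every
    third slot with the values from axis 1 (offset 1) and axis 2 (offset 2)."""
    out = copy.deepcopy(freqs[0])
    for axis, offset in ((1, 1), (2, 2)):
        length = mrope_section[axis] * 3
        for row, src in zip(out, freqs[axis]):
            for i in range(offset, length, 3):
                row[i] = src[i]
    return out

positions = [0, 1, 2, 3]
half_dim = 7                # assumed toy size, not the real head dimension
mrope_section = [3, 2, 2]   # assumed toy split, sums to half_dim

# Text-only case: the three position axes are identical copies.
text_freqs = [rope_freqs(positions, half_dim) for _ in range(3)]
collapsed = interleave_mrope(text_freqs, mrope_section)
assert collapsed == rope_freqs(positions, half_dim)  # same as plain 1D RoPE

# Contrast: when the axes differ (as in VL mode), the result is not 1D RoPE.
vl_freqs = [
    rope_freqs(positions, half_dim),
    rope_freqs([0, 0, 1, 1], half_dim),
    rope_freqs([0, 1, 0, 1], half_dim),
]
mixed = interleave_mrope(vl_freqs, mrope_section)
assert mixed != rope_freqs(positions, half_dim)
```

Because every overwritten slot receives the same value it already held, the interleaving is a no-op whenever the three axes match, which is why the fused 1D path gives the same result in text-only mode.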

When the GQA operator supports fused RoPE (use_rope_in_attn=True, e.g. on
WebGPU), this PR detects the text-only case and routes through the fused path,
bypassing the manual mRoPE subgraph entirely. This removes the Shape → Memcpy
node that reads a dynamic tensor shape at runtime, which is the path that
prevents WebGPU graph capture on Qwen3.5 text-only models.

Changes (src/python/py/models/builders/qwen.py only):

Add a use_text_only_fused_rope flag: true when both is_text_only and use_rope_in_attn are true.
When the flag is set: call make_rotary_embedding_caches() (standard 2D cos/sin for GQA), skip mRoPE config, leave use_rope_in_attn=True.
When the flag is not set: keep the existing mRoPE path unchanged (VL mode and non-fused-RoPE EPs).
make_position_ids_reformatting: return None early when fused RoPE is active (no position_ids tensor on the data flow).
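The branching listed above can be summarized as a small decision table. The sketch below is only an illustration of that logic, not the code in qwen.py: RopePlan and plan_rope are hypothetical names, and the assumption that position_ids stays on the data flow whenever the flag is off is inferred from the description rather than confirmed by the diff.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RopePlan:
    """Hypothetical summary of which RoPE path the builder would take."""
    use_text_only_fused_rope: bool
    build_standard_caches: bool      # stands in for make_rotary_embedding_caches()
    configure_mrope: bool            # stands in for the existing manual mRoPE setup
    position_ids_on_data_flow: bool  # False means reformatting returns None

def plan_rope(is_text_only: bool, use_rope_in_attn: bool) -> RopePlan:
    # The flag is set only when both conditions hold.
    fused = is_text_only and use_rope_in_attn
    return RopePlan(
        use_text_only_fused_rope=fused,
        build_standard_caches=fused,
        configure_mrope=not fused,
        position_ids_on_data_flow=not fused,  # assumption, see lead-in
    )

# Text-only model on an EP whose GQA fuses RoPE (e.g. WebGPU): bypass mRoPE.
webgpu_text = plan_rope(is_text_only=True, use_rope_in_attn=True)
# VL mode, or an EP without fused RoPE: keep the manual mRoPE path.
vl_mode = plan_rope(is_text_only=False, use_rope_in_attn=True)
non_fused_ep = plan_rope(is_text_only=True, use_rope_in_attn=False)
```

Only the first of the three cases takes the fused path; the other two fall through to the unchanged mRoPE path.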

Test plan

Verify text-only Qwen3.5-0.8B generates correct output on WebGPU with graph capture enabled
Verify multimodal Qwen3.5 (VL mode) is unaffected and still uses the manual mRoPE path
Run existing Qwen integration tests: qwen3-0.6b, qwen2.5-0.5b-instruct

Text-only mRoPE collapses to standard 1D RoPE because Qwen3_5TextRotaryEmbedding expands a 2D position_ids to 3 identical axes and apply_interleaved_mrope returns freqs[0] unchanged. When GQA can perform fused RoPE we therefore bypass the manual mRoPE subgraph entirely, which removes the Shape -> Memcpy path that blocks WebGPU graph capture.

        self.attention_attrs["q_norm"] = True
        self.attention_attrs["k_norm"] = True
-        super().make_attention_init(config)
+        super().make_attention_init()


        super().__init__(config, io_dtype, onnx_dtype, ep, cache_dir, extra_options)

-    def make_attention_init(self, config):
+    def make_attention_init(self):


qjia7 force-pushed the qwen35-text-only-fused-rope-bypass branch from dd1fcfb to 86440e3 Compare June 26, 2026 08:47

github-advanced-security AI found potential problems Jun 26, 2026

qjia7 wants to merge 1 commit into main from qwen35-text-only-fused-rope-bypass
