[CUDA] Qwen3.6-35B-A3B Throughput Optimization

This issue tracks the progress of Qwen3.6-35B-A3B throughput optimization.

Related PRs:
For Qwen3.6:

* olive-recipes
[Add Qwen3.6-35B-A3B MoE VLM recipe (CUDA + CPU)](https://github.com/microsoft/olive-recipes/pull/492)
 
* onnxruntime-genai
[Fix CUDA QMoE INT4 export for Qwen3.5/3.6 MoE models](https://github.com/microsoft/onnxruntime-genai/pull/2209)
https://github.com/microsoft/onnxruntime-genai/pull/2218

* CUDA ops / kernels:
https://github.com/microsoft/onnxruntime/pull/28980
https://github.com/microsoft/onnxruntime/pull/28985
https://github.com/microsoft/onnxruntime/pull/28986 (Not needed if we have shared expert optimization like below)
https://github.com/microsoft/onnxruntime/pull/29028
https://github.com/microsoft/onnxruntime/pull/29038 (Need to extend to block quantization)
https://github.com/microsoft/onnxruntime/pull/29013

Related issues:
[AddExternalInitializers copies device (GPU) OrtValues per session instead of using them in place](https://github.com/microsoft/onnxruntime/issues/29009)

[CUDA] Qwen3.6-35B-A3B Throughput Optimization #28987

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[CUDA] Qwen3.6-35B-A3B Throughput Optimization #28987

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions