🚀 The feature, motivation and pitch
As we add support for more devices, benchmarking scripts increasingly rely on device-specific tensor shapes due to VRAM constraints. This leads to fragmented benchmark setups and scattered performance results that are difficult to compare or reproduce. To make benchmarking more scalable and consistent, I propose introducing a standardized benchmark model configuration.
Motivation
Today, different kernels (and sometimes different devices) use ad-hoc shapes, which:
- Require device-specific dispatching in benchmark scripts
- Produce results that are hard to compare across hardware
- Increase maintenance burden for contributors adding new benchmarks
Proposal
Define one or more representative “mainstream” model profiles (e.g., LLaMA-/GPT-like) with canonical parameters such as hidden_size, vocab_size, num_q_heads, and num_kv_heads.
All benchmark scripts would derive their shapes from this shared config, optionally using scaled-down subsets when needed to fit memory constraints, instead of inventing per-device shapes.
This would:
- Eliminate ad-hoc device-specific shapes
- Improve comparability across kernels and hardware
- Lower the barrier for contributors writing new benchmarks
- Improve reproducibility of performance results
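To make the idea concrete, here is a minimal sketch of what such a shared config could look like. All names and values (the BenchmarkModelConfig class, the LLaMA-like defaults, the scaled() helper) are illustrative assumptions, not a finalized design:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BenchmarkModelConfig:
    # Hypothetical canonical parameters for a LLaMA-/GPT-like profile;
    # the actual values would be agreed on when the config is finalized.
    hidden_size: int = 4096
    vocab_size: int = 32000
    num_q_heads: int = 32
    num_kv_heads: int = 8

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_q_heads

    def scaled(self, factor: float) -> "BenchmarkModelConfig":
        # Scaled-down subset for memory-constrained devices, instead of
        # inventing per-device shapes in each benchmark script.
        return replace(
            self,
            hidden_size=int(self.hidden_size * factor),
            num_q_heads=max(1, int(self.num_q_heads * factor)),
            num_kv_heads=max(1, int(self.num_kv_heads * factor)),
        )

# A benchmark script derives its tensor shapes from the shared profile:
LLAMA_LIKE = BenchmarkModelConfig()
qkv_weight_shape = (
    LLAMA_LIKE.hidden_size,
    (LLAMA_LIKE.num_q_heads + 2 * LLAMA_LIKE.num_kv_heads) * LLAMA_LIKE.head_dim,
)
```

A script targeting a smaller device would call, say, LLAMA_LIKE.scaled(0.5) rather than hard-coding its own shapes, so results remain traceable to the same canonical profile.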
Target Devices
Before finalizing the standardized config, it would be helpful to clarify which devices we officially want to support for benchmarking.
Here’s the list we currently have:
- NVIDIA H100 (80GB)
- Intel XPU GPU Max 1100 (48GB)
- NPU Atlas 900 A2 POD (64GB)
Please feel free to comment if I missed any devices, or if there are additional targets we should consider.
Next Steps