
Examples

Curated kernels from KernelBench Level 2 and the Intel XPU Triton benchmarks, organized by optimization pattern. Each example includes a Triton kernel (.py) and a spec (.yaml). Source files live in test_kernels/; the examples/ directory provides a categorized view via symlinks.
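Because the categorized view consists only of symlinks, you can recover the source file behind each entry by resolving those links. The Python sketch below walks examples/ and does this. It assumes only the layout described above: category directories whose entries link into test_kernels/. `list_examples` is a hypothetical helper and is not part of xe-forge.

```python
from pathlib import Path


def list_examples(examples_dir: Path) -> dict[str, list[tuple[str, Path]]]:
    """Map each category directory to (link name, resolved source file) pairs.

    Hypothetical helper: assumes examples/<category>/<file> entries are
    symlinks pointing into test_kernels/, as described in this README.
    """
    catalog: dict[str, list[tuple[str, Path]]] = {}
    for category in sorted(p for p in examples_dir.iterdir() if p.is_dir()):
        entries = []
        for link in sorted(category.iterdir()):
            if link.is_symlink():
                # resolve() follows the relative link back into test_kernels/
                entries.append((link.name, link.resolve()))
        catalog[category.name] = entries
    return catalog
```

For example, `list_examples(Path("examples"))["gemm"]` should pair each GEMM example with its path under test_kernels/.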

Running an example

xe-forge -i examples/gemm/14_Gemm_Divide_Sum_Scaling.py \
         -s examples/gemm/14_Gemm_Divide_Sum_Scaling.yaml \
         -o optimized.py
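You can also script the same invocation from Python, for instance to run every kernel in one category. This sketch uses only the -i, -s and -o flags shown above. It assumes each spec shares its kernel's file stem, as in the example. The `xe_forge_command` and `category_commands` helpers and the `<stem>_optimized.py` output naming are hypothetical.

```python
import shlex
from pathlib import Path


def xe_forge_command(kernel: Path, output: Path) -> list[str]:
    """Build one xe-forge invocation (hypothetical helper).

    Uses only the -i/-s/-o flags from the README and assumes the spec sits
    next to the kernel with the same stem and a .yaml suffix.
    """
    spec = kernel.with_suffix(".yaml")
    return ["xe-forge", "-i", str(kernel), "-s", str(spec), "-o", str(output)]


def category_commands(category_dir: Path, out_dir: Path) -> list[list[str]]:
    """One command per kernel in a category; output names are an assumption."""
    return [
        xe_forge_command(k, out_dir / f"{k.stem}_optimized.py")
        for k in sorted(category_dir.glob("*.py"))
    ]


cmd = xe_forge_command(Path("examples/gemm/14_Gemm_Divide_Sum_Scaling.py"), Path("optimized.py"))
print(shlex.join(cmd))
```

To run the commands, pass each list to `subprocess.run(cmd, check=True)`.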

Categories

GEMM

GEMM with post-matmul elementwise or reduction operations.

| Kernel | Operations |
| --- | --- |
| 14_Gemm_Divide_Sum_Scaling | GEMM + divide + column sum + scaling |
| 39_Gemm_Scale_BatchNorm | GEMM + scaling + batch normalization |
| 45_Gemm_Sigmoid_LogSumExp | GEMM + sigmoid + log-sum-exp reduction |
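These kernels share a common idea: fuse the post-matmul work into the GEMM epilogue. Each output value is divided, reduced and scaled as soon as it is produced, so the GEMM result is never written out and re-read in separate passes. The pure-Python reference sketch below illustrates this for a divide + sum + scale epilogue. It assumes a Linear-style N x K weight layout and a sum across the N output columns of each row, and it treats the divisor and scale as plain parameters. Any of these may differ from the exact definition in 14_Gemm_Divide_Sum_Scaling.yaml.

```python
def gemm(x: list[list[float]], w: list[list[float]]) -> list[list[float]]:
    # x is M x K; w is N x K (assumed Linear-style layout), so y = x @ w^T is M x N.
    return [[sum(a * b for a, b in zip(row, col)) for col in w] for row in x]


def unfused(x, w, divisor: float, scale: float) -> list[float]:
    y = gemm(x, w)                                 # materialize the full M x N output
    y = [[v / divisor for v in row] for row in y]  # pass 2: divide
    sums = [sum(row) for row in y]                 # pass 3: reduce across the N columns
    return [s * scale for s in sums]               # pass 4: scale


def fused(x, w, divisor: float, scale: float) -> list[float]:
    out = []
    for row in x:
        acc = 0.0
        for col in w:
            dot = sum(a * b for a, b in zip(row, col))
            acc += dot / divisor  # epilogue applied as soon as each output value exists
        out.append(acc * scale)   # the M x N intermediate is never stored
    return out
```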

Fused

Long activation chains fused into a single kernel.

| Kernel | Operations |
| --- | --- |
| 81_Gemm_Swish_Divide_Clamp_Tanh_Clamp | GEMM + swish + divide + clamp + tanh + clamp |
| 95_Matmul_Add_Swish_Tanh_GELU_Hardtanh | Matmul + add + swish + tanh + GELU + hardtanh |
| 99_Matmul_GELU_Softmax | Matmul + GELU + softmax |
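When an activation chain is fused, each GEMM output goes through every stage in a single pass. Without fusion, each operation costs its own full read and write of the buffer. The pure-Python sketch below contrasts the two orderings for the 81_Gemm_Swish_Divide_Clamp_Tanh_Clamp chain. The stage order follows the kernel name. The divisor of 2.0 and the [-1, 1] clamp bounds are illustrative assumptions, not values read from the kernel's spec.

```python
import math


def sigmoid(v: float) -> float:
    # Split on sign so math.exp never overflows for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def clamp(v: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return min(max(v, lo), hi)


# Stage order follows the kernel name; the constants are assumptions.
STAGES = [
    lambda v: v * sigmoid(v),  # swish
    lambda v: v / 2.0,         # divide
    clamp,                     # clamp to [-1, 1]
    math.tanh,                 # tanh
    clamp,                     # clamp to [-1, 1]
]


def unfused(values: list[float]) -> list[float]:
    # One full pass over the buffer per operation (one memory round trip each).
    for stage in STAGES:
        values = [stage(v) for v in values]
    return values


def fused(values: list[float]) -> list[float]:
    # Single pass: each value runs through every stage before it is written out.
    out = []
    for v in values:
        for stage in STAGES:
            v = stage(v)
        out.append(v)
    return out
```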

Reduction / Normalization

Kernels with reduction passes (batch norm, softmax).

| Kernel | Operations |
| --- | --- |
| 84_Gemm_BatchNorm_Scaling_Softmax | GEMM + batch norm + scaling + softmax |
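The two reductions in this kernel run along different axes. Batch norm needs per-column statistics gathered across every row of the batch, while softmax reduces along each row. The pure-Python sketch below covers both. It assumes training-style batch statistics (biased variance, no affine parameters, eps of 1e-5), which may differ from the spec of 84_Gemm_BatchNorm_Scaling_Softmax. `bn_scale_softmax` is a hypothetical helper that chains the stages in the order the kernel name lists them.

```python
import math


def batch_norm_columns(y: list[list[float]], eps: float = 1e-5) -> list[list[float]]:
    # Column statistics reduce over all rows (the batch dimension).
    m, n = len(y), len(y[0])
    means = [sum(y[i][j] for i in range(m)) / m for j in range(n)]
    variances = [sum((y[i][j] - means[j]) ** 2 for i in range(m)) / m for j in range(n)]
    return [[(y[i][j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(n)]
            for i in range(m)]


def softmax_row(row: list[float]) -> list[float]:
    # Numerically stable softmax: pass 1 finds the row max, pass 2 sums exp(x - max).
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def bn_scale_softmax(y: list[list[float]], scale: float) -> list[list[float]]:
    return [softmax_row([scale * v for v in row]) for row in batch_norm_columns(y)]
```

The batch-norm statistics span every row. A kernel that tiles the batch across program instances therefore typically needs a separate reduction pass before it can normalize. The row-wise softmax, by contrast, can stay within one program per row.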

Attention

| Kernel | Operations |
| --- | --- |
| 1_FlashAttention_Fwd | Flash Attention forward: softmax(Q @ K^T) @ V |
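Flash attention never materializes the full score matrix. Instead, it streams K and V in blocks and maintains an online softmax: a running max m, a running normalizer l, and an output accumulator, all rescaled whenever m grows. The single-query pure-Python sketch below implements that recurrence and can be checked against plain attention. The block size and the `scale` argument (usually 1/sqrt(d)) are illustrative choices, not taken from 1_FlashAttention_Fwd.py.

```python
import math


def attention_reference(q, keys, values, scale):
    # Plain attention for one query row: softmax(scale * q . k_j) weighted sum of v_j.
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    mx = max(scores)
    w = [math.exp(s - mx) for s in scores]
    total = sum(w)
    d = len(values[0])
    return [sum(w[j] * values[j][c] for j in range(len(values))) / total for c in range(d)]


def attention_online(q, keys, values, scale, block=2):
    # Stream K/V in blocks; only O(d) state is kept per query.
    d = len(values[0])
    m = -math.inf    # running max of the scores seen so far
    l = 0.0          # running sum of exp(score - m)
    acc = [0.0] * d  # running sum of exp(score - m) * v
    for start in range(0, len(keys), block):
        ks = keys[start:start + block]
        vs = values[start:start + block]
        s = [scale * sum(a * b for a, b in zip(q, k)) for k in ks]
        m_new = max(m, max(s))
        alpha = math.exp(m - m_new)  # rescale old state to the new max (0.0 on the first block)
        p = [math.exp(x - m_new) for x in s]
        l = l * alpha + sum(p)
        acc = [a * alpha + sum(p[j] * vs[j][c] for j in range(len(vs)))
               for c, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]
```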

Mixed Ops

Matmul combined with pooling, min/max, or other non-standard operations.

| Kernel | Operations |
| --- | --- |
| 55_Matmul_MaxPool_Sum_Scale | Matmul + max pool + sum + scaling |
| 68_Matmul_Min_Subtract | Matmul + row min + subtract |
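In 55_Matmul_MaxPool_Sum_Scale, the pool, sum and scale can be folded into one streaming pass over each GEMM output row, so the pooled row is never stored. The sketch below assumes non-overlapping 1D max pooling with stride equal to the kernel size and the incomplete trailing window dropped, matching PyTorch's `max_pool1d` defaults. The kernel size and scale are illustrative parameters.

```python
import math


def maxpool_sum_scale(row: list[float], kernel_size: int, scale: float) -> float:
    # Reference: materialize the pooled row, then reduce and scale.
    pooled = [max(row[i:i + kernel_size])
              for i in range(0, len(row) - kernel_size + 1, kernel_size)]
    return sum(pooled) * scale


def maxpool_sum_scale_streaming(row: list[float], kernel_size: int, scale: float) -> float:
    # Fused: one pass, tracking only the current window max and the running sum.
    usable = len(row) - len(row) % kernel_size  # drop the incomplete trailing window
    total = 0.0
    window_max = -math.inf
    for i in range(usable):
        window_max = max(window_max, row[i])
        if (i + 1) % kernel_size == 0:          # window complete: fold its max into the sum
            total += window_max
            window_max = -math.inf
    return total * scale
```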

Adding a new example

  1. Add the kernel .py and spec .yaml to test_kernels/.
  2. Symlink both files into the appropriate examples/ category:
    cd examples/gemm
    ln -s ../../test_kernels/MyKernel.py .
    ln -s ../../test_kernels/MyKernel.yaml .
  3. Add a row for the kernel to that category's table in this file.
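You can also script the symlinking step. This Python sketch creates the same relative links as the `ln -s` commands above, so they stay valid when the repository is cloned elsewhere. `link_example` is a hypothetical helper. It refuses to link a kernel whose .py or .yaml file is missing from test_kernels/, and it creates the category directory if that directory does not exist yet.

```python
import os
from pathlib import Path


def link_example(repo_root: Path, category: str, kernel_name: str) -> list[Path]:
    """Symlink test_kernels/<name>.py and .yaml into examples/<category>/ (hypothetical helper)."""
    src_dir = repo_root / "test_kernels"
    dest_dir = repo_root / "examples" / category
    dest_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for suffix in (".py", ".yaml"):
        source = src_dir / f"{kernel_name}{suffix}"
        if not source.is_file():
            raise FileNotFoundError(f"add {source.name} to test_kernels/ first")
        link = dest_dir / source.name
        if not link.is_symlink():
            # Relative target computed from the link's directory, e.g. ../../test_kernels/X.py
            link.symlink_to(os.path.relpath(source, dest_dir))
        created.append(link)
    return created
```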