[Issue]: tune_gemm, rocm 7.0: Unable to run on gfx950 #888

Description

@matthiasdiener

Problem Description

When trying to run tune_gemm on a gfx950 GPU with ROCm 7.0, I get the following error:

# ./tune_gemm.py --gemm_size_file t.yaml
Tuning 1 gemm sizes starts at: 2025-10-06 20:43:27.298307
SIZE: 8192 8192 8192 NN nConfigs: 1152 multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 184, in extract_kernel_time
    first_value = df['DurationNs'].iloc[0]
                  ~~~~~~~~~~~~~~~~~~~~~^^^
  File "/opt/venv/lib/python3.12/site-packages/pandas/core/indexing.py", line 1191, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/pandas/core/indexing.py", line 1752, in _getitem_axis
    self._validate_integer(key, axis)
  File "/opt/venv/lib/python3.12/site-packages/pandas/core/indexing.py", line 1685, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 725, in <module>
    sys.exit(main())
             ^^^^^^
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 655, in main
    minTime, bestConfig, compile_time, profile_time, post_time = tune_gemm_config(
                                                                 ^^^^^^^^^^^^^^^^^
  File "/dockerx/triton/python/perf-kernels/tools/tune_gemm/./tune_gemm.py", line 264, in tune_gemm_config
    config, myTime = task.get()
                     ^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 774, in get
    raise self._value
IndexError: single positional indexer is out-of-bounds
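
The traceback shows `df['DurationNs'].iloc[0]` failing because the profiler output contained no data rows for at least one config. Below is a minimal sketch of a guard for that case. It is not the code in tune_gemm.py: the function name `first_duration_ns` is hypothetical, and it uses the standard-library `csv` module instead of pandas so it runs without dependencies. Only the `DurationNs` column name comes from the traceback.

```python
import csv
import io
from typing import Optional


def first_duration_ns(csv_text: str, column: str = "DurationNs") -> Optional[float]:
    """Return the first value in `column`, or None when the trace has no data rows.

    Hypothetical helper: mirrors the intent of `df[column].iloc[0]` but
    reports an empty trace as None instead of raising IndexError.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    first_row = next(reader, None)  # None for an empty file or a header-only file
    if first_row is None or not first_row.get(column):
        return None
    return float(first_row[column])


# A header-only trace (the situation the traceback points to) yields None.
empty_trace = "KernelName,DurationNs\n"
# A trace with one kernel row yields its duration.
good_trace = "KernelName,DurationNs\nmatmul_kernel,1234\n"

print(first_duration_ns(empty_trace))
print(first_duration_ns(good_trace))
```

With a guard like this, the caller could skip or log a config that produced no kernel timing rather than aborting the whole tuning run. Whether skipping is the right behavior here is an open question, since the empty trace may hide a real kernel launch failure.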

It does run correctly with the following workaround applied:

diff --git a/python/perf-kernels/tools/tune_gemm/tune_gemm.py b/python/perf-kernels/tools/tune_gemm/tune_gemm.py
index 7d51575e4..4ae63f04e 100755
--- a/python/perf-kernels/tools/tune_gemm/tune_gemm.py
+++ b/python/perf-kernels/tools/tune_gemm/tune_gemm.py
@@ -64,7 +64,7 @@ def get_full_tuning_space():
     num_stage_range = [2]
     waves_per_eu_range = [0]
     matrix_instr_nonkdim_range = [16, 32]
-    kpack_range = [1, 2]
+    kpack_range = [1]
     schedule_hints = ["none"]

     space = itertools.product(block_mn_range, block_mn_range, block_k_range, num_warps_range, group_m_range,
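
To illustrate what the workaround changes, here is a small sketch that builds the product over only the ranges visible in the diff context. The block, warp, and group ranges are not shown in the diff, so they are left out, and the argument order passed to `itertools.product` is an assumption. The helper name `visible_subspace` is hypothetical.

```python
import itertools


def visible_subspace(kpack_range):
    """Cartesian product over the ranges that appear in the diff context only."""
    num_stage_range = [2]
    waves_per_eu_range = [0]
    matrix_instr_nonkdim_range = [16, 32]
    schedule_hints = ["none"]
    # Assumed ordering; the real call in tune_gemm.py also includes ranges not shown here.
    return list(
        itertools.product(num_stage_range, waves_per_eu_range,
                          matrix_instr_nonkdim_range, kpack_range, schedule_hints))


before = visible_subspace([1, 2])  # original kpack_range
after = visible_subspace([1])      # workaround

print(len(before), len(after))
# Every remaining config has kpack == 1 (index 3 in the assumed ordering).
print(all(cfg[3] == 1 for cfg in after))
```

Because kpack is one factor of the product, restricting it to a single value halves this part of the space and removes every kpack=2 config. That the run succeeds afterwards suggests the failure is tied to kpack=2 configs on gfx950.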

Operating System

Ubuntu 24.04.3 LTS

CPU

AMD Ryzen Threadripper PRO 7985WX 64-Cores

GPU

AMD Instinct MI350X

ROCm Version

7.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
