This project uses an XGBoost-based machine learning pipeline to predict the performance of AMD GPU MI kernels and accelerate the Tensile tuning process.
- Hardware: AMD GPU (MI210/MI300 series) for data collection.
- Software: ROCm environment with hipBLASLt and TensileLite built and installed.
We recommend using the official ROCm Docker container. Detailed instructions for Docker usage and building Tensile with mi_tune can be found in the Data Collection README.
The project is organized into two primary stages:
- Data Collection: Tools to generate GEMM problems and run parallel benchmarks on multiple GPUs to collect GFLOPS data.
- Model Training (train_mi): Scripts for data analysis, training the XGBoost model, and evaluating prediction accuracy.
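
As a rough illustration of what the data collection stage measures, the sketch below enumerates a small grid of GEMM problem sizes and converts a measured kernel time into GFLOPS using the standard 2·M·N·K operation count for a GEMM. The function names `generate_gemm_problems` and `gemm_gflops` are hypothetical and are not taken from this repository; the actual tools may sample sizes and report throughput differently.

```python
from itertools import product
from typing import Iterable, List, Tuple


def generate_gemm_problems(
    m_sizes: Iterable[int], n_sizes: Iterable[int], k_sizes: Iterable[int]
) -> List[Tuple[int, int, int]]:
    """Return every (M, N, K) combination from the given size lists.

    Hypothetical helper: a plain Cartesian grid, used here only for illustration.
    """
    return list(product(m_sizes, n_sizes, k_sizes))


def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert a measured GEMM run time into GFLOPS.

    A GEMM of shape (M x K) * (K x N) performs M*N*K multiply-add pairs,
    which is conventionally counted as 2*M*N*K floating-point operations.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e9


if __name__ == "__main__":
    problems = generate_gemm_problems([1024, 2048], [1024], [512, 4096])
    for m, n, k in problems:
        # The 1 ms timing is a placeholder, not a measured value.
        print(m, n, k, gemm_gflops(m, n, k, seconds=1e-3))
```

In a real run the `seconds` argument would come from the benchmark output for each kernel and problem pair.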
All experimental data, logs, and visualizations are stored in the train_mi/ directory. You can find more details in the Final Report.
- Top-1 Accuracy: 45.6%
- Recall@20: 97.2%
- Mean Regret: 3.67%
- Avg Inference Time: 0.37s per problem
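
The ranking metrics above are not defined in this section, so the following is a minimal sketch under commonly used definitions, which are assumptions here: Top-1 accuracy is the fraction of problems where the kernel ranked first by the model is the truly fastest one, Recall@k is the fraction where the truly fastest kernel appears among the model's top k candidates, and regret is the relative GFLOPS lost by running the model's first choice instead of the best kernel. The function `evaluate_ranking` and its input layout are hypothetical; the Final Report may define these quantities differently.

```python
from typing import Dict, List, Sequence, Tuple

# One entry per GEMM problem: (predicted GFLOPS per kernel, measured GFLOPS per kernel).
Problem = Tuple[Sequence[float], Sequence[float]]


def evaluate_ranking(problems: List[Problem], k: int = 20) -> Dict[str, float]:
    """Compute Top-1 accuracy, Recall@k, and mean regret (assumed definitions)."""
    if not problems:
        raise ValueError("at least one problem is required")

    top1_hits = 0
    recall_hits = 0
    total_regret = 0.0

    for predicted, measured in problems:
        # Kernel indices ordered from highest to lowest predicted GFLOPS.
        order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
        # Index of the kernel that was actually fastest.
        best = max(range(len(measured)), key=lambda i: measured[i])

        top1_hits += int(order[0] == best)
        recall_hits += int(best in order[:k])
        # Relative performance lost by trusting the model's first choice.
        total_regret += (measured[best] - measured[order[0]]) / measured[best]

    n = len(problems)
    return {
        "top1_accuracy": top1_hits / n,
        "recall_at_k": recall_hits / n,
        "mean_regret": total_regret / n,
    }


if __name__ == "__main__":
    # Toy data with three kernels per problem; values are illustrative only.
    toy = [
        ([10.0, 30.0, 20.0], [100.0, 200.0, 150.0]),
        ([5.0, 1.0, 4.0], [80.0, 100.0, 90.0]),
    ]
    print(evaluate_ranking(toy, k=2))
```

Under these definitions, a high Recall@20 alongside a moderate Top-1 accuracy would mean that benchmarking only the model's top 20 candidates almost always includes the best kernel, which is how a predictor of this kind can shorten tuning even when its first choice is often not optimal.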