Hi, I’m trying to use Polygeist to extract polyhedral structure from CUDA kernels. According to the Polygeist-GPU paper (CGO’24), CUDA code can be lifted into GPU-level Polygeist IR such as:
However, when I compile CUDA kernels with cgeist, I only get MLIR scf IR (scf.if, scf.execute_region, scf.for, etc.) and the GPU parallel structure is not preserved.
Command:
cgeist --immediate \
-cuda-path=/usr/local/cuda \
--cuda-gpu-arch=sm_89 \
-I/usr/local/cuda/include \
-I/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18/include \
--resource-dir=/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18 \
-S \
test.cu > test.mlir
Example kernels and the MLIR output
__global__ void kernel_C(int m, int n, double alpha, double beta, double* C,
double* A, double* B, double* tmp) {
int i = blockDim.x * blockIdx.x + threadIdx.x;
int j = blockDim.y * blockIdx.y + threadIdx.y;
if (i < m && j < n)
C[i * n + j] = beta * C[i * n + j] +
alpha * B[i * n + j] * A[i * n + i] +
alpha * tmp[i * n + j];
}
__global__ void kernel_sum(int m, int n, double alpha, double beta, double* C,
double* A, double* B, double* tmp) {
int k = blockDim.x * blockIdx.x + threadIdx.x;
int j = blockDim.y * blockIdx.y + threadIdx.y;
if (k < m - 1 && j < n) {
for (int i = k + 1; i < m; i++)
C[k * n + j] += alpha * B[i * n + j] * A[i * n + k];
}
}
I also tested various available cgeist options, but none retained the for-loop structure for further analysis.
Questions:
- Is there a flag or pipeline that prevents lowering CUDA kernels directly into
scf and keeps them in Polygeist’s GPU IR?
- If this path is not supported anymore, is the GPU-wrapper lowering planned for reintroduction?
- Or should I implement a custom pass that intercepts the lowering before it becomes scf?
Any guidance would be very helpful. Thanks!
Hi, I’m trying to use Polygeist to extract polyhedral structure from CUDA kernels. According to the Polygeist-GPU paper (CGO’24), CUDA code can be lifted into GPU-level Polygeist IR such as:
However, when I compile CUDA kernels with cgeist, I only get MLIR scf IR (
scf.if,scf.execute_region,scf.for, etc.) and the GPU parallel structure is not preserved.Command:
cgeist --immediate \ -cuda-path=/usr/local/cuda \ --cuda-gpu-arch=sm_89 \ -I/usr/local/cuda/include \ -I/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18/include \ --resource-dir=/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18 \ -S \ test.cu > test.mlirExample kernels and the MLIR output
I also tested various available cgeist options, but none retained the for-loop structure for further analysis.
Questions:
scfand keeps them in Polygeist’s GPU IR?Any guidance would be very helpful. Thanks!