Which component has the problem?
CuTe DSL
Bug Report
Describe the bug
Compiling a kernel that uses a Uint64 constant larger than the largest Int64 value (2^63 - 1) crashes. I encountered this while implementing a PRNG; many PRNGs use full-width u64 constants. The same happens for any intermediate compile-time (consteval) u64 value, so I can't construct a sufficiently large value from smaller ones.
The bug does not occur with runtime u64 values, so as a workaround I pass all large constants as extra arguments to the kernel, but this is suboptimal.
Steps/Code to reproduce bug
import torch
import numpy as np
import cutlass

# 0xFFFFFFFFFFFFFFFF == 18446744073709551615
u64_max = np.iinfo(np.uint64).max

@cutlass.cute.kernel
def _set_val(a):
    a[0] = u64_max

@cutlass.cute.jit
def set_val(a):
    _set_val(a).launch()

array = torch.zeros(2, dtype=torch.uint64, device="cuda")
set_val(cutlass.cute.runtime.from_dlpack(array))
TypeError: get(): incompatible function arguments. The following argument types are supported:
    1. get(type: cutlass._mlir._mlir_libs._cutlass_ir._mlir.ir.Type, value: int) -> cutlass._mlir._mlir_libs._cutlass_ir._mlir.ir.IntegerAttr

Invoked with types: cutlass._mlir._mlir_libs._cutlass_ir._mlir.ir.IntegerType, int
(The error is rather cryptic, but it disappears when you use a number <= i64::max, so I'm fairly certain it's about a hidden overflow check.)
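To make the suspected boundary concrete, here is a minimal pure-Python sketch of the range involved. It does not touch CuTe DSL at all. The helper names `fits_in_i64` and `as_signed_bits` are hypothetical, made up for illustration. The second helper reinterprets a u64 bit pattern as a signed two's-complement value, which might sidestep a signed range check, but I have not verified that the DSL accepts a negative constant for a Uint64 store or preserves the bit pattern.

```python
# Illustration only: the names below are hypothetical, not CuTe DSL API.
I64_MAX = 2**63 - 1   # 0x7FFFFFFFFFFFFFFF
U64_MAX = 2**64 - 1   # 0xFFFFFFFFFFFFFFFF


def fits_in_i64(value: int) -> bool:
    """True if value is representable as a signed 64-bit integer."""
    return -(2**63) <= value <= I64_MAX


def as_signed_bits(value: int) -> int:
    """Reinterpret a u64 bit pattern as a signed two's-complement i64."""
    if not 0 <= value <= U64_MAX:
        raise ValueError("value is not a valid u64")
    return value - 2**64 if value > I64_MAX else value


# A typical full-width PRNG constant (the 64-bit golden-ratio constant).
GOLDEN = 0x9E3779B97F4A7C15

print(fits_in_i64(I64_MAX))       # constants up to here compile for me
print(fits_in_i64(GOLDEN))        # constants in this range crash
print(as_signed_bits(U64_MAX))    # same 64 bits, read as signed
```

Masking the signed value with `U64_MAX` recovers the original constant, so no bits are lost in the reinterpretation itself; whether the compiler round-trips it correctly is the untested assumption.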
Expected behavior
All u64 constants up to and including u64::max should be valid.
Environment details (please complete the following information):
Local, sm120
Additional context
I've tested it with all recent versions, up to nvidia-cutlass-dsl-libs-cu13==4.5.2