I have an implementation of the Mamba2 SSD layer in Mosaic GPU, which currently supports Hopper devices. Is that something that would belong here? I have verified that it works with Mamba2 weights as well as Nemotron-H, with performance comparable to the official Triton kernels.