I have an RTX 5000 (16 GB VRAM), which is a Turing-architecture card, but Flash-Attention only supports Ampere and newer GPUs (compute capability >= 8.0; Turing is 7.5).
Could we add SDPA as a fallback, or FlashInfer (which already supports Turing-based cards)?
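For illustration, here is a minimal sketch of what an SDPA fallback could look like, assuming the project uses PyTorch. The `attention` wrapper and its tensor layout are hypothetical, not the project's actual API:

```python
import torch
import torch.nn.functional as F

def flash_attn_supported() -> bool:
    # Flash-Attention kernels require Ampere or newer
    # (CUDA compute capability >= 8.0); Turing is 7.5.
    major, _ = torch.cuda.get_device_capability()
    return major >= 8

def attention(q, k, v, causal=True):
    """q, k, v: (batch, seqlen, nheads, headdim) -- the layout flash-attn uses.

    Hypothetical wrapper: dispatches to flash-attn on supported GPUs,
    otherwise falls back to PyTorch's built-in SDPA.
    """
    if flash_attn_supported():
        from flash_attn import flash_attn_func
        return flash_attn_func(q, k, v, causal=causal)
    # SDPA expects (batch, nheads, seqlen, headdim), so transpose in and out.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
        is_causal=causal,
    )
    return out.transpose(1, 2)
```

SDPA still picks an efficient (memory-efficient or math) kernel on pre-Ampere cards, so this would at least keep Turing users running, even if not at flash-attn speeds.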