For those blocked on macOS due to the hardcoded flash_attn and triton dependencies, our lab has released a patched loader that enables native Metal (MPS) inference for Evo2 (StripedHyena).
The Fix:
- Flash Attention Bypass: Runtime injection disables the CUDA check without modifying the core library source.
- Memory Safety: Redirects
torch.load to avoid the CPU OOM spike common on Unified Memory (resolves the zsh: killed error during loading).
- Cache Resolution: Fixes pathing for HuggingFace
snapshot_download on macOS filesystems.
Benchmarks (M3 Ultra / 128GB):
- Model:
evo2_7b
- Inference: ~0.75s / sequence
- Precision: BFloat16
Repository:
https://github.com/ariz32601-ksl/evo2-silicon
Developed by KSL Research.
For those blocked on macOS due to the hardcoded
flash_attnandtritondependencies, our lab has released a patched loader that enables native Metal (MPS) inference for Evo2 (StripedHyena).The Fix:
torch.loadto avoid the CPU OOM spike common on Unified Memory (resolves thezsh: killederror during loading).snapshot_downloadon macOS filesystems.Benchmarks (M3 Ultra / 128GB):
evo2_7bRepository:
https://github.com/ariz32601-ksl/evo2-silicon
Developed by KSL Research.