[Solution] Native Apple Silicon (MPS) Support via FlashAttention Bypass #222

For those blocked on macOS by the hard-coded `flash_attn` and `triton` dependencies, our lab has released a patched loader that enables native Metal Performance Shaders (MPS) inference for Evo2 (StripedHyena).

**The Fix:**
*   **FlashAttention Bypass:** Runtime injection disables the CUDA check without modifying the core library source.
*   **Memory Safety:** Redirects `torch.load` to avoid the CPU out-of-memory (OOM) spike common on unified-memory Macs, which resolves the `zsh: killed` error during model loading.
*   **Cache Resolution:** Fixes path resolution for Hugging Face `snapshot_download` on macOS filesystems.
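
For readers who want to see the general shape of the first two fixes, here is a minimal sketch of one common way to do this kind of runtime injection. It is illustrative only and is not the evo2-silicon loader's actual code. It assumes that (a) the blocking dependency is a bare `import flash_attn` (or a dotted submodule import) that merely has to succeed, and (b) Evo2 calls `torch.load` through the `torch` module attribute at load time. `stub_module` and `with_default_kwargs` are hypothetical helper names. The `mmap` keyword of `torch.load` requires PyTorch 2.1 or later.

```python
import functools
import sys
import types


def stub_module(name: str) -> types.ModuleType:
    """Register empty placeholder modules for `name` and its parent packages.

    After this runs, `import name` succeeds even though the real package is
    not installed (and shadows it if it is). The stub defines no attributes,
    so code that actually calls into it still fails with AttributeError.
    """
    parts = name.split(".")
    for i in range(1, len(parts) + 1):
        qualified = ".".join(parts[:i])
        if qualified not in sys.modules:
            sys.modules[qualified] = types.ModuleType(qualified)
        if i > 1:
            # Expose the submodule as an attribute of its parent package.
            parent = sys.modules[".".join(parts[: i - 1])]
            setattr(parent, parts[i - 1], sys.modules[qualified])
    return sys.modules[name]


def with_default_kwargs(func, **defaults):
    """Wrap `func` so `defaults` apply unless the caller passes those keywords."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for key, value in defaults.items():
            kwargs.setdefault(key, value)
        return func(*args, **kwargs)

    return wrapper


# Hypothetical usage on a Mac; both patches must run BEFORE the model code
# is imported. Assumes callers pass map_location by keyword, if at all
# (a positional map_location plus this default would raise TypeError).
#
#   stub_module("flash_attn")
#   import torch
#   torch.load = with_default_kwargs(torch.load, mmap=True, map_location="cpu")
```

With `mmap=True`, PyTorch memory-maps the checkpoint's storages instead of reading them all into RAM up front, which is the usual way to avoid a full in-memory copy of the weights during loading.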

**Benchmarks (M3 Ultra / 128GB):**
*   **Model:** `evo2_7b`
*   **Inference:** ~0.75 s per sequence
*   **Precision:** BFloat16
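
The post does not say how the ~0.75 s figure was measured. Because work on the MPS backend is dispatched asynchronously, a per-sequence wall-clock number is only meaningful if the timer waits for queued GPU work to finish. The following is a generic, hypothetical timing harness (not the authors' benchmark script) that takes the model call and an optional synchronization function as parameters. On a Mac, `torch.mps.synchronize` (PyTorch 2.0+) would be passed as `sync`; `run_model` in the usage comment is a placeholder for whatever function runs Evo2 on one sequence.

```python
import statistics
import time
from typing import Callable, Optional, Sequence


def time_per_sequence(
    run: Callable[[str], object],
    sequences: Sequence[str],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 1,
) -> float:
    """Return the median wall-clock seconds per sequence.

    `sync` should block until all queued device work has completed;
    without it, an asynchronous backend can make calls look faster
    than they are.
    """
    if not sequences:
        raise ValueError("need at least one sequence")

    # Untimed warm-up runs absorb one-off costs such as kernel compilation.
    for seq in sequences[:warmup]:
        run(seq)
        if sync is not None:
            sync()

    timings = []
    for seq in sequences:
        start = time.perf_counter()
        run(seq)
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical usage:
#   import torch
#   seconds = time_per_sequence(run_model, prompts, sync=torch.mps.synchronize)
```

The median is used rather than the mean so that a single slow outlier (for example, a memory-pressure stall) does not dominate the reported figure.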

**Repository:**
https://github.com/ariz32601-ksl/evo2-silicon

Developed by KSL Research. 
