[Solution] Native Apple Silicon (MPS) Support via FlashAttention Bypass #222

@ariz32601-ksl

For those blocked on macOS due to the hardcoded flash_attn and triton dependencies, our lab has released a patched loader that enables native Metal (MPS) inference for Evo2 (StripedHyena).

The Fix:

  • Flash Attention Bypass: Runtime injection disables the CUDA check without modifying the core library source.
  • Memory Safety: Redirects torch.load to avoid the CPU OOM spike common on Unified Memory (resolves the zsh: killed error during loading).
  • Cache Resolution: Fixes pathing for HuggingFace snapshot_download on macOS filesystems.
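The first two items can be sketched roughly as follows. This is a minimal illustration and not the code from the linked repository: the helper names `install_stub_modules` and `with_load_defaults` are hypothetical, and the sketch assumes the CUDA-only packages are imported at module import time and that the loader goes through a `torch.load`-style callable.

```python
import functools
import importlib.machinery
import sys
import types


def install_stub_modules(names):
    """Register empty placeholder modules so `import <name>` succeeds.

    Hypothetical helper: intended for packages that are NOT installed
    (e.g. CUDA-only dependencies on macOS). Names already present in
    sys.modules are left alone. Returns the names that were stubbed.
    """
    stubbed = []
    for name in names:
        if name in sys.modules:
            continue
        module = types.ModuleType(name)
        # Give the stub a spec so importlib.util.find_spec(name) does not raise.
        module.__spec__ = importlib.machinery.ModuleSpec(name, None)
        # Mark the stub as a package so dotted submodule stubs can hang off it.
        module.__path__ = []
        sys.modules[name] = module
        # Expose dotted stubs as attributes of their (already registered) parent.
        parent, _, child = name.rpartition(".")
        if parent and parent in sys.modules:
            setattr(sys.modules[parent], child, module)
        stubbed.append(name)
    return stubbed


def with_load_defaults(load_fn, **defaults):
    """Wrap a loader so `defaults` apply unless the caller passes that keyword.

    Limitation of this sketch: only keyword overrides are handled; a caller
    passing the same argument positionally would trigger a TypeError.
    """
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        for key, value in defaults.items():
            kwargs.setdefault(key, value)
        return load_fn(*args, **kwargs)
    return wrapper
```

Under these assumptions, the stubs would be installed before the model library is imported, for example `install_stub_modules(["flash_attn", "triton"])`. Any attribute the library actually reads from those modules would still need a stand-in. The loader could then be redirected with `torch.load = with_load_defaults(torch.load, map_location="cpu", mmap=True)`. The `mmap` argument requires PyTorch 2.1 or later and a checkpoint saved in the zipfile format. Whether memory-mapping is the mechanism the patched loader uses is an assumption here.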

Benchmarks (M3 Ultra / 128GB):

  • Model: evo2_7b
  • Inference: ~0.75s / sequence
  • Precision: BFloat16

Repository:
https://github.com/ariz32601-ksl/evo2-silicon

Developed by KSL Research.
