LLMs converted to Apple Core AI (.aimodel, iOS 27 / macOS 27) — downloadable, verified
on-device, with the conversion code and a knowledge base. Successor to
CoreML-Models.
| Model | Download (.aimodel) |
License |
|---|---|---|
| Qwen3.5-0.8B | 🤗 qwen3.5-0.8B-CoreAI | Apache-2.0 |
| Qwen3.5-2B | 🤗 qwen3.5-2B-CoreAI | Apache-2.0 |
| Qwen3.6-35B-A3B (MoE, Mac-only) | 🤗 Qwen3.6-35B-A3B-CoreAI | Apache-2.0 |
| Qwen3.6-27B (dense, Mac-only) | 🤗 Qwen3.6-27B-CoreAI | Apache-2.0 |
| GLM-4.7-Flash (MoE + MLA, Mac-only — zoo's first MLA) | 🤗 GLM-4.7-Flash-CoreAI | MIT |
| Gemma 4 E2B (text, incl. official-QAT int4) | 🤗 gemma-4-E2B-CoreAI | Gemma |
| Gemma 4 E4B (text, official-QAT int4) | 🤗 gemma-4-E4B-CoreAI | Gemma |
| Gemma 4 12B (dense, Mac-only — custom flash-decode kernel ‡) | 🤗 Gemma-4-12B-CoreAI | Gemma |
| Gemma 4 31B (dense, Mac-only — custom flash-decode kernel ‡) | 🤗 Gemma-4-31B-CoreAI | Gemma |
| LFM2.5-1.2B-Instruct | 🤗 LFM2.5-1.2B-CoreAI | LFM Open License v1.0 |
LFM2.5-8B-A1B (MoE, custom gather_qmm kernel — first iPhone MoE) |
🤗 LFM2.5-8B-A1B-CoreAI | LFM Open License v1.0 |
| Granite 4.0-H 1B / 350M | 🤗 granite-4.0-h-CoreAI | Apache-2.0 |
| Qwen3-VL (vision-language) | 🤗 2B · 4B · 8B | Apache-2.0 |
| MiniCPM-V 4.6 (vision-language, sub-2B — strongest tiny VLM) | 🤗 MiniCPM-V-4.6-CoreAI | Apache-2.0 |
| Gemma 4 E2B vision (VL) (image+text) | vl/ in 🤗 gemma-4-E2B-CoreAI |
Gemma |
| EmbeddingGemma 300M (text embeddings — on-device RAG / semantic search) | 🤗 embeddinggemma-300m-CoreAI | Gemma |
| Qwen3-Embedding 0.6B (multilingual text embeddings, last-token pooling + MRL) | 🤗 Qwen3-Embedding-0.6B-CoreAI | Apache-2.0 |
| Qwen3-Reranker 0.6B (cross-encoder reranker — yes/no relevance score) | 🤗 Qwen3-Reranker-0.6B-CoreAI | Apache-2.0 |
| RF-DETR nano/small/medium/large (object detection, no NMS) | 🤗 RF-DETR-CoreAI | Apache-2.0 |
| RF-DETR-Seg nano→2xlarge (instance segmentation, 6 sizes) | 🤗 RF-DETR-CoreAI | Apache-2.0 |
| iPhone 17 Pro · GPU | iPhone 17 Pro · ANE | M4 Max · GPU | |
|---|---|---|---|
| Qwen3.5-0.8B | 71.9 | 14.7 | 210 |
| Qwen3.5-2B | 29 | — | 161 |
| LFM2.5-1.2B | 45.4 | — | 276.5 |
| Granite 4.0-H 1B | 36.3 | — | 136.5 |
| Gemma 4 E2B | 30.3 (QAT 30.7) | 6 | 77.0 (QAT 78.9) |
| Gemma 4 E4B (official QAT) | 15.1 | — | 55.8 |
| Gemma 4 E2B VL (image+text, official QAT) | 25.5 | — | 82.4 |
| MiniCPM-V 4.6 (vision-language, sub-2B) | 53.4 | — | 224.3 |
| Qwen3.6-35B-A3B (MoE, 35B/~3B active, Mac-only) | — | — | 64.9 † |
| Qwen3.6-27B (dense, Mac-only) | — | — | 15.9 |
| GLM-4.7-Flash (MoE + MLA, 30B/~3B active, Mac-only) | — | — | 52.4 † |
| Gemma 4 12B (dense, Mac-only) | — | — | 23 int8 / 33 int4 ‡ |
| Gemma 4 31B (dense, Mac-only) | — | — | 17.2 int4 ‡ |
Measured on the iOS 27 / macOS 27 beta, Apple's coreai-pipelined GPU engine, zero custom
kernels (ANE column + †/‡ excepted). † = MoE bundle using the custom
gather_qmm Metal kernel (reads only the routed
experts). ‡ = dense bundle whose full/global-attention SDPA is a custom flash-decode Metal
kernel — the stock MPSGraph SDPA crashes on the ≥16-head × 512 Q (a GPU scratch-heap overflow,
apple/coreai-models#27), so these models are
unrunnable without it. Prefill, sizes, per-model caveats, and the Mac-only big models: zoo/.
CoreAIChat (apps/) — the zoo's models running on-device on iPhone.
- Try the app (iOS 27 / macOS 27 beta; the model downloads in-app):
- Run a model in your own app →
knowledge/swift-runtime.md+ the model card - Convert a model →
knowledge/conversion-guide.md - Compress →
knowledge/compression.md - Make it fast →
knowledge/custom-metal-kernels.md·knowledge/performance-ceiling.md - Known beta issue (in-graph KV-write crash; workarounds + the input-mask escape) →
knowledge/coreai-beta-mpsgraph-kvwrite-bug.md— FB23024751 / apple/coreai-models#5
| Dir | What |
|---|---|
zoo/ |
Model cards — configurations, sizes, parity, measured throughput. |
knowledge/ |
Verified notes on the framework: conversion, compression, stateful KV, custom Metal kernels, AOT, compute-unit rules, the Swift runtime. |
conversion/ |
Re-authored models + convert / verify / compress scripts (PyTorch → .aimodel). |
swift/ |
CoreAIRunner — a Swift package that drives .aimodel LLM bundles, including architectures beyond the standard runtime. |
apps/ |
SwiftUI on-device chat apps (iOS 27): CoreAIChat (Gemma 4 E2B GPU/ANE/⚡ + Qwen3.5 / Qwen3.5-2B / LFM2.5 / Granite ⚡pipelined, one picker) + QwenChatFast (Qwen3.5 static kernels) with in-app model download. |
BSD-3-Clause (LICENSE). Re-authored model code derives from Apple's BSD-3-Clause
coreai_models and retains its notices. Model weights follow their own licenses (see each
Hugging Face repo).
