CoreAI-Model-Zoo

LLMs converted to Apple Core AI (`.aimodel`, iOS 27 / macOS 27): downloadable, verified on-device, and shipped with the conversion code and a knowledge base. Successor to CoreML-Models.

Models

| Model | Download (`.aimodel`) | License |
| --- | --- | --- |
| Qwen3.5-0.8B | 🤗 qwen3.5-0.8B-CoreAI | Apache-2.0 |
| Qwen3.5-2B | 🤗 qwen3.5-2B-CoreAI | Apache-2.0 |
| Qwen3.6-35B-A3B (MoE, Mac-only) | 🤗 Qwen3.6-35B-A3B-CoreAI | Apache-2.0 |
| Qwen3.6-27B (dense, Mac-only) | 🤗 Qwen3.6-27B-CoreAI | Apache-2.0 |
| GLM-4.7-Flash (MoE + MLA, Mac-only; the zoo's first MLA) | 🤗 GLM-4.7-Flash-CoreAI | MIT |
| Gemma 4 E2B (text, incl. official-QAT int4) | 🤗 gemma-4-E2B-CoreAI | Gemma |
| Gemma 4 E4B (text, official-QAT int4) | 🤗 gemma-4-E4B-CoreAI | Gemma |
| Gemma 4 12B (dense, Mac-only; custom flash-decode kernel ‡) | 🤗 Gemma-4-12B-CoreAI | Gemma |
| Gemma 4 31B (dense, Mac-only; custom flash-decode kernel ‡) | 🤗 Gemma-4-31B-CoreAI | Gemma |
| LFM2.5-1.2B-Instruct | 🤗 LFM2.5-1.2B-CoreAI | LFM Open License v1.0 |
| LFM2.5-8B-A1B (MoE, custom `gather_qmm` kernel; first iPhone MoE) | 🤗 LFM2.5-8B-A1B-CoreAI | LFM Open License v1.0 |
| Granite 4.0-H 1B / 350M | 🤗 granite-4.0-h-CoreAI | Apache-2.0 |
| Qwen3-VL (vision-language) | 🤗 2B · 4B · 8B | Apache-2.0 |
| MiniCPM-V 4.6 (vision-language, sub-2B; strongest tiny VLM) | 🤗 MiniCPM-V-4.6-CoreAI | Apache-2.0 |
| Gemma 4 E2B vision (VL) (image+text) | `vl/` in 🤗 gemma-4-E2B-CoreAI | Gemma |
| EmbeddingGemma 300M (text embeddings for on-device RAG / semantic search) | 🤗 embeddinggemma-300m-CoreAI | Gemma |
| Qwen3-Embedding 0.6B (multilingual text embeddings, last-token pooling + MRL) | 🤗 Qwen3-Embedding-0.6B-CoreAI | Apache-2.0 |
| Qwen3-Reranker 0.6B (cross-encoder reranker; yes/no relevance score) | 🤗 Qwen3-Reranker-0.6B-CoreAI | Apache-2.0 |
| RF-DETR nano/small/medium/large (object detection, no NMS) | 🤗 RF-DETR-CoreAI | Apache-2.0 |
| RF-DETR-Seg nano→2xlarge (instance segmentation, 6 sizes) | 🤗 RF-DETR-CoreAI | Apache-2.0 |
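
The embedding and reranker entries above mention last-token pooling, MRL (Matryoshka-style truncation), and a yes/no relevance score. The sketch below shows what those terms usually mean, in plain Python on toy lists. It is not the zoo's code: every function name is hypothetical, right-padded inputs are assumed, and how each bundle actually exposes hidden states or logits is left to its model card.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token.

    Assumes right padding: attention_mask is 1 for real tokens, 0 for pads.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return hidden_states[last]


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mrl_truncate(embedding, dim):
    """Keep the first `dim` components, then re-normalize.

    This is the usual way a Matryoshka-trained embedding is shortened.
    """
    return l2_normalize(embedding[:dim])


def yes_no_score(yes_logit, no_logit):
    """Two-way softmax over the 'yes' and 'no' logits; returns P(yes)."""
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)
```

In a retrieval pipeline built this way, the pooled and truncated vectors would feed a cosine-similarity search, and `yes_no_score` would rank the shortlisted query/document pairs.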

Decode throughput (tok/s, greedy; output top-1 exact vs the Hugging Face reference)

| Model | iPhone 17 Pro · GPU | iPhone 17 Pro · ANE | M4 Max · GPU |
| --- | --- | --- | --- |
| Qwen3.5-0.8B | 71.9 | 14.7 | 210 |
| Qwen3.5-2B | 29 | — | 161 |
| LFM2.5-1.2B | 45.4 | — | 276.5 |
| Granite 4.0-H 1B | 36.3 | — | 136.5 |
| Gemma 4 E2B | 30.3 (QAT 30.7) | 6 | 77.0 (QAT 78.9) |
| Gemma 4 E4B (official QAT) | 15.1 | — | 55.8 |
| Gemma 4 E2B VL (image+text, official QAT) | 25.5 | — | 82.4 |
| MiniCPM-V 4.6 (vision-language, sub-2B) | 53.4 | — | 224.3 |
| Qwen3.6-35B-A3B (MoE, 35B/~3B active, Mac-only) | — | — | 64.9 † |
| Qwen3.6-27B (dense, Mac-only) | — | — | 15.9 |
| GLM-4.7-Flash (MoE + MLA, 30B/~3B active, Mac-only) | — | — | 52.4 † |
| Gemma 4 12B (dense, Mac-only) | — | — | 23 int8 / 33 int4 ‡ |
| Gemma 4 31B (dense, Mac-only) | — | — | 17.2 int4 ‡ |

Measured on the iOS 27 / macOS 27 beta with Apple's coreai-pipelined GPU engine and zero custom kernels, except for the ANE column and the entries marked † or ‡.

† MoE bundle that uses the custom `gather_qmm` Metal kernel, which reads only the routed experts.

‡ Dense bundle whose full/global-attention SDPA runs through a custom flash-decode Metal kernel. The stock MPSGraph SDPA crashes on the ≥16-head × 512 Q configuration (a GPU scratch-heap overflow, apple/coreai-models#27), so these models cannot run without it.

For prefill, sizes, per-model caveats, and the Mac-only big models, see `zoo/`.
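
For readers unfamiliar with the column semantics (tok/s, greedy, top-1 exact), here is a minimal sketch of how such numbers could be produced. It is an illustration under stated assumptions, not the benchmark harness used for this section: `decode_step` is a hypothetical stateful callable that takes the previous token ID and returns next-token logits, prefill is assumed to have already run, and the clock is injectable so the timing can be tested.

```python
import time


def argmax(logits):
    """Index of the largest logit (first one wins on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def greedy_decode(decode_step, first_token, max_new_tokens, clock=time.perf_counter):
    """Greedy decoding with a throughput measurement.

    decode_step(token_id) -> logits is a hypothetical callable that keeps its
    own KV state. Returns (generated_ids, tokens_per_second).
    """
    token = first_token
    generated = []
    start = clock()
    for _ in range(max_new_tokens):
        token = argmax(decode_step(token))  # greedy: always pick top-1
        generated.append(token)
    elapsed = clock() - start
    tok_per_s = len(generated) / elapsed if elapsed > 0 else float("inf")
    return generated, tok_per_s


def top1_exact(candidate_ids, reference_ids):
    """True only if both runs produced the same token at every position."""
    return len(candidate_ids) == len(reference_ids) and all(
        a == b for a, b in zip(candidate_ids, reference_ids)
    )
```

Under this reading, "top-1 exact" means the converted bundle's greedy token sequence matches the reference implementation's token for token, which is a stricter check than comparing logits within a tolerance.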

CoreAIChat (`apps/`): the zoo's models running on-device on iPhone.

Start here

- Try the app (iOS 27 / macOS 27 beta; the model downloads in-app):
  - Demo app, no build → Mac: .dmg (notarized, runs the Mac-only bundles) · iPhone: CoreAIChat on TestFlight (coming soon)
  - Build it → `apps/` (Xcode 27 beta + xcodegen, the coreai-models patch stack + `tokenizer.json`)
- Run a model in your own app → `knowledge/swift-runtime.md` + the model card
- Convert a model → `knowledge/conversion-guide.md`
- Compress → `knowledge/compression.md`
- Make it fast → `knowledge/custom-metal-kernels.md` · `knowledge/performance-ceiling.md`
- Known beta issue (in-graph KV-write crash; workarounds + the input-mask escape) → `knowledge/coreai-beta-mpsgraph-kvwrite-bug.md` (FB23024751 / apple/coreai-models#5)

Repository layout

| Dir | What |
| --- | --- |
| `zoo/` | Model cards: configurations, sizes, parity, measured throughput. |
| `knowledge/` | Verified notes on the framework: conversion, compression, stateful KV, custom Metal kernels, AOT, compute-unit rules, the Swift runtime. |
| `conversion/` | Re-authored models + convert / verify / compress scripts (PyTorch → `.aimodel`). |
| `swift/` | `CoreAIRunner`, a Swift package that drives `.aimodel` LLM bundles, including architectures beyond the standard runtime. |
| `apps/` | SwiftUI on-device chat apps (iOS 27) with in-app model download: CoreAIChat (Gemma 4 E2B GPU/ANE/⚡ + Qwen3.5 / Qwen3.5-2B / LFM2.5 / Granite ⚡pipelined, one picker) and QwenChatFast (Qwen3.5 static kernels). |

License

BSD-3-Clause (LICENSE). Re-authored model code derives from Apple's BSD-3-Clause coreai_models and retains its notices. Model weights follow their own licenses (see each Hugging Face repo).
