Skip to content

john-rocky/coreai-model-zoo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

99 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CoreAI-Model-Zoo

LLMs converted to Apple Core AI (.aimodel, iOS 27 / macOS 27) — downloadable, verified on-device, with the conversion code and a knowledge base. Successor to CoreML-Models.

Models

Model Download (.aimodel) License
Qwen3.5-0.8B 🤗 qwen3.5-0.8B-CoreAI Apache-2.0
Qwen3.5-2B 🤗 qwen3.5-2B-CoreAI Apache-2.0
Qwen3.6-35B-A3B (MoE, Mac-only) 🤗 Qwen3.6-35B-A3B-CoreAI Apache-2.0
Qwen3.6-27B (dense, Mac-only) 🤗 Qwen3.6-27B-CoreAI Apache-2.0
GLM-4.7-Flash (MoE + MLA, Mac-only — zoo's first MLA) 🤗 GLM-4.7-Flash-CoreAI MIT
Gemma 4 E2B (text, incl. official-QAT int4) 🤗 gemma-4-E2B-CoreAI Gemma
Gemma 4 E4B (text, official-QAT int4) 🤗 gemma-4-E4B-CoreAI Gemma
Gemma 4 12B (dense, Mac-only — custom flash-decode kernel ‡) 🤗 Gemma-4-12B-CoreAI Gemma
Gemma 4 31B (dense, Mac-only — custom flash-decode kernel ‡) 🤗 Gemma-4-31B-CoreAI Gemma
LFM2.5-1.2B-Instruct 🤗 LFM2.5-1.2B-CoreAI LFM Open License v1.0
LFM2.5-8B-A1B (MoE, custom gather_qmm kernel — first iPhone MoE) 🤗 LFM2.5-8B-A1B-CoreAI LFM Open License v1.0
Granite 4.0-H 1B / 350M 🤗 granite-4.0-h-CoreAI Apache-2.0
Qwen3-VL (vision-language) 🤗 2B · 4B · 8B Apache-2.0
MiniCPM-V 4.6 (vision-language, sub-2B — strongest tiny VLM) 🤗 MiniCPM-V-4.6-CoreAI Apache-2.0
Gemma 4 E2B vision (VL) (image+text) vl/ in 🤗 gemma-4-E2B-CoreAI Gemma
EmbeddingGemma 300M (text embeddings — on-device RAG / semantic search) 🤗 embeddinggemma-300m-CoreAI Gemma
Qwen3-Embedding 0.6B (multilingual text embeddings, last-token pooling + MRL) 🤗 Qwen3-Embedding-0.6B-CoreAI Apache-2.0
Qwen3-Reranker 0.6B (cross-encoder reranker — yes/no relevance score) 🤗 Qwen3-Reranker-0.6B-CoreAI Apache-2.0
RF-DETR nano/small/medium/large (object detection, no NMS) 🤗 RF-DETR-CoreAI Apache-2.0
RF-DETR-Seg nano→2xlarge (instance segmentation, 6 sizes) 🤗 RF-DETR-CoreAI Apache-2.0

Decode throughput (tok/s, greedy; output top-1 exact vs the Hugging Face reference)

iPhone 17 Pro · GPU iPhone 17 Pro · ANE M4 Max · GPU
Qwen3.5-0.8B 71.9 14.7 210
Qwen3.5-2B 29 161
LFM2.5-1.2B 45.4 276.5
Granite 4.0-H 1B 36.3 136.5
Gemma 4 E2B 30.3 (QAT 30.7) 6 77.0 (QAT 78.9)
Gemma 4 E4B (official QAT) 15.1 55.8
Gemma 4 E2B VL (image+text, official QAT) 25.5 82.4
MiniCPM-V 4.6 (vision-language, sub-2B) 53.4 224.3
Qwen3.6-35B-A3B (MoE, 35B/~3B active, Mac-only) 64.9
Qwen3.6-27B (dense, Mac-only) 15.9
GLM-4.7-Flash (MoE + MLA, 30B/~3B active, Mac-only) 52.4
Gemma 4 12B (dense, Mac-only) 23 int8 / 33 int4 ‡
Gemma 4 31B (dense, Mac-only) 17.2 int4 ‡

Measured on the iOS 27 / macOS 27 beta, Apple's coreai-pipelined GPU engine, zero custom kernels (ANE column + / excepted). = MoE bundle using the custom gather_qmm Metal kernel (reads only the routed experts). = dense bundle whose full/global-attention SDPA is a custom flash-decode Metal kernel — the stock MPSGraph SDPA crashes on the ≥16-head × 512 Q (a GPU scratch-heap overflow, apple/coreai-models#27), so these models are unrunnable without it. Prefill, sizes, per-model caveats, and the Mac-only big models: zoo/.

CoreAIChat screen recording

CoreAIChat (apps/) — the zoo's models running on-device on iPhone.

Start here

Repository layout

Dir What
zoo/ Model cards — configurations, sizes, parity, measured throughput.
knowledge/ Verified notes on the framework: conversion, compression, stateful KV, custom Metal kernels, AOT, compute-unit rules, the Swift runtime.
conversion/ Re-authored models + convert / verify / compress scripts (PyTorch → .aimodel).
swift/ CoreAIRunner — a Swift package that drives .aimodel LLM bundles, including architectures beyond the standard runtime.
apps/ SwiftUI on-device chat apps (iOS 27): CoreAIChat (Gemma 4 E2B GPU/ANE/⚡ + Qwen3.5 / Qwen3.5-2B / LFM2.5 / Granite ⚡pipelined, one picker) + QwenChatFast (Qwen3.5 static kernels) with in-app model download.

License

BSD-3-Clause (LICENSE). Re-authored model code derives from Apple's BSD-3-Clause coreai_models and retains its notices. Model weights follow their own licenses (see each Hugging Face repo).

About

Community model zoo + knowledge base for Apple Core AI (iOS/macOS 27): Qwen3.5 & Gemma 4 converted end-to-end, verified on-device (iPhone 17 Pro GPU/ANE), conversion gotchas, custom Metal kernels, Swift runner

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors