Releases: Blaizzy/mlx-vlm
v0.4.4
What's Changed
- Fix Gemma 4 chunked prefill for KV-shared models and thinking by @Blaizzy in #901
- Fix Gemma 4 vision + text degradation and missing processor config by @Blaizzy in #906
- Fix Falcon-Perception 300M and move generate_perception to model by @Blaizzy in #910
- Fix Gemma 4 tool parser for nested arguments by @Blaizzy in #916
- Add VisionFeatureCache for multi-turn image caching by @Blaizzy in #913
- Fix broken video_generate and smolvlm_video_generate CLI commands by @Blaizzy in #919
- Optimize TurboQuant Metal kernels: 0.85-1.90x baseline with 89% KV savings by @Blaizzy in #909
Full Changelog: v0.4.3...v0.4.4
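PR #913's VisionFeatureCache lets image embeddings computed in one turn be reused in later turns of the same conversation, skipping the vision encoder for repeated images. Below is a minimal sketch of that caching idea; the class shape, method names, and stand-in encoder are all hypothetical and not the actual mlx-vlm API:

```python
import hashlib

class VisionFeatureCache:
    """Cache vision-encoder outputs keyed by a hash of the raw image bytes,
    so repeated images in a multi-turn chat skip the vision forward pass."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, image_bytes, encode_fn):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            # Only pay for the expensive encoder on a cache miss.
            self._store[key] = encode_fn(image_bytes)
        return self._store[key]

# Usage: the second turn reuses cached features instead of re-encoding.
cache = VisionFeatureCache()
fake_encoder = lambda b: [len(b)]  # stand-in for the real vision tower
feats1 = cache.get_or_compute(b"image-1", fake_encoder)  # miss: encodes
feats2 = cache.get_or_compute(b"image-1", fake_encoder)  # hit: cached
```

The key design choice is hashing the image bytes rather than trusting object identity, so the same image re-sent in a later request still hits the cache.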
v0.4.3
What's Changed
- Add SAM 3.1 with Object Multiplex and optimized realtime pipeline by @Blaizzy in #880
- Add Falcon-OCR model support by @Griffintaur in #879
- Add RF-DETR detection and segmentation model by @Blaizzy in #884
- Add granite vision 3.2 and 4.0 by @Blaizzy in #885
- Add wired_limit to vision model inference pipelines for CUDA support by @Blaizzy in #887
- Add falcon perception by @Blaizzy in #888
- fix(server): disable uvicorn reload by default to prevent memory leaks by @futurepitcher in #883
- Add Gemma 4 model support (vision, audio, MoE) by @Blaizzy in #890
- Fix Gemma 4 embedding scaling by @Blaizzy in #893
- Add Turbo Quant by @Blaizzy in #858
- Remove TurboQuant benchmark artifacts and add README docs by @Blaizzy in #894
New Contributors
- @Griffintaur made their first contribution in #879
- @futurepitcher made their first contribution in #883
Full Changelog: v0.4.2...v0.4.3
v0.4.2
What's Changed
- Ensure minimal metadata exists in model card by @pcuenca in #853
- fix(qwen3_5): register AutoProcessor patch for qwen3_5 model type by @mdstaff in #859
- Patch qwen3_5_moe auto processor by @Blaizzy in #867
- Cast Qwen3.5 RMSNorm output to input dtype by @Blaizzy in #868
- Fix thinking defaults in CLI and server by @Blaizzy in #869
- Fix LFM2-VL processor loading without torch by @Blaizzy in #872
- Fix Magistral image token expansion by @Blaizzy in #873
- Add Facebook Sam3 by @Blaizzy in #875
- [Sam3] label drawing for mask only in realtime video by @Blaizzy in #876
- Add DOTS-MOCR processor support by @Blaizzy in #874
- Fix PaliGemma processor kwarg routing by @Blaizzy in #877
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's Changed
- feat(server): add --model and --adapter-path flags for startup preloading by @auggie246 in #811
- fix(qwen3_5): guard batch dimension mismatches under continuous batching by @kol22 in #813
- Add common sampling parameters to server endpoints by @spicyneuron in #814
- Add an 'index' for each tool call as per OpenAI spec. by @viktike in #818
- Fix Qwen3-Omni (qwen3_omni_moe) integration by @howeirdo in #820
- Add mistral4 by @Blaizzy in #827
- Clean up server + generate args, validation, and defaults by @spicyneuron in #829
- Set custom processor as default and remove torch as dep by @Blaizzy in #821
- Add molmo point by @Blaizzy in #844
New Contributors
- @auggie246 made their first contribution in #811
- @howeirdo made their first contribution in #820
Full Changelog: v0.4.0...v0.4.1
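PR #818 adds an `index` to each tool call, matching the OpenAI chat-completions format, where streamed tool-call deltas carry an index so clients can tell which in-progress call each fragment belongs to. A plain-Python illustration of why the field matters (this is a sketch of client-side reassembly, not the server's actual code):

```python
def assemble_tool_calls(deltas):
    """Merge streamed tool-call deltas into complete calls, using each
    delta's 'index' to route fragments to the right in-progress call."""
    calls = {}
    for d in deltas:
        slot = calls.setdefault(d["index"], {"name": None, "arguments": ""})
        if "name" in d:
            slot["name"] = d["name"]
        slot["arguments"] += d.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Two interleaved tool calls, each with its argument JSON split across chunks.
stream = [
    {"index": 0, "name": "get_weather", "arguments": '{"city":'},
    {"index": 1, "name": "get_time", "arguments": '{"tz":'},
    {"index": 0, "arguments": ' "Paris"}'},
    {"index": 1, "arguments": ' "UTC"}'},
]
merged = assemble_tool_calls(stream)
```

Without the index, a client receiving the interleaved stream above could not tell which call the `' "Paris"}'` fragment continues.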
v0.4.0
What's Changed
- Fix gemma3n short prompts by @Blaizzy in #751
- Adding full weight finetuning by @Goekdeniz-Guelmez in #499
- Initialize qwen3_5_moe.LanguageModel with _position_ids by @will-lms in #753
- Fix batch inference to use InputEmbeddingsFeatures by @Blaizzy in #760
- fix(qwen3_vl): reset _position_ids when new image/video arrives by @kol22 in #756
- fix: replace 13 bare excepts with except Exception by @haosenwang1018 in #765
- Reset position ids for Qwen3-VL-MoE image inputs by @kol22 in #761
- Fix text only single batch by @Blaizzy in #771
- Update LORA.md for the new trainer by @Goekdeniz-Guelmez in #775
- Support /v1/ prefix for OpenAI compatible endpoints by @spicyneuron in #783
- CORS middleware by @viktike in #766
- fix: README curl error by @yinzhidong in #788
- Streaming chatml response enhancements. by @viktike in #764
- Add KV cache (quantization) parameters to server. by @viktike in #776
- Fix mllama training and add more models in mapping by @Goekdeniz-Guelmez in #777
- Add thinking budget and flag by @Blaizzy in #789
- fix: add import guard for gradio in chat_ui entry point by @RtYkk in #787
- Tool calling in server by @viktike in #773
- Add Minicpm-o-2.5 by @Blaizzy in #791
- Add prefill-step-size optional command line argument for server. by @viktike in #792
- qwen3_omni_moe: fix visual_embeds_multiscale UnboundLocalError by @ronaldseoh in #794
- Add Phi-4-reasoning-vision-15B (phi4-siglip) by @Blaizzy in #796
- Add phi4mm by @Blaizzy in #797
- Add ensure_fused_sdpa function for optimized attention computation by @Blaizzy in #800
- [Internvl_chat] Add optional kwargs parameter to LanguageModel by @Blaizzy in #801
- Guard load img & audio by @Blaizzy in #803
- [Qwen3.5 MoE] Fix quant predicate by @Blaizzy in #804
- Adding orpo by @Goekdeniz-Guelmez in #795
- Add moondream3 by @Blaizzy in #807
- Remove sleep duration with zero delay by @Blaizzy in #808
- Update dependencies and version number by @Blaizzy in #809
New Contributors
- @kol22 made their first contribution in #756
- @haosenwang1018 made their first contribution in #765
- @spicyneuron made their first contribution in #783
- @viktike made their first contribution in #766
- @yinzhidong made their first contribution in #788
- @RtYkk made their first contribution in #787
- @ronaldseoh made their first contribution in #794
Full Changelog: v0.3.12...v0.4.0
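PR #783 makes the server accept both the bare endpoints and their `/v1/`-prefixed forms, since OpenAI-compatible clients default to paths like `/v1/chat/completions`. One simple way to support both is to strip the prefix before routing; the helper below is an illustrative sketch, not the server's actual implementation:

```python
def normalize_route(path):
    """Map OpenAI-style '/v1/...' paths onto the unprefixed routes,
    leaving already-bare paths untouched."""
    if path == "/v1" or path.startswith("/v1/"):
        path = path[len("/v1"):] or "/"
    return path

# Prefixed and unprefixed requests resolve to the same handler.
routes = [normalize_route(p) for p in
          ("/v1/chat/completions", "/chat/completions", "/v1/models")]
```

Normalizing at the edge keeps the route table itself free of duplicate prefixed entries.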
v0.3.12
What's Changed
- [MODEL] support qwen3.5 series by @JJJYmmm in #722
- qwen3_omni_moe: Fix destructuring error by @Nixuge in #723
- docs: clarify that some models need [torch] extra for torchvision by @Mr-Neutr0n in #716
- Use mlx-lm provided logits processors and samplers by @AG6GR in #724
- Fix LoRA training crash when config uses image_token_id by @JesseRod329 in #720
- video_generate.py: handle capitalized extensions in is_video_file by @Nixuge in #719
- Honor quantization_config by @pcuenca in #692
- [Ministral] Add FP8 dequantization for model weights by @Blaizzy in #727
- Fix LFM2.5VL processor compatibility patch image placeholder count by @mattjcly in #725
- Add activation quantization support for QQLinear layers by @Blaizzy in #728
- Refactor processor return types in utils.py to use ProcessorMixin by @Blaizzy in #729
- Fix glm (4v & 4v moe) broadcast by @Blaizzy in #731
- [FastVLM] fix dtype cast by @Blaizzy in #733
- Fix kimi broadcast by @Blaizzy in #734
- Add custom processor for fastvlm by @Blaizzy in #736
- Fix paligemma prefill and numerical precision by @Blaizzy in #737
- Fix idefics3 llama3 and smolvlm by @Blaizzy in #738
- docs(README): fix JSON formatting in curl examples by @chsdwn in #739
- fix chunk prefill in qwen3.5 moe by @JJJYmmm in #742
- Fix florence-2 (processor and infer) by @Blaizzy in #743
- Fix Qwen3.5 cast type and predicates by @Blaizzy in #744
- [Qwen2, 2.5] Fix vision overflow by @Blaizzy in #745
- [Ministral3] Fix multi-image by @Blaizzy in #747
- [Phiv3] Fix dtype cast by @Blaizzy in #748
- Add dots-ocr by @Blaizzy in #749
New Contributors
- @Nixuge made their first contribution in #723
- @Mr-Neutr0n made their first contribution in #716
- @AG6GR made their first contribution in #724
- @JesseRod329 made their first contribution in #720
- @chsdwn made their first contribution in #739
Full Changelog: v0.3.11...v0.3.12
v0.3.11
What's Changed
- Refactor input embedding handling in _generate_batch function by @Blaizzy in #694
- [Qwen2-VL ] Fix incorrect attention mask in Vision by @hturbe in #704
- Add GLM-OCR by @Blaizzy & @mikolaj92 in #706
- fix: Log underlying ImportError when model loading fails by @antonvice in #710
- Fix wired limit, deepstack eval and add prefill step size argument by @Blaizzy in #699
- [PaddleOCR] Fix hardcoded processor config by @Blaizzy in #712
- [SmolVLM] Refactor Attention class to calculate n_kv_heads dynamically by @Blaizzy in #713
New Contributors
- @hturbe made their first contribution in #704
- @mikolaj92 made their first contribution in #706
- @antonvice made their first contribution in #710
Full Changelog: v0.3.10...v0.3.11
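PR #713 derives `n_kv_heads` from the model's weights instead of hardcoding it: with grouped-query attention, the key projection emits `head_dim` features per KV head, so the head count falls out of the weight shapes. A hedged sketch of that arithmetic (function and argument names are illustrative, not mlx-vlm's):

```python
def infer_n_kv_heads(k_proj_out_dim, hidden_size, n_heads):
    """Derive the number of KV heads from weight shapes: head_dim comes
    from the query side, and k_proj outputs head_dim features per KV head."""
    head_dim = hidden_size // n_heads
    assert k_proj_out_dim % head_dim == 0, "k_proj shape incompatible with head_dim"
    return k_proj_out_dim // head_dim

# e.g. hidden_size 2048 with 32 query heads gives head_dim 64;
# a k_proj that outputs 512 features therefore has 8 KV heads (4:1 GQA).
n_kv = infer_n_kv_heads(k_proj_out_dim=512, hidden_size=2048, n_heads=32)
```

Computing the count this way keeps one attention class correct across checkpoints with different GQA ratios.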
v0.3.10
What's Changed
- Fix qk_norm for lighton by @Blaizzy in #615
- Fix GLM-4.6V vision types by @Blaizzy in #616
- Add support for rope parameters [GLM-4.6V MoE] by @Blaizzy in #617
- Enhance chat UI by @ivanfioravanti in #619
- Add support for TokenizersBackend by @Blaizzy in #621
- UI by @ivanfioravanti in #620
- docs: fix /chat/completion endpoint typo in README by @zenyr in #623
- Add Support for HunyuanOCR (hunyuan_vl) by @voxmenthe in #604
- Add Batch Generation by @Blaizzy in #538
- Add autoflake (remove unused imports) by @Blaizzy in #628
- Bump version to 0.3.10 by @Blaizzy in #632
- Fix qwen3_vl_moe kwargs forwarding by @mattjcly in #633
- Add support for Jina VLM by @hanxiao in #631
- Fix: Skip index.json in file copy to prevent overwrite by @mzau in #638
- Add Molmo2 support by @aliyovic in #639
- Fix glm46v by @Blaizzy in #629
- migrate qwen3omni to MLX by @hellopahe in #598
- Fix audio loading by @Blaizzy in #642
- Fix pixtral text-only regression from batch generation PR by @mzau in #644
- Fix: mlx-vlm.generate --chat when prompted with no images by @Deekshith-Dade in #648
- Enhance load_config to include generation_config.json and extract eos… by @cubist38 in #650
- Add LFM2.5-VL by @Blaizzy in #653
- Add MXFP4 Quantization Support by @zhutao100 in #514
- Add support for nvfp4 and mxfp8 by @Blaizzy in #657
- Fix server audio, chat and trust remote code by @Blaizzy in #660
- Fix tokenizer by @Blaizzy in #662
- Pass kwargs through to snapshot_download in get_model_path by @Deekshith-Dade in #663
- Revert "Pass kwargs through to snapshot_download in get_model_path" by @Blaizzy in #666
- [Model] Add PaddleOCR-VL Model Support by @zhang-prog in #656
- Remove _has_bytelevel_pretokenizer by @pcuenca in #668
- Feat/structured outputs by @cubist38 in #664
- Fix: Correct position embedding for LLM in PaddleOCR-VL by @zhang-prog in #672
- Use mx.fast.scaled_dot_product_attention in pixtral vision by @pherber3 in #677
- fix: Remove invalid parameter "None", explicitly specify assignment p… by @neil0306 in #687
- Add text prefill and input embeddings obj by @Blaizzy in #681
- Make sure weights are released during conversion by @pcuenca in #691
- Add deepseek ocr 2 by @Blaizzy in #690
- fix: prevent base64 image data from being tokenized as text by @ne2030 in #685
- Add Ernie-4.5-VL by @Blaizzy in #627
- TFMS v5 RC3 + Fix processor registry by @Blaizzy in #693
New Contributors
- @voxmenthe made their first contribution in #604
- @hanxiao made their first contribution in #631
- @mzau made their first contribution in #638
- @aliyovic made their first contribution in #639
- @hellopahe made their first contribution in #598
- @Deekshith-Dade made their first contribution in #648
- @cubist38 made their first contribution in #650
- @zhutao100 made their first contribution in #514
- @zhang-prog made their first contribution in #656
- @pherber3 made their first contribution in #677
- @neil0306 made their first contribution in #687
- @ne2030 made their first contribution in #685
Full Changelog: v0.3.9...v0.3.10
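PR #514 adds MXFP4 support. In the OCP microscaling format, each block of values shares one power-of-two scale, and every element is a 4-bit float (sign + E2M1) whose magnitudes form a small fixed grid. The round-trip below is a simplified sketch of the format's behavior, not mlx-vlm's kernel:

```python
import math

# Magnitudes representable by a 4-bit sign + E2M1 element (OCP MXFP4).
MXFP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def mxfp4_roundtrip(block):
    """Quantize-then-dequantize one block: pick a shared power-of-two scale
    that brings the block max onto the grid's top value, then round every
    element to the nearest representable magnitude."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = 2.0 ** math.ceil(math.log2(amax / MXFP4_GRID[-1]))
    out = []
    for v in block:
        mag = min(MXFP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(mag * scale, v))
    return out

# Small values collapse to coarse grid points; the block max survives exactly.
approx = mxfp4_roundtrip([0.1, -0.7, 2.4, -6.0])
```

The shared exponent is what keeps storage at ~4 bits per weight while still covering the block's dynamic range.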