Releases: Blaizzy/mlx-vlm
v0.4.4
What's Changed
- Fix Gemma 4 chunked prefill for KV-shared models and thinking by @Blaizzy in #901
- Fix Gemma 4 vision + text degradation and missing processor config by @Blaizzy in #906
- Fix Falcon-Perception 300M and move generate_perception to model by @Blaizzy in #910
- Fix Gemma 4 tool parser for nested arguments by @Blaizzy in #916
- Add VisionFeatureCache for multi-turn image caching by @Blaizzy in #913
- Fix broken video_generate and smolvlm_video_generate CLI commands by @Blaizzy in #919
- Optimize TurboQuant Metal kernels: 0.85-1.90x baseline with 89% KV savings by @Blaizzy in #909
Full Changelog: v0.4.3...v0.4.4
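PR #913's VisionFeatureCache lets image embeddings computed in one turn be reused in later turns of the same conversation, skipping the vision encoder for repeated images. Below is a minimal sketch of that caching idea; the class shape, method names, and stand-in encoder are all hypothetical and not the actual mlx-vlm API:

```python
import hashlib

class VisionFeatureCache:
    """Cache vision-encoder outputs keyed by a hash of the raw image bytes,
    so repeated images in a multi-turn chat skip the vision forward pass."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, image_bytes, encode_fn):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            # Only pay for the expensive encoder on a cache miss.
            self._store[key] = encode_fn(image_bytes)
        return self._store[key]

# Usage: the second turn reuses cached features instead of re-encoding.
cache = VisionFeatureCache()
fake_encoder = lambda b: [len(b)]  # stand-in for the real vision tower
feats1 = cache.get_or_compute(b"image-1", fake_encoder)  # miss: encodes
feats2 = cache.get_or_compute(b"image-1", fake_encoder)  # hit: cached
```

The key design choice is hashing the image bytes rather than trusting object identity, so the same image re-sent in a later request still hits the cache.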
v0.4.3
What's Changed
- Add SAM 3.1 with Object Multiplex and optimized realtime pipeline by @Blaizzy in #880
- Add Falcon-OCR model support by @Griffintaur in #879
- Add RF-DETR detection and segmentation model by @Blaizzy in #884
- Add granite vision 3.2 and 4.0 by @Blaizzy in #885
- Add wired_limit to vision model inference pipelines for CUDA support by @Blaizzy in #887
- Add falcon perception by @Blaizzy in #888
- fix(server): disable uvicorn reload by default to prevent memory leaks by @futurepitcher in #883
- Add Gemma 4 model support (vision, audio, MoE) by @Blaizzy in #890
- Fix Gemma 4 embedding scaling by @Blaizzy in #893
- Add Turbo Quant by @Blaizzy in #858
- Remove TurboQuant benchmark artifacts and add README docs by @Blaizzy in #894
New Contributors
- @Griffintaur made their first contribution in #879
- @futurepitcher made their first contribution in #883
Full Changelog: v0.4.2...v0.4.3
v0.4.2
What's Changed
- Ensure minimal metadata exists in model card by @pcuenca in #853
- fix(qwen3_5): register AutoProcessor patch for qwen3_5 model type by @mdstaff in #859
- Patch qwen3_5_moe auto processor by @Blaizzy in #867
- Cast Qwen3.5 RMSNorm output to input dtype by @Blaizzy in #868
- Fix thinking defaults in CLI and server by @Blaizzy in #869
- Fix LFM2-VL processor loading without torch by @Blaizzy in #872
- Fix Magistral image token expansion by @Blaizzy in #873
- Add Facebook Sam3 by @Blaizzy in #875
- [Sam3] label drawing for mask only in realtime video by @Blaizzy in #876
- Add DOTS-MOCR processor support by @Blaizzy in #874
- Fix PaliGemma processor kwarg routing by @Blaizzy in #877
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's Changed
- feat(server): add --model and --adapter-path flags for startup preloading by @auggie246 in #811
- fix(qwen3_5): guard batch dimension mismatches under continuous batching by @kol22 in #813
- Add common sampling parameters to server endpoints by @spicyneuron in #814
- Add an 'index' for each tool call as per OpenAI spec. by @viktike in #818
- Fix Qwen3-Omni (qwen3_omni_moe) integration by @howeirdo in #820
- Add mistral4 by @Blaizzy in #827
- Clean up server + generate args, validation, and defaults by @spicyneuron in #829
- Set custom processor as default and remove torch as dep by @Blaizzy in #821
- Add molmo point by @Blaizzy in #844
New Contributors
- @auggie246 made their first contribution in #811
- @howeirdo made their first contribution in #820
Full Changelog: v0.4.0...v0.4.1
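PR #818 adds an `index` to each tool call, matching the OpenAI chat-completions format, where streamed tool-call deltas carry an index so clients can tell which in-progress call each fragment belongs to. A plain-Python illustration of why the field matters (this is a sketch of client-side reassembly, not the server's actual code):

```python
def assemble_tool_calls(deltas):
    """Merge streamed tool-call deltas into complete calls, using each
    delta's 'index' to route fragments to the right in-progress call."""
    calls = {}
    for d in deltas:
        slot = calls.setdefault(d["index"], {"name": None, "arguments": ""})
        if "name" in d:
            slot["name"] = d["name"]
        slot["arguments"] += d.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Two interleaved tool calls, each with its argument JSON split across chunks.
stream = [
    {"index": 0, "name": "get_weather", "arguments": '{"city":'},
    {"index": 1, "name": "get_time", "arguments": '{"tz":'},
    {"index": 0, "arguments": ' "Paris"}'},
    {"index": 1, "arguments": ' "UTC"}'},
]
merged = assemble_tool_calls(stream)
```

Without the index, a client receiving the interleaved stream above could not tell which call the `' "Paris"}'` fragment continues.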
v0.4.0
What's Changed
- Fix gemma3n short prompts by @Blaizzy in #751
- Adding full weight finetuning by @Goekdeniz-Guelmez in #499
- Initialize qwen3_5_moe.LanguageModel with _position_ids by @will-lms in #753
- Fix batch inference to use InputEmbeddingsFeatures by @Blaizzy in #760
- fix(qwen3_vl): reset _position_ids when new image/video arrives by @kol22 in #756
- fix: replace 13 bare excepts with except Exception by @haosenwang1018 in #765
- Reset position ids for Qwen3-VL-MoE image inputs by @kol22 in #761
- Fix text only single batch by @Blaizzy in #771
- Update LORA.md for the new trainer by @Goekdeniz-Guelmez in #775
- Support /v1/ prefix for OpenAI compatible endpoints by @spicyneuron in #783
- CORS middleware by @viktike in #766
- fix: README curl error by @yinzhidong in #788
- Streaming chatml response enhancements. by @viktike in #764
- Add KV cache (quantization) parameters to server. by @viktike in #776
- Fix mllama training and add more models in mapping by @Goekdeniz-Guelmez in #777
- Add thinking budget and flag by @Blaizzy in #789
- fix: add import guard for gradio in chat_ui entry point by @RtYkk in #787
- Tool calling in server by @viktike in #773
- Add Minicpm-o-2.5 by @Blaizzy in #791
- Add prefill-step-size optional command line argument for server. by @viktike in #792
- qwen3_omni_moe: fix visual_embeds_multiscale UnboundLocalError by @ronaldseoh in #794
- Add Phi-4-reasoning-vision-15B (phi4-siglip) by @Blaizzy in #796
- Add phi4mm by @Blaizzy in #797
- Add ensure_fused_sdpa function for optimized attention computation by @Blaizzy in #800
- [Internvl_chat] Add optional kwargs parameter to LanguageModel by @Blaizzy in #801
- Guard load img & audio by @Blaizzy in #803
- [Qwen3.5 MoE] Fix quant predicate by @Blaizzy in #804
- Adding orpo by @Goekdeniz-Guelmez in #795
- Add moondream3 by @Blaizzy in #807
- Remove sleep duration with zero delay by @Blaizzy in #808
- Update dependencies and version number by @Blaizzy in #809
New Contributors
- @kol22 made their first contribution in #756
- @haosenwang1018 made their first contribution in #765
- @spicyneuron made their first contribution in #783
- @viktike made their first contribution in #766
- @yinzhidong made their first contribution in #788
- @RtYkk made their first contribution in #787
- @ronaldseoh made their first contribution in #794
Full Changelog: v0.3.12...v0.4.0
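PR #783 makes the server accept both the bare endpoints and their `/v1/`-prefixed forms, since OpenAI-compatible clients default to paths like `/v1/chat/completions`. One simple way to support both is to strip the prefix before routing; the helper below is an illustrative sketch, not the server's actual implementation:

```python
def normalize_route(path):
    """Map OpenAI-style '/v1/...' paths onto the unprefixed routes,
    leaving already-bare paths untouched."""
    if path == "/v1" or path.startswith("/v1/"):
        path = path[len("/v1"):] or "/"
    return path

# Prefixed and unprefixed requests resolve to the same handler.
routes = [normalize_route(p) for p in
          ("/v1/chat/completions", "/chat/completions", "/v1/models")]
```

Normalizing at the edge keeps the route table itself free of duplicate prefixed entries.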
v0.3.12
What's Changed
- [MODEL] support qwen3.5 series by @JJJYmmm in #722
- qwen3_omni_moe: Fix destructuring error by @Nixuge in #723
- docs: clarify that some models need [torch] extra for torchvision by @Mr-Neutr0n in #716
- Use mlx-lm provided logits processors and samplers by @AG6GR in #724
- Fix LoRA training crash when config uses image_token_id by @JesseRod329 in #720
- video_generate.py: handle capitalized extensions in is_video_file by @Nixuge in #719
- Honor quantization_config by @pcuenca in #692
- [Ministral] Add FP8 dequantization for model weights by @Blaizzy in #727
- Fix LFM2.5VL processor compatibility patch image placeholder count by @mattjcly in #725
- Add activation quantization support for QQLinear layers by @Blaizzy in #728
- Refactor processor return types in utils.py to use ProcessorMixin by @Blaizzy in #729
- Fix glm (4v & 4v moe) broadcast by @Blaizzy in #731
- [FastVLM] fix dtype cast by @Blaizzy in #733
- Fix kimi broadcast by @Blaizzy in #734
- Add custom processor for fastvlm by @Blaizzy in #736
- Fix paligemma prefill and numerical precision by @Blaizzy in #737
- Fix idefics3 llama3 and smolvlm by @Blaizzy in #738
- docs(README): fix JSON formatting in curl examples by @chsdwn in #739
- fix chunk prefill in qwen3.5 moe by @JJJYmmm in #742
- Fix florence-2 (processor and infer) by @Blaizzy in #743
- Fix Qwen3.5 cast type and predicates by @Blaizzy in #744
- [Qwen2, 2.5] Fix vision overflow by @Blaizzy in #745
- [Ministral3] Fix multi-image by @Blaizzy in #747
- [Phiv3] Fix dtype cast by @Blaizzy in #748
- Add dots-ocr by @Blaizzy in #749
New Contributors
- @Nixuge made their first contribution in #723
- @Mr-Neutr0n made their first contribution in #716
- @AG6GR made their first contribution in #724
- @JesseRod329 made their first contribution in #720
- @chsdwn made their first contribution in #739
Full Changelog: v0.3.11...v0.3.12
v0.3.11
What's Changed
- Refactor input embedding handling in _generate_batch function by @Blaizzy in #694
- [Qwen2-VL ] Fix incorrect attention mask in Vision by @hturbe in #704
- Add GLM-OCR by @Blaizzy & @mikolaj92 in #706
- fix: Log underlying ImportError when model loading fails by @antonvice in #710
- Fix wired limit, deepstack eval and add prefill step size argument by @Blaizzy in #699
- [PaddleOCR] Fix hardcoded processor config by @Blaizzy in #712
- [SmolVLM] Refactor Attention class to calculate n_kv_heads dynamically by @Blaizzy in #713
New Contributors
- @hturbe made their first contribution in #704
- @mikolaj92 made their first contribution in #706
- @antonvice made their first contribution in #710
Full Changelog: v0.3.10...v0.3.11
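PR #713 derives `n_kv_heads` from the model's weights instead of hardcoding it: with grouped-query attention, the key projection emits `head_dim` features per KV head, so the head count falls out of the weight shapes. A hedged sketch of that arithmetic (function and argument names are illustrative, not mlx-vlm's):

```python
def infer_n_kv_heads(k_proj_out_dim, hidden_size, n_heads):
    """Derive the number of KV heads from weight shapes: head_dim comes
    from the query side, and k_proj outputs head_dim features per KV head."""
    head_dim = hidden_size // n_heads
    assert k_proj_out_dim % head_dim == 0, "k_proj shape incompatible with head_dim"
    return k_proj_out_dim // head_dim

# e.g. hidden_size 2048 with 32 query heads gives head_dim 64;
# a k_proj that outputs 512 features therefore has 8 KV heads (4:1 GQA).
n_kv = infer_n_kv_heads(k_proj_out_dim=512, hidden_size=2048, n_heads=32)
```

Computing the count this way keeps one attention class correct across checkpoints with different GQA ratios.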
v0.3.10
What's Changed
- Fix qk_norm for lighton by @Blaizzy in #615
- Fix GLM-4.6V vision types by @Blaizzy in #616
- Add support for rope parameters [GLM-4.6V MoE] by @Blaizzy in #617
- Enhance chat UI by @ivanfioravanti in #619
- Add support for TokenizersBackend by @Blaizzy in #621
- UI by @ivanfioravanti in #620
- docs: fix /chat/completion endpoint typo in README by @zenyr in #623
- Add Support for HunyuanOCR (hunyuan_vl) by @voxmenthe in #604
- Add Batch Generation by @Blaizzy in #538
- Add autoflake (remove unused imports) by @Blaizzy in #628
- Bump version to 0.3.10 by @Blaizzy in #632
- Fix qwen3_vl_moe kwargs forwarding by @mattjcly in #633
- Add support for Jina VLM by @hanxiao in #631
- Fix: Skip index.json in file copy to prevent overwrite by @mzau in #638
- Add Molmo2 support by @aliyovic in #639
- Fix glm46v by @Blaizzy in #629
- migrate qwen3omni to MLX by @hellopahe in #598
- Fix audio loading by @Blaizzy in #642
- Fix pixtral text-only regression from batch generation PR by @mzau in #644
- Fix: mlx-vlm.generate --chat when prompted with no images by @Deekshith-Dade in #648
- Enhance load_config to include generation_config.json and extract eos… by @cubist38 in #650
- Add LFM2.5-VL by @Blaizzy in #653
- Add MXFP4 Quantization Support by @zhutao100 in #514
- Add support for nvfp4 and mxfp8 by @Blaizzy in #657
- Fix server audio, chat and trust remote code by @Blaizzy in #660
- Fix tokenizer by @Blaizzy in #662
- Pass kwargs through to snapshot_download in get_model_path by @Deekshith-Dade in #663
- Revert "Pass kwargs through to snapshot_download in get_model_path" by @Blaizzy in #666
- [Model] Add PaddleOCR-VL Model Support by @zhang-prog in #656
- Remove _has_bytelevel_pretokenizer by @pcuenca in #668
- Feat/structured outputs by @cubist38 in #664
- Fix: Correct position embedding for LLM in PaddleOCR-VL by @zhang-prog in #672
- Use mx.fast.scaled_dot_product_attention in pixtral vision by @pherber3 in #677
- fix: Remove invalid parameter "None", explicitly specify assignment p… by @neil0306 in #687
- Add text prefill and input embeddings obj by @Blaizzy in #681
- Make sure weights are released during conversion by @pcuenca in #691
- Add deepseek ocr 2 by @Blaizzy in #690
- fix: prevent base64 image data from being tokenized as text by @ne2030 in #685
- Add Ernie-4.5-VL by @Blaizzy in #627
- TFMS v5 RC3 + Fix processor registry by @Blaizzy in #693
New Contributors
- @voxmenthe made their first contribution in #604
- @hanxiao made their first contribution in #631
- @mzau made their first contribution in #638
- @aliyovic made their first contribution in #639
- @hellopahe made their first contribution in #598
- @Deekshith-Dade made their first contribution in #648
- @cubist38 made their first contribution in #650
- @zhutao100 made their first contribution in #514
- @zhang-prog made their first contribution in #656
- @pherber3 made their first contribution in #677
- @neil0306 made their first contribution in #687
- @ne2030 made their first contribution in #685
Full Changelog: v0.3.9...v0.3.10
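PR #514 adds MXFP4 support. In the OCP microscaling format, each block of values shares one power-of-two scale, and every element is a 4-bit float (sign + E2M1) whose magnitudes form a small fixed grid. The round-trip below is a simplified sketch of the format's behavior, not mlx-vlm's kernel:

```python
import math

# Magnitudes representable by a 4-bit sign + E2M1 element (OCP MXFP4).
MXFP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def mxfp4_roundtrip(block):
    """Quantize-then-dequantize one block: pick a shared power-of-two scale
    that brings the block max onto the grid's top value, then round every
    element to the nearest representable magnitude."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = 2.0 ** math.ceil(math.log2(amax / MXFP4_GRID[-1]))
    out = []
    for v in block:
        mag = min(MXFP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(mag * scale, v))
    return out

# Small values collapse to coarse grid points; the block max survives exactly.
approx = mxfp4_roundtrip([0.1, -0.7, 2.4, -6.0])
```

The shared exponent is what keeps storage at ~4 bits per weight while still covering the block's dynamic range.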