Releases: Blaizzy/mlx-vlm

v0.4.4

04 Apr 15:18
90732bd

What's Changed

  • Fix Gemma 4 chunked prefill for KV-shared models and thinking by @Blaizzy in #901
  • Fix Gemma 4 vision + text degradation and missing processor config by @Blaizzy in #906
  • Fix Falcon-Perception 300M and move generate_perception to model by @Blaizzy in #910
  • Fix Gemma 4 tool parser for nested arguments by @Blaizzy in #916
  • Add VisionFeatureCache for multi-turn image caching by @Blaizzy in #913
  • Fix broken video_generate and smolvlm_video_generate CLI commands by @Blaizzy in #919
  • Optimize TurboQuant Metal kernels: 0.85-1.90x baseline with 89% KV savings by @Blaizzy in #909

Full Changelog: v0.4.3...v0.4.4
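
The headline feature here is #913's multi-turn image caching. A minimal sketch of the idea — memoize the vision tower's output keyed by a hash of the image, so repeated turns over the same image skip re-encoding. All class and method names below are illustrative, not mlx-vlm's actual API:

```python
import hashlib

class VisionFeatureCache:
    """Illustrative cache mapping image bytes -> encoded vision features."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn   # stands in for the expensive vision-tower forward pass
        self._store = {}
        self.hits = 0

    def get_features(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            self.hits += 1           # multi-turn reuse: skip re-encoding
        else:
            self._store[key] = self.encode_fn(image_bytes)
        return self._store[key]

# Toy "encoder" that records how often it actually runs.
calls = []
def encode(img):
    calls.append(img)
    return [len(img)]                # stand-in for real feature tensors

cache = VisionFeatureCache(encode)
img = b"fake-image-bytes"
for _ in range(3):                   # three conversation turns, same image
    feats = cache.get_features(img)

print(len(calls), cache.hits)        # 1 2  (encoder ran once; two cache hits)
```

In a real multi-turn chat the cached value would be the image-patch embeddings, so only the text tokens of later turns pay for a forward pass.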

v0.4.3

02 Apr 21:14
391179c

What's Changed

  • Add SAM 3.1 with Object Multiplex and optimized realtime pipeline by @Blaizzy in #880
  • Add Falcon-OCR model support by @Griffintaur in #879
  • Add RF-DETR detection and segmentation model by @Blaizzy in #884
  • Add Granite Vision 3.2 and 4.0 by @Blaizzy in #885
  • Add wired_limit to vision model inference pipelines for CUDA support by @Blaizzy in #887
  • Add Falcon-Perception by @Blaizzy in #888
  • fix(server): disable uvicorn reload by default to prevent memory leaks by @futurepitcher in #883
  • Add Gemma 4 model support (vision, audio, MoE) by @Blaizzy in #890
  • Fix Gemma 4 embedding scaling by @Blaizzy in #893
  • Add TurboQuant by @Blaizzy in #858
  • Remove TurboQuant benchmark artifacts and add README docs by @Blaizzy in #894

New Contributors

Full Changelog: v0.4.2...v0.4.3
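
One fix above deserves a note: #883 makes the server's uvicorn auto-reload opt-in, because the reloader keeps a file-watcher process alive and is meant for development, not long-running deployments. A generic sketch of the safer default (the flag handling here is hypothetical boilerplate, not mlx-vlm's actual server code):

```python
import argparse

def build_server_args(argv):
    """Parse server flags; reload defaults to False to avoid leaking the
    development-only file-watcher process in production servers."""
    p = argparse.ArgumentParser()
    p.add_argument("--host", default="127.0.0.1")
    p.add_argument("--port", type=int, default=8000)
    # Opt-in only: pass --reload explicitly during development.
    p.add_argument("--reload", action="store_true", default=False)
    return p.parse_args(argv)

args = build_server_args([])
print(args.reload)   # False unless explicitly requested
# uvicorn.run("app:app", host=args.host, port=args.port, reload=args.reload)
```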

v0.4.2

28 Mar 21:21
7f7a04c

What's Changed

  • Ensure minimal metadata exists in model card by @pcuenca in #853
  • fix(qwen3_5): register AutoProcessor patch for qwen3_5 model type by @mdstaff in #859
  • Patch qwen3_5_moe auto processor by @Blaizzy in #867
  • Cast Qwen3.5 RMSNorm output to input dtype by @Blaizzy in #868
  • Fix thinking defaults in CLI and server by @Blaizzy in #869
  • Fix LFM2-VL processor loading without torch by @Blaizzy in #872
  • Fix Magistral image token expansion by @Blaizzy in #873
  • Add Facebook SAM 3 by @Blaizzy in #875
  • [SAM 3] Draw labels only for masks in realtime video by @Blaizzy in #876
  • Add DOTS-MOCR processor support by @Blaizzy in #874
  • Fix PaliGemma processor kwarg routing by @Blaizzy in #877

New Contributors

Full Changelog: v0.4.1...v0.4.2

v0.4.1

21 Mar 14:25
f613272

What's Changed

  • feat(server): add --model and --adapter-path flags for startup preloading by @auggie246 in #811
  • fix(qwen3_5): guard batch dimension mismatches under continuous batching by @kol22 in #813
  • Add common sampling parameters to server endpoints by @spicyneuron in #814
  • Add an 'index' for each tool call as per OpenAI spec. by @viktike in #818
  • Fix Qwen3-Omni (qwen3_omni_moe) integration by @howeirdo in #820
  • Add mistral4 by @Blaizzy in #827
  • Clean up server + generate args, validation, and defaults by @spicyneuron in #829
  • Set custom processor as default and remove torch as dep by @Blaizzy in #821
  • Add molmo point by @Blaizzy in #844

New Contributors

Full Changelog: v0.4.0...v0.4.1
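
Worth illustrating from the list above: #818 adds an `index` to each tool call, as the OpenAI chat-completions spec requires so streaming clients can reassemble argument deltas for parallel calls. A sketch of building such a payload (the helper is hypothetical; the field layout follows the OpenAI spec):

```python
def make_tool_calls(calls):
    """Attach an OpenAI-style 'index' to each tool call so streaming
    clients can merge per-call argument deltas."""
    return [
        {
            "index": i,                      # required by the OpenAI spec
            "id": c["id"],
            "type": "function",
            "function": {"name": c["name"], "arguments": c["arguments"]},
        }
        for i, c in enumerate(calls)
    ]

payload = make_tool_calls([
    {"id": "call_1", "name": "get_weather", "arguments": '{"city": "Lisbon"}'},
    {"id": "call_2", "name": "get_time", "arguments": '{"tz": "UTC"}'},
])
print([t["index"] for t in payload])   # [0, 1]
```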

v0.4.0

07 Mar 19:01
718f69e

What's Changed

New Contributors

Full Changelog: v0.3.12...v0.4.0

v0.3.12

16 Feb 23:38
739f4ab

What's Changed

  • [MODEL] support qwen3.5 series by @JJJYmmm in #722
  • qwen3_omni_moe: Fix destructuring error by @Nixuge in #723
  • docs: clarify that some models need [torch] extra for torchvision by @Mr-Neutr0n in #716
  • Use mlx-lm provided logits processors and samplers by @AG6GR in #724
  • Fix LoRA training crash when config uses image_token_id by @JesseRod329 in #720
  • video_generate.py: handle capitalized extensions in is_video_file by @Nixuge in #719
  • Honor quantization_config by @pcuenca in #692
  • [Ministral] Add FP8 dequantization for model weights by @Blaizzy in #727
  • Fix LFM2.5VL processor compatibility patch image placeholder count by @mattjcly in #725
  • Add activation quantization support for QQLinear layers by @Blaizzy in #728
  • Refactor processor return types in utils.py to use ProcessorMixin by @Blaizzy in #729
  • Fix glm (4v & 4v moe) broadcast by @Blaizzy in #731
  • [FastVLM] fix dtype cast by @Blaizzy in #733
  • Fix kimi broadcast by @Blaizzy in #734
  • Add custom processor for fastvlm by @Blaizzy in #736
  • Fix paligemma prefill and numerical precision by @Blaizzy in #737
  • Fix idefics3 llama3 and smolvlm by @Blaizzy in #738
  • docs(README): fix JSON formatting in curl examples by @chsdwn in #739
  • Fix chunked prefill in Qwen3.5 MoE by @JJJYmmm in #742
  • Fix florence-2 (processor and infer) by @Blaizzy in #743
  • Fix Qwen3.5 cast type and predicates by @Blaizzy in #744
  • [Qwen2, 2.5] Fix vision overflow by @Blaizzy in #745
  • [Ministral3] Fix multi-image by @Blaizzy in #747
  • [Phiv3] Fix dtype cast by @Blaizzy in #748
  • Add dots-ocr by @Blaizzy in #749

New Contributors

Full Changelog: v0.3.11...v0.3.12
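
A small fix above, #719, is easy to show: treat file extensions case-insensitively so `clip.MP4` is recognized as video. A sketch of the pattern (the extension list is illustrative, not mlx-vlm's exact set):

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}

def is_video_file(path):
    """Extension check that treats 'clip.MP4' and 'clip.mp4' alike."""
    return Path(path).suffix.lower() in VIDEO_EXTS

print(is_video_file("clip.MP4"), is_video_file("notes.txt"))  # True False
```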

v0.3.11

04 Feb 20:11
21f9b45

What's Changed

  • Refactor input embedding handling in _generate_batch function by @Blaizzy in #694
  • [Qwen2-VL] Fix incorrect attention mask in vision by @hturbe in #704
  • Add GLM-OCR by @Blaizzy & @mikolaj92 in #706
  • fix: Log underlying ImportError when model loading fails by @antonvice in #710
  • Fix wired limit, deepstack eval and add prefill step size argument by @Blaizzy in #699
  • [PaddleOCR] Fix hardcoded processor config by @Blaizzy in #712
  • [SmolVLM] Refactor Attention class to calculate n_kv_heads dynamically by @Blaizzy in #713

New Contributors

Full Changelog: v0.3.10...v0.3.11
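
From the list above, #713's idea generalizes: instead of hardcoding the number of KV heads, derive it from the K-projection's output width, which works for both standard multi-head and grouped-query attention checkpoints. A sketch under that assumption (names are illustrative, not the SmolVLM code):

```python
def infer_n_kv_heads(hidden_size, head_dim, k_proj_out_features):
    """Derive the KV-head count from checkpoint shapes rather than config.
    For GQA, the K projection is narrower than the hidden size."""
    n_heads = hidden_size // head_dim
    n_kv_heads = k_proj_out_features // head_dim
    assert n_heads % n_kv_heads == 0, "GQA needs n_heads divisible by n_kv_heads"
    return n_kv_heads

# e.g. hidden 2048, head_dim 64 -> 32 query heads; K width 512 -> 8 KV heads
print(infer_n_kv_heads(2048, 64, 512))   # 8
```

With plain multi-head attention the K projection is as wide as the hidden size, so the same function returns `n_kv_heads == n_heads`.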

v0.3.10

28 Jan 22:25
48fc189

What's Changed

New Contributors

Full Changelog: v0.3.9...v0.3.10

v0.3.9

03 Dec 21:47
a010fd3

What's Changed

  • Fix qwen3_vl ValueError: Image features and image tokens do not match by @ziya32 in #608
  • Add ministral3 by @Blaizzy in #611

New Contributors

Full Changelog: v0.3.8...v0.3.9

v0.3.8

27 Nov 01:34
cc9609e

What's Changed

Full Changelog: v0.3.7...v0.3.8