Moe kernels refactor by Datta0 · Pull Request #529 · unslothai/unsloth-zoo

Datta0 · 2026-03-03T10:26:26Z

Tried to preserve as much authorship as possible with the help from codex.

Ref: https://github.com/unslothai/unsloth/pull/4145/changes

* add moe grouped gemm kernel * add benchmark, README * remove formatting from __init__.py

* add llama4 reference layer * add llama4 reference impl * formatting

* Update ocr_eval.md * Update backward.py

* Update qwen3_moe.py * Update interface.py

* Update rl.py * Fix CE Loss * Versioning * Update loader.py * Update loader.py * extract_model_type_from_config * Model types * Update loader.py * get_transformers_model_type * Update loader.py * Update loader.py * Update loader.py * Update rl.py * Update pyproject.toml * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Versioning * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update vision.py * Update vision.py * Fix DataParallel * Update _utils.py * Update rl.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update mapper.py * Versioning * Update loader.py * Update loader.py * Update rl.py * Versioning * Update _utils.py * Fix auto_mapping * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Message * Update vision.py * Update loader.py * Update vision.py * cache_implementation * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Save max_seq_length * Update _utils.py * Update rl.py * Update vision.py * Update llama.py * Mistral3 vllm (#3349) * [WIP] use vLLM for vision language models * Update README.md Editing icon sizes * Update README.md Updating icon sizes * Update README.md (#2885) * MoE kernels AGPLv3 * versioning * Many bug fixes (#2908) * add deepseek v3 * add deepseek r1 base * add deepseek r1 zero * add deepseek distill llama * add deepseek distill models * remove redundant code when constructing model names * add mistral small to registry * rename model registration methods * rename deepseek registration methods * refactor naming for mistral and phi * add global register models * refactor model registration tests for new registry apis * add model search method * remove deprecated registration api * add quant type test * add registry readme * make llama registration more specific * clear registry when executing individual model registration file * more registry readme updates * Update _auto_install.py * Llama4 * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Synthetic data * Update mapper.py * Xet and Synthetic * Update synthetic.py * Update loader.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py --------- Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * silienty skip falcon h1 import is transformers_version < 4.53.0 (#2912) * Dynamically adjust get_per_token_logps function and patch as well (#2911) * add intel gpu with vllm support (#2903) * [bugs] fix for casual mask (#2868) * fix for casual mask * use un_casual in sdpa * add missing mask * fix for type * Explicitly check if xformers exists for attention (#2889) * Update __init__.py * Update llama.py * if mlp doesn't exist in layer module check for feed_forward name for falcon h1 (#2913) * Move inputs to right devices. (#2919) * Move tensors to right devices * fix multi gpu for non mistral models * multi GPU RoPE for gemma2 * Finish up multi GPU inference * Make multiGPU rope a list * Remove unnecessary transfer to CPU * Remove unnecessary move to CPU * Donot move inputs to device yet will be handled separately in another PR * Move inputs to appropriate decoder device * Make device count global variable * Cleanup RoPE device code * Fixup num_gpu to device count * Cleanup device counts * Use device index for RoPE get_cache * Donot typecast * Use tuple instead of list for tensors. Use device index directly * fixup move to device logic * WIP VLM vLLM * Make vLLM patch a function * Add save and load lora functions * Make fast_inference setup depend on the flag * Improve fast inference patching mechanism * Make vision setting depend on checks in fastbasemodel * Check LoRA and vLLM intercompatibility for vision models * Comment pointing to vLLM LoRA check * Improve lora validation on vLLM * Error out on no vLLM and increase max lora rank * Bug fixes (#3017) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit 204fc46. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * fix for casual mask (#3011) * [intel] add for intel path for llama.py (#3012) * fix for intel path * remove unuse code * Update unsloth/models/llama.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update llama.py * Fix Gemma 2 (#3024) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit 204fc46. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * falcon force float32 on sm<75 machines (#3026) * Fix torch compile issues (#3028) * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update pyproject.toml * Delete .gitignore * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update _utils.py * Update pyproject.toml * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update chat_templates.py * Seasame force float16 / float32 * Fix Seasame * Update loader.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * is_multimodal * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update vision.py * Update vision.py * Update vision.py * UNSLOTH_DISABLE_STATIC_GENERATION * Update vision.py * Auto vision detection * Sesame * Whisper * Update loader.py * Update loader.py * Update loader.py * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update loader.py * Update loader.py * Update loader.py * Update loader.py * Update _utils.py * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update rl.py * logging * Update pyproject.toml * Update rl.py * versioning * Update rl.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * logits / temperature * Update rl_replacements.py * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Debugging only * Update llama.py * Update llama.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Generic efficient GRPO * Update rl_replacements.py * Update rl_replacements.py * Remove debugging * Update rl_replacements.py * Update rl_replacements.py * Update vision.py * Update llama.py * Update rl_replacements.py * versioning * Update _utils.py * Update vision.py * Update mapper.py * Update loader.py * Update mapper.py * Update vision.py * Update loader.py * Update vision.py * Update loader.py * Update _utils.py * Update vision.py * gradient checkpointing * Gemma 3N fixes * Update loader.py * Versioning * Gemma 3N fixes * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Fix setup.py * setup.py * Prints * Update setup.py * Update setup.py * Update setup.py * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update pyproject.toml * Update vision.py * Update vision.py * Update pyproject.toml * Update vision.py * Update _utils.py * Update __init__.py * Update __init__.py * Small fixes * Update vision.py * Update vision.py * versioning * Update __init__.py * Update llama.py * Update rl.py * Update rl.py * Update _utils.py * Update vision.py * Update vision.py * compiler stance * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Update rl_replacements.py * Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model me…" (#2990) This reverts commit 204fc46. * skip_guard_eval_unsafe fix * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update synthetic.py * Update llama.py * Update llama.py * Fix `quantization_method` * versioning * Update _utils.py * Update _utils.py * Update _utils.py * check stride * Cleanup * Update rope_embedding.py * Update gemma2.py * Fix `set_stance` * Update pyproject.toml * Update _utils.py * Fixup patch vllm * Disable mllama * Use variables to decide VLM support * Better attn_impl handling * Patch TF protobuf incompatability * Torch 2.8 (#3186) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update _auto_install.py * Update pyproject.toml * Update rl.py * Protobuf issue * Update pyproject.toml * Fix extras transformers typo in pyproject.toml * Update _utils.py * Bug fixes (#3195) * Fix mamba * Update loader.py * Update vision.py * Update loader.py * Filter vLLM standby logs (#3131) * filter vLLM standby logs * safeguard standby logger patch * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py * Update unsloth/models/_utils.py --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update loader.py * Add scaler * Update llama.py * Update _utils.py * Versioning * GPT OSS fix * GPT OSS fix * Update loader.py * Update vision.py * Update vision.py * Update loader.py * Update vision.py * Update vision.py * Update llama.py * Update llama.py * Update llama.py * Versioning * Update mapper.py * Update vision.py * Update vision.py * Update vision.py * Upcast norms * Update loader.py * Update vision.py * Upcast layernorms * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * Update save.py * Update rl.py * Update pyproject.toml * Update rl.py * Update rl_replacements.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Update __init__.py * Torch 2.8 * Update rl_replacements.py * Update loader.py * UNSLOTH_ENABLE_CCE * Fix * Update loader.py * Update loader.py * Update __init__.py * Update __init__.py * Update __init__.py * Update __init__.py * Import fixes * Update loader.py * Fix aimv2 issue * Update loader.py * Update import_fixes.py * Update import_fixes.py * Update loader.py * Update loader.py * Update loader.py * Upgrade * Update loader.py * Update loader.py * Update loader.py * Update loader.py --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * adallow float32 dtype in FastLanguageModel (#3204) * Update loader.py * Update vision.py * Suppress message and use unsloth sampling params * Use trl sampling params for now * Improve error message * fixup quantized fast inference model name * Add mistral 3 support --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> * Set padding to 0 * Fix patch * fixup patch (#3359) Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * Update vision.py * Versioning * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * MXFP4 dequant * Update loader.py * Update vision.py * load_in_16bit * Update vision.py * Update vision.py * Update vision.py * Update rl.py * Update vision.py * offload_embedding * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update rl_replacements.py * Update loader.py * Fix padding issue * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * Update vision.py * New models * Update llama.py * Versioning * Update _utils.py * Update llama.py * Update _utils.py * Update llama.py * Fix AMD * Update _utils.py * Update llama.py * Update vision.py * DEVICE_TYPE_TORCH * Update __init__.py * Update __init__.py * Update _utils.py * Move DEVICE_TYPE * Update rl_replacements.py * Update loader.py * AMD install script * Move AMD * Update _amd_install.sh * Update pyproject.toml * Update pyproject.toml * Delete _amd_install.sh * Update device_type.py * Update loader.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * Update tokenizer_utils.py * Versioning * Update pyproject.toml * Update loader.py * Update _utils.py * Update pyproject.toml * Update pyproject.toml * Update _utils.py * Update pyproject.toml * Update _utils.py * Update _utils.py * Update loader.py * Update _utils.py * Update _utils.py * local_files_only * Cut Cross Entropy * Update llama.py * Update vision.py * Update vision.py * Update vision.py * Qwen 3 VL vLLM (#3489) * Update __init__.py * patch_torchao * torchao_logger * Update rl_replacements.py * Fix * Update rl.py * Update rl.py * Update rl.py * Update rl.py * Update _utils.py * Versioning * fbgemm fp8 block quant support (>=1.4.0) (#3531) * fbgemm fp8 block quant support (>=1.4.0) * Verify for fp8 support before proceeding * Use unsloth zoo's Version and improve comments * spacessss * Update vision.py * Update vision.py * Update rl.py * vllm_sampling_params * Update rl.py * Update rl.py * Update rl.py * Add `ruff` pre-commit hook and apply it (#3424) * Add Ruff pre-commit config and workflow * Add kwarg spacing enforcement helper * Apply Ruff formatting * Update fp8.py * Revert ruff on some files * Update * force-exclude = true * Datasets issue * Ruff * Remove mapper * Update mapper.py * Update pyproject.toml --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: jeromeku <jerome.ku@gmail.com> Co-authored-by: DoubleMathew <mmathew23@gmail.com> Co-authored-by: Lei Zhenyuan <zhenyuan.lei@intel.com> Co-authored-by: parth2510 <parthguptapg7326@gmail.com> Co-authored-by: Dan Saunders <danjsaund@gmail.com>

@danielhanchen

* vllm sampling params fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * do not patch base_trainer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * seperate vllm fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestion from @danielhanchen * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit fbb98c5c5cbcb7fc030f313dd7e181a120eaf706. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit c64d5b475e7f176436484ce19077fae571a73b1a. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit c1565455153e6c6a526814191346bd9a90535992. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>

This reverts commit 6c47dc5.

for more information, see https://pre-commit.ci

This reverts commit 9bf82bb.

for more information, see https://pre-commit.ci

This reverts commit 1146769.

for more information, see https://pre-commit.ci

This reverts commit c427be9.

* Improve MoE performance * small changes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports * disable autotune * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * LoRA for MoE * Make autotune default * make dy contiguous * use non lora model as base for RL * Revert "use non lora model as base for RL" This reverts commit da9e2c4323724c8c3c50991b3d65a6f70f53e6e5. * fixup derp * non TMA [T4] * Revert "non TMA [T4]" This reverts commit 9bb0db2a66dcb9a7f7feba128b3fe32b14c330c0. * Fixes for VL MoE and v5 transformers * [transformers] [v5] remove unused hybridcache (#3910) * remote unused hybridcache * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * No double compile for qwen3moe * Fix top_k on trl GRPO * Recognise GLM as MoE * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing RotaryEmbeddingConfigMixin * Licensing for autotuning cache * Cleanup --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>

TMA (Tensor Memory Accelerator) is an NVIDIA Hopper+ feature that does not exist on AMD GPUs. However, _check_tma_support() incorrectly returns True on ROCm because: 1. torch.cuda.get_device_capability() returns (11, 0) for gfx1100, satisfying the >= 9 check intended for Hopper (sm_90). 2. ROCm Triton exports tl.make_tensor_descriptor (the symbol exists even though the hardware does not support TMA). This would cause MoE grouped_gemm to attempt TMA operations on AMD GPUs, leading to runtime failures. Fix: early-return False for HIP devices, matching the existing XPU guard.

gemini-code-assist · 2026-03-03T10:29:40Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of Mixture-of-Experts (MoE) MLP blocks by introducing a new, highly optimized grouped GEMM implementation. By fusing critical operations directly into Triton kernels, the changes aim to minimize computational overheads and memory transfers. The integration includes an intelligent autotuning and caching mechanism to dynamically adapt and store optimal kernel configurations, ensuring peak efficiency. This refactor provides a more performant and maintainable foundation for MoE architectures.

Highlights

Optimized MoE Kernel Implementation: Introduced a new, highly optimized grouped GEMM implementation for Mixture-of-Experts (MoE) MLP blocks, leveraging Triton kernels for enhanced performance.
Fused Operations: Implemented fusions for key MoE operations, including permutation of hidden states (token to expert order and vice-versa) and multiplication by expert weights, directly within the GEMM kernels to reduce overhead.
Autotuning and Caching System: Added a robust system for autotuning Triton kernel configurations and caching them to disk, ensuring optimal performance across different hardware setups without repetitive tuning.
Comprehensive Testing and Benchmarking: Included extensive unit and end-to-end tests, along with benchmarking scripts, to validate the correctness and measure the performance improvements of the new MoE grouped GEMM kernels for Qwen3 and Llama4 models.
Licensing Update: All newly added kernel files are licensed under the GNU Affero General Public License v3.0, ensuring open and collaborative use.
Temporary Patch Updates: Updated existing temporary patches to correctly reference the new MoE kernel module structure and ensure compatibility with LoRA extractor functions.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

unsloth_zoo/kernels/init.py
- Added package namespace.
unsloth_zoo/kernels/moe/LICENSE
- Added GNU Affero General Public License v3.0.
unsloth_zoo/kernels/moe/README.md
- Added detailed documentation for MoE Grouped GEMM, including background, optimizations, structure, tests, benchmarks, and notes.
unsloth_zoo/kernels/moe/init.py
- Added package namespace with AGPLv3 license header.
unsloth_zoo/kernels/moe/autotune_cache.py
- Added a system for caching and retrieving autotuned MoE kernel configurations.
unsloth_zoo/kernels/moe/benchmark/benchmark_fused_moe.py
- Added a script for benchmarking fused MoE implementations against Hugging Face models.
unsloth_zoo/kernels/moe/benchmark/utils.py
- Added utility functions for creating, processing, and saving benchmark results.
unsloth_zoo/kernels/moe/grouped_gemm/LICENSE
- Added GNU Affero General Public License v3.0.
unsloth_zoo/kernels/moe/grouped_gemm/init.py
- Added package namespace with AGPLv3 license header.
unsloth_zoo/kernels/moe/grouped_gemm/interface.py
- Added Python interface for grouped GEMM operations, including forward and backward passes, and TMA support checks.
unsloth_zoo/kernels/moe/grouped_gemm/kernels/init.py
- Added package namespace with AGPLv3 license header.
unsloth_zoo/kernels/moe/grouped_gemm/kernels/autotuning.py
- Added utilities for autotuning Triton kernel configurations, including default block sizes and pruning criteria.
unsloth_zoo/kernels/moe/grouped_gemm/kernels/backward.py
- Added Triton kernels for grouped GEMM backward passes (dX and dW), including autotuning.
unsloth_zoo/kernels/moe/grouped_gemm/kernels/forward.py
- Added Triton kernel for grouped GEMM forward pass, including autotuning and fusions.
unsloth_zoo/kernels/moe/grouped_gemm/kernels/tuning.py
- Added data classes for kernel configurations and results, along with pruning functions for manual tuning.
unsloth_zoo/kernels/moe/grouped_gemm/reference/init.py
- Added package namespace with AGPLv3 license header.
unsloth_zoo/kernels/moe/grouped_gemm/reference/layers/llama4_moe.py
- Added reference implementations for Llama4 MoE block using both torch-native and Triton grouped GEMM.
unsloth_zoo/kernels/moe/grouped_gemm/reference/layers/qwen3_moe.py
- Added reference implementations for Qwen3 MoE block using both torch-native and Triton grouped GEMM.
unsloth_zoo/kernels/moe/grouped_gemm/reference/moe_block.py
- Added a reference implementation of Qwen3 MoE block using Triton grouped GEMM.
unsloth_zoo/kernels/moe/grouped_gemm/reference/moe_ops.py
- Added utility functions for MoE operations like permuting, unpermuting, calculating top-k, and torch-native grouped GEMM.
unsloth_zoo/kernels/moe/requirements.txt
- Added new Python dependencies for the MoE kernels.
unsloth_zoo/kernels/moe/tests/init.py
- Added package namespace with AGPLv3 license header.
unsloth_zoo/kernels/moe/tests/common.py
- Added common utilities and data structures for MoE kernel testing.
unsloth_zoo/kernels/moe/tests/moe_utils.py
- Added utility functions for testing MoE blocks, including gradient checking and result comparison.
unsloth_zoo/kernels/moe/tests/run_qwen3_moe_tests.sh
- Added a shell script to run Qwen3 MoE tests with various configurations.
unsloth_zoo/kernels/moe/tests/test_grouped_gemm.py
- Added comprehensive unit tests for grouped GEMM forward and backward kernels.
unsloth_zoo/kernels/moe/tests/test_llama4_moe.py
- Added tests for Llama4 MoE integration with the new grouped GEMM kernels.
unsloth_zoo/kernels/moe/tests/test_qwen3_moe.py
- Added tests for Qwen3 MoE integration with the new grouped GEMM kernels.
unsloth_zoo/temporary_patches/moe_utils.py
- Updated import paths for grouped_gemm and autotune_cache.
- Added staticmethod decorator to _unsloth_lora_extractor_fn.
unsloth_zoo/temporary_patches/qwen3_5_moe.py
- Applied staticmethod decorator to _unsloth_lora_extractor_fn.
unsloth_zoo/temporary_patches/qwen3_moe.py
- Applied staticmethod decorator to _unsloth_lora_extractor_fn.
unsloth_zoo/temporary_patches/qwen3_next_moe.py
- Applied staticmethod decorator to _unsloth_lora_extractor_fn.
unsloth_zoo/temporary_patches/qwen3_vl_moe.py
- Applied staticmethod decorator to _qwen3_vl_lora_extractor.

Activity

The pull request introduces a new moe kernel package within unsloth_zoo/kernels, containing a complete implementation of optimized grouped GEMM operations for Mixture-of-Experts (MoE) models.
This includes Triton-based kernels for forward and backward passes, an autotuning and caching system for kernel configurations, and extensive testing and benchmarking infrastructure.
The author, Datta0, noted efforts to preserve authorship, suggesting the use of tools like Codex for code generation while maintaining proper attribution.
Several existing temporary patches related to Qwen3 and other MoE models were updated to correctly integrate with the new kernel structure and ensure proper function calls.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant refactoring by moving Mixture-of-Experts (MoE) kernels into a new unsloth_zoo package. This includes new, optimized Triton kernels for grouped GEMM operations, along with infrastructure for autotuning and caching. The changes are extensive and well-structured. My review focuses on the correctness, performance, and maintainability of this new kernel infrastructure. I've identified a few issues, including a bug that could lead to a runtime error, some performance inefficiencies in memory allocation for Triton kernels, and some dead code. Addressing these points will help improve the robustness and performance of the new MoE kernels.

unsloth_zoo/kernels/moe/grouped_gemm/interface.py

unsloth_zoo/kernels/moe/autotune_cache.py

jeromeku and others added 20 commits March 3, 2026 10:02

MoE Kernel (#2465)

1b8aeb8

* add moe grouped gemm kernel * add benchmark, README * remove formatting from __init__.py

Llama4 MoE Grouped GEMM (#2639)

223eb67

* add llama4 reference layer * add llama4 reference impl * formatting

Create LICENSE

4804886

Fix Typos in Documentation and Comments (#2721)

8931a39

* Update ocr_eval.md * Update backward.py

Docs: Fix typo and improve MoE docstrings (#2784)

04250a4

* Update qwen3_moe.py * Update interface.py

MoE kernels AGPLv3

74ad30e

Revert "[FIX] Vllm guided decoding params (#3662)"

8f5bb27

This reverts commit 6c47dc5.

auto fixes from pre-commit.com hooks

702a3c4

for more information, see https://pre-commit.ci

Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

dd2f397

This reverts commit 9bf82bb.

auto fixes from pre-commit.com hooks

5847b78

for more information, see https://pre-commit.ci

Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

5ea7023

This reverts commit 1146769.

auto fixes from pre-commit.com hooks

5218ad1

for more information, see https://pre-commit.ci

Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

24f72b4

This reverts commit c427be9.

fix for tma (#4023)

b699bcc

Restructure imports for moe stuff and fixup qwen3moe extractor

62aa4db

Update headers

c5b008c

Datta0 mentioned this pull request Mar 3, 2026

Moe kernels refactor unslothai/unsloth#4145

Closed

gemini-code-assist bot reviewed Mar 3, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into moe_kernels_refactor

3efe043

danielhanchen mentioned this pull request Mar 12, 2026

Moe kernels refactor unslothai/unsloth#4253

Open

Datta0 mentioned this pull request Mar 12, 2026

[refactor] move moe kernels to zoo unslothai/unsloth#4255

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Moe kernels refactor#529

Moe kernels refactor#529
Datta0 wants to merge 21 commits intounslothai:mainfrom
Datta0:moe_kernels_refactor

Datta0 commented Mar 3, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Mar 3, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

Datta0 commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot commented Mar 3, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Datta0 commented Mar 3, 2026 •

edited

Loading