
Add Bnb4bit support for MoE models on transformers v5 - #4032 #527

Open
sensai99 wants to merge 23 commits into unslothai:main from sensai99:moeFix

Conversation

@sensai99

@sensai99 sensai99 commented Mar 2, 2026

Hi!

This PR adds 4-bit quantization support for MoE expert parameters stored as nn.Parameter.

With transformers v5, MoE expert parameters stored as nn.Parameter are not quantized. This PR adds that support by doing the following:

  • Converts the expert nn.Parameter weights to Params4bit
  • Handles quantization and dequantization accordingly for PEFT LoRA compatibility (a rough sketch of the idea follows below)
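To make the flow concrete, here is a rough sketch of the conversion and dequantization (illustrative only, not the PR's actual code; it assumes bitsandbytes is installed, a CUDA device is available, and uses a made-up 3D expert weight shape):

# Illustrative sketch: quantize a 3D MoE expert weight stored as an
# nn.Parameter into bitsandbytes Params4bit, then dequantize it back so
# LoRA-style math can run against the base weight.
import torch
import torch.nn as nn
import bitsandbytes as bnb
from bitsandbytes.nn import Params4bit

# A fused expert weight: (num_experts, in_features, out_features) -- hypothetical shape.
expert_weight = nn.Parameter(torch.randn(8, 128, 256, dtype=torch.float16))

# Wrap the tensor as Params4bit; bitsandbytes quantizes it when the tensor
# is moved to a CUDA device.
qweight = Params4bit(expert_weight.data, requires_grad=False, quant_type="nf4").to("cuda")

# Recover the base weight from the packed 4-bit storage (e.g. for PEFT LoRA).
dequantized = bnb.functional.dequantize_4bit(qweight.data, qweight.quant_state)
print(dequantized.shape)  # expected to match the original 3D expert shape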

Analysis using GLM-4.7-Flash:

[Two screenshots attached showing the analysis results]

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces crucial functionality to enable 4-bit quantization for Mixture-of-Experts (MoE) models within the Transformers library, particularly for versions 5 and above. It addresses the challenge of quantizing MoE expert parameters that are defined as nn.Parameter rather than nn.Linear layers, which were previously not handled. The changes ensure that these parameters are correctly processed, quantized, and dequantized, maintaining full compatibility with PEFT LoRA adaptations by preserving necessary shape information throughout the quantization pipeline.

Highlights

  • MoE Parameter Quantization: Added comprehensive support for 4-bit quantization of Mixture-of-Experts (MoE) parameters, specifically targeting nn.Parameter instances in Transformers v5 models.
  • PEFT LoRA Compatibility: Ensured compatibility with PEFT LoRA by correctly handling the shape and attributes of quantized MoE expert parameters, including preserving _original_shape during quantization and reporting correct 3D shapes.
  • Bitsandbytes Integration: Implemented patches to transformers' bitsandbytes quantization logic to recognize, prepare, and convert MoE expert parameters (e.g., gate_up_proj, down_proj) into Params4bit placeholders.
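As a minimal illustration of the shape issue mentioned above (hypothetical shapes; assumes bitsandbytes and a CUDA device; _original_shape is the attribute name this PR reportedly attaches):

import torch
from bitsandbytes.nn import Params4bit

# A 3D expert weight before quantization (shape is made up for illustration).
weight = torch.randn(8, 128, 256, dtype=torch.float16)

# After quantization, the parameter holds a packed uint8 buffer, so `.shape`
# no longer reflects the logical (num_experts, in, out) dimensions.
qparam = Params4bit(weight, requires_grad=False, quant_type="nf4").to("cuda")
print(qparam.shape)              # packed storage shape, not (8, 128, 256)
print(qparam.quant_state.shape)  # logical shape tracked by bitsandbytes

# Stashing the pre-quantization shape lets PEFT-facing code report 3D dims.
qparam._original_shape = weight.shape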


Changelog
  • unsloth_zoo/temporary_patches/__init__.py
    • Imported the new moe_bnb_transformers module to integrate its patching functionalities.
  • unsloth_zoo/temporary_patches/misc.py
    • Introduced _ParamShapeProxy to correctly expose 3D shapes for 4-bit MoE parameters, ensuring compatibility with PEFT's ParamWrapper.
    • Added patch_peft_param_wrapper_4bit_expert_shape to apply the shape proxy, allowing ParamWrapper to correctly derive dimensions for quantized MoE parameters.
  • unsloth_zoo/temporary_patches/moe_bnb_transformers.py
    • Added a new module dedicated to patching transformers' bitsandbytes quantization for MoE expert parameters.
    • Implemented _is_expert_module to identify MoE expert modules based on nn.Parameter attributes.
    • Created replace_expert_params_with_bnb_params to prepare MoE expert parameters by replacing them with Params4bit placeholders on a meta device before weight loading.
    • Developed patch_bnb4bit_quantize_convert to modify the Bnb4bitQuantize.convert method, ensuring correct quantization and preservation of _original_shape for MoE expert parameters.
    • Included patch_bnb4bit_quantizer_param_needs_quantization to extend Bnb4BitHfQuantizer's logic to recognize Params4bit expert placeholders as needing quantization.
    • Added patch_bnb4bit_quantizer_process_model to integrate the expert parameter replacement into Bnb4BitHfQuantizer._process_model_before_weight_loading.
  • unsloth_zoo/temporary_patches/moe_utils.py
    • Integrated bitsandbytes availability checks and Params4bit import.
    • Modified _get_base_weight to include dequantization logic for Params4bit instances.
    • Updated _is_moe_experts_module to correctly identify 4-bit quantized MoE expert parameters.
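For readers unfamiliar with the proxy pattern referenced above, a hypothetical sketch of the _ParamShapeProxy idea (not the module's actual implementation) could look like this:

import torch

class _ParamShapeProxy:
    # Hypothetical sketch: report the original (num_experts, in, out) shape
    # to callers such as PEFT's ParamWrapper, while delegating every other
    # attribute to the wrapped 4-bit parameter.
    def __init__(self, param, original_shape):
        self._param = param
        self._logical_shape = torch.Size(original_shape)

    @property
    def shape(self):
        return self._logical_shape

    def __getattr__(self, name):
        # Only called when normal lookup fails, so this falls through to the
        # underlying parameter (e.g. .data, .quant_state).
        return getattr(self._param, name)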

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR introduces support for quantization of MoE parameters in transformers v5 by converting expert parameters to nn.Params4bit and handling quantization/dequantization for PEFT LoRA compatibility. It includes a new module moe_bnb_transformers.py with patching functions and modifications to misc.py and __init__.py to integrate the new functionality.

Comment on lines +425 to +428
# If the parameter is a Params4bit, dequantize it
if _check_bnb_available() and isinstance(param, Params4bit):
    # Dequantize the parameter
    return bnb.functional.dequantize_4bit(param.data, param.quant_state)
Contributor


Severity: high

Consider adding a check to ensure param.quant_state is not None before dequantizing. If quant_state is None, it could lead to an error during dequantization.

Suggested change (updated lines, adding the quant_state guard):

# If the parameter is a Params4bit, dequantize it
if _check_bnb_available() and isinstance(param, Params4bit) and param.quant_state is not None:
    # Dequantize the parameter
    return bnb.functional.dequantize_4bit(param.data, param.quant_state)

Comment on lines +78 to +79
except Exception as e:
    return raise_error("transformers.quantizers.quantizers_utils.should_convert_module", e)
Contributor


Severity: medium

Consider catching a more specific exception type instead of a general Exception, so that only the expected error is handled; this prevents masking other potential issues.
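For example, a narrower handler could look roughly like the following (illustrative only; which exceptions are actually worth catching depends on what the patched transformers helper can raise):

def get_expert_attr(module, name):
    # Illustrative: handle only the anticipated failure mode instead of a
    # blanket `except Exception`, so unrelated bugs still surface.
    try:
        return getattr(module, name)
    except AttributeError as err:
        print(f"Unsloth: expected expert attribute {name!r} is missing: {err}")
        return None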

Comment on lines +144 to +145
if not has_been_replaced:
    logger.warning(f"Unsloth: No expert parameters were found to be replaced for {model.name_or_path}")
Contributor


Severity: medium

It's good to log a warning when no expert parameters are found. However, consider adding more context to the warning message, such as the specific layers or modules that were expected to have expert parameters, to aid in debugging.
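For instance, the warning could name the attributes it looked for (a sketch with hypothetical wording; 'gate_up_proj' and 'down_proj' are the expert attribute names mentioned in the changelog above):

import logging

logger = logging.getLogger(__name__)

def warn_no_experts_replaced(model):
    # Sketch: include the model identifier, its class, and the expert
    # attribute names that were searched for, to make debugging easier.
    logger.warning(
        "Unsloth: No expert parameters were replaced for %s (%s); expected "
        "expert attributes such as 'gate_up_proj' or 'down_proj'.",
        getattr(model, "name_or_path", "<unknown>"),
        model.__class__.__name__,
    )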

Comment on lines +207 to +208
except Exception as e:
    logger.warning(f"Unsloth: Error handling expert param quantization for {full_layer_name}: {e}")
Contributor


Severity: medium

The except block catches a general Exception, which might hide unexpected errors. It's better to catch specific exceptions like KeyError or AttributeError that you anticipate and handle them accordingly. This can prevent masking other potential issues.

Comment on lines +256 to +259
# TODO: Can we raise an error here?
logger.warning(
    f"Unsloth: Error checking MoE expert param_needs_quantization for {param_name}: {e}"
)
Contributor


Severity: medium

The TODO comment suggests there might be a better way to handle the exception. Consider raising an error to prevent unexpected behavior or investigate why the exception is occurring in the first place.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f3f2c6eba9



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5b567c528



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb69ead7ca

