Pull request overview
This PR fixes Qwen3-VL SwiGLU patching so apply_liger_kernel_to_qwen3_vl(..., swiglu=True) is effective (including for already-instantiated model instances), adds regression tests, and documents Qwen3-VL support.
Changes:
- Enable SwiGLU patching for Qwen3-VL by default and patch `Qwen3VLTextMLP` / existing `decoder_layer.mlp` instances.
- Add/extend monkey-patch tests to assert MLP forward patching behavior for Qwen3-VL.
- Update the README supported-models table to include Qwen3-VL.
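The distinction between patching the module class and patching existing instances can be illustrated with a minimal sketch. It uses plain Python stand-ins rather than the real `Qwen3VLTextMLP` or `LigerSwiGLUMLP`; every class and function name below is hypothetical, and this is not the code from `monkey_patch.py`.

```python
import types


class OriginalMLP:
    """Hypothetical stand-in for the model's MLP class."""

    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return x * self.scale


class FusedMLP:
    """Hypothetical stand-in for a replacement kernel module."""

    def forward(self, x):
        # Reads the state already stored on the instance it is bound to.
        return x * self.scale + 1


class DecoderLayer:
    """Hypothetical decoder layer that owns an already-constructed MLP."""

    def __init__(self):
        self.mlp = OriginalMLP(scale=2)


def patch_existing_layers(layers, replacement_cls):
    """Rebind forward on MLP objects that were created before patching."""
    for layer in layers:
        layer.mlp.forward = types.MethodType(replacement_cls.forward, layer.mlp)


# A model built before patching still holds OriginalMLP objects.
layers = [DecoderLayer(), DecoderLayer()]

# Swapping the class name in a namespace only affects later construction;
# the objects in `layers` keep their original forward.
modeling = types.SimpleNamespace(MLP=OriginalMLP)
modeling.MLP = FusedMLP
unchanged = layers[0].mlp.forward(3)

# The instance-level patch is what changes already-instantiated layers.
patch_existing_layers(layers, FusedMLP)
patched = layers[0].mlp.forward(3)
```

In this sketch the class-level swap alone leaves an instantiated model untouched, which is why a patch function that also accepts a model instance has to walk the decoder layers and rebind each `mlp` as well.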
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/liger_kernel/transformers/monkey_patch.py` | Implements Qwen3-VL SwiGLU patching (module class + existing instances) and flips the default `swiglu` behavior. |
| `test/transformers/test_monkey_patch.py` | Adds assertions and a new test covering Qwen3-VL MLP patching via the `swiglu` flag. |
| `README.md` | Adds Qwen3-VL to the support matrix and adjusts Qwen-family rows. |
Comments suppressed due to low confidence (1)
src/liger_kernel/transformers/monkey_patch.py:1799
- The `apply_liger_kernel_to_qwen3_vl` docstring’s `Args:` section doesn’t document the `rope` parameter even though it is part of the public signature (and is patched inside the function). Please add a `rope (bool): ...` entry for clarity and to keep it consistent with the other `apply_liger_kernel_to_*` docstrings in this module.
"""
Apply Liger kernels to replace original implementation in HuggingFace Qwen3-VL models.
Args:
cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
| Qwen2, Qwen2.5, & QwQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2-VL, & QVQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_vl` | RMSNorm, LayerNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2.5-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_5_vl` | RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_vl` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 | `liger_kernel.transformers.apply_liger_kernel_to_qwen3` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 MoE | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_moe` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
This README table section appears to introduce mixed line endings (surrounding lines are CRLF, while the newly added/edited Qwen rows show as LF-only). Please normalize the line endings for the file (and ideally enforce via .editorconfig) to avoid noisy diffs and potential formatting issues on Windows tooling.
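One way to confirm the mixed line endings before normalizing is to count the terminators in the raw bytes. The helper below is a small sketch, not part of this repository; the function names are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_mixed_line_endings(data: bytes) -> bool:
    """Return True when more than one terminator style appears."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1


# Example input: two CRLF-terminated rows around one LF-only row.
sample = b"| row 1 |\r\n| row 2 |\n| row 3 |\r\n"
sample_counts = count_line_endings(sample)
sample_is_mixed = has_mixed_line_endings(sample)
```

To check the actual file, read it in binary mode so Python does not translate newlines, for example `has_mixed_line_endings(open("README.md", "rb").read())`.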
Summary
Fix `apply_liger_kernel_to_qwen3_vl(..., swiglu=True)` so it is no longer a no-op.
Changes
- Patch `transformers.models.qwen3_vl.modeling_qwen3_vl.Qwen3VLTextMLP` to `LigerSwiGLUMLP`
- Patch `decoder_layer.mlp` modules for existing Qwen3-VL model instances
- Enable `swiglu=True` by default for `apply_liger_kernel_to_qwen3_vl`
- Add regression tests for `swiglu=True`
Validation