
Enable SwiGLU patching for Qwen3-VL #1175

Open: dongchany wants to merge 1 commit into linkedin:main from dongchany:qwen3-vl-swiglu


@dongchany

Summary

Fix apply_liger_kernel_to_qwen3_vl(..., swiglu=True) so it is no longer a no-op.

Changes

  • patch transformers.models.qwen3_vl.modeling_qwen3_vl.Qwen3VLTextMLP to LigerSwiGLUMLP
  • patch instantiated decoder_layer.mlp modules for existing Qwen3-VL model instances
  • enable swiglu=True by default for apply_liger_kernel_to_qwen3_vl
  • add monkey-patch tests covering both default instance patching and explicit swiglu=True
  • document Qwen3-VL support in the README table
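
The two patching steps listed above (swapping the class, then fixing up already-built instances) can be sketched in plain Python. This is only an illustration of the general monkey-patching pattern, not the code in monkey_patch.py: `OriginalMLP`, `FusedMLP`, `DecoderLayer`, `modeling`, and `apply_patch` are hypothetical stand-ins, and the real implementation works on `torch.nn.Module` objects.

```python
import types


class OriginalMLP:
    """Hypothetical stand-in for a model's stock MLP class."""

    def forward(self, x):
        return f"original({x})"


class FusedMLP:
    """Hypothetical stand-in for a fused replacement MLP."""

    def forward(self, x):
        return f"fused({x})"


class DecoderLayer:
    """Hypothetical decoder layer that owns one MLP."""

    def __init__(self, mlp_cls):
        self.mlp = mlp_cls()


# Hypothetical stand-in for a modeling module that exposes the MLP class.
modeling = types.SimpleNamespace(TextMLP=OriginalMLP)


def apply_patch(swiglu=True, model_layers=None):
    """Patch the class for future models and forward() on existing layers."""
    if not swiglu:
        return
    # Class-level patch: models constructed after this call pick up FusedMLP.
    modeling.TextMLP = FusedMLP
    # Instance-level patch: existing modules keep their class (and weights),
    # but their forward is rebound to the fused implementation.
    for layer in model_layers or []:
        layer.mlp.forward = types.MethodType(FusedMLP.forward, layer.mlp)


existing = [DecoderLayer(modeling.TextMLP) for _ in range(2)]
apply_patch(swiglu=True, model_layers=existing)
fresh = DecoderLayer(modeling.TextMLP)
```

If only the class-level step ran, `existing` would still call the original forward, which is why both steps are needed for the flag to take effect on a model that has already been instantiated.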

Validation

python -m pytest -q test/transformers/test_monkey_patch.py -k 'qwen3_vl and not moe'
python -m pytest -q test/convergence/bf16/test_mini_models.py -k 'mini_qwen3_vl and test_mini_model'
python -m ruff check src/liger_kernel/transformers/monkey_patch.py test/transformers/test_monkey_patch.py README.md


Copilot AI left a comment


Pull request overview

This PR fixes Qwen3-VL SwiGLU patching so apply_liger_kernel_to_qwen3_vl(..., swiglu=True) is effective (including for already-instantiated model instances), adds regression tests, and documents Qwen3-VL support.

Changes:

  • Enable SwiGLU patching for Qwen3-VL by default and patch Qwen3VLTextMLP / existing decoder_layer.mlp instances.
  • Add/extend monkey-patch tests to assert MLP forward patching behavior for Qwen3-VL.
  • Update the README supported-models table to include Qwen3-VL.
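
As a rough illustration of what "assert MLP forward patching behavior" can mean, the sketch below checks whether a module's bound `forward` comes from the replacement class. The identity check is an assumption chosen for brevity, and `StockMLP`, `LigerLikeMLP`, and `is_forward_patched` are hypothetical names; the PR's actual tests may compare the methods differently.

```python
import types


class StockMLP:
    """Hypothetical unpatched MLP."""

    def forward(self, x):
        return x


class LigerLikeMLP:
    """Hypothetical replacement MLP whose forward should be in effect."""

    def forward(self, x):
        return x * 2


def is_forward_patched(module, target_cls):
    """Return True when module.forward is bound to target_cls.forward."""
    bound = module.forward
    # Bound methods expose the underlying function as __func__.
    return getattr(bound, "__func__", None) is target_cls.forward


mlp = StockMLP()
patched_before = is_forward_patched(mlp, LigerLikeMLP)

# Rebind forward on the existing instance, as an instance-level patch would.
mlp.forward = types.MethodType(LigerLikeMLP.forward, mlp)
patched_after = is_forward_patched(mlp, LigerLikeMLP)
```

Asserting the "before" state as well as the "after" state is what turns this into a regression test: it fails if the patch silently becomes a no-op again.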

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/liger_kernel/transformers/monkey_patch.py | Implements Qwen3-VL SwiGLU patching (module class + existing instances) and flips the default swiglu behavior. |
| test/transformers/test_monkey_patch.py | Adds assertions and a new test covering Qwen3-VL MLP patching via the swiglu flag. |
| README.md | Adds Qwen3-VL to the support matrix and adjusts Qwen-family rows. |

Comments suppressed due to low confidence (1)

src/liger_kernel/transformers/monkey_patch.py:1799

  • The apply_liger_kernel_to_qwen3_vl docstring's Args: section doesn't document the rope parameter, even though it is part of the public signature (and is patched inside the function). Please add a rope (bool): ... entry for clarity and for consistency with the other apply_liger_kernel_to_* docstrings in this module.
    """
    Apply Liger kernels to replace original implementation in HuggingFace Qwen3-VL models.

    Args:
        cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
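
A gap like this one can be caught mechanically. The following is a minimal sketch, assuming the `name (type): ...` docstring layout shown in the excerpt above; `undocumented_params` and `example_patch_fn` are hypothetical and not part of the repository.

```python
import inspect


def undocumented_params(func):
    """List signature parameters that have no 'name (' entry in the docstring."""
    doc = inspect.getdoc(func) or ""
    return [
        name
        for name in inspect.signature(func).parameters
        if f"{name} (" not in doc
    ]


def example_patch_fn(rope=True, cross_entropy=False):
    """
    Hypothetical function with an incomplete Args section.

    Args:
        cross_entropy (bool): Whether to apply the cross entropy kernel.
    """


missing = undocumented_params(example_patch_fn)
```

Here `missing` contains `rope`, mirroring the omission flagged in the review comment. The substring match is deliberately simple and would need tightening for docstrings in other styles.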


Comment on lines +263 to +268
| Qwen2, Qwen2.5, & QwQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2-VL, & QVQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_vl` | RMSNorm, LayerNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2.5-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_5_vl` | RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_vl` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 | `liger_kernel.transformers.apply_liger_kernel_to_qwen3` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 MoE | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_moe` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |

Copilot AI Mar 31, 2026


This README table section appears to introduce mixed line endings (surrounding lines are CRLF, while the newly added/edited Qwen rows show as LF-only). Please normalize the line endings for the file (and ideally enforce via .editorconfig) to avoid noisy diffs and potential formatting issues on Windows tooling.
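
One way to confirm and fix the mixed line endings described above is a small byte-level check. This is a sketch under the assumption that the file should end up LF-only; the helper names are hypothetical and the sample bytes are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_mixed_line_endings(data: bytes) -> bool:
    """True when more than one kind of line terminator is present."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CRLF and bare CR terminators to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Made-up sample: two CRLF rows surrounding one LF-only row.
sample = b"| row a |\r\n| row b |\n| row c |\r\n"
counts = count_line_endings(sample)
mixed = has_mixed_line_endings(sample)
fixed = normalize_to_lf(sample)
```

To apply it to a real file, read the file in binary mode (`open(path, "rb")`) so Python does not translate newlines before counting. For enforcement, EditorConfig's `end_of_line` property is the setting the comment alludes to.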
