fix: skip non-attention layers in _get_vllm_state_dict (fixes unslothai/unsloth#4073) #510
stakeswky wants to merge 2 commits into unslothai:main
Conversation
…oundLocalError

Layers without `self_attn` or `cross_attn` (e.g. Mamba/SSM mixer layers in LFM2 models) caused an `UnboundLocalError` because `prefix` was never assigned before being used in `get_state_dict` calls. Add an `else` branch that logs and skips these unsupported layer types. Fixes unslothai/unsloth#4073
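For context, here is a minimal sketch of the patched layer-iteration control flow. It is not the verbatim `vllm_utils.py` code; the prefix format, parameter list, and surrounding extraction calls are assumptions based on this PR's description.

```python
def iterate_layers(layers, state_dict, get_state_dict, logger):
    """Sketch of the patched layer loop in _get_vllm_state_dict (names assumed)."""
    for kk, layer in enumerate(layers):
        if hasattr(layer, "self_attn"):
            prefix = f"model.layers.{kk}.self_attn"   # assumed prefix format
            attn = layer.self_attn
        elif hasattr(layer, "cross_attn"):
            prefix = f"model.layers.{kk}.cross_attn"
            attn = layer.cross_attn
        else:
            # Mamba/SSM mixer layers (e.g. LFM2) land here; without this branch,
            # `prefix` stays unbound and the call below raises UnboundLocalError.
            logger.info(f"Unsloth: Skipping layer {kk} - no self_attn or cross_attn found.")
            continue
        get_state_dict(f"{prefix}.o_proj", 0, state_dict, attn.o_proj)
        # ... MLP and layernorm extraction for this layer follows ...
```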
Activity
Code Review (Gemini Code Assist)
This pull request addresses a crash in _get_vllm_state_dict when processing models with non-attention layers, such as Mamba. The fix introduces an else branch to gracefully skip these layers, preventing the UnboundLocalError by using a continue statement. The change is correct, minimal, and effectively resolves the issue. The added logging is also helpful for users to understand why certain layers are being skipped.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d71efc652
The suggestion is attached to the added hunk:

```python
# Skip layers that don't have self_attn or cross_attn (e.g. Mamba/SSM layers
# like LFM2's mixer layers). Full Mamba state dict extraction can be added later.
logger.info(f"Unsloth: Skipping layer {kk} — no self_attn or cross_attn found.")
continue
```
Avoid silently dropping full non-attention layer weights
This `continue` skips not just attention projections but also the rest of the layer (MLP, layernorms, etc.), so mixed architectures with non-attention blocks now produce an incomplete `quant_state_dict` without failing. In `convert_vllm_to_huggingface`, missing keys are silently ignored via `if f"{layer_name}.weight" not in quant_state_dict: ... continue` (around vllm_utils.py:1330), which leaves those modules at their initialized values and can yield incorrect model behavior while appearing successful.
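To make such gaps visible, one could run a completeness check after extraction. This is a hedged sketch, not existing Unsloth code; `report_missing_layer_weights` is a hypothetical helper.

```python
# Hypothetical helper (not in Unsloth): report modules whose weights never made it
# into quant_state_dict and would therefore keep their initialized values downstream.
def report_missing_layer_weights(model, quant_state_dict, logger):
    missing = []
    for name, module in model.named_modules():
        # Only modules that actually own a weight tensor are relevant here.
        if getattr(module, "weight", None) is not None and f"{name}.weight" not in quant_state_dict:
            missing.append(name)
    if missing:
        logger.warning(
            f"Unsloth: {len(missing)} modules missing from the extracted state dict "
            f"(kept at initialized values), e.g. {missing[:5]}"
        )
    return missing
```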
danielhanchen
left a comment
Tested on 1x NVIDIA B200 (CUDA 12.8, torch 2.9.1, vLLM 0.15.1) with Qwen3_4B_GRPO notebook -- PASS (training + inference OK).
Code review:
- The fix is correct. `continue` properly skips the `o_proj`, MLP, and layernorm extraction that follow the attention prefix assignment, preventing the `UnboundLocalError` from issue #4073.
- Minimal and safe approach. Mamba/SSM layer state dict extraction can be added separately.
One minor nit: the log message uses an em dash (U+2014) in "Skipping layer {kk} — no self_attn". Consider using a plain ASCII dash for consistency with the rest of the codebase, though this is cosmetic only.
LGTM.
danielhanchen
left a comment
Tested on 1x NVIDIA B200 (CUDA 12.8, torch 2.9.1, vLLM 0.15.1).
Verification:
- Regression test: `unsloth/Llama-3.2-1B-Instruct` with `fast_inference=True` -- PASS. `_get_vllm_state_dict` processes all attention layers correctly, inference produces valid output.
- LFM2 test: `LiquidAI/LFM2.5-1.2B-Thinking` with `fast_inference=True` -- vLLM's BitsAndBytes loader fails before reaching `_get_vllm_state_dict` (vLLM doesn't fully support `Lfm2ForCausalLM` quantization yet). Confirmed the `UnboundLocalError` on `prefix` is NOT hit.
- Unit tests: verified `continue` correctly skips `o_proj` (line 1127), `mlp.gate_up_proj` (line 1129), `mlp.down_proj` (line 1140), and layernorm extraction (lines 1143-1154) for layers without `self_attn`/`cross_attn`. Tested all-Mamba, mixed-ordering, and Mamba-first-layer edge cases (a sketch of this style of test follows below).
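A hedged sketch of the kind of unit test described in the last item; the dummy layer classes and the `collect_skipped` helper are illustrative stand-ins, not the actual test code.

```python
import types

class DummyAttnLayer:
    def __init__(self):
        self.self_attn = types.SimpleNamespace()   # presence is all the branch checks

class DummyMambaLayer:
    def __init__(self):
        self.mixer = types.SimpleNamespace()        # no self_attn / cross_attn attribute

def collect_skipped(layers):
    """Mirror the patched loop's branch selection and return skipped layer indices."""
    skipped = []
    for kk, layer in enumerate(layers):
        if hasattr(layer, "self_attn") or hasattr(layer, "cross_attn"):
            continue  # attention path: `prefix` gets assigned, extraction proceeds
        skipped.append(kk)  # else branch: layer is skipped with a log message
    return skipped

assert collect_skipped([DummyMambaLayer(), DummyMambaLayer()]) == [0, 1]                 # all-Mamba
assert collect_skipped([DummyMambaLayer(), DummyAttnLayer()]) == [0]                     # Mamba-first
assert collect_skipped([DummyAttnLayer(), DummyMambaLayer(), DummyAttnLayer()]) == [1]   # mixed ordering
```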
Code review: Fix is correct and minimal. Two small nits:
- Unclosed parenthesis in comment (line 1122): `(e.g. Mamba/SSM layers` -- missing closing).
- Consider `logger.warning` instead of `logger.info` -- silently dropping an entire layer's weights from the state dict is significant for debugging. A warning makes it more visible when users check extraction completeness. This aligns with the Codex bot's inline comment about avoiding silent weight dropping.
Neither nit blocks merge.
danielhanchen
left a comment
Update: End-to-end verification completed.
Successfully reproduced the bug and confirmed the fix works.
Bug reproduction (unpatched main, 16-bit LFM2):
File "vllm_utils.py", line 1122, in _get_vllm_state_dict
get_state_dict(f"{prefix}.o_proj", 0, state_dict, o_proj)
^^^^^^
UnboundLocalError: cannot access local variable 'prefix' where it is not associated with a value
With PR fix applied (16-bit LFM2):
```
[WARNING] Unsloth: Skipping layer 0 - no self_attn or cross_attn found.
[WARNING] Unsloth: Skipping layer 1 - no self_attn or cross_attn found.
```
The `else: continue` branch fires correctly for the non-attention layers. No more `UnboundLocalError`.
Note: LFM2 then hits a separate, unrelated issue on its attention layers (`Lfm2Attention` uses `out_proj` instead of `o_proj` -- line 1098). That is out of scope for this PR and would need a separate fix for full LFM2 support.
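For illustration only, a hedged sketch of what that separate fix might look like; `get_output_projection` is a hypothetical helper and not part of this PR.

```python
def get_output_projection(attn):
    """Return the attention output projection regardless of attribute naming.

    Covers modules that expose `out_proj` (e.g. Lfm2Attention) rather than `o_proj`.
    """
    return getattr(attn, "o_proj", None) or getattr(attn, "out_proj", None)
```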
Earlier 4-bit test did not reach `_get_vllm_state_dict` because vLLM's BnB loader fails for `Lfm2ForCausalLM`. Loading in 16-bit successfully reaches the function and exercises the fix.
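For reference, a hedged sketch of how such a 16-bit run could be driven; the exact arguments used in the verification above are not shown in the PR, so these values are illustrative.

```python
from unsloth import FastLanguageModel

# Illustrative repro sketch; model name and settings follow the tests described above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "LiquidAI/LFM2.5-1.2B-Thinking",
    max_seq_length = 2048,
    load_in_4bit   = False,   # 16-bit load; 4-bit fails earlier in vLLM's BnB loader
    fast_inference = True,    # routes loading through vLLM and _get_vllm_state_dict
)
# Unpatched main: UnboundLocalError in _get_vllm_state_dict.
# With this PR: mixer layers are skipped with the log messages shown above.
```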
Problem

`_get_vllm_state_dict` crashes with `UnboundLocalError: cannot access local variable 'prefix'` when processing LFM2/Mamba models (e.g. `LiquidAI/LFM2.5-1.2B-Thinking`). The layer iteration loop only sets `prefix` inside the `if hasattr(layer, "self_attn")` and `elif hasattr(layer, "cross_attn")` branches. Mamba/SSM layers use `mixer` instead, so neither branch executes and `prefix` is never assigned before the subsequent `get_state_dict(f"{prefix}.o_proj", ...)` call.

Fix

Add an `else` branch that logs and skips layers without `self_attn` or `cross_attn`. This is the minimally invasive fix: it prevents the crash and allows models with mixed layer types (attention + Mamba) to still extract the attention layers correctly.
Future work

A more complete fix would add an `elif hasattr(layer, "mixer")` branch that properly extracts Mamba layer state dicts (A, B, C, D matrices, dt projections, etc.) for full round-trip fidelity. This skip-based approach is a safe stopgap (a hypothetical sketch of such a branch follows below).

Fixes unslothai/unsloth#4073
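For illustration of that future work, a hypothetical sketch of a mixer-extraction helper. The submodule and parameter names (`in_proj`, `conv1d`, `dt_proj`, `out_proj`, `A_log`, `D`) follow common Mamba-style mixers and the `get_state_dict` signature is taken from the traceback above; real SSM/LFM2 layers may differ, and none of this is part of the PR.

```python
# Hypothetical sketch only -- a helper that a future `elif hasattr(layer, "mixer")`
# branch in _get_vllm_state_dict could call. All names here are assumptions.
def extract_mixer_state(layer, kk, state_dict, get_state_dict):
    prefix = f"model.layers.{kk}.mixer"
    mixer = layer.mixer
    # Projection-style submodules could reuse the same helper the attention path uses.
    for sub_name in ("in_proj", "conv1d", "dt_proj", "out_proj"):
        sub = getattr(mixer, sub_name, None)
        if sub is not None:
            get_state_dict(f"{prefix}.{sub_name}", 0, state_dict, sub)
    # Bare SSM parameters (state matrices) have no wrapping module.
    for param_name in ("A_log", "D"):
        param = getattr(mixer, param_name, None)
        if param is not None:
            state_dict[f"{prefix}.{param_name}"] = param
```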