Conversation
LFM2 is a hybrid architecture with alternating attention and conv layers that uses different naming conventions than standard transformer models. Previously, `_get_vllm_state_dict` would crash with `UnboundLocalError` on layers without `self_attn` (conv layers), and even if that was patched with `else: continue`, all conv layer weights would be silently dropped.

Changes in vllm_utils.py:
- Add `is_lfm2` detection based on `model_type`
- Add an LFM2 branch in the `_get_vllm_state_dict` layer loop that handles:
  - Attention layers with `out_proj` (not `o_proj`), plus `q_layernorm`, `k_layernorm`
  - Conv layers via `short_conv` (`in_proj`, `out_proj`, `conv`)
  - Feed forward via `feed_forward.w1/w2/w3` (not `mlp.gate_up_proj`)
  - Layer norms: `operator_norm`, `ffn_norm`
- Add `else: continue` for the non-LFM2 path to prevent `UnboundLocalError` on layers without `self_attn` or `cross_attn`
- Fix final norm extraction to use `embedding_norm` for LFM2
- Add LFM2 norm names to `layernorm_names` in `convert_vllm_to_huggingface`
- Add Conv1d (3D weight) handling for LFM2 `conv.conv` weights

Changes in empty_model.py:
- Add LFM2 layer names to `standard_layers` in `get_model_layer_config`
- Add LFM2 norm names to `layernorms` in `get_model_layer_config`
- Fix `set_additional_modules` to use `embedding_norm` for LFM2

Tested with LiquidAI/LFM2.5-1.2B-Thinking (16-bit, `fast_inference=True`). Regression tested with Llama-3.2-1B, Gemma-3-1b, Qwen2.5-0.5B, Phi-4-mini.
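For context on the fused feed-forward handling described above: vLLM stores LFM2's `w1` (gate) and `w3` (up) merged into a single tensor, and extraction pulls them out as shard 0 and shard 1. A minimal sketch of that split, with the helper name and shapes illustrative rather than taken from this PR:

```python
import numpy as np

# Hypothetical helper: not the repo's actual code, just the shard-0/shard-1
# row split the PR description refers to.
def split_fused_w1(fused_weight, intermediate_size):
    """Split a fused gate/up weight into HF's w1 (gate) and w3 (up).

    Rows [0, intermediate_size) are shard 0 -> w1,
    rows [intermediate_size, 2*intermediate_size) are shard 1 -> w3.
    """
    assert fused_weight.shape[0] == 2 * intermediate_size
    w1 = fused_weight[:intermediate_size]
    w3 = fused_weight[intermediate_size:]
    return w1, w3

hidden, inter = 4, 6
fused = np.arange(2 * inter * hidden).reshape(2 * inter, hidden)
w1, w3 = split_fused_w1(fused, inter)
```

In the real extraction path this split is performed by `get_state_dict` via the shard index argument rather than by manual slicing.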
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces full compatibility for LFM2 models within the vLLM state dictionary extraction and Hugging Face conversion utilities. LFM2 models, characterized by their unique hybrid architecture combining attention and convolutional layers, required specialized handling due to different naming conventions and layer structures. The changes ensure that all LFM2-specific components, including attention, convolutional, and feed-forward layers, along with their associated normalization layers, are correctly identified, extracted, and converted, preventing previous errors and ensuring functional model conversion.
Code Review
This pull request adds comprehensive support for LFM2 models within the vLLM utilities, addressing their unique architecture and naming conventions. The changes in `empty_model.py` and `vllm_utils.py` appear correct and well thought out for handling LFM2-specific layers and norms. I have two suggestions to enhance the code quality: one focuses on refactoring a complex loop for better maintainability, and the other addresses the use of `exec`, proposing a safer alternative.
```python
elif weight.ndim == 3:
    # Conv1d weights (e.g. LFM2 conv.conv) - set directly on existing module
    weight_param = torch.nn.Parameter(getattr(weight, 'data', weight), requires_grad=False)
    layer_name_br = re.sub(r"\.([\d]{1,})\.", r"[\1].", layer_name)
    exec(f"new_model.{layer_name_br}.weight = None")
    exec(f"new_model.{layer_name_br}.weight = weight_param")
    if bias is not None:
        exec(f"new_model.{layer_name_br}.bias = None")
        exec(f"new_model.{layer_name_br}.bias = bias")
    continue
```
Using `exec()` is generally discouraged due to security risks and reduced code clarity, and this pattern appears multiple times in the function. It would be safer and more maintainable to replace `exec()` with a helper function that uses `getattr()` and `setattr()` to navigate the model structure and assign the weights. For example, you could write a helper to parse the `layer_name_br` string, retrieve the target module, and then set its `weight` and `bias` attributes directly.
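One possible shape for such a helper, sketched here with a hypothetical name and signature (not code from the PR): walk the dotted path with `getattr`, treating numeric components as container indices, so neither the `re.sub` bracket rewrite nor `exec` is needed.

```python
import torch

def set_module_param(root, dotted_name, weight, bias=None):
    """Resolve a dotted path like "model.layers.3.conv.conv" by attribute
    lookup and indexing instead of re.sub + exec, then assign parameters
    directly on the target module. (Sketch only; name/signature assumed.)"""
    module = root
    for part in dotted_name.split("."):
        # Numeric components index into ModuleList/Sequential containers
        module = module[int(part)] if part.isdigit() else getattr(module, part)
    module.weight = torch.nn.Parameter(weight, requires_grad=False)
    if bias is not None:
        module.bias = torch.nn.Parameter(bias, requires_grad=False)
```

A call site would then read `set_module_param(new_model, layer_name, weight_data, bias)` in place of the two `exec` statements per attribute.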
```python
if is_lfm2:
    layer_prefix = f"{vllm_text_model_prefix}.layers.{kk}"

    # Handle attention or conv sublayers
    if hasattr(layer, "self_attn"):
        prefix = f"{layer_prefix}.self_attn"
        qkv_proj = layer.self_attn.qkv_proj
        get_state_dict(f"{prefix}.q_proj", 0, state_dict, qkv_proj)
        get_state_dict(f"{prefix}.k_proj", 1, state_dict, qkv_proj)
        get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)
    elif hasattr(layer, "cross_attn"):
        prefix = f"{vllm_text_model_prefix}.layers.{kk}.cross_attn"
        qkv_proj = layer.cross_attn.qkv_proj
        o_proj = layer.cross_attn.o_proj
        name = re.sub(r"\.(\d+)\.", r"[\1].", prefix.replace('model.language_model','language_model.model', 1) + ".qkv_proj")
        cross_attn_layer = eval(f'vllm_internals.{name}')
        q_proj = cross_attn_layer.proj['q_proj_decoder']
        kv_proj = cross_attn_layer.proj['kv_proj_encoder']
        get_state_dict(f"{prefix}.q_proj", 0, state_dict, q_proj)
        get_state_dict(f"{prefix}.k_proj", 1, state_dict, kv_proj)
        get_state_dict(f"{prefix}.v_proj", 2, state_dict, kv_proj)

        get_state_dict(f"{prefix}.o_proj", 0, state_dict, o_proj)

        proj = layer.mlp.gate_up_proj
        use_fused_gate_up = _is_fused_module("gate_up_proj")
        if use_fused_gate_up:
            # For some model types like phi3 vllm will expect fused gate_up_proj (e.g. Phi3, Phi3.5-mini-instruct, Phi4-mini-instruct)
            # so we should not split them here otherwise there will be a size mismatch when activating the adapter
            # see https://github.com/vllm-project/vllm/blob/9b693d023cf595e60b5346fdeeb41cf2a6eda838/vllm/model_executor/models/phi3.py
            get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.gate_up_proj", 0, state_dict, proj, slice_weights=False)
        # LFM2 uses out_proj, not o_proj
        out_proj = layer.self_attn.out_proj
        get_state_dict(f"{prefix}.out_proj", 0, state_dict, out_proj)
        # Attention-specific norms
        for norm_name in ("q_layernorm", "k_layernorm"):
            norm_module = getattr(layer.self_attn, norm_name, None)
            if norm_module is not None:
                norm_key = f"{prefix}.{norm_name}.weight"
                state_dict[norm_key] = norm_module.weight.data
                quant_state_dict[norm_key] = state_dict[norm_key]
    elif hasattr(layer, "short_conv"):
        # Conv layers
        conv = layer.short_conv
        get_state_dict(f"{layer_prefix}.conv.in_proj", 0, state_dict, conv.in_proj, slice_weights=False)
        get_state_dict(f"{layer_prefix}.conv.out_proj", 0, state_dict, conv.out_proj, slice_weights=False)
        get_state_dict(f"{layer_prefix}.conv.conv", 0, state_dict, conv.conv, slice_weights=False)
    else:
        continue

    # Feed forward (both attention and conv layers have feed_forward)
    ff = layer.feed_forward
    # w1 is fused (w1 + w3 in HF) -- shard 0 = w1, shard 1 = w3
    get_state_dict(f"{layer_prefix}.feed_forward.w1", 0, state_dict, ff.w1)
    get_state_dict(f"{layer_prefix}.feed_forward.w3", 1, state_dict, ff.w1)
    get_state_dict(f"{layer_prefix}.feed_forward.w2", 0, state_dict, ff.w2, slice_weights=False)

    # Layer norms
    for norm_name in ("operator_norm", "ffn_norm"):
        norm_module = getattr(layer, norm_name, None)
        if norm_module is not None:
            norm_key = f"{layer_prefix}.{norm_name}.weight"
            state_dict[norm_key] = norm_module.weight.data
            quant_state_dict[norm_key] = state_dict[norm_key]
else:
    get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.gate_proj", 0, state_dict, proj)
    get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.up_proj", 1, state_dict, proj)
    if hasattr(layer, "self_attn"):
        prefix = f"{vllm_text_model_prefix}.layers.{kk}.self_attn"
        qkv_proj = layer.self_attn.qkv_proj
        o_proj = layer.self_attn.o_proj

        use_fused_qkv = _is_fused_module("qkv_proj")
        if use_fused_qkv:
            # For some model types like phi3 vllm will expect fused qkv (e.g. Phi3, Phi3.5-mini-instruct, Phi4-mini-instruct)
            # so we should not split them here otherwise there will be a size mismatch when activating the adapter
            # see https://github.com/vllm-project/vllm/blob/9b693d023cf595e60b5346fdeeb41cf2a6eda838/vllm/model_executor/models/phi3.py
            get_state_dict(f"{prefix}.qkv_proj", 0, state_dict, qkv_proj, slice_weights=False)
        else:
            get_state_dict(f"{prefix}.q_proj", 0, state_dict, qkv_proj)
            get_state_dict(f"{prefix}.k_proj", 1, state_dict, qkv_proj)
            get_state_dict(f"{prefix}.v_proj", 2, state_dict, qkv_proj)
    elif hasattr(layer, "cross_attn"):
        prefix = f"{vllm_text_model_prefix}.layers.{kk}.cross_attn"
        qkv_proj = layer.cross_attn.qkv_proj
        o_proj = layer.cross_attn.o_proj
        name = re.sub(r"\.(\d+)\.", r"[\1].", prefix.replace('model.language_model','language_model.model', 1) + ".qkv_proj")
        cross_attn_layer = eval(f'vllm_internals.{name}')
        q_proj = cross_attn_layer.proj['q_proj_decoder']
        kv_proj = cross_attn_layer.proj['kv_proj_encoder']
        get_state_dict(f"{prefix}.q_proj", 0, state_dict, q_proj)
        get_state_dict(f"{prefix}.k_proj", 1, state_dict, kv_proj)
        get_state_dict(f"{prefix}.v_proj", 2, state_dict, kv_proj)
    else:
        continue

    proj = layer.mlp.down_proj
    get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.down_proj", 0, state_dict, proj)
    get_state_dict(f"{prefix}.o_proj", 0, state_dict, o_proj)

    # Use layernorms from the layer configuration
    layernorm_names = [name.format(kk=kk) for name in layer_config['layernorms']]
    proj = layer.mlp.gate_up_proj
    use_fused_gate_up = _is_fused_module("gate_up_proj")
    if use_fused_gate_up:
        # For some model types like phi3 vllm will expect fused gate_up_proj (e.g. Phi3, Phi3.5-mini-instruct, Phi4-mini-instruct)
        # so we should not split them here otherwise there will be a size mismatch when activating the adapter
        # see https://github.com/vllm-project/vllm/blob/9b693d023cf595e60b5346fdeeb41cf2a6eda838/vllm/model_executor/models/phi3.py
        get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.gate_up_proj", 0, state_dict, proj, slice_weights=False)
    else:
        get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.gate_proj", 0, state_dict, proj)
        get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.up_proj", 1, state_dict, proj)

    for layernorm_name in layernorm_names:
        vllm_name = layernorm_name.replace(f".{kk}.", f"[{kk}].").replace(vllm_text_model_prefix, "vllm_text_model")
        try:
            layernorm = eval(vllm_name).state_dict()["weight"]
            layernorm_name = f"{layernorm_name}.weight"
            state_dict[layernorm_name] = layernorm
            quant_state_dict[layernorm_name] = state_dict[layernorm_name]
        except Exception as e:
            skipped_layernorms.append(layernorm_name.split(".")[-1])
    proj = layer.mlp.down_proj
    get_state_dict(f"{vllm_text_model_prefix}.layers.{kk}.mlp.down_proj", 0, state_dict, proj)

    # Use layernorms from the layer configuration
    layernorm_names = [name.format(kk=kk) for name in layer_config['layernorms']]

    for layernorm_name in layernorm_names:
        vllm_name = layernorm_name.replace(f".{kk}.", f"[{kk}].").replace(vllm_text_model_prefix, "vllm_text_model")
        try:
            layernorm = eval(vllm_name).state_dict()["weight"]
            layernorm_name = f"{layernorm_name}.weight"
            state_dict[layernorm_name] = layernorm
            quant_state_dict[layernorm_name] = state_dict[layernorm_name]
        except Exception as e:
            skipped_layernorms.append(layernorm_name.split(".")[-1])
            pass
```
The logic inside this loop has become quite complex with the large `if is_lfm2: ... else: ...` block. To improve readability and maintainability, consider refactoring the logic for LFM2 and non-LFM2 models into separate helper functions. This would make the main loop in `_get_vllm_state_dict` much cleaner and easier to follow.
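One way the suggested refactor could look, with all names hypothetical: classify each layer by the sublayer attribute it exposes, then dispatch to a family-specific handler, keeping the main loop flat.

```python
def layer_kind(layer):
    """Classify a decoder layer by which sublayer attribute it exposes,
    mirroring the hasattr checks in the loop this comment refers to."""
    for attr, kind in (("self_attn", "attention"),
                       ("cross_attn", "cross_attention"),
                       ("short_conv", "conv")):
        if hasattr(layer, attr):
            return kind
    return None  # caller skips the layer, like the `else: continue` guard

def extract_layer(layer, is_lfm2, lfm2_handlers, standard_handlers):
    """Dispatch one layer to a family-specific extraction helper.

    Returns False when the layer was skipped (no matching sublayer or no
    registered handler), True when a handler ran."""
    kind = layer_kind(layer)
    handlers = lfm2_handlers if is_lfm2 else standard_handlers
    handler = handlers.get(kind) if kind else None
    if handler is None:
        return False
    handler(layer)
    return True
```

The main loop would then reduce to one `extract_layer(...)` call per layer, with the LFM2-specific and standard extraction logic living in their own small, testable functions.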
Summary

- Fixes `_get_vllm_state_dict` and `convert_vllm_to_huggingface`, which crashed with `UnboundLocalError` on layers that lack `self_attn` (conv layers). Even with `else: continue`, all conv layer weights would be silently dropped, producing a broken model.
- Updates `set_additional_modules` and `get_model_layer_config` in `empty_model.py` so the empty HF model is created with the correct layer names and norm attribute for LFM2.

What changed

vllm_utils.py -- `_get_vllm_state_dict`:
- Detect `model_type == "lfm2"` and branch into dedicated extraction logic
- Attention layers: `out_proj` (LFM2 uses `out_proj`, not `o_proj`), plus `q_layernorm`/`k_layernorm`
- Conv layers: `short_conv` (`in_proj`, `out_proj`, `conv`)
- Feed forward: `w1`/`w2`/`w3` (LFM2 uses `feed_forward`, not `mlp`)
- Layer norms: `operator_norm`, `ffn_norm`
- Final norm: `embedding_norm` for LFM2 (standard models use `norm`)
- `else: continue` for the non-LFM2 path to prevent `UnboundLocalError` on layers without `self_attn` or `cross_attn`

vllm_utils.py -- `convert_vllm_to_huggingface`:
- Add LFM2 norm names (`operator_norm`, `ffn_norm`, `q_layernorm`, `k_layernorm`) to `layernorm_names`
- Add Conv1d (3D weight) handling for LFM2 `conv.conv` weights

empty_model.py:
- Add LFM2 layer names to `standard_layers` in `get_model_layer_config`
- Add LFM2 norm names to `layernorms` in `get_model_layer_config`
- Fix `set_additional_modules` to use `embedding_norm` for LFM2

Test plan

- `LiquidAI/LFM2.5-1.2B-Thinking` with `fast_inference=True`, `load_in_16bit=True` -- loads and generates correctly
- `unsloth/Llama-3.2-1B-Instruct` with `fast_inference=True` -- still works
- `unsloth/Qwen2.5-0.5B-Instruct` with `fast_inference=True` -- still works
- `is_lfm2` is always False for non-LFM2 models, so the original code path is unchanged
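The regression guarantee in the last bullet rests on the detection being a plain `model_type` comparison. A minimal sketch of that check (function name hypothetical; the PR only states the detection is based on `model_type`):

```python
def detect_is_lfm2(config):
    """True only for LFM2 configs; every other model keeps the original
    extraction path untouched. (Sketch -- name/signature are assumptions.)"""
    return getattr(config, "model_type", None) == "lfm2"
```

Because the check can only return True when `model_type` is exactly `"lfm2"`, the new branch is unreachable for every other architecture, which is what the regression tests above confirm empirically.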