fix: load embed_tokens and norm for ColQwen2/ColQwen2_5 on transformers 5.x by whybe-choi · Pull Request #414 · illuin-tech/colpali

whybe-choi · 2026-06-10T06:44:23Z

Summary

Fixes #413.

On transformers 5.x, Qwen2VLModel.__init__ nests the text backbone under language_model, so the expected keys are language_model.embed_tokens, language_model.layers.*, language_model.norm. Released Qwen2-VL checkpoints store these as model.embed_tokens, model.layers.*, model.norm.

ColQwen2._checkpoint_conversion_mapping only remapped model.layers, so embed_tokens and norm were silently dropped and randomly re-initialized when loading from a Qwen2-VL checkpoint. ColQwen2_5 had the same incomplete mapping and was affected identically.

Changes

Add the missing rules to both modeling_colqwen2.py and modeling_colqwen2_5.py:

_checkpoint_conversion_mapping = {
    r"^base_model\.model\.custom_text_proj": "custom_text_proj",
    r"^model\.layers": "language_model.layers",
    r"^model\.embed_tokens": "language_model.embed_tokens",
    r"^model\.norm": "language_model.norm",
}

This is backward-compatible: the rules only match keys starting with model., so checkpoints already saved in the language_model.* layout are unaffected.
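The backward-compatibility claim can be checked with a small standalone sketch. It uses plain `re.subn` as a simplified stand-in for however transformers applies `_checkpoint_conversion_mapping` internally. The `rename_key` helper and the example key suffixes are hypothetical and chosen only for illustration.

```python
import re

# Mapping as given in this PR.
MAPPING = {
    r"^base_model\.model\.custom_text_proj": "custom_text_proj",
    r"^model\.layers": "language_model.layers",
    r"^model\.embed_tokens": "language_model.embed_tokens",
    r"^model\.norm": "language_model.norm",
}


def rename_key(key: str, mapping: dict[str, str] = MAPPING) -> str:
    """Apply the first matching rule to a key (hypothetical helper, not the transformers loader)."""
    for pattern, replacement in mapping.items():
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    return key  # no rule matched: key is left untouched


# Keys in the released Qwen2-VL layout are moved under language_model.
old_layout = ["model.embed_tokens.weight", "model.norm.weight"]
# Keys already in the nested layout do not start with "model." and pass through.
new_layout = ["language_model.embed_tokens.weight", "language_model.norm.weight"]

for key in old_layout + new_layout:
    print(f"{key} -> {rename_key(key)}")
```

Because every pattern is anchored with `^`, a key such as `language_model.norm.weight` can never match `^model\.norm`, which is what keeps already-converted checkpoints unaffected.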

Result

Before, language_model.embed_tokens.weight and language_model.norm.weight were reported MISSING while their source tensors were UNEXPECTED. After the fix only the expected custom_text_proj.{weight,bias} (the ColBERT projection head trained from scratch) remain MISSING.
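The before/after report can be reproduced in miniature with set arithmetic over key names. This is a rough sketch under stated assumptions: the expected and checkpoint key lists are a tiny illustrative subset (vision keys are left out), `OLD_MAPPING` contains only the backbone rule described in the summary, and `load_report` is a hypothetical helper rather than anything transformers exposes.

```python
import re

# Backbone rule that existed before the fix (other pre-existing rules omitted).
OLD_MAPPING = {r"^model\.layers": "language_model.layers"}
# Mapping after the fix, as given in this PR.
NEW_MAPPING = {
    r"^base_model\.model\.custom_text_proj": "custom_text_proj",
    r"^model\.layers": "language_model.layers",
    r"^model\.embed_tokens": "language_model.embed_tokens",
    r"^model\.norm": "language_model.norm",
}

# Illustrative subset of keys the model expects on transformers 5.x.
EXPECTED = {
    "language_model.embed_tokens.weight",
    "language_model.layers.0.mlp.up_proj.weight",
    "language_model.norm.weight",
    "custom_text_proj.weight",
    "custom_text_proj.bias",
}
# Illustrative subset of keys stored in a released Qwen2-VL checkpoint.
CHECKPOINT = [
    "model.embed_tokens.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.norm.weight",
]


def rename_key(key, mapping):
    """Apply the first matching rule to a key; return it unchanged if none match."""
    for pattern, replacement in mapping.items():
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    return key


def load_report(mapping):
    """Return (missing, unexpected) key lists after renaming the checkpoint keys."""
    loaded = {rename_key(k, mapping) for k in CHECKPOINT}
    return sorted(EXPECTED - loaded), sorted(loaded - EXPECTED)


for label, mapping in (("before", OLD_MAPPING), ("after", NEW_MAPPING)):
    missing, unexpected = load_report(mapping)
    print(label, "MISSING:", missing, "UNEXPECTED:", unexpected)
```

With the old mapping, the embedding and final norm keys appear on both sides of the report (expected name missing, source name unexpected). With the new mapping, only the two `custom_text_proj` entries remain missing.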

…s 5.x

The text backbone is nested under language_model in transformers 5.x, but released Qwen2-VL checkpoints store embed_tokens/norm under model.*. The checkpoint conversion mapping only remapped model.layers, so embed_tokens and norm were silently dropped and randomly re-initialized. Add the missing rules so these pretrained tensors load correctly. The rules only match keys starting with model., so checkpoints already saved in the language_model.* layout are unaffected.

Fixes illuin-tech#413

QuentinJGMace

Thanks for finding this and for the fix! Could you also update the changelog?

Otherwise it looks good to me.

whybe-choi · 2026-06-10T08:38:16Z

Done — added the changelog entry. Thanks!

QuentinJGMace self-requested a review June 10, 2026 08:25

QuentinJGMace reviewed Jun 10, 2026

docs: add changelog entry for ColQwen2/ColQwen2_5 key mapping fix

61f4bcf

QuentinJGMace approved these changes Jun 10, 2026

QuentinJGMace merged commit c23838d into illuin-tech:main Jun 10, 2026
6 checks passed

whybe-choi deleted the fix-colqwen2-key-mapping branch June 16, 2026 01:55

QuentinJGMace merged 2 commits into illuin-tech:main from whybe-choi:fix-colqwen2-key-mapping
