Upgrade transformers to 5.9 #1506

Closed
dxqb wants to merge 4 commits into master from revert-1504-revert-transformers-v5

@dxqb dxqb commented Jun 5, 2026

Collaborator

reopens #1472

This PR needs significantly more work, because transformers has broken any existing code that uses CLIP encoders below the surface level (such as applying LoRAs).

There are also minor issues to fix for T5 and the Qwen TEs, but the main one is CLIP. This is the projected work, which may be incomplete:

Background: transformers 5.6 flattened CLIPTextModel — the text_model wrapper submodule is
gone; embeddings, encoder and final_layer_norm sit directly on the model. Checkpoints on disk
keep the old text_model.* key format, and from_pretrained translates via a new central
conversion registry. CLIPTextModelWithProjection still nests a text_model, so only
CLIPTextModel users are affected: SD1.x, SDXL TE1, Flux TE1, HunyuanVideo TE2, Würstchen.
OneTrainer breaks everywhere it bypasses from_pretrained or reaches into the module structure
directly.
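To make the two layouts concrete, here is a minimal sketch of the key translation described above, using plain dictionaries in place of real tensors. The names `flatten_clip_keys` and `nest_clip_keys` are hypothetical and are not transformers APIs; the library's conversion registry does the real translation, and its exact rules may differ from this simple prefix handling.

```python
# Illustrative only: plain-dict stand-in for a CLIPTextModel state dict.
# flatten_clip_keys / nest_clip_keys are hypothetical names, not transformers APIs.

OLD_PREFIX = "text_model."


def flatten_clip_keys(state_dict: dict) -> dict:
    """Map old on-disk keys (text_model.*) to the flattened in-memory layout."""
    return {
        (key[len(OLD_PREFIX):] if key.startswith(OLD_PREFIX) else key): value
        for key, value in state_dict.items()
    }


def nest_clip_keys(state_dict: dict) -> dict:
    """Inverse mapping: re-add the text_model. segment for on-disk compatibility."""
    return {
        (key if key.startswith(OLD_PREFIX) else OLD_PREFIX + key): value
        for key, value in state_dict.items()
    }


# Keys as they would appear in an old-format checkpoint (values are placeholders).
on_disk = {
    "text_model.embeddings.token_embedding.weight": "tensor-A",
    "text_model.final_layer_norm.weight": "tensor-B",
}
in_memory = flatten_clip_keys(on_disk)
print(sorted(in_memory))
```

Both functions are idempotent, so applying them to keys that are already in the target format leaves those keys unchanged.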

1. Loading. HFModelLoaderMixin loads weights manually (for custom quantization and dtype
control) and only knows the v4-era conversion hooks, which v5 removed. It needs to apply the
renamings from transformers' own registry instead (get_model_conversion_mapping +
rename_source_key), with a fallback that keeps the original key when the rename doesn't match
the module — CLIPTextModelWithProjection shares the registry entry but still has the nested
layout. This will also restore old-checkpoint Qwen loading, whose v4 workaround silently died with
the upgrade. Weights must also be re-tied after manual loading: v5 only ties them in
from_pretrained, leaving tied params like T5's shared/embed_tokens on the meta device (the
Chroma failure).

2. diffusers upgrade. Bump the pin to current main to pull in diffusers #13843, which fixes
from_single_file for flattened CLIP — covers single-file loading inside diffusers (used by SDXL
and others).

3. Attribute access. Drop .text_model at the sites where the encoder is a flattened
CLIPTextModel; add a clip_util.text_transformer() helper for the two genuinely polymorphic
sites (encode_clip, and the Würstchen prior, which is a CLIPTextModel for v2 but a
WithProjection for Stable Cascade).

4. LoRA key compatibility. LoRA key names derive from module paths, so they will silently
change (lora_te1.encoder… instead of lora_te1.text_model.encoder…), breaking resume of
existing LoRAs and kohya/ComfyUI-compatible export. Wrap flattened text encoders with a
".text_model" prefix, which reproduces the previous key set exactly — conversion tables then
need no changes.

5. Saving. save_pretrained only writes old-format keys when the model carries the
_weight_conversions it got from from_pretrained; manually built models will write flattened
keys to disk, breaking external consumers of saved checkpoints. The loader must attach the
conversions it applied, so saved models keep the ecosystem-standard key format. The single-file
exporters (convert_sd/sdxl_diffusers_to_ckpt) need the text_model segment re-added to their
output keys.

6. SD1.x single-file loading. This goes through diffusers' legacy converter, which upstream
did not fix. Normalize checkpoint keys to the flattened layout before conversion — making the old
NAI key fix implicit — and build the SD2 text encoder with the fixed modern single-file
implementation, injecting it into the legacy function to bypass its broken hardcoded conversion.
Fix the .ckpt fallback, currently dead due to a missing argument, in passing.
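The LoRA key issue in item 4 can be illustrated without torch. The sketch below derives key names from module paths on a tiny stand-in tree and shows how a "text_model." path prefix reproduces the old key set on a flattened encoder. `Node` and `lora_keys` are hypothetical simplifications, not OneTrainer's actual LoRA wrapper, and the dotted key form follows the examples in item 4.

```python
# Illustrative only: a tiny module tree without torch.
# Node and lora_keys are simplified stand-ins, not OneTrainer code.


class Node:
    """Minimal stand-in for a module with named children."""

    def __init__(self, **children: "Node") -> None:
        self.children = children

    def leaf_paths(self, prefix: str = "") -> list:
        """Return dotted paths of all leaf nodes, in insertion order."""
        if not self.children:
            return [prefix]
        paths = []
        for name, child in self.children.items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            paths.extend(child.leaf_paths(child_prefix))
        return paths


def lora_keys(root: Node, lora_prefix: str, path_prefix: str = "") -> list:
    """Derive LoRA key names from module paths, optionally under a path prefix."""
    return [f"{lora_prefix}.{path_prefix}{path}" for path in root.leaf_paths()]


# Nested (pre-5.6) encoder vs. flattened (5.6+) encoder.
nested = Node(text_model=Node(encoder=Node(q_proj=Node(), k_proj=Node())))
flat = Node(encoder=Node(q_proj=Node(), k_proj=Node()))

old_keys = lora_keys(nested, "lora_te1")
changed_keys = lora_keys(flat, "lora_te1")
restored_keys = lora_keys(flat, "lora_te1", path_prefix="text_model.")
print(old_keys == restored_keys)
```

Without the prefix, the flattened tree yields `lora_te1.encoder...` keys, which is the silent change item 4 warns about.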

@dxqb dxqb mentioned this pull request Jun 6, 2026
5 tasks
@dxqb dxqb added the preview merged in the preview branch label Jun 13, 2026
@Koratahiu

Contributor

Even with this, SDXL loading is still bugged. I had to add a few lines:

--- modules/modelSetup/BaseStableDiffusionXLSetup.py
+++ modules/modelSetup/BaseStableDiffusionXLSetup.py
@@ -142,18 +142,28 @@
     def _setup_embedding_wrapper(
             self,
             model: StableDiffusionXLModel,
             config: TrainConfig,
     ):
-        model.embedding_wrapper_1 = AdditionalEmbeddingWrapper(
-            tokenizer=model.tokenizer_1,
-            orig_module=model.text_encoder_1.text_model.embeddings.token_embedding,
-            embeddings=model.all_text_encoder_1_embeddings(),
-        )
-        model.embedding_wrapper_2 = AdditionalEmbeddingWrapper(
-            tokenizer=model.tokenizer_2,
-            orig_module=model.text_encoder_2.text_model.embeddings.token_embedding,
-            embeddings=model.all_text_encoder_2_embeddings(),
-        )
+        if hasattr(model.text_encoder_1, "text_model"):
+            orig_module_1 = model.text_encoder_1.text_model.embeddings.token_embedding
+        else:
+            orig_module_1 = model.text_encoder_1.embeddings.token_embedding
+
+        if hasattr(model.text_encoder_2, "text_model"):
+            orig_module_2 = model.text_encoder_2.text_model.embeddings.token_embedding
+        else:
+            orig_module_2 = model.text_encoder_2.embeddings.token_embedding
+
+        model.embedding_wrapper_1 = AdditionalEmbeddingWrapper(
+            tokenizer=model.tokenizer_1,
+            orig_module=orig_module_1,
+            embeddings=model.all_text_encoder_1_embeddings(),
+        )
+        model.embedding_wrapper_2 = AdditionalEmbeddingWrapper(
+            tokenizer=model.tokenizer_2,
+            orig_module=orig_module_2,
+            embeddings=model.all_text_encoder_2_embeddings(),
+        )
 
         model.embedding_wrapper_1.hook_to_module()
         model.embedding_wrapper_2.hook_to_module()

I also had to patch Diffusers with this PR: huggingface/diffusers#13843
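The two hasattr branches in the patch above could be folded into a single accessor, similar in spirit to the clip_util.text_transformer() helper proposed in item 3. This is only a sketch: the function bodies are assumptions, and `SimpleNamespace` objects stand in for the real encoders. It deduplicates attribute access only and does not address LoRA key names or the save format.

```python
# Illustrative only: SimpleNamespace objects stand in for the real encoders.
# text_transformer() here is a hypothetical sketch, not OneTrainer's actual helper.
from types import SimpleNamespace


def text_transformer(text_encoder):
    """Return the object that owns embeddings/encoder/final_layer_norm.

    Nested layout (e.g. CLIPTextModelWithProjection): text_encoder.text_model
    Flattened layout (CLIPTextModel on transformers >= 5.6): the encoder itself
    """
    return getattr(text_encoder, "text_model", text_encoder)


def token_embedding(text_encoder):
    """Resolve the token embedding module regardless of layout."""
    return text_transformer(text_encoder).embeddings.token_embedding


# Stand-ins for the two layouts.
flat_encoder = SimpleNamespace(
    embeddings=SimpleNamespace(token_embedding="flat-embedding")
)
nested_encoder = SimpleNamespace(
    text_model=SimpleNamespace(
        embeddings=SimpleNamespace(token_embedding="nested-embedding")
    )
)

print(token_embedding(flat_encoder))
print(token_embedding(nested_encoder))
```

With such a helper, each `orig_module` argument in `_setup_embedding_wrapper` would reduce to one call per text encoder.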

dxqb added a commit that referenced this pull request Jun 14, 2026
…16) into preview

# Conflicts:
#	modules/ui/ConceptWindow.py

dxqb commented Jun 14, 2026

Collaborator Author

Considered some shortcuts to avoid the full CLIP-compat work here. Updated summary, now that Lens (#1510) is parked — it has no pyproject.toml so it can't be pip-installed, and pulling it from a fork isn't a dependency worth owning:

| Option | Verdict |
| --- | --- |
| Korata's patch (hasattr(..., "text_model") branch) | Smoke test only: fixes one .text_model access site (the SDXL embedding wrapper) so SDXL trains, but silently ships the LoRA-key churn (#4) and flattened save format (#5). Looks fixed, corrupts ecosystem-format output. Not mergeable as the fix. |
| Shim reproducing the old non-flat CLIP | Higher recurring maintenance than the planned .text_model LoRA-key prefix (#4), since you'd own a CLIP subclass fighting v5 internals every release, and it doesn't address loading (#1) or SD1.x single-file (#6). Skip. |
| Pin to the last pre-flatten transformers (5.5.4) | Viable interim, now that Lens is parked. The only remaining v5-forcer is Ideogram (#1522), which needs only ≥5.0. Verified the whole Ideogram stack runs on 5.5.4: the diffusers Ernie pipeline (create_causal_mask(inputs_embeds=...) and the Ernie4.5 model are both present), hub 1.16 is compatible (huggingface-hub>=1.5.0,<2.0), and diffusers 9a0aaba36 declares only transformers>=4.41.2. Pinning <5.6 sidesteps CLIP flattening entirely: #2–#6 and the registry-renaming half of #1 are not needed. |
| Move CLIP loading onto from_pretrained + diffusers quant backends (#1521) | Strategic fix. #1, #3, #5 and the registry-renaming half of #1 exist only because OneTrainer bypasses from_pretrained and reaches into transformers' module layout for custom quantized/dtype loading; on from_pretrained they collapse (it does the old↔flat key conversion + weight tying itself). Leaves only #4, solved the clean way: flat internal LoRA keys + an explicit text_model. conversion table at the kohya/ComfyUI save/load boundary (plus a back-compat read path for OneTrainer LoRAs saved with the old internal keys). Makes the eventual ≥5.6 jump nearly free instead of a 6-item migration. |
| Full 6-item CLIP-compat plan (this PR's scope) | Deferred, not cancelled. Required the moment a ≥5.6 model actually lands (Lens once installable, or a future model). Worth keeping the WIP on this branch for then. |

Note the 5.5.4 pin doesn't make all of #1506 disappear — the general v5 migration still has to happen regardless of the transformers version: weight re-tying after manual load (the T5 shared/embed_tokens-on-meta-device / Chroma failure is a v5.0 change, not 5.6-specific), the embedding-training rework that replaced the removed Trie, and the hub-1.16 / xet / thread_safety cleanups from #1472. But the painful, breakage-prone CLIP work comes out.
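A version gate for the pin discussed above could look like the sketch below. The 5.6 boundary is taken from this thread; `parse_release` and `is_pre_flatten` are hypothetical helper names, and the parsing is deliberately simplistic (it reads only the leading numeric components of each dotted segment).

```python
# Illustrative only: a version gate for the <5.6 pin discussed in this thread.
# parse_release() and is_pre_flatten() are hypothetical helper names.

FLATTEN_VERSION = (5, 6)  # CLIPTextModel flattening landed in 5.6 per this thread


def parse_release(version: str) -> tuple:
    """Parse the leading numeric components of a version string like '5.5.4'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break  # stop at the first non-numeric segment, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def is_pre_flatten(version: str) -> bool:
    """True if this transformers version predates CLIPTextModel flattening."""
    return parse_release(version) < FLATTEN_VERSION


for candidate in ("5.5.4", "5.6.0", "5.9.0"):
    print(candidate, is_pre_flatten(candidate))
```

A real check would more likely rely on `packaging.version.Version`, which handles pre-release and development tags properly; the hand-rolled parser here only keeps the example free of third-party imports.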

Drafted by Claude

5.5.4 is the last release before CLIP flattening in 5.6, which avoids
the full CLIP-compat migration while still picking up the general v5
fixes from #1472 (Trie removal, thread-safety, hub 1.16/xet cleanup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Picks up CLIPTextModel flattening (5.6+), which requires the
CLIP-compat migration described in PR #1506.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dxqb dxqb changed the title from "Upgrade transformers to 5.9 and huggingface-hub to 1.16 (#1472)" to "Upgrade transformers to 5.9" Jun 14, 2026
@dxqb dxqb mentioned this pull request Jun 15, 2026
3 tasks

dxqb commented Jun 15, 2026

Collaborator Author

Follow-up on the 5.5.4 pin (#1524): Lens (#1510) doesn't push the transformers floor past 5.5.4 either. The dxqb/Lens fork it's installed from declares no transformers dependency, so Microsoft's reference requirements.txt pin (transformers==5.8.0) never applies to this install path — the ported LensGptOssEncoder runs fine on 5.5.4, verified with a real forward pass.

Drafted by Claude

@dxqb dxqb removed the preview merged in the preview branch label Jun 16, 2026

dxqb commented Jun 18, 2026

Collaborator Author

This PR is empty now, because all code changes were merged with #1524.
The remaining issues for upgrading past 5.5.4 are linked in requirements-global.txt.

@dxqb dxqb closed this Jun 18, 2026
@dxqb dxqb deleted the revert-1504-revert-transformers-v5 branch June 18, 2026 19:26