Upgrade transformers to 5.9 by dxqb · Pull Request #1506 · Nerogar/OneTrainer

dxqb · 2026-06-05T19:01:53Z

reopens #1472

This PR needs significantly more work, because transformers has broken any existing code that uses CLIP encoders beyond the surface level (such as applying LoRAs).

There are also minor issues to fix for T5 and the Qwen TEs, but the main one is CLIP. This is the projected work; the list may be incomplete:

Background: transformers 5.6 flattened CLIPTextModel — the text_model wrapper submodule is
gone; embeddings, encoder and final_layer_norm sit directly on the model. Checkpoints on disk
keep the old text_model.* key format, and from_pretrained translates via a new central
conversion registry. CLIPTextModelWithProjection still nests a text_model, so only
CLIPTextModel users are affected: SD1.x, SDXL TE1, Flux TE1, HunyuanVideo TE2, Würstchen.
OneTrainer breaks everywhere it bypasses from_pretrained or reaches into the module structure
directly.

1. Loading. HFModelLoaderMixin loads weights manually (for custom quantization and dtype
control) and only knows the v4-era conversion hooks, which v5 removed. It needs to apply the
renamings from transformers' own registry instead (get_model_conversion_mapping +
rename_source_key), with a fallback that keeps the original key when the rename doesn't match
the module — CLIPTextModelWithProjection shares the registry entry but still has the nested
layout. This will also restore old-checkpoint Qwen loading, whose v4 workaround silently died with
the upgrade. Weights must also be re-tied after manual loading: v5 only ties them in
from_pretrained, leaving tied params like T5's shared/embed_tokens on the meta device (the
Chroma failure).

2. diffusers upgrade. Bump the pin to current main to pull in diffusers #13843, which fixes
from_single_file for flattened CLIP — covers single-file loading inside diffusers (used by SDXL
and others).

3. Attribute access. Drop .text_model at the sites where the encoder is a flattened
CLIPTextModel; add a clip_util.text_transformer() helper for the two genuinely polymorphic
sites (encode_clip, and the Würstchen prior, which is a CLIPTextModel for v2 but a
WithProjection for Stable Cascade).

4. LoRA key compatibility. LoRA key names derive from module paths, so they will silently
change (lora_te1.encoder… instead of lora_te1.text_model.encoder…), breaking resume of
existing LoRAs and kohya/ComfyUI-compatible export. Wrap flattened text encoders with a
".text_model" prefix, which reproduces the previous key set exactly — conversion tables then
need no changes.

5. Saving. save_pretrained only writes old-format keys when the model carries the
_weight_conversions it got from from_pretrained; manually built models will write flattened
keys to disk, breaking external consumers of saved checkpoints. The loader must attach the
conversions it applied, so saved models keep the ecosystem-standard key format. The single-file
exporters (convert_sd/sdxl_diffusers_to_ckpt) need the text_model segment re-added to their
output keys.

6. SD1.x single-file loading. This goes through diffusers' legacy converter, which upstream
did not fix. Normalize checkpoint keys to the flattened layout before conversion — making the old
NAI key fix implicit — and build the SD2 text encoder with the fixed modern single-file
implementation, injecting it into the legacy function to bypass its broken hardcoded conversion.
Fix the .ckpt fallback, currently dead due to a missing argument, in passing.


Koratahiu · 2026-06-14T16:07:09Z

Even with this, SDXL loading is still bugged. I had to add a few lines:

--- modules/modelSetup/BaseStableDiffusionXLSetup.py
+++ modules/modelSetup/BaseStableDiffusionXLSetup.py
@@ -142,18 +142,28 @@
     def _setup_embedding_wrapper(
             self,
             model: StableDiffusionXLModel,
             config: TrainConfig,
     ):
-        model.embedding_wrapper_1 = AdditionalEmbeddingWrapper(
-            tokenizer=model.tokenizer_1,
-            orig_module=model.text_encoder_1.text_model.embeddings.token_embedding,
-            embeddings=model.all_text_encoder_1_embeddings(),
-        )
-        model.embedding_wrapper_2 = AdditionalEmbeddingWrapper(
-            tokenizer=model.tokenizer_2,
-            orig_module=model.text_encoder_2.text_model.embeddings.token_embedding,
-            embeddings=model.all_text_encoder_2_embeddings(),
-        )
+        if hasattr(model.text_encoder_1, "text_model"):
+            orig_module_1 = model.text_encoder_1.text_model.embeddings.token_embedding
+        else:
+            orig_module_1 = model.text_encoder_1.embeddings.token_embedding
+
+        if hasattr(model.text_encoder_2, "text_model"):
+            orig_module_2 = model.text_encoder_2.text_model.embeddings.token_embedding
+        else:
+            orig_module_2 = model.text_encoder_2.embeddings.token_embedding
+
+        model.embedding_wrapper_1 = AdditionalEmbeddingWrapper(
+            tokenizer=model.tokenizer_1,
+            orig_module=orig_module_1,
+            embeddings=model.all_text_encoder_1_embeddings(),
+        )
+        model.embedding_wrapper_2 = AdditionalEmbeddingWrapper(
+            tokenizer=model.tokenizer_2,
+            orig_module=orig_module_2,
+            embeddings=model.all_text_encoder_2_embeddings(),
+        )
 
         model.embedding_wrapper_1.hook_to_module()
         model.embedding_wrapper_2.hook_to_module()

I also had to patch Diffusers with this PR: huggingface/diffusers#13843
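
The `hasattr` branch in the patch above is the same check that item 3's `clip_util.text_transformer()` helper would centralize. A possible shape for that helper is sketched here, assuming it only has to return whichever object owns `embeddings`, `encoder` and `final_layer_norm`. The body and the `SimpleNamespace` stand-ins are illustrative, not the actual implementation.

```python
from types import SimpleNamespace


def text_transformer(text_encoder):
    """Return the object that owns `embeddings`, `encoder` and
    `final_layer_norm`, whichever layout the encoder uses."""
    # Nested layout: the submodules live under `.text_model`.
    # Flattened layout: they sit directly on the encoder itself.
    return getattr(text_encoder, "text_model", text_encoder)


# Stand-ins for the two layouts (not real transformers classes).
token_embedding = object()
flat_encoder = SimpleNamespace(
    embeddings=SimpleNamespace(token_embedding=token_embedding)
)
nested_encoder = SimpleNamespace(text_model=flat_encoder)

# Both layouts resolve to the same embedding module.
emb_flat = text_transformer(flat_encoder).embeddings.token_embedding
emb_nested = text_transformer(nested_encoder).embeddings.token_embedding
```

With a helper like this, each embedding wrapper call site would need a single expression instead of its own `if`/`else`. As the later comments note, it covers attribute access only and does nothing for LoRA keys or the save format.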


dxqb · 2026-06-14T19:40:16Z

Considered some shortcuts to avoid the full CLIP-compat work here. Updated summary, now that Lens (#1510) is parked — it has no pyproject.toml so it can't be pip-installed, and pulling it from a fork isn't a dependency worth owning:

| Option | Verdict |
| --- | --- |
| Korata's patch (`hasattr(..., "text_model")` branch) | Smoke test only: fixes one `.text_model` access site (the SDXL embedding wrapper) so SDXL trains, but silently ships the LoRA-key churn (#4) and flattened save format (#5). Looks fixed, corrupts ecosystem-format output. Not mergeable as the fix. |
| Shim reproducing the old non-flat CLIP | Higher recurring maintenance than the planned `.text_model` LoRA-key prefix (#4), since you'd own a CLIP subclass fighting v5 internals every release, and it doesn't address loading (#1) or SD1.x single-file (#6). Skip. |
| Pin to the last pre-flatten transformers (`5.5.4`) | Viable interim, now that Lens is parked. The only remaining v5-forcer is Ideogram (#1522), which needs only ≥5.0. Verified the whole Ideogram stack runs on 5.5.4: the diffusers Ernie pipeline (`create_causal_mask(inputs_embeds=...)` and the Ernie4.5 model are both present), hub 1.16 is compatible (`huggingface-hub>=1.5.0,<2.0`), and diffusers `9a0aaba36` declares only `transformers>=4.41.2`. Pinning `<5.6` sidesteps CLIP flattening entirely: #2–#6 and the registry-renaming half of #1 are not needed. |
| Move CLIP loading onto `from_pretrained` + diffusers quant backends (#1521) | Strategic fix. #1, #3, #5 and the registry-renaming half of #1 exist only because OneTrainer bypasses `from_pretrained` and reaches into transformers' module layout for custom quantized/dtype loading; on `from_pretrained` they collapse (it does the old↔flat key conversion + weight tying itself). Leaves only #4, solved the clean way: flat internal LoRA keys + an explicit `text_model.` conversion table at the kohya/ComfyUI save/load boundary (plus a back-compat read path for OneTrainer LoRAs saved with the old internal keys). Makes the eventual ≥5.6 jump nearly free instead of a 6-item migration. |
| Full 6-item CLIP-compat plan (this PR's scope) | Deferred, not cancelled. Required the moment a ≥5.6 model actually lands (Lens once installable, or a future model). Worth keeping the WIP on this branch for then. |
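
The "flat internal LoRA keys plus a conversion at the save/load boundary" idea from the `from_pretrained` row could be sketched like this. The function names and the single `lora_te1.` prefix are hypothetical. The key shapes follow the `lora_te1.encoder…` versus `lora_te1.text_model.encoder…` example in item 4.

```python
TE_PREFIX = "lora_te1."                 # prefix taken from the item 4 example
NESTED_PREFIX = TE_PREFIX + "text_model."


def to_external_key(key: str) -> str:
    """Flat internal key -> key with the 'text_model.' segment, for export."""
    if key.startswith(TE_PREFIX) and not key.startswith(NESTED_PREFIX):
        return NESTED_PREFIX + key[len(TE_PREFIX):]
    return key


def to_internal_key(key: str) -> str:
    """Exported or old-internal key -> flat internal key, for loading.

    Accepting keys that still carry 'text_model.' is the back-compat read
    path: LoRAs saved before the switch load without a separate code path.
    """
    if key.startswith(NESTED_PREFIX):
        return TE_PREFIX + key[len(NESTED_PREFIX):]
    return key


def convert_state_dict(state_dict: dict, convert) -> dict:
    """Apply one of the key converters to every entry of a LoRA state dict."""
    return {convert(key): value for key, value in state_dict.items()}
```

Both converters leave keys untouched when they are already in the target form, so applying one twice, or to a file in the wrong format, does no harm. Keys outside the text encoder pass through unchanged.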

Note the 5.5.4 pin doesn't make all of #1506 disappear — the general v5 migration still has to happen regardless of the transformers version: weight re-tying after manual load (the T5 shared/embed_tokens-on-meta-device / Chroma failure is a v5.0 change, not 5.6-specific), the embedding-training rework that replaced the removed Trie, and the hub-1.16 / xet / thread_safety cleanups from #1472. But the painful, breakage-prone CLIP work comes out.

Drafted by Claude

5.5.4 is the last release before CLIP flattening in 5.6, which avoids the full CLIP-compat migration while still picking up the general v5 fixes from #1472 (Trie removal, thread-safety, hub 1.16/xet cleanup). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


dxqb · 2026-06-15T21:46:50Z

Follow-up on the 5.5.4 pin (#1524): Lens (#1510) doesn't push the transformers floor past 5.5.4 either. The dxqb/Lens fork it's installed from declares no transformers dependency, so Microsoft's reference requirements.txt pin (transformers==5.8.0) never applies to this install path — the ported LensGptOssEncoder runs fine on 5.5.4, verified with a real forward pass.

Drafted by Claude

dxqb · 2026-06-18T19:26:07Z

This PR is empty now, because all code changes were merged with #1524.
The remaining issues for upgrading to > 5.5.4 are linked in requirements-global.txt.

Revert "Revert "Upgrade transformers to 5.9 and huggingface-hub to 1.…

2f0620b

…16 (#147…" This reverts commit 574ec55.

dxqb mentioned this pull request Jun 6, 2026

New model: Microsoft Lens #1510

Draft

5 tasks

dxqb mentioned this pull request Jun 13, 2026

Please reconsider transformers>=5.0.0 huggingface/diffusers#13935

Open

dxqb added the preview merged in the preview branch label Jun 13, 2026

dxqb added a commit that referenced this pull request Jun 14, 2026

Merge PR #1506 (Upgrade transformers to 5.9 and huggingface-hub to 1.…

308c0e0

…16) into preview # Conflicts: # modules/ui/ConceptWindow.py

dxqb mentioned this pull request Jun 14, 2026

Consider reverting CLIP refactor huggingface/transformers#46644

Open

dxqb mentioned this pull request Jun 14, 2026

Upgrade transformers to 5.5.4 and huggingface-hub to 1.16 #1524

Merged

3 tasks

Bump transformers to 5.9.0

875ff99

Picks up CLIPTextModel flattening (5.6+), which requires the CLIP-compat migration described in PR #1506. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dxqb changed the title ~~Upgrade transformers to 5.9 and huggingface-hub to 1.16 (#1472)~~ Upgrade transformers to 5.9 Jun 14, 2026

dxqb mentioned this pull request Jun 15, 2026

Transformers fix #1526

Closed

3 tasks

dxqb removed the preview merged in the preview branch label Jun 16, 2026

Merge branch 'master' into revert-1504-revert-transformers-v5

3fd2f8d

dxqb closed this Jun 18, 2026

dxqb deleted the revert-1504-revert-transformers-v5 branch June 18, 2026 19:26

dxqb mentioned this pull request Jun 21, 2026

Fix CLIPTextModel compatibility with transformers > 5.5 #1548

Closed

dxqb wants to merge 4 commits into master from revert-1504-revert-transformers-v5
