Upgrade transformers to 5.9 #1506
Even with this, SDXL loading is still bugged. I had to add a few lines: I also had to patch Diffusers with this PR: huggingface/diffusers#13843
…16) into preview # Conflicts: # modules/ui/ConceptWindow.py
Considered some shortcuts to avoid the full CLIP-compat work here. Updated summary, now that Lens (#1510) is parked — it has no
Note the Drafted by Claude |
5.5.4 is the last release before CLIP flattening in 5.6, which avoids the full CLIP-compat migration while still picking up the general v5 fixes from #1472 (Trie removal, thread-safety, hub 1.16/xet cleanup). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Picks up CLIPTextModel flattening (5.6+), which requires the CLIP-compat migration described in PR #1506. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-up on the Drafted by Claude |
this PR is empty now, because all code changes were merged with #1524
reopens #1472
This PR needs significantly more work, because transformers has broken any existing code that uses CLIP encoders beyond the surface level (such as applying LoRAs).
There are also minor issues to fix for T5 and the Qwen TEs, but the main one is CLIP. This is the projected work; the list may be incomplete:
Background: transformers 5.6 flattened `CLIPTextModel`. The `text_model` wrapper submodule is gone; `embeddings`, `encoder` and `final_layer_norm` sit directly on the model. Checkpoints on disk keep the old `text_model.*` key format, and `from_pretrained` translates via a new central conversion registry. `CLIPTextModelWithProjection` still nests a `text_model`, so only `CLIPTextModel` users are affected: SD1.x, SDXL TE1, Flux TE1, HunyuanVideo TE2, Würstchen.
OneTrainer breaks everywhere it bypasses `from_pretrained` or reaches into the module structure directly.
1. Loading. `HFModelLoaderMixin` loads weights manually (for custom quantization and dtype control) and only knows the v4-era conversion hooks, which v5 removed. It needs to apply the renamings from transformers' own registry instead (`get_model_conversion_mapping` + `rename_source_key`), with a fallback that keeps the original key when the rename doesn't match the module: `CLIPTextModelWithProjection` shares the registry entry but still has the nested layout. This will also restore old-checkpoint Qwen loading, whose v4 workaround silently died with the upgrade. Weights must also be re-tied after manual loading: v5 only ties them in `from_pretrained`, leaving tied params like T5's `shared`/`embed_tokens` on the meta device (the Chroma failure).
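
The fallback rule in step 1 can be sketched without importing transformers. This is a minimal illustration under stated assumptions, not OneTrainer's loader: `rename_key` is a hypothetical stand-in for whatever rename the transformers registry produces (the real `get_model_conversion_mapping` and `rename_source_key` signatures are not reproduced here), `strip_text_model` is an illustrative rename only, and `expected_keys` stands for the parameter names of the instantiated module.

```python
from typing import Callable, Dict, Iterable, Set


def rename_with_fallback(
    checkpoint_keys: Iterable[str],
    expected_keys: Set[str],
    rename_key: Callable[[str], str],
) -> Dict[str, str]:
    """Map each checkpoint key to the key the target module expects.

    rename_key is a hypothetical stand-in for the registry-driven rename.
    If the renamed key does not exist on the module, the original key is kept.
    """
    mapping = {}
    for key in checkpoint_keys:
        renamed = rename_key(key)
        mapping[key] = renamed if renamed in expected_keys else key
    return mapping


def strip_text_model(key: str) -> str:
    # Illustrative rename only: drop a leading "text_model." segment.
    prefix = "text_model."
    return key[len(prefix):] if key.startswith(prefix) else key
```

With a flattened module, `text_model.embeddings.token_embedding.weight` maps to `embeddings.token_embedding.weight`. With a module that still nests `text_model`, the renamed key is not among `expected_keys`, so the original key is kept. Re-tying weights after loading is not shown.
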
2. diffusers upgrade. Bump the pin to current main to pull in diffusers #13843, which fixes `from_single_file` for flattened CLIP. This covers single-file loading inside diffusers (used by SDXL and others).
3. Attribute access. Drop `.text_model` at the sites where the encoder is a flattened `CLIPTextModel`; add a `clip_util.text_transformer()` helper for the two genuinely polymorphic sites (`encode_clip`, and the Würstchen prior, which is a `CLIPTextModel` for v2 but a `WithProjection` for Stable Cascade).
4. LoRA key compatibility. LoRA key names derive from module paths, so they will silently change (`lora_te1.encoder…` instead of `lora_te1.text_model.encoder…`), breaking resume of existing LoRAs and kohya/ComfyUI-compatible export. Wrap flattened text encoders with a `".text_model"` prefix, which reproduces the previous key set exactly, so conversion tables then need no changes.
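
Steps 3 and 4 could look roughly like the sketch below. The helper name `text_transformer` comes from the description above, but its body is an assumption. `lora_key`, `FlatEncoder` and `NestedEncoder` are hypothetical names used only for illustration, and the dotted key format follows the examples in step 4 rather than any particular export format.

```python
def text_transformer(text_encoder):
    """Return the object that owns embeddings/encoder/final_layer_norm.

    Nested layout: the text_model submodule.
    Flattened layout: the encoder itself, since it has no text_model attribute.
    """
    return getattr(text_encoder, "text_model", text_encoder)


def lora_key(prefix: str, module_path: str, flattened: bool) -> str:
    """Build a LoRA key name from a module path (hypothetical helper).

    For a flattened encoder the "text_model" segment is re-inserted, so the
    key matches the name a pre-flattening module path would have produced.
    """
    if flattened:
        module_path = f"text_model.{module_path}"
    return f"{prefix}.{module_path}"


class FlatEncoder:
    """Stand-in for a flattened text encoder (no text_model attribute)."""

    encoder = "encoder-stack"


class NestedEncoder:
    """Stand-in for an encoder that still nests a text_model."""

    def __init__(self):
        self.text_model = FlatEncoder()
```

Both layouts then resolve to the same kind of object, and `lora_key("lora_te1", "encoder.layers.0", flattened=True)` yields the same name as the nested path `text_model.encoder.layers.0` with `flattened=False`.
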
5. Saving. `save_pretrained` only writes old-format keys when the model carries the `_weight_conversions` it got from `from_pretrained`; manually built models will write flattened keys to disk, breaking external consumers of saved checkpoints. The loader must attach the conversions it applied, so saved models keep the ecosystem-standard key format. The single-file exporters (`convert_sd/sdxl_diffusers_to_ckpt`) need the `text_model` segment re-added to their output keys.
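
For the exporters, re-adding the segment might be as small as the sketch below. It assumes the exporter works on a plain dict of flattened `CLIPTextModel` keys; `to_legacy_clip_keys` is a hypothetical name, and the three top-level names are the ones listed in the background section.

```python
from typing import Any, Dict

# Submodules that sit directly on a flattened CLIPTextModel.
CLIP_TOP_LEVEL = ("embeddings.", "encoder.", "final_layer_norm.")


def to_legacy_clip_keys(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Re-add the "text_model." segment to flattened text encoder keys.

    Keys that already start with "text_model." or that belong to other
    submodules (for example a projection layer) are left unchanged.
    """
    legacy = {}
    for key, value in state_dict.items():
        if key.startswith(CLIP_TOP_LEVEL):
            key = "text_model." + key
        legacy[key] = value
    return legacy
```

Because already-nested keys pass through untouched, calling the function twice gives the same result as calling it once.
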
6. SD1.x single-file loading. This goes through diffusers' legacy converter, which upstream did not fix. Normalize checkpoint keys to the flattened layout before conversion (making the old NAI key fix implicit) and build the SD2 text encoder with the fixed modern single-file implementation, injecting it into the legacy function to bypass its broken hardcoded conversion. Fix the `.ckpt` fallback, currently dead due to a missing argument, in passing.
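
The key normalization in step 6 could be sketched as follows. This assumes the SD1.x checkpoint stores its text encoder under the `cond_stage_model.transformer.` prefix; `TE_PREFIX` and `normalize_to_flattened` are hypothetical names, and the real change may hook into the converter differently.

```python
from typing import Any, Dict

# Assumed location of the text encoder inside an SD1.x single-file checkpoint.
TE_PREFIX = "cond_stage_model.transformer."
NESTED_PREFIX = TE_PREFIX + "text_model."


def normalize_to_flattened(state_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Drop the "text_model." segment from text encoder keys.

    Keys that already use the flattened layout, and keys outside the text
    encoder, pass through unchanged.
    """
    normalized = {}
    for key, value in state_dict.items():
        if key.startswith(NESTED_PREFIX):
            key = TE_PREFIX + key[len(NESTED_PREFIX):]
        normalized[key] = value
    return normalized
```

After this pass, a checkpoint arrives at the converter with one consistent text encoder layout regardless of which layout it was saved in.
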