Upgrade transformers to 5.5.4 and huggingface-hub to 1.16 #1524
Merged
…16 (Nerogar#147…" This reverts commit 574ec55.
5.5.4 is the last release before CLIP flattening in 5.6, which avoids the full CLIP-compat migration while still picking up the general v5 fixes from Nerogar#1472 (Trie removal, thread-safety, hub 1.16/xet cleanup). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HFModelLoaderMixin checked sub_module._checkpoint_conversion_mapping for
legacy checkpoint key renaming, but that attribute is just an empty {}
declared on the base PreTrainedModel class in this transformers version.
The actual renaming rules now live in transformers' central conversion
registry. Use get_checkpoint_conversion_mapping()/rename_source_key()
instead, which correctly remaps Ernie's Mistral3 text encoder
(language_model.model.* -> language_model.*) and Qwen's Qwen2_5_VL text
encoder (model.* -> model.language_model.*, visual.* -> model.visual.*).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
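To make the renaming described in this commit concrete, here is a minimal, library-free sketch. It does not call transformers' `get_checkpoint_conversion_mapping()` or `rename_source_key()`; the rule tables and the `rename_key` / `rename_state_dict` helpers are hypothetical and only mirror the prefix rewrites named in the commit message.

```python
import re

# Hypothetical rule tables (regex on the source key, replacement prefix).
# They only mirror the rewrites named in the commit message and are NOT
# copied from transformers' conversion registry.
MISTRAL3_RULES = [
    (r"^language_model\.model\.", "language_model."),
]
QWEN2_5_VL_RULES = [
    (r"^visual\.", "model.visual."),
    # Skip keys that already use the new layout so renaming is idempotent.
    (r"^model\.(?!language_model\.|visual\.)", "model.language_model."),
]


def rename_key(key, rules):
    """Apply the first matching rule to a checkpoint key, at most once."""
    for pattern, replacement in rules:
        new_key, count = re.subn(pattern, replacement, key, count=1)
        if count:
            return new_key
    return key


def rename_state_dict(state_dict, rules):
    """Return a new dict whose keys follow the renamed layout."""
    return {rename_key(k, rules): v for k, v in state_dict.items()}


legacy = {"model.layers.0.weight": 1, "visual.blocks.0.weight": 2}
renamed = rename_state_dict(legacy, QWEN2_5_VL_RULES)
```

Applying only the first matching rule matters here: after `visual.*` becomes `model.visual.*`, the key must not be rewritten again by the `model.*` rule.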
T5EncoderModel's encoder.embed_tokens.weight is tied to shared.weight, saved only once in the checkpoint. The manual loading path in HFModelLoaderMixin.__load_sub_module only assigns tensors for keys present in the state dict, so the tied key was left an empty meta tensor, crashing Chroma's text_encoder, Flux's text_encoder_2 and SD3's text_encoder_3 with "Cannot copy out of meta tensor; no data!".

Generalize the fix in __load_sub_module: for every _tied_weights_keys entry still on the meta device, clone the already-loaded, already dtype-converted source parameter into it. Cloning (rather than aliasing via the real tie_weights()) keeps the two parameters as separate objects, so a later in-place quantization of one side (e.g. quantize_layers() quantizing a quantized lm_head) can't silently corrupt the other (e.g. the embedding table) through a shared Parameter object -- verified empirically.

This subsumes the per-loader manual workaround already used for Qwen3-based causal LM text encoders (Flux2, ZImage), which is removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
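The clone-versus-alias argument in this commit can be illustrated without PyTorch. The sketch below is an analogy only: plain Python lists stand in for tensors, `None` stands in for a parameter still on the meta device, and `fill_tied` plus the `tied` mapping are hypothetical names, not the code in `__load_sub_module`.

```python
import copy


def fill_tied(params, tied):
    """Fill unloaded tied targets with an independent copy of their source.

    params: name -> list of floats, or None for "not loaded yet"
            (a stand-in for a meta tensor).
    tied:   target name -> source name (hypothetical mapping).
    """
    for target, source in tied.items():
        if params.get(target) is None and params.get(source) is not None:
            # Copy instead of aliasing, so the two entries stay separate objects.
            params[target] = copy.deepcopy(params[source])
    return params


params = {
    "shared.weight": [0.5, -1.0, 2.0],
    "encoder.embed_tokens.weight": None,  # tied key, absent from the checkpoint
}
fill_tied(params, {"encoder.embed_tokens.weight": "shared.weight"})

# Simulate an in-place "quantization" of one side only.
embedding = params["encoder.embed_tokens.weight"]
for i, value in enumerate(embedding):
    embedding[i] = float(round(value))
```

Because the target was copied, the in-place rounding changes only `encoder.embed_tokens.weight`; had the two names pointed at the same list, `shared.weight` would have been modified as well.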
get_checkpoint_conversion_mapping(sub_module.config.model_type), added to fix Ernie/Qwen text encoder loading, ran unconditionally in __load_sub_module. Diffusers sub-modules (loaded via _load_diffusers_sub_module, e.g. Anima's VAE) have a plain FrozenDict config with no model_type attribute, crashing with AttributeError. Diffusers has no such checkpoint-conversion registry and never needs this renaming, so skip it when the config lacks model_type. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
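A minimal sketch of the guard described above, assuming the registry lookup is passed in as a callable. `conversion_rules_for` and `lookup` are hypothetical names; a plain dict stands in for a diffusers-style config that has no `model_type` attribute.

```python
from types import SimpleNamespace


def conversion_rules_for(config, lookup):
    """Return renaming rules for a config, or None when it has no model_type.

    `lookup` is a stand-in for the registry query; it is only called when
    the config actually exposes a model_type attribute.
    """
    model_type = getattr(config, "model_type", None)
    if model_type is None:
        return None  # e.g. a diffusers sub-module: nothing to rename
    return lookup(model_type)


# Hypothetical registry contents, for illustration only.
registry = {"mistral3": [("language_model.model.", "language_model.")]}

transformers_like = SimpleNamespace(model_type="mistral3")
diffusers_like = {"in_channels": 4}  # dict-style config without model_type

rules_a = conversion_rules_for(transformers_like, registry.get)
rules_b = conversion_rules_for(diffusers_like, registry.get)
```

Using `getattr` with a default avoids the `AttributeError` path entirely instead of catching it after the fact.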
dxqb commented Jun 18, 2026
…hts check Addresses review comment on PR Nerogar#1524.
Collaborator, Author: smoke test passes on embeddings
Collaborator, Author: LoRA smoke test passed via preview branch
dxqb added a commit to dxqb/OneTrainer that referenced this pull request Jun 27, 2026
The ctk view refactor moved ConceptWindow.__download_dataset into the new ConceptWindowController.download_dataset from a base predating PR Nerogar#1524, which silently reverted that PR's fix: the method again called huggingface_hub.login(token=..., new_session=False) with no empty-token guard. new_session was removed from login() in huggingface-hub 1.16, so every call raised TypeError, swallowed by the surrounding except, so snapshot_download never ran and dataset download was fully broken. Re-apply Nerogar#1524 at the new location: only login when a token is configured, and drop new_session. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
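The guard this commit re-applies can be sketched as follows. To keep the example runnable without network access, the login function is injected; `login_if_configured` and `fake_login` are hypothetical stand-ins, with `fake_login` accepting only `token`, as the commit message describes for `login()` in huggingface-hub 1.16.

```python
calls = []


def fake_login(token):
    """Stand-in for huggingface_hub.login; accepts only `token`."""
    calls.append(token)


def login_if_configured(token, login_fn):
    """Log in only when a non-empty token is configured.

    Returns True if login_fn was called, False if it was skipped.
    """
    if not token:
        return False
    login_fn(token=token)  # no new_session argument
    return True


skipped = login_if_configured("", fake_login)
logged_in = login_if_configured("hf_example_token", fake_login)

# Passing the removed keyword reproduces the failure mode from the commit message.
try:
    fake_login(token="hf_example_token", new_session=False)
    raised = False
except TypeError:
    raised = True
```

In the regression described above, that `TypeError` was swallowed by a surrounding `except`, which is why the download step never ran.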
Summary
Reopens #1472, but only upgrades to 5.5.4, the release right before huggingface/transformers#44431 was merged, to avoid the issues described in #1506.
Test plan
pre-commit run --all-files passes
AI assistance