New model: Microsoft Lens #1510
…model composition in ModelType
- Gradient checkpointing and layer offloading are now configured per component (text encoder, transformer, VAE) rather than globally
- ModelType centralizes model composition and training method associations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…16 (Nerogar#147…" This reverts commit 574ec55.
Introduces OnDemandModule, a persistent delegating proxy for text encoders that must be loaded on demand and freed after use rather than parked on the CPU temp device. Adds load_on_demand per-component config and four text_encoder_N_on_demand() resolvers in TrainConfig. BaseModel.to(device) is removed as an abstract method; release() is now the sole abstract method for parking a model. Each concrete model reads self.train_config.temp_device directly. Call sites in modelSetup, dataLoader, trainer, and SampleWindow are updated to model.release(). Co-Authored-By: dxqb <183307934+dxqb@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Did a bit of testing with this PR:
What doesn’t work:
Loss starts at a fairly high level (~0.6). A longer run would be needed to see whether that is a problem. But to do a longer run, I would first want common inference tools to support the LoRA format.
torch.compile: this is a torch bug introduced in torch 2.12, see pytorch/pytorch#186537
Since torch 2.12, dynamo config overrides are stored in ContextVars (pytorch/pytorch#173568), which threads don't inherit. The recompile limit raised by init_compile() therefore no longer applied to the autograd worker threads, which recompute checkpointed modules during the backward pass - training crashed with FailOnRecompileLimitHit once enough input shapes accumulated, despite the raised limit. Call init_compile() in OffloadCheckpointLayer.__checkpointing_forward so the recomputation thread sets its own config. The non-offload CheckpointLayer is unaffected: it is compiled as a whole, so its recomputation runs inside the compiled backward without re-entering dynamo. The Mod.eval patch is applied once at import time instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
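The thread-inheritance problem and the fix can be reproduced with plain `contextvars`. In this sketch, `recompile_limit` is a hypothetical stand-in for a dynamo config value, and `init_compile()` is only an analogue of OneTrainer's function. The worker runs inside an explicitly empty `Context` to model a thread that does not inherit the caller's context, which is the default for new threads on standard CPython builds.

```python
import contextvars
import threading

# Hypothetical stand-in for a dynamo config value that torch >= 2.12 keeps in a ContextVar.
recompile_limit = contextvars.ContextVar("recompile_limit", default=8)


def init_compile() -> None:
    # Hypothetical analogue of init_compile(): raise the limit in the *current* context only.
    recompile_limit.set(64)


def limit_seen_by_worker(call_init_in_worker: bool) -> int:
    seen = []

    def worker() -> None:
        if call_init_in_worker:
            init_compile()  # the fix: the worker configures its own context
        seen.append(recompile_limit.get())

    # Run the worker inside a fresh, empty Context to model a thread that does not
    # inherit the caller's context (the default for new threads on standard CPython).
    thread = threading.Thread(target=contextvars.Context().run, args=(worker,))
    thread.start()
    thread.join()
    return seen[0]


init_compile()  # raised in the main thread only
```

Without the in-thread call, the worker sees the default limit even though the main thread raised it. Calling the init function inside the recomputation path is what makes the raised limit visible there.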
Updated comments regarding the handling of negative numbers in Mod.eval and related issues.
Merged the bug workaround branch into this branch: #1511
Yes, the workaround helps. Both the transformer blocks and the optimizer can now be compiled.
Move the per-callsite checkpointing_or_offloading_enabled() guard into enable_checkpointing() itself, so every Base*Setup can call enable_checkpointing_for_* unconditionally. Also extend the central gate to allow a compile-only path (no checkpointing/offloading, but still per-layer torch.compile wrapping) when config.compile is set. Three direct diffusers enable_gradient_checkpointing() calls (SD/SDXL unet, Wuerstchen v2 prior) keep their explicit guard since they bypass this central mechanism.
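The central gate can be sketched as a single function that every setup calls unconditionally. This is only an illustration. The config classes, their field names, and the `(mode, block)` labels are hypothetical, and they stand in for the real checkpointing/offloading layers and per-layer `torch.compile` wrapping. Only `enable_checkpointing` and `checkpointing_or_offloading_enabled` are names from the commit message, and their signatures here are assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class ComponentConfig:
    # Hypothetical per-component settings; field names are illustrative only.
    gradient_checkpointing: bool = False
    layer_offload_fraction: float = 0.0


@dataclass
class TrainConfig:
    compile: bool = False


def checkpointing_or_offloading_enabled(component: ComponentConfig) -> bool:
    return component.gradient_checkpointing or component.layer_offload_fraction > 0.0


def enable_checkpointing(
    blocks: List[str], component: ComponentConfig, config: TrainConfig
) -> List[Union[str, Tuple[str, str]]]:
    """Central gate: every setup calls this unconditionally and it decides what to do.

    Real code would wrap each block in a checkpointing/offloading layer or in
    torch.compile; here the wrapping is modelled as a (mode, block) label.
    """
    if checkpointing_or_offloading_enabled(component):
        mode = "offload_checkpoint" if component.layer_offload_fraction > 0.0 else "checkpoint"
    elif config.compile:
        # Compile-only path: no checkpointing or offloading, still per-layer wrapping.
        mode = "compile"
    else:
        return list(blocks)  # nothing enabled: leave the blocks untouched
    return [(mode, block) for block in blocks]
```

Keeping the decision in one place means a new model setup cannot forget the guard, and adding a path such as compile-only touches a single function.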
…odels The gating now lives inside enable_checkpointing() after the split-offload merge, so the per-callsite guard here is redundant.
# Conflicts:
# modules/modelSetup/BaseErnieSetup.py
# modules/modelSetup/BaseWuerstchenSetup.py
# modules/util/checkpointing_util.py
# training_presets/#flux2 LoRA 8GB.json
# Conflicts:
# modules/modelSetup/BaseErnieSetup.py
# modules/modelSetup/BaseWuerstchenSetup.py
Mirrors upstream commit 75a44d2, which converted the rest of the codebase from the trailing factory.register() call to the @factory.register decorator form.
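The difference between the two registration styles can be shown with a hypothetical registry whose `register` accepts either an explicit class (trailing-call form) or returns a decorator. OneTrainer's actual factory class and `register` signature may differ; `Factory`, `create`, and the loader class names below are illustrative only.

```python
from typing import Dict, Optional


class Factory:
    """Hypothetical registry, not OneTrainer's actual factory class or signature."""

    def __init__(self) -> None:
        self._registry: Dict[str, type] = {}

    def register(self, key: str, cls: Optional[type] = None):
        if cls is not None:
            # Trailing-call form: factory.register("key", SomeClass)
            self._registry[key] = cls
            return cls

        # Decorator form: @factory.register("key")
        def decorator(decorated: type) -> type:
            self._registry[key] = decorated
            return decorated  # return the class unchanged

        return decorator

    def create(self, key: str):
        return self._registry[key]()


factory = Factory()


class OldStyleLoader:
    pass


factory.register("old", OldStyleLoader)  # registration is separate from the definition


@factory.register("new")
class NewStyleLoader:  # registration sits right next to the definition
    pass
```

Both forms end up in the same registry. The decorator form simply keeps the registration visible at the class definition, so it is harder to forget when a new model is added.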
# Conflicts:
# modules/modelLoader/mixin/HFModelLoaderMixin.py
# requirements-global.txt
Mirrors Nerogar#1520 (Ernie): tokens_mask/text_encoder_hidden_state are fixed at PROMPT_MAX_LENGTH after EncodeLensText's crop, so caching wastes disk space on padding for shorter prompts. Prune before caching, pad back to PROMPT_MAX_LENGTH on load. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
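The prune-then-pad round trip can be sketched without any tensor library. This sketch assumes right padding, so the mask is a run of 1s followed by 0s. It also assumes that padded positions are ignored downstream, because they come back as zeros. `PROMPT_MAX_LENGTH = 8` is a hypothetical demo value. The helper names are illustrative, not the functions used in the PR.

```python
from typing import List, Tuple

PROMPT_MAX_LENGTH = 8  # hypothetical demo value; the real constant is defined by the model code

Mask = List[int]
Hidden = List[List[float]]


def prune_for_cache(tokens_mask: Mask, hidden_state: Hidden) -> Tuple[Mask, Hidden]:
    """Drop padding positions before writing to the cache.

    Assumes right padding, i.e. the mask is a run of 1s followed only by 0s.
    """
    length = sum(tokens_mask)
    return tokens_mask[:length], hidden_state[:length]


def pad_after_load(tokens_mask: Mask, hidden_state: Hidden, hidden_dim: int) -> Tuple[Mask, Hidden]:
    """Restore the fixed PROMPT_MAX_LENGTH shape expected downstream.

    Padding rows come back as zeros, which is only safe if later code ignores
    masked positions.
    """
    missing = PROMPT_MAX_LENGTH - len(tokens_mask)
    padded_mask = tokens_mask + [0] * missing
    padded_hidden = hidden_state + [[0.0] * hidden_dim for _ in range(missing)]
    return padded_mask, padded_hidden
```

Only the real tokens are stored on disk, and the fixed shape is rebuilt on load. A short prompt therefore costs cache space in proportion to its length rather than to PROMPT_MAX_LENGTH.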
Audit fixes applied during merge:
- BaseLensSetup.py: 3/4-arg autocast helpers (drop stale weight_list arg)
- Lens{FineTune,LoRA}Setup.py: latent_caching -> image_caching/text_caching
- LensBaseDataLoader.py: latent_caching -> text_caching
- BaseModelTabView/BaseTrainingTabView/TopBarController/BaseConvertModelUIView: wire up Lens UI (no text-encoder layer offloading, GPT-OSS is always on-demand/MXFP4)
- test/run_lora_presets.sh: add Lens LoRA preset
- requirements-global.txt: pin mgds to combined anima+lens+ideogram preview branch (6f6518a)
Test in preview branch: https://github.com/Nerogar/OneTrainer/tree/preview
Summary
This PR adds Microsoft Lens: https://huggingface.co/microsoft/Lens
I'll leave this here as a draft for anyone who wants to experiment with this model, because it will take a while until it is merged.
Open points:
Includes #1506 and #1509
Test plan
pre-commit run --all-files passes
AI assistance