
New model: Microsoft Lens #1510

Draft
dxqb wants to merge 23 commits into Nerogar:master from dxqb:lens


@dxqb dxqb commented Jun 6, 2026

Collaborator

Test in preview branch: https://github.com/Nerogar/OneTrainer/tree/preview

Summary

This PR adds Microsoft Lens: https://huggingface.co/microsoft/Lens

I'll leave this here as a draft for anyone who wants to experiment with this model, because it'll take a while until merge.

Open points:

  • Code review is incomplete; there is still unreviewed AI-generated code.
  • Flagged by Claude: the timestep-shift schedule doesn't match the paper's training μ (§2.3).

Includes #1506 and #1509

Test plan

  • pre-commit run --all-files passes
  • Launched the affected UI or script and exercised the change
  • Tested with at least one real preset / config when relevant (note which: Lens)

AI assistance

  • Early AI prototype — opened for discussion, not ready for review

dxqb and others added 8 commits May 25, 2026 18:11
…model composition in ModelType

- Gradient checkpointing and layer offloading are now configured per component
  (text encoder, transformer, VAE) rather than globally
- ModelType centralizes model composition and training method associations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces OnDemandModule, a persistent delegating proxy for text encoders
that must be loaded on demand and freed after use rather than parked on the
CPU temp device. Adds load_on_demand per-component config and four
text_encoder_N_on_demand() resolvers in TrainConfig.

BaseModel.to(device) is removed as an abstract method; release() is now
the sole abstract method for parking a model. Each concrete model reads
self.train_config.temp_device directly. Call sites in modelSetup,
dataLoader, trainer, and SampleWindow are updated to model.release().

Co-Authored-By: dxqb <183307934+dxqb@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
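
The on-demand idea from the OnDemandModule commit message above can be illustrated with a minimal sketch. It is not the PR's implementation: `OnDemandModuleSketch`, `FakeEncoder`, and `make_encoder` are hypothetical names, and the "model" is a plain Python object rather than a real text encoder. The sketch only shows the pattern of a persistent proxy that loads its target on first use and drops it again on `release()`.

```python
from typing import Any, Callable, Optional


class OnDemandModuleSketch:
    """Hypothetical delegating proxy: it keeps a loader, not the weights."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader
        self._module: Optional[Any] = None

    @property
    def loaded(self) -> bool:
        return self._module is not None

    def load(self) -> Any:
        # Load lazily, the first time the wrapped module is needed.
        if self._module is None:
            self._module = self._loader()
        return self._module

    def release(self) -> None:
        # Free the module instead of parking it on another device.
        self._module = None

    def __call__(self, *args, **kwargs):
        return self.load()(*args, **kwargs)

    def __getattr__(self, name: str):
        # Only reached when normal lookup fails; private names are never
        # delegated, which avoids recursion before __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.load(), name)


class FakeEncoder:
    """Stand-in for a text encoder (assumption, for illustration only)."""
    hidden_size = 4

    def __call__(self, text: str) -> str:
        return text.upper()


load_count = 0


def make_encoder() -> FakeEncoder:
    global load_count
    load_count += 1
    return FakeEncoder()


proxy = OnDemandModuleSketch(make_encoder)
loaded_before_use = proxy.loaded      # nothing is loaded yet
encoded = proxy("a prompt")           # the first call triggers the load
proxy.release()                       # the module is freed after use
loaded_after_release = proxy.loaded
size = proxy.hidden_size              # attribute access reloads on demand
```

Because the proxy object itself persists, callers can keep a reference to it across load and release cycles.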
@Silvicultor


Did a bit of testing with this PR:
What works:

  • Basic LoRA training.
  • Sampling.

What doesn’t work:

  • torch.compile: unlike a similar bug in the Anima PR, this one is already triggered by compiling the transformer blocks (not the optimizer). The error log is attached below.
  • Loading created LoRA weights in ComfyUI.

Loss starts at a fairly high level (~0.6). A longer run would be needed to see whether that is a problem. Before doing a longer run, though, I would want common inference tools to support the LoRA format.

Traceback (most recent call last):
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1883, in _compile
    raise_unimplemented_cache_limit_exceeded()
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1867, in raise_unimplemented_cache_limit_exceeded
    unimplemented(
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_dynamo/exc.py", line 657, in unimplemented
    raise Unsupported(msg, gb_type, skip_frame)
torch._dynamo.exc.Unsupported: Dynamo recompile limit exceeded
  Explanation: Dynamo attempted to recompile the code object too many times, exceeding the recompile_limit cache size limit (currently set to 8). Excessive recompilations can degrade performance due to the compilation overhead of each recompilation.
  Hint: To monitor recompilations, enable TORCH_LOGS=recompiles. If recompilations are expected, consider increasing torch._dynamo.config.recompile_limit to an appropriate value.
  Hint: See https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html for tips on dealing with recompilations.

  Developer debug context: Limit type: recompile_limit

 For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0039.html

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/AI/OT_Lens/OneTrainer/modules/ui/TrainUI.py", line 739, in __training_thread_function
    trainer.train()
  File "/home/user/AI/OT_Lens/OneTrainer/modules/trainer/GenericTrainer.py", line 756, in train
    loss.backward()
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_tensor.py", line 631, in backward
    torch.autograd.backward(
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/autograd/__init__.py", line 379, in backward
    _engine_run_backward(
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/autograd/graph.py", line 882, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/autograd/function.py", line 317, in apply
    return user_fn(self, *args)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 314, in backward
    outputs = ctx.run_function(*detached_inputs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/modules/util/checkpointing_util.py", line 124, in __checkpointing_forward
    output = self.orig_forward(*args) if self.checkpoint is None else self.checkpoint(*args)
                                                                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1047, in compile_wrapper
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 2474, in __call__
    result = self._torchdynamo_orig_backend(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 736, in __call__
    result = _compile(
             ^^^^^^^^^
  File "/home/user/AI/OT_Lens/OneTrainer/venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1890, in _compile
    raise FailOnRecompileLimitHit(
torch._dynamo.exc.FailOnRecompileLimitHit: Hard failure due to fullgraph=True
Creating Backup workspace/run/backup/2026-06-07_11-12-09-backup-3-0-3
Saving models/model.safetensors

@dxqb

dxqb commented Jun 7, 2026

Collaborator Author

> Did a bit of testing with this PR

torch.compile: that's a torch bug introduced in torch 2.12: pytorch/pytorch#186537. It will need a workaround for as long as we use torch 2.12. You could downgrade to torch 2.11 in the meantime.
ComfyUI: Lens is new. When I wrote this code, Comfy didn't support the model at all yet. I'll take care of that when the PR is close to merge.
High loss: that's normal for the Flux2 VAE, which Lens uses.

dxqb and others added 3 commits June 7, 2026 14:12
Since torch 2.12, dynamo config overrides are stored in ContextVars
(pytorch/pytorch#173568), which threads don't inherit. The recompile
limit raised by init_compile() therefore no longer applied to the
autograd worker threads, which recompute checkpointed modules during
the backward pass - training crashed with FailOnRecompileLimitHit
once enough input shapes accumulated, despite the raised limit.

Call init_compile() in OffloadCheckpointLayer.__checkpointing_forward
so the recomputation thread sets its own config. The non-offload
CheckpointLayer is unaffected: it is compiled as a whole, so its
recomputation runs inside the compiled backward without re-entering
dynamo. The Mod.eval patch is applied once at import time instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
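
The thread behaviour described in the commit message above can be reproduced with the standard library alone. This is a sketch of the mechanism, not of torch's internals: `recompile_limit` here is a plain `ContextVar` standing in for a dynamo config override, `init_compile_sketch` is a hypothetical analogue of `init_compile()`, and the raised value 64 is arbitrary (the default of 8 is the limit reported in the log). It assumes an interpreter where new threads start with an empty context, as on the CPython 3.12 shown in the traceback.

```python
import contextvars
import threading

# Stand-in for a config override stored in a ContextVar (assumption).
recompile_limit = contextvars.ContextVar("recompile_limit", default=8)


def init_compile_sketch(limit: int = 64) -> None:
    """Hypothetical analogue of init_compile(): raise the limit in this context."""
    recompile_limit.set(limit)


def read_limit_in_thread(reinit: bool) -> int:
    """Return the limit a freshly started worker thread observes."""
    seen = []

    def worker() -> None:
        if reinit:
            # The workaround: the worker thread sets its own config.
            init_compile_sketch()
        seen.append(recompile_limit.get())

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return seen[0]


init_compile_sketch()                                  # raised in the main thread
main_limit = recompile_limit.get()
without_reinit = read_limit_in_thread(reinit=False)    # falls back to the default
with_reinit = read_limit_in_thread(reinit=True)        # sees the raised limit
```

The worker that does not re-apply the setting sees the default again, which mirrors why the raised limit stopped applying to the autograd worker threads.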
Updated comments regarding the handling of negative numbers in Mod.eval and related issues.
@dxqb

dxqb commented Jun 7, 2026

Collaborator Author

> torch.compile: that's a torch bug, introduced in torch 2.12: pytorch/pytorch#186537 will need a workaround as long as we use torch 2.12. You could downgrade to torch 2.11 in the meantime.

Merged the bug workaround branch (#1511) into this branch.
I never saw the error myself, even before this workaround, so it would be useful if you could re-test. I'm fairly sure it's fixed, though.

@Silvicultor


> > torch.compile: that's a torch bug, introduced in torch 2.12: pytorch/pytorch#186537 will need a workaround as long as we use torch 2.12. You could downgrade to torch 2.11 in the meantime.
>
> Merged the bug workaround branch into this branch #1511 I didn't see the error even before this workaround, so if you could re-test that would be useful. Quite sure it's fixed though.

Yes, the workaround helps. Now both the transformer blocks and the optimizer can be compiled.

dxqb added the "preview" label (merged in the preview branch) on Jun 13, 2026
dxqb added 4 commits June 13, 2026 17:35
Move the per-callsite checkpointing_or_offloading_enabled() guard into
enable_checkpointing() itself, so every Base*Setup can call
enable_checkpointing_for_* unconditionally. Also extend the central gate
to allow a compile-only path (no checkpointing/offloading, but still
per-layer torch.compile wrapping) when config.compile is set.

Three direct diffusers enable_gradient_checkpointing() calls (SD/SDXL
unet, Wuerstchen v2 prior) keep their explicit guard since they bypass
this central mechanism.
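
A rough sketch of the central gate this commit message describes is shown here. All names and the flattened config are assumptions for illustration: the real `enable_checkpointing()` wraps modules, while this version only returns labels so that the three paths (checkpointing/offloading, compile-only, nothing) are visible.

```python
from dataclasses import dataclass


@dataclass
class TrainConfigSketch:
    # Hypothetical, flattened stand-in for the real per-component config.
    gradient_checkpointing: bool = False
    layer_offloading: bool = False
    compile: bool = False


def checkpointing_or_offloading_enabled(config: TrainConfigSketch) -> bool:
    return config.gradient_checkpointing or config.layer_offloading


def enable_checkpointing(layers: list[str], config: TrainConfigSketch) -> list[str]:
    """Central gate: callers invoke this unconditionally; the decision lives here."""
    if checkpointing_or_offloading_enabled(config):
        wrapper = "checkpointed"
    elif config.compile:
        # Compile-only path: no checkpointing or offloading, but the layers
        # are still wrapped individually for compilation.
        wrapper = "compiled_only"
    else:
        return list(layers)
    return [f"{wrapper}({name})" for name in layers]


blocks = ["block0", "block1"]
plain = enable_checkpointing(blocks, TrainConfigSketch())
compile_only = enable_checkpointing(blocks, TrainConfigSketch(compile=True))
checkpointed = enable_checkpointing(
    blocks, TrainConfigSketch(gradient_checkpointing=True, compile=True)
)
```

With the guard inside the function, each setup class can call it without repeating the condition at every call site.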
…odels

The gating now lives inside enable_checkpointing() after the
split-offload merge, so the per-callsite guard here is redundant.
dxqb and others added 8 commits June 18, 2026 01:20
# Conflicts:
#	modules/modelSetup/BaseErnieSetup.py
#	modules/modelSetup/BaseWuerstchenSetup.py
#	modules/util/checkpointing_util.py
#	training_presets/#flux2 LoRA 8GB.json
# Conflicts:
#	modules/modelSetup/BaseErnieSetup.py
#	modules/modelSetup/BaseWuerstchenSetup.py
Mirrors upstream commit 75a44d2, which converted the rest of the
codebase from the trailing factory.register() call to the @factory.register
decorator form.
# Conflicts:
#	modules/modelLoader/mixin/HFModelLoaderMixin.py
#	requirements-global.txt
Mirrors Nerogar#1520 (Ernie): tokens_mask/text_encoder_hidden_state are fixed
at PROMPT_MAX_LENGTH after EncodeLensText's crop, so caching wastes
disk space on padding for shorter prompts. Prune before caching, pad
back to PROMPT_MAX_LENGTH on load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
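
The prune-then-pad round trip from the commit message above can be sketched with plain lists. The assumptions are stated loudly: `PROMPT_MAX_LENGTH = 8` is an arbitrary illustrative value, padding is assumed to sit at the end of the sequence, the helper names are hypothetical, and padded hidden states are assumed to be ignorable because the mask marks them as padding.

```python
PROMPT_MAX_LENGTH = 8  # illustrative value only; the real constant may differ


def prune_for_cache(tokens_mask, hidden_state):
    """Drop trailing padding before writing to the cache (assumes right padding)."""
    valid = sum(tokens_mask)
    return tokens_mask[:valid], hidden_state[:valid]


def pad_on_load(tokens_mask, hidden_state, max_length=PROMPT_MAX_LENGTH):
    """Pad a cached entry back to the fixed length expected downstream."""
    missing = max_length - len(tokens_mask)
    width = len(hidden_state[0]) if hidden_state else 0
    padded_mask = tokens_mask + [0] * missing
    padded_state = hidden_state + [[0.0] * width for _ in range(missing)]
    return padded_mask, padded_state


# A three-token prompt padded to PROMPT_MAX_LENGTH, with two-dimensional states.
mask = [1, 1, 1, 0, 0, 0, 0, 0]
state = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]] + [[0.0, 0.0] for _ in range(5)]

cached_mask, cached_state = prune_for_cache(mask, state)
restored_mask, restored_state = pad_on_load(cached_mask, cached_state)
```

In this toy case the cached entry holds 3 positions instead of 8, which is where the disk-space saving for short prompts comes from.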
dxqb added a commit that referenced this pull request Jun 19, 2026
Audit fixes applied during merge:
- BaseLensSetup.py: 3/4-arg autocast helpers (drop stale weight_list arg)
- Lens{FineTune,LoRA}Setup.py: latent_caching -> image_caching/text_caching
- LensBaseDataLoader.py: latent_caching -> text_caching
- BaseModelTabView/BaseTrainingTabView/TopBarController/BaseConvertModelUIView: wire up Lens UI (no text-encoder layer offloading, GPT-OSS is always on-demand/MXFP4)
- test/run_lora_presets.sh: add Lens LoRA preset
- requirements-global.txt: pin mgds to combined anima+lens+ideogram preview branch (6f6518a)
