Description
Training HiDream LoRA with the built-in #hidream LoRA preset crashes immediately on startup with:
  File "modules/modelSetup/HiDreamLoRASetup.py", line 216, in setup_train_device
    model.transformer_to(self.train_device)
  File "modules/model/HiDreamModel.py", line 246, in transformer_to
    self.transformer_offload_conductor.to(device)
  File "modules/util/LayerOffloadConductor.py", line 649, in to
    self.__offload_strategy = LayerOffloadStrategy(self.__layers, self.__layer_offload_fraction)
  File "modules/util/LayerOffloadConductor.py", line 432, in __init__
    self.max_loaded_bytes = max(sum([layer_bytes[i] for i in loaded_layers]) for loaded_layers in all_loaded_layers)
ValueError: max() iterable argument is empty
Root cause
This is a regression introduced by #1114 ("torch.compile support"), not a diffusers change.
In diffusers, HiDreamImageTransformer2DModel.double_stream_blocks and .single_stream_blocks are nn.ModuleLists of HiDreamBlock wrapper objects, not of the bare block types. Each wrapper takes the real block as its constructor argument (block: HiDreamImageTransformerBlock | HiDreamImageSingleTransformerBlock) and exposes it as .block. This wrapper has existed since HiDream was first added to diffusers (PR huggingface/diffusers#11231).
Before #1114, modules/util/checkpointing_util.py found HiDream's transformer blocks by walking every submodule (model.modules()) and checking isinstance(child_module, HiDreamImageTransformerBlock) directly. This correctly found the blocks nested inside HiDreamBlock.block, regardless of the wrapper.
#1114 refactored this into the generic enable_checkpointing() helper, which instead looks for an nn.ModuleList whose first element is directly isinstance(x, t):
    for child_module in model.modules():
        if isinstance(child_module, nn.ModuleList) and isinstance(child_module[0], t):
            ...
For HiDream, child_module[0] is a HiDreamBlock, not a HiDreamImageTransformerBlock/HiDreamImageSingleTransformerBlock, so this never matches. As a result, since #1114, no layers are ever registered with the LayerOffloadConductor for the HiDream transformer — self.__layers stays empty, and LayerOffloadStrategy.__init__ then calls max() over an empty sequence, raising the ValueError. This reproduces whenever gradient checkpointing is enabled for HiDream (e.g. the default #hidream LoRA preset), independent of layer_offload_fraction.
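The difference between the two lookup strategies can be reproduced without torch or diffusers. The sketch below is illustrative only: Module, ModuleList, TransformerBlock, BlockWrapper, and Transformer are hypothetical stand-ins for nn.Module, nn.ModuleList, HiDreamImageTransformerBlock, HiDreamBlock, and the HiDream transformer, and the two finder functions only approximate the pre- and post-#1114 logic described above. The length guard in the list-based matcher is an addition for safety, not part of the quoted code.

```python
# All classes here are hypothetical stand-ins, not the real torch/diffusers types.

class Module:
    """Minimal stand-in for torch.nn.Module: only tracks child modules."""

    def children(self):
        return [v for v in vars(self).values() if isinstance(v, Module)]

    def modules(self):
        # Depth-first walk over this module and all of its descendants.
        yield self
        for child in self.children():
            yield from child.modules()


class ModuleList(Module):
    """Minimal stand-in for torch.nn.ModuleList."""

    def __init__(self, items):
        self.items = list(items)

    def children(self):
        return self.items

    def __getitem__(self, index):
        return self.items[index]

    def __len__(self):
        return len(self.items)


class TransformerBlock(Module):
    """Stand-in for HiDreamImageTransformerBlock."""


class BlockWrapper(Module):
    """Stand-in for the HiDreamBlock wrapper, which holds the real block in .block."""

    def __init__(self, block):
        self.block = block


class Transformer(Module):
    def __init__(self, num_blocks):
        self.double_stream_blocks = ModuleList(
            BlockWrapper(TransformerBlock()) for _ in range(num_blocks)
        )


def find_by_walk(model, block_type):
    # Pre-#1114 style: test every submodule, however deeply it is nested.
    return [m for m in model.modules() if isinstance(m, block_type)]


def find_by_module_list(model, block_type):
    # Post-#1114 style: only accept lists whose first element is block_type.
    matches = []
    for m in model.modules():
        if isinstance(m, ModuleList) and len(m) > 0 and isinstance(m[0], block_type):
            matches.extend(m)
    return matches


model = Transformer(num_blocks=3)
walked = find_by_walk(model, TransformerBlock)
listed = find_by_module_list(model, TransformerBlock)

# With no layers registered, any max() over per-layer sizes has nothing to reduce.
layer_bytes = [1024 for _ in listed]
raised = False
try:
    max(size for size in layer_bytes)
except ValueError:
    raised = True
```

Here the walk finds all three wrapped blocks, the list-based matcher finds none, and the final max() call fails in the same way as the traceback above.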
Verified empirically by instantiating HiDreamImageTransformer2DModel directly and checking every ModuleList in the tree — no ModuleList anywhere has elements of type HiDreamImageTransformerBlock/HiDreamImageSingleTransformerBlock directly.
Steps to reproduce
Run HiDream LoRA training with gradient checkpointing enabled (the default #hidream LoRA preset, layer_offload_fraction=0.5).
Suggested fix
In enable_checkpointing_for_hi_dream_transformer (modules/util/checkpointing_util.py), match HiDreamBlock and unwrap .block, or target model.double_stream_blocks / model.single_stream_blocks directly with the wrapper type, so the generic enable_checkpointing() matcher introduced in #1114 works correctly for HiDream's wrapped block structure.
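As a rough illustration of the first option (match the wrapper, then unwrap .block), the following sketch uses plain-Python stand-ins rather than the real torch/diffusers classes. The helper name collect_wrapped_blocks and every class in it are hypothetical; it does not reflect the actual signature of enable_checkpointing(), and only the .block attribute name is taken from the description above.

```python
# Hypothetical stand-ins; none of these are the real torch/diffusers classes.

class Module:
    def children(self):
        return [v for v in vars(self).values() if isinstance(v, Module)]

    def modules(self):
        yield self
        for child in self.children():
            yield from child.modules()


class ModuleList(Module):
    def __init__(self, items):
        self.items = list(items)

    def children(self):
        return self.items

    def __getitem__(self, index):
        return self.items[index]

    def __len__(self):
        return len(self.items)

    def __iter__(self):
        return iter(self.items)


class DoubleBlock(Module):
    """Stand-in for HiDreamImageTransformerBlock."""


class SingleBlock(Module):
    """Stand-in for HiDreamImageSingleTransformerBlock."""


class BlockWrapper(Module):
    """Stand-in for HiDreamBlock."""

    def __init__(self, block):
        self.block = block


class Transformer(Module):
    def __init__(self):
        self.double_stream_blocks = ModuleList(
            [BlockWrapper(DoubleBlock()), BlockWrapper(DoubleBlock())]
        )
        self.single_stream_blocks = ModuleList([BlockWrapper(SingleBlock())])


def collect_wrapped_blocks(model, wrapper_type, block_types):
    """Match lists of wrappers, then return the inner blocks they hold."""
    found = []
    for m in model.modules():
        if isinstance(m, ModuleList) and len(m) > 0 and isinstance(m[0], wrapper_type):
            for wrapper in m:
                # Unwrap: the layer of interest is the wrapped block, not the wrapper.
                if isinstance(wrapper.block, block_types):
                    found.append(wrapper.block)
    return found


model = Transformer()
blocks = collect_wrapped_blocks(model, BlockWrapper, (DoubleBlock, SingleBlock))
block_type_names = [type(b).__name__ for b in blocks]
```

Whether the inner block or the wrapper itself should be registered as the offloadable layer depends on how the conductor and checkpointing code call the layers, which is not covered here.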
Drafted by Claude