HiDream LoRA crashes with ValueError: max() iterable argument is empty in LayerOffloadConductor #1532

Description

@dxqb

Training HiDream LoRA with the built-in #hidream LoRA preset crashes immediately on startup with:

File "modules/modelSetup/HiDreamLoRASetup.py", line 216, in setup_train_device
    model.transformer_to(self.train_device)
File "modules/model/HiDreamModel.py", line 246, in transformer_to
    self.transformer_offload_conductor.to(device)
File "modules/util/LayerOffloadConductor.py", line 649, in to
    self.__offload_strategy = LayerOffloadStrategy(self.__layers, self.__layer_offload_fraction)
File "modules/util/LayerOffloadConductor.py", line 432, in __init__
    self.max_loaded_bytes = max(sum([layer_bytes[i] for i in loaded_layers]) for loaded_layers in all_loaded_layers)
ValueError: max() iterable argument is empty
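
The failing expression can be reproduced in isolation. The sketch below is not the project's code: the function names are hypothetical, and it assumes, as the traceback suggests, that `all_loaded_layers` ends up empty when no layers were registered. It also shows that `max()` accepts a `default`, which would silence the exception but would not fix the missing layer registration described under "Root cause".

```python
def max_loaded_bytes(layer_bytes, all_loaded_layers):
    # Same shape as the expression in the traceback: peak total size over
    # every set of simultaneously loaded layers.
    return max(sum(layer_bytes[i] for i in loaded) for loaded in all_loaded_layers)


def max_loaded_bytes_guarded(layer_bytes, all_loaded_layers):
    # Hypothetical variant: `default` makes max() return 0 on an empty iterable
    # instead of raising. This hides the symptom only.
    return max(
        (sum(layer_bytes[i] for i in loaded) for loaded in all_loaded_layers),
        default=0,
    )


if __name__ == "__main__":
    print(max_loaded_bytes([10, 20, 30], [[0, 1], [1, 2]]))
    try:
        max_loaded_bytes([], [])
    except ValueError as exc:
        print("ValueError:", exc)
    print(max_loaded_bytes_guarded([], []))
```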

Root cause

This is a regression introduced by #1114 ("torch.compile support"), not a diffusers change.

In diffusers, HiDreamImageTransformer2DModel.double_stream_blocks and .single_stream_blocks are nn.ModuleLists of HiDreamBlock wrapper objects (HiDreamBlock(self, block: HiDreamImageTransformerBlock | HiDreamImageSingleTransformerBlock)), not of the bare block types. This wrapper has existed since HiDream was first added to diffusers (PR huggingface/diffusers#11231), so the crash cannot be attributed to a recent diffusers change.

Before #1114, modules/util/checkpointing_util.py found HiDream's transformer blocks by walking every submodule (model.modules()) and checking isinstance(child_module, HiDreamImageTransformerBlock) directly. This correctly found the blocks nested inside HiDreamBlock.block, regardless of the wrapper.

#1114 refactored this into the generic enable_checkpointing() helper, which instead looks for an nn.ModuleList whose first element is directly isinstance(x, t):

for child_module in model.modules():
    if isinstance(child_module, nn.ModuleList) and isinstance(child_module[0], t):
        ...

For HiDream, child_module[0] is a HiDreamBlock, not a HiDreamImageTransformerBlock or HiDreamImageSingleTransformerBlock, so the check never matches. Since #1114, no layers are therefore ever registered with the LayerOffloadConductor for the HiDream transformer: self.__layers stays empty, and LayerOffloadStrategy.__init__ calls max() over an empty sequence, which raises the ValueError. The crash reproduces whenever gradient checkpointing is enabled for HiDream (e.g. with the default #hidream LoRA preset), independent of layer_offload_fraction.
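
The difference between the two lookup strategies can be illustrated without torch or diffusers. Everything in the sketch below is a stand-in written for this illustration: `Module` and `ModuleList` only imitate the traversal behaviour of their `torch.nn` namesakes, `TransformerBlock` plays the role of `HiDreamImageTransformerBlock`, and `BlockWrapper` plays the role of `HiDreamBlock`. The two finder functions are simplified approximations of the pre-#1114 and post-#1114 logic as described above, not copies of the project's code.

```python
class Module:
    """Stand-in for torch.nn.Module that only supports recursive traversal."""

    def children(self):
        return [v for v in vars(self).values() if isinstance(v, Module)]

    def modules(self):
        yield self
        for child in self.children():
            yield from child.modules()


class ModuleList(Module):
    """Stand-in for torch.nn.ModuleList."""

    def __init__(self, items):
        self.items = list(items)

    def children(self):
        return self.items

    def __getitem__(self, index):
        return self.items[index]

    def __len__(self):
        return len(self.items)


class TransformerBlock(Module):
    """Plays the role of HiDreamImageTransformerBlock."""


class BlockWrapper(Module):
    """Plays the role of HiDreamBlock: holds the real block in `.block`."""

    def __init__(self, block):
        self.block = block


class Transformer(Module):
    def __init__(self, num_blocks):
        self.double_stream_blocks = ModuleList(
            BlockWrapper(TransformerBlock()) for _ in range(num_blocks)
        )


def find_by_walking(model, block_type):
    # Old approach: test every submodule, so nesting depth does not matter.
    return [m for m in model.modules() if isinstance(m, block_type)]


def find_by_module_list(model, block_type):
    # New approach: only a ModuleList whose first element matches is accepted.
    found = []
    for m in model.modules():
        if isinstance(m, ModuleList) and len(m) > 0 and isinstance(m[0], block_type):
            found.extend(m)
    return found


if __name__ == "__main__":
    model = Transformer(3)
    print(len(find_by_walking(model, TransformerBlock)))
    print(len(find_by_module_list(model, TransformerBlock)))
    print(len(find_by_module_list(model, BlockWrapper)))
```

In this toy tree the walking search finds all three inner blocks, the ModuleList search finds none when given the inner block type, and it finds the three wrappers when given the wrapper type.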

Verified empirically by instantiating HiDreamImageTransformer2DModel directly and inspecting every ModuleList in the module tree: none has elements that are directly of type HiDreamImageTransformerBlock or HiDreamImageSingleTransformerBlock.

Steps to reproduce

Run HiDream LoRA training with gradient checkpointing enabled (the default #hidream LoRA preset, layer_offload_fraction=0.5).

Suggested fix

In enable_checkpointing_for_hi_dream_transformer (modules/util/checkpointing_util.py), match HiDreamBlock and unwrap .block, or target model.double_stream_blocks / model.single_stream_blocks directly with the wrapper type, so the generic enable_checkpointing() matcher introduced in #1114 works correctly for HiDream's wrapped block structure.
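
One way the "match the wrapper and unwrap .block" option could look is sketched here. This is an assumption-laden illustration, not a patch: `resolve_blocks`, `InnerBlock`, and `Wrapper` are hypothetical names, plain Python lists stand in for `nn.ModuleList`, and whether the wrapper or the inner block should ultimately be registered for checkpointing and offloading is left to the maintainers.

```python
class InnerBlock:
    """Hypothetical stand-in for the bare transformer block types."""


class Wrapper:
    """Hypothetical stand-in for HiDreamBlock."""

    def __init__(self, block):
        self.block = block


def resolve_blocks(module_list, block_types, wrapper_type=None):
    """Return the blocks to register, unwrapping `.block` for wrapped lists."""
    if len(module_list) == 0:
        # Also avoids indexing into an empty list.
        return []
    first = module_list[0]
    if isinstance(first, block_types):
        # Bare blocks: behave like a direct first-element match.
        return list(module_list)
    if (
        wrapper_type is not None
        and isinstance(first, wrapper_type)
        and isinstance(first.block, block_types)
    ):
        # Wrapped blocks: look one level down.
        return [wrapped.block for wrapped in module_list]
    return []


if __name__ == "__main__":
    wrapped = [Wrapper(InnerBlock()), Wrapper(InnerBlock())]
    print(len(resolve_blocks(wrapped, InnerBlock)))
    print(len(resolve_blocks(wrapped, InnerBlock, wrapper_type=Wrapper)))
```

Without `wrapper_type` the wrapped list yields no blocks, which mirrors the current failure; with it, both inner blocks are returned.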


Drafted by Claude

Labels: bug
