[FEAT] Support device_map="auto" (Accelerate) for Low-VRAM GPUs #495

@LVladymyr

✨ Is your feature request related to a problem?

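Loading all 5 models takes ~3.3 GB of VRAM.

On a 4 GB GPU, the F.scaled_dot_product_attention step then needs a further ~1.24 GB. That puts peak usage at ~4.54 GB, so the run fails with an out-of-memory (OOM) error no matter how the inputs are tuned.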
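To make the shortfall concrete, here is a rough back-of-the-envelope check using only the figures quoted above. The function name is made up for illustration, and the calculation ignores CUDA context overhead and allocator fragmentation, so the real gap is likely somewhat larger.

```python
def vram_shortfall_gb(weights_gb: float, peak_activation_gb: float, gpu_total_gb: float) -> float:
    """Return how many GB are missing (positive) or spare (negative) on the GPU.

    Hypothetical helper: it simply adds resident weights and the peak
    activation cost, then subtracts the card's total VRAM.
    """
    return round(weights_gb + peak_activation_gb - gpu_total_gb, 2)


# Figures from this issue: ~3.3 GB of weights, ~1.24 GB for attention, 4 GB card.
shortfall = vram_shortfall_gb(3.3, 1.24, 4.0)
print(f"Short by about {shortfall} GB")
```

Any fix therefore has to move at least that much weight data off the GPU before the attention step runs.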

💡 Describe the Solution You'd Like

If the loaders accepted device_map="auto", Accelerate could offload the large Foundation model weights to system RAM, leaving GPU memory free for the attention computation.
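A minimal sketch of what this could look like, assuming the loaders wrap Hugging Face `from_pretrained` (I have not checked that this is how the project loads its models). `build_max_memory` and `load_with_offload` are hypothetical names, and the 1.5 GiB reserve is only an example value chosen to cover the attention step. `device_map` and `max_memory` are existing `from_pretrained` arguments and require `accelerate` to be installed.

```python
def build_max_memory(gpu_total_gib: float, reserve_gib: float, cpu_gib: float, gpu_index: int = 0) -> dict:
    """Build a max_memory dict (values in bytes) that keeps `reserve_gib`
    of VRAM free for activations such as the attention step."""
    usable_gib = gpu_total_gib - reserve_gib
    if usable_gib <= 0:
        raise ValueError("reserve_gib must be smaller than gpu_total_gib")
    gib = 2**30
    return {gpu_index: int(usable_gib * gib), "cpu": int(cpu_gib * gib)}


def load_with_offload(model_id: str, max_memory: dict):
    """Hypothetical loader: lets Accelerate place layers on the GPU up to the
    budget and spill the remainder to system RAM."""
    # Imported lazily so the budget helper above works without transformers.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        model_id,
        device_map="auto",      # Accelerate decides per-module placement
        max_memory=max_memory,  # caps how much each device may hold
    )


# Example budget for a 4 GiB card, reserving 1.5 GiB for attention (assumed value).
budget = build_max_memory(gpu_total_gib=4.0, reserve_gib=1.5, cpu_gib=8.0)
print(budget)
```

Capping the GPU share explicitly, rather than relying on `device_map="auto"` alone, seems safer here because the automatic placement only accounts for weights and not for the activation memory needed later. Offloaded layers are copied to the GPU on demand, so inference will be slower than a fully resident model.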

Labels: enhancement