✨ Is your feature request related to a problem?
Loading all 5 models consumes ~3.3GB of VRAM.
On a 4GB GPU, the subsequent F.scaled_dot_product_attention step needs a further ~1.24GB, so the total exceeds available memory and an OOM crash is unavoidable.
💡 Describe the Solution You'd Like
Allow device_map="auto" in the loaders so that accelerate can offload the large foundation-model weights to system RAM, leaving GPU memory free for the attention computation.