✨ Is your feature request related to a problem?
Loading all 5 models consumes ~3.3GB of VRAM.
On a 4GB GPU, the subsequent F.scaled_dot_product_attention step needs a further ~1.24GB, so the total exceeds available memory and an OOM crash is unavoidable.
💡 Describe the Solution You'd Like
Allow device_map="auto" in the loaders so that accelerate can offload the large foundation-model weights to system RAM, leaving GPU memory free for the attention computation.