Quantization is extremely useful, and FSDP2 has lots of nice features. It would be great to have the option to run QLoRA under FSDP2 so that we can fully transition from FSDP1 to FSDP2.
Example script that fails with FSDP2 but not FSDP1 (modulo some FSDP1-specific setup):
import torch
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# Initialize accelerator
accelerator = Accelerator()
# Load model and tokenizer
model_name = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage="bfloat16",
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=quantization_config, torch_dtype=torch.bfloat16, use_cache=False
)
model = get_peft_model(model, peft_config, autocast_adapter_dtype=False)
# Tokenize input
inputs = tokenizer("Hello, World!", return_tensors="pt")
inputs = {k: v.to(accelerator.device) for k, v in inputs.items()}
# Create a dummy optimizer for the model (required for FSDP2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Prepare the model and optimizer (FSDP2 requires both model and optimizer)
prepared_model, optimizer = accelerator.prepare(model, optimizer)