When training the tokenizer (e.g., vq_sam2_256x2_unshare.py) on an A100 with xTuner, I observed a dtype mismatch between model parameters and input tensors, even though the optimizer is configured to use FP32. It appears that xTuner automatically casts parts of the model (e.g., embedding tables) to BF16 for efficiency, while certain inputs or intermediate features remain in FP32. This leads to runtime errors in matrix-multiplication operations (@ or torch.addmm).
vq_sam2_256x2_unshare.py

sam2.py
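Here is a minimal, self-contained PyTorch sketch that reproduces this failure. It does not use xTuner or the repository's code. The BF16 embedding table and the helper nearest_code_distances are hypothetical stand-ins for a codebook that was cast to BF16 while the encoder features stayed in FP32. Elementwise ops promote mixed dtypes, but matmul does not, so it raises a RuntimeError. Making both operands the same dtype avoids the error. Here that means upcasting the weight to FP32.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a codebook that was cast to BF16 (as described above).
codebook = nn.Embedding(num_embeddings=16, embedding_dim=8).to(torch.bfloat16)

# Encoder features that stayed in FP32.
features = torch.randn(4, 8, dtype=torch.float32)


def nearest_code_distances(z, weight):
    """Squared L2 distance from each row of z to each code: ||z||^2 - 2 z W^T + ||W||^2."""
    # The elementwise terms would promote to FP32, but the matmul requires
    # both operands to share a dtype, so this is where a mismatch fails.
    return (z.pow(2).sum(1, keepdim=True)
            - 2 * z @ weight.t()
            + weight.pow(2).sum(1))


try:
    nearest_code_distances(features, codebook.weight)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

# One possible workaround (an assumption, not the repository's fix): upcast the
# BF16 weight to the input's dtype so the distance is computed in FP32.
distances = nearest_code_distances(features, codebook.weight.to(features.dtype))
print(distances.dtype, tuple(distances.shape))
```

Upcasting the weight keeps the nearest-code search in full precision. The alternative, casting the features down to BF16, also resolves the error but computes the distances at lower precision.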

I also noticed that the training script doesn’t explicitly define a codebook loss.
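For comparison, a common way to define a codebook loss is the standard VQ-VAE formulation. It combines a codebook term that pulls the selected codes toward the detached encoder outputs with a beta-weighted commitment term, and it uses a straight-through estimator so the encoder still receives gradients. The helper vq_losses and the default beta=0.25 below are hypothetical and are not taken from the repository. If the tokenizer instead updates its codebook with an exponential moving average, the codebook is not trained by a loss term, which could explain why the script does not define one.

```python
import torch
import torch.nn.functional as F


def vq_losses(z_e, codebook, beta=0.25):
    """Hypothetical VQ-VAE-style quantizer returning (z_q_st, indices, loss).

    z_e: (N, D) encoder outputs; codebook: (K, D) learnable code vectors.
    """
    distances = torch.cdist(z_e, codebook)   # (N, K) pairwise L2 distances
    indices = distances.argmin(dim=1)        # nearest code for each vector
    z_q = codebook[indices]                  # (N, D) selected codes

    # Codebook loss: moves the selected codes toward the (frozen) encoder outputs.
    codebook_loss = F.mse_loss(z_q, z_e.detach())
    # Commitment loss: keeps encoder outputs close to their (frozen) codes.
    commitment_loss = F.mse_loss(z_e, z_q.detach())

    # Straight-through estimator: forward value equals z_q, gradient flows to z_e.
    z_q_st = z_e + (z_q - z_e).detach()
    return z_q_st, indices, codebook_loss + beta * commitment_loss


# Usage example with random data.
z_e = torch.randn(10, 4, requires_grad=True)
codebook = torch.randn(6, 4, requires_grad=True)
z_q_st, indices, loss = vq_losses(z_e, codebook)
loss.backward()
print(indices.tolist(), loss.item())
```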
With the changes I made to vq_sam2_256x2_unshare.py and sam2.py, training proceeds successfully. However, I'd like to ask:
Is this the intended way to handle mixed precision in your setup?
How do you typically manage precision when using xTuner on A100 GPUs?