Precision mismatch and loss definition in SAMTok tokenizer training #134

When training the tokenizer (e.g., `vq_sam2_256x2_unshare.py`) on an A100 with xTuner, I observed a dtype mismatch between model parameters and input tensors, even though the optimizer is configured to use FP32. It appears that xTuner automatically casts parts of the model (e.g., embedding tables) to BF16 for efficiency, while certain inputs and intermediate features remain in FP32. This leads to runtime errors in matrix-multiplication operations (`@` or `torch.addmm`), such as:
vq_sam2_256x2_unshare.py
<img width="755" height="280" alt="Image" src="https://github.com/user-attachments/assets/896221a2-a412-4b9c-841b-52efbfef2ec4" />

sam2.py
<img width="879" height="412" alt="Image" src="https://github.com/user-attachments/assets/e099348a-52bd-40c1-a0b2-fc9d2a9a3415" />

<img width="795" height="605" alt="Image" src="https://github.com/user-attachments/assets/464957e8-3978-49ec-8d80-42061d5de7ba" />
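For anyone trying to reproduce this outside the full training setup, here is a minimal sketch of the same kind of failure and two common ways around it. It is not the SAMTok or xTuner code: `proj`, `x`, and `mismatch_raises` are hypothetical names, and a single `nn.Linear` stands in for whichever layer ends up in BF16.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for a layer whose weights were cast to BF16.
proj = nn.Linear(8, 4).to(device=device, dtype=torch.bfloat16)
# Input that stayed in FP32.
x = torch.randn(2, 8, device=device)


def mismatch_raises(layer: nn.Module, inp: torch.Tensor) -> bool:
    """Return True if calling `layer` on `inp` raises a RuntimeError."""
    try:
        layer(inp)
    except RuntimeError:
        return True
    return False


# FP32 input against BF16 weights: the matmul inside nn.Linear rejects it.
raised = mismatch_raises(proj, x)

# Workaround A: cast the input to the parameter dtype explicitly.
out_cast = proj(x.to(proj.weight.dtype))

# Workaround B: run the op under autocast so inputs are cast automatically.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    out_autocast = proj(x)
```

Workaround A is a local patch at each failing call site, while workaround B handles every autocast-eligible op inside the context. Which one matches the intended setup is exactly what I am unsure about.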

I also noticed that the training script doesn’t explicitly define a codebook loss.

<img width="941" height="412" alt="Image" src="https://github.com/user-attachments/assets/768cc2ed-28a1-4d21-8dda-41d4987642b2" />
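For reference, this is a minimal sketch of what I mean by a codebook loss. It assumes the standard VQ-VAE formulation (codebook term plus a commitment term with weight `beta`, and a straight-through estimator). I do not know whether SAMTok is meant to use this form; `vq_losses` and its arguments are hypothetical names, not identifiers from the repository.

```python
import torch
import torch.nn.functional as F


def vq_losses(z_e: torch.Tensor, codebook: torch.Tensor, beta: float = 0.25):
    """Standard VQ-VAE losses (assumed formulation, not the SAMTok code).

    z_e:      (N, D) encoder outputs.
    codebook: (K, D) code vectors.
    """
    # Compute distances and losses in FP32 regardless of the input dtype.
    z_e32, cb32 = z_e.float(), codebook.float()

    # Nearest code for each encoder output (Euclidean distance).
    idx = torch.cdist(z_e32, cb32).argmin(dim=1)  # (N,)
    z_q = cb32[idx]                               # (N, D)

    # Codebook loss: moves the selected codes toward the encoder outputs.
    codebook_loss = F.mse_loss(z_q, z_e32.detach())
    # Commitment loss: keeps encoder outputs close to their selected codes.
    commit_loss = beta * F.mse_loss(z_e32, z_q.detach())

    # Straight-through estimator: forward value is z_q, gradient flows to z_e.
    z_q_st = z_e32 + (z_q - z_e32).detach()
    return z_q_st, idx, codebook_loss, commit_loss
```

The two terms are summed into the training loss alongside the reconstruction loss. If the codebook is instead updated by an exponential moving average, the codebook term is usually dropped and only the commitment term is kept.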

With the changes shown in the screenshots above, training proceeds successfully. However, I’d like to ask:
Is this the intended way to handle mixed precision in your setup?
How do you typically manage precision when using xTuner + A100?

