
precision mismatch and loss definition in SAMTok tokenizer training #134

@Mr-Sun123

Description

When training the tokenizer (e.g., `vq_sam2_256x2_unshare.py`) on A100 GPUs with xTuner, I observed a precision mismatch between model parameters and input tensors, even though the optimizer is configured to use FP32. It appears that xTuner automatically casts parts of the model (e.g., embedding tables) to BF16 for efficiency, while certain inputs and intermediate features remain in FP32. Because matrix multiplication (`@` or `torch.addmm`) requires both operands to have the same dtype, this leads to runtime errors such as:
[Screenshot: `vq_sam2_256x2_unshare.py`]

[Screenshot: `sam2.py`]

[Screenshot]
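The failure can be reproduced outside the training code with a minimal sketch, assuming a BF16 embedding table and FP32 features as described above. `sq_distances` and `sq_distances_fp32` are hypothetical helpers, not functions from the repository. The nearest-code distance uses `@`, which does not promote dtypes, so the mixed call raises a `RuntimeError`. Upcasting both operands to FP32 is one possible fix; casting the input to the parameter's dtype also works, but then the distances are computed at BF16 precision.

```python
import torch


def sq_distances(x: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    """Squared L2 distance between each row of x (N, D) and each code in table (K, D)."""
    # `@` (torch.matmul) does not promote dtypes, so x and table must match.
    return (x ** 2).sum(1, keepdim=True) - 2 * x @ table.t() + (table ** 2).sum(1)


def sq_distances_fp32(x: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    # Upcast both operands so the nearest-code search runs in FP32 regardless of
    # whether the table has been cast to BF16. Gradients still flow back to the
    # original BF16 parameter through .float().
    return sq_distances(x.float(), table.float())


# Hypothetical setup mimicking the reported situation:
# a BF16 codebook and FP32 features.
codebook = torch.nn.Embedding(256, 8).to(torch.bfloat16)
features = torch.randn(4, 8)

try:
    sq_distances(features, codebook.weight)
except RuntimeError as err:
    print("mismatch:", err)

d = sq_distances_fp32(features, codebook.weight)
print(d.dtype, tuple(d.shape))  # torch.float32 (4, 256)
```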

I also noticed that the training script doesn’t explicitly define a codebook loss.

[Screenshot]
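For reference, the standard VQ-VAE formulation pairs a codebook loss with a commitment loss and uses a straight-through estimator so gradients reach the encoder. The sketch below assumes that formulation, with the commonly used `beta = 0.25`. `vq_losses` is a hypothetical helper, and this is not necessarily how SAMTok intends the codebook to be trained; some VQ tokenizers, for example, update the codebook with an exponential moving average instead of a loss term.

```python
import torch
import torch.nn.functional as F


def vq_losses(z_e: torch.Tensor, codebook: torch.Tensor, beta: float = 0.25):
    """Quantize z_e (N, D) against codebook (K, D).

    Returns (z_q_st, codebook_loss, commitment_loss, indices), where z_q_st
    uses the straight-through estimator so gradients reach the encoder.
    """
    # Nearest code per row, computed in FP32 to avoid dtype mismatches.
    with torch.no_grad():
        indices = torch.cdist(z_e.float(), codebook.float()).argmin(dim=1)
    z_q = codebook[indices].to(z_e.dtype)

    # Codebook loss: pull the selected codes towards the (frozen) encoder outputs.
    codebook_loss = F.mse_loss(z_q, z_e.detach())
    # Commitment loss: keep encoder outputs close to their (frozen) codes.
    commitment_loss = beta * F.mse_loss(z_e, z_q.detach())

    # Straight-through: the forward pass uses z_q, the backward pass copies
    # the gradient of z_q_st directly to z_e.
    z_q_st = z_e + (z_q - z_e).detach()
    return z_q_st, codebook_loss, commitment_loss, indices


# Usage with hypothetical shapes: 256 codes of dimension 8, a batch of 4 features.
codebook = torch.nn.Parameter(torch.randn(256, 8))
z_e = torch.randn(4, 8, requires_grad=True)
z_q, cb_loss, commit_loss, idx = vq_losses(z_e, codebook)
(cb_loss + commit_loss).backward()  # grads reach both codebook and z_e
```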

The changes shown in the screenshots above allow training to proceed successfully. However, I’d like to ask:

- Is this the intended way to handle mixed precision in your setup?
- How do you typically manage precision when using xTuner on A100 GPUs?
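One common alternative to per-call casts, not necessarily what the SAMTok authors or xTuner do, is to keep parameters in FP32 and run the forward pass under `torch.autocast`, which casts the inputs of eligible ops (such as `linear`, `mm`, and `addmm`) to BF16 automatically. A minimal CPU sketch with hypothetical module and tensor names follows; on A100 GPUs the same code would use `device_type="cuda"`.

```python
import torch

# Hypothetical module with FP32 ("master") parameters.
proj = torch.nn.Linear(8, 16)
x = torch.randn(4, 8)  # FP32 input features

# Hypothetical table that some other code has already cast to BF16.
table = torch.randn(256, 8, dtype=torch.bfloat16)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = proj(x)                      # linear runs in BF16; x is cast automatically
    scores = torch.mm(x, table.t())  # FP32 x and BF16 table no longer clash
    loss = y.float().pow(2).mean() + scores.float().mean()

loss.backward()
print(y.dtype, scores.dtype, proj.weight.grad.dtype)
# torch.bfloat16 torch.bfloat16 torch.float32
```

Because the parameters stay in FP32, their gradients (and the optimizer state) remain in FP32; only the eligible ops inside the `autocast` region run in BF16.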

Metadata

- Assignees: No one assigned
- Labels: No labels
- Type: No type
- Projects: No projects
- Milestone: No milestone
- Relationships: None yet
- Development: No branches or pull requests
