[BUG] The MHA-FP8 results from running CuteDSL on Thor are incorrect. #3318

@HuPengsheet

Which component has the problem?

CuTe DSL

Bug Report

Describe the bug
I am running the FMHA example on a Thor chip with FP8 input and output:
python /workspace/hupeng/cutlass-main/examples/python/CuTeDSL/cute/blackwell/kernel/attention/fmha/fmha.py --in_dtype Float8E4M3FN --out_dtype Float8E4M3FN
The maximum error reaches 0.6, which seems too large.

Steps/Code to reproduce bug
python /workspace/hupeng/cutlass-main/examples/python/CuTeDSL/cute/blackwell/kernel/attention/fmha/fmha.py --in_dtype Float8E4M3FN --out_dtype Float8E4M3FN

Expected behavior
The error tolerance set in the code is 0.13 or 0.5, so I would expect the maximum error to stay below these values. Could the larger error be caused by characteristics specific to the Thor chip?
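
To put the 0.6 figure in context, it may help to compare it with the gap between adjacent Float8E4M3FN values at different magnitudes. The sketch below is only an illustration, not code from the example: `e4m3_spacing` is a hypothetical helper, and it assumes the reported error is an absolute difference measured on FP8 outputs. It does not show whether the Thor result is correct.

```python
import math

# Float8E4M3FN layout: 4 exponent bits (bias 7), 3 mantissa bits.
E4M3_MANTISSA_BITS = 3
E4M3_MIN_NORMAL_EXP = -6   # smallest normal value is 2**-6
E4M3_MAX = 448.0           # largest finite value


def e4m3_spacing(x: float) -> float:
    """Gap between adjacent Float8E4M3FN values around |x| (hypothetical helper)."""
    mag = abs(x)
    if mag > E4M3_MAX:
        raise ValueError("value is outside the finite E4M3FN range")
    if mag < 2.0 ** E4M3_MIN_NORMAL_EXP:
        # Subnormal range: the spacing is fixed at 2**-9.
        exp = E4M3_MIN_NORMAL_EXP
    else:
        # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
        _, e = math.frexp(mag)
        exp = e - 1
    return 2.0 ** (exp - E4M3_MANTISSA_BITS)


if __name__ == "__main__":
    for value in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"|x| = {value}: spacing = {e4m3_spacing(value)}")
```

Under these assumptions, adjacent FP8 values are 0.5 apart for outputs in [4, 8) and 1.0 apart for outputs in [8, 16), so a single rounding step at those magnitudes is already close to 0.6. For outputs near 1.0 the spacing is 0.125, and an error of 0.6 would span several steps.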

Environment details (please complete the following information):
nvidia-cutlass-dsl 4.5.2
nvidia-cutlass-dsl-libs-base 4.5.2
cuda-12.8

Additional context

Image
