[BUG] The MHA-FP8 results from running CuTe DSL on Thor are incorrect. #3318

### Which component has the problem?

CuTe DSL

### Bug Report

**Describe the bug**
I ran the FMHA example on a Thor chip with FP8 inputs and outputs:
`python /workspace/hupeng/cutlass-main/examples/python/CuTeDSL/cute/blackwell/kernel/attention/fmha/fmha.py --in_dtype Float8E4M3FN --out_dtype Float8E4M3FN`
The maximum error reaches 0.6. Is that too large?
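To make a figure like 0.6 easier to judge, it helps to report where the worst error occurs and what the reference value is at that position. The helper below is a minimal sketch for that; `worst_error` is a hypothetical name, not part of the example script, and it assumes both results have already been converted to flat sequences of Python floats.

```python
def worst_error(actual, expected):
    """Return (max_abs_error, index, expected_value) for two flat sequences.

    Hypothetical helper for illustration; it is not part of fmha.py.
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    if not expected:
        raise ValueError("sequences must not be empty")
    # Index of the element with the largest absolute difference.
    idx = max(range(len(expected)), key=lambda i: abs(actual[i] - expected[i]))
    return abs(actual[idx] - expected[idx]), idx, expected[idx]


# Illustrative values only, not measured data from Thor.
err, idx, ref = worst_error([1.0, 9.0, -3.5], [1.0, 8.5, -3.25])
print(f"max abs error {err} at index {idx}, reference value {ref}")
```

Reporting the reference value next to the error shows whether the deviation is large relative to the output or only large in absolute terms.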

**Steps/Code to reproduce bug**
`python /workspace/hupeng/cutlass-main/examples/python/CuTeDSL/cute/blackwell/kernel/attention/fmha/fmha.py --in_dtype Float8E4M3FN --out_dtype Float8E4M3FN`

**Expected behavior**
The error tolerance set in the code is 0.13 or 0.5, so I would expect the error to stay below these values. Could the larger error be caused by specific characteristics of the Thor chip?
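For context on the tolerances above, the spacing between adjacent FP8 E4M3 values grows with magnitude, so the same absolute error can be less than one representable step or several steps depending on the output value. The sketch below assumes the standard E4M3FN layout (3 mantissa bits, smallest normal value 2^-6, largest finite value 448); `e4m3_ulp` and `error_in_ulps` are hypothetical helper names, not functions from CUTLASS.

```python
import math

E4M3_MANTISSA_BITS = 3          # assumption: E4M3FN has 3 mantissa bits
E4M3_MIN_NORMAL = 2.0 ** -6     # smallest normal magnitude
E4M3_SUBNORMAL_STEP = 2.0 ** -9 # spacing in the subnormal range
E4M3_MAX = 448.0                # largest finite magnitude


def e4m3_ulp(x):
    """Spacing between adjacent E4M3FN values around magnitude |x|."""
    mag = min(abs(x), E4M3_MAX)
    if mag < E4M3_MIN_NORMAL:
        return E4M3_SUBNORMAL_STEP
    # frexp gives mag = m * 2**exp with 0.5 <= m < 1, so floor(log2(mag)) = exp - 1.
    _, exp = math.frexp(mag)
    return 2.0 ** (exp - 1 - E4M3_MANTISSA_BITS)


def error_in_ulps(abs_error, reference):
    """Express an absolute error as a multiple of the E4M3FN step at `reference`."""
    return abs_error / e4m3_ulp(reference)


# Illustrative reference magnitudes only; the issue does not state the real ones.
for ref in (1.0, 4.0, 8.0):
    print(ref, e4m3_ulp(ref), error_in_ulps(0.6, ref))
```

Under these assumptions, an absolute error of 0.6 is below one E4M3 step when the reference output has magnitude 8 or more, but almost five steps when the reference is near 1. The output magnitudes at the worst element are not given here, so this only frames the question rather than answering it.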

**Environment details (please complete the following information):**
nvidia-cutlass-dsl           4.5.2
nvidia-cutlass-dsl-libs-base 4.5.2
cuda-12.8


**Additional context**

<img width="1768" height="1107" alt="Image" src="https://github.com/user-attachments/assets/448bd39a-8c34-4231-baa3-ca519412c5d9" />

