[EP] Add SM80/A100 build support with SM90 feature guards #963
@manojgop does this fix the issues you mentioned in https://uccl-dev.slack.com/archives/C0B2FQHQNTT/p1779093594675089?thread_ts=1778975124.159549&cid=C0B2FQHQNTT? I do not have A100 machines on hand, so I cannot verify it. However, I have tested on NV H100 + EFA and AMD MI325x + Broadcom, and both work. @MaoZiming @zhenhuang12, can either of you also review it?
@bkpathak I've fixed the build for cu13 and added changes to verify A100 in low-latency mode, both inter-node and intra-node, on a two-node setup with two A100 GPUs per node. You may take this commit and update PR #963: manojgop@f9dab2e. Note: I've tested this with an Intel RDMA NIC.
Sure, will send the updated PR.
- Add software grid barrier (cuda_grid_barrier) for the non-SM90 path, using cooperative launch with an atomicAdd/atomicExch reset pattern
- Use cudaLaunchKernelEx with the cooperative attribute for non-SM90 GPUs
- Guard TMA intrinsics with DISABLE_SM90_FEATURES to avoid ISA errors
- Add warp-copy fallback for TMA store paths on pre-SM90
- Handle CUDA 13+ cuda_fp8.h include compatibility in ep_configs.cuh

Signed-off-by: Manoj Gopalakrishnan <manoj.gopalakrishnan@intel.com>
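To illustrate the guard plus warp-copy fallback, here is a host-side C++ model, not the PR's kernel code. It assumes the fallback uses the common lane-strided pattern (lane `l` copies elements `l`, `l + 32`, ...). Only `DISABLE_SM90_FEATURES` comes from the PR; `store_tile` and `warp_copy_lane` are hypothetical names, and the loop over `lane` serializes what the 32 lanes of a warp would do concurrently. Defining the macro models an SM80/A100 build, so the SM90-only branch is never compiled, which is the point of the guard.

```cpp
#include <cassert>
#include <cstddef>

// Model an SM80/A100 build: the SM90-only branch below is compiled out.
#define DISABLE_SM90_FEATURES 1

constexpr int kWarpSize = 32;

// Work done by one lane: copy every element whose index is congruent to
// `lane` modulo the warp size. When 32 lanes run this together, consecutive
// lanes touch consecutive elements, so global memory accesses coalesce.
template <typename T>
void warp_copy_lane(T* dst, const T* src, std::size_t n, int lane) {
  for (std::size_t i = static_cast<std::size_t>(lane); i < n; i += kWarpSize) {
    dst[i] = src[i];
  }
}

// Hypothetical store helper with the same guard shape as a guarded TMA path.
template <typename T>
void store_tile(T* dst, const T* src, std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
#ifndef DISABLE_SM90_FEATURES
#error "SM90 TMA store path is not modelled in this host sketch"
#else
  // Pre-SM90 fallback: on a GPU each lane would call warp_copy_lane with its
  // own lane id; here the 32 lanes run one after another.
  for (int lane = 0; lane < kWarpSize; ++lane) {
    warp_copy_lane(dst, src, n, lane);
  }
#endif
}
```

A length of 70 exercises two full strides plus a partial one, which is where an off-by-one in the lane loop would show up.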
@YangZhou1997 @MaoZiming Could you please review this PR? @bkpathak This PR needs to be rebased onto the main branch.
// Acquire fence: ensure we see all writes from all blocks that arrived
// before us (transitively through the atomic total order).
__threadfence();
Is the __threadfence() here really an acquire fence? Is it needed?
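For context on the fence question: the fence sits in the software grid barrier used on non-SM90 GPUs. Below is a host-side C++ model of an atomicAdd/atomicExch reset barrier, not the PR's kernel. Each std::thread stands in for one co-resident thread block (on the GPU, the cooperative launch is what guarantees co-residency). `GridBarrier`, `sync()`, `run_barrier_check()`, and the separate `generation` counter are hypothetical; the generation counter is an assumption of this sketch. In C++ memory-model terms, the acquire side is the acquire load in the spin loop. CUDA `atomicAdd` itself does not order surrounding memory operations, so a fence is typically needed after the spin; the CUDA C++ Programming Guide describes `__threadfence()` as a sequentially consistent device-scope fence, which is at least as strong as an acquire. Whether the PR's fence is redundant depends on how its spin loop loads the flag, which is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Host-side model of a counter-based grid barrier with reset.
struct GridBarrier {
  std::atomic<unsigned> arrived{0};     // blocks that reached the barrier this round
  std::atomic<unsigned> generation{0};  // assumption: advanced by the last arriver
  const unsigned num_blocks;

  explicit GridBarrier(unsigned n) : num_blocks(n) { assert(n > 0); }

  void sync() {
    // Snapshot the generation; it cannot advance until this block arrives.
    const unsigned gen = generation.load(std::memory_order_acquire);
    // atomicAdd step: acq_rel publishes this block's earlier writes and lets
    // the last arriver observe every other block's earlier writes.
    if (arrived.fetch_add(1, std::memory_order_acq_rel) + 1 == num_blocks) {
      // atomicExch step: reset the counter so the barrier can be reused...
      arrived.exchange(0, std::memory_order_relaxed);
      // ...then release the waiters (and the reset) by advancing the generation.
      generation.fetch_add(1, std::memory_order_release);
    } else {
      // Acquire load: once the new generation is seen, writes made before
      // every block's arrival are visible to this block.
      while (generation.load(std::memory_order_acquire) == gen) {
        std::this_thread::yield();
      }
    }
  }
};

// Usage check: each block writes its slot, syncs, then verifies that every
// block's write for the same round is visible.
bool run_barrier_check(unsigned num_blocks, int rounds) {
  GridBarrier barrier(num_blocks);
  std::vector<int> slots(num_blocks, 0);
  std::atomic<bool> ok{true};
  std::vector<std::thread> blocks;
  for (unsigned b = 0; b < num_blocks; ++b) {
    blocks.emplace_back([&, b] {
      for (int r = 1; r <= rounds; ++r) {
        slots[b] = r;  // plain (non-atomic) write before the barrier
        barrier.sync();
        for (unsigned j = 0; j < num_blocks; ++j) {
          if (slots[j] != r) ok.store(false);
        }
        barrier.sync();  // keep next round's writes out of this round's reads
      }
    });
  }
  for (auto& t : blocks) t.join();
  return ok.load();
}
```

The second `sync()` per round is there so the plain `slots` accesses stay race-free; the check would otherwise read slots that other blocks are already overwriting.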
@bkpathak Thank you for the contribution! Would you also like to share some performance results on A100?
Sure, will add the test results.
Description
Please include a summary of the changes and the related issue.
Fixes # (issue)
Type of Change
How Has This Been Tested?
Include any tests here.
Checklist
- format.sh to follow the style guidelines.
- build.sh to verify compilation.

## Test-bed
Build Command
Install Wheel
Test