added cudaDeviceSynchronize() to all comms

2ee2b42
Closed

Support for CUDA-aware MPI runs, half-precision floating point (fp16), and reduce_scatter communication #1
