cuBLASXt: only grant pool access to peer-capable devices by JohnCobbler · Pull Request #3166 · JuliaGPU/CUDA.jl

JohnCobbler · 2026-06-05T10:51:59Z

As offered in #3165: xt_handle() grants pool access to all participating devices without checking can_access_peer, unlike maybe_enable_peer_access in CUDACore. Per the CUDA docs, an unchecked cuMemPoolSetAccess on a fresh pool can succeed on non-peer-capable devices, deferring the failure to a later allocation.

This checks peer capability first, warns (once) about devices that can't access a pool, and only grants access where valid.
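The selection logic described above can be sketched in plain Julia. This is not the code merged in this PR. `peers_to_grant` and `is_peer` are hypothetical names, device IDs are plain integers, and the peer-capability query is passed in as `can_access(dev, owner)` (standing in for something like `can_access_peer` / `cuDeviceCanAccessPeer`) so the logic runs without a GPU. "Warn once" is modeled here with Julia's `maxlog=1` logging keyword; the actual change may implement it differently.

```julia
# Hypothetical sketch, not the merged CUDA.jl code: decide which devices should
# be granted access to a memory pool owned by device `owner`.
# `can_access(dev, owner)` is an injected stand-in for a real peer-capability
# query (e.g. cuDeviceCanAccessPeer), so this runs without any GPU.
function peers_to_grant(can_access, owner::Integer, devices::AbstractVector{<:Integer})
    granted = Int[]
    denied = Int[]
    for dev in devices
        dev == owner && continue  # the owning device already has access to its pool
        if can_access(dev, owner)
            push!(granted, dev)   # peer-capable: safe to grant pool access
        else
            push!(denied, dev)    # not peer-capable: skip instead of granting blindly
        end
    end
    if !isempty(denied)
        # maxlog=1: Julia's logging emits this warning at most once from this call site
        @warn "Devices $denied cannot access the memory pool of device $owner; not granting them access" maxlog=1
    end
    return granted
end

# Example topology (hypothetical): devices 0 and 1 are peers, device 2 is not.
is_peer(a, b) = Set((a, b)) == Set((0, 1))

peers_to_grant(is_peer, 0, [0, 1, 2])  # returns [1] and warns about device 2
```

Granting access only to devices that pass the check, rather than to every participating device, means an incapable device is reported when the pool is set up instead of surfacing as a failure at some later allocation.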

Caveat up front: I only have a single GPU, so this is untested on actual multi-GPU hardware, and per the issue it likely isn't the whole story behind the NaN readbacks. Feel free to close if you'd rather investigate the root cause first.

cuMemPoolSetAccess on a fresh pool can succeed even when the devices are not peer capable, deferring the failure to a later allocation. Check can_access_peer before granting, and warn about devices that cannot access the pool, mirroring maybe_enable_peer_access in CUDACore.

codecov · 2026-06-05T13:41:46Z

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.34%. Comparing base (aa47d7a) to head (434b095).

Files with missing lines	Patch %	Lines
lib/cublas/src/cuBLAS.jl	66.66%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3166      +/-   ##
==========================================
+ Coverage   16.33%   16.34%   +0.01%     
==========================================
  Files         124      124              
  Lines        9875     9880       +5     
==========================================
+ Hits         1613     1615       +2     
- Misses       8262     8265       +3

github-actions

CUDA.jl Benchmarks

Benchmark suite	Current: `434b095`	Previous: `aa47d7a`	Ratio
`array/accumulate/Float32/1d`	`99232` ns	`98976` ns	`1.00`
`array/accumulate/Float32/dims=1`	`75352` ns	`75470` ns	`1.00`
`array/accumulate/Float32/dims=1L`	`1596782` ns	`1595788` ns	`1.00`
`array/accumulate/Float32/dims=2`	`140288` ns	`140419` ns	`1.00`
`array/accumulate/Float32/dims=2L`	`652706` ns	`653444` ns	`1.00`
`array/accumulate/Int64/1d`	`117981` ns	`118155` ns	`1.00`
`array/accumulate/Int64/dims=1`	`78945` ns	`78907` ns	`1.00`
`array/accumulate/Int64/dims=1L`	`1708040` ns	`1709506` ns	`1.00`
`array/accumulate/Int64/dims=2`	`154051` ns	`153939` ns	`1.00`
`array/accumulate/Int64/dims=2L`	`959710` ns	`959330` ns	`1.00`
`array/broadcast`	`18251` ns	`18270` ns	`1.00`
`array/construct`	`1176.7` ns	`1198.4` ns	`0.98`
`array/copy`	`16211` ns	`16676` ns	`0.97`
`array/copyto!/cpu_to_gpu`	`211348` ns	`211135` ns	`1.00`
`array/copyto!/gpu_to_cpu`	`279211` ns	`278832` ns	`1.00`
`array/copyto!/gpu_to_gpu`	`10288` ns	`10531` ns	`0.98`
`array/iteration/findall/bool`	`133016` ns	`131993` ns	`1.01`
`array/iteration/findall/int`	`146365` ns	`146745` ns	`1.00`
`array/iteration/findfirst/bool`	`111605` ns	`111631` ns	`1.00`
`array/iteration/findfirst/int`	`112190` ns	`111858` ns	`1.00`
`array/iteration/findmin/1d`	`65661` ns	`66902` ns	`0.98`
`array/iteration/findmin/2d`	`100624` ns	`100550` ns	`1.00`
`array/iteration/logical`	`190452` ns	`189124` ns	`1.01`
`array/iteration/scalar`	`64165` ns	`66015` ns	`0.97`
`array/permutedims/2d`	`49195` ns	`49598` ns	`0.99`
`array/permutedims/3d`	`50866` ns	`50240` ns	`1.01`
`array/permutedims/4d`	`50357` ns	`50411` ns	`1.00`
`array/random/rand/Float32`	`11522` ns	`11982` ns	`0.96`
`array/random/rand/Int64`	`23755` ns	`23515` ns	`1.01`
`array/random/rand!/Float32`	`7917` ns	`8122` ns	`0.97`
`array/random/rand!/Int64`	`20468` ns	`20501` ns	`1.00`
`array/random/randn/Float32`	`34471` ns	`34458` ns	`1.00`
`array/random/randn!/Float32`	`23757` ns	`24130` ns	`0.98`
`array/reductions/mapreduce/Float32/1d`	`32648` ns	`33763` ns	`0.97`
`array/reductions/mapreduce/Float32/dims=1`	`37942` ns	`38228` ns	`0.99`
`array/reductions/mapreduce/Float32/dims=1L`	`49702` ns	`50249` ns	`0.99`
`array/reductions/mapreduce/Float32/dims=2`	`55322` ns	`55439` ns	`1.00`
`array/reductions/mapreduce/Float32/dims=2L`	`67093` ns	`67163` ns	`1.00`
`array/reductions/mapreduce/Int64/1d`	`39301` ns	`40187` ns	`0.98`
`array/reductions/mapreduce/Int64/dims=1`	`40840` ns	`40738` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=1L`	`86135` ns	`86458` ns	`1.00`
`array/reductions/mapreduce/Int64/dims=2`	`57359` ns	`57724` ns	`0.99`
`array/reductions/mapreduce/Int64/dims=2L`	`82502` ns	`82751` ns	`1.00`
`array/reductions/reduce/Float32/1d`	`33303` ns	`33392` ns	`1.00`
`array/reductions/reduce/Float32/dims=1`	`37906` ns	`38091` ns	`1.00`
`array/reductions/reduce/Float32/dims=1L`	`49787` ns	`49963` ns	`1.00`
`array/reductions/reduce/Float32/dims=2`	`55267` ns	`55491` ns	`1.00`
`array/reductions/reduce/Float32/dims=2L`	`68712` ns	`68898` ns	`1.00`
`array/reductions/reduce/Int64/1d`	`38577` ns	`40022` ns	`0.96`
`array/reductions/reduce/Int64/dims=1`	`40581` ns	`40672` ns	`1.00`
`array/reductions/reduce/Int64/dims=1L`	`86251` ns	`86301` ns	`1.00`
`array/reductions/reduce/Int64/dims=2`	`57424` ns	`57311` ns	`1.00`
`array/reductions/reduce/Int64/dims=2L`	`82572` ns	`82411` ns	`1.00`
`array/reverse/1d`	`16566` ns	`16807` ns	`0.99`
`array/reverse/1dL`	`67404` ns	`67676` ns	`1.00`
`array/reverse/1dL_inplace`	`65033` ns	`65187` ns	`1.00`
`array/reverse/1d_inplace`	`8102` ns	`9321.666666666666` ns	`0.87`
`array/reverse/2d`	`19833` ns	`19959` ns	`0.99`
`array/reverse/2dL`	`71728` ns	`71879` ns	`1.00`
`array/reverse/2dL_inplace`	`64897` ns	`65104` ns	`1.00`
`array/reverse/2d_inplace`	`9461` ns	`11067` ns	`0.85`
`array/sorting/1d`	`2650181` ns	`2655417` ns	`1.00`
`array/sorting/2d`	`1035694` ns	`1038734` ns	`1.00`
`array/sorting/by`	`3178013` ns	`3192232` ns	`1.00`
`cuda/synchronization/context/auto`	`1144.2` ns	`1131.5` ns	`1.01`
`cuda/synchronization/context/blocking`	`923.7` ns	`952.2173913043479` ns	`0.97`
`cuda/synchronization/context/nonblocking`	`6161.6` ns	`6097.8` ns	`1.01`
`cuda/synchronization/stream/auto`	`986.9285714285714` ns	`1004.5` ns	`0.98`
`cuda/synchronization/stream/blocking`	`800.8586956521739` ns	`825.6363636363636` ns	`0.97`
`cuda/synchronization/stream/nonblocking`	`5866.8` ns	`6045.333333333333` ns	`0.97`
`integration/byval/reference`	`143021` ns	`143141` ns	`1.00`
`integration/byval/slices=1`	`145015` ns	`145110` ns	`1.00`
`integration/byval/slices=2`	`283532` ns	`283495` ns	`1.00`
`integration/byval/slices=3`	`421546` ns	`422045` ns	`1.00`
`integration/cudadevrt`	`101462` ns	`101557` ns	`1.00`
`integration/volumerhs`	`9080052` ns	`9077766` ns	`1.00`
`kernel/indexing`	`12566` ns	`12534` ns	`1.00`
`kernel/indexing_checked`	`13329` ns	`13291` ns	`1.00`
`kernel/launch`	`2048.5555555555557` ns	`2072.5555555555557` ns	`0.99`
`kernel/occupancy`	`715.9782608695652` ns	`716.9097744360902` ns	`1.00`
`kernel/rand`	`13632` ns	`13723` ns	`0.99`
`latency/import`	`3875036193` ns	`3841987489` ns	`1.01`
`latency/precompile`	`4621167021` ns	`4621684240` ns	`1.00`
`latency/ttfp`	`4491013832` ns	`4482964065` ns	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

maleadt

LGTM, thanks.

github-actions Bot reviewed Jun 5, 2026

maleadt approved these changes Jun 10, 2026

maleadt merged commit fdb0f83 into JuliaGPU:main Jun 10, 2026
2 checks passed

maleadt merged 1 commit into JuliaGPU:main from JohnCobbler:fix/cublasxt-peer-check