Fix pointer types in cublasHgemm #3157

Merged: maleadt merged 2 commits into JuliaGPU:main from lpawela:master on Jun 10, 2026

lpawela (Contributor) commented May 28, 2026

On the current main branch, this fails:

using CUDA

# Half-precision GEMM: C = 1 * A * B + 0 * C
N = 256
A = CUDA.rand(Float16, N, N)
B = CUDA.rand(Float16, N, N)
C = CUDA.zeros(Float16, N, N)
CUDA.CUBLAS.gemm!('N', 'N', one(Float16), A, B, zero(Float16), C)

with

ERROR: MethodError: no method matching unsafe_convert(::Type{Ptr{Float16}}, ::Float16)
The function `unsafe_convert` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  unsafe_convert(::Type{Cwstring}, ::Any)
   @ Base strings/cstring.jl:101
  unsafe_convert(::Type{Ptr{T}}, ::Array{T}) where T
   @ Base pointer.jl:65
  unsafe_convert(::Type{Ptr{T}}, ::Ptr{NTuple{N, T}}) where {N, T}
   @ Base refpointer.jl:188

This PR fixes this issue.
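
For context, the MethodError above is what Julia's `ccall` machinery produces when a plain scalar is passed for an argument declared as `Ptr{Float16}`: `Base.cconvert` leaves the value untouched for `Ptr` argument types, and `Base.unsafe_convert` has no method that turns a bare `Float16` into a pointer. The CPU-only sketch below reproduces that behaviour without CUDA and shows that wrapping the scalar in a `Ref` avoids it. It only illustrates the mechanism; the actual change made in this PR is not shown in this conversation, and the variable names (`alpha`, `boxed`, and so on) are illustrative.

```julia
alpha = one(Float16)

# For Ptr argument types, cconvert defers the conversion and returns the value as-is.
converted = Base.cconvert(Ptr{Float16}, alpha)

# unsafe_convert then has no method for a bare Float16, as in the report above.
failed = try
    Base.unsafe_convert(Ptr{Float16}, converted)
    false
catch err
    err isa MethodError
end

# Wrapping the scalar in a Ref gives it an address that can be passed as a pointer.
boxed = Ref(alpha)
loaded = GC.@preserve boxed begin
    ptr = Base.unsafe_convert(Ptr{Float16}, boxed)
    unsafe_load(ptr)
end
```

Here `failed` records whether the bare scalar triggered the MethodError, and `loaded` reads the value back through the pointer obtained from the `Ref`. Declaring a wrapper argument with a `Ref`-style type instead of `Ptr` lets `ccall` do this wrapping automatically.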

github-actions (Bot, Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 4eecb7d | Previous: 1caa3ad | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 99564 ns | 100770 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 74994 ns | 76381 ns | 0.98 |
| array/accumulate/Float32/dims=1L | 1627668 ns | 1630799 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 140212 ns | 141553 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 652281 ns | 653429 ns | 1.00 |
| array/accumulate/Int64/1d | 118487 ns | 119282 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 78162 ns | 80346 ns | 0.97 |
| array/accumulate/Int64/dims=1L | 1722317 ns | 1726167 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 153130 ns | 154865 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 959752 ns | 960762 ns | 1.00 |
| array/broadcast | 18289 ns | 18636 ns | 0.98 |
| array/construct | 1215.8 ns | 1242.5 ns | 0.98 |
| array/copy | 16369 ns | 16748 ns | 0.98 |
| array/copyto!/cpu_to_gpu | 212420 ns | 214495 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 278989 ns | 281315 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 9023 ns | 10356 ns | 0.87 |
| array/iteration/findall/bool | 132242 ns | 134688 ns | 0.98 |
| array/iteration/findall/int | 146865 ns | 148663 ns | 0.99 |
| array/iteration/findfirst/bool | 68968 ns | 70356 ns | 0.98 |
| array/iteration/findfirst/int | 70412 ns | 71778 ns | 0.98 |
| array/iteration/findmin/1d | 66370 ns | 68513 ns | 0.97 |
| array/iteration/findmin/2d | 100578 ns | 101687 ns | 0.99 |
| array/iteration/logical | 191377 ns | 195448 ns | 0.98 |
| array/iteration/scalar | 64811 ns | 66072 ns | 0.98 |
| array/permutedims/2d | 49758 ns | 50059 ns | 0.99 |
| array/permutedims/3d | 51177 ns | 50890 ns | 1.01 |
| array/permutedims/4d | 50724 ns | 51304 ns | 0.99 |
| array/random/rand/Float32 | 11759 ns | 11937 ns | 0.99 |
| array/random/rand/Int64 | 22787 ns | 24454 ns | 0.93 |
| array/random/rand!/Float32 | 7948.75 ns | 8092.666666666667 ns | 0.98 |
| array/random/rand!/Int64 | 20658 ns | 20949 ns | 0.99 |
| array/random/randn/Float32 | 34093 ns | 36329 ns | 0.94 |
| array/random/randn!/Float32 | 23709 ns | 24624 ns | 0.96 |
| array/reductions/mapreduce/Float32/1d | 33206 ns | 33603 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 38492 ns | 38517 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 50040 ns | 50114 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56085 ns | 56468 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 66831 ns | 67374 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 39421 ns | 39976 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 41529 ns | 41660 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 86335 ns | 86590 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 58367 ns | 59096 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 82816 ns | 82999 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 32883 ns | 33460 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 38547 ns | 38736 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 50252 ns | 50547 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 55902 ns | 56305 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 66816 ns | 67363 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 39417 ns | 40096 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 41196 ns | 41717 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 86315 ns | 86750 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 58055 ns | 58479 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 83335 ns | 83983 ns | 0.99 |
| array/reverse/1d | 16550 ns | 17202 ns | 0.96 |
| array/reverse/1dL | 67491 ns | 68074 ns | 0.99 |
| array/reverse/1dL_inplace | 65180 ns | 65359 ns | 1.00 |
| array/reverse/1d_inplace | 8181.333333333333 ns | 8376.666666666666 ns | 0.98 |
| array/reverse/2d | 20037 ns | 20409 ns | 0.98 |
| array/reverse/2dL | 71845 ns | 72447 ns | 0.99 |
| array/reverse/2dL_inplace | 64984 ns | 64948 ns | 1.00 |
| array/reverse/2d_inplace | 9619 ns | 9770 ns | 0.98 |
| array/sorting/1d | 2654972 ns | 2653489 ns | 1.00 |
| array/sorting/2d | 1033410 ns | 1034091 ns | 1.00 |
| array/sorting/by | 3178646 ns | 3181225 ns | 1.00 |
| cuda/synchronization/context/auto | 1106.7 ns | 1135.5 ns | 0.97 |
| cuda/synchronization/context/blocking | 898.1860465116279 ns | 939.7407407407408 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 6006.6 ns | 6181.8 ns | 0.97 |
| cuda/synchronization/stream/auto | 988.3571428571429 ns | 991.6 ns | 1.00 |
| cuda/synchronization/stream/blocking | 818.2359550561798 ns | 841.3783783783783 ns | 0.97 |
| cuda/synchronization/stream/nonblocking | 6000.4 ns | 6139.8 ns | 0.98 |
| integration/byval/reference | 143159 ns | 143264 ns | 1.00 |
| integration/byval/slices=1 | 145207 ns | 145361 ns | 1.00 |
| integration/byval/slices=2 | 283925 ns | 283971 ns | 1.00 |
| integration/byval/slices=3 | 422283 ns | 422479 ns | 1.00 |
| integration/cudadevrt | 101592 ns | 101731 ns | 1.00 |
| integration/volumerhs | 8988193 ns | 8999650 ns | 1.00 |
| kernel/indexing | 12642 ns | 12703 ns | 1.00 |
| kernel/indexing_checked | 13516 ns | 13354 ns | 1.01 |
| kernel/launch | 2083.8888888888887 ns | 2086 ns | 1.00 |
| kernel/occupancy | 734.7027027027027 ns | 693.3698630136986 ns | 1.06 |
| kernel/rand | 16065 ns | 15890 ns | 1.01 |
| latency/import | 3854784368 ns | 3853824190 ns | 1.00 |
| latency/precompile | 4635091489 ns | 4635422433 ns | 1.00 |
| latency/ttfp | 4535766511 ns | 4550893912 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

codecov (Bot) commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 16.31%. Comparing base (1caa3ad) to head (4eecb7d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3157      +/-   ##
==========================================
+ Coverage   10.82%   16.31%   +5.48%     
==========================================
  Files         123      124       +1     
  Lines        9479     9875     +396     
==========================================
+ Hits         1026     1611     +585     
+ Misses       8453     8264     -189     

maleadt (Member) left a comment

You cannot edit libcublas.jl directly; instead (or also) edit res/wrap/libcublas_epilogue.jl so that these changes don't get lost on the next toolkit release.

lpawela (Contributor, Author) commented Jun 7, 2026

@maleadt updated

maleadt merged commit afc593c into JuliaGPU:main on Jun 10, 2026
2 checks passed