Fix pointer types in cublasHgemm #3157
Merged
CUDA.jl Benchmarks
| Benchmark suite | Current: 4eecb7d | Previous: 1caa3ad | Ratio (Current / Previous) |
|---|---|---|---|
| array/accumulate/Float32/1d | 99564 ns | 100770 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 74994 ns | 76381 ns | 0.98 |
| array/accumulate/Float32/dims=1L | 1627668 ns | 1630799 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 140212 ns | 141553 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 652281 ns | 653429 ns | 1.00 |
| array/accumulate/Int64/1d | 118487 ns | 119282 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 78162 ns | 80346 ns | 0.97 |
| array/accumulate/Int64/dims=1L | 1722317 ns | 1726167 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 153130 ns | 154865 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 959752 ns | 960762 ns | 1.00 |
| array/broadcast | 18289 ns | 18636 ns | 0.98 |
| array/construct | 1215.8 ns | 1242.5 ns | 0.98 |
| array/copy | 16369 ns | 16748 ns | 0.98 |
| array/copyto!/cpu_to_gpu | 212420 ns | 214495 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 278989 ns | 281315 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 9023 ns | 10356 ns | 0.87 |
| array/iteration/findall/bool | 132242 ns | 134688 ns | 0.98 |
| array/iteration/findall/int | 146865 ns | 148663 ns | 0.99 |
| array/iteration/findfirst/bool | 68968 ns | 70356 ns | 0.98 |
| array/iteration/findfirst/int | 70412 ns | 71778 ns | 0.98 |
| array/iteration/findmin/1d | 66370 ns | 68513 ns | 0.97 |
| array/iteration/findmin/2d | 100578 ns | 101687 ns | 0.99 |
| array/iteration/logical | 191377 ns | 195448 ns | 0.98 |
| array/iteration/scalar | 64811 ns | 66072 ns | 0.98 |
| array/permutedims/2d | 49758 ns | 50059 ns | 0.99 |
| array/permutedims/3d | 51177 ns | 50890 ns | 1.01 |
| array/permutedims/4d | 50724 ns | 51304 ns | 0.99 |
| array/random/rand/Float32 | 11759 ns | 11937 ns | 0.99 |
| array/random/rand/Int64 | 22787 ns | 24454 ns | 0.93 |
| array/random/rand!/Float32 | 7948.75 ns | 8092.666666666667 ns | 0.98 |
| array/random/rand!/Int64 | 20658 ns | 20949 ns | 0.99 |
| array/random/randn/Float32 | 34093 ns | 36329 ns | 0.94 |
| array/random/randn!/Float32 | 23709 ns | 24624 ns | 0.96 |
| array/reductions/mapreduce/Float32/1d | 33206 ns | 33603 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 38492 ns | 38517 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 50040 ns | 50114 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56085 ns | 56468 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 66831 ns | 67374 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 39421 ns | 39976 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 41529 ns | 41660 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 86335 ns | 86590 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 58367 ns | 59096 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 82816 ns | 82999 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 32883 ns | 33460 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 38547 ns | 38736 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 50252 ns | 50547 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 55902 ns | 56305 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 66816 ns | 67363 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 39417 ns | 40096 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 41196 ns | 41717 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 86315 ns | 86750 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 58055 ns | 58479 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 83335 ns | 83983 ns | 0.99 |
| array/reverse/1d | 16550 ns | 17202 ns | 0.96 |
| array/reverse/1dL | 67491 ns | 68074 ns | 0.99 |
| array/reverse/1dL_inplace | 65180 ns | 65359 ns | 1.00 |
| array/reverse/1d_inplace | 8181.333333333333 ns | 8376.666666666666 ns | 0.98 |
| array/reverse/2d | 20037 ns | 20409 ns | 0.98 |
| array/reverse/2dL | 71845 ns | 72447 ns | 0.99 |
| array/reverse/2dL_inplace | 64984 ns | 64948 ns | 1.00 |
| array/reverse/2d_inplace | 9619 ns | 9770 ns | 0.98 |
| array/sorting/1d | 2654972 ns | 2653489 ns | 1.00 |
| array/sorting/2d | 1033410 ns | 1034091 ns | 1.00 |
| array/sorting/by | 3178646 ns | 3181225 ns | 1.00 |
| cuda/synchronization/context/auto | 1106.7 ns | 1135.5 ns | 0.97 |
| cuda/synchronization/context/blocking | 898.1860465116279 ns | 939.7407407407408 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 6006.6 ns | 6181.8 ns | 0.97 |
| cuda/synchronization/stream/auto | 988.3571428571429 ns | 991.6 ns | 1.00 |
| cuda/synchronization/stream/blocking | 818.2359550561798 ns | 841.3783783783783 ns | 0.97 |
| cuda/synchronization/stream/nonblocking | 6000.4 ns | 6139.8 ns | 0.98 |
| integration/byval/reference | 143159 ns | 143264 ns | 1.00 |
| integration/byval/slices=1 | 145207 ns | 145361 ns | 1.00 |
| integration/byval/slices=2 | 283925 ns | 283971 ns | 1.00 |
| integration/byval/slices=3 | 422283 ns | 422479 ns | 1.00 |
| integration/cudadevrt | 101592 ns | 101731 ns | 1.00 |
| integration/volumerhs | 8988193 ns | 8999650 ns | 1.00 |
| kernel/indexing | 12642 ns | 12703 ns | 1.00 |
| kernel/indexing_checked | 13516 ns | 13354 ns | 1.01 |
| kernel/launch | 2083.8888888888887 ns | 2086 ns | 1.00 |
| kernel/occupancy | 734.7027027027027 ns | 693.3698630136986 ns | 1.06 |
| kernel/rand | 16065 ns | 15890 ns | 1.01 |
| latency/import | 3854784368 ns | 3853824190 ns | 1.00 |
| latency/precompile | 4635091489 ns | 4635422433 ns | 1.00 |
| latency/ttfp | 4535766511 ns | 4550893912 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage diff | main | #3157 | +/- |
|---|---|---|---|
| Coverage | 10.82% | 16.31% | +5.48% |
| Files | 123 | 124 | +1 |
| Lines | 9479 | 9875 | +396 |
| Hits | 1026 | 1611 | +585 |
| Misses | 8453 | 8264 | -189 |
maleadt requested changes on Jun 4, 2026.
maleadt (Member) left a comment:
You cannot edit `libcublas.jl` directly. Instead (or in addition), edit `res/wrap/libcublas_epilogue.jl` so that these changes don't get lost when the wrappers are regenerated for the next toolkit release.
Contributor (Author): @maleadt updated
In the current main branch this fails
with
This PR fixes this issue.