Improve Int128 ABI mismatch detection. #3170
Merged
CUDA.jl Benchmarks
| Benchmark suite | Current: 34dc577 | Previous: 1caa3ad | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 99383 ns | 100770 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 75149 ns | 76381 ns | 0.98 |
| array/accumulate/Float32/dims=1L | 1629231 ns | 1630799 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 140654 ns | 141553 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 653360 ns | 653429 ns | 1.00 |
| array/accumulate/Int64/1d | 118282 ns | 119282 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 78853 ns | 80346 ns | 0.98 |
| array/accumulate/Int64/dims=1L | 1724276 ns | 1726167 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 153119 ns | 154865 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 959403 ns | 960762 ns | 1.00 |
| array/broadcast | 18258 ns | 18636 ns | 0.98 |
| array/construct | 1213.6 ns | 1242.5 ns | 0.98 |
| array/copy | 16518 ns | 16748 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 212993 ns | 214495 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 280084 ns | 281315 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 10334 ns | 10356 ns | 1.00 |
| array/iteration/findall/bool | 132639 ns | 134688 ns | 0.98 |
| array/iteration/findall/int | 146104 ns | 148663 ns | 0.98 |
| array/iteration/findfirst/bool | 68874 ns | 70356 ns | 0.98 |
| array/iteration/findfirst/int | 70427 ns | 71778 ns | 0.98 |
| array/iteration/findmin/1d | 65071 ns | 68513 ns | 0.95 |
| array/iteration/findmin/2d | 101226 ns | 101687 ns | 1.00 |
| array/iteration/logical | 189699 ns | 195448 ns | 0.97 |
| array/iteration/scalar | 64535 ns | 66072 ns | 0.98 |
| array/permutedims/2d | 49387 ns | 50059 ns | 0.99 |
| array/permutedims/3d | 50674 ns | 50890 ns | 1.00 |
| array/permutedims/4d | 50651 ns | 51304 ns | 0.99 |
| array/random/rand/Float32 | 11381 ns | 11937 ns | 0.95 |
| array/random/rand/Int64 | 22105 ns | 24454 ns | 0.90 |
| array/random/rand!/Float32 | 7847 ns | 8092.666666666667 ns | 0.97 |
| array/random/rand!/Int64 | 17412 ns | 20949 ns | 0.83 |
| array/random/randn/Float32 | 34683 ns | 36329 ns | 0.95 |
| array/random/randn!/Float32 | 23723 ns | 24624 ns | 0.96 |
| array/reductions/mapreduce/Float32/1d | 33101 ns | 33603 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 38329 ns | 38517 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 50058 ns | 50114 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 55785 ns | 56468 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 66656 ns | 67374 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 38514 ns | 39976 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=1 | 41371 ns | 41660 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 86330 ns | 86590 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 58183 ns | 59096 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2L | 81985 ns | 82999 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 33246 ns | 33460 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 38133 ns | 38736 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1L | 50073 ns | 50547 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 55632 ns | 56305 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 66888 ns | 67363 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 39180 ns | 40096 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1 | 40997 ns | 41717 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1L | 86233 ns | 86750 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 57731 ns | 58479 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 82615 ns | 83983 ns | 0.98 |
| array/reverse/1d | 16748 ns | 17202 ns | 0.97 |
| array/reverse/1dL | 67810 ns | 68074 ns | 1.00 |
| array/reverse/1dL_inplace | 65177 ns | 65359 ns | 1.00 |
| array/reverse/1d_inplace | 8217.666666666666 ns | 8376.666666666666 ns | 0.98 |
| array/reverse/2d | 19882 ns | 20409 ns | 0.97 |
| array/reverse/2dL | 71655 ns | 72447 ns | 0.99 |
| array/reverse/2dL_inplace | 65002 ns | 64948 ns | 1.00 |
| array/reverse/2d_inplace | 9597 ns | 9770 ns | 0.98 |
| array/sorting/1d | 2655304 ns | 2653489 ns | 1.00 |
| array/sorting/2d | 1032380 ns | 1034091 ns | 1.00 |
| array/sorting/by | 3180105 ns | 3181225 ns | 1.00 |
| cuda/synchronization/context/auto | 1128.8 ns | 1135.5 ns | 0.99 |
| cuda/synchronization/context/blocking | 894.1276595744681 ns | 939.7407407407408 ns | 0.95 |
| cuda/synchronization/context/nonblocking | 6043 ns | 6181.8 ns | 0.98 |
| cuda/synchronization/stream/auto | 994.8181818181819 ns | 991.6 ns | 1.00 |
| cuda/synchronization/stream/blocking | 799.989247311828 ns | 841.3783783783783 ns | 0.95 |
| cuda/synchronization/stream/nonblocking | 6045 ns | 6139.8 ns | 0.98 |
| integration/byval/reference | 143105 ns | 143264 ns | 1.00 |
| integration/byval/slices=1 | 145192 ns | 145361 ns | 1.00 |
| integration/byval/slices=2 | 283765 ns | 283971 ns | 1.00 |
| integration/byval/slices=3 | 422310 ns | 422479 ns | 1.00 |
| integration/cudadevrt | 101577 ns | 101731 ns | 1.00 |
| integration/volumerhs | 8988630 ns | 8999650 ns | 1.00 |
| kernel/indexing | 12590 ns | 12703 ns | 0.99 |
| kernel/indexing_checked | 13321 ns | 13354 ns | 1.00 |
| kernel/launch | 2089.3333333333335 ns | 2086 ns | 1.00 |
| kernel/occupancy | 734.3741007194244 ns | 693.3698630136986 ns | 1.06 |
| kernel/rand | 15863 ns | 15890 ns | 1.00 |
| latency/import | 3864457489 ns | 3853824190 ns | 1.00 |
| latency/precompile | 4639386595 ns | 4635422433 ns | 1.00 |
| latency/ttfp | 4559920169 ns | 4550893912 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #3170 | +/- |
|---|---|---|---|
| Coverage | 10.82% | 16.31% | +5.48% |
| Files | 123 | 124 | +1 |
| Lines | 9479 | 9875 | +396 |
| Hits | 1026 | 1611 | +585 |
| Misses | 8453 | 8264 | -189 |
For some context on why we started doing this in the first place: the NVPTX back-end aligns 128-bit integers to 16 bytes. Julia only started doing that in 1.12; on 1.10 and 1.11, Int128 is 8-byte aligned. So on those versions an aggregate with a (U)Int128 field has a different layout on the host than on the device. For example, `struct {Int64; Int128}` is 24 bytes on the host (Int128 field at offset 8) and 32 bytes on the device (offset 16). Codegen always uses the device layout, so the two disagree.
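The arithmetic behind those two layouts can be reproduced with a small sketch. This is not the detection code from this PR: `struct_layout` is a hypothetical helper that applies the usual C-style layout rules (each field is placed at the next multiple of its alignment, and the total size is padded to the largest field alignment), and the sizes and alignments passed in are simply the values described above.

```julia
# Hypothetical helper (not part of CUDA.jl): compute C-style field offsets and
# total size from a list of (size, alignment) pairs, all in bytes.
function struct_layout(fields)
    offset = 0
    max_align = 1
    offsets = Int[]
    for (sz, al) in fields
        # Round the running offset up to the next multiple of this field's alignment.
        offset = cld(offset, al) * al
        push!(offsets, offset)
        offset += sz
        max_align = max(max_align, al)
    end
    # Pad the struct so its size is a multiple of its largest field alignment.
    total = cld(offset, max_align) * max_align
    return (offsets = offsets, size = total)
end

# struct {Int64; Int128} with Int128 aligned to 8 bytes (host on Julia 1.10/1.11)
host = struct_layout([(8, 8), (16, 8)])

# The same struct with Int128 aligned to 16 bytes (NVPTX device layout)
device = struct_layout([(8, 8), (16, 16)])

println("host:   offsets = ", host.offsets, ", size = ", host.size)
println("device: offsets = ", device.offsets, ", size = ", device.size)
```

On a real Julia session, `sizeof(T)` and `fieldoffset(T, 2)` report the host-side values for a struct `T` with these two fields, and the result depends on the Julia version as described above.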