Change device! function parameter type from Int32 to Int #2906
Merged (+6 −3)
vchuravy reviewed and approved these changes on Oct 1, 2025.
CUDA.jl Benchmarks
Benchmark suite | Current: 14a933e | Previous: f99784f | Ratio |
---|---|---|---|
latency/precompile | 57559824690 ns | 57505943660.5 ns | 1.00 |
latency/ttfp | 8221184268 ns | 8175596902 ns | 1.01 |
latency/import | 4554132407 ns | 4548330811 ns | 1.00 |
integration/volumerhs | 9627333.5 ns | 9627247.5 ns | 1.00 |
integration/byval/slices=1 | 147180 ns | 147004 ns | 1.00 |
integration/byval/slices=3 | 426079 ns | 426015 ns | 1.00 |
integration/byval/reference | 145085 ns | 144995 ns | 1.00 |
integration/byval/slices=2 | 286565 ns | 286391 ns | 1.00 |
integration/cudadevrt | 103657 ns | 103484 ns | 1.00 |
kernel/indexing | 14124 ns | 14144.5 ns | 1.00 |
kernel/indexing_checked | 14940 ns | 14979 ns | 1.00 |
kernel/occupancy | 740.4360902255639 ns | 696.1307189542483 ns | 1.06 |
kernel/launch | 2192.8888888888887 ns | 2152.1111111111113 ns | 1.02 |
kernel/rand | 15803 ns | 17139 ns | 0.92 |
array/reverse/1d | 19938 ns | 20127 ns | 0.99 |
array/reverse/2dL_inplace | 66882 ns | 66835 ns | 1.00 |
array/reverse/1dL | 70144 ns | 70275 ns | 1.00 |
array/reverse/2d | 21951 ns | 22347 ns | 0.98 |
array/reverse/1d_inplace | 11377.5 ns | 9616 ns | 1.18 |
array/reverse/2d_inplace | 13269 ns | 13357 ns | 0.99 |
array/reverse/2dL | 73882 ns | 74201 ns | 1.00 |
array/reverse/1dL_inplace | 66741 ns | 66737 ns | 1.00 |
array/copy | 21128 ns | 21034 ns | 1.00 |
array/iteration/findall/int | 157964.5 ns | 158193.5 ns | 1.00 |
array/iteration/findall/bool | 139673.5 ns | 140393.5 ns | 0.99 |
array/iteration/findfirst/int | 161714 ns | 161096 ns | 1.00 |
array/iteration/findfirst/bool | 162871 ns | 161802 ns | 1.01 |
array/iteration/scalar | 73219 ns | 72492 ns | 1.01 |
array/iteration/logical | 215922 ns | 214800 ns | 1.01 |
array/iteration/findmin/1d | 51057.5 ns | 50555 ns | 1.01 |
array/iteration/findmin/2d | 96635.5 ns | 96485 ns | 1.00 |
array/reductions/reduce/Int64/1d | 43609 ns | 43782 ns | 1.00 |
array/reductions/reduce/Int64/dims=1 | 44757.5 ns | 44868.5 ns | 1.00 |
array/reductions/reduce/Int64/dims=2 | 61881 ns | 61893.5 ns | 1.00 |
array/reductions/reduce/Int64/dims=1L | 89359 ns | 89229.5 ns | 1.00 |
array/reductions/reduce/Int64/dims=2L | 88482 ns | 88425 ns | 1.00 |
array/reductions/reduce/Float32/1d | 38068.5 ns | 37869 ns | 1.01 |
array/reductions/reduce/Float32/dims=1 | 52338 ns | 52159 ns | 1.00 |
array/reductions/reduce/Float32/dims=2 | 60427.5 ns | 60004 ns | 1.01 |
array/reductions/reduce/Float32/dims=1L | 52804 ns | 52689 ns | 1.00 |
array/reductions/reduce/Float32/dims=2L | 72590.5 ns | 72514 ns | 1.00 |
array/reductions/mapreduce/Int64/1d | 43887 ns | 44158 ns | 0.99 |
array/reductions/mapreduce/Int64/dims=1 | 45179.5 ns | 44917.5 ns | 1.01 |
array/reductions/mapreduce/Int64/dims=2 | 62247 ns | 61973 ns | 1.00 |
array/reductions/mapreduce/Int64/dims=1L | 89325 ns | 89251 ns | 1.00 |
array/reductions/mapreduce/Int64/dims=2L | 88620 ns | 88531 ns | 1.00 |
array/reductions/mapreduce/Float32/1d | 38044 ns | 37472.5 ns | 1.02 |
array/reductions/mapreduce/Float32/dims=1 | 51317 ns | 52480 ns | 0.98 |
array/reductions/mapreduce/Float32/dims=2 | 60731 ns | 60249 ns | 1.01 |
array/reductions/mapreduce/Float32/dims=1L | 53299 ns | 52856 ns | 1.01 |
array/reductions/mapreduce/Float32/dims=2L | 73266 ns | 72524 ns | 1.01 |
array/broadcast | 20568.5 ns | 20030 ns | 1.03 |
array/copyto!/gpu_to_gpu | 13272 ns | 11386 ns | 1.17 |
array/copyto!/cpu_to_gpu | 213642 ns | 216869 ns | 0.99 |
array/copyto!/gpu_to_cpu | 283807.5 ns | 286424.5 ns | 0.99 |
array/accumulate/Int64/1d | 124845 ns | 124830 ns | 1.00 |
array/accumulate/Int64/dims=1 | 83759 ns | 83529 ns | 1.00 |
array/accumulate/Int64/dims=2 | 158398 ns | 157818 ns | 1.00 |
array/accumulate/Int64/dims=1L | 1709802.5 ns | 1709864 ns | 1.00 |
array/accumulate/Int64/dims=2L | 966551 ns | 966626.5 ns | 1.00 |
array/accumulate/Float32/1d | 109419 ns | 109404 ns | 1.00 |
array/accumulate/Float32/dims=1 | 80920 ns | 80482 ns | 1.01 |
array/accumulate/Float32/dims=2 | 148065.5 ns | 147930.5 ns | 1.00 |
array/accumulate/Float32/dims=1L | 1618777 ns | 1618960.5 ns | 1.00 |
array/accumulate/Float32/dims=2L | 698803 ns | 698784 ns | 1.00 |
array/construct | 1258.7 ns | 1292.1 ns | 0.97 |
array/random/randn/Float32 | 48210 ns | 45521 ns | 1.06 |
array/random/randn!/Float32 | 24868 ns | 24996 ns | 0.99 |
array/random/rand!/Int64 | 27329 ns | 27279 ns | 1.00 |
array/random/rand!/Float32 | 8602.833333333332 ns | 8755 ns | 0.98 |
array/random/rand/Int64 | 38111 ns | 30194 ns | 1.26 |
array/random/rand/Float32 | 13226 ns | 13350 ns | 0.99 |
array/permutedims/4d | 60712.5 ns | 59834 ns | 1.01 |
array/permutedims/2d | 54395 ns | 53788.5 ns | 1.01 |
array/permutedims/3d | 55384.5 ns | 54729 ns | 1.01 |
array/sorting/1d | 2758863 ns | 2758146 ns | 1.00 |
array/sorting/by | 3345133 ns | 3344254 ns | 1.00 |
array/sorting/2d | 1082485 ns | 1080860 ns | 1.00 |
cuda/synchronization/stream/auto | 1031.3 ns | 1008.3 ns | 1.02 |
cuda/synchronization/stream/nonblocking | 8114.8 ns | 7614.8 ns | 1.07 |
cuda/synchronization/stream/blocking | 793.81 ns | 789.4040404040404 ns | 1.01 |
cuda/synchronization/context/auto | 1194.4 ns | 1182.6 ns | 1.01 |
cuda/synchronization/context/nonblocking | 7387.2 ns | 8392.2 ns | 0.88 |
cuda/synchronization/context/blocking | 904.5227272727273 ns | 890.468085106383 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
vchuravy
reviewed
Oct 1, 2025
We should probably make
CI failures unrelated.
Following the discussion in JuliaGPU/KernelAbstractions.jl#636, this PR changes the type of the integer device argument of the device! method for CUDABackend from Int32 to Int, so that it matches the signature required in https://github.com/JuliaGPU/KernelAbstractions.jl/blob/1ac546fc59cc611d749fa7a50e4a1efa3393851b/src/KernelAbstractions.jl#L622
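Why the argument type matters comes down to Julia's multiple dispatch: a method typed on Int32 does not extend a generic method declared with Int. The following is a minimal, self-contained sketch of that effect. Backend, ToyBackend, the generic fallback and the returned symbols are hypothetical stand-ins, not the actual KernelAbstractions or CUDA.jl definitions. The sketch assumes the interface declares its method with an Int argument, and that it runs on 64-bit Julia, where Int is Int64.

```julia
# Hypothetical stand-ins: these mimic the shape of a backend-agnostic
# device! interface; they are NOT the real KernelAbstractions/CUDA.jl code.
abstract type Backend end
struct ToyBackend <: Backend end

# Generic method, assumed to be declared with an `Int` device argument.
device!(::Backend, id::Int) = :fallback

# A backend method narrowed to Int32 is a separate method, so it is only
# selected when the caller explicitly passes an Int32.
device!(::ToyBackend, id::Int32) = :int32_method

# Integer literals are `Int` (Int64 on 64-bit Julia), so this call skips
# the Int32 method and dispatches to the generic fallback.
@show device!(ToyBackend(), 1)
# Only an explicit Int32 reaches the backend-specific method.
@show device!(ToyBackend(), Int32(1))

# Matching the interface's `Int` type makes the backend method the most
# specific match for ordinary integer arguments.
device!(::ToyBackend, id::Int) = :backend_method
@show device!(ToyBackend(), 1)
```

On a 32-bit Julia build Int is an alias for Int32, so the two signatures coincide there and the mismatch does not appear.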