simone-silvestri
Contributor

Following the discussion in JuliaGPU/KernelAbstractions.jl#636,
this PR aligns the type parameter of the device! function for CUDABackend with the one required in
https://github.com/JuliaGPU/KernelAbstractions.jl/blob/1ac546fc59cc611d749fa7a50e4a1efa3393851b/src/KernelAbstractions.jl#L622
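
For readers unfamiliar with why matching the signature matters, here is a minimal sketch using hypothetical stand-in types (`ToyBackend`, `ToyGPU`, `SELECTED`); none of these names come from KernelAbstractions.jl or CUDA.jl, and the fallback's shape is an assumption rather than a copy of the linked line. It shows a backend method that uses the same `id::Int` annotation as a generic fallback, so dispatch picks the backend method for every call the fallback would accept.

```julia
# Hypothetical stand-ins for illustration only; these are not the real
# KernelAbstractions.jl or CUDA.jl definitions.
abstract type ToyBackend end
struct ToyGPU <: ToyBackend end

ndevices(::ToyBackend) = 1
ndevices(::ToyGPU) = 2

# Generic fallback: only validates the id (assumed shape, 1-based ids).
function device!(backend::ToyBackend, id::Int)
    0 < id <= ndevices(backend) || throw(ArgumentError("Device id $id out of bounds."))
    return nothing
end

# Records which device the backend method selected.
const SELECTED = Ref(0)

# Backend method with the same `id::Int` annotation as the fallback, so it
# is the more specific match whenever the backend is a ToyGPU.
function device!(backend::ToyGPU, id::Int)
    0 < id <= ndevices(backend) || throw(ArgumentError("Device id $id out of bounds."))
    SELECTED[] = id
    return nothing
end

device!(ToyGPU(), 2)
```

If the backend method were instead annotated with a different integer type (say `id::Int32`), a call with a plain `Int` would fall through to the fallback and never reach the backend. Whether that was the exact mismatch addressed here is not stated in this thread.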

@vchuravy vchuravy requested a review from michel2323 October 1, 2025 12:41
Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

Benchmark suite Current: 14a933e Previous: f99784f Ratio
latency/precompile 57559824690 ns 57505943660.5 ns 1.00
latency/ttfp 8221184268 ns 8175596902 ns 1.01
latency/import 4554132407 ns 4548330811 ns 1.00
integration/volumerhs 9627333.5 ns 9627247.5 ns 1.00
integration/byval/slices=1 147180 ns 147004 ns 1.00
integration/byval/slices=3 426079 ns 426015 ns 1.00
integration/byval/reference 145085 ns 144995 ns 1.00
integration/byval/slices=2 286565 ns 286391 ns 1.00
integration/cudadevrt 103657 ns 103484 ns 1.00
kernel/indexing 14124 ns 14144.5 ns 1.00
kernel/indexing_checked 14940 ns 14979 ns 1.00
kernel/occupancy 740.4360902255639 ns 696.1307189542483 ns 1.06
kernel/launch 2192.8888888888887 ns 2152.1111111111113 ns 1.02
kernel/rand 15803 ns 17139 ns 0.92
array/reverse/1d 19938 ns 20127 ns 0.99
array/reverse/2dL_inplace 66882 ns 66835 ns 1.00
array/reverse/1dL 70144 ns 70275 ns 1.00
array/reverse/2d 21951 ns 22347 ns 0.98
array/reverse/1d_inplace 11377.5 ns 9616 ns 1.18
array/reverse/2d_inplace 13269 ns 13357 ns 0.99
array/reverse/2dL 73882 ns 74201 ns 1.00
array/reverse/1dL_inplace 66741 ns 66737 ns 1.00
array/copy 21128 ns 21034 ns 1.00
array/iteration/findall/int 157964.5 ns 158193.5 ns 1.00
array/iteration/findall/bool 139673.5 ns 140393.5 ns 0.99
array/iteration/findfirst/int 161714 ns 161096 ns 1.00
array/iteration/findfirst/bool 162871 ns 161802 ns 1.01
array/iteration/scalar 73219 ns 72492 ns 1.01
array/iteration/logical 215922 ns 214800 ns 1.01
array/iteration/findmin/1d 51057.5 ns 50555 ns 1.01
array/iteration/findmin/2d 96635.5 ns 96485 ns 1.00
array/reductions/reduce/Int64/1d 43609 ns 43782 ns 1.00
array/reductions/reduce/Int64/dims=1 44757.5 ns 44868.5 ns 1.00
array/reductions/reduce/Int64/dims=2 61881 ns 61893.5 ns 1.00
array/reductions/reduce/Int64/dims=1L 89359 ns 89229.5 ns 1.00
array/reductions/reduce/Int64/dims=2L 88482 ns 88425 ns 1.00
array/reductions/reduce/Float32/1d 38068.5 ns 37869 ns 1.01
array/reductions/reduce/Float32/dims=1 52338 ns 52159 ns 1.00
array/reductions/reduce/Float32/dims=2 60427.5 ns 60004 ns 1.01
array/reductions/reduce/Float32/dims=1L 52804 ns 52689 ns 1.00
array/reductions/reduce/Float32/dims=2L 72590.5 ns 72514 ns 1.00
array/reductions/mapreduce/Int64/1d 43887 ns 44158 ns 0.99
array/reductions/mapreduce/Int64/dims=1 45179.5 ns 44917.5 ns 1.01
array/reductions/mapreduce/Int64/dims=2 62247 ns 61973 ns 1.00
array/reductions/mapreduce/Int64/dims=1L 89325 ns 89251 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 88620 ns 88531 ns 1.00
array/reductions/mapreduce/Float32/1d 38044 ns 37472.5 ns 1.02
array/reductions/mapreduce/Float32/dims=1 51317 ns 52480 ns 0.98
array/reductions/mapreduce/Float32/dims=2 60731 ns 60249 ns 1.01
array/reductions/mapreduce/Float32/dims=1L 53299 ns 52856 ns 1.01
array/reductions/mapreduce/Float32/dims=2L 73266 ns 72524 ns 1.01
array/broadcast 20568.5 ns 20030 ns 1.03
array/copyto!/gpu_to_gpu 13272 ns 11386 ns 1.17
array/copyto!/cpu_to_gpu 213642 ns 216869 ns 0.99
array/copyto!/gpu_to_cpu 283807.5 ns 286424.5 ns 0.99
array/accumulate/Int64/1d 124845 ns 124830 ns 1.00
array/accumulate/Int64/dims=1 83759 ns 83529 ns 1.00
array/accumulate/Int64/dims=2 158398 ns 157818 ns 1.00
array/accumulate/Int64/dims=1L 1709802.5 ns 1709864 ns 1.00
array/accumulate/Int64/dims=2L 966551 ns 966626.5 ns 1.00
array/accumulate/Float32/1d 109419 ns 109404 ns 1.00
array/accumulate/Float32/dims=1 80920 ns 80482 ns 1.01
array/accumulate/Float32/dims=2 148065.5 ns 147930.5 ns 1.00
array/accumulate/Float32/dims=1L 1618777 ns 1618960.5 ns 1.00
array/accumulate/Float32/dims=2L 698803 ns 698784 ns 1.00
array/construct 1258.7 ns 1292.1 ns 0.97
array/random/randn/Float32 48210 ns 45521 ns 1.06
array/random/randn!/Float32 24868 ns 24996 ns 0.99
array/random/rand!/Int64 27329 ns 27279 ns 1.00
array/random/rand!/Float32 8602.833333333332 ns 8755 ns 0.98
array/random/rand/Int64 38111 ns 30194 ns 1.26
array/random/rand/Float32 13226 ns 13350 ns 0.99
array/permutedims/4d 60712.5 ns 59834 ns 1.01
array/permutedims/2d 54395 ns 53788.5 ns 1.01
array/permutedims/3d 55384.5 ns 54729 ns 1.01
array/sorting/1d 2758863 ns 2758146 ns 1.00
array/sorting/by 3345133 ns 3344254 ns 1.00
array/sorting/2d 1082485 ns 1080860 ns 1.00
cuda/synchronization/stream/auto 1031.3 ns 1008.3 ns 1.02
cuda/synchronization/stream/nonblocking 8114.8 ns 7614.8 ns 1.07
cuda/synchronization/stream/blocking 793.81 ns 789.4040404040404 ns 1.01
cuda/synchronization/context/auto 1194.4 ns 1182.6 ns 1.01
cuda/synchronization/context/nonblocking 7387.2 ns 8392.2 ns 0.88
cuda/synchronization/context/blocking 904.5227272727273 ns 890.468085106383 ns 1.02

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt
Member

maleadt commented Oct 2, 2025

We should probably make ndevices return Int; it's unexpected for length(::Iterator) to return a non-Integer.
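
A rough sketch of that suggestion, assuming the device count comes back from the underlying API as a 32-bit integer (an assumption, not confirmed in this thread). `ToyDevices` and `toy_ndevices` are hypothetical names, not CUDA.jl API; the idea is simply to convert at the boundary so callers get a plain `Int`.

```julia
# Hypothetical iterator whose length is a 32-bit count, as a C API might return.
struct ToyDevices
    count::Int32
end
Base.length(d::ToyDevices) = d.count

# Convert once at the boundary so callers always see an Int.
toy_ndevices(d::ToyDevices) = Int(length(d))

n = toy_ndevices(ToyDevices(Int32(2)))
```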

@maleadt
Member

maleadt commented Oct 7, 2025

CI failures unrelated.

@maleadt maleadt merged commit 046ef37 into JuliaGPU:master Oct 7, 2025
2 of 3 checks passed
3 participants