Member

kshyatt commented Nov 19, 2025

Needs to wait for a new GPUArrays release to be tagged (assuming tests pass)

kshyatt added the "cuda array" label (Stuff about CuArray.) Nov 19, 2025
Member Author

kshyatt commented Nov 24, 2025

Let's bump this on top of GPUArrays and I'll remove the [sources]

kshyatt enabled auto-merge (squash) November 24, 2025 09:12
Contributor

github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: adf22c4 | Previous: 0048e5a | Ratio |
|---|---|---|---|
| latency/precompile | 56646452642.5 ns | 57035764507 ns | 0.99 |
| latency/ttfp | 8197390132.5 ns | 8371031413.5 ns | 0.98 |
| latency/import | 4356410400 ns | 4496982792 ns | 0.97 |
| integration/volumerhs | 9607159 ns | 9624750 ns | 1.00 |
| integration/byval/slices=1 | 147011 ns | 147060 ns | 1.00 |
| integration/byval/slices=3 | 425973.5 ns | 426247.5 ns | 1.00 |
| integration/byval/reference | 145125 ns | 145079 ns | 1.00 |
| integration/byval/slices=2 | 286464 ns | 286395 ns | 1.00 |
| integration/cudadevrt | 103679 ns | 103630 ns | 1.00 |
| kernel/indexing | 14403 ns | 14142 ns | 1.02 |
| kernel/indexing_checked | 15133 ns | 15085 ns | 1.00 |
| kernel/occupancy | 670.2955974842768 ns | 683.7784810126582 ns | 0.98 |
| kernel/launch | 2217.777777777778 ns | 2192.3333333333335 ns | 1.01 |
| kernel/rand | 15392 ns | 15556 ns | 0.99 |
| array/reverse/1d | 20426 ns | 19955 ns | 1.02 |
| array/reverse/2dL_inplace | 66940 ns | 67142 ns | 1.00 |
| array/reverse/1dL | 70544 ns | 70143 ns | 1.01 |
| array/reverse/2d | 21812 ns | 21759 ns | 1.00 |
| array/reverse/1d_inplace | 11592 ns | 9799 ns | 1.18 |
| array/reverse/2d_inplace | 13472 ns | 11305 ns | 1.19 |
| array/reverse/2dL | 73704.5 ns | 73707 ns | 1.00 |
| array/reverse/1dL_inplace | 67011 ns | 66873 ns | 1.00 |
| array/copy | 20911 ns | 20703 ns | 1.01 |
| array/iteration/findall/int | 158488 ns | 157587.5 ns | 1.01 |
| array/iteration/findall/bool | 140466 ns | 140083 ns | 1.00 |
| array/iteration/findfirst/int | 161938.5 ns | 161900 ns | 1.00 |
| array/iteration/findfirst/bool | 162344.5 ns | 162539 ns | 1.00 |
| array/iteration/scalar | 73105 ns | 73928 ns | 0.99 |
| array/iteration/logical | 216945 ns | 216280.5 ns | 1.00 |
| array/iteration/findmin/1d | 50782.5 ns | 53048.5 ns | 0.96 |
| array/iteration/findmin/2d | 97085 ns | 96623.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43853 ns | 43822 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 44765 ns | 45024.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 61662 ns | 61470 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 89195 ns | 88941 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88335 ns | 87825.5 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 37556 ns | 37058 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 51787 ns | 50073 ns | 1.03 |
| array/reductions/reduce/Float32/dims=2 | 59944 ns | 59949 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52540 ns | 52439 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72277 ns | 72064 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43791.5 ns | 43941 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 51253.5 ns | 44815 ns | 1.14 |
| array/reductions/mapreduce/Int64/dims=2 | 61633 ns | 61592 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89031.5 ns | 88958 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88191.5 ns | 88240 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 37365 ns | 37401 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 41903 ns | 43167 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2 | 60007.5 ns | 59977 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52904 ns | 52531 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 72377 ns | 72397 ns | 1.00 |
| array/broadcast | 20128 ns | 20122 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 13159 ns | 11368 ns | 1.16 |
| array/copyto!/cpu_to_gpu | 215657 ns | 215772 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283085.5 ns | 282327 ns | 1.00 |
| array/accumulate/Int64/1d | 125130 ns | 124672 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83335 ns | 83443 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158509.5 ns | 157725 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1720548 ns | 1710398.5 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 967847 ns | 966565 ns | 1.00 |
| array/accumulate/Float32/1d | 110322 ns | 108966.5 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 80302 ns | 80321 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147411 ns | 148101 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1627874.5 ns | 1619028 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 701566 ns | 698667 ns | 1.00 |
| array/construct | 1265.4 ns | 1281.2 ns | 0.99 |
| array/random/randn/Float32 | 45829.5 ns | 48183.5 ns | 0.95 |
| array/random/randn!/Float32 | 25511 ns | 24910 ns | 1.02 |
| array/random/rand!/Int64 | 27510 ns | 27271 ns | 1.01 |
| array/random/rand!/Float32 | 8750.333333333334 ns | 8866.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 30122 ns | 37929.5 ns | 0.79 |
| array/random/rand/Float32 | 13078 ns | 13112 ns | 1.00 |
| array/permutedims/4d | 56328.5 ns | 55650 ns | 1.01 |
| array/permutedims/2d | 54109 ns | 54104.5 ns | 1.00 |
| array/permutedims/3d | 54980 ns | 54918 ns | 1.00 |
| array/sorting/1d | 2757574.5 ns | 2757756 ns | 1.00 |
| array/sorting/by | 3356022 ns | 3344340.5 ns | 1.00 |
| array/sorting/2d | 1088159 ns | 1081498 ns | 1.01 |
| cuda/synchronization/stream/auto | 1042 ns | 1053.2 ns | 0.99 |
| cuda/synchronization/stream/nonblocking | 7619.5 ns | 7607.6 ns | 1.00 |
| cuda/synchronization/stream/blocking | 818.5913978494624 ns | 872.9166666666666 ns | 0.94 |
| cuda/synchronization/context/auto | 1174.8 ns | 1195.3 ns | 0.98 |
| cuda/synchronization/context/nonblocking | 7280.299999999999 ns | 7954 ns | 0.92 |
| cuda/synchronization/context/blocking | 921.6279069767442 ns | 921.375 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
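
For readers scanning the benchmark table, the Ratio column is consistent with the current time divided by the previous time, rounded to two decimals, so values above 1.00 mean the current commit is slower. The following is a minimal Python sketch of that arithmetic on a few rows copied from the table. The `THRESHOLD` value and the helper names `ratio` and `slower_than` are assumptions made for illustration; they are not taken from the CUDA.jl benchmark workflow or from github-action-benchmark.

```python
# Rows copied from the benchmark table: (name, current ns, previous ns).
rows = [
    ("array/reverse/1d_inplace", 11592.0, 9799.0),
    ("array/reverse/2d_inplace", 13472.0, 11305.0),
    ("array/copyto!/gpu_to_gpu", 13159.0, 11368.0),
    ("array/reductions/mapreduce/Int64/dims=1", 51253.5, 44815.0),
    ("array/random/rand/Int64", 30122.0, 37929.5),
    ("kernel/launch", 2217.777777777778, 2192.3333333333335),
]

# Hypothetical cutoff for "worth a second look"; not the workflow's setting.
THRESHOLD = 1.10


def ratio(current: float, previous: float) -> float:
    """Current time over previous time, rounded like the Ratio column."""
    return round(current / previous, 2)


def slower_than(rows, threshold=THRESHOLD):
    """Return (name, ratio) for rows whose ratio exceeds the threshold."""
    return [
        (name, ratio(cur, prev))
        for name, cur, prev in rows
        if ratio(cur, prev) > threshold
    ]


flagged = slower_than(rows)
for name, r in flagged:
    print(f"{name}: {r:.2f}")
```

With this assumed cutoff, the in-place `reverse` cases, `copyto!/gpu_to_gpu`, and `mapreduce/Int64/dims=1` stand out, while `rand/Int64` moved in the faster direction. Differences of this size on microsecond-scale benchmarks can also be run-to-run noise, so the sketch only points at rows to re-check.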

Member Author

kshyatt commented Nov 24, 2025

Failure looks related (to bumping the GPUArrays version)
