Remove diagm in favour of GPUArrays #2979
Open
kshyatt wants to merge 2 commits into master from ksh/diagm
+0 −34
maleadt approved these changes Nov 24, 2025
Member, Author
Let's bump this on top of GPUArrays and I'll remove the
Contributor
CUDA.jl Benchmarks
| Benchmark suite | Current: adf22c4 | Previous: 0048e5a | Ratio |
|---|---|---|---|
| latency/precompile | 56646452642.5 ns | 57035764507 ns | 0.99 |
| latency/ttfp | 8197390132.5 ns | 8371031413.5 ns | 0.98 |
| latency/import | 4356410400 ns | 4496982792 ns | 0.97 |
| integration/volumerhs | 9607159 ns | 9624750 ns | 1.00 |
| integration/byval/slices=1 | 147011 ns | 147060 ns | 1.00 |
| integration/byval/slices=3 | 425973.5 ns | 426247.5 ns | 1.00 |
| integration/byval/reference | 145125 ns | 145079 ns | 1.00 |
| integration/byval/slices=2 | 286464 ns | 286395 ns | 1.00 |
| integration/cudadevrt | 103679 ns | 103630 ns | 1.00 |
| kernel/indexing | 14403 ns | 14142 ns | 1.02 |
| kernel/indexing_checked | 15133 ns | 15085 ns | 1.00 |
| kernel/occupancy | 670.2955974842768 ns | 683.7784810126582 ns | 0.98 |
| kernel/launch | 2217.777777777778 ns | 2192.3333333333335 ns | 1.01 |
| kernel/rand | 15392 ns | 15556 ns | 0.99 |
| array/reverse/1d | 20426 ns | 19955 ns | 1.02 |
| array/reverse/2dL_inplace | 66940 ns | 67142 ns | 1.00 |
| array/reverse/1dL | 70544 ns | 70143 ns | 1.01 |
| array/reverse/2d | 21812 ns | 21759 ns | 1.00 |
| array/reverse/1d_inplace | 11592 ns | 9799 ns | 1.18 |
| array/reverse/2d_inplace | 13472 ns | 11305 ns | 1.19 |
| array/reverse/2dL | 73704.5 ns | 73707 ns | 1.00 |
| array/reverse/1dL_inplace | 67011 ns | 66873 ns | 1.00 |
| array/copy | 20911 ns | 20703 ns | 1.01 |
| array/iteration/findall/int | 158488 ns | 157587.5 ns | 1.01 |
| array/iteration/findall/bool | 140466 ns | 140083 ns | 1.00 |
| array/iteration/findfirst/int | 161938.5 ns | 161900 ns | 1.00 |
| array/iteration/findfirst/bool | 162344.5 ns | 162539 ns | 1.00 |
| array/iteration/scalar | 73105 ns | 73928 ns | 0.99 |
| array/iteration/logical | 216945 ns | 216280.5 ns | 1.00 |
| array/iteration/findmin/1d | 50782.5 ns | 53048.5 ns | 0.96 |
| array/iteration/findmin/2d | 97085 ns | 96623.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43853 ns | 43822 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 44765 ns | 45024.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 61662 ns | 61470 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 89195 ns | 88941 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88335 ns | 87825.5 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 37556 ns | 37058 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1 | 51787 ns | 50073 ns | 1.03 |
| array/reductions/reduce/Float32/dims=2 | 59944 ns | 59949 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52540 ns | 52439 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72277 ns | 72064 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43791.5 ns | 43941 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 51253.5 ns | 44815 ns | 1.14 |
| array/reductions/mapreduce/Int64/dims=2 | 61633 ns | 61592 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89031.5 ns | 88958 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88191.5 ns | 88240 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 37365 ns | 37401 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 41903 ns | 43167 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2 | 60007.5 ns | 59977 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52904 ns | 52531 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 72377 ns | 72397 ns | 1.00 |
| array/broadcast | 20128 ns | 20122 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 13159 ns | 11368 ns | 1.16 |
| array/copyto!/cpu_to_gpu | 215657 ns | 215772 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283085.5 ns | 282327 ns | 1.00 |
| array/accumulate/Int64/1d | 125130 ns | 124672 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83335 ns | 83443 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158509.5 ns | 157725 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1720548 ns | 1710398.5 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 967847 ns | 966565 ns | 1.00 |
| array/accumulate/Float32/1d | 110322 ns | 108966.5 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 80302 ns | 80321 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147411 ns | 148101 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1627874.5 ns | 1619028 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 701566 ns | 698667 ns | 1.00 |
| array/construct | 1265.4 ns | 1281.2 ns | 0.99 |
| array/random/randn/Float32 | 45829.5 ns | 48183.5 ns | 0.95 |
| array/random/randn!/Float32 | 25511 ns | 24910 ns | 1.02 |
| array/random/rand!/Int64 | 27510 ns | 27271 ns | 1.01 |
| array/random/rand!/Float32 | 8750.333333333334 ns | 8866.333333333334 ns | 0.99 |
| array/random/rand/Int64 | 30122 ns | 37929.5 ns | 0.79 |
| array/random/rand/Float32 | 13078 ns | 13112 ns | 1.00 |
| array/permutedims/4d | 56328.5 ns | 55650 ns | 1.01 |
| array/permutedims/2d | 54109 ns | 54104.5 ns | 1.00 |
| array/permutedims/3d | 54980 ns | 54918 ns | 1.00 |
| array/sorting/1d | 2757574.5 ns | 2757756 ns | 1.00 |
| array/sorting/by | 3356022 ns | 3344340.5 ns | 1.00 |
| array/sorting/2d | 1088159 ns | 1081498 ns | 1.01 |
| cuda/synchronization/stream/auto | 1042 ns | 1053.2 ns | 0.99 |
| cuda/synchronization/stream/nonblocking | 7619.5 ns | 7607.6 ns | 1.00 |
| cuda/synchronization/stream/blocking | 818.5913978494624 ns | 872.9166666666666 ns | 0.94 |
| cuda/synchronization/context/auto | 1174.8 ns | 1195.3 ns | 0.98 |
| cuda/synchronization/context/nonblocking | 7280.299999999999 ns | 7954 ns | 0.92 |
| cuda/synchronization/context/blocking | 921.6279069767442 ns | 921.375 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
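
The Ratio column in the table above appears to be the Current timing divided by the Previous timing, rounded to two decimals, so values above 1.00 mean the current commit was slower. A minimal Python sketch that reproduces a few of the reported ratios and picks out the rows that moved the most; the 10% cutoff and the helper name `ratio` are assumptions made for illustration, not settings taken from the benchmark workflow:

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current / Previous, rounded to two decimals as in the table."""
    return round(current_ns / previous_ns, 2)

# (current, previous) timings in nanoseconds, copied from four table rows.
rows = {
    "integration/volumerhs": (9607159, 9624750),
    "array/reverse/1d_inplace": (11592, 9799),
    "array/reverse/2d_inplace": (13472, 11305),
    "array/random/rand/Int64": (30122, 37929.5),
}

# Hypothetical cutoff: treat anything more than 10% away from 1.00 as notable.
THRESHOLD = 0.10

ratios = {name: ratio(cur, prev) for name, (cur, prev) in rows.items()}
outliers = {name: r for name, r in ratios.items() if abs(r - 1.0) > THRESHOLD}

for name, r in sorted(outliers.items()):
    direction = "slower" if r > 1.0 else "faster"
    print(f"{name}: ratio {r:.2f} ({direction})")
```

Under that assumed cutoff, the rows worth a second look in the full table are the in-place `reverse` cases, `mapreduce/Int64/dims=1`, `copyto!/gpu_to_gpu`, and `rand/Int64`. All of them are microsecond-scale timings, where run-to-run noise can be a large fraction of the measurement.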
Member, Author
Failure looks related (to bumping the GPUArrays version)
Needs to wait for a new GPUArrays to be tagged (assuming tests pass)