cuTENSOR: Preserve storage type when multiplying #2775
Conversation
Multiplying tensors is pretty common, so it's likely that this is the case @OliverDudgeon ran into. I do wonder if we should add a buffer typevar to the …
CUDA.jl Benchmarks
| Benchmark suite | Current: ebab590 | Previous: 82c2074 | Ratio |
|---|---|---|---|
| latency/precompile | 42980450862.5 ns | 42798778214.5 ns | 1.00 |
| latency/ttfp | 7130974552 ns | 7189648330 ns | 0.99 |
| latency/import | 3422341012 ns | 3448929760 ns | 0.99 |
| integration/volumerhs | 9605727 ns | 9608526 ns | 1.00 |
| integration/byval/slices=1 | 146901 ns | 147048 ns | 1.00 |
| integration/byval/slices=3 | 425496 ns | 425659 ns | 1.00 |
| integration/byval/reference | 145145 ns | 145118 ns | 1.00 |
| integration/byval/slices=2 | 286258 ns | 286478.5 ns | 1.00 |
| integration/cudadevrt | 103406 ns | 103554 ns | 1.00 |
| kernel/indexing | 14335 ns | 14396 ns | 1.00 |
| kernel/indexing_checked | 15224 ns | 15267 ns | 1.00 |
| kernel/occupancy | 717.72 ns | 705.92 ns | 1.02 |
| kernel/launch | 2319.89 ns | 2478.33 ns | 0.94 |
| kernel/rand | 17485 ns | 14849 ns | 1.18 |
| array/reverse/1d | 19940 ns | 19642 ns | 1.02 |
| array/reverse/2d | 24054.5 ns | 25359 ns | 0.95 |
| array/reverse/1d_inplace | 10603 ns | 11514 ns | 0.92 |
| array/reverse/2d_inplace | 12165 ns | 12988 ns | 0.94 |
| array/copy | 21558 ns | 21283 ns | 1.01 |
| array/iteration/findall/int | 159770 ns | 158862.5 ns | 1.01 |
| array/iteration/findall/bool | 139706 ns | 139368 ns | 1.00 |
| array/iteration/findfirst/int | 164573 ns | 162842 ns | 1.01 |
| array/iteration/findfirst/bool | 165091.5 ns | 164699.5 ns | 1.00 |
| array/iteration/scalar | 74481.5 ns | 72904 ns | 1.02 |
| array/iteration/logical | 218609 ns | 218588 ns | 1.00 |
| array/iteration/findmin/1d | 48018 ns | 48297 ns | 0.99 |
| array/iteration/findmin/2d | 99218.5 ns | 98436 ns | 1.01 |
| array/reductions/reduce/1d | 36200 ns | 43805.5 ns | 0.83 |
| array/reductions/reduce/2d | 42507 ns | 52620 ns | 0.81 |
| array/reductions/mapreduce/1d | 34419 ns | 40094.5 ns | 0.86 |
| array/reductions/mapreduce/2d | 41763.5 ns | 51319 ns | 0.81 |
| array/broadcast | 21106 ns | 21139 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 12937 ns | 11014 ns | 1.17 |
| array/copyto!/cpu_to_gpu | 219200 ns | 216920 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 285143 ns | 286440.5 ns | 1.00 |
| array/accumulate/1d | 109706 ns | 110134 ns | 1.00 |
| array/accumulate/2d | 80932 ns | 81297 ns | 1.00 |
| array/construct | 1266.9 ns | 1331.7 ns | 0.95 |
| array/random/randn/Float32 | 48153.5 ns | 44531.5 ns | 1.08 |
| array/random/randn!/Float32 | 25132 ns | 25102 ns | 1.00 |
| array/random/rand!/Int64 | 27336 ns | 27335 ns | 1.00 |
| array/random/rand!/Float32 | 8755 ns | 8881.83 ns | 0.99 |
| array/random/rand/Int64 | 34510 ns | 30496 ns | 1.13 |
| array/random/rand/Float32 | 13273 ns | 13415 ns | 0.99 |
| array/permutedims/4d | 61469 ns | 61709 ns | 1.00 |
| array/permutedims/2d | 55555 ns | 55698 ns | 1.00 |
| array/permutedims/3d | 56369 ns | 56496 ns | 1.00 |
| array/sorting/1d | 2778468 ns | 2777538 ns | 1.00 |
| array/sorting/by | 3370277 ns | 3368839 ns | 1.00 |
| array/sorting/2d | 1086517.5 ns | 1086273.5 ns | 1.00 |
| cuda/synchronization/stream/auto | 1037.45 ns | 1004.57 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 8026.6 ns | 8095 ns | 0.99 |
| cuda/synchronization/stream/blocking | 845.33 ns | 844.87 ns | 1.00 |
| cuda/synchronization/context/auto | 1187.1 ns | 1158 ns | 1.03 |
| cuda/synchronization/context/nonblocking | 8035 ns | 7052.9 ns | 1.14 |
| cuda/synchronization/context/blocking | 908.83 ns | 884.25 ns | 1.03 |
This comment was automatically generated by a workflow using github-action-benchmark.
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           master    #2775      +/-   ##
==========================================
+ Coverage   89.72%   89.77%   +0.05%
==========================================
  Files         153      153
  Lines       13228    13228
==========================================
+ Hits        11869    11876       +7
+ Misses       1359     1352       -7
```

View full report in Codecov by Sentry.
This is the only instance I could find in cuTENSOR where the responsibility for the output storage type falls on the library.
I would appreciate a second look, however.
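To illustrate the idea behind preserving the storage type, here is a minimal, self-contained Julia sketch. It is not the actual cuTENSOR code: the `MyArray` type, its `B` buffer typevar, and `mul_preserving` are hypothetical stand-ins that mimic the common CUDA.jl pattern of carrying the buffer type as an array type parameter. The point is that allocating the output with `similar` on an input propagates that parameter, whereas hard-coding a default output type would silently discard it.

```julia
# Hypothetical stand-ins for buffer types (e.g. device vs. unified memory).
struct DeviceBuffer end
struct UnifiedBuffer end

# A toy array type carrying a buffer typevar B, like CuArray{T,N,B}.
struct MyArray{T,N,B} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::MyArray) = size(A.data)
Base.getindex(A::MyArray, i...) = A.data[i...]
Base.setindex!(A::MyArray, v, i...) = (A.data[i...] = v)

# `similar` keeps the buffer typevar B from the input array.
Base.similar(A::MyArray{T,N,B}, ::Type{S}, dims::Dims) where {T,N,B,S} =
    MyArray{S,length(dims),B}(Array{S}(undef, dims))

# Allocating the output via `similar(A, ...)` means multiplying two
# unified-memory arrays yields a unified-memory result, rather than
# falling back to a hard-coded default storage type.
function mul_preserving(A::MyArray{<:Any,2}, B::MyArray{<:Any,2})
    C = similar(A, promote_type(eltype(A), eltype(B)), (size(A, 1), size(B, 2)))
    C.data .= A.data * B.data
    return C
end

A = MyArray{Float32,2,UnifiedBuffer}(rand(Float32, 4, 4))
B = MyArray{Float32,2,UnifiedBuffer}(rand(Float32, 4, 4))
C = mul_preserving(A, B)
@assert C isa MyArray{Float32,2,UnifiedBuffer}  # buffer type preserved
```

The same design question applies to the buffer-typevar suggestion above: once the output is derived from an input with `similar`, the library no longer has to decide the storage type itself.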