Use newer version of mma_atom and copy_atom in 00_bmg_gemm (#540)
Modify 00_bmg_gemm to include new mma and copy atoms
(#477).
00_bmg_gemm combines two parts: mma and epilogue. Both currently use the
old atoms, so adopting the new atoms requires updating both parts. As a
first step, we will:
> Keep CollectiveEpilogue unchanged for now
> Only modify CollectiveMma first
Old Atom:
Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance: [96.448]TFlop/s (1.7813)ms
New Atom:
Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance: [97.259]TFlop/s (1.7664)ms
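The throughput figures above can be sanity-checked from the problem size and the measured time. The sketch below is not taken from the example's source. It assumes the problem size reads as M x N x K x L (L being the batch count) and that each multiply-accumulate counts as 2 FLOPs. `ProblemSize`, `gemm_flop`, `gemm_tflops`, and `print_report` are hypothetical names used only for illustration.

```cpp
#include <cstdio>

// Assumption: "5120x4096x4096x1" is read as M x N x K x L (L = batch count).
struct ProblemSize {
    double m, n, k, l;
};

// Assumption: one multiply-accumulate counts as 2 FLOPs (multiply + add).
constexpr double gemm_flop(ProblemSize p) {
    return 2.0 * p.m * p.n * p.k * p.l;
}

// Converts a runtime in milliseconds into TFlop/s for the given problem.
constexpr double gemm_tflops(ProblemSize p, double ms) {
    return gemm_flop(p) / (ms * 1e-3) / 1e12;
}

// Hypothetical helper that prints a line in roughly the format used above.
inline void print_report(const char* label, ProblemSize p, double ms) {
    std::printf("%s: %.3f TFlop/s (%.4f ms)\n", label, gemm_tflops(p, ms), ms);
}
```

Under these assumptions, `print_report("Old Atom", {5120.0, 4096.0, 4096.0, 1.0}, 1.7813)` gives about 96.45 TFlop/s and the same call with 1.7664 ms gives about 97.26 TFlop/s. Both agree with the reported numbers to within the rounding of the printed times, and the gain from the new atoms is under 1%.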
Also depends on the new copy_c/copy_d APIs for load/store
#572
---------
Co-authored-by: Anamika Chatterjee <[email protected]>