-
Notifications
You must be signed in to change notification settings - Fork 68
Rewrite mma unit tests #557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6f44bc8 to
6e2a6ac
Compare
test/unit/cute/intel_xe/mma.cpp
Outdated
| TEST(PVC_CuTe_Xe, MMA_XE_8x16x32_S32S8S8S32_TT) { | ||
| MMA_Test<XE_8x16x32_S32S8S8S32_TT, 64, 64, 8, 16, 32, int8_t, int8_t, | ||
| int32_t>(512, 512, 256); | ||
| run_mma_test<XE_8x16x32_S32S8S8S32_TT, Shape<_2, _2, _1>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention that the two of those are the same? If not, why not?
If you tried to keep them it's wrong because e.g. MNK would need to be 512,512,256 not 64,64,32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same shape, such as <512 512 256>, cannot be used here because it will throw an OUT OF RESOURCE error during compilation. Since this unit test only verifies the correctness of atom MMA, the shape size has been reduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the error coming from?
In general please document in the PR message any non-obvious changes unrelated to the main goal of the PR. Helps both with code review and with later understanding why code was changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error occurs because the original GEMM kernel computation copies data directly from global memory to registers. However, cooperative kernels, similar to nv's approach, first copy data from global memory to shared local memory. The original shape size exceeds the capacity of shared local memory, which leads to this error.
And I have updated the document.
97b9cf6 to
2202405
Compare
2efd984 to
648d6a6
Compare
| } | ||
|
|
||
| TEST(PVC_CuTe_Xe, MMA_DPAS_U8_1x16) { | ||
| MMA_Test<XE_DPAS_TT<1, int32_t, uint8_t>, 8, 64, 1, 16, 32, uint8_t, uint8_t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should enable these UT as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
648d6a6 to
4eadb54
Compare
| ASMemCopyOp, BSMemCopyOp, CSMemCopyLdOp, CSMemCopyStOp | ||
| >> | ||
| ( sc_exp::launch_policy{sc::dim3(1), sc::dim3(ThreadBlockSize), sc_exp::local_mem_size{shared_memory_size}}, | ||
| ( sc_exp::launch_policy{sc::dim3(1), sc::dim3(ThreadBlockSize), sc_exp::local_mem_size{shared_memory_size}, sc_exp::kernel_properties{sycl_ext::sub_group_size<16>}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some comments for this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Rewrite MMA unit tests to use the cooperative GEMM interface (test_cooperative_gemm_col_major_layout) instead of the legacy standalone MMA_Test implementation.
Changes:
Removed legacy kernel implementation : The old gemm_device, gemm, and MMA_Test template functions with manual SYCL kernel launch logic were removed in favor of reusing the shared cooperative GEMM test infrastructure.
Reduced problem sizes: All tests now use [Shape<_128, _128, _16>{}] (128x128x16) instead of the previous (512x512x256) due to shared local memory limitations.
Added sub_group_size to cooperative GEMM launch policy : Added sc_exp::kernel_properties{sycl_ext::sub_group_size<16>} to the kernel launch policy in test_cooperative_gemm and test_cooperative_gemm_rmem_c. This is required because if the sub-group size is not explicitly specified, it may lead to GEMM computation errors on some hardware (e.g., PVC).