Skip to content

Conversation

@yuanhang-dev
Copy link

@yuanhang-dev yuanhang-dev commented Oct 13, 2025

Rewrite MMA unit tests to use the cooperative GEMM interface (test_cooperative_gemm_col_major_layout) instead of the legacy standalone MMA_Test implementation.

Changes:

  • Removed legacy kernel implementation : The old gemm_device, gemm, and MMA_Test template functions with manual SYCL kernel launch logic were removed in favor of reusing the shared cooperative GEMM test infrastructure.

  • Reduced problem sizes: All tests now use [Shape<_128, _128, _16>{}] (128x128x16) instead of the previous (512x512x256) due to shared local memory limitations.

  • Added sub_group_size to cooperative GEMM launch policy : Added sc_exp::kernel_properties{sycl_ext::sub_group_size<16>} to the kernel launch policy in test_cooperative_gemm and test_cooperative_gemm_rmem_c. This is required because if the sub-group size is not explicitly specified, it may lead to GEMM computation errors on some hardware (e.g., PVC).

@yuanhang-dev yuanhang-dev force-pushed the yh/rewrite_mma branch 2 times, most recently from 6f44bc8 to 6e2a6ac Compare October 13, 2025 08:41
TEST(PVC_CuTe_Xe, MMA_XE_8x16x32_S32S8S8S32_TT) {
MMA_Test<XE_8x16x32_S32S8S8S32_TT, 64, 64, 8, 16, 32, int8_t, int8_t,
int32_t>(512, 512, 256);
run_mma_test<XE_8x16x32_S32S8S8S32_TT, Shape<_2, _2, _1>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention that the two of those are the same? If not, why not?
If you tried to keep them it's wrong because e.g. MNK would need to be 512,512,256 not 64,64,32.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same shape, such as <512 512 256>, cannot be used here because it will throw an OUT OF RESOURCE error during compilation. Since this unit test only verifies the correctness of atom MMA, the shape size has been reduced.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the error coming from?

In general please document in the PR message any non-obvious changes unrelated to the main goal of the PR. Helps both with code review and with later understanding why code was changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error occurs because the original GEMM kernel computation copies data directly from global memory to registers. However, cooperative kernels, similar to nv's approach, first copy data from global memory to shared local memory. The original shape size exceeds the capacity of shared local memory, which leads to this error.

And I have updated the document.

@Antonyvance Antonyvance added the Tests For Unit tests and Benchmark tests and general validation specific changes label Oct 17, 2025
@yuanhang-dev yuanhang-dev force-pushed the yh/rewrite_mma branch 2 times, most recently from 97b9cf6 to 2202405 Compare December 9, 2025 08:48
}

TEST(PVC_CuTe_Xe, MMA_DPAS_U8_1x16) {
MMA_Test<XE_DPAS_TT<1, int32_t, uint8_t>, 8, 64, 1, 16, 32, uint8_t, uint8_t,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should enable these UT as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@tdeng5 tdeng5 self-requested a review December 9, 2025 09:43
ASMemCopyOp, BSMemCopyOp, CSMemCopyLdOp, CSMemCopyStOp
>>
( sc_exp::launch_policy{sc::dim3(1), sc::dim3(ThreadBlockSize), sc_exp::local_mem_size{shared_memory_size}},
( sc_exp::launch_policy{sc::dim3(1), sc::dim3(ThreadBlockSize), sc_exp::local_mem_size{shared_memory_size}, sc_exp::kernel_properties{sycl_ext::sub_group_size<16>}},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments for this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@tdeng5 tdeng5 self-requested a review December 10, 2025 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Tests For Unit tests and Benchmark tests and general validation specific changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants