Adding SME1 Convolution Kernel to convolve_kleidiai.cpp #26402

JonathanC-ARM · 2025-10-24T15:37:35Z

Description

Integrates the SME1 variant of the existing SME2 convolution kernel, kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme_mopa, and its associated packing functions.
Applies formatting changes in convolve_kleidiai.cpp.
Adds a proper SME2 gate for dynamic QGEMM.
Updates the KleidiAI version to 1.14, the first version that contains the required kernel.

Signed-off-by: Jonathan Clohessy <[email protected]>

hariharans29 · 2025-10-28T17:10:56Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2025-10-28T17:11:15Z

Azure Pipelines successfully started running 4 pipeline(s).

hariharans29 · 2025-10-28T20:08:22Z

cmake/deps.txt

 cudnn_frontend;https://github.com/NVIDIA/cudnn-frontend/archive/refs/tags/v1.12.0.zip;7e733cfdc410d777b76122d64232499205589a96
 dawn;https://github.com/google/dawn/archive/13c1635a14574ebb7116b56a69f5519301417fda.zip;0aadd28fc385cf7d657d5fc70a352372d2d3c76a
-kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.10.0.tar.gz;11b62149cb2514b3b9069cc435c3aa7a4e82b97a
+kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.14.0.tar.gz;161cce94808f1141b08e32096ccb1f294aa901c5


Looks like this can be bumped to 1.15 now, given that #26301 brings in that update anyway?

hariharans29 · 2025-10-28T20:10:21Z

onnxruntime/core/mlas/lib/qgemm.cpp

-    ArmKleidiAI::MlasDynamicQGemmBatch(Shape, DataParams, BatchN, ThreadPool);
+    //No fallback and putting in guards. This implementation is SME2 specific.
+    if(ArmKleidiAI::UseSME2){
+        ArmKleidiAI::MlasDynamicQGemmBatch(Shape, DataParams, BatchN, ThreadPool);


I guess this change is no longer needed, since #26301 now supports the SME variants?
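
For readers following the thread, the guard being discussed follows a common capability-gate pattern: the SME2-only path is entered only when the feature flag is set, and nothing else runs otherwise. Below is a minimal sketch of that pattern. `CpuFeatures`, `QGemmPath`, and `DispatchDynamicQGemm` are hypothetical names invented for illustration; they are not part of MLAS or KleidiAI.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime feature probe; not the MLAS API.
struct CpuFeatures {
    bool sme2;
};

// Hypothetical result type so the chosen path can be inspected.
enum class QGemmPath { KleidiAiSme2, NotDispatched };

// Enter the SME2-specific path only when the CPU reports SME2.
// There is deliberately no fallback, mirroring the comment in the diff.
inline QGemmPath DispatchDynamicQGemm(const CpuFeatures& cpu) {
    if (cpu.sme2) {
        return QGemmPath::KleidiAiSme2;
    }
    return QGemmPath::NotDispatched;
}
```

Whether this gate remains necessary depends on how #26301 handles non-SME2 hardware, which is the question raised above.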

hariharans29 · 2025-10-28T20:15:27Z

onnxruntime/core/mlas/lib/kleidiai/convolve_kleidiai.cpp

-                -std::numeric_limits<float>::max(), std::numeric_limits<float>::max()
-            );
+            if (ArmKleidiAI::UseSME2) {
+                KLEIDIAI_KERNEL_LOG("kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa" << " M=" << TileSizeM << " N=" << TileSizeN << " k_chunk_count=" << (d_kh * d_kw) << " k_chunk_length=" << ci);


I guess the use of the logging macros here means we need to wait for the logging PR to be merged?
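
To illustrate the dependency in question, here is a rough sketch of a stream-style logging macro that compiles away when disabled, combined with a choice between the SME2 and SME1 kernel names. `SKETCH_KERNEL_LOG`, `SKETCH_ENABLE_KERNEL_LOG`, `SelectConvKernelName`, and `DescribeKernelCall` are hypothetical and only mimic the shape of the call in the diff; the real `KLEIDIAI_KERNEL_LOG` is defined in the separate logging PR and may differ. That the SME1 kernel sits in the `else` branch is an assumption based on the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical switch; the real macro's enabling mechanism is not shown in this PR.
#ifndef SKETCH_ENABLE_KERNEL_LOG
#define SKETCH_ENABLE_KERNEL_LOG 1
#endif

#if SKETCH_ENABLE_KERNEL_LOG
// Builds the message with stream syntax and stores it in `sink`.
#define SKETCH_KERNEL_LOG(sink, expr)      \
    do {                                   \
        std::ostringstream sketch_oss_;    \
        sketch_oss_ << expr;               \
        (sink) = sketch_oss_.str();        \
    } while (0)
#else
// Disabled: the message expression is never evaluated.
#define SKETCH_KERNEL_LOG(sink, expr) \
    do {                              \
        (void)(sink);                 \
    } while (0)
#endif

// Kernel names are taken from the PR text; the selection logic is an assumption.
inline const char* SelectConvKernelName(bool use_sme2) {
    return use_sme2
               ? "kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme2_mopa"
               : "kai_run_imatmul_clamp_f32_f32p2vlx1_f32p2vlx1b_2vlx2vl_sme_mopa";
}

// Formats a log line in the same field order as the call in the diff.
inline std::string DescribeKernelCall(bool use_sme2, std::size_t m, std::size_t n,
                                      std::size_t kh, std::size_t kw, std::size_t ci) {
    std::string line;
    SKETCH_KERNEL_LOG(line, SelectConvKernelName(use_sme2)
                                << " M=" << m << " N=" << n
                                << " k_chunk_count=" << (kh * kw)
                                << " k_chunk_length=" << ci);
    return line;
}
```

Because the call site only compiles if the macro is defined, a change that uses it cannot build until the header providing the macro has landed.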

Colm-in-Arm and others added 2 commits October 24, 2025 16:08

Adding SME1 Convolution Kernel.

b2506f2

Signed-off-by: Jonathan Clohessy <[email protected]>

Merge branch 'microsoft:main' into jclohess_sme1_convolution_integration

b49e63e

hariharans29 reviewed Oct 28, 2025
