[WebGPU] Unify core implementations of GEMM and MatMul #24586

xiaofeihan1 · 2025-04-29T01:42:37Z

Description

This PR extracts the core implementations into gemm_utils.cc, which is used to generate shaders for both the GEMM and MatMul ops. The core implementations include scalar and vec4 versions of GEMM and MatMul.
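For readers less familiar with why the two ops can share a core: per the ONNX operator definitions, Gemm computes `Y = alpha * op(A) * op(B) + beta * C`, and a 2-D MatMul is the special case with no transposition, `alpha = 1`, and no `C` term. The sketch below is a CPU reference illustration of that relationship only. It is not the code in gemm_utils.cc, and `GemmCore` and `MatMul2D` are hypothetical names. It also assumes row-major buffers and a full M x N `C` (no broadcasting).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared core (illustration only, not the gemm_utils.cc API).
// Computes Y = alpha * op(A) * op(B) + beta * C on row-major buffers, where
// op(X) is X or its transpose. Logical shapes: op(A) is m x k, op(B) is k x n.
// Assumption: C, when present, is a full m x n matrix (no broadcasting).
std::vector<float> GemmCore(const std::vector<float>& a,
                            const std::vector<float>& b,
                            const std::vector<float>* c,
                            std::size_t m, std::size_t n, std::size_t k,
                            bool trans_a, bool trans_b,
                            float alpha, float beta) {
  assert(a.size() == m * k);
  assert(b.size() == k * n);
  assert(c == nullptr || c->size() == m * n);

  std::vector<float> y(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        // When transposed, A is stored as k x m and B as n x k.
        const float av = trans_a ? a[p * m + i] : a[i * k + p];
        const float bv = trans_b ? b[j * k + p] : b[p * n + j];
        acc += av * bv;
      }
      const float bias = (c != nullptr) ? beta * (*c)[i * n + j] : 0.0f;
      y[i * n + j] = alpha * acc + bias;
    }
  }
  return y;
}

// 2-D MatMul expressed through the same core: no transposes, alpha = 1, no C.
std::vector<float> MatMul2D(const std::vector<float>& a,
                            const std::vector<float>& b,
                            std::size_t m, std::size_t n, std::size_t k) {
  return GemmCore(a, b, nullptr, m, n, k, false, false, 1.0f, 0.0f);
}
```

With a single core like this, MatMul needs no separate inner loop, which is the kind of duplication the PR aims to remove at the shader-generation level.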

Motivation and Context

GEMM and MatMul share a lot of code, so we want to extract the common code and unify their implementations.

onnxruntime/core/providers/webgpu/math/gemm.cc

onnxruntime/test/providers/cpu/math/gemm_test.cc

onnxruntime/test/providers/cpu/math/matmul_test.cc

onnxruntime/core/providers/webgpu/math/gemm_packed.cc

onnxruntime/core/providers/webgpu/math/gemm_utils.cc

onnxruntime/core/providers/webgpu/math/gemm_packed.h

qjia7

LGTM.
@fs-eire @guschmue Please take a look, thanks.

guschmue · 2025-05-19T15:44:16Z

testing it with a bunch of models that use Gemm

guschmue · 2025-05-19T15:44:30Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines · 2025-05-19T15:44:51Z

Azure Pipelines successfully started running 5 pipeline(s).

fs-eire · 2025-05-27T19:13:24Z

The following test cases are failing on QNN EP:

2025-05-27T17:02:22.9260730Z 1: [  FAILED  ] GemmOpTest.GemmOptimizePacked
2025-05-27T17:02:22.9260956Z 1: [  FAILED  ] GemmOpTest.GemmOptimizePackedTransA
2025-05-27T17:02:22.9261190Z 1: [  FAILED  ] GemmOpTest.GemmOptimizePackedTransB
2025-05-27T17:02:22.9261428Z 1: [  FAILED  ] GemmOpTest.GemmOptimizePackedTransAB

onnxruntime/test/providers/cpu/math/gemm_test.cc

fs-eire · 2025-05-28T06:36:08Z

Some CI jobs didn't run. Trying to trigger them by closing and re-opening the PR.

guschmue · 2025-05-28T16:57:31Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines · 2025-05-28T16:57:51Z

Azure Pipelines successfully started running 5 pipeline(s).

guschmue · 2025-05-28T16:59:22Z

/azp run Test Linux CUDA x64 Release,Test Linux TensorRT x64 Release,web_Debug / build_onnxruntime_web,web_Release / build_onnxruntime_web

azure-pipelines · 2025-05-28T16:59:28Z

No pipelines are associated with this pull request.

guschmue · 2025-05-28T19:37:15Z

/azp run web_Release / build_onnxruntime_web

azure-pipelines · 2025-05-28T19:37:21Z

No pipelines are associated with this pull request.

guschmue · 2025-05-28T19:37:50Z

I fear you need to merge with main - some definitions for the CI have changed :(

…ize_vec1

xiaofeihan1 · 2025-05-29T01:57:37Z

It seems that the Lint pipeline is broken with the latest main.

fs-eire · 2025-05-29T03:40:26Z

There are some known issues and they are being fixed. Please wait for the fixes to be merged into main.

guschmue · 2025-05-29T17:14:12Z

/azp run web_Debug / build_onnxruntime_web

azure-pipelines · 2025-05-29T17:14:18Z

No pipelines are associated with this pull request.

…ize_vec1

fs-eire · 2025-05-30T04:56:27Z

Merged latest main branch, which contains the CI pipeline fixes.

fs-eire · 2025-05-30T05:10:37Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines · 2025-05-30T05:10:57Z

Azure Pipelines successfully started running 5 pipeline(s).

xiaofeihan1 added 3 commits April 29, 2025 09:40

add vec1

a3747fe

use 8,8,1 to implement

fffb4b9

unify matmul and gemm

0031b54

xiaofeihan1 changed the title ~~[WIP][WebGPU] Add vec1 implementation for GEMM~~ [WIP][WebGPU] Unify core implementations of GEMM and MatMul May 12, 2025

xiaofeihan1 added 5 commits May 12, 2025 22:25

fix build error

4ec92c4

add naive

33e8f11

naive cache hint

c10e51e

remove comments

6a1440a

fix pipeline

a245c48