Conversation

@tianrengao commented Oct 1, 2025

This PR adds backward pass (gradient computation) support for matrix multiplication and addmm operations in Helion, with comprehensive unit tests to ensure correctness against PyTorch baselines.

Key Changes

1. Backward Kernels

  • matmul_bwd: Computes gradients for matrix multiplication (grad_A = grad_C @ B.T, grad_B = A.T @ grad_C)
  • addmm_bwd: Computes gradients for the addmm operation with alpha/beta scaling support (the gradient math for both is sketched after this list)
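
For reference, the gradient math these kernels implement can be written in plain PyTorch (a minimal sketch for checking only, not the Helion kernels themselves; the *_reference names are made up here):

def matmul_bwd_reference(grad_c, a, b):
    # C = A @ B  =>  grad_A = grad_C @ B^T, grad_B = A^T @ grad_C
    return grad_c @ b.T, a.T @ grad_c

def addmm_bwd_reference(grad_out, a, b, alpha=1.0, beta=1.0):
    # out = beta * bias + alpha * (A @ B)
    grad_bias = beta * grad_out  # sum over broadcast dims if bias was broadcast
    grad_a = alpha * (grad_out @ b.T)
    grad_b = alpha * (a.T @ grad_out)
    return grad_bias, grad_a, grad_b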

2. PyTorch Autograd

  • MatMulFunction & AddMMFunction: torch.autograd.Function subclasses with proper *grad_outputs signatures
  • matmul_autograd & addmm_autograd: User-friendly API functions (see the wiring sketch after this list)
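
A rough sketch of the wiring (the actual code lives in examples/matmul.py and may differ; matmul and matmul_bwd below stand in for the Helion kernels):

import torch

class MatMulFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, mat1, mat2):
        ctx.save_for_backward(mat1, mat2)
        return matmul(mat1, mat2)  # Helion forward kernel

    @staticmethod
    def backward(ctx, *grad_outputs):
        (grad_c,) = grad_outputs
        mat1, mat2 = ctx.saved_tensors
        # Helion backward kernel returns (grad_A, grad_B)
        return matmul_bwd(grad_c, mat1, mat2)

def matmul_autograd(mat1, mat2):
    return MatMulFunction.apply(mat1, mat2)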

3. Unit Tests

  • test_matmul_bwd: Validates the matrix multiplication backward pass against the PyTorch baseline
  • test_addmm_bwd: Validates the addmm backward pass with gradient flow for all inputs (a test sketch follows this list)
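
In spirit, each test compares Helion gradients to the eager PyTorch baseline along these lines (a sketch; the real tests run through the repo's test harness and expected-output files):

a = torch.randn([64, 32], device='cuda', requires_grad=True)
b = torch.randn([32, 16], device='cuda', requires_grad=True)
matmul_autograd(a, b).sum().backward()

a_ref = a.detach().clone().requires_grad_(True)
b_ref = b.detach().clone().requires_grad_(True)
(a_ref @ b_ref).sum().backward()

torch.testing.assert_close(a.grad, a_ref.grad)
torch.testing.assert_close(b.grad, b_ref.grad)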

Usage Example

# Matrix multiplication with gradients
mat1 = torch.randn([128, 256], requires_grad=True, device='cuda')
mat2 = torch.randn([256, 128], requires_grad=True, device='cuda')
result = matmul_autograd(mat1, mat2)
result.sum().backward()  # Gradients available in mat1.grad, mat2.grad

# AddMM with scaling: result = 0.5 * bias + 2.0 * (mat1 @ mat2)
bias = torch.randn([128, 128], requires_grad=True, device='cuda')
result = addmm_autograd(bias, mat1, mat2, alpha=2.0, beta=0.5)
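
The result can be checked directly against the eager baseline:

expected = torch.addmm(bias, mat1, mat2, alpha=2.0, beta=0.5)
torch.testing.assert_close(result, expected)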

Files Modified

  • examples/matmul.py: Added backward kernels, autograd classes, and enhanced testing
  • test/test_examples.py: Added test_matmul_bwd and test_addmm_bwd unit tests
  • test/test_examples.expected: Updated expected outputs for new tests

meta-cla bot added the CLA Signed label Oct 1, 2025
@tianrengao changed the title from "add matmul bwd" to "Add matmul/addmm bwd and add test coverage" Oct 1, 2025
@tianrengao changed the title from "Add matmul/addmm bwd and add test coverage" to "Add matmul/addmm bwd examples and add test coverage" Oct 1, 2025
@tianrengao marked this pull request as ready for review October 2, 2025 00:22
@tianrengao requested a review from yf225 October 2, 2025 02:07
Review thread on the following hunk:

m, k = mat1.size()
k2, n = mat2.size()  # k2 should equal k
bias = torch.broadcast_to(bias, [m, n])  # expand bias to the full [m, n] output shape
return lambda: matmul(mat1, mat2, lambda acc, tile: acc + bias[tile[0], tile[1]])  # fuse the bias add into the matmul epilogue
Contributor:

Would you like to also add integration in benchmarks/run.py (similar to rms_norm-bwd), and test accuracy via tritonbench --metrics accuracy?

Also, I believe these two *_tritonbench functions should probably just call the *_autograd functions.
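
A sketch of that suggestion (assuming the addmm_autograd signature shown earlier; the actual wrapper signatures in examples/matmul.py may differ):

def addmm_tritonbench(bias, mat1, mat2):
    # Route through the autograd wrapper so backward is exercised under tritonbench too
    return lambda: addmm_autograd(bias, mat1, mat2)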

Author (tianrengao):

Tritonbench PR adding addmm-bwd and gemm-bwd landed: meta-pytorch/tritonbench#531

All tests passed except gemm with partition-k; partition-k appears to be broken in the forward pass.

Addmm fwd

         (M, N, K)    triton_addmm-accuracy    pt2_triton_matmul-accuracy    helion_addmm_tritonbench-accuracy
------------------  -----------------------  ----------------------------  -----------------------------------
(20120, 512, 1536)                        1                             1                                    1
(34579, 512, 1536)                        1                             1                                    1
(34839, 512, 1536)                        1                             1                                    1
           average                        1                             1                                    1

Addmm bwd

         (M, N, K)    triton_addmm-accuracy    pt2_triton_matmul-accuracy    helion_addmm_tritonbench-accuracy
------------------  -----------------------  ----------------------------  -----------------------------------
(20120, 512, 1536)                        1                             1                                    1
(34579, 512, 1536)                        1                             1                                    1
(34839, 512, 1536)                        1                             1                                    1
           average                        1                             1                                    1

gemm fwd

      (M, N, K)    triton_tutorial_matmul-accuracy    matmul_partition_k-accuracy    triton_ops_matmul-accuracy    aten_tunableop_matmul-accuracy    pt2_triton_matmul-accuracy    streamk_matmul-accuracy    pt2_cutlass_matmul-accuracy    helion_matmul_tritonbench-accuracy
---------------  ---------------------------------  -----------------------------  ----------------------------  --------------------------------  ----------------------------  -------------------------  -----------------------------  ------------------------------------
(256, 256, 256)                                  1                              0                             1                                 1                             1                          1                              1                                     1
(384, 384, 384)                                  1                              0                             1                                 1                             1                          1                              1                                     1
(512, 512, 512)                                  1                              1                             1                                 1                             1                          1                              1                                     1
        average                                  1                       0.333333                             1                                 1                             1                          1                              1                                     1

gemm bwd

      (M, N, K)    triton_tutorial_matmul-accuracy    matmul_partition_k-accuracy    triton_ops_matmul-accuracy    aten_tunableop_matmul-accuracy    pt2_triton_matmul-accuracy    streamk_matmul-accuracy    pt2_cutlass_matmul-accuracy    helion_matmul_tritonbench-accuracy
---------------  ---------------------------------  -----------------------------  ----------------------------  --------------------------------  ----------------------------  -------------------------  -----------------------------  ------------------------------------
(256, 256, 256)                                  1                              0                             1                                 1                             1                          1                              1                                     1
(384, 384, 384)                                  1                              0                             1                                 1                             1                          1                              1                                     1
(512, 512, 512)                                  1                              0                             1                                 1                             1                          1                              1                                     1
        average                                  1                              0                             1                                 1                             1                          1                              1                                     1

- Fixed matmul_tritonbench to use addmm_autograd for gemm-with-bias testing
- Updated the tritonbench operators for proper requires_grad handling
- gemm-bwd shows 100% accuracy for the Helion implementation
- Some gradient mismatches on larger shapes are still being investigated

Current test results:
- addmm-bwd: 100% accuracy
- gemm-bwd: 100% accuracy for Helion; some framework gradient issues remain
@tianrengao (Author) commented:

The test failure seems unrelated to this PR: it reports illegal memory access for 500+ tests, and I have seen similar failures in other PRs.

@tianrengao requested a review from yf225 October 9, 2025 22:56