
k50112113 commented Dec 2, 2025

This PR fuses GEMM + split + cat in the prefill path of DS FP4.

The kernel is in an open AITER PR that is waiting to be merged: ROCm/aiter#1434
The kernel is also merged into https://github.com/ROCm/aiter/tree/shaoclee/ds_fp4_fusion_mla_fix_1202, which can be used to build future images.
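
For readers unfamiliar with the pattern, here is a minimal unfused reference of a GEMM followed by a split and a cat, written in plain Python. It is only a sketch under assumptions: it assumes the fused pattern is the MLA prefill KV up-projection (GEMM), a per-head split into a no-RoPE key part and a value part, and a concat of the key part with a RoPE key shared across heads. The function names, argument names, and shapes are hypothetical and are not taken from the AITER kernel, and FP4 quantization is left out entirely.

```python
# Unfused reference sketch (hypothetical names and shapes, not the AITER kernel API).

def matmul(a, b):
    """Plain GEMM: a is [m, k], b is [k, n]; returns [m, n] as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gemm_split_cat(x, w, k_pe, num_heads, nope_dim, v_dim):
    """x:    [tokens, latent_dim]
    w:    [latent_dim, num_heads * (nope_dim + v_dim)]
    k_pe: [tokens, rope_dim], assumed to be shared by every head
    Returns (k, v) with k[t][h] of length nope_dim + rope_dim
    and v[t][h] of length v_dim."""
    kv = matmul(x, w)  # step 1: GEMM
    per_head = nope_dim + v_dim
    k_out, v_out = [], []
    for t, row in enumerate(kv):
        k_tok, v_tok = [], []
        for h in range(num_heads):
            head = row[h * per_head:(h + 1) * per_head]
            k_nope, v = head[:nope_dim], head[nope_dim:]  # step 2: split
            k_tok.append(k_nope + k_pe[t])  # step 3: cat with the shared k_pe
            v_tok.append(v)
        k_out.append(k_tok)
        v_out.append(v_tok)
    return k_out, v_out
```

In this unfused form the intermediate `kv` result is materialized and then re-read by the split and cat steps; a fused kernel would presumably write `k` and `v` directly from the GEMM epilogue instead.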

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.968|±  |0.0112|
|     |       |strict-match    |     5|exact_match|↑  |0.964|±  |0.0118|

The same implementation for DS FP8 was done by Farel:
vllm: https://github.com/ROCm/vllm/tree/farlukas/355_wip_ds_fp8_fuse_kv_proj_cat
aiter: https://github.com/ROCm/aiter/tree/farlukas/fused_gemm_a8w8_blockscale_split_cat
