
@yangjianfengo1 commented Sep 25, 2025

Description:
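This PR adds dynamic per-token quantization for activations and per-group quantization for weights in w4afp8. As an example, take the MoE w4afp8 GEMM with token=256, m=1792, k=8192: the activation shape is [256, 8192] and the weight shape is [1792, 8192] (the expert dimension is omitted for brevity).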

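  • Before: activations were quantized statically per tensor, so the activation scale had shape [1]; weights were quantized per channel, so the weight scale had shape [1792].
  • Now: activations are quantized dynamically per token, so the activation scale has shape [256]; weights can additionally be quantized per group within each channel (groups run along K). The group size must be a multiple of 128; with a group size of 128, the weight scale has shape [1792, 8192 / 128 = 64].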
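The shape bookkeeping above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the PR's kernel: it assumes symmetric quantization, uses 448 (the largest finite FP8 E4M3 value; E4M3 is an assumed format) as the activation range and ±7 as the int4 range, and skips rounding onto the FP8 grid because NumPy has no FP8 dtype. The function names `quantize_act_per_token` and `quantize_weight_per_group` are hypothetical.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 value (assumed activation format)
INT4_MAX = 7.0        # symmetric int4 range [-7, 7] (assumption)


def quantize_act_per_token(x: np.ndarray):
    """Dynamic per-token quantization: one scale per row, computed at runtime."""
    amax = np.abs(x).max(axis=1, keepdims=True)             # [tokens, 1]
    scale = np.maximum(amax, 1e-12) / FP8_E4M3_MAX          # guard against all-zero rows
    x_q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)   # FP8 grid rounding omitted
    return x_q.astype(np.float32), scale.squeeze(1)         # scale: [tokens]


def quantize_weight_per_group(w: np.ndarray, group_size: int = 128):
    """Per-group quantization: one scale per (output channel, group of K)."""
    m, k = w.shape
    assert group_size % 128 == 0, "group size must be a multiple of 128"
    assert k % group_size == 0, "K must be divisible by the group size"
    groups = w.reshape(m, k // group_size, group_size)      # [M, K/G, G]
    amax = np.abs(groups).max(axis=2, keepdims=True)        # [M, K/G, 1]
    scale = np.maximum(amax, 1e-12) / INT4_MAX
    w_q = np.clip(np.round(groups / scale), -INT4_MAX, INT4_MAX).astype(np.int8)
    return w_q.reshape(m, k), scale.squeeze(2)              # scale: [M, K/G]


# Shapes from the example above (expert dimension omitted).
rng = np.random.default_rng(0)
tokens, m, k = 256, 1792, 8192
x = rng.standard_normal((tokens, k), dtype=np.float32)
w = rng.standard_normal((m, k), dtype=np.float32)

x_q, act_scale = quantize_act_per_token(x)
w_q, w_scale = quantize_weight_per_group(w)
print(act_scale.shape)  # (256,)
print(w_scale.shape)    # (1792, 64)
```

Compared with the previous scheme, the only structural change is the reduction axis of `max`: the whole tensor (per tensor) becomes each row (per token) for activations, and each full row of K (per channel) becomes each 128-wide slice of K (per group) for weights.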

Usage

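  • To enable per-group weight quantization, the exported weight scale must have shape [num_experts, K/128, M].
  • To enable dynamic activation quantization, add "moe_dynamic_quant": true to the quantization_config field of the config that ships with the weights.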
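Both usage steps can be sketched in Python. This is a minimal sketch under assumptions: the config file is assumed to be a JSON file named `config.json`, only the `moe_dynamic_quant` key comes from this PR, and the helpers `enable_moe_dynamic_quant` and `to_export_scale_layout` are hypothetical. The layout helper further assumes that the exported [num_experts, K/128, M] scale is simply the per-expert [M, K/128] group-scale matrix transposed.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def enable_moe_dynamic_quant(config_path: Path) -> dict:
    """Set "moe_dynamic_quant": true under quantization_config; other keys are kept."""
    config = json.loads(config_path.read_text())
    config.setdefault("quantization_config", {})["moe_dynamic_quant"] = True
    config_path.write_text(json.dumps(config, indent=2))
    return config


def to_export_scale_layout(scales_mkg: np.ndarray) -> np.ndarray:
    """Hypothetical helper: [num_experts, M, K/128] -> [num_experts, K/128, M]."""
    assert scales_mkg.ndim == 3
    return np.ascontiguousarray(scales_mkg.transpose(0, 2, 1))


# Patch a throwaway config file (a real one would contain more keys).
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "config.json"
    path.write_text(json.dumps({"quantization_config": {}}))
    cfg = enable_moe_dynamic_quant(path)
    print(cfg["quantization_config"])  # {'moe_dynamic_quant': True}

# Per-group scales for 2 experts with M=1792, K=8192, group size 128.
num_experts, m, k = 2, 1792, 8192
scales = np.ones((num_experts, m, k // 128), dtype=np.float32)
print(to_export_scale_layout(scales).shape)  # (2, 64, 1792)
```

Transposing, rather than reshaping, keeps each scale attached to its (channel, group) pair; a plain `reshape` to [num_experts, K/128, M] would produce the right shape but scramble which scale belongs to which group.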

Performance changes:
[Figure: performance comparison]

@yangjianfengo1 changed the title from "w4afp8 支持per group" (w4afp8 supports per group) to "【New Feature】W4afp8 supports per group quantization" on Sep 26, 2025