xgqdut2016
Collaborator

No description provided.

@PanZezhong1725 marked this pull request as draft on April 22, 2025, 02:02
@@ -0,0 +1,210 @@

# `MatmulGptq`
Collaborator

QuantizeGPTQ

$$
z_{n, g} = \left\lfloor \frac{- \min_{k} \{w_{n, k}\}}{s_{n, g}} \right\rfloor
$$

For additional details, see https://zhuanlan.zhihu.com/p/692338716 ; for the source code, see https://github.com/IST-DASLab/gptq .
Collaborator

The quantization method still needs to be outlined here in the doc; don't just point to links.
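As a rough illustration of the kind of outline this asks for, here is a minimal sketch of the per-group scale and zero-point step only. It makes several assumptions that the diff excerpt does not confirm: the scale is taken as $s_{n,g} = (\max - \min) / (2^b - 1)$, the range is widened to include 0 so the zero point stays inside $[0, 2^b - 1]$, and values are rounded to nearest. All `demo_*` names are hypothetical and are not InfiniOP or GPTQ identifiers. It does not cover the Hessian-based weight update that GPTQ performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group parameters: one scale and one zero point. */
typedef struct {
    float scale; /* s_{n,g} */
    int zero;    /* z_{n,g} */
} demo_group_params_t;

/* Round half away from zero, written out so libm is not required. */
int demo_round(float x) {
    return x >= 0.0f ? (int)(x + 0.5f) : -(int)(-x + 0.5f);
}

/* Compute scale and zero point for one group of `len` weights in a row.
 * Assumption: s = (max - min) / (2^bits - 1), with the range widened to
 * contain 0. The zero point follows z = floor(-min / s). */
demo_group_params_t demo_group_params(const float *w, size_t len, int bits) {
    assert(w != NULL && len > 0 && bits > 0 && bits <= 8);
    float lo = 0.0f, hi = 0.0f; /* starting at 0 widens the range to include 0 */
    for (size_t k = 0; k < len; ++k) {
        if (w[k] < lo) lo = w[k];
        if (w[k] > hi) hi = w[k];
    }
    const float qmax = (float)((1 << bits) - 1);
    demo_group_params_t p;
    /* An all-zero group has hi == lo; fall back to scale 1 to avoid dividing by 0. */
    p.scale = (hi > lo) ? (hi - lo) / qmax : 1.0f;
    /* -lo / scale is non-negative, so truncation equals floor here. */
    p.zero = (int)(-lo / p.scale);
    return p;
}

/* Map a weight to its integer code, clamped to [0, 2^bits - 1]. */
uint8_t demo_quantize(float w, demo_group_params_t p, int bits) {
    const int qmax = (1 << bits) - 1;
    int q = demo_round(w / p.scale) + p.zero;
    if (q < 0) q = 0;
    if (q > qmax) q = qmax;
    return (uint8_t)q;
}

/* Recover an approximate weight from its integer code. */
float demo_dequantize(uint8_t q, demo_group_params_t p) {
    return p.scale * (float)((int)q - p.zero);
}
```

For a group such as `{-1.0f, 0.0f, 0.5f, 2.75f}` at 4 bits, calling `demo_group_params` and then `demo_quantize` on each element yields one code per weight, and `demo_dequantize` maps each code back to an approximation of the original value.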

### Compute

```c
infiniStatus_t infiniopMatmulGptq(
```
Collaborator

infiniopQuantizeLinearGPTQ

### Create the operator descriptor

```c
infiniStatus_t infiniopCreateMatmulGptqDescriptor(
```
Collaborator

infiniopCreateQuantizeLinearGPTQDescriptor

### Quantize

```c
infiniStatus_t infiniopMatmulQuant(
```
Collaborator

infiniopQuantizeGPTQ


```c
infiniStatus_t infiniopMatmulQuant(
    infiniopMatmulGptqDescriptor_t desc,
```
Collaborator

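This desc cannot be used here. The input to linear is dynamic, whereas quantizing the weights does not depend on any dynamic shape information. The interface logic needs to be redesigned. Alternatively, you could split these two functions into two separate operators.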
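One possible shape for that split, sketched with hypothetical `demo_*` types and functions (none of these are InfiniOP declarations, and the field layout is an assumption): the quantization descriptor carries only the static weight shape and quantization settings, while the linear descriptor adds the runtime input dimension and reuses the weight layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: everything weight quantization needs is static. */
typedef struct {
    size_t n, k;       /* weight shape [N, K] */
    size_t group_size; /* columns sharing one scale and zero point */
    int bits;          /* bit width of each quantized weight */
} demo_quantize_desc_t;

/* Hypothetical: the quantized linear also depends on the dynamic input. */
typedef struct {
    size_t m;                    /* rows of the input, known only at run time */
    demo_quantize_desc_t weight; /* static layout of the quantized weight */
} demo_quant_linear_desc_t;

/* Returns 0 on success, -1 on invalid arguments. */
int demo_create_quantize_desc(demo_quantize_desc_t *desc, size_t n, size_t k,
                              size_t group_size, int bits) {
    if (desc == NULL || n == 0 || k == 0 || group_size == 0 ||
        k % group_size != 0 || bits <= 0 || bits > 8) {
        return -1;
    }
    desc->n = n;
    desc->k = k;
    desc->group_size = group_size;
    desc->bits = bits;
    return 0;
}

/* Returns 0 on success, -1 on invalid arguments. */
int demo_create_quant_linear_desc(demo_quant_linear_desc_t *desc, size_t m,
                                  const demo_quantize_desc_t *weight) {
    if (desc == NULL || weight == NULL || m == 0) {
        return -1;
    }
    desc->m = m;
    desc->weight = *weight;
    return 0;
}

/* Number of (scale, zero point) pairs: one per row per group. */
size_t demo_num_groups(const demo_quantize_desc_t *desc) {
    assert(desc != NULL && desc->group_size > 0);
    return desc->n * (desc->k / desc->group_size);
}
```

With this layout the weights can be quantized once from a `demo_quantize_desc_t`, and a new `demo_quant_linear_desc_t` can be created for each input shape without touching the quantization step.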
