GPTQ operator documentation #23
infiniop/ops/matmul_gptq/README.md
Outdated
@@ -0,0 +1,210 @@
# `MatmulGptq`
QuantizeGPTQ
infiniop/ops/matmul_gptq/README.md
Outdated
$$
z_{n, g} = \left\lfloor \frac{- \min_{k} \{w_{n, k}\}}{s_{n, g}} \right\rfloor
$$

For further details, see https://zhuanlan.zhihu.com/p/692338716 ; for the source code, see https://github.com/IST-DASLab/gptq .
The quantization method itself should still be outlined here; don't just post links.
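As a rough illustration of what such an outline could cover, here is a minimal sketch of the per-group quantization grid, built around the zero-point formula quoted from the README. Everything else is an assumption: the helper names are hypothetical and not part of InfiniOP, the scale formula $s = (\max - \min) / (2^b - 1)$ is assumed because the excerpt does not show it, and the min/max are taken over one group of one weight row. It covers only the grid; the Hessian-based error compensation that GPTQ performs on top of it is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper types and functions, not part of InfiniOP. */
typedef struct {
    float scale;  /* s_{n,g} */
    int32_t zero; /* z_{n,g} */
} GroupQParams;

/* floor() for values that fit in int32_t, without needing libm. */
static int32_t floor_i32(float x) {
    int32_t i = (int32_t)x; /* truncates toward zero */
    return ((float)i > x) ? i - 1 : i;
}

/* Round half away from zero, without needing libm. */
static int32_t round_i32(float x) {
    return (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* Scale and zero point for one group of one weight row. */
static GroupQParams group_qparams(const float *w, size_t group_size, int bits) {
    assert(w != NULL && group_size > 0 && bits > 0 && bits <= 8);
    float lo = w[0], hi = w[0];
    for (size_t k = 1; k < group_size; ++k) {
        if (w[k] < lo) lo = w[k];
        if (w[k] > hi) hi = w[k];
    }
    const float qmax = (float)((1 << bits) - 1);
    GroupQParams p;
    /* ASSUMPTION: s = (max - min) / (2^b - 1); falls back to 1 for a flat group. */
    p.scale = (hi > lo) ? (hi - lo) / qmax : 1.0f;
    /* Zero point as in the README excerpt: z = floor(-min / s). */
    p.zero = floor_i32(-lo / p.scale);
    return p;
}

/* q = clamp(round(w / s) + z, 0, 2^b - 1) */
static uint8_t quantize_weight(float w, GroupQParams p, int bits) {
    const int32_t qmax = (1 << bits) - 1;
    int32_t q = round_i32(w / p.scale) + p.zero;
    if (q < 0) q = 0;
    if (q > qmax) q = qmax;
    return (uint8_t)q;
}

/* w' = s * (q - z) */
static float dequantize_weight(uint8_t q, GroupQParams p) {
    return p.scale * (float)((int32_t)q - p.zero);
}
```

Calling `group_qparams` once per (row, group) pair and then `quantize_weight` on each element of that group yields the integer weights together with the `scale` and `zero` tensors that a dequantizing matmul would consume.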
infiniop/ops/matmul_gptq/README.md
Outdated
### Compute

```c
infiniStatus_t infiniopMatmulGptq(
```
infiniopQuantizeLinearGPTQ
infiniop/ops/matmul_gptq/README.md
Outdated
### Create the operator descriptor

```c
infiniStatus_t infiniopCreateMatmulGptqDescriptor(
```
infiniopCreateQuantizeLinearGPTQDescriptor
infiniop/ops/matmul_gptq/README.md
Outdated
### Quantize

```c
infiniStatus_t infiniopMatmulQuant(
```
infiniopQuantizeGPTQ
infiniop/ops/matmul_gptq/README.md
Outdated
```c
infiniStatus_t infiniopMatmulQuant(
    infiniopMatmulGptqDescriptor_t desc,
```
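This descriptor cannot be used here. The input of the linear operation is dynamic, whereas weight quantization does not depend on any dynamic shape information. The interface logic needs to be redesigned; alternatively, you could split these two functions into two separate operators.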
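One way to read the suggested split is sketched below. This is only an illustration under stated assumptions: every type and function name is a hypothetical placeholder rather than the InfiniOP API, and the descriptors hold shape information only. The quantization descriptor is built from the static weight shape alone, while the linear descriptor additionally takes the dynamic row count `m` of the input.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical placeholders, not the InfiniOP API. */
typedef enum { SKETCH_OK = 0, SKETCH_BAD_PARAM = 1 } sketch_status_t;

/* Static information only: known as soon as the weight is loaded. */
typedef struct {
    size_t n;          /* output features (rows of W) */
    size_t k;          /* input features (columns of W) */
    size_t group_size; /* columns that share one scale / zero point */
} quantize_desc_t;

/* Adds the dynamic part: the number of input rows for this call. */
typedef struct {
    quantize_desc_t weight;
    size_t m;
} linear_desc_t;

/* Needs no knowledge of the activation shape. */
static sketch_status_t create_quantize_desc(quantize_desc_t *d, size_t n,
                                            size_t k, size_t group_size) {
    if (!d || n == 0 || k == 0 || group_size == 0 || k % group_size != 0)
        return SKETCH_BAD_PARAM;
    d->n = n;
    d->k = k;
    d->group_size = group_size;
    return SKETCH_OK;
}

/* Created per input shape; checks that the input width matches the weight. */
static sketch_status_t create_linear_desc(linear_desc_t *d,
                                          const quantize_desc_t *w,
                                          size_t m, size_t input_k) {
    if (!d || !w || m == 0 || input_k != w->k)
        return SKETCH_BAD_PARAM;
    d->weight = *w;
    d->m = m;
    return SKETCH_OK;
}

/* Number of (scale, zero) pairs the quantization step would produce. */
static size_t num_groups(const quantize_desc_t *d) {
    assert(d != NULL);
    return d->n * (d->k / d->group_size);
}
```

With this layout the weight can be quantized once, and a new linear descriptor can be created whenever the batch size changes without touching the quantization state.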