Overview
SmoothQuant and AWQ rest on a similar intuition: scaling weights up and scaling activations down. Their benchmarks can therefore share much of the same logic within the vLLM ecosystem, using vllm/lm-eval for accuracy and vllm/benchmarks/benchmark_latency for latency:
https://vllm-ascend.readthedocs.io/en/latest/developer_guide/evaluation/using_lm_eval.html, https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_latency.py.
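A minimal pure-Python sketch of that shared intuition, assuming a single linear layer Y = X·Wᵀ and one scale s_j per input channel. Dividing activations by s and multiplying weights by s leaves the output unchanged while moving magnitude from activation outliers into the weights. The scale rule below is SmoothQuant's migration-strength formula with `alpha`; AWQ picks its scales differently (by searching over activation-magnitude-based candidates), and all helper names here are hypothetical, not torchao's API.

```python
# Hypothetical helpers illustrating per-channel scale migration (not torchao's API).

def matmul_xwT(x, w):
    """Compute x @ w^T for x: [tokens][in_features], w: [out_features][in_features]."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in w] for row in x]

def smoothing_scales(x, w, alpha=0.5, eps=1e-5):
    """SmoothQuant-style scales: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    n_in = len(w[0])
    # Per-input-channel absolute maxima, clamped to avoid division by zero.
    act_max = [max(max(abs(row[j]) for row in x), eps) for j in range(n_in)]
    w_max = [max(max(abs(w_row[j]) for w_row in w), eps) for j in range(n_in)]
    return [a ** alpha / wm ** (1 - alpha) for a, wm in zip(act_max, w_max)]

def apply_scales(x, w, s):
    """Scale activations down (divide by s_j) and weights up (multiply by s_j)."""
    x_s = [[xi / sj for xi, sj in zip(row, s)] for row in x]
    w_s = [[wi * sj for wi, sj in zip(w_row, s)] for w_row in w]
    return x_s, w_s

if __name__ == "__main__":
    # Channel 1 of the activations is an outlier channel.
    x = [[1.0, 20.0, -0.5], [0.5, -30.0, 1.0]]
    w = [[0.2, 0.1, -0.3], [0.4, -0.05, 0.1]]
    x_s, w_s = apply_scales(x, w, smoothing_scales(x, w))
    print(matmul_xwT(x, w))      # original output
    print(matmul_xwT(x_s, w_s))  # same output, up to float rounding
    print(max(abs(row[1]) for row in x_s))  # outlier channel is now much smaller
```

With `alpha=0.5`, each channel's maximum absolute activation and maximum absolute weight become equal (both √(max|X_j|·max|W_j|)), which is why the outlier channel becomes easier to quantize without changing what the layer computes.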
Ideally, the following user guides and docs should be updated to focus on vLLM:
- AWQ: torchao/prototype/awq/example.py and torchao/prototype/awq/README.md
- SmoothQuant: torchao/prototype/smoothquant/example.py and torchao/prototype/smoothquant/README.md
Related Issue/PR
gau-nernst