Benchmark AWQ and SmoothQuant within vLLM ecosystem #2815

@namgyu-youn

Description

Overview

SmoothQuant and AWQ operate on a similar intuition: scale up the weights while scaling down the activations. Their benchmarks can therefore share much of the same logic within the vLLM ecosystem, using vllm/lm-eval and vllm/benchmarks/benchmark_latency:
https://vllm-ascend.readthedocs.io/en/latest/developer_guide/evaluation/using_lm_eval.html, https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_latency.py.
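As a sketch of what the shared benchmark flow could look like (the model ID, task, and flag values below are illustrative assumptions, not settings taken from this issue; they require a GPU, vLLM, and lm-eval installed, and the checkpoint must match the `quantization` flag):

```shell
# Accuracy: run lm-eval with its vLLM backend against a quantized checkpoint.
# Model ID and task are placeholders -- any AWQ (or SmoothQuant) checkpoint
# can be substituted, which is why the two methods can share this harness.
lm_eval --model vllm \
    --model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct,quantization=awq \
    --tasks gsm8k \
    --batch_size auto

# Latency: reuse vLLM's benchmark_latency.py so AWQ and SmoothQuant runs
# differ only in the checkpoint and quantization flag passed in.
python benchmarks/benchmark_latency.py \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --quantization awq \
    --input-len 512 --output-len 128 --batch-size 8
```

Running both commands once per method yields an accuracy/latency pair from a single shared harness, which is the reuse this issue proposes.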

Ideally, the following user guides and docs can be updated to focus on vLLM:

  • AWQ: torchao/prototype/awq/example.py & torchao/prototype/awq/README.md
  • SmoothQuant: torchao/prototype/smoothquant/example.py & torchao/prototype/smoothquant/README.md

Related Issue/PR
