Benchmark AWQ and SmoothQuant within vLLM ecosystem #2815

### Overview
SmoothQuant and AWQ rest on a similar intuition: scale the weights up per channel and scale the matching activations down by the same factor, which leaves the layer output mathematically unchanged. Their benchmarks can therefore share most of their logic within the vLLM ecosystem, using `vllm/lm-eval` and `vllm/benchmarks/benchmark_latency`:
- https://vllm-ascend.readthedocs.io/en/latest/developer_guide/evaluation/using_lm_eval.html
- https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_latency.py
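
To make the shared intuition concrete, here is a minimal pure-Python sketch of the scaling identity both methods rely on: dividing activation channel `j` by `s[j]` and multiplying weight row `j` by `s[j]` leaves the matrix product unchanged. The helper names and the hand-picked scales are illustrative assumptions; the real methods derive the scales from calibration data and then quantize, which this sketch does not do.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_channel_scales(x, w, s):
    """Divide column j of x by s[j] and multiply row j of w by s[j]."""
    x_scaled = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_scaled = [[v * s[j] for v in row] for j, row in enumerate(w)]
    return x_scaled, w_scaled


# Activations: 2 tokens x 3 input channels; channel 1 carries an outlier.
x = [[1.0, 64.0, 2.0], [0.5, -32.0, 1.0]]
# Weights: 3 input channels x 2 output channels.
w = [[0.5, -1.0], [0.25, 0.125], [2.0, 1.0]]
# Hypothetical per-channel scales (powers of two keep the float math exact).
s = [1.0, 8.0, 1.0]

x_s, w_s = apply_channel_scales(x, w, s)
y_ref = matmul(x, w)
y_scaled = matmul(x_s, w_s)

# The activation range shrinks while the layer output stays the same.
max_abs_before = max(abs(v) for row in x for v in row)
max_abs_after = max(abs(v) for row in x_s for v in row)
print(y_ref == y_scaled, max_abs_before, max_abs_after)
```

Since both methods reduce to this per-channel rescaling before quantization, the accuracy and latency checks that follow it do not need to know which method produced the checkpoint.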

Ideally, the following examples and docs should be updated to focus on vLLM:
- AWQ: `torchao/prototype/awq/example.py` and `torchao/prototype/awq/README.md`
- SmoothQuant: `torchao/prototype/smoothquant/example.py` and `torchao/prototype/smoothquant/README.md`
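
As a rough illustration of how the two examples could share benchmark logic, the sketch below times any zero-argument callable with warmup and timed iterations, then runs the same loop for each method. It is a generic timing loop, not vLLM's `benchmark_latency.py`; `measure_latency`, `compare_configs`, and `fake_generate` are hypothetical names, and the stand-in workloads would be replaced by calls into real AWQ and SmoothQuant models.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    run_once: Callable[[], object],
    warmup_iters: int = 2,
    timed_iters: int = 5,
) -> Dict[str, float]:
    """Run warmup iterations untimed, then time each remaining iteration."""
    for _ in range(warmup_iters):
        run_once()
    samples = []
    for _ in range(timed_iters):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def compare_configs(configs: Dict[str, Callable[[], object]]) -> Dict[str, Dict[str, float]]:
    """Apply the same timing loop to every named configuration."""
    return {name: measure_latency(fn) for name, fn in configs.items()}


def fake_generate(n: int) -> Callable[[], int]:
    """Stand-in workload (assumption): replace with a real model call."""
    def run() -> int:
        return sum(i * i for i in range(n))
    return run


results = compare_configs({
    "awq": fake_generate(1000),
    "smoothquant": fake_generate(1000),
})
for name, stats in results.items():
    print(name, stats)
```

Keeping the timing loop independent of the quantization method means only the callable passed in differs between the AWQ and SmoothQuant examples.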

### Related Issue/PR
 - https://github.com/pytorch/ao/pull/2728#discussion_r2285972094 
 - https://github.com/mit-han-lab/llm-awq/issues/130
