Commit d5fef22

[Docs] Improve the AISBench multi-modal testing docs (#4255)
### What this PR does / why we need it?

Add some of the pitfalls I ran into when using AISBench to test multi-modal models.

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b

Signed-off-by: gcanlin <[email protected]>
1 parent d43022f

File tree: 1 file changed (+41, -0)

docs/source/developer_guide/evaluation/using_ais_bench.md
@@ -152,6 +152,9 @@ rm gsm8k.zip
 Update the file `benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py`.
 There are several arguments that you should update according to your environment.
 
+- `attr`: Identifier for the inference backend type, fixed as `service` (serving-based inference) or `local` (local model).
+- `type`: Used to select different backend API types.
+- `abbr`: Unique identifier for a local task, used to distinguish between multiple tasks.
 - `path`: Update to your model weight path.
 - `model`: Update to your model name in vLLM.
 - `host_ip` and `host_port`: Update to your vLLM server ip and port.
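For orientation, the fields above might fit together as follows. This is a hedged sketch, not verbatim AISBench source: the `models = [dict(...)]` layout and the import path are assumed from OpenCompass-style configs, and every value is a placeholder.

```python
# Illustrative sketch of vllm_api_general_chat.py; layout and import path are
# assumptions (OpenCompass-style config), and all values are placeholders.
from ais_bench.benchmark.models import VLLMCustomAPIChat  # assumed import path

models = [
    dict(
        attr="service",                  # "service" = serving-based inference; "local" = local model
        type=VLLMCustomAPIChat,          # selects the backend API type (class name assumed)
        abbr="vllm-api-general-chat",    # unique identifier for this local task
        path="/path/to/model/weights",   # your model weight path
        model="your-model-name",         # your model name in vLLM
        host_ip="127.0.0.1",             # your vLLM server IP
        host_port=8000,                  # your vLLM server port
    ),
]
```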
@@ -242,6 +245,8 @@ After each dataset execution, you can get the result from saved files such as `o
 
 #### Execute Performance Evaluation
 
+Text-only benchmarks:
+
 ```shell
 # run C-Eval dataset
 ais_bench --models vllm_api_general_chat --datasets ceval_gen_0_shot_cot_chat_prompt.py --summarizer default_perf --mode perf
@@ -262,6 +267,13 @@ ais_bench --models vllm_api_general_chat --datasets livecodebench_code_generate_
 ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_prompt.py --summarizer default_perf --mode perf
 ```
 
+Multi-modal benchmarks (text + images):
+
+```shell
+# run textvqa dataset
+ais_bench --models vllm_api_stream_chat --datasets textvqa_gen_base64 --summarizer default_perf --mode perf
+```
+
 After execution, you can get the result from saved files, there is an example as follows:
 
 ```
@@ -281,3 +293,32 @@ After execution, you can get the result from saved files, there is an example as
 |-- cevaldataset_plot.html # Final performance results (in html format)
 `-- cevaldataset_rps_distribution_plot_with_actual_rps.html # Final performance results (in html format)
 ```
+
+### 3. Troubleshooting
+
+#### Invalid Image Path Error
+
+If you download the TextVQA dataset following the AISBench documentation:
+
+```bash
+cd ais_bench/datasets
+git lfs install
+git clone https://huggingface.co/datasets/maoxx241/textvqa_subset
+mv textvqa_subset/ textvqa/
+mkdir textvqa/textvqa_json/
+mv textvqa/*.json textvqa/textvqa_json/
+mv textvqa/*.jsonl textvqa/textvqa_json/
+```
+
+you may encounter the following error:
+
+```bash
+AISBench - ERROR - /vllm-workspace/benchmark/ais_bench/benchmark/clients/base_client.py - raise_error - 35 - [AisBenchClientException] Request failed: HTTP status 400. Server response: {"error":{"message":"1 validation error for ChatCompletionContentPartImageParam\nimage_url\n Input should be a valid dictionary [type=dict_type, input_value='data/textvqa/train_images/b2ae0f96dfbea5d8.jpg', input_type=str]\n For further information visit https://errors.pydantic.dev/2.12/v/dict_type None","type":"BadRequestError","param":null,"code":400}}
+```
+
+As the error shows, the unresolved relative path reaches the server as a plain string where a valid `image_url` dictionary is expected. You need to manually replace the dataset image paths with absolute paths, changing the relative prefix `data/textvqa/train_images/` to the actual absolute directory where the images are stored (substitute your own directory for the `/path/to/benchmark/ais_bench/datasets/textvqa/train_images/` placeholder below):
+
+```bash
+cd ais_bench/datasets/textvqa/textvqa_json
+sed -i 's#data/textvqa/train_images/#/path/to/benchmark/ais_bench/datasets/textvqa/train_images/#g' textvqa_val.json
+```
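If you would rather not hard-code the absolute prefix, it can also be derived from the dataset layout itself. A sketch, assuming the directory structure created above, where `textvqa_json/` and `train_images/` are siblings under `textvqa/`:

```bash
cd ais_bench/datasets/textvqa/textvqa_json
# Resolve the sibling train_images/ directory to an absolute path,
# then substitute it for the relative prefix in the dataset JSON.
IMG_DIR="$(dirname "$(pwd)")/train_images"
sed -i "s#data/textvqa/train_images/#${IMG_DIR}/#g" textvqa_val.json
```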
