
Add llm benchmark #881

Merged · 16 commits merged into main from add-llm-benchmark on Mar 19, 2025
Conversation

haixuanTao (Collaborator) commented Mar 17, 2025

This PR makes it possible to benchmark inference speed per inference engine.

In my case, I was able to detect that with GGUF the model runs roughly twice as fast as with the transformers-based config for qwen2.5 0.5B:

| path | date | average_duration (s) | max_duration (s) | min_duration (s) | median_duration (s) | median_frequency | average_speed (tokens/s) | max_speed (tokens/s) | min_speed (tokens/s) | median_speed (tokens/s) | total_tokens |
|---|---|---|---|---|---|---|---|---|---|---|---|
| dora-llama-cpp-python | 2025-03-17 15:45:25 | 0.03 | 0.09 | 0.03 | 0.03 | 37.76 | 222.73 | 233.59 | 69.38 | 226.54 | 6 |
| dora-transformers | 2025-03-17 16:20:33 | 0.07 | 0.40 | 0.05 | 0.06 | 16.15 | 96.37 | 111.81 | 15.14 | 96.90 | 6 |
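
For context, here is a minimal sketch of how per-engine throughput numbers like these can be collected. The `generate` callable, prompt list, and metric names are assumptions chosen to mirror the table columns above, not the actual benchmark node added in this PR:

```python
import statistics
import time


def benchmark(generate, prompts, runs=10):
    """Measure per-call duration and token throughput for a generate() callable.

    `generate(prompt)` is assumed to return the list of generated tokens;
    each engine under test (e.g. a llama-cpp or transformers backend) would
    supply its own implementation.
    """
    durations, speeds, total_tokens = [], [], 0
    for _ in range(runs):
        for prompt in prompts:
            start = time.perf_counter()
            tokens = generate(prompt)
            elapsed = time.perf_counter() - start
            durations.append(elapsed)
            speeds.append(len(tokens) / elapsed)
            total_tokens += len(tokens)
    return {
        "average_duration (s)": sum(durations) / len(durations),
        "max_duration (s)": max(durations),
        "min_duration (s)": min(durations),
        "median_duration (s)": statistics.median(durations),
        "average_speed (tokens/s)": sum(speeds) / len(speeds),
        "max_speed (tokens/s)": max(speeds),
        "min_speed (tokens/s)": min(speeds),
        "median_speed (tokens/s)": statistics.median(speeds),
        "total_tokens": total_tokens,
    }
```

Each engine (for example, the dora-llama-cpp-python and dora-transformers nodes) would be wrapped in its own `generate` callable, and the resulting dictionaries can be written out to produce a comparison like the table above.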

haixuanTao (Collaborator, Author) commented
@MunishMummadi FYI, there are some minor breaking changes within transformers to make it slightly easier to debug. I ran into a weird bug along the way with inconsistent responses.

I think we can re-add your optimizations later on, once we can test them consistently.


haixuanTao merged commit dfb5942 into main on Mar 19, 2025
125 checks passed
haixuanTao deleted the add-llm-benchmark branch on Mar 19, 2025 at 11:05
MunishMummadi (Contributor) commented

Noted. Happy to do so when you want me to.
