Benchmark HF optimum-executorch #11450
base: main
Conversation
@huydhn Okay, it turns out that I need to run install with
Another issue. @huydhn Why isn't the artifact uploaded, even though it's zipped and moved to the dest dir successfully?
"hf_xnnpack_custom_spda_kv_cache_8da4w",
"et_xnnpack_custom_spda_kv_cache_8da4w",
What's the difference between these two?
-X \
--xnnpack-extended-ops \
-qmode 8da4w -G 32 -E 8,0 \
--metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
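For context, the `--metadata` flag above passes a JSON blob of special-token IDs (BOS and EOS) to the exported model. A minimal sketch of parsing that blob into usable values; `parse_llm_metadata` is a hypothetical helper for illustration, not code from the llama runner:

```python
import json

def parse_llm_metadata(raw: str) -> dict:
    """Parse the --metadata JSON blob into bos/eos token IDs.

    Hypothetical helper: the real runner reads these keys from the
    exported program's metadata, not from a raw string like this.
    """
    meta = json.loads(raw)
    return {
        "bos_id": meta["get_bos_id"],
        # Multiple EOS ids are allowed (e.g. end-of-turn and end-of-text).
        "eos_ids": set(meta["get_eos_ids"]),
    }

meta = parse_llm_metadata('{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}')
print(meta["bos_id"])  # 128000
```

Generation would stop as soon as a sampled token is in `meta["eos_ids"]`.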
Are these for llama_3_2?
Benchmark LLMs from optimum-executorch. With all the work recently happening in optimum-executorch, we are able to boost the out-of-the-box performance. Putting these models on the benchmark infra lets us gather perf numbers and understand the remaining perf gaps against the in-house models generated via export_llama. We are able to do an apples-to-apples comparison for the CPU backend by introducing quantization, custom SPDA, and a custom KV cache to native Hugging Face models in optimum-executorch: hf_xnnpack_custom_spda_kv_cache_8da4w represents the recipe used by optimum-et; et_xnnpack_custom_spda_kv_cache_8da4w is the counterpart for etLLM. Here are the benchmark jobs in our infra:

Note there may be failures when running optimum-et models on-device due to the lack of support for HF tokenizers in the llama runner. I will stop packing tokenizer.json into the .zip shortly so that the benchmark apps will treat optimum-et LLMs as non-GenAI models.