
HF model tracker #899

Open
pdhirajkumarprasad opened this issue Jan 9, 2025 · 3 comments

pdhirajkumarprasad commented Jan 9, 2025

Total no. of models: 545

- PASS: 307 → 408
- Numeric: 12 → 37
- compilation
- compiled_inference
- setup and import

Detailed list


amd-vivekag commented Feb 13, 2025

Passing Summary

TOTAL TESTS = 142

| Stage | # Passing | % of Total | % of Attempted |
|---|---|---|---|
| Setup | 130 | 91.5% | 91.5% |
| IREE Compilation | 64 | 45.1% | 49.2% |
| Gold Inference | 43 | 30.3% | 67.2% |
| IREE Inference Invocation | 38 | 26.8% | 88.4% |
| Inference Comparison (PASS) | 36 | 25.4% | 94.7% |

Fail Summary

TOTAL TESTS = 142

| Stage | # Failed at Stage | % of Total |
|---|---|---|
| Setup | 12 | 8.5% |
| IREE Compilation | 66 | 46.5% |
| Gold Inference | 21 | 14.8% |
| IREE Inference Invocation | 5 | 3.5% |
| Inference Comparison | 2 | 1.4% |
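The two summaries above form a funnel: each stage is only attempted by the tests that passed the previous stage, which is why "% of Attempted" differs from "% of Total", and why each "Fail Summary" row equals attempted minus passed. A minimal sketch (stage names and counts taken from the overall summary above) that reproduces both columns:

```python
# Reproduce the "% of Total" / "% of Attempted" columns and the per-stage
# failure counts from the stage-by-stage pass counts reported above.
total = 142
stages = [
    ("Setup", 130),
    ("IREE Compilation", 64),
    ("Gold Inference", 43),
    ("IREE Inference Invocation", 38),
    ("Inference Comparison (PASS)", 36),
]

attempted = total  # every test attempts the first stage
for name, passed in stages:
    failed = attempted - passed          # matches the "Fail Summary" row
    pct_total = 100.0 * passed / total
    pct_attempted = 100.0 * passed / attempted
    print(f"{name}: {passed} passing "
          f"({pct_total:.1f}% of total, {pct_attempted:.1f}% of attempted), "
          f"{failed} failed at this stage")
    attempted = passed  # only passing tests attempt the next stage
```

For example, IREE Compilation is attempted by the 130 tests that passed Setup, so 64 passing gives 64/130 ≈ 49.2% of attempted but only 64/142 ≈ 45.1% of the total.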

Passing Summary for text-classification testcases:

TOTAL TESTS = 72

| Stage | # Passing | % of Total | % of Attempted |
|---|---|---|---|
| Setup | 72 | 100.0% | 100.0% |
| IREE Compilation | 62 | 86.1% | 86.1% |
| Gold Inference | 62 | 86.1% | 100.0% |
| IREE Inference Invocation | 61 | 84.7% | 98.4% |
| Inference Comparison (PASS) | 60 | 83.3% | 98.4% |

Fail Summary

TOTAL TESTS = 72

| Stage | # Failed at Stage | % of Total |
|---|---|---|
| Setup | 0 | 0.0% |
| IREE Compilation | 10 | 13.9% |
| Gold Inference | 0 | 0.0% |
| IREE Inference Invocation | 1 | 1.4% |
| Inference Comparison | 1 | 1.4% |

Failure summary:

| # | Stage |
|---|---|
| 61 | compilation |
| 6 | compiled_inference |
| 5 | construct_inputs |
| 15 | import_model |
| 16 | native_inference |
| 12 | setup |

GIST containing all the failures: https://gist.github.com/amd-vivekag/377a7b141b40c118f880b2ced176f95c

Setup failures categories:
Total Failures: 12

| # | Device | Issue type | Issue message | Issue no. | # Models impacted | List of models | Assignee | Status |
|---|---|---|---|---|---|---|---|---|
| 1 | CPU | setup | ImportError: Loading an AWQ quantized model requires auto-awq library (`pip install autoawq`) | #918 | 2 | hf_Midnight-Miqu-70B-v1.5-4bit, hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | | |
| 2 | CPU | setup | requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url | #919 | 3 | hf_Multiple_Choice, hf_multiple_choice_model, hf_Multiple_Choice_EN | | |
| 3 | CPU | setup | IndexError: index out of range in self | #920 | 1 | hf_ruRoPEBert-e5-base-2k | | |
| 4 | CPU | setup | Unknown task: fill-mask | #921 | 2 | hf_multi-qa-mpnet-base-cos-v1, hf_all-mpnet-base-v1 | | |
| 5 | CPU | setup | importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes | #922 | 1 | hf_Meta-Llama-3.1-8B-Instruct-bnb-4bit | | |
| 6 | CPU | setup | RuntimeError: Error(s) in loading state_dict for DebertaV2ForMultipleChoice | #923 | 1 | hf_fine-tuned-MoritzLaurer-deberta-v3-large-zeroshot-v2.0-arceasy | | |
| 7 | CPU | setup | TypeError: DisableCompileContextManager.enter....() got an unexpected keyword argument 'dtype' | #924 | 1 | hf_Llama3-8B-1.58-100B-tokens-GGUF | | |
| 8 | CPU | setup | torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::bitwise_and' to ONNX opset version 14 is not supported | #925 | 1 | hf_Mistral-7B-Instruct-v0.2-GPTQ | | |
| 9 | CPU | import_model | Killed due to OOM | #926 | 1 | hf_StableBeluga2 | | |
| 10 | CPU | import_model | assertNonNull: Assertion `g.get() != nullptr` failed | #927 | 5 | hf_esm2_t36_3B_UR50D, hf_Phi-3.5-mini-instruct, hf_Phi-3-mini-128k-instruct, hf_Phi-3-mini-4k-instruct, hf_zephyr-7b-beta | | |
| 11 | CPU | import_model | assertInVersionRange: Assertion `version >= version_range.first && version <= version_range.second` failed | #928 | 8 | hf_llama-7b, hf_oasst-sft-4-pythia-12b-epoch-3.5, hf_Qwen2.5-1.5B-Instruct, hf_Qwen2.5-7B-Instruct, hf_Qwen2-7B-Instruct, hf_TinyLlama-1.1B-Chat-v1.0, hf_vicuna-7b-v1.5, hf_wasmai-7b-v1 | | |
| 12 | CPU | import_model | Assertion `node->outputs().size() < 4` failed | #929 | 1 | hf_nfnet_l0.ra2_in1k | | |
| 13 | CPU | compilation | error: failed to legalize operation 'torch.operator' that was explicitly marked illegal | #930 | 45 | hf_1_microsoft_deberta_V1.0, hf_1_microsoft_deberta_V1.1, hf_checkpoints_10_1_microsoft_deberta_V1.1_384, hf_checkpoints_1_16, hf_checkpoints_26_9_microsoft_deberta_21_9, hf_checkpoints_28_9_microsoft_deberta_V2, hf_checkpoints_28_9_microsoft_deberta_V4, hf_checkpoints_28_9_microsoft_deberta_V5, hf_checkpoints_29_9_microsoft_deberta_V1, hf_checkpoints_30_9_microsoft_deberta_V1.0_384, hf_checkpoints_3_14, hf_content, hf_deberta-base, hf_deberta_finetuned_pii, hf_deberta-large-mnli, hf_Debertalarg_model_multichoice_Version2, hf_deberta-v2-base-japanese, hf_deberta-v2-base-japanese-char-wwm, hf_deberta-v3-base, hf_deberta-v3-base-absa-v1.1, hf_deberta-v3-base_finetuned_ai4privacy_v2, hf_deberta-v3-base-injection, hf_DeBERTa-v3-base-mnli-fever-anli, hf_deberta-v3-base-squad2, hf_deberta-v3-base-zeroshot-v1.1-all-33, hf_deberta-v3-large, hf_deberta-v3-large_boolq, hf_deberta-v3-large-squad2, hf_deberta-v3-large_test, hf_deberta-v3-large_test_9e-6, hf_deberta-v3-small, hf_deberta-v3-xsmall, hf_llm-mdeberta-v3-swag, hf_mdeberta-v3-base, hf_mDeBERTa-v3-base-mnli-xnli, hf_mdeberta-v3-base-squad2, hf_mDeBERTa-v3-xnli-ft-bs-multiple-choice, hf_Medical-NER, hf_mxbai-rerank-base-v1, hf_mxbai-rerank-xsmall-v1, hf_nli-deberta-v3-base, hf_output, hf_piiranha-v1-detect-personal-information, hf_splinter-base, hf_splinter-base-qass | | |
| 14 | CPU | compilation | error: failed to legalize unresolved materialization from ('i64') to ('index') that remained live after conversion | #931 | 3 | hf_deeplabv3-mobilevit-small, hf_deeplabv3-mobilevit-xx-small, hf_mobilevit-small | | |
| 15 | CPU | compilation | error: 'flow.dispatch.workgroups' op value set has 3 dynamic dimensions but only 2 dimension values are attached | #932 | 3 | hf_beit-base-patch16-224-pt22k, hf_beit-base-patch16-224-pt22k-ft22k, hf_pedestrian_gender_recognition | | |
| 16 | CPU | compilation | error: expected sizes to be non-negative, but got -1 | #933 | 7 | hf_swin_base_patch4_window7_224.ms_in22k_ft_in1k, hf_swin-tiny-patch4-window7-224, hf_yolos-base, hf_yolos-fashionpedia, hf_yolos-small, hf_yolos-small-finetuned-license-plate-detection, hf_yolos-small-rego-plates-detection | | |
| 17 | CPU | compilation | error: 'stream.async.dispatch' op has invalid Read access range | #934 | 1 | hf_dpt-large-ade | | |
| 18 | CPU | compilation | error: 'iree_linalg_ext.pack' op write affecting operations on global resources are restricted to workgroup distributed contexts. | #935 | 1 | hf_distilhubert | | |
| 19 | CPU | compilation | error: expected offsets to be non-negative, but got -1 | #936 | 1 | hf_pnasnet5large.tf_in1k | | |
| 20 | CPU | construct_inputs | ValueError: Asking to pad but the tokenizer does not have a padding token | #938 | 4 | hf_distilgpt2, hf_gpt2, hf_llama-68m, hf_tiny-random-mistral | | |
| 21 | CPU | construct_inputs | name 'tokens' is not defined | #939 | 1 | hf_wavlm-base-plus | @amd-vivekag | |
| 22 | CPU | native_inference | IndexError: tuple index out of range | #940 | 14 | hf_bart-base, hf_gpt2-small-spanish, hf_ivila-row-layoutlm-finetuned-s2vl-v2, hf_opt-125m, hf_Qwen1.5-0.5B-Chat, hf_Qwen2-0.5B, hf_Qwen2.5-0.5B-Instruct, hf_really-tiny-falcon-testing, hf_tiny-dummy-qwen2, hf_tiny-Qwen2ForCausalLM-2.5, hf_tiny-random-GemmaForCausalLM, hf_tiny-random-LlamaForCausalLM, hf_tiny-random-mt5, hf_tiny-random-Phi3ForCausalLM | | |
| 23 | CPU | native_inference | [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: pixel_values for the following indices | #941 | 1 | hf_mobilenet_v1_0.75_192 | | |
| 24 | CPU | native_inference | [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node | #942 | 1 | hf_eva_large_patch14_196.in22k_ft_in22k_in1k | | |
| 25 | CPU | compiled_inference | INVALID_ARGUMENT; function expected fewer input values; parsing input input.bin | #943 | 4 | hf_ko-sroberta-multitask, hf_robertuito-sentiment-analysis, hf_sbert_large_nlu_ru, hf_sentence-bert-base-ja-mean-tokens-v2 | | |
| 26 | CPU | compiled_inference | :0: FAILED_PRECONDITION; onnx.Expand input has a dim that is not statically 1 | #944 | 2 | hf_phobert-base-finetuned, hf_phobert-large-finetuned | | |

zjgarvey (Collaborator) commented Feb 13, 2025

I assume the most recent run is on CPU? Can you share the detail table in a gist? Can you also post the IREE version?

amd-vivekag commented

> I assume the most recent run is on CPU? Can you share the detail table in a gist? Can you also post the IREE version?

Yes, these are run on CPU. I was getting more failures on GPU (around 40 more). I'm using the following IREE version:

IREE (https://iree.dev):
  IREE compiler version 3.2.0rc20250206 @ f3bef2de123f08b4fc3b0ce691494891bd6760d0
  LLVM version 20.0.0git
  Optimized build

Here is the link to the detailed table:
https://gist.github.com/amd-vivekag/377a7b141b40c118f880b2ced176f95c
