Summary
_normalize_exported_model (introduced in #681) runs ORT graph optimization on an already-tagged ONNX model. ORT's node fusion duplicates winml.hierarchy.* keys in the fused node's metadata_props, and the broken model is then copied back over the original export.onnx, so the subsequent main optimize step fails with ModelValidationError: duplicate keys in metadata_props.
Context
PR #681 added a post-export normalization step (_normalize_exported_model) inside export_pytorch() to run shape inference on freshly exported models. However, by the time normalization runs, HTPExporter._embed_tags_in_onnx has already injected winml.hierarchy.tag and winml.hierarchy.depth into every ONNX node's metadata_props. When ORT's graph optimizer fuses multiple nodes (e.g. Gelu fusion, MatMul+Add fusion), it merges the source nodes' metadata_props into the fused node, creating duplicate keys. The ONNX spec forbids duplicate keys in metadata_props (model-level and node-level), so the next call to onnx.checker.check_model raises an error.
Observed on BAAI/bge-large-en-v1.5 (BERT, feature-extraction), which triggers both Gelu and MatMul+Add fusions.
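The mechanism described above can be illustrated with a small, dependency-free sketch. Entry and Node are stand-ins for the key/value entries and nodes of an ONNX graph, and naive_fuse is a hypothetical model of the fusion behaviour reported in this issue, not ORT's actual implementation. The tag values are made up.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Stand-in for one metadata_props key/value entry."""
    key: str
    value: str


@dataclass
class Node:
    """Stand-in for a graph node; only the fields used here."""
    name: str
    metadata_props: list = field(default_factory=list)


def tag(node, hierarchy_tag, depth):
    # Mirrors what the issue says tag injection does: two winml.hierarchy.* props per node.
    node.metadata_props.append(Entry("winml.hierarchy.tag", hierarchy_tag))
    node.metadata_props.append(Entry("winml.hierarchy.depth", str(depth)))


def naive_fuse(name, sources):
    # Assumed fusion behaviour: the fused node receives every source node's
    # props, concatenated without merging entries that share a key.
    fused = Node(name)
    for src in sources:
        fused.metadata_props.extend(Entry(e.key, e.value) for e in src.metadata_props)
    return fused


def duplicate_keys(node):
    # Keys that appear more than once on a single node.
    counts = Counter(e.key for e in node.metadata_props)
    return sorted(k for k, n in counts.items() if n > 1)


matmul, add = Node("MatMul_0"), Node("Add_0")
tag(matmul, "/encoder/layer.0/dense", 3)
tag(add, "/encoder/layer.0/dense", 3)
fused = naive_fuse("FusedMatMulAdd_0", [matmul, add])

print(duplicate_keys(matmul))  # []
print(duplicate_keys(fused))   # ['winml.hierarchy.depth', 'winml.hierarchy.tag']
```

Each source node is valid on its own; the duplicates only appear once their props are concatenated onto one node.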
Current State
Failing call chain:
perf.py:1369 perf()
perf.py:290 benchmark.run()
perf.py:354 _load_model()
auto.py:424 WinMLAutoModel.from_pretrained() → build_hf_model()
hf.py:271 run_optimize_analyze_loop(model_path=export_path, ...)
common.py:83 optimize_onnx(model=export_path, ...)
api.py:234 _load_model(export_path)
api.py:68 → raise ModelValidationError(
"Failed to load ONNX model",
"Validation error: Your model has duplicate keys in metadata_props.")
Root-cause sequence:
1. exporter.py:594-595 - _embed_tags_in_onnx adds winml.hierarchy.tag and winml.hierarchy.depth to every graph node, then saves to export.onnx.
2. pytorch.py:148 - _normalize_exported_model(output_path) calls optimize_onnx(model=output_path, output=tmp_path).
3. optim/api.py:234 - _load_model(output_path) passes (onnx.checker finds no duplicates yet).
4. optim/pipes/graph.py:533,571,587 - ORTGraphPipe.process saves the model, creates an ort.InferenceSession (triggering ORT graph optimizations, including the Gelu and MatMul+Add fusions), then reloads with validate=False. ORT copies all source-node metadata_props into the fused node → duplicate winml.hierarchy.tag keys on fused nodes. The duplicates go unnoticed here because validation is skipped.
5. optim/api.py:276 - _hack_inject_quant_preprocess_metadata calls onnx.helper.set_model_props, which only cleans model-level metadata_props; node-level duplicates survive.
6. optim/api.py:284 - save_onnx(optimized_model, tmp_path) writes the broken model to tmp_path.
7. pytorch.py:149 - copy_onnx_model(tmp_path, output_path) overwrites export.onnx with the broken model.
8. The main build's run_optimize_analyze_loop tries to load export.onnx with validate=True → crash.
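The model-level versus node-level distinction in the sequence above can be sketched as follows. This is a minimal illustration on stand-in classes, not the project's code: rewrite_model_props imitates the clear-and-re-add behaviour attributed to onnx.helper.set_model_props, and nodes_with_duplicate_props is a hypothetical diagnostic that could be run where validation is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Stand-in for one metadata_props key/value entry."""
    key: str
    value: str


@dataclass
class Node:
    name: str
    metadata_props: list = field(default_factory=list)


@dataclass
class Model:
    metadata_props: list = field(default_factory=list)
    nodes: list = field(default_factory=list)


def has_duplicate_keys(props):
    keys = [e.key for e in props]
    return len(keys) != len(set(keys))


def rewrite_model_props(model, mapping):
    # Model-level only: clear and re-add from a dict, so keys are unique
    # by construction. Node-level props are never touched.
    del model.metadata_props[:]
    model.metadata_props.extend(Entry(k, v) for k, v in mapping.items())


def nodes_with_duplicate_props(model):
    # Hypothetical diagnostic: names of nodes carrying repeated keys.
    return [n.name for n in model.nodes if has_duplicate_keys(n.metadata_props)]


fused = Node("Gelu_fused", [
    Entry("winml.hierarchy.tag", "/encoder/layer.0/intermediate"),
    Entry("winml.hierarchy.tag", "/encoder/layer.0/intermediate"),
])
plain = Node("Softmax_0", [Entry("winml.hierarchy.tag", "/encoder/layer.0/attention")])
model = Model(
    metadata_props=[Entry("producer", "a"), Entry("producer", "b")],
    nodes=[fused, plain],
)

rewrite_model_props(model, {"producer": "b"})
print(has_duplicate_keys(model.metadata_props))  # False
print(nodes_with_duplicate_props(model))         # ['Gelu_fused']
```

The model-level props come out clean, while the fused node still carries its repeated key, which matches the state described in steps 5 to 7.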
Desired State
_normalize_exported_model either:
Does not feed hierarchy-tagged nodes through ORT node fusion, or
Ensures the normalized model written back to export.onnx has no duplicate node-level metadata_props
so that the main optimize step can always load export.onnx without a checker error.
Acceptance Criteria
Building BAAI/bge-large-en-v1.5 (feature-extraction, QNN EP) succeeds end-to-end without ModelValidationError
_normalize_exported_model still runs shape inference / graph normalization on the raw ONNX export
winml.hierarchy.tag / winml.hierarchy.depth node metadata is present and correct in the final artifact
No regression on models that don't trigger node fusion
Test scope: tests/unit/export/test_pytorch_export.py, tests/integration/ (or equivalent scope)
Technical Notes
Two candidate fixes; either works:
Option A — Move tag injection after normalization
In HTPExporter.export() (src/winml/modelkit/export/htp/exporter.py), swap the order so _normalize_exported_model (in export_pytorch) runs on the untagged ONNX, and only then inject hierarchy tags. This is the cleanest approach: the ORT optimizer never sees winml.* node props.
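A toy comparison of the two orderings, as a sketch only. Here embed_tags and normalize are simplified, hypothetical stand-ins for tag injection and ORT fusion (nodes are plain dicts and "fusion" collapses everything into one node); they are not the functions in exporter.py or pytorch.py.

```python
def embed_tags(nodes):
    """Append the two hierarchy props to every node (returns new nodes)."""
    extra = [("winml.hierarchy.tag", "/encoder/layer.0"), ("winml.hierarchy.depth", "2")]
    return [{"name": n["name"], "props": n["props"] + extra} for n in nodes]


def normalize(nodes):
    """Toy 'fusion': collapse all nodes into one and concatenate their props."""
    merged = [p for n in nodes for p in n["props"]]
    return [{"name": "Fused_0", "props": merged}]


def has_duplicate_keys(nodes):
    for n in nodes:
        keys = [k for k, _ in n["props"]]
        if len(keys) != len(set(keys)):
            return True
    return False


raw = [{"name": "MatMul_0", "props": []}, {"name": "Add_0", "props": []}]

current_order = normalize(embed_tags(raw))   # tag first, then fuse
option_a_order = embed_tags(normalize(raw))  # fuse first, then tag

print(has_duplicate_keys(current_order))   # True
print(has_duplicate_keys(option_a_order))  # False
```

With this ordering the fused node is tagged exactly once. Whether the hierarchy information can still be mapped onto nodes that fusion has created or renamed is not stated in this issue and would need checking against the acceptance criterion on tag correctness.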
Option B — Strip/deduplicate node metadata_props before copying back
In _normalize_exported_model (src/winml/modelkit/export/pytorch.py), after optimize_onnx writes to tmp_path, load tmp_path and remove duplicate node-level winml.* keys before calling copy_onnx_model. More surgical but adds complexity.
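A possible shape for the deduplication step, as a sketch on stand-in classes rather than real ONNX protos. dedupe_node_props is a hypothetical helper; keeping the first occurrence of each winml.* key is an assumption, since the issue does not say which value should win when fused source nodes carry different tags.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Stand-in for one metadata_props key/value entry."""
    key: str
    value: str


@dataclass
class Node:
    name: str
    metadata_props: list = field(default_factory=list)


def dedupe_node_props(node, prefix="winml."):
    """Drop repeated keys under `prefix`, keeping the first occurrence.

    Entries outside the prefix are left untouched. Returns the number of
    entries removed.
    """
    seen = set()
    kept = []
    for entry in node.metadata_props:
        if entry.key.startswith(prefix):
            if entry.key in seen:
                continue  # later duplicate of a winml.* key: skip it
            seen.add(entry.key)
        kept.append(Entry(entry.key, entry.value))
    removed = len(node.metadata_props) - len(kept)
    # Clear and refill in place so callers holding the node see the change.
    del node.metadata_props[:]
    node.metadata_props.extend(kept)
    return removed


fused = Node("Gelu_fused", [
    Entry("winml.hierarchy.tag", "/encoder/layer.0/intermediate"),
    Entry("winml.hierarchy.depth", "4"),
    Entry("winml.hierarchy.tag", "/encoder/layer.0/intermediate/act"),
    Entry("winml.hierarchy.depth", "5"),
    Entry("source", "ort-fusion"),
])

removed = dedupe_node_props(fused)
print(removed)  # 2
print([(e.key, e.value) for e in fused.metadata_props])
```

In the example the second tag and depth are discarded, so the fused node keeps the shallower hierarchy entry. A different policy (deepest tag, common ancestor) may fit the "present and correct" criterion better.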
Note: simply switching to validate=False in the main optimize_onnx load would hide the symptom, not fix the corruption.
Related Files
src/winml/modelkit/export/pytorch.py:105-158 - _normalize_exported_model, introduced in #681 (feat(export): normalize exported ONNX in-place via optimize_onnx)
src/winml/modelkit/export/htp/exporter.py:568-600 - _embed_tags_in_onnx + _embed_graph_metadata (step 7 of export)
src/winml/modelkit/export/htp/exporter.py:295-297 - call site for both step-7 functions
src/winml/modelkit/optim/pipes/graph.py:491-590 - ORTGraphPipe.process (ORT node fusion, reloads with validate=False)
src/winml/modelkit/optim/api.py:163-173 - _hack_inject_quant_preprocess_metadata (only fixes model-level props)
src/winml/modelkit/onnx/persistence.py:28-68 - load_onnx (validate=True by default)
tests/unit/export/test_pytorch_export.py - existing normalization tests to extend
References
#681 (feat(export): normalize exported ONNX in-place via optimize_onnx)
ONNX spec: metadata_props keys must be unique at both model level and node level
ORT node fusion copies source-node metadata_props to the fused node (observed with Gelu and MatMul+Add fusions on BERT)