bug(export): _normalize_exported_model overwrites export.onnx with duplicate node metadata_props after ORT node fusion #696

## Summary

`_normalize_exported_model` (introduced in #681) runs ORT graph optimization on an already-tagged ONNX model. ORT's node fusion duplicates `winml.hierarchy.*` keys in each fused node's `metadata_props`, and the resulting invalid model is then copied back over the original `export.onnx`. The subsequent main optimize step fails with `ModelValidationError: duplicate keys in metadata_props`.

## Context

PR #681 added a post-export normalization step (`_normalize_exported_model`) inside `export_pytorch()` to run shape inference on freshly exported models. However, by the time normalization runs, `HTPExporter._embed_tags_in_onnx` has already injected `winml.hierarchy.tag` and `winml.hierarchy.depth` into every ONNX node's `metadata_props`. When ORT's graph optimizer fuses multiple nodes (e.g. Gelu fusion, MatMul+Add fusion), it merges the source nodes' `metadata_props` into the fused node, creating duplicate keys. The ONNX spec forbids duplicate keys in `metadata_props` (model-level *and* node-level), so the next call to `onnx.checker.check_model` raises an error.

Observed on BAAI/bge-large-en-v1.5 (BERT, feature-extraction), which triggers both Gelu and MatMul+Add fusions.

## Current State

**Failing call chain:**

```
perf.py:1369  perf()
perf.py:290   benchmark.run()
perf.py:354   _load_model()
auto.py:424   WinMLAutoModel.from_pretrained() → build_hf_model()
hf.py:271     run_optimize_analyze_loop(model_path=export_path, ...)
common.py:83  optimize_onnx(model=export_path, ...)
api.py:234    _load_model(export_path)
api.py:68     → raise ModelValidationError(
                  "Failed to load ONNX model",
                  "Validation error: Your model has duplicate keys in metadata_props.")
```

**Root-cause sequence:**

1. **`exporter.py:594-595`** — `_embed_tags_in_onnx` adds `winml.hierarchy.tag` + `winml.hierarchy.depth` to every graph node, then saves to `export.onnx`.  
2. **`pytorch.py:148`** — `_normalize_exported_model(output_path)` calls `optimize_onnx(model=output_path, output=tmp_path)`.  
3. **`optim/api.py:234`** — `_load_model(output_path)` passes (`onnx.checker` finds no duplicates yet).  
4. **`optim/pipes/graph.py:533,571,587`** — `ORTGraphPipe.process` saves the model, creates an `ort.InferenceSession` (triggering ORT graph optimizations including Gelu/MatMul+Add fusions), then reloads with **`validate=False`**. ORT copies all source-node `metadata_props` into the fused node → **duplicate `winml.hierarchy.tag` keys on fused nodes**. The duplicates are invisible here because validation is skipped.  
5. **`optim/api.py:276`** — `_hack_inject_quant_preprocess_metadata` calls `onnx.helper.set_model_props` which only cleans **model-level** `metadata_props`; **node-level duplicates survive**.  
6. **`optim/api.py:284`** — `save_onnx(optimized_model, tmp_path)` writes the broken model to `tmp_path`.  
7. **`pytorch.py:149`** — `copy_onnx_model(tmp_path, output_path)` **overwrites `export.onnx` with the broken model**.  
8. The main build's `run_optimize_analyze_loop` calls `optimize_onnx`, which tries to load `export.onnx` with `validate=True` → **crash**.
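
The duplication in step 4 can be reproduced in miniature without ONNX or ORT. The sketch below is an illustration only: `Node`, `fuse`, and `duplicate_keys` are hypothetical stand-ins, and `fuse` simply concatenates the source nodes' entries, which is the behavior this issue reports for ORT fusion rather than ORT's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node; only the fields needed here."""
    name: str
    metadata_props: list[tuple[str, str]] = field(default_factory=list)


def fuse(name: str, sources: list[Node]) -> Node:
    """Mimic the reported fusion behavior: concatenate every source entry."""
    merged = [entry for src in sources for entry in src.metadata_props]
    return Node(name=name, metadata_props=merged)


def duplicate_keys(node: Node) -> list[str]:
    """Return the keys that occur more than once on a node, sorted."""
    counts = Counter(key for key, _ in node.metadata_props)
    return sorted(key for key, n in counts.items() if n > 1)


# Two tagged nodes, as left behind by the tagging step (values are made up).
matmul = Node("matmul", [("winml.hierarchy.tag", "/encoder/layer.0/dense"),
                         ("winml.hierarchy.depth", "3")])
add = Node("add", [("winml.hierarchy.tag", "/encoder/layer.0/dense"),
                   ("winml.hierarchy.depth", "3")])

fused = fuse("fused_matmul_add", [matmul, add])
print(duplicate_keys(matmul))  # []
print(duplicate_keys(fused))   # ['winml.hierarchy.depth', 'winml.hierarchy.tag']
```

Each source node is valid on its own; the duplicates only appear on the fused node, which is why the validated load in step 3 passes and the one in step 8 fails.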

## Desired State

`_normalize_exported_model` either:
- Does **not** feed hierarchy-tagged nodes through ORT node fusion, or
- Ensures the normalized model written back to `export.onnx` has no duplicate node-level `metadata_props`

so that the main optimize step can always load `export.onnx` without a checker error.

## Acceptance Criteria

- [ ] Building BAAI/bge-large-en-v1.5 (feature-extraction, QNN EP) succeeds end-to-end without `ModelValidationError`
- [ ] `_normalize_exported_model` still runs shape inference / graph normalization on the raw ONNX export
- [ ] `winml.hierarchy.tag` / `winml.hierarchy.depth` node metadata is present and correct in the final artifact
- [ ] No regression on models that don't trigger node fusion
- [ ] Relevant pytest cases pass: `tests/unit/export/test_pytorch_export.py`, `tests/integration/` (or equivalent scope)

## Technical Notes

Two candidate fixes; either works:

**Option A — Move tag injection after normalization**  
In `HTPExporter.export()` (`src/winml/modelkit/export/htp/exporter.py`), swap the order so `_normalize_exported_model` (in `export_pytorch`) runs on the *untagged* ONNX, and only then inject hierarchy tags. This is the cleanest approach: the ORT optimizer never sees `winml.*` node props.
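
A minimal sketch of the Option A ordering, assuming the two steps can be expressed as callables that take the model path. `run_export_steps` and the lambdas are hypothetical and only illustrate the call order, not the real `HTPExporter` code.

```python
from typing import Callable

Step = Callable[[str], None]


def run_export_steps(path: str, normalize: Step, embed_tags: Step) -> None:
    """Option A ordering: normalize the untagged export, then tag it."""
    normalize(path)   # fusion runs while nodes carry no winml.* props
    embed_tags(path)  # tags are written once, after normalization


# Record the call order with placeholder steps.
calls: list[tuple[str, str]] = []
run_export_steps(
    "export.onnx",
    normalize=lambda p: calls.append(("normalize", p)),
    embed_tags=lambda p: calls.append(("embed_tags", p)),
)
print(calls)  # [('normalize', 'export.onnx'), ('embed_tags', 'export.onnx')]
```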

**Option B — Strip/deduplicate node `metadata_props` before copying back**  
In `_normalize_exported_model` (`src/winml/modelkit/export/pytorch.py`), after `optimize_onnx` writes to `tmp_path`, load `tmp_path` and remove duplicate node-level `winml.*` keys before calling `copy_onnx_model`. More surgical but adds complexity.
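
A rough sketch of the deduplication Option B describes, written against anything that exposes `.key` and `.value` so it runs without ONNX installed. `Entry` and `dedupe_entries` are hypothetical names, and keeping the first occurrence of each key is an assumption: fused nodes may merge sources with different tag values, so the right policy needs a decision.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a metadata entry with .key and .value."""
    key: str
    value: str


def dedupe_entries(entries, prefix: str = "winml.") -> list[tuple[str, str]]:
    """Keep the first entry per prefixed key; pass other keys through as-is."""
    seen: set[str] = set()
    kept: list[tuple[str, str]] = []
    for entry in entries:
        if entry.key.startswith(prefix):
            if entry.key in seen:
                continue  # later duplicate of a winml.* key: drop it
            seen.add(entry.key)
        kept.append((entry.key, entry.value))
    return kept


# Entries as they might look on a fused node (values are made up).
fused_props = [
    Entry("winml.hierarchy.tag", "/encoder/layer.0/intermediate"),
    Entry("winml.hierarchy.depth", "3"),
    Entry("winml.hierarchy.tag", "/encoder/layer.0/output"),
    Entry("winml.hierarchy.depth", "3"),
]
print(dedupe_entries(fused_props))
# [('winml.hierarchy.tag', '/encoder/layer.0/intermediate'), ('winml.hierarchy.depth', '3')]
```

Applied to a real model, the result would presumably be written back per node by clearing `node.metadata_props` and re-adding the kept pairs before `copy_onnx_model` runs; that write-back step is not shown here and is untested against this codebase.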

Note: simply switching to `validate=False` in the main `optimize_onnx` load would hide the symptom, not fix the corruption.

## Related Files

- `src/winml/modelkit/export/pytorch.py:105-158` — `_normalize_exported_model`, introduced in #681
- `src/winml/modelkit/export/htp/exporter.py:568-600` — `_embed_tags_in_onnx` + `_embed_graph_metadata` (step 7 of export)
- `src/winml/modelkit/export/htp/exporter.py:295-297` — call site for both step-7 functions
- `src/winml/modelkit/optim/pipes/graph.py:491-590` — `ORTGraphPipe.process` (ORT node fusion, reloads with `validate=False`)
- `src/winml/modelkit/optim/api.py:163-173` — `_hack_inject_quant_preprocess_metadata` (only fixes model-level props)
- `src/winml/modelkit/onnx/persistence.py:28-68` — `load_onnx` (`validate=True` by default)
- `tests/unit/export/test_pytorch_export.py` — existing normalization tests to extend

## References

- Introduced by: #681 (`feat(export): normalize exported ONNX in-place via optimize_onnx`)
- ONNX spec: `metadata_props` keys must be unique at both model level and node level
- ORT behavior: node fusion copies all source `metadata_props` to the fused node (observed with Gelu and MatMul+Add fusions on BERT)
