fix(perf): unify HF and ONNX paths through PerfBenchmark by xieofxie · Pull Request #659 · microsoft/winml-cli

xieofxie · 2026-05-19T03:10:27Z

Summary

Part 1 of #596.

winml perf -m hf/model and winml perf -m model.onnx previously ran two completely different pipelines: HF went through the full AOT build (export → optimize → quantize → compile) via PerfBenchmark, while .onnx files bypassed the pipeline entirely and ran a raw ORT JIT load through _run_onnx_benchmark. Same user-facing command, non-comparable numbers, and several flags (--no-quantize, --rebuild, --ignore-cache, --precision) silently no-oped on the ONNX path.
Both inputs now flow through PerfBenchmark, which dispatches to WinMLAutoModel.from_pretrained or .from_onnx. The is_onnx branch in _load_model (previously dead code) is now the live entry point, so an .onnx file runs optimize → [quantize] → [compile] like the HF flow minus export.
Delete _run_onnx_benchmark and the duplicate hardware-monitor / stats-collection logic it carried. The CLI keeps is_onnx only for the file-exists check, the --shape-config warning (shapes are baked into pre-exported ONNX), and feeding --op-tracing the raw input path.
Refresh docstrings on the perf command, PerfBenchmark._load_model, and the loop helpers to drop stale references; update the CLI test to assert ONNX inputs route through PerfBenchmark.run.
add skip_build option

`winml perf -m hf/model` and `winml perf -m model.onnx` previously ran two completely different pipelines: HF went through the full AOT build (export -> optimize -> quantize -> compile) via PerfBenchmark, while .onnx files bypassed the pipeline entirely and ran a raw ORT JIT load through _run_onnx_benchmark. Same user-facing command, different code path, non-comparable numbers, and several CLI flags (--no-quantize, --rebuild, --ignore-cache, --precision) silently no-oped on the ONNX path. Both paths now flow through PerfBenchmark, which dispatches to WinMLAutoModel.from_pretrained or .from_onnx based on the input. The ONNX branch in _load_model (previously dead code) is now the live entry point, so an .onnx file goes through optimize -> [quantize] -> [compile] just like the HF flow, minus the export stage. - Delete _run_onnx_benchmark and its private helpers' stale references. - Drop the is_onnx dispatcher branch in the CLI; keep is_onnx only for the file-exists check, the --shape-config warning (shapes are baked into a pre-exported ONNX), and feeding --op-tracing the raw input. - Refresh docstrings on the perf command and PerfBenchmark._load_model. - Update the CLI test to assert ONNX inputs route through PerfBenchmark.run; refresh e2e docstrings.

DingmaomaoBJTU

Clean, well-motivated change. Unifying both paths through PerfBenchmark removes ~100 lines of duplicated benchmark logic and makes latency numbers directly comparable — solid improvement.

A few inline comments below, mostly nits and one suggestion for robustness.

xieofxie · 2026-05-26T03:00:26Z

WinMLAutoModel.from_onnx has at least two issues:

when perf --device gpu without ep, it will analyze on all eps but will only perf on one ep
for same model path, if run first with --device cpu, it will cache a cpu model (openvino for example), when running again with --device gpu, it will still load the cpu cache but could not run (dml for example)

xieofxie · 2026-06-08T06:23:15Z

WinMLAutoModel.from_onnx has at least two issues:

when perf --device gpu without ep, it will analyze on all eps but will only perf on one ep

for same model path, if run first with --device cpu, it will cache a cpu model (openvino for example), when running again with --device gpu, it will still load the cpu cache but could not run (dml for example)

Tracked #827

DingmaomaoBJTU

Overall the unification looks solid — both HF and ONNX paths now flow through PerfBenchmark, skip_build is properly wired across perf/eval/run/engine, and the priority tests cover the key contracts. One minor gap still open: TestEvalPriority is missing skip_build coverage (see inline comment), but that shouldn't block the merge.

Fixes #827 ## Problem Two cache-naming bugs in `WinMLAutoModel.from_onnx` introduced after PR #659: 1. **Cache directory collision** – `from_onnx` called `get_model_dir(onnx_path.stem, ...)`, so two ONNX files with the same filename (e.g. `model.onnx`) from different directories would map to the same cache directory and overwrite each other's build outputs. 2. **Config change not reflected in cache** – `build_onnx_model` had no `cache_key` support (unlike `build_hf_model`), so different build configs for the same ONNX file would collide on the same `model.onnx` artifact path. ## Fix **`src/winml/modelkit/models/auto.py` — `from_onnx`** - Replace `get_model_dir(onnx_path.stem, ...)` with `get_model_dir(str(onnx_path.resolve()), ...)` so the cache dir is unique per absolute file path. - Compute `cache_key = get_cache_key(task_abbrev, config.generate_cache_key())` (mirrors `from_pretrained`) and pass it to `build_onnx_model`. **`src/winml/modelkit/build/onnx.py` — `build_onnx_model`** - Add `cache_key: str | None = None` parameter. - Add `_name()` helper (same pattern as `build_hf_model`) so all artifact paths are optionally prefixed — enabling multiple config variants to coexist in one directory. - Rebuild cleanup uses the cache_key-scoped glob pattern to avoid removing unrelated artifacts. ## Tests - 5 new tests in `test_auto_onnx.py`: resolved path used for model dir, same-stem/different-path collision prevention, cache_key passed through. - 5 new tests in `test_onnx.py`: `cache_key=None` backward compat, prefixed final artifact, prefixed config path, reuse with prefixed path, unrelated artifact preservation on rebuild.

xieofxie requested a review from a team as a code owner May 19, 2026 03:10

This comment was marked as outdated.

Sign in to view

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

340b6c7

This comment was marked as outdated.

Sign in to view

DingmaomaoBJTU reviewed May 25, 2026

View reviewed changes

Comment thread src/winml/modelkit/commands/perf.py

Comment thread src/winml/modelkit/commands/perf.py

Comment thread tests/unit/commands/test_perf_cli.py

hualxie and others added 2 commits May 25, 2026 17:24

address comments

cf3ecce

Merge branch 'main' into hualxie/unify_perf

fcd83a9

This comment was marked as resolved.

Sign in to view

hualxie added 3 commits May 26, 2026 10:29

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

0308b49

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

ed6d790

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

494cf22

xieofxie and others added 5 commits June 2, 2026 14:21

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

6335984

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

c266c29

set to more places

2752d4f

test fix

ba59ab9

add no skip build

9e907cb

xieofxie mentioned this pull request Jun 8, 2026

bug: WinMLAutoModel.from_onnx uses incorrect cache dir/file name #827

Closed

DingmaomaoBJTU reviewed Jun 8, 2026

View reviewed changes

Comment thread src/winml/modelkit/inference/engine.py

DingmaomaoBJTU reviewed Jun 8, 2026

View reviewed changes

Comment thread tests/unit/commands/test_config_value_priority.py

DingmaomaoBJTU reviewed Jun 8, 2026

View reviewed changes

Comment thread src/winml/modelkit/eval/config.py

xieofxie added 2 commits June 8, 2026 15:53

Merge remote-tracking branch 'origin/main' into hualxie/unify_perf

177de15

address PR review feedback

30cf698

DingmaomaoBJTU reviewed Jun 9, 2026

View reviewed changes

Comment thread src/winml/modelkit/inference/engine.py

DingmaomaoBJTU reviewed Jun 9, 2026

View reviewed changes

Comment thread tests/unit/commands/test_config_value_priority.py

add test cases

aa9971e

DingmaomaoBJTU reviewed Jun 9, 2026

View reviewed changes

Comment thread tests/unit/commands/test_config_value_priority.py

DingmaomaoBJTU approved these changes Jun 9, 2026

View reviewed changes

add more tests

24e8aae

DingmaomaoBJTU approved these changes Jun 9, 2026

View reviewed changes

xieofxie merged commit ec7e09b into main Jun 9, 2026
9 checks passed

xieofxie deleted the hualxie/unify_perf branch June 9, 2026 04:52

DingmaomaoBJTU mentioned this pull request Jun 9, 2026

chore(cli): convert boolean is_flag options to --flag/--no-flag pairs for consistency #843

Closed

5 tasks

chinazhangchao mentioned this pull request Jun 10, 2026

fix: use resolved path and cache_key in WinMLAutoModel.from_onnx #856

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(perf): unify HF and ONNX paths through PerfBenchmark#659

fix(perf): unify HF and ONNX paths through PerfBenchmark#659
xieofxie merged 16 commits into
mainfrom
hualxie/unify_perf

xieofxie commented May 19, 2026 •

edited

Loading

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

DingmaomaoBJTU left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

xieofxie commented May 26, 2026

Uh oh!

xieofxie commented Jun 8, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

xieofxie commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

xieofxie commented May 26, 2026

Uh oh!

xieofxie commented Jun 8, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xieofxie commented May 19, 2026 •

edited

Loading