Skip to content

fix(perf): unify HF and ONNX paths through PerfBenchmark#659

Merged
xieofxie merged 16 commits into
mainfrom
hualxie/unify_perf
Jun 9, 2026
Merged

fix(perf): unify HF and ONNX paths through PerfBenchmark#659
xieofxie merged 16 commits into
mainfrom
hualxie/unify_perf

Conversation

@xieofxie

@xieofxie xieofxie commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Part 1 of #596.

  • winml perf -m hf/model and winml perf -m model.onnx previously ran two completely different pipelines: HF went through the full AOT build (export → optimize → quantize → compile) via PerfBenchmark, while .onnx files bypassed the pipeline entirely and ran a raw ORT JIT load through _run_onnx_benchmark. Same user-facing command, non-comparable numbers, and several flags (--no-quantize, --rebuild, --ignore-cache, --precision) silently no-oped on the ONNX path.
  • Both inputs now flow through PerfBenchmark, which dispatches to WinMLAutoModel.from_pretrained or .from_onnx. The is_onnx branch in _load_model (previously dead code) is now the live entry point, so an .onnx file runs optimize → [quantize] → [compile] like the HF flow minus export.
  • Delete _run_onnx_benchmark and the duplicate hardware-monitor / stats-collection logic it carried. The CLI keeps is_onnx only for the file-exists check, the --shape-config warning (shapes are baked into pre-exported ONNX), and feeding --op-tracing the raw input path.
  • Refresh docstrings on the perf command, PerfBenchmark._load_model, and the loop helpers to drop stale references; update the CLI test to assert ONNX inputs route through PerfBenchmark.run.
  • add skip_build option

`winml perf -m hf/model` and `winml perf -m model.onnx` previously ran
two completely different pipelines: HF went through the full AOT build
(export -> optimize -> quantize -> compile) via PerfBenchmark, while
.onnx files bypassed the pipeline entirely and ran a raw ORT JIT load
through _run_onnx_benchmark. Same user-facing command, different code
path, non-comparable numbers, and several CLI flags (--no-quantize,
--rebuild, --ignore-cache, --precision) silently no-oped on the ONNX
path.

Both paths now flow through PerfBenchmark, which dispatches to
WinMLAutoModel.from_pretrained or .from_onnx based on the input. The
ONNX branch in _load_model (previously dead code) is now the live entry
point, so an .onnx file goes through optimize -> [quantize] -> [compile]
just like the HF flow, minus the export stage.

- Delete _run_onnx_benchmark and its private helpers' stale references.
- Drop the is_onnx dispatcher branch in the CLI; keep is_onnx only for
  the file-exists check, the --shape-config warning (shapes are baked
  into a pre-exported ONNX), and feeding --op-tracing the raw input.
- Refresh docstrings on the perf command and PerfBenchmark._load_model.
- Update the CLI test to assert ONNX inputs route through
  PerfBenchmark.run; refresh e2e docstrings.
@xieofxie xieofxie requested a review from a team as a code owner May 19, 2026 03:10
@xieofxie

This comment was marked as outdated.

@xieofxie

This comment was marked as outdated.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated change. Unifying both paths through PerfBenchmark removes ~100 lines of duplicated benchmark logic and makes latency numbers directly comparable — solid improvement.

A few inline comments below, mostly nits and one suggestion for robustness.

Comment thread src/winml/modelkit/commands/perf.py
Comment thread src/winml/modelkit/commands/perf.py
Comment thread tests/unit/commands/test_perf_cli.py
@xieofxie

This comment was marked as resolved.

@xieofxie

Copy link
Copy Markdown
Contributor Author

WinMLAutoModel.from_onnx has at least two issues:

  • when perf --device gpu without ep, it will analyze on all eps but will only perf on one ep
  • for same model path, if run first with --device cpu, it will cache a cpu model (openvino for example), when running again with --device gpu, it will still load the cpu cache but could not run (dml for example)

@xieofxie

xieofxie commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

WinMLAutoModel.from_onnx has at least two issues:

  • when perf --device gpu without ep, it will analyze on all eps but will only perf on one ep
  • for same model path, if run first with --device cpu, it will cache a cpu model (openvino for example), when running again with --device gpu, it will still load the cpu cache but could not run (dml for example)

Tracked #827

Comment thread src/winml/modelkit/inference/engine.py
Comment thread tests/unit/commands/test_config_value_priority.py
Comment thread src/winml/modelkit/eval/config.py
Comment thread src/winml/modelkit/inference/engine.py
Comment thread tests/unit/commands/test_config_value_priority.py
Comment thread tests/unit/commands/test_config_value_priority.py

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the unification looks solid — both HF and ONNX paths now flow through PerfBenchmark, skip_build is properly wired across perf/eval/run/engine, and the priority tests cover the key contracts. One minor gap still open: TestEvalPriority is missing skip_build coverage (see inline comment), but that shouldn't block the merge.

@xieofxie xieofxie merged commit ec7e09b into main Jun 9, 2026
9 checks passed
@xieofxie xieofxie deleted the hualxie/unify_perf branch June 9, 2026 04:52
chinazhangchao added a commit that referenced this pull request Jun 25, 2026
Fixes #827

## Problem

Two cache-naming bugs in `WinMLAutoModel.from_onnx` introduced after PR
#659:

1. **Cache directory collision** – `from_onnx` called
`get_model_dir(onnx_path.stem, ...)`, so two ONNX files with the same
filename (e.g. `model.onnx`) from different directories would map to the
same cache directory and overwrite each other's build outputs.

2. **Config change not reflected in cache** – `build_onnx_model` had no
`cache_key` support (unlike `build_hf_model`), so different build
configs for the same ONNX file would collide on the same `model.onnx`
artifact path.

## Fix

**`src/winml/modelkit/models/auto.py` — `from_onnx`**
- Replace `get_model_dir(onnx_path.stem, ...)` with
`get_model_dir(str(onnx_path.resolve()), ...)` so the cache dir is
unique per absolute file path.
- Compute `cache_key = get_cache_key(task_abbrev,
config.generate_cache_key())` (mirrors `from_pretrained`) and pass it to
`build_onnx_model`.

**`src/winml/modelkit/build/onnx.py` — `build_onnx_model`**
- Add `cache_key: str | None = None` parameter.
- Add `_name()` helper (same pattern as `build_hf_model`) so all
artifact paths are optionally prefixed — enabling multiple config
variants to coexist in one directory.
- Rebuild cleanup uses the cache_key-scoped glob pattern to avoid
removing unrelated artifacts.

## Tests

- 5 new tests in `test_auto_onnx.py`: resolved path used for model dir,
same-stem/different-path collision prevention, cache_key passed through.
- 5 new tests in `test_onnx.py`: `cache_key=None` backward compat,
prefixed final artifact, prefixed config path, reuse with prefixed path,
unrelated artifact preservation on rebuild.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants