fix(perf): unify HF and ONNX paths through PerfBenchmark #659
`winml perf -m hf/model` and `winml perf -m model.onnx` previously ran two completely different pipelines: HF went through the full AOT build (export -> optimize -> quantize -> compile) via PerfBenchmark, while .onnx files bypassed the pipeline entirely and ran a raw ORT JIT load through _run_onnx_benchmark. Same user-facing command, different code path, non-comparable numbers, and several CLI flags (--no-quantize, --rebuild, --ignore-cache, --precision) silently no-oped on the ONNX path.

Both paths now flow through PerfBenchmark, which dispatches to WinMLAutoModel.from_pretrained or .from_onnx based on the input. The ONNX branch in _load_model (previously dead code) is now the live entry point, so an .onnx file goes through optimize -> [quantize] -> [compile] just like the HF flow, minus the export stage.

- Delete _run_onnx_benchmark and its private helpers' stale references.
- Drop the is_onnx dispatcher branch in the CLI; keep is_onnx only for the file-exists check, the --shape-config warning (shapes are baked into a pre-exported ONNX), and feeding --op-tracing the raw input.
- Refresh docstrings on the perf command and PerfBenchmark._load_model.
- Update the CLI test to assert ONNX inputs route through PerfBenchmark.run; refresh e2e docstrings.
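As a rough illustration of the unified flow described above, the dispatch and stage selection might look like the sketch below. This is not the actual `PerfBenchmark` code: `is_onnx_input`, `select_loader`, and `plan_stages` are hypothetical helpers, and the suffix-based check is an assumption about how an ONNX input is detected.

```python
from pathlib import Path


def is_onnx_input(model: str) -> bool:
    """Hypothetical check: treat any path ending in .onnx as a pre-exported model."""
    return Path(model).suffix.lower() == ".onnx"


def select_loader(model: str) -> str:
    """Name of the WinMLAutoModel constructor the input would be dispatched to."""
    return "from_onnx" if is_onnx_input(model) else "from_pretrained"


def plan_stages(model: str, quantize: bool = True, compile_model: bool = True) -> list[str]:
    """Build stages for an input: ONNX files skip export, everything else is shared."""
    stages = [] if is_onnx_input(model) else ["export"]
    stages.append("optimize")
    if quantize:  # optional stage, shown as [quantize] in the description
        stages.append("quantize")
    if compile_model:  # optional stage, shown as [compile] in the description
        stages.append("compile")
    return stages
```

With this shape, a flag that disables quantization affects both input kinds the same way, which is the behavior the description says was previously missing on the ONNX path.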
DingmaomaoBJTU
left a comment
Clean, well-motivated change. Unifying both paths through PerfBenchmark removes ~100 lines of duplicated benchmark logic and makes latency numbers directly comparable — solid improvement.
A few inline comments below, mostly nits and one suggestion for robustness.
`WinMLAutoModel.from_onnx` has at least two issues:
Tracked in #827.
DingmaomaoBJTU
left a comment
Overall the unification looks solid — both HF and ONNX paths now flow through PerfBenchmark, skip_build is properly wired across perf/eval/run/engine, and the priority tests cover the key contracts. One minor gap still open: TestEvalPriority is missing skip_build coverage (see inline comment), but that shouldn't block the merge.
Fixes #827

## Problem

Two cache-naming bugs in `WinMLAutoModel.from_onnx` introduced after PR #659:

1. **Cache directory collision**: `from_onnx` called `get_model_dir(onnx_path.stem, ...)`, so two ONNX files with the same filename (e.g. `model.onnx`) from different directories would map to the same cache directory and overwrite each other's build outputs.
2. **Config change not reflected in cache**: `build_onnx_model` had no `cache_key` support (unlike `build_hf_model`), so different build configs for the same ONNX file would collide on the same `model.onnx` artifact path.

## Fix

**`src/winml/modelkit/models/auto.py`: `from_onnx`**

- Replace `get_model_dir(onnx_path.stem, ...)` with `get_model_dir(str(onnx_path.resolve()), ...)` so the cache dir is unique per absolute file path.
- Compute `cache_key = get_cache_key(task_abbrev, config.generate_cache_key())` (mirrors `from_pretrained`) and pass it to `build_onnx_model`.

**`src/winml/modelkit/build/onnx.py`: `build_onnx_model`**

- Add `cache_key: str | None = None` parameter.
- Add `_name()` helper (same pattern as `build_hf_model`) so all artifact paths are optionally prefixed, enabling multiple config variants to coexist in one directory.
- Rebuild cleanup uses the cache_key-scoped glob pattern to avoid removing unrelated artifacts.

## Tests

- 5 new tests in `test_auto_onnx.py`: resolved path used for model dir, same-stem/different-path collision prevention, cache_key passed through.
- 5 new tests in `test_onnx.py`: `cache_key=None` backward compat, prefixed final artifact, prefixed config path, reuse with prefixed path, unrelated artifact preservation on rebuild.
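The two collisions can be reproduced with nothing but `pathlib`. The sketch below is illustrative only: `dir_key_from_stem`, `dir_key_from_resolved`, and `artifact_name` are hypothetical stand-ins for what is passed to `get_model_dir` and for the `_name()` helper, and the `_` separator between `cache_key` and the file name is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def dir_key_from_stem(onnx_path: Path) -> str:
    """Old behavior: only the bare filename stem identifies the cache directory."""
    return onnx_path.stem


def dir_key_from_resolved(onnx_path: Path) -> str:
    """New behavior: the absolute path identifies the cache directory."""
    return str(onnx_path.resolve())


def artifact_name(base: str, cache_key: str | None = None) -> str:
    """Hypothetical analogue of _name(): prefix the artifact when a cache_key is given."""
    # The "_" separator is an assumption; the real helper may join the parts differently.
    return f"{cache_key}_{base}" if cache_key else base


a = Path("project_a/model.onnx")
b = Path("project_b/model.onnx")

# Same stem, so both files would have shared one cache directory before the fix.
same_dir_before = dir_key_from_stem(a) == dir_key_from_stem(b)
# Distinct resolved paths, so each file gets its own cache directory after the fix.
same_dir_after = dir_key_from_resolved(a) == dir_key_from_resolved(b)
```

Passing `cache_key=None` leaves the artifact name unchanged, which matches the backward-compatibility case listed in the tests.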
Summary
Part 1 of #596.
- `winml perf -m hf/model` and `winml perf -m model.onnx` previously ran two completely different pipelines: HF went through the full AOT build (export → optimize → quantize → compile) via `PerfBenchmark`, while `.onnx` files bypassed the pipeline entirely and ran a raw ORT JIT load through `_run_onnx_benchmark`. Same user-facing command, non-comparable numbers, and several flags (`--no-quantize`, `--rebuild`, `--ignore-cache`, `--precision`) silently no-oped on the ONNX path.
- Both paths now flow through `PerfBenchmark`, which dispatches to `WinMLAutoModel.from_pretrained` or `.from_onnx`. The `is_onnx` branch in `_load_model` (previously dead code) is now the live entry point, so an `.onnx` file runs optimize → [quantize] → [compile] like the HF flow minus export.
- Delete `_run_onnx_benchmark` and the duplicate hardware-monitor / stats-collection logic it carried. The CLI keeps `is_onnx` only for the file-exists check, the `--shape-config` warning (shapes are baked into pre-exported ONNX), and feeding `--op-tracing` the raw input path.
- Refresh docstrings on the `perf` command, `PerfBenchmark._load_model`, and the loop helpers to drop stale references; update the CLI test to assert ONNX inputs route through `PerfBenchmark.run`.
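A routing assertion of the kind mentioned for the CLI test could be sketched with `unittest.mock`. Everything here is hypothetical: `perf_command` is a stand-in for the real CLI handler, the factory argument replaces whatever patching the actual test uses, and the returned dictionary is a placeholder value, not a measured result.

```python
from unittest.mock import MagicMock


def perf_command(model: str, benchmark_factory) -> object:
    """Hypothetical CLI handler: every input kind is handed to one benchmark class."""
    benchmark = benchmark_factory(model)
    return benchmark.run()


def check_onnx_routes_through_benchmark() -> MagicMock:
    """Assert that an .onnx input reaches the benchmark's run() exactly once."""
    factory = MagicMock(name="PerfBenchmark")
    # Placeholder payload so the return value can be checked end to end.
    factory.return_value.run.return_value = {"status": "placeholder"}

    result = perf_command("model.onnx", factory)

    factory.assert_called_once_with("model.onnx")
    factory.return_value.run.assert_called_once_with()
    assert result == {"status": "placeholder"}
    return factory
```

In a real test suite the mock would more likely be installed with `unittest.mock.patch` on the module that imports `PerfBenchmark`, rather than injected as an argument.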