winml1.8 recipes by ssss141414 · Pull Request #407 · microsoft/winml-cli

ssss141414 · 2026-04-28T02:16:19Z

No description provided.

- Rename eval_dataset to eval_option with WinMLEvaluationConfig type
- Dynamic load WinMLEvaluationConfig in build config (lazy import)
- Rename build_script to dataset_script
- Add --dataset-script and --trust-remote-code CLI options
- Decouple config defaults from script execution logic
- Simplify: dataset script prints path to stdout, no cache_dir logic
- Config section only provides default values, no file existence checks
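The "dataset script prints path to stdout" contract can be illustrated with a small caller. This is a minimal sketch, not the PR's implementation: the function name `resolve_dataset_path` is hypothetical, and treating the last non-empty stdout line as the path is an assumption made here so that a script may log progress before printing its result.

```python
import subprocess
import sys
from pathlib import Path


def resolve_dataset_path(dataset_script: str) -> Path:
    """Run a dataset script and return the path it prints on stdout.

    Hypothetical helper: the real CLI may parse the output differently.
    """
    result = subprocess.run(
        [sys.executable, dataset_script],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    # Assumption: the last non-empty stdout line is the dataset path.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{dataset_script} printed no path on stdout")
    return Path(lines[-1])
```

Because the caller only reads stdout, no cache directory or file-existence logic is needed on the config side, which matches the simplification the commit describes.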

- 432 configs: 48 models × 3 EPs (amd/qnn/ov) × 3 precisions (w8a8/w8a16/fp16)
- eval_option injected from models_with_acc.json where available
- scripts/generate_example_configs.py: batch config generation
- scripts/run_example_tests.py: batch eval testing
- examples/generate_config.md: guide for generating configs
- examples/test_config.md: guide for testing configs
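The 432 figure is the full cross product of models, EPs, and precisions. The sketch below shows that enumeration with placeholder model IDs; it is an illustration of the counting only, and the dictionary keys are assumptions rather than the schema used by `scripts/generate_example_configs.py`.

```python
from itertools import product

# EP and precision names as listed in the commit message.
EPS = ["amd", "qnn", "ov"]
PRECISIONS = ["w8a8", "w8a16", "fp16"]


def enumerate_configs(models):
    """Return one config stub per (model, EP, precision) combination."""
    return [
        {"model": model, "ep": ep, "precision": precision}
        for model, ep, precision in product(models, EPS, PRECISIONS)
    ]


# Placeholder IDs standing in for the 48 real model names.
models = [f"model-{i:02d}" for i in range(48)]
configs = enumerate_configs(models)
print(len(configs))  # 48 * 3 * 3 = 432
```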

…U eval results

- generate_example_configs.py: add --ep and --hardware args to filter EP generation
- QNN NPU: 50 new eval results (zero-shot-classification, zero-shot-image-classification)
- 21 eval.json results, 2 timeouts, 27 error files (CLIP/SigLIP eval failures)

- Added zero-shot-classification results for cross-encoder, lxyuan, MoritzLaurer DeBERTa/mDeBERTa models
- Added zero-shot-image-classification results for CLIP, SigLIP, fashion-clip models
- Updated segformer-b5 with new w8a8 perf/eval results
- Fixed MoritzLaurer/deberta-v3-large-zeroshot-v2.0 fp16 timeout (now PASS)
- Updated REPORT.md (48->64 models) and SUMMARY.md

- Tested 57 new configs across 16 model+task combinations
- 7 new models fully pass (cross-encoder, joeddav, lxyuan, 4x MoritzLaurer)
- 9 zero-shot-image-classification combos fail (CLIP, SigLIP, fashion-clip)
- 2 large fill-mask models timeout (roberta-large, xlm-roberta-large)
- nvidia/segformer-b5 eval failures confirmed
- Updated REPORT.md: 48 -> 64 models, eval pass 93% -> 81%
- Updated SUMMARY.md with new stats

Closes #583

## Summary

Rename the product from **"ModelKit"** / **"WinML ModelKit"** to **"WinML CLI"** everywhere users see the name. CLI binary (`winml`) and Python module path (`winml.modelkit`) are unchanged.

90 files, 314 / 316 lines (nearly symmetric: pure rename, near-zero behavior change).

## What changed

### User-visible (the main goal)

- CLI `--help` text + module docstrings (`cli.py`, `__init__.py`, `__main__.py`)
- All subcommand `--help` (`build`, `catalog`, `config`, `inspect`, `serve`, `sys`)
- Serve API titles, console banners, `server_info["name"]` response
- Rich UI panel titles (catalog, sys)
- Runtime rule information messages
- README, CONTRIBUTING, SUPPORT, docs/Privacy, docs/naming-convention
- `serve/static/index.html` (browser title, logo, UI strings, code examples, MCP server identifiers, Claude Code skill filename)
- `scripts/mcp_server.py` (FastMCP server name, descriptions, log messages)
- Stale `wmk` CLI command examples in model docstrings -> `winml`

### Internal rename for consistency

- Telemetry event names `ModelKit{Heartbeat,Action,Error}` -> `WinMLCLI{...}` (safe: not yet in production use, no dashboard coordination required)
- Cache filename `modelkit.cache` -> `winmlcli.cache`
- Internal attribute flags `_modelkit_*` -> `_winmlcli_*` in `telemetry/`
- Env vars `MODELKIT_*` -> `WINMLCLI_*`: `_RULES_DIR`, `_SHOW_ALL_WARNINGS`, `_TIMING_LOG`, `_TELEMETRY_CACHE_DIR`
- Internal module docstrings (`cache/`, `core/`, `onnx/`, `utils/`, etc)
- `ModelKitPlugin` class identifier in Semantic Kernel example
- `producer_name` strings in test fixtures + `pattern/base.py`

### Adjacent fixes pulled in

- `pyproject.toml` URLs: `github.com/microsoft/ModelKit` -> `WinML-ModelKit` (stale 404 URLs; real repo is `microsoft/WinML-ModelKit`)
- Same URL fix in `README.md`, `SUPPORT.md`, `CONTRIBUTING.md`
- `WML` abbreviation -> `WinML` expansion in `__author__` and export subsystem docstring (team identity unchanged)
- Stale duplicate copyright block (Apache-2.0 SPDX) removed from `graphpipe/builders` test assets; verified original Microsoft code, no Apache use

## Naming convention chosen

Per `docs/naming-convention.md` (3-letter acronyms stay uppercase):

- PascalCase: `WinMLCLI` (e.g. event names, class names)
- ALL_CAPS env: `WINMLCLI_*` (e.g. `WINMLCLI_RULES_DIR`)
- lowercase id: `winmlcli` (e.g. cache filename, MCP server name)
- Two-word user-facing: `WinML CLI` (e.g. docs, help text)
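For anyone with the old environment variables set, the rename can be mirrored on the user side. The helper below is a hypothetical sketch, not part of the PR: the description does not say the CLI falls back to the old `MODELKIT_*` names, so `migrate_env` simply maps the four listed suffixes onto the new prefix in a copy of the environment.

```python
# Prefixes and suffixes taken from the rename list above.
OLD_PREFIX = "MODELKIT"
NEW_PREFIX = "WINMLCLI"
RENAMED_SUFFIXES = (
    "_RULES_DIR",
    "_SHOW_ALL_WARNINGS",
    "_TIMING_LOG",
    "_TELEMETRY_CACHE_DIR",
)


def migrate_env(env):
    """Return a copy of env with old variable names moved to the new prefix.

    Hypothetical helper: an already-set new name is never overwritten.
    """
    migrated = dict(env)
    for suffix in RENAMED_SUFFIXES:
        old_name = OLD_PREFIX + suffix
        new_name = NEW_PREFIX + suffix
        if old_name in migrated and new_name not in migrated:
            migrated[new_name] = migrated.pop(old_name)
    return migrated
```

A typical use would be `migrate_env(os.environ)` in a launcher script, passing the result as the `env` argument when spawning `winml`.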

…--device/--ep (#588)

## Summary

- Validate numeric perf options at parse time using `click.IntRange`: `--iterations` must be `>= 1`, `--warmup` must be `>= 0`. Previously perf accepted `0` (or even negative) iterations and silently produced empty/garbage stats.
- Replace three hand-rolled Click option blocks with the shared `cli_utils` helpers already used by `run`, `serve`, `eval`, and `catalog`:
  - `--model -m` → `cli_utils.model_option(required=False)` (renames the perf-local kwarg from `model_id` to `model` to match the helper; downstream `BenchmarkConfig.model_id`, `generate_output_path`, and `_run_onnx_benchmark` are unaffected).
  - `--device` → `cli_utils.device_option(required=False, default="auto", include_auto=True)`.
  - `--ep` → `cli_utils.ep_option(required=False, optional_message="Overrides device-to-provider mapping.")`.

  Side benefit: `--ep` typos are now caught at parse time instead of failing deep in the build pipeline.

Net diff: `+12 / -30` lines in `src/winml/modelkit/commands/perf.py`.

Closes #552

Co-authored-by: hualxie <hualxie@microsoft.com>
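The idea of rejecting out-of-range values at parse time can be shown without Click. The sketch below deliberately swaps in the standard library's `argparse` with a small range-checking type, so it is an analogue of `click.IntRange`, not the code in `perf.py`. The program name, `int_at_least`, and the default values are assumptions for illustration.

```python
import argparse


def int_at_least(minimum: int):
    """Return an argparse type callable that accepts integers >= minimum."""

    def parse(text: str) -> int:
        value = int(text)  # a ValueError here is reported by argparse as invalid
        if value < minimum:
            raise argparse.ArgumentTypeError(f"{value} is not >= {minimum}")
        return value

    return parse


def build_parser() -> argparse.ArgumentParser:
    # Defaults are placeholders; the PR does not state the real ones.
    parser = argparse.ArgumentParser(prog="perf-sketch")
    parser.add_argument("--iterations", type=int_at_least(1), default=10)
    parser.add_argument("--warmup", type=int_at_least(0), default=0)
    return parser
```

With this parser, `--iterations 0` fails before any benchmark code runs, which is the behavior the change is after: a bad value becomes a usage error rather than empty statistics.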

…it codes (#597)

## Summary

Five `sys.exit(N)` paths in `commands/perf.py` used inconsistent, non-standard exit codes. Most importantly, the no-modules-matched user-error path called `sys.exit(0)`, masking a typo as success and silently breaking CI.

This PR replaces all five with the proper Click exception types:

| Path | Before | After | Exit |
|---|---|---|---|
| Module config generation failed | `sys.exit(3)` | `click.ClickException` | 1 |
| **No modules matched `--module CLASSNAME`** | `sys.exit(0)` ← the bug | `click.UsageError` | **2** |
| Defensive: configs missing `model_type` | `sys.exit(3)` | `click.ClickException` | 1 |
| Model file not found | `sys.exit(3)` | `click.UsageError` | 2 |
| Generic benchmark failure | `sys.exit(4)` | `click.ClickException` | 1 |

User-facing errors now exit 2 (Click's `UsageError` convention), runtime failures exit 1, success exits 0, matching the rest of the CLI. The now-unused `import sys` is also dropped.

Added `test_module_no_match_exits_nonzero` as a regression guard for the specific scenario the issue described.

End-to-end repro of the bug now behaves correctly:

```
$ winml perf -m microsoft/resnet-50 --module DoesNotExist --ep cpu --device cpu
Generating module configs for DoesNotExist...
Usage: winml perf [OPTIONS]
Try 'winml perf --help' for help.
Error: No modules matching 'DoesNotExist' found
$ echo $?
2
```

Closes #554

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
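The exit-code convention in the table (0 for success, 1 for runtime failures, 2 for usage errors) can be sketched with plain exception classes. This is a standard-library imitation of the way `click.ClickException` and `click.UsageError` carry an `exit_code`, not the code from `perf.py`; `CliError`, `select_modules`, and `main` are hypothetical names.

```python
import sys


class CliError(Exception):
    """Runtime failure: maps to exit code 1."""

    exit_code = 1


class UsageError(CliError):
    """User-facing mistake such as a typo: maps to exit code 2."""

    exit_code = 2


def select_modules(available, requested):
    """Return modules matching the requested class name, or raise UsageError."""
    matches = [name for name in available if name == requested]
    if not matches:
        # The fixed bug: this path used to end with exit code 0.
        raise UsageError(f"No modules matching '{requested}' found")
    return matches


def main(available, requested) -> int:
    """Return the process exit code instead of calling sys.exit directly."""
    try:
        select_modules(available, requested)
    except CliError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return exc.exit_code
    return 0
```

Raising typed exceptions and translating them in one place keeps every error path on the same convention, which is what makes a mistyped `--module` value visible to CI.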

…square detection accuracy (#479)

DETR-family exports currently omit `pixel_mask`, so non-square inputs must be distorted or padded without masking, which drives significant object-detection regression. This PR updates export config behavior so DETR/Table Transformer ONNX models retain `pixel_mask` as an input aligned to image spatial dimensions.

- **ONNX config override for DETR-family inputs**
  - Added `DetrIOConfig` and `TableTransformerIOConfig` overrides in `src/winml/modelkit/models/hf/detr.py`.
  - Registered overrides for relevant tasks (`object-detection` plus existing DETR export tasks) via `register_onnx_overwrite`.
  - Extended input spec to include:
    - `pixel_values`: `[batch, channels, height, width]`
    - `pixel_mask`: `[batch, height, width]`
- **Dummy input generation updated for export path**
  - Added shared mixin logic to inject `pixel_mask` during dummy input generation.
  - `pixel_mask` is generated as `int64` ones with spatial shape derived from `pixel_values`.
- **Focused DETR/Table Transformer coverage**
  - Expanded `tests/unit/models/detr/test_onnx_config.py` to validate:
    - config registration resolves to the new overrides
    - `pixel_mask` is present in ONNX input metadata with correct dynamic axes
    - generated dummy inputs include correctly shaped/dtyped `pixel_mask` for both `detr` and `table-transformer`

Example of the resulting export input contract:

```python
{
    "pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"},
    "pixel_mask": {0: "batch_size", 1: "height", 2: "width"},
}
```

> [!WARNING]
>
> <details>
> <summary>Firewall rules blocked me from connecting to one or more addresses (expand for details)</summary>
>
> #### I tried to connect to the following addresses, but was blocked by firewall rules:
>
> - `download-r2.pytorch.org`
>   - Triggering command: `/usr/bin/python python -m pip install torch --index-url REDACTED` (dns block)
>
> If you need me to access, download, or install something from one of these locations, you can either:
>
> - Configure [Actions setup steps](https://gh.io/copilot/actions-setup-steps) to set up my environment, which run before the firewall is enabled
> - Add the appropriate URLs or hosts to the custom allowlist in this repository's [Copilot coding agent settings](https://github.com/microsoft/WinML-ModelKit/settings/copilot/coding_agent) (admins only)
>
> </details>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chinazhangchao <3822520+chinazhangchao@users.noreply.github.com>
Co-authored-by: Charles Zhang <zhangchao@microsoft.com>
Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
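The shape relationship between the two inputs can be sketched in plain Python. This is an illustration only: the real mixin presumably builds an `int64` tensor, whereas nested lists stand in for tensors here so the example has no dependencies, and both function names are hypothetical.

```python
# Dynamic axes for pixel_mask, as given in the export contract above.
PIXEL_MASK_AXES = {0: "batch_size", 1: "height", 2: "width"}


def pixel_mask_shape(pixel_values_shape):
    """Derive [batch, height, width] from [batch, channels, height, width]."""
    batch, _channels, height, width = pixel_values_shape
    return (batch, height, width)


def make_pixel_mask(pixel_values_shape):
    """Build an all-ones mask (nested lists standing in for an int64 tensor)."""
    batch, height, width = pixel_mask_shape(pixel_values_shape)
    return [[[1] * width for _ in range(height)] for _ in range(batch)]
```

An all-ones mask marks every pixel as valid, which suits dummy export inputs; at inference time, padded regions of a non-square image would be zeros instead.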

…nfigs + MLAS results

- generate_example_configs.py: NPU keeps w8a8/w8a16/fp16 sweep; CPU/GPU EPs now generate a single config without --precision (uses EP default).
- Update examples/generate_config.md and examples/test_config.md accordingly.
- Delete previous CPU/GPU configs and results under mlas/cpu, openvino/cpu, qnn/gpu, openvino/gpu, nv_tensorrt_rtx/gpu, dml/gpu.
- Regenerate CPU configs (126) and GPU configs (252) using new scheme.
- Run perf+eval for MLAS CPU on this machine (in progress; partial results committed).

Run finished 63/63. Eval results: PASS=49, FAIL=3, TIMEOUT=11.

- Errors: microsoft/table-transformer-detection (object-detection), rizvandwiki/gender-classification (image-classification), w11wo/indonesian-roberta-base-posp-tagger (token-classification), Salesforce/blip-image-captioning-base (perf only; eval passed).
- Timeouts (mostly large CPU-bound eval datasets): 5x fill-mask (FacebookAI/roberta-*, FacebookAI/xlm-roberta-*, google-bert/bert-base-multilingual-uncased), 6x zero-shot-image-classification (laion/CLIP-*, openai/clip-vit-*).

…re); add report generator script

- New scripts/generate_example_report.py generates REPORT.md from existing config/perf/eval/error/timeout artifacts for any EP/hardware folder.
- Generated examples/mlas/cpu/REPORT.md from the recent MLAS run.
- Force-added 62 *_perf.json files under examples/mlas/cpu/ so the table links in the report resolve (matches existing pattern under nv_tensorrt_rtx).
- SUMMARY.md: removed broken REPORT.md links for openvino/cpu and qnn/gpu (no data yet for those EPs after CPU/GPU regen).

ssss141414 added 7 commits April 27, 2026 21:18

Add eval_dataset support in config and eval command

54c274b

fix comments

a3f3e89

fix: add perf step to run_example_tests.py and update docs

f579f35

Add qnn npu example test results

326981d

Update qnn npu rerun test results

6acef7b

ssss141414 force-pushed the shzhen/examples branch from 09c9f28 to 6acef7b Compare April 30, 2026 03:34

ssss141414 added 13 commits April 30, 2026 11:35

Add OV NPU perf results (144 configs)

2b77271

Restore OV NPU eval results (135 PASS)

1489f07

feat: add test reports with SUMMARY.md (restore AMD eval results)

fd066e7

fix: show latency/throughput in perf and full metrics in eval reports

a585a16

fix: remove slash in perf link text to avoid GitHub 404

7cb86c5

fix: unignore perf.json in examples/ and add 140 AMD perf results

bc8d7c7

Merge branch 'main' of https://github.com/microsoft/WinML-ModelKit in…

934fa0b

…to shzhen/examples

Add configs for new zero-shot models

0cebceb

Add configs for new zero-shot models

fca1a13

Merge branch 'shzhen/examples' of https://github.com/microsoft/WinML-…

2bb4cde

…ModelKit into shzhen/examples

Merge main, keep examples config updates for testing

011cddf

Restore scripts/run_example_tests.py

08e7223

Reorganize examples into EP/hardware layout

7881de8

ssss141414 force-pushed the shzhen/examples branch from fbf16be to d0eabd0 Compare May 9, 2026 01:31

ssss141414 force-pushed the shzhen/examples branch from d0eabd0 to 302805f Compare May 9, 2026 01:33

ssss141414 added 3 commits May 9, 2026 18:44

QNN GPU test results: 192 configs, perf=162 eval=105 errors=53 timeou…

5a97a8f

…ts=65

ssss141414 force-pushed the shzhen/examples branch from 901bfe5 to 5a97a8f Compare May 12, 2026 08:21

ssss141414 added 2 commits May 12, 2026 16:40

Update example reports with latest QNN NPU/GPU results

360ea17

Remove QNN report generation outputs

26fdc6f

timenick and others added 28 commits May 15, 2026 11:56

Update summary metrics and push in-progress QNN rerun artifacts

f3c12d4

Add MLAS CPU perf results (152 perf.json files previously gitignored)

92792b3

Update MLAS CPU perf pass rate in SUMMARY.md (152/152)

864e525

Add OpenVINO CPU perf results and update summary

556e5f4

- Force-add 192 perf.json files (previously gitignored)
- Update SUMMARY.md: OV CPU perf 0/192 -> 192/192 (100%)

Add NVIDIA TensorRT RTX perf results and update summary

0a15a2a

- Force-add 192 perf.json files (previously gitignored)
- Update SUMMARY.md perf pass rate: 192/192 (100%)

examples: add NVIDIA TensorRT RTX GPU perf/eval results (55 models, 6…

1f7affe

…3 configs)

examples/mlas/cpu: rerun timeouts with --timeout 3600 (9 PASS, 2 stil…

5a3bc2a

…l TIMEOUT)

examples: re-run TRT RTX failed/timeout tests (62/63 perf, 58/63 eval)

8f4d959

Merge branch 'shzhen/examples' of https://github.com/microsoft/WinML-…

c678ee4

…ModelKit into shzhen/examples

examples/SUMMARY: update counts for MLAS rerun; mark pending re-run f…

69e4978

…or CPU/GPU EPs

Add DML GPU test results and REPORT.md (18/63 perf, 13/63 eval)

c6f7826

Add DML GPU perf results (force-add past gitignore)

7e7a3a4

Update SUMMARY.md with DML GPU results

0045ba2

Add OpenVINO CPU/GPU test results (new single-config format)

55c0ef4

- CPU: 55 models, 63 configs, perf 62/63 (98%), eval 46/63 (73%)
- GPU: 55 models, 63 configs, perf 62/63 (98%), eval 53/63 (84%)
- Updated SUMMARY.md and generated REPORT.md for both targets
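The pass rates quoted in these commit messages are consistent with simple rounding of passed over total. The sketch below reproduces the figures above under that assumption; the source does not say how `SUMMARY.md` rounds, and `pass_rate` is a hypothetical helper.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass count as 'passed/total (NN%)' with a rounded percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: percentages are rounded to the nearest whole number.
    percent = round(100 * passed / total)
    return f"{passed}/{total} ({percent}%)"


print(pass_rate(62, 63))  # perf, CPU and GPU
print(pass_rate(46, 63))  # eval, CPU
print(pass_rate(53, 63))  # eval, GPU
```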

Update QNN GPU summary/report and rerun artifacts

17125d0

Add QNN GPU perf artifacts

9eb1365

Add QNN NPU perf artifacts

c5d565e

Update DML GPU results: 34/63 perf (54%), 28/63 eval (44%)

05c854b

Resolve merge conflict in SUMMARY.md, update DML GPU results

682570d

Add OpenVINO NPU perf results (50 perf.json files previously gitignored)

54c1150

Add VitisAI NPU perf results (36 perf.json files previously gitignored)

052c4b0

ssss141414 closed this May 19, 2026

ssss141414 changed the title ~~Shzhen/examples~~ winml1.8 recipes May 25, 2026

winml1.8 recipes #407
ssss141414 wants to merge 77 commits into main from shzhen/examples

9 participants
