winml1.8 recipes #407
Closed
ssss141414 wants to merge 77 commits into
- Rename eval_dataset to eval_option with WinMLEvaluationConfig type
- Dynamic load WinMLEvaluationConfig in build config (lazy import)
- Rename build_script to dataset_script
- Add --dataset-script and --trust-remote-code CLI options
- Decouple config defaults from script execution logic
- Simplify: dataset script prints path to stdout, no cache_dir logic
- Config section only provides default values, no file existence checks
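The "dataset script prints path to stdout" contract described above could be consumed roughly as follows. This is a minimal sketch, assuming the script runs as a child Python process and that its last non-empty stdout line is the dataset path; `run_dataset_script` and the inline stand-in script are hypothetical and not taken from the PR.

```python
import subprocess
import sys
from pathlib import Path


def run_dataset_script(script_args: list[str]) -> Path:
    """Run a dataset script and return the path it printed to stdout.

    Hypothetical helper: the PR only states that the script prints a path.
    """
    proc = subprocess.run(
        [sys.executable, *script_args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    lines = [line.strip() for line in proc.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("dataset script printed no path")
    # Assumption: the last non-empty line is the dataset path.
    return Path(lines[-1])


# Stand-in for a real dataset script: it only prints a path.
dataset_path = run_dataset_script(["-c", "print('data/eval_set')"])
print(dataset_path)
```

Keeping the caller free of cache or existence checks mirrors the "no cache_dir logic" bullet: the script alone decides where the data lives.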
- 432 configs: 48 models × 3 EPs (amd/qnn/ov) × 3 precisions (w8a8/w8a16/fp16)
- eval_option injected from models_with_acc.json where available
- scripts/generate_example_configs.py: batch config generation
- scripts/run_example_tests.py: batch eval testing
- examples/generate_config.md: guide for generating configs
- examples/test_config.md: guide for testing configs
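The 432 figure is a plain Cartesian product, which can be checked with a short sketch. The model names below are placeholders and the dictionary layout is an assumption; only the EP and precision labels come from the commit message.

```python
from itertools import product

EPS = ("amd", "qnn", "ov")
PRECISIONS = ("w8a8", "w8a16", "fp16")


def config_matrix(models):
    """Return one config stub per (model, EP, precision) combination."""
    return [
        {"model": model, "ep": ep, "precision": precision}
        for model, ep, precision in product(models, EPS, PRECISIONS)
    ]


# Placeholder model ids; the real list has 48 entries.
models = [f"model-{i:02d}" for i in range(48)]
configs = config_matrix(models)
print(len(configs))  # 48 * 3 * 3 = 432
```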
09c9f28 to 6acef7b
…to shzhen/examples
…ModelKit into shzhen/examples
fbf16be to d0eabd0
…U eval results

- generate_example_configs.py: add --ep and --hardware args to filter EP generation
- QNN NPU: 50 new eval results (zero-shot-classification, zero-shot-image-classification)
- 21 eval.json results, 2 timeouts, 27 error files (CLIP/SigLIP eval failures)
d0eabd0 to 302805f
- Added zero-shot-classification results for cross-encoder, lxyuan, MoritzLaurer DeBERTa/mDeBERTa models
- Added zero-shot-image-classification results for CLIP, SigLIP, fashion-clip models
- Updated segformer-b5 with new w8a8 perf/eval results
- Fixed MoritzLaurer/deberta-v3-large-zeroshot-v2.0 fp16 timeout (now PASS)
- Updated REPORT.md (48 -> 64 models) and SUMMARY.md
- Tested 57 new configs across 16 model+task combinations
- 7 new models fully pass (cross-encoder, joeddav, lxyuan, 4x MoritzLaurer)
- 9 zero-shot-image-classification combos fail (CLIP, SigLIP, fashion-clip)
- 2 large fill-mask models timeout (roberta-large, xlm-roberta-large)
- nvidia/segformer-b5 eval failures confirmed
- Updated REPORT.md: 48 -> 64 models, eval pass 93% -> 81%
- Updated SUMMARY.md with new stats
901bfe5 to 5a97a8f
Closes #583

## Summary

Rename the product from **"ModelKit"** / **"WinML ModelKit"** to **"WinML CLI"** everywhere users see the name. CLI binary (`winml`) and Python module path (`winml.modelkit`) are unchanged.

90 files, 314 / 316 lines (nearly symmetric: pure rename, near-zero behavior change).

## What changed

### User-visible (the main goal)

- CLI `--help` text + module docstrings (`cli.py`, `__init__.py`, `__main__.py`)
- All subcommand `--help` (`build`, `catalog`, `config`, `inspect`, `serve`, `sys`)
- Serve API titles, console banners, `server_info["name"]` response
- Rich UI panel titles (catalog, sys)
- Runtime rule information messages
- README, CONTRIBUTING, SUPPORT, docs/Privacy, docs/naming-convention
- `serve/static/index.html` (browser title, logo, UI strings, code examples, MCP server identifiers, Claude Code skill filename)
- `scripts/mcp_server.py` (FastMCP server name, descriptions, log messages)
- Stale `wmk` CLI command examples in model docstrings -> `winml`

### Internal rename for consistency

- Telemetry event names `ModelKit{Heartbeat,Action,Error}` -> `WinMLCLI{...}` (safe: not yet in production use, no dashboard coordination required)
- Cache filename `modelkit.cache` -> `winmlcli.cache`
- Internal attribute flags `_modelkit_*` -> `_winmlcli_*` in `telemetry/`
- Env vars `MODELKIT_*` -> `WINMLCLI_*`: `_RULES_DIR`, `_SHOW_ALL_WARNINGS`, `_TIMING_LOG`, `_TELEMETRY_CACHE_DIR`
- Internal module docstrings (`cache/`, `core/`, `onnx/`, `utils/`, etc)
- `ModelKitPlugin` class identifier in Semantic Kernel example
- `producer_name` strings in test fixtures + `pattern/base.py`

### Adjacent fixes pulled in

- `pyproject.toml` URLs: `github.com/microsoft/ModelKit` -> `WinML-ModelKit` (stale 404 URLs; real repo is `microsoft/WinML-ModelKit`)
- Same URL fix in `README.md`, `SUPPORT.md`, `CONTRIBUTING.md`
- `WML` abbreviation -> `WinML` expansion in `__author__` and export subsystem docstring (team identity unchanged)
- Stale duplicate copyright block (Apache-2.0 SPDX) removed from `graphpipe/builders` test assets; verified original Microsoft code, no Apache use

## Naming convention chosen

Per `docs/naming-convention.md` (3-letter acronyms stay uppercase):

- PascalCase: `WinMLCLI` (e.g. event names, class names)
- ALL_CAPS env: `WINMLCLI_*` (e.g. `WINMLCLI_RULES_DIR`)
- lowercase id: `winmlcli` (e.g. cache filename, MCP server name)
- Two-word user-facing: `WinML CLI` (e.g. docs, help text)
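The four spellings in the naming convention, and the env-var rename listed earlier in this commit message, can both be derived mechanically. The following is an illustrative sketch only; `name_forms` and `rename_env_var` are hypothetical helpers, not code from the repository.

```python
OLD_PREFIX = "MODELKIT"
NEW_PREFIX = "WINMLCLI"
# Suffixes as listed in the commit message.
ENV_SUFFIXES = ("_RULES_DIR", "_SHOW_ALL_WARNINGS", "_TIMING_LOG", "_TELEMETRY_CACHE_DIR")


def name_forms(words=("WinML", "CLI")):
    """Derive the four spellings of the product name from its two words."""
    joined = "".join(words)
    return {
        "pascal": joined,            # event names, class names
        "env": joined.upper(),       # environment variable prefix
        "id": joined.lower(),        # cache filename, MCP server name
        "display": " ".join(words),  # docs, help text
    }


def rename_env_var(name: str) -> str:
    """Map MODELKIT_* to WINMLCLI_*; leave unrelated names untouched."""
    if name.startswith(OLD_PREFIX + "_"):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name


forms = name_forms()
env_mapping = {OLD_PREFIX + s: rename_env_var(OLD_PREFIX + s) for s in ENV_SUFFIXES}
print(forms)
print(env_mapping)
```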
…--device/--ep (#588)

## Summary

- Validate numeric perf options at parse time using `click.IntRange`: `--iterations` must be `>= 1`, `--warmup` must be `>= 0`. Previously perf accepted `0` (or even negative) iterations and silently produced empty/garbage stats.
- Replace three hand-rolled Click option blocks with the shared `cli_utils` helpers already used by `run`, `serve`, `eval`, and `catalog`:
  - `--model -m` → `cli_utils.model_option(required=False)` (renames the perf-local kwarg from `model_id` to `model` to match the helper; downstream `BenchmarkConfig.model_id`, `generate_output_path`, and `_run_onnx_benchmark` are unaffected).
  - `--device` → `cli_utils.device_option(required=False, default="auto", include_auto=True)`.
  - `--ep` → `cli_utils.ep_option(required=False, optional_message="Overrides device-to-provider mapping.")`. Side benefit: `--ep` typos are now caught at parse time instead of failing deep in the build pipeline.

Net diff: `+12 / -30` lines in `src/winml/modelkit/commands/perf.py`.

Closes #552

Co-authored-by: hualxie <hualxie@microsoft.com>
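For readers unfamiliar with `click.IntRange`, the parse-time validation described above looks roughly like this in isolation. This is a standalone sketch, not the actual `perf.py`: the command body and the default values (10 and 2) are assumptions, and the shared `cli_utils` helpers are deliberately left out because their implementation is not shown in the PR.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--iterations", type=click.IntRange(min=1), default=10, show_default=True)
@click.option("--warmup", type=click.IntRange(min=0), default=2, show_default=True)
def perf(iterations: int, warmup: int) -> None:
    """Toy stand-in for the perf command (defaults are assumed values)."""
    click.echo(f"iterations={iterations} warmup={warmup}")


runner = CliRunner()
ok = runner.invoke(perf, ["--iterations", "5", "--warmup", "0"])
bad = runner.invoke(perf, ["--iterations", "0"])  # rejected before perf() runs
print(ok.exit_code, bad.exit_code)
```

Because an out-of-range value is rejected as a usage error, the command body never sees `0` iterations and no empty statistics are produced.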
…it codes (#597)

## Summary

Five `sys.exit(N)` paths in `commands/perf.py` used inconsistent, non-standard exit codes. Most importantly, the no-modules-matched user-error path called `sys.exit(0)`, masking a typo as success and silently breaking CI.

This PR replaces all five with the proper Click exception types:

| Path | Before | After | Exit |
|---|---|---|---|
| Module config generation failed | `sys.exit(3)` | `click.ClickException` | 1 |
| **No modules matched `--module CLASSNAME`** | `sys.exit(0)` ← the bug | `click.UsageError` | **2** |
| Defensive: configs missing `model_type` | `sys.exit(3)` | `click.ClickException` | 1 |
| Model file not found | `sys.exit(3)` | `click.UsageError` | 2 |
| Generic benchmark failure | `sys.exit(4)` | `click.ClickException` | 1 |

User-facing errors now exit 2 (Click's `UsageError` convention), runtime failures exit 1, success exits 0, matching the rest of the CLI. The now-unused `import sys` is also dropped.

Added `test_module_no_match_exits_nonzero` as a regression guard for the specific scenario the issue described.

End-to-end repro of the bug now behaves correctly:

```
$ winml perf -m microsoft/resnet-50 --module DoesNotExist --ep cpu --device cpu
Generating module configs for DoesNotExist...
Usage: winml perf [OPTIONS]
Try 'winml perf --help' for help.

Error: No modules matching 'DoesNotExist' found
$ echo $?
2
```

Closes #554

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
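The exit-code mapping in the table can be reproduced with a toy command. This is a minimal sketch under stated assumptions: `KNOWN_MODULES` and the `--simulate-failure` flag are hypothetical and exist only to trigger each path; the real `perf.py` logic is not shown in the PR.

```python
import click
from click.testing import CliRunner

KNOWN_MODULES = {"ResNetEncoder"}  # hypothetical module names


@click.command()
@click.option("--module", "module_name", default=None)
@click.option("--simulate-failure", is_flag=True)  # hypothetical, for the demo only
def perf(module_name, simulate_failure):
    if module_name is not None and module_name not in KNOWN_MODULES:
        # User error: Click prints usage plus the message and exits with 2.
        raise click.UsageError(f"No modules matching '{module_name}' found")
    if simulate_failure:
        # Runtime failure: Click prints the message and exits with 1.
        raise click.ClickException("Benchmark failed")
    click.echo("ok")


runner = CliRunner()
codes = {
    "success": runner.invoke(perf, []).exit_code,
    "usage_error": runner.invoke(perf, ["--module", "DoesNotExist"]).exit_code,
    "runtime_error": runner.invoke(perf, ["--simulate-failure"]).exit_code,
}
print(codes)
```

Raising the exception types instead of calling `sys.exit(N)` leaves both the exit code and the error formatting to Click, which is what keeps the command consistent with the rest of the CLI.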
…square detection accuracy (#479)

DETR-family exports currently omit `pixel_mask`, so non-square inputs must be distorted or padded without masking, which drives significant object-detection regression. This PR updates export config behavior so DETR/Table Transformer ONNX models retain `pixel_mask` as an input aligned to image spatial dimensions.

- **ONNX config override for DETR-family inputs**
  - Added `DetrIOConfig` and `TableTransformerIOConfig` overrides in `src/winml/modelkit/models/hf/detr.py`.
  - Registered overrides for relevant tasks (`object-detection` plus existing DETR export tasks) via `register_onnx_overwrite`.
  - Extended input spec to include:
    - `pixel_values`: `[batch, channels, height, width]`
    - `pixel_mask`: `[batch, height, width]`
- **Dummy input generation updated for export path**
  - Added shared mixin logic to inject `pixel_mask` during dummy input generation.
  - `pixel_mask` is generated as `int64` ones with spatial shape derived from `pixel_values`.
- **Focused DETR/Table Transformer coverage**
  - Expanded `tests/unit/models/detr/test_onnx_config.py` to validate:
    - config registration resolves to the new overrides
    - `pixel_mask` is present in ONNX input metadata with correct dynamic axes
    - generated dummy inputs include correctly shaped/dtyped `pixel_mask` for both `detr` and `table-transformer`

Example of the resulting export input contract:

```python
{
    "pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"},
    "pixel_mask": {0: "batch_size", 1: "height", 2: "width"},
}
```

> [!WARNING]
>
> <details>
> <summary>Firewall rules blocked me from connecting to one or more addresses (expand for details)</summary>
>
> #### I tried to connect to the following addresses, but was blocked by firewall rules:
>
> - `download-r2.pytorch.org`
>   - Triggering command: `/usr/bin/python python -m pip install torch --index-url REDACTED` (dns block)
>
> If you need me to access, download, or install something from one of these locations, you can either:
>
> - Configure [Actions setup steps](https://gh.io/copilot/actions-setup-steps) to set up my environment, which run before the firewall is enabled
> - Add the appropriate URLs or hosts to the custom allowlist in this repository's [Copilot coding agent settings](https://github.com/microsoft/WinML-ModelKit/settings/copilot/coding_agent) (admins only)
>
> </details>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chinazhangchao <3822520+chinazhangchao@users.noreply.github.com>
Co-authored-by: Charles Zhang <zhangchao@microsoft.com>
Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
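The relationship between `pixel_values` and `pixel_mask` described in this commit (same batch and spatial dimensions, no channel axis, all ones) can be illustrated without any ML dependencies. This is a dependency-free sketch: the two helper functions are hypothetical, and nested lists of Python ints stand in for the `int64` tensor the PR generates.

```python
def pixel_mask_axes(pixel_values_axes: dict) -> dict:
    """Drop the channel axis (index 1) and renumber the remaining axes."""
    kept = [name for idx, name in sorted(pixel_values_axes.items()) if idx != 1]
    return dict(enumerate(kept))


def dummy_pixel_mask(pixel_values_shape: tuple) -> list:
    """Build an all-ones mask of shape [batch, height, width]."""
    batch, _channels, height, width = pixel_values_shape
    return [[[1] * width for _ in range(height)] for _ in range(batch)]


pixel_values_axes = {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}
mask_axes = pixel_mask_axes(pixel_values_axes)
mask = dummy_pixel_mask((2, 3, 4, 6))  # non-square: height 4, width 6
print(mask_axes)
print(len(mask), len(mask[0]), len(mask[0][0]))
```

An all-ones mask marks every pixel as valid, which is enough for tracing the export; at inference time a padded non-square image would carry zeros over the padded region.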
- Force-add 192 perf.json files (previously gitignored)
- Update SUMMARY.md: OV CPU perf 0/192 -> 192/192 (100%)
- Force-add 192 perf.json files (previously gitignored)
- Update SUMMARY.md perf pass rate: 192/192 (100%)
…nfigs + MLAS results

- generate_example_configs.py: NPU keeps w8a8/w8a16/fp16 sweep; CPU/GPU EPs now generate a single config without --precision (uses EP default).
- Update examples/generate_config.md and examples/test_config.md accordingly.
- Delete previous CPU/GPU configs and results under mlas/cpu, openvino/cpu, qnn/gpu, openvino/gpu, nv_tensorrt_rtx/gpu, dml/gpu.
- Regenerate CPU configs (126) and GPU configs (252) using new scheme.
- Run perf+eval for MLAS CPU on this machine (in progress; partial results committed).
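The regenerated counts are consistent with one config per entry per EP folder. The sketch below assumes 63 model+task entries, a number suggested by the later "Run finished 63/63" line but not stated in this commit; the entry names and `configs_for` are placeholders, while the EP folder names come from the deleted paths listed above.

```python
NPU_PRECISIONS = ("w8a8", "w8a16", "fp16")
CPU_EPS = ("mlas", "openvino")
GPU_EPS = ("qnn", "openvino", "nv_tensorrt_rtx", "dml")


def configs_for(entries, ep, hardware):
    """NPU keeps the precision sweep; CPU/GPU get one config with no precision."""
    precisions = NPU_PRECISIONS if hardware == "npu" else (None,)
    return [
        {"entry": entry, "ep": ep, "hardware": hardware, "precision": precision}
        for entry in entries
        for precision in precisions
    ]


# Assumption: 63 model+task entries (placeholder names).
entries = [f"entry-{i:02d}" for i in range(63)]
cpu_total = sum(len(configs_for(entries, ep, "cpu")) for ep in CPU_EPS)
gpu_total = sum(len(configs_for(entries, ep, "gpu")) for ep in GPU_EPS)
npu_per_ep = len(configs_for(entries, "qnn", "npu"))
print(cpu_total, gpu_total, npu_per_ep)
```

Under that assumption the totals come out to 126 CPU and 252 GPU configs, matching the numbers in the commit message.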
Run finished 63/63. Eval results: PASS=49, FAIL=3, TIMEOUT=11.

- Errors: microsoft/table-transformer-detection (object-detection), rizvandwiki/gender-classification (image-classification), w11wo/indonesian-roberta-base-posp-tagger (token-classification), Salesforce/blip-image-captioning-base (perf only; eval passed).
- Timeouts (mostly large CPU-bound eval datasets): 5x fill-mask (FacebookAI/roberta-*, FacebookAI/xlm-roberta-*, google-bert/bert-base-multilingual-uncased), 6x zero-shot-image-classification (laion/CLIP-*, openai/clip-vit-*).
…ModelKit into shzhen/examples
…re); add report generator script

- New scripts/generate_example_report.py generates REPORT.md from existing config/perf/eval/error/timeout artifacts for any EP/hardware folder.
- Generated examples/mlas/cpu/REPORT.md from the recent MLAS run.
- Force-added 62 *_perf.json files under examples/mlas/cpu/ so the table links in the report resolve (matches existing pattern under nv_tensorrt_rtx).
- SUMMARY.md: removed broken REPORT.md links for openvino/cpu and qnn/gpu (no data yet for those EPs after CPU/GPU regen).
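A report generator of the kind described above boils down to checking which artifacts exist next to each config. The sketch below is not `scripts/generate_example_report.py`; apart from `*_perf.json`, every file suffix (`_config.json`, `_eval.json`, `_error.txt`, `_timeout.txt`) and the table layout are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def eval_status(folder: Path, stem: str) -> str:
    """Classify eval outcome from which artifact exists (assumed suffixes)."""
    if (folder / f"{stem}_eval.json").exists():
        return "PASS"
    if (folder / f"{stem}_timeout.txt").exists():
        return "TIMEOUT"
    if (folder / f"{stem}_error.txt").exists():
        return "FAIL"
    return "MISSING"


def build_report(folder) -> str:
    """Render a markdown table with one row per config in an EP/hardware folder."""
    folder = Path(folder)
    lines = ["| Config | Perf | Eval |", "|---|---|---|"]
    for cfg in sorted(folder.glob("*_config.json")):
        stem = cfg.name[: -len("_config.json")]
        perf = "PASS" if (folder / f"{stem}_perf.json").exists() else "FAIL"
        lines.append(f"| {stem} | {perf} | {eval_status(folder, stem)} |")
    return "\n".join(lines)


# Demo on a throwaway folder with two fake configs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a_config.json", "a_perf.json", "a_eval.json",
                 "b_config.json", "b_timeout.txt"):
        (root / name).write_text("{}")
    report = build_report(root)
print(report)
```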
CPU: 55 models, 63 configs, perf 62/63 (98%), eval 46/63 (73%)
GPU: 55 models, 63 configs, perf 62/63 (98%), eval 53/63 (84%)
Updated SUMMARY.md and generated REPORT.md for both targets
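The percentages in these summary lines follow from rounding each pass count over the 63 configs. A small sketch to reproduce them; `pass_rate` is a hypothetical helper, and only the counts come from the commit message.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass count as 'passed/total (NN%)', rounded to a whole percent."""
    return f"{passed}/{total} ({round(100 * passed / total)}%)"


summary = {
    "CPU": {"perf": pass_rate(62, 63), "eval": pass_rate(46, 63)},
    "GPU": {"perf": pass_rate(62, 63), "eval": pass_rate(53, 63)},
}
for target, stats in summary.items():
    print(f"{target}: perf {stats['perf']}, eval {stats['eval']}")
```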
No description provided.