
winml1.8 recipes#407

Closed
ssss141414 wants to merge 77 commits into `main` from `shzhen/examples`


@ssss141414

Contributor

No description provided.

- Rename eval_dataset to eval_option with WinMLEvaluationConfig type
- Dynamically load WinMLEvaluationConfig in build config (lazy import)
- Rename build_script to dataset_script
- Add --dataset-script and --trust-remote-code CLI options
- Decouple config defaults from script execution logic
- Simplify: dataset script prints path to stdout, no cache_dir logic
- Config section only provides default values, no file existence checks
- 432 configs: 48 models × 3 EPs (amd/qnn/ov) × 3 precisions (w8a8/w8a16/fp16)
- eval_option injected from models_with_acc.json where available
- scripts/generate_example_configs.py: batch config generation
- scripts/run_example_tests.py: batch eval testing
- examples/generate_config.md: guide for generating configs
- examples/test_config.md: guide for testing configs
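A minimal sketch of the batch generation as a Cartesian product, with `eval_option` attached only where accuracy metadata exists. The model IDs, the config dict layout, and the `acc_table` entry are hypothetical placeholders; `scripts/generate_example_configs.py` may be structured differently.

```python
from itertools import product

# Placeholder model IDs: only the counts (48 models, 3 EPs, 3 precisions)
# come from the commit message; the real list lives in the repo.
MODELS = [f"org/model-{i:02d}" for i in range(48)]
EPS = ["amd", "qnn", "ov"]
PRECISIONS = ["w8a8", "w8a16", "fp16"]


def build_config(model, ep, precision, acc_table):
    """Return a config dict; eval_option is injected only when known.

    The dict layout is hypothetical; acc_table stands in for the
    contents of models_with_acc.json.
    """
    config = {"model": model, "ep": ep, "precision": precision}
    if model in acc_table:
        config["eval_option"] = acc_table[model]
    return config


acc_table = {"org/model-00": {"dataset": "placeholder"}}  # hypothetical entry
configs = [
    build_config(model, ep, precision, acc_table)
    for model, ep, precision in product(MODELS, EPS, PRECISIONS)
]
print(len(configs))  # 432
print(sum("eval_option" in c for c in configs))  # 9 (one model x 3 EPs x 3 precisions)
```

Keeping defaults in the generated config, rather than in the execution logic, matches the "config section only provides default values" rule above.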
…U eval results

- generate_example_configs.py: add --ep and --hardware args to filter EP generation
- QNN NPU: 50 new eval results (zero-shot-classification, zero-shot-image-classification)
  - 21 eval.json results, 2 timeouts, 27 error files (CLIP/SigLIP eval failures)
ssss141414 added 3 commits May 9, 2026 18:44
- Added zero-shot-classification results for cross-encoder, lxyuan, MoritzLaurer DeBERTa/mDeBERTa models
- Added zero-shot-image-classification results for CLIP, SigLIP, fashion-clip models
- Updated segformer-b5 with new w8a8 perf/eval results
- Fixed MoritzLaurer/deberta-v3-large-zeroshot-v2.0 fp16 timeout (now PASS)
- Updated REPORT.md (48->64 models) and SUMMARY.md
- Tested 57 new configs across 16 model+task combinations
- 7 new models fully pass (cross-encoder, joeddav, lxyuan, 4x MoritzLaurer)
- 9 zero-shot-image-classification combos fail (CLIP, SigLIP, fashion-clip)
- 2 large fill-mask models timeout (roberta-large, xlm-roberta-large)
- nvidia/segformer-b5 eval failures confirmed
- Updated REPORT.md: 48 -> 64 models, eval pass 93% -> 81%
- Updated SUMMARY.md with new stats
timenick and others added 28 commits May 15, 2026 11:56
Closes #583

## Summary

Rename the product from **"ModelKit"** / **"WinML ModelKit"** to
**"WinML CLI"** everywhere users see the name. CLI binary (`winml`) and
Python module path (`winml.modelkit`) are unchanged.

90 files, 314 / 316 lines (nearly symmetric: a pure rename with near-zero
behavior change).

## What changed

### User-visible (the main goal)
- CLI `--help` text + module docstrings (`cli.py`, `__init__.py`,
`__main__.py`)
- All subcommand `--help` (`build`, `catalog`, `config`, `inspect`,
`serve`, `sys`)
- Serve API titles, console banners, `server_info["name"]` response
- Rich UI panel titles (catalog, sys)
- Runtime rule information messages
- README, CONTRIBUTING, SUPPORT, docs/Privacy, docs/naming-convention
- `serve/static/index.html` (browser title, logo, UI strings, code
examples, MCP server identifiers, Claude Code skill filename)
- `scripts/mcp_server.py` (FastMCP server name, descriptions, log
messages)
- Stale `wmk` CLI command examples in model docstrings -> `winml`

### Internal rename for consistency
- Telemetry event names `ModelKit{Heartbeat,Action,Error}` ->
`WinMLCLI{...}` (safe: not yet in production use, no dashboard
coordination required)
- Cache filename `modelkit.cache` -> `winmlcli.cache`
- Internal attribute flags `_modelkit_*` -> `_winmlcli_*` in
`telemetry/`
- Env vars `MODELKIT_*` -> `WINMLCLI_*`: `_RULES_DIR`,
`_SHOW_ALL_WARNINGS`, `_TIMING_LOG`, `_TELEMETRY_CACHE_DIR`
- Internal module docstrings (`cache/`, `core/`, `onnx/`, `utils/`, etc.)
- `ModelKitPlugin` class identifier in Semantic Kernel example
- `producer_name` strings in test fixtures + `pattern/base.py`

### Adjacent fixes pulled in
- `pyproject.toml` URLs: `github.com/microsoft/ModelKit` ->
`WinML-ModelKit` (stale 404 URLs; real repo is
`microsoft/WinML-ModelKit`)
- Same URL fix in `README.md`, `SUPPORT.md`, `CONTRIBUTING.md`
- `WML` abbreviation -> `WinML` expansion in `__author__` and export
subsystem docstring (team identity unchanged)
- Stale duplicate copyright block (Apache-2.0 SPDX) removed from
`graphpipe/builders` test assets; verified as original Microsoft code with
no Apache use

## Naming convention chosen

Per `docs/naming-convention.md` (3-letter acronyms stay uppercase):
- PascalCase: `WinMLCLI` (e.g. event names, class names)
- ALL_CAPS env: `WINMLCLI_*` (e.g. `WINMLCLI_RULES_DIR`)
- lowercase id: `winmlcli` (e.g. cache filename, MCP server name)
- Two-word user-facing: `WinML CLI` (e.g. docs, help text)
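A small sketch of how the four spellings map onto concrete identifiers. The helper functions are hypothetical; the resulting names (`WinMLCLIHeartbeat`, `WINMLCLI_RULES_DIR`, `winmlcli.cache`) are the ones listed in this PR.

```python
import os

# The four spellings from the naming convention above.
NAMES = {
    "pascal": "WinMLCLI",        # event and class names
    "env_prefix": "WINMLCLI_",   # environment variables
    "lower_id": "winmlcli",      # cache filename, MCP server name
    "display": "WinML CLI",      # docs and help text
}


def telemetry_event(kind):
    """Hypothetical helper: PascalCase event name, e.g. WinMLCLIHeartbeat."""
    return f"{NAMES['pascal']}{kind}"


def env_var(suffix):
    """Hypothetical helper: ALL_CAPS environment variable name."""
    return f"{NAMES['env_prefix']}{suffix}"


def cache_filename():
    """Hypothetical helper: lowercase identifier used as the cache filename."""
    return f"{NAMES['lower_id']}.cache"


# Reading a renamed variable; returns None when it is not set.
rules_dir = os.environ.get(env_var("RULES_DIR"))
print(telemetry_event("Heartbeat"), env_var("RULES_DIR"), cache_filename())
```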
…--device/--ep (#588)

## Summary

- Validate numeric perf options at parse time using `click.IntRange`:
`--iterations` must be `>= 1`, `--warmup` must be `>= 0`. Previously
perf accepted `0` (or even negative) iterations and silently produced
empty/garbage stats.
- Replace three hand-rolled Click option blocks with the shared
  `cli_utils` helpers already used by `run`, `serve`, `eval`, and
  `catalog`:
  - `--model -m` → `cli_utils.model_option(required=False)` (renames the
    perf-local kwarg from `model_id` to `model` to match the helper;
    downstream `BenchmarkConfig.model_id`, `generate_output_path`, and
    `_run_onnx_benchmark` are unaffected).
  - `--device` → `cli_utils.device_option(required=False, default="auto",
    include_auto=True)`.
  - `--ep` → `cli_utils.ep_option(required=False,
    optional_message="Overrides device-to-provider mapping.")`. Side
    benefit: `--ep` typos are now caught at parse time instead of failing
    deep in the build pipeline.

Net diff: `+12 / -30` lines in `src/winml/modelkit/commands/perf.py`.
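A minimal sketch of both ideas with plain Click: `click.IntRange` rejects out-of-range values at parse time, and an option factory built on `click.Choice` catches `--ep` typos before any work starts. `ep_option` here is a hypothetical stand-in for `cli_utils.ep_option`, and the EP list and option defaults are placeholders.

```python
import click
from click.testing import CliRunner

# Placeholder EP list; the real cli_utils helper has its own set.
SUPPORTED_EPS = ["cpu", "dml", "qnn", "openvino"]


def ep_option(required=False, optional_message=""):
    """Hypothetical shared option factory in the spirit of cli_utils.ep_option."""
    return click.option(
        "--ep",
        type=click.Choice(SUPPORTED_EPS, case_sensitive=False),
        required=required,
        default=None,
        help=f"Execution provider. {optional_message}".strip(),
    )


@click.command()
@click.option("--iterations", type=click.IntRange(min=1), default=10, show_default=True)
@click.option("--warmup", type=click.IntRange(min=0), default=2, show_default=True)
@ep_option(optional_message="Overrides device-to-provider mapping.")
def perf(iterations, warmup, ep):
    click.echo(f"iterations={iterations} warmup={warmup} ep={ep}")


runner = CliRunner()
ok = runner.invoke(perf, ["--iterations", "5", "--warmup", "0", "--ep", "qnn"])
bad_iters = runner.invoke(perf, ["--iterations", "0"])  # rejected: below min=1
bad_ep = runner.invoke(perf, ["--ep", "qn"])            # rejected: not a valid choice
print(ok.exit_code, bad_iters.exit_code, bad_ep.exit_code)  # 0 2 2
```

Both rejections surface as Click usage errors, so the command never reaches the benchmark body with invalid input.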

Closes #552

Co-authored-by: hualxie <hualxie@microsoft.com>
…it codes (#597)

## Summary

Five `sys.exit(N)` paths in `commands/perf.py` used inconsistent,
non-standard exit codes. Most importantly, the no-modules-matched
user-error path called `sys.exit(0)`, masking a typo as success and
silently breaking CI.

This PR replaces all five with the proper Click exception types:

| Path | Before | After | Exit |
|---|---|---|---|
| Module config generation failed | `sys.exit(3)` | `click.ClickException` | 1 |
| **No modules matched `--module CLASSNAME`** | `sys.exit(0)` ← the bug | `click.UsageError` | **2** |
| Defensive: configs missing `model_type` | `sys.exit(3)` | `click.ClickException` | 1 |
| Model file not found | `sys.exit(3)` | `click.UsageError` | 2 |
| Generic benchmark failure | `sys.exit(4)` | `click.ClickException` | 1 |

User-facing errors now exit 2 (Click's `UsageError` convention), runtime
failures exit 1, and success exits 0, matching the rest of the CLI. The
now-unused `import sys` is also dropped.

Added `test_module_no_match_exits_nonzero` as a regression guard for the
specific scenario the issue described.

End-to-end repro of the bug now behaves correctly:

```
$ winml perf -m microsoft/resnet-50 --module DoesNotExist --ep cpu --device cpu
Generating module configs for DoesNotExist...
Usage: winml perf [OPTIONS]
Try 'winml perf --help' for help.

Error: No modules matching 'DoesNotExist' found
$ echo $?
2
```
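The exit-code mapping can be reproduced with a toy command. This is a sketch, not the repository's test: the module inventory and substring matching are hypothetical, and only the exception types and their exit codes (2 for `click.UsageError`, 1 for `click.ClickException`) reflect Click's behavior.

```python
import click
from click.testing import CliRunner

# Hypothetical module inventory standing in for real module-config generation.
KNOWN_MODULES = {"ResNetConvLayer", "ResNetBottleNeckLayer"}


@click.command()
@click.option("--module", "module_name", default=None)
@click.option("--fail", is_flag=True, help="Simulate a runtime benchmark failure.")
def perf(module_name, fail):
    if module_name is not None:
        matches = [m for m in KNOWN_MODULES if module_name in m]
        if not matches:
            # User error: Click prints usage plus the message and exits with 2.
            raise click.UsageError(f"No modules matching '{module_name}' found")
    if fail:
        # Runtime failure: Click prints "Error: ..." and exits with 1.
        raise click.ClickException("Benchmark failed")
    click.echo("ok")


def test_module_no_match_exits_nonzero():
    """Sketch of a regression guard for the no-match case."""
    result = CliRunner().invoke(perf, ["--module", "DoesNotExist"])
    assert result.exit_code == 2
    assert "No modules matching 'DoesNotExist' found" in result.output
```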

Closes #554

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
…square detection accuracy (#479)

DETR-family exports currently omit `pixel_mask`, so non-square inputs
must be distorted or padded without masking, which drives significant
object-detection regression. This PR updates export config behavior so
DETR/Table Transformer ONNX models retain `pixel_mask` as an input
aligned to image spatial dimensions.

- **ONNX config override for DETR-family inputs**
  - Added `DetrIOConfig` and `TableTransformerIOConfig` overrides in
    `src/winml/modelkit/models/hf/detr.py`.
  - Registered overrides for relevant tasks (`object-detection` plus
    existing DETR export tasks) via `register_onnx_overwrite`.
  - Extended input spec to include:
    - `pixel_values`: `[batch, channels, height, width]`
    - `pixel_mask`: `[batch, height, width]`

- **Dummy input generation updated for export path**
  - Added shared mixin logic to inject `pixel_mask` during dummy input
    generation.
  - `pixel_mask` is generated as `int64` ones with spatial shape derived
    from `pixel_values`.
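A minimal sketch of the dummy-input rule above, using NumPy arrays for illustration (the real export path may build PyTorch tensors instead). `add_pixel_mask` is a hypothetical name, not the mixin's actual method.

```python
import numpy as np


def add_pixel_mask(dummy_inputs):
    """Add an all-ones int64 pixel_mask shaped [batch, height, width].

    Assumes pixel_values is laid out as [batch, channels, height, width].
    All ones means every pixel is real image content (no padding).
    """
    batch, _channels, height, width = dummy_inputs["pixel_values"].shape
    dummy_inputs["pixel_mask"] = np.ones((batch, height, width), dtype=np.int64)
    return dummy_inputs


# A non-square dummy batch, the case the mask is meant to support.
inputs = {"pixel_values": np.zeros((2, 3, 480, 640), dtype=np.float32)}
inputs = add_pixel_mask(inputs)
print(inputs["pixel_mask"].shape, inputs["pixel_mask"].dtype)  # (2, 480, 640) int64
```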

- **Focused DETR/Table Transformer coverage**
  - Expanded `tests/unit/models/detr/test_onnx_config.py` to validate:
    - config registration resolves to the new overrides
    - `pixel_mask` is present in ONNX input metadata with correct dynamic
      axes
    - generated dummy inputs include correctly shaped/dtyped `pixel_mask`
      for both `detr` and `table-transformer`

Example of the resulting export input contract:

```python
{
    "pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"},
    "pixel_mask": {0: "batch_size", 1: "height", 2: "width"},
}
```
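To illustrate how a per-(model type, task) override can supply this contract, here is a self-contained sketch. `register_override`, `PixelMaskInputsMixin`, and the `*Sketch` classes are hypothetical stand-ins; they do not reproduce the signature of `register_onnx_overwrite` or the base classes used in `detr.py`.

```python
from collections import OrderedDict

# Hypothetical registry keyed by (model_type, task).
_ONNX_OVERRIDES = {}


def register_override(model_type, *tasks):
    """Class decorator that records cls as the override for each task."""
    def decorator(cls):
        for task in tasks:
            _ONNX_OVERRIDES[(model_type, task)] = cls
        return cls
    return decorator


class PixelMaskInputsMixin:
    """Declares the export input contract, including dynamic axes."""

    @property
    def inputs(self):
        return OrderedDict(
            [
                ("pixel_values", {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}),
                ("pixel_mask", {0: "batch_size", 1: "height", 2: "width"}),
            ]
        )


@register_override("detr", "object-detection")
class DetrIOConfigSketch(PixelMaskInputsMixin):
    pass


@register_override("table-transformer", "object-detection")
class TableTransformerIOConfigSketch(PixelMaskInputsMixin):
    pass


def resolve(model_type, task):
    """Look up the override class registered for a model type and task."""
    return _ONNX_OVERRIDES[(model_type, task)]
```

Sharing the `inputs` property through a mixin keeps the two model families on one contract, so `height` and `width` stay the same symbolic axes for both inputs.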


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: chinazhangchao <3822520+chinazhangchao@users.noreply.github.com>
Co-authored-by: Charles Zhang <zhangchao@microsoft.com>
Co-authored-by: vortex-captain <75063846+vortex-captain@users.noreply.github.com>
- Force-add 192 perf.json files (previously gitignored)
- Update SUMMARY.md: OV CPU perf 0/192 -> 192/192 (100%)
- Force-add 192 perf.json files (previously gitignored)
- Update SUMMARY.md perf pass rate: 192/192 (100%)
…nfigs + MLAS results

- generate_example_configs.py: NPU keeps w8a8/w8a16/fp16 sweep; CPU/GPU EPs now generate a single config without --precision (uses EP default).
- Update examples/generate_config.md and examples/test_config.md accordingly.
- Delete previous CPU/GPU configs and results under mlas/cpu, openvino/cpu, qnn/gpu, openvino/gpu, nv_tensorrt_rtx/gpu, dml/gpu.
- Regenerate CPU configs (126) and GPU configs (252) using new scheme.
- Run perf+eval for MLAS CPU on this machine (in progress; partial results committed).
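A minimal sketch of the new generation scheme, assuming 63 model+task entries per EP (the MLAS CPU run reports 63/63 configs). Under that assumption the counts agree with the regenerated totals: 63 × 2 CPU EPs = 126 and 63 × 4 GPU EPs = 252. The EP lists come from the folders named above; `precisions_for`, `cli_args`, and `plan` are hypothetical helpers, not the script's actual API.

```python
# EP folders from the commit message, grouped by hardware directory.
CPU_EPS = ["mlas", "openvino"]
GPU_EPS = ["qnn", "openvino", "nv_tensorrt_rtx", "dml"]
NPU_PRECISIONS = ["w8a8", "w8a16", "fp16"]


def precisions_for(hardware):
    """NPU sweeps three precisions; CPU/GPU get one EP-default entry (None)."""
    return list(NPU_PRECISIONS) if hardware == "npu" else [None]


def cli_args(ep, precision):
    """Illustrative argument list: --precision is omitted for the EP default."""
    args = ["--ep", ep]
    if precision is not None:
        args += ["--precision", precision]
    return args


def plan(entries, eps, hardware):
    """One (entry, args) pair per config to generate."""
    return [
        (entry, cli_args(ep, precision))
        for entry in entries
        for ep in eps
        for precision in precisions_for(hardware)
    ]


entries = [f"entry-{i}" for i in range(63)]  # placeholder model+task entries
print(len(plan(entries, CPU_EPS, "cpu")))  # 126
print(len(plan(entries, GPU_EPS, "gpu")))  # 252
```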
Run finished 63/63. Eval results: PASS=49, FAIL=3, TIMEOUT=11.
- Errors: microsoft/table-transformer-detection (object-detection), rizvandwiki/gender-classification (image-classification), w11wo/indonesian-roberta-base-posp-tagger (token-classification), Salesforce/blip-image-captioning-base (perf only; eval passed).
- Timeouts (mostly large CPU-bound eval datasets): 5x fill-mask (FacebookAI/roberta-*, FacebookAI/xlm-roberta-*, google-bert/bert-base-multilingual-uncased), 6x zero-shot-image-classification (laion/CLIP-*, openai/clip-vit-*).
…re); add report generator script

- New scripts/generate_example_report.py generates REPORT.md from existing
  config/perf/eval/error/timeout artifacts for any EP/hardware folder.
- Generated examples/mlas/cpu/REPORT.md from the recent MLAS run.
- Force-added 62 `*_perf.json` files under examples/mlas/cpu/ so the table
  links in the report resolve (matches existing pattern under nv_tensorrt_rtx).
- SUMMARY.md: removed broken REPORT.md links for openvino/cpu and qnn/gpu
  (no data yet for those EPs after CPU/GPU regen).
CPU: 55 models, 63 configs, perf 62/63 (98%), eval 46/63 (73%)
GPU: 55 models, 63 configs, perf 62/63 (98%), eval 53/63 (84%)
Updated SUMMARY.md and generated REPORT.md for both targets
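A sketch of how the report's rate strings can be derived from per-config statuses. `format_rate` and `summarize` are hypothetical helpers, the status schema is assumed, and rounding to the nearest integer percent is an assumption; it does reproduce the figures above.

```python
from collections import Counter


def format_rate(passed, total):
    """Render 'passed/total (pct%)' with the percentage rounded to an integer."""
    pct = round(100 * passed / total) if total else 0
    return f"{passed}/{total} ({pct}%)"


def summarize(statuses):
    """statuses maps config name -> 'PASS' | 'FAIL' | 'TIMEOUT' (assumed schema)."""
    counts = Counter(statuses.values())
    return format_rate(counts["PASS"], len(statuses))


print(format_rate(62, 63), format_rate(46, 63), format_rate(53, 63))
# 62/63 (98%) 46/63 (73%) 53/63 (84%)
```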
ssss141414 closed this on May 19, 2026
ssss141414 changed the title from "Shzhen/examples" to "winml1.8 recipes" on May 25, 2026
