example: add microsoft/swin-large-patch4-window7-224 #787
Merged
7 commits
dbe77d5
add swin
4d55a2b
fix bug
831dd99
rename
5bd1dd5
Merge remote-tracking branch 'origin/main' into hualxie/ex_swing
fd72f51
fix comments
d28e46f
Merge remote-tracking branch 'origin/main' into hualxie/ex_swing
fa7a280
nit
144 changes: 144 additions & 0 deletions
examples/microsoft_swin-large-patch4-window7-224/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,144 @@ | ||
| # microsoft/swin-large-patch4-window7-224 | ||
|
|
||
| End-to-end build + accuracy + latency walkthrough for | ||
| `microsoft/swin-large-patch4-window7-224` (task: `image-classification`) | ||
| on the NPU, using the `timm/mini-imagenet` `test` split as the dataset. | ||
|
|
||
| Run all commands from the `ModelKit` repo root. | ||
|
|
||
| --- | ||
|
|
||
| ## 1. Build the model on NPU | ||
|
|
||
| Two steps: `winml config` generates a build config JSON, then | ||
| `winml build` consumes it. `--precision w8a16` is the default NPU | ||
| precision; the build produces a QDQ-quantized ONNX that executes on | ||
| the NPU. | ||
|
|
||
| ```powershell | ||
| winml config ` | ||
| -m microsoft/swin-large-patch4-window7-224 ` | ||
| --task image-classification ` | ||
| --device npu ` | ||
| --ep openvino ` | ||
| --precision w8a16 ` | ||
| -o build_config.json | ||
| ``` | ||
|
|
||
| ```powershell | ||
| winml build ` | ||
| -c build_config.json ` | ||
| -m microsoft/swin-large-patch4-window7-224 ` | ||
| --device npu ` | ||
| --ep openvino ` | ||
| --use-cache | ||
| ``` | ||
|
|
||
| Artifacts land under | ||
| `~/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/` — | ||
| the file to evaluate is `imgcls_*_quantized.onnx`. | ||
|
|
||
| --- | ||
|
|
||
| ## 2. Evaluate on NPU with `winml eval` | ||
|
|
||
| The `timm/mini-imagenet` dataset is downloaded automatically from the | ||
| HuggingFace Hub by `winml eval` — no separate dataset build step is | ||
| needed. | ||
|
|
||
| Pass the ONNX file to `-m` and the HuggingFace model ID to `--model-id` | ||
| (needed for the image processor). `--output` writes a JSON file | ||
| containing the parsed metrics: | ||
|
|
||
| ```powershell | ||
| winml eval ` | ||
| -m $HOME/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/imgcls_<hash>_quantized.onnx ` | ||
| --model-id microsoft/swin-large-patch4-window7-224 ` | ||
| --task image-classification ` | ||
| --device npu ` | ||
| --ep openvino ` | ||
| --dataset timm/mini-imagenet ` | ||
| --split test ` | ||
| --samples 1000 ` | ||
| --output winml_eval_output.json | ||
| ``` | ||
|
|
||
| Replace `<hash>` with the actual filename produced by step 1. | ||
|
|
||
| The accuracy value is `metrics.accuracy` inside | ||
| `winml_eval_output.json`. | ||
|
|
||
| --- | ||
|
|
||
| ## 3. Measure latency with `winml perf` | ||
|
|
||
| `winml perf` benchmarks the quantized ONNX directly using random | ||
| inputs derived from the model's I/O configuration. Point `-m` at the | ||
| same `*_quantized.onnx` produced in step 1. `--warmup` iterations are | ||
| excluded from the statistics; `--iterations` is the measured sample | ||
| count. | ||
|
|
||
| ```powershell | ||
| winml perf ` | ||
| -m $HOME/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/imgcls_<hash>_quantized.onnx ` | ||
| --device npu ` | ||
| --ep openvino ` | ||
| --warmup 10 ` | ||
| --iterations 100 ` | ||
| -o winml_perf_output.json | ||
| ``` | ||
|
|
||
| The output JSON contains `latency_ms` (`mean`, `min`, `max`, `p50`, | ||
| `p90`, `p95`, `p99`, `std`) and `throughput` (`samples_per_sec`, | ||
| `batches_per_sec`). Mean and p50 latency are the headline numbers; | ||
| report them alongside the device and precision used. | ||
|
|
||
| --- | ||
|
|
||
| ## 4. Evaluate the original PyTorch model | ||
|
|
||
| `run_pytorch_baseline.py` loads the HuggingFace checkpoint with native | ||
| PyTorch on CPU and emits the same metric so the two runs are directly | ||
| comparable. The last stdout line is a single JSON object: | ||
| `{"metric": "accuracy", "value": <float>, "num_samples": <int>}`. | ||
|
|
||
| Pass `--perf-iterations N` (and optionally `--perf-warmup K`, default | ||
| `10`) to also measure PyTorch inference latency. When `N > 0`, the | ||
| script reuses the HuggingFace pipeline on the first dataset sample, | ||
| runs `K` untimed warmup iterations, then `N` timed iterations, and | ||
| emits a latency JSON line on stdout immediately before the metric | ||
| line. The metric line is still the final stdout line. | ||
|
|
||
| ```powershell | ||
| uv run python scripts/e2e_eval/run_pytorch_baseline.py ` | ||
| --model microsoft/swin-large-patch4-window7-224 ` | ||
| --task image-classification ` | ||
| --device cpu ` | ||
| --num-samples 1000 ` | ||
| --dataset timm/mini-imagenet ` | ||
| --split test ` | ||
| --winml-metric-key accuracy ` | ||
| --perf-warmup 10 ` | ||
| --perf-iterations 100 | ||
| ``` | ||
|
|
||
| The latency JSON line carries `mean_ms` / `min_ms` / `max_ms` / | ||
| `p50_ms` / `p90_ms` / `p95_ms` / `p99_ms` keys, the same statistics that | ||
| `winml perf` reports, so the two runs can be compared directly. | ||
|
|
||
| --- | ||
|
|
||
| ## 5. Comparing the results | ||
|
|
||
| For WinML, accuracy comes from `metrics.accuracy` in | ||
| `winml_eval_output.json`; for the PyTorch baseline, it comes from the | ||
| last stdout line. Latency comes from `latency_ms` in | ||
| `winml_perf_output.json` for WinML and from the latency JSON line on | ||
| stdout for the PyTorch baseline. | ||
|
|
||
| Results on an Intel(R) Core(TM) Ultra 7 258V processor: | ||
|
|
||
| | Model | Device | Precision | Accuracy | Mean latency (ms) | p50 latency (ms) | Size (MB) | | ||
| |---|---|---|---|---|---|---| | ||
| | PyTorch | CPU | fp32 | 0.837 | 662.3 | 647.9 | 750 | | ||
| | WinML (ONNX) | OpenVINO NPU | w8a16 (QDQ) | 0.836 | 64.9 | 64.3 | 193 | | ||
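The comparison described in section 5 can be sketched in a few lines of Python. This is a minimal sketch, not part of the PR: the helper names `parse_baseline_stdout` and `summarize` are hypothetical, and the JSON shapes are assumed from the README text (`metrics.accuracy`, `latency_ms.mean` / `latency_ms.p50`, and a latency line with `mean_ms` / `p50_ms` printed just before the final metric line). The sample values are copied from the results table.

```python
from __future__ import annotations

import json


def parse_baseline_stdout(stdout: str) -> tuple[dict, dict | None]:
    """Return (metric, latency) parsed from the baseline script's stdout.

    Assumes the layout the README describes: the metric JSON is the last
    non-empty line and the latency JSON, when present, is the line before it.
    """
    lines = [line for line in stdout.splitlines() if line.strip()]
    metric = json.loads(lines[-1])
    latency = None
    if len(lines) >= 2:
        try:
            candidate = json.loads(lines[-2])
        except json.JSONDecodeError:
            candidate = None  # the previous line was ordinary log output
        if isinstance(candidate, dict) and "mean_ms" in candidate:
            latency = candidate
    return metric, latency


def summarize(winml_eval: dict, winml_perf: dict, metric: dict, latency: dict) -> dict:
    """Compute the accuracy delta and latency speedups (baseline / WinML)."""
    winml_latency = winml_perf["latency_ms"]
    return {
        "accuracy_delta": round(winml_eval["metrics"]["accuracy"] - metric["value"], 4),
        "mean_speedup": round(latency["mean_ms"] / winml_latency["mean"], 2),
        "p50_speedup": round(latency["p50_ms"] / winml_latency["p50"], 2),
    }


# Illustrative inputs only: numbers come from the results table, shapes are assumed.
stdout = "\n".join(
    [
        "loading model ...",
        json.dumps({"mean_ms": 662.3, "p50_ms": 647.9}),
        json.dumps({"metric": "accuracy", "value": 0.837, "num_samples": 1000}),
    ]
)
metric, latency = parse_baseline_stdout(stdout)
summary = summarize(
    {"metrics": {"accuracy": 0.836}},
    {"latency_ms": {"mean": 64.9, "p50": 64.3}},
    metric,
    latency,
)
print(summary)
```

In a real run the two WinML dicts would come from `json.load` on `winml_eval_output.json` and `winml_perf_output.json`, and `stdout` from the captured output of `run_pytorch_baseline.py`.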
252 changes: 252 additions & 0 deletions
examples/microsoft_swin-large-patch4-window7-224/example.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,252 @@ | ||
| # ------------------------------------------------------------------------- | ||
| # Copyright (c) Microsoft Corporation. All rights reserved. | ||
| # Licensed under the MIT License. | ||
| # -------------------------------------------------------------------------- | ||
|
|
||
| """Run one image-classification inference with the WinML-built ONNX. | ||
|
|
||
| Mirrors the HuggingFace Swin Transformer usage example | ||
| (https://huggingface.co/docs/transformers/main/en/model_doc/swin) but | ||
| loads the quantized ONNX produced by ``winml build`` (step 1 of the | ||
| README) via :class:`WinMLAutoModel` instead of the original PyTorch | ||
| checkpoint. | ||
|
|
||
| The script preprocesses one image, runs inference, prints the top-5 | ||
| predicted classes (HF-docs format), and writes an annotated image with | ||
| the top-1 label drawn in the corner so the result is visually | ||
| verifiable. | ||
|
|
||
| Usage:: | ||
|
|
||
|     uv run python examples/microsoft_swin-large-patch4-window7-224/example.py ` | ||
|         --onnx $HOME/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/` | ||
|         `imgcls_<hash>_quantized.onnx | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| import argparse | ||
| from pathlib import Path | ||
|
|
||
| import numpy as np | ||
| import torch | ||
| from PIL import Image, ImageDraw, ImageFont | ||
| from transformers import AutoConfig, AutoImageProcessor | ||
|
|
||
| from winml.modelkit import WinMLAutoModel | ||
|
|
||
|
|
||
| HF_MODEL_ID = "microsoft/swin-large-patch4-window7-224" | ||
| DEFAULT_DATASET = "timm/mini-imagenet" | ||
| DEFAULT_DATASET_SPLIT = "test" | ||
|
|
||
|
|
||
| def parse_args() -> argparse.Namespace: | ||
|     """Parse command-line arguments.""" | ||
|     parser = argparse.ArgumentParser(description=__doc__) | ||
|     parser.add_argument( | ||
|         "--onnx", | ||
|         required=True, | ||
|         type=Path, | ||
|         help="Path to the quantized ONNX produced by step 1 of the README " | ||
|         "(e.g. imgcls_<hash>_quantized.onnx).", | ||
|     ) | ||
|     parser.add_argument( | ||
|         "--device", | ||
|         default="npu", | ||
|         choices=["auto", "npu", "gpu", "cpu"], | ||
|         help="Target device (default: npu).", | ||
|     ) | ||
|     parser.add_argument( | ||
|         "--ep", | ||
|         default="openvino", | ||
|         help="Execution provider alias (default: openvino).", | ||
|     ) | ||
|     parser.add_argument( | ||
|         "--image", | ||
|         type=Path, | ||
|         default=None, | ||
|         help="Local image path. If omitted, streams the first image from " | ||
|         f"the {DEFAULT_DATASET} {DEFAULT_DATASET_SPLIT} split.", | ||
|     ) | ||
|     parser.add_argument( | ||
|         "--top-k", | ||
|         type=int, | ||
|         default=5, | ||
|         help="Number of top predictions to print (default: 5).", | ||
|     ) | ||
|     parser.add_argument( | ||
|         "--output", | ||
|         type=Path, | ||
|         default=Path("prediction.png"), | ||
|         help="Where to write the annotated image (default: prediction.png).", | ||
|     ) | ||
|     return parser.parse_args() | ||
|
|
||
|
|
||
| def load_image(image_arg: Path | None) -> tuple[Image.Image, str | None]: | ||
|     """Load an image and (when streamed from the eval dataset) its WordNet synset. | ||
|
|
||
|
|
||
|     Returns ``(image, true_synset)``. ``true_synset`` is the WordNet ID | ||
|     (e.g. ``"n01532829"``) for the dataset's labelled class, used as the | ||
|     universal bridge between the dataset's class indexing and the model's. | ||
|     ``None`` when the user supplied a custom ``--image``. | ||
|     """ | ||
|     if image_arg is not None: | ||
|         return Image.open(image_arg.expanduser()).convert("RGB"), None | ||
|
|
||
|     from datasets import load_dataset | ||
|
|
||
|     # streaming=True so we only fetch the first sample instead of downloading | ||
|     # the whole split. The ClassLabel feature (and its .names list) is still | ||
|     # available on the streamed dataset, so we can recover the WordNet synset | ||
|     # for the sample's integer label. trust_remote_code=False refuses to run | ||
|     # any dataset-bundled loading script. | ||
|     dataset = load_dataset( | ||
|         DEFAULT_DATASET, | ||
|         split=DEFAULT_DATASET_SPLIT, | ||
|         streaming=True, | ||
|         trust_remote_code=False, | ||
|     ) | ||
|     sample = next(iter(dataset)) | ||
|
|
||
|     image = sample["image"] | ||
|     if not isinstance(image, Image.Image): | ||
|         image = Image.fromarray(np.asarray(image)) | ||
|     image = image.convert("RGB") | ||
|
|
||
|     label_value = sample.get("label") | ||
|     label_feature = dataset.features.get("label") | ||
|     if label_value is None or label_feature is None or not hasattr(label_feature, "names"): | ||
|         return image, None | ||
|     return image, label_feature.names[int(label_value)] | ||
|
|
||
|
|
||
| def imagenet_synset_to_id() -> dict[str, int]: | ||
|     """Map WordNet synset ID -> ImageNet-1k class id (0-999). | ||
|
|
||
|     Uses ``timm.data.ImageNetInfo`` so we don't have to ship the 1000-entry | ||
|
|
||
|     list inline. The mapping is the canonical ImageNet-1k ordering that | ||
|     the model was trained against. | ||
|
|
||
|     Requires the optional ``timm`` package (imported lazily here, like | ||
|     ``datasets`` in ``load_image``); raises a clear error if it is missing. | ||
|     """ | ||
|     try: | ||
|         from timm.data import ImageNetInfo | ||
|     except ImportError as e: | ||
|         raise ImportError( | ||
|             "imagenet_synset_to_id() requires the 'timm' package. " | ||
|             "Install it with `pip install timm`." | ||
|         ) from e | ||
|
|
||
|     info = ImageNetInfo() | ||
|     return {synset: idx for idx, synset in enumerate(info.label_names())} | ||
|
|
||
|
|
||
| def draw_top_prediction( | ||
|     image: Image.Image, | ||
|     label: str, | ||
|     score: float, | ||
| ) -> Image.Image: | ||
|     """Draw the top-1 label + confidence on a copy of ``image``.""" | ||
|     annotated = image.copy() | ||
|     draw = ImageDraw.Draw(annotated) | ||
|     try: | ||
|         font = ImageFont.truetype("arial.ttf", size=max(14, annotated.height // 30)) | ||
|     except OSError: | ||
|         font = ImageFont.load_default() | ||
|
|
||
|     caption = f"{label} ({score:.2f})" | ||
|     tx0, ty0, tx1, ty1 = draw.textbbox((10, 10), caption, font=font) | ||
|     pad = 6 | ||
|     draw.rectangle( | ||
|         [(tx0 - pad, ty0 - pad), (tx1 + pad, ty1 + pad)], | ||
|         fill=(0, 0, 0), | ||
|     ) | ||
|     draw.text((10, 10), caption, fill=(255, 255, 255), font=font) | ||
|     return annotated | ||
|
|
||
|
|
||
| def main() -> None: | ||
|     """Load the quantized ONNX, run one inference, print + save the result.""" | ||
|     args = parse_args() | ||
|
|
||
|     image, true_synset = load_image(args.image) | ||
|     image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_ID) | ||
|
|
||
|     # skip_build=True uses the ONNX as-is; it has already been optimized | ||
|     # and quantized by `winml build`. use_cache=False avoids touching the | ||
|     # winml artifact cache for this read-only example. | ||
|     model = WinMLAutoModel.from_pretrained( | ||
|         args.onnx.expanduser(), | ||
|         task="image-classification", | ||
|         device=args.device, | ||
|         ep=args.ep, | ||
|         skip_build=True, | ||
|         use_cache=False, | ||
|     ) | ||
|
|
||
|     # Match the processor's output size to the ONNX's static input shape so | ||
|     # pixel_values matches (B, C, H, W) exactly. | ||
|     input_shapes = (model.io_config.get("input_shapes") or [[]])[0] | ||
|     # Only applies to 4D image inputs (B, C, H, W); skip for other shapes. | ||
|     if len(input_shapes) == 4: | ||
|         _, _, h, w = input_shapes | ||
|         image_processor.size = {"height": h, "width": w} | ||
|
|
||
|     inputs = image_processor(images=image, return_tensors="pt") | ||
|     outputs = model(pixel_values=inputs["pixel_values"]) | ||
|
|
||
|     # logits: (1, num_classes). softmax → probabilities, then top-k. | ||
|     logits = outputs.logits | ||
|     probs = torch.softmax(logits, dim=-1)[0] | ||
|     top_k = min(args.top_k, probs.numel()) | ||
|     top_scores, top_ids = torch.topk(probs, k=top_k) | ||
|
|
||
|     # WinML's bare-ONNX path doesn't attach an HF config to the model, so | ||
|     # pull id2label from the HF hub for human-readable label names. | ||
|
|
||
|     id2label = AutoConfig.from_pretrained(HF_MODEL_ID).id2label | ||
|
|
||
|     top_ids_list = top_ids.tolist() | ||
|     top_label_names = [ | ||
|         id2label.get(label_id, str(label_id)) for label_id in top_ids_list | ||
|     ] | ||
|
|
||
|     # Resolve the dataset's WordNet synset to an ImageNet-1k class id so we | ||
|     # can compare against the model's prediction. The dataset (e.g. | ||
|     # timm/mini-imagenet) often uses its own 0..N indexing over a subset of | ||
|     # ImageNet-1k, so the raw integer label from the dataset does NOT match | ||
|     # the model's class id; the synset is the universal bridge. | ||
|     true_label_id: int | None = None | ||
|     if true_synset is not None: | ||
|         synset_to_id = imagenet_synset_to_id() | ||
|         true_label_id = synset_to_id.get(true_synset) | ||
|
|
||
|     if true_synset is not None: | ||
|         if true_label_id is not None: | ||
|             true_label_name = id2label.get(true_label_id, str(true_label_id)) | ||
|             print(f"True label: {true_label_name} (synset={true_synset}, id={true_label_id})") | ||
|         else: | ||
|             print(f"True label: synset={true_synset} (not in ImageNet-1k vocabulary)") | ||
|     else: | ||
|         print("True label: unknown (custom --image)") | ||
|     print(f"\nTop {top_k} predictions:") | ||
|     for rank, (label, score) in enumerate( | ||
|         zip(top_label_names, top_scores.tolist(), strict=True), start=1, | ||
|     ): | ||
|         print(f" {rank}. {label} ({score:.4f})") | ||
|
|
||
|     if true_label_id is not None: | ||
|         verdict = "PASS" if top_ids_list[0] == true_label_id else "FAIL" | ||
|         print(f"\nVerdict (top-1): {verdict}") | ||
|
|
||
|     annotated = draw_top_prediction(image, top_label_names[0], float(top_scores[0].item())) | ||
|     output_path = args.output.expanduser() | ||
|     output_path.parent.mkdir(parents=True, exist_ok=True) | ||
|     annotated.save(output_path) | ||
|     print(f"\nAnnotated image written to {output_path}") | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     main() | ||
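The synset bridge used in `main()` is easier to see on toy data. The sketch below is an illustration under stated assumptions, not code from the PR: `to_model_class_id` is a hypothetical helper, and the two synset lists are tiny stand-ins whose ordering is made up (the real script takes the dataset side from the `ClassLabel` names and the model side from `timm.data.ImageNetInfo`).

```python
from __future__ import annotations

# Toy stand-ins. The positions are illustrative only and do NOT reflect the
# real ImageNet-1k indices.
DATASET_SYNSETS = ["n01532829", "n01558993", "n01704323"]  # subset, own 0..2 indexing
MODEL_SYNSETS = ["n01440764", "n01532829", "n01558993", "n01601694", "n01704323"]


def to_model_class_id(
    dataset_label: int,
    dataset_synsets: list[str],
    model_synsets: list[str],
) -> int | None:
    """Translate a dataset-local label into the model's class id via the synset.

    Returns None when the dataset's synset is not in the model's vocabulary.
    """
    synset_to_id = {synset: idx for idx, synset in enumerate(model_synsets)}
    return synset_to_id.get(dataset_synsets[dataset_label])


# Dataset label 0 and model id 0 refer to different classes, so comparing the
# raw integers would be wrong; going through the synset gives the right id.
for label in range(len(DATASET_SYNSETS)):
    print(label, DATASET_SYNSETS[label], to_model_class_id(label, DATASET_SYNSETS, MODEL_SYNSETS))
```

Comparing `top_ids_list[0]` against the translated id, rather than against the raw dataset label, is what keeps the PASS/FAIL verdict meaningful when the dataset covers only a subset of the model's classes.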
File renamed without changes.