diff --git a/examples/microsoft_swin-large-patch4-window7-224/README.md b/examples/microsoft_swin-large-patch4-window7-224/README.md
new file mode 100644
index 000000000..ff9836712
--- /dev/null
+++ b/examples/microsoft_swin-large-patch4-window7-224/README.md
@@ -0,0 +1,144 @@
+# microsoft/swin-large-patch4-window7-224
+
+End-to-end build + accuracy + latency walkthrough for
+`microsoft/swin-large-patch4-window7-224` (task: `image-classification`)
+on the NPU, using the `timm/mini-imagenet` `test` split as the dataset.
+
+Run all commands from the `ModelKit` repo root.
+
+---
+
+## 1. Build the model on NPU
+
+Two steps: `winml config` generates a build config JSON, then
+`winml build` consumes it. `--precision w8a16` is the default NPU
+precision; the build produces a QDQ-quantized ONNX that executes on
+the NPU.
+
+```powershell
+winml config `
+    -m microsoft/swin-large-patch4-window7-224 `
+    --task image-classification `
+    --device npu `
+    --ep openvino `
+    --precision w8a16 `
+    -o build_config.json
+```
+
+```powershell
+winml build `
+    -c build_config.json `
+    -m microsoft/swin-large-patch4-window7-224 `
+    --device npu `
+    --ep openvino `
+    --use-cache
+```
+
+Artifacts land under
+`~/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/` —
+the file to evaluate is `imgcls_*_quantized.onnx`.
+
+---
+
+## 2. Evaluate on NPU with `winml eval`
+
+The `timm/mini-imagenet` dataset is downloaded automatically from the
+HuggingFace Hub by `winml eval` — no separate dataset build step is
+needed.
+
+Pass the ONNX file to `-m` and the HuggingFace model ID to `--model-id`
+(needed for the image processor). `--output` writes a JSON file
+containing the parsed metrics:
+
+```powershell
+winml eval `
+    -m $HOME/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/imgcls__quantized.onnx `
+    --model-id microsoft/swin-large-patch4-window7-224 `
+    --task image-classification `
+    --device npu `
+    --ep openvino `
+    --dataset timm/mini-imagenet `
+    --split test `
+    --samples 1000 `
+    --output winml_eval_output.json
+```
+
+Replace `` with the actual filename produced by step 1.
+
+The accuracy value is `metrics.accuracy` inside
+`winml_eval_output.json`.
+
+---
+
+## 3. Measure latency with `winml perf`
+
+`winml perf` benchmarks the quantized ONNX directly using random
+inputs derived from the model's I/O configuration. Point `-m` at the
+same `*_quantized.onnx` produced in step 1. `--warmup` iterations are
+excluded from the statistics; `--iterations` is the measured sample
+count.
+
+```powershell
+winml perf `
+    -m $HOME/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/imgcls__quantized.onnx `
+    --device npu `
+    --ep openvino `
+    --warmup 10 `
+    --iterations 100 `
+    -o winml_perf_output.json
+```
+
+The output JSON contains `latency_ms` (`mean`, `min`, `max`, `p50`,
+`p90`, `p95`, `p99`, `std`) and `throughput` (`samples_per_sec`,
+`batches_per_sec`). Mean and p50 latency are the headline numbers;
+report them alongside the device and precision used.
+
+---
+
+## 4. Evaluate the original PyTorch model
+
+`run_pytorch_baseline.py` loads the HuggingFace checkpoint with native
+PyTorch on CPU and emits the same metric so the two runs are directly
+comparable. The last stdout line is a single JSON object:
+`{"metric": "accuracy", "value": , "num_samples": }`.
+
+Pass `--perf-iterations N` (and optionally `--perf-warmup K`, default
+`10`) to also measure PyTorch inference latency. When `N > 0`, the
+script reuses the HuggingFace pipeline on the first dataset sample,
+runs `K` untimed warmup iterations, then `N` timed iterations, and
+emits a latency JSON line on stdout immediately before the metric
+line. The metric line is still the final stdout line.
+
+```powershell
+uv run python scripts/e2e_eval/run_pytorch_baseline.py `
+    --model microsoft/swin-large-patch4-window7-224 `
+    --task image-classification `
+    --device cpu `
+    --num-samples 1000 `
+    --dataset timm/mini-imagenet `
+    --split test `
+    --winml-metric-key accuracy `
+    --perf-warmup 10 `
+    --perf-iterations 100
+```
+
+The latency JSON line has the same `mean_ms` / `min_ms` / `max_ms` /
+`p50_ms` / `p90_ms` / `p95_ms` / `p99_ms` keys as `winml perf` so the
+two runs can be compared directly.
+
+---
+
+## 5. Comparing the results
+
+For WinML, the accuracy value comes from `metrics.accuracy` in
+`winml_eval_output.json` while for the PyTorch baseline, it comes from
+the last stdout line. Latency comes from `latency_ms` in
+`winml_perf_output.json` for WinML and from the latency JSON line on
+stdout for the PyTorch baseline.
+
+Result on CPU Intel(R) Core(TM) Ultra 7 258V:
+
+| Model | Device | Precision | accuracy | mean latency (ms) | p50 latency (ms) | Size (MB) |
+|---|---|---|---|---|---|---|
+| PyTorch | CPU | fp32 | 0.837 | 662.3 | 647.9 | 750 |
+| WinML (ONNX) | OpenVINO NPU | w8a16 (QDQ) | 0.836 | 64.9 | 64.3 | 193 |
diff --git a/examples/microsoft_swin-large-patch4-window7-224/example.py b/examples/microsoft_swin-large-patch4-window7-224/example.py
new file mode 100644
index 000000000..0ade9ab60
--- /dev/null
+++ b/examples/microsoft_swin-large-patch4-window7-224/example.py
@@ -0,0 +1,252 @@
+# -------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+# --------------------------------------------------------------------------
+
+"""Run one image-classification inference with the WinML-built ONNX.
+
+Mirrors the HuggingFace Swin Transformer usage example
+(https://huggingface.co/docs/transformers/main/en/model_doc/swin) but
+loads the quantized ONNX produced by ``winml build`` (step 1 of the
+README) via :class:`WinMLAutoModel` instead of the original PyTorch
+checkpoint.
+
+The script preprocesses one image, runs inference, prints the top-5
+predicted classes (HF-docs format), and writes an annotated image with
+the top-1 label drawn in the corner so the result is visually
+verifiable.
+
+Usage::
+
+    uv run python examples/microsoft_swin-large-patch4-window7-224/example.py `
+        --onnx $HOME/.cache/winml/artifacts/microsoft_swin-large-patch4-window7-224/`
+        `imgcls__quantized.onnx
+"""
+
+from __future__ import annotations
+
+import argparse
+from pathlib import Path
+
+import numpy as np
+import torch
+from PIL import Image, ImageDraw, ImageFont
+from transformers import AutoConfig, AutoImageProcessor
+
+from winml.modelkit import WinMLAutoModel
+
+
+HF_MODEL_ID = "microsoft/swin-large-patch4-window7-224"
+DEFAULT_DATASET = "timm/mini-imagenet"
+DEFAULT_DATASET_SPLIT = "test"
+
+
+def parse_args() -> argparse.Namespace:
+    """Parse command-line arguments."""
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--onnx",
+        required=True,
+        type=Path,
+        help="Path to the quantized ONNX produced by step 1 of the README "
+        "(e.g. imgcls__quantized.onnx).",
+    )
+    parser.add_argument(
+        "--device",
+        default="npu",
+        choices=["auto", "npu", "gpu", "cpu"],
+        help="Target device (default: npu).",
+    )
+    parser.add_argument(
+        "--ep",
+        default="openvino",
+        help="Execution provider alias (default: openvino).",
+    )
+    parser.add_argument(
+        "--image",
+        type=Path,
+        default=None,
+        help="Local image path. If omitted, streams the first image from "
+        f"the {DEFAULT_DATASET} {DEFAULT_DATASET_SPLIT} split.",
+    )
+    parser.add_argument(
+        "--top-k",
+        type=int,
+        default=5,
+        help="Number of top predictions to print (default: 5).",
+    )
+    parser.add_argument(
+        "--output",
+        type=Path,
+        default=Path("prediction.png"),
+        help="Where to write the annotated image (default: prediction.png).",
+    )
+    return parser.parse_args()
+
+
+def load_image(image_arg: Path | None) -> tuple[Image.Image, str | None]:
+    """Load an image and (when streamed from the eval dataset) its WordNet synset.
+
+    Returns ``(image, true_synset)``. ``true_synset`` is the WordNet ID
+    (e.g. ``"n01532829"``) for the dataset's labelled class, used as the
+    universal bridge between the dataset's class indexing and the model's.
+    ``None`` when the user supplied a custom ``--image``.
+    """
+    if image_arg is not None:
+        return Image.open(image_arg.expanduser()).convert("RGB"), None
+
+    from datasets import load_dataset
+
+    # streaming=True so we only fetch the first sample instead of downloading
+    # the whole split. The ClassLabel feature (and its .names list) is still
+    # available on the streamed dataset, so we can recover the WordNet synset
+    # for the sample's integer label. trust_remote_code=False refuses to run
+    # any dataset-bundled loading script.
+    dataset = load_dataset(
+        DEFAULT_DATASET,
+        split=DEFAULT_DATASET_SPLIT,
+        streaming=True,
+        trust_remote_code=False,
+    )
+    sample = next(iter(dataset))
+
+    image = sample["image"]
+    if not isinstance(image, Image.Image):
+        image = Image.fromarray(np.asarray(image))
+    image = image.convert("RGB")
+
+    label_value = sample.get("label")
+    label_feature = dataset.features.get("label")
+    if label_value is None or label_feature is None or not hasattr(label_feature, "names"):
+        return image, None
+    return image, label_feature.names[int(label_value)]
+
+
+def imagenet_synset_to_id() -> dict[str, int]:
+    """Map WordNet synset ID -> ImageNet-1k class id (0-999).
+
+    Uses ``timm.data.ImageNetInfo`` so we don't have to ship the 1000-entry
+    list inline. The mapping is the canonical ImageNet-1k ordering that
+    the model was trained against.
+
+    Requires the optional ``timm`` package (imported lazily here, like
+    ``datasets`` in ``load_image``); raises a clear error if it is missing.
+    """
+    try:
+        from timm.data import ImageNetInfo
+    except ImportError as e:
+        raise ImportError(
+            "imagenet_synset_to_id() requires the 'timm' package. "
+            "Install it with `pip install timm`."
+        ) from e
+
+    info = ImageNetInfo()
+    return {synset: idx for idx, synset in enumerate(info.label_names())}
+
+
+def draw_top_prediction(
+    image: Image.Image,
+    label: str,
+    score: float,
+) -> Image.Image:
+    """Draw the top-1 label + confidence on a copy of ``image``."""
+    annotated = image.copy()
+    draw = ImageDraw.Draw(annotated)
+    try:
+        font = ImageFont.truetype("arial.ttf", size=max(14, annotated.height // 30))
+    except OSError:
+        font = ImageFont.load_default()
+
+    caption = f"{label} ({score:.2f})"
+    tx0, ty0, tx1, ty1 = draw.textbbox((10, 10), caption, font=font)
+    pad = 6
+    draw.rectangle(
+        [(tx0 - pad, ty0 - pad), (tx1 + pad, ty1 + pad)],
+        fill=(0, 0, 0),
+    )
+    draw.text((10, 10), caption, fill=(255, 255, 255), font=font)
+    return annotated
+
+
+def main() -> None:
+    """Load the quantized ONNX, run one inference, print + save the result."""
+    args = parse_args()
+
+    image, true_synset = load_image(args.image)
+    image_processor = AutoImageProcessor.from_pretrained(HF_MODEL_ID)
+
+    # skip_build=True uses the ONNX as-is; it has already been optimized
+    # and quantized by `winml build`. use_cache=False avoids touching the
+    # winml artifact cache for this read-only example.
+    model = WinMLAutoModel.from_pretrained(
+        args.onnx.expanduser(),
+        task="image-classification",
+        device=args.device,
+        ep=args.ep,
+        skip_build=True,
+        use_cache=False,
+    )
+
+    # Match the processor's output size to the ONNX's static input shape so
+    # pixel_values matches (B, C, H, W) exactly.
+    input_shapes = (model.io_config.get("input_shapes") or [[]])[0]
+    # Only applies to 4D image inputs (B, C, H, W); skip for other shapes.
+    if len(input_shapes) == 4:
+        _, _, h, w = input_shapes
+        image_processor.size = {"height": h, "width": w}
+
+    inputs = image_processor(images=image, return_tensors="pt")
+    outputs = model(pixel_values=inputs["pixel_values"])
+
+    # logits: (1, num_classes). softmax → probabilities, then top-k.
+    logits = outputs.logits
+    probs = torch.softmax(logits, dim=-1)[0]
+    top_k = min(args.top_k, probs.numel())
+    top_scores, top_ids = torch.topk(probs, k=top_k)
+
+    # WinML's bare-ONNX path doesn't attach an HF config to the model, so
+    # pull id2label from the HF hub for human-readable label names.
+    id2label = AutoConfig.from_pretrained(HF_MODEL_ID).id2label
+
+    top_ids_list = top_ids.tolist()
+    top_label_names = [
+        id2label.get(label_id, str(label_id)) for label_id in top_ids_list
+    ]
+
+    # Resolve the dataset's WordNet synset to an ImageNet-1k class id so we
+    # can compare against the model's prediction. The dataset (e.g.
+    # timm/mini-imagenet) often uses its own 0..N indexing over a subset of
+    # ImageNet-1k, so the raw integer label from the dataset does NOT match
+    # the model's class id — the synset is the universal bridge.
+    true_label_id: int | None = None
+    if true_synset is not None:
+        synset_to_id = imagenet_synset_to_id()
+        true_label_id = synset_to_id.get(true_synset)
+
+    if true_synset is not None:
+        if true_label_id is not None:
+            true_label_name = id2label.get(true_label_id, str(true_label_id))
+            print(f"True label: {true_label_name} (synset={true_synset}, id={true_label_id})")
+        else:
+            print(f"True label: synset={true_synset} (not in ImageNet-1k vocabulary)")
+    else:
+        print("True label: unknown (custom --image)")
+    print(f"\nTop {top_k} predictions:")
+    for rank, (label, score) in enumerate(
+        zip(top_label_names, top_scores.tolist(), strict=True), start=1,
+    ):
+        print(f" {rank}. {label} ({score:.4f})")
+
+    if true_label_id is not None:
+        verdict = "PASS" if top_ids_list[0] == true_label_id else "FAIL"
+        print(f"\nVerdict (top-1): {verdict}")
+
+    annotated = draw_top_prediction(image, top_label_names[0], float(top_scores[0].item()))
+    output_path = args.output.expanduser()
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    annotated.save(output_path)
+    print(f"\nAnnotated image written to {output_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/microsoft-table-transformer-detection/README.md b/examples/microsoft_table-transformer-detection/README.md
similarity index 100%
rename from examples/microsoft-table-transformer-detection/README.md
rename to examples/microsoft_table-transformer-detection/README.md
diff --git a/examples/microsoft-table-transformer-detection/example.py b/examples/microsoft_table-transformer-detection/example.py
similarity index 99%
rename from examples/microsoft-table-transformer-detection/example.py
rename to examples/microsoft_table-transformer-detection/example.py
index cea67b448..54d098b79 100644
--- a/examples/microsoft-table-transformer-detection/example.py
+++ b/examples/microsoft_table-transformer-detection/example.py
@@ -13,7 +13,7 @@
 
 Usage::
 
-    uv run python examples/microsoft-table-transformer-detection/example.py `
+    uv run python examples/microsoft_table-transformer-detection/example.py `
         --onnx $HOME/.cache/winml/artifacts/microsoft_table-transformer-detection/`
         `objdet__quantized.onnx
 """