feat: add eval_option support in build config and eval command by ssss141414 · Pull Request #383 · microsoft/winml-cli

ssss141414 · 2026-04-23T07:20:59Z

Summary

Add an eval field to WinMLBuildConfig and refactor the eval command to a config-centric flow, so that evaluation dataset configuration can be embedded in build config files.
After validation, fix one regression in the CLI override mapping for --output and extract a shared --trust-remote-code CLI option that is reused by the build, config, and eval commands.

Changes

Refactor eval command to config-centric 3-step flow: build config -> resolve -> evaluate
Merge precedence: CLI > config file eval > quant/loader > dataclass defaults
Rename eval_option -> eval in build config
Move dataset-related fields into dataset config (build_script, label_mapping_file)
Fix output override mapping (output_path cli_name metadata)
Extract shared trust_remote_code_option and reuse in build/config/eval
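
The merge precedence listed above can be illustrated with a minimal sketch. This is not the PR's implementation: `EvalSettings`, `merge_eval_settings`, the field defaults, and the `some/calibration-set` name are hypothetical, and the sketch assumes an option that was not supplied arrives as `None`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Mapping, Optional


@dataclass
class EvalSettings:
    """Hypothetical stand-in for the eval config (not the real WinMLEvaluationConfig)."""
    dataset: Optional[str] = None
    split: str = "validation"
    samples: int = 50
    device: str = "cpu"


def merge_eval_settings(
    cli: Mapping[str, Any],
    config_eval: Mapping[str, Any],
    quant_loader: Mapping[str, Any],
) -> EvalSettings:
    """Apply layers lowest priority first so that later layers override earlier ones."""
    known = {f.name for f in fields(EvalSettings)}
    merged: Dict[str, Any] = {}
    for layer in (quant_loader, config_eval, cli):
        for key, value in layer.items():
            # None means "not provided" and must not mask a lower-priority value.
            if key in known and value is not None:
                merged[key] = value
    # Fields that no layer set fall back to the dataclass defaults.
    return EvalSettings(**merged)


resolved = merge_eval_settings(
    cli={"samples": 100, "device": None},
    config_eval={"dataset": "timm/mini-imagenet", "split": "test", "samples": 1000},
    quant_loader={"dataset": "some/calibration-set", "device": "npu"},
)
print(resolved)
```

In this example the CLI wins for `samples`, the config file's eval section wins for `dataset` and `split`, and `device` comes from the quant/loader layer because neither higher-priority layer set it.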

Test Commands

resnet-50 CLI:
uv run winml eval -m microsoft/resnet-50 --task image-classification --dataset timm/mini-imagenet --split test --samples 100 --device cpu --output temp/eval_compare/resnet50_cli_100.json
resnet-50 Config-file:
uv run winml eval --config temp/eval_compare/config_resnet50.json -m microsoft/resnet-50 --samples 100 --device cpu --output temp/eval_compare/resnet50_config_100.json
resnet-50 run_eval.py:
uv run python scripts/e2e_eval/run_eval.py --registry scripts/e2e_eval/testsets/models_with_acc.json --eval-type accuracy --device cpu --timeout 1800 --task image-classification --hf-model microsoft/resnet-50
clip-vit-base-patch16 CLI:
uv run winml eval -m openai/clip-vit-base-patch16 --task zero-shot-image-classification --dataset uoft-cs/cifar100 --split test --samples 1000 --column input_column=img --column label_column=fine_label --device npu --output temp/eval_compare/clip_cli_1000.json
clip-vit-base-patch16 Config-file:
uv run winml eval --config temp/eval_compare/config_clip_zsic.json -m openai/clip-vit-base-patch16 --output temp/eval_compare/clip_config_1000.json
clip-vit-base-patch16 run_eval.py:
uv run python scripts/e2e_eval/run_eval.py --registry scripts/e2e_eval/testsets/models_with_acc.json --eval-type accuracy --device npu --timeout 3600 --task zero-shot-image-classification --hf-model openai/clip-vit-base-patch16

Results

Model	CLI	Config-file	run_eval.py	Status
microsoft/resnet-50	0.78	0.78	0.784	PASS
openai/clip-vit-base-patch16	top1=63.4 / top5=87.9	top1=63.4 / top5=87.9	top1=62.5 (ACCURACY_PASS, baseline=62.6, delta=-0.1)	PASS
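
The delta reported for run_eval.py in the clip row is the measured top-1 accuracy minus the baseline. The sketch below only reproduces that arithmetic. The `accuracy_delta` helper and its `tolerance` of 1.0 point are assumptions for illustration; the threshold that run_eval.py actually uses for ACCURACY_PASS is not stated in this PR.

```python
from typing import Tuple


def accuracy_delta(measured: float, baseline: float, tolerance: float = 1.0) -> Tuple[float, bool]:
    """Return (delta, passed), where delta = measured - baseline in percentage points."""
    delta = round(measured - baseline, 1)
    # Hypothetical rule: pass unless accuracy dropped by more than `tolerance` points.
    return delta, delta >= -tolerance


delta, passed = accuracy_delta(measured=62.5, baseline=62.6)
print(delta, passed)
```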

- Rename eval_dataset to eval_option with WinMLEvaluationConfig type
- Dynamic load WinMLEvaluationConfig in build config (lazy import)
- Rename build_script to dataset_script
- Add --dataset-script and --trust-remote-code CLI options
- Decouple config defaults from script execution logic
- Simplify: dataset script prints path to stdout, no cache_dir logic
- Config section only provides default values, no file existence checks
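
The lazy import mentioned in this commit message can be sketched as follows. This is an illustration of the general pattern, not the code in build.py: `BuildConfigSketch` and `_lazy_class` are hypothetical names, and the standard-library `fractions.Fraction` stands in for the heavy evaluation config class so that the example runs on its own.

```python
import importlib
from dataclasses import dataclass
from typing import Any, Dict, Optional


def _lazy_class(module_name: str, class_name: str) -> type:
    """Import module_name only when this is called, then return the named class."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


@dataclass
class BuildConfigSketch:
    """Hypothetical stand-in for the build config; only the eval field is modeled."""
    eval: Optional[Dict[str, Any]] = None  # raw mapping as read from the config file

    def eval_config(self) -> Any:
        # The import is deferred to first use, so loading a build config
        # does not pull in the evaluation dependencies.
        cls = _lazy_class("fractions", "Fraction")  # stand-in for the heavy class
        return cls(**(self.eval or {}))


config = BuildConfigSketch(eval={"numerator": 3, "denominator": 4})
print(config.eval_config())
```

Constructing `BuildConfigSketch` touches only the raw mapping; the stand-in module is imported when `eval_config()` is first called.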

zhenchaoni

Please add a unit test for the config build. Ensure the precedence is CLI option > config file > default.

zhenchaoni reviewed Apr 27, 2026

DingmaomaoBJTU reviewed Apr 27, 2026

Comment thread src/winml/modelkit/config/build.py Outdated

DingmaomaoBJTU reviewed Apr 27, 2026

Comment thread src/winml/modelkit/config/build.py

ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from d75d25f to 2dc8f75 Compare April 27, 2026 13:17

Add eval_dataset support in config and eval command

54c274b

ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from 2dc8f75 to 54c274b Compare April 27, 2026 13:19

ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from a615e66 to c63e6f3 Compare April 27, 2026 13:42

ssss141414 changed the title ~~Add eval_dataset support in config + example configs for AMD/QNN/OV~~ feat: add eval_option support in build config and eval command Apr 27, 2026

ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch 3 times, most recently from cb403a4 to a3f3e89 Compare April 27, 2026 14:38

fix comments

1e8f748

ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from a3f3e89 to 1e8f748 Compare April 27, 2026 14:46

DingmaomaoBJTU reviewed Apr 28, 2026

Comment thread scripts/e2e_eval/datasets/build_fairface.py Outdated

DingmaomaoBJTU reviewed Apr 28, 2026

zhenchaoni requested changes Apr 28, 2026

Comment thread src/winml/modelkit/commands/eval.py Outdated

Comment thread src/winml/modelkit/commands/eval.py Outdated

Comment thread src/winml/modelkit/commands/eval.py Outdated

Comment thread src/winml/modelkit/commands/eval.py Outdated

ssss141414 added 4 commits April 29, 2026 11:24

refactor eval command: config-centric flow, rename eval_option to eval, move dataset fields

1da3384

revert: restore build scripts to original state

75f931b

fix: require dataset.path when build_script is set, always pass --output

151e7a1

Merge remote-tracking branch 'origin/main' into shzhen/example-configs-eval-dataset

4fce98d

zhenchaoni requested changes May 6, 2026

Comment thread src/winml/modelkit/commands/eval.py Outdated

Comment thread src/winml/modelkit/commands/eval.py Outdated

Comment thread src/winml/modelkit/commands/eval.py Outdated

ssss141414 added 2 commits May 7, 2026 10:50

merge: resolve conflicts with main (take main's eval.py as base)

b02e62f

refactor: single _build_eval_config, collect_cli_overrides utility, use main's _resolve_model_path

7ffb245

ssss141414 marked this pull request as ready for review May 7, 2026 07:44

ssss141414 requested a review from a team as a code owner May 7, 2026 07:44

fix: lazy import WinMLEvaluationConfig to avoid heavy dep import at config load time

5c6f74a

ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from 6bc1d4a to 5c6f74a Compare May 7, 2026 08:36

xieofxie reviewed May 7, 2026

Comment thread src/winml/modelkit/commands/eval.py Outdated

fix(eval): restore --output override mapping and share trust-remote-code option

5517257

ssss141414 added 3 commits May 8, 2026 11:42

merge: resolve conflicts with main, integrate ep parameter

316c939

merge: include remaining main changes from previous merge

ecab53b

merge: resolve eval.py conflicts with main

2ab5ae1

zhenchaoni approved these changes May 8, 2026

test(eval): add precedence unit test for cli over config defaults

83d5a7c

zhenchaoni approved these changes May 8, 2026

ssss141414 merged commit 0765d40 into main May 8, 2026
9 checks passed

ssss141414 deleted the shzhen/example-configs-eval-dataset branch May 8, 2026 05:51

4 participants
