feat: add eval_option support in build config and eval command #383
Merged
zhenchaoni reviewed Apr 27, 2026
d75d25f to 2dc8f75
2dc8f75 to 54c274b
- Rename eval_dataset to eval_option with WinMLEvaluationConfig type
- Dynamic load WinMLEvaluationConfig in build config (lazy import)
- Rename build_script to dataset_script
- Add --dataset-script and --trust-remote-code CLI options
- Decouple config defaults from script execution logic
- Simplify: dataset script prints path to stdout, no cache_dir logic
- Config section only provides default values, no file existence checks
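The "dataset script prints path to stdout" contract from the commit message can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `run_dataset_script`, the timeout value, and the choice to read the last non-empty stdout line are all assumptions.

```python
import subprocess
import sys
from pathlib import Path


def run_dataset_script(script: Path, timeout: float = 600.0) -> Path:
    """Run a dataset script and return the dataset path it printed to stdout.

    Hypothetical helper: the real command may invoke the script differently.
    """
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
        timeout=timeout,
    )
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{script} printed no dataset path to stdout")
    # Assumption: the path is the last non-empty line, so earlier log
    # output from the script is tolerated.
    return Path(lines[-1])
```

Keeping the contract to "one path on stdout" means the caller needs no knowledge of where or how the script caches its data, which matches the "no cache_dir logic" simplification.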
a615e66 to c63e6f3
cb403a4 to a3f3e89
a3f3e89 to 1e8f748
zhenchaoni requested changes Apr 28, 2026
…l, move dataset fields
zhenchaoni requested changes May 6, 2026
6bc1d4a to 5c6f74a
xieofxie reviewed May 7, 2026
zhenchaoni approved these changes May 8, 2026
zhenchaoni (Member) left a comment:
Please add a unit test for the config build. Ensure the precedence is CLI option > config file > default.
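The requested precedence (CLI option > config file > default) could be checked with a test along these lines. This is a sketch under stated assumptions: `resolve_eval_options`, the `DEFAULTS` values, and the convention that `None` means "not provided" are hypothetical and do not come from the PR.

```python
from typing import Any, Mapping

# Hypothetical defaults, for illustration only.
DEFAULTS: dict[str, Any] = {"samples": 100, "device": "cpu", "split": "test"}


def resolve_eval_options(
    cli: Mapping[str, Any],
    config: Mapping[str, Any],
    defaults: Mapping[str, Any] = DEFAULTS,
) -> dict[str, Any]:
    """Merge option layers so that CLI beats config file, which beats defaults.

    Assumption: a value of None means the option was not provided in that layer.
    """
    resolved = dict(defaults)
    resolved.update({k: v for k, v in config.items() if v is not None})
    resolved.update({k: v for k, v in cli.items() if v is not None})
    return resolved


def test_precedence() -> None:
    config = {"samples": 1000, "device": "npu"}
    cli = {"samples": 50, "device": None}
    resolved = resolve_eval_options(cli, config)
    assert resolved["samples"] == 50    # CLI overrides config file
    assert resolved["device"] == "npu"  # config file overrides default
    assert resolved["split"] == "test"  # default used when nothing else is set


test_precedence()
```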
zhenchaoni approved these changes May 8, 2026
Summary
Add eval field to WinMLBuildConfig and refactor the eval command to a config-centric flow, enabling evaluation dataset configuration to be embedded in build config files.
After validation, fix one regression in the CLI override mapping for output, and extract a shared trust-remote-code CLI option that the build, config, and eval commands all reuse.
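One way to share a single option definition across several subcommands is a common parent parser. The sketch below uses the standard-library `argparse` purely for illustration; the CLI framework that winml actually uses is not stated in this PR, and `make_parser` and the help text are hypothetical.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a CLI whose subcommands share one --trust-remote-code definition."""
    # The shared option is defined once on a help-less parent parser.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument(
        "--trust-remote-code",
        action="store_true",
        help="Allow execution of code shipped with the model or dataset.",
    )

    parser = argparse.ArgumentParser(prog="winml")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Each subcommand inherits the option instead of redefining it.
    for name in ("build", "config", "eval"):
        subparsers.add_parser(name, parents=[shared])
    return parser


args = make_parser().parse_args(["eval", "--trust-remote-code"])
print(args.command, args.trust_remote_code)
```

Defining the flag once keeps its name, default, and help text identical across the three commands.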
Changes
Test Commands
resnet-50 CLI:
uv run winml eval -m microsoft/resnet-50 --task image-classification --dataset timm/mini-imagenet --split test --samples 100 --device cpu --output temp/eval_compare/resnet50_cli_100.json
resnet-50 Config-file:
uv run winml eval --config temp/eval_compare/config_resnet50.json -m microsoft/resnet-50 --samples 100 --device cpu --output temp/eval_compare/resnet50_config_100.json
resnet-50 run_eval.py:
uv run python scripts/e2e_eval/run_eval.py --registry scripts/e2e_eval/testsets/models_with_acc.json --eval-type accuracy --device cpu --timeout 1800 --task image-classification --hf-model microsoft/resnet-50
clip-vit-base-patch16 CLI:
uv run winml eval -m openai/clip-vit-base-patch16 --task zero-shot-image-classification --dataset uoft-cs/cifar100 --split test --samples 1000 --column input_column=img --column label_column=fine_label --device npu --output temp/eval_compare/clip_cli_1000.json
clip-vit-base-patch16 Config-file:
uv run winml eval --config temp/eval_compare/config_clip_zsic.json -m openai/clip-vit-base-patch16 --output temp/eval_compare/clip_config_1000.json
clip-vit-base-patch16 run_eval.py:
uv run python scripts/e2e_eval/run_eval.py --registry scripts/e2e_eval/testsets/models_with_acc.json --eval-type accuracy --device npu --timeout 3600 --task zero-shot-image-classification --hf-model openai/clip-vit-base-patch16
Results