feat: add eval_option support in build config and eval command #383

Merged
ssss141414 merged 15 commits into main from shzhen/example-configs-eval-dataset on May 8, 2026

Conversation

ssss141414 (Contributor) commented Apr 23, 2026

Summary

Add an eval field to WinMLBuildConfig and refactor the eval command to a config-centric flow, so that evaluation dataset configuration can be embedded in build config files.
After validation, fix one regression in the CLI override mapping for output, and extract a shared trust-remote-code CLI option for the build, config, and eval commands.

Changes

  • Refactor eval command to config-centric 3-step flow: build config -> resolve -> evaluate
  • Merge precedence: CLI > config file eval > quant/loader > dataclass defaults
  • Rename eval_option -> eval in build config
  • Move dataset-related fields into dataset config (build_script, label_mapping_file)
  • Fix output override mapping (output_path cli_name metadata)
  • Extract shared trust_remote_code_option and reuse in build/config/eval
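
The merge precedence listed above can be pictured as a layered lookup. The following is a minimal sketch, not the code from this PR: `EvalSettings`, `merge_eval_settings`, and the field names are hypothetical, and it assumes that an option left unset in a layer is represented as `None`.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class EvalSettings:
    """Hypothetical stand-in for the eval config; its defaults are the lowest layer."""
    dataset: Optional[str] = None
    split: str = "test"
    samples: int = 100
    device: str = "cpu"


def merge_eval_settings(*layers: Mapping[str, Any]) -> EvalSettings:
    """Merge layers passed from highest to lowest priority.

    A key counts as set only when its value is not None. The first layer
    that sets a key wins; unset keys fall back to the dataclass defaults.
    """
    known = {f.name for f in fields(EvalSettings)}
    merged: dict[str, Any] = {}
    for layer in reversed(layers):  # apply the lowest-priority layer first
        for key, value in layer.items():
            if key in known and value is not None:
                merged[key] = value  # higher-priority layers overwrite later
    return replace(EvalSettings(), **merged)


# Illustrative values only; None marks a CLI option the user did not pass.
cli = {"samples": 1000, "device": None}
config_eval = {"dataset": "uoft-cs/cifar100", "samples": 500, "device": "npu"}
quant_loader = {"dataset": "timm/mini-imagenet", "split": "validation"}

settings = merge_eval_settings(cli, config_eval, quant_loader)
print(settings)
```

In this sketch `samples` comes from the CLI layer, `dataset` and `device` from the config file's eval section, and `split` from the quant/loader layer. Treating `None` as "not given" is one possible design choice; it keeps an unset CLI flag from erasing a value supplied by the config file.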

Test Commands

  • resnet-50 CLI:
    uv run winml eval -m microsoft/resnet-50 --task image-classification --dataset timm/mini-imagenet --split test --samples 100 --device cpu --output temp/eval_compare/resnet50_cli_100.json

  • resnet-50 Config-file:
    uv run winml eval --config temp/eval_compare/config_resnet50.json -m microsoft/resnet-50 --samples 100 --device cpu --output temp/eval_compare/resnet50_config_100.json

  • resnet-50 run_eval.py:
    uv run python scripts/e2e_eval/run_eval.py --registry scripts/e2e_eval/testsets/models_with_acc.json --eval-type accuracy --device cpu --timeout 1800 --task image-classification --hf-model microsoft/resnet-50

  • clip-vit-base-patch16 CLI:
    uv run winml eval -m openai/clip-vit-base-patch16 --task zero-shot-image-classification --dataset uoft-cs/cifar100 --split test --samples 1000 --column input_column=img --column label_column=fine_label --device npu --output temp/eval_compare/clip_cli_1000.json

  • clip-vit-base-patch16 Config-file:
    uv run winml eval --config temp/eval_compare/config_clip_zsic.json -m openai/clip-vit-base-patch16 --output temp/eval_compare/clip_config_1000.json

  • clip-vit-base-patch16 run_eval.py:
    uv run python scripts/e2e_eval/run_eval.py --registry scripts/e2e_eval/testsets/models_with_acc.json --eval-type accuracy --device npu --timeout 3600 --task zero-shot-image-classification --hf-model openai/clip-vit-base-patch16

Results

| Model | CLI | Config-file | run_eval.py | Status |
| --- | --- | --- | --- | --- |
| microsoft/resnet-50 | 0.78 | 0.78 | 0.784 | PASS |
| openai/clip-vit-base-patch16 | top1=63.4 / top5=87.9 | top1=63.4 / top5=87.9 | top1=62.5 (ACCURACY_PASS, baseline=62.6, delta=-0.1) | PASS |

ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from d75d25f to 2dc8f75 on April 27, 2026 13:17
ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from 2dc8f75 to 54c274b on April 27, 2026 13:19
- Rename eval_dataset to eval_option with WinMLEvaluationConfig type
- Dynamically load WinMLEvaluationConfig in build config (lazy import)
- Rename build_script to dataset_script
- Add --dataset-script and --trust-remote-code CLI options
- Decouple config defaults from script execution logic
- Simplify: dataset script prints path to stdout, no cache_dir logic
- Config section only provides default values, no file existence checks
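
The "dataset script prints path to stdout" contract in the list above could be consumed roughly as follows. This is a sketch under stated assumptions rather than the PR's implementation: `run_dataset_script` is a hypothetical helper, and it assumes the last non-empty line of the script's stdout is the dataset path. Matching the list above, it performs no cache handling and no file existence check.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_dataset_script(script: Path) -> Path:
    """Run a dataset script and return the path it printed on stdout.

    Assumption: the last non-empty stdout line is the dataset path.
    """
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{script} printed no dataset path")
    return Path(lines[-1])


# Demo with a throwaway script that logs progress, then prints a path.
with tempfile.TemporaryDirectory() as tmp:
    demo_script = Path(tmp) / "build_demo.py"
    demo_script.write_text('print("building...")\nprint("data/demo_dataset")\n')
    dataset_path = run_dataset_script(demo_script)

print(dataset_path)
```

Reading only the final line lets a script emit progress messages before the path, at the cost of requiring that nothing is printed afterwards.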
ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from a615e66 to c63e6f3 on April 27, 2026 13:42
ssss141414 changed the title from "Add eval_dataset support in config + example configs for AMD/QNN/OV" to "feat: add eval_option support in build config and eval command" on Apr 27, 2026
ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch 3 times, most recently from cb403a4 to a3f3e89 on April 27, 2026 14:38
ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from a3f3e89 to 1e8f748 on April 27, 2026 14:46
ssss141414 marked this pull request as ready for review on May 7, 2026 07:44
ssss141414 requested a review from a team as a code owner on May 7, 2026 07:44
ssss141414 force-pushed the shzhen/example-configs-eval-dataset branch from 6bc1d4a to 5c6f74a on May 7, 2026 08:36

zhenchaoni (Member) left a comment

Please add a unit test for the config build. Ensure the precedence is CLI option > config file > default.
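
A test of the kind requested here might look like the sketch below. It does not exercise the real config builder: `resolve` and `DEFAULTS` are hypothetical stand-ins, and the sketch assumes an unset CLI option arrives as `None`. The functions follow pytest's `test_` naming convention but use only plain `assert`, so they also run without pytest.

```python
from collections import ChainMap

# Hypothetical defaults standing in for the dataclass defaults.
DEFAULTS = {"samples": 100, "device": "cpu"}


def resolve(cli: dict, config_file: dict) -> dict:
    """Hypothetical resolver: CLI > config file > defaults."""
    # Drop CLI options that were not passed so they cannot mask lower layers.
    given_cli = {k: v for k, v in cli.items() if v is not None}
    # ChainMap returns the value from the first mapping that has the key.
    return dict(ChainMap(given_cli, config_file, DEFAULTS))


def test_cli_overrides_config_file():
    assert resolve({"samples": 10}, {"samples": 50})["samples"] == 10


def test_config_file_overrides_default():
    assert resolve({"samples": None}, {"samples": 50})["samples"] == 50


def test_default_used_when_nothing_set():
    resolved = resolve({}, {})
    assert resolved["samples"] == 100
    assert resolved["device"] == "cpu"
```

Each test isolates one boundary between adjacent layers, so a failure points directly at the layer whose precedence broke.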

ssss141414 merged commit 0765d40 into main on May 8, 2026
9 checks passed
ssss141414 deleted the shzhen/example-configs-eval-dataset branch on May 8, 2026 05:51

4 participants