fix: op-tracing preflight, eval samples merge, compile device label, HTP tagged_nodes stats#209
fix: op-tracing preflight, eval samples merge, compile device label, HTP tagged_nodes stats#209xieofxie wants to merge 5 commits into
Conversation
- Replace fragile class-default comparison in evaluate.py with explicit_fields sentinel tracking; CLI --samples/--split/--shuffle now use None defaults - Add DatasetConfig.explicit_fields frozenset to track caller-set fields - Deduplicate tagged_nodes/coverage stats via _update_tag_stats() helper in HTPExporter; export() stats block now calls the shared helper - Fix test imports: HTPExporter from package level, DatasetConfig with explicit_fields in Bug B regression test
zhenchaoni
left a comment
There was a problem hiding this comment.
The change related to eval and evaluate looks good to me
There was a problem hiding this comment.
I dont quite get the default value changes here, why remove default values from cli args, and delay to the function call?
There was a problem hiding this comment.
I have updated the previous bugbash report in description
Bug B — P2: wmk eval --samples N silently ignored when no --dataset is given
Command: wmk eval -m microsoft/resnet-50 --samples 20
Symptom: Output always shows 'samples': 100 regardless of --samples value.
Root cause: evaluate.py:149-155 — when config.dataset.path is None (no --dataset flag), the entire DatasetConfig is replaced wholesale with the hardcoded _DEFAULT_DATASETS entry (which has samples=100), discarding the user's value:
if config.dataset.path is None:
config.dataset = deepcopy(default) # overwrites samples, split, shuffle
Fix: Merge user-specified fields (samples, split, shuffle, seed) onto the default rather than replacing it entirely.
Location: src/winml/modelkit/eval/evaluate.py:155
The None could let user decide either user value or default config value could take effect
Summary
wmk perf --op-tracing: movedis_qnn_profiling_available()check to beforePerfBenchmark.run()so the command fails fast instead of wasting a full benchmark runwmk eval --samples N: replaced wholesaleDatasetConfigoverwrite with a merge that preserves user-specifiedsamples,split,shuffle, andseedwhen falling back to a default datasetwmk compile --ep dml: derive the displayedDevice:label from_EP_TO_DEVICE[provider]instead of the raw CLI--devicedefault (npu)wmk export --no-hierarchy: gatetagged_nodesandcoverage_percentagestats onembed_hierarchy_attributesso they report0when hierarchy tagging is disabledEach bug is covered by a new regression test that failed before the fix and passes after.