Fix model-task inconsistency for vision feature-extraction models by zhenchaoni · Pull Request #786 · microsoft/winml-cli

zhenchaoni · 2026-05-29T02:56:05Z

Fix model-task inconsistency for vision feature-extraction models

Principle

winml inspect is the source of truth for valid (model_id, task) pairs. Both feature-extraction and image-feature-extraction are valid ways to address an image-embedding model like facebook/dinov2-base. Downstream commands must accept whichever name winml inspect accepts, then use (model_id, task) to locate the concrete class to act on.

Root cause

Optimum's TasksManager.get_exporter_config_constructor only knows canonical Optimum task names. Several call sites passed the raw user-supplied task straight through, so HF aliases like image-feature-extraction were rejected with "Unsupported". The evaluator additionally needs to know which HF pipeline name to dispatch on, which the canonical Optimum task name doesn't carry by itself for bimodal tasks like feature-extraction.
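The failure mode can be shown in miniature. This is a sketch, not the winml code: `CANONICAL_TASKS`, `TASK_SYNONYMS`, `normalize_task` and `lookup_config` are hypothetical stand-ins for Optimum's registry and for `_map_task_synonym`, and the real tables are larger.

```python
# Hypothetical stand-in for a registry that only knows canonical task names.
CANONICAL_TASKS = {"feature-extraction", "image-classification"}

# Hypothetical alias table mapping HF pipeline names to canonical names.
TASK_SYNONYMS = {"image-feature-extraction": "feature-extraction"}


def normalize_task(task: str) -> str:
    """Map an HF alias to its canonical name; canonical names pass through."""
    return TASK_SYNONYMS.get(task, task)


def lookup_config(task: str) -> str:
    """Mimic a lookup that rejects anything but canonical task names."""
    if task not in CANONICAL_TASKS:
        raise KeyError(f"Unsupported task: {task}")
    return f"config-for-{task}"


raw_task = "image-feature-extraction"

# Passing the raw alias straight through fails, as in the bug.
try:
    lookup_config(raw_task)
    raw_lookup_failed = False
except KeyError:
    raw_lookup_failed = True

# Normalizing at the boundary makes the same request succeed.
resolved = lookup_config(normalize_task(raw_task))
```

The point of the sketch is only the ordering: normalization happens immediately before the lookup, so callers can keep using whichever name they were given.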

Fix

Inspect / export / HTP exporter: normalize via _map_task_synonym(task) (in export/io.py) before any TasksManager lookup, because TasksManager only accepts canonical task names. This is a single function reused at each TasksManager boundary; no new global table is introduced.
Quantize: _resolve_dataset_class(task, io_config) in datasets/__init__.py dispatches to TextDataset / ImageDataset based on the actual ONNX input names, with no AutoConfig.from_pretrained round-trip. Bimodal io_configs fall back to RandomDataset with a warning.
Evaluate: HF pipelines and the evaluate library follow their own task-naming convention, so to_hf_pipeline_task(task, model_id) in eval/evaluate.py translates the task to the HF pipeline name that the underlying evaluate library expects. It uses OnnxConfig.inputs (no weights loaded) to pick the modality. Bimodal models (e.g. the combined CLIP model, which takes both pixel_values and input_ids) keep the task unchanged via a len(hits) == 1 guard, preserving the explicit user task.
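The single-hit guard described for the evaluator can be sketched as follows. This is an illustration under assumptions, not the code in eval/evaluate.py: `INPUT_TO_HF_TASK` and `pick_hf_task` are hypothetical names, and the input lists are passed in directly instead of being read from an OnnxConfig.

```python
from typing import Iterable

# Hypothetical mapping from a distinguishing input name to an HF pipeline task.
INPUT_TO_HF_TASK = {
    "pixel_values": "image-feature-extraction",
    "input_ids": "feature-extraction",
}


def pick_hf_task(task: str, input_names: Iterable[str]) -> str:
    """Translate the bimodal canonical task only when one modality is present."""
    if task != "feature-extraction":
        return task  # assumption: only this canonical name is ambiguous
    names = set(input_names)
    hits = [hf for marker, hf in INPUT_TO_HF_TASK.items() if marker in names]
    if len(hits) == 1:
        return hits[0]
    # Zero or several markers: ambiguous, so keep the caller's explicit task.
    return task


dinov2_task = pick_hf_task("feature-extraction", ["pixel_values"])
clip_task = pick_hf_task(
    "feature-extraction", ["pixel_values", "input_ids", "attention_mask"]
)
```

With an image-only input list the task is translated; with the combined CLIP inputs both markers hit, so the guard leaves the user's task alone.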

Validation

facebook/dinov2-base:

| Command | Before | After |
|---|---|---|
| `winml inspect -m facebook/dinov2-base --task image-feature-extraction` | "Unsupported" | Resolves via `Dinov2OnnxConfig` |
| `winml export -m facebook/dinov2-base -t image-feature-extraction` | KeyError on TasksManager | Valid ONNX with `last_hidden_state` |
| `winml eval -m facebook/dinov2-base --task feature-extraction` | `RuntimeError: Failed to create feature-extraction dataset` | kNN metrics on mini-imagenet |
| `winml quantize <onnx> --task feature-extraction -m facebook/dinov2-small` | Fails because `TextDataset` is used | Routes to `ImageDataset` |

openai/clip-vit-base-patch32 (bimodal, regression check):

winml eval -m openai/clip-vit-base-patch32 --task feature-extraction → stays feature-extraction (text STS evaluator); not silently rerouted to image.
winml eval -m openai/clip-vit-base-patch32 (auto-detect) → resolves to feature-extraction (text).

Tests

Unit:

tests/unit/eval/test_eval.py::TestResolveTask — auto-detect, explicit task, bimodal guard, HF pipeline translation.
test_random_dataset.py — TASK_DATASET_MAPPING covers all registered tasks, including bimodal dict-of-dict.

E2E (-m e2e, dinov2 chosen because it isn't in MODEL_BUILD_CONFIGS and so actually exercises the TasksManager path):

tests/e2e/test_inspect_e2e.py::TestInspectDinoV2 — both image-feature-extraction and feature-extraction resolve.
tests/e2e/test_export_e2e.py::TestExportDinoV2::test_image_feature_extraction.
tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction parameterized over both task names.
tests/e2e/test_quantize_e2e.py::test_feature_extraction_with_pixel_values_uses_image_dataset.

timenick · 2026-05-29T03:16:42Z

Thanks for tackling these three — the e2e coverage on dinov2 end-to-end across inspect/export/eval/quantize is exactly what was missing. Two concrete concerns and a proposal before this lands.

Concrete regression: CLIP `feature-extraction` will swap dataset class

canonical_task_to_known_task iterates _OPTIMUM_CANONICAL_TO_KNOWN_TASK["feature-extraction"] and returns on first input-name match. For CLIP, CLIPOnnxConfig for feature-extraction declares both pixel_values and input_ids (verified via TasksManager.get_exporter_config_constructor("onnx", model_type="clip", task="feature-extraction") — inputs are ['pixel_values', 'input_ids', 'attention_mask']). With current dict insertion order, pixel_values matches first → routes to image-feature-extraction.

But src/winml/modelkit/models/hf/clip.py:53 already encodes the right answer:

("clip", "feature-extraction"): CLIPTextModelWithProjection,        # text encoder
("clip", "image-feature-extraction"): CLIPVisionModelWithProjection, # vision encoder

So winml quantize -m openai/clip-vit-base-patch32 --task feature-extraction:

Before this PR: loader picks CLIPTextModelWithProjection, datasets picks TextDataset → consistent
After this PR: loader still picks CLIPTextModelWithProjection, datasets translated to image-feature-extraction → ImageDataset → dtype/shape mismatch at calibration

The input-name heuristic doesn't have enough information to disambiguate multi-modal architectures; MODEL_CLASS_MAPPING already holds the truth and disagrees with what the heuristic returns.
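A reduced reproduction of the disagreement, as a sketch: `CANDIDATES` and `first_match` are hypothetical stand-ins for `_OPTIMUM_CANONICAL_TO_KNOWN_TASK` and `canonical_task_to_known_task`, and the class names from clip.py are kept as plain strings so the snippet runs without transformers installed.

```python
# Hypothetical candidate table; dict insertion order decides what is tried first.
CANDIDATES = {
    "image-feature-extraction": "pixel_values",
    "feature-extraction": "input_ids",
}

# The two loader entries quoted above, with class names as strings.
MODEL_CLASS_MAPPING = {
    ("clip", "feature-extraction"): "CLIPTextModelWithProjection",
    ("clip", "image-feature-extraction"): "CLIPVisionModelWithProjection",
}


def first_match(input_names):
    """Return the first candidate task whose marker input is present."""
    for task, marker in CANDIDATES.items():
        if marker in input_names:
            return task
    return None


clip_inputs = ["pixel_values", "input_ids", "attention_mask"]

# What the dataset side would conclude from input names alone.
dataset_task = first_match(clip_inputs)

# What the loader side picks for the task the user actually typed.
loader_class = MODEL_CLASS_MAPPING[("clip", "feature-extraction")]
```

Here `dataset_task` comes out as the image task while `loader_class` is the text encoder, which is the inconsistency that would surface at calibration.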

`DatasetCalibrationReader.__init__` quietly rewrites `self.task`

Caller passes task="feature-extraction", but after the new lines self.task becomes "image-feature-extraction". Any downstream consumer of reader.task expecting "what the caller asked for" gets something different. Either name the new attribute (self.resolved_task?) or document the semantic shift explicitly on the class.
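One possible shape for the rename option, as a sketch only: `CalibrationReaderSketch` and its `_resolve` rule are hypothetical and much simpler than the real reader.

```python
class CalibrationReaderSketch:
    """Hypothetical reader that keeps the requested and the derived task apart."""

    def __init__(self, task, input_names):
        self.task = task  # what the caller asked for, never rewritten
        self.resolved_task = self._resolve(task, set(input_names))

    @staticmethod
    def _resolve(task, names):
        # Assumed rule for illustration: image-only inputs imply the image task.
        image_only = "pixel_values" in names and "input_ids" not in names
        if task == "feature-extraction" and image_only:
            return "image-feature-extraction"
        return task


reader = CalibrationReaderSketch("feature-extraction", ["pixel_values"])
```

Consumers that want the caller's intent read `reader.task`; dataset dispatch reads `reader.resolved_task`, so neither has to guess which meaning it is getting.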

Architectural direction — two patches vs. one source fix

The pattern we're ending up with: three places inline-call _map_task_synonym (forward), plus a recovery function (reverse) that has to re-derive what was lost. Both directions exist because _detect_task_from_config drops modality at the source by calling TasksManager.infer_task_from_model, which is Optimum-canonical.

Prevention alternative: have _detect_task_from_config return the HF (modality-aware) task to begin with (HF Hub pipeline_tag first, then a (model_type, config)→HF modality fallback). Then:

the three inline forward-direction calls collapse into one boundary helper (we'd want this regardless — a lint rule against direct TasksManager imports keeps new code honest),
canonical_task_to_known_task and _OPTIMUM_CANONICAL_TO_KNOWN_TASK go away,
the CLIP-style ambiguity above doesn't exist because we never need to recover modality from input names.
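Roughly what I have in mind, as a sketch under assumptions: `detect_hf_task`, `to_optimum_task` and both tables are hypothetical, and the fallback here is keyed on model_type only, whereas the real one would also need to look at the config.

```python
from typing import Optional

# Hypothetical fallback table keyed by model_type; real entries would need review.
MODEL_TYPE_TO_HF_TASK = {
    "dinov2": "image-feature-extraction",
    "bert": "feature-extraction",
}

# Hypothetical alias table used only at the Optimum boundary.
HF_TO_OPTIMUM = {"image-feature-extraction": "feature-extraction"}


def detect_hf_task(pipeline_tag: Optional[str], model_type: str) -> Optional[str]:
    """Prefer the Hub pipeline_tag; fall back to a model_type lookup."""
    if pipeline_tag:
        return pipeline_tag
    return MODEL_TYPE_TO_HF_TASK.get(model_type)


def to_optimum_task(hf_task: str) -> str:
    """Single boundary helper: the only place that produces canonical names."""
    return HF_TO_OPTIMUM.get(hf_task, hf_task)


detected = detect_hf_task(None, "dinov2")
canonical = to_optimum_task(detected)
```

Because the modality-aware name is kept everywhere except inside `to_optimum_task`, nothing downstream has to reconstruct modality from input names.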

Proposed path

Fix the CLIP regression in this PR — it's a marquee architecture and the regression is real.
Resolve the self.task semantic shift (rename or document).
Merge: the e2e fix value for #777 (Model task handling is not consistent between config and build), #778 (winml build and eval command fails on image embedding model like facebook/dinov2-base), and #782 (winml inspect -m facebook/dinov2-base --task image-feature-extraction returns unsupported) is real and shouldn't wait.
I'll open a follow-up doing the prevention refactor: _detect_task_from_config → HF task, centralize the forward-normalize into a boundary helper with a lint rule, delete the canonical-to-known table. Your e2e tests stay green throughout.

Happy to pair on the follow-up — there's overlap with the #724 / KNOWN_TASKS cleanup I'm queuing up separately.

(Edit: removed accidental #1/#2/#3 section markers that GitHub auto-linked to unrelated issues.)

DingmaomaoBJTU

Good fix for the task-naming inconsistency between HF pipeline aliases and Optimum's canonical names. The approach of normalizing via _map_task_synonym at each TasksManager call site is correct. The to_hf_pipeline_task evaluator mapper and the bimodal feature-extraction dataset dispatch are both well-thought-out. A few observations:


@zhenchaoni

…e logic (#820)

What

`#807` (merged) made `detect_task` modality-aware and removed `#786`'s bimodal `feature-extraction` io_config reverse-reconstruction (in both `datasets` and `eval`). But `#786`'s e2e tests, still on `main`, assert the *old* bimodal behavior, so they now fail on `main`:

`tests/e2e/test_quantize_e2e.py::TestPerTaskDatasets::test_feature_extraction_with_pixel_values_uses_image_dataset`
`tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction[feature-extraction]`

Both crash with `KeyError: Dinov2Config`: a vision model + `--task feature-extraction` now resolves to the text `TextDataset`, which tries to load a tokenizer DINOv2 does not have.

Why update the tests (not revert the code)

Under the merged modality-aware task vocabulary:

`feature-extraction` is the **text** feature task; a vision embedding model's canonical task is **`image-feature-extraction`**.
`winml inspect` / auto-detect always report `image-feature-extraction` for a vision model (e.g. `facebook/dinov2-small`), so the tool never hands a user `--task feature-extraction` for a vision model.
So explicit `--task feature-extraction` on a vision model is a genuine modality mismatch and is expected to error.

The `#786` bimodal io_config dispatch (which silently recovered modality from the ONNX inputs) was deliberately removed in `#807`. The e2e tests follow the new logic:

**quantize**: `test_feature_extraction_with_pixel_values_uses_image_dataset` → `test_image_feature_extraction_uses_image_dataset`, asserting `--task image-feature-extraction` → `ImageDataset` (the canonical vision-feature calibration path).
**eval**: drop the `feature-extraction` param from `test_image_feature_extraction` (vision feature = `image-feature-extraction`).

Also loosen the kNN floors to sanity levels (10/25), consistent with this file's N=10 convention (*"Loose floors guard against degenerate output, not magnitude"*); the previous 30/60 floors flaked at top1=20 on a 10-sample kNN.

Heads-up for @zhenchaoni

This realigns the two e2e tests you added in `#786` and, by extension, accepts dropping the bimodal `--task feature-extraction`-on-a-vision-model capability `#786` introduced. That removal landed in `#807` (merged); this PR only updates the tests to match. If you'd rather keep that capability working (graceful) instead of erroring on the mismatch, that means restoring the io_config dispatch; happy to do that instead. Flagging for your call.
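A back-of-the-envelope check on why a 30 floor is fragile at N=10. This is a simplification, not a measurement: it treats each of the 10 samples as an independent trial, and `ASSUMED_TRUE_TOP1 = 0.4` is an illustrative value chosen for the example, not a number taken from the test runs.

```python
from math import comb


def prob_below_floor(n: int, p: float, floor_pct: float) -> float:
    """P(accuracy < floor) when accuracy = correct/n and correct ~ Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        if 100.0 * k / n < floor_pct:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


N = 10
ASSUMED_TRUE_TOP1 = 0.4  # illustrative assumption, not a measured value

# Chance that a run lands below each floor purely from sampling noise.
flake_at_30 = prob_below_floor(N, ASSUMED_TRUE_TOP1, 30)
flake_at_10 = prob_below_floor(N, ASSUMED_TRUE_TOP1, 10)
```

Under these assumptions the 30 floor is missed whenever 2 or fewer of 10 samples are correct, which happens with probability of about 0.17, while the 10 floor is missed only when none are correct. With 10 samples the metric also moves in steps of 10 points, so a top1 of 20 is a single sample away from 30.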

zhenchaoni requested a review from a team as a code owner May 29, 2026 02:56

zhenchaoni force-pushed the private/zhenni/fix_task_inconsistency branch from 7675ece to b272e3b Compare June 1, 2026 03:14

Fix inside quantize and evaluator

42830e0

zhenchaoni force-pushed the private/zhenni/fix_task_inconsistency branch from b272e3b to 42830e0 Compare June 1, 2026 07:21

Resolve comments

d7791ab

DingmaomaoBJTU reviewed Jun 1, 2026

Comment thread src/winml/modelkit/eval/evaluate.py

Comment thread src/winml/modelkit/datasets/__init__.py

Comment thread src/winml/modelkit/inspect/resolver.py Outdated

DingmaomaoBJTU mentioned this pull request Jun 1, 2026

feat(timm): enable timm image-classification models via library routing #790

Merged

zhenchaoni changed the title ~~Private/zhenni/fix task inconsistency~~ Fix model-task inconsistency for vision feature-extraction models Jun 2, 2026

github-advanced-security AI found potential problems Jun 2, 2026

Comment thread src/winml/modelkit/export/__init__.py Fixed

Resolve comments

303c6b2

zhenchaoni force-pushed the private/zhenni/fix_task_inconsistency branch from 2aa814f to 303c6b2 Compare June 2, 2026 03:42

DingmaomaoBJTU approved these changes Jun 2, 2026

resolve conflict

ff497a8

DingmaomaoBJTU approved these changes Jun 2, 2026

zhenchaoni merged commit 2dfce36 into main Jun 2, 2026
9 checks passed

zhenchaoni deleted the private/zhenni/fix_task_inconsistency branch June 2, 2026 05:50

timenick mentioned this pull request Jun 8, 2026

e2e test_image_feature_extraction: kNN accuracy floor is unreachable & noise-level at samples=10 (quantization & EP innocent) #826

Closed

zhenchaoni merged 4 commits into main from private/zhenni/fix_task_inconsistency
