
Fix model-task inconsistency for vision feature-extraction models #786

Merged
zhenchaoni merged 4 commits into main from private/zhenni/fix_task_inconsistency
Jun 2, 2026

@zhenchaoni zhenchaoni commented May 29, 2026

Member

Fix model-task inconsistency for vision feature-extraction models

Fixes #777, #778, #782.

Principle

winml inspect is the source of truth for valid (model_id, task) pairs. Both feature-extraction and image-feature-extraction are valid ways to address an image-embedding model like facebook/dinov2-base. Downstream commands must accept whichever name winml inspect accepts, then use (model_id, task) to locate the concrete class to act on.

Root cause

Optimum's TasksManager.get_exporter_config_constructor only knows canonical Optimum task names. Several call sites passed the raw user-supplied task straight through, so HF aliases like image-feature-extraction were rejected with "Unsupported". The evaluator additionally needs to know which HF pipeline name to dispatch on, which the canonical Optimum task name doesn't carry by itself for bimodal tasks like feature-extraction.
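
The failure mode described above can be sketched in isolation. This is a minimal illustration, not the project's code: `CANONICAL_TASKS`, `SYNONYMS`, `lookup_config`, `map_task_synonym`, and `lookup_config_normalized` are hypothetical stand-ins for a registry that only knows canonical names and for a helper that normalizes aliases before the lookup.

```python
# Hypothetical stand-ins for illustration; not Optimum's real tables or API.
CANONICAL_TASKS = {"feature-extraction", "image-classification"}
SYNONYMS = {"image-feature-extraction": "feature-extraction"}


def lookup_config(task: str) -> str:
    """Mimic a registry that only understands canonical task names."""
    if task not in CANONICAL_TASKS:
        raise KeyError(f"Unsupported task: {task}")
    return f"config-for-{task}"


def map_task_synonym(task: str) -> str:
    """Translate an alias to its canonical name; canonical names pass through."""
    return SYNONYMS.get(task, task)


def lookup_config_normalized(task: str) -> str:
    """Normalize at the boundary, then run the canonical-only lookup."""
    return lookup_config(map_task_synonym(task))
```

Passing the raw alias to `lookup_config` raises, while `lookup_config_normalized` accepts both spellings. Note that normalization deliberately loses the modality, which is why the evaluator needs a separate translation step.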

Fix

  • Inspect / export / HTP exporter: normalize via _map_task_synonym(task) (in export/io.py) before any TasksManager lookup, because TasksManager only accepts canonical Optimum task names. This is a single function reused at each TasksManager boundary; no new global table is added.
  • Quantize: _resolve_dataset_class(task, io_config) in datasets/__init__.py dispatches to TextDataset / ImageDataset based on the actual ONNX input names. No AutoConfig.from_pretrained round-trip. Bimodal io_configs fall back to RandomDataset with a warning.
  • Evaluate: because the HF pipeline and the evaluate library follow their own task-name convention, to_hf_pipeline_task(task, model_id) in eval/evaluate.py translates the task to the HF pipeline name that the underlying evaluate library expects. It uses OnnxConfig.inputs (no weights loaded) to pick the modality. Bimodal models (e.g. combined CLIP, which declares both pixel_values and input_ids) keep the task unchanged via a len(hits) == 1 guard, preserving the explicit user task.
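
The quantize-side dispatch in the list above can be sketched as follows. This is an assumption-laden illustration: `resolve_dataset_kind` is a hypothetical name, it takes a plain list of ONNX input names rather than the project's io_config object, and it returns dataset names as strings. The fallback for inputs with neither marker is also an assumption of this sketch.

```python
import warnings

# Assumed marker input names; the real project may inspect io_config differently.
TEXT_INPUT = "input_ids"
IMAGE_INPUT = "pixel_values"


def resolve_dataset_kind(input_names: list[str]) -> str:
    """Pick a dataset kind for a feature-extraction model from its input names."""
    has_text = TEXT_INPUT in input_names
    has_image = IMAGE_INPUT in input_names
    if has_text and has_image:
        # Bimodal: the input names alone cannot pick a single modality.
        warnings.warn("bimodal inputs; falling back to RandomDataset")
        return "RandomDataset"
    if has_image:
        return "ImageDataset"
    if has_text:
        return "TextDataset"
    # Neither marker present: this sketch also falls back (an assumption).
    return "RandomDataset"
```

Because the decision reads only input names, it needs no model config download, which matches the "no AutoConfig.from_pretrained round-trip" point above.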

Validation

facebook/dinov2-base:

| Command | Before | After |
|---|---|---|
| `winml inspect -m facebook/dinov2-base --task image-feature-extraction` | "Unsupported" | Resolves via `Dinov2OnnxConfig` |
| `winml export -m facebook/dinov2-base -t image-feature-extraction` | KeyError on TasksManager | Valid ONNX with `last_hidden_state` |
| `winml eval -m facebook/dinov2-base --task feature-extraction` | `RuntimeError: Failed to create feature-extraction dataset` | kNN metrics on mini-imagenet |
| `winml quantize <onnx> --task feature-extraction -m facebook/dinov2-small` | Fails because TextDataset is used | Routes to `ImageDataset` |

openai/clip-vit-base-patch32 (bimodal, regression check):

  • winml eval -m openai/clip-vit-base-patch32 --task feature-extraction → stays feature-extraction (text STS evaluator); not silently rerouted to image.
  • winml eval -m openai/clip-vit-base-patch32 (auto-detect) → resolves to feature-extraction (text).
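
The regression check above relies on the single-hit guard. A minimal sketch of that guard, assuming a hypothetical `pick_pipeline_task` that takes input names directly (the PR's `to_hf_pipeline_task` takes a model id and reads `OnnxConfig.inputs` instead) and an assumed `PIPELINE_BY_INPUT` table:

```python
# Assumed mapping from a marker input name to an HF pipeline task.
PIPELINE_BY_INPUT = {
    "pixel_values": "image-feature-extraction",
    "input_ids": "feature-extraction",
}


def pick_pipeline_task(task: str, input_names: list[str]) -> str:
    """Refine 'feature-extraction' only when exactly one modality matches."""
    if task != "feature-extraction":
        return task
    hits = {PIPELINE_BY_INPUT[n] for n in input_names if n in PIPELINE_BY_INPUT}
    if len(hits) == 1:
        return next(iter(hits))
    # Zero or several modalities: keep the task the user asked for.
    return task
```

With DINOv2-style inputs (`pixel_values` only) the task is refined to the image variant; with CLIP-style inputs (both markers) the set has two entries, so the explicit task is left alone.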

Tests

Unit:

  • tests/unit/eval/test_eval.py::TestResolveTask — auto-detect, explicit task, bimodal guard, HF pipeline translation.
  • test_random_dataset.py — TASK_DATASET_MAPPING covers all registered tasks, including bimodal dict-of-dict.

E2E (-m e2e, dinov2 chosen because it isn't in MODEL_BUILD_CONFIGS and so actually exercises the TasksManager path):

  • tests/e2e/test_inspect_e2e.py::TestInspectDinoV2 — both image-feature-extraction and feature-extraction resolve.
  • tests/e2e/test_export_e2e.py::TestExportDinoV2::test_image_feature_extraction.
  • tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction parameterized over both task names.
  • tests/e2e/test_quantize_e2e.py::test_feature_extraction_with_pixel_values_uses_image_dataset.

@zhenchaoni zhenchaoni requested a review from a team as a code owner May 29, 2026 02:56

timenick commented May 29, 2026

Collaborator

Thanks for tackling these three — the e2e coverage on dinov2 end-to-end across inspect/export/eval/quantize is exactly what was missing. Two concrete concerns and a proposal before this lands.

Concrete regression: CLIP feature-extraction will swap dataset class

canonical_task_to_known_task iterates _OPTIMUM_CANONICAL_TO_KNOWN_TASK["feature-extraction"] and returns on first input-name match. For CLIP, CLIPOnnxConfig for feature-extraction declares both pixel_values and input_ids (verified via TasksManager.get_exporter_config_constructor("onnx", model_type="clip", task="feature-extraction") — inputs are ['pixel_values', 'input_ids', 'attention_mask']). With current dict insertion order, pixel_values matches first → routes to image-feature-extraction.
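
To make the ordering dependence concrete, here is a small sketch of a first-match lookup. The table shape and the name `first_match` are hypothetical; only the iteration order and the CLIP input list quoted above are taken from the discussion.

```python
# Hypothetical table shape; only the insertion order matters for this sketch.
CANONICAL_TO_KNOWN = {
    "feature-extraction": {
        "pixel_values": "image-feature-extraction",
        "input_ids": "feature-extraction",
    },
}


def first_match(task: str, input_names: list[str]) -> str:
    """Return the known task for the first marker found among the inputs."""
    for marker, known_task in CANONICAL_TO_KNOWN.get(task, {}).items():
        if marker in input_names:
            return known_task
    return task


clip_inputs = ["pixel_values", "input_ids", "attention_mask"]
```

Calling `first_match("feature-extraction", clip_inputs)` yields the image task purely because `pixel_values` was inserted first; swapping the two entries would flip the result, which shows the heuristic has no real basis for choosing.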

But src/winml/modelkit/models/hf/clip.py:53 already encodes the right answer:

("clip", "feature-extraction"): CLIPTextModelWithProjection,        # text encoder
("clip", "image-feature-extraction"): CLIPVisionModelWithProjection, # vision encoder

So winml quantize -m openai/clip-vit-base-patch32 --task feature-extraction:

  • Before this PR: loader picks CLIPTextModelWithProjection, datasets picks TextDataset → consistent
  • After this PR: loader still picks CLIPTextModelWithProjection, but datasets translates the task to image-feature-extraction → ImageDataset → dtype/shape mismatch at calibration

The input-name heuristic doesn't have enough information to disambiguate multi-modal architectures; MODEL_CLASS_MAPPING already holds the truth and disagrees with what the heuristic returns.

DatasetCalibrationReader.__init__ quietly rewrites self.task

Caller passes task="feature-extraction", but after the new lines self.task becomes "image-feature-extraction". Any downstream consumer of reader.task expecting "what the caller asked for" gets something different. Either name the new attribute (self.resolved_task?) or document the semantic shift explicitly on the class.
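
One way to read the "name the new attribute" option is sketched below. `CalibrationReaderSketch` and its resolution rule are hypothetical and are not the real `DatasetCalibrationReader`; the point is only that the requested and resolved values live in separate attributes.

```python
class CalibrationReaderSketch:
    """Hypothetical reader that keeps the requested and resolved tasks apart."""

    def __init__(self, task: str, input_names: list[str]) -> None:
        self.task = task  # exactly what the caller passed
        self.resolved_task = self._resolve(task, input_names)

    @staticmethod
    def _resolve(task: str, input_names: list[str]) -> str:
        # Assumed rule: an image-only feature-extraction model resolves to the image task.
        image_only = "pixel_values" in input_names and "input_ids" not in input_names
        if task == "feature-extraction" and image_only:
            return "image-feature-extraction"
        return task
```

Consumers that need "what the caller asked for" read `task`, while dataset selection reads `resolved_task`, so neither silently changes meaning.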

Architectural direction — two patches vs. one source fix

The pattern we're ending up with: three places inline-call _map_task_synonym (forward), plus a recovery function (reverse) that has to re-derive what was lost. Both directions exist because _detect_task_from_config drops modality at the source by calling TasksManager.infer_task_from_model, which is Optimum-canonical.

Prevention alternative: have _detect_task_from_config return the HF (modality-aware) task to begin with (HF Hub pipeline_tag first, then a (model_type, config)→HF modality fallback). Then:

  • the three inline forward-direction calls collapse into one boundary helper (we'd want this regardless — a lint rule against direct TasksManager imports keeps new code honest),
  • canonical_task_to_known_task and _OPTIMUM_CANONICAL_TO_KNOWN_TASK go away,
  • the CLIP-style ambiguity above doesn't exist because we never need to recover modality from input names.
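
The prevention alternative can be sketched as a two-step lookup. Everything here is illustrative: `detect_hf_task` and `FALLBACK_TASK_BY_MODEL_TYPE` are hypothetical names, the table entries are assumed examples, and the proposal's fallback also considers the config, which this sketch omits.

```python
from typing import Optional

# Hypothetical fallback table keyed by model_type; entries are illustrative.
FALLBACK_TASK_BY_MODEL_TYPE = {
    "dinov2": "image-feature-extraction",
    "bert": "feature-extraction",
}


def detect_hf_task(pipeline_tag: Optional[str], model_type: str) -> Optional[str]:
    """Prefer the Hub pipeline_tag; otherwise fall back on the model type."""
    if pipeline_tag:
        return pipeline_tag
    return FALLBACK_TASK_BY_MODEL_TYPE.get(model_type)
```

Since the returned task already carries the modality, nothing downstream has to reconstruct it from ONNX input names.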

Proposed path

  1. Fix the CLIP regression in this PR — it's a marquee architecture and the regression is real.
  2. Resolve the self.task semantic shift (rename or document).
  3. Merge: the e2e fix value for #777 (Model task handling is not consistent between config and build), #778 (winml build and eval command fails on image embedding model like facebook/dinov2-base), and #782 (winml inspect -m facebook/dinov2-base --task image-feature-extraction returns unsupported) is real and shouldn't wait.
  4. I'll open a follow-up doing the prevention refactor: _detect_task_from_config → HF task, centralize the forward-normalize into a boundary helper with a lint rule, delete the canonical-to-known table. Your e2e tests stay green throughout.

Happy to pair on the follow-up — there's overlap with the #724 / KNOWN_TASKS cleanup I'm queuing up separately.


(Edit: removed accidental #1/#2/#3 section markers that GitHub auto-linked to unrelated issues.)

DingmaomaoBJTU

This comment was marked as duplicate.


@zhenchaoni zhenchaoni force-pushed the private/zhenni/fix_task_inconsistency branch from 7675ece to b272e3b Compare June 1, 2026 03:14

@zhenchaoni zhenchaoni force-pushed the private/zhenni/fix_task_inconsistency branch from b272e3b to 42830e0 Compare June 1, 2026 07:21

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Collaborator


Good fix for the task-naming inconsistency between HF pipeline aliases and Optimum's canonical names. The approach of normalizing via _map_task_synonym at each TasksManager call site is correct. The to_hf_pipeline_task evaluator mapper and the bimodal feature-extraction dataset dispatch are both well-thought-out. A few observations:

Comment thread src/winml/modelkit/eval/evaluate.py
Comment thread src/winml/modelkit/datasets/__init__.py
Comment thread src/winml/modelkit/inspect/resolver.py Outdated

@zhenchaoni zhenchaoni changed the title from "Private/zhenni/fix task inconsistency" to "Fix model-task inconsistency for vision feature-extraction models" Jun 2, 2026
Comment thread src/winml/modelkit/export/__init__.py Fixed
@zhenchaoni zhenchaoni force-pushed the private/zhenni/fix_task_inconsistency branch from 2aa814f to 303c6b2 Compare June 2, 2026 03:42
@zhenchaoni zhenchaoni merged commit 2dfce36 into main Jun 2, 2026
9 checks passed
@zhenchaoni zhenchaoni deleted the private/zhenni/fix_task_inconsistency branch June 2, 2026 05:50
DingmaomaoBJTU pushed a commit that referenced this pull request Jun 5, 2026
timenick added a commit that referenced this pull request Jun 8, 2026
…e logic (#820)

## What

`#807` (merged) made `detect_task` modality-aware and removed `#786`'s
bimodal `feature-extraction` io_config reverse-reconstruction (in both
`datasets` and `eval`). But `#786`'s e2e tests — still on `main` —
assert the *old* bimodal behavior, so they now fail on `main`:

- `tests/e2e/test_quantize_e2e.py::TestPerTaskDatasets::test_feature_extraction_with_pixel_values_uses_image_dataset`
- `tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction[feature-extraction]`

Both crash with `KeyError: Dinov2Config` — a vision model + `--task
feature-extraction` now resolves to the text `TextDataset`, which tries
to load a tokenizer DINOv2 does not have.

## Why update the tests (not revert the code)

Under the merged modality-aware task vocabulary:

- `feature-extraction` is the **text** feature task; a vision embedding
model's canonical task is **`image-feature-extraction`**.
- `winml inspect` / auto-detect always report `image-feature-extraction`
for a vision model (e.g. `facebook/dinov2-small`) — the tool never hands
a user `--task feature-extraction` for a vision model.
- So explicit `--task feature-extraction` on a vision model is a genuine
modality mismatch and is expected to error. The `#786` bimodal io_config
dispatch (which silently recovered modality from the ONNX inputs) was
deliberately removed in `#807`.

The e2e tests follow the new logic:

- **quantize**:
`test_feature_extraction_with_pixel_values_uses_image_dataset` →
`test_image_feature_extraction_uses_image_dataset`, asserting `--task
image-feature-extraction` → `ImageDataset` (the canonical vision-feature
calibration path).
- **eval**: drop the `feature-extraction` param from
`test_image_feature_extraction` (vision feature =
`image-feature-extraction`). Also loosen the kNN floors to sanity levels
(10/25), consistent with this file's N=10 convention (*"Loose floors
guard against degenerate output, not magnitude"*) — the previous 30/60
floors flaked at top1=20 on a 10-sample kNN.
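
The flakiness of the old floor follows from simple arithmetic on a 10-sample run. The sketch below assumes the first number in each floor pair (30 and 10) is the top-1 floor, and that top1=20 means 2 of 10 samples were correct; `accuracy_pct` is a hypothetical helper.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage."""
    return 100.0 * correct / total


# With 10 samples, accuracy can only land on multiples of 10.
possible = [accuracy_pct(k, 10) for k in range(11)]

top1 = accuracy_pct(2, 10)  # the flaky run reported top1=20
passes_old_floor = top1 >= 30  # assumed previous top-1 floor
passes_new_floor = top1 >= 10  # assumed loosened top-1 floor
```

A single sample changes the score by 10 points, so a floor of 30 leaves little room for ordinary variation, while a floor of 10 only catches degenerate output.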

## Heads-up for @zhenchaoni

This realigns the two e2e tests you added in `#786` and, by extension,
accepts dropping the bimodal `--task
feature-extraction`-on-a-vision-model capability `#786` introduced. That
removal landed in `#807` (merged); this PR only updates the tests to
match. If you'd rather keep that capability working (graceful) instead
of erroring on the mismatch, that means restoring the io_config dispatch
— happy to do that instead. Flagging for your call.

Development

Successfully merging this pull request may close these issues.

Model task handling is not consistent between config and build

4 participants