Fix model-task inconsistency for vision feature-extraction models #786
Thanks for tackling these three — the e2e coverage on Concrete regression: CLIP
DingmaomaoBJTU
left a comment
Good fix for the task-naming inconsistency between HF pipeline aliases and Optimum's canonical names. The approach of normalizing via _map_task_synonym at each TasksManager call site is correct. The to_hf_pipeline_task evaluator mapper and the bimodal feature-extraction dataset dispatch are both well-thought-out. A few observations:
## Fix model-task inconsistency for vision feature-extraction models

Fixes #777, #778, #782.

### Principle

`winml inspect` is the source of truth for valid `(model_id, task)` pairs. Both `feature-extraction` and `image-feature-extraction` are valid ways to address an image-embedding model like `facebook/dinov2-base`. Downstream commands must accept whichever name `winml inspect` accepts, then use `(model_id, task)` to locate the concrete class to act on.

### Root cause

Optimum's `TasksManager.get_exporter_config_constructor` only knows canonical Optimum task names. Several call sites passed the raw user-supplied task straight through, so HF aliases like `image-feature-extraction` were rejected with "Unsupported". The evaluator additionally needs to know which HF pipeline name to dispatch on, and for bimodal tasks like `feature-extraction` the canonical Optimum task name alone does not carry that information.

### Fix

- **Inspect / export / HTP exporter**: normalize via `_map_task_synonym(task)` (in `export/io.py`) before any `TasksManager` lookup, because `TasksManager` requires a normalized task name. This is a single function reused at each `TasksManager` boundary; no new global table is introduced.
- **Quantize**: `_resolve_dataset_class(task, io_config)` in `datasets/__init__.py` dispatches to `TextDataset` / `ImageDataset` based on the actual ONNX input names. No `AutoConfig.from_pretrained` round-trip. Bimodal io_configs fall back to `RandomDataset` with a warning.
- **Evaluate**: HF pipelines and the `evaluate` library follow their own task-naming convention, so `to_hf_pipeline_task(task, model_id)` in `eval/evaluate.py` translates the task to the HF pipeline name the underlying `evaluate` library expects. It uses `OnnxConfig.inputs` (no weights loaded) to pick the modality. Bimodal models (e.g. CLIP combined: both `pixel_values` and `input_ids`) keep the task unchanged via a `len(hits) == 1` guard, preserving the explicit user task.
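The input-name dispatch described in the Quantize and Evaluate bullets can be sketched roughly as follows. This is an illustrative sketch only, not the code in `datasets/__init__.py` or `eval/evaluate.py`: the tables `MODALITY_BY_INPUT` and `PIPELINE_TASK_BY_MODALITY` and the helpers `detect_modalities`, `pick_dataset_kind`, and `to_pipeline_task` are hypothetical names, and the functions take a plain list of input names rather than an io_config or `OnnxConfig`.

```python
# Hypothetical table: which ONNX input name signals which modality.
MODALITY_BY_INPUT = {
    "input_ids": "text",
    "pixel_values": "image",
}

# Hypothetical table: HF pipeline name per modality for feature extraction.
PIPELINE_TASK_BY_MODALITY = {
    "text": "feature-extraction",
    "image": "image-feature-extraction",
}


def detect_modalities(input_names):
    """Return the set of modalities implied by the given input names."""
    return {MODALITY_BY_INPUT[n] for n in input_names if n in MODALITY_BY_INPUT}


def pick_dataset_kind(input_names):
    """Pick "text" or "image" when unambiguous, otherwise fall back to "random"."""
    hits = detect_modalities(input_names)
    if len(hits) == 1:
        return next(iter(hits))
    # Bimodal (or unrecognized) inputs: no single dataset type fits.
    return "random"


def to_pipeline_task(task, input_names):
    """Translate a feature-extraction task to the modality-specific pipeline name."""
    if task not in PIPELINE_TASK_BY_MODALITY.values():
        return task  # not a feature-extraction variant, leave untouched
    hits = detect_modalities(input_names)
    if len(hits) == 1:
        return PIPELINE_TASK_BY_MODALITY[next(iter(hits))]
    # Bimodal guard: keep the task the user asked for.
    return task


print(pick_dataset_kind(["pixel_values"]))
print(to_pipeline_task("feature-extraction", ["pixel_values"]))
print(to_pipeline_task("feature-extraction", ["input_ids", "pixel_values"]))
```

The point of the `len(hits) == 1` check in this sketch is that a model exposing both `pixel_values` and `input_ids` gives no unambiguous answer, so the explicit user task is kept rather than silently rerouted.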
### Validation

`facebook/dinov2-base`:

| Command | Before | After |
|---|---|---|
| `winml inspect -m facebook/dinov2-base --task image-feature-extraction` | "Unsupported" | Resolves via `Dinov2OnnxConfig` |
| `winml export -m facebook/dinov2-base -t image-feature-extraction` | KeyError on TasksManager | Valid ONNX with `last_hidden_state` |
| `winml eval -m facebook/dinov2-base --task feature-extraction` | `RuntimeError: Failed to create feature-extraction dataset` | kNN metrics on mini-imagenet |
| `winml quantize <onnx> --task feature-extraction -m facebook/dinov2-small` | Fails because `TextDataset` is used | Routes to `ImageDataset` |

`openai/clip-vit-base-patch32` (bimodal, regression check):

- `winml eval -m openai/clip-vit-base-patch32 --task feature-extraction` → stays `feature-extraction` (text STS evaluator); not silently rerouted to image.
- `winml eval -m openai/clip-vit-base-patch32` (auto-detect) → resolves to `feature-extraction` (text).

### Tests

Unit:

- `tests/unit/eval/test_eval.py::TestResolveTask`: auto-detect, explicit task, bimodal guard, HF pipeline translation.
- `test_random_dataset.py`: `TASK_DATASET_MAPPING` covers all registered tasks, including bimodal dict-of-dict.

E2E (`-m e2e`; dinov2 chosen because it isn't in `MODEL_BUILD_CONFIGS` and so actually exercises the `TasksManager` path):

- `tests/e2e/test_inspect_e2e.py::TestInspectDinoV2`: both `image-feature-extraction` and `feature-extraction` resolve.
- `tests/e2e/test_export_e2e.py::TestExportDinoV2::test_image_feature_extraction`.
- `tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction`, parameterized over both task names.
- `tests/e2e/test_quantize_e2e.py::test_feature_extraction_with_pixel_values_uses_image_dataset`.
…e logic (#820)

## What

`#807` (merged) made `detect_task` modality-aware and removed `#786`'s bimodal `feature-extraction` io_config reverse-reconstruction (in both `datasets` and `eval`). But `#786`'s e2e tests (still on `main`) assert the *old* bimodal behavior, so they now fail on `main`:

- `tests/e2e/test_quantize_e2e.py::TestPerTaskDatasets::test_feature_extraction_with_pixel_values_uses_image_dataset`
- `tests/e2e/test_eval_e2e.py::TestEvalPerTask::test_image_feature_extraction[feature-extraction]`

Both crash with `KeyError: Dinov2Config`: a vision model + `--task feature-extraction` now resolves to the text `TextDataset`, which tries to load a tokenizer DINOv2 does not have.

## Why update the tests (not revert the code)

Under the merged modality-aware task vocabulary:

- `feature-extraction` is the **text** feature task; a vision embedding model's canonical task is **`image-feature-extraction`**.
- `winml inspect` / auto-detect always report `image-feature-extraction` for a vision model (e.g. `facebook/dinov2-small`), so the tool never hands a user `--task feature-extraction` for a vision model.
- So explicit `--task feature-extraction` on a vision model is a genuine modality mismatch and is expected to error. The `#786` bimodal io_config dispatch (which silently recovered modality from the ONNX inputs) was deliberately removed in `#807`.

The e2e tests follow the new logic:

- **quantize**: `test_feature_extraction_with_pixel_values_uses_image_dataset` → `test_image_feature_extraction_uses_image_dataset`, asserting `--task image-feature-extraction` → `ImageDataset` (the canonical vision-feature calibration path).
- **eval**: drop the `feature-extraction` param from `test_image_feature_extraction` (vision feature = `image-feature-extraction`).

Also loosen the kNN floors to sanity levels (10/25), consistent with this file's N=10 convention (*"Loose floors guard against degenerate output, not magnitude"*). The previous 30/60 floors flaked at top1=20 on a 10-sample kNN.

## Heads-up for @zhenchaoni

This realigns the two e2e tests you added in `#786` and, by extension, accepts dropping the bimodal `--task feature-extraction`-on-a-vision-model capability `#786` introduced. That removal landed in `#807` (merged); this PR only updates the tests to match. If you'd rather keep that capability working (graceful) instead of erroring on the mismatch, that means restoring the io_config dispatch; happy to do that instead. Flagging for your call.
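For context on the loosened kNN floors: with only 10 samples, top-1 accuracy can only move in 10-point steps, so a run with two correct predictions lands at exactly 20. The arithmetic below is a small sketch of that; it assumes the first number in each floor pair (30 before, 10 after) is the top-1 floor, and `top1_accuracy_percent` is a hypothetical helper, not a function from the test suite.

```python
def top1_accuracy_percent(num_correct, num_samples):
    """Top-1 accuracy as a percentage of correctly classified samples."""
    return 100.0 * num_correct / num_samples


N = 10  # sample count used by the e2e kNN check

# Every value the metric can take on N samples: multiples of 10.
possible_values = [top1_accuracy_percent(k, N) for k in range(N + 1)]

observed = top1_accuracy_percent(2, N)  # the flaky run: 2 of 10 correct

OLD_TOP1_FLOOR = 30  # assumed to be the top-1 half of the old 30/60 pair
NEW_TOP1_FLOOR = 10  # assumed to be the top-1 half of the new 10/25 pair

passes_old = observed >= OLD_TOP1_FLOOR
passes_new = observed >= NEW_TOP1_FLOOR

print(possible_values)
print(observed, passes_old, passes_new)
```

One misclassified sample shifts the metric by 10 points at this sample size, which is why a floor meant only to catch degenerate output sits well below the typical score.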