feat(keypoint-detection): enable ViTPose config/build/perf#905

Open
jeon185 wants to merge 1 commit into main from feat/keypoint-detection-enablement

jeon185 commented Jun 16, 2026

Contributor

Enables the 6 ViTPose keypoint-detection models from #284 to pass wmk config -> build -> perf (CPU and OpenVINO). Eval isn't included here - I'll do the accuracy side in a follow-up since it needs a couple of design decisions first.

Two things were blocking all 6 models:

  1. wmk config failed with "Task 'keypoint-detection' not supported by TasksManager". Optimum has the ViTPose ONNX export config but no task->class entry for keypoint-detection, and AutoModelForKeypointDetection only covers SuperPoint. Added the (vitpose, keypoint-detection) -> VitPoseForPoseEstimation mapping, same way we already do it for CLIP/SAM.

  2. The plus checkpoints (MoE backbone) crashed during export with "dataset_index must be provided when using multiple experts". Optimum's VitPoseModelPatcher injects a constant dataset_index, but patch_model_for_export defaults model_kwargs to None so it crashed on init. Passing an explicit model_kwargs={} fixes that. The trace step (Step 3) was also running the model outside the patcher context, so I wrapped it the same way the export step already is.
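As a rough illustration of the first fix, the sketch below shows how a `(model_type, task)` lookup table can short-circuit task resolution. The `resolve_model_class` helper and the stand-in class are hypothetical and only mirror the shape of `MODEL_CLASS_MAPPING` quoted in this PR; the real resolver may differ.

```python
from typing import Optional


class VitPoseForPoseEstimation:
    """Stand-in for the transformers class, so the sketch runs without dependencies."""


# (model_type, task) -> model class, same shape as the mapping in this PR.
MODEL_CLASS_MAPPING: dict[tuple[str, str], type] = {
    ("vitpose", "keypoint-detection"): VitPoseForPoseEstimation,
}


def resolve_model_class(model_type: str, task: str) -> Optional[type]:
    """Hypothetical helper: return the explicitly mapped class, or None
    so the caller can fall back to its generic task lookup."""
    return MODEL_CLASS_MAPPING.get((model_type, task))
```

An explicit table keeps the override local to the tool instead of depending on an upstream task registry gaining the entry.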

The exporter change isn't ViTPose-specific - it helps any MoE model whose patcher injects forward args.
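To make the mechanism concrete, here is a toy sketch of a patcher that injects a constant forward argument, entered once for the trace and once for the export. `ToyMoEModel` and `toy_patcher` are invented for illustration and do not reproduce Optimum's `VitPoseModelPatcher`; the sketch only assumes that the patcher swaps `forward` for the duration of a context.

```python
from contextlib import contextmanager


class ToyMoEModel:
    """Toy model that, like the plus checkpoints, requires dataset_index."""

    def forward(self, pixel_values, dataset_index=None):
        if dataset_index is None:
            raise ValueError(
                "dataset_index must be provided when using multiple experts"
            )
        return (pixel_values, dataset_index)


@contextmanager
def toy_patcher(model, model_kwargs=None):
    """Hypothetical patcher: inject a constant dataset_index into forward."""
    injected = dict(model_kwargs or {})
    injected.setdefault("dataset_index", 0)
    original_forward = model.forward

    def patched_forward(*args, **kwargs):
        # Caller-supplied kwargs win over the injected defaults.
        return original_forward(*args, **{**injected, **kwargs})

    model.forward = patched_forward
    try:
        yield model
    finally:
        model.forward = original_forward


model = ToyMoEModel()

# Trace step: runs under the patcher, so forward sees dataset_index.
with toy_patcher(model, model_kwargs={}):
    traced = model.forward("img")

# Export step: re-enters the patcher; the two contexts are sequential, not nested.
with toy_patcher(model, model_kwargs={}):
    exported = model.forward("img")
```

Outside either context the toy model's `forward` is restored, so an unpatched call fails with the same error the trace step hit before this change.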

Verified config/build/perf on all 6: vitpose-base-simple, vitpose-plus-small/base/large/huge, and synthpose-vitpose-huge-hf. Added unit tests for the mapping and the patcher model_kwargs handling.

One note: you still need to pass --task keypoint-detection explicitly for now - the task isn't auto-detected from the config yet. I left auto-detection out of this PR to keep it small; can add it here or as a follow-up if you'd prefer.

Refs #284.

jeon185 requested a review from a team as a code owner June 16, 2026 18:19

ViTPose keypoint-detection models could not pass the wmk pipeline:

1. Task resolution: Optimum registers the ViTPose ONNX export config but has
   no task-to-class entry for keypoint-detection, and transformers'
   AutoModelForKeypointDetection only recognizes SuperPoint. Add
   MODEL_CLASS_MAPPING[(vitpose, keypoint-detection)] = VitPoseForPoseEstimation
   (models/hf/vitpose.py) so the resolver loads the correct class.

2. MoE export: the vitpose-plus checkpoints use a Mixture-of-Experts backbone
   whose patcher injects a constant dataset_index at export time. Optimum's
   patch_model_for_export defaults model_kwargs to None, so the patcher crashed
   on init. Pass an explicit model_kwargs={} in _get_optimum_patcher. Also wrap
   the Step 3 hierarchy trace in the same patcher context (it previously ran
   the model forward without the injected dataset_index, failing before export).

Verified config -> build -> perf on all 6 acceptance models in #284
(base-simple, plus-{small,base,large,huge}, synthpose-vitpose-huge-hf).
# are traced with the same inputs they are exported with. The export
# in Step 4 re-enters the patcher; the contexts are sequential, not
# nested.
with self._get_optimum_patcher(model, task):

Member


Change looks good and low-risk overall. The model_kwargs={} fix is effectively a no-op for non-MoE patchers (Optimum's ModelPatcher.__init__ already coerces None → {}), so that one's safe.

My one concern is wrapping Step 3 (_trace_model_hierarchy) in the patcher. Models that already resolve a real Optimum patcher today (e.g. CLIP (clip_text_model / clip_vision_model), SAM, T5, SigLIP, whisper, VED) will now have their hierarchy trace run through the patched forward for the first time. The ONNX graph is unaffected (export already ran patched), but the traced module path can shift, which could change the hierarchy / tag coverage on those models.

You verified the 6 ViTPose models, but those weren't being traced-under-patch before. Could you also run a before/after on at least one already-patched non-ViTPose model (CLIP is a good pick) and confirm the tag coverage / hierarchy stats are unchanged?
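One way to run the requested before/after check is to dump the hierarchy stats from each run and diff them. The sketch below assumes the stats can be collected as a flat dict of numbers; the stat names in the usage are hypothetical, not fields the tool is known to emit.

```python
def diff_stats(before: dict, after: dict, tol: float = 0.0) -> dict:
    """Return {key: (before, after)} for every stat that is missing on one
    side or differs by more than tol. An empty dict means no change."""
    changed = {}
    for key in sorted(before.keys() | after.keys()):
        b, a = before.get(key), after.get(key)
        if b is None or a is None or abs(a - b) > tol:
            changed[key] = (b, a)
    return changed


# Hypothetical stats from a CLIP run on main vs. on this branch.
before = {"tagged_nodes": 10, "coverage": 1.0}
after = {"tagged_nodes": 9, "coverage": 1.0}
print(diff_stats(before, after))
```

An empty result on a model such as CLIP would support the claim that tracing under the patcher leaves the hierarchy unchanged.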


# (model_type, task) -> HuggingFace model class
MODEL_CLASS_MAPPING: dict[tuple[str, str], type] = {
("vitpose", "keypoint-detection"): VitPoseForPoseEstimation,

vortex-captain commented Jun 24, 2026

Contributor


Suggested change
("vitpose", "keypoint-detection"): VitPoseForPoseEstimation,
("vitpose", "keypoint-detection"): VitPoseForPoseEstimation,
("vitpose", None): VitPoseForPoseEstimation,

Could you test whether this line makes the command work when the task is omitted?
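For reference, a minimal sketch of how a `(model_type, None)` entry could act as the default when no task is passed. It assumes the resolver looks up the exact key first and then the `None` key, which is the behavior the question above asks to verify; `resolve_with_default` and the stand-in class are hypothetical.

```python
from typing import Optional


class VitPoseStandIn:
    """Stand-in for VitPoseForPoseEstimation, so the sketch has no dependencies."""


# Same shape as MODEL_CLASS_MAPPING, with the suggested None-task entry added.
MAPPING: dict[tuple[str, Optional[str]], type] = {
    ("vitpose", "keypoint-detection"): VitPoseStandIn,
    ("vitpose", None): VitPoseStandIn,
}


def resolve_with_default(model_type: str, task: Optional[str] = None) -> Optional[type]:
    """Try the exact (model_type, task) key, then the per-model default."""
    exact = MAPPING.get((model_type, task))
    if exact is not None:
        return exact
    return MAPPING.get((model_type, None))
```

Under this assumption an omitted task resolves to the same class as the explicit `keypoint-detection` task, and unknown model types still return `None`.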


3 participants