fix HunyuanOCR crash in vLLM #39

Open
souvikchand wants to merge 1 commit into Tencent-Hunyuan:main from souvikchand:fix

@souvikchand

This PR fixes a runtime error in the vLLM multimodal pipeline when running HunyuanOCR.
The error reported in issue #35 was caused by sending images with the wrong message schema, which led vLLM to misinterpret the image input and produce a tensor with an invalid shape:

ValueError: image_grid_thw has rank 3 but expected 2.
Expected shape: ('ni', 3), but got torch.Size([2, 1, 3])
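
To make the shape mismatch in the error concrete, here is a small illustration in plain Python, not taken from the PR. It assumes each row of image_grid_thw holds three grid sizes per image; the numbers and the `shape` helper are made up for the example.

```python
def shape(nested):
    """Return the shape of a regular nested list as a tuple (hypothetical helper)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


# Illustrative values only: two images, three grid sizes per image.
expected = [[1, 32, 32], [1, 48, 64]]        # rank 2, shape (ni, 3) with ni = 2
received = [[[1, 32, 32]], [[1, 48, 64]]]    # rank 3, extra axis of length 1

print(shape(expected))  # (2, 3)
print(shape(received))  # (2, 1, 3)
```

The second layout matches the torch.Size([2, 1, 3]) in the traceback: same data, but with one extra axis that the shape check rejects.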

What I changed

  1. Updated the request format from

{ "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,..." } }

to

{
    "type": "image_url",
    "image_url": f"data:{mime};base64,{encode_image(image_path)}"
},

  2. Added automatic MIME-type detection so that images are sent with the correct type (png/jpeg/webp/etc.).
  3. Ensured image_url is a string, not a nested object, which aligns with vLLM's expected schema for HuggingFace vision models.
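
A minimal sketch of how the three changes above could fit together, using only the Python standard library. This is not the code from the commit: `encode_image` is named in the snippet above but its body is not shown here, so the version below is an assumption, and `build_image_part` and the JPEG fallback are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def encode_image(image_path):
    """Read the file and return its bytes as a base64 string (assumed behavior)."""
    return base64.b64encode(Path(image_path).read_bytes()).decode("ascii")


def build_image_part(image_path, default_mime="image/jpeg"):
    """Build the message part with image_url as a plain data-URL string.

    Hypothetical helper: the MIME type is guessed from the file extension,
    falling back to default_mime when the guess fails or is not an image type.
    """
    mime, _ = mimetypes.guess_type(str(image_path))
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    return {
        "type": "image_url",
        "image_url": f"data:{mime};base64,{encode_image(image_path)}",
    }
```

Guessing from the extension keeps the sketch dependency-free; it does not inspect the file contents, so a mislabeled file would still be sent with the wrong type.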

@diegocarturan-debug

Uploading ia-prime-historia-completa-scene-4.png…

@souvikchand

Author

@diegocarturan-debug
Sorry, but can you explain your comment above? The link just redirects to this same page.
