How Does Autogen Handle “Broken” Image Tags for Multimodal LLMs? #5679
Aswathprabhu asked this question in Q&A · Unanswered
I’m exploring how Autogen processes in-prompt images for its agents, specifically referencing this documentation. When I pass a tag like `<img https://th.bing.com/th/id/OIP.29Mi2kJmcHHyQVGe_0NG7QHaEo?pid=ImgDet&rs=1>` to an Autogen agent with multimodal support, the agent seems to interpret and process the image successfully. How does Autogen handle this “broken” image tag, where the `src=` attribute is missing and the URL sits bare inside the tag? Is there an abstraction at play that rewrites or interprets the tag before the prompt is sent to the underlying LLM (e.g., GPT-4V)? Which component is responsible for parsing this data and converting it to the OpenAI message format?
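For concreteness, here is a minimal sketch of the kind of conversion I’m asking about. This is my own illustration, not Autogen’s actual code: the regex and the helper name `to_openai_content` are hypothetical, though the output shape follows OpenAI’s documented vision content-part format (`text` and `image_url` entries).

```python
import re
from typing import Any

# Matches tags of the form <img ...>, capturing whatever sits between
# "img " and the closing ">". Illustrative only; not Autogen's actual regex.
IMG_TAG = re.compile(r"<img ([^>]+)>")

def to_openai_content(prompt: str) -> list[dict[str, Any]]:
    """Split a prompt containing <img ...> tags into the OpenAI
    vision content-part format: a list of text and image_url entries."""
    parts: list[dict[str, Any]] = []
    last = 0
    for match in IMG_TAG.finditer(prompt):
        text = prompt[last:match.start()].strip()
        if text:
            parts.append({"type": "text", "text": text})
        raw = match.group(1).strip()
        # Tolerate both a proper src="..." attribute and a bare URL.
        if raw.startswith("src="):
            raw = raw[len("src="):].strip("\"'")
        parts.append({"type": "image_url", "image_url": {"url": raw}})
        last = match.end()
    tail = prompt[last:].strip()
    if tail:
        parts.append({"type": "text", "text": tail})
    return parts

print(to_openai_content(
    "Describe this image: <img https://example.com/cat.png> in one sentence."
))
```

Something along these lines would explain why the missing `src=` attribute doesn’t break anything: a tolerant parser only needs to extract the URL itself, regardless of how the tag is written. I’d like to know where in Autogen this responsibility actually lives.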