How Does Autogen Handle “Broken” Image Tags for Multimodal LLMs? #5679
Aswathprabhu asked this question in Q&A · Unanswered
I’m exploring how Autogen processes in-prompt images for its agents, specifically referencing this documentation. When I pass a tag like `<img https://th.bing.com/th/id/OIP.29Mi2kJmcHHyQVGe_0NG7QHaEo?pid=ImgDet&rs=1>` to an Autogen agent with multimodal support, the agent seems to interpret and process the image successfully. How does Autogen handle this “broken” image tag, where the `src=` attribute is missing and the URL sits bare inside the tag? Is there an abstraction at play that rewrites or interprets the tag before the prompt is sent to the underlying LLM (e.g., GPT-4V)? Which component is responsible for parsing this data and converting it to the OpenAI message format?
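For concreteness, here is a minimal sketch of the kind of conversion I’m asking about. This is my own illustration, not Autogen’s actual code: the regex and the helper name `to_openai_content` are hypothetical, though the output shape follows OpenAI’s documented vision content-part format (`text` and `image_url` entries).

```python
import re
from typing import Any

# Matches tags of the form <img ...>, capturing whatever sits between
# "img " and the closing ">". Illustrative only; not Autogen's actual regex.
IMG_TAG = re.compile(r"<img ([^>]+)>")

def to_openai_content(prompt: str) -> list[dict[str, Any]]:
    """Split a prompt containing <img ...> tags into the OpenAI
    vision content-part format: a list of text and image_url entries."""
    parts: list[dict[str, Any]] = []
    last = 0
    for match in IMG_TAG.finditer(prompt):
        text = prompt[last:match.start()].strip()
        if text:
            parts.append({"type": "text", "text": text})
        raw = match.group(1).strip()
        # Tolerate both a proper src="..." attribute and a bare URL.
        if raw.startswith("src="):
            raw = raw[len("src="):].strip("\"'")
        parts.append({"type": "image_url", "image_url": {"url": raw}})
        last = match.end()
    tail = prompt[last:].strip()
    if tail:
        parts.append({"type": "text", "text": tail})
    return parts

print(to_openai_content(
    "Describe this image: <img https://example.com/cat.png> in one sentence."
))
```

Something along these lines would explain why the missing `src=` attribute doesn’t break anything: a tolerant parser only needs to extract the URL itself, regardless of how the tag is written. I’d like to know where in Autogen this responsibility actually lives.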