Hello,
Thank you for the amazing library. I followed the RAG implementation notebooks from Hugging Face that use this library and loved them. I am wondering whether there is a way to supply an image in addition to the text prompt when retrieving multimodal documents or images.
To be specific, the user query would contain both text and an image as input to ColPali, which would retrieve the top-k most similar documents. The VLM could then use those documents, together with the prompt and the image, to generate an output.
I am not sure whether byaldi supports that integration, or even whether ColPali does. Or is there another way? I hope this is the right place to raise this issue or discussion!
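To make the request concrete, here is a minimal sketch of the retrieval step described above. It does not use the byaldi or ColPali APIs at all. It assumes a ColBERT-style late-interaction (MaxSim) scorer and assumes that the text and the image of the query have each already been turned into a list of embedding vectors by some encoder. The function names `maxsim_score` and `retrieve_top_k`, and the idea of simply concatenating the two sets of query vectors, are hypothetical and only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector take its best match
    among the document vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def retrieve_top_k(
    text_vecs: Sequence[Vector],
    image_vecs: Sequence[Vector],
    docs: Sequence[Sequence[Vector]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Score every document against a combined text + image query.

    Assumption: the multimodal query is formed by concatenating the
    text-token vectors and the image-patch vectors. Whether a real
    model supports this is exactly the open question.
    """
    query_vecs = list(text_vecs) + list(image_vecs)
    scored = [(i, maxsim_score(query_vecs, doc)) for i, doc in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned `(index, score)` pairs would identify the top-k pages to hand to the VLM along with the original prompt and image.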