Multimodal way to retrieve documents #81

AmazingK2k3 · 2025-02-24T05:06:19Z

Hello,
Thank you for the amazing library. I went through the RAG implementation notebooks from Hugging Face that use this library and loved them. I am wondering whether there is a way to supply an image alongside the text prompt in order to retrieve multimodal documents or images.

To be specific: the user query contains both text and an image as input to ColPali, which retrieves the top-k most similar documents. Those documents are then passed to the VLM, together with the prompt and the image, to generate an output.
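
To make the question concrete, here is a minimal sketch of the retrieval step I have in mind. It is plain Python with toy vectors, not byaldi or ColPali code. The function names (`maxsim_score`, `retrieve_top_k`) are hypothetical, and combining the two modalities by simply concatenating their embedding lists is my own assumption. The scoring is the usual late-interaction (MaxSim) sum that ColPali-style retrievers use.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: List[Vector], doc_vectors: List[Vector]) -> float:
    """Late-interaction score: each query vector takes its best match
    among the document vectors, and the best matches are summed."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def retrieve_top_k(
    text_query_vectors: List[Vector],
    image_query_vectors: List[Vector],
    corpus: Dict[str, List[Vector]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical multimodal query: concatenate the text-token and
    image-patch embeddings (an assumption, not a library feature),
    then rank every document by its MaxSim score."""
    query_vectors = list(text_query_vectors) + list(image_query_vectors)
    scored = [
        (doc_id, maxsim_score(query_vectors, doc_vectors))
        for doc_id, doc_vectors in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real model outputs.
text_q = [[1.0, 0.0]]
image_q = [[0.0, 1.0]]
corpus = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[1.0, 0.0]],
    "page_c": [[0.0, 0.5]],
}
top_docs = retrieve_top_k(text_q, image_q, corpus, k=2)
```

The retrieved pages in `top_docs` would then go to the VLM together with the original prompt and image.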

I am not sure whether byaldi, or even ColPali itself, supports this kind of integration, or whether there is another way to do it. I hope this is the right place to raise this issue or discussion!
