Multimodal way to retrieve documents #81

Open
AmazingK2k3 opened this issue Feb 24, 2025 · 0 comments

Hello,
Thank you for the amazing library. I followed the Hugging Face RAG notebooks that use this library and loved them. I am wondering whether there is a way to pass an image along with the text prompt when retrieving multimodal documents or images.

To be specific, the user query would contain both text and an image as input to ColPali, which would retrieve the top-k most similar documents. Those documents could then be passed to the VLM, together with the prompt and the image, to generate an output.
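
To make the idea concrete, here is a rough sketch of the retrieval step I have in mind. It is plain Python and does not use any byaldi or ColPali API. It assumes the text part and the image part of the query can each be embedded as a list of vectors in the same space as the document page embeddings, and that simply concatenating them is a reasonable way to form one query; I have not verified either assumption. The names `maxsim_score` and `retrieve_top_k` and the toy vectors are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
MultiVector = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: MultiVector, doc: MultiVector) -> float:
    """Late-interaction score: each query vector takes its best match
    among the document vectors, and the best matches are summed."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def retrieve_top_k(
    text_query_vecs: MultiVector,
    image_query_vecs: MultiVector,
    doc_index: Dict[str, MultiVector],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank documents against a combined text + image query.

    Assumption: both sets of query vectors live in the same embedding
    space as the documents, so they can be concatenated into one query.
    """
    query = list(text_query_vecs) + list(image_query_vecs)
    scored = [(doc_id, maxsim_score(query, vecs)) for doc_id, vecs in doc_index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings, purely illustrative.
text_vecs = [[1.0, 0.0]]
image_vecs = [[0.0, 1.0]]
index = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query parts
    "page_b": [[1.0, 0.0]],              # matches the text part only
    "page_c": [[0.0, 0.5]],              # weak match on the image part
}

top_pages = retrieve_top_k(text_vecs, image_vecs, index, k=2)
```

The pages in `top_pages` would then go to the VLM together with the original prompt and image.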

I am not sure whether byaldi supports this, or even whether ColPali does. Is there any other way to do it? I hope this is the right place to raise this issue or discussion!
