Support for searching with query image #27
Comments
@bclavie If you think that this would be a useful feature, I'd be happy to contribute and raise a PR for the same.
Hey! It'd actually be a completely experimental feature since it's not even done in the paper, but I'd be happy to include it under a beta flag if you would like to contribute it!
@bclavie Thanks for your response! Sure, yes, I understand. In terms of the logic it stays the same, i.e. process_images would return pixel_values and input_ids for the prompt (just like when we do indexing). If we are searching by image, we just make the forward call with both pixel_values and input_ids and get the embeddings, which can later be used for the maxsim calculations. I will make the code changes later this week and raise a PR.
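For illustration, a minimal sketch of that flow using the underlying colpali-engine classes directly; the checkpoint name and "query.png" are placeholders, and the wrapper methods exposed by this project may differ slightly:

```python
# Minimal sketch of the flow described above, using colpali-engine directly.
# The checkpoint name and "query.png" are placeholders.
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"  # placeholder checkpoint
model = ColPali.from_pretrained(model_name).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

query_image = Image.open("query.png")  # placeholder query image

# process_images returns pixel_values and the image-prompt input_ids, exactly
# as at indexing time; the forward call then yields the multi-vector
# (per-patch) embeddings for the query image, ready for maxsim scoring.
batch = processor.process_images([query_image]).to(model.device)
with torch.no_grad():
    query_embeddings = model(**batch)  # shape: (1, num_patches, dim)
```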
Hi,
I’m also interested in image queries. @nuschandra, were you able to make progress on this modification? Thank you in advance for all your hard work!
I worked on image-based RAG. First, I modified the search function in ragmodel.py to compare image similarities and attempted to search images. However, I was not successful, because the ColPali model structure is designed to find similarities between a PDF page and an image. The images in the PDF occupied very little space, leading to failed searches. To address this, I extracted the images from the PDF and created a dedicated RAG database for them. With this approach, I achieved significantly higher accuracy in image searching.
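For illustration, a minimal sketch of the "dedicated image database" approach, assuming PyMuPDF (fitz) for extracting the embedded images; the paths are placeholders, and the resulting folder would then be indexed with the project's usual indexing entry point:

```python
# Minimal sketch: pull the embedded images out of a PDF with PyMuPDF and write
# them to a folder that can then be indexed on its own. Paths are placeholders.
import os
import fitz  # PyMuPDF

pdf_path = "docs/report.pdf"       # placeholder input PDF
image_dir = "extracted_images"     # placeholder output folder
os.makedirs(image_dir, exist_ok=True)

doc = fitz.open(pdf_path)
for page_index, page in enumerate(doc):
    for img_index, img in enumerate(page.get_images(full=True)):
        xref = img[0]                       # cross-reference id of the image
        info = doc.extract_image(xref)      # raw bytes plus file extension
        out_path = os.path.join(image_dir, f"p{page_index}_{img_index}.{info['ext']}")
        with open(out_path, "wb") as f:
            f.write(info["image"])

# image_dir can now be indexed on its own, giving an index that contains
# only the extracted images rather than full PDF pages.
```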
Thank you for your response! Would you mind sharing the code modifications you made?
Hi All, sorry for the delayed response. I was held up with a few personal commitments over the last month and couldn't get back to this. I did make the code changes for this, but the performance was not particularly great, which concurs with what @bclavie mentioned earlier: ColPali was not designed for image-to-image similarity. It might work for certain specific use cases, but in general it's not designed for that. If any of you think it'll be useful, I can try to revisit this again.
I am particularly interested in finding similarities between document pages (layouts, logos, and text within the document) rather than focusing on the images embedded in the PDF. I believe ColPali could work well for this specific use case, don't you think?

Locally, I’ve modified the code to enable searching with an image (extracted from a PDF page) by calling encode_image instead of encode_query. After replacing encode_query with encode_image, I continued using processor.score as usual. However, it seems the similarity calculation isn’t fully accurate: it appears to be weighted rather than patch-based. A patch-based approach would be more useful, as it could detect similarities across different parts of the page. I need to continue exploring alternative scoring methods and gain a deeper understanding of the project. Any comments on your tests or help with code you’ve already experimented with would be greatly appreciated. In any case, thank you for all the work you’ve done so far!
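For reference, a hand-written sketch of patch-level late-interaction (MaxSim) scoring, assuming both the image query and each indexed page are multi-vector embeddings of shape (num_patches, dim); this is the ColBERT/ColPali-style scoring discussed above, not code taken from this repo:

```python
# Patch-level late-interaction (MaxSim) scoring: for each query patch take the
# best-matching page patch, then sum over query patches.
# Assumes L2-normalized multi-vector embeddings of shape (num_patches, dim).
import torch

def maxsim_score(query_emb: torch.Tensor, page_emb: torch.Tensor) -> float:
    sim = query_emb @ page_emb.T            # (query_patches, page_patches)
    return sim.max(dim=1).values.sum().item()

def rank_pages(query_emb: torch.Tensor, page_embs: list[torch.Tensor]) -> list[int]:
    scores = [maxsim_score(query_emb, p) for p in page_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```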
Hi @bclavie & team,
I currently don't see support for searching through an index with a query image instead of a text query. I understand that there is an encode_image option, but that only provides the embeddings of the query image, not a full search through the indexed documents along with the maxsim calculations. It would be really nice to have support for querying with an image too.
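For illustration, a self-contained sketch of the missing piece: taking multi-vector query-image embeddings and scoring them against stored page embeddings with late interaction. Random tensors stand in for real embeddings, score_multi_vector is the scorer exposed by recent colpali-engine processors, and the checkpoint name and shapes are only indicative; the wrapper in this repo may look different:

```python
# Self-contained sketch: score multi-vector image-query embeddings against
# stored page embeddings. Random tensors stand in for real embeddings here.
import torch
from colpali_engine.models import ColPaliProcessor

processor = ColPaliProcessor.from_pretrained("vidore/colpali-v1.2")  # placeholder checkpoint

query_embeddings = [torch.randn(1030, 128)]                            # one image query
indexed_page_embeddings = [torch.randn(1030, 128) for _ in range(10)]  # stored index

scores = processor.score_multi_vector(query_embeddings, indexed_page_embeddings)
print(scores.shape)                     # (1, 10): one row of maxsim scores per query
top_pages = scores[0].topk(3).indices.tolist()
```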