Support for searching with query image #27

nuschandra · 2024-09-27T09:05:18Z

I currently don't see support for searching through an index with a query image instead of a text query. I understand that there is an encode_image option but that only provides the embeddings of the query image and not a full search through the indexed documents along with maxsim calculations. It would be really nice to have support for querying with an image too.

nuschandra · 2024-09-28T01:52:20Z

@bclavie If you think that this would be a useful feature, I'd be happy to contribute and raise a PR for the same.

bclavie · 2024-10-03T07:29:12Z

Hey! It'd actually be a completely experimental feature since it's not even done in the paper, but I'd be happy to include it under a beta flag if you would like to contribute it!

nuschandra · 2024-10-03T18:53:01Z

@bclavie Thanks for your response! Sure, yes, I understand. The logic remains the same: process_images would return pixel_values and input_ids for the prompt (just as it does during indexing). When searching by image, we make the forward call with both pixel_values and input_ids and get the embeddings, which can then be used for the maxsim calculations. I will make the code changes later this week and raise a PR.
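For anyone following along, here is a rough sketch of the maxsim step described above, once the query image and the indexed pages have each been turned into multi-vector embeddings. It is plain NumPy rather than the library's own code, and the function names `maxsim_score` and `rank_documents` are made up for illustration.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score for one query against one document.

    query_emb has shape (n_query_vectors, dim); doc_emb has shape
    (n_doc_vectors, dim). For an image query, the query vectors would be
    the embeddings returned by the forward call on the query image.
    """
    sim = query_emb @ doc_emb.T  # pairwise dot products, shape (n_q, n_d)
    # Each query vector keeps its best-matching document vector; sum those maxima.
    return float(sim.max(axis=1).sum())


def rank_documents(query_emb: np.ndarray, doc_embs: list) -> list:
    """Return (doc_index, score) pairs sorted by descending MaxSim score."""
    scores = [maxsim_score(query_emb, d) for d in doc_embs]
    order = sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy example with hand-written 2-d "embeddings" (not real model output).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),  # matches both query vectors exactly
    np.array([[0.5, 0.5]]),              # partial match only
]
ranking = rank_documents(query, docs)
```

The scoring itself does not care whether the query vectors came from text tokens or image patches, which is why the same search path could in principle be reused for an image query.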

sergenerbay · 2024-10-27T20:04:45Z

Hi,
I need the same feature. Have you completed the code?

VictorUceda · 2024-12-05T22:48:46Z

I’m also interested in image queries. @nuschandra, were you able to make progress on this modification? Thank you in advance for all your hard work!

sergenerbay · 2024-12-06T09:24:21Z

I worked on image-based RAG. First, I modified the search function in ragmodel.py to compare image similarities and attempted to search images. However, I was not successful because the ColPali model structure is designed to find similarities between a PDF page and an image. The images in the PDF occupied very little space, leading to failed searches. To address this, I extracted the images from the PDF and created a dedicated RAG database for them. With this approach, I achieved significantly higher accuracy in image searching.

VictorUceda · 2024-12-06T15:07:02Z

Thank you for your response! Would you mind sharing the code modifications you made?
I plan to use this model to compare layout similarities between two document pages. I still have a lot to learn about this model, but I believe it could work effectively for my use case.

nuschandra · 2024-12-09T23:16:30Z

Hi All,

Sorry for the delayed response. I was held up with a few personal commitments over the last month and couldn't get back to this.

I did make the code changes for this, but the performance was not particularly great, which is consistent with what @bclavie mentioned earlier: ColPali was not designed for image-image similarity. It might work for certain specific use cases, but in general it's not designed for that. If any of you think it would be useful, I can revisit this.

VictorUceda · 2024-12-10T23:11:27Z

I am particularly interested in finding similarities between document pages (layouts, logos, and text within the document) rather than focusing on the images embedded in the PDF. I believe ColPali could work well for this specific use case, don't you think?

Locally, I’ve modified the code to enable searching with an image (extracted from a PDF page) by calling encode_image instead of encode_query. After that replacement, I continued using processor.score as usual. However, the similarity calculation doesn’t seem fully accurate; it appears to be weighted rather than patch-based. A patch-based approach would be more useful, as it could detect similarities across different parts of the page.
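To make the patch-based idea concrete, here is a minimal sketch of what I mean: for every patch of the query page, find the best-matching patch on the candidate page and keep the per-patch similarities instead of a single number. This is plain NumPy on already-computed embeddings, not what processor.score does internally, and `patch_matches` is a hypothetical helper name.

```python
import numpy as np


def patch_matches(query_patches: np.ndarray, page_patches: np.ndarray):
    """For each query patch, return the index and cosine similarity of the
    most similar patch on the candidate page.

    query_patches: (n_query_patches, dim); page_patches: (n_page_patches, dim).
    """
    # L2-normalise so the dot product is a cosine similarity.
    q = query_patches / np.linalg.norm(query_patches, axis=1, keepdims=True)
    p = page_patches / np.linalg.norm(page_patches, axis=1, keepdims=True)
    sim = q @ p.T                    # shape (n_query_patches, n_page_patches)
    best_idx = sim.argmax(axis=1)    # which page patch matched each query patch
    best_sim = sim.max(axis=1)       # how strongly it matched
    return best_idx, best_sim


# Toy vectors standing in for patch embeddings (not real model output).
query_patches = np.array([[2.0, 0.0], [0.0, 3.0]])
page_patches = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
best_idx, best_sim = patch_matches(query_patches, page_patches)
```

Keeping `best_idx` and `best_sim` separate makes it possible to see which regions of the page drive the match (for example a logo or a header block) before collapsing everything into one score.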

I need to continue exploring alternative scoring methods and gain a deeper understanding of the project. Any comments on your tests or help with code you’ve already experimented with would be greatly appreciated. In any case, thank you for all the work you’ve done so far!
