Breaking Language Barriers with Cross-Lingual Vector Search Python Notebook #491
Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/491
Thanks for adding the code in a notebook, @qn895! I've tweaked the connection settings to use an API key and endpoint instead of the cloud ID and basic auth without credentials, to show best connection practice. Could you also add an example query from the piece to the end of the notebook for completeness? Once that's in, it should be good to go in my view. Hope that helps!
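For reference, a minimal sketch of the API key and endpoint style of connection described above. The environment variable names `ELASTIC_ENDPOINT` and `ELASTIC_API_KEY` and both helper functions are assumptions made for illustration; they are not taken from the notebook.

```python
import os


def connection_settings(env=os.environ):
    """Read the endpoint and API key from the environment.

    The variable names are hypothetical; use whatever the notebook defines.
    """
    endpoint = env.get("ELASTIC_ENDPOINT")
    api_key = env.get("ELASTIC_API_KEY")
    if not endpoint or not api_key:
        raise ValueError("Set ELASTIC_ENDPOINT and ELASTIC_API_KEY first")
    return {"hosts": endpoint, "api_key": api_key}


def make_client(env=os.environ):
    """Build one authenticated client from the settings above."""
    # Imported lazily so connection_settings() works without the package installed.
    from elasticsearch import Elasticsearch

    return Elasticsearch(**connection_settings(env))
```

Calling `make_client()` once near the top of the notebook and reusing the result keeps credentials in one place.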
@carlyrichmond Just updated the notebook with more example queries, thank you for the feedback!
@carlyrichmond Hi Carly, just wanted to check with you if you're good with my latest updates for this notebook. Thanks!
Pull Request Overview
This PR introduces a comprehensive Python Jupyter notebook demonstrating cross-lingual vector search capabilities using multilingual embedding models. The notebook shows how to overcome language barriers by enabling queries and information retrieval in any language from both single and multilingual datasets.
Key changes include:
- Complete implementation of multilingual COCO dataset processing and indexing
- Integration with Elasticsearch for vector search functionality
- Cross-lingual query examples demonstrating search capabilities across different languages
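As a rough illustration of what a cross-lingual query could look like, here is a sketch that builds the `knn` section of a search request. The field name `description_embedding`, the model ID `.multilingual-e5-small`, and the helper `build_knn_query` are assumptions for illustration; the notebook's actual mapping and model may differ.

```python
def build_knn_query(
    query_text,
    field="description_embedding",      # hypothetical dense_vector field
    model_id=".multilingual-e5-small",  # assumed multilingual embedding model
    k=5,
    num_candidates=50,
):
    """Return a kNN clause that embeds query_text at search time."""
    return {
        "field": field,
        "k": k,
        "num_candidates": num_candidates,
        "query_vector_builder": {
            "text_embedding": {"model_id": model_id, "model_text": query_text}
        },
    }


# The same clause shape works regardless of the query language.
english = build_knn_query("a dog playing in the snow")
german = build_knn_query("ein Hund spielt im Schnee")
```

With an authenticated client `es`, the clause would be passed as `es.search(index=..., knn=english)`, where the index name is whatever the notebook created.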
Comments suppressed due to low confidence (4)
supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1
- The count of documents is incorrect. `len(bulk_data)` includes both index operations and document data, so it counts twice the actual number of documents. Should be `len(bulk_data) // 2` to get the correct document count.
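A small sketch of why the count is doubled, assuming the bulk payload is a flat list that alternates an action line with a document. The sample documents and the `_id` scheme are made up for illustration.

```python
# Hypothetical documents standing in for the notebook's captions.
docs = [{"caption": "a dog"}, {"caption": "un chien"}, {"caption": "ein Hund"}]

bulk_data = []
for i, doc in enumerate(docs):
    # Each document contributes two entries: the action line, then the source.
    bulk_data.append({"index": {"_index": "coco_multi", "_id": i}})
    bulk_data.append(doc)

entries = len(bulk_data)        # counts action lines and documents together
num_docs = len(bulk_data) // 2  # the actual number of documents
```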
supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1
- This creates a new Elasticsearch client without credentials, overriding the previously configured client with authentication. This will likely cause authentication failures. Should reuse the existing `es` client or remove this redundant initialization.
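One way to avoid the redundant initialization is to pass the already authenticated client into any helper that needs it. The sketch below assumes a helper named `count_indexed` (hypothetical) and uses the client's `count` API.

```python
def count_indexed(es, index_name):
    """Count documents using the client passed in, never a fresh one.

    `es` is expected to be the authenticated client created earlier.
    """
    response = es.count(index=index_name)
    return response["count"]
```

Because the helper never constructs a client itself, it cannot silently drop the credentials configured at the top of the notebook.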
supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1
- Index name inconsistency: the destination index is 'coco_multilingual' but earlier the index was created as 'coco_multi'. This will cause the reindex operation to fail or create data in an unmapped index.
supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1
- Index name inconsistency: searching 'coco_multi' but the reindex operation in the previous cell targets 'coco_multilingual'. The index names should be consistent throughout the notebook.
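Both index name comments could be addressed by defining the name once and deriving every request from it. In this sketch the constant `INDEX_NAME`, the source index `coco`, and the helper functions are hypothetical; only the two names quoted in the comments come from the review.

```python
# Single source of truth for the destination index (value from the review comment).
INDEX_NAME = "coco_multi"


def reindex_body(source_index, dest_index=INDEX_NAME):
    """Build a reindex request body that targets the shared index name."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def search_index():
    """Return the index to search, which is always the reindex destination."""
    return INDEX_NAME


body = reindex_body("coco")  # "coco" is a hypothetical source index
```

Since the creation, reindex, and search cells would all read `INDEX_NAME`, a rename only needs to happen in one place.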
…bedding.ipynb Co-authored-by: Copilot <[email protected]>
@qn895 I've fixed the notebook and added a prerequisites cell, so it's good to merge. Thanks so much for sharing!