@qn895
Member

@qn895 qn895 commented Sep 29, 2025

No description provided.

@qn895 qn895 self-assigned this Sep 29, 2025
@qn895 qn895 added the blog label Sep 29, 2025
@gitnotebooks

gitnotebooks bot commented Sep 29, 2025

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/491

@carlyrichmond
Contributor

Thanks for adding the code in a notebook @qn895! I've tweaked the connection settings to use an API key and endpoint, instead of the cloud ID and basic auth without credentials, to show best connection practice.

Could you also add an example query from the piece to the end of the notebook, for completeness? Once that's in, it should be good to go in my view.

Hope that helps!

@qn895
Member Author

qn895 commented Oct 8, 2025

@carlyrichmond Just updated the notebook with more example queries, thank you for the feedback!

@qn895 qn895 requested a review from Copilot October 14, 2025 16:22
@qn895
Member Author

qn895 commented Oct 14, 2025

@carlyrichmond Hi Carly, just wanted to check with you if you're good with my latest updates for this notebook. Thanks!

Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces a comprehensive Python Jupyter notebook demonstrating cross-lingual vector search capabilities using multilingual embedding models. The notebook shows how to overcome language barriers by enabling queries and information retrieval in any language from both single and multilingual datasets.

Key changes include:

  • Complete implementation of multilingual COCO dataset processing and indexing
  • Integration with Elasticsearch for vector search functionality
  • Cross-lingual query examples demonstrating search capabilities across different languages

Comments suppressed due to low confidence (4)

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

  • The count of documents is incorrect. len(bulk_data) includes both index operations and document data, so it counts twice the actual number of documents. Should be len(bulk_data) // 2 to get the correct document count.
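
A minimal sketch of the counting issue, assuming `bulk_data` is a flat list that alternates one action entry with one document source, which is the shape the bulk API takes. The sample captions and `_id` values are made up for illustration and are not taken from the notebook.

```python
# Hypothetical sample documents; the notebook's real data comes from the COCO dataset.
docs = [
    {"caption": "A dog runs on the beach", "language": "en"},
    {"caption": "Un chien court sur la plage", "language": "fr"},
    {"caption": "Ein Hund läuft am Strand", "language": "de"},
]

# Build the flat bulk body: one action entry followed by one source entry per document.
bulk_data = []
for i, doc in enumerate(docs):
    bulk_data.append({"index": {"_index": "coco_multi", "_id": str(i)}})
    bulk_data.append(doc)

naive_count = len(bulk_data)      # also counts the action entries
doc_count = len(bulk_data) // 2   # one document per action/source pair

print(f"Entries in bulk body: {naive_count}")
print(f"Documents to index:   {doc_count}")
```

Halving only holds when every document contributes exactly two entries, as with `index` actions; a body that mixes in `delete` actions would need a different count.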

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

  • This creates a new Elasticsearch client without credentials, overriding the previously configured client with authentication. This will likely cause authentication failures. Should reuse the existing es client or remove this redundant initialization.
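
One way to avoid a second, unauthenticated client is to route every cell through a single accessor. The sketch below is an illustration only: the environment variable names and the `get_client` helper are assumptions, not names from the notebook, and the client class is passed in as `factory` so the snippet runs without the `elasticsearch` package installed.

```python
import os


def connection_settings(env=os.environ):
    """Collect endpoint and API key (the variable names are assumptions)."""
    endpoint = env.get("ELASTICSEARCH_ENDPOINT")
    api_key = env.get("ELASTICSEARCH_API_KEY")
    if not endpoint or not api_key:
        raise ValueError("Set ELASTICSEARCH_ENDPOINT and ELASTICSEARCH_API_KEY first")
    return {"hosts": endpoint, "api_key": api_key}


_client = None  # the single shared client for the whole notebook


def get_client(factory, env=os.environ):
    """Return the shared client, creating it with credentials on first use only."""
    global _client
    if _client is None:
        _client = factory(**connection_settings(env))
    return _client
```

With the official Python client installed, `es = get_client(Elasticsearch)` after `from elasticsearch import Elasticsearch` would build the authenticated client once, and later cells calling `get_client(Elasticsearch)` again would get the same object back instead of a fresh one without credentials.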

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

  • Index name inconsistency: the destination index is 'coco_multilingual' but earlier the index was created as 'coco_multi'. This will cause the reindex operation to fail or create data in an unmapped index.

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

  • Index name inconsistency: searching 'coco_multi' but the reindex operation in the previous cell targets 'coco_multilingual'. The index names should be consistent throughout the notebook.
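
Both index-name comments point at the same fix: define each name once and build every request from it. A minimal sketch follows, in which `SOURCE_INDEX`, the `caption` field, and the simple `match` query are placeholders chosen for illustration; only `coco_multi` comes from the comments above.

```python
SOURCE_INDEX = "coco"        # hypothetical name of the index being copied from
DEST_INDEX = "coco_multi"    # the name the index was created with


def reindex_request():
    """Body for the reindex call, using the shared destination name."""
    return {"source": {"index": SOURCE_INDEX}, "dest": {"index": DEST_INDEX}}


def search_request(query_text):
    """Search parameters that target the same index the reindex wrote to."""
    return {
        "index": DEST_INDEX,
        "query": {"match": {"caption": query_text}},  # placeholder query shape
    }


print(reindex_request())
print(search_request("dog on a beach"))
```

Because both helpers read `DEST_INDEX`, renaming the index means changing one line, and the reindex and search cells cannot drift apart.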

@carlyrichmond
Contributor

@qn895 I've fixed the notebook and added a prerequisites cell, so it's good to merge. Thanks so much for sharing!

@carlyrichmond carlyrichmond merged commit 36d1e62 into main Oct 15, 2025
2 checks passed
@carlyrichmond carlyrichmond deleted the multilingual-embedding branch October 15, 2025 15:33

3 participants