Breaking Language Barriers with Cross-Lingual Vector Search Python Notebook #491

qn895 · 2025-09-29T00:31:56Z

No description provided.

gitnotebooks · 2025-09-29T00:32:00Z

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/491

carlyrichmond · 2025-10-08T11:26:03Z

Thanks for adding the code in a notebook @qn895! To show best connection practice, I've tweaked the connection settings to use an API key and endpoint instead of the cloud ID and basic auth without credentials.

Could you also add an example query from the piece to the end of the notebook for completeness? Once that's in, it should be good to go in my view.

Hope that helps!
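
For readers following along, the connection change described in the comment above might look roughly like the sketch below. It assumes the official `elasticsearch` Python client (8.x); the environment variable names `ELASTICSEARCH_ENDPOINT` and `ELASTICSEARCH_API_KEY` and the helper names are hypothetical and are not taken from the notebook.

```python
import os


def connection_kwargs(env=None):
    """Build keyword arguments for the Elasticsearch client.

    The variable names below are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    endpoint = env.get("ELASTICSEARCH_ENDPOINT")  # hypothetical variable name
    api_key = env.get("ELASTICSEARCH_API_KEY")    # hypothetical variable name
    if not endpoint or not api_key:
        raise ValueError(
            "Set ELASTICSEARCH_ENDPOINT and ELASTICSEARCH_API_KEY before connecting"
        )
    # Endpoint plus API key, rather than cloud ID plus basic auth.
    return {"hosts": [endpoint], "api_key": api_key}


def make_client(env=None):
    """Create the client once; later cells should reuse the returned object."""
    # Imported lazily so connection_kwargs() works without the package installed.
    from elasticsearch import Elasticsearch

    return Elasticsearch(**connection_kwargs(env))
```

Creating the client in one place and reusing it in later cells also avoids accidentally replacing an authenticated client with an unauthenticated one.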

qn895 · 2025-10-08T15:47:29Z

@carlyrichmond Just updated the notebook with more example queries, thank you for the feedback!

qn895 · 2025-10-14T16:22:40Z

@carlyrichmond Hi Carly, just wanted to check with you if you're good with my latest updates for this notebook. Thanks!

Copilot

Pull Request Overview

This PR introduces a comprehensive Python Jupyter notebook demonstrating cross-lingual vector search capabilities using multilingual embedding models. The notebook shows how to overcome language barriers by enabling queries and information retrieval in any language from both single and multilingual datasets.

Key changes include:

- Complete implementation of multilingual COCO dataset processing and indexing
- Integration with Elasticsearch for vector search functionality
- Cross-lingual query examples demonstrating search capabilities across different languages

Comments suppressed due to low confidence (4)

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

The count of documents is incorrect. len(bulk_data) includes both index operations and document data, so it counts twice the actual number of documents. Should be len(bulk_data) // 2 to get the correct document count.
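
A minimal sketch of the counting issue, assuming `bulk_data` is a flat list that alternates an action line with a document, as the comment describes. The `caption` field and the sample documents are made up for illustration.

```python
def build_bulk_data(docs, index_name):
    """Return a flat bulk payload: one action entry followed by one document."""
    bulk_data = []
    for doc in docs:
        bulk_data.append({"index": {"_index": index_name}})  # action line
        bulk_data.append(doc)                                # document source
    return bulk_data


# Hypothetical sample documents, not taken from the notebook.
docs = [
    {"caption": "a dog on a beach"},
    {"caption": "un perro en la playa"},
    {"caption": "ein Hund am Strand"},
]
bulk_data = build_bulk_data(docs, "coco_multi")

# Each document contributes two entries, so halve the length to count documents.
doc_count = len(bulk_data) // 2
print(f"Prepared {doc_count} documents ({len(bulk_data)} bulk entries)")
```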

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

This creates a new Elasticsearch client without credentials, overriding the previously configured client with authentication. This will likely cause authentication failures. Should reuse the existing es client or remove this redundant initialization.

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

Index name inconsistency: the destination index is 'coco_multilingual' but earlier the index was created as 'coco_multi'. This will cause the reindex operation to fail or create data in an unmapped index.

supporting-blog-content/multilingual-embedding/multilingual_embedding.ipynb:1

Index name inconsistency: searching 'coco_multi' but the reindex operation in the previous cell targets 'coco_multilingual'. The index names should be consistent throughout the notebook.
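
One way to avoid this kind of drift is to define the index name once and build every request from it. The sketch below is an illustration only: the thread does not say which of the two names the final notebook uses, and the `embedding` field, the default values, and the helper names are hypothetical.

```python
# Single source of truth for the index name. The choice of "coco_multi" here
# is an assumption; the point is that every cell reads the same constant.
INDEX_NAME = "coco_multi"


def reindex_body(source_index, dest_index=INDEX_NAME):
    """Request body for the reindex API, targeting the shared index name."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def knn_search_body(query_vector, field="embedding", k=5, num_candidates=50):
    """kNN search request body; the field name is hypothetical."""
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }
```

With an 8.x client these would typically be passed as keyword arguments, for example `es.reindex(**reindex_body("some_source_index"))` and `es.search(index=INDEX_NAME, **knn_search_body(vector))`, where `es`, `some_source_index`, and `vector` are placeholders.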

carlyrichmond · 2025-10-15T15:33:39Z

@qn895 I've fixed the notebook and added a prerequisites cell, so it's good to merge. Thanks so much for sharing!

Add new python notebook

680b27c

qn895 self-assigned this Sep 29, 2025

qn895 added the blog label Sep 29, 2025

carlyrichmond added 3 commits October 8, 2025 13:12

Adding title to make the build pass

3702553

Changing to use endpoint instead of cloud id, and removing unused os …

fd84a1a

…reference

Changing code formatting to fix issue

3b8e5d2

Update with new ES queries for completeness

4ef9d83

qn895 requested a review from Copilot October 14, 2025 16:22

Copilot AI reviewed Oct 14, 2025

qn895 and others added 3 commits October 14, 2025 11:23

Update supporting-blog-content/multilingual-embedding/multilingual_em…

fcaaffd

…bedding.ipynb Co-authored-by: Copilot <[email protected]>

Update notebook

07dc30b

Fixing notebook build and validating changes

453f044

carlyrichmond approved these changes Oct 15, 2025

carlyrichmond merged commit 36d1e62 into main Oct 15, 2025
2 checks passed

carlyrichmond deleted the multilingual-embedding branch October 15, 2025 15:33
