TomeHirata
Collaborator

Closes #8807

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds save/load functionality to the Embeddings class, enabling persistence of embeddings indices to disk for fast loading without recomputing embeddings.

  • Implements save(), load(), and from_saved() methods for the Embeddings class
  • Adds comprehensive test coverage for save/load functionality including error handling
  • Removes the TODO comment about adding save/load methods

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| dspy/retrievers/embeddings.py | Implements save/load methods with pickle for config, numpy for embeddings, and FAISS index persistence |
| tests/retrievers/test_embeddings.py | Adds comprehensive tests for save/load functionality and error cases |

self.index = None

# Reinitialize the search function
self.search_fn = Unbatchify(self._batch_forward)
Collaborator

Do we need to reinitialize search_fn?

Collaborator Author

Yup, this is because from_saved bypasses __init__
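
A minimal sketch of why the reassignment matters, assuming from_saved allocates the instance with cls.__new__(cls) (an assumption, since the body of from_saved is not shown in this thread). TinyIndex, _search, and the reinit flag are hypothetical stand-ins for illustration, not the dspy API:

```python
class TinyIndex:
    """Hypothetical stand-in for a retriever; not the dspy Embeddings class."""

    def __init__(self, corpus):
        self.corpus = list(corpus)
        # Attribute that only exists once __init__ has run.
        self.search_fn = self._search

    def _search(self, query):
        # Naive substring match, standing in for an embedding search.
        return [doc for doc in self.corpus if query in doc]

    def load(self, corpus, reinit=True):
        self.corpus = list(corpus)
        if reinit:
            # Rebind the derived attribute, as the reviewed line does.
            self.search_fn = self._search
        return self

    @classmethod
    def from_saved(cls, corpus, reinit=True):
        instance = cls.__new__(cls)  # allocates the object without calling __init__
        return instance.load(corpus, reinit=reinit)


ok = TinyIndex.from_saved(["apple pie", "banana bread"])
print(ok.search_fn("apple"))  # ['apple pie']

broken = TinyIndex.from_saved(["apple pie"], reinit=False)
print(hasattr(broken, "search_fn"))  # False: __init__ never ran
```

Without the rebinding step, any attribute that __init__ normally creates is simply missing on an object built this way.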

# but we can still save the embeddings for brute force search
pass

def load(self, path: str, embedder):
Collaborator

How do we want this load to be called? Do we want users to first create an Embeddings instance and then call embedding.load(), or should we make it a class method that returns a loaded instance?

Collaborator Author

@TomeHirata TomeHirata Sep 19, 2025

This is actually only for consistency with other APIs like module.load. I guess most people will use from_saved. Do you think we should make load a classmethod?

Collaborator

Gotcha. I am asking because of the line self.search_fn = Unbatchify(self._batch_forward). If we do:

embedder = dspy.Embeddings(...)
embedder.load(...)

do we still need this self.search_fn = Unbatchify(self._batch_forward)?
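
One way to picture the two call paths, sketched with hypothetical names (Store and _lookup are illustrative, not dspy code). When the object is built through __init__ first, search_fn already exists before load runs, so an assignment inside load would be redundant on that path. One option, shown here as an assumption and not as what the PR settled on, is to set the attribute only in from_saved:

```python
class Store:
    """Hypothetical example class; names are illustrative only."""

    def __init__(self, items):
        self.items = list(items)
        self.search_fn = self._lookup  # set once on the normal construction path

    def _lookup(self, key):
        return [item for item in self.items if key in item]

    def load(self, items):
        # Only restores state; does not touch search_fn.
        self.items = list(items)
        return self

    @classmethod
    def from_saved(cls, items):
        instance = cls.__new__(cls)  # skips __init__
        instance.search_fn = instance._lookup  # the one place that needs it
        return instance.load(items)


# Path 1: construct, then load. search_fn comes from __init__.
a = Store(["old"]).load(["new entry"])
# Path 2: from_saved. search_fn is set explicitly in the classmethod.
b = Store.from_saved(["new entry"])
print(a.search_fn("new"), b.search_fn("new"))
```

Keeping the assignment in load as well is harmless, since it just rebinds the same method, so the trade-off is mostly about where readers expect initialization to live.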

@chenmoneygithub
Collaborator

LGTM after #8818 (comment) is resolved, thank you!

Successfully merging this pull request may close these issues.

[Feature] Add .save and .load methods in embeddings.py