Support Custom User-Provided Embeddings in VectorStore add(...) Method #3540

Open
@aniketg-21

🔍 Motivation
Current VectorStore implementations (e.g., ChromaVectorStore, PgVectorStore) always compute embeddings from Document.content via the configured EmbeddingModel. This is limiting in real-world applications where:

  1. Embeddings are precomputed externally using fine-tuned or specialized models (offline pipelines).
  2. Embeddings may represent a prompt, summary, or condensed form, not the entire content.
  3. Structured data (e.g., JSON) may be stored as content, but embedding the full structure reduces semantic quality.

✅ What This Proposal Adds
This feature introduces support for user-provided embeddings at ingestion time, improving flexibility and performance. Highlights include:

  • Overloaded add(List<Document>, List<float[]>) method in the VectorStore interface.
  • AbstractObservationVectorStore refactored to call a centralized doAdd with validation.
  • Embedding generation logic removed from the VectorStore doAdd() implementations; embeddings are instead passed to doAdd() explicitly.
  • No need to modify the Document model.
  • No extra user configuration required for backward-compatible usage (the existing add(List<Document>) continues to auto-embed).
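
A minimal sketch of how the two ingestion paths could fit together. The names `Doc`, `Embedder`, `SketchVectorStore`, and `InMemoryStore` are hypothetical stand-ins for illustration only, not the actual Spring AI types, and the real refactoring may look different:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Document; only the content field is modeled.
final class Doc {
    final String content;
    Doc(String content) { this.content = content; }
}

// Hypothetical stand-in for an embedding model.
interface Embedder {
    float[] embed(String text);
}

// Sketch of a base store: both add(...) overloads funnel into one doAdd.
abstract class SketchVectorStore {
    private final Embedder embedder;

    SketchVectorStore(Embedder embedder) { this.embedder = embedder; }

    // Existing path: compute embeddings from the content, then delegate.
    public void add(List<Doc> documents) {
        List<float[]> embeddings = new ArrayList<>();
        for (Doc d : documents) {
            embeddings.add(embedder.embed(d.content));
        }
        add(documents, embeddings);
    }

    // Proposed path: the caller supplies one embedding per document.
    public void add(List<Doc> documents, List<float[]> embeddings) {
        if (documents.size() != embeddings.size()) {
            throw new IllegalArgumentException("Expected one embedding per document");
        }
        doAdd(documents, embeddings);
    }

    // Storage only; no embedding generation happens here.
    protected abstract void doAdd(List<Doc> documents, List<float[]> embeddings);
}

// Toy store that just keeps the vectors in memory.
class InMemoryStore extends SketchVectorStore {
    final List<float[]> stored = new ArrayList<>();

    InMemoryStore(Embedder embedder) { super(embedder); }

    @Override
    protected void doAdd(List<Doc> documents, List<float[]> embeddings) {
        stored.addAll(embeddings);
    }
}
```

With this shape, existing callers of add(documents) see no change, while pipelines with precomputed vectors call add(documents, embeddings) and bypass the embedding model.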

⚙️ Implementation Benefits

  • Clean separation of embedding generation from storage logic.
  • Maintains full backward compatibility.
  • Enables efficient batch ingestion using external embedding workflows.

📎 Related Work
#1600 – Discusses the need for prompt-based or user-controlled embedding logic.
#1239 – Adds prompt-based embedding, but doesn't support full injection of embeddings per document.

✅ Acceptance Criteria

  • Overloaded add(documents, embeddings) method available in all VectorStore implementations.
  • Embedding validation (dimension, NaN/Inf check) is done before ingestion.
  • If add(documents) is called, embeddings are generated as before.
  • Supports batching where applicable (the store, not the caller, decides how to batch).
  • Works out-of-the-box for existing stores (e.g., Pinecone, PGVector, Milvus).
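
The validation criterion above could be sketched roughly as follows. `EmbeddingValidator` and its `validate` signature are hypothetical, and the expected dimension is assumed to be known to the store (for example from its configuration):

```java
import java.util.List;

// Hypothetical helper; not part of any existing API.
final class EmbeddingValidator {
    private EmbeddingValidator() {}

    // Throws IllegalArgumentException if the embeddings cannot be ingested as-is.
    static void validate(int documentCount, List<float[]> embeddings, int expectedDimensions) {
        if (embeddings.size() != documentCount) {
            throw new IllegalArgumentException(
                "Got " + embeddings.size() + " embeddings for " + documentCount + " documents");
        }
        for (int i = 0; i < embeddings.size(); i++) {
            float[] vector = embeddings.get(i);
            if (vector == null || vector.length != expectedDimensions) {
                throw new IllegalArgumentException(
                    "Embedding " + i + " does not have " + expectedDimensions + " dimensions");
            }
            for (float value : vector) {
                // Float.isFinite is false for NaN and for +/- infinity.
                if (!Float.isFinite(value)) {
                    throw new IllegalArgumentException(
                        "Embedding " + i + " contains NaN or an infinite value");
                }
            }
        }
    }
}
```

Running this check before doAdd means a malformed vector fails fast with a clear message instead of surfacing later as a store-specific error.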
