Support Custom User-Provided Embeddings in VectorStore add(...) Method

🔍 Motivation
Current VectorStore implementations (e.g., ChromaVectorStore, PgVectorStore) automatically compute embeddings from Document.content via the configured EmbeddingModel. Callers have no way to supply an embedding themselves, which is limiting in real-world applications where:

1. Embeddings are precomputed externally using fine-tuned or specialized models (offline pipelines).
2. Embeddings may represent a prompt, summary, or condensed form, not the entire content.
3. Structured data (e.g., JSON) may be stored as content, but embedding the full structure reduces semantic quality.

✅ What This Proposal Adds
This feature introduces support for user-provided embeddings at ingestion time, so callers control what is embedded and can skip re-embedding when vectors already exist. Highlights include:

- Overloaded add(List<Document>, List<float[]>) method in the VectorStore interface.
- AbstractObservationVectorStore refactored so that add(...) calls route through a centralized doAdd, with validation applied first.
- Embedding generation logic removed from the VectorStore doAdd() implementations; doAdd() receives the embeddings as an explicit argument instead of computing them.
- No need to modify the Document model.
- No extra user config required for backward-compatible usage (existing add(List<Document>) continues to auto-embed).
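The shape described above can be sketched roughly as follows. This is a minimal illustration, not the actual Spring AI code: `Document`, `EmbeddingModel`, and `VectorStore` are simplified stand-ins for the real types, the `doAdd` signature is an assumption based on this proposal, and `InMemoryVectorStore` is a hypothetical store that exists only to exercise the two paths.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the real Spring AI types (assumptions for illustration).
final class Document {
    final String id;
    final String content;

    Document(String id, String content) {
        this.id = id;
        this.content = content;
    }
}

interface EmbeddingModel {
    float[] embed(String text);
}

interface VectorStore {
    // Existing behavior: embeddings are computed from the document content.
    void add(List<Document> documents);

    // Proposed overload: embeddings[i] belongs to documents[i].
    void add(List<Document> documents, List<float[]> embeddings);
}

abstract class AbstractObservationVectorStore implements VectorStore {
    private final EmbeddingModel embeddingModel;

    protected AbstractObservationVectorStore(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    @Override
    public void add(List<Document> documents) {
        // Backward-compatible path: auto-embed, then reuse the explicit path.
        List<float[]> embeddings = new ArrayList<>(documents.size());
        for (Document document : documents) {
            embeddings.add(embeddingModel.embed(document.content));
        }
        add(documents, embeddings);
    }

    @Override
    public void add(List<Document> documents, List<float[]> embeddings) {
        if (documents.size() != embeddings.size()) {
            throw new IllegalArgumentException("Expected one embedding per document");
        }
        doAdd(documents, embeddings);
    }

    // Store-specific persistence only; no embedding generation happens here.
    protected abstract void doAdd(List<Document> documents, List<float[]> embeddings);
}

// Hypothetical in-memory store, used only to demonstrate the sketch.
class InMemoryVectorStore extends AbstractObservationVectorStore {
    final Map<String, float[]> vectors = new LinkedHashMap<>();

    InMemoryVectorStore(EmbeddingModel embeddingModel) {
        super(embeddingModel);
    }

    @Override
    protected void doAdd(List<Document> documents, List<float[]> embeddings) {
        for (int i = 0; i < documents.size(); i++) {
            vectors.put(documents.get(i).id, embeddings.get(i));
        }
    }
}

class CustomEmbeddingDemo {
    public static void main(String[] args) {
        // Toy embedding model: encodes only the text length.
        InMemoryVectorStore store =
                new InMemoryVectorStore(text -> new float[] { text.length(), 0f });

        // Auto-embed path.
        store.add(List.of(new Document("doc-1", "hello")));

        // User-provided path: the stored content is JSON, the vector comes from elsewhere.
        store.add(List.of(new Document("doc-2", "{\"sku\": 42}")),
                  List.of(new float[] { 0.1f, 0.9f }));

        System.out.println(store.vectors.keySet());
    }
}
```

In this sketch the single-argument `add` delegates to the two-argument overload, so both paths share the same checks and the same `doAdd` call.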

⚙️ Implementation Benefits
- Clean separation of embedding generation from storage logic.
- Maintains full backward compatibility.
- Enables efficient batch ingestion using external embedding workflows.

📎 Related Work
[#1600](https://github.com/spring-projects/spring-ai/issues/1600) – Discusses the need for prompt-based or user-controlled embedding logic.
[#1239](https://github.com/spring-projects/spring-ai/pull/1239) – Adds prompt-based embedding, but doesn't support full injection of embeddings per document.

✅ Acceptance Criteria
- Overloaded add(documents, embeddings) method available in all VectorStore implementations.
- Embedding validation (dimension check, NaN/Inf check) runs before ingestion.
- Calling add(documents) generates embeddings as before.
- Batching is supported where applicable; the store, not the caller, decides how to batch.
- Works out of the box for existing stores (e.g., Pinecone, PGVector, Milvus).
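The validation criterion could look something like the sketch below. `EmbeddingValidator`, its method name, and the `expectedDimensions` parameter are hypothetical; the proposal does not specify where the expected dimension comes from, so here it is simply passed in by the caller.

```java
import java.util.List;

// Hypothetical helper; names and signature are assumptions for illustration.
final class EmbeddingValidator {

    private EmbeddingValidator() {
    }

    static void validate(int documentCount, List<float[]> embeddings, int expectedDimensions) {
        if (embeddings == null || embeddings.size() != documentCount) {
            throw new IllegalArgumentException("Expected one embedding per document");
        }
        for (int i = 0; i < embeddings.size(); i++) {
            float[] vector = embeddings.get(i);
            if (vector == null || vector.length != expectedDimensions) {
                throw new IllegalArgumentException(
                        "Embedding " + i + " does not have " + expectedDimensions + " dimensions");
            }
            for (float value : vector) {
                // Float.isFinite is false for NaN and for positive or negative infinity.
                if (!Float.isFinite(value)) {
                    throw new IllegalArgumentException(
                            "Embedding " + i + " contains a NaN or infinite value");
                }
            }
        }
    }

    public static void main(String[] args) {
        validate(1, List.of(new float[] { 0.1f, 0.2f }), 2); // passes silently
        try {
            validate(1, List.of(new float[] { Float.NaN, 0.2f }), 2);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running a check like this before any store-specific write means a malformed vector fails fast instead of surfacing later as a backend error.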

Support Custom User-Provided Embeddings in VectorStore add(...) Method #3540
