Pinecone: add section for 2.x integration (#105)
* integrations/

* fixes

* Update the text for 2.0

---------

Co-authored-by: bilgeyucel <[email protected]>
anakin87 and bilgeyucel authored Jan 3, 2024
1 parent c21a522 commit 43e1857
138 changes: 129 additions & 9 deletions integrations/pinecone-document-store.md
---
layout: integration
name: Pinecone
description: Use a Pinecone database with Haystack
authors:
- name: deepset
socials:
github: deepset-ai
twitter: deepset_ai
linkedin: deepset-ai
pypi: https://pypi.org/project/farm-haystack
repo: https://github.com/deepset-ai/haystack
- name: Ashwin Mathur
socials:
github: awinml
twitter: awinml
linkedin: ashwin-mathur-ds
- name: Varun Mathur
socials:
github: vrunm
twitter: vrunmnlp
linkedin: varun-mathur-ds
pypi: https://pypi.org/project/pinecone_haystack/
repo: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/pinecone
type: Document Store
report_issue: https://github.com/deepset-ai/haystack-core-integrations/issues
logo: /logos/pinecone.png
version: Haystack 2.0
toc: true
---

### Table of Contents

- [Overview](#overview)
- [Haystack 2.x](#haystack-2x)
- [Installation](#installation)
- [Usage](#usage)
- [Haystack 1.x](#haystack-1x)
- [Installation (1.x)](#installation-1x)
- [Usage (1.x)](#usage-1x)

## Overview

[Pinecone](https://www.pinecone.io/) is a fast and scalable vector database which you can use in Haystack pipelines with the [PineconeDocumentStore](https://docs.haystack.deepset.ai/docs/document_store#initialization).

For a detailed overview of all the available methods and settings for the `PineconeDocumentStore`, visit the Haystack [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#pineconedocumentstore).

## Haystack 2.x

### Installation

```bash
pip install pinecone-haystack
```

### Usage

To use Pinecone as the data storage for your Haystack LLM pipelines, you need a Pinecone account and an API key. Once you have those, you can initialize a `PineconeDocumentStore` for Haystack:

```python
from pinecone_haystack import PineconeDocumentStore

document_store = PineconeDocumentStore(api_key="YOUR_API_KEY",
                                       environment="gcp-starter",
                                       dimension=768)
```

#### Writing Documents to PineconeDocumentStore

To write documents to your `PineconeDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
For this step, you may make use of the available [Converters](https://docs.haystack.deepset.ai/v2.0/docs/converters) and [PreProcessors](https://docs.haystack.deepset.ai/v2.0/docs/preprocessors), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database.

#### Indexing Pipeline

```python
from haystack import Pipeline
from haystack.components.converters import MarkdownToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from pinecone_haystack import PineconeDocumentStore

document_store = PineconeDocumentStore(api_key="YOUR_API_KEY",
                                       environment="gcp-starter",
                                       dimension=768)

indexing = Pipeline()
indexing.add_component("converter", MarkdownToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=2))
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")

indexing.run({"converter": {"sources": ["filename.md"]}})
```

### Using Pinecone in a RAG Pipeline

Once you have documents in your `PineconeDocumentStore`, they are ready to be used in any Haystack pipeline. You can use the `PineconeDenseRetriever` to retrieve documents from your `PineconeDocumentStore`. For example, the pipeline below uses a custom prompt designed to answer questions based on the retrieved documents.

```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from pinecone_haystack import PineconeDocumentStore
from pinecone_haystack.dense_retriever import PineconeDenseRetriever

document_store = PineconeDocumentStore(api_key="YOUR_API_KEY",
                                       dimension=768)

prompt_template = """Answer the following query based on the provided context. If the context does
not include an answer, reply with 'I don't know'.\n
Query: {{query}}
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Answer:
"""

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", PineconeDenseRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
query_pipeline.add_component("generator", OpenAIGenerator(api_key="YOUR_OPENAI_API_KEY", model="gpt-4"))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "generator")

query = "What is Pinecone?"
results = query_pipeline.run(
    {
        "text_embedder": {"text": query},
        "prompt_builder": {"query": query},
    }
)
```

## Haystack 1.x

### Installation

```bash
pip install farm-haystack[pinecone]
```

### Usage

To use Pinecone as the data storage for your Haystack LLM pipelines, you need a Pinecone account and an API key. Once you have those, you can initialize a `PineconeDocumentStore` for Haystack:

```python
from haystack.document_stores import PineconeDocumentStore

document_store = PineconeDocumentStore(api_key='YOUR_API_KEY',
                                       embedding_dim=768)
```

#### Writing Documents to PineconeDocumentStore

To write documents to your `PineconeDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database.
