Improve indexify examples (#918)
* fix video summarization example

* fixes examples and adds docker compose support

* remote graph fixes for knowledge graph example and docker compose added

* remote graph fixes for knowledge graph example and docker compose added
PulkitMishra authored Oct 8, 2024
1 parent 6508928 commit b09056e
Showing 14 changed files with 769 additions and 415 deletions.
9 changes: 7 additions & 2 deletions .gitignore
@@ -39,6 +39,8 @@ src/state/store/snapshots/*
*.mp3
data/
upload.pdf
*.wav
*.tgz

# MacOS-specific files
.DS_Store
@@ -53,6 +55,9 @@ local_server_config_*.yaml
sqlite*
*.trace

# logs
*.log

# Miscellaneous
*.tgz
/executor-py/~
/executor-py/~

5 changes: 2 additions & 3 deletions README.md
@@ -10,9 +10,8 @@ Some of the use-cases that you can use Indexify for -

* [Scraping and Summarizing websites](examples/website_audio_summary/)
* [PDF Documents Extraction and Indexing](examples/pdf_document_extraction/)
* [Transcribing audio and summarization](examples/website_audio_summary/)
* [Knowledge Graph RAG Pipeline](examples/knowledge_graph_generation/)
* [Knowledge Graph QA System](examples/knowledge_graph_qa_system/)
* [Transcribing audio and summarization](examples/video_summarization/)
* [Knowledge Graph RAG and Question Answering](examples/knowledge_graph/)

### Key Features
* **Conditional Branching and Data Flow:** Router functions can conditionally choose one or more edges in the Graph, making it easy to invoke expert models based on inputs.
15 changes: 0 additions & 15 deletions examples/image_detection_indexing/workflow.py

This file was deleted.

82 changes: 63 additions & 19 deletions examples/knowledge_graph/README.md
@@ -1,6 +1,6 @@
# Knowledge Graph RAG and Question Answering with Indexify

This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Generation (RAG) pipeline and a Question Answering system using Indexify. The pipeline extracts entities and relationships from text, builds a knowledge graph, stores it in Neo4j, generates embeddings, and answers questions based on the stored knowledge.
This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Generation (RAG) pipeline and a Question Answering system using Indexify.

## Features

@@ -13,42 +13,86 @@ This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Gen
## Prerequisites

- Python 3.9+
- Neo4j database
- Google Cloud account with Gemini API access
- Indexify library
- Docker and Docker Compose (for containerized setup)

## Installation
## Installation and Usage

### Option 1: Local Installation - In Process

1. Clone this repository:
```
git clone https://github.com/tensorlakeai/indexify
cd indexify/examples/knowledge_graph
```

2. Install the required dependencies:
2. Create a virtual environment and activate it:
```
python -m venv venv
source venv/bin/activate
```

3. Install the required dependencies:
```
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

3. Set up environment variables:
4. Install and start a Neo4j database locally.

5. Set up environment variables:
```
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=your_password
export GOOGLE_API_KEY=your_google_api_key
```

## Usage
6. Run the main script:
```
python workflow.py --mode in-process-run
```
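
The environment variables in the step above can be validated before the pipeline starts. The following is a minimal sketch, not code from `workflow.py`: the helper name `load_settings` is hypothetical, and only the four variable names come from the README.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the README's "Set up environment variables" step.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "GOOGLE_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty.

    `load_settings` is a hypothetical helper for illustration only.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no argument reads the real process environment; passing a plain dictionary makes the check easy to exercise without exporting anything.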

1. Ensure your Neo4j database is running and accessible.
### Option 2: Using Docker Compose - deployed graph

2. Run the main script:
1. Clone this repository:
```
python kg_rag_qa_pipeline.py
git clone https://github.com/tensorlakeai/indexify
cd indexify/examples/knowledge_graph
```

3. The script will process a sample document about Albert Einstein, create a knowledge graph, store it in Neo4j, generate embeddings, and then answer sample questions.
2. Create a virtual environment and activate it:
```
python -m venv venv
source venv/bin/activate
```

3. Install indexify:
```
pip install indexify
```

4. Ensure Docker and Docker Compose are installed on your system.

5. Create a `.env` file in the project directory and add your Google API key:
```
GOOGLE_API_KEY=your_google_api_key_here
```

6. Build and start the services:
```
indexify-cli build-image workflow.py NLPFunction
indexify-cli build-image workflow.py generate_embeddings
indexify-cli build-image workflow.py build_knowledge_graph
indexify-cli build-image workflow.py store_in_neo4j
indexify-cli build-image workflow.py generate_answer
docker-compose up --build
```

7. Run the main script:
```
python workflow.py --mode remote-deploy
python workflow.py --mode remote-run
```
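
The README drives the example through a single `--mode` flag with three values. The actual argument handling in `workflow.py` is not shown in this diff, so the following is only a sketch of how such a flag could be parsed with `argparse`. The function name `parse_mode` and the choice to make the flag required are assumptions; only the three mode names come from the commands above.

```python
import argparse
from typing import Optional, Sequence

# Mode names exactly as they appear in the README commands.
MODES = ("in-process-run", "remote-deploy", "remote-run")


def parse_mode(argv: Optional[Sequence[str]] = None) -> str:
    """Parse --mode and return the selected value.

    Hypothetical helper: the real script may define further options.
    """
    parser = argparse.ArgumentParser(description="Knowledge graph example (sketch)")
    # Assumption: the flag is mandatory and limited to the three documented modes.
    parser.add_argument("--mode", choices=MODES, required=True)
    return parser.parse_args(argv).mode
```

With `choices` set, an unknown mode makes `argparse` print a usage message and exit instead of silently falling through to a default.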

## How it Works

@@ -64,13 +108,6 @@ This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Gen
- Query Execution: Executes the Cypher query on the Neo4j database.
- Answer Generation: Uses Gemini AI to generate a natural language answer based on the query results.

## Customization

- Modify the `sample_doc` in the `main()` function to process different texts.
- Adjust the relationship extraction logic in `extract_relationships()` for more sophisticated relationship identification.
- Change the embedding model in `generate_embeddings()` to use different pre-trained models.
- Fine-tune the prompts in `question_to_cypher()` and `generate_answer()` functions for better results.

## Indexify Graph Structure

The project uses two Indexify graphs:
@@ -85,3 +122,10 @@ The project uses two Indexify graphs:
```
question_to_cypher -> execute_cypher_query -> generate_answer
```
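
The linear chain shown above passes each step's output to the next step. The sketch below illustrates that data flow in plain Python and deliberately does not use the Indexify API. The three function bodies are stand-in stubs that borrow the node names from the README, and `run_chain` is a hypothetical helper.

```python
from functools import reduce
from typing import Any, Callable, Dict, List, Sequence


def run_chain(steps: Sequence[Callable[[Any], Any]], value: Any) -> Any:
    """Feed `value` through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Stand-in stubs: the real functions call Gemini and Neo4j.
def question_to_cypher(question: str) -> str:
    return "MATCH (n) RETURN n LIMIT 1"


def execute_cypher_query(query: str) -> List[Dict[str, Any]]:
    return [{"query": query, "rows": []}]


def generate_answer(results: List[Dict[str, Any]]) -> str:
    return "Answer built from {} result set(s)".format(len(results))


QA_CHAIN = (question_to_cypher, execute_cypher_query, generate_answer)
```

Running `run_chain(QA_CHAIN, "Who developed relativity?")` passes the question through all three stubs in the order of the diagram.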

## Customization

- Modify the `sample_doc` in the `main()` function of `kg_rag_qa_pipeline.py` to process different texts.
- Adjust the relationship extraction logic in `extract_relationships()` for more sophisticated relationship identification.
- Change the embedding model in `generate_embeddings()` to use different pre-trained models.
- Fine-tune the prompts in `question_to_cypher()` and `generate_answer()` functions for better results.
123 changes: 123 additions & 0 deletions examples/knowledge_graph/docker-compose.yml
@@ -0,0 +1,123 @@
networks:
  server:
services:
  indexify:
    image: tensorlake/indexify-server
    ports:
      - 8900:8900
    networks:
      server:
        aliases:
          - indexify-server
    volumes:
      - data:/tmp/indexify-blob-storage

  nlp-executor:
    image: tensorlake/nlp-image:latest
    command:
      [
        "indexify-cli",
        "executor",
        "--server-addr",
        "indexify:8900"
      ]
    networks:
      server:
    volumes:
      - data:/tmp/indexify-blob-storage
    deploy:
      mode: replicated
      replicas: 1

  embedding-executor:
    image: tensorlake/embedding-image:latest
    command:
      [
        "indexify-cli",
        "executor",
        "--server-addr",
        "indexify:8900"
      ]
    networks:
      server:
    volumes:
      - data:/tmp/indexify-blob-storage
    deploy:
      mode: replicated
      replicas: 1

  neo4j-executor:
    image: tensorlake/neo4j-image:latest
    environment:
      - NEO4J_URI=bolt://neo4j-server:7687
      - NEO4J_USER=neo4j
      - NEO4J_PASSWORD=indexify
    command:
      [
        "indexify-cli",
        "executor",
        "--server-addr",
        "indexify:8900"
      ]
    networks:
      server:
    volumes:
      - data:/tmp/indexify-blob-storage
    deploy:
      mode: replicated
      replicas: 1

  gemini-executor:
    image: tensorlake/gemini-image:latest
    environment:
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
    command:
      [
        "indexify-cli",
        "executor",
        "--server-addr",
        "indexify:8900"
      ]
    networks:
      server:
    volumes:
      - data:/tmp/indexify-blob-storage
    deploy:
      mode: replicated
      replicas: 1

  base-executor:
    image: tensorlake/base-image:latest
    command:
      [
        "indexify-cli",
        "executor",
        "--server-addr",
        "indexify:8900"
      ]
    networks:
      server:
    volumes:
      - data:/tmp/indexify-blob-storage
    deploy:
      mode: replicated
      replicas: 1

  neo4j-server:
    image: neo4j:4.4
    environment:
      - NEO4J_AUTH=neo4j/indexify
    ports:
      - "7474:7474"
      - "7687:7687"
    networks:
      server:
    volumes:
      - data:/tmp/indexify-blob-storage
    deploy:
      mode: replicated
      replicas: 1

volumes:
  data:
  neo4j_data:
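
In the compose file above, the `neo4j-server` service sets its credentials through `NEO4J_AUTH=neo4j/indexify`, while `neo4j-executor` receives them as separate `NEO4J_USER` and `NEO4J_PASSWORD` values, so the two must be kept in sync by hand. The following is a small sketch of a consistency check. Both helper names are hypothetical and nothing like this ships with the example.

```python
from typing import Mapping, Tuple


def split_neo4j_auth(value: str) -> Tuple[str, str]:
    """Split a `user/password` NEO4J_AUTH value into its two parts.

    Values without a slash (for example `none`, which disables auth in the
    Neo4j image) are rejected by this sketch.
    """
    user, sep, password = value.partition("/")
    if not sep or not user or not password:
        raise ValueError("expected NEO4J_AUTH in the form user/password")
    return user, password


def credentials_match(server_env: Mapping[str, str],
                      executor_env: Mapping[str, str]) -> bool:
    """Return True when the executor credentials equal the server's NEO4J_AUTH."""
    user, password = split_neo4j_auth(server_env["NEO4J_AUTH"])
    return (executor_env.get("NEO4J_USER") == user
            and executor_env.get("NEO4J_PASSWORD") == password)
```

Feeding it the two `environment` blocks from the compose file as dictionaries is enough to catch a password that was changed on only one side.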