Improve indexify examples (#918)

* fix video summarization example
* fixes examples and adds docker compose support
* remote graph fixes for knowledge graph example and docker compose added
tensorlakeai · Oct 8, 2024 · b09056e
1 parent 6508928
commit b09056e
Showing 14 changed files with 769 additions and 415 deletions.
diff --git a/.gitignore b/.gitignore
@@ -39,6 +39,8 @@ src/state/store/snapshots/*
 *.mp3
 data/
 upload.pdf
+*.wav
+*.tgz
 
 # MacOS-specific files
 .DS_Store
@@ -53,6 +55,9 @@ local_server_config_*.yaml
 sqlite*
 *.trace
 
+# logs
+*.log
+
 # Miscellaneous
-*.tgz
-/executor-py/~
+/executor-py/~
+
diff --git a/README.md b/README.md
@@ -10,9 +10,8 @@ Some of the use-cases that you can use Indexify for -
 
 * [Scraping and Summarizing websites](examples/website_audio_summary/)
 * [PDF Documents Extraction and Indexing](examples/pdf_document_extraction/)
-* [Transcribing audio and summarization](examples/website_audio_summary/)
-* [Knowledge Graph RAG Pipeline](examples/knowledge_graph_generation/)
-* [Knowledge Graph QA System](examples/knowledge_graph_qa_system/)
+* [Transcribing audio and summarization](examples/video_summarization/)
+* [Knowledge Graph RAG and Question Answering](examples/knowledge_graph/)
 
 ### Key Features
 * **Conditional Branching and Data Flow:** Router functions can conditionally choose one or more edges in a Graph, making it easy to invoke expert models based on inputs.

diff --git a/examples/image_detection_indexing/workflow.py b/examples/image_detection_indexing/workflow.py
diff --git a/examples/knowledge_graph/README.md b/examples/knowledge_graph/README.md
@@ -1,6 +1,6 @@
 # Knowledge Graph RAG and Question Answering with Indexify
 
-This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Generation (RAG) pipeline and a Question Answering system using Indexify. The pipeline extracts entities and relationships from text, builds a knowledge graph, stores it in Neo4j, generates embeddings, and answers questions based on the stored knowledge.
+This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Generation (RAG) pipeline and a Question Answering system using Indexify.
 
 ## Features
 
@@ -13,42 +13,86 @@ This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Gen
 ## Prerequisites
 
 - Python 3.9+
-- Neo4j database
 - Google Cloud account with Gemini API access
-- Indexify library
+- Docker and Docker Compose (for containerized setup)
 
-## Installation
+## Installation and Usage
+
+### Option 1: Local Installation - In Process
 
 1. Clone this repository:
    ```
    git clone https://github.com/tensorlakeai/indexify
    cd indexify/examples/knowledge_graph
    ```
 
-2. Install the required dependencies:
+2. Create a virtual environment and activate it:
+   ```
+   python -m venv venv
+   source venv/bin/activate
+   ```
+
+3. Install the required dependencies:
    ```
    pip install -r requirements.txt
-   python -m spacy download en_core_web_sm
    ```
 
-3. Set up environment variables:
+4. Install and start a Neo4j database locally.
+
+5. Set up environment variables:
    ```
    export NEO4J_URI=bolt://localhost:7687
    export NEO4J_USER=neo4j
    export NEO4J_PASSWORD=your_password
    export GOOGLE_API_KEY=your_google_api_key
    ```
 
-## Usage
+6. Run the main script:
+   ```
+   python workflow.py --mode in-process-run
+   ```
 
-1. Ensure your Neo4j database is running and accessible.
+### Option 2: Using Docker Compose - deployed graph
 
-2. Run the main script:
+1. Clone this repository:
    ```
-   python kg_rag_qa_pipeline.py
+   git clone https://github.com/tensorlakeai/indexify
+   cd indexify/examples/knowledge_graph
    ```
 
-3. The script will process a sample document about Albert Einstein, create a knowledge graph, store it in Neo4j, generate embeddings, and then answer sample questions.
+2. Create a virtual environment and activate it:
+   ```
+   python -m venv venv
+   source venv/bin/activate
+   ```
+
+3. Install indexify:
+   ```
+   pip install indexify
+   ```
+
+4. Ensure Docker and Docker Compose are installed on your system.
+
+5. Create a `.env` file in the project directory and add your Google API key:
+   ```
+   GOOGLE_API_KEY=your_google_api_key_here
+   ```
+
+6. Build and start the services:
+   ```
+   indexify-cli build-image workflow.py NLPFunction
+   indexify-cli build-image workflow.py generate_embeddings
+   indexify-cli build-image workflow.py build_knowledge_graph
+   indexify-cli build-image workflow.py store_in_neo4j
+   indexify-cli build-image workflow.py generate_answer
+   docker-compose up --build
+   ```
+
+7. Run the main script:
+   ```
+   python workflow.py --mode remote-deploy
+   python workflow.py --mode remote-run
+   ```
 
 ## How it Works
 
@@ -64,13 +108,6 @@ This project demonstrates how to build a Knowledge Graph Retrieval-Augmented Gen
    - Query Execution: Executes the Cypher query on the Neo4j database.
    - Answer Generation: Uses Gemini AI to generate a natural language answer based on the query results.
 
-## Customization
-
-- Modify the `sample_doc` in the `main()` function to process different texts.
-- Adjust the relationship extraction logic in `extract_relationships()` for more sophisticated relationship identification.
-- Change the embedding model in `generate_embeddings()` to use different pre-trained models.
-- Fine-tune the prompts in `question_to_cypher()` and `generate_answer()` functions for better results.
-
 ## Indexify Graph Structure
 
 The project uses two Indexify graphs:
@@ -85,3 +122,10 @@ The project uses two Indexify graphs:
    ```
    question_to_cypher -> execute_cypher_query -> generate_answer
    ```
+
+## Customization
+
+- Modify the `sample_doc` in the `main()` function of `kg_rag_qa_pipeline.py` to process different texts.
+- Adjust the relationship extraction logic in `extract_relationships()` for more sophisticated relationship identification.
+- Change the embedding model in `generate_embeddings()` to use different pre-trained models.
+- Fine-tune the prompts in `question_to_cypher()` and `generate_answer()` functions for better results.
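
Reviewer note: the README hunk above documents three `--mode` values (`in-process-run`, `remote-deploy`, `remote-run`) and four environment variables for the in-process path. The sketch below shows one way such a flag and its prerequisites could be checked before any work starts. It is not taken from `workflow.py`: `parse_mode`, `missing_env`, and the assumption that only the in-process mode needs the variables on the client side are all hypothetical.

```python
import argparse
import os

# Mode names are taken from the README; everything else here is illustrative.
MODES = ("in-process-run", "remote-deploy", "remote-run")

# Variables the README asks you to export for the in-process run.
IN_PROCESS_ENV = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "GOOGLE_API_KEY")


def parse_mode(argv):
    """Return the value of --mode, rejecting anything not in MODES."""
    parser = argparse.ArgumentParser(description="hypothetical workflow entry point")
    parser.add_argument("--mode", choices=MODES, required=True)
    return parser.parse_args(argv).mode


def missing_env(mode, environ=None):
    """List required variables that are unset or empty for the given mode.

    Assumption: remote modes read their settings inside the containers,
    so nothing is required on the client side.
    """
    environ = os.environ if environ is None else environ
    required = IN_PROCESS_ENV if mode == "in-process-run" else ()
    return [name for name in required if not environ.get(name)]
```

Calling `missing_env(parse_mode(sys.argv[1:]))` at startup and exiting with a message when the list is non-empty would surface a forgotten `export` before the Neo4j or Gemini call fails.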
diff --git a/examples/knowledge_graph/docker-compose.yml b/examples/knowledge_graph/docker-compose.yml
@@ -0,0 +1,123 @@
+networks:
+  server:
+services:
+  indexify:
+    image: tensorlake/indexify-server
+    ports:
+      - 8900:8900
+    networks:
+      server:
+        aliases:
+          - indexify-server
+    volumes:
+      - data:/tmp/indexify-blob-storage
+
+  nlp-executor:
+    image: tensorlake/nlp-image:latest
+    command:
+      [
+        "indexify-cli",
+        "executor",
+        "--server-addr",
+        "indexify:8900"
+      ]
+    networks:
+      server:
+    volumes:
+      - data:/tmp/indexify-blob-storage
+    deploy:
+      mode: replicated
+      replicas: 1
+
+  embedding-executor:
+    image: tensorlake/embedding-image:latest
+    command:
+      [
+        "indexify-cli",
+        "executor",
+        "--server-addr",
+        "indexify:8900"
+      ]
+    networks:
+      server:
+    volumes:
+      - data:/tmp/indexify-blob-storage
+    deploy:
+      mode: replicated
+      replicas: 1
+
+  neo4j-executor:
+    image: tensorlake/neo4j-image:latest
+    environment:
+      - NEO4J_URI=bolt://neo4j-server:7687
+      - NEO4J_USER=neo4j
+      - NEO4J_PASSWORD=indexify
+    command:
+      [
+        "indexify-cli",
+        "executor",
+        "--server-addr",
+        "indexify:8900"
+      ]
+    networks:
+      server:
+    volumes:
+      - data:/tmp/indexify-blob-storage
+    deploy:
+      mode: replicated
+      replicas: 1
+
+  gemini-executor:
+    image: tensorlake/gemini-image:latest
+    environment:
+      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
+    command:
+      [
+        "indexify-cli",
+        "executor",
+        "--server-addr",
+        "indexify:8900"
+      ]
+    networks:
+      server:
+    volumes:
+      - data:/tmp/indexify-blob-storage
+    deploy:
+      mode: replicated
+      replicas: 1
+
+  base-executor:
+    image: tensorlake/base-image:latest
+    command:
+      [
+        "indexify-cli",
+        "executor",
+        "--server-addr",
+        "indexify:8900"
+      ]
+    networks:
+      server:
+    volumes:
+      - data:/tmp/indexify-blob-storage
+    deploy:
+      mode: replicated
+      replicas: 1
+
+  neo4j-server:
+    image: neo4j:4.4
+    environment:
+      - NEO4J_AUTH=neo4j/indexify
+    ports:
+      - "7474:7474"
+      - "7687:7687"
+    networks:
+      server:
+    volumes:
+      - neo4j_data:/data
+    deploy:
+      mode: replicated
+      replicas: 1
+
+volumes:
+  data:
+  neo4j_data:
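
Reviewer note: the compose file publishes the Indexify server on port 8900 and Neo4j Bolt on port 7687, and the README runs `python workflow.py --mode remote-deploy` right after `docker-compose up --build`. A small readiness check could avoid deploying before the services accept connections. The sketch below is an assumption-laden helper, not part of this commit: `wait_for_port`, `check_stack`, and the `localhost` host names are hypothetical, and a successful TCP connect only shows that the port is open, not that the service is fully initialized.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out; retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Host ports published by the compose file: Indexify server and Neo4j Bolt.
ENDPOINTS = {"indexify": ("localhost", 8900), "neo4j-bolt": ("localhost", 7687)}


def check_stack(timeout=60.0):
    """Map each endpoint name to whether it became reachable within timeout."""
    return {
        name: wait_for_port(host, port, timeout)
        for name, (host, port) in ENDPOINTS.items()
    }
```

Running `check_stack()` before the deploy step and stopping if any value is `False` gives a clearer failure than a connection error from inside the workflow.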