import os
import json
import random
import string
from google import genai
from google.genai.types import EmbedContentConfig
from langchain.text_splitter import RecursiveCharacterTextSplitter  # newer LangChain releases also expose this as langchain_text_splitters
from google.cloud import storage
from google.cloud import aiplatform
import functions_framework

# Cloud Storage Client
storage_client = storage.Client()

# Vertex AI Embeddings API (genai SDK)
os.environ["GOOGLE_CLOUD_PROJECT"] = os.getenv("PROJECT_ID")
os.environ["GOOGLE_CLOUD_LOCATION"] = os.getenv("GCP_REGION")
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"
genai_client = genai.Client()  # Embeddings API

# Vertex AI Vector Search (fka Matching Engine)
project_id = os.getenv("PROJECT_ID")
location = os.getenv("GCP_REGION")
index_id = os.getenv("VECTOR_SEARCH_INDEX_ID")
index_endpoint_name = os.getenv("VECTOR_SEARCH_INDEX_ENDPOINT_NAME")
aiplatform.init(project=project_id, location=location)  # init() configures the SDK and returns None, so there is nothing to assign
index = aiplatform.MatchingEngineIndex(index_id)
index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name=index_endpoint_name
)
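
# Environment variables used throughout this function:
#   PROJECT_ID, GCP_REGION             - Google Cloud project and region for Vertex AI
#   VECTOR_SEARCH_INDEX_ID             - Vector Search index that embeddings are upserted into
#   VECTOR_SEARCH_INDEX_ENDPOINT_NAME  - index endpoint used for queries
#   VECTOR_SEARCH_DEPLOYED_INDEX_ID    - deployed index ID passed to find_neighbors()
#   GCS_BUCKET, INPUT_DOC_FILENAME     - set at runtime from the triggering CloudEvent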


# ------- HELPER FUNCTIONS --------------------------
def randomStringDigits(stringLength=5):
    """Generate a random string of digits to use as a datapoint ID"""
    digits = string.digits
    return "".join(random.choice(digits) for _ in range(stringLength))

def gcs_download_document(bucket_name, blob_name):
    """
    Downloads a raw text document from a Cloud Storage bucket
    """
    print(f"🪣 Downloading doc from GCS: {bucket_name}/{blob_name}")
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    dl = blob.download_as_bytes()  # download_as_string() is deprecated in google-cloud-storage
    # clean up the text: collapse newlines into spaces before chunking
    return dl.decode("utf-8").strip().replace("\n", " ")


def chunk_text(text, chunk_size=500):
    """
    Chunks raw document text into roughly 500-character chunks, while preserving individual words.
    https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html#recursivecharactertextsplitter

    Why 500? Another Vertex AI product, Vertex AI Search, uses a default chunk size of 500 tokens.
    https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings#googlegenaisdk_embeddings_docretrieval_with_txt-python_genai_sdk
    """
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size)
    return splitter.split_text(text)
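
# Illustrative sketch of the behavior (not called in the deployed function): the splitter
# returns pieces of at most ~chunk_size characters, breaking on whitespace rather than
# mid-word, e.g.
#   chunk_text("word " * 300)  ->  ["word word word ...", "word word ...", ...]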


def get_embeddings(text_chunks):
    """
    Call Vertex AI Embeddings API (text-embedding-005 model) to generate vector representations of all document chunks
    """
    to_write = []
    for i, chunk in enumerate(text_chunks):
        print(f"⏲️ Generating embeddings for chunk {i}")
        response = genai_client.models.embed_content(
            model="text-embedding-005",
            contents=[chunk],
            config=EmbedContentConfig(
                task_type="RETRIEVAL_DOCUMENT",
                output_dimensionality=768,
            ),
        )
        emb = response.embeddings[0].values
        body = {
            "id": randomStringDigits(stringLength=5),
            "text": chunk,
            "embedding": emb,
        }
        to_write.append(body)
    return to_write


def write_embeddings_to_jsonl(embeddings, outfile):
    """
    Write the embeddings to a JSONL file.
    JSONL ("JSON Lines") is the format Vertex AI Vector Search expects for upsert.
    """
    print("📝 Writing embeddings to JSONL")
    with open(outfile, "w") as f:
        for embedding in embeddings:
            f.write(json.dumps(embedding) + "\n")
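
# Each JSONL line holds one chunk. The id and numbers below are placeholders; the real
# embedding list contains 768 floats:
#   {"id": "40291", "text": "<chunk text>", "embedding": [0.0182, -0.0041, ...]}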


def store_embeddings_vavs(infile):
    """
    Upsert (stream) embeddings to the Vertex AI Vector Search index.
    https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexes/upsertDatapoints#IndexDatapoint
    """
    with open(infile) as f:
        lines = f.readlines()
    datapoints = []
    for line in lines:
        item = json.loads(line)
        d = {
            "datapoint_id": str(item["id"]),
            # Format the restricts field as a list of dictionaries
            "restricts": [{"namespace": "text", "allow_list": [item["text"]]}],
            "feature_vector": item["embedding"],
        }
        datapoints.append(d)

    print(f"⬆️ Upserting {len(datapoints)} embeddings to Vertex AI Vector Search")
    index.upsert_datapoints(datapoints=datapoints)
    print("✅ Done upserting.")
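
# Design note: Vector Search restricts are normally used for filtered queries, but here
# the "text" namespace doubles as storage for each chunk's raw text, so the text can be
# read back from a MatchNeighbor's restricts at query time (with return_full_datapoint=True).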


def extract_id_and_text(neighbor):
    """
    Extract ID and text from a Vertex AI Vector Search "MatchNeighbor" object
    """
    id_value = neighbor.id
    text_value = None
    if hasattr(neighbor, "restricts") and neighbor.restricts:
        for restrict in neighbor.restricts:
            if hasattr(restrict, "name") and restrict.name == "text":
                if hasattr(restrict, "allow_tokens") and restrict.allow_tokens:
                    text_value = restrict.allow_tokens[0]
                    break

    return {"id": id_value, "text": text_value}


def test_nearest_neighbors_query(q):
    """
    Test a query against the deployed Vertex AI Vector Search index.
    """
    response = genai_client.models.embed_content(
        model="text-embedding-005",
        contents=[q],
        config=EmbedContentConfig(
            task_type="RETRIEVAL_QUERY",
            output_dimensionality=768,
        ),
    )
    query_embedding = response.embeddings[0].values
    print(f"Query is: {q}")
    neighbors = index_endpoint.find_neighbors(
        deployed_index_id=os.getenv("VECTOR_SEARCH_DEPLOYED_INDEX_ID"),
        queries=[query_embedding],
        num_neighbors=3,
        return_full_datapoint=True,  # must be True so restricts (which carry the chunk text) are returned
    )

    print(f"Got # neighbors: {len(neighbors[0])}")
    for n in neighbors[0]:
        result = extract_id_and_text(n)
        print(f"ID: {result['id']}")
        print(f"Text: {result['text']}")
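
# find_neighbors returns one list of MatchNeighbor objects per query vector, so with a
# single query the matches live in neighbors[0].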


def ingest_text_document(filename):
    """
    Main ingestion function:
    - Downloads raw text from Cloud Storage
    - Chunks text
    - Generates embeddings
    - Writes embeddings to JSONL
    - Upserts the JSONL embeddings to Vertex AI Vector Search
    """
    gcs_bucket = os.getenv("GCS_BUCKET")
    raw_text = gcs_download_document(gcs_bucket, filename)
    print(f"\n📄 Raw text is char length: {len(raw_text)}")
    text_chunks = chunk_text(raw_text)
    print(f"\n✂️ Created {len(text_chunks)} text chunks from document.")
    embeddings = get_embeddings(text_chunks)
    print("🧠 Created 1 embedding per chunk.")
    write_embeddings_to_jsonl(embeddings, "embeddings.json")
    store_embeddings_vavs("embeddings.json")
    # Smoke test: query the index using the filename itself as the query text
    test_nearest_neighbors_query(filename)


@functions_framework.cloud_event
def process_data(cloud_event):
    """
    Process the CloudEvent data (a GCS file-upload notification) and trigger Vertex AI
    Vector Search ingestion for that file.
    """
    data = cloud_event.data
    print(f"CloudEvent data: \n {data}")
    # Example payload (Pub/Sub-wrapped GCS OBJECT_FINALIZE notification):
    """
    {'message': {'attributes': {'bucketId': 'ingest-67ab', 'eventTime': '2025-02-27T15:44:39.422831Z', 'eventType': 'OBJECT_FINALIZE', 'notificationConfig': 'projects/_/buckets/ingest-67ab/notificationConfigs/1', 'objectGeneration': '1740671079418498', 'objectId': 'willow_processor.txt', 'payloadFormat': 'JSON_API_V1'}, 'data': '...', 'messageId': '14113274556428337', 'message_id': '14113274556428337', 'publishTime': '2025-02-27T15:44:39.603Z', 'publish_time': '2025-02-27T15:44:39.603Z'}, 'subscription': 'projects/next25rag/subscriptions/eventarc-us-central1-ingestion-67ab-481379-sub-361'}
    """
    os.environ["GCS_BUCKET"] = data["message"]["attributes"]["bucketId"]
    os.environ["INPUT_DOC_FILENAME"] = data["message"]["attributes"]["objectId"]
    ingest_text_document(os.getenv("INPUT_DOC_FILENAME"))
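

# Minimal local-test sketch (an assumption for local runs, not part of the deployed
# Cloud Function): it presumes the PROJECT_ID, GCP_REGION, VECTOR_SEARCH_*, GCS_BUCKET,
# and INPUT_DOC_FILENAME environment variables are exported and that the target object
# already exists in the bucket.
if __name__ == "__main__":
    ingest_text_document(os.getenv("INPUT_DOC_FILENAME"))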