VectorInstitute · jacobthebanana · Aug 11, 2025 · Aug 11, 2025
diff --git a/.gitignore b/.gitignore
@@ -25,3 +25,6 @@ wheels/
 
 # Built documentations
 site/
+
+# Gradio "share=True" utils.
+/.gradio
diff --git a/README.md b/README.md
@@ -8,16 +8,6 @@ This is a collection of reference implementations for Vector Institute's **Agent
 
 This repository includes several modules, each showcasing a different aspect of agent-based RAG systems:
 
-**1. Basics: Reason-and-Act RAG**
-A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented without any agent framework.
-
-- **[1.0 Search Demo](src/1_basics/0_search_demo/README.md)**
-  A simple demo showing the capabilities (and limitations) of a knowledgebase search.
-
-
-- **[1.1 ReAct Agent for RAG](src/1_basics/1_react_rag/README.md)**
-  Basic ReAct agent for step-by-step retrieval and answer generation.
-
 **2. Frameworks: OpenAI Agents SDK**
   Showcases the use of the OpenAI agents SDK to reduce boilerplate and improve readability.
 
@@ -26,7 +16,7 @@ A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented with
   The use of langfuse for making the agent less of a black-box is also introduced in this module.
 
 - **[2.2 Multi-agent Setup for Deep Research](src/2_frameworks/2_multi_agent/README.md)**
-  Demo of a multi-agent architecture with planner, researcher, and writer agents collaborating on complex queries.
+  Demo of a multi-agent architecture that improves efficiency on long-context inputs, reduces latency, and reduces LLM costs. Two versions are available: "efficient" and "verbose". For the build days, start from the "efficient" version, which provides greater flexibility and is easier to follow.
 
 **3. Evals: Automated Evaluation Pipelines**
   Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability.
@@ -37,6 +27,17 @@ A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented with
 - **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)**
   Showcases the generation of synthetic evaluation data for testing agents.
 
+We also provide "basic" no-framework implementations. These are meant to show how agents work behind the scenes and are deliberately verbose. Do not use them as the basis for real projects.
+
+**1. Basics: Reason-and-Act RAG**
+A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented without any agent framework.
+
+- **[1.0 Search Demo](src/1_basics/0_search_demo/README.md)**
+  A simple demo showing the capabilities (and limitations) of a knowledgebase search.
+
+- **[1.1 ReAct Agent for RAG](src/1_basics/1_react_rag/README.md)**
+  Basic ReAct agent for step-by-step retrieval and answer generation.
+
 
 ## Getting Started
 
@@ -49,29 +50,45 @@ cp -v .env.example .env
 Run integration tests to validate that your API keys are set up correctly.
 
 ```bash
-PYTHONPATH="." uv run pytest -sv tests/tool_tests/test_integration.py
+PYTHONPATH="." uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py
 ```
 
 ## Reference Implementations
 
-### 1. Basics
+For "Gradio App" reference implementations, running the script prints a "public URL" ending in `gradio.live` (it might take a few seconds to appear). To access the Gradio app with full streaming capabilities, paste this `gradio.live` URL into a new browser tab.
 
-Interactive knowledge base demo. Access the gradio interface in your browser (see forwarded ports.)
+For all reference implementations, press "Ctrl/Control-C" to exit and wait up to ten seconds. Mac users should use "Control-C", not "Command-C". By default, the Gradio web app reloads automatically as you edit the Python script, so there is no need to stop and restart the program after each code change.
 
-```bash
-uv run --env-file .env -m src.1_basics.0_search_demo.gradio
+You might see warning messages like the following:
+
+```json
+ERROR:openai.agents:[non-fatal] Tracing client error 401: {
+  "error": {
+    "message": "Incorrect API key provided. You can find your API key at https://platform.openai.com/account/api-keys.",
+    "type": "invalid_request_error",
+    "param": null,
+    "code": "invalid_api_key"
+  }
+}
 ```
 
-Basic Reason-and-Act Agent- command line version. To exit, press `Control-\`.
+These messages can be safely ignored; they result from a bug in the upstream libraries. Your agent traces will still be uploaded to LangFuse as configured.
+
+### 1. Basics
+
+Interactive knowledge base demo. Access the gradio interface in your browser to see if your knowledge base meets your expectations.
 
 ```bash
-uv run --env-file .env -m src.1_basics.1_react_rag.main
+PYTHONPATH="." uv run --env-file .env gradio src/1_basics/0_search_demo/app.py
 ```
 
-Interactive web version of the Gradio Reason-and-Act Agent.
+Basic Reason-and-Act Agent- for demo purposes only.
+
+As noted above, these are unnecessarily verbose for real applications.
 
 ```bash
-uv run --env-file .env -m src.1_basics.1_react_rag.gradio
+# PYTHONPATH="." uv run --env-file .env src/1_basics/1_react_rag/cli.py
+# PYTHONPATH="." uv run --env-file .env gradio src/1_basics/1_react_rag/app.py
 ```
 
 
@@ -80,30 +97,28 @@ uv run --env-file .env -m src.1_basics.1_react_rag.gradio
 Reason-and-Act Agent without the boilerplate- using the OpenAI Agent SDK.
 
 ```bash
-uv run --env-file .env -m src.2_frameworks.1_react_rag.basic
-uv run --env-file .env -m src.2_frameworks.1_react_rag.gradio
-uv run --env-file .env -m src.2_frameworks.1_react_rag.langfuse_gradio
+PYTHONPATH="." uv run --env-file .env src/2_frameworks/1_react_rag/cli.py
+PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/1_react_rag/langfuse_gradio.py
 ```
 
 Multi-agent examples, also via the OpenAI Agent SDK.
 
 ```bash
-uv run --env-file .env \
--m src.2_frameworks.2_multi_agent.planner_worker_gradio
+PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/2_multi_agent/efficient.py
+# Verbose option- greater control over the agent flow, but less flexible.
+# PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/2_multi_agent/verbose.py
 ```
 
-Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability.
+Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability. Refer to [src/2_frameworks/3_code_interpreter/README.md](src/2_frameworks/3_code_interpreter/README.md) for details.
 
-```bash
-uv run --env-file .env -m src.2_frameworks.code_interpreter_gradio
-```
 
 ### 3. Evals
 
 Synthetic data.
 
 ```bash
-uv run -m src.3_evals.2_synthetic_data.synthesize_data \
+uv run --env-file .env \
+-m src.3_evals.2_synthetic_data.synthesize_data \
 --source_dataset hf://vector-institute/hotpotqa@d997ecf:train \
 --langfuse_dataset_name search-dataset-synthetic-20250609 \
 --limit 18
@@ -132,7 +147,7 @@ Visualize embedding diversity of synthetic data
 ```bash
 uv run \
 --env-file .env \
--m src.3_evals.2_synthetic_data.gradio_visualize_diversity
+gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
 ```
 
 Run LLM-as-a-judge Evaluation on synthetic data
@@ -149,6 +164,7 @@ uv run \
 ## Requirements
 
 - Python 3.12+
+- API keys as configured in `.env`.
 
 ### Tidbit
 

diff --git a/pyproject.toml b/pyproject.toml
@@ -12,7 +12,7 @@ dependencies = [
     "beautifulsoup4>=4.13.4",
     "datasets>=3.6.0",
     "e2b-code-interpreter>=1.5.2",
-    "gradio>=5.35.0",
+    "gradio>=5.37.0",
     "langfuse>=3.1.3",
     "lxml>=6.0.0",
     "nest-asyncio>=1.6.0",

diff --git a/sandbox_content/.gitkeep b/sandbox_content/.gitkeep
diff --git a/src/1_basics/0_search_demo/README.md b/src/1_basics/0_search_demo/README.md
@@ -4,7 +4,4 @@ This folder contains logic for showcasing the capabilities (and limitations) of
 
 Format of the output is similar to what the Agent LLM will receive as tool output.
 
-```bash
-source .env && \
-uv run -m src.1_basics.0_search_demo.gradio
-```
+Refer to the README.md file under project root for instructions.
diff --git a/src/1_basics/0_search_demo/gradio.py → src/1_basics/0_search_demo/app.py b/src/1_basics/0_search_demo/gradio.py → src/1_basics/0_search_demo/app.py
@@ -71,4 +71,4 @@ async def search_and_pretty_format(keyword: str) -> str:
     ],
 )
 
-demo.launch(server_name="0.0.0.0")
+demo.launch(share=True)
diff --git a/src/1_basics/1_react_rag/README.md b/src/1_basics/1_react_rag/README.md
@@ -4,8 +4,4 @@ This folder contains an example of a basic Reason-and-Act (ReAct) agent for know
 
 ## Run
 
-```bash
-uv run -m src.1_basics.1_react_rag.manual_tools
-uv run -m src.1_basics.1_react_rag.main
-uv run -m src.1_basics.1_react_rag.gradio
-```
+Refer to the README.md file under project root for instructions.
diff --git a/src/1_basics/1_react_rag/gradio.py → src/1_basics/1_react_rag/app.py b/src/1_basics/1_react_rag/gradio.py → src/1_basics/1_react_rag/app.py
@@ -159,6 +159,6 @@ async def react_rag(query: str, history: list[ChatMessage]):
     signal.signal(signal.SIGINT, _handle_sigint)
 
     try:
-        demo.launch(server_name="0.0.0.0")
+        demo.launch(share=True)
     finally:
         asyncio.run(_cleanup_clients())
diff --git a/src/1_basics/1_react_rag/main.py → src/1_basics/1_react_rag/cli.py b/src/1_basics/1_react_rag/main.py → src/1_basics/1_react_rag/cli.py
diff --git a/src/2_frameworks/1_react_rag/README.md b/src/2_frameworks/1_react_rag/README.md
@@ -4,8 +4,4 @@ This folder reproduces the functionalities of section 1.1, but involve far less
 
 ## Run
 
-```bash
-uv run -m src.2_frameworks.1_react_rag.basic
-uv run -m src.2_frameworks.1_react_rag.gradio
-uv run -m src.2_frameworks.1_react_rag.langfuse_gradio
-```
+Refer to the README.md file under project root for instructions.
diff --git a/src/2_frameworks/1_react_rag/gradio.py → src/2_frameworks/1_react_rag/app.py b/src/2_frameworks/1_react_rag/gradio.py → src/2_frameworks/1_react_rag/app.py
@@ -109,6 +109,6 @@ async def _main(question: str, gr_messages: list[ChatMessage]):
     signal.signal(signal.SIGINT, _handle_sigint)
 
     try:
-        demo.launch(server_name="0.0.0.0")
+        demo.launch(share=True)
     finally:
         asyncio.run(_cleanup_clients())
diff --git a/src/2_frameworks/1_react_rag/basic.py → src/2_frameworks/1_react_rag/cli.py b/src/2_frameworks/1_react_rag/basic.py → src/2_frameworks/1_react_rag/cli.py
diff --git a/src/2_frameworks/1_react_rag/langfuse_gradio.py b/src/2_frameworks/1_react_rag/langfuse_gradio.py
@@ -108,6 +108,6 @@ async def _main(question: str, gr_messages: list[ChatMessage]):
     signal.signal(signal.SIGINT, _handle_sigint)
 
     try:
-        demo.launch(server_name="0.0.0.0")
+        demo.launch(share=True)
     finally:
         asyncio.run(_cleanup_clients())
diff --git a/src/2_frameworks/2_multi_agent/README.md b/src/2_frameworks/2_multi_agent/README.md
@@ -5,8 +5,8 @@ The planner agents take a user query and breaks it down into search queries for
 each performs a search for each query by calling a search tool. The writer agent then synthesizes the search results into
 a summary that is presented to the user.
 
-## Run
+## "Efficient" or "Verbose"?
 
-```bash
-uv run -m src.2_frameworks.2_multi_agent.gradio
-```
+"Efficient" variant- recommended starting point.
+
+"Verbose" variant- only if you need fine-grained control over the behavior of the agent. Beware that this implementation is more complex and reduces the agency and flexibility of your agent system.
diff --git a/...ks/2_multi_agent/planner_worker_gradio.py → src/2_frameworks/2_multi_agent/efficient.py b/...ks/2_multi_agent/planner_worker_gradio.py → src/2_frameworks/2_multi_agent/efficient.py
@@ -140,6 +140,6 @@ async def _main(question: str, gr_messages: list[ChatMessage]):
     signal.signal(signal.SIGINT, _handle_sigint)
 
     try:
-        demo.launch(server_name="0.0.0.0")
+        demo.launch(share=True)
     finally:
         asyncio.run(_cleanup_clients())
diff --git a/src/2_frameworks/2_multi_agent/gradio.py → src/2_frameworks/2_multi_agent/verbose.py b/src/2_frameworks/2_multi_agent/gradio.py → src/2_frameworks/2_multi_agent/verbose.py
@@ -1,5 +1,9 @@
 """Multi-agent Planner-Researcher Setup via OpenAI Agents SDK.
 
+Note: this implementation does not unlock the full potential and flexibility
+of LLM agents. Use this reference implementation only if your use case requires
+the additional structures, and you are okay with the additional complexities.
+
 Log traces to LangFuse for observability and evaluation.
 """
 

diff --git a/src/2_frameworks/3_code_interpreter/README.md b/src/2_frameworks/3_code_interpreter/README.md
@@ -0,0 +1,12 @@
+# Code Interpreter Demo
+
+Prerequisites:
+
+- E2B account (the free plan is enough).
+- E2B API Key: `E2B_API_KEY=e2b_...` in your `.env` file.
+
+Run:
+
+```bash
+PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/3_code_interpreter/app.py
+```
diff --git a/src/2_frameworks/code_interpreter_gradio.py → src/2_frameworks/3_code_interpreter/app.py b/src/2_frameworks/code_interpreter_gradio.py → src/2_frameworks/3_code_interpreter/app.py
@@ -46,7 +46,10 @@
 AGENT_LLM_NAME = "gemini-2.5-flash"
 async_openai_client = AsyncOpenAI()
 code_interpreter = CodeInterpreter(
-    local_files=[Path("tests/tool_tests/example_files/example_a.csv")]
+    local_files=[
+        Path("sandbox_content/"),
+        Path("tests/tool_tests/example_files/example_a.csv"),
+    ]
 )
 
 
@@ -95,4 +98,4 @@ async def _main(question: str, gr_messages: list[ChatMessage]):
 
 
 if __name__ == "__main__":
-    demo.launch(server_name="0.0.0.0")
+    demo.launch(share=True)
diff --git a/src/2_frameworks/basic.py b/src/2_frameworks/basic.py
diff --git a/src/3_evals/2_synthetic_data/gradio_visualize_diversity.py b/src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
@@ -4,7 +4,7 @@
 
 uv run \
 --env-file .env \
--m src.3_evals.2_synthetic_data.gradio_visualize_diversity
+gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
 """
 
 from typing import List

diff --git a/src/utils/tools/README.md b/src/utils/tools/README.md
@@ -4,5 +4,5 @@ This module contains various tools for LLM agents.
 
 ```bash
 # Tool for getting a list of recent news headlines from enwiki
-uv run -m src.utils.tools.news_events
+PYTHONPATH="." uv run --env-file .env python3 src/utils/tools/news_events.py
 ```
diff --git a/src/utils/tools/code_interpreter.py b/src/utils/tools/code_interpreter.py
@@ -1,5 +1,6 @@
 """Code interpreter tool."""
 
+import os
 from pathlib import Path
 from typing import Sequence
 
@@ -77,6 +78,29 @@ async def _upload_files(
     return list(remote_paths)
 
 
+def _enumerate_files(base_path: str | Path) -> list[Path]:
+    """
+    Recursively enumerate all files under a directory.
+
+    Args
+    ----
+        base_path: Path to the starting directory.
+            If input is a file, that file alone will be returned.
+
+    Returns
+    -------
+        list[Path]: List of file paths.
+    """
+    if os.path.isfile(base_path):
+        return [Path(base_path)]
+
+    file_list = []
+    for root, _, files in os.walk(base_path):
+        for name in files:
+            file_list.append(Path(root) / name)
+    return file_list
+
+
 class CodeInterpreter:
     """Code Interpreter tool for the agent."""
 
@@ -94,12 +118,17 @@ def __init__(
         ----------
             local_files : list[pathlib.Path | str] | None
                 Optionally, specify a list of local files (as paths)
-                to upload to sandbox working directory.
+                to upload to sandbox working directory. Folders will be flattened.
             timeout_seconds : int
                 Limit executions to this duration.
         """
         self.timeout_seconds = timeout_seconds
-        self.local_files = local_files if local_files else []
+        self.local_files = []
+
+        # Recursively find files if the given path is a folder.
+        if local_files:
+            for _path in local_files:
+                self.local_files.extend(_enumerate_files(_path))
 
     async def run_code(self, code: str) -> str:
         """Run the given Python code in a sandbox environment.