Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -25,3 +25,6 @@ wheels/

# Built documentations
site/

# Gradio "share=True" utils.
/.gradio
78 changes: 47 additions & 31 deletions README.md
@@ -8,16 +8,6 @@ This is a collection of reference implementations for Vector Institute's **Agent

This repository includes several modules, each showcasing a different aspect of agent-based RAG systems:

**1. Basics: Reason-and-Act RAG**
A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented without any agent framework.

- **[1.0 Search Demo](src/1_basics/0_search_demo/README.md)**
A simple demo showing the capabilities (and limitations) of a knowledgebase search.


- **[1.1 ReAct Agent for RAG](src/1_basics/1_react_rag/README.md)**
Basic ReAct agent for step-by-step retrieval and answer generation.

**2. Frameworks: OpenAI Agents SDK**
Showcases the use of the OpenAI Agents SDK to reduce boilerplate and improve readability.

@@ -26,7 +16,7 @@ A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented with
This module also introduces Langfuse, which makes the agent less of a black box.

- **[2.2 Multi-agent Setup for Deep Research](src/2_frameworks/2_multi_agent/README.md)**
Demo of a multi-agent architecture with planner, researcher, and writer agents collaborating on complex queries.
Demo of a multi-agent architecture to improve efficiency on long-context inputs, reduce latency, and reduce LLM costs. Two versions are available: "efficient" and "verbose". For the build days, start from the "efficient" version, as it provides greater flexibility and is easier to follow.

**3. Evals: Automated Evaluation Pipelines**
Contains scripts and utilities for evaluating agent performance using LLM-as-a-judge and synthetic data generation. Includes tools for uploading datasets, running evaluations, and integrating with [Langfuse](https://langfuse.com/) for traceability.
@@ -37,6 +27,17 @@ A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented with
- **[3.2 Evaluation on Synthetic Dataset](src/3_evals/2_synthetic_data/README.md)**
Showcases the generation of synthetic evaluation data for testing agents.

We also provide "basic" no-framework implementations. These are meant to show how agents work behind the scenes, so their implementations are far more verbose than necessary. You should not use them as the basis for real projects.

**1. Basics: Reason-and-Act RAG**
A minimal Reason-and-Act (ReAct) agent for knowledge retrieval, implemented without any agent framework.

- **[1.0 Search Demo](src/1_basics/0_search_demo/README.md)**
A simple demo showing the capabilities (and limitations) of a knowledge base search.

- **[1.1 ReAct Agent for RAG](src/1_basics/1_react_rag/README.md)**
Basic ReAct agent for step-by-step retrieval and answer generation.


## Getting Started

@@ -49,29 +50,45 @@ cp -v .env.example .env
Run integration tests to validate that your API keys are set up correctly.

```bash
PYTHONPATH="." uv run pytest -sv tests/tool_tests/test_integration.py
PYTHONPATH="." uv run --env-file .env pytest -sv tests/tool_tests/test_integration.py
```

## Reference Implementations

### 1. Basics
For "Gradio App" reference implementations, running the script prints a "public URL" ending in `gradio.live` (it might take a few seconds to appear). To access the Gradio app with full streaming capabilities, copy and paste this `gradio.live` URL into a new browser tab.

Interactive knowledge base demo. Access the gradio interface in your browser (see forwarded ports.)
For all reference implementations, to exit, press "Ctrl/Control-C" and wait up to ten seconds. If you are a Mac user, you should use "Control-C" and not "Command-C". Please note that by default, the gradio web app reloads automatically as you edit the Python script. There is no need to manually stop and restart the program each time you make some code changes.

```bash
uv run --env-file .env -m src.1_basics.0_search_demo.gradio
You might see warning messages like the following:

```json
ERROR:openai.agents:[non-fatal] Tracing client error 401: {
"error": {
"message": "Incorrect API key provided. You can find your API key at https://platform.openai.com/account/api-keys.",
"type": "invalid_request_error",
"param": null,
"code": "invalid_api_key"
}
}
```

Basic Reason-and-Act Agent- command line version. To exit, press `Control-\`.
These warnings can be safely ignored, as they are the result of a bug in the upstream libraries. Your agent traces will be uploaded to LangFuse as configured.

### 1. Basics

Interactive knowledge base demo. Access the gradio interface in your browser to see if your knowledge base meets your expectations.

```bash
uv run --env-file .env -m src.1_basics.1_react_rag.main
PYTHONPATH="." uv run --env-file .env gradio src/1_basics/0_search_demo/app.py
```

Interactive web version of the Gradio Reason-and-Act Agent.
Basic Reason-and-Act Agent- for demo purposes only.

As noted above, these are unnecessarily verbose for real applications.

```bash
uv run --env-file .env -m src.1_basics.1_react_rag.gradio
# PYTHONPATH="." uv run --env-file .env src/1_basics/1_react_rag/cli.py
# PYTHONPATH="." uv run --env-file .env gradio src/1_basics/1_react_rag/app.py
```


@@ -80,30 +97,28 @@ uv run --env-file .env -m src.1_basics.1_react_rag.gradio
Reason-and-Act Agent without the boilerplate- using the OpenAI Agent SDK.

```bash
uv run --env-file .env -m src.2_frameworks.1_react_rag.basic
uv run --env-file .env -m src.2_frameworks.1_react_rag.gradio
uv run --env-file .env -m src.2_frameworks.1_react_rag.langfuse_gradio
PYTHONPATH="." uv run --env-file .env src/2_frameworks/1_react_rag/cli.py
PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/1_react_rag/langfuse_gradio.py
```

Multi-agent examples, also via the OpenAI Agent SDK.

```bash
uv run --env-file .env \
-m src.2_frameworks.2_multi_agent.planner_worker_gradio
PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/2_multi_agent/efficient.py
# Verbose option- greater control over the agent flow, but less flexible.
# PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/2_multi_agent/verbose.py
```

Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability.
Python Code Interpreter demo- using the OpenAI Agent SDK, E2B for secure code sandbox, and LangFuse for observability. Refer to [src/2_frameworks/3_code_interpreter/README.md](src/2_frameworks/3_code_interpreter/README.md) for details.

```bash
uv run --env-file .env -m src.2_frameworks.code_interpreter_gradio
```

### 3. Evals

Synthetic data.

```bash
uv run -m src.3_evals.2_synthetic_data.synthesize_data \
uv run --env-file .env \
-m src.3_evals.2_synthetic_data.synthesize_data \
--source_dataset hf://vector-institute/hotpotqa@d997ecf:train \
--langfuse_dataset_name search-dataset-synthetic-20250609 \
--limit 18
@@ -132,7 +147,7 @@ Visualize embedding diversity of synthetic data
```bash
uv run \
--env-file .env \
-m src.3_evals.2_synthetic_data.gradio_visualize_diversity
gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
```

Run LLM-as-a-judge Evaluation on synthetic data
@@ -149,6 +164,7 @@ uv run \
## Requirements

- Python 3.12+
- API keys as configured in `.env`.

### Tidbit

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -12,7 +12,7 @@ dependencies = [
"beautifulsoup4>=4.13.4",
"datasets>=3.6.0",
"e2b-code-interpreter>=1.5.2",
"gradio>=5.35.0",
"gradio>=5.37.0",
"langfuse>=3.1.3",
"lxml>=6.0.0",
"nest-asyncio>=1.6.0",
Empty file added sandbox_content/.gitkeep
5 changes: 1 addition & 4 deletions src/1_basics/0_search_demo/README.md
@@ -4,7 +4,4 @@ This folder contains logic for showcasing the capabilities (and limitations) of

Format of the output is similar to what the Agent LLM will receive as tool output.

```bash
source .env && \
uv run -m src.1_basics.0_search_demo.gradio
```
Refer to the README.md file at the project root for instructions.
@@ -71,4 +71,4 @@ async def search_and_pretty_format(keyword: str) -> str:
],
)

demo.launch(server_name="0.0.0.0")
demo.launch(share=True)
6 changes: 1 addition & 5 deletions src/1_basics/1_react_rag/README.md
@@ -4,8 +4,4 @@ This folder contains an example of a basic Reason-and-Act (ReAct) agent for know

## Run

```bash
uv run -m src.1_basics.1_react_rag.manual_tools
uv run -m src.1_basics.1_react_rag.main
uv run -m src.1_basics.1_react_rag.gradio
```
Refer to the README.md file at the project root for instructions.
@@ -159,6 +159,6 @@ async def react_rag(query: str, history: list[ChatMessage]):
signal.signal(signal.SIGINT, _handle_sigint)

try:
demo.launch(server_name="0.0.0.0")
demo.launch(share=True)
finally:
asyncio.run(_cleanup_clients())
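The Gradio apps in this PR register a SIGINT handler and wrap `demo.launch(share=True)` in `try`/`finally`, so `_cleanup_clients()` runs however the app exits. The bodies of `_handle_sigint` and `_cleanup_clients` are not shown in the diff, so this sketch assumes Python's default Ctrl-C behavior (raising `KeyboardInterrupt`) and uses hypothetical `FakeClient` objects and a `launch` stub in place of the real clients and `demo.launch`. Unlike the original, it catches `KeyboardInterrupt` so the example exits quietly.

```python
import asyncio


class FakeClient:
    """Hypothetical async client standing in for the app's API clients."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        await asyncio.sleep(0)  # Pretend to flush and close a connection.
        self.closed = True


clients = [FakeClient(), FakeClient()]


async def _cleanup_clients() -> None:
    # Close every client concurrently.
    await asyncio.gather(*(client.close() for client in clients))


def launch() -> None:
    """Stand-in for demo.launch(): blocks until the user presses Ctrl-C."""
    raise KeyboardInterrupt


try:
    launch()
except KeyboardInterrupt:
    print("Shutting down...")
finally:
    # Runs on normal exit and on Ctrl-C alike; no event loop is running here,
    # so asyncio.run() can start a fresh one for the async cleanup.
    asyncio.run(_cleanup_clients())

print(all(client.closed for client in clients))  # True
```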
File renamed without changes.
6 changes: 1 addition & 5 deletions src/2_frameworks/1_react_rag/README.md
@@ -4,8 +4,4 @@ This folder reproduces the functionalities of section 1.1, but involve far less

## Run

```bash
uv run -m src.2_frameworks.1_react_rag.basic
uv run -m src.2_frameworks.1_react_rag.gradio
uv run -m src.2_frameworks.1_react_rag.langfuse_gradio
```
Refer to the README.md file at the project root for instructions.
@@ -109,6 +109,6 @@ async def _main(question: str, gr_messages: list[ChatMessage]):
signal.signal(signal.SIGINT, _handle_sigint)

try:
demo.launch(server_name="0.0.0.0")
demo.launch(share=True)
finally:
asyncio.run(_cleanup_clients())
2 changes: 1 addition & 1 deletion src/2_frameworks/1_react_rag/langfuse_gradio.py
@@ -108,6 +108,6 @@ async def _main(question: str, gr_messages: list[ChatMessage]):
signal.signal(signal.SIGINT, _handle_sigint)

try:
demo.launch(server_name="0.0.0.0")
demo.launch(share=True)
finally:
asyncio.run(_cleanup_clients())
8 changes: 4 additions & 4 deletions src/2_frameworks/2_multi_agent/README.md
@@ -5,8 +5,8 @@ The planner agent takes a user query and breaks it down into search queries for
each performs a search for each query by calling a search tool. The writer agent then synthesizes the search results into
a summary that is presented to the user.

## Run
## "Efficient" or "Verbose"?

```bash
uv run -m src.2_frameworks.2_multi_agent.gradio
```
"Efficient" variant- recommended starting point.

"Verbose" variant- only if you need fine-grained control over the behavior of the agent. Beware that this implementation is more complex and reduces the agency and flexibility of your agent system.
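The multi-agent README describes a planner that breaks a user query into search queries, researchers that each call a search tool, and a writer that synthesizes the results. The sketch below shows only that data flow with plain `asyncio`; every LLM and search call is replaced by a hypothetical stub (`plan`, `research`, `write`, `answer`), so it is neither the OpenAI Agents SDK code in `efficient.py` nor the one in `verbose.py`.

```python
import asyncio


async def plan(query: str) -> list[str]:
    """Planner stub: break the user query into search queries."""
    return [part.strip() for part in query.split(" and ")]


async def research(search_query: str) -> str:
    """Researcher stub: pretend to call a search tool for one query."""
    await asyncio.sleep(0)  # Placeholder for a real, I/O-bound search call.
    return f"results for {search_query!r}"


async def write(query: str, findings: list[str]) -> str:
    """Writer stub: synthesize the findings into a summary."""
    return f"Summary of {query!r}: " + "; ".join(findings)


async def answer(query: str) -> str:
    search_queries = await plan(query)
    # One researcher per search query, run concurrently; gather keeps order.
    findings = await asyncio.gather(*(research(q) for q in search_queries))
    return await write(query, list(findings))


print(asyncio.run(answer("history of Toronto and population of Ottawa")))
```

Fanning the researchers out with `asyncio.gather` is what lets independent searches overlap in a sketch like this; the real agents add an LLM call at each stage.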
@@ -140,6 +140,6 @@ async def _main(question: str, gr_messages: list[ChatMessage]):
signal.signal(signal.SIGINT, _handle_sigint)

try:
demo.launch(server_name="0.0.0.0")
demo.launch(share=True)
finally:
asyncio.run(_cleanup_clients())
@@ -1,5 +1,9 @@
"""Multi-agent Planner-Researcher Setup via OpenAI Agents SDK.

Note: this implementation does not unlock the full potential and flexibility
of LLM agents. Use this reference implementation only if your use case requires
the additional structure and you are okay with the additional complexity.

Log traces to LangFuse for observability and evaluation.
"""

Expand Down
12 changes: 12 additions & 0 deletions src/2_frameworks/3_code_interpreter/README.md
@@ -0,0 +1,12 @@
# Code Interpreter Demo

Prerequisites:

- E2B account (free plan is enough.)
- E2B API Key: `E2B_API_KEY=e2b_...` in your `.env` file.

Run:

```bash
PYTHONPATH="." uv run --env-file .env gradio src/2_frameworks/3_code_interpreter/app.py
```
@@ -46,7 +46,10 @@
AGENT_LLM_NAME = "gemini-2.5-flash"
async_openai_client = AsyncOpenAI()
code_interpreter = CodeInterpreter(
local_files=[Path("tests/tool_tests/example_files/example_a.csv")]
local_files=[
Path("sandbox_content/"),
Path("tests/tool_tests/example_files/example_a.csv"),
]
)


@@ -95,4 +98,4 @@ async def _main(question: str, gr_messages: list[ChatMessage]):


if __name__ == "__main__":
demo.launch(server_name="0.0.0.0")
demo.launch(share=True)
1 change: 0 additions & 1 deletion src/2_frameworks/basic.py

This file was deleted.

2 changes: 1 addition & 1 deletion src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
@@ -4,7 +4,7 @@

uv run \
--env-file .env \
-m src.3_evals.2_synthetic_data.gradio_visualize_diversity
gradio src/3_evals/2_synthetic_data/gradio_visualize_diversity.py
"""

from typing import List
2 changes: 1 addition & 1 deletion src/utils/tools/README.md
@@ -4,5 +4,5 @@ This module contains various tools for LLM agents.

```bash
# Tool for getting a list of recent news headlines from enwiki
uv run -m src.utils.tools.news_events
PYTHONPATH="." uv run --env-file .env python3 src/utils/tools/news_events.py
```
33 changes: 31 additions & 2 deletions src/utils/tools/code_interpreter.py
@@ -1,5 +1,6 @@
"""Code interpreter tool."""

import os
from pathlib import Path
from typing import Sequence

@@ -77,6 +78,29 @@ async def _upload_files(
return list(remote_paths)


def _enumerate_files(base_path: str | Path) -> list[Path]:
"""
Recursively enumerate all files under a directory.

Parameters
----------
base_path: Path to the starting directory.
If input is a file, that file alone will be returned.

Returns
-------
list[Path]: List of file paths.
"""
if os.path.isfile(base_path):
return [Path(base_path)]

file_list = []
for root, _, files in os.walk(base_path):
for name in files:
file_list.append(Path(root) / name)
return file_list


class CodeInterpreter:
"""Code Interpreter tool for the agent."""

@@ -94,12 +118,17 @@ def __init__(
----------
local_files : list[pathlib.Path | str] | None
Optionally, specify a list of local files (as paths)
to upload to sandbox working directory.
to upload to sandbox working directory. Folders will be flattened.
timeout_seconds : int
Limit executions to this duration.
"""
self.timeout_seconds = timeout_seconds
self.local_files = local_files if local_files else []
self.local_files = []

# Recursively find files if the given path is a folder.
if local_files:
for _path in local_files:
self.local_files.extend(_enumerate_files(_path))

async def run_code(self, code: str) -> str:
"""Run the given Python code in a sandbox environment.