Merged
8 changes: 4 additions & 4 deletions .claude-plugin/marketplace.json

````diff
@@ -1,7 +1,7 @@
 {
 "$schema": "https://anthropic.com/claude-code/marketplace.schema.json",
 "name": "braintrust-claude-plugin",
-"version": "1.2.0",
+"version": "1.3.0",
 "description": "Braintrust plugins for LLM evaluation, logging, and observability",
 "owner": {
 "name": "Braintrust",
@@ -12,14 +12,14 @@
 "name": "braintrust",
 "description": "Enables AI agents to use Braintrust for LLM evaluation, logging, and observability. Provides correct API usage, working examples, and helper scripts.",
 "version": "1.1.0",
-"source": "./",
+"source": "./plugins/braintrust",
 "category": "development"
 },
 {
 "name": "trace-claude-code",
 "description": "Automatically trace Claude Code conversations to Braintrust. Captures user messages, assistant responses, and tool calls for observability.",
-"version": "1.0.0",
-"source": "./skills/trace-claude-code",
+"version": "1.1.0",
+"source": "./plugins/trace-claude-code",
 "category": "observability"
 }
 ]
````
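This change moves both plugins under `plugins/`, so every `source` path in the catalog has to match the new layout. The following is a minimal sketch of a hypothetical check (not part of this PR) that reads `.claude-plugin/marketplace.json` and reports any string `source` that does not resolve to a directory containing `.claude-plugin/plugin.json`, the per-plugin manifest described in the AGENTS.md terminology. It assumes sources are paths relative to the repository root and skips any non-string source.

```python
import json
from pathlib import Path


def check_marketplace_sources(repo_root: Path) -> list[str]:
    """Report catalog entries whose local `source` path does not hold a plugin.

    Returns human-readable problems; an empty list means every string
    `source` points at a directory with .claude-plugin/plugin.json.
    """
    catalog_path = repo_root / ".claude-plugin" / "marketplace.json"
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    problems: list[str] = []
    for entry in catalog.get("plugins", []):
        name = entry.get("name", "<unnamed>")
        source = entry.get("source")
        if not isinstance(source, str):
            # Assumption: only relative-path string sources are checked; anything else is skipped.
            continue
        plugin_dir = repo_root / source  # e.g. ./plugins/braintrust, relative to the repo root
        if not plugin_dir.is_dir():
            problems.append(f"{name}: source {source!r} is not a directory")
        elif not (plugin_dir / ".claude-plugin" / "plugin.json").is_file():
            problems.append(f"{name}: {source!r} has no .claude-plugin/plugin.json")
    return problems
```

Run from the repository root as `check_marketplace_sources(Path("."))` and treat a non-empty result as a failure.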
29 changes: 29 additions & 0 deletions AGENTS.md

````diff
@@ -1,5 +1,34 @@
 # Agent guidelines
 
+## About this repository
+
+This is the **Braintrust Claude Code plugin marketplace** - a repository that distributes Claude Code plugins for Braintrust integration.
+
+### Structure
+
+```
+claude-plugin/
+├── .claude-plugin/
+│ └── marketplace.json # Marketplace catalog (lists available plugins)
+├── plugins/
+│ ├── braintrust/ # Plugin: Braintrust evaluation & logging
+│ └── trace-claude-code/ # Plugin: Session tracing to Braintrust
+└── evals/ # Evaluation suite for testing the plugins
+```
+
+### Plugins
+
+| Plugin | Description |
+|--------|-------------|
+| `braintrust` | Enables AI agents to use Braintrust for LLM evaluation, logging, and observability. Includes MCP server config and the `troubleshoot-braintrust-mcp` skill. |
+| `trace-claude-code` | Automatically traces Claude Code conversations to Braintrust. Uses hooks to capture sessions, turns, and tool calls. |
+
+### Terminology
+
+- **Marketplace**: A repository with a `marketplace.json` that catalogs multiple plugins for distribution
+- **Plugin**: An installable unit with its own `.claude-plugin/plugin.json` manifest
+- **Skill**: A capability within a plugin (e.g., `troubleshoot-braintrust-mcp` is a skill in the `braintrust` plugin)
+
 ## Style conventions
 
 - Use sentence case for all text (capitalize first word only, except for proper nouns and code references)
````
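The terminology in AGENTS.md (a marketplace lists plugins, a plugin owns skills) can be made concrete with a small hypothetical helper, `list_skills`, that maps each plugin directory to the skills it contains. As an illustration only, it assumes a plugin is any directory under `plugins/` with a `.claude-plugin/plugin.json` manifest and that each skill is a directory holding a `SKILL.md` file (the convention the old README used under `skills/`); the diff does not show where skills live inside the new plugin directories.

```python
from pathlib import Path


def list_skills(plugins_dir: Path) -> dict[str, list[str]]:
    """Map each plugin directory name to the sorted names of the skills inside it.

    Hypothetical layout assumptions (not confirmed by the diff):
    - a plugin is a subdirectory of plugins_dir containing .claude-plugin/plugin.json
    - a skill is any directory below that plugin holding a SKILL.md file
    """
    skills_by_plugin: dict[str, list[str]] = {}
    for plugin_dir in sorted(p for p in plugins_dir.iterdir() if p.is_dir()):
        if not (plugin_dir / ".claude-plugin" / "plugin.json").is_file():
            continue  # no manifest, so not a plugin by the AGENTS.md definition
        skills_by_plugin[plugin_dir.name] = sorted(
            skill_file.parent.name for skill_file in plugin_dir.rglob("SKILL.md")
        )
    return skills_by_plugin
```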
49 changes: 49 additions & 0 deletions CONTRIBUTING.md

````diff
@@ -0,0 +1,49 @@
+# Development of the plugin itself
+
+## Prerequisites
+
+- Python 3.12+
+- [uv](https://docs.astral.sh/uv/) package manager
+
+## Local testing
+
+Test a plugin without installing from marketplace:
+
+```bash
+claude --plugin-dir /path/to/thisrepo/plugins/{plugin dir here}
+# example
+claude --plugin-dir /path/to/thisrepo/plugins/braintrust
+```
+
+## Running evals
+
+The `evals/` directory contains tests that verify the plugin works correctly (e.g., Claude generates valid SQL queries, logs data properly).
+
+```bash
+cd evals
+export BRAINTRUST_API_KEY="your-key"
+
+# Run all evals
+uv run braintrust eval .
+
+# Run specific eval
+uv run braintrust eval eval_e2e_log_fetch.py
+```
+
+## Pre-commit hooks
+
+```bash
+# Install hooks
+uv run pre-commit install
+
+# Run all hooks
+uv run pre-commit run --all-files
+```
+
+# Updating the plugin
+
+After making changes:
+
+1. Bump version in `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json`
+2. Commit and push
+3. Users update with: `claude plugin marketplace update braintrust-claude-plugin`
````
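Step 1 of "Updating the plugin" keeps two version fields in sync by hand. Below is a minimal sketch of a hypothetical consistency check; it assumes each local plugin's manifest sits at `<source>/.claude-plugin/plugin.json` (one per plugin, as in the new `plugins/` layout) with a top-level `version` field, and compares that value with the plugin's `version` entry in `marketplace.json`.

```python
import json
from pathlib import Path


def version_mismatches(repo_root: Path) -> list[tuple[str, str, str]]:
    """List (plugin, marketplace version, plugin.json version) for entries that disagree.

    Assumes each local plugin keeps its manifest at <source>/.claude-plugin/plugin.json
    with a top-level "version" field; entries whose source is not a string are skipped.
    """
    catalog_path = repo_root / ".claude-plugin" / "marketplace.json"
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    mismatches: list[tuple[str, str, str]] = []
    for entry in catalog.get("plugins", []):
        source = entry.get("source")
        if not isinstance(source, str):
            continue
        manifest_path = repo_root / source / ".claude-plugin" / "plugin.json"
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        listed, declared = entry.get("version"), manifest.get("version")
        if listed != declared:
            mismatches.append((entry.get("name", source), str(listed), str(declared)))
    return mismatches
```

Running it before step 2 turns a forgotten bump into an explicit mismatch instead of a silent drift between the two files.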
173 changes: 27 additions & 146 deletions README.md

````diff
@@ -1,171 +1,52 @@
-# Braintrust Claude plugins
+# Braintrust Claude Code Marketplace
 
-Claude Code plugins for Braintrust - LLM evaluation, logging, observability, and tracing.
+A Claude Code plugin marketplace for [Braintrust](https://braintrust.dev) integration - LLM evaluation, logging, observability, and session tracing.
 
-## Plugins
-
-### 1. Braintrust (evaluation & logging)
-
-Enables AI agents to use Braintrust for LLM evaluation, logging, and observability.
-
-```bash
-claude plugin marketplace add braintrustdata/braintrust-claude-plugin
-claude plugin install braintrust@braintrust-claude-plugin
-```
-
-### 2. Trace Claude Code (observability)
-
-Automatically trace Claude Code conversations to Braintrust.
-
-```bash
-claude plugin install trace-claude-code@braintrust-claude-plugin
-```
+## Prerequisites
 
-See [trace-claude-code/SKILL.md](skills/trace-claude-code/SKILL.md) for setup instructions.
+- A [Braintrust account](https://braintrust.dev)
+- `BRAINTRUST_API_KEY` exported in your environment
 
-## Agent skills
+## Installation
 
-This repo includes skills built on the open [Agent Skills](https://agentskills.io/home) format, compatible with Claude Code, Cursor, Amp, and other agents.
+Add the marketplace:
 
-**Install all skills:**
 ```bash
-curl -sL https://github.com/braintrustdata/braintrust-claude-plugin/archive/main.tar.gz | tar -xz -C ~/.claude/skills --strip-components=2 braintrust-claude-plugin-main/skills
-```
-
-Available skills:
-- [using-braintrust](skills/using-braintrust/SKILL.md) - Evaluation, logging, and SQL queries
-- [trace-claude-code](skills/trace-claude-code/SKILL.md) - Automatic conversation tracing
-
-## Setup
-
-Create a `.env` file in your project directory:
-
-```
-BRAINTRUST_API_KEY=your-api-key-here
-```
-
-The plugin scripts automatically load `.env` files from the current directory or parent directories.
-
-## What the plugin provides
-
-### Scripts
-
-The plugin includes ready-to-use scripts for common operations:
-
-**Query logs with SQL:**
-```bash
-uv run query_logs.py --project "My Project" --query "SELECT count(*) as count FROM logs WHERE created > now() - interval 1 day"
-```
-
-**Log data:**
-```bash
-uv run log_data.py --project "My Project" --input "hello" --output "world"
-```
-
-**Run evaluations:**
-```bash
-uv run run_eval.py --project "My Project" --data '[{"input": "test", "expected": "test"}]'
-```
-
-### SDK patterns
-
-The skill teaches Claude how to use the Braintrust SDK correctly:
-
-```python
-# Correct Eval() usage - project name is FIRST POSITIONAL arg
-braintrust.Eval(
-"My Project", # NOT project_name="My Project"
-data=lambda: [...],
-task=lambda input: ...,
-scores=[Factuality],
-)
-
-# Logging with flush
-logger = braintrust.init_logger(project="My Project")
-logger.log(input="hello", output="world")
-logger.flush() # Important!
-```
-
-### SQL query syntax
-
-The skill teaches Claude to write SQL queries for Braintrust logs:
-
-```sql
-SELECT input, output, created FROM logs WHERE created > now() - interval 1 day LIMIT 10
-```
-
-**SQL quirks in Braintrust:**
-- Use `hour()`, `day()`, `month()`, `year()` instead of `date_trunc()`
-- Intervals use format `interval 1 day` (no quotes, singular unit)
-
-## Project structure
-
-```
-braintrust-claude-plugin/
-├── .claude-plugin/
-│ ├── plugin.json # Plugin manifest
-│ └── marketplace.json # Marketplace index
-├── skills/
-│ ├── using-braintrust/
-│ │ ├── SKILL.md # Evaluation & logging skill
-│ │ └── scripts/ # Helper scripts
-│ │ ├── query_logs.py
-│ │ ├── log_data.py
-│ │ └── run_eval.py
-│ └── trace-claude-code/
-│ ├── SKILL.md # Claude Code tracing skill
-│ └── hooks/
-│ └── stop_hook.sh # Hook script
-├── evals/ # Evaluation suite
-│ ├── eval_e2e_*.py # End-to-end tests
-│ └── eval_*.py # Baseline tests
-└── README.md
+claude plugin marketplace add braintrustdata/braintrust-claude-plugin
 ```
 
-## Development
+Then install the plugins you need:
 
-### Prerequisites
+## Plugins
 
-- Python 3.12+
-- [uv](https://docs.astral.sh/uv/) package manager
+### braintrust
 
-### Local testing
+Enables AI agents to use Braintrust for LLM evaluation, logging, and observability.
 
-Test the plugin without installing from marketplace:
+- Query Braintrust projects, experiments, datasets, and logs
+- Instrument your code with the Braintrust SDK and write evals
 
 ```bash
-claude --plugin-dir /path/to/braintrust-claude-plugin
+claude plugin install braintrust@braintrust-claude-plugin
 ```
 
-### Running evals
+### trace-claude-code
 
-The `evals/` directory contains tests that verify the skill works correctly (e.g., Claude generates valid SQL queries, logs data properly).
+Automatically traces Claude Code conversations to Braintrust. Captures sessions, conversation turns, and tool calls as hierarchical traces.
 
 ```bash
-cd evals
-export BRAINTRUST_API_KEY="your-key"
-
-# Run all evals
-uv run braintrust eval .
-
-# Run specific eval
-uv run braintrust eval eval_e2e_log_fetch.py
+claude plugin install trace-claude-code@braintrust-claude-plugin
 ```
 
-### Pre-commit hooks
+To enable tracing, add the following to your `~/.claude/settings.json` or your project's `.claude/settings.local.json`:
 
-```bash
-# Install hooks
-uv run pre-commit install
-
-# Run all hooks
-uv run pre-commit run --all-files
+```json
+{
+"env": {
+"TRACE_TO_BRAINTRUST": "true",
+"BRAINTRUST_CC_PROJECT": "project-name-to-send-cc-traces-to"
+}
+}
 ```
 
-## Updating the plugin
-
-After making changes:
-
-1. Bump version in `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json`
-2. Commit and push
-3. Users update with: `claude plugin marketplace update braintrust-claude-plugin`
+Traces are sent to the `claude-code` project by default.
````
1 change: 0 additions & 1 deletion evals/eval_datasets.py

````diff
@@ -142,7 +142,6 @@ def baseline_task(input_str):
 scores=[criteria_scorer],
 metadata={
 "description": "Tests agent's ability to create and manage Braintrust datasets",
-"skill": "using-braintrust",
 "category": "datasets",
 },
 )
````
1 change: 0 additions & 1 deletion evals/eval_docs_search.py

````diff
@@ -159,7 +159,6 @@ def baseline_task(input: str) -> str:
 scores=[criteria_scorer],
 metadata={
 "description": "Tests agent's ability to answer Braintrust documentation questions",
-"skill": "using-braintrust",
 "category": "docs_search",
 },
 )
````