braintrustdata · realark · Feb 2, 2026 · Jan 20, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -1,7 +1,7 @@
 {
   "$schema": "https://anthropic.com/claude-code/marketplace.schema.json",
   "name": "braintrust-claude-plugin",
-  "version": "1.2.0",
+  "version": "1.3.0",
   "description": "Braintrust plugins for LLM evaluation, logging, and observability",
   "owner": {
     "name": "Braintrust",
@@ -12,14 +12,14 @@
       "name": "braintrust",
       "description": "Enables AI agents to use Braintrust for LLM evaluation, logging, and observability. Provides correct API usage, working examples, and helper scripts.",
       "version": "1.1.0",
-      "source": "./",
+      "source": "./plugins/braintrust",
       "category": "development"
     },
     {
       "name": "trace-claude-code",
       "description": "Automatically trace Claude Code conversations to Braintrust. Captures user messages, assistant responses, and tool calls for observability.",
-      "version": "1.0.0",
-      "source": "./skills/trace-claude-code",
+      "version": "1.1.0",
+      "source": "./plugins/trace-claude-code",
       "category": "observability"
     }
   ]

diff --git a/AGENTS.md b/AGENTS.md
@@ -1,5 +1,34 @@
 # Agent guidelines
 
+## About this repository
+
+This is the **Braintrust Claude Code plugin marketplace** - a repository that distributes Claude Code plugins for Braintrust integration.
+
+### Structure
+
+```
+claude-plugin/
+├── .claude-plugin/
+│   └── marketplace.json      # Marketplace catalog (lists available plugins)
+├── plugins/
+│   ├── braintrust/           # Plugin: Braintrust evaluation & logging
+│   └── trace-claude-code/    # Plugin: Session tracing to Braintrust
+└── evals/                    # Evaluation suite for testing the plugins
+```
+
+### Plugins
+
+| Plugin | Description |
+|--------|-------------|
+| `braintrust` | Enables AI agents to use Braintrust for LLM evaluation, logging, and observability. Includes MCP server config and the `troubleshoot-braintrust-mcp` skill. |
+| `trace-claude-code` | Automatically traces Claude Code conversations to Braintrust. Uses hooks to capture sessions, turns, and tool calls. |
+
+### Terminology
+
+- **Marketplace**: A repository with a `marketplace.json` that catalogs multiple plugins for distribution
+- **Plugin**: An installable unit with its own `.claude-plugin/plugin.json` manifest
+- **Skill**: A capability within a plugin (e.g., `troubleshoot-braintrust-mcp` is a skill in the `braintrust` plugin)
+
 ## Style conventions
 
 - Use sentence case for all text (capitalize first word only, except for proper nouns and code references)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,49 @@
+# Developing the plugins
+
+## Prerequisites
+
+- Python 3.12+
+- [uv](https://docs.astral.sh/uv/) package manager
+
+## Local testing
+
+Test a plugin locally without installing it from the marketplace:
+
+```bash
+claude --plugin-dir /path/to/thisrepo/plugins/<plugin-name>
+# Example:
+claude --plugin-dir /path/to/thisrepo/plugins/braintrust
+```
+
+## Running evals
+
+The `evals/` directory contains tests that verify the plugins work correctly (e.g., that Claude generates valid SQL queries and logs data properly).
+
+```bash
+cd evals
+export BRAINTRUST_API_KEY="your-key"
+
+# Run all evals
+uv run braintrust eval .
+
+# Run a specific eval
+uv run braintrust eval eval_e2e_log_fetch.py
+```
+
+## Pre-commit hooks
+
+```bash
+# Install hooks
+uv run pre-commit install
+
+# Run all hooks
+uv run pre-commit run --all-files
+```
+
+# Updating the plugin
+
+After making changes:
+
+1. Bump the version in the plugin's own `.claude-plugin/plugin.json` (under `plugins/<plugin-name>/`) and in `.claude-plugin/marketplace.json`
+2. Commit and push
+3. Users update with: `claude plugin marketplace update braintrust-claude-plugin`
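
The version bump in the update steps above is easy to get out of sync now that each plugin lives under `plugins/`. A minimal sketch of a consistency check follows. It is not part of this change, and it rests on assumptions: that `marketplace.json` lists its entries under a `plugins` key, that each `source` is a path relative to the repository root, and that each `plugin.json` has a top-level `version` field. The function name `check_plugin_versions` is hypothetical.

```python
import json
from pathlib import Path


def check_plugin_versions(repo_root: Path) -> list[str]:
    """Return human-readable problems; an empty list means every entry agrees."""
    marketplace_path = repo_root / ".claude-plugin" / "marketplace.json"
    marketplace = json.loads(marketplace_path.read_text())

    problems = []
    # Assumption: plugin entries are listed under a top-level "plugins" key.
    for entry in marketplace.get("plugins", []):
        # Assumption: "source" is relative to the repo root, e.g. "./plugins/braintrust".
        manifest_path = repo_root / entry["source"] / ".claude-plugin" / "plugin.json"
        if not manifest_path.is_file():
            problems.append(
                f"{entry['name']}: missing {manifest_path.relative_to(repo_root)}"
            )
            continue

        manifest = json.loads(manifest_path.read_text())
        # Assumption: plugin.json carries a top-level "version" field.
        if manifest.get("version") != entry.get("version"):
            problems.append(
                f"{entry['name']}: marketplace.json says {entry.get('version')}, "
                f"plugin.json says {manifest.get('version')}"
            )
    return problems
```

Calling `check_plugin_versions(Path("."))` from the repository root before committing would report missing manifests and mismatched versions. Whether the two versions are meant to match exactly is itself an assumption.
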
diff --git a/README.md b/README.md
@@ -1,171 +1,52 @@
-# Braintrust Claude plugins
+# Braintrust Claude Code marketplace
 
-Claude Code plugins for Braintrust - LLM evaluation, logging, observability, and tracing.
+A Claude Code plugin marketplace for [Braintrust](https://braintrust.dev) integration - LLM evaluation, logging, observability, and session tracing.
 
-## Plugins
-
-### 1. Braintrust (evaluation & logging)
-
-Enables AI agents to use Braintrust for LLM evaluation, logging, and observability.
-
-```bash
-claude plugin marketplace add braintrustdata/braintrust-claude-plugin
-claude plugin install braintrust@braintrust-claude-plugin
-```
-
-### 2. Trace Claude Code (observability)
-
-Automatically trace Claude Code conversations to Braintrust.
-
-```bash
-claude plugin install trace-claude-code@braintrust-claude-plugin
-```
+## Prerequisites
 
-See [trace-claude-code/SKILL.md](skills/trace-claude-code/SKILL.md) for setup instructions.
+- A [Braintrust account](https://braintrust.dev)
+- `BRAINTRUST_API_KEY` exported in your environment
 
-## Agent skills
+## Installation
 
-This repo includes skills built on the open [Agent Skills](https://agentskills.io/home) format, compatible with Claude Code, Cursor, Amp, and other agents.
+Add the marketplace:
 
-**Install all skills:**
 ```bash
-curl -sL https://github.com/braintrustdata/braintrust-claude-plugin/archive/main.tar.gz | tar -xz -C ~/.claude/skills --strip-components=2 braintrust-claude-plugin-main/skills
-```
-
-Available skills:
-- [using-braintrust](skills/using-braintrust/SKILL.md) - Evaluation, logging, and SQL queries
-- [trace-claude-code](skills/trace-claude-code/SKILL.md) - Automatic conversation tracing
-
-## Setup
-
-Create a `.env` file in your project directory:
-
-```
-BRAINTRUST_API_KEY=your-api-key-here
-```
-
-The plugin scripts automatically load `.env` files from the current directory or parent directories.
-
-## What the plugin provides
-
-### Scripts
-
-The plugin includes ready-to-use scripts for common operations:
-
-**Query logs with SQL:**
-```bash
-uv run query_logs.py --project "My Project" --query "SELECT count(*) as count FROM logs WHERE created > now() - interval 1 day"
-```
-
-**Log data:**
-```bash
-uv run log_data.py --project "My Project" --input "hello" --output "world"
-```
-
-**Run evaluations:**
-```bash
-uv run run_eval.py --project "My Project" --data '[{"input": "test", "expected": "test"}]'
-```
-
-### SDK patterns
-
-The skill teaches Claude how to use the Braintrust SDK correctly:
-
-```python
-# Correct Eval() usage - project name is FIRST POSITIONAL arg
-braintrust.Eval(
-    "My Project",  # NOT project_name="My Project"
-    data=lambda: [...],
-    task=lambda input: ...,
-    scores=[Factuality],
-)
-
-# Logging with flush
-logger = braintrust.init_logger(project="My Project")
-logger.log(input="hello", output="world")
-logger.flush()  # Important!
-```
-
-### SQL query syntax
-
-The skill teaches Claude to write SQL queries for Braintrust logs:
-
-```sql
-SELECT input, output, created FROM logs WHERE created > now() - interval 1 day LIMIT 10
-```
-
-**SQL quirks in Braintrust:**
-- Use `hour()`, `day()`, `month()`, `year()` instead of `date_trunc()`
-- Intervals use format `interval 1 day` (no quotes, singular unit)
-
-## Project structure
-
-```
-braintrust-claude-plugin/
-├── .claude-plugin/
-│   ├── plugin.json         # Plugin manifest
-│   └── marketplace.json    # Marketplace index
-├── skills/
-│   ├── using-braintrust/
-│   │   ├── SKILL.md        # Evaluation & logging skill
-│   │   └── scripts/        # Helper scripts
-│   │       ├── query_logs.py
-│   │       ├── log_data.py
-│   │       └── run_eval.py
-│   └── trace-claude-code/
-│       ├── SKILL.md        # Claude Code tracing skill
-│       └── hooks/
-│           └── stop_hook.sh  # Hook script
-├── evals/                  # Evaluation suite
-│   ├── eval_e2e_*.py       # End-to-end tests
-│   └── eval_*.py           # Baseline tests
-└── README.md
+claude plugin marketplace add braintrustdata/braintrust-claude-plugin
 ```
 
-## Development
+Then install the plugins you need from the list below.
 
-### Prerequisites
+## Plugins
 
-- Python 3.12+
-- [uv](https://docs.astral.sh/uv/) package manager
+### braintrust
 
-### Local testing
+Enables AI agents to use Braintrust for LLM evaluation, logging, and observability.
 
-Test the plugin without installing from marketplace:
+- Query Braintrust projects, experiments, datasets, and logs
+- Instrument your code with the Braintrust SDK and write evals
 
 ```bash
-claude --plugin-dir /path/to/braintrust-claude-plugin
+claude plugin install braintrust@braintrust-claude-plugin
 ```
 
-### Running evals
+### trace-claude-code
 
-The `evals/` directory contains tests that verify the skill works correctly (e.g., Claude generates valid SQL queries, logs data properly).
+Automatically traces Claude Code conversations to Braintrust. Captures sessions, conversation turns, and tool calls as hierarchical traces.
 
 ```bash
-cd evals
-export BRAINTRUST_API_KEY="your-key"
-
-# Run all evals
-uv run braintrust eval .
-
-# Run specific eval
-uv run braintrust eval eval_e2e_log_fetch.py
+claude plugin install trace-claude-code@braintrust-claude-plugin
 ```
 
-### Pre-commit hooks
+To enable tracing, add the following to your `~/.claude/settings.json` or your project's `.claude/settings.local.json`:
 
-```bash
-# Install hooks
-uv run pre-commit install
-
-# Run all hooks
-uv run pre-commit run --all-files
+```json
+{
+  "env": {
+    "TRACE_TO_BRAINTRUST": "true",
+    "BRAINTRUST_CC_PROJECT": "project-name-to-send-cc-traces-to"
+  }
+}
 ```
 
-## Updating the plugin
-
-After making changes:
-
-1. Bump version in `.claude-plugin/plugin.json` and `.claude-plugin/marketplace.json`
-2. Commit and push
-3. Users update with: `claude plugin marketplace update braintrust-claude-plugin`
+Traces are sent to the `claude-code` project by default.
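
Editing the settings file by hand, as the README describes above, risks overwriting other keys that are already there. A minimal sketch of a merge that keeps them follows. It is not part of this change: the helper name `enable_tracing` is hypothetical, and the only facts taken from the README are the two variable names and the `env` block. It assumes the settings file, if present, contains valid JSON.

```python
import json
from pathlib import Path


def enable_tracing(settings_path: Path, project: str) -> dict:
    """Merge the tracing variables into the file's "env" block, keeping other settings."""
    settings = {}
    if settings_path.exists():
        # Assumption: an existing file holds a valid JSON object.
        settings = json.loads(settings_path.read_text())

    # Reuse the existing "env" block if there is one, otherwise create it.
    env = settings.setdefault("env", {})
    env["TRACE_TO_BRAINTRUST"] = "true"
    env["BRAINTRUST_CC_PROJECT"] = project

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

For example, `enable_tracing(Path(".claude/settings.local.json"), "my-project")` would update the project-level file. Leaving `BRAINTRUST_CC_PROJECT` out instead would fall back to the `claude-code` default mentioned in the README.
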
diff --git a/evals/eval_datasets.py b/evals/eval_datasets.py
@@ -142,7 +142,6 @@ def baseline_task(input_str):
     scores=[criteria_scorer],
     metadata={
         "description": "Tests agent's ability to create and manage Braintrust datasets",
-        "skill": "using-braintrust",
         "category": "datasets",
     },
 )
diff --git a/evals/eval_docs_search.py b/evals/eval_docs_search.py
@@ -159,7 +159,6 @@ def baseline_task(input: str) -> str:
     scores=[criteria_scorer],
     metadata={
         "description": "Tests agent's ability to answer Braintrust documentation questions",
-        "skill": "using-braintrust",
         "category": "docs_search",
     },
 )