Debug AI agents like systems, not black boxes.
RunLens helps you understand why an agent run failed or became expensive — and what to fix. It captures runs step by step, shows cost per step, and lets you compare two runs side by side to see exactly what changed.
When an agent fails or costs too much, existing tools show you what happened — logs, traces, token counts. But they don't tell you why this run was different from the one that worked yesterday, or which specific decision made it 3x more expensive.
RunLens captures a full snapshot of every run — model, prompt version, tools, config — and lets you compare any two runs side by side. You see exactly what changed and where the cost went.
```
your agent → RunLens SDK → RunLens API → RunLens UI
```
You add 3 function calls to your agent. RunLens records every step, stores it, and lets you compare runs visually.
```bash
pip install runlens-sdk
```

```python
from runlens import start_run, record_step, end_run

# Start a run — capture your execution context
run = start_run(
    task="answer customer question",
    context={
        "model": "gpt-4o",
        "prompt_version": "v2",
        "tools": ["search", "calculator"],
        "temperature": 0.7,
    },
    api_url="https://runlens-api.onrender.com",
)

# Record each step — cost is calculated automatically for known models,
# or pass cost= explicitly for custom models.
record_step(
    run_id=run.id,
    step_type="llm_call",
    name="classify intent",
    input={"prompt": "..."},
    output={"intent": "refund_request"},
    model="gpt-4o",
    tokens=150,
)

# End the run
end_run(run.id)
```

Go to runlens-api.onrender.com, select two runs, and click Compare.
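The automatic cost calculation mentioned in the comment above could work roughly like this. This is a hedged sketch: the pricing table, values, and function name are illustrative assumptions, not RunLens's actual rates or API.

```python
# Illustrative per-1K-token prices: placeholder values,
# not RunLens's actual rate table.
PRICE_PER_1K_TOKENS = {
    "gpt-4o": 0.005,
    "gpt-4o-mini": 0.0003,
}

def estimate_cost(model, tokens, explicit_cost=None):
    """Use an explicit cost when given; otherwise look up a known model's rate."""
    if explicit_cost is not None:
        return explicit_cost
    rate = PRICE_PER_1K_TOKENS.get(model)
    if rate is None:
        return 0.0  # unknown model and no explicit cost: record zero
    return rate * tokens / 1000
```

Under this scheme, a 150-token `gpt-4o` step would be priced at roughly $0.00075, while a step with an unknown model and no explicit cost would simply be recorded as free.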
When you compare two runs, RunLens shows:
Context diff — what was different between the two runs:
| Key | Run A | Run B |
|---|---|---|
| model | gpt-4o | gpt-4o-mini |
| prompt_version | v1 | v2 |
| temperature | 0.7 | 0.3 |
Summary — steps, cost, tokens, duration delta at a glance.
Step diff — steps shown side by side, with extra or missing steps flagged in red.
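Conceptually, the context diff is a key-by-key comparison of the two runs' context dicts. A minimal sketch of the idea (an illustration, not RunLens's actual implementation):

```python
def context_diff(a, b):
    """Return {key: (run_a_value, run_b_value)} for every key that differs.

    A key missing on one side appears as None for that side.
    """
    changed = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            changed[key] = (a.get(key), b.get(key))
    return changed

run_a = {"model": "gpt-4o", "prompt_version": "v1", "temperature": 0.7}
run_b = {"model": "gpt-4o-mini", "prompt_version": "v2", "temperature": 0.3}
print(context_diff(run_a, run_b))  # all three keys differ, as in the table above
```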
See a concrete example: a support bot running the same task twice — once over-engineered, once lean. 5x cost difference, same output.
```bash
pip install runlens-sdk requests
RUNLENS_API=https://runlens-api.onrender.com python examples/demo_agent.py
```

Then go to runlens-api.onrender.com, select both runs, and click Compare.
```
runlens/
├── apps/
│   ├── api/            — FastAPI backend (SQLite)
│   └── web/            — Frontend (plain HTML + JS)
├── packages/
│   └── sdk-python/     — Python SDK
├── examples/
│   └── demo_agent.py   — Demo: support bot comparison
└── CLAUDE.md           — Project brief for AI-assisted development
```
Starts a new run. Returns a `RunHandle` with an `.id`.
- `task` — short description of what the agent is doing
- `context` — dict with execution state: model, prompt version, tools, etc.
- `api_url` — RunLens API base URL (optional; streams data to the API)
- `storage_path` — local JSON file path (optional; saves data locally)
```python
record_step(run_id, step_type, input, output, cost=0.0, tokens=0, model=None, name=None, duration_ms=None)
```
Records a single step within a run.
- `step_type` — e.g. `"llm_call"`, `"tool_call"`, `"retrieval"`
- `cost` — cost in USD for this step
- `tokens` — token count for this step
Ends the run and returns the complete run record.
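As an illustration of what the complete run record might contain, here is an assumed shape. The field names are guesses based on the SDK calls above, not the SDK's actual schema:

```python
import json

# Assumed run-record shape; field names are illustrative, not the SDK's schema
run_record = {
    "id": "run_123",
    "task": "answer customer question",
    "context": {"model": "gpt-4o", "prompt_version": "v2"},
    "steps": [
        {"step_type": "llm_call", "name": "classify intent",
         "model": "gpt-4o", "tokens": 150, "cost": 0.00075},
    ],
}

# Run-level totals can be derived by summing over the recorded steps
total_tokens = sum(s["tokens"] for s in run_record["steps"])
total_cost = sum(s["cost"] for s in run_record["steps"])
print(json.dumps(run_record, indent=2))
```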
The API is a standard FastAPI app. Deploy anywhere that runs Python:
```bash
uvicorn main:app --host 0.0.0.0 --port $PORT
```

Uses SQLite by default. Set the `DATABASE_URL` environment variable to use a different database.
- SDK — pure Python, zero dependencies
- Backend — FastAPI + SQLModel + SQLite
- Frontend — plain HTML + vanilla JS, no framework
MIT