
mda-diaz/runlens


RunLens

Debug AI agents like systems, not black boxes.

RunLens helps you understand why an agent run failed or became expensive — and what to fix. It captures runs step by step, shows cost per step, and lets you compare two runs side by side to see exactly what changed.

👉 Live demo


The problem

When an agent fails or costs too much, existing tools show you what happened — logs, traces, token counts. But they don't tell you why this run was different from the one that worked yesterday, or which specific decision made it 3x more expensive.

RunLens captures a full snapshot of every run — model, prompt version, tools, config — and lets you compare any two runs side by side. You see exactly what changed and where the cost went.


How it works

your agent → RunLens SDK → RunLens API → RunLens UI

You add three SDK functions to your agent: start_run, record_step (once per step), and end_run. RunLens records every step, stores it, and lets you compare runs visually.


Quickstart

1. Install the SDK

pip install runlens-sdk

2. Instrument your agent

from runlens import start_run, record_step, end_run

# Start a run — capture your execution context
run = start_run(
    task="answer customer question",
    context={
        "model": "gpt-4o",
        "prompt_version": "v2",
        "tools": ["search", "calculator"],
        "temperature": 0.7,
    },
    api_url="https://runlens-api.onrender.com",
)

# Record each step — cost is calculated automatically for known models,
# or pass cost= explicitly for custom models.
record_step(
    run_id=run.id,
    step_type="llm_call",
    name="classify intent",
    input={"prompt": "..."},
    output={"intent": "refund_request"},
    model="gpt-4o",
    tokens=150,
)

# End the run
end_run(run.id)
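record_step accepts an optional duration_ms, but the quickstart does not set it. One way to fill it in is a small wrapper that times each step and forwards the documented keyword arguments. This is a minimal sketch: timed_step is a hypothetical helper, not part of the SDK. It takes the recorder as a parameter so it can be tried with a stand-in; in a real agent you would pass runlens.record_step.

```python
import time
from typing import Any, Callable


def timed_step(
    record: Callable[..., Any],
    run_id: Any,
    step_type: str,
    name: str,
    fn: Callable[[dict], dict],
    step_input: dict,
    **extra: Any,
) -> dict:
    """Call fn(step_input), time it, and pass the result to `record`.

    Hypothetical helper, not part of the RunLens SDK. `record` must accept
    the keyword arguments documented for record_step: run_id, step_type,
    name, input, output, duration_ms, plus anything in **extra
    (for example model= or tokens=). If fn raises, nothing is recorded.
    """
    start = time.perf_counter()
    output = fn(step_input)
    # perf_counter is monotonic, so the elapsed time is never negative.
    duration_ms = round((time.perf_counter() - start) * 1000)
    record(
        run_id=run_id,
        step_type=step_type,
        name=name,
        input=step_input,
        output=output,
        duration_ms=duration_ms,
        **extra,
    )
    return output
```

With the SDK, a call could look like timed_step(record_step, run.id, "llm_call", "classify intent", classify, {"prompt": "..."}, model="gpt-4o", tokens=150), where classify is your own function.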

3. Open the UI

Go to runlens-api.onrender.com. Select two runs and click Compare.


The comparison view

When you compare two runs, RunLens shows:

Context diff — what was different between the two runs:

Key Run A Run B
model gpt-4o gpt-4o-mini
prompt_version v1 v2
temperature 0.7 0.3

Summary — the difference in step count, cost, tokens, and duration, at a glance.

Step diff — side-by-side steps, with extra or missing steps flagged in red.
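The three parts of the comparison view can be approximated in a few lines of Python. This is a sketch, not RunLens's implementation: it assumes each run's context is a plain dict and each step is a dict carrying the record_step fields (name, cost, tokens, duration_ms), and it aligns steps by name with difflib.SequenceMatcher, which may differ from how RunLens actually pairs steps. The function names and the "(missing)" marker are illustrative.

```python
from difflib import SequenceMatcher
from typing import Any

MISSING = "(missing)"  # shown when a context key exists in only one run


def context_diff(a: dict, b: dict) -> dict[str, tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    diff = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, MISSING), b.get(key, MISSING)
        if va != vb:
            diff[key] = (va, vb)
    return diff


def totals(steps: list[dict]) -> dict[str, float]:
    """Aggregate the per-step fields that record_step accepts."""
    return {
        "steps": len(steps),
        "cost": sum(s.get("cost", 0.0) for s in steps),
        "tokens": sum(s.get("tokens", 0) for s in steps),
        # duration_ms defaults to None in record_step, so treat None as 0.
        "duration_ms": sum(s.get("duration_ms") or 0 for s in steps),
    }


def summary_delta(a_steps: list[dict], b_steps: list[dict]) -> dict[str, float]:
    """B minus A for each summary metric (negative means B used less)."""
    ta, tb = totals(a_steps), totals(b_steps)
    return {key: tb[key] - ta[key] for key in ta}


def step_diff(a_steps: list[dict], b_steps: list[dict]) -> list[tuple]:
    """Align steps by name.

    Each row is (status, name_in_a, name_in_b). "missing" marks a step that
    only run A has; "extra" marks a step that only run B has.
    """
    a_names = [s.get("name") for s in a_steps]
    b_names = [s.get("name") for s in b_steps]
    rows = []
    matcher = SequenceMatcher(a=a_names, b=b_names, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            rows.extend(
                ("same", a_names[i], b_names[j])
                for i, j in zip(range(i1, i2), range(j1, j2))
            )
        else:
            # "replace", "delete" and "insert" all reduce to A-only and B-only steps.
            rows.extend(("missing", a_names[i], None) for i in range(i1, i2))
            rows.extend(("extra", None, b_names[j]) for j in range(j1, j2))
    return rows
```

Matching by name keeps the sketch short; a run that calls the same step twice, or renames a step between prompt versions, would need a richer key.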


Run the demo

See a concrete example: a support bot running the same task twice, once over-engineered and once lean, with a 5x cost difference for the same output. From a clone of this repository, run:

pip install runlens-sdk requests
RUNLENS_API=https://runlens-api.onrender.com python examples/demo_agent.py

Then go to runlens-api.onrender.com, select both runs, and click Compare.


Project structure

runlens/
├── apps/
│   ├── api/          — FastAPI backend (SQLite)
│   └── web/          — Frontend (plain HTML + JS)
├── packages/
│   └── sdk-python/   — Python SDK
├── examples/
│   └── demo_agent.py — Demo: support bot comparison
└── CLAUDE.md         — Project brief for AI-assisted development

SDK reference

start_run(task, context=None, api_url=None, storage_path=None)

Starts a new run. Returns a RunHandle with an .id.

  • task — short description of what the agent is doing
  • context — dict with execution state: model, prompt version, tools, etc.
  • api_url — RunLens API base URL (optional, streams data to API)
  • storage_path — local JSON file path (optional, saves data locally)
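The context diff can only show what you put into context, so it helps to build that dict the same way in every run. One hypothetical convention, not something the SDK does for you: derive prompt_version from a hash of the prompt text, so an edited prompt shows up in the diff even if nobody bumped the version label. build_context and the 8-character hash length are illustrative choices.

```python
import hashlib
from typing import Any, Sequence


def build_context(
    model: str,
    prompt_template: str,
    tools: Sequence[str],
    temperature: float,
    **extra: Any,
) -> dict[str, Any]:
    """Build a context dict for start_run (hypothetical helper).

    prompt_version is the first 8 hex characters of the SHA-256 of the
    prompt text, so any edit to the prompt produces a new version string.
    """
    digest = hashlib.sha256(prompt_template.encode("utf-8")).hexdigest()
    context: dict[str, Any] = {
        "model": model,
        "prompt_version": digest[:8],
        # Sort so the same tool set always compares equal in a diff.
        "tools": sorted(tools),
        "temperature": temperature,
    }
    context.update(extra)
    return context
```

The result can be passed straight through, e.g. start_run(task="answer customer question", context=build_context("gpt-4o", PROMPT, ["search", "calculator"], 0.7)), where PROMPT is your own prompt string.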

record_step(run_id, step_type, input, output, cost=0.0, tokens=0, model=None, name=None, duration_ms=None)

Records a single step within a run.

  • step_type — e.g. "llm_call", "tool_call", "retrieval"
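  • cost — cost in USD for this step; calculated automatically for known models when you pass model and tokens, so set it explicitly only for custom models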
  • tokens — token count for this step
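The quickstart says cost is calculated automatically for known models, with cost= as an override for custom models. A minimal sketch of that kind of lookup, assuming a flat per-1,000-token price per model (record_step takes a single tokens count). The model names and prices below are placeholders, not real provider pricing, and estimate_cost is a hypothetical name; the SDK's actual price list and rules may differ.

```python
from typing import Optional

# Placeholder prices in USD per 1,000 tokens. NOT real provider pricing.
PRICE_PER_1K_TOKENS = {
    "known-model-large": 0.010,
    "known-model-small": 0.001,
}


def estimate_cost(model: Optional[str], tokens: int, cost: Optional[float] = None) -> float:
    """Return the step cost in USD.

    An explicit cost always wins. Otherwise, look up the model's price;
    unknown or missing models fall back to 0.0, mirroring record_step's
    default of cost=0.0.
    """
    if cost is not None:
        return cost
    price = PRICE_PER_1K_TOKENS.get(model or "")
    if price is None:
        return 0.0
    return tokens / 1000 * price
```

For example, estimate_cost("known-model-large", 1500) gives about 0.015 with these placeholder prices, while a custom model needs estimate_cost("my-model", 1500, cost=...) to report anything other than 0.0.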

end_run(run_id)

Ends the run and returns the complete run record.
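Since RunLens is meant for debugging failed runs, it matters that a run is closed even when the agent raises. A minimal sketch, assuming you want end_run called in every case: a hypothetical context manager, tracked_run, that takes the start and end functions as parameters (pass start_run and end_run in a real agent) and calls the end function in a finally block. It does not record the exception itself; if you want it in the timeline, you could record an extra step before the error propagates.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def tracked_run(
    start: Callable[..., Any],
    end: Callable[[Any], Any],
    task: str,
    **start_kwargs: Any,
) -> Iterator[Any]:
    """Start a run, yield its handle, and always end it (hypothetical helper).

    `start` should accept start_run's arguments (task, context, api_url, ...)
    and return an object with an `.id`; `end` receives that id.
    """
    run = start(task=task, **start_kwargs)
    try:
        yield run
    finally:
        # Runs on normal exit and when the body raises; the exception
        # still propagates to the caller afterwards.
        end(run.id)
```

In an agent this could read: with tracked_run(start_run, end_run, "answer customer question", context={...}, api_url="https://runlens-api.onrender.com") as run: followed by record_step(run_id=run.id, ...) calls inside the block.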


Self-hosting

The API (apps/api) is a standard FastAPI app. Deploy it anywhere that runs Python:

uvicorn main:app --host 0.0.0.0 --port $PORT

Uses SQLite by default. Set the DATABASE_URL environment variable to use a different database.
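A sketch of how an app like this commonly resolves its database, assuming the DATABASE_URL convention described above and a SQLite file as the fallback. The default file name runlens.db and the helper name database_settings are placeholders, and check_same_thread=False is the SQLite setting FastAPI's SQL tutorials pass because a connection may be used from more than one thread; the real apps/api code may do this differently.

```python
import os
from typing import Any


def database_settings(default_url: str = "sqlite:///runlens.db") -> tuple[str, dict[str, Any]]:
    """Return (url, connect_args) for an engine factory.

    DATABASE_URL wins when set; otherwise fall back to a local SQLite file.
    SQLite gets check_same_thread=False so FastAPI can use the connection
    across threads; other databases get no extra connect args.
    """
    url = os.environ.get("DATABASE_URL") or default_url
    connect_args: dict[str, Any] = {}
    if url.startswith("sqlite"):
        connect_args["check_same_thread"] = False
    return url, connect_args
```

With SQLModel installed, the pair would typically be used as url, connect_args = database_settings() followed by engine = create_engine(url, connect_args=connect_args).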


Tech stack

  • SDK — pure Python, zero dependencies
  • Backend — FastAPI + SQLModel + SQLite
  • Frontend — plain HTML + vanilla JS, no framework

License

MIT

About

RunLens helps teams compare and debug AI agent runs with step timelines, run diffs, and cost analysis.
