59 changes: 28 additions & 31 deletions src/agentunit/adapters/agentops_adapter.py
@@ -67,13 +67,15 @@ def __init__(
self.project_id = project_id
self.default_tags = default_tags or []
self.auto_start_session = auto_start_session
self.enable_tracing = kwargs.get("enable_tracing", True)
self.client: Any = None
self.platform = MonitoringPlatform.AGENTOPS
"""
Initialize LangSmith adapter.

Args:
api_key: LangSmith API key
project_name: Project name for organizing traces
project_id: Langsmith project ID
endpoint: Optional custom LangSmith endpoint
enable_tracing: Whether to enable automatic tracing
enable_feedback: Whether to collect feedback data
@@ -98,9 +100,10 @@ def _initialize_agentops(self):
"""Initialize AgentOps client and verify connection."""
try:
# Import AgentOps SDK
import agentops
import agentops # type: ignore[import-not-found]

self.agentops = agentops
self.client = agentops
Comment on lines +103 to +106
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for list_runs usage in the codebase
echo "=== Searching for list_runs usage ==="
rg -n "\.list_runs\(" --type py -C 3

echo ""
echo "=== Searching for self.client usage in agentops_adapter.py ==="
rg -n "self\.client\." src/agentunit/adapters/agentops_adapter.py -C 2

echo ""
echo "=== Check lines 551, 637 specifically ==="
sed -n '549,553p; 635,639p' src/agentunit/adapters/agentops_adapter.py

echo ""
echo "=== Check if AgentOps client is instantiated ==="
rg -n "agentops\.(Client|init|get_session)" --type py -C 2

Repository: aviralgarg05/agentunit

Length of output: 4591


🌐 Web query:

AgentOps Python SDK client API list_runs method

💡 Result:

Do you mean the AgentOps SDK's method signature / example usage for a client.list_runs (to list runs/traces), or are you seeing an error when calling list_runs? Which language (pure agentops Python SDK, or the Public API via HTTP) and which version do you have?


🌐 Web query:

AgentOps Python SDK init method returns client session

💡 Result:

Yes — agentops.init() creates the SDK client and (by default) starts a session and returns a Session object. If you call agentops.init(auto_start_session=False) it will initialize the client without starting/returning a session. The docs also note the init call constructs the underlying Client (accessible via get_client()). [1][2]

Sources:

  • SDK reference (init, get_client). [1]
  • Sessions / Starting a session (auto_start_session behavior). [2]
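
For reference, a minimal sketch of the two documented initialization modes (the API key value is a placeholder):

import agentops

# Default: init() constructs the client, auto-starts a session,
# and returns a Session object (per the docs cited above).
session = agentops.init(api_key="YOUR_API_KEY")

# With auto_start_session=False, init() only constructs the client;
# the underlying Client is reachable via get_client().
agentops.init(api_key="YOUR_API_KEY", auto_start_session=False)
client = agentops.get_client()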

🌐 Web query:

AgentOps SDK API methods list_runs create_dataset create_example

💡 Result:

I couldn't find public AgentOps SDK methods named exactly list_runs, create_dataset, or create_example in the AgentOps docs or SDK source. The official docs show the Python/TS SDK surface and a REST API (you can fetch the OpenAPI spec for full endpoint names), and AgentOps exposes session/run-related endpoints via its REST API. See:

  • SDK reference / quickstart (Python SDK overview). [1]
  • REST API / integrations page (mentions OpenAPI spec & session/run auth flow). [2]

If you want, I can:

  • fetch the AgentOps OpenAPI (JSON) and extract exact endpoint/method names (recommended), or
  • search the AgentOps GitHub repos for those exact method names and return code examples.

Which would you prefer?

Sources:
[1] https://docs.agentops.ai/v2/usage/sdk-reference
[2] https://docs.agentops.ai/v1/integrations/rest


🏁 Script executed:

#!/bin/bash
# Check the context around agentops.init() to see what it returns
echo "=== Lines around agentops.init() calls ==="
sed -n '105,125p' src/agentunit/adapters/agentops_adapter.py

echo ""
echo "=== Check what self.client is used for ==="
rg -n "self\.client\." src/agentunit/adapters/agentops_adapter.py | head -20

echo ""
echo "=== Check if there's a get_client or session pattern ==="
rg -n "agentops\.(get_client|get_session|Session|Client)" --type py

Repository: aviralgarg05/agentunit

Length of output: 1232


Critical issue: self.client is assigned to the agentops module but code calls non-existent LangSmith API methods.

At lines 103–106, the code assigns the agentops module to both self.agentops and self.client. However, agentops.init() (lines 110, 116) returns a Session object that is never captured. More critically, lines 551, 636, 728, and 735 attempt to call self.client.list_runs(), self.client.create_dataset(), and self.client.create_example() — methods that do not exist in the AgentOps SDK. These are LangSmith API methods (as evidenced by the comment "Query recent runs from LangSmith" at line 550). The adapter appears to be an incompletely adapted copy of the LangSmith adapter and will fail at runtime with AttributeError when these methods are invoked. Either capture the Session object returned by agentops.init() and use AgentOps-compatible APIs, or remove functionality that relies on non-existent methods.

🤖 Prompt for AI Agents
In src/agentunit/adapters/agentops_adapter.py around lines 103 to 106, the
adapter assigns the agentops module to self.client but never captures the
Session returned by agentops.init(), and later calls LangSmith-specific methods
(list_runs, create_dataset, create_example) that do not exist on AgentOps; fix
by changing the initialization to store the Session returned by agentops.init()
into self.client/self.session and refactor subsequent calls to use
AgentOps-compatible APIs on that Session, or remove/replace the
LangSmith-specific methods—ensure every call site (lines ~551, 636, 728, 735)
either calls a real AgentOps method on the Session or is removed, and update
imports/docs accordingly.
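
A minimal sketch of one possible fix, assuming the init()/get_client() behavior quoted above; the session attribute name is illustrative, not from the original adapter:

def _initialize_agentops(self):
    """Initialize the AgentOps client and capture the session."""
    import agentops  # type: ignore[import-not-found]

    self.agentops = agentops
    if self.auto_start_session:
        # init() starts and returns a Session by default
        self.session = agentops.init(api_key=self.api_key)
    else:
        agentops.init(api_key=self.api_key, auto_start_session=False)
        self.session = None
    # Store the real client object, not the module, so later calls can
    # only use methods that actually exist in the AgentOps SDK.
    self.client = agentops.get_client()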


# Initialize AgentOps
if self.api_key:
@@ -124,11 +127,6 @@ def _initialize_agentops(self):
logger.error(f"Failed to connect to AgentOps: {e}")
raise

@property
def platform(self) -> MonitoringPlatform:
"""Return the monitoring platform type."""
return MonitoringPlatform.AGENTOPS

def create_agent(self, role: AgentRole, agent_id: str | None = None, **kwargs) -> AgentMetadata:
"""
Create an agent for AgentOps monitoring.
@@ -244,7 +242,7 @@ def send_message(
interaction = AgentInteraction(
interaction_id=interaction_id,
from_agent=from_agent,
to_agent=to_agent,
to_agent=to_agent or "broadcast",
content=message,
timestamp=timestamp,
metadata=metadata or {},
@@ -339,12 +337,15 @@ def calculate_coordination_metrics(self) -> dict[str, float]:
for interaction in self.session_interactions:
unique_agents.add(interaction.from_agent)
if interaction.to_agent:
unique_agents.add(interaction.to_agent)
if isinstance(interaction.to_agent, list):
unique_agents.update(interaction.to_agent)
else:
unique_agents.add(interaction.to_agent)

agent_participation = len(unique_agents)

# Calculate message distribution
agent_counts = {}
agent_counts: dict[str, int] = {}
for interaction in self.session_interactions:
from_agent = interaction.from_agent
agent_counts[from_agent] = agent_counts.get(from_agent, 0) + 1
@@ -374,7 +375,7 @@ def calculate_coordination_metrics(self) -> dict[str, float]:
else 0.0,
}

def run_scenario(self, scenario: Scenario) -> ScenarioResult:
async def run_scenario(self, scenario: Scenario) -> ScenarioResult:
"""
Run a scenario with LangSmith integration.

@@ -390,14 +391,14 @@ def run_scenario(self, scenario: Scenario) -> ScenarioResult:
scenario_run_id = None
if self.enable_tracing:
try:
run = self.client.create_run(
run_id = self.agentops.start_trace(
name=f"Scenario: {scenario.name}",
run_type="chain",
project_name=self.project_name,
inputs={"scenario": scenario.name, "description": scenario.description},
project_name=self.project_id,
inputs={"scenario": scenario.name},
tags=["agentunit", "scenario"],
)
scenario_run_id = str(run.id)
scenario_run_id = str(run_id)
Comment on lines +394 to +401
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

AgentOps Python SDK start_trace update_trace_metadata end_trace API documentation

💡 Result:

Here are the AgentOps Python SDK docs and the key details for start_trace, update_trace_metadata, and end_trace.

  • start_trace(trace_name: str = None, tags: Union[Dict, List] = None) — manually start a trace; returns a TraceContext (usable as a context manager in a with block). Use when auto-start is disabled or you need separate traces. [1][2]

  • update_trace_metadata(metadata: Dict[str, Any], prefix: str = "trace.metadata") -> bool — update running-trace metadata (strings/numbers/booleans/lists), maps common keys to semantic conventions (e.g., tags, agent_name), validates types, returns True/False. Useful to add progress, stage, results mid-run. [1][2][3]

  • end_trace(trace: TraceContext = None, end_state: str = None, end_prompt: str = None, error_message: str = None) — end a specific trace (or all active traces if trace omitted); you can set an end_state like "Success"/"Error" and provide optional end details. Context-manager usage automatically ends traces on exit (Success or Error depending on exception). [1][2][4]

Concise examples (from docs):

  • Manual start/update/end
    import agentops
    agentops.init(api_key="KEY", auto_start_session=False)
    trace = agentops.start_trace("my-workflow", tags=["prod"])

    ... run ...

    agentops.update_trace_metadata({"stage":"processing","tags":["in-progress"]})
    agentops.end_trace(trace, "Success")
    [1][2]

Docs pages:

  1. SDK Reference (start_trace, update_trace_metadata, end_trace). [1]
  2. Manual Trace Control / examples. [2]
  3. Quickstart / updating trace metadata snippet. [3]
  4. Context manager / with start_trace behavior. [4]

Would you like direct links to any of those pages or a full code snippet tailored to your app?


🏁 Script executed:

cd /tmp && find . -name "agentops_adapter.py" -type f 2>/dev/null | head -1

Repository: aviralgarg05/agentunit

Length of output: 48


🏁 Script executed:

# Search for the file in the repository
git ls-files | grep agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 108


🏁 Script executed:

# Check imports and initialization in the adapter file
head -50 src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 1388


🏁 Script executed:

# Search for other start_trace calls in the codebase
rg "start_trace" --type py

Repository: aviralgarg05/agentunit

Length of output: 160


🏁 Script executed:

# Check the full __init__ method and how self.agentops is initialized
sed -n '40,150p' src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 3978


🏁 Script executed:

# Look for agentops import and initialization
rg "import.*agentops|from.*agentops|self\.agentops\s*=" src/agentunit/adapters/agentops_adapter.py -A 3

Repository: aviralgarg05/agentunit

Length of output: 235


🏁 Script executed:

# Check around line 394-401 for more context
sed -n '380,410p' src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 1221


🏁 Script executed:

# Check if there's a custom wrapper around agentops being used
rg "class.*AgentOps|def start_trace|def update_trace_metadata|def end_trace" src/agentunit/adapters/agentops_adapter.py -B 2 -A 5

Repository: aviralgarg05/agentunit

Length of output: 316


🏁 Script executed:

# Check the entire run_scenario method to see how start_trace, update_trace_metadata, end_trace are used
sed -n '370,550p' src/agentunit/adapters/agentops_adapter.py

Repository: aviralgarg05/agentunit

Length of output: 6746


All three AgentOps trace method calls use parameter signatures that do not match the official SDK.

Lines 394-401 call start_trace(name=, run_type=, project_name=, inputs=, tags=) but the AgentOps SDK signature is start_trace(trace_name: str = None, tags: Union[Dict, List] = None). The parameters run_type, project_name, and inputs do not exist in the SDK.

Lines 487-491 and 527-531 call update_trace_metadata(trace_id=, metadata=) and end_trace(trace_id=, status_code=) which use non-existent parameters. The correct signatures are:

  • update_trace_metadata(metadata: Dict[str, Any], prefix: str = "trace.metadata")
  • end_trace(trace: TraceContext = None, end_state: str = None, ...)

Additionally, the method docstring incorrectly states "Run a scenario with LangSmith integration" (line ~382) despite being in the AgentOpsAdapter class.

Refactor all three method calls to match the official AgentOps SDK signatures, and correct the docstring.

🤖 Prompt for AI Agents
In src/agentunit/adapters/agentops_adapter.py around lines 382 and 394-401 (and
also update calls near 487-491 and 527-531), the AgentOps SDK calls use the
wrong parameter names and the docstring is wrong: change the method docstring to
reference AgentOps integration (not LangSmith), refactor the start_trace call to
use the SDK signature start_trace(trace_name=..., tags=...) (move project_id,
run_type and inputs into the tags or into a metadata dict passed via
update_trace_metadata), refactor update_trace_metadata calls to call
update_trace_metadata(metadata_dict, prefix="trace.metadata") (pass a dict of
metadata as first arg), and refactor end_trace calls to call
end_trace(trace=trace_context_or_id, end_state="success"/"failed") using the
proper end_state string rather than status_code; ensure trace identifiers/types
match the SDK expected TraceContext or trace id usage and build metadata/tags as
dicts/lists accordingly.
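
A hedged sketch of the corrected calls, using only the signatures quoted in the web query above (the metadata keys and tag values are illustrative):

# Start: the documented signature takes only trace_name and tags.
trace = self.agentops.start_trace(
    trace_name=f"Scenario: {scenario.name}",
    tags=["agentunit", "scenario"],
)

# Mid-run: the metadata dict is the first positional argument.
self.agentops.update_trace_metadata(
    {"scenario": scenario.name, "result": result.success_rate}
)

# End: pass the TraceContext and an end_state string ("Error" in the
# exception path), not a trace_id/status_code pair.
self.agentops.end_trace(trace, end_state="Success")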

except Exception as e:
logger.warning(f"Failed to create scenario run: {e}")

@@ -483,15 +484,11 @@ def run_scenario(self, scenario: Scenario) -> ScenarioResult:
# Update LangSmith run with results
if scenario_run_id and self.enable_tracing:
try:
self.client.update_run(
run_id=scenario_run_id,
outputs={
"result": result.passed,
"execution_time": execution_time,
"details": result.details,
},
end_time=datetime.now(timezone.utc),
self.agentops.update_trace_metadata(
trace_id=scenario_run_id,
metadata={"result": result.success_rate, "details": result.to_dict()},
)
self.agentops.end_trace(trace_id=scenario_run_id, status_code="SUCCESS")
except Exception as e:
logger.warning(f"Failed to update scenario run: {e}")

@@ -527,11 +524,11 @@ def run_scenario(self, scenario: Scenario) -> ScenarioResult:
# Update LangSmith run with error
if scenario_run_id and self.enable_tracing:
try:
self.client.update_run(
run_id=scenario_run_id,
outputs={"error": str(e)},
end_time=datetime.now(timezone.utc),
self.agentops.update_trace_metadata(
trace_id=scenario_run_id,
metadata={"result": result.success_rate, "details": result.to_dict()},
)
self.agentops.end_trace(trace_id=scenario_run_id, status_code="SUCCESS")
except Exception as e:
logger.warning(f"Failed to update failed scenario run: {e}")

@@ -551,7 +548,7 @@ def collect_metrics(self, scenario: Any, result: Any, **kwargs) -> ProductionMet
"""
try:
# Query recent runs from LangSmith
runs = list(self.client.list_runs(project_name=self.project_name, limit=100))
runs = list(self.client.list_runs(project_name=self.project_id, limit=100))

if not runs:
return ProductionMetrics(
@@ -637,7 +634,7 @@ def establish_baseline(
# Query historical runs
runs = list(
self.client.list_runs(
project_name=self.project_name, start_time=start_date, end_time=end_date
project_name=self.project_id, start_time=start_date, end_time=end_date
)
)

@@ -749,7 +746,7 @@ def create_evaluation_dataset(
logger.error(f"Failed to create LangSmith dataset: {e}")
raise

def run_evaluation(self, dataset_id: str, evaluator_function: Any, **kwargs) -> dict[str, Any]:
def run_evaluation(self, dataset_id: str, evaluator_function: Any, **kwargs) -> Any:
"""
Run evaluation on a LangSmith dataset.

Expand All @@ -767,7 +764,7 @@ def run_evaluation(self, dataset_id: str, evaluator_function: Any, **kwargs) ->
results = evaluate(
evaluator_function,
data=dataset_id,
project_name=f"{self.project_name}-evaluation",
experiment_prefix=f"{self.project_id}-evaluation",
**kwargs,
)

24 changes: 12 additions & 12 deletions src/agentunit/core/scenario.py
@@ -7,7 +7,7 @@
import random
from dataclasses import dataclass, field
from pathlib import Path
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Any

from agentunit.datasets.registry import resolve_dataset

@@ -68,7 +68,7 @@ def from_openai_agents(

adapter = OpenAIAgentsAdapter.from_flow(flow, **options)
ds = resolve_dataset(dataset)
scenario_name = name or getattr(flow, "__name__", "openai-agents-scenario")
scenario_name = name or getattr(flow, "__name__", None) or "openai-agents-scenario"
return cls(name=scenario_name, adapter=adapter, dataset=ds)

@classmethod
@@ -95,7 +95,7 @@ def from_autogen(
orchestrator: object,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
"""
Create scenario from AutoGen orchestrator.
@@ -113,7 +113,7 @@ def from_haystack(
pipeline: object,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.haystack import HaystackAdapter

@@ -128,7 +128,7 @@ def from_llama_index(
engine: object,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.llama_index import LlamaIndexAdapter

@@ -143,7 +143,7 @@ def from_semantic_kernel(
invoker: object,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.semantic_kernel import SemanticKernelAdapter

@@ -158,7 +158,7 @@ def from_phidata(
agent: object,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.phidata import PhidataAdapter

@@ -173,7 +173,7 @@ def from_promptflow(
flow: object,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.promptflow import PromptFlowAdapter

@@ -188,7 +188,7 @@ def from_openai_swarm(
swarm: object,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.openai_swarm import OpenAISwarmAdapter

@@ -204,7 +204,7 @@ def from_anthropic_bedrock(
model_id: str,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.anthropic_bedrock import AnthropicBedrockAdapter

@@ -219,7 +219,7 @@ def from_mistral_server(
base_url: str,
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.mistral_server import MistralServerAdapter

@@ -239,7 +239,7 @@ def from_rasa_endpoint(
target: str | Callable[[dict], object],
dataset: str | DatasetSource | None = None,
name: str | None = None,
**options: object,
**options: Any,
) -> Scenario:
from agentunit.adapters.rasa import RasaAdapter

21 changes: 17 additions & 4 deletions src/agentunit/datasets/builtins.py
@@ -2,7 +2,9 @@

from __future__ import annotations

from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, TypedDict

from typing_extensions import NotRequired

from .base import DatasetCase, DatasetSource

@@ -11,7 +13,18 @@
from collections.abc import Iterable


_GAIA_L1_SHOPPING: list[dict[str, object]] = [
class DatasetRow(TypedDict):
"""Provides structure for type hints"""

id: str
query: str
expected_output: str | None
tools: list[str] | None
context: list[str] | None
metadata: NotRequired[dict[str, object]]


_GAIA_L1_SHOPPING: list[DatasetRow] = [
{
"id": "gaia-shopping-001",
"query": "Find the best price for a pack of AA rechargeable batteries with at least 2500mAh capacity.",
@@ -28,7 +41,7 @@
},
]

_SWE_BENCH_LITE: list[dict[str, object]] = [
_SWE_BENCH_LITE: list[DatasetRow] = [
{
"id": "swe-lite-001",
"query": "Fix the bug where the API returns HTTP 500 when the username is missing.",
@@ -48,7 +61,7 @@
]


def _build_loader(rows: list[dict[str, object]]) -> Iterable[DatasetCase]:
def _build_loader(rows: list[DatasetRow]) -> Iterable[DatasetCase]:
for row in rows:
yield DatasetCase(
id=row["id"],
10 changes: 6 additions & 4 deletions src/agentunit/metrics/builtin.py
@@ -143,8 +143,9 @@ def evaluate(self, case: DatasetCase, trace: TraceLog, outcome: Any) -> MetricRe
cost = 0.0

# Check trace metadata
if trace.metadata and "cost" in trace.metadata:
cost = float(trace.metadata["cost"])
metadata = getattr(trace, "metadata", {})
if metadata and "cost" in metadata:
cost = float(metadata["cost"])

# Check outcome
elif hasattr(outcome, "cost"):
@@ -168,8 +169,9 @@ def evaluate(self, case: DatasetCase, trace: TraceLog, outcome: Any) -> MetricRe
total_tokens = 0

# Check trace metadata
if trace.metadata and "usage" in trace.metadata:
usage = trace.metadata["usage"]
metadata = getattr(trace, "metadata", {})
if metadata and "usage" in metadata:
usage = metadata["usage"]
prompt_tokens = usage.get("prompt_tokens", 0)
completion_tokens = usage.get("completion_tokens", 0)
total_tokens = usage.get("total_tokens", 0)
2 changes: 1 addition & 1 deletion src/agentunit/production/integrations.py
@@ -97,7 +97,7 @@ def _calculate_baseline_stats(
self, historical_data: list[dict[str, Any]], metrics: list[str]
) -> dict[str, dict[str, dict[str, float]]]:
"""Calculate baseline statistics from historical data."""
baseline_stats = {
baseline_stats: dict[str, dict[str, dict[str, float]]] = {
"performance_baseline": {},
"quality_baseline": {},
"reliability_baseline": {},