From f9658a91d3224297f659f5c6310d093bf3c7af62 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jannik=20Maierh=C3=B6fer?= Date: Mon, 31 Mar 2025 15:23:07 +0200 Subject: [PATCH 1/2] docs: add cookbook on evaluating openai agents --- examples/agents_sdk/evaluate_agents.ipynb | 2761 +++++++++++++++++++++ 1 file changed, 2761 insertions(+) create mode 100644 examples/agents_sdk/evaluate_agents.ipynb diff --git a/examples/agents_sdk/evaluate_agents.ipynb b/examples/agents_sdk/evaluate_agents.ipynb new file mode 100644 index 0000000000..820f9c58e0 --- /dev/null +++ b/examples/agents_sdk/evaluate_agents.ipynb @@ -0,0 +1,2761 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "qsOnSYBqfoSL" + }, + "source": [ + "# Evaluating OpenAI Agents\n", + "\n", + "In this cookbook, we will learn how to **monitor the internal steps (traces) of the [OpenAI agent SDK](https://github.com/openai/openai-agents-python)** and **evaluate its performance** using [Langfuse](https://langfuse.com/docs).\n", + "\n", + "This guide covers **online** and **offline evaluation** metrics used by teams to bring agents to production fast and reliably. To learn more about evaluation strategies, check out this [blog post](https://langfuse.com/blog/2025-03-04-llm-evaluation-101-best-practices-and-challenges).\n", + "\n", + "**Why AI agent Evaluation is important:**\n", + "- Debugging issues when tasks fail or produce suboptimal results\n", + "- Monitoring costs and performance in real-time\n", + "- Improving reliability and safety through continuous feedback\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "94-c-mbeVk4q" + }, + "source": [ + "
\n", + " \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZzPbsmLrfoSN" + }, + "source": [ + "## Step 0: Install the Required Libraries\n", + "\n", + "Below we install the `openai-agents` library (the OpenAI Agents SDK [link text](https://github.com/openai/openai-agents-python)), the `pydantic-ai[logfire]` OpenTelemetry instrumentation, `langfuse` and the Hugging Face `datasets` library" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "_EI_0ZfzfoSO", + "outputId": "ace75429-9836-456e-98e7-b08df97f616e" + }, + "outputs": [], + "source": [ + "%pip install openai-agents\n", + "%pip install nest_asyncio\n", + "%pip install pydantic-ai[logfire]\n", + "%pip install langfuse\n", + "%pip install datasets" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FHRsxz1VfoSP" + }, + "source": [ + "## Step 1: Instrument Your Agent\n", + "\n", + "In this notebook, we will use [Langfuse](https://langfuse.com/) to trace, debug and evaluate our agent.\n", + "\n", + "**Note:** If you are using LlamaIndex or LangGraph, you can find documentation on instrumenting them [here](https://langfuse.com/docs/integrations/llama-index/workflows) and [here](https://langfuse.com/docs/integrations/langchain/example-python-langgraph)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "mZnxtWx9foSP" + }, + "outputs": [], + "source": [ + "import os\n", + "import base64\n", + "\n", + "# Get keys for your project from the project settings page: https://cloud.langfuse.com\n", + "os.environ[\"LANGFUSE_PUBLIC_KEY\"] = \"pk-lf-...\"\n", + "os.environ[\"LANGFUSE_SECRET_KEY\"] = \"sk-lf-...\"\n", + "os.environ[\"LANGFUSE_HOST\"] = \"https://cloud.langfuse.com\" # 🇪🇺 EU region\n", + "# os.environ[\"LANGFUSE_HOST\"] = \"https://us.cloud.langfuse.com\" # 🇺🇸 US region\n", + "\n", + "LANGFUSE_AUTH = base64.b64encode(\n", + " f\"{os.environ.get('LANGFUSE_PUBLIC_KEY')}:{os.environ.get('LANGFUSE_SECRET_KEY')}\".encode()\n", + ").decode()\n", + "\n", + "os.environ[\"OTEL_EXPORTER_OTLP_ENDPOINT\"] = os.environ.get(\"LANGFUSE_HOST\") + \"/api/public/otel\"\n", + "os.environ[\"OTEL_EXPORTER_OTLP_HEADERS\"] = f\"Authorization=Basic {LANGFUSE_AUTH}\"\n", + "\n", + "# Set your OpenAI API Key\n", + "os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-...\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KQjuJNHkfoSP" + }, + "outputs": [], + "source": [ + "from opentelemetry.sdk.trace import TracerProvider\n", + "from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter\n", + "from opentelemetry.sdk.trace.export import SimpleSpanProcessor\n", + "\n", + "# Create a TracerProvider for OpenTelemetry\n", + "trace_provider = TracerProvider()\n", + "\n", + "# Add a SimpleSpanProcessor with the OTLPSpanExporter to send traces\n", + "trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))\n", + "\n", + "# Set the global default tracer provider\n", + "from opentelemetry import trace\n", + "trace.set_tracer_provider(trace_provider)\n", + "tracer = trace.get_tracer(__name__)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IWr-MQY7hKdM" + }, + "source": [ + "Pydantic Logfire offers an instrumentation for the OpenAi Agent SDK. We use this to send traces to the [Langfuse OpenTelemetry Backend](https://langfuse.com/docs/opentelemetry/get-started)." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "td11AsCShBxA" + }, + "outputs": [], + "source": [ + "import nest_asyncio\n", + "nest_asyncio.apply()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1MQoskgIhCQi" + }, + "outputs": [], + "source": [ + "import logfire\n", + "\n", + "# Configure logfire instrumentation.\n", + "logfire.configure(\n", + " service_name='my_agent_service',\n", + "\n", + " send_to_logfire=False,\n", + ")\n", + "# This method automatically patches the OpenAI Agents SDK to send logs via OTLP to Langfuse.\n", + "logfire.instrument_openai_agents()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uulS5iGHfoSP" + }, + "source": [ + "## Step 2: Test Your Instrumentation\n", + "\n", + "Here is a simple Q&A agent. We run it to confirm that the instrumentation is working correctly. If everything is set up correctly, you will see logs/spans in your observability dashboard." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UcyynS9CfoSP", + "outputId": "5e8eb8c2-bcff-4149-ff8b-21d706c1d25b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "12:01:03.401 OpenAI Agents trace: Agent workflow\n", + "12:01:03.403 Agent run: 'Assistant'\n", + "12:01:03.404 Responses API with 'gpt-4o'\n", + "Evaluating AI agents is crucial for several reasons:\n", + "\n", + "1. **Performance Verification**: Ensures that the AI performs its intended tasks accurately and efficiently, meeting the desired objectives and criteria.\n", + "\n", + "2. **Reliability and Consistency**: Assesses whether the AI provides consistent results across different scenarios and over time.\n", + "\n", + "3. **Safety and Risk Management**: Identifies potential risks or harmful behaviors that could lead to undesirable outcomes, ensuring the AI operates safely within defined limits.\n", + "\n", + "4. **Bias and Fairness**: Checks for any biases in the AI’s decision-making process to promote fairness and avoid discrimination against particular groups.\n", + "\n", + "5. **User Trust and Adoption**: Builds confidence and trust in the AI system among users and stakeholders, which is essential for widespread adoption.\n", + "\n", + "6. **Regulatory Compliance**: Ensures that the AI adheres to relevant laws, regulations, and ethical guidelines, which may vary by industry or region.\n", + "\n", + "7. **Continuous Improvement**: Provides feedback that can be used to refine and improve the AI model over time, enhancing its effectiveness and efficiency.\n", + "\n", + "8. **Integration and Compatibility**: Evaluates how well the AI integrates with existing systems and processes, ensuring compatibility and smooth operation.\n", + "\n", + "9. **Resource Optimization**: Assesses the efficiency of the AI in terms of computational resources, which can lead to cost savings and improved performance.\n", + "\n", + "Evaluating AI agents systematically and rigorously supports their development and deployment in a responsible and effective manner.\n" + ] + } + ], + "source": [ + "import asyncio\n", + "from agents import Agent, Runner\n", + "\n", + "async def main():\n", + " agent = Agent(\n", + " name=\"Assistant\",\n", + " instructions=\"You are a senior software engineer\",\n", + " )\n", + "\n", + " result = await Runner.run(agent, \"Tell me why it is important to evaluate AI agents.\")\n", + " print(result.final_output)\n", + "\n", + "loop = asyncio.get_running_loop()\n", + "await loop.create_task(main())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hPLt1hRkfoSQ" + }, + "source": [ + "Check your [Langfuse Traces Dashboard](https://cloud.langfuse.com/traces) to confirm that the spans and logs have been recorded.\n", + "\n", + "Example trace in Langfuse:\n", + "\n", + "![Example trace in Langfuse](https://langfuse.com/images/cookbook/integration_openai-agents/first-example-trace.png)\n", + "\n", + "_[Link to the trace](https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/0195948781a9f0d78fd5e067154aa508?timestamp=2025-03-14T12%3A01%3A03.401Z&observation=64bcac3cb82d04e9)_" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "onjMD-ZJfoSQ" + }, + "source": [ + "## Step 3: Observe and Evaluate a More Complex Agent\n", + "\n", + "Now that you have confirmed your instrumentation works, let's try a more complex query so we can see how advanced metrics (token usage, latency, costs, etc.) are tracked." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3qdMJh9KfoSQ", + "outputId": "04136f13-51b0-4938-afa6-ade016c01aa9" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "13:33:30.839 OpenAI Agents trace: Agent workflow\n", + "13:33:30.840 Agent run: 'Hello world'\n", + "13:33:30.842 Responses API with 'gpt-4o'\n", + "13:33:31.822 Function: get_weather\n", + "13:33:31.825 Responses API with 'gpt-4o'\n", + "The weather in Berlin is currently sunny.\n" + ] + } + ], + "source": [ + "import asyncio\n", + "from agents import Agent, Runner, function_tool\n", + "\n", + "# Example function tool.\n", + "@function_tool\n", + "def get_weather(city: str) -> str:\n", + " return f\"The weather in {city} is sunny.\"\n", + "\n", + "agent = Agent(\n", + " name=\"Hello world\",\n", + " instructions=\"You are a helpful agent.\",\n", + " tools=[get_weather],\n", + ")\n", + "\n", + "async def main():\n", + " result = await Runner.run(agent, input=\"What's the weather in Berlin?\")\n", + " print(result.final_output)\n", + "\n", + "loop = asyncio.get_running_loop()\n", + "await loop.create_task(main())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fjkhTgLWfoSQ" + }, + "source": [ + "### Trace Structure\n", + "\n", + "Langfuse records a **trace** that contains **spans**, which represent each step of your agent’s logic. Here, the trace contains the overall agent run and sub-spans for:\n", + "- The tool call (get_weather)\n", + "- The LLM calls (Responses API with 'gpt-4o')\n", + "\n", + "You can inspect these to see precisely where time is spent, how many tokens are used, and so on:\n", + "\n", + "![Trace tree in Langfuse](https://langfuse.com/images/cookbook/integration_openai-agents/trace-tree.png)\n", + "\n", + "_[Link to the trace](https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/019594b5b9a27c5d497b13be71e7f255?timestamp=2025-03-14T12%3A51%3A32.386Z&display=preview&observation=6374a3c96baf831d)_" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JHZAkQuefoSQ" + }, + "source": [ + "## Online Evaluation\n", + "\n", + "Online Evaluation refers to evaluating the agent in a live, real-world environment, i.e. during actual usage in production. This involves monitoring the agent’s performance on real user interactions and analyzing outcomes continuously.\n", + "\n", + "We have written down a guide on different evaluation techniques [here](https://langfuse.com/blog/2025-03-04-llm-evaluation-101-best-practices-and-challenges).\n", + "\n", + "### Common Metrics to Track in Production\n", + "\n", + "1. **Costs** — The instrumentation captures token usage, which you can transform into approximate costs by assigning a price per token.\n", + "2. **Latency** — Observe the time it takes to complete each step, or the entire run.\n", + "3. **User Feedback** — Users can provide direct feedback (thumbs up/down) to help refine or correct the agent.\n", + "4. **LLM-as-a-Judge** — Use a separate LLM to evaluate your agent’s output in near real-time (e.g., checking for toxicity or correctness).\n", + "\n", + "Below, we show examples of these metrics." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QHMvJ1QlfoSQ" + }, + "source": [ + "#### 1. Costs\n", + "\n", + "Below is a screenshot showing usage for `gpt-4o` calls. This is useful to see costly steps and optimize your agent.\n", + "\n", + "![Costs](https://langfuse.com/images/cookbook/integration_openai-agents/gpt-4o-costs.png)\n", + "\n", + "_[Link to the trace](https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/019594b5b9a27c5d497b13be71e7f255?timestamp=2025-03-14T12%3A51%3A32.386Z&display=preview&observation=6374a3c96baf831d)_" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yz0y9mn7foSQ" + }, + "source": [ + "#### 2. Latency\n", + "\n", + "We can also see how long it took to complete each step. In the example below, the entire run took 7 seconds, which you can break down by step. This helps you identify bottlenecks and optimize your agent.\n", + "\n", + "![Latency](https://langfuse.com/images/cookbook/integration_openai-agents/openai-agent-latency.png)\n", + "\n", + "_[Link to the trace](https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/019594b5b9a27c5d497b13be71e7f255?timestamp=2025-03-14T12%3A51%3A32.386Z&display=timeline&observation=b12967a01b3f8bcb)_" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MByq31MzfoSQ" + }, + "source": [ + "#### 3. Additional Attributes\n", + "\n", + "Opentelemetry lets you attach a set of attributes to all spans by setting [`set_attribute`](https://opentelemetry.io/docs/languages/python/instrumentation/#add-attributes-to-a-span). This allows you to set properties like a Langfuse Session ID, to group traces into Langfuse Sessions or a User ID, to assign traces to a specific user. You can find a list of all supported attributes in the [here](/docs/opentelemetry/get-started#property-mapping).\n", + "\n", + "In this example, we pass a [user_id](https://langfuse.com/docs/tracing-features/users), [session_id](https://langfuse.com/docs/tracing-features/sessions) and [trace_tags](https://langfuse.com/docs/tracing-features/tags) to Langfuse. You can also use the span attribute `input.value` and `output.value` to set the trace level input and output." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CaSQNrgyfoSR" + }, + "outputs": [], + "source": [ + "from opentelemetry.sdk.trace import TracerProvider\n", + "from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter\n", + "from opentelemetry.sdk.trace.export import SimpleSpanProcessor\n", + "\n", + "trace_provider = TracerProvider()\n", + "trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))\n", + "\n", + "# Sets the global default tracer provider\n", + "from opentelemetry import trace\n", + "trace.set_tracer_provider(trace_provider)\n", + "\n", + "# Creates a tracer from the global tracer provider\n", + "tracer = trace.get_tracer(__name__)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "w5O02Ren3Kmu", + "outputId": "63a0020d-c873-46c6-d159-3a045ab681f0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "13:34:49.654 OpenAI Agents trace: Agent workflow\n", + "13:34:49.655 Agent run: 'Assistant'\n", + "13:34:49.657 Responses API with 'gpt-4o'\n", + "AI agent evaluation is crucial for several reasons:\n", + "\n", + "1. **Performance Verification**: It ensures that the AI agent performs its intended tasks effectively and meets specific criteria or benchmarks.\n", + "\n", + "2. **Safety and Reliability**: Evaluation helps identify and mitigate risks, ensuring that the AI operates safely and reliably in real-world situations.\n", + "\n", + "3. **Continuous Improvement**: Analyzing performance data allows developers to refine and enhance the AI, leading to better outcomes and more efficient systems.\n", + "\n", + "4. **Transparency and Accountability**: Thorough evaluation provides transparency into how decisions are made by the AI, which is essential for accountability, especially in sensitive applications.\n", + "\n", + "5. **Bias and Fairness**: Evaluating AI systems helps detect and address potential biases, ensuring fair treatment of all users and stakeholders.\n", + "\n", + "6. **Compliance**: It ensures adherence to regulations and industry standards, which is critical for legal and ethical compliance.\n", + "\n", + "7. **User Trust**: A well-evaluated AI fosters trust among users, stakeholders, and the public, as they can be confident in its capabilities and limitations.\n", + "\n", + "8. **Resource Allocation**: Evaluation helps determine if the AI is using resources efficiently, which can be crucial for cost management and scalability.\n" + ] + } + ], + "source": [ + "input_query = \"Why is AI agent evaluation important?\"\n", + "\n", + "with tracer.start_as_current_span(\"OpenAI-Agent-Trace\") as span:\n", + " span.set_attribute(\"langfuse.user.id\", \"user-12345\")\n", + " span.set_attribute(\"langfuse.session.id\", \"my-agent-session\")\n", + " span.set_attribute(\"langfuse.tags\", [\"staging\", \"demo\", \"OpenAI Agent SDK\"])\n", + "\n", + " async def main(input_query):\n", + " agent = Agent(\n", + " name = \"Assistant\",\n", + " instructions = \"You are a helpful assistant.\",\n", + " )\n", + "\n", + " result = await Runner.run(agent, input_query)\n", + " print(result.final_output)\n", + " return result\n", + "\n", + " result = await main(input_query)\n", + "\n", + " # Add input and output values to parent trace\n", + " span.set_attribute(\"input.value\", input_query)\n", + " span.set_attribute(\"output.value\", result.final_output)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tEPeGdAafoSR" + }, + "source": [ + "![Example trace in Langfuse](https://langfuse.com/images/cookbook/integration_openai-agents/openai-agent-sdk-custom-attributes.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XtKiK62HfoSR" + }, + "source": [ + "#### 4. User Feedback\n", + "\n", + "If your agent is embedded into a user interface, you can record direct user feedback (like a thumbs-up/down in a chat UI). Below is an example using `IPython.display` for simple feedback mechanism.\n", + "\n", + "In the code snippet below, when a user sends a chat message, we capture the OpenTelemetry trace ID. If the user likes/dislikes the last answer, we attach a score to the trace." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 468, + "referenced_widgets": [ + "8b10018448324153af2ee1f9bd83d140", + "cebcce63ea37474ca10f1828105ca2e6", + "9153dfceabff450ead31493c3c518d4c", + "5a2b1d2255a34a7597b263755eaa14b3", + "c8b3aa3aeec046ef8acfab640c2dee17", + "ee1b1596e6ec42029fbf8b711c0fc41a", + "ecd5521cdbc34eb7a866b4b2094fd500", + "df007d6320cb4198a6dbf58485980394" + ] + }, + "id": "YI9siKKKfoSR", + "outputId": "3eb086d0-4277-4cdd-966f-10b3c12272a6" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Enter your question: What is Langfuse?\n", + "13:54:41.574 OpenAI Agents trace: Agent workflow\n", + "13:54:41.575 Agent run: 'WebSearchAgent'\n", + "13:54:41.577 Responses API with 'gpt-4o'\n", + "Langfuse is an open-source engineering platform designed to enhance the development, monitoring, and optimization of Large Language Model (LLM) applications. It offers a suite of tools that provide observability, prompt management, evaluations, and metrics, facilitating the debugging and improvement of LLM-based solutions. ([toolkitly.com](https://www.toolkitly.com/langfuse?utm_source=openai))\n", + "\n", + "**Key Features of Langfuse:**\n", + "\n", + "- **LLM Observability:** Langfuse enables developers to monitor and analyze the performance of language models by tracking API calls, user inputs, prompts, and outputs. This observability aids in understanding model behavior and identifying areas for improvement. ([toolkitly.com](https://www.toolkitly.com/langfuse?utm_source=openai))\n", + "\n", + "- **Prompt Management:** The platform provides tools for managing, versioning, and deploying prompts directly within Langfuse. This feature allows for efficient organization and refinement of prompts to optimize model responses. ([toolkitly.com](https://www.toolkitly.com/langfuse?utm_source=openai))\n", + "\n", + "- **Evaluations and Metrics:** Langfuse offers capabilities to collect and calculate scores for LLM completions, run model-based evaluations, and gather user feedback. It also tracks key metrics such as cost, latency, and quality, providing insights through dashboards and data exports. ([toolkitly.com](https://www.toolkitly.com/langfuse?utm_source=openai))\n", + "\n", + "- **Playground Environment:** The platform includes a playground where users can interactively experiment with different models and prompts, facilitating prompt engineering and testing. ([toolkitly.com](https://www.toolkitly.com/langfuse?utm_source=openai))\n", + "\n", + "- **Integration Capabilities:** Langfuse integrates seamlessly with various tools and frameworks, including LlamaIndex, LangChain, OpenAI SDK, LiteLLM, and more, enhancing its functionality and allowing for the development of complex applications. ([toolerific.ai](https://toolerific.ai/ai-tools/opensource/langfuse-langfuse?utm_source=openai))\n", + "\n", + "- **Open Source and Self-Hosting:** Being open-source, Langfuse allows developers to customize and extend the platform according to their specific needs. It can be self-hosted, providing full control over infrastructure and data. ([vafion.com](https://www.vafion.com/blog/unlocking-power-language-models-langfuse/?utm_source=openai))\n", + "\n", + "Langfuse is particularly valuable for developers and researchers working with LLMs, offering a comprehensive set of tools to improve the performance and reliability of LLM applications. Its flexibility, integration capabilities, and open-source nature make it a robust choice for those seeking to enhance their LLM projects. \n", + "How did you like the agent response?\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8b10018448324153af2ee1f9bd83d140", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(Button(description='👍', icon='thumbs-up', style=ButtonStyle()), Button(description='👎', icon='t…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Scored the trace in Langfuse\n" + ] + } + ], + "source": [ + "from agents import Agent, Runner, WebSearchTool\n", + "from opentelemetry.trace import format_trace_id\n", + "import ipywidgets as widgets\n", + "from IPython.display import display\n", + "from langfuse import Langfuse\n", + "\n", + "langfuse = Langfuse()\n", + "\n", + "# Define your agent with the web search tool\n", + "agent = Agent(\n", + " name=\"WebSearchAgent\",\n", + " instructions=\"You are an agent that can search the web.\",\n", + " tools=[WebSearchTool()]\n", + ")\n", + "\n", + "formatted_trace_id = None # We'll store the current trace_id globally for demonstration\n", + "\n", + "def on_feedback(button):\n", + " if button.icon == \"thumbs-up\":\n", + " langfuse.score(\n", + " value=1,\n", + " name=\"user-feedback\",\n", + " comment=\"The user gave this response a thumbs up\",\n", + " trace_id=formatted_trace_id\n", + " )\n", + " elif button.icon == \"thumbs-down\":\n", + " langfuse.score(\n", + " value=0,\n", + " name=\"user-feedback\",\n", + " comment=\"The user gave this response a thumbs down\",\n", + " trace_id=formatted_trace_id\n", + " )\n", + " print(\"Scored the trace in Langfuse\")\n", + "\n", + "user_input = input(\"Enter your question: \")\n", + "\n", + "# Run agent\n", + "with trace.get_tracer(__name__).start_as_current_span(\"OpenAI-Agent-Trace\") as span:\n", + "\n", + " # Run your agent with a query\n", + " result = Runner.run_sync(agent, user_input)\n", + " print(result.final_output)\n", + "\n", + " current_span = trace.get_current_span()\n", + " span_context = current_span.get_span_context()\n", + " trace_id = span_context.trace_id\n", + " formatted_trace_id = str(format_trace_id(trace_id))\n", + " langfuse.trace(id=formatted_trace_id, input=user_input, output=result.final_output)\n", + "\n", + "# Get feedback\n", + "print(\"How did you like the agent response?\")\n", + "\n", + "thumbs_up = widgets.Button(description=\"👍\", icon=\"thumbs-up\")\n", + "thumbs_down = widgets.Button(description=\"👎\", icon=\"thumbs-down\")\n", + "\n", + "thumbs_up.on_click(on_feedback)\n", + "thumbs_down.on_click(on_feedback)\n", + "\n", + "display(widgets.HBox([thumbs_up, thumbs_down]))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iiemuS7YfoSR" + }, + "source": [ + "User feedback is then captured in Langfuse:\n", + "\n", + "![User feedback is being captured in Langfuse](https://langfuse.com/images/cookbook/integration_openai-agents/open-ai-agent-user-feedback.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "29KsI9xcfoSR" + }, + "source": [ + "#### 5. LLM-as-a-Judge\n", + "\n", + "LLM-as-a-Judge is another way to automatically evaluate your agent's output. You can set up a separate LLM call to gauge the output’s correctness, toxicity, style, or any other criteria you care about.\n", + "\n", + "**Workflow**:\n", + "1. You define an **Evaluation Template**, e.g., \"Check if the text is toxic.\"\n", + "2. You set a model that is used as judge-model; in this case `gpt-4o-mini`.\n", + "2. Each time your agent generates output, you pass that output to your \"judge\" LLM with the template.\n", + "3. The judge LLM responds with a rating or label that you log to your observability tool.\n", + "\n", + "Example from Langfuse:\n", + "\n", + "![LLM-as-a-Judge Evaluation Template](https://langfuse.com/images/cookbook/integration_openai-agents/evaluator-template.png)\n", + "![LLM-as-a-Judge Evaluator](https://langfuse.com/images/cookbook/integration_openai-agents/evaluator.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UGGlYrB7foSR", + "outputId": "3e2d7d06-a5be-4552-9f17-88d93dd7b600" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "14:05:34.735 OpenAI Agents trace: Agent workflow\n", + "14:05:34.736 Agent run: 'WebSearchAgent'\n", + "14:05:34.738 Responses API with 'gpt-4o'\n" + ] + } + ], + "source": [ + "# Example: Checking if the agent’s output is toxic or not.\n", + "from agents import Agent, Runner, WebSearchTool\n", + "\n", + "# Define your agent with the web search tool\n", + "agent = Agent(\n", + " name=\"WebSearchAgent\",\n", + " instructions=\"You are an agent that can search the web.\",\n", + " tools=[WebSearchTool()]\n", + ")\n", + "\n", + "input_query = \"Is eating carrots good for the eyes?\"\n", + "\n", + "# Run agent\n", + "with trace.get_tracer(__name__).start_as_current_span(\"OpenAI-Agent-Trace\") as span:\n", + " # Run your agent with a query\n", + " result = Runner.run_sync(agent, input_query)\n", + "\n", + " # Add input and output values to parent trace\n", + " span.set_attribute(\"input.value\", input_query)\n", + " span.set_attribute(\"output.value\", result.final_output)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Izr-3LiQfoSR" + }, + "source": [ + "You can see that the answer of this example is judged as \"not toxic\".\n", + "\n", + "![LLM-as-a-Judge Evaluation Score](https://langfuse.com/images/cookbook/integration_openai-agents/llm-as-a-judge-score.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g7fN0UTkfoSR" + }, + "source": [ + "#### 6. Observability Metrics Overview\n", + "\n", + "All of these metrics can be visualized together in dashboards. This enables you to quickly see how your agent performs across many sessions and helps you to track quality metrics over time.\n", + "\n", + "![Observability metrics overview](https://langfuse.com/images/cookbook/integration_openai-agents/dashboard-dark.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zlwltgEkfoSR" + }, + "source": [ + "## Offline Evaluation\n", + "\n", + "Online evaluation is essential for live feedback, but you also need **offline evaluation**—systematic checks before or during development. This helps maintain quality and reliability before rolling changes into production." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p5R8eNQxfoSR" + }, + "source": [ + "### Dataset Evaluation\n", + "\n", + "In offline evaluation, you typically:\n", + "1. Have a benchmark dataset (with prompt and expected output pairs)\n", + "2. Run your agent on that dataset\n", + "3. Compare outputs to the expected results or use an additional scoring mechanism\n", + "\n", + "Below, we demonstrate this approach with the [search-dataset](https://huggingface.co/datasets/junzhang1207/search-dataset), which contains questions that can be answered via the web search tool and expected answers." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 510, + "referenced_widgets": [ + "dbf84b0798e0432599453e370740acaa", + "f05fbf51e518417ea66a24db7b0a472e", + "27794498f97b40a6b4ef1f2e36bf317e", + "35fb6e8a89194653abcaa84f6da4f190", + "93572e5407ea443999d1da45280741f7", + "f3574545791843998157f0a3176e0ded", + "636090e9932f4ff6a76152c64c92347d", + "7928d3e09417467aaa74cb2c6cea32ca", + "1ef6f4bf46e24841916cc9c611c1498b", + "07f767145c7741bfb950cd983c757cc7", + "b620b045db0242189200a99090ea6b9f", + "33f6af1d99a9451a86e6e6690cec7e43", + "722784d0cb184f02b5250c57829cfdc4", + "e98c3c9567334b69a46ab23cd378358e", + "f0c04648902343288b4248dbf5589d7b", + "b32d7f992f064016ab548a961569b632", + "a95842a82d46492796b7d0fd45bb9795", + "9b33b4e7c6bb4a728f21258c10034066", + "6b2cbe08ebad4df8bf9e4711d78920f2", + "ed53bc55a6da404bac64993433f78ccf", + "f68c9919bd26417cb1f3950f121276e3", + "e815dd583c3243efa5d3b57672519f74", + "ea5cea15ae5741418720f77d5879ecc2", + "f8be1fdbe50649ebb614194c140e0d9a", + "bdb56cd2387e4e30a1bf82beecf481d1", + "12b08dc8912c4bfc81c81524eabc7898", + "57152a81a7a24410992ccce525f28af0", + "799212d16c814bd696d57d0cdd35d9c7", + "f6935aa898f544d2b4051ee259ec9e30", + "a049a1b305ed4660bc48f81fe1c6c0a7", + "9e40a549b3094b3892de5159dfe935ff", + "15fa9808df56469db8cc623bd127ceae", + "8948024991964982ad058f199066cc55", + "0f5356bc1d4a4e0f895fa9482bb06a7a", + "427363e7b1ce4c7387b93f2794bdedb4", + "690c3eb1f314478083b725bcb1d38a25", + "edc537f5b13348c6b831365930c0cf31", + "7f0d7cf658054c448668bebacd96a59d", + "71277158b5ec43c8aa0accb86b15952d", + "c6b284ea83ee4de0a6de5210067b4b6b", + "c248d2dba9484807bc53d23ece012644", + "2b4fb4ce8c71405d80da978589750950", + "07431a8d1a2044b6baebcef3242f41c5", + "e2941796709b4284aa3b6143b19e064e" + ] + }, + "id": "r77WUP9NfoSS", + "outputId": "7809f9f1-b496-4672-e17c-41681e66accc" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "dbf84b0798e0432599453e370740acaa", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0%| | 0.00/2.12k [00:00= 49: # For this example, we upload only the first 50 items\n", + " break" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QgHw2e6afoSS" + }, + "source": [ + "![Dataset items in Langfuse](https://langfuse.com/images/cookbook/integration_openai-agents/example-dataset.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xZl2W_u0foSS" + }, + "source": [ + "#### Running the Agent on the Dataset\n", + "\n", + "We define a helper function `run_openai_agent()` that:\n", + "1. Starts an OpenTelemetry span\n", + "2. Runs our agent on the prompt\n", + "3. Records the trace ID in Langfuse\n", + "\n", + "Then, we loop over each dataset item, run the agent, and link the trace to the dataset item. We can also attach a quick evaluation score if desired." + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": { + "id": "-rYh1PBRfoSS" + }, + "outputs": [], + "source": [ + "from agents import Agent, Runner, WebSearchTool\n", + "from opentelemetry.trace import format_trace_id\n", + "\n", + "# Define your agent with the web search tool\n", + "agent = Agent(\n", + " name=\"WebSearchAgent\",\n", + " instructions=\"You are an agent that can search the web.\",\n", + " tools=[WebSearchTool(search_context_size= \"high\")]\n", + ")\n", + "\n", + "def run_openai_agent(question):\n", + " with tracer.start_as_current_span(\"OpenAI-Agent-Trace\") as span:\n", + " span.set_attribute(\"langfuse.tag\", \"dataset-run\")\n", + "\n", + " # Run your agent with a query\n", + " result = Runner.run_sync(agent, question)\n", + "\n", + " # Get the Langfuse trace_id to link the dataset run item to the agent trace\n", + " current_span = trace.get_current_span()\n", + " span_context = current_span.get_span_context()\n", + " trace_id = span_context.trace_id\n", + " formatted_trace_id = format_trace_id(trace_id)\n", + "\n", + " langfuse_trace = langfuse.trace(\n", + " id=formatted_trace_id,\n", + " input=question,\n", + " output=result.final_output\n", + " )\n", + " return langfuse_trace, result.final_output" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "nF6JLCsYfoST", + "outputId": "84c3b74e-1aa3-4ef2-f285-f390095b03c0" + }, + "outputs": [], + "source": [ + "dataset = langfuse.get_dataset(langfuse_dataset_name)\n", + "\n", + "# Run our agent against each dataset item\n", + "for item in dataset.items:\n", + " langfuse_trace, output = run_openai_agent(item.input[\"text\"])\n", + "\n", + " # Link the trace to the dataset item for analysis\n", + " item.link(\n", + " langfuse_trace,\n", + " run_name=\"openai-agent-run-03\",\n", + " run_metadata={ \"search_context_size\": \"high\"}\n", + " )\n", + "\n", + " # Optionally, store a quick evaluation score for demonstration\n", + " langfuse_trace.score(\n", + " name=\"\",\n", + " value=1,\n", + " comment=\"This is a comment\"\n", + " )\n", + "\n", + "# Flush data to ensure all telemetry is sent\n", + "langfuse.flush()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2kUYV69HfoST" + }, + "source": [ + "You can repeat this process with different:\n", + "- Search tools (e.g. different context sized for OpenAI's `WebSearchTool`)\n", + "- Models (gpt-4o-mini, o1, etc.)\n", + "- Tools (search vs. no search)\n", + "\n", + "Then compare them side-by-side in Langfuse. In this example, I did run the agent 3 times on the 50 dataset questions. For each run, I used a different setting for the context size of OpenAI's `WebSearchTool`. You can see that an increased context size also slightly increased the answer correctness from `0.89` to `0.92`. The `correct_answer` score is created by an [LLM-as-a-Judge Evaluator](https://langfuse.com/docs/scores/model-based-evals) that is set up to judge the correctness of the question based on the sample answer given in the dataset.\n", + "\n", + "![Dataset run overview](https://langfuse.com/images/cookbook/integration_openai-agents/dataset_runs.png)\n", + "![Dataset run comparison](https://langfuse.com/images/cookbook/integration_openai-agents/dataset-run-comparison.png)\n" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.2" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "07431a8d1a2044b6baebcef3242f41c5": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "07f767145c7741bfb950cd983c757cc7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0f5356bc1d4a4e0f895fa9482bb06a7a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_427363e7b1ce4c7387b93f2794bdedb4", + "IPY_MODEL_690c3eb1f314478083b725bcb1d38a25", + "IPY_MODEL_edc537f5b13348c6b831365930c0cf31" + ], + "layout": "IPY_MODEL_7f0d7cf658054c448668bebacd96a59d" + } + }, + "12b08dc8912c4bfc81c81524eabc7898": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_15fa9808df56469db8cc623bd127ceae", + "placeholder": "​", + "style": "IPY_MODEL_8948024991964982ad058f199066cc55", + "value": " 316k/316k [00:00<00:00, 1.94MB/s]" + } + }, + "15fa9808df56469db8cc623bd127ceae": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "1ef6f4bf46e24841916cc9c611c1498b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "27794498f97b40a6b4ef1f2e36bf317e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7928d3e09417467aaa74cb2c6cea32ca", + "max": 2125, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_1ef6f4bf46e24841916cc9c611c1498b", + "value": 2125 + } + }, + "2b4fb4ce8c71405d80da978589750950": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "33f6af1d99a9451a86e6e6690cec7e43": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_722784d0cb184f02b5250c57829cfdc4", + "IPY_MODEL_e98c3c9567334b69a46ab23cd378358e", + "IPY_MODEL_f0c04648902343288b4248dbf5589d7b" + ], + "layout": "IPY_MODEL_b32d7f992f064016ab548a961569b632" + } + }, + "35fb6e8a89194653abcaa84f6da4f190": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_07f767145c7741bfb950cd983c757cc7", + "placeholder": "​", + "style": "IPY_MODEL_b620b045db0242189200a99090ea6b9f", + "value": " 2.12k/2.12k [00:00<00:00, 37.6kB/s]" + } + }, + "427363e7b1ce4c7387b93f2794bdedb4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_71277158b5ec43c8aa0accb86b15952d", + "placeholder": "​", + "style": "IPY_MODEL_c6b284ea83ee4de0a6de5210067b4b6b", + "value": "Generating train split: 100%" + } + }, + "57152a81a7a24410992ccce525f28af0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5a2b1d2255a34a7597b263755eaa14b3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "636090e9932f4ff6a76152c64c92347d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "690c3eb1f314478083b725bcb1d38a25": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c248d2dba9484807bc53d23ece012644", + "max": 934, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_2b4fb4ce8c71405d80da978589750950", + "value": 934 + } + }, + "6b2cbe08ebad4df8bf9e4711d78920f2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "71277158b5ec43c8aa0accb86b15952d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "722784d0cb184f02b5250c57829cfdc4": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a95842a82d46492796b7d0fd45bb9795", + "placeholder": "​", + "style": "IPY_MODEL_9b33b4e7c6bb4a728f21258c10034066", + "value": "data-samples.json: 100%" + } + }, + "7928d3e09417467aaa74cb2c6cea32ca": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "799212d16c814bd696d57d0cdd35d9c7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7f0d7cf658054c448668bebacd96a59d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8948024991964982ad058f199066cc55": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "8b10018448324153af2ee1f9bd83d140": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_cebcce63ea37474ca10f1828105ca2e6", + "IPY_MODEL_9153dfceabff450ead31493c3c518d4c" + ], + "layout": "IPY_MODEL_5a2b1d2255a34a7597b263755eaa14b3" + } + }, + "9153dfceabff450ead31493c3c518d4c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ButtonModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ButtonModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ButtonView", + "button_style": "", + "description": "👎", + "disabled": false, + "icon": "thumbs-down", + "layout": "IPY_MODEL_ecd5521cdbc34eb7a866b4b2094fd500", + "style": "IPY_MODEL_df007d6320cb4198a6dbf58485980394", + "tooltip": "" + } + }, + "93572e5407ea443999d1da45280741f7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9b33b4e7c6bb4a728f21258c10034066": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9e40a549b3094b3892de5159dfe935ff": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a049a1b305ed4660bc48f81fe1c6c0a7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a95842a82d46492796b7d0fd45bb9795": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b32d7f992f064016ab548a961569b632": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b620b045db0242189200a99090ea6b9f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "bdb56cd2387e4e30a1bf82beecf481d1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a049a1b305ed4660bc48f81fe1c6c0a7", + "max": 316103, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_9e40a549b3094b3892de5159dfe935ff", + "value": 316103 + } + }, + "c248d2dba9484807bc53d23ece012644": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c6b284ea83ee4de0a6de5210067b4b6b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c8b3aa3aeec046ef8acfab640c2dee17": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cebcce63ea37474ca10f1828105ca2e6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ButtonModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ButtonModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ButtonView", + "button_style": "", + "description": "👍", + "disabled": false, + "icon": "thumbs-up", + "layout": "IPY_MODEL_c8b3aa3aeec046ef8acfab640c2dee17", + "style": "IPY_MODEL_ee1b1596e6ec42029fbf8b711c0fc41a", + "tooltip": "" + } + }, + "dbf84b0798e0432599453e370740acaa": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f05fbf51e518417ea66a24db7b0a472e", + "IPY_MODEL_27794498f97b40a6b4ef1f2e36bf317e", + "IPY_MODEL_35fb6e8a89194653abcaa84f6da4f190" + ], + "layout": "IPY_MODEL_93572e5407ea443999d1da45280741f7" + } + }, + "df007d6320cb4198a6dbf58485980394": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ButtonStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ButtonStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "button_color": null, + "font_weight": "" + } + }, + "e2941796709b4284aa3b6143b19e064e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e815dd583c3243efa5d3b57672519f74": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "e98c3c9567334b69a46ab23cd378358e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_6b2cbe08ebad4df8bf9e4711d78920f2", + "max": 2479, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_ed53bc55a6da404bac64993433f78ccf", + "value": 2479 + } + }, + "ea5cea15ae5741418720f77d5879ecc2": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f8be1fdbe50649ebb614194c140e0d9a", + "IPY_MODEL_bdb56cd2387e4e30a1bf82beecf481d1", + "IPY_MODEL_12b08dc8912c4bfc81c81524eabc7898" + ], + "layout": "IPY_MODEL_57152a81a7a24410992ccce525f28af0" + } + }, + "ecd5521cdbc34eb7a866b4b2094fd500": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ed53bc55a6da404bac64993433f78ccf": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "edc537f5b13348c6b831365930c0cf31": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_07431a8d1a2044b6baebcef3242f41c5", + "placeholder": "​", + "style": "IPY_MODEL_e2941796709b4284aa3b6143b19e064e", + "value": " 934/934 [00:00<00:00, 33.25 examples/s]" + } + }, + "ee1b1596e6ec42029fbf8b711c0fc41a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ButtonStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ButtonStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "button_color": null, + "font_weight": "" + } + }, + "f05fbf51e518417ea66a24db7b0a472e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f3574545791843998157f0a3176e0ded", + "placeholder": "​", + "style": "IPY_MODEL_636090e9932f4ff6a76152c64c92347d", + "value": "README.md: 100%" + } + }, + "f0c04648902343288b4248dbf5589d7b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f68c9919bd26417cb1f3950f121276e3", + "placeholder": "​", + "style": "IPY_MODEL_e815dd583c3243efa5d3b57672519f74", + "value": " 2.48k/2.48k [00:00<00:00, 58.3kB/s]" + } + }, + "f3574545791843998157f0a3176e0ded": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f68c9919bd26417cb1f3950f121276e3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f6935aa898f544d2b4051ee259ec9e30": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f8be1fdbe50649ebb614194c140e0d9a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_799212d16c814bd696d57d0cdd35d9c7", + "placeholder": "​", + "style": "IPY_MODEL_f6935aa898f544d2b4051ee259ec9e30", + "value": "data.jsonl: 100%" + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} From 0de20a548e844c654e58234983ff4c17ff5f48c2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jannik=20Maierh=C3=B6fer?= Date: Mon, 31 Mar 2025 15:30:38 +0200 Subject: [PATCH 2/2] edit tree and author --- authors.yaml | 5 +++++ registry.yaml | 8 ++++++++ 2 files changed, 13 insertions(+) diff --git a/authors.yaml b/authors.yaml index 0978111801..24f68d6e4f 100644 --- a/authors.yaml +++ b/authors.yaml @@ -267,3 +267,8 @@ erikakettleson-openai: name: "Erika Kettleson" website: "https://www.linkedin.com/in/erika-kettleson-85763196/" avatar: "https://avatars.githubusercontent.com/u/186107044?v=4" + +jannik-maierhofer: + name: "Jannik Maierhöfer" + website: "https://www.linkedin.com/in/maierhoefer/" + avatar: "https://avatars.githubusercontent.com/u/48529566" diff --git a/registry.yaml b/registry.yaml index 4b11b77ec5..090a1178da 100644 --- a/registry.yaml +++ b/registry.yaml @@ -1868,3 +1868,11 @@ - audio - speech +- title: Evaluating OpenAI Agents + path: examples/agents_sdk/evaluate_agents.ipynb + date: 2025-03-31 + authors: + - jannik-maierhofer + tags: + - agents + - evaluation