diff --git a/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/README.md b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/README.md new file mode 100644 index 0000000..72f326b --- /dev/null +++ b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/README.md @@ -0,0 +1,52 @@ +# Agentic Tool-Calling with Nemotron 3 Super + +Build multi-step AI agents that plan, call tools, and synthesize results using Nemotron 3 Super's structured function-calling capabilities. + +## Overview + +This example demonstrates how to build agentic workflows with Nemotron 3 Super, progressing from simple tool calls to a fully autonomous agent loop: + +1. **Single Tool Call** - Model selects and invokes one function +2. **Multi-Turn Tool Calling** - Model chains tool results across conversation turns +3. **Autonomous Agent Loop** - Model plans a strategy, executes multiple tools, and synthesizes a final report +4. **Reasoning Modes** - Compare `reasoning-off`, `regular`, and `low-effort` modes with tool calling + +## Models Used + +| Component | Model | Parameters | Deployment | +|-----------|-------|------------|------------| +| **Reasoning + Tool Calling** | `nvidia/nemotron-3-super-120b-a12b` | 120B total / 12B active | NVIDIA API or self-hosted (vLLM) | + +## Why Nemotron 3 Super for Agents? + +- **85.6% on PinchBench** - Best open model for agentic tasks +- **Trained on 21 RL environments** including TerminalBench, TauBench V2, and SWE-Bench +- **Structured tool calling** with JSON schema support via OpenAI-compatible API +- **Three reasoning modes** for balancing speed vs. 
depth in tool-calling scenarios +- **Hybrid Mamba-Transformer MoE** architecture delivers high throughput at inference time + +## Requirements + +- Python 3.10+ +- NVIDIA API Key ([get one here](https://build.nvidia.com/)) + +## Quick Start + +```bash +# Install dependencies +pip install openai + +# Set your API key +export NVIDIA_API_KEY="your-key-here" + +# Run the notebook +jupyter notebook agentic_tool_calling_tutorial.ipynb +``` + +## What You'll Learn + +- How to define tools with JSON schema for Nemotron 3 Super +- Building a tool-calling conversation loop with proper message threading +- Implementing an autonomous agent that plans and executes multi-step tasks +- Choosing the right reasoning mode for different agentic scenarios +- Best practices for system prompts, error handling, and tool result formatting diff --git a/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/agentic_tool_calling_tutorial.ipynb b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/agentic_tool_calling_tutorial.ipynb new file mode 100644 index 0000000..d2a0ce0 --- /dev/null +++ b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/agentic_tool_calling_tutorial.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Agentic Tool-Calling with Nemotron 3 Super\n", + "\n", + "This notebook demonstrates how to build multi-step AI agents using **Nemotron 3 Super's** structured tool-calling capabilities. We progress from a single tool call to a fully autonomous agent loop that plans, executes, and synthesizes.\n", + "\n", + "| Component | Model | Parameters | Deployment |\n", + "|-----------|-------|------------|------------|\n", + "| **Reasoning + Tool Calling** | `nvidia/nemotron-3-super-120b-a12b` | 120B total / 12B active | NVIDIA API |\n", + "\n", + "**Prerequisites:** An NVIDIA API key from [build.nvidia.com](https://build.nvidia.com/)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q openai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import os\n", + "from getpass import getpass\n", + "\n", + "from openai import OpenAI\n", + "\n", + "NVIDIA_API_KEY = os.environ.get(\"NVIDIA_API_KEY\") or getpass(\"NVIDIA API key: \").strip()\n", + "\n", + "client = OpenAI(\n", + " base_url=\"https://integrate.api.nvidia.com/v1\",\n", + " api_key=NVIDIA_API_KEY,\n", + ")\n", + "\n", + "MODEL = \"nvidia/nemotron-3-super-120b-a12b\"\n", + "\n", + "print(f\"Using model: {MODEL}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Define Tools\n", + "\n", + "We define a set of tools that our agent can use. Each tool has a JSON schema describing its parameters, following the OpenAI function-calling format that Nemotron 3 Super supports natively.\n", + "\n", + "Our agent will be a **research assistant** that can search for information, read documents, extract structured data, perform calculations, and save reports." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Tool definitions using OpenAI-compatible JSON schema format\n", + "TOOLS = [\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"search_knowledge_base\",\n", + " \"description\": \"Search a knowledge base for information on a topic. 
Returns a list of relevant snippets with source references.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"query\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The search query\"\n", + " },\n", + " \"max_results\": {\n", + " \"type\": \"integer\",\n", + " \"description\": \"Maximum number of results to return\",\n", + " \"default\": 3\n", + " }\n", + " },\n", + " \"required\": [\"query\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"read_document\",\n", + " \"description\": \"Read the full text of a document given its identifier.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"document_id\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The unique identifier of the document to read\"\n", + " }\n", + " },\n", + " \"required\": [\"document_id\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"extract_structured_data\",\n", + " \"description\": \"Extract structured fields from unstructured text. 
Returns a JSON object with the requested fields.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"text\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The text to extract data from\"\n", + " },\n", + " \"fields\": {\n", + " \"type\": \"array\",\n", + " \"items\": {\"type\": \"string\"},\n", + " \"description\": \"List of field names to extract\"\n", + " }\n", + " },\n", + " \"required\": [\"text\", \"fields\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"calculate\",\n", + " \"description\": \"Evaluate a mathematical expression and return the result.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"expression\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"A mathematical expression to evaluate (e.g., '(100 * 1.05) - 50')\"\n", + " }\n", + " },\n", + " \"required\": [\"expression\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"save_report\",\n", + " \"description\": \"Save a research report with a title and content.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"title\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The report title\"\n", + " },\n", + " \"content\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The full report content in markdown format\"\n", + " }\n", + " },\n", + " \"required\": [\"title\", \"content\"]\n", + " }\n", + " }\n", + " }\n", + "]\n", + "\n", + "print(f\"Defined {len(TOOLS)} tools: {[t['function']['name'] for t in TOOLS]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Simulated Tool Implementations\n", + "\n", + "These implementations simulate real tool behavior so the notebook is self-contained and requires no external services beyond the NVIDIA API." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Simulated knowledge base for the research assistant\n", + "KNOWLEDGE_BASE = {\n", + " \"nemotron-architecture\": {\n", + " \"title\": \"Nemotron 3 Super Architecture Overview\",\n", + " \"content\": (\n", + " \"Nemotron 3 Super is a 120.6B parameter hybrid Mamba-Transformer \"\n", + " \"Latent Mixture-of-Experts model with 12.7B active parameters per \"\n", + " \"forward pass. It uses 64 experts with top-4 routing for MLP layers \"\n", + " \"and Latent MoE attention with 16 experts (top-2 routing). The hybrid \"\n", + " \"architecture alternates between 32 Mamba-2 layers and 32 Transformer \"\n", + " \"layers. Context window extends to 1M tokens via YaRN-based positional \"\n", + " \"interpolation. Training used 30T tokens across pretraining, SFT, and \"\n", + " \"a three-stage RL pipeline.\"\n", + " ),\n", + " },\n", + " \"nemotron-benchmarks\": {\n", + " \"title\": \"Nemotron 3 Super Benchmark Results\",\n", + " \"content\": (\n", + " \"PinchBench: 85.6% (best open model). MATH-500: 97.4%. AIME 2025: 72.2%. \"\n", + " \"GPQA Diamond: 71.1%. LiveCodeBench v6: 63.3%. HumanEval: 92.1%. \"\n", + " \"SWE-Bench Verified: 55.4%. TerminalBench: 40.6%. TauBench V2 Airline: 62.0%. \"\n", + " \"The model achieves these scores with only 12B active parameters, \"\n", + " \"delivering 5x higher throughput than dense models of similar accuracy.\"\n", + " ),\n", + " },\n", + " \"nemotron-training\": {\n", + " \"title\": \"Nemotron 3 Super Training Pipeline\",\n", + " \"content\": (\n", + " \"Three-stage training: (1) Pretraining on 30T tokens with curriculum \"\n", + " \"learning across code, math, science, and general text. (2) Multi-domain \"\n", + " \"SFT over 7M samples covering 15+ data domains including competition \"\n", + " \"math/code, software engineering, agentic programming, CUDA, financial \"\n", + " \"reasoning, and more. 
Uses a novel two-stage SFT loss. (3) Three-stage \"\n", + " \"RL: multi-environment RLVR across 21 environments and 37 datasets, \"\n", + " \"SWE-RL with container-isolated sandbox execution, and RLHF with a \"\n", + " \"principle-following GenRM.\"\n", + " ),\n", + " },\n", + " \"mamba-architecture\": {\n", + " \"title\": \"Mamba-2 State Space Model Architecture\",\n", + " \"content\": (\n", + " \"Mamba-2 is a selective state space model that achieves linear-time \"\n", + " \"sequence processing. Unlike attention which scales quadratically, \"\n", + " \"Mamba-2 maintains a compressed state that is updated incrementally. \"\n", + " \"This enables efficient processing of very long sequences. In the \"\n", + " \"hybrid architecture, Mamba layers handle sequential dependencies \"\n", + " \"while Transformer layers provide precise attention for complex \"\n", + " \"reasoning tasks. The alternating pattern (32 Mamba + 32 Transformer) \"\n", + " \"balances efficiency and accuracy.\"\n", + " ),\n", + " },\n", + " \"latent-moe\": {\n", + " \"title\": \"Latent Mixture-of-Experts Explained\",\n", + " \"content\": (\n", + " \"Latent MoE is an architectural innovation that applies the MoE \"\n", + " \"pattern to attention layers, not just MLP layers. Traditional MoE \"\n", + " \"routes tokens to different MLP experts. Latent MoE extends this by \"\n", + " \"routing to different attention heads, effectively allowing 4x more \"\n", + " \"experts at the same compute cost. 
Nemotron 3 Super uses 16 Latent \"\n", + " \"MoE attention experts with top-2 routing in each Transformer layer, \"\n", + " \"alongside 64 MLP experts with top-4 routing.\"\n", + " ),\n", + " },\n", + "}\n", + "\n", + "SAVED_REPORTS = []\n", + "\n", + "\n", + "def execute_tool(name: str, arguments: dict) -> str:\n", + " \"\"\"Execute a tool call and return the result as a string.\"\"\"\n", + " if name == \"search_knowledge_base\":\n", + " query = arguments[\"query\"].lower()\n", + " max_results = arguments.get(\"max_results\", 3)\n", + " results = []\n", + " for doc_id, doc in KNOWLEDGE_BASE.items():\n", + " # Simple keyword matching for simulation\n", + " if any(word in doc[\"content\"].lower() or word in doc[\"title\"].lower()\n", + " for word in query.split()):\n", + " results.append({\n", + " \"document_id\": doc_id,\n", + " \"title\": doc[\"title\"],\n", + " \"snippet\": doc[\"content\"][:150] + \"...\",\n", + " })\n", + " return json.dumps(results[:max_results])\n", + "\n", + " elif name == \"read_document\":\n", + " doc_id = arguments[\"document_id\"]\n", + " if doc_id in KNOWLEDGE_BASE:\n", + " doc = KNOWLEDGE_BASE[doc_id]\n", + " return json.dumps({\"title\": doc[\"title\"], \"content\": doc[\"content\"]})\n", + " return json.dumps({\"error\": f\"Document '{doc_id}' not found\"})\n", + "\n", + " elif name == \"extract_structured_data\":\n", + " text = arguments[\"text\"]\n", + " fields = arguments[\"fields\"]\n", + " # Simulate extraction by returning placeholder values\n", + " extracted = {}\n", + " for field in fields:\n", + " field_lower = field.lower()\n", + " if \"parameter\" in field_lower and \"120\" in text:\n", + " extracted[field] = \"120.6B total, 12.7B active\"\n", + " elif \"active\" in field_lower and \"12\" in text:\n", + " extracted[field] = \"12.7B\"\n", + " elif \"context\" in field_lower and \"1M\" in text:\n", + " extracted[field] = \"1M tokens\"\n", + " elif \"expert\" in field_lower and \"64\" in text:\n", + " extracted[field] = \"64 
MLP experts (top-4), 16 attention experts (top-2)\"\n", + " elif \"benchmark\" in field_lower or \"score\" in field_lower:\n", + " extracted[field] = \"PinchBench 85.6%, MATH-500 97.4%\"\n", + " else:\n", + " extracted[field] = f\"[extracted from text for '{field}']\"\n", + " return json.dumps(extracted)\n", + "\n", + " elif name == \"calculate\":\n", + " expression = arguments[\"expression\"]\n", + " try:\n", + " # Only allow safe mathematical expressions\n", + " allowed = set(\"0123456789+-*/().% \")\n", + " if all(c in allowed for c in expression):\n", + " result = eval(expression) # noqa: S307\n", + " return json.dumps({\"expression\": expression, \"result\": result})\n", + " return json.dumps({\"error\": \"Invalid expression\"})\n", + " except Exception as e:\n", + " return json.dumps({\"error\": str(e)})\n", + "\n", + " elif name == \"save_report\":\n", + " title = arguments[\"title\"]\n", + " content = arguments[\"content\"]\n", + " report = {\"title\": title, \"content\": content, \"id\": len(SAVED_REPORTS) + 1}\n", + " SAVED_REPORTS.append(report)\n", + " return json.dumps({\"status\": \"saved\", \"report_id\": report[\"id\"], \"title\": title})\n", + "\n", + " return json.dumps({\"error\": f\"Unknown tool: {name}\"})\n", + "\n", + "\n", + "print(\"Tool implementations ready.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Single Tool Call\n", + "\n", + "The simplest case: the model decides to call one tool to answer a question." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a research assistant. 
Use the provided tools to answer questions accurately.\"},\n", + " {\"role\": \"user\", \"content\": \"What is Nemotron 3 Super's score on PinchBench?\"},\n", + " ],\n", + " tools=TOOLS,\n", + " tool_choice=\"auto\",\n", + ")\n", + "\n", + "message = response.choices[0].message\n", + "print(f\"Model decided to call: {message.tool_calls[0].function.name}\")\n", + "print(f\"With arguments: {message.tool_calls[0].function.arguments}\")\n", + "\n", + "# Execute the tool\n", + "tool_call = message.tool_calls[0]\n", + "result = execute_tool(tool_call.function.name, json.loads(tool_call.function.arguments))\n", + "print(f\"\\nTool result: {result}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now feed the tool result back to the model so it can formulate a natural language answer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Continue the conversation with the tool result\n", + "follow_up = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a research assistant. Use the provided tools to answer questions accurately.\"},\n", + " {\"role\": \"user\", \"content\": \"What is Nemotron 3 Super's score on PinchBench?\"},\n", + " message, # The assistant's tool-call message\n", + " {\"role\": \"tool\", \"tool_call_id\": tool_call.id, \"content\": result},\n", + " ],\n", + " tools=TOOLS,\n", + " tool_choice=\"auto\",\n", + ")\n", + "\n", + "print(\"Model's answer:\")\n", + "print(follow_up.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Multi-Turn Tool Calling\n", + "\n", + "In a multi-turn scenario, the model calls a tool, gets the result, then decides whether to call another tool or provide a final answer. 
This enables the model to gather information incrementally.\n", + "\n", + "Here we implement a helper that runs the tool-calling loop until the model produces a final text response." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def run_tool_loop(\n", + " messages: list[dict],\n", + " tools: list[dict],\n", + " max_turns: int = 10,\n", + " verbose: bool = True,\n", + ") -> str:\n", + " \"\"\"\n", + " Run a tool-calling loop until the model produces a final text response\n", + " or the maximum number of turns is reached.\n", + "\n", + " Returns the model's final text response.\n", + " \"\"\"\n", + " for turn in range(max_turns):\n", + " response = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages,\n", + " tools=tools,\n", + " tool_choice=\"auto\",\n", + " )\n", + "\n", + " assistant_message = response.choices[0].message\n", + " messages.append(assistant_message)\n", + "\n", + " # If the model didn't call any tools, we have our final answer\n", + " if not assistant_message.tool_calls:\n", + " if verbose:\n", + " print(f\"\\n[Turn {turn + 1}] Final answer (no more tool calls)\")\n", + " return assistant_message.content\n", + "\n", + " # Execute each tool call and add results to the conversation\n", + " for tool_call in assistant_message.tool_calls:\n", + " args = json.loads(tool_call.function.arguments)\n", + " if verbose:\n", + " print(f\"[Turn {turn + 1}] Calling {tool_call.function.name}({json.dumps(args, indent=2)})\")\n", + "\n", + " result = execute_tool(tool_call.function.name, args)\n", + " if verbose:\n", + " print(f\" -> Result: {result[:200]}{'...' 
if len(result) > 200 else ''}\")\n", + "\n", + " messages.append({\n", + " \"role\": \"tool\",\n", + " \"tool_call_id\": tool_call.id,\n", + " \"content\": result,\n", + " })\n", + "\n", + " return \"[Max turns reached without a final answer]\"\n", + "\n", + "\n", + "print(\"Tool loop helper defined.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Multi-turn example: a question that requires multiple tool calls\n", + "messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": (\n", + " \"You are a research assistant with access to a technical knowledge base. \"\n", + " \"Search for information, read documents, and extract data to answer questions. \"\n", + " \"Always cite your sources by document ID.\"\n", + " ),\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": (\n", + " \"Compare the Mamba and Transformer components in Nemotron 3 Super. \"\n", + " \"How many layers of each type are there, and what role does each play?\"\n", + " ),\n", + " },\n", + "]\n", + "\n", + "answer = run_tool_loop(messages, TOOLS)\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"FINAL ANSWER:\")\n", + "print(\"=\" * 60)\n", + "print(answer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Autonomous Agent Loop\n", + "\n", + "Now we build a fully autonomous agent. Given a complex research task, the model:\n", + "1. **Plans** what information it needs\n", + "2. **Searches** the knowledge base\n", + "3. **Reads** relevant documents in full\n", + "4. **Extracts** structured data\n", + "5. **Synthesizes** findings into a report\n", + "6. **Saves** the report\n", + "\n", + "The system prompt instructs the model to work autonomously through all these steps." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "AGENT_SYSTEM_PROMPT = \"\"\"\\\n", + "You are an autonomous research agent. 
When given a research task, you must:\n", + "\n", + "1. Search the knowledge base to find relevant documents\n", + "2. Read the full text of the most relevant documents\n", + "3. Extract structured data from the documents as needed\n", + "4. Perform any calculations required\n", + "5. Synthesize your findings into a comprehensive report\n", + "6. Save the report using the save_report tool\n", + "\n", + "Work through these steps autonomously. Do not ask the user for clarification;\n", + "use your best judgment. After saving the report, provide a brief summary\n", + "of your findings to the user.\n", + "\n", + "Always be thorough: read multiple documents when available, cross-reference\n", + "information, and note any gaps in the available data.\"\"\"\n", + "\n", + "# Reset saved reports\n", + "SAVED_REPORTS.clear()\n", + "\n", + "messages = [\n", + "    {\"role\": \"system\", \"content\": AGENT_SYSTEM_PROMPT},\n", + "    {\n", + "        \"role\": \"user\",\n", + "        \"content\": (\n", + "            \"Research the Nemotron 3 Super model architecture and training pipeline. \"\n", + "            \"I need a report covering: (1) the hybrid architecture design and why it \"\n", + "            \"was chosen, (2) the key training stages and techniques used, (3) the \"\n", + "            \"resulting benchmark performance. Calculate the ratio of active to total \"\n", + "            \"parameters and explain what this means for inference efficiency. 
\"\n", + " \"Save a complete report when done.\"\n", + " ),\n", + " },\n", + "]\n", + "\n", + "print(\"Starting autonomous agent...\")\n", + "print(\"=\" * 60)\n", + "answer = run_tool_loop(messages, TOOLS, max_turns=15)\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"AGENT SUMMARY:\")\n", + "print(\"=\" * 60)\n", + "print(answer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# View the saved report\n", + "if SAVED_REPORTS:\n", + " report = SAVED_REPORTS[-1]\n", + " print(f\"Report #{report['id']}: {report['title']}\")\n", + " print(\"-\" * 60)\n", + " print(report[\"content\"])\n", + "else:\n", + " print(\"No reports saved yet.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Reasoning Modes with Tool Calling\n", + "\n", + "Nemotron 3 Super supports three reasoning modes that affect how it approaches tool-calling tasks:\n", + "\n", + "| Mode | Behavior | Best For |\n", + "|------|----------|----------|\n", + "| **`reasoning-off`** | Direct tool selection, no internal deliberation | Simple lookups, high-throughput pipelines |\n", + "| **`regular`** | Full chain-of-thought before tool selection | Complex multi-step tasks |\n", + "| **`low-effort`** | Brief reasoning, then tool selection | Balanced speed/accuracy |\n", + "\n", + "Let's compare how each mode handles the same task." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "REASONING_TASK = (\n", + " \"Find out how many MLP experts and attention experts Nemotron 3 Super uses, \"\n", + " \"then calculate the total number of experts across both types.\"\n", + ")\n", + "\n", + "\n", + "def run_with_reasoning_mode(mode: str) -> None:\n", + " \"\"\"Run the same task with a specific reasoning mode.\"\"\"\n", + " print(f\"\\n{'=' * 60}\")\n", + " print(f\"Mode: {mode}\")\n", + " print(\"=\" * 60)\n", + "\n", + " extra_body = {\"chat_template_kwargs\": {}}\n", + " if mode == \"reasoning-off\":\n", + " extra_body[\"chat_template_kwargs\"][\"enable_thinking\"] = False\n", + " elif mode == \"regular\":\n", + " extra_body[\"chat_template_kwargs\"][\"enable_thinking\"] = True\n", + " elif mode == \"low-effort\":\n", + " extra_body[\"chat_template_kwargs\"][\"enable_thinking\"] = True\n", + " extra_body[\"chat_template_kwargs\"][\"low_effort\"] = True\n", + "\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": \"You are a research assistant. 
Use tools to answer accurately.\"},\n", + " {\"role\": \"user\", \"content\": REASONING_TASK},\n", + " ]\n", + "\n", + " response = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages,\n", + " tools=TOOLS,\n", + " tool_choice=\"auto\",\n", + " extra_body=extra_body,\n", + " )\n", + "\n", + " msg = response.choices[0].message\n", + " usage = response.usage\n", + " print(f\"Tokens used: {usage.total_tokens} (prompt: {usage.prompt_tokens}, completion: {usage.completion_tokens})\")\n", + "\n", + " if msg.tool_calls:\n", + " for tc in msg.tool_calls:\n", + " print(f\" Tool call: {tc.function.name}({tc.function.arguments})\")\n", + " else:\n", + " print(f\" Direct answer: {msg.content[:200]}\")\n", + "\n", + "\n", + "for mode in [\"reasoning-off\", \"low-effort\", \"regular\"]:\n", + " run_with_reasoning_mode(mode)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Best Practices\n", + "\n", + "### System Prompt Design\n", + "\n", + "Effective system prompts for tool-calling agents should:\n", + "- Clearly state the agent's role and available capabilities\n", + "- Specify when to use tools vs. answer directly\n", + "- Define the expected workflow (search -> read -> extract -> synthesize)\n", + "- Set autonomy level (ask for clarification vs. use best judgment)\n", + "\n", + "### Tool Schema Design\n", + "\n", + "- Use clear, descriptive names (`search_knowledge_base` not `search`)\n", + "- Write detailed descriptions - the model reads these to decide which tool to use\n", + "- Mark required vs. 
optional parameters explicitly\n", + "- Include `default` values for optional parameters\n", + "- Use specific types (`integer` not `number` when appropriate)\n", + "\n", + "### Handling Edge Cases\n", + "\n", + "- **Boolean parameters**: Use JSON `true`/`false`, not Python strings `\"True\"`/`\"False\"` (see [issue #52](https://github.com/NVIDIA-NeMo/Nemotron/issues/52))\n", + "- **Token limits**: For long tool results, consider truncating or summarizing before passing back\n", + "- **Error handling**: Return structured error JSON from tools so the model can recover\n", + "- **Max turns**: Always set a maximum iteration count to prevent infinite loops\n", + "\n", + "### Choosing a Reasoning Mode\n", + "\n", + "| Scenario | Recommended Mode |\n", + "|----------|------------------|\n", + "| Simple data lookup | `reasoning-off` (fastest) |\n", + "| Multi-step research task | `regular` (most thorough) |\n", + "| Production pipeline with latency constraints | `low-effort` (balanced) |\n", + "| Debugging tool-calling behavior | `regular` (shows reasoning) |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next Steps\n", + "\n", + "- **Self-hosted deployment**: Replace the NVIDIA API with a local vLLM server for full control. See the [vLLM cookbook](../../usage-cookbook/Nemotron-3-Super/vllm_cookbook.ipynb).\n", + "- **Real tools**: Connect to live APIs (web search, databases, file systems) instead of simulated tools.\n", + "- **Multi-agent patterns**: Orchestrate multiple Nemotron agents with different system prompts and tool sets.\n", + "- **Streaming**: Use `stream=True` with `delta.tool_calls` for real-time tool-call streaming. See the [Getting Started Guide](../Nemotron-3-Super-Getting-Started-Guide/Nemotron-3-Super-Getting-Started-Guide.ipynb)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}