diff --git a/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/README.md b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/README.md new file mode 100644 index 0000000..72f326b --- /dev/null +++ b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/README.md @@ -0,0 +1,52 @@ +# Agentic Tool-Calling with Nemotron 3 Super + +Build multi-step AI agents that plan, call tools, and synthesize results using Nemotron 3 Super's structured function-calling capabilities. + +## Overview + +This example demonstrates how to build agentic workflows with Nemotron 3 Super, progressing from simple tool calls to a fully autonomous agent loop: + +1. **Single Tool Call** - Model selects and invokes one function +2. **Multi-Turn Tool Calling** - Model chains tool results across conversation turns +3. **Autonomous Agent Loop** - Model plans a strategy, executes multiple tools, and synthesizes a final report +4. **Reasoning Modes** - Compare `reasoning-off`, `regular`, and `low-effort` modes with tool calling + +## Models Used + +| Component | Model | Parameters | Deployment | +|-----------|-------|------------|------------| +| **Reasoning + Tool Calling** | `nvidia/nemotron-3-super-120b-a12b` | 120B total / 12B active | NVIDIA API or self-hosted (vLLM) | + +## Why Nemotron 3 Super for Agents? + +- **85.6% on PinchBench** - Best open model for agentic tasks +- **Trained on 21 RL environments** including TerminalBench, TauBench V2, and SWE-Bench +- **Structured tool calling** with JSON schema support via OpenAI-compatible API +- **Three reasoning modes** for balancing speed vs. 
depth in tool-calling scenarios +- **Hybrid Mamba-Transformer MoE** architecture delivers high throughput at inference time + +## Requirements + +- Python 3.10+ +- NVIDIA API Key ([get one here](https://build.nvidia.com/)) + +## Quick Start + +```bash +# Install dependencies +pip install openai + +# Set your API key +export NVIDIA_API_KEY="your-key-here" + +# Run the notebook +jupyter notebook agentic_tool_calling_tutorial.ipynb +``` + +## What You'll Learn + +- How to define tools with JSON schema for Nemotron 3 Super +- Building a tool-calling conversation loop with proper message threading +- Implementing an autonomous agent that plans and executes multi-step tasks +- Choosing the right reasoning mode for different agentic scenarios +- Best practices for system prompts, error handling, and tool result formatting diff --git a/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/agentic_tool_calling_tutorial.ipynb b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/agentic_tool_calling_tutorial.ipynb new file mode 100644 index 0000000..d2a0ce0 --- /dev/null +++ b/use-case-examples/Agentic-Tool-Calling-with-Nemotron-Super/agentic_tool_calling_tutorial.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Agentic Tool-Calling with Nemotron 3 Super\n", + "\n", + "This notebook demonstrates how to build multi-step AI agents using **Nemotron 3 Super's** structured tool-calling capabilities. We progress from a single tool call to a fully autonomous agent loop that plans, executes, and synthesizes.\n", + "\n", + "| Component | Model | Parameters | Deployment |\n", + "|-----------|-------|------------|------------|\n", + "| **Reasoning + Tool Calling** | `nvidia/nemotron-3-super-120b-a12b` | 120B total / 12B active | NVIDIA API |\n", + "\n", + "**Prerequisites:** An NVIDIA API key from [build.nvidia.com](https://build.nvidia.com/)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q openai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import os\n", + "from getpass import getpass\n", + "\n", + "from openai import OpenAI\n", + "\n", + "NVIDIA_API_KEY = os.environ.get(\"NVIDIA_API_KEY\") or getpass(\"NVIDIA API key: \").strip()\n", + "\n", + "client = OpenAI(\n", + " base_url=\"https://integrate.api.nvidia.com/v1\",\n", + " api_key=NVIDIA_API_KEY,\n", + ")\n", + "\n", + "MODEL = \"nvidia/nemotron-3-super-120b-a12b\"\n", + "\n", + "print(f\"Using model: {MODEL}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Define Tools\n", + "\n", + "We define a set of tools that our agent can use. Each tool has a JSON schema describing its parameters, following the OpenAI function-calling format that Nemotron 3 Super supports natively.\n", + "\n", + "Our agent will be a **research assistant** that can search for information, read documents, extract structured data, perform calculations, and save reports." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Tool definitions using OpenAI-compatible JSON schema format\n", + "TOOLS = [\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"search_knowledge_base\",\n", + " \"description\": \"Search a knowledge base for information on a topic. 
Returns a list of relevant snippets with source references.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"query\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The search query\"\n", + " },\n", + " \"max_results\": {\n", + " \"type\": \"integer\",\n", + " \"description\": \"Maximum number of results to return\",\n", + " \"default\": 3\n", + " }\n", + " },\n", + " \"required\": [\"query\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"read_document\",\n", + " \"description\": \"Read the full text of a document given its identifier.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"document_id\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The unique identifier of the document to read\"\n", + " }\n", + " },\n", + " \"required\": [\"document_id\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"extract_structured_data\",\n", + " \"description\": \"Extract structured fields from unstructured text. 
Returns a JSON object with the requested fields.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"text\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The text to extract data from\"\n", + " },\n", + " \"fields\": {\n", + " \"type\": \"array\",\n", + " \"items\": {\"type\": \"string\"},\n", + " \"description\": \"List of field names to extract\"\n", + " }\n", + " },\n", + " \"required\": [\"text\", \"fields\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"calculate\",\n", + " \"description\": \"Evaluate a mathematical expression and return the result.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"expression\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"A mathematical expression to evaluate (e.g., '(100 * 1.05) - 50')\"\n", + " }\n", + " },\n", + " \"required\": [\"expression\"]\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"function\",\n", + " \"function\": {\n", + " \"name\": \"save_report\",\n", + " \"description\": \"Save a research report with a title and content.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"title\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The report title\"\n", + " },\n", + " \"content\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The full report content in markdown format\"\n", + " }\n", + " },\n", + " \"required\": [\"title\", \"content\"]\n", + " }\n", + " }\n", + " }\n", + "]\n", + "\n", + "print(f\"Defined {len(TOOLS)} tools: {[t['function']['name'] for t in TOOLS]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Simulated Tool Implementations\n", + "\n", + "These implementations simulate real tool behavior so the notebook is self-contained and requires no external services beyond the NVIDIA API." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Simulated knowledge base for the research assistant\n", + "KNOWLEDGE_BASE = {\n", + " \"nemotron-architecture\": {\n", + " \"title\": \"Nemotron 3 Super Architecture Overview\",\n", + " \"content\": (\n", + " \"Nemotron 3 Super is a 120.6B parameter hybrid Mamba-Transformer \"\n", + " \"Latent Mixture-of-Experts model with 12.7B active parameters per \"\n", + " \"forward pass. It uses 64 experts with top-4 routing for MLP layers \"\n", + " \"and Latent MoE attention with 16 experts (top-2 routing). The hybrid \"\n", + " \"architecture alternates between 32 Mamba-2 layers and 32 Transformer \"\n", + " \"layers. Context window extends to 1M tokens via YaRN-based positional \"\n", + " \"interpolation. Training used 30T tokens across pretraining, SFT, and \"\n", + " \"a three-stage RL pipeline.\"\n", + " ),\n", + " },\n", + " \"nemotron-benchmarks\": {\n", + " \"title\": \"Nemotron 3 Super Benchmark Results\",\n", + " \"content\": (\n", + " \"PinchBench: 85.6% (best open model). MATH-500: 97.4%. AIME 2025: 72.2%. \"\n", + " \"GPQA Diamond: 71.1%. LiveCodeBench v6: 63.3%. HumanEval: 92.1%. \"\n", + " \"SWE-Bench Verified: 55.4%. TerminalBench: 40.6%. TauBench V2 Airline: 62.0%. \"\n", + " \"The model achieves these scores with only 12B active parameters, \"\n", + " \"delivering 5x higher throughput than dense models of similar accuracy.\"\n", + " ),\n", + " },\n", + " \"nemotron-training\": {\n", + " \"title\": \"Nemotron 3 Super Training Pipeline\",\n", + " \"content\": (\n", + " \"Three-stage training: (1) Pretraining on 30T tokens with curriculum \"\n", + " \"learning across code, math, science, and general text. (2) Multi-domain \"\n", + " \"SFT over 7M samples covering 15+ data domains including competition \"\n", + " \"math/code, software engineering, agentic programming, CUDA, financial \"\n", + " \"reasoning, and more. 
Uses a novel two-stage SFT loss. (3) Three-stage \"\n", + " \"RL: multi-environment RLVR across 21 environments and 37 datasets, \"\n", + " \"SWE-RL with container-isolated sandbox execution, and RLHF with a \"\n", + " \"principle-following GenRM.\"\n", + " ),\n", + " },\n", + " \"mamba-architecture\": {\n", + " \"title\": \"Mamba-2 State Space Model Architecture\",\n", + " \"content\": (\n", + " \"Mamba-2 is a selective state space model that achieves linear-time \"\n", + " \"sequence processing. Unlike attention which scales quadratically, \"\n", + " \"Mamba-2 maintains a compressed state that is updated incrementally. \"\n", + " \"This enables efficient processing of very long sequences. In the \"\n", + " \"hybrid architecture, Mamba layers handle sequential dependencies \"\n", + " \"while Transformer layers provide precise attention for complex \"\n", + " \"reasoning tasks. The alternating pattern (32 Mamba + 32 Transformer) \"\n", + " \"balances efficiency and accuracy.\"\n", + " ),\n", + " },\n", + " \"latent-moe\": {\n", + " \"title\": \"Latent Mixture-of-Experts Explained\",\n", + " \"content\": (\n", + " \"Latent MoE is an architectural innovation that applies the MoE \"\n", + " \"pattern to attention layers, not just MLP layers. Traditional MoE \"\n", + " \"routes tokens to different MLP experts. Latent MoE extends this by \"\n", + " \"routing to different attention heads, effectively allowing 4x more \"\n", + " \"experts at the same compute cost. 
Nemotron 3 Super uses 16 Latent \"\n", + " \"MoE attention experts with top-2 routing in each Transformer layer, \"\n", + " \"alongside 64 MLP experts with top-4 routing.\"\n", + " ),\n", + " },\n", + "}\n", + "\n", + "SAVED_REPORTS = []\n", + "\n", + "\n", + "def execute_tool(name: str, arguments: dict) -> str:\n", + " \"\"\"Execute a tool call and return the result as a string.\"\"\"\n", + " if name == \"search_knowledge_base\":\n", + " query = arguments[\"query\"].lower()\n", + " max_results = arguments.get(\"max_results\", 3)\n", + " results = []\n", + " for doc_id, doc in KNOWLEDGE_BASE.items():\n", + " # Simple keyword matching for simulation\n", + " if any(word in doc[\"content\"].lower() or word in doc[\"title\"].lower()\n", + " for word in query.split()):\n", + " results.append({\n", + " \"document_id\": doc_id,\n", + " \"title\": doc[\"title\"],\n", + " \"snippet\": doc[\"content\"][:150] + \"...\",\n", + " })\n", + " return json.dumps(results[:max_results])\n", + "\n", + " elif name == \"read_document\":\n", + " doc_id = arguments[\"document_id\"]\n", + " if doc_id in KNOWLEDGE_BASE:\n", + " doc = KNOWLEDGE_BASE[doc_id]\n", + " return json.dumps({\"title\": doc[\"title\"], \"content\": doc[\"content\"]})\n", + " return json.dumps({\"error\": f\"Document '{doc_id}' not found\"})\n", + "\n", + " elif name == \"extract_structured_data\":\n", + " text = arguments[\"text\"]\n", + " fields = arguments[\"fields\"]\n", + " # Simulate extraction by returning placeholder values\n", + " extracted = {}\n", + " for field in fields:\n", + " field_lower = field.lower()\n", + " if \"parameter\" in field_lower and \"120\" in text:\n", + " extracted[field] = \"120.6B total, 12.7B active\"\n", + " elif \"active\" in field_lower and \"12\" in text:\n", + " extracted[field] = \"12.7B\"\n", + " elif \"context\" in field_lower and \"1M\" in text:\n", + " extracted[field] = \"1M tokens\"\n", + " elif \"expert\" in field_lower and \"64\" in text:\n", + " extracted[field] = \"64 
MLP experts (top-4), 16 attention experts (top-2)\"\n", + " elif \"benchmark\" in field_lower or \"score\" in field_lower:\n", + " extracted[field] = \"PinchBench 85.6%, MATH-500 97.4%\"\n", + " else:\n", + " extracted[field] = f\"[extracted from text for '{field}']\"\n", + " return json.dumps(extracted)\n", + "\n", + " elif name == \"calculate\":\n", + " expression = arguments[\"expression\"]\n", + " try:\n", + " # Only allow safe mathematical expressions\n", + " allowed = set(\"0123456789+-*/().% \")\n", + " if all(c in allowed for c in expression):\n", + " result = eval(expression) # noqa: S307\n", + " return json.dumps({\"expression\": expression, \"result\": result})\n", + " return json.dumps({\"error\": \"Invalid expression\"})\n", + " except Exception as e:\n", + " return json.dumps({\"error\": str(e)})\n", + "\n", + " elif name == \"save_report\":\n", + " title = arguments[\"title\"]\n", + " content = arguments[\"content\"]\n", + " report = {\"title\": title, \"content\": content, \"id\": len(SAVED_REPORTS) + 1}\n", + " SAVED_REPORTS.append(report)\n", + " return json.dumps({\"status\": \"saved\", \"report_id\": report[\"id\"], \"title\": title})\n", + "\n", + " return json.dumps({\"error\": f\"Unknown tool: {name}\"})\n", + "\n", + "\n", + "print(\"Tool implementations ready.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Single Tool Call\n", + "\n", + "The simplest case: the model decides to call one tool to answer a question." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a research assistant. 
Use the provided tools to answer questions accurately.\"},\n", + " {\"role\": \"user\", \"content\": \"What is Nemotron 3 Super's score on PinchBench?\"},\n", + " ],\n", + " tools=TOOLS,\n", + " tool_choice=\"auto\",\n", + ")\n", + "\n", + "message = response.choices[0].message\n", + "print(f\"Model decided to call: {message.tool_calls[0].function.name}\")\n", + "print(f\"With arguments: {message.tool_calls[0].function.arguments}\")\n", + "\n", + "# Execute the tool\n", + "tool_call = message.tool_calls[0]\n", + "result = execute_tool(tool_call.function.name, json.loads(tool_call.function.arguments))\n", + "print(f\"\\nTool result: {result}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now feed the tool result back to the model so it can formulate a natural language answer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Continue the conversation with the tool result\n", + "follow_up = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a research assistant. Use the provided tools to answer questions accurately.\"},\n", + " {\"role\": \"user\", \"content\": \"What is Nemotron 3 Super's score on PinchBench?\"},\n", + " message, # The assistant's tool-call message\n", + " {\"role\": \"tool\", \"tool_call_id\": tool_call.id, \"content\": result},\n", + " ],\n", + " tools=TOOLS,\n", + " tool_choice=\"auto\",\n", + ")\n", + "\n", + "print(\"Model's answer:\")\n", + "print(follow_up.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Multi-Turn Tool Calling\n", + "\n", + "In a multi-turn scenario, the model calls a tool, gets the result, then decides whether to call another tool or provide a final answer. 
This enables the model to gather information incrementally.\n", + "\n", + "Here we implement a helper that runs the tool-calling loop until the model produces a final text response." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def run_tool_loop(\n", + " messages: list[dict],\n", + " tools: list[dict],\n", + " max_turns: int = 10,\n", + " verbose: bool = True,\n", + ") -> str:\n", + " \"\"\"\n", + " Run a tool-calling loop until the model produces a final text response\n", + " or the maximum number of turns is reached.\n", + "\n", + " Returns the model's final text response.\n", + " \"\"\"\n", + " for turn in range(max_turns):\n", + " response = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages,\n", + " tools=tools,\n", + " tool_choice=\"auto\",\n", + " )\n", + "\n", + " assistant_message = response.choices[0].message\n", + " messages.append(assistant_message)\n", + "\n", + " # If the model didn't call any tools, we have our final answer\n", + " if not assistant_message.tool_calls:\n", + " if verbose:\n", + " print(f\"\\n[Turn {turn + 1}] Final answer (no more tool calls)\")\n", + " return assistant_message.content\n", + "\n", + " # Execute each tool call and add results to the conversation\n", + " for tool_call in assistant_message.tool_calls:\n", + " args = json.loads(tool_call.function.arguments)\n", + " if verbose:\n", + " print(f\"[Turn {turn + 1}] Calling {tool_call.function.name}({json.dumps(args, indent=2)})\")\n", + "\n", + " result = execute_tool(tool_call.function.name, args)\n", + " if verbose:\n", + " print(f\" -> Result: {result[:200]}{'...' 
if len(result) > 200 else ''}\")\n", + "\n", + " messages.append({\n", + " \"role\": \"tool\",\n", + " \"tool_call_id\": tool_call.id,\n", + " \"content\": result,\n", + " })\n", + "\n", + " return \"[Max turns reached without a final answer]\"\n", + "\n", + "\n", + "print(\"Tool loop helper defined.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Multi-turn example: a question that requires multiple tool calls\n", + "messages = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": (\n", + " \"You are a research assistant with access to a technical knowledge base. \"\n", + " \"Search for information, read documents, and extract data to answer questions. \"\n", + " \"Always cite your sources by document ID.\"\n", + " ),\n", + " },\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": (\n", + " \"Compare the Mamba and Transformer components in Nemotron 3 Super. \"\n", + " \"How many layers of each type are there, and what role does each play?\"\n", + " ),\n", + " },\n", + "]\n", + "\n", + "answer = run_tool_loop(messages, TOOLS)\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"FINAL ANSWER:\")\n", + "print(\"=\" * 60)\n", + "print(answer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Autonomous Agent Loop\n", + "\n", + "Now we build a fully autonomous agent. Given a complex research task, the model:\n", + "1. **Plans** what information it needs\n", + "2. **Searches** the knowledge base\n", + "3. **Reads** relevant documents in full\n", + "4. **Extracts** structured data\n", + "5. **Synthesizes** findings into a report\n", + "6. **Saves** the report\n", + "\n", + "The system prompt instructs the model to work autonomously through all these steps." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "AGENT_SYSTEM_PROMPT = \"\"\"\\\n", + "You are an autonomous research agent. 
When given a research task, you must:\n", + "\n", + "1. Search the knowledge base to find relevant documents\n", + "2. Read the full text of the most relevant documents\n", + "3. Extract structured data from the documents as needed\n", + "4. Perform any calculations required\n", + "5. Synthesize your findings into a comprehensive report\n", + "6. Save the report using the save_report tool\n", + "\n", + "Work through these steps autonomously. Do not ask the user for clarification;\n", + "use your best judgment. After saving the report, provide a brief summary\n", + "of your findings to the user.\n", + "\n", + "Always be thorough: read multiple documents when available, cross-reference\n", + "information, and note any gaps in the available data.\"\"\"\n", + "\n", + "# Reset saved reports\n", + "SAVED_REPORTS.clear()\n", + "\n", + "messages = [\n", + "    {\"role\": \"system\", \"content\": AGENT_SYSTEM_PROMPT},\n", + "    {\n", + "        \"role\": \"user\",\n", + "        \"content\": (\n", + "            \"Research the Nemotron 3 Super model architecture and training pipeline. \"\n", + "            \"I need a report covering: (1) the hybrid architecture design and why it \"\n", + "            \"was chosen, (2) the key training stages and techniques used, (3) the \"\n", + "            \"resulting benchmark performance. Calculate the ratio of active to total \"\n", + "            \"parameters and explain what this means for inference efficiency. 
\"\n", + " \"Save a complete report when done.\"\n", + " ),\n", + " },\n", + "]\n", + "\n", + "print(\"Starting autonomous agent...\")\n", + "print(\"=\" * 60)\n", + "answer = run_tool_loop(messages, TOOLS, max_turns=15)\n", + "print(\"\\n\" + \"=\" * 60)\n", + "print(\"AGENT SUMMARY:\")\n", + "print(\"=\" * 60)\n", + "print(answer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# View the saved report\n", + "if SAVED_REPORTS:\n", + " report = SAVED_REPORTS[-1]\n", + " print(f\"Report #{report['id']}: {report['title']}\")\n", + " print(\"-\" * 60)\n", + " print(report[\"content\"])\n", + "else:\n", + " print(\"No reports saved yet.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Reasoning Modes with Tool Calling\n", + "\n", + "Nemotron 3 Super supports three reasoning modes that affect how it approaches tool-calling tasks:\n", + "\n", + "| Mode | Behavior | Best For |\n", + "|------|----------|----------|\n", + "| **`reasoning-off`** | Direct tool selection, no internal deliberation | Simple lookups, high-throughput pipelines |\n", + "| **`regular`** | Full chain-of-thought before tool selection | Complex multi-step tasks |\n", + "| **`low-effort`** | Brief reasoning, then tool selection | Balanced speed/accuracy |\n", + "\n", + "Let's compare how each mode handles the same task." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "REASONING_TASK = (\n", + " \"Find out how many MLP experts and attention experts Nemotron 3 Super uses, \"\n", + " \"then calculate the total number of experts across both types.\"\n", + ")\n", + "\n", + "\n", + "def run_with_reasoning_mode(mode: str) -> None:\n", + " \"\"\"Run the same task with a specific reasoning mode.\"\"\"\n", + " print(f\"\\n{'=' * 60}\")\n", + " print(f\"Mode: {mode}\")\n", + " print(\"=\" * 60)\n", + "\n", + " extra_body = {\"chat_template_kwargs\": {}}\n", + " if mode == \"reasoning-off\":\n", + " extra_body[\"chat_template_kwargs\"][\"enable_thinking\"] = False\n", + " elif mode == \"regular\":\n", + " extra_body[\"chat_template_kwargs\"][\"enable_thinking\"] = True\n", + " elif mode == \"low-effort\":\n", + " extra_body[\"chat_template_kwargs\"][\"enable_thinking\"] = True\n", + " extra_body[\"chat_template_kwargs\"][\"low_effort\"] = True\n", + "\n", + " messages = [\n", + " {\"role\": \"system\", \"content\": \"You are a research assistant. 
Use tools to answer accurately.\"},\n", + " {\"role\": \"user\", \"content\": REASONING_TASK},\n", + " ]\n", + "\n", + " response = client.chat.completions.create(\n", + " model=MODEL,\n", + " messages=messages,\n", + " tools=TOOLS,\n", + " tool_choice=\"auto\",\n", + " extra_body=extra_body,\n", + " )\n", + "\n", + " msg = response.choices[0].message\n", + " usage = response.usage\n", + " print(f\"Tokens used: {usage.total_tokens} (prompt: {usage.prompt_tokens}, completion: {usage.completion_tokens})\")\n", + "\n", + " if msg.tool_calls:\n", + " for tc in msg.tool_calls:\n", + " print(f\" Tool call: {tc.function.name}({tc.function.arguments})\")\n", + " else:\n", + " print(f\" Direct answer: {msg.content[:200]}\")\n", + "\n", + "\n", + "for mode in [\"reasoning-off\", \"low-effort\", \"regular\"]:\n", + " run_with_reasoning_mode(mode)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Best Practices\n", + "\n", + "### System Prompt Design\n", + "\n", + "Effective system prompts for tool-calling agents should:\n", + "- Clearly state the agent's role and available capabilities\n", + "- Specify when to use tools vs. answer directly\n", + "- Define the expected workflow (search -> read -> extract -> synthesize)\n", + "- Set autonomy level (ask for clarification vs. use best judgment)\n", + "\n", + "### Tool Schema Design\n", + "\n", + "- Use clear, descriptive names (`search_knowledge_base` not `search`)\n", + "- Write detailed descriptions - the model reads these to decide which tool to use\n", + "- Mark required vs. 
optional parameters explicitly\n", + "- Include `default` values for optional parameters\n", + "- Use specific types (`integer` not `number` when appropriate)\n", + "\n", + "### Handling Edge Cases\n", + "\n", + "- **Boolean parameters**: Use JSON `true`/`false`, not Python strings `\"True\"`/`\"False\"` (see [issue #52](https://github.com/NVIDIA-NeMo/Nemotron/issues/52))\n", + "- **Token limits**: For long tool results, consider truncating or summarizing before passing back\n", + "- **Error handling**: Return structured error JSON from tools so the model can recover\n", + "- **Max turns**: Always set a maximum iteration count to prevent infinite loops\n", + "\n", + "### Choosing a Reasoning Mode\n", + "\n", + "| Scenario | Recommended Mode |\n", + "|----------|------------------|\n", + "| Simple data lookup | `reasoning-off` (fastest) |\n", + "| Multi-step research task | `regular` (most thorough) |\n", + "| Production pipeline with latency constraints | `low-effort` (balanced) |\n", + "| Debugging tool-calling behavior | `regular` (shows reasoning) |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next Steps\n", + "\n", + "- **Self-hosted deployment**: Replace the NVIDIA API with a local vLLM server for full control. See the [vLLM cookbook](../../usage-cookbook/Nemotron-3-Super/vllm_cookbook.ipynb).\n", + "- **Real tools**: Connect to live APIs (web search, databases, file systems) instead of simulated tools.\n", + "- **Multi-agent patterns**: Orchestrate multiple Nemotron agents with different system prompts and tool sets.\n", + "- **Streaming**: Use `stream=True` with `delta.tool_calls` for real-time tool-call streaming. See the [Getting Started Guide](../Nemotron-3-Super-Getting-Started-Guide/Nemotron-3-Super-Getting-Started-Guide.ipynb)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}