Merged
83 changes: 13 additions & 70 deletions docs/docs/building_applications/index.mdx
---
title: Building Applications
description: Guides for building AI applications with Llama Stack
sidebar_label: Overview
sidebar_position: 5
---

# Building Applications

Llama Stack provides the building blocks for AI applications. Start with the notebook for a hands-on walkthrough, then dive into specific topics.

**[Building AI Applications Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)**

## Topics

### 🤖 **Agent Development**
- **[Agent Framework](/docs/building_applications/agent)** - Understand the components and design patterns of the Llama Stack agent framework
- **[Agent Execution Loop](/docs/building_applications/agent_execution_loop)** - How agents process information, make decisions, and execute actions
- **[Agents vs Responses API](/docs/building_applications/responses_vs_agents)** - Learn when to use each API for different use cases

### 📚 **Knowledge Integration**
- **[RAG (Retrieval-Augmented Generation)](/docs/building_applications/rag)** - Enhance your agents with external knowledge through retrieval mechanisms

### 🛠️ **Capabilities & Extensions**
- **[Tools](/docs/building_applications/tools)** - Extend your agents' capabilities by integrating with external tools and APIs

### 📊 **Quality & Monitoring**
- **[Evaluations](/docs/building_applications/evals)** - Evaluate your agents' effectiveness and identify areas for improvement
- **[Telemetry](/docs/building_applications/telemetry)** - Monitor and analyze your agents' performance and behavior
- **[Safety](/docs/building_applications/safety)** - Implement guardrails and safety measures to ensure responsible AI behavior

## Application Patterns

### 🤖 **Conversational Agents**
Build intelligent chatbots and assistants that can:
- Maintain context across conversations
- Access external knowledge bases
- Execute actions through tool integrations
- Apply safety filters and guardrails

### 📖 **RAG Applications**
Create knowledge-augmented applications that:
- Retrieve relevant information from documents
- Generate contextually accurate responses
- Handle large knowledge bases efficiently
- Provide source attribution

### 🔧 **Tool-Enhanced Systems**
Develop applications that can:
- Search the web for real-time information
- Interact with databases and APIs
- Perform calculations and analysis
- Execute complex multi-step workflows

### 🛡️ **Enterprise Applications**
Build production-ready systems with:
- Comprehensive safety measures
- Performance monitoring and analytics
- Scalable deployment configurations
- Evaluation and quality assurance

## Next Steps

1. **📖 Start with the Notebook** - Work through the complete tutorial
2. **🎯 Choose Your Pattern** - Pick the application type that matches your needs
3. **🏗️ Build Your Foundation** - Set up your [providers](/docs/providers/) and [distributions](/docs/distributions/)
4. **🚀 Deploy & Monitor** - Use our [deployment guides](/docs/deploying/) for production

## Related Resources

- **[Getting Started](/docs/getting_started/quickstart)** - Basic setup and concepts
- **[Providers](/docs/providers/)** - Available AI service providers
- **[Distributions](/docs/distributions/)** - Pre-configured deployment packages
- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation
- **[RAG](/docs/building_applications/rag)** - Retrieval-augmented generation with vector stores and file search
- **[Agent Framework](/docs/building_applications/agent)** - Components and design patterns for agents
- **[Agent Execution Loop](/docs/building_applications/agent_execution_loop)** - How agents process, decide, and act
- **[Responses vs Agents API](/docs/building_applications/responses_vs_agents)** - When to use each
- **[Tools](/docs/building_applications/tools)** - Extend capabilities with web search, MCP, and custom functions
- **[Evaluations](/docs/building_applications/evals)** - Measure and improve quality
- **[Telemetry](/docs/building_applications/telemetry)** - Monitor performance and behavior
- **[Safety](/docs/building_applications/safety)** - Guardrails and content moderation
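The topics above all build on Llama Stack's OpenAI-compatible REST API. As a minimal sketch of the first building block (basic inference), here is what a chat completion request looks like on the wire; the base URL and model name are assumptions for a local dev setup, and the request is only constructed, not sent:

```python
# Minimal sketch of basic inference against Llama Stack's OpenAI-compatible
# REST API. The base URL and model name are assumptions for a local setup.
import json
import urllib.request

BASE_URL = "http://localhost:8321/v1"  # assumed local Llama Stack server

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("llama3.2:3b", "Say hello")
# urllib.request.urlopen(req) would send it once a server is running.
print(req.full_url)
```

In practice you would use an OpenAI client library pointed at the same base URL rather than raw HTTP; the sketch only shows the shape of the payload.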
234 changes: 35 additions & 199 deletions docs/docs/building_applications/responses_vs_agents.mdx
---
title: Responses API vs Agents API
description: Understanding which API to use for building AI applications
sidebar_label: Responses vs Agents
sidebar_position: 5
---

# Responses API vs Agents API

Llama Stack provides two APIs for building AI applications with tool calling. The **Responses API** is the recommended path for new applications.

## Use the Responses API

The Responses API is OpenAI-compatible and provides:

- **Dynamic configuration** - change model, tools, and vector stores on every call
- **Conversation branching** - fork from any previous response via `previous_response_id`
- **Built-in tool orchestration** - file_search, web_search, MCP, and custom functions
- **Standard OpenAI SDK** - works with any OpenAI client library

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

response = client.responses.create(
    model="llama3.2:3b",
    input="Search my docs for deployment instructions",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"],
    }],
)

# Continue the conversation, switch model
response2 = client.responses.create(
    model="openai/gpt-4o",
    input="Now summarize what you found",
    previous_response_id=response.id,
)
```

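Built-in tools are specified as plain JSON objects in the `tools` list. A sketch of the shapes: `file_search` follows the example above, while the `web_search` and MCP shapes are assumptions based on OpenAI-style tool definitions (the server label and URL are illustrative):

```python
# Tool definitions are plain dicts in the `tools` list. file_search matches
# the example above; the web_search and MCP shapes are assumptions based on
# OpenAI-style tool definitions (server label/URL are illustrative).
file_search_tool = {
    "type": "file_search",
    "vector_store_ids": ["vs_abc123"],  # one or more vector store IDs
}
web_search_tool = {"type": "web_search"}
mcp_tool = {
    "type": "mcp",
    "server_label": "docs",                   # illustrative label
    "server_url": "https://example.com/mcp",  # hypothetical MCP server
}

tools = [file_search_tool, web_search_tool, mcp_tool]
print([t["type"] for t in tools])
```

Because configuration is per call, you can pass a different `tools` list (or none) on every `responses.create` invocation.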
## Legacy Agents API

The Agents API is an older, Llama Stack-specific API that uses sessions and turns. It is still functional but is not recommended for new applications.
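To make the session/turn model concrete, here is a toy sketch (illustrative only, not the real SDK; the names and shapes are invented for this example). The key property is that tools and configuration are fixed once per session, and each turn appends to a single linear history:

```python
# Illustrative only: the legacy Agents API groups interactions into a
# session of turns, configured once up front (static tools per session).
from dataclasses import dataclass, field

@dataclass
class Session:
    name: str
    instructions: str
    tools: list = field(default_factory=list)  # fixed for the session
    turns: list = field(default_factory=list)

    def create_turn(self, user_message: str) -> dict:
        turn = {"index": len(self.turns), "user": user_message}
        self.turns.append(turn)
        return turn

s = Session(
    "code_session",
    "You are a helpful coding assistant",
    tools=["builtin::code_interpreter"],
)
s.create_turn("Run a bubble sort on [3, 1, 4, 1, 5]")
s.create_turn("Now optimize that code")
print(len(s.turns))  # 2
```

Contrast this with the Responses API, where every call can change the model and tools, and any prior response can be continued from.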

Key differences from Responses:

| Feature | Responses API | Agents API |
|---|---|---|
| **SDK** | Standard OpenAI SDK | Llama Stack client only |
| **Configuration** | Dynamic per call | Static per session |
| **Conversation model** | Branching via response IDs | Linear sessions |
| **Tools** | file_search, web_search, MCP, functions | builtin::file_search, code_interpreter |
| **Safety** | Via `/v1/moderations` or guardrail params | Built-in input/output shields |
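The conversation-model difference in the table can be sketched with a toy in-memory model (illustrative only, not the real API): each response records its parent, so any response can serve as a fork point, whereas a legacy session is one linear history.

```python
# Toy model of branching-by-response-id (not the real API). Each response
# stores its parent id, so any response can be forked from.
from dataclasses import dataclass, field
from typing import Optional
import itertools

_ids = itertools.count(1)

@dataclass
class Response:
    text: str
    previous_response_id: Optional[int] = None
    id: int = field(default_factory=lambda: next(_ids))

store: dict = {}

def create_response(text, previous_response_id=None):
    r = Response(text, previous_response_id)
    store[r.id] = r
    return r

def history(response_id):
    """Walk parent links back to the root, oldest first."""
    chain = []
    while response_id is not None:
        r = store[response_id]
        chain.append(r.text)
        response_id = r.previous_response_id
    return list(reversed(chain))

root = create_response("list algorithms")
a = create_response("explain quicksort", previous_response_id=root.id)
# Second branch forked from the same root:
b = create_response("compare complexity", previous_response_id=root.id)

print(history(a.id))  # ['list algorithms', 'explain quicksort']
print(history(b.id))  # ['list algorithms', 'compare complexity']
```

In the real Responses API the server maintains this chain for you; you only pass `previous_response_id` on each call.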

If you have existing code using the Agents API, it will continue to work. For new projects, use the Responses API.