The batteries-included deep agent harness for Python.
Terminal AI assistant out of the box — or build production agents with one function call.
Docs · PyPI · CLI · Framework · DeepResearch · Examples
- 2026-04-12 v0.3.8: Stuck loop detection, context limit warnings for the model, expanded context file discovery (CLAUDE.md, .cursorrules, etc.), eviction & orphan repair migrated to capabilities hooks.
- 2026-04-11 v0.3.6: One-command installer and self-update. `curl -fsSL .../install.sh | bash` installs everything automatically. New `pydantic-deep update` command. Startup update notifications with 24-hour PyPI cache.
- 2026-04-10 v0.3.5: Headless runner (`pydantic-deep run`), Docker sandbox with named workspaces, browser automation via Playwright, Harbor adapter for Terminal Bench evaluation.
Full history: CHANGELOG.md
Pydantic Deep Agents is an agent harness — the complete infrastructure that wraps an LLM and makes it a functional autonomous agent. The model provides intelligence; the harness provides planning, tools, memory, sandboxed execution, and unlimited context.
| 🔧 Tool-calling | File read/write/edit, shell execution, glob, grep, web search, web fetch, browser automation — wired up and ready. |
| 🧠 Persistent memory | MEMORY.md persists across sessions. Auto-injected into the system prompt. Each agent has isolated memory by default. |
| ♾️ Unlimited context | Auto-summarization when approaching the token budget. LLM-based or zero-cost sliding window. Never hits a context wall. |
| 🤝 Multi-agent / swarm | Spawn subagents for parallel workstreams. Shared TODO lists with claiming. Peer-to-peer message bus. Full team coordination. |
| 🐳 Sandboxed execution | Docker sandbox with named workspaces. Installed packages persist between sessions. Project dir mounted at /workspace. |
| 🗂️ Plan Mode | Dedicated planner subagent asks clarifying questions and structures the work before execution begins. Headless-compatible. |
| 🔖 Checkpoints | Save conversation state at any point. Rewind to any checkpoint. Fork sessions to explore alternative approaches. |
| 📚 Skills system | Domain-specific knowledge loaded on demand from SKILL.md files. Built-in skills: code-review, refactor, test-writer, git-workflow, and more. |
| 🔌 MCP | Connect any Model Context Protocol server via pydantic-ai's native MCP capability. |
| ⚡ Lifecycle hooks | Claude Code-style PRE_TOOL_USE / POST_TOOL_USE hooks. Shell commands or Python handlers. Audit logging, safety gates. |
| 📐 Structured output | Type-safe Pydantic model responses via output_type. No JSON parsing. No dict["key"]. Full IDE autocomplete. |
| 🔄 Stuck loop detection | Detects repeated identical tool calls, A-B-A-B alternating patterns, and no-op calls. Warns the model or stops the run. |
| Context limit warnings | Model receives URGENT/CRITICAL warnings when approaching context limits (70%), well before auto-compression (90%). |
| 💰 Cost tracking | Real-time token and USD cost tracking per run and cumulative. Hard budget limits with BudgetExceededError. |
| ✨ Self-improving | /improve analyzes past sessions and proposes updates to MEMORY.md, SOUL.md, and AGENTS.md. |
| 🏷️ 100% type-safe | Pyright strict + MyPy strict. 100% test coverage. Every public API is fully typed — safe to use in production. |
Built natively on pydantic-ai — uses the Capabilities API directly, inherits all pydantic-ai streaming, multi-model support, and Pydantic validation automatically.
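To make the stuck loop detection row in the feature table concrete, here is a minimal standard-library sketch of how repeated identical calls and A-B-A-B alternation could be spotted. The `LoopDetector` class, its `max_repeats` parameter, and the pattern names are hypothetical illustrations, not the `StuckLoopDetection` capability shipped by pydantic-deep.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Call = Tuple[str, str]  # (tool name, serialized arguments)


class LoopDetector:
    """Hypothetical detector; not the pydantic-deep implementation."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        # Keep enough history to hold max_repeats A-B pairs.
        self.history: Deque[Call] = deque(maxlen=2 * max_repeats)

    def record(self, tool: str, args: str) -> Optional[str]:
        """Record one tool call; return a pattern name if a loop is seen."""
        self.history.append((tool, args))
        calls = list(self.history)

        # Pattern 1: the same call max_repeats times in a row.
        tail = calls[-self.max_repeats:]
        if len(tail) == self.max_repeats and len(set(tail)) == 1:
            return "repeated"

        # Pattern 2: strict A-B-A-B alternation across the full window.
        if len(calls) == 2 * self.max_repeats:
            a, b = calls[0], calls[1]
            if a != b and all(
                c == (a if i % 2 == 0 else b) for i, c in enumerate(calls)
            ):
                return "alternating"
        return None


detector = LoopDetector(max_repeats=2)
results = [
    detector.record("read_file", "a.py"),
    detector.record("grep", "TODO"),
    detector.record("read_file", "a.py"),
    detector.record("grep", "TODO"),
]
print(results)  # [None, None, None, 'alternating']
```

A harness could respond to a non-`None` result by injecting a warning into the conversation or by stopping the run, which are the two reactions the feature table describes.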
A Claude Code-style terminal AI assistant that works with any model and any provider.
curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash

No Python setup required: the script installs uv and the CLI automatically. Then:
export ANTHROPIC_API_KEY=sk-ant-...
pydantic-deep

Windows / manual: `pip install "pydantic-deep[cli]"` · Update: `pydantic-deep update`
Works with any model that supports tool-calling:
| Provider | Example models |
|---|---|
| Anthropic | anthropic:claude-opus-4-6, claude-sonnet-4-6 |
| OpenAI | openai:gpt-5.4, gpt-4.1 |
| OpenRouter | openrouter:anthropic/claude-opus-4-6 (200+ models) |
| Google Gemini | google-gla:gemini-2.5-pro |
| Ollama (local) | ollama:qwen3, ollama:llama3.3 |
| Any OpenAI-compatible | Custom base URL via env |
Switch model anytime: pydantic-deep config set model openai:gpt-5.4 or /model in the TUI.
| Feature | |
|---|---|
| 💬 | Streaming chat with tool call visualization |
| 📁 | File read / write / edit, shell execution, glob, grep |
| 🧠 | Persistent memory and self-improvement across sessions |
| 🗂️ | Task planning, plan mode, and subagent delegation |
| ♾️ | Context compression for unlimited conversations |
| 🔖 | Checkpoints — save, rewind, and fork any session |
| 🌐 | Web search & fetch built-in |
| 🖥️ | Browser automation via Playwright (--browser) |
| 🐳 | Docker sandbox — sandboxed execution with named workspaces |
| 💭 | Extended thinking — minimal / low / medium / high / xhigh |
| 💰 | Real-time cost and token tracking per session |
| 🛡️ | Tool approval dialogs — approve, auto-approve, or deny per tool call |
| @ | @filename file references · !command shell passthrough |
| ✨ | /improve, /skills, /diff, /model, /theme, /compact, and more |
# Interactive TUI (default)
pydantic-deep
pydantic-deep tui --model openrouter:anthropic/claude-opus-4-6
# Headless deep agent — benchmarks, CI/CD, scripted automation
pydantic-deep run "Fix the failing test in test_auth.py"
pydantic-deep run --task-file task.md --json
pydantic-deep run "Refactor utils.py" --no-web-search --thinking false
# Docker sandbox — sandboxed execution, project dir mounted at /workspace
pydantic-deep tui --sandbox docker
pydantic-deep tui --workspace ml-env # named workspace, packages persist
# Browser automation (requires pydantic-deep[browser])
pydantic-deep tui --browser
pydantic-deep run "Go to example.com and summarize the content" --browser
# Config & skills
pydantic-deep config set model anthropic:claude-sonnet-4-6
pydantic-deep skills list
pydantic-deep update                              # update to latest version

See CLI docs for the full reference.

pip install pydantic-deep

One function call gives you a production deep agent with planning, tool-calling, multi-agent delegation, persistent memory, unlimited context, and cost tracking. Everything is a toggle:
from pydantic_ai_backends import StateBackend
from pydantic_deep import create_deep_agent, create_default_deps

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    include_todo=True,         # Task planning with subtasks and dependencies
    include_subagents=True,    # Multi-agent swarm: delegate to subagents
    include_skills=True,       # Domain-specific skills from SKILL.md files
    include_memory=True,       # Persistent memory across sessions
    include_plan=True,         # Structured planning before execution
    include_teams=True,        # Agent teams with shared TODO lists + message bus
    web_search=True,           # Tool-calling: web search
    web_fetch=True,            # Tool-calling: web fetch
    thinking="high",           # Extended thinking / reasoning effort
    context_manager=True,      # Unlimited context via auto-summarization
    cost_tracking=True,        # Token/USD budget enforcement
    include_checkpoints=True,  # Save, rewind, and fork conversations
)

deps = create_default_deps(StateBackend())
# Top-level await: run this inside an async function or an async REPL.
result = await agent.run("Build a REST API for user auth", deps=deps)

Type-safe responses with Pydantic models, with no JSON parsing and no dict["key"]:
from pydantic import BaseModel

class CodeReview(BaseModel):
    summary: str
    issues: list[str]
    score: int

agent = create_deep_agent(output_type=CodeReview)
result = await agent.run("Review the auth module", deps=deps)
print(result.output.score)  # fully typed

Spawn isolated subagents for parallel workstreams. Each subagent is a full deep agent with its own tool-calling, memory, and context:
agent = create_deep_agent(
    subagents=[
        {
            "name": "researcher",
            "description": "Researches topics using web search",
            "instructions": "Search the web, synthesize findings, cite sources.",
        },
        {
            "name": "code-reviewer",
            "description": "Reviews code for quality, security, and performance",
            "instructions": "Check for security issues, N+1 queries, missing tests...",
        },
    ],
)
# Main agent delegates: task(description="Review auth.py", subagent_type="code-reviewer")

Auto-summarization keeps long-running agents within the token budget:
from pydantic_deep import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # compress at 100k tokens
    keep=("messages", 20),       # keep last 20 messages verbatim
)
agent = create_deep_agent(history_processors=[processor])

# Lifecycle hook: append each tool call to an audit log before the tool runs
from pydantic_deep import Hook, HookEvent

agent = create_deep_agent(
    hooks=[
        Hook(
            event=HookEvent.PRE_TOOL_USE,
            command="echo 'Tool: $TOOL_NAME args: $TOOL_INPUT' >> /tmp/audit.log",
        ),
    ],
)

# MCP: connect a Model Context Protocol server
from pydantic_ai.capabilities import MCP

agent = create_deep_agent(
    capabilities=[MCP(url="https://mcp.example.com/api")],
)

Pydantic Deep Agents auto-discovers and injects project-specific context into every conversation:
| File | Purpose | Who Sees It |
|---|---|---|
| AGENTS.md | Project conventions, architecture, instructions | Main agent + all subagents |
| CLAUDE.md | Claude Code project instructions | Main agent + all subagents |
| SOUL.md | Agent personality, style, communication preferences | Main agent only |
| .cursorrules | Cursor editor conventions | Main agent only |
| .github/copilot-instructions.md | GitHub Copilot instructions | Main agent only |
| CONVENTIONS.md | Project coding conventions | Main agent only |
| CODING_GUIDELINES.md | Coding guidelines | Main agent only |
| MEMORY.md | Persistent memory (read/write/update tools) | Per-agent (isolated) |
Compatible with Claude Code, Cursor, GitHub Copilot, and other agent frameworks. AGENTS.md follows the agents.md spec.
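The visibility rules in the table can be sketched with the standard library alone. The `discover_context` function and its `for_subagent` flag are hypothetical names used for illustration, and the real discovery logic in pydantic-deep may differ. MEMORY.md is left out because the table describes it as managed through per-agent tools.

```python
import tempfile
from pathlib import Path

# File names and audiences copied from the table above.
SHARED_FILES = ["AGENTS.md", "CLAUDE.md"]  # main agent + all subagents
MAIN_ONLY_FILES = [
    "SOUL.md",
    ".cursorrules",
    ".github/copilot-instructions.md",
    "CONVENTIONS.md",
    "CODING_GUIDELINES.md",
]


def discover_context(root: Path, for_subagent: bool = False) -> dict[str, str]:
    """Return {relative name: contents} for context files present under root."""
    names = SHARED_FILES if for_subagent else SHARED_FILES + MAIN_ONLY_FILES
    found = {}
    for name in names:
        path = root / name
        if path.is_file():
            found[name] = path.read_text(encoding="utf-8")
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("Use pytest.", encoding="utf-8")
    (root / "SOUL.md").write_text("Be concise.", encoding="utf-8")
    main_files = sorted(discover_context(root))
    sub_files = sorted(discover_context(root, for_subagent=True))

print(main_files)  # ['AGENTS.md', 'SOUL.md']
print(sub_files)   # ['AGENTS.md']
```

In this sketch a subagent never sees SOUL.md, which matches the "Main agent only" column.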
See the full API reference for all options.
A full-featured research deep agent with web UI — built entirely on Pydantic Deep Agents.
Web search (Tavily, Brave, Jina), sandboxed code execution, Excalidraw diagrams, plan mode, report export.
cd apps/deepresearch && uv sync && cp .env.example .env
uv run deepresearch    # → http://localhost:8080

See apps/deepresearch/README.md for full setup.
Pydantic Deep Agents uses pydantic-ai's native Capabilities API for all cross-cutting concerns — hooks, memory, skills, context files, teams, and plan mode are all first-class pydantic-ai capabilities.
| Capability | Package | What It Does |
|---|---|---|
| CostTracking | pydantic-ai-shields | Token/USD budget enforcement and real-time cost callbacks |
| ContextManagerCapability | summarization-pydantic-ai | Unlimited context via auto-summarization |
| LimitWarnerCapability | summarization-pydantic-ai | URGENT/CRITICAL warnings when context limits approach |
| StuckLoopDetection | pydantic-deep | Detects and breaks repetitive agent loops |
| EvictionCapability | pydantic-deep | Intercepts large tool outputs before they enter history |
| PatchToolCallsCapability | pydantic-deep | Fixes orphaned tool calls/results in history |
| HooksCapability | pydantic-deep | Claude Code-style PRE/POST_TOOL_USE lifecycle hooks |
| CheckpointMiddleware | pydantic-deep | Save, rewind, and fork conversation state |
| WebSearch / WebFetch | pydantic-ai built-in | Tool-calling: web search and URL fetching |
| SkillsCapability | pydantic-deep | Domain-specific skills from SKILL.md files |
| MemoryCapability | pydantic-deep | Persistent memory across sessions |
| TeamCapability | pydantic-deep | Multi-agent swarm — shared TODOs, message bus |
| PlanCapability | pydantic-deep | Structured planning before execution |
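As a rough illustration of what the LimitWarnerCapability row implies, the sketch below classifies context usage against the 70% warning and 90% auto-compression figures quoted in the feature list. The `context_status` function and its return labels are assumptions made for this example. The README does not say where URGENT ends and CRITICAL begins, so the sketch uses a single "warn" level.

```python
from typing import Optional

WARN_AT = 0.70      # warning threshold quoted in the feature list
COMPRESS_AT = 0.90  # auto-compression threshold quoted in the feature list


def context_status(used_tokens: int, max_tokens: int) -> Optional[str]:
    """Classify context usage; return None while usage is comfortable."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    ratio = used_tokens / max_tokens
    if ratio >= COMPRESS_AT:
        return "compress"  # time to summarize or trim history
    if ratio >= WARN_AT:
        return "warn"      # tell the model to start wrapping up
    return None


print(context_status(50_000, 200_000))   # None
print(context_status(150_000, 200_000))  # warn
print(context_status(190_000, 200_000))  # compress
```

Warning well before compression gives the model a chance to finish its current step while the full history is still available.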
Every component is a standalone package — use only what you need:
| Package | What It Does |
|---|---|
| pydantic-ai-backend | File storage, Docker sandbox, console toolset |
| pydantic-ai-todo | Task planning with subtasks and dependencies |
| subagents-pydantic-ai | Sync/async delegation, background tasks, cancellation |
| summarization-pydantic-ai | LLM summaries or zero-cost sliding window |
| pydantic-ai-shields | Cost tracking, input/output/tool blocking |
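The "zero-cost sliding window" strategy named in the summarization-pydantic-ai row can be pictured as follows. This is a sketch under stated assumptions: `sliding_window`, `estimate_tokens`, and the four-characters-per-token estimate are hypothetical stand-ins, and the parameters only mirror the trigger and keep idea from the `create_summarization_processor` example without reproducing that package's behavior.

```python
from typing import List


def estimate_tokens(message: str) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return max(1, len(message) // 4)


def sliding_window(
    messages: List[str], trigger_tokens: int, keep_messages: int
) -> List[str]:
    """Drop the oldest messages once the estimated total reaches the trigger."""
    total = sum(estimate_tokens(m) for m in messages)
    if total < trigger_tokens:
        return list(messages)  # under budget: leave history untouched
    # Over budget: keep only the most recent keep_messages entries.
    start = max(0, len(messages) - keep_messages)
    return list(messages[start:])


history = [f"message {i}: " + "x" * 400 for i in range(50)]
trimmed = sliding_window(history, trigger_tokens=1_000, keep_messages=20)
print(len(history), "->", len(trimmed))  # 50 -> 20
```

No model call is involved, which is why this strategy costs nothing. The trade-off is that dropped messages are lost rather than summarized.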
Pydantic Deep Agents
+---------------------------------------------------------------------+
| |
| +----------+ +----------+ +----------+ +----------+ +---------+ |
| | Planning | |Filesystem| | Subagents| | Skills | | Teams | |
| +----+-----+ +----+-----+ +----+-----+ +----+-----+ +----+----+ |
| | | | | | |
| +------------+-----+------+------------+------------+ |
| | |
| v |
| Summarization --> +------------------+ <-- Capabilities |
| Checkpointing --> | Deep Agent | <-- Hooks |
| Cost Tracking --> | (pydantic-ai) | <-- Memory |
| Loop Detect --> | | <-- Limit Warner |
| +--------+---------+ |
| | |
| +-----------------+-----------------+ |
| v v v |
| +------------+ +------------+ +------------+ |
| | State | | Local | | Docker | |
| | Backend | | Backend | | Sandbox | |
| +------------+ +------------+ +------------+ |
| |
+---------------------------------------------------------------------+
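The Checkpointing input in the diagram covers three operations: save, rewind, and fork. A minimal in-memory sketch of those semantics is shown below. `CheckpointStore` and its methods are hypothetical and do not reflect the pydantic-deep checkpoint API.

```python
import copy
from typing import Dict, List


class CheckpointStore:
    """Hypothetical in-memory store; not the pydantic-deep checkpoint API."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, List[str]] = {}

    def save(self, label: str, messages: List[str]) -> None:
        # Deep-copy so later edits to the live history do not alter the snapshot.
        self._snapshots[label] = copy.deepcopy(messages)

    def rewind(self, label: str) -> List[str]:
        # Return a fresh copy so the same label can be restored repeatedly.
        return copy.deepcopy(self._snapshots[label])


store = CheckpointStore()
session = ["user: add login", "agent: drafted plan"]
store.save("after-plan", session)

session.append("agent: wrote auth.py")  # work continues past the checkpoint
rewound = store.rewind("after-plan")    # rewind: the later message is gone
fork = store.rewind("after-plan")       # fork: an independent second branch
fork.append("agent: tried OAuth instead")

print(len(session), len(rewound), len(fork))  # 3 2 3
```

In this sketch a fork is a rewind whose result is continued separately, so the original session and the fork never share state.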
- `ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`, `execute`: full filesystem access
- Docker sandbox with named workspaces: sandboxed execution, packages persist between sessions
- Web search (DuckDuckGo, Tavily, Brave) and web fetch
- Browser automation via Playwright: `navigate`, `click`, `type_text`, `screenshot`, `execute_js`, and more
- Planning — Task tracking with subtasks, dependencies, and cycle detection
- Subagents / Multi-agent swarm — Sync/async delegation, background task management, soft/hard cancellation
- Agent Teams — Shared TODO lists with claiming and dependency tracking, peer-to-peer message bus
- Plan Mode — Dedicated planner subagent for structured planning before execution
- Persistent memory — MEMORY.md that persists across sessions, auto-injected into system prompt
- Self-improving: `/improve` analyzes past sessions, proposes updates to context files
- Unlimited context — Auto-summarization when approaching token budget (LLM-based or sliding window)
- Context limit warnings — Model receives URGENT/CRITICAL messages when approaching 70% context usage
- Eviction capability: intercepts large tool outputs via `after_tool_execute` before they enter history
- Context files: auto-discover and inject AGENTS.md, CLAUDE.md, SOUL.md, .cursorrules, copilot-instructions, CONVENTIONS.md, CODING_GUIDELINES.md
- Checkpoints: save state, rewind or fork conversations. In-memory and file-based stores. Per-run isolation via `for_run()`
- Stuck loop detection — Detects repeated identical calls, A-B-A-B alternating, and no-op patterns. Warns or stops the agent
- Orphan repair — Fixes orphaned tool calls/results in conversation history before each model request
- Context limit warnings — Injects URGENT/CRITICAL messages so the model knows to wrap up
- MCP — Connect any Model Context Protocol server
- Lifecycle hooks — Claude Code-style PRE/POST_TOOL_USE. Shell commands or Python handlers
- Structured output: type-safe responses with Pydantic models via `output_type`
- Cost tracking: token/USD budgets with automatic enforcement and real-time callbacks
- Streaming — Full streaming support for real-time responses
- Image support — Multi-modal analysis with image inputs
- Human-in-the-loop — Confirmation workflows for sensitive operations
- Output styles — Built-in (concise, explanatory, formal, conversational) or custom
- Interactive TUI (Textual) with streaming, tool visualization, session management
- Headless runner (`pydantic-deep run`) for CI/CD, benchmarks, scripted automation
- 20+ slash commands: `/improve`, `/compact`, `/diff`, `/model`, `/provider`, `/skills`, `/theme`, and more
- `@filename` file references, `!command` shell passthrough
- Tool approval dialogs with auto-approve
- Debug logging per session
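The Planning item in the list above mentions cycle detection for task dependencies. One common way to do that check is a depth-first search with three visit states, sketched here. The `has_cycle` function and the dictionary layout are assumptions for illustration and are not taken from pydantic-ai-todo.

```python
from typing import Dict, List


def has_cycle(deps: Dict[str, List[str]]) -> bool:
    """Return True if the task dependency graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {task: WHITE for task in deps}

    def visit(task: str) -> bool:
        state[task] = GREY
        for dep in deps.get(task, []):
            colour = state.get(dep, WHITE)
            if colour == GREY:
                return True  # reached a task already on the current path
            if colour == WHITE and visit(dep):
                return True
        state[task] = BLACK
        return False

    return any(state[t] == WHITE and visit(t) for t in list(deps))


ok_plan = {"deploy": ["test"], "test": ["build"], "build": []}
bad_plan = {"write docs": ["ship"], "ship": ["write docs"]}
print(has_cycle(ok_plan))   # False
print(has_cycle(bad_plan))  # True
```

Rejecting a cyclic plan at the moment a dependency is added keeps the agent from waiting on tasks that can never become unblocked.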
git clone https://github.com/vstorm-co/pydantic-deepagents.git
cd pydantic-deepagents
make install
make test # 100% coverage required
make all         # lint + typecheck + test

pydantic-deepagents is part of a broader open-source ecosystem for production AI agents:
| Project | Description |
|---|---|
| full-stack-ai-agent-template | Zero to production AI app in 30 minutes. FastAPI + Next.js 15, 6 AI frameworks (incl. pydantic-deep), RAG pipeline, 75+ config options. |
| pydantic-ai-shields | Drop-in guardrails for Pydantic AI agents. 5 infra + 5 content shields. |
| pydantic-ai-subagents | Declarative multi-agent orchestration with token tracking. |
| pydantic-ai-summarization | Smart context compression for long-running agents. |
| pydantic-ai-backend | Sandboxed execution for AI agents. Docker + Daytona. |
Want the full stack? Use full-stack-ai-agent-template — it ships pydantic-deep integrated with FastAPI, Next.js, auth, WebSocket streaming, and RAG out of the box.
Browse all projects at oss.vstorm.co
MIT — see LICENSE




