Pydantic Deep Agents

The batteries-included deep agent harness for Python.
Terminal AI assistant out of the box — or build production agents with one function call.

Docs · PyPI · CLI · Framework · DeepResearch · Examples

What's New

2026-04-12 v0.3.8 — Stuck loop detection, context limit warnings for the model, expanded context file discovery (CLAUDE.md, .cursorrules, etc.), eviction & orphan repair migrated to capabilities hooks.
2026-04-11 v0.3.6 — One-command installer + self-update: curl -fsSL .../install.sh | bash installs everything automatically. New pydantic-deep update command. Startup update notifications with 24-hour PyPI cache.
2026-04-10 v0.3.5 — Headless runner (pydantic-deep run), Docker sandbox with named workspaces, browser automation via Playwright, Harbor adapter for Terminal Bench evaluation.

Full history: CHANGELOG.md

The Agent Harness

Pydantic Deep Agents is an agent harness — the complete infrastructure that wraps an LLM and makes it a functional autonomous agent. The model provides intelligence; the harness provides planning, tools, memory, sandboxed execution, and unlimited context.

🔧 Tool-calling	File read/write/edit, shell execution, glob, grep, web search, web fetch, browser automation — wired up and ready.
🧠 Persistent memory	MEMORY.md persists across sessions. Auto-injected into the system prompt. Each agent has isolated memory by default.
♾️ Unlimited context	Auto-summarization when approaching the token budget. LLM-based or zero-cost sliding window. Never hits a context wall.
🤝 Multi-agent / swarm	Spawn subagents for parallel workstreams. Shared TODO lists with claiming. Peer-to-peer message bus. Full team coordination.
🐳 Sandboxed execution	Docker sandbox with named workspaces. Installed packages persist between sessions. Project dir mounted at /workspace.
🗂️ Plan Mode	Dedicated planner subagent asks clarifying questions and structures the work before execution begins. Headless-compatible.
🔖 Checkpoints	Save conversation state at any point. Rewind to any checkpoint. Fork sessions to explore alternative approaches.
📚 Skills system	Domain-specific knowledge loaded on demand from SKILL.md files. Built-in skills: code-review, refactor, test-writer, git-workflow, and more.
🔌 MCP	Connect any Model Context Protocol server via pydantic-ai's native MCP capability.
⚡ Lifecycle hooks	Claude Code-style PRE_TOOL_USE / POST_TOOL_USE hooks. Shell commands or Python handlers. Audit logging, safety gates.
📐 Structured output	Type-safe Pydantic model responses via `output_type`. No JSON parsing. No `dict["key"]`. Full IDE autocomplete.
🔄 Stuck loop detection	Detects repeated identical tool calls, A-B-A-B alternating patterns, and no-op calls. Warns the model or stops the run.
⚠️ Context limit warnings	Model receives URGENT/CRITICAL warnings when approaching context limits (70%), well before auto-compression (90%).
💰 Cost tracking	Real-time token and USD cost tracking per run and cumulative. Hard budget limits with `BudgetExceededError`.
✨ Self-improving	`/improve` analyzes past sessions and proposes updates to MEMORY.md, SOUL.md, and AGENTS.md.
🏷️ 100% type-safe	Pyright strict + MyPy strict. 100% test coverage. Every public API is fully typed — safe to use in production.

Built natively on pydantic-ai — uses the Capabilities API directly, inherits all pydantic-ai streaming, multi-model support, and Pydantic validation automatically.

🖥️ CLI — Terminal AI Assistant

A Claude Code-style terminal AI assistant that works with any model and any provider.

Install (macOS & Linux)

curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash

No Python setup required — the script installs uv and the CLI automatically. Then:

export ANTHROPIC_API_KEY=sk-ant-...
pydantic-deep

Windows / manual: pip install "pydantic-deep[cli]" · Update: pydantic-deep update

Model & Provider Support

Works with any model that supports tool-calling:

Provider	Example models
Anthropic	`anthropic:claude-opus-4-6`, `claude-sonnet-4-6`
OpenAI	`openai:gpt-5.4`, `gpt-4.1`
OpenRouter	`openrouter:anthropic/claude-opus-4-6` (200+ models)
Google Gemini	`google-gla:gemini-2.5-pro`
Ollama (local)	`ollama:qwen3`, `ollama:llama3.3`
Any OpenAI-compatible	Custom base URL via env

Switch model anytime: pydantic-deep config set model openai:gpt-5.4 or /model in the TUI.

What you get in the TUI

	Feature
💬	Streaming chat with tool call visualization
📁	File read / write / edit, shell execution, glob, grep
🧠	Persistent memory and self-improvement across sessions
🗂️	Task planning, plan mode, and subagent delegation
♾️	Context compression for unlimited conversations
🔖	Checkpoints — save, rewind, and fork any session
🌐	Web search & fetch built-in
🖥️	Browser automation via Playwright (`--browser`)
🐳	Docker sandbox — sandboxed execution with named workspaces
💭	Extended thinking — `minimal` / `low` / `medium` / `high` / `xhigh`
💰	Real-time cost and token tracking per session
🛡️	Tool approval dialogs — approve, auto-approve, or deny per tool call
@	`@filename` file references · `!command` shell passthrough
✨	`/improve`, `/skills`, `/diff`, `/model`, `/theme`, `/compact`, and more

Usage

# Interactive TUI (default)
pydantic-deep
pydantic-deep tui --model openrouter:anthropic/claude-opus-4-6

# Headless deep agent — benchmarks, CI/CD, scripted automation
pydantic-deep run "Fix the failing test in test_auth.py"
pydantic-deep run --task-file task.md --json
pydantic-deep run "Refactor utils.py" --no-web-search --thinking false

# Docker sandbox — sandboxed execution, project dir mounted at /workspace
pydantic-deep tui --sandbox docker
pydantic-deep tui --workspace ml-env     # named workspace, packages persist

# Browser automation (requires pydantic-deep[browser])
pydantic-deep tui --browser
pydantic-deep run "Go to example.com and summarize the content" --browser

# Config & skills
pydantic-deep config set model anthropic:claude-sonnet-4-6
pydantic-deep skills list
pydantic-deep update                     # update to latest version

See CLI docs for the full reference.

🐍 Framework — Build Your Own Agent

pip install pydantic-deep

One function call gives you a production deep agent with planning, tool-calling, multi-agent delegation, persistent memory, unlimited context, and cost tracking. Everything is a toggle:

from pydantic_ai_backends import StateBackend
from pydantic_deep import create_deep_agent, create_default_deps

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    include_todo=True,          # Task planning with subtasks and dependencies
    include_subagents=True,     # Multi-agent swarm — delegate to subagents
    include_skills=True,        # Domain-specific skills from SKILL.md files
    include_memory=True,        # Persistent memory across sessions
    include_plan=True,          # Structured planning before execution
    include_teams=True,         # Agent teams with shared TODO lists + message bus
    web_search=True,            # Tool-calling: web search
    web_fetch=True,             # Tool-calling: web fetch
    thinking="high",            # Extended thinking / reasoning effort
    context_manager=True,       # Unlimited context via auto-summarization
    cost_tracking=True,         # Token/USD budget enforcement
    include_checkpoints=True,   # Save, rewind, and fork conversations
)

deps = create_default_deps(StateBackend())
result = await agent.run("Build a REST API for user auth", deps=deps)

Structured Output

Type-safe responses with Pydantic models — no JSON parsing, no dict["key"]:

from pydantic import BaseModel

class CodeReview(BaseModel):
    summary: str
    issues: list[str]
    score: int

agent = create_deep_agent(output_type=CodeReview)
result = await agent.run("Review the auth module", deps=deps)
print(result.output.score)  # fully typed

Multi-Agent Swarm

Spawn isolated subagents for parallel workstreams. Each subagent is a full deep agent with its own tool-calling, memory, and context:

agent = create_deep_agent(
    subagents=[
        {
            "name": "researcher",
            "description": "Researches topics using web search",
            "instructions": "Search the web, synthesize findings, cite sources.",
        },
        {
            "name": "code-reviewer",
            "description": "Reviews code for quality, security, and performance",
            "instructions": "Check for security issues, N+1 queries, missing tests...",
        },
    ],
)
# Main agent delegates: task(description="Review auth.py", subagent_type="code-reviewer")

Unlimited Context

Auto-summarization keeps long-running agents within the token budget:

from pydantic_deep import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # compress at 100k tokens
    keep=("messages", 20),       # keep last 20 messages verbatim
)
agent = create_deep_agent(history_processors=[processor])

Claude Code-Style Lifecycle Hooks

from pydantic_deep import Hook, HookEvent

agent = create_deep_agent(
    hooks=[
        Hook(
            event=HookEvent.PRE_TOOL_USE,
            command="echo 'Tool: $TOOL_NAME args: $TOOL_INPUT' >> /tmp/audit.log",
        ),
    ],
)

MCP Servers

from pydantic_ai.capabilities import MCP

agent = create_deep_agent(
    capabilities=[MCP(url="https://mcp.example.com/api")],
)

Context Files

Pydantic Deep Agents auto-discovers and injects project-specific context into every conversation:

File	Purpose	Who Sees It
`AGENTS.md`	Project conventions, architecture, instructions	Main agent + all subagents
`CLAUDE.md`	Claude Code project instructions	Main agent + all subagents
`SOUL.md`	Agent personality, style, communication preferences	Main agent only
`.cursorrules`	Cursor editor conventions	Main agent only
`.github/copilot-instructions.md`	GitHub Copilot instructions	Main agent only
`CONVENTIONS.md`	Project coding conventions	Main agent only
`CODING_GUIDELINES.md`	Coding guidelines	Main agent only
`MEMORY.md`	Persistent memory — read/write/update tools	Per-agent (isolated)

Compatible with Claude Code, Cursor, GitHub Copilot, and other agent frameworks. AGENTS.md follows the agents.md spec.

See the full API reference for all options.

🔬 DeepResearch — Reference App

A full-featured research deep agent with web UI — built entirely on Pydantic Deep Agents.

Plan Mode — planner asks clarifying questions	Multi-Agent Swarm — 5 subagents researching in parallel
Excalidraw Canvas — live diagrams synced with agent	File Browser — workspace files with inline preview

Web search (Tavily, Brave, Jina), sandboxed code execution, Excalidraw diagrams, plan mode, report export.

cd apps/deepresearch && uv sync && cp .env.example .env
uv run deepresearch    # → http://localhost:8080

See apps/deepresearch/README.md for full setup.

Architecture

Pydantic Deep Agents uses pydantic-ai's native Capabilities API for all cross-cutting concerns — hooks, memory, skills, context files, teams, and plan mode are all first-class pydantic-ai capabilities.

Capabilities

Capability	Package	What It Does
CostTracking	pydantic-ai-shields	Token/USD budget enforcement and real-time cost callbacks
ContextManagerCapability	summarization-pydantic-ai	Unlimited context via auto-summarization
LimitWarnerCapability	summarization-pydantic-ai	URGENT/CRITICAL warnings when context limits approach
StuckLoopDetection	pydantic-deep	Detects and breaks repetitive agent loops
EvictionCapability	pydantic-deep	Intercepts large tool outputs before they enter history
PatchToolCallsCapability	pydantic-deep	Fixes orphaned tool calls/results in history
HooksCapability	pydantic-deep	Claude Code-style PRE/POST_TOOL_USE lifecycle hooks
CheckpointMiddleware	pydantic-deep	Save, rewind, and fork conversation state
WebSearch / WebFetch	pydantic-ai built-in	Tool-calling: web search and URL fetching
SkillsCapability	pydantic-deep	Domain-specific skills from SKILL.md files
MemoryCapability	pydantic-deep	Persistent memory across sessions
TeamCapability	pydantic-deep	Multi-agent swarm — shared TODOs, message bus
PlanCapability	pydantic-deep	Structured planning before execution

Modular Packages

Every component is a standalone package — use only what you need:

Package	What It Does
pydantic-ai-backend	File storage, Docker sandbox, console toolset
pydantic-ai-todo	Task planning with subtasks and dependencies
subagents-pydantic-ai	Sync/async delegation, background tasks, cancellation
summarization-pydantic-ai	LLM summaries or zero-cost sliding window
pydantic-ai-shields	Cost tracking, input/output/tool blocking

                         Pydantic Deep Agents
+---------------------------------------------------------------------+
|                                                                     |
|   +----------+ +----------+ +----------+ +----------+ +---------+   |
|   | Planning | |Filesystem| | Subagents| |  Skills  | |  Teams  |   |
|   +----+-----+ +----+-----+ +----+-----+ +----+-----+ +----+----+   |
|        |            |            |            |            |        |
|        +------------+-----+------+------------+------------+        |
|                           |                                         |
|                           v                                         |
|  Summarization --> +------------------+ <-- Capabilities            |
|  Checkpointing --> |    Deep Agent    | <-- Hooks                   |
|  Cost Tracking --> |   (pydantic-ai)  | <-- Memory                  |
|  Loop Detect   --> |                  | <-- Limit Warner            |
|                    +--------+---------+                             |
|                             |                                       |
|           +-----------------+-----------------+                     |
|           v                 v                 v                     |
|    +------------+    +------------+    +------------+               |
|    |   State    |    |   Local    |    |   Docker   |               |
|    |  Backend   |    |  Backend   |    |  Sandbox   |               |
|    +------------+    +------------+    +------------+               |
|                                                                     |
+---------------------------------------------------------------------+

Full Feature List

Expand

Tool-Calling

ls, read_file, write_file, edit_file, glob, grep, execute — full filesystem access
Docker sandbox with named workspaces — sandboxed execution, packages persist between sessions
Web search (DuckDuckGo, Tavily, Brave) and web fetch
Browser automation via Playwright — navigate, click, type_text, screenshot, execute_js, and more

Deep Agent Architecture

Planning — Task tracking with subtasks, dependencies, and cycle detection
Subagents / Multi-agent swarm — Sync/async delegation, background task management, soft/hard cancellation
Agent Teams — Shared TODO lists with claiming and dependency tracking, peer-to-peer message bus
Plan Mode — Dedicated planner subagent for structured planning before execution
Persistent memory — MEMORY.md that persists across sessions, auto-injected into system prompt
Self-improving — /improve analyzes past sessions, proposes updates to context files

Context & Memory

Unlimited context — Auto-summarization when approaching token budget (LLM-based or sliding window)
Context limit warnings — Model receives URGENT/CRITICAL messages when approaching 70% context usage
Eviction capability — Intercepts large tool outputs via after_tool_execute before they enter history
Context files — Auto-discover and inject AGENTS.md, CLAUDE.md, SOUL.md, .cursorrules, copilot-instructions, CONVENTIONS.md, CODING_GUIDELINES.md
Checkpoints — Save state, rewind or fork conversations. In-memory and file-based stores. Per-run isolation via for_run()

Reliability

Stuck loop detection — Detects repeated identical calls, A-B-A-B alternating, and no-op patterns. Warns or stops the agent
Orphan repair — Fixes orphaned tool calls/results in conversation history before each model request
Context limit warnings — Injects URGENT/CRITICAL messages so the model knows to wrap up

Production Features

MCP — Connect any Model Context Protocol server
Lifecycle hooks — Claude Code-style PRE/POST_TOOL_USE. Shell commands or Python handlers
Structured output — Type-safe responses with Pydantic models via output_type
Cost tracking — Token/USD budgets with automatic enforcement and real-time callbacks
Streaming — Full streaming support for real-time responses
Image support — Multi-modal analysis with image inputs
Human-in-the-loop — Confirmation workflows for sensitive operations
Output styles — Built-in (concise, explanatory, formal, conversational) or custom

CLI

Interactive TUI (Textual) with streaming, tool visualization, session management
Headless runner (pydantic-deep run) for CI/CD, benchmarks, scripted automation
20+ slash commands: /improve, /compact, /diff, /model, /provider, /skills, /theme, and more
@filename file references, !command shell passthrough
Tool approval dialogs with auto-approve
Debug logging per session

Contributing

git clone https://github.com/vstorm-co/pydantic-deepagents.git
cd pydantic-deepagents
make install
make test   # 100% coverage required
make all    # lint + typecheck + test

Vstorm OSS Ecosystem

pydantic-deepagents is part of a broader open-source ecosystem for production AI agents:

Project	Description
full-stack-ai-agent-template	Zero to production AI app in 30 minutes. FastAPI + Next.js 15, 6 AI frameworks (incl. pydantic-deep), RAG pipeline, 75+ config options.
pydantic-ai-shields	Drop-in guardrails for Pydantic AI agents. 5 infra + 5 content shields.
pydantic-ai-subagents	Declarative multi-agent orchestration with token tracking.
pydantic-ai-summarization	Smart context compression for long-running agents.
pydantic-ai-backend	Sandboxed execution for AI agents. Docker + Daytona.

Want the full stack? Use full-stack-ai-agent-template — it ships pydantic-deep integrated with FastAPI, Next.js, auth, WebSocket streaming, and RAG out of the box.

Browse all projects at oss.vstorm.co

Star History

License

MIT — see LICENSE

Need help shipping AI agents in production?

We're Vstorm — an Applied Agentic AI Engineering Consultancy
with 30+ production agent implementations. Pydantic Deep Agents is what we build them with.

Made with care by Vstorm

Name		Name	Last commit message	Last commit date
Latest commit History 241 Commits
.github		.github
apps		apps
assets		assets
docs		docs
examples		examples
jobs/terminal-bench		jobs/terminal-bench
pydantic_deep		pydantic_deep
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
GOVERNANCE.md		GOVERNANCE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
install.sh		install.sh
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Pydantic Deep Agents

What's New

The Agent Harness

🖥️ CLI — Terminal AI Assistant

Install (macOS & Linux)

Model & Provider Support

What you get in the TUI

Usage

🐍 Framework — Build Your Own Agent

Structured Output

Multi-Agent Swarm

Unlimited Context

Claude Code-Style Lifecycle Hooks

MCP Servers

Context Files

🔬 DeepResearch — Reference App

Architecture

Capabilities

Modular Packages

Full Feature List

Tool-Calling

Deep Agent Architecture

Context & Memory

Reliability

Production Features

CLI

Contributing

Vstorm OSS Ecosystem

Star History

License

Need help shipping AI agents in production?

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 37

Uh oh!

Contributors

Uh oh!

Languages