GitHub - emlafza/datapizza-ai: Build reliable Gen AI solutions without overhead 🍕

Build reliable Gen AI solutions without overhead

Written in Python. Designed for speed. A no-fluff GenAI framework that gets your agents from dev to prod, fast

🏠Homepage • 🚀 Quick Start • 📖 Documentation • 🎯 Examples • 🤝 Community

🌟 Why Datapizza AI?

A framework that keeps your agents predictable, your debugging fast, and your code trusted in production. Built by Engineers, trusted by Engineers.

⚡ Less abstraction, more control | 🚀 API-first design | 🔧 Observable by design

How to install

pip install datapizza-ai

Client invoke

from datapizza.clients.openai import OpenAIClient

client = OpenAIClient(api_key="YOUR_API_KEY")
result = client.invoke("Hi, how are u?")
print(result.text)

✨ Key Features

🎯 API-first Multi-Provider Support: OpenAI, Google Gemini, Anthropic, Mistral, Azure Tool Integration: Built-in web search, document processing, custom tools Memory Management: Persistent conversations and context awareness	🔍 Composable Reusable blocks: Declarative configuration, easy overrides Document Processing: PDF, DOCX, images with Azure AI & Docling Smart Chunking: Context-aware text splitting and embedding Built-in reranking: Add a reranker (e.g., Cohere) to boost relevance
🔧 Observable OpenTelemetry tracing: Standards-based instrumentation Client I/O tracing: Optional toggle to log inputs, outputs, and in-memory context Custom spans: Trace fine-grained phases and sub-steps to pinpoint bottlenecks	🚀 Vendor-Agnostic Swap models: Change providers without rewiring business logic Clear Interfaces: Predictable APIs across all components Rich Ecosystem: Modular design with optional components Migration-friendly: Quick migration from other frameworks

🚀 Quick Start

Installation

# Core framework
pip install datapizza-ai

# With specific providers (optional)
pip install datapizza-ai-clients-openai
pip install datapizza-ai-clients-google
pip install datapizza-ai-clients-anthropic

Start with Agent

from datapizza.agents import Agent
from datapizza.clients.openai import OpenAIClient
from datapizza.tools import tool

@tool
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny"

client = OpenAIClient(api_key="YOUR_API_KEY")
agent = Agent(name="assistant", client=client, tools = [get_weather])

response = agent.run("What is the weather in Rome?")
# output: The weather in Rome is sunny

📊 Detailed Tracing

A key requirement for principled development of LLM applications over your data (RAG systems, agents) is being able to observe and debug.

Datapizza-ai provides built-in observability with OpenTelemetry tracing to help you monitor performance and understand execution flow.

🔍 Trace Your AI Operations

pip install datapizza-ai-tools-duckduckgo

from datapizza.agents import Agent
from datapizza.clients.openai import OpenAIClient
from datapizza.tools.duckduckgo import DuckDuckGoSearchTool
from datapizza.tracing import ContextTracing

client = OpenAIClient(api_key="OPENAI_API_KEY")
agent = Agent(name="assistant", client=client, tools = [DuckDuckGoSearchTool()])

with ContextTracing().trace("my_ai_operation"):
    response = agent.run("Tell me some news about Bitcoin")

# Output shows:
# ╭─ Trace Summary of my_ai_operation ──────────────────────────────────╮
# │ Total Spans: 3                                                      │
# │ Duration: 2.45s                                                     │
# │ ┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ |
# │ ┃ Model       ┃ Prompt Tokens ┃ Completion Tokens ┃ Cached Tokens ┃ |
# │ ┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ |
# │ │ gpt-4o-mini │ 31            │ 27                │ 0             │ |
# │ └─────────────┴───────────────┴───────────────────┴───────────────┘ |
# ╰─────────────────────────────────────────────────────────────────────╯

🎯 Examples

🌐 Multi-Agent System

Build sophisticated AI systems where multiple specialized agents collaborate to solve complex tasks. This example shows how to create a trip planning system with dedicated agents for weather information, web search, and planning coordination.

# Install DuckDuckGo tool
pip install datapizza-ai-tools-duckduckgo

from datapizza.agents.agent import Agent
from datapizza.clients.openai import OpenAIClient
from datapizza.tools import tool
from datapizza.tools.duckduckgo import DuckDuckGoSearchTool

client = OpenAIClient(api_key="YOUR_API_KEY", model="gpt-4.1")

@tool
def get_weather(city: str) -> str:
    return f""" it's sunny all the week in {city}"""

weather_agent = Agent(
    name="weather_expert",
    client=client,
    system_prompt="You are a weather expert. Provide detailed weather information and forecasts.",
    tools=[get_weather]
)

web_search_agent = Agent(
    name="web_search_expert",
    client=client,
    system_prompt="You are a web search expert. You can search the web for information.",
    tools=[DuckDuckGoSearchTool()]
)

planner_agent = Agent(
    name="planner",
    client=client,
    system_prompt="You are a trip planner. You should provide a plan for the user. Make sure to provide a detailed plan with the best places to visit and the best time to visit them."
)

planner_agent.can_call([weather_agent, web_search_agent])

response = planner_agent.run(
    "I need to plan a hiking trip in Seattle next week. I want to see some waterfalls and a forest."
)
print(response.text)

📊 Document Ingestion

Process and index documents for retrieval-augmented generation (RAG). This pipeline automatically parses PDFs, splits them into chunks, generates embeddings, and stores them in a vector database for efficient similarity search.

pip install datapizza-ai-parsers-docling

from datapizza.core.vectorstore import VectorConfig
from datapizza.embedders import ChunkEmbedder
from datapizza.embedders.openai import OpenAIEmbedder
from datapizza.modules.parsers.docling import DoclingParser
from datapizza.modules.splitters import NodeSplitter
from datapizza.pipeline import IngestionPipeline
from datapizza.vectorstores.qdrant import QdrantVectorstore

vectorstore = QdrantVectorstore(location=":memory:")
embedder = ChunkEmbedder(client=OpenAIEmbedder(api_key="YOUR_API_KEY", model_name="text-embedding-3-small"))
vectorstore.create_collection("my_documents",vector_config=[VectorConfig(name="embedding", dimensions=1536)])

pipeline = IngestionPipeline(
    modules=[
        DoclingParser(),
        NodeSplitter(max_char=1024),
        embedder,
    ],
    vector_store=vectorstore,
    collection_name="my_documents"
)

pipeline.run("sample.pdf")

results = vectorstore.search(query_vector = [0.0] * 1536, collection_name="my_documents", k=5)
print(results)

📊 RAG (Retrieval-Augmented Generation)

Create a complete RAG pipeline that enhances AI responses with relevant document context. This example demonstrates query rewriting, embedding generation, document retrieval, and response generation in a connected workflow.

from datapizza.clients.openai import OpenAIClient
from datapizza.embedders.openai import OpenAIEmbedder
from datapizza.modules.prompt import ChatPromptTemplate
from datapizza.modules.rewriters import ToolRewriter
from datapizza.pipeline import DagPipeline
from datapizza.vectorstores.qdrant import QdrantVectorstore

openai_client = OpenAIClient(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY"
)

dag_pipeline = DagPipeline()
dag_pipeline.add_module("rewriter", ToolRewriter(client=openai_client, system_prompt="Rewrite user queries to improve retrieval accuracy."))
dag_pipeline.add_module("embedder", OpenAIEmbedder(api_key= "YOUR_API_KEY", model_name="text-embedding-3-small"))
dag_pipeline.add_module("retriever", QdrantVectorstore(host="localhost", port=6333).as_retriever(collection_name="my_documents", k=5))
dag_pipeline.add_module("prompt", ChatPromptTemplate(user_prompt_template="User question: {{user_prompt}}\n:", retrieval_prompt_template="Retrieved content:\n{% for chunk in chunks %}{{ chunk.text }}\n{% endfor %}"))
dag_pipeline.add_module("generator", openai_client)

dag_pipeline.connect("rewriter", "embedder", target_key="text")
dag_pipeline.connect("embedder", "retriever", target_key="query_vector")
dag_pipeline.connect("retriever", "prompt", target_key="chunks")
dag_pipeline.connect("prompt", "generator", target_key="memory")

query = "tell me something about this document"
result = dag_pipeline.run({
    "rewriter": {"user_prompt": query},
    "prompt": {"user_prompt": query},
    "retriever": {"collection_name": "my_documents", "k": 3},
    "generator":{"input": query}
})

print(f"Generated response: {result['generator']}")

🌐 Ecosystem

🤖 Supported AI Providers

OpenAI

Google Gemini

Anthropic

Mistral

Azure OpenAI

🔧 Tools & Integrations

Category	Components
📄 Document Parsers	Azure AI Document Intelligence, Docling
🔍 Vector Stores	Qdrant
🎯 Rerankers	Cohere, Together AI
🌐 Tools	DuckDuckGo Search, Custom Tools
💾 Caching	Redis integration for performance optimization
📊 Embedders	OpenAI, Google, Cohere, FastEmbed

🎓 Learning Resources

📖 Complete Documentation - Comprehensive guides and API reference
🎯 RAG Tutorial - Build production RAG systems
🤖 Agent Examples - Real-world agent implementations

🤝 Community

💬 Discord Community
📚 Documentation
📧 GitHub Issues
🐦 Twitter

🌟 Contributing

We love contributions! Whether it's:

🐛 Bug Reports - Help us improve
💡 Feature Requests - Share your ideas
📝 Documentation - Make it better for everyone
🔧 Code Contributions - Build the future together

Check out our Contributing Guide to get started.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built by Datapizza, the AI native company

A framework made to be easy to learn, easy to maintain and ready for production 🍕

⭐ Star us on GitHub • 🚀 Get Started • 💬 Join Discord

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
datapizza-ai-cache/redis		datapizza-ai-cache/redis
datapizza-ai-clients		datapizza-ai-clients
datapizza-ai-core		datapizza-ai-core
datapizza-ai-embedders		datapizza-ai-embedders
datapizza-ai-eval		datapizza-ai-eval
datapizza-ai-modules		datapizza-ai-modules
datapizza-ai-tools		datapizza-ai-tools
datapizza-ai-vectorstores/datapizza-ai-vectorstores-qdrant		datapizza-ai-vectorstores/datapizza-ai-vectorstores-qdrant
docs		docs
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
pyrightconfig.json		pyrightconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌟 Why Datapizza AI?

⚡ Less abstraction, more control | 🚀 API-first design | 🔧 Observable by design

How to install

Client invoke

✨ Key Features

🎯 API-first

🔍 Composable

🔧 Observable

🚀 Vendor-Agnostic

🚀 Quick Start

Installation

Start with Agent

📊 Detailed Tracing

🎯 Examples

🌐 Multi-Agent System

📊 Document Ingestion

📊 RAG (Retrieval-Augmented Generation)

🌐 Ecosystem

🤖 Supported AI Providers

🔧 Tools & Integrations

🎓 Learning Resources

🤝 Community

🌟 Contributing

📄 License

Star History

About

Uh oh!

Releases

Packages

Languages

License

emlafza/datapizza-ai

Folders and files

Latest commit

History

Repository files navigation

🌟 Why Datapizza AI?

⚡ Less abstraction, more control | 🚀 API-first design | 🔧 Observable by design

How to install

Client invoke

✨ Key Features

🎯 API-first

🔍 Composable

🔧 Observable

🚀 Vendor-Agnostic

🚀 Quick Start

Installation

Start with Agent

📊 Detailed Tracing

🎯 Examples

🌐 Multi-Agent System

📊 Document Ingestion

📊 RAG (Retrieval-Augmented Generation)

🌐 Ecosystem

🤖 Supported AI Providers

🔧 Tools & Integrations

🎓 Learning Resources

🤝 Community

🌟 Contributing

📄 License

Star History

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages