Build reliable Gen AI solutions without overhead 🍕

License

Notifications You must be signed in to change notification settings

emlafza/datapizza-ai

Written in Python. Designed for speed. A no-fluff GenAI framework that gets your agents from dev to prod, fast.

License: MIT • Python 3.10+


🌟 Why Datapizza AI?

A framework that keeps your agents predictable, your debugging fast, and your code trusted in production. Built by Engineers, trusted by Engineers.

⚡ Less abstraction, more control | 🚀 API-first design | 🔧 Observable by design

How to install

pip install datapizza-ai

Client invoke

from datapizza.clients.openai import OpenAIClient

client = OpenAIClient(api_key="YOUR_API_KEY")
result = client.invoke("Hi, how are you?")
print(result.text)
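
Hard-coding the key as in the snippet above is fine for a quick test, but in shared code it is usually safer to read it from the environment. The following is a minimal sketch: `load_api_key` is a hypothetical helper (not part of datapizza-ai), and the `OPENAI_API_KEY` variable name is an assumption you can change.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage with the client from the snippet above
# (commented out so this block runs without the library installed):
# client = OpenAIClient(api_key=load_api_key())
```

Failing early with a clear error tends to be easier to debug than an authentication failure on the first request.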

✨ Key Features

🎯 API-first

  • Multi-Provider Support: OpenAI, Google Gemini, Anthropic, Mistral, Azure
  • Tool Integration: Built-in web search, document processing, custom tools
  • Memory Management: Persistent conversations and context awareness

🔍 Composable

  • Reusable blocks: Declarative configuration, easy overrides
  • Document Processing: PDF, DOCX, images with Azure AI & Docling
  • Smart Chunking: Context-aware text splitting and embedding
  • Built-in reranking: Add a reranker (e.g., Cohere) to boost relevance
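
To make the chunking idea above concrete, here is a minimal sketch of a paragraph-aware splitter with a character budget. It is an illustration only: `split_text` is a hypothetical function, and this greedy strategy is an assumption, not the algorithm used by the library's splitters.

```python
def split_text(text: str, max_char: int = 1024) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_char characters."""
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        # Hard-split a paragraph that is itself longer than max_char.
        pieces = [paragraph[i:i + max_char] for i in range(0, len(paragraph), max_char)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_char:
                current = candidate
            else:
                # The current chunk is full: store it and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaaa\n\nbbbb\n\ncccccccccccc", max_char=10))
# ['aaaa\n\nbbbb', 'cccccccccc', 'cc']
```

Keeping whole paragraphs together where possible is one simple way to preserve context inside each chunk before embedding.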

🔧 Observable

  • OpenTelemetry tracing: Standards-based instrumentation
  • Client I/O tracing: Optional toggle to log inputs, outputs, and in-memory context
  • Custom spans: Trace fine-grained phases and sub-steps to pinpoint bottlenecks
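
The custom-span idea can be sketched with the standard library alone. The block below is a stand-in that only mimics the concept of nested, timed spans; it does not use the OpenTelemetry API or the library's tracing module, and `span` and `SPANS` are hypothetical names.

```python
import time
from contextlib import contextmanager

# Collected (name, depth, seconds) records; a stand-in for a real span exporter.
SPANS: list[tuple[str, int, float]] = []
_depth = 0


@contextmanager
def span(name: str):
    """Time a block of code and record it with its nesting depth."""
    global _depth
    depth = _depth
    _depth += 1
    start = time.perf_counter()
    try:
        yield
    finally:
        _depth -= 1
        SPANS.append((name, depth, time.perf_counter() - start))


with span("answer_question"):
    with span("retrieve"):
        time.sleep(0.01)   # placeholder for a vector search
    with span("generate"):
        time.sleep(0.02)   # placeholder for an LLM call

for name, depth, seconds in SPANS:
    print(f"{'  ' * depth}{name}: {seconds * 1000:.1f} ms")
```

Spans are recorded when they finish, so the two child spans appear before their parent. Comparing the child durations with the parent's total is what lets you pinpoint the slow phase.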

🚀 Vendor-Agnostic

  • Swap models: Change providers without rewiring business logic
  • Clear Interfaces: Predictable APIs across all components
  • Rich Ecosystem: Modular design with optional components
  • Migration-friendly: Quick migration from other frameworks
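
Swapping providers without touching business logic usually comes down to coding against a small shared interface. The sketch below illustrates that pattern with fake clients; `TextClient`, `EchoClient`, and `ShoutClient` are hypothetical names, and `invoke` returns a plain string here for brevity, whereas the library's clients return a result object with a `.text` attribute.

```python
from typing import Protocol


class TextClient(Protocol):
    """Minimal interface the business logic depends on (hypothetical)."""

    def invoke(self, prompt: str) -> str: ...


class EchoClient:
    """Fake provider A: returns the prompt unchanged."""

    def invoke(self, prompt: str) -> str:
        return prompt


class ShoutClient:
    """Fake provider B: returns the prompt upper-cased."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


def summarize(client: TextClient, text: str) -> str:
    """Business logic that only knows about the TextClient interface."""
    return client.invoke(f"Summarize: {text}")


print(summarize(EchoClient(), "pizza"))   # Summarize: pizza
print(summarize(ShoutClient(), "pizza"))  # SUMMARIZE: PIZZA
```

Because `summarize` never names a concrete provider, changing the model is a one-line change at the call site.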

🚀 Quick Start

Installation

# Core framework
pip install datapizza-ai

# With specific providers (optional)
pip install datapizza-ai-clients-openai
pip install datapizza-ai-clients-google
pip install datapizza-ai-clients-anthropic

Start with Agent

from datapizza.agents import Agent
from datapizza.clients.openai import OpenAIClient
from datapizza.tools import tool

@tool
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny"

client = OpenAIClient(api_key="YOUR_API_KEY")
agent = Agent(name="assistant", client=client, tools=[get_weather])

response = agent.run("What is the weather in Rome?")
print(response.text)
# Example output: The weather in Rome is sunny
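
The `get_weather` tool above carries a type hint for every parameter. A framework can only describe a tool to the model if it can read the function's name, parameters, and purpose, so typed signatures and docstrings are worth writing. The following sketch shows how such a description could be derived with `inspect`; `describe_tool` is a hypothetical helper, and this is an assumption about the general technique, not the library's `@tool` implementation.

```python
import inspect


def describe_tool(func) -> dict:
    """Build a simple description of a function from its signature (hypothetical helper)."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            # Use the annotation's class name when available, e.g. "str".
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"The weather in {city} is sunny"


print(describe_tool(get_weather))
# {'name': 'get_weather', 'description': 'Return the current weather for a city.', 'parameters': {'city': 'str'}}
```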

📊 Detailed Tracing

A key requirement for principled development of LLM applications over your data (RAG systems, agents) is the ability to observe and debug them.

Datapizza-ai provides built-in observability with OpenTelemetry tracing to help you monitor performance and understand execution flow.

🔍 Trace Your AI Operations

pip install datapizza-ai-tools-duckduckgo

from datapizza.agents import Agent
from datapizza.clients.openai import OpenAIClient
from datapizza.tools.duckduckgo import DuckDuckGoSearchTool
from datapizza.tracing import ContextTracing

client = OpenAIClient(api_key="YOUR_API_KEY")
agent = Agent(name="assistant", client=client, tools=[DuckDuckGoSearchTool()])

with ContextTracing().trace("my_ai_operation"):
    response = agent.run("Tell me some news about Bitcoin")

# Output shows:
# ╭─ Trace Summary of my_ai_operation ──────────────────────────────────╮
# │ Total Spans: 3                                                      │
# │ Duration: 2.45s                                                     │
# │ ┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ │
# │ ┃ Model       ┃ Prompt Tokens ┃ Completion Tokens ┃ Cached Tokens ┃ │
# │ ┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │
# │ │ gpt-4o-mini │ 31            │ 27                │ 0             │ │
# │ └─────────────┴───────────────┴───────────────────┴───────────────┘ │
# ╰─────────────────────────────────────────────────────────────────────╯

🎯 Examples

🌐 Multi-Agent System

Build sophisticated AI systems where multiple specialized agents collaborate to solve complex tasks. This example shows how to create a trip planning system with dedicated agents for weather information, web search, and planning coordination.

# Install the DuckDuckGo tool
pip install datapizza-ai-tools-duckduckgo

from datapizza.agents.agent import Agent
from datapizza.clients.openai import OpenAIClient
from datapizza.tools import tool
from datapizza.tools.duckduckgo import DuckDuckGoSearchTool

client = OpenAIClient(api_key="YOUR_API_KEY", model="gpt-4.1")

@tool
def get_weather(city: str) -> str:
    return f"It's sunny all week in {city}"

weather_agent = Agent(
    name="weather_expert",
    client=client,
    system_prompt="You are a weather expert. Provide detailed weather information and forecasts.",
    tools=[get_weather]
)

web_search_agent = Agent(
    name="web_search_expert",
    client=client,
    system_prompt="You are a web search expert. You can search the web for information.",
    tools=[DuckDuckGoSearchTool()]
)

planner_agent = Agent(
    name="planner",
    client=client,
    system_prompt="You are a trip planner. You should provide a plan for the user. Make sure to provide a detailed plan with the best places to visit and the best time to visit them."
)

planner_agent.can_call([weather_agent, web_search_agent])

response = planner_agent.run(
    "I need to plan a hiking trip in Seattle next week. I want to see some waterfalls and a forest."
)
print(response.text)

📊 Document Ingestion

Process and index documents for retrieval-augmented generation (RAG). This pipeline automatically parses PDFs, splits them into chunks, generates embeddings, and stores them in a vector database for efficient similarity search.

pip install datapizza-ai-parsers-docling

from datapizza.core.vectorstore import VectorConfig
from datapizza.embedders import ChunkEmbedder
from datapizza.embedders.openai import OpenAIEmbedder
from datapizza.modules.parsers.docling import DoclingParser
from datapizza.modules.splitters import NodeSplitter
from datapizza.pipeline import IngestionPipeline
from datapizza.vectorstores.qdrant import QdrantVectorstore

vectorstore = QdrantVectorstore(location=":memory:")
embedder = ChunkEmbedder(client=OpenAIEmbedder(api_key="YOUR_API_KEY", model_name="text-embedding-3-small"))
vectorstore.create_collection("my_documents", vector_config=[VectorConfig(name="embedding", dimensions=1536)])

pipeline = IngestionPipeline(
    modules=[
        DoclingParser(),
        NodeSplitter(max_char=1024),
        embedder,
    ],
    vector_store=vectorstore,
    collection_name="my_documents"
)

pipeline.run("sample.pdf")

results = vectorstore.search(query_vector=[0.0] * 1536, collection_name="my_documents", k=5)
print(results)
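
The all-zero `query_vector` in the search call above is only a placeholder that shows the call shape. For meaningful results the query should be embedded with the same model used at ingestion time. The sketch below shows why, assuming the collection ranks by cosine similarity (an assumption, as the distance metric is not stated here); `cosine` and `top_k` are hypothetical helpers on toy 3-dimensional vectors.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k stored vectors most similar to the query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


store = {
    "pizza": [1.0, 0.0, 0.0],
    "pasta": [0.8, 0.6, 0.0],
    "tax law": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.0, 0.0], store, k=2))  # pizza first, then pasta
print(top_k([0.0, 0.0, 0.0], store, k=2))  # every score is 0.0: a zero query cannot rank anything
```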

📊 RAG (Retrieval-Augmented Generation)

Create a complete RAG pipeline that enhances AI responses with relevant document context. This example demonstrates query rewriting, embedding generation, document retrieval, and response generation in a connected workflow.

from datapizza.clients.openai import OpenAIClient
from datapizza.embedders.openai import OpenAIEmbedder
from datapizza.modules.prompt import ChatPromptTemplate
from datapizza.modules.rewriters import ToolRewriter
from datapizza.pipeline import DagPipeline
from datapizza.vectorstores.qdrant import QdrantVectorstore

openai_client = OpenAIClient(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY"
)

dag_pipeline = DagPipeline()
dag_pipeline.add_module("rewriter", ToolRewriter(client=openai_client, system_prompt="Rewrite user queries to improve retrieval accuracy."))
dag_pipeline.add_module("embedder", OpenAIEmbedder(api_key="YOUR_API_KEY", model_name="text-embedding-3-small"))
dag_pipeline.add_module("retriever", QdrantVectorstore(host="localhost", port=6333).as_retriever(collection_name="my_documents", k=5))
dag_pipeline.add_module("prompt", ChatPromptTemplate(user_prompt_template="User question: {{user_prompt}}\n:", retrieval_prompt_template="Retrieved content:\n{% for chunk in chunks %}{{ chunk.text }}\n{% endfor %}"))
dag_pipeline.add_module("generator", openai_client)

dag_pipeline.connect("rewriter", "embedder", target_key="text")
dag_pipeline.connect("embedder", "retriever", target_key="query_vector")
dag_pipeline.connect("retriever", "prompt", target_key="chunks")
dag_pipeline.connect("prompt", "generator", target_key="memory")

query = "tell me something about this document"
result = dag_pipeline.run({
    "rewriter": {"user_prompt": query},
    "prompt": {"user_prompt": query},
    "retriever": {"collection_name": "my_documents", "k": 3},
    "generator": {"input": query}
})

print(f"Generated response: {result['generator']}")
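
The `connect` calls above define a directed acyclic graph, and each module can only run once its upstream modules have produced output. As a rough illustration of that ordering, the sketch below feeds the same edges to the standard library's `graphlib`. It is an assumption about the general technique and not necessarily how `DagPipeline` schedules modules internally.

```python
from graphlib import TopologicalSorter

# Edges copied from the connect() calls above: (source, target).
edges = [
    ("rewriter", "embedder"),
    ("embedder", "retriever"),
    ("retriever", "prompt"),
    ("prompt", "generator"),
]

# TopologicalSorter expects a mapping of node -> set of predecessors.
predecessors: dict[str, set[str]] = {}
for source, target in edges:
    predecessors.setdefault(source, set())
    predecessors.setdefault(target, set()).add(source)

order = list(TopologicalSorter(predecessors).static_order())
print(order)  # ['rewriter', 'embedder', 'retriever', 'prompt', 'generator']
```

Adding an edge that closes a loop, for example from `generator` back to `rewriter`, would make `static_order()` raise `graphlib.CycleError`, which is a cheap way to validate a pipeline definition before running it.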

🌐 Ecosystem

🤖 Supported AI Providers

  • OpenAI
  • Google Gemini
  • Anthropic
  • Mistral
  • Azure OpenAI

🔧 Tools & Integrations

Category | Components
📄 Document Parsers | Azure AI Document Intelligence, Docling
🔍 Vector Stores | Qdrant
🎯 Rerankers | Cohere, Together AI
🌐 Tools | DuckDuckGo Search, Custom Tools
💾 Caching | Redis integration for performance optimization
📊 Embedders | OpenAI, Google, Cohere, FastEmbed

🤝 Community

🌟 Contributing

We love contributions! Whether it's:

  • 🐛 Bug Reports - Help us improve
  • 💡 Feature Requests - Share your ideas
  • 📝 Documentation - Make it better for everyone
  • 🔧 Code Contributions - Build the future together

Check out our Contributing Guide to get started.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Built by Datapizza, the AI-native company

A framework made to be easy to learn, easy to maintain, and ready for production 🍕

โญ Star us on GitHub โ€ข ๐Ÿš€ Get Started โ€ข ๐Ÿ’ฌ Join Discord
