Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
49 commits
Select commit Hold shift + click to select a range
eb3df89
Add core harness API scaffold with context-scoped runtime
saschabuehrle Feb 25, 2026
8b0d2e0
Harden harness core scaffolding and complete API test coverage
saschabuehrle Feb 25, 2026
dadd279
feat(harness): implement OpenAI Python client auto-instrumentation
saschabuehrle Feb 25, 2026
75ff333
fix: address PR review — off-mode unpatch, enforce budget gate, strea…
saschabuehrle Feb 26, 2026
1f0fad0
Add OpenAI Agents SDK harness integration (opt-in)
saschabuehrle Feb 25, 2026
7bc50de
fix(openai-agents): align SDK interface and enforce-safe errors
saschabuehrle Feb 26, 2026
559fb60
Add CrewAI harness integration with before/after LLM-call hooks
saschabuehrle Feb 26, 2026
a498bf3
fix: address PR review — dict messages, start time leak, lint, extras
saschabuehrle Feb 26, 2026
1cf5590
docs(plan): claim v2 enforce-actions feature branch
saschabuehrle Feb 26, 2026
cb69081
feat(harness): enforce switch-model, deny-tool, and stop actions
saschabuehrle Feb 26, 2026
d032ba6
feat(harness): implement enforce actions for v2 harness
saschabuehrle Feb 26, 2026
bcee09c
fix(harness): clarify observe traces and hard-stop semantics
saschabuehrle Feb 26, 2026
ee6e040
perf(harness): optimize model utility hot paths
saschabuehrle Feb 26, 2026
b54637b
refactor(harness): unify pricing profiles across integrations
saschabuehrle Feb 26, 2026
6afcfa7
docs(plan): claim langchain harness extension branch
saschabuehrle Feb 26, 2026
cc51cf7
feat(harness): add privacy-safe decision telemetry and callback hooks
saschabuehrle Feb 26, 2026
ae1cf97
fix(harness): address telemetry review findings
saschabuehrle Mar 2, 2026
49ee601
fix(harness): avoid shadowing cascadeflow.agent module
saschabuehrle Mar 2, 2026
c1236f1
style: apply black formatting for harness integration files
saschabuehrle Mar 2, 2026
0261925
feat(langchain): add harness-aware callback and state extractor
saschabuehrle Feb 26, 2026
44506b8
feat(langchain): auto-attach harness callback in active run scopes
saschabuehrle Feb 26, 2026
f70572d
docs(plan): mark langchain harness extension branch completed
saschabuehrle Feb 26, 2026
d740cad
fix(langchain): address PR #161 review findings
saschabuehrle Feb 26, 2026
3bd7899
fix(langchain): enforce tool caps on executed calls and harden tool e…
saschabuehrle Feb 26, 2026
8f74dee
fix(harness): avoid shadowing cascadeflow.agent module
saschabuehrle Mar 2, 2026
5972e8b
feat(bench): add reproducibility pipeline for V2 Go/No-Go validation
saschabuehrle Mar 2, 2026
97250f4
docs(plan): update workboard — bench-repro-pipeline PR #163 in review
saschabuehrle Mar 2, 2026
805fef1
style(bench): apply linter formatting to repro pipeline files
saschabuehrle Mar 2, 2026
f05ca3d
style(langchain): finalize harness callback typing and formatting
saschabuehrle Mar 2, 2026
98f48bd
feat(integrations): add Google ADK harness plugin
saschabuehrle Mar 4, 2026
aa5fa3c
fix: resolve import regression and callback-key collision
saschabuehrle Mar 4, 2026
05c423f
fix: address PR #165 review — 5 findings resolved
saschabuehrle Mar 4, 2026
af414d0
feat(harness): add anthropic python auto-instrumentation for v2.1
saschabuehrle Mar 4, 2026
76f6c2e
feat(core): deliver v2.1 ts harness parity and sdk auto-instrumentation
saschabuehrle Mar 4, 2026
de7db49
test(harness): add comprehensive Anthropic auto-instrumentation tests
saschabuehrle Mar 4, 2026
de4a638
feat(harness): instrument Anthropic streaming usage and tool calls
saschabuehrle Mar 4, 2026
ac15742
fix(harness): finalize stream metrics on errors and harden env parsing
saschabuehrle Mar 4, 2026
b894cd3
docs: add harness quickstart and missing integration coverage
saschabuehrle Mar 4, 2026
6d3e6a8
feat(n8n): add multi-dimensional harness integration to Agent node
saschabuehrle Mar 4, 2026
510bdd1
fix(google-adk): initialize plugin name and stabilize callback correl…
saschabuehrle Mar 4, 2026
bace69d
chore(dx): clarify integration prerequisites and add optional integra…
saschabuehrle Mar 4, 2026
27b9402
style: apply Black formatting to 7 Python files
saschabuehrle Mar 4, 2026
37276b2
chore(ci/docs): enforce integration matrix across python versions
saschabuehrle Mar 4, 2026
1b470d6
style: fix ruff I001 import sorting in google_adk_harness example
saschabuehrle Mar 4, 2026
a986060
feat(benchmarks): add baseline and savings metrics to agentic tool be…
saschabuehrle Mar 5, 2026
39a469e
feat(dx): add LangChain harness docs, harness example, and llms.txt
saschabuehrle Mar 5, 2026
ca7fa4a
harden harness: input validation, trace rotation, NaN guard, phantom …
saschabuehrle Mar 5, 2026
9547ab1
docs: reframe positioning as agent runtime intelligence layer + add M…
saschabuehrle Mar 5, 2026
adbf47e
fix: switch GitHub Stars badge from social to flat style
saschabuehrle Mar 5, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,52 @@ jobs:
fail_ci_if_error: false
token: ${{ secrets.CODECOV_TOKEN }}

# Python opt-in integration install + focused tests
test-python-optional-integrations:
name: Python Optional Integrations (${{ matrix.integration }} / py${{ matrix.python-version }})
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
include:
- integration: openai-agents
python-version: '3.9'
extras: ".[dev,openai,openai-agents]"
tests: "tests/test_openai_agents_integration.py"
- integration: openai-agents
python-version: '3.11'
extras: ".[dev,openai,openai-agents]"
tests: "tests/test_openai_agents_integration.py"
- integration: crewai
python-version: '3.11'
extras: ".[dev,crewai,openai]"
tests: "tests/test_crewai_integration.py"
- integration: google-adk
python-version: '3.11'
extras: ".[dev,google-adk]"
tests: "tests/test_google_adk_integration.py"

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'

- name: Install integration dependencies
run: |
python -m pip install --upgrade pip
pip install -e "${{ matrix.extras }}"

- name: Run focused integration tests
run: |
pytest ${{ matrix.tests }} -v
env:
PYTHONPATH: ${{ github.workspace }}

# TypeScript Core Tests
test-typescript-core:
name: TypeScript Core Tests
Expand Down
60 changes: 52 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
<img alt="cascadeflow Logo" src="./.github/assets/CF_logo_dark.svg" width="80%" style="margin: 20px auto;">
</picture>

# Smart AI model cascading for cost optimization
# Agent Runtime Intelligence Layer

[![PyPI version](https://img.shields.io/pypi/v/cascadeflow?color=blue&label=Python)](https://pypi.org/project/cascadeflow/)
[![npm version](https://img.shields.io/npm/v/@cascadeflow/core?color=red&label=TypeScript)](https://www.npmjs.com/package/@cascadeflow/core)
Expand All @@ -17,28 +17,27 @@
[![PyPI Downloads](https://static.pepy.tech/badge/cascadeflow)](https://pepy.tech/project/cascadeflow)
[![npm Downloads](https://img.shields.io/npm/dt/@cascadeflow/n8n-nodes-cascadeflow?label=npm%20downloads&color=orange)](https://www.npmjs.com/search?q=%40cascadeflow)
[![Tests](https://github.com/lemony-ai/cascadeflow/actions/workflows/test.yml/badge.svg)](https://github.com/lemony-ai/cascadeflow/actions/workflows/test.yml)
[![Docs](https://img.shields.io/badge/docs-cascadeflow.dev-blue)](https://docs.cascadeflow.dev)
[![Python Docs](https://img.shields.io/badge/docs-Python-blue)](./docs/)
[![TypeScript Docs](https://img.shields.io/badge/docs-TypeScript-red)](./docs/)
[![X Follow](https://img.shields.io/twitter/follow/saschabuehrle?style=social)](https://x.com/saschabuehrle)
[![GitHub Stars](https://img.shields.io/github/stars/lemony-ai/cascadeflow?style=social)](https://github.com/lemony-ai/cascadeflow)
[![GitHub Stars](https://img.shields.io/github/stars/lemony-ai/cascadeflow?style=flat&color=yellow&label=Stars)](https://github.com/lemony-ai/cascadeflow/stargazers)

<br>

**[Cost Savings Benchmarks](./tests/benchmarks/):** 69% (MT-Bench), 93% (GSM8K), 52% (MMLU), 80% (TruthfulQA) savings, retaining 96% GPT-5 quality.

<br>

**[<img src=".github/assets/CF_python_color.svg" width="22" height="22" alt="Python" style="vertical-align: middle;"/> Python](#-python) • [<img src=".github/assets/CF_ts_color.svg" width="22" height="22" alt="TypeScript" style="vertical-align: middle;"/> TypeScript](#-typescript) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/LC-logo-bright.png"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/LC-logo-dark.png"><img src=".github/assets/LC-logo-dark.png" height="22" alt="LangChain" style="vertical-align: middle;"></picture> LangChain](#-langchain-integration) • [<img src=".github/assets/CF_n8n_color.svg" width="22" height="22" alt="n8n" style="vertical-align: middle;"/> n8n](#-n8n-integration) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/CF_vercel_bright.svg"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/CF_vercel_dark.svg"><img src=".github/assets/CF_vercel_dark.svg" width="22" height="22" alt="Vercel AI" style="vertical-align: middle;"></picture> Vercel AI](./packages/integrations/vercel-ai/) • [<img src=".github/assets/CF_openclaw_color.svg" width="22" height="22" alt="OpenClaw" style="vertical-align: middle;"/> OpenClaw](https://clawhub.ai/saschabuehrle/cascadeflow) • [📖 Docs](./docs/) • [💡 Examples](#examples)**
**[<img src=".github/assets/CF_python_color.svg" width="22" height="22" alt="Python" style="vertical-align: middle;"/> Python](#-python) • [<img src=".github/assets/CF_ts_color.svg" width="22" height="22" alt="TypeScript" style="vertical-align: middle;"/> TypeScript](#-typescript) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/LC-logo-bright.png"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/LC-logo-dark.png"><img src=".github/assets/LC-logo-dark.png" height="22" alt="LangChain" style="vertical-align: middle;"></picture> LangChain](#-langchain-integration) • [<img src=".github/assets/CF_n8n_color.svg" width="22" height="22" alt="n8n" style="vertical-align: middle;"/> n8n](#-n8n-integration) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/CF_vercel_bright.svg"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/CF_vercel_dark.svg"><img src=".github/assets/CF_vercel_dark.svg" width="22" height="22" alt="Vercel AI" style="vertical-align: middle;"></picture> Vercel AI](./packages/integrations/vercel-ai/) • [<img src=".github/assets/CF_openclaw_color.svg" width="22" height="22" alt="OpenClaw" style="vertical-align: middle;"/> OpenClaw](https://clawhub.ai/saschabuehrle/cascadeflow) • [Full Docs](https://docs.cascadeflow.dev) • [📖 Docs](./docs/) • [💡 Examples](#examples)**

</div>

---

**Stop Bleeding Money on AI Calls. Cut Costs 30-65% in 3 Lines of Code.**
**The in-process intelligence layer for AI agents.** Optimize cost, latency, quality, budget, compliance, and energy — inside the execution loop, not at the HTTP boundary.

40-70% of text prompts and 20-60% of agent calls don't need expensive flagship models. You're overpaying every single day.

*cascadeflow fixes this with intelligent model cascading, available in Python and TypeScript.*
cascadeflow works where external proxies can't: per-step model decisions based on agent state, per-tool-call budget gating, runtime stop/continue/escalate actions, and business KPI injection during agent loops. Sub-1ms overhead. Works with LangChain, OpenAI Agents SDK, CrewAI, Google ADK, n8n, and Vercel AI SDK.

```python
pip install cascadeflow
Expand All @@ -52,6 +51,17 @@ npm install @cascadeflow/core

## Why cascadeflow?

### Proxy vs In-Process Harness

| Dimension | External Proxy | cascadeflow Harness |
|---|---|---|
| **Scope** | HTTP request boundary | Inside agent execution loop |
| **Dimensions** | Cost only | Cost + quality + latency + budget + compliance + energy |
| **Latency overhead** | 10-50ms network RTT | <1ms in-process |
| **Business logic** | None | KPI weights and targets |
| **Enforcement** | None (observe only) | stop, deny_tool, switch_model |
| **Auditability** | Request logs | Per-step decision traces |

cascadeflow is an intelligent AI model cascading library that dynamically selects the optimal model for each query or tool call through speculative execution. It's based on the research that 40-70% of queries don't require slow, expensive flagship models, and domain-specific smaller models often outperform large general-purpose models on specialized tasks. For the remaining queries that need advanced reasoning, cascadeflow automatically escalates to flagship models if needed.

### Use Cases
Expand Down Expand Up @@ -140,6 +150,34 @@ In practice, 60-70% of queries are handled by small, efficient models (8-20x cos

---

## Harness API

Three tiers of integration — zero-change observability to full policy control:

**Tier 1: Zero-change observability**
```python
import cascadeflow
cascadeflow.init(mode="observe")
# All OpenAI/Anthropic SDK calls are now tracked. No code changes needed.
```

**Tier 2: Scoped runs with budget**
```python
with cascadeflow.run(budget=0.50, max_tool_calls=10) as session:
result = await agent.run("Analyze this dataset")
print(session.summary()) # cost, latency, energy, steps, tool calls
print(session.trace()) # full decision audit trail
```

**Tier 3: Decorated agents with policy**
```python
@cascadeflow.agent(budget=0.20, compliance="gdpr", kpi_weights={"quality": 0.6, "cost": 0.3, "latency": 0.1})
async def my_agent(query: str):
return await llm.complete(query)
```

---

## Quick Start

### Drop-In Gateway (Existing Apps)
Expand Down Expand Up @@ -724,6 +762,12 @@ console.log(`Warnings: ${validation.warnings}`);
| 📋 **Message & Tool Call Lists** | Full conversation history with tool_calls and tool_call_id preservation across turns |
| 🪝 **Hooks & Callbacks** | Telemetry callbacks, cost events, and streaming hooks for observability |
| 🏭 **Production Ready** | Streaming, batch processing, tool handling, reasoning model support, caching, error recovery, anomaly detection |
| 💳 **Budget Enforcement** | Per-run and per-user budget caps with automatic stop actions when limits are exceeded |
| 🔒 **Compliance Gating** | GDPR, HIPAA, PCI, and strict model allowlists — block non-compliant models before execution |
| 📊 **KPI-Weighted Routing** | Inject business priorities (quality, cost, latency, energy) as weights into every model decision |
| 🌱 **Energy Tracking** | Deterministic compute-intensity coefficients for carbon-aware AI operations |
| 🔍 **Decision Traces** | Full per-step audit trail: action, reason, model, cost, budget state, enforcement status |
| ⚙️ **Harness Modes** | off / observe / enforce — roll out safely with observe, then switch to enforce when ready |

---

Expand Down Expand Up @@ -774,7 +818,7 @@ If you use cascadeflow in your research or project, please cite:
```bibtex
@software{cascadeflow2025,
author = {Lemony Inc., Sascha Buehrle and Contributors},
title = {cascadeflow: Smart AI model cascading for cost optimization},
title = {cascadeflow: Agent runtime intelligence layer for AI agent workflows},
year = {2025},
publisher = {GitHub},
url = {https://github.com/lemony-ai/cascadeflow}
Expand Down
51 changes: 23 additions & 28 deletions cascadeflow/__init__.py
Original file line number Diff line number Diff line change
@@ -1,30 +1,23 @@
"""
cascadeflow - Smart AI model cascading for cost optimization.

Route queries intelligently across multiple AI models from tiny SLMs
to frontier LLMs based on complexity, domain, and budget.

Features:
- 🚀 Speculative cascades (2-3x faster)
- 💰 60-95% cost savings
- 🎯 Per-prompt domain detection
- 🎨 2.0x domain boost for specialists
- 🔍 Multi-factor optimization
- 🆓 Free tier (Ollama + Groq)
- ⚡ 3 lines of code

Example:
>>> from cascadeflow import CascadeAgent, CascadePresets
>>>
>>> # Auto-detect available models
>>> models = CascadePresets.auto_detect_models()
>>>
>>> # Create agent with intelligence layer
>>> agent = CascadeAgent(models, enable_caching=True)
>>>
>>> # Run query (automatically optimized!)
>>> result = await agent.run("Fix this Python bug")
>>> print(f"Used {result.model_used} - Cost: ${result.cost:.6f}")
cascadeflow - Agent runtime intelligence layer.

In-process harness that optimizes cost, latency, quality, budget, compliance,
and energy across AI agent workflows. Works inside agent execution loops with
full state awareness -- not an external proxy.

Quick start:
import cascadeflow
cascadeflow.init(mode="observe")
# All OpenAI/Anthropic SDK calls are now tracked and traced.

Key APIs:
cascadeflow.init(mode) -- activate harness (off | observe | enforce)
cascadeflow.run(budget) -- scoped run with budget/trace
@cascadeflow.agent(budget) -- policy metadata on agent functions
session.summary() -- structured metrics
session.trace() -- full decision audit trail

Integrations: LangChain, OpenAI Agents SDK, CrewAI, Google ADK, n8n, Vercel AI SDK
"""

__version__ = "1.0.0"
Expand Down Expand Up @@ -240,14 +233,17 @@
)

# NEW: Harness API scaffold (V2 core branch)
# NOTE: harness.agent is NOT re-exported here — it would shadow the
# cascadeflow.agent *module* and break dotted-path resolution
# (e.g. patch("cascadeflow.agent.PROVIDER_REGISTRY")).
# Use ``from cascadeflow.harness import agent`` instead.
from .harness import (
HarnessConfig,
HarnessInitReport,
HarnessRunContext,
init,
reset,
run,
agent as harness_agent,
get_harness_config,
get_current_run,
)
Expand Down Expand Up @@ -401,7 +397,6 @@
"init",
"reset",
"run",
"harness_agent",
"get_harness_config",
"get_current_run",
# ===== PROVIDERS =====
Expand Down
4 changes: 4 additions & 0 deletions cascadeflow/harness/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,13 @@
HarnessInitReport,
HarnessRunContext,
agent,
get_harness_callback_manager,
get_current_run,
get_harness_config,
init,
reset,
run,
set_harness_callback_manager,
)

__all__ = [
Expand All @@ -29,6 +31,8 @@
"run",
"agent",
"get_current_run",
"get_harness_callback_manager",
"get_harness_config",
"set_harness_callback_manager",
"reset",
]
Loading