Commit 5cd3f58

feat: merge agent-intelligence v2 integration train into main (#164)
* Add core harness API scaffold with context-scoped runtime
* Harden harness core scaffolding and complete API test coverage
* feat(harness): implement OpenAI Python client auto-instrumentation

  Replace the instrument.py scaffold with a full implementation that patches
  openai.resources.chat.completions.Completions.create (sync) and
  AsyncCompletions.create (async) for harness observe/enforce modes.

  Key capabilities:
  - Class-level patching of sync and async create methods
  - Streaming wrappers (_InstrumentedStream, _InstrumentedAsyncStream) that capture usage metrics after all chunks are consumed
  - Cost estimation from a built-in pricing table
  - Energy estimation using deterministic model coefficients
  - Tool call counting in both response and streaming chunks
  - Budget remaining tracking within scoped runs
  - Idempotent patching with a clean unpatch/reset path

  Context tracking per call:
  - cost, step_count, latency_used_ms, energy_used, tool_calls
  - budget_remaining auto-updated when budget_max is set
  - model_used and decision trace via ctx.record()

  Added step_count, latency_used_ms, energy_used fields to HarnessRunContext in api.py. Hooked patch_openai into init() and unpatch_openai into reset(). 39 new tests cover patch lifecycle, sync/async wrappers, sync/async stream wrappers, cost/energy estimation, nested run isolation, and edge cases (no usage, no choices, missing chunks). All 63 harness tests pass (39 instrument + 24 api).

* fix: address PR review — off-mode unpatch, enforce budget gate, stream usage injection
  - init(mode="off") now calls unpatch_openai() if previously patched
  - Trace records the actual mode (observe/enforce) instead of always "observe"
  - Enforce mode raises BudgetExceededError pre-call when the budget is exhausted
  - Auto-inject stream_options.include_usage=True for streaming requests
  - Add pytest.importorskip("openai") for a graceful skip when not installed
  - 10 new tests covering all four fixes (73 total pass)
* Add OpenAI Agents SDK harness integration (opt-in)
* fix(openai-agents): align SDK interface and enforce-safe errors
* Add CrewAI harness integration with before/after LLM-call hooks

  Implements the cascadeflow.integrations.crewai module, which hooks into CrewAI's native llm_hooks system (v1.5+) to feed cost, latency, energy, and step metrics into harness run contexts.
  - before_llm_call: budget gate in enforce mode, latency tracking
  - after_llm_call: token estimation, cost/energy/step accounting
  - enable()/disable() lifecycle with fail_open and budget_gate config
  - 37 tests covering hooks, estimation, enable/disable, and edge cases
  - Fixed __init__.py import ordering (CREWAI_AVAILABLE before __all__)
* fix: address PR review — dict messages, start-time leak, lint, extras
  - Add a crewai extra to pyproject.toml (pip install cascadeflow[crewai])
  - Handle dict messages in _extract_message_content (CrewAI passes {"role": "...", "content": "..."} dicts, not objects with a .content attribute)
  - Move the budget gate check before start-time recording so blocked calls don't leak entries in _call_start_times
  - Fix unused imports (field, TYPE_CHECKING, Callable) and import order
  - Fix a docstring referencing the nonexistent cost_model_override
  - Replace yield with return in a test fixture (PT022)
  - Add 7 new tests: dict/object message extraction, blocked-call leak
* docs(plan): claim v2 enforce-actions feature branch
* feat(harness): enforce switch-model, deny-tool, and stop actions
* feat(harness): implement enforce actions for v2 harness
* fix(harness): clarify observe traces and hard-stop semantics
* perf(harness): optimize model utility hot paths
* refactor(harness): unify pricing profiles across integrations
* docs(plan): claim langchain harness extension branch
* feat(harness): add privacy-safe decision telemetry and callback hooks
* fix(harness): address telemetry review findings
  - Use time.monotonic() for the duration_ms calculation instead of a wall-clock delta (avoids NTP/suspend clock jumps)
  - Extract sanitize constants (_MAX_ACTION_LEN, _MAX_REASON_LEN, _MAX_MODEL_LEN)
  - Log a warning when record() receives an empty action (was silently defaulting)
  - Cache the CallbackEvent import in _emit_harness_decision for hot-path performance
  - Add tests: no-callback-manager noop, empty-action warning, duration field
* fix(harness): avoid shadowing the cascadeflow.agent module
* style: apply black formatting for harness integration files
* feat(langchain): add harness-aware callback and state extractor
* feat(langchain): auto-attach harness callback in active run scopes
* docs(plan): mark langchain harness extension branch completed
* fix(langchain): address PR #161 review findings
  - Document enforce-mode limitations for switch_model and deny_tool
  - Replace the per-handler _executed_tool_calls with run_ctx.tool_calls
  - Fix the _extract_candidate_state fallback leaking arbitrary kwargs
  - Remove return-in-finally (B012) and fix import ordering
  - Separate langgraph from the langchain optional extra in pyproject.toml
  - Add 4 edge-case tests: no-run-context safety, state extraction guard, and run_ctx tool_calls gating
* fix(langchain): enforce tool caps on executed calls and harden tool extraction
* fix(harness): avoid shadowing the cascadeflow.agent module
* feat(bench): add reproducibility pipeline for V2 Go/No-Go validation

  Add 5 new benchmark modules and 15 unit tests that enable third-party reproducibility and automated V2 readiness checks:
  - repro.py: environment fingerprint (git SHA, packages, platform)
  - baseline.py: save/load baselines, delta comparison, Go/No-Go gates
  - harness_overhead.py: decision-path p95 measurement (<5ms gate)
  - observe_validation.py: observe-mode zero-change proof (6 cases)
  - artifact.py: JSON artifact bundler + REPRODUCE.md generation

  Extends run_all.py with --baseline, --harness-mode, and --with-repro flags.

* docs(plan): update workboard — bench-repro-pipeline PR #163 in review
* style(bench): apply linter formatting to repro pipeline files
* style(langchain): finalize harness callback typing and formatting
* feat(integrations): add Google ADK harness plugin

  Add CascadeFlowADKPlugin(BasePlugin), which intercepts all LLM calls across ADK Runner agents for budget enforcement, cost/latency/energy tracking, tool call counting, and trace recording.

  New files:
  - cascadeflow/harness/pricing.py — shared pricing table with Gemini models
  - cascadeflow/integrations/google_adk.py — plugin + enable/disable API
  - tests/test_google_adk_integration.py — 49 tests
  - docs/guides/google_adk_integration.md
  - examples/integrations/google_adk_harness.py

  Modified:
  - cascadeflow/integrations/__init__.py — register the integration
  - pyproject.toml — add a google-adk optional extra

* fix: resolve import regression and callback-key collision
  - Remove the harness `agent` export from the top-level cascadeflow namespace to avoid shadowing the cascadeflow.agent module (it breaks dotted-path patches in test_agent.py and test_agent_p0_tool_loop.py)
  - Use an id(callback_context) fallback in the ADK plugin's _callback_key() when invocation_id and agent_name are both empty, preventing state-map collisions under concurrency
  - Add 4 tests for the callback-key collision scenario
  - Update test_harness_api to import agent from cascadeflow.harness
* fix: address PR #165 review — 5 findings resolved
  1. HIGH: off mode now respected — before/after callbacks return early when ctx.mode == "off", preventing metric tracking in off mode
  2. HIGH: versioned Gemini model IDs now resolve correctly — added _resolve_pricing_key() with suffix stripping (-preview-XX-XX, -YYYYMMDD, -latest, -exp-N) and longest-prefix fallback matching
  3. MEDIUM: callback key collision fixed — switched from an (invocation_id, agent_name) tuple to an id(callback_context) int key, guaranteeing uniqueness even for concurrent calls with the same IDs
  4. MEDIUM: fail_open tests now patch the correct symbol (cascadeflow.integrations.google_adk.get_current_run instead of cascadeflow.harness.api.get_current_run)
  5. MEDIUM: the budget error response no longer leaks spend/limit numbers — the user-facing message is generic, with exact figures logged at warning level

  Added 13 new tests: off-mode behavior (2), versioned model pricing (7), callback key collision (4). Total: 62 ADK tests pass. Full suite: 1097 passed, 69 skipped, 0 failures.

* feat(harness): add anthropic python auto-instrumentation for v2.1
* feat(core): deliver v2.1 ts harness parity and sdk auto-instrumentation
* test(harness): add comprehensive Anthropic auto-instrumentation tests

  Add 29 tests covering the Anthropic Python SDK monkey-patching introduced in v2.1: usage extraction, tool call counting, sync/async wrapper behavior, budget enforcement in enforce mode, stream passthrough, cost/energy/latency tracking, and the init/reset lifecycle.

* feat(harness): instrument Anthropic streaming usage and tool calls
* fix(harness): finalize stream metrics on errors and harden env parsing
* docs: add harness quickstart and missing integration coverage
* feat(n8n): add multi-dimensional harness integration to Agent node

  Port the Python harness decision engine to TypeScript and wire it into the n8n Agent node. Tracks 5 dimensions (cost, latency, energy, tool calls, quality) across every LLM call. Observe mode is on by default; enforce mode stops the agent loop when limits are hit.
  - Add nodes/harness/ with pricing (18 models, fuzzy resolution), HarnessRunContext (7-step decision cascade, compliance allowlists, KPI-weighted scoring), and 43 tests
  - Replace the hardcoded estimatesPerMillion in CascadeChatModel with the shared harness/pricing.ts (broader model coverage + suffix stripping)
  - Add harness UI parameters to the Agent node (mode, budget, tool cap, latency cap, energy cap, compliance, KPI weights)
  - Wire pre-call checks and tool-call counting into the agent executor loop
  - Add a harness summary to the Agent output JSON

* fix(google-adk): initialize plugin name and stabilize callback correlation
* chore(dx): clarify integration prerequisites and add optional integration CI
* style: apply Black formatting to 7 Python files

  Fixes the CI Python Code Quality check — these files drifted from Black formatting after recent merges into the integration branch.

* chore(ci/docs): enforce integration matrix across python versions
* style: fix ruff I001 import sorting in the google_adk_harness example
* feat(benchmarks): add baseline and savings metrics to agentic tool benchmark
* feat(dx): add LangChain harness docs, harness example, and llms.txt

  Close V2 Go/No-Go gaps:
  - Add a harness section to langchain_integration.md documenting HarnessAwareCascadeFlowCallbackHandler and get_harness_callback
  - Create a langchain_harness.py example (matches the CrewAI/OpenAI Agents/ADK pattern)
  - Create llms.txt at the repo root for LLM-readable project discovery
  - Update the V2 workboard: all feature branches merged, Go/No-Go checklist updated

* harden harness: input validation, trace rotation, NaN guard, phantom model fix
  - Add _validate_harness_params() to init() and run() — rejects negative budget/tool_calls/latency/energy and invalid compliance strings
  - Add trace rotation (MAX_TRACE_ENTRIES=1000) in both Python and TypeScript to prevent unbounded memory growth in long-running agents
  - Add sanitizeNumericParam() in the n8n harness.ts — coerces NaN/Infinity/negative config values to null
  - Remove the phantom gpt-5-nano from llms.txt (not in any pricing table)
  - Document the HarnessRunContext thread-safety limitation in its docstring
  - Add 10 new tests covering validation, compliance, and trace rotation
* docs: reframe positioning as agent runtime intelligence layer + add Mintlify docs site

  Phase 0 — GitHub refresh:
  - pyproject.toml: update description, keywords, and classifier to Production/Stable
  - __init__.py: replace the emoji docstring with a harness API focus
  - llms.txt: expand from 88 to 214 lines (HarnessConfig, pricing, energy, integrations)
  - README.md: new H1, comparison table, Harness API section, 6 new feature rows
  - docs/README.md: Mintlify banner, add LangChain to the integrations list

  Phase 1 — Mintlify docs site (docs-site/):
  - docs.json config (palm theme, 5 tabs, full navigation)
  - 36 MDX pages: Get Started (4), Harness (8), Integrations (7), API Reference (8), Examples (6), plus index, changelog, and contributing
  - Logo assets copied from .github/assets/

* fix: switch GitHub Stars badge from social to flat style

  Social-style shields.io badges intermittently render as "invalid" due to GitHub API rate limiting. The flat style is more reliable.
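The OpenAI and Anthropic auto-instrumentation commits above hinge on class-level, idempotent monkey-patching of the SDK's `create` method. A minimal sketch of that pattern, using a stand-in `DummyCompletions` class rather than the real SDK, and hypothetical `patch`/`unpatch` helpers rather than cascadeflow's actual `patch_openai` internals:

```python
import functools

class DummyCompletions:
    """Stand-in for an SDK resource class such as Completions."""
    def create(self, model: str, **kwargs) -> dict:
        return {"model": model, "usage": {"total_tokens": 42}}

_original_create = None  # saved unpatched method

def patch(on_call) -> None:
    """Idempotently wrap DummyCompletions.create at the class level."""
    global _original_create
    if _original_create is not None:
        return  # already patched: a repeated init() must be a no-op
    _original_create = DummyCompletions.create

    @functools.wraps(_original_create)
    def wrapper(self, *args, **kwargs):
        response = _original_create(self, *args, **kwargs)
        on_call(response)  # observe: record usage after the call returns
        return response

    DummyCompletions.create = wrapper

def unpatch() -> None:
    """Restore the original method (the clean unpatch/reset path)."""
    global _original_create
    if _original_create is not None:
        DummyCompletions.create = _original_create
        _original_create = None
```

Because the wrapper is installed on the class, every instance (and every call site) is instrumented without any caller-side code change, which is what makes Tier 1 "zero-change" observability possible.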
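The enforce-mode budget gate described above (raise `BudgetExceededError` before the call is issued, and keep the user-facing message generic per PR #165 finding 5) can be sketched as follows; `check_budget` is an illustrative helper, not cascadeflow's actual API:

```python
from typing import Optional

class BudgetExceededError(RuntimeError):
    """Raised pre-call in enforce mode when the run budget is exhausted."""

def check_budget(mode: str, cost_used: float, budget_max: Optional[float]) -> None:
    """Gate an LLM call before it is issued.

    Observe and off modes never block; enforce mode blocks once spend
    reaches the cap. The raised message is deliberately generic so that
    exact spend/limit figures are not leaked to end users (they belong
    in warning-level logs instead).
    """
    if mode != "enforce" or budget_max is None:
        return
    if cost_used >= budget_max:
        raise BudgetExceededError("run budget exhausted")
```

Checking before the provider call (rather than after) is what prevents one final over-budget request from being billed.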
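The versioned-model-ID fix (finding 2) combines suffix stripping with longest-prefix fallback matching. An approximation of that `_resolve_pricing_key()` logic, where the pricing table and its values are placeholders, not the real table:

```python
import re
from typing import Optional

# Illustrative pricing keys; real entries and prices live in harness/pricing.py.
PRICING = {
    "gemini-2.0-flash": 0.10,
    "gemini-1.5-pro": 1.25,
}

# Version suffixes mentioned in the commit: -preview-XX-XX, -YYYYMMDD, -latest, -exp-N
_SUFFIXES = re.compile(r"(-preview-\d{2}-\d{2}|-\d{8}|-latest|-exp-\d+)$")

def resolve_pricing_key(model_id: str) -> Optional[str]:
    """Map a versioned model ID onto a base pricing-table key."""
    stripped = _SUFFIXES.sub("", model_id)
    if stripped in PRICING:
        return stripped
    # Longest-prefix fallback: pick the longest known key the ID starts with.
    matches = [key for key in PRICING if model_id.startswith(key)]
    return max(matches, key=len) if matches else None
```

The longest-prefix fallback catches version suffixes the regex does not anticipate, at the cost of occasionally matching a sibling model family, which is why exact suffix stripping runs first.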
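Trace rotation with `MAX_TRACE_ENTRIES=1000` amounts to a bounded ring buffer. A sketch under the assumption that a `collections.deque` with `maxlen` is an acceptable stand-in for the actual implementation:

```python
from collections import deque

MAX_TRACE_ENTRIES = 1000  # cap from the hardening commit

class BoundedTrace:
    """Decision trace that rotates old entries out instead of growing forever.

    Illustrative only; the real HarnessRunContext trace has more fields and,
    per its docstring, is not thread-safe.
    """
    def __init__(self, maxlen: int = MAX_TRACE_ENTRIES):
        self._entries = deque(maxlen=maxlen)

    def record(self, action: str, **fields) -> None:
        if not action:
            # The real code logs a warning rather than raising.
            raise ValueError("action must be non-empty")
        self._entries.append({"action": action, **fields})

    def entries(self) -> list:
        return list(self._entries)
```

A long-running agent that records one decision per LLM call thus holds at most 1000 entries in memory, keeping the most recent ones for auditing.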
1 parent 12905d9 commit 5cd3f58

88 files changed

Lines changed: 10953 additions & 158 deletions


.github/workflows/test.yml

Lines changed: 46 additions & 0 deletions
```diff
@@ -47,6 +47,52 @@ jobs:
           fail_ci_if_error: false
           token: ${{ secrets.CODECOV_TOKEN }}
 
+  # Python opt-in integration install + focused tests
+  test-python-optional-integrations:
+    name: Python Optional Integrations (${{ matrix.integration }} / py${{ matrix.python-version }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - integration: openai-agents
+            python-version: '3.9'
+            extras: ".[dev,openai,openai-agents]"
+            tests: "tests/test_openai_agents_integration.py"
+          - integration: openai-agents
+            python-version: '3.11'
+            extras: ".[dev,openai,openai-agents]"
+            tests: "tests/test_openai_agents_integration.py"
+          - integration: crewai
+            python-version: '3.11'
+            extras: ".[dev,crewai,openai]"
+            tests: "tests/test_crewai_integration.py"
+          - integration: google-adk
+            python-version: '3.11'
+            extras: ".[dev,google-adk]"
+            tests: "tests/test_google_adk_integration.py"
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'pip'
+
+      - name: Install integration dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e "${{ matrix.extras }}"
+
+      - name: Run focused integration tests
+        run: |
+          pytest ${{ matrix.tests }} -v
+        env:
+          PYTHONPATH: ${{ github.workspace }}
+
   # TypeScript Core Tests
   test-typescript-core:
     name: TypeScript Core Tests
```

README.md

Lines changed: 52 additions & 8 deletions
````diff
@@ -6,7 +6,7 @@
     <img alt="cascadeflow Logo" src="./.github/assets/CF_logo_dark.svg" width="80%" style="margin: 20px auto;">
   </picture>
 
-# Smart AI model cascading for cost optimization
+# Agent Runtime Intelligence Layer
 
 [![PyPI version](https://img.shields.io/pypi/v/cascadeflow?color=blue&label=Python)](https://pypi.org/project/cascadeflow/)
 [![npm version](https://img.shields.io/npm/v/@cascadeflow/core?color=red&label=TypeScript)](https://www.npmjs.com/package/@cascadeflow/core)
@@ -17,28 +17,27 @@
 [![PyPI Downloads](https://static.pepy.tech/badge/cascadeflow)](https://pepy.tech/project/cascadeflow)
 [![npm Downloads](https://img.shields.io/npm/dt/@cascadeflow/n8n-nodes-cascadeflow?label=npm%20downloads&color=orange)](https://www.npmjs.com/search?q=%40cascadeflow)
 [![Tests](https://github.com/lemony-ai/cascadeflow/actions/workflows/test.yml/badge.svg)](https://github.com/lemony-ai/cascadeflow/actions/workflows/test.yml)
+[![Docs](https://img.shields.io/badge/docs-cascadeflow.dev-blue)](https://docs.cascadeflow.dev)
 [![Python Docs](https://img.shields.io/badge/docs-Python-blue)](./docs/)
 [![TypeScript Docs](https://img.shields.io/badge/docs-TypeScript-red)](./docs/)
 [![X Follow](https://img.shields.io/twitter/follow/saschabuehrle?style=social)](https://x.com/saschabuehrle)
-[![GitHub Stars](https://img.shields.io/github/stars/lemony-ai/cascadeflow?style=social)](https://github.com/lemony-ai/cascadeflow)
+[![GitHub Stars](https://img.shields.io/github/stars/lemony-ai/cascadeflow?style=flat&color=yellow&label=Stars)](https://github.com/lemony-ai/cascadeflow/stargazers)
 
 <br>
 
 **[Cost Savings Benchmarks](./tests/benchmarks/):** 69% (MT-Bench), 93% (GSM8K), 52% (MMLU), 80% (TruthfulQA) savings, retaining 96% GPT-5 quality.
 
 <br>
 
-**[<img src=".github/assets/CF_python_color.svg" width="22" height="22" alt="Python" style="vertical-align: middle;"/> Python](#-python) • [<img src=".github/assets/CF_ts_color.svg" width="22" height="22" alt="TypeScript" style="vertical-align: middle;"/> TypeScript](#-typescript) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/LC-logo-bright.png"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/LC-logo-dark.png"><img src=".github/assets/LC-logo-dark.png" height="22" alt="LangChain" style="vertical-align: middle;"></picture> LangChain](#-langchain-integration) • [<img src=".github/assets/CF_n8n_color.svg" width="22" height="22" alt="n8n" style="vertical-align: middle;"/> n8n](#-n8n-integration) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/CF_vercel_bright.svg"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/CF_vercel_dark.svg"><img src=".github/assets/CF_vercel_dark.svg" width="22" height="22" alt="Vercel AI" style="vertical-align: middle;"></picture> Vercel AI](./packages/integrations/vercel-ai/) • [<img src=".github/assets/CF_openclaw_color.svg" width="22" height="22" alt="OpenClaw" style="vertical-align: middle;"/> OpenClaw](https://clawhub.ai/saschabuehrle/cascadeflow) • [📖 Docs](./docs/) • [💡 Examples](#examples)**
+**[<img src=".github/assets/CF_python_color.svg" width="22" height="22" alt="Python" style="vertical-align: middle;"/> Python](#-python) • [<img src=".github/assets/CF_ts_color.svg" width="22" height="22" alt="TypeScript" style="vertical-align: middle;"/> TypeScript](#-typescript) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/LC-logo-bright.png"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/LC-logo-dark.png"><img src=".github/assets/LC-logo-dark.png" height="22" alt="LangChain" style="vertical-align: middle;"></picture> LangChain](#-langchain-integration) • [<img src=".github/assets/CF_n8n_color.svg" width="22" height="22" alt="n8n" style="vertical-align: middle;"/> n8n](#-n8n-integration) • [<picture><source media="(prefers-color-scheme: dark)" srcset="./.github/assets/CF_vercel_bright.svg"><source media="(prefers-color-scheme: light)" srcset="./.github/assets/CF_vercel_dark.svg"><img src=".github/assets/CF_vercel_dark.svg" width="22" height="22" alt="Vercel AI" style="vertical-align: middle;"></picture> Vercel AI](./packages/integrations/vercel-ai/) • [<img src=".github/assets/CF_openclaw_color.svg" width="22" height="22" alt="OpenClaw" style="vertical-align: middle;"/> OpenClaw](https://clawhub.ai/saschabuehrle/cascadeflow) • [Full Docs](https://docs.cascadeflow.dev) • [📖 Docs](./docs/) • [💡 Examples](#examples)**
 
 </div>
 
 ---
 
-**Stop Bleeding Money on AI Calls. Cut Costs 30-65% in 3 Lines of Code.**
+**The in-process intelligence layer for AI agents.** Optimize cost, latency, quality, budget, compliance, and energy — inside the execution loop, not at the HTTP boundary.
 
-40-70% of text prompts and 20-60% of agent calls don't need expensive flagship models. You're overpaying every single day.
-
-*cascadeflow fixes this with intelligent model cascading, available in Python and TypeScript.*
+cascadeflow works where external proxies can't: per-step model decisions based on agent state, per-tool-call budget gating, runtime stop/continue/escalate actions, and business KPI injection during agent loops. Sub-1ms overhead. Works with LangChain, OpenAI Agents SDK, CrewAI, Google ADK, n8n, and Vercel AI SDK.
 
 ```python
 pip install cascadeflow
@@ -52,6 +51,17 @@ npm install @cascadeflow/core
 
 ## Why cascadeflow?
 
+### Proxy vs In-Process Harness
+
+| Dimension | External Proxy | cascadeflow Harness |
+|---|---|---|
+| **Scope** | HTTP request boundary | Inside agent execution loop |
+| **Dimensions** | Cost only | Cost + quality + latency + budget + compliance + energy |
+| **Latency overhead** | 10-50ms network RTT | <1ms in-process |
+| **Business logic** | None | KPI weights and targets |
+| **Enforcement** | None (observe only) | stop, deny_tool, switch_model |
+| **Auditability** | Request logs | Per-step decision traces |
+
 cascadeflow is an intelligent AI model cascading library that dynamically selects the optimal model for each query or tool call through speculative execution. It's based on the research that 40-70% of queries don't require slow, expensive flagship models, and domain-specific smaller models often outperform large general-purpose models on specialized tasks. For the remaining queries that need advanced reasoning, cascadeflow automatically escalates to flagship models if needed.
 
 ### Use Cases
@@ -140,6 +150,34 @@ In practice, 60-70% of queries are handled by small, efficient models (8-20x cos
 
 ---
 
+## Harness API
+
+Three tiers of integration — zero-change observability to full policy control:
+
+**Tier 1: Zero-change observability**
+```python
+import cascadeflow
+cascadeflow.init(mode="observe")
+# All OpenAI/Anthropic SDK calls are now tracked. No code changes needed.
+```
+
+**Tier 2: Scoped runs with budget**
+```python
+with cascadeflow.run(budget=0.50, max_tool_calls=10) as session:
+    result = await agent.run("Analyze this dataset")
+    print(session.summary())  # cost, latency, energy, steps, tool calls
+    print(session.trace())    # full decision audit trail
+```
+
+**Tier 3: Decorated agents with policy**
+```python
+@cascadeflow.agent(budget=0.20, compliance="gdpr", kpi_weights={"quality": 0.6, "cost": 0.3, "latency": 0.1})
+async def my_agent(query: str):
+    return await llm.complete(query)
+```
+
+---
+
 ## Quick Start
 
 ### Drop-In Gateway (Existing Apps)
@@ -724,6 +762,12 @@ console.log(`Warnings: ${validation.warnings}`);
 | 📋 **Message & Tool Call Lists** | Full conversation history with tool_calls and tool_call_id preservation across turns |
 | 🪝 **Hooks & Callbacks** | Telemetry callbacks, cost events, and streaming hooks for observability |
 | 🏭 **Production Ready** | Streaming, batch processing, tool handling, reasoning model support, caching, error recovery, anomaly detection |
+| 💳 **Budget Enforcement** | Per-run and per-user budget caps with automatic stop actions when limits are exceeded |
+| 🔒 **Compliance Gating** | GDPR, HIPAA, PCI, and strict model allowlists — block non-compliant models before execution |
+| 📊 **KPI-Weighted Routing** | Inject business priorities (quality, cost, latency, energy) as weights into every model decision |
+| 🌱 **Energy Tracking** | Deterministic compute-intensity coefficients for carbon-aware AI operations |
+| 🔍 **Decision Traces** | Full per-step audit trail: action, reason, model, cost, budget state, enforcement status |
+| ⚙️ **Harness Modes** | off / observe / enforce — roll out safely with observe, then switch to enforce when ready |
 
 ---
 
@@ -774,7 +818,7 @@ If you use cascadeflow in your research or project, please cite:
 ```bibtex
 @software{cascadeflow2025,
   author = {Lemony Inc., Sascha Buehrle and Contributors},
-  title = {cascadeflow: Smart AI model cascading for cost optimization},
+  title = {cascadeflow: Agent runtime intelligence layer for AI agent workflows},
   year = {2025},
   publisher = {GitHub},
   url = {https://github.com/lemony-ai/cascadeflow}
````
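The KPI-weighted routing the README rows and the n8n harness commit refer to reduces to a weighted utility score per candidate model. An illustrative sketch in which the weights, candidate names, and per-dimension scores are made up, and which is not cascadeflow's actual decision cascade:

```python
def kpi_score(metrics: dict, weights: dict) -> float:
    """Weighted sum of normalized per-dimension scores (higher is better).

    Assumes each dimension is already normalized to [0, 1], where a high
    cost score means *cheap* and a high latency score means *fast*.
    """
    return sum(weights.get(dim, 0.0) * metrics.get(dim, 0.0) for dim in weights)

# Hypothetical candidates with normalized scores per dimension.
candidates = {
    "small-model": {"quality": 0.70, "cost": 0.95, "latency": 0.90},
    "flagship":    {"quality": 0.95, "cost": 0.30, "latency": 0.50},
}
weights = {"quality": 0.6, "cost": 0.3, "latency": 0.1}  # business priorities

best = max(candidates, key=lambda name: kpi_score(candidates[name], weights))
```

Shifting weight from cost toward quality flips the decision toward the flagship model, which is the mechanism behind "inject business priorities into every model decision".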

cascadeflow/__init__.py

Lines changed: 23 additions & 28 deletions
```diff
@@ -1,30 +1,23 @@
 """
-cascadeflow - Smart AI model cascading for cost optimization.
-
-Route queries intelligently across multiple AI models from tiny SLMs
-to frontier LLMs based on complexity, domain, and budget.
-
-Features:
-- 🚀 Speculative cascades (2-3x faster)
-- 💰 60-95% cost savings
-- 🎯 Per-prompt domain detection
-- 🎨 2.0x domain boost for specialists
-- 🔍 Multi-factor optimization
-- 🆓 Free tier (Ollama + Groq)
-- ⚡ 3 lines of code
-
-Example:
-    >>> from cascadeflow import CascadeAgent, CascadePresets
-    >>>
-    >>> # Auto-detect available models
-    >>> models = CascadePresets.auto_detect_models()
-    >>>
-    >>> # Create agent with intelligence layer
-    >>> agent = CascadeAgent(models, enable_caching=True)
-    >>>
-    >>> # Run query (automatically optimized!)
-    >>> result = await agent.run("Fix this Python bug")
-    >>> print(f"Used {result.model_used} - Cost: ${result.cost:.6f}")
+cascadeflow - Agent runtime intelligence layer.
+
+In-process harness that optimizes cost, latency, quality, budget, compliance,
+and energy across AI agent workflows. Works inside agent execution loops with
+full state awareness -- not an external proxy.
+
+Quick start:
+    import cascadeflow
+    cascadeflow.init(mode="observe")
+    # All OpenAI/Anthropic SDK calls are now tracked and traced.
+
+Key APIs:
+    cascadeflow.init(mode)      -- activate harness (off | observe | enforce)
+    cascadeflow.run(budget)     -- scoped run with budget/trace
+    @cascadeflow.agent(budget)  -- policy metadata on agent functions
+    session.summary()           -- structured metrics
+    session.trace()             -- full decision audit trail
+
+Integrations: LangChain, OpenAI Agents SDK, CrewAI, Google ADK, n8n, Vercel AI SDK
 """
 
 __version__ = "1.0.0"
@@ -240,14 +233,17 @@
 )
 
 # NEW: Harness API scaffold (V2 core branch)
+# NOTE: harness.agent is NOT re-exported here — it would shadow the
+# cascadeflow.agent *module* and break dotted-path resolution
+# (e.g. patch("cascadeflow.agent.PROVIDER_REGISTRY")).
+# Use ``from cascadeflow.harness import agent`` instead.
 from .harness import (
     HarnessConfig,
     HarnessInitReport,
     HarnessRunContext,
     init,
     reset,
     run,
-    agent as harness_agent,
     get_harness_config,
     get_current_run,
 )
@@ -401,7 +397,6 @@
     "init",
     "reset",
     "run",
-    "harness_agent",
     "get_harness_config",
     "get_current_run",
     # ===== PROVIDERS =====
```
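The NOTE in this diff is worth unpacking: binding a callable named `agent` on the package rebinds the attribute that normally points at the `cascadeflow.agent` submodule, which breaks `unittest.mock.patch` dotted-path lookups. A self-contained demonstration using a throwaway package (`demo_pkg` and `PROVIDER_REGISTRY` here are stand-ins, not the real cascadeflow objects):

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway package with a real `agent` submodule.
pkg = types.ModuleType("demo_pkg")
agent_mod = types.ModuleType("demo_pkg.agent")
agent_mod.PROVIDER_REGISTRY = {"openai": object()}
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.agent"] = agent_mod
pkg.agent = agent_mod  # what a normal `import demo_pkg.agent` binds

# Dotted-path patching works while the attribute points at the module:
with patch("demo_pkg.agent.PROVIDER_REGISTRY", {}):
    assert agent_mod.PROVIDER_REGISTRY == {}

# Re-exporting a *function* named `agent` at the top level shadows the
# submodule attribute, so the same patch target now resolves to the
# function and fails with AttributeError:
def agent_decorator(**kwargs):
    return lambda fn: fn

pkg.agent = agent_decorator  # simulates `from .harness import agent`
try:
    with patch("demo_pkg.agent.PROVIDER_REGISTRY", {}):
        pass
    shadowing_breaks_patch = False
except AttributeError:
    shadowing_breaks_patch = True
```

This is exactly why the commit keeps the harness decorator importable only as `from cascadeflow.harness import agent` instead of re-exporting it at the top level.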

cascadeflow/harness/__init__.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -14,11 +14,13 @@
     HarnessInitReport,
     HarnessRunContext,
     agent,
+    get_harness_callback_manager,
     get_current_run,
     get_harness_config,
     init,
     reset,
     run,
+    set_harness_callback_manager,
 )
 
 __all__ = [
@@ -29,6 +31,8 @@
     "run",
     "agent",
     "get_current_run",
+    "get_harness_callback_manager",
     "get_harness_config",
+    "set_harness_callback_manager",
     "reset",
 ]
```
