Commit 5cd3f58
feat: merge agent-intelligence v2 integration train into main (#164)
* Add core harness API scaffold with context-scoped runtime
* Harden harness core scaffolding and complete API test coverage
* feat(harness): implement OpenAI Python client auto-instrumentation
Replace the instrument.py scaffold with a full implementation that patches
openai.resources.chat.completions.Completions.create (sync) and
AsyncCompletions.create (async) for harness observe/enforce modes.
Key capabilities:
- Class-level patching of sync and async create methods
- Streaming wrappers (_InstrumentedStream, _InstrumentedAsyncStream)
that capture usage metrics after all chunks are consumed
- Cost estimation from a built-in pricing table
- Energy estimation using deterministic model coefficients
- Tool call counting in both response and streaming chunks
- Budget remaining tracking within scoped runs
- Idempotent patching with clean unpatch/reset path
Context tracking per call:
- cost, step_count, latency_used_ms, energy_used, tool_calls
- budget_remaining auto-updated when budget_max is set
- model_used and decision trace via ctx.record()
Added step_count, latency_used_ms, energy_used fields to
HarnessRunContext in api.py. Hooked patch_openai into init()
and unpatch_openai into reset().
39 new tests covering: patch lifecycle, sync/async wrappers,
sync/async stream wrappers, cost/energy estimation, nested run
isolation, and edge cases (no usage, no choices, missing chunks).
All 63 harness tests pass (39 instrument + 24 api).
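To make the "idempotent patching with clean unpatch/reset path" bullet concrete, here is a minimal sketch of class-level patching. It deliberately uses a stand-in `Completions` class rather than the real OpenAI client, and the names `patch`, `unpatch`, and `calls` are illustrative assumptions, not the functions in instrument.py.

```python
import functools


class Completions:
    """Stand-in for an SDK resource class; NOT the real OpenAI client."""

    def create(self, **kwargs):
        return {"model": kwargs.get("model"), "usage": {"total_tokens": 7}}


_original_create = None  # holds the unpatched method while a patch is active
calls = []               # hypothetical sink for observed token counts


def patch():
    """Wrap Completions.create at class level; a second call is a no-op."""
    global _original_create
    if _original_create is not None:
        return False  # already patched, so do not wrap the wrapper
    original = Completions.create

    @functools.wraps(original)
    def wrapper(self, **kwargs):
        response = original(self, **kwargs)
        calls.append(response["usage"]["total_tokens"])
        return response

    _original_create = original
    Completions.create = wrapper
    return True


def unpatch():
    """Restore the original method; safe to call when nothing is patched."""
    global _original_create
    if _original_create is None:
        return False
    Completions.create = _original_create
    _original_create = None
    return True
```

Keeping the original method in a module-level slot is what makes both directions idempotent: `patch()` refuses to stack wrappers, and `unpatch()` puts back exactly the object it saved.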
* fix: address PR review — off-mode unpatch, enforce budget gate, stream usage injection
- init(mode="off") now calls unpatch_openai() if previously patched
- Trace records actual mode (observe/enforce) instead of always "observe"
- Enforce mode raises BudgetExceededError pre-call when budget exhausted
- Auto-inject stream_options.include_usage=True for streaming requests
- Add pytest.importorskip("openai") for graceful skip when not installed
- 10 new tests covering all four fixes (73 total pass)
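The pre-call budget gate and the stream usage injection can be sketched together as a single request-preparation step. This is an illustration under assumptions: `prepare_call` is a hypothetical helper, `BudgetExceededError` here is a local stand-in for the error named above, and using `setdefault` (so an explicit caller value is kept) is a guess about the intended behavior.

```python
class BudgetExceededError(RuntimeError):
    """Local stand-in for the error named in the commit message."""


def prepare_call(kwargs, mode, budget_remaining):
    """Return the kwargs to send, or raise before any request is made."""
    # Enforce mode refuses the call up front once the budget is exhausted.
    if mode == "enforce" and budget_remaining is not None and budget_remaining <= 0:
        raise BudgetExceededError("budget exhausted")

    prepared = dict(kwargs)  # never mutate the caller's dict
    if prepared.get("stream"):
        # Streaming responses only report token usage when asked to.
        options = dict(prepared.get("stream_options") or {})
        options.setdefault("include_usage", True)  # assumption: keep explicit values
        prepared["stream_options"] = options
    return prepared
```

Observe mode passes through the same function without raising, so the only difference between the two modes in this sketch is the gate at the top.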
* Add OpenAI Agents SDK harness integration (opt-in)
* fix(openai-agents): align SDK interface and enforce-safe errors
* Add CrewAI harness integration with before/after LLM-call hooks
Implements cascadeflow.integrations.crewai module that hooks into
CrewAI's native llm_hooks system (v1.5+) to feed cost, latency,
energy, and step metrics into harness run contexts.
- before_llm_call: budget gate in enforce mode, latency tracking
- after_llm_call: token estimation, cost/energy/step accounting
- enable()/disable() lifecycle with fail_open and budget_gate config
- 37 tests covering hooks, estimation, enable/disable, and edge cases
- Fixed __init__.py import ordering (CREWAI_AVAILABLE before __all__)
* fix: address PR review — dict messages, start time leak, lint, extras
- Add crewai extra to pyproject.toml (pip install cascadeflow[crewai])
- Handle dict messages in _extract_message_content (CrewAI passes
{"role": "...", "content": "..."} not objects with .content attr)
- Move budget gate check before start time recording so blocked calls
don't leak entries in _call_start_times
- Fix unused imports (field, TYPE_CHECKING, Callable) and import order
- Fix docstring referencing nonexistent cost_model_override
- Replace yield with return in test fixture (PT022)
- Add 7 new tests: dict/object message extraction, blocked call leak
* docs(plan): claim v2 enforce-actions feature branch
* feat(harness): enforce switch-model, deny-tool, and stop actions
* feat(harness): implement enforce actions for v2 harness
* fix(harness): clarify observe traces and hard-stop semantics
* perf(harness): optimize model utility hot paths
* refactor(harness): unify pricing profiles across integrations
* docs(plan): claim langchain harness extension branch
* feat(harness): add privacy-safe decision telemetry and callback hooks
* fix(harness): address telemetry review findings
- Use time.monotonic() for duration_ms calculation instead of wall-clock
delta (avoids NTP/suspend clock jumps)
- Extract sanitize constants (_MAX_ACTION_LEN, _MAX_REASON_LEN, _MAX_MODEL_LEN)
- Log warning when record() receives empty action (was silently defaulting)
- Cache CallbackEvent import in _emit_harness_decision for hot-path perf
- Add tests: no-callback-manager noop, empty-action warning, duration field
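A rough sketch of the telemetry fixes listed above: length caps pulled into named constants, a warning on an empty action, and a duration computed from a monotonic clock. The constant names come from the commit message, but their values, the `make_record` helper, and the `"allow"` default are placeholders chosen for illustration.

```python
import logging
import time

logger = logging.getLogger("harness.sketch")

# Names from the commit message; the values are placeholders.
_MAX_ACTION_LEN = 64
_MAX_REASON_LEN = 256
_MAX_MODEL_LEN = 128


def make_record(action, reason, model, started_monotonic):
    """Build a sanitized decision record (hypothetical helper)."""
    if not action:
        # Previously a silent default; now the caller's mistake is visible.
        logger.warning("record() received an empty action; using a default")
        action = "allow"  # placeholder default, not taken from the source
    return {
        "action": action[:_MAX_ACTION_LEN],
        "reason": reason[:_MAX_REASON_LEN],
        "model": model[:_MAX_MODEL_LEN],
        # time.monotonic() cannot go backwards, unlike wall-clock time,
        # so NTP adjustments or suspend/resume cannot yield negative durations.
        "duration_ms": (time.monotonic() - started_monotonic) * 1000.0,
    }
```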
* fix(harness): avoid shadowing cascadeflow.agent module
* style: apply black formatting for harness integration files
* feat(langchain): add harness-aware callback and state extractor
* feat(langchain): auto-attach harness callback in active run scopes
* docs(plan): mark langchain harness extension branch completed
* fix(langchain): address PR #161 review findings
- Document enforce-mode limitations for switch_model and deny_tool
- Replace per-handler _executed_tool_calls with run_ctx.tool_calls
- Fix _extract_candidate_state fallback leaking arbitrary kwargs
- Remove return-in-finally (B012) and fix import ordering
- Separate langgraph from langchain optional extra in pyproject.toml
- Add 4 edge-case tests: no-run-context safety, state extraction
guard, and run_ctx tool_calls gating
* fix(langchain): enforce tool caps on executed calls and harden tool extraction
* fix(harness): avoid shadowing cascadeflow.agent module
* feat(bench): add reproducibility pipeline for V2 Go/No-Go validation
Add 5 new benchmark modules and 15 unit tests that enable third-party
reproducibility and automated V2 readiness checks:
- repro.py: environment fingerprint (git SHA, packages, platform)
- baseline.py: save/load baselines, delta comparison, Go/No-Go gates
- harness_overhead.py: decision-path p95 measurement (<5ms gate)
- observe_validation.py: observe-mode zero-change proof (6 cases)
- artifact.py: JSON artifact bundler + REPRODUCE.md generation
Extends run_all.py with --baseline, --harness-mode, --with-repro flags.
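The "<5ms gate" on decision-path p95 can be illustrated with a short measurement sketch. This is not the code in harness_overhead.py: the function names, the nearest-rank percentile definition, and the iteration count are assumptions made for the example.

```python
import time


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure_ms(fn, iterations=200):
    """Time fn() repeatedly and return per-call durations in milliseconds."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def passes_gate(samples, limit_ms=5.0):
    """Go/No-Go style check: p95 must stay strictly under the limit."""
    return p95(samples) < limit_ms
```

A benchmark would call something like `passes_gate(measure_ms(decision_fn))`, where `decision_fn` is whatever callable exercises the decision path. Gating on p95 rather than the mean keeps occasional slow outliers from being averaged away.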
* docs(plan): update workboard — bench-repro-pipeline PR #163 in review
* style(bench): apply linter formatting to repro pipeline files
* style(langchain): finalize harness callback typing and formatting
* feat(integrations): add Google ADK harness plugin
Add CascadeFlowADKPlugin(BasePlugin) that intercepts all LLM calls
across ADK Runner agents for budget enforcement, cost/latency/energy
tracking, tool call counting, and trace recording.
New files:
- cascadeflow/harness/pricing.py — shared pricing table with Gemini models
- cascadeflow/integrations/google_adk.py — plugin + enable/disable API
- tests/test_google_adk_integration.py — 49 tests
- docs/guides/google_adk_integration.md
- examples/integrations/google_adk_harness.py
Modified:
- cascadeflow/integrations/__init__.py — register integration
- pyproject.toml — add google-adk optional extra
* fix: resolve import regression and callback-key collision
- Remove harness `agent` from top-level cascadeflow namespace to avoid
shadowing the cascadeflow.agent module (breaks dotted-path patches in
test_agent.py and test_agent_p0_tool_loop.py)
- Use id(callback_context) fallback in ADK plugin _callback_key() when
invocation_id and agent_name are both empty, preventing state map
collisions under concurrency
- Add 4 tests for callback-key collision scenario
- Update test_harness_api to import agent from cascadeflow.harness
* fix: address PR #165 review — 5 findings resolved
1. HIGH: off mode now respected — before/after callbacks return early
when ctx.mode == "off", preventing metric tracking in off mode
2. HIGH: versioned Gemini model IDs now resolve correctly — added
_resolve_pricing_key() with suffix stripping (-preview-XX-XX,
-YYYYMMDD, -latest, -exp-N) and longest-prefix fallback matching
3. MEDIUM: callback key collision fixed — switched from
(invocation_id, agent_name) tuple to id(callback_context) int key,
guaranteeing uniqueness even for concurrent calls with same IDs
4. MEDIUM: fail_open tests now patch the correct symbol
(cascadeflow.integrations.google_adk.get_current_run instead of
cascadeflow.harness.api.get_current_run)
5. MEDIUM: budget error response no longer leaks spend/limit numbers —
user-facing message is generic, exact figures logged at warning level
Added 13 new tests: off-mode behavior (2), versioned model pricing (7),
callback key collision (4). Total: 62 ADK tests pass.
Full suite: 1097 passed, 69 skipped, 0 failures.
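Finding 2 describes suffix stripping followed by longest-prefix fallback. A minimal sketch of that lookup order is below. The pricing table is hypothetical (placeholder keys and numbers, not real prices), the function name drops the leading underscore, and treating `XX-XX` as two-digit groups is an assumption about the suffix format.

```python
import re

# Hypothetical table: keys and values are placeholders, not real pricing.
PRICING = {
    "gemini-2.5-flash": (0.10, 0.40),
    "gemini-2.5-pro": (1.00, 4.00),
    "gemini-2.5": (0.50, 2.00),
}

# Version suffixes named in the commit message, anchored at the end of the ID.
_SUFFIX = re.compile(r"(-preview-\d{2}-\d{2}|-\d{8}|-latest|-exp-\d+)$")


def resolve_pricing_key(model):
    """Map a possibly versioned model ID to a key in PRICING, or None."""
    if model in PRICING:
        return model
    # Strip suffixes repeatedly in case more than one is stacked.
    stripped = model
    while True:
        candidate = _SUFFIX.sub("", stripped)
        if candidate == stripped:
            break
        stripped = candidate
    if stripped in PRICING:
        return stripped
    # Fallback: the longest known key that is a prefix of the model ID.
    matches = [key for key in PRICING if stripped.startswith(key)]
    return max(matches, key=len) if matches else None
```

Picking the longest prefix matters when keys nest: an unknown `gemini-2.5-flash-*` variant should resolve to the flash entry rather than the broader `gemini-2.5` one.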
* feat(harness): add anthropic python auto-instrumentation for v2.1
* feat(core): deliver v2.1 ts harness parity and sdk auto-instrumentation
* test(harness): add comprehensive Anthropic auto-instrumentation tests
Add 29 tests covering the Anthropic Python SDK monkey-patching that was
introduced in v2.1. Tests cover usage extraction, tool call counting,
sync/async wrapper behavior, budget enforcement in enforce mode, stream
passthrough, cost/energy/latency tracking, and init/reset lifecycle.
* feat(harness): instrument Anthropic streaming usage and tool calls
* fix(harness): finalize stream metrics on errors and harden env parsing
* docs: add harness quickstart and missing integration coverage
* feat(n8n): add multi-dimensional harness integration to Agent node
Port the Python harness decision engine to TypeScript and wire it into
the n8n Agent node. Tracks 5 dimensions (cost, latency, energy, tool
calls, quality) across every LLM call. Observe mode is on by default;
enforce mode stops the agent loop when limits are hit.
- Add nodes/harness/ with pricing (18 models, fuzzy resolution),
HarnessRunContext (7-step decision cascade, compliance allowlists,
KPI-weighted scoring), and 43 tests
- Replace hardcoded estimatesPerMillion in CascadeChatModel with shared
harness/pricing.ts (broader model coverage + suffix stripping)
- Add harness UI parameters to Agent node (mode, budget, tool cap,
latency cap, energy cap, compliance, KPI weights)
- Wire pre-call checks and tool-call counting into agent executor loop
- Add harness summary to Agent output JSON
* fix(google-adk): initialize plugin name and stabilize callback correlation
* chore(dx): clarify integration prerequisites and add optional integration CI
* style: apply Black formatting to 7 Python files
Fix CI Python Code Quality check — these files drifted from Black
formatting after recent merges into the integration branch.
* chore(ci/docs): enforce integration matrix across python versions
* style: fix ruff I001 import sorting in google_adk_harness example
* feat(benchmarks): add baseline and savings metrics to agentic tool benchmark
* feat(dx): add LangChain harness docs, harness example, and llms.txt
Close V2 Go/No-Go gaps:
- Add harness section to langchain_integration.md documenting
HarnessAwareCascadeFlowCallbackHandler and get_harness_callback
- Create langchain_harness.py example (matches CrewAI/OpenAI Agents/ADK pattern)
- Create llms.txt at repo root for LLM-readable project discovery
- Update V2 workboard: all feature branches merged, Go/No-Go checklist updated
* harden harness: input validation, trace rotation, NaN guard, phantom model fix
- Add _validate_harness_params() to init() and run() — rejects negative
budget/tool_calls/latency/energy and invalid compliance strings
- Add trace rotation (MAX_TRACE_ENTRIES=1000) in both Python and TypeScript
to prevent unbounded memory growth in long-running agents
- Add sanitizeNumericParam() in n8n harness.ts — coerces NaN/Infinity/negative
config values to null
- Remove phantom gpt-5-nano from llms.txt (not in any pricing table)
- Document HarnessRunContext thread-safety limitation in docstring
- Add 10 new tests covering validation, compliance, and trace rotation
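The hardening items above combine three small techniques: rejecting bad limits, coercing non-finite values to a null, and capping trace length. The sketch below shows one way to express each in Python. `MAX_TRACE_ENTRIES` is the constant named in the commit message; the helper names, the `Trace` class, and the use of `collections.deque` are assumptions, and the n8n `sanitizeNumericParam()` is TypeScript in the source, mirrored here only for illustration.

```python
import math
from collections import deque

MAX_TRACE_ENTRIES = 1000  # value stated in the commit message


def validate_non_negative(name, value):
    """Reject negative limits, as the Python-side validation is described."""
    if value is not None and value < 0:
        raise ValueError(f"{name} must be non-negative, got {value!r}")
    return value


def sanitize_numeric(value):
    """Coerce NaN, infinities, and negatives to None (mirrors the n8n helper)."""
    if value is None or not math.isfinite(value) or value < 0:
        return None
    return value


class Trace:
    """Bounded trace: the oldest entries are dropped once the cap is reached."""

    def __init__(self, max_entries=MAX_TRACE_ENTRIES):
        self._entries = deque(maxlen=max_entries)

    def record(self, entry):
        self._entries.append(entry)

    def entries(self):
        return list(self._entries)
```

A `deque` with `maxlen` discards from the opposite end on append, so a long-running agent keeps only the most recent entries without any explicit trimming logic.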
* docs: reframe positioning as agent runtime intelligence layer + add Mintlify docs site
Phase 0 — GitHub refresh:
- pyproject.toml: update description, keywords, classifier to Production/Stable
- __init__.py: replace emoji docstring with harness API focus
- llms.txt: expand from 88 to 214 lines (HarnessConfig, pricing, energy, integrations)
- README.md: new H1, comparison table, Harness API section, 6 new feature rows
- docs/README.md: Mintlify banner, add LangChain to integrations list
Phase 1 — Mintlify docs site (docs-site/):
- docs.json config (palm theme, 5 tabs, full navigation)
- 36 MDX pages: Get Started (4), Harness (8), Integrations (7),
API Reference (8), Examples (6), index + changelog + contributing
- Logo assets copied from .github/assets/
* fix: switch GitHub Stars badge from social to flat style
Social-style shields.io badges intermittently render as "invalid"
due to GitHub API rate limiting. Flat style is more reliable.
1 parent 12905d9 commit 5cd3f58
88 files changed
Lines changed: 10953 additions & 158 deletions
File tree
- .github/workflows
- cascadeflow
- harness
- integrations
- langchain
- tests
- docs-site
- api-reference
- python
- typescript
- examples
- get-started
- harness
- integrations
- logo
- docs
- guides
- strategy
- examples/integrations
- packages
- core
- src
- __tests__
- integrations/n8n/nodes
- CascadeFlowAgent
- LmChatCascadeFlow
- harness
- __tests__
- tests
- benchmarks
- bfcl