[pull] main from danny-avila:main by pull[bot] · Pull Request #76 · innFactory/agents

pull · 2026-05-04T04:43:12Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* Upgrade LangChain and LangGraph * Restore OpenAI delegate customizations * Align Anthropic wrapper with latest LangChain * Test OpenAI Responses API wrapper path * Refresh provider integration tests and soft-fail visibility * Expand upstream provider compatibility tests * Stabilize summarization token audit test * Bump package version for dev testing * Stabilize Google LLM live tests * Run CI for dev pull requests * Restore OpenRouter reasoning details coverage * fix: Preserve OpenRouter streaming invocation extras * fix: Address LangChain upgrade audit findings * fix: Normalize Anthropic stream usage metadata * fix: Preserve Anthropic sampling sentinels * fix: Narrow Anthropic sampling sentinel helper * fix: Guard Anthropic document block detection * fix: Expose active OpenAI delegate client

* feat: add LangChain facade exports * chore: resolve npm audit advisories * chore: keep audit fix cjs-safe

* Added Tavily integration as both a search provider and a scraper provider * refactor: consolidate Tavily types and fix BaseScraper interface - Replace per-provider union types with `AnyScraperResponse` named type - Add optional `scrapeUrls` batch method to `BaseScraper` interface - Rename `raw_content` to `rawContent` in `TavilyScrapeResponse` (camelCase convention) - Add `TavilySearchResult` and `TavilyExtractResult` named types - Split `TAVILY_API_URL` into separate `tavilySearchUrl` and `tavilyExtractUrl` fields - Add configurable `searchDepth` to `SearchConfig` - Flatten `TavilyConfig` into `SearchToolConfig` via `SearchConfig` extension * fix: address Tavily integration review findings Major fixes: - Wrap `new URL().hostname` in try/catch to prevent single bad URL from discarding entire result set (Finding #1) - Remove Tavily from country schema — Tavily API does not support country filtering (Finding #2) - Split shared `TAVILY_API_URL` into `TAVILY_SEARCH_URL` and `TAVILY_EXTRACT_URL` to avoid env var collision (Finding #3) - Reduce default timeout from 60s to 15s, matching other providers (Finding #4) - Implement batch URL extraction via `scrapeUrls` — Tavily Extract supports up to 20 URLs per request (Finding #5) - Remove dead `topStories` computation; let `news` flow through the existing `executeParallelSearches` merge pipeline (Finding #6) - Use `scraper.extractMetadata()` abstraction instead of runtime `'metadata' in` check (Finding #8) - Read `failed_results` from Tavily Extract API for actionable error messages (Finding #12) - Add JSDoc for `'h'` → `'day'` time range approximation (Finding #13) - Make `search_depth` configurable via `searchDepth` config (Finding #14) - Hoist `TAVILY_TIME_RANGE_MAP` to module scope (Finding #15) - Use batch-aware `scrapeMany` that delegates to `scrapeUrls` when available on the scraper instance * test: add comprehensive test suite for TavilyScraper Covers constructor defaults, env var handling, single URL scraping, batch scraping with chunking, mixed success/failure, extractContent, and extractMetadata — 18 tests total. * feat:support updated paramters * chore:tavily parameter updates * fix: align Tavily option handling * fix: forward country to Tavily search * fix: normalize Tavily country options * fix: pass safe search to Tavily * fix: omit Tavily safe search for fast depths * fix: align Tavily country and extract timeout handling * fix: make Tavily safe search opt-in * fix: gate Tavily chunk options * fix: respect Tavily tool search controls * fix: tune Tavily extract timeout defaults * fix: validate Tavily country filters * fix: preserve Tavily scraper overrides * fix: address Tavily review findings * chore: reuse Tavily metadata alias * fix: prioritize request search overrides --------- Co-authored-by: MayRamati <mr4331@columbia.edu> Co-authored-by: yashwanth-alapati <yashalap6966@gmail.com>

* fix: Skip custom OpenAI SSE events * test: Cover Azure SSE filtering * test: Cover OpenAI non-stream passthrough

* fix: Preserve DeepSeek reasoning content * test: Cover DeepSeek reasoning edge cases

* 🛡️ feat: First-class HITL & permissions surface Adds the SDK side of human-in-the-loop and tool-permission support so hosts can build approval flows, audit policies, and ask-user prompts against a stable surface that mirrors Claude Code's vocabulary. What's in: - HITL on by default. `RunConfig.humanInTheLoop` is opt-out via `{ enabled: false }`. When enabled and the host did not provide a checkpointer, `Run.create` installs a process-local `MemorySaver` fallback so `interrupt()` / `Command({ resume })` work out of the box. - ToolNode bundles every `ask`-decision tool call from one batch into a single `interrupt()` carrying a `tool_approval` payload, anchored to the node config via `AsyncLocalStorageProviderSingleton.runWithConfig` (works around `ToolNode.trace = false`). Resume decisions support approve / reject / edit / respond, accepted as either an array or a record keyed by `tool_call_id`. - AskUserQuestion as a separate `HumanInterruptType`. New `askUserQuestion()` helper for use inside any custom node; `Run.resume<T>` is generic so the same method handles both interrupt categories. Type guards `isToolApprovalInterrupt` / `isAskUserQuestionInterrupt` for narrowing the payload union. - `createToolPolicyHook` declarative permission hook with allow / deny / ask lists + `mode: 'default' | 'dontAsk' | 'bypass'`. Glob `*` patterns. Evaluation order matches Claude Code's: deny → bypass → allow → ask → dontAsk → fallthrough(ask). - New `PostToolBatch` hook event with full `PostToolBatchEntry[]` per dispatch. Per-hook `allowedDecisions` override flows into the interrupt's `review_configs`. `additionalContext` from any hook now injects a single `HumanMessage` into the conversation (previously collected but discarded). `preventContinuation` honored both pre-stream (early return) and mid-flight (`HookRegistry.haltRun()` signal polled by `Run.processStream`, skips the `Stop` hook). Async fire-and-forget hooks via `{ async: true }` — output is ignored, background work continues. - `Run.getInterrupt()` and `Run.getHaltReason()` for hosts to inspect why the run paused or stopped. Re-exports `Command`, `MemorySaver`, `BaseCheckpointSaver`, `INTERRUPT`, `interrupt`, `isInterrupted`, and the `Interrupt` type from `@langchain/langgraph` so hosts that build durable checkpointers extend the same instance the SDK was compiled against (no version skew). 49 new tests across `src/hooks/__tests__/createToolPolicyHook.test.ts` and `src/tools/__tests__/hitl.test.ts`; full hooks/tools/summarization suite (616 tests) green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🔖 chore: bump version to 3.1.75-dev.2 Carries the HITL & permissions surface to the next dev pre-release so LibreChat can pin it during Slice B integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address Codex review on HITL ToolNode Three issues from the PR #134 review: 1. P1: Unknown HITL resume decision types now fail closed instead of silently flowing into `approvedEntries`. Hosts deserialize resume payloads from untyped JSON, so a typo (`'aproved'`) or schema drift would have bypassed the approval gate entirely. Restructured to explicit approve/edit/reject/respond branches with a typed-widened final check that routes anything else through `blockEntry` with a diagnostic reason. 2. P2: `PostToolBatch` entries now carry the post-`PostToolUse`-hook output. Previously the batch entry recorded `result.content` even when a hook had replaced it via `updatedOutput`, so audit / redaction / convention-injecting batch hooks observed stale or unredacted data. Added a loop-scoped `finalToolOutput` that the PostToolUse-hook branch updates and the batch entry reads. 3. P2: `PostToolUseFailure` `additionalContext` is now injected into the next model turn. The hook result was awaited but discarded, so recovery guidance returned on tool errors (e.g. "if this errors, suggest the user retry with X") was silently dropped despite the API surface advertising the field. Now collected into the same batch additional-contexts accumulator. Three new tests in `src/tools/__tests__/hitl.test.ts` under the new "Codex review fixes" describe block: - host sends `{ type: 'aproved' }` → blocked, tool never dispatched - PostToolUse rewrite → batch entry sees the redacted output - tool errors with PostToolUseFailure additionalContext → injected 31/31 HITL tests pass; full hooks/tools/summarization suite (619 tests) still green; typecheck clean; build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 📝 docs: correct HITL default-on contract on RunConfig.humanInTheLoop Codex review on PR #134 caught that the JSDoc still described the opt-in semantics from before the default-on flip. Integrators reading the type docs would omit the field expecting no interrupts, then hit unexpected pauses + a checkpointer fallback in production. Aligns the contract with runtime behavior: HITL is on by default (omit or `{ enabled: true }` engages everything); opt out with `{ enabled: false }` to restore the pre-HITL fail-closed path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address comprehensive review on HITL surface Codex P1 (preserve session hooks across HITL resumes): - `Run.processStream` finally block was unconditionally calling `hookRegistry.clearSession(this.id)` — including after an interrupt fired. The very next call is `Run.resume()`, which needs the same policy hooks (e.g. the `PreToolUse` matcher that produced the interrupt) to fire on the re-executed node and uphold the approval flow. Clearing leaked the gate on resume. Now gated on `this._interrupt == null` so the session survives until either natural completion, error, or hook-driven halt. - Regression test covers the full lifecycle: interrupt → session matcher still present, resume → hook fires again (preCallCount === 2, proving policy actually applied), then natural completion → matcher cleared. Comprehensive review (audited 10 findings, addressed 7 valid): #2 — Mixed deny/ask/allow batch test: previously missing. Added a test with three tools where the policy denies one, asks one, and allows one. Verifies that on first pass only the ask appears in the interrupt and NO tools dispatched (LangGraph rolls back the node's effects on `interrupt()` throw — partial execution while a human is being asked would leak side effects ahead of approval). On resume, all three tools land in state with the correct outcomes. #4 — Documented in `ToolApprovalDecision` JSDoc that `respond` does NOT fire per-tool `PostToolUse` (no real execution happened) but DOES appear in the `PostToolBatch` entry array, so batch-level audit / convention hooks see the full set of outcomes. #5 — Added clarifying JSDoc at both hook-context injection sites (`Run.runPreStreamHooks`, `ToolNode.dispatchToolEvents`) explaining why we use `HumanMessage` with `additional_kwargs.role: 'system'`: mid-conversation `SystemMessage`s are rejected by Anthropic and Google providers; the `role` field is metadata only for hosts inspecting state. Mirrors the existing `convertInjectedMessages` convention. #6 — Decomposed `processStream`: extracted the pre-stream hook block (RunStart + UserPromptSubmit + context injection + halt short-circuit) into a private `runPreStreamHooks()` helper. Returns a boolean: `true` to halt, `false` to proceed. Cuts the method's body and matches AGENTS.md "Break complex operations into well-named helpers." #8 — Moved `HookHaltSignal` interface above the `HookRegistry` class JSDoc block so it doesn't read like a continuation of the Map-mutation rationale. #9 — `blockEntry` now records `turn` on its `PostToolBatch` entry push, matching the executed-path push. Same applies to the `respond` branch. Batch hooks now see uniform entry shapes regardless of outcome. Not addressed (intentionally, per project direction): - #1 default-on HITL: deliberate — most apps expect this. JSDoc on `RunConfig.humanInTheLoop` already calls out the contract. - #3 mixed-resume test: already covered by the existing "bundles multiple ask decisions into a single interrupt and resolves per call" test (resumes with `[approve, reject]`). - #7 unconditional MemorySaver allocation: HITL-on-by-default contract; opt out with `{ enabled: false }` for the latency- sensitive path. Per-Run MemorySaver overhead is small. - #10 `HumanInterruptType` exported but unused: kept as public API surface; LibreChat's wire types reference it. Validation: 32/32 HITL tests pass (2 new under "Codex review fixes" describe), full hooks/tools/summarization suite (621 tests) green, typecheck clean, lint clean on touched files (one pre-existing warning at `src/tools/ToolNode.ts:751` from `df6b739`, untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address follow-up review on HITL extraction Four findings from the follow-up review on commit 389cc88: #1 (MAJOR) — runPreStreamHooks side-effect documentation. The extracted helper had three side effects (mutating `stateInputs.messages`, mutating `config.callbacks`, calling `registry.clearSession`) but the JSDoc only mentioned the first. Added a "Side effects" section that calls out each one and explains why `config.callbacks` is dropped inside the helper rather than by the caller (mirrors processStream's natural-completion cleanup). Also added a one-line comment on the defensive `this.Graph == null` guard explaining why it's there even though `processStream` already validates Graph (TS narrowing doesn't propagate across method boundaries; the guard keeps the body free of `!` assertions). #2 (MINOR) — stream errors after interrupt detection no longer preserve stale session hooks. Previously the `finally` guard checked only `_interrupt == null`, so a downstream handler throwing after the interrupt was stashed would re-throw, hit the finally with `_interrupt != null`, and leak the session matchers into the next run. Added a `streamThrew` flag set in the `catch` block and widened the guard to `_interrupt == null || streamThrew` so a captured-but-stale interrupt also triggers cleanup. New regression test covers this exactly: `customHandler` throws once it detects `run.getInterrupt() != null`, asserts the run still exposed the interrupt, AND that session hooks were cleared anyway. #3 (MINOR) — mixed deny/ask/allow batch test now subscribes a `PostToolBatch` hook and verifies the batch entry shape after resume reflects the FINAL outcomes (tool_a → error with policy reason, tool_b → success with ran:tool_b output, tool_c → success with ran:tool_c output). Closes the validation loop on the PostToolBatch entry data that batch-level audit / convention hooks would observe. #4 (NIT) — blockEntry's `turn` JSDoc no longer overpromises "mirrors the executed-path turn." Now correctly describes it as the pre-invocation count of prior successful invocations of the same tool name, with the same shape the executed path captures before incrementing. Not changed: - #5 (NIT) defensive Graph-null guard kept; reviewer agreed "no change needed." Added a one-line comment for clarity. Validation: 34/34 HITL tests (2 new under "Codex review fixes" describe), 622 across hooks/tools/summarization, typecheck clean, lint clean on touched files (only the pre-existing src/tools/ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address two more Codex P2s on ToolNode HITL flow Two findings on the HITL deny path; both were silent correctness bugs in mixed batches: 1. Duplicate ON_RUN_STEP_COMPLETED on resume (deny + ask batch) — `blockEntry` was dispatching `ON_RUN_STEP_COMPLETED` and the `PermissionDenied` hook synchronously the moment a deny decision landed. When the same batch also held an `ask` decision, the subsequent `interrupt()` threw and LangGraph rolled back the node; on resume the node re-executed from scratch, `blockEntry` fired again, and the host saw two completion events for one logical denial. Fix: queue both side effects in `deferredBlockedSideEffects` instead of firing immediately, then flush exactly once at the end of the hook-handling block. The first-pass throw skips the flush; resume / no-interrupt passes reach it once. New test counts the dispatched `ON_RUN_STEP_COMPLETED` events for the denied tool's call_id and asserts exactly one across the interrupt + resume lifecycle. 2. PostToolBatch entry order non-deterministic in mixed batches — entries used to be `push`'d into a flat array at three different call sites (deny synchronous, approved post-execution, respond on resume), so a batch with `[allowed, denied]` could surface `[denied, allowed]` in the entry list whenever the deny path resolved first. The API contract documents these entries as reflecting the original `toolCalls` order, and hooks correlating outcomes by position would misattribute. Fix: track entries in a `Map<callId, PostToolBatchEntry>` and materialize the array in `toolCalls` order at dispatch time. Two new tests pin the order under both `[deny, allow]` and `[allow, deny]` orderings; the existing mixed-batch test now also asserts the full `[call_a, call_b, call_c]` sequence on the snapshot. Validation: 36/36 HITL tests (3 new), 624 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🧹 chore: tighten mixed-batch test assertion + correct stale comment Round 3 review #2 (NIT). The mixed deny/ask/allow batch test comment claimed PostToolBatch fires on the interrupted pass with only tool_a, then again on resume with all three entries. That was incorrect: PostToolBatch is dispatched at the bottom of `dispatchToolEvents`, after tool execution, so `interrupt()` throws before the dispatch ever runs. Only the resume pass yields a snapshot. Tightened the assertion from `toBeGreaterThanOrEqual(1)` to `toHaveLength(1)` so the test pins the actual behavior, and reworded the comment to match. Future contributors reading this test now build the right mental model of when PostToolBatch fires relative to interrupts. The snapshot's per-entry assertions and order assertion (added in commit 6a9b6fd) are unchanged. Round 3 review #1 (the duplicate denial side-effect events) was already addressed in commit 6a9b6fd via the `deferredBlockedSideEffects` queue + post-interrupt flush. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🧹 chore: handle two NITs from Round 4 review #1: Add explanatory comment on the `respond` decision branch's immediate `dispatchStepCompleted` call. Without context, the asymmetry with `blockEntry`'s deferred dispatch reads as inconsistent — a future contributor might either "fix" respond to also defer (unnecessary, wastes effort) or wonder if it's a latent bug. The new comment explains: respond only executes inside the decision- processing loop, which is reachable only AFTER `interrupt()` has returned, so there's no rollback risk and no risk of duplicate ON_RUN_STEP_COMPLETED dispatch. #2: Add a test for mixed respond + reject in the same resume batch. Both decisions land on the same resume pass: respond dispatches ON_RUN_STEP_COMPLETED immediately via the resume branch; reject queues into deferredBlockedSideEffects and dispatches via the flush. The test asserts each tool dispatches exactly once, the PostToolBatch snapshot fires once with entries in the original toolCalls order, and the ToolMessages match the resume decisions. Pins the timing interaction between the immediate-dispatch and deferred-flush paths. Validation: 37/37 HITL tests pass, 625 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: enforce allowedDecisions on resume + widen interrupt payload typing Two new Codex findings, both real correctness gaps in the HITL surface: P1 — `allowedDecisions` not enforced on resume. A `PreToolUse` hook returning `{ decision: 'ask', allowedDecisions: ['approve', 'reject'] }` advertises the restricted set in the interrupt's `review_configs`, but the SDK's resume handler never validated incoming `decision.type` against it. A buggy or hostile host UI could submit `{ type: 'edit', updatedInput: {...} }` and bypass the policy — mutating arguments or substituting a synthetic tool result that the hook explicitly forbade. Fix: per-entry check that reads `decision.type` through a widened view (handles untyped JSON / typo / missing field) and routes anything not in the per-call `allowedDecisions` allowlist through `blockEntry` with a diagnostic reason. The check fires before the existing approve/edit/reject/respond branches, so policy enforcement happens regardless of which decision type was offered. Two new tests pin both directions: an `edit` submitted against an allowlist of `['approve', 'reject']` is blocked and the host event never dispatches; an `approve` submitted against the same allowlist passes through and the tool runs with original args. P2 — `RunInterruptResult.payload` was typed as `HumanInterruptPayload` (the SDK's `tool_approval` / `ask_user_question` discriminated union), but `Run.resume` documents support for custom interrupt payloads from custom graph nodes. Hosts raising arbitrary shapes got back a mistyped value and downstream code could falsely assume the discriminator-based fields existed. Fix: - `RunInterruptResult<TPayload = HumanInterruptPayload>` is now generic. Default preserves the existing ergonomics for HITL-only consumers; custom hosts pass their own type. - `Run.getInterrupt<T = HumanInterruptPayload>(): RunInterruptResult<T>` is generic. Callers assert what shape they expect. - Internal storage typed as `RunInterruptResult<unknown>`, reflecting that the SDK does not validate the runtime payload. - `isToolApprovalInterrupt` / `isAskUserQuestionInterrupt` widened to accept `unknown` and narrow defensively (`typeof === 'object'`, null check, then discriminator). Lets host code call them on raw values from `getInterrupt()` without first asserting a type. Two new tests pin both: a custom-payload interrupt is captured and returned through `getInterrupt<MyCustom>()` with the right shape; the type guards safely return `false` for `null`, `undefined`, primitives, empty objects, and wrong-discriminator objects. Validation: 41/41 HITL tests pass (4 new), 629 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🧹 chore: hoist single declaredType read in HITL decision loop Round 5 review NIT. The `allowedDecisions` enforcement guard (commit 21692f2) introduced `declaredDecisionType`, which read the exact same widened value as the existing `declaredType` further down. Two reads from the same object under different names with no functional difference — pure cosmetic redundancy. Hoisted one `declaredType` constant before both checks, removed the duplicate. The two callers (allowlist enforcement, unknown-type fallthrough) now share the single source so any future change to how the wire shape is widened lands in one place. Behavior unchanged. 41/41 HITL tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: apply updatedInput before ask + capture nullish interrupt payloads Two new Codex findings, both real correctness bugs: P1 — Hook returning `decision: 'ask'` + `updatedInput` silently dropped the rewrite. The `ask` branch in `dispatchToolEvents`'s PreToolUse loop pushed to `askEntries` and immediately `continue`d, skipping the `applyInputOverride` block below. Result: - The interrupt payload's `action_requests[i].arguments` surfaced the ORIGINAL args to the reviewer (e.g. unredacted secrets a sister matcher had rewritten). - On approve, the host event dispatched with the ORIGINAL args too — the policy rewrite was lost entirely. Fix: call `applyInputOverride` before queuing into `askEntries` when both signals are present. Real-world pattern: one matcher redacts/sanitizes, another matcher requires approval; both must take effect. New test seeds a hook returning `{decision: 'ask', updatedInput: {command: 'redacted-command'}}` against an original arg of `'original-secret'`, then asserts (a) the interrupt payload shows the redacted value, and (b) the host event dispatches with the redacted value after approve. P2 — Nullish interrupt payloads (`interrupt(null)` / `interrupt(undefined)`) silently downgraded paused runs to "completed." The detection block in `processStream` only stashed the interrupt when `first.value != null`; for nullish payloads, `_interrupt` stayed undefined, `getInterrupt()` returned undefined, the run looked like a natural completion, and the `Stop` hook fired. A custom node that pauses without metadata (the pause itself is the signal) was effectively invisible to the host's resume handling. Fix: capture unconditionally on any `__interrupt__` chunk; preserve the payload as-is (may be `null` / `undefined`). New test pins: a custom node calling `interrupt(null)` produces a defined `getInterrupt()` result with `payload === null`, and the Stop hook does NOT fire. Validation: 43/43 HITL tests pass (2 new), 631 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: scope halt signals per session + add tool_call_id to review_configs Two findings from a Round 6 review (the colleague review aligned with the latest Codex P1): P1 — Cross-run halt-signal bleed. `HookRegistry._haltSignal` was a single `HookHaltSignal | undefined` field. Hosts that share one registry across concurrent runs (a normal pattern — a global policy hook lives on a shared registry) would see `preventContinuation` from run A trip run B's stream-loop poll on the next iteration, silently terminating an unrelated run. Real correctness bug for multi-tenant / parallel-run hosts. Fix: - Replace `_haltSignal` with `haltSignals: Map<sessionId, HookHaltSignal>`. - `haltRun(sessionId, reason, source)` — first-write-wins per session. - `getHaltSignal(sessionId)` / `clearHaltSignal(sessionId)` — scoped reads and clears. - `executeHooks` passes its `sessionId` (= the run id every in-tree callsite already uses) into `haltRun`. Hooks fired without a sessionId can't raise a halt — there's no run for the loop to poll under, which is the correct semantic. - `Run.processStream` polls / clears with `this.id` everywhere (resetValues, both pre-stream early-return paths, the mid-stream loop poll, and the `finally` cleanup). New regression test runs two Run instances on a shared registry: a RunStart hook halts run A; run B's signal stays undefined throughout, B's processStream completes naturally, and A's session entry is cleaned up before B even starts. Optional nit also addressed: add `tool_call_id` to `ToolApprovalReviewConfig`. The previous shape only carried `action_name`, so a UI mapping `review_configs[i]` → `action_requests[j]` had to assume positional pairing — fragile when a batch contains the same tool called twice (different `tool_call_id`s, same `name`). Carrying `tool_call_id` lets the UI key by id directly, matching the same field already on the action_requests side. New test pins both sides for a duplicate-tool batch. Validation: 45/45 HITL tests pass (2 new), 633 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: address Round 7 comprehensive review (8 findings, 5 new tests) Three MAJOR fixes (real correctness gaps), three MINOR (refactor + coverage + observability), two NIT (docs + style). One MINOR (MemorySaver warning) deferred — see below. F1 (MAJOR) — `respond` decision bypassed truncation. Every other tool-output path runs through `truncateToolResultContent` to enforce `maxToolResultChars`, but the `respond` branch wrote `decision.responseText` raw into the ToolMessage. A user pasting a large document as a manual response could blow past the model context window. Fix: truncate up front and use the truncated value in the ToolMessage, the PostToolBatch entry, and the `dispatchStepCompleted` event so all three see the same final output. Test pins: 200-char response with `maxToolResultChars: 50` → ToolMessage and batch entry both carry the truncated value. F3 (MAJOR) — Session hooks leaked when interrupt + halt coexisted. A hook returning BOTH `decision: 'ask'` AND `preventContinuation: true` set both signals (interrupt captured, halt raised). The `finally` block's session-clear guard checked only `_interrupt == null || streamThrew`, so it preserved sessions even though the run was halted (no resume expected). Fix: widen the guard to `_interrupt == null || _haltedReason != null || streamThrew`. Test pins: hook returns ask + preventContinuation, both signals land on the Run, and session matchers are cleared. F2 (MAJOR) — `async: true` hook docs over-promised. The current implementation awaits the hook callback fully (subject to timeout) and only ignores its OUTPUT in the fold. The previous JSDoc said "the agent already moved on" implying detachment, which is false. Fix: rewrote the JSDoc to be honest about the semantic — the return value is ignored for influence, but the callback promise is still awaited. The fire-and-forget pattern requires the hook body to detach via `void promise.catch()` and return immediately (the SDK can't speculatively detach based on output shape it hasn't seen yet). Added a WRONG-pattern example showing the common mistake (`await` inside the body undoes the intent). F5 (MINOR) — Extracted two helpers from `dispatchToolEvents`: - `buildToolApprovalInterruptPayload(askEntries)` — pure module- scope function, takes the askEntries list, returns the `tool_approval` payload. Exposed `AskEntry` as a module-scope type so it can move out of the function body. - `dispatchPostToolBatchAndInjectContext({...})` — private async method on ToolNode, materializes batch entries in `toolCalls` order, fires `PostToolBatch`, and pushes the consolidated context HumanMessage. Cuts ~70 lines from the main function. Both extractions preserve behavior; existing tests cover them. F6 (MINOR) — Added direct tests for `Run.resume()` and `Run.getHaltReason()`. Previously the resume codepath was only exercised through `graph.invoke(new Command(...))` and getHaltReason had only the pre-stream `preventContinuation` test. New tests: - `Run.resume([{type:'approve'}], config)` end-to-end through the wrapper API after an interrupt. - `Run.getHaltReason()` returns `'PII detected'` when `UserPromptSubmit` denies the prompt with a reason. - `Run.getHaltReason()` returns `'prompt_requires_approval'` when `UserPromptSubmit` returns `decision: 'ask'` without a custom reason. F7 (MINOR) — `UserPromptSubmit` deny/ask now sets `_haltedReason`. Previously the only signal that landed in `_haltedReason` was `preventContinuation`; deny/ask returned `undefined` from processStream with no way for hosts to distinguish "blocked" from "completed naturally with no output." Fix: set canonical reasons (`prompt_denied`, `prompt_requires_approval`) when the hook doesn't supply one, surface the hook's `reason` when it does. F8 (NIT) — Added JSDoc on `HumanInterruptType` documenting the downstream consumer (LibreChat's wire types in `librechat-data-provider`) so the export isn't read as dead code. F9 (NIT) — Removed `// src/path/to/file.ts` header comments from the four new files (askUserQuestion, hitl/index, createToolPolicyHook, types/hitl). Per AGENTS.md "Avoid standalone `//` comments unless absolutely necessary." The descriptive block comments below them stay. Deferred: F4 (MemorySaver warning). The `humanInTheLoop` JSDoc already documents "production hosts should always provide a durable checkpointer." A console.warn per Run.create would push hosts to opt out, defeating the deliberate default-on contract. If observability becomes a real issue, a debug-gated log via `emitAgentLog` is the right shape — but that needs a logger plumb-through and is out of scope for this PR. Validation: 50/50 HITL tests pass (5 new), 638 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:808 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: guard malformed respond decisions + cover prompt_denied canonical default Codex P1: malformed respond payloads (no responseText, or non-string responseText) crashed `truncateToolResultContent` on `content.length` — turning a fail-closed approval path into a hard run failure for one bad wire payload. Hosts deserialize resume payloads from untyped JSON, so the SDK can't trust the shape. Fix: validate `responseText` is a string BEFORE passing to truncation. If missing or non-string, route through `blockEntry` with a diagnostic that includes the actual offered shape (`<missing>` or the typeof) so the host can debug. Two regression tests pin both cases: - `{ type: 'respond' }` (no field) → blocked, error mentions `'<missing>'`, host event never dispatched - `{ type: 'respond', responseText: 42 }` → blocked, error mentions `'number'` Also addresses Round 7 follow-up review N2 (NIT, marked "not worth filing" by the reviewer): added an explicit test for the `prompt_denied` canonical default — the existing test only covered the custom-reason path. Now both the custom and canonical paths are explicitly pinned for both `decision: 'deny'` and `decision: 'ask'`. Skipped follow-up review N1 (pre-existing `// src/run.ts` header) — F9 was scoped to new files only; the reviewer agreed the scope is intentional. Validation: 53/53 HITL tests pass (3 new), 641 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: validate edit payloads + correct HITL default doc + scope direct-tool gap Three findings from Round 8 review (Codex's 5; 1 stale, 2 architectural deferred per the reviewer's own option, 2 fixed here + 1 doc): P2 (Codex new) — Malformed `edit` decisions weren't validated. Same trust boundary as the `respond` fix in 6a93e55: hosts deserialize resume payloads from untyped JSON, and `{ type: 'edit' }` (no `updatedInput`), `{ type: 'edit', updatedInput: 'string' }`, or `{ type: 'edit', updatedInput: [...] }` would feed garbage into `applyInputOverride` and silently approve a tool with the wrong-shape args. Fix: typeof + null + Array.isArray check before approving; fail-closed via `blockEntry` with a diagnostic that includes the offered shape (`<missing>` / `null` / `array` / typeof). Three regression tests pin all three malformed cases. Refactored both `respond` and `edit` diagnostic-formatting code into a single `describeOfferedShape(value)` module-scope helper to drop the nested-ternary lint warnings the new check introduced and keep the diagnostic strings consistent across both decision branches. P3 (Codex new) — `ToolNodeOptions.humanInTheLoop` JSDoc still described the pre-default-on contract ("undefined keeps fail-closed"). Updated to mirror `RunConfig.humanInTheLoop`'s default-on language and explicitly note the event-driven-only scope (the same caveat already documented on the parallel `hookRegistry` field). Architectural P2s deferred — Codex's reviewer explicitly offered this path: "If this PR is only meant to unblock LibreChat's event-driven tool approval path, they can be documented as out of scope." - Direct tools (those routed through `directToolNames` like handoffs/subagents) bypass HITL entirely. - Mixed direct+event batches re-execute direct tools on resume because LangGraph rolls back the entire ToolNode on `interrupt()`. Both are real but require restructuring `ToolNode.run` to defer direct execution until after HITL approval — non-trivial and better done as a focused follow-up. Documented explicitly in `HumanInTheLoopConfig`'s JSDoc with practical implications for hosts: LibreChat-style event-driven hosts get the full surface; direct-tool hosts need to switch to event-driven mode or accept that direct tools aren't approval-gated; mixed batches with side-effecting direct tools should not be combined with approval-gated event tools. Skipped: P1 (malformed respond) — already fixed in 6a93e55. Codex just hadn't re-evaluated yet. Validation: 56/56 HITL tests pass (3 new), 644 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:826 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: preserve Graph sidecars across HITL interrupt + resume Codex P1: after a clean HITL interrupt, `processStream` was wiping the sidecars `Run.resume()` needs: - `Graph.resetValues()` on the next invocation cleared `toolCallStepIds`, `stepKeyIds`, `messageStepHasToolCalls`, accumulated `messages`, etc. - `Graph.clearHeavyState()` in the prior `finally` cleared `toolCallStepIds`, `_toolOutputRegistry`, `sessions`, `hookRegistry`, `humanInTheLoop`. Result: when LangGraph re-entered ToolNode on resume, the lookup in `dispatchStepCompleted`'s `toolCallStepIds.get(toolCallId) ?? ''` returned `''`. The host then dispatched `ON_RUN_STEP_COMPLETED` with an empty step id, and `stream.ts` dropped the completed tool result. Specifically broken for LibreChat's stream-resume story where the resumed approval path needs to complete the already-rendered tool step. The existing HITL tests didn't catch this because they replace `run.graphRunnable` with a custom graph whose ToolNode owns its own prefilled `toolCallStepIds` Map — disjoint from the SDK Graph's, so the SDK Graph's clear didn't observably affect the test ToolNode's lookups. Two gates added: 1. `processStream` finally: skip `Graph.clearHeavyState()` when `_interrupt != null && _haltedReason == null && !streamThrew` — i.e., a clean interrupt awaiting resume. Halt-driven and error-driven paths still clean up because no resume is expected. 2. `processStream` entry: skip `Graph.resetValues()` when entering via `Command` (resume). We're continuing an in-flight run, not starting fresh — the sidecars must survive the boundary. Cross-process resume (host rebuilds Run from scratch) is a separate concern handled host-side; documented in `HumanInTheLoopConfig` JSDoc earlier. This fix covers the same-Run-instance pause-and-resume path that's the SDK's primary contract. Two regression tests, paired: - "preserves Graph sidecars across HITL interrupt + resume" wires the test ToolNode to share `run.Graph.toolCallStepIds` by reference (mirroring how `StandardGraph` builds its inner ToolNode at Graph.ts:587). Pre-populates the map inside the agent node (mirroring `attemptInvoke`'s timing). Asserts the entry survives both the interrupt finally AND the resume entry, AND that the resumed dispatch fires `ON_RUN_STEP_COMPLETED` with the real step id (not `''`). - "clears Graph sidecars on natural completion" pins the negative case: a non-interrupt run still triggers `clearHeavyState`, so this gate doesn't accidentally leak memory across runs. Verified the regression tests catch the bug pre-fix by stashing `run.ts` and re-running: the preservation test fails on `expect(toolCallStepIds.has('call_1')).toBe(true)` exactly as the reviewer described. Validation: 58/58 HITL tests pass (2 new), 646 across hooks/tools/ summarization, typecheck/lint/build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🚦 chore: flip HITL default to OFF until host UI ships Hosts (notably LibreChat) don't yet render `tool_approval` interrupts, so a default-on HITL surface would let `ask` decisions pause the run with no resolver — surfaces to end users as a hung tool-call card. Flipping the default to OFF keeps existing hosts on the pre-HITL fail-closed path until they're ready to wire the resume UI. Hosts opt in explicitly with `humanInTheLoop: { enabled: true }`. The plan of record (documented on `HumanInTheLoopConfig`) is to flip the default back to ON in a future minor once the consumer ecosystem is ready end-to-end. - `Run.applyHITLCheckpointerFallback`: only attach the in-memory `MemorySaver` when `enabled === true`. Omitted/false leaves `compileOptions.checkpointer` untouched. - `ToolNode` ask-decision branch: collapse `ask` into a synchronous block (the pre-HITL behavior) unless `enabled === true`. - JSDoc on `HumanInTheLoopConfig`, `RunConfig.humanInTheLoop`, and `ToolNodeOptions.humanInTheLoop` rewritten to lead with default-off semantics + the migration plan. - Tests: rename "default-on" assertion to "default-off blocks", and flip the host-checkpointer-preservation test to thread `enabled: true` explicitly (since default-off no longer engages the fallback path). Bump to 3.1.75-dev.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Preserve DeepSeek same-run reasoning content * fix: Emit sanitized DeepSeek stream callbacks * fix: Harden DeepSeek thinking stream parsing * test: Cover DeepSeek reasoning edge cases * test: Cover DeepSeek streaming edge combinations

danny-avila and others added 9 commits May 3, 2026 01:46

🧭 feat: Add LangChain Facade Exports (#132)

4a78e2f

* feat: add LangChain facade exports * chore: resolve npm audit advisories * chore: keep audit fix cjs-safe

📡 fix: Skip Custom OpenAI SSE Events (#137)

ea59522

* fix: Skip custom OpenAI SSE events * test: Cover Azure SSE filtering * test: Cover OpenAI non-stream passthrough

🧠 fix: Preserve DeepSeek Reasoning Content (#136)

43343cf

* fix: Preserve DeepSeek reasoning content * test: Cover DeepSeek reasoning edge cases

v3.1.76

ce9a1d4

v3.1.77

9b2081b

pull Bot locked and limited conversation to collaborators May 4, 2026

pull Bot added the ⤵️ pull label May 4, 2026

pull Bot merged commit 9b2081b into innFactory:main May 4, 2026
2 checks passed

pull Bot had a problem deploying to publish May 4, 2026 04:43 Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] main from danny-avila:main#76

[pull] main from danny-avila:main#76
pull[bot] merged 9 commits intoinnFactory:mainfrom
danny-avila:main

pull Bot commented May 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pull Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pull Bot commented May 4, 2026 •

edited

Loading