[pull] main from danny-avila:main#76
Merged
pull[bot] merged 9 commits intoinnFactory:mainfrom May 4, 2026
Merged
Conversation
* Upgrade LangChain and LangGraph * Restore OpenAI delegate customizations * Align Anthropic wrapper with latest LangChain * Test OpenAI Responses API wrapper path * Refresh provider integration tests and soft-fail visibility * Expand upstream provider compatibility tests * Stabilize summarization token audit test * Bump package version for dev testing * Stabilize Google LLM live tests * Run CI for dev pull requests * Restore OpenRouter reasoning details coverage * fix: Preserve OpenRouter streaming invocation extras * fix: Address LangChain upgrade audit findings * fix: Normalize Anthropic stream usage metadata * fix: Preserve Anthropic sampling sentinels * fix: Narrow Anthropic sampling sentinel helper * fix: Guard Anthropic document block detection * fix: Expose active OpenAI delegate client
* feat: add LangChain facade exports * chore: resolve npm audit advisories * chore: keep audit fix cjs-safe
* Added Tavily integration as both a search provider and a scraper provider * refactor: consolidate Tavily types and fix BaseScraper interface - Replace per-provider union types with `AnyScraperResponse` named type - Add optional `scrapeUrls` batch method to `BaseScraper` interface - Rename `raw_content` to `rawContent` in `TavilyScrapeResponse` (camelCase convention) - Add `TavilySearchResult` and `TavilyExtractResult` named types - Split `TAVILY_API_URL` into separate `tavilySearchUrl` and `tavilyExtractUrl` fields - Add configurable `searchDepth` to `SearchConfig` - Flatten `TavilyConfig` into `SearchToolConfig` via `SearchConfig` extension * fix: address Tavily integration review findings Major fixes: - Wrap `new URL().hostname` in try/catch to prevent single bad URL from discarding entire result set (Finding #1) - Remove Tavily from country schema — Tavily API does not support country filtering (Finding #2) - Split shared `TAVILY_API_URL` into `TAVILY_SEARCH_URL` and `TAVILY_EXTRACT_URL` to avoid env var collision (Finding #3) - Reduce default timeout from 60s to 15s, matching other providers (Finding #4) - Implement batch URL extraction via `scrapeUrls` — Tavily Extract supports up to 20 URLs per request (Finding #5) - Remove dead `topStories` computation; let `news` flow through the existing `executeParallelSearches` merge pipeline (Finding #6) - Use `scraper.extractMetadata()` abstraction instead of runtime `'metadata' in` check (Finding #8) - Read `failed_results` from Tavily Extract API for actionable error messages (Finding #12) - Add JSDoc for `'h'` → `'day'` time range approximation (Finding #13) - Make `search_depth` configurable via `searchDepth` config (Finding #14) - Hoist `TAVILY_TIME_RANGE_MAP` to module scope (Finding #15) - Use batch-aware `scrapeMany` that delegates to `scrapeUrls` when available on the scraper instance * test: add comprehensive test suite for TavilyScraper Covers constructor defaults, env var handling, single URL scraping, batch scraping with chunking, mixed success/failure, extractContent, and extractMetadata — 18 tests total. * feat:support updated paramters * chore:tavily parameter updates * fix: align Tavily option handling * fix: forward country to Tavily search * fix: normalize Tavily country options * fix: pass safe search to Tavily * fix: omit Tavily safe search for fast depths * fix: align Tavily country and extract timeout handling * fix: make Tavily safe search opt-in * fix: gate Tavily chunk options * fix: respect Tavily tool search controls * fix: tune Tavily extract timeout defaults * fix: validate Tavily country filters * fix: preserve Tavily scraper overrides * fix: address Tavily review findings * chore: reuse Tavily metadata alias * fix: prioritize request search overrides --------- Co-authored-by: MayRamati <mr4331@columbia.edu> Co-authored-by: yashwanth-alapati <yashalap6966@gmail.com>
* fix: Skip custom OpenAI SSE events * test: Cover Azure SSE filtering * test: Cover OpenAI non-stream passthrough
* fix: Preserve DeepSeek reasoning content * test: Cover DeepSeek reasoning edge cases
* 🛡️ feat: First-class HITL & permissions surface
Adds the SDK side of human-in-the-loop and tool-permission support so
hosts can build approval flows, audit policies, and ask-user prompts
against a stable surface that mirrors Claude Code's vocabulary.
What's in:
- HITL on by default. `RunConfig.humanInTheLoop` is opt-out via
`{ enabled: false }`. When enabled and the host did not provide a
checkpointer, `Run.create` installs a process-local `MemorySaver`
fallback so `interrupt()` / `Command({ resume })` work out of the box.
- ToolNode bundles every `ask`-decision tool call from one batch into
a single `interrupt()` carrying a `tool_approval` payload, anchored
to the node config via `AsyncLocalStorageProviderSingleton.runWithConfig`
(works around `ToolNode.trace = false`). Resume decisions support
approve / reject / edit / respond, accepted as either an array or a
record keyed by `tool_call_id`.
- AskUserQuestion as a separate `HumanInterruptType`. New
`askUserQuestion()` helper for use inside any custom node;
`Run.resume<T>` is generic so the same method handles both interrupt
categories. Type guards `isToolApprovalInterrupt` /
`isAskUserQuestionInterrupt` for narrowing the payload union.
- `createToolPolicyHook` declarative permission hook with allow / deny
/ ask lists + `mode: 'default' | 'dontAsk' | 'bypass'`. Glob `*`
patterns. Evaluation order matches Claude Code's:
deny → bypass → allow → ask → dontAsk → fallthrough(ask).
- New `PostToolBatch` hook event with full `PostToolBatchEntry[]` per
dispatch. Per-hook `allowedDecisions` override flows into the
interrupt's `review_configs`. `additionalContext` from any hook now
injects a single `HumanMessage` into the conversation (previously
collected but discarded). `preventContinuation` honored both
pre-stream (early return) and mid-flight (`HookRegistry.haltRun()`
signal polled by `Run.processStream`, skips the `Stop` hook). Async
fire-and-forget hooks via `{ async: true }` — output is ignored,
background work continues.
- `Run.getInterrupt()` and `Run.getHaltReason()` for hosts to inspect
why the run paused or stopped. Re-exports `Command`, `MemorySaver`,
`BaseCheckpointSaver`, `INTERRUPT`, `interrupt`, `isInterrupted`,
and the `Interrupt` type from `@langchain/langgraph` so hosts that
build durable checkpointers extend the same instance the SDK was
compiled against (no version skew).
49 new tests across `src/hooks/__tests__/createToolPolicyHook.test.ts`
and `src/tools/__tests__/hitl.test.ts`; full hooks/tools/summarization
suite (616 tests) green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🔖 chore: bump version to 3.1.75-dev.2
Carries the HITL & permissions surface to the next dev pre-release so
LibreChat can pin it during Slice B integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛠️ fix: address Codex review on HITL ToolNode
Three issues from the PR #134 review:
1. P1: Unknown HITL resume decision types now fail closed instead of
silently flowing into `approvedEntries`. Hosts deserialize resume
payloads from untyped JSON, so a typo (`'aproved'`) or schema drift
would have bypassed the approval gate entirely. Restructured to
explicit approve/edit/reject/respond branches with a typed-widened
final check that routes anything else through `blockEntry` with a
diagnostic reason.
2. P2: `PostToolBatch` entries now carry the post-`PostToolUse`-hook
output. Previously the batch entry recorded `result.content` even
when a hook had replaced it via `updatedOutput`, so audit /
redaction / convention-injecting batch hooks observed stale or
unredacted data. Added a loop-scoped `finalToolOutput` that the
PostToolUse-hook branch updates and the batch entry reads.
3. P2: `PostToolUseFailure` `additionalContext` is now injected into
the next model turn. The hook result was awaited but discarded, so
recovery guidance returned on tool errors (e.g. "if this errors,
suggest the user retry with X") was silently dropped despite the
API surface advertising the field. Now collected into the same
batch additional-contexts accumulator.
Three new tests in `src/tools/__tests__/hitl.test.ts` under the new
"Codex review fixes" describe block:
- host sends `{ type: 'aproved' }` → blocked, tool never dispatched
- PostToolUse rewrite → batch entry sees the redacted output
- tool errors with PostToolUseFailure additionalContext → injected
31/31 HITL tests pass; full hooks/tools/summarization suite (619
tests) still green; typecheck clean; build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 📝 docs: correct HITL default-on contract on RunConfig.humanInTheLoop
Codex review on PR #134 caught that the JSDoc still described the
opt-in semantics from before the default-on flip. Integrators reading
the type docs would omit the field expecting no interrupts, then hit
unexpected pauses + a checkpointer fallback in production.
Aligns the contract with runtime behavior: HITL is on by default
(omit or `{ enabled: true }` engages everything); opt out with
`{ enabled: false }` to restore the pre-HITL fail-closed path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛠️ fix: address comprehensive review on HITL surface
Codex P1 (preserve session hooks across HITL resumes):
- `Run.processStream` finally block was unconditionally calling
`hookRegistry.clearSession(this.id)` — including after an interrupt
fired. The very next call is `Run.resume()`, which needs the same
policy hooks (e.g. the `PreToolUse` matcher that produced the
interrupt) to fire on the re-executed node and uphold the approval
flow. Clearing leaked the gate on resume. Now gated on
`this._interrupt == null` so the session survives until either
natural completion, error, or hook-driven halt.
- Regression test covers the full lifecycle: interrupt → session
matcher still present, resume → hook fires again (preCallCount === 2,
proving policy actually applied), then natural completion → matcher
cleared.
Comprehensive review (audited 10 findings, addressed 7 valid):
#2 — Mixed deny/ask/allow batch test: previously missing. Added a
test with three tools where the policy denies one, asks one, and
allows one. Verifies that on first pass only the ask appears in the
interrupt and NO tools dispatched (LangGraph rolls back the node's
effects on `interrupt()` throw — partial execution while a human is
being asked would leak side effects ahead of approval). On resume,
all three tools land in state with the correct outcomes.
#4 — Documented in `ToolApprovalDecision` JSDoc that `respond` does
NOT fire per-tool `PostToolUse` (no real execution happened) but DOES
appear in the `PostToolBatch` entry array, so batch-level audit /
convention hooks see the full set of outcomes.
#5 — Added clarifying JSDoc at both hook-context injection sites
(`Run.runPreStreamHooks`, `ToolNode.dispatchToolEvents`) explaining
why we use `HumanMessage` with `additional_kwargs.role: 'system'`:
mid-conversation `SystemMessage`s are rejected by Anthropic and
Google providers; the `role` field is metadata only for hosts
inspecting state. Mirrors the existing `convertInjectedMessages`
convention.
#6 — Decomposed `processStream`: extracted the pre-stream hook
block (RunStart + UserPromptSubmit + context injection + halt
short-circuit) into a private `runPreStreamHooks()` helper. Returns
a boolean: `true` to halt, `false` to proceed. Cuts the method's
body and matches AGENTS.md "Break complex operations into
well-named helpers."
#8 — Moved `HookHaltSignal` interface above the `HookRegistry`
class JSDoc block so it doesn't read like a continuation of the
Map-mutation rationale.
#9 — `blockEntry` now records `turn` on its `PostToolBatch` entry
push, matching the executed-path push. Same applies to the `respond`
branch. Batch hooks now see uniform entry shapes regardless of
outcome.
Not addressed (intentionally, per project direction):
- #1 default-on HITL: deliberate — most apps expect this. JSDoc on
`RunConfig.humanInTheLoop` already calls out the contract.
- #3 mixed-resume test: already covered by the existing "bundles
multiple ask decisions into a single interrupt and resolves per
call" test (resumes with `[approve, reject]`).
- #7 unconditional MemorySaver allocation: HITL-on-by-default
contract; opt out with `{ enabled: false }` for the latency-
sensitive path. Per-Run MemorySaver overhead is small.
- #10 `HumanInterruptType` exported but unused: kept as public API
surface; LibreChat's wire types reference it.
Validation: 32/32 HITL tests pass (2 new under "Codex review fixes"
describe), full hooks/tools/summarization suite (621 tests) green,
typecheck clean, lint clean on touched files (one pre-existing
warning at `src/tools/ToolNode.ts:751` from `df6b739`, untouched),
build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛠️ fix: address follow-up review on HITL extraction
Four findings from the follow-up review on commit 389cc88:
#1 (MAJOR) — runPreStreamHooks side-effect documentation. The
extracted helper had three side effects (mutating
`stateInputs.messages`, mutating `config.callbacks`, calling
`registry.clearSession`) but the JSDoc only mentioned the first.
Added a "Side effects" section that calls out each one and explains
why `config.callbacks` is dropped inside the helper rather than by
the caller (mirrors processStream's natural-completion cleanup).
Also added a one-line comment on the defensive `this.Graph == null`
guard explaining why it's there even though `processStream` already
validates Graph (TS narrowing doesn't propagate across method
boundaries; the guard keeps the body free of `!` assertions).
#2 (MINOR) — stream errors after interrupt detection no longer
preserve stale session hooks. Previously the `finally` guard
checked only `_interrupt == null`, so a downstream handler throwing
after the interrupt was stashed would re-throw, hit the finally
with `_interrupt != null`, and leak the session matchers into the
next run. Added a `streamThrew` flag set in the `catch` block and
widened the guard to `_interrupt == null || streamThrew` so a
captured-but-stale interrupt also triggers cleanup. New regression
test covers this exactly: `customHandler` throws once it detects
`run.getInterrupt() != null`, asserts the run still exposed the
interrupt, AND that session hooks were cleared anyway.
#3 (MINOR) — mixed deny/ask/allow batch test now subscribes a
`PostToolBatch` hook and verifies the batch entry shape after
resume reflects the FINAL outcomes (tool_a → error with policy
reason, tool_b → success with ran:tool_b output, tool_c → success
with ran:tool_c output). Closes the validation loop on the
PostToolBatch entry data that batch-level audit / convention hooks
would observe.
#4 (NIT) — blockEntry's `turn` JSDoc no longer overpromises
"mirrors the executed-path turn." Now correctly describes it as the
pre-invocation count of prior successful invocations of the same
tool name, with the same shape the executed path captures before
incrementing.
Not changed:
- #5 (NIT) defensive Graph-null guard kept; reviewer agreed "no
change needed." Added a one-line comment for clarity.
Validation: 34/34 HITL tests (2 new under "Codex review fixes"
describe), 622 across hooks/tools/summarization, typecheck clean,
lint clean on touched files (only the pre-existing
src/tools/ToolNode.ts:751 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛠️ fix: address two more Codex P2s on ToolNode HITL flow
Two findings on the HITL deny path; both were silent correctness
bugs in mixed batches:
1. Duplicate ON_RUN_STEP_COMPLETED on resume (deny + ask batch)
— `blockEntry` was dispatching `ON_RUN_STEP_COMPLETED` and the
`PermissionDenied` hook synchronously the moment a deny decision
landed. When the same batch also held an `ask` decision, the
subsequent `interrupt()` threw and LangGraph rolled back the node;
on resume the node re-executed from scratch, `blockEntry` fired
again, and the host saw two completion events for one logical
denial. Fix: queue both side effects in
`deferredBlockedSideEffects` instead of firing immediately, then
flush exactly once at the end of the hook-handling block. The
first-pass throw skips the flush; resume / no-interrupt passes
reach it once. New test counts the dispatched
`ON_RUN_STEP_COMPLETED` events for the denied tool's call_id and
asserts exactly one across the interrupt + resume lifecycle.
2. PostToolBatch entry order non-deterministic in mixed batches
— entries used to be `push`'d into a flat array at three different
call sites (deny synchronous, approved post-execution, respond on
resume), so a batch with `[allowed, denied]` could surface
`[denied, allowed]` in the entry list whenever the deny path
resolved first. The API contract documents these entries as
reflecting the original `toolCalls` order, and hooks correlating
outcomes by position would misattribute. Fix: track entries in a
`Map<callId, PostToolBatchEntry>` and materialize the array in
`toolCalls` order at dispatch time. Two new tests pin the order
under both `[deny, allow]` and `[allow, deny]` orderings; the
existing mixed-batch test now also asserts the full
`[call_a, call_b, call_c]` sequence on the snapshot.
Validation: 36/36 HITL tests (3 new), 624 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
ToolNode.ts:751 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🧹 chore: tighten mixed-batch test assertion + correct stale comment
Round 3 review #2 (NIT). The mixed deny/ask/allow batch test
comment claimed PostToolBatch fires on the interrupted pass with
only tool_a, then again on resume with all three entries. That was
incorrect: PostToolBatch is dispatched at the bottom of
`dispatchToolEvents`, after tool execution, so `interrupt()` throws
before the dispatch ever runs. Only the resume pass yields a
snapshot.
Tightened the assertion from `toBeGreaterThanOrEqual(1)` to
`toHaveLength(1)` so the test pins the actual behavior, and reworded
the comment to match. Future contributors reading this test now
build the right mental model of when PostToolBatch fires relative
to interrupts. The snapshot's per-entry assertions and order
assertion (added in commit 6a9b6fd) are unchanged.
Round 3 review #1 (the duplicate denial side-effect events) was
already addressed in commit 6a9b6fd via the
`deferredBlockedSideEffects` queue + post-interrupt flush.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🧹 chore: handle two NITs from Round 4 review
#1: Add explanatory comment on the `respond` decision branch's
immediate `dispatchStepCompleted` call. Without context, the
asymmetry with `blockEntry`'s deferred dispatch reads as inconsistent
— a future contributor might either "fix" respond to also defer
(unnecessary, wastes effort) or wonder if it's a latent bug. The new
comment explains: respond only executes inside the decision-
processing loop, which is reachable only AFTER `interrupt()` has
returned, so there's no rollback risk and no risk of duplicate
ON_RUN_STEP_COMPLETED dispatch.
#2: Add a test for mixed respond + reject in the same resume batch.
Both decisions land on the same resume pass: respond dispatches
ON_RUN_STEP_COMPLETED immediately via the resume branch; reject
queues into deferredBlockedSideEffects and dispatches via the flush.
The test asserts each tool dispatches exactly once, the PostToolBatch
snapshot fires once with entries in the original toolCalls order, and
the ToolMessages match the resume decisions. Pins the timing
interaction between the immediate-dispatch and deferred-flush paths.
Validation: 37/37 HITL tests pass, 625 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
ToolNode.ts:751 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛡️ fix: enforce allowedDecisions on resume + widen interrupt payload typing
Two new Codex findings, both real correctness gaps in the HITL
surface:
P1 — `allowedDecisions` not enforced on resume. A `PreToolUse` hook
returning `{ decision: 'ask', allowedDecisions: ['approve', 'reject'] }`
advertises the restricted set in the interrupt's `review_configs`,
but the SDK's resume handler never validated incoming
`decision.type` against it. A buggy or hostile host UI could submit
`{ type: 'edit', updatedInput: {...} }` and bypass the policy —
mutating arguments or substituting a synthetic tool result that
the hook explicitly forbade.
Fix: per-entry check that reads `decision.type` through a widened
view (handles untyped JSON / typo / missing field) and routes
anything not in the per-call `allowedDecisions` allowlist through
`blockEntry` with a diagnostic reason. The check fires before the
existing approve/edit/reject/respond branches, so policy
enforcement happens regardless of which decision type was offered.
Two new tests pin both directions: an `edit` submitted against an
allowlist of `['approve', 'reject']` is blocked and the host event
never dispatches; an `approve` submitted against the same allowlist
passes through and the tool runs with original args.
P2 — `RunInterruptResult.payload` was typed as `HumanInterruptPayload`
(the SDK's `tool_approval` / `ask_user_question` discriminated
union), but `Run.resume` documents support for custom interrupt
payloads from custom graph nodes. Hosts raising arbitrary shapes
got back a mistyped value and downstream code could falsely assume
the discriminator-based fields existed.
Fix:
- `RunInterruptResult<TPayload = HumanInterruptPayload>` is now
generic. Default preserves the existing ergonomics for HITL-only
consumers; custom hosts pass their own type.
- `Run.getInterrupt<T = HumanInterruptPayload>(): RunInterruptResult<T>`
is generic. Callers assert what shape they expect.
- Internal storage typed as `RunInterruptResult<unknown>`,
reflecting that the SDK does not validate the runtime payload.
- `isToolApprovalInterrupt` / `isAskUserQuestionInterrupt` widened
to accept `unknown` and narrow defensively (`typeof === 'object'`,
null check, then discriminator). Lets host code call them on raw
values from `getInterrupt()` without first asserting a type.
Two new tests pin both: a custom-payload interrupt is captured and
returned through `getInterrupt<MyCustom>()` with the right shape; the
type guards safely return `false` for `null`, `undefined`, primitives,
empty objects, and wrong-discriminator objects.
Validation: 41/41 HITL tests pass (4 new), 629 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
ToolNode.ts:751 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🧹 chore: hoist single declaredType read in HITL decision loop
Round 5 review NIT. The `allowedDecisions` enforcement guard
(commit 21692f2) introduced `declaredDecisionType`, which read the
exact same widened value as the existing `declaredType` further
down. Two reads from the same object under different names with
no functional difference — pure cosmetic redundancy.
Hoisted one `declaredType` constant before both checks, removed
the duplicate. The two callers (allowlist enforcement, unknown-type
fallthrough) now share the single source so any future change to
how the wire shape is widened lands in one place.
Behavior unchanged. 41/41 HITL tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛡️ fix: apply updatedInput before ask + capture nullish interrupt payloads
Two new Codex findings, both real correctness bugs:
P1 — Hook returning `decision: 'ask'` + `updatedInput` silently
dropped the rewrite. The `ask` branch in `dispatchToolEvents`'s
PreToolUse loop pushed to `askEntries` and immediately `continue`d,
skipping the `applyInputOverride` block below. Result:
- The interrupt payload's `action_requests[i].arguments` surfaced
the ORIGINAL args to the reviewer (e.g. unredacted secrets a
sister matcher had rewritten).
- On approve, the host event dispatched with the ORIGINAL args
too — the policy rewrite was lost entirely.
Fix: call `applyInputOverride` before queuing into `askEntries`
when both signals are present. Real-world pattern: one matcher
redacts/sanitizes, another matcher requires approval; both must
take effect. New test seeds a hook returning
`{decision: 'ask', updatedInput: {command: 'redacted-command'}}`
against an original arg of `'original-secret'`, then asserts (a)
the interrupt payload shows the redacted value, and (b) the
host event dispatches with the redacted value after approve.
P2 — Nullish interrupt payloads (`interrupt(null)` /
`interrupt(undefined)`) silently downgraded paused runs to
"completed." The detection block in `processStream` only stashed
the interrupt when `first.value != null`; for nullish payloads,
`_interrupt` stayed undefined, `getInterrupt()` returned undefined,
the run looked like a natural completion, and the `Stop` hook
fired. A custom node that pauses without metadata (the pause
itself is the signal) was effectively invisible to the host's
resume handling.
Fix: capture unconditionally on any `__interrupt__` chunk; preserve
the payload as-is (may be `null` / `undefined`). New test pins:
a custom node calling `interrupt(null)` produces a defined
`getInterrupt()` result with `payload === null`, and the Stop
hook does NOT fire.
Validation: 43/43 HITL tests pass (2 new), 631 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
ToolNode.ts:751 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛡️ fix: scope halt signals per session + add tool_call_id to review_configs
Two findings from a Round 6 review (the colleague review aligned with
the latest Codex P1):
P1 — Cross-run halt-signal bleed. `HookRegistry._haltSignal` was a
single `HookHaltSignal | undefined` field. Hosts that share one
registry across concurrent runs (a normal pattern — a global policy
hook lives on a shared registry) would see `preventContinuation`
from run A trip run B's stream-loop poll on the next iteration,
silently terminating an unrelated run. Real correctness bug for
multi-tenant / parallel-run hosts.
Fix:
- Replace `_haltSignal` with `haltSignals: Map<sessionId, HookHaltSignal>`.
- `haltRun(sessionId, reason, source)` — first-write-wins per session.
- `getHaltSignal(sessionId)` / `clearHaltSignal(sessionId)` — scoped reads
and clears.
- `executeHooks` passes its `sessionId` (= the run id every in-tree
callsite already uses) into `haltRun`. Hooks fired without a
sessionId can't raise a halt — there's no run for the loop to poll
under, which is the correct semantic.
- `Run.processStream` polls / clears with `this.id` everywhere
(resetValues, both pre-stream early-return paths, the mid-stream
loop poll, and the `finally` cleanup).
New regression test runs two Run instances on a shared registry: a
RunStart hook halts run A; run B's signal stays undefined throughout,
B's processStream completes naturally, and A's session entry is
cleaned up before B even starts.
Optional nit also addressed: add `tool_call_id` to
`ToolApprovalReviewConfig`. The previous shape only carried
`action_name`, so a UI mapping `review_configs[i]` →
`action_requests[j]` had to assume positional pairing — fragile when
a batch contains the same tool called twice (different `tool_call_id`s,
same `name`). Carrying `tool_call_id` lets the UI key by id directly,
matching the same field already on the action_requests side. New test
pins both sides for a duplicate-tool batch.
Validation: 45/45 HITL tests pass (2 new), 633 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
ToolNode.ts:751 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛡️ fix: address Round 7 comprehensive review (8 findings, 5 new tests)
Three MAJOR fixes (real correctness gaps), three MINOR (refactor +
coverage + observability), two NIT (docs + style). One MINOR
(MemorySaver warning) deferred — see below.
F1 (MAJOR) — `respond` decision bypassed truncation. Every other
tool-output path runs through `truncateToolResultContent` to enforce
`maxToolResultChars`, but the `respond` branch wrote
`decision.responseText` raw into the ToolMessage. A user pasting a
large document as a manual response could blow past the model
context window. Fix: truncate up front and use the truncated value
in the ToolMessage, the PostToolBatch entry, and the
`dispatchStepCompleted` event so all three see the same final
output. Test pins: 200-char response with `maxToolResultChars: 50`
→ ToolMessage and batch entry both carry the truncated value.
F3 (MAJOR) — Session hooks leaked when interrupt + halt coexisted.
A hook returning BOTH `decision: 'ask'` AND `preventContinuation:
true` set both signals (interrupt captured, halt raised). The
`finally` block's session-clear guard checked only
`_interrupt == null || streamThrew`, so it preserved sessions even
though the run was halted (no resume expected). Fix: widen the
guard to `_interrupt == null || _haltedReason != null ||
streamThrew`. Test pins: hook returns ask + preventContinuation,
both signals land on the Run, and session matchers are cleared.
F2 (MAJOR) — `async: true` hook docs over-promised. The current
implementation awaits the hook callback fully (subject to timeout)
and only ignores its OUTPUT in the fold. The previous JSDoc said
"the agent already moved on" implying detachment, which is false.
Fix: rewrote the JSDoc to be honest about the semantic — the
return value is ignored for influence, but the callback promise
is still awaited. The fire-and-forget pattern requires the hook
body to detach via `void promise.catch()` and return immediately
(the SDK can't speculatively detach based on output shape it
hasn't seen yet). Added a WRONG-pattern example showing the
common mistake (`await` inside the body undoes the intent).
F5 (MINOR) — Extracted two helpers from `dispatchToolEvents`:
- `buildToolApprovalInterruptPayload(askEntries)` — pure module-
scope function, takes the askEntries list, returns the
`tool_approval` payload. Exposed `AskEntry` as a module-scope
type so it can move out of the function body.
- `dispatchPostToolBatchAndInjectContext({...})` — private async
method on ToolNode, materializes batch entries in `toolCalls`
order, fires `PostToolBatch`, and pushes the consolidated
context HumanMessage. Cuts ~70 lines from the main function.
Both extractions preserve behavior; existing tests cover them.
F6 (MINOR) — Added direct tests for `Run.resume()` and
`Run.getHaltReason()`. Previously the resume codepath was only
exercised through `graph.invoke(new Command(...))` and getHaltReason
had only the pre-stream `preventContinuation` test. New tests:
- `Run.resume([{type:'approve'}], config)` end-to-end through
the wrapper API after an interrupt.
- `Run.getHaltReason()` returns `'PII detected'` when
`UserPromptSubmit` denies the prompt with a reason.
- `Run.getHaltReason()` returns `'prompt_requires_approval'`
when `UserPromptSubmit` returns `decision: 'ask'` without
a custom reason.
F7 (MINOR) — `UserPromptSubmit` deny/ask now sets `_haltedReason`.
Previously the only signal that landed in `_haltedReason` was
`preventContinuation`; deny/ask returned `undefined` from
processStream with no way for hosts to distinguish "blocked" from
"completed naturally with no output." Fix: set canonical reasons
(`prompt_denied`, `prompt_requires_approval`) when the hook
doesn't supply one, surface the hook's `reason` when it does.
F8 (NIT) — Added JSDoc on `HumanInterruptType` documenting the
downstream consumer (LibreChat's wire types in
`librechat-data-provider`) so the export isn't read as dead code.
F9 (NIT) — Removed `// src/path/to/file.ts` header comments from
the four new files (askUserQuestion, hitl/index, createToolPolicyHook,
types/hitl). Per AGENTS.md "Avoid standalone `//` comments unless
absolutely necessary." The descriptive block comments below them
stay.
Deferred: F4 (MemorySaver warning). The `humanInTheLoop` JSDoc
already documents "production hosts should always provide a durable
checkpointer." A console.warn per Run.create would push hosts to
opt out, defeating the deliberate default-on contract. If observability
becomes a real issue, a debug-gated log via `emitAgentLog` is the
right shape — but that needs a logger plumb-through and is out of
scope for this PR.
Validation: 50/50 HITL tests pass (5 new), 638 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
ToolNode.ts:808 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛡️ fix: guard malformed respond decisions + cover prompt_denied canonical default
Codex P1: malformed respond payloads (no responseText, or non-string
responseText) crashed `truncateToolResultContent` on `content.length`
— turning a fail-closed approval path into a hard run failure for one
bad wire payload. Hosts deserialize resume payloads from untyped JSON,
so the SDK can't trust the shape.
Fix: validate `responseText` is a string BEFORE passing to truncation.
If missing or non-string, route through `blockEntry` with a diagnostic
that includes the actual offered shape (`<missing>` or the typeof) so
the host can debug. Two regression tests pin both cases:
- `{ type: 'respond' }` (no field) → blocked, error mentions
`'<missing>'`, host event never dispatched
- `{ type: 'respond', responseText: 42 }` → blocked, error mentions
`'number'`
Also addresses Round 7 follow-up review N2 (NIT, marked "not worth
filing" by the reviewer): added an explicit test for the
`prompt_denied` canonical default — the existing test only covered
the custom-reason path. Now both the custom and canonical paths are
explicitly pinned for both `decision: 'deny'` and `decision: 'ask'`.
Skipped follow-up review N1 (pre-existing `// src/run.ts` header) —
F9 was scoped to new files only; the reviewer agreed the scope is
intentional.
Validation: 53/53 HITL tests pass (3 new), 641 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛡️ fix: validate edit payloads + correct HITL default doc + scope direct-tool gap
Three findings from Round 8 review (Codex's 5; 1 stale, 2 architectural
deferred per the reviewer's own option, 2 fixed here + 1 doc):
P2 (Codex new) — Malformed `edit` decisions weren't validated. Same
trust boundary as the `respond` fix in 6a93e55: hosts deserialize
resume payloads from untyped JSON, and `{ type: 'edit' }` (no
`updatedInput`), `{ type: 'edit', updatedInput: 'string' }`, or
`{ type: 'edit', updatedInput: [...] }` would feed garbage into
`applyInputOverride` and silently approve a tool with the wrong-shape
args. Fix: typeof + null + Array.isArray check before approving;
fail-closed via `blockEntry` with a diagnostic that includes the
offered shape (`<missing>` / `null` / `array` / typeof). Three
regression tests pin all three malformed cases.
Refactored both `respond` and `edit` diagnostic-formatting code into a
single `describeOfferedShape(value)` module-scope helper to drop the
nested-ternary lint warnings the new check introduced and keep the
diagnostic strings consistent across both decision branches.
P3 (Codex new) — `ToolNodeOptions.humanInTheLoop` JSDoc still
described the pre-default-on contract ("undefined keeps fail-closed").
Updated to mirror `RunConfig.humanInTheLoop`'s default-on language and
explicitly note the event-driven-only scope (the same caveat already
documented on the parallel `hookRegistry` field).
Architectural P2s deferred — Codex's reviewer explicitly offered this
path: "If this PR is only meant to unblock LibreChat's event-driven
tool approval path, they can be documented as out of scope."
- Direct tools (those routed through `directToolNames` like
handoffs/subagents) bypass HITL entirely.
- Mixed direct+event batches re-execute direct tools on resume
because LangGraph rolls back the entire ToolNode on `interrupt()`.
Both are real but require restructuring `ToolNode.run` to defer
direct execution until after HITL approval — non-trivial and better
done as a focused follow-up. Documented explicitly in
`HumanInTheLoopConfig`'s JSDoc with practical implications for hosts:
LibreChat-style event-driven hosts get the full surface; direct-tool
hosts need to switch to event-driven mode or accept that direct
tools aren't approval-gated; mixed batches with side-effecting
direct tools should not be combined with approval-gated event tools.
Skipped: P1 (malformed respond) — already fixed in 6a93e55. Codex
just hadn't re-evaluated yet.
Validation: 56/56 HITL tests pass (3 new), 644 across hooks/tools/
summarization, typecheck clean, lint clean (only pre-existing
ToolNode.ts:826 warning untouched), build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🛡️ fix: preserve Graph sidecars across HITL interrupt + resume
Codex P1: after a clean HITL interrupt, `processStream` was wiping
the sidecars `Run.resume()` needs:
- `Graph.resetValues()` on the next invocation cleared
`toolCallStepIds`, `stepKeyIds`, `messageStepHasToolCalls`,
accumulated `messages`, etc.
- `Graph.clearHeavyState()` in the prior `finally` cleared
`toolCallStepIds`, `_toolOutputRegistry`, `sessions`,
`hookRegistry`, `humanInTheLoop`.
Result: when LangGraph re-entered ToolNode on resume, the lookup
in `dispatchStepCompleted`'s `toolCallStepIds.get(toolCallId) ?? ''`
returned `''`. The host then dispatched `ON_RUN_STEP_COMPLETED`
with an empty step id, and `stream.ts` dropped the completed tool
result. Specifically broken for LibreChat's stream-resume story
where the resumed approval path needs to complete the
already-rendered tool step.
The existing HITL tests didn't catch this because they replace
`run.graphRunnable` with a custom graph whose ToolNode owns its
own prefilled `toolCallStepIds` Map — disjoint from the SDK
Graph's, so the SDK Graph's clear didn't observably affect the
test ToolNode's lookups.
Two gates added:
1. `processStream` finally: skip `Graph.clearHeavyState()` when
`_interrupt != null && _haltedReason == null && !streamThrew`
— i.e., a clean interrupt awaiting resume. Halt-driven and
error-driven paths still clean up because no resume is expected.
2. `processStream` entry: skip `Graph.resetValues()` when entering
via `Command` (resume). We're continuing an in-flight run, not
starting fresh — the sidecars must survive the boundary.
Cross-process resume (host rebuilds Run from scratch) is a
separate concern handled host-side; documented in
`HumanInTheLoopConfig` JSDoc earlier. This fix covers the
same-Run-instance pause-and-resume path that's the SDK's primary
contract.
Two regression tests, paired:
- "preserves Graph sidecars across HITL interrupt + resume" wires
the test ToolNode to share `run.Graph.toolCallStepIds` by
reference (mirroring how `StandardGraph` builds its inner
ToolNode at Graph.ts:587). Pre-populates the map inside the
agent node (mirroring `attemptInvoke`'s timing). Asserts the
entry survives both the interrupt finally AND the resume entry,
AND that the resumed dispatch fires `ON_RUN_STEP_COMPLETED`
with the real step id (not `''`).
- "clears Graph sidecars on natural completion" pins the negative
case: a non-interrupt run still triggers `clearHeavyState`, so
this gate doesn't accidentally leak memory across runs.
Verified the regression tests catch the bug pre-fix by stashing
`run.ts` and re-running: the preservation test fails on
`expect(toolCallStepIds.has('call_1')).toBe(true)` exactly as
the reviewer described.
Validation: 58/58 HITL tests pass (2 new), 646 across hooks/tools/
summarization, typecheck/lint/build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* 🚦 chore: flip HITL default to OFF until host UI ships
Hosts (notably LibreChat) don't yet render `tool_approval` interrupts,
so a default-on HITL surface would let `ask` decisions pause the run
with no resolver — surfaces to end users as a hung tool-call card.
Flipping the default to OFF keeps existing hosts on the pre-HITL
fail-closed path until they're ready to wire the resume UI.
Hosts opt in explicitly with `humanInTheLoop: { enabled: true }`. The
plan of record (documented on `HumanInTheLoopConfig`) is to flip the
default back to ON in a future minor once the consumer ecosystem is
ready end-to-end.
- `Run.applyHITLCheckpointerFallback`: only attach the in-memory
`MemorySaver` when `enabled === true`. Omitted/false leaves
`compileOptions.checkpointer` untouched.
- `ToolNode` ask-decision branch: collapse `ask` into a synchronous
block (the pre-HITL behavior) unless `enabled === true`.
- JSDoc on `HumanInTheLoopConfig`, `RunConfig.humanInTheLoop`, and
`ToolNodeOptions.humanInTheLoop` rewritten to lead with default-off
semantics + the migration plan.
- Tests: rename "default-on" assertion to "default-off blocks", and
flip the host-checkpointer-preservation test to thread `enabled: true`
explicitly (since default-off no longer engages the fallback path).
Bump to 3.1.75-dev.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: Preserve DeepSeek same-run reasoning content * fix: Emit sanitized DeepSeek stream callbacks * fix: Harden DeepSeek thinking stream parsing * test: Cover DeepSeek reasoning edge cases * test: Cover DeepSeek streaming edge combinations
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )