[pull] main from danny-avila:main#77
Merged
pull[bot] merged 5 commits into innFactory:main from danny-avila:main on May 6, 2026
…ges (#141)

* 🧵 feat: Recency-window splitter for compaction boundaries

Introduces splitAtRecencyBoundary, a pure helper that partitions a
message list into a head (older messages, eligible for summarization)
and a tail (recent messages, preserved verbatim). Cuts strictly at
HumanMessage boundaries so tool_use ↔ tool_result pairs are never
split. The most recent user-led turn is always included in the tail,
so a single oversized first message is never destroyed by
summarization. Optional token cap (`tokens`) allows walk-back;
tokenless usage falls back to turn-count only (`turns`).

13 unit tests cover default behavior, the disabled (turns: 0) path,
the token cap, and degenerate inputs (no HumanMessage, trailing tool
results, single message). The splitter is wired into the summarize
node in a follow-up commit.

* 🛡️ feat: Preserve recent turns during summarization

The summarize node previously summarized every message and replaced
them all with the LLM-generated checkpoint. This destroyed the user's
most recent payload whenever pruning fired on a single big turn — see
librechat#12940 ("Skip Summarization of First Turn"), where a
30K-token paste on turn 1 was silently replaced by a generic summary.

Adds SummarizationConfig.retainRecent ({ turns?, tokens? }) with a
default of 2 turns. Before invoking the LLM, the node splits the
state at a recency boundary; only the head is summarized and the tail
is re-injected via the messages reducer:

    return { messages: [createRemoveAllMessage(), ...messagesToRetain] }

When the head is empty (e.g. fresh conversation with one big user
message), the LLM call is skipped entirely, the trigger is marked
consumed via markSummarizationTriggered, and no ON_SUMMARIZE_* events
are emitted — the user's payload survives intact. Setting
`retainRecent: { turns: 0 }` reverts to legacy behavior for callers
who want the old "summarize everything" semantics.

Test updates:
- node.test.ts default agent context now uses { turns: 0 } so the 17
  existing LLM-call tests continue exercising that path.
- Four new node-level tests cover first-turn skip, single-turn skip
  including tool messages, head/tail summarization, and the legacy
  shape.

* 🧪 chore: Live multi-provider script for recency-window verification

Adds src/scripts/summarization-recency.ts that exercises both
recency-window scenarios end-to-end against real provider APIs:
1. First-turn protection — a single oversized user message must not
   trigger ON_SUMMARIZE_START/COMPLETE. An empty_messages pruning
   error is treated as expected when the message exceeds the
   configured budget; we only fail if summarization fired.
2. Multi-turn compaction — across four padded turns, summarization
   must fire (proving the LLM path still works) while the recency
   tail is preserved verbatim.

Verified across anthropic (claude-sonnet-4-5), openAI (gpt-5.4-mini),
google (gemini-2.5-flash), bedrock (us.anthropic.claude-sonnet-4-5),
openrouter (moonshotai/kimi-k2.6), and deepseek (deepseek-v4-flash).

Run with the dotenv preload + override flag so AWS SDK module-init
captures populated env vars:

    DOTENV_CONFIG_OVERRIDE=true node -r dotenv/config \
      --loader ./tsconfig-paths-bootstrap.mjs \
      --experimental-specifier-resolution=node \
      ./src/scripts/summarization-recency.ts --provider all

The script auto-skips providers whose credentials are absent, and
defaults BEDROCK_AWS_REGION to AWS_DEFAULT_REGION (or us-east-1) when
the dedicated knob is missing.

* 🪡 fix: PostCompact reports retained-tail length, not always zero

The PostCompact hook payload's `messagesAfterCount` was hard-coded to
`0`, which was correct under the legacy "remove-all only" return
shape but stale once the recency window started preserving a tail
([removeAll, ...messagesToRetain]). Threads the retained tail length
through dispatchCompletionEvents so observers (metrics, cleanup
logic) see the true post-compaction message count.

When retainRecent.turns is 0, the value is still 0; when
retainRecent.turns > 0 and a tail survives, it reflects the
preserved-message count.

compactHooks.test.ts:
- Existing test now opts into { turns: 0 } via a helper-default arg
  to keep documenting the legacy shape.
- New test asserts messagesAfterCount === 4 under the default
  retainRecent.turns: 2 with a 20-turn synthetic conversation (last
  2 turns × 2 messages each = 4 retained).

Reported by chatgpt-codex-connector on PR #141.

* 🪡 fix: Keep masked tool payloads in retained recency tail

The retained tail was sliced from `restoredMessages`, which has
oversized ToolMessage content rehydrated by
`restoreOriginalToolContent` for the summarizer's benefit.
Reinjecting that restored content into state defeats the masking that
prune applied earlier in the turn, bloats checkpoint size, and forces
more expensive re-pruning on later turns.

The summarizer still reads the restored head (full content for
summary quality), but the tail now slices `state.messages` (masked
form) at the same `tailStartIndex` returned by the splitter. Both
arrays have identical length and structure since
`restoreOriginalToolContent` only swaps content at specific indices,
so the same boundary applies to both.

Adds a node-level test that exercises the divergence: the summarizer
sees the restored head content (`FULL_ORIGINAL_OUTPUT...`) while the
returned tail keeps the masked stub (`masked-tail-stub`).

Reported by chatgpt-codex-connector on PR #141.

* 🪡 fix: Carry forward original tool content for retained-tail messages

The previous fix kept the masked tail content in live state but
eagerly cleared `agentContext.pendingOriginalToolContent`. When a
later compaction moves those tail messages into its head, the new
pruner (recreated by `setSummary` clearing `pruneMessages`) has no
record of the original full content — so the summarizer sees only the
masked stub, dropping fidelity for tool-derived facts.

Two changes:
1. summarize/node.ts: capture `pendingOriginalToolContent` locally
   before restoration; on the summarize-fired path, after computing
   `tailStartIndex`, build a reindexed map containing only the
   tail-relevant entries (idx >= tailStartIndex shifted to start at
   0) and store it back on AgentContext. Entries < tailStartIndex
   belonged to the just-summarized head and are correctly dropped.
   On the skip path, the captured value is left untouched — state is
   unchanged so its indices remain valid.
2. graphs/Graph.ts: merge the pruner's masking record into
   `pendingOriginalToolContent` instead of overwriting it. Both maps
   key off current-state indices, so a key-wise union is the right
   semantic. Without this, a fresh pruner that masks new content on a
   later turn would clobber the carry-over before the next summarize
   could read it.

Tests:
- New: `preserves tail-relevant pendingOriginalToolContent entries
  (reindexed) for future summaries` — runs summarize with a map that
  spans both head and tail indices, asserts only tail entries survive
  and are reindexed correctly.
- New: `does not clear pendingOriginalToolContent on the skip path
  (state unchanged)` — confirms the same Map reference survives when
  summarization is skipped, so the next prune cycle still sees the
  originals.

Reported by chatgpt-codex-connector on PR #141.

* 🪡 fix: Cap merged tool-content map and align dedupe baseline with tail

Two issues introduced by the recency-window carry-over plumbing:
1. **Pending-content size cap bypassed by merge**
   `createPruneMessages` enforces ORIGINAL_CONTENT_MAX_CHARS (~2 MB)
   inside its own `originalToolContent` map via the `onContentStored`
   callback, but the prior commit's key-wise union in Graph.ts
   bypassed that accounting. Long sessions with large tool outputs
   could grow `pendingOriginalToolContent` without bound across
   compaction cycles. Fix: extract `enforceOriginalContentCap` from
   prune.ts and re-apply it in Graph.ts after the merge.
2. **Dedupe baseline reset to zero after partial compaction**
   `dispatchCompletionEvents` calls
   `rebuildTokenMapAfterSummarization({})` which sets
   `_lastSummarizationMsgCount = 0` — correct for the legacy
   "remove-all only" shape, but stale once the recency window keeps
   a tail. With baseline at 0, a follow-up prune cycle on the
   unchanged tail re-evaluates the trigger from scratch and can loop
   into another summarize/skip cycle. Fix: explicitly call
   `markSummarizationTriggered(messagesToRetain.length)` after
   dispatch so the baseline matches the surviving tail.

Tests:
- New prune.test.ts cases: `enforceOriginalContentCap` is a no-op
  below the cap, evicts oldest entries (Map insertion order) above
  the cap, and exposes the cap constant.
- New node.test.ts case: after compaction with retainRecent.turns: 1
  on a 2-turn conversation, `shouldSkipSummarization(2)` returns true
  (matches tail) and `shouldSkipSummarization(3)` returns false
  (state grew by 1).

Reported by chatgpt-codex-connector on PR #141.

* 🪡 fix: Make recency-split linearity self-evident + pin via test

The recency-split inner loop's upper bound was the mutated
`tailStartIndex`, which is decreased to the current `turnStart` at
the end of each iteration. The slices are disjoint and the algorithm
is O(messages_in_visited_turns), bounded by O(messages.length) — but
the nested-loop-over-`messages` shape was triggering repeated
quadratic flags in static analysis (Codex P3 → P1).

Refactor the inner upper bound to `turnStarts[t + 1]`, derived from
the immutable `turnStarts` array. The disjoint-range invariant is now
obvious without having to trace `tailStartIndex` mutations across
iterations. Add a comment explaining the strategy.

Also add two tests that pin the runtime call count:
- `calls tokenCounter once per message in visited turns` —
  2,000-message history (200 turns × 10 messages), generous caps so
  every turn is visited. Asserts `tokenCounter` is called exactly
  `messages.length` times. Quadratic behavior would push it to
  ~400,000 calls.
- `stops counting once the tokens cap is exceeded` — 50-turn history
  with a 10-token cap. Asserts call count ≤ 12 (5 included turns + 1
  rejected turn × 2 messages each), confirming the loop
  short-circuits rather than scanning all 100 messages.

Reported by chatgpt-codex-connector on PR #141.
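The splitter's shape can be sketched roughly as follows. This is an illustrative reconstruction, not the actual implementation: the real helper operates on LangChain messages (a minimal `Msg` type stands in here), the signature is assumed, and the handling of a history with no HumanMessage is a guess. Note the inner loop's upper bound is `turnStarts[t + 1]`, the immutable-array form the linearity commit describes:

```typescript
// Hypothetical sketch of splitAtRecencyBoundary (names and types
// simplified; not the engine's actual code).
type Msg = { role: "human" | "ai" | "tool"; text: string };

interface Split { head: Msg[]; tail: Msg[]; tailStartIndex: number }

function splitAtRecencyBoundary(
  messages: Msg[],
  retain: { turns: number; tokens?: number },
  countTokens: (m: Msg) => number = (m) => m.text.length,
): Split {
  if (retain.turns <= 0) {
    // Legacy behavior: everything is eligible for summarization.
    return { head: messages.slice(), tail: [], tailStartIndex: messages.length };
  }
  // Turn starts = indices of HumanMessages. Cutting only here keeps
  // tool_use/tool_result pairs (which never begin a turn) together.
  const turnStarts: number[] = [];
  for (let i = 0; i < messages.length; i++) {
    if (messages[i].role === "human") turnStarts.push(i);
  }
  if (turnStarts.length === 0) {
    return { head: [], tail: messages.slice(), tailStartIndex: 0 };
  }
  let tailStartIndex = messages.length;
  let tokens = 0;
  // Walk back over whole turns, newest first. Each turn's slice is
  // [turnStarts[t], turnStarts[t + 1]), so the ranges are disjoint
  // and every message is counted at most once (linear, not quadratic).
  for (let t = turnStarts.length - 1; t >= 0; t--) {
    if (turnStarts.length - 1 - t >= retain.turns) break;
    let turnTokens = 0;
    for (let i = turnStarts[t]; i < (turnStarts[t + 1] ?? messages.length); i++) {
      turnTokens += countTokens(messages[i]);
    }
    const isNewestTurn = t === turnStarts.length - 1;
    // The most recent user-led turn is always kept, so a single
    // oversized first message is never destroyed by summarization.
    if (!isNewestTurn && retain.tokens != null && tokens + turnTokens > retain.tokens) break;
    tokens += turnTokens;
    tailStartIndex = turnStarts[t];
  }
  return {
    head: messages.slice(0, tailStartIndex),
    tail: messages.slice(tailStartIndex),
    tailStartIndex,
  };
}
```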
* feat: add local execution engine
* feat(local-engine): hardening + ergonomics follow-ups
Adds the follow-up items flagged in the design comparison so the local
engine ships with a defensible default posture.
Security & correctness
- Bridge bearer token: per-Run 256-bit token; programmatic Python and
bash callers send `x-librechat-bridge-token`; server uses
`timingSafeEqual`. Stops cross-process listeners on the same loopback
port from invoking SDK tools.
- Symlink-aware workspace path: new `resolveWorkspacePathSafe()` walks
up to the nearest existing ancestor, realpaths it, and rejects
containment escapes — covers `write_file` to a brand-new path
without false positives on `/tmp -> /private/tmp` style ancestors.
- Read tool guards: stat-cap (default 10 MiB, configurable via
  `local.maxReadBytes`), NUL-byte binary detection on the first 8 KiB
  before reading the rest; refused reads return descriptive stubs.
- Local engine warns once per process when it runs without
`@anthropic-ai/sandbox-runtime` wrapping so the no-sandbox default
stays loud in CI/server logs.
Validation
- New `bashAst` config (`'off' | 'auto' | 'strict'`): a heuristic
  categorical-hazard pass over the quote-stripped command
(command substitution, zsh privileged builtins,
`/proc/<pid>/environ` reads, `IFS=` injection, hex-escape
obfuscation, `source $VAR`, plus `eval`/`exec` denied in strict).
Forward-compatible name for a future tree-sitter-bash AST drop-in.
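The quote-strip-then-scan pass can be sketched as below. The pattern list is a reduced, illustrative subset of the hazards named above (zsh privileged builtins are omitted); the real rule set and its exact regexes are assumptions here.

```typescript
// Illustrative sketch of the bashAst categorical-hazard pass.
type BashAstMode = "off" | "auto" | "strict";

// Remove single- and double-quoted spans so literal strings can
// neither mask nor spuriously trigger hazard patterns.
function stripQuoted(cmd: string): string {
  return cmd.replace(/'[^']*'/g, "''").replace(/"[^"]*"/g, '""');
}

function findBashHazards(cmd: string, mode: BashAstMode): string[] {
  if (mode === "off") return [];
  const bare = stripQuoted(cmd);
  const rules: Array<[string, RegExp]> = [
    ["command substitution", /\$\(|`/],
    ["/proc environ read", /\/proc\/[^ ]*\/environ/],
    ["IFS injection", /\bIFS=/],
    ["hex-escape obfuscation", /\\x[0-9a-fA-F]{2}/],
    ["variable source", /\bsource\s+\$/],
  ];
  if (mode === "strict") {
    // eval/exec are denied only in strict mode, per the commit.
    rules.push(["eval/exec", /\b(eval|exec)\b/]);
  }
  return rules.filter(([, re]) => re.test(bare)).map(([name]) => name);
}
```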
Ergonomics
- File checkpointing: `LocalFileCheckpointerImpl` snapshots pre-write
contents for `write_file`/`edit_file` and rewinds them on demand.
`createLocalCodingToolBundle` returns the checkpointer alongside the
tools when `local.fileCheckpointing: true`.
- Pluggable spawn seam: `LocalExecutionConfig.spawn?: LocalSpawn` lets
callers route every command through SSH/containers/remote runners
without forking the engine.
- Ripgrep fallback: `grep_search` and `glob_search` now probe for `rg`
once per process and fall back to a Node-side walker + JS regex when
ripgrep isn't on PATH; the old hard requirement is gone.
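The probe-once-per-process pattern can be sketched as follows; function names are illustrative stand-ins, not the engine's actual exports, and the fallback is reduced to a plain JS regex filter.

```typescript
// Sketch: probe for ripgrep once, cache the result for the process
// lifetime, fall back to a JS regex filter when rg is absent.
import { spawnSync } from "node:child_process";

let rgAvailable: boolean | undefined;

function hasRipgrep(): boolean {
  if (rgAvailable === undefined) {
    const probe = spawnSync("rg", ["--version"], { stdio: "ignore" });
    rgAvailable = !probe.error && probe.status === 0;
  }
  return rgAvailable; // cached: later calls never spawn again
}

// The fallback search path: the real engine shells out to rg when
// hasRipgrep() is true and walks files in Node otherwise.
function grepSearch(pattern: string, lines: string[]): string[] {
  const re = new RegExp(pattern);
  return lines.filter((l) => re.test(l));
}
```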
Tests
- 20 new unit tests covering AST findings, sandbox-off warning latch,
checkpointer round-trip and file deletion, binary-file refusal,
oversize stub, symlink escape, ripgrep fallback, and bridge auth
rejection. Existing 109 tests still pass.
* feat(toolnode): fire PreToolUse/PostToolUse/PostToolUseFailure on direct path
Before this change, in-process tools added to `directToolNames` (every
graphTool with a real implementation, including the new local-engine
tools) skipped the hook lifecycle entirely. Only schema-only tools
that round-tripped through the host event-dispatch path actually fired
PreToolUse/PostToolUse/PostToolUseFailure/PermissionDenied. That meant
`createToolPolicyHook`, the new HITL `'ask'` flow, and PostToolUse
output rewriting all silently no-op'd on direct tools — including
every `bash`/`code`/`edit_file`/`write_file` from the local engine.
This adds `runDirectToolWithLifecycleHooks(call, config, batchContext)`
which wraps `runTool` with the same hook semantics
`dispatchToolEvents` uses for the single-call case, and routes the
three direct-call sites in `run()` through it. Behavior:
- Fast path: when the registry has none of PreToolUse / PostToolUse /
PostToolUseFailure registered for this run, falls through to
`runTool` with zero overhead. No regression for callers without
hooks.
- PreToolUse: pre-resolves args (matching the event path), fires the
hook with resolved args.
- `decision: 'deny'` → synthesizes a Blocked error ToolMessage and
fires PermissionDenied (observational).
- `decision: 'ask'` → fail-closed deny regardless of
`humanInTheLoop.enabled`. The event path handles 'ask' via
LangGraph's interrupt(), which requires restructuring the direct
invoke path; until that lands, returning 'ask' on a direct tool
blocks the call (strict improvement vs. ignoring the hook
entirely). A one-time warning fires when HITL is enabled, so
hosts notice the gap.
- `updatedInput` → applied to the call before runTool; runTool's
placeholder resolution is idempotent on already-resolved args.
- PostToolUse: when runTool returns a non-error ToolMessage, fires
PostToolUse and applies `updatedOutput` (preserving id/name/status).
- PostToolUseFailure: when runTool returns a status='error'
ToolMessage, fires the failure hook (observational); the original
error continues to propagate.
PostToolBatch participation across direct + dispatched outcomes is
intentionally out of scope: the batch hook accumulates inside
dispatchToolEvents and is fired at the end of its scope, and merging
direct-call outcomes into that aggregation crosses the two paths'
result-shaping logic. Left as a follow-up.
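The control flow above can be condensed into a sketch. Hook and message shapes are simplified stand-ins for the real types, and the arg pre-resolution and PermissionDenied firing are elided:

```typescript
// Condensed, illustrative sketch of runDirectToolWithLifecycleHooks.
type ToolMessage = { name: string; status: "success" | "error"; content: string };
type PreDecision = { decision?: "allow" | "deny" | "ask"; updatedInput?: unknown };

interface Hooks {
  preToolUse?: (name: string, args: unknown) => PreDecision;
  postToolUse?: (msg: ToolMessage) => { updatedOutput?: string } | void;
  postToolUseFailure?: (msg: ToolMessage) => void;
}

function runDirectToolWithLifecycleHooks(
  name: string,
  args: unknown,
  hooks: Hooks,
  runTool: (name: string, args: unknown) => ToolMessage,
): ToolMessage {
  // Fast path: no hooks registered -> zero-overhead passthrough.
  if (!hooks.preToolUse && !hooks.postToolUse && !hooks.postToolUseFailure) {
    return runTool(name, args);
  }
  const pre = hooks.preToolUse?.(name, args) ?? {};
  // 'ask' fails closed on the direct path until interrupt() support
  // lands; 'deny' synthesizes a blocked error ToolMessage.
  if (pre.decision === "deny" || pre.decision === "ask") {
    return { name, status: "error", content: "Blocked by PreToolUse hook" };
  }
  const result = runTool(name, pre.updatedInput ?? args);
  if (result.status === "error") {
    hooks.postToolUseFailure?.(result); // observational; error propagates
    return result;
  }
  const post = hooks.postToolUse?.(result);
  if (post && post.updatedOutput !== undefined) {
    return { ...result, content: post.updatedOutput }; // name/status preserved
  }
  return result;
}
```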
8 new unit tests in directToolHooks.test.ts cover deny+PermissionDenied,
ask fail-closed, allow, updatedInput, updatedOutput, PostToolUseFailure,
no-hooks fast path, and a mixed batch where one direct call denies and
the sibling runs.
Total non-API test count: 1163 -> 1171 passing.
* chore(scripts): add live-API smoke scripts for the local engine
Four scripts that exercise the local execution engine end-to-end with
a real LLM provider (matching the convention in src/scripts/*). Each
spins up a temporary workspace, runs the graph, then dumps observable
state so the behavior under test is visible in the console.
- local_engine.ts — baseline e2e: bash + write_file + read_file
+ edit_file + list_directory in a single
multi-step task. Sets `bashAst: 'auto'` so
the categorical-hazard pass is exercised.
- local_engine_ptc.ts — `run_tools_with_code` (Python) calling
write_file / read_file / edit_file through
the localhost bridge in a single call. The
bridge token auth (timingSafeEqual) is on
by default for every Run, so this also
exercises that path.
- local_engine_hooks.ts — direct-path hook integration: registers a
`createToolPolicyHook` that denies
write_file / edit_file plus an explicit
PostToolUse that prefixes every successful
tool result with "[reviewed]". Asks the
model to write then read; the write must
be blocked, the read must come back tagged.
Also asserts blocked.txt does not land on
disk.
- local_engine_checkpointer.ts — builds the local coding tools via
`createLocalCodingToolBundle({ fileCheckpointing: true })`
so the host keeps a handle on the
checkpointer, lets the model edit two
seeded files, then calls `rewind()` and
verifies originals come back. Exits non-zero
if rewind didn't restore both files.
npm scripts added for each: `local`, `local:ptc`, `local:hooks`,
`local:checkpointer`. All four typecheck clean (`tsc --noEmit`).
* fix(graph): pass hookRegistry on the non-event-driven path
Live integration test (`npm run local:hooks`) surfaced that hooks
weren't actually firing on local-engine tools when used with a plain
`graphConfig: { type: 'standard' }` (no `agentContext.toolDefinitions`).
Reason: `Graph.ts` only forwarded `hookRegistry` and `humanInTheLoop`
to ToolNode in the event-driven branch — the legacy non-event-driven
branch dropped both, so the policy hook silently no-op'd and a
`write_file` that the script's `createToolPolicyHook` was supposed to
deny landed on disk.
Two changes:
1. `StandardGraph.createToolNode` now passes `hookRegistry` and
`humanInTheLoop` in BOTH branches. Hooks are conceptually
orthogonal to whether tools dispatch as events or invoke in
process; only the event path was historically wired.
2. `ToolNode.run`'s non-event-driven `else` branch now invokes
`runDirectToolWithLifecycleHooks` instead of `runTool` directly,
so PreToolUse / PostToolUse / PostToolUseFailure / PermissionDenied
fire the same way they do on the event-driven mixed-batch path.
The helper still falls through to `runTool` on the no-hooks fast
path, so callers without hooks pay zero overhead.
Verified by re-running `npm run local:hooks`:
- PermissionDenied fired 1 time(s) for write_file with the policy
reason
- PostToolUse saw read_file (the allowed read got the [reviewed]
tag)
- blocked.txt landed on disk? false (expected: false)
Also re-ran the full non-API jest suite — 1171 passing, no regressions.
* feat(local-engine): coding-tool parity push (fuzzy edit, diff, BOM/EOL, overflow)
Live cross-comparison against claude-code, pi-mono, and opencode (sst)
surfaced four meaningful gaps in the foundation tools. This commit
closes them; LSP integration and apply_patch (multi-file) are larger
and stay as documented follow-ups.
Edit fuzzy matching
-------------------
Our edit_file required exact `oldText` match, so a single mistyped
space would force the LLM to re-read and retry. New
`editStrategies.ts` walks a four-stage chain — exact, line-trimmed,
whitespace-normalized, indentation-flexible — stopping at the first
strategy that locates EXACTLY ONE match. The matched on-disk slice is
then literally replaced with `newText`; we never modify newText. The
strategy that fired is surfaced in the tool result's `strategies`
metadata so callers can see when fuzzy fallback kicked in. Inspired
by opencode's nine-strategy chain; kept to the four highest-yield
strategies for a first cut, room for more (block-anchor + Levenshtein
etc.) as needed.
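The stop-at-first-unique-match chain can be sketched as below. Only two of the four stages are shown and the matching is deliberately simplified; the real editStrategies.ts locates on-disk slices and replaces them with `newText` untouched.

```typescript
// Illustrative two-stage version of the edit strategy chain.
type Strategy = { name: string; find: (haystack: string, needle: string) => number[] };

const findAll = (hay: string, needle: string): number[] => {
  const hits: number[] = [];
  for (let i = hay.indexOf(needle); i !== -1; i = hay.indexOf(needle, i + 1)) hits.push(i);
  return hits;
};

const strategies: Strategy[] = [
  { name: "exact", find: findAll },
  {
    name: "whitespace-normalized",
    find: (hay, needle) => {
      // Collapse whitespace runs on both sides before matching.
      const norm = (s: string) => s.replace(/\s+/g, " ").trim();
      return findAll(norm(hay), norm(needle));
    },
  },
];

// First strategy that locates EXACTLY ONE match wins, so an
// ambiguous oldText never silently edits the wrong site. The
// winning strategy name is surfaced to the caller.
function locateEdit(fileText: string, oldText: string): { strategy: string } | null {
  for (const s of strategies) {
    if (s.find(fileText, oldText).length === 1) return { strategy: s.name };
  }
  return null;
}
```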
Unified-diff output on edit/write
---------------------------------
`edit_file` and `write_file` now embed a unified diff in the
ToolMessage so the model sees what actually changed instead of the
prior `Applied 1 edit(s)` blob. Uses npm `diff`'s
`createTwoFilesPatch` with 3 lines of context, capped at 4 KB so
large rewrites don't blow context.
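The cap-before-embedding step can be sketched with a hand-rolled line diff standing in for npm `diff`'s `createTwoFilesPatch` (which the real code uses); only the 4 KB cap value comes from the commit, the helper names are ours.

```typescript
// Naive line diff as a stand-in for createTwoFilesPatch.
function naiveDiff(before: string, after: string): string {
  const a = before.split("\n");
  const b = after.split("\n");
  const out: string[] = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] === b[i]) continue;
    if (a[i] !== undefined) out.push("-" + a[i]);
    if (b[i] !== undefined) out.push("+" + b[i]);
  }
  return out.join("\n");
}

// Cap the diff so large rewrites don't blow the model's context.
function capDiff(diff: string, cap = 4096): string {
  return diff.length > cap ? diff.slice(0, cap) + "\n[diff truncated]" : diff;
}
```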
BOM + line-ending preservation
------------------------------
New `textEncoding.ts`:
- `decodeFile` strips and remembers UTF-8 BOM and CRLF/LF newline.
- `encodeFile` reapplies them on write.
`edit_file` / `write_file` now read the file, decode to LF + remember,
operate, and re-encode on write. Verified by tests: editing a CRLF
file keeps CRLF; overwriting a BOM file keeps the BOM.
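The strip-and-remember round-trip can be sketched as below. The commit's functions operate on files (`decodeFile`/`encodeFile`); these string-level stand-ins show the same invariant.

```typescript
// String-level sketch of the textEncoding.ts round-trip.
const BOM = "\uFEFF";

interface DecodedText { text: string; hadBom: boolean; eol: "\n" | "\r\n" }

function decodeText(raw: string): DecodedText {
  const hadBom = raw.startsWith(BOM);
  if (hadBom) raw = raw.slice(1);
  const eol: "\n" | "\r\n" = raw.includes("\r\n") ? "\r\n" : "\n";
  // Normalize to LF so edit strategies never reason about CRLF.
  return { text: raw.replace(/\r\n/g, "\n"), hadBom, eol };
}

function encodeText(d: DecodedText): string {
  // Reapply the remembered newline style and BOM on write.
  const body = d.eol === "\r\n" ? d.text.replace(/\n/g, "\r\n") : d.text;
  return (d.hadBom ? BOM : "") + body;
}
```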
Bash output overflow → temp file
--------------------------------
`spawnLocalProcess` previously truncated overflowing output in place.
Now: when total bytes exceed 2× `maxOutputChars`, the FULL
stdout/stderr is written to a temp file (`/tmp/lc-local-output-*.txt`)
and the path is appended to the formatted output as
`full_output_path: …`. The model can then `bash tail -n N <path>` (or
`grep`) to inspect specifics it actually needs. Mirrors pi-mono's and
opencode's overflow handling.
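The spill-to-temp-file behavior can be sketched as follows. The 2× threshold and the `full_output_path:` marker come from the commit; the temp-file naming and helper name are assumptions (the real path lives inside `spawnLocalProcess`).

```typescript
// Sketch of the output-overflow spill.
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function formatBashOutput(output: string, maxOutputChars: number): string {
  if (output.length <= 2 * maxOutputChars) return output;
  // Write the FULL output somewhere the model can tail/grep later.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "lc-local-output-"));
  const tmp = path.join(dir, "output.txt");
  fs.writeFileSync(tmp, output);
  // Model sees the head plus a pointer to the complete text.
  return output.slice(0, maxOutputChars) + `\n[truncated]\nfull_output_path: ${tmp}`;
}
```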
New deps
--------
- `diff@9` and `@types/diff@7` — the canonical TS diff lib opencode
uses too, MIT-licensed.
Tests
-----
Added unit tests for each new behavior — 5 new cases for fuzzy
matching, diff embedding, CRLF preservation, BOM preservation. Ran
the full non-API jest suite: 1176 passing (was 1171). Ran all four
live integration scripts (`local`, `local:hooks`, `local:checkpointer`,
`local:ptc`) end-to-end against the real Anthropic API; all four
green and the diff payload now shows up in the model-visible tool
results.
* feat(local-engine): inline image/PDF attachments in read_file
Closes the last "Close now" parity gap: read_file can now return image
and PDF files as inline `MessageContentComplex[]` content blocks so
vision-capable models actually see the bytes, instead of getting our
old "binary file" stub.
New module
----------
src/tools/local/attachments.ts
- Magic-byte sniff for the five formats we care about (PNG, JPEG,
GIF, WebP, PDF). Inlined the signatures rather than pulling in
`file-type` (ESM-only, awkward under ts-jest); LibreChat's
api/server/utils/files.js takes the same approach but uses the
package since it's a CJS-OK Node server.
- Returns a typed Attachment union that the read tool branches on:
image -> image_url block, pdf -> image_url block (for Anthropic's
PDF-as-data-url path), oversize -> stub, binary -> stub,
text-or-unknown -> falls through to the existing line-window read.
- Configurable via two new fields on `LocalExecutionConfig`:
- `attachReadAttachments`: 'off' (default) | 'images-only' |
'images-and-pdf'. Defaults `'off'` to preserve current behavior;
hosts opt in when their model is vision-capable.
- `maxAttachmentBytes`: pre-encoding size cap (default 5 MiB) that
bounds the post-base64 token cost.
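The inlined magic-byte signatures can be sketched as below; the return shape is a simplified stand-in for the engine's typed Attachment union.

```typescript
// Magic-byte sniff for the five supported formats.
function sniffAttachmentMime(buf: Buffer): string | null {
  const ascii = (start: number, s: string) =>
    buf.length >= start + s.length && buf.toString("latin1", start, start + s.length) === s;
  if (buf.length >= 8 && buf[0] === 0x89 && ascii(1, "PNG")) return "image/png";
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) return "image/jpeg";
  if (ascii(0, "GIF87a") || ascii(0, "GIF89a")) return "image/gif";
  if (ascii(0, "RIFF") && ascii(8, "WEBP")) return "image/webp";
  if (ascii(0, "%PDF-")) return "application/pdf";
  return null; // text-or-unknown -> fall through to line-window read
}
```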
Plumbing
--------
- read_file branches on attachment.kind. For an image, it returns
`[ [textBlock, imageUrlBlock], { artifact } ]`. LangChain's
Anthropic adapter (verified at
node_modules/@langchain/anthropic/dist/utils/message_inputs.js)
preserves image_url blocks inside `tool_result` content arrays;
OpenAI Chat Completions accepts them on vision models too. The
Responses API flattens to text on the wire (function_call_output
is string-only) which is the safe degrade.
- The textBlock that goes alongside the image_url carries the
filename/mime/byte count so non-vision models still get useful
signal.
Tests
-----
4 new unit cases:
- default behavior: still returns the binary stub
- images-only: returns array content with `image_url` data URL +
artifact { mime: 'image/png', attachment: 'image' }
- oversize: caps embedding past `maxAttachmentBytes` -> "Refusing to
embed" stub
- text files unaffected when embedding is enabled
29 LocalExecutionTools tests pass; 1180 non-API tests overall.
Live verification
-----------------
New `npm run local:image` script:
1. Copies a real PNG (the macOS Certificate Assistant droppedImage)
into a temp workspace.
2. Asks the agent to `read_file` it and describe what's in it.
3. Inspects the run's ToolMessage to confirm the content is
`[textBlock, imageBlock]` with a `data:image/png;base64,...` URL.
Live run against real Anthropic Sonnet:
ToolMessage(name=read_file, content=[text,image_url] url=data:image/png;base64,iVBORw0KGgo…)
Image attachment landed in tool result: true ✔
ASSISTANT: "The image shows a blank certificate template with a
light blue border. The word 'Certificate' appears at the top in an
elegant, cursive script font. There's a decorative flourish or
underline beneath the text, and a gold/yellow seal or badge
decoration in the lower right portion..."
That's the actual content of the source PNG — vision is wired
through end-to-end.
* feat(local-engine): post-edit syntax check + compile_check tool
The two cheap "tell-the-agent-when-it-broke-the-file" alternatives I
called out as the pragmatic substitute for full LSP integration. Both
are opt-in.
1. Post-edit syntax check
-------------------------
New `local.postEditSyntaxCheck: 'off' | 'auto' | 'strict'`
(default `'off'`).
After every successful `edit_file` / `write_file`, runs a fast
per-file syntax checker keyed on extension and appends any error to
the tool result so the model self-corrects on the next turn without
a separate read round-trip:
- `.js` / `.mjs` / `.cjs` / `.jsx` -> `node --check <file>`
- `.py` / `.pyw` -> `python3 -m py_compile`
- `.json` -> `JSON.parse` (in process)
- `.sh` / `.bash` -> `bash -n <file>`
`'auto'` appends `[syntax-check warning ...]` to the tool result and
returns success. `'strict'` throws the error so the next turn must
react. Each checker probes once for tool availability and silently
skips if missing.
TypeScript per-file syntax check is intentionally not in this list:
per-file `tsc` is slow and only catches a small fraction of TS
issues without type information. Use `compile_check` (below) for that.
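The extension-keyed dispatch can be sketched as below. Only the in-process JSON branch actually executes here; the other branches shell out in the real engine (and probe once for tool availability), so they appear as comments.

```typescript
// Illustrative sketch of the per-extension syntax-checker dispatch.
import * as path from "node:path";

function checkSyntax(filePath: string, contents: string): string | null {
  switch (path.extname(filePath)) {
    case ".json":
      try {
        JSON.parse(contents); // in-process, no subprocess needed
        return null;
      } catch (e) {
        // 'auto' appends this warning to the tool result;
        // 'strict' would throw it instead.
        return `[syntax-check warning ${(e as Error).message}]`;
      }
    // .js/.mjs/.cjs/.jsx -> node --check <file>
    // .py/.pyw           -> python3 -m py_compile
    // .sh/.bash          -> bash -n <file>
    default:
      return null; // unknown extension: no checker, no noise
  }
}
```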
2. compile_check tool
---------------------
New tool auto-bound when `engine: 'local'` and `includeCodingTools`
is on. Lets the agent ask "did my change break anything?" without us
shipping a real LSP client.
Auto-detection (first hit wins):
- tsconfig.json or package.json[typescript] -> `npx --no-install tsc --noEmit`
- Cargo.toml -> `cargo check --message-format=short`
- go.mod -> `go vet ./...`
- pyproject.toml/setup.py with mypy -> `python3 -m mypy .`
- pyproject.toml/setup.py without mypy -> `compileall .`
- none of the above -> tells the agent "no toolchain"
Honours `local.compileCheck.command` and `compile_check.command` arg
overrides; honours `local.compileCheck.timeoutMs` (default 120s).
Output is truncated through the standard local-engine output cap and
spills overflow to a temp file like the bash tool does. Returns
exit-code-aware `PASSED`/`FAILED` headline + body.
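The first-hit-wins probe can be sketched as follows. The marker/command pairs are taken from the table above; the package.json[typescript] check is omitted and the mypy detection is reduced to a file-content scan, so treat this as an approximation.

```typescript
// Illustrative sketch of compile_check's project-marker detection.
import * as fs from "node:fs";
import * as path from "node:path";

function detectCompileCommand(workspace: string): string | null {
  const has = (f: string) => fs.existsSync(path.join(workspace, f));
  if (has("tsconfig.json")) return "npx --no-install tsc --noEmit";
  if (has("Cargo.toml")) return "cargo check --message-format=short";
  if (has("go.mod")) return "go vet ./...";
  if (has("pyproject.toml") || has("setup.py")) {
    const markers = ["pyproject.toml", "setup.py"].filter(has);
    const usesMypy = markers.some((f) =>
      fs.readFileSync(path.join(workspace, f), "utf8").includes("mypy"),
    );
    return usesMypy ? "python3 -m mypy ." : "python3 -m compileall .";
  }
  return null; // no toolchain -> tell the agent
}
```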
Tests
-----
8 new unit cases:
- JS/JSON broken/valid round-trip
- returns null for unknown extensions
- write_file appends `[syntax-check warning ...]` in `auto`
- write_file in `strict` throws
- compile_check reports "no recognised project marker" in an empty workspace
- compile_check honours an explicit command override and reports exit codes
37 LocalExecutionTools tests pass; 1188 non-API tests overall.
Live integration
----------------
New `npm run local:compile` script that:
1. Seeds a tiny TS workspace (tsconfig.json + index.ts).
2. Asks Anthropic Sonnet to write a broken JS file (post-edit
syntax-check warns) -> fix it -> write a broken TS file with a
real type error -> call compile_check (must FAIL with
`error TS2322`) -> fix it -> call compile_check again (must
PASS).
Live run, end-to-end:
- `[syntax-check warning via node --check]` appended to write_file
- First compile_check: FAILED via `npx --no-install tsc --noEmit`
(exit=2): `broken.ts(1,14): error TS2322: Type 'string' is not
assignable to type 'number'.`
- Second compile_check (after edit_file fix): PASSED
- Both assertions ✔.
Other live scripts (local, local:hooks, local:checkpointer,
local:ptc, local:image) still green.
* chore(scripts): side-by-side comparison harness vs pi-mono
`npm run compare:pi` runs four identical tasks against pi-mono's CLI
and our local engine in parallel temp workspaces using the same model
(claude-sonnet-4-5 by default). Captures: tool-call shape, wall time,
input/output/cache tokens, and verifies the workspace ended in the
expected state.
Tasks
-----
- T1 simple-edit : single literal substitution in greet.py
- T2 fuzzy-edit : edit a file with trailing whitespace + tabs;
exercises both agents' fuzzy-match recovery
- T3 syntax-error-fix : pre-seeded broken JS; ours has post-edit
syntax check, pi has to discover via bash
- T4 type-error-fix : pre-seeded TS file with a real type error in
a tiny tsconfig project; ours has
compile_check, pi has bash + tsc
Pi runner spawns `node $PI_BIN --print --mode json --no-session
--provider anthropic --model <model>` and parses its JSON event
stream. Ours uses Run.create with `engine: 'local'` and walks the
final conversation. PI_BIN defaults to
~/Projects/pi-mono/packages/coding-agent/dist/cli.js.
Live results (real Anthropic, Sonnet 4.5)
-----------------------------------------
task metric pi ours
T1 simple-edit verify ✔ ✔
wall 8.4s 11.0s
tool calls 2 2
output tok 313 281
T2 fuzzy-edit verify ✔ ✔
wall 16.1s 7.2s
tool calls 2 2
output tok 384 246
T4 type-error-fix-loop verify ✔ ✔
wall 23.5s 13.6s
tool calls 6 4
output tok 805 397
T3 syntax-error-fix verify ✔ ✔
wall 16.2s 11.7s
tool calls 3 4 (note: ours used 1
bash, pi used 2)
output tok 371 503
Both agents 4/4 verify ✔. Ours wins wall-time on 3/4 tasks, output
tokens on every task, and tool-call count on T4 (compile_check
short-circuits the bash+read+bash loop pi did).
Open follow-up surfaced by the harness: pi reports thousands of
cacheRead tokens per turn (Anthropic prompt cache rebate), ours
reports zero. We're paying full rate for the system+tools prefix on
every turn. Not a regression introduced here, but worth its own PR
to wire prompt caching on the Anthropic adapter.
* chore(scripts): compare:pi — caching enabled, +2 tasks, +variance
Updates to the side-by-side harness based on first-run findings.
1. Anthropic prompt caching wired on our side
-----------------------------------------------
The first comparison reported cache_read=0 on our side. To clarify
(this is pre-existing behavior, not a regression from this PR):
caching IS supported (`addCacheControl` + the agent context's
`hasAnthropicPromptCache()`), but it's gated on
`clientOptions.promptCache === true`. The harness's first cut set
that on `graphConfig.clientOptions`, but `Run.createLegacyGraph`
(src/run.ts:156) destructures `llmConfig` and rebuilds
`agentConfig.clientOptions` from it — `graphConfig.clientOptions`
is silently dropped on the standard path. So in this harness we
set `promptCache: true` on the llmConfig itself.
After the fix, ours reports real cache_read / cache_creation per
turn. Caching works.
2. New tasks
------------
- T5 multi-file-rename : rename `calc_total` → `calculateTotal` across
src/lib.ts, src/index.ts, src/index.test.ts. Tests how each agent
finds + applies the rename.
- T6 image-read-and-describe (ours-only): copies a real PNG into the
workspace (macOS Certificate Assistant icon) and asks the model to
describe it. Exercises our `attachReadAttachments: 'images-only'`
end-to-end. pi has no equivalent and is skipped via the new
`skip: 'pi'` field.
3. Variance support
-------------------
COMPARE_ITERS=N runs each task N times per side and reports the
mean. Skipped tasks count cleanly (no false N/A entries). The
verify check now force-fails when a runner errored, so a soft
verify (e.g. T6's "file-still-on-disk" check) can't mask a real
provider rejection.
4. Cost computation
-------------------
We now compute our cost from the same per-turn breakdown using
Sonnet 4.5 pricing ($3/$15 per Mtok input/output, $3.75 cache write,
$0.30 cache read), so the cost columns are directly comparable.
Live results (N=2)
------------------
Functional: pi 10/10 ✔, ours 12/12 ✔ (T6 ours-only adds 2).
Wall time: ours wins all 5 shared tasks (~30% faster on average).
Tool calls: ours fewer or equal on every shared task.
Output tokens: ours fewer on every shared task.
Cost: ours is ~6× more expensive per task even with caching on.
Why the cost gap survives caching
---------------------------------
Per-turn "input new" tokens:
pi (T1): 35 ours (T1): 28298
pi (T4): 76 ours (T4): 48630
Pi only sends the new turn's payload (~35 tokens); nearly the entire
system prefix and tool defs are cache hits. Ours sends ~28k tokens of new
input per turn even with caching on, which strongly suggests our
cache prefix isn't covering tool definitions, OR the breakpoint
position isn't matching across turns. Worth a dedicated PR to tune
the cache breakpoints in `addCacheControl` / `AgentContext.systemRunnable`
so tool defs sit inside the cached prefix; this will close the cost
gap considerably.
The harness made this gap impossible to miss — the entire reason for
the cross-comparison.
* feat(cache): tool-prefix cache breakpoint + harness accounting fix
Two related changes; the second was the bigger win.
1. Tool-prefix cache breakpoint (Anthropic)
-------------------------------------------
New `partitionAndMarkAnthropicToolCache` helper at
src/messages/anthropicToolCache.ts. When `clientOptions.promptCache
=== true` and the provider is Anthropic, we now:
- Stable-partition the bound tool list into [static, deferred] using
`defer_loading` from `agentContext.toolDefinitions`.
- Stamp `cache_control: { type: 'ephemeral' }` on the LAST static
tool via the `extras` field LangChain's Anthropic adapter
forwards (`AnthropicToolExtrasSchema`).
This way, the cached prefix covers exactly the static tool inventory.
Discovered deferred tools that arrive across turns sit *after* the
breakpoint and don't invalidate the prefix.
The wrap is non-mutating (clones with `Object.create(getProto…)` so
the original tool reference is untouched) and idempotent. 10 unit
tests in `src/messages/__tests__/anthropicToolCache.test.ts` cover
partition order, stamp behaviour, prototype preservation, extras
merging, and idempotency.
Wired in `Graph.ts` right after `resolveLocalToolsForBinding`, gated
on Anthropic + `promptCache: true`.
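The partition-and-stamp step can be sketched as follows, with simplified tool shapes (the real helper clones via `Object.create` to preserve prototypes and merges into LangChain's `extras` field; the names here are illustrative):

```typescript
interface ToolLike {
  name: string;
  defer_loading?: boolean;
  extras?: Record<string, unknown>;
}

function partitionAndMarkToolCacheSketch(tools: ToolLike[]): ToolLike[] {
  // Stable partition: static tools first, deferred after, relative
  // order preserved within each group.
  const staticTools = tools.filter((t) => !t.defer_loading);
  const deferredTools = tools.filter((t) => t.defer_loading);
  // Stamp the cache breakpoint on the LAST static tool only, via a
  // non-mutating copy so the caller's objects stay untouched.
  const marked = staticTools.map((t, i) =>
    i === staticTools.length - 1
      ? { ...t, extras: { ...t.extras, cache_control: { type: 'ephemeral' } } }
      : t
  );
  return [...marked, ...deferredTools];
}
```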
In practice the impact is smaller than I expected because
`addCacheControl` on the last user message ALREADY creates a cache
breakpoint that covers the tools+system prefix as part of the
cumulative cached prefix. The tool-level breakpoint becomes load-
bearing in cases where:
- A turn arrives without a fresh user message (resume after tool
call only),
- The user message would otherwise blow the cache breakpoint, or
- Cache budget needs to be split across multiple breakpoints to fit
inside Anthropic's 4-breakpoint cap.
Still correct; still worth shipping; just not a 6× win on its own.
2. Harness accounting fix (the actual ~6× swing)
------------------------------------------------
`compare:pi`'s previous run reported our input tokens as ~28k–48k
per turn vs pi's ~35–80, suggesting a 6× cost gap that survived
caching. That number was wrong.
LangChain's Anthropic adapter at
src/llm/anthropic/utils/message_outputs.ts:31 reports
`usage_metadata.input_tokens` as the SUM of uncached input +
cache_creation + cache_read — i.e., the total prompt size, NOT the
uncached portion. Pi reports `usage.input` as uncached only. So our
"input new" column was double-counting cached tokens.
The harness now subtracts `cache_read + cache_creation` from
`input_tokens` when computing our "input new", so the column is
apples-to-apples with pi's `input` field.
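The correction in isolation, with illustrative field names (not the exact adapter types):

```typescript
interface AnthropicTurnUsage {
  input_tokens: number;    // adapter-reported TOTAL prompt size
  cache_read: number;      // tokens served from cache
  cache_creation: number;  // tokens written to cache this turn
}

function uncachedInputTokens(u: AnthropicTurnUsage): number {
  // input_tokens already sums all three buckets, so strip the cached
  // portions to match pi's uncached-only `input` field.
  return u.input_tokens - u.cache_read - u.cache_creation;
}
```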
Recomputed live results (N=2, caching on, real Anthropic):
task                   pi cost    ours cost    delta
T1 simple-edit         $0.0158    $0.0171      +8%
T2 fuzzy-edit          $0.0157    $0.0173      +10%
T3 syntax-fix          $0.0210    $0.0248      +18%
T4 type-error-loop     $0.0325    $0.0292      -10%  ← we win
T5 multi-file          $0.0264    $0.0312      +18%
T6 image (ours-only)   —          $0.0158
Functional: pi 10/10 ✔, ours 12/12 ✔.
Wall time: ours wins T3/T4/T5 (most-complex tasks), pi wins T1/T2.
Tool calls: ours wins or ties every shared task.
Output tokens: ours wins every shared task.
Cost: pi ~10–20% cheaper on simpler tasks, ours wins the most
complex one (T4).
We're effectively at parity. The "we're 6× more expensive" claim
posted earlier was the harness error.
* feat(local-engine): workspace + exec seams + HITL on direct path
Three changes that fit together — first two answer the "what about a
second engine" + "what's the workspace" questions; the third was the
last gap blocking workspace-policy `'ask'` from working end-to-end.
1. WorkspaceFS exec seam (engine reusability)
---------------------------------------------
New `WorkspaceFS` interface in src/tools/local/workspaceFS.ts (read,
write, stat, readdir, mkdir, realpath, unlink, open) with a
`nodeWorkspaceFS` default. Every file-touching factory in the local
coding suite (read_file, write_file, edit_file, grep_search,
glob_search, list_directory, file checkpointer, binary detection
helper) now routes through this interface.
A future engine — stateful remote sandbox, in-memory test FS, ssh
jail — supplies its own `WorkspaceFS` and inherits every tool's
behavior (fuzzy-match edit, BOM/EOL preservation, syntax check,
checkpointer, attachment classifier) for free. Same story for
process spawning: the existing `local.spawn` is now nested under
`local.exec.spawn` with the legacy top-level field still honoured.
The seams are wired through four new resolvers:
- `getWorkspaceRoots` — canonical root + additionalRoots
- `getReadRoots`/`getWriteRoots` — split allow-outside knobs
- `getWorkspaceFS` — `local.exec.fs` ?? nodeWorkspaceFS
- `getSpawn` — `local.exec.spawn` ?? local.spawn ?? Node
Plus 7 unit tests in src/tools/__tests__/workspaceSeam.test.ts that
verify additionalRoots, the new allowReadOutside/allowWriteOutside
split, legacy `cwd` + `allowOutsideWorkspace` back-compat, and that
a custom WorkspaceFS actually receives the file tool calls.
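A trimmed-down sketch of the seam; `MiniWorkspaceFS` and `createMemoryFS` are illustrative names, not SDK exports (the real interface also carries stat, readdir, mkdir, realpath, unlink, open):

```typescript
interface MiniWorkspaceFS {
  read(path: string): Promise<string>;
  write(path: string, data: string): Promise<void>;
}

// A hypothetical in-memory backend a test (or a future remote engine)
// could supply via `local.exec.fs`.
function createMemoryFS(): MiniWorkspaceFS {
  const store = new Map<string, string>();
  return {
    async read(path) {
      const data = store.get(path);
      if (data === undefined) throw new Error(`ENOENT: ${path}`);
      return data;
    },
    async write(path, data) {
      store.set(path, data);
    },
  };
}
```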
2. First-class workspace boundary + policy hook
-----------------------------------------------
Nested `local.workspace` config:
  workspace: {
    root: string,
    additionalRoots?: readonly string[],  // monorepo: extend boundary
    allowReadOutside?: boolean,           // split from write
    allowWriteOutside?: boolean,
  }
Back-compat: top-level `cwd` and `allowOutsideWorkspace` still work
and map to `workspace.root` / both `allowOutside*` flags.
`resolveWorkspacePathSafe(path, config, intent)` now takes a `'read'`
| `'write'` intent so reads and writes can have different allow-
outside semantics. Default callers pass `'write'` for stricter
clamping; read tools (read_file, grep_search, glob_search,
list_directory) explicitly pass `'read'`.
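The intent split can be sketched like this, assuming a flattened config and plain prefix comparison (the real resolver also realpaths ancestors; names are illustrative):

```typescript
import { isAbsolute, resolve } from 'node:path';

interface WsConfigSketch {
  root: string;
  allowReadOutside?: boolean;
  allowWriteOutside?: boolean;
}

function resolveWorkspacePathSketch(
  p: string,
  cfg: WsConfigSketch,
  intent: 'read' | 'write'
): string {
  const abs = isAbsolute(p) ? resolve(p) : resolve(cfg.root, p);
  const inside = abs === cfg.root || abs.startsWith(cfg.root + '/');
  // Reads and writes consult different allow-outside flags.
  const allowOutside =
    intent === 'read' ? cfg.allowReadOutside : cfg.allowWriteOutside;
  if (!inside && !allowOutside) {
    throw new Error(`${intent} outside workspace: ${abs}`);
  }
  return abs;
}
```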
New `createWorkspacePolicyHook` in src/hooks/createWorkspacePolicyHook.ts
returns a `PreToolUse` callback that:
- inspects each tool call's input via per-tool path extractors
(defaults cover the local-engine coding suite; hosts can
override / extend),
- returns `allow` for in-workspace paths (incl. additionalRoots),
- returns `'ask'` (default) / `'allow'` / `'deny'` for outside
paths, with separate read/write policies.
This is exactly the user-visible "ask once / always" semantic
claude-code and opencode have, but built on the existing
PreToolUse + HITL machinery instead of a parallel permissions API.
14 unit tests in src/hooks/__tests__/createWorkspacePolicyHook.test.ts.
3. HITL interrupt() on the direct path (the missing piece)
----------------------------------------------------------
The previous direct-path hook commit fail-closed `'ask'` decisions
because the interrupt machinery only existed in the event-dispatch
path. The workspace policy hook above wants to use `'ask'` for
real, so this commit closes that gap.
`runDirectToolWithLifecycleHooks` now:
- When `humanInTheLoop.enabled === true` and PreToolUse returns
`'ask'`: builds a single-tool `tool_approval` payload and raises
a real LangGraph `interrupt()` (anchored to the node's
RunnableConfig the same way `dispatchToolEvents` does — ToolNode
disables LangSmith tracing so the AsyncLocalStorage frame must
be re-established).
- On resume:
- `approve` runs the tool with the (possibly hook-rewritten) args
- `reject` blocks via the new `blockDirectCall` helper
- `respond` returns the host-supplied `responseText` as a
synthetic success ToolMessage (no tool execution)
- `edit` re-runs the tool with the host-edited args
- When HITL is off: collapses to a fail-closed deny (matches the
rest of the SDK's HITL-disabled default). One-time warning
logged so hosts notice the gap.
- `allowedDecisions` enforcement matches the event path — a
decision type outside the policy's allowlist is fail-closed.
`blockDirectCall` is the shared helper that synthesises the Blocked
ToolMessage and fires `PermissionDenied`, used by every deny path
(deny decision, fail-closed ask, reject, allowedDecisions violation,
malformed respond).
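The resume-decision dispatch, sketched with illustrative shapes (not the SDK's exact types):

```typescript
type ResumeDecision =
  | { type: 'approve' }
  | { type: 'edit'; updatedInput: Record<string, unknown> }
  | { type: 'respond'; responseText: string }
  | { type: 'reject' };

function applyDecision(
  decision: ResumeDecision,
  originalArgs: Record<string, unknown>,
  runTool: (args: Record<string, unknown>) => string
): string {
  switch (decision.type) {
    case 'approve':
      // Run with the (possibly hook-rewritten) original args.
      return runTool(originalArgs);
    case 'edit':
      // Re-run with the host-edited args.
      return runTool(decision.updatedInput);
    case 'respond':
      // Host supplies the tool output; the tool body never executes.
      return decision.responseText;
    case 'reject':
      // Stand-in for the synthetic Blocked ToolMessage.
      return 'Blocked by human reviewer';
  }
}
```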
Updated the existing fail-closed-on-ask test in
directToolHooks.test.ts to assert the new HITL-disabled behaviour
explicitly.
Live verification
-----------------
New `npm run local:workspace` script:
- workspace.root = temp dir
- additionalRoots = sibling temp dir
- createWorkspacePolicyHook with outsideRead/outsideWrite='ask'
- humanInTheLoop.enabled = true
- asks the model to read 3 files: in-workspace, sibling, outside
Live result against real Anthropic Sonnet:
✔ inside.txt → workspace policy 'allow' → read INSIDE
✔ sibling.txt → 'allow' (additionalRoots) → read SIBLING
✔ outside.txt → 'ask' → INTERRUPT raised with action_request
carrying the path → script auto-approves
→ tool runs → read SECRET
Final: HITL interrupts handled = 1, successful tool messages = 3
The model didn't even need its "retry if blocked" instruction — HITL
handled the approval transparently.
Functional + non-API tests: 1245 passing (was 1224; +21 new across
workspace seam + workspace policy hook + the updated direct-tool
ask test).
* feat(common): canonical tool-name constants for local coding tools
LibreChat's `getToolIconType` (client/src/components/Chat/Messages/
Content/ToolOutput/ToolIcon.tsx) special-renders four of our tools
by exact name match: `bash_tool`, `read_file`, `execute_code`,
`run_tools_with_code`. We already use those names, so they pick up
the right icon end-to-end.
The newer local-engine tools (`write_file`, `edit_file`,
`grep_search`, `glob_search`, `list_directory`, `compile_check`)
were defined as per-file `Local*ToolName` consts — string literals
that would silently drift if anyone renamed them in only one place,
and that no consumer UI could discover programmatically.
Promotes them to `Constants.*` (single source of truth) and adds
two grouped lists alongside the workspace types so consumers can
match against canonical strings:
- `Constants.WRITE_FILE` ('write_file')
- `Constants.EDIT_FILE` ('edit_file')
- `Constants.GREP_SEARCH` ('grep_search')
- `Constants.GLOB_SEARCH` ('glob_search')
- `Constants.LIST_DIRECTORY` ('list_directory')
- `Constants.COMPILE_CHECK` ('compile_check')
- `LOCAL_CODING_TOOL_NAMES` — the 7 local-only tools (the 6 above
plus READ_FILE)
- `LOCAL_CODING_BUNDLE_NAMES` — the local-only set plus the bash/
  code/PTC tools the bundle wraps around the existing factories
The per-file `Local*ToolName` and `CompileCheckToolName` exports are
retained as back-compat aliases pointing at the canonical Constants
so existing consumers keep working.
Wired through:
- `createWorkspacePolicyHook` default path extractors now key off
`Constants.*` (was string literals).
- `LocalCodingTools.ts` Local*ToolName aliases collapsed into
`Constants.*` references.
- `CompileCheckTool.ts` likewise.
5 new unit tests in localToolNames.test.ts pin:
- per-file aliases match Constants
- canonical strings match the wire-level expectations LibreChat
(and other consumer UIs) special-case
- LOCAL_CODING_BUNDLE_NAMES matches what `createLocalCodingTools`
actually emits
- LOCAL_CODING_TOOL_NAMES matches what
`createLocalCodingToolDefinitions` emits
- bundle list is a strict superset of the local-only list
Net effect: when LibreChat (or any other consumer UI) eventually
adds icons for write_file / edit_file / grep_search / glob_search /
list_directory / compile_check, the wiring picks up automatically
without an SDK change. And a rename on one side without the other
gets caught at test time.
Tests: 1250 passing (was 1245; +5 new).
* fix(local): three Codex P2 review nits — bash args, rg cache, root-relative extras
Three real issues Codex flagged on the PR. Each is small but
behavior-affecting; tests pin each.
#1 — `executeLocalCode` dropped `args` when lang was bash
--------------------------------------------------------
For every other runtime (py, js, ts, php, go, rs, c, cpp, java, r,
d, f90), `getRuntimeCommand` appends `input.args` as positional
parameters. The `lang === 'bash'` short-circuit was calling
`executeLocalBash(input.code, config)` and silently discarding
`args`, so `{lang:'bash', code:'echo $1', args:['hi']}` printed an
empty string instead of `hi`.
Fix: when lang is bash AND args is non-empty, route through a new
`executeLocalBashWithArgs` helper that uses the standard
`bash -lc <code> -- <args...>` form so `$1`, `$2`, … resolve. Same
AST validation as the no-args path. The plain `executeLocalBash`
(used by the `bash_tool` factory itself) is unchanged.
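The argv layout, sketched as a pure function; it relies on bash's `-c` convention that the first argument after the script fills `$0`, so `--` takes `$0` and the input args land in `$1`, `$2`, …

```typescript
function bashArgvWithArgs(code: string, args: string[]): string[] {
  // With args: bash -lc <code> -- <args...>; without: bash -lc <code>.
  return args.length > 0 ? ['-lc', code, '--', ...args] : ['-lc', code];
}
```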
Test: `codex review fixes › executeLocalCode bash args › passes
input.args as positional shell parameters when lang is bash`.
#2 — ripgrep availability cache was process-global
--------------------------------------------------
With per-Run `local.exec.spawn` backends (the seam added for the
future remote-engine), a single `let cachedRgAvailable` would let
Run A's "rg works" verdict poison Run B whose backend (e.g. a
remote sandbox without rg) doesn't have it. Run B would skip the
probe, try to spawn rg, throw, and never reach the Node fallback.
Fix: cache is now a `WeakMap<LocalSpawn, Promise<boolean>>` keyed
on the effective backend (whatever `getSpawn(config)` returns).
WeakMap lets disposed backends GC their entry; the test reset hook
re-creates the map.
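A sketch of the backend-keyed cache, assuming `LocalSpawn` is any object identity a WeakMap can key on:

```typescript
type LocalSpawn = object;

const rgAvailable = new WeakMap<LocalSpawn, Promise<boolean>>();

function probeRgOnce(
  backend: LocalSpawn,
  probe: () => Promise<boolean>
): Promise<boolean> {
  let cached = rgAvailable.get(backend);
  if (!cached) {
    // First call for THIS backend runs the probe; later calls reuse it.
    // A different backend gets its own probe, so verdicts never bleed.
    cached = probe();
    rgAvailable.set(backend, cached);
  }
  return cached;
}
```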
Test: `codex review fixes › ripgrep cache backend scope › does not
bleed an "rg available" verdict from one backend to another`.
#3 — `additionalRoots` resolved against process.cwd, not workspace root
----------------------------------------------------------------------
With `workspace: { root: '/repo/app', additionalRoots: ['../shared'] }`,
the intended boundary is `/repo/shared`. The previous
`path.resolve(extra)` anchored to the process cwd, producing
`${process.cwd()}/../shared` — completely different on a server
with a different cwd, and could deny the legitimate sibling or
allow an unrelated directory.
Fix: `getWorkspaceRoots` and `createWorkspacePolicyHook` now both
do `isAbsolute(extra) ? resolve(extra) : resolve(root, extra)`.
Behaviour matches what users would expect from a "monorepo
sibling" config.
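The fix in isolation:

```typescript
import { isAbsolute, resolve } from 'node:path';

// Relative extras anchor to the workspace root, never to process.cwd().
function resolveExtraRoot(root: string, extra: string): string {
  return isAbsolute(extra) ? resolve(extra) : resolve(root, extra);
}
```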
Test: `codex review fixes › additionalRoots resolved against
workspace root › treats relative additionalRoots as siblings of
root, not of process.cwd`.
Tests: 1254 passing (was 1250; +4 new). Lint + tsc clean.
* docs(hitl): unstale the "event-driven only" caveats; pin resume scope
Three JSDoc blocks were honest at PR #134 ship time but went stale
once `6122460` (hooks fire on the legacy non-event-driven path) and
`caed7a2` (HITL interrupt() lifted into the direct path) landed:
- `HumanInTheLoopConfig` — "Scope: event-driven tools only" header
plus the bullet declaring direct-path tools "bypass the hook
system entirely. PreToolUse hooks do not fire for those tools and
HITL approval does not gate them." Both halves are now false.
- `ToolNodeOptions.hookRegistry` — "Only fires for event-driven
tool calls; tools routed through directToolNames bypass hook
dispatch entirely." Now false.
- `ToolNodeOptions.humanInTheLoop` — "the interrupt path is only
wired into the event-driven dispatch (dispatchToolEvents), not
into directToolNames execution — direct tools bypass HITL
entirely." Now false.
Also re-evaluated the "mixed direct + event batches re-execute the
direct half on resume" caveat. New tests in
`directToolHITLResumeScope.test.ts` pin the actual scope:
- When a single tool interrupts via the direct path, its body
runs **once** total. The first pass interrupted before reaching
runTool; the resume pass ran the body after the host's decision
was applied. The PreToolUse hook fires twice (the documented
idempotency caveat).
- When a sibling tool already ran in the same batch before the
interrupting tool, the sibling's body runs **twice** — once on
each pass. This applies symmetrically to event-dispatched and
direct siblings; LangGraph rolls back the entire ToolNode
invocation, not just the direct half.
Updated wording reflects both: HITL spans both paths uniformly, and
the side-effect warning applies to "any tool with side effects in a
batch where another tool interrupts" rather than singling out direct
tools. Hosts that want a tool to skip HITL must omit it from any
registered matcher rather than relying on the path it takes.
Tests: 1256 passing (was 1254; +2 new for resume-scope pinning).
* fix(local): two more Codex review nits — output OOM cap + bash_tool args
#4 P1 — `spawnLocalProcess` could OOM the host on noisy commands
----------------------------------------------------------------
Previous implementation accumulated every stdout/stderr chunk into
in-memory strings and only truncated post-`close`. A noisy command
(`yes`, `cat /dev/urandom | base64`, a verbose build) could stream
gigabytes before the cap kicked in — production-stability risk
since local tools execute arbitrary shell.
Fix: stream-time cap with three layers, all in `spawnLocalProcess`:
1. Per-stream in-memory buffer capped at `maxOutputChars * 2`.
Past that, chunks divert to a lazy temp file (the spill).
2. The spill file is seeded with whatever was already buffered so
it holds the FULL output (head from memory + tail from stream),
not just the post-cap remainder.
3. New `local.maxSpawnedBytes` config (default 50 MiB) hard-kills
the process tree once total stdout+stderr crosses the cap.
Independent from `maxOutputChars` (which only affects what the
model sees) — this is the OOM backstop.
`finish()` now waits for the spill stream to flush before resolving,
so the model never sees `full_output_path: …` for a still-being-
written file.
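The three-layer bookkeeping, sketched with string chunks and in-memory stand-ins for the spill file and the process kill:

```typescript
function createOutputCapSketch(maxOutputChars: number, hardKillBytes: number) {
  let buffered = '';
  let spilled = '';
  let total = 0;
  let killed = false;
  return {
    push(chunk: string) {
      total += chunk.length;
      // Layer 3: hard kill once TOTAL output crosses the cap
      // (hardKillBytes <= 0 means "no cap", per the documented contract).
      if (hardKillBytes > 0 && total > hardKillBytes) {
        killed = true; // stand-in for killing the process tree
        return;
      }
      if (buffered.length < maxOutputChars * 2) {
        // Layer 1: in-memory buffer capped at maxOutputChars * 2.
        buffered += chunk;
      } else {
        // Layer 2: divert to the spill, seeding it with the buffered
        // head so the spill holds the FULL output.
        if (spilled === '') spilled = buffered;
        spilled += chunk;
      }
    },
    state: () => ({ buffered, spilled, total, killed }),
  };
}
```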
Tests pinned: `codex review fixes (round 2) › streaming output cap`:
- `yes` gets killed in ~200ms when capped at 64 KiB (vs the 30s
timeout that would otherwise apply)
- 200 KiB output with an 8 KB inline cap successfully spills; the
file holds more than the in-memory truncation
- small outputs do NOT create a spill file (no perf regression)
#5 P2 — `bash_tool` factory broke positional shell parameters
-------------------------------------------------------------
Same shape as the previous `executeLocalCode` bash-args fix
(`9b5fb85`), but in the `bash_tool` factory itself. It was
appending args literally to the command string:
`${command} ${args.map(shellQuote).join(' ')}`
So `command: 'echo "$1"'` with `args: ['hi']` ran
`echo "$1" hi` — `$1` was empty, `hi` got appended literally.
Tool schema advertised args as execution arguments; the actual
behavior didn't match.
Fix: route through the `executeLocalBashWithArgs` helper (now
exported from LocalExecutionEngine) which uses
`bash -lc <command> -- <args...>` so `$1`, `$2`, … resolve
correctly. Falls back to the no-args `executeLocalBash` when args
is empty (no behavior change for that case).
Test: `bash_tool args › populates positional shell parameters from
input.args` (and a second case for missing args).
Tests: 1261 passing (was 1256; +5 new).
* fix(local): three more Codex nits — config.shell, probe cache scope, grep -e
#6 P1 — validateBashCommand hard-coded DEFAULT_SHELL
----------------------------------------------------
The `bash -n -c <command>` syntax preflight was hard-coded to
DEFAULT_SHELL ('bash' on POSIX, 'bash.exe' on Windows), so a Run
configured with `local.shell: '/bin/sh'` (or pointing at zsh, dash,
or any host where bash isn't available) would get its commands
syntax-validated against a different binary than the one that would
actually execute them. Valid commands could be rejected before
execution; commands relying on bashisms could be falsely accepted
on hosts without bash.
Fix: route the syntax preflight through `config.shell ?? DEFAULT_SHELL`
so validation uses the same binary the actual execution path uses.
Test: `codex review fixes (round 3) › validateBashCommand honours
configured shell › routes the -n preflight through local.shell when
set`. Intercepts the spawn calls and asserts the syntax check used
`/bin/sh`, not the DEFAULT_SHELL fallback.
#7 P2 — syntax-check probe cache was process-global
---------------------------------------------------
Same shape as the ripgrep-cache fix in `9b5fb85`: per-Run
`local.exec.spawn` backends mean a "node available" / "python
available" / "bash available" verdict from one backend's probe
could poison subsequent Runs whose backend (e.g. a remote sandbox
that has python but not node) doesn't have those tools.
Fix: cache is now `WeakMap<LocalSpawn, ProbeCache>` keyed on the
effective spawn backend (whatever `getSpawn(config)` returns).
Disposed backends GC their entry; the test reset hook re-creates the
map. Mirrors the ripgrep cache exactly.
Test: `codex review fixes (round 3) › syntax-check probe cache is
backend-keyed › does not bleed an "rg/node/python available" verdict
from one backend to another`.
#8 P2 — grep pattern parsed as flag when dash-prefixed
------------------------------------------------------
The `grep_search` tool sent `input.pattern` as a positional arg to
`rg`. A pattern like `-foo` got parsed as an unknown flag and rg
bailed out instead of searching for it. `rg --help` explicitly
requires `-e/--regexp` (or `--`) for dash-prefixed patterns.
Fix: pass the pattern via `-e <pattern>`. Same trick also defends
against any future flag-conflict if a user query happens to look like
an rg long option. The Node fallback path doesn't need this — JS
RegExp accepts dash-prefixed patterns natively.
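The argv change, sketched with a minimal flag set:

```typescript
function rgSearchArgs(pattern: string, path: string): string[] {
  // `-e` marks the next argv entry as the pattern, so dash-prefixed
  // patterns like '-foo' can't be parsed as flags by rg.
  return ['--line-number', '-e', pattern, path];
}
```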
Test: `codex review fixes (round 3) › grep passes pattern via -e ›
handles dash-prefixed patterns without rg interpreting them as flags`.
Tests: 1264 passing (was 1261; +3 new).
* fix(local): two more P1 Codex nits — quoted destructive + symlink-escape
#9 P1 — dangerous-command gate missed quoted targets
----------------------------------------------------
`validateBashCommand` checks `dangerousCommandPatterns` against
`normalized` — the post-`stripQuotedContent` form. That pass blanks
the contents of every quoted span, so destructive targets that the
LLM (accidentally or not) wraps in quotes — `rm -rf "/"`,
`rm -rf "$HOME"`, `chmod -R 777 "/"` — slip past the regex even
though they execute identically to the bare form. Real safety
regression on the default-on guard.
Fix: split the destructive check into two passes.
1. The existing `dangerousCommandPatterns` continue to run against
`normalized`. Same false-positive defence (`echo "rm -rf /"`
stays valid because the destructive text is wrapped by the
OUTER quote pair, not by quotes around the path argument).
2. New `quotedDestructivePatterns` run against the ORIGINAL
command string. Each pattern explicitly REQUIRES a matching
quote pair around the destructive path (`"/"`, `"$HOME"`,
`'/'`, etc.). `echo "rm -rf /"` doesn't match — the `/` isn't
surrounded by quotes there, only the whole `rm -rf /` string is.
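The two-pass check, sketched with one illustrative pattern per pass (the real guard carries a fuller pattern list):

```typescript
// Pass 1 input: blank the contents of quoted spans, keep the quotes.
function stripQuotedContent(cmd: string): string {
  return cmd.replace(/"[^"]*"/g, '""').replace(/'[^']*'/g, "''");
}

// Pass 1: bare destructive form, checked against the stripped command.
const dangerousBare = /rm\s+-rf\s+\/(\s|$)/;
// Pass 2: REQUIRES a quote pair around the destructive path itself,
// checked against the ORIGINAL command string.
const dangerousQuoted = /rm\s+-rf\s+(?:"(?:\/|\$HOME)"|'(?:\/|\$HOME)')/;

function isDestructive(cmd: string): boolean {
  return dangerousBare.test(stripQuotedContent(cmd)) ||
    dangerousQuoted.test(cmd);
}
```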
Tests: `codex review fixes (round 4) › quoted destructive targets`
covers `"/"`, `"$HOME"`, `'/'`, `chmod -R 777 "/"`, no-regression
on bare forms, and the `echo "rm -rf /"` no-false-positive case.
#10 P1 — workspace policy hook compared lexical paths only
----------------------------------------------------------
`createWorkspacePolicyHook` did `isAbsolute(p) ? resolve(p) :
resolve(root, p)` and called it done. A symlink inside the workspace
pointing outside (e.g. `workspace/escape → /etc/secret`) lexically
appears under `workspace/` and got auto-allowed — even though the
agent reading through it touches a path the policy meant to gate.
Critical when the hook is the primary gate: the documented "ask the
user" setup sets `allowReadOutside: true` so the file tools' own
clamp is off, leaving the hook as the sole defence.
Fix: the hook now realpaths both the candidate and the workspace
roots before comparing. Roots are pre-realpath'd once at construction
(memoized via a Promise so the per-call cost is one realpath) and
candidate paths get the same `realpathOfPathOrAncestor` walk
`resolveWorkspacePathSafe` already uses — so paths that don't yet
exist (e.g. `write_file` to a brand-new path) still get checked
against their nearest existing ancestor's realpath.
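A sync sketch of the ancestor walk, assuming POSIX paths (the real helper is async and memoizes the roots' realpaths via a Promise):

```typescript
import { existsSync, realpathSync } from 'node:fs';
import { basename, dirname, join } from 'node:path';

function realpathOfPathOrAncestorSketch(p: string): string {
  let current = p;
  const missingTail: string[] = [];
  // Walk up until an existing ancestor is found; the non-existent tail
  // is re-appended untouched after realpathing that ancestor.
  while (!existsSync(current)) {
    const parent = dirname(current);
    if (parent === current) break; // reached the filesystem root
    missingTail.unshift(basename(current));
    current = parent;
  }
  const real = existsSync(current) ? realpathSync(current) : current;
  return missingTail.reduce((acc, segment) => join(acc, segment), real);
}
```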
Two test cases pin both halves:
- `workspace/escape → extra/secret.txt` — hook DENIES read of
`escape` even though it's lexically inside.
- `extra/alt-mount → workspace/` — hook ALLOWS read of
`extra/alt-mount/file.ts` because the realpath lands inside the
workspace root (covers the alternate-mount case so we don't
over-correct).
Tests: 1272 passing (was 1264; +8 across both fixes).
* fix(local): treat maxSpawnedBytes=0 as unlimited (Codex P2 #11)
The public LocalExecutionConfig contract documents `maxSpawnedBytes: 0`
as "no cap", but the kill check `totalSpawnedBytes > hardKillBytes`
triggered on the very first byte when the cap was zero. Skip the kill
path entirely when `hardKillBytes <= 0` so configs that explicitly opt
out of the OOM backstop behave as documented.
* chore(scripts): forward ON_RUN_STEP in pi-vs-ours harness
Without an ON_RUN_STEP handler the aggregator's stepMap is empty when
ON_RUN_STEP_COMPLETED arrives, so every tool call logged "No run step
or runId found for completed step event" to stderr. Mirror the pattern
used in src/scripts/code_exec.ts (and friends): register both events.
Harness-only — no SDK behavior change.
* fix(local): ESM-safe spill + surface rg failures in glob_search (Codex P1 #12, P2 #13)
Two issues from the latest review pass:
1) `spawnLocalProcess`'s spill path used `require('fs')` inside an
ESM-shipped module. Fine in CJS test runs, but it throws
`ReferenceError: require is not defined` for any ESM consumer that
triggers the overflow path. Switch to a static
`import { createWriteStream } from 'fs'` so both build outputs work.
2) `createLocalGlobSearchTool` ignored ripgrep's exit code and stderr,
so `rg --files <missing-target>` (exit 2) silently mapped to "No
files found." — the agent then treated a tooling failure as a real
absence of matches. Now we check `result.exitCode > 1` (and
`timedOut`) and return an explicit `glob_search failed: <stderr>`
string + an artifact carrying the error.
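The exit-code handling, sketched against rg's documented convention (0 = matches found, 1 = no matches, >1 = real error); the result shape is illustrative:

```typescript
interface SpawnResultSketch {
  exitCode: number;
  stdout: string;
  stderr: string;
  timedOut?: boolean;
}

function interpretGlobResult(r: SpawnResultSketch): string {
  // Exit 1 is a genuine "nothing matched"; anything above is a tooling
  // failure that must be surfaced, not mapped to an empty result.
  if (r.timedOut || r.exitCode > 1) {
    return `glob_search failed: ${r.stderr.trim()}`;
  }
  const files = r.stdout.split('\n').filter(Boolean);
  return files.length > 0 ? files.join('\n') : 'No files found.';
}
```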
Tests pinned: `codex review fixes (round 5) › spill path is ESM-safe`
and `… › glob_search surfaces ripgrep failures`. The glob test uses an
injected spawn backend so it runs deterministically on hosts without
ripgrep installed.
* fix(local): sandbox loopback bridge + additionalRoots writes (Codex P1 #14, P2 #15)
Two issues with how `buildSandboxRuntimeConfig` lays out the
SandboxRuntimeConfig:
1) The local programmatic-tool bridge listens on 127.0.0.1, but the
sandbox default `allowedDomains: []` denies all outbound network —
so `run_tools_with_code` / `run_tools_with_bash` fail under sandbox
even though they work unsandboxed. Seed allowedDomains with
`127.0.0.1`, `localhost`, `::1` (skip any host the user explicitly
listed in `deniedDomains`, and dedupe against user-supplied
`allowedDomains`).
2) The sandbox `allowWrite` allowlist was seeded with `cwd` only, so
`workspace.additionalRoots` paths were resolvable by file tools but
blocked for sandboxed shell/code. Now seed from `getWorkspaceRoots`
so both surfaces share the same boundary.
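The seeding logic, sketched with plain string lists:

```typescript
function seedAllowedDomains(
  userAllowed: string[],
  userDenied: string[]
): string[] {
  const loopback = ['127.0.0.1', 'localhost', '::1'];
  const merged = [...userAllowed];
  for (const host of loopback) {
    // Respect an explicit user deny; dedupe against user-supplied
    // allowedDomains entries.
    if (!userDenied.includes(host) && !merged.includes(host)) {
      merged.push(host);
    }
  }
  return merged;
}
```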
Exported `buildSandboxRuntimeConfig` so the round-6 tests can drive it
directly without standing up the real native sandbox runtime. Test
count: 1283 passing (was 1276). Lint baseline unchanged.
* fix(direct-path): HITL edit reads updatedInput; PostToolUse updates registry (Codex P1 #16, P2 #17)
Two bugs in `runDirectToolWithLifecycleHooks`:
1) The `edit` decision branch read `decision.args`, but the documented
`ToolApprovalDecision` shape uses `updatedInput`. Hosts following
the public type signature were silently ignored — the tool ran
with the original (un-edited) arguments. Now mirrors the
event-driven path's validation: read `updatedInput`, fail closed
on missing/wrong-shaped payloads (returns an error ToolMessage
instead of executing with undefined args).
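The validation, sketched with a minimal decision shape (illustrative names, not the public `ToolApprovalDecision` type):

```typescript
interface EditDecisionSketch {
  type: 'edit';
  updatedInput?: unknown;
}

type EditOutcome =
  | { ok: true; args: Record<string, unknown> }
  | { ok: false; error: string };

function readEditedArgs(d: EditDecisionSketch): EditOutcome {
  // Fail closed on missing/wrong-shaped payloads instead of executing
  // the tool with undefined args.
  if (
    d.updatedInput == null ||
    typeof d.updatedInput !== 'object' ||
    Array.isArray(d.updatedInput)
  ) {
    return { ok: false, error: 'edit decision missing updatedInput' };
  }
  return { ok: true, args: d.updatedInput as Record<string, unknown> };
}
```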
2) When a `PostToolUse` hook returns `updatedOutput`, the direct
path returned a new ToolMessage with the replacement content but
never updated the `toolOutputReferences` registry. Subsequent
`{{tool<i>turn<n>}}` substitutions then delivered the stale
pre-hook bytes while the model observed the post-hook
replacement. Now reads `_refKey`/`_refScope` from the message
metadata (already stamped by `recordOutputReference`) and calls
`toolOutputRegistry.set` with the replacement.
Tests: `direct-path HITL: resume scope > edit decision (Codex P1 #16)`
covers both the happy path (edited args reach the body) and the
fail-closed branch. `ToolNode tool output references > PostToolUse
updatedOutput updates the registry (Codex P2 #17)` chains two calls
where the second references the first via {{tool0turn0}} and asserts
the substituted value matches the post-hook content.
* fix: snapshot direct-batch args + curl-first bash bridge (Codex P1 #18, P2 #19)
P1 #18 (ToolNode.ts): direct-path tools resolved {{tool…turn…}}
placeholders against the LIVE registry inside `runTool`, which runs
AFTER awaiting PreToolUse hooks. In a `Promise.all`-driven batch a
slow hook on call A could let call B finish first and register its
output, then A's late re-resolve would substitute B's output into A's
args — order-dependent leakage that violates same-turn isolation.
Snapshot the registry once per batch in `run()` (mirrors the
event-driven path), thread it through `RunToolBatchContext`, and have
both `runDirectToolWithLifecycleHooks` and `runTool` resolve against
the snapshot when present.
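The snapshot semantics, sketched with a Map-backed registry and string args:

```typescript
function resolvePlaceholders(
  args: string,
  registry: ReadonlyMap<string, string>
): string {
  return args.replace(/\{\{(tool\d+turn\d+)\}\}/g, (m, key) =>
    registry.get(key) ?? m
  );
}

// Snapshot once per batch: late writes by sibling calls inside the same
// Promise.all batch are invisible to every call in that batch.
function snapshotRegistry(live: Map<string, string>): Map<string, string> {
  return new Map(live);
}
```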
P2 #19 (LocalProgrammaticToolCalling.ts): the bash bridge required
`python3` on PATH, breaking `run_tools_with_bash` on minimal
containers and Windows hosts without python3 installed. Bridge now
serves a `?mode=text` variant that returns the already-serialized
result body so bash callers can use `curl` (universally available)
without pulling in a JSON parser. Bash helper tries curl first, falls
back to python3 for environments without curl, errors helpfully if
neither is available.
Tests pinned: `ToolNode tool output references › direct-batch
snapshot isolation (Codex P1 #18)` (slow PreToolUse hook + sibling
finishes-first scenario), and `ProgrammaticToolCalling › bash bridge
script does not require python3 (Codex P2 #19)` (asserts both helper
branches present in the generated script with curl preferred).
* fix(local): block `--` end-of-options + gate compile_check overrides (Codex P1 #20, P1 #21)
P1 #20: the destructive-command guard required the target path to
follow option flags directly, so `rm -rf -- "/"` (and similarly
`chmod -R 777 -- "/"`) bypassed the check via the GNU/BSD
end-of-options marker. Both `dangerousCommandPatterns` and
`quotedDestructivePatterns` now allow an optional `(?:--\\s+)?`
between the flags and the destructive target.
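One illustrative pattern with the optional marker (the real guard applies it across both pattern lists):

```typescript
// Flags, then an optional GNU/BSD end-of-options marker, then the
// destructive target.
const destructiveRmSketch = /rm\s+(?:-[a-zA-Z]+\s+)*(?:--\s+)?\/(\s|$)/;
```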
P1 #21: `compile_check` ran the resolved command through `bash -lc`
without going through `validateBashCommand`, so a host-supplied
`command` override (or `compileCheck.command` config entry) could
execute mutating/destructive commands even with `readOnly: true` or
the dangerous-command guard normally in force. Now routes through
`validateBashCommand(detection.command, config)` first; on failure
returns an explanatory ToolMessage with `ran: false` rather than
spawning the process.
Tests pinned: `codex review fixes (round 6) › destructive guard
handles `--` end-of-options` (5 cases including a benign-`--`
no-regression check) and `… › compile_check enforces
validateBashCommand + readOnly` (rm -rf, touch under readOnly,
benign echo). Lint baseline unchanged.
* chore: audit-pass cleanups + missing test coverage (manual review #1, #3, #5, #7, #9, #10, #11)
Comprehensive-review audit findings, fixed in one pass:
#1 (attachments full-file MIME sniff): switch classifyAttachment to
open() + 12-byte header read for sniffing; only readFile() the
full buffer when we're about to embed. A 9 MB image now allocates
12 bytes for sniffing instead of 9 MB.
#3 (fallback walker SKIP_DIRS too small): expanded the Node-fallback
walker's skip set to cover dist/build/.next/.cache/__pycache__/
.venv/venv/target/vendor/coverage/.tox/.mypy_cache and friends.
ripgrep itself respects .gitignore so this only affects the
fallback.
#5 (dynamic fs/promises import): replaced two `await import('fs/
promises').then(...)` calls in CompileCheckTool with a static
`import { readFile, stat } from 'fs/promises'` per AGENTS.md.
#7 (test-only resets in public API): added @internal JSDoc to all
three `_reset*ForTests` functions so IDE autocomplete signals
they aren't part of the SDK surface. (The leading underscore was
convention-only; @internal is the standard tag.)
#9 (legacy fields missing @deprecated): added @deprecated tags to
`LocalExecutionConfig.cwd` and `allowOutsideWorkspace` pointing
to their workspace.* replacements.
#10 (dead lexicallyInside compute in workspace policy hook): dropped
the unused variable. Realpath is the source of truth for both
the symlink-escape and alternate-mount cases; the lexical check
provides no information so we don't pay for it.
#11 (voided unused imports in pi comparison script): removed the
`void readdir; void cp;` workaround and the dead imports they
were silencing.
Bonus: also addressed the comprehensive review's #2 (bitwise OR) by
verifying it was a misread — the code uses `||`, not `|`. Manual
review's reading was wrong; no fix needed.
Larger findings addressed:
#4 (no editStrategies tests): added
`src/tools/local/__tests__/editStrategies.test.ts` with 12 cases
covering exact / line-trimmed / whitespace-normalized /
indentation-flexible match boundaries, start/end-of-file matches,
unicode content, multi-line needles spanning blank lines,
ambiguous-match rejection, and applyEdit semantics.
#12 (no FileCheckpointer tests): added
`src/tools/local/__tests__/FileCheckpointer.test.ts` with 6
cases covering capture+rewind happy path, idempotent capture,
absent-file rewind (deletion), multi-file rewind, oversize-file
tracking-but-not-snapshotting, and rewind across a deleted
parent directory.
Performance:
#8 (lineWindow split-whole-file): replaced `content.split('\\n')`
in `lineWindow` with a newline-walk that's O(start + limit)
instead of O(file). For a 10 MB file with `offset: 1, limit: 10`
this drops from "split the whole file into millions of strings"
to ~10 indexOf calls. Falls back to the simple split when no
limit is supplied.
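The newline-walk might look like this (illustrative; the SDK's `lineWindow` handles more edge cases), using a 1-based `offset` as in the commit:

```typescript
// O(offset + limit) window over a large string: walk newlines with indexOf
// instead of splitting the whole file into an array of lines.
function lineWindow(content: string, offset: number, limit?: number): string {
  if (limit === undefined) {
    // No limit supplied: the simple whole-file split is acceptable.
    return content.split('\n').slice(offset - 1).join('\n');
  }
  let start = 0;
  for (let i = 1; i < offset; i++) {
    const nl = content.indexOf('\n', start);
    if (nl === -1) return ''; // offset points past the end of the file
    start = nl + 1;
  }
  let end = start;
  for (let i = 0; i < limit; i++) {
    const nl = content.indexOf('\n', end);
    if (nl === -1) return content.slice(start); // window runs to end of file
    end = nl + 1;
  }
  return content.slice(start, end);
}
```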
Deferred:
#6 (spill files never cleaned) needs a Run-lifecycle hook to run
cleanup; tracked separately rather than fixed inline.
* fix: comprehensive review (round 7) — bridge hooks, sandbox latch, nested-shell, regex DoS, attachments via WorkspaceFS, expose checkpointer
Codex P2 (sandbox warning latch): syntax preflight, ripgrep-availability,
and node/python/bash probes used to spawn through the public path that
fires `maybeWarnSandboxOff`, both emitting a misleading "sandbox is off"
warning when the run had `sandbox.enabled: true` AND latching
`sandboxOffWarned = true` so a later genuinely-unsandboxed execution
silently skipped its warning. Added an opt-in `{ internal: true }`
options arg to spawnLocalProcess that suppresses both. Threaded it
through every internal probe.
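The warn-and-latch logic can be sketched as follows (names from the commit; the actual signatures are assumptions):

```typescript
// Module-level latch: only genuinely user-visible unsandboxed spawns
// should warn, and only the first one should consume the latch.
let sandboxOffWarned = false;

function maybeWarnSandboxOff(sandboxEnabled: boolean, internal: boolean): string | undefined {
  if (internal || sandboxEnabled || sandboxOffWarned) return undefined;
  sandboxOffWarned = true;
  return 'warning: sandbox is off';
}
```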
Manual #A (P1): the in-process programmatic-tool bridge invoked inner
tools via `executeTools` directly, bypassing every PreToolUse hook the
host registered. ToolNode now plumbs a `hookContext` (registry +
runId/threadId/agentId) into the programmatic-tool factory; the bridge
runs PreToolUse before each inner tool.invoke. `deny` and `ask` (HITL
not reachable from inside an HTTP handler) fail closed; `updatedInput`
is applied to the inner tool args. 4 tests pinned.
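A minimal sketch of the fail-closed bridge (hook and tool shapes are assumptions, not the SDK's real types):

```typescript
type HookDecision = { action: 'allow' | 'deny' | 'ask'; updatedInput?: unknown };
type PreToolUseHook = (toolName: string, input: unknown) => Promise<HookDecision>;

// Run the host's PreToolUse hook before each inner tool.invoke; `deny`
// and `ask` (HITL is unreachable inside an HTTP handler) fail closed.
async function invokeWithHook(
  hook: PreToolUseHook,
  toolName: string,
  input: unknown,
  invoke: (input: unknown) => Promise<unknown>,
): Promise<unknown> {
  const decision = await hook(toolName, input);
  if (decision.action !== 'allow') {
    throw new Error(`Tool "${toolName}" blocked by PreToolUse hook (${decision.action})`);
  }
  // Apply the hook's updatedInput to the inner tool args when present.
  return invoke(decision.updatedInput ?? input);
}
```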
Manual #B (P1): already fixed in c6e2632 (round 9 — compile_check
routes through validateBashCommand). The reviewer was at 75a54b3,
which predates that commit.
Manual #C (P1): `bash -lc "rm -rf $HOME"` (and friends) bypassed the
destructive-command guard because `stripQuotedContent` blanks the
nested payload before the patterns run. Added a third pattern list
`nestedShellDestructivePatterns` that runs against the original
command and matches destructive ops (`rm -rf`, `chmod -R 777`,
`chown -R`) inside `<shell> -[l]?c "..."` and `eval "..."` payloads.
4 tests including a benign-nested-echo no-regression check.
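The shape of such a pattern might look like this (illustrative only; the SDK's `nestedShellDestructivePatterns` list is broader):

```typescript
// Runs against the ORIGINAL command string, before quoted content is
// stripped, so the nested payload is still visible.
const nestedShellDestructive =
  /(?:\b(?:ba|z)?sh\s+-l?c|\beval)\s+["'][^"']*\b(?:rm\s+-[a-zA-Z]*r[a-zA-Z]*f|chmod\s+-R\s+777|chown\s+-R)\b/;

nestedShellDestructive.test('bash -lc "rm -rf $HOME"'); // blocked
nestedShellDestructive.test('eval "chmod -R 777 /"');   // blocked
nestedShellDestructive.test('bash -lc "echo hi"');      // benign, allowed
```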
Manual #D (P2): `fallbackGrep` compiled the model-supplied pattern via
`new RegExp` and ran it across every file, opening the door to
catastrophic-backtracking DoS (`(a+)+$`-style patterns). Added a
guardrail layer: 1024-char pattern length cap, nested-quantifier
heuristic rejection, 5-second wall-clock budget for the overall
search. Surfaced as a structured `FallbackGrepError` -> tool result
with engine: 'node-fallback' + the kind tag. Hosts that need
bulletproof regex safety should install ripgrep (its Rust regex
engine is a finite automaton in the RE2 family, with no
backtracking). 2 tests pinned.
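The first two guardrail layers can be sketched as follows (the constants and the heuristic regex are illustrative; the real code additionally enforces the wall-clock budget):

```typescript
const MAX_PATTERN_LENGTH = 1024;
// Heuristic: a quantified group immediately followed by another quantifier,
// e.g. (a+)+ or (\w*)*, the classic catastrophic-backtracking shape.
const nestedQuantifier = /\([^)]*[+*]\)[+*{]/;

function checkPattern(pattern: string): { ok: boolean; reason?: string } {
  if (pattern.length > MAX_PATTERN_LENGTH) return { ok: false, reason: 'pattern-too-long' };
  if (nestedQuantifier.test(pattern)) return { ok: false, reason: 'nested-quantifier' };
  return { ok: true };
}
```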
Manual #E (P2): `local.fileCheckpointing: true` in RunConfig was a
silent no-op in the auto-bind path — `createLocalCodingTools()` made
a checkpointer that was immediately discarded. Now the auto-bind path
uses `createLocalCodingToolBundle()` when checkpointing is on,
returns the checkpointer through `ResolveLocalToolsResult`, and
ToolNode stashes it + exposes via `getFileCheckpointer()`. 2 tests
pinned.
Manual #F (P3): `classifyAttachment` used host `fs/promises` directly
instead of the configured `WorkspaceFS`. A custom/remote engine could
fail to embed valid attachments OR accidentally read a host path with
the same absolute name. Added optional `fs` arg to classifyAttachment
threaded from `read_file` through to its `WorkspaceFS`. Backward
compatible — defaults to host fs/promises when no fs is supplied.
Tests touched: 768 passing across all suites (was 750ish). Lint
baseline unchanged.
* chore: address audit-of-audit findings (4/4 valid)
Follow-up audit on commit ebf8b6a flagged 4 nits/minors. All valid:
F1 (NIT): the lineWindow perf fix introduced exactly the same
`void <var>` pattern that finding #10 had me remove from
createWorkspacePolicyHook. Embarrassing — dropped the dead
`line` variable + `line++` + `void line`.
F2 (MINOR): the editStrategies test commit message claimed coverage
of "ambiguous-match rejection" but no test actually exercised the
multi-match → null path. Added the missing case for the exact
strategy. Noted in a comment that the looser strategies in the
chain may still resolve duplicates unambiguously by falling
through; whether that's correct is a separate design call.
F3 (NIT): the new FileCheckpointer.test.ts had an `import('fs/
promises')` inside a catch block when `mkdir` was already
available via the static import at the top. Switched to the
static `mkdir` import and restructured the test to
mkdir-then-writeFile unconditionally (the catch-and-retry
pattern was unnecessary defensive plumbing).
F4 (NIT): the unused-imports cleanup missed `sum`/`void sum` in
the comparison harness. Removed the function and its void.
* fix: expose file checkpointer through Run/Graph (audit follow-up)
Round-7 finding E only fixed half the problem: ToolNode got
`getFileCheckpointer()` so direct `new ToolNode(...)` users could
reach it, but the normal `Run.create(...)` path constructed
ToolNodes inline inside `StandardGraph.initializeTools` and dropped
the reference. So `RunConfig.toolExecution.local.fileCheckpointing:
true` was still a no-op for Run callers — and the public JSDoc on
`fileCheckpointing` referenced a `Run.rewindFiles()` that didn't
exist.
Now:
- Graph adds `getOrCreateFileCheckpointer()` — lazily constructs ONE
per-Run checkpointer when local fileCheckpointing is on. Cleared
in `clearHeavyState()` along with the other per-Run state.
- `StandardGraph.initializeTools` threads it into BOTH ToolNode
constructions (the agentContext branch and the traditional-tools
branch). Multi-agent graphs end up sharing a single snapshot
store, so `rewind()` reaches writes from any agent.
- ToolNode accepts `fileCheckpointer?` in its constructor and
passes it to `resolveLocalExecutionTools`, which forwards into
`createLocalCodingToolBundle({ checkpointer })`. Caller-provided
wins over the auto-created one.
- Run gains `getFileCheckpointer()` and `rewindFiles()` — the
documented APIs. `rewindFiles()` returns 0 when checkpointing is
disabled (no-op rather than throwing).
- `LocalExecutionConfig.fileCheckpointing` JSDoc updated to
reference the now-real APIs.
Tests pinned: `… › fileCheckpointer reachable through
Run.getFileChec…
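The lazy one-per-Run pattern described above can be sketched as follows (the class shapes are assumptions; only the names come from the commit):

```typescript
// Minimal stand-in for the real FileCheckpointer.
class FileCheckpointer {
  private snapshots = new Map<string, string>();
  capture(path: string, content: string): void {
    if (!this.snapshots.has(path)) this.snapshots.set(path, content); // idempotent
  }
  rewind(): number {
    const restored = this.snapshots.size; // real impl writes snapshots back to disk
    this.snapshots.clear();
    return restored;
  }
}

class GraphSketch {
  private fileCheckpointer?: FileCheckpointer;
  // One shared instance per Run, so multi-agent graphs share a snapshot store.
  getOrCreateFileCheckpointer(): FileCheckpointer {
    return (this.fileCheckpointer ??= new FileCheckpointer());
  }
  clearHeavyState(): void {
    this.fileCheckpointer = undefined;
  }
}

class RunSketch {
  constructor(private graph: GraphSketch, private checkpointingEnabled: boolean) {}
  getFileCheckpointer(): FileCheckpointer | undefined {
    return this.checkpointingEnabled ? this.graph.getOrCreateFileCheckpointer() : undefined;
  }
  rewindFiles(): number {
    // Returns 0 when checkpointing is disabled: no-op rather than throwing.
    return this.getFileCheckpointer()?.rewind() ?? 0;
  }
}
```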
`CodeEnvFile` and `FileRef` gain optional `entity_id`, propagated by `ToolNode`'s `runTool` and `getCodeSessionContext` paths so codeapi can resolve a per-file sessionKey. Without this, an execute that mixes files uploaded under different entities (e.g. a skill file plus a user attachment) collapses to a single request-level entity and one side 403s during file authorization. Additive: absent `entity_id` preserves existing behavior end-to-end.
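The per-file resolution might be sketched like this (the session-key format is invented for illustration; only the optional `entity_id` field comes from the commit):

```typescript
interface FileRef { file_id: string; entity_id?: string }

// Resolve a per-file session key, falling back to the request-level entity
// when a file doesn't carry its own entity_id (the additive default).
function sessionKeyFor(file: FileRef, requestEntityId: string): string {
  return `${file.entity_id ?? requestEntityId}:${file.file_id}`;
}
```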
* 🧬 feat: Inherit Parent Configurable Into Subagent Workflow
`SubagentExecutor` previously invoked the child workflow with a fresh
`configurable: { thread_id: childRunId }`, dropping every host-set
field on the parent's outer `configurable` (e.g. `requestBody`,
`user`, `userMCPAuthMap`). Subagent tool dispatches to the parent's
`ON_TOOL_EXECUTE` handler then arrived with that stripped view, so any
host that reads context from the configurable saw `undefined` for
fields that work fine for the parent agent — placeholder substitution
and per-user lookups silently failed in subagent contexts even though
the same handler runs in the same process.
Changes:
- `SubagentExecuteParams.parentConfigurable?: Record<string, unknown>` —
optional snapshot of the parent's `config.configurable`. The
executor spreads it into the child's `workflow.invoke` configurable,
scrubbing `thread_id` / `run_id` / `parent_run_id` (so the child
doesn't claim parent's run identity), then sets `thread_id` to the
new `childRunId`. Anything else the host put on parent's
configurable is forwarded as-is — the SDK doesn't interpret
host-defined keys.
- `Graph.ts` — the spawn-subagent tool's `_call` now passes
`parentConfigurable: config.configurable`. The closure already has
the parent's runnable config in scope, so this is a one-line wire-up.
- Live verification script (`src/scripts/subagent-configurable-inheritance.ts`)
spawns a self-subagent that uses a calculator tool, captures
`data.configurable` from each `ON_TOOL_EXECUTE` dispatch, and asserts
every key the parent put on its outer configurable shows up on the
subagent dispatch (and `thread_id` is overridden). Passes against
`gpt-4o-mini`.
- Unit tests cover: forwarding host fields, overriding parent
`thread_id`, scrubbing run-identity fields, and back-compat for
hosts that omit `parentConfigurable`.
`parentConfigurable` is optional, so existing callers continue to
work unchanged — they just keep the old isolated behavior.
* v3.1.78
* 🪄 fix: Drop run-identity scrubbing from inherited configurable
Reviewer pushback: "scrubbing `run_id`/`parent_run_id`/`thread_id` for
safety" was the wrong default. The subagent IS the same run /
conversation from the user's perspective, and `SubagentExecutor`
already commits to that scope by firing `SubagentStart` / `SubagentStop`
hooks against `this.parentRunId`. Scrubbing run-identity fields
contradicts that contract.
Changes:
- Forward `parentConfigurable` verbatim into the child workflow's
`configurable` — no key-level deletes. `thread_id` falls back to
`childRunId` only when parent didn't supply one (was: `thread_id`
always overridden to `childRunId`).
- Updated the live verification script to capture run-identity fields
alongside host-set keys and dump them on each dispatch. Empirical
finding (verified against gpt-4o + LangGraph runtime):
* `thread_id`: propagates as inherited ✅
* `parent_run_id`: propagates as inherited ✅
* `run_id`: LangGraph runtime overwrites at child invoke time —
the SDK forwards what we put in, but the runtime swaps it out
before tools see it. Host can't reliably propagate `run_id` via
configurable.
Type doc updated to reflect this so callers aren't misled.
- The script now uses a non-self subagent (math-worker with calculator;
parent has no tools) so the spawn path is forced and parent vs
subagent dispatches are reliably distinguishable by `agentId`.
- Unit tests updated: replaced the "scrubs run-identity fields" test
with "inherits parent thread_id when supplied" + "falls back to
childRunId when not" + "forwards run-identity fields verbatim into
child invoke configurable" (with a note that LangGraph runtime
overrides `run_id` downstream).
A future revision will likely make this inheritance configurable per
spawn type — background / async subagents may want isolation rather
than sharing parent's host context. Out of scope here.
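The final inheritance semantics reduce to something like this sketch (the helper name is invented; field names come from the commit):

```typescript
// Forward the parent's configurable verbatim; thread_id falls back to the
// child run id only when the parent didn't supply one.
function buildChildConfigurable(
  parentConfigurable: Record<string, unknown> | undefined,
  childRunId: string,
): Record<string, unknown> {
  return {
    ...parentConfigurable,
    thread_id: parentConfigurable?.thread_id ?? childRunId,
  };
}
```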
Created by pull[bot] (v2.0.0-alpha.4)