Skip to content

fix: merge latest dev updates into main#18

Merged
nadav-node9 merged 22 commits intomainfrom
dev
Mar 16, 2026
Merged

fix: merge latest dev updates into main#18
nadav-node9 merged 22 commits intomainfrom
dev

Conversation

@nadav-node9
Copy link
Contributor

Auto-generated PR

Merge latest dev changes into main to trigger a release.

⚠️ Important: When you click Squash and Merge, ensure the commit message starts with:

  • fix: to publish a Patch release (0.0.X)
  • feat: to publish a Minor release (0.X.0)
    If it starts with chore:, no NPM package will be published!

nadavis and others added 14 commits March 14, 2026 16:41
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- native.ts: add extractContext + formatArgs with matchedField/matchedWord
  tracing for "Context Sniper" popup — shows dangerous word in context
- core.ts: extend evaluatePolicy return with matchedField/matchedWord;
  per-field scan after dangerous word found; pass through authorizeHeadless
- daemon/index.ts: gate SSE broadcast and browser open on browser config flag
- LICENSE/package.json/README.md: MIT → Apache-2.0
- .github/workflows/ai-review.yml: add paths-ignore to prevent self-modification
- scripts/ai-review.mjs: upgrade to claude-sonnet-4-6, max_tokens 2048

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wrap diff in <diff>...</diff> markers with untrusted-content notice to mitigate prompt injection
- Surface truncation note in posted PR comment when diff exceeds MAX_DIFF_CHARS
- Downgrade API errors to warning comments + exit 0 so Anthropic outages don't block PRs
- Pin @anthropic-ai/sdk@0.78.0 and @octokit/rest@22.0.1 to prevent supply-chain drift
- Add explicit permissions block (contents: read, pull-requests: write)
- Exclude dependabot[bot] from triggering review
- Add fetch-depth: 0 to checkout step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… main

- ai-review.yml: replace AUTO_PR_TOKEN with GITHUB_TOKEN (permissions block
  already scopes it correctly — no broad PAT needed)
- ai-review.yml: add --ignore-scripts to npm install to block malicious
  postinstall hooks from transitive dependencies
- sync-dev.yml: new workflow — after every push to main, merge main back into
  dev so release-bot version bumps don't cause recurring README conflicts on
  the next dev -> main PR

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move @anthropic-ai/sdk and @octokit/rest into devDependencies and switch the
ai-review workflow from bare npm install to npm ci --ignore-scripts. This
locks all transitive dependencies to the committed lockfile, eliminating
supply-chain drift on every CI run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- context-sniper.ts (new): shared RiskMetadata type, smartTruncate,
  extractContext (returns {snippet, lineIndex}), computeRiskMetadata
- native.ts: import from context-sniper, use .snippet on extractContext calls
- core.ts: add tier to evaluatePolicy returns; compute riskMetadata once in
  authorizeHeadless; pass it to initNode9SaaS, askDaemon, notifyDaemonViewer
- daemon/index.ts: store and broadcast riskMetadata in PendingEntry
- daemon/ui.html: renderPayload() uses riskMetadata for intent badge, tier,
  file path, annotated snippet, matched-word highlight; falls back to raw args

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Code Review: Context Sniper Feature

Correctness & Edge Cases

extractContext — word boundary regex on special characters: \b doesn't work correctly with non-ASCII characters or tokens like --delete-orphans, rm -rf, etc. The replace escaping handles special regex chars, but \b assertions around punctuation-heavy CLI flags will silently fail to match, falling back to full-text — unexpected behavior for the primary use case.

EDIT intent detection is fragile: Detecting EDIT by presence of old_string + new_string is a heuristic. If another tool coincidentally uses those field names, it gets misclassified. Consider also checking toolName (e.g., str_replace_editor).

tier: 6 default in authorizeHeadless: policyResult.tier ?? 6 — tier 6 is "dangerous word" but this fallback fires for any policy result missing a tier tag. Several evaluatePolicy return paths (e.g., nuclear protection inline exec before this PR) now have explicit tiers, but any future path that omits tier silently gets tier 6. This is misleading. A dedicated tier: 0 or undefined sentinel would be safer.

Truncated diff: src/ui/native.ts diff is cut off mid-import (fr instead of from). Can't review what's actually changed there — incomplete PR.

Security Issues

Unvalidated riskMetadata in daemon: In src/daemon/index.ts, riskMetadata is parsed from the POST body via JSON.parse(body) and stored/forwarded with no validation. Any client can POST arbitrary riskMetadata content. Since this is rendered in the browser UI via esc(), XSS is mitigated, but a malicious/confused caller could forge tier: 1 or a misleading blockedByLabel, undermining the trust model. The daemon should treat riskMetadata as display-only decoration and never use it for access control decisions — this appears to be the case, but it should be documented explicitly.

JSON.parse(body) without try/catch: Pre-existing issue, but the new destructuring of riskMetadata makes a malformed body harder to debug. Still no parse guard visible in the diff.

Race Conditions / Async

No new async paths introduced. The metadata is computed synchronously before the race engine forks — correct approach.

TypeScript Type Safety

riskMetadata: riskMetadata ?? undefined is redundant — riskMetadata is already undefined if absent from the parsed JSON. Minor noise but harmless.

The RiskMetadata type is accepted as unknown from the wire in the daemon — should be validated or cast explicitly rather than trusted as the correct shape.

Performance

extractContext calls split('\n') and iterates lines twice (map + find). For large payloads this is fine given the 500-char truncation applied first — except truncation happens after the split when lines.length > 7. A 50,000-line file gets fully split before truncation. Apply smartTruncate as a pre-filter or add a line count guard.

Test Coverage Gaps

  • extractContext with matched words containing regex metacharacters (e.g., $PATH, *.js)
  • extractContext when matchedWord appears only in comments (non-comment preference branch)
  • computeRiskMetadata with Gemini-style stringified JSON
  • Daemon endpoint with a missing/malformed riskMetadata field
  • The truncated native.ts change has zero coverage visibility

Summary

The core design is sound — computing metadata once and flowing it downstream is the right call. The main issues are: the incomplete native.ts diff needs re-review, the tier ?? 6 default is a future footgun, the word-boundary regex won't reliably match CLI tokens, and the daemon needs explicit documentation (or a type guard) that riskMetadata is display-only and never trusted for enforcement decisions.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

When the Node9 cloud responds with shadowMode:true (org is in shadow
mode), print a yellow warning to stderr instead of blocking the agent.
The developer sees exactly why it was flagged without being interrupted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Code Review: Context Sniper Module

Correctness & Edge Cases

extractContext — word boundary regex on special identifiers: \b doesn't work correctly for words containing hyphens, dots, or non-ASCII characters (e.g., rm -rf, file.txt, environment variable names). A match could silently fall back to the full-text path without the caller knowing why.

computeRiskMetadata — EDIT intent detection is fragile: The heuristic old_string !== undefined && new_string !== undefined is specific to one tool schema (Claude Code's str_replace). Other editors or future tools with different schemas will silently fall back to EXEC. This assumption should be documented as a known limitation or made configurable.

computeRiskMetadata — context extracted from new_string only: For EDIT intent, context comes from new_string but the match may have triggered on old_string. If the dangerous word appears only in the old content being replaced (not introduced), the snippet will be misleading to the approver.

Default tier fallback: policyResult.tier ?? 6 in core.ts is a silent catch-all. If a new policy path is added and forgets to set tier, it silently gets classified as tier 6 (dangerous word). This will produce incorrect UI labels. A never-exhaustion check or an explicit undefined tier rendering would be safer.

Security Issues

riskMetadata accepted from untrusted POST body in daemon: In src/daemon/index.ts, riskMetadata is parsed directly from the HTTP body with no validation or schema check. A malicious or compromised agent process could inject arbitrary riskMetadata — including a false tier: 1 or a misleading contextSnippet — to manipulate the human approver's decision. This is a significant trust boundary violation. The daemon should ignore client-supplied riskMetadata and recompute it server-side, or at minimum validate the shape strictly.

editFilePath rendered unescaped as a file path: The esc() call in the HTML template handles XSS, but there's no path traversal or length validation on editFilePath before it's stored and displayed. An agent could supply a 10 000-character path.

JSON.parse without try/catch in daemon: The body parse at line JSON.parse(body) has no error handling. A malformed body throws and likely crashes the request handler or produces an unhandled rejection.

Async / Race Conditions

No new async paths introduced. The riskMetadata is computed synchronously before entering the race engine, which is the right approach.

TypeScript Type Safety

JSON.parse(body) returns any with no runtime validation — the destructured fields (toolName, args, riskMetadata) are all untyped. Given this is the security approval path, a Zod or manual type guard is warranted.

Performance

extractContext calls text.split('\n') and iterates the full line array twice (map + find). For very large payloads (e.g., a 50 000-line SQL dump), this blocks the event loop. Consider a line-count guard before splitting, or lazy iteration.

Test Coverage Gaps

  • extractContext with matchedWord containing regex special characters (e.g., rm -rf, $VAR)
  • computeRiskMetadata when both old_string and new_string are present but the match is in old_string
  • Daemon /check endpoint receiving a crafted riskMetadata payload with manipulated tier/intent

Summary

The untrusted riskMetadata accepted over the daemon's HTTP interface is the most critical issue — it directly undermines the integrity of what the human approver sees. Everything else is medium-severity. The shadow mode addition looks clean and correct.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

…er, and integration tests

- Add Zod v3 schema validation (config-schema.ts) with clear error messages for
  bad config.json — catches literal newlines, invalid regex, unknown keys, bad enums
- Fix silent JSON parse fallback in tryLoadConfig: bad config now warns to stderr
  instead of silently using DEFAULT_CONFIG (which had cloud:true causing unexpected
  browser/cloud popups when config was invalid)
- Fix auditLocalAllow fire-and-forget killed by process.exit: audit mode path now
  awaits the POST so SaaS receives the event before the process exits
- Gate all auditLocalAllow calls on approvers.cloud so cloud:false (privacy mode)
  never sends data to SaaS
- Fix double browser windows when cloud+browser both enabled: RACER 3 (browser)
  now skips when cloudEnforced, preventing duplicate daemon /check entries
- Fix calledFromDaemon guard on terminal status messages to prevent duplicate output
- Add check.integration.test.ts: 20 end-to-end tests spawning real node9 check
  subprocess with isolated HOME dirs and in-process mock SaaS server

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Review

Deletions (config files)

Removing node9.config.json_ and node9.config.json__ is good housekeeping. One note: both files had trailing whitespace on the "push" and "force" lines — worth a .editorconfig or lint rule to prevent recurrence. No security concern in the deletion itself.

Zod dependency

Adding Zod at ^3.25.76 is a reasonable choice for config validation. However:

  • The diff shows no actual usage of Zod yet — no schema definitions, no parse/safeParse calls in any source file. This adds a production dependency with no shipped code. Either the usage is in a file not shown, or Zod was added prematurely. Clarify or defer.
  • The lockfile integrity hash for Zod should be verified against the official registry before merging into a security tool.

Integration tests (check.integration.test.ts)

Positive: Real subprocess spawning, isolated HOME per test group, NODE9_NO_AUTO_DAEMON guard, async runner for tests with a live HTTP server — all solid design choices.

Issues:

  1. Diff is truncated — the file cuts off mid-test at line 661. The malformed-JSON test is incomplete. Review cannot be considered complete without seeing the full file.

  2. spawnSync timeout race: timeoutMs = 8000 with spawnSync blocks the Node event loop for up to 8 seconds on hang. For a CI suite with many tests this is a silent pipeline killer. Consider a shorter default (2–3s) for fast-path tests.

  3. cleanupHome is not called on test failure: If an expect throws before afterEach, the temp directory is still cleaned up (vitest runs afterEach regardless) — this is fine. But the runCheckAsync timer fires resolve({ status: null }) on timeout without logging the child's stderr, making timeout failures opaque.

  4. Mock server teardown is not awaited: mockServer.close() in afterEach is fire-and-forget. If a subsequent test starts before the port is fully released, there's a potential bind conflict. Use mockServer.close(callback) wrapped in a Promise.

  5. Timing assertion is fragile: The await new Promise((res) => setTimeout(res, 200)) after the async check to verify auditCalls.length is a time-based assertion. This will produce flaky results under load. The mock server should signal completion via a Promise that resolves when the expected request arrives.

  6. Security gap in mock server: The mock /audit endpoint accepts any POST body without validating the Authorization/apiKey header. The test should assert that the correct API key is sent, since credential forwarding is a security-critical behavior.

  7. cwd and HOME are both set to tmpHome: This is intentional to avoid inheriting the project's node9.config.json, which is good. Worth a comment explaining why cwd is set, not just HOME.

  8. No test for allowPaths rule evaluation — a meaningful security regression surface — and no test for the ignoredTools glob patterns (list_*, task*). These are higher-value gaps than the audit-mode cloud-gating tests.

Summary

Block on: truncated diff, premature Zod dependency, flaky timing assertion in audit+cloud test, missing mock server teardown await. The overall test architecture is sound.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

@github-actions
Copy link
Contributor

🤖 Claude Code Review

Review

Cleanup (config file deletions)

Removing node9.config.json_ and node9.config.json__ is correct housekeeping. Those double-underscore-suffixed files should never have been committed — they may have leaked default policy configuration including slackChannel references. Good removal.

Zod dependency addition

Adding Zod for config validation is a reasonable choice. One concern: the lockfile shows 3.25.76 pinned, but package.json uses ^3.25.76. Zod has had breaking minor changes historically. For a security tool, consider pinning exactly ("zod": "3.25.76") rather than allowing automatic minor bumps that could silently change validation behavior.

Integration tests — correctness issues

Flaky timing in audit/cloud test (critical):

await new Promise((res) => setTimeout(res, 200));
expect(auditCalls.length).toBeGreaterThan(0);

A 200ms fixed delay is unreliable in CI. If the CLI process hasn't finished its HTTP POST within that window, the test produces a false negative. Use a polling loop or expose a callback. This is especially fragile if the audit POST is fire-and-forget (not awaited in the CLI path).

runCheckAsync resource leak:
If child.kill() is called on timeout, the stdout/stderr streams are not explicitly destroyed. On some Node versions this can leave dangling listeners. Add child.stdout.destroy(); child.stderr.destroy() after kill.

spawnSync event-loop blocking (noted in comments but still risky):
The comment warns against using runCheck when a mock server is involved, but nothing prevents a future test author from accidentally using the sync version. Consider removing runCheck entirely or renaming it runCheckSync_UNSAFE to make the hazard obvious.

Security gaps in test coverage

No test for path traversal in allowPaths rules — e.g., rm ../../etc/passwd matching a tmp/** glob incorrectly. This is the highest-risk policy logic.

No test for toolInspection field extraction failure — what happens when bash is called without a command field? Does it default-allow or default-block? That edge case should be explicit.

No negative test for ignoredTools glob patternstask* matching taskExecuteArbitraryCode should be tested explicitly since wildcards in an allowlist are dangerous.

dangerousWords substring matching not tested for false positives — does "push" in dangerousWords also block git push --tags or a tool named pushNotification? Test the boundary.

Type safety

The diff doesn't show the Zod schema definitions, so it's impossible to verify whether all config fields are properly typed. If smartRules conditions or toolInspection mappings are typed as any after parsing, Zod's value is negated.

Missing test

The diff is truncated — the malformed JSON test is cut off mid-string. Ensure it actually covers the fallback-to-defaults behavior and doesn't silently pass due to an incomplete assertion.

Overall

The structure is solid and the intent is right, but the flaky timing assertion in the cloud audit test is the most actionable issue to fix before merging.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

…plied

Previously tryLoadConfig warned about invalid fields (e.g. mode:"bad-mode")
but still returned the raw object, letting them override valid values from
higher-priority config layers. A project-level node9.config.json with
mode:"bad-mode" would override the global mode:"audit", bypassing audit mode
and triggering the full cloud approval race unexpectedly.

sanitizeConfig() now drops top-level keys that fail Zod validation so invalid
project configs cannot corrupt the effective merged config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Code Review: Node9 Proxy Integration Tests + Config Cleanup

What's in this PR

  • Deletes two scratch config files (node9.config.json_, node9.config.json__)
  • Adds zod as a production dependency
  • Adds 661-line integration test file for the check command

Issues Found

1. Zod added but never used in this diff (medium)
zod is added to package.json and package-lock.json but there's no corresponding source change showing where it's consumed. If it's for config validation, that code should be in this PR. Merging a dependency without the consuming code makes the addition unreviable from a security standpoint — this is a security tool and every new dependency is an attack surface.

2. Flaky timing in audit cloud test (high)

await new Promise((res) => setTimeout(res, 200));
expect(auditCalls.length).toBeGreaterThan(0);

A 200ms arbitrary sleep is a classic flakiness source, especially in CI environments under load. If auditLocalAllow is truly awaited in the production path, the test should be able to assert without a sleep. If it's fire-and-forget, that's a correctness bug in core.ts worth flagging separately.

3. spawnSync + mock server warning is documented but risky (medium)
The comment correctly explains that spawnSync blocks the event loop, which is why runCheckAsync exists. However, the sync runner is still used in sections 1–5 where no mock server is present. If a future test author copies the pattern and accidentally uses runCheck with a server-dependent test, it will silently hang until timeout. Consider removing runCheck entirely and standardizing on runCheckAsync.

4. Truncated test file (low/blocker for review)
The diff cuts off mid-test at line 661 inside the malformed JSON test case. The test suite is incomplete as shown — cannot fully review section 7 or any sections beyond it.

5. No cleanup of spawnSync timeout SPID on Windows (low)
spawnSync with timeout leaves zombie processes on Windows when the timeout fires. Not critical for Linux/macOS CI but worth noting.

6. Scratch files with double-underscore suffix deleted (good)
node9.config.json_ and node9.config.json__ were clearly development artifacts. Good to remove. Verify neither is referenced anywhere in scripts or docs.


What Looks Good

  • Isolated HOME per test group — correct approach for config isolation
  • NODE9_NO_AUTO_DAEMON=1 and NODE9_TESTING=1 guards are appropriate
  • beforeAll build check with a clear error message is good DX
  • Async/sync runner split is well-reasoned and documented

Summary

Block on: Zod dependency without consuming code, truncated diff. Fix before merge: the timing-based assertion in the audit cloud test.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Code Review

What's in this diff

  • Deletion of two backup config files (node9.config.json_, node9.config.json__)
  • Addition of zod as a runtime dependency
  • A new integration test file (check.integration.test.ts, partially shown — truncated mid-test)

Issues Found

1. Timing race in audit+cloud test (High)

await new Promise((res) => setTimeout(res, 200));
expect(auditCalls.length).toBeGreaterThan(0);

A fixed 200ms sleep is unreliable — CI machines under load will flake. If auditLocalAllow is supposed to be awaited by the CLI before exit, you should instead assert on process close (which runCheckAsync already waits for). If it's fire-and-forget in core.ts, that's a correctness bug in the production path — the audit POST may never complete if the process exits first. Either way, the test is masking this ambiguity.

2. Zod added as dependency but no schema usage is shown (Medium)
Zod 3.25.76 is a non-trivial addition to the critical path bundle. The diff adds it to package.json and package-lock.json but shows no usage. Since every AI tool call goes through this proxy, confirm: Is Zod validation happening inline on the hot path (/check)? If so, benchmark it. If it's only used for config loading at startup, that's acceptable — but this should be explicit in the PR description.

3. Backup config files contained sensitive-looking defaults (Low/Hygiene)
The deleted files had "slackChannel": "#general" in both production and development environments. Good that they're removed, but check git history hasn't persisted any real credentials or internal channel names from earlier commits.

4. Test isolation: cwd and HOME both set to tmpHome (Low)

runCheck({ ... }, { HOME: tmpHome }, tmpHome)

Setting cwd to tmpHome is intentional to avoid picking up the project's node9.config.json, but it also means relative-path resolution in the CLI (if any) anchors there. This is probably fine, but worth a comment explaining why both are set.

5. Diff is truncated — Section 7 is incomplete
The malformed JSON test is cut off mid-string. It's impossible to review the negative/error-handling tests for config validation, which are arguably the most important for a security tool (malformed config should fail closed, not open). Ensure the full test is reviewed before merge.

6. No test for the ignoredTools wildcard bypass
The ignored tools list includes patterns like task* and list_*. There's no test verifying that a tool named task_drop_all_tables (matching task*) is fast-pathed to allow — this is a significant security gap worth covering explicitly.


What Looks Good

  • Using spawnSync vs spawn is correctly reasoned (event loop blocking with mock servers)
  • makeTempHome/cleanupHome pattern is clean and correct
  • NODE9_NO_AUTO_DAEMON=1 guard is the right approach for integration tests
  • Deleting the __/_ backup configs is overdue hygiene

Bottom line: Don't merge until the truncated Section 7 is reviewed, the timing race is resolved, and Zod's placement on the hot path is justified.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

…l test, improve comments

- Remove unreliable 200ms sleep from audit+cloud test: auditLocalAllow is
  awaited before process.exit so the POST is done by the time the subprocess
  closes; if it ever races here it would be a production bug too
- Add task* wildcard test: task_drop_all_tables must be fast-pathed to allow
  (documents the intentional security trade-off of user-configured wildcards)
- Expand runCheck docstring explaining why cwd=tmpHome is needed alongside
  HOME=tmpHome (avoids inheriting the repo's own node9.config.json)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Code Review: Node9 Proxy — Integration Tests + Zod + Config Cleanup

Overall Assessment

Mostly solid work. Cleaning up the double-underscored config files is good housekeeping, and the integration test approach (real subprocess, isolated HOME) is the right architecture for a security tool. Several issues need attention.


Security Issues

1. ignoredTools entries in deleted configs deserve a comment, not silent removal.
The deleted node9.config.json_ files included write, edit, multiedit, and notebookedit in ignoredTools — these are write operations. Deleting the example configs without documenting why these were ignored (or flagging the security risk) leaves future users without guidance. If these were sample configs, they were misleading; confirm a canonical example exists elsewhere.

2. Zod version pinning is loose.
"zod": "^3.25.76" accepts any 3.x minor/patch. Given this is a security tool parsing untrusted AI agent payloads with Zod, consider ~3.25.76 (patch-only) or an exact pin to reduce supply-chain exposure.


Test Correctness Issues

3. Truncated diff — last test is cut off.
"audit mode + cloud:false → does NOT POST to /audit" ends mid-line with const r = awai. This test is not reviewable and cannot merge as-is.

4. task_drop_all_tables test documents a risky behavior without asserting the mechanism.
The comment says "the pattern is opt-in by the user" — correct, but the test name calls this a "fast-path to allow" which could be misleading. More importantly, there's no test asserting that a tool matching task* that also contains a dangerousWord is still bypassed. That interaction (ignoredTools vs. dangerousWords precedence) is the actual security-critical edge case to document.

5. Mock server error handling is incomplete.
In the audit mode + cloud gating suite, mockServer.close() in afterEach is called without a callback — if the test fails mid-request, the server may not be fully closed before the next test starts. Use mockServer.close(done) or return a Promise.

6. runCheckAsync has a process leak risk.
If child.kill() is called on timeout, child.on('close') will still fire afterward. clearTimeout(timer) won't have run yet when close fires post-kill, so resolve is called twice. Add a let resolved = false guard.


Minor Issues

7. NODE9_TESTING=1 is set but its behavior is never documented in the test file. A comment explaining what this flag does in core.ts / daemon would help future maintainers.

8. No test for malformed JSON payload to check. The /check endpoint and CLI are the trust boundary — a fuzz/malformed-input test should be here.


What Looks Good

  • Using spawnSync vs. spawn based on whether a mock server is in-process is correct and well-documented.
  • makeTempHome isolation pattern is clean.
  • beforeAll build check prevents silent false-passes.
  • Zod addition to dependencies is appropriate if input validation is being added to the policy engine.

Block on: truncated test, double-resolve in async runner, security note on ignoredTools write operations.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

…noredTools precedence tests

- Add `let resolved = false` guard in runCheckAsync to prevent double-resolve
  when child.kill() is called on timeout (close event fires after kill)
- Fix mockServer.close() in afterEach to return a Promise (was fire-and-forget)
- Document NODE9_TESTING=1 behavior in file header comment
- Add runCheck/runCheckAsync raw string support for malformed payload testing
- Add section 10: malformed JSON payload tests (non-JSON, empty, partial JSON)
- Add ignoredTools precedence test: task* wildcard + dangerous word in input
  documents that ignoredTools fast-path bypasses dangerousWords (by design)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Review

What's in this PR

  1. Deletion of two leftover scratch config files (node9.config.json_, node9.config.json__) — clean-up, no issues.
  2. Addition of zod as a runtime dependency.
  3. A new integration test file (check.integration.test.ts, diff is truncated at ~746 lines).

Issues Found

Security — ignoredTools precedence documented but not gated

The test comment explicitly documents that ignoredTools short-circuits before dangerousWords evaluation, and a tool named task_drop_all_tables with dangerous input is silently allowed. The test is correct about current behavior, but this is a security footgun worth calling out in a review: the behavior should be prominently documented in the config schema/README, and ideally a warning should be emitted at daemon startup if a write-capable tool pattern (e.g., bash, shell) appears in ignoredTools.

Test correctness — ambiguous assertion in "safe command" test

if (r.stdout.trim()) {
  const parsed = JSON.parse(r.stdout.trim());
  expect(parsed.decision).not.toBe('block');
}

This is a conditional assertion — if stdout is empty (the happy path), the expect never runs. The test passes vacuously even if the behavior regresses in a way that produces unexpected output. Use an unconditional assertion.

Test robustness — spawnSync timeout produces status: null

runCheck returns { status: null } on timeout, but several tests do expect(r.status).toBe(0) with no null-guard. If the subprocess hangs (e.g., approval UI leaks through despite NODE9_TESTING=1), the test fails with a confusing assertion error rather than a clear timeout message.

Dependency — zod version pinning

^3.25.76 allows any 3.x minor/patch upgrade. For a security tool on the critical path of every agent tool call, pin to an exact version (3.25.76) or at least lock the minor (~3.25.76). A zod minor release with a parsing regression could silently misclassify payloads.

Truncated diff

The diff cuts off mid-test inside beforeEach for the "audit mode + cloud gating" suite. It's impossible to review the mock server teardown, async runner usage, or assertions for that group. Ensure the mock server has a proper afterEach that calls mockServer.close() — leaked servers across tests will cause port conflicts or hang the suite.

Missing test coverage (visible from what's present)

  • No test for malformed/invalid JSON payload to check (the runCheckAsync helper supports raw strings but no test uses it in the visible diff).
  • No test for the toolInspection field mapping (verifying that bashcommand extraction works correctly).
  • No concurrency test: two simultaneous check calls for the same tool — relevant to the race engine in core.ts.

Minor Notes

  • makeTempHomeRaw is defined but unused in the visible diff — confirm it's used later in the truncated portion.
  • cleanupHome uses fs.rmSync with force: true, which silently swallows errors if cleanup fails. Acceptable for tests.
  • The beforeAll binary existence check is good practice.

Overall structure is solid. Address the conditional assertion and review the truncated server-teardown section before merging.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

The CLI intentionally exits 0 on unparseable JSON (fail-open policy):
a transient serialization error must not block the AI session mid-flight.
The test was asserting the opposite. Updated all three malformed-payload
tests to verify graceful failure (no crash/stack trace) rather than an
error exit code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Claude Code Review

Code Review: Node9 Proxy Integration Tests + Zod Dependency

What's in this PR

  • Deletes two backup config files (node9.config.json_, node9.config.json__)
  • Adds zod as a production dependency
  • Adds src/__tests__/check.integration.test.ts (744 lines, diff truncated)

Issues Found

1. Zod added but never used in this diff (Medium)
zod is added to package.json and package-lock.json but no source file in this diff imports it. If it's being used in core.ts or daemon/index.ts for input validation (which would be the right use), those changes are missing from this PR and can't be reviewed. For a security tool, unreviewed schema validation at the /check endpoint is a significant gap — the diff is incomplete.

2. spawnSync timeout behavior is silent (Medium)
In runCheck, when spawnSync times out, result.status is null and result.error is set but discarded. The returned { status: null, ... } will cause tests to fail with confusing assertion mismatches rather than a clear "timed out" message. Add a check for result.error and throw or surface it.

3. Double-resolve guard is unnecessary and misleading (Low)
In runCheckAsync, the comment says close fires after kill() — that's true, but the resolved flag is the correct fix. However, the timer calls child.kill() then settle(...) with status: null. The subsequent close event will fire with the signal's exit code (e.g. -1 or null), which settle correctly ignores. This is fine but worth a comment clarifying status: null means "killed by timeout" vs "process reported null."

4. makeTempHomeRaw is defined but never used in the visible diff (Low)
Dead code in test utilities is harmless but suggests the diff is truncated and the full test file may test the JSON-parse error path — which would be important to see reviewed.

5. Security concern: ignoredTools precedence test documents unsafe behavior without a warning in code (Medium)
The test task* wildcard + dangerous word in input → ignoredTools wins correctly documents that ignoredTools bypasses dangerousWords. This is a footgun — the comment is good, but the actual policy engine in core.ts should enforce that ignoredTools only matches read-only/safe tool classes, or at minimum log a warning when a toolInspection-mapped tool (like bash) is added to ignoredTools. No code change here guards against this.

6. Config file cleanup (Good)
Deleting the node9.config.json_ and node9.config.json__ backup files is correct — these should never have been committed, especially since they contain slackChannel config and policy that could mislead future contributors.

7. Test coverage gap: daemon interaction
The visible tests only cover authorizeHeadless paths. There are no tests for the daemon /check/wait/:id/decision/:id flow, which is where race conditions in the async approval path would live.


Summary

The test suite is well-structured and the helper design is solid. The main concerns are: the incomplete diff hiding where/how zod is used, silent timeout failures in runCheck, and no daemon endpoint coverage. The ignoredTools precedence behavior needs a runtime warning in the engine, not just a test comment.

⚠️ Note: This diff exceeded 20,000 characters and was truncated. The review above covers only the first portion of the changes.


Automated review by Claude Sonnet

@nadav-node9 nadav-node9 merged commit 76b9747 into main Mar 16, 2026
8 checks passed
nadav-node9 pushed a commit that referenced this pull request Mar 16, 2026
## [1.0.9](v1.0.8...v1.0.9) (2026-03-16)

### Bug Fixes

* merge latest dev updates into main ([#18](#18)) ([76b9747](76b9747))
@nadav-node9
Copy link
Contributor Author

🎉 This PR is included in version 1.0.9 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants