refactor(agent): decouple nvm bootstrap from agent Run/InstallMCP commands by zpzjzj · Pull Request #76 · alibaba/skill-up

zpzjzj · 2026-06-01T03:12:34Z

Summary

Splits the nvm/Node bootstrap and the agent invocation into two distinct rt.Exec calls so failures can be cleanly attributed and the agent's stderr buffer stays free of bootstrap noise.

Fixes #72

Problem

nodeRuntimeCommandWithGuard (internal/agent/node_install.go) packed the bootstrap and the agent's real command (claude -p, codex exec --json, claude mcp add, codex mcp add) into a single multi-line shell script handed to one rt.Exec call. Consequences:

Bootstrap failures (curl timeout to the nvm installer, sha256 mismatch, nvm install errors against an offline mirror) surfaced as claude-code run failed: ... / codex run failed: .... Hard to triage without reading the wrapped script.
Bootstrap stderr was concatenated into result.Stderr, which is then scanned by providerAuthFailureSignal / providerRateLimitSignal — widening their false-positive surface.
Bootstrap stdout could prepend stdout.json artifacts.
Bootstrap re-fired silently at Run time when Install failed or wasn't called, masking what should be a clear "Install was never called" failure.

Fix

New ensureNodeRuntime(ctx, rt, cliBin, opts) helper runs only the bootstrap as its own rt.Exec. The script's first line short-circuits with exit 0 when command -v <cli> succeeds, so the happy path is one extra command -v check with no output.

Call site (before)	After
`claude_code.go:127` `nodeRuntimeCommandWithGuard("claude", ...)` (Run)	`ensureNodeRuntime` → Exec
`claude_code.go:72` `nodeRuntimeCommandWithGuard("claude", ...)` (MCP)	`InstallMCP` calls `ensureNodeRuntime` once before the per-server loop
`codex.go:236` (Run)	`ensureNodeRuntime` → Exec
`codex.go:136`, `:171` (MCP stdio / http+bridge)	wrap-once in `InstallMCP`

nodeRuntimeCommandWithGuard and nodeRuntimeCommand are removed — no remaining production callers. qodercli is untouched (it never used the wrapper).

Errors from ensureNodeRuntime wrap with node bootstrap failed: ... so they're trivially distinguishable from agent run failures in result handling and signal detectors. Bootstrap stderr no longer reaches result.Stderr of the agent Exec.

A follow-up commit (de46509) preserves the existing convention of returning a non-nil SessionResult{Engine, ExitCode: 1, DurationMs} on bootstrap failure, so the evaluator still records telemetry for failed cases.

Test plan

go build ./...
go test ./... (full suite green)
make verify (gofmt + vet + revive + golangci-lint, 0 issues)
New tests in internal/agent/node_install_test.go: script shape (CLI short-circuit at top), Exec-error wrapping, non-zero-exit wrapping with stderr surfacing, empty-cliBin guard
Updated fake runtimes (claudeCodeTestRuntime, codexTestRuntime) recognize bootstrap commands by content sniffing and short-circuit them so existing lastCommand / execResult assertions still observe the agent command
Manual smoke: run an eval with claude-code and verify trace shows two distinct rt.Exec calls (bootstrap + run) and that stdout.json contains only stream-json from claude
Negative smoke: set NVM_SOURCE to an unreachable host; failure should report node bootstrap failed: rather than claude-code run failed:

🤖 Generated with Claude Code

…ate Exec Previously `nodeRuntimeCommandWithGuard` packed the nvm/Node bootstrap and the agent's real command (`claude -p`, `codex exec --json`, MCP install) into a single multi-line shell script handed to one `rt.Exec` call. When the bootstrap failed (curl timeout to nvm installer, sha256 mismatch, nvm install errors against an offline mirror), the failure surfaced as a `claude-code run failed` / `codex run failed` and bootstrap stderr was concatenated into `result.Stderr`, which then fed `providerAuthFailureSignal` and `providerRateLimitSignal` — widening their false-positive surface. Replace the wrapper with `ensureNodeRuntime`, which runs only the bootstrap as its own `rt.Exec` (short-circuited by `if command -v <cli>; then exit 0`). Run/InstallMCP call it first, then Exec the agent command separately. Each Exec now has its own exit code, stderr, stdout; bootstrap failures wrap as `node bootstrap failed: ...` and never reach the agent's stderr buffer or `stdout.json` artifact. `qodercli` is unaffected — it never used the wrapper. Fixes #72

The new ensureNodeRuntime call path returned `nil, err` on bootstrap failure, breaking the existing convention (post-Exec error path) of always returning a SessionResult with Engine/ExitCode/DurationMs so the evaluator can record telemetry for failed cases. Build a minimal SessionResult before the error return, matching the surrounding pattern. Self-review catch.

zpzjzj added 2 commits June 1, 2026 11:05

zpzjzj requested a review from hittyt as a code owner June 1, 2026 03:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(agent): decouple nvm bootstrap from agent Run/InstallMCP commands#76

refactor(agent): decouple nvm bootstrap from agent Run/InstallMCP commands#76
zpzjzj wants to merge 2 commits into
mainfrom
fix/command_exec

zpzjzj commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

zpzjzj commented Jun 1, 2026

Summary

Problem

Fix

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant