
Commit 255c357

ericksoa authored and ColinM-sys committed
feat(inference): validate streaming events for /v1/responses and add NEMOCLAW_PREFERRED_API override (NVIDIA#1833)
## Summary

- Adds streaming SSE event validation to the `/v1/responses` probe for custom OpenAI-compatible endpoints, catching backends like SGLang that return valid non-streaming responses but emit incomplete streaming events
- Adds the `NEMOCLAW_PREFERRED_API=openai-completions` env var to bypass the `/v1/responses` probe entirely during onboarding
- Documents both the env var override and the existing `NEMOCLAW_INFERENCE_API_OVERRIDE` workaround for already-onboarded sandboxes

## Context

A community user reported that SGLang passes onboarding validation for `/v1/responses` but fails at runtime because its streaming mode emits only 3 lifecycle events (`response.created`, `response.in_progress`, `response.completed`), missing the granular content deltas OpenClaw requires (`response.output_text.delta`, etc.).

## Test plan

- [ ] Unit tests for `shouldForceCompletionsApi()` (6 cases) and `runStreamingEventProbe()` (5 cases) pass
- [ ] `NEMOCLAW_PREFERRED_API=openai-completions` skips the `/v1/responses` probe during custom endpoint onboarding
- [ ] Streaming probe detects SGLang-like incomplete SSE events and falls back to `/chat/completions`
- [ ] Full test suite green

## Summary by CodeRabbit

* **New Features**
  * Added `NEMOCLAW_PREFERRED_API` to force Chat Completions (works in interactive and non-interactive mode) and optionally skip the `/v1/responses` probe
  * Onboarding now validates streaming events and automatically falls back to Chat Completions if required events are missing; transport/probe failures produce a hard failure
* **Documentation**
  * New troubleshooting and recovery steps (rerun `nemoclaw onboard` to re-probe and bake the correct API)
  * Clarified that `NEMOCLAW_INFERENCE_API_OVERRIDE` only patches the startup config and does not update baked image ARGs
  * Minor wording tweak about image rebuilds
* **Tests**
  * Added tests covering streaming probes, cleanup, error cases, and the preference logic

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: ColinM-sys <cmcdonough@50words.com>
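The SGLang gap described in the Context section can be sketched as a small event-set check. This is an illustrative helper, not the actual OpenClaw/NemoClaw implementation; the event names come from the commit message, and the reduced required-event set here is an assumption:

```typescript
// Hypothetical sketch: collect the `event:` names from an SSE body and report
// which required streaming events are missing. The required set below is a
// reduced illustration, not NemoClaw's actual list.

const REQUIRED_EVENTS = [
  "response.created",
  "response.output_text.delta",
  "response.completed",
];

function missingStreamingEvents(sseBody: string): string[] {
  // SSE frames are newline-delimited; event-type lines start with "event:".
  const seen = new Set(
    sseBody
      .split("\n")
      .filter((line) => line.startsWith("event:"))
      .map((line) => line.slice("event:".length).trim()),
  );
  return REQUIRED_EVENTS.filter((e) => !seen.has(e));
}

// An SGLang-like stream emits only the three lifecycle events:
const sglangLike = [
  "event: response.created",
  'data: {"id":"resp_1"}',
  "",
  "event: response.in_progress",
  'data: {"id":"resp_1"}',
  "",
  "event: response.completed",
  'data: {"id":"resp_1"}',
  "",
].join("\n");

console.log(missingStreamingEvents(sglangLike)); // → ["response.output_text.delta"]
```

A valid non-streaming response tells you nothing about this; only inspecting the SSE event stream surfaces the gap, which is why the probe in this commit exercises streaming mode.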
1 parent 377c543 commit 255c357

File tree

10 files changed: +509 -3 lines

.agents/skills/nemoclaw-user-configure-inference/SKILL.md

Lines changed: 57 additions & 1 deletion
````diff
@@ -128,6 +128,35 @@ $ openshell inference set --provider compatible-anthropic-endpoint --model <mode
 
 If the provider itself needs to change, rerun `nemoclaw onboard`.
 
+#### Switching from Responses API to Chat Completions
+
+If onboarding selected `/v1/responses` but the agent fails at runtime (for
+example, because the backend does not emit the streaming events OpenClaw
+requires), re-run onboarding so the wizard re-probes the endpoint and bakes
+the correct API path into the image:
+
+```console
+$ nemoclaw onboard
+```
+
+Select the same provider and endpoint again.
+The updated streaming probe will detect incomplete `/v1/responses` support
+and select `/v1/chat/completions` automatically.
+
+To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`
+before onboarding:
+
+```console
+$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
+```
+
+> **Note:** `NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but
+> does not update the Dockerfile ARG baked into the image.
+> If you recreate the sandbox without the override env var, the image reverts to
+> the original API path.
+> A fresh `nemoclaw onboard` is the reliable fix because it updates both the
+> session and the baked image.
+
 ## Step 2: Cross-Provider Switching
 
 Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) requires updating both the gateway route and the sandbox config.
@@ -147,7 +176,7 @@ $ nemoclaw onboard --resume --recreate-sandbox
 ```
 
 The entrypoint patches `openclaw.json` at container startup with the override values.
-No image rebuild is needed.
+You do not need to rebuild the image.
 Remove the env vars and recreate the sandbox to revert to the original model.
 
 `NEMOCLAW_INFERENCE_API_OVERRIDE` accepts `openai-completions` (for NVIDIA, OpenAI, Gemini, compatible endpoints) or `anthropic-messages` (for Anthropic and Anthropic-compatible endpoints).
@@ -281,6 +310,33 @@ $ NEMOCLAW_PROVIDER=custom \
 | `NEMOCLAW_MODEL` | Model ID as reported by the server. |
 | `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. |
 
+### Forcing Chat Completions API
+
+Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` but do
+not emit the granular streaming events that OpenClaw requires.
+NemoClaw tests streaming events during onboarding and falls back to
+`/v1/chat/completions` automatically when it detects incomplete streaming.
+
+If you need to bypass the `/v1/responses` probe entirely, set
+`NEMOCLAW_PREFERRED_API` before running onboard:
+
+```console
+$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
+```
+
+Set this variable to make the wizard skip the `/v1/responses` probe and use
+`/v1/chat/completions` directly.
+You can use it in both interactive and non-interactive mode.
+
+| Variable | Values | Default |
+|---|---|---|
+| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `chat-completions` | unset (auto-detect) |
+
+If you already onboarded and the sandbox is failing at runtime, re-run
+`nemoclaw onboard` to re-probe the endpoint and bake the correct API path
+into the image.
+Refer to Switch Inference Models (see the `nemoclaw-user-configure-inference` skill) for details.
+
 ## Step 7: Anthropic-Compatible Server
 
 If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead.
````
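The `NEMOCLAW_PREFERRED_API` behavior documented above could be implemented roughly as follows. This is a hedged sketch based only on the documented values table; the real `shouldForceCompletionsApi()` in the NemoClaw source may differ in details such as trimming and case handling:

```typescript
// Sketch of the NEMOCLAW_PREFERRED_API preference check. The accepted values
// come from the table in the diff above; the normalization (trim + lowercase)
// is an assumption, not the actual implementation.

const FORCE_COMPLETIONS_VALUES = new Set(["openai-completions", "chat-completions"]);

function shouldForceCompletionsApi(env: Record<string, string | undefined>): boolean {
  const preferred = env.NEMOCLAW_PREFERRED_API?.trim().toLowerCase();
  // Unset or empty keeps the default: auto-detect via the /v1/responses probe.
  if (!preferred) return false;
  return FORCE_COMPLETIONS_VALUES.has(preferred);
}

// When this returns true, the wizard would skip the /v1/responses probe and
// use /v1/chat/completions directly.
console.log(shouldForceCompletionsApi({ NEMOCLAW_PREFERRED_API: "openai-completions" })); // true
console.log(shouldForceCompletionsApi({})); // false
```

Taking the environment as a parameter instead of reading `process.env` directly is what makes the six unit-test cases mentioned in the test plan straightforward to write.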

.agents/skills/nemoclaw-user-reference/references/troubleshooting.md

Lines changed: 27 additions & 0 deletions
````diff
@@ -256,6 +256,33 @@ $ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
 $ nemoclaw onboard
 ```
 
+### Agent fails at runtime after onboarding succeeds with a compatible endpoint
+
+Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` and pass
+the onboarding validation probe, but their streaming mode is incomplete.
+OpenClaw requires granular streaming events like `response.output_text.delta`
+that these backends do not emit.
+
+NemoClaw now tests streaming events during the `/v1/responses` probe and falls
+back to `/v1/chat/completions` automatically.
+If you onboarded before this check was added, re-run onboarding so the wizard
+re-probes the endpoint and bakes the correct API path into the image:
+
+```console
+$ nemoclaw onboard
+```
+
+To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`:
+
+```console
+$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
+```
+
+Do not rely on `NEMOCLAW_INFERENCE_API_OVERRIDE` alone — it patches the config
+at container startup but does not update the Dockerfile ARG baked into the
+image.
+A fresh `nemoclaw onboard` is the reliable fix.
+
 ### `NEMOCLAW_DISABLE_DEVICE_AUTH=1` does not change an existing sandbox
 
 This is expected behavior.
````
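Putting the troubleshooting entry above in code form: the probe outcome drives a three-way decision. The result shape mirrors the tests in this commit (`ok`, `missingEvents`, `message`), and the hard failure on transport errors matches the release notes, but the selection function itself is a hypothetical illustration:

```typescript
// Illustrative three-way decision on the streaming probe outcome: a clean
// probe keeps /v1/responses, missing events trigger the /v1/chat/completions
// fallback, and a transport/probe failure is a hard error. The function name
// is hypothetical; only the result shape comes from the commit's tests.

type StreamingProbeResult = {
  ok: boolean;
  missingEvents: string[];
  message?: string;
};

function selectApiPath(probe: StreamingProbeResult): string {
  if (probe.ok) return "/v1/responses";
  if (probe.missingEvents.length > 0) {
    // Incomplete streaming, e.g. SGLang emitting only lifecycle events
    return "/v1/chat/completions";
  }
  // Not ok and no events at all: the probe itself failed (spawn error, etc.)
  throw new Error(probe.message ?? "Streaming probe failed");
}

console.log(selectApiPath({ ok: false, missingEvents: ["response.output_text.delta"] }));
// → /v1/chat/completions
```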

docs/inference/switch-inference-providers.md

Lines changed: 32 additions & 1 deletion
````diff
@@ -73,6 +73,37 @@ $ openshell inference set --provider compatible-anthropic-endpoint --model <mode
 
 If the provider itself needs to change, rerun `nemoclaw onboard`.
 
+#### Switching from Responses API to Chat Completions
+
+If onboarding selected `/v1/responses` but the agent fails at runtime (for
+example, because the backend does not emit the streaming events OpenClaw
+requires), re-run onboarding so the wizard re-probes the endpoint and bakes
+the correct API path into the image:
+
+```console
+$ nemoclaw onboard
+```
+
+Select the same provider and endpoint again.
+The updated streaming probe will detect incomplete `/v1/responses` support
+and select `/v1/chat/completions` automatically.
+
+To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`
+before onboarding:
+
+```console
+$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
+```
+
+:::{note}
+`NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but
+does not update the Dockerfile ARG baked into the image.
+If you recreate the sandbox without the override env var, the image reverts to
+the original API path.
+A fresh `nemoclaw onboard` is the reliable fix because it updates both the
+session and the baked image.
+:::
+
 ## Cross-Provider Switching
 
 Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) requires updating both the gateway route and the sandbox config.
@@ -92,7 +123,7 @@ $ nemoclaw onboard --resume --recreate-sandbox
 ```
 
 The entrypoint patches `openclaw.json` at container startup with the override values.
-No image rebuild is needed.
+You do not need to rebuild the image.
 Remove the env vars and recreate the sandbox to revert to the original model.
 
 `NEMOCLAW_INFERENCE_API_OVERRIDE` accepts `openai-completions` (for NVIDIA, OpenAI, Gemini, compatible endpoints) or `anthropic-messages` (for Anthropic and Anthropic-compatible endpoints).
````

docs/inference/use-local-inference.md

Lines changed: 27 additions & 0 deletions
````diff
@@ -130,6 +130,33 @@ $ NEMOCLAW_PROVIDER=custom \
 | `NEMOCLAW_MODEL` | Model ID as reported by the server. |
 | `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. |
 
+### Forcing Chat Completions API
+
+Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` but do
+not emit the granular streaming events that OpenClaw requires.
+NemoClaw tests streaming events during onboarding and falls back to
+`/v1/chat/completions` automatically when it detects incomplete streaming.
+
+If you need to bypass the `/v1/responses` probe entirely, set
+`NEMOCLAW_PREFERRED_API` before running onboard:
+
+```console
+$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
+```
+
+Set this variable to make the wizard skip the `/v1/responses` probe and use
+`/v1/chat/completions` directly.
+You can use it in both interactive and non-interactive mode.
+
+| Variable | Values | Default |
+|---|---|---|
+| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `chat-completions` | unset (auto-detect) |
+
+If you already onboarded and the sandbox is failing at runtime, re-run
+`nemoclaw onboard` to re-probe the endpoint and bake the correct API path
+into the image.
+Refer to [Switch Inference Models](switch-inference-providers.md) for details.
+
 ## Anthropic-Compatible Server
 
 If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead.
````

docs/reference/troubleshooting.md

Lines changed: 27 additions & 0 deletions
````diff
@@ -286,6 +286,33 @@ $ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
 $ nemoclaw onboard
 ```
 
+### Agent fails at runtime after onboarding succeeds with a compatible endpoint
+
+Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` and pass
+the onboarding validation probe, but their streaming mode is incomplete.
+OpenClaw requires granular streaming events like `response.output_text.delta`
+that these backends do not emit.
+
+NemoClaw now tests streaming events during the `/v1/responses` probe and falls
+back to `/v1/chat/completions` automatically.
+If you onboarded before this check was added, re-run onboarding so the wizard
+re-probes the endpoint and bakes the correct API path into the image:
+
+```console
+$ nemoclaw onboard
+```
+
+To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`:
+
+```console
+$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
+```
+
+Do not rely on `NEMOCLAW_INFERENCE_API_OVERRIDE` alone — it patches the config
+at container startup but does not update the Dockerfile ARG baked into the
+image.
+A fresh `nemoclaw onboard` is the reliable fix.
+
 ### `NEMOCLAW_DISABLE_DEVICE_AUTH=1` does not change an existing sandbox
 
 This is expected behavior.
````

src/lib/http-probe.test.ts

Lines changed: 152 additions & 0 deletions
```diff
@@ -8,6 +8,7 @@ import { describe, expect, it } from "vitest";
 import {
   getCurlTimingArgs,
   runCurlProbe,
+  runStreamingEventProbe,
   summarizeCurlFailure,
   summarizeProbeError,
   summarizeProbeFailure,
@@ -85,3 +86,154 @@ describe("http-probe helpers", () => {
     expect(result.stderr).toContain("spawn ENOENT");
   });
 });
+
+describe("runStreamingEventProbe", () => {
+  /** Helper to build a spawnSyncImpl that writes SSE content to the -o file. */
+  function mockStreaming(sseBody: string, exitCode = 0) {
+    return (_command: string, args: readonly string[]) => {
+      const oIdx = args.indexOf("-o");
+      if (oIdx !== -1) {
+        const outputPath = args[oIdx + 1] as string;
+        fs.writeFileSync(outputPath, sseBody);
+      }
+      return {
+        pid: 1,
+        output: [],
+        stdout: "",
+        stderr: "",
+        status: exitCode,
+        signal: null,
+      };
+    };
+  }
+
+  it("passes when all required streaming events are present", () => {
+    const sseBody = [
+      "event: response.created",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.in_progress",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.output_item.added",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.content_part.added",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.output_text.delta",
+      'data: {"delta":"OK"}',
+      "",
+      "event: response.output_text.done",
+      'data: {"text":"OK"}',
+      "",
+      "event: response.content_part.done",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.completed",
+      'data: {"id":"resp_1"}',
+      "",
+    ].join("\n");
+
+    const result = runStreamingEventProbe(
+      ["-sS", "--max-time", "15", "https://example.test/v1/responses"],
+      { spawnSyncImpl: mockStreaming(sseBody) },
+    );
+
+    expect(result.ok).toBe(true);
+    expect(result.missingEvents).toEqual([]);
+  });
+
+  it("fails when only basic lifecycle events are present (SGLang-like)", () => {
+    const sseBody = [
+      "event: response.created",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.in_progress",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.completed",
+      'data: {"id":"resp_1","text":"OK"}',
+      "",
+    ].join("\n");
+
+    const result = runStreamingEventProbe(
+      ["-sS", "--max-time", "15", "https://example.test/v1/responses"],
+      { spawnSyncImpl: mockStreaming(sseBody) },
+    );
+
+    expect(result.ok).toBe(false);
+    expect(result.missingEvents).toContain("response.output_text.delta");
+    expect(result.message).toContain("response.output_text.delta");
+  });
+
+  it("still passes if curl exits with 28 (timeout) but events were captured", () => {
+    const sseBody = [
+      "event: response.created",
+      'data: {"id":"resp_1"}',
+      "",
+      "event: response.output_text.delta",
+      'data: {"delta":"O"}',
+      "",
+    ].join("\n");
+
+    const result = runStreamingEventProbe(
+      ["-sS", "--max-time", "15", "https://example.test/v1/responses"],
+      { spawnSyncImpl: mockStreaming(sseBody, 28) },
+    );
+
+    expect(result.ok).toBe(true);
+    expect(result.missingEvents).toEqual([]);
+  });
+
+  it("fails on spawn error", () => {
+    const result = runStreamingEventProbe(
+      ["-sS", "https://example.test/v1/responses"],
+      {
+        spawnSyncImpl: () => {
+          const error = Object.assign(new Error("spawn ENOENT"), { code: "ENOENT" });
+          return {
+            pid: 1,
+            output: [],
+            stdout: "",
+            stderr: "",
+            status: null,
+            signal: null,
+            error,
+          };
+        },
+      },
+    );
+
+    expect(result.ok).toBe(false);
+    expect(result.message).toContain("Streaming probe failed");
+  });
+
+  it("cleans up temp files after probe", () => {
+    let outputPath = "";
+    runStreamingEventProbe(
+      ["-sS", "--max-time", "15", "https://example.test/v1/responses"],
+      {
+        spawnSyncImpl: (_command, args) => {
+          const oIdx = args.indexOf("-o");
+          if (oIdx !== -1) {
+            outputPath = args[oIdx + 1] as string;
+            fs.writeFileSync(outputPath, "event: response.output_text.delta\ndata: {}\n");
+          }
+          return {
+            pid: 1,
+            output: [],
+            stdout: "",
+            stderr: "",
+            status: 0,
+            signal: null,
+          };
+        },
+      },
+    );
+
+    expect(outputPath).not.toBe("");
+    expect(fs.existsSync(outputPath)).toBe(false);
+    expect(fs.existsSync(path.dirname(outputPath))).toBe(false);
+  });
+});
```
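The tests above pin down a contract for `runStreamingEventProbe`: it appends `-o <tempfile>` to the curl arguments, tolerates curl exit code 28 (timeout) when events were captured, reports missing events, turns spawn errors into a "Streaming probe failed" message, and removes its temp directory afterwards. A minimal sketch satisfying that contract follows; the actual implementation lives in `src/lib/http-probe.ts` and will differ, and the reduced required-event list here is an assumption inferred from the tests:

```typescript
import { spawnSync } from "node:child_process";
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Reduced required set, inferred from the tests above (assumption).
const REQUIRED_STREAMING_EVENTS = ["response.created", "response.output_text.delta"];

// Loose spawn type so tests can inject a mock, as the vitest suite does.
type SpawnSyncLike = (
  command: string,
  args: readonly string[],
) => { status: number | null; error?: Error };

type StreamingProbeResult = { ok: boolean; missingEvents: string[]; message?: string };

function runStreamingEventProbe(
  curlArgs: readonly string[],
  options: { spawnSyncImpl?: SpawnSyncLike } = {},
): StreamingProbeResult {
  const spawn: SpawnSyncLike = options.spawnSyncImpl ?? spawnSync;
  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "streaming-probe-"));
  const outputPath = path.join(tmpDir, "sse-events.txt");
  try {
    // curl streams the SSE body into the temp file via -o; exit code 28
    // (operation timed out) is acceptable as long as events were captured.
    const result = spawn("curl", [...curlArgs, "-o", outputPath]);
    if (result.error) {
      return {
        ok: false,
        missingEvents: [],
        message: `Streaming probe failed: ${result.error.message}`,
      };
    }
    const body = fs.existsSync(outputPath) ? fs.readFileSync(outputPath, "utf8") : "";
    const seen = new Set(
      body
        .split("\n")
        .filter((line) => line.startsWith("event:"))
        .map((line) => line.slice("event:".length).trim()),
    );
    const missingEvents = REQUIRED_STREAMING_EVENTS.filter((e) => !seen.has(e));
    return missingEvents.length === 0
      ? { ok: true, missingEvents: [] }
      : {
          ok: false,
          missingEvents,
          message: `Missing streaming events: ${missingEvents.join(", ")}`,
        };
  } finally {
    // Always clean up the temp file and its directory, even on failure.
    fs.rmSync(tmpDir, { recursive: true, force: true });
  }
}
```

Injecting `spawnSyncImpl` is what lets the suite above simulate SGLang-like streams, curl timeouts, and spawn failures without any network access.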
