58 changes: 57 additions & 1 deletion .agents/skills/nemoclaw-user-configure-inference/SKILL.md
@@ -128,6 +128,35 @@ $ openshell inference set --provider compatible-anthropic-endpoint --model <mode

If the provider itself needs to change, rerun `nemoclaw onboard`.

#### Switching from Responses API to Chat Completions

If onboarding selected `/v1/responses` but the agent fails at runtime (for
example, because the backend does not emit the streaming events OpenClaw
requires), re-run onboarding so the wizard re-probes the endpoint and bakes
the correct API path into the image:

```console
$ nemoclaw onboard
```

Select the same provider and endpoint again.
The updated streaming probe will detect incomplete `/v1/responses` support
and select `/v1/chat/completions` automatically.

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`
before onboarding:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

> **Note:** `NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but
> does not update the Dockerfile ARG baked into the image.
> If you recreate the sandbox without the override env var, the image reverts to
> the original API path.
> A fresh `nemoclaw onboard` is the reliable fix because it updates both the
> session and the baked image.

## Step 2: Cross-Provider Switching

Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) requires updating both the gateway route and the sandbox config.
@@ -147,7 +176,7 @@ $ nemoclaw onboard --resume --recreate-sandbox
```

The entrypoint patches `openclaw.json` at container startup with the override values.
No image rebuild is needed.
You do not need to rebuild the image.
Remove the env vars and recreate the sandbox to revert to the original model.

`NEMOCLAW_INFERENCE_API_OVERRIDE` accepts `openai-completions` (for NVIDIA, OpenAI, Gemini, compatible endpoints) or `anthropic-messages` (for Anthropic and Anthropic-compatible endpoints).
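
As a sketch, switching an already-onboarded sandbox to an Anthropic-compatible model without an image rebuild might look like this (the values are placeholders for your setup):

```console
$ NEMOCLAW_INFERENCE_API_OVERRIDE=anthropic-messages \
  NEMOCLAW_MODEL=<model> \
  nemoclaw onboard --resume --recreate-sandbox
```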
@@ -281,6 +310,33 @@ $ NEMOCLAW_PROVIDER=custom \
| `NEMOCLAW_MODEL` | Model ID as reported by the server. |
| `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. |

### Forcing Chat Completions API

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` but do
not emit the granular streaming events that OpenClaw requires.
NemoClaw tests streaming events during onboarding and falls back to
`/v1/chat/completions` automatically when it detects incomplete streaming.

If you need to bypass the `/v1/responses` probe entirely, set
`NEMOCLAW_PREFERRED_API` before running `nemoclaw onboard`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

With this variable set, the wizard skips the `/v1/responses` probe and uses
`/v1/chat/completions` directly.
It works in both interactive and non-interactive modes.

| Variable | Values | Default |
|---|---|---|
| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `chat-completions` | unset (auto-detect) |

If you already onboarded and the sandbox is failing at runtime, re-run
`nemoclaw onboard` to re-probe the endpoint and bake the correct API path
into the image.
Refer to Switch Inference Models (see the `nemoclaw-user-configure-inference` skill) for details.

## Step 7: Anthropic-Compatible Server

If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead.
@@ -256,6 +256,33 @@ $ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
$ nemoclaw onboard
```

### Agent fails at runtime after onboarding succeeds with a compatible endpoint

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` and pass
the onboarding validation probe, but their streaming mode is incomplete.
OpenClaw requires granular streaming events like `response.output_text.delta`
that these backends do not emit.
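
To see which events your backend actually emits, you can stream a minimal request by hand (illustrative only; `<endpoint>` and `<model>` are placeholders, and the auth header may differ for your server):

```console
$ curl -sS -N <endpoint>/v1/responses \
    -H "Authorization: Bearer $COMPATIBLE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "<model>", "input": "ping", "stream": true}' \
  | grep '^event:'
```

A complete implementation emits `response.output_text.delta` and `response.output_text.done` alongside the lifecycle events; an incomplete backend typically shows only `response.created`, `response.in_progress`, and `response.completed`.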

NemoClaw now tests streaming events during the `/v1/responses` probe and falls
back to `/v1/chat/completions` automatically.
If you onboarded before this check was added, re-run onboarding so the wizard
re-probes the endpoint and bakes the correct API path into the image:

```console
$ nemoclaw onboard
```

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

Do not rely on `NEMOCLAW_INFERENCE_API_OVERRIDE` alone: it patches the config
at container startup but does not update the Dockerfile ARG baked into the
image.
A fresh `nemoclaw onboard` is the reliable fix.

### `NEMOCLAW_DISABLE_DEVICE_AUTH=1` does not change an existing sandbox

This is expected behavior.
33 changes: 32 additions & 1 deletion docs/inference/switch-inference-providers.md
@@ -73,6 +73,37 @@ $ openshell inference set --provider compatible-anthropic-endpoint --model <mode

If the provider itself needs to change, rerun `nemoclaw onboard`.

#### Switching from Responses API to Chat Completions

If onboarding selected `/v1/responses` but the agent fails at runtime (for
example, because the backend does not emit the streaming events OpenClaw
requires), re-run onboarding so the wizard re-probes the endpoint and bakes
the correct API path into the image:

```console
$ nemoclaw onboard
```

Select the same provider and endpoint again.
The updated streaming probe will detect incomplete `/v1/responses` support
and select `/v1/chat/completions` automatically.

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`
before onboarding:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

:::{note}
`NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but
does not update the Dockerfile ARG baked into the image.
If you recreate the sandbox without the override env var, the image reverts to
the original API path.
A fresh `nemoclaw onboard` is the reliable fix because it updates both the
session and the baked image.
:::

## Cross-Provider Switching

Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) requires updating both the gateway route and the sandbox config.
@@ -92,7 +123,7 @@ $ nemoclaw onboard --resume --recreate-sandbox
```

The entrypoint patches `openclaw.json` at container startup with the override values.
No image rebuild is needed.
You do not need to rebuild the image.
Remove the env vars and recreate the sandbox to revert to the original model.

`NEMOCLAW_INFERENCE_API_OVERRIDE` accepts `openai-completions` (for NVIDIA, OpenAI, Gemini, compatible endpoints) or `anthropic-messages` (for Anthropic and Anthropic-compatible endpoints).
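
For example, to steer an existing sandbox to an OpenAI-compatible model without rebuilding the image (a sketch; substitute your own model ID):

```console
$ NEMOCLAW_INFERENCE_API_OVERRIDE=openai-completions \
  NEMOCLAW_MODEL=<model> \
  nemoclaw onboard --resume --recreate-sandbox
```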
27 changes: 27 additions & 0 deletions docs/inference/use-local-inference.md
@@ -130,6 +130,27 @@ $ NEMOCLAW_PROVIDER=custom \
| `NEMOCLAW_MODEL` | Model ID as reported by the server. |
| `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. |

### Forcing Chat Completions API

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` but do
not emit the granular streaming events that OpenClaw requires.
NemoClaw tests streaming events during onboarding and falls back to
`/v1/chat/completions` automatically when it detects incomplete streaming.

If you need to bypass the `/v1/responses` probe entirely, set
`NEMOCLAW_PREFERRED_API` before running `nemoclaw onboard`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

With this variable set, the wizard skips the `/v1/responses` probe and uses
`/v1/chat/completions` directly.
It works in both interactive and non-interactive modes.

| Variable | Values | Default |
|---|---|---|
| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `chat-completions` | unset (auto-detect) |
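
Combined with the custom-provider variables above, a non-interactive run might look like this (a sketch; values are placeholders, and any other variables your endpoint requires are omitted here):

```console
$ NEMOCLAW_PROVIDER=custom \
  NEMOCLAW_MODEL=<model> \
  COMPATIBLE_API_KEY=none \
  NEMOCLAW_PREFERRED_API=openai-completions \
  nemoclaw onboard
```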

If you already onboarded and the sandbox is failing at runtime, re-run
`nemoclaw onboard` to re-probe the endpoint and bake the correct API path
into the image.
Refer to [Switch Inference Models](switch-inference-providers.md) for details.

## Anthropic-Compatible Server

If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead.
27 changes: 27 additions & 0 deletions docs/reference/troubleshooting.md
@@ -286,6 +286,33 @@ $ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
$ nemoclaw onboard
```

### Agent fails at runtime after onboarding succeeds with a compatible endpoint

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` and pass
the onboarding validation probe, but their streaming mode is incomplete.
OpenClaw requires granular streaming events like `response.output_text.delta`
that these backends do not emit.
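
You can confirm the missing events by streaming a minimal request manually (an illustrative check; `<endpoint>` and `<model>` are placeholders and the auth header depends on your server):

```console
$ curl -sS -N <endpoint>/v1/responses \
    -H "Authorization: Bearer $COMPATIBLE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "<model>", "input": "ping", "stream": true}' \
  | grep '^event:'
```

If `response.output_text.delta` never appears in the output, the backend's `/v1/responses` streaming is incomplete and `/v1/chat/completions` is the correct path.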

NemoClaw now tests streaming events during the `/v1/responses` probe and falls
back to `/v1/chat/completions` automatically.
If you onboarded before this check was added, re-run onboarding so the wizard
re-probes the endpoint and bakes the correct API path into the image:

```console
$ nemoclaw onboard
```

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

Do not rely on `NEMOCLAW_INFERENCE_API_OVERRIDE` alone: it patches the config
at container startup but does not update the Dockerfile ARG baked into the
image.
A fresh `nemoclaw onboard` is the reliable fix.

### `NEMOCLAW_DISABLE_DEVICE_AUTH=1` does not change an existing sandbox

This is expected behavior.
152 changes: 152 additions & 0 deletions src/lib/http-probe.test.ts
@@ -8,6 +8,7 @@ import { describe, expect, it } from "vitest";
import {
getCurlTimingArgs,
runCurlProbe,
runStreamingEventProbe,
summarizeCurlFailure,
summarizeProbeError,
summarizeProbeFailure,
@@ -85,3 +86,154 @@ expect(result.stderr).toContain("spawn ENOENT");
expect(result.stderr).toContain("spawn ENOENT");
});
});

describe("runStreamingEventProbe", () => {
/** Helper to build a spawnSyncImpl that writes SSE content to the -o file. */
function mockStreaming(sseBody: string, exitCode = 0) {
return (_command: string, args: readonly string[]) => {
const oIdx = args.indexOf("-o");
if (oIdx !== -1) {
const outputPath = args[oIdx + 1] as string;
fs.writeFileSync(outputPath, sseBody);
}
return {
pid: 1,
output: [],
stdout: "",
stderr: "",
status: exitCode,
signal: null,
};
};
}

it("passes when all required streaming events are present", () => {
const sseBody = [
"event: response.created",
'data: {"id":"resp_1"}',
"",
"event: response.in_progress",
'data: {"id":"resp_1"}',
"",
"event: response.output_item.added",
'data: {"id":"resp_1"}',
"",
"event: response.content_part.added",
'data: {"id":"resp_1"}',
"",
"event: response.output_text.delta",
'data: {"delta":"OK"}',
"",
"event: response.output_text.done",
'data: {"text":"OK"}',
"",
"event: response.content_part.done",
'data: {"id":"resp_1"}',
"",
"event: response.completed",
'data: {"id":"resp_1"}',
"",
].join("\n");

const result = runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{ spawnSyncImpl: mockStreaming(sseBody) },
);

expect(result.ok).toBe(true);
expect(result.missingEvents).toEqual([]);
});

it("fails when only basic lifecycle events are present (SGLang-like)", () => {
const sseBody = [
"event: response.created",
'data: {"id":"resp_1"}',
"",
"event: response.in_progress",
'data: {"id":"resp_1"}',
"",
"event: response.completed",
'data: {"id":"resp_1","text":"OK"}',
"",
].join("\n");

const result = runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{ spawnSyncImpl: mockStreaming(sseBody) },
);

expect(result.ok).toBe(false);
expect(result.missingEvents).toContain("response.output_text.delta");
expect(result.message).toContain("response.output_text.delta");
});

it("still passes if curl exits with 28 (timeout) but events were captured", () => {
const sseBody = [
"event: response.created",
'data: {"id":"resp_1"}',
"",
"event: response.output_text.delta",
'data: {"delta":"O"}',
"",
].join("\n");

const result = runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{ spawnSyncImpl: mockStreaming(sseBody, 28) },
);

expect(result.ok).toBe(true);
expect(result.missingEvents).toEqual([]);
});

it("fails on spawn error", () => {
const result = runStreamingEventProbe(
["-sS", "https://example.test/v1/responses"],
{
spawnSyncImpl: () => {
const error = Object.assign(new Error("spawn ENOENT"), { code: "ENOENT" });
return {
pid: 1,
output: [],
stdout: "",
stderr: "",
status: null,
signal: null,
error,
};
},
},
);

expect(result.ok).toBe(false);
expect(result.message).toContain("Streaming probe failed");
});

it("cleans up temp files after probe", () => {
let outputPath = "";
runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{
spawnSyncImpl: (_command, args) => {
const oIdx = args.indexOf("-o");
if (oIdx !== -1) {
outputPath = args[oIdx + 1] as string;
fs.writeFileSync(outputPath, "event: response.output_text.delta\ndata: {}\n");
}
return {
pid: 1,
output: [],
stdout: "",
stderr: "",
status: 0,
signal: null,
};
},
},
);

expect(outputPath).not.toBe("");
expect(fs.existsSync(outputPath)).toBe(false);
expect(fs.existsSync(path.dirname(outputPath))).toBe(false);
});
});