58 changes: 57 additions & 1 deletion .agents/skills/nemoclaw-user-configure-inference/SKILL.md
@@ -128,6 +128,35 @@ $ openshell inference set --provider compatible-anthropic-endpoint --model <mode

If the provider itself needs to change, rerun `nemoclaw onboard`.

#### Switching from Responses API to Chat Completions

If onboarding selected `/v1/responses` but the agent fails at runtime (for
example, because the backend does not emit the streaming events OpenClaw
requires), re-run onboarding so the wizard re-probes the endpoint and bakes
the correct API path into the image:

```console
$ nemoclaw onboard
```

Select the same provider and endpoint again.
The updated streaming probe will detect incomplete `/v1/responses` support
and select `/v1/chat/completions` automatically.

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`
before onboarding:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

> **Note:** `NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but
> does not update the Dockerfile ARG baked into the image.
> If you recreate the sandbox without the override env var, the image reverts to
> the original API path.
> A fresh `nemoclaw onboard` is the reliable fix because it updates both the
> session and the baked image.

## Step 2: Cross-Provider Switching

Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) requires updating both the gateway route and the sandbox config.
@@ -147,7 +176,7 @@ $ nemoclaw onboard --resume --recreate-sandbox
```

The entrypoint patches `openclaw.json` at container startup with the override values.
No image rebuild is needed.
You do not need to rebuild the image.
Remove the env vars and recreate the sandbox to revert to the original model.

`NEMOCLAW_INFERENCE_API_OVERRIDE` accepts `openai-completions` (for NVIDIA, OpenAI, Gemini, compatible endpoints) or `anthropic-messages` (for Anthropic and Anthropic-compatible endpoints).
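
As a sketch, switching an already-onboarded sandbox to an Anthropic-compatible model without an image rebuild might look like this (the values are placeholders for your setup):

```console
$ NEMOCLAW_INFERENCE_API_OVERRIDE=anthropic-messages \
  NEMOCLAW_MODEL=<model> \
  nemoclaw onboard --resume --recreate-sandbox
```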
@@ -281,6 +310,33 @@ $ NEMOCLAW_PROVIDER=custom \
| `NEMOCLAW_MODEL` | Model ID as reported by the server. |
| `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. |

### Forcing Chat Completions API

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` but do
not emit the granular streaming events that OpenClaw requires.
NemoClaw tests streaming events during onboarding and falls back to
`/v1/chat/completions` automatically when it detects incomplete streaming.

If you need to bypass the `/v1/responses` probe entirely, set
`NEMOCLAW_PREFERRED_API` before running `nemoclaw onboard`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

With this variable set, the wizard skips the `/v1/responses` probe and uses
`/v1/chat/completions` directly.
It works in both interactive and non-interactive modes.

| Variable | Values | Default |
|---|---|---|
| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `chat-completions` | unset (auto-detect) |

If you already onboarded and the sandbox is failing at runtime, re-run
`nemoclaw onboard` to re-probe the endpoint and bake the correct API path
into the image.
Refer to Switch Inference Models (see the `nemoclaw-user-configure-inference` skill) for details.

## Step 7: Anthropic-Compatible Server

If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead.
@@ -256,6 +256,33 @@ $ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
$ nemoclaw onboard
```

### Agent fails at runtime after onboarding succeeds with a compatible endpoint

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` and pass
the onboarding validation probe, but their streaming mode is incomplete.
OpenClaw requires granular streaming events like `response.output_text.delta`
that these backends do not emit.
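
To see which events your backend actually emits, you can stream a minimal request by hand (illustrative only; `<endpoint>` and `<model>` are placeholders, and the auth header may differ for your server):

```console
$ curl -sS -N <endpoint>/v1/responses \
    -H "Authorization: Bearer $COMPATIBLE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "<model>", "input": "ping", "stream": true}' \
  | grep '^event:'
```

A complete implementation emits `response.output_text.delta` and `response.output_text.done` alongside the lifecycle events; an incomplete backend typically shows only `response.created`, `response.in_progress`, and `response.completed`.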

NemoClaw now tests streaming events during the `/v1/responses` probe and falls
back to `/v1/chat/completions` automatically.
If you onboarded before this check was added, re-run onboarding so the wizard
re-probes the endpoint and bakes the correct API path into the image:

```console
$ nemoclaw onboard
```

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

Do not rely on `NEMOCLAW_INFERENCE_API_OVERRIDE` alone: it patches the config
at container startup but does not update the Dockerfile ARG baked into the
image.
A fresh `nemoclaw onboard` is the reliable fix.

### `NEMOCLAW_DISABLE_DEVICE_AUTH=1` does not change an existing sandbox

This is expected behavior.
33 changes: 32 additions & 1 deletion docs/inference/switch-inference-providers.md
@@ -73,6 +73,37 @@ $ openshell inference set --provider compatible-anthropic-endpoint --model <mode

If the provider itself needs to change, rerun `nemoclaw onboard`.

#### Switching from Responses API to Chat Completions

If onboarding selected `/v1/responses` but the agent fails at runtime (for
example, because the backend does not emit the streaming events OpenClaw
requires), re-run onboarding so the wizard re-probes the endpoint and bakes
the correct API path into the image:

```console
$ nemoclaw onboard
```

Select the same provider and endpoint again.
The updated streaming probe will detect incomplete `/v1/responses` support
and select `/v1/chat/completions` automatically.

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`
before onboarding:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

:::{note}
`NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but
does not update the Dockerfile ARG baked into the image.
If you recreate the sandbox without the override env var, the image reverts to
the original API path.
A fresh `nemoclaw onboard` is the reliable fix because it updates both the
session and the baked image.
:::

## Cross-Provider Switching

Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) requires updating both the gateway route and the sandbox config.
@@ -92,7 +123,7 @@ $ nemoclaw onboard --resume --recreate-sandbox
```

The entrypoint patches `openclaw.json` at container startup with the override values.
No image rebuild is needed.
You do not need to rebuild the image.
Remove the env vars and recreate the sandbox to revert to the original model.

`NEMOCLAW_INFERENCE_API_OVERRIDE` accepts `openai-completions` (for NVIDIA, OpenAI, Gemini, compatible endpoints) or `anthropic-messages` (for Anthropic and Anthropic-compatible endpoints).
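
For example, to steer an existing sandbox to an OpenAI-compatible model without rebuilding the image (a sketch; substitute your own model ID):

```console
$ NEMOCLAW_INFERENCE_API_OVERRIDE=openai-completions \
  NEMOCLAW_MODEL=<model> \
  nemoclaw onboard --resume --recreate-sandbox
```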
27 changes: 27 additions & 0 deletions docs/inference/use-local-inference.md
@@ -130,6 +130,27 @@ $ NEMOCLAW_PROVIDER=custom \
| `NEMOCLAW_MODEL` | Model ID as reported by the server. |
| `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. |

### Forcing Chat Completions API

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` but do
not emit the granular streaming events that OpenClaw requires.
NemoClaw tests streaming events during onboarding and falls back to
`/v1/chat/completions` automatically when it detects incomplete streaming.

If you need to bypass the `/v1/responses` probe entirely, set
`NEMOCLAW_PREFERRED_API` before running `nemoclaw onboard`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

With this variable set, the wizard skips the `/v1/responses` probe and uses
`/v1/chat/completions` directly.
It works in both interactive and non-interactive modes.

| Variable | Values | Default |
|---|---|---|
| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `chat-completions` | unset (auto-detect) |
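
Combined with the custom-provider variables above, a non-interactive run might look like this (a sketch; values are placeholders, and any other variables your endpoint requires are omitted here):

```console
$ NEMOCLAW_PROVIDER=custom \
  NEMOCLAW_MODEL=<model> \
  COMPATIBLE_API_KEY=none \
  NEMOCLAW_PREFERRED_API=openai-completions \
  nemoclaw onboard
```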

If you already onboarded and the sandbox is failing at runtime, re-run
`nemoclaw onboard` to re-probe the endpoint and bake the correct API path
into the image.
Refer to [Switch Inference Models](switch-inference-providers.md) for details.

## Anthropic-Compatible Server

If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead.
27 changes: 27 additions & 0 deletions docs/reference/troubleshooting.md
@@ -286,6 +286,33 @@ $ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
$ nemoclaw onboard
```

### Agent fails at runtime after onboarding succeeds with a compatible endpoint

Some OpenAI-compatible servers (such as SGLang) expose `/v1/responses` and pass
the onboarding validation probe, but their streaming mode is incomplete.
OpenClaw requires granular streaming events like `response.output_text.delta`
that these backends do not emit.
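
You can confirm the missing events by streaming a minimal request manually (an illustrative check; `<endpoint>` and `<model>` are placeholders and the auth header depends on your server):

```console
$ curl -sS -N <endpoint>/v1/responses \
    -H "Authorization: Bearer $COMPATIBLE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "<model>", "input": "ping", "stream": true}' \
  | grep '^event:'
```

If `response.output_text.delta` never appears in the output, the backend's `/v1/responses` streaming is incomplete and `/v1/chat/completions` is the correct path.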

NemoClaw now tests streaming events during the `/v1/responses` probe and falls
back to `/v1/chat/completions` automatically.
If you onboarded before this check was added, re-run onboarding so the wizard
re-probes the endpoint and bakes the correct API path into the image:

```console
$ nemoclaw onboard
```

To force `/v1/chat/completions` without re-probing, set `NEMOCLAW_PREFERRED_API`:

```console
$ NEMOCLAW_PREFERRED_API=openai-completions nemoclaw onboard
```

Do not rely on `NEMOCLAW_INFERENCE_API_OVERRIDE` alone: it patches the config
at container startup but does not update the Dockerfile ARG baked into the
image.
A fresh `nemoclaw onboard` is the reliable fix.

### `NEMOCLAW_DISABLE_DEVICE_AUTH=1` does not change an existing sandbox

This is expected behavior.
152 changes: 152 additions & 0 deletions src/lib/http-probe.test.ts
@@ -8,6 +8,7 @@ import { describe, expect, it } from "vitest";
import {
getCurlTimingArgs,
runCurlProbe,
runStreamingEventProbe,
summarizeCurlFailure,
summarizeProbeError,
summarizeProbeFailure,
@@ -85,3 +86,154 @@ expect(result.stderr).toContain("spawn ENOENT");
expect(result.stderr).toContain("spawn ENOENT");
});
});

describe("runStreamingEventProbe", () => {
/** Helper to build a spawnSyncImpl that writes SSE content to the -o file. */
function mockStreaming(sseBody: string, exitCode = 0) {
return (_command: string, args: readonly string[]) => {
const oIdx = args.indexOf("-o");
if (oIdx !== -1) {
const outputPath = args[oIdx + 1] as string;
fs.writeFileSync(outputPath, sseBody);
}
return {
pid: 1,
output: [],
stdout: "",
stderr: "",
status: exitCode,
signal: null,
};
};
}

it("passes when all required streaming events are present", () => {
const sseBody = [
"event: response.created",
'data: {"id":"resp_1"}',
"",
"event: response.in_progress",
'data: {"id":"resp_1"}',
"",
"event: response.output_item.added",
'data: {"id":"resp_1"}',
"",
"event: response.content_part.added",
'data: {"id":"resp_1"}',
"",
"event: response.output_text.delta",
'data: {"delta":"OK"}',
"",
"event: response.output_text.done",
'data: {"text":"OK"}',
"",
"event: response.content_part.done",
'data: {"id":"resp_1"}',
"",
"event: response.completed",
'data: {"id":"resp_1"}',
"",
].join("\n");

const result = runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{ spawnSyncImpl: mockStreaming(sseBody) },
);

expect(result.ok).toBe(true);
expect(result.missingEvents).toEqual([]);
});

it("fails when only basic lifecycle events are present (SGLang-like)", () => {
const sseBody = [
"event: response.created",
'data: {"id":"resp_1"}',
"",
"event: response.in_progress",
'data: {"id":"resp_1"}',
"",
"event: response.completed",
'data: {"id":"resp_1","text":"OK"}',
"",
].join("\n");

const result = runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{ spawnSyncImpl: mockStreaming(sseBody) },
);

expect(result.ok).toBe(false);
expect(result.missingEvents).toContain("response.output_text.delta");
expect(result.message).toContain("response.output_text.delta");
});

it("still passes if curl exits with 28 (timeout) but events were captured", () => {
const sseBody = [
"event: response.created",
'data: {"id":"resp_1"}',
"",
"event: response.output_text.delta",
'data: {"delta":"O"}',
"",
].join("\n");

const result = runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{ spawnSyncImpl: mockStreaming(sseBody, 28) },
);

expect(result.ok).toBe(true);
expect(result.missingEvents).toEqual([]);
});

it("fails on spawn error", () => {
const result = runStreamingEventProbe(
["-sS", "https://example.test/v1/responses"],
{
spawnSyncImpl: () => {
const error = Object.assign(new Error("spawn ENOENT"), { code: "ENOENT" });
return {
pid: 1,
output: [],
stdout: "",
stderr: "",
status: null,
signal: null,
error,
};
},
},
);

expect(result.ok).toBe(false);
expect(result.message).toContain("Streaming probe failed");
});

it("cleans up temp files after probe", () => {
let outputPath = "";
runStreamingEventProbe(
["-sS", "--max-time", "15", "https://example.test/v1/responses"],
{
spawnSyncImpl: (_command, args) => {
const oIdx = args.indexOf("-o");
if (oIdx !== -1) {
outputPath = args[oIdx + 1] as string;
fs.writeFileSync(outputPath, "event: response.output_text.delta\ndata: {}\n");
}
return {
pid: 1,
output: [],
stdout: "",
stderr: "",
status: 0,
signal: null,
};
},
},
);

expect(outputPath).not.toBe("");
expect(fs.existsSync(outputPath)).toBe(false);
expect(fs.existsSync(path.dirname(outputPath))).toBe(false);
});
});