Skip to content

Add install script for conda env based install#1254

Closed
tranzmatt wants to merge 0 commit intoNVIDIA:mainfrom
tranzmatt:main
Closed

Add install script for conda env based install#1254
tranzmatt wants to merge 0 commit intoNVIDIA:mainfrom
tranzmatt:main

Conversation

@tranzmatt
Copy link
Copy Markdown

@tranzmatt tranzmatt commented Apr 1, 2026

Summary

I wanted an installer script that could cleanly install nemoclaw under a specified conda env. It requires conda be previously intsalled.

Changes

  • Add install-conda.sh
  • Usage ./install-conda.sh --conda-env conda-name [--reprocess (to reuse existing conda)] [options]

Type of Change

  • [ X] Code change for a new feature, bug fix, or refactor.
  • Code change with doc updates.
  • Doc only. Prose changes without code sample modifications.
  • Doc only. Includes code sample changes.

Testing

  • npx prek run --all-files passes (or equivalently make check).
  • npm test passes.
  • make docs builds without warnings. (for doc-only changes)
  • [X ] Successfully installs nemoclaw/openshell to specified conda env

Checklist

General

Code Changes

  • Formatters applied — npx prek run --all-files auto-fixes formatting (or make format for targeted runs).
  • Tests added or updated for new or changed behavior.
  • [ X] No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes (new commands, changed defaults, new features, bug fixes that contradict existing docs).

Doc Changes

  • Follows the style guide. Try running the update-docs agent skill to draft changes while complying with the style guide. For example, prompt your agent with "/update-docs catch up the docs for the new changes I made in this PR."
  • New pages include SPDX license header and frontmatter, if creating a new page.
  • Cross-references and links verified.

Signed-off-by: Matthew Clark linux.matthew@gmail.com

Summary by CodeRabbit

  • New Features

    • Added a conda-based installer (named env, runtime checks, guided onboarding/resume).
    • Messaging onboarding: new “Messaging channels” step with Telegram/Discord/Slack provider setup and end-to-end tests.
    • Improved sandbox build staging, a streaming sandbox-create workflow, and a benchmarking script for build contexts.
    • New model/provider prompts and HTTP probe utilities for provider validation.
  • Behavior Changes

    • Host SSH handling now pins known_hosts (stricter verification/TOFU).
    • Host-side Telegram bridge removed; services now report messaging via native channels.
  • Docs

    • Clarified NEMOCLAW_DISABLE_DEVICE_AUTH is build-time only.
  • Chores

    • VS Code setting added to ignore PR notifications targeting main.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

Adds a new conda-based installer and extensive onboarding/messaging provider support; introduces sandbox build-context optimizations, HTTP probe and provider-model utilities, streaming sandbox creation, SSH known-hosts pinning, CI/workflow and Docker changes, many tests and docs, removes host-side Telegram bridge, and updates several scripts for messaging and hardening.

Changes

Cohort / File(s) Summary
Conda installer
install-conda.sh
New strict-mode Bash installer creating/activating a named conda env (Python 3.12, Node 22), installs OpenShell (tarball + sha256) into $CONDA_PREFIX/bin, installs/links NemoClaw (local or GitHub ref), validates Node/npm, runs nemoclaw onboard with TTY/resume handling, supports CLI flags and spinner/logging.
Onboarding & messaging providers
bin/lib/onboard.js, test/onboard*.test.js, test/onboard-selection.test.js
Adds messaging channels step, provider upsert/existence checks, provider-arg/env helpers, provider-related exports, Dockerfile patching args for messaging, SSH known_hosts pruning, step renumbering, and many onboarding tests.
Sandbox build context & benchmarking
bin/lib/sandbox-build-context.js, scripts/benchmark-sandbox-image-build.js, test/sandbox-build-context.test.js
New module supporting legacy and optimized staging, file/size stats, and a benchmark script comparing build-context strategies with worktree handling and cleanup; tests compare sizes and file counts.
Streaming sandbox creation
src/lib/sandbox-create-stream.ts, src/lib/sandbox-create-stream.test.ts
New stream-based sandbox create helper that buffers/streams output, detects phases, heartbeats, optional readyCheck forcing early success, and tests covering progress/forced-ready/error paths.
HTTP probe & provider-models
src/lib/http-probe.ts, src/lib/provider-models.ts, src/lib/validation-recovery.ts, src/lib/*.{test}.ts
Adds curl-based probe utilities, provider model fetch/validate functions (NVIDIA/OpenAI-like/Anthropic), recovery messaging/classification, and extensive unit tests.
Model prompt utilities
src/lib/model-prompts.ts, src/lib/model-prompts.test.ts
New interactive model-prompt helpers (cloud/manual/remote/input), validation deferral behavior, constants/types, and tests.
Deploy SSH hardening
src/lib/deploy.ts, test/runner.test.js
Switches SSH behavior to pinned known_hosts via ssh-keyscan, adds resolve/build helpers for ssh args/options, increases retries, and ensures cleanup of temp known_hosts. Test updated to assert pinned-host-key semantics.
Services / Telegram bridge removal
src/lib/services.ts, scripts/start-services.sh, scripts/telegram-bridge.js (deleted), src/lib/services.test.ts, test/service-env.test.js
Removes host-side telegram-bridge lifecycle and related logic; service list now only includes cloudflared; start/status/stop behaviors and tests updated; telegram bridge script deleted.
Deploy/runtime scripts & hardening
scripts/nemoclaw-start.sh, scripts/debug.sh, scripts/fix-coredns.sh
Adds symlink validation and best-effort chattr hardening, expands messaging-channel configuration checks, creates temp known-hosts for debug, stricter CoreDNS upstream validation and jq-based patch JSON generation.
Dockerfile & runtime config
Dockerfile, Dockerfile.base
Switches to npm ci in builder, adds base64-encoded build args/envs for messaging channels/allowlists, updates python config writer to inject per-channel configs, adds writable .openclaw-data subdirs and symlinks, and removes npm caches during cleanup.
E2E / CI changes
.github/workflows/e2e-brev.yaml, .github/workflows/nightly-e2e.yaml, test/e2e/*, test/e2e/test-messaging-providers.sh
Adds messaging-providers test suite and nightly messaging-providers-e2e job, uploads install log on failure, extends Brev E2E to forward messaging envs, and adds a large new messaging-providers E2E script.
Tests & coverage additions
many test/*.test.js / src/lib/*.test.ts (multiple files listed in summary)
Adds numerous unit/integration tests across probes, prompts, providers, sandbox stream, services, ssh known_hosts pruning, validation-recovery, policies logging, and updates many existing tests to reflect behavioral changes.
Docs & quickstart
docs/*, docs/get-started/quickstart.md, docs/deployment/deploy-to-remote-gpu.md
Documented NEMOCLAW_DISABLE_DEVICE_AUTH as a build-time-only toggle and clarified onboarding effects; added troubleshooting note.
Editor setting
.vscode/settings.json
Adds VS Code setting to ignore PRs targeting main for GitHub Pull Requests extension.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Installer as install-conda.sh
    participant Conda as Conda
    participant Npm as npm
    participant GH as GitHub (releases)
    participant OpenShell as OpenShell binary
    participant NemoClaw as NemoClaw CLI

    User->>Installer: run installer (--conda-env ...)
    Installer->>Conda: check conda / create env (py3.12,node22) / activate
    Conda-->>Installer: env active
    Installer->>Npm: verify Node/npm versions, set NPM_CONFIG_PREFIX
    Installer->>GH: fetch OpenShell asset + sha256
    GH-->>Installer: asset + checksum
    Installer->>Installer: download + verify checksum
    Installer->>OpenShell: extract & install -> $CONDA_PREFIX/bin
    Installer->>Npm: clone/install/build/link NemoClaw (local or tag)
    Npm-->>Installer: nemoclaw linked
    Installer->>NemoClaw: run `nemoclaw onboard` (attach TTY / resume if present)
    NemoClaw-->>User: onboarding prompts (interactive)
    User-->>NemoClaw: responses
    NemoClaw-->>Installer: onboarding finished
    Installer->>User: completion message / next steps
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

Getting Started, CI/CD, enhancement: testing

Suggested reviewers

  • kjw3
  • cv

Poem

🐰 I hopped through scripts and tarball trails,
Built conda nests and followed logs’ bright trails,
OpenShell snug, NemoClaw linked with glee,
Messaging seeds planted — sandboxes free,
A carrot-coded cheer for this release! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.19% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add install script for conda env based install' directly describes the main change: a new installer script for conda environment-based installation, matching the primary file addition (install-conda.sh) and overall PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@install-conda.sh`:
- Around line 305-315: Add preflight checks for the external tools used in the
download & verify steps: verify that at least one of gh or curl is available
(use the existing command_exists helper) before attempting the OpenShell
download and verify that shasum (or an alternative like sha256sum) is present
before checksum verification; if neither gh nor curl is found or no checksum
tool is available, exit with a clear error explaining which tool is missing.
Update the preflight section (where conda is checked) to include these checks,
and keep existing fallback logic in the download block (GH preferred, curl
fallback) while ensuring the script never reaches the download/verify steps
without first confirming the required binaries (gh, curl, shasum/sha256sum).
- Line 189: The help text for NEMOCLAW_INSTALL_TAG claims it accepts a "Git ref"
but the install uses git clone --branch which only accepts branch/tag names;
either narrow the help text to "branch or tag" (update the printf line that
prints NEMOCLAW_INSTALL_TAG) or change the clone logic to support arbitrary
refs: clone the repo (without --branch or with --no-checkout/fetch), then run
git -C "<clone_dir>" fetch --all (or fetch origin "<ref>") and git -C
"<clone_dir>" checkout "$NEMOCLAW_INSTALL_TAG" so commit SHAs work; update the
code paths that currently call git clone --branch to use this fetch+checkout
flow or change the printed description accordingly.
- Around line 612-615: The installer currently runs "npm config set prefix
\"$CONDA_PREFIX\"" which writes to the user's persistent npm config; instead
make the prefix ephemeral by not mutating config — remove the "npm config set
prefix" call and use an environment-scoped approach (e.g., set
NPM_CONFIG_PREFIX=\"$CONDA_PREFIX\" for the npm commands or pass --prefix
\"$CONDA_PREFIX\" to the npm invocations) so the prefix only applies to this
run; update install-conda.sh to replace the persistent "npm config set prefix"
use with NPM_CONFIG_PREFIX or per-command --prefix usage wherever npm is
invoked.
- Around line 214-219: The conda_env_exists function currently uses grep -qx
which treats env_name as a regex and can mis-match names with regex
metacharacters; update the literal match to use grep -Fxq -- "$env_name" (or
equivalent fixed-string flags) when checking the output of conda env list to
ensure exact env name matching for the env_name variable inside
conda_env_exists.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4452265b-2403-413c-b9d7-0a5991f80208

📥 Commits

Reviewing files that changed from the base of the PR and between 97038c0 and 9cf5e96.

📒 Files selected for processing (1)
  • install-conda.sh

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
install-conda.sh (1)

11-13: Either cap Node at 22.x or update the runtime contract.

The script text says the env is created with Node 22.x, but the solver spec and runtime gate both accept any newer major. Once conda-forge advances, fresh installs can land on 23/24 and still pass validation. If 22.x is the supported line, add <23 here and reject other majors in ensure_supported_runtime; otherwise, the 22.x wording should be updated.

Also applies to: 233-237, 419-440

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install-conda.sh` around lines 11 - 13, The comment and runtime checks are
inconsistent: either cap Node to the 22.x line or broaden the contract; update
both the install-conda.sh description and the runtime validation in
ensure_supported_runtime so they match. If you intend to restrict to Node 22,
change the narrative text to specify "Node >=22.x and <23.0.0" (or "22.x only")
and add a check in ensure_supported_runtime to reject majors >=23 (and any newer
major) by validating process.version or the parsed semver major; otherwise, if
you intend to permit newer LTS majors, update all occurrences of "Node 22.x" in
the script (including the other noted blocks) to a non-restrictive phrase like
"Node 22.x or newer" and adjust ensure_supported_runtime to allow newer majors
accordingly. Ensure the same decision is applied consistently across the
functions/files that reference Node support.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@install-conda.sh`:
- Around line 411-417: The current verify_nemoclaw function uses command_exists
nemoclaw which can pick up a global CLI; change verify_nemoclaw to ensure the
resolved binary path from command -v or realpath points inside
${CONDA_PREFIX}/bin (i.e. prefix match) before returning success; if command -v
fails or the resolved path is not under ${CONDA_PREFIX}/bin, call error with the
existing failure message so a broken npm link inside the conda env is detected.
Ensure you update the logic that currently calls command_exists and the success
message to reference the verified conda-env path.
- Around line 378-386: The checks and npm operations are running against the
caller’s CWD instead of the script’s repo directory; update the block to operate
from the install script’s directory (e.g., compute SCRIPT_DIR="$(cd \"$(dirname
\"${BASH_SOURCE[0]}\")\" && pwd)" at top) and run the commands from that path:
use cd "$SCRIPT_DIR" && ... (or pass "$SCRIPT_DIR" into spin) so the test for
./package.json, the pre_extract_openclaw invocation, spin calls that use
"$(pwd)", npm install, npm link, and the cd nemoclaw build step all run against
the repository in the script directory rather than the caller’s working
directory; ensure the pre_extract_openclaw invocation uses "$SCRIPT_DIR" instead
of "$(pwd)" and that npm/link/build commands are executed with cd "$SCRIPT_DIR"
(or wrapped in a subshell) so relative paths resolve correctly.

---

Nitpick comments:
In `@install-conda.sh`:
- Around line 11-13: The comment and runtime checks are inconsistent: either cap
Node to the 22.x line or broaden the contract; update both the install-conda.sh
description and the runtime validation in ensure_supported_runtime so they
match. If you intend to restrict to Node 22, change the narrative text to
specify "Node >=22.x and <23.0.0" (or "22.x only") and add a check in
ensure_supported_runtime to reject majors >=23 (and any newer major) by
validating process.version or the parsed semver major; otherwise, if you intend
to permit newer LTS majors, update all occurrences of "Node 22.x" in the script
(including the other noted blocks) to a non-restrictive phrase like "Node 22.x
or newer" and adjust ensure_supported_runtime to allow newer majors accordingly.
Ensure the same decision is applied consistently across the functions/files that
reference Node support.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 34006002-82ac-4479-a30e-d844bcc95f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 9cf5e96 and 2cf443c.

📒 Files selected for processing (2)
  • .vscode/settings.json
  • install-conda.sh
✅ Files skipped from review due to trivial changes (1)
  • .vscode/settings.json

install-conda.sh Outdated
Comment on lines +378 to +386
if [[ -f "./package.json" ]] && grep -q '"name": "nemoclaw"' ./package.json 2>/dev/null; then
info "NemoClaw package.json found in current directory — installing from source…"
spin "Preparing OpenClaw package" bash -c \
"$(declare -f info warn pre_extract_openclaw); pre_extract_openclaw \"\$1\"" \
_ "$(pwd)" \
|| warn "Pre-extraction failed — npm install may fail if openclaw tarball is broken"
spin "Installing NemoClaw dependencies" npm install --ignore-scripts
spin "Building NemoClaw plugin" bash -c 'cd nemoclaw && npm install --ignore-scripts && npm run build'
spin "Linking NemoClaw CLI" npm link
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Install from the script directory, not the caller’s CWD.

./package.json, pwd, npm install, npm link, and cd nemoclaw all resolve relative to the shell’s current directory. Running bash /path/to/install-conda.sh ... from anywhere else will skip the checked-out repo or build the wrong tree.

Suggested fix
-  if [[ -f "./package.json" ]] && grep -q '"name": "nemoclaw"' ./package.json 2>/dev/null; then
-    info "NemoClaw package.json found in current directory — installing from source…"
+  if [[ -f "${SCRIPT_DIR}/package.json" ]] && grep -q '"name": "nemoclaw"' "${SCRIPT_DIR}/package.json" 2>/dev/null; then
+    info "NemoClaw package.json found next to the installer — installing from source…"
     spin "Preparing OpenClaw package" bash -c \
       "$(declare -f info warn pre_extract_openclaw); pre_extract_openclaw \"\$1\"" \
-      _ "$(pwd)" \
+      _ "$SCRIPT_DIR" \
       || warn "Pre-extraction failed — npm install may fail if openclaw tarball is broken"
-    spin "Installing NemoClaw dependencies" npm install --ignore-scripts
-    spin "Building NemoClaw plugin" bash -c 'cd nemoclaw && npm install --ignore-scripts && npm run build'
-    spin "Linking NemoClaw CLI" npm link
+    spin "Installing NemoClaw dependencies" bash -c 'cd "$1" && npm install --ignore-scripts' _ "$SCRIPT_DIR"
+    spin "Building NemoClaw plugin" bash -c 'cd "$1/nemoclaw" && npm install --ignore-scripts && npm run build' _ "$SCRIPT_DIR"
+    spin "Linking NemoClaw CLI" bash -c 'cd "$1" && npm link' _ "$SCRIPT_DIR"
   else
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install-conda.sh` around lines 378 - 386, The checks and npm operations are
running against the caller’s CWD instead of the script’s repo directory; update
the block to operate from the install script’s directory (e.g., compute
SCRIPT_DIR="$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)" at top) and run
the commands from that path: use cd "$SCRIPT_DIR" && ... (or pass "$SCRIPT_DIR"
into spin) so the test for ./package.json, the pre_extract_openclaw invocation,
spin calls that use "$(pwd)", npm install, npm link, and the cd nemoclaw build
step all run against the repository in the script directory rather than the
caller’s working directory; ensure the pre_extract_openclaw invocation uses
"$SCRIPT_DIR" instead of "$(pwd)" and that npm/link/build commands are executed
with cd "$SCRIPT_DIR" (or wrapped in a subshell) so relative paths resolve
correctly.

install-conda.sh Outdated
Comment on lines +411 to +417
verify_nemoclaw() {
if command_exists nemoclaw; then
info "Verified: nemoclaw available at $(command -v nemoclaw)"
return 0
fi
error "Installation failed: nemoclaw binary not found after npm link. Check npm link output above."
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Verify the env-local nemoclaw, not any binary on PATH.

command_exists nemoclaw passes if the user already has a global install, so a broken npm link inside the conda env can be masked and onboarding will run against the wrong CLI. Require the resolved binary to live under ${CONDA_PREFIX}/bin before continuing.

Suggested fix
 verify_nemoclaw() {
-  if command_exists nemoclaw; then
-    info "Verified: nemoclaw available at $(command -v nemoclaw)"
+  local nemoclaw_path
+  nemoclaw_path="$(type -P nemoclaw || true)"
+  if [[ "$nemoclaw_path" == "${CONDA_PREFIX}/bin/nemoclaw" ]]; then
+    info "Verified: nemoclaw available at ${nemoclaw_path}"
     return 0
   fi
-  error "Installation failed: nemoclaw binary not found after npm link. Check npm link output above."
+  error "Installation failed: expected ${CONDA_PREFIX}/bin/nemoclaw, got ${nemoclaw_path:-<not found>}."
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
verify_nemoclaw() {
if command_exists nemoclaw; then
info "Verified: nemoclaw available at $(command -v nemoclaw)"
return 0
fi
error "Installation failed: nemoclaw binary not found after npm link. Check npm link output above."
}
verify_nemoclaw() {
local nemoclaw_path
nemoclaw_path="$(type -P nemoclaw || true)"
if [[ "$nemoclaw_path" == "${CONDA_PREFIX}/bin/nemoclaw" ]]; then
info "Verified: nemoclaw available at ${nemoclaw_path}"
return 0
fi
error "Installation failed: expected ${CONDA_PREFIX}/bin/nemoclaw, got ${nemoclaw_path:-<not found>}."
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@install-conda.sh` around lines 411 - 417, The current verify_nemoclaw
function uses command_exists nemoclaw which can pick up a global CLI; change
verify_nemoclaw to ensure the resolved binary path from command -v or realpath
points inside ${CONDA_PREFIX}/bin (i.e. prefix match) before returning success;
if command -v fails or the resolved path is not under ${CONDA_PREFIX}/bin, call
error with the existing failure message so a broken npm link inside the conda
env is detected. Ensure you update the logic that currently calls command_exists
and the success message to reference the verified conda-env path.

@wscurran wscurran added enhancement New feature or request NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). labels Apr 1, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (10)
docs/deployment/deploy-to-remote-gpu.md (1)

120-120: Use active voice.

The phrase "is also evaluated" is passive. Rewrite to specify the actor.

Suggested alternatives:

  • "The build process also evaluates NEMOCLAW_DISABLE_DEVICE_AUTH at image build time."
  • "The installer also evaluates NEMOCLAW_DISABLE_DEVICE_AUTH at image build time."
  • "NemoClaw also evaluates NEMOCLAW_DISABLE_DEVICE_AUTH at image build time."

As per coding guidelines: "Active voice required. Flag passive constructions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/deployment/deploy-to-remote-gpu.md` at line 120, The sentence uses
passive voice ("is also evaluated"); change it to active voice by specifying the
actor (e.g., "The build process also evaluates `NEMOCLAW_DISABLE_DEVICE_AUTH` at
image build time." or "The installer also evaluates
`NEMOCLAW_DISABLE_DEVICE_AUTH` at image build time."). Replace the existing
passive phrase with one of these active alternatives and ensure
`NEMOCLAW_DISABLE_DEVICE_AUTH` remains exact and code-formatted.
bin/lib/sandbox-build-context.js (1)

39-46: Consider handling missing optional config files.

If any of these files (package.json, package-lock.json, tsconfig.json, openclaw.plugin.json) are missing from the source directory, fs.copyFileSync will throw an ENOENT error. While these files should always exist in a properly configured project, adding existence checks or documenting this precondition would make the function more robust.

🔧 Optional: Add existence check or let callers handle the error
   for (const file of [
     "package.json",
     "package-lock.json",
     "tsconfig.json",
     "openclaw.plugin.json",
   ]) {
-    fs.copyFileSync(path.join(sourceNemoclawDir, file), path.join(stagedNemoclawDir, file));
+    const src = path.join(sourceNemoclawDir, file);
+    if (fs.existsSync(src)) {
+      fs.copyFileSync(src, path.join(stagedNemoclawDir, file));
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/sandbox-build-context.js` around lines 39 - 46, The loop that copies
config files uses fs.copyFileSync on a fixed array and will throw ENOENT if any
file is missing; update the code around the file-copy loop to check existence
(fs.existsSync or fs.statSync try/catch) for each file in the
["package.json","package-lock.json","tsconfig.json","openclaw.plugin.json"] list
before calling fs.copyFileSync(path.join(sourceNemoclawDir, file),
path.join(stagedNemoclawDir, file)), and either skip missing optional files with
a debug/log message or rethrow with a clearer error so callers (or upstream code
that constructs sourceNemoclawDir/stagedNemoclawDir) can handle the condition.
src/lib/http-probe.ts (1)

99-106: Mutating input array with pop() may surprise callers.

args.pop() removes the last element from the cloned array, but if the input argv is empty, url will be undefined and String(url || "") will produce an empty string. This is handled, but the behavior of expecting the URL as the last argument should be documented.

📝 Optional: Add JSDoc clarifying URL position
+/**
+ * Execute a curl probe. The URL must be the last element of argv.
+ * `@param` argv - Curl arguments with the URL as the final element
+ * `@param` opts - Optional configuration
+ */
 // eslint-disable-next-line complexity
 export function runCurlProbe(argv: string[], opts: CurlProbeOptions = {}): CurlProbeResult {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/http-probe.ts` around lines 99 - 106, The code currently derives url
using args.pop() which mutates the cloned args array and can be surprising;
change to compute url without mutating the input by doing something like const
url = argv[argv.length - 1]; const args = argv.slice(0, -1); (or use
destructuring) and keep using spawnSyncImpl (or opts.spawnSyncImpl) as before;
additionally add a brief JSDoc on the surrounding function to state that the URL
is expected as the last element of argv so callers are aware of the required
argument order.
test/service-env.test.js (1)

16-28: Rename this test or restore the negative assertion.

It now only checks the new banner text. If the old messaging warnings reappear, this case still passes, so the title no longer matches the coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 16 - 28, The test titled "starts
without messaging-related warnings" now only asserts the new banner text via the
expect(result).toContain("Messaging:   via OpenClaw native channels") check, so
either rename the test to reflect it's verifying the new banner or restore the
original negative assertion to ensure old warnings don't appear; to fix, either
change the it() description to something like "displays new messaging banner" or
add a negative assertion such as expect(result).not.toContain("<old warning
text>") (or the previous warning substring) alongside the existing expect so the
test both confirms the new banner and verifies the absence of the old messaging
warnings.
test/e2e/test-messaging-providers.sh (1)

170-177: Background process handling could leave orphaned tail on early exit.

If the script exits unexpectedly between lines 172-174 (after starting tail -f but before the wait completes), the tail process may be orphaned. Consider using a trap to ensure cleanup.

+trap 'kill $tail_pid 2>/dev/null || true' EXIT
 INSTALL_LOG="/tmp/nemoclaw-e2e-install.log"
 bash install.sh --non-interactive >"$INSTALL_LOG" 2>&1 &
 install_pid=$!
 tail -f "$INSTALL_LOG" --pid=$install_pid 2>/dev/null &
 tail_pid=$!
 wait $install_pid
 install_exit=$?
 kill $tail_pid 2>/dev/null || true
 wait $tail_pid 2>/dev/null || true
+trap - EXIT
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 170 - 177, The background
tail process (started as tail -f "$INSTALL_LOG" --pid=$install_pid & assigned to
tail_pid) can become orphaned if the script exits early; add a trap that runs on
EXIT and on signals (e.g., INT, TERM) to kill the tail_pid and the install_pid
if still running and wait for them to reap children, then clear the trap; ensure
the trap references the existing variables install_pid, tail_pid and performs
safe kill(... ) || true and wait(... ) 2>/dev/null to avoid leaving stray
processes or zombies.
scripts/benchmark-sandbox-image-build.js (1)

54-60: Empty catch block silently swallows worktree removal errors.

The catch block at line 56 discards all errors without logging. While best-effort cleanup is reasonable, a debug log would help diagnose issues when worktree removal fails unexpectedly.

 function removeWorktree(worktreeRoot, currentRepo) {
   try {
     run("git", ["worktree", "remove", "--force", worktreeRoot], { cwd: currentRepo });
-  } catch {
+  } catch (err) {
     // Best-effort cleanup; remove the temp directory either way.
+    if (process.env.DEBUG) console.error(`[debug] worktree removal failed: ${err.message}`);
   }
   fs.rmSync(worktreeRoot, { recursive: true, force: true });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark-sandbox-image-build.js` around lines 54 - 60, The empty
catch around the run("git", ["worktree", "remove", "--force", worktreeRoot], {
cwd: currentRepo }) call silently swallows errors—change the catch to capture
the error (e.g., catch (err)) and emit a debug/warn log that includes the error
and the worktreeRoot/currentRepo context (using console.debug/console.warn or
the project's logger) before continuing with the fs.rmSync(worktreeRoot, {
recursive: true, force: true }) cleanup.
Dockerfile (1)

111-115: Consider splitting the dense channel config comprehension for maintainability.

The inline Python dictionary comprehension at line 115 is complex and difficult to debug. While functional, it combines token key selection, environment variable mapping, and conditional allowlist logic in a single expression.

This is a minor maintainability concern—the code works, but future modifications could introduce subtle bugs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 111 - 115, The one-line comprehension building
_ch_cfg is hard to read and brittle; refactor by replacing the comprehension
with an explicit loop that iterates over msg_channels, checks membership in
_token_keys, looks up token key via _token_keys[ch] and env variable via
_env_keys[ch], builds the account dict for 'main' (including the token entry and
'enabled': True), conditionally adds the 'dmPolicy':'allowlist' and 'allowFrom':
_allowed_ids[ch] only when _allowed_ids contains ch and is non-empty, and then
assigns the per-channel entry into _ch_cfg; keep the original variable names
msg_channels, _allowed_ids, _token_keys, _env_keys, and _ch_cfg so changes are
localized.
src/lib/model-prompts.ts (2)

60-65: Regex-based message matching is fragile.

The fallback check validation.message?.startsWith("Could not validate") would be more robust than regex, or better yet, always set deferValidation: true in validateNvidiaEndpointModel when deferral is intended. The current regex-based approach could silently break if the message format changes.

Suggested improvement
 function shouldDeferValidationFailure(validation: PromptValidationResult): boolean {
-  return (
-    validation.deferValidation === true ||
-    /^Could not validate model against /i.test(String(validation.message || ""))
-  );
+  // Prefer explicit deferValidation flag; message matching is a legacy fallback
+  return validation.deferValidation === true;
 }

Then ensure validateNvidiaEndpointModel sets deferValidation: true when validation cannot be performed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/model-prompts.ts` around lines 60 - 65, Replace the fragile regex
check in shouldDeferValidationFailure with a more robust approach: treat any
validation result that explicitly sets deferValidation === true as the
authoritative signal, and otherwise fall back to a simple startsWith check on
validation.message (e.g., validation.message?.startsWith("Could not validate")).
Also update validateNvidiaEndpointModel to set deferValidation: true on the
returned PromptValidationResult whenever validation cannot be performed so
callers of shouldDeferValidationFailure can rely on the deferValidation flag
rather than message parsing.

46-58: Acceptable duplication due to module boundary.

These functions duplicate logic from onboard.js, but importing them would create a circular dependency since onboard.js imports this module via dist/lib/model-prompts. The functions are small and stable, so the duplication is tolerable.

If this pattern expands, consider extracting these into a shared utility module (e.g., src/lib/prompt-utils.ts) that both can import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/model-prompts.ts` around lines 46 - 58, The functions
getNavigationChoice and exitOnboardFromPrompt duplicate logic present in
onboard.js but importing would create a circular dependency; keep the current
small, stable duplication but add a concise TODO comment above these functions
noting the duplication and recommending extraction into a shared utility (e.g.,
src/lib/prompt-utils.ts) if similar logic expands, so future refactors can
consolidate getNavigationChoice and exitOnboardFromPrompt without introducing
circular imports.
bin/lib/onboard.js (1)

3005-3006: Minor: Redundant non-interactive check.

isNonInteractive() already checks process.env.NEMOCLAW_NON_INTERACTIVE === "1" (line 3629), so the explicit OR is unnecessary.

Suggested simplification
-  if (isNonInteractive() || process.env.NEMOCLAW_NON_INTERACTIVE === "1") {
+  if (isNonInteractive()) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 3005 - 3006, The if condition redundantly
checks process.env.NEMOCLAW_NON_INTERACTIVE === "1" even though
isNonInteractive() already covers that; update the conditional at the block
using MESSAGING_CHANNELS and getMessagingToken to only call isNonInteractive()
(i.e., change if (isNonInteractive() || process.env.NEMOCLAW_NON_INTERACTIVE ===
"1") to if (isNonInteractive())) so the code relies on the centralized
isNonInteractive() check and avoids duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 3131-3145: The user IDs are only written to process.env and not
persisted; update the onboarding flow to persist allowed IDs using the existing
saveCredential helper. When handling ch.userIdEnvKey (the same branch that reads
existingIds from process.env and prompts via prompt), read existingIds from the
credential store if available and, after the user enters a non-empty userId,
call saveCredential(ch.userIdEnvKey, userId) and also set
process.env[ch.userIdEnvKey] = userId for the current run; keep the same console
messages for success/skip. Ensure you reference saveCredential, ch.userIdEnvKey,
process.env, and prompt when making the change.

In `@src/lib/services.ts`:
- Around line 103-104: SERVICE_NAMES was narrowed to ["cloudflared"] and left
legacy telegram-bridge processes orphaned; update the cleanup path so any
existing telegram-bridge.pid is detected and pruned before fully dropping
support. In the functions showStatus, stopAll, and getServiceStatuses add logic
to check for telegram-bridge.pid, read the PID, attempt a graceful stop
(SIGTERM, fallback SIGKILL), remove the pid file, and report the action; mirror
the same PID-check-and-remove behavior in scripts/start-services.sh so upgraded
installs won't leave the old bridge running with its bot-token env. Ensure you
reference SERVICE_NAMES/ServiceName unchanged for current services while adding
this separate legacy-cleanup flow tied to "telegram-bridge.pid".

In `@test/e2e/brev-e2e.test.js`:
- Around line 150-160: The test currently builds SSH command-line exports by
appending strings to envParts using shellEscape for keys like TELEGRAM_BOT_TOKEN
and DISCORD_BOT_TOKEN which causes secrets to appear in the local ssh '...'
command; instead stop putting these sensitive exports into the SSH command
string and transfer them via stdin or a remote temp file: detect the same keys
in envParts collection logic, remove or skip adding export lines for secret
keys, and write the key=value pairs to the remote host through a secure channel
(e.g., send on stdin and write to a temp file then source it remotely, or create
a remote temp file with proper permissions) before running the test commands so
secrets are never present in the local process argv.
- Around line 694-702: The messaging-providers test is accidentally forced to
use the shared sandbox because sshEnv() currently hardcodes
NEMOCLAW_SANDBOX_NAME=e2e-test; update the test harness so the messaging test
can request a different sandbox. Change sshEnv() or runRemoteTest() to accept an
optional sandboxName parameter and, when provided (e.g., from the "messaging
credential provider suite" test), set NEMOCLAW_SANDBOX_NAME to that value
instead of the hardcoded e2e-test; alternatively, when calling
runRemoteTest("test/e2e/test-messaging-providers.sh") pass an explicit env
override NEMOCLAW_SANDBOX_NAME=e2e-msg-provider so the script’s SANDBOX_NAME
default is respected. Ensure references to sshEnv(), runRemoteTest(), and the
test-messaging-providers.sh invocation are updated accordingly.

In `@test/ssh-known-hosts.test.js`:
- Around line 88-99: The test currently expects bracketed entries like
"[openshell-sandbox]:2222" to be preserved, but pruneKnownHostsEntries should
remove such entries; update the test to expect those lines to be pruned and
modify pruneKnownHostsEntries to normalize host fields by stripping surrounding
brackets and any ":port" suffix (e.g., convert "[openshell-sandbox]:2222" →
"openshell-sandbox") before comparing against the openshell pattern so bracketed
non-default-port entries are matched and removed; ensure the normalization logic
is used wherever host matching occurs in pruneKnownHostsEntries.

---

Nitpick comments:
In `@bin/lib/onboard.js`:
- Around line 3005-3006: The if condition redundantly checks
process.env.NEMOCLAW_NON_INTERACTIVE === "1" even though isNonInteractive()
already covers that; update the conditional at the block using
MESSAGING_CHANNELS and getMessagingToken to only call isNonInteractive() (i.e.,
change if (isNonInteractive() || process.env.NEMOCLAW_NON_INTERACTIVE === "1")
to if (isNonInteractive())) so the code relies on the centralized
isNonInteractive() check and avoids duplication.

In `@bin/lib/sandbox-build-context.js`:
- Around line 39-46: The loop that copies config files uses fs.copyFileSync on a
fixed array and will throw ENOENT if any file is missing; update the code around
the file-copy loop to check existence (fs.existsSync or fs.statSync try/catch)
for each file in the
["package.json","package-lock.json","tsconfig.json","openclaw.plugin.json"] list
before calling fs.copyFileSync(path.join(sourceNemoclawDir, file),
path.join(stagedNemoclawDir, file)), and either skip missing optional files with
a debug/log message or rethrow with a clearer error so callers (or upstream code
that constructs sourceNemoclawDir/stagedNemoclawDir) can handle the condition.

In `@Dockerfile`:
- Around line 111-115: The one-line comprehension building _ch_cfg is hard to
read and brittle; refactor by replacing the comprehension with an explicit loop
that iterates over msg_channels, checks membership in _token_keys, looks up
token key via _token_keys[ch] and env variable via _env_keys[ch], builds the
account dict for 'main' (including the token entry and 'enabled': True),
conditionally adds the 'dmPolicy':'allowlist' and 'allowFrom': _allowed_ids[ch]
only when _allowed_ids contains ch and is non-empty, and then assigns the
per-channel entry into _ch_cfg; keep the original variable names msg_channels,
_allowed_ids, _token_keys, _env_keys, and _ch_cfg so changes are localized.

In `@docs/deployment/deploy-to-remote-gpu.md`:
- Line 120: The sentence uses passive voice ("is also evaluated"); change it to
active voice by specifying the actor (e.g., "The build process also evaluates
`NEMOCLAW_DISABLE_DEVICE_AUTH` at image build time." or "The installer also
evaluates `NEMOCLAW_DISABLE_DEVICE_AUTH` at image build time."). Replace the
existing passive phrase with one of these active alternatives and ensure
`NEMOCLAW_DISABLE_DEVICE_AUTH` remains exact and code-formatted.

In `@scripts/benchmark-sandbox-image-build.js`:
- Around line 54-60: The empty catch around the run("git", ["worktree",
"remove", "--force", worktreeRoot], { cwd: currentRepo }) call silently swallows
errors—change the catch to capture the error (e.g., catch (err)) and emit a
debug/warn log that includes the error and the worktreeRoot/currentRepo context
(using console.debug/console.warn or the project's logger) before continuing
with the fs.rmSync(worktreeRoot, { recursive: true, force: true }) cleanup.

In `@src/lib/http-probe.ts`:
- Around line 99-106: The code currently derives url using args.pop() which
mutates the cloned args array and can be surprising; change to compute url
without mutating the input by doing something like const url = argv[argv.length
- 1]; const args = argv.slice(0, -1); (or use destructuring) and keep using
spawnSyncImpl (or opts.spawnSyncImpl) as before; additionally add a brief JSDoc
on the surrounding function to state that the URL is expected as the last
element of argv so callers are aware of the required argument order.

In `@src/lib/model-prompts.ts`:
- Around line 60-65: Replace the fragile regex check in
shouldDeferValidationFailure with a more robust approach: treat any validation
result that explicitly sets deferValidation === true as the authoritative
signal, and otherwise fall back to a simple startsWith check on
validation.message (e.g., validation.message?.startsWith("Could not validate")).
Also update validateNvidiaEndpointModel to set deferValidation: true on the
returned PromptValidationResult whenever validation cannot be performed so
callers of shouldDeferValidationFailure can rely on the deferValidation flag
rather than message parsing.
- Around line 46-58: The functions getNavigationChoice and exitOnboardFromPrompt
duplicate logic present in onboard.js but importing would create a circular
dependency; keep the current small, stable duplication but add a concise TODO
comment above these functions noting the duplication and recommending extraction
into a shared utility (e.g., src/lib/prompt-utils.ts) if similar logic expands,
so future refactors can consolidate getNavigationChoice and
exitOnboardFromPrompt without introducing circular imports.

In `@test/e2e/test-messaging-providers.sh`:
- Around line 170-177: The background tail process (started as tail -f
"$INSTALL_LOG" --pid=$install_pid & assigned to tail_pid) can become orphaned if
the script exits early; add a trap that runs on EXIT and on signals (e.g., INT,
TERM) to kill the tail_pid and the install_pid if still running and wait for
them to reap children, then clear the trap; ensure the trap references the
existing variables install_pid, tail_pid and performs safe kill(... ) || true
and wait(... ) 2>/dev/null to avoid leaving stray processes or zombies.

In `@test/service-env.test.js`:
- Around line 16-28: The test titled "starts without messaging-related warnings"
now only asserts the new banner text via the
expect(result).toContain("Messaging:   via OpenClaw native channels") check, so
either rename the test to reflect it's verifying the new banner or restore the
original negative assertion to ensure old warnings don't appear; to fix, either
change the it() description to something like "displays new messaging banner" or
add a negative assertion such as expect(result).not.toContain("<old warning
text>") (or the previous warning substring) alongside the existing expect so the
test both confirms the new banner and verifies the absence of the old messaging
warnings.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 31712fcd-2437-4c29-9723-5dc2b17b6b1a

📥 Commits

Reviewing files that changed from the base of the PR and between 2cf443c and 5cd6388.

📒 Files selected for processing (42)
  • .github/workflows/e2e-brev.yaml
  • .github/workflows/nightly-e2e.yaml
  • Dockerfile
  • Dockerfile.base
  • bin/lib/onboard.js
  • bin/lib/policies.js
  • bin/lib/sandbox-build-context.js
  • docs/deployment/deploy-to-remote-gpu.md
  • docs/get-started/quickstart.md
  • docs/reference/architecture.md
  • docs/reference/troubleshooting.md
  • scripts/benchmark-sandbox-image-build.js
  • scripts/debug.sh
  • scripts/fix-coredns.sh
  • scripts/nemoclaw-start.sh
  • scripts/start-services.sh
  • scripts/telegram-bridge.js
  • src/lib/deploy.ts
  • src/lib/http-probe.test.ts
  • src/lib/http-probe.ts
  • src/lib/model-prompts.test.ts
  • src/lib/model-prompts.ts
  • src/lib/provider-models.test.ts
  • src/lib/provider-models.ts
  • src/lib/sandbox-create-stream.test.ts
  • src/lib/sandbox-create-stream.ts
  • src/lib/services.test.ts
  • src/lib/services.ts
  • src/lib/validation-recovery.test.ts
  • src/lib/validation-recovery.ts
  • test/credential-exposure.test.js
  • test/e2e-gateway-isolation.sh
  • test/e2e-test.sh
  • test/e2e/brev-e2e.test.js
  • test/e2e/test-messaging-providers.sh
  • test/onboard-selection.test.js
  • test/onboard.test.js
  • test/policies.test.js
  • test/runner.test.js
  • test/sandbox-build-context.test.js
  • test/service-env.test.js
  • test/ssh-known-hosts.test.js
💤 Files with no reviewable changes (1)
  • scripts/telegram-bridge.js
✅ Files skipped from review due to trivial changes (4)
  • docs/get-started/quickstart.md
  • docs/reference/architecture.md
  • docs/reference/troubleshooting.md
  • .github/workflows/e2e-brev.yaml

Comment on lines +3131 to +3145
if (ch.userIdEnvKey) {
const existingIds = process.env[ch.userIdEnvKey] || "";
if (existingIds) {
console.log(` ✓ ${ch.name} — allowed IDs already set: ${existingIds}`);
} else {
console.log(` ${ch.userIdHelp}`);
const userId = (await prompt(` ${ch.userIdLabel}: `)).trim();
if (userId) {
process.env[ch.userIdEnvKey] = userId;
console.log(` ✓ ${ch.name} user ID saved`);
} else {
console.log(` Skipped ${ch.name} user ID (bot will require manual pairing)`);
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

User IDs are not persisted to credentials, unlike tokens.

Tokens are saved via saveCredential(ch.envKey, token) (line 3122), but user IDs are only stored in process.env (line 3139). This means users must re-enter their allowed IDs on subsequent onboard runs, while tokens are remembered.

Consider persisting user IDs to credentials for consistency:

Suggested fix
         const userId = (await prompt(`  ${ch.userIdLabel}: `)).trim();
         if (userId) {
           process.env[ch.userIdEnvKey] = userId;
+          saveCredential(ch.userIdEnvKey, userId);
           console.log(`  ✓ ${ch.name} user ID saved`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (ch.userIdEnvKey) {
const existingIds = process.env[ch.userIdEnvKey] || "";
if (existingIds) {
console.log(` ✓ ${ch.name} — allowed IDs already set: ${existingIds}`);
} else {
console.log(` ${ch.userIdHelp}`);
const userId = (await prompt(` ${ch.userIdLabel}: `)).trim();
if (userId) {
process.env[ch.userIdEnvKey] = userId;
console.log(` ✓ ${ch.name} user ID saved`);
} else {
console.log(` Skipped ${ch.name} user ID (bot will require manual pairing)`);
}
}
}
if (ch.userIdEnvKey) {
const existingIds = process.env[ch.userIdEnvKey] || "";
if (existingIds) {
console.log(` ✓ ${ch.name} — allowed IDs already set: ${existingIds}`);
} else {
console.log(` ${ch.userIdHelp}`);
const userId = (await prompt(` ${ch.userIdLabel}: `)).trim();
if (userId) {
process.env[ch.userIdEnvKey] = userId;
saveCredential(ch.userIdEnvKey, userId);
console.log(` ✓ ${ch.name} user ID saved`);
} else {
console.log(` Skipped ${ch.name} user ID (bot will require manual pairing)`);
}
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 3131 - 3145, The user IDs are only written
to process.env and not persisted; update the onboarding flow to persist allowed
IDs using the existing saveCredential helper. When handling ch.userIdEnvKey (the
same branch that reads existingIds from process.env and prompts via prompt),
read existingIds from the credential store if available and, after the user
enters a non-empty userId, call saveCredential(ch.userIdEnvKey, userId) and also
set process.env[ch.userIdEnvKey] = userId for the current run; keep the same
console messages for success/skip. Ensure you reference saveCredential,
ch.userIdEnvKey, process.env, and prompt when making the change.

Comment on lines +103 to 104
const SERVICE_NAMES = ["cloudflared"] as const;
type ServiceName = (typeof SERVICE_NAMES)[number];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep a compatibility cleanup path for legacy telegram-bridge processes.

After this change, showStatus(), stopAll(), and getServiceStatuses() never touch telegram-bridge.pid anymore. Upgraded users can end up with the old bridge still running invisibly with its bot-token environment until they kill it manually. Please stop/prune that legacy process once here before fully dropping it, and mirror the same cleanup in scripts/start-services.sh.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/services.ts` around lines 103 - 104, SERVICE_NAMES was narrowed to
["cloudflared"] and left legacy telegram-bridge processes orphaned; update the
cleanup path so any existing telegram-bridge.pid is detected and pruned before
fully dropping support. In the functions showStatus, stopAll, and
getServiceStatuses add logic to check for telegram-bridge.pid, read the PID,
attempt a graceful stop (SIGTERM, fallback SIGKILL), remove the pid file, and
report the action; mirror the same PID-check-and-remove behavior in
scripts/start-services.sh so upgraded installs won't leave the old bridge
running with its bot-token env. Ensure you reference SERVICE_NAMES/ServiceName
unchanged for current services while adding this separate legacy-cleanup flow
tied to "telegram-bridge.pid".

Comment on lines +150 to +160
// Forward optional messaging tokens for the messaging-providers test
for (const key of [
"TELEGRAM_BOT_TOKEN",
"DISCORD_BOT_TOKEN",
"TELEGRAM_BOT_TOKEN_REAL",
"DISCORD_BOT_TOKEN_REAL",
"TELEGRAM_CHAT_ID_E2E",
]) {
if (process.env[key]) {
envParts.push(`export ${key}='${shellEscape(process.env[key])}'`);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't forward bot tokens in the SSH command string.

This change extends the existing argv leak to Telegram/Discord credentials: each export ${key}=... ends up in the local ssh '...' command, so the secrets are exposed to ps/CI diagnostics while the process is running. Send them over stdin or via a remote temp file instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/brev-e2e.test.js` around lines 150 - 160, The test currently builds
SSH command-line exports by appending strings to envParts using shellEscape for
keys like TELEGRAM_BOT_TOKEN and DISCORD_BOT_TOKEN which causes secrets to
appear in the local ssh '...' command; instead stop putting these sensitive
exports into the SSH command string and transfer them via stdin or a remote temp
file: detect the same keys in envParts collection logic, remove or skip adding
export lines for secret keys, and write the key=value pairs to the remote host
through a secure channel (e.g., send on stdin and write to a temp file then
source it remotely, or create a remote temp file with proper permissions) before
running the test commands so secrets are never present in the local process
argv.

Comment on lines +694 to +702
it.runIf(TEST_SUITE === "messaging-providers" || TEST_SUITE === "all")(
"messaging credential provider suite passes on remote VM",
() => {
const output = runRemoteTest("test/e2e/test-messaging-providers.sh");
expect(output).toContain("PASS");
expect(output).not.toMatch(/FAIL:/);
},
900_000, // 15 min — creates a new sandbox with messaging providers
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '1,220p' test/e2e/test-messaging-providers.sh | nl -ba
rg -n -C2 'NEMOCLAW_SANDBOX_NAME|e2e-msg-provider|nemoclaw (onboard|deploy)|openshell sandbox' test/e2e/test-messaging-providers.sh

Repository: NVIDIA/NemoClaw

Length of output: 1664


🏁 Script executed:

# Find sshEnv() and runRemoteTest() definitions
rg -n "function sshEnv|const sshEnv|sshEnv\s*=" test/e2e/brev-e2e.test.js -A 10
rg -n "function runRemoteTest|const runRemoteTest|runRemoteTest\s*=" test/e2e/brev-e2e.test.js -A 15

Repository: NVIDIA/NemoClaw

Length of output: 1281


🏁 Script executed:

# Check if there's any environment cleanup between tests
rg -n "beforeEach|afterEach|beforeAll|afterAll" test/e2e/brev-e2e.test.js -A 3

Repository: NVIDIA/NemoClaw

Length of output: 898


🏁 Script executed:

# Find where sshEnv is called and how environment is used
rg -n "sshEnv\(\)" test/e2e/brev-e2e.test.js -B 2 -A 5

Repository: NVIDIA/NemoClaw

Length of output: 41


🏁 Script executed:

# Check if there's any special handling for the messaging-providers script path
rg -n "test-messaging-providers|messaging.provider" test/e2e/brev-e2e.test.js -B 3 -A 3

Repository: NVIDIA/NemoClaw

Length of output: 2129


🏁 Script executed:

# Check if runRemoteTest accepts parameters to customize environment
rg -n "runRemoteTest" test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 411


🏁 Script executed:

# Check if sshEnv has any override mechanism or if NEMOCLAW_SANDBOX_NAME is modified elsewhere
rg -n "NEMOCLAW_SANDBOX_NAME" test/e2e/brev-e2e.test.js

Repository: NVIDIA/NemoClaw

Length of output: 150


Fix sandbox isolation: messaging-providers test must not reuse the shared e2e-test sandbox.

The code comment (lines 691–693) claims the messaging-providers test "creates its own sandbox (e2e-msg-provider)", but sshEnv() (line 148) hardcodes NEMOCLAW_SANDBOX_NAME=e2e-test for all tests. When runRemoteTest() executes the messaging script, it passes through sshEnv(), which overwrites any intention to use a separate sandbox. The script's default on line 89 of test-messaging-providers.sh (SANDBOX_NAME="${NEMOCLAW_SANDBOX_NAME:-e2e-msg-provider}") is ineffective because the environment variable is already set.

When TEST_SUITE=all runs, the messaging-providers test reuses the shared e2e-test sandbox instead of isolating to e2e-msg-provider, causing potential interference with other E2E tests. Either:

  • Modify runRemoteTest() or sshEnv() to accept a sandboxName parameter and use it for the messaging test, or
  • Pass the correct NEMOCLAW_SANDBOX_NAME=e2e-msg-provider explicitly when running the messaging script
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/brev-e2e.test.js` around lines 694 - 702, The messaging-providers
test is accidentally forced to use the shared sandbox because sshEnv() currently
hardcodes NEMOCLAW_SANDBOX_NAME=e2e-test; update the test harness so the
messaging test can request a different sandbox. Change sshEnv() or
runRemoteTest() to accept an optional sandboxName parameter and, when provided
(e.g., from the "messaging credential provider suite" test), set
NEMOCLAW_SANDBOX_NAME to that value instead of the hardcoded e2e-test;
alternatively, when calling
runRemoteTest("test/e2e/test-messaging-providers.sh") pass an explicit env
override NEMOCLAW_SANDBOX_NAME=e2e-msg-provider so the script’s SANDBOX_NAME
default is respected. Ensure references to sshEnv(), runRemoteTest(), and the
test-messaging-providers.sh invocation are updated accordingly.

Comment on lines +88 to +99
it("removes [openshell-*]:port bracketed entries", () => {
// SSH known_hosts uses [host]:port for non-standard ports
const input = [
"[openshell-sandbox]:2222 ssh-ed25519 AAAA...",
"github.com ssh-rsa AAAA...",
].join("\n");
// The host field starts with "[openshell-" which starts with "["
// not "openshell-", so this line would be preserved by current logic.
// Documenting current behavior — bracket-prefixed entries are kept.
const result = pruneKnownHostsEntries(input);
expect(result).toBe(input);
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't codify bracketed known_hosts entries as "kept".

SSH stores non-default ports as [host]:port; preserving [openshell-*]:port means stale OpenShell keys still survive pruning and reconnects can keep failing after the gateway port changes. This test should expect removal, then update pruneKnownHostsEntries() to normalize bracketed host fields before matching.

💡 Expected assertion
-    const result = pruneKnownHostsEntries(input);
-    expect(result).toBe(input);
+    expect(pruneKnownHostsEntries(input)).toBe("github.com ssh-rsa AAAA...");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it("removes [openshell-*]:port bracketed entries", () => {
// SSH known_hosts uses [host]:port for non-standard ports
const input = [
"[openshell-sandbox]:2222 ssh-ed25519 AAAA...",
"github.com ssh-rsa AAAA...",
].join("\n");
// The host field starts with "[openshell-" which starts with "["
// not "openshell-", so this line would be preserved by current logic.
// Documenting current behavior — bracket-prefixed entries are kept.
const result = pruneKnownHostsEntries(input);
expect(result).toBe(input);
});
it("removes [openshell-*]:port bracketed entries", () => {
// SSH known_hosts uses [host]:port for non-standard ports
const input = [
"[openshell-sandbox]:2222 ssh-ed25519 AAAA...",
"github.com ssh-rsa AAAA...",
].join("\n");
// The host field starts with "[openshell-" which starts with "["
// not "openshell-", so this line would be preserved by current logic.
// Documenting current behavior — bracket-prefixed entries are kept.
expect(pruneKnownHostsEntries(input)).toBe("github.com ssh-rsa AAAA...");
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/ssh-known-hosts.test.js` around lines 88 - 99, The test currently
expects bracketed entries like "[openshell-sandbox]:2222" to be preserved, but
pruneKnownHostsEntries should remove such entries; update the test to expect
those lines to be pruned and modify pruneKnownHostsEntries to normalize host
fields by stripping surrounding brackets and any ":port" suffix (e.g., convert
"[openshell-sandbox]:2222" → "openshell-sandbox") before comparing against the
openshell pattern so bracketed non-default-port entries are matched and removed;
ensure the normalization logic is used wherever host matching occurs in
pruneKnownHostsEntries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants