fix(onboard): recover from SSH 255 when sandbox was already created #1598
When openshell sandbox create exits with SSH 255 after printing "Created sandbox:", NemoClaw previously hard-exited instead of checking whether the sandbox reached Ready state. This was a regression from the create-stream extraction (#1516) combined with the messaging provider migration path (#1081, #1527) that forces sandbox recreation.

Two fixes:
- streamSandboxCreate: do one final readyCheck on non-zero close to catch the race where the sandbox is already Ready when SSH dies.
- onboard.js: when failure is sandbox_create_incomplete, fall through to the existing ready-wait loop (60s polling) instead of exiting.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Walkthrough
This PR enhances sandbox creation error handling in two places: the onboard orchestrator now classifies creation failures and conditionally continues or exits, while the create-stream module adds a final readiness check when the child process exits with a non-zero code to override the error if the sandbox is actually ready.
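The final readiness check in the create-stream module can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `finalizeCreateResult`, its return shape, and the assumption that `readyCheck` is a synchronous callback returning a boolean are all hypothetical.

```javascript
// Hypothetical sketch; names and return shape are assumptions, not NemoClaw's actual code.
// Decides the final result once the `openshell sandbox create` child process has closed.
function finalizeCreateResult(exitCode, output, readyCheck) {
  if (exitCode === 0) {
    return { status: 0, output, recovered: false };
  }
  // Non-zero close (for example SSH 255): probe readiness exactly once,
  // because the sandbox may already be Ready even though the stream died.
  let ready = false;
  try {
    ready = Boolean(readyCheck());
  } catch (err) {
    ready = false; // a failing probe is treated as "not ready"
  }
  return ready
    ? { status: 0, output, recovered: true }
    : { status: exitCode, output, recovered: false };
}
```

Probing only once keeps the close handler cheap. If the sandbox is not Ready yet, the non-zero status is passed through so the caller can decide whether to keep waiting.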
Estimated code review effort: 3 (Moderate), ~22 minutes
Summary
- When `openshell sandbox create` exits with SSH 255 after printing "Created sandbox:", NemoClaw now treats this as recoverable instead of hard-exiting
- `streamSandboxCreate` runs one final `readyCheck` on non-zero close to catch the race where the sandbox is already Ready when SSH dies
- In `onboard.js`, `sandbox_create_incomplete` failures now fall through to the existing 60s ready-wait loop instead of calling `process.exit()`
Root cause
Regression from #1516 (create-stream extraction) combined with #1081/#1527 (messaging provider migration forcing sandbox recreation). The create stream returns non-zero after "Created sandbox:" and NemoClaw exits before checking if the sandbox reached Ready state.
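The recovery path on the onboard side can be sketched as below. The real classifier and wait loop are not shown in this PR description, so everything except the `sandbox_create_incomplete` label and the 60s budget is an assumption: the function names, the "Created sandbox:" substring test, the 2s polling interval, and the synchronous probe are hypothetical.

```javascript
// Hypothetical sketch; every name except `sandbox_create_incomplete` is an assumption.

// Classify a failed create: the sandbox was announced but the stream ended non-zero.
function classifyCreateFailure(output) {
  return output.includes("Created sandbox:")
    ? "sandbox_create_incomplete"
    : "unknown";
}

// Synchronous sleep that blocks the calling thread for `ms` milliseconds.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Poll `isReady` until it returns true or the time budget is spent.
function waitForReady(
  isReady,
  { timeoutMs = 60000, intervalMs = 2000, sleep = sleepSync } = {}
) {
  const attempts = Math.ceil(timeoutMs / intervalMs);
  for (let i = 0; i < attempts; i += 1) {
    if (isReady()) return true;
    if (i < attempts - 1) sleep(intervalMs); // no sleep after the last probe
  }
  return false;
}

// Returns true when onboarding may continue, false when it should exit.
function handleCreateResult(result, isReady, waitOptions) {
  const recoverable =
    result.status === 0 ||
    classifyCreateFailure(result.output) === "sandbox_create_incomplete";
  if (!recoverable) {
    return false; // the caller would report the error and exit here
  }
  // Success and sandbox_create_incomplete both go through the ready-wait loop.
  return waitForReady(isReady, waitOptions);
}
```

Injecting `sleep` and `isReady` keeps the loop testable without a real sandbox or real delays.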
Test plan
- `sandbox-create-stream` tests pass (7/7)