Feature: add Terminus interactive terminal agent for TerminalBench by puririshi98 · Pull Request #12 · sdevare-nv/nv-OpenHands

puririshi98 · 2026-02-18T21:10:28Z

Files Committed

New Files (16):

✅ Terminus core implementation and tools (9 files)
✅ Action and observation classes (2 files)
✅ Test scripts
✅ Documentation (3 files)

Modified Files (6):

✅ Schema files (action.py, observation.py)
✅ Serialization files
✅ Tool names registry
✅ TerminalBench README

Implements Terminus, an interactive terminal session manager that provides persistent terminal sessions with state preservation for TerminalBench evaluation tasks. Follows the OpenCode/Codex agent pattern from PR sdevare-nv#11.

Key features:
- Persistent terminal sessions that preserve environment variables and the working directory
- Interactive process support (REPLs, debuggers, interactive CLIs)
- Multi-session management with isolated environments
- Control sequence support (Ctrl+C, Ctrl+D, etc.)
- Timeout handling and automatic cleanup
- PTY-based terminal emulation

Implementation includes:
- 4 new action types (START, EXECUTE, INPUT, STOP)
- 3 new observation types (OUTPUT, ERROR, SESSION)
- A complete PTY-based session manager
- LLM tool definitions for agent integration
- A test suite, with all tests passing
- Documentation and usage examples

This enables OpenHands to handle terminal-intensive tasks that require persistent shell state, interactive processes, and multi-step command workflows.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
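The four action types, three observation types, and control-sequence support listed above could be modeled roughly as follows. This is a minimal sketch, not the PR's `action.py`/`observation.py`: the names `TerminusActionType`, `TerminusObservationType`, `TerminusAction`, `CONTROL_SEQUENCES`, and `payload()` are all hypothetical. Only the control-character values are standard ASCII (Ctrl+C sends ETX `\x03`, Ctrl+D sends EOT `\x04`).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TerminusActionType(str, Enum):
    # Hypothetical names for the four action types described in the PR.
    START = "start"      # open a new persistent terminal session
    EXECUTE = "execute"  # run a command and wait for it to finish
    INPUT = "input"      # send raw input (e.g. to a REPL or debugger)
    STOP = "stop"        # terminate the session and release its resources


class TerminusObservationType(str, Enum):
    OUTPUT = "output"    # command output plus exit code
    ERROR = "error"      # failures such as timeouts or a dead shell
    SESSION = "session"  # session lifecycle events (started/stopped)


# Standard ASCII control characters a terminal sends for common key chords.
CONTROL_SEQUENCES = {
    "C-c": "\x03",   # ETX: interrupt (SIGINT when the tty has ISIG set)
    "C-d": "\x04",   # EOT: end of file on an empty line
    "C-z": "\x1a",   # SUB: suspend (SIGTSTP)
    "C-\\": "\x1c",  # FS: quit (SIGQUIT)
}


@dataclass
class TerminusAction:
    action_type: TerminusActionType
    session_id: Optional[str] = None
    command: str = ""
    timeout: float = 30.0

    def payload(self) -> str:
        """Text to write to the terminal for an INPUT action."""
        if self.action_type is not TerminusActionType.INPUT:
            raise ValueError("payload() only applies to INPUT actions")
        # Named chords map to control bytes; anything else is sent verbatim.
        return CONTROL_SEQUENCES.get(self.command, self.command)
```

Keeping the chord-to-byte table separate from the action class lets an INPUT action carry either literal text (for a REPL) or a named chord, so the agent never has to emit raw control bytes itself.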

5 test scenarios, all passing:
1. ✅ Basic Sequential Commands: verified that multiple commands execute in order
2. ✅ State Persistence: confirmed that environment variables, the working directory, and files persist across commands
3. ✅ Error Recovery: validated that failed commands don't break the session
4. ✅ Complex Workflows: tested realistic multi-step tasks (creating and running Python scripts)
5. ✅ Parallel Sessions: verified multiple concurrent sessions with proper isolation

Test Results:
- Total Tests: 5
- Passed: 5
- Failed: 0

✓ All tests passed!
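The persistence, error-recovery, and isolation scenarios can be reproduced with a much simpler harness than a PTY. The sketch below is a stand-in under stated assumptions, not the PR's session manager: the class `PipeShellSession` is hypothetical, it drives a non-interactive `bash` over pipes instead of a PTY, and it assumes `bash` is on `PATH`. After each command it echoes a random sentinel followed by `$?`, then reads until it sees that sentinel.

```python
import subprocess
import uuid


class PipeShellSession:
    """Hypothetical persistent shell over pipes (no PTY), for illustrating state persistence."""

    def __init__(self) -> None:
        # A random marker is vanishingly unlikely to appear in real command output.
        self.marker = f"__DONE_{uuid.uuid4().hex}__"
        self.proc = subprocess.Popen(
            ["bash", "--noprofile", "--norc"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so error messages are captured too
            text=True,
            bufsize=1,  # line-buffered writes to the shell
        )

    def run(self, command: str) -> tuple[str, int]:
        # `$?` is expanded before echo runs, so it holds the command's exit status.
        self.proc.stdin.write(f"{command}\necho {self.marker}$?\n")
        self.proc.stdin.flush()
        lines = []
        for line in self.proc.stdout:
            idx = line.find(self.marker)
            if idx >= 0:
                lines.append(line[:idx])  # keep output printed without a trailing newline
                return "".join(lines), int(line[idx + len(self.marker):])
            lines.append(line)
        raise RuntimeError("shell exited unexpectedly")

    def close(self) -> None:
        self.proc.stdin.close()  # EOF makes bash exit
        self.proc.wait(timeout=5)
        self.proc.stdout.close()
```

Because one `bash` process lives for the whole session, `cd` and `export` carry over between `run()` calls, while a second `PipeShellSession` starts from a clean environment.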

This commit fixes the 9 functionality bugs identified in review and adds test coverage to prevent regressions.

## Bugs Fixed:

**Fix #1: Exit code extraction no longer contaminates the output buffer**
- Modified execute_command to save the buffer before extracting the exit code
- Created an async _extract_exit_code_async method that doesn't pollute output
- Exit code extraction now uses a separate buffer that gets discarded

**Fix #2: Race conditions prevented with proper locking**
- Added an async lock to the execute_command method
- Prevents concurrent command execution on the same session
- Ensures output from different commands doesn't get mixed

**Fix #3: NameError in create_session error handling fixed**
- Initialize the master_fd and process variables before the try block
- Prevents a NameError if an exception occurs before PTY creation
- Also added process cleanup in the error handler

**Fix #4: Robust prompt detection with a custom marker**
- Replaced brittle regex patterns ([$#>]$) with a unique marker
- Set a custom PS1 containing PROMPT_MARKER after shell initialization
- Commands whose output contains $, #, or > no longer cause false timeouts
- Added a _clean_output method to remove markers from user-visible output

**Fix #5: Process state validation added**
- Check whether the shell process is still alive before executing commands
- Prevents hangs when the process has crashed or been killed externally
- Added the same check to the send_input method

**Fix #6: Buffer clearing timing issues resolved**
- Read and discard stale output before clearing buffers
- Prevents output from previous commands from contaminating the current command
- Fixed rapid successive command execution

**Fix #7: Thread-safe singleton pattern implemented**
- Added an async lock for get_session_manager_async
- Created a separate sync version for backward compatibility
- Prevents race conditions in manager initialization

**Fix #8: Complete resource cleanup on errors**
- Kill and wait for the process in the error handler
- Close file descriptors even if process creation fails
- No more zombie processes or fd leaks

**Fix #9: Blocking sleep replaced with async sleep**
- Changed time.sleep to await asyncio.sleep in _extract_exit_code_async
- Prevents blocking the event loop
- Improved async performance

## Additional Improvements:
- Enhanced ANSI escape code handling in exit code extraction
- Added support for CSI sequences containing the ? character
- Improved regex to handle terminal control sequences
- Better exit code parsing in edge cases

## Test Coverage:
Added a test suite (test_terminus_bugfixes.py) with 12 tests:
- 7 tests for specific bug fixes
- 5 tests for previously uncovered scenarios:
  * Concurrent command execution
  * Rapid successive commands
  * Environment persistence
  * Exit code accuracy
  * Multi-line output handling

All 12 tests pass.

## Tested Scenarios:
✅ Exit code extraction doesn't contaminate subsequent commands
✅ Concurrent commands are properly serialized
✅ Error handling doesn't raise NameError
✅ Commands with prompt-like output ($, #, >) don't time out
✅ Dead process detection works correctly
✅ Rapid successive commands don't mix output
✅ Resource cleanup is complete even on errors
✅ Environment variables persist across commands
✅ Exit codes (0, 1, custom) are captured accurately
✅ Multi-line command output is preserved

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
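Fixes #1 and #4, together with the ANSI improvements, come down to two text-processing steps: strip terminal escape codes and marker lines from user-visible output, and parse the exit code from a separate `echo $?` round-trip. The sketch below approximates both under assumptions: `PROMPT_MARKER` here is a made-up value, `strip_ansi`, `clean_output`, and `parse_exit_code` are hypothetical helpers rather than the PR's `_clean_output`, and the regex covers CSI sequences (including private-mode ones with `?`, such as bracketed paste `ESC[?2004h`) and OSC title sequences, not every escape a terminal can emit.

```python
import re
from typing import Optional

PROMPT_MARKER = "__TERMINUS_PROMPT_7f3a__"  # hypothetical unique marker set as PS1

ANSI_RE = re.compile(
    # CSI: ESC [, parameter bytes 0x30-0x3F (digits, ';', private markers like '?'),
    # intermediate bytes 0x20-0x2F, then one final byte 0x40-0x7E.
    r"\x1b\[[0-?]*[ -/]*[@-~]"
    # OSC: ESC ], payload, terminated by BEL or ESC \ (e.g. window-title updates).
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
)


def strip_ansi(text: str) -> str:
    return ANSI_RE.sub("", text)


def clean_output(raw: str, command: str) -> str:
    """Drop escape codes, carriage returns, the echoed command, and marker lines."""
    text = strip_ansi(raw).replace("\r\n", "\n").replace("\r", "")
    lines = text.split("\n")
    if lines and lines[0].strip() == command.strip():
        lines = lines[1:]  # a PTY echoes the typed command back
    # Only the marker identifies the prompt, so output containing $, # or > survives.
    lines = [ln for ln in lines if PROMPT_MARKER not in ln]
    return "\n".join(lines).strip("\n")  # trim blank lines at both ends


def parse_exit_code(raw: str) -> Optional[int]:
    """Parse the buffer of a separate `echo $?` round-trip; None if no status is found."""
    for line in strip_ansi(raw).replace("\r", "").split("\n"):
        line = line.strip()
        if re.fullmatch(r"[0-9]{1,3}", line):  # exit statuses are 0-255
            return int(line)
    return None
```

Parsing the status from its own buffer, which is then discarded, is what keeps the `echo $?` line out of the next command's output.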
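Fixes #2, #5, and #9 fit together naturally: one `asyncio.Lock` per session serializes commands, a liveness check runs before any I/O, and waiting uses `await asyncio.sleep` so other sessions keep running. The sketch below replaces the real PTY with a hypothetical `FakeShell` so the locking logic is visible on its own; with a real `subprocess.Popen`, the liveness test would typically be `process.poll() is None`. `Session` and `execute_command` here are illustrative, not the PR's code.

```python
import asyncio


class FakeShell:
    """Hypothetical stand-in for a PTY-backed shell: records when commands start and end."""

    def __init__(self) -> None:
        self.alive = True  # a real session would check process.poll() is None
        self.transcript: list[str] = []

    async def run(self, command: str) -> str:
        self.transcript.append(f"start {command}")
        await asyncio.sleep(0.01)  # non-blocking wait: the event loop stays free (Fix #9)
        self.transcript.append(f"end {command}")
        return f"output of {command}"


class Session:
    def __init__(self, shell: FakeShell) -> None:
        self.shell = shell
        self._lock = asyncio.Lock()  # at most one command in flight per session (Fix #2)

    async def execute_command(self, command: str) -> str:
        async with self._lock:
            if not self.shell.alive:  # fail fast instead of waiting for a prompt (Fix #5)
                raise RuntimeError("shell process is not running")
            return await self.shell.run(command)


async def demo() -> FakeShell:
    shell = FakeShell()
    session = Session(shell)
    # Three concurrent callers; the lock forces them to run one after another.
    await asyncio.gather(*(session.execute_command(c) for c in ("a", "b", "c")))
    return shell


shell = asyncio.run(demo())
print(shell.transcript)  # ['start a', 'end a', 'start b', 'end b', 'start c', 'end c']
```

Without the lock, the transcript would interleave as `start a, start b, start c, ...`, which is the mixed-output symptom Fix #2 describes.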
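Fixes #3 and #8 follow a standard pattern: bind every resource variable to `None` before the `try`, so the `except` block can always refer to it, then kill and reap the child and close any descriptors that were opened. The sketch below is a hypothetical `create_session`, not the PR's implementation; it assumes a POSIX system with `/bin/bash` available, and uses `pty.openpty()` plus `subprocess.Popen`.

```python
import os
import pty
import subprocess
from typing import Optional


def create_session(shell: str = "/bin/bash") -> tuple[int, subprocess.Popen]:
    """Hypothetical PTY session factory illustrating Fix #3 and Fix #8."""
    # Bound before the try block, so the except clause never hits a NameError.
    master_fd: Optional[int] = None
    slave_fd: Optional[int] = None
    process: Optional[subprocess.Popen] = None
    try:
        master_fd, slave_fd = pty.openpty()
        process = subprocess.Popen(
            [shell, "--noprofile", "--norc"],
            stdin=slave_fd,
            stdout=slave_fd,
            stderr=slave_fd,
            start_new_session=True,  # setsid() in the child, detaching it from our process group
        )
        os.close(slave_fd)  # the child has its own copy; ours is no longer needed
        slave_fd = None
        return master_fd, process
    except Exception:
        if process is not None:
            process.kill()
            process.wait()  # reap the child so it does not linger as a zombie
        for fd in (slave_fd, master_fd):
            if fd is not None:
                os.close(fd)  # no fd leak even when Popen itself failed
        raise
```

If `shell` does not exist, `Popen` raises `FileNotFoundError` before `process` is assigned; because `process` was pre-bound to `None`, the handler skips the kill step, closes both PTY ends, and re-raises the original error.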
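Fix #7's lazily created manager can be guarded with double-checked locking around an `asyncio.Lock`. One caveat: an `asyncio.Lock` only excludes coroutines on the same event loop and is documented as not thread-safe, so the synchronous accessor in this sketch uses a `threading.Lock` as well. `get_session_manager_async` comes from the commit message; `SessionManager`, `get_session_manager`, and the function bodies are hypothetical.

```python
import asyncio
import threading
from typing import Optional


class SessionManager:
    instances_created = 0  # counts constructions so duplicate initialization is detectable

    def __init__(self) -> None:
        SessionManager.instances_created += 1
        self.sessions: dict[str, object] = {}


_manager: Optional[SessionManager] = None
_async_lock: Optional[asyncio.Lock] = None  # created lazily, on first use inside a loop
_thread_lock = threading.Lock()


async def get_session_manager_async() -> SessionManager:
    global _manager, _async_lock
    if _manager is not None:  # fast path: no locking once initialized
        return _manager
    if _async_lock is None:  # no await between check and assignment, so no race here
        _async_lock = asyncio.Lock()
    async with _async_lock:
        if _manager is None:  # re-check: another coroutine may have finished first
            await asyncio.sleep(0)  # stand-in for awaitable setup work
            with _thread_lock:  # also exclude the sync accessor running in another thread
                if _manager is None:
                    _manager = SessionManager()
    return _manager


def get_session_manager() -> SessionManager:
    """Synchronous accessor for callers outside the event loop."""
    global _manager
    with _thread_lock:
        if _manager is None:
            _manager = SessionManager()
    return _manager
```

The `await` inside the critical section is what makes the async lock necessary: without a suspension point, coroutines on one loop could not interleave during initialization in the first place.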

fix: address 9 critical bugs in Terminus implementation

puririshi98 and others added 2 commits February 18, 2026 21:05

adding my simple test to the PR, can remove after review

aa3cbb3

puririshi98 marked this pull request as draft February 18, 2026 21:22

puririshi98 and others added 6 commits February 18, 2026 14:58

Delete test_terminus.py

e9ab802

Delete simple_terminus_test.py

482dd6f

Delete demo_terminus.py

c8cbf02

Merge pull request #1 from puririshi98/pr-12

6839d5c

fix: address 9 critical bugs in Terminus implementation

puririshi98 wants to merge 8 commits into sdevare-nv:sdd/codex-wip from puririshi98:rishi-terminus

puririshi98 commented Feb 18, 2026 (edited)



1 participant
