Feature: add Terminus interactive terminal agent for TerminalBench #12

Draft
puririshi98 wants to merge 8 commits into sdevare-nv:sdd/codex-wip from puririshi98:rishi-terminus

@puririshi98 puririshi98 commented Feb 18, 2026

Files Committed

New Files (16):

  • ✅ Terminus core implementation and tools (9 files)
  • ✅ Action and observation classes (2 files)
  • ✅ Test scripts
  • ✅ Documentation (3 files)

Modified Files (6):

  • ✅ Schema files (action.py, observation.py)
  • ✅ Serialization files
  • ✅ Tool names registry
  • ✅ TerminalBench README

puririshi98 and others added 2 commits February 18, 2026 21:05
Implements Terminus, an interactive terminal session manager that enables
persistent terminal sessions with state preservation for TerminalBench
evaluation tasks. Follows the OpenCode/Codex agent pattern from PR sdevare-nv#11.

Key features:
- Persistent terminal sessions with environment and directory preservation
- Interactive process support (REPLs, debuggers, interactive CLIs)
- Multi-session management with isolated environments
- Control sequence support (Ctrl+C, Ctrl+D, etc.)
- Timeout handling and automatic cleanup
- PTY-based terminal emulation
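
As a rough illustration of the persistent-session idea, here is a minimal standard-library sketch of a shell kept alive behind a PTY. `PtyShell`, `run`, and the `__DONE_42__` marker are hypothetical names chosen for this example and are not taken from the Terminus code; the sketch assumes a POSIX system with `/bin/sh`.

```python
import os
import pty
import select
import subprocess
import time


class PtyShell:
    """A tiny persistent shell behind a PTY (illustrative only)."""

    def __init__(self) -> None:
        self.master_fd, slave_fd = pty.openpty()
        self.proc = subprocess.Popen(
            ["/bin/sh"],
            stdin=slave_fd,
            stdout=slave_fd,
            stderr=slave_fd,
            close_fds=True,
            start_new_session=True,
        )
        os.close(slave_fd)  # the child keeps its own copy

    def run(self, command: str, timeout: float = 5.0) -> str:
        # The shell expands $((40+2)) to 42, so the finished marker
        # "__DONE_42__" never appears in the PTY's echo of the typed line.
        line = f"{command}; echo __DONE_$((40+2))__\n"
        os.write(self.master_fd, line.encode())
        data = b""
        deadline = time.monotonic() + timeout
        while b"__DONE_42__" not in data:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(command)
            ready, _, _ = select.select([self.master_fd], [], [], remaining)
            if ready:
                data += os.read(self.master_fd, 4096)
        return data.decode(errors="replace")

    def close(self) -> None:
        self.proc.kill()
        self.proc.wait()
        os.close(self.master_fd)
```

Because the same shell process serves every call, a `cd` or `export` issued through one `run` call is still in effect for the next one, which is the state preservation the feature list describes.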

Implementation includes:
- 4 new action types (START, EXECUTE, INPUT, STOP)
- 3 new observation types (OUTPUT, ERROR, SESSION)
- Complete session manager with PTY-based implementation
- LLM tool definitions for agent integration
- Comprehensive test suite with all tests passing
- Full documentation and usage examples

This enables OpenHands to handle terminal-intensive tasks requiring
persistent shell state, interactive processes, and sophisticated
command execution workflows.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@puririshi98 puririshi98 marked this pull request as draft February 18, 2026 21:22
puririshi98 and others added 6 commits February 18, 2026 14:58
5 comprehensive test scenarios, all passing:

  1. ✅ Basic Sequential Commands - Verified multiple commands execute in order
  2. ✅ State Persistence - Confirmed environment variables, directories, and files persist across commands
  3. ✅ Error Recovery - Validated that failed commands don't break sessions
  4. ✅ Complex Workflows - Tested realistic multi-step tasks (creating and running Python scripts)
  5. ✅ Parallel Sessions - Verified multiple concurrent sessions with proper isolation

  Test Results

  ╔══════════════════════════════════════════════════════════╗
  ║              TEST SUMMARY                                ║
  ╠══════════════════════════════════════════════════════════╣
  ║  Total Tests: 5                                          ║
  ║  Passed: 5                                               ║
  ║  Failed: 0                                               ║
  ╚══════════════════════════════════════════════════════════╝

  ✓ All tests passed!
This commit fixes all identified functionality bugs and adds comprehensive
test coverage to prevent regressions.

## Bugs Fixed:

**Fix #1: Exit code extraction no longer contaminates output buffer**
- Modified execute_command to save buffer before extracting exit code
- Created async _extract_exit_code_async method that doesn't pollute output
- Exit code extraction now uses separate buffer that gets discarded
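
The idea of keeping the exit-code query out of the user-visible output can be sketched as follows. `FakeShell` is a hypothetical stand-in for the PTY that returns canned replies, and the `Session` class only borrows the method names mentioned above; the real implementation may differ substantially.

```python
import asyncio


class FakeShell:
    """Hypothetical stand-in for the PTY: returns canned text per command."""

    def __init__(self) -> None:
        self.last_status = 0

    async def read_reply(self, command: str) -> str:
        await asyncio.sleep(0)  # pretend to wait for I/O
        if command == "echo $?":
            return f"{self.last_status}\n"
        if command == "false":
            self.last_status = 1
            return ""
        self.last_status = 0
        return f"ran {command}\n"


class Session:
    def __init__(self, shell: FakeShell) -> None:
        self.shell = shell
        self.output_buffer = ""

    async def _extract_exit_code_async(self) -> int:
        # The reply lands in a local scratch string that is discarded,
        # so it never touches self.output_buffer.
        scratch = await self.shell.read_reply("echo $?")
        return int(scratch.strip())

    async def execute_command(self, command: str):
        self.output_buffer = await self.shell.read_reply(command)
        saved_output = self.output_buffer  # snapshot before the exit-code query
        exit_code = await self._extract_exit_code_async()
        return saved_output, exit_code
```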

**Fix #2: Race conditions prevented with proper locking**
- Added async lock to execute_command method
- Prevents concurrent command execution on same session
- Ensures output from different commands doesn't get mixed
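
A minimal sketch of per-session serialization with `asyncio.Lock`, assuming one lock per session object. The `Session` class and its `log` attribute are illustrative and not taken from the PR; the short sleep stands in for PTY I/O.

```python
import asyncio


class Session:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.log = []

    async def execute_command(self, command: str) -> str:
        # Only one command may hold the lock, so start/end pairs never interleave.
        async with self._lock:
            self.log.append(f"start {command}")
            await asyncio.sleep(0.01)  # stand-in for writing to and reading from the PTY
            self.log.append(f"end {command}")
            return f"output of {command}"


async def demo():
    session = Session()
    await asyncio.gather(
        session.execute_command("a"),
        session.execute_command("b"),
    )
    return session.log
```

Without the lock, the two coroutines would interleave at the `await` and the log would read start a, start b, end a, end b.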

**Fix #3: NameError in create_session error handling fixed**
- Initialize master_fd and process variables before try block
- Prevents NameError if exception occurs before PTY creation
- Also added process cleanup in error handler
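
The pattern described here can be sketched as below. This is an assumption-laden illustration: `SessionError` is a hypothetical exception type, and `os.pipe()` stands in for `pty.openpty()` so the example stays portable.

```python
import os
import subprocess
import sys


class SessionError(RuntimeError):
    """Hypothetical wrapper for failures during session creation."""


def create_session(argv):
    # Bind both names up front so the error handler can always inspect them.
    master_fd = None
    process = None
    try:
        if not argv:
            raise ValueError("argv must not be empty")  # fails before any fd exists
        master_fd, slave_fd = os.pipe()  # stand-in for pty.openpty()
        try:
            process = subprocess.Popen(argv, stdout=slave_fd)
        finally:
            os.close(slave_fd)  # the parent never needs the child's end
        return master_fd, process
    except Exception as exc:
        if process is not None:
            process.kill()
            process.wait()
        if master_fd is not None:
            os.close(master_fd)
        raise SessionError(f"could not create session: {exc}") from exc
```

If `master_fd` and `process` were first assigned inside the `try`, an early failure would make the handler itself raise `NameError` and hide the original error.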

**Fix #4: Robust prompt detection with custom marker**
- Replaced brittle regex patterns ([$#>]$) with unique marker
- Set custom PS1 with PROMPT_MARKER after shell initialization
- Commands outputting $, #, or > no longer cause false timeouts
- Added _clean_output method to remove markers from user-visible output
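
To show why a unique marker is less ambiguous than the old pattern, here is a small sketch. The marker string, `SET_PROMPT`, `is_complete`, and `clean_output` are hypothetical; only the regex `[$#>]$` and the general approach come from the description above.

```python
import re

# The old heuristic: any buffer ending in $, # or > looks like a prompt.
BRITTLE_PROMPT = re.compile(r"[$#>]$")

# Hypothetical marker; any string unlikely to occur in real output would do.
PROMPT_MARKER = "__TERMINUS_PROMPT_7f3a__"

# Command a session could send once after the shell starts (assumed form).
SET_PROMPT = f"export PS1='{PROMPT_MARKER} '"


def is_complete(buffer: str) -> bool:
    """The command is done once the shell has printed the marker prompt."""
    return buffer.rstrip().endswith(PROMPT_MARKER)


def clean_output(buffer: str) -> str:
    """Strip the marker so callers only see the command's own output."""
    return buffer.replace(PROMPT_MARKER, "").strip()
```

For example, the partial output `cost: 5$` matches `BRITTLE_PROMPT` even though no prompt has been printed, while `is_complete` stays false until the marker arrives.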

**Fix #5: Process state validation added**
- Check if shell process is still alive before executing commands
- Prevents hangs when process has crashed or been killed externally
- Added same check to send_input method
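
A liveness check of this kind can be written with `subprocess.Popen.poll()`, which returns `None` while the child is still running. `SessionDeadError` and `ensure_alive` are hypothetical names for this sketch.

```python
import subprocess
import sys


class SessionDeadError(RuntimeError):
    """Hypothetical error raised when the shell process has already exited."""


def ensure_alive(process: subprocess.Popen) -> None:
    # poll() is non-blocking: None means running, otherwise the exit code.
    code = process.poll()
    if code is not None:
        raise SessionDeadError(f"shell exited with code {code}; start a new session")
```

Calling this at the top of both the execute and input paths turns a would-be hang into an immediate, descriptive error.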

**Fix #6: Buffer clearing timing issues resolved**
- Read and discard stale output before clearing buffers
- Prevents output from previous commands contaminating current command
- Fixed rapid successive command execution
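
Draining stale bytes before a new command can be sketched with a zero-timeout `select` loop. The `drain` helper is hypothetical, and the test below uses a pipe rather than a PTY; the sketch assumes a POSIX system.

```python
import os
import select


def drain(fd: int) -> bytes:
    """Read and return everything currently pending on fd without blocking."""
    discarded = b""
    while True:
        # Timeout 0 makes select a pure poll: is there data right now?
        ready, _, _ = select.select([fd], [], [], 0)
        if not ready:
            break
        chunk = os.read(fd, 4096)
        if not chunk:  # EOF
            break
        discarded += chunk
    return discarded
```

Calling `drain(master_fd)` just before writing the next command means leftover output from the previous command is thrown away instead of being attributed to the new one.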

**Fix #7: Thread-safe singleton pattern implemented**
- Added async lock for get_session_manager_async
- Created separate sync version for backward compatibility
- Prevents race conditions in manager initialization
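
A double-checked async singleton might look like the sketch below. `SessionManager` here is a hypothetical placeholder class, and only the accessor name `get_session_manager_async` comes from the description. Note that `asyncio.Lock` serializes coroutines on one event loop; it does not protect against access from multiple OS threads.

```python
import asyncio

_manager = None
_manager_lock = None


class SessionManager:
    """Hypothetical placeholder; counts how many times it is constructed."""

    instances = 0

    def __init__(self) -> None:
        SessionManager.instances += 1


def _get_lock() -> asyncio.Lock:
    # Created lazily so the lock is made while an event loop is running.
    global _manager_lock
    if _manager_lock is None:
        _manager_lock = asyncio.Lock()
    return _manager_lock


async def get_session_manager_async() -> SessionManager:
    global _manager
    if _manager is None:
        async with _get_lock():
            if _manager is None:  # re-check after acquiring the lock
                await asyncio.sleep(0)  # stand-in for asynchronous setup work
                _manager = SessionManager()
    return _manager


async def demo():
    return await asyncio.gather(*(get_session_manager_async() for _ in range(10)))
```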

**Fix #8: Complete resource cleanup on errors**
- Kill and wait for process in error handler
- Close file descriptors even if process creation fails
- No more zombie processes or fd leaks
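
The kill-then-wait-then-close sequence can be sketched as a small helper. `cleanup_session` is a hypothetical name; the point is that `wait()` reaps the child so no zombie remains, and that closing is tolerant of descriptors that are missing or already closed.

```python
import os
import subprocess
import sys


def cleanup_session(process, fds) -> None:
    """Terminate and reap the child, then close every descriptor still held."""
    if process is not None:
        if process.poll() is None:  # still running
            process.kill()
        process.wait()  # reap the child so it does not linger as a zombie
    for fd in fds:
        if fd is None:  # never opened because creation failed early
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # already closed; cleanup should be safe to repeat
```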

**Fix #9: Blocking sleep replaced with async sleep**
- Changed time.sleep to await asyncio.sleep in _extract_exit_code_async
- Prevents blocking the event loop
- Improved async performance
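
The difference matters because `time.sleep` stalls every coroutine on the loop, while `await asyncio.sleep` yields control. A minimal sketch, with `wait_for` and `demo` as hypothetical helpers:

```python
import asyncio


async def wait_for(predicate, interval: float = 0.01, timeout: float = 1.0) -> bool:
    """Poll predicate until it is true, yielding to the event loop between checks."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not predicate():
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)  # other tasks run during this pause
    return True


async def demo() -> bool:
    state = {"ready": False}

    async def producer() -> None:
        await asyncio.sleep(0.05)
        state["ready"] = True

    task = asyncio.create_task(producer())
    ok = await wait_for(lambda: state["ready"])
    await task
    return ok
```

With `time.sleep(interval)` in place of the `await`, `producer` would never get a chance to run and `wait_for` would give up at the timeout.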

## Additional Improvements:

- Enhanced ANSI escape code handling in exit code extraction
- Added support for CSI sequences with ? character
- Improved regex to handle terminal control sequences
- Better exit code parsing for all edge cases
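
One way to strip CSI sequences, including private-mode ones such as `ESC[?2004h`, before parsing an exit code is sketched below. The regex follows the general CSI layout (parameter bytes, intermediate bytes, one final byte); `parse_exit_code` and its last-numeric-line rule are assumptions, not the PR's actual parser.

```python
import re

# ESC [ , then parameter bytes 0x30-0x3F (digits, ';', '?', ...),
# then intermediate bytes 0x20-0x2F, then a single final byte 0x40-0x7E.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(raw: str) -> str:
    return ANSI_CSI.sub("", raw)


def parse_exit_code(raw: str):
    """Return the last purely numeric line of the cleaned text, or None."""
    for line in reversed(strip_ansi(raw).splitlines()):
        line = line.strip()
        if re.fullmatch(r"[0-9]+", line):
            return int(line)
    return None
```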

## Test Coverage:

Added comprehensive test suite (test_terminus_bugfixes.py) with 12 tests:
- 7 tests for specific bug fixes
- 5 tests for previously uncovered scenarios:
  * Concurrent command execution
  * Rapid successive commands
  * Environment persistence
  * Exit code accuracy
  * Multi-line output handling

All 12 tests pass successfully.

## Tested Scenarios:

✅ Exit code extraction doesn't contaminate subsequent commands
✅ Concurrent commands are properly serialized
✅ Error handling doesn't raise NameError
✅ Commands with prompt-like output ($, #, >) don't time out
✅ Dead process detection works correctly
✅ Rapid successive commands don't mix output
✅ Resource cleanup is complete even on errors
✅ Environment variables persist across commands
✅ Exit codes (0, 1, custom) are captured accurately
✅ Multi-line command output is preserved

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
fix: address 9 critical bugs in Terminus implementation