Feature: add Terminus interactive terminal agent for TerminalBench #12
Draft
puririshi98 wants to merge 8 commits into sdevare-nv:sdd/codex-wip from
Implements Terminus, an interactive terminal session manager that enables persistent terminal sessions with state preservation for TerminalBench evaluation tasks. Follows the OpenCode/Codex agent pattern from PR sdevare-nv#11.

Key features:
- Persistent terminal sessions with environment and directory preservation
- Interactive process support (REPLs, debuggers, interactive CLIs)
- Multi-session management with isolated environments
- Control sequence support (Ctrl+C, Ctrl+D, etc.)
- Timeout handling and automatic cleanup
- PTY-based terminal emulation

Implementation includes:
- 4 new action types (START, EXECUTE, INPUT, STOP)
- 3 new observation types (OUTPUT, ERROR, SESSION)
- Complete session manager with PTY-based implementation
- LLM tool definitions for agent integration
- Comprehensive test suite with all tests passing
- Full documentation and usage examples

This enables OpenHands to handle terminal-intensive tasks requiring persistent shell state, interactive processes, and sophisticated command execution workflows.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
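To make the action/observation vocabulary and the control-sequence support concrete, here is a minimal sketch. The enum and dataclass names (`TerminusActionKind`, `TerminusObservationKind`, `TerminusAction`) and the helper `control_byte` are hypothetical and are not taken from the PR; only the seven kind names and the Ctrl+C / Ctrl+D examples come from the description above. The mapping from Ctrl+letter to a byte is standard ASCII behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class TerminusActionKind(Enum):
    """The four action kinds listed in the description (enum name is hypothetical)."""
    START = "start"
    EXECUTE = "execute"
    INPUT = "input"
    STOP = "stop"


class TerminusObservationKind(Enum):
    """The three observation kinds listed in the description (enum name is hypothetical)."""
    OUTPUT = "output"
    ERROR = "error"
    SESSION = "session"


def control_byte(letter: str) -> bytes:
    """Return the byte a terminal sends for Ctrl+<letter>.

    In ASCII, Ctrl+<letter> is the uppercase letter's code with the
    upper three bits cleared, so Ctrl+C is 0x03 and Ctrl+D is 0x04.
    """
    if len(letter) != 1 or not letter.isascii() or not letter.isalpha():
        raise ValueError(f"expected a single ASCII letter, got {letter!r}")
    return bytes([ord(letter.upper()) & 0x1F])


@dataclass(frozen=True)
class TerminusAction:
    """Hypothetical action record: what to do, in which session, with which bytes."""
    kind: TerminusActionKind
    session_id: str
    payload: bytes = b""


# Example: an INPUT action that would interrupt the foreground process.
interrupt = TerminusAction(TerminusActionKind.INPUT, "session-1", control_byte("c"))
```

Keeping control sequences as raw bytes on an INPUT action is one possible design; it would let the same code path serve both ordinary keystrokes and interrupts.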
5 comprehensive test scenarios, all passing:
1. ✅ Basic Sequential Commands - Verified multiple commands execute in order
2. ✅ State Persistence - Confirmed environment variables, directories, and files persist across commands
3. ✅ Error Recovery - Validated that failed commands don't break sessions
4. ✅ Complex Workflows - Tested realistic multi-step tasks (creating and running Python scripts)
5. ✅ Parallel Sessions - Verified multiple concurrent sessions with proper isolation
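The state-persistence, error-recovery, and isolation scenarios above can be illustrated with a small sketch. This is not the PR's implementation: it uses plain pipes to a long-lived `/bin/sh` rather than a PTY, and the class name `PipeShell`, the per-command sentinel, and the variable `TERMINUS_DEMO_VAR` are all assumptions made for illustration.

```python
import subprocess
import uuid


class PipeShell:
    """Hypothetical stand-in for one session: a single long-lived /bin/sh."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["/bin/sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,  # line-buffered text pipes
        )

    def run(self, command: str) -> str:
        """Run a command in the same shell and return everything it printed."""
        # A fresh random sentinel marks the end of this command's output.
        marker = f"__DONE_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{command}\necho {marker}\n")
        self.proc.stdin.flush()
        chunks = []
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise RuntimeError("shell exited before the command finished")
            stripped = line.rstrip("\n")
            if stripped.endswith(marker):
                # Keep any output that preceded the sentinel on the same line.
                chunks.append(stripped[: -len(marker)])
                break
            chunks.append(line)
        return "".join(chunks)

    def close(self) -> None:
        """End the shell by closing its stdin, then reap the process."""
        self.proc.stdin.close()
        self.proc.wait(timeout=5)
        self.proc.stdout.close()
```

A usage sketch: two `PipeShell` objects are separate processes, so a variable exported in one is not visible in the other, while a second `run` call on the same object still sees it.

```python
first, second = PipeShell(), PipeShell()
first.run("export TERMINUS_DEMO_VAR=bar; cd /")
print(first.run("echo $TERMINUS_DEMO_VAR; pwd"))
print(second.run('echo "${TERMINUS_DEMO_VAR:-unset}"'))
first.close()
second.close()
```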
Test Results
╔══════════════════════════════════════════════════════════╗
║ TEST SUMMARY ║
╠══════════════════════════════════════════════════════════╣
║ Total Tests: 5 ║
║ Passed: 5 ║
║ Failed: 0 ║
╚══════════════════════════════════════════════════════════╝
✓ All tests passed!
This commit fixes all identified functionality bugs and adds comprehensive test coverage to prevent regressions.

## Bugs Fixed:

**Fix #1: Exit code extraction no longer contaminates output buffer**
- Modified execute_command to save buffer before extracting exit code
- Created async _extract_exit_code_async method that doesn't pollute output
- Exit code extraction now uses separate buffer that gets discarded

**Fix #2: Race conditions prevented with proper locking**
- Added async lock to execute_command method
- Prevents concurrent command execution on same session
- Ensures output from different commands doesn't get mixed

**Fix #3: NameError in create_session error handling fixed**
- Initialize master_fd and process variables before try block
- Prevents NameError if exception occurs before PTY creation
- Also added process cleanup in error handler

**Fix #4: Robust prompt detection with custom marker**
- Replaced brittle regex patterns ([$#>]$) with unique marker
- Set custom PS1 with PROMPT_MARKER after shell initialization
- Commands outputting $, #, or > no longer cause false timeouts
- Added _clean_output method to remove markers from user-visible output

**Fix #5: Process state validation added**
- Check if shell process is still alive before executing commands
- Prevents hangs when process has crashed or been killed externally
- Added same check to send_input method

**Fix #6: Buffer clearing timing issues resolved**
- Read and discard stale output before clearing buffers
- Prevents output from previous commands contaminating current command
- Fixed rapid successive command execution

**Fix #7: Thread-safe singleton pattern implemented**
- Added async lock for get_session_manager_async
- Created separate sync version for backward compatibility
- Prevents race conditions in manager initialization

**Fix #8: Complete resource cleanup on errors**
- Kill and wait for process in error handler
- Close file descriptors even if process creation fails
- No more zombie processes or fd leaks

**Fix #9: Blocking sleep replaced with async sleep**
- Changed time.sleep to await asyncio.sleep in _extract_exit_code_async
- Prevents blocking the event loop
- Improved async performance

## Additional Improvements:

- Enhanced ANSI escape code handling in exit code extraction
- Added support for CSI sequences with ? character
- Improved regex to handle terminal control sequences
- Better exit code parsing for all edge cases

## Test Coverage:

Added comprehensive test suite (test_terminus_bugfixes.py) with 12 tests:
- 7 tests for specific bug fixes
- 5 tests for previously uncovered scenarios:
  * Concurrent command execution
  * Rapid successive commands
  * Environment persistence
  * Exit code accuracy
  * Multi-line output handling

All 12 tests pass successfully.

## Tested Scenarios:

✅ Exit code extraction doesn't contaminate subsequent commands
✅ Concurrent commands are properly serialized
✅ Error handling doesn't raise NameError
✅ Commands with prompt-like output ($, #, >) don't timeout
✅ Dead process detection works correctly
✅ Rapid successive commands don't mix output
✅ Resource cleanup is complete even on errors
✅ Environment variables persist across commands
✅ Exit codes (0, 1, custom) are captured accurately
✅ Multi-line command output is preserved

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
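The marker-based prompt detection and ANSI-tolerant exit code parsing described in this commit might look roughly like the sketch below. The marker value and the function names `strip_ansi`, `command_finished`, `clean_output`, and `parse_exit_code` are assumptions for illustration; the PR's own `_clean_output` and `_extract_exit_code_async` may differ. The regular expression follows the general CSI layout (ESC `[`, parameter bytes including `?`, optional intermediate bytes, one final byte).

```python
import re
from typing import Optional

# Hypothetical marker value; the PR only says a unique PROMPT_MARKER is placed in PS1.
PROMPT_MARKER = "__TERMINUS_PROMPT__"

# CSI sequence: ESC [ + parameter bytes (0x30-0x3F, which includes '?')
# + intermediate bytes (0x20-0x2F) + one final byte (0x40-0x7E).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences such as colours or ESC[?2004h."""
    return ANSI_CSI.sub("", text)


def command_finished(buffer: str) -> bool:
    """True once the shell has printed the marker prompt at the end of the buffer.

    Ordinary output ending in '$', '#' or '>' does not count as a prompt.
    """
    return strip_ansi(buffer).rstrip().endswith(PROMPT_MARKER)


def clean_output(buffer: str) -> str:
    """Return the user-visible output with escapes and markers removed."""
    text = strip_ansi(buffer).replace("\r\n", "\n")
    return text.replace(PROMPT_MARKER, "").rstrip()


def parse_exit_code(raw: str) -> Optional[int]:
    """Find a line holding only a small integer, as printed by `echo $?`."""
    match = re.search(r"^\s*(\d{1,3})\s*$", strip_ansi(raw), re.MULTILINE)
    return int(match.group(1)) if match else None
```

Under this scheme the output of `echo $?` would be read into a separate buffer and passed to `parse_exit_code`, so it never reaches the text returned by `clean_output`.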
fix: address 9 critical bugs in Terminus implementation
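As an illustration of the locking fix (Fix #2) and the non-blocking sleep (Fix #9), here is a minimal sketch of serializing commands on one session with `asyncio.Lock`. The class `DemoSession` and its `log` list are hypothetical, and `asyncio.sleep` stands in for the real PTY I/O.

```python
import asyncio
from typing import List


class DemoSession:
    """Hypothetical session that allows one command at a time."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.log: List[str] = []

    async def execute_command(self, name: str) -> None:
        # Only one coroutine may hold the lock, so the start/end entries
        # of different commands cannot interleave.
        async with self._lock:
            self.log.append(f"{name}:start")
            # Awaiting asyncio.sleep yields to the event loop;
            # time.sleep here would block every other coroutine.
            await asyncio.sleep(0.01)
            self.log.append(f"{name}:end")


async def demo() -> List[str]:
    # Create the session inside the running loop.
    session = DemoSession()
    await asyncio.gather(*(session.execute_command(n) for n in ("a", "b", "c")))
    return session.log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the lock, all three commands would log their start before any logged its end, which is the kind of mixed output the fix is meant to prevent.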
Files Committed
New Files (16):
Modified Files (6):