@matheper
Collaborator

This pull request implements retry and replay functionality for agent runs in the event of terminal or timeout failures. It introduces a mechanism to automatically retry failed tasks up to a configurable limit, replaying previous actions to ensure continuity and minimize lost progress. The changes also improve error handling, progress tracking, and trajectory management throughout the agent execution lifecycle.

The most important changes are:

Retry and Replay Mechanism:

  • Added a --max-retries argument that sets the maximum number of retries after a terminal or timeout failure; each retry replays the previous steps (debug_gym/agents/utils.py, scripts/run.py). [1] [2]
  • Implemented logic in scripts/run.py to catch UnrecoverableTerminalError and AgentTimeoutException, save the current trajectory, reload actions from previous attempts, and replay them before generating new actions on retry. [1] [2]
  • Modified BaseAgent.run to accept a replay_actions argument, replaying previous LLM responses before proceeding with new steps, and logging replay details for traceability (debug_gym/agents/base_agent.py). [1] [2]
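The retry-and-replay flow in these bullets can be sketched in a few lines of Python. This is a minimal illustration, not the code in scripts/run.py: ToyAgent, run_with_retries, and the step-counting failure model are hypothetical, the two exception classes are local stand-ins for the ones named above, and the sketch assumes --max-retries counts retries after the first attempt.

```python
from dataclasses import dataclass, field


class UnrecoverableTerminalError(Exception):
    """Hypothetical stand-in for the terminal failure named in this PR."""


class AgentTimeoutException(Exception):
    """Hypothetical stand-in for the agent timeout named in this PR."""


@dataclass
class ToyAgent:
    """Toy agent whose terminal 'dies' on chosen global step counts (illustration only)."""

    fail_at_steps: set
    steps_taken: int = 0
    history: list = field(default_factory=list)

    def run(self, task, replay_actions=None, max_steps=5):
        self.history = []
        # Replay actions recorded by an earlier attempt without generating them again.
        for action in replay_actions or []:
            self.history.append(action)
        # Generate new actions until the step budget is used up.
        while len(self.history) < max_steps:
            self.steps_taken += 1
            if self.steps_taken in self.fail_at_steps:
                raise UnrecoverableTerminalError(f"terminal lost at step {self.steps_taken}")
            self.history.append(f"{task}-action-{len(self.history)}")
        return self.history


def run_with_retries(agent, task, max_retries=3):
    """Retry on terminal/timeout failures, replaying the previous attempt's actions."""
    replay = []
    # One initial attempt plus up to max_retries retries (assumed semantics).
    for _ in range(max_retries + 1):
        try:
            return agent.run(task, replay_actions=replay)
        except (UnrecoverableTerminalError, AgentTimeoutException):
            # Keep everything done so far so the next attempt can replay it.
            replay = list(agent.history)
    raise RuntimeError(f"{task!r} still failing after {max_retries} retries")


agent = ToyAgent(fail_at_steps={3})
result = run_with_retries(agent, "t")
```

Here the first attempt produces two actions before the simulated terminal loss; the retry replays those two and generates only the remaining three, so no completed work is lost.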

Trajectory Management:

  • Added load_trajectory utility to reconstruct LLMResponse objects from saved trajectories, enabling accurate action replay (debug_gym/agents/utils.py).
  • Ensured that trajectories are saved after terminal failures and before retries, to support seamless recovery (scripts/run.py).
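A trajectory loader of this kind might look like the sketch below. Both the JSON layout (a "log" list whose entries carry "action", "content", and "prompt" keys) and the reduced LLMResponse dataclass are assumptions for illustration; the real trajectory format and LLMResponse fields in debug-gym may differ.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class LLMResponse:
    """Minimal stand-in for debug-gym's LLMResponse; only the fields used here."""

    response: str
    tool: Optional[dict] = None
    prompt: list = field(default_factory=list)


def load_trajectory(path):
    """Rebuild replayable responses from a saved trajectory in an assumed JSON layout."""
    data = json.loads(Path(path).read_text())
    responses = []
    for step in data.get("log", []):
        # Steps without an action (e.g. the initial observation) have nothing to replay.
        if step.get("action") is None:
            continue
        responses.append(
            LLMResponse(
                response=step.get("content") or "",
                tool=step["action"],
                prompt=step.get("prompt", []),
            )
        )
    return responses


# Usage: write a two-step trajectory in the assumed layout, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    traj_path = Path(tmp) / "trajectory.json"
    traj_path.write_text(json.dumps({"log": [
        {"step_id": 0, "action": None, "content": None},
        {"step_id": 1, "action": {"name": "view", "arguments": {"path": "a.py"}},
         "content": "Look at a.py first."},
    ]}))
    replay = load_trajectory(traj_path)
```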

Error Handling Improvements:

  • Enhanced error propagation and handling for unrecoverable terminal errors by attaching the last known env_info to exceptions, enabling better recovery and replay (debug_gym/gym/envs/env.py, debug_gym/gym/terminals/terminal.py, debug_gym/agents/base_agent.py). [1] [2] [3]
  • Cleaned up exception handling in BaseAgent.run and scripts/run.py to avoid duplicate error reporting and ensure accurate progress logging. [1] [2] [3] [4] [5]
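The pattern of attaching the last known env_info to an exception before re-raising it can be sketched as follows. The exception class here only mirrors the idea described above, and ToyEnv with its _run_in_terminal helper is hypothetical; the real env.py and terminal.py code is not reproduced.

```python
class UnrecoverableTerminalError(Exception):
    """Terminal failure that can carry the last known environment info (sketch)."""

    def __init__(self, message, env_info=None):
        super().__init__(message)
        self.env_info = env_info


class ToyEnv:
    """Hypothetical environment whose terminal has already gone away."""

    def __init__(self):
        self.last_info = {"step": 0, "obs": "initial"}

    def _run_in_terminal(self, command):
        raise UnrecoverableTerminalError(f"container gone while running {command!r}")

    def step(self, command):
        try:
            output = self._run_in_terminal(command)
        except UnrecoverableTerminalError as exc:
            # Attach the last known state before re-raising, so the caller can
            # save the trajectory and replay from it on retry.
            exc.env_info = self.last_info
            raise
        self.last_info = {"step": self.last_info["step"] + 1, "obs": output}
        return self.last_info


env = ToyEnv()
try:
    env.step("pytest")
except UnrecoverableTerminalError as err:
    caught = err
```

A bare raise keeps the original traceback, while the attached env_info gives the retry loop a consistent state to resume from.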

Progress Tracking:

  • Updated DebugGymLogger to more accurately track completed tasks using a set of completed task IDs, ensuring correct progress reporting in the presence of retries (debug_gym/logger.py). [1] [2] [3]
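The difference between counter-based and set-based completion tracking shows up as soon as a task reports completion more than once across attempts. ProgressTracker below is a hypothetical class used only to contrast the two approaches; it is not DebugGymLogger's API.

```python
class ProgressTracker:
    """Set-based completion tracking (hypothetical class, not DebugGymLogger's API)."""

    def __init__(self, total_tasks):
        self.total_tasks = total_tasks
        self.completed_ids = set()
        self.counter = 0  # counter-style tracking, kept only for comparison

    def mark_completed(self, task_id):
        self.counter += 1  # counts a task again if a retry reports it twice
        self.completed_ids.add(task_id)  # adding an existing ID is a no-op

    @property
    def completed(self):
        return len(self.completed_ids)

    def __str__(self):
        return f"{self.completed}/{self.total_tasks} tasks completed"


tracker = ProgressTracker(total_tasks=3)
for task_id in ["task-a", "task-b", "task-b", "task-c"]:  # task-b reported twice
    tracker.mark_completed(task_id)
print(tracker)  # 3/3 tasks completed
```

The counter ends at 4 and would overstate progress, while the set still reports three distinct completed tasks.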

Minor Fixes and Refactoring:

  • Fixed minor formatting and import issues, and improved test coverage for the new trajectory loading utility (debug_gym/agents/base_agent.py, debug_gym/agents/utils.py, debug_gym/gym/envs/env.py, tests/agents/test_utils.py). [1] [2] [3] [4]

@matheper matheper requested review from MarcCote and Copilot January 21, 2026 18:08
Contributor

Copilot AI left a comment

Pull request overview

This pull request implements a comprehensive retry and replay mechanism for agent runs to recover from terminal failures (e.g., spot instance evictions, container deletions) and timeouts. The feature automatically retries failed tasks up to a configurable limit, replaying previously executed actions to minimize lost progress.

Changes:

  • Added --max-retries CLI argument (default: 3) to configure retry attempts with automatic trajectory replay
  • Enhanced exception handling to attach environment state to UnrecoverableTerminalError for better recovery
  • Improved progress tracking to use set-based completion tracking, preventing duplicate counts during retries
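A --max-retries flag with the stated default of 3 could be declared with the standard argparse module as shown here. The parser and help text are illustrative and are not copied from debug_gym/agents/utils.py.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-retries",
    type=int,
    default=3,
    help="Maximum retries after a terminal or timeout failure; previous steps are replayed.",
)

# argparse maps "--max-retries" to the attribute name "max_retries".
default_args = parser.parse_args([])
custom_args = parser.parse_args(["--max-retries", "5"])
```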

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| scripts/run.py | Implements the retry loop with trajectory saving/loading and action replay on terminal/timeout failures |
| debug_gym/agents/base_agent.py | Adds a replay_actions parameter to the run() method and updates execute_action() to record failed steps in history |
| debug_gym/agents/utils.py | Adds the load_trajectory() function, which reconstructs LLMResponse objects from saved trajectories, and the --max-retries CLI argument |
| debug_gym/logger.py | Refactors completion tracking from a counter to a set to prevent duplicate counting during retries |
| debug_gym/gym/terminals/terminal.py | Enhances UnrecoverableTerminalError to accept and store env_info for replay purposes |
| debug_gym/gym/envs/env.py | Attaches env_info to UnrecoverableTerminalError before re-raising; removes an unnecessary f-string |
| tests/agents/test_utils.py | Adds comprehensive tests for load_trajectory() covering various edge cases |
| tests/agents/test_base_agent.py | Adds tests for execute_action() exception handling and replay functionality |
| tests/gym/envs/test_unrecoverable_terminal.py | Updates a test to verify that the exception now includes env_info |
Comments suppressed due to low confidence (1)

scripts/run.py:179

  • The code unconditionally references agent and env on lines 171 and 176, but if the retry loop exits due to a KeyboardInterrupt (line 157) or if all retries fail before agent/env are created, these variables may be undefined, resulting in a NameError. Initialize both variables to None before the loop and add existence checks before using them.
        # save trajectory
        save_trajectory(agent, task_path, task_logger)

        # optionally apply patch
        if config.get("save_patch", True):
            try:
                save_patch(env, task_path, task_logger)
            except Exception as patch_error:
                # Terminal may be unavailable (e.g., pod died), log and continue
                task_logger.warning(f"Could not save patch: {patch_error!r}")
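The fix suggested in this comment (initialize agent and env to None before the loop, then check them before use) can be sketched as below. finish_task, build, and the one-argument save_trajectory/save_patch stubs are hypothetical; the real helpers in scripts/run.py take different arguments. The sketch covers the case where every attempt fails before the objects are created.

```python
import logging

task_logger = logging.getLogger("task")
saved = []  # records which artifacts the cleanup step wrote (for the demo)


def save_trajectory(agent):
    saved.append(("trajectory", agent))


def save_patch(env):
    saved.append(("patch", env))


def finish_task(build, max_retries=2):
    """Pre-initialize agent/env and guard their use, as the review comment suggests."""
    agent = None  # defined up front so the cleanup below cannot raise NameError
    env = None
    for attempt in range(max_retries + 1):
        try:
            env, agent = build(attempt)
            break
        except RuntimeError as exc:
            task_logger.warning("attempt %d failed: %r", attempt, exc)
    # Save artifacts only for objects that were actually created.
    if agent is not None:
        save_trajectory(agent)
    else:
        task_logger.warning("No agent was created; skipping trajectory save.")
    if env is not None:
        try:
            save_patch(env)
        except Exception as patch_error:
            # Terminal may be unavailable (e.g., pod died); log and continue.
            task_logger.warning(f"Could not save patch: {patch_error!r}")


def always_fails(attempt):
    raise RuntimeError("pod evicted")


finish_task(always_fails)  # logs warnings, saves nothing, no NameError
finish_task(lambda attempt: ("env-1", "agent-1"))
```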

Collaborator

@MarcCote MarcCote left a comment

Left a few comments. Overall LGTM.

@matheper matheper merged commit 86b881d into main Jan 23, 2026
11 checks passed
@matheper matheper deleted the retry-agent-run branch January 23, 2026 19:41
Labels

None yet

Projects

None yet

3 participants