Retry and replay for agent runs #336

matheper · 2026-01-21T18:08:33Z

This pull request implements retry and replay functionality for agent runs in the event of terminal or timeout failures. It introduces a mechanism to automatically retry failed tasks up to a configurable limit, replaying previous actions to ensure continuity and minimize lost progress. The changes also improve error handling, progress tracking, and trajectory management throughout the agent execution lifecycle.

The most important changes are:

Retry and Replay Mechanism:

Added a --max-retries argument to allow configuration of the maximum number of retries for terminal or timeout failures, with replay of previous steps on retry (debug_gym/agents/utils.py, scripts/run.py).
Implemented logic in scripts/run.py to catch UnrecoverableTerminalError and AgentTimeoutException, save the current trajectory, reload actions from previous attempts, and replay them before generating new actions on retry.
Modified BaseAgent.run to accept a replay_actions argument, replaying previous LLM responses before proceeding with new steps, and logging replay details for traceability (debug_gym/agents/base_agent.py).
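The retry-and-replay flow can be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not the code in scripts/run.py or BaseAgent.run: `ToyAgent`, `FlakyEnv`, and `run_with_retries` are hypothetical, the two exception classes are local stand-ins for the real ones, and plain action strings stand in for replayed LLM responses.

```python
from dataclasses import dataclass, field


# Local stand-ins for the real exceptions (hypothetical definitions).
class UnrecoverableTerminalError(Exception):
    pass


class AgentTimeoutException(Exception):
    pass


@dataclass
class FlakyEnv:
    """Hypothetical environment whose terminal dies once, after `fail_at` steps."""
    fail_at: int
    steps_taken: int = 0
    failed_once: bool = False

    def reset(self):
        self.steps_taken = 0

    def step(self, action):
        if not self.failed_once and self.steps_taken == self.fail_at:
            self.failed_once = True
            raise UnrecoverableTerminalError("terminal lost")
        self.steps_taken += 1
        return f"obs:{action}"


@dataclass
class ToyAgent:
    """Hypothetical agent; `replay_actions` mirrors the parameter added in this PR."""
    max_steps: int
    history: list = field(default_factory=list)

    def run(self, env, replay_actions=None):
        env.reset()
        self.history = []
        # Replay the previous attempt's actions before generating new ones.
        for action in replay_actions or []:
            env.step(action)
            self.history.append(action)
        while len(self.history) < self.max_steps:
            action = f"a{len(self.history)}"  # stands in for a fresh LLM call
            env.step(action)
            self.history.append(action)
        return self.history


def run_with_retries(agent, env, max_retries=3):
    """Hypothetical retry loop: one initial attempt plus up to max_retries retries."""
    saved_actions = []  # stands in for the saved and reloaded trajectory
    for attempt in range(max_retries + 1):
        try:
            return agent.run(env, replay_actions=saved_actions), attempt
        except (UnrecoverableTerminalError, AgentTimeoutException):
            # Save progress so the next attempt can replay it.
            saved_actions = list(agent.history)
    raise RuntimeError(f"gave up after {max_retries} retries")
```

Copying the failed attempt's actions into a separate list means the replay prefix survives even though `run()` clears the agent's history at the start of each attempt.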

Trajectory Management:

Added load_trajectory utility to reconstruct LLMResponse objects from saved trajectories, enabling accurate action replay (debug_gym/agents/utils.py).
Ensured that trajectories are saved after terminal failures and before retries, to support seamless recovery (scripts/run.py).
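A sketch of how saved trajectory entries could be turned back into replayable responses. The JSON schema (a `log` list whose entries carry `content` and `action`), the simplified `LLMResponse` dataclass, and the `rebuild_responses` name are all assumptions for illustration; the actual `load_trajectory` in debug_gym/agents/utils.py may take a file path and read a different format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMResponse:
    """Hypothetical, simplified stand-in for debug_gym's LLMResponse."""
    response: Optional[str]
    tool: Optional[dict] = None


def rebuild_responses(trajectory_json: str) -> list:
    """Hypothetical helper: rebuild replayable responses from a saved trajectory.

    Assumed schema: {"log": [{"step_id": int, "content": str, "action": dict or null}, ...]}.
    """
    data = json.loads(trajectory_json)
    responses = []
    for step in data.get("log", []):
        # Entries without an action (e.g., the initial observation) have nothing to replay.
        if step.get("action") is None:
            continue
        responses.append(LLMResponse(response=step.get("content"), tool=step["action"]))
    return responses
```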

Error Handling Improvements:

Enhanced error propagation and handling for unrecoverable terminal errors by attaching the last known env_info to exceptions, enabling better recovery and replay (debug_gym/gym/envs/env.py, debug_gym/gym/terminals/terminal.py, debug_gym/agents/base_agent.py).
Cleaned up exception handling in BaseAgent.run and scripts/run.py to avoid duplicate error reporting and ensure accurate progress logging.
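The pattern of attaching the last known `env_info` to `UnrecoverableTerminalError` before re-raising could look roughly like this. `ToyEnv`, `_run_in_terminal`, and the dict-shaped `env_info` are hypothetical simplifications, not the classes in debug_gym/gym/envs/env.py or debug_gym/gym/terminals/terminal.py.

```python
from typing import Any, Optional


class UnrecoverableTerminalError(Exception):
    """Simplified stand-in that carries the last known env_info for recovery."""

    def __init__(self, message: str, env_info: Optional[Any] = None):
        super().__init__(message)
        self.env_info = env_info


class ToyEnv:
    """Hypothetical environment that remembers its last successful step."""

    def __init__(self, terminal_alive: bool = True):
        self.terminal_alive = terminal_alive
        self.last_env_info = {"step": 0, "observation": "initial"}

    def _run_in_terminal(self, command: str) -> str:
        if not self.terminal_alive:
            raise UnrecoverableTerminalError("terminal is gone")
        return f"ran {command}"

    def step(self, command: str) -> dict:
        try:
            output = self._run_in_terminal(command)
        except UnrecoverableTerminalError as exc:
            # Attach the last known state before re-raising so the caller can
            # save the trajectory and replay it on a fresh environment.
            exc.env_info = self.last_env_info
            raise
        self.last_env_info = {"step": self.last_env_info["step"] + 1, "observation": output}
        return self.last_env_info
```

A bare `raise` re-raises the same exception object, so the caller sees the original traceback plus the attached state.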

Progress Tracking:

Updated DebugGymLogger to more accurately track completed tasks using a set of completed task IDs, ensuring correct progress reporting in the presence of retries (debug_gym/logger.py).
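Why a set helps with retries: if a retried task reports completion more than once, a counter overcounts while a set of task IDs does not. The `ProgressTracker` class below is a hypothetical illustration of the idea, not the `DebugGymLogger` implementation.

```python
class ProgressTracker:
    """Hypothetical sketch of set-based completion tracking."""

    def __init__(self, total_tasks: int):
        self.total_tasks = total_tasks
        self.completed_task_ids = set()

    def mark_completed(self, task_id: str) -> None:
        # Adding an ID that is already present leaves the set unchanged,
        # so a task reported twice (e.g., after a retry) counts once.
        self.completed_task_ids.add(task_id)

    @property
    def num_completed(self) -> int:
        return len(self.completed_task_ids)

    def progress(self) -> str:
        return f"{self.num_completed}/{self.total_tasks} tasks completed"
```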

Minor Fixes and Refactoring:

Fixed minor formatting and import issues, and improved test coverage for the new trajectory loading utility (debug_gym/agents/base_agent.py, debug_gym/agents/utils.py, debug_gym/gym/envs/env.py, tests/agents/test_utils.py).

Copilot

Pull request overview

This pull request implements a comprehensive retry and replay mechanism for agent runs to recover from terminal failures (e.g., spot instance evictions, container deletions) and timeouts. The feature automatically retries failed tasks up to a configurable limit, replaying previously executed actions to minimize lost progress.

Changes:

Added --max-retries CLI argument (default: 3) to configure retry attempts with automatic trajectory replay
Enhanced exception handling to attach environment state to UnrecoverableTerminalError for better recovery
Improved progress tracking to use set-based completion tracking, preventing duplicate counts during retries
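Only the flag name `--max-retries` and its default of 3 come from this PR; the `build_parser` function below is an assumed, minimal illustration of how such a flag is typically declared with `argparse`, not the actual parser in debug_gym/agents/utils.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the retry flag."""
    parser = argparse.ArgumentParser(description="Illustrative subset of the run script's CLI")
    parser.add_argument(
        "--max-retries",
        type=int,
        default=3,
        help="Maximum retries after a terminal or timeout failure; previous steps are replayed on each retry.",
    )
    return parser
```

`argparse` converts the dashed flag name to the attribute `max_retries`.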

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `scripts/run.py` | Implements the retry loop with trajectory saving/loading and action replay on terminal/timeout failures |
| `debug_gym/agents/base_agent.py` | Adds `replay_actions` parameter to `run()` method and updates `execute_action()` to record failed steps in history |
| `debug_gym/agents/utils.py` | Adds `load_trajectory()` function to reconstruct `LLMResponse` objects from saved trajectories and `--max-retries` CLI argument |
| `debug_gym/logger.py` | Refactors completion tracking from counter to set to prevent duplicate counting during retries |
| `debug_gym/gym/terminals/terminal.py` | Enhances `UnrecoverableTerminalError` to accept and store `env_info` for replay purposes |
| `debug_gym/gym/envs/env.py` | Attaches `env_info` to `UnrecoverableTerminalError` before re-raising; removes unnecessary f-string |
| `tests/agents/test_utils.py` | Adds comprehensive tests for `load_trajectory()` covering various edge cases |
| `tests/agents/test_base_agent.py` | Adds tests for `execute_action()` exception handling and replay functionality |
| `tests/gym/envs/test_unrecoverable_terminal.py` | Updates test to verify exception now includes `env_info` |

Comments suppressed due to low confidence (1)

scripts/run.py:179

The code unconditionally references agent and env on lines 171 and 176, but if the retry loop exits due to a KeyboardInterrupt (line 157) or if all retries fail before agent/env are created, these variables may be undefined, resulting in a NameError. Initialize both variables to None before the loop and add existence checks before using them.

```python
# save trajectory
save_trajectory(agent, task_path, task_logger)

# optionally apply patch
if config.get("save_patch", True):
    try:
        save_patch(env, task_path, task_logger)
    except Exception as patch_error:
        # Terminal may be unavailable (e.g., pod died), log and continue
        task_logger.warning(f"Could not save patch: {patch_error!r}")
```
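The review suggestion (bind `agent` and `env` to `None` before the retry loop and check them before cleanup) can be sketched as follows. `run_task` and its callable parameters are hypothetical, and `RuntimeError` stands in for the terminal and timeout exceptions; the real `scripts/run.py` signatures differ.

```python
import logging

logger = logging.getLogger("retry_sketch")


def run_task(create_env, create_agent, save_trajectory, save_patch, max_retries=3):
    """Hypothetical sketch: bind agent/env before the loop and guard cleanup."""
    agent = None  # bound up front so cleanup never raises NameError
    env = None
    try:
        for attempt in range(max_retries + 1):
            try:
                env = create_env()
                agent = create_agent(env)
                agent.run()
                break
            except RuntimeError as exc:  # stands in for terminal/timeout failures
                logger.warning("attempt %d failed: %r", attempt, exc)
    except KeyboardInterrupt:
        logger.warning("interrupted; saving whatever exists")

    saved = []
    if agent is not None:
        save_trajectory(agent)
        saved.append("trajectory")
    if env is not None:
        try:
            save_patch(env)
            saved.append("patch")
        except Exception as patch_error:
            # Terminal may be unavailable (e.g., pod died); log and continue.
            logger.warning("Could not save patch: %r", patch_error)
    return saved
```

An interrupt before either object exists now yields an empty result instead of a `NameError`, and a failing patch save no longer prevents the trajectory from being kept.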

MarcCote

Left a few comments. Overall LGTM.

matheper added 4 commits January 21, 2026 09:40

Implement retry logic for agent actions and enhance error handling for unrecoverable terminal states

b426f6c

Add tests for load_trajectory function

778389a

Add tests for BaseAgent.execute_action

d4e07ad

Add tests for replay_actions functionality in BaseAgent.run()

d714ec3

matheper requested review from MarcCote and Copilot January 21, 2026 18:08

Merge branch 'main' into retry-agent-run

b2329f8

Copilot started reviewing on behalf of matheper January 21, 2026 18:09

Copilot AI reviewed Jan 21, 2026

matheper added 2 commits January 21, 2026 10:26

Update pre-commit: isort 7.0.0 and black 26.1.0

5730a22

Remove unused mock_reset function from TestBaseAgentRunReplayActions

0a68120

MarcCote reviewed Jan 23, 2026

debug_gym/agents/base_agent.py (resolved)

MarcCote reviewed Jan 23, 2026

debug_gym/gym/envs/env.py (outdated, resolved)

MarcCote reviewed Jan 23, 2026

debug_gym/gym/envs/env.py (outdated, resolved)

MarcCote approved these changes Jan 23, 2026

matheper added 5 commits January 23, 2026 08:46

Enhance error handling in save_patch and save_trajectory functions; ensure trajectories and patches are saved on errors

8e03ead

Add tests for save_trajectory function

3177508

Implement centralized error handling in RepoEnv class to improve error management and logging

42fac59

Rename error handling method to clarify its purpose

583abad

Merge branch 'main' into retry-agent-run

eb32de4

MarcCote approved these changes Jan 23, 2026

matheper merged commit 86b881d into main Jan 23, 2026
11 checks passed

matheper deleted the retry-agent-run branch January 23, 2026 19:41

3 participants