Description
The recent release of Gymnasium 1.0 introduces an auto-reset feature that silently but irrevocably changes the behavior of the `step` method in some but not all environments. While this feature may be useful for certain use cases, it breaks the modularity and data-integrity assumptions in TorchRL.
Because of this, as of today there is no plan to support gymnasium v1.0 within the library.
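To make the change concrete, here is a hedged sketch of the new vector-env behavior in gymnasium 1.0; the environment name and step budget are illustrative:

```python
import gymnasium as gym

envs = gym.make_vec("CartPole-v1", num_envs=2)  # gymnasium >= 1.0 vector API
obs, info = envs.reset(seed=0)
for _ in range(200):
    actions = envs.action_space.sample()
    obs, rewards, terminated, truncated, infos = envs.step(actions)
    # When a sub-env reported terminated/truncated at the previous step,
    # this call to step() resets that sub-env and returns the reset
    # observation instead of stepping it with the sampled action.
envs.close()
```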
Specifically, the auto-reset feature causes the following issues:

1. Inconsistent behavior: the same `step` call behaves differently across environments and across wrapping choices, even with the same backend.
2. Silent data corruption: reset observations are silently mixed into collected trajectories, with no runtime signal that this is happening.
3. Increased computational overhead: the additional complexity of auto-resets requires manual filtering and boilerplate code to mitigate these issues, compromising the efficiency and ease of use of TorchRL.
Regarding 1. and 2.: this is true for vectorized environments as well as for regular ones where auto-resetting has been toggled on. From a TorchRL perspective, this means that the same script will behave differently with the same backend (gymnasium) depending on whether a `ParallelEnv(GymWrapper(Env))` or a `GymWrapper(VectorEnv)` is used. The only fix is for you, the user, to account for these changes, which will otherwise silently corrupt your data. This is not a responsibility we think TorchRL should endorse.

One may argue that resets are infrequent, but in some frameworks (e.g., RoboHive) they can occur as often as every 50 steps, which could leave around 2% of corrupted data in the buffer.
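To make the contrast concrete, here is a minimal sketch of the two constructions; the environment name and `num_envs` are illustrative, and `GymWrapper` over a gymnasium vector env stands in for the `GymWrapper(VectorEnv)` pattern above:

```python
import gymnasium as gym
from torchrl.envs import GymEnv, GymWrapper, ParallelEnv

# Construction 1: TorchRL parallelism around single gym envs;
# resets are managed by TorchRL itself.
env_a = ParallelEnv(2, lambda: GymEnv("CartPole-v1"))

# Construction 2: a gymnasium 1.0 vector env wrapped by TorchRL;
# with v1.0, resets happen automatically inside gymnasium's step().
env_b = GymWrapper(gym.make_vec("CartPole-v1", num_envs=2))

# The same training script will yield differently aligned data
# depending on which of the two constructions is used.
```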
We believe that if backward compatibility (BC) cannot be guaranteed, illegitimate behaviors should be prevented by raising errors or warnings at runtime, so that users know that the intended behavior may not occur with the dependency version they are using. With the silent BC-breaking changes introduced in 1.0, there is no way to warn users at runtime that they may be using the library in an unintended way.
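As an illustration of the third issue above, here is a hedged sketch of the kind of filtering boilerplate auto-resets force on users; the helper name and data layout are hypothetical:

```python
import torch
from tensordict import TensorDict

def drop_reset_steps(data: TensorDict) -> TensorDict:
    # Hypothetical helper: discard transitions that immediately follow a
    # done flag, since with auto-reset those steps are resets in disguise.
    done = data["next", "done"].squeeze(-1)
    prev_done = torch.cat([torch.zeros_like(done[:1]), done[:-1]])
    return data[~prev_done]
```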
There is a more fundamental issue from the torchrl perspective. This is a typical rollout loop in gymnasium:
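A minimal sketch of such a loop (the environment name is illustrative):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # stand-in for policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # the user resets explicitly when the episode ends
        obs, info = env.reset()
env.close()
```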
Now take the torchrl rollout (without resets since they should be accounted for):
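A minimal sketch of the corresponding loop, assuming TorchRL's `GymEnv`, `rand_step` and `step_mdp` (the environment name is illustrative):

```python
from torchrl.envs import GymEnv
from torchrl.envs.utils import step_mdp

env = GymEnv("CartPole-v1")
td = env.reset()
for _ in range(1000):
    td = env.rand_step(td)  # writes the step results under td["next", ...]
    # carry the "next" entries to the root for the following step;
    # resets are handled by the env/collector machinery, not in the loop
    td = step_mdp(td)
```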
Imagine you are using gymnasium v1.0 vectorized envs in the backend. Before reset time, the `("next", "done")` entry will be `True`. This marks your data as the last of the trajectory: `split_trajectories` or `GAE` will consider that as the end of an episode. This data is not corrupted.

The problem arises in the next step. Now, we'll carry that observation to the root in `step_mdp` and put the reset `"observation"` key in the `"next"` tensordict during the next call to `step` (which is by essence a call to `reset`). This will silently cause the last observation of trajectory `t` to be considered as the first of trajectory `t+1`. This means that every single trajectory collected by torchrl (except the first after a reset) will be corrupted.
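A toy illustration of that carry, with plain dicts standing in for tensordicts and `o_last` / `o_reset` as illustrative observation names:

```python
def carry_next_to_root(td):
    # mimics what step_mdp does between two steps: the "next" entries
    # become the root entries of the following transition
    return {"observation": td["next"]["observation"]}

# step t: the env reports done=True together with the final observation
td_t = {"observation": "o_t", "next": {"observation": "o_last", "done": True}}

# step t+1: with auto-reset, gymnasium's step() has already reset the env,
# so its output is the first observation of the *new* trajectory
td_t1 = carry_next_to_root(td_t)
td_t1["next"] = {"observation": "o_reset", "done": False}

# "o_last" (the end of trajectory t) is now the root observation of a
# transition whose "next" entries belong to trajectory t+1
print(td_t1)
```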
With gymnasium 1.0, the "bad" algorithm will have fewer total steps (because it will trigger more resets, which now count as steps), and will therefore likely report an even poorer performance. The opposite is true for the "good" algorithm, so the difference between the two is amplified by the API choice (note that the effect is reversed if the environment resets more frequently once it is solved).
To maintain the integrity and efficiency of our library, we cannot support Gymnasium 1.0 or later versions at this time. We believe that the auto-reset feature as implemented is incompatible with the design principles of TorchRL, which prioritize modularity, data integrity, and ease of use.
Proposed Solutions:
We propose the following solutions to address this compatibility issue:
Unless some version of these is implemented, TorchRL will not be able to support gymnasium v1.0 and later releases.
TorchRL is willing to make changes to its `GymWrapper` classes internally to make them compatible with gymnasium 1.0. As of now, any such work would still require us to change all the training scripts we have and to ask users to do the same.
Discussion:
We would like to discuss this issue with the Gymnasium community and explore possible solutions that balance the needs of both libraries. We believe that finding a compatible solution will benefit both communities and promote the development of more robust and efficient reinforcement learning pipelines.
We strongly believe that Gym was a cornerstone in the development of RL thanks to its simplicity and the low likelihood of users getting things wrong. Let's work together to keep this standard alive!
Related content:
#2473
#2477