Description
The recent release of Gymnasium 1.0 introduces an auto-reset feature that silently but irrevocably changes the behavior of the `step` method in some but not all environments. While this feature may be useful for certain use cases, it breaks the modularity and data-integrity assumptions in TorchRL.
Because of this, as of today there is no plan to support gymnasium v1.0 within the library.
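To make the change concrete, here is a hedged sketch of the new vector-env behavior in gymnasium 1.0; the environment name and step budget are illustrative:

```python
import gymnasium as gym

envs = gym.make_vec("CartPole-v1", num_envs=2)  # gymnasium >= 1.0 vector API
obs, info = envs.reset(seed=0)
for _ in range(200):
    actions = envs.action_space.sample()
    obs, rewards, terminated, truncated, infos = envs.step(actions)
    # When a sub-env reported terminated/truncated at the previous step,
    # this call to step() resets that sub-env and returns the reset
    # observation instead of stepping it with the sampled action.
envs.close()
```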
Specifically, the auto-reset feature causes the following issues:

1. Inconsistent behavior: the same `step` call behaves differently across environments and across wrapping choices, even with the same backend.
2. Silent data corruption: reset observations are silently mixed into collected trajectories, with no runtime signal that this is happening.
3. Increased computational overhead: the additional complexity of auto-resets requires manual filtering and boilerplate code to mitigate these issues, compromising the efficiency and ease of use of TorchRL.
Regarding 1. and 2.: this is true for vectorized environments as well as for regular ones where auto-resetting has been toggled on. From a TorchRL perspective, this means that the same script will behave differently with the same backend (gymnasium) depending on whether a `ParallelEnv(GymWrapper(Env))` or a `GymWrapper(VectorEnv)` is used. The only fix is for you, the user, to account for these changes, which will otherwise silently corrupt your data. This is not a responsibility we think TorchRL should endorse.

One may argue that resets are infrequent, but in some frameworks (e.g., RoboHive) they can occur as often as every 50 steps, which could leave around 2% of corrupted data in the buffer.
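To make the contrast concrete, here is a minimal sketch of the two constructions; the environment name and `num_envs` are illustrative, and `GymWrapper` over a gymnasium vector env stands in for the `GymWrapper(VectorEnv)` pattern above:

```python
import gymnasium as gym
from torchrl.envs import GymEnv, GymWrapper, ParallelEnv

# Construction 1: TorchRL parallelism around single gym envs;
# resets are managed by TorchRL itself.
env_a = ParallelEnv(2, lambda: GymEnv("CartPole-v1"))

# Construction 2: a gymnasium 1.0 vector env wrapped by TorchRL;
# with v1.0, resets happen automatically inside gymnasium's step().
env_b = GymWrapper(gym.make_vec("CartPole-v1", num_envs=2))

# The same training script will yield differently aligned data
# depending on which of the two constructions is used.
```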
We believe that if backward compatibility (BC) cannot be guaranteed, illegitimate behaviors should be prevented by raising errors or warnings at runtime, so that users know that the intended behavior may not occur with the dependency version they are using. With the silent BC-breaking changes introduced in 1.0, there is no way to warn users at runtime that they may be using the library in an unintended way.
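As an illustration of the third issue above, here is a hedged sketch of the kind of filtering boilerplate auto-resets force on users; the helper name and data layout are hypothetical:

```python
import torch
from tensordict import TensorDict

def drop_reset_steps(data: TensorDict) -> TensorDict:
    # Hypothetical helper: discard transitions that immediately follow a
    # done flag, since with auto-reset those steps are resets in disguise.
    done = data["next", "done"].squeeze(-1)
    prev_done = torch.cat([torch.zeros_like(done[:1]), done[:-1]])
    return data[~prev_done]
```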
There is a more fundamental issue from the torchrl perspective. This is a typical rollout loop in gymnasium:
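A minimal sketch of such a loop (the environment name is illustrative):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # stand-in for policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # the user resets explicitly when the episode ends
        obs, info = env.reset()
env.close()
```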
Now take the torchrl rollout (without resets since they should be accounted for):
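A minimal sketch of the corresponding loop, assuming TorchRL's `GymEnv`, `rand_step` and `step_mdp` (the environment name is illustrative):

```python
from torchrl.envs import GymEnv
from torchrl.envs.utils import step_mdp

env = GymEnv("CartPole-v1")
td = env.reset()
for _ in range(1000):
    td = env.rand_step(td)  # writes the step results under td["next", ...]
    # carry the "next" entries to the root for the following step;
    # resets are handled by the env/collector machinery, not in the loop
    td = step_mdp(td)
```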
Imagine you are using gymnasium v1.0 vectorized envs in the backend. Before reset time, the `("next", "done")` entry will be `True`. This marks your data as the last of the trajectory: `split_trajectories` or `GAE` will consider that as the end of an episode. This data is not corrupted.

The problem arises in the next step. Now, we'll carry that observation to the root in `step_mdp` and put the reset `"observation"` key in the `"next"` tensordict during the next call to `step` (which is by essence a call to `reset`). This will silently cause the last observation of trajectory `t` to be considered as the first of trajectory `t+1`. This means that every single trajectory collected by torchrl (except the first after a reset) will be corrupted.
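A toy illustration of that carry, with plain dicts standing in for tensordicts and `o_last` / `o_reset` as illustrative observation names:

```python
def carry_next_to_root(td):
    # mimics what step_mdp does between two steps: the "next" entries
    # become the root entries of the following transition
    return {"observation": td["next"]["observation"]}

# step t: the env reports done=True together with the final observation
td_t = {"observation": "o_t", "next": {"observation": "o_last", "done": True}}

# step t+1: with auto-reset, gymnasium's step() has already reset the env,
# so its output is the first observation of the *new* trajectory
td_t1 = carry_next_to_root(td_t)
td_t1["next"] = {"observation": "o_reset", "done": False}

# "o_last" (the end of trajectory t) is now the root observation of a
# transition whose "next" entries belong to trajectory t+1
print(td_t1)
```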
With gymnasium 1.0, the "bad" algorithm will have fewer total steps (because it will trigger more resets, which now count as steps), and will therefore likely report an even poorer performance. The opposite is true for the "good" algorithm, so the difference between the two is amplified by the API choice (note that the effect is reversed if the environment resets more frequently once it is solved).
To maintain the integrity and efficiency of our library, we cannot support Gymnasium 1.0 or later versions at this time. We believe that the auto-reset feature as implemented is incompatible with the design principles of TorchRL, which prioritize modularity, data integrity, and ease of use.
Proposed Solutions:
We propose the following solutions to address this compatibility issue:
Unless some version of these is implemented, TorchRL will not be able to support gymnasium v1.0 and later releases.
TorchRL is willing to make changes to its `GymWrapper` classes internally to make them compatible with gymnasium 1.0. As of now, any such work would still require us to change all the training scripts we have and to ask users to do the same.
Discussion:
We would like to discuss this issue with the Gymnasium community and explore possible solutions that balance the needs of both libraries. We believe that finding a compatible solution will benefit both communities and promote the development of more robust and efficient reinforcement learning pipelines.
We strongly believe that Gym was a cornerstone in the development of RL thanks to its simplicity and the low likelihood of users getting things wrong. Let's work together to keep this standard alive!
Related content:
#2473
#2477