-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix missing seed / option args in dummy and subproc vec_env resets during stepping #1805
base: master
Are you sure you want to change the base?
Conversation
Hello @npit thanks for the PR and sorry for the delay. By looking at it, I think it would be better to document the fact that we don't allow to pass options if reset is not called right after. To explain what I mean: Let's say the first env reset, then you can pass Enforcing this limitation would not limit the user though. What do you think @qgallouedec ? PS: options should not be required to work with SB3, so you should in theory make it opt-in (see what I did for the |
As I see it, providing functionality of parameterizing resets but having it apply on a subset of resets is quite confusing. What is the alternative if you want to control implicit resets? Should we make custom functionality instead (e.g. constructor args and dedicated logic)? Doesn't that route kind of defeat the purpose of having arguments to
Currently, options persist until the user explicitly invokes E.g. if reset options are a function of what previous options were consumed in the previous step, users can perhaps track / count the number of episodes per env and adjust. If they want different options per implicit reset, maybe pass an options list and cycle elements from it in Does that make sense, or am I missing something? |
yes, this is the recommended way (having setters, see the example I linked which is in the doc now: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#modifying-vectorized-environments-attributes).
this API (passing options at reset) was not a SB3 decision but a gymnasium :/ (I was actually against it) and it was designed with single env in mind. We only provided some helper for people that wanted to use it. What I would do in any way is update the doc, add a warning and link to the current discussion. I will wait for the opinion of other maintainers to decide if this is something to be fixed or if it is a behavior that is ok but must be documented. |
I have the impression that passing an option to reset is a very little-used feature. I think it's important to support it though. However, supporting very precise usage such as a list of options seems to me very complicated, both in terms of implementation and maintenance. And it seems like a bottomless pit: I approve of this PR as it stands, and I think we should limit the support to this. |
If the user absolutely need option to be a cycle, they can still do this: import gymnasium as gym
class MyWrapper(gym.Wrapper):
def __init__(self, env, option_cycle):
super().__init__(env)
self.option_cycle = option_cycle
def reset(self, seed=None, options=None):
return self.env.reset(seed=seed, options=next(self.option_cycle)) |
# reset the environment, supplying seed and options | ||
seed = self._seeds[env_idx] | ||
options = self._options[env_idx] | ||
obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=seed, options=options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I wrote in my first comment: "PS: options should not be required to work with SB3, so you should in theory make it opt-in (see what I did for the reset())"
Description
self._seeds
andself._options
to a "done" env that calls itsreset
function, inDummyVecEnv.step_wait
options
argument in thereset
functions ofTimeDelayEnv
andCustomSubClassedSpaceEnv
SubprocVecEnv
to send options and seeds arguments as in (1) instep_async
._worker
function's "step" case, to receive and pass these arguments into the env'sreset
, as in (1).Motivation and Context
Closes #1790
Types of changes
Checklist
make format
(required)make check-codestyle
andmake lint
(required)make pytest
andmake type
both pass. (required)make doc
(required)Note: You can run most of the checks using
make commit-checks
.Note: we are using a maximum length of 127 characters per line