Property inheritance for a custom environment when wrapped as TransformedEnv #2522

MorganCThomas · 2024-10-28T12:21:08Z

I'm not sure whether this is desired behaviour or a bug, so for now I wanted to ask the question.

I'm writing a custom environment, let's say MyEnv, and I want to use the EnvBase.rollout() method but possibly change a few options such as "_simple_done_value" or "_step_mdp_value". Finally, MyEnv is wrapped in a TransformedEnv with a couple of transforms.

Overriding a method of EnvBase works fine until the environment is wrapped in a TransformedEnv. After wrapping, overridden EnvBase attributes or methods (for example, a custom rollout method or a new _step_mdp property) are no longer used. Is this desired behaviour or a bug?

To Reproduce

Super simple example:

import torch
from torchrl.envs import TransformedEnv
from torchrl.envs.common import EnvBase

class MyEnv(EnvBase):
    def __init__(self):
        super().__init__()

    def _reset(self):
        print("Resetting")

    def _set_seed(self, seed: int):
        print(f"Seeding {seed}")

    def _step(self):
        print("Stepping")

    def rollout(self, max_steps): # Override rollout here
        print(f"Rolling for {max_steps}")

env = MyEnv()
env.rollout(5) # -> Rolling for 5
tenv = TransformedEnv(env)
tenv.rollout(5) # -> TypeError: MyEnv._reset() takes 1 positional argument but 2 were given ()

The error on the second call is raised because EnvBase.rollout() runs instead, i.e., MyEnv.rollout() does not override EnvBase.rollout() on the wrapper.

Why does this happen?

This happens because TransformedEnv inherits from EnvBase and not from MyEnv, so any methods or attributes overridden in MyEnv are lost on the wrapper.

super().__init__(device=None, allow_done_after_reset=None, **kwargs)
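The effect described above can be reproduced without torchrl. The following is a minimal sketch in plain Python; Base, Custom and Wrapper are hypothetical stand-ins (not torchrl classes) for EnvBase, MyEnv and TransformedEnv. The wrapper subclasses the base class and only stores the custom instance as an attribute, so method lookup on the wrapper never reaches the override.

```python
class Base:
    """Hypothetical stand-in for a base environment class."""

    def rollout(self, max_steps):
        return f"Base.rollout({max_steps})"


class Custom(Base):
    """Hypothetical stand-in for a user environment that overrides rollout."""

    def rollout(self, max_steps):
        return f"Custom.rollout({max_steps})"


class Wrapper(Base):
    """Subclasses Base (not Custom) and keeps the wrapped object as an attribute."""

    def __init__(self, base_env):
        self.base_env = base_env


env = Custom()
wrapped = Wrapper(env)

print(env.rollout(5))               # Custom.rollout(5)
print(wrapped.rollout(5))           # Base.rollout(5): the override is not used
print(wrapped.base_env.rollout(5))  # Custom.rollout(5): still reachable on the wrapped object
```

In this toy setup the override remains reachable through the wrapped instance, but calling it there bypasses whatever the wrapper adds on top.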

Discussion

This behaviour seems strange to me. Is it meant to be like this? Even if I try to set _step_mdp directly on the TransformedEnv, I can't, as it has no setter. I would expect to neatly describe all aspects of the environment in a custom MyEnv class, override EnvBase attributes/methods if necessary (or even provide custom arguments to e.g. _step_mdp), and have a TransformedEnv inherit these.

If it is the desired behaviour, how do I set properties like "_step_mdp_value" without them being reset to None every step of the TransformedEnv?

Answered by vmoens

vmoens · 2024-10-29T09:26:56Z

Collaborator

Hey, thanks for posting this.
I see what the issue is, but it's going to be hard to fix consistently. Maybe the best option would be to provide some docs on what can and can't be done, and how to do fancy stuff without breaking the class.

This is indeed sort of "meant to be like this". You've probably realized by now that TransformedEnv is rather complex, because it needs to do a lot of things during step: call the inverse transform, execute base_env._step, run some quality checks, execute the forward transforms, aggregate done states, etc.
During reset, some transforms need to know what the input tensordict was (remember that the _reset signature takes a tensordict as input too), so the transforms' _reset methods take two tensordicts as input (the input one and the one resulting from env._reset). All in all, we must be a bit strict about method signatures because of everything we must support.

To answer your queries specifically:

For _reset, I would simply suggest using a signature that matches the expected one. _reset should take a tensordict as input, even if you do nothing with it.
_reset and _step are the only private methods you should override. In fact, these are the only methods for which we guarantee backward compatibility. Any other method you change may (silently) break in a future release without notice!
Overriding the step_mdp logic is a bit dangerous: we have a cached and a non-cached version of step_mdp, and the env will try to use the cached version if the specs allow it, falling back on the slower, non-cached version otherwise. So if you overwrite only one of them, the result may not be very robust.
Can you share a bit about what you'd like to hack in step_mdp? I realize that everything I wrote above doesn't sound very encouraging, but we defo want users to be able to hack on our codebase, so it's also important for us to understand what kind of things people want to do and how we can best support them! (Maybe in this case the solution for you would be to write your own rollout and your own specific step_mdp function?)
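The point about matching the _reset signature can be illustrated with a small plain-Python sketch. ToyBase, BadEnv and GoodEnv are hypothetical names, not torchrl classes; the only assumption is that the calling code always forwards one argument to _reset, which is what produces the TypeError in the reproduction at the top of the thread.

```python
class ToyBase:
    """Hypothetical base class whose public reset() always forwards one argument."""

    def reset(self, tensordict=None):
        return self._reset(tensordict)


class BadEnv(ToyBase):
    def _reset(self):  # missing the tensordict parameter
        return "reset"


class GoodEnv(ToyBase):
    def _reset(self, tensordict=None):  # accepts the argument even if it is unused
        return "reset"


try:
    BadEnv().reset()
    bad_error = None
except TypeError as err:
    # The caller passed an argument that BadEnv._reset does not accept.
    bad_error = str(err)

good_result = GoodEnv().reset()

print(bad_error)
print(good_result)
```

Accepting the argument and ignoring it keeps the subclass compatible with any caller that passes a tensordict.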

Note that this kind of thing isn't specific to torchrl: in PyTorch, if you override nn.Module.parameters, it will only work for instances of that very module, but whenever you wrap it in an nn.Sequential or similar, you'll lose that.
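The PyTorch analogy can be checked with a short sketch. FrozenLinear is a hypothetical class written only for this illustration; it overrides parameters() to report nothing. Once the layer is placed inside a container, the container collects parameters through its own traversal of registered submodules rather than by calling the override.

```python
from torch import nn


class FrozenLinear(nn.Linear):
    """Hypothetical layer that hides its parameters by overriding parameters()."""

    def parameters(self, recurse=True):
        return iter(())


layer = FrozenLinear(2, 2)
seq = nn.Sequential(layer)

# Called directly, the override is used and no parameters are reported.
n_direct = len(list(layer.parameters()))

# Called on the container, the override is bypassed: weight and bias are both found.
n_wrapped = len(list(seq.parameters()))

print(n_direct, n_wrapped)  # 0 2
```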

3 replies

MorganCThomas Oct 29, 2024
Author

Thanks for the response, and for clarifying that this is indeed desired behaviour to keep everything stable.

As you suggested, we currently have our own rollout as a standalone function that takes the environment and the policy as arguments and uses step_mdp as follows:

from torchrl.envs.utils import ExplorationType, step_mdp

step_mdp(
    tensordict_,
    keep_other=True,
    exclude_action=True,
    exclude_reward=True,
)

I was looking to migrate to a TorchRL rollout while maintaining the exact behaviour of step_mdp, for consistency and simpler code, in preparation for MARL. It is not a dealbreaker, as EnvBase.rollout() still works, keeping the action and reward keys by default.

But then I stumbled upon the inheritance behaviour, which I wasn't sure was a bug or not, hence this discussion.
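For readers unfamiliar with the keyword arguments passed to step_mdp above, here is a rough plain-dict sketch of what they mean. toy_step_mdp is a hypothetical helper written for this illustration; it is not torchrl's implementation, works on nested dicts rather than tensordicts, and ignores details such as custom key names and done handling.

```python
def toy_step_mdp(td, keep_other=True, exclude_action=True, exclude_reward=True):
    """Plain-dict illustration only: promote the "next" entries to the root."""
    out = {}
    if keep_other:
        # Keep root entries other than the nested "next" block.
        for key, value in td.items():
            if key != "next":
                out[key] = value
    # Entries under "next" become the new root entries.
    for key, value in td.get("next", {}).items():
        out[key] = value
    if exclude_action:
        out.pop("action", None)
    if exclude_reward:
        out.pop("reward", None)
    return out


td = {
    "observation": 0,
    "action": 1,
    "extra": "kept",
    "next": {"observation": 1, "reward": 0.5, "done": False},
}

stepped = toy_step_mdp(td)
print(stepped)  # {'observation': 1, 'extra': 'kept', 'done': False}

# With both exclusions turned off, the action and reward stay in the result.
stepped_all = toy_step_mdp(td, exclude_action=False, exclude_reward=False)
print(stepped_all)
```

With both exclusions disabled, the toy output retains the action and reward, which is the difference from the custom call that the comment above refers to.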

vmoens Oct 29, 2024
Collaborator

What is so special about your step_mdp? Maybe a transform could do what it does?

MorganCThomas Oct 29, 2024
Author

Nothing special, but it pointed me towards the inheritance behaviour which was my core question. You're right I can implement this as a transform. Thanks again!
