The functionality of on_episode_end() #13

liuzuxin · 2021-10-13T15:26:48Z

Hi, I have two questions regarding the on_episode_end() method:

1. If I don't use N-step rewards and I set the 'done' signal correctly in the buffer, do I still need to call this method when I reset the env? If so, why?
2. Are there any examples of how to safely override this method? I'm asking because I would like to calculate the reward-to-go and advantage values after each episode and store them in the buffer.

Thanks and best regards,

ymd-h · 2021-10-13T22:56:35Z

Maintainer

Hi, @liuzuxin

> If I don't use N-step reward, and I set the 'done' signal correctly in the buffer, do I still need to call this method when I reset the env? If so, why?

Strictly speaking, you do not need to call on_episode_end() if you use neither the Nstep feature nor any of the memory compression features (e.g. next_of).
However, as a rule we assume that users always call on_episode_end() at the end of every episode. We may add functionality to the method in the future, and your code could then break if you don't call it.

> Is there any examples about how to safely rewrite this method? I'm asking because I would like to calculate the reward to go value and advantage value after each episode and store them in the buffer.

It is better to implement this functionality outside the buffer than to override the ReplayBuffer method directly.
If you want a custom class, I suggest writing a wrapper class that holds a ReplayBuffer as a member.
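
The wrapper idea can be sketched roughly as follows. This is only an illustration, not code from cpprb: `EpisodeReturnWrapper`, `ListBuffer`, the `"rew"` key, and the extra `"ret"` field are all hypothetical names. `ListBuffer` is a stand-in so that the example runs without cpprb installed; with a real ReplayBuffer you would presumably need to declare the extra field in its `env_dict`.

```python
class EpisodeReturnWrapper:
    """Hypothetical wrapper that holds a buffer as a member instead of subclassing it."""

    def __init__(self, buffer, gamma=0.99):
        self.buffer = buffer      # any object with add(**kwargs) and on_episode_end()
        self.gamma = gamma
        self._episode = []        # transitions of the current episode only

    def add(self, **transition):
        # Hold transitions back until the episode is complete.
        self._episode.append(transition)

    def on_episode_end(self):
        # Discounted reward-to-go, computed backwards through the episode.
        ret = 0.0
        returns = []
        for tr in reversed(self._episode):
            ret = tr["rew"] + self.gamma * ret
            returns.append(ret)
        returns.reverse()

        # Store each transition together with its reward-to-go ("ret" is an assumed field name).
        for tr, g in zip(self._episode, returns):
            self.buffer.add(**tr, ret=g)
        self._episode.clear()

        # Always forward the call to the wrapped buffer.
        self.buffer.on_episode_end()
        return returns


class ListBuffer:
    """Stand-in for a replay buffer so the sketch runs on its own."""

    def __init__(self):
        self.storage = []
        self.episode_ends = 0

    def add(self, **transition):
        self.storage.append(transition)

    def on_episode_end(self):
        self.episode_ends += 1


wrapper = EpisodeReturnWrapper(ListBuffer(), gamma=0.5)
for obs in range(3):
    wrapper.add(obs=obs, rew=1.0, done=(obs == 2))
print(wrapper.on_episode_end())
```

An advantage estimate could be added in the same place, since the whole episode is available when on_episode_end() runs.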

If you still have any questions, please feel free to ask us.


ymd-h · 2022-02-05T00:06:25Z

Maintainer

[FYI]
We released HindsightReplayBuffer for Hindsight Experience Replay.
The implementation might be helpful for needs similar to the second question (i.e. reward-to-go / advantage).

The class has two internal replay buffers: one (rb) is the main buffer, and the other (episode_rb) holds the current episode.
Transitions are first inserted into episode_rb. At the end of the episode (on_episode_end), goal relabeling is performed and the transitions are moved to rb.
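
The two-buffer flow described above can be illustrated with a small self-contained sketch. It is not the HindsightReplayBuffer implementation: the class `TwoStageGoalBuffer`, its method signatures, and the choice to relabel with the final achieved state are assumptions made for this example, and plain Python lists stand in for the real buffers.

```python
class TwoStageGoalBuffer:
    """Hypothetical sketch of the two-buffer pattern; not cpprb's actual class."""

    def __init__(self, reward_func):
        self.rb = []              # main buffer, sampled for training
        self.episode_rb = []      # holds only the current episode
        self.reward_func = reward_func  # assumed signature: (next_obs, act, goal) -> float

    def add(self, obs, act, next_obs):
        # New transitions go to the episode buffer first.
        self.episode_rb.append({"obs": obs, "act": act, "next_obs": next_obs})

    def on_episode_end(self, goal):
        if not self.episode_rb:
            return
        # Relabel with the state actually reached at the end of the episode
        # (one possible relabeling strategy, chosen here for brevity).
        achieved = self.episode_rb[-1]["next_obs"]
        for g in (goal, achieved):
            for tr in self.episode_rb:
                rew = self.reward_func(tr["next_obs"], tr["act"], g)
                self.rb.append({**tr, "goal": g, "rew": rew})
        # The episode has been moved to the main buffer.
        self.episode_rb.clear()


def sparse_reward(next_obs, act, goal):
    # 0 when the goal is reached, -1 otherwise.
    return 0.0 if next_obs == goal else -1.0


hrb = TwoStageGoalBuffer(sparse_reward)
hrb.add(obs=0, act=1, next_obs=1)
hrb.add(obs=1, act=1, next_obs=2)
hrb.on_episode_end(goal=5)
print(len(hrb.rb), [tr["rew"] for tr in hrb.rb])
```

The same structure works for reward-to-go or advantage values: anything that needs the whole episode is computed in on_episode_end() before the transitions reach the main buffer.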

1 reply

liuzuxin Feb 5, 2022
Author

Awesome, thanks!

jamartinh · 2022-02-15T11:29:19Z


Couldn't it be the case that the on_episode_end() method always coincides with a terminal state, e.g. when done=True?

So should this method always be called when done=True? In that case, shouldn't inserting a transition with done=True into the buffer trigger on_episode_end() automatically?

1 reply

ymd-h Feb 15, 2022
Maintainer

Strictly speaking, on_episode_end() is needed to resolve overlapping data, e.g. with next_of.
The buffer has to know where a run of sequential steps ends, which is usually (but not always) the end of an episode.

If there are multiple environments (or a vectorized environment) and a certain number of transitions are added from each of them in turn, we have to call on_episode_end() even when the last step is not a terminal state, because of the discontinuity.

I admit the method name is confusing; however, we don't want to change the public API without good reason.
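
The discontinuity problem can be shown with a toy buffer. This is an illustration only and makes no claim about how cpprb stores data internally: `ToyNextOfBuffer` is a hypothetical class that saves memory by storing each observation once and reading next_obs from the following slot, which is the general idea behind sharing storage between obs and next_obs.

```python
class ToyNextOfBuffer:
    """Toy model of shared obs/next_obs storage; not cpprb's implementation."""

    def __init__(self):
        self.obs = []           # next_obs of slot i is read from slot i + 1
        self.is_step = []       # False for slots that only close a sequence
        self._pending_next = None

    def add(self, obs, next_obs):
        self.obs.append(obs)
        self.is_step.append(True)
        self._pending_next = next_obs   # only needed if the sequence stops here

    def on_episode_end(self):
        # Close the current run of sequential steps by storing the last next_obs.
        if self._pending_next is not None:
            self.obs.append(self._pending_next)
            self.is_step.append(False)
            self._pending_next = None

    def transitions(self):
        out = []
        for i, real in enumerate(self.is_step):
            if not real:
                continue
            if i + 1 < len(self.obs):
                out.append((self.obs[i], self.obs[i + 1]))
            else:
                out.append((self.obs[i], self._pending_next))
        return out


def fill(buffer, mark_boundaries):
    # Two steps from environment A (0 -> 1 -> 2), then two from environment B (100 -> 101 -> 102).
    for start in (0, 100):
        buffer.add(start, start + 1)
        buffer.add(start + 1, start + 2)
        if mark_boundaries:
            buffer.on_episode_end()
    return buffer.transitions()


print(fill(ToyNextOfBuffer(), mark_boundaries=False))
print(fill(ToyNextOfBuffer(), mark_boundaries=True))
```

Without the boundary call, the last step from environment A is paired with the first observation from environment B, even though neither environment reached a terminal state. Marking the boundary restores the correct pairs.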
