The functionality of on_episode_end() #13

Answered by ymd-h
liuzuxin asked this question in Q&A

Hi, @liuzuxin

If I don't use N-step reward, and I set the 'done' signal correctly in the buffer, do I still need to call this method when I reset the env? If so, why?

Strictly speaking, you don't need to call on_episode_end() if you use neither the N-step feature nor any of the memory-compression features (e.g. next_of).
However, as a rule, we assume users always call on_episode_end() at the end of every episode. We may add more functionality to that method in the future, and you could run into bugs if you don't call it.
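
To make the recommended call pattern concrete, here is a minimal sketch of a collection loop that notifies the buffer at every episode boundary. `ToyBuffer` and `run` are hypothetical stand-ins written for this illustration, not the library's classes; only the method name on_episode_end() comes from the discussion above.

```python
import random


class ToyBuffer:
    """Hypothetical stand-in for a replay buffer (not the library's class)."""

    def __init__(self):
        self.transitions = []
        self.episode_ends = 0

    def add(self, **transition):
        # Store one transition as a plain dict.
        self.transitions.append(transition)

    def on_episode_end(self):
        # A real buffer might flush N-step or next_of caches here.
        # This toy version only counts how often it was called.
        self.episode_ends += 1


def run(buffer, n_episodes=3, max_steps=5, seed=0):
    """Collect transitions from a dummy environment (assumed dynamics)."""
    rng = random.Random(seed)
    for _ in range(n_episodes):
        obs = 0.0  # stands in for env.reset()
        for step in range(max_steps):
            act = rng.choice([0, 1])
            next_obs = obs + act
            done = step == max_steps - 1
            buffer.add(obs=obs, act=act, rew=float(act),
                       next_obs=next_obs, done=done)
            obs = next_obs
            if done:
                break
        # Notify the buffer before the next reset, even when the
        # 'done' flag is already stored correctly.
        buffer.on_episode_end()
    return buffer


buf = run(ToyBuffer())
```

Calling the method unconditionally keeps the loop correct if N-step or next_of is enabled later, which is the reason given above for treating it as a rule.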


Are there any examples of how to safely override this method? I'm asking because I would like to calculate the reward-to-go value and advantage value after each…
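
For readers unfamiliar with the quantities mentioned in the question, the following is a rough sketch of how discounted reward-to-go and a simple advantage could be computed once an episode has finished. It does not override on_episode_end() and is not the maintainer's answer. The function names, the `gamma` default, and the choice of "return minus value baseline" as the advantage are all assumptions made for illustration.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Discounted sum of future rewards for each step of one episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def advantages(returns, values):
    """Simple advantage estimate: return minus a value baseline (assumed)."""
    return [g - v for g, v in zip(returns, values)]


# Example with made-up numbers: three steps, reward 1 each, gamma = 0.5.
rtg = rewards_to_go([1.0, 1.0, 1.0], gamma=0.5)
adv = advantages(rtg, [1.0, 1.0, 1.0])
```

Such a helper could be run on the rewards of the episode that just ended, right before or after the on_episode_end() call, without touching the buffer's internals.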
