[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708
Comments
Hello,
Hi. Thank you for the quick answer and for that pointer! Indeed, with a custom callback, logging on a step basis can be done.
Some minor drawbacks remain with this approach:
Would you still consider some of the proposed changes? I think it could clean things up and improve consistency.
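For concreteness, a minimal version of such a callback might look roughly like this (a sketch: the class name and `log_every` parameter are made up for illustration, while `BaseCallback`, `num_timesteps`, and `Logger.dump` are the standard SB3 pieces):

```python
from stable_baselines3.common.callbacks import BaseCallback


class StepLoggingCallback(BaseCallback):
    """Dump logger records every `log_every` environment steps (illustrative sketch)."""

    def __init__(self, log_every: int = 1_000, verbose: int = 0):
        super().__init__(verbose)
        self.log_every = log_every
        self._next_dump = log_every

    def _on_step(self) -> bool:
        # num_timesteps advances by n_envs per call, so a threshold check
        # is safer than a modulo check with vectorized environments.
        if self.num_timesteps >= self._next_dump:
            self.logger.dump(step=self.num_timesteps)
            self._next_dump += self.log_every
        return True  # returning False would abort training
```

It would be used as `model.learn(total_timesteps=100_000, callback=StepLoggingCallback(1_000))`.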
Adding a callback to do step-based logging to the collection provided by sb3 would be a good addition, I think.
Could you elaborate on your vision for this? What set of features should this callback support? Should it be a minimal callback for step-based logging only, or a more general logging callback supporting all sorts of logging applications?
Have a simple one. We leave general purpose/custom callbacks to the user.
🚀 Feature
At the time of writing, the logging interval is controlled by the `log_interval` argument of the `learn` method, which accepts integers. In `OnPolicyAlgorithm` it is the number of rounds (environment interaction + training steps) between logs, while in `OffPolicyAlgorithm` it is the number of episodes. Since episodes in general do not have a fixed length, or any end at all, logging on an episode basis is not always practical (see Motivation).
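To make the asymmetry concrete, the two cases behave roughly as follows (environment names are arbitrary examples):

```python
from stable_baselines3 import PPO, SAC

# On-policy: log_interval counts rollout+training rounds, so logs
# appear every log_interval * n_steps * n_envs environment steps.
PPO("MlpPolicy", "CartPole-v1").learn(total_timesteps=50_000, log_interval=4)

# Off-policy: log_interval counts episodes, so the logging points
# depend on (possibly variable, possibly unbounded) episode lengths.
SAC("MlpPolicy", "Pendulum-v1").learn(total_timesteps=50_000, log_interval=4)
```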
Could we add the capability to log on a step-based interval?
Motivation
The main motivation is experiment tracking.
It is good practice to run experiments multiple times with different random seeds and display training plots with confidence intervals or min and max.
If you want to plot against environment steps (e.g. when you are interested in sample complexity), you can't really do that properly if the logs are not done on a step basis.
This is because, with variable episode lengths, the logs will not be aligned across runs and it is difficult to compute the confidence intervals (e.g. what should the interval at step x be if one run has a log at step x - 3 and another at step x + 5?).
So it would be great if this could be added.
Note also that the documentation currently states for all algorithms that `log_interval` is "The number of episodes before logging.", which is not true for on-policy algorithms. Issue #725 is related to this.
Pitch
I propose we change the `OffPolicyAlgorithm` case to match the `OnPolicyAlgorithm` case. Logging on a step basis can then be achieved by using a step-based `train_freq`. This has the additional benefit of more consistency between on-policy and off-policy algorithms.
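Under this proposal, step-aligned logging would fall out naturally (a sketch of the intended, not the current, behaviour):

```python
from stable_baselines3 import SAC

# train_freq=(1_000, "step") is already supported; under the proposal,
# log_interval would count training rounds here too, so logs would
# appear every 4 * 1_000 environment steps.
model = SAC("MlpPolicy", "Pendulum-v1", train_freq=(1_000, "step"))
model.learn(total_timesteps=100_000, log_interval=4)  # proposed semantics
```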
While we are at it, I also propose to move logging to after training in both `OnPolicyAlgorithm` and `OffPolicyAlgorithm`. That way, all the information of one round ends up in the same log; as it stands, the information from the last training steps is never logged.
Alternatives
The least invasive (to sb3) alternative is to modify the environments of interest to have a fixed episode length.
However, this might not be practical for all environments and seems ugly.
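For example, episode length can at least be capped with gymnasium's `TimeLimit` wrapper (`"YourEnv-v0"` is a placeholder; this only yields truly fixed-length episodes if the environment never terminates on its own before the limit):

```python
import gymnasium as gym
from gymnasium.wrappers import TimeLimit

# Truncates every episode at 200 steps.
env = TimeLimit(gym.make("YourEnv-v0"), max_episode_steps=200)
```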
Another alternative would be to allow `log_interval` to be a tuple (e.g. `(4, "episodes")`), like `train_freq`. This also seems ugly.
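That is, something along these lines (purely hypothetical, mirroring the `train_freq` convention; not implemented):

```python
# `model` is any off-policy model; neither call works today.
model.learn(total_timesteps=100_000, log_interval=(4, "episodes"))
model.learn(total_timesteps=100_000, log_interval=(1_000, "steps"))
```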
Update: a custom callback (see the discussion above).