Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708

Open
2 tasks done
tobiabir opened this issue Oct 6, 2023 · 5 comments · May be fixed by #1709
Open
2 tasks done

[Feature Request] Add capability to log on a step-based interval in OffPolicyAlgorithm #1708

tobiabir opened this issue Oct 6, 2023 · 5 comments · May be fixed by #1709
Labels
enhancement New feature or request help wanted Help from contributors is welcomed

Comments

@tobiabir
Copy link

tobiabir commented Oct 6, 2023

🚀 Feature

At the time of writing the logging interval is controlled by the log_interval argument of the learn method. Permitted are integers. In OnPolicyAlgorithm this is the number of rounds (environment interaction + training steps) and in OffPolicyAlgorithm the number of episodes between logging.

Since episodes in general do not have fixed length or an end at all, logging on an episode-basis is not always practical (see Motivation).
Could we add the capability to log on a step-based interval?

Motivation

The main motivation is experiment tracking.
It is good practice to run experiments multiple times with different random seeds and display training plots with confidence intervals or min and max.
If you want to plot against environment steps (e.g. when you are interested sample complexity) you can't really do that properly if the logs are not done on a step-basis.
This is because with variable episode length the logs will not be aligned and it is difficult to compute the confidence intervals (e.g. what should the interval at step x be if you have a log at step x - 3 and x + 5).
So it would be great if this could be added.

Note also that the documentation is currently stating for all algorithms that log_interval is "The number of episodes before logging.", which is not true for on-policy algorithms. Issue #725 is related to that.

Pitch

I propose we change the OffPolicyAlgorithm case to be the same as the OnPolicyAlgorithm case.
Then logging on a step-basis can be done using a train_freq on a step-basis.
This has the additional benefit of more consistency between on-policy and off-policy algorithms.

While we are at it I also propose to move logging to after training in both OnPolicyAlgorithm and OffPolicyAlgorithm.
That way we can get all the information of one round in the same log.
As it is the information of the last training steps will not be logged.

Alternatives

The least invasive (to sb3) alternative is to modify the environments of interest to have fixed episode length.
However, this might not be practical for all environments and seems ugly.

Another alternative would be to allow log_interval to be a tuple (e.g. (4, "episodes")) like train_freq.
This also seems ugly.

Update:
Custom callback.
See discussion below.

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
  • If I'm requesting a new feature, I have proposed alternatives
@tobiabir tobiabir added the enhancement New feature or request label Oct 6, 2023
tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023
tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023
tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023
tobiabir added a commit to tobiabir/stable-baselines3 that referenced this issue Oct 6, 2023
@araffin
Copy link
Member

araffin commented Oct 6, 2023

Hello,
have you considered callbacks as an alternative? (see doc and section on tensorboard)
they should allow you to log every n steps or every k iterations.

@tobiabir
Copy link
Author

tobiabir commented Oct 6, 2023

Hi.

Thank you for the quick answer and for that pointer!

Indeed, with a custom callback, logging on step-basis can be done.
I quickly tested this on SAC with the following callback.

class TensorboardCallback(sb3.common.callbacks.BaseCallback):
    def __init__(self, log_interval):
        super().__init__()
        self.log_interval = log_interval

    def _on_step(self):
        if self.model.num_timesteps % self.log_interval == 0:
            self.model._dump_logs()

Some minor draw backs remain with this approach:

  • One has to avoid interference from the other logs. This can be done by setting log_interval in learn to np.inf.
  • Everyone who needs this has to reimplement a custom callback even though logging is otherwise nicely integrated into the code base.
  • The logging is still done before the training. So to get the information of the last training steps one has to also implement callback _on_training_end.

Would you still consider some of the proposed changes? I think it could clean things up and improve consistency.
At least I think the documentation of log_interval should be fixed as for on-policy algorithms it is currently incorrect.

@araffin
Copy link
Member

araffin commented Oct 16, 2023

Adding a callback to do step based logging to the collection provided by sb3 would be a good addition i think.
Logging more often shouldn't be a problem (or solvable as you describe).

@araffin araffin added the help wanted Help from contributors is welcomed label Oct 16, 2023
@tobiabir
Copy link
Author

Could you elaborate on your vision for this? Like what set of features should this callback support? Should it be a minimal callback for only step based logging or should it be more general logging callback supporting all sorts of logging applications?

@araffin
Copy link
Member

araffin commented Oct 18, 2023

Could you elaborate on your vision for this?

Have a simple LogEveryNSteps callback that calls self.logger.dump() every n steps (n calls to env.step(), it might correspond to more than one steps when using multiple envs, as shown in the doc).

general logging callback supporting all sorts of logging applications?

We leave general purpose/custom callbacks to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Help from contributors is welcomed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants