Dependencies

Stable Baselines

Policy Types

https://stable-baselines.readthedocs.io/en/master/modules/policies.html

  • MLP (multi-layer perceptron)
    • MlpPolicy
      • Basic implementation: 2 hidden layers of 64 units each
    • MlpLstmPolicy

      LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

      • Playing Mario shouldn't need long-term dependencies, so an LSTM is probably not required
    • MlpLnLstmPolicy
      • LSTM with layer normalization
  • CNN
    • CNN policies (e.g. CnnPolicy) are meant for image observations
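
The guidance above can be condensed into a small helper. This is a minimal sketch, not part of Stable Baselines: `choose_policy` and `build_model` are hypothetical names, and the policy strings assume the registered names `MlpPolicy`, `MlpLstmPolicy`, and `CnnPolicy`. `build_model` additionally assumes Stable Baselines 2.x is installed and that `env_id` is an environment registered in Gym.

```python
def choose_policy(image_observations, needs_memory=False):
    """Return the name of a Stable Baselines policy (hypothetical helper)."""
    if image_observations:
        # CNN policies are meant for image inputs.
        return "CnnPolicy"
    if needs_memory:
        # LSTM variant for tasks with long-term dependencies.
        return "MlpLstmPolicy"
    # Default MLP: two hidden layers of 64 units.
    return "MlpPolicy"


def build_model(env_id, image_observations=False, needs_memory=False):
    """Create a PPO2 model; assumes Stable Baselines 2.x is installed."""
    # Imported lazily so choose_policy works without the library.
    from stable_baselines import PPO2

    policy = choose_policy(image_observations, needs_memory)
    # Recurrent policies need the number of environments to be a multiple
    # of nminibatches; a string env ID gives one environment, so use 1 there.
    nminibatches = 1 if needs_memory else 4
    return PPO2(policy, env_id, nminibatches=nminibatches, verbose=1)
```

For a task without image input or memory requirements, `choose_policy(image_observations=False)` falls through to the plain MLP policy.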

Customizing Policies

We can customize a policy by setting the parameters of the Policy class:
https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html

Ones we probably care about:

  • n_env - (int) The number of environments to run
  • n_steps - (int) The number of steps to run for each environment
  • n_batch - (int) The batch size (n_env * n_steps)
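
The relationship between these three values is simple arithmetic. A minimal sketch with made-up numbers (8 environments, 128 steps each); `rollout_batch_size` is a hypothetical helper, not a Stable Baselines function:

```python
def rollout_batch_size(n_env, n_steps):
    """Number of transitions collected per rollout across all environments."""
    if n_env < 1 or n_steps < 1:
        raise ValueError("n_env and n_steps must be positive")
    return n_env * n_steps


# Illustrative values only.
n_env = 8
n_steps = 128
n_batch = rollout_batch_size(n_env, n_steps)
print(n_batch)  # 1024
```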

PPO2 Parameters

PPO hyperparameters explained: https://medium.com/aureliantactics/ppo-hyperparameters-and-ranges-6fc2d29bccbe

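  • learning_rate - step size of the optimizer (a float, or a function of training progress)
  • noptepochs - number of optimization epochs run over each batch of collected experience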
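
To make these two parameters concrete, here is a hedged sketch. It assumes that PPO2 accepts either a float or a callable for `learning_rate`, and that the callable receives the fraction of training remaining (going from 1 to 0). `linear_schedule` and `gradient_steps_per_update` are illustrative names, and the numbers are placeholders rather than tuned values.

```python
def linear_schedule(initial_value):
    """Return a function mapping remaining progress (1 -> 0) to a learning rate."""
    def schedule(progress_remaining):
        # Decays linearly from initial_value down to 0.
        return progress_remaining * initial_value
    return schedule


def gradient_steps_per_update(noptepochs, nminibatches):
    """Each epoch passes over the rollout once, split into nminibatches chunks."""
    return noptepochs * nminibatches


# Placeholder values, not recommendations.
ppo2_kwargs = {
    "learning_rate": linear_schedule(2.5e-4),
    "noptepochs": 4,
    "nminibatches": 4,
}
```

Assuming Stable Baselines is installed, these could be passed along as `PPO2("MlpPolicy", env, **ppo2_kwargs)`. Raising `noptepochs` reuses each rollout for more gradient steps, which is why it interacts with the learning rate.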

Automatic Hyperparameter Tuning

The rl-zoo project provides a collection of pre-trained agents. It uses Optuna to search for the best hyperparameters for those agents, so we might want to use it too.
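
A rough sketch of what an Optuna search could look like. It assumes Optuna is installed; the search ranges are placeholders, and `toy_score` is a stand-in for training and evaluating a PPO2 agent, which is not shown here. This is not how rl-zoo implements its tuning, only an illustration of the general Optuna workflow.

```python
import math


def sample_ppo2_params(trial):
    """Ask the Optuna trial for candidate values (ranges are placeholders)."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "noptepochs": trial.suggest_int("noptepochs", 1, 10),
    }


def toy_score(params):
    """Stand-in for 'train an agent and return its mean reward'.

    Peaks at learning_rate=1e-4 and noptepochs=4; purely illustrative.
    """
    lr_penalty = (math.log10(params["learning_rate"]) + 4.0) ** 2
    epoch_penalty = 0.1 * (params["noptepochs"] - 4) ** 2
    return -(lr_penalty + epoch_penalty)


def objective(trial):
    """Optuna calls this once per trial and maximizes the returned value."""
    return toy_score(sample_ppo2_params(trial))


def run_study(n_trials=20):
    """Run the search and return the best parameters found."""
    import optuna  # imported lazily; requires `pip install optuna`

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

In a real setup, `toy_score` would be replaced by a function that trains an agent with the sampled parameters and returns its evaluation reward; calling `run_study()` then returns the best parameter set Optuna found.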