Performance of UDRLPG in three continuous-control Gym environments, compared with GoGePo (included for its similarity) and DDPG (a popular RL algorithm for continuous-control tasks).
Inverted Pendulum v4
Swimmer v4
Hopper v4
Generator’s ability to produce policies that achieve returns across the entire spectrum
Ablation
Weighting strategy and performance-based replay buffer
Upside Down Reinforcement Learning with Policy Generators