Highlights over existing PyTorch RL repos #20
Hello, thanks for bringing that up.

**Stable Baselines vs other RL libs**

I had a slide about SB vs other libs: https://araffin.github.io/slides/rl-tuto-jnrr19/#/3/3

So, in short: as mentioned by @hill-a, SB is not about the backend but about a common API and good documentation.

**IPython friendly / common interface**

Compared to the libs you mentioned, SB is super simple to use (sklearn syntax, two lines of code — see the sketch below) and has an active community (e.g. rlpyt's last update was one month ago). tianshou and rlpyt only give you building blocks, and you need to write your own networks. That's why I would disagree with "IPython friendly" and "common interface" for them.

**Callbacks**

Where did you see them in tianshou / rlpyt?

**PEP8 - codestyle - type hints**

Yes, both follow PEP8, but in tianshou the variable names are not all meaningful (a lot of one-letter variable names). For type hints, tianshou has them, but there is no static type check (at least in the CI script).

**Documentation**

The documentation is minimal in tianshou and mostly about the API in rlpyt, whereas we have a detailed user guide and a full tutorial in SB.

**Dict support**

I found nothing in tianshou... it looks like you need to implement it yourself; if so, then you have that support in SB too, by writing a custom policy. There are also no examples in rlpyt (even though there is some code that mentions it). And I'm not sure to what extent it supports all observation types.
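As a concrete illustration of the "sklearn syntax, two lines of code" point above, here is a minimal sketch using the SB3 API (the environment and the timestep budget are arbitrary choices for the example):

```python
from stable_baselines3 import PPO

# Create and train an agent in essentially two lines.
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```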
I would also like to add this: Baselines and stable-baselines have both received attention from many users over the years, along with many issues and fixes to the algorithms which are easy to miss. As both are still widely used code-bases, I believe they have withstood a sort of "Test of Time", making them good for baseline experiments.
You are right. I have corrected the table above.
But according to thu-ml/tianshou#38, tianshou supports dict obs naturally.
It is not clear if it handles all possibilities too... (e.g. Image + Box + Discrete + Binary + MultiDiscrete)
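For reference, the combination of space types asked about above could look like the following composite gym observation space (a hypothetical example; the key names are arbitrary):

```python
import numpy as np
from gym import spaces

# A dict observation space mixing the space types listed above.
obs_space = spaces.Dict({
    "image": spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
    "vector": spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    "discrete": spaces.Discrete(3),
    "binary": spaces.MultiBinary(2),
    "multi_discrete": spaces.MultiDiscrete([3, 2]),
})
```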
Hmm... it also supports the different types, I think:

```
In [9]: d = Batch(
   ...:     done=array([False, False]),
   ...:     info=array([{'is_success': 0.0}, {'is_success': 0.0}], dtype=object),
   ...:     obs=Batch(
   ...:         test_int=1,
   ...:         test_list=[1, 2, 3],
   ...:         test_str='asdasdas',
   ...:         achieved_goal=array([[1.42853749, 0.63666553, 0.42473605],
   ...:                              [1.20106979, 0.77055984, 0.42473605]]),
   ...:         desired_goal=array([[1.47228501, 0.88356362, 0.4490597 ],
   ...:                             [1.37898198, 0.71865667, 0.42469975]]),
   ...:         observation=array([[ 1.36945404e+00,  7.78820325e-01,  5.64114120e-01,
   ...:                              1.42853749e+00,  6.36665526e-01,  4.24736048e-01,
   ...:                              5.90834503e-02, -1.42154798e-01, -1.39378071e-01,
   ...:                              3.97852140e-02,  4.16259342e-02, -3.85214084e-07,
   ...:                              5.92637053e-07,  1.12208536e-13, -2.49958924e-02,
   ...:                             -2.62562578e-02, -2.64169060e-02,  1.87589293e-07,
   ...:                             -2.88598912e-07,  1.30443021e-18,  2.49958852e-02,
   ...:                              2.62562532e-02,  2.64435715e-02,  6.99533302e-02,
   ...:                              7.08374623e-02],
   ...:                            [ 1.36945404e+00,  7.78820325e-01,  5.64114120e-01,
   ...:                              1.20106979e+00,  7.70559842e-01,  4.24736048e-01,
   ...:                             -1.68384253e-01, -8.26048228e-03, -1.39378071e-01,
   ...:                              3.97852140e-02,  4.16259342e-02, -3.85214084e-07,
   ...:                              5.92637053e-07,  1.12208536e-13, -2.49958924e-02,
   ...:                             -2.62562578e-02, -2.64169060e-02,  1.87589293e-07,
   ...:                             -2.88598912e-07, -3.26321805e-18,  2.49958852e-02,
   ...:                              2.62562532e-02,  2.64435715e-02,  6.99533302e-02,
   ...:                              7.08374623e-02]]),
   ...:     ),
   ...:     rew=array([-1., -1.], dtype=float32),
   ...: )

In [10]: d.obs.test_list
Out[10]: [1, 2, 3]

In [11]: d.obs.test_str
Out[11]: 'asdasdas'
```
Btw, thank you Raffin, you have addressed most of my concerns.
I found another interesting difference: both rlpyt and tianshou support recurrent policies and multiprocessing for all algorithms, whereas SB's support for these seems partial. Is this feature on your roadmap?
RNNs are on the roadmap (see #1). Multiprocessing is not explicitly included, although I am a bit uncertain what level of multiprocessing you refer to here (parallel processing of samples during training?).
I saw the table in SB3: the "Multiprocessing" column for TD3 and SAC is ❌, so I wanted to ask. It typically means parallel sampling from environments, and I think SB3 should support it naturally. (Maybe I misunderstood? Because baselines supports VecEnv :)
Ah, yes, that "Multiprocessing" refers to sampling from multiple environments at once. PPO/A2C support this out of the box; SAC/TD3 need updates to the replay buffers and related code to support it.
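For context, sampling from multiple environments at once looks roughly like this in SB3 (a minimal sketch assuming a recent SB3 version; the environment and hyperparameters are arbitrary):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# Run 4 environment copies in separate processes; PPO consumes the
# batched observations out of the box.
vec_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)
model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=20_000)
```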
To me, the comparison is between stable-baselines and something like RLlib. The upside of RLlib is that it's very scalable; the downside is that it's pretty complex and hard to get running. I must say, Stable Baselines3 is much more pleasant to install and run, without requiring TF.
Yes, RLlib is a bit the opposite of SB3 in terms of modularity.
Good to hear =)
Btw, I think tianshou aims at exactly the same thing as SB3. For me, it is more hackable compared with SB3. Here is a result of cloc:
and tianshou:
which is half of SB3 but with more functions and algorithms.
rlpyt supports dict/tuple observations well using its namedarraytuple data structure for observations. It has a method called buffer_to that recursively transfers all arrays in this structure into tensors on the proper devices, and the structure also supports "None" in places. It then passes the processed data structure directly into the net.forward() method, so it's pretty modular. Currently, Stable Baselines3 definitely has better support if you want an "out of the box" policy: you can import Mlp or NatureCnn directly. However, the thing passed into the feature extractor's forward method must be a single observation tensor. I believe this defeats part of the reason why a lot of people want PyTorch (being modular and working with complex inputs and complex architectures).
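To illustrate the constraint mentioned above (the features extractor receives a single observation tensor), a custom extractor in SB3 looks roughly like this (a minimal sketch; the class name, layer sizes and environment are arbitrary):

```python
import gym
import numpy as np
import torch as th
import torch.nn as nn

from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class SmallMlpExtractor(BaseFeaturesExtractor):
    """Maps a flat Box observation tensor to a feature vector."""

    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 64):
        super().__init__(observation_space, features_dim)
        n_input = int(np.prod(observation_space.shape))
        self.net = nn.Sequential(nn.Linear(n_input, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        # `observations` is a single tensor, not a dict/tuple structure.
        return self.net(observations)


model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(
        features_extractor_class=SmallMlpExtractor,
        features_extractor_kwargs=dict(features_dim=64),
    ),
)
```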
As written in the roadmap (issue #1) and above, this is a planned feature for SB3, but for v1.1+.
As a follow-up to this discussion, there is the SB3 blog post: https://araffin.github.io/post/sb3/
I updated the table to include TRPO and the experimental PPO LSTM in contrib. |
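For completeness, the experimental PPO LSTM mentioned above can be used roughly like this (a sketch assuming sb3-contrib is installed; the environment and timestep budget are arbitrary):

```python
from sb3_contrib import RecurrentPPO

# Recurrent (LSTM) policy from the contrib package.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```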
Closing, as the table was updated (PPO LSTM is now merged in contrib master) and the original question was answered.
Greetings! I'm a PyTorch RL fan but previously used baselines and stable-baselines for research. I noticed stable-baselines3 through the original stable-baselines issue.
Recently, many PyTorch RL platforms have emerged, including rlpyt, tianshou, etc. I went through their code and compared it with stable-baselines3.
And for the planned features of stable-baselines3:
Also, regarding the most important feature, modularization: from my perspective, tianshou is the best of all and rlpyt is second. I hate OpenAI Baselines at this point, but stable-baselines is much better than OpenAI's.
Just some of my concerns.