Highlights over existing PyTorch RL repos #20
Hello, thanks for bringing that up.

**Stable Baselines vs other RL libs**

I had a slide about SB vs other libs: https://araffin.github.io/slides/rl-tuto-jnrr19/#/3/3

So, in short: as mentioned by @hill-a, SB is not about the backend but about a common API and good documentation.

**IPython friendly / common interface**

Compared to the libs you mentioned, SB is super simple to use (sklearn syntax, two lines of code — see the sketch below) and has an active community (e.g. rlpyt's last update was one month ago). tianshou and rlpyt only give you building blocks, and you need to write your own networks. That's why I would disagree with "IPython friendly" and "common interface" for them.

**Callbacks**

Where did you see them in tianshou / rlpyt?

**PEP8 - codestyle - type hints**

Yes, both follow PEP8, but in tianshou the variable names are not all meaningful (a lot of one-letter variable names). For type hints, tianshou has them, but there is no static type check (at least in the CI script).

**Documentation**

The documentation is minimal in tianshou and mostly about the API in rlpyt, whereas we have a detailed user guide and a full tutorial in SB.

**Dict support**

I found nothing in tianshou... it looks like you need to implement it yourself; if so, then you have that support in SB too, by writing a custom policy. There are also no examples in rlpyt (even though there is some code that mentions it). And I'm not sure to what extent it supports all observation types.
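As a concrete illustration of the "sklearn syntax, two lines of code" point above, here is a minimal sketch using the SB3 API (the environment and the timestep budget are arbitrary choices for the example):

```python
from stable_baselines3 import PPO

# Create and train an agent in essentially two lines.
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```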
I would also like to add this: Baselines and stable-baselines have both received attention from many users over the years, along with many issues and fixes to the algorithms which are easy to miss. As both are still widely used code-bases, I believe they have withstood a sort of "Test of Time", making them good for baseline experiments.
You are right. I have corrected the table above.
But according to thu-ml/tianshou#38, tianshou supports dict obs naturally.
It is not clear if it handles all possibilities too... (e.g. Image + Box + Discrete + Binary + MultiDiscrete)
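For reference, the combination of space types asked about above could look like the following composite gym observation space (a hypothetical example; the key names are arbitrary):

```python
import numpy as np
from gym import spaces

# A dict observation space mixing the space types listed above.
obs_space = spaces.Dict({
    "image": spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
    "vector": spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    "discrete": spaces.Discrete(3),
    "binary": spaces.MultiBinary(2),
    "multi_discrete": spaces.MultiDiscrete([3, 2]),
})
```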
Hmm... it also supports the different types, I think:

```
In [9]: d = Batch(
   ...:     done=array([False, False]),
   ...:     info=array([{'is_success': 0.0}, {'is_success': 0.0}], dtype=object),
   ...:     obs=Batch(
   ...:         test_int=1,
   ...:         test_list=[1, 2, 3],
   ...:         test_str='asdasdas',
   ...:         achieved_goal=array([[1.42853749, 0.63666553, 0.42473605],
   ...:                              [1.20106979, 0.77055984, 0.42473605]]),
   ...:         desired_goal=array([[1.47228501, 0.88356362, 0.4490597 ],
   ...:                             [1.37898198, 0.71865667, 0.42469975]]),
   ...:         observation=array([[ 1.36945404e+00,  7.78820325e-01,  5.64114120e-01,
   ...:                              1.42853749e+00,  6.36665526e-01,  4.24736048e-01,
   ...:                              5.90834503e-02, -1.42154798e-01, -1.39378071e-01,
   ...:                              3.97852140e-02,  4.16259342e-02, -3.85214084e-07,
   ...:                              5.92637053e-07,  1.12208536e-13, -2.49958924e-02,
   ...:                             -2.62562578e-02, -2.64169060e-02,  1.87589293e-07,
   ...:                             -2.88598912e-07,  1.30443021e-18,  2.49958852e-02,
   ...:                              2.62562532e-02,  2.64435715e-02,  6.99533302e-02,
   ...:                              7.08374623e-02],
   ...:                            [ 1.36945404e+00,  7.78820325e-01,  5.64114120e-01,
   ...:                              1.20106979e+00,  7.70559842e-01,  4.24736048e-01,
   ...:                             -1.68384253e-01, -8.26048228e-03, -1.39378071e-01,
   ...:                              3.97852140e-02,  4.16259342e-02, -3.85214084e-07,
   ...:                              5.92637053e-07,  1.12208536e-13, -2.49958924e-02,
   ...:                             -2.62562578e-02, -2.64169060e-02,  1.87589293e-07,
   ...:                             -2.88598912e-07, -3.26321805e-18,  2.49958852e-02,
   ...:                              2.62562532e-02,  2.64435715e-02,  6.99533302e-02,
   ...:                              7.08374623e-02]]),
   ...:     ),
   ...:     rew=array([-1., -1.], dtype=float32),
   ...: )

In [10]: d.obs.test_list
Out[10]: [1, 2, 3]

In [11]: d.obs.test_str
Out[11]: 'asdasdas'
```
Btw, thank you Raffin, you have addressed most of my concerns.
I found another interesting difference: both rlpyt and tianshou support recurrent policies and multiprocessing for all algorithms, whereas SB's support for these seems partial. Is this feature on your roadmap?
RNNs are on the roadmap (see #1). Multiprocessing is not explicitly included, although I am a bit uncertain what level of multiprocessing you refer to here (parallel processing of samples during training?).
I saw the table in SB3: the "Multiprocessing" column for TD3 and SAC is ❌, so I wanted to ask. It typically means parallel sampling from environments, and I think SB3 should support it naturally. (Maybe I misunderstood? Because baselines supports VecEnv :)
Ah, yes, that "Multiprocessing" refers to sampling from multiple environments at once. PPO/A2C support this out of the box; SAC/TD3 need updates to the replay buffers and related code to support it.
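For context, sampling from multiple environments at once looks roughly like this in SB3 (a minimal sketch assuming a recent SB3 version; the environment and hyperparameters are arbitrary):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# Run 4 environment copies in separate processes; PPO consumes the
# batched observations out of the box.
vec_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)
model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=20_000)
```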
To me, the comparison is between stable-baselines and something like RLlib. The upside of RLlib is that it's very scalable; the downside is that it's pretty complex and hard to get running. I must say, Stable Baselines3 is much more pleasant to install and run, without requiring TF.
Yes, RLlib is a bit the opposite of SB3 in terms of modularity.
Good to hear =)
Btw, I think tianshou aims at exactly the same thing as SB3. For me, it is more hackable compared with SB3. Here is a result of cloc:
and tianshou:
which is half of SB3 but with more functions and algorithms.
rlpyt supports dict/tuple observations well using its namedarraytuple data structure for observations. It has a method called buffer_to that recursively transfers all arrays in this structure into tensors on the proper devices, and the structure also supports "None" in places. It then passes the processed data structure directly into the net.forward() method, so it's pretty modular. Currently, Stable Baselines3 definitely has better support if you want an "out of the box" policy: you can import Mlp or NatureCnn directly. However, the thing passed into the feature extractor's forward method must be a single observation tensor. I believe this defeats part of the reason why a lot of people want PyTorch (being modular and working with complex inputs and complex architectures).
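To illustrate the constraint mentioned above (the features extractor receives a single observation tensor), a custom extractor in SB3 looks roughly like this (a minimal sketch; the class name, layer sizes and environment are arbitrary):

```python
import gym
import numpy as np
import torch as th
import torch.nn as nn

from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class SmallMlpExtractor(BaseFeaturesExtractor):
    """Maps a flat Box observation tensor to a feature vector."""

    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 64):
        super().__init__(observation_space, features_dim)
        n_input = int(np.prod(observation_space.shape))
        self.net = nn.Sequential(nn.Linear(n_input, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        # `observations` is a single tensor, not a dict/tuple structure.
        return self.net(observations)


model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(
        features_extractor_class=SmallMlpExtractor,
        features_extractor_kwargs=dict(features_dim=64),
    ),
)
```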
As written in the roadmap (issue #1) and above, this is a planned feature for SB3, but for v1.1+.
As a follow-up to this discussion, there is the SB3 blog post: https://araffin.github.io/post/sb3/
I updated the table to include TRPO and the experimental PPO LSTM in contrib. |
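For completeness, the experimental PPO LSTM mentioned above can be used roughly like this (a sketch assuming sb3-contrib is installed; the environment and timestep budget are arbitrary):

```python
from sb3_contrib import RecurrentPPO

# Recurrent (LSTM) policy from the contrib package.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```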
Closing, as the table was updated (PPO LSTM is now merged in contrib master) and the original question was answered.
Greetings! I'm a PyTorch RL fan but previously used baselines and stable-baselines for research. I noticed stable-baselines3 through the original stable-baselines issue.
Recently, many PyTorch RL platforms have emerged, including rlpyt, tianshou, etc. I went through their code and compared it with stable-baselines3.
And for the planned features of stable-baselines3:
Also, regarding the most important feature, modularization: from my perspective, tianshou is the best of all and rlpyt is second. I hate OpenAI Baselines at this point, but stable-baselines is much better than OpenAI's.
Just some of my concerns.