
Highlights over existing PyTorch RL repos #20

Closed
fishinglover opened this issue May 14, 2020 · 19 comments
Labels: question (Further information is requested)

Comments

fishinglover commented May 14, 2020

Greetings! I'm a PyTorch RL fan but previously used baselines and stable-baselines for research. I noticed stable-baselines3 through the original stable-baselines issues.
Recently, many PyTorch RL platforms have emerged, including rlpyt, tianshou, etc. I went through their code and compared them with stable-baselines3.

| Features | Stable-Baselines3 | rlpyt | tianshou |
| --- | --- | --- | --- |
| State of the art RL methods | ✔️ | ✔️ | ✔️ |
| Documentation | ✔️ | ✔️ | ✔️ |
| Custom environments | ✔️ | Just so-so | ✔️ |
| Custom policies | ✔️ | ✔️ | ✔️ |
| Common interface | ✔️ | ✔️ | ✔️ |
| Ipython / Notebook friendly | ✔️ | ✔️ | ✔️ |
| PEP8 code style | ✔️ | ✔️ | ✔️ |
| Custom callback | ✔️ | | |
| High code coverage | ✔️ | | ✔️ |
| Type hints | ✔️ | | ✔️ |

And for the planned features of stable-baselines3:

| Features | Stable-Baselines3 | rlpyt | tianshou |
| --- | --- | --- | --- |
| Tensorboard support | ✔️ | ✔️ | ✔️ |
| DQN extensions | ➖ QR-DQN in SB3 contrib | ✔️ | ✔️ |
| Support for Dict observation spaces | ✔️ | ✔️ | ✔️ |
| Recurrent Policies | ✔️ in contrib | ✔️ | ✔️ |
| TRPO | ✔️ in contrib | | ✔️ |

Also, on the most important feature, "modularization": from my perspective, tianshou is the best of all and rlpyt is second. I hate OpenAI Baselines at this point, but stable-baselines is much better than the OpenAI version.

Just some of my concerns.

araffin added the question (Further information is requested) label on May 14, 2020

araffin (Member) commented May 14, 2020

Hello,

Thanks for bringing that up.

Stable Baselines vs other RL libs

I had a slide about SB vs other libs: https://araffin.github.io/slides/rl-tuto-jnrr19/#/3/3

So, in short:

  1. We focus on model-free RL and the single-agent setting.
  2. We aim at creating a user-friendly lib and avoid introducing breaking changes (we have a lot of backward-compat code in SB2). We also document every change.
  3. We aim at being consistent and providing a clean API (sklearn-like interface).
  4. We provide self-contained implementations (vs. a modular lib), even though we make use of inheritance to avoid code duplication.
  5. We have an active community and complete documentation + tutorials.

As mentioned by @hill-a, SB is not about the backend but about a common API and good documentation.

ipython friendly / common interface

Compared to the libs you mentioned, SB is super simple to use (sklearn syntax, 2 lines of code) and has an active community (e.g. rlpyt's last update was one month ago).
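
To make the "2 lines of code" point concrete, a minimal sketch (the environment id and timestep budget are arbitrary choices for illustration):

```python
from stable_baselines3 import PPO

# Train a PPO agent on a Gym environment id; the defaults handle the rest.
model = PPO("MlpPolicy", "CartPole-v1").learn(total_timesteps=10_000)
```

Saving, loading, and prediction follow the same short pattern (model.save(...), PPO.load(...), model.predict(obs)).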

tianshou and rlpyt give you only building blocks; you need to write your own networks. That's why I would disagree with "ipython friendly" and "common interface" for them.

Callbacks

Where did you see them in tianshou / rlpyt?

PEP8 - codestyle - type hints

Yes, both follow PEP8, but in tianshou the variable names are not all meaningful (a lot of one-letter variable names...).

For type hints, tianshou has them but there is no static type check (at least in the CI script).

Documentation

The documentation is minimal in tianshou and mostly about the API in rlpyt, whereas we have a detailed user guide and full tutorials in SB.

Dict support

I found nothing in tianshou... it looks like you need to implement it yourself; if so, then you have the same support in SB too, by writing a custom policy. There are also no examples in rlpyt (even though some code mentions it). And I'm not sure to what extent it supports all `Dict` spaces (there are a lot of possibilities).

Miffyli (Collaborator) commented May 14, 2020

I would also like to add this:

Baselines, and stable-baselines, have both received attention from many users over the years, along with many issues and fixes to the algorithms which are easy to miss. As both are still widely used code-bases, I believe they have withstood a sort of "Test of Time", making them good for baseline experiments.

fishinglover (Author) commented May 14, 2020

Callbacks
Where did you see them in tianshou / rlpyt?

You are right. I have corrected the table above.

Dict support
I found nothing in tianshou... it looks like you need to implement it yourself; if so, then you have the same support in SB too, by writing a custom policy. There are also no examples in rlpyt (even though some code mentions it). And I'm not sure to what extent it supports all `Dict` spaces (there are a lot of possibilities).

But according to thu-ml/tianshou#38, tianshou supports dict obs naturally.

araffin (Member) commented May 14, 2020

But according to thu-ml/tianshou#38, tianshou supports dict obs naturally.

It is not clear if it handles all the possibilities either (e.g. Image + Box + Discrete + Binary + MultiDiscrete); it seems like it will work only for a Dict whose entries share the same type. In that case, SB also supports that with HER.

fishinglover (Author) commented May 14, 2020

Hmm... I think it also supports different types:

```
In [9]: d=Batch( 
   ...:     done= array([False, False]), 
   ...:     info= array([{'is_success': 0.0}, {'is_success': 0.0}], dtype=object), 
   ...:     obs= Batch( 
   ...:              test_int= 1, 
   ...:              test_list= [1, 2, 3], 
   ...:              test_str= 'asdasdas', 
   ...:              achieved_goal= array([[1.42853749, 0.63666553, 0.42473605], 
   ...:                                    [1.20106979, 0.77055984, 0.42473605]]), 
   ...:              desired_goal= array([[1.47228501, 0.88356362, 0.4490597 ], 
   ...:                                   [1.37898198, 0.71865667, 0.42469975]]), 
   ...:              observation= array([[ 1.36945404e+00,  7.78820325e-01,  5.64114120e-01, 
   ...:                                    1.42853749e+00,  6.36665526e-01,  4.24736048e-01, 
   ...:                                    5.90834503e-02, -1.42154798e-01, -1.39378071e-01, 
   ...:                                    3.97852140e-02,  4.16259342e-02, -3.85214084e-07, 
   ...:                                    5.92637053e-07,  1.12208536e-13, -2.49958924e-02, 
   ...:                                   -2.62562578e-02, -2.64169060e-02,  1.87589293e-07, 
   ...:                                   -2.88598912e-07,  1.30443021e-18,  2.49958852e-02, 
   ...:                                    2.62562532e-02,  2.64435715e-02,  6.99533302e-02, 
   ...:                                    7.08374623e-02], 
   ...:                                  [ 1.36945404e+00,  7.78820325e-01,  5.64114120e-01, 
   ...:                                    1.20106979e+00,  7.70559842e-01,  4.24736048e-01, 
   ...:                                   -1.68384253e-01, -8.26048228e-03, -1.39378071e-01, 
   ...:                                    3.97852140e-02,  4.16259342e-02, -3.85214084e-07, 
   ...:                                    5.92637053e-07,  1.12208536e-13, -2.49958924e-02, 
   ...:                                   -2.62562578e-02, -2.64169060e-02,  1.87589293e-07, 
   ...:                                   -2.88598912e-07, -3.26321805e-18,  2.49958852e-02, 
   ...:                                    2.62562532e-02,  2.64435715e-02,  6.99533302e-02, 
   ...:                                    7.08374623e-02]]), 
   ...:          ),  
   ...:     rew= array([-1., -1.], dtype=float32), 
   ...: )

In [10]: d.obs.test_list
Out[10]: [1, 2, 3]

In [11]: d.obs.test_str
Out[11]: 'asdasdas'
```

fishinglover (Author) commented May 14, 2020

Btw, thank you Raffin, you addressed most of my concerns.

fishinglover (Author) commented May 14, 2020

I found another interesting difference: both rlpyt and tianshou support recurrent policies and multiprocessing for all algorithms. However, SB seems to support these only partially. Is this feature on your roadmap?

Miffyli (Collaborator) commented May 14, 2020

RNNs are on the roadmap (see #1). Multiprocessing is not explicitly included, albeit I am a bit uncertain what level of multiprocessing you refer to here (parallel processing of samples during training?).

fishinglover (Author) commented May 14, 2020

I saw the table in SB3: the "multiprocessing" column for TD3 and SAC is ❌, so I'd like to ask. It typically means parallel sampling from environments. I think SB3 should support it naturally. (Maybe I misunderstood? Because baselines supports VecEnv :)

Miffyli (Collaborator) commented May 14, 2020

Ah, yes, that "Multiprocessing" refers to sampling from multiple environments at once. PPO/A2C support this out of the box; SAC/TD3 need updates to their replay buffers and related code to support this.
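
To illustrate the distinction, a rough sketch of the parallel sampling that PPO/A2C already handle via vectorized environments (the env id, number of workers, and hyperparameters below are arbitrary):

```python
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # Four copies of the environment, each stepping in its own process;
    # PPO collects n_steps transitions from every copy per rollout.
    env = SubprocVecEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
    model = PPO("MlpPolicy", env, n_steps=128)
    model.learn(total_timesteps=20_000)
```

SAC/TD3 would need the equivalent change in their replay buffer and collection loop to consume several envs at once.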

drozzy commented May 20, 2020

To me, I am comparing stable-baselines to something like RLlib. The upside of RLlib is that it's very scalable. The downside is that it's pretty complex and hard to get running.
Stable-baselines was always kind of a "just run" type of deal. However, it falls short of RLlib if you want to actually extend the library or reuse some of its abstractions (e.g. ray/rllib has a "trainable" class which encapsulates the whole training process, so it loads and saves the whole training state and not just the model state).

I must say, Stable Baselines 3 is much more pleasant to install and run without the requirement for TF.

araffin (Member) commented May 21, 2020

To me, I am comparing stable-baselines to something like RLlib. The upside of RLlib is that it's very scalable. The downside is that it's pretty complex and hard to get running.

Yes, RLlib is a bit the opposite of SB3 in terms of modularity.
As mentioned, SB3 has "self-contained" implementations whereas RLlib breaks the code into many, many modules. This allows RLlib to scale well and also makes it easy to support the multi-agent setting (which we do not). On the other side, the implementations of SB3 are easy to read, run and hack (for custom purposes).

I must say, Stable Baselines 3 is much more pleasant to install and run without the requirement for TF.

Good to hear =)

@fishinglover (Author)

To me, I am comparing stable-baselines to something like RLlib. The upside of RLlib is that it's very scalable. The downside is that it's pretty complex and hard to get running.
Stable-baselines was always kind of a "just run" type of deal. However, it falls short of RLlib if you want to actually extend the library or reuse some of its abstractions (e.g. ray/rllib has a "trainable" class which encapsulates the whole training process, so it loads and saves the whole training state and not just the model state).

I must say, Stable Baselines 3 is much more pleasant to install and run without the requirement for TF.

Btw, I think tianshou aims at exactly the same thing as SB3. For me, it is more hackable compared with SB3. Here is the result of cloc:

```
[stable_baselines3 (master)]$ cloc stable_baselines3
      44 text files.
      44 unique files.                              
       2 files ignored.

github.com/AlDanial/cloc v 1.74  T=0.14 s (308.4 files/s, 61067.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          43           1419           2585           4511
-------------------------------------------------------------------------------
SUM:                            43           1419           2585           4511
-------------------------------------------------------------------------------
```

and tianshou:

```
[tianshou (master ✗)]$ cloc tianshou
      29 text files.
      29 unique files.                              
       2 files ignored.

github.com/AlDanial/cloc v 1.74  T=0.22 s (132.6 files/s, 15262.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          29            355            841           2141
-------------------------------------------------------------------------------
SUM:                            29            355            841           2141
-------------------------------------------------------------------------------
```

which is half the size of SB3 but with more functionality and algorithms.

@buoyancy99 (Contributor)

rlpyt can support dict/tuple observations well using its namedarraytuple data structure for observations. It has a method called buffer_to to recursively transfer all arrays in this structure into tensors on the proper devices. The structure also supports "None" in places. It then directly passes the processed data structure into the net.forward() method, so it's pretty modular.
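
In plain PyTorch terms, the pattern is roughly the following; this is a generic sketch of the idea, not rlpyt's actual code (rlpyt does it with its namedarraytuple structure and a buffer_to helper):

```python
import numpy as np
import torch


def to_device(obs, device):
    """Recursively convert a nested dict/tuple of arrays into tensors on a device."""
    if isinstance(obs, dict):
        return {k: to_device(v, device) for k, v in obs.items()}
    if isinstance(obs, (tuple, list)):
        return type(obs)(to_device(v, device) for v in obs)
    if obs is None:  # placeholder entries are simply passed through
        return None
    return torch.as_tensor(np.asarray(obs)).to(device)


# A structured observation (field names made up for illustration):
obs = {
    "image": np.zeros((2, 3, 84, 84), dtype=np.uint8),
    "state": np.zeros((2, 7), dtype=np.float32),
}
obs_on_device = to_device(obs, "cpu")
# The whole structure can then be handed to net.forward(), which unpacks the fields.
```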

Currently, stable-baselines3 definitely has better support if you want to choose an "out of the box" policy. You can import Mlp or NatureCnn extractors directly. However, the thing passed into the feature extractor's forward() method must be a single observation tensor. I believe this breaks part of the reason why a lot of people want PyTorch (modularity and working with complex inputs and complex architectures).
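
For reference, this is roughly the contract being described: an SB3 features extractor subclasses BaseFeaturesExtractor and its forward() receives one (batched) observation tensor. The architecture and env id below are just an illustrative sketch, not SB3's built-in NatureCNN:

```python
import gym
import torch as th
from torch import nn

from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class SmallCNN(BaseFeaturesExtractor):
    """Toy extractor; assumes channel-first image observations."""

    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 64):
        super().__init__(observation_space, features_dim)
        n_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Flatten(),
        )
        with th.no_grad():
            sample = th.as_tensor(observation_space.sample()[None]).float()
            n_flatten = self.cnn(sample).shape[1]
        self.linear = nn.Linear(n_flatten, features_dim)

    def forward(self, observations: th.Tensor) -> th.Tensor:
        # `observations` is one tensor, not a dict/tuple of sub-observations.
        return self.linear(self.cnn(observations))


# Usage (any image-observation env would do; Atari shown as an example):
# model = PPO("CnnPolicy", "PongNoFrameskip-v4",
#             policy_kwargs=dict(features_extractor_class=SmallCNN))
```

A Dict/Tuple observation therefore has to be flattened or merged before it reaches the extractor, which is the limitation being pointed out.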

araffin (Member) commented Jun 25, 2020

I believe this breaks part of the reason why a lot of people want PyTorch
rlpyt can support dict/tuple observations

As written in the roadmap (issue #1) and above, this is a planned feature for SB3 but for v1.1+.

araffin (Member) commented Apr 21, 2021

As a follow-up to this discussion, there is the SB3 blog post: https://araffin.github.io/post/sb3/
I also updated the table to reflect the new and upcoming features in SB3 (dict obs support, and SB3 contrib which has QR-DQN in it).
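
For anyone landing here later, dict obs support in SB3 (v1.1+) means an env whose observation_space is a gym.spaces.Dict can be trained directly by selecting the multi-input policy. A rough sketch with a made-up toy env (the spaces, reward, and episode length are placeholders):

```python
import gym
import numpy as np
from gym import spaces

from stable_baselines3 import PPO


class ToyDictEnv(gym.Env):
    """Tiny illustrative env with a Dict observation space."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Dict({
            "position": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
            "goal": spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
        })
        self.action_space = spaces.Discrete(4)
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self._steps += 1
        return self.observation_space.sample(), 0.0, self._steps >= 50, {}


# "MultiInputPolicy" builds one feature extractor per key and concatenates the features.
model = PPO("MultiInputPolicy", ToyDictEnv()).learn(total_timesteps=5_000)
```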

drozzy commented Jul 2, 2021

Thanks @araffin, great post.
Is anyone familiar with ALL (the Autonomous Learning Library) and how it compares to SB3?

araffin (Member) commented Jan 18, 2022

I updated the table to include TRPO and the experimental PPO LSTM in contrib.
Rainbow DQN is also planned in #622.
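
Usage-wise, the contrib algorithms follow the same API as the main package; a minimal sketch (env id and timestep budgets are arbitrary):

```python
from sb3_contrib import RecurrentPPO, TRPO

# Recurrent (LSTM) policy from sb3-contrib
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1").learn(total_timesteps=10_000)

# TRPO from sb3-contrib, same interface as the core SB3 algorithms
model = TRPO("MlpPolicy", "CartPole-v1").learn(total_timesteps=10_000)
```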

araffin (Member) commented May 30, 2022

Closing, as the table was updated (PPO LSTM is now merged in contrib master) and the original question was answered.

araffin closed this as completed on May 30, 2022