Thank you for your feedback! I will investigate the detailed implementation later; however, I feel this is an unintentional bug. I will try to fix it as soon as possible.
-
I hope this is the right place for posting a request. I have been using cpprb in a project/library for multi-agent reinforcement learning that I have been developing, and it has been a great experience so far. Thank you for your work.
There is, however, a small issue that we have not been able to overcome. It is understandable given that rewards are usually scalar, but in multi-agent reinforcement learning you get one reward per agent (usually in the form of a numpy array).
I have been successful at using cpprb's replay buffer by just setting
env_dict["rew"]["shape"] = env.n_agents
which works pretty well. However, when it comes to an n-step replay buffer, it gives me an error:
ValueError: could not broadcast input array from shape (2,1) into shape (1,1)
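Roughly, the setup looks like this (the observation/action shapes and the n_agents value are placeholders for illustration; the Nstep options follow cpprb's documented n-step interface):

```python
import numpy as np
from cpprb import ReplayBuffer

n_agents = 2  # placeholder agent count
env_dict = {
    "obs": {"shape": 4},         # placeholder observation shape
    "act": {"shape": 1},         # placeholder action shape
    "rew": {"shape": n_agents},  # one reward per agent
    "next_obs": {"shape": 4},
    "done": {},
}

# Plain replay buffer: vector rewards work fine.
rb = ReplayBuffer(32, env_dict)
rb.add(obs=np.zeros(4), act=0, rew=np.array([0.5, 0.9]),
       next_obs=np.zeros(4), done=0)

# N-step replay buffer: the same add raises the ValueError above.
nstep_rb = ReplayBuffer(32, env_dict,
                        Nstep={"size": 3, "gamma": 0.99,
                               "rew": "rew", "next": "next_obs"})
nstep_rb.add(obs=np.zeros(4), act=0, rew=np.array([0.5, 0.9]),
             next_obs=np.zeros(4), done=0)
```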
This is quite expected, given that the implementation surely expects a scalar reward. But computing returns for a vector should be very straightforward: given e.g. a reward [0.5, 0.9] and a next reward [0.8, 1.0], the shapes are broadcastable, and the return can be computed as r1 + gamma*r2 + gamma**2 * r3 + ..., since numpy would just do the broadcasting.
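For example, a per-agent n-step return falls out of plain numpy broadcasting (the third step's values are made up to extend the example):

```python
import numpy as np

gamma = 0.99
# n_steps x n_agents rewards: the two rewards from the example above
# plus a made-up third step.
rewards = np.array([[0.5, 0.9],
                    [0.8, 1.0],
                    [0.3, 0.2]])

# Discount factors [1, gamma, gamma**2], broadcast across agents.
discounts = gamma ** np.arange(len(rewards))
nstep_return = (discounts[:, None] * rewards).sum(axis=0)
# Equivalent to r1 + gamma*r2 + gamma**2*r3, computed per agent.
```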
Thank you in advance for considering this suggestion, and hopefully it is an easy fix.