Thank you for your feedback! I will investigate the detailed implementation later; however, I feel this is an unintentional bug. I will try to fix it as soon as possible.
-
I hope this is the right place for posting a request. I have been using cpprb in a project/library for multi-agent reinforcement learning that I have been developing, and it has been a great experience so far. Thank you for your work.
There is, however, a small issue that we have not been able to overcome. It is understandable given that rewards are usually scalar, but in multi-agent reinforcement learning you get one reward per agent (usually in the form of a numpy array).
I have been successful at using cpprb's replay buffer by just setting
env_dict["rew"]["shape"] = env.n_agents
which works pretty well. However, when it comes to an n-step replay buffer, it gives me an error:
ValueError: could not broadcast input array from shape (2,1) into shape (1,1)
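Roughly, the setup looks like this (the observation/action shapes and the n_agents value are placeholders for illustration; the Nstep options follow cpprb's documented n-step interface):

```python
import numpy as np
from cpprb import ReplayBuffer

n_agents = 2  # placeholder agent count
env_dict = {
    "obs": {"shape": 4},         # placeholder observation shape
    "act": {"shape": 1},         # placeholder action shape
    "rew": {"shape": n_agents},  # one reward per agent
    "next_obs": {"shape": 4},
    "done": {},
}

# Plain replay buffer: vector rewards work fine.
rb = ReplayBuffer(32, env_dict)
rb.add(obs=np.zeros(4), act=0, rew=np.array([0.5, 0.9]),
       next_obs=np.zeros(4), done=0)

# N-step replay buffer: the same add raises the ValueError above.
nstep_rb = ReplayBuffer(32, env_dict,
                        Nstep={"size": 3, "gamma": 0.99,
                               "rew": "rew", "next": "next_obs"})
nstep_rb.add(obs=np.zeros(4), act=0, rew=np.array([0.5, 0.9]),
             next_obs=np.zeros(4), done=0)
```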
This is quite expected, given that the implementation surely expects a scalar reward. But computing returns for a vector should be very straightforward: given e.g. a reward [0.5, 0.9] and a next reward [0.8, 1.0], the shapes are broadcastable, and the return can be computed as r1 + gamma*r2 + gamma**2 * r3 + ..., since numpy would just do the broadcasting.
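For example, a per-agent n-step return falls out of plain numpy broadcasting (the third step's values are made up to extend the example):

```python
import numpy as np

gamma = 0.99
# n_steps x n_agents rewards: the two rewards from the example above
# plus a made-up third step.
rewards = np.array([[0.5, 0.9],
                    [0.8, 1.0],
                    [0.3, 0.2]])

# Discount factors [1, gamma, gamma**2], broadcast across agents.
discounts = gamma ** np.arange(len(rewards))
nstep_return = (discounts[:, None] * rewards).sum(axis=0)
# Equivalent to r1 + gamma*r2 + gamma**2*r3, computed per agent.
```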
Thank you in advance for considering this suggestion, and hopefully it is an easy fix.