Thanks for the wonderful repo and paper! I am wondering if the authors can help me with a few questions. Thanks for your time and insights!
- The `self.training` flag is always `True` at `TorchGRL/pfrl/pfrl/agents/dqn.py`, line 534 in 830e41e (`if self.training:`), even in the testing setting, since `training` is set to `True` at line 12 in 830e41e (`training = True`). I am not sure why it is always `True` even during testing.
- The second question is related to the above. In the `batch_act` function (`TorchGRL/pfrl/pfrl/agents/dqn.py`, line 527 in 830e41e: `def batch_act(self, batch_obs: Sequence[Any]) -> Sequence[Any]:`), if we set `self.training` to `False`, then `batch_action = batch_argmax`, and the algorithm does not work well at all.
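To make this question concrete, here is a minimal toy sketch of the train/eval toggle I would expect (my own illustration, not the repo's code; if I understand correctly, pfrl itself exposes an `eval_mode()` context manager for this purpose):

```python
from contextlib import contextmanager
import random

class ToyAgent:
    """Toy sketch of a train/eval flag (hypothetical simplification,
    not the actual TorchGRL/pfrl implementation)."""

    def __init__(self, epsilon=0.3):
        self.training = True  # analogous to `training = True` in dqn.py
        self.epsilon = epsilon

    @contextmanager
    def eval_mode(self):
        # Temporarily disable exploration, then restore the old flag.
        saved = self.training
        self.training = False
        try:
            yield
        finally:
            self.training = saved

    def act(self, greedy_action, random_action):
        # Explore only while training; act greedily during evaluation.
        if self.training and random.random() < self.epsilon:
            return random_action
        return greedy_action

agent = ToyAgent()
with agent.eval_mode():
    assert agent.training is False                       # greedy evaluation
    action = agent.act(greedy_action=1, random_action=0)  # always returns 1
assert agent.training is True                            # flag restored
```

With a toggle like this, evaluation should be purely greedy; my confusion is that the repo seems to stay in the exploring branch even at test time.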
- In `TorchGRL/pfrl/pfrl/agents/dqn.py`, line 538 in 830e41e (`batch_action = [`), the actions are selected via `epsilon_greedy.py` (`def select_action_epsilon_greedily(epsilon, random_action_func, greedy_action_func):`), but if we print the epsilon there, it is always 0.3 and never changes during training. Could you please give any ideas about this? (In standard DQN, epsilon would decay from a large value to a small one.)
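For reference, this is the kind of schedule I expected, a standard linear epsilon annealing (the start/end/step values below are purely illustrative):

```python
def linear_epsilon(step, start=1.0, end=0.1, decay_steps=10_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps`
    environment steps, then hold it at `end` (a common DQN schedule)."""
    if step >= decay_steps:
        return end
    return start + (end - start) * step / decay_steps

# Epsilon shrinks as training progresses:
print(linear_epsilon(0))       # 1.0  (mostly random early on)
print(linear_epsilon(5_000))   # 0.55
print(linear_epsilon(10_000))  # 0.1  (mostly greedy later)
```

If I read the pfrl codebase correctly, it ships a `LinearDecayEpsilonGreedy` explorer that implements exactly this, so I am wondering why a constant-epsilon explorer is used here instead.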
- The paper says the action space is [change to left; go straight; change to right], while in the code the action space is defined with only two values (0 or 1) per dimension at `act_space = Box(low=0, high=1, shape=(N,), dtype=np.int32)`.
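To spell out my confusion: a three-way choice like the paper describes would usually be a single discrete index with a mapping such as the one below (the names and mapping are my own illustration), whereas the `Box(low=0, high=1, shape=(N,))` space yields `N` binary entries:

```python
import numpy as np

# Hypothetical mapping for the paper's three lane-change actions.
ACTIONS = {0: "change to left", 1: "go straight", 2: "change to right"}

def decode(action_index):
    """Map a discrete action index to its lane-change meaning."""
    return ACTIONS[int(action_index)]

print(decode(1))  # go straight

# A Box(low=0, high=1, shape=(N,), dtype=np.int32) space instead gives
# N independent 0/1 entries, not one 3-way choice:
sample = np.random.randint(0, 2, size=(4,))  # e.g. N = 4
assert set(np.unique(sample)).issubset({0, 1})
```

Could you clarify how the binary `Box` entries correspond to the three actions described in the paper?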