
Questions about the exploration #2

@DongChen06

Description


Thanks for the wonderful repo and paper! I am wondering if the authors can help me with a few questions. Thanks for your time and insights!

  • The `self.training` flag is always `True` at `if self.training:`, even in the
    testing setting, since `training` is set to `True` at `training = True`. I am
    not sure why it remains `True` even during testing.
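
    A common pattern for this (e.g. in pfrl-style agents) is to leave `training = True` by default and flip it only inside an evaluation context manager; a minimal sketch of that pattern, with illustrative names rather than the repo's actual API:

    ```python
    from contextlib import contextmanager

    class Agent:
        """Minimal agent whose action selection depends on a `training` flag."""

        def __init__(self):
            self.training = True  # defaults to True, as observed in the issue

        @contextmanager
        def eval_mode(self):
            # Temporarily switch to evaluation (greedy) behavior, then restore.
            saved = self.training
            self.training = False
            try:
                yield self
            finally:
                self.training = saved

    agent = Agent()
    with agent.eval_mode():
        assert agent.training is False  # greedy actions during evaluation
    assert agent.training is True       # exploration restored afterwards
    ```

    If the evaluation script never enters such a context, the flag would indeed stay `True` throughout testing.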

  • The second question is related to the first. In the
    `def batch_act(self, batch_obs: Sequence[Any]) -> Sequence[Any]:` function,
    if we set `self.training` to `False`, then `batch_action = batch_argmax`,
    and the algorithm no longer performs well at all.
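
    To illustrate what that branch does, here is my own simplified sketch of an epsilon-greedy `batch_act` (not the repo's code): with `training=False` it degenerates to pure argmax, so all exploration disappears.

    ```python
    import numpy as np

    def batch_act(batch_qvals, training, epsilon=0.3, rng=np.random.default_rng(0)):
        """Pick one action per observation: epsilon-greedy in training, argmax otherwise."""
        batch_argmax = [int(np.argmax(q)) for q in batch_qvals]
        if training:
            # With probability epsilon, replace the greedy action by a random one.
            return [
                int(rng.integers(len(q))) if rng.random() < epsilon else a
                for q, a in zip(batch_qvals, batch_argmax)
            ]
        return batch_argmax  # greedy evaluation: no exploration at all

    qvals = [np.array([0.1, 0.9, 0.2]), np.array([0.5, 0.4, 0.3])]
    print(batch_act(qvals, training=False))  # -> [1, 0]
    ```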

  • In `batch_action = [`, the actions are selected by `epsilon_greedy.py`, but
    if we print epsilon inside
    `def select_action_epsilon_greedily(epsilon, random_action_func, greedy_action_func):`,
    it is always 0.3 and never changes during training. Could you share any
    ideas about this? (In standard DQN, epsilon decays from a large value to a
    small one.)
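
    For comparison, standard DQN implementations anneal the exploration rate instead of holding it at a constant such as 0.3; a small sketch of a linear schedule (the start/end values and horizon below are just illustrative):

    ```python
    def linear_decay_epsilon(step, start=1.0, end=0.05, decay_steps=10_000):
        """Linearly anneal epsilon from `start` down to `end` over `decay_steps`
        steps, then hold it at `end`."""
        if step >= decay_steps:
            return end
        return start + (end - start) * step / decay_steps

    print(linear_decay_epsilon(0))       # 1.0 at the first step
    print(linear_decay_epsilon(5_000))   # 0.525 halfway through the decay
    print(linear_decay_epsilon(50_000))  # 0.05 after the decay horizon
    ```

    pfrl, for instance, provides a `LinearDecayEpsilonGreedy` explorer alongside the constant-epsilon one, so a fixed 0.3 suggests the constant variant is being used.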

  • The paper says the action space is [change to left; go straight; change to
    right], i.e. 3 actions, while in the code the action space is defined with
    only 2 values per agent at
    `act_space = Box(low=0, high=1, shape=(N,), dtype=np.int32)`.
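
    For reference, a per-agent choice among 3 discrete actions would usually be expressed as `Discrete(3)` (or `MultiDiscrete([3] * N)` for N agents), whereas an integer `Box` with `low=0, high=1` only covers the values 0 and 1. A plain-numpy sketch of sampling from a 3-action joint space (the helper name here is mine, not the repo's):

    ```python
    import numpy as np

    # The paper's per-vehicle action set: 3 discrete options.
    ACTIONS = ["change to left", "go straight", "change to right"]

    def sample_joint_action(n_agents, rng=np.random.default_rng(0)):
        """Sample one of the 3 actions per agent (MultiDiscrete-style space)."""
        # `high` is exclusive, so values land in {0, 1, 2}.
        return rng.integers(low=0, high=len(ACTIONS), size=n_agents)

    joint = sample_joint_action(4)
    assert joint.shape == (4,)
    assert all(0 <= a < 3 for a in joint)
    ```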
