
Questions about the exploration #2

@DongChen06

Description


Thanks for the wonderful repo and paper! I am wondering if the authors can help me with a few questions. Thanks for your time and insights!

  • The `self.training` flag is always `True` at `if self.training:`, even in the
    testing setting, since `training` is set to `True` at `training = True`. I am
    not sure why it remains `True` even during testing.
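
    A common pattern for this (e.g. in pfrl-style agents) is to leave `training = True` by default and flip it only inside an evaluation context manager; a minimal sketch of that pattern, with illustrative names rather than the repo's actual API:

    ```python
    from contextlib import contextmanager

    class Agent:
        """Minimal agent whose action selection depends on a `training` flag."""

        def __init__(self):
            self.training = True  # defaults to True, as observed in the issue

        @contextmanager
        def eval_mode(self):
            # Temporarily switch to evaluation (greedy) behavior, then restore.
            saved = self.training
            self.training = False
            try:
                yield self
            finally:
                self.training = saved

    agent = Agent()
    with agent.eval_mode():
        assert agent.training is False  # greedy actions during evaluation
    assert agent.training is True       # exploration restored afterwards
    ```

    If the evaluation script never enters such a context, the flag would indeed stay `True` throughout testing.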

  • The second question is related to the first. In the
    `def batch_act(self, batch_obs: Sequence[Any]) -> Sequence[Any]:` function,
    if we set `self.training` to `False`, then `batch_action = batch_argmax`,
    and the algorithm no longer performs well at all.
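
    To illustrate what that branch does, here is my own simplified sketch of an epsilon-greedy `batch_act` (not the repo's code): with `training=False` it degenerates to pure argmax, so all exploration disappears.

    ```python
    import numpy as np

    def batch_act(batch_qvals, training, epsilon=0.3, rng=np.random.default_rng(0)):
        """Pick one action per observation: epsilon-greedy in training, argmax otherwise."""
        batch_argmax = [int(np.argmax(q)) for q in batch_qvals]
        if training:
            # With probability epsilon, replace the greedy action by a random one.
            return [
                int(rng.integers(len(q))) if rng.random() < epsilon else a
                for q, a in zip(batch_qvals, batch_argmax)
            ]
        return batch_argmax  # greedy evaluation: no exploration at all

    qvals = [np.array([0.1, 0.9, 0.2]), np.array([0.5, 0.4, 0.3])]
    print(batch_act(qvals, training=False))  # -> [1, 0]
    ```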

  • In `batch_action = [`, the actions are selected by `epsilon_greedy.py`, but
    if we print epsilon inside
    `def select_action_epsilon_greedily(epsilon, random_action_func, greedy_action_func):`,
    it is always 0.3 and never changes during training. Could you share any
    ideas about this? (In standard DQN, epsilon decays from a large value to a
    small one.)
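
    For comparison, standard DQN implementations anneal the exploration rate instead of holding it at a constant such as 0.3; a small sketch of a linear schedule (the start/end values and horizon below are just illustrative):

    ```python
    def linear_decay_epsilon(step, start=1.0, end=0.05, decay_steps=10_000):
        """Linearly anneal epsilon from `start` down to `end` over `decay_steps`
        steps, then hold it at `end`."""
        if step >= decay_steps:
            return end
        return start + (end - start) * step / decay_steps

    print(linear_decay_epsilon(0))       # 1.0 at the first step
    print(linear_decay_epsilon(5_000))   # 0.525 halfway through the decay
    print(linear_decay_epsilon(50_000))  # 0.05 after the decay horizon
    ```

    pfrl, for instance, provides a `LinearDecayEpsilonGreedy` explorer alongside the constant-epsilon one, so a fixed 0.3 suggests the constant variant is being used.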

  • The paper says the action space is [change to left; go straight; change to
    right], i.e. 3 actions, while in the code the action space is defined with
    only 2 values per agent at
    `act_space = Box(low=0, high=1, shape=(N,), dtype=np.int32)`.
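
    For reference, a per-agent choice among 3 discrete actions would usually be expressed as `Discrete(3)` (or `MultiDiscrete([3] * N)` for N agents), whereas an integer `Box` with `low=0, high=1` only covers the values 0 and 1. A plain-numpy sketch of sampling from a 3-action joint space (the helper name here is mine, not the repo's):

    ```python
    import numpy as np

    # The paper's per-vehicle action set: 3 discrete options.
    ACTIONS = ["change to left", "go straight", "change to right"]

    def sample_joint_action(n_agents, rng=np.random.default_rng(0)):
        """Sample one of the 3 actions per agent (MultiDiscrete-style space)."""
        # `high` is exclusive, so values land in {0, 1, 2}.
        return rng.integers(low=0, high=len(ACTIONS), size=n_agents)

    joint = sample_joint_action(4)
    assert joint.shape == (4,)
    assert all(0 <= a < 3 for a in joint)
    ```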
