DQL still learning at evaluation time #115

Open
blumu opened this issue Oct 13, 2023 · 1 comment
Labels
bug Something isn't working

Comments

blumu commented Oct 13, 2023

Issue forked from #87 by @kvas7andy

learner.epsilon_greedy_search(...) is used to train agents with different algorithms, including DQL in dql_run. However, dql_exploit_run, which takes the network from dql_run as its policy agent and an eval_episode_count parameter for the number of episodes, gives the impression that its runs evaluate the trained DQN. The only difference between the two runs is that epsilon is set to 0. That puts the agent in exploitation mode but does not stop training: during a run of learner.epsilon_greedy_search, optimizer.step() is still executed on every step, through the call to learner.on_step(...) in agent_dql.py.
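To make the point concrete, here is a minimal sketch in plain Python. It is not the CyberBattleSim code: ToyLearner, toy_epsilon_greedy_search and the reward table are hypothetical stand-ins, and the scalar update in on_step only plays the role that optimizer.step() plays in agent_dql.py. It shows that setting epsilon to 0 removes exploration but leaves the per-step update in place.

```python
import random


class ToyLearner:
    """Hypothetical stand-in for a DQL learner: one scalar Q-value per action."""

    def __init__(self, n_actions, lr=0.1):
        self.q = [0.0] * n_actions
        self.lr = lr
        self.update_count = 0

    def exploit(self):
        # Greedy action: index of the largest Q-value (first one on ties).
        return max(range(len(self.q)), key=self.q.__getitem__)

    def explore(self, rng):
        return rng.randrange(len(self.q))

    def on_step(self, action, reward):
        # Analogue of optimizer.step(): runs on every step, whatever epsilon is.
        self.q[action] += self.lr * (reward - self.q[action])
        self.update_count += 1


def toy_epsilon_greedy_search(learner, rewards, steps, epsilon, seed=0):
    """Hypothetical loop: epsilon only selects explore vs. exploit."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = learner.explore(rng)
        else:
            action = learner.exploit()
        learner.on_step(action, rewards[action])  # the update is unconditional


learner = ToyLearner(n_actions=3)
before = list(learner.q)
toy_epsilon_greedy_search(learner, rewards=[1.0, 0.0, 0.5], steps=20, epsilon=0.0)
print(learner.update_count)  # 20 updates despite epsilon == 0
print(learner.q != before)   # True: the "evaluation" run changed the Q-values
```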

The ToyCTF benchmark is inaccurate because, with a correct evaluation procedure, as with the chain network configuration, the agent does not reach the goal of 6 owned nodes after 200 training episodes.

blumu commented Oct 13, 2023

That's a valid point, though I think it's fair game to assume the agent can continue to learn and adjust its policy during evaluation, as long as its state (Q-function) is reset at the beginning of each episode to what it was after the learning phase (which might not currently be the case and may need to be fixed).
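A rough sketch of that per-episode reset, again with hypothetical names (ToyPolicy, evaluate) and a dictionary standing in for the Q-function. The assumption is that the learned state can be snapshotted once after training and copied back before each evaluation episode.

```python
import copy


class ToyPolicy:
    """Hypothetical policy whose Q-table keeps adapting while it acts."""

    def __init__(self):
        self.q = {"a": 0.0, "b": 0.0}

    def on_step(self, action, reward, lr=0.5):
        self.q[action] += lr * (reward - self.q[action])


def evaluate(policy, episodes, steps_per_episode):
    trained_state = copy.deepcopy(policy.q)      # snapshot taken after training
    end_of_episode_q = []
    for _ in range(episodes):
        policy.q = copy.deepcopy(trained_state)  # reset before every episode
        for _ in range(steps_per_episode):
            policy.on_step("a", reward=1.0)      # in-episode learning is allowed
        end_of_episode_q.append(policy.q["a"])
    policy.q = copy.deepcopy(trained_state)      # leave the policy as it was found
    return end_of_episode_q


policy = ToyPolicy()
results = evaluate(policy, episodes=3, steps_per_episode=2)
print(results)        # [0.75, 0.75, 0.75]: every episode starts from the same state
print(policy.q["a"])  # 0.0: nothing learned during evaluation persists
```

With a PyTorch network the same idea would presumably go through state_dict() and load_state_dict() on the model, and likely on the optimizer as well, but how that fits into the existing agent is not something this sketch covers.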

Also, if we really want to handicap the agent and prevent it from learning during an episode, then I suggest we add a freeze_learning: bool parameter to the DeepQLearnerPolicy agent. If set to true, the function update_q_function becomes a no-op.

def update_q_function(self,
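The freeze_learning idea could look roughly like the sketch below. ToyDeepQLearnerPolicy is a hypothetical stand-in, not the real DeepQLearnerPolicy, and the (action, reward) signature of update_q_function is an assumption made for illustration, since the real signature is cut off above.

```python
class ToyDeepQLearnerPolicy:
    """Hypothetical stand-in; not the real DeepQLearnerPolicy."""

    def __init__(self, freeze_learning: bool = False, lr: float = 0.1):
        self.freeze_learning = freeze_learning
        self.lr = lr
        self.q = [0.0, 0.0]

    def update_q_function(self, action: int, reward: float) -> None:
        # Assumed signature, for illustration only.
        if self.freeze_learning:
            return  # no-op: evaluation leaves the Q-function untouched
        self.q[action] += self.lr * (reward - self.q[action])


frozen = ToyDeepQLearnerPolicy(freeze_learning=True)
frozen.update_q_function(action=0, reward=1.0)

learning = ToyDeepQLearnerPolicy(freeze_learning=False)
learning.update_q_function(action=0, reward=1.0)

print(frozen.q)    # [0.0, 0.0]
print(learning.q)  # [0.1, 0.0]
```

Defaulting the flag to False keeps the current training behaviour, so only the evaluation run would need to opt in.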

@blumu blumu added the bug Something isn't working label Oct 13, 2023