
Negative rewards and rewarding new successful attacks on already-owned nodes #62

Open
blumu opened this issue May 25, 2022 · 1 comment
Labels
enhancement New feature or request

Comments


blumu commented May 25, 2022

  1. In here, the reward at each step is calculated as reward = max(0., reward). I see the penalty code in actions.py and I understand why you clip out the negative values. When I tried to remove this max() operation, the reward became highly negative and the agent learned nothing. However, the clipping makes the agent overfit because the resulting reward signal is very sparse. I think adding a small per-step time cost, such as -1 or -0.5, is necessary (see the sketch after this list).
  2. In here, when giving the reward for NEW_SUCCESSFULL_ATTACK_REWARD, the code does not take into consideration whether the attacked node is already owned. It is meaningless to attack a node the attacker already owns, and rewarding it makes the agent repeatedly launch attacks between owned nodes instead of discovering new ones.
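
For concreteness, here is a rough sketch of the two patches I have in mind. Apart from the NEW_SUCCESSFULL_ATTACK_REWARD constant name, the function names and numeric values below are placeholders for illustration and do not claim to match the actual CyberBattleSim source:

```python
# Illustrative sketch only: apart from NEW_SUCCESSFULL_ATTACK_REWARD, the
# names and values below are placeholders, not the real CyberBattleSim code.

STEP_TIME_PENALTY = -1.0             # small negative cost paid at every step
NEW_SUCCESSFULL_ATTACK_REWARD = 7.0  # placeholder value for the sketch


def shaped_step_reward(raw_reward: float) -> float:
    """Point 1: keep negative penalties and add a per-step time cost,
    instead of clipping with reward = max(0., reward)."""
    return raw_reward + STEP_TIME_PENALTY


def successful_attack_reward(target_node: str, owned_nodes: set) -> float:
    """Point 2: only grant NEW_SUCCESSFULL_ATTACK_REWARD the first time a
    node is compromised; re-attacking an already-owned node earns nothing."""
    if target_node in owned_nodes:
        return 0.0
    owned_nodes.add(target_node)
    return NEW_SUCCESSFULL_ATTACK_REWARD


# Example: the second attack on the same node yields no reward.
owned: set = set()
print(successful_attack_reward("node-b", owned))  # 7.0
print(successful_attack_reward("node-b", owned))  # 0.0
```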

In my experiment, I trained an agent with the original reward design in the chain env. The agent can perfectly take ownership of the network during training, but when I saved the model and evaluated it with epsilon-greedy, the success rate was only about 90%. When I patched the two points proposed above and trained an agent with the same parameters, the evaluation success rate was about 100%. I think the original reward design makes the agent overfit.

Could you please take a look at the two points and give some feedback? Anyway, thanks again for your code; it helps with my research, and I would even like to use it in my next research project about online learning :)

Originally posted by @sherdencooper in #46 (comment)

@blumu blumu added the enhancement New feature or request label May 25, 2022

blumu commented Jun 8, 2022

@sherdencooper These are great points. Regarding 1.: that's correct, clipping negative rewards was a recommendation from an RL researcher who advised the project, and it did help with learning, as you pointed out. That said, your proposed changes make sense and we'd be happy to integrate them if you were to submit a PR.
Question: you are seeing a clear improvement for epsilon-greedy with your changes. Do you also see improvements for the D-QL agent?
