Why use nonempty bins rather than all bins? #13

Open
DHPO opened this issue Mar 14, 2019 · 1 comment

Comments

DHPO commented Mar 14, 2019

Why do you divide the weights by the number of nonempty bins (n) rather than the number of all bins (self.bins)?


I think M is the number of all bins in the paper. Am I missing something?

Owner

libuyu commented Mar 14, 2019

@DHPO You are right. In the paper, we define M as the number of all bins. In the latest version of our code, we use the number of valid (non-empty) bins instead.

Suppose you have 100 bins and every example has the same gradient norm of 0.8 (which is impossible in practice, but makes the point). All examples then fall into a single bin, so under the original equation each one gets a harmonizing parameter of 1/100. With 10000 bins, the parameter becomes 1/10000. In both cases we would like the harmonizing parameter to be 1 for all examples: they should be treated equally and should not be down-weighted. More generally, the harmonizing parameters should not depend on the number of bins. That is why we think dividing by the number of valid bins is more reasonable.
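
The example above can be checked with a small sketch. This is only an illustration, not the repository's code: the helper `harmonizing_weights` and its `normalize_by` argument are hypothetical names, and it assumes gradient norms in [0, 1], equal-width bins, and a per-example weight of N / (count_in_bin * divisor), where the divisor is either the number of all bins or the number of non-empty bins.

```python
def harmonizing_weights(grad_norms, num_bins, normalize_by="nonempty"):
    """Hypothetical helper: weight_i = N / (count in example i's bin * divisor)."""
    total = len(grad_norms)
    # Assign each gradient norm in [0, 1] to one of num_bins equal-width bins.
    bin_ids = [min(int(g * num_bins), num_bins - 1) for g in grad_norms]
    # Count how many examples land in each bin (only non-empty bins appear here).
    counts = {}
    for b in bin_ids:
        counts[b] = counts.get(b, 0) + 1
    # Divide by all bins (the paper's M) or by the non-empty bins (n in the code).
    divisor = len(counts) if normalize_by == "nonempty" else num_bins
    return [total / (counts[b] * divisor) for b in bin_ids]


# Every example has the same gradient norm, so all of them share one bin.
grads = [0.8] * 1000

w_all_100 = harmonizing_weights(grads, 100, normalize_by="all")
w_all_10000 = harmonizing_weights(grads, 10000, normalize_by="all")
w_nonempty_100 = harmonizing_weights(grads, 100, normalize_by="nonempty")
w_nonempty_10000 = harmonizing_weights(grads, 10000, normalize_by="nonempty")

print(w_all_100[0])         # 0.01: shrinks as the bin count grows
print(w_all_10000[0])       # 0.0001
print(w_nonempty_100[0])    # 1.0: independent of the bin count
print(w_nonempty_10000[0])  # 1.0
```

Under these assumptions, normalizing by all bins scales every weight by 1/M even though no example is "easier" than another, while normalizing by non-empty bins leaves the uniform case at a weight of 1.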

Thank you for reading the code and paper so carefully.
