His dataset has a big problem #22
Comments
How did you solve this problem #19? I am stuck on it.
I gave up on solving this problem and have been using minibatches for generation.
Hello, I saw your reply in #19. I'm guessing you are also Chinese, so I'll just reply to you directly in Chinese here.
Hello? Did you solve this problem?
Sorry, I didn't see this reply earlier. I can't see your image, but for the Linux hang issue my approach is to generate the pkl files in small batches, because I couldn't solve the hang caused by overly large files either.
My overall approach is roughly the same as the other person's: limit each batch to about 200-400 samples to avoid the hang. That is what the Linux virtual machine can handle; anything more and it freezes. I'm not sure whether this helps you.
I see the other person solved it by modifying the select function. Here is my reply: https://github.com/epicosy/devign/issues/19#issuecomment-2106168341; please take a look and tell me whether my understanding is correct.
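For illustration, here is a minimal sketch of the small-batch pkl generation approach described above. This is an assumption of what the workaround could look like, not the repo's actual code: `CHUNK_SIZE`, `dataset.csv`, the `cpg_pkl` directory, and `process_sample` are all hypothetical placeholders.

```python
import pickle
from pathlib import Path

import pandas as pd

CHUNK_SIZE = 300             # stay inside the 200-400 range the VM can handle
INPUT_CSV = "dataset.csv"    # hypothetical path to the raw dataset
OUTPUT_DIR = Path("cpg_pkl") # hypothetical output directory
OUTPUT_DIR.mkdir(exist_ok=True)

def process_sample(row):
    """Placeholder for the per-function processing step."""
    return {"target": row["target"], "func": row["func"]}

df = pd.read_csv(INPUT_CSV)

# Write one small pkl file per chunk instead of a single huge file,
# so the generation step never holds too many samples in memory at once.
for start in range(0, len(df), CHUNK_SIZE):
    chunk = df.iloc[start:start + CHUNK_SIZE]
    processed = [process_sample(row) for _, row in chunk.iterrows()]
    with open(OUTPUT_DIR / f"chunk_{start // CHUNK_SIZE}.pkl", "wb") as f:
        pickle.dump(processed, f)
```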
Hello! Am I correct in assuming that by a bn layer you mean applying 1-d BatchNorm before the final linear head? I encountered the same problem (loss stuck at 0.68-0.69), and applying 1-d BatchNorm seems to solve it. Could you please explain why the sigmoid function causes this? Is it because it disturbs gradient flow?
Because the output of the sigmoid function stays too close to 0.5. The problem can be solved either by adding a bn layer or by expanding the variance of the inputs, but the bn layer is the simpler fix.
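For reference, a minimal sketch of what placing 1-d BatchNorm before the final linear head might look like in PyTorch. The module name, `hidden_dim`, and the surrounding architecture are assumptions for illustration, not the repo's exact model:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Binary classification head with BatchNorm before the final linear layer."""

    def __init__(self, hidden_dim: int = 200):  # hidden_dim is a hypothetical size
        super().__init__()
        # BatchNorm1d re-centers and re-scales the features, so the sigmoid
        # input is no longer squashed into a tiny range around 0, which is
        # what keeps the BCE loss pinned near 0.69 (= ln 2).
        self.bn = nn.BatchNorm1d(hidden_dim)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(self.bn(x)))

# quick check with random features
head = ClassificationHead()
out = head(torch.randn(8, 200))
print(out.shape)  # torch.Size([8, 1])
```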
Thank you for the answer, it helped me a lot. I'm going to create a pull request for this issue, since it seems like this BatchNorm layer should be included by default. If you can look at it later I would be grateful.
His model, if the dataset is replaced, can achieve normal binary classification performance. First, on his dataset the loss gets stuck at 0.69 because of the sigmoid function; adding a bn layer solves that. Second, once you fix the loss-stuck-at-0.69 problem, you will find that the model performs well on the training set but very poorly on the test and validation sets. I solved that by replacing the dataset.
So I think the authors may have deliberately released an erroneous dataset that prevents us from reproducing the results.