Remove dropout from decoder cell state #15
richardburleigh wants to merge 1 commit into NVIDIA:master
Fix FP16 stagnation at "OVERFLOW! Skipping step. Attempted loss scale.."
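For context, the message appears to come from dynamic loss scaling in mixed-precision training. The loss is multiplied by a scale factor before backprop so that small FP16 gradients do not underflow. If any gradient comes back as inf or NaN, the optimizer step is skipped and the scale is reduced. The sketch below uses a hypothetical `DynamicLossScaler` class (not the API of apex or any other library; the default values are assumptions) to show why training can "stagnate": if every iteration overflows, every step is skipped and the scale only shrinks until it reaches its floor.

```python
import math


class DynamicLossScaler:
    """Hypothetical minimal dynamic loss scaler (illustrative, not a real library API)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=1000, min_scale=1.0):
        self.scale = init_scale      # current multiplier applied to the loss
        self.factor = factor         # how much to shrink (on overflow) or grow the scale
        self.window = window         # overflow-free steps required before growing the scale
        self.min_scale = min_scale   # floor for the scale
        self.good_steps = 0

    @staticmethod
    def has_overflow(grads):
        # Any inf/NaN in the (scaled) gradients counts as an overflow.
        return any(math.isinf(g) or math.isnan(g) for g in grads)

    def update(self, grads):
        """Return True if the optimizer step should be applied, False to skip it."""
        if self.has_overflow(grads):
            # Skip this step and retry with a smaller scale.
            self.scale = max(self.scale / self.factor, self.min_scale)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps % self.window == 0:
            # A long run without overflow: try a larger scale again.
            self.scale *= self.factor
        return True


# Usage: if every iteration overflows, no step is ever applied.
scaler = DynamicLossScaler()
skipped = 0
for _ in range(20):
    if not scaler.update([float("inf"), 0.1]):
        skipped += 1
print(skipped, scaler.scale)  # 20 1.0
```

If the inf/NaN originates in the forward pass itself rather than from the scaled gradients, lowering the scale cannot help, which would be consistent with the stall this PR describes.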
Thank you! That helped me.
But why?
It helped me, too. After some investigation, here is my opinion: … but the cell states run directly along the entire chain of the RNN, to achieve the …
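The point above is that the cell state is carried from each decoder step to the next, so a dropout mask applied to it acts along the whole chain rather than once. The arithmetic sketch below is one hedged way to connect this to the overflow question raised in this thread; nothing in the conversation confirms it. With inverted dropout, a kept unit is divided by (1 - p), and when that happens to the cell state at every step, a unit that survives many consecutive masks is amplified geometrically. The function names, the dropout rate p = 0.1, the constant forget gate of 1.0 and the constant input term of 1.0 are all hypothetical simplifications.

```python
FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def cell_step(c, p, drop_cell, forget=1.0, update=1.0):
    """One worst-case cell-state update in which the dropout mask keeps the unit.

    Hypothetical simplification: the forget gate f and the input term i * g are constants.
    """
    c = forget * c + update   # LSTM cell update: c_t = f * c_{t-1} + i * g
    if drop_cell:
        c /= 1.0 - p          # inverted dropout rescales a kept unit by 1 / (1 - p)
    return c


def first_fp16_overflow(p, drop_cell, max_steps=10_000):
    """First step at which the worst-case cell state exceeds FP16_MAX, or None."""
    c = 0.0
    for t in range(1, max_steps + 1):
        c = cell_step(c, p, drop_cell)
        if c > FP16_MAX:
            return t
    return None


print(first_fp16_overflow(0.1, drop_cell=True))   # 84
print(first_fp16_overflow(0.1, drop_cell=False))  # None
```

Dropout preserves the expected value (E[mask / (1 - p)] = 1), so this is a tail effect: a forget gate below 1 damps it, and whether it actually explains the reported stall remains an open question.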
@mychiux413 But how would that lead to gradient overflow?
@mychiux413 But what about the quality of the FP32 model after changing the code as in this commit?