Weights for speech recognition are not restored when restarting training: loss climbs back to the first-epoch value (316) instead of continuing from the reduced loss #89
Comments
Can you upload your kurfile?
Text form of the file speech.yml: speech.txt
Without seeing your loss plot it's hard to tell (you can generate one from your log directory; check the tutorial on kur.deepgram.com for that). I am betting you are running into confusion caused by sortagrad.

Sortagrad is a curriculum-learning method, enabled in this kurfile, which starts training on short audio files and ramps up through the epoch until the longest audio files at the end (sorted in order). Loss is a function of how many errors you make, and with longer audio files you tend to make more errors, so loss tends to go up with longer audio files. This means your first epoch will start out with low loss and ramp up over time. It may keep increasing until the very end of your first epoch, or (if you have enough data) it might roll over and start declining until it hits the end of the epoch. Your second epoch will then start training with randomly shuffled audio files (as is typical in normal training).

But if you stop and restart, sortagrad will run for the first epoch coming back up, no matter what, even if you already completed a full epoch (or more) beforehand. To stop sortagrad from kicking in, just comment out the line in the kurfile with sortagrad in it. I'm still not 100% sure that's where your problem lies, but let me know if this helps (and even better, upload a loss plot!).
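For reference, here is a minimal plotting sketch along the lines of the kur.deepgram.com tutorial. The log directory name and the column names below are assumptions taken from that tutorial; substitute whatever your kurfile's `log:` entry and your Kur version actually produce.

```python
# Sketch: plot training/validation loss from a Kur binary log directory.
# Assumes the kurfile configures a binary logger writing to "log" and that
# the column names match the Kur tutorial; adjust both to your setup.
import matplotlib.pyplot as plt
from kur.loggers import BinaryLogger

LOG_DIR = 'log'  # whatever your kurfile's `log:` entry points to

training_loss = BinaryLogger.load_column(LOG_DIR, 'training_loss_total')
validation_loss = BinaryLogger.load_column(LOG_DIR, 'validation_loss_total')

plt.xlabel('Epoch')
plt.ylabel('Loss')
t_line, = plt.plot(range(1, len(training_loss) + 1), training_loss,
                   'co-', label='Training loss')
v_line, = plt.plot(range(1, len(validation_loss) + 1), validation_loss,
                   'mo-', label='Validation loss')
plt.legend(handles=[t_line, v_line])
plt.savefig('loss.png')
```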
I am training a speech recognition model (speech.yml). After the training was interrupted for some reason, I restarted it. Training continues from the next epoch, but the loss comes out the same as the first-epoch loss, i.e. 316, even though I had already trained the model down to a loss of 37. Why does the loss start again at 316 instead of continuing from 37?
I have checked the weights folder: it shows a size of 0 KB for each file, but the size on disk is nearly 75 MB.
Please suggest what I should do to resume training from the same loss, i.e. how to restore the weight files.
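One quick way to tell whether those weight files really are empty or are just being displayed as 0 KB is to sum their byte sizes on disk. The `weights` path below is a guess at the directory your kurfile saves to; point it at your actual weights folder.

```python
# Sketch: report per-file and total size of a weights directory, to check
# whether the files actually contain data or are truly zero bytes.
# "weights" is an assumed path; change it to match your kurfile.
import os

WEIGHTS_DIR = 'weights'

total = 0
for root, _, files in os.walk(WEIGHTS_DIR):
    for name in files:
        path = os.path.join(root, name)
        size = os.path.getsize(path)
        total += size
        print(f'{size:>12,d} bytes  {path}')
print(f'{total:>12,d} bytes total')
```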