Polish model #6
Comments
Could you link an article or a list of steps on how to train the model for different languages? I assumed you got the dataset from here. What would be the next step if I'd like to train it for Czech/Slovak?
@phodina From my testing, training on Common Voice data didn't actually work that well for Internet content with people talking, because the speech is too high-quality and clear. The dataset consists entirely of read speech, since it is collected by having people read written sentences aloud, and this produces a somewhat different kind of speech from natural talking or conversation. I may write an article with some findings and instructions later, but for now: I trained the model using this recipe, with some modifications to use Common Voice instead of LibriSpeech, and then used this to export the checkpoint to a .april file.
Hi @abb128, thanks for the explanation. I'll look at the recipe you suggested!
Hello. The language I'm interested in is Greek, but a general workflow for any other language would be very helpful. [EDIT] Also: how can we be notified when new .april models are added, so we know to come back and check again?
What modifications did you make? Can you provide a patch file?
An initial Polish model has been trained on 160 hours of Mozilla Common Voice using an 80/10/10 train/test/dev speaker split, reaching 4.51% WER on unseen speakers. (Some of them may not actually be unseen, because Common Voice allows anonymous submissions and doesn't link them; I'm not certain.)
It is available here: https://april.sapples.net/april-polish-dev-2_pl.april
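For anyone wanting to reproduce this kind of evaluation for another language, here is a rough sketch of what a speaker-disjoint 80/10/10 split and a WER computation can look like. It is not the actual training or scoring code used for this model. The function names `split_by_speaker` and `word_error_rate` are made up for illustration, and the input is assumed to be a list of `(speaker_id, clip_path)` pairs (in Common Voice TSV files the speaker identifier would presumably come from the `client_id` column). The split is done by number of speakers, not by hours of audio.

```python
import random
from collections import defaultdict


def split_by_speaker(clips, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (speaker_id, clip_path) pairs into train/dev/test sets
    so that no speaker appears in more than one set.
    Hypothetical helper; ratios apply to speaker counts, not audio hours."""
    by_speaker = defaultdict(list)
    for speaker_id, clip_path in clips:
        by_speaker[speaker_id].append(clip_path)

    # Sort first so the shuffle is reproducible for a given seed.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_train = int(len(speakers) * ratios[0])
    n_dev = int(len(speakers) * ratios[1])
    groups = {
        "train": speakers[:n_train],
        "dev": speakers[n_train:n_train + n_dev],
        "test": speakers[n_train + n_dev:],  # whatever remains
    }
    return {
        name: [(s, c) for s in ids for c in by_speaker[s]]
        for name, ids in groups.items()
    }


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Standard Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Grouping by speaker before splitting is what makes the test set "unseen speakers". As noted above, it only holds to the extent that the speaker IDs are reliable, so anonymous submissions from the same person under different IDs could still leak across sets.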