I'm running the model on a MacBook (Intel) using a set of sample WAV/MP3 files with sample rates of 16,000, 24,000, and 44,100 Hz, but the quality of the generated cloned voice is nowhere near the demos on your website.
While debugging, I came across the following warning message, which might be causing the issue:
lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Can anyone confirm if this warning is affecting the output quality? Also, any recommendations on how to resolve this and achieve better voice cloning results would be greatly appreciated.
Thanks!
Hey @Uniqbank! 👋 I'm here to help with bugs, questions, and becoming a contributor. Let's squash those bugs together!
The `BertForMaskedLM` warning comes from the Hugging Face Transformers library. The two unused weights, `cls.seq_relationship.bias` and `cls.seq_relationship.weight`, belong to the next-sentence-prediction head stored in the `bert-base-uncased` checkpoint. `BertForMaskedLM` has no such head, so skipping those weights is expected and is unlikely to affect voice cloning quality. The `resume_download` FutureWarning is a deprecation notice from `huggingface_hub` and does not change the output either.
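If you want to sanity-check a warning like this yourself, here is a minimal sketch. It only looks at the key names printed in the warning; the helper name `only_nsp_head_unused` is made up for illustration and is not part of Transformers or OpenVoice.

```python
# Prefix used for the next-sentence-prediction (NSP) head in BERT
# pre-training checkpoints such as bert-base-uncased.
NSP_PREFIX = "cls.seq_relationship."


def only_nsp_head_unused(unused_keys):
    """Return True if every unused checkpoint key belongs to the NSP head.

    Hypothetical helper: it inspects key names only and loads no model.
    An empty list returns False because there is nothing to explain.
    """
    return bool(unused_keys) and all(k.startswith(NSP_PREFIX) for k in unused_keys)


# Keys copied from the warning quoted in this issue.
unused = ["cls.seq_relationship.bias", "cls.seq_relationship.weight"]
print(only_nsp_head_unused(unused))
```

If the list ever contained encoder or embedding keys instead, that would be worth a closer look.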
For improving the voice cloning quality, consider the following:
- Audio Quality: Ensure that the reference audio files are clean, sufficiently long, and contain speech from only one person. Avoid files with background noise or long blank sections [1].
- Sample Rate: While the specific optimal sample rate for OpenVoice isn't documented, using a consistent sample rate across your files might help. You could try resampling all your audio files to a common rate, such as 16,000 Hz, which is commonly used in speech processing.
- File Format: Ensure that the audio files are in a format supported by the model, such as WAV, which is typically preferred for high-quality audio processing.
These steps should help in achieving better results with the OpenVoice model.
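To try the resampling suggestion, here is a rough sketch that brings 16-bit PCM WAV files to one common rate using only the Python standard library. The function names are mine, and the 16,000 Hz target is just the example rate mentioned above, not a documented OpenVoice requirement. Linear interpolation is crude (no low-pass filter, so downsampling can alias); for real use a dedicated resampler such as `librosa.load(path, sr=16000)` is probably the better choice.

```python
import array
import sys
import wave


def read_wav_mono16(path):
    """Read a 16-bit PCM WAV file and return (sample_rate, mono_samples)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if channels > 1:
        # Downmix by averaging the interleaved channels of each frame.
        return rate, [
            sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)
        ]
    return rate, list(samples)


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(int(round(samples[lo] * (1 - frac) + samples[hi] * frac)))
    return out


def write_wav_mono16(path, rate, samples):
    """Write mono 16-bit PCM samples to a WAV file."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())
```

Usage would look like `rate, s = read_wav_mono16("ref.wav")` followed by `write_wav_mono16("ref_16k.wav", 16000, resample_linear(s, rate, 16000))`, where both file names are placeholders. MP3 inputs would need to be decoded to WAV first with a separate tool.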