Issue with Cloned Voice Quality – Need Assistance #364

Open
Uniqbank opened this issue Feb 20, 2025 · 2 comments
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@Uniqbank

Hi everyone,

I'm running the model on an Intel MacBook with a set of sample WAV/MP3 files at sample rates of 16,000, 24,000, and 44,100 Hz, but the generated cloned voice is nowhere near the quality of the demos on your website.

While debugging, I came across the following warnings, which might be causing the issue:

lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Can anyone confirm if this warning is affecting the output quality? Also, any recommendations on how to resolve this and achieve better voice cloning results would be greatly appreciated.

Thanks!

dosubot bot commented Feb 20, 2025

Hey @Uniqbank! 👋 I'm here to help with bugs, questions, and becoming a contributor. Let's squash those bugs together!

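The output you posted contains two separate messages, and neither is likely to affect voice cloning quality. The FutureWarning from huggingface_hub is only a deprecation notice for the `resume_download` argument. The BertForMaskedLM message comes from the Hugging Face Transformers library and says that two weights in the bert-base-uncased checkpoint ('cls.seq_relationship.bias' and 'cls.seq_relationship.weight') were not used. Those weights belong to the next-sentence-prediction head, which BertForMaskedLM does not have, so this is expected behavior when a checkpoint is loaded into a different architecture. It would only matter if the model relied on those specific weights, which is unlikely in this context.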
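If the deprecation notice is only cluttering the logs, it can be filtered with Python's standard `warnings` module. The sketch below is a minimal example, not something taken from the OpenVoice code: the helper name `silence_hub_future_warnings` is made up for illustration, and it assumes the notice is raised from a module whose name starts with `huggingface_hub.`, as the `file_download.py` path in the log suggests.

```python
import warnings


def silence_hub_future_warnings():
    """Ignore FutureWarning notices raised from huggingface_hub modules.

    Hypothetical helper for illustration. The `module` argument is a regex
    matched against the start of the name of the module that raised the
    warning (assumed here to be e.g. "huggingface_hub.file_download").
    """
    warnings.filterwarnings(
        "ignore",
        category=FutureWarning,
        module=r"huggingface_hub\.",
    )


# Call this once, before the code that triggers the download runs.
silence_hub_future_warnings()
```

This only hides the message and does not change what the model produces. The "Some weights of the model checkpoint ... were not used" text is a log message from Transformers rather than a Python warning, so this filter will probably not touch it; `transformers.logging.set_verbosity_error()` is the usual way to quiet that one.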

For improving the voice cloning quality, consider the following:

  1. Audio Quality: Ensure that the reference audio files are clean, sufficiently long, and contain speech from only one person. Avoid files with background noise or long blank sections [1].

  2. Sample Rate: While the specific optimal sample rate for OpenVoice isn't documented, using a consistent sample rate across your files might help. You could try resampling all your audio files to a common rate, such as 16,000 Hz, which is commonly used in speech processing.

  3. File Format: Ensure that the audio files are in a format supported by the model, such as WAV, which is typically preferred for high-quality audio processing.

These steps should help in achieving better results with the OpenVoice model.
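To check the points above before resampling anything, it can help to list what the reference files actually contain. The following is a rough sketch using only Python's standard `wave` module, so it reads uncompressed PCM WAV files only (MP3 files would need converting first). The function names and the `reference_audio` folder are placeholders, not part of OpenVoice.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def group_by_sample_rate(paths):
    """Map each sample rate to the file names that use it.

    More than one key in the result means the files have mixed rates.
    """
    by_rate = {}
    for path in paths:
        rate, _, _ = describe_wav(path)
        by_rate.setdefault(rate, []).append(Path(path).name)
    return by_rate


if __name__ == "__main__":
    # "reference_audio" is a hypothetical folder name; adjust as needed.
    folder = Path("reference_audio")
    groups = group_by_sample_rate(folder.glob("*.wav"))
    for rate, names in sorted(groups.items()):
        print(f"{rate} Hz: {', '.join(sorted(names))}")
```

The same `describe_wav` call can be pointed at a generated output file to confirm the sample rate it was written with.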

@dosubot dosubot bot added bug Something isn't working question Further information is requested labels Feb 20, 2025
@blackjyn

I only get a 22,050 Hz sample rate in the output, which is very low. It should be at least 44.1 kHz.

Help needed.
