Issue with Cloned Voice Quality – Need Assistance #364

Uniqbank · 2025-02-20T20:26:17Z

Hi everyone,

I'm running the model on a MacBook (Intel) using a set of sample WAV/MP3 files with sample rates of 16,000, 24,000, and 44,100 Hz, but the quality of the generated cloned voice is nowhere near the demos on your website.

While debugging, I came across the following warning messages, which might be causing the issue:

lib/python3.10/site-packages/huggingface_hub/file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Can anyone confirm if this warning is affecting the output quality? Also, any recommendations on how to resolve this and achieve better voice cloning results would be greatly appreciated.

Thanks!


dosubot · 2025-02-20T20:27:35Z

Hey @Uniqbank! 👋 I'm here to help with bugs, questions, and becoming a contributor. Let's squash those bugs together!

The BertForMaskedLM message you encountered comes from the Hugging Face transformers library and means that some weights in the bert-base-uncased checkpoint were not loaded. The skipped weights (cls.seq_relationship.bias and cls.seq_relationship.weight) belong to BERT's next-sentence-prediction head, which BertForMaskedLM does not include, so this is expected behavior and should not affect voice cloning quality. The resume_download FutureWarning is only a deprecation notice from huggingface_hub and is likewise unrelated to audio quality.
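If the deprecation notice is cluttering the logs while you debug, it can be filtered with Python's standard warnings module. This is only a minimal sketch: the helper name silence_resume_download_warning is made up for illustration, and hiding the notice changes nothing about the generated audio.

```python
import warnings


def silence_resume_download_warning() -> None:
    """Ignore only the huggingface_hub `resume_download` deprecation notice.

    Hypothetical helper for illustration; other FutureWarnings stay visible.
    """
    warnings.filterwarnings(
        "ignore",
        # The pattern is matched against the start of the warning text.
        message=r"`resume_download` is deprecated",
        category=FutureWarning,
    )


# Call once, before the code that downloads or loads the models.
silence_resume_download_warning()
```

The filter is intentionally narrow so that other, possibly relevant, warnings still show up.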

For improving the voice cloning quality, consider the following:

Audio Quality: Ensure that the reference audio files are clean, sufficiently long, and contain speech from only one person. Avoid files with background noise or long blank sections ^[1].
Sample Rate: While the specific optimal sample rate for OpenVoice isn't documented, using a consistent sample rate across your files might help. You could try resampling all your audio files to a common rate, such as 16,000 Hz, which is commonly used in speech processing.
File Format: Ensure that the audio files are in a format supported by the model, such as WAV, which is typically preferred for high-quality audio processing.

These steps should help in achieving better results with the OpenVoice model.
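Before resampling anything, it can help to check what the reference clips actually contain. The sketch below uses only Python's standard wave module to report the sample rate, channel count, and duration of each WAV file, and flags mixed sample rates or very short clips. The function names and the 3-second threshold are assumptions made for illustration, not OpenVoice requirements. MP3 files are not handled and would need to be converted to WAV first.

```python
import wave
from collections import Counter


def describe_wav(path):
    """Return sample rate (Hz), channel count, and duration (s) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "path": str(path),
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def check_reference_clips(paths, min_duration_s=3.0):
    """Collect simple warnings about a set of reference clips.

    min_duration_s is an arbitrary placeholder threshold chosen for this
    sketch, not a documented OpenVoice requirement.
    """
    infos = [describe_wav(p) for p in paths]
    problems = []

    # Flag the set if the clips do not all share one sample rate.
    rates = Counter(info["sample_rate"] for info in infos)
    if len(rates) > 1:
        problems.append(f"mixed sample rates: {sorted(rates)}")

    # Flag individual clips that are shorter than the chosen threshold.
    for info in infos:
        if info["duration_s"] < min_duration_s:
            problems.append(f"{info['path']}: only {info['duration_s']:.1f} s long")

    return infos, problems
```

Calling check_reference_clips on a list of WAV paths returns the per-file details plus a list of human-readable problems. An empty problem list only means these basic checks passed. It says nothing about background noise or multiple speakers.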


blackjyn · 2025-02-24T07:33:30Z

I only get a 22,050 Hz sample rate, which is very low. It should be at least 44.1 kHz.

Help needed.

dosubot added the "bug" and "question" labels on Feb 20, 2025.
