
WorldWideWave - AI Voice Clone & Dubbing Pipeline


An end-to-end Python script that transcribes audio, translates the transcript into a target language, and re-synthesizes the speech in that language while preserving the original speaker's voice.


Features

  • Speech-to-Text: Utilizes OpenAI's Whisper for highly accurate transcription with word-level timestamps.
  • Translation: Translates the transcribed text into a wide variety of target languages using deep-translator.
  • Voice Cloning: Employs Coqui-AI's powerful XTTS-v2 model for zero-shot voice cloning. It analyzes a short sample of the original speaker's voice and uses it to generate new audio.
  • Text-to-Speech: Synthesizes the translated text in the target language using the cloned voice.
  • Duration Matching: Intelligently time-stretches the dubbed audio segments using Rubber Band to ensure they match the duration of the original speech, maintaining lip-sync compatibility.
  • End-to-End Pipeline: A single script integrates all components, providing a seamless workflow from input audio to final dubbed output.
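The duration-matching step above boils down to choosing a time-stretch rate for Rubber Band. A minimal sketch of that calculation (the helper name is illustrative, not from the repository; `pyrubberband.time_stretch` shortens audio when the rate is greater than 1):

```python
def stretch_rate(dubbed_seconds: float, original_seconds: float) -> float:
    """Rate to pass to pyrubberband.time_stretch so the dubbed
    segment ends up the same length as the original segment.
    rate > 1 shortens the audio, rate < 1 lengthens it."""
    if original_seconds <= 0:
        raise ValueError("original segment must have positive duration")
    return dubbed_seconds / original_seconds

# A 3.6 s synthesized segment must fit a 3.0 s original slot,
# so it needs to be played back 1.2x faster:
rate = stretch_rate(3.6, 3.0)
# The pipeline would then apply something like:
#   stretched = pyrubberband.time_stretch(audio, sample_rate, rate)
```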

Requirements

cupy-cuda12x==13.5.1
deep-translator==1.11.4
jupyter==1.1.1
meson==1.8.3
numpy==2.2.6
notebook==7.4.5
openai-whisper @ git+https://github.com/openai/whisper.git@c0d2f624c09dc18e709e37c2ad90c039a4eb72a2
pydub==0.25.1
pyrubberband==0.4.0
torch==2.7.1+cu128
torchaudio==2.7.1
torchvision==0.22.1+cu128
transformers==4.39.3
TTS==0.22.0
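The `+cu128` builds of torch and torchvision are not published on PyPI. Assuming these pins live in a `requirements.txt`, installation likely needs PyTorch's CUDA 12.8 wheel index (the index URL below is PyTorch's standard one, not something confirmed by this repository):

```shell
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu128
```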

System-Level Dependencies


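This section appears to have lost its body. Judging from the Python dependencies above, the pipeline needs `ffmpeg` (used by openai-whisper and pydub for audio decoding) and the Rubber Band command-line tool (required by pyrubberband). On Debian/Ubuntu that would look like the following (package names assumed for apt; adjust for your distribution):

```shell
sudo apt-get update
sudo apt-get install -y ffmpeg rubberband-cli
```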
Tips for Best Results

  1. High-Quality Reference Audio: For the best voice cloning results, point INPUT_AUDIO_PATH at a clean, 15-30 second recording that contains only the target speaker's voice, with no background noise or music.
  2. Tweak TTS Speed: You can adjust the speed parameter within the tts.tts_to_file() function call to make the synthesized voice speak slightly faster or slower, which can sometimes result in a more natural cadence.
  3. GPU Acceleration: For any significant amount of audio, running this script on a machine with an NVIDIA GPU will be dramatically faster.
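Tip 2 can be automated: instead of hand-tuning `speed` by ear, derive it from the duration mismatch of a previous synthesis pass and clamp it to a range XTTS handles gracefully. A sketch under assumptions (the helper name and the 0.8-1.2 clamp range are illustrative, not values from this repository; the `speed` keyword of `tts.tts_to_file()` is the one Tip 2 refers to):

```python
def suggest_tts_speed(original_seconds: float,
                      synthesized_seconds: float,
                      lo: float = 0.8, hi: float = 1.2) -> float:
    """Suggest a `speed` value for tts.tts_to_file() so the next
    synthesis pass lands closer to the original segment's duration.
    speed > 1 makes the voice talk faster. Clamped to [lo, hi]
    because extreme values tend to sound unnatural."""
    raw = synthesized_seconds / original_seconds
    return max(lo, min(hi, raw))

# Synthesis came out 10% too long, so nudge speed up accordingly:
speed = suggest_tts_speed(3.0, 3.3)
# tts.tts_to_file(text=..., speaker_wav=..., language=...,
#                 file_path=..., speed=speed)
```

Re-synthesizing at the suggested speed reduces how aggressively the Rubber Band stretch has to work afterwards, which usually sounds better than time-stretching alone.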

*This README.md file has been improved for overall readability (grammar, sentence structure, and organization) using AI tools.

About

One script. One voice. Multiple languages.
