An end-to-end Python script that automatically transcribes audio, translates it to a different language, and then re-synthesizes the audio in the target language while preserving the original speaker's voice.
- Speech-to-Text: Utilizes OpenAI's Whisper for highly accurate transcription with word-level timestamps.
- Translation: Translates the transcribed text into a wide variety of target languages using deep-translator.
- Voice Cloning: Employs Coqui-AI's powerful XTTS-v2 model for zero-shot voice cloning. It analyzes a short sample of the original speaker's voice and uses it to generate new audio.
- Text-to-Speech: Synthesizes the translated text in the target language using the cloned voice.
- Duration Matching: Time-stretches each dubbed audio segment with Rubber Band so that it matches the duration of the original speech, which keeps the dub aligned with the original timing for lip-sync.
- End-to-End Pipeline: A single script integrates all components, providing a seamless workflow from input audio to final dubbed output.
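The duration-matching step can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper names `stretch_rate` and `match_duration` and the clamping bounds (0.5x to 2.0x) are assumptions chosen for the example. It relies on `pyrubberband.time_stretch(y, sr, rate)`, where a rate above 1 shortens the audio.

```python
def stretch_rate(dubbed_s: float, original_s: float,
                 min_rate: float = 0.5, max_rate: float = 2.0) -> float:
    """Return the time-stretch rate that maps a dubbed segment onto the
    original segment's duration. A rate > 1 speeds the audio up.

    The clamping bounds are illustrative assumptions; extreme rates tend
    to sound unnatural, so the result is limited to [min_rate, max_rate].
    """
    if dubbed_s <= 0 or original_s <= 0:
        raise ValueError("durations must be positive")
    rate = dubbed_s / original_s
    return max(min_rate, min(max_rate, rate))


def match_duration(samples, sample_rate: int, original_s: float):
    """Stretch `samples` (a NumPy array, frames first) towards `original_s` seconds.

    Hypothetical helper: pyrubberband is imported lazily because it needs
    the rubberband command-line tool to be installed.
    """
    import pyrubberband as pyrb

    dubbed_s = len(samples) / sample_rate
    return pyrb.time_stretch(samples, sample_rate,
                             stretch_rate(dubbed_s, original_s))
```

For example, a 3-second dubbed segment that has to fit a 2-second original slot gets a rate of 1.5. Because of the clamp, a segment that would need a rate outside the bounds ends up only approximately matched.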
cupy-cuda12x==13.5.1
deep-translator==1.11.4
jupyter==1.1.1
meson==1.8.3
numpy==2.2.6
notebook==7.4.5
openai-whisper @ git+https://github.com/openai/whisper.git@c0d2f624c09dc18e709e37c2ad90c039a4eb72a2
pydub==0.25.1
pyrubberband==0.4.0
torch==2.7.1+cu128
torchaudio==2.7.1
torchvision==0.22.1+cu128
transformers==4.39.3
TTS==0.22.0
- Python 3.10.x
- FFmpeg: pydub and TTS require FFmpeg for audio file handling.
- Rubber Band CLI: The pyrubberband library requires the rubberband command-line tool.
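Before running the script, it can help to confirm that the prerequisites above are in place. The following is a small sketch using only the standard library. The function names are made up for this example, and it assumes the tools are installed under their usual executable names, `ffmpeg` and `rubberband`.

```python
import shutil
import sys

# Executable names are assumed to be the defaults used by most installers.
REQUIRED_TOOLS = ("ffmpeg", "rubberband")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_version_ok(version=sys.version_info):
    """Check that the interpreter is a Python 3.10.x release."""
    return tuple(version[:2]) == (3, 10)


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: this project targets Python 3.10.x")
    for tool in missing_tools():
        print(f"Missing prerequisite: {tool} was not found on PATH")
```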
Tips for Best Results
- High-Quality Reference Audio: For the best voice cloning results, use a high-quality INPUT_AUDIO_PATH file that is 15-30 seconds long and contains only the target speaker's voice with no background noise or music.
- Tweak TTS Speed: You can adjust the speed parameter within the tts.tts_to_file() function call to make the synthesized voice speak slightly faster or slower, which can sometimes result in a more natural cadence.
- GPU Acceleration: For anything beyond short clips, run the script on a machine with an NVIDIA GPU. Transcription and synthesis are much slower on CPU.
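The tips above can be combined into a few small helpers. This is a hedged sketch: `pick_device`, `reference_clip_ok`, and `synthesis_kwargs` are hypothetical names, and the default `speed` of 1.0 is an assumption. The keyword arguments mirror those commonly passed to `tts.tts_to_file()` for XTTS-v2.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def reference_clip_ok(duration_s: float,
                      min_s: float = 15.0, max_s: float = 30.0) -> bool:
    """Check that the reference clip falls in the recommended 15-30 s range."""
    return min_s <= duration_s <= max_s


def synthesis_kwargs(text: str, speaker_wav: str, language: str,
                     file_path: str, speed: float = 1.0) -> dict:
    """Bundle the keyword arguments for a tts.tts_to_file() call.

    `speed` slightly above or below 1.0 makes the voice faster or slower.
    """
    return {
        "text": text,
        "speaker_wav": speaker_wav,
        "language": language,
        "file_path": file_path,
        "speed": speed,
    }


# Usage sketch (requires the TTS package and downloads the model on first run):
# from TTS.api import TTS
# tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(pick_device())
# tts.tts_to_file(**synthesis_kwargs("Hola", "reference.wav", "es", "out.wav", speed=1.1))
```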
*This README.md file has been improved for overall readability (grammar, sentence structure, and organization) using AI tools.