The YouTube Video Transcription CLI Tool is a Python script that transcribes YouTube videos and saves the transcriptions as formatted text files. It uses yt-dlp to download videos and the OpenAI Whisper library to transcribe them, and it includes text processing and formatting features.
The tool supports both single video transcription and batch processing of multiple videos from a list of URLs.
- Transcribe single or multiple YouTube videos using Whisper's speech recognition
- Download videos efficiently using yt-dlp
- Preserve natural speech segments with accurate timestamps
- Automatically format and capitalize text for improved readability
- Generate organized output with video title and timestamps
- Save transcriptions in a structured directory format
- Process multiple videos from a text file
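
The download-then-transcribe flow behind these features might look roughly like the sketch below. It is an illustration under stated assumptions, not the script's actual code: the function names, the `bestaudio/best` format choice, the `videos` default, and the `base` model size are all hypothetical. The third-party packages are imported lazily so the helpers can be read and loaded without them installed.

```python
import os


def build_ydl_options(video_dir):
    """Return yt-dlp options that save downloads under video_dir (assumed layout)."""
    return {
        "format": "bestaudio/best",  # assumption: the audio track is enough for transcription
        "outtmpl": os.path.join(video_dir, "%(title)s.%(ext)s"),
        "quiet": True,
    }


def download_video(url, video_dir="videos"):
    """Download one video and return (title, local_path)."""
    import yt_dlp  # third-party: pip install yt-dlp

    os.makedirs(video_dir, exist_ok=True)
    with yt_dlp.YoutubeDL(build_ydl_options(video_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return info["title"], ydl.prepare_filename(info)


def transcribe_file(path, model_name="base"):
    """Run Whisper on a local media file and return its list of segments."""
    import whisper  # third-party: pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Each segment is a dict that includes "start", "end" (in seconds) and "text".
    return result["segments"]
```

Whisper relies on `ffmpeg` being available on the system, so a real run needs it installed alongside the Python packages.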
- Clone the repository:

      git clone https://github.com/your-username/youtube-transcription-tool.git

- Navigate to the project directory:

      cd youtube-transcription-tool

- Create a Python virtual environment:

      python -m venv .venv

- Activate the virtual environment:

      source .venv/bin/activate   # On Unix/macOS
      .venv\Scripts\activate      # On Windows

- Install the required dependencies:

      pip install -r requirements.txt

To transcribe a single YouTube video:

    python whisper_transcribe_videos.py -u "https://youtube.com/watch?v=..."
    # or
    python whisper_transcribe_videos.py --url "https://youtube.com/watch?v=..."

To transcribe multiple videos, create a text file with one YouTube URL per line, then run:

    python whisper_transcribe_videos.py -f urls.txt
    # or
    python whisper_transcribe_videos.py --file urls.txt

Example urls.txt format:

    https://www.youtube.com/watch?v=video1
    https://www.youtube.com/watch?v=video2
    https://www.youtube.com/watch?v=video3

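
The two invocation modes and the URL file format could be handled along these lines. This is a minimal sketch using `argparse`. The `-u/--url` and `-f/--file` flags come from the commands above, but making them mutually exclusive, skipping blank lines, and the helper names `parse_args`, `read_urls`, and `collect_urls` are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; exactly one of --url or --file must be given (assumed)."""
    parser = argparse.ArgumentParser(description="Transcribe YouTube videos with Whisper.")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-u", "--url", help="URL of a single YouTube video")
    group.add_argument("-f", "--file", help="text file with one YouTube URL per line")
    return parser.parse_args(argv)


def read_urls(path):
    """Read one URL per line, ignoring blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def collect_urls(args):
    """Return the list of URLs to process for either invocation mode."""
    return [args.url] if args.url else read_urls(args.file)
```

Normalizing both modes to a plain list of URLs means the rest of the program can loop over videos without caring how they were supplied.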
The script will:
- Create a `videos` directory for downloaded videos
- Create a `transcriptions` directory for output files
- Generate transcription files with the format: `{video_title}_{date}.txt`
- Include timestamps and properly formatted text in the transcription
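
One way to build an output path in the `{video_title}_{date}.txt` pattern is sketched below. The README does not say how titles are sanitized or how the date is formatted, so the underscore replacement, the ISO `YYYY-MM-DD` date, and the helper names are assumptions.

```python
import re
from datetime import date
from pathlib import Path


def safe_title(title):
    """Collapse whitespace and characters unsafe in file names into single underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned or "untitled"


def output_path(title, out_dir="transcriptions", today=None):
    """Build <out_dir>/<title>_<date>.txt without touching the file system."""
    today = today or date.today()  # assumption: the date is the day of transcription
    return Path(out_dir) / f"{safe_title(title)}_{today.isoformat()}.txt"
```

The caller would still need to create the directory, for example with `Path("transcriptions").mkdir(exist_ok=True)`, before writing the file.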
The transcription file will contain:

    Video Title
    [00:00:00.000] First segment of speech...
    [00:00:04.123] Next segment of speech...
    [00:00:08.456] And so on...

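
The layout above can be produced from a list of segments with a small formatter. The sketch below assumes each segment is a dict with a `start` time in seconds and a `text` string, which is the shape Whisper's `transcribe` result uses. The function names are hypothetical.

```python
def format_timestamp(seconds):
    """Convert seconds to a [HH:MM:SS.mmm] marker."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}]"


def render_transcript(title, segments):
    """Return the title line followed by one timestamped line per segment."""
    lines = [title]
    for seg in segments:
        lines.append(f"{format_timestamp(seg['start'])} {seg['text'].strip()}")
    return "\n".join(lines) + "\n"
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the fractional part.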
This project is licensed under the MIT License.
