A lightweight Python library for downloading videos and extracting frames at precise intervals. It handles direct URLs, supports sub-second extraction, and includes metadata analysis and an interactive player for debugging.
- Direct Download: Process videos directly from URLs without manual downloading.
- Precise Extraction: Support for decimal intervals (e.g., every 0.5 seconds).
- Smart Resizing: Resize frames on the fly to save storage.
- Metadata: Automatically extracts FPS, duration, and resolution data.
- Interactive Mode: Optional built-in player to preview or control the process.
- Robust: Includes retry logic, logging, and summary reports.
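To make the "Precise Extraction" feature concrete, here is a minimal sketch of how a decimal interval could map onto frame indices. It is not the library's actual implementation: the helper name `sample_frame_indices` is hypothetical, and treating the end of the range as exclusive is an assumption.

```python
from typing import List, Optional


def sample_frame_indices(
    fps: float,
    duration_seconds: float,
    interval: float,
    start_time: float = 0.0,
    end_time: Optional[float] = None,
) -> List[int]:
    """Return the frame index nearest to each sampling timestamp.

    Hypothetical helper for illustration only. The end of the range is
    treated as exclusive (an assumption).
    """
    if fps <= 0 or interval <= 0:
        raise ValueError("fps and interval must be positive")

    stop = duration_seconds if end_time is None else min(end_time, duration_seconds)
    total_frames = int(duration_seconds * fps)

    indices = []
    k = 0
    while True:
        # Multiply instead of accumulating to avoid floating-point drift.
        timestamp = start_time + k * interval
        if timestamp >= stop:
            break
        index = int(round(timestamp * fps))
        if index >= total_frames:
            break
        indices.append(index)
        k += 1
    return indices


# A 60 s clip at 30 fps sampled every 5 s yields 12 frame indices.
indices = sample_frame_indices(fps=30.0, duration_seconds=60.0, interval=5.0)
print(len(indices), indices[:3])
```

Computing each timestamp as `start_time + k * interval` keeps sub-second intervals such as 0.5 from drifting over long videos.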
pip install video-frame-extractor-cv

For development or building from source:
git clone https://github.com/chibuezedev/video-frame-extractor.git
cd video-frame-extractor
pip install -e .

The library ships with a video-extractor entry point for quick operations.
# basic: download and extract frames every 5s (default)
video-extractor "https://example.com/video.mp4"
# advanced: extract every 0.5s, resize to width 1280px, skip playback
video-extractor "https://example.com/video.mp4" -i 0.5 -w 1280 --no-play
# specific range: extract from 00:30 to 01:00
video-extractor "https://example.com/video.mp4" -s 30 -e 60

The recommended way to use the library is via the context manager, which handles cleanup automatically.
from video_frame_extractor import VideoFrameExtractor
# use context manager to handle resources automatically
with VideoFrameExtractor("https://example.com/video.mp4") as extractor:
    # this downloads the video and extracts metadata
    extractor.download_video()
    # get info before processing
    meta = extractor.get_video_metadata()
    print(f"processing {meta['duration_seconds']}s video...")
    # run extraction
    count = extractor.extract_frames()
    print(f"done. extracted {count} frames.")

You can customize the extractor's behavior via the constructor.
extractor = VideoFrameExtractor(
    video_url="https://example.com/video.mp4",
    output_folder="dataset/train",
    interval=2.5,      # extract frame every 2.5 seconds
    quality=90,        # jpeg quality (1-100)
    max_width=1280,    # downscale if width > 1280
    start_time=10,     # start at 10s mark
    end_time=60,       # stop at 60s mark
    log_level="DEBUG",
)
# run() wraps download, extraction, and reporting in one call
extractor.run(play_video=False, create_report=True)

The library organizes outputs into a clean directory structure:
output_folder/
├── frame_0000_time_0.0s.jpg # extracted frames
├── frame_0001_time_2.5s.jpg
├── video_metadata.json # resolution, fps, source info
├── extraction_report.txt # human-readable summary
└── extraction_log.txt # debug logs
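Because each frame's index and timestamp are encoded in its filename, downstream scripts can recover them without opening the metadata file. The sketch below assumes the naming pattern shown in the listing above. The helper `parse_frame_name` is hypothetical and not part of the library.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the example names, e.g. frame_0001_time_2.5s.jpg
FRAME_NAME = re.compile(r"^frame_(\d+)_time_(\d+(?:\.\d+)?)s\.jpg$")


def parse_frame_name(name: str) -> Optional[Tuple[int, float]]:
    """Return (frame_number, timestamp_seconds), or None if the name does not match."""
    match = FRAME_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


print(parse_frame_name("frame_0001_time_2.5s.jpg"))
print(parse_frame_name("video_metadata.json"))
```

Returning `None` for non-frame files makes it easy to filter a directory listing that also contains the report, log, and metadata files.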
| Flag | Short | Description | Default |
|---|---|---|---|
| --output | -o | Output directory | frames |
| --interval | -i | Time between frames (seconds) | 5 |
| --quality | -q | JPEG quality (1-100) | 95 |
| --width | -w | Max frame width (px) | None |
| --start | -s | Start timestamp (seconds) | 0 |
| --end | -e | End timestamp (seconds) | None |
| --no-play | | Disable interactive player | False |
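For readers who want to wrap the extractor in their own script, the flags in the table above can be mirrored with a standard argparse parser. This is an illustrative sketch built only from the table, not the library's real CLI code. The function name `build_parser` and the argument types are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="video-extractor")
    parser.add_argument("url", help="Direct URL of the video")
    parser.add_argument("-o", "--output", default="frames", help="Output directory")
    parser.add_argument("-i", "--interval", type=float, default=5.0,
                        help="Time between frames (seconds)")
    parser.add_argument("-q", "--quality", type=int, default=95,
                        help="JPEG quality (1-100)")
    parser.add_argument("-w", "--width", type=int, default=None,
                        help="Max frame width (px)")
    parser.add_argument("-s", "--start", type=float, default=0.0,
                        help="Start timestamp (seconds)")
    parser.add_argument("-e", "--end", type=float, default=None,
                        help="End timestamp (seconds)")
    parser.add_argument("--no-play", action="store_true",
                        help="Disable interactive player")
    return parser


# Equivalent to: video-extractor URL -i 0.5 -w 1280 --no-play
args = build_parser().parse_args(
    ["https://example.com/video.mp4", "-i", "0.5", "-w", "1280", "--no-play"]
)
print(args.interval, args.width, args.no_play, args.output)
```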
If you run without --no-play, an OpenCV window opens with the following keyboard controls:
- q: Quit
- p: Pause/Resume
- r: Restart
- f/b: Seek forward/back 10s
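The keyboard controls above amount to a small state machine. The sketch below models them without opening a window, so the behavior can be reasoned about or tested in isolation. `PlayerState` and `handle_key` are hypothetical names, and restarting at position 0 rather than at a configured start time is an assumption.

```python
from dataclasses import dataclass

SEEK_SECONDS = 10.0  # seek step taken from the documented f/b controls


@dataclass
class PlayerState:
    position: float = 0.0   # current playback position in seconds
    paused: bool = False
    running: bool = True


def handle_key(state: PlayerState, key: str, duration: float) -> PlayerState:
    """Apply one key press to the player state (illustrative only)."""
    if key == "q":
        state.running = False
    elif key == "p":
        state.paused = not state.paused
    elif key == "r":
        state.position = 0.0
    elif key == "f":
        state.position = min(state.position + SEEK_SECONDS, duration)
    elif key == "b":
        state.position = max(state.position - SEEK_SECONDS, 0.0)
    return state


state = PlayerState()
for key in ["f", "f", "b", "p"]:
    handle_key(state, key, duration=60.0)
print(state)
```

In an OpenCV loop the key would typically come from `cv2.waitKey(1) & 0xFF`, converted with `chr()` before being passed to a handler like this one.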
Process a list of URLs and organize them into separate folders.
urls = [
    "https://example.com/clip1.mp4",
    "https://example.com/clip2.mp4",
]
for i, url in enumerate(urls):
    folder = f"data/clip_{i}"
    # initialize and run in one go
    extractor = VideoFrameExtractor(url, output_folder=folder)
    if extractor.run(play_video=False):
        print(f"finished {url}")
    else:
        print(f"failed {url}")

Extract frames from specific time ranges within a single video.
# (start_time, end_time) tuples
scenes = [(30, 60), (120, 180)]
for start, end in scenes:
    extractor = VideoFrameExtractor(
        "https://example.com/movie.mp4",
        output_folder=f"frames/{start}_{end}",
        start_time=start,
        end_time=end,
        interval=1.0,
    )
    extractor.run(play_video=False)

from video_frame_extractor import VideoFrameExtractor
extractor = VideoFrameExtractor("https://example.com/video.mp4", interval=1.0)
# download only
if extractor.download_video():
    print("Video downloaded successfully")
    # metadata
    metadata = extractor.get_video_metadata()
    print(f"Video duration: {metadata.get('duration_seconds', 0):.1f} seconds")
    # extract frames without playing
    frames_extracted = extractor.extract_frames()
    print(f"Extracted {frames_extracted} frames")
    # play video separately
    extractor.play_video(show_controls=True)
    # create report
    report_path = extractor.create_summary_report()
    print(f"Report saved to: {report_path}")

from video_frame_extractor import VideoPlayer
player = VideoPlayer()
player.play("path/to/video.mp4", start_time=10, end_time=60)

from video_frame_extractor import validate_url, sanitize_filename
# validate video URL
is_valid = validate_url("https://example.com/video.mp4")
print(f"URL is valid: {is_valid}")
# clean filename
clean_name = sanitize_filename("my video [1080p].mp4")
print(f"Clean filename: {clean_name}")

The library creates several output files:
output_folder/
├── frame_0000_time_0.0s.jpg # Extracted frames
├── frame_0001_time_5.0s.jpg
├── frame_0002_time_10.0s.jpg
├── ...
├── video_metadata.json # Video information
├── extraction_report.txt # Summary report
└── extraction_log.txt # Detailed logs
{
  "source_url": "https://example.com/video.mp4",
  "fps": 30.0,
  "total_frames": 1800,
  "width": 1920,
  "height": 1080,
  "duration_seconds": 60.0,
  "extraction_interval": 5.0,
  "extraction_time": "2024-01-15T10:30:00",
  "start_time": 0,
  "end_time": null,
  "quality": 95,
  "max_width": null,
  "frames_extracted": 12
}

from video_frame_extractor import VideoFrameExtractor
try:
    extractor = VideoFrameExtractor("https://invalid-url.com/video.mp4")
    success = extractor.run()
    if not success:
        print("Extraction failed - check logs for details")
except Exception as e:
    print(f"Unexpected error: {e}")

- video_url (str): URL of the video to download and process
- output_folder (str, optional): Directory to save extracted frames (default: "frames")
- interval (float, optional): Time interval in seconds between frame extractions (default: 5.0)
- quality (int, optional): JPEG quality for saved frames, 1-100 (default: 95)
- max_width (int, optional): Maximum width for extracted frames (default: None)
- start_time (float, optional): Start time in seconds for extraction (default: 0)
- end_time (float, optional): End time in seconds for extraction (default: None)
- log_level (str, optional): Logging level (default: "INFO")

- download_video(timeout=30, chunk_size=8192): Download the video from the URL
- get_video_metadata(): Extract and return video metadata
- extract_frames(): Extract frames at the specified interval
- play_video(show_controls=True): Play the downloaded video
- create_summary_report(): Generate the extraction report
- run(play_video=True, create_report=True): Execute the complete process

- VideoPlayer.play(video_path, start_time=0, end_time=None, show_controls=True): Play a video file

- validate_url(url, timeout=10): Check whether the URL points to an accessible video
- sanitize_filename(filename): Clean a filename for filesystem compatibility
- setup_logging(output_folder, log_level="INFO"): Configure logging
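The max_width parameter listed above implies an aspect-ratio-preserving downscale. As a rough sketch of that arithmetic (the helper `scaled_size` is hypothetical, and the library's actual rounding may differ):

```python
from typing import Optional, Tuple


def scaled_size(width: int, height: int, max_width: Optional[int]) -> Tuple[int, int]:
    """Return (new_width, new_height), shrinking only when width exceeds max_width."""
    if max_width is None or width <= max_width:
        return width, height
    # Scale the height by the same factor as the width to keep the aspect ratio.
    new_height = max(1, round(height * max_width / width))
    return max_width, new_height


print(scaled_size(1920, 1080, 1280))
print(scaled_size(640, 480, 1280))
```

With OpenCV, the resulting size could then be passed to `cv2.resize(frame, (new_width, new_height), interpolation=cv2.INTER_AREA)`, where `INTER_AREA` is a common choice for downscaling.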
OpenCV Errors:
If you see errors related to cv2 or libGL, you might need the headless version of OpenCV for server environments:
pip install opencv-python-headless

Download Failures:
Ensure the URL is a direct link to a file (ends in .mp4, .avi, etc). For YouTube links, use a tool like yt-dlp to get the direct stream URL first.
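A quick pre-check can catch non-direct links before a download is attempted. The sketch below only inspects the URL string and makes no network request. The function name and the extension list beyond .mp4 and .avi are assumptions, and this is not how the library's validate_url works internally.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# .mp4 and .avi come from the text above; the rest are assumed common formats.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def looks_like_direct_video_url(url: str) -> bool:
    """Return True if the URL path ends in a known video file extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # Only the path is inspected, so query strings such as ?token=... are ignored.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in VIDEO_EXTENSIONS


print(looks_like_direct_video_url("https://example.com/video.mp4"))
print(looks_like_direct_video_url("https://www.youtube.com/watch?v=abc"))
```

A URL that fails this check may still be downloadable if the server returns a video content type, so treat the result as a hint rather than a guarantee.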
- Fork the repo
- Create your feature branch (git checkout -b feature/cool-feature)
- Commit changes (git commit -m 'add cool feature')
- Push to branch (git push origin feature/cool-feature)
- Open a Pull Request
For support, please:
- Check the troubleshooting section
- Search existing issues
- Create a new issue
Distributed under the MIT License. See LICENSE for more information.