Clipper is a Python-based tool designed to extract video clips from .mp4 files based on keywords found in corresponding .srt subtitle files. It handles complex filenames with emojis and special characters, ensuring compatibility across Windows and Linux, and generates clips with embedded subtitles at 720p (configurable) resolution.
- Keyword-Based Clipping: Extracts segments where specified keywords appear in subtitles.
- NLP-Based Speech Category Clipping: Uses NLP to find and clip one or more specific categories of speech in subtitles.
- Multi-Keyword Filenames: Output filenames include all unique keywords in a clip’s subtitle range (e.g., big_copper).
- Special Character Support: Safely processes filenames with emojis (e.g., 🔱🐈) and long formats.
- Subtitle Embedding: Embeds adjusted subtitles into clips using FFmpeg.
- Configurable Buffers: Adds pre- and post-buffers (default 5s) around matched subtitles.
- Resolution Control: Scales clips to 720P, preserving aspect ratio.
- Parallel Processing: Supports multi-threaded processing via a configurable thread pool.
- Flexible Configuration: Uses a JSON config file for settings.
Planned features:
- Add S2T (Speech-to-Text) subtitle creation for videos without subtitles
- Add output resolution control to the config file
- Add output content control flags to the config file: video, audio, subtitles, metadata
- Add support for multiple subtitle formats
- Add support for multiple video formats
- Add support for multiple audio formats
- Python 3.6+
- FFmpeg (installed and accessible in your PATH)
- Required Python packages:
  - srt
  - tqdm
- Install via: pip install -r requirements.txt
- Clone the Repository:
  git clone https://github.com/mattladewig/clipper.git
  cd clipper
- Set Up a Virtual Environment (optional but recommended):
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install Dependencies:
  pip install -r requirements.txt
- Install FFmpeg:
  - On Ubuntu: sudo apt install ffmpeg
  - On macOS: brew install ffmpeg
  - On Windows: Download from FFmpeg’s site and add to PATH.
- Prepare Your Files:
  - Place .mp4 videos and their corresponding .srt subtitle files (named identically, e.g., 1.mp4 and 1.srt) in the videos/ directory (or configure a custom directory in config.json).
- Configure the Tool:
  - Edit config.json to specify keywords and settings (see Configuration below).
  - Example config.json:

        {
          "directory": "videos",
          "output_dir": "clips",
          "keywords": ["big", "copper"],
          "word_alt_map": {},
          "pre_buffer": 5.0,
          "post_buffer": 5.0,
          "max_workers": 2,
          "use_subdirs": false,
          "logging": "INFO",
          "speech_categories": []
        }

- Run the Script:
  python clipper.py --config config.json
  - Add --verbose for debug-level logging: python clipper.py --config config.json --verbose
- Output:
  - Clips are saved in the clips/ directory (or as configured), named like videoID-keywords_clipNumber_startTime.mp4 (e.g., 2-big_copper_001_111.mp4).
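
The pairing rule described in the steps above (a video and its subtitle file share the same name) can be illustrated with a short sketch. This is not Clipper's own code; the function name `find_pairs` is hypothetical, and it assumes that videos without a matching .srt file are simply skipped.

```python
from pathlib import Path


def find_pairs(directory):
    """Return (video, subtitle) path pairs that share the same file stem."""
    pairs = []
    for video in sorted(Path(directory).glob("*.mp4")):
        # Assumption: the subtitle file differs from the video only by suffix.
        subtitle = video.with_suffix(".srt")
        if subtitle.exists():
            pairs.append((video, subtitle))
    return pairs
```

Working with `pathlib.Path` objects rather than hand-built strings is one way to keep long names and non-ASCII characters intact on both Windows and Linux.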
The config.json file supports the following options:
- directory: Input directory for .mp4 and .srt files (default: "videos").
- output_dir: Output directory for clipped videos (default: "clips").
- keywords: List of keywords to search for in subtitles (e.g., ["big", "copper"]).
- word_alt_map: Dictionary of keyword aliases (e.g., {"big": ["large", "huge"]}). Optional.
- pre_buffer: Seconds added before each matched subtitle (default: 5.0).
- post_buffer: Seconds added after each matched subtitle (default: 5.0).
- max_workers: Number of threads for parallel processing (default: 1).
- use_subdirs: If true, creates subdirectories per video ID in the output directory (default: false).
- logging: Logging level ("DEBUG", "INFO", "WARNING", or "ERROR"; default: "INFO").
- speech_categories: List of speech categories to search for in subtitles (e.g., ["narration", "dialogue"]). Optional; slow.
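
As an illustration of how the documented defaults could be applied, here is a minimal sketch of a config loader. The name `load_config` is hypothetical and this is not necessarily how clipper.py reads its settings. The empty defaults for keywords, word_alt_map, and speech_categories are assumptions; the other defaults are the ones listed above.

```python
import json

# Defaults taken from the option list; the three empty values are assumptions.
DEFAULTS = {
    "directory": "videos",
    "output_dir": "clips",
    "keywords": [],
    "word_alt_map": {},
    "pre_buffer": 5.0,
    "post_buffer": 5.0,
    "max_workers": 1,
    "use_subdirs": False,
    "logging": "INFO",
    "speech_categories": [],
}


def load_config(path):
    """Read a JSON config file and fill in defaults for any missing keys."""
    with open(path, encoding="utf-8") as handle:
        user_settings = json.load(handle)
    # Values from the file win over the defaults.
    return {**DEFAULTS, **user_settings}
```

Note that `json.load` rejects trailing commas, so a config file must be strict JSON.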
Given:
- videos/1.mp4 and videos/1.srt
- videos/2.mp4 and videos/2.srt
- Keywords: ["big", "copper"]
Running python clipper.py --config config.json might produce:
clips/1-big_001_11.mp4
clips/1-big_002_46.mp4
clips/1-big_003_156.mp4
clips/1-big_004_185.mp4
clips/1-copper_005_333.mp4
clips/2-big_copper_001_111.mp4
Each clip:
- Is scaled to 720p.
- Contains embedded subtitles adjusted to the clip’s timeline.
- Has a filename reflecting all keywords in the clip’s subtitle range.
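
Adjusting subtitles to a clip's timeline amounts to keeping the cues that overlap the clip and subtracting the clip's start time from them. The sketch below shows that idea with plain tuples and `datetime.timedelta` values (the type the srt library uses for cue times). The function name `shift_to_clip` and the choice to trim cues at the clip boundaries are assumptions, not a description of Clipper's internals.

```python
from datetime import timedelta


def shift_to_clip(subs, clip_start, clip_end):
    """Keep cues overlapping the clip and rebase them so the clip starts at 0.

    `subs` is a list of (start, end, text) tuples using timedelta values.
    """
    shifted = []
    for start, end, text in subs:
        if end <= clip_start or start >= clip_end:
            continue  # cue lies entirely outside the clip
        # Trim cues that straddle a clip boundary, then rebase to the clip.
        new_start = max(start, clip_start) - clip_start
        new_end = min(end, clip_end) - clip_start
        shifted.append((new_start, new_end, text))
    return shifted
```

For example, a cue running from 111 s to 114 s in the source video would run from 5 s to 8 s in a clip that starts at 106 s.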
- Subtitle Parsing: Loads .srt files and searches for keywords (case-insensitive).
- Range Merging: Combines overlapping subtitle ranges with buffers.
- Clip Extraction: Uses FFmpeg to cut video segments and embed subtitles.
- Naming: Generates filenames based on all keywords found in the clip’s full subtitle range (clip_subs_filtered).
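
The search, merge, and naming steps above can be sketched in a few lines. This is an illustrative approximation rather than Clipper's implementation: `find_ranges` and `clip_name` are hypothetical names, cues are plain (start, end, text) tuples in seconds, matching is done on whole words (the source only states that it is case-insensitive), and keywords in the filename follow their order in the config.

```python
import re


def find_ranges(subs, keywords, pre_buffer=5.0, post_buffer=5.0):
    """Return merged (start, end, keywords) ranges for cues mentioning a keyword.

    `subs` is a list of (start_seconds, end_seconds, text) sorted by start time.
    """
    ranges = []
    for start, end, text in subs:
        words = set(re.findall(r"\w+", text.lower()))
        hits = [k for k in keywords if k.lower() in words]
        if not hits:
            continue
        lo, hi = max(0.0, start - pre_buffer), end + post_buffer
        if ranges and lo <= ranges[-1][1]:
            # Buffered range overlaps the previous one: extend it, pool keywords.
            prev_lo, prev_hi, prev_hits = ranges[-1]
            pooled = [k for k in keywords if k in prev_hits or k in hits]
            ranges[-1] = (prev_lo, max(prev_hi, hi), pooled)
        else:
            ranges.append((lo, hi, hits))
    return ranges


def clip_name(video_id, keywords, clip_number, start):
    """Format a name following the videoID-keywords_clipNumber_startTime pattern."""
    return f"{video_id}-{'_'.join(keywords)}_{clip_number:03d}_{int(start)}.mp4"
```

With the default 5-second buffers, two matching cues a few seconds apart collapse into one range, which is why a single clip can carry several keywords in its name.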
Keyboard controls while the script is running:
- q: Exits and empties the tmp/ directory.
- p: Pauses processing and logs "Pausing processing...".
- r: Resumes processing and logs "Resuming processing...".
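
One common way to implement q, p, and r behaviour like this across worker threads is a pair of `threading.Event` flags that workers check between clips. The sketch below is an assumption about a possible design, not Clipper's actual code; the `Controls` class and its method names are hypothetical, and reading key presses and cleaning tmp/ are left out.

```python
import threading


class Controls:
    """Shared pause/quit flags that worker threads can check between clips."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()  # start in the running state
        self.quit = threading.Event()

    def handle_key(self, key):
        """Apply one key press and return the message to log, if any."""
        if key == "p":
            self._running.clear()
            return "Pausing processing..."
        if key == "r":
            self._running.set()
            return "Resuming processing..."
        if key == "q":
            self.quit.set()
            self._running.set()  # wake paused workers so they can exit
        return None

    def wait_if_paused(self, timeout=None):
        """Block while paused (up to `timeout`); return False after a quit."""
        self._running.wait(timeout)
        return not self.quit.is_set()
```

A worker would call `wait_if_paused()` before starting each clip and stop as soon as it returns False.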
- Use --verbose to see detailed logs, or set logging to DEBUG in the config file: python clipper.py --config config.json --verbose
- Logs include search targets, clip ranges, subtitle contents, and FFmpeg commands.
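
For orientation when reading those logs, the sketch below builds one plausible FFmpeg command for a single clip. It is an assumption, not the command Clipper actually issues: the function name `build_ffmpeg_command` is hypothetical, the subtitles are added as a soft mov_text track (Clipper may embed them differently), and the subtitle file is expected to be already adjusted to the clip's timeline. The command is only assembled, not executed.

```python
def build_ffmpeg_command(video, subtitles, start, end, output, height=720):
    """Build an FFmpeg argument list for one clip (the command is not run here)."""
    return [
        "ffmpeg", "-y",
        # Seek on the input so the clip's timestamps start at zero.
        "-ss", f"{start:.3f}", "-to", f"{end:.3f}", "-i", str(video),
        "-i", str(subtitles),
        # Video and (optional) audio from the first input, subtitles from the second.
        "-map", "0:v", "-map", "0:a?", "-map", "1:0",
        # A width of -2 keeps the aspect ratio and rounds to an even number.
        "-vf", f"scale=-2:{height}",
        "-c:a", "copy",
        "-c:s", "mov_text",
        str(output),
    ]
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting problems with filenames that contain spaces, emojis, or other special characters.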
Feel free to fork the repository, submit issues, or create pull requests on GitHub.
This project is open-source under the MIT License.
- Built with Python, FFmpeg, Hugging Face Transformers, and the srt library.
- Thanks to contributors and users for feedback!