A from-scratch implementation of the paper 'Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks'. The paper is available at https://arxiv.org/abs/1705.08214
The model relies heavily on 3D convolutions to achieve the stated performance, and is therefore fully convolutional in time.
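As a rough illustration of what "fully convolutional in time" implies, the sketch below counts how many predictions a stack of unpadded, stride-1 temporal convolutions yields for a clip of a given length. The kernel sizes are hypothetical, chosen only so that the receptive field comes to 10 frames; they are not taken from this repository's layers.

```python
def temporal_output_length(n_frames, kernel_sizes):
    """Frames left on the time axis after unpadded, stride-1 convolutions."""
    length = n_frames
    for k in kernel_sizes:
        length = length - k + 1  # each layer trims k - 1 frames
        if length < 1:
            raise ValueError("input clip is shorter than the receptive field")
    return length


# Hypothetical temporal kernel sizes giving a 10-frame receptive field.
KERNELS = [3, 3, 3, 3, 2]

print(temporal_output_length(10, KERNELS))   # 1 prediction for 10 frames
print(temporal_output_length(100, KERNELS))  # 91 predictions for 100 frames
```

Under these assumptions the same weights can score a longer clip in one pass, producing one prediction per 10-frame window instead of re-running the network window by window.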
- Python
- Keras (tensorflow-gpu backend preferred)
- MoviePy
- OpenCV, NumPy, Pandas
The sample videos should be snippets of video scenes, split at scene boundaries or shot cuts, and preferably kept in 'out-clips/'.
The augmentation code is built on MoviePy (a video-editing library), which offers an effective way to augment a video dataset. It contains:
Creates the dataset from the augmentations listed in 'augmentation_helper.py'. It produces a video and a CSV file that lists the scene-boundary frame numbers.
A helper file with several functions for augmenting the dataset. They cover many true-to-life scenarios, including the artificial flash mentioned in the paper.
A sample use of 'dataset_generator.py' that creates the files aug_final.mp4 and csv_aug_data.csv.
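To give a feel for the artificial-flash idea, here is a minimal NumPy sketch that brightens a short run of frames. The function name `add_flash`, its parameters, and the constant-gain approach are all assumptions made for illustration; the actual helper in 'augmentation_helper.py' works through MoviePy and may differ.

```python
import numpy as np


def add_flash(frames, start, length, gain=80):
    """Return a copy of `frames` with `length` frames brightened from `start`.

    frames: uint8 array of shape (n_frames, height, width, channels).
    """
    out = frames.astype(np.int16)      # widen so the addition cannot wrap
    out[start:start + length] += gain  # brighten only the flash frames
    return np.clip(out, 0, 255).astype(np.uint8)


# A synthetic 20-frame grey clip stands in for real video here.
clip = np.full((20, 4, 4, 3), 200, dtype=np.uint8)
flashed = add_flash(clip, start=5, length=2)
```

A flash changes brightness abruptly without changing the shot, so one would presumably leave the cut labels untouched for the affected frames.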
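The relationship between the joined clips and the boundary CSV can be sketched as follows. This is not the code in 'dataset_generator.py'; the function names are hypothetical, only the `frame_no` and `cut` columns are written, and the convention that `frame_no` is the 0-based index of the first frame of each new clip is an assumption.

```python
import csv
import io


def boundary_rows(clip_lengths):
    """Return (frame_no, cut) rows for clips joined end to end."""
    rows = []
    position = 0
    for n in clip_lengths[:-1]:       # no boundary after the last clip
        position += n
        rows.append((position, 1))    # first frame of the next clip
    return rows


def write_boundaries(clip_lengths, handle):
    """Write the boundary rows as CSV to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["frame_no", "cut"])
    writer.writerows(boundary_rows(clip_lengths))


# Three clips of 120, 45 and 300 frames give boundaries at frames 120 and 165.
buffer = io.StringIO()
write_boundaries([120, 45, 300], buffer)
print(buffer.getvalue())
```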
The model implements the 10 frames/prediction model from the paper, which gives one output for every 10 frames. Augmented video data is not required as long as you can provide a video file and a CSV (with 'frame_no,cut and transistion' columns).
The script for training the model. The files aug_final.mp4 and csv_aug_data.csv must be provided. The model uses 'adam' as the optimizer and categorical cross-entropy as the loss.
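One way to picture the 10 frames/prediction setup is to slide a 10-frame window over the video and label each window from the cut frames in the CSV. The sketch below does this with NumPy. The function name is hypothetical, and the rule that a window is positive when the cut falls at its centre is an assumption, not something stated by this repository.

```python
import numpy as np

WINDOW = 10  # frames per prediction


def make_windows(n_frames, cut_frames, window=WINDOW):
    """Return (starts, labels) for every stride-1 window over a video.

    Assumption: a window is positive when the first frame of a new shot
    sits at offset window // 2 inside it.
    """
    starts = np.arange(n_frames - window + 1)
    centres = starts + window // 2
    labels = np.isin(centres, list(cut_frames)).astype(np.int64)
    return starts, labels


# A 30-frame video with one cut at frame 12.
starts, labels = make_windows(30, cut_frames={12})
print(len(starts), int(labels.sum()), int(starts[labels == 1][0]))
```

With this rule a 30-frame video yields 21 windows, of which only the one starting at frame 7 is positive, which shows how heavily negatives outnumber positives before any balancing.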
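For readers unfamiliar with the loss, the following is a plain-Python sketch of the categorical cross-entropy formula for a single sample. It illustrates the quantity being minimized and is not Keras's implementation; the class order `[no_cut, cut]` is an assumption.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(
        t * math.log(min(max(p, eps), 1.0))  # clamp to avoid log(0)
        for t, p in zip(y_true, y_pred)
    )


# Assumed class order: [no_cut, cut]; the target below is a cut.
confident_right = categorical_crossentropy([0, 1], [0.1, 0.9])
confident_wrong = categorical_crossentropy([0, 1], [0.9, 0.1])
print(confident_right < confident_wrong)
```

The loss grows as the probability assigned to the true class shrinks, so confident mistakes are penalized far more than confident correct predictions.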
TensorBoard and model checkpoints are used during training.
Both files handle the image queue used for training. epoch_generator.py ensures that the data fed into the model is balanced (an equal number of positive and negative samples).
The script tests the model using the weights generated by training, i.e. 'cut_video_final.h5'. It provides an image stream of 10 images and the corresponding prediction.
'check_vid.py' visualizes a .csv of scene-cut frames together with its corresponding video. Like test_model.py, it also provides an image stream of 10 images and the corresponding prediction.
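The balancing step can be approximated by undersampling the larger class, as in the sketch below. This is an assumption about one reasonable strategy; epoch_generator.py may balance the data differently (for example by oversampling positives), and the function name `balance` is hypothetical.

```python
import random


def balance(samples, labels, seed=0):
    """Undersample the larger class so both classes have equal counts."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    n = min(len(pos), len(neg))
    chosen = [(s, 1) for s in rng.sample(pos, n)]
    chosen += [(s, 0) for s in rng.sample(neg, n)]
    rng.shuffle(chosen)  # mix the classes within the epoch
    return chosen


# Eight window indices, only two of which contain a cut.
epoch = balance(list(range(8)), [1, 0, 0, 0, 0, 1, 0, 0])
print(len(epoch), sum(y for _, y in epoch))
```

Drawing a fresh set of negatives each epoch (by varying `seed`) would let the model see more of the negative data over time while keeping every epoch balanced.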
- Augmentation for artificial lighting, blur, speed, and color channels (hue, B&W, and channel switching).
- Augmentation for panning and zooming.
- A generic model for scalable operation to reduce redundancy (any number of frames, many predictions).