Deep Learning Computer Vision Project: Classification and Style Transfer
CelebVision is a comprehensive celebrity image processing system. It focuses on two tasks, image classification and style transfer, and tackles each with both a customized model and a pretrained model.
- Data Overview
- Task 1: Image Classification
  - Customized Model: CNN
  - Pretrained Model: InceptionV3
- Task 2: Style Transfer
  - Customized Model: CycleGAN
  - Pretrained Model: VGG-19
- Model Operations & Parameter Update
- Findings
- References
The project uses the CelebA dataset, which contains 202,599 celebrity face images, each annotated with 40 binary facial attributes (for example, Male, Smiling, and Eyeglasses). The dataset has no missing values, which makes it well suited to both image classification and style transfer.
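To turn CelebA's attribute annotations into gender labels, the sketch below assumes the standard list_attr_celeba.txt layout (image count on line 1, attribute names on line 2, then one row of -1/1 values per image) and maps the Male attribute to 1 for male and 0 otherwise. The function name parse_celeba_attributes is hypothetical and not part of any official CelebA tooling.

```python
from typing import Dict, List


def parse_celeba_attributes(lines: List[str], target: str = "Male") -> Dict[str, int]:
    """Map each image filename to a 0/1 label for one CelebA attribute.

    Assumes the list_attr_celeba.txt layout: line 1 holds the image count,
    line 2 the attribute names (40 in CelebA), and each later line is
    '<filename> <one value in {-1, 1} per attribute>'.
    """
    count = int(lines[0].strip())
    names = lines[1].split()
    col = names.index(target)  # raises ValueError if the attribute is absent
    labels: Dict[str, int] = {}
    for line in lines[2:2 + count]:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        filename, values = parts[0], parts[1:]
        if len(values) != len(names):
            raise ValueError(f"{filename}: expected {len(names)} values, got {len(values)}")
        labels[filename] = 1 if int(values[col]) == 1 else 0  # -1 becomes 0
    return labels


# Toy excerpt with 3 attributes instead of 40, for illustration only.
sample = [
    "2",
    "Eyeglasses Male Smiling",
    "000001.jpg -1 -1  1",
    "000002.jpg -1  1 -1",
]
print(parse_celeba_attributes(sample))  # {'000001.jpg': 0, '000002.jpg': 1}
```

For the real file, reading it with open(...).read().splitlines() and passing the resulting list should work the same way.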
Develop and evaluate a machine learning model that accurately classifies the gender of individuals in CelebA images, using the dataset's Male attribute as the gender label.
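Evaluating a gender classifier is more informative when overall accuracy is reported alongside per-class recall, in case the two classes are unevenly represented. A minimal sketch, assuming the model outputs a probability for the Male class (label 1) and using a hypothetical decision threshold of 0.5:

```python
from typing import Dict, Sequence


def gender_metrics(probs: Sequence[float], labels: Sequence[int],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy plus per-class recall for binary Male (1) / not Male (0) labels."""
    if not labels:
        raise ValueError("need at least one example")
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    recall_male = tp / pos if pos else float("nan")
    recall_female = tn / neg if neg else float("nan")
    return {
        "accuracy": (tp + tn) / len(labels),
        "recall_male": recall_male,
        "recall_female": recall_female,
        # mean of the two recalls; less sensitive to class imbalance
        "balanced_accuracy": (recall_male + recall_female) / 2,
    }


print(gender_metrics([0.9, 0.2, 0.6, 0.8, 0.1], [1, 0, 0, 1, 0]))
```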
The customized model is a deep convolutional neural network (CNN) designed specifically for the CelebA dataset. Building it from scratch gives full control over the architecture, allowing detailed optimization and adaptation to the dataset's characteristics.
Pros:
- Architecture tailored to the dataset, which can improve performance
- Full control over the model
Cons:
- Time-consuming and computationally intensive to design and train
- Risk of overfitting
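A custom CNN has to be sized by hand, so it helps to trace feature-map shapes and parameter counts before training. The sketch below assumes a hypothetical stack of four blocks (3x3 convolution with padding 1, then 2x2 max-pooling with stride 2) with 32, 64, 128, and 256 filters, applied to CelebA's 218x178 (height x width) aligned images; it illustrates the arithmetic and is not the project's actual architecture.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_cnn(height: int, width: int, channels: int,
              filters: List[int]) -> Tuple[int, int, int, int]:
    """Follow hypothetical [3x3 conv (pad 1) -> 2x2 max-pool] blocks.

    Returns (height, width, channels, conv_parameter_count) after the last block.
    """
    params = 0
    for out_ch in filters:
        # 3x3 convolution with padding 1 and stride 1 keeps the spatial size
        height, width = conv_out(height, 3, 1, 1), conv_out(width, 3, 1, 1)
        params += 3 * 3 * channels * out_ch + out_ch  # weights + biases
        channels = out_ch
        # 2x2 max-pool with stride 2 roughly halves each axis (rounding down)
        height, width = conv_out(height, 2, 2), conv_out(width, 2, 2)
    return height, width, channels, params


h, w, c, conv_params = trace_cnn(218, 178, 3, [32, 64, 128, 256])
print(h, w, c, conv_params)  # 13 11 256 388416
print(h * w * c)             # 36608 features if flattened
```

Flattening 36,608 features straight into a dense layer adds many parameters, which feeds the overfitting risk noted above; global average pooling would shrink the classifier head's input to 256 values.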
InceptionV3 is a deep CNN designed by Google, known for its high accuracy and scalability.
Pros:
- High accuracy across a wide range of image classification tasks
- Capable of handling large-scale datasets
Cons:
- Complex architecture
- Harder to implement and tune
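The Keras InceptionV3 model expects 299x299 RGB inputs scaled to [-1, 1], which is what tf.keras.applications.inception_v3.preprocess_input does. CelebA's aligned images are 178x218, so they must be reshaped first. The pure-Python sketch below assumes a center crop to a square before resizing; the crop strategy and the helper names are assumptions, not necessarily what this project does.

```python
from typing import List, Tuple

INCEPTION_SIZE = 299  # default InceptionV3 input resolution in Keras


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """(left, upper, right, lower) box of the largest centered square,
    in the order PIL's Image.crop expects."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side


def scale_to_unit_range(pixels: List[int]) -> List[float]:
    """Map 0..255 pixel values to [-1, 1], matching InceptionV3's scaling."""
    return [p / 127.5 - 1.0 for p in pixels]


print(center_crop_box(178, 218))  # (0, 20, 178, 198)
print(scale_to_unit_range([0, 255]))  # [-1.0, 1.0]
```

With Pillow, the box could be applied as img.crop(box).resize((INCEPTION_SIZE, INCEPTION_SIZE)) before scaling; cropping first avoids stretching faces when converting the 178x218 frame to a square.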
Train and evaluate models to apply specific artistic styles to content images, creating new images that blend the visual aesthetics of the style images with the structural elements of the content images.
CycleGAN is used for unpaired image-to-image translation, enabling style transfer without requiring paired training data.
Pros:
- Handles unpaired data
- Relatively simple architecture
Cons:
- Training instability and mode collapse
- Sensitive to hyperparameters
- Generated images may contain artifacts or appear blurry
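CycleGAN trains two generators, G (domain X to Y) and F (domain Y to X), each with its own discriminator. Alongside the adversarial terms, a cycle-consistency loss, lambda * (mean|F(G(x)) - x| + mean|G(F(y)) - y|), penalizes translations that cannot be undone, which is what allows training on unpaired data. The sketch below computes these generator-side losses on toy flattened "images" in pure Python, using the least-squares adversarial loss and lambda = 10 from the original CycleGAN paper; the toy generators are stand-ins, not trained networks.

```python
from typing import Callable, List, Sequence

Image = List[float]                       # toy flattened image
Generator = Callable[[Image], Image]      # stand-in for G or F
Discriminator = Callable[[Image], float]  # stand-in returning a realness score


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(G: Generator, F: Generator, xs: List[Image], ys: List[Image]) -> float:
    """Forward (x -> G -> F) plus backward (y -> F -> G) reconstruction error."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def lsgan_generator_loss(scores: List[float]) -> float:
    """Least-squares adversarial loss: generators push D(fake) toward 1."""
    return sum((s - 1.0) ** 2 for s in scores) / len(scores)


def generator_objective(G: Generator, F: Generator, D_X: Discriminator,
                        D_Y: Discriminator, xs: List[Image], ys: List[Image],
                        lam: float = 10.0) -> float:
    adv_g = lsgan_generator_loss([D_Y(G(x)) for x in xs])  # G tries to fool D_Y
    adv_f = lsgan_generator_loss([D_X(F(y)) for y in ys])  # F tries to fool D_X
    return adv_g + adv_f + lam * cycle_loss(G, F, xs, ys)


def double(img: Image) -> Image:
    return [2.0 * v for v in img]


def halve(img: Image) -> Image:
    return [v / 2.0 for v in img]


xs, ys = [[1.0, -1.0]], [[2.0, 4.0]]
# doubling then halving is an exact round trip, so the cycle loss is zero
print(cycle_loss(double, halve, xs, ys))  # 0.0
```

A large lambda makes the cycle term dominate, keeping translations faithful to the input; this weighting is one of the hyperparameters CycleGAN is sensitive to.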
VGG-19 is a 19-layer deep CNN pretrained on the ImageNet dataset. Here it serves as a fixed feature extractor: its intermediate activations provide the content and style representations used for style transfer.
Pros:
- High-quality stylization results
- Easy to implement and computationally efficient
Cons:
- No reusable model is saved, so each new content/style image pair must be generated from scratch
- Loss values are hard to compare across different generation tasks
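With VGG-19 as the feature extractor, style is usually compared through Gram matrices of a layer's activations and content through the activations themselves, following Gatys et al. The pure-Python sketch below treats a layer's features as C channel rows of M flattened positions and implements that formulation's per-layer losses: style loss sum((G - A)^2) / (4 * C^2 * M^2) and content loss 0.5 * sum((F - P)^2). In Keras' VGG19, a common choice is block4_conv2 for content and block1_conv1 through block5_conv1 for style; that layer choice and the alpha/beta weights are assumptions, not necessarily the project's settings.

```python
from typing import List

Matrix = List[List[float]]  # C channel rows x M flattened spatial positions


def gram(features: Matrix) -> Matrix:
    """G[i][j] = sum over positions of channel i times channel j."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]


def style_layer_loss(generated: Matrix, style: Matrix) -> float:
    """Squared Gram-matrix difference, normalized by 4 * C^2 * M^2."""
    c, m = len(generated), len(generated[0])
    g, a = gram(generated), gram(style)
    sq = sum((g[i][j] - a[i][j]) ** 2 for i in range(c) for j in range(c))
    return sq / (4 * c ** 2 * m ** 2)


def content_layer_loss(generated: Matrix, content: Matrix) -> float:
    """Half the squared difference between activations."""
    return 0.5 * sum((x - p) ** 2
                     for row_g, row_c in zip(generated, content)
                     for x, p in zip(row_g, row_c))


def total_loss(content_loss: float, style_losses: List[float],
               alpha: float = 1.0, beta: float = 1e3) -> float:
    """Weighted sum; style layers are averaged with equal weights."""
    return alpha * content_loss + beta * sum(style_losses) / len(style_losses)


feats = [[1.0, 2.0], [3.0, 4.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(gram(feats))                     # [[5.0, 11.0], [11.0, 25.0]]
print(style_layer_loss(feats, zeros))  # 13.9375
```

If the project follows this optimization-based approach, the generated image itself is what gets optimized against these losses, which is why no reusable model is saved between runs.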
The models are deployed with Streamlit, which lets users upload test images and apply style transfer in real time.
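A minimal Streamlit front end for the style-transfer demo might look like the sketch below. st.title, st.file_uploader, st.image, st.button, and st.spinner are real Streamlit APIs; the stylize callable, the run_app and is_supported names, and the page layout are hypothetical placeholders for the project's own code. Streamlit and Pillow are imported inside run_app so the small filename helper can be used and tested without them.

```python
from pathlib import PurePath
from typing import Callable

ALLOWED_TYPES = ("jpg", "jpeg", "png")  # assumed accepted upload formats


def is_supported(filename: str) -> bool:
    """True if the file extension is one of the accepted image types."""
    return PurePath(filename).suffix.lower().lstrip(".") in ALLOWED_TYPES


def run_app(stylize: Callable) -> None:
    """Render an upload-and-stylize page.

    `stylize` is a hypothetical (content PIL image, style PIL image) -> PIL image
    function wrapping the project's style-transfer model.
    """
    import streamlit as st
    from PIL import Image

    st.title("CelebVision style transfer")
    content = st.file_uploader("Content image", type=list(ALLOWED_TYPES))
    style = st.file_uploader("Style image", type=list(ALLOWED_TYPES))
    if content is not None and style is not None:
        content_img = Image.open(content).convert("RGB")
        style_img = Image.open(style).convert("RGB")
        st.image([content_img, style_img], caption=["Content", "Style"], width=256)
        if st.button("Apply style"):
            with st.spinner("Stylizing..."):
                result = stylize(content_img, style_img)
            st.image(result, caption="Stylized result")
```

To try it, save the file as app.py, add a call such as run_app(my_stylize_function) at the bottom, and start it with streamlit run app.py.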
The infrastructure runs on AWS, with Amazon CloudWatch monitoring application health. When retraining is needed, the models fetch new data from Amazon S3.
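The S3 side of retraining can be sketched with boto3: list the objects under a prefix with the list_objects_v2 paginator, keep image files modified since the last training run, and download them. The bucket, prefix, local directory, and cutoff timestamp are assumptions; how the project actually triggers retraining is not described here. boto3 is imported inside fetch_new_images so the filtering logic runs without AWS credentials.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable, List

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def select_new_keys(objects: Iterable[Dict], since: datetime) -> List[str]:
    """Keys of image objects modified after `since`, oldest first.

    `since` must be timezone-aware, because S3's LastModified values are.
    """
    fresh = [o for o in objects
             if o["Key"].lower().endswith(IMAGE_SUFFIXES) and o["LastModified"] > since]
    fresh.sort(key=lambda o: o["LastModified"])
    return [o["Key"] for o in fresh]


def fetch_new_images(bucket: str, prefix: str, since: datetime, dest: str) -> List[str]:
    """Download new images under `prefix` into `dest`; returns local paths."""
    import boto3

    s3 = boto3.client("s3")
    objects: List[Dict] = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        objects.extend(page.get("Contents", []))  # empty pages have no Contents
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for key in select_new_keys(objects, since):
        target = out_dir / Path(key).name  # note: same-named keys would collide
        s3.download_file(bucket, key, str(target))
        downloaded.append(str(target))
    return downloaded


cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical last training run
```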
The customized CNN achieved slightly higher gender classification accuracy than InceptionV3. For style transfer, both CycleGAN and VGG-19 applied styles effectively, with VGG-19 capturing more detailed style features.
- Siyan Li
- Stella Wang
- Xinran Wang
- Yumin Zhang