This repository contains the implementation of a Movie Recommendation System aimed at enhancing content streaming services. The project leverages machine learning models trained on the MovieLens dataset to provide personalized movie recommendations based on user reviews.
The system utilizes two datasets from MovieLens:
- Small Dataset (100k reviews)
  - 600 users
  - 9,000 movies
  - Reviews span from 1995 to 2023
  - Each review contains:
    - An anonymized user ID
    - A movie ID
    - A rating on a 5-star scale (half-star increments)
- Large Dataset (33M reviews)
  - 330,975 users
  - 86,537 movies
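
To make the review format concrete, here is a minimal sketch of reading reviews into typed records. It assumes the `userId,movieId,rating,timestamp` column layout of a MovieLens `ratings.csv` file and a 0.5 to 5.0 rating range; the `Rating` type, the `read_ratings` helper, and the sample rows are made up for illustration and are not part of this repository.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: float


def read_ratings(handle) -> Iterator[Rating]:
    """Yield one Rating per CSV row, checking the half-star scale."""
    for row in csv.DictReader(handle):
        value = float(row["rating"])
        # Assumed valid range: 0.5 to 5.0 in half-star steps.
        if not (0.5 <= value <= 5.0 and (value * 2).is_integer()):
            raise ValueError(f"unexpected rating {value!r}")
        yield Rating(int(row["userId"]), int(row["movieId"]), value)


# Tiny made-up sample in the assumed ratings.csv column layout.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,964982703\n"
    "1,20,3.5,964981247\n"
    "2,10,5.0,835355493\n"
)
ratings = list(read_ratings(sample))
print(ratings[1])
```

The timestamp column is read but dropped here, since the models described in this README only need the user, the movie, and the rating.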
- Algorithm: Singular Value Decomposition (SVD) using the `surprise` library.
- Parameters: Default
- Performance Metrics:
  - RMSE: 0.88
  - FCP: 0.65
- Optimized Parameters:
  - `n_factors`: 200
  - `n_epochs`: 150
  - `regParam`: 0.1
- Performance Improvement:
  - RMSE: 0.85
  - FCP: 0.68
- t-SNE projection to visualize clusters of movies with similar rating patterns.
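
The tuned SVD run listed above could be reproduced roughly as follows. This is a sketch, not the repository's actual training script: the `evaluate_svd` helper, the 80/20 split, and the random seed are illustrative choices, and mapping `regParam` to Surprise's `reg_all` argument is an assumption, since Surprise's `SVD` has no parameter called `regParam`.

```python
from typing import Optional

# Tuned values from this README. Mapping regParam to `reg_all` is an
# assumption; Surprise's SVD does not expose a parameter named regParam.
SVD_PARAMS = {"n_factors": 200, "n_epochs": 150, "reg_all": 0.1}


def evaluate_svd(ratings_csv: str, params: Optional[dict] = None) -> dict:
    """Train Surprise's SVD on a ratings CSV and return RMSE and FCP."""
    # Imported lazily so this module loads even without scikit-surprise.
    from surprise import SVD, Dataset, Reader, accuracy
    from surprise.model_selection import train_test_split

    params = dict(SVD_PARAMS if params is None else params)

    # Assumes a header row and userId,movieId,rating,timestamp columns.
    reader = Reader(
        line_format="user item rating timestamp",
        sep=",",
        skip_lines=1,
        rating_scale=(0.5, 5.0),
    )
    data = Dataset.load_from_file(ratings_csv, reader=reader)

    # The 80/20 split and the seed are illustrative, not from the README.
    trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

    model = SVD(random_state=42, **params)
    model.fit(trainset)
    predictions = model.test(testset)

    return {
        "rmse": accuracy.rmse(predictions, verbose=False),
        "fcp": accuracy.fcp(predictions, verbose=False),
    }
```

Calling `evaluate_svd("path/to/ratings.csv")` with a real file path returns both metrics in one dictionary; passing `params={}` falls back to the library defaults, which corresponds to the baseline run.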
- Tools: Databricks, PySpark
- Algorithm: Alternating Least Squares (ALS)
- Parameters: Default
- Performance Metrics:
  - RMSE: 3.5
- Performed cross-validation on a small subset to determine optimal parameters:
  - `alpha`: 0.5
  - `maxIter`: 10
  - `rank`: 15
  - `regParam`: 0.3
- Final Model Performance:
  - RMSE: 0.91
  - FCP: 0.55
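
A rough PySpark sketch of the final ALS configuration is shown below. The `evaluate_als` helper, the 80/20 split, the seed, and the `coldStartStrategy="drop"` setting are assumptions made for illustration; only the four parameter values come from this README. Note that in Spark's ALS, `alpha` only takes effect when `implicitPrefs=True`, so with explicit ratings it is likely ignored.

```python
from typing import Optional

# Cross-validated values reported in this README.
ALS_PARAMS = {"rank": 15, "maxIter": 10, "regParam": 0.3, "alpha": 0.5}


def evaluate_als(ratings_csv: str, params: Optional[dict] = None) -> float:
    """Fit Spark ALS on a ratings CSV and return the test-set RMSE."""
    # Imported lazily so this module loads even without pyspark installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.recommendation import ALS
    from pyspark.sql import SparkSession

    params = dict(ALS_PARAMS if params is None else params)

    spark = SparkSession.builder.appName("movielens-als").getOrCreate()
    # Assumes a header row with userId, movieId, and rating columns.
    ratings = spark.read.csv(ratings_csv, header=True, inferSchema=True)
    # The 80/20 split and the seed are illustrative, not from the README.
    train, test = ratings.randomSplit([0.8, 0.2], seed=42)

    als = ALS(
        userCol="userId",
        itemCol="movieId",
        ratingCol="rating",
        # Drop users or movies unseen in training so RMSE is not NaN.
        coldStartStrategy="drop",
        **params,
    )
    model = als.fit(train)
    predictions = model.transform(test)

    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )
    return evaluator.evaluate(predictions)
```

`RegressionEvaluator` covers RMSE but has no FCP metric, so the FCP figure would need a separate computation over the collected predictions.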
- Tune hyperparameters further to improve the FCP score on the large dataset.
- Investigate and analyze clustering patterns in the data.
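
Since the remaining tuning work targets FCP, it may help to spell out how it differs from RMSE. RMSE measures how far predicted ratings are from true ratings, while FCP (Fraction of Concordant Pairs) measures how often the model orders two movies from the same user correctly. The pure-Python sketch below uses made-up rows and hypothetical function names; it counts tied predictions as discordant, and the numbers reported in this README come from the libraries, not from this code.

```python
import math
from collections import defaultdict
from itertools import combinations


def rmse(rows):
    """rows: list of (user, true_rating, predicted_rating) tuples."""
    squared = [(true - pred) ** 2 for _, true, pred in rows]
    return math.sqrt(sum(squared) / len(squared))


def fcp(rows):
    """Fraction of same-user pairs whose predicted order matches the truth."""
    by_user = defaultdict(list)
    for user, true, pred in rows:
        by_user[user].append((true, pred))

    concordant = discordant = 0
    for rated in by_user.values():
        for (t1, p1), (t2, p2) in combinations(rated, 2):
            if t1 == t2:
                continue  # the user expressed no preference; skip the pair
            # Concordant when predictions rank the pair like the true
            # ratings do; tied predictions count as discordant.
            if (t1 - t2) * (p1 - p2) > 0:
                concordant += 1
            else:
                discordant += 1

    if concordant + discordant == 0:
        raise ValueError("no comparable rating pairs")
    return concordant / (concordant + discordant)


# Made-up predictions: (user, true rating, predicted rating).
rows = [
    ("u1", 5.0, 4.5), ("u1", 3.0, 3.5), ("u1", 1.0, 4.0),
    ("u2", 4.0, 3.0), ("u2", 2.0, 2.0),
]
print(round(rmse(rows), 3))
print(fcp(rows))  # 3 of 4 comparable pairs are ordered correctly: 0.75
```

A model can have a reasonable RMSE and still a weak FCP when its predictions cluster near the mean rating, because small errors in value can still flip the order of many pairs.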