CinemAI is a machine learning-based movie recommendation system designed to help users discover movies tailored to their preferences. It uses content-based filtering, cosine similarity, and text vectorization to rank movies by similarity, and serves the results through a user interface built with Streamlit, HTML, and CSS.
- Personalized Recommendations: Suggests movies based on user input and preferences.
- Content-Based Filtering: Uses attributes like genres, directors, and actors to calculate similarity.
- Interactive UI: A user-friendly interface to search for movies and get recommendations instantly.
- TMDb Dataset: Data-driven insights using the popular TMDb 5000 Movie Dataset.
- Cosine Similarity: Measures the relevance of movies to suggest the closest matches.
- Model Persistence: Utilizes the Pickle library to save and load machine learning models efficiently.
- NumPy: For numerical computations.
- Pandas: For data manipulation and analysis.
- Scikit-Learn: For building and evaluating the recommendation engine.
- Streamlit: For creating the web-based interface.
- Pickle: For saving and loading models.
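As a rough illustration of the model persistence mentioned above, the following sketch saves and reloads an object with the standard-library `pickle` module. The helper names, the toy similarity matrix, and the `similarity.pkl` file name are assumptions made for this example, not the project's actual code.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifact(obj, path):
    """Serialize any picklable object (e.g. a similarity matrix) to disk."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_artifact(path):
    """Load a previously saved object. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Toy stand-in for a precomputed movie-to-movie similarity matrix.
similarity = [[1.0, 0.4], [0.4, 1.0]]

with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "similarity.pkl"  # hypothetical file name
    save_artifact(similarity, artifact_path)
    restored = load_artifact(artifact_path)
```

Saving the precomputed artifacts once means the web app can load them at startup instead of recomputing similarities on every run.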
The dataset is available on Kaggle: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata
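The sketch below shows one way genres, actors, and directors might be pulled out of a dataset row and merged into a single tag string. It assumes the Kaggle CSVs store genres, cast, and crew as JSON-encoded lists of objects with a `name` field (plus a `job` field for crew); the function names and the sample row are hypothetical and written by hand in that assumed shape.

```python
import json


def names(json_text, limit=None):
    """Return the 'name' of each entry in a JSON-encoded list, optionally truncated."""
    picked = [entry["name"] for entry in json.loads(json_text)]
    return picked[:limit] if limit is not None else picked


def director(crew_json):
    """Return the first crew member whose job is 'Director', or None."""
    for member in json.loads(crew_json):
        if member.get("job") == "Director":
            return member["name"]
    return None


def build_tags(genres_json, cast_json, crew_json, top_cast=3):
    """Combine genres, leading cast, and director into one lowercase tag string."""
    parts = names(genres_json) + names(cast_json, top_cast)
    found = director(crew_json)
    if found:
        parts.append(found)
    # Remove spaces inside names so "Science Fiction" becomes a single token.
    return " ".join(part.replace(" ", "").lower() for part in parts)


# Hand-written sample row with placeholder names (not taken from the dataset).
sample_genres = '[{"id": 1, "name": "Action"}, {"id": 2, "name": "Science Fiction"}]'
sample_cast = '[{"name": "Actor One", "order": 0}, {"name": "Actor Two", "order": 1}]'
sample_crew = '[{"name": "Editor X", "job": "Editor"}, {"name": "Director Zed", "job": "Director"}]'

tags = build_tags(sample_genres, sample_cast, sample_crew)
# tags == "action sciencefiction actorone actortwo directorzed"
```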
This project integrates the TMDb API to fetch movie details. Follow these steps to obtain your API key and access token:
- Sign up on TMDb
  - Go to The Movie Database (TMDb).
  - Create an account if you don’t already have one.
- Request an API Key
  - Navigate to your profile by clicking on your avatar (top-right corner).
  - Go to Settings > API.
  - Click on Create a new API key.
  - Choose "Developer" and provide the required details.
  - You will receive a TMDB API Key.
- Generate an Access Token
  - On the same API page, request a Bearer Token.
  - TMDb will generate an access token that you can use for authentication.
- Store API Credentials Securely
  - Create a .env file in your project directory and add the following:

        TMDB_API_KEY=your_tmdb_api_key_here
        TMDB_ACCESS_TOKEN=your_tmdb_access_token_here

  - Do NOT share this file or upload it to GitHub.
  - Add .env to your .gitignore file to keep it private.
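Once the credentials are stored, the application needs to read them and authenticate its requests. The following is a minimal standard-library sketch, assuming the access token is sent as a Bearer header to TMDb's v3 movie details endpoint. The helper names are hypothetical, and the small `.env` reader stands in for whatever loader the project actually uses (the `python-dotenv` package is a common choice).

```python
import json
import os
import urllib.request

TMDB_BASE_URL = "https://api.themoviedb.org/3"


def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a .env file into os.environ, if the file exists."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Values already set in the real environment take precedence.
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def build_movie_request(movie_id, access_token):
    """Build an authenticated request for one movie's details."""
    return urllib.request.Request(
        f"{TMDB_BASE_URL}/movie/{movie_id}",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


def fetch_movie_details(movie_id):
    """Fetch movie details as a dict; requires network access and a valid token."""
    load_env_file()
    token = os.environ.get("TMDB_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("TMDB_ACCESS_TOKEN is not set; add it to your .env file")
    with urllib.request.urlopen(build_movie_request(movie_id, token), timeout=10) as resp:
        return json.load(resp)
```

Calling `fetch_movie_details` with a numeric TMDb movie ID returns the parsed JSON response. Keeping the token in the environment rather than in source code is what lets the .env file stay out of version control.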
- Data Preprocessing:
  - Cleans the dataset by handling missing values and duplicates.
  - Extracts relevant features like genres, directors, and actors.
- Text Vectorization:
  - Converts textual data into numerical format using techniques like TF-IDF.
- Similarity Calculation:
  - Computes cosine similarity between movies based on their features.
- Recommendation Engine:
  - Suggests movies based on similarity scores.
- Web Application:
  - Built using Streamlit for seamless user interaction.
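The vectorization, similarity, and recommendation steps above can be sketched in a few lines. This is a dependency-free illustration, not the project's implementation: the function names, the smoothed IDF formula, and the toy tag strings are assumptions. With Scikit-Learn, `TfidfVectorizer` and `cosine_similarity` would typically do this work.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, titles, vectors, top_n=5):
    """Return up to top_n other titles, most similar first."""
    idx = titles.index(title)  # raises ValueError for an unknown title
    scored = [
        (cosine(vectors[idx], vec), other)
        for i, (other, vec) in enumerate(zip(titles, vectors))
        if i != idx
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:top_n]]


# Toy tag strings standing in for preprocessed movie features.
titles = ["Movie A", "Movie B", "Movie C"]
tag_strings = [
    "action adventure directorone",
    "action adventure directortwo",
    "romance drama directorthree",
]
vectors = tfidf_vectors(tag_strings)
suggestions = recommend("Movie A", titles, vectors, top_n=2)
# suggestions == ["Movie B", "Movie C"]
```

Movie B ranks first because it shares two tags with Movie A, while Movie C shares none and scores zero.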
- Clone this repository:

      git clone https://github.com/your-username/cinemai.git
      cd cinemai

- Install the required libraries:

      pip install -r requirements.txt

- Run the application:

      streamlit run main.py
