This is my second Exploratory Data Analysis (EDA) project. The first one is at https://github.com/sudeshnadutta/EDA-on-IPL-Data. Do have a look and let me know your thoughts.
The dataset contains information on 979 movies from the IMDb site. It covers the star rating, content rating, genre, duration and actors of each movie.
The primary goal of this exercise is to explore the dataset and surface interesting patterns that can later feed into more advanced statistical or machine learning work.
The workflow of the EDA process is as follows:
- Importing necessary packages
- Generating an overview of the data
- Cleaning and preparing the data for analysis
- Analysing different patterns in the dataset
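
The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming Python with pandas; the column names (`title`, `star_rating`, `content_rating`, `genre`, `duration`), the helper functions and the `"UNRATED"` placeholder are hypothetical, and the few rows below are made-up toy values, not the real IMDb data.

```python
# Step 1: import the necessary packages (pandas is an assumption here).
import pandas as pd


def overview(df):
    """Step 2: summarise the size of the data and the missing values per column."""
    return {"shape": df.shape, "missing": df.isna().sum().to_dict()}


def clean(df):
    """Step 3: drop exact duplicate rows and fill missing content ratings."""
    cleaned = df.drop_duplicates().copy()
    # "UNRATED" is a placeholder label chosen for this sketch.
    cleaned["content_rating"] = cleaned["content_rating"].fillna("UNRATED")
    return cleaned.reset_index(drop=True)


def genre_summary(df):
    """Step 4: compare genres by movie count, mean rating and mean duration."""
    return (
        df.groupby("genre")
        .agg(
            movies=("title", "count"),
            mean_rating=("star_rating", "mean"),
            mean_duration=("duration", "mean"),
        )
        .sort_values("mean_rating", ascending=False)
    )


# Toy rows standing in for the real dataset (hypothetical values).
movies = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C", "Movie C"],
        "star_rating": [9.0, 7.5, 7.0, 7.0],
        "content_rating": ["R", None, "PG", "PG"],
        "genre": ["Crime", "Drama", "Crime", "Crime"],
        "duration": [140, 120, 100, 100],
    }
)

info = overview(movies)
cleaned = clean(movies)
summary = genre_summary(cleaned)
print(info)
print(summary)
```

With the real data, the toy `movies` frame would be replaced by the loaded dataset, and the column names adjusted to match the actual file.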
As a newcomer to this field, I have tried to get my hands dirty and have found a few patterns in the data, but I believe there are many more hidden ones. Feel free to give it a try and share any interesting findings, so that we can all build a better understanding of the data, which will help in future analyses as well.