Black Friday marks the beginning of the major holiday shopping season, when retailers experience extremely high sales volumes. Understanding customer purchasing behavior during this period helps businesses optimize pricing strategies, inventory planning, and marketing efforts.
This project focuses on performing detailed Exploratory Data Analysis (EDA) and data preprocessing on historical Black Friday sales data. The main objective of this phase is to clean, transform, and prepare the dataset for future machine learning modeling.
This project is part of my 100 Days of Machine Learning journey.
The dataset contains transactional records from a retail store during Black Friday sales.
- User_ID
- Product_ID
- Gender
- Age
- Occupation
- City_Category
- Stay_In_Current_City_Years
- Marital_Status
- Product_Category_1
- Product_Category_2
- Product_Category_3
- Purchase (Target Variable)
- Total Records: 500,000+
- Total Columns: 12
- Target Variable: Purchase
- Checked dataset shape and structure
- Reviewed data types
- Identified missing values
- Checked duplicate records
- Identified null values in Product_Category_2 and Product_Category_3
- Visualized missing data using heatmaps
- Calculated percentage of missing values
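
The missing-value percentage check can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the helper name `missing_percentage` and the toy values are assumptions made up for the example.

```python
import numpy as np
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, highest first."""
    # isna() gives a boolean frame; the column mean is the fraction of nulls.
    return (df.isna().mean() * 100).sort_values(ascending=False)


# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame(
    {
        "Product_Category_1": [1, 5, 8, 3],
        "Product_Category_2": [2.0, np.nan, 6.0, np.nan],
        "Product_Category_3": [np.nan, np.nan, np.nan, 14.0],
    }
)

missing_pct = missing_percentage(toy)
print(missing_pct)
```

For the heatmap itself, one common approach is to pass the boolean frame `df.isna()` to `seaborn.heatmap`, so that each missing cell shows up as a contrasting stripe.
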
- Analyzed distribution of Purchase
- Checked skewness and kurtosis
- Visualized outliers using boxplots
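
A small sketch of the skewness and kurtosis check, using the pandas `Series.skew` and `Series.kurt` methods. The purchase amounts below are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Made-up purchase amounts with one large value to create a right tail.
purchase = pd.Series(
    [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 20000.0], name="Purchase"
)

purchase_skew = purchase.skew()  # > 0 indicates a longer right tail
purchase_kurt = purchase.kurt()  # excess kurtosis (0 for a normal distribution)

print(f"skewness: {purchase_skew:.3f}")
print(f"kurtosis: {purchase_kurt:.3f}")
```

A positive skewness here reflects the single large value pulling the right tail out; a boxplot of the same series would draw that value as an outlier point.
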
- Male customers contributed a larger share of total purchases.
- Average spending differed between genders.
- Customers aged 26–35 showed higher purchasing activity.
- Other age groups showed comparatively lower spending patterns.
- Certain occupations demonstrated higher average purchase amounts.
- Spending behavior varied across city categories.
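
Group-level comparisons like the ones above are typically computed with a `groupby`. The sketch below shows the pattern for Gender; the numbers are invented purely to make the example runnable and are not the project's findings.

```python
import pandas as pd

# Made-up rows; only the column names match the dataset.
toy = pd.DataFrame(
    {
        "Gender": ["M", "M", "F", "F", "M"],
        "Purchase": [100, 300, 150, 150, 200],
    }
)

# Total and average purchase per gender.
summary = toy.groupby("Gender")["Purchase"].agg(total="sum", mean="mean")

# Each group's share of the overall purchase amount, in percent.
summary["share_pct"] = summary["total"] / summary["total"].sum() * 100

print(summary)
```

The same pattern applies to Age, Occupation, or City_Category by changing the grouping column.
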
- Filled missing values in Product_Category_2 and Product_Category_3
- Ensured dataset contains no null values before modeling
- Dropped User_ID and Product_ID as they do not contribute to prediction
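
The fill strategy is not specified here, so the sketch below assumes a constant placeholder of 0 (read as "no secondary category"). That choice, the helper name `fill_missing_categories`, and the toy values are all assumptions for illustration; the mode or another value may fit better.

```python
import numpy as np
import pandas as pd

CATEGORY_COLS = ["Product_Category_2", "Product_Category_3"]


def fill_missing_categories(df: pd.DataFrame, fill_value=0) -> pd.DataFrame:
    """Return a copy with nulls in the secondary category columns replaced.

    fill_value=0 is an assumed placeholder, not the notebook's confirmed choice.
    """
    out = df.copy()
    out[CATEGORY_COLS] = out[CATEGORY_COLS].fillna(fill_value)
    return out


# Toy frame with made-up values.
toy = pd.DataFrame(
    {
        "Product_Category_1": [1, 5, 8, 3],
        "Product_Category_2": [2.0, np.nan, 6.0, np.nan],
        "Product_Category_3": [np.nan, np.nan, np.nan, 14.0],
    }
)

filled = fill_missing_categories(toy)

# Confirm that no nulls remain before moving on to modeling.
print(filled.isna().sum())
```
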
- Used the IQR (Interquartile Range) method to detect outliers
- Applied capping technique to reduce the impact of extreme values
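
IQR-based capping can be sketched as below. The multiplier `k=1.5` is the conventional default and is assumed here, as is the helper name `cap_outliers_iqr`; the sample values are made up.

```python
import pandas as pd


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # clip() replaces out-of-range values with the nearest bound
    # instead of dropping the rows.
    return s.clip(lower=lower, upper=upper)


# Made-up values with one extreme point.
purchase = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0, 100.0], name="Purchase")
capped = cap_outliers_iqr(purchase)

print(capped.tolist())
```

Capping keeps every row, which matters when the extreme values are real purchases rather than data errors.
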
- Label Encoding applied to: Gender
- Ordinal Mapping applied to: Age
- One-Hot Encoding applied to: City_Category
- Cleaned and converted Stay_In_Current_City_Years into numeric format
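
The four encoding steps could look roughly like this in pandas. Several details are assumptions: the category labels (`F`/`M`, the age brackets, and the `4+` value for years in city), the `City` prefix, and the helper name `encode_features`. A plain dictionary mapping stands in for label encoding on Gender.

```python
import pandas as pd

# Assumed age brackets, ordered from youngest to oldest.
AGE_ORDER = {
    "0-17": 0, "18-25": 1, "26-35": 2, "36-45": 3,
    "46-50": 4, "51-55": 5, "55+": 6,
}


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return an encoded copy of the categorical columns."""
    out = df.copy()
    # Label encoding for a binary column (assumed labels F and M).
    out["Gender"] = out["Gender"].map({"F": 0, "M": 1})
    # Ordinal mapping keeps the natural order of the age brackets.
    out["Age"] = out["Age"].map(AGE_ORDER)
    # Strip the trailing "+" (e.g. "4+" -> 4) and convert to integers.
    out["Stay_In_Current_City_Years"] = (
        out["Stay_In_Current_City_Years"]
        .astype(str)
        .str.replace("+", "", regex=False)
        .astype(int)
    )
    # One-hot encode the unordered city category.
    return pd.get_dummies(out, columns=["City_Category"], prefix="City", dtype=int)


# Made-up rows for illustration.
toy = pd.DataFrame(
    {
        "Gender": ["F", "M", "M"],
        "Age": ["0-17", "26-35", "55+"],
        "City_Category": ["A", "C", "B"],
        "Stay_In_Current_City_Years": ["2", "4+", "0"],
    }
)

encoded = encode_features(toy)
print(encoded)
```

Age gets an ordinal mapping because its brackets have a natural order, while City_Category is one-hot encoded so that no artificial ordering is introduced.
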
- Applied StandardScaler to standardize numerical features (zero mean, unit variance)
- Ensured features are centered and scaled
- Prepared dataset for future regression modeling
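
A minimal sketch of the scaling step with scikit-learn's `StandardScaler`. Which columns were scaled is not listed here, so the two columns and their values below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up numeric features; the real column selection may differ.
toy = pd.DataFrame(
    {
        "Occupation": [0.0, 4.0, 7.0, 10.0, 20.0],
        "Product_Category_1": [1.0, 5.0, 8.0, 3.0, 12.0],
    }
)

scaler = StandardScaler()

# fit_transform returns a NumPy array, so wrap it to keep the column names.
scaled = pd.DataFrame(
    scaler.fit_transform(toy), columns=toy.columns, index=toy.index
)

print(scaled.mean().round(6))       # approximately 0 for every column
print(scaled.std(ddof=0).round(6))  # approximately 1 for every column
```

When a train/test split is introduced for the regression step, fitting the scaler on the training portion only and reusing it to transform the test portion avoids leaking test statistics into training.
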
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- Jupyter Notebook
✅ Data Cleaning Completed
✅ Exploratory Data Analysis Completed
✅ Missing Value Treatment Completed
✅ Feature Engineering Completed
✅ Feature Scaling Completed
🔜 Next Step: Applying Regression Models to Predict Purchase Amount

    Black-Friday-Sales/
    │
    ├── notebook.ipynb
    ├── dataset.csv
    └── README.md