In this project, we take the role of the data science team at an e-commerce company. Our task is to help the marketing team segment Olist customers based on their behavior.
Dataset: Brazilian E-Commerce Public Dataset by Olist (Kaggle)
Business Problem:
- How can we segment customers on the Olist marketplace according to their shopping behaviour?
- What treatment should each cluster receive to increase the customer retention rate?
| Attribute | Data Type | Description |
|---|---|---|
| order_id | object | order unique identifier |
| order_purchase_stamp | datetime64[s] | purchase timestamp |
| order_item_id | float64 | sequential number identifying each item included in the same order |
| product_id | object | product unique identifier |
| payment_type | object | method of payment chosen by the customer. |
| payment_value | float64 | transaction value. |
| review_score | float64 | score from 1 to 5 given by the customer in a satisfaction survey |
| customer_unique_id | object | unique identifier of a customer |
| product_category_name_english | object | category name in English |
| month_order | object | month of order |
| weekday_order | object | weekday of order |
| month_year_order | period[M] | month and year of order |
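
The last three attributes in the table are derived from the purchase timestamp. A minimal sketch of how they could be built with pandas is shown here; the sample rows are made up, and the exact string format of `month_order` and `weekday_order` (full English names) is an assumption, since the table only gives their dtype.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the table above.
orders = pd.DataFrame({
    "order_id": ["a1", "b2", "c3"],
    "order_purchase_stamp": pd.to_datetime(
        ["2017-10-02 10:56:33", "2018-07-24 20:41:37", "2018-08-08 08:38:49"]
    ),
})

ts = orders["order_purchase_stamp"]
orders["month_order"] = ts.dt.month_name()         # assumed format: full month name
orders["weekday_order"] = ts.dt.day_name()         # assumed format: full weekday name
orders["month_year_order"] = ts.dt.to_period("M")  # dtype period[M], as in the table

print(orders)
```
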
- Data Merging
- Data Cleaning and Data Pre-processing
- Exploratory Data Analysis (EDA)
- Modeling (K-Means Clustering using RFM)
- Conclusion
- Business Recommendation
In the modelling step, we use three features per customer: Recency, Frequency, and Monetary (RFM). Together they describe a customer's transaction behaviour:
- Recency: How much time has passed since the customer's most recent purchase
- Frequency: The number of transactions the customer has made
- Monetary: The total amount the customer has spent
Using the RFM features as input, we apply the K-Means clustering algorithm to perform customer segmentation.
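
As an illustration of the RFM definitions, the sketch below aggregates a transaction table into one row per customer. The column names follow the attribute table; the sample data, the choice of days as the recency unit, and the snapshot date (one day after the latest purchase) are assumptions made for this example, not necessarily what the project used.

```python
import pandas as pd

# Hypothetical transactions; column names follow the attribute table.
df = pd.DataFrame({
    "customer_unique_id": ["c1", "c1", "c2", "c3", "c3", "c3"],
    "order_id": ["o1", "o2", "o3", "o4", "o5", "o6"],
    "order_purchase_stamp": pd.to_datetime([
        "2018-01-10", "2018-06-01", "2017-11-20",
        "2018-03-05", "2018-07-15", "2018-08-20",
    ]),
    "payment_value": [120.0, 80.0, 45.5, 300.0, 150.0, 60.0],
})

# Assumption: recency is counted in days up to the day after the last purchase in the data.
snapshot = df["order_purchase_stamp"].max() + pd.Timedelta(days=1)

rfm = df.groupby("customer_unique_id").agg(
    recency=("order_purchase_stamp", lambda s: (snapshot - s.max()).days),
    frequency=("order_id", "nunique"),   # number of distinct orders
    monetary=("payment_value", "sum"),   # total amount spent
)

print(rfm)
```

If the merged data contains several rows per order (for example, one per item), the payment values may need to be deduplicated per order before summing.
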
Based on the Elbow Method, we choose 4 clusters.

| Segment | recency | frequency | monetary |
|---|---|---|---|
| Best Customers | 207.85 | 11.40 | 27733.93 |
| Loyal Customers | 236.80 | 3.97 | 1141.96 |
| New Customers | 132.46 | 1.11 | 170.77 |
| Lost Customers | 392.97 | 1.11 | 170.48 |
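
The Elbow Method compares the K-Means inertia (within-cluster sum of squares) across candidate values of k and looks for the point where adding clusters stops helping much. The following is a rough sketch using scikit-learn on synthetic RFM-like data; the data, the standardization step, and the range of k are assumptions for illustration, and only the final choice of 4 clusters comes from this project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic (recency, frequency, monetary) values, for illustration only.
rng = np.random.default_rng(42)
rfm_values = np.column_stack([
    rng.integers(1, 600, size=200),
    rng.integers(1, 12, size=200),
    rng.gamma(2.0, 150.0, size=200),
])

# Assumption: standardize so that no single feature dominates the distances.
X = StandardScaler().fit_transform(rfm_values)

# Inertia for each candidate k; plotting these values shows the "elbow".
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

# Fit the final model with the chosen number of clusters (4 in this project).
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

for k, value in inertias.items():
    print(k, round(value, 2))
```

Averaging recency, frequency, and monetary per cluster label then gives a summary of the same shape as the segment table.
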
| RFM Segment | Description | Strategy |
|---|---|---|
| Best Customers | Made transactions recently, made more than 1 transaction, and have the highest total transaction value. | Loyalty program/reward points, new product recommendations, and exclusive product offers (Cross/Up-Selling Strategy) |
| Loyal Customers | Made transactions recently, made more than 1 transaction, and have a high total transaction value. | Loyalty program/reward points and new product recommendations (Cross/Up-Selling Strategy) |
| New Customers | Made transactions recently, made only 1 transaction, and have a low total transaction value. | Welcome e-mail to build the relationship, loyalty program/reward points, and discount vouchers (Cross/Up-Selling Strategy) |
| Lost Customers | Have not made a transaction for a long time, made only 1 transaction, and have the lowest total transaction value. | Regular limited offers, discount vouchers, e-mail campaigns, and requests for feedback (Retention and Reactivation Strategy) |



