srijan399/mall_seg_kdb

MallMind

Customer segmentation app built with Streamlit. Takes the classic Mall Customers dataset (200 rows), runs K-Means and DBSCAN clustering, and lets you explore the results interactively.

Quick start

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run app.py

The app opens at http://localhost:8501. The bundled Mall_Customers.csv loads automatically, or you can upload your own CSV through the sidebar.

Project structure

customer_seg/
├── app.py                  # Entry point — page config, sidebar, tab routing
├── config.py               # Constants: colours, Plotly theme, personas, defaults
├── data_loader.py          # CSV ingestion and column normalisation
├── preprocessing.py        # StandardScaler, LabelEncoder, alignment checks
├── clustering.py           # K-Means, DBSCAN, elbow, PCA, summaries
├── components.py           # Reusable UI pieces (metric cards, banners, etc.)
├── styles.css              # All custom CSS lives here
├── tabs/
│   ├── eda.py              # Distributions, scatter plots, correlation, box plots
│   ├── preprocessing_tab.py # Pipeline overview, null check, descriptive stats
│   ├── models.py           # Elbow/silhouette, PCA visualisation, radar chart
│   ├── predict.py          # Predict cluster for a new customer
│   └── metrics.py          # Silhouette, DBI, inertia, model comparison
├── Mall_Customers.csv      # 200-row dataset (Kaggle)
├── report.md               # Maths, metrics breakdown, fine-tuning notes
└── README.md

Dataset

Mall Customer Segmentation from Kaggle — 200 customers with age, gender, annual income (k$), and a mall-assigned spending score (1–100).
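
As a rough illustration of how these columns might be prepared for clustering, here is a minimal sketch. It uses a four-row stand-in frame rather than the real file, and the column names (`Gender`, `Annual Income (k$)`, `Spending Score (1-100)`) are assumptions about the Kaggle CSV layout. It does not reproduce what data_loader.py or preprocessing.py actually do.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny stand-in for Mall_Customers.csv (assumed column names, made-up rows).
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 35, 64],
    "Annual Income (k$)": [15, 16, 60, 120],
    "Spending Score (1-100)": [39, 81, 50, 20],
})

# Null check: count missing values per column.
null_counts = df.isnull().sum()

# Label encoding: LabelEncoder sorts classes, so Female -> 0, Male -> 1.
df["Gender_encoded"] = LabelEncoder().fit_transform(df["Gender"])

# Feature selection + scaling: zero mean, unit variance per column.
features = ["Annual Income (k$)", "Spending Score (1-100)"]
X_scaled = StandardScaler().fit_transform(df[features])

print(null_counts.sum(), X_scaled.shape)
```

Scaling matters here because K-Means and DBSCAN both work on Euclidean distances, so a feature with a wider numeric range would otherwise dominate.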

What the app does

  1. EDA — KPI cards, histograms, gender split, correlation heatmap, income-vs-spending scatter, age group analysis, box plots by gender.
  2. Preprocessing — Shows the pipeline step by step: load → null check → label encoding → feature selection → StandardScaler.
  3. Models — Elbow method + silhouette sweep for picking K. PCA-projected cluster plots for both K-Means and DBSCAN. Cluster profile table and radar chart.
  4. Predict — Slide in age/income/score for a hypothetical customer and see which segment they land in under each model.
  5. Metrics — Silhouette, Davies-Bouldin, inertia side by side for both models. Cluster size distribution.
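
The K sweep, metrics, and prediction steps listed above can be sketched with scikit-learn roughly as follows. This is an illustrative example on synthetic income/score blobs, not the code in clustering.py. The blob centres, the K range, and the hypothetical new customer are all assumptions made for the demo.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic (income, spending score) data: five well-separated blobs.
rng = np.random.default_rng(42)
centres = np.array([[25, 80], [25, 20], [55, 50], [90, 80], [90, 20]])
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centres])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Sweep K, recording inertia (for the elbow plot) and silhouette.
ks = list(range(2, 9))
inertias, silhouettes = [], []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(model.inertia_)
    silhouettes.append(silhouette_score(X_scaled, model.labels_))

# Pick the K with the highest silhouette and refit.
best_k = ks[int(np.argmax(silhouettes))]
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X_scaled)
dbi = davies_bouldin_score(X_scaled, kmeans.labels_)  # lower is better

# Predict the segment for a hypothetical customer, reusing the fitted scaler.
new_customer = np.array([[60, 45]])
segment = int(kmeans.predict(scaler.transform(new_customer))[0])

print(f"best K={best_k}, DBI={dbi:.3f}, new customer -> cluster {segment}")
```

Note that scikit-learn's DBSCAN has no `predict` method, so assigning a new point under DBSCAN needs a separate rule (for example, nearest core sample). How the app handles that is not covered by this sketch.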

Tuning tips

  • The 5 income×spending clusters are cleanest when you use just Annual Income + Spending Score as features. Adding Age blurs boundaries and drops silhouette.
  • DBSCAN with the default ε=0.5 on standardised data is tight. Raising ε to 0.6–0.8 and lowering min_samples to 3–4 reduces the number of noise points.
  • Check report.md for the full maths and a "was this successful?" checklist.
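
To see the effect of the DBSCAN tip above, the sketch below counts noise points (label -1) under a tight and a looser setting. The data is synthetic and the helper `noise_count` is a hypothetical name introduced here, so the exact counts will differ on the real dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic (income, spending score) blobs, standardised before clustering.
rng = np.random.default_rng(0)
centres = np.array([[25, 80], [25, 20], [55, 50], [90, 80], [90, 20]])
X = np.vstack([c + rng.normal(scale=6.0, size=(40, 2)) for c in centres])
X_scaled = StandardScaler().fit_transform(X)


def noise_count(eps: float, min_samples: int) -> int:
    """Number of points DBSCAN labels as noise (-1) for the given settings."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    return int(np.sum(labels == -1))


tight = noise_count(eps=0.5, min_samples=5)
loose = noise_count(eps=0.7, min_samples=3)
print(f"noise points: eps=0.5/min_samples=5 -> {tight}, eps=0.7/min_samples=3 -> {loose}")
```

Raising ε while lowering min_samples can only keep or grow the set of core points and their neighbourhoods, so the noise count never goes up. The trade-off is that distinct clusters may start to merge.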

Dependencies

Streamlit, pandas, numpy, scikit-learn, plotly, matplotlib, seaborn, joblib. All pinned in requirements.txt.
