Customer segmentation app built with Streamlit. Takes the classic Mall Customers dataset (200 rows), runs K-Means and DBSCAN clustering, and lets you explore the results interactively.
```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
```
Opens at http://localhost:8501. The bundled Mall_Customers.csv loads automatically, or upload your own CSV through the sidebar.
```
customer_seg/
├── app.py                  # Entry point — page config, sidebar, tab routing
├── config.py               # Constants: colours, Plotly theme, personas, defaults
├── data_loader.py          # CSV ingestion and column normalisation
├── preprocessing.py        # StandardScaler, LabelEncoder, alignment checks
├── clustering.py           # K-Means, DBSCAN, elbow, PCA, summaries
├── components.py           # Reusable UI pieces (metric cards, banners, etc.)
├── styles.css              # All custom CSS lives here
├── tabs/
│   ├── eda.py              # Distributions, scatter plots, correlation, box plots
│   ├── preprocessing_tab.py # Pipeline overview, null check, descriptive stats
│   ├── models.py           # Elbow/silhouette, PCA visualisation, radar chart
│   ├── predict.py          # Predict cluster for a new customer
│   └── metrics.py          # Silhouette, DBI, inertia, model comparison
├── Mall_Customers.csv      # 200-row dataset (Kaggle)
├── report.md               # Maths, metrics breakdown, finetuning notes
└── README.md
```
Mall Customer Segmentation from Kaggle — 200 customers with age, gender, annual income (k$), and a mall-assigned spending score (1–100).
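The Kaggle CSV ships with verbose headers, so ingestion normalises them first. A hypothetical sketch of that step (the rename map and snake_case targets are assumptions for illustration, not lifted from data_loader.py):

```python
import pandas as pd

# Map the Kaggle headers to simple snake_case names.
# In the app this would be pd.read_csv("Mall_Customers.csv");
# a two-row stand-in is used here.
RENAME = {
    "CustomerID": "customer_id",
    "Gender": "gender",
    "Age": "age",
    "Annual Income (k$)": "annual_income",
    "Spending Score (1-100)": "spending_score",
}

df = pd.DataFrame(
    {
        "CustomerID": [1, 2],
        "Gender": ["Male", "Female"],
        "Age": [19, 21],
        "Annual Income (k$)": [15, 15],
        "Spending Score (1-100)": [39, 81],
    }
).rename(columns=RENAME)

print(list(df.columns))
```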
- EDA — KPI cards, histograms, gender split, correlation heatmap, income-vs-spending scatter, age group analysis, box plots by gender.
- Preprocessing — Shows the pipeline step by step: load → null check → label encoding → feature selection → StandardScaler.
- Models — Elbow method + silhouette sweep for picking K. PCA-projected cluster plots for both K-Means and DBSCAN. Cluster profile table and radar chart.
- Predict — Slide in age/income/score for a hypothetical customer and see which segment they land in under each model.
- Metrics — Silhouette, Davies-Bouldin, inertia side by side for both models. Cluster size distribution.
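The Models, Predict, and Metrics tabs boil down to a small scikit-learn pipeline: standardise, sweep K by silhouette, fit K-Means, then scale and classify a new customer. A minimal sketch, using synthetic blobs in place of Mall_Customers.csv (the blob centres and the new-customer values are illustrative, not from the app):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Five well-separated blobs roughly mimicking the income x spending segments
centers = np.array([[25, 20], [25, 80], [55, 50], [90, 15], [90, 85]])
X = np.vstack([c + rng.normal(0, 6, size=(40, 2)) for c in centers])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Silhouette sweep: pick the K with the highest score
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
best_k = max(scores, key=scores.get)

kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_scaled)

# Segment for a hypothetical new customer: income 60 k$, spending score 50
new_customer = scaler.transform([[60, 50]])
segment = kmeans.predict(new_customer)[0]
print(best_k, segment)
```

The same scaler must transform the new customer; predicting on raw values would silently misplace them.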
- The 5 income×spending clusters are cleanest when you use just Annual Income + Spending Score as features. Adding Age blurs boundaries and drops silhouette.
- DBSCAN with the default ε=0.5 on standardised data is too strict, leaving many points labelled as noise; raise ε to 0.6–0.8 and drop min_samples to 3–4 to reduce noise points.
- Check report.md for the full maths and a "was this successful?" checklist.
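The DBSCAN note above can be demonstrated on synthetic data (blob centres and outlier counts are made up; the real dataset behaves differently in detail). Loosening ε while lowering min_samples can only shrink the noise set, since every core point stays core and every reachable point stays reachable:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
centers = np.array([[25, 20], [25, 80], [55, 50], [90, 15], [90, 85]])
blobs = np.vstack([c + rng.normal(0, 6, size=(30, 2)) for c in centers])
background = rng.uniform([0, 0], [120, 100], size=(20, 2))  # scattered outliers
X = StandardScaler().fit_transform(np.vstack([blobs, background]))

tight = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)  # sklearn defaults
loose = DBSCAN(eps=0.7, min_samples=3).fit_predict(X)  # looser settings

noise_tight = int((tight == -1).sum())  # points labelled -1 are noise
noise_loose = int((loose == -1).sum())
print(noise_tight, noise_loose)
```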
Streamlit, pandas, numpy, scikit-learn, plotly, matplotlib, seaborn, joblib. All pinned in requirements.txt.