Predict customer churn before it happens!
A complete end-to-end Artificial Neural Network (ANN) solution built on the Telco Customer Churn dataset, with every code block explained for absolute beginners.
- Project Overview
- Key Features
- Dataset
- Model Architecture
- Step-by-Step Code Explanation
- Installation & How to Run
- Results & Visualizations
- Dependencies
- Business Impact
- Future Improvements
- Contributing
- License
Customer churn is when a customer stops using a company's services. It is one of the most expensive problems in subscription businesses such as telecom, SaaS, and banking. Retaining an existing customer is commonly estimated to be 5–25× cheaper than acquiring a new one.
This project builds an Artificial Neural Network (ANN) to predict whether a customer will churn (Yes / No). The model uses 19 customer features, including tenure, monthly charges, contract type, payment method, and service subscriptions.
Why ANN?
Neural networks can capture non-linear patterns and feature interactions that simpler models, such as plain logistic regression, may miss.
Goal: Deliver high precision and recall so businesses can offer retention campaigns to at-risk customers before they leave.
- Complete end-to-end pipeline (raw CSV → trained ANN)
- Professional data cleaning & preprocessing
- Handles missing values in `TotalCharges`
- One-hot encoding of all categorical features
- Feature scaling for neural network stability
- Deep ANN with hidden layers, ReLU activation & Dropout regularization
- Training history visualization (accuracy & loss curves)
- Ready for hyperparameter tuning
- Every code block explained for absolute beginners
File: Customer-Churn.csv (included in repository)
Size: 7,043 customers × 21 columns
Source: Classic Telco Customer Churn dataset
| Column | Description | Type |
|---|---|---|
| `customerID` | Unique ID (dropped before training) | String |
| `gender` | Male / Female | Categorical |
| `SeniorCitizen` | 0 = No, 1 = Yes | Binary |
| `Partner` | Has a partner? | Yes/No |
| `Dependents` | Has dependents? | Yes/No |
| `tenure` | Months with the company | Numerical |
| `PhoneService` | Has phone service? | Yes/No |
| `MultipleLines` | Multiple lines option | Categorical |
| `InternetService` | DSL / Fiber optic / No | Categorical |
| `OnlineSecurity` | Online security subscription | Categorical |
| `OnlineBackup` | Online backup add-on | Categorical |
| `DeviceProtection` | Device protection plan | Categorical |
| `TechSupport` | Tech support subscription | Categorical |
| `StreamingTV` | Streaming TV add-on | Categorical |
| `StreamingMovies` | Streaming movies add-on | Categorical |
| `Contract` | Month-to-month / One year / Two year | Categorical |
| `PaperlessBilling` | Paperless billing enabled? | Yes/No |
| `PaymentMethod` | Electronic check, Mailed check, etc. | Categorical |
| `MonthlyCharges` | Current monthly bill amount | Numerical |
| `TotalCharges` | Total amount billed (contains blanks for new customers) | Numerical |
| `Churn` | Target variable: Yes = customer left | Binary |
Important Note: `TotalCharges` contains blank strings for customers with `tenure = 0` (brand-new customers). The notebook converts the column to numeric with `pd.to_numeric(..., errors='coerce')` and then removes the resulting `NaN` rows with `dropna()`.
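The coercion step described in this note can be tried on a toy frame first. The sketch below uses made-up values, and it assumes a blank entry appears as a single space in the CSV; the names `toy`, `as_numbers`, and `bad_rows` are introduced here for illustration only.

```python
import pandas as pd

# Toy frame (made-up values) with one blank TotalCharges entry.
# Assumption: the blank is stored as a single space.
toy = pd.DataFrame({
    "tenure": [0, 12, 24],
    "TotalCharges": [" ", "350.5", "1200"],
})

# Anything that cannot be parsed as a number becomes NaN.
as_numbers = pd.to_numeric(toy["TotalCharges"], errors="coerce")
bad_rows = as_numbers.isnull()

print(bad_rows.tolist())  # [True, False, False]
print(toy[bad_rows])      # the tenure = 0 row

# Keep only the rows that converted cleanly.
cleaned = toy.assign(TotalCharges=as_numbers).dropna(subset=["TotalCharges"])
print(len(cleaned))       # 2
```

Inspecting the mask before dropping rows is a cheap way to confirm that only the brand-new customers are affected.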
Sequential ANN
├── Input Layer → 26 neurons (after one-hot encoding + scaling)
├── Hidden Layer 1 → 32 neurons + ReLU activation + Dropout(0.2)
├── Hidden Layer 2 → 16 neurons + ReLU activation
└── Output Layer → 1 neuron + Sigmoid (outputs churn probability 0–1)
| Component | Choice | Reason |
|---|---|---|
| Loss Function | Binary Crossentropy | Standard for binary classification |
| Optimizer | Adam | Adaptive learning rate; fast convergence |
| Output Activation | Sigmoid | Produces probability between 0 and 1 |
| Hidden Activation | ReLU | Avoids vanishing gradient; trains fast |
| Regularization | Dropout (0.2) | Prevents overfitting on training data |
| Metrics | Accuracy, Precision, Recall, F1-score | Balanced view of model performance |
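To get a feel for how small this network is, the trainable parameter count can be worked out by hand. The sketch below assumes the 26-feature input stated above; `dense_params` is a helper name introduced here for illustration, not something from the notebook.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths from the architecture above. The 26 inputs are taken from this
# README and will differ if the encoding yields a different column count.
layer_sizes = [26, 32, 16, 1]

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)  # Dropout adds no trainable parameters

print(per_layer)     # [864, 528, 17]
print(total_params)  # 1409
```

If the notebook's `model.summary()` reports a different total, the most likely cause is a different number of input columns after encoding.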
Every single block from Churn Prediction Model.ipynb is explained below so even a complete beginner can follow along.
# Customer Churn Prediction Model Using Artificial Neural Network (ANN)
Explanation: This is just a heading: pure documentation, and no code runs here. It is good practice to title your notebooks clearly.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Explanation:
- `pandas`: Reads CSV files and manipulates tabular data (think: Excel in Python).
- `numpy`: Fast math and array operations that pandas relies on under the hood.
- `matplotlib.pyplot`: Creates plots and charts.
Tip: These three libraries are the foundation of almost every data science project in Python.
df = pd.read_csv(r"C:\Customer-Churn.csv")
Explanation: Reads the CSV file from disk into a pandas DataFrame named `df`. The `r` prefix makes the path a raw string, so backslashes are taken literally instead of being read as escape characters, which matters for Windows paths. Adjust the path to wherever `Customer-Churn.csv` lives on your machine. After this line, all 7,043 customer records are in memory and ready to work with.
df.sample(5)
Explanation: Displays 5 randomly selected rows so you can eyeball the data: see what the columns look like, what values exist, and do a sanity check before proceeding.
df.drop('customerID', axis='columns', inplace=True)
Explanation:
- `customerID` is unique per row and carries no predictive signal.
- `axis='columns'` tells pandas to drop a column (not a row).
- `inplace=True` modifies `df` directly instead of creating a copy.
After this step, we're left with 20 columns: 19 input features plus the `Churn` target.
df.sample(5)
Explanation: Another quick peek to confirm `customerID` is gone. Always verify your cleaning steps!
pd.to_numeric(df.TotalCharges, errors='coerce').isnull()
Explanation:
- Some rows have blank strings in `TotalCharges`; these belong to customers with `tenure = 0` (brand-new, never billed).
- `pd.to_numeric(..., errors='coerce')` attempts conversion to a number; anything that fails becomes `NaN`.
- `.isnull()` returns a boolean mask showing which rows have the problem.
Full fix applied in the notebook:
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.dropna(subset=['TotalCharges'], inplace=True)
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})
Explanation: Converts the target column from text (Yes/No) to integers (1/0) so the neural network can compute loss and gradients against it. This must run before one-hot encoding: otherwise `pd.get_dummies()` would turn `Churn` into a `Churn_Yes` column and `df['Churn']` would no longer exist.
df = pd.get_dummies(df, drop_first=True)
Explanation: Neural networks require numerical inputs; they cannot work with strings like "Male" or "Month-to-month". `pd.get_dummies()` converts every categorical column into binary indicator columns (one-hot encoding). `drop_first=True` removes one redundant column per category to avoid multicollinearity.
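The encoding step is easier to follow on a toy frame. In the sketch below the data is made up, the target is mapped to 0/1 before encoding so that `pd.get_dummies` leaves it alone, and `dtype=int` is an optional choice made here to get 0/1 values rather than True/False.

```python
import pandas as pd

# Made-up rows with two text features and the text target.
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "Churn": ["Yes", "No", "No"],
})

# Map the target first so it is already numeric when encoding runs.
toy["Churn"] = toy["Churn"].map({"Yes": 1, "No": 0})

# drop_first=True drops the first category of each feature
# ("Female" and "Month-to-month" here), which become the implicit baseline.
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)

print(sorted(encoded.columns))
# ['Churn', 'Contract_One year', 'Contract_Two year', 'gender_Male']
print(encoded["gender_Male"].tolist())  # [1, 0, 0]
```

A row with zeros in both `Contract_*` columns therefore means a month-to-month contract.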
X = df.drop('Churn', axis='columns')
y = df['Churn']
Explanation:
- `X` = all input features (everything except the target).
- `y` = the target column (what we're trying to predict).
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Explanation: Splits the data into 80% training and 20% testing. `random_state=42` ensures reproducibility: you'll get the same split every time you run the notebook.
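Because churners are a minority class, one optional variation is a stratified split, which keeps the churn rate the same in both partitions. This is not what the notebook does; the sketch below uses synthetic data with an assumed churn rate of 27%, and all `_demo` names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1,000 rows, 27% positives (assumed rate).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 270 + [0] * 730)

# stratify=y_demo preserves the class ratio in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)    # (800, 3) (200, 3)
print(y_tr.mean(), y_te.mean())  # both close to 0.27
```

Without `stratify`, the test set's churn rate can drift from the training set's by chance, which makes recall and precision slightly noisier.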
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Explanation: Neural networks are sensitive to the scale of inputs. `StandardScaler` standardizes each feature to have mean = 0 and standard deviation = 1. Key rule: fit only on the training data, then transform both train and test. This prevents data leakage.
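The fit-on-train, transform-both rule can be checked by hand on a few made-up numbers. In this sketch the two columns stand in for tenure and monthly charges, and the `_demo` and `manual` names are introduced only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values standing in for tenure (months) and MonthlyCharges.
train = np.array([[1.0, 20.0], [12.0, 50.0], [24.0, 80.0], [60.0, 110.0]])
test = np.array([[6.0, 65.0]])

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean/std from train only
test_scaled = scaler_demo.transform(test)        # reuses the training statistics

# Same result by hand: z = (x - mean) / std, using the population std (ddof=0).
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(train_scaled.std(axis=0), 1.0))   # True
print(np.allclose(test_scaled, manual))             # True
```

Note that the scaled test row is not guaranteed to have mean 0 or standard deviation 1; it is only expressed in the training set's units.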
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Explanation:
- `Sequential` = layers stacked one after another.
- `Dense(32, activation='relu')` = 32 neurons, each connected to every input, using ReLU activation.
- `Dropout(0.2)` = randomly deactivates 20% of neurons during each training step to prevent overfitting.
- `Dense(1, activation='sigmoid')` = output layer producing a probability between 0 and 1.
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
Explanation: `compile()` configures the learning process. The Adam optimizer adapts the learning rate automatically. Binary crossentropy is the standard loss for yes/no classification problems.
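To see what binary crossentropy actually measures, it can be computed by hand with NumPy. This is a minimal sketch of the textbook formula, not Keras's internal implementation; the function name and the `eps` clipping value are choices made here.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

# Two customers: one churner (1) and one non-churner (0).
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(round(confident_right, 4))  # 0.1054
print(round(confident_wrong, 4))  # 2.3026
```

The loss punishes confident mistakes far more than it rewards confident correct answers, which is what pushes the sigmoid output toward calibrated probabilities.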
history = model.fit(
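    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_test, y_test)
)
Explanation: Trains the ANN for 50 passes (epochs) over the training data. `batch_size=32` means the model updates its weights after every 32 samples. `validation_data` reports loss and accuracy on held-out data after each epoch. Note that the test set doubles as the validation set here, so if you tune hyperparameters against these curves, hold out a separate validation split to keep the final test score unbiased.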
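The relationship between epochs, batch size, and weight updates is simple arithmetic. The sketch below assumes 5,625 training rows (roughly 80% of the rows left after cleaning); check `X_train.shape[0]` in the notebook for the real value. `updates_per_epoch` is a helper name introduced here.

```python
import math

def updates_per_epoch(n_train: int, batch_size: int) -> int:
    """Number of weight updates in one pass; the last batch may be smaller."""
    return math.ceil(n_train / batch_size)

# Assumption: 5,625 training rows. Replace with X_train.shape[0] for the real count.
n_train = 5625
per_epoch = updates_per_epoch(n_train, batch_size=32)
total_updates = per_epoch * 50  # 50 epochs, as in model.fit above

print(per_epoch)      # 176
print(total_updates)  # 8800
```

Halving the batch size doubles the number of updates per epoch, which usually makes each epoch slower but the gradient steps noisier.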
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
ax1.plot(history.history['accuracy'], label='Train')
ax1.plot(history.history['val_accuracy'], label='Validation')
ax1.set_title('Model Accuracy')
ax1.set_ylabel('Accuracy')
ax1.set_xlabel('Epoch')
ax1.legend(loc='upper left')
ax2.plot(history.history['loss'], label='Train')
ax2.plot(history.history['val_loss'], label='Validation')
ax2.set_title('Model Loss')
ax2.set_ylabel('Loss')
ax2.set_xlabel('Epoch')
ax2.legend(loc='upper right')
plt.tight_layout()
plt.show()
Explanation:
- `history` stores accuracy and loss recorded after each epoch.
- We create two subplots stacked vertically:
  - Top: Accuracy over time for both train and validation sets.
  - Bottom: Loss (error) over time for both sets.
- Watching both curves together helps diagnose overfitting (training accuracy rises while validation drops).
- `plt.tight_layout()` prevents labels from overlapping.
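The same diagnosis can be done numerically instead of by eye. The sketch below works on a dictionary shaped like `history.history`; the numbers are hypothetical, not real training output, and both helper functions are introduced here for illustration.

```python
# Hypothetical values shaped like history.history; not real training output.
demo_history = {
    "loss":     [0.52, 0.45, 0.42, 0.40, 0.38, 0.36],
    "val_loss": [0.50, 0.44, 0.43, 0.43, 0.45, 0.47],
}

def best_epoch(history: dict) -> int:
    """Return the 1-based epoch with the lowest validation loss."""
    val_loss = history["val_loss"]
    return val_loss.index(min(val_loss)) + 1

def gap_at_end(history: dict) -> float:
    """Validation loss minus training loss at the final epoch."""
    return history["val_loss"][-1] - history["loss"][-1]

print(best_epoch(demo_history))            # 3
print(round(gap_at_end(demo_history), 2))  # 0.11
```

In this made-up run, training loss keeps falling after epoch 3 while validation loss climbs, which is the overfitting pattern described above.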
from sklearn.metrics import classification_report, confusion_matrix
y_pred = (model.predict(X_test) > 0.5).astype(int)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
Explanation:
- `model.predict()` outputs probabilities (e.g., `0.73`).
- `> 0.5` converts them to binary predictions: above 0.5 means churn, otherwise no churn.
- `confusion_matrix` shows true positives, false positives, true negatives, and false negatives.
- `classification_report` prints precision, recall, and F1-score for both classes.
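The churn-class numbers in the report follow directly from the confusion matrix counts, and they shift when the 0.5 threshold moves. The sketch below recomputes them by hand on eight made-up customers; `churn_metrics` is a helper introduced here, not a scikit-learn function.

```python
import numpy as np

# Made-up labels and predicted probabilities for eight customers.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_prob = np.array([0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.45])

def churn_metrics(y_true, y_prob, threshold=0.5):
    """Precision, recall and F1 for the positive (churn) class."""
    y_pred = (y_prob > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

default_cut = churn_metrics(y_true, y_prob, threshold=0.5)
lower_cut = churn_metrics(y_true, y_prob, threshold=0.35)

print(default_cut)  # precision, recall and F1 are each 2/3
print(lower_cut)    # precision 0.6, recall 1.0, F1 0.75
```

Lowering the threshold catches every churner in this toy example at the cost of more false alarms, which is often the right trade when a retention offer is cheap.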
Step 1: Clone the repository
git clone https://github.com/ibtesaamaslam/Customer-Churn-Prediction-Model
cd Customer-Churn-Prediction-Model
Step 2: Create a virtual environment (recommended)
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS / Linux
Step 3: Install dependencies
pip install -r requirements.txt
Step 4: Launch Jupyter Notebook
jupyter notebook
Open Churn Prediction Model.ipynb and run all cells.
Note: `Customer-Churn.csv` is already included in the repository, so no additional download is needed.
Training curves show accuracy steadily increasing and loss decreasing over 50 epochs, indicating stable learning without severe overfitting.
Typical performance on this dataset:
| Metric | Score |
|---|---|
| Accuracy | ~80–82% |
| Precision (churn class) | ~75%+ |
| Recall (churn class) | ~70%+ |
| F1-Score (churn class) | ~72%+ |
Accuracy and loss plots are generated directly inside the notebook as interactive Matplotlib figures.
requirements.txt
pandas
numpy
matplotlib
tensorflow>=2.10
scikit-learn
jupyter
seaborn
Install all at once with:
pip install -r requirements.txt
Why does this model matter in the real world?
- Cost savings: Target only high-risk customers with discounts or retention campaigns instead of blanket offers to everyone.
- Revenue protection: Reducing churn by even 5–10% can translate to significant recurring revenue gains.
- Actionable insights: Combined with feature analysis, the model can help surface the factors most associated with churn, such as month-to-month contracts, fiber optic service, and high monthly charges, giving product and CX teams clear levers to pull.
- Proactive vs reactive: Shift from reacting to cancellations to preventing them before the customer decides to leave.
- Hyperparameter tuning with Keras Tuner or GridSearchCV
- Model explainability with SHAP or LIME
- Handle class imbalance using SMOTE or class weights
- Deploy as a web app using FastAPI + Streamlit
- Benchmark against XGBoost and LightGBM
- Add cross-validation for more robust evaluation
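As a starting point for the class-imbalance item, class weights can be derived from label counts alone. The sketch below implements the common "balanced" heuristic in plain NumPy on synthetic labels; the 25% churn rate and the function name are assumptions made for illustration.

```python
import numpy as np

def balanced_class_weights(y):
    """weight[c] = n_samples / (n_classes * count[c])."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Synthetic labels: 250 churners (1) and 750 non-churners (0).
y_demo = np.array([1] * 250 + [0] * 750)
weights = balanced_class_weights(y_demo)

print(weights[1])            # 2.0
print(round(weights[0], 4))  # 0.6667
```

A dictionary like this could then be passed to Keras through the `class_weight` argument of `model.fit`, so that mistakes on churners count more heavily in the loss.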
Contributions are very welcome!
- Fork the repository
- Create a feature branch (`git checkout -b feature/your-feature-name`)
- Commit your changes (`git commit -m 'Add: your feature description'`)
- Push to the branch (`git push origin feature/your-feature-name`)
- Open a Pull Request
Please open an issue first to discuss major changes. All contributions (bug fixes, documentation, new features) are appreciated.
Distributed under the MIT License. See LICENSE for full details.
Made with ❤️ for the data science community. If you found this project helpful, please consider giving it a star!