🧠 Stroke Risk Prediction

📌 Project Overview

Stroke is a critical health condition and a leading cause of death and long-term disability worldwide. This project focuses on predicting the risk of stroke using machine learning models based on medical and demographic features.

Two models were implemented and evaluated:

Decision Tree Classifier
Random Forest Classifier (🏆 Best Performing)

📂 Project Resources

🔹 Dataset (CSV File): healthcare-dataset-stroke-data.csv
🔹 Project Code (.ipynb): Stroke Risk Prediction Code.ipynb
🔹 Project Report (PDF File): Stroke Risk Prediction Report.pdf
🔹 Presentation (PPTX File): Stroke Risk Predictions PPT.pptx

🧾 Dataset Overview

Total Records: 5,110
Target Variable: stroke (1 = Stroke occurred, 0 = No stroke)
Features:
- Demographic: Age, Gender, Marital Status, Residence Type
- Clinical: Hypertension, Heart Disease, Glucose Level, BMI
- Lifestyle: Smoking Status, Work Type

📝 Class Imbalance Notice: Only ~5% of the data points represent stroke cases, making class balancing necessary.

🛠 Data Preprocessing

✔️ Cleaning & Transformation:

Dropped non-informative id column
Handled missing values in bmi using median imputation
Encoded categorical variables using LabelEncoder
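
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a tiny invented frame, not the notebook's actual code; the `gender` and `smoking_status` column names and all values are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the real CSV; values are invented for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gender": ["Male", "Female", "Female", "Male"],
    "bmi": [28.0, None, 31.5, 22.0],
    "smoking_status": ["smokes", "never smoked", "Unknown", "never smoked"],
})

# Drop the identifier column, which carries no predictive signal.
df = df.drop(columns=["id"])

# Fill missing BMI values with the column median.
df["bmi"] = df["bmi"].fillna(df["bmi"].median())

# Integer-encode each categorical column with its own LabelEncoder.
encoders = {}
for col in ["gender", "smoking_status"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])
```

Keeping the fitted encoders around makes it possible to map the integer codes back to the original labels later with `inverse_transform`.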

✔️ Outlier Removal:

Used IQR method to eliminate outliers in bmi and avg_glucose_level
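
As a hedged sketch of the IQR rule, the helper below keeps only rows that fall inside the usual fences for every listed column. The function name `remove_iqr_outliers`, the multiplier `k=1.5`, and the toy values are assumptions; the README does not state which multiplier the notebook uses.

```python
import pandas as pd

def remove_iqr_outliers(frame, columns, k=1.5):
    """Keep rows within [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        q1 = frame[col].quantile(0.25)
        q3 = frame[col].quantile(0.75)
        iqr = q3 - q1
        # between() is inclusive on both ends by default.
        mask &= frame[col].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]

# Invented values: the BMI of 95.0 lies far outside the upper fence.
toy = pd.DataFrame({
    "bmi": [22.0, 25.0, 27.0, 29.0, 31.0, 95.0],
    "avg_glucose_level": [80.0, 85.0, 90.0, 95.0, 100.0, 105.0],
})
cleaned = remove_iqr_outliers(toy, ["bmi", "avg_glucose_level"])
```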

✔️ Feature Scaling:

Applied StandardScaler to numerical columns
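
A minimal sketch of standardization with `StandardScaler` follows. The arrays are invented, and fitting on the training rows only is an assumption made here to avoid leaking test statistics; the README does not say at which point the notebook fits the scaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented numerical features, e.g. two continuous columns.
X_train_num = np.array([[40.0, 90.0], [60.0, 110.0], [80.0, 130.0]])
X_test_num = np.array([[50.0, 100.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training rows only.
X_train_scaled = scaler.fit_transform(X_train_num)
# Reuse the training statistics for the test rows.
X_test_scaled = scaler.transform(X_test_num)
```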

✔️ Train-Test Split:

80% Training, 20% Testing using Stratified Sampling
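
The stratified 80/20 split can be sketched with `train_test_split` as below. The data is synthetic with roughly the 5% positive rate described earlier, and `random_state=42` is an arbitrary choice for reproducibility, not a value taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # synthetic features
y = np.array([1] * 10 + [0] * 190)     # 5% positives, 95% negatives

# stratify=y keeps the class ratio the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

Without `stratify`, a rare class can end up under-represented or missing in the test partition purely by chance.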

✔️ Imbalance Handling:

Used SMOTE (Synthetic Minority Over-sampling Technique) on training set
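
To illustrate the idea behind SMOTE, the sketch below creates synthetic minority rows by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified NumPy illustration, not the notebook's code; in practice the `SMOTE` class from the imbalanced-learn package (with `fit_resample`) is the usual choice. The function name, `k=3`, and the data are assumptions.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))            # random minority sample
        neigh = neighbours[base, rng.integers(k)]  # one of its k nearest neighbours
        gap = rng.random()                         # position along the segment
        synthetic[i] = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return synthetic

# Invented minority-class rows (training set only).
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
X_synth = smote_like_oversample(X_minority, n_new=6)
```

Oversampling is applied to the training set only, so the test set keeps the original class distribution.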

📊 Exploratory Data Analysis

Visualizations: Histograms, KDE plots, bar charts, and heatmaps
Findings:
- Higher age and glucose levels correlate positively with stroke risk
- Hypertension and heart disease increase likelihood of stroke
- Gender and residence type have minimal impact

🤖 Models Used

1️⃣ Decision Tree Classifier

✔️ Transparent and interpretable
❌ Slightly lower accuracy on imbalanced data
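
One way to see the interpretability point is to print a fitted tree's rules with `export_text`. The sketch below uses synthetic data from `make_classification`; the feature names are hypothetical labels attached for readability, and `max_depth=2` is an assumption chosen to keep the output short.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data standing in for the stroke features.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

# A shallow tree keeps the printed rules readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Hypothetical names; the synthetic columns have no real meaning.
rules = export_text(
    tree, feature_names=["age", "avg_glucose_level", "bmi", "hypertension"]
)
print(rules)
```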

2️⃣ Random Forest Classifier (🏆 Best Model)

✅ High accuracy and better generalization
✅ Handles class imbalance well when trained on SMOTE-balanced data
✅ Offers feature importance insights

📈 Model Performance

| Model | Accuracy | Precision | F1 Score |
| --- | --- | --- | --- |
| Decision Tree | 89.5% | 68.2% | 74.4% |
| Random Forest | 93.7% | 75.9% | 81.1% |

📌 Random Forest outperformed the Decision Tree on all three reported metrics (accuracy, precision, and F1 score).
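
For reference, the reported metrics relate to confusion-matrix counts as shown in this small sketch. The counts are invented for illustration and are not the project's results; recall is included because the F1 score is the harmonic mean of precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented counts for a 1,000-row test set.
m = classification_metrics(tp=30, fp=10, fn=20, tn=940)
```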

🔍 Feature Importance

The top predictors identified by the Random Forest model:

Age 🥇
Average Glucose Level
BMI
Hypertension
Heart Disease
Smoking Status
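
A ranking like the one above can be read from a fitted forest's `feature_importances_` attribute. The sketch below uses synthetic data in which the first column is built to dominate the target; the feature names, `n_estimators=100`, and the data-generating rule are assumptions and do not reproduce the project's ranking.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["age", "avg_glucose_level", "bmi"]  # hypothetical labels
X = rng.normal(size=(500, 3))
# Synthetic target driven mostly by the first column, weakly by the second.
y = (X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pair each name with its impurity-based importance and sort descending.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

These importances are impurity-based and sum to 1 across features.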

🚀 Future Scope

🔹 Incorporate more clinical features (blood pressure, cholesterol, medication history)
🔹 Apply deep learning methods (e.g., Neural Networks, LSTM)
🔹 Develop a real-time decision support system for healthcare providers

🏗 Project Highlights

This project showcases the complete data science workflow for stroke risk prediction, including:
✔️ Data Preprocessing & Cleaning
✔️ Exploratory Data Analysis and Feature Engineering
✔️ Model Development and Evaluation (Decision Tree, Random Forest)
✔️ Report, Visualization, and Documentation
