zoehendershot/Stroke-Risk-Prediction

Using Machine Learning to Understand and Predict Stroke Risk

A comparative analysis of machine learning techniques applied to a dataset of demographic, physiological, and lifestyle factors, with two goals: predicting the likelihood of stroke and identifying which factors contribute most to stroke risk.

Contents of this repository

This repository contains the project data (including the train–test split), along with the Jupyter notebooks used for exploratory data analysis, model building, hyperparameter tuning, and evaluation. It also includes a folder of final output images generated during the exploration and analysis.

Software & platform

  • Python version: 3.11.4
  • Notebook environment: Jupyter Notebook / VS Code with Jupyter Extension
  • Key packages: pandas, numpy, scikit-learn, PyTorch, Matplotlib, Seaborn
  • Operating system: Cross-platform (Windows, macOS, Linux). The notebooks rely on widely used third-party Python libraries and relative file paths, so they should run on any system with the required packages installed.
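
Before opening the notebooks, it can help to confirm that the key packages listed above are installed. The following is a minimal sketch, not part of the repository. The distribution names in `REQUIRED` are assumed to be the usual PyPI names (for example `torch` for PyTorch and `scikit-learn` for scikit-learn), and `installed_versions` is a hypothetical helper introduced only for this illustration.

```python
import sys
from importlib import metadata

# Assumed PyPI distribution names for the packages listed in this README.
REQUIRED = ["pandas", "numpy", "scikit-learn", "torch", "matplotlib", "seaborn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The README states Python 3.11.4; print the running version for comparison.
    print("Python", sys.version.split()[0])
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` needs to be installed before the corresponding notebook will run.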

Map of documentation (folder tree)

Stroke-Risk-Prediction/
├── README.md
├── final_report.pdf
├── Notebooks/
│   ├── preprocessing.ipynb          # Data cleaning, feature engineering, train/test split
│   ├── EDA.ipynb                    # Exploratory data analysis and visualizations
│   ├── Logisticregression.ipynb     # Logistic regression model training
│   ├── RandomForest_Train.ipynb     # Random Forest model training and hyperparameter tuning
│   ├── SVC_PCA.ipynb                # Support Vector Classifier with PCA
│   ├── nn_train.ipynb               # Neural network model training
│   └── Final_Model.ipynb            # Final model evaluation and comparison
├── Data/
│   ├── healthcare-dataset-stroke-data.csv  # Original dataset
│   ├── X_train.csv                         # Preprocessed training features
│   ├── X_test.csv                          # Preprocessed test features
│   ├── y_train.csv                         # Training labels
│   └── y_test.csv                          # Test labels
└── Outputs/
    ├── age_by_stroke.png
    ├── age_distribution.png
    ├── avg_glucose_distribution.png
    ├── bmi_by_stroke.png
    ├── bmi_distribution_raw.png
    ├── correlation_heatmap.png
    ├── final_model_confusion_matrix.png
    ├── final_model_feature_importance.png
    ├── random_forest_decision_tree.png
    ├── stroke_by_gender.png
    ├── stroke_by_smoking_status.png
    ├── stroke_by_work_type.png
    └── stroke_distribution.png
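
Because the notebooks use relative file paths, a quick way to check a fresh clone is to verify that the data files and notebooks from the tree above are where they are expected. The sketch below is illustrative only. `EXPECTED` copies a subset of the paths from the tree, and `missing_files` is a hypothetical helper that is not part of the repository.

```python
from pathlib import Path

# Relative paths copied from the folder tree in this README.
EXPECTED = [
    "Data/healthcare-dataset-stroke-data.csv",
    "Data/X_train.csv",
    "Data/X_test.csv",
    "Data/y_train.csv",
    "Data/y_test.csv",
    "Notebooks/preprocessing.ipynb",
    "Notebooks/EDA.ipynb",
    "Notebooks/Logisticregression.ipynb",
    "Notebooks/RandomForest_Train.ipynb",
    "Notebooks/SVC_PCA.ipynb",
    "Notebooks/nn_train.ipynb",
    "Notebooks/Final_Model.ipynb",
]


def missing_files(repo_root):
    """Return the expected relative paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

Run it from the root of the cloned repository. An empty result means the layout matches the tree shown here.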
