This project builds a supervised machine learning pipeline to predict equipment failures using historical sensor data.
It covers every step from preprocessing through training and evaluation, all in a production-style modular setup.
predictive_maintenance/
│
├── data/              # Raw and processed CSV files
├── notebooks/         # EDA notebook
├── scripts/           # Feature engineering, training, evaluation scripts
├── models/            # Saved models (pkl)
├── output/            # Evaluation outputs (plots & predictions)
├── main.py            # One-click pipeline entry
├── README.md          # Project description
└── requirements.txt   # Required packages
Can we predict if a machine is going to fail based on sensor readings like torque, temperature, and speed?
This project answers that by:
- Engineering features from raw IoT-style telemetry data
- Training tree-based models (Decision Tree, Random Forest)
- Evaluating performance with accuracy, confusion matrix, and ROC-AUC
- Visualizing model insights (feature importance)
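
The training and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/`: it uses synthetic data from `make_classification` in place of the AI4I dataset, and the class imbalance, hyperparameters, split ratio, and variable names are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the sensor data: 5 features, rare positive class (assumed).
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.97, 0.03], random_state=42,
)

# Stratify so the rare "failure" class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters here are illustrative, not the project's settings.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(name, results[name]["accuracy"], results[name]["roc_auc"])

# One importance value per input column, taken from the fitted forest.
importances = models["Random Forest"].feature_importances_
```

ROC-AUC is computed from predicted probabilities rather than hard labels, which is why `predict_proba` is used alongside `predict`.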
Dataset:
- Source: AI4I 2020 Predictive Maintenance Dataset
- Size: ~10,000 samples
- Features: Temperature, Speed, Torque, Tool wear, etc.
- Label: `Machine failure` (0: No, 1: Yes)
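
To illustrate what feature engineering on these columns might look like, the sketch below derives two features from a couple of hand-written rows. The column names are assumed to match the public AI4I 2020 CSV and should be checked against the file in `data/`. The function name `add_features`, the derived columns `temp_diff_K` and `power_W`, and the sample values are hypothetical; they are not necessarily what the project's scripts compute.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two illustrative derived features."""
    out = df.copy()
    # Gap between process and ambient temperature.
    out["temp_diff_K"] = out["Process temperature [K]"] - out["Air temperature [K]"]
    # Mechanical power = torque * angular velocity (rpm converted to rad/s).
    out["power_W"] = out["Torque [Nm]"] * out["Rotational speed [rpm]"] * 2 * np.pi / 60
    return out


# Made-up rows for illustration only; not taken from the dataset.
sample = pd.DataFrame({
    "Air temperature [K]": [300.0, 298.0],
    "Process temperature [K]": [310.0, 309.5],
    "Rotational speed [rpm]": [1500, 3000],
    "Torque [Nm]": [40.0, 10.0],
    "Tool wear [min]": [0, 120],
    "Machine failure": [0, 1],
})

features = add_features(sample)
print(features[["temp_diff_K", "power_W"]])
```

Tree-based models can pick up such interactions on their own, so whether derived features like these help is something to confirm by comparing evaluation metrics with and without them.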
Tech stack:
- Python 3.8
- pandas, NumPy, matplotlib, seaborn
- scikit-learn, joblib
- ROC, AUC, confusion matrix, feature importance
How to run:
- Clone the repository
- Create a virtual environment and install the requirements: `pip install -r requirements.txt`
- Run the full pipeline: `python main.py`

This will:
- Preprocess the data
- Train models
- Generate evaluation outputs (in `output/`)
Results:
- Random Forest Accuracy: 99.9%
- Decision Tree Accuracy: 99.8%
- Both models performed well; Random Forest showed better precision with fewer false positives.
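
Precision and false positives are read from the confusion matrix rather than from accuracy. The sketch below shows the arithmetic for two models with the same recall but different false-positive counts. The counts are hypothetical, chosen only to make the calculation visible; they are not this project's results, and `precision_recall` is an illustrative helper.

```python
def precision_recall(tn: int, fp: int, fn: int, tp: int):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, accuracy


# Hypothetical counts on a 2000-sample test set (not the project's numbers).
model_a = precision_recall(tn=1930, fp=2, fn=8, tp=60)
model_b = precision_recall(tn=1926, fp=6, fn=8, tp=60)

for label, (p, r, a) in {"A": model_a, "B": model_b}.items():
    print(f"model {label}: precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
```

With counts like these the two accuracies differ by well under a percentage point while precision differs noticeably, which is why the confusion matrix is reported alongside accuracy.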
Outputs:
- `output/roc_curve.png`
- `output/feature_importance.png`
- `output/predictions.csv`
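
As a rough idea of how a model file in `models/` and a `predictions.csv` in `output/` could be produced, the sketch below saves a fitted model with `joblib` and writes predictions with pandas. It runs in a temporary directory standing in for the project root and uses synthetic data. The file name `random_forest.pkl` and the CSV column names are assumptions, not the project's actual layout.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Temporary directory standing in for the project root (assumption).
root = Path(tempfile.mkdtemp())
(root / "models").mkdir()
(root / "output").mkdir()

# Synthetic data in place of the processed CSV.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Persist the fitted model; the file name is hypothetical.
model_path = root / "models" / "random_forest.pkl"
joblib.dump(model, model_path)

# Write true labels, predicted labels, and failure probabilities.
predictions = pd.DataFrame({
    "y_true": y,
    "y_pred": model.predict(X),
    "failure_probability": model.predict_proba(X)[:, 1],
})
pred_path = root / "output" / "predictions.csv"
predictions.to_csv(pred_path, index=False)

# Reload to confirm the saved model reproduces the same predictions.
reloaded = joblib.load(model_path)
```

Files saved with `joblib` should be loaded with the same scikit-learn version that produced them, so pinning versions in `requirements.txt` helps keep saved models usable.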
Mehmet Ozturk
Feel free to connect: GitHub
Open to suggestions, ideas, and collaboration!
Follow the repo to stay updated with future ML projects.