Skip to content

PercyMayaba/Driver-Risk-Scoring-Model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Driver-Risk-Scoring-Model

This project implements a machine learning system to predict driver risk levels (Low, Medium, High) based on telematics data. The model helps insurance companies assign risk categories for personalized premium pricing using synthetic driving behavior data. Insurance Telematics Risk Assessment:

Predict driver risk levels from telematics features

Enable data-driven premium pricing

Reduce claims through risk-aware pricing

Improve customer segmentation

📊 Features Used

The model analyzes 8 key telematics features:

Speed Variance - Variability in driving speed (km/h)

Harsh Braking Events - Sudden braking incidents per week

Hard Acceleration Count - Rapid acceleration events per week

Night Driving Frequency - Percentage of trips during night hours

Cornering Intensity - Aggressiveness in taking turns (1-5 scale)

Mileage Per Trip - Average distance per driving session (km)

Rapid Lane Changes - Quick lane switching events per week

Idle Time Ratio - Percentage of time spent idling

🏗️ Model Architecture Algorithms Implemented:

Random Forest Classifier

XGBoost Classifier

Neural Network (3-layer DNN)

Performance Metrics:

Accuracy Score

Classification Report

Confusion Matrix

Cross-validation

📁 Project Structure text

driver-risk-scoring/ ├── driver_risk_scoring.ipynb # Main Colab notebook ├── driver_risk_model.pkl # Trained model (output) ├── scaler.pkl # Feature scaler (output) ├── feature_names.txt # Feature list (output) └── README.md # This file

🚀 Quick Start

  1. Installation & Setup python

Run in Colab cell

!pip install sdv xgboost tensorflow scikit-learn pandas numpy matplotlib seaborn

  1. Data Generation

    Synthetic dataset of 10,000 drivers

    8 telematics features with realistic distributions

    3 risk categories: Low, Medium, High

  2. Model Training python

Example training code

from xgboost import XGBClassifier model = XGBClassifier(n_estimators=100, random_state=42) model.fit(X_train, y_train)

  1. Risk Prediction python

Predict risk for a new driver

sample_driver = [15.3, 3.2, 4.5, 0.4, 2.5, 22.1, 2.8, 0.15] risk_prediction = model.predict([sample_driver]) premium_multiplier = calculate_premium(risk_prediction)

💡 Key Features 🔍 Exploratory Data Analysis

Feature distributions by risk level

Correlation matrix visualization

Risk category balance analysis

🤖 Multiple ML Models

Random Forest: Robust ensemble method

XGBoost: High-performance gradient boosting

Neural Network: Deep learning approach

📈 Model Evaluation

Accuracy comparison across algorithms

Feature importance analysis

Confusion matrices

Business impact assessment

💼 Insurance Application

Risk-based premium calculation

Premium multipliers:

    Low Risk: 0.8x (20% discount)

    Medium Risk: 1.0x (standard premium)

    High Risk: 1.5x (50% surcharge)

📊 Results Summary Model Performance:

XGBoost: ~95% accuracy (recommended)

Random Forest: ~94% accuracy

Neural Network: ~92% accuracy

Top Predictive Features:

Harsh Braking Events

Speed Variance

Night Driving Frequency

Business Impact:

Risk-adjusted premium pricing

Fair customer segmentation

Data-driven underwriting decisions

🛠️ Usage Examples Individual Risk Assessment python

Assess a single driver

driver_data = [8.2, 1.5, 2.1, 0.2, 1.8, 12.3, 1.2, 0.08] result = calculate_risk_score_and_premium(driver_data, model)

print(f"Risk Level: {result['predicted_risk']}") print(f"Premium Multiplier: {result['suggested_premium_multiplier']}x")

Batch Processing python

Process multiple drivers

risk_predictions = model.predict(batch_driver_data) premiums = [calculate_premium(pred) for pred in risk_predictions]

📈 Visualizations

The project includes comprehensive visualizations:

Risk distribution charts

Feature importance plots

Model accuracy comparisons

Training history graphs (Neural Network)

Correlation heatmaps

🔮 Future Enhancements

Real telematics data integration

Time-series analysis of driving patterns

Geospatial risk factors

Weather condition integration

Real-time risk scoring API

Mobile app integration

📚 Technical Details Data Synthesis:

Generated using sklearn's make_classification

Realistic value ranges based on industry data

Balanced risk categories

Model Selection:

XGBoost chosen for best performance

Hyperparameter tuning ready

Cross-validation implemented

Deployment Ready:

Model serialization with joblib

Feature scaling pipeline

Production inference code

👥 Target Audience

Insurance Companies: Risk assessment teams

Data Scientists: ML model development

Product Managers: Insurance product design

Researchers: Telematics and risk modeling

📄 License

This project is for educational and demonstration purposes. Adapt for commercial use with proper validation and compliance with insurance regulations.

Note: This implementation uses synthetic data. For production use, validate with real telematics data and ensure compliance with local insurance regulations and data privacy laws.

About

This project implements a machine learning system to predict driver risk levels (Low, Medium, High) based on telematics data. The model helps insurance companies assign risk categories for personalized premium pricing using synthetic driving behavior data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors