avrtt/anomscoring-PoC

This repository contains a financial fraud detection and credit risk scoring system. It applies anomaly detection, ensemble methods, deep neural networks, gradient boosting, custom metric evaluation (ROC AUC and Kolmogorov-Smirnov) and explainability techniques to synthetic financial data in order to simulate real-world risk analytics. The project is a proof of concept derived from my freelance work and published with the client's permission; all sensitive data has been replaced with synthetic examples to protect privacy.

The project combines two modules:

  • Fraud detection module
    Uses transaction logs (with features such as transaction amount, time and risk scores) to train several models, including logistic regression, a random forest, a voting classifier ensemble, neural networks and an isolation forest for anomaly detection. Explainability is provided via SHAP (if installed).
  • Credit risk scoring module
    Generates synthetic customer data (income, credit score, loan amounts, etc.) to build probability-of-default models with logistic regression and XGBoost, then combines them in an ensemble. It computes key metrics such as ROC AUC and the KS statistic and plots ROC curves.
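
As a rough illustration of the two metrics named above, the sketch below computes ROC AUC and the KS statistic from binary labels and predicted scores using only the standard library. It is not the repository's implementation, which presumably relies on scikit-learn and scipy; the function names `roc_auc` and `ks_statistic` and the toy data are hypothetical.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a sketch but slow on large datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the cumulative score distributions of the classes.

    Equivalent to max |TPR - FPR| over all thresholds of the ROC curve.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Sweep thresholds from the highest score down.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before measuring the gap.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


# Toy labels (1 = default/fraud) and model scores, invented for illustration.
y_demo = [0, 0, 1, 1, 0, 1]
s_demo = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(f"AUC={roc_auc(y_demo, s_demo):.3f}  KS={ks_statistic(y_demo, s_demo):.3f}")
```

On this toy data the AUC is 8/9 and the KS statistic is 2/3. A KS value close to 1 means the score distributions of the two classes barely overlap.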

Features:

  • Data generation
    Synthetic data creation for both transaction logs and customer credit history.
  • Fraud detection
    • Preprocessing, feature scaling and handling missing values.
    • Training of multiple models: logistic regression, random forest (tuned with GridSearch), neural network (Keras), voting classifier ensemble and isolation forest for anomaly detection.
    • Model explainability using SHAP.
    • Visualization of ROC curves.
  • Credit risk scoring
    • Preprocessing with scaling and missing value imputation.
    • Training of logistic regression and XGBoost models (with hyperparameter tuning via GridSearch), plus an ensemble that combines them.
    • Calculation of custom metrics (ROC AUC, KS statistic) and plotting of ROC curves.
  • Utilities & testing
    Utility modules for data processing, model saving/loading and custom metric computation, plus unit tests for each module.
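
The data generation and preprocessing steps listed above could look roughly like the following standard-library sketch. Everything in it is an assumption made for illustration: the column names, the distributions, the fraud and missing-value rates, and the helper names `generate_transactions` and `impute_and_scale` are not taken from the repository, which most likely uses numpy, pandas and scikit-learn for these steps.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]
FEATURES = ("amount", "hour", "risk_score")  # hypothetical column names


def generate_transactions(n: int, fraud_rate: float = 0.05,
                          missing_rate: float = 0.02, seed: int = 42) -> List[Row]:
    """Create n synthetic transactions, some with missing feature values."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        is_fraud = 1 if rng.random() < fraud_rate else 0
        # Assumption: fraudulent transactions tend to be larger and riskier.
        row: Row = {
            "amount": rng.lognormvariate(5.0 if is_fraud else 3.5, 1.0),
            "hour": rng.uniform(0.0, 24.0),
            "risk_score": min(1.0, max(0.0, rng.gauss(0.7 if is_fraud else 0.3, 0.15))),
            "is_fraud": float(is_fraud),
        }
        # Knock out feature values at random to mimic incomplete logs.
        for key in FEATURES:
            if rng.random() < missing_rate:
                row[key] = None
        rows.append(row)
    return rows


def impute_and_scale(rows: Sequence[Row], features: Sequence[str]) -> List[Row]:
    """Median-impute missing values, then standardize each feature."""
    out = [dict(r) for r in rows]  # copy so the input rows stay untouched
    for f in features:
        observed = [r[f] for r in out if r[f] is not None]
        median = statistics.median(observed)
        filled = [r[f] if r[f] is not None else median for r in out]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled) or 1.0  # guard against constant columns
        for r, value in zip(out, filled):
            r[f] = (value - mean) / std
    return out


raw = generate_transactions(500)
clean = impute_and_scale(raw, FEATURES)
print(clean[0])
```

In a real pipeline the median, mean and standard deviation would be fitted on the training split only and then reused on the test split, so that no test-set information leaks into preprocessing.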

Dependencies

numpy 
pandas 
scikit-learn 
xgboost 
tensorflow 
keras 
matplotlib 
scipy 
joblib 
shap

Installation

  1. Clone:

    git clone git@github.com:avrtt/anomscoring-PoC.git
    cd anomscoring-PoC
  2. Create a venv:

    python -m venv venv
    source venv/bin/activate # Windows: venv\Scripts\activate
  3. Install the dependencies, either from a requirements.txt that you create or manually with:

    pip install numpy pandas scikit-learn xgboost tensorflow keras matplotlib scipy joblib shap
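
To confirm that the environment is complete after installation, a small check like the one below may help. It is a sketch, not a script shipped with the repository: the helper name `missing_packages` is hypothetical, and the split into required and optional packages is an assumption based only on the note that SHAP is used "if installed".

```python
from importlib.util import find_spec
from typing import Dict, List

# pip name -> import name (they differ for scikit-learn).
REQUIRED: Dict[str, str] = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
    "tensorflow": "tensorflow",
    "keras": "keras",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "joblib": "joblib",
}
# Assumption: only SHAP is optional.
OPTIONAL: Dict[str, str] = {"shap": "shap"}


def missing_packages(packages: Dict[str, str]) -> List[str]:
    """Return the pip names whose import name cannot be found."""
    # find_spec looks a module up without importing it, so this stays fast
    # even for heavy packages such as tensorflow.
    return [pip_name for pip_name, module in packages.items()
            if find_spec(module) is None]


required_missing = missing_packages(REQUIRED)
optional_missing = missing_packages(OPTIONAL)
if required_missing:
    print("Missing required packages:", ", ".join(required_missing))
if optional_missing:
    print("Missing optional packages:", ", ".join(optional_missing))
if not required_missing and not optional_missing:
    print("All dependencies found.")
```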

Usage

Run the integrated system:

python main.py --module all

Or run individual modules:

python fraud_detection.py
python credit_risk_scoring.py
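
For orientation, an entry point that accepts the `--module` flag shown above could be structured along these lines. This is a hypothetical sketch and not the actual `main.py`: only the value `all` appears in this README, so the choices `fraud` and `credit`, the default value and the placeholder functions `run_fraud_detection` and `run_credit_risk` are assumptions.

```python
import argparse
from typing import Callable, Dict, List, Optional


def run_fraud_detection() -> str:
    # Placeholder: the real module would train and evaluate the fraud models.
    print("Running fraud detection module")
    return "fraud"


def run_credit_risk() -> str:
    # Placeholder: the real module would build the probability-of-default models.
    print("Running credit risk scoring module")
    return "credit"


# Hypothetical module names mapped to their entry points.
MODULES: Dict[str, Callable[[], str]] = {
    "fraud": run_fraud_detection,
    "credit": run_credit_risk,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Fraud detection and credit risk scoring PoC")
    parser.add_argument(
        "--module",
        choices=[*MODULES, "all"],
        default="all",  # assumption: run everything when no flag is given
        help="which module to run",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> List[str]:
    """Run the selected module(s) and return the names of those that ran."""
    args = build_parser().parse_args(argv)
    selected = list(MODULES) if args.module == "all" else [args.module]
    return [MODULES[name]() for name in selected]


if __name__ == "__main__":
    main()
```

Keeping the dispatch table in one dictionary means that a new module only needs one extra entry to become available from the command line.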

Contribution

Feel free to open PRs and issues.

License

MIT
