This repository contains a financial fraud detection and credit risk scoring system. It applies anomaly detection, ensemble methods, deep neural networks, gradient boosting, custom metric evaluation (ROC AUC and Kolmogorov-Smirnov) and explainability techniques to synthetic financial data in order to simulate real-world risk analytics. The project is a proof of concept drawn from my freelance work and is published with the client's permission; all sensitive data has been replaced with synthetic examples for privacy.
The project combines two modules:
- Fraud detection module
  Uses transaction logs (with features such as transaction amount, time and risk scores) to train models, including logistic regression, random forest, an ensemble voting classifier, neural networks and an isolation forest for anomaly detection. Explainability is provided via SHAP (if installed).
- Credit risk scoring module
  Generates synthetic customer data (income, credit score, loan amounts, etc.) to build probability-of-default models using logistic regression and XGBoost, then combines them with ensemble methods. It computes key metrics such as ROC AUC and the KS statistic and plots ROC curves.
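As a rough illustration of the fraud detection module described above, the following sketch generates a small synthetic transaction set, trains a soft-voting ensemble (logistic regression plus random forest) and fits an isolation forest without labels. It is not the repository's code: the feature distributions, sample sizes, `contamination` value and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical synthetic transactions: amount, hour of day, upstream risk score.
n_legit, n_fraud = 1900, 100
legit = np.column_stack([
    rng.lognormal(mean=3.0, sigma=0.5, size=n_legit),  # typical amounts
    rng.integers(8, 22, size=n_legit),                 # daytime hours
    rng.beta(2, 8, size=n_legit),                      # mostly low risk scores
])
fraud = np.column_stack([
    rng.lognormal(mean=5.0, sigma=0.7, size=n_fraud),  # larger amounts
    rng.integers(0, 6, size=n_fraud),                  # night-time hours
    rng.beta(6, 3, size=n_fraud),                      # mostly high risk scores
])
X = np.vstack([legit, fraud])
y = np.concatenate([np.zeros(n_legit, dtype=int), np.ones(n_fraud, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Supervised part: soft voting averages the predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
ensemble_auc = roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1])

# Unsupervised part: the isolation forest never sees the labels.
iso = IsolationForest(contamination=0.05, random_state=42)
iso.fit(X_train)
# score_samples is higher for normal points, so negate it to get an anomaly score.
anomaly_score = -iso.score_samples(X_test)
iso_auc = roc_auc_score(y_test, anomaly_score)

print(f"Voting ensemble ROC AUC:  {ensemble_auc:.3f}")
print(f"Isolation forest ROC AUC: {iso_auc:.3f}")
```

The labels are used for the isolation forest only at evaluation time, which is what makes it useful as a complement to the supervised models when fraud labels are scarce or delayed.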
Features:
- Data generation
  Synthetic data creation for both transaction logs and customer credit history.
- Fraud detection
  - Preprocessing, feature scaling and handling of missing values.
  - Training of multiple models: logistic regression, random forest (with grid search), neural network (Keras), voting classifier ensemble and anomaly detection using an isolation forest.
  - Model explainability using SHAP.
  - Visualization of ROC curves.
- Credit risk scoring
  - Preprocessing with scaling and missing value imputation.
  - Training of logistic regression and XGBoost models (with hyperparameter tuning via grid search) plus an ensemble combination.
  - Calculation of custom metrics (ROC AUC, KS statistic) and plotting of ROC curves.
- Utilities & testing
  Utility modules for data processing, model saving/loading and custom metric computation, plus unit tests for each module.
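The two custom metrics named above are closely related: the KS statistic is the largest gap between the cumulative score distributions of defaulters and non-defaulters, which equals the maximum of |TPR - FPR| along the ROC curve. The sketch below computes both with NumPy only. The function names `roc_points`, `roc_auc` and `ks_statistic` are hypothetical and are not taken from the project's utility modules; the code assumes binary labels with both classes present.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (fpr, tpr) arrays evaluated at every distinct score threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="mergesort")  # descending by score
    y_sorted = y_true[order]
    s_sorted = y_score[order]
    # Keep only the last index of each run of tied scores.
    last_of_run = np.r_[np.nonzero(np.diff(s_sorted))[0], y_sorted.size - 1]
    tps = np.cumsum(y_sorted == 1)[last_of_run]
    fps = np.cumsum(y_sorted == 0)[last_of_run]
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr

def roc_auc(y_true, y_score):
    """Area under the ROC curve via the trapezoidal rule."""
    fpr, tpr = roc_points(y_true, y_score)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def ks_statistic(y_true, y_score):
    """Maximum vertical distance between the TPR and FPR curves."""
    fpr, tpr = roc_points(y_true, y_score)
    return float(np.max(np.abs(tpr - fpr)))

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))       # 0.75
print(ks_statistic(y_true, y_score))  # 0.5
```

In practice `sklearn.metrics.roc_auc_score` and `scipy.stats.ks_2samp` (applied to the scores of each class) give the same quantities; the hand-written version is shown only to make the relationship between the two metrics explicit.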
Dependencies:
- numpy
- pandas
- scikit-learn
- xgboost
- tensorflow
- keras
- matplotlib
- scipy
- joblib
- shap
Installation:
- Clone:
  git clone [email protected]:avrtt/anomscoring-PoC.git
  cd anomscoring-PoC
- Create a venv:
  python -m venv venv
  source venv/bin/activate # Windows: venv\Scripts\activate
- Install dependencies. Create a requirements.txt or install manually with:
  pip install numpy pandas scikit-learn xgboost tensorflow keras matplotlib scipy joblib shap
Run the integrated system:
python main.py --module all
Or run individual modules:
python fraud_detection.py
python credit_risk_scoring.py
Feel free to open PRs and issues.
License: MIT