Real-time UPI fraud detection system using XGBoost, SVM, and ensemble learning

TheCodingAyush/UPI-Fraud-Detection

UPI Fraud Detection System

A full-stack machine learning project that detects fraudulent UPI-like transactions in real time. It includes a model training pipeline, a Flask API server, and a modern web UI for interactive testing.


Features

  • Real-time fraud scoring via REST API
  • Ensemble learning: XGBoost (with SMOTE + PCA), Logistic Regression, Decision Tree (+ optional SVM)
  • Robust handling of class imbalance using SMOTE
  • Dimensionality reduction with PCA for speed and generalization
  • Weighted ensemble with rule-based red flags (time and balance consistency checks)
  • Clean web UI served from Flask for quick experiments

Project Structure

upi_fraud_detection/
├── app.py                     # Flask API server (serves UI + predictions)
├── train_models.py            # Training pipeline (SMOTE, PCA, XGBoost + others)
├── test_api.py                # Simple script to exercise /predict
├── requirements.txt           # Python dependencies
├── .gitignore                 # Git ignore rules
├── data/
│   └── PS_20174392719_1491204439457_log.csv  # PaySim dataset
├── frontend/
│   ├── index.html             # Web interface
│   ├── upi_directory.json     # Sample UPI directory data
│   ├── css/
│   │   └── styles.css         # Complete stylesheet with animations
│   └── js/
│       ├── main.js            # Application logic and API calls
│       └── animations.js      # GSAP and AOS animation setup
└── models/                    # Trained model artifacts (created by training)
    └── fraud_detector_all.pkl # Saved models + preprocessors

Dataset

  • PaySim (synthetic mobile money transactions) from Kaggle: https://www.kaggle.com/datasets/ealaxi/paysim1
  • Highly imbalanced: fraud ≪ non-fraud
  • Note: The dataset is not included in this repository because of its size (470 MB). Download it from Kaggle and place the CSV file in the data/ folder before training the models.
  • Key engineered features used:
    • hour, day from step
    • errorBalanceOrig = newbalanceOrig + amount - oldbalanceOrg
    • errorBalanceDest = oldbalanceDest + amount - newbalanceDest
    • Transaction type encoded as type_encoded
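
The engineered features above can be sketched in plain Python for a single PaySim row. This is a minimal sketch, not the code in train_models.py. It assumes that one step is one hour of simulated time (so hour = step % 24 and day = step // 24), that the balance terms read newbalanceOrig + amount - oldbalanceOrg and oldbalanceDest + amount - newbalanceDest, and that TYPE_CODES is a hypothetical stand-in for the fitted label encoder.

```python
# Hypothetical stand-in for the fitted label encoder (PaySim types in alphabetical order).
TYPE_CODES = {"CASH_IN": 0, "CASH_OUT": 1, "DEBIT": 2, "PAYMENT": 3, "TRANSFER": 4}


def engineer_features(row):
    """Return the engineered features for one PaySim-style transaction dict."""
    step = row["step"]
    amount = row["amount"]
    return {
        # Assumption: one step is one hour of simulated time.
        "hour": step % 24,
        "day": step // 24,
        # Both error terms are zero when the balances account for the amount exactly.
        "errorBalanceOrig": row["newbalanceOrig"] + amount - row["oldbalanceOrg"],
        "errorBalanceDest": row["oldbalanceDest"] + amount - row["newbalanceDest"],
        "type_encoded": TYPE_CODES[row["type"]],
    }


example = {
    "step": 38, "type": "TRANSFER", "amount": 100000,
    "oldbalanceOrg": 150000, "newbalanceOrig": 50000,
    "oldbalanceDest": 0, "newbalanceDest": 100000,
}
features = engineer_features(example)
```

For this consistent transfer both error terms come out as zero; a non-zero value means the balances do not account for the amount, which is the signal the balance consistency checks rely on.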

Quick Start

  1. Install dependencies
pip install -r requirements.txt
  2. Train models (creates models/fraud_detector_all.pkl)
python train_models.py
  3. Run the API server
python app.py
  4. Open the UI

Optional: Test the API from script

python test_api.py

Training Pipeline (train_models.py)

  • Loads CSV from data/
  • Feature engineering: time splits + balance consistency features
  • Split: stratified 80/20 train/test
  • Scale features (StandardScaler)
  • Paper 1 model:
    • Apply SMOTE (handle class imbalance)
    • Apply PCA (retain 95% variance)
    • Train XGBoost (AUC eval)
  • Paper 2 models:
    • Logistic Regression
    • Decision Tree
    • SVM (skipped by default for speed; placeholder in ensemble)
  • Saves everything (models + scaler + PCA + label encoder) to models/fraud_detector_all.pkl

Outputs printed:

  • Accuracy, AUC-ROC, and classification report for XGBoost
  • Accuracies for LR, DT, and ensemble
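
The order of the steps above (stratified split, scaling, resampling, PCA, model fit) can be sketched with scikit-learn on synthetic data. This is an illustration under stated assumptions, not the contents of train_models.py: the data is random, run_pipeline is a hypothetical name, Logistic Regression stands in for XGBoost, and the SMOTE step is only marked with a comment so the sketch needs nothing beyond NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def run_pipeline(seed=42):
    """Split, scale, reduce, and fit on synthetic imbalanced data."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for the engineered PaySim features: 5% positives.
    y = np.zeros(1000, dtype=int)
    y[:50] = 1
    X = rng.normal(size=(1000, 8))
    X[y == 1] += 1.5  # shift the fraud rows so the classes are separable

    # Stratified 80/20 split keeps the fraud ratio the same in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Fit the scaler and PCA on the training part only, then reuse them.
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
    # SMOTE (imbalanced-learn) would resample X_train_s / y_train here.
    pca = PCA(n_components=0.95).fit(X_train_s)  # keep 95% of the variance
    X_train_p, X_test_p = pca.transform(X_train_s), pca.transform(X_test_s)

    # Logistic Regression stands in for XGBoost to keep dependencies light.
    model = LogisticRegression(max_iter=1000).fit(X_train_p, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test_p)[:, 1])
    return {
        "n_train": len(y_train),
        "n_test": len(y_test),
        "n_fraud_test": int(y_test.sum()),
        "n_components": int(pca.n_components_),
        "auc": float(auc),
    }


result = run_pipeline()
```

Fitting the scaler and PCA on the training split only, and resampling only the training split, keeps information from the test rows out of the fitted preprocessors.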

API Server (app.py)

  • Serves the UI (frontend/index.html)
  • Loads saved artifacts from models/fraud_detector_all.pkl
  • Endpoints:

GET /health

Health check

Response:

{
  "status": "healthy",
  "models_loaded": true
}

GET /model-info

Model metadata and claimed accuracies

GET /upi_directory.json

Serves static UPI directory JSON (used by UI)

POST /predict

Score a single transaction. Request body:

{
  "amount": 100000,
  "oldbalanceOrg": 150000,
  "newbalanceOrig": 50000,
  "oldbalanceDest": 0,
  "newbalanceDest": 100000,
  "transactionType": "TRANSFER",
  "hour": 14,
  "day": 15
}

Response (abridged):

{
  "success": true,
  "results": {
    "paper1": { "algorithm": "XGBoost + SMOTE + PCA", "probability": 0.87, "prediction": 1 },
    "paper2_lr": { "algorithm": "Logistic Regression", "probability": 0.62, "prediction": 1 },
    "paper2_svm": { "algorithm": "Support Vector Machine (Not Trained)", "probability": 0.62, "prediction": 1 },
    "paper2_dt": { "algorithm": "Decision Tree", "probability": 0.58, "prediction": 1 },
    "ensemble": { "algorithm": "Weighted Ensemble (XGBoost 40%)", "probability": 0.73, "prediction": 1, "recommendation": "High Risk - Manual review required" }
  },
  "transaction": { /* echoed request */ }
}

cURL example:

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "amount": 50000,
    "oldbalanceOrg": 10000,
    "newbalanceOrig": -40000,
    "oldbalanceDest": 0,
    "newbalanceDest": 50000,
    "transactionType": "PAYMENT",
    "hour": 14,
    "day": 15
  }'
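
The same request can be sent from Python with only the standard library. This is a minimal sketch that assumes the server is reachable at http://localhost:5000, as in the cURL example; build_predict_request and score are hypothetical helper names, not functions from test_api.py.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # same host and port as the cURL example


def build_predict_request(transaction, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request for one transaction."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(transaction, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(transaction, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


transaction = {
    "amount": 50000,
    "oldbalanceOrg": 10000,
    "newbalanceOrig": -40000,
    "oldbalanceDest": 0,
    "newbalanceDest": 50000,
    "transactionType": "PAYMENT",
    "hour": 14,
    "day": 15,
}
request = build_predict_request(transaction)

# With the server running:
# result = score(transaction)
# print(result["results"]["ensemble"]["recommendation"])
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.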

Frontend (frontend/)

Modern, responsive web interface with professional styling and animations:

  • index.html: Main UI with two modes - "Check Past Transaction" and "Check & Pay"
  • css/styles.css: Complete stylesheet with navy/teal color palette, gradients, shadows, and keyframe animations
  • js/main.js: Application logic, API calls, UPI directory search, and result rendering
  • js/animations.js: GSAP and AOS (Animate On Scroll) animation orchestration
  • upi_directory.json: Sample UPI directory for recipient validation

Features:

  • Quick test buttons for common scenarios (normal, suspicious, high-risk)
  • Real-time UPI ID validation and search
  • Transaction history with local storage
  • Animated result cards with risk gauges and confidence scores
  • Fully responsive design for mobile and desktop

Modeling Details

  • Paper 1: XGBoost with SMOTE + PCA (best overall AUC/accuracy)
  • Paper 2: Logistic Regression, Decision Tree, (SVM optional)
  • Weighted ensemble for final decision:
    • XGBoost 40%, SVM 20%, LR 20%, DT 20% (SVM falls back to LR if skipped)
  • Rule-based checks add extra signals:
    • Large-amount anomalies
    • Balance inconsistencies (errorBalanceOrig, errorBalanceDest)
    • Off-hours transactions (e.g., hour < 6 or hour > 22)
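
The weighting and the rule-based checks above can be sketched as two small functions. This is an illustration, not the logic in app.py: the weights and the off-hours bounds come from the list above, while the function names, the large_amount threshold, and the balance tolerance are hypothetical. How the red flags adjust the final ensemble score is not specified here, so the sketch returns them separately.

```python
WEIGHTS = {"xgboost": 0.4, "svm": 0.2, "lr": 0.2, "dt": 0.2}


def ensemble_probability(probs):
    """Weighted average of per-model fraud probabilities.

    If the SVM was skipped (missing or None), its weight is applied to the
    Logistic Regression probability instead.
    """
    filled = dict(probs)
    if filled.get("svm") is None:
        filled["svm"] = filled["lr"]
    return sum(WEIGHTS[name] * filled[name] for name in WEIGHTS)


def red_flags(tx, large_amount=200000, tolerance=0.01):
    """Return rule-based warnings; both thresholds are hypothetical."""
    flags = []
    if tx["amount"] >= large_amount:
        flags.append("large amount")
    error_orig = tx["newbalanceOrig"] + tx["amount"] - tx["oldbalanceOrg"]
    error_dest = tx["oldbalanceDest"] + tx["amount"] - tx["newbalanceDest"]
    if abs(error_orig) > tolerance or abs(error_dest) > tolerance:
        flags.append("balance inconsistency")
    if tx["hour"] < 6 or tx["hour"] > 22:
        flags.append("off-hours")
    return flags


# Per-model probabilities from the example /predict response, SVM skipped.
combined = ensemble_probability({"xgboost": 0.87, "lr": 0.62, "dt": 0.58})

suspicious = {
    "amount": 300000, "oldbalanceOrg": 300000, "newbalanceOrig": 300000,
    "oldbalanceDest": 0, "newbalanceDest": 0, "hour": 23,
}
flags = red_flags(suspicious)
```

With the example probabilities the plain weighted average is about 0.712. The example response reports 0.73 for the ensemble, so the server presumably applies an adjustment that this sketch does not model.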

Troubleshooting

  • "Models not found" when starting app.py:
    • Run training first: python train_models.py
    • Confirm models/fraud_detector_all.pkl exists
  • Version issues installing wheels (Windows):
    • Ensure a recent Python and pip; optionally create a fresh venv
  • Server starts but UI empty:
  • 500 error on /predict:
    • Check that the request fields match the example request body under POST /predict
    • Ensure scaler/PCA/model versions match the trained artifacts
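
When chasing a 500 error, it can help to check a payload against the documented request body before sending it. The sketch below is a hypothetical client-side check: the field names and types are taken from the example request, while validate_payload and its rules are assumptions and do not describe what the server itself validates.

```python
# Field names and types as they appear in the example /predict request body.
REQUIRED_FIELDS = {
    "amount": (int, float),
    "oldbalanceOrg": (int, float),
    "newbalanceOrig": (int, float),
    "oldbalanceDest": (int, float),
    "newbalanceDest": (int, float),
    "transactionType": (str,),
    "hour": (int,),
    "day": (int,),
}


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for name, types in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    hour = payload.get("hour")
    if isinstance(hour, int) and not isinstance(hour, bool) and not 0 <= hour <= 23:
        problems.append("hour must be between 0 and 23")
    return problems


good = {
    "amount": 100000, "oldbalanceOrg": 150000, "newbalanceOrig": 50000,
    "oldbalanceDest": 0, "newbalanceDest": 100000,
    "transactionType": "TRANSFER", "hour": 14, "day": 15,
}
bad = {key: value for key, value in good.items() if key != "day"}
bad["amount"] = "100000"  # a string instead of a number

good_problems = validate_payload(good)
bad_problems = validate_payload(bad)
```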

Roadmap

  • Optional SVM training switch and progress logging
  • Persisted evaluation artifacts (confusion matrix, ROC plots)
  • Threshold tuning by segment (type/amount bands)
  • Dockerfile + compose for easy deploy
  • Authentication and rate limiting for API

License

Add a license file if you plan to open source or distribute.
