UPI Fraud Detection System

A full-stack machine learning project that detects fraudulent UPI-like transactions in real time. It includes a model training pipeline, a Flask API server, and a modern web UI for interactive testing.

Features

Real-time fraud scoring via REST API
Ensemble learning: XGBoost (with SMOTE + PCA), Logistic Regression, Decision Tree (+ optional SVM)
Robust handling of class imbalance using SMOTE
Dimensionality reduction with PCA for speed and generalization
Weighted ensemble with rule-based red flags (time and balance consistency checks)
Clean web UI served from Flask for quick experiments

Project Structure

upi_fraud_detection/
├── app.py                     # Flask API server (serves UI + predictions)
├── train_models.py            # Training pipeline (SMOTE, PCA, XGBoost + others)
├── test_api.py                # Simple script to exercise /predict
├── requirements.txt           # Python dependencies
├── .gitignore                 # Git ignore rules
├── data/
│   └── PS_20174392719_1491204439457_log.csv  # PaySim dataset
├── frontend/
│   ├── index.html             # Web interface
│   ├── upi_directory.json     # Sample UPI directory data
│   ├── css/
│   │   └── styles.css         # Complete stylesheet with animations
│   └── js/
│       ├── main.js            # Application logic and API calls
│       └── animations.js      # GSAP and AOS animation setup
└── models/                    # Trained model artifacts (created by training)
    └── fraud_detector_all.pkl # Saved models + preprocessors

Dataset

PaySim (synthetic mobile money transactions) from Kaggle: https://www.kaggle.com/datasets/ealaxi/paysim1
Highly imbalanced: fraud ≪ non-fraud
Note: The dataset is not included in this repository due to its size (470 MB). Download it from Kaggle and place the CSV file (PS_20174392719_1491204439457_log.csv) in the data/ folder before training the models.
Key engineered features used:
- hour and day, derived from step (in PaySim, one step is one hour)
- errorBalanceOrig = newbalanceOrig + amount − oldbalanceOrg
- errorBalanceDest = oldbalanceDest + amount − newbalanceDest
- Transaction type encoded as type_encoded
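The engineered features listed above can be sketched for a single PaySim row as follows. This is a minimal illustration, not the code in train_models.py: the function name `engineer_features`, the `step % 24` and `step // 24` split, and the `TYPE_CODES` mapping are assumptions (the real type codes come from the label encoder fitted during training).

```python
from typing import Dict, Union

Number = Union[int, float]

# Hypothetical encoding of PaySim transaction types. The project stores a
# fitted label encoder in the model artifact, which defines the real codes.
TYPE_CODES: Dict[str, int] = {
    "CASH_IN": 0,
    "CASH_OUT": 1,
    "DEBIT": 2,
    "PAYMENT": 3,
    "TRANSFER": 4,
}


def engineer_features(row: Dict[str, Union[Number, str]]) -> Dict[str, Number]:
    """Derive the time and balance-consistency features for one transaction."""
    step = int(row["step"])
    amount = row["amount"]
    return {
        # Assumption: one step is one hour, so hours wrap every 24 steps.
        "hour": step % 24,
        "day": step // 24,
        # Both errors are 0 when the balances move by exactly `amount`.
        "errorBalanceOrig": row["newbalanceOrig"] + amount - row["oldbalanceOrg"],
        "errorBalanceDest": row["oldbalanceDest"] + amount - row["newbalanceDest"],
        "type_encoded": TYPE_CODES[str(row["type"])],
    }


example_row = {
    "step": 38,
    "type": "TRANSFER",
    "amount": 100000,
    "oldbalanceOrg": 150000,
    "newbalanceOrig": 50000,
    "oldbalanceDest": 0,
    "newbalanceDest": 100000,
}
features = engineer_features(example_row)
```

A non-zero errorBalanceOrig or errorBalanceDest means the recorded balances do not account for the transferred amount, which is the signal the balance consistency checks rely on.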

Quick Start

Install dependencies

pip install -r requirements.txt

Train models (creates models/fraud_detector_all.pkl)

python train_models.py

Run the API server

python app.py

Open the UI

Navigate to: http://localhost:5000

Optional: Test the API from script

python test_api.py

Training Pipeline (train_models.py)

Loads CSV from data/
Feature engineering: time splits + balance consistency features
Split: stratified 80/20 train/test
Scale features (StandardScaler)
Paper 1 model:
- Apply SMOTE (handle class imbalance)
- Apply PCA (retain 95% variance)
- Train XGBoost (AUC eval)
Paper 2 models:
- Logistic Regression
- Decision Tree
- SVM (skipped by default for speed; the ensemble then uses the Logistic Regression output in its place)
Saves everything (models + scaler + PCA + label encoder) to models/fraud_detector_all.pkl
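To make the SMOTE step more concrete, the sketch below shows the core idea in pure Python: each synthetic fraud sample is an interpolation between a real minority sample and one of its nearest minority neighbors. This illustrates the technique only and is not the implementation used by train_models.py; the function name `smote_like_oversample` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote_like_oversample(
    minority: Sequence[Sequence[float]],
    n_new: int,
    k: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # The k nearest minority neighbors of `base`, excluding `base` itself.
        others = [p for i, p in enumerate(minority) if i != idx]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # A random point on the segment between `base` and the chosen neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


fraud_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(fraud_points, n_new=10, k=2)
```

Following the step order above, oversampling comes after the train/test split, so synthetic samples are drawn from training data only and the test set keeps its original class balance.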

Outputs printed:

Accuracy, AUC-ROC, and classification report for XGBoost
Accuracies for LR, DT, and ensemble

API Server (app.py)

Serves the UI (frontend/index.html)
Loads saved artifacts from models/fraud_detector_all.pkl
Endpoints:

GET /health

Health check

Response:

{
  "status": "healthy",
  "models_loaded": true
}
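A script that depends on the server can check this response before sending transactions. The following is a small sketch using only the standard library; the helper names `parse_health` and `check_health` are hypothetical, and the base URL assumes the default local server from Quick Start.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True only if the server is healthy and the models are loaded."""
    data = json.loads(body)
    return data.get("status") == "healthy" and data.get("models_loaded") is True


def check_health(base_url: str = "http://localhost:5000", timeout: float = 5.0) -> bool:
    """Call GET /health on a running server and interpret the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))
```

Calling `check_health()` requires the server to be running; `parse_health` can be used on its own with a captured response body.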

GET /model-info

Returns model metadata and the reported accuracies

GET /upi_directory.json

Serves static UPI directory JSON (used by UI)

POST /predict

Score a single transaction. Request body:

{
  "amount": 100000,
  "oldbalanceOrg": 150000,
  "newbalanceOrig": 50000,
  "oldbalanceDest": 0,
  "newbalanceDest": 100000,
  "transactionType": "TRANSFER",
  "hour": 14,
  "day": 15
}

Response (abridged):

{
  "success": true,
  "results": {
    "paper1": { "algorithm": "XGBoost + SMOTE + PCA", "probability": 0.87, "prediction": 1 },
    "paper2_lr": { "algorithm": "Logistic Regression", "probability": 0.62, "prediction": 1 },
    "paper2_svm": { "algorithm": "Support Vector Machine (Not Trained)", "probability": 0.62, "prediction": 1 },
    "paper2_dt": { "algorithm": "Decision Tree", "probability": 0.58, "prediction": 1 },
    "ensemble": { "algorithm": "Weighted Ensemble (XGBoost 40%)", "probability": 0.73, "prediction": 1, "recommendation": "High Risk - Manual review required" }
  },
  "transaction": { /* echoed request */ }
}

cURL example:

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "amount": 50000,
    "oldbalanceOrg": 10000,
    "newbalanceOrig": -40000,
    "oldbalanceDest": 0,
    "newbalanceDest": 50000,
    "transactionType": "PAYMENT",
    "hour": 14,
    "day": 15
  }'
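The same request can be sent from Python without extra dependencies. This sketch mirrors the cURL call above using `urllib.request`; the function names `build_predict_request` and `score_transaction` are hypothetical, and it is not the code in test_api.py.

```python
import json
import urllib.request
from typing import Any, Dict


def build_predict_request(
    transaction: Dict[str, Any],
    base_url: str = "http://localhost:5000",
) -> urllib.request.Request:
    """Build the POST /predict request with a JSON body."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_transaction(
    transaction: Dict[str, Any],
    base_url: str = "http://localhost:5000",
    timeout: float = 10.0,
) -> Dict[str, Any]:
    """Send the transaction to a running server and return the parsed JSON."""
    request = build_predict_request(transaction, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


transaction = {
    "amount": 50000,
    "oldbalanceOrg": 10000,
    "newbalanceOrig": -40000,
    "oldbalanceDest": 0,
    "newbalanceDest": 50000,
    "transactionType": "PAYMENT",
    "hour": 14,
    "day": 15,
}
request = build_predict_request(transaction)
```

With the server running, `score_transaction(transaction)["results"]["ensemble"]` would give the ensemble entry shown in the abridged response.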

Frontend (frontend/)

Modern, responsive web interface with professional styling and animations:

index.html: Main UI with two modes - "Check Past Transaction" and "Check & Pay"
css/styles.css: Complete stylesheet with navy/teal color palette, gradients, shadows, and keyframe animations
js/main.js: Application logic, API calls, UPI directory search, and result rendering
js/animations.js: GSAP and AOS (Animate On Scroll) animation orchestration
upi_directory.json: Sample UPI directory for recipient validation

Features:

Quick test buttons for common scenarios (normal, suspicious, high-risk)
Real-time UPI ID validation and search
Transaction history with local storage
Animated result cards with risk gauges and confidence scores
Fully responsive design for mobile and desktop

Modeling Details

Paper 1: XGBoost with SMOTE + PCA (best overall AUC/accuracy)
Paper 2: Logistic Regression, Decision Tree, (SVM optional)
Weighted ensemble for final decision:
- XGBoost 40%, SVM 20%, LR 20%, DT 20% (SVM falls back to LR if skipped)
Rule-based checks add extra signals:
- Large-amount anomalies
- Balance inconsistencies (errorBalanceOrig, errorBalanceDest)
- Off-hours transactions (e.g., hour < 6 or hour > 22)
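The weighting and the rule-based checks described above can be sketched as two small functions. This is an illustration under stated assumptions, not the logic in app.py: the names `ensemble_probability` and `red_flags`, the `large_amount` threshold, and the `tolerance` value are hypothetical, and how the flags feed into the final score is not specified here, so the sketch keeps the two parts separate.

```python
from typing import Dict, List, Optional, Union

Number = Union[int, float]

# Weights as listed above: XGBoost 40%, SVM 20%, LR 20%, DT 20%.
WEIGHTS: Dict[str, float] = {"xgb": 0.4, "svm": 0.2, "lr": 0.2, "dt": 0.2}


def ensemble_probability(
    xgb: float, lr: float, dt: float, svm: Optional[float] = None
) -> float:
    """Weighted average of fraud probabilities; SVM falls back to LR if skipped."""
    if svm is None:
        svm = lr
    return (
        WEIGHTS["xgb"] * xgb
        + WEIGHTS["svm"] * svm
        + WEIGHTS["lr"] * lr
        + WEIGHTS["dt"] * dt
    )


def red_flags(
    tx: Dict[str, Number],
    large_amount: float = 200_000.0,  # hypothetical threshold
    tolerance: float = 1e-6,          # hypothetical rounding tolerance
) -> List[str]:
    """Return the names of the rule-based checks that this transaction trips."""
    flags: List[str] = []
    if tx["amount"] >= large_amount:
        flags.append("large_amount")
    error_orig = tx["newbalanceOrig"] + tx["amount"] - tx["oldbalanceOrg"]
    error_dest = tx["oldbalanceDest"] + tx["amount"] - tx["newbalanceDest"]
    if abs(error_orig) > tolerance or abs(error_dest) > tolerance:
        flags.append("balance_inconsistency")
    if tx["hour"] < 6 or tx["hour"] > 22:
        flags.append("off_hours")
    return flags


score = ensemble_probability(xgb=0.87, lr=0.62, dt=0.58)
flags = red_flags({
    "amount": 300000, "oldbalanceOrg": 1000, "newbalanceOrig": 1000,
    "oldbalanceDest": 0, "newbalanceDest": 0, "hour": 2,
})
```

With the per-model probabilities from the abridged /predict response, the plain weighted average comes to 0.712, while that response shows 0.73; the difference may come from the rule-based signals, but that is not documented here.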

Troubleshooting

"Models not found" when starting app.py:
- Run training first: python train_models.py
- Confirm models/fraud_detector_all.pkl exists
Wheel installation errors on Windows:
- Use a recent Python and pip; optionally create a fresh virtual environment
Server starts but the UI is empty:
- Visit http://localhost:5000 (the root route serves the UI)
500 error on /predict:
- Check that the request fields match the POST /predict example
- Ensure the library versions used to load the scaler, PCA, and models match those used to train the saved artifacts
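When debugging a 500 error on /predict, it can help to check the payload on the client side before sending it. The sketch below validates a request against the fields in the README example; the function name `validate_transaction`, the accepted `KNOWN_TYPES`, and the hour range check are assumptions, since the server's own validation rules are not documented here.

```python
from typing import Any, Dict, List

REQUIRED_NUMERIC = (
    "amount", "oldbalanceOrg", "newbalanceOrig",
    "oldbalanceDest", "newbalanceDest", "hour", "day",
)

# Assumption: the server accepts the five PaySim transaction types.
KNOWN_TYPES = {"CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"}


def validate_transaction(payload: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for field in REQUIRED_NUMERIC:
        value = payload.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: expected a number")
    if payload.get("transactionType") not in KNOWN_TYPES:
        problems.append("transactionType: expected one of " + ", ".join(sorted(KNOWN_TYPES)))
    hour = payload.get("hour")
    if isinstance(hour, int) and not isinstance(hour, bool) and not 0 <= hour <= 23:
        problems.append("hour: expected a value from 0 to 23")
    return problems


good = {
    "amount": 100000, "oldbalanceOrg": 150000, "newbalanceOrig": 50000,
    "oldbalanceDest": 0, "newbalanceDest": 100000,
    "transactionType": "TRANSFER", "hour": 14, "day": 15,
}
issues = validate_transaction(good)
```

An empty result rules out the most common cause (a missing or mistyped field) and leaves the artifact and library version mismatch as the next thing to check.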

Roadmap

Optional SVM training switch and progress logging
Persisted evaluation artifacts (confusion matrix, ROC plots)
Threshold tuning by segment (type/amount bands)
Dockerfile + compose for easy deploy
Authentication and rate limiting for API

License

Add a license file if you plan to open-source or distribute this project.
