A full-stack machine learning project that detects fraudulent UPI-like transactions in real time. It includes a model training pipeline, a Flask API server, and a modern web UI for interactive testing.
- Real-time fraud scoring via REST API
- Ensemble learning: XGBoost (with SMOTE + PCA), Logistic Regression, Decision Tree (+ optional SVM)
- Robust handling of class imbalance using SMOTE
- Dimensionality reduction with PCA for speed and generalization
- Weighted ensemble with rule-based red flags (time and balance consistency checks)
- Clean web UI served from Flask for quick experiments
upi_fraud_detection/
├── app.py                  # Flask API server (serves UI + predictions)
├── train_models.py         # Training pipeline (SMOTE, PCA, XGBoost + others)
├── test_api.py             # Simple script to exercise /predict
├── requirements.txt        # Python dependencies
├── .gitignore              # Git ignore rules
├── data/
│   └── PS_20174392719_1491204439457_log.csv   # PaySim dataset
├── frontend/
│   ├── index.html          # Web interface
│   ├── upi_directory.json  # Sample UPI directory data
│   ├── css/
│   │   └── styles.css      # Complete stylesheet with animations
│   └── js/
│       ├── main.js         # Application logic and API calls
│       └── animations.js   # GSAP and AOS animation setup
└── models/                 # Trained model artifacts (created by training)
    └── fraud_detector_all.pkl   # Saved models + preprocessors
- PaySim (synthetic mobile money transactions) from Kaggle: https://www.kaggle.com/datasets/ealaxi/paysim1
- Highly imbalanced: fraud ≪ non-fraud
- Note: The dataset is not included in this repository due to its size (470MB). Download it from Kaggle and place the CSV file in the data/ folder before training models.
- Key engineered features used:
  - hour, day from step
  - errorBalanceOrig = newbalanceOrig + amount − oldbalanceOrg
  - errorBalanceDest = oldbalanceDest + amount − newbalanceDest
  - Transaction type encoded as type_encoded
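
The engineered features above can be sketched in plain Python for a single transaction. This is an illustration, not the code from train_models.py: the helper name `engineer_features` is hypothetical, and the hour/day split assumes that each PaySim `step` is one hour and that days are counted from zero, which may differ from the project's own convention.

```python
def engineer_features(tx: dict) -> dict:
    """Derive time and balance-consistency features for one PaySim-style row."""
    step = tx["step"]
    return {
        # Assumption: one step equals one hour, days counted from 0.
        "hour": step % 24,
        "day": step // 24,
        # Both error terms are 0 when the balances are consistent with the amount.
        "errorBalanceOrig": tx["newbalanceOrig"] + tx["amount"] - tx["oldbalanceOrg"],
        "errorBalanceDest": tx["oldbalanceDest"] + tx["amount"] - tx["newbalanceDest"],
    }


row = {
    "step": 38,
    "amount": 100000.0,
    "oldbalanceOrg": 150000.0,
    "newbalanceOrig": 50000.0,
    "oldbalanceDest": 0.0,
    "newbalanceDest": 100000.0,
}
features = engineer_features(row)
print(features)
```

A non-zero error term means the before/after balances do not add up with the transferred amount, which is the signal the balance consistency checks rely on.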
- Install dependencies
  pip install -r requirements.txt
- Train models (creates models/fraud_detector_all.pkl)
  python train_models.py
- Run the API server
  python app.py
- Open the UI
  - Navigate to: http://localhost:5000

Optional: Test the API from script
  python test_api.py

- Loads CSV from data/
- Feature engineering: time splits + balance consistency features
- Split: stratified 80/20 train/test
- Scale features (StandardScaler)
- Paper 1 model:
  - Apply SMOTE (handle class imbalance)
  - Apply PCA (retain 95% variance)
  - Train XGBoost (AUC eval)
- Paper 2 models:
  - Logistic Regression
  - Decision Tree
  - SVM (skipped by default for speed; placeholder in ensemble)
- Saves everything (models + scaler + PCA + label encoder) to models/fraud_detector_all.pkl
Outputs printed:
- Accuracy, AUC-ROC, and classification report for XGBoost
- Accuracies for LR, DT, and ensemble
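
To make the SMOTE step in the pipeline more concrete, here is a minimal pure-Python sketch of the core idea: each synthetic fraud sample is an interpolation between a real minority sample and one of its nearest minority neighbours. The function name `smote_sketch` and its parameters are hypothetical. The training script presumably uses a library implementation, so treat this as an explanation of the technique, not as the project's code.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Find the k nearest minority neighbours of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [3.0, 3.0]]
synthetic = smote_sketch(minority, n_new=6)
print(len(synthetic))
```

Oversampling of this kind is normally applied to the training split only, so that the test metrics still reflect the original class imbalance.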
- Serves the UI (frontend/index.html)
- Loads saved artifacts from models/fraud_detector_all.pkl
- Endpoints:
Health check
Response:
{
"status": "healthy",
"models_loaded": true
}
Model metadata and claimed accuracies
Serves static UPI directory JSON (used by UI)
Score a single transaction. Request body:
{
"amount": 100000,
"oldbalanceOrg": 150000,
"newbalanceOrig": 50000,
"oldbalanceDest": 0,
"newbalanceDest": 100000,
"transactionType": "TRANSFER",
"hour": 14,
"day": 15
}
Response (abridged):
{
"success": true,
"results": {
"paper1": { "algorithm": "XGBoost + SMOTE + PCA", "probability": 0.87, "prediction": 1 },
"paper2_lr": { "algorithm": "Logistic Regression", "probability": 0.62, "prediction": 1 },
"paper2_svm": { "algorithm": "Support Vector Machine (Not Trained)", "probability": 0.62, "prediction": 1 },
"paper2_dt": { "algorithm": "Decision Tree", "probability": 0.58, "prediction": 1 },
"ensemble": { "algorithm": "Weighted Ensemble (XGBoost 40%)", "probability": 0.73, "prediction": 1, "recommendation": "High Risk - Manual review required" }
},
"transaction": { /* echoed request */ }
}
cURL example:
curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{
"amount": 50000,
"oldbalanceOrg": 10000,
"newbalanceOrig": -40000,
"oldbalanceDest": 0,
"newbalanceDest": 50000,
"transactionType": "PAYMENT",
"hour": 14,
"day": 15
}'
Modern, responsive web interface with professional styling and animations:
- index.html: Main UI with two modes - "Check Past Transaction" and "Check & Pay"
- css/styles.css: Complete stylesheet with navy/teal color palette, gradients, shadows, and keyframe animations
- js/main.js: Application logic, API calls, UPI directory search, and result rendering
- js/animations.js: GSAP and AOS (Animate On Scroll) animation orchestration
- upi_directory.json: Sample UPI directory for recipient validation
Features:
- Quick test buttons for common scenarios (normal, suspicious, high-risk)
- Real-time UPI ID validation and search
- Transaction history with local storage
- Animated result cards with risk gauges and confidence scores
- Fully responsive design for mobile and desktop
- Paper 1: XGBoost with SMOTE + PCA (best overall AUC/accuracy)
- Paper 2: Logistic Regression, Decision Tree, (SVM optional)
- Weighted ensemble for final decision:
  - XGBoost 40%, SVM 20%, LR 20%, DT 20% (SVM falls back to LR if skipped)
- Rule-based checks add extra signals:
  - Large-amount anomalies
  - Balance inconsistencies (errorBalanceOrig, errorBalanceDest)
  - Off-hours transactions (e.g., < 6 or > 22)
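
The weighting and the rule-based checks described above might look roughly like the following sketch. The weights and the off-hours bounds come from this README. The function names, the `large_amount` threshold, and the tolerance are assumptions made for illustration, and how app.py combines the flags with the ensemble score is not shown here.

```python
WEIGHTS = {"xgb": 0.4, "svm": 0.2, "lr": 0.2, "dt": 0.2}


def ensemble_probability(probs: dict) -> float:
    """Weighted average of model probabilities; SVM falls back to LR if missing."""
    p = dict(probs)
    if p.get("svm") is None:
        p["svm"] = p["lr"]
    return sum(weight * p[name] for name, weight in WEIGHTS.items())


def red_flags(tx: dict, large_amount: float = 200_000, tol: float = 1e-6) -> list:
    """Return the names of the rule-based checks that fire for a transaction."""
    flags = []
    # Hypothetical threshold: the README does not state what counts as "large".
    if tx["amount"] > large_amount:
        flags.append("large_amount")
    err_orig = tx["newbalanceOrig"] + tx["amount"] - tx["oldbalanceOrg"]
    err_dest = tx["oldbalanceDest"] + tx["amount"] - tx["newbalanceDest"]
    if abs(err_orig) > tol or abs(err_dest) > tol:
        flags.append("balance_inconsistency")
    if tx["hour"] < 6 or tx["hour"] > 22:
        flags.append("off_hours")
    return flags


score = ensemble_probability({"xgb": 0.87, "lr": 0.62, "dt": 0.58, "svm": None})
flags = red_flags({
    "amount": 500000, "oldbalanceOrg": 500000, "newbalanceOrig": 0,
    "oldbalanceDest": 0, "newbalanceDest": 0, "hour": 2,
})
print(round(score, 3), flags)
```

Keeping the flags as a separate list, instead of folding them into the probability, makes it easier to show a reviewer why a transaction was marked as risky.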
- "Models not found" when starting app.py:
  - Run training first: python train_models.py
  - Confirm models/fraud_detector_all.pkl exists
- Version issues installing wheels (Windows):
  - Ensure a recent Python and pip; optionally create a fresh venv
- Server starts but UI empty:
  - Visit http://localhost:5000 (root serves the UI)
- 500 error on /predict:
  - Check request fields match the README example
  - Ensure scaler/PCA/model versions match the trained artifacts
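
When chasing a 500 error on /predict, it can help to check the request body against the documented example before sending it. The sketch below does this on the client side. The field names are taken from the request example in this README, while the helper `validate_payload` and the expected types are assumptions; app.py may be more or less strict.

```python
REQUIRED_FIELDS = {
    "amount": (int, float),
    "oldbalanceOrg": (int, float),
    "newbalanceOrig": (int, float),
    "oldbalanceDest": (int, float),
    "newbalanceDest": (int, float),
    "transactionType": str,
    "hour": int,
    "day": int,
}


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks complete."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    return problems


good = {
    "amount": 100000, "oldbalanceOrg": 150000, "newbalanceOrig": 50000,
    "oldbalanceDest": 0, "newbalanceDest": 100000,
    "transactionType": "TRANSFER", "hour": 14, "day": 15,
}
bad = {k: v for k, v in good.items() if k != "hour"}
bad["amount"] = "100000"
print(validate_payload(good))
print(validate_payload(bad))
```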
- Optional SVM training switch and progress logging
- Persisted evaluation artifacts (confusion matrix, ROC plots)
- Threshold tuning by segment (type/amount bands)
- Dockerfile + compose for easy deploy
- Authentication and rate limiting for API
Add a license file if you plan to open source or distribute.