Email Threat Classifier

A machine learning-powered email classification system that labels emails as NORMAL, SPAM, or FRAUD using a fine-tuned DistilBERT model and Gmail API integration.

Features

Fine-tuned DistilBERT transformer model for email classification
Real-time Gmail API integration
Interactive Streamlit dashboard
REST API with FastAPI
Three-class classification: NORMAL, SPAM, FRAUD

Prerequisites

Python 3.8+
Gmail API credentials (OAuth 2.0)
Weights & Biases account (optional, for training)

Installation

After cloning the repository, navigate to the project directory:

cd UDBHAV

Install uv if not already installed:

curl -LsSf https://astral.sh/uv/install.sh | sh

Install dependencies using uv:

uv sync

Set up environment variables by creating a .env file (WANDB_API_KEY is optional and only needed for training):

GOOGLE_CLIENT_ID=your_client_id
GOOGLE_CLIENT_SECRET=your_client_secret
GOOGLE_REFRESH_TOKEN=your_refresh_token
EMAIL_ADDRESS=your_email@gmail.com
MODEL_PATH=./email_model
WANDB_API_KEY=your_wandb_key
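
Before starting the dashboard or the API server, it can help to confirm that the variables above are actually set. The following is a minimal sketch using only the standard library. The `missing_env_vars` helper and the decision about which variables count as required are assumptions for illustration, not part of the project's code. It inspects the process environment, so it assumes the .env file has already been loaded by whatever mechanism the project uses.

```python
import os

# Variables from the .env example above. WANDB_API_KEY is left out on the
# assumption that it is only needed for training.
REQUIRED_VARS = (
    "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET",
    "GOOGLE_REFRESH_TOKEN",
    "EMAIL_ADDRESS",
    "MODEL_PATH",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper; `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```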

Setup

1. Train the Model

Train the email classifier on your dataset:

uv run train.py

This will:

Load and preprocess final_dataset.csv
Fine-tune DistilBERT on email data
Save the trained model to ./email_model/
Generate evaluation metrics
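
The Model Details section describes the metrics as weighted. Assuming that means support-weighted averages across the three classes (the usual meaning, but not confirmed by train.py here), the idea can be sketched in plain Python. The `weighted_f1` function below is illustrative and is not taken from the training script.

```python
from collections import Counter


def weighted_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Support-weighted average of per-class F1 scores.

    Each class's F1 is weighted by how often that class appears in y_true,
    so frequent classes contribute more to the final score.
    """
    total = len(y_true)
    if total == 0:
        return 0.0
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support[label] / total
    return score


# Labels follow the README's mapping: NORMAL (0), SPAM (1), FRAUD (2).
print(weighted_f1([0, 0, 1, 2], [0, 1, 1, 2]))
```

In this toy example one NORMAL email is misclassified as SPAM, which lowers the F1 of both classes while FRAUD stays perfect.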

2. Run the Streamlit Dashboard

Launch the interactive web interface:

uv run streamlit run app.py

Access the dashboard at http://localhost:8501

3. Run the FastAPI Server (Optional)

Start the REST API server:

uv run uvicorn server:app --reload

API endpoints:

POST /predict_email - Classify a single email
GET /scan_gmail - Fetch and classify recent Gmail messages

Project Structure

.
├── app.py              # Streamlit dashboard
├── server.py           # FastAPI REST API
├── train.py            # Model training script
├── utils.py            # Helper functions
├── final_dataset.csv   # Training dataset
├── email_model/        # Trained model directory
└── .env                # Environment variables

Usage

Dashboard

Click "Fetch & Classify Last 10 Emails"
View classified emails in categorized tabs
Review confidence scores and labels

API

curl -X POST "http://localhost:8000/predict_email" \
  -H "Content-Type: application/json" \
  -d '{"text": "Congratulations! You won $1000000"}'
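
The same request can be made from Python. This is a sketch using only the standard library, assuming the server is running locally on uvicorn's default port 8000 as in the curl command above. The function names are hypothetical, and the shape of the JSON response is not documented in this README, so the result is returned as-is.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:8000"):
    """Build the same POST request as the curl command above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict_email",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_email(text, base_url="http://localhost:8000"):
    """Send the request and return the decoded JSON response.

    Requires the FastAPI server to be running; raises URLError otherwise.
    """
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running server):
# print(predict_email("Congratulations! You won $1000000"))
```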

Model Details

Architecture: DistilBERT (distilbert-base-uncased)
Classes: NORMAL (0), SPAM (1), FRAUD (2)
Max Token Length: 256
Training: 3 epochs, evaluated with weighted metrics
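
To illustrate how the class indices above relate to a label and a confidence score, here is a small sketch that turns three raw model outputs (logits) into a prediction. It assumes the confidence is the softmax probability of the top class, which is a common choice but is not confirmed by the project's code. `logits_to_prediction` is a hypothetical name.

```python
import math

# Class indices as listed in Model Details.
ID2LABEL = {0: "NORMAL", 1: "SPAM", 2: "FRAUD"}


def logits_to_prediction(logits):
    """Convert raw class scores into a (label, confidence) pair."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, confidence = logits_to_prediction([1.0, 3.0, 0.5])
print(label, round(confidence, 3))
```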

Gmail API Setup

Create a project in the Google Cloud Console
Enable the Gmail API for that project
Create OAuth 2.0 credentials (client ID and client secret)
Generate a refresh token using the OAuth 2.0 Playground
Add the credentials to the .env file
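
Once the credentials are in place, a refresh token is typically exchanged for a short-lived access token at Google's OAuth 2.0 token endpoint. The sketch below shows that exchange with the standard library only. The helper names are hypothetical, and the project itself may rely on a Google client library instead, so treat this as a way to sanity-check the credentials rather than as the project's implementation.

```python
import json
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the form-encoded POST that trades a refresh token for an access token."""
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


def fetch_access_token(env=None):
    """Read credentials from the environment and return a fresh access token.

    Performs a network call; raises HTTPError if the credentials are rejected.
    """
    env = os.environ if env is None else env
    request = build_refresh_request(
        env["GOOGLE_CLIENT_ID"],
        env["GOOGLE_CLIENT_SECRET"],
        env["GOOGLE_REFRESH_TOKEN"],
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["access_token"]
```

If `fetch_access_token()` raises an HTTP 400 or 401 error, the refresh token or client credentials in .env are the first thing to re-check.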
