SSHRIHARI006/EmailClassifier

Email Threat Classifier

A machine learning-powered email classification system that identifies NORMAL, SPAM, and FRAUD emails using a fine-tuned DistilBERT model and Gmail API integration.

Features

  • Fine-tuned DistilBERT transformer model for email classification
  • Real-time Gmail API integration
  • Interactive Streamlit dashboard
  • REST API with FastAPI
  • Three-class classification: NORMAL, SPAM, FRAUD

Prerequisites

  • Python 3.8+
  • Gmail API credentials (OAuth 2.0)
  • Weights & Biases account (optional, for training)

Installation

  1. Clone the repository and navigate to the project directory:
cd UDBHAV
  2. Install uv if not already installed:
curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Install dependencies using uv:
uv sync
  4. Set up environment variables by creating a .env file:
GOOGLE_CLIENT_ID=your_client_id
GOOGLE_CLIENT_SECRET=your_client_secret
GOOGLE_REFRESH_TOKEN=your_refresh_token
EMAIL_ADDRESS=your_email@gmail.com
MODEL_PATH=./email_model
WANDB_API_KEY=your_wandb_key
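
Before starting the app, it can help to confirm that the .env file actually contains the keys listed above. The following is a minimal, standard-library-only sketch of such a check; the helper names `parse_env_file` and `missing_keys` are hypothetical and not part of this repository, and treating WANDB_API_KEY as optional is an assumption based on the Prerequisites section.

```python
from pathlib import Path

# Keys listed in the .env example above. WANDB_API_KEY is left out of the
# required set on the assumption that it is only needed for training.
REQUIRED_KEYS = (
    "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET",
    "GOOGLE_REFRESH_TOKEN",
    "EMAIL_ADDRESS",
    "MODEL_PATH",
)


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in declared order."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    if Path(".env").exists():
        print("Missing keys:", missing_keys(parse_env_file(".env")))
    else:
        print("No .env file found in the current directory")
```

The project itself may load these values differently (for example through a dotenv library); this sketch only verifies that the file is complete.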

Setup

1. Train the Model

Train the email classifier on your dataset:

uv run train.py

This will:

  • Load and preprocess final_dataset.csv
  • Fine-tune DistilBERT on email data
  • Save the trained model to ./email_model/
  • Generate evaluation metrics
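
As a rough illustration of the load-and-preprocess step, the sketch below reads a CSV and maps string labels to the integer class IDs listed under Model Details. The column names `text` and `label`, the string label format, and the helper `load_examples` are all assumptions for illustration; the actual layout of final_dataset.csv and the logic in train.py may differ.

```python
import csv
import io

# Class IDs as listed under Model Details.
LABEL2ID = {"NORMAL": 0, "SPAM": 1, "FRAUD": 2}


def load_examples(csv_file, text_col="text", label_col="label"):
    """Return (text, label_id) pairs, skipping empty or unlabeled rows.

    The column names are hypothetical defaults, not confirmed by the repo.
    """
    examples = []
    for row in csv.DictReader(csv_file):
        text = (row.get(text_col) or "").strip()
        label = (row.get(label_col) or "").strip().upper()
        if not text or label not in LABEL2ID:
            continue  # drop rows that cannot be used for training
        examples.append((text, LABEL2ID[label]))
    return examples


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real dataset file.
    sample = io.StringIO(
        "text,label\n"
        "Meeting at 3pm,normal\n"
        "You won a prize!!!,spam\n"
        ",fraud\n"
        "Wire the funds today,FRAUD\n"
    )
    for text, label_id in load_examples(sample):
        print(label_id, text)
```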

2. Run the Streamlit Dashboard

Launch the interactive web interface:

uv run streamlit run app.py

Access the dashboard at http://localhost:8501

3. Run the FastAPI Server (Optional)

Start the REST API server:

uv run uvicorn server:app --reload

API endpoints:

  • POST /predict_email - Classify a single email
  • GET /scan_gmail - Fetch and classify recent Gmail messages

Project Structure

.
├── app.py              # Streamlit dashboard
├── server.py           # FastAPI REST API
├── train.py            # Model training script
├── utils.py            # Helper functions
├── final_dataset.csv   # Training dataset
├── email_model/        # Trained model directory
└── .env                # Environment variables

Usage

Dashboard

  1. Click "Fetch & Classify Last 10 Emails"
  2. View classified emails in categorized tabs
  3. Review confidence scores and labels

API

curl -X POST "http://localhost:8000/predict_email" \
  -H "Content-Type: application/json" \
  -d '{"text": "Congratulations! You won $1000000"}'
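
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_predict_request` and `classify` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns it undecoded beyond JSON parsing.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:8000"):
    """Build a POST request for /predict_email with a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict_email",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, base_url="http://localhost:8000", timeout=10):
    """Send the request and return the parsed JSON response.

    Requires the FastAPI server to be running; the response fields are
    whatever server.py returns and are not assumed here.
    """
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(classify("Congratulations! You won $1000000"))
    except OSError as exc:
        print("Could not reach the API server:", exc)
```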

Model Details

  • Architecture: DistilBERT (distilbert-base-uncased)
  • Classes: NORMAL (0), SPAM (1), FRAUD (2)
  • Max Token Length: 256
  • Training: 3 epochs with weighted metrics
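
To show how a three-class output turns into a label and a confidence score, here is a small sketch that applies softmax to raw logits and picks the most probable class. It uses the class IDs listed above; the function names and the example logits are illustrative only, and the project's own inference code may compute confidence differently.

```python
import math

# Class IDs as listed above.
ID2LABEL = {0: "NORMAL", 1: "SPAM", 2: "FRAUD"}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode(logits):
    """Return (label, confidence) for one email's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


if __name__ == "__main__":
    # Made-up logits for illustration, not real model output.
    label, confidence = decode([0.1, 0.2, 3.0])
    print(f"{label} ({confidence:.2f})")
```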

Gmail API Setup

  1. Create a project in Google Cloud Console
  2. Enable Gmail API
  3. Create OAuth 2.0 credentials
  4. Generate refresh token using OAuth 2.0 Playground
  5. Add credentials to .env file
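
Once the credentials are in .env, the refresh token can be exchanged for a short-lived access token at Google's OAuth 2.0 token endpoint. The sketch below shows that exchange with the standard library only; the helper names are hypothetical, and the project may instead rely on a Google client library to do this.

```python
import json
import os
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the form-encoded POST that trades a refresh token for an access token."""
    body = urllib.parse.urlencode(
        {
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
            "grant_type": "refresh_token",
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(client_id, client_secret, refresh_token, timeout=10):
    """Perform the exchange and return the access token string."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["access_token"]


if __name__ == "__main__":
    # Reads the same variable names used in the .env example.
    names = ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REFRESH_TOKEN")
    creds = [os.environ.get(name) for name in names]
    if all(creds):
        print("Access token obtained:", bool(fetch_access_token(*creds)))
    else:
        print("Set the Google credentials in the environment first")
```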
