email-triage-classifier

IT service ticket email classifier: an experiment to see whether IT support emails can be triaged effectively by training an XLNet-based classifier (XLNetTokenizer plus an inference model) on partially synthetic data. Only minimal real IT support data is used (service locations); everything else (subject lines, email contents, etc.) is taken from open-source datasets or generated with Python and Faker.

Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • Git

Run Instructions

# Clone the repository
git clone https://github.com/enzojoly/email-triage-classifier.git
cd email-triage-classifier

# Launch JupyterLab environment
docker compose -f jupyter.yaml up -d

JupyterLab will be available at http://localhost:8888

📓 Project Notebooks

Core Components

  1. Model Training Pipeline: XLNet model training workflow for email classification

  2. Inference Testing: Testing and validation of the trained classifier on service ticket predictions

  3. Data Generation Demo: Synthetic data enrichment using the Faker library for training data augmentation
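
As a rough illustration of the data generation idea above, the sketch below combines a small list of service locations with templated subject lines and Faker-generated sender details. It is not the notebook's actual code: the location names, subject templates, helper names (`make_synthetic_email`, `make_dataset`), and the choice of the service location as the label are all assumptions made for the example. If Faker is not installed, fixed placeholder sender details are used so the sketch still runs.

```python
import random

try:
    from faker import Faker  # third-party package: pip install Faker
except ImportError:
    Faker = None  # fall back to fixed placeholder sender details

# Hypothetical values for illustration only; the real service locations
# and text sources live in the repository's data directories.
SERVICE_LOCATIONS = ["Library", "Main Reception", "Lab 2"]
SUBJECT_TEMPLATES = [
    "Printer not working in {location}",
    "Wi-Fi outage at {location}",
    "Projector fault in {location}",
]


def make_synthetic_email(rng, fake=None):
    """Build one synthetic ticket email as a dict of plain strings."""
    location = rng.choice(SERVICE_LOCATIONS)
    subject = rng.choice(SUBJECT_TEMPLATES).format(location=location)
    if fake is not None:
        sender, address = fake.name(), fake.email()
    else:
        sender, address = "Alex Example", "alex@example.com"
    body = f"Hello, this is {sender}. {subject}. Please can someone take a look?"
    return {
        "sender": sender,
        "email": address,
        "subject": subject,
        "body": body,
        "label": location,  # assumption: the service location is the target class
    }


def make_dataset(n, seed=0):
    """Generate n synthetic emails, seeded so reruns give the same rows."""
    rng = random.Random(seed)
    fake = None
    if Faker is not None:
        fake = Faker()
        fake.seed_instance(seed)
    return [make_synthetic_email(rng, fake) for _ in range(n)]


if __name__ == "__main__":
    for row in make_dataset(3):
        print(row["label"], "|", row["subject"])
```

Seeding both the `random.Random` instance and the Faker instance keeps the generated rows reproducible, which helps when comparing training runs on the same synthetic set.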

Project Structure

  • notebooks/: Jupyter notebooks for training, inference, and data generation
  • processed_data/: Enhanced datasets and service keyword mappings
  • raw_data/: Original ticket data and email sources
  • requirements.txt: Python dependencies
  • Dockerfile & jupyter.yaml: Containerized development environment