Nick Tomasetti, Sam Nguyen, and Abhay Mathur
Reach out if you have any questions
This is our analysis of the Kaggle Yelp Dataset. We set out to solve two problems:
- Use review text to classify a business's categories (e.g., Seafood, Nail Salon)
- Predict whether a business will close, using review data and the business's metadata
We used WPI's ARC Turing Cluster and the full dataset to accomplish this.
Read our report: Report
Look at our slides: Slides
Files for part 1:
task1_bert_preprocessing.py - Preprocess the raw Yelp review data into input and expected-value tensors
task1_bert_training.py - Fine-tune the BERT model on the preprocessed data
task1_bert_analysis.py - Evaluate the fine-tuned BERT model and report metrics
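
Because a business can belong to several categories at once, the expected-value tensors for part 1 can be thought of as multi-hot vectors. The snippet below is a minimal sketch of that idea, not the code in `task1_bert_preprocessing.py`. It assumes the business `categories` field is a comma-separated string (or missing); the helper names `split_categories`, `build_category_index`, and `multi_hot` are hypothetical.

```python
from typing import Dict, List, Optional


def split_categories(raw: Optional[str]) -> List[str]:
    """Split a comma-separated category string; a missing value yields []."""
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_category_index(category_strings: List[Optional[str]]) -> Dict[str, int]:
    """Assign each distinct category a column index, in sorted order."""
    names = set()
    for raw in category_strings:
        names.update(split_categories(raw))
    return {name: i for i, name in enumerate(sorted(names))}


def multi_hot(raw: Optional[str], index: Dict[str, int]) -> List[float]:
    """Build a multi-hot target vector: 1.0 for each category present."""
    vector = [0.0] * len(index)
    for name in split_categories(raw):
        if name in index:
            vector[index[name]] = 1.0
    return vector


# Illustrative values only (assumption: not taken from the real dataset).
businesses = ["Seafood, Restaurants", "Nail Salons, Beauty & Spas", None]
index = build_category_index(businesses)
targets = [multi_hot(raw, index) for raw in businesses]
```

Lists of floats like these can be passed to `torch.tensor(...)` to obtain targets for a multi-label loss, if that is the setup being used.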
Files for part 2:
part2preprocessing.ipynb - Preprocess the raw Yelp review and business data into input and expected-value tensors
part2model2.ipynb - Train the hybrid DNN/BERT model and evaluate it
- Identify which factors drove the model's business closure predictions
- Evaluate the model
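
As a rough illustration of the part 2 preprocessing step, the sketch below joins review text to business metadata and derives a closure label. It is not the notebook's code. It assumes records shaped like the Yelp business and review JSON (`business_id`, `stars`, `review_count`, `is_open`, `text`); the function `build_closure_examples`, its `max_reviews` parameter, and the choice of metadata features are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_closure_examples(
    businesses: List[Dict], reviews: List[Dict], max_reviews: int = 3
) -> List[Dict]:
    """Pair each business's review text and metadata with a closed/open label."""
    # Group review texts by the business they belong to.
    texts = defaultdict(list)
    for review in reviews:
        texts[review["business_id"]].append(review["text"])

    examples = []
    for business in businesses:
        business_id = business["business_id"]
        if business_id not in texts:
            continue  # Skip businesses with no review text.
        examples.append(
            {
                "business_id": business_id,
                # Text input for the BERT side of a hybrid model.
                "text": " ".join(texts[business_id][:max_reviews]),
                # Numeric input for the DNN side (assumed feature choice).
                "features": [
                    float(business["stars"]),
                    float(business["review_count"]),
                ],
                # Label: 1 means closed, derived from the is_open flag.
                "closed": 1 - int(business["is_open"]),
            }
        )
    return examples


# Illustrative records only (assumption: not taken from the real dataset).
sample_businesses = [
    {"business_id": "b1", "stars": 4.5, "review_count": 2, "is_open": 1},
    {"business_id": "b2", "stars": 2.0, "review_count": 1, "is_open": 0},
    {"business_id": "b3", "stars": 3.0, "review_count": 0, "is_open": 1},
]
sample_reviews = [
    {"business_id": "b1", "text": "Great oysters."},
    {"business_id": "b2", "text": "Slow service."},
    {"business_id": "b1", "text": "Fresh fish."},
]
examples = build_closure_examples(sample_businesses, sample_reviews)
```

Keeping the text and numeric features in separate fields makes it straightforward to tokenize one and normalize the other before they are fed to the two branches of a hybrid model.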
Note: This repository is not set up to run out of the box; specific setup is required.
Technologies used:
- PyTorch
- BERT/transformers
- SHAP
- NumPy
- Pandas
- NLTK
- spaCy