This project aims to build a model that performs two tasks at once: detecting entities in Yelp reviews and assigning sentiment scores to those entities (e.g. food brands). We evaluate several open-source tools: VADER, Stanford NLP, and Benepar. Results can be found in the notebooks.
Role | Responsibility | Full name | Email |
---|---|---|---|
Project Owner | Stakeholder | Felipe Penha | [email protected] |
Collaborator | Co-author / Project Lead | Tim Kartawijaya | [email protected] |
Collaborator | Co-author | Charlene Luo | [email protected] |
Collaborator | Co-author | Fernando Troeman | [email protected] |
Collaborator | Co-author | Nico Winata | [email protected] |
Collaborator | Co-author | Jefferson Zhou | [email protected] |
To reproduce results in the paper:
- For final validation results (Spearman rank correlation results), run the sentiment_and_parsing_rules_end_to_end_validation notebook end to end. The dataset used in the notebook (restaurant_reviews_1900k.json) is restricted due to Yelp policy, so please contact Tim Kartawijaya for access.
- For qualitative results, run the qualitative_testings_VADER_Stanford_NLP_Benepar notebook.
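
For readers unfamiliar with the validation metric, the sketch below shows how a Spearman rank correlation between two brand rankings can be computed. It is only an illustration and is not taken from the validation notebook: the brand names, the scores, and the helper names `average_ranks` and `spearman` are all hypothetical, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical data: model sentiment per brand vs. a reference rating.
brands = ["brand_a", "brand_b", "brand_c", "brand_d"]
model_scores = [0.62, 0.10, -0.35, 0.48]
reference_scores = [4.0, 3.0, 2.0, 4.5]

rho = spearman(model_scores, reference_scores)
print(f"Spearman correlation over {len(brands)} brands: {rho:.2f}")
```

A value close to 1 means the model orders the brands almost the same way as the reference; a value close to -1 means the order is reversed.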
To use the package on your own dataset or brand list, follow the steps in usage_example. Documentation on how neoway_nlp works can be found in the main file (further documentation is still needed here). The data used in run() (restaurant_reviews_10k.csv and brand_list.csv) is restricted due to Yelp policy, so please contact Tim Kartawijaya for access.
- docs: contains documentation of the project (NOT COMPLETED).
- analysis: contains notebooks for modeling experimentation.
- final_validation: contains notebooks that produce the final qualitative/quantitative results.
- end_to_end_rules: contains notebooks that test the different parsing rules we developed.
- entity_recognition: contains notebooks that produce the spaCy entity recognition (ER) model.
- preprocess: contains code to preprocess the raw Yelp Reviews Dataset into digestible data.
- tests: contains files used for unit tests (NOT COMPLETED).
- neoway_nlp: main Python package with source of the model.
Complete Yelp Reviews Dataset - https://www.yelp.com/dataset