Star Rating Prediction of Amazon Movie Review

Main Goal of the Project

A massive number of reviews is available online today. Besides the written review itself, each one usually comes with a rating from 1 to 5 stars. The goal of this project is to predict the star rating associated with user reviews from Amazon Movie Reviews using the available features.

Data Fields:

ProductId - unique identifier for the product
UserId - unique identifier for the user
HelpfulnessNumerator - number of users who found the review helpful
HelpfulnessDenominator - number of users who indicated whether they found the review helpful
Score - rating between 1 and 5
Time - timestamp for the review
Summary - brief summary of the review
Text - text of the review
Id - a unique identifier associated with a review

Model Applied

Before applying the models, I used some common NLP techniques to clean the data and turn the reviews (natural language) into numeric matrices, including vectorizing the words and reducing each English word to its stem. I then applied four models to the dataset: KNN, Naive Bayes, SVM, and Logistic Regression, and adjusted and evaluated them based on their performance.
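The preprocessing step described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code used in this repository: `crude_stem`, `tokenize`, `vectorize`, and the `SUFFIXES` list are hypothetical names, and the suffix stripping is a simplified stand-in for a real stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical suffix list (assumption); a real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and stem each one."""
    return [crude_stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]


def vectorize(reviews):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in review i."""
    counts = [Counter(tokenize(review)) for review in reviews]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Toy reviews, made up for illustration only.
reviews = ["Loved the acting", "The actors acted badly"]
vocabulary, matrix = vectorize(reviews)
print(vocabulary)
print(matrix)
```

Each row of `matrix` is one review and each column is one stemmed word, which is the kind of numeric input the models listed below expect. Note that "acting" and "acted" end up in the same column after stemming.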

predict-constant.py to predict the same score for all rows in the test set
predict-knn.py to predict the score using KNN
Naive Bayes.ipynb to predict the score using Naive Bayes
SVM.ipynb and SVM-COMBINE.ipynb to predict the score using an SVM model on either Text alone or the combination of Text and other features such as Summary
Logistic-Regression.ipynb to predict the score using Logistic-Regression
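As a point of reference for the files above, a constant baseline can be sketched in a few lines. This is only an illustration under assumptions: it picks the most frequent training score as the constant, which may differ from what predict-constant.py actually does, and the function names and toy scores are made up.

```python
from collections import Counter


def fit_constant(train_scores):
    """Pick the most frequent score in the training labels (assumption)."""
    return Counter(train_scores).most_common(1)[0][0]


def predict_constant(constant, n_rows):
    """Predict the same score for every row."""
    return [constant] * n_rows


def accuracy(y_true, y_pred):
    """Fraction of rows where the prediction matches the true score."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy scores, made up for illustration only.
train_scores = [5, 4, 5, 3, 5, 1]
test_scores = [5, 4, 5, 2]

constant = fit_constant(train_scores)
predictions = predict_constant(constant, len(test_scores))
print(constant, accuracy(test_scores, predictions))
```

Any real model should beat the accuracy of this baseline, which makes it a useful sanity check before comparing KNN, Naive Bayes, SVM, and Logistic Regression.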

Short Summary

In general, Naive Bayes is a good way to start a prediction project since it creates relatively accurate predictions in a short time. Logistic Regression performs best when you increase the number of iterations, but it takes a long time to produce results. A more detailed comparison is in the report_liyy.pdf file.
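One reason Naive Bayes tends to train quickly is that fitting reduces to a single counting pass over the data, whereas Logistic Regression is fit by iterative optimization. The sketch below shows a small multinomial Naive Bayes with add-one smoothing written from scratch. It is not the code in Naive Bayes.ipynb; `train_nb`, `predict_nb`, the `alpha` parameter, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """One counting pass: documents per class and word counts per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens, alpha=1.0):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                numerator = word_counts[label][tok] + alpha
                denominator = total_words + alpha * len(vocab)
                score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy tokenized reviews and scores, made up for illustration only.
docs = [["great", "movie"], ["great", "fun"], ["boring", "movie"], ["boring", "bad"]]
labels = [5, 5, 1, 1]

model = train_nb(docs, labels)
print(predict_nb(model, ["great", "fun"]))
print(predict_nb(model, ["boring", "unseen"]))
```

Training touches each token once and prediction is a handful of additions per class, which is consistent with the short run time noted above.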

Dataset Citation

https://drive.google.com/drive/folders/1TGj8-EogM9MmPB8cMfFvmSsFqAeaJEae?usp=sharing

J. McAuley and J. Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. WWW, 2013.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Star Rating Prediction of Amazon Movie Review

Main Goal of the Project

Data Fields:

Model Applied

Short Summary

Dataset Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Star Rating Prediction of Amazon Movie Review

Main Goal of the Project

Data Fields:

Model Applied

Short Summary

Dataset Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages