This notebook performs sentiment analysis on Steam game reviews using three machine learning models:
- Multinomial Naive Bayes (Linear)
- Random Forest (Non-linear)
- SVM with RBF kernel (Non-linear)
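The "Linear" label on Naive Bayes refers to its decision rule. A review's score for each class is the log prior plus a weighted sum of its token counts, so the decision boundary is linear in the features. The following from-scratch illustration of that rule is not the notebook's code. It assumes reviews are already tokenized and labels are 1 (recommended) and 0 (not recommended), and it uses made-up toy reviews. The function names `train_mnb` and `predict_mnb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents.

    docs: list of token lists; labels: parallel list of class labels.
    alpha: additive (Laplace) smoothing strength.
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token -> count
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed denominator: all tokens seen in class c plus alpha per vocabulary word
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_likelihood


def predict_mnb(model, doc):
    """Return the class with the highest log-posterior score.

    The score is a sum of per-token log-probabilities, i.e. linear in the token counts.
    Tokens never seen during training are ignored.
    """
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in doc if t in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


train_docs = [["great", "fun"], ["fun", "addictive"], ["buggy", "boring"], ["boring", "refund"]]
train_labels = [1, 1, 0, 0]
model = train_mnb(train_docs, train_labels)
print(predict_mnb(model, ["fun", "great"]))     # expected: 1
print(predict_mnb(model, ["buggy", "refund"]))  # expected: 0
```

In practice a library implementation such as scikit-learn's `MultinomialNB` would normally be used. Writing it out by hand only makes the linear scoring visible.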
It uses two datasets:
- Steam Reviews Dataset: raw review text with the review language, game title, and a "recommended" label.
- Metacritic Dataset: critic and user scores per game.
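The Steam dataset's language and "recommended" fields are what a filter-and-balance step works with. This minimal sketch assumes each review is a dict with hypothetical keys `language`, `review` and `recommended`. It also assumes that "balance" means downsampling each label to the size of the rarest one. The notebook's exact filtering rules may differ, and `filter_and_balance` is an illustrative name.

```python
import random
from collections import defaultdict


def filter_and_balance(reviews, language="english", seed=0):
    """Keep non-empty reviews in one language, then downsample so every label is equally frequent.

    reviews: list of dicts with hypothetical keys "language", "review" and "recommended".
    """
    kept = [r for r in reviews if r["language"] == language and r["review"].strip()]
    by_label = defaultdict(list)
    for r in kept:
        by_label[r["recommended"]].append(r)
    if len(by_label) < 2:
        return kept  # only one label present, so there is nothing to balance
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


reviews = [
    {"language": "english", "review": "Great game", "recommended": True},
    {"language": "english", "review": "Loved it", "recommended": True},
    {"language": "english", "review": "So fun", "recommended": True},
    {"language": "english", "review": "Crashes constantly", "recommended": False},
    {"language": "german", "review": "Sehr gut", "recommended": True},
    {"language": "english", "review": "   ", "recommended": False},
]
balanced = filter_and_balance(reviews)
print(len(balanced))  # expected: 2 (one recommended, one not)
```

Downsampling discards data but keeps accuracy meaningful. With a skewed dataset, a model that always predicts "recommended" could otherwise score well.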
The notebook's workflow:
- Load both datasets via `config.json`
- Clean and pre-process the reviews (filter and balance them for model evaluation)
- Vectorize text using TF-IDF
- Split the data into training and test sets, then train the models
- Evaluate models (accuracy, precision, recall, F1, confusion matrix)
- Compare model-predicted sentiment with actual Metacritic scores (correlation and graphs)
- Bonus insights:
  - Most common positive and negative words
  - Random review samples, marked with whether each model's prediction was correct (for error analysis)
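TF-IDF weights a word by how often it appears in a review (TF) and down-weights words that appear in many reviews (IDF). This from-scratch sketch uses the smoothed formula idf(t) = ln((1 + N) / (1 + df(t))) + 1 with L2 normalization, which is the default documented for scikit-learn's `TfidfVectorizer`. Whether the notebook keeps those defaults is an assumption. The `tfidf` function and the toy documents are illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute L2-normalized TF-IDF vectors for tokenized documents.

    TF is the raw term count; IDF is the smoothed ln((1 + N) / (1 + df)) + 1,
    where N is the number of documents and df the number containing the term.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Scale to unit length so long and short reviews are comparable
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


docs = [["great", "game"], ["bad", "game"], ["great", "great", "story"]]
vectors = tfidf(docs)
for vec in vectors:
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

Words such as "game" that occur in many reviews get lower weights than rarer, more distinctive words like "bad".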
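The evaluation metrics and the Metacritic comparison reduce to a few counts and one correlation coefficient. This minimal sketch assumes binary labels (1 = recommended) and a hypothetical `{title: score}` mapping for Metacritic. `binary_metrics`, `pearson` and `sentiment_vs_score` are illustrative names. Aggregating predictions into each game's share of positive reviews is one plausible way to compare them with Metacritic scores, not necessarily the notebook's.

```python
import math
from collections import defaultdict


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1 and a 2x2 confusion matrix.

    The matrix is [[TN, FP], [FN, TP]] (rows = actual, columns = predicted).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": [[tn, fp], [fn, tp]],
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def sentiment_vs_score(games, predictions, metacritic):
    """Correlate each game's share of positive predictions with its Metacritic score."""
    per_game = defaultdict(list)
    for game, pred in zip(games, predictions):
        per_game[game].append(pred)
    titles = [g for g in per_game if g in metacritic]  # games present in both datasets
    positive_share = [sum(per_game[g]) / len(per_game[g]) for g in titles]
    return pearson(positive_share, [metacritic[g] for g in titles])


metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
r = sentiment_vs_score(["A", "A", "B", "B", "C", "C"], [1, 1, 1, 0, 0, 0], {"A": 90, "B": 75, "C": 60})
print(round(r, 3))
```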
To run the notebook:
- No setup is needed beyond placing `config.json` and the datasets at the correct file paths
- Run each cell in order
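Checking `config.json` up front gives a clear error before any heavy cell runs. This minimal sketch assumes the config is a JSON object holding the dataset paths. The key names `steam_reviews_path` and `metacritic_path` are hypothetical, so substitute whatever keys the real `config.json` uses.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("steam_reviews_path", "metacritic_path")  # hypothetical key names


def load_config(path="config.json"):
    """Read the JSON config and check that every dataset path it lists exists."""
    # Raises FileNotFoundError if the config file itself is missing
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing_keys = [k for k in REQUIRED_KEYS if k not in config]
    if missing_keys:
        raise KeyError(f"config is missing keys: {missing_keys}")
    missing_files = [config[k] for k in REQUIRED_KEYS if not Path(config[k]).exists()]
    if missing_files:
        raise FileNotFoundError(f"dataset file(s) not found: {missing_files}")
    return config
```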