
mastino/drug_review_classification


drug_review_classification

This project is a simple demonstration of classifying drug ratings based on the text reviews associated with those ratings. It has three parts: some EDA to help the user understand the data and make decisions about the model, an option to either compare three types of models or sweep the hyperparameters of a random forest, and finally an option to run a default model.

Right now it isn't quite working as intended. Notably, the plot I try to make to analyze the hyperparameters is broken, and the sweep doesn't show much difference between settings. I chose not to push it further because my computer is slow and iterating on this project is taking more time than I have.

EDA

The EDA script is minimal, but it can be run to count review lengths and character types, and to run sentiment analysis on the text.
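The length and character-type counting can be sketched with the standard library alone. This is a minimal, hypothetical version: the helper names `describe_review` and `summarize` are not from the project, and the sentiment analysis step is left out.

```python
from collections import Counter


def describe_review(text: str) -> dict:
    """Hypothetical helper: count length and character types for one review."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            counts["letters"] += 1
        elif ch.isdigit():
            counts["digits"] += 1
        elif ch.isspace():
            counts["whitespace"] += 1
        else:
            counts["other"] += 1  # punctuation, symbols, etc.
    return {
        "chars": len(text),
        "words": len(text.split()),
        "letters": counts["letters"],
        "digits": counts["digits"],
        "whitespace": counts["whitespace"],
        "other": counts["other"],
    }


def summarize(reviews):
    """Hypothetical helper: average each statistic over a non-empty list of reviews."""
    if not reviews:
        raise ValueError("need at least one review")
    stats = [describe_review(r) for r in reviews]
    return {key: sum(s[key] for s in stats) / len(stats) for key in stats[0]}
```

For example, `describe_review("Works great, 10/10!")` reports 19 characters, 3 words, 10 letters, 4 digits, 2 whitespace characters, and 3 other characters.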

Model Comparison

The model_comparison script has three options:

  • --compare_models
  • --sweep_hyperparameters
  • --run_model
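One way to expose these three options is a required, mutually exclusive group in `argparse`, so exactly one mode is chosen per run. This is a sketch under assumptions: the README does not show how the script parses its arguments, and the helper names `build_parser` and `selected_mode` are hypothetical.

```python
import argparse

# Mode names match the flags listed above.
MODES = ("compare_models", "sweep_hyperparameters", "run_model")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Classify drug review ratings from review text."
    )
    # required=True plus mutual exclusion means exactly one flag must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--compare_models", action="store_true",
                       help="Train and test three model types.")
    group.add_argument("--sweep_hyperparameters", action="store_true",
                       help="Sweep Random Forest hyperparameters.")
    group.add_argument("--run_model", action="store_true",
                       help="Run the default model with fixed parameters.")
    return parser


def selected_mode(argv=None) -> str:
    """Return the name of the single mode flag that was passed."""
    args = build_parser().parse_args(argv)
    return next(mode for mode in MODES if getattr(args, mode))
```

Assuming the script is saved as `model_comparison.py`, `python model_comparison.py --sweep_hyperparameters` would select the sweep, while passing two flags at once is rejected by `argparse` with a usage error.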

Compare models

Compare models trains and tests three models on the dataset:

  • 'Logistic Regression': LogisticRegression(random_state=42)
  • 'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)
  • 'SVM': SVC(kernel='rbf')
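A minimal sketch of what the comparison could look like, assuming the reviews are turned into TF-IDF features (the README does not say how the text is vectorized) and split 75/25 into train and test sets. The three model definitions are copied from the list above; the function name `compare_models`, the toy reviews, and the positive/negative labels are hypothetical stand-ins for the real dataset and its rating classes.

```python
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Model definitions as listed in this README.
MODELS = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(kernel='rbf'),
}


def compare_models(texts, labels, test_size=0.25):
    """Hypothetical helper: fit each model on TF-IDF features (assumed) and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=42, stratify=labels
    )
    results = {}
    for name, model in MODELS.items():
        # clone() gives each run a fresh, unfitted copy of the estimator.
        pipeline = make_pipeline(TfidfVectorizer(), clone(model))
        pipeline.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return results


if __name__ == "__main__":
    # Toy, made-up reviews; the real project uses a Hugging Face drug review dataset.
    reviews = [
        "worked great", "no side effects", "felt much better",
        "highly recommend", "pain gone fast", "very effective",
        "made me sick", "did not work", "terrible headaches",
        "stopped after a week", "awful side effects", "useless for me",
    ]
    labels = ["positive"] * 6 + ["negative"] * 6
    for name, acc in compare_models(reviews, labels).items():
        print(f"{name}: accuracy={acc:.3f}")
```

Wrapping each model in a pipeline with its own vectorizer means the TF-IDF vocabulary is learned only from the training split, so the test reviews do not leak into the features.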

Sweep Hyperparameters

This option runs the Random Forest model over ranges of n_estimators and min_samples and prints the average accuracy for each combination to help the user choose settings.
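The sweep can be sketched as a grid loop that records mean cross-validated accuracy for each combination. Several details here are assumptions: "min_samples" is taken to mean scikit-learn's `min_samples_split` (it could instead be `min_samples_leaf`), the features are TF-IDF, and the grid values and the name `sweep_random_forest` are hypothetical.

```python
from itertools import product

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def sweep_random_forest(texts, labels, n_estimators_grid=(50, 100, 200),
                        min_samples_split_grid=(2, 5, 10), cv=3):
    """Hypothetical helper: mean cross-validated accuracy for each grid point."""
    results = {}
    for n_estimators, min_samples_split in product(n_estimators_grid,
                                                   min_samples_split_grid):
        pipeline = make_pipeline(
            TfidfVectorizer(),  # assumed feature extraction
            RandomForestClassifier(
                n_estimators=n_estimators,
                min_samples_split=min_samples_split,  # assumed meaning of "min_samples"
                random_state=42,
            ),
        )
        scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
        results[(n_estimators, min_samples_split)] = scores.mean()
        print(f"n_estimators={n_estimators:<4} min_samples_split={min_samples_split:<3} "
              f"mean accuracy={scores.mean():.3f}")
    return results
```

Averaging over folds gives a steadier number than a single train/test split. scikit-learn's `GridSearchCV` performs the same kind of search and also keeps per-fold scores in its `cv_results_` attribute.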

The printed results are imperfect because they show only average accuracy, while each class has its own precision and recall. To select final settings, the user should review the more detailed per-class results.
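To see why a single accuracy number can hide per-class differences, here is a small sketch using scikit-learn's `classification_report`. The labels and predictions are made up for illustration, and the helper name `per_class_report` is hypothetical.

```python
from sklearn.metrics import classification_report


def per_class_report(y_true, y_pred):
    """Print and return per-class precision, recall, and F1 plus overall accuracy."""
    print(classification_report(y_true, y_pred, zero_division=0))
    return classification_report(y_true, y_pred, output_dict=True, zero_division=0)


# Made-up labels: 75% accuracy overall, but the two classes behave differently.
y_true = ["neg", "neg", "pos", "pos"]
y_pred = ["neg", "pos", "pos", "pos"]
report = per_class_report(y_true, y_pred)
```

Here `neg` has precision 1.0 but recall 0.5, while `pos` has recall 1.0 but precision of about 0.67. The overall accuracy of 0.75 reflects neither pattern on its own.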

Run model

This runs a model with hard-coded parameters because the hyperparameter tuning didn't give any interesting results.

About

A simple example of building a model to classify the reviews of drugs from a Hugging Face dataset.
