Twitter US Airline Sentiment

About

Analyze how travelers expressed their feelings about US airlines on Twitter in February 2015.

Data

Download the *.zip file from Kaggle:

https://www.kaggle.com/crowdflower/twitter-airline-sentiment

Only use the airline_sentiment and text columns during the analysis.
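
As a starting point, the column restriction above could be applied at load time. This is a minimal sketch assuming pandas is installed and the archive has been unzipped to a CSV file; the helper name `load_tweets` and the file name in the usage note are illustrative assumptions, not part of the assignment.

```python
import pandas as pd


def load_tweets(csv_path):
    """Load the dataset and keep only the two columns the assignment allows.

    `csv_path` is wherever the unzipped CSV lives (an assumption; adjust it
    to match your download).
    """
    df = pd.read_csv(csv_path, usecols=["airline_sentiment", "text"])
    # Drop rows that are missing either the label or the tweet text.
    return df.dropna(subset=["airline_sentiment", "text"]).reset_index(drop=True)
```

For example, `tweets = load_tweets("Tweets.csv")` would work if the extracted file is named `Tweets.csv`; check the actual name after unzipping.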

Part I

  • Build a KNN model.
  • Split the data into training and test sets.
  • Use a similarity score to find the nearest neighbors. Remember to tune k to optimize your model.
  • Use accuracy as your primary metric.
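
The Part I steps can be sketched as follows. This is one possible approach, not a required solution: it assumes scikit-learn, represents tweets as TF-IDF vectors, and uses cosine distance as the similarity score. None of those choices is prescribed by the assignment, and the function name `evaluate_knn` and its parameters are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def evaluate_knn(texts, labels, k_values=(1, 3, 5), test_size=0.25, seed=0):
    """Return {k: test accuracy} for a cosine-distance KNN on TF-IDF vectors."""
    # Split first so the vectorizer never sees the test tweets.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    vectorizer = TfidfVectorizer()
    train_vecs = vectorizer.fit_transform(X_train)
    test_vecs = vectorizer.transform(X_test)

    scores = {}
    for k in k_values:
        # metric="cosine" makes the neighbor search use cosine distance
        # (1 - cosine similarity), an assumed reading of "similarity score".
        model = KNeighborsClassifier(n_neighbors=k, metric="cosine")
        model.fit(train_vecs, y_train)
        scores[k] = accuracy_score(y_test, model.predict(test_vecs))
    return scores
```

Calling `evaluate_knn(tweets["text"], tweets["airline_sentiment"])` returns one accuracy per k, so the best k can be read off directly. Picking k on the test set gives an optimistic estimate; a separate validation split or cross-validation would be a cleaner way to tune it.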

Part II

  • Cluster the tweets into 3 groups, using KMeans.
  • Perform PCA to reduce the clustered tweet vectors to a 2-D representation.
  • Graph the clusters as a scatter plot.
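
The Part II steps can be sketched in the same spirit. This assumes scikit-learn and matplotlib, and again uses TF-IDF vectors as the tweet representation, which the assignment does not specify. The names `cluster_and_project` and `plot_clusters`, the `max_features` cap, and the output file name are all illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_and_project(texts, n_clusters=3, max_features=1000, seed=0):
    """Cluster tweets with KMeans and project them to 2-D with PCA."""
    # Cap the vocabulary so the dense matrix passed to PCA stays manageable.
    vectorizer = TfidfVectorizer(max_features=max_features, stop_words="english")
    vectors = vectorizer.fit_transform(texts)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(vectors)

    # Convert to a dense array before PCA; the clustering itself ran on
    # the full TF-IDF vectors, PCA is used only for visualization.
    coords = PCA(n_components=2, random_state=seed).fit_transform(vectors.toarray())
    return labels, coords


def plot_clusters(coords, labels, out_path="clusters.png"):
    """Save a scatter plot of the 2-D points, colored by cluster label."""
    # Imported here so the clustering step does not require matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis", s=10)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title("KMeans clusters in PCA space")
    fig.savefig(out_path)
    plt.close(fig)
```

A typical call would be `labels, coords = cluster_and_project(tweets["text"])` followed by `plot_clusters(coords, labels)`. Clustering before projecting keeps the groups based on the full feature space; running KMeans on the 2-D PCA output instead is a different design choice that usually produces tidier plots but discards information.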