Analyze how travelers in February 2015 expressed their feelings on Twitter
Download the *.zip file from Kaggle:
https://www.kaggle.com/crowdflower/twitter-airline-sentiment
Only use the `airline_sentiment` and `text` columns during the analysis.
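
As a starting point, here is a minimal sketch of loading the data and keeping only the two permitted columns. It assumes pandas is installed and that the extracted archive contains a CSV named `Tweets.csv`; check the actual file name after unzipping. The helper `load_tweets` and the three sample rows are made up for illustration.

```python
import io

import pandas as pd


def load_tweets(source):
    """Read the CSV and keep only the two columns the task allows."""
    df = pd.read_csv(source, usecols=["airline_sentiment", "text"])
    # Drop rows with a missing label or text so later steps get clean input.
    return df.dropna().reset_index(drop=True)


# Made-up sample standing in for the real file (assumed name: Tweets.csv).
sample_csv = io.StringIO(
    "tweet_id,airline_sentiment,airline,text\n"
    "1,negative,United,my flight was delayed again\n"
    "2,positive,Delta,great crew and smooth flight\n"
    "3,neutral,Southwest,what time does check-in open\n"
)
tweets = load_tweets(sample_csv)
print(tweets)
```

With the real data, the call would look like `load_tweets("Tweets.csv")`, adjusted to wherever the file was extracted.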
- Build a KNN Model.
- Split the data into train and test.
- Use the similarity score to find the nearest neighbor. Remember to adjust `k` to optimize your model.
- Use accuracy as your primary metric.
- Cluster the tweets into 3 groups, using KMeans.
- Perform PCA and lower the dimensionality of the clusters to a 2-d representation.
- Graph the clusters as a scatter plot.
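
The steps above could be sketched roughly as follows. This is one possible approach, not a reference solution. It assumes scikit-learn (and matplotlib for the plot), TF-IDF features, and cosine distance as the similarity measure, since the task does not say which similarity to use. The function names, the `max_features=2000` cap, and the toy tweets are all hypothetical choices made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy_by_k(texts, labels, k_values, test_size=0.25, seed=0):
    """Return {k: test accuracy} for a cosine-distance KNN on TF-IDF features."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    vectorizer = TfidfVectorizer()
    train_vecs = vectorizer.fit_transform(x_train)  # fit on training text only
    test_vecs = vectorizer.transform(x_test)
    scores = {}
    for k in k_values:
        # Cosine distance is 1 - cosine similarity, so the nearest neighbors
        # are the most similar tweets.
        model = KNeighborsClassifier(n_neighbors=k, metric="cosine", algorithm="brute")
        model.fit(train_vecs, y_train)
        scores[k] = accuracy_score(y_test, model.predict(test_vecs))
    return scores


def cluster_and_project(texts, n_clusters=3, max_features=2000, seed=0):
    """Cluster tweets with KMeans, then reduce the features to 2-D with PCA."""
    # max_features caps the vocabulary so the dense matrix below stays small.
    vecs = TfidfVectorizer(max_features=max_features).fit_transform(texts)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_ids = kmeans.fit_predict(vecs)
    coords = PCA(n_components=2, random_state=seed).fit_transform(vecs.toarray())
    return cluster_ids, coords


def plot_clusters(coords, cluster_ids, path="clusters.png"):
    """Save a scatter plot of the 2-D points, colored by cluster."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=cluster_ids)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.savefig(path)
    plt.close(fig)


# Made-up tweets for a quick smoke test; replace with the real columns.
toy_texts = [
    "flight delayed for hours terrible service",
    "lost my bag and nobody helped terrible",
    "worst airline ever flight cancelled again",
    "rude staff and delayed flight awful",
    "great crew thank you for the smooth flight",
    "love this airline great service thank you",
    "amazing flight friendly crew thanks",
    "thank you for the upgrade great experience",
    "what time does check in open tomorrow",
    "can I change my seat on the app",
    "do you fly to Denver on Sunday",
    "how do I add a bag to my booking",
]
toy_labels = ["negative"] * 4 + ["positive"] * 4 + ["neutral"] * 4

scores = knn_accuracy_by_k(toy_texts, toy_labels, k_values=[1, 3])
cluster_ids, coords = cluster_and_project(toy_texts)
print(scores)
print(coords.shape)
```

On the real data, pick the `k` with the highest accuracy (ideally on a validation split rather than the test set), and call `plot_clusters(coords, cluster_ids)` to write the scatter plot to disk.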