A real-time Twitter sentiment analysis built with the Twitter APIs, NLTK, and Word2Vec
Visit our Website here
- You will see this home page when visiting the site on desktop or mobile.
- Select any topic:
- Selecting a topic, for example UEFA, opens the sentiment analysis for that topic.
- On the left side, a chart shows the percentage of tweets for each sentiment over a period of more than a year.
- On the right side, there is a text bar where you can enter a tweet to find its sentiment percentage.
- Below the text bar, you see live tweets, each with its sentiment emoji on the right.
- Enter any tweet in the text bar to check its sentiment percentage.
- Tweepy is an open source Python package that gives you a very convenient way to access the Twitter API with Python.
- Its documentation is easy to follow; you can find it here
- To start pulling tweets with Tweepy, you need to install it using
!pip install tweepy
and then import as follows:
import tweepy
- To authorize API access, you need to create a Twitter Developer Account from here
- We used this API to pull live tweets and display them on the live website
def pull_tweets(query, co=50):
    # `api` is the authorized tweepy.API object; Tweepy 4+ renames api.search to api.search_tweets
    fetch_tweets = api.search(q="#" + query, count=co)
- To check out full code, click here
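As a rough sketch of how the pieces above fit together, the helper below takes an already-authorized `api` object and returns the tweet texts for a hashtag. Passing `api` as a parameter, returning a list of strings, and the fallback between `search_tweets` and `search` are assumptions made for illustration; the project's own function may differ.

```python
def pull_tweets(api, query, count=50):
    """Return the text of up to `count` recent tweets for the hashtag `query`.

    `api` is assumed to be an authorized tweepy.API instance.
    """
    # Tweepy 4 renamed API.search to API.search_tweets; support both.
    search = getattr(api, "search_tweets", None) or api.search
    statuses = search(q="#" + query, count=count)
    return [status.text for status in statuses]


# Typical wiring (the credential names are placeholders, not real values):
#   import tweepy
#   auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
#   auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
#   api = tweepy.API(auth)
#   texts = pull_tweets(api, "UEFA")
```

Keeping the authorized client outside the function makes it easy to try the helper with a stand-in object before using real credentials.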
"Twint is an advanced tool for Twitter scrapping. We can use this tool to scrape any user’s followers, following, tweets, etc. without having to use Twitter API".
You can read more about Twint here
Twint is preferable to Tweepy because of its many benefits:
- No restrictions on scraping tweets
- No hassle in setting up
- Can be used anonymously without Twitter sign-up.
In this project, we extracted over 3 lakh (300,000) tweets for 5 topics via Twint, ranging from the start of 2020 to the present date.
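A minimal sketch of how one topic could be configured for such a scrape is shown below. It only sets attributes on a Twint-style config object; the helper name `configure_search`, the default start date, and the output file name are hypothetical, and the project's actual settings are not shown in this README.

```python
from datetime import date


def configure_search(config, topic, since="2020-01-01", until=None, output=None):
    """Fill a twint.Config-like object for one topic and return it."""
    config.Search = topic                     # keyword or hashtag to search
    config.Since = since                      # start date, YYYY-MM-DD
    config.Until = until or date.today().isoformat()  # end date
    config.Store_csv = True                   # write results to a CSV file
    config.Output = output or f"{topic}.csv"  # hypothetical file name
    config.Hide_output = True                 # do not print each tweet
    return config


# Typical wiring (requires the twint package and network access):
#   import twint
#   c = configure_search(twint.Config(), "UEFA")
#   twint.run.Search(c)
```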
- NLTK, the Natural Language Toolkit, is a library used mainly for natural language text processing.
- We use NLTK for data cleaning and for removing words that carry no meaning for the analysis.
- We use NLTK's stopwords and lemmatizer to clean out the unwanted parts of tweets.
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
lemm = WordNetLemmatizer()
stop_words = stopwords.words("english")
- You can check out whole data cleaning of tweets here
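The cleaning step can be sketched roughly as follows. The stop word set and the lemmatizing function are passed in as arguments so the sketch runs without downloading NLTK corpora; the regular expressions and the function name `clean_tweet` are assumptions for illustration, not the project's exact rules.

```python
import re


def clean_tweet(text, stop_words, lemmatize):
    """Lowercase a tweet, strip links, mentions and symbols, drop stop words, lemmatize."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)                   # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # keep letters only ('#' is dropped, the word stays)
    tokens = [t for t in text.split() if t not in stop_words]
    return [lemmatize(t) for t in tokens]


# With NLTK (after nltk.download("stopwords") and nltk.download("wordnet")):
#   stop_words = set(stopwords.words("english"))
#   tokens = clean_tweet(raw_tweet, stop_words, lemm.lemmatize)
```

Converting the stop word list to a `set` keeps the membership test fast when cleaning hundreds of thousands of tweets.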
- After the raw tweets are cleaned, they are passed into a Word2Vec model
- We import Gensim's Word2Vec model as follows
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec
from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
- The model is set to the following parameters
w2v_model = Word2Vec(min_count=3)  # other constructor arguments are cut off in this excerpt; see the full code
w2v_model.build_vocab(sentences, progress_per=50000)
For full code of how to build Word2Vec model, click here
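Word2Vec expects `sentences` to be an iterable of token lists, and `min_count=3` makes it ignore words whose total frequency is below 3. The sketch below imitates only that vocabulary filter in plain Python so the effect is easy to see; it does not call Gensim, and the function name and toy data are illustrative.

```python
from collections import Counter


def vocab_after_min_count(sentences, min_count=3):
    """Return the words that appear at least `min_count` times in total."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


# Each cleaned tweet is one list of tokens.
sentences = [
    ["great", "goal", "tonight"],
    ["great", "match", "great", "goal"],
    ["goal", "of", "the", "season"],
]
print(sorted(vocab_after_min_count(sentences)))  # ['goal', 'great']
```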
- The word vectors obtained from the Word2Vec model are then passed to KMeans, which divides them into 3 clusters; each word is given a sentiment score according to the cluster it falls in.
- The parameters of KMeans are as follows
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3, max_iter=1000, random_state=1, n_init=50).fit(X=word_vectors.vectors.astype('double'))
- To view the full code of KMeans, click here
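To illustrate how a cluster can be turned into a sentiment score, the sketch below assigns a word vector to its nearest centroid and looks up that cluster's sentiment value. The toy centroids stand in for `model.cluster_centers_`, and the mapping from cluster index to score is hypothetical: which cluster is positive, negative, or neutral has to be decided by inspecting the words in each cluster.

```python
import math


def nearest_cluster(vector, centroids):
    """Return the index of the centroid closest to `vector` (Euclidean distance)."""
    distances = [math.dist(vector, c) for c in centroids]
    return distances.index(min(distances))


def word_sentiment(vector, centroids, cluster_sentiment):
    """Map a word vector to the sentiment value of its nearest cluster."""
    return cluster_sentiment[nearest_cluster(vector, centroids)]


# Toy 2-D centroids standing in for model.cluster_centers_.
centroids = [(1.0, 1.0), (-1.0, -1.0), (0.0, 0.0)]
# Hypothetical labelling of the three clusters.
cluster_sentiment = {0: 1, 1: -1, 2: 0}

print(word_sentiment((0.9, 1.2), centroids, cluster_sentiment))    # 1
print(word_sentiment((-0.8, -1.1), centroids, cluster_sentiment))  # -1
```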
- A Tf-Idf vectorizer is applied to the cleaned dataset to calculate the tf-idf score of every word, which is then combined with the sentiment score from the previous step
- Import the Tf-Idf vectorizer as follows
from sklearn.feature_extraction.text import TfidfVectorizer
- You can view the full code of how we utilized Tf-Idf here
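One plausible way to combine the two scores is to weight each word's sentiment by its tf-idf value and sum over the tweet. The sketch below does this in plain Python; the combination rule, the function names, and the toy sentiment values are assumptions, since this README does not spell out how the project combines them.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Uses idf = ln((1 + N) / (1 + df)) + 1, the smoothed form that
    TfidfVectorizer uses by default, without the final L2 normalization.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}
    return [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]


def tweet_score(weights, word_sentiment):
    """Sum tf-idf * sentiment over the words of one tweet (unknown words count as 0)."""
    return sum(w * word_sentiment.get(word, 0) for word, w in weights.items())


docs = [["great", "goal"], ["bad", "foul"], ["goal"]]
word_sentiment = {"great": 1, "goal": 1, "bad": -1, "foul": -1}  # toy values
scores = [tweet_score(w, word_sentiment) for w in tfidf_weights(docs)]
print([s > 0 for s in scores])  # [True, False, True]
```

A positive total suggests a positive tweet and a negative total a negative one; rarer words contribute more because their idf is higher.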
A basic responsive Flask app, built using:
- Basic JavaScript and Python
- ChartJS for visualising the sentiment percentages
- Deployed using Heroku
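A chart of sentiment percentages needs the share of each label as input. The sketch below shows one way to compute that and serialize it as JSON for the front end; the label names and the function `sentiment_percentages` are illustrative, and in a Flask route the dictionary could be returned with `jsonify` instead of `json.dumps`.

```python
import json
from collections import Counter


def sentiment_percentages(labels):
    """Return the share of each sentiment label as a percentage (one decimal)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


labels = ["positive", "negative", "positive", "neutral"]
payload = json.dumps(sentiment_percentages(labels))
print(payload)  # {"positive": 50.0, "negative": 25.0, "neutral": 25.0}
```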
- Clone the repo
git clone https://github.com/Zeph-T/RealTime-Twitter-Sentiment-Analysis.git
- Navigate to the Flask folder and create a virtual environment
python3 -m venv <your_environment_name>
- Activate the virtual environment using the following command
source <your_environment_name>/bin/activate
- Install all the required Packages using the command
pip install -r requirements.txt
- Run the app
python3 app.py