A real-time Twitter sentiment analysis, showcased with the help of the Twitter API, NLTK, and Word2Vec
Visit our website here
- You will see this home page when visiting the site on desktop or mobile.
- Select any topic:
- COVID
- BTS
- UEFA
- REACT
- BCCI
- Selecting any topic, say UEFA, opens the sentiment analysis for UEFA.
- On the left side, a chart depicts the percentage of tweet sentiments over the past year.
- On the right side, there is a text bar where you can enter a tweet to find its sentiment percentage.
- Below the text bar, you see live tweets, each with its sentiment emoji on the right.
- Enter any tweet in the tweet bar to check its sentiment percentage.
- Tweepy is an open-source Python package that gives you a convenient way to access the Twitter API with Python.
- Its documentation, available here, is easy to follow.
- To start pulling tweets with Tweepy, install it first:
!pip install tweepy
and then import and authenticate as follows:
import tweepy

# Authenticate with the keys from your Twitter Developer account
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_secret)
# wait_on_rate_limit makes Tweepy pause automatically when the rate limit is hit
api = tweepy.API(auth, wait_on_rate_limit=True)
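- Optionally, as a quick sanity check (not part of the original snippet), you can confirm Twitter accepts your keys:
# Fails if the keys above are invalid
api.verify_credentials()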
- To authorize the API, you need to create a Twitter Developer Account here
- We used this API to pull tweets live and showcase them on the live website
def pull_tweets(query, co=50):
    # Search recent tweets containing the topic's hashtag
    fetch_tweets = api.search(q="#" + query, count=co)
    return fetch_tweets
- To check out the full code, click here
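- A quick usage sketch (the topic is just an example; tweet.text follows Tweepy's Status model):
# Print the text of a few live tweets pulled for a topic
for tweet in pull_tweets("UEFA", co=10):
    print(tweet.text)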
- "Twint is an advanced tool for Twitter scraping. We can use this tool to scrape any user’s followers, following, tweets, etc. without having to use the Twitter API."
- You can check out more about the Twint API here
- Twint is preferable to Tweepy because of its many benefits:
- No restrictions on scraping tweets
- No hassle in setting up
- Can be used anonymously without Twitter sign-up.
- In this project, we extracted over 300,000 (3 lakh) tweets across the 5 topics via Twint, ranging from the start of 2020 to the present date; a minimal usage sketch follows below.
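- Below is a minimal sketch of how such a historical pull can be done with Twint; the exact configuration we used may differ, and the topic, date, limit, and output file here are illustrative:
import twint

# Configure a historical search for one topic (values are illustrative)
c = twint.Config()
c.Search = "UEFA"
c.Since = "2020-01-01"        # start of 2020, matching the range described above
c.Limit = 3000                # number of tweets to collect in this run
c.Store_csv = True
c.Output = "uefa_tweets.csv"  # hypothetical output file
twint.run.Search(c)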
- NLTK, the Natural Language Toolkit, is a library used mainly for natural-language text processing.
- We use NLTK for data cleaning and for removing unnecessary words that carry no meaning.
- We use NLTK's stopwords and lemmatizer to strip the unwanted parts of tweets.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")  # one-time download of the stopword list
nltk.download("wordnet")    # one-time download of WordNet data for the lemmatizer

lemm = WordNetLemmatizer()
stop_words = stopwords.words("english")
- You can check out the whole data cleaning of tweets here
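- As a rough illustration of the kind of cleaning these tools enable (the function and regex below are a sketch, not the project's exact pipeline):
import re

def clean_tweet(text):
    # Lowercase, then strip URLs, mentions, and non-alphabetic characters
    text = re.sub(r"http\S+|@\w+|[^a-z\s]", " ", text.lower())
    # Drop stopwords and lemmatize whatever remains
    tokens = [lemm.lemmatize(w) for w in text.split() if w not in stop_words]
    return " ".join(tokens)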
- After cleaning, the raw tweets are passed into the Word2Vec model
- We import Gensim's Word2Vec model as follows
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec
from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
- The model is configured with the following parameters
w2v_model = Word2Vec(min_count=3,       # ignore words that appear fewer than 3 times
                     window=4,          # context window size
                     size=300,          # dimensionality of the word vectors
                     sample=1e-5,       # downsampling threshold for frequent words
                     alpha=0.03,        # initial learning rate
                     min_alpha=0.0007,  # floor the learning rate decays to
                     negative=20)       # number of negative samples
w2v_model.build_vocab(sentences, progress_per=50000)
- For the full code of how to build the Word2Vec model, click here
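- After building the vocabulary, the model still has to be trained; a one-line sketch assuming gensim 3.x (the epoch count is illustrative):
# Train on the tokenized tweets
w2v_model.train(sentences, total_examples=w2v_model.corpus_count, epochs=30)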
- The word vectors obtained from the Word2Vec model are then passed into KMeans, which divides them into 3 clusters; each word is assigned a sentiment score according to the cluster it falls in.
- The KMeans parameters are as follows
from sklearn.cluster import KMeans

model = KMeans(n_clusters=3, max_iter=1000, random_state=True,
               n_init=50).fit(X=word_vectors.vectors.astype('double'))
- To view the full code of KMeans, click here
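- To make the clustering step concrete, here is a sketch of how each word can be mapped to its cluster (variable names are illustrative; which cluster corresponds to which sentiment has to be inspected manually):
import pandas as pd

# Each row: a vocabulary word and the KMeans cluster it landed in
words = pd.DataFrame(word_vectors.index2word, columns=["word"])
words["cluster"] = model.predict(word_vectors.vectors.astype("double"))
# A word's sentiment score is then assigned from its cluster's label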
- A Tf-Idf vectorizer is applied to the cleaned dataset to calculate the tf-idf score of every word, which is then combined with the previous sentiment score
- Import Tf-Idf vectorizer as follows
from sklearn.feature_extraction.text import TfidfVectorizer
- You can view the full code of how we utilized Tf-Idf here
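- A sketch of how the tf-idf scores can be extracted (variable names are illustrative; on scikit-learn >= 1.2 the method is get_feature_names_out):
# Fit on the cleaned tweets and map each word to its idf weight
vectorizer = TfidfVectorizer()
vectorizer.fit(cleaned_tweets)
tfidf_scores = dict(zip(vectorizer.get_feature_names(), vectorizer.idf_))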
A basic responsive Flask app, designed using:
- HTML, CSS
- Basic JavaScript and Python
- ChartJS for better visualisation of the data, e.g. the sentiment percentages
- Deployed using Heroku
- Clone the repo
git clone https://github.com/Zeph-T/RealTime-Twitter-Sentiment-Analysis.git
- Navigate to the Flask folder and create a virtual environment
python3 -m venv <your_environment_name>
- Activate the virtual environment using the following command
source <your_environment_name>/bin/activate
- Install all the required Packages using the command
pip install -r requirements.txt
- Run the .py file:
python3 app.py