
RealTime-Twitter-Sentiment-Analysis

Real-time Twitter sentiment analysis, built with the Twitter API, NLTK, and Word2Vec.

Visit our website here


Live Demo

  • You will see this home page when visiting the site on desktop or mobile.
  • Select any topic:
    • COVID
    • BTS
    • UEFA
    • REACT
    • BCCI
  • Selecting a topic, say UEFA, opens the sentiment analysis for UEFA.

  • On the left side, a chart depicts the percentage of tweet sentiment over the past year.
  • On the right side, a text bar lets you enter a tweet to find its sentiment percentage.
  • Below the text bar, live tweets appear with their sentiment emoji on the right.
  • Enter any tweet in the text bar to check its sentiment percentage.



Technologies Used

1. Tweepy API

  • Tweepy is an open-source Python package that gives you a convenient way to access the Twitter API from Python.
  • Its documentation is easy to follow; see here.
  • To start pulling tweets with Tweepy, install it using pip install tweepy and then import it as follows:
 import tweepy

 auth = tweepy.OAuthHandler(api_key, api_secret)
 auth.set_access_token(access_token, access_secret)
 api = tweepy.API(auth, wait_on_rate_limit=True)
  • To authorize the API, you need to create a Twitter developer account from here
  • We used this API to pull tweets live and showcase them on the live website:
def pull_tweets(query, co=50):
    # Search recent tweets for the given hashtag (Tweepy v3 search endpoint)
    fetch_tweets = api.search(q="#" + query, count=co)
    return fetch_tweets
  • To check out full code, click here

2. Twint API

  • "Twint is an advanced tool for Twitter scraping. We can use this tool to scrape any user's followers, following, tweets, etc. without having to use the Twitter API."

  • You can check out more about TWINT API here

  • Twint is preferable to Tweepy because of its many benefits:

    • No restrictions on scraping tweets
    • No setup hassle
    • Can be used anonymously, without a Twitter sign-up
  • In this project, we extracted over 3 lakh (300,000) tweets across the 5 topics via Twint, ranging from the start of 2020 to the present date.


3. NLTK

  • NLTK, the Natural Language Toolkit, is a library used mainly for natural language text processing.
  • NLTK is used for data cleaning and for removing unnecessary words that carry no meaning.
  • We use NLTK's stopwords and lemmatizer to strip the unwanted parts of tweets.
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemm = WordNetLemmatizer()

stop_words = stopwords.words("english")
  • You can check out the whole data cleaning of tweets here
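Putting the pieces together, the cleaning stage looks roughly like the sketch below. For brevity it uses a tiny illustrative stopword list and skips lemmatization; the real pipeline uses NLTK's full English stopword set and WordNetLemmatizer, and the regex patterns here are illustrative rather than the project's verbatim code.

```python
import re

# tiny illustrative stand-in for nltk.corpus.stopwords.words("english")
STOP_WORDS = {"the", "a", "an", "is", "are", "this", "that", "to", "of", "and", "in", "out"}

def clean_tweet(text):
    """Lowercase, strip URLs/mentions/punctuation, drop stopwords, return tokens."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)              # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)          # keep letters and spaces only
    return [w for w in text.split() if w not in STOP_WORDS]

print(clean_tweet("Check this out! https://t.co/xyz @user #COVID cases rising"))
# → ['check', 'covid', 'cases', 'rising']
```

Each cleaned tweet becomes a list of tokens, which is exactly the input shape the Word2Vec model expects next.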

4. Gensim's Word2Vec model

  • After cleaning, the raw tweets are passed into the Word2Vec model.
  • We import Gensim's Word2Vec model as follows:
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec
from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
  • The model is configured with the following parameters:
w2v_model = Word2Vec(min_count=3,
                     window=4,
                     size=300,
                     sample=1e-5,
                     alpha=0.03,
                     min_alpha=0.0007,
                     negative=20)

w2v_model.build_vocab(sentences, progress_per=50000)

For the full code of how to build the Word2Vec model, click here


5. KMeans Clustering

  • The word vectors obtained from the Word2Vec model are next fed into KMeans, which divides them into 3 clusters; each word is then assigned the sentiment score of the cluster it falls into.
  • The parameters of the KMeans are as follows:
    model = KMeans(n_clusters=3, max_iter=1000, random_state=True, n_init=50).fit(X=word_vectors.vectors.astype('double'))
  • To view the full code of KMeans, click here
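The cluster-to-score step can be sketched as follows. Toy random blobs stand in for the real Word2Vec vectors, and the mapping of cluster indices to the scores -1 / 0 / +1 is illustrative, not the project's exact rule.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# toy "word vectors": three well-separated blobs standing in for
# negative / neutral / positive word groups
word_vectors = np.vstack([
    rng.normal(-5, 0.5, size=(30, 8)),
    rng.normal(0, 0.5, size=(30, 8)),
    rng.normal(5, 0.5, size=(30, 8)),
])

model = KMeans(n_clusters=3, max_iter=1000, random_state=42, n_init=50)
labels = model.fit_predict(word_vectors.astype("double"))

# assign each word the sentiment score of its cluster; ordering clusters by
# centroid mean makes -1 / 0 / +1 land on the right blobs (illustrative mapping)
order = np.argsort(model.cluster_centers_.mean(axis=1))
score_of_cluster = {cluster: score for cluster, score in zip(order, (-1, 0, 1))}
word_scores = np.array([score_of_cluster[c] for c in labels])
print(sorted(set(word_scores.tolist())))  # → [-1, 0, 1]
```

In the real pipeline the cluster labels come from the 300-dimensional Word2Vec vectors, and each word's score is saved for the Tf-Idf combination step below.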

6. Tf-Idf Vectorizer

  • The Tf-Idf vectorizer is applied to the cleaned dataset to calculate the tf-idf score of every word, which is then combined with the sentiment score from the previous step.
  • Import the Tf-Idf vectorizer as follows:
from sklearn.feature_extraction.text import TfidfVectorizer
  • You can view the full code of how we utilized Tf-Idf here

Deployment

A basic responsive Flask app designed using:

  • HTML, CSS
  • Basic JavaScript and Python
  • ChartJS for better visualisation of the sentiment percentages
  • Deployed using Heroku
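At its core, the Flask app needs little more than a route that accepts a tweet and returns its sentiment percentage. A minimal sketch follows; the route name and the scoring stub are hypothetical, standing in for the real tf-idf + cluster-score pipeline.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_sentiment(text):
    """Stub standing in for the real tf-idf + cluster-score pipeline."""
    positive = {"great", "amazing", "good"}
    words = text.lower().split()
    hits = sum(w in positive for w in words)
    return round(100 * hits / max(len(words), 1), 1)

@app.route("/sentiment", methods=["POST"])
def sentiment():
    tweet = request.get_json().get("tweet", "")
    return jsonify({"tweet": tweet, "positive_pct": predict_sentiment(tweet)})
```

The front end posts the text-bar contents to such an endpoint and feeds the returned percentage into ChartJS.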

Setting up the Project

  1. Clone the repo
git clone https://github.com/Zeph-T/RealTime-Twitter-Sentiment-Analysis.git
  2. Navigate to the Flask folder and create a virtual environment
python3 -m venv <your_environment_name>
  3. Activate the virtual environment using the following command
source <your_environment_name>/bin/activate
  4. Install all the required packages using the command
pip install -r requirements.txt
  5. Run the .py file
python3 app.py
