
RealTime-Twitter-Sentiment-Analysis

Real-time Twitter sentiment analysis, built with the Twitter API, NLTK, and Word2Vec.

Visit our website here


Live Demo

  • You will see this home page when visiting the site on desktop or mobile.
  • Select any topic:
    • COVID
    • BTS
    • UEFA
    • REACT
    • BCCI
  • Selecting a topic, say UEFA, opens the sentiment analysis for UEFA.

  • On the left side, a chart depicts the percentage of tweet sentiment over the past year.
  • On the right side, a text bar lets you enter a tweet to find its sentiment percentage.
  • Below the text bar, live tweets appear with their sentiment emoji on the right.
  • Enter any tweet in the text bar to check its sentiment percentage.



Technologies Used

1. Tweepy API

  • Tweepy is an open-source Python package that gives you a convenient way to access the Twitter API from Python.
  • Its documentation is easy to follow; see here.
  • To start pulling tweets with Tweepy, install it using pip install tweepy and then import it as follows:
 import tweepy

 auth = tweepy.OAuthHandler(api_key, api_secret)
 auth.set_access_token(access_token, access_secret)
 api = tweepy.API(auth, wait_on_rate_limit=True)
  • To authorize the API, you need to create a Twitter developer account from here
  • We used this API to pull tweets live and showcase them on the live website:
def pull_tweets(query, co=50):
    # Search recent tweets for the given hashtag (Tweepy v3 search endpoint)
    fetch_tweets = api.search(q="#" + query, count=co)
    return fetch_tweets
  • To check out full code, click here

2. Twint API

  • "Twint is an advanced tool for Twitter scraping. We can use this tool to scrape any user's followers, following, tweets, etc. without having to use the Twitter API."

  • You can check out more about TWINT API here

  • Twint is preferable to Tweepy because of its many benefits:

    • No restrictions on scraping tweets
    • No setup hassle
    • Can be used anonymously, without a Twitter sign-up
  • In this project, we extracted over 3 lakh (300,000) tweets across the 5 topics via Twint, ranging from the start of 2020 to the present date.


3. NLTK

  • NLTK, the Natural Language Toolkit, is a library used mainly for natural language text processing.
  • NLTK is used for data cleaning and for removing unnecessary words that carry no meaning.
  • We use NLTK's stopwords and lemmatizer to strip the unwanted parts of tweets.
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemm = WordNetLemmatizer()

stop_words = stopwords.words("english")
  • You can check out the whole data cleaning of tweets here
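Putting the pieces together, the cleaning stage looks roughly like the sketch below. For brevity it uses a tiny illustrative stopword list and skips lemmatization; the real pipeline uses NLTK's full English stopword set and WordNetLemmatizer, and the regex patterns here are illustrative rather than the project's verbatim code.

```python
import re

# tiny illustrative stand-in for nltk.corpus.stopwords.words("english")
STOP_WORDS = {"the", "a", "an", "is", "are", "this", "that", "to", "of", "and", "in", "out"}

def clean_tweet(text):
    """Lowercase, strip URLs/mentions/punctuation, drop stopwords, return tokens."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)              # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)          # keep letters and spaces only
    return [w for w in text.split() if w not in STOP_WORDS]

print(clean_tweet("Check this out! https://t.co/xyz @user #COVID cases rising"))
# → ['check', 'covid', 'cases', 'rising']
```

Each cleaned tweet becomes a list of tokens, which is exactly the input shape the Word2Vec model expects next.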

4. Gensim's Word2Vec model

  • After cleaning, the raw tweets are passed into the Word2Vec model.
  • We import Gensim's Word2Vec model as follows:
from gensim.models.phrases import Phrases, Phraser
from gensim.models import Word2Vec
from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
  • The model is configured with the following parameters:
w2v_model = Word2Vec(min_count=3,
                     window=4,
                     size=300,
                     sample=1e-5,
                     alpha=0.03,
                     min_alpha=0.0007,
                     negative=20)

w2v_model.build_vocab(sentences, progress_per=50000)

For the full code of how to build the Word2Vec model, click here


5. KMeans Clustering

  • The word vectors obtained from the Word2Vec model are next fed into KMeans, which divides them into 3 clusters; each word is then assigned the sentiment score of the cluster it falls into.
  • The parameters of the KMeans are as follows:
    model = KMeans(n_clusters=3, max_iter=1000, random_state=True, n_init=50).fit(X=word_vectors.vectors.astype('double'))
  • To view the full code of KMeans, click here
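The cluster-to-score step can be sketched as follows. Toy random blobs stand in for the real Word2Vec vectors, and the mapping of cluster indices to the scores -1 / 0 / +1 is illustrative, not the project's exact rule.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# toy "word vectors": three well-separated blobs standing in for
# negative / neutral / positive word groups
word_vectors = np.vstack([
    rng.normal(-5, 0.5, size=(30, 8)),
    rng.normal(0, 0.5, size=(30, 8)),
    rng.normal(5, 0.5, size=(30, 8)),
])

model = KMeans(n_clusters=3, max_iter=1000, random_state=42, n_init=50)
labels = model.fit_predict(word_vectors.astype("double"))

# assign each word the sentiment score of its cluster; ordering clusters by
# centroid mean makes -1 / 0 / +1 land on the right blobs (illustrative mapping)
order = np.argsort(model.cluster_centers_.mean(axis=1))
score_of_cluster = {cluster: score for cluster, score in zip(order, (-1, 0, 1))}
word_scores = np.array([score_of_cluster[c] for c in labels])
print(sorted(set(word_scores.tolist())))  # → [-1, 0, 1]
```

In the real pipeline the cluster labels come from the 300-dimensional Word2Vec vectors, and each word's score is saved for the Tf-Idf combination step below.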

6. Tf-Idf Vectorizer

  • The Tf-Idf vectorizer is applied to the cleaned dataset to calculate the tf-idf score of every word, which is then combined with the sentiment score from the previous step.
  • Import the Tf-Idf vectorizer as follows:
from sklearn.feature_extraction.text import TfidfVectorizer
  • You can view the full code of how we utilized Tf-Idf here

Deployment

A basic responsive Flask app designed using:

  • HTML, CSS
  • Basic JavaScript and Python
  • ChartJS for better visualisation of the sentiment percentages
  • Deployed using Heroku
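At its core, the Flask app needs little more than a route that accepts a tweet and returns its sentiment percentage. A minimal sketch follows; the route name and the scoring stub are hypothetical, standing in for the real tf-idf + cluster-score pipeline.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_sentiment(text):
    """Stub standing in for the real tf-idf + cluster-score pipeline."""
    positive = {"great", "amazing", "good"}
    words = text.lower().split()
    hits = sum(w in positive for w in words)
    return round(100 * hits / max(len(words), 1), 1)

@app.route("/sentiment", methods=["POST"])
def sentiment():
    tweet = request.get_json().get("tweet", "")
    return jsonify({"tweet": tweet, "positive_pct": predict_sentiment(tweet)})
```

The front end posts the text-bar contents to such an endpoint and feeds the returned percentage into ChartJS.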

Setting up the Project

  1. Clone the repo
git clone https://github.com/Zeph-T/RealTime-Twitter-Sentiment-Analysis.git
  2. Navigate to the Flask folder and create a virtual environment
python3 -m venv <your_environment_name>
  3. Activate the virtual environment using the following command
source <your_environment_name>/bin/activate
  4. Install all the required packages using the command
pip install -r requirements.txt
  5. Run the .py file
python3 app.py
