dkiswanto/sentiment-analyzer-nltk-twitter

Simple Sentiment Analyzer with NLTK for Twitter

Tech Stack

  • Python 2.7
  • NLTK (Natural Language Toolkit) 3.2.2
  • TwitterSearch

Web App Tech

  • Django 1.9.11
  • Semantic UI
  • jQuery

Steps

Pre-processing:

  • Tokenization

Using: http://www.nltk.org/_modules/nltk/tokenize/casual.html#TweetTokenizer

Example:
>>> from nltk.tokenize import TweetTokenizer
>>> tknzr = TweetTokenizer()
>>> tweet = "This is a cooool #dummysmiley: :-) :-P <3 and some arrows < > -> <--"
>>> tknzr.tokenize(tweet)
['This', 'is', 'a', 'cooool', '#dummysmiley', ':', ':-)', ':-P', '<3', 'and', 'some', 'arrows', '<', '>', '->', '<--']
  • Removing stop words
>>> from nltk.corpus import stopwords
>>> english_stops = set(stopwords.words('english'))
>>> "is" in english_stops
True
>>> "ganteng" in english_stops
False
  • Stemming with the Porter algorithm

Algorithm: http://snowball.tartarus.org/algorithms/porter/stemmer.html

Simplified explanation:

1.a

$sses -> $ss | caresses -> caress

$ies -> $i | ponies -> poni

$ss -> $ss | caress -> caress

$s -> $ | cats -> cat

1.b

$(verb)-ing -> $(verb) | walking -> walk

$(verb)-ed -> $(verb) | walked -> walk

2. (for long stems)

$ational -> $ate | relational -> relate

$izer -> $ize | digitizer -> digitize

3. (for long stems)

$al -> $ | revival -> reviv

$able -> $ | adjustable -> adjust
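The step 1.a rules listed above can be sketched in a few lines of plain Python. This is only an illustration of those four suffix rules, not the full Porter stemmer and not NLTK's implementation; the function name `porter_step_1a` is made up for this example.

```python
def porter_step_1a(word):
    """Apply the four step 1.a suffix rules; the first matching rule wins."""
    if word.endswith("sses"):
        return word[:-2]   # $sses -> $ss   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]   # $ies  -> $i    (ponies -> poni)
    if word.endswith("ss"):
        return word        # $ss   -> $ss   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]   # $s    -> $     (cats -> cat)
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print("%s -> %s" % (w, porter_step_1a(w)))
```

The rules are checked from longest suffix to shortest, which is why "caress" is left alone instead of losing its final "s".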

  • Lowercasing with Python's built-in .lower() method
>>> "TwitterPostTweet".lower()
'twitterposttweet'
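The pre-processing steps above could be chained roughly as follows. This is a sketch under assumptions: the function name `preprocess`, the order of the steps (lowercasing is done before stop-word removal here so that "This" matches "this"), and the tiny `DEMO_STOPS` set are all illustrative and not taken from the project. In practice `set(stopwords.words('english'))` could be passed in instead, which requires the NLTK stopwords corpus to be downloaded.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# Stand-in stop list (assumption) so the sketch runs without corpus downloads.
DEMO_STOPS = {"this", "is", "a", "and", "some", "the"}


def preprocess(tweet, stops=DEMO_STOPS):
    """Tokenize, lowercase, drop stop words, then stem each remaining token."""
    tokens = TweetTokenizer().tokenize(tweet)
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in stops]
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]


tokens = preprocess("This is a cooool #dummysmiley :-) and some arrows")
print(tokens)
```

Passing the stop list as a parameter keeps the function usable with a different language's stop words.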

Feature Extraction

  • Using binary term frequency: each term that occurs in the tweet maps to True, no matter how often it occurs.
>>> tweet = ["apple", "product", "best", "use", "apple", "forever"]
>>> extraction_feature(tweet)
{"apple": True, "product": True, "best": True, "use": True, "forever": True}
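The README does not show the body of `extraction_feature`, so the following is a guess at a minimal version that matches the binary term frequency idea: every distinct token becomes a key with the value True, and repeated tokens collapse into one entry.

```python
def extraction_feature(tokens):
    """Binary term frequency: map each distinct token to True."""
    # Assumed implementation; the project's actual function may differ.
    return {token: True for token in tokens}


tweet = ["apple", "product", "best", "use", "apple", "forever"]
features = extraction_feature(tweet)
print(sorted(features))
```

A dictionary of this shape is the featureset format that NLTK classifiers accept.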

Classifier

  • Using NaiveBayesClassifier (NLTK).
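As a rough illustration of how NLTK's NaiveBayesClassifier can be trained on binary featuresets, here is a sketch with a made-up four-tweet training set. The labels "pos" and "neg", the training tokens, and the local `extraction_feature` helper are all assumptions for this example, not the project's data or code.

```python
from nltk.classify import NaiveBayesClassifier


def extraction_feature(tokens):
    """Hypothetical stand-in: binary term frequency featureset."""
    return {token: True for token in tokens}


# Toy training data (invented for illustration): (tokens, label) pairs.
train_tweets = [
    (["love", "best"], "pos"),
    (["love", "great"], "pos"),
    (["hate", "worst"], "neg"),
    (["hate", "bad"], "neg"),
]

# NLTK expects a list of (featureset, label) tuples.
train_set = [(extraction_feature(tokens), label) for tokens, label in train_tweets]
classifier = NaiveBayesClassifier.train(train_set)

label = classifier.classify(extraction_feature(["love"]))
print(label)
```

A real training set would be far larger, and the tokens would come from the pre-processing steps described earlier.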

Screenshots

[Screenshot 1]

[Screenshot 2]
