STOPWORDS

REALLY JUST A LIST OF STOPWORDS WITH SOME HELPERS

Part of a larger project, but worth breaking out for reuse.

USAGE


	
require 'stopwords'

# List all stop words
Stopwords::STOP_WORDS

# Test whether a token is a stop word
Stopwords.is?('and')
# => true

# Check that a token is both a 'word' and not a stop word
Stopwords.valid?('vector')
# => true

SPECS


$ rake specs

SANITIZE

Not part of the library, but you should probably sanitize tokens before using them (if your tokenizer doesn’t already do so).


SANITIZE_REGEXP = /('|\"|‘|’|\/|\\)/
text.downcase.gsub(SANITIZE_REGEXP, '')
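
A minimal sketch of how sanitizing might sit in front of a stop-word check. So that it runs without the library, it uses a tiny hypothetical DEMO_STOP_WORDS list and a hypothetical content_tokens helper; neither is part of this library, and in real use you would presumably call Stopwords.valid? instead. The regexp is repeated here only so the snippet stands on its own.

```ruby
require 'set'

# Same pattern as above: strips straight and curly quotes, slashes, and backslashes.
SANITIZE_REGEXP = /('|\"|‘|’|\/|\\)/

# Hypothetical stand-in for Stopwords::STOP_WORDS (assumption, not the real list).
DEMO_STOP_WORDS = Set.new(%w[a and is of the])

# Hypothetical helper: downcase, sanitize, split on whitespace, drop stop words.
def content_tokens(text)
  text.downcase
      .gsub(SANITIZE_REGEXP, '')
      .split
      .reject { |token| DEMO_STOP_WORDS.include?(token) }
end

content_tokens("The vector's length AND the 'norm'")
# => ["vectors", "length", "norm"]
```

Note that removing the apostrophe turns "vector's" into "vectors"; whether that is acceptable depends on your tokenizer and what you do with the tokens afterwards.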

ENDAX

Software Services shop (primarily Ruby) in Brooklyn, NY.