A machine learning model that predicts whether a news article is fake or genuine, using Natural Language Processing (NLP).
Goal:- The main aim of this project is to determine which news is fake and which is real. Many news articles make false accusations against a particular person or community. This model aims to address that problem by classifying such articles accurately. The result for each article is either true (genuine) or false (fake).
Proposed System Steps:-
• Collecting information about fake news from a dataset.
• Cleaning the dataset by removing unnecessary details from it.
• Analyzing the dataset for the required information.
• Implementing the classification algorithms on the dataset.
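The cleaning step above could look roughly like the sketch below. It is only an illustration: the field names `text` and `label`, the sample rows, and the helper names `clean_text` and `clean_rows` are assumptions, since the actual dataset layout is not described here.

```python
import re

# Hypothetical records; the real dataset's column names are not given in this README.
raw_rows = [
    {"text": "  BREAKING!!! Aliens <b>land</b> in Paris  ", "label": "FAKE"},
    {"text": "", "label": "REAL"},  # empty text: dropped
    {"text": "Parliament passes the budget bill.", "label": "REAL"},
    {"text": "Parliament passes the budget bill.", "label": "REAL"},  # duplicate: dropped
]

def clean_text(text):
    """Strip HTML-like tags and non-letter characters, collapse whitespace, lower-case."""
    text = re.sub(r"<[^>]+>", " ", text)      # remove tags such as <b>...</b>
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip().lower()

def clean_rows(rows):
    """Clean every row's text and drop rows that end up empty or duplicated."""
    seen = set()
    cleaned = []
    for row in rows:
        text = clean_text(row["text"])
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append({"text": text, "label": row["label"]})
    return cleaned

cleaned_rows = clean_rows(raw_rows)
for row in cleaned_rows:
    print(row)
```

Which details count as "unnecessary" depends on the dataset, so the regular expressions here should be treated as a starting point only.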
In this project a model is built on either a count vectorizer (raw word tallies) or a TF-IDF matrix (i.e. word tallies relative to how often the words are used in other articles in the dataset). Since this problem is a kind of text classification, a Naive Bayes classifier is a natural choice, as it is a standard approach for text-based processing. The main decisions in developing the model are the text transformation (count vectorizer vs. TF-IDF vectorizer) and which type of text to use (headlines vs. full text). The next step is to extract the most useful features for the count vectorizer or TF-IDF vectorizer. This is done by keeping the n most used words and/or phrases, choosing whether or not to lower-case the text, removing stop words (common words such as "the", "when", and "there"), and keeping only those words that appear at least a given number of times in the text dataset.
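The choices described above map fairly directly onto scikit-learn options. The following is a minimal sketch, not the project's actual code: the tiny corpus, the labels `FAKE`/`REAL`, the helper `build_model`, and all parameter values are assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration only; a real run would load the dataset.
texts = [
    "Shocking miracle cure that doctors hide from you",
    "Celebrity secretly replaced by a clone, insiders claim",
    "Miracle pill melts fat overnight, shocking study claims",
    "City council approves the annual budget for schools",
    "Central bank keeps interest rates unchanged this quarter",
    "Parliament passes the budget after a long debate",
]
labels = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]

def build_model(vectorizer_cls):
    """Build a vectorizer + Naive Bayes pipeline (hypothetical helper)."""
    vectorizer = vectorizer_cls(
        lowercase=True,        # lower-casing or not
        stop_words="english",  # drop common words such as "the", "when", "there"
        ngram_range=(1, 2),    # single words and two-word phrases
        max_features=5000,     # keep only the n most frequent terms
        min_df=1,              # minimum number of documents a term must appear in;
                               # raise this on a real dataset
    )
    return make_pipeline(vectorizer, MultinomialNB())

# Same classifier, two different text transformations.
count_model = build_model(CountVectorizer).fit(texts, labels)
tfidf_model = build_model(TfidfVectorizer).fit(texts, labels)

sample = ["Shocking miracle cure claims"]
count_pred = count_model.predict(sample)
tfidf_pred = tfidf_model.predict(sample)
print("count vectorizer:", count_pred[0])
print("tfidf vectorizer:", tfidf_pred[0])
```

Note that `min_df` counts the documents a term appears in, not its total number of occurrences. To compare headlines against full text, the same two pipelines can be fitted on each text column and scored on a held-out split.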
Prerequisites to run this project:-
Software:-
• You need to install and run the Anaconda environment to work in Spyder.
• You need to install Spyder to run the program.
• You also need to install Microsoft Excel to view the dataset.