Janatahack-Independence-Day-2020-ML-Hackathon

This hackathon was organised by Analytics Vidhya to let participants showcase their machine learning skills, share approaches and, more importantly, learn how to apply machine learning and predictive analytics to new domains. The task is topic modelling for research articles, framed as a multi-label classification problem.
In this repository, I used machine learning and deep learning models along with NLP preprocessing to solve this problem.

Problem Statement -

Researchers have access to large online archives of scientific articles. As a consequence, finding relevant articles has become more difficult. Tagging or topic modelling gives each research article identifying labels, which makes recommendation and search easier.

Given the abstract and title for a set of research articles, predict the topics for each article included in the test set. Note that a research article can possibly have more than 1 topic. The research article abstracts and titles are sourced from the following 6 topics:

Computer Science
Physics
Mathematics
Statistics
Quantitative Biology
Quantitative Finance

Scores

Public Leaderboard - Rank 23
Private Leaderboard - Rank 15

Approaches

For preprocessing, I first cleaned the text using standard NLP steps (stopword removal, noise removal, lemmatization). To convert the text into vectors, I then used a TF-IDF (term frequency–inverse document frequency) vectorizer. For model building, I implemented 2 approaches -
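
The cleaning and TF-IDF step shared by both approaches can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn's `TfidfVectorizer`, the `clean_text` helper and the toy documents are hypothetical, and lemmatization is left out to keep the example dependency-free.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer


def clean_text(text: str) -> str:
    """Lowercase the text and keep only letters and single spaces (hypothetical helper)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits, punctuation and other noise
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


# Toy stand-ins for "title + abstract" strings (assumption, not the real data).
docs = [
    "Deep Learning for Image Recognition: a survey of 120 methods!",
    "Bayesian inference in statistical models",
    "Quantum field theory and particle physics",
]
cleaned = [clean_text(d) for d in docs]

# stop_words="english" removes common English stopwords during tokenization.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(cleaned)  # sparse matrix: one row per document

print(cleaned[0])
print(X.shape)
```

Each row of `X` is the TF-IDF vector of one article and is what the models described next would take as input.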

1. Machine Learning Models

Since this is a multi-label classification problem, I tried different models such as Logistic Regression, SVM and Naive Bayes, each wrapped in OneVsRestClassifier, to predict the target classes.
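
A minimal sketch of the one-vs-rest setup, assuming scikit-learn: `OneVsRestClassifier` fits one binary classifier per topic, so an article can receive several topics at once. The toy texts, labels and the choice of Logistic Regression with `max_iter=1000` are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data (assumption): each article may carry more than one topic.
texts = [
    "neural networks for image classification",
    "statistical learning theory for neural networks",
    "quantum entanglement in condensed matter",
    "bayesian estimation of regression models",
    "monte carlo simulation of particle collisions",
    "graph algorithms and data structures",
]
labels = [
    ["Computer Science"],
    ["Computer Science", "Statistics"],
    ["Physics"],
    ["Statistics"],
    ["Physics", "Statistics"],
    ["Computer Science"],
]

# Turn the label lists into a binary indicator matrix, one column per topic.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

X = TfidfVectorizer().fit_transform(texts)

# One independent binary Logistic Regression is trained per topic column.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

pred = clf.predict(X)  # indicator matrix with the same shape as Y
print(mlb.classes_)
print(mlb.inverse_transform(pred))  # predicted topic names per article
```

Swapping `LogisticRegression` for another scikit-learn estimator (for example `LinearSVC` or `MultinomialNB`) keeps the rest of the pipeline unchanged.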

2. Deep Neural Network

This approach gave me better accuracy than the previous one. The preprocessing steps remain the same as described above. After converting the text into vectors using TF-IDF, the vectors are passed to an artificial neural network consisting of Dense layers, with Dropout applied to reduce overfitting. Its predictions gave me rank 23 on the Public Leaderboard and rank 15 on the Private Leaderboard.
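
To illustrate the idea, here is a NumPy-only sketch of the forward pass of such a network: a dense hidden layer, dropout, and one sigmoid output per topic so that several topics can be active at once. It is not the notebook's model. The layer sizes, ReLU activation, dropout rate of 0.5, the 0.5 decision threshold and the random (untrained) weights are all assumptions, and a real implementation would normally use a deep learning framework and train the weights.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dropout(a, rate, training):
    """Inverted dropout: zero random units while training, identity at inference."""
    if not training or rate == 0.0:
        return a
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)  # rescale so the expected activation is unchanged


# Assumed sizes: 20 TF-IDF features, 8 hidden units, 6 topics.
n_features, n_hidden, n_labels = 20, 8, 6
w1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, n_labels))
b2 = np.zeros(n_labels)


def forward(x, training=False):
    h = relu(x @ w1 + b1)            # dense hidden layer
    h = dropout(h, 0.5, training)    # only active during training
    return sigmoid(h @ w2 + b2)      # independent probability per topic


x = rng.random((4, n_features))       # stand-in for 4 TF-IDF vectors
probs = forward(x)
preds = (probs >= 0.5).astype(int)    # each topic is thresholded separately
print(probs.shape, preds.shape)
```

Sigmoid outputs are used instead of softmax because the topics are not mutually exclusive: each output is an independent yes/no decision for one topic.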

