This repository contains code and resources for a multi-class text classification project. The task was provided by Intact as part of the CxC contest. The goal is to classify text data into multiple predefined categories.
In these files I explore different ML models and observe how they perform on the given task.
The repository includes the following files:
- `Intact_EDA.ipynb`: a notebook containing Exploratory Data Analysis (EDA) code for the text dataset. It explores the dataset, analyzes its structure, performs basic text analysis, and visualizes various aspects of the data.
- `Intact_models.ipynb`: a notebook containing Machine Learning models (Naive Bayes, Support Vector Machines, Logistic Regression) for text classification. It implements these algorithms to train and evaluate models on the text dataset, covering preprocessing, feature engineering, model training, and performance evaluation.
  To-Do: document and explain the observed results.
- `Intact_Transformer_Models.ipynb`: a notebook exploring different transformer-based models and how they perform on the given task.
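
As a rough illustration of the kind of comparison done in `Intact_models.ipynb`, here is a minimal sketch that trains the three classical models on TF-IDF features and reports accuracy for each. It assumes scikit-learn is installed; the toy texts, labels, and variable names are hypothetical placeholders (the contest dataset is confidential), and the notebook's actual preprocessing and feature engineering may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the confidential contest dataset.
train_texts = [
    "car accident claim on the highway",
    "vehicle collision damage repair",
    "car crash claim with vehicle damage",
    "house fire damaged the roof",
    "water leak flooded the house basement",
    "roof damage to the house after a storm",
]
train_labels = ["auto", "auto", "auto", "home", "home", "home"]
test_texts = ["vehicle crash on the highway", "storm flooded the house"]
test_labels = ["auto", "home"]

# The three model families named above, each behind the same TF-IDF step.
candidates = {
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, clf in candidates.items():
    # A pipeline keeps vectorization and classification fitted together,
    # so the test texts are transformed with the training vocabulary only.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_texts, train_labels)
    scores[name] = model.score(test_texts, test_labels)  # mean accuracy

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

To try this on your own data, replace the toy lists with your texts and labels and use a proper train/test split; accuracy on a handful of examples says nothing about real performance.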
The dataset used in this project is not included in this repository due to confidentiality. Please refer to the contest or obtain the dataset separately to reproduce the results, or apply the code to your own dataset.
Credits are given to the following authors for inspiration and walkthroughs: