Resume Classification using NLP

This project classifies resumes into predefined categories using Natural Language Processing (NLP) and machine learning. It shows how to preprocess text data, extract features, and train classification models that predict the domain or field of a candidate's resume.

Project Objective

Build an automated system that categorizes resumes into job-related fields such as Data Science, HR, DevOps, Testing, and Web Development, based on the text content of each resume.

Project Pipeline

1. Data Loading
   Load the dataset containing resumes and their corresponding categories.
2. Exploratory Data Analysis (EDA)
   Understand the structure of the dataset and check the class distribution for imbalances.
3. Text Preprocessing
   - Tokenization
   - Lowercasing
   - Removing stopwords, punctuation, and special characters
   - Lemmatization
4. Feature Engineering
   - TF-IDF Vectorization
   - Word Cloud Generation
5. Model Building
   Train multiple machine learning models:
   - Logistic Regression
   - Random Forest Classifier
   - Support Vector Machine (SVM)
   - Multinomial Naive Bayes
6. Model Evaluation
   Evaluate models using metrics such as accuracy, precision, recall, and F1-score.
7. Prediction and Inference
   Predict the category of a resume based on user input.
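
The preprocessing, TF-IDF, model building, and prediction steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper, the tiny `STOPWORDS` set, and the six toy resumes are all made up for this example, and lemmatization is left out because it needs an extra library such as NLTK or spaCy. Only one of the four listed models (Logistic Regression) is shown.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative stopword list (an assumption; real lists are much larger).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with", "for", "on"}


def preprocess(text: str) -> str:
    """Lowercase, strip non-letters, and drop stopwords (hypothetical helper)."""
    text = text.lower()
    # Replace punctuation, digits, and special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


# Invented toy data standing in for the real dataset.
resumes = [
    "Built machine learning models in Python with pandas and scikit-learn.",
    "Deep learning and statistics for predictive analytics.",
    "Managed recruitment, onboarding, and payroll for employees.",
    "Handled employee relations and recruitment drives.",
    "Automated CI/CD pipelines with Jenkins, Docker, and Kubernetes.",
    "Maintained Kubernetes clusters and Docker deployments on AWS.",
]
labels = ["Data Science", "Data Science", "HR", "HR", "DevOps", "DevOps"]

# TF-IDF features feed straight into the classifier; the custom
# preprocessor replaces the vectorizer's default lowercasing step.
model = make_pipeline(
    TfidfVectorizer(preprocessor=preprocess),
    LogisticRegression(max_iter=1000),
)
model.fit(resumes, labels)

# Inference on a new piece of resume text.
prediction = model.predict(["Deployed Docker containers to Kubernetes"])[0]
print(prediction)
```

Wrapping the vectorizer and classifier in one pipeline means the same preprocessing is applied at training and prediction time. Swapping `LogisticRegression` for `MultinomialNB`, `RandomForestClassifier`, or `LinearSVC` would cover the other models in the list; on real data, evaluation should use a held-out test split rather than the training set.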

Dataset

The dataset contains resumes and their labels. Each entry consists of:

- 'Category': The class or job domain the resume belongs to.
- 'Resume': The raw text content of the resume.
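
As a rough sketch of how a file with these two columns could be read and its class distribution checked, the snippet below uses only the Python standard library. The inline CSV rows are invented stand-ins for the real data, and the notebook itself may well use pandas instead.

```python
import csv
import io
from collections import Counter

# Inline stand-in for the real CSV file; these rows are invented for illustration.
sample_csv = io.StringIO(
    "Category,Resume\n"
    'Data Science,"Python, pandas, and machine learning projects"\n'
    'HR,"Recruitment, onboarding, and payroll"\n'
    'Data Science,"Statistics and deep learning"\n'
)

# Each row becomes a dict keyed by the header names 'Category' and 'Resume'.
rows = list(csv.DictReader(sample_csv))

# Count resumes per category to spot class imbalance.
class_counts = Counter(row["Category"] for row in rows)

for category, count in class_counts.most_common():
    print(f"{category}: {count}")
```

To run this against an actual dataset, replace `sample_csv` with an opened file, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved (the exact location is an assumption).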

You can download the dataset used in this project from here.

Requirements

Install the required packages using the following command:

pip install -r requirements1.txt

Folder Structure

Resume_Classification/
│
├── Resume_Classification.ipynb    # Main notebook
├── requirements1.txt              # Python package requirements
├── dataset/                       # Contains resume data (if applicable)
└── README.md                      # Project documentation

How to Run

Clone the repository:

git clone https://github.com/your-username/resume-classification.git
cd resume-classification

Install dependencies:

pip install -r requirements1.txt

Open the Jupyter notebook:

jupyter notebook Resume_Classification.ipynb

Future Improvements

- Deploy via Streamlit or Flask for real-time classification
- Integrate OCR for PDF/DOCX resume uploads
- Incorporate deep learning (e.g., BERT, LSTM) for better accuracy

License

This project is open-source and available under the MIT License.

Author

Pandidharan
