This project is designed to classify resumes into predefined categories using Natural Language Processing (NLP) and Machine Learning techniques. It demonstrates how to preprocess text data, extract relevant features, and train classification models to predict the domain or field of a candidate's resume.
The objective is to build an automated system that accurately categorizes resumes into job-related fields such as Data Science, HR, DevOps, Testing, and Web Development, based on the text content of each resume.
- Data Loading: Load the dataset containing resumes and their corresponding categories.
- Exploratory Data Analysis (EDA): Understand the structure of the dataset and check the class distribution for imbalances.
- Text Preprocessing:
  - Tokenization
  - Lowercasing
  - Removing stopwords, punctuation, and special characters
  - Lemmatization
- Feature Engineering:
  - TF-IDF Vectorization
  - Word Cloud Generation
- Model Building: Train multiple machine learning models:
  - Logistic Regression
  - Random Forest Classifier
  - Support Vector Machine (SVM)
  - Multinomial Naive Bayes
- Model Evaluation: Evaluate models using metrics such as accuracy, precision, recall, and F1-score.
- Prediction and Inference: Predict the category of a resume based on user input.
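
The workflow above can be sketched end to end with scikit-learn. This is a minimal illustration, not the notebook's actual code: the toy corpus, the `clean_text` helper, the choice of `LinearSVC` as the SVM, and all hyperparameters are assumptions made for the example. Lemmatization is left out because it needs an external resource such as NLTK's WordNet data, and tokenization is delegated to `TfidfVectorizer`'s default tokenizer.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def clean_text(text):
    """Lowercase the text and replace URLs, digits, and punctuation with spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


# Hypothetical toy corpus; the real project uses the 'Resume' and 'Category' columns.
samples = [
    ("Data Science", "Built machine learning models in Python with pandas and scikit-learn."),
    ("Data Science", "Statistical analysis, regression models and data visualization in Python."),
    ("Data Science", "Trained deep learning models for NLP and deployed prediction services."),
    ("Data Science", "Feature engineering and predictive modeling on large datasets."),
    ("HR", "Managed recruitment, onboarding and employee relations."),
    ("HR", "Handled payroll, employee engagement and performance appraisals."),
    ("HR", "Talent acquisition, interviews and HR policy compliance."),
    ("HR", "Coordinated training programs and employee onboarding."),
    ("Testing", "Manual testing, test cases and bug tracking with JIRA."),
    ("Testing", "Automated regression testing using Selenium and test scripts."),
    ("Testing", "Wrote test plans and executed functional test cases."),
    ("Testing", "Performance testing and defect reporting for web applications."),
]
labels = [category for category, _ in samples]
texts = [text for _, text in samples]

# Stratify so every category appears in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# The four model families listed above; hyperparameters are illustrative defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM (linear)": LinearSVC(),
    "Multinomial Naive Bayes": MultinomialNB(),
}

fitted, results = {}, {}
for name, model in models.items():
    # Each pipeline cleans the text, removes English stopwords, builds TF-IDF
    # features, and then fits the classifier on those features.
    pipeline = make_pipeline(
        TfidfVectorizer(preprocessor=clean_text, stop_words="english"), model
    )
    pipeline.fit(X_train, y_train)
    predicted = pipeline.predict(X_test)
    fitted[name] = pipeline
    results[name] = {
        "accuracy": accuracy_score(y_test, predicted),
        "macro_f1": f1_score(y_test, predicted, average="macro", zero_division=0),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, macro F1={scores['macro_f1']:.2f}")

# Inference on new, user-supplied text.
example = "Experienced in Selenium automation and writing test cases"
prediction = fitted["Logistic Regression"].predict([example])[0]
print("Predicted category:", prediction)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the TF-IDF vocabulary fitted on the training split only, so the test scores are not inflated by leakage. On a corpus this small the scores mean little; they only show how the comparison is wired together.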
The dataset contains resumes and their labels. Each entry consists of:
- 'Category': The class or job domain the resume belongs to.
- 'Resume': The raw text content of the resume.
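
As a rough sketch of the loading and class-distribution check, the snippet below reads rows with these two columns and counts resumes per category. It uses only the standard library and an in-memory sample so that it runs on its own. The sample rows and the `load_resumes` helper are hypothetical; only the column names `Category` and `Resume` come from the dataset description.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real dataset file.
SAMPLE_CSV = """Category,Resume
Data Science,"Built regression models in Python, pandas and scikit-learn"
HR,"Handled recruitment, onboarding and payroll"
Data Science,"Deep learning and NLP projects"
Testing,"Manual and automated testing with Selenium"
"""


def load_resumes(handle):
    """Read (category, resume text) pairs, skipping rows with empty resume text."""
    reader = csv.DictReader(handle)
    return [(row["Category"], row["Resume"]) for row in reader if row["Resume"].strip()]


rows = load_resumes(io.StringIO(SAMPLE_CSV))

# Class distribution: a strongly skewed count here would signal class imbalance.
class_counts = Counter(category for category, _ in rows)
for category, count in class_counts.most_common():
    print(f"{category}: {count} ({count / len(rows):.0%})")
```

With a real file, the same function can be called on `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded dataset was saved.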
You can download the dataset used in this project from here.
Install the required packages using the following command:
    pip install -r requirements1.txt

Project structure:

Resume_Classification/
│
├── Resume_Classification.ipynb   # Main notebook
├── requirements1.txt             # Python package requirements
├── dataset/                      # Contains resume data (if applicable)
└── README.md                     # Project documentation

To run the project:

- Clone the repository:
    git clone https://github.com/your-username/resume-classification.git
    cd resume-classification
- Install dependencies:
    pip install -r requirements1.txt
- Open the Jupyter notebook:
    jupyter notebook Resume_Classification.ipynb

Future enhancements:

- Deployment via Streamlit or Flask for real-time classification
- Integration with OCR for PDF/DOCX resume uploads
- Deep learning models (e.g., BERT, LSTM) to potentially improve accuracy
This project is open-source and available under the MIT License.
Author: Pandidharan