The job market is rapidly transforming due to technological advancements, remote work trends, and evolving industry demands. LinkedIn, as a leading professional networking platform, offers an extensive repository of job postings, providing opportunities for data-driven insights.
This project analyzes LinkedIn job listings to identify high-demand roles, in-demand skills, and salary benchmarks. It also implements a recommendation system to match job seekers with roles aligned to their profiles. The use of ML, NLP, and explainable AI ensures scalable, accurate, and interpretable insights.
- Traditional ML models like Logistic Regression and Decision Trees are widely used for job classification.
- Neural network embeddings improve personalized job recommendations.
- Explainable AI (XAI) techniques like SHAP and LIME enhance model interpretability.
- This project combines traditional ML, neural embeddings, and XAI to address limitations of prior studies, ensuring robust evaluation across multiple metrics.
- Cleaned missing values and outliers
- Applied feature engineering:
- TF-IDF for text data
- One-hot encoding for categorical variables
- Normalization for numerical fields
- Classification: Logistic Regression, Random Forest, KNN for job types
- Regression: Neural Networks, Gradient Boosting for salary prediction
- Recommendation: Neural embeddings for personalized job suggestions
- Metrics: Accuracy, Precision, Recall, F1-score (classification), MSE (regression), Precision/Recall (recommendation)
- SHAP visualizations used for explainability
| Model | Use Case | Key Notes |
|---|---|---|
| Logistic Regression | Job type classification | Interpretable baseline |
| Naive Bayes | Text classification | Simple but affected by class imbalance |
| K-Nearest Neighbors (KNN) | Personalized recommendations | High accuracy for job recommendations |
| Random Forest | Classification | Robust feature importance analysis |
| Gradient Boosting | Salary prediction | Captures complex interactions |
| Neural Networks | Recommendations | Outperforms traditional models using embeddings |
Hyperparameter optimization was applied for all models.
- Size: 124,000 LinkedIn job postings
- Fields:
- Job Titles & Descriptions
- Work Types (remote, hybrid, onsite)
- Salaries (min, max, median)
- Company Info (industry, HQ, employee count)
- Skills (technical & soft skills)
- Categorical missing values → "Unknown"
- Salary missing values → median imputation
- Merged external datasets for skills and industries to enhance predictive power
- Job titles, locations, work types, and experience levels analyzed
- Correlation heatmaps used for numerical feature relationships
- Textual data: TF-IDF vectorization for job descriptions and skills
- Categorical encoding: One-hot encoding for work type and experience level
- Numerical normalization: Min-Max scaling for salary fields
- Data split: 80% training / 20% testing with stratified sampling
- Cross-validation: 10-fold
- Hyperparameter tuning: Optuna
- Oversampling: Handled class imbalance
- Ensemble methods: Combined multiple models for stability
- Accuracy: 65.26% | AUC: 71.85% | F1-score: 55.32%
- Moderate performance; captures text-based patterns
- Accuracy: 68.99% | AUC: 68.00% | F1-score: 12.20%
- Poor recall due to class imbalance
- 10-fold CV Accuracy: 97.09% | Test Accuracy: 94.97% | F1-score: 92.12%
- High performance for personalized recommendations
- Test Accuracy: 92.09% | AUC: 97.41% | F1-score: 87.85%
- Robust and interpretable
- Test Accuracy: 91.67% | AUC: 96.76% | F1-score: 87.24%
- Strong predictive power
- Neural embedding-based system provides personalized job suggestions
- SHAP plots highlight skills and location influencing recommendations
- SHAP visualizations provide transparency and trust
- Helps stakeholders validate and interpret recommendations
- KNN and Random Forest excel at classification; Neural Networks for recommendations
- Challenges: class imbalance, high computational costs
- Feature engineering and XAI improved accuracy and trust
- ML and NLP successfully analyzed LinkedIn job listings
- High-performing models for classification, regression, and recommendation
- SHAP visualizations enhanced interpretability
- Future work: longitudinal data, real-time recommendation optimization
Job-Recommendation-System/
─ cleaned_text.csv
─ cleaned_text_with_job_info.csv
─ postings.csv
─ companies/ # Company-related datasets
─ jobs/ # Job-related datasets
─ mappings/ # Mappings for skills, industries.
─ ML4GEN.ipynb # Main project notebook
─ NLP.ipynb # NLP analysis notebook