Produced an in-depth report and built 9 machine learning models on 3 different datasets, covering unsupervised learning, regression and classification tasks. For each problem, research questions were formulated and a literature review was carried out before the machine learning models were built.
All models were built in Python.
Links to the datasets can be found in the appendix of the report.
The study applied unsupervised learning to a wholesale customer dataset to analyze the distribution and correlation of spending across product categories, and to compare the spending patterns of retailers and HoReCa (hotel/restaurant/café) customers. PRINCIPAL COMPONENT ANALYSIS (PCA) was employed to reduce the dimensionality of the data and identify the key components. K-MEANS CLUSTERING was then applied to identify customer clusters in the space defined by the principal components.
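The PCA-then-K-means workflow above can be sketched as follows. This is a minimal illustration assuming scikit-learn, not the report's actual code: the data are randomly generated stand-ins for annual spend per product category, and the log transform, the choice of two components and k=2 clusters are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: random, right-skewed annual spend for six
# product categories (the real dataset is linked in the report's appendix).
categories = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]
rng = np.random.default_rng(42)
spend = rng.lognormal(mean=8, sigma=1, size=(440, len(categories)))

# Assumed preprocessing: log-transform the skewed spend, then standardize
# so that no single category dominates the principal components.
X = StandardScaler().fit_transform(np.log(spend))

# Reduce to the first two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Cluster customers in the reduced (PC1, PC2) space; k=2 is an assumed choice.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(components)
print("Cluster sizes:", np.bincount(labels))
```

In practice the number of components would be chosen from the cumulative explained variance, and k from a diagnostic such as the elbow method or silhouette score.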
The regression analysis used the Boston Housing Price dataset to predict housing prices, examining the impact of variables such as number of rooms, house age and air pollution levels. MULTIPLE LINEAR REGRESSION, DECISION TREE REGRESSION, RANDOM FOREST REGRESSION and XGBOOST REGRESSION were applied, and the models' performance was evaluated using R-squared, explained variance, mean squared error (MSE) and root mean squared error (RMSE).
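A minimal sketch of fitting several regressors and scoring them with the metrics listed above, assuming scikit-learn. Because `load_boston` was removed in scikit-learn 1.2, the example uses synthetic data from `make_regression` as a stand-in; the split ratio and hyperparameters are illustrative assumptions. XGBoost is left out to avoid an extra dependency, but `xgboost.XGBRegressor` follows the same `fit`/`predict` interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing features (rooms, age, pollution, ...).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Illustrative hyperparameters; an XGBRegressor could be added here as well.
models = {
    "Multiple linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results[name] = {
        "R2": r2_score(y_test, y_pred),
        "Explained variance": explained_variance_score(y_test, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # square root of MSE, in the same units as the target
    }
    print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in results[name].items()))
```

Evaluating every model on the same held-out split keeps the metrics directly comparable across models.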
Classification algorithms were used to predict the likelihood of a heart attack using the Heart Attack Prediction dataset. LOGISTIC REGRESSION, DECISION TREE CLASSIFIER and GRADIENT BOOSTING CLASSIFIER models were built and evaluated on accuracy, precision, recall and F1 score; confusion matrices were produced, and ROC curves were used to measure the area under the curve (AUC).
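The classification evaluation described above might look like the sketch below, assuming scikit-learn. The data come from `make_classification` as a synthetic binary stand-in for the heart attack dataset, and the sample size, feature count, scaling step and hyperparameters are assumptions for illustration only. AUC is computed from predicted probabilities rather than hard labels, since the ROC curve sweeps over decision thresholds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary stand-in for the heart attack data (label 1 = positive class).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

models = {
    # Scaling helps logistic regression converge; tree models do not need it.
    "Logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(max_depth=4, random_state=1),
    "Gradient boosting": GradientBoostingClassifier(random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # positive-class probability for ROC/AUC
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_prob),
    }
    print(name, {k: round(v, 3) for k, v in scores[name].items()})
    print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class
```

`sklearn.metrics.roc_curve` returns the false and true positive rates needed to plot the curve itself.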