To view all the code and files click here
The problem is that supermarkets have trouble working out which products will sell and which will not, so they waste a lot of money trying out and testing products.
This project required a lot of data wrangling and cleaning. The same value was often written in different ways; for example, low fat also appeared as LF, which I had to replace (compare the fat section of the before and after data wrangling graphs). There were also many missing values, which I had to impute.
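The two cleaning steps described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the label mapping, the sample values, and the choice of mean imputation are all assumptions made for the example (only the LF spelling comes from the project).

```python
from statistics import mean

# Hypothetical mapping from inconsistent spellings to one canonical label.
# Only "LF" is mentioned in the write-up; "reg" is an assumed extra variant.
FAT_LABELS = {
    "lf": "Low Fat",
    "low fat": "Low Fat",
    "reg": "Regular",
    "regular": "Regular",
}

def normalize_fat(label):
    """Map variants such as 'LF' or 'low fat' to a single canonical label."""
    return FAT_LABELS.get(label.strip().lower(), label)

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Made-up sample data, for illustration only.
fat = ["Low Fat", "LF", "low fat", "Regular", "reg"]
weights = [9.3, None, 17.5, None, 8.9]

clean_fat = [normalize_fat(f) for f in fat]
clean_weights = impute_mean(weights)
```

Mean imputation is only one option; a median or a per-category average may suit skewed columns better.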
I used a regression model to predict the sales of each item. I also sorted the data and removed columns that had no effect on the model, created new, simpler features from the existing data, and used one-hot encoding and label encoding to make my model more efficient.
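To show the difference between the two encodings, here is a small sketch in plain Python. The function names and the sample column are hypothetical; in practice a library would usually handle this step.

```python
def label_encode(values):
    """Assign each distinct category an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Return one 0/1 column per distinct category, in sorted order."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories

# Hypothetical categorical column, for illustration only.
outlet_type = ["Grocery", "Supermarket", "Grocery", "Supermarket"]

codes, mapping = label_encode(outlet_type)        # one integer per row
rows, categories = one_hot_encode(outlet_type)    # one 0/1 column per category
```

Label encoding keeps a single column but implies an order between categories, while one-hot encoding avoids that at the cost of extra columns.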
To create the model I used a linear regression model (LM).
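For readers unfamiliar with linear regression, the idea can be sketched with a single feature and ordinary least squares. This is a simplified illustration under assumed toy data, not the multi-feature model used in the project.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up feature and target values, for illustration only.
price = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(price, sales)
predicted = [slope * x + intercept for x in price]
```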
Data wrangling
- Had many NAs and missing values
- Had data which didn't make sense
- Removing data which didn't make sense
One-hot encoding and label encoding
Deciding which features to drop
This project helps supermarkets decide which products will sell and which will not, which saves them a lot of money. It could also help smaller stores, which can't afford to buy products at random to see whether they sell; instead, they can use this model to choose which items to stock.
In conclusion, I was able to get an R² of 0.564684 using the LM model, which is pretty good for this situation. Throughout this project I learned about the different types of encoding and how they impact the model. I also learned the importance of data wrangling and how it takes up most of the time when building a model.
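As a reminder of what the R² score measures, here is a minimal sketch of how it is computed from actual and predicted values. The numbers are made up for illustration and are unrelated to the project's result.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up values, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [4.0, 4.0, 8.0, 8.0]

score = r_squared(actual, predicted)
```

A score of 1 means the predictions match exactly, and a score of 0 means the model does no better than predicting the mean every time.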

