This project demonstrates a Linear Regression Analysis performed in R using a dataset related to advertising spend and product sales. The goal is to understand how different types of marketing (TV, Social Media, and Radio) affect Sales, and to evaluate the model's performance using metrics like RMSE, MAE, and regression accuracy.
Tools used:

- R
- RStudio / Jupyter Notebook
- Packages: `dplyr`, `caTools`, `ggplot2`, `broom`
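
Before running the analysis, it can help to confirm that the packages listed above are installed. The snippet below is a minimal sketch using only base R; the variable names `required` and `not_installed` are illustrative and do not come from the project's notebook.

```r
# Packages named in this README
required <- c("dplyr", "caTools", "ggplot2", "broom")

# requireNamespace() returns FALSE (instead of erroring) when a package is absent
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!is_installed]

if (length(not_installed) > 0) {
  # Print the install command rather than running it, so nothing is downloaded silently
  message("Missing packages. Install with: install.packages(c(",
          paste0('"', not_installed, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```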
The dataset (`Dummy Data HSS.csv`) contains the following columns:

- `TV`: advertising budget spent on TV
- `Social.Media`: budget on social media platforms
- `Radio`: radio ad spend
- `Sales`: resulting product sales
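
To illustrate the expected shape of the data, the sketch below builds a small synthetic data frame with the same four column names and counts missing values per column. The numbers are made up for illustration and are not taken from `Dummy Data HSS.csv`; the commented `read.csv()` line shows how the real file would presumably be loaded instead.

```r
# Synthetic stand-in for the real CSV (all values are invented for illustration)
set.seed(42)
n <- 200
ads <- data.frame(
  TV           = runif(n, 10, 100),
  Social.Media = runif(n, 0, 15),
  Radio        = runif(n, 0, 50)
)
ads$Sales <- 3.5 * ads$TV + 0.5 * ads$Radio + rnorm(n, sd = 3)

# With the real file, a line like this would replace the block above:
# ads <- read.csv("Dummy Data HSS.csv")

str(ads)                          # column names and types
na_counts <- colSums(is.na(ads))  # number of missing values in each column
print(na_counts)
```

If any count is non-zero, the affected rows would need to be dropped or imputed before fitting a model; the source does not say which approach the notebook takes.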
The analysis follows these steps:

- Data Preprocessing: Loaded CSV data and checked for null values.
- Data Splitting: Split data into training and testing sets using `caTools`.
- Model Building:
  - Simple Linear Regression: `Sales ~ TV`
  - Multiple Linear Regression: `Sales ~ TV + Social.Media + Radio`
- Model Evaluation:
  - RMSE: 3.0365
  - MAE: 2.4315
  - Regression Accuracy: 98.19984 (calculated using MAPE)
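
The steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the 70/30 split ratio, the seed, and the generated values are assumptions, so the printed RMSE and MAE will not match the figures reported here. The split uses `caTools::sample.split()` when the package is available and falls back to base R `sample()` otherwise.

```r
# Synthetic stand-in data (invented values, not the project's CSV)
set.seed(123)
n <- 300
ads <- data.frame(
  TV           = runif(n, 10, 100),
  Social.Media = runif(n, 0, 15),
  Radio        = runif(n, 0, 50)
)
ads$Sales <- 3.5 * ads$TV + 0.5 * ads$Radio + rnorm(n, sd = 3)

# Train/test split; the 0.7 ratio is an assumption, the source does not state it
if (requireNamespace("caTools", quietly = TRUE)) {
  in_train <- caTools::sample.split(ads$Sales, SplitRatio = 0.7)
} else {
  in_train <- seq_len(n) %in% sample(n, size = round(0.7 * n))
}
train <- ads[in_train, ]
test  <- ads[!in_train, ]

# The two model formulas named in the list above
simple_fit   <- lm(Sales ~ TV, data = train)
multiple_fit <- lm(Sales ~ TV + Social.Media + Radio, data = train)

# Evaluate the multiple regression on the held-out rows
predictions <- predict(multiple_fit, newdata = test)
actuals     <- test$Sales
rmse <- sqrt(mean((predictions - actuals)^2))
mae  <- mean(abs(predictions - actuals))
cat("RMSE:", rmse, " MAE:", mae, "\n")
```

Calling `summary(multiple_fit)` shows the coefficient estimates and R-squared, which is a quick way to compare the contribution of each advertising channel.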
Accuracy is normally a classification metric; here it is approximated for regression as 100 minus the mean absolute percentage error (MAPE):

```r
# MAPE: mean absolute percentage error (undefined if any actual value is 0)
mape <- mean(abs((predictions - actuals) / actuals)) * 100
accuracy <- 100 - mape
```

The multiple linear regression model performs reasonably well, with an RMSE of 3.0365 and an MAE of 2.4315. Further improvements could come from feature engineering or from regularization techniques such as Ridge or Lasso.
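
As a small worked example of the MAPE-based accuracy calculation, the sketch below wraps the three metrics in one helper and applies it to three hand-picked values. The function name `regression_metrics` and the sample numbers are hypothetical and are not part of the project; the guard against zero actuals is an added assumption, since the percentage error divides by the actual value.

```r
# Hypothetical helper: returns RMSE, MAE, MAPE and the MAPE-based accuracy
regression_metrics <- function(actuals, predictions) {
  # MAPE divides by the actual values, so zeros would give Inf or NaN
  stopifnot(length(actuals) == length(predictions), all(actuals != 0))
  errors <- predictions - actuals
  mape <- mean(abs(errors / actuals)) * 100
  c(
    rmse     = sqrt(mean(errors^2)),
    mae      = mean(abs(errors)),
    mape     = mape,
    accuracy = 100 - mape
  )
}

# Made-up values: percentage errors are 10%, 5% and 0%, so MAPE is about 5
actuals     <- c(100, 200, 400)
predictions <- c(110, 190, 400)
m <- regression_metrics(actuals, predictions)
print(round(m, 3))
```

Because MAPE weights each error by the size of the actual value, a high "accuracy" can coexist with a sizeable RMSE when the target values are large, so it is best read alongside RMSE and MAE rather than on its own.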
To run the project:

- Open the notebook in RStudio or Jupyter.
- Make sure `Dummy Data HSS.csv` is in the same directory as the notebook.
- Run all cells to see the outputs and evaluation metrics.