This project leverages Simple Linear Regression to model and predict product sales based on advertisement expenditures across different media channels. Using a dataset sourced from Kaggle, the analysis identifies relationships between advertisement spending (TV, Radio, and Newspaper) and product sales. The primary focus of the project is to build a predictive model that effectively forecasts sales based on TV advertisement spend.
Understand the correlation between advertisement spending and sales.
Develop a regression model to predict sales using TV ad spend as the independent variable.
Evaluate the performance of the model using statistical metrics and provide insights through visualization.
Source: Kaggle Advertising Dataset
Key Features:
Independent Variables: TV, Radio, Newspaper Advertisement Spend
Target Variable: Sales
Dataset Summary:
Records: 200
Variables: 3 Features + 1 Target (Sales)
Step-1: Data Loading and Cleaning:
Loaded the dataset from Kaggle.
Conducted initial data checks to ensure no missing or null values.
Standardized features for uniform scaling using StandardScaler.
Step-2 : Exploratory Data Analysis (EDA):
Analyzed correlations using a correlation matrix.
Visualized data distributions and relationships through scatter plots and pair plots.
Step-3 : Data Splitting:
Split the dataset into Training (75%) and Testing (25%) sets using train_test_split.
Step-4 : Model Development:
Trained a Simple Linear Regression Model with TV Advertisement Spend as the predictor for Sales using scikit-learn's LinearRegression.
Step-5 : Performance Evaluation:
Evaluated the model using:
MSE (Mean Squared Error): 5.51
MAE (Mean Absolute Error): 1.87
RMSE (Root Mean Squared Error): 2.35
R² Score: 0.797
Adjusted R² Score: 0.793
Compared actual v/s predicted sales on the test set.
Visualization:
Plotted actual v/s predicted sales on a scatter plot for validation.
Strong Correlation:
TV advertisement spend shows a high correlation (0.90) with sales, making it an effective predictor.
Model Performance::
The model explains ~80% of the variance in sales, showcasing its effectiveness as a linear predictor.
Actionable Inight:
Increased spending on TV advertisements is positively correlated with higher sales.
Pandas
NumPy
Matplotlib
Seaborn
Scikit-learn
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
This project demonstrates the predictive power of simple linear regression in understanding the relationship between advertisement spending and sales. The findings indicate that TV advertisement spending has a strong positive correlation (0.90) with sales, making it a significant factor in driving revenue.
The regression model effectively explains approximately 80% of the variance in sales, as reflected by the R² score of 0.797, validating its utility as a forecasting tool
For more details, check my github repository: Click Here