In this article, we will learn about Multiple Linear Regression in Machine Learning.

If you are new to Machine Learning then you can follow this blog for a basic understanding of machine learning.

### What is Multiple Linear Regression?

-> A multiple linear regression model is able to analyze the relationship between several independent variables and a single dependent variable.

-> Multiple Linear Regression attempts to model the relationship between two or more features and a response by fitting a linear equation to observed data. The steps to perform multiple linear regression are almost the same as those for simple linear regression.

-> Multiple regression has numerous real-world applications in three problem domains: examining relationships between variables, making numerical predictions, and time series forecasting.

Equation : **Y = b0 + b1 * x1 + b2 * x2 + b3 * x3 + … + bn * xn**

Here, Y = dependent variable; x1, x2, x3, …, xn = independent variables; b0 = constant (intercept); b1, b2, b3, …, bn = coefficients of the corresponding independent variables.
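To make the equation concrete, here is a minimal sketch that evaluates it for a single observation with three independent variables. The coefficient and input values are made up purely for illustration.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the equation
b0 = 10.0                        # constant (intercept)
b = np.array([2.0, 0.5, -1.0])   # coefficients b1, b2, b3
x = np.array([3.0, 4.0, 1.0])    # independent variables x1, x2, x3

# Y = b0 + b1*x1 + b2*x2 + b3*x3
y = b0 + np.dot(b, x)
print(y)  # 17.0
```

The dot product simply sums each coefficient multiplied by its variable, which is the right-hand side of the equation without the constant.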

### Assumptions for Multiple Linear Regression

- Linearity: The relationship between dependent and independent variables should be linear.
- Homoscedasticity: Constant variance of the errors should be maintained.
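- Multivariate normality: The residuals should be normally distributed.
- Independence of errors: The residuals should not be correlated with one another.
- Lack of multicollinearity: The independent variables should not be highly correlated with each other.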
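
A rough way to look at some of these assumptions is to fit a model and inspect its residuals. The following is a minimal sketch on synthetic data. The data, the true coefficients, and the split into low and high fitted values are all assumptions made for illustration, and the checks are informal rather than formal statistical tests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustration only): y depends linearly on two features plus noise
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Ordinary least squares with an explicit intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

fitted = A @ coef
residuals = y - fitted

# With an intercept in the model, the residuals average to (numerically) zero
print("mean residual:", residuals.mean())

# Rough homoscedasticity check: compare residual spread for low vs. high fitted values
order = np.argsort(fitted)
low, high = residuals[order[:100]], residuals[order[100:]]
print("residual std (low fitted): ", low.std())
print("residual std (high fitted):", high.std())
```

If the two spreads differ greatly, or a plot of residuals against fitted values shows a clear pattern, the linearity or constant-variance assumption may not hold.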

**Dummy Variable:**

In statistics and econometrics, particularly in regression analysis, a dummy variable is one that takes only the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome.
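
As a small illustration, dummy variables can be produced with pandas. The three-row frame below is hypothetical, and `pd.get_dummies` is used here only as one convenient option; it is not the encoder used in the steps of this article.

```python
import pandas as pd

# Hypothetical mini-dataset with one categorical column
df = pd.DataFrame({"State": ["New York", "California", "New York"]})

# One 0/1 column per category: 1 marks presence, 0 marks absence
dummies = pd.get_dummies(df["State"], dtype=int)
print(dummies)
```

Each row has a 1 in exactly one column, the one matching its original category.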

### Let’s implement an example of Multiple Linear Regression

We have a dataset of 50 start-up companies. This dataset contains five columns: R&D Spend, Administration Spend, Marketing Spend, State, and Profit for a financial year.

Our goal is to create a model that can determine which companies are likely to be most profitable, and which factor affects a company's profit the most.

**Step 1) Importing the libraries**

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

**Step 2 ) Importing the dataset**

    dataset = pd.read_csv('50_Startups.csv')
    X = dataset.iloc[:, :-1].values  # every column except the last (the features)
    y = dataset.iloc[:, -1].values   # the last column (Profit)
    print(X)
    # OUTPUT
    # [[165349.2 136897.8 471784.1 'New York']
    #  [162597.7 151377.59 443898.53 'California']
    #  ... ... ... ]

**Step 3 ) Encoding categorical data**

- The dataset has one categorical variable (State), which cannot be fed to the model directly, so we need to encode it. To convert the categorical variable into numbers, we will use the OneHotEncoder class.

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    # One-hot encode column 3 (State) and pass the remaining columns through unchanged
    ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
    X = np.array(ct.fit_transform(X))
    print(X)
    # OUTPUT
    # [[0.0 0.0 1.0 165349.2 136897.8 471784.1]
    #  [1.0 0.0 0.0 162597.7 151377.59 443898.53]
    #  ... ... ... ]

**Step 4 ) Splitting the dataset into the Training set and Test set**

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
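
To see what `test_size = 0.2` means for a dataset of 50 rows, here is a small sketch using placeholder arrays. `X_demo` and `y_demo` are stand-ins with arbitrary values, not the start-up data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for a 50-row dataset (values are arbitrary)
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_tr.shape, X_te.shape)  # (40, 2) (10, 2)
```

With 50 rows, 20% gives 10 rows for testing and leaves 40 for training. Fixing `random_state` makes the split reproducible.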

**Step 5 ) Training the Multiple Linear Regression model on the Training set**

    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
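
After fitting, the learned constant (b0) and coefficients (b1 … bn) can be read from the model, which is one way to get a first impression of which factor matters most. The sketch below uses synthetic data so that it runs on its own. `X_demo`, `y_demo`, and the "true" coefficients are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: the first feature has the strongest effect, the third has none
X_demo = rng.uniform(0, 100, size=(50, 3))
y_demo = 5.0 + 0.8 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(scale=1.0, size=50)

model = LinearRegression().fit(X_demo, y_demo)

print("intercept (b0):", model.intercept_)
print("coefficients (b1..bn):", model.coef_)
```

Coefficient sizes are only directly comparable when the features are on similar scales, so treat this as a rough indication rather than a ranking of importance.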

**Step 6 ) Predicting the Test set results**

    y_pred = regressor.predict(X_test)
    np.set_printoptions(precision=2)
    # Print predicted profits (left column) next to the actual profits (right column)
    print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))
    # OUTPUT
    # [[103015.2 103282.38]
    #  [132582.28 144259.4 ]
    #  ... ... ... ]

- Some of the predictions are very close to the actual profits, while others are only reasonably close.
- From these results, we can say that the multiple linear regression model is well suited to this dataset. The data does not necessarily follow a perfectly linear relationship, but the LinearRegression class was able to fit suitable coefficients for the features to make these predictions.
- Even if you tune the model, for example by applying backward elimination to keep only the more statistically significant features, you will get similar results.
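
Instead of judging closeness by eye, the predictions can be scored with standard metrics. The sketch below uses only the two prediction/actual pairs visible in the printed output, since the remaining rows are elided there, so the resulting numbers describe those two rows and not the whole test set. The choice of `mean_absolute_error` and `r2_score` is an assumption, not something the original workflow uses.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Only the two rows visible in the printed output (predicted, actual)
predicted = np.array([103015.2, 132582.28])
actual = np.array([103282.38, 144259.4])

mae = mean_absolute_error(actual, predicted)  # average absolute difference in profit
r2 = r2_score(actual, predicted)              # 1.0 would mean perfect predictions

print("MAE:", mae)
print("R^2:", r2)
```

In the full workflow you would pass `y_test` and `y_pred` from Step 6 to the same two functions.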

I hope you are impressed by the predictions of this multiple linear regression model. We will continue coding in the next blog. If you have any questions regarding this blog, please let me know in the comments.