• Written By Ritesh Kumar Gupta
  • Last Modified 22-06-2023

Regression – Definition, Formula, Derivation & Applications


The term “Regression” refers to the process of determining the relationship between one or more factors and an output variable. The outcome variable is called the response variable, whereas the risk factors and confounders are known as predictors or independent variables. In regression analysis, the dependent variable is represented by “y”, while the independent variables are represented by “x”.

There are various types of regression analysis, including linear, multiple linear, and nonlinear. Simple linear and multiple linear models are the most common. Nonlinear regression analysis is used for more complex data sets, in which the dependent and independent variables show a nonlinear relationship. Read on to learn more about the regression formula, its derivation, and its applications.

What is Regression?

“Regression” comes from the word “regress,” derived from the Latin word “regressus,” which means “to go back” (to something). So, regression is the technique that helps you “to go back” from a jumbled, difficult-to-understand set of data to a simpler, more meaningful model.

Regression is a statistical technique used in economics, investing, and other fields to evaluate the strength and nature of a relationship between one dependent variable (usually denoted by \(Y\)) and a set of other variables (known as independent variables). Regression attempts to find a mathematical relationship between a set of random variables thought to predict \(Y\).

Simple linear regression and multiple linear regression are the two basic types of regression. Multiple linear regression uses two or more independent variables to predict the outcome of the dependent variable \(Y\). In contrast, simple linear regression uses one independent variable to describe or predict the outcome of the dependent variable \(Y\).


Regression Formula

There are several types of regression, including linear, multiple linear, and nonlinear. Simple linear and multiple linear models are the most common. However, nonlinear regression analysis is widely used for more complex data sets with nonlinear relationships between the dependent and independent variables.

The general form of regression is:

  • Simple Linear Regression: \(y = a + bx + \epsilon \)

Where:

\(y = \) Dependent Variable

\(x = \) Independent Variable

\(a = y – {\text{Intercept}} = \frac{{\sum y \sum {{x^2}}  – \sum x \sum x y}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}}\)

\(b = {\text{Slope}}\,{\text{of}}\,{\text{the}}\,{\text{line}} = \frac{{n\sum x y – \left( {\sum x } \right)\left( {\sum y }\right)}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}}\)

\(x\) and \(y\) are two variables on the regression line

\(x =\) Values of the first data set
\(y =\) Values of the second data set

\(\epsilon =\) Residuals

  • Multiple Linear Regression: \(Y = a + {b_1}{X_1} + {b_2}{X_2} + {b_3}{X_3} + \ldots + {b_s}{X_s} + \epsilon \)

Where:

\(Y=\) Dependent Variable

\({X_1},{X_2},{X_3}, \ldots ,{X_s} = \) Independent Variables

\({b_1},{b_2},{b_3}, \ldots ,{b_s} = \) Slopes

\(a=\) Intercept

\(\epsilon =\) Residuals

Residuals are the differences between the observed values of the dependent variable and the values predicted by the model. Each data point has one residual. In linear regression (with an intercept), the mean of the residuals is always \(0\).
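The closed-form formulas for \(a\) and \(b\) above can be coded directly. Here is a minimal Python sketch (the function name and sample data are ours, chosen for illustration):

```python
# Closed-form least-squares fit for y = a + b*x, using the formulas above.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2
    b = (n * sxy - sx * sy) / denom    # slope of the line
    a = (sy * sxx - sx * sxy) / denom  # y-intercept
    return a, b

# Points lying exactly on y = 1 + 2x, so all residuals are zero.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Because the sample points sit exactly on a line, the fit recovers the intercept and slope exactly.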

Linear Regression Example

A regression line can depict a positive, negative, or no linear relationship.

For a Simple Linear Regression: \(y=a+bx\)

Case 1: If \(b=\) slope of line \(=0⟹\) the graphed line is flat (not sloped), and the two variables have no linear relationship.

Case 2: If \(b=\) slope of line \(=+ve⟹\) The regression line slopes upward, with the lower end of the line at the graph’s \(y\)-intercept (axis) and the upper end of the line extending upward into the graph field, away from the \(x\)-intercept (axis). The two variables have a positive linear relationship: as the value of one rises, the value of the other rises as well.

Case 3: \(b=\) slope of line \(=-ve⟹\) The regression line slopes downward, with its upper end at the graph’s \(y\)-intercept (axis) and its lower end extending downward into the graph field, toward the \(x\)-intercept (axis). The two variables have a negative linear relationship: as the value of one increases, the value of the other decreases.

Linear Regression Derivation

Linear Regression Mathematical Derivation

Given \(n\) data pairs \(\left( {x_1},\,{y_1} \right), \ldots ,\left( {x_n},\,{y_n} \right)\), the best fit for the straight-line regression model

\(y=a+bx\) is found by the method of least squares.

Starting with the sum of squares of the residuals, \(S\), we get

\(S = \sum\limits_{i = 1}^n {{{\left( {{y_i} – a – b{x_i}} \right)}^2}} \)

And using

\(\frac{{\partial S}}{{\partial a}} = 0\) and \(\frac{{\partial S}}{{\partial b}} = 0\)

gives two simultaneous linear equations whose solution (which gives the minimum  value of \(S\)) is

\(a = y{\text{-Intercept}} = \frac{{\sum y \sum {{x^2}}  – \sum x \sum x y}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}}\)

\(b = {\text{Slope}}\,{\text{of}}\,{\text{the}}\,{\text{line}} = \frac{{n\sum x y – \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}}\)
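The derivation can be sanity-checked numerically: at the closed-form \((a, b)\), perturbing either parameter should never decrease \(S\). A small Python sketch (the data here are made up purely for illustration):

```python
# S(a, b) is the sum of squared residuals being minimized.
def S(a, b, xs, ys):
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 4.0]

# Closed-form least-squares solution from the derivation above.
n, sx, sy = len(xs), sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
denom = n * sxx - sx ** 2
b = (n * sxy - sx * sy) / denom
a = (sy * sxx - sx * sxy) / denom

# Any perturbation of a or b strictly increases S, confirming a minimum.
s_min = S(a, b, xs, ys)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert S(a + da, b + db, xs, ys) > s_min
```

Since \(S\) is a quadratic bowl in \((a, b)\), the partial derivatives vanishing at the closed-form solution guarantees this behaviour.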

Difference Between Correlation and Regression

The differences between correlation and regression are as follows:

  1. As the name implies, ‘correlation’ determines the interconnection or co-relationship between the variables, whereas ‘regression’ explains how an independent variable is numerically related to the dependent variable.
  2. In correlation, there is no distinction between the two variables. In regression, however, the dependent and independent variables play distinct roles.
  3. The primary goal of correlation is to determine a numerical/quantitative value that expresses the relationship between the values. However, when it comes to regression, the main goal is to calculate the values of a random variable using the values of a fixed variable.
  4. Correlation aids in the formation of a connection between the two variables. Regression aids in calculating the value of a variable dependent on another value.

Applications of Regression

Following are some of the most popular applications of regression:

  1. Regression is used in finance to measure a stock’s Beta (volatility of returns compared to the overall market). 
  2. When predicting financial statements for a company, multiple regression can be useful to see how changes in some expectations or market drivers may affect sales or expenses in the future.
  3. For prediction and forecasting, regression is used. This is closely related to the field of machine learning.
  4. Regression is also used in fields like marketing, manufacturing, and medicine. It helps to understand the efficacy of marketing strategies and to predict pricing and product revenue; to evaluate the relationships between the variables that determine engine efficiency; and to aid in preparing generic drugs and forecasting the effects of various combinations of medicines.
  5. Regression helps a credit card company understand factors such as a customer’s probability of credit default, predicted consumer behaviour, and expected credit balance; based on these findings, the company can introduce specific EMI options while minimising default among risky customers.
  6. Regression is also used in decision-making strategies, business optimisation, risk analysis, and predictive analysis.

Solved Examples

Q.1. Find the regression line for the following set of data
\(\{ ( – 2, – 1),(1,1),(3,2)\} \)
Ans: Let the regression line be
\(y=a+bx\)

\(x\) | \(y\) | \(xy\) | \({x^2}\)
\(-2\) | \(-1\) | \(2\) | \(4\)
\(1\) | \(1\) | \(1\) | \(1\)
\(3\) | \(2\) | \(6\) | \(9\)
\(\sum x = 2\) | \(\sum y = 2\) | \(\sum x y = 9\) | \(\sum {{x^2}} = 14\)

\(a = y – {\text{intercept}} = \frac{{\sum y \sum {{x^2}}  – \sum x \sum x y}}{{n\left( {\sum {{x^2}} }\right) – {{\left( {\sum x } \right)}^2}}} = \frac{{2 \times 14 – 2 \times 9}}{{3 \times 14 – 4}} = \frac{5}{{19}}\)

\(b = {\text{Slope}}\,{\text{of}}\,{\text{the}}\,{\text{line}} = \frac{{n\sum x y – \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}} = \frac{{3 \times 9 – 2 \times 2}}{{3 \times 14 – 4}} = \frac{{23}}{{38}}\)

Hence, \(y = \frac{5}{{19}} + \frac{{23}}{{38}}x\) is the required regression line.

Q.2. The value of \(x\) and their corresponding values of \(y\) are shown in the table below

\(x\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\)
\(y\) | \(2\) | \(3\) | \(5\) | \(4\) | \(6\)

Find the regression line \(y=a+bx\) and also estimate the value of \(y\) when \(x=10\)
Ans:

\(x\) | \(y\) | \(xy\) | \({x^2}\)
\(0\) | \(2\) | \(0\) | \(0\)
\(1\) | \(3\) | \(3\) | \(1\)
\(2\) | \(5\) | \(10\) | \(4\)
\(3\) | \(4\) | \(12\) | \(9\)
\(4\) | \(6\) | \(24\) | \(16\)
\(\sum x = 10\) | \(\sum y = 20\) | \(\sum x y = 49\) | \(\sum{{x^2}} = 30\)

\(a = y – {\text{intercept}} = \frac{{\sum y \sum {{x^2}}  – \sum x \sum x y}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}} = \frac{{20 \times 30 – 10 \times 49}}{{5 \times 30 – 100}} = \frac{{110}}{{50}} = 2.2\)

\(b = {\text{Slope}}\,{\text{of}}\,{\text{the}}\,{\text{line}} = \frac{{n\sum x y – \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}} = \frac{{5 \times 49 – 10 \times 20}}{{5 \times 30 – 100}} = \frac{{45}}{{50}} = 0.9\)

Hence, \(y=2.2+0.9x\) is the required regression line.

So, When \(x=10, y=2.2+0.9×10=11.2\)

Q.3. The sales of a company (in a million dollars) for each year are shown below:

\(x\) (year) | \(2005\) | \(2006\) | \(2007\) | \(2008\) | \(2009\)
\(y\) (sales) | \(12\) | \(19\) | \(29\) | \(37\) | \(45\)

Find the regression line \(y=a+bx\) and also estimate the sales of the company in \(2012\).
Ans: We rewrite the above table, measuring \(x\) as years after \(2005\), to simplify the calculation.

\(x\) (years after \(2005\)) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\)
\(y\) (sales) | \(12\) | \(19\) | \(29\) | \(37\) | \(45\)

\(x\) | \(y\) | \(xy\) | \({x^2}\)
\(0\) | \(12\) | \(0\) | \(0\)
\(1\) | \(19\) | \(19\) | \(1\)
\(2\) | \(29\) | \(58\) | \(4\)
\(3\) | \(37\) | \(111\) | \(9\)
\(4\) | \(45\) | \(180\) | \(16\)
\(\sum x = 10\) | \(\sum y = 142\) | \(\sum x y = 368\) | \(\sum{{x^2}} = 30\)

\(a = y – {\text{intercept}} = \frac{{\sum y \sum {{x^2}}  – \sum x \sum x y}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}} = \frac{{142 \times 30 – 10 \times 368}}{{5 \times 30 – 100}} = \frac{{580}}{{50}} = 11.6\)

\(b = {\text{Slope}}\,{\text{of}}\,{\text{the}}\,{\text{line}} = \frac{{n\sum xy – \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) – {{\left( {\sum x } \right)}^2}}} = \frac{{5 \times 368 – 10 \times 142}}{{5 \times 30 – 100}} = 8.4\)

Hence, \(y=11.6+8.4x\) is the required regression line.

So, the sales in \(2012\), i.e. \(7\) years after \(2005\):

For \(x=7, y=11.6+8.4×7=70.4\) million dollars.

Summary

Regression is a statistical technique used in economics, investing, and other fields to evaluate the strength and nature of the relationship between one dependent variable and one or more independent variables. We have learnt the regression formula and its applications in real-life situations. Regression has immense real-world uses, which give it a significant role in mathematics and statistics.

FAQs

Q.1. What is the concept of regression?
Ans: A collection of statistical methods for estimating relationships between a dependent variable and one or more independent variables is known as regression. It can be used to determine the strength of a relationship between variables and to predict how they will interact in the future.

Q.2. Why do we use regression?
Ans: A regression analysis can be used for either of two purposes: predicting the value of the dependent variable for individuals for whom knowledge about the explanatory variables is available or estimating the impact of any explanatory variable on the dependent variable.

Q.3. How does regression work?
Ans:  Regression is a method of predicting the values of a dependent variable by using an independent variable. A line of best fit is used in linear regression to derive an equation from the training dataset, which can then be used to predict the values of the testing dataset. The equation can be written as \(y=mx+b\), where \(y\) is the expected value, \(m\) is the line’s gradient, and \(b\) is the line’s intersection with the \(y\)-axis.

Q.4. What are regression and its types?
Ans: Regression is a powerful statistical tool that helps us to examine the relationship between two or more variables of interest. There are several types of regression, including linear, multiple linear, and nonlinear. Simple linear and multiple linear models are the most common.

Q.5. What is an example of regression?
Ans: In an industry, regressions can be used to assess patterns and make predictions or forecasts. For example, suppose a company’s sales have been increasing steadily every month for the past few years. In that case, the company might predict sales in future months by running a linear analysis on the sales data with monthly sales.


We hope this article on regression has provided significant value to your knowledge. If you have any queries or suggestions, feel free to write them down in the comment section below. We will love to hear from you. Embibe wishes you all the best of luck!
