# Regression in Excel: equation, examples. Linear Regression

Regression analysis is a statistical research method that allows you to show the dependence of a parameter on one or more independent variables. In the pre-computer era, its use was quite difficult, especially when it came to large amounts of data. Today, having learned how to build a regression in Excel, you can solve complex statistical problems in just a couple of minutes. Below are specific examples from the field of economics.

## Types of regression

The concept itself was introduced into mathematics by Francis Galton in 1886. Regression can be:

• linear;
• parabolic;
• power;
• exponential;
• hyperbolic;
• exponential with an arbitrary base (y = a·bˣ);
• logarithmic.

## Example 1

Consider the problem of determining how the number of employees who quit depends on the average salary at 6 industrial enterprises.

Task. At six enterprises, we analyzed the average monthly salary and the number of employees who quit voluntarily. In tabular form we have:

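|   | A | B | C |
|---|---|---|---|
| 1 | X | Number of quitters | Salary |
| 2 |   | y | 30000 rubles |
| 3 | 1 | 60 | 35000 rubles |
| 4 | 2 | 35 | 40000 rubles |
| 5 | 3 | 20 | 45000 rubles |
| 6 | 4 | 20 | 50000 rubles |
| 7 | 5 | 15 | 55000 rubles |
| 8 | 6 | 15 | 60000 rubles |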

For the problem of determining how the number of employees who quit depends on the average salary at 6 enterprises, the regression model has the form $Y = a_0 + a_1 x_1 + \dots + a_k x_k$, where $x_i$ are the influencing variables, $a_i$ are the regression coefficients, and $k$ is the number of factors.

For this task, Y is the number of employees who quit, and the influencing factor is the salary, which we denote by X.

## Using the power of the Excel spreadsheet processor

Regression analysis in Excel can be carried out by applying built-in functions to the available tabular data. However, for this purpose it is better to use the very useful "Analysis ToolPak" add-in. To activate it:

• on the "File" tab, go to the "Options" section;
• in the window that opens, select "Add-ins";
• click the "Go" button at the bottom, to the right of the "Manage" field;
• check the box next to "Analysis ToolPak" and confirm by clicking "OK".

If everything is done correctly, the "Data Analysis" button will appear on the right side of the "Data" tab, above the Excel worksheet.

## Linear Regression in Excel

Now that we have all the necessary virtual tools at hand to perform econometric calculations, we can begin to solve our problem. To do this:

• click the "Data Analysis" button;
• in the window that opens, select "Regression";
• in the dialog box that appears, enter the range of values for Y (the number of employees who quit) and for X (their salaries);
• confirm by clicking "OK".

As a result, the program automatically fills a new worksheet with the regression analysis data. Note that Excel lets you choose a different location for the output: for example, the same sheet that holds the Y and X values, or a new workbook dedicated to such data.

## Analysis of regression results for R-square

The Excel output obtained for this example contains the summary statistics and coefficients discussed below.

First of all, pay attention to the R-square value, which is the coefficient of determination. In this example, R-square = 0.755 (75.5%), i.e., the model explains 75.5% of the variation in the number of employees who quit. The higher the coefficient of determination, the better the chosen model suits the task. A model is generally considered to describe the real situation well when R-square exceeds 0.8. If R-square < 0.5, such a regression analysis in Excel cannot be considered reliable.
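The R-square value can be checked outside Excel. The sketch below is a minimal Python illustration that assumes each enterprise is paired with the salary in the same spreadsheet row (35,000 to 60,000 rubles); this pairing is an assumption, since the table lists one more salary than enterprises. Because the salaries are equally spaced, pairing them with rows 2 to 7 instead gives the same R-square. For a regression with a single factor, R-square equals the squared Pearson correlation between X and Y.

```python
# Minimal sketch: R-square of a simple linear regression computed as the
# squared Pearson correlation between X and Y (valid for one regressor).

def r_squared(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Example 1 data; the salary-to-enterprise pairing is an assumption.
salary = [35000, 40000, 45000, 50000, 55000, 60000]  # X, rubles
quitters = [60, 35, 20, 20, 15, 15]                   # Y, employees

r2 = r_squared(salary, quitters)
print(f"R-square = {r2:.3f}")  # prints R-square = 0.755
```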

## Coefficient analysis

The number 64.1428 (the Y-intercept) shows what the value of Y would be if all the variables $x_i$ in the model were set to zero. In other words, it can be argued that the value of the analyzed parameter is also influenced by other factors not described by this particular model.

The next coefficient, -0.16285, located in cell B18, shows the weight of the influence of variable X on Y. This means that, within the considered model, the average monthly salary affects the number of quitters with a weight of -0.16285, i.e., the degree of its influence is fairly small. The minus sign indicates an inverse relationship. This is expected: the higher the salary at an enterprise, the fewer people want to terminate their employment contract and quit.
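The coefficients in the Excel output are ordinary least-squares estimates. The Python sketch below computes them with the textbook formulas b = Sxy / Sxx and a = mean(y) - b·mean(x), again assuming the 35,000 to 60,000 ruble pairing. Because X is entered in rubles here, the slope is a change in quitters per ruble of salary; the magnitude of both coefficients depends on the units and the salary pairing used, so treat the printed values as an illustration of the method.

```python
# Minimal sketch: ordinary least squares for y = a + b*x.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # change in Y per unit of X
    intercept = mean_y - slope * mean_x  # value of Y when X = 0
    return intercept, slope

salary = [35000, 40000, 45000, 50000, 55000, 60000]  # X, rubles (assumed pairing)
quitters = [60, 35, 20, 20, 15, 15]                   # Y, employees

a, b = fit_line(salary, quitters)
# A negative slope means higher salaries go with fewer quitters.
print(f"intercept = {a:.4f}, slope = {b:.8f}")
```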

## Multiple regression

This term refers to a regression equation with several independent variables, of the form:

$$y = f(x_1, x_2, \dots, x_m) + \varepsilon,$$

where $y$ is the resulting feature (dependent variable) and $x_1, x_2, \dots, x_m$ are the factor features (independent variables).

## Estimation of parameters

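For multiple regression (MR), the parameters are estimated by the method of least squares (LSM). For linear equations of the form $Y = a + b_1 x_1 + \dots + b_m x_m + \varepsilon$, a system of normal equations is built:

$$
\begin{cases}
\sum y = n a + b_1 \sum x_1 + b_2 \sum x_2 + \dots + b_m \sum x_m, \\
\sum y x_1 = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_2 x_1 + \dots + b_m \sum x_m x_1, \\
\dots \\
\sum y x_m = a \sum x_m + b_1 \sum x_1 x_m + b_2 \sum x_2 x_m + \dots + b_m \sum x_m^2,
\end{cases}
$$

where $n$ is the number of observations.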
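To make the system of normal equations concrete, here is a minimal Python sketch that builds the sums for $Y = a + b_1 x_1 + \dots + b_m x_m$ and solves the system by Gaussian elimination. The function name `solve_normal_equations` and the sample data are hypothetical; in practice a numerical library or Excel's Regression tool would do this work.

```python
# Minimal sketch: least-squares estimates for y = a + b1*x1 + ... + bm*xm
# obtained by forming and solving the system of normal equations.

def solve_normal_equations(rows, ys):
    """rows: list of [x1, ..., xm] per observation; returns [a, b1, ..., bm]."""
    # Prepend a column of ones so that the intercept a is the first unknown.
    design = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(design[0])
    # Left side: sums of products of regressors; right side: sums of y times each regressor.
    lhs = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    rhs = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]

    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(lhs[r][col]))
        lhs[col], lhs[pivot] = lhs[pivot], lhs[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            factor = lhs[r][col] / lhs[col][col]
            for c in range(col, k):
                lhs[r][c] -= factor * lhs[col][c]
            rhs[r] -= factor * rhs[col]

    # Back substitution.
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(lhs[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (rhs[i] - s) / lhs[i][i]
    return coef

# Hypothetical data generated exactly from y = 2 + 3*x1 - x2.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
ys = [2 + 3 * x1 - x2 for x1, x2 in rows]
print(solve_normal_equations(rows, ys))  # approximately [2.0, 3.0, -1.0]
```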

To understand the principle of the method, consider the two-factor case. Then we have a situation described by the formula

From here we get:

where σ is the standard deviation of the feature indicated in the subscript.

LSM can also be applied to the MR equation on a standardized scale. In this case, we get the equation:

$$t_y = \beta_1 t_{x_1} + \beta_2 t_{x_2} + \dots + \beta_m t_{x_m} + \varepsilon,$$

in which $t_y, t_{x_1}, \dots, t_{x_m}$ are standardized variables with mean 0 and standard deviation 1, and $\beta_i$ are the standardized regression coefficients.

Note that in this case all $\beta_i$ are normalized and centered, so comparing them with each other is correct and admissible. In addition, it is customary to screen factors by discarding those with the smallest absolute values of $\beta_i$.
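To illustrate the standardized scale in the two-factor case, the sketch below (hypothetical names and data) computes $\beta_1$ and $\beta_2$ from pairwise correlation coefficients with the standard two-factor formulas $\beta_1 = (r_{yx_1} - r_{yx_2} r_{x_1x_2}) / (1 - r_{x_1x_2}^2)$ and, symmetrically, $\beta_2$. It then converts them back to natural-scale coefficients via $b_i = \beta_i \sigma_y / \sigma_{x_i}$. Because the β values are unit-free, they can be compared directly, which is what makes the factor screening described above possible.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    # Population standard deviation; the ratio sigma_y / sigma_x is the same
    # whichever divisor (n or n - 1) is used consistently.
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def corr(u, v):
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)
    return cov / (std(u) * std(v))

def standardized_two_factor(y, x1, x2):
    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    # Convert standardized coefficients back to the natural scale.
    b1 = beta1 * std(y) / std(x1)
    b2 = beta2 * std(y) / std(x2)
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return (beta1, beta2), (a, b1, b2)

# Hypothetical data generated exactly from y = 2 + 3*x1 - x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 5, 3, 4]
y = [2 + 3 * p - q for p, q in zip(x1, x2)]

betas, natural = standardized_two_factor(y, x1, x2)
print("beta:", betas)          # unit-free, directly comparable
print("a, b1, b2:", natural)   # approximately (2.0, 3.0, -1.0)
```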

## A problem using the linear regression equation

Suppose we have a table of price changes for item N over the last 8 months. We need to decide whether it is advisable to purchase a batch of it at a price of 1850 rubles/t.

|   | A | B | C |
|---|---|---|---|
| 1 | Month number | Month name | Price of item N |
| 2 | 1 | January | 1750 rubles per ton |
| 3 | 2 | February | 1755 rubles per ton |
| 4 | 3 | March | 1767 rubles per ton |
| 5 | 4 | April | 1760 rubles per ton |
| 6 | 5 | May | 1770 rubles per ton |
| 7 | 6 | June | 1790 rubles per ton |
| 8 | 7 | July | 1810 rubles per ton |
| 9 | 8 | August | 1840 rubles per ton |

To solve this problem in Excel, use the Data Analysis tool already familiar from the example above. Select the "Regression" section and set the parameters. In the "Input Y Range" field, enter the range of values of the dependent variable (here, the price of the item in each month), and in the "Input X Range" field, the range of the independent variable (the month number). Confirm by clicking "OK". On a new sheet (if that option was selected), you get the regression output.

Use these results to build a linear equation of the form $y = ax + b$, where $a$ is the coefficient in the month number row and $b$ is the coefficient in the "Y-intercept" row of the regression output. Thus, the linear regression (LR) equation for this problem is:

Price of item N = 11.714 · month number + 1727.54

or in algebraic notation

$$y = 11.714x + 1727.54$$
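The coefficients 11.714 and 1727.54 can be reproduced with ordinary least squares on the table above. The Python sketch below fits the line and, as an illustration, evaluates it at month 9 (September) to get a point forecast that can be compared with the offered price of 1850 rubles/t; extrapolating one step beyond the data is an illustrative use of the equation, not part of the original task.

```python
# Minimal sketch: least-squares line price = a * month + b for item N.

months = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]  # rubles per ton

n = len(months)
mean_x = sum(months) / n
mean_y = sum(prices) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, prices))
sxx = sum((x - mean_x) ** 2 for x in months)

a = sxy / sxx            # slope: average monthly price change
b = mean_y - a * mean_x  # Y-intercept

print(f"y = {a:.3f}x + {b:.2f}")  # prints y = 11.714x + 1727.54

forecast = a * 9 + b     # illustrative one-month-ahead forecast
print(f"forecast for month 9: {forecast:.2f} rubles/t")
```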

## Analysis of results

To decide whether the resulting linear regression equation is adequate, the multiple correlation coefficient (MCC), the coefficient of determination, Fisher's test, and Student's test are used. In the Excel regression output they appear as Multiple R, R Square, F, and t Stat, respectively.

The MCC R makes it possible to estimate the strength of the stochastic relationship between the independent and dependent variables. Its high value indicates a fairly strong relationship between the variables "Month number" and "Price of item N in rubles per ton". However, it says nothing about the nature of this relationship.

The coefficient of determination $R^2$ (the square of the multiple correlation coefficient R) is the share of the total variation of the dependent variable that is explained by the linear regression equation. In the problem under consideration it equals 84.8%, i.e., the obtained LR equation describes the statistical data with a high degree of accuracy.
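For a regression with one factor, Multiple R is simply the square root of R-square, and R-square is the explained share of the total variation of Y. The Python sketch below recomputes both for the item N data; it should give an R-square of about 0.848, matching the 84.8% above.

```python
import math

months = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]  # rubles per ton

n = len(months)
mx = sum(months) / n
my = sum(prices) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(months, prices))
sxx = sum((x - mx) ** 2 for x in months)
syy = sum((y - my) ** 2 for y in prices)  # total variation of Y

ssr = sxy ** 2 / sxx          # variation explained by the regression line
r_square = ssr / syy          # share of explained variation
multiple_r = math.sqrt(r_square)

print(f"Multiple R = {multiple_r:.4f}, R Square = {r_square:.4f}")
```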

The F-statistic, also called Fisher's test, is used to assess the significance of the linear relationship, rejecting or confirming the hypothesis that it exists.

The t-statistic (Student's t-test) helps to evaluate the significance of the coefficient of the unknown and of the free term of a linear relationship. If the absolute value of the t-statistic exceeds the critical value $t_{cr}$, the hypothesis that the corresponding term is insignificant is rejected.

In the problem under consideration, Excel gives t = 169.20903 and p = 2.89E-12 for the free term, i.e., the probability of wrongly rejecting the hypothesis that the free term is insignificant is practically zero. For the coefficient of the unknown, t = 5.79405 and p = 0.001158. In other words, the probability of wrongly rejecting the hypothesis that this coefficient is insignificant is 0.12%.
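The t and F values in the Excel output follow from the standard formulas for simple linear regression: each t is a coefficient divided by its standard error, and F compares the explained variance with the residual variance. The sketch below recomputes them for the item N data using only the Python standard library, and should reproduce t ≈ 169.209 for the free term and t ≈ 5.794 for the coefficient of the unknown. p-values are omitted because they require the Student t distribution, which the standard library does not provide.

```python
import math

months = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]  # rubles per ton

n = len(months)
mx = sum(months) / n
my = sum(prices) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(months, prices))
sxx = sum((x - mx) ** 2 for x in months)

slope = sxy / sxx
intercept = my - slope * mx

# Residual variance with n - 2 degrees of freedom (two estimated parameters).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(months, prices))
s2 = sse / (n - 2)

se_slope = math.sqrt(s2 / sxx)
se_intercept = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))

t_intercept = intercept / se_intercept
t_slope = slope / se_slope

# F-statistic: explained variance per regressor divided by residual variance.
ssr = sum((intercept + slope * x - my) ** 2 for x in months)
f_stat = (ssr / 1) / s2

print(f"t (intercept) = {t_intercept:.5f}")
print(f"t (slope)     = {t_slope:.5f}")
print(f"F             = {f_stat:.4f}  (equals t_slope ** 2 for one factor)")
```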

Thus, it can be argued that the resulting linear regression equation is adequate.

## The problem of whether to buy a block of shares

Multiple regression in Excel is done using the same Data Analysis tool. Consider a specific applied problem.

The management of NNN must decide whether it is advisable to buy a 20% stake in JSC MMM. The price of the block of shares (SP) is 70 million US dollars. NNN specialists collected data on similar transactions and decided to evaluate the value of the block of shares using the following parameters, expressed in millions of US dollars:

• accounts payable (VK);
• annual turnover (VO);
• accounts receivable (VD);
• value of fixed assets (SOF).

An additional parameter is the company's wage arrears (VZP), in thousands of US dollars.

First of all, you need to create a table of initial data. It looks like this:

Next:

• open the "Data Analysis" window;
• select the "Regression" section;
• enter the range of values of the dependent variable from column G in the "Input Y Range" box;
• click the red arrow icon to the right of the "Input X Range" box and select the range of all values from columns B, C, D, F on the sheet.

Select the "New Worksheet Ply" option and click "OK".

This produces the regression analysis for the given problem.

## Study of results and conclusions

From the rounded values in the Excel output, assemble the regression equation:

$$\mathrm{SP} = 0.103\,\mathrm{SOF} + 0.541\,\mathrm{VO} - 0.031\,\mathrm{VK} + 0.405\,\mathrm{VD} + 0.691\,\mathrm{VZP} - 265.844$$

In a more familiar mathematical form, it can be written as:

$$y = 0.103x_1 + 0.541x_2 - 0.031x_3 + 0.405x_4 + 0.691x_5 - 265.844$$

Data for JSC "MMM" are presented in the table:

| SOF, mln USD | VO, mln USD | VK, mln USD | VD, mln USD | VZP, thousand USD | SP, mln USD |
|---|---|---|---|---|---|
| 102.5 | 535.5 | 45.2 | 41.5 | 21.55 | 64.72 |

Substituting them into the regression equation gives 64.72 million US dollars. This means that the shares of JSC MMM are not worth buying: the asking price of 70 million US dollars is too high.
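The substitution step can be written as a small Python function. The names `COEFFICIENTS`, `predict_sp`, and the dictionary layout are hypothetical; the coefficients and the MMM figures are the rounded values given above, and the result is the estimated SP in millions of US dollars.

```python
# Minimal sketch: evaluate the fitted multiple regression equation for JSC MMM.

COEFFICIENTS = {  # rounded values from the Excel regression output
    "SOF": 0.103,
    "VO": 0.541,
    "VK": -0.031,
    "VD": 0.405,
    "VZP": 0.691,
}
INTERCEPT = -265.844

def predict_sp(factors):
    """Estimated value of the block of shares (SP), in millions of US dollars."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in factors.items())

mmm = {"SOF": 102.5, "VO": 535.5, "VK": 45.2, "VD": 41.5, "VZP": 21.55}
sp = predict_sp(mmm)
asking_price = 70.0  # millions of US dollars

print(f"estimated SP = {sp:.2f}")  # prints estimated SP = 64.72
print("overpriced" if asking_price > sp else "not overpriced")
```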

As you can see, the use of the Excel spreadsheet and the regression equation made it possible to make an informed decision regarding the appropriateness of a very specific transaction.

Now you know what regression is. The examples in Excel discussed above will help you solve practical problems from the field of econometrics.
