Regression analysis is a statistical research method that allows you to show the dependence of a parameter on one or more independent variables. In the pre-computer era, its use was quite difficult, especially when it came to large amounts of data. Today, having learned how to build a regression in Excel, you can solve complex statistical problems in just a couple of minutes. Below are specific examples from the field of economics.
Types of regression
The concept itself was introduced into mathematics by Francis Galton in 1886. Regression comes in several types.
Let's consider the problem of determining how the number of employees who quit depends on the average salary at 6 industrial enterprises.
Task. At six enterprises, we analyzed the average monthly salary and the number of employees who left of their own free will. In tabular form we have:
|Number of quitters||Salary|
For the problem of determining the dependence of the number of employees who quit on the average salary at 6 enterprises, the regression model has the form of the equation Y = a0 + a1x1 + … + akxk, where xi are the influencing variables, ai are the regression coefficients, and k is the number of factors.
For this task, Y is the number of employees who quit, and the influencing factor is the salary, which we denote by X.
Using the power of the Excel spreadsheet processor
Regression analysis in Excel can be carried out by applying built-in functions to the available tabular data. However, for these purposes it is more convenient to use the Analysis ToolPak add-in. To activate it you need to:
- from the "File" tab go to the "Options" section;
- in the window that opens, select the line "Add-ins";
- click on the "Go" button located at the bottom, to the right of the "Manage" line;
- check the box next to the name "Analysis ToolPak" and confirm your actions by clicking "OK".
If everything is done correctly, the "Data Analysis" button will appear on the right side of the Data tab, located above the Excel worksheet.
Linear Regression in Excel
Now that we have all the necessary virtual tools at hand to perform econometric calculations, we can begin to solve our problem. To do this:
- click on the "Data Analysis" button;
- in the window that opens, select "Regression";
- in the dialog that appears, enter the range of values for Y (the number of employees who quit) in "Input Y Range" and for X (their salaries) in "Input X Range";
- confirm the actions by pressing the "OK" button.
As a result, the program will automatically fill a new sheet of the workbook with the regression analysis data. Note that Excel also lets you set the output location manually. For example, it could be the same sheet where the Y and X values are, or even a new workbook created specifically to store such data.
Analysis of regression results for R-square
In Excel, the results obtained for the example under consideration look like this:
First of all, pay attention to the value of R-square, which is the coefficient of determination. In this example, R-square = 0.755 (75.5%), i.e. the fitted model explains 75.5% of the variation in Y. The higher the coefficient of determination, the better the chosen model suits the task. As a rule of thumb, the model is considered to describe the real situation correctly when R-square is above 0.8. If R-square is below 0.5, such a regression analysis in Excel can hardly be considered reasonable.
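For reference, the coefficient of determination is usually defined as follows. The symbols are not from the article: ŷ_i are assumed to be the values fitted by the model, ȳ the mean of the observed y_i, and n the number of observations.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

With this definition, R-square = 0.755 means that the residual spread around the fitted line is 24.5% of the total spread of Y around its mean.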
The number 64.1428 (the intercept) shows what the value of Y will be if all the variables xi in the model under consideration are set to zero. In other words, the value of the analyzed parameter is also influenced by other factors that are not described in this particular model.
The next coefficient, -0.16285, located in cell B18, shows the weight of the influence of variable X on Y. This means that, within the model under consideration, the average monthly salary affects the number of quitters with a weight of -0.16285, i.e. the degree of its influence is quite small. The "-" sign indicates that the coefficient is negative. This is to be expected, since the higher the salary at an enterprise, the fewer people wish to terminate their employment contract or quit.
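To make the meaning of the two coefficients concrete, here is a minimal Python sketch that evaluates the fitted line Y = 64.1428 - 0.16285·X. The function name and the salary values are hypothetical and chosen only for illustration, since the article's original data table is not reproduced here.

```python
# Coefficients as reported in the Excel output for the first example.
INTERCEPT = 64.1428   # value of Y when X is zero
SLOPE = -0.16285      # change in Y per unit increase in X


def predict_quitters(salary: float) -> float:
    """Return the model's estimate of the number of quitters for a given salary."""
    return INTERCEPT + SLOPE * salary


# Hypothetical salary values (illustration only, not the article's data).
for salary in (50, 100, 150):
    print(salary, round(predict_quitters(salary), 2))
```

Because the slope is negative, each additional unit of salary lowers the estimated number of quitters by about 0.16.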
Multiple regression refers to a relationship equation with several independent variables of the form:
y = f(x1, x2, …, xm) + ε, where y is the resulting feature (dependent variable), and x1, x2, …, xm are the factor features (independent variables).
Estimation of parameters
For multiple regression (MR), parameter estimation is carried out using the least squares method (LSM). For linear equations of the form Y = a + b1x1 + … + bmxm + ε, a system of normal equations is built.
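For a linear equation with m factors, the system of normal equations in its standard textbook form can be written as below. Sums are taken over all n observations, and n is a symbol assumed here rather than taken from the article.

```latex
\begin{cases}
\sum y = n a + b_1 \sum x_1 + b_2 \sum x_2 + \dots + b_m \sum x_m \\
\sum y x_1 = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + \dots + b_m \sum x_1 x_m \\
\dots \\
\sum y x_m = a \sum x_m + b_1 \sum x_1 x_m + b_2 \sum x_2 x_m + \dots + b_m \sum x_m^2
\end{cases}
```

Solving this system for a, b_1, …, b_m gives the least squares estimates of the regression parameters.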
To understand the principle of the method, consider the two-factor case. Then we have a situation described by the formula
From here we get:
where σ is the standard deviation of the feature indicated in the subscript.
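For the two-factor case, a standard textbook form of the equation and its least squares solution is sketched below. This is an assumption about what the passage refers to: r denotes the pairwise correlation coefficient of the features in its subscript, a bar denotes a sample mean, and σ is the standard deviation.

```latex
\begin{aligned}
y &= a + b_1 x_1 + b_2 x_2 + \varepsilon \\
b_1 &= \frac{\sigma_y}{\sigma_{x_1}} \cdot \frac{r_{y x_1} - r_{y x_2}\, r_{x_1 x_2}}{1 - r_{x_1 x_2}^2} \\
b_2 &= \frac{\sigma_y}{\sigma_{x_2}} \cdot \frac{r_{y x_2} - r_{y x_1}\, r_{x_1 x_2}}{1 - r_{x_1 x_2}^2} \\
a &= \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2
\end{aligned}
```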
The LSM can also be applied to the MR equation on a standardized scale. In this case, we get the equation:
ty = β1tx1 + β2tx2 + … + βmtxm + ε,
in which ty, tx1, …, txm are standardized variables with means equal to 0 and standard deviations equal to 1, and βi are the standardized regression coefficients.
Please note that all βi in this case are normalized and centered, so comparing them with each other is considered correct and admissible. In addition, it is customary to filter out factors, discarding those with the smallest values of βi.
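The link between standardized and ordinary coefficients can be sketched in Python for the two-factor case. The sketch assumes the standard relations β1 = (r_yx1 - r_yx2·r_x1x2) / (1 - r_x1x2²) and b_i = β_i·σ_y/σ_xi. The data are hypothetical and constructed so that y = 2 + 3·x1 - x2 exactly, and all function names are illustrative.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (std(u) * std(v))


def two_factor_fit(x1, x2, y):
    """Return (a, b1, b2, beta1, beta2) for y = a + b1*x1 + b2*x2."""
    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    denom = 1 - r_12 ** 2
    # Standardized coefficients from the pairwise correlations.
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    # Convert back to the natural scale of the variables.
    b1 = beta1 * std(y) / std(x1)
    b2 = beta2 * std(y) / std(x2)
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2, beta1, beta2


# Hypothetical data, built so that y = 2 + 3*x1 - x2 holds exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 + 3 * u - v for u, v in zip(x1, x2)]

a, b1, b2, beta1, beta2 = two_factor_fit(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
print(round(beta1, 4), round(beta2, 4))
```

Because the β values are unit-free, their magnitudes can be compared directly, which is what makes the factor filtering described above possible.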
Problem using linear regression equation
Suppose there is a table of the price dynamics of a particular item N over the last 8 months. A decision must be made on the advisability of purchasing a batch of it at a price of 1850 rubles per ton.
|Month number||Month||Price of item N|
|1||January||1750 rubles per ton|
|2||February||1755 rubles per ton|
|3||March||1767 rubles per ton|
|4||April||1760 rubles per ton|
|5||May||1770 rubles per ton|
|6||June||1790 rubles per ton|
|7||July||1810 rubles per ton|
|8||August||1840 rubles per ton|
To solve this problem in an Excel spreadsheet, use the Data Analysis tool already known from the example above. Next, select the "Regression" section and set the parameters. Remember that the "Input Y Range" field must contain the range of values of the dependent variable (in this case, the price of the item in specific months of the year), and the "Input X Range" field the range of the independent variable (the month number). Confirm the action by clicking "OK". On a new sheet (if that option was selected), we get the regression data.
Use them to build a linear equation of the form y = ax + b, where a is the coefficient in the row named after the month number and b is the coefficient in the "Intercept" row of the sheet with the regression analysis results. Thus, the linear regression (LR) equation for this task is written as:
Price of item N = 11.714 × month number + 1727.54
or in algebraic notation
y = 11.714x + 1727.54
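The same coefficients can be cross-checked outside Excel. The sketch below is a plain-Python least squares fit, not the Analysis ToolPak itself, and the function name `fit_line` is illustrative. It uses the eight monthly prices from the table above.

```python
# Prices of item N (rubles per ton) for months 1..8, taken from the table above.
months = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]


def fit_line(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    a = s_xy / s_xx
    b = y_mean - a * x_mean
    return a, b


a, b = fit_line(months, prices)
print(f"y = {a:.3f} x + {b:.2f}")
```

After rounding, the fit should reproduce the slope 11.714 and the intercept 1727.54 given above.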
Analysis of results
To decide whether the resulting linear regression equation is adequate, the multiple correlation coefficient (MCC) and the coefficient of determination are used, as well as Fisher's test and Student's test. In the Excel table with regression results, they appear under the names Multiple R, R Square, F and t Stat, respectively.
The MCC R makes it possible to estimate the tightness of the probabilistic relationship between the independent and dependent variables. Its high value indicates a fairly strong relationship between the variables "Month number" and "Price of item N in rubles per ton". However, the nature of this relationship remains unknown.
The coefficient of determination R² (R-square) is a numerical characteristic of the share of the total spread: it shows what part of the spread in the experimental data, i.e. in the values of the dependent variable, is accounted for by the linear regression equation. In the problem under consideration, this value is 84.8%, i.e. the statistical data are described with a high degree of accuracy by the obtained LR.
F-statistic, also called Fisher's test, is used to assess the significance of a linear relationship, refuting or confirming the hypothesis of its existence.
The value of the t-statistic (Student's t-test) helps to evaluate the significance of the coefficient of the unknown or of the free term of a linear relationship. If the value of the t-statistic exceeds the critical value tcr, the hypothesis that the free term of the linear equation is insignificant is rejected.
In the problem under consideration, Excel gives t = 169.20903 and p = 2.89E-12 for the free term, i.e. the probability of rejecting a true hypothesis that the free term is insignificant is practically zero. For the coefficient of the unknown, t = 5.79405 and p = 0.001158. In other words, the probability that a true hypothesis of an insignificant coefficient of the unknown will be rejected is 0.12%.
Thus, it can be argued that the resulting linear regression equation is adequate.
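The statistics quoted above can be recomputed from the price table with the usual simple-regression formulas. The Python sketch below is an illustration under those textbook formulas, and the function and key names are hypothetical. The p-values are omitted because they need the Student t distribution, which the Python standard library does not provide.

```python
from math import sqrt

# Prices of item N (rubles per ton) for months 1..8, from the article's table.
months = [1, 2, 3, 4, 5, 6, 7, 8]
prices = [1750, 1755, 1767, 1760, 1770, 1790, 1810, 1840]


def regression_stats(x, y):
    """Return R-square, Multiple R, F and t statistics for y = slope*x + intercept."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_mean) ** 2 for yi in y)

    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean

    # Residual sum of squares and residual variance with n - 2 degrees of freedom.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_var = ss_res / (n - 2)

    # Standard errors of the two coefficients.
    se_slope = sqrt(resid_var / s_xx)
    se_intercept = sqrt(resid_var * (1 / n + x_mean ** 2 / s_xx))

    r_square = 1 - ss_res / s_yy
    return {
        "r_square": r_square,
        "multiple_r": sqrt(r_square),  # equals |r| in the one-factor case
        "f": (s_yy - ss_res) / resid_var,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
    }


stats = regression_stats(months, prices)
for name, value in stats.items():
    print(name, round(value, 5))
```

In the one-factor case the F-statistic equals the square of the slope's t-statistic, so both tests lead to the same conclusion about the month number.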
The problem of the expediency of buying a block of shares
Multiple regression in Excel is done using the same Data Analysis tool. Consider a specific applied problem.
The management of NNN must make a decision on the advisability of buying a 20% stake in JSC MMM. The cost of the package (SP) is 70 million US dollars. NNN specialists collected data on similar transactions. It was decided to evaluate the value of the block of shares according to the following parameters, expressed in millions of US dollars:
- accounts payable (VK);
- annual turnover (VO);
- accounts receivable (VD);
- value of fixed assets (SOF).
In addition, the company's wage arrears (VZP), in thousands of US dollars, are used as a parameter.
Solution using Excel spreadsheet
First of all, create a table of the initial data. Then:
- call the "Data Analysis" window;
- select the Regression section;
- enter the range of values of the dependent variable from column G into the "Input Y Range" box;
- click on the red arrow icon to the right of the "Input X Range" box and select the range of all values from columns B through F on the sheet.
Check the item "New Worksheet Ply" and press "OK".
The result is the regression analysis for the given problem.
Study of results and conclusions
From the rounded values presented on the Excel results sheet, we assemble the regression equation:
SP = 0.103·SOF + 0.541·VO - 0.031·VK + 0.405·VD + 0.691·VZP - 265.844
In a more familiar mathematical form, it can be written as:
y = 0.103x1 + 0.541x2 - 0.031x3 + 0.405x4 + 0.691x5 - 265.844
Data for JSC "MMM" are presented in the table:
|SOF, million USD||VO, million USD||VK, million USD||VD, million USD||VZP, thousand USD||SP, million USD|
|102.5||535.5||45.2||41.5||21.55||64.72|
Substituting them into the regression equation, we get a figure of 64.72 million US dollars. This means that the shares of JSC MMM are not worth buying, since the asking price of 70 million US dollars is higher than the estimate.
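The substitution can be checked with a few lines of Python. This is only an arithmetic sketch using the rounded coefficients and the JSC MMM figures quoted above, and the names in it are illustrative.

```python
# Rounded coefficients of the regression equation quoted in the article.
COEFFICIENTS = {"SOF": 0.103, "VO": 0.541, "VK": -0.031, "VD": 0.405, "VZP": 0.691}
INTERCEPT = -265.844

# Data for JSC MMM (VZP in thousands of USD, the rest in millions of USD).
mmm = {"SOF": 102.5, "VO": 535.5, "VK": 45.2, "VD": 41.5, "VZP": 21.55}

ASKING_PRICE = 70.0  # asking price of the package, million USD


def estimate_package_value(factors):
    """Evaluate the regression equation for one company's factor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in factors.items())


estimated = estimate_package_value(mmm)
print(round(estimated, 2), estimated < ASKING_PRICE)
```

The estimate comes out at roughly 64.72 million US dollars, below the asking price of 70 million, which is the basis for the conclusion above.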
As you can see, the use of the Excel spreadsheet and the regression equation made it possible to make an informed decision regarding the appropriateness of a very specific transaction.
Now you know what regression is. The examples in Excel discussed above will help you solve practical problems from the field of econometrics.