Simple Regression Model


Page 1: Simple Regression Model

KTEE 310 FINANCIAL ECONOMETRICS

THE SIMPLE REGRESSION MODEL Chap 4 – S & W

Dr TU Thuy Anh
Faculty of International Economics

Page 2: Simple Regression Model

Output and labor use


Page 3: Simple Regression Model


Output and labor use

The scatter diagram shows output q plotted against labor use l for a sample of 24 observations

Page 4: Simple Regression Model

Output and labor use

Economic theory (the theory of the firm) predicts that an increase in labor use leads to an increase in output in the short run, as long as MPL > 0, other things being equal.

From the data: consistent with common sense; the relationship looks linear.

Setting up: we want to know the impact of labor use on output

=> Y: output, X: labor use

Page 5: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

Suppose that a variable Y is a linear function of another variable X, with unknown parameters β1 and β2 that we wish to estimate:

Y = β1 + β2X

[Figure: the line Y = β1 + β2X in the (X, Y) plane, with intercept β1 and the X values X1, X2, X3, X4 marked on the horizontal axis.]

Page 6: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

Suppose that we have a sample of 4 observations with X values as shown.

Y = β1 + β2X

[Figure: the same line, with the sample X values X1, X2, X3, X4 marked.]

Page 7: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

If the relationship were an exact one, the observations would lie on a straight line and we would have no trouble obtaining accurate estimates of β1 and β2.

Y = β1 + β2X

[Figure: the points Q1, Q2, Q3, Q4 lying exactly on the line Y = β1 + β2X at X1, X2, X3, X4.]

Page 8: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

In practice, most economic relationships are not exact and the actual values of Y are different from those corresponding to the straight line.

Y = β1 + β2X

[Figure: the actual observations P1, P2, P3, P4 lying off the line, with Q1, Q2, Q3, Q4 the corresponding points on the line.]

Page 9: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

To allow for such divergences, we will write the model as Y = β1 + β2X + u, where u is a disturbance term.

[Figure: as before, the observations P1, P2, P3, P4 scattered around the line Y = β1 + β2X.]

Page 10: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

Each value of Y thus has a nonrandom component, β1 + β2X, and a random component, u. The first observation has been decomposed into these two components:

Y1 = (β1 + β2X1) + u1

[Figure: the first observation P1, with the height β1 + β2X1 given by the point Q1 on the line and the disturbance u1 the vertical distance from Q1 to P1.]

Page 11: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

In practice we can see only the P points.

[Figure: only the observations P1, P2, P3, P4 are shown.]

Page 12: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

Obviously, we can use the P points to draw a line which is an approximation to the line Y = β1 + β2X. If we write this line Ŷ = b1 + b2X, b1 is an estimate of β1 and b2 is an estimate of β2.

Ŷ = b1 + b2X

[Figure: a line fitted through P1, P2, P3, P4, with intercept b1.]

Page 13: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

The line is called the fitted model and the values of Y predicted by it are called the fitted values of Y. They are given by the heights of the R points.

Ŷ = b1 + b2X

[Figure: the fitted line with the fitted values R1, R2, R3, R4 directly below or above the actual observations P1, P2, P3, P4; the vertical axis distinguishes Y (actual value) from Ŷ (fitted value).]

Page 14: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

The discrepancies between the actual and fitted values of Y are known as the residuals:

e = Y − Ŷ

[Figure: the residuals e1, e2, e3, e4 shown as the vertical distances between the observations P1, ..., P4 and the fitted points R1, ..., R4 on the line Ŷ = b1 + b2X.]

Page 15: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

Note that the values of the residuals are not the same as the values of the disturbance term. The diagram now shows the true unknown relationship as well as the fitted line.

The disturbance term in each observation is responsible for the divergence between the nonrandom component of the true relationship and the actual observation.

[Figure: the unknown PRF, Y = β1 + β2X, and the estimated SRF, Ŷ = b1 + b2X, shown together, with the points Q1, ..., Q4 on the true line and the observations P1, ..., P4.]

Page 16: Simple Regression Model

SIMPLE LINEAR REGRESSION MODEL

The residuals are the discrepancies between the actual and the fitted values.

If the fit is a good one, the residuals and the values of the disturbance term will be similar, but they must be kept apart conceptually.

[Figure: the same diagram, with the fitted values R1, ..., R4 on the estimated SRF, Ŷ = b1 + b2X, and the unknown PRF, Y = β1 + β2X.]

Page 17: Simple Regression Model

You must prevent negative residuals from cancelling positive ones, and one way to do this is to use the squares of the residuals.

We will determine the values of b1 and b2 that minimize RSS, the sum of the squares of the residuals.

Least squares criterion:

Min(b1, b2) RSS = Σei² = Σ(Yi − Ŷi)² = Σ(Yi − b1 − b2Xi)²

Page 18: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = b1 + b2X

[Figure: scatter diagram of the three observations Y1, Y2, Y3 plotted against X = 1, 2, 3.]

This sequence shows how the regression coefficients for a simple regression model are derived, using the least squares criterion (OLS, for ordinary least squares).

We will start with a numerical example with just three observations: (1,3), (2,5), and (3,6).

Page 19: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = b1 + b2X

Ŷ1 = b1 + b2    Ŷ2 = b1 + 2b2    Ŷ3 = b1 + 3b2

Writing the fitted regression as Ŷ = b1 + b2X, we will determine the values of b1 and b2 that minimize RSS, the sum of the squares of the residuals.

[Figure: the three observations with a candidate fitted line, intercept b1 and slope b2.]

Page 20: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = b1 + b2X

Given our choice of b1 and b2, the residuals are as shown:

e1 = Y1 − Ŷ1 = 3 − b1 − b2
e2 = Y2 − Ŷ2 = 5 − b1 − 2b2
e3 = Y3 − Ŷ3 = 6 − b1 − 3b2

RSS = e1² + e2² + e3² = (3 − b1 − b2)² + (5 − b1 − 2b2)² + (6 − b1 − 3b2)²

[Figure: the three observations and the candidate fitted line Ŷ = b1 + b2X.]

Page 21: Simple Regression Model

SIMPLE REGRESSION ANALYSIS

RSS = e1² + e2² + e3² = (3 − b1 − b2)² + (5 − b1 − 2b2)² + (6 − b1 − 3b2)²

First-order conditions:

∂RSS/∂b1 = 0  ⇒  6b1 + 12b2 − 28 = 0
∂RSS/∂b2 = 0  ⇒  12b1 + 28b2 − 62 = 0

The first-order conditions give us two equations in two unknowns. Solving them, we find that RSS is minimized when b1 and b2 are equal to 1.67 and 1.50, respectively.
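As a quick check (not part of the original slides), here is a minimal Python sketch that solves these two first-order conditions by Cramer's rule:

```python
# Sketch (not from the slides): solve the first-order conditions
#    6*b1 + 12*b2 - 28 = 0
#   12*b1 + 28*b2 - 62 = 0
# for the example with observations (1,3), (2,5), (3,6).
a11, a12, c1 = 6.0, 12.0, 28.0
a21, a22, c2 = 12.0, 28.0, 62.0

det = a11 * a22 - a12 * a21        # 6*28 - 12*12 = 24
b1 = (c1 * a22 - a12 * c2) / det   # (28*28 - 12*62) / 24 = 5/3
b2 = (a11 * c2 - c1 * a21) / det   # (6*62 - 28*12) / 24 = 3/2

print(round(b1, 2), round(b2, 2))  # 1.67 1.5
```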

Page 22: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = 1.67 + 1.50X

The fitted line and the fitted values of Y are as shown:

Ŷ1 = 3.17    Ŷ2 = 4.67    Ŷ3 = 6.17

[Figure: the three observations with the fitted line, intercept 1.67 and slope 1.50.]

Page 23: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = b1 + b2X

Now we will do the same thing for the general case with n observations.

[Figure: observations Y1, ..., Yn plotted against X1, ..., Xn.]

Page 24: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = b1 + b2X

Given our choice of b1 and b2, we will obtain a fitted line as shown:

Ŷ1 = b1 + b2X1,  ...,  Ŷn = b1 + b2Xn

[Figure: the fitted line through the scatter, with intercept b1 and slope b2.]

Page 25: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = b1 + b2X

The residual for the first observation is defined:

e1 = Y1 − Ŷ1 = Y1 − b1 − b2X1

Similarly we define the residuals for the remaining observations. That for the last one is marked:

en = Yn − Ŷn = Yn − b1 − b2Xn

[Figure: the residuals e1 and en marked as vertical distances between the observations and the fitted line.]

Page 26: Simple Regression Model

We will determine the values of b1 and b2 that minimize RSS, the sum of the squares of the residuals.

Least squares criterion:

Min(b1, b2) RSS = Σei² = Σ(Yi − Ŷi)² = Σ(Yi − b1 − b2Xi)²

Page 27: Simple Regression Model

DERIVING LINEAR REGRESSION COEFFICIENTS

True model: Y = β1 + β2X + u
Fitted line: Ŷ = b1 + b2X

We chose the parameters of the fitted line so as to minimize the sum of the squares of the residuals. As a result, we derived the expressions for b1 and b2 from the first-order conditions:

b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

b1 = Ȳ − b2X̄

Page 28: Simple Regression Model

Practice – calculate b1 and b2

Year Output - Y Labor - X

1899 100 100

1900 101 105

1901 112 110

1902 122 118

1903 124 123

1904 122 116

1905 143 125

1906 152 133

1907 151 138

1908 126 121

1909 155 140

1910 159 144

1911 153 145

1912 177 152

1913 184 154

1914 169 149

1915 189 154

1916 225 182

1917 227 196

1918 223 200

1919 218 193

1920 231 193

1921 179 147

1922 240 161

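Before looking at the software output, here is a minimal Python sketch (not part of the slides) that applies the b1 and b2 formulas from the previous page to these data; it should reproduce the gretl estimates shown on the next slide:

```python
# Sketch: compute b1 and b2 for the practice data (output q, labor l)
# using b2 = sum (Xi - Xbar)(Yi - Ybar) / sum (Xi - Xbar)^2 and
# b1 = Ybar - b2 * Xbar.  Should reproduce the gretl estimates
# const = -38.7267 and slope = 1.40367 shown on the next slide.
Y = [100, 101, 112, 122, 124, 122, 143, 152, 151, 126, 155, 159,
     153, 177, 184, 169, 189, 225, 227, 223, 218, 231, 179, 240]
X = [100, 105, 110, 118, 123, 116, 125, 133, 138, 121, 140, 144,
     145, 152, 154, 149, 154, 182, 196, 200, 193, 193, 147, 161]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) \
     / sum((x - x_bar) ** 2 for x in X)
b1 = y_bar - b2 * x_bar

print(f"b1 = {b1:.4f}, b2 = {b2:.5f}")   # b1 = -38.7267, b2 = 1.40367
```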

Page 29: Simple Regression Model

Model 1: OLS, using observations 1899-1922 (T = 24)
Dependent variable: q

             coefficient    std. error    t-ratio    p-value
  -----------------------------------------------------------
  const      -38.7267       14.5994       -2.653     0.0145    **
  l            1.40367       0.0982155    14.29      1.29e-012 ***

Mean dependent var   165.9167    S.D. dependent var   43.75318
Sum squared resid    4281.287    S.E. of regression   13.95005
R-squared            0.902764    Adjusted R-squared   0.898344
F(1, 22)             204.2536    P-value(F)           1.29e-12
Log-likelihood      -96.26199    Akaike criterion     196.5240
Schwarz criterion    198.8801    Hannan-Quinn         197.1490
rho                  0.836471    Durbin-Watson        0.763565

INTERPRETATION OF A REGRESSION EQUATION

This is the output from a regression of output q on labor use l, using gretl.

Page 30: Simple Regression Model

[Figure: q versus l (with least squares fit), Ŷ = −38.7 + 1.40X.]

Here is the scatter diagram again, with the regression line shown.

Page 31: Simple Regression Model

THE COEFFICIENT OF DETERMINATION

Question: how does the sample regression line fit the data at hand?

How much does the independent variable explain the variation in the dependent variable (in the sample)?

We have:

Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σei²     (total variation of Y)

R² = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²     (the share explained by the model)

Page 32: Simple Regression Model

GOODNESS OF FIT

Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σei²,  that is,  TSS = ESS + RSS

R² = ESS/TSS = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²

The main criterion of goodness of fit, formally described as the coefficient of determination but usually referred to as R², is defined to be the ratio of ESS to TSS, that is, the proportion of the variance of Y explained by the regression equation.

Page 33: Simple Regression Model

GOODNESS OF FIT

R² = ESS/TSS = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²

Equivalently, since TSS = ESS + RSS:

R² = 1 − RSS/TSS = 1 − Σei² / Σ(Yi − Ȳ)²

The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Thus it automatically follows that they maximize R².
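As an illustration (not from the slides), here are the decomposition and both forms of R² for the earlier three-observation example, whose fitted line was Ŷ = 1.67 + 1.50X:

```python
# Sketch: TSS = ESS + RSS and R^2 for the three-observation example
# with fitted line Y_hat = 5/3 + 1.5*X.
X = [1, 2, 3]
Y = [3, 5, 6]
b1, b2 = 5 / 3, 1.5
y_bar = sum(Y) / len(Y)
Y_hat = [b1 + b2 * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)               # total sum of squares
ess = sum((yh - y_bar) ** 2 for yh in Y_hat)         # explained sum of squares
rss = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # residual sum of squares

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")  # both 4.6667
print(f"R^2 = {ess / tss:.4f} = {1 - rss / tss:.4f}")   # 0.9643 either way
```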

Page 34: Simple Regression Model

Model 1: OLS, using observations 1899-1922 (T = 24)
Dependent variable: q

             coefficient    std. error    t-ratio    p-value
  -----------------------------------------------------------
  const      -38.7267       14.5994       -2.653     0.0145    **
  l            1.40367       0.0982155    14.29      1.29e-012 ***

Mean dependent var   165.9167    S.D. dependent var   43.75318
Sum squared resid    4281.287    S.E. of regression   13.95005
R-squared            0.902764    Adjusted R-squared   0.898344
F(1, 22)             204.2536    P-value(F)           1.29e-12
Log-likelihood      -96.26199    Akaike criterion     196.5240
Schwarz criterion    198.8801    Hannan-Quinn         197.1490
rho                  0.836471    Durbin-Watson        0.763565

INTERPRETATION OF A REGRESSION EQUATION

This is the output from a regression of output q on labor use l, using gretl. Note the R-squared of 0.9028.

Page 35: Simple Regression Model

BASIC (GAUSS–MARKOV) ASSUMPTIONS OF OLS

1. Zero systematic error: E(ui) = 0

2. Homoscedasticity: var(ui) = σ² for all i

3. No autocorrelation: cov(ui, uj) = 0 for all i ≠ j

4. X is non-stochastic

5. u ~ N(0, σ²)

Page 36: Simple Regression Model

BASIC ASSUMPTIONS OF OLS

Homoscedasticity: var(ui) = σ² for all i

[Figure: the PRF Yi = β1 + β2Xi with the probability density of ui drawn at X1, X2, Xi; each density has the same spread.]

Page 37: Simple Regression Model

BASIC ASSUMPTIONS OF OLS

Heteroscedasticity: var(ui) = σi²

[Figure: the PRF Yi = β1 + β2Xi with the probability density of ui drawn at X1, X2, Xi; the spread of the density changes with i.]

Page 38: Simple Regression Model

BASIC ASSUMPTIONS OF OLS

No autocorrelation: cov(ui, uj) = 0 for all i ≠ j

[Figure: panels (a), (b), and (c) plotting the disturbances ui, illustrating different autocorrelation patterns.]

Page 39: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

UNBIASEDNESS OF THE REGRESSION COEFFICIENTS

We saw in a previous slideshow that the slope coefficient may be decomposed into the true value and a weighted sum of the values of the disturbance term:

b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = β2 + Σaiui,  where  ai = (Xi − X̄) / Σj(Xj − X̄)²

Page 40: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

UNBIASEDNESS OF THE REGRESSION COEFFICIENTS

β2 is fixed so it is unaffected by taking expectations. The first expectation rule states that the expectation of a sum of several quantities is equal to the sum of their expectations:

E(b2) = E(β2 + Σaiui) = β2 + E(Σaiui)

E(Σaiui) = E(a1u1 + ... + anun) = E(a1u1) + ... + E(anun) = ΣE(aiui)

Page 41: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

UNBIASEDNESS OF THE REGRESSION COEFFICIENTS

Now for each i, E(aiui) = aiE(ui), since X is non-stochastic and so the ai are constants. With E(ui) = 0:

E(b2) = β2 + ΣaiE(ui) = β2

so b2 is an unbiased estimator of β2.
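A small Monte Carlo sketch (not from the slides) illustrating this result; the true values β1 = 3.0 and β2 = 0.8 echo the diagrams later in this deck, and the 20 fixed X values and Gaussian disturbances are illustrative choices:

```python
# Sketch: Monte Carlo illustration of E(b2) = beta2.  X is held fixed
# (nonstochastic) and u is drawn with E(u) = 0 in every replication.
import random

random.seed(1)
beta1, beta2 = 3.0, 0.8      # illustrative true parameters
X = list(range(1, 21))       # 20 fixed X values (illustrative design)
n, reps = len(X), 10_000
x_bar = sum(X) / n
sxx = sum((x - x_bar) ** 2 for x in X)

b2_total = 0.0
for _ in range(reps):
    Y = [beta1 + beta2 * x + random.gauss(0, 1) for x in X]
    y_bar = sum(Y) / n
    b2_total += sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx

print(b2_total / reps)       # close to 0.8: b2 is unbiased
```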

Page 42: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

Efficiency: the Gauss–Markov theorem states that, provided that the regression model assumptions are valid, the OLS estimators are BLUE: Best (minimum variance in the class of all linear unbiased estimators), Linear, Unbiased.

[Figure: the probability density function of b2 for OLS compared with that of another unbiased estimator with larger variance, both centred on β2.]

Page 43: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

In this sequence we will see that we can also obtain estimates of the standard deviations of the distributions. These will give some idea of their likely reliability and will provide a basis for tests of hypotheses.

[Figure: the probability density function of b2, centred on β2, with its standard deviation marked.]

Page 44: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

Expressions (which will not be derived) for the variances of the distributions of b1 and b2 are shown below:

σb1² = σu² · (1/n + X̄²/(n·MSD(X)))        σb2² = σu² / Σ(Xi − X̄)² = σu² / (n·MSD(X))

We will focus on the implications of the expression for the variance of b2. Looking at the numerator, we see that the variance of b2 is proportional to σu². This is as we would expect. The more noise there is in the model, the less precise will be our estimates.

Page 45: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

However the size of the sum of the squared deviations depends on two factors: the number of observations, and the size of the deviations of Xi around its sample mean. To discriminate between them, it is convenient to define the mean square deviation of X, MSD(X):

MSD(X) = (1/n) Σ(Xi − X̄)²

Page 46: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

This is illustrated by the two diagrams. The nonstochastic component of the relationship, Y = 3.0 + 0.8X, represented by the dotted line, is the same in both diagrams.

However, in the right-hand diagram the random numbers have been multiplied by a factor of 5. As a consequence, the regression line, the solid line, is a much poorer approximation to the nonstochastic relationship.

[Figure: two scatter diagrams over X from 0 to 20, each showing the dotted line Y = 3.0 + 0.8X and the solid fitted regression line; the right-hand panel has disturbances five times larger.]

Page 47: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

Looking at the denominator, the larger is the sum of the squared deviations of X, the smaller is the variance of b2:

σb2² = σu² / Σ(Xi − X̄)² = σu² / (n·MSD(X))

Page 48: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

A third implication of the expression is that the variance is inversely proportional to the mean square deviation of X:

σb2² = σu² / (n·MSD(X)),  where  MSD(X) = (1/n) Σ(Xi − X̄)²
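A simulation sketch (not from the slides) of this implication: two illustrative X designs with the same n and the same σu, where the simulated variance of b2 tracks σu²/Σ(Xi − X̄)² in each case:

```python
# Sketch: var(b2) is inversely proportional to MSD(X).  Two designs with
# the same n and the same sigma_u, but different spreads of X.
import random

def sim_var_b2(X, beta1=3.0, beta2=0.8, sigma_u=1.0, reps=20_000, seed=2):
    rng = random.Random(seed)
    n = len(X)
    x_bar = sum(X) / n
    sxx = sum((x - x_bar) ** 2 for x in X)
    draws = []
    for _ in range(reps):
        Y = [beta1 + beta2 * x + rng.gauss(0, sigma_u) for x in X]
        y_bar = sum(Y) / n
        draws.append(sum((x - x_bar) * (y - y_bar)
                         for x, y in zip(X, Y)) / sxx)
    mean = sum(draws) / reps
    simulated = sum((d - mean) ** 2 for d in draws) / reps
    return simulated, sigma_u ** 2 / sxx   # simulated vs theoretical variance

X_wide  = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]              # large MSD(X)
X_tight = [8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5]   # small MSD(X)
for X in (X_wide, X_tight):
    simulated, theoretical = sim_var_b2(X)
    print(f"simulated var(b2) = {simulated:.5f}, "
          f"theory sigma_u^2 / sum(X - Xbar)^2 = {theoretical:.5f}")
```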

Page 49: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

In the two diagrams, the nonstochastic component of the relationship is the same and the same random numbers have been used for the 20 values of the disturbance term.

[Figure: two scatter diagrams over X from 0 to 20 with the line Y = 3.0 + 0.8X.]

Page 50: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

However, MSD(X) is much smaller in the right-hand diagram because the values of X are much closer together.

[Figure: the same two scatter diagrams; in the right-hand panel the X values are bunched together.]

Page 51: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

Hence in that diagram the position of the regression line is more sensitive to the values of the disturbance term, and as a consequence the regression line is likely to be relatively inaccurate.

[Figure: the same two scatter diagrams with the line Y = 3.0 + 0.8X and the fitted lines.]

Page 52: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

σb1² = σu² · (1/n + X̄²/(n·MSD(X)))        σb2² = σu² / Σ(Xi − X̄)² = σu² / (n·MSD(X))

We cannot calculate the variances exactly because we do not know the variance of the disturbance term. However, we can derive an estimator of σu² from the residuals.

Page 53: Simple Regression Model

Simple regression model: Y = β1 + β2X + u

PRECISION OF THE REGRESSION COEFFICIENTS

Clearly the scatter of the residuals around the regression line will reflect the unseen scatter of u about the line Yi = β1 + β2Xi, although in general the residual and the value of the disturbance term in any given observation are not equal to one another. One measure of the scatter of the residuals is their mean square deviation, MSD(e), defined as shown:

MSD(e) = (1/n) Σ(ei − ē)² = (1/n) Σei²     (since ē = 0)
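For the earlier three-observation example, a small sketch (not from the slides) computing MSD(e); note that it understates σu² because the fitted line uses up two degrees of freedom, which is why regression software divides RSS by n − 2 instead (see the output on the next slide):

```python
# Sketch: MSD(e) for the three-observation example with fitted line
# Y_hat = 5/3 + 1.5*X.  The residuals sum to zero, so MSD(e) = RSS/n.
X = [1, 2, 3]
Y = [3, 5, 6]
b1, b2 = 5 / 3, 1.5
n = len(X)

e = [y - b1 - b2 * x for x, y in zip(X, Y)]   # [-1/6, 1/3, -1/6]
rss = sum(ei ** 2 for ei in e)                # 1/6

print(sum(e))          # ~0: the residuals sum to zero
print(rss / n)         # MSD(e) ~ 0.0556, biased downward
print(rss / (n - 2))   # ~0.1667: degrees-of-freedom-corrected estimator
```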

Page 54: Simple Regression Model

Model 1: OLS, using observations 1899-1922 (T = 24)
Dependent variable: q

             coefficient    std. error    t-ratio    p-value
  -----------------------------------------------------------
  const      -38.7267       14.5994       -2.653     0.0145    **
  l            1.40367       0.0982155    14.29      1.29e-012 ***

Mean dependent var   165.9167    S.D. dependent var   43.75318
Sum squared resid    4281.287    S.E. of regression   13.95005
R-squared            0.902764    Adjusted R-squared   0.898344
F(1, 22)             204.2536    P-value(F)           1.29e-12
Log-likelihood      -96.26199    Akaike criterion     196.5240
Schwarz criterion    198.8801    Hannan-Quinn         197.1490
rho                  0.836471    Durbin-Watson        0.763565

PRECISION OF THE REGRESSION COEFFICIENTS

The standard errors of the coefficients always appear as part of the output of a regression. The standard errors appear in a column to the right of the coefficients.
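Two of the reported statistics can be cross-checked from the others (a sketch, assuming gretl's "S.E. of regression" is √(RSS/(n − 2)) and using TSS = (n − 1)·SD(Y)²):

```python
# Sketch: verify two of gretl's summary statistics from the others.
import math

n = 24
rss = 4281.287            # "Sum squared resid"
sd_y = 43.75318           # "S.D. dependent var"

# S.E. of regression = sqrt(RSS / (n - 2)), the estimate of sigma_u
print(math.sqrt(rss / (n - 2)))   # 13.95005, as reported

# R^2 = 1 - RSS/TSS with TSS = (n - 1) * SD(Y)^2
tss = (n - 1) * sd_y ** 2
print(1 - rss / tss)              # 0.902764, as reported
```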

Page 55: Simple Regression Model

Summing up

Simple Linear Regression model: identify the dependent and independent variables, the parameters, and the error term.

Interpret the estimated parameters b1 and b2, as they show the relationship between X and Y.

OLS provides BLUE estimators of the parameters under the 5 Gauss–Markov assumptions.

What next: estimation of the multiple regression model.