Simple Linear Regression Model

Upload: medico-nol-delaphan

Post on 04-Apr-2018


TRANSCRIPT

  • 7/30/2019 Simple Linear Regression Model

    Copyright 1996 Lawrence C. Marsh

    3.1 Chapter 3: The Simple Linear Regression Model

    Copyright 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the copyright owner is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.

  • 3.2 Purpose of Regression Analysis

    1. Estimate a relationship among economic variables, such as y = f(x).

    2. Forecast or predict the value of one variable, y, based on the value of another variable, x.

  • 3.3 Weekly Food Expenditures

    y = dollars spent each week on food items.
    x = consumer's weekly income.

    The relationship between x and the expected value of y, given x, might be linear:

    E(y|x) = β₁ + β₂x

  • 3.4

    Figure 3.1a Probability distribution f(y|x=480) of food expenditures given income x = $480.

  • 3.5

    Figure 3.1b Probability distributions of food expenditures given income x = $480 and x = $800: f(y|x=480) and f(y|x=800).

  • 3.6

    Average expenditure: E(y|x) = β₁ + β₂x, with intercept β₁ and slope β₂ = ΔE(y|x)/Δx.

    Figure 3.2 The Economic Model: a linear relationship between average expenditure on food and income.

  • 3.7 Homoskedastic Case

    Figure 3.3 The probability density function for yₜ at two levels of household income, x₁ = 480 and x₂ = 800.

  • 3.8 Heteroskedastic Case

    Figure 3.3+ The variance of yₜ increases as household income, xₜ, increases.

  • 3.9 Assumptions of the Simple Linear Regression Model - I

    1. The average value of y, given x, is given by the linear regression:
       E(y) = β₁ + β₂x

    2. For each value of x, the values of y are distributed around their mean with variance:
       var(y) = σ²

    3. The values of y are uncorrelated, having zero covariance and thus no linear relationship:
       cov(yᵢ, yⱼ) = 0

    4. The variable x must take at least two different values, so that x ≠ c, where c is a constant.

  • 3.10

    One more assumption that is often used in practice but is not required for least squares:

    5. (optional) The values of y are normally distributed about their mean for each value of x:

       y ~ N[(β₁ + β₂x), σ²]

  • 3.11 The Error Term

    y is a random variable composed of two parts:

    I. Systematic component: E(y) = β₁ + β₂x
       This is the mean of y.

    II. Random component: e = y - E(y) = y - β₁ - β₂x
        This is called the random error.

    Together E(y) and e form the model: y = β₁ + β₂x + e

  • 3.12

    Figure 3.5 The relationship among y, e and the true regression line E(y) = β₁ + β₂x.

  • 3.13

    Figure 3.7a The relationship among y, the residuals ê and the fitted regression line ŷ = b₁ + b₂x.

  • 3.14

    Figure 3.7b The sum of squared residuals from any other line, y* = b₁* + b₂*x, will be larger.

  • 3.15

    Figure 3.4 Probability density functions for e (centered at 0) and y (centered at β₁ + β₂x).

  • 3.16 The Error Term Assumptions

    1. The value of y, for each value of x, is y = β₁ + β₂x + e.

    2. The average value of the random error e is: E(e) = 0

    3. The variance of the random error e is: var(e) = σ² = var(y)

    4. The covariance between any pair of e's is: cov(eᵢ, eⱼ) = cov(yᵢ, yⱼ) = 0

    5. x must take at least two different values so that x ≠ c, where c is a constant.

    6. (optional) e is normally distributed with mean 0 and var(e) = σ²: e ~ N(0, σ²)
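    These assumptions can be illustrated with a small simulation. A minimal sketch, assuming illustrative parameter values (β₁ = 4, β₂ = 1.5, σ = 1, chosen to echo the wage example later in the deck; they are not from the slides):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    beta1, beta2, sigma = 4.0, 1.5, 1.0             # illustrative true parameters
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0] * 200)   # x takes several distinct values (assumption 5)

    # e ~ N(0, sigma^2): mean zero, constant variance, independent draws
    # (assumptions 2, 3, 4 and the optional normality assumption 6)
    e = rng.normal(0.0, sigma, size=x.size)

    y = beta1 + beta2 * x + e                       # assumption 1: y = beta1 + beta2*x + e

    print(round(e.mean(), 2))   # sample mean of e, close to 0
    print(round(e.var(), 2))    # sample variance of e, close to sigma^2 = 1
    ```

    With 1,000 draws the sample moments sit close to their assumed population values, which is the point of assumptions 2 and 3.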


  • 3.17 Unobservable Nature of the Error Term

    1. Unspecified factors / explanatory variables, not in the model, may be in the error term.

    2. Approximation error is in the error term if the relationship between y and x is not exactly a perfectly linear relationship.

    3. Strictly unpredictable random behavior that may be unique to that observation is in the error.

  • 3.18

    Population regression values:
    yₜ = β₁ + β₂xₜ + eₜ

    Population regression line:
    E(yₜ|xₜ) = β₁ + β₂xₜ

    Sample regression values:
    yₜ = b₁ + b₂xₜ + êₜ

    Sample regression line:
    ŷₜ = b₁ + b₂xₜ

  • 3.19

    yₜ = β₁ + β₂xₜ + eₜ

    eₜ = yₜ - β₁ - β₂xₜ

    Minimize the error sum of squared deviations:

    S(β₁, β₂) = Σ (yₜ - β₁ - β₂xₜ)²,  summed over t = 1, …, T    (3.3.4)

  • 3.20 Minimize w.r.t. β₁ and β₂:

    S(β₁, β₂) = Σ (yₜ - β₁ - β₂xₜ)²    (3.3.4)

    ∂S(.)/∂β₁ = -2 Σ (yₜ - β₁ - β₂xₜ)

    ∂S(.)/∂β₂ = -2 Σ xₜ(yₜ - β₁ - β₂xₜ)

    Set each of these two derivatives equal to zero and solve these two equations for the two unknowns: β₁, β₂.

  • 3.21 Minimize w.r.t. β₁ and β₂:

    S(.) = Σ (yₜ - β₁ - β₂xₜ)²

    S(.) is a bowl-shaped function of each βᵢ: where ∂S(.)/∂βᵢ < 0 the function is falling, where ∂S(.)/∂βᵢ > 0 it is rising, and the minimum occurs where ∂S(.)/∂βᵢ = 0.

  • 3.22

    To minimize S(.), you set the two derivatives equal to zero to get:

    ∂S(.)/∂β₁ = -2 Σ (yₜ - b₁ - b₂xₜ) = 0

    ∂S(.)/∂β₂ = -2 Σ xₜ(yₜ - b₁ - b₂xₜ) = 0

    When these two terms are set to zero, β₁ and β₂ become b₁ and b₂ because they no longer represent just any values of β₁ and β₂ but the special values that correspond to the minimum of S(.).

  • 3.23

    -2 Σ (yₜ - b₁ - b₂xₜ) = 0

    -2 Σ xₜ(yₜ - b₁ - b₂xₜ) = 0

    Σyₜ - Tb₁ - b₂ Σxₜ = 0

    Σxₜyₜ - b₁ Σxₜ - b₂ Σxₜ² = 0

    Tb₁ + b₂ Σxₜ = Σyₜ

    b₁ Σxₜ + b₂ Σxₜ² = Σxₜyₜ

  • 3.24

    Tb₁ + b₂ Σxₜ = Σyₜ

    b₁ Σxₜ + b₂ Σxₜ² = Σxₜyₜ

    Solve for b₁ and b₂ using the definitions of x̄ and ȳ:

    b₂ = (T Σxₜyₜ - Σxₜ Σyₜ) / (T Σxₜ² - (Σxₜ)²)

    b₁ = ȳ - b₂x̄
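    These closed-form solutions translate directly into code. A minimal sketch, with made-up data that lie exactly on y = 4 + 1.5x so the answer is known:

    ```python
    # Least-squares slope and intercept from the normal-equation solution:
    #   b2 = (T*sum(x*y) - sum(x)*sum(y)) / (T*sum(x^2) - (sum(x))^2)
    #   b1 = ybar - b2*xbar
    x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical x values
    y = [5.5, 7.0, 8.5, 10.0, 11.5]    # generated as y = 4 + 1.5x

    T = len(x)
    sx  = sum(x)
    sy  = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)

    b2 = (T * sxy - sx * sy) / (T * sxx - sx * sx)
    b1 = sy / T - b2 * (sx / T)

    print(b1, b2)  # -> 4.0 1.5
    ```

    Since x must take at least two different values (assumption 4), the denominator T Σxₜ² - (Σxₜ)² is nonzero and the division is safe.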


  • 3.25 Elasticities

    η = (percentage change in y) / (percentage change in x) = (Δy/y) / (Δx/x) = (Δy/Δx)(x/y)

    Using calculus, we can get the elasticity at a point:

    η = lim(Δx→0) (Δy/Δx)(x/y) = (∂y/∂x)(x/y)

  • 3.26 Applying Elasticities

    E(y) = β₁ + β₂x

    ΔE(y)/Δx = β₂

    η = (ΔE(y)/Δx) · (x/E(y)) = β₂ · x/E(y)

  • 3.27 Estimating Elasticities

    η̂ = (Δy/Δx)(x̄/ȳ) = b₂ (x̄/ȳ)

    ŷₜ = b₁ + b₂xₜ = 4 + 1.5xₜ

    x̄ = 8 = average number of years of experience
    ȳ = $10 = average wage rate

    η̂ = b₂ (x̄/ȳ) = 1.5 × 8/10 = 1.2
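    The arithmetic on this slide can be checked in a few lines, using the slide's own numbers:

    ```python
    b2 = 1.5     # estimated slope from y-hat = 4 + 1.5x
    xbar = 8.0   # average years of experience
    ybar = 10.0  # average wage rate ($)

    # elasticity at the point of the means: eta-hat = b2 * (xbar / ybar)
    eta_hat = b2 * xbar / ybar
    print(eta_hat)  # -> 1.2
    ```

    So at the sample means, a 1% increase in experience is associated with roughly a 1.2% increase in the wage rate.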


  • 3.28 Prediction

    Estimated regression equation:

    ŷₜ = 4 + 1.5xₜ

    xₜ = years of experience
    ŷₜ = predicted wage rate

    If xₜ = 2 years, then ŷₜ = $7.00 per hour.
    If xₜ = 3 years, then ŷₜ = $8.50 per hour.
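    Prediction is just plugging an x value into the estimated equation. A sketch using the slide's fitted coefficients (the function name is ours, not the slides'):

    ```python
    def predict_wage(years: float) -> float:
        """Predicted hourly wage from the estimated equation y-hat = 4 + 1.5x."""
        b1, b2 = 4.0, 1.5
        return b1 + b2 * years

    print(predict_wage(2))  # -> 7.0
    print(predict_wage(3))  # -> 8.5
    ```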


  • 3.29 Log-Log Models

    ln(y) = β₁ + β₂ ln(x)

    ∂ln(y)/∂x = β₂ ∂ln(x)/∂x

    (1/y)(∂y/∂x) = β₂ (1/x)

  • 3.30

    (1/y)(∂y/∂x) = β₂ (1/x)

    ∂y/∂x = β₂ (y/x)

    Elasticity of y with respect to x:

    η = (∂y/∂x)(x/y) = β₂ (y/x)(x/y) = β₂
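    The constant-elasticity property can be verified numerically: generate data exactly on y = A·x^β₂, regress ln(y) on ln(x), and the fitted slope recovers β₂. A sketch with illustrative constants (A = 2 and β₂ = 0.75 are made up for the demonstration):

    ```python
    import math

    # In a log-log model ln(y) = beta1 + beta2*ln(x), the slope beta2 is the
    # (constant) elasticity of y with respect to x.
    A, beta2 = 2.0, 0.75
    xs = [1.0, 2.0, 4.0, 8.0]
    ys = [A * x ** beta2 for x in xs]   # exact data on y = A * x**beta2

    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]

    # least-squares slope of ln(y) on ln(x)
    T = len(lx)
    mx = sum(lx) / T
    my = sum(ly) / T
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))

    print(round(slope, 6))  # -> 0.75
    ```

    Unlike the linear model, where the elasticity β₂·x/E(y) changes along the line, here the estimated slope is the elasticity at every point.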