the ols equations give a nice, clear intuitive meaning about the influence of the variable x on the...

So

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \qquad \text{and} \qquad \hat{b}_1 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$$

are how the computer determines the size of the intercept and the slope respectively in an OLS regression

The OLS equations give a nice, clear intuitive meaning about the influence of the variable X on the size of the slope, since the slope formula shows that:

i) the greater the covariance between X and Y, the larger the (absolute value of) the slope

ii) the smaller the variance of X, the larger the (absolute value of) the slope
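As a rough illustration of the two formulas, the sketch below computes the intercept and slope by hand. The data are made up and the helper names (`mean`, `covariance`, `variance`, `ols_fit`) are hypothetical, chosen only for this example. It uses the divide-by-N versions of covariance and variance; dividing by N-1 instead would give the same slope, because the factor cancels in the ratio.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def covariance(x, y):
    """Average product of deviations from the means (divide-by-N version)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)

def variance(x):
    """Variance is the covariance of a variable with itself."""
    return covariance(x, x)

def ols_fit(x, y):
    """Return (b0, b1) for the fitted line Y = b0 + b1*X."""
    b1 = covariance(x, y) / variance(x)   # slope = Cov(X, Y) / Var(X)
    b0 = mean(y) - b1 * mean(x)           # intercept = Ybar - b1 * Xbar
    return b0, b1

# Made-up illustrative data
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(X, Y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

Statistical packages do the same calculation internally; the point here is only to show that nothing more than means, a covariance and a variance is needed.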

It is equally important to be able to interpret the effect of an estimated regression coefficient

Given OLS essentially passes a straight line through the data, then given

Y = b0 + b1X

$$\frac{dY}{dX} = b_1$$

So the OLS estimate of the slope will give an estimate of the unit change in the dependent variable y following a unit change in the level of the explanatory variable

(so you need to be aware of the units of measurement of your variables in order to be able to interpret what the OLS coefficient is telling you)
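To see why the units of measurement matter, the sketch below fits the same made-up data twice, once with X measured in hours and once in minutes. The variable names and numbers are hypothetical; only the rescaling is the point.

```python
def ols_slope(x, y):
    """Slope of the OLS line: Cov(X, Y) / Var(X) (the 1/N factors cancel)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

hours = [1, 2, 3, 4, 5]            # X in hours (hypothetical study time)
score = [52, 55, 61, 64, 70]       # Y in test points (hypothetical)
minutes = [60 * h for h in hours]  # same X, now measured in minutes

slope_per_hour = ols_slope(hours, score)
slope_per_minute = ols_slope(minutes, score)

print("points per extra hour:  ", slope_per_hour)
print("points per extra minute:", slope_per_minute)
```

The fitted relationship is identical in both cases; the slope is 60 times smaller in the second regression simply because a "unit change" in X is now one minute rather than one hour.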

PROPERTIES OF OLS

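Using the fact that for any individual observation, i, the OLS residual is the difference between the actual and predicted value

$$\hat{u}_i = Y_i - \hat{Y}_i$$

Sub. in

$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$$

So that

$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{b}_0 - \hat{b}_1 X_i$$

Summing over all N observations in the data set

$$\sum_{i=1}^{N} \hat{u}_i = \sum_{i=1}^{N} Y_i - N\hat{b}_0 - \hat{b}_1 \sum_{i=1}^{N} X_i$$

and dividing by N

$$\frac{1}{N}\sum_{i=1}^{N} \hat{u}_i = \frac{1}{N}\sum_{i=1}^{N} Y_i - \hat{b}_0 - \hat{b}_1 \frac{1}{N}\sum_{i=1}^{N} X_i$$

Since the sum of any series divided by the sample size gives the mean, can write

$$\bar{\hat{u}} = \bar{Y} - \hat{b}_0 - \hat{b}_1 \bar{X}$$

and since

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$$

$$\bar{\hat{u}} = \bar{Y} - (\bar{Y} - \hat{b}_1 \bar{X}) - \hat{b}_1 \bar{X} = \bar{Y} - \bar{Y} + \hat{b}_1 \bar{X} - \hat{b}_1 \bar{X} = 0$$

So the mean value of the OLS residuals is zero

(as any residual should be, since random and unpredictable by definition)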

The 2nd useful property of OLS is that

$$\bar{\hat{Y}} = \bar{Y}$$

the mean of the OLS predicted values equals the mean of the actual values in the data

(so OLS predicts average behaviour in the data set, another useful property)

Proof:

$$\hat{u}_i = Y_i - \hat{Y}_i$$

summing

$$\sum_{i=1}^{N} \hat{u}_i = \sum_{i=1}^{N} Y_i - \sum_{i=1}^{N} \hat{Y}_i$$

Dividing by N

$$\frac{1}{N}\sum_{i=1}^{N} \hat{u}_i = \frac{1}{N}\sum_{i=1}^{N} Y_i - \frac{1}{N}\sum_{i=1}^{N} \hat{Y}_i$$

$$\bar{\hat{u}} = \bar{Y} - \bar{\hat{Y}}$$

We know from above that $\bar{\hat{u}} = 0$

so $\bar{\hat{Y}} = \bar{Y}$
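The first two properties can be checked numerically. The sketch below is an illustration only: the data are made up and the helper names (`mean`, `ols_fit`) are hypothetical. Because of floating-point rounding, the computed quantities agree only up to a tiny error rather than exactly.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def ols_fit(x, y):
    """Return (b0, b1) using b1 = Cov(X,Y)/Var(X) and b0 = Ybar - b1*Xbar."""
    x_bar, y_bar = mean(x), mean(y)
    # The 1/N in the covariance and variance cancels, so plain sums suffice.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Made-up illustrative data
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(X, Y)
fitted = [b0 + b1 * xi for xi in X]                 # predicted values
residuals = [yi - fi for yi, fi in zip(Y, fitted)]  # actual minus predicted

print("mean residual:        ", mean(residuals))
print("mean of fitted values:", mean(fitted))
print("mean of actual values:", mean(Y))
```

Both results follow purely from the way the intercept is chosen, so they hold for any data set fitted with an intercept, not just this one.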

Story so far……

• Looked at idea behind OLS – fit a straight line thru the data that gives the “best fit”

• Do this by choosing a value for the intercept and the slope of the straight line that will minimise the sum of squared residuals

• OLS formula to do this given by

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \qquad \hat{b}_1 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$$

Things to do in Lecture 3

Useful algebra results underlying OLS

Invent a summary measure of how well regression fits the data ($R^2$)

Look at contribution of individual variables in explaining outcomes, (Bias, Efficiency)


Useful Algebraic aspects of OLS


OLS predicts average behaviour

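The 3rd useful result is that

$$\mathrm{Cov}(\hat{Y}, \hat{u}) = 0$$

the covariance between the fitted values of Y and the residuals must be zero.

Proof:

$$
\begin{aligned}
\mathrm{Cov}(\hat{Y}, \hat{u}) &= \mathrm{Cov}([\hat{b}_0 + \hat{b}_1 X], \hat{u}) \\
&= \mathrm{Cov}(\hat{b}_0, \hat{u}) + \mathrm{Cov}(\hat{b}_1 X, \hat{u}) \\
&= 0 + \hat{b}_1 \mathrm{Cov}(X, \hat{u}) \\
&= \hat{b}_1 \mathrm{Cov}(X, [Y - \hat{b}_0 - \hat{b}_1 X]) \\
&= \hat{b}_1 [\mathrm{Cov}(X, Y) - \mathrm{Cov}(X, \hat{b}_0) - \mathrm{Cov}(X, \hat{b}_1 X)] \\
&= \hat{b}_1 [\mathrm{Cov}(X, Y) - \hat{b}_1 \mathrm{Cov}(X, X)] \\
&= \hat{b}_1 \left[\mathrm{Cov}(X, Y) - \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} \mathrm{Var}(X)\right] \\
&= \hat{b}_1 [\mathrm{Cov}(X, Y) - \mathrm{Cov}(X, Y)] = 0
\end{aligned}
$$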
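A quick numerical check of the third result, again as a sketch on made-up data with hypothetical helper names (`mean`, `covariance`). The covariance between fitted values and residuals comes out as zero up to floating-point rounding, while the covariance between the fitted values and Y itself does not.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def covariance(x, y):
    """Average product of deviations from the means (divide-by-N version)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)

# Made-up illustrative data
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

# OLS estimates from the covariance formulas
b1 = covariance(X, Y) / covariance(X, X)
b0 = mean(Y) - b1 * mean(X)

fitted = [b0 + b1 * xi for xi in X]
residuals = [yi - fi for yi, fi in zip(Y, fitted)]

cov_fitted_resid = covariance(fitted, residuals)
cov_fitted_actual = covariance(fitted, Y)

print("Cov(fitted, residuals):", cov_fitted_resid)
print("Cov(fitted, Y):        ", cov_fitted_actual)
```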

GOODNESS OF FIT

Now we know how to summarise the relationships in the data using the OLS method, we next need a summary measure of “how well” the estimated OLS line fits the data

Think of the dispersion of all possible Y values (the variation in Y) being represented by a circle, and similarly the dispersion in the range of X values

[Figure: two circles, one labelled Y and one labelled X]

The more the circles overlap, the more the variation in the X data explains the variation in Y

[Figure: circles Y and X barely overlapping]

Little overlap in values, so X does not explain much of the variation in Y

[Figure: circles Y and X largely overlapping]

Large overlap in values, so the X variable explains much of the variation in Y

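To derive a statistical measure which does much the same thing remember that

$$\hat{u}_i = Y_i - \hat{Y}_i \quad \Rightarrow \quad Y_i = \hat{Y}_i + \hat{u}_i$$

Using the rules on covariances (see problem set 0) we know that

$$\mathrm{Var}(Y) = \mathrm{Var}(\hat{Y} + \hat{u}) = \mathrm{Var}(\hat{Y}) + \mathrm{Var}(\hat{u}) + 2\,\mathrm{Cov}(\hat{Y}, \hat{u})$$

and since the covariance between the fitted values and the residuals is zero

$$\mathrm{Var}(Y) = \mathrm{Var}(\hat{Y}) + \mathrm{Var}(\hat{u}) \qquad (1)$$

So the variation in the variable of interest, Var(Y), is explained by either the variation in the variables included in the OLS model, $\mathrm{Var}(\hat{Y})$, or by variation in the residual, $\mathrm{Var}(\hat{u})$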

So we use the ratio

$$R^2 = \frac{\mathrm{Var}(\hat{Y})}{\mathrm{Var}(Y)}$$

as a measure of how well the model fits the data. ($R^2$ is also known as the coefficient of determination)

So $R^2$ measures the proportion of variation in the dependent variable explained by the model.

If the model explains all the variation in Y then the ratio equals 1

So the closer the ratio is to one the better the fit.

It is more common however to use one further algebraic adjustment.

Given (1) says that

$$\mathrm{Var}(Y) = \mathrm{Var}(\hat{Y}) + \mathrm{Var}(\hat{u})$$

Can write this as

$$\frac{1}{N}\sum (Y_i - \bar{Y})^2 = \frac{1}{N}\sum (\hat{Y}_i - \bar{\hat{Y}})^2 + \frac{1}{N}\sum (\hat{u}_i - \bar{\hat{u}})^2$$

The 1/N is common to both sides, so can cancel out, and using the results that

$$\bar{\hat{Y}} = \bar{Y} \qquad \bar{\hat{u}} = 0$$

Then we have

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum \hat{u}_i^2$$

The left side of the equation is the sum of the squared deviations of Y about its sample mean.

This is called the Total Sum of Squares (TSS).

The right hand side consists of the sum of squared deviations of the predictions around the sample mean

(the Explained Sum of Squares, ESS)

and the Residual Sum of Squares (RSS)

$$TSS = ESS + RSS$$

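From this can have an alternative definition of the goodness of fit

$$R^2 = \frac{ESS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{RSS}{TSS}$$

where

$$TSS = \sum (Y_i - \bar{Y})^2 \qquad ESS = \sum (\hat{Y}_i - \bar{Y})^2 \qquad RSS = \sum \hat{u}_i^2$$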
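The decomposition and the two equivalent definitions of the goodness of fit can be verified on a small example. This is a sketch under stated assumptions: the data are made up and the variable names (`tss`, `ess`, `rss`, `r2_explained`, `r2_residual`) are hypothetical labels for the quantities defined above.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Made-up illustrative data
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

# OLS estimates: b1 = Cov(X,Y)/Var(X), b0 = Ybar - b1*Xbar
x_bar, y_bar = mean(X), mean(Y)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]

tss = sum((y - y_bar) ** 2 for y in Y)       # Total Sum of Squares
ess = sum((f - y_bar) ** 2 for f in fitted)  # Explained Sum of Squares
rss = sum(u ** 2 for u in residuals)         # Residual Sum of Squares

r2_explained = ess / tss      # R^2 = ESS / TSS
r2_residual = 1 - rss / tss   # R^2 = 1 - RSS / TSS

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
print(f"R^2 (ESS/TSS) = {r2_explained:.4f}, R^2 (1 - RSS/TSS) = {r2_residual:.4f}")
```

The two versions of the ratio agree only because TSS = ESS + RSS, which in turn relies on the regression including an intercept.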

Can see from above that it must hold that $0 \le R^2 \le 1$

when ESS = 0, then $R^2 = 0$ (and the model explains none of the variation in the dependent variable)

when ESS = TSS, then $R^2 = 1$ (and the model explains all of the variation in the dependent variable)

In general the $R^2$ lies between these two extremes.

You will find that for cross-section data (i.e. samples of individuals, firms etc.) the $R^2$ are typically in the region of 0.2

for time-series data (i.e. samples of aggregate (whole economy) data measured at different points in time) the $R^2$ are typically in the region of 0.9

Why?

Cross-section data sets typically are bigger and have a lot more variation, while time-series data sets are smaller and have less variation in the dependent variable (easier to fit a straight line if the data don't vary much)

While we would like the $R^2$ to be as high as possible, you can only compare $R^2$ in models with the SAME dependent (Y) variable, so can't say one model is a better fit than another if the dependent variables are different

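So the $R^2$ measures the proportion of variance in the dependent variable explained by the model

Another useful interpretation of the $R^2$ is that it equals the square of the correlation coefficient between the actual and predicted values of Y

Proof: We know the formula for the correlation coefficient

$$r_{Y,\hat{Y}} = \frac{\mathrm{Cov}(Y, \hat{Y})}{\sqrt{\mathrm{Var}(Y)}\sqrt{\mathrm{Var}(\hat{Y})}}$$

Sub. in for $Y = \hat{Y} + \hat{u}$

(actual value = predicted value plus residual)

$$r_{Y,\hat{Y}} = \frac{\mathrm{Cov}([\hat{Y} + \hat{u}], \hat{Y})}{\sqrt{\mathrm{Var}(Y)}\sqrt{\mathrm{Var}(\hat{Y})}}$$

Expand the covariance terms

$$= \frac{\mathrm{Cov}(\hat{Y}, \hat{Y}) + \mathrm{Cov}(\hat{u}, \hat{Y})}{\sqrt{\mathrm{Var}(Y)}\sqrt{\mathrm{Var}(\hat{Y})}} = \frac{\mathrm{Var}(\hat{Y})}{\sqrt{\mathrm{Var}(Y)}\sqrt{\mathrm{Var}(\hat{Y})}}$$

(since already proved $\mathrm{Cov}(\hat{Y}, \hat{u}) = 0$)

And can always write any variance term as the square root of its product with itself

$$= \frac{\sqrt{\mathrm{Var}(\hat{Y})\,\mathrm{Var}(\hat{Y})}}{\sqrt{\mathrm{Var}(Y)\,\mathrm{Var}(\hat{Y})}}$$

Cancelling terms

$$= \sqrt{\frac{\mathrm{Var}(\hat{Y})}{\mathrm{Var}(Y)}}$$

so

$$r_{Y,\hat{Y}} = \sqrt{R^2}$$

Thus the correlation coefficient between the actual and predicted values is the square root of $R^2$.

E.g. $R^2 = 0.25$ implies the correlation coefficient between Y and the predicted values = √0.25 = 0.5 (with a single X variable, the correlation between Y and X is also 0.5 in absolute value, and takes the sign of the slope)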
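The link between the correlation coefficient and the goodness of fit can also be checked by direct calculation. The sketch below uses made-up data and hypothetical helper names (`mean`, `covariance`, `correlation`); it compares the squared correlation between Y and the fitted values with the ratio Var(fitted)/Var(Y).

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def covariance(x, y):
    """Average product of deviations from the means (divide-by-N version)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)

def correlation(x, y):
    """Correlation coefficient: Cov(x, y) / (sd(x) * sd(y))."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Made-up illustrative data
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1 = covariance(X, Y) / covariance(X, X)
b0 = mean(Y) - b1 * mean(X)
fitted = [b0 + b1 * x for x in X]

r_squared = covariance(fitted, fitted) / covariance(Y, Y)  # Var(fitted)/Var(Y)
r_actual_fitted = correlation(Y, fitted)
r_actual_x = correlation(Y, X)

print(f"R^2                 = {r_squared:.4f}")
print(f"corr(Y, fitted)^2   = {r_actual_fitted ** 2:.4f}")
print(f"corr(Y, X)^2        = {r_actual_x ** 2:.4f}")
```

With one explanatory variable the fitted values are just a linear function of X, which is why the squared correlation between Y and X gives the same number.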
