Introductory Econometrics
ECON2206/ECON3209
Slides02
Lecturer: Minxian Yang
ie_Slides02 my, School of Economics, UNSW 1
2. Simple Regression Model (Ch2)
2. Simple Regression Model
• Lecture plan
– Motivation and definitions
– ZCM assumption
– Estimation method: OLS
– Units of measurement
– Nonlinear relationships
– Underlying assumptions of simple regression model
– Expected values and variances of OLS estimators
– Regression with STATA
• Motivation
– Example 1. Ceteris paribus effect of fertiliser on soybean yield
yield = β0 + β1ferti + u .
– Example 2. Ceteris paribus effect of education on wage
wage = β0 + β1educ + u .
– In general,
y = β0 + β1x + u,
where u represents factors other than x that affect y.
– We are interested in
• explaining y in terms of x,
• how y responds to changes in x,
holding other factors fixed.
• Simple regression model
– Definition
y = β0 + β1x + u ,
• y : dependent variable (observable)
• x : independent variable (observable)
• β1 : slope parameter, “partial effect,” (to be estimated)
• β0 : intercept parameter (to be estimated)
• u : error term or disturbance (unobservable)
– The disturbance u represents all factors other than x.
– With the intercept β0 in the model, the population average of u can
always be set to zero (without losing anything):
E(u) = 0 ,
since rewriting y = [β0 + E(u)] + β1x + [u − E(u)] absorbs any nonzero
mean of u into the intercept.
• Zero conditional mean assumption
– If other factors in u are held fixed (Δu = 0), the ceteris
paribus effect of x on y is β1 :
Δy = β1 Δx .
– But under what condition can u be held fixed while x
changes?
• As x and u are treated as random variables,
“u is fixed while x varies” is described as
“the mean of u for any given x is the same (zero)”.
– The required condition is
E(u | x) = E(u) = 0 ,
known as the zero-conditional-mean (ZCM) assumption.
(Notation: Δ = “change”. Subtracting y = β0 + β1x + u from
y + Δy = β0 + β1(x + Δx) + u + Δu gives Δy = β1Δx + Δu.)
(ZCM in tabular form: for X = X1, X2, X3, ..., E(u | X) = 0, 0, 0, ...)
• Zero conditional mean assumption
– Example 2. wage = β0 + β1educ + u
Suppose u represents ability.
Then ZCM assumption amounts to
E(ability | educ) = 0 ,
ie, the average ability is the same irrespective of the
years of education.
This is not true
• if people choose the education level to suit their ability;
• or if more ability is associated with less (or more)
education.
In practice, we do not know whether ZCM holds and have to
deal with this issue.
• Zero conditional mean assumption
– Taking the conditional expectation of
y = β0 + β1x + u
for given x, ZCM implies
E(y | x) = β0 + β1x ,
known as the population regression function
(PRF), which is a linear function of x.
– The distribution of y is centred about E(y | x).
Systematic part of y : E(y | x).
Unsystematic part of y : u.
• Simple regression model
yi = β0 + β1xi + ui
[Figure: for each value x = x1, x2, x3, the conditional distribution of y
(equivalently, of u) is centred at the conditional mean
E(y | x = xj) = β0 + β1xj, which traces out the population regression line.]
• Observations on (x, y)
– A random sample is a set of independent
observations on (x, y), ie, {(xi , yi), i = 1,2,...,n}.
– At observation level, the model may be written as
yi = β0 + β1xi + ui , i = 1, 2, ..., n
where i is the observation index.
– Collectively, the n equations
y1 = β0 + β1x1 + u1 ,
y2 = β0 + β1x2 + u2 ,
...
yn = β0 + β1xn + un
can be stacked as
(y1, ..., yn)′ = β0(1, ..., 1)′ + β1(x1, ..., xn)′ + (u1, ..., un)′.
– Matrix notation: Y = Xβ + U, where Y = (y1, ..., yn)′,
U = (u1, ..., un)′, β = (β0, β1)′, and X is the n×2 matrix
whose i-th row is (1, xi).
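As a quick illustration, the stacked system Y = Xβ + U can be built directly; the sketch below (Python with NumPy, made-up values for x, β and u) checks that each row of the matrix form reproduces the scalar equation yi = β0 + β1xi + ui.

```python
import numpy as np

# Made-up sample of n = 4 observations (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0])
u = np.array([0.1, -0.2, 0.0, 0.1])   # illustrative disturbances
beta = np.array([0.5, 2.0])           # (beta0, beta1)'

# X is the n-by-2 matrix whose i-th row is (1, x_i).
X = np.column_stack([np.ones_like(x), x])

# Stacked system Y = X beta + U.
Y = X @ beta + u

# Each row reproduces y_i = beta0 + beta1*x_i + u_i.
print(Y - (0.5 + 2.0 * x + u))   # all zeros
```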
• Estimate simple regression
– The model:
yi = β0 + β1xi + ui , i = 1, 2, ..., n
– Let (β̂0, β̂1) be the estimates of (β0, β1).
– The corresponding residual is
ûi = yi − β̂0 − β̂1xi ,  i = 1, 2, ..., n.
– The sum of squared residuals (SSR),
SSR = Σ_{i=1}^n ûi² = Σ_{i=1}^n (yi − β̂0 − β̂1xi)² ,
indicates the goodness of the estimates.
– Good estimates should make SSR small.
• Ordinary least squares (OLS)
– The OLS estimates (β̂0, β̂1) are the minimiser of the SSR.
– Choose (β̂0, β̂1) to minimise SSR.
The first order conditions lead to
Σ_{i=1}^n (yi − β̂0 − β̂1xi) = 0 ,   (mean residual = 0)
Σ_{i=1}^n xi(yi − β̂0 − β̂1xi) = 0 .   (covariance of residual and x = 0)
• Ordinary least squares (OLS)
– Solving the two equations with two unknowns gives
β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)² ,   β̂0 = ȳ − β̂1x̄ ,
where
ȳ = n⁻¹ Σ_{i=1}^n yi ,   x̄ = n⁻¹ Σ_{i=1}^n xi .
– OLS requires the condition
Σ_{i=1}^n (xi − x̄)² > 0 .
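A minimal sketch of these closed-form OLS formulas (Python with NumPy, made-up data):

```python
import numpy as np

# Made-up data for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 2.5, 3.5, 5.5, 6.0])

xbar, ybar = x.mean(), y.mean()

# beta1_hat = sum (x_i - xbar)(y_i - ybar) / sum (x_i - xbar)^2
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
# beta0_hat = ybar - beta1_hat * xbar
beta0_hat = ybar - beta1_hat * xbar

print(beta0_hat, beta1_hat)
```

The same numbers come out of any standard least-squares routine (e.g. np.polyfit(x, y, 1)), since the formulas are exact.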
• OLS regression line or SRF
– For any set of data {(xi , yi), i = 1,2,...,n} with n > 2,
OLS can always be carried out as long as
Σ_{i=1}^n (xi − x̄)² > 0 .
– Once OLS estimates are obtained,
ŷi = β̂0 + β̂1xi
is known as the fitted value of y when x = xi.
– By OLS regression line or sample regression
function (SRF), we refer to
ŷ = β̂0 + β̂1x ,
which is an estimate of the PRF E(y | x) = β0 + β1x.
• Interpretation of OLS estimate
– In the SRF
ŷ = β̂0 + β̂1x ,
the slope estimate β̂1 is the change in ŷ when x
increases by one unit:
β̂1 = Δŷ/Δx ,
which is of primary interest in practice.
– The dependent variable y may be decomposed either
as the sum of the SRF and the residual,
y = ŷ + û ,
or as the sum of the PRF and the disturbance,
y = E(y | x) + u .
• PRF versus SRF
– Hope: SRF = PRF “on average” or “when n goes to infinity”.
[Figure: the population regression line β0 + β1x and the sample
regression line β̂0 + β̂1x, with a data point (xi, yi), its disturbance ui
about the PRF (yi = β0 + β1xi + ui) and its residual about the SRF.]
• OLS example
– Example 2. (regress wage educ)
• Population : workforce in 1976
• y = wage : hourly earnings (in $)
• x = educ : years of education
• OLS SRF : n = 526
• Interpretation
– Slope 0.54 : each additional year of schooling increases
the wage by $0.54.
– Intercept -0.90 : “fitted wage of a person with educ = 0”?
SRF does poorly at low levels of education.
• Predicted wage for a person with educ = 10?
(OLS SRF: ŵage = −0.90 + 0.54 educ .)
[Figure: scatter plot of wage (0–25) against educ (0–15) with the fitted
OLS line.]
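For the prediction question, plug educ = 10 into the SRF; a one-line check (Python, coefficients taken from the slide):

```python
# Fitted SRF from the slide: wage_hat = -0.90 + 0.54 * educ
def wage_hat(educ):
    """Predicted hourly wage (in $) at a given years of education."""
    return -0.90 + 0.54 * educ

print(wage_hat(10))   # about $4.50 per hour
```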
• Properties of OLS
– The first order conditions,
Σ_{i=1}^n (yi − β̂0 − β̂1xi) = 0 ,   Σ_{i=1}^n xi(yi − β̂0 − β̂1xi) = 0 ,
imply that
• the sum of residuals is zero;
• the sample covariance of x and the residual is zero;
• the mean point (x̄, ȳ) is always on the SRF (or OLS
regression line).
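These three properties hold exactly (up to rounding) for any OLS fit; a sketch that verifies them on made-up data (Python with NumPy):

```python
import numpy as np

# Made-up data; OLS via the closed-form formulas.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.5, 3.5, 3.0, 5.0])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x

print(resid.sum())                       # ~0: residuals sum to zero
print(np.sum(x * resid))                 # ~0: residuals uncorrelated with x
print(y.mean() - (b0 + b1 * x.mean()))   # ~0: (xbar, ybar) is on the SRF
```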
• Sums of squares
– Each yi may be decomposed into
yi = ŷi + ûi .
– Measure variations from ȳ:
• Total sum of squares (total variation in yi):
SST = Σ_{i=1}^n (yi − ȳ)² ;
• Explained sum of squares (variation in ŷi):
SSE = Σ_{i=1}^n (ŷi − ȳ)² ;
• sum of squared Residuals (variation in ûi):
SSR = Σ_{i=1}^n ûi² .
• It can be shown that SST = SSE + SSR .
• R-squared: a goodness-of-fit measure
– How well does x explain y?
or how well does the OLS regression line fit data?
– We may use the fraction of the variation in y that is
explained by x (or by the SRF) as the measure.
– R-squared (coefficient of determination):
R² = SSE/SST = 1 − SSR/SST .
• larger R², better fit;
• 0 ≤ R² ≤ 1.
eg. R² = 0.165 for ŵage = −0.90 + 0.54 educ :
16.5% of the variation in wage is explained by educ.
(It is not advisable to put too much weight on R² when
evaluating regression models.)
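The decomposition SST = SSE + SSR and the two expressions for R² can be checked numerically; a sketch with made-up data (Python with NumPy):

```python
import numpy as np

# Made-up data; OLS fit, then the three sums of squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.7])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)      # total variation in y
SSE = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the SRF
SSR = np.sum((y - y_hat) ** 2)         # residual variation

R2 = SSE / SST
print(SST - (SSE + SSR))   # ~0: SST = SSE + SSR
print(R2, 1 - SSR / SST)   # two equal expressions for R-squared
```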
• Effects of changing units of measurement
– If y is multiplied by a constant c, then the OLS
intercept and slope estimates are also multiplied by c.
– If x is multiplied by a constant c, then the OLS
intercept estimate is unchanged but the slope
estimate is multiplied by 1/c.
– The R2 does not change when varying the units of
measurement.
eg. When wage is in dollars,  ŵage = −0.90 + 0.54 educ .
If wage is in cents,  ŵage = −90 + 54 educ .
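The three unit-of-measurement facts can be demonstrated by rescaling made-up data (Python with NumPy; the ols helper is defined just for this sketch):

```python
import numpy as np

def ols(x, y):
    """Simple OLS: returns (intercept, slope, R-squared)."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    r2 = 1 - np.sum((y - b0 - b1 * x) ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9])

b0, b1, r2 = ols(x, y)
b0c, b1c, r2c = ols(x, 100 * y)   # rescale y (e.g. dollars -> cents)
b0x, b1x, r2x = ols(100 * x, y)   # rescale x

print(b0c / b0, b1c / b1)         # both ~100: estimates scale with y
print(b0x - b0, 100 * b1x - b1)   # ~0: intercept unchanged, slope / 100
print(r2, r2c, r2x)               # R-squared unchanged in all cases
```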
• Nonlinear relationships between x and y
– The OLS only requires the regression model
y = β0 + β1x + u
to be linear in parameters.
– Nonlinear relationships between y and x can be easily
accommodated.
eg. Suppose a better description
is that each year of education
increases wage by a fixed
percentage. This leads to
log(wage) = β0 + β1 educ + u ,
with %Δwage = (100β1)Δeduc
when Δu= 0.
OLS:  fitted log(wage) = 0.584 + 0.083 educ ,  R² = 0.186 .
[Figure: scatter plot of log(wage) (0–3) against educ (0–15) with the
fitted OLS line.]
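The log-level interpretation can be illustrated with simulated data in which wage grows by a fixed percentage per year of education (Python with NumPy; the parameter values are made up, not estimates):

```python
import numpy as np

# Hypothetical population: wage = exp(0.5 + 0.08*educ), i.e. each year of
# education raises wage by roughly 8% (exactly 0.08 in logs).
educ = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
wage = np.exp(0.5 + 0.08 * educ)

# Regress log(wage) on educ: the slope recovers the proportional effect.
lwage = np.log(wage)
b1 = (np.sum((educ - educ.mean()) * (lwage - lwage.mean()))
      / np.sum((educ - educ.mean()) ** 2))

print(100 * b1)   # ~8: % change in wage per extra year of education
```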
• Nonlinear relationships between x and y
– Linear models are linear in parameters.
– OLS applies to linear models no matter how x and y
are defined.
– But be careful about the interpretation of β.
• OLS estimators
– A random sample, containing independent draws
from the same population, is random.
• A data set is a realisation of the random sample.
– The OLS “estimates” (β̂0, β̂1) computed from a random
sample are random, and are called the OLS estimators.
– To make inferences about the population parameters
(β0, β1), we need to understand the statistical
properties of the OLS estimators.
– In particular, we would like to know the means and
variances of the OLS estimators.
– We find these under a set of assumptions about the
simple regression model.
• Assumptions about the simple regression model (SLR1 to SLR4)
1. (linear in parameters) In the population model, y is
related to x by y = β0 + β1 x + u, where (β0, β1) are
population parameters and u is disturbance.
2. (random sample) {(xi , yi), i = 1,2,...,n} with n > 2 is a
random sample drawn from the population model.
3. (sample variation) The sample outcomes on x are
not of the same value.
4. (zero conditional mean) The disturbance u satisfies
E(u | x) = 0 for any given value of x. For the random
sample, E(ui | xi) = 0 for i = 1,2,...,n.
• Property 1 of OLS estimators
Theorem 2.1
Under SLR1 to SLR4, the OLS estimators (β̂0, β̂1) are
unbiased:
E(β̂1) = β1 ,   E(β̂0) = β0 .
Unbiased estimators
– are “centred” around (β0, β1);
– correctly estimate (β0, β1) on average.
It is useful to note that
yi − ȳ = β1(xi − x̄) + (ui − ū) ,
β̂1 = β1 + Σ_{i=1}^n (xi − x̄)(ui − ū) / Σ_{i=1}^n (xi − x̄)² .
The estimation error β̂1 − β1 is entirely driven by a linear
combination of the ui with weights dependent on x.
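Theorem 2.1 can be illustrated by simulation: when the data are generated so that ZCM holds, the average of β̂1 across many random samples is close to β1. A sketch (Python with NumPy; all parameter values are assumed for the simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 2.0        # assumed population parameters
n, reps = 50, 2000             # sample size and number of simulated samples

b1_draws = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0.0, 10.0, size=n)
    u = rng.normal(0.0, 1.0, size=n)   # E(u | x) = 0 by construction
    y = beta0 + beta1 * x + u
    b1_draws[r] = (np.sum((x - x.mean()) * (y - y.mean()))
                   / np.sum((x - x.mean()) ** 2))

print(b1_draws.mean())   # close to beta1 = 2, as unbiasedness predicts
```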
• Property 2 of OLS estimators
5. (SLR5, homoskedasticity)
Var(ui|xi) = σ2 for i = 1,2,...,n. (It implies Var(ui) = σ2.)
Theorem 2.2
Under SLR1 to SLR5, the variances of (β̂0, β̂1) are:
Var(β̂1) = σ² / Σ_{i=1}^n (xi − x̄)² ,
Var(β̂0) = σ² (n⁻¹ Σ_{i=1}^n xi²) / Σ_{i=1}^n (xi − x̄)² .
– the larger is σ², the greater are the variances.
– the larger the variation in x, the smaller the variances.
(Strictly, Theorem 2.2 is about the variances of the OLS
estimators, conditional on the given x.)
• Homoskedasticity and heteroskedasticity
• Estimation of σ²
– As the residual ûi approximates ui, the estimator of σ² is
σ̂² = SSR/(n − 2) = Σ_{i=1}^n ûi² / (n − 2) ,
where “2” is the number of estimated coefficients.
– σ̂ = √σ̂² is known as the standard error of the
regression, useful in forming the standard errors of
(β̂0, β̂1).
Theorem 2.3 (unbiased estimator of σ²)
Under SLR1 to SLR5,  E(σ̂²) = σ² .
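A sketch of computing σ̂² = SSR/(n − 2) and the standard error of the regression on simulated data (Python with NumPy; the true σ is assumed so the estimate can be compared with it):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200, 1.5                  # assumed sample size and true sd of u
x = rng.uniform(0.0, 5.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, size=n)

# OLS fit and residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
SSR = np.sum((y - b0 - b1 * x) ** 2)

sigma2_hat = SSR / (n - 2)       # divide by n - 2: two estimated coefficients
sigma_hat = np.sqrt(sigma2_hat)  # standard error of the regression
print(sigma_hat)                 # should be close to the true sigma = 1.5
```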
• OLS in STATA
[STATA output of the OLS regression (regress wage educ), with
annotations pointing to the SSR and the standard error of the
regression.]
• Summary
– What is a simple regression model?
– What is the ZCM assumption? Why is it crucial for model interpretation and OLS being unbiased?
– What is the OLS estimation principle?
– What are PRF, SRF, error term and residual?
– How is R-squared related to SSR?
– Can we describe, in a simple linear regression model, the nonlinear relationship between x and y?
– What are Assumptions SLR1 to SLR5? Why do we need to understand them?
– What are the statistical properties of OLS estimators?
– How do you run OLS in STATA? regress y x