chapter 3: the multiple linear regression model€¦ · · 2013-12-19ruud p., (2000) an...

Chapter 3: The Multiple Linear Regression ModelAdvanced Econometrics - HEC Lausanne

Christophe Hurlin

University of Orléans

November 23, 2013

Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 1 / 174

Section 1

Introduction


1. Introduction

The objectives of this chapter are the following:

1 Dene the multiple linear regression model.

2 Introduce the ordinary least squares (OLS) estimator.


1. Introduction

The outline of this chapter is the following:

Section 2: The multiple linear regression model

Section 3: The ordinary least squares estimator

Section 4: Statistical properties of the OLS estimator

Subsection 4.1: Finite sample properties

Subsection 4:2: Asymptotic properties


1. Introduction

References

Amemiya T. (1985), Advanced Econometrics. Harvard University Press.

Greene W. (2007), Econometric Analysis, sixth edition, Pearson - PrenticeHil (recommended)

Pelgrin, F. (2010), Lecture notes Advanced Econometrics, HEC Lausanne (aspecial thank)

Ruud P., (2000) An introduction to Classical Econometric Theory, OxfordUniversity Press.


1. Introduction

Notations: In this chapter, I will (try to...) follow some conventions ofnotation.

fY (y) probability density or mass function

FY (y) cumulative distribution function

Pr () probability

y vector

Y matrix

Be careful: in this chapter, I dont distinguish between a random vector(matrix) and a vector (matrix) of deterministic elements. For moreappropriate notations, see:

Abadir and Magnus (2002), Notation in econometrics: a proposal for astandard, Econometrics Journal.


Section 2

The Multiple Linear Regression Model


2. The Multiple Linear Regression Model

Objectives

1 Dene the concept of Multiple linear regression model.

2 Semi-parametric and Parametric multiple linear regression model.

3 The multiple linear Gaussian model.



Denition (Multiple linear regression model)The multiple linear regression model is used to study the relationshipbetween a dependent variable and one or more independent variables. Thegeneric form of the linear regression model is

y = x1β1 + x2β2 + ..+ xK βK + ε

where y is the dependent or explained variable and x1, .., xK are theindependent or explanatory variables.



Notations

1 y is the dependent variable, the regressand or the explainedvariable.

2 xj is an explanatory variable, a regressor or a covariate.

3 ε is the error term or disturbance.

IMPORTANT: do not use the term "residual"..



Notations (contd)

The term ε is a random disturbance, so named because it disturbsanotherwise stable relationship. The disturbance arises for several reasons:

1 Primarily because we cannot hope to capture every inuence on aneconomic variable in a model, no matter how elaborate. The nete¤ect, which can be positive or negative, of these omitted factors iscaptured in the disturbance.

2 There are many other contributors to the disturbance in an empiricalmodel. Probably the most signicant is errors of measurement. It iseasy to theorize about the relationships among precisely denedvariables; it is quite another to obtain accurate measures of thesevariables.



Notations (contd)

We assume that each observation in a sample fyi , xi1, xi2..xiK g fori = 1, ..,N is generated by an underlying process described by

yi = xi1β1 + xi2β2 + ..+ xiK βK + εi

Remark:

xik = value of the kth explanatory variable for the i th unit of the sample

xunit,variable



Notations (contd)

Let the N 1 column vector xk be the N observations on variable xk ,for k = 1, ..,K .

Let assemble these data in an N K data matrix, X.

Let y be the N 1 column vector of the N observations, y1,y2, .., yN .

Let ε be the N 1 column vector containing the N disturbances.



Notations (contd)

yN1

=

0BBBBBB@

y1y2..yi..yN

1CCCCCCA xkN1

=

0BBBBBB@

x1kx2k..xik..xNk

1CCCCCCA εN1

=

0BBBBBB@

ε1ε2..εi..εN

1CCCCCCA βK1

=

0BB@β1β2..

βK

1CCA



Notations (contd)

XNK

= (x1 : x2 : .. : xK )

or equivalently

XNK

=

0BBBBBB@

x11 x12 .. x1k .. x1Kx21 x22 .. x2k .. x2K.. .. .. .. .. ..xi1 xi2 .. xik .. xiK.. .. .. .. ..xN1 xN2 .. xNk .. xNK

1CCCCCCA



FactIn most of cases, the rst column of X is assumed to be a column of 1s sothat β1 is the constant term in the model.

x1N1

= 1N1

XNK

=

0BBBBBB@

1 x12 .. x1k .. x1K1 x22 .. x2k .. x2K.. .. .. .. .. ..1 xi2 .. xik .. xiK.. .. .. .. ..1 xN2 .. xNk .. xNK

1CCCCCCA



Remark

More generally, the matrix X may as well contain stochastic and nonstochastic elements such as:

Constant;

Time trend;

Dummy variables (for specic episodes in time);

Etc.

Therefore, X is generally a mixture of xed and random variables.



Denition (Simple linear regression model)The simple linear regression model is a model with only one stochasticregressor: K = 1 if there is no constant

yi = β1xi + εi

or K = 2 if there is a constant:

yi = β1 + β2xi2 + εi

for i = 1, ..,N, ory = β1 + β2x2 + ε



Denition (Multiple linear regression model)The multiple linear regression model can be written

yN1

= XNK

βK1

+ εN1



One key di¤erence for the specication of the MLRM:

Parametric/semi-parametric specication

Parametric model: the distribution of the error terms is fullycharacterized, e.g. ε N (0,Ω)

Semi-Parametric specication: only a few moments of the error termsare specied, e.g. E (ε) = 0 and V (ε) = E

εε>

= Ω.



This di¤erence does not matter for the derivation of the ordinary leastsquare estimator

But this di¤erence matters for (among others):

1 The characterization of the statistical properties of the OLS estimator(e.g., e¢ ciency);

2 The choice of alternative estimators (e.g., the maximum likelihoodestimator);

3 Etc.



Denition (Semi-parametric multiple linear regression model)The semi-parametric multiple linear regression model is dened by

y = Xβ+ ε

where the error term ε satises

E (εjX) = 0N1

V (εjX) = σ2 INNN

and IN is the identity matrix of order N.



Remarks

1 If the matrix X is non stochastic (xed), i.e. there are only xedregressors, then the conditions on the error term u read:

E (ε) = 0

V (ε) = σ2IN2 If the (conditional) variance covariance matrix of ε is not diagonal,i.e. if

V (εjX) = Ω

the model is called the Multiple Generalized Linear RegressionModel



Remarks (contd)

The two conditions on the error term ε

E (εjX) = 0N1

V (εjX) = σ2IN

are equivalent toE (yjX) = Xβ

V (yjX) = σ2IN



Denition (The multiple linear Gaussian model)

The (parametric) multiple linear Gaussian model is dened by

y = Xβ+ ε

where the error term ε is normally distributed

ε N0, σ2IN

As a consequence, the vector y has a conditional normal distribution with

yjX NXβ, σ2IN



Remarks

1 The multiple linear Gaussian model is (by denition) a parametricmodel.

2 If the matrix X is non stochastic (xed), i.e. there are only xedregressors, then the vector y has marginal normal distribution:

y NXβ, σ2IN



The classical linear regression model consists of a set of assumptions thatdescribes how the data set is produced by a data generating process (DGP)

Assumption 1: Linearity

Assumption 2: Full rank condition or identication

Assumption 3: Exogeneity

Assumption 4: Spherical error terms

Assumption 5: Data generation

Assumption 6: Normal distribution



Denition (Assumption 1: Linearity)

The model is linear with respect to the parameters β1, .., βK .


Remarks

The model species a linear relationship between the dependentvariable and the regressors. For instance, the models

y = β0 + β1x + u

y = β0 + β1 cos (x) + v

y = β0 + β1 1x+ w

are all linear (with respect to β).

In contrast, the model y = β0 + β1xβ2 + ε is non linear



Remark

The model can be linear after some transformations. Starting fromy = Ax β exp (ε) , one has a log-linear specication:

ln (y) = ln (A) + β ln (x) + ε



Denition (Log-linear model)The loglinear model is

ln (yi ) = β1 ln (xi1) + β2 ln (xi2) + ..+ βK ln (xiK ) + εi

This equation is also known as the constant elasticity form as in thisequation, the elasticity of y with respect to changes in x does not varywith xik :

βk =∂ ln (yi )∂ ln (xik )

=∂yi∂xik

xikyi



Denition (Assumption 2: Full column rank)X is an N K matrix with rank K .



Interpretation

1 There is no exact relationship among any of the independent variablesin the model.

2 The columns of X are linearly independent.



ExampleSuppose that a cross-section model satises:

yi = β0 + β1non labor incomei + β2salaryi+β3total incomei + εi

The identication condition does not hold since total income is exactlyequal to salary plus non labor income (exact linear dependency in themodel).



Remarks

1 Perfect multi-collinearity is generally not di¢ cult to spot and issignalled by most statistical software.

2 Imperfect multi-collinearity is a more serious issue (see further).



Denition (Identication)The multiple linear regression model is identiable if and only if one thefollowing equivalent assertions holds:

(i) rank (X) = K

(ii) The matrix X>X is invertible

(iii) The columns of X form a basis of L (X)(iv) Xβ1 = Xβ2 =) β1 = β2 8 (β1, β2) 2 RK RK

(v) Xβ = 0 =) β = 0 8β 2 RK

(vi) ker (X) = f0g



Denition (Assumption 3: Strict exogeneity of the regressors)The regressors are exogenous in the sense that:

E (εjX) = 0N1

or equivalently for all the units i 2 f1, ..Ng

E ( εi jX) = 0

or equivalentlyE ( εi j xjk ) = 0

for any explanatory variable k 2 f1, ..Kg and any unit j 2 f1, ..Ng .



CommentsComments

1 The expected value of the error term at observation i (in the sample)is not a function of the independent variables observed at anyobservation (including the i th observation). The independent variablesare not predictors of the error terms.

2 The strict exogeneity condition can be rewritten as:

E (y j X) = Xβ

3 If the regressors are xed, this condition can be rewritten as:

E (ε) = 0N1



Implications

The (strict) exogeneity condition E (εjX) = 0N1 has two implications:1 The zero conditional mean of ε implies that the unconditional meanof u is also zero (the reverse is not true):

E (ε) = EX (E (εjX)) = EX (0) = 0

2 The zero conditional mean of ε implies that (the reverse is not true):

E (εixjk ) = 0 8i , j , k

orCov (εi ,X) = 0 8i



Denition (Assumption 4: Spherical disturbances)The error terms are such that:

V ( εi jX) = E

ε2iX = σ2 for all i 2 f1, ..Ng

andCov ( εi , εj jX) = E ( εi εj jX) = 0 for all i 6= j

The condition of constant variances is called homoscedasticity. Theuncorrelatedness across observations is called nonautocorrelation.



Comments

1 Spherical disturbances = homoscedasticity + nonautocorrelation

2 If the errors are not spherical, we call them nonspherical disturbances.

3 The assumption of homoscedasticity is a strong one: this is theexception rather than the rule!



Comments

Let us consider the (conditional) variance covariance matrix of the errorterms:

V (εjX)| z NN

= E

εε>X| z

NN

=

0BBBBBB@

E

ε21X E ( ε1ε2jX) .. E ( ε1εj jX) .. E ( ε1εN jX)

E ( ε2ε1jX) E

ε22X .. E ( ε2εj jX) .. E ( ε2εN jX)

.. .. .. .. ..E ( εi ε1jX) .. E ( εi εj jX) .. E ( εi εN jX)

.. .. .. .. ..E ( εN ε1jX) .. E ( εN εj jX) .. E

ε2NX

1CCCCCCA



Comments

The two assumptions (homoscedasticity and nonautocorrelation) implythat:

V (εjX)| z NN

= E

εε>X| z

NN

= σ2 IN

=

0BBBBBB@

σ2 0 .. 0 .. 00 σ2 .. 0 .. 0.. .. .. .. .... .. .. .. 0.. .. .. .. ..0 .. 0 .. σ2

1CCCCCCA



Denition (Assumption 5: Data generation)

The data in (xi1 xi2 ...xiK ) may be any mixture of constants and randomvariables.



Comments

1 Analysis will be done conditionally on the observed X, so whether theelements in X are xed constants or random draws from a stochasticprocess will not inuence the results.

2 In the case of stochastic regressors, the unconditional statisticalproperties of are obtained in two steps: (1) using the resultconditioned on X and (2) nding the unconditional result byaveraging (i.e., integrating over) the conditional distributions.



Comments

Assumptions regarding (xi1 xi2 ...xiK yi ) for i = 1, ..,N is alsorequired. This is a statement about how the sample is drawn.

In the sequel, we assume that (xi1 xi2 ...xiK yi ) for i = 1, ..,N areindependently and identically distributed (i.i.d).

The observations are drawn by a simple random sampling from a largepopulation.



Denition (Assumption 6: Normal distribution)The disturbances are normally distributed.

εi jX N0, σ2

or equivalently

εjX N0N1, σ2IN



Comments

1 Once again, this is a convenience that we will dispense with aftersome analysis of its implications.

2 Normality is not necessary to obtain many of the results presentedbelow.

3 Assumption 6 implies assumptions 3 (exogeneity) and 4 (sphericaldisturbances).



Summary

The main assumptions of the multiple linear regression model

A1: linearity The model is linear with β

A2: identication X is an N K matrix with rank K

A3: exogeneity E (εjX) = 0N1A4: spherical error terms V (εjX) = σ2IN

A5: data generation X may be xed or random

A6: normal distribution εjX N0N1, σ2IN



Key Concepts

1 Simple linear regression model

2 Multiple linear regression model

3 Semi-parametric multiple linear regression model

4 Multiple linear Gaussian model

5 Assumptions of the multiple linear regression model

6 Linearity (A1), Identication (A2), Exogeneity (A3), Spherical errorterms (A4), Data generation (A5) and Normal distribution (A6)


Section 3

The ordinary least squares estimator


3. The ordinary least squares estimator

Introduction

1 The simple linear regression model assumes that the followingspecication is true in the population:

y = Xβ+ ε

where other unobserved factors determining y are captured by theerror term ε.

2 Consider a sample fxi1, xi2, .., xiK , yigNi=1 of i .i .d . random variables(be careful to the change of notations here) and only one realizationof this sample (your data set).

3 How to estimate the vector of parameters β?



Introduction (contd)

1 If we assume that assumptions A1-A6 hold, we have a multiple linearGaussian model (parametric model), and a solution is to use theMLE. The MLE estimator for β coincides to the ordinary leastsquares (OLS) estimator (cf. chapter 2).

2 If we assume that only assumptions A1-A5 hold, we have asemi-parametric multiple linear regression model, the MLE isunfeasible.

3 In this case, the only solution is to use the ordinary least squaresestimator (OLS).



Intuition

Let us consider the simple linear regression model and for simplicity denotexi = xi2:

yi = β1 + β2xi + εi

The general idea of the OLS consists in minimizing the distancebetween the points (xi , yi ) and the regression line

byi = bβ1 + bβ2xior the points (xi , byi ) for all i = 1, ..,N



Intuition (contd)

Estimates of β1 and β2 are chosen by minimizing the sum of the squaredresiduals (SSR):

N

∑i=1

bε2iThis SSR can be written as:

N

∑i=1

bε2i = yi bβ1 bβ2xi2Therefore, bβ1 and bβ2 are the solutions of the minimization problembβ1, bβ2 = argmin

(β1,β2)

N

∑i=1(yi β1 β2xi )

2



Denition (OLS - simple linear regression model)

In the simple linear regression model yi = β1 + β2xi + εi , the OLSestimators bβ1 and bβ2 are the solutions of the minimization problembβ1, bβ2 = argmin

(β1,β2)

N

∑i=1(yi β1 β2xi )

2

The solutions are: bβ1 = yN bβ2xNbβ2 = ∑N

i=1 (xi xN ) (yi yN )∑Ni=1 (xi xN )

2

where yN = N1 ∑N

i=1 yi and xN = N1 ∑N

i=1 xi respectively denote thesample mean of the dependent variable y and the regressor x .



Remark

The OLS estimator is a linear estimator (cf. chapter 1) since it can beexpressed as a linear function of the observations yi :

bβ2 = N

∑i=1

ωiyi

with

ωi =(xi xN )

∑Ni=1 (xi xN )

2

in the case where yN = 0.



Denition (Fitted value)The predicted or tted value for observation i is:

byi = bβ1 + bβ2xiwith a sample mean equal to the sample average of the observations

byN = 1N

N

∑i=1byi = yN = 1

N

N

∑i=1yi



Denition (Fitted residual)The residual for observation i is:

bεi = yi bβ1 bβ2xiwith a sample mean equal to zero by denition.

bεN = 1N

N

∑i=1

bεi = 0



Remarks

1 The t of the regression is good if the sum ∑Ni=1bε2i (or SSR) is

small, i.e., the unexplained part of the variance of y is small.

2 The coe¢ cient of determination or R2 is given by:

R2 =∑Ni=1 (byi yN )2

∑Ni=1 (yi yN )

2 = 1∑Ni=1bε2i

∑Ni=1 (yi yN )

2



Orthogonality conditions

Under assumption A3 (strict exogeneity), we have E ( εi j xi ) = 0. Thiscondition implies that:

E (εi ) = 0 E (εixi ) = 0

Using the sample analog of this moment conditions (cf. chapter 6,GMM), one has:

1N

N

∑i=1

yi bβ1 bβ2xi = 0

1N

N

∑i=1

yi bβ1 bβ2xi xi = 0

This is a system of two equations and two unknowns bβ1 and bβ2.Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 67 / 174


Denition (Orthogonality conditions)The ordinary least squares estimator can be dened from the two sampleanalogs of the following moment conditions:

E (εi ) = 0

E (εixi ) = 0

The corresponding system of equations is just-identied.



OLS and multiple linear regression model

Now consider the multiple linear regression model

y = Xβ+ ε

or

yi =K

∑k=1

βkxik + εi

Objective: nd an estimator (estimate) of β1, β2, .., βK and σ2 under theassumptions A1-A5.



OLS and multiple linear regression model

Di¤erent methods:

1 Minimize the sum of squared residuals (SSR)

2 Solve the same minimization problem with matrix notation.

3 Use moment conditions.

4 Geometrical interpretation



1. Minimize the sum of squared residuals (SSR):

As in the simple linear regression,

bβ = argminβ

N

∑i=1

ε2i = argminβ

N

∑i=1

yi

K

∑k=1

βkxik

!2One can derive the rst order conditions with respect to βk for k = 1, ..,Kand solve a system of K equations with K unknowns.



2. Using matrix notations:

Denition (OLS and multiple linear regression model)

In the multiple linear regression model yi = x>i β+ εi , withxi = (xi1, .., xiK )

> , the OLS estimator bβ is the solution ofbβ = argmin

β

N

∑i=1

yi x>i β

2The OLS estimators of β is:

bβ = N

∑i=1xix>i

!1 N

∑i=1xiyi

!




Denition (Normal equations)Under suitable regularity conditions, in the multiple linear regression modelyi = x>i β+ εi , with xi = (xi1 : .. : xiK )

> , the normal equations are

N

∑i=1xiyi x>i bβ = 0K1




Denition (OLS and multiple linear regression model)

In the multiple linear regression model y = Xβ+ ε, the OLS estimator bβis the solution of the minimization problem

bβ = argminβ

ε>ε = argminβ

(yXβ)> (yXβ)

The OLS estimators of β is:

bβ = X>X1 X>y




Denition

The ordinary least squares estimator bβ of β minimizes the following criteria

s (β) = k(yXβ)k2IN = (yXβ)> (yXβ)




The FOC (normal equations) are dened by:

∂s (β)∂β

bβ = 2 X>|zKN

yXbβ| z N1

= 0K1

The second-order conditions hold:

∂s (β)

∂β∂β>

bβ = 2 X>X| z KK

is denite positive

since by denition X>X is a positive denite matrix. We have a minimum.




Denition (Normal equations)Under suitable regularity conditions, in the multiple linear regression modely = Xβ+ ε, the normal equations are given by:

X>|zKN

(yXβ)| z N1

= 0K1



Denition (Unbiased variance estimator)

In the multiple linear regression model y = Xβ+ ε, the unbiasedestimator of σ2 is given by:

bσ2 = 1N K

N

∑i=1

bε2i SSRN K




The estimator bσ2 can also be written as:bσ2 = 1

N KN

∑i=1

yi x>i bβ2

bσ2 = (yXβ)> (yXβ)

N K

bσ2 = k(yXβ)k2INN K



3. Using moment conditions:

Under assumption A3 (strict exogeneity), we have E (εjX) = 0. Thiscondition implies:

E (εixi ) = 0K1

with xi = (xi1 : .. : xiK )> . Using the sample analogs, one has:

1N

N

∑i=1xiyi x>i bβ = 0K1

We have K (normal) equations with K unknown parameters bβ1, .., bβK .The system is just identied.



4. Geometric interpretation:

1 The ordinary least squares estimation methods consists in determiningthe adjusted vector, by, which is the closest to y (in a certain space...)such that the squared norm between y and by is minimized.

2 Finding by is equivalent to nd an estimator of β.




Denition (Geometric interpretation)

The adjusted vector, by, is the (orthogonal) projection of y onto thecolumn space of X. The tted error terms, bε, is the projection of y ontothe orthogonal space engendered by the column space of X. The vectors byand bε are orthogonal.




Source: F. Pelgrin (2010), Lecture notes, Advanced Econometrics




Denition (Projection matrices)

The vectors by and bε are dened to be:by = P ybε =M y

where P and M denote the two following projection matrices:

P = XX>X

1X>

M = IN P = IN XX>X

1X>



Other geometric interpretations:

Suppose that there is a constant term in the model.

1 The least squares residuals sum to zero:

N

∑i=1

bεi = 02 The regression hyperplane passes through the point of means of thedata (xN , yN ) .

3 The mean of the tted (adjusted) values of y equals the mean of theactual values of y : byN = yN



Denition (Coe¢ cient of determination)The coe¢ cient of determination of the multiple linear regression model(with a constant term) is the ratio of the total (empirical) varianceexplained by model to the total (empirical) variance of y:

R2 =∑Ni=1 (byi yN )2

∑Ni=1 (yi yN )

2 = 1∑Ni=1bε2i

∑Ni=1 (yi yN )

2



Remark

1 The coe¢ cient of determination measures the proportion of the totalvariance (or variability) in y that is accounted for by variation in theregressors (or the model).

2 Problem: the R2 automatically and spuriously increases when extraexplanatory variables are added to the model.



Denition (Adjusted R-squared)The adjusted R-squared coe¢ cient is dened to be:

R2= 1 N 1

N p 11 R2

where p denotes the number of regressors (not counting the constantterm, i.e., p = K 1 if there is a constant or p = K otherwise).



Remark

One can show that

1 R2<R2

2 if N is large R2 'R2

3 The adjusted R-squared R2can be negative.



Key Concepts

1 OLS estimator and estimate

2 Fitted or predicted value

3 Residual or tted residual

4 Orthogonality conditions

5 Normal equations

6 Geometric interpretations of the OLS

7 Coe¢ cient of determination and adjusted R-squared


Section 4

Statistical properties of the OLS estimator


4. Statistical properties of the OLS estimator

In order to study the statistical properties of the OLS estimator, we haveto distinguish (cf. chapter 1):

1 The nite sample properties

2 The large sample or asymptotic properties



But, we have also to distinguish the properties given the assumptionsmade on the linear regression model

1 Semi-parametric linear regression model (the exact distribution ofε is unknown) versus parametric linear regression model (andespecially Gaussian linear regression model, assumption A6).

2 X is a matrix of random regressors versus X is a matrix of xedregressors.



Fact (Assumptions)In the rest of this section, we assume that assumptions A1-A5 hold.

A1: linearity The model is linear with β

A2: identication X is an N K matrix with rank K

A3: exogeneity E (εjX) = 0N1A4: spherical error terms V (εjX) = σ2IN

A5: data generation X may be xed or random


Subsection 4.1.

Finite sample properties of the OLS estimator


4.1. Finite sample properties

Objectives

The objectives of this subsection are the following:

1 Compute the two rst moments of the (unknown) nite sampledistribution of the OLS estimators bβ and bσ2

2 Determine the nite sample distribution of the OLS estimators bβ andbσ under particular assumptions (A6).3 Determine if the OLS estimators are "good": e¢ cient estimatorversus BLUE.

4 Introduce the Gauss-Markov theorem.



First moments of the OLS estimators



Moments

In a rst step, we will derive the rst moments of the OLS estimators

1 Step 1: compute Ebβ and V

bβ2 Step 2: compute E

bσ2 and Vbσ2



Denition (Unbiased estimator)

In the multiple linear regression model y = Xβ0 + ε, under the assumptionA3 (strict exogeneity), the OLS estimator bβ is unbiased:

Ebβ = β0

where β0 denotes the true value of the vector of parameters. This resultholds whether or not the matrix X is considered as random.



Proof

Case 1: xed regressors (cf. chapter 1)

bβ = X>X1 X>y = β0 +X>X

1 X>ε

So, if X is a matrix of xed regressors:

Ebβ = β0 +

X>X

1 X>E (ε)

Under assumption A3 (exogeneity), E (εjX) = E (ε) = 0. Then, we get:

Ebβ = β0

The OLS estimator is unbiased.



Proof (contd)

Case 2: random regressors

bβ = X>X1 X>y = β0 +X>X

1 X>ε

If X is includes some random elements:

EbβX = β0 +

X>X

1 X>E (εjX)

Under assumption A3 (exogeneity), E (εjX) = 0. Then, we get:

EbβX = β0



Proof (contd)

Case 2: random regressors

The OLS estimator bβ is conditionally unbiased.EbβX = β0

Besides, we have:

Ebβ = EX

EbβX = EX (β0) = β0

where EX denotes the expectation with respect to the distribution of X.So, the OLS estimator bβ is unbiased.

Ebβ = β0



Denition (Variance of the OLS estimator, non-stochastic regressors)

In the multiple linear regression model y = Xβ+ ε, if the matrix X isnon-stochastic, the unconditional variance covariance matrix of the OLSestimator bβ is:

Vbβ = σ2

X>X

1



Proof bβ = X>X1 X>y = β0 +X>X

1 X>ε

So, if X is a matrix of xed regressors:

Vbβ = E

bβ β0

> bβ β0

= E

X>X

1X>εε>X

X>X

1=

X>X

1X>E

εε>

XX>X

1



Proof (contd)

Under assumption A4 (spherical disturbances), we have:

V (ε) = E

εε>= σ2IN

The variance covariance matrix of the OLS estimator is dened by:

Vbβ =

X>X

1X>E

εε>

XX>X

1=

X>X

1X>σ2INX

X>X

1= σ2

X>X

1 X>X

X>X

1= σ2

X>X

1



Denition (Variance of the OLS estimator, stochastic regressors)

In the multiple linear regression model y = Xβ0 + ε, if the matrix X isstochastic, the conditional variance covariance matrix of the OLSestimator bβ is:

VbβX = σ2

X>X

1The unconditional variance covariance matrix is equal to:

Vbβ = σ2 EX

X>X

1where EX denotes the expectation with respect to the distribution of X.



Proof bβ = X>X1 X>y = β0 +X>X

1 X>ε

So, if X is a stochastic matrix:

VbβX = E

bβ β0

> bβ β0

X= E

X>X

1X>εε>X

X>X

1X=

X>X

1X>E

εε>

XX X>X1



Proof (contd)

Under assumption A4 (spherical disturbances), we have:

V (εjX) = E

εε>X = σ2IN

The conditional variance covariance matrix of the OLS estimator is denedby:

VbβX =

X>X

1X>E

εε>

XX X>X1=

X>X

1X>σ2INX

X>X

1= σ2

X>X

1



Proof (contd)

We have:VbβX = σ2

X>X

1The (unconditional) variance covariance matrix of the OLS estimator isdened by:

Vbβ = EX

VbβX = σ2 EX

X>X

1where EX denotes the expectation with respect to the distribution of X.



Summary

Case 1: X stochastic Case 2: X nonstochastic

Mean Ebβ = β0 E

bβ = β0

Variance Vbβ = σ2 EX

X>X

1Vbβ = σ2

X>X

1Cond. mean E

bβX = β0

Cond. var VbβX = σ2

X>X

1



Question

How to estimate the variance covariance matrix of the OLS estimator?

VbβOLS = σ2

X>X

1if X is nonstochastic

VbβOLS = σ2 EX

X>X

1if X is stochastic



Question (contd)

Denition (Variance estimator)An unbiased estimator of the variance covariance matrix of the OLSestimator is given: bV bβOLS = bσ2 X>X1where bσ2 = (N K )1 bε>bε is an unbiased estimator of σ2. This resultholds whether X is stochastic or non stochastic.



Summary


Variance Vbβ = σ2 EX

X>X

1Vbβ = σ2

X>X

1Estimator bV bβOLS = bσ2 X>X1 bV bβOLS = bσ2 X>X1



Denition (Estimator of the variance of disturbances)Under the assumption A1-A5, in the multiple linear regression modely = Xβ+ ε, the estimator bσ2 is unbiased:

Ebσ2 = σ2

where bσ2 = 1N K

N

∑i=1

bε2i = bε>bεN K

This result holds whether or not the matrix X is considered as random.



Proof

We assume that X is stochastic. Let M denotes the projection matrix(residual maker) dened by:

M = IN XX>X

1X>

with bε(N ,1)

= M(N ,N )

y(N ,1)

The N N matrix M satises the following properties:

1 if X is regressed on X, a perfect t will result and the residuals will bezero, so M X = 0

2 The matrix M is symmetric M> =M and idempotent M M =M



Proof (contd)

The residuals are dened as to be:

bε =M y

Since y = Xβ+ ε, we have

bε =M (Xβ+ ε) =MXβ+Mε

Since MX = 0, we have bε =Mε



Proof (contd)The estimator bσ2 is based on the sum of squared residuals (SSR)

bσ2 = bε>bεN K =

ε>Mε

N K

The expected value of the SSR is

Ebε>bεX = E

ε>Mε

XThe scalar ε>Mε is a 1 1 scalar, so it is equal to its trace.

tr

E

ε>MεX = trE

εε>M

X = trEMεε>

Xsince tr (AB) = tr (AB) .



Proof (contd)

Since M = IN XX>X

1X> depends on X, we have:

Ebε>bεX = trE

Mεε>

X = trME

εε>X

Under assumptions A3 and A4, we have

E

εε>X = σ2IN

As a consequence

Ebε>bεX = tr σ2M IN

= σ2tr (M)



Proof (contd)

Ebε>bεX = σ2 tr (M)

= σ2 tr

IN X

X>X

1X>

= σ2 tr (IN ) σ2 tr

XX>X

1X>

= σ2 tr (IN ) σ2 tr

X>X

X>X

1= σ2 tr (IN ) σ2 tr (IK )= σ2 (N K )



Proof (contd)

By denition of bσ2, we have:Ebσ2X = E

bε>bεXN K =

σ2 (N K )N K = σ2

So, the estimator bσ2 is conditionally unbiased.Ebσ2 = EX

Ebσ2X = EX

σ2= σ2

The estimator bσ2 is unbiased:Ebσ2 = σ2



Remark

Given the same principle, we can compute the variance of the estimator bσ2.bσ2 = bε>bε

N K =ε>Mε

N K

As a consequence, we have:

Vbσ2X = 1

(N K )V

ε>MεX

Vbσ2 = EX

Vbσ2X

But, it takes... at least ten slides.....



Denition (Variance of the estimator bσ2)In the multiple linear regression model y = Xβ0 + ε, the variance of theestimator bσ2 is

Vbσ2 = 2σ4

N Kwhere σ2 denotes the true value of variance of the error terms. This resultholds whether or not the matrix X is considered as random.



Summary


Mean Ebσ2 = σ2 E

bσ2 = σ2

Variance Vbσ2 = 2σ4

NK Vbσ2 = 2σ4

NK

Cond. mean Ebσ2X = σ2

Cond. var Vbσ2X = 2σ4

NK



Finite sample distributions



Summary

1 Under assumptions A1-A5, we can derive the two rst moments ofthe (unknown) nite sample distribution of the OLS estimator bβ, i.e.Ebβ and V

bβ .2 Are we able to characterize the nite sample distribution (or exactsampling distribution) of bβ ?

3 For that, we need to put an assumption on the distribution of ε anduse a parametric specication.



Finite sample distribution

Fact (Finite sample distribution, I)

(1) In a parametric multiple linear regression model with stochasticregressors, the conditional nite sample distribution of bβ is known. Theunconditional nite sample distribution is generally unknown:

bβX D bβ ??

where D is a multivariate distribution.




Fact (Finite sample distribution, II)

(2) In a parametric multiple linear regression model with nonstochasticregressors, the marginal (unconditional) nite sample distribution of bβ isknown: bβ D




Fact (Finite sample distribution, III)

(3) In a semi-parametric multiple linear regression model, with xed orrandom regressors, the nite sample distribution of bβ is unknown

bβX ?? bβ ??



Example (Linear Gaussian regression model)

Under the assumption A6 (normality - linear Gaussian regression model),

ε N0,σ2IN

bβ = X>X1 X>y = β+

X>X

1 X>ε

If X is a stochastic matrix, the conditional distribution of bβ is

bβX N β,σ2

X>X

1If X is a nonstochastic matrix, the marginal distribution of bβ is

bβ N β,σ2

X>X

1Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 129 / 174


Theorem (Linear gaussian regression model)

Under the assumption A6 (normality), the estimators bβ and bσ2 have anite sample distribution given by:

bβ N β,σ2

X>X

1bσ2σ2(N K ) χ2 (N K )

Moreover, bβ and bσ2 are independent. This result holds whether or not thematrix X is considered as random. In this last case, the distribution of bβ isconditional to X with V

bβX = σ2X>X

1.



E¢ cient estimator or BLUE?



Question: OLS estimator = "good" estimator of β?

The question is to know if there this estimator is preferred to otherunbiased estimators?

VbβOLS 7 V

bβotherIn general, to answer to this question we use the FDCR orCramer-Rao bound and study the e¢ ciency of the estimator (cf.chapters 1 and 2).

Problem: the computation of the FDCR bound requires anassumption on the distribution of ε.



Two cases:

1 In a parametric model, it is possible to show that the OLS estimatoris e¢ cient - its variance reaches the FDCR bound. The OLS is theBest Unbiased Estimator (BUE)

2 In a semi-parametric model:1 we dont known the conditional distribution of y ven X and thelikelihood of the sample.

2 The FDCR bound is unknown: so it is impossible to show thee¢ ciency of the OLS estimator.

3 In a semi-parametric model, we introduce another concept: the BestLinear Unbiased Estimator (BLUE)



Problem (E¢ ciency of the OLS estimator)

In a semi-parametric multiple linear regression model (with noassumption on the distribution of ε), it is impossible to compute theFDCR or Cramer Rao bound and to show that the e¢ ciency of the OLSestimator. The e¢ ciency of the OLS estimator can only be shown in aparametric multiple linear regression model.



Theorem (E¢ ciency - Gaussian model)

In a Gaussian multiple linear regression model y = Xβ0 + ε, withε N

0,σ2IN

(assumption A6), the OLS estimator bβ is e¢ cient. Its

variance attains the FDCR or Cramer-Rao bound:

Vbβ = I1N (β0)

This result holds whether or not the matrix X is considered as random.



Proof

Let us consider the case where X is nonstochastic. Under the assumptionA6, the distribution is known, since y N

Xβ,σ2IN

. The log-likelihood

of the sample fyi , xigNi=1, with xi = (xi1 : .. : xiK )> , is then equal to:

`N (β; y, x) = N2lnσ2 N2ln (2π) (yXβ)> (yXβ)

2σ2

If we denote θ =

β> : σ2>, we have (cf. chapter 2):

IN (θ0) = Eθ

∂2`N (θ; y, x)

∂θ∂θ>

=

1σ2X>X 021012 T

2σ4

!



Proof (contd)

IN (θ0) =

1σ2X>X 021012 T

2σ4

!The inverse of this block matrix is given by

I1N (θ0) =

0@ σ2X>X

1021

012 2σ4

T

1A =

I1N (β0) 021012 I1N

σ2 !

Then we have:

Vbβ = I1N (β0) = σ2

X>X

1



Problem

1 In a semi-parametric model (with no assumption on the distributionof ε), it is impossible to compute the FDCR bound and to show thee¢ ciency of the OLS estimator.

2 The solution consists in introducing the concept of best linearunbiased estimator (BLUE): the Gauss-Markov theorem...



Theorem (Gauss-Markov theorem)In the linear regression model under assumptions A1-A5, the least squaresestimator bβ is the best linear unbiased estimator (BLUE) of β0 whetherX is stochastic or nonstochastic.



Comment

The estimator bβk for k = 1, ..K is the BLUE of βk

Best = smallest variance

Linear (in y or yi ) : bβk = ∑Ni=1 ωkiyi

Unbiased: Ebβk = βk

Estimator: bβk = f (y1, .., yN )



Remark

The fact that the OLS is the BLUE means that there is no linearunbiased estimator with a smallest variance than bβBUT, nothing guarantees that a nonlinear unbiased estimator mayhave a smallest variance than bβ



Summary

Model Case 1: X stochastic Case 2: X nonstochastic

Semi-parametric bβ is the BLUE bβ is the BLUE(Gauss-Markov) (Gauss-Markov)

Parametric bβ is e¢ cient and BUE bβ is e¢ cient and BUE(FDCR bound) (FDCR bound)

BUE: Best Unbiased Estimator



Other remarks



Remarks

1 Including irrelevant variables (i.e., coe¢ cient = 0) does not a¤ectthe unbiasedness of the estimator.

2 Irrelevant variables decrease the precision of OLS estimator, but donot result in bias.

3 Omitting relevant regressors obviously does a¤ect the estimator,since the zero conditional mean assumption does not hold anymore.



Remarks

1 Omitting relevant variables cause bias (under some conditions onthese variables).

1 The direction of the bias can be (sometimes) characterized.

2 The size of the bias is more complicated to derive.

2 More generally, correlation between a single regressor and the errorterm (endogeneity) usually results in all OLS estimators being biased



Key Concepts

1 OLS unbiased estimator under assumption A3 (exogeneity)

2 Mean and variance of the OLS estimator under assumptions A3-A4(exogeneity - spherical disturbances)

3 Finite sample distribution

4 Estimator of the variance of the error terms

5 BLUE estimator and e¢ cient estimator (FDCR bound)

6 Gauss-Markov theorem


Subsection 4.2.

Asymptotic properties of the OLS estimator


4.2. Asymptotic properties

Objectives

The objectives of this subsection are the following

1 Study the consistency of the OLS estimator.

2 Derive the asymptotic distribution of the OLS estimator bβ.3 Estimate the asymptotic variance covariance matrix of bβ.



Theorem (Consistency)

Let consider the multiple linear regression model yi = x>i β0 + εi , withxi = (xi1 : .. : xiK )

> . Under assumptions A1-A5, the OLS estimatorsbβ and bσ2 are (weakly) consistent:bβ p! β0

bσ2 p! σ2


4.2. Asymptotic Properties

Proof (cf. chapter 1)

We have:

bβ = N

∑i=1xix>i

!1 N

∑i=1xiyi

!= β0 +

N

∑i=1xix>i

!1 N

∑i=1xi εi

!

By multiplying and dividing by N, we get:

bβ = β+

1N

N

∑i=1xix>i

!1 1N

N

∑i=1xi εi

!




bβ = β+

1N

N

∑i=1xix>i

!1 1N

N

∑i=1xi εi

!

1 By using the (weak) law of large number (Kitchines theorem), wehave:

1N

N

∑i=1xix>i

p! Exix>i

1N

N

∑i=1xi εi

p! E (xi εi )

2 By using the Slutskys theorem:

bβ p! β+E1xix>i

E (xi εi )




bβ p! β+E1xix>i

E (xiµi )

Under assumption A3 (exogeneity):

E ( εi j xi ) = 0)

E (εixi ) = 0K1E (εi ) = 0

We have bβ p! β

The OLS estimator bβ is (weakly) consistent. Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 152 / 174


Theorem (Asymptotic distribution)

Consider the multiple linear regression model yi = x>i β0 + εi , withxi = (xi1 : .. : xiK )

> . Under assumptions A1-A5, the OLS estimator bβ isasymptotically normally distributed:

pNbβ β0

d! N

0, σ2E1X

xix>i

or equivalently bβ asy

N

β0,σ2

NE1X

xix>i



Comments

1 Whatever the specication used (parametric or semi-parametric), it isalways possible to characterize the asymptotic distribution of theOLS estimator.

2 Under assumptions A1-A5, the OLS estimator bβ is asymptoticallynormally distributed, even if ε is not normally distributed.

3 This result (asymptotic normality) is valid whether X is stochastic ornonstochastic.


4.2. Asymptotic propertiesProof (cf. chapter 1)

This proof has be shown in the chapter 1 (Estimation theory)

1 Let us rewrite the OLS estimator as:

bβ = β0 +

N

∑i=1xix>i

!1 N

∑i=1xi εi

!

2 Normalize the vector bβ β0

pNbβ β0

=

1N

N

∑i=1xix>i

!1 pN1N

N

∑i=1xi εi

!= A1N bN

AN =1N

N

∑i=1xix>i bN =

pN1N

N

∑i=1xi εi


4.2. Asymptotic propertiesProof (cf. chapter 1)

3. Using the WLLN (Kinchines theorem) and the CMP (continuousmapping theorem). Remark: if xi this part is useless.

1N

N

∑i=1xix>i

!1p! E1X

xix>i

4. Using the CLT:

BN =pN

1N

N

∑i=1xi εi E (xi εi )

!d! N (0,V (xi εi ))

Under assumption A3 E ( εi j xi ) = 0 =) E (xi εi ) = 0 and

V (xi εi ) = Exi εi εix>i

= EX

Exi εi εix>i

xi= EX

xiV ( εi j xi ) x>i

= σ2EX

xix>i




So we have pNbβ β0

= A1N bN

withA1N

p! A1

bNd! N (Eb,Vb)

A1 = E1X

xix>i

Eb = 0 Vb = σ2EX

xix>i

= σ2A




A1Np! A1

bNd! N (Eb,Vb)

A1 = E1xix>i

Eb = 0 Vb = σ2EX

xix>i

= σ2A

By using the Sluskys theorem (for a convergence in distribution), we have:

A1N bNd! N (Ec,Vc)

Ec = A1 Eb = A1 0 = 0Vc = A1 Vb A1 = A1 σ2AA1 = σ2A1




pNbβ β0

= A1N bN

d! N0, σ2A1

with A1 = E1X

x>i xi

(be careful: it is the inverse of the expectation

and not the reverse).

Finally, we have:

pNbβ β0

d! N

0, σ2E1X

xix>i


4.2. Asymptotic propertiesRemark

1 In some textbooks (Greene, 2007 for instance), the asymptoticvariance covariance matrix of the OLS estimator is expressed as afunction of the plim of N1X>X denoted by Q.

Q = p lim1NX>X

2 Both notations are equivalent since:

1NX>X =

1N

N

∑i=1xix>i

By using the WLLN (Kinchines theorem), we have:

1NX>X

p! Q = EX

xix>i



Theorem (Asymptotic distribution)

Consider the multiple linear regression model y = Xβ0 + ε. Underassumptions A1-A5, the OLS estimator bβ is asymptotically normallydistributed: p

Nbβ β0

d! N

0, σ2Q1

where

Q = p lim1NX>X = EX

x>i xi

or equivalently bβ asy

N

β0,σ2

NQ1



Denition (Asymptotic variance covariance matrix)

The asymptotic variance covariance matrix of the OLS estimators bβ isequal to

Vasy

bβ = σ2

NE1X

xix>i

or equivalently

Vasy

bβ = σ2

NQ1

where Q = p limN1X>X.



Remark

Finite sample variance

Vbβ = σ2EX

X>X

1Asymptotic variance

Vasy

bβ = σ2

NE1X

xix>i



Remark

How to estimate the asymptotic variance covariance matrix of the OLSestimator?

Vasy

bβ = σ2

NE1X

xix>i



Denition (Estimator of the asymptotic variance matrix)A consistent estimator of the asymptotic variance covariance matrix isgiven by: bVasy

bβ = bσ2 X>X1where bσ2 is consistent estimator of σ2 :

bσ2 = bε>bεN K



Proof

Vasy

bβ = σ2

NE1X

xix>i

We known that: bσ2 p! σ2

X>XN

=1N

N

∑i=1xix>i

p! EX

xix>i

= Q

By using the CMP and the Slutsky theorem, we get:

bσ2 X>XN

!1p! σ2E1X

xix>i


4.2. Asymptotic propertiesProof (contd)

Vasy

bβ = σ2

NE1X

xix>i

bσ2 X>X

N

!1p! σ2E1X

xix>i

As consequence

bσ2N

X>XN

!1p! σ2

NE1X

xix>i

bσ2 X>X1 p! Vasy

bβor equivalently bVasy

bβ p! Vasy

bβ



Remark

Under assumptions A1-A5, for a stochastic matrix X, we have

Finite sample Asymptotic

Variance Vbβ = σ2EX

X>X

1Vasy

bβ = σ2N1Q1

with Q = p lim 1NX

>X

Estimator bV bβ = bσ2 X>X1 bVasy

bβ = bσ2 X>X1



Example (CAPM)The empirical analogue of the CAPM is given by:

erit = αi + βiermt + εt

erit = rit rft| z excess return of security i at time t

ermt = (rmt rft )| z market excess return at time t

where εt is an i .i .d . error term with:

E (εt ) = 0 V (εt ) = σ2 E ( εt jermt ) = 0Data le: capm.xls for Microsoft, SP500 and Tbill (closing prices) from11/1/1993 to 04/03/2003


4.2. Asymptotic sample properties

Example (CAPM, contd)

Question: write a Matlab code in order to nd (1) the estimates of α andβ for Microsoft, (2) the corresponding standard errors, (3) the SE of theregression, (4) the SSR, (5) the coe¢ cient of determination and (6) theadjusted R2. Compare your results to the following table (Eviews)



Key Concepts

1 The OLS estimator is weakly consist

2 The OLS estimator is asymptotically normally distributed

3 Asymptotic variance covariance matrix

4 Estimator of the asymptotic variance


End of Chapter 3

Christophe Hurlin (University of Orléans)


chapter 3: the multiple linear regression model€¦ · · 2013-12-19ruud p., (2000) an...

Documents