chapter 3: the multiple linear regression model€¦ · · 2013-12-19ruud p., (2000) an...
TRANSCRIPT
Chapter 3: The Multiple Linear Regression ModelAdvanced Econometrics - HEC Lausanne
Christophe Hurlin
University of Orléans
November 23, 2013
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 1 / 174
Section 1
Introduction
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 2 / 174
1. Introduction
The objectives of this chapter are the following:
1 Dene the multiple linear regression model.
2 Introduce the ordinary least squares (OLS) estimator.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 3 / 174
1. Introduction
The outline of this chapter is the following:
Section 2: The multiple linear regression model
Section 3: The ordinary least squares estimator
Section 4: Statistical properties of the OLS estimator
Subsection 4.1: Finite sample properties
Subsection 4:2: Asymptotic properties
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 4 / 174
1. Introduction
References
Amemiya T. (1985), Advanced Econometrics. Harvard University Press.
Greene W. (2007), Econometric Analysis, sixth edition, Pearson - PrenticeHil (recommended)
Pelgrin, F. (2010), Lecture notes Advanced Econometrics, HEC Lausanne (aspecial thank)
Ruud P., (2000) An introduction to Classical Econometric Theory, OxfordUniversity Press.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 5 / 174
1. Introduction
Notations: In this chapter, I will (try to...) follow some conventions ofnotation.
fY (y) probability density or mass function
FY (y) cumulative distribution function
Pr () probability
y vector
Y matrix
Be careful: in this chapter, I dont distinguish between a random vector(matrix) and a vector (matrix) of deterministic elements. For moreappropriate notations, see:
Abadir and Magnus (2002), Notation in econometrics: a proposal for astandard, Econometrics Journal.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 6 / 174
Section 2
The Multiple Linear Regression Model
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 7 / 174
2. The Multiple Linear Regression Model
Objectives
1 Dene the concept of Multiple linear regression model.
2 Semi-parametric and Parametric multiple linear regression model.
3 The multiple linear Gaussian model.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 8 / 174
2. The Multiple Linear Regression Model
Denition (Multiple linear regression model)The multiple linear regression model is used to study the relationshipbetween a dependent variable and one or more independent variables. Thegeneric form of the linear regression model is
y = x1β1 + x2β2 + ..+ xK βK + ε
where y is the dependent or explained variable and x1, .., xK are theindependent or explanatory variables.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 9 / 174
2. The Multiple Linear Regression Model
Notations
1 y is the dependent variable, the regressand or the explainedvariable.
2 xj is an explanatory variable, a regressor or a covariate.
3 ε is the error term or disturbance.
IMPORTANT: do not use the term "residual"..
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 10 / 174
2. The Multiple Linear Regression Model
Notations (contd)
The term ε is a random disturbance, so named because it disturbsanotherwise stable relationship. The disturbance arises for several reasons:
1 Primarily because we cannot hope to capture every inuence on aneconomic variable in a model, no matter how elaborate. The nete¤ect, which can be positive or negative, of these omitted factors iscaptured in the disturbance.
2 There are many other contributors to the disturbance in an empiricalmodel. Probably the most signicant is errors of measurement. It iseasy to theorize about the relationships among precisely denedvariables; it is quite another to obtain accurate measures of thesevariables.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 11 / 174
2. The Multiple Linear Regression Model
Notations (contd)
We assume that each observation in a sample fyi , xi1, xi2..xiK g fori = 1, ..,N is generated by an underlying process described by
yi = xi1β1 + xi2β2 + ..+ xiK βK + εi
Remark:
xik = value of the kth explanatory variable for the i th unit of the sample
xunit,variable
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 12 / 174
2. The Multiple Linear Regression Model
Notations (contd)
Let the N 1 column vector xk be the N observations on variable xk ,for k = 1, ..,K .
Let assemble these data in an N K data matrix, X.
Let y be the N 1 column vector of the N observations, y1,y2, .., yN .
Let ε be the N 1 column vector containing the N disturbances.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 13 / 174
2. The Multiple Linear Regression Model
Notations (contd)
yN1
=
0BBBBBB@
y1y2..yi..yN
1CCCCCCA xkN1
=
0BBBBBB@
x1kx2k..xik..xNk
1CCCCCCA εN1
=
0BBBBBB@
ε1ε2..εi..εN
1CCCCCCA βK1
=
0BB@β1β2..
βK
1CCA
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 14 / 174
2. The Multiple Linear Regression Model
Notations (contd)
XNK
= (x1 : x2 : .. : xK )
or equivalently
XNK
=
0BBBBBB@
x11 x12 .. x1k .. x1Kx21 x22 .. x2k .. x2K.. .. .. .. .. ..xi1 xi2 .. xik .. xiK.. .. .. .. ..xN1 xN2 .. xNk .. xNK
1CCCCCCA
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 15 / 174
2. The Multiple Linear Regression Model
FactIn most of cases, the rst column of X is assumed to be a column of 1s sothat β1 is the constant term in the model.
x1N1
= 1N1
XNK
=
0BBBBBB@
1 x12 .. x1k .. x1K1 x22 .. x2k .. x2K.. .. .. .. .. ..1 xi2 .. xik .. xiK.. .. .. .. ..1 xN2 .. xNk .. xNK
1CCCCCCA
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 16 / 174
2. The Multiple Linear Regression Model
Remark
More generally, the matrix X may as well contain stochastic and nonstochastic elements such as:
Constant;
Time trend;
Dummy variables (for specic episodes in time);
Etc.
Therefore, X is generally a mixture of xed and random variables.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 17 / 174
2. The Multiple Linear Regression Model
Denition (Simple linear regression model)The simple linear regression model is a model with only one stochasticregressor: K = 1 if there is no constant
yi = β1xi + εi
or K = 2 if there is a constant:
yi = β1 + β2xi2 + εi
for i = 1, ..,N, ory = β1 + β2x2 + ε
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 18 / 174
2. The Multiple Linear Regression Model
Denition (Multiple linear regression model)The multiple linear regression model can be written
yN1
= XNK
βK1
+ εN1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 19 / 174
2. The Multiple Linear Regression Model
One key di¤erence for the specication of the MLRM:
Parametric/semi-parametric specication
Parametric model: the distribution of the error terms is fullycharacterized, e.g. ε N (0,Ω)
Semi-Parametric specication: only a few moments of the error termsare specied, e.g. E (ε) = 0 and V (ε) = E
εε>
= Ω.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 20 / 174
2. The Multiple Linear Regression Model
This di¤erence does not matter for the derivation of the ordinary leastsquare estimator
But this di¤erence matters for (among others):
1 The characterization of the statistical properties of the OLS estimator(e.g., e¢ ciency);
2 The choice of alternative estimators (e.g., the maximum likelihoodestimator);
3 Etc.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 21 / 174
2. The Multiple Linear Regression Model
Denition (Semi-parametric multiple linear regression model)The semi-parametric multiple linear regression model is dened by
y = Xβ+ ε
where the error term ε satises
E (εjX) = 0N1
V (εjX) = σ2 INNN
and IN is the identity matrix of order N.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 22 / 174
2. The Multiple Linear Regression Model
Remarks
1 If the matrix X is non stochastic (xed), i.e. there are only xedregressors, then the conditions on the error term u read:
E (ε) = 0
V (ε) = σ2IN2 If the (conditional) variance covariance matrix of ε is not diagonal,i.e. if
V (εjX) = Ω
the model is called the Multiple Generalized Linear RegressionModel
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 23 / 174
2. The Multiple Linear Regression Model
Remarks (contd)
The two conditions on the error term ε
E (εjX) = 0N1
V (εjX) = σ2IN
are equivalent toE (yjX) = Xβ
V (yjX) = σ2IN
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 24 / 174
2. The Multiple Linear Regression Model
Denition (The multiple linear Gaussian model)
The (parametric) multiple linear Gaussian model is dened by
y = Xβ+ ε
where the error term ε is normally distributed
ε N0, σ2IN
As a consequence, the vector y has a conditional normal distribution with
yjX NXβ, σ2IN
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 25 / 174
2. The Multiple Linear Regression Model
Remarks
1 The multiple linear Gaussian model is (by denition) a parametricmodel.
2 If the matrix X is non stochastic (xed), i.e. there are only xedregressors, then the vector y has marginal normal distribution:
y NXβ, σ2IN
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 26 / 174
2. The Multiple Linear Regression Model
The classical linear regression model consists of a set of assumptions thatdescribes how the data set is produced by a data generating process (DGP)
Assumption 1: Linearity
Assumption 2: Full rank condition or identication
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 27 / 174
2. The Multiple Linear Regression Model
Denition (Assumption 1: Linearity)
The model is linear with respect to the parameters β1, .., βK .
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 28 / 174
Remarks
The model species a linear relationship between the dependentvariable and the regressors. For instance, the models
y = β0 + β1x + u
y = β0 + β1 cos (x) + v
y = β0 + β1 1x+ w
are all linear (with respect to β).
In contrast, the model y = β0 + β1xβ2 + ε is non linear
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 29 / 174
2. The Multiple Linear Regression Model
Remark
The model can be linear after some transformations. Starting fromy = Ax β exp (ε) , one has a log-linear specication:
ln (y) = ln (A) + β ln (x) + ε
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 30 / 174
2. The Multiple Linear Regression Model
Denition (Log-linear model)The loglinear model is
ln (yi ) = β1 ln (xi1) + β2 ln (xi2) + ..+ βK ln (xiK ) + εi
This equation is also known as the constant elasticity form as in thisequation, the elasticity of y with respect to changes in x does not varywith xik :
βk =∂ ln (yi )∂ ln (xik )
=∂yi∂xik
xikyi
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 31 / 174
2. The Multiple Linear Regression Model
The classical linear regression model consists of a set of assumptions thatdescribes how the data set is produced by a data generating process (DGP)
Assumption 1: Linearity
Assumption 2: Full rank condition or identication
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 32 / 174
2. The Multiple Linear Regression Model
Denition (Assumption 2: Full column rank)X is an N K matrix with rank K .
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 33 / 174
2. The Multiple Linear Regression Model
Interpretation
1 There is no exact relationship among any of the independent variablesin the model.
2 The columns of X are linearly independent.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 34 / 174
2. The Multiple Linear Regression Model
ExampleSuppose that a cross-section model satises:
yi = β0 + β1non labor incomei + β2salaryi+β3total incomei + εi
The identication condition does not hold since total income is exactlyequal to salary plus non labor income (exact linear dependency in themodel).
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 35 / 174
2. The Multiple Linear Regression Model
Remarks
1 Perfect multi-collinearity is generally not di¢ cult to spot and issignalled by most statistical software.
2 Imperfect multi-collinearity is a more serious issue (see further).
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 36 / 174
2. The Multiple Linear Regression Model
Denition (Identication)The multiple linear regression model is identiable if and only if one thefollowing equivalent assertions holds:
(i) rank (X) = K
(ii) The matrix X>X is invertible
(iii) The columns of X form a basis of L (X)(iv) Xβ1 = Xβ2 =) β1 = β2 8 (β1, β2) 2 RK RK
(v) Xβ = 0 =) β = 0 8β 2 RK
(vi) ker (X) = f0g
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 37 / 174
2. The Multiple Linear Regression Model
The classical linear regression model consists of a set of assumptions thatdescribes how the data set is produced by a data generating process (DGP)
Assumption 1: Linearity
Assumption 2: Full rank condition or identication
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 38 / 174
2. The Multiple Linear Regression Model
Denition (Assumption 3: Strict exogeneity of the regressors)The regressors are exogenous in the sense that:
E (εjX) = 0N1
or equivalently for all the units i 2 f1, ..Ng
E ( εi jX) = 0
or equivalentlyE ( εi j xjk ) = 0
for any explanatory variable k 2 f1, ..Kg and any unit j 2 f1, ..Ng .
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 39 / 174
2. The Multiple Linear Regression Model
CommentsComments
1 The expected value of the error term at observation i (in the sample)is not a function of the independent variables observed at anyobservation (including the i th observation). The independent variablesare not predictors of the error terms.
2 The strict exogeneity condition can be rewritten as:
E (y j X) = Xβ
3 If the regressors are xed, this condition can be rewritten as:
E (ε) = 0N1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 40 / 174
2. The Multiple Linear Regression Model
Implications
The (strict) exogeneity condition E (εjX) = 0N1 has two implications:1 The zero conditional mean of ε implies that the unconditional meanof u is also zero (the reverse is not true):
E (ε) = EX (E (εjX)) = EX (0) = 0
2 The zero conditional mean of ε implies that (the reverse is not true):
E (εixjk ) = 0 8i , j , k
orCov (εi ,X) = 0 8i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 41 / 174
2. The Multiple Linear Regression Model
The classical linear regression model consists of a set of assumptions thatdescribes how the data set is produced by a data generating process (DGP)
Assumption 1: Linearity
Assumption 2: Full rank condition or identication
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 42 / 174
2. The Multiple Linear Regression Model
Denition (Assumption 4: Spherical disturbances)The error terms are such that:
V ( εi jX) = E
ε2iX = σ2 for all i 2 f1, ..Ng
andCov ( εi , εj jX) = E ( εi εj jX) = 0 for all i 6= j
The condition of constant variances is called homoscedasticity. Theuncorrelatedness across observations is called nonautocorrelation.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 43 / 174
2. The Multiple Linear Regression Model
Comments
1 Spherical disturbances = homoscedasticity + nonautocorrelation
2 If the errors are not spherical, we call them nonspherical disturbances.
3 The assumption of homoscedasticity is a strong one: this is theexception rather than the rule!
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 44 / 174
2. The Multiple Linear Regression Model
Comments
Let us consider the (conditional) variance covariance matrix of the errorterms:
V (εjX)| z NN
= E
εε>X| z
NN
=
0BBBBBB@
E
ε21X E ( ε1ε2jX) .. E ( ε1εj jX) .. E ( ε1εN jX)
E ( ε2ε1jX) E
ε22X .. E ( ε2εj jX) .. E ( ε2εN jX)
.. .. .. .. ..E ( εi ε1jX) .. E ( εi εj jX) .. E ( εi εN jX)
.. .. .. .. ..E ( εN ε1jX) .. E ( εN εj jX) .. E
ε2NX
1CCCCCCA
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 45 / 174
2. The Multiple Linear Regression Model
Comments
The two assumptions (homoscedasticity and nonautocorrelation) implythat:
V (εjX)| z NN
= E
εε>X| z
NN
= σ2 IN
=
0BBBBBB@
σ2 0 .. 0 .. 00 σ2 .. 0 .. 0.. .. .. .. .... .. .. .. 0.. .. .. .. ..0 .. 0 .. σ2
1CCCCCCA
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 46 / 174
2. The Multiple Linear Regression Model
The classical linear regression model consists of a set of assumptions thatdescribes how the data set is produced by a data generating process (DGP)
Assumption 1: Linearity
Assumption 2: Full rank condition or identication
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 47 / 174
2. The Multiple Linear Regression Model
Denition (Assumption 5: Data generation)
The data in (xi1 xi2 ...xiK ) may be any mixture of constants and randomvariables.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 48 / 174
2. The Multiple Linear Regression Model
Comments
1 Analysis will be done conditionally on the observed X, so whether theelements in X are xed constants or random draws from a stochasticprocess will not inuence the results.
2 In the case of stochastic regressors, the unconditional statisticalproperties of are obtained in two steps: (1) using the resultconditioned on X and (2) nding the unconditional result byaveraging (i.e., integrating over) the conditional distributions.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 49 / 174
2. The Multiple Linear Regression Model
Comments
Assumptions regarding (xi1 xi2 ...xiK yi ) for i = 1, ..,N is alsorequired. This is a statement about how the sample is drawn.
In the sequel, we assume that (xi1 xi2 ...xiK yi ) for i = 1, ..,N areindependently and identically distributed (i.i.d).
The observations are drawn by a simple random sampling from a largepopulation.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 50 / 174
2. The Multiple Linear Regression Model
The classical linear regression model consists of a set of assumptions thatdescribes how the data set is produced by a data generating process (DGP)
Assumption 1: Linearity
Assumption 2: Full rank condition or identication
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 51 / 174
2. The Multiple Linear Regression Model
Denition (Assumption 6: Normal distribution)The disturbances are normally distributed.
εi jX N0, σ2
or equivalently
εjX N0N1, σ2IN
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 52 / 174
2. The Multiple Linear Regression Model
Comments
1 Once again, this is a convenience that we will dispense with aftersome analysis of its implications.
2 Normality is not necessary to obtain many of the results presentedbelow.
3 Assumption 6 implies assumptions 3 (exogeneity) and 4 (sphericaldisturbances).
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 53 / 174
2. The Multiple Linear Regression Model
Summary
The main assumptions of the multiple linear regression model
A1: linearity The model is linear with β
A2: identication X is an N K matrix with rank K
A3: exogeneity E (εjX) = 0N1A4: spherical error terms V (εjX) = σ2IN
A5: data generation X may be xed or random
A6: normal distribution εjX N0N1, σ2IN
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 54 / 174
2. The Multiple Linear Regression Model
Key Concepts
1 Simple linear regression model
2 Multiple linear regression model
3 Semi-parametric multiple linear regression model
4 Multiple linear Gaussian model
5 Assumptions of the multiple linear regression model
6 Linearity (A1), Identication (A2), Exogeneity (A3), Spherical errorterms (A4), Data generation (A5) and Normal distribution (A6)
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 55 / 174
Section 3
The ordinary least squares estimator
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 56 / 174
3. The ordinary least squares estimator
Introduction
1 The simple linear regression model assumes that the followingspecication is true in the population:
y = Xβ+ ε
where other unobserved factors determining y are captured by theerror term ε.
2 Consider a sample fxi1, xi2, .., xiK , yigNi=1 of i .i .d . random variables(be careful to the change of notations here) and only one realizationof this sample (your data set).
3 How to estimate the vector of parameters β?
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 57 / 174
3. The ordinary least squares estimator
Introduction (contd)
1 If we assume that assumptions A1-A6 hold, we have a multiple linearGaussian model (parametric model), and a solution is to use theMLE. The MLE estimator for β coincides to the ordinary leastsquares (OLS) estimator (cf. chapter 2).
2 If we assume that only assumptions A1-A5 hold, we have asemi-parametric multiple linear regression model, the MLE isunfeasible.
3 In this case, the only solution is to use the ordinary least squaresestimator (OLS).
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 58 / 174
3. The ordinary least squares estimator
Intuition
Let us consider the simple linear regression model and for simplicity denotexi = xi2:
yi = β1 + β2xi + εi
The general idea of the OLS consists in minimizing the distancebetween the points (xi , yi ) and the regression line
byi = bβ1 + bβ2xior the points (xi , byi ) for all i = 1, ..,N
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 59 / 174
3. The ordinary least squares estimator
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 60 / 174
3. The ordinary least squares estimator
Intuition (contd)
Estimates of β1 and β2 are chosen by minimizing the sum of the squaredresiduals (SSR):
N
∑i=1
bε2iThis SSR can be written as:
N
∑i=1
bε2i = yi bβ1 bβ2xi2Therefore, bβ1 and bβ2 are the solutions of the minimization problembβ1, bβ2 = argmin
(β1,β2)
N
∑i=1(yi β1 β2xi )
2
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 61 / 174
3. The ordinary least squares estimator
Denition (OLS - simple linear regression model)
In the simple linear regression model yi = β1 + β2xi + εi , the OLSestimators bβ1 and bβ2 are the solutions of the minimization problembβ1, bβ2 = argmin
(β1,β2)
N
∑i=1(yi β1 β2xi )
2
The solutions are: bβ1 = yN bβ2xNbβ2 = ∑N
i=1 (xi xN ) (yi yN )∑Ni=1 (xi xN )
2
where yN = N1 ∑N
i=1 yi and xN = N1 ∑N
i=1 xi respectively denote thesample mean of the dependent variable y and the regressor x .
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 62 / 174
3. The ordinary least squares estimator
Remark
The OLS estimator is a linear estimator (cf. chapter 1) since it can beexpressed as a linear function of the observations yi :
bβ2 = N
∑i=1
ωiyi
with
ωi =(xi xN )
∑Ni=1 (xi xN )
2
in the case where yN = 0.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 63 / 174
3. The ordinary least squares estimator
Denition (Fitted value)The predicted or tted value for observation i is:
byi = bβ1 + bβ2xiwith a sample mean equal to the sample average of the observations
byN = 1N
N
∑i=1byi = yN = 1
N
N
∑i=1yi
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 64 / 174
3. The ordinary least squares estimator
Denition (Fitted residual)The residual for observation i is:
bεi = yi bβ1 bβ2xiwith a sample mean equal to zero by denition.
bεN = 1N
N
∑i=1
bεi = 0
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 65 / 174
3. The ordinary least squares estimator
Remarks
1 The t of the regression is good if the sum ∑Ni=1bε2i (or SSR) is
small, i.e., the unexplained part of the variance of y is small.
2 The coe¢ cient of determination or R2 is given by:
R2 =∑Ni=1 (byi yN )2
∑Ni=1 (yi yN )
2 = 1∑Ni=1bε2i
∑Ni=1 (yi yN )
2
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 66 / 174
3. The ordinary least squares estimator
Orthogonality conditions
Under assumption A3 (strict exogeneity), we have E ( εi j xi ) = 0. Thiscondition implies that:
E (εi ) = 0 E (εixi ) = 0
Using the sample analog of this moment conditions (cf. chapter 6,GMM), one has:
1N
N
∑i=1
yi bβ1 bβ2xi = 0
1N
N
∑i=1
yi bβ1 bβ2xi xi = 0
This is a system of two equations and two unknowns bβ1 and bβ2.Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 67 / 174
3. The ordinary least squares estimator
Denition (Orthogonality conditions)The ordinary least squares estimator can be dened from the two sampleanalogs of the following moment conditions:
E (εi ) = 0
E (εixi ) = 0
The corresponding system of equations is just-identied.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 68 / 174
3. The ordinary least squares estimator
OLS and multiple linear regression model
Now consider the multiple linear regression model
y = Xβ+ ε
or
yi =K
∑k=1
βkxik + εi
Objective: nd an estimator (estimate) of β1, β2, .., βK and σ2 under theassumptions A1-A5.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 69 / 174
3. The ordinary least squares estimator
OLS and multiple linear regression model
Di¤erent methods:
1 Minimize the sum of squared residuals (SSR)
2 Solve the same minimization problem with matrix notation.
3 Use moment conditions.
4 Geometrical interpretation
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 70 / 174
3. The ordinary least squares estimator
1. Minimize the sum of squared residuals (SSR):
As in the simple linear regression,
bβ = argminβ
N
∑i=1
ε2i = argminβ
N
∑i=1
yi
K
∑k=1
βkxik
!2One can derive the rst order conditions with respect to βk for k = 1, ..,Kand solve a system of K equations with K unknowns.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 71 / 174
3. The ordinary least squares estimator
2. Using matrix notations:
Denition (OLS and multiple linear regression model)
In the multiple linear regression model yi = x>i β+ εi , withxi = (xi1, .., xiK )
> , the OLS estimator bβ is the solution ofbβ = argmin
β
N
∑i=1
yi x>i β
2The OLS estimators of β is:
bβ = N
∑i=1xix>i
!1 N
∑i=1xiyi
!
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 72 / 174
3. The ordinary least squares estimator
2. Using matrix notations:
Denition (Normal equations)Under suitable regularity conditions, in the multiple linear regression modelyi = x>i β+ εi , with xi = (xi1 : .. : xiK )
> , the normal equations are
N
∑i=1xiyi x>i bβ = 0K1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 73 / 174
3. The ordinary least squares estimator
2. Using matrix notations:
Denition (OLS and multiple linear regression model)
In the multiple linear regression model y = Xβ+ ε, the OLS estimator bβis the solution of the minimization problem
bβ = argminβ
ε>ε = argminβ
(yXβ)> (yXβ)
The OLS estimators of β is:
bβ = X>X1 X>y
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 74 / 174
3. The ordinary least squares estimator
2. Using matrix notations:
Denition
The ordinary least squares estimator bβ of β minimizes the following criteria
s (β) = k(yXβ)k2IN = (yXβ)> (yXβ)
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 75 / 174
3. The ordinary least squares estimator
2. Using matrix notations:
The FOC (normal equations) are dened by:
∂s (β)∂β
bβ = 2 X>|zKN
yXbβ| z N1
= 0K1
The second-order conditions hold:
∂s (β)
∂β∂β>
bβ = 2 X>X| z KK
is denite positive
since by denition X>X is a positive denite matrix. We have a minimum.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 76 / 174
3. The ordinary least squares estimator
2. Using matrix notations:
Denition (Normal equations)Under suitable regularity conditions, in the multiple linear regression modely = Xβ+ ε, the normal equations are given by:
X>|zKN
(yXβ)| z N1
= 0K1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 77 / 174
3. The ordinary least squares estimator
Denition (Unbiased variance estimator)
In the multiple linear regression model y = Xβ+ ε, the unbiasedestimator of σ2 is given by:
bσ2 = 1N K
N
∑i=1
bε2i SSRN K
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 78 / 174
3. The ordinary least squares estimator
2. Using matrix notations:
The estimator bσ2 can also be written as:bσ2 = 1
N KN
∑i=1
yi x>i bβ2
bσ2 = (yXβ)> (yXβ)
N K
bσ2 = k(yXβ)k2INN K
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 79 / 174
3. The ordinary least squares estimator
3. Using moment conditions:
Under assumption A3 (strict exogeneity), we have E (εjX) = 0. Thiscondition implies:
E (εixi ) = 0K1
with xi = (xi1 : .. : xiK )> . Using the sample analogs, one has:
1N
N
∑i=1xiyi x>i bβ = 0K1
We have K (normal) equations with K unknown parameters bβ1, .., bβK .The system is just identied.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 80 / 174
3. The ordinary least squares estimator
4. Geometric interpretation:
1 The ordinary least squares estimation methods consists in determiningthe adjusted vector, by, which is the closest to y (in a certain space...)such that the squared norm between y and by is minimized.
2 Finding by is equivalent to nd an estimator of β.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 81 / 174
3. The ordinary least squares estimator
4. Geometric interpretation:
Denition (Geometric interpretation)
The adjusted vector, by, is the (orthogonal) projection of y onto thecolumn space of X. The tted error terms, bε, is the projection of y ontothe orthogonal space engendered by the column space of X. The vectors byand bε are orthogonal.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 82 / 174
3. The ordinary least squares estimator
4. Geometric interpretation:
Source: F. Pelgrin (2010), Lecture notes, Advanced Econometrics
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 83 / 174
3. The ordinary least squares estimator
4. Geometric interpretation:
Denition (Projection matrices)
The vectors by and bε are dened to be:by = P ybε =M y
where P and M denote the two following projection matrices:
P = XX>X
1X>
M = IN P = IN XX>X
1X>
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 84 / 174
3. The ordinary least squares estimator
Other geometric interpretations:
Suppose that there is a constant term in the model.
1 The least squares residuals sum to zero:
N
∑i=1
bεi = 02 The regression hyperplane passes through the point of means of thedata (xN , yN ) .
3 The mean of the tted (adjusted) values of y equals the mean of theactual values of y : byN = yN
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 85 / 174
3. The ordinary least squares estimator
Denition (Coe¢ cient of determination)The coe¢ cient of determination of the multiple linear regression model(with a constant term) is the ratio of the total (empirical) varianceexplained by model to the total (empirical) variance of y:
R2 =∑Ni=1 (byi yN )2
∑Ni=1 (yi yN )
2 = 1∑Ni=1bε2i
∑Ni=1 (yi yN )
2
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 86 / 174
3. The ordinary least squares estimator
Remark
1 The coe¢ cient of determination measures the proportion of the totalvariance (or variability) in y that is accounted for by variation in theregressors (or the model).
2 Problem: the R2 automatically and spuriously increases when extraexplanatory variables are added to the model.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 87 / 174
3. The ordinary least squares estimator
Denition (Adjusted R-squared)The adjusted R-squared coe¢ cient is dened to be:
R2= 1 N 1
N p 11 R2
where p denotes the number of regressors (not counting the constantterm, i.e., p = K 1 if there is a constant or p = K otherwise).
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 88 / 174
3. The ordinary least squares estimator
Remark
One can show that
1 R2<R2
2 if N is large R2 'R2
3 The adjusted R-squared R2can be negative.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 89 / 174
3. The ordinary least squares estimator
Key Concepts
1 OLS estimator and estimate
2 Fitted or predicted value
3 Residual or tted residual
4 Orthogonality conditions
5 Normal equations
6 Geometric interpretations of the OLS
7 Coe¢ cient of determination and adjusted R-squared
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 90 / 174
Section 4
Statistical properties of the OLS estimator
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 91 / 174
4. Statistical properties of the OLS estimator
In order to study the statistical properties of the OLS estimator, we haveto distinguish (cf. chapter 1):
1 The nite sample properties
2 The large sample or asymptotic properties
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 92 / 174
4. Statistical properties of the OLS estimator
But, we have also to distinguish the properties given the assumptionsmade on the linear regression model
1 Semi-parametric linear regression model (the exact distribution ofε is unknown) versus parametric linear regression model (andespecially Gaussian linear regression model, assumption A6).
2 X is a matrix of random regressors versus X is a matrix of xedregressors.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 93 / 174
4. Statistical properties of the OLS estimator
Fact (Assumptions)In the rest of this section, we assume that assumptions A1-A5 hold.
A1: linearity The model is linear with β
A2: identication X is an N K matrix with rank K
A3: exogeneity E (εjX) = 0N1A4: spherical error terms V (εjX) = σ2IN
A5: data generation X may be xed or random
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 94 / 174
Subsection 4.1.
Finite sample properties of the OLS estimator
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 95 / 174
4.1. Finite sample properties
Objectives
The objectives of this subsection are the following:
1 Compute the two rst moments of the (unknown) nite sampledistribution of the OLS estimators bβ and bσ2
2 Determine the nite sample distribution of the OLS estimators bβ andbσ under particular assumptions (A6).3 Determine if the OLS estimators are "good": e¢ cient estimatorversus BLUE.
4 Introduce the Gauss-Markov theorem.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 96 / 174
4.1. Finite sample properties
First moments of the OLS estimators
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 97 / 174
4.1. Finite sample properties
Moments
In a rst step, we will derive the rst moments of the OLS estimators
1 Step 1: compute Ebβ and V
bβ2 Step 2: compute E
bσ2 and Vbσ2
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 98 / 174
4.1. Finite sample properties
Denition (Unbiased estimator)
In the multiple linear regression model y = Xβ0 + ε, under the assumptionA3 (strict exogeneity), the OLS estimator bβ is unbiased:
Ebβ = β0
where β0 denotes the true value of the vector of parameters. This resultholds whether or not the matrix X is considered as random.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 99 / 174
4.1. Finite sample properties
Proof
Case 1: xed regressors (cf. chapter 1)
bβ = X>X1 X>y = β0 +X>X
1 X>ε
So, if X is a matrix of xed regressors:
Ebβ = β0 +
X>X
1 X>E (ε)
Under assumption A3 (exogeneity), E (εjX) = E (ε) = 0. Then, we get:
Ebβ = β0
The OLS estimator is unbiased.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 100 / 174
4.1. Finite sample properties
Proof (contd)
Case 2: random regressors
bβ = X>X1 X>y = β0 +X>X
1 X>ε
If X is includes some random elements:
EbβX = β0 +
X>X
1 X>E (εjX)
Under assumption A3 (exogeneity), E (εjX) = 0. Then, we get:
EbβX = β0
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 101 / 174
4.1. Finite sample properties
Proof (contd)
Case 2: random regressors
The OLS estimator bβ is conditionally unbiased.EbβX = β0
Besides, we have:
Ebβ = EX
EbβX = EX (β0) = β0
where EX denotes the expectation with respect to the distribution of X.So, the OLS estimator bβ is unbiased.
Ebβ = β0
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 102 / 174
4.1. Finite sample properties
Denition (Variance of the OLS estimator, non-stochastic regressors)
In the multiple linear regression model y = Xβ+ ε, if the matrix X isnon-stochastic, the unconditional variance covariance matrix of the OLSestimator bβ is:
Vbβ = σ2
X>X
1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 103 / 174
4.1. Finite sample properties
Proof bβ = X>X1 X>y = β0 +X>X
1 X>ε
So, if X is a matrix of xed regressors:
Vbβ = E
bβ β0
> bβ β0
= E
X>X
1X>εε>X
X>X
1=
X>X
1X>E
εε>
XX>X
1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 104 / 174
4.1. Finite sample properties
Proof (contd)
Under assumption A4 (spherical disturbances), we have:
V (ε) = E
εε>= σ2IN
The variance covariance matrix of the OLS estimator is dened by:
Vbβ =
X>X
1X>E
εε>
XX>X
1=
X>X
1X>σ2INX
X>X
1= σ2
X>X
1 X>X
X>X
1= σ2
X>X
1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 105 / 174
4.1. Finite sample properties
Denition (Variance of the OLS estimator, stochastic regressors)
In the multiple linear regression model y = Xβ0 + ε, if the matrix X isstochastic, the conditional variance covariance matrix of the OLSestimator bβ is:
VbβX = σ2
X>X
1The unconditional variance covariance matrix is equal to:
Vbβ = σ2 EX
X>X
1where EX denotes the expectation with respect to the distribution of X.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 106 / 174
4.1. Finite sample properties
Proof bβ = X>X1 X>y = β0 +X>X
1 X>ε
So, if X is a stochastic matrix:
VbβX = E
bβ β0
> bβ β0
X= E
X>X
1X>εε>X
X>X
1X=
X>X
1X>E
εε>
XX X>X1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 107 / 174
4.1. Finite sample properties
Proof (contd)
Under assumption A4 (spherical disturbances), we have:
V (εjX) = E
εε>X = σ2IN
The conditional variance covariance matrix of the OLS estimator is denedby:
VbβX =
X>X
1X>E
εε>
XX X>X1=
X>X
1X>σ2INX
X>X
1= σ2
X>X
1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 108 / 174
4.1. Finite sample properties
Proof (contd)
We have:VbβX = σ2
X>X
1The (unconditional) variance covariance matrix of the OLS estimator isdened by:
Vbβ = EX
VbβX = σ2 EX
X>X
1where EX denotes the expectation with respect to the distribution of X.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 109 / 174
4.1. Finite sample properties
Summary
Case 1: X stochastic Case 2: X nonstochastic
Mean Ebβ = β0 E
bβ = β0
Variance Vbβ = σ2 EX
X>X
1Vbβ = σ2
X>X
1Cond. mean E
bβX = β0
Cond. var VbβX = σ2
X>X
1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 110 / 174
4.1. Finite sample properties
Question
How to estimate the variance covariance matrix of the OLS estimator?
VbβOLS = σ2
X>X
1if X is nonstochastic
VbβOLS = σ2 EX
X>X
1if X is stochastic
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 111 / 174
4.1. Finite sample properties
Question (contd)
Denition (Variance estimator)An unbiased estimator of the variance covariance matrix of the OLSestimator is given: bV bβOLS = bσ2 X>X1where bσ2 = (N K )1 bε>bε is an unbiased estimator of σ2. This resultholds whether X is stochastic or non stochastic.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 112 / 174
4.1. Finite sample properties
Summary
Case 1: X stochastic Case 2: X nonstochastic
Variance Vbβ = σ2 EX
X>X
1Vbβ = σ2
X>X
1Estimator bV bβOLS = bσ2 X>X1 bV bβOLS = bσ2 X>X1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 113 / 174
4.1. Finite sample properties
Denition (Estimator of the variance of disturbances)Under the assumption A1-A5, in the multiple linear regression modely = Xβ+ ε, the estimator bσ2 is unbiased:
Ebσ2 = σ2
where bσ2 = 1N K
N
∑i=1
bε2i = bε>bεN K
This result holds whether or not the matrix X is considered as random.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 114 / 174
4.1. Finite sample properties
Proof
We assume that X is stochastic. Let M denotes the projection matrix(residual maker) dened by:
M = IN XX>X
1X>
with bε(N ,1)
= M(N ,N )
y(N ,1)
The N N matrix M satises the following properties:
1 if X is regressed on X, a perfect t will result and the residuals will bezero, so M X = 0
2 The matrix M is symmetric M> =M and idempotent M M =M
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 115 / 174
4.1. Finite sample properties
Proof (contd)
The residuals are dened as to be:
bε =M y
Since y = Xβ+ ε, we have
bε =M (Xβ+ ε) =MXβ+Mε
Since MX = 0, we have bε =Mε
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 116 / 174
4.1. Finite sample properties
Proof (contd)The estimator bσ2 is based on the sum of squared residuals (SSR)
bσ2 = bε>bεN K =
ε>Mε
N K
The expected value of the SSR is
Ebε>bεX = E
ε>Mε
XThe scalar ε>Mε is a 1 1 scalar, so it is equal to its trace.
tr
E
ε>MεX = trE
εε>M
X = trEMεε>
Xsince tr (AB) = tr (AB) .
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 117 / 174
4.1. Finite sample properties
Proof (contd)
Since M = IN XX>X
1X> depends on X, we have:
Ebε>bεX = trE
Mεε>
X = trME
εε>X
Under assumptions A3 and A4, we have
E
εε>X = σ2IN
As a consequence
Ebε>bεX = tr σ2M IN
= σ2tr (M)
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 118 / 174
4.1. Finite sample properties
Proof (contd)
Ebε>bεX = σ2 tr (M)
= σ2 tr
IN X
X>X
1X>
= σ2 tr (IN ) σ2 tr
XX>X
1X>
= σ2 tr (IN ) σ2 tr
X>X
X>X
1= σ2 tr (IN ) σ2 tr (IK )= σ2 (N K )
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 119 / 174
4.1. Finite sample properties
Proof (contd)
By denition of bσ2, we have:Ebσ2X = E
bε>bεXN K =
σ2 (N K )N K = σ2
So, the estimator bσ2 is conditionally unbiased.Ebσ2 = EX
Ebσ2X = EX
σ2= σ2
The estimator bσ2 is unbiased:Ebσ2 = σ2
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 120 / 174
4.1. Finite sample properties
Remark
Given the same principle, we can compute the variance of the estimator bσ2.bσ2 = bε>bε
N K =ε>Mε
N K
As a consequence, we have:
Vbσ2X = 1
(N K )V
ε>MεX
Vbσ2 = EX
Vbσ2X
But, it takes... at least ten slides.....
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 121 / 174
4.1. Finite sample properties
Denition (Variance of the estimator bσ2)In the multiple linear regression model y = Xβ0 + ε, the variance of theestimator bσ2 is
Vbσ2 = 2σ4
N Kwhere σ2 denotes the true value of variance of the error terms. This resultholds whether or not the matrix X is considered as random.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 122 / 174
4.1. Finite sample properties
Summary
Case 1: X stochastic Case 2: X nonstochastic
Mean Ebσ2 = σ2 E
bσ2 = σ2
Variance Vbσ2 = 2σ4
NK Vbσ2 = 2σ4
NK
Cond. mean Ebσ2X = σ2
Cond. var Vbσ2X = 2σ4
NK
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 123 / 174
4.1. Finite sample properties
Finite sample distributions
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 124 / 174
4.1. Finite sample properties
Summary
1 Under assumptions A1-A5, we can derive the two rst moments ofthe (unknown) nite sample distribution of the OLS estimator bβ, i.e.Ebβ and V
bβ .2 Are we able to characterize the nite sample distribution (or exactsampling distribution) of bβ ?
3 For that, we need to put an assumption on the distribution of ε anduse a parametric specication.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 125 / 174
4.1. Finite sample properties
Finite sample distribution
Fact (Finite sample distribution, I)
(1) In a parametric multiple linear regression model with stochasticregressors, the conditional nite sample distribution of bβ is known. Theunconditional nite sample distribution is generally unknown:
bβX D bβ ??
where D is a multivariate distribution.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 126 / 174
4.1. Finite sample properties
Finite sample distribution
Fact (Finite sample distribution, II)
(2) In a parametric multiple linear regression model with nonstochasticregressors, the marginal (unconditional) nite sample distribution of bβ isknown: bβ D
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 127 / 174
4.1. Finite sample properties
Finite sample distribution
Fact (Finite sample distribution, III)
(3) In a semi-parametric multiple linear regression model, with xed orrandom regressors, the nite sample distribution of bβ is unknown
bβX ?? bβ ??
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 128 / 174
4.1. Finite sample properties
Example (Linear Gaussian regression model)
Under the assumption A6 (normality - linear Gaussian regression model),
ε N0,σ2IN
bβ = X>X1 X>y = β+
X>X
1 X>ε
If X is a stochastic matrix, the conditional distribution of bβ is
bβX N β,σ2
X>X
1If X is a nonstochastic matrix, the marginal distribution of bβ is
bβ N β,σ2
X>X
1Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 129 / 174
4.1. Finite sample properties
Theorem (Linear gaussian regression model)
Under the assumption A6 (normality), the estimators bβ and bσ2 have anite sample distribution given by:
bβ N β,σ2
X>X
1bσ2σ2(N K ) χ2 (N K )
Moreover, bβ and bσ2 are independent. This result holds whether or not thematrix X is considered as random. In this last case, the distribution of bβ isconditional to X with V
bβX = σ2X>X
1.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 130 / 174
4.1. Finite sample properties
E¢ cient estimator or BLUE?
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 131 / 174
4.1. Finite sample properties
Question: OLS estimator = "good" estimator of β?
The question is to know if there this estimator is preferred to otherunbiased estimators?
VbβOLS 7 V
bβotherIn general, to answer to this question we use the FDCR orCramer-Rao bound and study the e¢ ciency of the estimator (cf.chapters 1 and 2).
Problem: the computation of the FDCR bound requires anassumption on the distribution of ε.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 132 / 174
4.1. Finite sample properties
Two cases:
1 In a parametric model, it is possible to show that the OLS estimatoris e¢ cient - its variance reaches the FDCR bound. The OLS is theBest Unbiased Estimator (BUE)
2 In a semi-parametric model:1 we dont known the conditional distribution of y ven X and thelikelihood of the sample.
2 The FDCR bound is unknown: so it is impossible to show thee¢ ciency of the OLS estimator.
3 In a semi-parametric model, we introduce another concept: the BestLinear Unbiased Estimator (BLUE)
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 133 / 174
4.1. Finite sample properties
Problem (E¢ ciency of the OLS estimator)
In a semi-parametric multiple linear regression model (with noassumption on the distribution of ε), it is impossible to compute theFDCR or Cramer Rao bound and to show that the e¢ ciency of the OLSestimator. The e¢ ciency of the OLS estimator can only be shown in aparametric multiple linear regression model.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 134 / 174
4.1. Finite sample properties
Theorem (E¢ ciency - Gaussian model)
In a Gaussian multiple linear regression model y = Xβ0 + ε, withε N
0,σ2IN
(assumption A6), the OLS estimator bβ is e¢ cient. Its
variance attains the FDCR or Cramer-Rao bound:
Vbβ = I1N (β0)
This result holds whether or not the matrix X is considered as random.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 135 / 174
4.1. Finite sample properties
Proof
Let us consider the case where X is nonstochastic. Under the assumptionA6, the distribution is known, since y N
Xβ,σ2IN
. The log-likelihood
of the sample fyi , xigNi=1, with xi = (xi1 : .. : xiK )> , is then equal to:
`N (β; y, x) = N2lnσ2 N2ln (2π) (yXβ)> (yXβ)
2σ2
If we denote θ =
β> : σ2>, we have (cf. chapter 2):
IN (θ0) = Eθ
∂2`N (θ; y, x)
∂θ∂θ>
=
1σ2X>X 021012 T
2σ4
!
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 136 / 174
4.1. Finite sample properties
Proof (contd)
IN (θ0) =
1σ2X>X 021012 T
2σ4
!The inverse of this block matrix is given by
I1N (θ0) =
0@ σ2X>X
1021
012 2σ4
T
1A =
I1N (β0) 021012 I1N
σ2 !
Then we have:
Vbβ = I1N (β0) = σ2
X>X
1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 137 / 174
4.1. Finite sample properties
Problem
1 In a semi-parametric model (with no assumption on the distributionof ε), it is impossible to compute the FDCR bound and to show thee¢ ciency of the OLS estimator.
2 The solution consists in introducing the concept of best linearunbiased estimator (BLUE): the Gauss-Markov theorem...
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 138 / 174
4.1. Finite sample properties
Theorem (Gauss-Markov theorem)In the linear regression model under assumptions A1-A5, the least squaresestimator bβ is the best linear unbiased estimator (BLUE) of β0 whetherX is stochastic or nonstochastic.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 139 / 174
4.1. Finite sample properties
Comment
The estimator bβk for k = 1, ..K is the BLUE of βk
Best = smallest variance
Linear (in y or yi ) : bβk = ∑Ni=1 ωkiyi
Unbiased: Ebβk = βk
Estimator: bβk = f (y1, .., yN )
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 140 / 174
4.1. Finite sample properties
Remark
The fact that the OLS is the BLUE means that there is no linearunbiased estimator with a smallest variance than bβBUT, nothing guarantees that a nonlinear unbiased estimator mayhave a smallest variance than bβ
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 141 / 174
4.1. Finite sample properties
Summary
Model Case 1: X stochastic Case 2: X nonstochastic
Semi-parametric bβ is the BLUE bβ is the BLUE(Gauss-Markov) (Gauss-Markov)
Parametric bβ is e¢ cient and BUE bβ is e¢ cient and BUE(FDCR bound) (FDCR bound)
BUE: Best Unbiased Estimator
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 142 / 174
4.1. Finite sample properties
Other remarks
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 143 / 174
4.1. Finite sample properties
Remarks
1 Including irrelevant variables (i.e., coe¢ cient = 0) does not a¤ectthe unbiasedness of the estimator.
2 Irrelevant variables decrease the precision of OLS estimator, but donot result in bias.
3 Omitting relevant regressors obviously does a¤ect the estimator,since the zero conditional mean assumption does not hold anymore.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 144 / 174
4.1. Finite sample properties
Remarks
1 Omitting relevant variables cause bias (under some conditions onthese variables).
1 The direction of the bias can be (sometimes) characterized.
2 The size of the bias is more complicated to derive.
2 More generally, correlation between a single regressor and the errorterm (endogeneity) usually results in all OLS estimators being biased
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 145 / 174
4.1. Finite sample properties
Key Concepts
1 OLS unbiased estimator under assumption A3 (exogeneity)
2 Mean and variance of the OLS estimator under assumptions A3-A4(exogeneity - spherical disturbances)
3 Finite sample distribution
4 Estimator of the variance of the error terms
5 BLUE estimator and e¢ cient estimator (FDCR bound)
6 Gauss-Markov theorem
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 146 / 174
Subsection 4.2.
Asymptotic properties of the OLS estimator
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 147 / 174
4.2. Asymptotic properties
Objectives
The objectives of this subsection are the following
1 Study the consistency of the OLS estimator.
2 Derive the asymptotic distribution of the OLS estimator bβ.3 Estimate the asymptotic variance covariance matrix of bβ.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 148 / 174
4.2. Asymptotic properties
Theorem (Consistency)
Let consider the multiple linear regression model yi = x>i β0 + εi , withxi = (xi1 : .. : xiK )
> . Under assumptions A1-A5, the OLS estimatorsbβ and bσ2 are (weakly) consistent:bβ p! β0
bσ2 p! σ2
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 149 / 174
4.2. Asymptotic Properties
Proof (cf. chapter 1)
We have:
bβ = N
∑i=1xix>i
!1 N
∑i=1xiyi
!= β0 +
N
∑i=1xix>i
!1 N
∑i=1xi εi
!
By multiplying and dividing by N, we get:
bβ = β+
1N
N
∑i=1xix>i
!1 1N
N
∑i=1xi εi
!
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 150 / 174
4.2. Asymptotic Properties
Proof (cf. chapter 1)
bβ = β+
1N
N
∑i=1xix>i
!1 1N
N
∑i=1xi εi
!
1 By using the (weak) law of large number (Kitchines theorem), wehave:
1N
N
∑i=1xix>i
p! Exix>i
1N
N
∑i=1xi εi
p! E (xi εi )
2 By using the Slutskys theorem:
bβ p! β+E1xix>i
E (xi εi )
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 151 / 174
4.2. Asymptotic Properties
Proof (cf. chapter 1)
bβ p! β+E1xix>i
E (xiµi )
Under assumption A3 (exogeneity):
E ( εi j xi ) = 0)
E (εixi ) = 0K1E (εi ) = 0
We have bβ p! β
The OLS estimator bβ is (weakly) consistent. Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 152 / 174
4.2. Asymptotic properties
Theorem (Asymptotic distribution)
Consider the multiple linear regression model yi = x>i β0 + εi , withxi = (xi1 : .. : xiK )
> . Under assumptions A1-A5, the OLS estimator bβ isasymptotically normally distributed:
pNbβ β0
d! N
0, σ2E1X
xix>i
or equivalently bβ asy
N
β0,σ2
NE1X
xix>i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 153 / 174
4.2. Asymptotic properties
Comments
1 Whatever the specication used (parametric or semi-parametric), it isalways possible to characterize the asymptotic distribution of theOLS estimator.
2 Under assumptions A1-A5, the OLS estimator bβ is asymptoticallynormally distributed, even if ε is not normally distributed.
3 This result (asymptotic normality) is valid whether X is stochastic ornonstochastic.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 154 / 174
4.2. Asymptotic propertiesProof (cf. chapter 1)
This proof has be shown in the chapter 1 (Estimation theory)
1 Let us rewrite the OLS estimator as:
bβ = β0 +
N
∑i=1xix>i
!1 N
∑i=1xi εi
!
2 Normalize the vector bβ β0
pNbβ β0
=
1N
N
∑i=1xix>i
!1 pN1N
N
∑i=1xi εi
!= A1N bN
AN =1N
N
∑i=1xix>i bN =
pN1N
N
∑i=1xi εi
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 155 / 174
4.2. Asymptotic propertiesProof (cf. chapter 1)
3. Using the WLLN (Kinchines theorem) and the CMP (continuousmapping theorem). Remark: if xi this part is useless.
1N
N
∑i=1xix>i
!1p! E1X
xix>i
4. Using the CLT:
BN =pN
1N
N
∑i=1xi εi E (xi εi )
!d! N (0,V (xi εi ))
Under assumption A3 E ( εi j xi ) = 0 =) E (xi εi ) = 0 and
V (xi εi ) = Exi εi εix>i
= EX
Exi εi εix>i
xi= EX
xiV ( εi j xi ) x>i
= σ2EX
xix>i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 156 / 174
4.2. Asymptotic properties
Proof (cf. chapter 1)
So we have pNbβ β0
= A1N bN
withA1N
p! A1
bNd! N (Eb,Vb)
A1 = E1X
xix>i
Eb = 0 Vb = σ2EX
xix>i
= σ2A
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 157 / 174
4.2. Asymptotic properties
Proof (cf. chapter 1)
A1Np! A1
bNd! N (Eb,Vb)
A1 = E1xix>i
Eb = 0 Vb = σ2EX
xix>i
= σ2A
By using the Sluskys theorem (for a convergence in distribution), we have:
A1N bNd! N (Ec,Vc)
Ec = A1 Eb = A1 0 = 0Vc = A1 Vb A1 = A1 σ2AA1 = σ2A1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 158 / 174
4.2. Asymptotic properties
Proof (cf. chapter 1)
pNbβ β0
= A1N bN
d! N0, σ2A1
with A1 = E1X
x>i xi
(be careful: it is the inverse of the expectation
and not the reverse).
Finally, we have:
pNbβ β0
d! N
0, σ2E1X
xix>i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 159 / 174
4.2. Asymptotic propertiesRemark
1 In some textbooks (Greene, 2007 for instance), the asymptoticvariance covariance matrix of the OLS estimator is expressed as afunction of the plim of N1X>X denoted by Q.
Q = p lim1NX>X
2 Both notations are equivalent since:
1NX>X =
1N
N
∑i=1xix>i
By using the WLLN (Kinchines theorem), we have:
1NX>X
p! Q = EX
xix>i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 160 / 174
4.2. Asymptotic properties
Theorem (Asymptotic distribution)
Consider the multiple linear regression model y = Xβ0 + ε. Underassumptions A1-A5, the OLS estimator bβ is asymptotically normallydistributed: p
Nbβ β0
d! N
0, σ2Q1
where
Q = p lim1NX>X = EX
x>i xi
or equivalently bβ asy
N
β0,σ2
NQ1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 161 / 174
4.2. Asymptotic properties
Denition (Asymptotic variance covariance matrix)
The asymptotic variance covariance matrix of the OLS estimators bβ isequal to
Vasy
bβ = σ2
NE1X
xix>i
or equivalently
Vasy
bβ = σ2
NQ1
where Q = p limN1X>X.
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 162 / 174
4.2. Asymptotic properties
Remark
Finite sample variance
Vbβ = σ2EX
X>X
1Asymptotic variance
Vasy
bβ = σ2
NE1X
xix>i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 163 / 174
4.2. Asymptotic properties
Remark
How to estimate the asymptotic variance covariance matrix of the OLSestimator?
Vasy
bβ = σ2
NE1X
xix>i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 164 / 174
4.2. Asymptotic properties
Denition (Estimator of the asymptotic variance matrix)A consistent estimator of the asymptotic variance covariance matrix isgiven by: bVasy
bβ = bσ2 X>X1where bσ2 is consistent estimator of σ2 :
bσ2 = bε>bεN K
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 165 / 174
4.2. Asymptotic properties
Proof
Vasy
bβ = σ2
NE1X
xix>i
We known that: bσ2 p! σ2
X>XN
=1N
N
∑i=1xix>i
p! EX
xix>i
= Q
By using the CMP and the Slutsky theorem, we get:
bσ2 X>XN
!1p! σ2E1X
xix>i
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 166 / 174
4.2. Asymptotic propertiesProof (contd)
Vasy
bβ = σ2
NE1X
xix>i
bσ2 X>X
N
!1p! σ2E1X
xix>i
As consequence
bσ2N
X>XN
!1p! σ2
NE1X
xix>i
bσ2 X>X1 p! Vasy
bβor equivalently bVasy
bβ p! Vasy
bβ
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 167 / 174
4.2. Asymptotic properties
Remark
Under assumptions A1-A5, for a stochastic matrix X, we have
Finite sample Asymptotic
Variance Vbβ = σ2EX
X>X
1Vasy
bβ = σ2N1Q1
with Q = p lim 1NX
>X
Estimator bV bβ = bσ2 X>X1 bVasy
bβ = bσ2 X>X1
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 168 / 174
4.2. Asymptotic properties
Example (CAPM)The empirical analogue of the CAPM is given by:
erit = αi + βiermt + εt
erit = rit rft| z excess return of security i at time t
ermt = (rmt rft )| z market excess return at time t
where εt is an i .i .d . error term with:
E (εt ) = 0 V (εt ) = σ2 E ( εt jermt ) = 0Data le: capm.xls for Microsoft, SP500 and Tbill (closing prices) from11/1/1993 to 04/03/2003
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 169 / 174
4.2. Asymptotic sample properties
Example (CAPM, contd)
Question: write a Matlab code in order to nd (1) the estimates of α andβ for Microsoft, (2) the corresponding standard errors, (3) the SE of theregression, (4) the SSR, (5) the coe¢ cient of determination and (6) theadjusted R2. Compare your results to the following table (Eviews)
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 170 / 174
4.2. Asymptotic sample properties
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 171 / 174
4.2. Asymptotic sample properties
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 172 / 174
4.2. Asymptotic sample properties
Key Concepts
1 The OLS estimator is weakly consist
2 The OLS estimator is asymptotically normally distributed
3 Asymptotic variance covariance matrix
4 Estimator of the asymptotic variance
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 173 / 174
End of Chapter 3
Christophe Hurlin (University of Orléans)
Christophe Hurlin (University of Orléans) Advanced Econometrics - HEC Lausanne November 23, 2013 174 / 174