
  • DEEQA,Ecole Doctorale MPSE

    Academic year 2003-2004

    Advanced Econometrics

    Panel data econometrics and GMM estimation

    Alban Thomas, MF 102, [email protected]

  • Purpose of the course

    Present recent developments in econometrics that allow for a consistent treatment of the impact of unobserved heterogeneity on model predictions: panel data analysis.

    Present a convenient econometric framework for dealing with restrictions imposed by theory: Method of Moments estimation.

    Deal with discrete-choice models with unobserved heterogeneity.

    Two keywords: unobserved heterogeneity and endogeneity.

    Methods:

    - Fixed Effects Least Squares

    - Generalized Least Squares

    - Instrumental Variables

    - Maximum Likelihood estimation for Panel Data models

    - Generalized Method of Moments for Time Series

    - Generalized Method of Moments for Panel Data

    - Heteroskedasticity-consistent estimation

    - Dynamic Panel Data models

    - Logit and Probit models for Panel Data

    - Simulation-based inference

    - Nonparametric and Semiparametric estimation

    Statistical software: SAS, GAUSS, STATA (?)


  • Contents

    I Panel Data Models

    1 Introduction

    1.1 Gains in pooling cross section and time series

    1.1.1 Discrimination between alternative models

    1.1.2 Examples

    1.1.3 Less collinearity between explanatory variables

    1.1.4 May reduce bias due to missing or unobserved variables

    1.2 Analysis of variance

    1.3 Some definitions

    2 The linear model

    2.1 Notation

    2.1.1 Model notation

    2.1.2 Standard matrices and operators

    2.1.3 Important properties of operators

    2.2 The One-Way Fixed Effects model

    2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem

    2.2.2 Interpretation as a covariance estimator

    2.2.3 Comments

    2.2.4 Testing for poolability and individual effects

    2.3 The Random Effects model

    2.3.1 Notation and assumptions

    2.3.2 GLS estimation of the Random-effect model

    2.3.3 Comparison between GLS, OLS and Within

    2.3.4 Fixed individual effects or error components?

    2.3.5 Example: Wage equation, Hausman (1978)

    2.3.6 Best Quadratic Unbiased Estimators (BQU) of variances

    3 Extensions

    3.1 The Two-way panel data model

    3.1.1 The Two-way fixed-effect model

    3.1.2 Example: Production function (Hoch 1962)

    3.2 More on non-spherical disturbances

    3.2.1 Heteroskedasticity in individual effect

    3.2.2 `Typical' heteroskedasticity

    3.3 Unbalanced panel data models

    3.3.1 Introduction

    3.3.2 Fixed effect models for unbalanced panels

    4 Augmented panel data models

    4.1 Introduction

    4.2 Choice between Within and GLS

    4.3 An important test for endogeneity

    4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator

    4.4.1 Instrumental Variable estimation

    4.4.2 IV in a panel-data context

    4.4.3 Exogeneity assumptions and a first instrument matrix

    4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt

    4.5 Computation of variance-covariance matrix for IV estimators

    4.5.1 Full IV-GLS estimation procedure

    4.6 Example: Wage equation

    4.6.1 Model specification

    4.7 Application: returns to education

    4.7.1 Variables related to job status

    4.7.2 Variables related to characteristics of household heads

    5 Dynamic panel data models

    5.1 Motivation

    5.1.1 Dynamic formulations from dynamic programming problems

    5.1.2 Euler equations and consumption

    5.1.3 Long-run relationships in economics

    5.2 The dynamic fixed-effect model

    5.2.1 Bias in the Fixed-Effects estimator

    5.2.2 Instrumental-variable estimation

    5.3 The Random-effects model

    5.3.1 Bias in the ML estimator

    5.3.2 An equivalent representation

    5.3.3 The role of initial conditions

    5.3.4 Possible inconsistency of GLS

    5.3.5 Example: The Balestra-Nerlove study

    II Generalized Method of Moments estimation

    6 The GMM estimator

    6.1 Moment conditions and the method of moments

    6.1.1 Moment conditions

    6.1.2 Example: Linear regression model

    6.1.3 Example: Gamma distribution

    6.1.4 Method of moments estimation

    6.1.5 Example: Poisson counting model

    6.1.6 Comments

    6.2 The Generalized Method of Moments (GMM)

    6.2.1 Introduction

    6.2.2 Example: Just-identified IV model

    6.2.3 A definition

    6.2.4 Example: The IV estimator again

    6.3 Asymptotic properties of the GMM estimator

    6.3.1 Consistency

    6.3.2 Asymptotic normality

    6.4 Optimal and two-step GMM

    6.5 Inference with GMM

    6.6 Extension: optimal instruments for GMM

    6.6.1 Conditional moment restrictions

    6.6.2 A first feasible estimator

    6.6.3 Nearest-neighbor estimation of optimal instruments

    6.6.4 Generalizing the approach: other nonparametric estimators

    7 GMM estimators for time series models

    7.1 GMM and Euler equation models

    7.1.1 Hansen and Singleton framework

    7.1.2 GMM estimation

    7.2 GMM Estimation of MA models

    7.2.1 A simple estimator

    7.2.2 A more efficient estimator

    7.2.3 Example: The Durbin estimator

    7.3 GMM Estimation of ARMA models

    7.3.1 The ARMA(1,1) model

    7.3.2 IV estimation

    7.4 Covariance matrix estimation

    7.4.1 Example 1: Conditional homoskedasticity

    7.4.2 Example 2: Conditional heteroskedasticity

    7.4.3 Example 3: Covariance stationary process

    7.4.4 The Newey-West estimator

    7.4.5 Weighted autocovariance estimators

    7.4.6 Weighted periodogram estimators

    8 GMM estimators for dynamic panel data

    8.1 Introduction

    8.2 The Arellano-Bond estimator

    8.2.1 Model assumptions

    8.2.2 Implementation of the GMM estimator

    8.3 More efficient procedures (Ahn-Schmidt)

    8.3.1 Additional assumptions

    8.4 The Blundell-Bond estimator

    8.5 Dynamic models with Multiplicative effects

    8.5.1 Multiplicative individual effects

    8.5.2 Mixed structure

    8.6 Example: Wage equation

    III Discrete choice models

    9 Nonlinear panel data models

    9.1 Brief review of binary discrete-choice models

    9.1.1 Linear Probability model

    9.1.2 Logit model

    9.1.3 Probit model

    9.2 Logit models for panel data

    9.2.1 Sufficient statistics

    9.2.2 Conditional probabilities

    9.2.3 Example: T = 2

    9.3 Probit models

    9.4 Semiparametric estimation of discrete-choice models

    9.4.1 The binary choice model

    9.4.2 The IV estimator

    9.5 SML estimation of selection models

    9.5.1 The GHK simulator

    9.5.2 Example

    Appendix 1. Maximum-Likelihood estimation of the Random-effect model

    Appendix 2. The two-way random effects model

    Appendix 3. The one-way unbalanced random effects model

    Appendix 4. ML estimation of dynamic panel models

    Appendix 5. GMM estimation of static panel models

    Appendix 6. A framework for simulation-based inference

    Appendix 7. Example: the SAS© Software

    Appendix 8. A crash course in Gauss©

    Appendix 9. Example: The Gauss© software

    Appendix 10. IV and GMM estimation with Gauss©

    Appendix 11. DPD estimation with Gauss©

    References

  • Part I

    Panel Data Models


  • Chapter 1

    Introduction

    Panel data: sequential observations on a number of units (individuals, firms).

    Also called cross-sections over time, longitudinal data, or pooled cross-section time-series data.

    1.1 Gains in pooling cross section and time series

    1.1.1 Discrimination between alternative models

    Many economic models take the form

    $$F(Y, X, Z, \theta) = 0,$$

    where $Y$: individual control variables (workers, firms); $X$: (public policy or principal's) variables; $Z$: (fixed) individual attributes; $\theta$: parameters.

    Linear model:

    $$Y = \beta_0 + \beta_x X + \beta_z Z + u.$$

    Alternative views concerning this model:

    - Policy variables have a significant impact whatever the individual characteristics, or

    - Differences across individuals are due to idiosyncratic individual features, not included in $Z$.

    In practice, observed differences across individuals may be due to both inter-individual differences and the impact of policy variables.

    1.1.2 Examples

    a) $WAGE = \beta_0 + \beta_1\, EDUCATION + \beta_2 Z$.

    - People with a higher education level have higher wages because firms value those people more;

    - People have higher education because they have higher ability (expected productivity) anyway, and firms value worker ability more.

    b) $SALES = \beta_0 + \beta_1\, ADVERTISEMENT + \beta_2 Z$.

    - Advertisement expenditures boost sales;

    - More efficient firms enjoy more sales, and thus have more money for advertisement expenditures.

    c) $OUTPUT = \beta_0 + \beta_1\, REGULATION + \beta_2 Z$.

    - Regulatory control affects firm output;

    - Firms with higher output are more regulated on average.

    d) $WAGE = \beta_0 + \beta_1\, \mathbb{1}(UNION) + \beta_2 Z$.

    - Belonging to a union significantly raises wages;

    - Firms react to higher wages imposed by unions by hiring higher-quality workers, and $\mathbb{1}(UNION)$ is a proxy for worker quality.

    1.1.3 Less collinearity between explanatory variables

    In consumer or production economics, input, output or consumer prices are difficult to use, because:

    - Time-series: aggregated macro price indexes are highly correlated;

    - Cross-sections: not enough price variation across individuals or firms.

    With panel data, variations across individuals and across time periods are both accounted for. By contrast:

    - Time-series: no information on the impact of individual characteristics (socioeconomic variables, ...);

    - Cross-sections: no information on adjustment dynamics. Estimates may reflect inter-individual differences inherent in comparisons of different people or firms.

    1.1.4 May reduce bias due to missing or unobserved variables

    With panel data, it is easy to control for unobserved heterogeneity across individuals. This is critical in practice, and explains why panel data models are now so popular in micro- and macro-econometrics.

    This point is related to endogeneity and omitted-variable issues.

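    Example: output supply function under perfect competition.

    $$\max_Q \; \pi = pQ - C(\theta, Q), \quad \text{where } C(\theta, Q) = \theta\, c(Q),$$

    $$\Leftrightarrow \; p = \theta\,\frac{\partial c(Q)}{\partial Q} = \theta A Q^{\alpha-1} \;\text{(Cobb-Douglas)} \quad \text{or} \quad \theta(\alpha_0 + \alpha_1 Q) \;\text{(Quadratic)}.$$

    Cobb-Douglas case: $\log Q = \frac{1}{\alpha-1}\left(\log p - \log\theta - \log A\right)$. From the equilibrium condition to an estimable equation: observations $(Q_{it}, p_{it})$, unobserved heterogeneity $\theta_i$, firm $i$, period $t$:

    $$\log Q_{it} = \frac{1}{\alpha-1}\left(\log p_{it} - \log\theta_i - \log A\right).$$

    Identification issue: the estimable equation is

    $$\tilde{Q}_{it} = a_0 + a_1 \tilde{p}_{it} + u_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T,$$

    where $\tilde{Q}_{it} = \log Q_{it}$, $\tilde{p}_{it} = \log p_{it}$, $a_1 = 1/(\alpha-1)$, $a_0 = -(\log A + E\log\theta_i)/(\alpha-1)$, and $E u_{it} = 0$. The model is identified if $E\log\theta_i = 0$ (loosely, $\theta_i$ normalized around 1); otherwise the estimate of $A$ is biased if $\theta_i$ is overlooked and $E\log\theta_i \neq 0$.

    Empirical issue: possible correlation between output price $p_{it}$ and efficiency term $\theta_i$.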

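    1.2 Analysis of variance

    Consider the model

    $$y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T_i,$$

    where $x_{it}$ is scalar, $\alpha_i$ and $\beta_i$ are parameters, and $T_i$ is the number of time periods available for individual $i$.

    Useful first-order empirical moments are

    $$\bar{y}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}, \qquad \bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it},$$

    $$S_{xxi} = \sum_{t=1}^{T_i}(x_{it}-\bar{x}_i)^2, \qquad S_{xyi} = \sum_{t=1}^{T_i}(x_{it}-\bar{x}_i)(y_{it}-\bar{y}_i),$$

    and

    $$S_{yyi} = \sum_{t=1}^{T_i}(y_{it}-\bar{y}_i)^2, \quad i = 1, 2, \ldots, N.$$

    Least-squares parameter estimates are computed as

    $$\hat\beta_i = S_{xyi}/S_{xxi} \quad \text{and} \quad \hat\alpha_i = \bar{y}_i - \bar{x}_i\hat\beta_i,$$

    and the Residual Sum of Squares (RSS) for individual $i$ is

    $$RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi}, \quad \text{with } (T_i - 2) \text{ degrees of freedom}.$$

    Consider now a restricted model with constant slopes and constant intercepts:

    $$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it},$$

    which obtains by imposing the restrictions

    $$\alpha_1 = \alpha_2 = \cdots = \alpha_N \;(= \alpha), \qquad \beta_1 = \beta_2 = \cdots = \beta_N \;(= \beta).$$

    Under these restrictions, least-squares parameter estimates would be

    $$\hat\beta = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it}-\bar{x})(y_{it}-\bar{y})}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it}-\bar{x})^2}$$

    and $\hat\alpha = \bar{y} - \bar{x}\hat\beta$, where

    $$\bar{y} = \frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} y_{it}, \qquad \bar{x} = \frac{1}{\sum_{i=1}^{N} T_i}\sum_{i=1}^{N}\sum_{t=1}^{T_i} x_{it}.$$

    The Residual Sum of Squares is

    $$RSS = \sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it}-\bar{y})^2 - \frac{\left[\sum_{i=1}^{N}\sum_{t=1}^{T_i}(y_{it}-\bar{y})(x_{it}-\bar{x})\right]^2}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}(x_{it}-\bar{x})^2},$$

    with $\sum_{i=1}^{N} T_i - 2$ degrees of freedom.

    For a majority of applications, the first model is too general and estimation would require a great number of time observations. If unobserved heterogeneity is additive in the model, we might consider the following specification with constant slope and different intercepts:

    $$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}.$$

    Minimizing $\sum_i\sum_t (y_{it} - \alpha_i - x_{it}\beta)^2$ with respect to the $\alpha_i$ and $\beta$, the first-order conditions are

    $$\sum_t (y_{it} - \alpha_i - x_{it}\beta) = 0 \;\; (i = 1, \ldots, N), \qquad \sum_i\sum_t x_{it}(y_{it} - \alpha_i - x_{it}\beta) = 0,$$

    so that

    $$\hat\alpha_i = \bar{y}_i - \bar{x}_i\hat\beta \quad \text{and} \quad \hat\beta = \frac{\sum_i\sum_t x_{it}(y_{it}-\bar{y}_i)}{\sum_i\sum_t x_{it}(x_{it}-\bar{x}_i)}.$$

    The Residual Sum of Squares now has $\sum_i T_i - (N+1)$ degrees of freedom ($N+1$ parameters are estimated).

    This is the most popular model encountered in empirical applications.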
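    The three specifications of this section can be compared numerically. The following is a minimal sketch on simulated data (the panel dimensions, the data-generating process and all variable names are assumptions made for illustration, not part of the course material); it computes the RSS of each model for a balanced panel with a scalar regressor.

    ```python
    import numpy as np

    # Hypothetical balanced panel: N individuals, T periods, scalar regressor.
    rng = np.random.default_rng(0)
    N, T = 5, 8
    alpha = rng.normal(size=N)                      # individual intercepts
    x = rng.normal(size=(N, T)) + alpha[:, None]    # regressor correlated with intercepts
    y = alpha[:, None] + 2.0 * x + 0.1 * rng.normal(size=(N, T))

    # Deviations from individual means
    xd_i = x - x.mean(axis=1, keepdims=True)
    yd_i = y - y.mean(axis=1, keepdims=True)

    # (1) Individual intercepts and individual slopes: RSS_i = Syy_i - Sxy_i^2 / Sxx_i
    Sxx = (xd_i ** 2).sum(axis=1)
    Sxy = (xd_i * yd_i).sum(axis=1)
    Syy = (yd_i ** 2).sum(axis=1)
    rss_individual = (Syy - Sxy ** 2 / Sxx).sum()

    # (2) Common intercept and common slope (pooled model)
    xd, yd = x - x.mean(), y - y.mean()
    beta_pooled = (xd * yd).sum() / (xd ** 2).sum()
    rss_pooled = (yd ** 2).sum() - (xd * yd).sum() ** 2 / (xd ** 2).sum()

    # (3) Individual intercepts, common slope
    beta_common = (x * yd_i).sum() / (x * xd_i).sum()
    alpha_hat = y.mean(axis=1) - beta_common * x.mean(axis=1)
    rss_common = ((y - alpha_hat[:, None] - beta_common * x) ** 2).sum()

    # Degrees of freedom in the balanced case (T_i = T for all i)
    df = {"individual": N * T - 2 * N, "common_slope": N * T - (N + 1), "pooled": N * T - 2}
    ```

    Because the models are nested, the RSS can only increase as restrictions are added, which is what the F tests of Section 2.2.4 exploit.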

    1.3 Some definitions

    - Typical panel: the number of units (individuals) $N$ is large, and the number of time periods $T$ is small.

    - Short (long) panel: the number of periods $T$ is small (large).

    - Balanced panel: same number of periods for every unit (individual).

    - Rotating panel: a subset of individuals is replaced every period. Rotating panels can be balanced or unbalanced.

    - Pseudo panel: obtained by pooling cross-sections made of different individuals for every period.

    - Attrition: with long panels, the probability that an individual remains in the sample decreases as the number of periods increases (non-response, moving, death, etc.).

  • Chapter 2

    The linear model

    2.1 Notation

    $$y_{it} = x_{it}\beta + u_{it}, \quad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T,$$

    where $x_{it}$ is a $K$-vector, $\beta$ is a $(K \times 1)$ vector of parameters, and $u_{it}$ is the residual term.

    $y_{it}$ and the components of $x_{it}$ vary both over time and across individuals.

    Component of the dependent variable that is unexplained by $x_{it}$:

    $$u_{it} = \alpha_i + \lambda_t + \varepsilon_{it},$$

    where $\alpha_i$ is the time-invariant individual effect, $\lambda_t$ is the time effect, and $\varepsilon_{it}$ is the i.i.d. component.

    One-way error-component model: $u_{it} = \alpha_i + \varepsilon_{it}$.

    Two-way error-component model: $u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}$.

    This allows several predictions of $y_{it}$ given $x_{it}$:

    - $E(y_{it} \mid x_{it}) = x_{it}\beta$, across $i$ and $t$;

    - $E(y_{it} \mid x_{it}, \alpha_i) = x_{it}\beta + \alpha_i$, for individual $i$, across periods;

    - $E(y_{it} \mid x_{it}, \lambda_t) = x_{it}\beta + \lambda_t$, for period $t$, across individuals;

    - $E(y_{it} \mid x_{it}, \alpha_i, \lambda_t) = x_{it}\beta + \alpha_i + \lambda_t$, for individual $i$ and period $t$.

    2.1.1 Model notation

    2.1.1.1 Model in matrix form

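    $$Y = X\beta + \alpha + \lambda + \varepsilon,$$

    where $Y$, $\alpha$, $\lambda$ and $\varepsilon$ are $(NT \times 1)$, and $X$ is $(NT \times K)$. Convention: index $t$ runs faster, index $i$ runs slower:

    $$\begin{pmatrix} y_{11} \\ \vdots \\ y_{1T} \\ y_{21} \\ \vdots \\ y_{2T} \\ \vdots \\ y_{it} \\ \vdots \\ y_{N1} \\ \vdots \\ y_{NT} \end{pmatrix} = \begin{bmatrix} X^{(1)}_{11} & \cdots & X^{(K)}_{11} \\ \vdots & & \vdots \\ X^{(1)}_{1T} & \cdots & X^{(K)}_{1T} \\ X^{(1)}_{21} & \cdots & X^{(K)}_{21} \\ \vdots & & \vdots \\ X^{(1)}_{2T} & \cdots & X^{(K)}_{2T} \\ \vdots & & \vdots \\ X^{(1)}_{it} & \cdots & X^{(K)}_{it} \\ \vdots & & \vdots \\ X^{(1)}_{N1} & \cdots & X^{(K)}_{N1} \\ \vdots & & \vdots \\ X^{(1)}_{NT} & \cdots & X^{(K)}_{NT} \end{bmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \\ \vdots \\ \beta_K \end{pmatrix} + \alpha + \lambda + \varepsilon$$

    2.1.1.2 Model in vector form

    $$y_i = X_i\beta + \boldsymbol{\alpha}_i + \lambda + \varepsilon_i, \quad i = 1, 2, \ldots, N,$$

    where $y_i$ is $T \times 1$ and $X_i$ is $T \times K$. Note: $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_T)'$ and $\boldsymbol{\alpha}_i = (\alpha_i, \alpha_i, \ldots, \alpha_i)'$ are $(T \times 1)$.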

    2.1.2 Standard matrices and operators

    - $I_{NT}$: identity matrix with $NT$ rows and $NT$ columns;

    - $e_T$: $T$-vector of ones;

    - $B = I_N \otimes (1/T)\,e_T e_T'$ (Between-individual operator);

    - $\bar{B} = (1/N)\,e_N e_N' \otimes I_T$ (Between-period operator);

    - $Q = I_{NT} - I_N \otimes (1/T)\,e_T e_T' = I_{NT} - B$ (Within-individual operator);

    - $\bar{Q} = I_{NT} - (1/N)\,e_N e_N' \otimes I_T = I_{NT} - \bar{B}$ (Within-period operator);

    - $B\bar{B} = (1/NT)\,e_{NT} e_{NT}'$ (computes the full population mean).

    Important assumption: no intercept term in the model (otherwise, use $B\bar{B}$ to demean all variables).

    The $B$ operators are used to compute, from $NT$-row vectors and matrices, individual- or time-specific means of variables, which are stored in matrices of row dimension $NT$.

    The $Q$ operators are used to compute deviations from these means.

    2.1.3 Important properties of operators

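    Symmetry, idempotency and orthogonality:

    $$Q' = Q, \quad B' = B, \quad Q^2 = Q, \quad B^2 = B, \quad BQ = QB = 0.$$

    The rank of an idempotent matrix equals its trace, hence

    $$\operatorname{rank}(Q) = N(T-1) \quad \text{and} \quad \operatorname{rank}(B) = N.$$

    Decomposition of the $Q$ operator with $N = T = 2$:

    $$Qy = \left( \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} - \begin{bmatrix} 1&0 \\ 0&1 \end{bmatrix} \otimes \frac{1}{2}\begin{bmatrix} 1&1 \\ 1&1 \end{bmatrix} \right) y$$

    $$= \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} - \frac{1}{2}\begin{bmatrix} 1&1&0&0 \\ 1&1&0&0 \\ 0&0&1&1 \\ 0&0&1&1 \end{bmatrix}\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix} - \frac{1}{2}\begin{pmatrix} y_{11}+y_{12} \\ y_{11}+y_{12} \\ y_{21}+y_{22} \\ y_{21}+y_{22} \end{pmatrix}.$$

    We will also use

    - $B_T = (1/T)\,e_T e_T'$: Between operator for a single individual;

    - $Q_T = I_T - (1/T)\,e_T e_T' = I_T - B_T$: Within operator for a single individual.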
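    The operators above can be built with Kronecker products and their properties checked numerically. This is a minimal NumPy sketch; the dimensions and the names `B_bar` and `Q_bar` (for the between-period and within-period operators) are illustrative assumptions.

    ```python
    import numpy as np

    N, T = 3, 4                      # illustrative dimensions
    e_T = np.ones((T, 1))
    e_N = np.ones((N, 1))

    B = np.kron(np.eye(N), e_T @ e_T.T / T)        # between-individual operator
    B_bar = np.kron(e_N @ e_N.T / N, np.eye(T))    # between-period operator
    Q = np.eye(N * T) - B                          # within-individual operator
    Q_bar = np.eye(N * T) - B_bar                  # within-period operator

    # Stack a panel with t running faster than i, as in the text.
    Y = np.arange(N * T, dtype=float).reshape(N, T)
    y = Y.ravel()

    individual_means = np.repeat(Y.mean(axis=1), T)   # mean of unit i, repeated T times
    period_means = np.tile(Y.mean(axis=0), N)         # mean of period t, tiled N times

    rank_Q = np.linalg.matrix_rank(Q)
    rank_B = np.linalg.matrix_rank(B)
    ```

    Here `B @ y` reproduces `individual_means`, `B_bar @ y` reproduces `period_means`, and `B @ B_bar` is the matrix with all entries equal to 1/(NT).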

    2.2 The One-Way Fixed Effects model

    Terminology: the fixed-effects model does not mean that the individual effects $\alpha_i$ are not random in the true model! Rather, estimation is conditional on unobserved heterogeneity: the $\alpha_i$'s are treated as parameters to be estimated.

    2.2.1 The estimator in terms of the Frisch-Waugh-Lovell theorem

    Inference is conditional on individual effects: estimates obtain by regressing $Y$ on $X$ and on individual dummies.

    Let $E$ be the $NT \times N$ matrix of individual dummy variables, whose column $i$ equals one on the $T$ rows of individual $i$ and zero elsewhere:

    $$E = \begin{bmatrix} e_T & 0 & \cdots & 0 \\ 0 & e_T & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e_T \end{bmatrix} \quad \text{(columns } i = 1, i = 2, \ldots, i = N\text{)},$$

    and consider the model

    $$Y = X\beta + E\alpha + \varepsilon = W\gamma + u,$$

    where $W = [X, E]$, $\gamma = (\beta', \alpha')'$ and $u = \varepsilon$.

    Frisch-Waugh-Lovell theorem: parameter estimates of $\beta$ are numerically identical in the two following procedures:

    - $\hat\beta$ taken from the OLS estimate $\hat\gamma = (\hat\beta', \hat\alpha')' = (W'W)^{-1}W'Y$;

    - $\hat\beta = (X^{*\prime}X^{*})^{-1}X^{*\prime}Y^{*}$, where $X^{*} = [I - E(E'E)^{-1}E']X = P_E X$ and $Y^{*} = [I - E(E'E)^{-1}E']Y = P_E Y$ (residuals from the least-squares regression of $X$ and $Y$ on $E$).

    But $E = I_N \otimes e_T$ and $E'E = I_N \otimes e_T'e_T = T\,I_N$, so

    $$P_E = I - E(E'E)^{-1}E' = I - \frac{1}{T}E E' = I - \frac{1}{T}(I_N \otimes e_T)(I_N \otimes e_T)' = I - I_N \otimes \frac{1}{T}e_T e_T' = Q.$$

    Hence

    $$\hat\beta = (X^{*\prime}X^{*})^{-1}(X^{*\prime}Y^{*}) = (X'P_E'P_E X)^{-1}(X'P_E'P_E Y) = (X'QX)^{-1}(X'QY).$$

    Idea behind the fixed-effect estimation procedure: eliminating the individual effects is equivalent to expressing all variables in deviations from their individual-specific means.

    Transformation of the linear model as follows:

    $$y_{it} - \frac{1}{T}\sum_t y_{it} = \Big(x_{it} - \frac{1}{T}\sum_t x_{it}\Big)\beta + u_{it} - \frac{1}{T}\sum_t u_{it}$$

    $$\Leftrightarrow \; Y - BY = (X - BX)\beta + u - Bu \;\Leftrightarrow\; QY = QX\beta + Qu.$$

    Least-squares parameter estimate:

    $$\hat\beta = [(QX)'(QX)]^{-1}(QX)'QY = [X'Q'QX]^{-1}(X'Q'QY) = (X'QX)^{-1}X'QY \quad \text{and} \quad Var(\hat\beta) = \sigma_\varepsilon^2 (X'QX)^{-1}.$$
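    The numerical equivalence stated by the Frisch-Waugh-Lovell theorem can be checked directly. The sketch below uses simulated data (dimensions, coefficients and variable names are illustrative assumptions): it compares the coefficients on $X$ from the dummy-variable regression with the Within estimator $(X'QX)^{-1}X'QY$.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    N, T, K = 6, 5, 2                                  # illustrative dimensions
    alpha = rng.normal(size=N)                         # individual effects
    X = rng.normal(size=(N * T, K)) + np.repeat(alpha, T)[:, None]
    beta = np.array([1.0, -0.5])
    Y = X @ beta + np.repeat(alpha, T) + 0.1 * rng.normal(size=N * T)

    # Procedure 1: OLS of Y on W = [X, E], with E the matrix of individual dummies
    E = np.kron(np.eye(N), np.ones((T, 1)))
    W = np.hstack([X, E])
    gamma_hat = np.linalg.lstsq(W, Y, rcond=None)[0]
    beta_lsdv = gamma_hat[:K]

    # Procedure 2: Within estimator, using Q = I - E (E'E)^{-1} E' = I - E E' / T
    Q = np.eye(N * T) - E @ E.T / T
    XQX = X.T @ Q @ X
    beta_within = np.linalg.solve(XQX, X.T @ Q @ Y)

    # Variance estimate with the degrees of freedom of the dummy-variable model
    resid = Q @ Y - Q @ X @ beta_within
    sigma2_eps = resid @ resid / (N * (T - 1) - K)
    var_beta = sigma2_eps * np.linalg.inv(XQX)
    ```

    The Within route avoids inverting the $(K+N)$-dimensional cross-product matrix of $W$, which matters when $N$ is large.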

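    2.2.2 Interpretation as a covariance estimator

    The model is, in vector form:

    $$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}\beta + \begin{bmatrix} e_T \\ 0_T \\ \vdots \\ 0_T \end{bmatrix}\alpha_1 + \begin{bmatrix} 0_T \\ e_T \\ \vdots \\ 0_T \end{bmatrix}\alpha_2 + \cdots + \begin{bmatrix} 0_T \\ 0_T \\ \vdots \\ e_T \end{bmatrix}\alpha_N + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix},$$

    with assumptions:

    $$E(\varepsilon_i) = 0, \quad E(\varepsilon_i\varepsilon_i') = \sigma_\varepsilon^2 I_T, \quad E(\varepsilon_i\varepsilon_j') = 0 \;\; (i \neq j).$$

    OLS estimates of $\beta$ and $\alpha_i$ obtain by

    $$\min \sum_{i=1}^{N}\varepsilon_i'\varepsilon_i = \sum_{i=1}^{N}(y_i - e_T\alpha_i - x_i\beta)'(y_i - e_T\alpha_i - x_i\beta)$$

    $$\Rightarrow \; \hat\alpha_i = \bar{y}_i - \bar{x}_i\beta, \quad i = 1, 2, \ldots, N,$$

    and substituting in the partial derivative with respect to $\beta$, we have

    $$\hat\beta = \left[\sum_{i,t}^{N,T}(x_{it}-\bar{x}_i)(x_{it}-\bar{x}_i)'\right]^{-1}\left[\sum_{i,t}^{N,T}(x_{it}-\bar{x}_i)(y_{it}-\bar{y}_i)\right].$$

    This is called the covariance estimator, or the LSDV (Least-Squares Dummy-Variable) estimator. $\hat\beta$ is unbiased, and is consistent when $N$ or $T$ tends to infinity. Its covariance matrix is

    $$Var(\hat\beta) = \sigma_\varepsilon^2\left[\sum_{i=1}^{N} x_i' Q_T x_i\right]^{-1},$$

    where $Q_T = I_T - (1/T)\,e_T e_T'$ and $x_i$ is the $T \times K$ block of regressors for individual $i$.

    $\hat\alpha_i$ is unbiased but consistent only when $T \to \infty$.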

    2.2.3 Comments

    - Model transformation by filtering out individual components $\Rightarrow$ coefficients associated with time-invariant regressors are not identified.

    - The Fixed-effect procedure uses variation across periods within each unit, hence the name Within.

    - Another possibility is the Between procedure, using variation between individuals:

    $$BY = BX\beta + B\alpha + B\varepsilon,$$

    $$\hat\beta_{Between} = [(BX)'(BX)]^{-1}(BX)'BY = [X'BX]^{-1}X'BY.$$

    This alternative estimator uses variation between individual means of the model variables.

    If $X_1$ is time-varying only, $BX_1 = \{\frac{1}{T}\sum_t x_{1it}\}_{i,t} = \bar{x}_1 \;\forall i$, and the intercept term is not identified.

    A word of caution in computing variance estimates. In the model $QY = QX\beta + Qu$, statistical software would divide the RSS by $NT - K$ (individual effects not included). But in the model $Y = X\beta + E\alpha + \varepsilon$, the RSS would be divided by $N(T-1) - K$.

    Parameter variance estimates in the Within regression model must therefore be multiplied by $(NT - K)/[N(T-1) - K]$.
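    The size of this degrees-of-freedom correction is easy to evaluate. A small sketch follows; the function name is hypothetical, and the dimensions are chosen only as an example (N = 629 and T = 6, with an assumed K = 10 regressors).

    ```python
    import math

    def within_variance_correction(N: int, T: int, K: int) -> float:
        """Factor by which variances from OLS on demeaned data must be multiplied."""
        return (N * T - K) / (N * (T - 1) - K)

    # Example dimensions (K = 10 is an assumption made for illustration)
    factor = within_variance_correction(N=629, T=6, K=10)
    se_factor = math.sqrt(factor)   # standard errors scale by the square root
    ```

    With a short panel the correction is far from negligible: for T = 6 the variance factor is roughly T/(T-1) = 1.2.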

    [Figure: Between and Within regression lines in the (X, Y) plane, with observations for three individuals labelled 1, 2 and 3.]

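    2.2.4 Testing for poolability and individual effects

    Poolability

    As before:

    $$y_{it} = \alpha_i + x_{it}\beta_i + \varepsilon_{it} \quad \text{versus} \quad y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it},$$

    but now $x_{it}$ is a $K$-vector.

    $H_0: \beta_1 = \beta_2 = \cdots = \beta_N \;(= \beta)$ ($K(N-1)$ constraints). The Fisher test statistic is

    $$\frac{(RRSS - URSS)/K(N-1)}{URSS/N(T-K-1)} \sim F\big(K(N-1),\, N(T-K-1)\big),$$

    where RRSS is from the Within regression and $URSS = \sum_{i=1}^{N} RSS_i$, with $RSS_i = S_{yyi} - S_{xyi}^2/S_{xxi}$ (see 1.2).

    Testing for individual effects

    $H_0: \alpha_1 = \cdots = \alpha_N \;(= \alpha)$.

    $$y_{it} = \alpha + x_{it}\beta + \varepsilon_{it} \;\; \text{(OLS)} \quad \text{versus} \quad y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it} \;\; \text{(Within)}.$$

    The Fisher test statistic is

    $$\frac{(RRSS - URSS)/(N-1)}{URSS/(NT-N-K)} \sim F\big(N-1,\, NT-N-K\big),$$

    where RRSS is from the OLS regression on pooled data and URSS is from the Within (LSDV) regression.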
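    The test for individual effects only requires two residual sums of squares. The following is a minimal sketch on simulated data (the data-generating process, dimensions and the helper `rss_ols` are assumptions for illustration); it computes the F statistic and its degrees of freedom, leaving the comparison with the F distribution to the reader's statistical software.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    N, T, K = 10, 6, 1                               # illustrative dimensions
    alpha = rng.normal(size=N)
    x = rng.normal(size=(N, T))
    y = alpha[:, None] + 1.5 * x + 0.5 * rng.normal(size=(N, T))

    def rss_ols(xmat, yvec):
        """Residual sum of squares from an OLS regression of yvec on xmat."""
        coef = np.linalg.lstsq(xmat, yvec, rcond=None)[0]
        resid = yvec - xmat @ coef
        return resid @ resid

    # Restricted model: pooled OLS with a common intercept
    X_pooled = np.column_stack([np.ones(N * T), x.ravel()])
    rrss = rss_ols(X_pooled, y.ravel())

    # Unrestricted model: Within (LSDV) regression on individual-demeaned data
    xd = (x - x.mean(axis=1, keepdims=True)).ravel()[:, None]
    yd = (y - y.mean(axis=1, keepdims=True)).ravel()
    urss = rss_ols(xd, yd)

    df1, df2 = N - 1, N * T - N - K
    F_stat = ((rrss - urss) / df1) / (urss / df2)
    ```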

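    2.3 The Random Effects model

    2.3.1 Notation and assumptions

    Problem with the Fixed-effect model: degrees of freedom are lost as $N \to \infty$, since one parameter is estimated per individual. Different approach: assume individual effects are random, i.e., model inference is drawn marginally (unconditionally upon the $\alpha_i$'s) with respect to the population of all effects.

    Assumptions:

    $$\alpha_i \sim IID(0, \sigma_\alpha^2), \quad \varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2), \quad E(\alpha_i\varepsilon_{it}) = E(\alpha_i x_{it}) = 0,$$

    with

    $$E(\alpha_i\alpha_j) = \begin{cases} \sigma_\alpha^2 & \text{if } i = j, \\ 0 & \text{otherwise;} \end{cases} \qquad E(\varepsilon_{it}\varepsilon_{js}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } i = j \text{ and } t = s, \\ 0 & \text{otherwise.} \end{cases}$$

    Hence $cov(u_{it}, u_{js}) = \sigma_\alpha^2 + \sigma_\varepsilon^2$ if $i = j$ and $t = s$, and $\sigma_\alpha^2$ if $i = j$ and $t \neq s$.

    Let

    $$\Omega_T = E(u_i u_i') = \begin{bmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \cdots & \sigma_\alpha^2 \\ \vdots & & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_\varepsilon^2 \end{bmatrix},$$

    a $(T \times T)$ matrix, for every individual $i$, $i = 1, 2, \ldots, N$. We have

    $$E(uu') = \Omega = I_N \otimes \Omega_T = I_N \otimes \left[\sigma_\alpha^2 (e_T e_T') + \sigma_\varepsilon^2 I_T\right] = I_N \otimes \left[\sigma_\alpha^2 (T B_T) + \sigma_\varepsilon^2 (Q_T + B_T)\right],$$

    since $Q_T = I_T - B_T$ and $B_T = (1/T)\,e_T e_T'$. Therefore

    $$\Omega = T\sigma_\alpha^2 B + \sigma_\varepsilon^2 I_{NT},$$

    or equivalently: $\Omega = \sigma_\varepsilon^2 Q + (T\sigma_\alpha^2 + \sigma_\varepsilon^2) B$.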

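    2.3.2 GLS estimation of the Random-effect model

    General model form: $Y = X\beta + U$, with $E(UU') = \Omega$.

    Generalized Least Squares (GLS) produces efficient parameter estimates of $\beta$, $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$, based on the known structure of the variance-covariance matrix $\Omega$:

    $$\hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}Y \quad \text{and} \quad Var(\hat\beta_{GLS}) = \left(X'\Omega^{-1}X\right)^{-1}.$$

    Computation of $\Omega^{-1}$: use the formula

    $$\Omega^r = (\sigma_\varepsilon^2)^r Q + (T\sigma_\alpha^2 + \sigma_\varepsilon^2)^r B$$

    for an arbitrary scalar $r$. It is based on the properties of $Q$ and $B$ (idempotency and orthogonality).

    Hence useful matrices are

    $$\Omega^{-1} = \frac{1}{\sigma_\varepsilon^2}Q + \frac{1}{T\sigma_\alpha^2 + \sigma_\varepsilon^2}B$$

    and

    $$\Omega^{-1/2} = \frac{1}{\sigma_\varepsilon}Q + \frac{1}{(T\sigma_\alpha^2 + \sigma_\varepsilon^2)^{1/2}}B.$$

    We have

    $$\hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}Y = \left[X'\left(\sigma_\varepsilon^2\Omega^{-1}\right)X\right]^{-1}\left[X'\left(\sigma_\varepsilon^2\Omega^{-1}\right)Y\right] = \left[X'\Big(Q + \frac{1}{\theta}B\Big)X\right]^{-1}\left[X'\Big(Q + \frac{1}{\theta}B\Big)Y\right],$$

    where $\theta = (T\sigma_\alpha^2 + \sigma_\varepsilon^2)/\sigma_\varepsilon^2 = 1 + T\sigma_\alpha^2/\sigma_\varepsilon^2$.

    GLS as Weighted Least Squares. Premultiply the model by $\sigma_\varepsilon\Omega^{-1/2}$ and use OLS: $Y^{*} = X^{*}\beta + u^{*}$, where

    $$Y^{*} = \sigma_\varepsilon\Omega^{-1/2}Y = \left[Q + \frac{\sigma_\varepsilon}{(\sigma_\varepsilon^2 + T\sigma_\alpha^2)^{1/2}}B\right]Y, \qquad X^{*} = \sigma_\varepsilon\Omega^{-1/2}X = \left[Q + \frac{\sigma_\varepsilon}{(\sigma_\varepsilon^2 + T\sigma_\alpha^2)^{1/2}}B\right]X,$$

    so that $Y^{*} = (Q + \theta^{-1/2}B)Y$, $X^{*} = (Q + \theta^{-1/2}B)X$, and in scalar form:

    $$y_{it}^{*} = (y_{it} - \bar{y}_i) + \theta^{-1/2}\bar{y}_i = y_{it} - \Big(1 - \frac{1}{\sqrt{\theta}}\Big)\bar{y}_i,$$

    $$x_{it}^{*} = (x_{it} - \bar{x}_i) + \theta^{-1/2}\bar{x}_i = x_{it} - \Big(1 - \frac{1}{\sqrt{\theta}}\Big)\bar{x}_i.$$

    See Appendix 1 for Maximum Likelihood Estimation of the random-effects model.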
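    The equivalence between GLS and OLS on quasi-demeaned data can be verified numerically. The sketch below treats the variance components as known (an assumption made for illustration, as are the dimensions and variable names) and compares the two routes on simulated data.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    N, T, K = 8, 5, 2                          # illustrative dimensions
    sig_a2, sig_e2 = 2.0, 1.0                  # variance components, assumed known here
    X = rng.normal(size=(N * T, K))
    u = (np.repeat(rng.normal(scale=np.sqrt(sig_a2), size=N), T)
         + rng.normal(scale=np.sqrt(sig_e2), size=N * T))
    Y = X @ np.array([1.0, 2.0]) + u

    B = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q = np.eye(N * T) - B

    # Spectral form of Omega and its inverse
    Omega = sig_e2 * Q + (T * sig_a2 + sig_e2) * B
    Omega_inv = Q / sig_e2 + B / (T * sig_a2 + sig_e2)

    # Route 1: textbook GLS formula
    beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)

    # Route 2: OLS after subtracting the share (1 - 1/sqrt(theta)) of individual means
    theta = 1 + T * sig_a2 / sig_e2
    share = 1 - 1 / np.sqrt(theta)
    X_star = X - share * (B @ X)
    Y_star = Y - share * (B @ Y)
    beta_wls = np.linalg.lstsq(X_star, Y_star, rcond=None)[0]
    ```

    In practice the variance components are unknown and are replaced by estimates (see Section 2.3.6).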

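    2.3.3 Comparison between GLS, OLS and Within

    $$\hat\beta_{GLS} = \left[X'QX + \frac{1}{\theta}X'BX\right]^{-1}\left[X'QY + \frac{1}{\theta}X'BY\right],$$

    $$\hat\beta_{Within} = (X'QX)^{-1}X'QY, \qquad \hat\beta_{Between} = (X'BX)^{-1}X'BY,$$

    so that

    $$\hat\beta_{GLS} = S_1\hat\beta_{Within} + S_2\hat\beta_{Between},$$

    where $S_1 = [X'QX + \frac{1}{\theta}X'BX]^{-1}X'QX$ and $S_2 = [X'QX + \frac{1}{\theta}X'BX]^{-1}\frac{1}{\theta}X'BX$, so that $S_1 + S_2 = I$.

    (i) If $\sigma_\alpha^2 = 0$, then $1/\theta = 1$ and $\hat\beta_{GLS} = \hat\beta_{OLS}$.

    (ii) If $T \to \infty$, then $1/\theta \to 0$ and $\hat\beta_{GLS} \to \hat\beta_{Within}$.

    (iii) If $1/\theta \to \infty$, then $\hat\beta_{GLS} \to \hat\beta_{Between}$.

    (iv) $Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS})$ is a positive semi-definite matrix.

    (v) If $1/\theta \to 0$, then $Var(\hat\beta_{Within}) \to Var(\hat\beta_{GLS})$.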
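    The matrix-weighted-average representation and the limiting cases (i) and (ii) can be illustrated as follows. This is a sketch on arbitrary simulated data; the function `beta_gls`, the value of `theta` and the dimensions are assumptions made for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    N, T, K = 8, 5, 2                                  # illustrative dimensions
    X = rng.normal(size=(N * T, K))
    Y = rng.normal(size=N * T)

    B = np.kron(np.eye(N), np.ones((T, T)) / T)
    Q = np.eye(N * T) - B
    XQX, XBX = X.T @ Q @ X, X.T @ B @ X
    XQY, XBY = X.T @ Q @ Y, X.T @ B @ Y

    def beta_gls(theta):
        """GLS estimator for a given theta = 1 + T * sigma_alpha^2 / sigma_eps^2."""
        return np.linalg.solve(XQX + XBX / theta, XQY + XBY / theta)

    beta_within = np.linalg.solve(XQX, XQY)
    beta_between = np.linalg.solve(XBX, XBY)
    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

    theta = 11.0                                       # e.g. T = 5, variance ratio 2
    A = XQX + XBX / theta
    S1 = np.linalg.solve(A, XQX)                       # weight on Within
    S2 = np.linalg.solve(A, XBX / theta)               # weight on Between
    ```

    Setting `theta = 1` reproduces pooled OLS because $Q + B = I_{NT}$, while a very large `theta` drives the weight on the Between estimator to zero.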

    2.3.4 Fixed individual effects or error components?

    Crucial issue in panel data econometrics: how should we treat the effects $\alpha_i$? As parameters or as random variables?

    - If inference is restricted to the specific units (individuals) in the sample: conditional inference, use Fixed effects. Example: individuals are not selected at random, or all firms in a given industry are selected.

    - If inference is on the whole population: marginal (unconditional) inference, use Random effects. Example: individuals are selected randomly from a huge population (consumers).

    2.3.4.1 Some practical choice criteria

    - Interpretation of effects in the (economic) model;

    - Sampling process: purely random or not;

    - Number of units (countries, regions, households, ...);

    - Interchangeability of units;

    - Endogeneity of $X_{it}$ (see later).

    2.3.4.2 Terminology

    When fixed individual effects are considered, the estimation procedure is called Fixed-Effects or Within. With random individual effects, it is the GLS (Generalized Least Squares) estimation procedure.

    2.3.5 Example: Wage equation, Hausman (1978)

    629 high-school graduates, Michigan income dynamics study. 3774 observations ($N = 629$, $T = 6$).

    Dependent variable: log wage.

    Table 2.1: Within and GLS estimation results

    | Variable               | Within  | GLS     |
    |------------------------|---------|---------|
    | Constant               |         | 0.8499  |
    | Age in [20,35]         | 0.0557  | 0.0393  |
    | Age in [35,45]         | 0.0351  | 0.0092  |
    | Age in [45,55]         | 0.0209  | -0.0007 |
    | Age in [55,65]         | 0.0209  | -0.0097 |
    | Age 65 and over        | -0.0171 | -0.0423 |
    | Unemployed prev. year  | -0.0042 | -0.0277 |
    | Poor health prev. year | -0.0204 | -0.0250 |
    | Self-employed          | -0.2190 | -0.2670 |
    | South                  | -0.1569 | -0.0324 |
    | Rural                  | -0.0101 | -0.1215 |

    The GLS estimator is a weighted average of the Within and Between estimators, where the weight is the inverse of the corresponding variance.

    The Within estimator neglects the variation between individuals, the Between estimator neglects the variation within individuals, and OLS gives equal weight to both Within and Between variations.

    Note. If the model contains an intercept,

    $$y_{it} = \mu + x_{it}\beta + \alpha_i + \varepsilon_{it},$$

    we use $B - B\bar{B}$ instead of $B$ (to eliminate $\mu$) in the formulae.

    2.3.6 Best Quadratic Unbiased Estimators (BQU) of

    variances

    If errors are normal, BQU estimates of 2and 2

    "are found from

    2"= u0Qu=tr(Q) =

    PN

    i=1

    PT

    t=1(uit ui)2N(T 1)

    and \2"+ T2

    = u0Bu=tr(B) = T

    NXi=1

    u2i=N;

    because tr(Q) = N(T 1) and tr(B) = N .

    But in practice, the uit's are unknown and we must estimates

    variances from the uit's instead.

  • 38 CHAPTER 2. THE LINEAR MODEL

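    1/ Wallace and Hussain (1969): Use OLS residuals in place of the true u's;

    2/ Amemiya (1971): Use LSDV residuals. We have

    $$ \begin{pmatrix} \sqrt{NT}\,(\hat\sigma^2_\varepsilon - \sigma^2_\varepsilon) \\ \sqrt{N}\,(\hat\sigma^2_\alpha - \sigma^2_\alpha) \end{pmatrix} \sim N\left( 0, \begin{pmatrix} 2\sigma^4_\varepsilon & 0 \\ 0 & 2\sigma^4_\alpha \end{pmatrix} \right), $$

    where $\hat\sigma^2_\alpha = \left( \widehat{\sigma^2_\varepsilon + T\sigma^2_\alpha} - \hat\sigma^2_\varepsilon \right) / T$.

    3/ Swamy and Arora (1972): Use mean square errors of the Within and the Between regressions.

    Mean square error from the Within regression:

    $$ \hat\sigma^2_\varepsilon = \left[ Y'QY - Y'QX(X'QX)^{-1}X'QY \right] / [N(T-1) - K] $$

    and from the Between regression:

    $$ \widehat{\sigma^2_\varepsilon + T\sigma^2_\alpha} = \left[ Y'BY - Y'BX(X'BX)^{-1}X'BY \right] / [N - K - 1]. $$

    Note: there is an intercept term in the Between regressors (X), not in the Within regression.

    4/ Nerlove (1971): Compute $\hat\sigma^2_\alpha = \frac{1}{N-1} \sum_{i=1}^N (\hat\alpha_i - \bar{\hat\alpha})^2$, where the $\hat\alpha_i$ are the parameter estimates associated with the individual dummies in the LSDV regression. And $\sigma^2_\varepsilon$ is estimated from the Within regression.

    Estimation methods above with covariance components replaced by consistent estimates: Feasible GLS.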
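    A minimal sketch of Feasible GLS with the Swamy and Arora variance components, assuming a simulated balanced panel. The data-generating values (`beta_true`, `sigma_alpha`, `sigma_eps`, the intercept 2.0) and all variable names are illustrative assumptions, not taken from the course material.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 500, 5, 2
beta_true = np.array([1.0, -0.5])          # assumed slopes for the simulation
sigma_alpha, sigma_eps = 1.0, 0.5          # assumed standard deviations

# Balanced panel stored as (N, T, K): i is the slow index, t the fast one.
X = rng.normal(size=(N, T, K))
alpha = rng.normal(scale=sigma_alpha, size=(N, 1))
eps = rng.normal(scale=sigma_eps, size=(N, T))
Y = 2.0 + X @ beta_true + alpha + eps

# Individual means (Between) and deviations from them (Within).
Xbar, Ybar = X.mean(axis=1), Y.mean(axis=1)
Xw = (X - Xbar[:, None, :]).reshape(N * T, K)
Yw = (Y - Ybar[:, None]).reshape(N * T)

# Within regression: its mean square error estimates sigma_eps^2.
b_w = np.linalg.lstsq(Xw, Yw, rcond=None)[0]
s2_eps = np.sum((Yw - Xw @ b_w) ** 2) / (N * (T - 1) - K)

# Between regression (with intercept): its mean square error, computed on
# the NT stacked means, estimates sigma_eps^2 + T * sigma_alpha^2.
Xb = np.column_stack([np.ones(N), Xbar])
b_b = np.linalg.lstsq(Xb, Ybar, rcond=None)[0]
s2_1 = T * np.sum((Ybar - Xb @ b_b) ** 2) / (N - K - 1)
s2_alpha = (s2_1 - s2_eps) / T

# Feasible GLS: y_it - (1 - 1/sqrt(theta)) * ybar_i, same for regressors.
theta = s2_1 / s2_eps
lam = 1.0 - 1.0 / np.sqrt(theta)
Xs = np.column_stack([np.full(N * T, 1.0 - lam),
                      (X - lam * Xbar[:, None, :]).reshape(N * T, K)])
Ys = (Y - lam * Ybar[:, None]).reshape(N * T)
b_fgls = np.linalg.lstsq(Xs, Ys, rcond=None)[0]   # [intercept, slopes]
```

    The first column of `Xs` is the transformed intercept, which is why the Between step includes a constant while the Within step does not.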

  • Chapter 3

    Extensions

    3.1 The Two-way panel data model

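    Error component structure of the form:

    $$ u_{it} = \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T, $$

    or in matrix form

    $$ U = (I_N \otimes e_T)\alpha + (e_N \otimes I_T)\lambda + \varepsilon, $$

    where $\alpha = (\alpha_1, \dots, \alpha_N)'$ and $\lambda = (\lambda_1, \dots, \lambda_T)'$.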

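    3.1.1 The Two-way fixed-effect model

    $\alpha_i$ and $\lambda_t$ are treated as fixed parameters: inference is conditional on the N individuals over the period 1 to T.

    3.1.1.1 Notation

    Fixed-effect estimates of $\beta$ obtain by using the new operator

    $$ Q = I_N \otimes I_T - I_N \otimes (e_T e_T'/T) - (e_N e_N'/N) \otimes I_T, $$

    so that $Qu = \{u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t}\}_{it}$. Averaging over individuals, we have

    $$ \bar y_{\cdot t} = \bar x_{\cdot t}\beta + \lambda_t + \bar\varepsilon_{\cdot t} \quad \text{with restriction} \quad \sum_{i=1}^N \alpha_i = 0, $$

    and averaging over time periods:

    $$ \bar y_{i\cdot} = \bar x_{i\cdot}\beta + \alpha_i + \bar\varepsilon_{i\cdot} \quad \text{with restriction} \quad \sum_{t=1}^T \lambda_t = 0. $$

    OLS on the model in deviations yields

    $$ \hat\beta = (X'QX)^{-1}X'QY, \qquad \hat\alpha_i = \bar y_{i\cdot} - \bar x_{i\cdot}\hat\beta, \qquad \hat\lambda_t = \bar y_{\cdot t} - \bar x_{\cdot t}\hat\beta. $$

    If the model contains an intercept, operator Q becomes

    $$ Q = I_N \otimes I_T - I_N \otimes (e_T e_T'/T) - (e_N e_N'/N) \otimes I_T + (e_N e_N'/N) \otimes (e_T e_T'/T), $$

    so that $Qu = \{u_{it} - \bar u_{i\cdot} - \bar u_{\cdot t} + \bar u_{\cdot\cdot}\}_{it}$, and Within estimates are

    $$ \hat\beta = (X'QX)^{-1}X'QY, \qquad \hat\alpha_i = (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) - (\bar x_{i\cdot} - \bar x_{\cdot\cdot})\hat\beta, \qquad \hat\lambda_t = (\bar y_{\cdot t} - \bar y_{\cdot\cdot}) - (\bar x_{\cdot t} - \bar x_{\cdot\cdot})\hat\beta. $$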
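    As an illustration of the two-way Within operator with an intercept, the sketch below (simulated data, hypothetical names and parameter values) checks that the Kronecker-product form of Q reproduces the double-demeaning formula, and that the resulting slope estimates coincide with LSDV on individual and time dummies.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 6, 4, 2
X = rng.normal(size=(N, T, K))
alpha = rng.normal(size=(N, 1))            # individual effects
lam = rng.normal(size=(1, T))              # time effects
Y = 1.0 + X @ np.array([0.7, -1.2]) + alpha + lam + 0.1 * rng.normal(size=(N, T))

def two_way_within(A):
    """a_it - abar_i. - abar_.t + abar_.. over the first two axes (i, t)."""
    return (A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True)
            + A.mean(axis=(0, 1), keepdims=True))

Xq = two_way_within(X).reshape(N * T, K)
Yq = two_way_within(Y).reshape(N * T)
beta_within = np.linalg.lstsq(Xq, Yq, rcond=None)[0]

# Matrix form of Q (observations stacked with i slow, t fast).
JT, JN = np.full((T, T), 1.0 / T), np.full((N, N), 1.0 / N)
Q = (np.eye(N * T) - np.kron(np.eye(N), JT) - np.kron(JN, np.eye(T))
     + np.kron(JN, JT))

# LSDV: intercept, N-1 individual dummies, T-1 time dummies, regressors.
ind = np.kron(np.eye(N), np.ones((T, 1)))[:, 1:]
tim = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
D = np.column_stack([np.ones(N * T), ind, tim, X.reshape(N * T, K)])
beta_lsdv = np.linalg.lstsq(D, Y.reshape(N * T), rcond=None)[0][-K:]
```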

    3.1.1.2 Testing for effects

    1/ $H_0: \alpha_1 = \dots = \alpha_N = \lambda_1 = \dots = \lambda_T = 0$.

    Fisher test statistic:

    $$ \frac{(RRSS - URSS)/(N + T - 2)}{URSS/[(N-1)(T-1) - K]} \sim F(k_1, k_2), $$

    where $k_1 = N + T - 2$, $k_2 = (N-1)(T-1) - K$, and

    URSS (Unrestricted RSS): from the Within model,

    RRSS (Restricted RSS): from pooled OLS.

    2/ $H_0: \alpha_1 = \dots = \alpha_N = 0$ given $\lambda_t \ne 0$, $t \le T-1$.

    Fisher test statistic:

    $$ \frac{(RRSS - URSS)/(N - 1)}{URSS/[(N-1)(T-1) - K]} \sim F(k_1, k_2), $$

    where $k_1 = N - 1$, $k_2 = (N-1)(T-1) - K$, and

    URSS: from the Within model,

    RRSS: from the regression with time dummies only:

    $$ (y_{it} - \bar y_{\cdot t}) = (x_{it} - \bar x_{\cdot t})\beta + (u_{it} - \bar u_{\cdot t}). $$

    3/ $H_0: \lambda_1 = \dots = \lambda_{T-1} = 0$ given $\alpha_i \ne 0$, $i \le N-1$.

    Fisher test statistic:

    $$ \frac{(RRSS - URSS)/(T - 1)}{URSS/[(N-1)(T-1) - K]} \sim F(k_1, k_2), $$

    where $k_1 = T - 1$, $k_2 = (N-1)(T-1) - K$, and

    URSS: from the Within model,

    RRSS: from the Within regression as in the one-way model:

    $$ (y_{it} - \bar y_{i\cdot}) = (x_{it} - \bar x_{i\cdot})\beta + (u_{it} - \bar u_{i\cdot}). $$
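    The three Fisher statistics can be computed from four residual sums of squares. The following is a sketch on simulated data (all names and parameter values are hypothetical); p-values would then come from an F distribution with $(k_1, k_2)$ degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 30, 6, 1
x = rng.normal(size=(N, T))
alpha = rng.normal(size=(N, 1))            # individual effects
lam = rng.normal(size=(1, T))              # time effects
y = 0.5 + 1.0 * x + alpha + lam + 0.3 * rng.normal(size=(N, T))

def rss(yv, xv):
    """Residual sum of squares from OLS of yv on the columns of xv."""
    b = np.linalg.lstsq(xv, yv, rcond=None)[0]
    return float(np.sum((yv - xv @ b) ** 2))

def col(a):
    return a.reshape(-1, 1)

def two_way(a):      # deviations from individual and period means
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def time_dev(a):     # deviations from period means (time dummies only)
    return a - a.mean(axis=0, keepdims=True)

def ind_dev(a):      # deviations from individual means (one-way Within)
    return a - a.mean(axis=1, keepdims=True)

urss = rss(two_way(y).ravel(), col(two_way(x)))
rrss_pooled = rss(y.ravel(), np.column_stack([np.ones(N * T), x.ravel()]))
rrss_time = rss(time_dev(y).ravel(), col(time_dev(x)))
rrss_ind = rss(ind_dev(y).ravel(), col(ind_dev(x)))

k2 = (N - 1) * (T - 1) - K
F1 = ((rrss_pooled - urss) / (N + T - 2)) / (urss / k2)   # test 1/
F2 = ((rrss_time - urss) / (N - 1)) / (urss / k2)         # test 2/
F3 = ((rrss_ind - urss) / (T - 1)) / (urss / k2)          # test 3/
```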

    See Appendix 2 for the two-way random effects model.

    3.1.2 Example: Production function (Hoch 1962)

    Sample: 63 Minnesota farms over the period 1946-1951.

    Estimation of a Cobb-Douglas production function:

    $$ \log \text{Output}_{it} = \beta_0 + \beta_1 \log \text{Labor}_{it} + \beta_2 \log \text{Real estate}_{it} + \beta_3 \log \text{Machinery}_{it} + \beta_4 \log \text{Fertilizer}_{it}. $$

    Motivation for adding specific effects (into $u_{it}$):

    Climatic conditions, identical across farms ($\lambda_t$); Farm-specific factors (soil, managerial quality) ($\alpha_i$).

    Table 3.1: Least squares estimates of Cobb-Douglas production function

                                        Assumption
                                (I)                      (II)            (III)
    Estimate                    α_i = λ_t = 0            α_i = 0         λ_t = 0
    β_1 (Labor)                 0.256                    0.166           0.043
    β_2 (Real estate)           0.135                    0.230           0.199
    β_3 (Machinery)             0.163                    0.261           0.194
    β_4 (Fertilizer)            0.349                    0.311           0.289
    Sum of β's                  0.904                    0.967           0.726
    R²                          0.721                    0.813           0.884


    3.2 More on non-spherical disturbances

    Panel data: in the random-effect context, heteroskedasticity is due to the panel data structure. But the variances $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are assumed constant.

    Heteroskedasticity and serial correlation:

    $Var(\alpha_i) = \sigma^2_{\alpha_i}$: individual-specific heteroskedasticity;

    $Var(\varepsilon_{it}) = \sigma^2_i$: "typical" heteroskedasticity;

    $E(\varepsilon_{it}\varepsilon_{is}) \ne 0$, $t \ne s$: serial correlation.

    We present here the first two cases only.

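    3.2.1 Heteroskedasticity in individual effect

    Mazodier and Trognon (1978):

    $$ Var(\alpha_i) = \sigma^2_{\alpha_i}, \qquad \varepsilon_{it} \sim IID(0, \sigma^2_\varepsilon), \quad i = 1, 2, \dots, N, $$

    or $E(\alpha\alpha') = \mathrm{diag}[\sigma^2_{\alpha_i}]$ and $\varepsilon \sim IID(0, \sigma^2_\varepsilon)$.

    $$ \Omega = E(UU') = \mathrm{diag}[\sigma^2_{\alpha_i}] \otimes (e_T e_T') + \mathrm{diag}[\sigma^2_\varepsilon] \otimes I_T, $$

    where $\mathrm{diag}[\sigma^2_\varepsilon]$ is $N \times N$. We have

    $$ \Omega = \mathrm{diag}[T\sigma^2_{\alpha_i} + \sigma^2_\varepsilon] \otimes \frac{e_T e_T'}{T} + \mathrm{diag}[\sigma^2_\varepsilon] \otimes \left( I_T - \frac{e_T e_T'}{T} \right), $$

    $$ \Omega^r = \mathrm{diag}[(T\sigma^2_{\alpha_i} + \sigma^2_\varepsilon)^r] \otimes \frac{e_T e_T'}{T} + \mathrm{diag}[(\sigma^2_\varepsilon)^r] \otimes \left( I_T - \frac{e_T e_T'}{T} \right). $$

    Transformation of the heteroskedastic model: multiply both sides by

    $$ \sigma_\varepsilon \Omega^{-1/2} = \mathrm{diag}\left[ \frac{\sigma_\varepsilon}{(T\sigma^2_{\alpha_i} + \sigma^2_\varepsilon)^{1/2}} \right] \otimes \frac{e_T e_T'}{T} + I_N \otimes \left( I_T - \frac{e_T e_T'}{T} \right). $$

    Transformed variables in scalar form:

    $$ y^*_{it} = y_{it} - \left[ 1 - \frac{\sigma_\varepsilon}{\sqrt{T\sigma^2_{\alpha_i} + \sigma^2_\varepsilon}} \right] \bar y_i. $$

    Same form as in the homoskedastic case, only here $\theta$ is individual-specific:

    $$ \theta_i = (T\sigma^2_{\alpha_i} + \sigma^2_\varepsilon)/\sigma^2_\varepsilon \quad \text{and} \quad y^*_{it} = y_{it} - \left( 1 - \frac{1}{\sqrt{\theta_i}} \right) \bar y_i. $$

    Feasible GLS:

    Step 1. Estimate $\sigma^2_\varepsilon$ consistently from the usual Within regression;

    Step 2. Noting that $Var(u_{it}) = w^2_i = \sigma^2_{\alpha_i} + \sigma^2_\varepsilon$, estimate $w^2_i$ by $\hat w^2_i = \frac{1}{T-1} \sum_{t=1}^T (\hat u_{it} - \bar{\hat u}_i)^2$, where $\hat u_{it}$ is the OLS residual;

    Step 3. Compute $\hat\sigma^2_{\alpha_i} = \hat w^2_i - \hat\sigma^2_\varepsilon$;

    Step 4. Form $T\hat\sigma^2_{\alpha_i} + \hat\sigma^2_\varepsilon$ and $\hat\theta_i$, and compute $y^*_{it}$, $x^*_{it}$;

    Step 5. Regress $y^*_{it}$ on $x^*_{it}$ to get $\hat\beta$.

    Important: consistency of the variance component estimates $\hat w^2_i$, $i = 1, 2, \dots, N$, requires T >> N.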
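    A small numerical check, under assumed variance values, that the matrix transformation $\sigma_\varepsilon\Omega^{-1/2}$ coincides with the scalar formula $y_{it} - (1 - 1/\sqrt{\theta_i})\bar y_i$. The variances `s2_alpha` and `s2_eps` are hypothetical and treated as known here; in Feasible GLS they would be replaced by the estimates from Steps 1 to 3.

```python
import numpy as np

N, T = 3, 4
s2_alpha = np.array([0.5, 2.0, 1.0])   # hypothetical Var(alpha_i), one per individual
s2_eps = 0.8                           # hypothetical Var(eps_it)

# Omega = diag[s2_alpha_i] (x) e_T e_T' + s2_eps * I_NT  (i slow, t fast).
Omega = np.kron(np.diag(s2_alpha), np.ones((T, T))) + s2_eps * np.eye(N * T)

# sigma_eps * Omega^{-1/2} from the spectral decomposition of Omega.
w, V = np.linalg.eigh(Omega)
P = np.sqrt(s2_eps) * (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(3)
y = rng.normal(size=(N, T))

# Scalar form with individual-specific theta_i.
theta = (T * s2_alpha + s2_eps) / s2_eps
y_star = y - (1.0 - 1.0 / np.sqrt(theta))[:, None] * y.mean(axis=1, keepdims=True)
```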

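    3.2.2 "Typical" heteroskedasticity

    Assumptions: $\alpha_i \sim IID(0, \sigma^2_\alpha)$ and $Var(\varepsilon_{it}) = \sigma^2_i$.

    $$ \Omega = E(UU') = \mathrm{diag}[\sigma^2_\alpha] \otimes (e_T e_T') + \mathrm{diag}[\sigma^2_i] \otimes I_T = \mathrm{diag}[T\sigma^2_\alpha + \sigma^2_i] \otimes (e_T e_T'/T) + \mathrm{diag}[\sigma^2_i] \otimes (I_T - e_T e_T'/T). $$

    The transformed model uses

    $$ \Omega^{-1/2} = \mathrm{diag}\left[ \frac{1}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}} \right] \otimes (e_T e_T'/T) + \mathrm{diag}[1/\sigma_i] \otimes (I_T - e_T e_T'/T), $$

    so that $Y^* = \Omega^{-1/2}Y$ has typical element

    $$ y^*_{it} = \frac{y_{it} - \bar y_i}{\sigma_i} + \frac{\bar y_i}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}} = \frac{y_{it} - \theta_i \bar y_i}{\sigma_i}, \quad \text{where} \quad \theta_i = 1 - \frac{\sigma_i}{\sqrt{T\sigma^2_\alpha + \sigma^2_i}}. $$

    $E(u^2_{it}) = w^2_i = \sigma^2_\alpha + \sigma^2_i$ for all i, hence OLS residuals $\hat u_{it}$ can be used to estimate $w^2_i$: $\hat w^2_i = \frac{1}{T-1} \sum_t (\hat u_{it} - \bar{\hat u}_i)^2$.

    Within residuals $\tilde u_{it}$ are then used to compute $\hat\sigma^2_i = \frac{1}{T-1} \sum_t (\tilde u_{it} - \bar{\tilde u}_i)^2$.

    A consistent estimate of $\sigma^2_\alpha$ is $\hat\sigma^2_\alpha = (1/N) \sum_i (\hat w^2_i - \hat\sigma^2_i)$.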

    3.3 Unbalanced panel data models

    3.3.1 Introduction

    Definition: the number of time periods differs from one unit (individual) to another. For individual i, we have $T_i$ periods, and the total number of observations is now $\sum_{i=1}^N T_i$ (instead of NT previously).

    Examples

    Firms: may close down, or new entrants may join an industry; Consumers: may move, die or refuse to answer anymore; Workers: may become unemployed,...

    Problem of attrition: the probability of a unit staying in the sample decreases as the number of periods increases.


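    3.3.2 Fixed effect models for unbalanced panels

    3.3.2.1 The one-way unbalanced fixed-effect model

    Consider the unbalanced model with $T_1 = 3$ and $T_2 = 2$:

    $$ \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \end{pmatrix} = \begin{pmatrix} x_{11} \\ x_{12} \\ x_{13} \\ x_{21} \\ x_{22} \end{pmatrix} \beta + \begin{pmatrix} \alpha_1 \\ \alpha_1 \\ \alpha_1 \\ \alpha_2 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{pmatrix}. $$

    To eliminate $\alpha$, we need a new Within operator

    $$ Q = \begin{pmatrix} I_3 - e_3 e_3'/3 & 0 \\ 0 & I_2 - e_2 e_2'/2 \end{pmatrix} = \begin{bmatrix} 2/3 & -1/3 & -1/3 & 0 & 0 \\ -1/3 & 2/3 & -1/3 & 0 & 0 \\ -1/3 & -1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 0 & 1/2 & -1/2 \\ 0 & 0 & 0 & -1/2 & 1/2 \end{bmatrix}, $$

    and the same procedure as in the balanced case is applied:

    $$ \hat\beta_{Within} = (X'QX)^{-1}X'QY, $$

    where $Q = \mathrm{diag}(I_{T_i} - e_{T_i} e_{T_i}'/T_i)$, $i = 1, 2, \dots, N$.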
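    The block-diagonal operator can be built directly from the list of $T_i$'s. A short sketch (the helper name `within_operator` and the test vector are hypothetical), checked on the $T_1 = 3$, $T_2 = 2$ example:

```python
import numpy as np

def within_operator(T_list):
    """Block-diagonal Q = diag(I_Ti - e_Ti e_Ti' / Ti) for a one-way unbalanced panel."""
    n = sum(T_list)
    Q = np.zeros((n, n))
    start = 0
    for Ti in T_list:
        Q[start:start + Ti, start:start + Ti] = np.eye(Ti) - np.ones((Ti, Ti)) / Ti
        start += Ti
    return Q

Q = within_operator([3, 2])

# Any vector that is constant within individuals is wiped out by Q.
alpha_vec = np.array([1.5, 1.5, 1.5, -0.7, -0.7])
```

    In practice one would subtract individual means rather than form the n x n matrix, which grows quadratically with the sample size.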

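    3.3.2.2 The two-way unbalanced fixed-effect model

    The model is

    $$ y_{it} = x_{it}\beta + \alpha_i + \lambda_t + \varepsilon_{it}, \quad i = 1, 2, \dots, N_t; \ t = 1, 2, \dots, T, $$

    where $N_t$ is the number of units observed in period t, and $n = \sum_{t=1}^T N_t$. The total number of observations is n.

    It is a bit more complex to extend the Within approach here.

    Important: We now assume that observations are ordered differently: i runs fast and t runs slowly.

    Consider the $N \times N$ identity matrix at time t, from which we delete the rows corresponding to individuals missing at t.

    Example: N = 3, $N_1 = 3$, $N_2 = 2$, $N_3 = 2$, and observations are $(y_{11}, y_{21}, y_{31})$, $(y_{12}, y_{32})$, $(y_{13}, y_{23})$.

    $$ I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \Rightarrow D_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad D_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad D_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}. $$

    We have 3 ($N_t \times N$) matrices $D_t$, t = 1, 2, 3, constructed from $I_3$ above.

    Now define a new matrix $\Delta$ as $(\Delta_1, \Delta_2)$, where $\Delta_1 = (D_1', \dots, D_T')'$, a $(n \times N)$ matrix, and $\Delta_2 = \mathrm{diag}(D_t e_N)$, a $(n \times T)$ matrix:

    $$ \Delta = \begin{bmatrix} D_1 & D_1 e_N & 0 & \cdots & 0 \\ D_2 & 0 & D_2 e_N & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ D_T & 0 & 0 & \cdots & D_T e_N \end{bmatrix}. $$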


    The $D_t e_N$'s are vectors of ones whose lengths give the number of units present in each period t (the $N_t$'s).

    Matrix $\Delta$ is $n \times (N + T)$, and corresponds to the matrix of all dummies (units and periods) present in the sample. Part $\Delta_1$ in $\Delta$ is the equivalent of matrix E (containing individual dummies) before.

    Note that $\Delta_1'\Delta_1 = \mathrm{diag}(T_i)$ (number of periods in the sample for unit i), and $\Delta_2'\Delta_2 = \mathrm{diag}(N_t)$ (number of individuals for period t).

    Also, $\Delta_2'\Delta_1$ is a $T \times N$ matrix of dummy variables for the presence in the sample of unit i at time t.

    The fixed-effect estimator could be implemented by considering the model

    $$ y_{it} = x_{it}\beta + D_{it}\mu + \varepsilon_{it}, \quad i = 1, 2, \dots, N_t; \ t = 1, 2, \dots, T, $$

    where $D_{it}$ is the corresponding row of matrix $\Delta$, and the vector $\mu$ contains all the $\alpha_i$'s and $\lambda_t$'s.

    In the balanced panel case, we would have $\Delta_1 = (e_T \otimes I_N)$ and $\Delta_2 = (I_T \otimes e_N)$, and $\Delta$ would be $NT \times (N + T)$.


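    In the example above, n = 3 + 2 + 2 = 7 and N = 3:

    $$ \Delta = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}, $$

    the vector of effects would be $(\alpha_1, \alpha_2, \alpha_3, \lambda_1, \lambda_2, \lambda_3)'$, and

    $$ \Delta'Y = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix} \begin{pmatrix} y_{11} \\ y_{21} \\ y_{31} \\ y_{12} \\ y_{32} \\ y_{13} \\ y_{23} \end{pmatrix} = \begin{pmatrix} y_{11} + y_{12} + y_{13} \\ y_{21} + y_{23} \\ y_{31} + y_{32} \\ y_{11} + y_{21} + y_{31} \\ y_{12} + y_{32} \\ y_{13} + y_{23} \end{pmatrix} $$

    would compute the sums of variables over periods and individuals.

    Easier method if N and T are large: use deviations from individual and time means, as in the balanced two-way Within case.

    Let

    $$ \Delta_N = \Delta_1'\Delta_1 \ (N \times N); \qquad \Delta_T = \Delta_2'\Delta_2 \ (T \times T); \qquad \Delta_{NT} = \Delta_2'\Delta_1 \ (T \times N); $$

    $$ \bar\Delta = \Delta_2 - \Delta_1\Delta_N^{-1}\Delta_{NT}' \ (n \times T); \qquad P = \Delta_T - \Delta_{NT}\Delta_N^{-1}\Delta_{NT}' = \bar\Delta'\Delta_2 \ (T \times T). $$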


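    Wansbeek and Kapteyn (1989): The required Within operator for such an unbalanced two-way panel is

    $$ Q = I_n - \Delta_1\Delta_N^{-1}\Delta_1' - \bar\Delta P^{-}\bar\Delta', $$

    where $P^{-}$ is a generalized inverse of P.

    The transformed variable QY, say, is also written as

    $$ QY = Y - \Delta_1\Delta_N^{-1}\Delta_1'Y - \bar\Delta P^{-}\bar\Delta'Y = Y - \Delta_1\Delta_N^{-1}\phi_1 - \bar\Delta\bar\phi, $$

    where $\phi_1 = \Delta_1'Y$ and $\bar\phi = P^{-}\bar\Delta'Y$.

    $\phi_1$ computes the individual sums $\sum_{t=1}^{T_i} y_{ti}$.

    Typical transformed element:

    $$ (QY)_{ti} = y_{ti} - \frac{\phi_{1i}}{T_i} + \frac{a_i'\bar\phi}{T_i} - \bar\phi_t, $$

    where $a_i$ is the i-th column of $\Delta_{NT}$.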

    Example

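    Let $Y = (y_{11}, y_{21}, y_{31}, y_{12}, y_{32}, y_{13}, y_{23})' = (1, 2, 3, 2, 6, 3, 4)'$, n = 7, N = 3, T = 3.

    We have

    $$ \Delta_N = \Delta_T = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad \Delta_{NT} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}, \quad P = \begin{bmatrix} 1.6666 & -0.8333 & -0.8333 \\ -0.8333 & 1.1666 & -0.3333 \\ -0.8333 & -0.3333 & 1.1666 \end{bmatrix}, $$

    $$ QY = \begin{pmatrix} 0.4582 \\ 0.1875 \\ -0.5000 \\ -0.5418 \\ 0.5000 \\ 0.0832 \\ -0.1875 \end{pmatrix}, \quad \phi_1 = \begin{pmatrix} 6 \\ 6 \\ 9 \end{pmatrix}, \quad \bar\phi = \begin{pmatrix} -0.3383 \\ 1.6618 \\ 2.0368 \end{pmatrix}. $$

    For example,

    $$ Qy_{11} = 1 - \frac{6}{3} + \frac{1}{3}\,(1 \ 1 \ 1) \begin{pmatrix} -0.3383 \\ 1.6618 \\ 2.0368 \end{pmatrix} + 0.3383 = 0.4582, $$

    $$ Qy_{31} = 3 - \frac{9}{2} + \frac{1}{2}\,(1 \ 1 \ 0) \begin{pmatrix} -0.3383 \\ 1.6618 \\ 2.0368 \end{pmatrix} + 0.3383 = -0.5. $$

    See Appendix 3 for the unbalanced random-effects model.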
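    The example can be reproduced numerically. The sketch below builds $\Delta_1$, $\Delta_2$ and the Wansbeek-Kapteyn operator for the same presence pattern, using the Moore-Penrose pseudo-inverse (`numpy.linalg.pinv`) as one possible choice of generalized inverse; that choice and the variable names are assumptions. With exact arithmetic QY is (6, 2, -8, -8, 8, 2, -2)/15, that is roughly (0.400, 0.133, -0.533, -0.533, 0.533, 0.133, -0.133). These values differ somewhat from the figures printed in the example, possibly because P, which is singular, was rounded before being inverted there.

```python
import numpy as np

# D_t: rows of I_3 for the units observed in period t (example pattern).
I3 = np.eye(3)
D = [I3, I3[[0, 2]], I3[[0, 1]]]
T = len(D)

Delta1 = np.vstack(D)                      # n x N matrix of individual dummies
n = Delta1.shape[0]
Delta2 = np.zeros((n, T))                  # n x T matrix of time dummies
row = 0
for t, Dt in enumerate(D):
    Delta2[row:row + Dt.shape[0], t] = 1.0
    row += Dt.shape[0]

DN = Delta1.T @ Delta1                     # diag(T_i)
DT = Delta2.T @ Delta2                     # diag(N_t)
DNT = Delta2.T @ Delta1                    # T x N presence matrix
DN_inv = np.linalg.inv(DN)
Dbar = Delta2 - Delta1 @ DN_inv @ DNT.T
P = DT - DNT @ DN_inv @ DNT.T              # singular: its rows sum to zero

# Within operator, with the pseudo-inverse as generalized inverse of P.
Q = np.eye(n) - Delta1 @ DN_inv @ Delta1.T - Dbar @ np.linalg.pinv(P) @ Dbar.T

Y = np.array([1.0, 2.0, 3.0, 2.0, 6.0, 3.0, 4.0])
QY = Q @ Y
```

    A useful sanity check on any implementation is that Q annihilates both sets of dummies, so that transformed values sum to zero within each individual and within each period.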


  • Chapter 4

    Augmented panel data models

    What are augmented panel models ? Implication for estimation ?

    Special estimation techniques when GLS are not feasible.

    4.1 Introduction

    Consider the model

    $$ y_{it} = x_{it}\beta + z_i\gamma + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \ t = 1, 2, \dots, T, $$

    with $x_{it}$ a $1 \times K$ vector of time- and individual-varying regressors, and $z_i$ a $1 \times G$ vector of individual-specific (time-invariant) regressors.

    Example:

    $$ \log WAGE_{it} = \beta_1 HOURS_{it} + \gamma_1 EDUC_i + \gamma_2 SEX_i + \alpha_i + \varepsilon_{it}. $$

    Estimation method:

    Within: $\gamma$ is not identifiable because

    $$ QY = QX\beta + (I - B)Z\gamma + Q\alpha + Q\varepsilon = QX\beta + Q\varepsilon, $$

    since BZ = Z. Only $\beta$ is identifiable. But a two-step procedure is feasible:

    1/ Run the Within regression to get $\hat\beta$;

    2/ Run the Between regression on

    $$ \bar y_i - \bar x_i\hat\beta = \alpha_i + z_i\gamma + \bar\varepsilon_i, \quad i = 1, 2, \dots, N, $$

    to estimate the $\gamma$'s.

    GLS: Both $\beta$ and $\gamma$ are identifiable.

    4.2 Choice between Within and GLS

    One of the choice criteria between Within and GLS: presence of $z_i$'s in the model.

    Recall: GLS is a consistent and efficient estimator provided regressors are exogenous:

    $$ E(\alpha_i x_{it}) = 0 \quad \text{and} \quad E(\alpha_i z_i) = 0 \quad \forall i, t. $$

    Consider the non-augmented model $y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it}$.

    If $x_{it}$ is endogenous in the sense $E(\alpha_i x_{it}) \ne 0$, then GLS is not consistent:

    $$ \hat\beta_{GLS} = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}U = \beta + \left[ X'(Q + \theta^{-1}B)X \right]^{-1} X'(Q + \theta^{-1}B)U, $$

    where $\theta = 1 + T\sigma^2_\alpha/\sigma^2_\varepsilon$, so that

    $$ E\left[ X'(Q + \theta^{-1}B)U \right] = E\left[ X'Q\varepsilon + X'(B\alpha + B\varepsilon)/\theta \right] = 0 + E(X'B\alpha)/\theta + 0 = E(X'\alpha)/\theta \ne 0, $$

    because $E(X'\varepsilon) = 0$ and $B\alpha = \alpha$.

    Same problem with the augmented model, if $E(X'\alpha) \ne 0$ and/or $E(Z'\alpha) \ne 0$.

    Important consequence in practice: If (some of the) regressors are endogenous, GLS estimates are not consistent, but Within estimates are consistent because $\alpha$ is filtered out.

    Another criterion of choice between Within and GLS:

    If there are endogenous regressors, choose Within estimation (but $\gamma$ is not identifiable);

    If all regressors are exogenous, use GLS (the most efficient).

    Three problems remain:

    $\gamma$ is still not identified, because in the Between regression $\bar y_i - \bar x_i\hat\beta = z_i\gamma + \alpha_i + \bar\varepsilon_i$, $z_i$ is still correlated with $\alpha_i$.

    If one uses Within, all regressors are treated as endogenous (no distinction between exogenous and endogenous $x_{it}$'s).

    Within estimates are not efficient.

    4.3 An important test for endogeneity

    Null hypothesis: $H_0: E(X'\alpha) = E(Z'\alpha) = 0$ (exogeneity).

    Comparison between two estimators:

                    GLS                        Within
    H0              Consistent, efficient      Consistent, not efficient
    Alternative     Not consistent             Consistent

    Hausman (1978): Even if the $x_{it}$'s are exogenous, GLS estimates of $\beta$ are not consistent in the augmented model (when the $z_i$'s are endogenous). Therefore, one can test for exogeneity using parameter estimates for $\beta$ only.

    Hausman test statistic: Under $H_0$,

    $$ HT = \left( \hat\beta_{Within} - \hat\beta_{GLS} \right)' \left[ Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS}) \right]^{-1} \left( \hat\beta_{Within} - \hat\beta_{GLS} \right) \sim \chi^2(K). $$

    Notes

    $\hat\beta_{GLS}$ and $\hat\beta_{Within}$ must have the same dimension. The weighting matrix $\left[ Var(\hat\beta_{Within}) - Var(\hat\beta_{GLS}) \right]$ is positive: GLS is more efficient than Within under the null.

    Recall that $Var(\hat\beta_{GLS}) = \sigma^2_\varepsilon (X'QX + \theta^{-1}X'BX)^{-1}$ and $Var(\hat\beta_{Within}) = \sigma^2_\varepsilon (X'QX)^{-1}$.

    Interpretation of the number of degrees of freedom of the test:

    The Within estimator is based on the condition $E(X'QU) = 0$, whereas GLS is based on $E(X'\Omega^{-1}U) = 0$, that is, $E(X'QU) = 0$ and $E(X'BU) = 0$.

    For GLS, we add K additional conditions (in terms of B): the rank of X. The Hausman test uses these additional restrictions (see GMM later).
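    The statistic is a quadratic form in the difference of the two estimates. A minimal sketch with made-up inputs (the function name `hausman` and the numbers are hypothetical, not estimates from the course):

```python
import numpy as np

def hausman(b_within, b_gls, V_within, V_gls):
    """Hausman statistic and its degrees of freedom (number of compared coefficients)."""
    d = np.asarray(b_within, dtype=float) - np.asarray(b_gls, dtype=float)
    V = np.asarray(V_within, dtype=float) - np.asarray(V_gls, dtype=float)
    # d' V^{-1} d, solving the linear system instead of forming the inverse.
    stat = float(d @ np.linalg.solve(V, d))
    return stat, d.size

# Illustrative inputs: the Within variances exceed the GLS ones,
# so the difference of covariance matrices is positive definite.
stat, dof = hausman([1.0, 2.0], [0.9, 2.1],
                    np.diag([0.02, 0.03]), np.diag([0.01, 0.01]))
```

    Here the statistic is 0.01/0.01 + 0.01/0.02 = 1.5 with 2 degrees of freedom, well below the 5% critical value of a chi-square(2) (about 5.99), so exogeneity would not be rejected in this made-up case.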


    4.4 Instrumental Variable estimation: Hausman-Taylor GLS estimator

    4.4.1 Instrumental Variable estimation

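    Alternative method: Instrumental-variable estimation. In the cross-section context with N observations:

    $$ Y = X\beta + \varepsilon, \quad E(X'\varepsilon) \ne 0, \quad E(W'\varepsilon) = 0, $$

    where W is a $N \times L$ matrix of instruments. If K = L,

    $$ E[W'(Y - X\beta)] = 0 \iff E(W'Y) = E(W'X)\beta \ \Rightarrow \ \hat\beta = (W'X)^{-1}W'Y \quad \text{(IV estimator)}. $$

    If L > K,

    $$ E[W'(Y - X\beta)] = 0 \quad \text{(L conditions on K parameters)} $$

    and we construct the quadratic form $(Y - X\beta)'W(W'W)^{-1}W'(Y - X\beta) = (Y - X\beta)'P_W(Y - X\beta)$, where $P_W = W(W'W)^{-1}W'$, which yields

    $$ \hat\beta = (X'P_W X)^{-1}(X'P_W Y). $$

    Note: in general, instruments W originate from within or outside the equation.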
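    A sketch of both cases on simulated cross-section data, assuming one endogenous regressor and two valid instruments (all names and parameter values are hypothetical). With L = K the projection formula collapses to $(W'X)^{-1}W'Y$; with L > K it is the usual two-stage least squares estimator.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
w1, w2 = rng.normal(size=n), rng.normal(size=n)   # instruments
v = rng.normal(size=n)
eps = 0.8 * v + rng.normal(size=n)                # error correlated with x through v
x = w1 + 0.5 * w2 + v                             # endogenous regressor
y = 2.0 * x + eps                                 # true coefficient: 2.0

def iv(y, X, W):
    """beta = (X' P_W X)^{-1} X' P_W y, with P_W the projection on the columns of W."""
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]    # P_W X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

X = x[:, None]
W1 = w1[:, None]
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]            # inconsistent here
b_iv_just = iv(y, X, W1)                                # L = K = 1
b_simple = np.linalg.solve(W1.T @ X, W1.T @ y)          # (W'X)^{-1} W'y
b_iv_over = iv(y, X, np.column_stack([w1, w2]))         # L = 2 > K = 1
```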

    4.4.2 IV in a panel-data context

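    Account for the variance-covariance structure ($\Omega$); find relevant instruments, not correlated with $\alpha$.

    Consider the general, augmented model:

    $$ Y = X_1\beta_1 + X_2\beta_2 + Z_1\gamma_1 + Z_2\gamma_2 + \alpha + \varepsilon, $$

    where

    $X_1$: $NT \times K_1$, exogenous, varying across i and t;
    $X_2$: $NT \times K_2$, endogenous, varying across i and t;
    $Z_1$: $NT \times G_1$, exogenous, varying across i;
    $Z_2$: $NT \times G_2$, endogenous, varying across i;

    and let $\Xi = (X_1, X_2, Z_1, Z_2)$ and $\delta = (\beta_1', \beta_2', \gamma_1', \gamma_2')'$.

    General form of the Instrumental-variable estimator for panel data: Let $Y^* = \Omega^{-1/2}Y$, $X^* = \Omega^{-1/2}X$, and $\Xi^* = \Omega^{-1/2}\Xi$. We have

    $$ \hat\delta_{IV} = \left[ \Xi^{*\prime} P_W \Xi^* \right]^{-1} \left[ \Xi^{*\prime} P_W Y^* \right] = \left[ \Xi'\Omega^{-1/2} P_W \Omega^{-1/2}\Xi \right]^{-1} \left[ \Xi'\Omega^{-1/2} P_W \Omega^{-1/2}Y \right]. $$

    Computation of $\Omega^{-1/2}$: as in the usual GLS case.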

    4.4.3 Exogeneity assumptions and a first instrument matrix

    Exogeneity assumptions: $E(X_1'\alpha) = E(Z_1'\alpha) = 0$.

    The obvious instruments are $X_1$ and $Z_1$, but they are not sufficient because $K_1 + G_1 < K_1 + K_2 + G_1 + G_2$.

    Additional instruments: must not be correlated with $\alpha$.

    Because $\alpha$ is the source of endogeneity, every variable not correlated with $\alpha$ is a valid instrument. The best valid instruments are highly correlated with $X_2$ and $Z_2$.

    $QX_1$ and $QX_2$ are valid instruments: $E[(QX_1)'\alpha] = E[X_1'Q\alpha] = 0$ and $E[(QX_2)'\alpha] = E[X_2'Q\alpha] = 0$.

    As for $X_1$, it is equivalent to use $BX_1$ because we need

    $$ E[X_1'\Omega^{-1}U] = E[X_1'(Q + \theta^{-1}B)U] = E[X_1'B(Q + \theta^{-1}B)U], $$

    since BQ = 0 and BB = B.

    Hausman-Taylor (1981) matrix of instruments:

    $$ W_{HT} = [QX_1, QX_2, BX_1, Z_1] = [QX_1, QX_2, X_1, Z_1]. $$

    Identification condition: We have $K_1 + K_2 + G_1 + G_2$ parameters to estimate, using $K_1 + K_1 + K_2 + G_1$ instruments ($K_1 + K_2$ instruments in QX). Therefore, the identification condition is $K_1 \ge G_2$.

    4.4.4 More efficient procedures: Amemiya-MaCurdy and Breusch-Mizon-Schmidt

    4.4.4.1 Amemiya and MaCurdy (1986)

    Use the fact that if $x_{it}$ is exogenous, we can use the following conditions: $E(x_{it}\alpha_i) = 0 \ \forall i, \forall t$, instead of $E(\bar x_i'\alpha_i) = 0$.

    Amemiya and MaCurdy (1986) suggest using the matrix $X_1^*$ in the list of instruments:

    $$ X_1^* = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T} \\ \vdots & \vdots & & \vdots \\ x_{11} & x_{12} & \cdots & x_{1T} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NT} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NT} \end{bmatrix} \begin{matrix} (i = 1, t = 1) \\ \vdots \\ (i = 1, t = T) \\ \vdots \\ (i = N, t = 1) \\ \vdots \\ (i = N, t = T) \end{matrix} $$

    such that $QX_1^* = 0$ and $BX_1^* = X_1^*$. The AM instrument matrix is $W_{AM} = [QX, X_1^*, Z_1]$, and an equivalent estimator obtains by using

    $$ W_{AM} = [QX, (QX_1)^*, BX_1, Z_1], $$

    where $(QX_1)^*$ is constructed as $X_1^*$ above.

    Amemiya and MaCurdy: their instrument matrix yields an estimator at least as efficient as with the Hausman-Taylor matrix, if $\alpha_i$ is not correlated with the regressors $\forall t$.

    Identification condition: We add $(QX_1)^*$ to the Hausman-Taylor list of instruments, but as $(QX_1)^*$ is not of full column rank, we only add $(T-1)K_1$ instruments. The identification condition is $TK_1 \ge G_2$.

    4.4.4.2 Breusch, Mizon and Schmidt (1989)

    Even more efficient estimator: based on conditions $E[(QX_{2it})'\alpha_i] = 0 \ \forall i, \forall t$, instead of condition $E[(Q_T X_{2i})'\alpha_i] = 0$.

    For BMS, the estimator is more efficient if endogeneity in $X_2$ originates from a time-invariant component. BMS instrument matrix:

    $$ W_{BMS} = [QX, (QX_1)^*, (QX_2)^*, BX_1, Z_1], $$

    where $(QX_1)^*$ and $(QX_2)^*$ are constructed the same way as $X_1^*$ for AM.

    Identification condition: For BMS, we add $(QX_2)^*$ to the Amemiya-MaCurdy instruments. The condition is then $TK_1 + (T-1)K_2 \ge G_2$. As before, we only add $(T-1)K_2$ instruments, as $(QX_2)^*$ is not of full rank but of rank $(T-1)K_2$.

    4.5 Computation of variance-covariance matrix for IV estimators

    Problem here: endogenous regressors may yield inconsistent estimates of the variance components in $\Omega$, in particular of parameter $\theta$.

    Hausman and Taylor (1981) suggest a method that yields consistent estimates.

    Let $M_1$ denote the individual-mean vector of the Within residual:

    $$ M_1 = BY - BX\hat\beta_W = \left[ B - BX(X'QX)^{-1}X'Q \right] Y = Z\gamma + \alpha + \left[ B - BX(X'QX)^{-1}X'Q \right] \varepsilon, $$

    where $X = (X_1 | X_2)$, $Z = (Z_1 | Z_2)$, and $\gamma = (\gamma_1', \gamma_2')'$. All terms above except $Z\gamma$ can be treated as centered residuals, and it suffices to find instruments for $Z_2$ in order to estimate $\gamma$.

    The IV estimator of $\gamma$ is

    $$ \hat\gamma_B = (Z'P_C Z)^{-1}(Z'P_C M_1), $$

    where $P_C$ is the projection matrix associated with instruments $C = (X_1, Z_1)$. Using parameter estimates $\hat\beta_W$ and $\hat\gamma_B$, we form residuals

    $$ \hat u_W = QY - QX\hat\beta_W \quad \text{and} \quad \hat u_B = BY - BX\hat\beta_W - Z\hat\gamma_B. $$

    These two vectors of residuals are used to compute variance components as in standard Feasible GLS.

    4.5.1 Full IV-GLS estimation procedure

    Step 1. Compute individual means and deviations, BX, BY, QX and QY.

    Step 2. Estimate parameters $\hat\beta_W$ associated with X using Within.

    Step 3. Estimate $\hat\gamma_B$ by the IV procedure above.

    Step 4. Compute $\hat\sigma^2_\alpha$ and $\hat\sigma^2_\varepsilon$ from $\hat u_W$ and $\hat u_B$, and compute $\hat\theta = 1 + T\hat\sigma^2_\alpha/\hat\sigma^2_\varepsilon$.

    Step 5. Transform variables by the GLS scalar procedure, e.g., $(Q + \hat\theta^{-1/2}B)Y = y_{it} - (1 - \hat\theta^{-1/2})\bar y_i$.

    Step 6. Compute the projection matrix $P_W$ from instrument matrix W.

    Step 7. Estimate the parameters $\delta$.
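    The seven steps can be sketched as follows for the Hausman-Taylor instrument set, on simulated data with $K_1 = 2$, $K_2 = 1$, $G_1 = 1$ (a constant) and $G_2 = 1$. The data-generating process, the helper names (`stack`, `B`, `ols`, `proj`) and the particular divisors used for the variance components are assumptions made for this illustration, not the course's own implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 1000, 5
alpha = rng.normal(size=N)                                   # individual effect
m = rng.normal(size=(N, 2))                                  # persistent part of X1
X1 = m[:, None, :] + rng.normal(size=(N, T, 2))              # exogenous, varies over i and t
X2 = (0.5 * alpha[:, None] + rng.normal(size=(N, T)))[:, :, None]  # correlated with alpha
Z1 = np.ones((N, 1))                                         # exogenous (constant)
Z2 = (m.sum(axis=1) + alpha + rng.normal(size=N))[:, None]   # correlated with alpha
beta, gamma = np.array([1.0, -1.0, 0.5]), np.array([2.0, 1.0])
X = np.concatenate([X1, X2], axis=2)                         # (N, T, K)
Zi = np.hstack([Z1, Z2])                                     # (N, G)
Y = X @ beta + (Zi @ gamma + alpha)[:, None] + rng.normal(size=(N, T))

def stack(a):                       # (N, T, k) -> (NT, k), i slow and t fast
    return a.reshape(N * T, -1)

def B(a):                           # individual means, repeated over t
    return np.broadcast_to(a.mean(axis=1, keepdims=True), a.shape)

def ols(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

def proj(W, A):                     # P_W A
    return W @ ols(W, A)

K = X.shape[2]
Z = np.repeat(Zi, T, axis=0)                                 # (NT, G)
Yv = Y[:, :, None]
# Steps 1-2: Within transformation and Within estimate of beta.
QX, QY = stack(X - B(X)), stack(Yv - B(Yv))[:, 0]
b_w = ols(QX, QY)
# Step 3: IV regression of M1 on Z with instruments C = (X1, Z1).
M1 = stack(B(Yv))[:, 0] - stack(B(X)) @ b_w
PCZ = proj(np.hstack([stack(X1), Z[:, :1]]), Z)
g_b = np.linalg.solve(PCZ.T @ Z, PCZ.T @ M1)
# Step 4: variance components and theta.
u_w, u_b = QY - QX @ b_w, M1 - Z @ g_b
s2_eps = u_w @ u_w / (N * (T - 1) - K)
s2_alpha = u_b @ u_b / (N * T) - s2_eps / T
theta = 1.0 + T * s2_alpha / s2_eps
# Step 5: scalar GLS transformation of Y and of all regressors.
lam = 1.0 - 1.0 / np.sqrt(theta)
Xi_s = np.hstack([stack(X), Z]) - lam * np.hstack([stack(B(X)), Z])
Y_s = stack(Yv)[:, 0] - lam * stack(B(Yv))[:, 0]
# Steps 6-7: project on W_HT = [QX1, QX2, BX1, Z1] and estimate delta.
W = np.hstack([QX, stack(B(X1)), Z[:, :1]])
PXi = proj(W, Xi_s)
delta = np.linalg.solve(PXi.T @ Xi_s, PXi.T @ Y_s)           # (beta_1, beta_2, gamma_1, gamma_2)
```

    In this design the order condition $K_1 \ge G_2$ holds (2 against 1), and the individual means of $X_1$ are informative about $Z_2$, which is what makes $BX_1$ a useful instrument.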

    4.6 Example: Wage equation

    4.6.1 Model specication


    Theory (Human capital or signal theory):

    $$ \log w = F[X_1, \alpha, ED], $$

    where w is the wage rate, $\alpha$ is the worker's ability (unobserved), $X_1$ are additional variables (industry, occupation status, etc.), and ED is the educational level. Proxies for ability that can be used: number of hours worked, experience, union, etc.

    Main objective: estimate the marginal gain associated with ED: $\partial w/\partial ED$.

    But problem: what if the worker's ability is constant through time and conditions ED? The true model would be

    $$ \log w = F[X_1, \alpha, ED], \qquad ED = G[\alpha, X_2], $$

    where $X_2$ are additional, individual-specific variables.

    If ability is replaced by proxies Z, we have

    $$ \log w = F[X_1, Z, ED] + U, \qquad ED = G[X_2, Z] + V, $$

    where $U = F[X_1, \alpha, ED] - F[X_1, Z, ED]$ and $V = G[X_2, \alpha] - G[X_2, Z]$.

    Two problems when estimating the first equation while overlooking the second one:

    If some $X_1$ and $X_2$ variables are in common, endogeneity bias (because of ED);

    If Z is correlated with omitted variables (explaining ability), measurement-error bias.


    4.7 Application: returns to education

    Sample used: Panel Study of Income Dynamics (PSID), University of Michigan. See Baltagi and Khanti-Akom 1990, Cornwell and Rupert 1988.

    595 individuals, for years 1976 to 1982 (7 time periods): heads of households (males and females) aged between 18 and 65 in 1976, with a positive wage in private, nonfarm employment for the years 1976 to 1982.

    4.7.1 Variables related to job status

    LWAGE : logarithm of wage earnings;

    WKS : number of weeks worked in the year;

    EXP : working experience in years at the date of the sample;

    OCC : dummy, 1 if blue-collar occupation;

    IND : dummy, 1 if working in industry;

    UNION : dummy, 1 if wage is covered by a union contract.

    4.7.2 Variables related to characteristics of household heads

    SMSA : dummy, 1 if household resides in SMSA (Standard Metropolitan Statistical Area);

    SOUTH : dummy, 1 if individual resides in the south;

    MS : Marital Status dummy, 1 if head is married;

    FEM : dummy, 1 if female;

    BLK : dummy, 1 if head is black;

    ED : number of years of education attained.

    Individual-specific variables: ED, BLK and FEM.

    Estimation of non-augmented models (w/o $Z_i$'s)

    Variables a priori endogenous (because correlated with ability: individual effects): $X_2$: (EXPE, EXPE2, UNION, WKS, MS);

    Variables a priori exogenous: $X_1$: (OCC, SOUTH, SMSA, IND).

    Augmented model

    $$ Y_{it} = X_{1it}\beta_1 + X_{2it}\beta_2 + Z_{1i}\gamma_1 + Z_{2i}\gamma_2 + \alpha_i + \varepsilon_{it} $$

    Variables a priori endogenous: $Z_2$: ED;

    Variables a priori exogenous: $Z_1$: (BLK, FEM).

    Table 4.1: Sample 1, 1976-1982. Descriptive statistics

    Variable       Mean    Std. Dev.    Minimum    Maximum
    LWAGE        6.6763       0.4615     4.6052     8.5370
    EXP         19.8538      10.9664     1.0000    51.0000
    WKS         46.8115       5.1291     5.0000    52.0000
    OCC          0.5112       0.4999     0.0000     1.0000
    IND          0.3954       0.4890     0.0000     1.0000
    UNION        0.3640       0.4812     0.0000     1.0000
    SOUTH        0.2903       0.4539     0.0000     1.0000
    SMSA         0.6538       0.4758     0.0000     1.0000
    MS           0.8144       0.3888     0.0000     1.0000
    ED          12.8454       2.7880     4.0000    17.0000
    FEM          0.1126       0.3161     0.0000     1.0000
    BLK          0.0723       0.2590     0.0000     1.0000

    Table 4.2: Dependent variable: log(wage). Exogenous regressors only.

    Variable     Within                  GLS
    Constant                              0.0976 (0.0040)
    OCC          -0.0696 (0.02323)       -0.0701 (0.02322)
    SOUTH        -0.0052 (0.05833)       -0.0072 (0.05807)
    SMSA         -0.1287 (0.03295)       -0.1275 (0.03290)
    IND           0.0317 (0.02626)        0.0317 (0.02624)

    $\chi^2(4) = 0.551$

    Notes. Standard errors are in parentheses.

    Table 4.3: Dependent variable: log(wage). Endogenous regressors only.

    Variable     Within                  GLS
    Constant                              0.0561 (0.0024)
    EXPE          0.1136 (0.002467)       0.1133 (0.002466)
    EXPE2        -0.0004 (0.000054)      -0.0004 (0.000054)
    WKS           0.0008 (0.0005994)      0.0008 (0.0005994)
    MS           -0.0322 (0.01893)       -0.0325 (0.01892)
    UNION         0.0301 (0.01480)        0.0300 (0.01479)

    $\chi^2(5) = 24.94$

    Notes. Standard errors are in parentheses.

    Table 4.4: Dependent variable: log(wage). Augmented model.

    Variable     Within                  GLS
    Constant                              0.1866 (0.01189)
    OCC          -0.0214 (0.01378)       -0.0243 (0.01367)
    SOUTH        -0.0018 (0.03429)        0.0048 (0.03188)
    SMSA         -0.0424 (0.01942)       -0.0468 (0.01891)
    IND           0.0192 (0.01544)        0.0148 (0.01521)
    EXPE          0.1132 (0.00247)        0.1084 (0.00243)
    EXPE2        -0.0004 (0.00005)       -0.0004 (0.00005)
    WKS           0.0008 (0.00059)        0.0008 (0.00059)
    MS           -0.0297 (0.01898)       -0.0391 (0.01884)
    UNION         0.0327 (0.01492)        0.0375 (0.01472)
    FEM                                  -0.1666 (0.12646)
    BLK                                  -0.2639 (0.15413)
    ED                                    0.1373 (0.01415)

    $\chi^2(9) = 495.3$

    Notes. Standard errors are in parentheses.

    Table 4.5: Dependent variable: log(wage). IV Estimation

    Variable     HT                   AM                   BMS
    Constant      0.1772 (0.017)       0.1781 (0.016)       0.1748 (0.016)
    OCC          -0.0207 (0.013)      -0.0208 (0.013)      -0.0204 (0.013)
    SOUTH         0.0074 (0.031)       0.0072 (0.031)       0.0077 (0.031)
    SMSA         -0.0418 (0.018)      -0.0419 (0.018)      -0.0423 (0.018)
    IND           0.0135 (0.015)       0.0136 (0.015)       0.0138 (0.015)
    EXPE          0.1131 (0.002)       0.1129 (0.002)       0.1127 (0.002)
    EXPE2        -0.0004 (0.005)      -0.0004 (0.000)      -0.0004 (0.000)
    WKS           0.0008 (0.000)       0.0008 (0.000)       0.0008 (0.000)
    MS           -0.0298 (0.018)      -0.0300 (0.018)      -0.0303 (0.018)
    UNION         0.0327 (0.014)       0.0324 (0.014)       0.0326 (0.014)
    FEM          -0.1309 (0.126)      -0.1320 (0.126)      -0.1337 (0.126)
    BLK          -0.2857 (0.155)      -0.2859 (0.155)      -0.2793 (0.155)
    ED            0.1379 (0.021)       0.1372 (0.020)       0.1417 (0.020)
    Test          χ²(3) = 5.23         χ²(13) = 19.29       χ²(13) = 12.23

    Notes. Standard errors are in parentheses.

  • Chapter 5

    Dynamic panel data models

    5.1 Motivation

    Usefulness of dynamic panel data models:

    Investigate adjustment dynamics in micro- and macro-economic variables of interest;

    Estimate equations from intertemporal-framework models (life-cycle models, finance,...).

    In practice: estimate long-run elasticities and structural parameters from Euler equations.

    5.1.1 Dynamic formulations from dynamic programming problems

    Consider the general problem

    $$ \max_{q(0), \dots, q(T)} E \int e^{-rt}\pi(t)\,dt, \qquad \pi(t) = p(t)q(t) - c[q(t), b(t)], \qquad \dot b = G[b(t), q(t)], $$


    where b(t) is the state variable (stock, capital,...), q(t) is the control variable, and r is the discount rate. G(.) describes the evolution path of the state variable.

    Dynamic programming solves the problem in a series of steps.

    Switch to the discrete-time framework:

    $$ \max_{q_0, \dots, q_T} E\left\{ \sum_{t=0}^T (1+r)^{-t}\pi_t \right\}, \qquad b_{t+1} = f(b_t, q_t), $$

    and use the Bellman equation:

    $$ V_t(b_t) = \max E_t\left\{ \pi_t + (1+r)^{-1}V_{t+1}(b_{t+1}) \right\} = \max E_t\left\{ p_t q_t - c[q_t, b_t] + (1+r)^{-1}V_{t+1}(f[b_t, q_t]) \right\}, $$

    where $V_t(b_t)$ is the value function of the problem at time t, and $E_t$ is the conditional expectation operator at time t.

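    We use a) the envelope theorem (the evolution path at the optimum depends only on the state variable, as the control variable is already optimized); b) the first-order condition wrt. the control variable.

    $$ \frac{\partial V_t(b_t)}{\partial b_t} = \frac{\partial \pi_t(b_t, q_t)}{\partial b_t} + \frac{1}{1+r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial b_t} \quad \text{(Envelope theorem)}, $$

    $$ \frac{\partial V_t(b_t)}{\partial q_t} = \frac{\partial \pi_t(b_t, q_t)}{\partial q_t} + \frac{1}{1+r}\frac{\partial V_{t+1}}{\partial f}\frac{\partial f(b_t, q_t)}{\partial q_t} = 0 \quad \text{(FOC)}. $$

    From (FOC):

    $$ \frac{\partial V_{t+1}}{\partial f} = -\frac{\partial \pi_t}{\partial q_t}\left[ \frac{\partial f(b_t, q_t)}{\partial q_t} \right]^{-1}(1+r), $$

    that we replace in the first equation above:

    $$ \frac{\partial V_t}{\partial b_t} = \frac{\partial \pi_t}{\partial b_t} - \frac{\partial \pi_t}{\partial q_t}\left[ \frac{\partial f(b_t, q_t)}{\partial q_t} \right]^{-1}\frac{\partial f(b_t, q_t)}{\partial b_t}. $$

    Now we lag (FOC) once and replace:

    $$ \frac{\partial \pi_{t-1}}{\partial q_{t-1}} + \frac{1}{1+r}\left[ \frac{\partial \pi_t}{\partial b_t} - \frac{\partial \pi_t}{\partial q_t}\left( \frac{\partial f}{\partial q_t} \right)^{-1}\frac{\partial f}{\partial b_t} \right] \frac{\partial f(b_{t-1}, q_{t-1})}{\partial q_{t-1}} = 0. $$

    Assume $\partial f/\partial q = a_1$ and $\partial f/\partial b = a_2$. We have

    $$ \frac{\partial \pi_t}{\partial q_t} = \frac{1+r}{a_2}\frac{\partial \pi_{t-1}}{\partial q_{t-1}} + \frac{a_1}{a_2}\frac{\partial \pi_t}{\partial b_t}. $$

    This is the Euler equation relating current and past marginal profits.

    If, for instance, profit is linear-quadratic in $q_t$ and $b_t$, we have

    $$ b_0 + b_1 q_t + b_2 b_t = \frac{1+r}{a_2}(b_0 + b_1 q_{t-1} + b_2 b_{t-1}) + \frac{a_1}{a_2}(c_0 + c_1 q_t + c_2 b_t) $$

    $$ \iff q_{it} = \delta_0 + \delta_1 q_{i,t-1} + \delta_2 b_{i,t-1} + \delta_3 b_{it} + \alpha_i + \varepsilon_{it}, $$

    where

    $$ \delta_0 = (a_2 b_1 - a_1 c_1)^{-1}\left[ b_0((1+r) - a_2) + a_1 c_0 \right], \quad \delta_1 = (a_2 b_1 - a_1 c_1)^{-1}\left[ (1+r)b_1 \right], $$

    $$ \delta_2 = (a_2 b_1 - a_1 c_1)^{-1}\left[ (1+r)b_2 \right], \quad \delta_3 = (a_2 b_1 - a_1 c_1)^{-1}\left[ a_1 c_2 - a_2 b_2 \right]. $$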

    5.1.2 Euler equations and consumption

    Consider a two-period model with the following period-to-period budget constraint

    $$ c_t + A_t = y_t + A_{t-1}(1 + r_t), \quad t = 1, 2, $$

    where $c_t$ is consumption at time t, $A_t$ is total assets, $y_t$ is wage income, and $r_t$ is the interest rate.

    Assume further intertemporally additive preferences:

    $$ U = u(c_1) + \frac{1}{1+\delta}u(c_2), $$

    where $u' > 0$, $u'' < 0$ and $\delta \ge 0$ is the subjective discount rate. Often-used specification: CES (Constant Elasticity of Substitution)

    $$ U = c_1^{-\rho} + \frac{1}{1+\delta}c_2^{-\rho}, $$

    where $\sigma = 1/(1+\rho)$ is the intertemporal elasticity of substitution.

    At the optimum (by replacing budget constraints in the utility function and optimizing wrt. $A_1$):

    $$ \frac{\partial U}{\partial A_1} = \frac{\partial u}{\partial c_1}\frac{\partial c_1}{\partial A_1} + \frac{1}{1+\delta}\frac{\partial u}{\partial c_2}\frac{\partial c_2}{\partial A_1} = 0 \iff \frac{\partial u}{\partial c_1} = \frac{1+r}{1+\delta}\frac{\partial u}{\partial c_2}. $$

    This is the intertemporal efficiency condition (Hall 1978), and in the CES case we have

    $$ c_1^{-1/\sigma} = \frac{1+r}{1+\delta}c_2^{-1/\sigma}. $$

    Stochastic framework with $u(c) = -\frac{1}{2}(\bar c - c)^2$:

    $$ \bar c - c_1 = \frac{1+r}{1+\delta}(\bar c - E c_2) \iff c_1 = E c_2 \ \text{ if } r = \delta. $$

    The Hall Euler equation with more than 2 periods reduces to

    $$ c_{t+1} = c_t + \varepsilon_{t+1}, \quad \text{where } \varepsilon_{t+1} \text{ is i.i.d.}, $$

    which is tested from the equation

    $$ \Delta c_t = \beta_0 + \beta_1 \Delta y_t + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t. $$

    This is an error-correction model that can be written

    $$ c_t = \beta_0 + \beta_1 y_t + (c_{t-1} - \beta_1 y_{t-1}) + \beta_2(y_{t-1} - c_{t-1}) + \varepsilon_t. $$

    5.1.3 Long-run relationships in economics

    Long-run relationships are represented by the stationary path of the variable of interest (consumption, capital stock, ...): $y_{t+1}/y_t = \gamma$. If we add a variable $x_t$, so that $y_{t+1} = \gamma y_t + \beta x_{t+1}$, the stationary equilibrium path is $y = \beta x/(1-\gamma)$.

    5.1.3.1 Long-run elasticities

    Dynamic models are helpful in computing long-run elasticities. Consider for example the dynamic consumption model

    $$\tilde C_{i,t+j} = \gamma \tilde C_{i,t+j-1} + \beta \tilde P_{i,t+j} + u_{i,t+j},$$

    where $\tilde C_{i,t+j}$ and $\tilde P_{i,t+j}$ respectively denote logs of consumption and price. Lagged consumption here accounts for habits. We have

    $$\tilde C_{i,t+j} = \gamma^{j+1}\tilde C_{i,t-1} + \beta\left(\gamma^{j}\tilde P_{it} + \gamma^{j-1}\tilde P_{i,t+1} + \cdots + \gamma\tilde P_{i,t+j-1} + \tilde P_{i,t+j}\right) + \bar u_{i,t+j},$$

    where $\bar u_{i,t+j} = \gamma^{j}u_{it} + \gamma^{j-1}u_{i,t+1} + \cdots + \gamma u_{i,t+j-1} + u_{i,t+j}$.

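    Assume we want to compute the change in consumption at time $t+j$ following a permanent change of 1% in price between $t$ and $t+j$:

    $$\frac{\partial \tilde C_{i,t+j}}{\partial \tilde P_{it}} + \frac{\partial \tilde C_{i,t+j}}{\partial \tilde P_{i,t+1}} + \cdots + \frac{\partial \tilde C_{i,t+j}}{\partial \tilde P_{i,t+j}} = \beta(\gamma^{j} + \gamma^{j-1} + \cdots + \gamma + 1).$$

    When consumption is stationary (in logs), $|\gamma| < 1$, and the long-run effect of price obtains by taking the limit

    $$\lim_{j\to\infty}\sum_{s=0}^{j}\frac{\partial \tilde C_{i,t+j}}{\partial \tilde P_{i,t+s}} = \lim_{j\to\infty}\beta(\gamma^{j} + \gamma^{j-1} + \cdots + \gamma + 1) = \frac{\beta}{1-\gamma}.$$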
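The convergence of the cumulated price effect to its long-run value can be checked numerically. The sketch below writes `gamma` for the habit (lagged consumption) coefficient and `beta` for the price coefficient; the function names and the numerical values are illustrative assumptions, not estimates from the course.

```python
def cumulative_price_effect(gamma, beta, j):
    """Effect on log consumption at t+j of a permanent unit change
    in log price from t onwards: beta * (gamma^j + ... + gamma + 1)."""
    return beta * sum(gamma ** s for s in range(j + 1))


def long_run_effect(gamma, beta):
    """Limit of the cumulative effect as j grows, valid only if |gamma| < 1."""
    if abs(gamma) >= 1:
        raise ValueError("requires |gamma| < 1 (stationarity)")
    return beta / (1.0 - gamma)


gamma, beta = 0.6, -0.4  # hypothetical habit and price coefficients
for j in (0, 1, 5, 20, 50):
    print(j, round(cumulative_price_effect(gamma, beta, j), 6))
print("long run:", long_run_effect(gamma, beta))
```

With these values the short-run elasticity is `beta` itself, and the cumulated effect approaches `beta / (1 - gamma)` geometrically at rate `gamma`.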

    5.1.3.2 Dynamic representations from AR(1) errors

    Consider the following Cobb-Douglas production model

    $$\log Q_{it} = \beta_1 \log N_{it} + \beta_2 \log K_{it} + u_{it},$$

    where $Q_{it}$ is output of firm $i$ at time $t$, $N_{it}$ is labor input, $K_{it}$ is capital stock, and $u_{it}$ is the residual. Assume the latter decomposes into

    $$u_{it} = \lambda_t + \alpha_i + v_{it} + \varepsilon_{it},$$

    where $\lambda_t$ is a year-specific intercept (industry-wide technological change), $\alpha_i$ is the unobserved firm-specific effect, $\varepsilon_{it}$ is an i.i.d. error component (measurement error), and $v_{it}$ is a productivity shock having an AR(1) representation:

    $$v_{it} = \rho v_{i,t-1} + e_{it}.$$

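    This model has the following dynamic representation:

    $$\log Q_{it} = \beta_1 \log N_{it} - \rho\beta_1 \log N_{i,t-1} + \beta_2 \log K_{it} - \rho\beta_2 \log K_{i,t-1} + \rho \log Q_{i,t-1} + (\lambda_t - \rho\lambda_{t-1}) + [\alpha_i(1-\rho) + e_{it} + \varepsilon_{it} - \rho\varepsilon_{i,t-1}],$$

    or

    $$\log Q_{it} = \pi_1 \log N_{it} + \pi_2 \log N_{i,t-1} + \pi_3 \log K_{it} + \pi_4 \log K_{i,t-1} + \pi_5 \log Q_{i,t-1} + \lambda_t^{*} + (\alpha_i^{*} + \omega_{it}),$$

    subject to restrictions $\pi_2 = -\pi_1\pi_5$ and $\pi_4 = -\pi_3\pi_5$.

    Hence the equivalence between a static (short-run) model with serially correlated productivity shocks and a dynamic representation of production output.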
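The step from the static model to the dynamic one is a quasi-differencing argument. A sketch, writing $\beta_1, \beta_2$ for the input coefficients, $\lambda_t$ and $\alpha_i$ for the year and firm effects, and $\rho$ for the AR(1) coefficient of the productivity shock (the symbol names are a notational assumption): lag the static model one period, multiply by $\rho$ and subtract, which gives

$$\log Q_{it} - \rho\log Q_{i,t-1} = \beta_1(\log N_{it} - \rho\log N_{i,t-1}) + \beta_2(\log K_{it} - \rho\log K_{i,t-1}) + (\lambda_t - \rho\lambda_{t-1}) + (1-\rho)\alpha_i + (v_{it} - \rho v_{i,t-1}) + (\varepsilon_{it} - \rho\varepsilon_{i,t-1}).$$

Since $v_{it} - \rho v_{i,t-1} = e_{it}$, the serially correlated shock is replaced by its innovation, and moving $\rho\log Q_{i,t-1}$ to the right-hand side yields the dynamic form. The coefficient on each lagged input equals minus the product of the coefficient on the current input and the coefficient on lagged output, which is where the two nonlinear restrictions come from.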

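    5.2 The dynamic fixed-effect model

    Simple dynamic panel-data model:

    $$y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T,$$

    where initial conditions $y_{i0}$, $i = 1, 2, \ldots, N$, are assumed known.

    We assume $E(\varepsilon_{it}) = 0\ \forall i, t$; $E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_\varepsilon$ if $i = j$, $t = s$ and 0 otherwise; $E(\alpha_i\varepsilon_{it}) = 0\ \forall i, t$.

    By continuous substitution:

    $$y_{it} = \varepsilon_{it} + \gamma\varepsilon_{i,t-1} + \gamma^2\varepsilon_{i,t-2} + \cdots + \gamma^{t-1}\varepsilon_{i1} + \frac{1-\gamma^{t}}{1-\gamma}\,\alpha_i + \gamma^{t}y_{i0}.$$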

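    5.2.1 Bias in the Fixed-Effects estimator

    The Within estimator is:

    $$\hat\gamma = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - \bar y_i)(y_{i,t-1} - \bar y_{i,-1})}{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar y_{i,-1})^2}, \qquad \hat\alpha_i = \bar y_i - \hat\gamma\,\bar y_{i,-1},$$

    where

    $$\bar y_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar y_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1}, \qquad \bar\varepsilon_i = \frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}.$$

    Also,

    $$\hat\gamma = \gamma + \frac{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\varepsilon_{it} - \bar\varepsilon_i)(y_{i,t-1} - \bar y_{i,-1})}{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar y_{i,-1})^2}.$$

    This estimator exists if the denominator is nonzero, and is consistent if the numerator converges to 0.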

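    Numerator:

    $$\operatorname{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar y_{i,-1})(\varepsilon_{it} - \bar\varepsilon_i) = -\operatorname{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\bar y_{i,-1}\bar\varepsilon_i,$$

    because $\varepsilon_{it}$ is serially uncorrelated and not correlated with $\alpha_i$. We use

    $$\bar y_{i,-1} = \frac{1}{T}\sum_{t=1}^{T}y_{i,t-1} = \frac{1}{T}\left\{\frac{1-\gamma^{T}}{1-\gamma}\,y_{i0} + \frac{(T-1) - T\gamma + \gamma^{T}}{(1-\gamma)^2}\,\alpha_i + \frac{1-\gamma^{T-1}}{1-\gamma}\,\varepsilon_{i1} + \frac{1-\gamma^{T-2}}{1-\gamma}\,\varepsilon_{i2} + \cdots + \varepsilon_{i,T-1}\right\}.$$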

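    We have

    $$\operatorname{plim}\frac{1}{N}\sum_{i=1}^{N}\bar y_{i,-1}\bar\varepsilon_i = \operatorname{plim}\left\{\frac{1}{N}\sum_{i=1}^{N}\bar\varepsilon_i\,\frac{1}{T}\left[\sum_{t=1}^{T-1}\frac{1-\gamma^{T-t}}{1-\gamma}\,\varepsilon_{it}\right]\right\}$$

    $$= \operatorname{plim}\left\{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{T}\sum_{t=1}^{T}\varepsilon_{it}\right)\frac{1}{T}\left[\sum_{t=1}^{T-1}\frac{1-\gamma^{T-t}}{1-\gamma}\,\varepsilon_{it}\right]\right\} = \frac{\sigma^2_\varepsilon}{T^2}\,\frac{(T-1) - T\gamma + \gamma^{T}}{(1-\gamma)^2}.$$

    In a similar manner, we show that

    $$\operatorname{plim}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar y_{i,-1})^2 = \frac{\sigma^2_\varepsilon}{1-\gamma^2}\left\{1 - \frac{1}{T} - \frac{2\gamma}{(1-\gamma)^2}\,\frac{(T-1) - T\gamma + \gamma^{T}}{T^2}\right\}.$$

    Forming the ratio of these two terms, the asymptotic bias is

    $$\operatorname{plim}_{N\to\infty}(\hat\gamma - \gamma) = -\frac{1+\gamma}{T-1}\left(1 - \frac{1}{T}\,\frac{1-\gamma^{T}}{1-\gamma}\right)\left\{1 - \frac{2\gamma}{(1-\gamma)(T-1)}\left[1 - \frac{1-\gamma^{T}}{T(1-\gamma)}\right]\right\}^{-1} = O(1/T).$$

    In the transformed model

    $$(y_{it} - \bar y_i) = \gamma(y_{i,t-1} - \bar y_{i,-1}) + (\varepsilon_{it} - \bar\varepsilon_i),$$

    the explanatory variable is correlated with the residual, and the correlation is of order $1/T$. Hence, the Fixed-Effects estimator is biased in the usual case where $N$ is large and $T$ is small.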

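    Table 5.1: Asymptotic bias in Fixed-Effects DPD estimator

    | $\gamma$ | $T$ | Bias | Percent |
    |---|---|---|---|
    | 0.2 | 6 | -0.2063 | -103.1693 |
    | | 8 | -0.1539 | -76.9597 |
    | | 10 | -0.1226 | -61.3139 |
    | | 20 | -0.0607 | -30.3541 |
    | | 40 | -0.0302 | -15.0913 |
    | 0.5 | 6 | -0.2756 | -55.1282 |
    | | 8 | -0.2049 | -40.9769 |
    | | 10 | -0.1622 | -32.4421 |
    | | 20 | -0.0785 | -15.6977 |
    | | 40 | -0.0384 | -7.6819 |
    | 0.7 | 6 | -0.3307 | -47.2392 |
    | | 8 | -0.2479 | -35.4084 |
    | | 10 | -0.1966 | -28.0912 |
    | | 20 | -0.0938 | -13.3955 |
    | | 40 | -0.0449 | -6.4114 |
    | 0.9 | 6 | -0.3939 | -43.7633 |
    | | 8 | -0.3017 | -33.5179 |
    | | 10 | -0.2432 | -27.0248 |
    | | 20 | -0.1196 | -13.2934 |
    | | 40 | -0.0563 | -6.2561 |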
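The entries of Table 5.1 follow directly from the asymptotic bias formula of Section 5.2.1. A minimal sketch that evaluates it, writing `gamma` for the autoregressive coefficient (the function name `within_bias` is a hypothetical helper, not part of any package):

```python
def within_bias(gamma, T):
    """Asymptotic (N -> infinity, T fixed) bias of the Within estimator
    of gamma in y_it = gamma * y_{i,t-1} + alpha_i + eps_it."""
    if abs(gamma) >= 1 or T < 2:
        raise ValueError("requires |gamma| < 1 and T >= 2")
    # h = (1/T) * (1 - gamma^T) / (1 - gamma)
    h = (1.0 - gamma ** T) / (T * (1.0 - gamma))
    leading = -(1.0 + gamma) / (T - 1) * (1.0 - h)
    correction = 1.0 - 2.0 * gamma / ((1.0 - gamma) * (T - 1)) * (1.0 - h)
    return leading / correction


# Bias and bias as a percentage of gamma, same grid as Table 5.1.
for gamma in (0.2, 0.5, 0.7, 0.9):
    for T in (6, 8, 10, 20, 40):
        b = within_bias(gamma, T)
        print(f"{gamma:.1f} {T:3d} {b:8.4f} {100 * b / gamma:10.4f}")
```

The bias is always negative for positive `gamma` and shrinks roughly like `1/T`, which is why it matters most in short panels.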

    5.2.2 Instrumental-variable estimation

    Only way to obtain a consistent estimator of $\gamma$ when $T$ is fixed (small). Different procedure to eliminate individual effects: use First differencing instead of Within:

    $$(y_{it} - y_{i,t-1}) = \gamma(y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{it} - \varepsilon_{i,t-1}) \;\Leftrightarrow\; \Delta y_{it} = \gamma\,\Delta y_{i,t-1} + \Delta\varepsilon_{it},$$

    and in vector form:

    $$\Delta y_i = \gamma\,\Delta y_{i,-1} + \Delta\varepsilon_i, \quad i = 1, 2, \ldots, N.$$

    In the model above, $y_{i,t-1}$ is correlated by construction with $\varepsilon_{i,t-1}$! We need instruments that are uncorrelated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$ but correlated with $(y_{i,t-1} - y_{i,t-2})$. Only possibility in a single-equation framework with no other explanatory variables: use values of the dependent variable.

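    Because of the autoregressive nature of the model, instruments from future values of $y_{it}$ are not feasible, because $y_{it}$ is a recursive function of $\varepsilon_{it}, \varepsilon_{i,t-1}, \ldots, \varepsilon_{i1}, \alpha_i, y_{i0}$.

    As for lagged dependent variables, we can use either $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$:

    $$E[y_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})] = E(\varepsilon_{i,t-2}\varepsilon_{it}) - E(\varepsilon_{i,t-2}\varepsilon_{i,t-1}) = 0,$$

    $$E[(y_{i,t-2} - y_{i,t-3})(\varepsilon_{it} - \varepsilon_{i,t-1})] = E[\varepsilon_{i,t-2}(\varepsilon_{it} - \varepsilon_{i,t-1})] - E[\varepsilon_{i,t-3}(\varepsilon_{it} - \varepsilon_{i,t-1})] = 0,$$

    $$E[y_{i,t-2}(y_{i,t-1} - y_{i,t-2})] = 0 - E(\varepsilon^2_{i,t-2}) = -\sigma^2_\varepsilon,$$

    $$E[(y_{i,t-2} - y_{i,t-3})(y_{i,t-1} - y_{i,t-2})] = 0 - E(\varepsilon^2_{i,t-2}) = -\sigma^2_\varepsilon.$$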

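    Instrumental-variable estimators that are consistent when $N$ and/or $T \to \infty$:

    $$\hat\gamma = \frac{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{it} - y_{i,t-1})(y_{i,t-2} - y_{i,t-3})}{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{i,t-1} - y_{i,t-2})(y_{i,t-2} - y_{i,t-3})}$$

    or

    $$\hat\gamma = \frac{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{it} - y_{i,t-1})\,y_{i,t-2}}{\sum_{i=1}^{N}\sum_{t=3}^{T}(y_{i,t-1} - y_{i,t-2})\,y_{i,t-2}}.$$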

    Conclusion: With the Within transformation on a dynamic model, even though $\alpha_i$ is eliminated, endogeneity bias occurs for fixed $T$ because the $Q$ operator used introduces errors $\varepsilon_{is}$ correlated by construction with the current explanatory variable.
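The contrast between the Within estimator and the first-difference IV estimator can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the design (Gaussian effects and shocks, initial values drawn from the stationary distribution, $N = 5000$, $T = 6$, true coefficient 0.5, fixed seed) and all function names are hypothetical choices, and only the level instrument $y_{i,t-2}$ is used.

```python
import random


def simulate_panel(N, T, gamma, sigma_alpha=0.5, seed=0):
    """Draw y_i0, ..., y_iT from y_it = gamma*y_{i,t-1} + alpha_i + eps_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, sigma_alpha)
        # Assumption: y_i0 is drawn from the stationary distribution.
        y = [alpha / (1 - gamma) + rng.gauss(0.0, 1.0) / (1 - gamma ** 2) ** 0.5]
        for _ in range(T):
            y.append(gamma * y[-1] + alpha + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def within_estimator(panel):
    """Within (fixed-effects) estimator of gamma, using t = 1, ..., T."""
    num = den = 0.0
    for y in panel:
        T = len(y) - 1
        ybar = sum(y[1:]) / T        # mean of y_it
        ylag_bar = sum(y[:-1]) / T   # mean of y_{i,t-1}
        for t in range(1, T + 1):
            num += (y[t] - ybar) * (y[t - 1] - ylag_bar)
            den += (y[t - 1] - ylag_bar) ** 2
    return num / den


def iv_estimator(panel):
    """First-difference IV estimator with y_{i,t-2} as instrument, t = 3, ..., T."""
    num = den = 0.0
    for y in panel:
        for t in range(3, len(y)):
            num += (y[t] - y[t - 1]) * y[t - 2]
            den += (y[t - 1] - y[t - 2]) * y[t - 2]
    return num / den


panel = simulate_panel(N=5000, T=6, gamma=0.5)
gamma_within = within_estimator(panel)
gamma_iv = iv_estimator(panel)
print("Within:", round(gamma_within, 3), " IV:", round(gamma_iv, 3))
```

In such a design the Within estimate should fall well below the true value, in line with Table 5.1, while the IV estimate should stay close to it, at the price of a larger sampling variance.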

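    Consider now a more general model:

    $$y_{it} = \gamma y_{i,t-1} + \beta x_{it} + \rho z_i + \alpha_i + \varepsilon_{it}.$$

    IV Estimation proceeds as follows.

    Step 1. First-difference the model, to get

    $$(y_{it} - y_{i,t-1}) = \gamma(y_{i,t-1} - y_{i,t-2}) + \beta(x_{it} - x_{i,t-1}) + \varepsilon_{it} - \varepsilon_{i,t-1}.$$

    Use $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$ as instrument for $(y_{i,t-1} - y_{i,t-2})$ and estimate $\gamma, \beta$ with the IV procedure.

    Step 2. Substitute $\hat\gamma$ and $\hat\beta$ in the first-difference Between equation:

    $$\bar y_i - \hat\gamma\,\bar y_{i,-1} - \hat\beta\,\bar x_i = \rho z_i + \alpha_i + \bar\varepsilon_i, \quad i = 1, 2, \ldots, N,$$

    and estimate $\rho$ by OLS.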

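    Step 3. Estimate variance components:

    $$\hat\sigma^2_\varepsilon = \frac{1}{2N(T-1)}\sum_{i=1}^{N}\sum_{t=2}^{T}\left[(y_{it} - y_{i,t-1}) - \hat\gamma(y_{i,t-1} - y_{i,t-2}) - \hat\beta(x_{it} - x_{i,t-1})\right]^2,$$

    $$\hat\sigma^2_\alpha = \frac{1}{N}\sum_{i=1}^{N}\left[\bar y_i - \hat\gamma\,\bar y_{i,-1} - \hat\rho z_i - \hat\beta\,\bar x_i\right]^2 - \frac{1}{T}\,\hat\sigma^2_\varepsilon.$$

    Consistency of the estimator:

    IV estimators of $\gamma$, $\beta$ and $\sigma^2_\varepsilon$ are consistent when $N$ or $T \to \infty$;

    IV estimators of $\rho$ and $\sigma^2_\alpha$ are consistent only when $N \to \infty$, and are inconsistent when $N$ is fixed and $T \to \infty$, because $z_i$ and $\alpha_i$ vary only across individuals.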

    5.3 The Random-effects model

    We now treat $\alpha_i$ as a random variable, in addition to $\varepsilon_{it}$. As for static models, $\alpha_i$ is not eliminated, but it is correlated by construction with the lagged dependent variable $y_{i,t-1}$.

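    5.3.1 Bias in the ML estimator

    In the simple model $y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$, the MLE is equivalent to the OLS estimator:

    $$\hat\gamma = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}y_{it}\,y_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T}y^2_{i,t-1}} = \gamma + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(\alpha_i + \varepsilon_{it})\,y_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T}y^2_{i,t-1}}.$$

    We show that

    $$\operatorname{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(\alpha_i + \varepsilon_{it})\,y_{i,t-1} = \frac{1}{T}\,\frac{1-\gamma^{T}}{1-\gamma}\,\operatorname{Cov}(y_{i0}, \alpha_i) + \frac{1}{T}\,\frac{\sigma^2_\alpha}{(1-\gamma)^2}\left[(T-1) - T\gamma + \gamma^{T}\right],$$

    and

    $$\operatorname{plim}_{N\to\infty}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}y^2_{i,t-1} = \frac{1-\gamma^{2T}}{T(1-\gamma^2)}\cdot\frac{\sum_{i=1}^{N}y^2_{i0}}{N} + \frac{\sigma^2_\alpha}{(1-\gamma)^2}\cdot\frac{1}{T}\left(T - 2\,\frac{1-\gamma^{T}}{1-\gamma} + \frac{1-\gamma^{2T}}{1-\gamma^2}\right)$$

    $$+ \frac{2}{T(1-\gamma)}\left(\frac{1-\gamma^{T}}{1-\gamma} - \frac{1-\gamma^{2T}}{1-\gamma^2}\right)\operatorname{Cov}(y_{i0}, \alpha_i) + \frac{\sigma^2_\varepsilon}{T(1-\gamma^2)^2}\left[(T-1) - T\gamma^2 + \gamma^{2T}\right].$$

    The bias depends on the behavior of initial conditions $y_{i0}$ (constant or generated as $y_{it}$).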

    5.3.2 An equivalent representation

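    We consider a more general model

    $$y_{it} = \gamma y_{i,t-1} + \beta x_{it} + \rho z_i + u_{it}, \qquad u_{it} = \alpha_i + \varepsilon_{it},$$

    with the following assumptions:

    $$|\gamma| < 1; \quad E(\alpha_i) = E(\varepsilon_{it}) = 0;$$

    $$E(\alpha_i x_{it}) = 0; \quad E(\alpha_i z_i) = 0; \quad E(\alpha_i\varepsilon_{it}) = 0;$$

    $$E(\alpha_i\alpha_j) = \sigma^2_\alpha \ \text{ if } i = j, \quad 0 \text{ otherwise};$$

    $$E(\varepsilon_{it}\varepsilon_{js}) = \sigma^2_\varepsilon \ \text{ if } i = j,\ t = s, \quad 0 \text{ otherwise}.$$

    We can also write

    $$w_{it} = \gamma w_{i,t-1} + \beta x_{it} + \rho z_i + \varepsilon_{it},$$

    $$y_{it} = w_{it} + \eta_i,$$

    where $\eta_i = \alpha_i/(1-\gamma)$, $E\eta_i = 0$, $Var(\eta_i) = \sigma^2_\eta = \sigma^2_\alpha/(1-\gamma)^2$,

    and the dynamic process $\{w_{it}\}$ is independent from the individual effect $\eta_i$.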

    5.3.3 The role of initial conditions

    The two equivalent specifications of the model are:

    $$\text{(A)}\quad y_{it} = \gamma y_{i,t-1} + \beta x_{it} + \rho z_i + \alpha_i + \varepsilon_{it};$$

    $$\text{(B)}\quad w_{it} = \gamma w_{i,t-1} + \beta x_{it} + \rho z_i + \varepsilon_{it}, \qquad y_{it} = w_{it} + \eta_i.$$

    In model (A), $y_{it}$ is driven by unobserved characteristics $\alpha_i$, different across units, in addition to $x_{it}$ and $z_i$.

    In model (B), the dynamic process $w_{it}$ is independent from individual effects $\eta_i$. Conditional on exogenous $x_{it}$ and $z_i$, the $w_{it}$ are driven by identical processes with i.i.d. shocks $\varepsilon_{it}$. But the observed value $y_{it}$ is shifted by the individual-specific effect $\eta_i$.

    Possible interpretation: $w_{it}$ is a latent variable, $y_{it}$ is observed, and $\eta_i$ is a time-invariant measurement error.

    The two processes are equivalent because $w_{it}$ is unobserved. But assumptions (or knowledge) on initial conditions may help to distinguish between both processes.

    Different cases:

    1/ $y_{i0}$ fixed;

    2/ $y_{i0}$ random;

    2.a/ $y_{i0}$ independent of $\alpha_i$, with $E(y_{i0}) = \mu_{y_0}$ and $Var(y_{i0}) = \sigma^2_{y_0}$;

    2.b/ $y_{i0}$ correlated with $\alpha_i$, with $Cov(y_{i0}, \alpha_i) = \phi\,\sigma^2_{y_0}$;

    3/ $w_{i0}$ fixed;

    4/ $w_{i0}$ random;

    4.a/ $w_{i0}$ random with common mean $\mu_w$ and variance $\sigma^2_\varepsilon/(1-\gamma^2)$ (stationarity assumption);

    4.b/ $w_{i0}$ random with common mean $\mu_w$ and arbitrary variance $\sigma^2_{w_0}$;

    4.c/ $w_{i0}$ random with mean $\theta_{i0}$ and variance $\sigma^2_\varepsilon/(1-\gamma^2)$ (stationarity assumption);

    4.d/ $w_{i0}$ random with mean $\theta_{i0}$ and arbitrary variance $\sigma^2_{w_0}$.

    See Appendix 4 for a derivation of Maximum Likelihood estimators in each case.

    5.3.4 Possible inconsistency of GLS

    In cases 1/ and 2.a/ ($y_{i0}$ fixed, or random but independent of $\alpha_i$):

    When $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are known, maximizing the log-likelihood wrt. $\gamma$, $\beta$ and $\rho$ yields the GLS estimator. When $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are unknown, feasible GLS applies by using consistent estimates of these variances in $V_T$.

    Other cases:

    Estimators for $\gamma$ and $\beta$ are consistent when $T \to \infty$, because GLS converges to Within. When $N \to \infty$ and $T$ is fixed, GLS is inconsistent in cases where initial values are correlated with individual effects.

    5.3.5 Example: The Balestra-Nerlove study

    Seminal paper on Dynamic Panel Data models (1966). Household

    demand for natural gas in the US, including a/ the demand due

    to replacement of gas appliances, and b/ demand due to increases

    in the stock of appliances.

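    Table 5.2: Properties of the MLE for dynamic panel data models

    | Parameters | $N$ fixed, $T \to \infty$ | $T$ fixed, $N \to \infty$ |
    |---|---|---|
    | Case 1: $y_{i0}$ fixed | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Consistent |
    | $\rho, \sigma^2_\alpha$ | Inconsistent | Consistent |
    | Case 2.a: $y_{i0}$ random, $y_{i0}$ ind. of $\alpha_i$ | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Consistent |
    | $\mu_{y_0}, \rho, \sigma^2_\alpha, \sigma^2_{y_0}$ | Inconsistent | Consistent |
    | Case 2.b: $y_{i0}$ correlated with $\alpha_i$ | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Consistent |
    | $\mu_{y_0}, \rho, \sigma^2_\alpha, \sigma^2_{y_0}, \phi$ | Inconsistent | Consistent |
    | Case 3: $w_{i0}$ fixed | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Inconsistent |
    | $w_{i0}, \rho, \sigma^2_\alpha$ | Inconsistent | Inconsistent |
    | Case 4.a: $w_{i0}$ random, mean $\mu_w$, variance $\sigma^2_\varepsilon/(1-\gamma^2)$ | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Consistent |
    | $\mu_w, \rho, \sigma^2_\alpha$ | Inconsistent | Consistent |
    | Case 4.b: $w_{i0}$ random, mean $\mu_w$, variance $\sigma^2_{w_0}$ | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Consistent |
    | $\sigma^2_{w_0}, \rho, \sigma^2_\alpha, \mu_w$ | Inconsistent | Consistent |
    | Case 4.c: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma^2_\varepsilon/(1-\gamma^2)$ | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Inconsistent |
    | $\theta_{i0}, \rho, \sigma^2_\alpha$ | Inconsistent | Inconsistent |
    | Case 4.d: $w_{i0}$ random, mean $\theta_{i0}$, variance $\sigma^2_{w_0}$ | | |
    | $\gamma, \beta, \sigma^2_\varepsilon$ | Consistent | Inconsistent |
    | $\theta_{i0}, \sigma^2_\alpha, \sigma^2_{w_0}$ | Inconsistent | Inconsistent |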

    Demand system:

    $$G^*_{it} = G_{it} - (1-r)\,G_{i,t-1},$$

    $$F^*_{it} = F_{it} - (1-r)\,F_{i,t-1},$$

    $$F_{it} = a_0 + a_1 N_{it} + a_2 I_{it},$$

    $$G^*_{it} = b_0 + b_1 P_{it} + b_2 F^*_{it},$$

    where $G^*_{it}$ and $G_{it}$ are respectively the new demand and the actual demand for gas at time $t$ from unit $i$, $r$ is the appliances depreciation rate, $F^*_{it}$ and $F_{it}$ are respectively the new and actual demand for all types of fuel, $N_{it}$ is total population, $I_{it}$ is per-head income, and $P_{it}$ is the relative price of gas.

    Solving the system, we have the equation to be estimated:

    $$G_{it} = \beta_0 + \beta_1 P_{it} + \beta_2\,\Delta N_{it} + \beta_3 N_{i,t-1} + \beta_4\,\Delta I_{it} + \beta_5 I_{i,t-1} + \beta_6 G_{i,t-1},$$

    where $\Delta N_{it} = N_{it} - N_{i,t-1}$, $\Delta I_{it} = I_{it} - I_{i,t-1}$, and $\beta_6 = 1 - r$.

    Estimation procedures: OLS, Within (LSDV) and GLS (with the assumption that initial conditions $G_{i0}$ are fixed, case 1/).

    In accordance with the theory, $\hat\gamma$ (here, $\hat\beta_6$) is biased upward for OLS and downward for Within.

    Table 5.3: Parameter estimates, Balestra-Nerlove model

    | Parameter | OLS | Within | GLS |
    |---|---|---|---|
    | $\beta_0$ (Intercept) | -3.650 | - | -4.091 |
    | | (3.316) | - | (11.544) |
    | $\beta_1$ ($P_{it}$) | -0.0451(*) | -0.2026 | -0.0879(*) |
    | | (0.027) | (0.0532) | (0.0468) |
    | $\beta_2$ ($\Delta N_{it}$) | 0.0174(*) | -0.0135 | -0.00122 |
    | | (0.0093) | (0.0215) | (0.0190) |
    | $\beta_3$ ($N_{i,t-1}$) | 0.00111(**) | 0.0327(**) | 0.00360(**) |
    | | (0.00041) | (0.0046) | (0.00129) |
    | $\beta_4$ ($\Delta I_{it}$) | 0.0183(**) | 0.0131 | 0.0170(**) |
    | | (0.0080) | (0.0084) | (0.0080) |
    | $\beta_5$ ($I_{i,t-1}$) | 0.00326 | 0.0044 | 0.00354 |
    | | (0.00197) | (0.0101) | (0.00622) |
    | $\beta_6$ ($G_{i,t-1}$) | 1.010(**) | 0.6799(**) | 0.9546(**) |
    | | (0.014) | (0.0633) | (0.0372) |

    Notes. $N = 36$, $T = 11$. Standard errors are in parentheses. (*) and (**): parameter significant at 10% and 5% level respectively.

  • Part II

    Generalized Method of Moments estimation

  • Chapter 6

    The GMM estimator

    Generalized Method of Moments: efficient way to obtain consistent parameter estimates under mild conditions on the model.

    Very popular in estimating structural economic models, as it requires far fewer conditions on model disturbances than Maximum Likelihood. Another important advantage: easy to obtain parameter estimates that are robust to heteroskedasticity of unknown form.

    6.1 Moment conditions and the method of moments

    6.1.1 Moment conditions

    Consider a sample of size $N$, $\{x_i,\ i = 1, 2, \ldots, N\}$, from which one wishes to estimate a $p \times 1$ vector $\theta$ whose true value is $\theta_0$.

    Note: the notation above is very general; $x_i$ will typically include dependent (endogenous) and explanatory (exogenous, endogenous) variables.

    Let $f(x_i, \theta)$ denote a $q \times 1$ function whose expectation $E[f(x_i, \theta)]$ exists and is finite. Moment conditions are then defined as

    $$E[f(x_i, \theta_0)] = 0.$$

    6.1.2 Example: Linear regression model

    Consider the linear model

    $$y_i = x_i\beta_0 + u_i, \quad i = 1, 2, \ldots, N,$$

    where $\beta_0$ is the true value of parameter vector $\beta$, and $u_i$ is the error term.

    A common assumption is $E(u_i|x_i) = 0 \Leftrightarrow E(y_i|x_i) = x_i\beta_0$, and from the Law of Iterated Expectations:

    $$E(x_iu_i) = E[E(x_iu_i|x_i)] = E[x_iE(u_i|x_i)] = 0.$$

    In terms of the definition above, $\theta = \beta$ and $f((x_i, y_i), \theta) = x_i(y_i - x_i\beta)$. Moment conditions are then

    $$E(x_iu_i) = E[x_i(y_i - x_i\beta_0)] = 0.$$

    Note that here $p = q$: there are as many moment conditions as parameters to estimate.

    Suppose now we do not assume $E(u_i|x_i) = 0$ but instead that $E(z_iu_i) = 0$. Vector $z_i$ is $q \times 1$ and would consist of instruments such that

    $$E(z_iu_i) = E[z_i(y_i - x_i\beta_0)] = 0, \quad \text{or} \quad f[(x_i, y_i, z_i), \theta] = z_i(y_i - x_i\beta).$$

    There are $q$ moment equations (as many as there are instruments) and $p$ parameters to estimate. Hence, the identification condition is $q \geq p$.

    6.1.3 Example: Gamma distribution

    A sample $\{x_i,\ i = 1, 2, \ldots, N\}$ is drawn from a Gamma distribution $\Gamma(a, b)$ with true values $a_0$ and $b_0$. Relationship between parameters and the first two moments of the distribution:

    $$E(x_i) = \frac{a_0}{b_0}, \qquad E[x_i - E(x_i)]^2 = \frac{a_0}{b_0^2}.$$

    In our notation in the definition above: $\theta = (a, b)$ and

    $$f(x_i, \theta) = \left[x_i - \frac{a}{b},\ \left(x_i - \frac{a}{b}\right)^2 - \frac{a}{b^2}\right],$$

    so that $E[f(x_i, \theta_0)] = 0$.

    6.1.4 Method of moments estimation

    How to estimate $\theta$ using the moment conditions given above? In the case where $p = q$ (as many conditions as parameters), we could solve $E[f(x_i, \theta_0)] = 0$ for $\theta_0$. But $E[f(\cdot)]$ is unknown, whereas function values $f(x_i, \theta)$ can be computed $\forall\theta, \forall i$. Also, sample moments of function $f(\cdot)$ can be computed:

    $$f_N(\theta) = \frac{1}{N}\sum_{i=1}^{N}f(x_i, \theta).$$

    Basic idea of method of moments estimation: if $E(f)$ is close to $f_N$ (population moments close to empirical moments), then $\hat\theta_N$ is a convenient estimate for $\theta_0$, where $f_N(\hat\theta_N) = 0$:

    $$0 = E[f(\theta_0)] \approx f_N(\hat\theta_N) \;\Rightarrow\; \theta_0 \approx \hat\theta_N.$$

    Two important conditions need to hold for method of moments estimation to be valid: a) $E(f)$ is adequately approximated by $f_N$; b) moment conditions can be solved for $\hat\theta_N$.

    Example: linear regression.

    Sample moment conditions are

    $$\frac{1}{N}\sum_{i=1}^{N}x_i\hat u_i = \frac{1}{N}\sum_{i=1}^{N}x_i(y_i - x_i\hat\beta_N) = 0,$$

    and solving for $\hat\beta_N$ yields

    $$\hat\beta_N = \left(\sum_{i=1}^{N}x_ix_i'\right)^{-1}\sum_{i=1}^{N}x_iy_i.$$
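The same logic applies to the Gamma example of Section 6.1.3, where the two sample moment conditions can be solved in closed form: matching the sample mean to $a/b$ and the second centred sample moment to $a/b^2$ gives $\hat b = \text{mean}/\text{variance}$ and $\hat a = \text{mean} \times \hat b$. A minimal sketch (function names and the data are made up for illustration):

```python
def gamma_method_of_moments(sample):
    """Solve the two sample moment conditions of the Gamma(a, b) example."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n  # second centred sample moment
    if var <= 0:
        raise ValueError("sample variance must be positive")
    b_hat = mean / var      # from mean = a/b and var = a/b^2
    a_hat = mean * b_hat    # a = mean * b
    return a_hat, b_hat


def sample_moments(sample, a, b):
    """f_N(theta): sample averages of the two moment functions."""
    n = len(sample)
    f1 = sum(x - a / b for x in sample) / n
    f2 = sum((x - a / b) ** 2 - a / b ** 2 for x in sample) / n
    return f1, f2


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up illustrative sample
a_hat, b_hat = gamma_method_of_moments(data)
print(a_hat, b_hat, sample_moments(data, a_hat, b_hat))
```

Evaluating `sample_moments` at the estimates returns zeros (up to rounding), which is exactly the requirement $f_N(\hat\theta_N) = 0$ in the just-identified case.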

    6.1.5 Example: Poisson counting model

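    Poisson process: the dependent variable is discrete (number of events, etc.). Restriction: the mean of the distribution is equal to the variance.

    Assumption: dependent variables $y_1, y_2, \ldots, y_N$ are distributed according to independent Poisson distributions, with parameters $\lambda_1, \lambda_2, \ldots, \lambda_N$ respectively:

    $$\operatorname{Prob}[y_i = r] = \frac{\exp(-\lambda_i)\,\lambda_i^{r}}{r!}.$$

    We assume the $\lambda_i$'s depend on explanatory variables by a log-linear relationship:

    $$\log\lambda_i = \beta_0 + \sum_{j=1}^{p}\beta_jx_{ij}.$$

    The likelihood of the Poisson model is

    $$L = \prod_{i=1}^{N}\frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} = \exp\left[-\sum_{i=1}^{N}\lambda_i + \beta_0\sum_{i=1}^{N}y_i + \sum_{j=1}^{p}\beta_j\sum_{i=1}^{N}x_{ij}y_i\right]\left(\prod_{i=1}^{N}y_i!\right)^{-1}.$$

    Let us consider the following sample moments:

    $$T_0 = \sum_{i=1}^{N}y_i, \qquad T_j = \sum_{i=1}^{N}x_{ij}y_i, \quad j = 1, \ldots, p,$$

    and we use the fact that

    $$\frac{\partial\lambda_i}{\partial\beta_0} = \lambda_i \quad \text{and} \quad \frac{\partial\lambda_i}{\partial\beta_j} = x_{ij}\lambda_i.$$

    If we set derivatives of $\log L$ wrt. $\beta_0$ and the $\beta_j$'s to 0, we get

    $$T_0 = \sum_{i=1}^{N}\lambda_i, \qquad T_j = \sum_{i=1}^{N}x_{ij}\lambda_i, \quad j = 1, \ldots, p,$$

    where $\lambda_i = \exp(\beta_0 + \sum_{j=1}^{p}\beta_jx_{ij})$. Hence, we match sample moments $T_0$ and $T_j$ to theoretical moments $\sum_{i=1}^{N}\exp(\beta_0 + \sum_{j=1}^{p}\beta_jx_{ij})$ and $\sum_{i=1}^{N}x_{ij}\exp(\beta_0 + \sum_{j=1}^{p}\beta_jx_{ij})$ respectively.

    We have $p + 1$ such matching conditions for $p + 1$ parameters.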
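These matching conditions have no closed-form solution, but they can be solved numerically. The sketch below handles the case of a single regressor ($p = 1$) with Newton steps on the two conditions; the function name, the starting values, the iteration count and the data are all illustrative assumptions.

```python
import math


def poisson_moment_fit(x, y, iterations=50):
    """Solve T0 = sum(lambda_i) and T1 = sum(x_i * lambda_i) for (beta0, beta1),
    where lambda_i = exp(beta0 + beta1 * x_i), by Newton iterations."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only solution
    for _ in range(iterations):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - li for yi, li in zip(y, lam))                 # T0 - sum(lambda_i)
        g1 = sum(xi * (yi - li) for xi, yi, li in zip(x, y, lam))   # T1 - sum(x_i*lambda_i)
        # Jacobian of (sum(lambda_i), sum(x_i*lambda_i)) wrt (beta0, beta1)
        h00 = sum(lam)
        h01 = sum(xi * li for xi, li in zip(x, lam))
        h11 = sum(xi * xi * li for xi, li in zip(x, lam))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


x = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up regressor values
y = [1, 1, 3, 4, 9]            # made-up counts
b0, b1 = poisson_moment_fit(x, y)
lam = [math.exp(b0 + b1 * xi) for xi in x]
print(b0, b1)
print(sum(y), sum(lam))  # T0 against its theoretical counterpart
```

At the solution the fitted means reproduce both sample moments, which is the sense in which the Poisson MLE is also a method of moments estimator.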

    6.1.6 Comments

    Note the difference between the Method of Moments philosophy and the usual estimation criteria. For Maximum Likelihood and Least Squares, we maximize (minimize) a criterion

    $$\hat\theta = \arg\max_\theta\ \log L(\theta) \quad \text{(MLE)},$$

    $$\hat\theta = \arg\min_\theta\ \frac{1}{N}\sum_{i=1}^{N}[y_i - f(x_i, \theta)]^2 \quad \text{(LS)},$$

    whereas here, we start from First-order Conditions and solve the system for $\theta$.

    Example: Instrumental Variable estimation

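    We could consider minimizing the IV criterion wrt. $\beta$:

    $$\hat\beta = \arg\min_\beta\ (Y - X\beta)'Z(Z'Z)^{-1}Z'(Y - X\beta),$$

    where $Z$ is an $N \times q$ matrix of instruments, or start from the FOC:

    $$\frac{1}{N}\sum_{i=1}^{N}z_i\hat u_i = \frac{1}{N}\sum_{i=1}^{N}z_i(y_i - x_i\hat\beta) = 0 \;\Leftrightarrow\; \hat\beta = \left(\sum_{i=1}^{N}z_i'x_i\right)^{-1}\sum_{i=1}^{N}z_i'y_i = (Z'X)^{-1}Z'Y.$$

    Equivalently, we could maximize the log likelihood wrt. $\theta$ or start from the FOC

    $$\frac{1}{N}\sum_{i=1}^{N}\frac{\partial\log L(\theta)}{\partial\theta}\bigg|_{\theta=\hat\theta} = 0,$$

    which can be regarded here as a set of sample moment conditions.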

    Problems that remain to be solved:

    - Ensure that we can replace population moments by sample moments, for the Method of Moments to work.

    - What if the system of moment conditions is overidentified (more conditions than parameters)?

    - How to be sure our moment conditions are valid (e.g., valid choice of instruments)?

    6.2 The Generalized Method of Moments (GMM)

    6.2.1 Introduction

    As the name indicates, GMM is an extension of the Method of Moments to the case where parameters are overidentified by moment conditions. Equations $E[f(x_i, \theta_0)] = 0$ represent $q$ conditions for $p$ unknown parameters; when $q > p$ we cannot, in general, find a vector $\hat\theta_N$ satisfying $f_N(\hat\theta_N) = 0$.

    But we can look for the $\theta$ that makes $f_N(\theta)$ as close to 0 as possible, by defining

    $$\hat\theta_N = \arg\min_\theta\ Q_N(\theta) = f_N(\theta)'A_Nf_N(\theta),$$

    where $A_N$ is a positive weighting matrix of order $O(1)$.

    Important note: for the just-identified case, $Q_N(\hat\theta_N) = 0$ because $f_N(\hat\theta_N) = 0$, but in the over-identified case, $Q_N(\hat\theta_N) > 0$.

    This fact is important for model checking (we will come to this point later in the course).
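The difference between the just-identified and over-identified cases can be seen on a small linear example with one parameter. The sketch below takes the weighting matrix $A_N$ to be the identity (an assumption made only for simplicity, not the efficient choice), uses made-up data, and relies on the fact that $Q_N$ is then a quadratic in the scalar coefficient with a closed-form minimizer; function names are hypothetical.

```python
def sample_moment(beta, y, x, Z):
    """f_N(beta): average of z_i * (y_i - x_i * beta), one entry per instrument."""
    n = len(y)
    return [sum(z[k] * (yi - xi * beta) for z, yi, xi in zip(Z, y, x)) / n
            for k in range(len(Z[0]))]


def gmm_identity_weight(y, x, Z):
    """Minimise Q_N(beta) = f_N(beta)' f_N(beta) for a scalar beta (A_N = identity)."""
    n, q = len(y), len(Z[0])
    s_zy = [sum(z[k] * yi for z, yi in zip(Z, y)) / n for k in range(q)]
    s_zx = [sum(z[k] * xi for z, xi in zip(Z, x)) / n for k in range(q)]
    # f_N(beta) = s_zy - beta * s_zx, so the minimiser is a least-squares ratio.
    beta = sum(a * b for a, b in zip(s_zx, s_zy)) / sum(a * a for a in s_zx)
    q_min = sum(f * f for f in sample_moment(beta, y, x, Z))
    return beta, q_min


# Made-up data: one regressor, two candidate instruments per observation.
y = [1.0, 2.1, 2.9, 4.2, 5.1]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
Z2 = [(1.0, 0.5), (1.0, 1.5), (1.0, 3.5), (1.0, 3.0), (1.0, 6.0)]
Z1 = [(z[1],) for z in Z2]  # keep a single instrument: just-identified case

beta_just, q_just = gmm_identity_weight(y, x, Z1)
beta_over, q_over = gmm_identity_weight(y, x, Z2)
print(beta_just, q_just)  # criterion is zero up to rounding
print(beta_over, q_over)  # criterion is strictly positive
```

With one instrument the single moment condition is met exactly, so the criterion vanishes; with two instruments no value of the coefficient sets both sample moments to zero, and the minimized criterion stays positive.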

    6.2.2 Example: Just-identied IV model

    Consider $Y = X\beta + u$ with condition $E(W'u) = 0$ ($W$ are instruments), and $\operatorname{rank}(W'X) = p$. Solving for $\beta$ we have $\hat\beta = (W'X)^{-1}(W'Y)$, which we replace in the IV criterion:

    u()0P 0Wu() =

    Y X(W 0X)1(W 0Y )

    0W (W 0W )1W