Tilburg University Efficiency gains due to using missing data procedures in regression models Nijman, T.E.; Palm, F.C. Publication date: 1986 Link to publication in Tilburg University Research Portal Citation for published version (APA): Nijman, T. E., & Palm, F. C. (1986). Efficiency gains due to using missing data procedures in regression models. (Research memorandum / Tilburg University, Department of Economics; Vol. FEW 240). Unknown Publisher. General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Download date: 27. Jun. 2021



    https://research.tilburguniversity.edu/en/publications/001c2552-a7e8-45b6-be4b-41094d0a48c4


  • FEW 240

    EFFICIENCY GAINS DUE TO USING MISSING DATA PROCEDURES IN REGRESSION MODELS

    Th.E. Nijman
    F.C. Palm

    November 1986

  • EFFICIENCY GAINS DUE TO USING MISSING DATA PROCEDURES IN REGRESSION MODELS

    Th.E. Nijman*

    F.C. Palm**

    November 1986

    In the chapter on "Economic Data Issues" of the Handbook of Econometrics, Griliches (1986) analyzes the asymptotic variance of an estimator of a regression coefficient using imputations for the missing regressor values and he compares it with that of an estimation procedure based on the complete observations only. His derivation of an expression for the relative efficiency is incorrect. In this note, we give the correct result and show that the relative efficiency of three estimators designed to handle incomplete samples depends on parameters that have a straightforward statistical interpretation. In terms of a gain of asymptotic efficiency, the use of these estimators is equivalent to the observation of a percentage of the values which are actually missing. This percentage depends on three $R^2$-measures only, which can be straightforwardly computed in applied work. Therefore it should be easy in practice to check whether it is worthwhile to use a more elaborate estimator.

    The authors thank Professor Z. Griliches for his comments on an earlier version of this note.

    * Department of Econometrics, Tilburg University, P.O.B. 90153, 5000 LE Tilburg, The Netherlands.

    ** Department of Economics, University of Limburg, P.O.B. 616, 6200 MD Maastricht, The Netherlands.


    Griliches (1986) considers the following regression model

    $$y_i = \beta x_i + \gamma z_i + \varepsilon_i, \qquad \varepsilon_i \sim IN(0, \sigma^2), \tag{1}$$

    and

    $$x_i = \delta z_i + v_i, \qquad v_i \sim IN(0, \sigma_v^2), \tag{2}$$

    where the regressors $x_i$ and $z_i$ are assumed to be independent of the corresponding disturbances $\varepsilon_i$ and $v_i$. Actually, the normality assumption is not made by Griliches, but it will be required below for maximum likelihood (ML) estimation and it does not affect the results for the other estimators. The variables $y_i$ and $z_i$ are observed for $i = 1, \ldots, N_1 + N_2$, whereas $x_i$ is observed for $i = 1, \ldots, N_1$ only.

    Besides the OLS estimator of the regression of $y_i$ on $x_i$ and $z_i$ for $i = 1, \ldots, N_1$ only, denoted by $\hat\beta_a$ and $\hat\gamma_a$, Griliches (1986) considers an estimation procedure in which the missing $x_i$'s are replaced by $\hat\delta_a z_i$, where $\hat\delta_a$ is the OLS estimate of $\delta$ in (2) using the first $N_1$ observations. The estimate $\hat\gamma_{a+b}$ is subsequently computed by OLS on

    $$y_i - \hat\beta_a \hat{x}_i = \gamma z_i + w_i, \tag{3}$$

    where $w_i = \varepsilon_i + \lambda_i \beta v_i + \lambda_i(\beta\delta - \hat\delta_a\hat\beta_a) z_i$, with $\hat{x}_i = x_i$ and $\lambda_i = 0$ if $i \le N_1$, and $\hat{x}_i = \hat\delta_a z_i$ and $\lambda_i = 1$ otherwise. It is straightforward to show that $\hat\gamma_{a+b}$ can be alternatively computed by OLS of $y_i$ on $\hat{x}_i$ and $z_i$:

    $$y_i = \beta \hat{x}_i + \gamma z_i + \{\varepsilon_i + \lambda_i \beta v_i + \beta\lambda_i(\delta - \hat\delta_a) z_i\}. \tag{4}$$
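The imputation procedure described above can be illustrated with a small simulation. The following is only a sketch: the function names, the sample sizes and the parameter values are hypothetical choices for illustration and do not come from Griliches (1986) or from this note. It uses the Python standard library only and fits all regressions without an intercept, as in (1) and (2).

```python
import random


def ols2(rows):
    """OLS without intercept of y on two regressors; rows are (y, a, b) tuples."""
    saa = sum(a * a for _, a, _ in rows)
    sab = sum(a * b for _, a, b in rows)
    sbb = sum(b * b for _, _, b in rows)
    say = sum(a * y for y, a, _ in rows)
    sby = sum(b * y for y, _, b in rows)
    det = saa * sbb - sab * sab
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det


def simulate(n1=400, n2=400, beta=1.0, gamma=1.0, delta=0.5, seed=0):
    """Draw data from (1)-(2) and compute the complete-case and imputation estimates.

    All numerical values are illustrative assumptions (unit variances throughout).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n1 + n2):
        z = rng.gauss(0.0, 1.0)
        x = delta * z + rng.gauss(0.0, 1.0)             # equation (2)
        y = beta * x + gamma * z + rng.gauss(0.0, 1.0)  # equation (1)
        data.append((y, x, z))

    complete = data[:n1]  # x is treated as observed for the first n1 units only

    # Complete-case OLS of y on x and z.
    beta_a, gamma_a = ols2(complete)

    # OLS of x on z in the complete part gives delta_a.
    delta_a = (sum(x * z for _, x, z in complete)
               / sum(z * z for _, _, z in complete))

    # Replace the missing x's by delta_a * z and run OLS on all observations, as in (4).
    filled = complete + [(y, delta_a * z, z) for y, _, z in data[n1:]]
    beta_ab, gamma_ab = ols2(filled)

    # Same gamma estimate obtained from (3): OLS of y - beta_a * x_hat on z.
    gamma_eq3 = (sum(z * (y - beta_a * xh) for y, xh, z in filled)
                 / sum(z * z for _, _, z in filled))

    return {"beta_a": beta_a, "gamma_a": gamma_a, "delta_a": delta_a,
            "beta_ab": beta_ab, "gamma_ab": gamma_ab, "gamma_eq3": gamma_eq3}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:10s} {value: .4f}")
```

In this sketch the coefficient on the filled-in regressor coincides with the complete-case estimate of $\beta$ up to rounding error, and the two routes to $\hat\gamma_{a+b}$, via (3) and via (4), give the same number, which is the equivalence stated in the text.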

    Contrary to what is stated by Griliches (1986), the contribution of $\beta\lambda_i(\delta - \hat\delta_a) z_i$ to the asymptotic variance of $\hat\gamma_{a+b}$ is not negligible if $\operatorname{plim} N_2 N^{-1} = \lambda > 0$ for $N \to \infty$, with $N = N_1 + N_2$.

    As the three components of the disturbance of (4) are independent, the large sample distribution of $\hat\gamma_{a+b}$ is given by

    $$\sqrt{N_1}\,(\hat\gamma_{a+b} - \gamma) \overset{a}{\sim} N(0, V),$$

    with

    $$V = \operatorname{plim} N_1 (X'X)^{-1}\{X'\Omega X + X'W \beta^2 \Sigma W'X\}(X'X)^{-1}, \tag{5}$$

    where $X$ is the matrix of regressors in (4), $W$ is a vector with typical element $\lambda_i z_i$, $\Omega$ is a diagonal matrix with typical element $\sigma^2 + \beta^2 \lambda_i \sigma_v^2$, and $\Sigma$ is the asymptotic variance of $\hat\delta_a$.

    After some algebra, we get

    $$V = \frac{\sigma^2}{\sigma_z^2}\{(1 - r^2_{xz})^{-1} + \lambda(\mu^{-1} - 2)\}, \tag{6}$$

    where $\sigma_z^2 = \operatorname{plim} N^{-1} \sum_{i=1}^{N} z_i^2$ (assumed to exist), $\mu = \sigma^2(\beta^2\sigma_v^2 + \sigma^2)^{-1}$ and $r^2_{xz}$ is the theoretical $R^2$ of the regression (2) of $x$ on $z$. The result in (6) has been obtained by Gouriéroux and Monfort [1981, expression (11) on p. 583].
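The algebra leading to (6) is not spelled out in the note. One way to fill it in is sketched below. It is a reconstruction under stated assumptions, not the authors' own derivation. It assumes that $V$ is the asymptotic variance under $\sqrt{N_1}$ normalization, that $N_2/N \to \lambda$, and that the asymptotic variance of $\sqrt{N}\,\hat\delta_a$ is $\sigma_v^2 / ((1-\lambda)\sigma_z^2)$. The vector $a'$ denotes the second row of the inverse of $\operatorname{plim} N^{-1}X'X$.

```latex
\begin{align*}
\operatorname{plim} \tfrac{1}{N} X'X
  &= \begin{pmatrix}
       \delta^2\sigma_z^2 + (1-\lambda)\sigma_v^2 & \delta\sigma_z^2 \\
       \delta\sigma_z^2 & \sigma_z^2
     \end{pmatrix},
  \qquad \det = (1-\lambda)\sigma_v^2\sigma_z^2, \\
a' &= \frac{1}{(1-\lambda)\sigma_v^2\sigma_z^2}
      \bigl(-\delta\sigma_z^2,\; \delta^2\sigma_z^2 + (1-\lambda)\sigma_v^2\bigr),
  \qquad a'\begin{pmatrix}\delta \\ 1\end{pmatrix} = \frac{1}{\sigma_z^2}, \\
\operatorname{Avar}(\sqrt{N}\,\hat\gamma_{a+b})
  &= \underbrace{(1-\lambda)\sigma^2
       \Bigl[\frac{1}{\sigma_z^2} + \frac{\delta^2}{(1-\lambda)^2\sigma_v^2}\Bigr]}_{\text{complete observations}}
   + \underbrace{\frac{\lambda(\sigma^2 + \beta^2\sigma_v^2)}{\sigma_z^2}}_{\varepsilon_i + \beta v_i \text{ for } \lambda_i = 1}
   + \underbrace{\frac{\lambda^2\beta^2\sigma_v^2}{(1-\lambda)\sigma_z^2}}_{\text{estimation of } \delta} \\
  &= \frac{\sigma^2}{\sigma_z^2}
   + \frac{\sigma^2\delta^2}{(1-\lambda)\sigma_v^2}
   + \frac{\lambda\beta^2\sigma_v^2}{(1-\lambda)\sigma_z^2}, \\
V = (1-\lambda)\operatorname{Avar}(\sqrt{N}\,\hat\gamma_{a+b})
  &= \frac{\sigma^2}{\sigma_z^2}
     \Bigl[(1-\lambda) + \frac{\delta^2\sigma_z^2}{\sigma_v^2} + \lambda(\mu^{-1} - 1)\Bigr]
   = \frac{\sigma^2}{\sigma_z^2}
     \bigl\{(1 - r^2_{xz})^{-1} + \lambda(\mu^{-1} - 2)\bigr\}.
\end{align*}
```

The last line uses $(1 - r^2_{xz})^{-1} = 1 + \delta^2\sigma_z^2/\sigma_v^2$ and $\mu^{-1} - 1 = \beta^2\sigma_v^2/\sigma^2$, both of which follow from the definitions of $r^2_{xz}$ and $\mu$.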

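    The relative efficiency of $\hat\gamma_{a+b}$ with respect to $\hat\gamma_a$ is

    $$\mathrm{Eff}(\hat\gamma_{a+b}) = \frac{\mathrm{Avar}(\sqrt{N_1}\,\hat\gamma_{a+b})}{\mathrm{Avar}(\sqrt{N_1}\,\hat\gamma_a)} = 1 + \lambda(\mu^{-1} - 2)(1 - r^2_{xz}). \tag{7}$$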

    According to (7), using imputed values as in (3) leads to a gain of efficiency, compared with using the complete observations only, if $\mu > \tfrac{1}{2}$, which is more stringent than the condition $\mu > (1 - \lambda)/(2 - \lambda)$ given by Griliches (1986). Both conditions require that the unpredictable part of $x$ from $z$ is not too important relative to $\sigma^2$, the overall noise level of (1). As

    $$\mu = \frac{1 - r^2_{y \cdot xz}}{1 - r^2_{yz}}, \tag{8}$$

    where $r^2_{y \cdot xz}$ and $r^2_{yz}$ denote the theoretical $R^2$'s of a regression of $y$ on $x$ and $z$ and on $z$ only, respectively, it is obvious that a sufficient condition for an efficiency gain is $r^2_{y \cdot xz} < \tfrac{1}{2}$, i.e. the predictable part of $y$ is small: since $r^2_{yz} \ge 0$, (8) implies $\mu \ge 1 - r^2_{y \cdot xz} > \tfrac{1}{2}$.
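The difference between the two conditions can be checked numerically from (7) and (8). The sketch below is illustrative only: the function names and the $R^2$ values are hypothetical, chosen so that $\mu$ falls between the threshold attributed to Griliches and $\tfrac{1}{2}$.

```python
def mu_from_r2(r2_y_xz, r2_yz):
    """Equation (8): mu from the R^2 of y on (x, z) and the R^2 of y on z only."""
    return (1.0 - r2_y_xz) / (1.0 - r2_yz)


def eff_imputed(lam, mu, r2_xz):
    """Equation (7): Avar of the imputation estimator relative to complete-case OLS.

    Values above 1 mean the imputation estimator is less efficient.
    """
    return 1.0 + lam * (1.0 / mu - 2.0) * (1.0 - r2_xz)


def griliches_threshold(lam):
    """Lower bound on mu in the condition attributed to Griliches (1986)."""
    return (1.0 - lam) / (2.0 - lam)


# Hypothetical inputs: half of the x's are missing, R^2 values chosen for illustration.
lam = 0.5
mu = mu_from_r2(r2_y_xz=0.70, r2_yz=0.25)   # roughly 0.4
eff = eff_imputed(lam, mu, r2_xz=0.2)

print(f"mu = {mu:.3f}, Griliches threshold = {griliches_threshold(lam):.3f}")
print(f"Eff(gamma_a+b) = {eff:.3f}")
```

With these inputs $\mu$ exceeds $(1 - \lambda)/(2 - \lambda) = 1/3$ but is below $\tfrac{1}{2}$, so the weaker condition would suggest a gain while (7) gives a relative variance above one, i.e. an efficiency loss.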

    As noted by Griliches (1986) and others, an efficiency gain is assured if (4) is estimated by a generalized least squares (GLS) method which takes the correlation structure of the disturbance in (4) into account. Again, the term $\lambda_i \beta(\delta - \hat\delta_a) z_i$ cannot be neglected (see Palm and Nijman (1982) and Nijman and Palm (1985)). Alternatively, the fully efficient ML estimator can be computed, e.g. using the convenient reparametrisation suggested by Gouriéroux and Monfort (1981). From their results, the relative efficiency of the GLS and ML estimators with respect to that of $\hat\gamma_a$ can be obtained:

    $$\mathrm{Eff}(\hat\gamma_{GLS}) = 1 - \lambda\mu(1 - r^2_{xz}) \tag{9}$$

    and

    $$\mathrm{Eff}(\hat\gamma_{ML}) = 1 - \lambda\mu(1 - r^2_{xz}) - 2\lambda\mu(1 - \mu) r^2_{xz}. \tag{10}$$

    The relative efficiency in (7), (9) and (10) only depends on the three magnitudes $\lambda$, $\mu$ and $r^2_{xz}$. Equation (9) indicates that in terms of a gain of asymptotic efficiency, the use of GLS is equivalent to the observation of $100\,\mu(1 - r^2_{xz})$ % of the values of $x_i$ that are actually missing. Similar expressions can be obtained from (7) and (10) for $\hat\gamma_{a+b}$ and $\hat\gamma_{ML}$ respectively. The values in Table 1 illustrate this result.

    TABLE 1 Percentage of missing observations that are regained by the use of missing data procedures instead of the complete data only.

    Gain in percentage points for $\hat\gamma_{a+b}$, $\hat\gamma_{GLS}$ and $\hat\gamma_{ML}$, with $\mu = (1 - r^2_{y \cdot xz})/(1 - r^2_{yz})$.

    $\mu$    $r^2_{xz}$    $\hat\gamma_{a+b}$    $\hat\gamma_{GLS}$    $\hat\gamma_{ML}$
    .3       .2            -106                  24                    32
    .3       .8             -27                   6                    40
    .6       .2              27                  48                    58
    .6       .8               7                  12                    50
    .9       .2              71                  72                    76
    .9       .8              18                  18                    32
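The entries of Table 1 can be recomputed from (7), (9) and (10) by reading each relative efficiency as one minus $\lambda$ times the regained fraction. The snippet below is a sketch of that calculation; the function names are hypothetical and the grid of $\mu$ and $r^2_{xz}$ values is the one shown in the table.

```python
def gain_imputed(mu, r2_xz):
    """Percentage regained by the imputation estimator, from (7)."""
    return 100.0 * (2.0 - 1.0 / mu) * (1.0 - r2_xz)


def gain_gls(mu, r2_xz):
    """Percentage regained by GLS, from (9)."""
    return 100.0 * mu * (1.0 - r2_xz)


def gain_ml(mu, r2_xz):
    """Percentage regained by ML, from (10)."""
    return 100.0 * (mu * (1.0 - r2_xz) + 2.0 * mu * (1.0 - mu) * r2_xz)


# Same ordering as the rows of Table 1.
rows = [(mu, r2, gain_imputed(mu, r2), gain_gls(mu, r2), gain_ml(mu, r2))
        for mu in (0.3, 0.6, 0.9) for r2 in (0.2, 0.8)]

print("  mu  r2_xz     a+b     GLS      ML")
for mu, r2, g_ab, g_gls, g_ml in rows:
    print(f"{mu:4.1f}  {r2:5.1f}  {g_ab:6.1f}  {g_gls:6.1f}  {g_ml:6.1f}")
```

Rounded to whole percentage points, the GLS and ML columns agree with the table. The first entry of the $\hat\gamma_{a+b}$ column evaluates to about $-106.7$, which the table reports as $-106$.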

    Note that a good fit in (2) yielding a "good proxy" for the missing values of $x_i$ does not imply that a large part of the missing information on $x_i$ can be recovered, because of the induced multicollinearity between $\hat{x}_i$ and $z_i$ in (4). Especially when $r^2_{xz}$ is small, the efficiency gain obtained by using the appropriate estimators can be substantial. The value of $\mu$ is crucial for the efficiency of $\hat\gamma_{a+b}$. The loss of efficiency can be important when $\mu < \tfrac{1}{2}$. This loss increases as $r^2_{xz}$ decreases. Finally, if $\mu$ is close to one, i.e. $x_i$ is not very important in explaining $y$ in equation (1), all three approaches which take into account the incomplete data yield about equally efficient estimators.


    References

    Gouriéroux, C., and A. Monfort (1981), "On the problem of missing data in linear models", Review of Economic Studies, 48, 579-586.

    Griliches, Z. (1986), "Economic data issues", in Z. Griliches and M.D. Intriligator, eds, Handbook of Econometrics, North Holland, Amsterdam, 1466-1514.

    Nijman, Th.E., and F.C. Palm (1985), "Consistent estimation of a regression model with incompletely observed exogenous variable", Netherlands Central Bureau of Statistics, unpublished paper.

    Palm, F.C., and Th.E. Nijman (1982), "Linear regression using both temporally aggregated and temporally disaggregated data", Journal of Econometrics, 19, 333-343.

