econometric analysis of panel data

Part 20: Selection [1/66] Econometric Analysis of Panel Data William Greene Department of Economics Stern School of Business

Upload: cala

Post on 22-Feb-2016


DESCRIPTION

Econometric Analysis of Panel Data. William Greene Department of Economics Stern School of Business. Econometric Analysis of Panel Data. 20. Sample Selection and Attrition. Received Sunday, April 27, 2014 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Econometric Analysis of Panel Data

Part 20: Selection [1/66]

Econometric Analysis of Panel Data

William GreeneDepartment of EconomicsStern School of Business

Page 2: Econometric Analysis of Panel Data

Part 20: Selection [2/66]

Econometric Analysis of Panel Data

20. Sample Selection and Attrition

Page 3: Econometric Analysis of Panel Data

Part 20: Selection [3/66]

Received Sunday, April 27, 2014

I have a paper regarding strategic alliances between firms, and their impact on firm risk. While observing how a firm’s strategic alliance formation impacts its risk, I need to correct for two types of selection biases. The reviewers at the Journal of Marketing asked us to correct for the propensity of firms to enter into alliances, and also the propensity to select a specific partner, before we examine how the partnership itself impacts risk.

Our approach involved estimating a probit model of alliance formation propensity, computing the inverse Mills ratio, and including it in the second selection equation, which is also a probit, for partner selection. Then, we include the inverse Mills ratio from the second selection equation in the main model. The review team states that this is not correct, and that we need MLE in order to correctly model the set of three equations. The Associate Editor’s point is given below. Can you please provide any guidance on whether this is a valid criticism of our approach? Is there a procedure in LIMDEP that can handle this set of three equations with two selection probit models?

AE’s comment: “Please note that the procedure of using an inverse Mills ratio is only consistent when the main equation where the ratio is being used is linear. In non-linear cases (like the second probit used by the authors), this is not correct. Please see any standard econometric treatment like Greene or Wooldridge. An MLE estimator is needed which will be far from trivial to specify and estimate given error correlations between all three equations.”

Page 4: Econometric Analysis of Panel Data

Part 20: Selection [4/66]

Hello Dr. Greene,

My name is xxxxxxxxxx and I go to the University of xxxxxxxx. I see that you have an errata page on your website for the 7th edition of your econometrics book. It seems like you want to correct all mistakes, so I think I have spotted a possible proofreading error. On page 477 (Theorem 13.2) you want to show that theta is consistent and you say that

"But, at the true parameter values, qn(θ0) → 0. So, if (13-7) is true, then it must follow that qn(θˆGMM) → θ0 as well because of the identification assumption"

I think in the second line it should be qn(θˆGMM) → 0, not θ0.

Page 5: Econometric Analysis of Panel Data

Part 20: Selection [5/66]

I also have a question about nonlinear GMM, which is more or less a nonlinear IV technique, I suppose.

I am running a panel nonlinear regression (nonlinear in the parameters) and I have L parameters and K exogenous variables with L>K.

In particular my model looks kind of like this: Y = b1*X^b2 + e, and so I am trying to estimate the extra b2 that don't usually appear in a regression. From what I am reading, to run nonlinear GMM I can use the K exogenous variables to construct the orthogonality conditions, but what should I use for the extra b2 coefficients? Just some more possible IVs (like lags) of the exogenous variables? I agree that by adding more IVs you will get a more efficient estimation, but isn't that only the case when you believe the IVs are truly uncorrelated with the error term? So by adding more "instruments" you are more or less imposing more and more restrictive assumptions about the model (which might not actually be true).

I am asking because I have not found sources comparing nonlinear GMM/IV to nonlinear least squares. If there is no heteroscedasticity/serial correlation, which is more efficient/gives tighter estimates?

Page 6: Econometric Analysis of Panel Data

Part 20: Selection [6/66]

Page 7: Econometric Analysis of Panel Data

Part 20: Selection [7/66]

Dueling Selection Biases – From two emails, same day.

“I am trying to find methods which can deal with data that is non-randomised and suffers from selection bias.”

“I explain the probability of answering questions using, among other independent variables, a variable which measures knowledge breadth. Knowledge breadth can be constructed only for those individuals that fill in a skill description in the company intranet. This is where the selection bias comes from.”

Page 8: Econometric Analysis of Panel Data

Part 20: Selection [8/66]

The Crucial Element

Selection on the unobservables
Selection into the sample is based on both observables and unobservables
All the observables are accounted for
Unobservables in the selection rule also appear in the model of interest (or are correlated with unobservables in the model of interest)

“Selection Bias” = the bias due to not accounting for the unobservables that link the equations.

Page 9: Econometric Analysis of Panel Data

Part 20: Selection [9/66]

A Sample Selection Model

Linear model
2 step
ML – Murphy & Topel
Binary choice application
Other models

Page 10: Econometric Analysis of Panel Data

Part 20: Selection [10/66]

Canonical Sample Selection Model

Regression Equation: $y^* = x'\beta + \varepsilon$
Sample Selection Mechanism: $d^* = z'\gamma + u$; $d = 1[d^* > 0]$ (probit)
$y = y^*$ if $d = 1$; not observed otherwise

Is the sample 'nonrandomly selected?'
$$E[y^*|x,d=1] = x'\beta + E[\varepsilon|x,d=1] = x'\beta + E[\varepsilon|x,u>-z'\gamma] = x'\beta + \text{something, if } \mathrm{Cor}[\varepsilon,u|x] \neq 0$$

A left out variable problem (again)
Incidental truncation
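The left out variable problem can be illustrated with a small simulation. This is a minimal sketch under assumed values: the coefficients (intercept 1.0, slope 0.5), the selection index, and the helper name `ols_on_selected_sample` are all hypothetical and chosen only for illustration. It runs OLS on the selected observations, once with Cor[ε,u] = 0 and once with Cor[ε,u] = 0.8.

```python
import numpy as np

def ols_on_selected_sample(rho, n=100_000, seed=0):
    """OLS of y on (1, x) using only the observations with d = 1 (hypothetical DGP)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    w = rng.normal(size=n)                       # appears only in the selection equation
    u = rng.normal(size=n)                       # selection disturbance
    eps = rho * u + np.sqrt(1.0 - rho**2) * rng.normal(size=n)  # Cor[eps, u] = rho
    y_star = 1.0 + 0.5 * x + eps                 # regression equation
    d = (0.5 * x + w + u) > 0                    # d = 1[d* > 0]
    X = np.column_stack([np.ones(d.sum()), x[d]])
    coef, *_ = np.linalg.lstsq(X, y_star[d], rcond=None)
    return coef                                  # (intercept, slope)

b_indep = ols_on_selected_sample(rho=0.0)   # selection unrelated to eps
b_corr = ols_on_selected_sample(rho=0.8)    # selection on unobservables
print("rho = 0.0:", b_indep)
print("rho = 0.8:", b_corr)
```

With uncorrelated disturbances the selected-sample estimates stay near the assumed values; with correlated disturbances both the intercept and the slope move away from them.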

Page 11: Econometric Analysis of Panel Data

Part 20: Selection [11/66]

Applications

Labor Supply model:
y* = wage - reservation wage
d = labor force participation
Attrition model: Clinical studies of medicines
Survival bias in financial data
Income studies – value of a college application
Treatment effects
Any survey data in which respondents self select to report
Etc…

Page 12: Econometric Analysis of Panel Data

Part 20: Selection [12/66]

Estimation of the Selection Model

Two step least squares
Inefficient
Simple – exists in current software
Simple to understand and widely used

Full information maximum likelihood
Efficient
Simple – exists in current software
Not so simple to understand – widely misunderstood

Page 13: Econometric Analysis of Panel Data

Part 20: Selection [13/66]

Heckman’s Model

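$y_i^* = x_i'\beta + \varepsilon_i$
$d_i^* = z_i'\gamma + u_i$; $d_i = 1[d_i^* > 0]$ (probit)
$y_i = y_i^*$ if $d_i = 1$; not observed otherwise
$[\varepsilon_i, u_i] \sim$ Bivariate Normal$[0, 0, \sigma^2, \rho, 1]$

$$E[y_i^*|x_i,d_i=1] = x_i'\beta + E[\varepsilon_i|x_i,d_i=1] = x_i'\beta + E[\varepsilon_i|x_i,u_i>-z_i'\gamma] = x_i'\beta + (\rho\sigma)\frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)} = x_i'\beta + \theta\lambda_i$$

Least squares is biased and inconsistent again. Left out variable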

Page 14: Econometric Analysis of Panel Data

Part 20: Selection [14/66]

Two Step Estimation

Step 1: Estimate the probit model
$d_i^* = z_i'\gamma + u_i$; $d_i = 1[d_i^* > 0]$ (probit).
Estimation of $\gamma$ by $\hat\gamma$. Now compute $\hat\lambda_i = \phi(z_i'\hat\gamma)/\Phi(z_i'\hat\gamma)$

Step 2: Estimate the regression model with estimated regressor
$y_i^* = x_i'\beta + \varepsilon_i$
$y_i = y_i^*$ if $d_i = 1$; not observed otherwise
$E[y_i^*|x_i,d_i=1] = x_i'\beta + E[\varepsilon_i|x_i,d_i=1] = x_i'\beta + \theta\lambda_i$
Linearly regress $y_i$ on $x_i$, $\hat\lambda_i$.

Step 2a. Fix standard errors (Murphy and Topel). Estimate $\rho$ and $\sigma$ using $\hat\theta$ and $e'e/n$

The “LAMBDA”
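The two steps can be sketched on simulated data. This is an illustration under assumptions, not the estimator as implemented in any particular package: the data generating values, the variable names, and the use of a generic BFGS optimizer for the probit are all hypothetical choices. The sketch recovers ρ and σ from the coefficient on λ and e'e/n using the standard relation Var[ε|d=1] = σ²(1 − ρ²δ), with δ = λ(λ + z'γ). It does not apply the Murphy and Topel correction, so no standard errors are reported.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data (all values hypothetical)
rng = np.random.default_rng(1)
n = 50_000
x, w = rng.normal(size=n), rng.normal(size=n)
beta_true, sigma_true, rho_true = np.array([1.0, 0.5]), 2.0, 0.6
gamma_true = np.array([0.2, 0.5, 1.0])
u = rng.normal(size=n)
eps = sigma_true * (rho_true * u + np.sqrt(1 - rho_true**2) * rng.normal(size=n))
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, w])      # w is excluded from the regression
d = (Z @ gamma_true + u) > 0
y = X @ beta_true + eps                       # used only where d is True

# Step 1: probit by maximum likelihood, then the inverse Mills ratio
q = 2.0 * d - 1.0
def neg_loglik(g):
    return -norm.logcdf(q * (Z @ g)).mean()
gamma_hat = minimize(neg_loglik, np.zeros(Z.shape[1]), method="BFGS").x
zg = Z[d] @ gamma_hat
lam = norm.pdf(zg) / norm.cdf(zg)

# Step 2: least squares of y on x and lambda-hat, selected observations only
W = np.column_stack([X[d], lam])
coef, *_ = np.linalg.lstsq(W, y[d], rcond=None)
beta_hat, theta_hat = coef[:-1], coef[-1]

# Recover sigma and rho from theta-hat and e'e/n
e = y[d] - W @ coef
delta_bar = np.mean(lam * (lam + zg))
sigma_hat = np.sqrt(e @ e / e.size + theta_hat**2 * delta_bar)
rho_hat = theta_hat / sigma_hat
print(beta_hat, theta_hat, sigma_hat, rho_hat)
```

The excluded variable `w` is what keeps λ from being nearly collinear with x in the second step.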

Page 15: Econometric Analysis of Panel Data

Part 20: Selection [15/66]

FIML Estimation

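$$\log L = \sum_{d_i=0}\log\Phi(-z_i'\gamma) + \sum_{d_i=1}\log\left[\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}\right)\Phi\left(\frac{z_i'\gamma + (\rho/\sigma)(y_i - x_i'\beta)}{\sqrt{1-\rho^2}}\right)\right]$$

Let $\theta = 1/\sigma$, $\delta = -\beta/\sigma$, $\tau = \rho/\sqrt{1-\rho^2}$

$$\log L = \sum_{d_i=0}\log\Phi(-z_i'\gamma) + \sum_{d_i=1}\log\left[\frac{\theta}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}(\theta y_i + x_i'\delta)^2\right)\Phi\left(\sqrt{1+\tau^2}\,z_i'\gamma + \tau(\theta y_i + x_i'\delta)\right)\right]$$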

Note: no inverse Mills ratio appears anywhere in the model.
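As a check on the algebra, the log likelihood can be coded in both parameterizations and evaluated at matching parameter values. This is a sketch under assumptions: the function names, the arbitrary test data, and the parameter values are hypothetical, and no optimization is attempted.

```python
import numpy as np
from scipy.stats import norm

def loglik_structural(beta, sigma, rho, gamma, y, X, Z, d):
    """Selection model log likelihood in terms of (beta, sigma, rho, gamma)."""
    zg = Z @ gamma
    e = (y[d] - X[d] @ beta) / sigma
    arg = (zg[d] + rho * e) / np.sqrt(1.0 - rho**2)
    ll_selected = norm.logpdf(e) - np.log(sigma) + norm.logcdf(arg)
    return norm.logcdf(-zg[~d]).sum() + ll_selected.sum()

def loglik_reparam(delta, theta, tau, gamma, y, X, Z, d):
    """Same log likelihood with theta = 1/sigma, delta = -beta/sigma, tau = rho/sqrt(1-rho^2)."""
    zg = Z @ gamma
    r = theta * y[d] + X[d] @ delta
    arg = np.sqrt(1.0 + tau**2) * zg[d] + tau * r
    ll_selected = np.log(theta) + norm.logpdf(r) + norm.logcdf(arg)
    return norm.logcdf(-zg[~d]).sum() + ll_selected.sum()

# Arbitrary data and parameter values, used only to compare the two forms
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
d = rng.random(n) < 0.6
y = rng.normal(size=n)
beta, sigma, rho, gamma = np.array([0.3, -0.7]), 1.5, 0.4, np.array([0.1, 0.8])

ll_a = loglik_structural(beta, sigma, rho, gamma, y, X, Z, d)
ll_b = loglik_reparam(-beta / sigma, 1.0 / sigma, rho / np.sqrt(1.0 - rho**2), gamma, y, X, Z, d)
print(ll_a, ll_b)
```

The two values agree to floating point precision, and neither function computes an inverse Mills ratio.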

Page 16: Econometric Analysis of Panel Data

Part 20: Selection [16/66]

Classic Application

Mroz, T., Married women’s labor supply, Econometrica, 1987.
N = 753
N1 = 428

A (my) specification
LFP = f(age, age², family income, education, kids)
Wage = g(experience, exp², education, city)

Two step and FIML estimation

Page 17: Econometric Analysis of Panel Data

Part 20: Selection [17/66]

Selection Equation
+---------------------------------------------+
| Binomial Probit Model                       |
| Dependent variable                 LFP      |
| Number of observations             753      |
| Log likelihood function       -490.8478     |
+---------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
---------+Index function for probability
 Constant|  -4.15680692     1.40208596     -2.965   .0030
 AGE     |    .18539510      .06596666      2.810   .0049   42.5378486
 AGESQ   |   -.00242590      .00077354     -3.136   .0017   1874.54847
 FAMINC  |    .458045D-05    .420642D-05    1.089   .2762   23080.5950
 WE      |    .09818228      .02298412      4.272   .0000   12.2868526
 KIDS    |   -.44898674      .13091150     -3.430   .0006    .69588313

Page 18: Econometric Analysis of Panel Data

Part 20: Selection [18/66]

Heckman Estimator and MLE

Page 19: Econometric Analysis of Panel Data

Part 20: Selection [19/66]

Extension – Treatment Effect

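What is the value of an elite college education?
$d_i^* = z_i'\gamma + u_i$; $d_i = 1[d_i^* > 0]$ (probit)
$y_i^* = x_i'\beta + \delta d_i + \varepsilon_i$, observed for everyone
$[\varepsilon_i, u_i] \sim$ Bivariate Normal$[0, 0, \sigma^2, \rho, 1]$

$$E[y_i^*|x_i,d_i=1] = x_i'\beta + \delta d_i + E[\varepsilon_i|x_i,d_i=1] = x_i'\beta + \delta + E[\varepsilon_i|x_i,u_i>-z_i'\gamma] = x_i'\beta + \delta + (\rho\sigma)\frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)} = x_i'\beta + \delta + \theta\lambda_i$$

$$E[y_i^*|x_i,d_i=0] = x_i'\beta + \delta d_i + E[\varepsilon_i|x_i,d_i=0] = x_i'\beta + \theta\left(\frac{-\phi(z_i'\gamma)}{\Phi(-z_i'\gamma)}\right)$$

Least squares is still biased and inconsistent. Left out variable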

Page 20: Econometric Analysis of Panel Data

Part 20: Selection [20/66]

Sample Selection

An approach modeled on Heckman's model
Regression Equation: Prob[y=j|x,u] = P(λ); λ = exp(xβ+θu)
Selection Equation: d = 1[zδ+ε>0] (The usual probit)
[u,ε] ~ n[0,0,1,1,ρ] (Var[u] is absorbed in θ)

Estimation:
Nonlinear Least Squares: [Terza (1998, see cite in text).]
E[y|x,d=1] = exp(xβ+θρ) Φ(zδ+ρ)/Φ(zδ)
FIML using Hermite quadrature: [Greene (Stern wp, 97-02, 1997)]

Page 21: Econometric Analysis of Panel Data

Part 20: Selection [21/66]

Extensions – Binary Data

Application: In German Health care data, insurance choices
$d_i^* = z_i'\gamma + u_i$; $d_i = 1[d_i^* > 0]$ (probit)
$y_i^* = x_i'\beta + \varepsilon_i$, $y_i = 1[y_i^* > 0]$ (probit)
$y_i = y_i^*$ if $d_i = 1$; not observed otherwise
$[\varepsilon_i, u_i] \sim$ Bivariate Normal$[0, 0, 1, \rho, 1]$

Estimation:
(1) Two step? (Wooldridge text)
(2) FIML (Stata, LIMDEP)

Page 22: Econometric Analysis of Panel Data

Part 20: Selection [22/66]

Panel Data and Selection

Selection equation with time invariant individual effect
$d_{it} = 1[z_{it}'\gamma + \theta_i + \eta_{it} > 0]$
Observation mechanism: $(y_{it}, x_{it})$ observed when $d_{it} = 1$

Primary equation of interest
Common effects linear regression model
$y_{it}\,|\,(d_{it} = 1) = x_{it}'\beta + \alpha_i + \varepsilon_{it}$

"Selectivity" as usual arises as a problem when the unobservables are correlated; Corr$(\varepsilon_{it}, \eta_{it}) \neq 0$.
The common effects, $\alpha_i$ and $\theta_i$, make matters worse.

Page 23: Econometric Analysis of Panel Data

Part 20: Selection [23/66]

Panel Data and Sample Selection Models: A Nonlinear Time Series

I. 1990-1992: Fixed and Random Effects Extensions

II. 1995 and 2005: Model Identification through Conditional Mean Assumptions

III. 1997-2005: Semiparametric Approaches based on Differences and Kernel Weights

IV. 2007: Return to Conventional Estimators, with Bias Corrections

Page 24: Econometric Analysis of Panel Data

Part 20: Selection [24/66]

Panel Data Sample Selection Models

i

it i it

it it i it

Ti t 1

Verbeek, Economics Letters, 1990.d 1[ w 0] y | (d 1) ; Proposed "marginal likelihood" based on joint normality

logL

it

it

z γ (Random effects probit)x β (Fixed effects regression)

i,1 it i,2it i,1 i,2 i,1 i,22 2

it

it it i

i,1 i,2

u d u(2d 1) f(u ,u )du du(1 d )

( / )d (y y ) ( )(Integrate out the random effects; difference out the fixed effects.)u ,u are tim

it it

it it i

z γ+

x x 'β

e invariant uncorrelated standard normal variablesHow to do the integration? Natural candidate for simulation.(Not mentioned in the paper. Too early.)[Verbeek and Nijman: Selectivity "test" based on this model, International Economic Review, 1992.]

Page 25: Econometric Analysis of Panel Data

Part 20: Selection [25/66]

Zabel – Economics Letters

Inappropriate to have a mix of FE and RE models
Two part solution
Treat both effects as “fixed”
Project both effects onto the group means of the variables in the equations
Resulting model is two random effects equations
Use both random effects

Page 26: Econometric Analysis of Panel Data

Part 20: Selection [26/66]

Selection with Fixed Effects

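$$y_{it}^* = \eta_i + x_{it}'\beta + \varepsilon_{it},\quad \eta_i = \bar{x}_i'\pi + \tau w_i,\quad w_i \sim N[0,1]$$
$$d_{it}^* = \theta_i + z_{it}'\alpha + u_{it},\quad \theta_i = \bar{z}_i'\delta + \omega v_i,\quad v_i \sim N[0,1]$$
$$(\varepsilon_{it}, u_{it}) \sim N_2[(0,0), (\sigma^2, 1, \rho\sigma)]$$

$$L_i = \int_{-\infty}^{\infty}\prod_{d_{it}=0}\Phi\left[-z_{it}'\alpha - \bar{z}_i'\delta - \omega v_i\right]\phi(v_i)\,dv_i$$
$$\times \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\prod_{d_{it}=1}\Phi\left[\frac{z_{it}'\alpha + \bar{z}_i'\delta + \omega v_i + (\rho/\sigma)\varepsilon_{it}}{\sqrt{1-\rho^2}}\right]\frac{1}{\sigma}\phi\left(\frac{\varepsilon_{it}}{\sigma}\right)\phi_2(v_i, w_i)\,dv_i\,dw_i$$
$$\varepsilon_{it} = y_{it} - x_{it}'\beta - \bar{x}_i'\pi - \tau w_i$$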

Page 27: Econometric Analysis of Panel Data

Part 20: Selection [27/66]

Practical Complications

The bivariate normal integration is actually the product of two univariate normals, because in the specification above, vi and wi are assumed to be uncorrelated. Vella notes, however, “… given the computational demands of estimating by maximum likelihood induced by the requirement to evaluate multiple integrals, we consider the applicability of available simple, or two step procedures.”

Page 28: Econometric Analysis of Panel Data

Part 20: Selection [28/66]

Simulation

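The first line in the log likelihood is of the form $E_v[\prod_{d=0}\Phi(\ldots)]$ and the second line is of the form $E_w[E_v[\Phi(\ldots)\phi(\ldots)/\sigma]]$. Using simulation instead, the simulated likelihood is

$$L_i^S = \frac{1}{R}\sum_{r=1}^{R}\prod_{d_{it}=0}\Phi\left[-z_{it}'\alpha - \bar{z}_i'\delta - \omega v_{i,r}\right]$$
$$\times \frac{1}{R}\sum_{r=1}^{R}\prod_{d_{it}=1}\Phi\left[\frac{z_{it}'\alpha + \bar{z}_i'\delta + \omega v_{i,r} + (\rho/\sigma)\varepsilon_{it,r}}{\sqrt{1-\rho^2}}\right]\frac{1}{\sigma}\phi\left(\frac{\varepsilon_{it,r}}{\sigma}\right)$$
$$\varepsilon_{it,r} = y_{it} - x_{it}'\beta - \bar{x}_i'\pi - \tau w_{i,r}$$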

Page 29: Econometric Analysis of Panel Data

Part 20: Selection [29/66]

Correlated Effects

Suppose that $w_i$ and $v_i$ are bivariate standard normal with correlation $\rho_{vw}$. We can project $w_i$ on $v_i$ and write

$w_i = \rho_{vw}v_i + (1-\rho_{vw}^2)^{1/2}h_i$

where $h_i$ has a standard normal distribution. To allow the correlation, we now simply substitute this expression for $w_i$ in the simulated (or original) log likelihood, and add $\rho_{vw}$ to the list of parameters to be estimated. The simulation is then over still independent normal variates, $v_i$ and $h_i$.
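The projection is easy to verify numerically. In this small sketch the value 0.6 for the correlation and the number of draws are arbitrary assumptions; it builds w from two independent standard normal draws and checks that w has unit variance and the intended correlation with v.

```python
import numpy as np

rng = np.random.default_rng(3)
rho_vw = 0.6          # assumed value, for illustration only
R = 200_000           # number of simulation draws

v = rng.normal(size=R)                            # independent standard normal
h = rng.normal(size=R)                            # independent standard normal
w = rho_vw * v + np.sqrt(1.0 - rho_vw**2) * h     # projection of w on v

corr_vw = np.corrcoef(v, w)[0, 1]
var_w = w.var()
print(corr_vw, var_w)
```

In a simulated likelihood, `w` would replace the independent draw wherever $w_i$ appears, and `rho_vw` would become a free parameter.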

Page 30: Econometric Analysis of Panel Data

Part 20: Selection [30/66]

Conditional Means

Page 31: Econometric Analysis of Panel Data

Part 20: Selection [31/66]

A Feasible Estimator

Page 32: Econometric Analysis of Panel Data

Part 20: Selection [32/66]

Estimation

Page 33: Econometric Analysis of Panel Data

Part 20: Selection [33/66]

Kyriazidou - Semiparametrics

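Assume 2 periods
Estimate selection equation by FE logit
Use first differences and weighted least squares:

$$\hat\beta = \left[\sum_{i=1}^{N} d_{i1}d_{i2}\hat{w}_i\,\Delta x_i\Delta x_i'\right]^{-1}\left[\sum_{i=1}^{N} d_{i1}d_{i2}\hat{w}_i\,\Delta x_i\Delta y_i\right]$$
$$\hat{w}_i = \frac{1}{h}K\left(\frac{\Delta z_i'\hat\gamma}{h}\right),\quad K = \text{kernel function.}$$

Use with longer panels - any pairwise differences
Extensions based on pairwise differences by Rochina-Barrachina and Dustmann/Rochina-Barrachina (1999)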

Page 34: Econometric Analysis of Panel Data

Part 20: Selection [34/66]

Bias Corrections

Fernandez-Val and Vella, 2007 (Working paper)
Assume fixed effects
Bias corrected probit estimator at the first step
Use fixed probit model to set up second step Heckman style regression treatment.

Page 35: Econometric Analysis of Panel Data

Part 20: Selection [35/66]

Postscript

What selection process is at work?
All of the work examined here (and in the literature) assumes the selection operates anew in each period
An alternative scenario: Selection into the panel, once, at baseline.
Why aren’t the time invariant components correlated? (Greene, 2007, NLOGIT development)

Other models
All of the work on panel data selection assumes the main equation is a linear model.
Any others? Discrete choice? Counts?

Page 36: Econometric Analysis of Panel Data

Part 20: Selection [36/66]

Sample Selection

Page 37: Econometric Analysis of Panel Data

Part 20: Selection [37/66]

TECHNICAL EFFICIENCY ANALYSIS CORRECTING FOR BIASES FROM OBSERVED AND UNOBSERVED VARIABLES: AN APPLICATION TO A NATURAL RESOURCE MANAGEMENT PROJECT

Empirical Economics: Volume 43, Issue 1 (2012), Pages 55-72

Boris Bravo-Ureta, University of Connecticut

Daniel Solis, University of Miami

William Greene, New York University

Page 38: Econometric Analysis of Panel Data

Part 20: Selection [38/66]

The MARENA Program in Honduras

Several programs have been implemented to address resource degradation while also seeking to improve productivity, managerial performance and reduce poverty (and in some cases make up for lack of public support).

One such effort is the Programa Multifase de Manejo de Recursos Naturales en Cuencas Prioritarias or MARENA in Honduras focusing on small scale hillside farmers.

Page 39: Econometric Analysis of Panel Data

Part 20: Selection [39/66]

OVERALL CONCEPTUAL FRAMEWORK

Working HYPOTHESIS: if farmers receive private benefits (higher income) from project activities (e.g., training, financing) then adoption is likely to be sustainable and to generate positive externalities.

[Diagram boxes: MARENA; Training & Financing; Natural, Human & Social Capital; More Production and Productivity; More Farm Income; Off-Farm Income; Sustainability]

Page 40: Econometric Analysis of Panel Data

Part 20: Selection [40/66]

The MARENA Program

COMPONENT I: Strengthening Strategic Management Capabilities among Govt. Institutions (central and local)
COMPONENT II: Support to Nat. Res. Management Projects
Module 1: Promotion and Organization
Module 2: Strengthening Local Institutions & Organizations
Module 3: Investment (farm, municipal & regional)
COMPONENT III: Administration and Supervision

Page 41: Econometric Analysis of Panel Data

Part 20: Selection [41/66]

Component II - Module 3 focused on promoting investments in sustainable production systems with a budget of US $7.6 million (Bravo-Ureta, 2009).

The major activities undertaken with beneficiaries: training in business management and sustainable farming practices; and the provision of funds to co-finance investment activities through local rural savings associations (cajas rurales).

Component II - Module 3

Page 42: Econometric Analysis of Panel Data

Part 20: Selection [42/66]

Rural poverty in Honduras, largely due to policy driven unsustainable land use => environmental degradation, productivity losses, food insecurity, growing climatic vulnerability (GEF-IFAD, 2002).

Conclusions

Page 43: Econometric Analysis of Panel Data

Part 20: Selection [43/66]

Expected Impact Evaluation

Page 44: Econometric Analysis of Panel Data

Part 20: Selection [44/66]

Methods

A matched group of beneficiaries and control farmers is determined using Propensity Score Matching techniques to mitigate biases that would stem from selection on observed variables.

In addition, we deal with possible self-selection on unobservables arising from unobserved variables using a selectivity correction model for stochastic frontiers introduced by Greene (2010).

Page 45: Econometric Analysis of Panel Data

Part 20: Selection [45/66]

A Sample Selected SF Model

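$d_i = 1[z_i'\gamma + h_i > 0]$, $h_i \sim N[0, 1^2]$

$y_i = \alpha + x_i'\beta + \varepsilon_i$, $\varepsilon_i \sim N[0, \sigma_\varepsilon^2]$

$(y_i, x_i)$ observed only when $d_i = 1$.

$\varepsilon_i = v_i - u_i$

$u_i = \sigma_u|U_i|$ where $U_i \sim N[0, 1^2]$

$v_i = \sigma_v V_i$ where $V_i \sim N[0, 1^2]$.

$(h_i, v_i) \sim N_2[(0,0), (1, \rho\sigma_v, \sigma_v^2)]$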

Page 46: Econometric Analysis of Panel Data

Part 20: Selection [46/66]

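Simulated logL for the Standard SF Model

$$f(y_i\,|\,x_i, |U_i|) = \frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - x_i'\beta + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$$

$$f(y_i\,|\,x_i) = \int_{|U_i|}\frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - x_i'\beta + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}\,p(|U_i|)\,d|U_i|$$

$$p(|U_i|) = \frac{2}{\sqrt{2\pi}}\exp\left[-\tfrac{1}{2}|U_i|^2\right],\quad |U_i| \geq 0.\ \text{(Half normal)}$$

$$f(y_i\,|\,x_i) \approx \frac{1}{R}\sum_{r=1}^{R}\frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - x_i'\beta + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$$

$$\log L_S(\alpha, \beta, \sigma_u, \sigma_v) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\frac{\exp\left[-\tfrac{1}{2}(y_i - \alpha - x_i'\beta + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}$$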

This is simply a linear regression with a random constant term, αi = α - σu |Ui |
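The simulated density can be compared with the known closed form for the normal/half-normal frontier, $f(\varepsilon) = (2/\sigma)\phi(\varepsilon/\sigma)\Phi(-\varepsilon\lambda/\sigma)$ with $\sigma^2 = \sigma_u^2 + \sigma_v^2$ and $\lambda = \sigma_u/\sigma_v$. This is a sketch under assumptions: the function names, the parameter values, and the three residual values are hypothetical, and `resid` stands for $y_i - \alpha - x_i'\beta$.

```python
import numpy as np
from scipy.stats import norm

def sf_density_simulated(resid, sigma_u, sigma_v, draws):
    """Average the conditional normal density over draws of |U| (random constant term)."""
    abs_u = np.abs(draws)
    z = (resid[:, None] + sigma_u * abs_u[None, :]) / sigma_v
    return (norm.pdf(z) / sigma_v).mean(axis=1)

def sf_density_closed_form(resid, sigma_u, sigma_v):
    """Normal/half-normal composed error density."""
    sigma = np.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return (2.0 / sigma) * norm.pdf(resid / sigma) * norm.cdf(-resid * lam / sigma)

rng = np.random.default_rng(4)
draws = rng.normal(size=100_000)          # R standard normal draws, reused for all i
resid = np.array([-1.0, 0.0, 0.5])        # hypothetical residuals
sigma_u, sigma_v = 1.0, 0.5               # hypothetical parameter values

f_sim = sf_density_simulated(resid, sigma_u, sigma_v, draws)
f_exact = sf_density_closed_form(resid, sigma_u, sigma_v)
log_lik_sim = np.log(f_sim).sum()         # contribution to the simulated log likelihood
print(f_sim, f_exact, log_lik_sim)
```

For this model simulation is not needed, since the closed form exists; the value of the simulation approach is that it carries over to the selection model, where the integrand gains a probit term.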

Page 47: Econometric Analysis of Panel Data

Part 20: Selection [47/66]

Likelihood For a Sample Selected SF Model

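$$f(y_i\,|\,x_i, z_i, d_i, |U_i|) = d_i\left[\frac{\exp\left(-\tfrac{1}{2}(y_i - \alpha - x_i'\beta + \sigma_u|U_i|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\left(\frac{\rho(y_i - \alpha - x_i'\beta + \sigma_u|U_i|)/\sigma_v + z_i'\gamma}{\sqrt{1-\rho^2}}\right)\right] + (1-d_i)\,\Phi(-z_i'\gamma)$$

$$f(y_i\,|\,x_i, z_i, d_i) = \int_{|U_i|} f(y_i\,|\,x_i, z_i, d_i, |U_i|)\,f(|U_i|)\,d|U_i|$$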

Page 48: Econometric Analysis of Panel Data

Part 20: Selection [48/66]

Simulated Log Likelihood for a Selectivity Corrected Stochastic Frontier Model

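$$\log L_S(\alpha, \beta, \sigma_u, \sigma_v, \gamma, \rho) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\left\{d_i\left[\frac{\exp\left(-\tfrac{1}{2}(y_i - \alpha - x_i'\beta + \sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\left(\frac{\rho(y_i - \alpha - x_i'\beta + \sigma_u|U_{ir}|)/\sigma_v + z_i'\gamma}{\sqrt{1-\rho^2}}\right)\right] + (1-d_i)\,\Phi(-z_i'\gamma)\right\}$$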

The simulation is over the inefficiency term.

Page 49: Econometric Analysis of Panel Data

Part 20: Selection [49/66]

JLMS Estimator of ui

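$$\hat{f}_{ir} = \frac{\exp\left(-\tfrac{1}{2}(y_i - \hat\alpha - x_i'\hat\beta + \hat\sigma_u|U_{ir}|)^2/\hat\sigma_v^2\right)}{\hat\sigma_v\sqrt{2\pi}}\,\Phi\left(\frac{\hat\rho(y_i - \hat\alpha - x_i'\hat\beta + \hat\sigma_u|U_{ir}|)/\hat\sigma_v + \hat{a}_i}{\sqrt{1-\hat\rho^2}}\right),\quad \hat{a}_i = z_i'\hat\gamma$$

$$\hat{A}_i = \frac{1}{R}\sum_{r=1}^{R}(\hat\sigma_u|U_{ir}|)\,\hat{f}_{ir},\quad \hat{B}_i = \frac{1}{R}\sum_{r=1}^{R}\hat{f}_{ir}$$

$$\hat{u}_i = \text{Estimator of } E[u_i\,|\,\varepsilon_i] = \frac{\hat{A}_i}{\hat{B}_i} = \sum_{r=1}^{R}\hat{g}_{ir}\,\hat\sigma_u|U_{ir}|\quad\text{where } \hat{g}_{ir} = \frac{\hat{f}_{ir}}{\sum_{r=1}^{R}\hat{f}_{ir}},\ \sum_{r=1}^{R}\hat{g}_{ir} = 1$$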

Page 50: Econometric Analysis of Panel Data

Part 20: Selection [50/66]

Closed Form for the Selection Model

The selection model can be estimated without simulation
“The stochastic frontier model with correction for sample selection revisited.” Lai, Hung-pin. Forthcoming, JPA

Based on closed skew normal distribution
Similar to Maddala’s 1982 result for the linear selection model. See slide 42.
Not more computationally efficient.
Statistical properties identical.
Suggested possibility that simulation chatter is an element of inefficiency in the maximum simulated likelihood estimator.

Page 51: Econometric Analysis of Panel Data

Part 20: Selection [51/66]

Spanish Dairy Farms: Selection based on being farm #1-125. 6 periods

The theory works.

Closed Form vs. Simulation

Page 52: Econometric Analysis of Panel Data

Part 20: Selection [52/66]

Variables Used in the Analysis

Production

Participation

Page 53: Econometric Analysis of Panel Data

Part 20: Selection [53/66]

Findings from the First Wave

Page 54: Econometric Analysis of Panel Data

Part 20: Selection [54/66]

A Panel Data Model

Selection takes place only at the baseline. There is no attrition.

$d_{i0} = 1[z_{i0}'\gamma + h_{i0} > 0]$  Sample Selector
$y_{it} = x_{it}'\beta + w_i + v_{it} - u_{it},\ t = 0, 1, \ldots$  Stochastic Frontier

Selection effect is exerted on $w_i$; Corr$(h_{i0}, w_i)$
$P(y_{it}, d_{i0}) = P(d_{i0})\,P(y_{it}\,|\,d_{i0})$

Conditioned on the selection ($h_{i0}$) observations are independent.
$$P(y_{i0}, y_{i1}, \ldots, y_{iT}\,|\,d_{i0}) = \prod_{t=0}^{T}P(y_{it}\,|\,d_{i0})$$

I.e., the selection is acting like a permanent random effect.
$$P(y_{i0}, y_{i1}, \ldots, y_{iT}, d_{i0}) = P(d_{i0})\prod_{t=0}^{T}P(y_{it}\,|\,d_{i0})$$

Page 55: Econometric Analysis of Panel Data

Part 20: Selection [55/66]

Simulated Log Likelihood

,

2 212

1 1 00

2

log ( , , , , )

exp ( | |) / )

21log( | |) /

1

i

S C u v

it it u itr v

T vR

d r tit it u itr v i

L

y U

R y U a

x

x

Page 56: Econometric Analysis of Panel Data

Part 20: Selection [56/66]

Main Empirical Conclusions from Waves 0 and 1

Benefit group is more efficient in both years
The gap is wider in the second year
Both means increase from year 0 to year 1
Both variances decline from year 0 to year 1

Page 57: Econometric Analysis of Panel Data

Part 20: Selection [57/66]

Page 58: Econometric Analysis of Panel Data

Part 20: Selection [58/66]

Attrition

In a panel, t = 1,…,T, individual i leaves the sample at time Ki and does not return.

If the determinants of attrition (especially the unobservables) are correlated with the variables in the equation of interest, then the now familiar problem of sample selection arises.

Page 59: Econometric Analysis of Panel Data

Part 20: Selection [59/66]

Application of a Two Period Model

“Hemoglobin and Quality of Life in Cancer Patients with Anemia,”
Finkelstein (MIT), Berndt (MIT), Greene (NYU), Cremieux (Univ. of Quebec)
1998
With Ortho Biotech – seeking to change labeling of already approved drug ‘erythropoietin.’ r-HuEPO

Page 60: Econometric Analysis of Panel Data

Part 20: Selection [60/66]

QOL Study

Quality of life study
i = 1,… 1200+ clinically anemic cancer patients undergoing chemotherapy, treated with transfusions and/or r-HuEPO
t = 0 at baseline, 1 at exit. (interperiod survey by some patients was not used)
yit = self administered quality of life survey, scale = 0,…,100
xit = hemoglobin level, other covariates

Treatment effects model (hemoglobin level)
Background – r-HuEPO treatment to affect Hg level

Important statistical issues
Unobservable individual effects
The placebo effect
Attrition – sample selection
FDA mistrust of “community based” – not clinical trial based statistical evidence

Objective – when to administer treatment for maximum marginal benefit

Page 61: Econometric Analysis of Panel Data

Part 20: Selection [61/66]

Dealing with Attrition

The attrition issue: Appearance for the second interview was low for people with initial low QOL (death or depression) or with initial high QOL (don’t need the treatment). Thus, missing data at exit were clearly related to values of the dependent variable.

Solutions to the attrition problem
Heckman selection model (used in the study)
Prob[Present at exit|covariates] = Φ(z’θ) (Probit model)
Additional variable added to difference model: λi = φ(zi’θ)/Φ(zi’θ)
The FDA solution: fill with zeros. (!)

Page 62: Econometric Analysis of Panel Data

Part 20: Selection [62/66]

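An Early Attrition Model

Hausman, J. and Wise, D., "Attrition Bias in Experimental and Panel Data: The Gary Income Maintenance Experiment," Econometrica, 1979.

A two period model:
Structural response model (Random Effects Regression)
$y_{i1} = x_{i1}'\beta + \varepsilon_{i1} + u_i$
$y_{i2} = x_{i2}'\beta + \varepsilon_{i2} + u_i$

Attrition model for observation in the second period (Probit)
$z_{i2}^* = \delta y_{i2} + x_{i2}'\theta + w_{i2}'\alpha + v_{i2}$
$z_{i2} = 1(z_{i2}^* > 0)$

Endogeneity "problem"
$\rho_{12} = \mathrm{Corr}[\varepsilon_{i1} + u_i,\ \varepsilon_{i2} + u_i] = \sigma_u^2/(\sigma_\varepsilon^2 + \sigma_u^2)$
$\mathrm{Corr}[v_{i2},\ \varepsilon_{i2} + u_i]$, $\mathrm{Corr}[v_{i2} + \delta(\varepsilon_{i2} + u_i),\ \varepsilon_{i2} + u_i]$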

Page 63: Econometric Analysis of Panel Data

Part 20: Selection [63/66]

Methods of Estimating the Attrition Model

Heckman style “selection” model
Two step maximum likelihood
Full information maximum likelihood
Two step method of moments estimators
Weighting schemes that account for the “survivor bias”

Page 64: Econometric Analysis of Panel Data

Part 20: Selection [64/66]

Selection Model

i2 i2 i i2 i i

i2 i2

i2 i2

i2 i2

Reduced form probit model for second period observation equationz * x ( ) w ( u v ) r hz 1(z * 0)Conditional means for observations observed in the second period

E[y | x

i2i2 i2 12

i2

i2i1 i1 i2 i1 12

i2

(r ),z 1] x ( ) (r )First period conditional means for observations observed in the second period

(r )E[y | x ,z 1] x ( ) (r )(1) Estimate probit equation(2) Combine these two equations with a period dummy variable, use OLS with a constructed regressor in the second period

( )THE TWO DISTURBANCES ARE CORRELATED. TREAT THIS IS A SUR MODEL. EQUIVALENT TO MDE

Page 65: Econometric Analysis of Panel Data

Part 20: Selection [65/66]

Maximum Likelihood

2i1 i1

i 2

22 i2 12 i1 12 12 i112 2 2

12i2

i2 i2 i22

i2

(y x )log2LogL log2 2[(y y ) (x x ) )]log log 1 2 (1 )

zr ( / )(y x ) log

1

(1 z ) log

i2 12 i1 i12

12

r ( / )(y x )1

(1) See H&W for FIML estimation
(2) Use the invariance principle to reparameterize
(3) Estimate separately and use a two step ML with Murphy and Topel correction of asymptotic covariance matrix.

Page 66: Econometric Analysis of Panel Data

Part 20: Selection [66/66]

Page 67: Econometric Analysis of Panel Data

Part 20: Selection [67/66]

A Model of Attrition

Nijman and Verbeek, Journal of Applied Econometrics, 1992
Consumption survey (Holland, 1984 – 1986)
Exogenous selection for participation (rotating panel)
Voluntary participation (missing not at random – attrition)

Page 68: Econometric Analysis of Panel Data

Part 20: Selection [68/66]

Attrition Model

The main equation
$y_{i,t} = \beta_0 + x_{i,t}'\beta + \alpha_i + \varepsilon_{i,t}$, Random effects consumption function
$\alpha_i = \bar{x}_i'\pi + u_i$, Mundlak device; $u_i$ uncorrelated with $X_i$
$y_{i,t} = \beta_0 + x_{i,t}'\beta + \bar{x}_i'\pi + u_i + \varepsilon_{i,t}$, Reduced form random effects model

The selection mechanism
$a_{it}$ = 1[individual i asked to participate in period t]. Purely exogenous. $a_{it}$ may depend on observables, but does not depend on unobservables.
$r_{it}$ = 1[individual i chooses to participate if asked]. Endogenous. $r_{it}$ is the endogenous participation dummy variable.
$a_{it} = 0 \Rightarrow r_{it} = 0$
$a_{it} = 1 \Rightarrow$ the selection mechanism operates

Page 69: Econometric Analysis of Panel Data

Part 20: Selection [69/66]

Selection Equation

The main equation
$y_{i,t} = \beta_0 + x_{i,t}'\beta + \bar{x}_i'\pi + u_i + \varepsilon_{i,t}$, Reduced form random effects model

The selection mechanism
$r_{it}$ = 1[individual i chooses to participate if asked]. Endogenous. $r_{it}$ is the endogenous participation dummy variable.
$a_{it} = 0 \Rightarrow r_{it} = 0$
$a_{it} = 1 \Rightarrow$ the selection mechanism operates
$r_{it} = 1[\gamma_0 + x_{i,t}'\gamma + \bar{x}_i'\delta + z_{i,t}'\alpha + v_i + w_{i,t} > 0]$, all observed if $a_{it} = 1$

State dependence: $z$ may include $r_{i,t-1}$
Latent persistent unobserved heterogeneity: $\sigma_v^2 > 0$.
"Selection" arises if Cov$[\varepsilon_{i,t}, w_{i,t}] \neq 0$ or Cov$[u_i, v_i] \neq 0$

Page 70: Econometric Analysis of Panel Data

Part 20: Selection [70/66]

Estimation Using One Wave

Use any single wave as a cross section with observed lagged values.
Advantage: Familiar sample selection model
Disadvantages
Loss of efficiency
“One can no longer distinguish between state dependence and unobserved heterogeneity.”

Page 71: Econometric Analysis of Panel Data

Part 20: Selection [71/66]

One Wave Model

A standard sample selection model.
$y_{it} = \beta_0 + x_{it}'\beta + \bar{x}_i'\pi + (u_i + \varepsilon_{it})$
$r_{it} = 1[\gamma_0 + x_{it}'\gamma + \bar{x}_i'\delta + \alpha_1 r_{i,t-1} + \alpha_2 a_{i,t-1} + (v_i + w_{it}) > 0]$

With only one period of data and $r_{i,t-1}$ exogenous, this is the Heckman sample selection model.
If $\sigma_v^2 > 0$, then $r_{i,t-1}$ is correlated with $v_i$ and the Heckman approach fails.

An assumption is required:
(1) Include $r_{i,t-1}$ and assume no unobserved heterogeneity
(2) Exclude $r_{i,t-1}$ and assume there is no state dependence.

In either case, now if Cov$[(u_i + \varepsilon_{it}), (v_i + w_{it})] = 0$ we can use OLS.
Otherwise, use the maximum likelihood estimator.

Page 72: Econometric Analysis of Panel Data

Part 20: Selection [72/66]

Maximum Likelihood Estimation

See Zabel’s model in slides 20 and 23.
Because numerical integration is required in one or two dimensions for every individual in the sample at each iteration of a high dimensional numerical optimization problem, this is, though feasible, not computationally attractive.
The dimensionality of the optimization is irrelevant
This is much easier in 2015 than it was in 1992 (especially with simulation). The authors did the computations with Hermite quadrature.

Page 73: Econometric Analysis of Panel Data

Part 20: Selection [73/66]

Testing for Selection? Maximum Likelihood Results

Covariances were highly insignificant. LR statistic=0.46.

Two step results produced the same conclusion based on a Hausman test

ML Estimation results looked like the two step results.

Page 74: Econometric Analysis of Panel Data

Part 20: Selection [74/66]

A Dynamic Ordered Probit Model

Page 75: Econometric Analysis of Panel Data

Part 20: Selection [75/66]

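Random Effects Dynamic Ordered Probit Model

$$h_{it}^* = \mathbf{x}_{it}'\boldsymbol\beta + \sum_{j=1}^{J} \gamma_j h_{i,t-1}(j) + \alpha_i + \varepsilon_{i,t}$$

$$h_{i,t} = j \ \text{ if } \ \mu_{j-1} < h_{it}^* \le \mu_j$$

$$h_{i,t}(j) = 1 \ \text{ if } \ h_{i,t} = j$$

$$P_{it,j} = P[h_{it} = j] = \Phi\Big(\mu_j - \mathbf{x}_{it}'\boldsymbol\beta - \sum_{j=1}^{J} \gamma_j h_{i,t-1}(j) - \alpha_i\Big) - \Phi\Big(\mu_{j-1} - \mathbf{x}_{it}'\boldsymbol\beta - \sum_{j=1}^{J} \gamma_j h_{i,t-1}(j) - \alpha_i\Big)$$

Parameterize Random Effects

$$\alpha_i = \alpha_0 + \sum_{j=1}^{J} \alpha_{1,j} h_{i,1}(j) + \bar{\mathbf{x}}_i'\boldsymbol\alpha_2 + u_i$$

Simulation or Quadrature Based Estimation

$$\ln L = \sum_{i=1}^{N} \ln \int \prod_{t=1}^{T} P_{it,j}\, f(\alpha_i)\, d\alpha_i$$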
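The simulation-based version of this log likelihood can be sketched for a single individual as follows. This is a simplified illustration, not the estimator used in the study: the random effect is taken to be `sigma_u` times a standard normal draw (the mean terms in the parameterization of the effect are omitted), the first observation is treated as a given initial condition, and all function and argument names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cell_prob(j, index, mu):
    """P[h = j] = Phi(mu_j - index) - Phi(mu_{j-1} - index), categories 1..J.

    mu holds the interior thresholds mu_1 < ... < mu_{J-1};
    the outer thresholds are -infinity and +infinity.
    """
    upper = 1.0 if j == len(mu) + 1 else norm_cdf(mu[j - 1] - index)
    lower = 0.0 if j == 1 else norm_cdf(mu[j - 2] - index)
    return upper - lower


def simulated_loglik_i(h, xb, gamma, mu, sigma_u, draws):
    """Simulated log-likelihood contribution of one individual.

    h     : observed categories h_i1..h_iT; h[0] is the initial condition
    xb    : x_it'beta for t = 2..T (same length as h[1:])
    gamma : coefficients gamma_1..gamma_J on the lagged-category dummies
    draws : standard normal draws used to simulate the random effect
    """
    total = 0.0
    for z in draws:
        alpha = sigma_u * z
        prob = 1.0
        for t in range(1, len(h)):
            # sum_j gamma_j * h_{i,t-1}(j) picks out the single coefficient
            # for last period's category.
            index = xb[t - 1] + gamma[h[t - 1] - 1] + alpha
            prob *= cell_prob(h[t], index, mu)
        total += prob
    return math.log(total / len(draws))
```

Averaging the product of cell probabilities over the draws approximates the integral over the effect; replacing the draws with quadrature nodes and weights gives the quadrature-based alternative.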

Page 76: Econometric Analysis of Panel Data


A Study of Health Status in the Presence of Attrition

Contoyannis, P., Jones, A., and Rice, N., “The Dynamics of Health in the British Household Panel Survey,” Journal of Applied Econometrics, 19, 2004, pp. 473-503.

• Self assessed health
• British Household Panel Survey (BHPS)
• 1991 – 1998 = 8 waves
• About 5,000 households

Page 77: Econometric Analysis of Panel Data


Attrition

Page 78: Econometric Analysis of Panel Data


Testing for Attrition Bias

Three dummy variables added to full model with unbalanced panel suggest presence of attrition effects.

Page 79: Econometric Analysis of Panel Data


Attrition Model with IP Weights

Assumes
(1) Prob(attrition | all data) = Prob(attrition | selected variables) (ignorability).
(2) Attrition is an ‘absorbing state.’ No reentry. Obviously not true for the GSOEP data above.

Can deal with point (2) by isolating a subsample of those present at wave 1 and the monotonically shrinking subsample as the waves progress.

Page 80: Econometric Analysis of Panel Data


Probability Weighting Estimators

A Patch for Attrition
(1) Fit a participation probit equation for each wave.
(2) Compute p(i,t) = predictions of participation for each individual in each period.

Special assumptions are needed to make this work.

Ignore common effects and fit a weighted pooled log likelihood: Σi Σt [dit/p(i,t)]logLPit.

Page 81: Econometric Analysis of Panel Data


Inverse Probability Weighting

Panel is based on those present at WAVE 1, $N_1$ individuals.

Attrition is an absorbing state. No reentry, so $N_1 \ge N_2 \ge \ldots \ge N_8$.

Sample is restricted at each wave to individuals who were present at the previous wave.

$d_{it}$ = 1[Individual is present at wave t]. $d_{i1} = 1 \ \forall\, i$, $d_{i,t-1} = 0 \Rightarrow d_{it} = 0$.

$\mathbf{x}_{i1}$ = covariates observed for all i at entry that relate to likelihood of being present at subsequent waves (health problems, disability, psychological well being, self employment, unemployment, maternity leave, student, caring for family member, ...).

Probit model for $d_{it} = 1[\boldsymbol\delta_t'\mathbf{x}_{i1} + w_{it} > 0]$, t = 2,...,8. $\hat\pi_{it}$ = fitted probability.

Assuming attrition decisions are independent, $\hat P_{it} = \prod_{s=1}^{t} \hat\pi_{is}$.

Inverse probability weight $\hat W_{it} = \dfrac{d_{it}}{\hat P_{it}}$.

Weighted log likelihood $\log L_W = \sum_{i=1}^{N} \sum_{t=1}^{8} \hat W_{it} \log L_{it}$ (No common effects.)
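The weighting steps on this slide can be sketched in a few lines. The sketch assumes the per-wave fitted probabilities have already been obtained from the wave-by-wave probits; the function and argument names are hypothetical, and the period log likelihoods are taken as given.

```python
def ip_weights(pi_hat, d):
    """Inverse probability weights for one individual.

    pi_hat : fitted probabilities of being present at each wave, conditional
             on presence at the previous wave (equal to 1 at wave 1)
    d      : presence indicators d_i1..d_iT (1 = present); attrition is absorbing
    """
    weights = []
    cumulative = 1.0
    for p, present in zip(pi_hat, d):
        cumulative *= p                       # P_it = product of pi_is for s <= t
        weights.append(present / cumulative)  # W_it = d_it / P_it
    return weights


def weighted_loglik(loglik, pi_hat, d):
    """Sum of W_it * logL_it over individuals i and waves t (no common effects)."""
    total = 0.0
    for ll_i, pi_i, d_i in zip(loglik, pi_hat, d):
        for ll_it, w_it in zip(ll_i, ip_weights(pi_i, d_i)):
            if w_it > 0.0:                    # waves after attrition contribute nothing
                total += w_it * ll_it
    return total
```

An individual who remains in the panel despite a low predicted probability of doing so receives a weight above one, standing in for similar individuals who dropped out.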

Page 82: Econometric Analysis of Panel Data


Spatial Autocorrelation in a Sample Selection Model

Alaska Department of Fish and Game.

• Pacific cod fishing, eastern Bering Sea – grid of locations
• Observation = ‘catch per unit effort’ in grid square
• Data reported only if 4+ similar vessels fish in the region
• 1997 sample = 320 observations with 207 reported full data

Flores-Lagunes, A. and Schnier, K., “Sample selection and Spatial Dependence,” Journal of Applied Econometrics, 27, 2, 2012, pp. 173-204.

Page 83: Econometric Analysis of Panel Data


Spatial Autocorrelation in a Sample Selection Model

• LHS is catch per unit effort = CPUE
• Site characteristics: MaxDepth, MinDepth, Biomass
• Fleet characteristics:
    Catcher vessel (CV = 0/1)
    Hook and line (HAL = 0/1)
    Nonpelagic trawl gear (NPT = 0/1)
    Large (at least 125 feet) (Large = 0/1)

Flores-Lagunes, A. and Schnier, K., “Sample selection and Spatial Dependence,” Journal of Applied Econometrics, 27, 2, 2012, pp. 173-204.

Page 84: Econometric Analysis of Panel Data


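Spatial Autocorrelation in a Sample Selection Model

$$y_{1i}^* = \alpha_0 + \mathbf{x}_{1i}'\boldsymbol\alpha_1 + u_{1i}, \quad u_{1i} = \delta \sum_{j \ne i} c_{ij} u_{1j} + \varepsilon_{1i}$$

$$y_{2i}^* = \beta_0 + \mathbf{x}_{2i}'\boldsymbol\beta_1 + u_{2i}, \quad u_{2i} = \gamma \sum_{j \ne i} c_{ij} u_{2j} + \varepsilon_{2i}$$

$$\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right], \quad (??\ \sigma_1^2 = 1\ ??)$$

Observation Mechanism

$y_{1i} = 1[y_{1i}^* > 0]$ Probit Model

$y_{2i} = y_{2i}^*$ if $y_{1i} = 1$, unobserved otherwise.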

Page 85: Econometric Analysis of Panel Data


Spatial Autocorrelation in a Sample Selection Model

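$$\mathbf{u}_1 = \delta \mathbf{C}\mathbf{u}_1 + \boldsymbol\varepsilon_1, \quad \mathbf{C} = \text{Spatial weight matrix}, \ C_{ii} = 0.$$

$$\mathbf{u}_1 = [\mathbf{I} - \delta\mathbf{C}]^{-1}\boldsymbol\varepsilon_1 = \boldsymbol\Omega^{(1)}\boldsymbol\varepsilon_1, \ \text{likewise for } \mathbf{u}_2 \text{ with } \boldsymbol\Omega^{(2)} = [\mathbf{I} - \gamma\mathbf{C}]^{-1}$$

$$y_{1i}^* = \alpha_0 + \mathbf{x}_{1i}'\boldsymbol\alpha_1 + \sum_{j=1}^{N} \omega_{ij}^{(1)} \varepsilon_{1j}, \quad \mathrm{Var}[u_{1i}] = \sigma_1^2 \sum_{j=1}^{N} \big(\omega_{ij}^{(1)}\big)^2$$

$$y_{2i}^* = \beta_0 + \mathbf{x}_{2i}'\boldsymbol\beta_1 + \sum_{j=1}^{N} \omega_{ij}^{(2)} \varepsilon_{2j}, \quad \mathrm{Var}[u_{2i}] = \sigma_2^2 \sum_{j=1}^{N} \big(\omega_{ij}^{(2)}\big)^2$$

$$\mathrm{Cov}[u_{1i}, u_{2i}] = \sigma_{12} \sum_{j=1}^{N} \omega_{ij}^{(1)} \omega_{ij}^{(2)}$$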
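The variance and covariance terms above follow mechanically from the two inverse matrices, which a short NumPy sketch makes concrete. The names `delta` and `gamma` stand for the two spatial autoregressive coefficients, and the function name and argument layout are assumptions made for this illustration.

```python
import numpy as np


def spatial_moments(C, delta, gamma, sigma1_sq, sigma2_sq, sigma12):
    """Observation-specific moments of the spatially correlated disturbances.

    C is the N x N spatial weight matrix with zero diagonal. The disturbances
    are u1 = inv(I - delta*C) @ eps1 and u2 = inv(I - gamma*C) @ eps2.
    Returns Var[u_1i], Var[u_2i], and Cov[u_1i, u_2i] as length-N arrays.
    """
    n = C.shape[0]
    omega1 = np.linalg.inv(np.eye(n) - delta * C)
    omega2 = np.linalg.inv(np.eye(n) - gamma * C)
    var_u1 = sigma1_sq * (omega1 ** 2).sum(axis=1)     # sigma_1^2 * sum_j (w1_ij)^2
    var_u2 = sigma2_sq * (omega2 ** 2).sum(axis=1)     # sigma_2^2 * sum_j (w2_ij)^2
    cov_u12 = sigma12 * (omega1 * omega2).sum(axis=1)  # sigma_12 * sum_j w1_ij * w2_ij
    return var_u1, var_u2, cov_u12
```

With `delta = gamma = 0` the function returns the constant values `sigma1_sq`, `sigma2_sq`, and `sigma12`, which is the nonspatial selection model; any nonzero spatial coefficient makes the moments vary across observations, so the disturbances are heteroscedastic.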

Page 86: Econometric Analysis of Panel Data


Spatial Weights

$$c_{ij} = \frac{1}{d_{ij}^2}, \quad d_{ij} = \text{Euclidean distance}$$

Band of 7 neighbors is used.

Row standardized.
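A weight matrix of this kind can be built as in the sketch below. Reading "band of 7 neighbors" as the 7 nearest locations is an assumption made here, as are the function name and the requirement that all locations be distinct (a zero distance would cause a division by zero).

```python
import math


def spatial_weights(coords, n_neighbors=7):
    """Row-standardized inverse squared distance weights.

    coords      : list of (x, y) locations, assumed distinct
    n_neighbors : number of nearest neighbors given nonzero weight (assumption)
    """
    n = len(coords)
    C = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Euclidean distance from location i to every other location.
        dist = [(math.hypot(xi - xj, yi - yj), j)
                for j, (xj, yj) in enumerate(coords) if j != i]
        nearest = sorted(dist)[:n_neighbors]
        raw = {j: 1.0 / d ** 2 for d, j in nearest}    # c_ij = 1 / d_ij^2
        total = sum(raw.values())
        for j, w in raw.items():
            C[i][j] = w / total                        # each row sums to one
    return C
```

Row standardization makes each spatial lag a weighted average of the neighbors' disturbances, and the zero diagonal keeps a location from being its own neighbor.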

Page 87: Econometric Analysis of Panel Data


Two Step Estimation

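Probit estimated by Pinkse/Slade GMM.

Spatial regression with included IMR in second step (*):

$$E[y_{2i} \mid y_{1i} = 1] = \beta_0 + \mathbf{x}_{2i}'\boldsymbol\beta_1 + \frac{\sigma_{12} \sum_{j=1}^{N} \omega_{ij}^{(1)} \omega_{ij}^{(2)}}{\sqrt{\sigma_1^2 \sum_{j=1}^{N} \big(\omega_{ij}^{(1)}\big)^2}}\, \lambda_i$$

$$\lambda_i = \frac{\phi(z_i)}{\Phi(z_i)}, \quad z_i = \frac{\alpha_0 + \mathbf{x}_{1i}'\boldsymbol\alpha_1}{\sqrt{\sigma_1^2 \sum_{j=1}^{N} \big(\omega_{ij}^{(1)}\big)^2}}$$

Here $\omega_{ij}^{(1)}$ and $\omega_{ij}^{(2)}$ are the elements of $[\mathbf{I} - \delta\mathbf{C}]^{-1}$ and $[\mathbf{I} - \gamma\mathbf{C}]^{-1}$.

(*) GMM procedure combines the two steps in one large estimation.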

Page 88: Econometric Analysis of Panel Data
