Econometric Methods and Applications I, Lecture 4

Non-linear regression models

a) The probit and tobit models as examples

b) Interpretation of the models

c) Relevant estimation methods (ML)

d) Final considerations about regressions

Literature: Wooldridge (2002, 2010): 15.1-15.4, 15.6, 17.1-17.4


Introduction (1)

> If the CIA is valid, then the causal parameter in the simplest binary D framework is obtained by estimating

$$E(Y \mid X = x, D = 1) - E(Y \mid X = x, D = 0)$$

and aggregating over the appropriate distribution of X

> Now, we consider the case when a linear approximation of these conditional expectations is inadequate


Introduction (2)

> Example

• True conditional expectation of Y: $E(Y \mid X = x) = g(x, \theta)$

• Function used in estimation by linear regression: $x\beta$

• Implied error term by function used in estimation:

$$U = Y - X\beta \;\Rightarrow\; E(Y - X\beta \mid X = x) = E(Y \mid X = x) - x\beta = g(x, \theta) - x\beta \neq 0$$

> Implied error term is correlated with variables used in estimation ⇒ OLS/FGLS is inconsistent

> Non-linear models may lead to better approximation of $g(x, \theta)$ and may avoid inconsistent estimation


Probit model (1-1)

> Example 1: With a binary outcome variable, linear regression is generally not attractive, because

$$E(Y \mid X = x) = 1 \times P(Y = 1 \mid X = x) + 0 \times P(Y = 0 \mid X = x) = P(Y = 1 \mid X = x)$$

• A probability is bounded between zero and one

• The probability (usually) corresponds to a cumulative distribution function (details later), which is generally not linear in its arguments (exception: uniform distribution)
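The boundedness problem can be illustrated with a minimal sketch. It assumes a hypothetical true response probability P(Y=1|X=x) = Φ(x) on an evenly spaced grid (these choices are illustrative, not from the lecture) and fits the best linear approximation by least squares.

```python
from statistics import NormalDist

# Hypothetical setup: one regressor on an evenly spaced grid and a true
# response probability P(Y=1|X=x) = Phi(x) (standard normal cdf).
std_normal = NormalDist()
xs = [-3 + 0.1 * k for k in range(61)]      # grid from -3 to 3
probs = [std_normal.cdf(x) for x in xs]     # true P(Y=1|X=x)

# Best linear approximation a + b*x, obtained by least squares on the grid.
n = len(xs)
x_bar = sum(xs) / n
p_bar = sum(probs) / n
b = sum((x - x_bar) * (p - p_bar) for x, p in zip(xs, probs)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = p_bar - b * x_bar
linear_fit = [a + b * x for x in xs]

print(f"intercept={a:.3f}, slope={b:.3f}")
print(f"fitted range: [{min(linear_fit):.3f}, {max(linear_fit):.3f}]")
```

In this setup the linear fit leaves the unit interval at both ends of the grid, while the true probabilities stay inside it.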


Probit model (1-2)


Probit model (2)

> Model based on a linear index model and a non-linear link function

$$Y = 1(X\beta + U > 0) \;\Rightarrow$$

$$\begin{aligned}
E(Y \mid X = x) &= P(Y = 1 \mid X = x) \\
&= P(X\beta + U > 0 \mid X = x) \\
&= P(U > -X\beta \mid X = x) \\
&= 1 - P(U \le -X\beta \mid X = x) \\
&= 1 - F(-x\beta)
\end{aligned}$$

> F(.) denotes the cdf of U


Probit model (3)

> By making different distributional assumptions concerning F, we obtain different models (logit, probit, etc.)

> Only in one special case (U is distributed uniformly in a fixed interval) is F(.) a linear function (linear probability model)

> Generally, this expression simplifies for symmetric distributions:

$$E(Y \mid X = x) = 1 - F(-x\beta) \overset{U \text{ symmetric}}{=} F(x\beta)$$


Probit model (4)

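> By assuming that U is normally distributed with mean zero and variance $\sigma^2$, we obtain the probit model:

$$E(Y \mid X = x) = \Phi\left(\frac{x\beta}{\sigma}\right), \qquad \Phi(a): \text{cdf of standard normal distribution}$$

Remember: $A \sim N(0, \sigma^2) \Rightarrow P(A \le a) = \Phi\left(\frac{a}{\sigma}\right)$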
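As a small numerical check of the probit formula and the standardisation rule, the following sketch uses Python's standard-library `statistics.NormalDist`. The function name `probit_probability` and all numerical values are illustrative assumptions.

```python
from statistics import NormalDist


def probit_probability(x_beta: float, sigma: float = 1.0) -> float:
    """P(Y=1 | X=x) = Phi(x*beta / sigma) for U ~ N(0, sigma^2)."""
    return NormalDist().cdf(x_beta / sigma)


# Standardisation rule: for A ~ N(0, sigma^2), P(A <= a) = Phi(a / sigma).
sigma, a = 2.0, 1.5          # illustrative values, not from the lecture
direct = NormalDist(mu=0.0, sigma=sigma).cdf(a)
via_standard_normal = NormalDist().cdf(a / sigma)
print(direct, via_standard_normal)

# Symmetry of the normal distribution: 1 - F(-x*beta) = F(x*beta).
x_beta = 0.7
left_side = 1 - NormalDist(mu=0.0, sigma=sigma).cdf(-x_beta)
print(left_side, probit_probability(x_beta, sigma))
```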


Probit model (5)

> General identification problem of binary choice models

• The following two models lead to the same dependent variable:

$$Y = 1(X\beta + U > 0) \quad \text{and} \quad Y = 1(X\beta\sigma + U\sigma > 0); \quad \sigma > 0.$$

• Therefore, it is impossible to distinguish the models empirically

• Some (convenient) normalisation is needed, usually $\sigma = 1$
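The identification problem can be made concrete with a short simulation sketch. The parameter values and the seed are arbitrary assumptions; the point is only that scaling the index and the error by the same positive constant leaves every observed outcome unchanged.

```python
import random
from statistics import NormalDist

rng = random.Random(0)
beta, sigma = 0.5, 2.0       # illustrative values; any sigma > 0 would do

same_outcome = True
for _ in range(10_000):
    x = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)
    y_model_1 = int(x * beta + u > 0)                    # Y = 1(X*beta + U > 0)
    y_model_2 = int(x * (beta * sigma) + u * sigma > 0)  # scaled model
    same_outcome = same_outcome and (y_model_1 == y_model_2)

print("identical outcomes:", same_outcome)

# The response probability depends on the parameters only through beta / sigma.
p1 = NormalDist().cdf(1.2 * beta / 1.0)               # coefficient 0.5, sd of U = 1
p2 = NormalDist().cdf(1.2 * (beta * sigma) / sigma)   # coefficient 1.0, sd of U = 2
print(p1, p2)
```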


Tobit model (1-1)

> Second example for a non-linear model: Tobit

• Motivation: Some dependent variables cannot fall below (rise above) some threshold

− e.g. earnings cannot be negative

> Again, modelling is based on a latent linear index:

$$Y = 1(X\beta + U > 0)\,\underbrace{(X\beta + U)}_{Y^*}$$


Tobit model (1-2)


Tobit model (2)

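> Assume U is normally distributed with mean 0 and variance $\sigma^2$

> Derivation of E(Y | X) is somewhat complex (Wooldridge, 2002, Ch. 16.2)

$$E(Y \mid X = x) = \Phi\left(\frac{x\beta}{\sigma}\right) x\beta + \sigma \phi\left(\frac{x\beta}{\sigma}\right), \qquad \phi(a): \text{pdf of standard normal distribution}$$

> Consider only the subpopulation having positive values of y:

$$E(Y \mid X = x, Y > 0) = x\beta + \sigma \frac{\phi\left(\frac{x\beta}{\sigma}\right)}{\Phi\left(\frac{x\beta}{\sigma}\right)} = x\beta + \sigma \lambda\left(\frac{x\beta}{\sigma}\right)$$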
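The two Tobit expectations can be checked numerically. The sketch below is only an illustration with assumed parameter values: it codes both formulas, compares the unconditional mean with a simple Riemann-sum approximation of E[max(0, xβ + σZ)], and verifies that E(Y|X) = P(Y>0|X) · E(Y|X, Y>0).

```python
from statistics import NormalDist

STD = NormalDist()


def inverse_mills(z: float) -> float:
    """lambda(z) = phi(z) / Phi(z)."""
    return STD.pdf(z) / STD.cdf(z)


def tobit_mean(x_beta: float, sigma: float) -> float:
    """E(Y | X=x) = Phi(x*beta/sigma) * x*beta + sigma * phi(x*beta/sigma)."""
    z = x_beta / sigma
    return STD.cdf(z) * x_beta + sigma * STD.pdf(z)


def tobit_mean_positive(x_beta: float, sigma: float) -> float:
    """E(Y | X=x, Y>0) = x*beta + sigma * lambda(x*beta/sigma)."""
    return x_beta + sigma * inverse_mills(x_beta / sigma)


def tobit_mean_numeric(x_beta: float, sigma: float, step: float = 0.001) -> float:
    """Riemann-sum approximation of E[max(0, x*beta + sigma*Z)], Z ~ N(0, 1)."""
    total = 0.0
    for k in range(-8000, 8001):
        z = k * step
        total += max(0.0, x_beta + sigma * z) * STD.pdf(z) * step
    return total


x_beta, sigma = 0.4, 1.5     # illustrative values
print(tobit_mean(x_beta, sigma), tobit_mean_numeric(x_beta, sigma))
print(STD.cdf(x_beta / sigma) * tobit_mean_positive(x_beta, sigma))
```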


Tobit model (3)

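> Estimation in the complete sample

$$E(Y \mid X = x) = \Phi\left(\frac{x\beta}{\sigma}\right) x\beta + \sigma \phi\left(\frac{x\beta}{\sigma}\right)$$

> OLS is inconsistent because of the neglected nonlinearity $\Phi\left(\frac{x\beta}{\sigma}\right)$ and the omitted variable $\phi\left(\frac{x\beta}{\sigma}\right)$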
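A small simulation can illustrate the inconsistency of OLS in the complete sample. The data-generating values, the seed, and the standard normal regressor are assumptions chosen for illustration; in this design the OLS slope on the censored outcome should come out well below the true coefficient.

```python
import random

rng = random.Random(1)
beta0, beta1, sigma = 0.0, 1.0, 1.0      # illustrative true values
n = 20_000

xs, ys = [], []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    y_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)   # latent index Y*
    xs.append(x)
    ys.append(max(0.0, y_star))                          # observed Y, censored at 0

# OLS slope of the censored outcome on x (with intercept).
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
print(f"true beta1 = {beta1}, OLS slope on censored Y = {slope:.3f}")
```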


Tobit model (4)

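> Estimation in the subsample with Y > 0

$$E(Y \mid X = x, Y > 0) = x\beta + \sigma \frac{\phi\left(\frac{x\beta}{\sigma}\right)}{\Phi\left(\frac{x\beta}{\sigma}\right)} = x\beta + \sigma \lambda\left(\frac{x\beta}{\sigma}\right)$$

> OLS in the population with positive Y is inconsistent because of the omitted variable

$$\lambda\left(\frac{x\beta}{\sigma}\right) = \frac{\phi\left(\frac{x\beta}{\sigma}\right)}{\Phi\left(\frac{x\beta}{\sigma}\right)}$$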


Effects of interest (1)

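> In case of a binary treatment D, we want to compute

$$E[\theta(x) \mid D = d] = E_x\Big[\underbrace{E(Y \mid D = 1, X = x) - E(Y \mid D = 0, X = x)}_{\theta(x)} \,\Big|\, D = d\Big]$$

> Assume the simplest model without D x X interactions

$$Y^* = X\beta + D\gamma + U; \qquad U \sim N(0, \sigma^2)$$

Probit:

$$\theta^{Probit}(x) = \Phi\left(\frac{x\beta + \gamma}{\sigma}\right) - \Phi\left(\frac{x\beta}{\sigma}\right)$$

Tobit:

$$\theta^{Tobit}(x) = \Phi\left(\frac{x\beta + \gamma}{\sigma}\right)(x\beta + \gamma) + \sigma\phi\left(\frac{x\beta + \gamma}{\sigma}\right) - \Phi\left(\frac{x\beta}{\sigma}\right) x\beta - \sigma\phi\left(\frac{x\beta}{\sigma}\right)$$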
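The two effect formulas translate directly into code. The sketch below uses assumed parameter values and hypothetical function names; it evaluates θ(x) at a few index values to show that the effect of D varies with x in both models.

```python
from statistics import NormalDist

STD = NormalDist()


def theta_probit(x_beta: float, gamma: float, sigma: float) -> float:
    """Phi((x*beta + gamma)/sigma) - Phi(x*beta/sigma)."""
    return STD.cdf((x_beta + gamma) / sigma) - STD.cdf(x_beta / sigma)


def tobit_mean(index: float, sigma: float) -> float:
    """Phi(index/sigma) * index + sigma * phi(index/sigma)."""
    z = index / sigma
    return STD.cdf(z) * index + sigma * STD.pdf(z)


def theta_tobit(x_beta: float, gamma: float, sigma: float) -> float:
    """Difference of the Tobit conditional means with D = 1 and D = 0."""
    return tobit_mean(x_beta + gamma, sigma) - tobit_mean(x_beta, sigma)


gamma, sigma = 0.5, 1.0      # illustrative values
for x_beta in (-2.0, 0.0, 2.0):
    print(
        f"x*beta={x_beta:+.1f}  probit effect={theta_probit(x_beta, gamma, sigma):.4f}"
        f"  tobit effect={theta_tobit(x_beta, gamma, sigma):.4f}"
    )
```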



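> If D is continuous we may be more interested in how the conditional expectation changes for very small changes in D

Probit:

$$\frac{\partial E(Y \mid D = d, X = x)}{\partial d} = \phi\left(\frac{x\beta + d\gamma}{\sigma}\right)\frac{\gamma}{\sigma}$$

Tobit:

$$\frac{\partial E(Y \mid D = d, X = x)}{\partial d} = \Phi\left(\frac{x\beta + d\gamma}{\sigma}\right)\gamma$$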
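The analytical derivatives can be compared with central finite differences of the conditional means. This is a minimal sketch under assumed parameter values; all function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def probit_mean(x_beta, d, gamma, sigma):
    """E(Y | D=d, X=x) in the probit model."""
    return STD.cdf((x_beta + d * gamma) / sigma)


def tobit_mean(x_beta, d, gamma, sigma):
    """E(Y | D=d, X=x) in the Tobit model."""
    index = x_beta + d * gamma
    z = index / sigma
    return STD.cdf(z) * index + sigma * STD.pdf(z)


def probit_marginal_effect(x_beta, d, gamma, sigma):
    return STD.pdf((x_beta + d * gamma) / sigma) * gamma / sigma


def tobit_marginal_effect(x_beta, d, gamma, sigma):
    return STD.cdf((x_beta + d * gamma) / sigma) * gamma


def finite_difference(mean_fn, x_beta, d, gamma, sigma, h=1e-5):
    """Central difference approximation of the derivative with respect to d."""
    upper = mean_fn(x_beta, d + h, gamma, sigma)
    lower = mean_fn(x_beta, d - h, gamma, sigma)
    return (upper - lower) / (2 * h)


args = (0.3, 1.2, -0.8, 1.5)     # illustrative (x*beta, d, gamma, sigma)
print(probit_marginal_effect(*args), finite_difference(probit_mean, *args))
print(tobit_marginal_effect(*args), finite_difference(tobit_mean, *args))
```

With a negative γ, as assumed here, both derivatives are negative, but their size depends on x, d, and the remaining parameters.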



> The coefficient $\gamma$ is informative about the sign of the effect, but not about its magnitude, which also depends on the other coefficients and control variables (this is different in the linear regression model)


Estimation (1)

> Minimizing the squared deviation between actual and predicted individual outcomes (least squares principle)

• is not efficient (due to the implied heteroscedasticity)

> Probit and Tobit models are usually estimated by maximum likelihood or the generalized method of moments

• Both estimation methods will be discussed in more detail in Econometric Methods and Applications II


Estimation (2)

> Basic idea of Maximum Likelihood (ML)

• Choose the unknown coefficients in such a way that the observed sample is most likely to come from an underlying population described by the chosen values of the coefficients

> Properties of ML

• When the model is correctly specified and some further regularity conditions are met, ML is consistent, asymptotically efficient, and asymptotically normally distributed
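To make the ML idea concrete, the following sketch maximises a probit log-likelihood on simulated data. It assumes one regressor plus an intercept, the normalisation σ = 1, and Fisher scoring as the maximisation routine (one of several possible algorithms); data, seed, and function names are illustrative.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def simulate(n, b0, b1, seed=0):
    """Draw (x, y) pairs from Y = 1(b0 + b1*x + U > 0), U ~ N(0, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        data.append((x, int(b0 + b1 * x + rng.gauss(0.0, 1.0) > 0)))
    return data


def log_likelihood(data, b0, b1):
    total = 0.0
    for x, y in data:
        p = min(max(STD.cdf(b0 + b1 * x), 1e-12), 1 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def probit_ml(data, iterations=15):
    """Fisher scoring: b <- b + (expected information)^(-1) * score."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            index = b0 + b1 * x
            p = min(max(STD.cdf(index), 1e-12), 1 - 1e-12)
            dens = STD.pdf(index)
            score = dens * (y - p) / (p * (1 - p))   # d logL_i / d index
            weight = dens ** 2 / (p * (1 - p))       # expected information weight
            g0 += score
            g1 += score * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 ** 2                   # invert the 2x2 information matrix
        b0, b1 = b0 + (h11 * g0 - h01 * g1) / det, b1 + (h00 * g1 - h01 * g0) / det
    return b0, b1


data = simulate(5000, 0.3, 1.0)
b0_hat, b1_hat = probit_ml(data)
print(f"b0_hat={b0_hat:.3f}, b1_hat={b1_hat:.3f}")
```

The estimates should lie close to the values used in the simulation, and the log-likelihood at the estimates should be at least as large as at the simulated true values.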


Estimation (3)

> Basic idea of the Generalized Method of Moments (GMM)

• The model implies that the residual V (not U) has conditional expectation 0:

$$E[Y - E(Y \mid X = x) \mid X = x] = E(V \mid X = x) = 0; \qquad \text{Probit: } V = Y - \Phi\left(\frac{X\beta}{\sigma}\right)$$

• Thus, it is uncorrelated with all functions of X. This defines a set of moment conditions (equalities) that hold in the population for the true parameters

• Choose the parameters such that the sample analogues of those moments (very often mean functions) come as close as possible to fulfilling the same conditions in the sample

• Under correct specification, GMM is (usually) $\sqrt{N}$-consistent and asymptotically normally distributed
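The moment conditions can be illustrated for the probit case. The sketch below assumes two moment functions, E[V] = 0 and E[X·V] = 0, with σ normalised to 1 and an identity weighting matrix; these choices and the simulated data are illustrative assumptions, not the only possible GMM setup.

```python
import random
from statistics import NormalDist

STD = NormalDist()
rng = random.Random(2)
b0_true, b1_true = 0.0, 1.0      # illustrative true values, sigma = 1

data = []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    data.append((x, int(b0_true + b1_true * x + rng.gauss(0.0, 1.0) > 0)))


def sample_moments(data, c0, c1):
    """Sample analogues of E[V] and E[X*V] with V = Y - Phi(c0 + c1*X)."""
    n = len(data)
    m_const = sum(y - STD.cdf(c0 + c1 * x) for x, y in data) / n
    m_x = sum(x * (y - STD.cdf(c0 + c1 * x)) for x, y in data) / n
    return m_const, m_x


def gmm_objective(data, c0, c1):
    """Squared distance of the sample moments from zero (identity weighting)."""
    m_const, m_x = sample_moments(data, c0, c1)
    return m_const ** 2 + m_x ** 2


obj_true = gmm_objective(data, b0_true, b1_true)
obj_wrong = gmm_objective(data, 0.0, 0.0)    # ignores the effect of X
print(f"objective at true parameters: {obj_true:.6f}")
print(f"objective at wrong parameters: {obj_wrong:.6f}")
```

A GMM estimator would search over (c0, c1) for the values that minimise this objective; here the objective is only evaluated at two candidate points.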


Estimation (4)

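> Probit and Tobit models are usually estimated by ML

> Tobit model: There is a particularly simple 2-step GMM estimator when considering observations with positive y:

$$E(Y \mid X = x, Y > 0) = x\beta + \sigma \frac{\phi\left(\frac{x\beta}{\sigma}\right)}{\Phi\left(\frac{x\beta}{\sigma}\right)} = x\beta + \sigma \lambda\left(\frac{x\beta}{\sigma}\right)$$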


Estimation (5)

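$$E(Y \mid X = x, Y > 0) = x\beta + \sigma\lambda\left(\frac{x\beta}{\sigma}\right)$$

> 1st step:

• Estimate a probit model (for the indicator 1(Y > 0)) to obtain consistent estimates of $\beta/\sigma$

• Use them to compute a consistent estimate of $\lambda\left(\frac{x\beta}{\sigma}\right)$ for every observation:

$$\hat{\lambda}_i = \lambda\left(x_i \widehat{\frac{\beta}{\sigma}}\right) = \frac{\phi\left(x_i \widehat{\frac{\beta}{\sigma}}\right)}{\Phi\left(x_i \widehat{\frac{\beta}{\sigma}}\right)}$$

> 2nd step

• Use $\hat{\lambda}_i$ as additional regressor in the regression of Y on X in the subsample with Y > 0 (Heckit)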
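The two steps can be sketched on simulated data. To keep the example short, it assumes a single regressor without intercept, a uniformly distributed x, and Fisher scoring for the first-step probit; all of these are simplifying assumptions for illustration only.

```python
import random
from statistics import NormalDist

STD = NormalDist()
rng = random.Random(3)
beta, sigma = 1.0, 1.5                   # illustrative true values

# Simulated Tobit data without intercept (a simplifying assumption).
data = []
for _ in range(20_000):
    x = rng.uniform(-2.0, 3.0)
    data.append((x, max(0.0, x * beta + rng.gauss(0.0, sigma))))

# 1st step: probit of 1(Y > 0) on x by Fisher scoring; estimates beta / sigma.
ratio = 0.0
for _ in range(15):
    score = info = 0.0
    for x, y in data:
        positive = 1.0 if y > 0 else 0.0
        p = min(max(STD.cdf(x * ratio), 1e-12), 1 - 1e-12)
        dens = STD.pdf(x * ratio)
        score += x * dens * (positive - p) / (p * (1 - p))
        info += x * x * dens ** 2 / (p * (1 - p))
    ratio += score / info

# 2nd step: OLS of y on x and lambda_hat in the subsample with y > 0.
sxx = sxl = sll = sxy = sly = 0.0
for x, y in data:
    if y > 0:
        lam = STD.pdf(x * ratio) / STD.cdf(x * ratio)    # lambda_hat_i
        sxx += x * x
        sxl += x * lam
        sll += lam * lam
        sxy += x * y
        sly += lam * y
det = sxx * sll - sxl ** 2                               # solve the 2x2 normal equations
beta_hat = (sll * sxy - sxl * sly) / det                 # coefficient on x
sigma_hat = (sxx * sly - sxl * sxy) / det                # coefficient on lambda_hat

print(f"beta/sigma estimate: {ratio:.3f}")
print(f"beta_hat={beta_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

In this design the coefficient on x estimates β and the coefficient on the estimated λ estimates σ. Standard errors from the second-step regression would need a correction for the estimated regressor, which this sketch does not attempt.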


Computing the effects of interest (1)

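$$\hat{\theta}^{Probit}(x_i) = \Phi\left(x_i \widehat{\frac{\beta}{\sigma}} + \widehat{\frac{\gamma}{\sigma}}\right) - \Phi\left(x_i \widehat{\frac{\beta}{\sigma}}\right)$$

$$\hat{\theta}^{Tobit}(x_i) = \Phi\left(\frac{x_i\hat{\beta} + \hat{\gamma}}{\hat{\sigma}}\right)(x_i\hat{\beta} + \hat{\gamma}) + \hat{\sigma}\phi\left(\frac{x_i\hat{\beta} + \hat{\gamma}}{\hat{\sigma}}\right) - \Phi\left(\frac{x_i\hat{\beta}}{\hat{\sigma}}\right) x_i\hat{\beta} - \hat{\sigma}\phi\left(\frac{x_i\hat{\beta}}{\hat{\sigma}}\right)$$

$$\widehat{ATE} = \frac{1}{N}\sum_{i=1}^{N} \hat{\theta}(x_i)$$

$$\widehat{ATET} = \frac{1}{N_1}\sum_{i=1}^{N} d_i \hat{\theta}(x_i)$$

$$\widehat{ATENT} = \frac{1}{N - N_1}\sum_{i=1}^{N} (1 - d_i) \hat{\theta}(x_i)$$

where $N_1 = \sum_{i=1}^{N} d_i$ is the number of treated observations.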
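The averaging step can be sketched for the probit case. The parameter value and the six-observation sample below are hypothetical and only serve to show how the same individual effects are averaged over different subpopulations.

```python
from statistics import NormalDist

STD = NormalDist()


def theta_probit(x_beta_over_sigma: float, gamma_over_sigma: float) -> float:
    """Estimated individual effect Phi(x*b/s + g/s) - Phi(x*b/s)."""
    return STD.cdf(x_beta_over_sigma + gamma_over_sigma) - STD.cdf(x_beta_over_sigma)


# Hypothetical estimate of gamma/sigma and a tiny hypothetical sample of
# (x_i * estimated beta/sigma, d_i) pairs.
gamma_over_sigma_hat = 0.5
sample = [(-1.0, 0), (-0.5, 0), (0.0, 1), (0.5, 1), (1.0, 1), (1.5, 0)]

effects = [theta_probit(z, gamma_over_sigma_hat) for z, _ in sample]
n = len(sample)
n1 = sum(d for _, d in sample)           # number of treated observations

ate = sum(effects) / n
atet = sum(d * e for (_, d), e in zip(sample, effects)) / n1
atent = sum((1 - d) * e for (_, d), e in zip(sample, effects)) / (n - n1)

print(f"ATE={ate:.4f}, ATET={atet:.4f}, ATENT={atent:.4f}")
```

By construction, the ATE is the weighted average of ATET and ATENT with weights N_1/N and (N − N_1)/N.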


Computing the effects of interest (3)

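> The same logic applies when D is continuous

> Averaging is over the various populations as before

> Most statistical software packages also provide the values of these derivatives or discrete changes of D (and other X) for a particular value of D and X, usually the sample mean

Probit:

$$\frac{\partial \hat{E}(Y \mid D = d, X = x)}{\partial d} = \phi\left(x \widehat{\frac{\beta}{\sigma}} + d\, \widehat{\frac{\gamma}{\sigma}}\right) \widehat{\frac{\gamma}{\sigma}}$$

Tobit:

$$\frac{\partial \hat{E}(Y \mid D = d, X = x)}{\partial d} = \Phi\left(\frac{x\hat{\beta} + d\hat{\gamma}}{\hat{\sigma}}\right)\hat{\gamma}$$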
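The difference between the derivative evaluated at the sample mean and the average of the individual derivatives can be shown with a short probit sketch. The estimates (with σ normalised to 1) and the five observations are hypothetical.

```python
from statistics import NormalDist, fmean

STD = NormalDist()

# Hypothetical probit estimates (sigma normalised to 1) and a hypothetical sample.
beta_hat, gamma_hat = 1.0, 0.5
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ds = [0.0, 0.5, 1.0, 1.5, 2.0]


def marginal_effect(x: float, d: float) -> float:
    """Derivative of Phi(x*beta + d*gamma) with respect to d."""
    return STD.pdf(x * beta_hat + d * gamma_hat) * gamma_hat


# Average of the individual derivatives over the sample.
average_marginal_effect = fmean(marginal_effect(x, d) for x, d in zip(xs, ds))

# Derivative evaluated at the sample means of X and D.
effect_at_mean = marginal_effect(fmean(xs), fmean(ds))

print(f"average marginal effect: {average_marginal_effect:.4f}")
print(f"marginal effect at the mean: {effect_at_mean:.4f}")
```

Because the derivative is non-linear in x and d, the two numbers generally differ, so it matters which one a software package reports.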


Final considerations about regressions (1)

> Linear and non-linear regressions are tools to remove differences in the outcome variables due to observable variables

> Whether this is enough to uncover causal effects depends on the (non-)existence of other (non-observable) differences also related to selection (i.e. other factors influencing D and Y)

> Regressions uncover causal effects if

• conditional expectations are of linear or non-linear known form

$$E[Y - E(Y \mid X = x) \mid X = x] = E[Y - g(X, \theta) \mid X = x] = E(U \mid X = x) = 0$$

• CIA holds



> Regressions for causal inference

• If effect heterogeneity is expected, include (enough) interaction terms D x X

• Always check whether a coefficient has an interpretation as an effect or if more complex calculations are required, in particular in models with D x X interactions and non-linear models

• If unsure about non-linearities, or if substantial effect heterogeneity of unknown form is expected, use more flexible semi- or non-parametric methods (as discussed in the course “Flexible estimation in practice”)!



> Regressions for causal inference: Fatal mistakes

• Bad controls

− Conditioning on variables influenced by D (simultaneous equation bias)

− Controls measured with error related to D (measurement error bias)

• Missing variables

− Variables related to Y and D are not in the data (omitted variable bias)

• Specification error of the conditional expectation functions acts like a missing variable (or a measurement error)

− Misspecified bit of true regression becomes part of error term and may violate E(U|X)=0 condition
