duration analysis - uab barcelonapareto.uab.cat/jllull/bgse_panel_data/duration_notes.pdf · the...

Duration Analysis

Joan Llull

Panel Data and Duration

Master in Macroeconomic Policy and Financial Markets

Barcelona GSE

I. Introduction

A. Motivation

There are many examples in economics in which our variable of interest is a

duration. Duration data can answer the question “how long an individual has

been in a particular state when exiting from it”. Examples are the number of

weeks or months that an individual has been unemployed when she finds a job,

how long an individual has been in a hospital before leaving it, or what is the

life expectancy of an individual with certain age (i.e. how long has she been alive

when she dies). In this chapter we learn the basic tools to model this kind of data.

This kind of analysis allows us to talk about why durations differ across individuals

(i.e. what is the effect of individual characteristics on duration), and about

how and why exit probabilities vary over time. These techniques have a long

tradition in biometrics. For this reason, it is common to find a lot of nomenclature

that has been borrowed from that field: survival probabilities, hazard functions,...

B. Duration data

Our data consist of a sample of durations. We have a sample of durations for N

individuals, t1, t2, ..., tN . Importantly, these data are typically censored. Figure 1

draws two examples of censored samples. In the left plot, we have a hypothetical

sample of durations that is assumed to be obtained by interviewing individuals from

January 1990 to January 1992 on a monthly basis. For individuals 2 and 4, we

observe complete duration spells. Individual 1 was already unemployed at the

first interview date. Individual 3 is still unemployed at the last interview date. In

both cases we know that the duration of their unemployment spell is larger than

a certain value, but not by how much; i.e. we observe t > t̄, but not t.

The second hypothetical sample could be collected through registries. Imagine that, in order to receive the unemployment benefit, workers have to show up at the unemployment office and prove that they are still unemployed. In the figure, individual 1 found a job in less than one year; in our data, we observe that he does not appear in the sample of January 1991, so we learn that he found a job before that date; as a result, we know that the duration of this unemployment spell was below one year, but not by how much. Similarly, individuals 3 and 4 found a job during the second year. For these three individuals, data are censored in the sense that we only observe that the duration is within a given interval, but we do not know it exactly; i.e. we observe $\underline{t} < t < \bar{t}$ instead of t. Individual 2 is censored in exactly the same way as individuals 1 and 3 from the left figure.

Figure 1. Two examples of censored observations

[Figure omitted. Panel A: Example 1. Panel B: Example 2. Vertical axis: Individual; horizontal axis: Date (Jan90 to Jan92).]

Note: Black lines represent the time when the individual was unemployed. A dot indicates that the individual is still unemployed at that date, but we do not have further information about him/her. Vertical red dashed lines in Example 2 are interview dates.

The presence of censoring is one of the main motivations to use the techniques

presented here (as opposed to, say, regression). If data are censored, sample

average durations are biased. This framework incorporates censoring into the

analysis without the need of additional strong assumptions.

On top of censoring, duration analysis allows for time/state dependence, i.e. the

probability of terminating the current spell depends on the duration of the spell.

For instance, we might be interested in analyzing whether individuals’ probability

of finding a job is decreasing in the time they have been unemployed in the current

spell. These techniques allow us to go beyond the simple average duration and

look at the shape of exit probabilities over different durations.

II. The Hazard Function

The hazard function is the most important object in this analysis. It is defined

as the probability/density of exiting at t conditional on being alive. In the unem-

ployment example, it is the probability that an individual finds a job at, say, the

10th month of unemployment, conditional on being unemployed in the 9th month.

Figure 2. Mortality Hazard Rate

[Figure omitted. Vertical axis: Hazard function (0.0 to 1.0); horizontal axis: Age (0 to 100).]

Note: The line depicts the hazard mortality rate, i.e. the probability of dying at age a conditional on survival until that age.

This hazard function can be constant or time varying. For instance, if individu-

als receive offers in every period with the same probability, the hazard is constant.

A time-varying example is plotted in Figure 2. The figure depicts the hazard

mortality rates of a given population. Infant mortality makes the hazard decrease over the

first five years of age. A hazard rate of around 8% at age 5 does not mean that the

probability of dying at age 5 is 8% but, instead, that the probability of dying at

age 5 for those individuals who survived the first 4 years is 8%. A good example

to understand this distinction is the hazard rate at age 100: the probability of dying

at age 100 conditional on having survived until age 99 is almost 1; however, the

probability of dying at age 100 is almost zero.

We consider both discrete and continuous durations. Whenever our duration

of interest is discrete, the hazard function is a probability. When the duration is

continuous, this hazard function is a pdf.

Our interest in the hazard function has theoretical and empirical grounds. From

a theoretical perspective, it is more appealing to model the decision of exiting con-

ditional on survival than the duration itself (e.g. modeling job offer arrival rate).

Empirically, it is also convenient to model the hazard function because it implies

a binary discrete decision and delivers a very clean likelihood, and because it

allows us to take censoring into account without the need for further strong assumptions.

In this section, we characterize the unconditional hazard function. We build

bridges between hazard functions and pdfs/cdfs/probability masses, which then

are used to write the likelihood function. Regressors are introduced later on in

the chapter.


A. Hazard function for a discrete variable

Let t be a random variable with discrete support {1, 2, 3, ...} with probability

mass function p(τ) = Pr(t = τ) and cdf F (t) = p(1) + p(2) + ... + p(t) for

t = 1, 2, 3, .... The hazard function is defined as:

$$
h(\tau) \equiv \Pr(t = \tau \mid t \geq \tau) = \frac{\Pr(t = \tau)}{\Pr(t \geq \tau)} = \frac{p(\tau)}{1 - F(\tau - 1)} = \frac{F(\tau) - F(\tau - 1)}{1 - F(\tau - 1)}. \tag{1}
$$

The functional form of this hazard is a modeling decision (e.g. a logistic or normal cdf).

We need to recover p(t) and F (t) in order to write the likelihood. To recover

them, we proceed recursively. In the first period, we know that:

h(1) = Pr(t = 1|t ≥ 1) = Pr(t = 1) = p(1) and F (1) = p(1). (2)

In the second period, we can use equation (1):

$$
h(2) = \frac{p(2)}{1 - F(1)} = \frac{p(2)}{1 - h(1)} \;\Rightarrow\; p(2) = h(2)(1 - h(1)). \tag{3}
$$

Hence:

$$
F(2) = p(1) + p(2) = h(1) + h(2)(1 - h(1)) \;\Leftrightarrow\; 1 - F(2) = (1 - h(2))(1 - h(1)). \tag{4}
$$

In the third period:

$$
h(3) = \frac{p(3)}{1 - F(2)} = \frac{p(3)}{(1 - h(2))(1 - h(1))} \;\Rightarrow\; p(3) = h(3) \prod_{s=1}^{2} (1 - h(s)). \tag{5}
$$

Hence:

$$
F(3) = p(1) + p(2) + p(3) = h(1) + h(2)(1 - h(1)) + h(3) \prod_{s=1}^{2} (1 - h(s)), \tag{6}
$$

which implies:

$$
1 - F(3) = \prod_{s=1}^{3} (1 - h(s)). \tag{7}
$$

In general, the recursion is such that:

$$
p(t) = F(t) - F(t - 1) = h(t)(1 - F(t - 1)) = h(t) \prod_{s=1}^{t-1} (1 - h(s)), \tag{8}
$$

and:

$$
F(t) = 1 - \prod_{s=1}^{t} (1 - h(s)). \tag{9}
$$
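The recursion in equations (8) and (9) can be sketched in a few lines of Python. This is only an illustration: the function name `pmf_cdf_from_hazard` and the hazard values are hypothetical and do not come from the notes.

```python
def pmf_cdf_from_hazard(hazards):
    """Return (pmf, cdf) for durations 1..len(hazards) from discrete hazards.

    hazards[t-1] is h(t). Implements equations (8) and (9).
    """
    pmf, cdf = [], []
    survival = 1.0  # product of (1 - h(s)) for s < t, i.e. 1 - F(t-1)
    for h in hazards:
        pmf.append(h * survival)     # p(t) = h(t) * prod_{s<t} (1 - h(s))
        survival *= 1.0 - h          # now equals 1 - F(t)
        cdf.append(1.0 - survival)   # F(t) = 1 - prod_{s<=t} (1 - h(s))
    return pmf, cdf


# Hypothetical hazards h(1), h(2), h(3), chosen only for illustration.
hazards = [0.5, 0.2, 0.1]
pmf, cdf = pmf_cdf_from_hazard(hazards)
print(pmf)
print(cdf)
```

With these illustrative hazards, p(2) = 0.2 × 0.5 = 0.1 and 1 − F(3) = 0.5 × 0.8 × 0.9 = 0.36, in line with equations (3) and (7).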


These expressions are interpretable. In particular:

$$
p(\tau) = \Pr(t = \tau) = h(\tau) \prod_{s=1}^{\tau-1} (1 - h(s)) = \Pr(t = \tau \mid t \geq \tau) \Pr(t > \tau - 1 \mid t \geq \tau - 1) \cdots, \tag{10}
$$

or, in words, the probability of exiting at time t is equal to the probability of

exiting at time t conditional on survival times the probability of survival until t.

Additionally:

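$$
\begin{aligned}
F(\tau) = \Pr(t \leq \tau) &= 1 - \prod_{s=1}^{\tau} (1 - h(s)) \\
&= 1 - \Pr(t > \tau \mid t \geq \tau) \Pr(t > \tau - 1 \mid t \geq \tau - 1) \cdots \\
&= 1 - \Pr(t > \tau).
\end{aligned} \tag{11}
$$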

B. Hazard function for a continuous variable

Consider now the case of a continuous duration t. This random variable is

characterized by its pdf f(t) instead of a probability mass. Therefore, its hazard

function is also a density:

$$
h(\tau) = \lim_{dt \to 0} \frac{\Pr(\tau \leq t < \tau + dt \mid t \geq \tau)}{dt} = \lim_{dt \to 0} \frac{\Pr(\tau \leq t < \tau + dt) / dt}{\Pr(t \geq \tau)} = \frac{f(\tau)}{1 - F(\tau)}. \tag{12}
$$

Note that this expression is analogous to equation (1).

In order to derive f(t) and F (t) from the hazard function (so that we can write

the likelihood of our sample) we make use of the integrated or cumulative hazard :

$$
H(t) = \int_0^t h(s)\,ds. \tag{13}
$$

The integrated hazard can be written as a function of f(t) and F (t):

$$
H(t) = \int_0^t \frac{f(s)}{1 - F(s)}\,ds = \left[-\ln(1 - F(s))\right]_0^t = -\ln[1 - F(t)]. \tag{14}
$$

Therefore, we can trivially see that:

F (t) = 1− exp(−H(t)), (15)

and, similarly:

$$
f(t) = \frac{\partial F(t)}{\partial t} = h(t) \exp(-H(t)). \tag{16}
$$

The interpretation of these expressions is not as straightforward as before, but


we can make a connection with the discrete case. In the discrete case:

$$
\ln[1 - F(t)] = \sum_{s=1}^{t} \ln(1 - h(s)), \tag{17}
$$

and ln(1− h(s)) ≈ −h(s), which compares to the continuous case:

$$
\ln[1 - F(t)] = -H(t) = \int_0^t (-h(s))\,ds. \tag{18}
$$
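As a rough numerical illustration of equations (13), (15) and (16), the sketch below integrates a hazard function with the midpoint rule and recovers the cdf and pdf from it. The function names and the linearly increasing hazard are assumptions made for the example, not part of the notes.

```python
import math


def integrated_hazard(h, t, steps=10_000):
    """Approximate H(t) = integral_0^t h(s) ds with the midpoint rule (eq. 13)."""
    ds = t / steps
    return sum(h((k + 0.5) * ds) for k in range(steps)) * ds


def cdf_from_hazard(h, t):
    """F(t) = 1 - exp(-H(t)), equation (15)."""
    return 1.0 - math.exp(-integrated_hazard(h, t))


def pdf_from_hazard(h, t):
    """f(t) = h(t) exp(-H(t)), equation (16)."""
    return h(t) * math.exp(-integrated_hazard(h, t))


def hazard(s):
    """Hypothetical hazard, increasing linearly in duration."""
    return 0.1 + 0.02 * s


# For this hazard, H(t) = 0.1 t + 0.01 t^2 analytically, so H(5) = 0.75.
print(integrated_hazard(hazard, 5.0))
print(cdf_from_hazard(hazard, 5.0), pdf_from_hazard(hazard, 5.0))
```

Any non-negative hazard can be plugged in; the midpoint rule is just one simple choice of quadrature.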

C. Some frequently used hazard functions

In the discrete case, we often avoid parametric assumptions on the hazard func-

tion, and we estimate it semi-parametrically. When the duration is continuous,

however, (and in some cases when it is discrete), we need to make functional form

assumptions on the hazard function. These are two widely used cases.

Constant hazard This is the simplest possible hazard function. In our example

of unemployment duration, this assumption is consistent with a constant job ar-

rival rate, i.e. in every period we have the same probability of receiving a job offer,

no matter how long we have been unemployed. Therefore, we assume h(t) = λ,

with λ > 0.

Given this function, the integrated hazard is very easy to compute:

$$
H(t) = \int_0^t \lambda\,du = \lambda t. \tag{19}
$$

Therefore, cdf and pdf are trivially derived:

$$
F(t) = 1 - e^{-\lambda t}, \tag{20}
$$

and:

$$
f(t) = \lambda e^{-\lambda t}, \tag{21}
$$

which is the exponential distribution.

Therefore, the exponential distribution has a constant hazard (this is called the

memoryless property of the exponential distribution). Additionally, the expected

duration with this function is the inverse of the hazard function: E[T ] = 1/λ.

The discrete counterpart of this parametric family of functions is:

$$
F(t) = 1 - (1 - \lambda)^{t}, \tag{22}
$$

and:

$$
f(t) = \lambda (1 - \lambda)^{t-1}. \tag{23}
$$
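A quick way to check equations (20) to (23) is to recover the hazard from the pdf and cdf, using equation (12) in the continuous case and equation (1) in the discrete case. The sketch below does this for λ = 0.12 (one of the values used in Figure 3); the function names are illustrative only.

```python
import math


def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)           # equation (20)


def exp_pdf(t, lam):
    return lam * math.exp(-lam * t)           # equation (21)


def geom_cdf(t, lam):
    return 1.0 - (1.0 - lam) ** t             # equation (22)


def geom_pmf(t, lam):
    return lam * (1.0 - lam) ** (t - 1)       # equation (23)


lam = 0.12
for t in (1, 10, 40):
    h_continuous = exp_pdf(t, lam) / (1.0 - exp_cdf(t, lam))       # eq. (12)
    h_discrete = geom_pmf(t, lam) / (1.0 - geom_cdf(t - 1, lam))   # eq. (1)
    print(t, h_continuous, h_discrete)
```

Up to floating-point error, both recovered hazards equal λ at every duration, which is the constant-hazard property in the two settings.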

Figure 3. Examples of Hazard Functions: Constant and Weibull

[Figure omitted. Panel A. Constant Hazard: A. Hazard function h(t); B. Integrated hazard H(t); C. Pdf f(t); D. Cdf F(t). Panel B. Weibull Distribution: E. Hazard function h(t); F. Integrated hazard H(t); G. Pdf f(t); H. Cdf F(t). Horizontal axis in all plots: Duration t (0 to 40).]

Note: Black: λ = 0.12; gray: λ = 0.05. Solid: α = 1; dotted: α = 0.5; dashed: α = 1.5; dot-dashed: α = 3. The examples in the top panel depict the hazard function (left), the integrated hazard (center-left), the pdf or probability mass function (center-right) and the cdf (right) of a constant hazard model with hazard equal to λ. The bottom panel depicts the corresponding functions for a Weibull hazard model with parameters λ and α.

The top panel of Figure 3 depicts the hazard rate, the integrated hazard, the pdf and the cdf of a constant hazard model. We can see that a constant hazard rate implies a decreasing unconditional probability of exiting and a cdf that increases at a decreasing rate. The figure also illustrates the fact that the integrated hazard cannot be interpreted as a conditional cdf, as it can take values above 1, as in this case.

The Weibull distribution Another hazard function that is commonly used

for continuous durations is derived from a two-parameter generalization of the

exponential distribution known as the Weibull distribution. This function allows

for a hazard that increases or decreases monotonically.

The Weibull distribution is given by:

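$$
F(t) = 1 - e^{-(\lambda t)^{\alpha}}, \qquad \lambda > 0,\ \alpha > 0. \tag{24}
$$

Hence, its pdf is:

$$
f(t) = \frac{\partial F(t)}{\partial t} = \alpha \lambda^{\alpha} t^{\alpha - 1} e^{-(\lambda t)^{\alpha}}, \tag{25}
$$

the hazard function is:

$$
h(t) = \frac{f(t)}{1 - F(t)} = \frac{\alpha \lambda^{\alpha} t^{\alpha - 1} e^{-(\lambda t)^{\alpha}}}{e^{-(\lambda t)^{\alpha}}} = \alpha \lambda^{\alpha} t^{\alpha - 1}, \tag{26}
$$

and the integrated hazard is:

$$
H(t) = \int_0^t h(s)\,ds = (\lambda t)^{\alpha}. \tag{27}
$$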

The bottom panel of Figure 3 plots this function for different combinations of

the two parameters. It can be seen that the function is very flexible in mimicking

different shapes for the hazard function. Its main limitation, however, is that

hazard rates are either monotonically increasing or monotonically decreasing (for

instance, it cannot approximate the hazard function of mortality from Figure 2).
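The Weibull expressions (24) to (27) are simple enough to code directly. The sketch below, with illustrative function names, evaluates the hazard on a small grid for the values of α listed in the note to Figure 3, to show how α governs the direction of duration dependence.

```python
import math


def weibull_hazard(t, lam, alpha):
    return alpha * lam ** alpha * t ** (alpha - 1)          # equation (26)


def weibull_integrated_hazard(t, lam, alpha):
    return (lam * t) ** alpha                               # equation (27)


def weibull_cdf(t, lam, alpha):
    return 1.0 - math.exp(-weibull_integrated_hazard(t, lam, alpha))  # eq. (24)


def weibull_pdf(t, lam, alpha):
    # f(t) = h(t) exp(-H(t)), which coincides with equation (25)
    return weibull_hazard(t, lam, alpha) * math.exp(
        -weibull_integrated_hazard(t, lam, alpha))


lam = 0.12
grid = [1.0, 5.0, 10.0, 20.0, 40.0]
for alpha in (0.5, 1.0, 1.5, 3.0):
    print(alpha, [round(weibull_hazard(t, lam, alpha), 4) for t in grid])
```

The hazard decreases in t when α < 1, is constant at λ when α = 1 (the exponential case), and increases when α > 1; in no case does it change direction.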

III. Conditional Hazard Functions

This section discusses the introduction of covariates into the model. We need

to write the hazard function conditional on covariates:

$$
h(t, x) = \frac{f(t \mid x)}{1 - F(t \mid x)}. \tag{28}
$$

Below we review some of the most popular approaches for doing this.

A. The proportional hazard model

The Proportional Hazard (PH) model (or Cox model, after Cox (1972)) is prob-

ably one of the most widely used duration models because of its simplicity. In

this model, the conditional hazard function is given by:

h(t,x) = λ(t) exp(x′β). (29)

That is, we factorize h(t,x) into a function of t and a function of x, so that two

different individuals have exit probabilities that are proportional for all t (hence

the name). For instance, if an individual has twice the probability of another

individual of exiting from unemployment in period t1 conditional on survival, she

also has twice the conditional probability of the other individual of exiting at t2.

The function λ(t) is called the baseline hazard function (as it provides the

shape of the conditional hazard function, that then is scaled differently for every

individual). The baseline hazard is often assumed to be given by one of the two

possibilities described above. Note that if x′β includes a constant term, then the

scale of the corresponding hazard function needs to be normalized (e.g. λ(t) = 1

in the constant case, and $\lambda(t) = \alpha t^{\alpha - 1}$ in the Weibull case).
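The proportionality property can be illustrated with a small sketch of equation (29) under the normalized Weibull baseline. The function name `ph_hazard`, the coefficient values and the two covariate vectors are hypothetical; the first element of each covariate vector plays the role of the constant term.

```python
import math


def ph_hazard(t, x, beta, alpha):
    """h(t, x) = lambda(t) exp(x'beta), with baseline lambda(t) = alpha t^(alpha-1)."""
    baseline = alpha * t ** (alpha - 1)
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return baseline * math.exp(index)


beta = [-2.0, 0.5]      # hypothetical coefficients: constant and one regressor
x_a = [1.0, 1.0]        # individual a
x_b = [1.0, 0.0]        # individual b
alpha = 1.5

for t in (2.0, 30.0):
    ratio = ph_hazard(t, x_a, beta, alpha) / ph_hazard(t, x_b, beta, alpha)
    print(t, ratio)
```

The ratio of the two hazards equals exp((x_a − x_b)′β) at every duration, because the baseline hazard cancels out.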


B. Discrete durations

In discrete duration models, even though sometimes we might proceed with the

standard PH model, it is common to select richer models. The reason is that the

problem then reduces to a sequence of Probit or Logit estimations. In general, we

can specify conditional hazard rates as follows:

$$
h(\tau, x) = \Pr(t = \tau \mid t \geq \tau, x) = G(\gamma_{\tau} + x'\beta_{\tau}), \tag{30}
$$

where G(.) is a cdf (e.g. normal or logistic for Probit and Logit respectively).

This model is richer than the previous one because it allows for different βs at different durations. The equivalent to the baseline hazard here, $\{\gamma_t\}_{t=0}^{T}$, where T is the maximum duration observed in the data, can be specified in many different ways. One possibility is to specify a polynomial like:

$$
\gamma_t = \gamma_0 + \gamma_1 \ln t + \gamma_2 (\ln t)^2. \tag{31}
$$

Another possibility is to leave these parameters free:

$$
\gamma_t = \sum_{j=1}^{T^*} \gamma_j \mathbf{1}\{t = j\}. \tag{32}
$$

Note that we only specify T ∗ < T parameters instead of T . The reason for this is

identification: on the one hand, we want T ∗ to be close enough to T , but, on the

other hand, we need a critical mass of individuals being alive at T ∗ to be able to

identify exit rates (e.g. if T is the period in which the last individual exits, then

γT would be such that the probability of exiting at period T is equal to 1). This

flexible option is very interesting, as it provides a semi-parametric estimation of

λ(t) and it allows us to test other parametric assumptions.

IV. Likelihood Functions

A. Complete continuous durations

Assume that we observe {t1, t2, ..., tN}. Then, the log-likelihood of this sample is:

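$$
L_N = \sum_{i=1}^{N} \ln f(t_i \mid x_i), \tag{33}
$$

where, under the proportional hazard specification (29):

$$
f(t \mid x) = h(t, x) \exp(-H(t, x)) = \lambda(t) \exp(x'\beta) \exp\{-\Lambda(t) \exp(x'\beta)\}, \tag{34}
$$

and $\Lambda(t) = \int_0^t \lambda(s)\,ds$ is the integrated baseline hazard.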


B. Censored continuous durations

Duration data can be censored, i.e. we know that $t > \bar{t}$ or that $\underline{t} < t < \bar{t}$, but we do not observe the exact value of t. We allow the level of censoring to be observation specific (with the individual subindexes in the upper and lower limits). The contribution to the likelihood of an observation that is censored because we only observe that $t > \bar{t}$ is $\Pr(t \geq \bar{t} \mid x) = 1 - F(\bar{t} \mid x)$. Hence, in this case, the log-likelihood boils down to:

$$
L_N = \sum_{i=1}^{N} \left\{ w_i \ln f(t_i \mid x_i) + (1 - w_i) \ln(1 - F(\bar{t}_i \mid x_i)) \right\}, \tag{35}
$$

where $w_i = \mathbf{1}\{t_i < \bar{t}_i\}$, i.e. equals 1 if the observation is not censored. This is the log-likelihood of a sample like the one generated in Example 1 of Figure 1.
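To make equation (35) concrete, the sketch below evaluates it for a constant hazard without covariates, so that f(t) = λe^{−λt} and 1 − F(t) = e^{−λt}. The data are made up for illustration, and the closed-form maximizer (number of completed spells divided by total observed time) is a standard result for this special case rather than something derived in the notes.

```python
import math


def censored_loglik(lam, observed, uncensored):
    """Equation (35) for a constant hazard lam and no covariates.

    observed[i] is the exit time if uncensored[i] is True,
    and the censoring time otherwise.
    """
    total = 0.0
    for t, w in zip(observed, uncensored):
        if w:
            total += math.log(lam) - lam * t   # ln f(t_i)
        else:
            total += -lam * t                  # ln(1 - F(tbar_i))
    return total


# Hypothetical spells in months; three of them are right-censored at 12 months.
observed = [3.0, 7.0, 12.0, 12.0, 5.0, 12.0]
uncensored = [True, True, False, False, True, False]

# Closed-form maximizer in this special case: completed spells / total exposure.
lam_hat = sum(uncensored) / sum(observed)
naive_mean = sum(observed) / len(observed)   # treats censored spells as complete

print(lam_hat, 1.0 / lam_hat, naive_mean)
```

In this made-up sample the implied expected duration 1/λ̂ is 17 months, twice the naive sample average of 8.5 months, which illustrates why sample average durations are biased under censoring.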

Now consider Example 2 of Figure 1. There we have observations with durations below one year ($t < \bar{t}^1$), others with durations between one and two years ($\bar{t}^1 < t < \bar{t}^2$), and others with durations above two years ($t > \bar{t}^2$). In this case, the log-likelihood is:

$$
L_N = \sum_{i=1}^{N} \left\{ w_i^1 \ln F(\bar{t}_i^1 \mid x_i) + w_i^2 \ln\left(F(\bar{t}_i^2 \mid x_i) - F(\bar{t}_i^1 \mid x_i)\right) + (1 - w_i^1 - w_i^2) \ln\left(1 - F(\bar{t}_i^2 \mid x_i)\right) \right\}, \tag{36}
$$

where $w_i^1 = \mathbf{1}\{t_i < \bar{t}_i^1\}$, and $w_i^2 = \mathbf{1}\{\bar{t}_i^1 < t_i < \bar{t}_i^2\}$.
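A minimal sketch of equation (36), again assuming a constant hazard and no covariates, with censoring bounds at one and two years for everybody. The counts are hypothetical. In this special case the likelihood depends only on the number of spells in each interval, and the maximizer has a closed form in terms of the one-year survival probability q = e^{−λ}; that formula follows from the first-order condition and is offered as an illustration, not as part of the notes.

```python
import math


def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)


def interval_loglik(lam, counts, bounds=(1.0, 2.0)):
    """Equation (36) with a constant hazard; counts = (n1, n2, n3) spells
    ending before bounds[0], between the bounds, and after bounds[1]."""
    n1, n2, n3 = counts
    f1, f2 = exp_cdf(bounds[0], lam), exp_cdf(bounds[1], lam)
    return n1 * math.log(f1) + n2 * math.log(f2 - f1) + n3 * math.log(1.0 - f2)


counts = (50, 30, 20)   # hypothetical number of spells in each interval
n1, n2, n3 = counts

# With bounds at 1 and 2, the log-likelihood is
# (n1 + n2) ln(1 - q) + (n2 + 2 n3) ln(q), where q = exp(-lam).
q_hat = (n2 + 2 * n3) / (n1 + 2 * n2 + 2 * n3)
lam_hat = -math.log(q_hat)

print(lam_hat, interval_loglik(lam_hat, counts))
```

For richer hazards or individual-specific bounds there is no such shortcut, and the log-likelihood would be maximized numerically.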

As a final example, consider a case like Example 2 of Figure 1 but in which at

the starting point, individuals have not had 0 periods unemployed, but, instead,

d1, d2, ..., dN , with di known by the researcher. In this case, the log-likelihood

looks similar to (36) with the exception of conditioning on initial duration:

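$$
L_N = \sum_{i=1}^{N} \left\{ w_i^1 \ln \frac{F(d_i + \bar{t}_i^1 \mid x_i) - F(d_i \mid x_i)}{1 - F(d_i \mid x_i)} + w_i^2 \ln \frac{F(d_i + \bar{t}_i^2 \mid x_i) - F(d_i + \bar{t}_i^1 \mid x_i)}{1 - F(d_i \mid x_i)} + (1 - w_i^1 - w_i^2) \ln \frac{1 - F(d_i + \bar{t}_i^2 \mid x_i)}{1 - F(d_i \mid x_i)} \right\}, \tag{37}
$$

where $w_i^1 = \mathbf{1}\{d_i < t_i < d_i + \bar{t}_i^1\}$, and $w_i^2 = \mathbf{1}\{d_i + \bar{t}_i^1 < t_i < d_i + \bar{t}_i^2\}$.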

C. Discrete durations

In the discrete duration model, we use a logistic or normal cdf (Logit or Probit) to estimate the hazard function. Also, we use the link between the probability of observing a given duration and the hazard rate seen in Section II. The log-likelihood is given by:

$$
L_N = \sum_{i=1}^{N} \sum_{\tau=1}^{T^*} w_{i\tau} \left\{ y_{i\tau} \ln G(\gamma_{\tau} + x_i'\beta_{\tau}) + (1 - y_{i\tau}) \ln(1 - G(\gamma_{\tau} + x_i'\beta_{\tau})) \right\}, \tag{38}
$$

where $y_{i\tau} = \mathbf{1}\{t_i = \tau\}$, $w_{i\tau} = \mathbf{1}\{t_i \geq \tau\}$, and G(.) is a cdf (e.g. logistic or normal as before). This expression includes two types of contributions:

• Spells that end at time τ:

$$
\ln \Pr(t = \tau \mid x) = \ln h(\tau, x) + \sum_{s=1}^{\tau-1} \ln(1 - h(s, x)). \tag{39}
$$

• Spells that are incomplete at time T∗:

$$
\ln \Pr(t > T^* \mid x) = \sum_{s=1}^{T^*} \ln(1 - h(s, x)). \tag{40}
$$

For every period τ = 1, ..., T∗ we estimate a Probit or Logit of exiting vs not exiting conditional on survival (i.e. estimated on the sample of individuals still alive, which are those with $w_{i\tau} = 1$). Given this, the importance of setting T∗ small enough becomes clear: we need enough observations alive to estimate the Probit of exiting at period T∗ with precision. We can set $T_i^* = \min\{\bar{t}_i, T^*\}$ different for every individual, if there are observations censored below the maximum T∗ that we are considering.
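The equivalence between the person-period form (38) and the spell-level contributions (39) and (40) can be checked numerically. The sketch below assumes a logistic G and a single scalar regressor; the function names, the data and the parameter values are all hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def hazard(tau, x, gamma, beta):
    """h(tau, x) = G(gamma_tau + x * beta_tau), equation (30) with scalar x."""
    return logistic(gamma[tau - 1] + x * beta[tau - 1])


def loglik_person_period(durations, xs, gamma, beta, t_star):
    """Equation (38): one Logit-type term per individual and period at risk."""
    total = 0.0
    for t_i, x_i in zip(durations, xs):
        for tau in range(1, t_star + 1):
            if t_i < tau:          # w = 0: the individual is no longer at risk
                break
            h = hazard(tau, x_i, gamma, beta)
            y = 1 if t_i == tau else 0
            total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return total


def loglik_by_spell(durations, xs, gamma, beta, t_star):
    """Same likelihood built from contributions (39) and (40)."""
    total = 0.0
    for t_i, x_i in zip(durations, xs):
        last_survived = min(t_i - 1, t_star)
        total += sum(math.log(1.0 - hazard(s, x_i, gamma, beta))
                     for s in range(1, last_survived + 1))
        if t_i <= t_star:          # completed spell: add ln h(t_i, x)
            total += math.log(hazard(t_i, x_i, gamma, beta))
    return total


durations = [1, 3, 2, 5, 4]        # hypothetical; spells longer than t_star are censored
xs = [0.0, 1.0, 1.0, 0.0, 1.0]
gamma = [-1.0, -0.8, -0.5]         # one free baseline parameter per period
beta = [0.3, 0.3, 0.3]
print(loglik_person_period(durations, xs, gamma, beta, 3),
      loglik_by_spell(durations, xs, gamma, beta, 3))
```

The person-period form is the one that makes estimation look like a sequence of binary choice models on the sample of survivors.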

V. Unobserved Heterogeneity

A. Unobserved heterogeneity vs spurious duration dependence

Often, we cannot observe all important determinants of durations. The omission

of important regressors can generate spurious duration dependence. To illustrate

this idea, consider a regressor x ∈ {0, 1} (e.g. low vs high ability), and assume that the

conditional hazard is well represented by a constant proportional hazard model

in which:

$$
h(t, x = 0) = h_0, \qquad h(t, x = 1) = h_1, \qquad h_1 > h_0. \tag{41}
$$

Now assume that we do not observe x. The (unconditional) hazard that we

identify is the following:

h(τ) = h1 Pr(x = 1|t ≥ τ) + h0 Pr(x = 0|t ≥ τ). (42)

Figure 4. An Example with Unobserved Heterogeneity

[Figure omitted. Vertical axis: Hazard function (0.00 to 0.15); horizontal axis: Duration t (0 to 40).]

Note: Black dashed: h1 (hazard rate when x = 1); gray dashed: h0 (hazard rate when x = 0); gray dashed: observed (unconditional) hazard.

The shape of this hazard is not constant anymore. Given that individuals with

x = 1 have a higher hazard of exiting, the proportion of individuals with x = 1 is

decreasing in the population, and the unconditional hazard converges to h0.

Figure 4 is an example of this. In the figure, the conditional hazards are h1 =

0.14 and h0 = 0.02, and the initial fraction of individuals with x = 1 is 80%,

with 20% having x = 0. As the figure shows, after 40 periods the

unconditional hazard rate has almost converged to h0, as nearly all individuals with

x = 1 have exited, while there are still individuals with x = 0 remaining unemployed.

Hence, not being able to control for a covariate x (e.g. ability) that correlates

with the hazard of exiting (e.g. high ability unemployed workers have a larger

hazard of exiting from unemployment) can create a spurious duration dependence.

In our unemployment example, not controlling for ability leads to the conclusion

that the hazard of finding a job decreases with the duration of the current unemployment

spell when this hazard is in fact constant, because an individual who

has been unemployed for long is more likely to be of low ability.
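Equation (42) can be evaluated directly with the parameter values quoted for Figure 4 (h1 = 0.14, h0 = 0.02, 80% of individuals with x = 1 at the start). The sketch below treats duration as discrete, which is an assumption made here for simplicity; the function name is illustrative.

```python
def unconditional_hazard(tau, h1=0.14, h0=0.02, share1=0.8):
    """Equation (42) in discrete time: mixture of two constant hazards.

    Pr(t >= tau | x) = (1 - h_x)^(tau - 1), so the composition of the
    pool of survivors shifts towards x = 0 as tau grows.
    """
    alive1 = share1 * (1.0 - h1) ** (tau - 1)          # Pr(x = 1, t >= tau)
    alive0 = (1.0 - share1) * (1.0 - h0) ** (tau - 1)  # Pr(x = 0, t >= tau)
    p1 = alive1 / (alive1 + alive0)                    # Pr(x = 1 | t >= tau)
    return h1 * p1 + h0 * (1.0 - p1)


for tau in (1, 5, 10, 20, 40):
    print(tau, round(unconditional_hazard(tau), 4))
```

At τ = 1 the unconditional hazard is 0.8 × 0.14 + 0.2 × 0.02 = 0.116; it then falls monotonically towards h0 even though each individual hazard is constant.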

B. Dealing with heterogeneity in continuous hazard models

Lancaster (1979) addressed this problem by introducing a multiplicative random

effect in the proportional hazard specification:

h(t,x, ν) = λ(t) exp(x′β)ν, (43)

where ν is assumed independent of x with positive support, E[ν] = 1 and pdf g(ν).

Hence, h(t,x, ν) is a hazard function conditional on x and ν. The cdf conditional on x and ν is:

$$
F(t \mid x, \nu) = 1 - \exp\left(-\int_0^t h(u, x, \nu)\,du\right), \tag{44}
$$

and the cdf of t given x only, based on which we write the integrated likelihood, is

$$
F(t \mid x) = \int_0^{\infty} F(t \mid x, \nu) g(\nu)\,d\nu. \tag{45}
$$

Lancaster assumed a Gamma distribution for g(ν). An important remark here is

that we need to include regressors to be able to identify this model.
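With a Gamma mixing distribution, the integral in equation (45) has a known closed form. The sketch below assumes a Gamma density with mean 1 and variance σ² (shape 1/σ², scale σ²) and a constant baseline hazard, and compares a midpoint-rule approximation of the integral with the closed-form survivor function (1 + σ²Λ(t)exp(x′β))^(−1/σ²). The function names, the parameter values and the choice of baseline are illustrative assumptions.

```python
import math


def gamma_pdf(nu, var):
    """Gamma density with mean 1 and variance var (shape 1/var, scale var)."""
    k = 1.0 / var
    return nu ** (k - 1.0) * math.exp(-nu / var) / (math.gamma(k) * var ** k)


def survivor_given_nu(t, xb, nu, lam=1.0):
    """1 - F(t | x, nu) from (43)-(44) with constant baseline, Lambda(t) = lam * t."""
    return math.exp(-lam * t * math.exp(xb) * nu)


def mixed_survivor_numeric(t, xb, var, upper=60.0, steps=60_000):
    """1 - F(t | x): integrate nu out with the midpoint rule, as in (45)."""
    d = upper / steps
    return sum(survivor_given_nu(t, xb, (j + 0.5) * d) * gamma_pdf((j + 0.5) * d, var)
               for j in range(steps)) * d


def mixed_survivor_closed_form(t, xb, var, lam=1.0):
    """Closed form of the same integral for Gamma heterogeneity."""
    return (1.0 + var * lam * t * math.exp(xb)) ** (-1.0 / var)


print(mixed_survivor_numeric(2.0, 0.3, 0.5), mixed_survivor_closed_form(2.0, 0.3, 0.5))
```

The two numbers agree up to the quadrature error, which is one reason the Gamma assumption is convenient in practice.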

C. Dealing with heterogeneity in discrete hazard models

Similarly, in the discrete case, we can also write hazards conditional on ν and

x and then integrate over ν:

$$
\Pr(t = \tau \mid x) = \int \Pr(t = \tau \mid x, \nu) g(\nu)\,d\nu = \int h(\tau, x, \nu) \prod_{s=1}^{\tau-1} [1 - h(s, x, \nu)] g(\nu)\,d\nu, \tag{46}
$$

where, for instance:

$$
h(t, x, \nu) = G(\gamma_t + x'\beta_t + \nu). \tag{47}
$$

A frequently used specification for g(ν) is a discrete-support mass point distribution, {ν1, ..., νm} with probabilities {p1, ..., pm}:

$$
\Pr(t = \tau \mid x) = \sum_{j=1}^{m} \left\{ h(\tau, x, \nu_j) \prod_{s=1}^{\tau-1} [1 - h(s, x, \nu_j)]\, p_j \right\}, \tag{48}
$$

where νj and pj are additional parameters to be estimated.

VI. Multiple-exit discrete duration models

A. Discrete competing risk models

In this last section we discuss the case in which there are multiple exits from

the current state. For instance, we consider a model in which individuals can exit

from unemployment into a temporary or a permanent job.

Let duration t be discrete, and consider the two indicator functions dj with

j = 1, 2 that equal one if the exit is to alternative j. We can define the following

intensities of transition to each state:

$$
\phi_j(\tau) = \Pr(t = \tau, d_j = 1 \mid t \geq \tau), \qquad j = 1, 2. \tag{49}
$$

This expression has a direct link with the unconditional hazard rates:

$$
h(\tau) = \Pr(t = \tau \mid t \geq \tau) = \phi_1(\tau) + \phi_2(\tau). \tag{50}
$$


Conditional hazard rates are:

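$$
h_1(\tau) = \Pr(y_{1\tau} = 1 \mid t \geq \tau, y_{2\tau} = 0) \tag{51}
$$

$$
h_2(\tau) = \Pr(y_{2\tau} = 1 \mid t \geq \tau, y_{1\tau} = 0), \tag{52}
$$

where $y_{j\tau} = \mathbf{1}\{t = \tau, d_j = 1\}$.

The mapping between intensities and conditional hazards is given by the definition of conditional probability:

$$
h_j(\tau) = \frac{\Pr(y_{j\tau} = 1 \mid t \geq \tau)}{\Pr(y_{k\tau} = 0 \mid t \geq \tau)} = \frac{\phi_j(\tau)}{1 - \phi_k(\tau)}, \tag{53}
$$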

where the numerator is the joint probability, as $y_{1\tau} = 1$ implies $y_{2\tau} = 0$ and vice versa. Therefore, we can write the model in terms of either of the two. For instance, a MNL for φ's is equivalent to a binary Logit for h's with the same parameters.

Models presented in terms of conditional hazards h1(t) and h2(t) are also known as competing risk models. This name comes from considering two latent random variables $t_1^*$ and $t_2^*$ such that the observed duration is $t = \min\{t_1^*, t_2^*\}$. If $t_1^*$ and $t_2^*$ are independent, h1(t) and h2(t) can be interpreted as hazard rates of latent durations:

$$
h_j(\tau) = \Pr(t_j^* = \tau \mid t_j^* \geq \tau). \tag{54}
$$

This implies that the analysis of exits to 1 takes exits to 2 as censored observations.
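The claim that a multinomial Logit for the intensities corresponds to a binary Logit for the conditional hazards with the same parameters can be checked with the mapping in equation (53). In the sketch below, `a1` and `a2` stand for the linear indices of the two exits at a given duration; their values and the function names are hypothetical.

```python
import math


def mnl_intensities(a1, a2):
    """Multinomial Logit intensities phi_1, phi_2 (staying is the base outcome)."""
    denom = 1.0 + math.exp(a1) + math.exp(a2)
    return math.exp(a1) / denom, math.exp(a2) / denom


def conditional_hazards(phi1, phi2):
    """Equation (53): h_j = phi_j / (1 - phi_k)."""
    return phi1 / (1.0 - phi2), phi2 / (1.0 - phi1)


def binary_logit(a):
    return math.exp(a) / (1.0 + math.exp(a))


a1, a2 = -1.2, -0.4                      # hypothetical indices for exits 1 and 2
phi1, phi2 = mnl_intensities(a1, a2)
h1, h2 = conditional_hazards(phi1, phi2)

print(h1, binary_logit(a1))
print(h2, binary_logit(a2))
print(phi1 + phi2)                       # overall hazard, equation (50)
```

Each conditional hazard coincides with a binary Logit evaluated at the same index, since the term for the competing exit cancels in the ratio.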

B. Full information ML

The log-likelihood function is analogous to the discrete case:

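$$
L_N = \sum_{i=1}^{N} \sum_{\tau=1}^{T^*} w_{i\tau} \left\{ y_{i\tau} \left(d_{1i} \ln \phi_1(\tau, x_i) + d_{2i} \ln \phi_2(\tau, x_i)\right) + (1 - y_{i\tau}) \ln(1 - \phi_1(\tau, x_i) - \phi_2(\tau, x_i)) \right\}, \tag{55}
$$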

where yiτ = 1{ti = τ}, and wiτ = 1{ti ≥ τ}, as defined above. This expression

includes three types of contributions:

• Spells that end at time τ exiting to option 1:

$$
\ln \Pr(t = \tau, d_1 = 1 \mid x) = \ln \phi_1(\tau, x) + \sum_{s=1}^{\tau-1} \ln(1 - \phi_1(s, x) - \phi_2(s, x)). \tag{56}
$$

• Spells that end at time τ exiting to option 2:

$$
\ln \Pr(t = \tau, d_2 = 1 \mid x) = \ln \phi_2(\tau, x) + \sum_{s=1}^{\tau-1} \ln(1 - \phi_1(s, x) - \phi_2(s, x)). \tag{57}
$$

• Spells that are incomplete at time T∗:

$$
\ln \Pr(t > T^* \mid x) = \sum_{s=1}^{T^*} \ln(1 - \phi_1(s, x) - \phi_2(s, x)). \tag{58}
$$

C. Limited information ML based on competing risk models

We can also estimate separately the model for each option, considering exits

to the other option as censored, in a competing risk fashion. This is a LIML

estimation. The likelihood for option j would be:

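$$
L_{Nj} = \sum_{i=1}^{N} \sum_{\tau=1}^{T^*} w_{i\tau} \left\{ y_{ij\tau} \ln h_j(\tau, x_i) + (1 - y_{ij\tau}) \ln(1 - h_j(\tau, x_i)) \right\}, \tag{59}
$$

or, alternatively:

$$
L_{Nj} = \sum_{i=1}^{N} \sum_{\tau=1}^{T^*} w_{i\tau} \left\{ y_{i\tau} d_{ij} \ln h_j(\tau, x_i) + (1 - y_{i\tau} d_{ij}) \ln(1 - h_j(\tau, x_i)) \right\}, \tag{60}
$$

where $y_{ij\tau} = \mathbf{1}\{t_i = \tau, d_{ij} = 1\}$, $y_{i\tau} = \mathbf{1}\{t_i = \tau\}$, and $w_{i\tau} = \mathbf{1}\{t_i \geq \tau\}$, with $t_i = \min\{t_{1i}^*, t_{2i}^*\}$, which delivers two types of contributions as in the standard discrete case (with the “censored” contributions being for both censored observations and exits to alternative k ≠ j).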

References

Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics:

Methods and Applications, Cambridge University Press.

Cox, David R. (1972), “Regression Models and Life Tables (with Discussion)”,

Journal of the Royal Statistical Society, B, 34, 187-220.

Lancaster, Tony (1979), “Econometric Models for the Duration of Unemploy-

ment”, Econometrica, 47, 939-956.

Lancaster, Tony (1990), Econometric Analysis of Transition Data, Cambridge.

Van den Berg, Gerard (2001), “Duration Models: Specification, Identification

and Multiple Durations”, in J.J. Heckman and E. Leamer (eds.), Handbook of

Econometrics, Vol. 5, Ch. 55.
