On the Dynamics of Interstate Migration:

Migration Costs and Self-Selection

Christian Bayer*

Falko Juessen†‡

IGIER and Universität Bonn

Technische Universität Dortmund and IZA

First version: February 15, 2006. This version: July 30, 2009

Abstract

This paper develops a dynamic structural model of migration decisions that is aggregated to describe the behavior of interregional migration. Our structural approach allows us to deal with dynamic self-selection problems that arise from the endogeneity of location choice and the persistency of migration incentives. The self-selection problem is solved by keeping track of the distribution of migration incentives over time. This econometric treatment has important consequences for the estimation of structural parameters such as migration costs. For US interstate migration, we obtain a cost estimate of roughly two-thirds of an average annual household income. We show that not treating the dynamic self-selection problem properly leads to substantially larger cost estimates.

KEYWORDS: Dynamic self-selection, aggregate migration, indirect inference

JEL-codes: C61, C20, J61, R23

* IGIER, Università Commerciale L. Bocconi, Via SRöntgen 1, 20139 Milano, Italy; phone: +39-02-5836 3386; email: [email protected]

† Technische Universität Dortmund, Department of Economics, 44221 Dortmund, Germany; phone: +49-231-755-3291; fax: +49-231-755-3069; email: [email protected]

‡ We would like to thank Francesc Ortega, Andreas Schabert, anonymous referees and conference participants at the NASM 2006, the SED Meeting 2006, the EEA Meeting 2006, the VfS Meeting 2006, the SMYE 2007, the SCE Meeting 2007, the ERSA Meeting 2007, LAMES 2008, and at seminars held at IZA, Universität Bonn, the EUI, and Università Bocconi for their helpful comments and suggestions. Part of this paper was written while C. Bayer was visiting fellow at Yale University and Jean Monnet fellow at the European University Institute. He is grateful for the support of these institutions. Financial support by the Rudolf Chaudoire Foundation is gratefully acknowledged. The research has been supported by DFG under Sonderforschungsbereich 475. We would like to thank Christian Wogatzke for excellent research assistance. A previous version of the paper has been circulated under the title "A generalized options approach to aggregate migration with an application to US federal states".

1 Introduction

Migration choices are important economic decisions. Migration allows individual agents to evade adverse shocks to their income, and it is an important channel of macroeconomic adjustment (Blanchard and Katz, 1992, and Decressin and Fatas, 1995). Many factors influence the decision to migrate, and a vast empirical literature has analyzed how migration decisions are driven by economic incentives, in particular by income differentials (see Greenwood, 1975, 1985, and 1997, and Cushing and Poot, 2004, for survey articles). Since migration is a dynamic discrete choice problem, advances in modelling these problems (see Keane and Wolpin, 2009, Norets, 2009, Aguirregabiria and Mira, 2008) have opened up new frontiers for empirical research on migration too. This has triggered recent interest in structural models of migration (see e.g. Davies, Greenwood, and Li, 2001, Hunt and Mueller, 2004, Kennan and Walker, 2003, 2009, or Coen-Pirani, 2008).

In such structural models of within-country migration decisions, an important state variable is the household's potential income in alternative regions. Yet, this variable is typically unobservable to the econometrician. Moreover, this unobservable state variable can be expected to be serially correlated, as found in the literature on income risk, see e.g. Storesletten, Telmer and Yaron (2004) or Low, Meghir and Pistaferri (2008). Thus the assumption of i.i.d. unobservables, which is common in dynamic discrete choice models, is violated. Norets (2008) shows that wrongly assuming i.i.d. unobservables can create significant estimation biases in dynamic discrete choice models and therefore (Norets, 2009) develops a Bayesian estimation technique for this class of models with serially correlated unobservables.

Relative to his setup, however, migration induces the additional problem of dynamic self-selection. Dynamic self-selection occurs because optimizing agents repeatedly select themselves into the region which is best for them, such that the observability of income depends on the agents' past and present choices.

In non-repeated discrete-choice modelling ("now-or-never" type of decisions), various solutions to self-selection problems have been discussed, see Heckman and Robb (1985) for an overview. In the context of migration, self-selection affects the regression used to proxy the agents' alternative income as well as the estimate of the decision rule itself, as has been highlighted by Nakosteen and Zimmer (1980). Their proposed solution builds on a selection model of the type popularized by Heckman (1974, 1976, 1978) and Lee (1978, 1979). However, it rests on the assumption of non-repeated discrete choice and on residual income heterogeneity being i.i.d.

In this paper, we first highlight that a deviation from the i.i.d. assumption has stark consequences for the estimation of structural parameters of migration models already in a simplified repeated choice setup, where we postulate a reduced form relation between the initial location of an agent in a given period and past unobservables.¹ Standard self-selection on residual potential incomes implies that we cannot assume the set of migrants to be a random sample. Dynamic self-selection, i.e. repeated decision making, implies additionally that such an assumption does not even hold for the initial set of agents situated in a given location at a given point of time if there is autocorrelation in potential incomes. The unobserved residual income of an agent is typically in favor of the place she currently lives in. Agents have selected themselves (repeatedly) into the region where they are better off.

Based on this insight, we develop a fully dynamic model of repeated migration choices that allows us to take a classical, simulation-based estimation approach to the deep structural parameters while taking serial correlation in potential incomes and self-selection into account. Our approach relies on explicitly modelling the dynamics of the distribution of realized incomes. Our modelling strategy follows Caballero and Engel's (1999) paper on investment, which highlights the interaction of lumpy investment and the evolution of investment incentives. In the spirit of their model, we develop a microeconomic structural model of migration which can be aggregated and used to describe the simultaneous evolution of unobservable migration incentives and migration rates at an aggregate level.

This means we make explicit the movements in the distribution of unobserved potential income and then use these distributions to describe average migration behavior. In this setting, we can use aggregate migration data for the estimation of our model. We use annual US state level migration flows from 1989-2004 and find that the estimated migration cost is 33,230 US$ for a typical move between US states. This number is substantially smaller than the ones reported in previous contributions, such as Davies, Greenwood and Li (2001) or Kennan and Walker (2009). We then show that ignoring the endogeneity and the dynamics of the distribution of unobserved potential incomes can bias estimated migration costs by an order of magnitude.

While we can use our structural model to express and to track the dynamic evolution of migration incentives at the macroeconomic level, we are constrained to a bi-regional setup for numerical feasibility. This constraint arises because simulating the dynamic evolution of migration incentives is numerically intense even if solving the microeconomic decision problem itself is quick. Due to the numerical intensity of simulating the incentive distributions, we also have to focus exclusively on the migration-income relation, excluding other factors that may be important to migration decisions and shape the heterogeneity of the population in terms of their migration behavior.²

¹ We focus on migration costs. They are a structural parameter of high interest, both at an individual level and from an aggregate perspective (Sjaastad, 1962), for example since more generous unemployment insurance schemes will receive more political support in countries where migration costs are high, see Hassler et al. (2005).

Nonetheless, and despite our focus on a macroeconomic model of migration and on its econometric treatment, we also document migration dynamics at the micro level that differ from those of a model which does not keep track of the incentive distribution. One of the best documented facts from microdata is that younger households are more likely to migrate than older ones. Our model is able to generate a distinct age profile of migration rates as a result of dynamic self-selection, such that it provides a new explanation for the empirical age-migration patterns. We apply a perpetual-youth model and, as a result, age is not part of the decision problem of the agent and the decision problem remains stationary. This implies that we have shut down the human capital channel that relates age to migration (Sjaastad, 1962). And yet, age influences migration because it is an argument of the distribution of migration incentives. The match between agent and region becomes more efficient as agents get older, since agents have selected themselves into their preferred region.

In summary, our paper extends the discussion of self-selection in migration³ to a dynamic discrete choice setup. At the same time, our approach complements previous structural dynamic discrete choice models of migration which neglect self-selection, but account for a richer set of factors that influence migration decisions and a more complex decision problem involving multiple regions.

The remainder of this paper is organized as follows. Section 2 develops the key motivation of our paper. It illustrates on the basis of a simple two-period model why the evolution of migration incentives and triggered migration choices have to be estimated simultaneously to avoid a bias from dynamic self-selection. Given these considerations, Section 3 presents a dynamic microeconomic model of the migration decision, which assumes that an infinitely-lived agent maximizes future expected well-being by location choice. In Section 4, we show how to aggregate this model. We derive the contemporaneous law of motion of the distribution of migration incentives and aggregate migration rates. Section 5 confronts the model with aggregate data on migration between US states and presents the estimates of the structural parameters of the model, in particular the estimates of migration costs. Section 6 investigates in more detail why ignoring self-selection leads to biases in the estimation and explores the magnitude of the bias. Section 7 investigates the age patterns of migration our model implies. Section 8 concludes, and an appendix provides detailed proofs, details on the numerical model, as well as details on the data employed.

² This research strategy links our paper to Coen-Pirani's (2008) island-economy model of regional migration.

³ See for example Nakosteen and Zimmer, 1980, Borjas, 1987, Borjas, Bronars, and Trejo, 1992, Tunali, 2000, and Hunt and Mueller, 2004.

2 Why a dynamic model of migration and migration incentives?

Most micro studies and lately also more macro studies on migration link the individual

migration decision to a probabilistic model in which agent i migrates at time t if the

gain in utility terms obtained by migration is large enough and exceeds some threshold

value c, see for example Davies, Greenwood, and Li (2001), Hunt and Mueller (2004),

or Kennan and Walker (2009).

2.1 Endogenous initial state

To illustrate the problems induced by dynamic self-selection in such a setup, we consider a two-period, $t = 0, 1$, bi-regional example with regions A and B in this section. In Section 3 we develop an infinite horizon, dynamic discrete choice model of migration that can solve the problems highlighted here.

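If $y_{it}$ denotes the region ($y_{it} = 1$ for region A and $y_{it} = 0$ for region B) in which agent $i$ resides at time $t$, the two-period model can be written as

\[
y^*_{i1} =
\begin{cases}
u_{iA1} - (u_{iB1} - c) & \text{if agent } i \text{ lives in A at time 0} \\
(u_{iA1} - c) - u_{iB1} & \text{if agent } i \text{ lives in B at time 0}
\end{cases}
= \gamma (w_{iA1} - w_{iB1}) - c (1 - 2 y_{i0}) + \nu_{i1}, \tag{1}
\]

\[
y_{i1} =
\begin{cases}
1 & \text{if } y^*_{i1} > 0 \\
0 & \text{if } y^*_{i1} \le 0.
\end{cases} \tag{2}
\]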

Equation (2) is the decision rule the agent uses to decide her place of residence. Here, $y^*_{i1}$ denotes the latent utility difference the agent observes when living in region A relative to living in B, to which (1) gives a parametric form. We assume that the flow utility $u_{ij1}$ from living in region $j$ depends only on incomes $w_{ij1}$. The parameter $\gamma$ measures the marginal utility of income. The utility costs of migration are described by $c$. The stochastic component $\nu_{i1}$ reflects differences across agents, omitted migration incentives, and/or some variability of migration costs.

Typically, we are interested in the structural parameters $\gamma$ and $c$ and hence would estimate (2) to infer these parameters. A direct approach is in general impossible because potential migration gains are unobservable to the econometrician, i.e. we observe $w_{ij1}$ only if the agent chooses to live in $j$.

A standard approach to solve this problem is to proxy the unobservable potential income by the income a similar agent realizes in the other region using a Mincer-type wage regression

\[
w_{ij1} = \mu_{j1} z_{i1} + w^*_{ij1},
\]

where $\mu_{j1}$ measures the sensitivity of wages to observables and $w^*_{ij1}$ is residual wage heterogeneity.

Early on, Nakosteen and Zimmer (1980) highlighted that the self selection of agents

has to be taken into account when estimating the unobserved potential income gains.

This leads them to advocate a joint estimation of the latent income variable and the

migration choice based on a model of the type popularized by Heckman (1974, 1976,

1978) and Lee (1978, 1979).

Since we are not interested in the effect of classical self-selection in this paper, we assume that the econometrician actually knows $\mu_{j1}$ and thus also the average gain from migration.⁴ Nonetheless, if the wage residuals $w^*_{ij1}$ are autocorrelated, the structural estimation of the decision problem will be biased if the place of residence is a result of past decision making (dynamic self-selection). Consequently, the estimated migration decision (2) fails to identify true migration costs.

Replacing $w_{ij1}$ by the estimates $\mu_{j1} z_{i1}$ in (2), we obtain for the latent variable $y^*_{i1}$

\[
y^*_{i1} = \gamma (\mu_{A1} - \mu_{B1}) z_{i1} - c (1 - 2 y_{i0}) + \underbrace{\gamma (w^*_{iA1} - w^*_{iB1}) + \nu_{i1}}_{:= \eta_{i1}}. \tag{3}
\]

The proxy model (3) now contains a composed error term that combines the original error $\nu_{i1}$ from the discrete choice problem (2) and a measurement error $(w^*_{iA1} - w^*_{iB1})$ that captures the residual income heterogeneity across agents after controlling for observables $z_{i1}$. We assume this term is orthogonal to $(\mu_{A1} - \mu_{B1}) z_{i1}$. The necessary assumption for unbiased estimates of $c$ is that $(w^*_{iA1} - w^*_{iB1})$ is also orthogonal to the previous place of residence $y_{i0}$.

In a setup in which time periods are relatively short, e.g. years, this is unlikely. The literature on income over the life cycle has documented relatively high autocorrelation in incomes, even after controlling for individual characteristics and fixed individual heterogeneity, see for example Storesletten, Telmer and Yaron (2004) or Low, Meghir and Pistaferri (2008).

⁴ See Heckman and Robb (1985) for various consistent estimators of $\mu_{j1}$. In the terminology of the econometric literature on selection, this assumption means that the problem of estimating treatment effects can be readily solved.

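Assume $w^*_{ijt}$ follows an AR(1) process

\[
w^*_{ij1} = \rho w^*_{ij0} + \varepsilon_{ij1}
\]

with i.i.d. innovations $\varepsilon_{ij1}$. The initial conditions $w^*_{ij0}$ are i.i.d., drawn from a normal distribution $N(0, \sigma_0^2)$. Replacing $w^*_{ij1}$ in (3), we obtain

\[
y^*_{i1} = \gamma (\mu_{A1} - \mu_{B1}) z_{i1} - c (1 - 2 y_{i0}) + \underbrace{\gamma \left( \rho (w^*_{iA0} - w^*_{iB0}) + \varepsilon_{iA1} - \varepsilon_{iB1} \right) + \nu_{i1}}_{= \eta_{i1}}.
\]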

As long as $\rho \neq 0$, $\mathrm{corr}(y_{i0}, \eta_{i1}) \neq 0$ if the location in the previous period $y_{i0}$ is a function of the previous period's residual income difference $(w^*_{iA0} - w^*_{iB0})$. In general this will be the case if the location $y_{i0}$ has been a result of migration choice and thus is not random. For example, assume that households are distributed efficiently across regions in $t = 0$, in the sense that each household is in the region where it earns the most income:

\[
y_{i0} =
\begin{cases}
1 & \text{if } w_{iA0} > w_{iB0} \\
0 & \text{if } w_{iA0} \le w_{iB0}.
\end{cases}
\]

One can think of this allocation rule as reflecting migration choices in period 0 in the absence of migration costs and decision errors $\nu_{i0}$.

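We can calculate the covariance of the composed error term $\eta_{i1}$ with $y_{i0}$ as (see Appendix A)

\[
\mathrm{cov}(y_{i0}, \eta_{i1}) = 2 \gamma \rho \sigma_0 \, \phi\!\left( \frac{(\mu_{A1} - \mu_{B1}) z_{i1}}{2 \sigma_0} \right) > 0, \tag{4}
\]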

where $\phi$ is the density of the standard normal distribution. Note that the covariance is positive, implying an upward bias in the estimate of migration costs $c$. In addition, the bias is neither constant across individuals nor across time. It is largest when the deterministic differences $\frac{(\mu_{A1} - \mu_{B1}) z_{i1}}{2 \sigma_0}$ are small, i.e. when regions are much alike.

The bias vanishes if $\rho = 0$, which corresponds to the model considered by Nakosteen and Zimmer (1980). It also vanishes if the location in period $t = 0$, $y_{i0}$, is not related to the income difference in $t = 0$. Then we have $\mathrm{cov}(y_{i0}, \eta_{i1}) = 0$. That initial location is unrelated to initial income differences is likely, for example, if one looks at location at the time of birth compared to the location at another fixed age. Research on internal migration, however, has typically not looked at this type of data. Migration data that comes at a yearly frequency typically reflects the behavior of households who have faced migration decisions repeatedly, even if they have taken the decision to stay most of the time.
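The sign of this covariance can be checked with a small Monte Carlo sketch. The code below is an illustration under assumed parameter values, not the derivation in Appendix A: it draws initial residual incomes, assigns each agent to the region with the higher period-0 income (equal deterministic incomes are assumed), builds the composed error of the proxy model, and reports its sample covariance with the initial location. The function name, the parameter defaults, and the unit-variance decision error are hypothetical choices.

```python
import random


def initial_location_error_cov(rho, n=50_000, sigma0=1.0, sigma_eps=1.0,
                               gamma=1.0, seed=0):
    """Sample covariance between y_i0 and the composed error of the proxy model.

    Assumptions of this sketch (not taken from the paper): equal deterministic
    incomes in both regions, standard normal decision error, gamma = 1.
    """
    rng = random.Random(seed)
    locations, errors = [], []
    for _ in range(n):
        # Period-0 residual incomes in regions A and B, i.i.d. N(0, sigma0^2).
        w_a0 = rng.gauss(0.0, sigma0)
        w_b0 = rng.gauss(0.0, sigma0)
        # Efficient initial allocation: live in A (y = 1) iff income is higher there.
        y0 = 1.0 if w_a0 > w_b0 else 0.0
        # AR(1) innovations and the decision error of period 1.
        eps_a = rng.gauss(0.0, sigma_eps)
        eps_b = rng.gauss(0.0, sigma_eps)
        nu = rng.gauss(0.0, 1.0)
        # Composed error: persistent income gap, new shocks, decision error.
        composed = gamma * (rho * (w_a0 - w_b0) + eps_a - eps_b) + nu
        locations.append(y0)
        errors.append(composed)
    mean_y = sum(locations) / n
    mean_e = sum(errors) / n
    return sum((y - mean_y) * (e - mean_e)
               for y, e in zip(locations, errors)) / n


if __name__ == "__main__":
    print("rho = 0.9:", initial_location_error_cov(0.9))
    print("rho = 0.0:", initial_location_error_cov(0.0))
```

With persistent residual incomes the sample covariance should come out clearly positive, while with an autoregressive coefficient of zero it should be close to zero up to sampling noise.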

Figure 1: Distribution of potential incomes in region B relative to A. Panels: (a) overall population; (b) conditional on living in region A after migration; (c) conditional on living in region A after migration and idiosyncratic shocks; (d) conditional on living in region A after migration, idiosyncratic, and aggregate shocks. Areas are labelled "Live in A" and "Live in B"; the horizontal axis runs from -3 to 3.

2.2 Dynamics of the distribution of potential incomes

The two-period model introduced above reflects this repeated decision process only up to a limit, though it highlights the general problem arising from dynamic self-selection. To fully address the dynamic character of the migration decision, we extend the model to the infinite horizon in the next section. One important element of this infinite horizon model will be the dynamics of the distribution of unobserved income heterogeneity, as sketched in Figure 1.

Suppose the composed error term $\eta_{it}$ is initially normally distributed as in Figure 1 (a). The figure displays the distribution of potential incomes, $(\mu_{At} - \mu_{Bt}) z_{it} + \eta_{it}$. Low values imply that income in region A is favorable, high values imply better income prospects in region B. In the absence of migration costs, all agents with $(\mu_{At} - \mu_{Bt}) z_{it} + \eta_{it} < 0$ decide to live in region A, and they decide to live in region B otherwise.

As a result of this self-selection, the distribution of income differences changes for the next period. No agent who lives in region A prefers to live in region B, see Figure 1 (b). Effectively, the right-hand part of the distribution in Figure 1 (a) has been cut because all agents with higher income in region B have actually chosen B as the region to live in.

Adding a normally distributed idiosyncratic income shock to the persistent income difference leads to the distribution of income differences as displayed in Figure 1 (c). The colored-in region indicates the set of agents that will migrate from A to B after the idiosyncratic shocks have occurred.

Besides idiosyncratic shocks, aggregate shocks to the average income difference $(\mu_{A1} - \mu_{B1}) z_{i1}$ also influence the migration decisions of agents. Figure 1 (d) shows the distribution of migration incentives as in Figure 1 (c), but after an adverse shock to region A. Aggregate shocks shift the income differences for all agents and thus shift the distribution of income differences before migration without directly altering its shape. By comparing Figures 1 (c) and 1 (d), one can see that the shape of the distribution after migration (the region not colored in) differs between both figures. As a consequence, one needs to keep track of the evolution of the incentive distribution to determine aggregate migration. Therefore, we develop a model based on dynamic optimal migration decisions in the presence of persistent shocks to income. This model can then be aggregated and used to simulate the evolution of migration and its incentives over time.
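The sequence sketched in Figure 1 (selection, then idiosyncratic shocks, then an aggregate shift) can be mimicked in a few lines. The following is a minimal sketch under assumed values, with zero migration costs, a standard normal initial distribution of the income gap, and a hypothetical function name; it is meant to illustrate the mechanism, not to reproduce the figure.

```python
import random


def share_moving_a_to_b(aggregate_shift, shock_sd=0.5, n=50_000, seed=1):
    """Share of region-A residents who prefer B after new shocks (zero migration cost)."""
    rng = random.Random(seed)
    residents_a = 0
    movers = 0
    for _ in range(n):
        # Income in B relative to A before selection (panel a).
        gap = rng.gauss(0.0, 1.0)
        # Draw the shock for everyone so all scenarios share the same random numbers.
        shock = rng.gauss(0.0, shock_sd)
        if gap >= 0.0:
            continue  # these agents already live in B (truncation, panel b)
        residents_a += 1
        # Idiosyncratic shock (panel c) plus a common shift against A (panel d).
        if gap + shock + aggregate_shift > 0.0:
            movers += 1
    return movers / residents_a


if __name__ == "__main__":
    print("no aggregate shock:     ", share_moving_a_to_b(0.0))
    print("adverse shock to A (0.3):", share_moving_a_to_b(0.3))
```

Because the truncated distribution has little mass just below zero only after repeated selection, the share of movers depends on the shape of the whole distribution, which is why the sketch tracks individual gaps rather than a mean.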

3 A simple stochastic model of migration decisions

We consider an economy with two regions, A and B. This economy is assumed to be inhabited by a continuum of agents of measure 1. Agents maximize future well-being over an infinite horizon by location choice. In each period a constant fraction $\delta$ of randomly selected agents dies and is replaced by newborn agents so that the overall population remains constant ("perpetual youth model"). We model the economy in discrete time, and at each point in time an agent decides in which region to live and work. First, we consider the decision problem of an individual agent $i$ living in region $j = A, B$. Thereafter, we discuss aggregation and the dynamics of the distribution of migration incentives.

Living in region $j$ at time $t$ gives the agent utility $\tilde{w}_{ijt}$ that we interpret as utility from income, which is stochastic in our model. We assume incomes to be composed of a permanent (autocorrelated) component $\tilde{w}_{ijt}$ and a transitory (i.i.d.) component $\varphi_{ijt}$.⁵ Both components vary over time and across individuals. We assume that only the permanent component is observed before the migration decision. Consequently, in describing migration behavior, we can focus exclusively on the effect of permanent variations in potential incomes. Changes in the transitory component realize after migration and hence do not affect migration choice. However, when confronting our model with data, including data on aggregate income, we need to take transitory income fluctuations into account. This means that the microeconomic model does not need to include the transitory income component, while we have to take it into account in the aggregation of realized incomes.

Moving from one region to the other comes at a cost. When an agent moves, she is subject to a disutility $c$ that enters additively in her utility function. Therefore, the instantaneous utility function $u_{it}(j, k)$ is given by

\[
u_{it}(j, k) = \tilde{w}_{ijt} - \mathbb{I}_{j \neq k} \, c \tag{5}
\]

for an agent that has lived in region $k$ in the preceding period and now lives in region $j$. Here, $\mathbb{I}$ denotes an indicator function, which equals 1 if the agent has moved from region $k$ to $j$ and 0 if the agent had lived in region $j$ in the preceding period.

The agent discounts future utility by factor $\beta \le (1 - \delta) < 1$ and maximizes the discounted sum of expected future utility by location choice. The agent knows the distribution of the permanent component of income $\tilde{w}_{ijt}$ and forms rational expectations. With $\tilde{w}_{ijt}$ being stochastic, the potential migrant waits for good income opportunities. In her migration decision the agent thus takes into account the option value to wait and learn more about future incomes.⁶

The distribution of migration incentives, $\tilde{w}_{ijt}$, is assumed to be log-normal. In particular, we assume that log income, $w_{ijt}$, follows an AR(1) process with normally distributed innovations $\xi_{ijt}$ and autoregressive coefficient $\rho$:

\[
\ln(\tilde{w}_{ijt}) =: w_{ijt} = \mu_j (1 - \rho) + \rho w_{ijt-1} + \xi_{ijt}, \quad j = A, B. \tag{6}
\]
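As a minimal sketch of the income process in (6), the following code simulates one agent's log income in one region. The function name and the parameter values in the usage lines are hypothetical, and the innovations are plain normal draws, whereas the paper assumes bounded (approximately normal) innovations.

```python
import random


def simulate_log_income(mu, rho, sigma_xi, periods, seed=0):
    """Simulate w_t = mu * (1 - rho) + rho * w_{t-1} + xi_t, starting at w_0 = mu."""
    rng = random.Random(seed)
    w = mu  # start at the unconditional mean of log income
    path = []
    for _ in range(periods):
        # The intercept mu * (1 - rho) makes mu the long-run mean of w.
        w = mu * (1.0 - rho) + rho * w + rng.gauss(0.0, sigma_xi)
        path.append(w)
    return path


if __name__ == "__main__":
    path = simulate_log_income(mu=10.0, rho=0.9, sigma_xi=0.1, periods=20_000)
    print("sample mean of log income:", sum(path) / len(path))
```

The intercept is scaled by one minus the autoregressive coefficient precisely so that the long-run mean of log income equals the region's mean parameter, which the sample mean of a long simulated path should approximate.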

This process holds for the whole continuum of agents, and each agent draws her own series of innovations $\xi_{ijt}$ for both regions. The expected value of log income in region $j$ is $\mu_j$. The innovations $\xi_{ijt}$ are composed of aggregate as well as idiosyncratic components. They have mean zero, are serially uncorrelated, but may be correlated across regions A, B (see Section 4.2.2).⁷

⁵ See the evidence on transitory income fluctuations provided by Storesletten, Telmer, and Yaron (2004) for instance.

⁶ Our model is based on the real-options approach to migration suggested by Burda (1993) and Burda et al. (1998). Since the latter two papers only look at migration as a once-and-for-all decision, they preclude return migration and do not have to study the evolution of migration incentives, to which past migration decisions feed back.

The income distribution and migration costs, together with the utility function and the discount factor, define the decision problem for the potential migrant. This is an optimization problem, which is described by the following Bellman equation:

\[
V(k, w_{iAt}, w_{iBt}) = \max_{j = A, B} \left\{ \exp(w_{ijt}) - \mathbb{I}_{\{k \neq j\}} c + \beta E_t V(j, w_{iAt+1}, w_{iBt+1}) \right\}. \tag{7}
\]

In this equation, $E_t$ denotes the expectations operator with respect to information available at time $t$.⁸ The optimization problem is stationary, and in particular it is independent of age as agents die with a constant probability $\delta$.

The optimal policy is relatively simple. The agent migrates from region $k$ to region $j$ if and only if the costs of migration are lower than the sum of the direct benefits of migration, $\exp(w_{ijt}) - \exp(w_{ikt})$, and the expected value gain

\[
\Delta V(w_{iAt}, w_{iBt}) := \beta E_t \left[ V(B, w_{iAt+1}, w_{iBt+1}) - V(A, w_{iAt+1}, w_{iBt+1}) \right]. \tag{8}
\]

This means that the agent migrates (here from $k = A$ to $j = B$) if and only if

\[
c \le \exp(w_{ijt}) - \exp(w_{ikt}) + \Delta V(w_{iAt}, w_{iBt}) =: \bar{c}(w_A, w_B). \tag{9}
\]

This gives a critical level of costs $\bar{c}(w_A, w_B)$ at which an agent living in region A is indifferent between moving and not moving to region B. A person moves from A to B if and only if

\[
c \le \bar{c}_A := \bar{c}(w_A, w_B).
\]

Conversely, a person living in region B moves to region A if and only if

\[
c \le \bar{c}_B := -\bar{c}(w_A, w_B).
\]

Note that $\bar{c}$ can be positive as well as negative. If $\bar{c}$ is positive, region B is more attractive. If it is negative, region A is more attractive and a person living in region A would only have an incentive to move to region B if migration costs were negative.
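The threshold rule in (9) follows from comparing the two branches of the Bellman equation (7). As a short worked step for an agent currently living in region A, writing $\beta$ for the discount factor and $\bar{c}$ for the critical cost level (these symbol names are the reading adopted here):

```latex
\begin{align*}
\text{stay in } A &: \quad \exp(w_{iAt}) + \beta E_t V(A, w_{iAt+1}, w_{iBt+1}), \\
\text{move to } B &: \quad \exp(w_{iBt}) - c + \beta E_t V(B, w_{iAt+1}, w_{iBt+1}).
\end{align*}
Moving is optimal if and only if the second expression is at least as large as the first:
\begin{align*}
c \;\le\; \exp(w_{iBt}) - \exp(w_{iAt})
  + \beta E_t \left[ V(B, w_{iAt+1}, w_{iBt+1}) - V(A, w_{iAt+1}, w_{iBt+1}) \right]
  \;=\; \bar{c}(w_A, w_B).
\end{align*}
```

For an agent living in region B the two branches swap roles, so the same comparison yields the condition that migration costs must not exceed $-\bar{c}(w_A, w_B)$.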

⁷ For technical reasons, we assume boundedness of $\xi_{ijt}$, so that $\xi_{ijt}$ is in fact only approximately normal.

⁸ Existence and uniqueness of the value function is proved in Appendix B.

4 Aggregate migration and the dynamics of income distributions

4.1 Aggregate migration

Given this trigger rationale for migration, the hazard rate

\[
\lambda_j(w_A, w_B) :=
\begin{cases}
1 & \text{if } c \le \bar{c}_j(w_A, w_B) \\
0 & \text{otherwise}
\end{cases}
, \quad j = A, B
\]

determines whether a person living in region $j$ moves to the other region if she faces the potential incomes $(w_A, w_B)$.

Now, consider the distribution $F_t$ of (potential) incomes $(w_A, w_B)$ and household locations. Suppose this joint income and location distribution is the distribution after the income shocks $\xi_{ijt}$ have been realized, but before migration decisions have been taken. Let $f_{jt}$ denote the conditional density of this distribution, conditional on the household living in region $j$ at time $t$. Then, the actual fraction $\bar{\lambda}_{jt}$ of households living in $j$ that migrate to the other region evaluates as

\[
\bar{\lambda}_{jt} := \int \lambda_j(w_A, w_B) \, f_{jt}(w_A, w_B) \, dw_A \, dw_B. \tag{10}
\]

This aggregate migration hazard can be thought of as a weighted mean of all microeconomic migration hazards $\lambda_j(w_A, w_B)$, weighted by the density of income pairs $(w_A, w_B)$ from distribution $F_t$.
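On a discrete grid of income pairs, the integral in (10) becomes a weighted sum of zero-one hazards. The sketch below illustrates this under strong simplifications: the threshold function is a hypothetical stand-in that uses only the current income gain and ignores the expected value gain, and the two-point density is made up for the example.

```python
import math


def aggregate_hazard(density, threshold, cost):
    """Discrete analogue of (10): probability mass of pairs with cost <= threshold.

    density   -- dict mapping (w_a, w_b) to probability mass, conditional on
                 living in the origin region (masses should sum to one)
    threshold -- function (w_a, w_b) -> critical cost level for that pair
    cost      -- migration cost c
    """
    total = 0.0
    for (w_a, w_b), mass in density.items():
        # Micro hazard is 1 if the cost does not exceed the critical level, else 0.
        hazard = 1.0 if cost <= threshold(w_a, w_b) else 0.0
        total += hazard * mass
    return total


def static_threshold_a(w_a, w_b):
    """Hypothetical stand-in for the critical cost of a region-A resident.

    Uses only the current income gain exp(w_b) - exp(w_a); the expected value
    gain is omitted because it requires solving the dynamic problem.
    """
    return math.exp(w_b) - math.exp(w_a)


# Two log-income pairs: the first favours B, the second favours A.
example_density = {(0.0, 1.0): 0.25, (1.0, 0.0): 0.75}

if __name__ == "__main__":
    print(aggregate_hazard(example_density, static_threshold_a, cost=1.0))
```

Only the income pair that favours region B by more than the cost contributes to the aggregate hazard, so the result equals that pair's probability mass.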

4.2 Dynamics of income distributions

The distribution Ft itself (and hence fjt) evolves over time and is a result of direct shocks

to income just as it is a result of past migration. In addition it is altered by the death

and birth of agents. We need to characterize the law of motion for Ft to close our model

and to obtain the sequence of aggregate migration rates.

4.2.1 The effect of migration on income distributions

In order to follow the evolution of $F_t$ we need to characterize both the evolution of the fraction $P_{jt}$ of households living in each region and the conditional distribution of incomes $f_{jt}$ (conditional on a household actually living in a specific region $j$).

The proportion of households living in region $j$ at time $t + 1$ is a result of migration decisions at time $t$. The law of motion for $P_{jt}$ is thus given by

\[
P_{jt+1} = (1 - \bar{\lambda}_{jt}) P_{jt} + \bar{\lambda}_{-jt} P_{-jt}. \tag{11}
\]

The first part of the sum reflects the fraction of households that remain in region $j$, where $(1 - \bar{\lambda}_{jt})$ is the probability to stay in region $j$. The second part is the fraction of households that migrate from region $-j$ to region $j$. The probability of a household dying, $\delta$, does not influence the proportion of households living in each region, as a dying household is by assumption replaced by a newborn one in the same region.

Since the microeconomic migration hazard depends on $(w_A, w_B)$, different potential incomes result in different propensities to migrate. As a consequence, migration changes not only the fraction $P_{jt}$ of households living in region $j$ at time $t$, but also the conditional distribution of income, $f_{jt}$. For example, households living in region A, earning a low current income, $w_A$, but facing a substantially higher potential income in B, $w_B$, will probably migrate. As a result, the number of those households will drop to zero in region A after migration decisions have been taken, while the number of households facing a smaller income differential might not change, recall Figure 1.

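The distribution of migration incentives is thus a function of past migration decisions, and we can express the new density of households with income $(w_A, w_B)$ in region $j$ after migration (the left-hand side below, with the pre-migration densities on the right-hand side) by

\[
f_{jt}(w_A, w_B) = \left[ 1 - \lambda_{jt}(w_A, w_B) \right] \frac{f_{jt}(w_A, w_B) P_{jt}}{P_{jt+1}} + \lambda_{-jt}(w_A, w_B) \frac{f_{-jt}(w_A, w_B) P_{-jt}}{P_{jt+1}}. \tag{12}
\]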

The probability $[1 - \lambda_{jt}(w_A, w_B)]$ is again the probability to stay in region $j$. The term $f_{jt}(w_A, w_B) P_{jt}$ weights this probability and is the unconditional income density for region $j$ before migration has taken place. To obtain the conditional density after migration, the unconditional income density, $f_{jt}(w_A, w_B) P_{jt}$, is divided by $P_{jt+1}$, which is the fraction (or probability) of households living in region $j$ after migration (i.e. in time $t + 1$). The second part of the sum is constructed analogously.
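The two laws of motion, for regional population shares and for the conditional income densities, can be sketched on a discrete grid as follows. This is an illustrative discretization under assumptions of this example (dictionary-based densities on a common grid, zero-one hazards supplied by the caller, strictly positive population shares after migration); the function and variable names are hypothetical.

```python
def migration_update(shares, densities, hazards):
    """One migration step on a grid: update regional shares and conditional densities.

    shares    -- {'A': P_A, 'B': P_B}, population shares before migration
    densities -- {'A': {x: mass}, 'B': {x: mass}}, conditional on region,
                 defined on a common grid of income pairs x
    hazards   -- {'A': {x: 0 or 1}, 'B': {x: 0 or 1}}, micro migration hazards
    Assumes every region keeps a strictly positive share after migration.
    """
    other = {"A": "B", "B": "A"}
    # Aggregate hazards: hazard-weighted mass of each conditional density.
    agg = {j: sum(hazards[j][x] * densities[j][x] for x in densities[j])
           for j in shares}
    # Shares after migration: stayers plus arrivals from the other region.
    new_shares = {j: (1.0 - agg[j]) * shares[j] + agg[other[j]] * shares[other[j]]
                  for j in shares}
    new_densities = {}
    for j in shares:
        k = other[j]
        new_densities[j] = {
            # Stayers and arrivals at each grid point, renormalized by the new share.
            x: ((1.0 - hazards[j][x]) * densities[j][x] * shares[j]
                + hazards[k][x] * densities[k][x] * shares[k]) / new_shares[j]
            for x in densities[j]
        }
    return new_shares, new_densities


if __name__ == "__main__":
    grid = ["favours_A", "favours_B"]
    shares = {"A": 0.5, "B": 0.5}
    densities = {j: {x: 0.5 for x in grid} for j in shares}
    hazards = {"A": {"favours_A": 0, "favours_B": 1},
               "B": {"favours_A": 1, "favours_B": 0}}
    print(migration_update(shares, densities, hazards))
```

In the usage example every household ends up in the region its income pair favours, so the post-migration density in each region concentrates on a single grid point, which mirrors the truncation shown in Figure 1 (b).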

4.2.2 The effect of income shocks on the income distribution

Besides migration, shocks to income also change the distribution of income pairs, $F_t$. The shocks to income can be differentiated along two dimensions: one dimension is aggregate vs. idiosyncratic, the other one is region-specific vs. economy-wide. For a single agent we can decompose the total potential income $w_{ijt}$ in region $j$ (see equation 6) into an aggregate regional component $z_{jt}$ and an individual-specific regional component $w^*_{ijt}$, driven by shocks $\theta_{jt}$ and $\varepsilon_{ijt}$, respectively:

\[
\begin{aligned}
w_{ijt} &= z_{jt} + w^*_{ijt} \\
z_{jt} &= \mu_j (1 - \rho) + \rho z_{jt-1} + \theta_{jt} \\
w^*_{ijt} &= \rho w^*_{ijt-1} + \varepsilon_{ijt}, \quad j = A, B.
\end{aligned} \tag{13}
\]

In case agent $i$ is newborn in period $t$, we assume that she begins life without any past idiosyncratic income advantage or disadvantage in region $j$, i.e. we set $w^*_{ijt-1} = 0$.

The regional-aggregate shock $\theta_{jt}$ for region $j$ hits all agents equally and changes their potential income for region $j$. Note that this shock does not depend on the actual region the agent lives in. For example, a positive shock $\theta_{At} > 0$ increases the potential income in region A for agents that currently live in this region as well as for agents that are currently living in region B. They realize this potential income by deciding to actually live in region A. The importance of economy-wide business cycles relative to the size of region-specific aggregate fluctuations is reflected by the correlation $\rho_\theta$ between aggregate shocks $\theta_{At}$ and $\theta_{Bt}$.

However, aggregate shocks are typically only a minor source of income variation for an agent. Agents differ in various personal characteristics that result in different income profiles over time. Individuals differ in their skills, and while demand may grow for the skills of one person, it may deteriorate for the skills of another. This heterogeneity is captured by the idiosyncratic shocks $(\varepsilon_{iAt}, \varepsilon_{iBt})$. If $\varepsilon_{iAt}$ is positive, the income prospects of individual agent $i$ increase in region $A$. The correlation $\psi_{\varepsilon}$ between $\varepsilon_{iAt}$ and $\varepsilon_{iBt}$ reflects economy-wide demand shifts for a person's individual skills. Since we assume aggregate and idiosyncratic shocks to be independent, the variance of the total shock to income, $\xi_{ijt}$, is the sum of the variances of idiosyncratic and aggregate shocks: $\sigma^2_{\xi} = \sigma^2_{\varepsilon} + \sigma^2_{\eta}$.
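A minimal simulation sketch of the income process in equation (13) may help fix ideas. All names are hypothetical, and several simplifications are assumptions of this sketch rather than features of the paper: both regions share the same mean and persistence, the share of aggregate shocks is taken as a share of the total shock variance, the same cross-region correlation is used for aggregate and idiosyncratic shocks, and births and deaths are ignored.

```python
import numpy as np

def simulate_potential_income(n_agents, n_periods, mean, persistence,
                              total_shock_sd, agg_share, cross_corr, seed=0):
    """Potential log incomes (w_A, w_B) for each agent after n_periods."""
    rng = np.random.default_rng(seed)
    # Split the total shock variance into aggregate and idiosyncratic parts
    agg_sd = np.sqrt(agg_share) * total_shock_sd
    idio_sd = np.sqrt(1.0 - agg_share) * total_shock_sd
    # Cholesky factor used to correlate shocks across the two regions
    chol = np.linalg.cholesky(np.array([[1.0, cross_corr], [cross_corr, 1.0]]))
    z = np.full(2, float(mean))          # aggregate regional components
    w_star = np.zeros((n_agents, 2))     # individual components start at zero
    for _ in range(n_periods):
        agg_shock = agg_sd * (chol @ rng.standard_normal(2))
        idio_shock = idio_sd * (rng.standard_normal((n_agents, 2)) @ chol.T)
        z = mean * (1.0 - persistence) + persistence * z + agg_shock
        w_star = persistence * w_star + idio_shock
    return z[None, :] + w_star
```

For example, `simulate_potential_income(50_000, 200, 10.5, 0.95, 0.17, 0.0036, 0.24)` returns an array of shape `(50000, 2)` whose cross-sectional correlation between the two columns is close to the chosen `cross_corr`, because the aggregate components are common to all agents.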

Persistency in incomes is captured by the autoregressive parameter $\rho$ in equation (13). We abstain from including permanently fixed individual differences (fixed effects), primarily because this makes the model numerically more tractable.[9]

[9] While the solution of the restricted dynamic programming problem of the agent can be obtained quickly, the simulation of the distribution of migration incentives is numerically much more involved. The computation time for the estimation amounts to ca. 24 hours on a 4-core 3 GHz Xeon machine. If we were to include fixed effects that reflect different types of agents, the model would have to be solved for each type of agent in the way it is now solved for the single type. Thus, modelling k heterogeneous types of migrants would require roughly k days for the estimation. The increase in computation time would be even more severe if we were to model more than two regions, as this would increase the dimensionality of the problem, such that computation time would increase exponentially.


Aggregate and idiosyncratic shocks to income, birth and death of households, as well as income persistency jointly determine the transition from $\hat{f}_{jt}$ to $f_{jt+1}$; details are provided in Appendix C. The latter density now determines migration decisions in time $t+1$, starting the cycle over again. As a result, it is both past income shocks and

past migration decisions that drive the incentives to migrate. Making this explicit and

keeping track of the distributional dynamics of migration incentives is the key element

of our model, as it distinguishes our approach from other empirical models of migration.

4.3 Aggregate Income

To link our model to aggregate data, we finally need to describe the evolution of aggregate regional realized incomes. For region $j$, the part of income that is persistent and realized after migration is the conditional expected value of $\exp(w_{ijt})$,

$$\int \exp(w_j)\, \hat{f}_{jt}(w_A, w_B)\, dw_A\, dw_B.$$

Besides persistent income, transitory components (orthogonal to the persistent part) also add to the fluctuations of income in observed data, so that we obtain for log aggregate income $\bar{w}_{jt}$

$$\bar{w}_{jt} = \ln\left( \int \exp(w_j)\, \hat{f}_{jt}(w_A, w_B)\, dw_A\, dw_B \right) + \varphi_{jt}.$$

The transitory income component $\varphi_{jt}$ measures high-frequency fluctuations in income that are irrelevant to the migration decision. More generally, it captures the idea that, in reality, income measures migration incentives imperfectly. One reason is that any empirical income concept is noisy as such. Personal income, the income concept we use, comprises labor income as well as some components of capital income. Whether each of these elements is too widely or too narrowly defined as a migration incentive is not clear a priori. The inclusion of $\varphi_{jt}$ reflects this agnostic view.

5 Estimation

5.1 Estimation technique

We rely on an indirect inference procedure to find the parameters of our model

that most closely match the observed patterns of migration in the data.

In particular, we apply a method of simulated moments (MSM) as has been proposed by

Gourieroux, Monfort, and Renault (1993) to obtain estimates of structural parameters

when the likelihood function of the structural model becomes intractable, as in our

setting. This estimator relies on numerical simulation of the model. Details on the


numerical simulation are provided in Appendix E.

The idea behind a method of simulated moments is to choose a set of moments

that captures the characteristics of the data, and then to calibrate and simulate the

structural economic model such that the moments are best replicated by the simulation.

Accordingly, we have to 1) discuss how to use our model of two regions to address

migration between a richer set of regions, 2) decide which parameters of our model shall

be estimated and which are treated as fixed, 3) decide on an informative set of moments

along which simulated and real-world data are compared, and 4) decide how to measure

the closeness of the simulated and actual moments.

A simulation of our model yields migration and income data for two regions. Of

course, the actual migrant faces a more complex decision problem than the one simulated

in our model of two regions. Including D.C. as a destination region, an agent has to decide between 50 possible alternative states to which she can move. To make this comparable to our model, the 50 alternatives in the data have to be aggregated to a single complementary region.[10] The average income of the alternative region is proxied by the population-weighted average income over all 50 alternative states.[11] With these

assumptions, we can use our model to simulate a data set for migration that has the same

size as the data set constructed from the Internal Revenue Service (IRS) migration data,

which is our empirical benchmark. This database contains annual area-to-area migration flow data for US states for the period 1989-2004.[12] Accordingly, we simulate our model for 51 pairs of regions and 71 years, but we drop the first 55 years for each region to minimize the influence of our initial choice of the income distribution $F_0$. We choose $F_0$ to equal the ergodic distribution in the absence of aggregate shocks; see Appendix D for

details. In order to reduce simulation uncertainty, we replicate each simulation 5 times

and use the averages over the simulations.

[10] Generating artificial bi-regional data means that we technically assume the best income opportunity over all alternative regions to follow the log-normal distribution assumed in our model. An approximation of this sort cannot be avoided by assuming an extreme value distribution for incomes. This would only work if migration incentives were serially uncorrelated.

[11] Income data is taken from the REIS database, CPI deflated, and in logs.

[12] A detailed data description of both IRS and REIS data can be found in Appendix F. Alternative migration data for the US, such as the Census, are less appropriate for our analysis as we focus on the effect of income dynamics. While the Census reports changes in the place of residence over a period of 5 years and is only available once every decade, the IRS data are available on a yearly basis. The Census data suggest an approximate annual migration rate of 2%, which is significantly lower than the migration rate of 3.9% documented in the IRS data. This difference may stem from the fact that the Census data cannot take into account return migration over a 5-year period, which can be expected to be of sizable importance (see the discussion in Coen-Pirani, 2008, or the results of Kennan and Walker, 2009).

We cannot estimate all parameters of the model. This would be numerically infeasible since every dimension added to the parameter search increases the estimation time exponentially. Our primary interest is to estimate migration costs. Besides migration costs, we restrict ourselves to the estimation of those parameters of our model that cannot be inferred from raw data alone. One such set of parameters is the correlation of shocks to permanent income across regions, $\psi_{\varepsilon}$ and $\psi_{\eta}$. Another one is the correlation of income shocks across individuals, i.e. the fraction of aggregate shocks, $\alpha$. In our model, $\varepsilon$ refers to the residual income after eliminating predictable income components (here regional averages). This means that even in micro-data there is no observable counterpart for $\varepsilon$ that we can use to construct a moment condition to infer the correlation coefficient $\psi_{\varepsilon}$. For $\eta$ this is different, as it refers to observable differences between households (i.e. the place of residence). Thus, $\eta$ has an observable counterpart that we can use for moment matching, which is the correlation of realized regional average earnings. For this reason, we assume that aggregate and individual correlation coefficients are equal, i.e. $\psi_{\varepsilon} = \psi_{\eta}$, so that we only need to estimate one common parameter, $\psi$.

Nonetheless, we cannot directly infer $\psi$ and $\alpha$ from the data, as these parameters refer to potential incomes and are therefore inherently unobservable. Since migration smooths income and agents select into high-income regions, the counterparts to $\alpha$ and $\psi$ in terms of a covariance structure in realized incomes are substantially influenced by the magnitude of migration costs. At the same time, we also expect $\alpha$ and $\psi$ to have a significant influence on the behavior of aggregate migration itself. Consequently, we estimate $\alpha$ and $\psi$ within the model.

The transitory shock is irrelevant to the migration decision, hence its effect on realized incomes cannot be smoothed by migration. For this reason, we fix the correlation of the transitory income shock, $\psi_{\varphi}$, to the value of the observed correlation of incomes in the REIS data, taking fixed effects and a linear time trend into account. However, the variance of the transitory shock, $\sigma^2_{\varphi}$, is estimated. The transitory component captures not only purely transitory fluctuations in income but also various forms of measurement error. Since we find it difficult to guess their quantitative importance, we abstain from fixing $\sigma^2_{\varphi}$ or estimating it outside the model. Our complete set of estimated parameters is $\theta = (\text{migration costs}, \psi, \alpha, \sigma^2_{\varphi})$.

In turn, all other parameters $(\beta, \rho, \mu, \sigma^2_{\xi}, \psi_{\varphi}, \delta)$ in our model have to be fixed at reasonable values. As we work with annual data, we choose the discount factor $\beta = 0.95$. Similarly, we fix the probability of dying at $\delta = 2.5\%$ to reflect an average working-life expectancy of 40 years.

In order to characterize the process for income, we need to specify the autocorrelation parameter $\rho$, the mean $\mu$, and the long-run variance of income. We take these parameters from Storesletten, Telmer, and Yaron (2004). They estimate the dynamics of idiosyncratic labor market risk for the US based on the Panel Study of Income Dynamics. The paper conveys information on both income variances and the autocorrelation of log household income after controlling for observable household characteristics such as education or age. Besides, the paper reports a mean household income of US$ 45,000. To approximately match this figure, we choose the mean of log income to be $\mu_A = \mu_B = 10.5$. Storesletten, Telmer, and Yaron (2004, p. 711) find an annual autocorrelation of residual idiosyncratic incomes of roughly 0.95 and a frequency-weighted average standard deviation of idiosyncratic income shocks of 0.17 in their preferred specification.[13]
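The calibration above can be checked with a short back-of-the-envelope calculation. The sketch assumes that log income is stationary and normally distributed, so that its long-run variance is the shock variance divided by one minus the squared autocorrelation and mean income is the log-normal mean; the variable names are illustrative only.

```python
import math

persistence = 0.95    # annual autocorrelation of log income
shock_sd = 0.17       # standard deviation of income shocks
log_mean = 10.5       # mean of log income

# Long-run (stationary) variance of an AR(1) process
long_run_var = shock_sd**2 / (1.0 - persistence**2)

# Mean of a log-normal variable with these parameters
implied_mean_income = math.exp(log_mean + long_run_var / 2.0)

print(round(long_run_var, 4), round(implied_mean_income))
```

Under these assumptions the long-run variance is about 0.30 and the implied mean income is roughly US$ 42,000, in the neighborhood of the US$ 45,000 the calibration targets.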

The next step is to decide on an informative set of moments, $\varrho$. We select the standard deviation of migration rates, the standard deviation of average incomes, and the correlation of average incomes across regions as the first three moments to be matched. To this set of moments we add the estimated parameters from a reduced-form regression of migration rates on the incomes of the destination and the source region. To make the regression scale-invariant with respect to incomes, we use log-deviations from average incomes as the income variables, i.e. we estimate

$$m_{jt} = \gamma_0 + \gamma_1 (w_{jt} - \bar{w}_j) + \gamma_2 (w_{-jt} - \bar{w}_{-j}) + u_{jt}.$$

The parameters $\gamma_1$ and $\gamma_2$ reflect income sensitivities of migration. The intercept $\gamma_0$ captures the average of migration rates.
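As an illustration of how the three regression moments could be computed from a panel, the sketch below runs pooled least squares on region-demeaned log incomes. The function name and the array layout (regions in rows, years in columns) are assumptions of this example, and any further detrending applied by the authors is not reproduced.

```python
import numpy as np

def reduced_form_moments(mig_rate, log_inc_own, log_inc_other):
    """Intercept and two income sensitivities from a pooled regression.

    All inputs are arrays of shape (n_regions, n_years).
    """
    # Log-deviations from each region's own time average
    dev_own = (log_inc_own - log_inc_own.mean(axis=1, keepdims=True)).ravel()
    dev_other = (log_inc_other - log_inc_other.mean(axis=1, keepdims=True)).ravel()
    design = np.column_stack([np.ones_like(dev_own), dev_own, dev_other])
    coef, *_ = np.linalg.lstsq(design, mig_rate.ravel(), rcond=None)
    return coef  # [intercept, sensitivity to own income, sensitivity to other income]
```

Because both regressors have mean zero within each region, the intercept equals the pooled average migration rate, which is why it can stand in for that moment.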

We simulate our model for a given vector of model parameters $\theta$ and calculate the distance between the moments obtained from this simulation, $\varrho(\theta)$, and the sample moments $\varrho_S$. We use the covariance matrix of $\varrho_S$ obtained from 10,000 bootstrap replications as a weighting matrix, so that our distance and goodness-of-fit measure is

$$L = (\varrho_S - \varrho(\theta))' \, \mathrm{cov}(\varrho_S)^{-1} (\varrho_S - \varrho(\theta)).$$

The actual estimation is carried out by minimizing the distance measure $L$ numerically with a direct search algorithm.
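The distance measure and its bootstrap weighting matrix can be sketched as follows. The helper names are hypothetical, and the resampling scheme (drawing rows of the data with replacement) is an assumption of this example; the source does not spell out how the bootstrap samples are formed.

```python
import numpy as np

def msm_distance(sample_moments, simulated_moments, cov_sample_moments):
    """Quadratic form L = d' cov^{-1} d with d = sample - simulated moments."""
    diff = np.asarray(sample_moments, dtype=float) - np.asarray(simulated_moments, dtype=float)
    # Solve the linear system rather than forming the explicit inverse
    return float(diff @ np.linalg.solve(cov_sample_moments, diff))

def bootstrap_moment_cov(data, moment_fn, n_boot=1000, seed=0):
    """Covariance matrix of a moment vector across bootstrap resamples of the rows."""
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]
    draws = np.array([moment_fn(data[rng.integers(0, n_rows, n_rows)])
                      for _ in range(n_boot)])
    return np.cov(draws, rowvar=False)
```

A derivative-free optimizer would then be applied to `msm_distance` as a function of the structural parameters, with the simulated moments recomputed at each candidate parameter vector.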

[13] The choice to fix $\sigma^2_{\xi}$ is somewhat problematic, but it arises as a necessary restriction from the use of macro data. The realized fluctuation of income is a function of migration costs. Still, we use statistics calculated from realized income fluctuations to calibrate a parameter for potential income fluctuations. Yet the standard deviation of idiosyncratic potential income can be understood as a pure scaling parameter of our model. When we consider the case of zero migration costs, a simulation of our model reveals that realized income inherits 89% of the standard deviation of potential income. This means that taking the cross-sectional variance of realized incomes underestimates the true variance of potential incomes by no more than 11%. The implied maximal underestimation of migration costs is of the same magnitude.


5.2 Estimation results

Table 1 displays the point estimates of the matched moments calculated from the IRS

and REIS data and the corresponding moments obtained from the simulation of our

model with the estimated parameters. The column "All Moments Matched" refers to the estimation results from our baseline specification. The remaining columns report robustness checks discussed in the next subsection. Overall, our model is able to replicate the observed moments closely. In fact, the $\chi^2(2)$-distributed overidentification test reported at the bottom of the table does not reject our model.
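The p-values of the overidentification test can be reproduced from the test statistics with closed-form chi-square tail probabilities. The helper below is a sketch that covers only the one and two degrees-of-freedom cases needed here; its name is illustrative.

```python
import math

def chi2_tail_probability(statistic, dof):
    """Upper tail probability of a chi-square variable for dof = 1 or 2."""
    if dof == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if dof == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch handles only 1 or 2 degrees of freedom")

# Baseline statistic from Table 1 (two degrees of freedom)
print(round(chi2_tail_probability(0.8212, 2), 3))
```

With six moments and four estimated parameters there are two overidentifying restrictions; dropping the average migration rate from the moment set leaves only one.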

Table 2 presents the estimates of the model parameters. The estimated migration

costs are US$ 33,230. This is substantially smaller than the estimates reported in pre-

vious contributions such as Davies, Greenwood, and Li (2001) or Kennan and Walker

(2009).

The estimated correlation of income shocks across regions is 23.99%. This is substantially smaller than the observed correlation of realized incomes (58.07%, see Table 1). Realized incomes co-move more strongly than the shocks to income because migration ties the average incomes in both regions together more closely than they would be tied without migration. This drives a wedge between the correlation of income shocks and the correlation of average realized incomes, the latter being larger than the former.

The estimated fraction of income shocks that is aggregate amounts to 0.36%.[14] There is a significant transitory income component (measurement error) in the aggregate income fluctuations, which has an estimated standard deviation of 0.0270. This means that transitory fluctuations in aggregate income add a variance term that amounts to about 41.5% of the long-run variance of the sum of potential incomes and measurement error. However, migration smooths realized income, so that transitory shocks make up 82.8% of the aggregate variance in realized incomes (again the sum of persistent and transitory fluctuations). As outlined before, the transitory income component of our model relates to two sources: truly transitory fluctuations in incomes and the fact that income does not measure migration incentives perfectly.

[14] To put this number into perspective, we calculate the long-run variance of per capita state income, accounting for a linear time trend and fixed effects, and set this number relative to the implied long-run variance of household incomes as reported by Storesletten et al. (2004). While this benchmark ratio is, at 0.6%, somewhat larger, it also points towards aggregate income shocks being very small compared to idiosyncratic ones.

It is useful to understand the source of identification of the parameters for this model. Table 3 summarizes the response of the moments to variations in the parameters for the baseline model specification ("All Moments Matched"). One can see that migration


Table 1: Simulated moments estimation: moments estimates

                                   Actual     Simulated Moments
                                   Data       All Moments   w/o Aver.     50 Yrs         CRRA
Moment                             Moments    Matched       Migration     Working Life   (log utility)
                                                            Rate          (δ = 2%)

Migration Rates
  standard deviation                0.0036     0.0036        0.0036        0.0036         0.0036

Aggregate Regional Income
  standard deviation                0.0299     0.0297        0.0297        0.0297         0.0298
  correlation across regions        0.5807     0.5663        0.5713        0.5740         0.5751

Reduced Form Regression
  intercept (≈ migration rate)      0.0393     0.0393       (0.0399)[1]    0.0393         0.0393
  sensitivity to income
    of destination region           0.0603     0.0606        0.0616        0.0599         0.0604
  sensitivity to income
    of source region               -0.0624    -0.0587       -0.0545       -0.0579        -0.0612

Overidentification test χ²(2)                  0.8212        2.908[2]      0.7291         0.1037
  p-value                                      0.6632        0.0881        0.6954         0.9495

[1] Not matched; [2] Only one degree of freedom.
The column "Actual Data Moments" refers to the moments estimated from the combined REIS/IRS data set, with data on 50 US states and D.C. over the period 1989-2004. The columns "Simulated Moments" refer to the moments estimated from the simulation of the model using the parameters given in Table 2. Both actual and simulated data are within-transformed and linearly de-trended. The simulations generate a panel of 51 region-pairs and a 71-year history of migration and income data. The first 55 years of simulated data are dropped in order to minimize the influence of initial values. Each simulation is repeated 5 times and data moments are compared to the average over the 5 replications of the simulation.


Table 2: Simulated moments estimation: structural parameter estimates

                                        All Moments   w/o Aver.     50 Yrs         CRRA
Parameter                               Matched       Migration     Working Life   (log utility)
                                                      Rate          (δ = 2%)

Migration Costs                         33,230[1]     31,557[1]     29,744         0.5387[2]
                                        (4,318)       (12,756)      (4,596)        (0.081)

Correlation of Income Shocks            0.2399        0.2449        0.2820         0.250
Across Regions ψ                        (0.1827)      (0.2971)      (0.1920)       (0.1589)

Fraction of Income Shock                0.0036        0.0035        0.0036         0.0034
due to Aggregate Fluctuations α         (0.0011)      (0.0042)      (0.0011)       (0.0012)

Standard Deviation of                   0.0270        0.0271        0.0268         0.0269
Transitory Income Shock σ_φ             (0.0012)      (0.0027)      (0.0012)       (0.0012)

[1] Migration cost estimate c in US$ terms. [2] exp(c) - 1 measures the relative income gain necessary to offset migration costs.
Standard errors in parentheses. Estimation is carried out using the simulated moments estimator by Gourieroux, Monfort, and Renault (1993), which chooses structural model parameters by matching the moments from a simulated panel of regions with data moments as displayed in Table 1. The simulations generate a panel of 51 region-pairs and a 71-year history of migration and income data. The first 55 years of simulated data are dropped in order to minimize the influence of initial values. Each simulation is repeated 5 times and data moments are compared to the average over the 5 replications of the simulation.


Table 3: Simulated moments estimation: Jacobian

                     Moments
                     Mig. Rates   Agg. Regional Income       Reduced Form Regression
                     sd           sd         corr across     intercept        sensitivity to income
Parameter                                    regions         (≈ mig. rate)    destination   source

Mig. Costs c         -0.4335      0.0148     -0.0764         -0.5060          -0.2872       -0.3116
Corr. ψ              -0.0750      0.0128      0.0483         -0.0859          -0.1228       -0.1302
Fraction α            0.1768      0.0594     -0.0011          0.0030           0.1113        0.1037
Trans. Shock σ_φ      0.0000      0.7722      0.0180          0.0000          -1.7352       -1.6636

The table contains the Jacobian (in the form of elasticities $(\partial m_i / \partial \theta_j)(\theta_j / m_i)$) of the baseline model estimate with parameter estimates shown in Table 2. The derivatives are calculated numerically on the basis of simulations that generate a panel of 51 region-pairs and a 71-year history of migration and income data. The first 55 years of simulated data are dropped in order to minimize the influence of initial values. Each simulation is repeated 5 times and data moments are compared to the average over the 5 replications of the simulation.

costs have a strong influence on all moments directly relating to migration: the time-series standard deviation of migration rates, average migration, and the sensitivity of migration to aggregate income in the source and destination regions. In other words, these aggregate moments identify migration costs. An increase in migration costs also has a weak negative impact on the correlation of realized average incomes across regions. This helps to identify the correlation of income shocks, $\psi$, which has the opposite effect on the correlation of realized average incomes, while its impact on all other moments is negative, as for migration costs.

Making aggregate income shocks stronger, i.e. increasing $\alpha$, increases the overall volatility of migration rates and makes them more sensitive to income differences, but does not affect the level of migration rates. Obviously, $\alpha$ also increases aggregate income volatility. Similarly, an increase in the standard deviation of transitory shocks, $\sigma_{\varphi}$, increases aggregate realized income volatility, but it decreases the sensitivity of migration to income differences.
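A Jacobian in elasticity form, as reported in Table 3, can be approximated by central finite differences. The sketch below is generic: `moment_fn` stands for any function mapping a parameter vector to a moment vector (in the paper this would be a full model simulation, which is not reproduced here), and the relative step size is an arbitrary choice of this example.

```python
import numpy as np

def elasticity_jacobian(moment_fn, params, rel_step=1e-2):
    """Elasticities (dm_i / dtheta_j) * (theta_j / m_i); rows are parameters, columns moments."""
    params = np.asarray(params, dtype=float)
    base = np.asarray(moment_fn(params), dtype=float)
    jac = np.empty((params.size, base.size))
    for j in range(params.size):
        step = rel_step * params[j]
        up, down = params.copy(), params.copy()
        up[j] += step
        down[j] -= step
        # Central difference approximation of the derivative of each moment
        deriv = (np.asarray(moment_fn(up)) - np.asarray(moment_fn(down))) / (2.0 * step)
        jac[j] = deriv * params[j] / base
    return jac
```

The sketch requires nonzero parameters and nonzero baseline moments. With simulated moments, the same random draws should be reused across evaluations so that simulation noise does not swamp the finite differences.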


5.3 Robustness checks

One might argue that our relatively low migration-cost estimate is driven by the fact that we attribute all migration to the income incentive. Correspondingly, the average migration rate our model should match would have to be lower. The downside of such an argument is that it would be difficult to tell which migration rate is the correct one to match. Therefore, taking an agnostic view on this point, we provide a robustness check of our estimation results, excluding the average migration rate from the set of moments our model is calibrated to. The estimation results are reported in the corresponding

columns of Tables 1 and 2. As one can see, the point estimate for migration costs

changes only slightly. Since we drop one moment condition, the standard errors of the

estimates increase.

In addition, we check the robustness of our results to the assumption of an expected

working life of 40 years. If we instead assume a dying probability of 2% and hence an

expected working life of 50 years the estimated migration costs decrease slightly.

As an additional robustness check, we replace the assumption of a risk-neutral agent with constant relative risk aversion. Instead of linear utility from income, we now assume logarithmic utility. While one can interpret the risk-neutrality assumption as a shortcut for modelling an agent who has access to perfect capital markets, the assumption of logarithmic utility from income can be thought of as a model of restricted capital market access. The last columns of Tables 1 and 2 report the corresponding estimation results. The estimated cost parameter $c$ is no longer readily interpretable as a US$ value; instead, it reflects the relative income gain necessary to offset the costs of migration. This means that the money measure of migration costs at average income is obtained by multiplying $\exp(c) - 1$ by average income. Our estimate of $c = 0.5387$ implies an estimated migration cost of US$ 35,736. Interestingly, the overidentification test does not reject any of the robustness checks (at the 5% level). Yet, as we will see, the test rejects models that ignore dynamic self-selection. Further robustness checks can be found in the appendix, where we estimate $\sigma^2_{\varepsilon}$ within the model or fix it to a value that is 40% higher than in the baseline case. The main findings do not change.
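The conversion of the log-utility cost parameter into a money measure is a one-line calculation. In the sketch below the function name is illustrative, and the value used for log average income (10.822, the figure quoted in Section 6.1 for the simulation under the estimated costs) is an assumption, since the text does not state which average income enters the reported number.

```python
import math

def money_cost_from_log_utility(cost_param, log_average_income):
    """Money measure of migration costs: (exp(c) - 1) times average income."""
    return (math.exp(cost_param) - 1.0) * math.exp(log_average_income)

cost_in_dollars = money_cost_from_log_utility(0.5387, 10.822)
print(round(cost_in_dollars))
```

Under this assumption the result lies within about 0.1% of the reported US$ 35,736; the small gap is consistent with rounding in the inputs.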

6 Behind the scenes

A simulation based estimation technique like a method of simulated moments is, by

construction, somewhat of a black box. Therefore, we try to shed some light on our esti-

mation results by running counterfactual simulations and by comparing the estimation

to static estimation techniques.


6.1 Zero migration costs

We start with a simulation of the model with migration costs set to zero. In a situation in

which unobservable migration incentives are serially uncorrelated, i.e. drawn completely

anew every period, migration rates would be 50% on average in the absence of migration

costs. In such a situation of zero costs and i.i.d. incentives, half of the population in one region is better off every period by moving to the other one. By contrast, this is no longer true if there is serial correlation of incentives. Agents self-select into the region where they are better off, and only those agents who were on the margin, on the verge of moving, in the previous period are likely to migrate in the current period. To

illustrate this point we simulate our model for the counterfactual case of zero migration

costs, setting all other parameters to their estimated values. In Table 4 we report some

summary statistics for this experiment.

The most important result of this simulation experiment concerns the average mi-

gration rate. Only 12.9% of the population migrates each year even if migration costs

are zero. This shows why a dynamic view on migration incentives leads to much lower

estimates of migration costs.
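The mechanism can be illustrated with a deliberately stripped-down simulation. The sketch assumes that the income differential between the two regions follows a stationary Gaussian AR(1) process and that, with zero costs, each agent simply lives where her income is higher. It omits births, deaths, aggregate shocks, and the small numerical cost used in the paper, so it is not meant to reproduce the 12.9% figure.

```python
import numpy as np

def zero_cost_migration_rate(persistence, n_agents=100_000, n_periods=20, seed=0):
    """Average share of agents who switch region per period when moving is free."""
    rng = np.random.default_rng(seed)
    shock_sd = 1.0
    # Start from the stationary distribution of the differential d = w_A - w_B
    diff = rng.normal(0.0, shock_sd / np.sqrt(1.0 - persistence**2), n_agents)
    rates = []
    for _ in range(n_periods):
        diff_next = persistence * diff + rng.normal(0.0, shock_sd, n_agents)
        # An agent moves exactly when the sign of the differential flips
        rates.append(np.mean((diff_next > 0) != (diff > 0)))
        diff = diff_next
    return float(np.mean(rates))
```

With `persistence=0.0` the rate is close to one half, while with `persistence=0.95` it falls to roughly one tenth (analytically, the probability of a sign flip for jointly normal variables with correlation ρ is arccos(ρ)/π). Persistence alone, without any migration cost, already makes most agents stay put.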

Besides, we observe that in a world without migration costs, realized incomes are more strongly correlated, vary less, and are significantly higher (+20.44%) than in a world without any migration. However, the increase is only mild compared

to the situation under the estimated migration costs. There, log average incomes are

10.822 and hence average incomes are already 18.77% larger than without migration.

Only moves that lead to small income gains are added by setting migration costs to

zero.

6.2 Estimation ignoring dynamic self-selection

The evidence from the zero-cost specification suggests that it is indeed the dynamic nature of the problem, with agents sorting themselves into their preferred region, that changes the estimation of migration costs substantially. To provide further evidence, we run two alternative estimation experiments.

First, we re-estimate our model, setting the autocorrelation of income to zero while keeping the long-run variation of income fixed. This experiment tells us what role keeping track of the distribution of migration incentives plays in our model and estimation.

Second, we estimate an approximate version of our model, where the dynamic self-selection that shapes incentive distributions is ignored. In this experiment, we replace the conditional density in (10) by its unconditional counterpart. If self-selection played no role, this replacement would be innocuous: the former place of residence would not be informative for unobservable migration incentives, and conditional and unconditional distributions would coincide. Hence, we should obtain estimation results similar to those of our baseline specification if self-selection were of no concern.

As we will see, both versions that ignore the dynamic setup lead to higher estimated migration costs.

Table 4: Simulation Results: Zero Migration Costs

                                        Data        Simulated
Moment                                  Moment      Moment

Migration rates
  standard deviation                     0.0036      0.0083

Income
  log average income[1]                 10.654[4]   10.836
  standard deviation                     0.0299      0.0292
  correlation across regions[2]          0.5808      0.6274

Reduced form regression[3]
  intercept (≈ migration rate)           0.0393      0.1286
  sensitivity of migration to income of:
    destination region                   0.0603      0.0501
    source region                       -0.0624     -0.0600

[1] Cross-sectional average income of a region; [2] Partial correlation controlling for a linear time trend and fixed effects; [3] Coefficients of a reduced form regression of migration rates on incomes in both regions; [4] Value used for simulation $(\mu + \sigma^2/2)$. Original data has been rescaled to have this mean value after de-trending.
The zero-cost specification assumes one US$ of migration costs for numerical feasibility. We simulate data on 51 region-pairs and a 71-year history of migration and income data. The first 55 years of simulated data are dropped in order to minimize the influence of initial values. The simulation is repeated 5 times. The table reports averages over the 5 repetitions.

6.2.1 Estimation with zero autocorrelation

Our first experiment is to set the autocorrelation of income to zero when estimating the model. We increase the variance of the income shock to $\sigma^2_{\xi} / (1 - \rho^2)$ in order to keep the long-run dispersion of income fixed. Setting the autocorrelation to zero makes our model a static model, as the expected value in the Bellman equation (7) becomes independent of the current state of migration incentives. The first column in Table 5 reports the estimation results from this experiment. The parameter estimates from the benchmark model with dynamic self-selection and autocorrelation are repeated for comparison (last column).

At US$ 57,713, migration cost estimates are roughly 80% higher in this setting than in the setting with autocorrelated income. Moreover, the model is no longer able to match the observed data moments, so that the model specification test now rejects the model. Compared to the estimates reported in the literature, estimated migration costs are still on the low side. However, two aspects need to be taken into account. First, with assumed zero autocorrelation of gains from migration, the net present value of these gains is very limited. In fact, the (naive) net present value gain from a once-and-for-all location decision is, in the absence of autocorrelation, about 10 times smaller than in our baseline specification. Second, we estimate a substantial measurement error in the no-autocorrelation specification, which dampens the increase in estimated migration costs. Under zero autocorrelation, the estimated measurement error is substantially larger, with a standard deviation of 0.0282, making up 95% of the sum of the long-run variances of potential income and measurement error.
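One way to read the "about 10 times smaller" statement is as a ratio of discounted sums. The sketch below assumes that a current income gain decays geometrically at the autocorrelation rate and is discounted at the discount factor, ignoring the death probability and any option value of future moves. This is an interpretation offered for illustration, not a formula taken from the paper.

```python
def npv_multiplier(discount_factor, persistence):
    """Present value of a unit gain today that decays at rate `persistence` each year."""
    # Geometric series: sum over s >= 0 of (discount_factor * persistence) ** s
    return 1.0 / (1.0 - discount_factor * persistence)

ratio = npv_multiplier(0.95, 0.95) / npv_multiplier(0.95, 0.0)
print(round(ratio, 2))
```

With a discount factor of 0.95 and an autocorrelation of 0.95, the multiplier is a little above 10, whereas it equals 1 without autocorrelation, which matches the order of magnitude stated in the text.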

6.2.2 Estimation imposing the unconditional distribution of migration incentives

Potentially closer to the existing arguments for ignoring dynamic selection effects is a specification of the model that retains the autocorrelation in income, but uses the unconditional income distribution (instead of the income distribution conditional on the place of residence) to evaluate migration rates. Since migration rates are small each year, one might expect this approximation to be innocuous. Using this approximation in the estimation of the model means ignoring dynamic self-selection, but it otherwise leaves the micro-parameters of the problem unchanged. For example, from the micro perspective, the decision problem is still dynamic, and naive net present values of location choices remain unaltered.

Table 5: Simulated moments estimation: Estimation results from models without dynamic self-selection

                                        Benchmark Model,     Ignoring Dynamic   Benchmark Model,
Parameter                               No Autocorrelation   Self-Selection     Autocorr. = 0.95

Migration Costs[1]                      56,101               216,930            33,230
                                        (2,848)              (27,176)           (4,318)

Correlation of Income Shocks            0.19636              0.7101             0.2399
Across Regions ψ                        (0.1334)             (0.08577)          (0.1827)

Fraction of Income Shock                0.0001               0.0051             0.0036
due to Aggregate Fluctuations α         (0.0007)             (0.0015)           (0.0011)

Standard Deviation of                   0.0282               0.0263             0.0270
Transitory Income Shock σ_φ             (0.0017)             (0.00145)          (0.0012)

p-value, χ²(2) overident. test          0.0000               0.0000             0.6632

[1] Migration cost estimate c in US$ terms.
Standard errors in parentheses. Estimation is carried out using the simulated moments estimator by Gourieroux, Monfort, and Renault (1993), which chooses structural model parameters by matching the moments from a simulated panel of regions. See Table 1 for further details. The simulations generate a panel of 51 region-pairs and a 71-year history of migration and income data. The first 55 years of simulated data are dropped in order to minimize the influence of initial values. Each simulation is repeated 5 times and data moments are compared to the average over the 5 replications of the simulation.

The column "Ignoring Dynamic Self-Selection" in Table 5 reports the estimation results from this exercise. Neglecting self-selection is anything but harmless. The point estimates of all model parameters change substantially. Most importantly, and in line with our argument, the estimated migration costs are substantially larger at US$ 216,930. The bias in this last experiment is larger than in the previous experiment since there is no offsetting downward bias from a misrepresentation of the income process, i.e. of implied expected income streams.

Hence, treating migration as a dynamic decision problem at the micro level without taking care of dynamic self-selection in the aggregation may lead to a more severe bias than ignoring the dynamic structure of the migration decision altogether.
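The direction of this bias can be illustrated with a deliberately stylized simulation. The sketch below is not the estimated model: it assumes a single AR(1) income gap between the two regions and a fixed move threshold (`band`), and all parameter values are hypothetical. It compares the long-run migration rate when the gap distribution reflects past moves with the rate obtained by applying the same rule to the unconditional gap distribution.

```python
import numpy as np

def simulate_gap(n_agents=20_000, n_years=60, rho=0.95, sigma=0.1, band=0.25, seed=0):
    """Simulate income gaps under a fixed move threshold (hypothetical rule).

    g is the potential income gap (other region minus current region).
    An agent moves when g > band; after a move the gap changes sign.
    """
    rng = np.random.default_rng(seed)
    sd_uncond = sigma / np.sqrt(1.0 - rho**2)   # stationary s.d. of the AR(1) gap
    g = rng.normal(0.0, sd_uncond, n_agents)    # start from the unconditional distribution
    rates = []
    for _ in range(n_years):
        g = rho * g + rng.normal(0.0, sigma, n_agents)
        move = g > band
        rates.append(move.mean())
        g = np.where(move, -g, g)               # movers now live in the better region
    return np.array(rates), sd_uncond

def unconditional_rate(sd_uncond, band, n_draws=200_000, seed=1):
    """Migration rate implied by applying the same rule to the unconditional distribution."""
    rng = np.random.default_rng(seed)
    return (rng.normal(0.0, sd_uncond, n_draws) > band).mean()

rates, sd_uncond = simulate_gap()
selected_rate = rates[-20:].mean()              # long-run rate with self-selection
naive_rate = unconditional_rate(sd_uncond, band=0.25)
print(f"rate with self-selection:             {selected_rate:.4f}")
print(f"rate from unconditional distribution: {naive_rate:.4f}")
```

For a given threshold, the unconditional distribution implies more migration than the self-selected one in this sketch. A procedure that imposes the unconditional distribution would therefore need a wider band, that is, a higher cost, to match the same observed migration rate.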

7 Implied Age Patterns

So far, we have focused on the estimation of migration costs and the fact that dynamic self-selection plays an important role in this estimation. However, current contributions to the empirical study of migration go beyond the representative agent assumption of our homogeneous migration-cost model (with heterogeneous incentives). A well-documented pattern in migration data is that younger agents are significantly more likely to move than older agents.

The standard explanation of this pattern relies on the investment character of migration choices, the so-called human capital theory of migration. This theory rests on the fact that younger agents face a longer period in which their migration choices can pay off. As a consequence, younger agents are more likely to migrate, just as younger agents are more likely to invest in human capital by taking further education. While this human capital theory of migration explains well the difference in migration between job starters and agents close to retirement, it has difficulties in explaining the sharp decline in migration rates between ages 20 and 35 (Kennan and Walker, 2009). Accordingly, some authors have suggested further age-dependence of migration costs.

Our model provides an additional explanation for the age-migration relation. Agents start their working life in our model randomly assigned to one of the two regions. Agents then repeatedly choose whether to stay or to move to the alternative region, observing their current potential income differences, which result from their history of income shocks. This sequence of income shocks and migration choices has two consequences. First, over the course of their lives agents accumulate income risk since income is highly autocorrelated. Thus potential incomes become more dispersed between regions as agents get older. Second, an agent stays in a given region if she earns more income in her current region than in the alternative one. This means that ex ante (i.e. before income shocks realize) the match between agent and region becomes more efficient as agents get older. They have selected themselves into their preferred region. This increasing match efficiency implies that migration rates generally decline in age.

Figure 2: Simulation Results: Migration Rates by Age

[Figure: "Migration Rates by Age". Horizontal axis: Age (20 to 50). Vertical axis: Migration Rate (0.01 to 0.11). Series: Model, CPS, CPS rescaled.]

Model: Relative frequencies of migration conditional on age. The frequencies are obtained by simulating the behavior (migration, income, death and birth) of a cross-section of 20,000 households for 150 years. Reported frequencies are obtained by averaging over 5 repeated simulations using only the final 16 years of data in each simulation.
CPS: Average interstate migration rates of households by age from the Current Population Survey (CPS), civilian population of age 23-50. Average over the years 1989-2004; there is no data for 1995.
CPS rescaled: Same as above but rescaled to match the average migration rate from the IRS data.

To investigate this effect quantitatively, we simulate 20,000 households over 150 periods of time for each state, repeat the simulation 5 times, and store both the households' ages and their migration choices for the last 16 years of the simulation. This allows us to calculate average migration rates by age as displayed in Figure 2. We assume that a household enters the labor market at the age of 23. One can see that the migration rate falls as the household becomes older, but the drop in migration rates is smoothed over different ages. Agents have a probability to migrate of 10% at the time they enter the labor market, while agents who are 20 years older only migrate with a probability of 3%. This age pattern is induced by agents choosing their optimal location over time.

We compare this implied age pattern of migration with observed migration rates by age in the Current Population Survey (CPS). CPS migration rates are overall slightly lower than migration rates in the IRS data, so we also display a rescaled series that matches the average migration rate in the IRS data. As our model predicts, migration rates fall quickly in the first years after labor market entry. However, our model underpredicts the decline in migration rates at later ages. This is likely due to the eternal-youth structure of our model, which shuts down the human capital channel of migration described before. Understanding the dynamic self-selection channel as complementary to the human capital channel, we thus expect a richer model encompassing both channels to fit the observed age patterns in migration rates closely without making migration costs age-dependent.

The figure also reveals that migration rates do not decline monotonically in age in our model. At the time of entry into the labor market there is not much heterogeneity across agents, but this heterogeneity quickly grows as agents accumulate shocks to their potential incomes. This means that initially only a few agents observe income differences between both regions large enough to make them move. Therefore, under our estimated migration costs, migration rates are highest one year after labor market entry. After the first period there are many agents who would earn more in the other region but not sufficiently more to justify a move. In the second period, many of these agents observe an income shock that is large enough to induce a move, because income shocks are large relative to the observed heterogeneity at early ages. In the following periods the effect of dynamic self-selection dominates the effect of increasing heterogeneity, leading to the inverse hump-shaped pattern of migration rates.
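The initial rise in migration rates can be reproduced in a minimal cohort sketch. It assumes, purely for illustration, that entrants differ only by a single income shock, that the income gap follows an AR(1) process, and that agents move whenever the gap exceeds a fixed threshold. The function name and all parameter values are hypothetical and are not the paper's estimates.

```python
import numpy as np

def migration_rate_by_age(n_agents=200_000, n_ages=28, rho=0.95, sigma=0.1,
                          band=0.13, seed=0):
    """Share of a single cohort that moves at each age under a fixed threshold rule."""
    rng = np.random.default_rng(seed)
    gap = rng.normal(0.0, sigma, n_agents)      # entrants differ only by one shock
    rates = np.empty(n_ages)
    for age in range(n_ages):
        move = gap > band                       # move if the other region pays enough more
        rates[age] = move.mean()
        gap = np.where(move, -gap, gap)         # after a move the gap changes sign
        gap = rho * gap + rng.normal(0.0, sigma, n_agents)
    return rates

rates = migration_rate_by_age()
for age, rate in zip(range(23, 29), rates[:6]):
    print(age, round(float(rate), 3))
```

In this sketch the rate in the second year exceeds the rate at entry, because many agents who were just below the threshold receive one more shock. How quickly rates fall afterwards depends on the assumed threshold and persistence.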

8 Conclusion

We have provided a model of aggregate migration with a sound microeconomic foundation. The paper is a contribution to the recently evolving literature on structural models of migration. We explicitly deal with the problem that potential gains from migration are unobservable and display a dynamic character. This dynamic character of migration incentives has two aspects. First, the individual gains from migration evolve stochastically over time, but will typically be highly persistent. Second, at an aggregate level, the distribution of migration incentives is a result of past migration decisions themselves. Starting from the microeconomic decision problem allows us to keep track of the dynamics of the incentive distribution. These dynamics are driven by (dynamic) self-selection. Neglecting this self-selection results in biased estimates of structural parameters, such as migration costs. In our application to US interstate migration, we estimate migration costs to be substantially lower than reported in previous studies. The estimated migration costs amount to about US$ 33,230, which corresponds to two-thirds of an average annual income.

Our analysis calls for a careful treatment of the self-selection problem when economic

incentives are not fully observable but persistent. This is particularly relevant for the

analysis of migration. Rather than being drawn every period anew, migration incentives

have a long memory. One example of this long memory of migration incentives is the

persistency that income displays. We integrated the persistency of unobserved migration

incentives in a structural dynamic microeconomic model of the migration decision. This

consequently allowed us to simulate the joint behavior of the observed migration rates,

of the unobserved migration incentives, and of their observable proxies, i.e. incomes.

Addressing the partial unobservability of migration incentives may be important not only to macro-studies of migration. Even at a micro level, potential incomes are typically unobservable and have to be proxied. However, such approximation regularly neglects self-selection. If households live in their preferred place of residence as a result of their location choice, and if all observable things are equal, then it must be the unobserved component of their preferences that is in favor of the place in which they actually live. Besides unobservable parts of income, this unobservable component of preferences can also comprise different valuations of amenities and social networks. Even these factors can be expected to exhibit persistency.

Future research calls for a more complex microeconomic model that integrates more

information into the macroeconomic analysis, for example labor market conditions and

amenities. Additionally, it would be desirable to extend our bi-regional approach to

the case of multiple regions, as in Davies, Greenwood and Li (2001) and Kennan and

Walker (2009). Further aspects, such as the interaction of migration and local labor

markets, could be analyzed in a general equilibrium framework as in Coen-Pirani (2008),

but our results call for an explicit treatment of the dynamic structure and persistency

of migration incentives. However, all this goes beyond what is currently numerically

feasible, in particular if the model is meant to be estimated.

Taking a more general perspective, our paper highlights the role of dynamic self-selection in a model with imperfectly observed incentives. One can expect the econometric issues that we raise to carry over to other examples of dynamic discrete choice problems. Examples would be labor-market participation (see Keane and Wolpin, 2009) or product switching (see e.g. Sweeting, 2007). In these frameworks, too, our suggested solution may well be applicable: explicit aggregation and taking the incentive dynamics seriously.

References

[1] Adda, J. and R. Cooper (2003): "Dynamic Economics: Quantitative Methods and Applications", MIT Press, Cambridge.

[2] Aguirregabiria, V. and P. Mira (2008): "Dynamic Discrete Choice Structural Models: A Survey", Journal of Econometrics. Forthcoming.

[3] Blanchard, O. and L. Katz (1992): "Regional evolutions", Brookings Papers on Economic Activity, 1, 1-75.

[4] Breitung, J. and W. Meyer (1994): "Testing for unit roots in panel data: are wages on different bargaining levels cointegrated?", Applied Economics, 26, 353-361.

[5] Burda, M. (1993): "The Determinants of East-West German Migration: Some First Results", European Economic Review, 37, 452-462.

[6] Burda, M., M. Müller, W. Härdle, and A. Werwatz (1998): "Semiparametric Analysis of German East-West Migration Intentions: Facts and Theory", Journal of Applied Econometrics, 13, 525-541.

[7] Borjas, G. J. (1987): "Self-selection and the earnings of immigrants", American Economic Review, 77, 531-553.

[8] Borjas, G. J., S. Bronars, and S. Trejo (1992): "Self-selection and internal migration in the United States", Journal of Urban Economics, 32, 159-185.

[9] Caballero, R. and E. M. R. A. Engel (1999): "Explaining Investment Dynamics in U.S. Manufacturing: A Generalized (S,s) Approach", Econometrica, 67, 783-826.

[10] Chow, C. S. and J. N. Tsitsiklis (1991): "An optimal multigrid algorithm for continuous state discrete time stochastic control", IEEE Transactions on Automatic Control, 36, 898-914.

[11] Cushing, B. and J. Poot (2004): "Crossing boundaries and borders: Regional science advances in migration modelling", Papers in Regional Science, 83, 317-338.

[12] Coen-Pirani, D. (2008): "Understanding Gross Workers Flows Across U.S. States", mimeo, Carnegie Mellon University.

[13] Davies, P. S., M. J. Greenwood, and H. Li (2001): "A Conditional Logit Approach to U.S. State-to-State Migration", Journal of Regional Science, 41, 337-360.

[14] Decressin, J. and A. Fatas (1995): "Regional Labor Market Dynamics in Europe", European Economic Review, 39, 1627-1655.

[15] Gourieroux, C., A. Monfort, and E. Renault (1993): "Indirect inference", Journal of Applied Econometrics, 8, S85-118.

[16] Greenwood, M. J. (1975): "Research on internal migration in the United States: A survey", Journal of Economic Literature, 13, 397-433.

[17] Greenwood, M. J. (1985): "Human migration: Theory, models and empirical studies", Journal of Regional Science, 25, 521-544.

[18] Greenwood, M. J. (1997): "Internal migration in developed countries", in: Rosenzweig, M. R. and O. Stark (eds.), Handbook of population and family economics, Volume 1B, Elsevier, Amsterdam.

[19] Hassler, J., J. V. R. Mora, K. Storesletten, and F. Zilibotti (2005): "A Positive Theory Of Geographic Mobility And Social Insurance", International Economic Review, 46, 263-303.

[20] Heckman, J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply", Econometrica, 42, 679-94.

[21] Heckman, J. J. (1976): "The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models", Annals of Economic and Social Measurement, 5, 475-492.

[22] Heckman, J. J. (1978): "Dummy Endogenous Variables in a Simultaneous Equations System", Econometrica, 46, 931-59.

[23] Heckman, J. J. and R. Robb (1985): "Alternative Methods for Evaluating the Impact of Interventions", Journal of Econometrics, 30, 239-267.

[24] Hunt, G. L. and R. E. Mueller (2004): "North American Migration: Returns to Skill, Border Effects and Mobility Costs", Review of Economics and Statistics, 86, 988-1007.

[25] Keane, M. P. and K. I. Wolpin (2009): "Empirical Applications of Discrete Choice Dynamic Programming Models", Review of Economic Dynamics, 12, 1-22.

[26] Kennan, J. and J. R. Walker (2003): "The Effect of Expected Income on Individual Migration Decisions", NBER Working Papers, No. 9585.

[27] Kennan, J. and J. R. Walker (2009): "The Effect of Expected Income on Individual Migration Decisions", mimeo, University of Wisconsin.

[28] Lee, L. F. (1978): "Unionism and Wage Rates: A Simultaneous Equation Model with Qualitative and Limited Dependent Variables", International Economic Review, 19, 415-33.

[29] Lee, L. F. (1979): "Identification and Estimation in Binary Choice Models with Limited (Censored) Dependent Variables", Econometrica, 47, 977-96.

[30] Levin, A., C. F. Lin, and C. S. J. Chu (2002): "Unit Root Tests in Panel Data: Asymptotic and Finite Sample Properties", Journal of Econometrics, 108, 1-24.

[31] Low, H., C. Meghir, and L. Pistaferri (2008): "Wage Risk and Employment Risk over the Life Cycle", IZA DP No. 3700.

[32] Nakosteen, R. A. and M. Zimmer (1980): "Migration and Income: The Question of Self-Selection", Southern Economic Journal, 46, 840-851.

[33] Norets, A. (2008): "Implementation of Bayesian Inference in Dynamic Discrete Choice Models", mimeo, Princeton University.

[34] Norets, A. (2009): "Inference in Dynamic Discrete Choice Models with Serially Correlated Unobserved State Variables", Econometrica, forthcoming.

[35] Sjaastad, L. (1962): "The costs and returns of human migration", Journal of Political Economy, 70, 80-93.

[36] Storesletten, K., C. I. Telmer, and A. Yaron (2004): "Cyclical Dynamics in Idiosyncratic Labor-Market Risk", Journal of Political Economy, 112, 695-717.

[37] Sweeting, A. (2007): "Dynamic Product Repositioning in Differentiated Product Industries: The Case of Format Switching in the Commercial Radio Industry", NBER-WP 13522.

[38] Tauchen, G. (1986): "Finite state Markov-chain approximation to univariate and vector autoregressions", Economics Letters, 20, 177-181.

[39] Tunali, I. (2000): "Rationality of Migration", International Economic Review, 41, 893-920.


Appendix

A Derivation of equation (4)

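For the covariance between the error term $\eta_{i1}$ and location in period 0, $y_{i0}$ (as described in Section 2, eq. (4)), we obtain:

$$\operatorname{cov}(y_{i0}, \eta_{i1}) = E(y_{i0}\eta_{i1}) - E(\eta_{i1})E(y_{i0}) = E(\eta_{i1} \mid y_{i0} = 1)\Pr(y_{i0} = 1) - E(\eta_{i1})\Pr(y_{i0} = 1).$$

Since $E(\eta_{i1})$ is by assumption zero, this simplifies to

$$\operatorname{cov}(y_{i0}, \eta_{i1}) = E(\eta_{i1} \mid y_{i0} = 1)\Pr(y_{i0} = 1).$$

Using the definition of

$$\eta_{i1} := (w^*_{iA1} - w^*_{iB1}) + \nu_{i1}$$

and

$$w^*_{ij1} = \rho w^*_{ij0} + \varepsilon_{ij1}$$

we can rewrite $\operatorname{cov}(y_{i0}, \eta_{i1})$ as

$$\operatorname{cov}(y_{i0}, \eta_{i1}) = E\big(\rho(w^*_{iA0} - w^*_{iB0}) + \varepsilon_{iA1} - \varepsilon_{iB1} + \nu_{i1} \mid w_{iA0} > w_{iB0}\big)\Pr(w_{iA0} > w_{iB0})$$
$$= \rho E\big((w^*_{iA0} - w^*_{iB0}) \mid w_{iA0} > w_{iB0}\big)\Pr(w_{iA0} > w_{iB0}),$$

where the second equality follows from $\varepsilon_{ijt}$ and $\nu_{it}$ being orthogonal to all information available at time $t-1$ and mean zero. Making use of the definition of $w_{ijt} := \gamma_{jt} z_{it} + w^*_{ijt}$ and the normality of $w^*_{ijt}$ we obtain

$$\operatorname{cov}(y_{i0}, \eta_{i1}) = \rho E\big((w^*_{iA0} - w^*_{iB0}) \mid (w^*_{iA0} - w^*_{iB0}) > -(\gamma_{A1} - \gamma_{B1}) z_{i1}\big)\Pr(w_{iA0} > w_{iB0})$$
$$= 2\rho\sigma_0 \frac{\phi\left(\frac{-(\gamma_{A1} - \gamma_{B1}) z_{i1}}{2\sigma_0}\right)}{1 - \Phi\left(\frac{-(\gamma_{A1} - \gamma_{B1}) z_{i1}}{2\sigma_0}\right)} \Pr\big((w^*_{iA0} - w^*_{iB0}) > -(\gamma_{A1} - \gamma_{B1}) z_{i1}\big)$$
$$= 2\rho\sigma_0\, \phi\left(\frac{(\gamma_{A1} - \gamma_{B1}) z_{i1}}{2\sigma_0}\right),$$

where $\phi$ and $\Phi$ are the probability density function and cumulative distribution function of a standard normal distribution, respectively.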
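The last step relies on a standard property of the truncated normal distribution: if X is normal with mean zero and standard deviation s, then E[X | X > a] Pr(X > a) = s φ(a/s). A small numerical check of this identity is sketched below; the values of `s` and `a` are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

def truncated_moment(a, s):
    """Closed form for E[X | X > a] * Pr(X > a) when X ~ N(0, s^2)."""
    pdf = math.exp(-0.5 * (a / s) ** 2) / math.sqrt(2.0 * math.pi)
    return s * pdf

rng = np.random.default_rng(0)
s, a = 0.4, -0.1                      # hypothetical scale and cutoff
x = rng.normal(0.0, s, 1_000_000)
monte_carlo = np.mean(x * (x > a))    # sample analogue of E[X * 1{X > a}]
closed_form = truncated_moment(a, s)
print(monte_carlo, closed_form)
```

Because φ is symmetric, the sign of the cutoff inside φ does not matter, which is why the minus sign disappears in the final expression.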


B Existence and uniqueness of the value function

We begin with proving existence and uniqueness of the value function. Notation is as in the main text throughout this appendix, unless stated otherwise.

For ease of exposition, we assume that the income process is only approximately log-normal. In particular, we assume that income has a finite support.

Definition 1 Let $\mathcal{W} = [\underline{W}, \overline{W}]$ be the support of $w$.

Definition 2 Define a mapping $T$ according to the migration problem of a household, that is

$$T(u)(k, w_{iAt}, w_{iBt}) = \max_{j=A,B}\left\{\exp(w_{ijt}) - I_{\{k \neq j\}} c + \beta E_t u(j, w_{iAt+1}, w_{iBt+1})\right\}. \tag{14}$$

The mapping $T$ is defined on the set of all real-valued, bounded functions $\mathcal{B}$ that are continuous with respect to $w_{A,B}$ and have domain $D = \{A, B\} \times \mathcal{W}^2$.

Lemma 3 The mapping $T$ preserves boundedness.

Proof. To show that $T$ preserves boundedness one has to show that for any bounded function $u$ also $Tu$ is bounded. Consider $u$ to be bounded from above by $\bar{u}$ and bounded from below by $\underline{u}$. Then, $Tu$ is bounded, because

$$Tu = \max_{j=A,B}\left\{\exp(w_{ijt}) - I_{\{k \neq j\}} c + \beta E_t u(j, w_{iAt+1}, w_{iBt+1})\right\} \leq \exp(\overline{W}) + \beta\bar{u} < \infty, \tag{15}$$

and

$$Tu = \max_{j=A,B}\left\{\exp(w_{ijt}) - I_{\{k \neq j\}} c + \beta E_t u(j, w_{iAt+1}, w_{iBt+1})\right\} \tag{16}$$
$$\geq \max_{j=A,B}\left\{\exp(w_{ijt}) - I_{\{k \neq j\}} c + \beta\underline{u}\right\} \geq \exp(\underline{W}) + \beta\underline{u} > -\infty. \tag{17}$$

Lemma 4 The mapping $T$ preserves continuity.

Proof. Since $Tu$ is the maximum of two continuous functions, it is itself continuous.

Lemma 5 The mapping $T$ satisfies Blackwell's conditions.

Proof. First, we need to show that for any $u_1(\cdot) < u_2(\cdot)$ the mapping $T$ preserves the inequality. Since both the expectations operator and the max operator preserve the inequality, $T$ does also. Second, we need to show that $T(u + a) \leq Tu + \beta a$ for any constant $a$ and some $\beta < 1$. Straightforward algebra shows that

$$T(u + a) = Tu + \beta a. \tag{18}$$

Since $\beta < 1$ by assumption, $T$ satisfies Blackwell's conditions.

Proposition 6 The mapping $T$ has a unique fixed point on $\mathcal{B}$, and hence the Bellman equation has a unique solution.

Proof. Follows straightforwardly from the last three Lemmas.
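The contraction property can be checked numerically on a small discretized version of the household problem. The following is a sketch under stated assumptions, not the paper's implementation: incomes in the two regions follow independent Markov chains with the same hypothetical transition matrix `P`, and the grid, moving cost and discount factor are illustrative values.

```python
import numpy as np

def bellman_operator(u, w_grid, P, cost, beta):
    """One application of T to u[k, a, b] (k = current region, a/b = income indices)."""
    # Expected continuation value E[u(j, a', b') | a, b] for each destination j.
    cont = np.einsum("ax,by,jxy->jab", P, P, u)
    income_A = np.exp(w_grid)[:, None]            # income when living in A this period
    income_B = np.exp(w_grid)[None, :]            # income when living in B this period
    value_A = income_A + beta * cont[0]           # value of choosing A, before moving cost
    value_B = income_B + beta * cont[1]
    Tu = np.empty_like(u)
    Tu[0] = np.maximum(value_A, value_B - cost)   # household currently in A
    Tu[1] = np.maximum(value_A - cost, value_B)   # household currently in B
    return Tu

n, beta, cost = 5, 0.95, 0.5                      # hypothetical values
w_grid = np.linspace(-0.5, 0.5, n)
P = 0.8 * np.eye(n) + 0.2 / n * np.ones((n, n))   # rows sum to one

def T(u):
    return bellman_operator(u, w_grid, P, cost, beta)

rng = np.random.default_rng(0)
u1, u2 = rng.normal(size=(2, 2, n, n))
gap_before = np.max(np.abs(u1 - u2))
gap_after = np.max(np.abs(T(u1) - T(u2)))         # at most beta * gap_before
shift = T(u1 + 3.0) - T(u1)                       # equals beta * 3.0 everywhere

v = np.zeros((2, n, n))
for _ in range(5000):                             # value-function iteration
    v_next = T(v)
    done = np.max(np.abs(v_next - v)) < 1e-10
    v = v_next
    if done:
        break
residual = np.max(np.abs(T(v) - v))
print(gap_after <= beta * gap_before, residual)
```

The first check mirrors the contraction implied by Blackwell's conditions, the second reproduces equation (18), and the loop converges to the fixed point from an arbitrary starting guess.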

C Effect of Income Shocks on the Distribution of Income

Idiosyncratic shocks, aggregate shocks, death and birth of agents, and the persistency of the income process determine the transition from the distribution of income after migration to the distribution of migration incentives before migration in the next period. For surviving households the income distribution at the beginning of period $t+1$ results from adding idiosyncratic and aggregate shocks to the distribution of income after migration in period $t$, $\hat{F}_t$, of which $\hat{f}_{jt}(w_A, w_B)$ is the conditional density, see (12). This means that for a surviving household an income of $w_{ijt+1}$ in period $t+1$ can result from any possible combination of $w_{ijt}$ and $\xi_{ijt+1} = \theta_{jt+1} + \varepsilon_{ijt+1}$ for which

$$w_{ijt+1} = \mu_j(1 - \rho) + \rho w_{ijt} + \theta_{jt+1} + \varepsilon_{ijt+1} \tag{19}$$

holds. Solving this equation for $w_{ijt}$, we obtain

$$w^*_j(w_{ijt+1}, \theta_{jt+1}, \varepsilon_{ijt+1}) := w_{ijt} = \frac{w_{ijt+1} - (\theta_{jt+1} + \varepsilon_{ijt+1})}{\rho} - \mu_j\frac{(1 - \rho)}{\rho}. \tag{20}$$

This $w^*_j(w_{ijt+1}, \theta_{jt+1}, \varepsilon_{ijt+1})$ is the time-$t$ potential income in region $j$ that is consistent with a future potential income of $w_{ijt+1}$ and realizations of shocks $\theta_{jt+1} + \varepsilon_{ijt+1}$ at the beginning of period $t+1$. Now suppose that both kinds of shocks, $\theta$ and $\varepsilon$, have been realized. Then, $w^*_{A,B}$ is a one-to-one mapping of future incomes $(w_{iAt+1}, w_{iBt+1})$ to current income $(w_{iAt}, w_{iBt})$.

The conditional density of observing the future income pair $(w_{iAt+1}, w_{iBt+1})$ can thus be obtained retrospectively. The income pair $(w^*_A, w^*_B)$ of past incomes for a surviving household corresponds uniquely to a future income pair $(w_{iAt+1}, w_{iBt+1})$. Consequently, we can express the density of the income distribution at time $t+1$ (for surviving households) using the income distribution after migration $\hat{F}_t$ and its conditional density $\hat{f}_{jt}$. The density of the income distribution $\bar{F}_{t+1}$ conditional on surviving and the region and the vector of shocks is given by

$$\bar{f}_{jt+1}(w_A, w_B \mid \theta_{At+1}, \theta_{Bt+1}, \varepsilon_{iAt+1}, \varepsilon_{iBt+1}) = \hat{f}_{jt}\big(w^*_A(w_A, \theta_{At+1}, \varepsilon_{iAt+1}), w^*_B(w_B, \theta_{Bt+1}, \varepsilon_{iBt+1})\big). \tag{21}$$

Weighting this density with the density of the idiosyncratic shocks $h(\varepsilon_{iAt+1}, \varepsilon_{iBt+1})$ yields the density of observing the future income pair $(w_A, w_B)$ together with the idiosyncratic shocks $(\varepsilon_{iAt+1}, \varepsilon_{iBt+1})$:

$$\hat{f}_{jt}\big(w^*_A(w_A, \theta_{At+1}, \varepsilon_{iAt+1}), w^*_B(w_B, \theta_{Bt+1}, \varepsilon_{iBt+1})\big) \cdot h(\varepsilon_{iAt+1}, \varepsilon_{iBt+1}).$$

Integrating over all possible idiosyncratic shocks $(\varepsilon_{iAt+1}, \varepsilon_{iBt+1})$ yields the density $\bar{f}_{jt+1}$ of the income distribution before migration and conditional on surviving in period $t+1$ for a certain combination of aggregate shocks $(\theta_{At+1}, \theta_{Bt+1})$:

$$\bar{f}_{jt+1}(w_A, w_B \mid \theta_{At+1}, \theta_{Bt+1}) = \int \hat{f}_{jt}\big(w^*_A(w_A, \theta_{At+1}, \varepsilon_A), w^*_B(w_B, \theta_{Bt+1}, \varepsilon_B)\big) \cdot h(\varepsilon_A, \varepsilon_B)\, d\varepsilon_A\, d\varepsilon_B, \quad j = A, B. \tag{22}$$

Finally, the actual conditional distribution of potential incomes $F_{jt+1}$ and its density $f_{jt+1}$ are determined by a convex combination of $\bar{f}_{jt+1}$ (for surviving households) and the distribution of income shocks (for newborn households)

$$f_{jt+1}(w_A, w_B \mid \theta_{At+1}, \theta_{Bt+1}, z_{At+1}, z_{Bt+1}) = (1 - \delta)\bar{f}_{jt+1}(w_A, w_B \mid \theta_{At+1}, \theta_{Bt+1}) + \delta h(\varepsilon_A - z_{At+1}, \varepsilon_B - z_{Bt+1}).$$

For given aggregate states and shocks, this new distribution determines migration from region $j$ to region $-j$ according to equation (10) for period $t+1$.

The evolution of income distributions can thus be summarized as follows. Between two consecutive periods, the conditional distribution of potential incomes first evolves as a result of migration decisions, moving the density from $f_{jt}$ to $\hat{f}_{jt}$. Thereafter, the distribution is altered by aggregate and idiosyncratic shocks to income, moving the density from $\hat{f}_{jt}$ to $\bar{f}_{jt+1}$. Finally, a fraction of households dies and for this fraction the distribution $\bar{f}_{jt+1}$ is replaced by the distribution of income shocks. This leads to the new distribution $f_{jt+1}$, which determines migration decisions in period $t+1$, starting the cycle over again.
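On a discrete grid, the last two steps of this cycle (income shocks for survivors, then replacement of households that die) reduce to a matrix product followed by a convex combination. The sketch below assumes a one-dimensional income state for brevity; the function name, the three-state transition matrix and the death probability are hypothetical.

```python
import numpy as np

def next_distribution(f_after_migration, P, h_newborn, death_prob):
    """Discretized update: shocks for survivors, then mixing in newborn households.

    f_after_migration[s]  probability mass on income state s after migration
    P[s, s_new]           transition probabilities of the income process (rows sum to one)
    h_newborn[s]          distribution from which newborn households draw
    """
    f_survivors = f_after_migration @ P           # distribution after income shocks
    return (1.0 - death_prob) * f_survivors + death_prob * h_newborn

f_after_migration = np.array([0.5, 0.3, 0.2])
P = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])
h_newborn = np.array([0.2, 0.6, 0.2])
f_next = next_distribution(f_after_migration, P, h_newborn, death_prob=0.1)
print(f_next)
```

Because both ingredients are probability vectors, the result again sums to one, so the update can be iterated period by period.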

D Invariant distribution

We prove that migration decisions and idiosyncratic shocks to income imply that potential income follows an ergodic Markov process if there are no aggregate shocks. Therefore, there is an invariant distribution to which the sequence of income distributions converges. For simplicity, we present the proof for an arbitrary discrete approximation of the model with a continuous state space for income.

Lemma 7 Assume an arbitrary discretization of the state space with $n$ points for the potential income in each of the regions. Then, we can capture the transition from $f_t$ to $f_{t+1}$, which are the unconditional densities of the distribution of households over both regions and potential incomes, in a matrix

$$\Gamma = \begin{pmatrix} \Pi(I - D_A) & \Pi D_B \\ \Pi D_A & \Pi(I - D_B) \end{pmatrix} \in \mathbb{R}^{2n^2 \times 2n^2}.^{15}$$

In this matrix, $\Pi$ denotes the transition matrix that approximates the AR(1) process for income (including birth and death) by a Markov chain, see Adda and Cooper (2003, pp. 56) for details. Matrix $D_j$, $j = A, B$, is the $n^2 \times n^2$ diagonal matrix with the migration hazard rates for each of the $n^2$ income pairs of the income grid.

^15 Since we work with a discretization, strictly speaking $f$ is not a density, but a vector of probabilities for drawing a location-income possibility vector from a given element of the grid.

Proof. First, we take a discrete state space of $n$ possible incomes for each region, $w_{A1} \ldots w_{An}$ and $w_{B1} \ldots w_{Bn}$. Second, we denote the vector of probabilities that describes the distribution of potential incomes and household locations in the following form

$$f = \big(f(A, w_{A1}, w_{B1}) \;\ldots\; f(A, w_{An}, w_{B1}) \;\ldots\; f(A, w_{An}, w_{Bn}) \;\; f(B, w_{A1}, w_{B1}) \;\ldots\; f(B, w_{An}, w_{Bn})\big)'. \tag{23}$$

Analogously, we define the distribution after migration but before idiosyncratic shocks, $\hat{f}$. Taking our law of motion from (22), we obtain as a discretized analog

$$f_{t+1} = \begin{pmatrix} \Pi & 0 \\ 0 & \Pi \end{pmatrix} \hat{f}_t. \tag{24}$$

Now, define $d_h \in \{0, 1\}$ as the fraction of households that migrate and are in the $h$-th income and location triple given our vectorization of the income grid. This means that $d_h = \Lambda_j(w_{Ak}, w_{Bl})$, $h = 1 \ldots 2n^2$, where $(j, w_{Ak}, w_{Bl})$ is the $h$-th element in the vectorized grid. Moreover, define $D = \operatorname{diag}(d)$ as the diagonal matrix with migration rates on the diagonal and $D_A$ and $D_B$ as the diagonal matrices with only the first $n^2$ and the last $n^2$ elements of $d$, respectively. Then, we can describe the transition from $f_t$ to $\hat{f}_t$ by

$$\hat{f}_t = \begin{pmatrix} I - D_A & D_B \\ D_A & I - D_B \end{pmatrix} f_t. \tag{25}$$

Combining the last two equations, we obtain

$$f_{t+1} = \begin{pmatrix} \Pi(I - D_A) & \Pi D_B \\ \Pi D_A & \Pi(I - D_B) \end{pmatrix} f_t. \tag{26}$$

Lemma 8 For any distribution of idiosyncratic shocks with support equal to $\mathcal{W}^2$, matrix $\Pi$ has only strictly positive entries.

Proof. If the idiosyncratic shocks have support equal to $\mathcal{W}^2$, then every pair of potential incomes can be reached from every other pair of incomes as a result of the shock, because we assume the shocks to income to be approximately log-normal. Thus, all entries of $\Pi$ are strictly positive.

Lemma 9 $\Gamma^2$ has only positive entries.

Proof. Due to $\bar{c}_A = -\bar{c}_B$, we can assume an ordering of states such that we can write

$$D_A = \begin{pmatrix} I_{n_a} & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad D_B = \begin{pmatrix} 0 & 0 \\ 0 & I_{n_b} \end{pmatrix},$$

without loss of generality, where $I_z$ is a $z \times z$ unit matrix. Accordingly, we define partitions of $\Pi$ such that

$$\Pi = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix} = \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix} = \begin{pmatrix} D_1 & D_2 \\ D_3 & D_4 \end{pmatrix},$$

where $A_1 \in \mathbb{R}^{(n^2 - n_a) \times (n^2 - n_a)}$, $B_1 \in \mathbb{R}^{n_b \times n_b}$, $C_1 \in \mathbb{R}^{n_a \times n_a}$, $D_1 \in \mathbb{R}^{(n^2 - n_b) \times (n^2 - n_b)}$. This yields for $\Gamma^2$ after some tedious algebra

$$\Gamma^2 = \begin{pmatrix} B_2 C_3 & A_2 A_4 & B_2 C_4 & A_2 B_4 \\ B_4 C_3 & A_4 A_4 & B_4 C_4 & A_4 B_4 \\ D_1 C_1 & C_1 A_2 & D_1 D_1 & C_1 B_2 \\ D_3 C_1 & C_3 A_2 & D_3 D_1 & C_3 B_2 \end{pmatrix}.$$

Each entry of this matrix is positive, since $\Pi$ and hence its partitions are positive.

Proposition 10 Under the assumptions of the above Lemmas, migration and idiosyncratic shocks define an ergodic process with a stationary distribution $F_0 = \lim_{n \to \infty} \Gamma^n e_i$.

Proof. The above Lemma directly implies the ergodicity of the Markov chain.
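The construction in this appendix can be mimicked numerically. The sketch below builds the block transition matrix from a strictly positive income transition matrix and 0/1 migration indicators, checks that its square is strictly positive, and iterates it on two different unit vectors. The grid size, the random transition matrix and the threshold rule that defines the indicators are hypothetical choices for illustration only.

```python
import numpy as np

n = 4                                            # grid points per region (hypothetical)
grid = np.linspace(-1.0, 1.0, n)
wA, wB = np.meshgrid(grid, grid, indexing="ij")
wA, wB = wA.ravel(), wB.ravel()                  # the n^2 income pairs

rng = np.random.default_rng(0)
Pi = rng.uniform(0.1, 1.0, size=(n * n, n * n))
Pi /= Pi.sum(axis=0, keepdims=True)              # columns sum to one: Pi[new, old]

band = 0.5                                       # hypothetical move threshold
D_A = np.diag((wB - wA > band).astype(float))    # households in A that move to B
D_B = np.diag((wA - wB > band).astype(float))    # households in B that move to A
I = np.eye(n * n)

Gamma = np.block([[Pi @ (I - D_A), Pi @ D_B],
                  [Pi @ D_A,       Pi @ (I - D_B)]])

def iterate(start_index, n_steps=1000):
    """Apply the transition matrix repeatedly to a unit vector e_i."""
    f = np.zeros(2 * n * n)
    f[start_index] = 1.0
    for _ in range(n_steps):
        f = Gamma @ f
    return f

f_from_first = iterate(0)
f_from_last = iterate(2 * n * n - 1)
print(np.all(Gamma @ Gamma > 0), np.max(np.abs(f_from_first - f_from_last)))
```

Both starting points lead to the same limiting vector, which is left unchanged by a further application of the matrix, as expected for an ergodic chain.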

E Numerical aspects

The first step in solving the model numerically is to obtain a solution to (7). We do so by value-function iteration.^16 For this value-function iteration, we first approximate the bivariate process of potential incomes for an individual agent in regions A and B

$$w_{ijt} = \mu_j(1 - \rho) + \rho w_{ijt-1} + \xi_{ijt} \tag{27}$$

by a Markov chain. Because $w_A$ and $w_B$ are correlated through the correlation structure in $\xi$, it is easier to work with the orthogonal components $(w^+_A, w^+_B)$ of $(w_A, w_B)$ in the value function iteration.

We evaluate the value function on an equi-spaced grid for the orthogonal components with a width of $\pm 3.5\sigma^+_{A,B}$ around their means, where $\sigma^+_{A,B}$ denote the long-run standard deviations of the orthogonal components. The grid is chosen to capture almost all movements of the income distribution $F$ later on.^17 Given this grid, we use Tauchen's (1986) algorithm to obtain the transition probabilities for the Markov-chain approximation of the income process in (27).
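For reference, a univariate version of Tauchen's (1986) discretization can be sketched as follows. This is a minimal illustration for a single AR(1) component on an equi-spaced grid of plus or minus 3.5 long-run standard deviations, not the bivariate implementation used for the estimation; the function names and parameter values are hypothetical.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tauchen(n, mu, rho, sigma, width=3.5):
    """Markov-chain approximation of w' = mu*(1-rho) + rho*w + e, e ~ N(0, sigma^2)."""
    sd_long_run = sigma / math.sqrt(1.0 - rho**2)
    grid = np.linspace(mu - width * sd_long_run, mu + width * sd_long_run, n)
    step = grid[1] - grid[0]
    P = np.empty((n, n))
    for i in range(n):
        cond_mean = mu * (1.0 - rho) + rho * grid[i]
        for j in range(n):
            upper = (grid[j] + step / 2.0 - cond_mean) / sigma
            lower = (grid[j] - step / 2.0 - cond_mean) / sigma
            if j == 0:
                P[i, j] = norm_cdf(upper)                  # all mass below the first midpoint
            elif j == n - 1:
                P[i, j] = 1.0 - norm_cdf(lower)            # all mass above the last midpoint
            else:
                P[i, j] = norm_cdf(upper) - norm_cdf(lower)
    return grid, P

grid, P = tauchen(n=16, mu=0.0, rho=0.95, sigma=0.1)
print(grid[0], grid[-1], P.sum(axis=1))
```

Each row of `P` holds the probabilities of moving from one grid point to every other grid point, so rows sum to one by construction.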

We apply a multigrid algorithm (see Chow and Tsitsiklis, 1991) to speed up the calculation of the value function. This algorithm works iteratively. It first solves the dynamic programming problem for a coarse grid and then doubles the number of grid points in each iteration until the grid is fine enough. In between iterations the solution for the coarser grid is used to generate the initial guess for the value-function iteration on the new grid by spline interpolation. The initial grid has 16×16 points (income A × income B) and the final grid has 128×128 points.

The solution of (7) yields the optimal migration policy and thus the microeconomic migration hazard rates $\Lambda_j$. With these hazard rates, we can obtain a series of aggregate migration rates for a simulated economy as described in detail in Section 4.2 for any realization of aggregate shocks $(\theta_{jt})^{j=A,B}_{t=1 \ldots T}$ and an initial distribution $F_0$.

^16 See for example Adda and Cooper (2003) for an overview of dynamic programming techniques.

^17 The choice of $\pm 3.5\sigma^+_{A,B}$ is motivated as follows. We obtain in the estimation that about 99% of the income shocks is due to the idiosyncratic component. Therefore, we can expect 99.9% of the mass of the income distribution to fall within $\pm 3.29 \cdot (\sqrt{0.99}\,\sigma^+_{A,B}) = \pm 3.27\sigma^+_{A,B}$ around the mean of the distribution for any given year. Additionally, the mean income for each year moves within the band $\pm 3.29 \cdot (\sqrt{0.01}\,\sigma^+_{A,B}) = \pm 0.33\sigma^+_{A,B}$ in again 99.9% of all years. Since the sum of both components is $\pm 3.6\sigma^+_{A,B}$, a grid variation of $\pm 3.5\sigma^+_{A,B}$ should not truncate the income distribution.
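The arithmetic behind the grid width in footnote 17 can be retraced in a few lines. The 99% idiosyncratic share is taken from the footnote; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.9995)             # two-sided 99.9% quantile, about 3.29
idio_share = 0.99                            # share of shock variance that is idiosyncratic
within_year = z * sqrt(idio_share)           # band for the cross-sectional distribution
across_years = z * sqrt(1.0 - idio_share)    # band for the movement of the yearly mean
total = within_year + across_years
print(round(z, 2), round(within_year, 2), round(across_years, 2), round(total, 2))
```

All bands are expressed in units of the long-run standard deviation of the orthogonal income components, and their sum is close to the chosen grid width of 3.5.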

This means that we need an initial distribution of income $F_0$ to solve the sequential problem. Following Caballero and Engel's (1999) suggestion, we use the ergodic distribution of income $\bar{F}$ that would be obtained in the absence of aggregate income shocks.^18

To simulate a series of migration rates that correspond to the aggregate migration hazards $(\Lambda_{At,Bt})_{t=1 \ldots T}$, we draw a series of aggregate shocks (to the orthogonal basis) $(\theta^+_{At}, \theta^+_{Bt})_{t=1 \ldots T}$ from a normal distribution with variance $\lambda(\sigma^+_{A,B})^2$, $\lambda \in [0, 1]$. The weight $\lambda$ measures the relative importance of aggregate shocks, relative to idiosyncratic shocks, i.e. $\sigma^2_\varepsilon = (1 - \lambda)\sigma^2_\xi$ and $\sigma^2_\theta = \lambda\sigma^2_\xi$. Correspondingly, the orthogonal components of the idiosyncratic shocks have variance $(1 - \lambda)(\sigma^+_{A,B})^2$.

As we did to approximate the microeconomic income process for value function iteration, we also discretize the distribution of migration incentives over the chosen grid of income to simulate its evolution. Accordingly, we replace the conditional density in (10) by discrete probabilities. This means that for grid points $(\bar{x}_{A,k}, \bar{x}_{B,l})$, $k, l = 1 \ldots 64$ ($k, l$ being the indices of grid points), with a distance of $2h$ between points, we calculate the probabilities initially (for $t = 0$ and before the first aggregate shock) as

$$\bar{p}_{k,l,0} = \int_{\bar{x}_{A,k}-h}^{\bar{x}_{A,k}+h} \int_{\bar{x}_{B,l}-h}^{\bar{x}_{B,l}+h} f_0(x_1, x_2) \, dx_1 \, dx_2 .$$

An aggregate shock $\eta_j$ in $t = 0$ now implies that the off-grid pair $(\bar{x}_{A,k} + \eta_A, \bar{x}_{B,l} + \eta_B)$ occurs with probability $\bar{p}_{k,l,0}$ after the aggregate shock. To re-obtain on-grid probabilities, we use spline interpolation methods to find the on-grid probability after aggregate but before idiosyncratic shocks, $\tilde{p}_{k,l,1}$, restricting $p$ to take values between 0 and 1. That is, for each $t$ we define a function $\Psi_t$ with $\Psi_t(\bar{x}_{A,k} + \eta_{A,t+1}, \bar{x}_{B,l} + \eta_{B,t+1}) := \bar{p}_{k,l,t}$ and obtain $\tilde{p}_{k,l,t+1}$ as

$$\tilde{p}_{k,l,t+1} = \hat{\Psi}_t(\bar{x}_{A,k}, \bar{x}_{B,l}),$$

where $\hat{\Psi}_t$ is the interpolation of $\Psi_t$. Idiosyncratic shocks are accounted for by multiplying the after-aggregate-shock, on-grid income probabilities with the transition probability matrix obtained from Tauchen's algorithm; this yields $\hat{p}_{k,l,t+1}$, the probability to fall in the $k,l$-cluster after idiosyncratic shocks. The effect of migration on the distribution of migration incentives is captured by using a discretized version of (12).
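The updating of the discretized distribution described above (shift by the aggregate shock, interpolate back onto the grid, then apply the idiosyncratic transition matrix) can be sketched in Python as follows. This is a deliberately simplified illustration and not the procedure used in the paper: it works on a one-dimensional grid instead of the two-dimensional 64 x 64 grid, it swaps piecewise-linear interpolation in for the spline interpolation, and it renormalizes the probabilities after clipping, which is an assumption of this sketch. The names `interp_linear` and `update_distribution` are hypothetical, and the transition matrix is taken as given rather than computed with Tauchen's algorithm.

```python
def interp_linear(x, nodes, values):
    """Piecewise-linear interpolation; returns 0.0 outside the node range."""
    if x < nodes[0] or x > nodes[-1]:
        return 0.0
    for i in range(len(nodes) - 1):
        if nodes[i] <= x <= nodes[i + 1]:
            w = (x - nodes[i]) / (nodes[i + 1] - nodes[i])
            return (1.0 - w) * values[i] + w * values[i + 1]
    return values[-1]


def update_distribution(grid, probs, eta, transition):
    """One update step: aggregate shock first, idiosyncratic shocks second.

    grid       : equally spaced grid points
    probs      : probabilities attached to the grid points before the shock
    eta        : aggregate shock, shifting every grid point by the same amount
    transition : row-stochastic matrix, transition[i][j] = P(move from i to j)
    """
    # After the aggregate shock, probs[k] belongs to the off-grid point grid[k] + eta.
    shifted = [x + eta for x in grid]
    # Interpolate back onto the grid and restrict values to [0, 1].
    after_aggregate = [min(1.0, max(0.0, interp_linear(x, shifted, probs)))
                       for x in grid]
    total = sum(after_aggregate)
    if total == 0.0:
        raise ValueError("aggregate shock moved all mass off the grid")
    # Assumption of this sketch: renormalize so the probabilities sum to one.
    after_aggregate = [p / total for p in after_aggregate]
    # Idiosyncratic shocks: multiply by the transition probability matrix.
    n = len(grid)
    after_idiosyncratic = [
        sum(after_aggregate[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]
    return after_aggregate, after_idiosyncratic
```

Because the whole distribution is propagated, no sample of agents is drawn at any stage, so the simulated aggregate series carries no sampling noise.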

18 This distribution is calculated by assuming that idiosyncratic shocks $\omega$ have the full variance of $\xi$.


We calculate aggregate migration rates this way, i.e. by directly simulating the evolution of the incentive distribution instead of using a Monte Carlo method based on drawing a sample of agents, because the latter is not adequate in our case. We focus on aggregate behavior, but aggregate shocks turn out to be relatively small (being responsible for less than 1% of the total variation in income, see the discussion in Section 5.2). Hence, sampling variation would most likely exceed the true aggregate variation of income if we applied a Monte Carlo approximation.

F Data

Data on migration between US federal states is provided by the US Internal Revenue Service (IRS). The IRS uses individual income tax returns to calculate internal migration flows between US states. In particular, the IRS compiles migration data by matching the Social Security number of the primary taxpayer from one year to the next. The IRS identifies households with an address change since the previous year, and then totals migration to and from each state in the US to every other state. Given these bilateral migration flows, we compute aggregate gross immigration for the 50 US states and the District of Columbia as the sum of all immigrations from other US states to a particular state. Migration rates are calculated by expressing gross immigration as a proportion of the number of non-migrants reported in the IRS dataset. The IRS state-to-state migration-flow data is available for the years 1989-2004.
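The construction of gross immigration rates from bilateral flows can be sketched as follows. The function name `gross_immigration_rates`, the nested-dictionary layout, and the toy numbers are hypothetical and do not reflect the actual layout of the IRS files.

```python
def gross_immigration_rates(flows, non_migrants):
    """Gross immigration rate per state.

    flows[origin][destination] : households moving from origin to destination
    non_migrants[state]        : households that did not move across states
    """
    rates = {}
    for destination, stayers in non_migrants.items():
        # Sum the inflows from every other state into this destination.
        inflow = sum(
            row.get(destination, 0)
            for origin, row in flows.items()
            if origin != destination
        )
        rates[destination] = inflow / stayers
    return rates


# Toy example with three hypothetical states.
flows = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 20, "C": 0},
    "C": {"A": 10, "B": 30},
}
non_migrants = {"A": 1000, "B": 800, "C": 500}
rates = gross_immigration_rates(flows, non_migrants)
```

In the toy example, state A receives 20 + 10 = 30 households relative to 1000 non-migrants, i.e. a rate of 3%.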

Income per capita data is taken from the Regional Economic Information System (REIS) compiled by the Bureau of Economic Analysis (BEA). We use personal income as the relevant income concept. The REIS data is available online at www.bea.gov/bea/regional/reis/. The income-per-capita figure for the alternative region is computed as the population-weighted mean of all per capita incomes outside a specific state.
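A minimal sketch of the population-weighted income of the alternative region is given below. The name `complementary_income` and the toy numbers are hypothetical; whether the averaging is applied to incomes in levels or in logs is not specified here, and the sketch simply averages whatever values it is given.

```python
def complementary_income(income, population, state):
    """Population-weighted mean of per capita income over all other states."""
    others = [s for s in income if s != state]
    total_population = sum(population[s] for s in others)
    weighted_sum = sum(population[s] * income[s] for s in others)
    return weighted_sum / total_population


# Toy example with three hypothetical states (population in millions).
income = {"A": 30000.0, "B": 40000.0, "C": 50000.0}
population = {"A": 1.0, "B": 3.0, "C": 1.0}
alt_income_A = complementary_income(income, population, "A")
alt_income_B = complementary_income(income, population, "B")
```

For state A the alternative region consists of B and C, so its income is (3 x 40000 + 1 x 50000) / 4 = 42500.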

We remove a linear time trend from all data and express all variables as deviations from their unit-specific means (re-scaled by their overall mean), i.e. we apply a within-transformation. Table 6 reports descriptive statistics for the original as well as for the transformed data.
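One way to read this filtering step is sketched below. The sketch rests on two assumptions that the text does not settle: the linear trend is fitted by OLS separately for each unit, and "re-scaled by their overall mean" means that the overall mean is added back after demeaning (which is consistent with Table 6, where the filtered series keep the original means). The function name `detrend_and_within` and the toy panel are hypothetical.

```python
def detrend_and_within(panel):
    """Remove a unit-specific linear trend and unit mean, add back the overall mean.

    panel : dict mapping unit -> list of at least two observations ordered by year
    """
    all_values = [v for series in panel.values() for v in series]
    overall_mean = sum(all_values) / len(all_values)
    filtered = {}
    for unit, series in panel.items():
        n = len(series)
        t_mean = (n - 1) / 2.0
        y_mean = sum(series) / n
        # OLS slope of the series on a linear time index 0, 1, ..., n - 1.
        sxx = sum((t - t_mean) ** 2 for t in range(n))
        sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
        slope = sxy / sxx
        # Residual from the fitted line (mean zero by construction),
        # re-centered at the overall mean of the panel.
        filtered[unit] = [
            y - y_mean - slope * (t - t_mean) + overall_mean
            for t, y in enumerate(series)
        ]
    return filtered


# Toy panel: unit "A" is an exact linear trend, unit "B" fluctuates.
panel = {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 10.5, 9.5, 10.0]}
filtered = detrend_and_within(panel)
```

After the transformation every unit has the same mean, so the remaining variation is purely within-unit variation around the trend.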

In order to examine the time-series properties of the data employed, we perform a unit-root analysis for migration rates and income data. In a sample of size T = 16 and N = 51, either a Breitung and Meyer (1994) or a Levin, Lin, and Chu (2002) unit-root test is most appropriate. For the Breitung and Meyer (1994) test, we determined the optimal augmentation lag length by sequential t-testing. Taking into account three augmentation lags and time-specific effects, we can reject the null hypothesis of a unit root at the 5% level of significance. Similarly, the Levin, Lin, and Chu (2002) test rejects the null hypothesis of a unit root taking a linear time trend into account.

Table 6: Descriptive statistics

                                            Mean     Std. Dev.   Min      Max
Migration Rate                              .0393    .0178       .0144    .1146
Migration Rate Filtered                     .0393    .0036       .0234    .0629
Income per Capita                           10.71    .1644       10.39    11.27
Income per Capita Filtered                  10.71    .0312       10.61    10.82
Complementary Income per Capita             10.76    .0624       10.67    10.86
Complementary Income per Capita Filtered    10.76    .0239       10.73    10.80

G Further robustness checks

Table 7 displays the estimation results from two further robustness checks. First, we include the variance of income shocks in the set of model parameters to be estimated. This neither changes the point estimates of the other parameters substantially nor widens their confidence bounds. The variance of income shocks itself is estimated with a standard error of roughly 20% of the point estimate.

This motivates us to run another robustness check, where we increase the variance of shocks by 40% in order to understand how sensitively the other parameter estimates react to changes in this calibrated parameter. Again, changes are small. The estimated migration costs increase by 16.8%, while the other parameter estimates remain virtually unchanged.


Table 7: Simulated moments estimation: structural parameter estimates, further robustness checks

Parameter                                     Estimating the Variance       Variance of Persistent
                                              of Persistent Income          Income Shock $\sigma^2_\varepsilon$ set 40%
                                              Shock $\sigma^2_\varepsilon$                  higher than baseline (= .0289)

Migration Costs                               33,104^1                      38,659^1
                                              (4,347)                       (4,277)

Correlation of Income Shocks                  0.2523                        0.2803
Across Regions                                (0.1827)                      (0.1548)

Fraction of Income Shock due to               0.0011                        0.0014
Aggregate Fluctuations $\lambda$              (0.0012)                      (0.0001)

Standard Deviation of Transitory              0.0270                        0.0265
Income Shock $\sigma_\varphi$                 (0.0012)                      (0.0012)

Variance of Persistent                        0.0297                        0.0405^3
Income Shock $\sigma^2_\varepsilon$           (0.0062)                      -

Overidentification test $\chi^2(2)$           0.7943^2                      3.1225
p-value                                       0.3728                        0.2099

1 Migration cost estimate c in US$ terms. 2 Only one degree of freedom. 3 Not estimated.
Standard errors in parentheses. Estimation is carried out using the simulated moments estimator by Gourieroux, Monfort, and Renault (1993), which chooses structural model parameters by matching the moments from a simulated panel of regions with data moments as displayed in Table 1. The simulations generate a panel of 51 region-pairs and a 71-year history of migration and income data. The first 55 years of simulated data are dropped in order to minimize the influence of initial values. Each simulation is repeated 5 times and data moments are compared to the average over the 5 replications of the simulation.
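The arithmetic behind some of the entries in Table 7 can be checked with a few lines of Python. The sketch below only uses the standard library and the closed-form chi-square survival functions for one and two degrees of freedom; the function name `chi2_sf` is hypothetical, and the first test statistic is evaluated with one degree of freedom as stated in the table notes.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df = 1 or df = 2."""
    if df == 1:
        # For one degree of freedom: P(Z^2 > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # For two degrees of freedom the distribution is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


p_first = chi2_sf(0.7943, 1)    # first column, one degree of freedom
p_second = chi2_sf(3.1225, 2)   # second column, two degrees of freedom

# Relative change in the migration cost estimate between the two columns.
cost_increase = 38659 / 33104 - 1.0

# Ratio of the imposed variance to the baseline variance.
variance_ratio = 0.0405 / 0.0289
```

The two p-values come out close to the reported 0.3728 and 0.2099, the cost estimate rises by about 16.8%, and the imposed variance is about 1.4 times the baseline value.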
