on the dynamics of interstate migration: migration costs
TRANSCRIPT
On the Dynamics of Interstate Migration:
Migration Costs and Self-Selection
Christian Bayer�
Falko Juessenyz
IGIER and Universität BonnTechnische Universität Dortmund and IZA
First version: February 15, 2006This version: July 30, 2009
Abstract
This paper develops a dynamic structural model of migration decisions that is aggre-
gated to describe the behavior of interregional migration. Our structural approach
allows us to deal with dynamic self-selection problems that arise from the endogene-
ity of location choice and the persistency of migration incentives. The self-selection
problem is solved by keeping track of the distribution of migration incentives over
time. This econometric treatment has important consequences for the estimation
of structural parameters such as migration costs. For US interstate migration, we
obtain a cost estimate of roughly two-thirds of an average annual household income.
We show that not treating the dynamic self-selection problem properly leads to
substantially larger cost estimates.
KEYWORDS: Dynamic self-selection, aggregate migration, indirect inference
JEL-codes: C61, C20, J61, R23
�IGIER, Università Commerciale L. Bocconi, Via SRöntgen 1, 20139 Milano, Italy; phone: +39-02-5836 3386; email: [email protected]
yTechnische Universität Dortmund, Department of Economics, 44221 Dortmund, Germany; phone:+49-231-755-3291; fax: +49-231-755-3069; email: [email protected]
zWe would like to thank Francesc Ortega, Andreas Schabert, anonymous referees and conferenceparticipants at the NASM 2006, the SED Meeting 2006, the EEA Meeting 2006, the VfS Meeting 2006,the SMYE 2007, the SCE Meeting 2007, the ERSA Meeting 2007, LAMES 2008, and at seminars held atIZA, Universität Bonn, the EUI, and Università Bocconi for their helpful comments and suggestions. Partof this paper was written while C. Bayer was visiting fellow at Yale University and Jean Monnet fellow atthe European University Institute. He is grateful for the support of these institutions. Financial supportby the Rudolf Chaudoire Foundation is gratefully acknowledged. The research has been supported byDFG under Sonderforschungsbereich 475. We would like to thank Christian Wogatzke for excellentresearch assistance. A previous version of the paper has been circulated under the title "A generalizedoptions approach to aggregate migration with an application to US federal states".
1
1 Introduction
Migration choices are important economic decisions. Migration allows individual agents
to evade adverse shocks to their income and it is an important way of macroeconomic
adjustment (Blanchard and Katz, 1992, and Decressin and Fatas, 1995). Many factors
in�uence the decision to migrate and a vast empirical literature has analyzed how mi-
gration decisions are driven by economic incentives, in particular by income di¤erentials
(see Greenwood, 1975, 1985, and 1997 and Cushing and Poot, 2004 for survey articles).
Since migration is a dynamic discrete choice problem, advances in modelling these prob-
lems (see Keane and Wolpin, 2009, Norets, 2009, Aguirregabiria and Mira, 2008) have
opened up new frontiers for empirical research on migration too. This triggered a recent
interest in structural models of migration (see e.g. Davis Greeenwood and Li, 2001,
Hunt and Mueller, 2004, Kennan and Walker 2003, 2009, or Coen-Pirani, 2008).
In such structural models of within-country migration decisions, an important state
variable is the household�s potential income in alternative regions. Yet, this variable is
typically unobservable to the econometrician. Moreover, this unobservable state variable
can be expected to be serially correlated as found in the literature on income risk, see e.g.
Storesletten, Telmer and Yaron (2004) or Low, Meghir and Pistaferri (2008). Thus the
assumption of i.i.d. unobservables which is common in dynamic discrete choice models
is violated. Norets (2008) shows that wrongly assuming i.i.d. unobservables can create
signi�cant estimation biases in dynamic discrete choice models and therefore (Norets,
2009) develops a Bayesian estimation technique for this class of models with serially
correlated unobservables.
Relative to his setup, however, migration induces the additional problem of dynamic
self selection. Dynamic self selection occurs because optimizing agents repeatedly select
themselves into the region which is best for them, such that the observability of income
depends on the agents�past and present choices.
In non-repeated discrete-choice modelling ("now-or-never" type of decisions), various
solutions to self-selection problems have been discussed, see Heckmann and Robb (1985)
for an overview. In the context of migration, self-selection a¤ects the regression used to
proxy the agents�alternative income as well as the estimate of the decision rule itself as
has been highlighted by Nakosteen and Zimmer (1980). Their proposed solution builds
on a selection model of the type popularized by Heckman (1974, 1976, 1978) and Lee
(1978, 1979). However, it rests on the assumption of non-repeated discrete choice and
on residual income heterogeneity being i.i.d.
In this paper, we �rst highlight that a deviation from the i.i.d. assumption has stark
2
consequences for the estimation of structural parameters of migration models already
in a simpli�ed repeated choice setup, where we postulate a reduced form relation of
the initial location of an agent in a given period and past unobservables.1 Standard self
selection on residual potential incomes implies that we cannot assume the set of migrants
to be a random sample. Dynamic self-selection, i.e. repeated decision making, implies
additionally that such assumption not even holds for the initial set of agents situated in
a given location at a given point of time if there is autocorrelation in potential incomes.
The unobserved residual income of an agent is typically in favor of the place she currently
lives in. Agents have selected themselves (repeatedly) into the region where they are
better o¤.
Based on this insight, we develop a fully dynamic model of repeated migration choices
that allows us to take a classical, simulation-based estimation approach of the deep struc-
tural parameters while taking serial correlation in potential incomes and self-selection
into account. Our approach relies on explicitly modelling the dynamics of the distrib-
ution of realized incomes. Our modelling strategy follows Caballero and Engel�s (1999)
paper on investment, which highlights the interaction of lumpy investment and the evo-
lution of investment incentives. In the spirit of their model, we develop a microeconomic
structural model of migration which can be aggregated and used to describe the simulta-
neous evolution of unobservable migration incentives and migration rates at an aggregate
level.
This means we make explicit the movements in the distribution of unobserved po-
tential income and then use these distributions to describe average migration behavior.
In this setting, we can use aggregate migration data for the estimation of our model. We
use annual US state level migration �ows from 1989-2004 and �nd that the estimated
migration cost is 33,230 US$ for a typical move between US-states. This number is
substantially smaller than the ones reported in previous contributions, such as Davis,
Greenwood and Li (2001) or Kennan and Walker (2009). We then show that it can
generate a bias in estimated migration costs by an order of magnitude if one ignores the
endogeneity and the dynamics of the distribution of unobserved potential incomes.
While we can use our structural model to express and to track the dynamic evolution
of migration incentives at the macroeconomic level, we are constrained to a bi-regional
setup for numerical feasibility. This constraint arises because simulating the dynamic
evolution of migration incentives is numerically intense even if solving the microeconomic
1We focus on migration costs. They are a structural parameter of high interest, both at an individuallevel as well as from an aggregate perspective (Sjaastad, 1962), for example since more generous unem-ployment insurance schemes will receive more political support in countries where migration costs arehigh, see Hassler et al. (2005).
3
decision problem itself is quick. Due to the numerical intensity of simulating the incen-
tive distributions, we also have to focus exclusively on the migration-income relation,
excluding other factors that may be important to migration decisions and shape the
heterogeneity of the population in terms of their migration behavior.2
Nonetheless and despite our focus on a macroeconomic model of migration and on
its econometric treatment, we also document migration dynamics at the micro level that
di¤ers from a model which does not keep track of the incentive distribution. One of
the best documented facts from microdata is that younger households are more likely to
migrate than older ones. Our model is able to generate a distinct age pro�le of migration
rates as a result of dynamic self-selection such that it provides a new explanation for the
empirical age-migration patterns. We apply a perpetual-youth model and as a result,
age is not part of the decision problem of the agent and the decision problem remains
stationary. This implies that we have shut down the human capital channel that relates
age to migration (Sjaastad, 1962). And yet, age in�uences migration because it is an
argument of the distribution of migration incentives. The match between agent and
region becomes more e¢ cient as agents get older, since agents have selected themselves
into their preferred region.
In summary, our paper extends the discussion of self-selection in migration3 to a
dynamic discrete choice setup. At the same time, our approach complements previous
structural dynamic discrete choice models of migration which neglect self-selection, but
account for a richer set of factors that in�uence migration decisions and a more complex
decision problem involving multiple regions.
The remainder of this paper is organized as follows. Section 2 develops the key motive
of our paper. It illustrates on the basis of a simple two-period model why the evolution
of migration incentives and triggered migration choices have to be estimated simultane-
ously to avoid a bias from dynamic self-selection. Given these considerations, Section 3
presents a dynamic microeconomic model of the migration decision, which assumes that
an in�nitely-lived agent maximizes future expected well-being by location choice. In
Section 4, we show how to aggregate this model. We derive the contemporaneous law of
motion of the distribution of migration incentives and aggregate migration rates. Section
5 confronts the model with aggregate data on migration between US states and presents
the estimates of the structural parameters of the model, in particular the estimates of
migration costs. Section 6 investigates in more detail why ignoring self-selection leads to
2This research strategy links our paper to Coen-Pirani�s (2008) island-economy model of regionalmigration.
3See for example Nakosteen and Zimmer, 1980, Borjas, 1987, Borjas, Bronars, and Trejo, 1992, Tunali,2000, and Hunt and Mueller, 2004.
4
biases in the estimation and explores the magnitude of the bias. Section 7 investigates
the age patterns of migration our model implies. Section 8 concludes and an appendix
provides detailed proofs, details on the numerical model, as well as details on the data
employed.
2 Why a dynamic model of migration and migration incentives?
Most micro studies and lately also more macro studies on migration link the individual
migration decision to a probabilistic model in which agent i migrates at time t if the
gain in utility terms obtained by migration is large enough and exceeds some threshold
value c, see for example Davies, Greenwood, and Li (2001), Hunt and Mueller (2004),
or Kennan and Walker (2009).
2.1 Endogenous initial state
To illustrate the problems induced by dynamic self selection in such setup, we consider a
two-period, t = 0; 1; bi-regional example with regions A and B in this section. In Section
3 we develop an in�nite horizon, dynamic discrete choice model of migration that can
solve the problems highlighted here.
If yit denotes the region (yit = 1 for region A and yit = 0 for region B) in which
agent i resides at time t; the two-period model can be written as
y�i1=
(uiA1 � (uiB1 � c)(uiA1 � c)� uiB1
if agent i lives in A at time 0
if agent i lives in B at time 0
= (wiA1 � wiB1)� c (1� 2yi0) + �i1: (1)
yi1=
(1
0
if y�i1 > 0
if y�i1 � 0: (2)
Equation (2) is the decision rule the agent uses to decide her place of residence.
Here, y�i1 denotes the latent utility di¤erence the agent observes when living in region
A relative to living in B to which (1) gives a parametric form. We assume that the
�ow utility uij1 from living in region j depends only on incomes wij1: The parameter
measures the marginal utility of income. The utility-costs of migration are described
by c. The stochastic component �i1 re�ects di¤erences across agents, omitted migration
incentives, and/or some variability of migration costs.
Typically, we are interested in the structural parameters and c and hence would
estimate (2) to infer these parameters. A direct approach is in general impossible as
potential migration gains are unobservable to the econometrician. i.e. we observe wij1only if the agent chooses to live in j.
5
A standard approach to solve this problem is to proxy the unobservable potential
income by the income a similar agent realizes in the other region using a Mincer-type
wage regression
wij1 = �j1zi1 + w�ij1;
where �j1 measures the sensitivity of wages to observables and w�ij1 is residual wage
heterogeneity.
Early on, Nakosteen and Zimmer (1980) highlighted that the self selection of agents
has to be taken into account when estimating the unobserved potential income gains.
This leads them to advocate a joint estimation of the latent income variable and the
migration choice based on a model of the type popularized by Heckman (1974, 1976,
1978) and Lee (1978, 1979).
Since we are in this paper not interested in the e¤ect of classical self-selection, we
assume that the econometrician actually knew �j1 and thus also the average gain from
migration.4 Nonetheless, if wage residuals w�ij1 are autocorrelated, the structural es-
timation of the decision problem will be biased if the place of residence is a result of
past decision making (dynamic self-selection). Consequently, the estimated migration
decision (2) fails to identify true migration costs.
Replacing wij1 by the estimates �j1zi1 in (2) ; we obtain for the latent variable y�i1
y�i1 = (�A1 � �B1) zi1 � c (1� 2yi0) + (w�iA1 � w�iB1) + �i1| {z }:=�i1
: (3)
The proxy-model (3) now contains a composed error term that combines the original
error �i1 from the discrete choice problem (2) and a measurement error (w�iA1 � w�iB1)that captures the residual income heterogeneity across agents after controlling for ob-
servables zi1: We assume this term is orthogonal to (�A1 � �B1) zi1: The necessary as-sumption for unbiased estimates of c is that (w�iA1 � w�iB1) is also orthogonal to theprevious place of residence yi0:
In a setup in which time periods are relatively short, e.g. years, this is unlikely. The
literature on income over the life cycle has documented relatively high autocorrelation
in incomes, even after controlling for individual characteristics and �xed individual het-
erogeneity, see for example Storesletten, Telmer and Yaron (2004) or Low, Meghir and
Pistaferri (2008).
4See Heckmann and Robb (1985) for various consistent estimators of �j1: In the terminology of theeconometric literature on selection, this assumption means that the problem of estimating treatmente¤ects can be readily solved.
6
Assume w�ijt follows an AR(1) process
w�ij1 = �w�ij0 + "ij1
with i.i.d. innovations "ij1: The initial conditions w�ij0 are i.i.d., drawn from a normal
distribution N�0; �20
�: Replacing w�ij1 in (3) ; we obtain
y�i1 = (�A1 � �B1) zi1 � c (1� 2yi0) + (� (w�iA0 � w�iB0) + "iA1 � "iB1) + �i1| {z }=�i1
:
As long as � 6= 0; corr (yi0; �i1) 6= 0 if the location in the previous period yi0 is a functionof the previous periods�residual income di¤erence (w�iA0 � w�iB0) : In general this will bethe case if the location yi0 has been a result of migration choice and thus is not random.
For example assume that households are distributed e¢ ciently across regions in t = 0,
in the sense that each household is in the region where it earns most income
yi0 =
(1
0
ifwiA0 > wiB0
ifwiA0 � wiB0:
One can think of this allocation rule as re�ecting migration choices in period 0 in the
absence of migration costs and decision errors �i0:
We can calculate the covariance of the composed error term �i1 with yi0 as (see
Appendix A)
cov (yi0; �i1) = 2 ��0�
�(�A1 � �B1) zi1
2�0
�> 0; (4)
where � is the density of the standard normal distribution. Note that the covariance
is positive, implying an upwards bias in the estimate of migration costs c: In addition,
the bias is neither constant across individuals nor across time. It is largest when the
deterministic di¤erences (�A1��B1)zi12�0are small, i.e. when regions are much alike.
The bias vanishes if � = 0; which corresponds to the model considered by Nakosteen
and Zimmer (1980). It also vanishes if the location in period t = 0; yi0; is not related
to the income di¤erence in t = 0: Then we have cov (yi0; �i1) = 0:That initial location
is unrelated to initial income di¤erences is likely for example if one looks at location at
the time of birth compared to the location at another �xed age. Research on internal
migration, however, has typically not looked at this type of data. Migration data that
comes at a yearly frequency typically re�ects the behavior of households who have faced
migration decisions repeatedly - even if they have taken the decision to stay most of the
time.
7
Figure 1: Distribution of potential incomes in region B relative to A
3 2 1 0 1 2 30
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Live in A Live in B
3 2 1 0 1 2 30
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
(a) overall population (b) conditional on living in region Aafter migration
3 2 1 0 1 2 30
0.2
0.4
0.6
0.8
1
1.2
1.4
Live in A Live in B
3 2 1 0 1 2 30
0.2
0.4
0.6
0.8
1
1.2
1.4
Live in A Live in B
(c) conditional on living in region A (d) conditional on living in region Aafter migration and idiosyncratic after migration, idiosyncratic,shocks and aggregate shocks
2.2 Dynamics of the distribution of potential incomes
The two-period model introduced above re�ects this repeated decision process only up
to a limit, though it highlights the general problem arising from dynamic self-selection.
To fully address the dynamic character of the migration decision, we extend the model
to the in�nite horizon in the next section. One important element of this in�nite horizon
model will be the dynamics of the distribution of unobserved income heterogeneity, as
sketched in Figure 1.
Suppose the composed error term �it is initially normally distributed as in Figure
1 (a). The �gure displays the distribution of potential incomes, (�At � �Bt) zit + �it:
8
Low values imply that income in region A is favorable, high values imply better income
prospects in region B: In the absence of migration costs, all agents with (�At � �Bt) zit+�it < 0 decide to live in region A and they decide to live in region B otherwise.
As a result of this self-selection, the distribution of income di¤erences changes for
the next period. No agent who lives in region A prefers to live in region B; see Figure
1 (b). E¤ectively, the right-hand part of the distribution in Figure 1 (a) has been cut
because all agents with higher income in region B have actually chosen B as the region
to live in.
Adding a normally distributed idiosyncratic income shock to the persistent income
di¤erence leads to the distribution of income di¤erences as displayed in Figure 1 (c).
The colored-in region indicates the set of agents that will migrate from A to B after the
idiosyncratic shocks occurred.
Besides idiosyncratic shocks, aggregate shocks to the average income di¤erence
(�A1 � �B1) zi1 also in�uence the migration decisions of agents. Figure 1 (d) showsthe distribution of migration incentives as in Figure 1 (c), but after an adverse shock to
region A: Aggregate shocks shift the income di¤erences for all agents and thus shift the
distribution of income di¤erences before migration without directly altering its shape.
By comparing Figures 1 (c) and 1 (d), one can see that the shape of the distribution after
migration (the region not colored in) di¤ers between both �gures. As a consequence, one
needs to keep track of the evolution of the incentive distribution to determine aggregate
migration. Therefore, we develop a model based on dynamic optimal migration decisions
in the presence of persistent shocks to income. This model can then be aggregated and
used to simulate the evolution of migration and its incentives over time.
3 A simple stochastic model of migration decisions
We consider an economy with two regions, A and B: This economy is assumed to be
inhabited by a continuum of agents of measure 1. Agents maximize future well-being over
an in�nite horizon by location choice. In each period a constant fraction � of randomly
selected agents dies and is replaced by newborn agents so that the overall population
remains constant ("perpetual youth model"). We model the economy in discrete time
and at each point in time an agent decides in which region to live and work. First,
we consider the decision problem of an individual agent i living in region j = A;B.
Thereafter, we discuss aggregation and the dynamics of the distribution of migration
incentives.
Living in region j at time t gives the agent utility ~wijt that we interpret as utility
from income, which is stochastic in our model. We assume incomes to be composed
9
of a permanent (autocorrelated) component ~wijt and a transitory (i.i.d.) component
'ijt.5 Both components vary over time and across individuals. We assume that only
the permanent component is observed before the migration decision. Consequently, in
describing migration behavior, we can focus exclusively on the e¤ect of permanent varia-
tions in potential incomes. Changes in the transitory component realize after migration
and hence do not a¤ect migration choice. However, when confronting our model to data,
including data on aggregate income, we need to take transitory income �uctuations into
account. This means that the microeconomic model does not need to include the tran-
sitory income component, while we have to take it into account in the aggregation of
realized incomes.
Moving from one region to the other comes at a cost. When an agent moves, she
is subject to a disutility c that enters additively in her utility function. Therefore, the
instantaneous utility function uit(j; k) is given by
uit (j; k) = ~wijt � Ij 6=kc (5)
for an agent that has lived in region k in the preceding period and now lives in region j:
Here, I denotes an indicator function, which equals 1 if the agent has moved from regionk to j and 0 if the agent had lived in region j in the preceding period.
The agent discounts future utility by factor � � (1� �) < 1 and maximizes the
discounted sum of expected future utility by location choice. The agent knows the
distribution of the permanent component of income ~wijt and forms rational expectations.
With ~wijt being stochastic, the potential migrant waits for good income opportunities.
In her migration decision the agent thus takes into account the option value to wait and
learn more about future incomes.6
The distribution of migration incentives, ~wijt; is assumed to be log-normal. In partic-
ular, we assume that log income, wijt; follows an AR(1) process with normally distributed
innovations �ijt and autoregressive coe¢ cient � :
ln ( ~wijt) =: wijt = �j (1� �) + �wijt�1 + �ijt; j = A;B: (6)
This process holds for the whole continuum of agents and each agent draws her own series
5See the evidence on transitory income �uctuations provided by Storesletten, Telmer, and Yaron(2004) for instance.
6Our model is based on the real-options approach to migration suggested by Burda (1993) and Burdaet al. (1998). Since the latter two papers only look at migration as a once and for all decision, theypreclude return migration and do not have to study the evolution of migration incentives, to which pastmigration decisions feed back.
10
of innovations �ijt for both regions. The expected value of log income in region j is �j .
The innovations �ijt are composed of aggregate as well as idiosyncratic components.
They have mean zero, are serially uncorrelated, but may be correlated across regions
A,B (see Section 4.2.2).7
The income distribution and migration costs, together with the utility function and
the discount factor de�ne the decision problem for the potential migrant. This is an
optimization problem, which is described by the following Bellman equation:
V (k;wiAt; wiBt) = maxj=A;B
�exp (wijt)� Ifk 6=jgc+ �EtV (j; wiAt+1; wiBt+1)
: (7)
In this equation, Et denotes the expectations operator with respect to information avail-able at time t:8 The optimization problem is stationary and in particular it is independent
of age as agents die with a constant probability �.
The optimal policy is relatively simple. The agent migrates from region k to region j
if and only if the costs of migration are lower than the sum of direct bene�ts of migration
expwijt � expwikt and the expected value gain
�V (wiAt; wiBt) := �Et [V (B;wiAt+1; wiBt+1)� V (A;wiAt+1; wiBt+1)] : (8)
This means that the agent migrates if and only if
c � exp (wijt)� exp (wikt) + �V (wiAt; wiBt) =: �c (wA; wB) : (9)
This gives a critical level of costs �c (wA; wB) at which an agent living in region A is
indi¤erent between moving and not moving to region B. A person moves from A to B if
and only if
c � �cA := �c (wA; wB) :
Conversely, a person living in region B moves to region A if and only if
c � �cB := ��c (wA; wB) :
Note that �c can be positive as well as negative. If �c is positive, region B is more attractive.
If it is negative, region A is more attractive and a person living in region A would only
have an incentive to move to region B if migration costs were negative.
7For technical reasons, we assume boundedness of �ijt; so that �ijt is in fact only approximatelynormal.
8Existence and uniqueness of the value function is proved in Appendix B.
11
4 Aggregate migration and the dynamics of income distributions
4.1 Aggregate migration
Given this trigger rationale for migration, the hazard rate
�j (wA; wB) :=
(1 if c � �cj (wA; wB)0 otherwise
, j = A;B
determines whether a person living in region j moves to the other region if she faces the
potential incomes (wA; wB).
Now, consider the distribution Ft of (potential) incomes (wA; wB) and household
locations. Suppose this joint income and location distribution is the distribution after
the income shocks �ijt have been realized, but before migration decisions have been
taken. Let fjt denote the conditional density of this distribution, conditional on the
household living in region j at time t: Then, the actual fraction ��jt of households living
in j that migrate to the other region evaluates as
��jt :=
Z�j (wA; wB) � fjt (wA; wB) dwAdwB: (10)
This aggregate migration hazard can be thought of as a weighted mean of all microeco-
nomic migration hazards �j (wA; wB), weighted by the density of income pairs (wA; wB)
from distribution Ft:
4.2 Dynamics of income distributions
The distribution Ft itself (and hence fjt) evolves over time and is a result of direct shocks
to income just as it is a result of past migration. In addition it is altered by the death
and birth of agents. We need to characterize the law of motion for Ft to close our model
and to obtain the sequence of aggregate migration rates.
4.2.1 The e¤ect of migration on income distributions
In order to follow the evolution of Ft we need to characterize both the evolution of
the fraction Pjt of households living in each region and the conditional distribution of
incomes fjt (conditional on a household actually living in a speci�c region j).
The proportion of households living in region j at time t+ 1 is a result of migration
decisions at time t. The law of motion for Pjt is thus given by
Pjt+1 =�1� ��jt
�Pjt + ���jtP�jt: (11)
12
The �rst part of the sum re�ects the fraction of households that remain in region j;
where�1� ��jt
�is the probability to stay in region j. The second part is the fraction
of households that migrate from region �j to region j: The probability of a householddying � does not in�uence the proportion of households living in each region as a dying
household is by assumption replaced by a newborn one in the same region.
Since the microeconomic migration hazard depends on (wA; wB) ; di¤erent potential
incomes result in di¤erent propensities to migrate. As a consequence, migration changes
not only the fraction Pjt of households living in region j at time t; but also the conditional
distribution of income, fjt: For example, households living in region A; earning a low
current income, wA; but facing a substantially higher potential income in B; wB; will
probably migrate. As a result, the number of those households will drop to zero in region
A after migration decisions have been taken, while the number of households facing a
smaller income di¤erential might not change, recall Figure 1.
The distribution of migration incentives is thus a function of past migration decisions,
and we can express the new density of households with income (wA; wB) in region j after
migration, fjt; by
fjt (wA; wB) = [1� �jt (wA; wB)] fjt(wA;wB)PjtPjt+1+ ��jt (wA; wB)
f�jt(wA;wB)P�jtPjt+1
: (12)
The probability [1� �jt (wA; wB)] is again the probability to stay in region j. The
term fjt (wA; wB)Pjt weights this probability and is the unconditional income density
for region j before migration has taken place. To obtain the conditional density after
migration, the unconditional income density, fjt (wA; wB)Pjt; is divided by Pjt+1; which
is the fraction (or probability) of households living in region j after migration (i.e. in
time t+ 1): Analogously, the second part of the sum is constructed.
4.2.2 The e¤ect of income shocks on the income distribution
Besides migration, also shocks to income change the distribution of income pairs, Ft: The
shocks to income can be di¤erentiated along two dimensions: One dimension is aggregate
vs. idiosyncratic, the other one is region-speci�c vs. economy-wide. For a single agent
we can decompose the total potential income wijt in region j (see equation 6) into
an aggregate regional-component zjt and an individual-speci�c regional-component w�ijt
13
being driven by shocks �jt and "ijt; respectively:
wijt= zjt + w�ijt (13)
zjt= �j (1� �) + �zjt�1 + �jtw�ijt= �w
�ijt�1 + "ijt; j = A;B:
In case agent i is newborn in period t; we assume that she begins life without any past
idiosyncratic income advantage or disadvantage in region j; i.e. we set w�ijt�1 = 0:
The regional-aggregate shock �jt for region j hits all agents equally and changes their
potential income for region j: Note that this shock does not depend on the actual region
the agent lives in. For example, a positive shock �At > 0 increases the potential income
in region A for agents that currently live in this region as well as for agents that are
currently living in region B: They realize this potential income by deciding to actually
live in region A: The importance of economy-wide business cycles relative to the size of
region-speci�c aggregate �uctuations is re�ected by the correlation � between aggregate
shocks �At and �Bt.
However, aggregate shocks are typically only a minor source of income variation
for an agent. Agents di¤er in various personal characteristics that result in di¤erent
income pro�les over time. Individuals di¤er in their skills and while the demand may
grow for the skill of one person, demand may deteriorate for another person�s skills.
This heterogeneity is captured by the idiosyncratic shocks ("iAt; "iBt) : If "iAt is positive,
income prospects of the individual agent i increase in region A: The correlation "
between "iAt and "iBt re�ects economy-wide demand shifts for a person�s individual skills.
Since we assume aggregate and idiosyncratic shocks to be independent, the variance of
the total shock to income, �ijt; is the sum of the variances of idiosyncratic and aggregate
shocks: �2� = �2" + �2�.
Persistency in incomes is captured by the autoregressive parameter � in equation
(13) : We abstain from the inclusion of permanently �xed individual di¤erences (�xed
e¤ects) primarily because this makes the model numerically more tractable.9
9While the solution of the restricted dynamic programming problem of the agent can be obtainedquickly, the simulation of the distribution of migration incentives is numerically much more involved.The computation time for the estimation amounts to ca. 24h on a 4core Xeon 3GHz machine.If we were to include �xed e¤ects that re�ect di¤erent types of agents, the model would have had to
be solved for each di¤erent type of agent in the way it is now solved for the single type of agent. Thus,any modelling of k heterogeneous types of migrants would mean time need of k days for the estimation.The increase in computation time was even more severe if we were to model more than two regions
as this would increase the dimensionality of the problem, such that computation time would increaseexponentially.
14
Aggregate and idiosyncratic shocks to income, birth and death of households, as
well as income persistency jointly determine the transition from fjt to fjt+1; details
are provided in Appendix C. The latter density now determines migration decisions in
time t + 1; starting the cycle over again. As a result it is both past income shocks and
past migration decisions that drive the incentives to migrate. Making this explicit and
keeping track of the distributional dynamics of migration incentives is the key element
of our model, as it distinguishes our approach from other empirical models of migration.
4.3 Aggregate Income
To link our model to aggregate data, we �nally need to describe the evolution of aggregate
regional realized incomes. For region j; the fraction of income which is persistent and
realized after migration reads as the conditional expected value of exp (wijt)Zexp (wj) fjt (wA; wB) dwAdwB:
Besides persistent income also transitory components (orthogonal to the persistent part)
add to the �uctuations of income in observed data, so that we obtain for log aggregate
income �wjt
�wjt = ln
�Zexp (wj) fjt (wA; wB) dwAdwB
�+ 'jt:
The transitory income component 'jt measures �uctuations in income at a high fre-
quency that are irrelevant to the migration decision. More generally, it captures the
idea that in reality income measures migration incentives imperfectly. One reason is
that any empirical income concept is noisy as such. Personal income� the income con-
cept we use� comprises labor income as well as some components of capital incomes.
Whether each of these elements is too widely or too narrowly de�ned as a migration
incentive is not clear a priori. The inclusion of 'jt re�ects this agnostic view.
5 Estimation
5.1 Estimation technique
We rely on an indirect inference procedure in order to �nd the parameters of our model
that allow us to match closest the observed patterns of migration that are in the data.
In particular, we apply a method of simulated moments (MSM) as has been proposed by
Gourieroux, Monfort, and Renault (1993) to obtain estimates of structural parameters
when the likelihood function of the structural model becomes intractable, as in our
setting. This estimator relies on numerical simulation of the model. Details on the
15
numerical simulation are provided in Appendix E.
The idea behind a method of simulated moments is to choose a set of moments
that captures the characteristics of the data, and then to calibrate and simulate the
structural economic model such that the moments are best replicated by the simulation.
Accordingly, we have to 1) discuss how to use our model of two regions to address
migration between a richer set or regions, 2) decide which parameters of our model shall
be estimated and which are treated as �x, 3) decide on an informative set of moments
along which simulated and real-world data are compared, and 4) decide how to measure
the closeness of the simulated and actual moments.
A simulation of our model yields migration and income data for two regions. Of
course, the actual migrant faces a more complex decision problem than the one simulated
in our model of two regions. Including D.C. as a destination region, an agent has to
decide between 50 possible alternative states where she can move to. To make this
comparable to our model, the 50 alternatives in the data have to be aggregated to a
single complementary region.10 The average income of the alternative region is proxied
by the population-weighted average income over all alternative 50 states.11 With these
assumptions, we can use our model to simulate a data set for migration that has the same
size as the data set constructed from the Internal Revenue Service (IRS) migration data,
which is our empirical benchmark. This database contains annual area-to-area migration
�ow data for US states for the period 1989-2004.12 Accordingly, we simulate our model
for 51 pairs of regions and 81 years, but we drop the �rst 55 years for each region to
minimize the in�uence of our initial choice of the income distribution F0. We choose F0to equal the ergodic distribution in the absence of aggregate shocks, see Appendix D for
details. In order to reduce simulation uncertainty, we replicate each simulation 5 times
and use the averages over the simulations.
We cannot estimate all parameters of the model. This would be numerically infeasi-
10Generating arti�cial bi-regional data means that we technically assume the best income opportunityover all alternative regions to follow a log-normal distribution assumed in our model. An approximationof this sort cannot be avoided by assuming an extreme value distribution for incomes. This would onlywork if migration incentives were serially uncorrelated.11 Income data is taken from the REIS database, CPI de�ated, and in logs.12A detailed data description of both IRS and REIS data can be found in Appendix F. Alternative
migration data for the US, such as the Census, are less appropriate for our analysis as we focus on thee¤ect of income dynamics. While the Census reports changes in the place of residence over a periodof 5 years and is only available once every decade, the IRS data are available on a yearly basis. TheCensus data suggests an approximate annual migration rate of 2%, which is signi�cantly lower than themigration rate of 3.9% documented in the IRS data. This di¤erence may stem from the fact that theCensus data cannot take into account return migration over a 5 year period, which can be expected tobe of sizable importance (see the discussion in Coen-Pirani, 2008 or the results of Kennan and Walker,2009).
16
ble since every dimension added to the parameter search increases the estimation time
exponentially. Our primary interest is to estimate migration costs. Besides migration
costs, we restrict ourselves to the estimation of those parameters of our model which
cannot be inferred from raw data alone. One such set of parameters is the correlation of
shocks to permanent income across regions, " and �. Another one is the correlation of
income shocks across individuals, i.e. the fraction of aggregate shocks, �: In our model "
refers to the residual income after eliminating predictable income components (here re-
gional averages). This means that even in micro-data there is no observable counterpart
for " that we can use to construct a moment condition to infer the correlation coe¢ cient
". For � this is di¤erent as it refers to observable di¤erences in households (i.e. the
place of residence). Thus, � has a observable counterpart that we can use for moment
matching, which is the correlation of realized regional average earnings. For this reason,
we assume that aggregate and individual correlation coe¢ cients are equal, i.e. " = �;
so that we only need to estimate one common parameter, .
Nonetheless, we cannot directly infer and � from the data as these correlations
refer to potential incomes and are therefore inherently unobservable. Since migration
smooths income and agents select into high income regions, the counterparts to � and
in terms of a covariance structure in realized incomes are substantially in�uenced by
the magnitude of migration costs. At the same time, we also expect � and to have
a signi�cant in�uence on the behavior of aggregate migration itself. Consequently, we
estimate � and within the model.
The transitory shock is irrelevant to the migration decision, hence its e¤ect on realized
incomes cannot be smoothed by migration. For this reason, we �x the correlation of the
transitory income shock, '; to the value of the observed correlation of incomes in
the REIS data, taking �xed e¤ects and a linear time trend into account. However,
the variance of the transitory shock, �2'; is estimated. The transitory component does
not only capture purely transitory �uctuations in income, but also various forms of
measurement error. Since we �nd it di¢ cult to guess their quantitative importance, we
abstain from �xing �2' or estimating it outside the model. Our complete set of estimated
parameters is � =�migration costs, ; �; �2'
�:
In turn, all other parameters��; �; �; �2� ; '; �
�in our model have to be �xed to take
reasonable values. As we work with annual data, we choose the discount factor � = 0:95:
Similarly, we �x the probability of dying to � = 2:5% to re�ect an average working-life
expectancy of 40 years.
In order to characterize the process for income, we need to specify the autocorrelation
parameter �, the mean �, and the long-run variance of income. We take these parameters
17
from Storesletten, Telmer, and Yaron (2004). They estimate the dynamics of idiosyn-
cratic labor market risk for the US based on the Panel Study of Income Dynamics. The
paper conveys information on both income variances and autocorrelation of log house-
hold income after controlling for observable household characteristics such as education
or age. Besides, the paper reports a mean household income of US$ 45,000. To approx-
imately match this �gure, we choose the mean of the log income to be �A = �B = 10:5:
Storesletten, Telmer, and Yaron (2004, pp. 711) �nd an annual autocorrelation of resid-
ual idiosyncratic incomes of roughly 0.95 and a frequency weighted average standard
deviation of idiosyncratic income shocks of 0.17 in their preferred speci�cation. 13
A next step is to decide on an informative set of moments, %:We select the standard
deviation of migration rates, the standard deviation of average incomes, and the corre-
lation of average incomes across regions as the �rst three moments to be matched. To
this set of moments we add the estimated parameters from a reduced form regression of
migration rates on the incomes of the destination and the source region. To make the
regression scale-invariant with respect to incomes, we use log-deviations from average
incomes as the income variables, i.e. we estimate
mjt = �0 + �1 (wjt � �wj) + �2 (w�jt � �w�j) + ujt:
The parameters �1 and �2 re�ect income sensitivities of migration. The intercept �0captures the average of migration rates.
We simulate our model for a given vector of model parameters � and calculate the
distance between the moments obtained from this simulation % (�) and the sample mo-
ments %S . We use the covariance matrix of %S obtained by 10000 bootstrap replications
as a weighting matrix so that our distance and goodness-of-�t measure is
L = (%S � % (�))0 cov (%S)�1 (%S � % (�)) :
The actual estimation is carried out by minimizing the distance measure L numeri-
cally by using a direct search algorithm.
13The choice to �x �2� is somewhat problematic, but it arises as a necessary restriction from the useof macro data. The realized �uctuation of income is a function of migration costs. Still we use statisticscalculated from realized income �uctuations to calibrate a parameter for potential income �uctuations.Yet, the standard deviation of idiosyncratic potential income can be understood as a pure scaling
parameter of our model. When we consider the case of zero migration costs, a simulation of our modelreveals that realized income inherits 89% of the standard deviation of potential income. This means thattaking the cross-sectional variance of realized incomes under-estimates the true variance of potentialincomes by no more than 11%. The implied maximal underestimation of migration costs is of the samemagnitude.
18
5.2 Estimation results
Table 1 displays the point estimates of the matched moments calculated from the IRS
and REIS data and the corresponding moments obtained from the simulation of our
model with the estimated parameters. The column "All Moments Matched" refers to
the estimation results from our baseline speci�cation. The remaining columns report
robustness checks discussed in the next subsection. Overall our model is able to repli-
cate the observed moments closely. In fact, the �2 (2)-distributed overidenti�cation test
reported at the bottom of the table does not reject our model.
Table 2 presents the estimates of the model parameters. The estimated migration
costs are US$ 33,230. This is substantially smaller than the estimates reported in pre-
vious contributions such as Davies, Greenwood, and Li (2001) or Kennan and Walker
(2009).
The estimated correlation of income shocks across regions is 23.99%. This is substan-
tially smaller than the observed correlation of realized incomes (58:07%, see Table 1).
The realized incomes co-move more strongly than the shocks to income because mi-
gration ties together the average incomes in both regions more closely than they were
tied together without migration. This drives a wedge between the correlation of income
shocks and the correlation of average realized incomes, the latter being larger than the
former.
The estimated fraction of income shocks that is aggregate amounts to 0.36%.14 There
is a signi�cant transitory income component (measurement error) in the aggregate in-
come �uctuations, which has an estimated standard deviation of 0.0270. This means
that transitory �uctuations in aggregate income add a variance term that has about
41.5% of the long-run variance of the sum of potential incomes and measurement error:
However, migration smooths realized income so that transitory shocks make up 82.8%
of the aggregate variance in realized incomes (again the sum of persistent and transitory
�uctuations). As outlined before, the transitory income component of our model relates
to two sources: to truly transitory �uctuations in incomes and to the fact that income
does not measure migration incentives perfectly.
It is useful to understand the source of identi�cation of the parameters for this model.
Table 3 summarizes the response of the moments to variations in the parameters for
the baseline model speci�cation (�All Moments Matched�). One can see that migration14To put this number into perspective, we calculate the long-run variance of per capita state income,
accounting for a linear time trend and �xed e¤ects, and set this number relative to the implied long-runvariance of household incomes as reported by Storesletten et al. (2004). While this benchmark ratio iswith 0.6% somewhat larger, it also points towards aggregate income shocks being very small comparedto idiosyncratic ones.
19
Table 1: Simulated moments estimation: moments estimates
Moment Actual Simulated MomentsDataMoments All Moments w/o Aver. 50 Yrs CRRA
Matched Migration Working Life (log utility)Rate (� = 2%)
Migration Rates
standard deviation 0.0036 0.0036 0.0036 0.0036 0.0036
Aggregate Regional Income
standard deviation 0.0299 0.0297 0.0297 0.0297 0.0298
correlation across regions 0.5807 0.5663 0.5713 0.5740 0.5751
Reduced Form Regression
intercept (? migration rate) 0.0393 0.0393 (0.0399)1 0.0393 0.0393
sensitivity to incomeof destination region 0.0603 0.0606 0.0616 0.0599 0.0604
sensitivity to incomeof source region -0.0624 -0.0587 -0.0545 -0.0579 -0.0612
Overidenti�cation test �2(2) 0.8212 2.9082 0.7291 0.1037p-value 0.6632 0.0881 0.6954 0.9495
1 Not matched; 2 Only one degree of freedom.The column �Actual Data Moments�refers to the moments estimated from the combinedREIS/IRS data set, with data on 50 US states and D.C. over the period 1989-2004. Thecolumns �Simulated Moments�refer to the moments estimated from the simulation of the modelusing the parameters given in Table 2. Both actual and simulated data are within-transformedand linearly de-trended. The simulations generate a panel of 51 region-pairs and an 71-yearhistory of migration and income data. The �rst 55 years of simulated data are dropped in orderto minimize the in�uence of initial values. Each simulation is repeated 5 times and datamoments are compared to the average over the 5 replications of the simulation.
20
Table 2: Simulated moments estimation: structural parameter estimates
Parameter All Moments w/o Aver. 50 Yrs CRRAMatched Migration Working Life (log utility)
Rate (� = 2%)
Migration Costs 33,2301 31,5571 29,744 0.53872
(4,318) (12,756) (4,596) (0.081)
Correlation of Income Shocks 0.2399 0.2449 0.2820 0.250Across Regions (0.1827) (0.2971) (0.1920) (0.1589)
Fraction of Income Shock 0.0036 0.0035 0.0036 0.0034due to Aggregate Fluctuations � (0.0011) (0.0042) (0.0011) (0.0012)
Standard Deviation of 0.0270 0.0271 0.0268 0.0269Transitory Income Shock �' (0.0012) (0.0027) (0.0012) (0.0012)
1Migration cost estimate c in US$ terms. 2 exp (c)�1 measures the relative income gainnecessary to o¤set migration costs.Standard errors in parenthesis. Estimation is carried out using the simulated momentsestimator by Gourieroux, Monfort, and Renault (1993), which chooses structural modelparameters by matching the moments from a simulated panel of regions with data moments asdisplayed in Table 1. The simulations generate a panel of 51 region-pairs and an 71-yearhistory of migration and income data. The �rst 55 years of simulated data are dropped in orderto minimize the in�uence of initial values. Each simulation is repeated 5 times and datamoments are compared to the average over the 5 replications of the simulation.
21
Table 3: Simulated moments estimation: Jacobian
MomentsMig. Rates Agg. Regional Income Reduced Form Regression
sd sd corr across intercept sensitivity to incomeParameter regions (? mig. rate) destination source
Mig. Costs c -0.4335 0.0148 -0.0764 -0.5060 -0.2872 -0.3116
Corr. -0.0750 0.0128 0.0483 -0.0859 -0.1228 -0.1302
Fraction � 0.1768 0.0594 -0.0011 0.0030 0.1113 0.1037
Trans. Shock �' 0.0000 0.7722 0.0180 0.0000 -1.7352 -1.6636
The table contains the Jacobian (in the form of elasticities @mi@�j
�jmi) of the baseline model
estimate with parameter estimates shown in Table 2. The derivatives are calculatednumerically on the basis of simulations that generate a panel of 51 region-pairs and an 71-yearhistory of migration and income data. The �rst 55 years of simulated data are dropped in orderto minimize the in�uence of initial values. Each simulation is repeated 5 times and datamoments are compared to the average over the 5 replications of the simulation.
costs have a strong in�uence on all moments directly relating to migration: the time-
series standard deviation of migration rates, average migration, and the sensitivity of
migration to aggregate income in source and destination region. In other words these
aggregate moments identify migration costs. An increase in migration costs also has a
weak negative impact on the correlation of realized average incomes across regions. This
helps to identify the correlation of income shocks, ; which has the opposite e¤ect on
the correlation of realized average incomes while the impact on all other moments is
negative as for migration costs.
Making aggregate income shocks stronger, i.e. increasing �; increases the overall
volatility of migration rates and make these more sensitive to income di¤erences but
do not a¤ect the level of migration rates. Obviously � also increases aggregate income
volatility. Similarly an increases in the standard deviation of transitory shocks, �';
increases aggregate realized income volatility, but it decreases the sensitivity of migration
to income di¤erences.
22
5.3 Robustness checks
One might argue that our relatively low migration-cost estimate is driven by the fact that
we attribute all migration to be driven by the income incentive. Correspondingly, the
average migration rate our model should match had to be lower. The downside of such
an argument is that it would be di¢ cult to tell which was the correct migration rate to
match. Therefore, taking an agnostic view on this point, we provide a robustness check
of our estimation results, excluding the average migration rate from the set of moments
our model is calibrated to. The estimation results are reported in the corresponding
columns of Tables 1 and 2. As one can see, the point estimate for migration costs
changes only slightly. Since we drop one moment condition, the standard errors of the
estimates increase.
In addition, we check the robustness of our results to the assumption of an expected
working life of 40 years. If we instead assume a dying probability of 2% and hence an
expected working life of 50 years the estimated migration costs decrease slightly.
As an additional robustness check, we replace the assumption of a risk-neutral agent
and assume constant relative risk aversion instead. Instead of linear utility from income,
we now assume logarithmic utility. While one can interpret the risk-neutrality assump-
tion as a short cut for modelling an agent who has access to perfect capital markets,
the assumption of logarithmic utility from income can be thought of as a model of re-
stricted capital market access. The respective last column of Tables 1 and 2 report the
corresponding estimation results. The estimated cost parameter c is no longer readily
interpretable as a US$ value, instead it re�ects the relative income gain necessary to
o¤set the costs of migration. This means that the money measure of migration costs at
average income is obtained by multiplying exp (c)�1 with average income. Our estimateof c = 0:5387 implies an estimated migration cost of US$ 35,736. Interestingly for none
of the robustness checks the overidenti�cation test rejects (at the 5% level). Yet - as we
will see - the test rejects models that ignore dynamic self-selection. Further robustness
checks can be found in the appendix where we estimate �2" within the model or �x it to
a value that is 40% higher than in the baseline case. The main �ndings do not change.
6 Behind the scenes
A simulation based estimation technique like a method of simulated moments is, by
construction, somewhat of a black box. Therefore, we try to shed some light on our esti-
mation results by running counterfactual simulations and by comparing the estimation
to static estimation techniques.
23
6.1 Zero migration costs
We start with a simulation of the model with migration costs set to zero. In a situation in
which unobservable migration incentives are serially uncorrelated, i.e. drawn completely
anew every period, migration rates would be 50% on average in the absence of migration
costs. In such a situation of zero costs and i.i.d. incentives, every period half of the
population in one region is better o¤ by moving to the other one. By contrast, this is no
longer true if there is serial correlation of incentives. Agents self-select into the region
where they are better o¤, and only those agents that have been on the margin, on the
verge of moving, in the previous period are likely to migrate in the current period. To
illustrate this point we simulate our model for the counterfactual case of zero migration
costs, setting all other parameters to their estimated values. In Table 4 we report some
summary statistics for this experiment.
The most important result of this simulation experiment concerns the average mi-
gration rate. Only 12.9% of the population migrates each year even if migration costs
are zero. This shows why a dynamic view on migration incentives leads to much lower
estimates of migration costs.
Besides we observe that in a world without migration costs, realized incomes are
more strongly correlated, they vary less, and they are signi�cantly higher (+20.44%)
than in a world without any migration. However, the increase is only mild compared
to the situation under the estimated migration costs. There, log average incomes are
10.822 and hence average incomes are already 18.77% larger than without migration.
Only moves that lead to small income gains are added by setting migration costs to
zero.
6.2 Estimation ignoring dynamic self-selection
The evidence from the zero-cost speci�cation suggests that it is indeed the dynamic
nature of the problem, with agents sorting themselves into their preferred region, that
changes the estimation of migration costs substantially. To provide further evidence, we
run two alternative estimation experiments.
First, we re-estimate our model, setting the autocorrelation of income to zero, keeping
the long term variation of income �xed. This experiment tells us which role it plays in
our model and estimation to keep track of the distribution of migration incentives.
Second, we estimate an approximate version of our model, where the dynamic self-
selection that shapes incentive distributions is ignored. In this experiment, we replace
the conditional density in (10) by its unconditional counterpart. If self-selection played
no role, this replacement was innocent. The former place of residence was not informa-
24
Table 4: Simulation Results: Zero Migration Costs
Moment Data Simulated Moment Data SimulatedMoment Moment Moment Moment
Migration rates Reduced form regression3
standard deviation 0.0036 0.0083 intercept(? migration rate) 0.0393 0.1286
Incomesensitivity of migration
log average income1 10.654 10.836 to income of:destination region 0.0603 0.0501
standard deviation 0.0299 0.0292source region -0.0624 -0.0600
correlation across 0.5808 0.6274regions2
1 Cross-sectional average income of a region; 2 Partial correlation controlling for a linear timetrend and �xed e¤ects; 3 Coe¢ cients of a reduced form regression of migration rates onincomes in both regions; 4 Value used for simulation
��+
��2
�: Original data has been rescaled
to have this mean value after de-trending.The zero-cost speci�cation assumes one US$ of migration costs for numerical feasibility. Wesimulate data on 51 region-pairs and an 71 year history of migration and income data. The �rst55 years of simulated data are dropped in order to minimize the in�uence of initial values. Thesimulation is repeated 5 times. The table reports averages over the 5 repetitions.
25
tive for unobservable migration incentives and conditional and unconditional distribu-
tions coincided. Hence, we should obtain similar estimation results as in our baseline
estimation speci�cation, if self-selection was of no concern.
As we will see, both versions that ignore the dynamic setup will lead to higher
estimated migration costs.
6.2.1 Estimation with zero autocorrelation
Our �rst experiment is to set the autocorrelation of income to zero when estimating the
model. We increase the shock variance to income to�2�1��2 in order to keep the long-run
dispersion of income �xed. Setting the autocorrelation to zero makes our model a static
model, as the expected value in the Bellman equation (7) becomes independent of the
current state of migration incentives. The �rst column in Table 5 reports the estimation
results from this experiment. The parameter estimates from the benchmark model with
dynamic self-selection and autocorrelation are repeated for comparison (last column).
Migration cost estimates are with US$ 57,713 roughly 80% higher in this setting
than in the setting with autocorrelated income. Moreover, the model is no longer able
to match the observed data moments, so that the model speci�cation test now rejects
the model. Compared to the estimates reported in the literature, estimated migration
costs are still on the low side. However, two aspects need to be taken into account:
First, with assumed zero autocorrelation of gains from migration, the net present value
of these gains is very limited. In fact, the (naive) net-present value gain from a once and
for all location decision is in the absence of autocorrelation about 10 times smaller than
in our baseline speci�cation. Second, we estimate a substantial measurement error in
the no-autocorrelation speci�cation, which dampens the increase in estimated migration
costs. Under zero autocorrelation, the estimated measurement error is with a standard
deviation of 0.0282 substantially larger, making up 95% of the sum of the long-run
variances of potential income and measurement error.
6.2.2 Estimation imposing the unconditional distribution of migration in-centives
Potentially closer to the existing arguments for ignoring dynamic selection-e¤ects is a
speci�cation of the model that observes the autocorrelation in income, but uses the un-
conditional income distribution (instead of the income distribution conditional on the
place of residence) to evaluate migration rates. Since migration rates are small each
year, one might expect this approximation to be innocent. Using this approximation
in the estimation of the model means to ignore dynamic self-selection, but it leaves the
micro-parameters of the problem unchanged otherwise. For example, from the micro per-
26
Table 5: Simulated moments estimation: Estimation results from models without dy-namic self-selection
Parameter Benchmark Model, Ignoring Dynamic Benchmark Model,No Autocorrelation Self-Selection Autocorr. = 0.95
Migration Costs 56,1011 216,9301 33,2301
(2,848) (27,176) (4,318)
Correlation of Income Shocks 0.19636 0.7101 0.2399Across Regions (0.1334) (0.08577) (0.1827)
Fraction of Income Shock 0.0001 0.0051 0.0036due to Aggregate Fluctuations � (0.0007) (0.0015) (0.0011)
Standard Deviation of 0.0282 0.0263 0.0270Transitory Income Shock �' (0.0017) (0.00145) (0.0012)
p-value, �2(2) overident. test 0.0000 0.0000 0.6632
1Migration cost estimate c in US$ terms.Standard errors in parenthesis. Estimation is carried out using the simulated momentsestimator by Gourieroux, Monfort, and Renault (1993), which chooses structural modelparameters by matching the moments from a simulated panel of regions. See Table 1 forfurther details. The simulations generate a panel of 51 region-pairs and an 71-year history ofmigration and income data. The �rst 55 years of simulated data are dropped in order tominimize the in�uence of initial values. Each simulation is repeated 5 times and data momentsare compared to the average over the 5 replications of the simulation.
27
spective, the decision problem is still dynamic, and naive net-present values of location
choices remain unaltered.
The column �Ignoring Dynamic Self Selection� in Table 5 reports the estimation
results from this exercise. Neglecting self-selection seems all but harmless. The point
estimates of all model parameters change substantially. Most importantly� and in line
with our argument� the estimated migration costs are with US$ 216,930 substantially
larger. The bias in this last experiment is larger than in the experiment before since
there is no o¤setting downwards bias from a misrepresentation of the income process,
i.e. of implied expected income streams.
Hence, treating migration as a dynamic decision problem at the micro level without
taking care of dynamic self-selection in the aggregation may lead to a more severe bias
than ignoring the dynamic structure of the migration decision altogether.
7 Implied Age Patterns
So far, we focused on the estimation of migration costs and the fact that dynamic self-
selection plays an important role for this estimation. However, current contributions to
the empirical study of migration go beyond the representative agent assumption of our
homogeneous migration-cost model (with heterogeneous incentives). A well documented
pattern in migration data is that younger agents are signi�cantly more likely to move
than older agents.
The standard explanation of this pattern relies on the investment character of mi-
gration choices, the so-called human capital theory of migration. This theory rests on
the fact that younger agents face a longer period in which their migration choices can
pay o¤. As a consequence younger agents are more likely to migrate just as younger
agents are more likely to invest in human capital taking further education. While this
human capital theory of migration is able to explain well the di¤erence in migration
between job starters and agents close to retirement, it has di¢ culties in explaining the
sharp decline in migration rates between ages 20 and 35 (Kennan and Walker, 2009).
Accordingly some authors have suggested further age-dependence of migration costs.
Our model provides an additional explanation for the age-migration relation. Agents
start their working life in our model randomly assigned to one of the two regions.
Agents then repeatedly choose whether to stay or to move to the alternative region
observing their current potential income di¤erences, which result from their history of
income shocks. This sequence of income shocks and migration choices has two conse-
quences. First, over the course of their lives agents accumulate income risk since income
is highly autocorrelated. Thus potential incomes become more dispersed between re-
28
Figure 2: Simulation Results: Migration Rates by Age
20 25 30 35 40 45 500.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
0.11
Age
Mig
ratio
n R
ate
Migration Rates by Age
ModelCPSCPS rescaled
Model: Relative frequencies of migration conditional on age. The frequencies are obtained bysimulating the behavior (migration, income, death and birth) of a cross-section of 20,000households for 150 years. Reported frequencies are obtained by averaging over 5 repeatedsimulations using only the �nal 16 years of data in each simulation.CPS: Average interstate migration rates of households by age from the Current PopulationSurvey (CPS), civilian population of age 23-50. Average over the years 1989-2004; there is nodata for 1995.CPS rescaled: Same as above but rescaled to match the average migration rate from the IRSdata.
gions as agents get older. Second, an agent stays in a given region if she earns more
income in her current region than in the alternative one. This means that ex-ante (i.e.
before income shocks realize) the match between agent and region becomes more e¢ -
cient as agents get older. They have selected themselves into their preferred region. This
increasing match e¢ ciency implies that migration rates generally decline in age.
To investigate this e¤ect quantitatively, we simulate 20,000 households over 150 pe-
riods of time for each state, repeat the simulation 5 times, and store both household�s
age and their migration choices for the last 16 years of the simulation. This allows us
to calculate average migration rates by age as displayed in Figure 2: We assume that
a household enters the labor market at the age of 23. One can see that the migration
rate falls as the household becomes older, but the drop in migration rates is smoothed
29
over di¤erent ages. Agents have a probability to migrate of 10% at the time they enter
the labor market, while agents who are 20 years older only migrate with a probability
of 3%. This age pattern is induced by agents choosing their optimal location over time.
We confront this implied age pattern of migration to observed migration rates by
age in the Current Population Survey (CPS). CPS migration rates are overall slightly
lower than migration rates in the IRS data, so that we also display a rescaled series that
matches the average migration rate in the IRS data. As our model predicts, migration
rates fall quickly in the �rst years after labor market entry. However, our model under-
predicts the decline in migration rates at later ages. This is likely due to the eternal
youth structure of our model that shuts down the human capital channel of migration
described before. Understanding the dynamic self-selection channel as complementary
to the human capital channel, we thus expect a richer model encompassing both chan-
nels to �t the observed age patterns in migration rates closely without making migration
costs age-dependent.
The �gure also reveals that migration rates do not decline monotonically in age in
our model. At the time of entry into the labor market there is not much heterogeneity
across agents, but this heterogeneity quickly grows as agents accumulate shocks to their
potential incomes. This means that initially only few agents observe income di¤erences
between both regions large enough to make them move. Therefore, under our estimated
migration costs, migration rates are highest one year after labor market entry. After
the �rst period there are many agents who would earn more in the other region but not
su¢ ciently more to justify a move. In the second period, many of these agents observe
an income shock that is large enough to induce a move, because income shocks are large
relative to the observed heterogeneity at early ages. In the following periods the e¤ect
of dynamic self-selection dominates the e¤ect of increasing heterogeneity, leading to the
inverse hump-shaped pattern of migration rates.
8 Conclusion
We have provided a model of aggregate migration with a sound microeconomic founda-
tion. The paper is a contribution to the recently evolving literature on structural models
of migration. We explicitly deal with the problem that potential gains from migration
are unobservable and display a dynamic character. This dynamic character of migration
incentives has two aspects: First, the individual gains from migration evolve stochasti-
cally over time, but will typically be highly persistent. Second, at an aggregate level, the
distribution of migration incentives is a result of past migration decisions themselves.
Starting from the microeconomic decision problem allows us to keep track of the dy-
30
namics of the incentive distribution. This dynamics is driven by (dynamic) self-selection.
Neglecting this self-selection results in biased estimates of structural parameters, such
as migration costs. In our application to US interstate migration, we estimate migration
costs to be substantially lower than reported in previous studies. The estimated migra-
tion costs amount to about US$ 33,230, which corresponds to two-thirds of an average
annual income.
Our analysis calls for a careful treatment of the self-selection problem when economic
incentives are not fully observable but persistent. This is particularly relevant for the
analysis of migration. Rather than being drawn every period anew, migration incentives
have a long memory. One example of this long memory of migration incentives is the
persistency that income displays. We integrated the persistency of unobserved migration
incentives in a structural dynamic microeconomic model of the migration decision. This
consequently allowed us to simulate the joint behavior of the observed migration rates,
of the unobserved migration incentives, and of their observable proxies, i.e. incomes.
Addressing the partial unobservability of migration incentives may not only be impor-
tant to macro-studies of migration. Even at a micro level, potential incomes are typically
unobservable and have to be proxied. However, such approximation regularly neglects
self-selection. If households live in their preferred place of residence as a result of their
location choice, and if all observable things are equal, then it must be the unobserved
component of their preferences that is in favor of the place in which they actually live.
Besides unobservable parts of income, this unobservable component of preferences can
also comprise di¤erent valuations of amenities and social networks. Even these factors
can be expected to exhibit persistency.
Future research calls for a more complex microeconomic model that integrates more
information into the macroeconomic analysis, for example labor market conditions and
amenities. Additionally, it would be desirable to extend our bi-regional approach to
the case of multiple regions, as in Davies, Greenwood and Li (2001) and Kennan and
Walker (2009). Further aspects, such as the interaction of migration and local labor
markets, could be analyzed in a general equilibrium framework as in Coen-Pirani (2008),
but our results call for an explicit treatment of the dynamic structure and persistency
of migration incentives. However, all this goes beyond what is currently numerically
feasible, in particular if the model is meant to be estimated.
Taking a more general perspective, our paper highlights the role of dynamic self-
selection in a model with imperfectly observed incentives. One can expect the econo-
metric issues that we raise to carry over to other examples of dynamic discrete choice
problems. Examples would be labor-market participation (see Keane and Wolpin, 2009)
31
or product switching (see e.g. Sweeting, 2007). Also in these frameworks our suggested
solution may well be applicable - explicit aggregation and taking the incentive dynamics
seriously.
References
[1] Adda, J. and R. Cooper (2003): "Dynamic Economics: Quantitative Methods and
Applications", MIT Press, Cambridge.
[2] Aguirregabiria, V. and P. Mira (2008): "Dynamic Discrete Choice Structural Mod-
els: A Survey", Journal of Econometrics. Forthcoming.
[3] Blanchard, O. and L. Katz (1992): "Regional evolutions", Brookings Papers on
Economic Activity, 1, 1-75.
[4] Breitung, J. and W. Meyer (1994): "Testing for unit roots in panel data: are wages
on di¤erent bargaining levels cointegrated?", Applied Economics, 26, 353-361.
[5] Burda, M. (1993): "The Determinants of East-West German Migration: Some First
Results", European Economic Review, 37, 452-462.
[6] Burda, M., M. Müller, W. Härdle, and A. Werwatz (1998): "Semiparametric Analy-
sis of German East-West Migration Intentions: Facts and Theory", Journal of Ap-
plied Econometrics, 13, 525-541.
[7] Borjas, G. J. (1987): "Self-selection and the earnings of immigrants", American
Economic Review, 77, 531-553.
[8] Borjas, G. J., S. Bronars, and S. Trejo (1992): "Self-selection and internal migration
in the United States", Journal of Urban Economics, 32, 159-185.
[9] Caballero, R. and E. M. R. A. Engel (1999): "Explaining Investment Dynamics in
U.S. Manufacturing: A Generalized (S,s) Approach", Econometrica, 67, 783-826.
[10] Chow, C. S. and J. N. Tsitsiklis (1991): "An optimal multigrid algorithm for con-
tinuous state discrete time stochastic control", IEEE Transactions on Automatic
Control, 36, 898�914.
[11] Cushing, B. and J. Poot (2004): "Crossing boundaries and borders: Regional science
advances in migration modelling", Papers in Regional Science, 83, 317-338.
[12] Coen-Pirani, D. (2008): "Understanding Gross Workers Flows Across U.S. States",
mimeo, Carnegie Mellon University.
[13] Davies, P. S., M. J. Greenwood, and H. Li (2001): "A Conditional Logit Approach
to U.S. State-to-State Migration", Journal of Regional Science, 41, 337-360.
[14] Decressin, J. and A. Fatas (1995): "Regional Labor Market Dynamics in Europe",
European Economic Review, 39, 1627-1655.
[15] Gourieroux, C., A. Monfort, and E. Renault (1993): "Indirect inference", Journal
32
of Applied Econometrics, 8, S85-118.
[16] Greenwood, M. J. (1975): "Research on internal migration in the United States: A
survey", Journal of Economic Literature, 13, 397-433.
[17] Greenwood, M. J. (1985): "Human migration: Theory, models and empirical stud-
ies", Journal of Regional Science, 25, 521-544.
[18] Greenwood, M. J. (1997): "Internal migration in developed countries", in: Rosen-
zweig, M. R. and O. Stark (eds.), Handbook of population and family economics,
Volume 1B, Elsevier, Amsterdam.
[19] Hassler, J., J. V. R. Mora, K. Storesletten, and F. Zilibotti (2005). "A Positive
Theory Of Geographic Mobility And Social Insurance", International Economic
Review, 46, 263-303.
[20] Heckman J. J. (1974): "Shadow Prices, Market Wages, and Labor Supply", Econo-
metrica, 42, 679-94.
[21] Heckman J. J. (1976): "The common structure of statistical models of truncation,
sample selection, and limited dependent variables and a simple estimator for such
models", Annals of Economic and Social Measurement, 5, 475-492.
[22] Heckman J. J. (1978): "Dummy Endogenous Variables in a Simultaneous Equations
System", Econometrica, 46, 931�59.
[23] Heckman J. J. and R. Robb (1985): "Alternative Methods for Evaluating the Impact
of Interventions", Journal of Econometrics, 30, 239-267.
[24] Hunt, G. L. and R. E. Mueller (2004): "North American Migration: Returns to
Skill, Border E¤ects and Mobility Costs", Review of Economics and Statistics, 86,
988-1007.
[25] Keane M. P. and K. I. Wolpin (2009): "Empirical Applications of Discrete Choice
Dynamic Programming Models", Review of Economic Dynamics, 12, 1-22.
[26] Kennan, J. and J. R. Walker (2003): "The E¤ect of Expected Income on Individual
Migration Decisions", NBER Working Papers, No. 9585.
[27] Kennan, J. and J. R. Walker (2009): "The E¤ect of Expected Income on Individual
Migration Decisions", mimeo, Universitiy of Wisconsin.
[28] Lee L.F. (1978): "Unionism and Wage Rates: A Simultaneous Equation Model with
Qualitative and Limited Dependent Variables," International Economic Review 19,
415�33.
[29] Lee L.F. (1979): "Identi�cation and Estimation in Binary Choice Models with
Limited (Censored) Dependent Variables," Econometrica, 47, 977�96.
[30] Levin, A., C. F. Lin, and C. S. J. Chu (2002): "Unit Root Tests in Panel Data:
Asymptotic and Finite Sample Properties", Journal of Econometrics, 108, 1-24.
33
[31] Low H., C. Meghir, and L. Pistaferri (2008): "Wage Risk and Employment Risk
over the Life Cycle", IZA DP No. 3700.
[32] Nakosteen, R. A. and M. Zimmer (1980): "Migration and Income; The Question of
Self-Selection," Southern Economic Journal, 46, 840-851.
[33] Norets, A. (2008), "Implementation of Bayesian Inference in Dynamic Discrete
Choice Models", mimeo, Princeton University.
[34] Norets, A. (2009), "Inference in Dynamic Discrete Choice Models with Serially
Correlated Unobserved State Variables", Econometrica, forthcoming.
[35] Sjaastad, L. (1962): "The costs and returns of human migration", Journal of Polit-
ical Economy, 70, 80-93.
[36] Storesletten, K., C. I. Telmer, and A. Yaron (2004): "Cyclical Dynamics in Idio-
syncratic Labor-Market Risk", Journal of Political Economy, 112, 695-717.
[37] Sweeting, A. (2007): "Dynamic Product Repositioning in Di¤erentiated Product
Industries: The Case of Format Switching in the Commercial Radio Industry",
NBER-WP 13522.
[38] Tauchen, G. (1986): "Finite state Markov-chain approximation to univariate and
vector autoregressions", Economics Letters, 20, 177-181.
[39] Tunali, I. (2000): "Rationality of Migration", International Economic Review, 41,
893-920.
34
Appendix
A Derivation of equation (4)
For the covariance between the error term �i1 and location in period 0; yi0; (as described
in Section 2, eq. (4)), we obtain:
cov (yi0; �i1) =E (yi0�i1)� E (�i1)E (yi0)
=E (�i1jyi0 = 1)Pr (yi0 = 1)� E (�i1) Pr (yi0 = 1) :
Since E (�i1) is by assumption zero, this simpli�es to
cov (yi0; �i1) = E (�i1jyi0 = 1)Pr (yi0 = 1) :
Using the de�nition of
�i1 := (w�iA1 � w�iB1) + �i1
and
w�ij1 = �w�ij0 + "ij1
we can rewrite cov (yi0; �i1) as
cov (yi0; �i1) =E ( (� (w�iA0 � w�iB0) + "iA1 � "iB1) + �i1jwiA0 > wiB0) Pr (wiA0 > wiB0)
= �E ((w�iA0 � w�iB0) jwiA0 > wiB0) Pr (wiA0 > wiB0) ;
where the second equality follows from "ijt and vit being orthogonal to all information
available at time t�1 and mean zero. Making use of the de�nition of wijt := �jtzit+w�ijt
and the normality of w�ijt we obtain
cov (yi0; �i1) = �E ((w�iA0 � w�iB0) j (w�iA0 � w�iB0) > � (�A1 � �B1) zi1) Pr (wiA0 > wiB0)
= 2 ��0���(�A1��B1)zi1
2�0
��1� �
��(�A1��B1)zi1
2�0
�� Pr ((w�iA0 � w�iB0) > � (�A1 � �B1) zi1)= 2 ��0�
�(�A1 � �B1) zi1
2�0
�;
where � and � are the probability density function and cumulative distribution function
of a standard normal distribution respectively.
35
B Existence and uniqueness of the value function
We begin with proving existence and uniqueness of the value function. Notation is as in
the main text throughout this appendix, unless stated otherwise.
For the ease of exposition, we assume that the income process is only approximately
log-normal. In particular, we assume that income has a �nite support.
De�nition 1 Let W =�W;W
�be the support of w.
De�nition 2 De�ne a mapping T according to the migration problem of a household,
that is
T (u) (k;wiAt; wiBt) = maxj=A;B
�exp (wijt)� Ifk 6=jgc+ �Etu (j; wiAt+1; wiBt+1)
: (14)
The mapping T is de�ned on the set of all real-valued, bounded functions B that arecontinuous with respect to wA;B and have domain D = fA;Bg �W2:
Lemma 3 The mapping T preserves boundedness.Proof. To show that T preserves boundedness one has to show that for any bounded
function u also Tu is bounded. Consider u to be bounded from above by �u and bounded
from below by u: Then, Tu is bounded, because
Tu = maxj=A;B
�exp (wijt)� Ifk 6=jgc+ �Etu (j; wiAt+1; wiBt+1)
� exp
��W�+ ��u <1;
(15)
and
Tu= maxj=A;B
�exp (wijt)� Ifk 6=jgc+ �Etu (j; wiAt+1; wiBt+1)
(16)
� maxj=A;B
�exp (wijt)� Ifk 6=jgc+ �u
� exp (W ) + �u > �1: (17)
Lemma 4 The mapping T preserves continuity.Proof. Since Tu is the maximum of two continuous functions, it is itself continuous.
Lemma 5 The mapping T satis�es Blackwell�s conditions.Proof. First, we need to show that for any u1 (�) < u2 (�) the mapping T preserves
the inequality. Since both the expectations operator and the max operator preserve the
36
inequality, T does also. Second, we need to show that T (u+ a) � Tu + a for any
constant a and some < 1: Straightforward algebra shows that
T (u+ a) = Tu+ �a: (18)
Since � < 1 by assumption, T satis�es Blackwell�s conditions.
Proposition 6 The mapping T has a unique �xed point on B, and hence the Bellman-equation has a unique solution.
Proof. Follows straightforwardly from the last three Lemmas.
C E¤ect of Income Shocks on the Distribution of Income
Idiosyncratic shocks, aggregate shocks, death and birth of agents, and the persistency
of the income process determine the transition of the distribution of income incentives
after migration to the distribution of migration incentives before migration in the next
period: For surviving households the income distribution at the beginning of period t+1
results from adding idiosyncratic and aggregate shocks to the distribution of income
after migration in period t, Ft; of which fjt (wA; wB) is the conditional density, see (12).
This means that for a surviving household an income of wijt+1 in period t+1 can result
from any possible combination of wijt and �ijt+1 = �jt+1 + "ijt+1 for which
wijt+1 = �j (1� �) + �wijt + �jt+1 + "ijt+1 (19)
holds. Solving this equation for wijt; we obtain
w�j (wijt+1; �jt+1; "ijt+1) := wijt =wijt+1 � (�jt+1 + "jt+1)
�� �j
(1� �)�
: (20)
This w�j (wijt+1; �jt+1; "ijt+1) is the time-t potential income in region j that is consistent
with a future potential income of wijt+1 and realizations of shocks �jt+1 + "ijt+1 at the
beginning of period t + 1: Now suppose that both kinds of shocks, � and "; have been
realized. Then, w�A;B is a one-to-one mapping of future incomes (wiAt+1; wiBt+1) to
current income (wiAt; wiBt) :
The conditional density of observing the future income pair (wiAt+1; wiBt+1) can
thus be obtained from a retrospective. The income pair (w�A; w�B) of past incomes for
a surviving household corresponds uniquely to a future income pair (wiAt+1; wiBt+1) :
Consequently, we can express the density of the income distribution at time t + 1 (for
37
surviving households) using the income distribution after migration Ft; and its condi-
tional density fjt: The density of the income distribution �Ft+1 conditional on surviving
and the region and the vector of shocks is given by
�fjt+1 (wA; wBj�At+1; �Bt+1; "iAt+1; "iBt+1)
= fjt (w�A (wA; �At+1; "iAt+1) ; w
�B (wB; �Bt+1; "iBt+1)) : (21)
Weighting this density with the density of the idiosyncratic shocks h ("iAt+1; "iBt+1)
yields the density of observing the future income pair (w�A; w�B) together with the idio-
syncratic shocks ("iAt+1; "iBt+1) :
fjt (w�A (wA; �At+1; "iAt+1) ; w
�B (wB; �Bt+1; "iBt+1)) � h ("iAt+1; "iBt+1) :
Integrating over all possible idiosyncratic shocks ("iAt+1; "iBt+1) yields the density�fjt+1 of the income distribution before migration and conditional on surviving in period
t+ 1 for a certain combination of aggregate shocks (�At+1; �Bt+1):
�fjt+1 (wA; wBj�At+1; �Bt+1) =Zfjt (w
�A (wA; �At+1; "A) ; w
�B (wB; �Bt+1; "B)) � h ("A; "B) d"Ad"B; j = A;B: (22)
Finally, the actual conditional distribution of potential incomes Fjt+1 and its density
fjt+1 is determined by a convex combination of �fjt+1 (for surviving households) and the
distribution of income shocks (for newborn households)
fjt+1 (wA; wBj�At+1; �Bt+1; zAt+1; zBt+1)
= (1� �) �fjt+1 (wA; wBj�At+1; �Bt+1) + �h ("A � zAt+1; "B � zBt+1) :
For given aggregate states and shocks, this new distribution determines migration from
region j to region �j according to equation (10) for period t+ 1:The evolution of income distributions can thus be summarized as follows. Between
two consecutive periods, the conditional distribution of potential incomes �rst evolves
as a result of migration decisions, moving the density from fjt to fjt: Thereafter, the
distribution is altered by aggregate and idiosyncratic shocks to income, moving the
density from fjt to �fjt+1: Finally, a fraction of households dies and for this fraction the
distribution �fjt+1 is replaced by the distribution of income shocks. This leads to the
new distribution fjt+1, which determines migration decisions in period t + 1; starting
38
the cycle over again.
D Invariant distribution
We prove that migration decisions and idiosyncratic shocks to income imply that poten-
tial income follows an ergodic Markov-process if there are no aggregate shocks. There-
fore, there is an invariant distribution the sequence of income distributions converges
to. For simplicity, we present the proof for an arbitrary discrete approximation of the
model with a continuous state-space for income.
Lemma 7 Assume an arbitrary discretization of the state space with n points for thepotential income in each of the regions. Then, we can capture the transition from ft to
ft+1; which are the unconditional densities of the distribution of households over both
regions and potential incomes, in a matrix � =
�(I �DA) �DB
�DA �(I �DB)
!2 R2n2�2n2.15
In this matrix, � denotes the transition matrix that approximates the AR(1)-process for
income (including birth and death) by a Markov-chain, see Adda and Cooper (2003, pp.
56) for details. Matrix Dj ; j = A;B is the n2 � n2 diagonal matrix with the migration
hazard rates for each of the n2 income pairs of the income grid.
Proof. First, we take a discrete state-space of n possible incomes for each region,
wA1:::wAn and wB1:::wBn: Second, we denote the vector of probabilities that describes
the distribution of potential incomes and household locations in the following form
f =�f (A;wA1; wB1) ::: f (A;wAn; wB1) ::: f (A;wAn; wBn) f (B;wA1; wB1) ::: f (B;wAn; wBn)
�0:
(23)
Analogously, we de�ne the distribution after migration but before idiosyncratic shocks,
f . Taking our law of motion from (22) ; we obtain as a discretized analog
ft+1 =
� 0
0 �
!ft: (24)
Now, de�ne dh 2 f0; 1g as the fraction of households that migrate and are in the h-thincome and location triple given our vectorization of the income grid. This means that
dh = �j (wAk; wBl) ; h = 1:::2n2; where (j; wAk; wBl) is the h-th element in the vectorized
grid. Moreover, de�ne D = diag (d) as the diagonal matrix with migration rates on the
diagonal and DA and DB as the diagonal matrices with only the �rst n2 and the last n2
15Since we work with a discretization, strictly speaking f is not a density, but a vector of probabilitiesfor drawing a location-income possibility vector from a given element of the grid.
39
elements of d; respectively. Then, we can describe the transition from ft to ft by
ft =
I �DA DB
DA I �DB
!ft (25)
Combining the last two equations, we obtain
ft+1 =
�(I �DA) �DB
�DA �(I �DB)
!ft: (26)
Lemma 8 For any distribution of idiosyncratic shocks with support equal to W2; matrix
� has only strictly positive entries.
Proof. If the idiosyncratic shocks have support equal to W2; then every pair of potential
incomes can be reached from every other pair of incomes as a result of the shock, because
we assume the shocks to income to be approximately log-normal. Thus, all entries of �
are strictly positive.
Lemma 9 �2 has only positive entries.Proof. Due to �cA = ��cB; we can assume an ordering of states such that we can write
DA =
Ina 0
0 0
!and DB =
0 0
0 Inb
!; without loss of generality, where Iz is a z�z unit
matrix. Accordingly, we de�ne partitions of � such that
�=
A1 A2
A3 A4
!=
B1 B2
B3 B4
!
=
C1 C2
C3 C4
!=
D1D2
D3D4
!;
where A1 2 R(n2�na)�(n2�na); B1 2 Rnb�nb ; C1 2 Rna�na ; D1 2 R(n
2�nb)�(n2�nb):
This yields for �2 after some tedious algebra
�2 =
0BBBB@B2C3 A2A4 B2C4 A2B4
B4C3 A4A4 B4C4 A4B4
D1C1 C1A2D1D1 C1B2
D3C1 C3A2D3D1 C3B2
1CCCCA :
Each entry of this matrix is positive, since � and hence its partitions are positive.
40
Proposition 10 Under the assumptions of the above Lemmas, migration and idiosyn-cratic shocks de�ne an ergodic process with a stationary distribution F0 = limn!1Bnei:
Proof. The above Lemma directly implies the ergodicity of the Markov chain.
E Numerical aspects
The �rst step in solving the model numerically is to obtain a solution to (7) : We do so
by value-function iteration.16 For this value-function iteration, we �rst approximate the
bivariate process of potential incomes for an individual agent in regions A and B
wijt = �j (1� �) + �wijt�1 + �ijt (27)
by a Markov chain. Because wA and wB are correlated through the correlation structure
in �; it is easier to work with the orthogonal components�w+A ; w
+B
�of (wA; wB) in the
value function iteration.
We evaluate the value function on an equi-spaced grid for the orthogonal components
with a width of �3:5�+A;B around their means, where �+A;B denote the long-run standard
deviations of the orthogonal components. The grid is chosen to capture almost all move-
ments of the income distribution F later on.17 Given this grid, we use Tauchen�s (1986)
algorithm to obtain the transition probabilities for the Markov-chain approximation of
the income process in (27) :
We apply a multigrid algorithm (see Chow and Tsitsiklis, 1991) to speed up the
calculation of the value function. This algorithm works iteratively. It �rst solves the
dynamic programming problem for a coarse grid and then doubles the number of grid
points in each iteration until the grid is �ne enough. In between iterations the solution
for the coarser grid is used to generate the initial guess for the value-function iteration
on the new grid by spline interpolation. The initial grid has 16�16 points (income A �income B) and the �nal grid has 128�128 points.
The solution of (7) yields the optimal migration policy and thus the microeconomic
migration hazard rates �j : With these hazard rates, we can obtain a series of aggregate
16See for example Adda and Cooper (2003) for an overview of dynamic programming techniques.17The choice of �3:5�+A;B is motivated as follows. We obtain in the estimation that about 99%
of the income shocks is due to the idiosyncratic component. Therefore, we can expect 99.9% of themass of the income distribution to fall within �3:29 �
p0:99�+A;B
�= �3:27�+A;B around the mean of thedistribution for any given year. Additionally, the mean income for each year moves within the band�3:29 �
p0:01�+A;B
�= �0:33�+A;B in again 99.9% of all years. Since the sum of both components is�3:6�+A;B ; a grid variation of �3:5�
+A;B should not truncate the income distribution.
41
migration rates for a simulated economy as described in detail in Section 4.2 for any
realization of aggregate shocks (�jt)j=A;Bt=1:::T and an initial distribution F0:
This means that we need an initial distribution of income F0 to solve the sequen-
tial problem. Following Caballero and Engel�s (1999) suggestion, we use the ergodic
distribution of income �F that would be obtained in the absence of aggregate income
shocks.18
To simulate a series of migration rates that correspond to the aggregate migration
hazards���At;Bt
�t=1:::T
; we draw a series of aggregate shocks (to the orthogonal basis)��+At; �
+Bt
�t=1:::T
from a normal distribution with variance � ���+A;B
�2; � 2 [0; 1] : The
weight � measures the relative importance of aggregate shocks, relative to idiosyncratic
shocks, i.e. �2" = (1� �)�2� and �2� = ��2� : Correspondingly, the orthogonal components
of the idiosyncratic shocks have variance (1� �) ���+A;B
�2.
As we did to approximate the microeconomic income process for value function iter-
ation, we also discretize the distribution of migration incentives over the chosen grid of
income to simulate its evolution. Accordingly, we replace the conditional density in (10)
by discrete probabilities. This means that for grid points (�xAk; �xBl) k; l = 1:::64; (k; l
being the index of grid points) with a distance of 2h in between points, we calculate the
probabilities initially (for t = 0 and before the �rst aggregate shock) as
�pk;l;0 =
Z �xA;k+h
�xA;k�h
Z �xB;l+h
�xB;l�hf0 (x1; x2) dx1dx2:
An aggregate shock �j in t = 0 now implies that the o¤-grid pair (�xA;k + �A; �xB;l + �B)
occurs with probability �pk;l;0 after the aggregate shock. To re-obtain on-grid probabili-
ties, we use spline interpolation methods to �nd the on-grid probability after aggregate
but before idiosyncratic shocks, �pk;l;1; restricting p to take values between 0 and 1. That
is, for each t we de�ne a function � with � t (�xA;k + �A;t+1; �xB;l + �B;t+1) := �pk;l;t and
obtain �pk;l;t+1 as
�pk;l;t+1 = � t (�xA;k; �xB;l)
where � t is the interpolation of � t: Idiosyncratic shocks are accounted for by multiplying
after-aggregate-shock, on-grid income probabilities with the transition probability matrix
obtained from Tauchen�s algorithm and thus obtain �pk;l;t+1, the probability to fall in the
k; l-cluster after idiosyncratic shocks. The e¤ect of migration on the distribution of
migration incentives is captured by using a discretized version of (12) :
18This distribution is calculated by assuming that idiosyncratic shocks ! have the full variance of �:
42
We calculate aggregate migration rates this way, i.e. by directly simulating the
evolution of the incentive distribution instead of using a Monte Carlo method based on
drawing a sample of agents, for the reason that the latter is not adequate in our case.
We focus on aggregate behavior, but aggregate shocks turn out to be relatively small
(being responsible for less than 1% of the total variation in income, see the discussion
in Section 5.2). Hence sampling variation would exceed the true aggregate variation of
income most likely if we applied a Monte Carlo approximation.
F Data
Data on migration between US federal states is provided by the US Internal Revenue
Service (IRS). The IRS uses individual income tax returns to calculate internal migration
�ows between US states. In particular, the IRS compiles migration data by matching
the Social Security number of the primary taxpayer from one year to the next. The IRS
identi�es households with an address change since the previous year, and then totals
migration to and from each state in the US to every other state. Given these bilateral
migration �ows, we compute aggregate gross immigration for the 50 US states and the
District of Columbia as the sum of all immigrations from other US states to a particular
state. Migration rates are calculated by expressing gross immigration as proportions
of the number of non-migrants reported in the IRS dataset. The IRS state-to-state
migration-�ow data is available for the years 1989 - 2004.
Income per capita data is taken from the Regional Economic Information System
(REIS) compiled by the Bureau of Economic Analysis (BEA). We use personal income as
the relevant income concept. The REIS data is available online at www.bea.gov/bea/regional/reis/.
The income-per-capita �gure for the alternative region is computed as the population-
weighted mean of all per capita incomes outside a speci�c state.
We remove a linear time trend from all data and express all variables as deviations
from their unit-speci�c means (re-scaled by their overall mean), i.e. we apply a within-
transformation. Table 6 reports descriptive statistics for the original as well as for the
transformed data.
In order to examine the time-series properties of the data employed, we perform a
unit-root analysis for migration rates and income data. In a sample of size T = 16 and
N = 51 either a Breitung and Meyer (1994) or a Levin, Lin, and Chu (2002) unit-root
test is most appropriate. For the Breitung and Meyer (1994) test, we determined the
optimal augmentation lag length by sequential t�testing. Taking into account threeaugmentation lags and time-speci�c e¤ects, we can reject the null hypothesis of a unit
root at the 5% level of signi�cance. Similarly, the Levin, Lin, and Chu (2003) test rejects
43
Table 6: Descriptive statistics
Mean Std. Dev. Min Max
Migration Rate .0393 .0178 .0144 .1146
Migration Rate Filtered .0393 .0036 .0234 .0629
Income per Capita 10.71 .1644 10.39 11.27
Income per Capita Filtered 10.71 .0312 10.61 10.82
Complementary Income per Capita 10.76 .0624 10.67 10.86
Complementary Income per Capita Filtered 10.76 .0239 10.73 10.80
the null hypothesis of a unit root taking a linear time trend into account.
G Further robustness checks
Table 7 displays the estimation results from two further robustness checks. First we
include the variance of income shocks in the set of model parameters to be estimated.
This does neither change the point estimates of the other parameters substantially, nor
does it increase their con�dence bounds. The variance of income shocks itself has a
standard deviation of roughly 20% of the point estimate.
This motivates us to run another robustness check, where we increase the standard
deviation of shocks by 40% in order to understand how sensitive the other parameter
estimates react to changes in this calibrated parameter. Again changes are small. The
estimated migration costs increase by 16.8%, while the other parameter estimates remain
virtually unchanged.
44
Table 7: Simulated moments estimation: structural parameter estimates, further robust-ness checks
Parameter Estimating the Variance of PersistentVariance of Persistent Income Shock �2" set 40%Income Shock �2" higher than baseline (=.0289)
Migration Costs 33,1041 38,6591
(4,347) (4,277)
Correlation of Income Shocks 0.2523 0.2803Across Regions (0.1827) (0.1548)
Fraction of Income Shock 0.0011 0.0014due to Aggregate Fluctuations � (0.0012) (0.0001)
Standard Deviation of 0.0270 0.0265Transitory Income Shock �' (0.0012) (0.0012)
Variance of Persistent 0.0297 0.04053
Income Shock �2" (0.0062) �
Overidenti�cation test �2(2) 0.79432 3.1225p-value 0.3728 0.2099
1Migration cost estimate c in US$ terms. 2 Only one degree of freedom, 3not estimatedStandard errors in parenthesis. Estimation is carried out using the simulated momentsestimator by Gourieroux, Monfort, and Renault (1993), which chooses structural modelparameters by matching the moments from a simulated panel of regions with data moments asdisplayed in Table 1. The simulations generate a panel of 51 region-pairs and an 71-yearhistory of migration and income data. The �rst 55 years of simulated data are dropped in orderto minimize the in�uence of initial values. Each simulation is repeated 5 times and datamoments are compared to the average over the 5 replications of the simulation.
45