estimators of the intergenerational elasticity of … · 2019-12-12 · the intergenerational...

49
ESTIMATORS OF THE INTERGENERATIONAL ELASTICITY OF EXPECTED INCOME Pablo A. Mitnik ([email protected]) December, 2017 The Stanford Center on Poverty and Inequality is a program of the Institute for Research in the Social Sciences at Stanford University.

Upload: others

Post on 13-Mar-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

ESTIMATORS OF THE INTERGENERATIONAL ELASTICITY OF EXPECTED INCOME

Pablo A. Mitnik ([email protected])

December, 2017

The Stanford Center on Poverty and Inequality is a program of the Institute for Research in the Social Sciences at Stanford University.

Abstract

The intergenerational income elasticity (IGE) conventionally estimated in the mobility literature

has been widely misinterpreted as pertaining to the conditional expectation of children’s income,

when in fact it pertains to its conditional geometric mean. In line with recent work, this article

focuses on the estimation of the IGE of expected income. It proposes that, in the one-sample

context, estimation of this IGE be based on the Poisson Pseudo Maximum Likelihood estimator,

or on a Generalized Method of Moments (GMM) instrumental variable estimator of the Poisson

or exponential regression model, depending on the parental information available. These

estimators can also be used together to generate a set estimate of the IGE of the expectation. In

the two-sample context, the article proposes that estimation be based on a recently advanced

two-sample GMM estimator of the exponential regression model. The article explains how to use

official Stata commands to estimate the IGE of the expectation with the first two estimators, and

how to construct confidence intervals for the partially identified IGE when their estimates are

combined to generate a set estimate. It also introduces the user-written program igets, which

implements the third estimator as well as a GMM version of the Two-Sample Two-Stage Least

Squares estimator, and thus allows to use Stata to estimate both the IGE of the expectation and

the IGE of the geometric mean when the income information for parents and children is available

in two independent samples.

1. Introduction

The intergenerational elasticity (IGE) is the workhorse measure of intergenerational

economic mobility. It has been widely estimated over the last four decades, very often with the

goal of providing a summary assessment of the level of income or earnings mobility within a

country (for reviews, see Solon 1999:1778-1788; Corak 2006; Jäntti and Jenkins 2015:Secs.

10.5.2 and 10.5.3). Beyond this elementary goal, the IGE has been used, among other purposes,

to conduct comparative analyses of economic mobility and persistence across countries, regions,

demographic groups, cohorts, and time periods (e.g., Chadwick and Solon 2002; Hertz 2005,

2007; Aaronson and Mazumder 2008; Björklund and Jäntti 2000; Mayer and Lopoo 2008;

Bloome and Western 2011); to examine the relationship between cross-sectional economic

inequality and mobility across generations (e.g., Corak 2013; Bloome 2015); and to theoretically

model and empirically study the impact of social policies and political institutions on inequality

of opportunity (e.g., Solon 2004; Bratsberg et al. 2007; Landersø and Heckman 2016).

Now, despite the IGE’s centrality in the intergenerational-mobility field, Mitnik and

Grusky (2017) have recently shown that this elasticity has been widely misinterpreted. Indeed,

the IGE has been construed as pertaining to the expectation of children’s income conditional on

their parents’ income—as apparent, for instance, in its oft-invoked interpretation as a measure of

regression to the (arithmetic) mean. However, the IGE estimated in the literature pertains to the

conditional geometric mean of the children’s income. As explained later, this not only makes all

conventional interpretations of the IGE invalid but also has very deleterious methodological

consequences.

Mitnik and Grusky (2017) have argued that both the conceptual and the methodological

problems can be solved in a straightforward manner by simply replacing the IGE of the

2

geometric mean (the de facto estimated IGE) by the IGE of the expectation (the IGE that

mobility scholars thought they were estimating) as the workhorse intergenerational elasticity.

They also called for effectuating such replacement, which requires identifying appropriate

estimators.

In line with recent work that has estimated the IGE of the expectation (Mitnik et al. 2015;

Mitnik and Grusky 2017; Mitnik 2017a; Mitnik 2017b; Mitnik 2017c), I propose that when

income information for both parents and children is included in the same sample, estimation of

that IGE be based, depending on the parental information available, on (a) the Poisson Pseudo

Maximum Likelihood (PPML) estimator, or (b) the additive-error version of the Generalized

Method of Moments (GMM) instrumental variables (IV) estimator of the Poisson or exponential

regression model (to which I will refer as the GMM-IVP estimator). As I explain later, in some

cases it may be useful to use these estimators together to generate a set estimate of—that is, to

“bracket” or provide bounds for—the IGE of the expectation. In the very common situation in

which the income information for parents and children is only available in independent samples,

I propose that estimation be based on the two-sample GMM estimator of the exponential

regression model advanced by Mitnik (2007c), to which I refer as the GMM-E-TS estimator.

In this article I show how to estimate the IGE of the expectation with the PPML and

GMM-IVP estimators using official Stata commands, and introduce the user-written program

igets, which implements the GMM-E-TS estimator and thus allows to estimate the IGE of the

expectation in the two-sample context; this new program also implements a GMM version of the

estimator typically employed to estimate the IGE of the geometric mean in that context, i.e. the

Two-Sample Two-Stage Least Squares (TSTSLS) estimator (see, e.g., Inoue and Solon 2010). I

also explain how to use Stata to construct the confidence interval for partially identified

3

parameters proposed by Imbens and Manski (2004), for the case where IGE bounds estimated

with the PPML and GMM-IVP estimators are combined to generate a set estimate of the IGE of

the expectation. The Stata implementation introduced here can be easily adapted for use in any

context in which the goal is to compute confidence intervals for partially identified parameters.

The structure of the rest of the paper is as follows. In the second section I explain why the

conventional IGE pertains to the conditional geometric mean of children’s income rather than to

its conditional expectation, as well as Mitnik and Grusky’s (2017) proposal to redefine the IGE

used as the workhorse measure of economic mobility. In the next three sections I identify the

estimators that can be employed to estimate the IGE of the expectation, lay out the reasons for

preferring the PPML, GMM-IVP and GMM-E-TS estimators, introduce the expression for a

confidence interval for a partially-identified IGE, and briefly discuss how to avoid “lifecycle

biases.” In the following three sections I show how to use official Stata commands to estimate

the IGE of the expectation in the one-sample context and to compute confidence intervals when

the IGE is partially identified, present the command igets, and explain how to use it to estimate

IGEs in the two-sample context. The last section offers brief concluding comments.

2. The IGE of what? Redefining the workhorse intergenerational elasticity

As already indicated, the conventionally estimated IGE has been widely misinterpreted.

While mobility scholars have interpreted it as the elasticity of the expectation of children’s

income or earnings conditional on parental income, that IGE pertains in fact to the conditional

geometric mean. Closely following Mitnik and Grusky’s (2017) analysis, the standard population

regression function (PRF) posited in the literature, which assumes the elasticity is constant across

levels of parental income, is:

𝐸𝐸(ln𝑌𝑌 |𝑥𝑥) = 𝛽𝛽0 + 𝛽𝛽1 ln 𝑥𝑥, [1]

4

where 𝑌𝑌 is the child’s long-run income or earnings, X is long-run parental income or father’s

earnings, 𝛽𝛽1 is the IGE as specified in the literature, and I use expressions like “Z|𝑤𝑤” as a

shorthand for "𝑍𝑍|𝑊𝑊 = 𝑤𝑤.” The parameter 𝛽𝛽1 is not, in the general case, the elasticity of the

conditional expectation of the child’s income. This would hold as a general result only if

𝐸𝐸(ln𝑌𝑌|𝑥𝑥) = ln𝐸𝐸(𝑌𝑌|𝑥𝑥). But, due to Jensen’s inequality, the latter is not the case. Instead, as

𝐸𝐸(ln𝑌𝑌 |𝑥𝑥) = ln exp𝐸𝐸(ln𝑌𝑌 |𝑥𝑥), and 𝐺𝐺𝐺𝐺(𝑌𝑌|𝑥𝑥) = exp𝐸𝐸(ln𝑌𝑌 |𝑥𝑥), Equation [1] is equivalent to

ln𝐺𝐺𝐺𝐺(𝑌𝑌|𝑥𝑥) = 𝛽𝛽0 + 𝛽𝛽1 ln 𝑥𝑥 , [2]

where GM denotes the geometric mean operator. Therefore, 𝛽𝛽1 is the elasticity of the conditional

geometric mean, i.e., the percentage differential in the geometric mean of children’s long-run

income with respect to a marginal percentage differential in parental long-run income.1

As the geometric mean is undefined whenever an income distribution includes zero in its

support, the IGE is undefined as well when this is the case. Mitnik and Grusky (2017:Section IV)

have argued that this has serious methodological consequences: It (a) makes it impossible to

determine the extent to which parental economic advantage is transmitted through the labor

market among women (as many women have zero earnings), and (b) greatly hinders research on

the role that marriage plays in generating the observed levels of intergenerational persistence in

family income (as many people remain single or have nonworking spouses, and therefore cannot

be included in analyses examining the relationship between people’s parental income and the

income contributed by their spouses). As a result, the study of gender and marriage dynamics in

intergenerational processes has been badly hampered. Equally important, Mitnik and Grusky

(2017:Section III) have shown that, as a consequence of mobility scholars’ expedient of dropping

children with zero earnings from samples (to address what is perceived as the problem of the

1 The parameter 𝛽𝛽1 is (also) the IGE of the expectation only when the error term satisfies very special conditions (Santos Silva and Tenreyro 2006; Petersen 2017; Wooldridge 2002:17).

5

logarithm of zero being undefined), estimation of earnings IGEs with short-run proxy earnings

measures is affected by substantial selection biases. This makes the current use of the IGE of

men’s individual earnings as an index of economic persistence and mobility in a country a rather

problematic practice.2

To address these problems, Mitnik and Grusky (2017) have called for redefining the

workhorse measure of economic mobility. This entails replacing the PRF of Equation [1] by a

PRF whose estimation delivers estimates of the IGE of the expectation in the general case. Under

the assumption of constant elasticity, that PRF can be written as:

ln𝐸𝐸(𝑌𝑌|𝑥𝑥) = 𝛼𝛼0 + 𝛼𝛼1 ln 𝑥𝑥 , [3]

where 𝑌𝑌 ≥ 0, 𝑋𝑋 > 0 and 𝛼𝛼1 = 𝑑𝑑 ln𝐸𝐸(𝑌𝑌|𝑥𝑥)𝑑𝑑 ln𝑥𝑥

is the percentage differential in the expectation of

children’s long-run income with respect to a marginal percentage differential in parental long-run

income. Crucially, (a) all interpretations incorrectly applied to the conventional IGE are correct

or approximately correct under this formulation (see Mitnik and Grusky 2017:Section V.A), and

(b) the IGE of the expectation is fully immune to the methodological problems affecting the IGE

of the geometric mean and, in particular, is very well suited for studying the role of marriage in

the intergenerational transmission of advantage (see Mitnik and Grusky 2017:Section V.B for

details; and Mitnik et al. 2015:64-68 for an empirical application).

3. Estimation of the IGE of the expectation in the one-sample context

Although IGEs are defined in terms of long-run income variables, they are nearly always

estimated with short-run proxy measures. As a result, estimation is affected by both left-side and

right-side measurement errors, i.e., by the fact that the children’s and the parents’ long-run

2 Importantly, the methodological problems discussed in this paragraph can’t be solved by replacing zeros by “small values” (Mitnik and Grusky, 2017:15-16).

6

incomes are measured with error. Regardless of IGE concept and regardless of the estimator

used, this makes IGE estimates susceptible to left-side and right-side “lifecycle biases,” which

occur when the short-run income measures are obtained when parents or children are too young

or too old for their incomes to represent lifetime income differences well (for lifecycle biases in

the estimation of the conventional IGE see, e.g., Black and Devereux 2011). To simplify the

discussion that follows, however, I assume in this section that lifecycle biases are not an issue,

and that the errors are classical in nature. I will consider lifecycle biases again in Section 5.

In the one-sample context, the conventional IGE—that is, the IGE of the geometric

mean—has mostly been estimated with the Ordinary Least Squares (OLS) and the Two-Stage

Least Squares (TSLS) estimators. Making the IGE of the expectation the workhorse measure of

economic persistence, as proposed by Mitnik and Grusky (2017), requires identifying estimators

that can play for the estimation of this IGE the same roles that those estimators have played for

the estimation of the conventional IGE.

3.1. Estimation with multi-year averages of parental income

OLS has been, by far, the approach most commonly used to estimate the conventional

IGE in the one-sample context. After substituting short-run income variables for the long-run

income variables in Equation [1], estimation has been carried out almost in the same way it

would be carried out with the long-run variables.3 However, due to the use of short-run income

variables, estimation is affected by right-side attenuation bias (e.g., Solon 1999; Mazumder

2005). Formal analyses (e.g., Solon 1992; Haider and Solon 2006) indicate that using as the

proxy measure a multiyear average of parents’ income—rather than a single-year measure—can

3 The only difference is that polynomials in the parents’ and the children’s ages are customarily added as controls, to account for within-generation differences in the ages at which incomes are measured. For an argument against controlling for parental age, see Mitnik et al. (2015:40-41).

7

be expected to reduce attenuation bias. Moreover, there is strong evidence that the bias may be

substantially shrunken this way, although there is disagreement on how many years of

information are necessary to eliminate the bulk of it (see Mitnik et al. 2015:7-15).

Several estimators may be used to estimate the IGE of the expectation in the same

contexts in which the OLS estimator has been used to estimate the IGE of the geometric mean.

These include the Nonlinear Least Squares estimator and PML estimators relying on various

distributions in the linear exponential family (Poisson, gamma, inverse Gaussian).4 There are

good reasons, however, to prefer the PPML estimator employed by Mitnik et al. (2015), Mitnik

and Grusky (2017), and Mitnik (2017a; 2017b). Santos Silva and Tenreyro (2006, 2011)

provided conceptual arguments and empirical evidence favoring the use of this estimator for the

estimation of constant-elasticity models. In addition, Mitnik et al. (2015:ftn. 40) reported that, in

some cases, the PPML estimator worked better than the gamma PML estimator for the

estimation of the IGE of the expectation (for the latter estimator, see Manning and Mullahy

2001).

After substituting short-run income variables for the long-run income variables in

Equation [3], the PPML estimator can be used to estimate the IGE of the expectation with any

data used in the past (or, more generally, that could be used) to estimate the conventional IGE by

OLS. Given that estimation of the IGE of the expectation with short-run proxy measures does not

require dropping children with zero earnings or income from samples, this IGE is fully immune

to the selection bias that may affect the estimation of the IGE of the geometric mean (see Mitnik

and Grusky 2017). Nevertheless, Mitnik (2017a) has shown that the use of short-run measures to

4 PML estimators are consistent regardless of the distribution of the dependent variable, provided that the mean function is correctly specified (Gourieroux, Monfort, and Trognon 1984).

8

estimate the IGE of the expectation with the PPML estimator produces an attenuation bias that is

qualitatively very similar to that affecting the IGE of the geometric mean. His formal analysis

indicates that the strategy of using a multiyear average of parents’ income can also be expected

to work in this case, while his empirical evidence, using data from the Panel Study of Income

Dynamics (PSID), shows that the attenuation bias decreases substantially as more years of

parental information are added. His results also indicate, however, that estimates of the IGE of

the expectation with parental measures based on three to five years of information (the measures

often used by mobility scholars) would be affected by substantial attenuation bias. In fact, it

appears that, with survey data, at least 13 years of information are needed to eliminate the bulk

of that bias.5

The PPML estimator has been implemented by several official Stata commands (more on

this latter). Therefore, information on methods and formulas can be found in the corresponding

sections of the Stata Manual and it’s not necessary to include it here.

3.2. Estimation with instrumental variables

Very often there are much fewer years of parental information available than what would

be needed to (nearly) eliminate attenuation bias, or even to reduce it substantially (it is not

uncommon that only one annual measure of parental income is available). Therefore, a natural

alternative is to address right-side measurement error by resorting to IV estimation. In the case of

the conventional IGE, mobility scholars have often used the TSLS estimator, instrumenting

error-ridden proxy measures of the logarithm of long-run parental income with variables like

years of parental education or father’s occupation (e.g., Solon 1992; Zimmerman 1992; Mulligan

5 This is similar to what has been reported for the conventional IGE with the same data (Mazumder 2016: Tables 1 and 2). Mitnik (2017a:29-30) suggests two reasons why fewer years might be needed with tax and other administrative data.

9

1997; Ng 2007). The main concern with this strategy has been, however, that the instruments

typically available (including parental education and occupation) are most likely positively

correlated with both the logarithm of short-run parental income and the error term in the PRF of

interest, making them endogenous. In this context, IV estimates may still be useful if the sign of

their asymptotic bias can be established. An analysis by Solon (1992: Appendix; see also Mitnik

2017b:8-10) achieved that for the conventional IGE, showing that if some plausible substantive

assumptions hold then the IV estimator of that IGE with the invalid instruments typically

available is upward inconsistent. The standard interpretation in the literature has been that this

result implies that IV estimates are useful as upper-bound estimates. Mitnik (2017:22-31)

provided empirical evidence fully consistent with this interpretation; he also showed that

instruments vary greatly with regards to the tightness of the upper bounds that they provide.

There are several IV estimators that could be used to estimate the IGE of the expectation

in any context in which the estimation of the conventional IGE has relied (or might rely) on

instrumental variables. These include the multiplicative- and additive-error versions of the GMM

IV estimator of the Poisson or exponential regression model (Mullahy 1997; Windmeijer and

Santos Silva 1997), and five two-step estimators: Two different quasi-maximum-likelihood IV

estimators of the Poisson or exponential regression model (Wooldridge 1999:Sec. 6.1; Mullahy

1997:590-591); a residual-inclusion estimator for nonlinear parametric models (Terza et al.

2008); and two different regression-calibration IV estimators of generalized linear models with

measurement error in covariates (Carroll et al. 2006:Ch. 6). The two GMM IV estimators seem

preferable to all other estimators. First, they make weaker assumptions, as they neither require

the functional-form assumptions that the other estimators need for their first estimation step, nor

the additional assumptions that the “predictor-substitution” estimators of Mullahy (1997:590-

10

591) and Carroll et al. (2006) invoke. Second, the GMM IV estimators involve standard

asymptotic inferential procedures, while the other estimators require using more complicated

closed-form asymptotic variance estimators (e.g., Murphy and Topel 1985; Hardin 2002), or

resampling methods.

Here I focus on the additive-error version of the GMM-IV estimator (to which, as

indicated earlier, I will refer as the GMM-IVP estimator), not because there are reasons to

believe that if performs better than its multiplicative-error counterpart, but for a pragmatic

reason. Mitnik (2017b) has advanced a formal analysis showing that when the GMM-IVP

estimator is used to estimate the IGE of the expectation with the instruments typically available,

the probability limit of the estimator is larger than the true parameter under substantive

assumptions analogous to those made by Solon (1992) in the case of the IGE of the geometric

mean. Therefore, as with the latter IGE, estimates of the IGE of the expectation obtained with the

GMM-IVP estimator and the instruments typically available can be expected to provide upper-

bound estimates. Mitnik (2017:22-31) offered empirical evidence strongly supporting the notion

that this is in fact the case. As with the conventional IGE, his results also indicate that

instruments vary greatly with regards to the tightness of the upper bounds that they provide. For

this reason, Mitnik suggested putting a good amount of effort into searching for “best invalid

instruments.” This may involve looking for additional instruments beyond those typically

employed by mobility researchers; using multiple instruments simultaneously, and possibly

including interactions between them; and exploring the effects of alternative functional forms

(e.g., entering an instrument in levels or in logarithms), as this has been shown to be very

consequential in some contexts (Reiss 2016).

The GMM-IVP estimator has been implemented by an official Stata command (more on

11

this latter). Thus, it isn’t necessary to include here information regarding methods and formulas,

as it is already available in the Stata Manual.

3.3. The bracketing strategy and inference under partial identification

If there are fewer years of parental information than what is needed to eliminate the bulk

of attenuation bias, the PPML estimator of the IGE of the expectation is downward inconsistent,

while the GMM-IVP estimator is upward inconsistent with the instruments typically available.

As a result, the probability limits of the PPML and GMM-IVP estimators “bracket” the true

value of the IGE of the expectation. This suggests—in direct analogy to what Solon (1992:400)

proposed for the conventional IGE and the OLS and IV estimators—that the PPML and GMM-

IVP estimators be used together to bracket or to provide bounds for the IGE of the expectation

(Mitnik 2017b). Now, although this “bracketing strategy” generates a set rather than a point

estimate, it is nevertheless possible to construct a confidence interval for the parameter of

interest. This can be achieved by considering the bracketing strategy from the perspective of the

partial-identification approach to inference (for this approach, see, e.g., Tamer 2010).

Indeed, the fact that the probability limits of the PPML and GMM-IVP estimators

provide a lower and an upper bound, respectively, for the IGE entails that the latter is only

“partially identified” (e.g., Manski 2003) by data on short-run incomes and the instruments: Even

if we could obtain an unlimited number of observations, we would not be able to learn the true

value of the IGE. As long as only short-run income information and invalid instruments are

available, we can only aim at learning what the range of values consistent with those data—the

“identified set” defined by the probability limits of the estimators—is. And, of course, we never

have an unlimited number of observations but, rather, need to estimate the bounds from a finite

sample. This means we need to take into account both the uncertainty due to partial identification

12

and the uncertainty regarding the estimated bounds.

Ideally, we would like to provide one confidence interval that (a) pertains to the partially

identified IGE rather than to the identified set, (b) reflects uncertainty due both to partial

identification and to sampling variability, and (c) converges to its nominal value uniformly

across values of the width of the identified set.6 Denoting the probability of type-I error by 𝛼𝛼,

and following Imbens and Manski (2004), a 100 (1 − 𝛼𝛼) % confidence interval meeting these

three requirements, denoted by 𝐶𝐶𝐶𝐶(𝛼𝛼), can be constructed as follows:

𝐶𝐶𝐶𝐶(𝛼𝛼) ≡ �𝛼𝛼�1𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 − 𝑐𝑐(𝛼𝛼) 𝑆𝑆𝐸𝐸� (𝛼𝛼�1𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃), 𝛼𝛼�1𝐺𝐺𝑃𝑃𝑃𝑃−𝐼𝐼𝐼𝐼𝑃𝑃 + 𝑐𝑐(𝛼𝛼) 𝑆𝑆𝐸𝐸� (𝛼𝛼�1𝐺𝐺𝑃𝑃𝑃𝑃−𝐼𝐼𝐼𝐼𝑃𝑃)�,

where 𝑐𝑐(𝛼𝛼) solves

Φ(𝑐𝑐(𝛼𝛼) + 𝑅𝑅𝑊𝑊) −Φ�−𝑐𝑐(𝛼𝛼)� = 1 − 𝛼𝛼;

𝑅𝑅𝑊𝑊 = 𝛼𝛼�1𝐺𝐺𝑃𝑃𝑃𝑃−𝐼𝐼𝐼𝐼𝑃𝑃 − 𝛼𝛼�1𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃

max �𝑆𝑆𝐸𝐸� (𝛼𝛼�1𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃), 𝑆𝑆𝐸𝐸� (𝛼𝛼�1𝐺𝐺𝑃𝑃𝑃𝑃−𝐼𝐼𝐼𝐼𝑃𝑃)�

is an (estimate of) the relative width of the identified set; SE is the standard error operator; Φ(. )

denotes the CDF of the standard normal distribution; and the superscripts identify estimators.

Here, 𝑐𝑐(𝛼𝛼) is inversely related to 𝑅𝑅𝑊𝑊. With a 95 percent confidence interval, 𝑐𝑐(𝛼𝛼) is close to

Φ−1(0.90) ≅ 1.64 when the width of the identified set is large compared to sampling error, and

is equal to Φ−1(0.95) ≅ 1.96 under point identification.7

4. Estimation of the IGE of the expectation in the two-sample context

Relatively few countries count with samples in which the incomes of parents and their

children are both measured in some period during adulthood. For this reason, the conventional

6 Except for Mitnik (2017b), research using the bracketing strategy to estimate the conventional IGE has reported separate confidence intervals for the bounds estimated by the OLS and IV estimators, i.e., for the bounds of the identified set. 7 See Imbens and Manski (2004) and Stoye (2009) for detailed discussions of point (c) above, and for the rationale for this confidence interval. Mitnik (2017b:20-22) offers a brief summary of the intuition behind the latter.

13

IGE has very often been estimated not just with annual and other short-run proxy income

measures, but with short-run income measures drawn from two independent samples and using

the TSTSLS estimator (see Jerrim et al. 2016). As in the case of the one-sample estimates

generated with the TSLS estimator, and for the same reason (i.e., the endogeneity of the

instruments typically available), the two-sample estimates of that IGE have been interpreted as

upper-bound estimates. Making the IGE of the expectation the workhorse measure of economic

persistence requires identifying an estimator that can play for this IGE in the two-sample context

the same role played by the TSTSLS estimator of the conventional IGE in that context. Unlike in

the one sample-context, however, in which there are several possible estimators, here the only

option seems to be the two-sample GMM estimator of the exponential regression model

advanced by Mitnik (2017c). This estimator is based on a two-sample two-step estimator. I

present this estimator next. I first assume that the income variables are measured without error,

and then introduce measurement error into the analysis. This is followed by a description of the

approach for transforming the two-sample two-step estimator into a GMM estimator, and a brief

summary of what is known about the empirical performance of the estimator.

4.1. The two-sample two-step estimator with a valid instrument

In the two-sample context, the “main sample” has the children’s income information, the

“auxiliary sample” has the parents’ income information, and both samples have a common set of

variables (e.g., parents’ education or father’s occupation) that may be used as instruments or

predictors for the parents’ income information. Operationally, the two-sample two-step estimator

simply extends the approach used by the TSTSLS estimator of the linear regression model to the

estimation of the exponential regression model. Thus, when there is only one instrumented

variable, the first step estimates a linear projection of that variable on the instruments and any

14

other right-side variables included in the second step, using information from the auxiliary

sample; while the second step estimates the exponential regression model of interest using the

PPML estimator and the main sample, with the instrumented variable replaced by its predicted

values (which are computed with the parameters estimated in the first step).

I specify in this subsection the assumptions under which this two-sample two-step

estimator of the exponential regression model is consistent or approximately consistent when all

income variables are measured without error. Without any loss of generality, I assume in what

follows that 𝐸𝐸(𝑌𝑌) = 𝐸𝐸(ln𝑋𝑋) = 1. 8

Rewriting Equation [3] in additive-error form, the PRF of interest is

𝑌𝑌 = exp(𝛼𝛼0 + 𝛼𝛼1 ln𝑋𝑋) + Ψ, [4]

where 𝐸𝐸(Ψ|𝑥𝑥) = 0. Assuming for simplicity that there is only one quantitative instrument

denoted by T (e.g., years of parental education), the fist-step equation is the following population

linear projection:

ln𝑋𝑋 = 𝛾𝛾0 + 𝛾𝛾1𝑇𝑇 + Υ. [5]

Consider now the following assumptions, which apply to all t when relevant:

𝐴𝐴1. 𝐸𝐸(𝑅𝑅|𝑡𝑡) = 0

𝐴𝐴2.∀𝑐𝑐 > 0,𝐸𝐸(exp(𝑐𝑐𝑅𝑅)|𝑡𝑡) = 𝐸𝐸(exp(𝑐𝑐𝑅𝑅))

𝐴𝐴2′.𝑉𝑉𝑉𝑉𝑉𝑉(𝑅𝑅|t) = Var(𝑅𝑅)

𝐴𝐴3. 𝛾𝛾1 ≠ 0

𝐴𝐴4. 𝐸𝐸(Ψ|𝑡𝑡) = 𝐸𝐸(Ψ).

8 This involves no loss of generality because it can always be achieved by simply changing the monetary units used to measure income.

15

Assumptions A1 and A3 entail that the expectation of the logarithm of parental income

conditional on the value of T is a linear function of that value, while assumptions A3 and A4

entail that T is a valid instrument. Assumption A2 is similar to the standard assumption made for

the estimation of Poisson models with unobserved heterogeneity (see, e.g., Winkelman 2008).

Assumption A2’, which posits that the error in the first-step equations is homoscedastic, provides

an alternative to A2. Although A2’ is not strictly weaker than A2 (neither assumption entails the

other), the fact that the dependent variable in the first-step equation is the logarithm of an income

variable may make A2’ more attractive than A2.

I start by showing that under assumptions A1, A2, A3 and A4 the two-sample estimator

of 𝛼𝛼1 is consistent. Substituting Equation [5] into Equation [4], and using A1 and A3, yields:

𝑌𝑌 = exp(𝛼𝛼0 + 𝛼𝛼1𝐸𝐸(ln𝑋𝑋 |𝑇𝑇) + 𝛼𝛼1𝑅𝑅) + Ψ

𝐸𝐸(𝑌𝑌|𝑡𝑡) = exp�𝛼𝛼0 + 𝛼𝛼1𝐸𝐸(ln𝑋𝑋 |𝑡𝑡)� 𝐸𝐸(exp(𝛼𝛼1 𝑅𝑅)|𝑡𝑡) + 𝐸𝐸(Ψ|𝑡𝑡). [6]

Using now A2, Equation [6] reduces to:

𝐸𝐸(𝑌𝑌|𝑡𝑡) = exp�𝛼𝛼0′ + 𝛼𝛼1𝐸𝐸(ln𝑋𝑋 |𝑡𝑡)� + 𝐸𝐸(Ψ|𝑡𝑡), [7]

where 𝛼𝛼0′ = 𝛼𝛼0 + ln𝐸𝐸(exp(𝛼𝛼1𝑅𝑅); and it further reduces to

𝐸𝐸(𝑌𝑌|𝑡𝑡) = exp�𝛼𝛼0′ + 𝛼𝛼1𝐸𝐸(ln𝑋𝑋 |𝑡𝑡)� [8]

if assumption A4 also holds, that is, if the instrument is valid.9 If the variable 𝐸𝐸(ln𝑋𝑋 |𝑇𝑇) were

available in the estimation sample, Equation [8] would be consistently estimated by the PPML

estimator (e.g., Santos Silva and Tenreyro 2006). Under a standard identification condition for

two-step M-estimators (e.g., Wooldridge 2002:354), the PPML estimator that replaces

𝐸𝐸(ln𝑋𝑋 |𝑇𝑇) by estimates obtained in the first step, is also consistent.

9 𝐸𝐸(Ψ|𝑡𝑡) = 0 follows from 𝐸𝐸(Ψ|𝑥𝑥) = 0 (see Equation [4]) and assumption A4.

16

Going back to Equation [6], an alternative justification for this estimator as

“approximately consistent” can be obtained by replacing A2 by A2’. Indeed, it can be shown

(Mitnik 2017c:11) that this yields:

𝐸𝐸(𝑌𝑌|𝑡𝑡) ≅ exp�𝛼𝛼0′′ + 𝛼𝛼1𝐸𝐸(ln𝑋𝑋 |𝑡𝑡)� + 𝐸𝐸(Ψ|𝑡𝑡), [7′]

where 𝛼𝛼0′′ = 𝛼𝛼0 + ln(1 + 0.5 [𝛼𝛼1]2 𝑉𝑉𝑉𝑉𝑉𝑉(𝑅𝑅)). Here I focus on the first justification, as both lead

to the same conclusions.

Resorting to Taylor-series expansions, Mitnik (2017a:16) has advanced an approximated

closed form expression for 𝛼𝛼1 in a PRF like [8]. For future reference, I note that it yields:

𝛼𝛼1 ≅ 𝐶𝐶𝛼𝛼1 − ��𝐶𝐶𝛼𝛼1�2− 𝑉𝑉𝛼𝛼1�

12 , [9]

where

𝑉𝑉𝛼𝛼1 = 2 �𝑉𝑉𝑉𝑉𝑉𝑉�𝐸𝐸(ln𝑋𝑋 |𝑇𝑇)��−1

[9𝑉𝑉]

𝐶𝐶𝛼𝛼1 = [𝐶𝐶𝐶𝐶𝐶𝐶(𝑌𝑌,𝐸𝐸(ln𝑋𝑋 |𝑇𝑇) )]−1. 10 [9𝑏𝑏]

4.2. The two-sample two-step estimator with an invalid instrument

Let’s now assume that estimation is not based on the valid instrument 𝑇𝑇 but on the

invalid instrument 𝑻𝑻, and that although A4 does not hold it is the case that:

𝐴𝐴3′. 𝜸𝜸𝟏𝟏 > 0

𝐴𝐴4′. 𝐶𝐶𝐶𝐶𝐶𝐶(Ψ,𝑻𝑻) > 0.

(Throughout I use bold font to indicate that a parameter, expression or variable pertains to the

analysis with the invalid instrument.) As 𝜸𝜸𝟏𝟏 > 0 if and only if ln𝑋𝑋 and T are positively

correlated, in the “long-run context” A3’ and A4’ are equivalent to the standard assumption that

10 Equations [9], [9a] and [9b] assume 𝐸𝐸(𝑌𝑌) = 1, which is true by hypothesis, and 𝐸𝐸(𝐸𝐸(ln𝑋𝑋 |𝑇𝑇)) = 1. The latter follows from 𝐸𝐸(ln𝑋𝑋) = 1, which is true by hypothesis.

17

the (invalid) instruments typically available to mobility scholars are positively correlated with

the logarithm of parental income and with the error term of the PRF of interest.

To determine the implications of A3’ and A4’, it is useful to rewrite the counterpart to

Equation [7] as follows:

𝐸𝐸(𝒀𝒀|𝒕𝒕) = exp�𝜶𝜶𝟎𝟎′ + 𝛼𝛼1𝐸𝐸(ln𝑋𝑋 |𝒕𝒕)� [10]

where 𝒀𝒀 ≡ 𝑌𝑌 − 𝐸𝐸(Ψ|𝑻𝑻). Now, making use again of the approximated closed-form expression

introduced above, we may write:

𝛼𝛼1 ≅ 𝑪𝑪𝛼𝛼1 − ��𝑪𝑪𝛼𝛼1�2− 𝑽𝑽𝛼𝛼1�

12 , [11]

where

𝑽𝑽𝛼𝛼1 = 2 �𝑉𝑉𝑉𝑉𝑉𝑉�𝐸𝐸(ln𝑋𝑋 |𝑻𝑻)��−1

[11𝑉𝑉]

𝑪𝑪𝛼𝛼1 = [𝐶𝐶𝐶𝐶𝐶𝐶(𝒀𝒀,𝐸𝐸(ln𝑋𝑋 |𝑻𝑻) )]−1

= �𝐶𝐶𝐶𝐶𝐶𝐶�𝑌𝑌,𝐸𝐸(ln𝑋𝑋 |𝑻𝑻)� − 𝜸𝜸𝟏𝟏𝐶𝐶𝐶𝐶𝐶𝐶(Ψ,𝑻𝑻)�−1

. 11 [11𝑏𝑏]

Actual estimation, however, is not based on 𝒀𝒀 but on 𝑌𝑌, which is equivalent to making

𝜸𝜸𝟏𝟏𝐶𝐶𝐶𝐶𝐶𝐶(Ψ,𝑻𝑻) = 0. As assumptions A3’ and A4’ entail that 𝜸𝜸𝟏𝟏𝐶𝐶𝐶𝐶𝐶𝐶(Ψ,𝑻𝑻) > 0, and 𝜕𝜕𝛼𝛼1𝜕𝜕𝑪𝑪𝛼𝛼1

< 0

(Mitnik 2017a:16), it follows that the probability limit of the two-sample two-step estimator of

the IGE of the expectation with the invalid instruments typically available to mobility scholars is

larger than the true parameter. This is the same conclusion that is obtained for the conventional

IGE when the latter is estimated with the TSTSLS estimator, also under the assumption that the

samples have information on long-run rather than short-run income.12

11 Equations [11], [11a] and [11b] assume that 𝐸𝐸(𝒀𝒀) = 1 and that 𝐸𝐸�𝐸𝐸(ln𝑋𝑋 |𝑻𝑻)� = 1. This follows immediately from 𝐸𝐸(𝑌𝑌) = 1 and 𝐸𝐸(ln𝑋𝑋) = 1, which are true by hypothesis. In deriving [11b] I applied the law of total covariance. See Mitnik(2017c:12) for the step-by-step derivation. 12 In the case of the conventional IGE, the fact that the standard conclusion relies on (counterfactually) assuming that the long-run income variables are available was recently noted by Jerrim et al. (2016).

18

4.3. Two-sample two-step estimation of the IGE of the expectation with short-run

income variables

Let 𝑍𝑍 ≥ 0 be the children’s short-run income and 𝑆𝑆 > 0 be the parents’ short-run income.

Without any loss of generality, I assume that 𝐸𝐸(𝑍𝑍) = 𝐸𝐸(𝑆𝑆) = 1. 13P And, as in Section 3, to

simplify the presentation I ignore lifecycle biases (more on this in Section 5).

The first-step equation is now the population linear projection

ln 𝑆𝑆 = 𝛾𝛾�0 + 𝛾𝛾�1𝐷𝐷 + 𝑄𝑄,

where 𝐷𝐷 is a generic instrument, i.e., an instrument that may or may not be valid.

The short-run income measures can be expressed as:

𝑍𝑍 = 𝑌𝑌 + 𝑊𝑊 [12]

𝑆𝑆 = 𝑋𝑋 + 𝑃𝑃. [13]

From 𝐸𝐸(𝑌𝑌) = 𝐸𝐸(𝑋𝑋) = 𝐸𝐸(𝑍𝑍) = 𝐸𝐸(𝑆𝑆) = 1, it follows that 𝐸𝐸(𝑊𝑊) = 𝐸𝐸(𝑃𝑃) = 0. Let’s now make

the following measurement-error assumptions:

𝐺𝐺1.𝐶𝐶𝐶𝐶𝐶𝐶(𝑊𝑊,𝐷𝐷) = 0

𝐺𝐺2.𝐸𝐸(𝑃𝑃|𝑑𝑑) = 𝐸𝐸(𝑃𝑃).

It is easy to show that when these measurement-error assumptions hold (a) assumption A1 entails

𝐸𝐸(𝑄𝑄|𝑑𝑑) = 0, and (b) each of assumptions A3 and A3’ entails 𝛾𝛾�1 ≠ 0.

Therefore, under A1, M1 and M2, the second-step equation may be written as:

𝐸𝐸(𝑍𝑍|𝑑𝑑) = exp(𝛼𝛼�0 + 𝛼𝛼�1𝐸𝐸(ln 𝑆𝑆 |𝑑𝑑)).

Then, using again the approximated closed-form expression employed before, we have:

13 See note 8.

19

𝛼𝛼�1 ≅ 𝐶𝐶𝛼𝛼�1 − ��𝐶𝐶𝛼𝛼�1�2− 𝑉𝑉𝛼𝛼�1�

12 , [14]

where

𝑉𝑉𝛼𝛼�1 = 2 �𝑉𝑉𝑉𝑉𝑉𝑉�𝐸𝐸(ln 𝑆𝑆 |𝐷𝐷)��−1

= 2 [𝑉𝑉𝑉𝑉𝑉𝑉(𝐸𝐸(ln𝑋𝑋|𝐷𝐷) + 𝐸𝐸(𝑃𝑃|𝐷𝐷))]−1

= 2 [𝑉𝑉𝑉𝑉𝑉𝑉(𝐸𝐸(ln𝑋𝑋|𝐷𝐷))]−1. [14𝑉𝑉]

𝐶𝐶𝛼𝛼�1 = �𝐶𝐶𝐶𝐶𝐶𝐶�𝑍𝑍,𝐸𝐸(ln 𝑆𝑆 |𝐷𝐷)��−1

= �𝐶𝐶𝐶𝐶𝐶𝐶�𝑌𝑌,𝐸𝐸(ln𝑋𝑋 |𝐷𝐷)� + 𝐶𝐶𝐶𝐶𝐶𝐶(𝑌𝑌,𝐸𝐸(𝑃𝑃|𝐷𝐷)) + 𝛾𝛾�1𝐶𝐶𝐶𝐶𝐶𝐶(𝑊𝑊,𝐷𝐷)�−1

.

= [𝐶𝐶𝐶𝐶𝐶𝐶(𝑌𝑌,𝐸𝐸(ln𝑋𝑋|𝐷𝐷))]−1. 14 [14𝑏𝑏]

So let’s assume that Equations A1, A2 or A2’, A3 and A4 hold, that is, let’s consider the

case in which the instrument is valid. Comparing Equations [14], [14a] and [14b] with Equations

[9], [9a] and [9b] makes clear that in this scenario the “short-run estimator” (the two-sample two-

step estimator with short-run income variables) is a consistent (or approximately consistent)

estimator of the IGE of the expectation as long as the measurement-error assumptions

𝐶𝐶𝐶𝐶𝐶𝐶(𝑊𝑊,𝑇𝑇) = 0 and 𝐸𝐸(𝑃𝑃|𝑡𝑡) = 𝐸𝐸(𝑃𝑃) hold.

Let’s consider next the case in which the instrument is invalid because A4 does not hold,

but A3’ and A4’ do hold. Comparing now Equations [14], [14a] and [14b] with Equations [11],

[11a] and [11b] shows that, under the same measurement-error assumptions, the short-run

estimator is upward inconsistent with the invalid instruments typically available to mobility

scholars.

14 Equations [14], [14a] and [14b] assume that 𝐸𝐸(𝑍𝑍) = 1 and 𝐸𝐸(𝐸𝐸(ln 𝑆𝑆 |𝐷𝐷)) = 1. The former is true by hypothesis while the latter follows from 𝐸𝐸(ln 𝑆𝑆) = 1, which is true by hypothesis.

20

4.4. Transforming the two-sample two-step estimator into a two-sample GMM

estimator

In the one-sample context, and following the approach first advanced by Newey (1984), a

two-step estimator—where the estimator in the second step is itself an M-estimator or a GMM

estimator that depends on the first-step estimator—can be easily transformed into a GMM two-

equation estimator (where the two equations are estimated simultaneously). To do so, the first-

order conditions for the two equations are “stacked,” so that the first-order conditions for the full

GMM problem reproduce the first-order conditions of the estimators employed in each step.

There are two main advantages to using this approach. First, as in the case of the GMM-

IVP estimator, the two-equation GMM estimator only involves standard asymptotic inferential

procedures, while the two-step estimator requires to account for the two-step nature of the

estimation by using more complicated closed-form asymptotic variance estimators (e.g., Murphy

and Topel 1985; Hardin 2002), or resampling methods. Second, transforming the two-step

estimator into a GMM estimator ensures efficient estimation (see Wooldridge 2002:425 and ff.

for more details). As efficiency is achieved by weighting instruments in an optimal way, in finite

samples the two-step and GMM estimators will produce identical estimates when there is only

one instrument, but will generally produce somewhat different estimates when there are multiple

instruments.

With a minor modification, the same approach can be used to transform the two-sample

two-step predictor-substitution estimator of the exponential regression model introduced above

into a two-sample two-equation GMM estimator. Let’s estipulate that the “first equation” is the

equation from the first step, and the “second equation” is the equation from the second step.

Then a two-sample GMM estimator of the IGE of the expectation, where the moment conditions

21

are products of instruments and “modified residuals,” is obtained as follows: (a) replace the

missing information in each of the samples—the logarithm of parents’ income in the main

sample, children’s income in the auxiliary sample—by any value, e.g., zero, (b) stack the data

from the two samples into one sample, adding an indicator variable to identify the observations

from the auxiliary sample, (c) define the modified residuals entering the moment conditions

associated to the first equation as the usual residuals multiplied by the indicator variable, (d)

define the modified residuals entering the moment conditions associated to the second equation

as the usual residuals multiplied by one minus the indicator variable, and (e) estimate the two-

equation model by GMM in the usual way.

The key steps are (c) and (d). The modified residuals defined in those steps are equal to

the usual residuals for the observations in the equation-dependent “relevant sample” but are

always equal to zero—regardless of the value of the parameter vector—for the observations in

the equation-dependent “irrelevant sample.” Therefore, estimation can proceed as in the one-

sample context. The resulting estimator is the GMM-E-TS estimator.

4.5. Empirical evidence

Mitnik (2017c) has offered empirical evidence strongly supporting the notion that,

like in the case of the GMM-IVP estimator in the one-sample context, estimates of the

IGE of the expectation obtained with the GMM-E-TS estimator and the instruments

typically available are upper-bound estimates. His results also show that instruments vary

greatly with regards to the tightness of the upper bounds that they provide; this indicates

that, as in the case of the GMM-IVP estimator, it is important to search for best invalid

instruments when using the GMM-E-TS estimator.

22

5. Lifecycle biases

As I indicated earlier, the attenuation bias that may result from estimating the

conventional IGE by OLS when parental income is measured with error, and the amplification

bias that may result from estimating it by IV with the invalid instruments typically available,

have not been the only biases of concern for mobility scholars. Indeed, starting with Jenkins

(1987), researchers have also been concerned with the lifecycle biases that may result when the

differences in short-run incomes between children or between parents do not capture well the

differences in their long-run incomes (see, e.g., Grawe 2003; Mazumder 2005; Black and

Devereaux 2011).

Since the mid-2000s, the literature has relied on measurement-error models that allow for

both left-side and right-side lifecycle biases, and model their interactions with the attenuation

and amplification biases just mentioned. Thus, Haider and Solon (2006; see also Nybom and

Stuhler 2016 for an illuminating discussion) advanced a generalized error-in-variables model for

the estimation of the conventional IGE by OLS, while an analogous model for its estimation by

IV in the one-sample context has been implicitly invoked by mobility scholars and has been

explicitly discussed by Mitnik (2017b). In three recent papers, Mitnik advanced functionally

similar generalized error-in-variables models for the estimation of the IGE of the expectation

with the PPML (Mitnik 2017a), GMM-IVP (Mitnik 2017b) and GMM-E-TS (Mitnik 2017c)

estimators. These generalized error-in-variables models all indicate that using measures of

economic status pertaining to specific ages should eliminate the bulk of lifecycle biases, while

the empirical evidence available for both IGEs suggests that this should happen when parents’

and children’s income information is obtained close to age 40 (Haider and Solon 2006;

Böhlmark and Lindquist 2006; Mazumder 2001; Nybom and Stuhler 2016; Mitnik 2017a; Mitnik

23

2017b, Mitnik 2017c).

To simplify the presentation, in Sections 3 and 4 I assumed that estimates are always free

of lifecycle biases. Although this is not true, all conclusions drawn in those sections still follow

from the generalized error-in-variables models that do take lifecycle biases into account,

provided that both the children’s and the parents’ short-run income measures pertain to the “right

points” of their lifecycles. So, as long as the children’s and the parents’ measures pertain to when

they are close to 40 years old, lifecycle biases may be ignored, at least as a first approximation.

If this is not the case, however, those conclusions may be invalid. For instance, the

conclusion that PPML estimates of the IGE of the expectation (as well as OLS estimates of the

conventional IGE) are downward biased when the measures of parental income are based on one

or a few years of parental information may not follow if children’s income is measured when

they are old enough or if parents’ income is measured when they are young enough; in these

cases, the upward lifecycle biases that result may more than compensate for the right-side

attenuation bias. Therefore, the simplified discussion of estimation biases in this paper can be

used as reference as long as the short-run income measures pertain to the “right ages” of parents

and children. If this is not the case, however, the more complicated analyses provided by the

generalized error-in-variables models, and the associated empirical evidence, should be

consulted to determine—to the extent that this is possible—the likely directions of any (net)

estimation biases.

6. Estimating the IGE of the expectation in the one-sample context: Examples

Here I present examples of estimation of the IGE of expected family income using the

PPML and GMM-IVP estimators and a subset of the PSID data employed by Mitnik (2017b),

which cover U.S. men and women born between 1966 and 1974. The examples use the PSID-

24

provided sampling weights (of course, weights may not be needed with other datasets). The

short-run parental income measures included in the dataset I use pertain to when the average age

of the parents was close to 40 years old. The (approximated) long-run measure of parental

income was computed by averaging 25 years of parental data. A measure of children’s long-run

income is not available, so I use a short-run measure in all cases. This measure is an average of

the children’s family income when they were 35-38 years old.15

The variables used in the examples are the following:

c_cohort Child’s birth cohort c_inc Child's family income p_ln_lrinc Logarithm of long-run parental income p_ln_srinc_1y Logarithm of short-run parental income - 1 year p_ln_srinc_5y Logarithm of short-run parental income - 5 years p_age Average parental age p_yeduc Parents' total years of education f_occ Father's occupation c_pweight Sampling weights

The PPML estimator is available with three Stata commands: (a) poisson, (b) glm, by

specifying the options family(poisson) link(log), and (c) ivpoisson, by using the GMM estimator

and entering the parental income variable as an exogenous variable instead of instrumenting it.

Here I employ the poisson command in all cases. Although the description of this command says

that it “fits a Poisson regression of depvar on indepvars, where depvar is a nonnegative count

variable,” in fact it both fits Poisson regression models by ML, and the semiparametric

exponential regression model by PML.

15 Based on the generalized error-in variables models to which I referred in Section 5 and the associated empirical evidence, using this short-run measure of children’s income instead of a long-run measure can be expected to generate very little left-side bias.

25

After reading and svy-setting the data, I use poisson to estimate the IGE of the

expectation with the long-run measure of parental income and with short-run measures of

parental income based on one and five years of information:

(output omitted)

. svyset [pw = c_pweight]

. sysuse onesample_data, clear

_cons 4.776277 .8229352 5.80 0.000 3.160975 6.391579 p_ln_lrinc .5975232 .073114 8.17 0.000 .454011 .7410354 c_inc Coef. Std. Err. t P>|t| [95% Conf. Interval] Linearized

Prob > F = 0.0000 F( 1, 822) = 66.79 Design df = 822Number of PSUs = 823 Population size = 31,492.847Number of strata = 1 Number of obs = 823

Survey: Poisson regression

(running poisson on estimation sample). svy: poisson c_inc p_ln_lrinc

_cons 6.991882 .6658399 10.50 0.000 5.684935 8.298828p_ln_srinc_1y .4043214 .0596204 6.78 0.000 .2872952 .5213476 c_inc Coef. Std. Err. t P>|t| [95% Conf. Interval] Linearized

Prob > F = 0.0000 F( 1, 822) = 45.99 Design df = 822Number of PSUs = 823 Population size = 31,492.847Number of strata = 1 Number of obs = 823

Survey: Poisson regression

(running poisson on estimation sample). svy: poisson c_inc p_ln_srinc_1y

26

The results show that while the PPML estimate based on the long-run measure of parental

income puts the IGE of expected income at 0.60, the estimate with only one year of parental

information is 0.40, that is, the latter estimate is affected by an attenuation bias of 33 percent. At

0.52, the estimate based on five years is much less affected by attenuation bias, but this bias is

still substantial, about 13.5 percent. This is fully in line with the discussion in Section 3.1.

Mobility scholars often include polynomials on parents’ or children’s age as controls, or

specify different intercepts by gender or birth cohort. The next example uses again the five-year

measure of parental income, but this time the linear predictor includes parental age and its square

as controls:

_cons 5.701497 .7091284 8.04 0.000 4.309582 7.093413p_ln_srinc_5y .517027 .0633199 8.17 0.000 .3927393 .6413147 c_inc Coef. Std. Err. t P>|t| [95% Conf. Interval] Linearized

Prob > F = 0.0000 F( 1, 822) = 66.67 Design df = 822Number of PSUs = 823 Population size = 31,492.847Number of strata = 1 Number of obs = 823

Survey: Poisson regression

(running poisson on estimation sample). svy: poisson c_inc p_ln_srinc_5y

27

The results show that controlling for parental age makes no difference with this data.

With any PML estimator, it is mandatory to use robust (or, if relevant, cluster-robust)

standard errors. I did not need to specify this explicitly in the above examples because when a

command is preceded by the prefix svy and, more generally, whenever pweights are used, robust

standard errors (or cluster-robust standard errors if relevant) are computed by default (although

in the former case they are labeled “linearized standard errors”). With data collected through

simple random sampling (SRS), a statement like the following should be used to estimate the

IGE of the expectation (when not controlling for parental age):

The GMM-IVP estimator is implemented by the ivpoisson command. I use it next to

provide estimates of the IGE of expected income with the five-year measure of parental income

Prob > F = 0.4744 F( 2, 821) = 0.75

( 2) [c_inc]c.p_age#c.p_age = 0 ( 1) [c_inc]p_age = 0

Adjusted Wald test

. test c.p_age c.p_age#c.p_age

_cons 5.437295 1.226426 4.43 0.000 3.03 7.844591 c.p_age#c.p_age -.0002766 .00066 -0.42 0.675 -.001572 .0010188 p_age .0176749 .0562984 0.31 0.754 -.0928307 .1281806 p_ln_srinc_5y .5181189 .0666771 7.77 0.000 .3872415 .6489962 c_inc Coef. Std. Err. t P>|t| [95% Conf. Interval] Linearized

Prob > F = 0.0000 F( 3, 820) = 21.90 Design df = 822Number of PSUs = 823 Population size = 31,492.847Number of strata = 1 Number of obs = 823

Survey: Poisson regression

(running poisson on estimation sample). svy: poisson c_inc p_ln_srinc_5y c.p_age c.p_age#c.p_age

. poisson c_inc p_ln_srinc_5y, robust

28

and two instruments: The total years of parental education and the father’s occupation.16 This

command computes robust standard errors by default, so the option robust doesn’t need to be

specified even with SRS data. The command does not accept the survey prefix svy but does

accepts population weights (cluster-robust standard errors can also be specified). When using

father’s occupation to instrument the logarithm of parental income, I include the options igmm

igmmiterate(8), which specify that the iterative GMM estimator rather than the default two-step

estimator be used, with a maximum of 8 iterations. Use of the iterative GMM estimator is

recommended, unless it is too computationally expensive, as it is likely to improve precision in

finite samples (e.g. Hall 2005:Sec. 2.4 and 3.6).17 I specify the option nolog to save space here,

but it’s better not to do so (to be able to examine the log for potential convergence issues). The

commands and the corresponding outputs are these:

16 GMM-IVP estimates with the annual measure of parental income are very similar. 17 When there is only one instrument the model is exactly identified, so requesting that the iterative GMM estimator be used with parental years of education as instrument would make no difference.

Instruments: p_yeducInstrumented: p_ln_srinc_5y _cons 2.656924 1.120022 2.37 0.018 .4617203 4.852128p_ln_srinc_5y .7824478 .0977918 8.00 0.000 .5907794 .9741162 c_inc Coef. Std. Err. z P>|z| [95% Conf. Interval] Robust

GMM weight matrix: RobustInitial weight matrix: UnadjustedNumber of moments = 2Number of parameters = 2 Number of obs = 823

Exponential mean model with endogenous regressors

note: model is exactly identified

Final GMM criterion Q(b) = 2.53e-31

. ivpoisson gmm c_inc (p_ln_srinc_5y = p_yeduc) [pw=c_pweight], nolog

29

The GMM-IVP estimates of the IGE of the expectation are 0.78 and 0.71 with parental

education and father’s occupation as instruments, respectively. Both are larger than 0.6, the

PPML estimate based on the long-run measure of parental income. This is in line with the

discussion in Section 3.2, in which I explained that the GMM-IVP estimator can be expected to

provide upper bound estimates. Moreover, these results also provide a good illustration of the

fact that, as I pointed out in that section, different instruments may lead to estimates that vary

greatly with regards to the tightness of the upper bounds that they provide, and so it is important

to estimate the IGE with as many instruments as possible and select the estimate that provides

the most informative upper bound.

As it’s the case with the poisson command, it is straightforward to add age controls, or

different intercepts for different subpopulations, with the ivpoisson command. For instance, the

following statement specifies a model that allows different intercepts for different birth cohorts,

while again instrumenting the five-year measure of parental income with the father’s occupation:

23.f_occ 24.f_occ 25.f_occ 999.f_occ 16.f_occ 17.f_occ 18.f_occ 19.f_occ 20.f_occ 21.f_occ 22.f_occ 8.f_occ 9.f_occ 10.f_occ 11.f_occ 13.f_occ 14.f_occ 15.f_occInstruments: 1.f_occ 2.f_occ 3.f_occ 4.f_occ 5.f_occ 6.f_occ 7.f_occInstrumented: p_ln_srinc_5y _cons 3.496155 .9444919 3.70 0.000 1.644985 5.347325p_ln_srinc_5y .7057764 .0833317 8.47 0.000 .5424491 .8691036 c_inc Coef. Std. Err. z P>|z| [95% Conf. Interval] Robust

GMM weight matrix: RobustInitial weight matrix: UnadjustedNumber of moments = 26Number of parameters = 2 Number of obs = 823

Exponential mean model with endogenous regressors

Final GMM criterion Q(b) = .0188692

note: iterative GMM weight matrix converged

. ivpoisson gmm c_inc (p_ln_srinc_5y = i.f_occ) [pw=c_pweight], igmm igmmiterate(8) nolog

30

(output omitted)

The computation of the confidence interval for a partially identified IGE discussed in

section 3.3 requires a few steps. I next list these steps and provide code that implements them for

the case in which the IGE is estimated with the five-year measure of parental income and using

father’s occupation as instrument.

The first step is estimating the IGE with the PPML estimator, and storing the point

estimate and the standard error in scalars:

(output omitted)

The second step is estimating the IGE with the GMM-IVP estimator, and storing the

point estimate and the standard error in scalars:

(output omitted)

The third step is computing RW, the estimate of the relative width of the identified set,

and saving it in a scalar:

The fourth step is solving for 𝑐𝑐(𝛼𝛼) in Φ(𝑐𝑐(𝛼𝛼) + 𝑅𝑅𝑊𝑊) −Φ�−𝑐𝑐(𝛼𝛼)� = 1 − 𝛼𝛼, and storing

its value in a scalar. To this end I rely on an approach for solving nonlinear equations suggested

by Feiveson, Kammire, and Cañette in Stata’s website, which uses the command nl for a

. ivpoisson gmm c_inc i.c_cohort (p_ln_srinc_5y = i.f_occ) [pw=c_pweight], igmm igmmiterate(8)

. svy: poisson c_inc p_ln_srinc_5y

. scalar lb_se = _se[p_ln_srinc_5y]

. scalar lb_pe = _b[p_ln_srinc_5y]

. ivpoisson gmm c_inc (p_ln_srinc_5y = i.f_occ) [pw=c_pweight], igmm igmmiterate(8)

. scalar ub_se = _se[p_ln_srinc_5y]

. scalar ub_pe = _b[p_ln_srinc_5y]

. scalar RW = (ub_pe - lb_pe) / max(lb_se, ub_se)

31

nonstandard purpose (its original purpose is nonlinear least squares estimation).18 I employ the

following evaluator program, which is called by nl, to implement that approach in the context at

hand (note that the evaluator program makes use of the scalar RW):

Once this program is saved in the working directory, the following code solves for 𝑐𝑐(𝛼𝛼) and

stores its value in a scalar (I assume that 𝛼𝛼 = 0.05, but any confidence level other than 95

percent can be used by modifying the first line of the code):

The last step is computing the confidence interval and displaying results. Although the

latter may be done with simpler code, I do it with a program, called sumresults, because this

may be helpful in some contexts:

18 See https://www.stata.com/support/faqs/programming/system-of-nonlinear-equations/ , where the approach is clearly described. As also pointed out there, an alternative is to use the Mata function optimize().

end

qui replace `Y' = normal(`c' + RW) - normal(-`c') in 1 qui replace `Y' = 0 // Fill in artificial dependent variable

scalar `c' = `at'[1, 1] tempname c // Retrieve c local Y `varlist' syntax varlist(min=1 max=1) if, at(name) version 9

program nlci

. scalar c = _b[_cons]

. qui nl ci @ `Y', parameters(c) initial(c 1.64)

. qui replace `Y' = conf_level in 1

. gen `Y' = 0

. tempvar Y

. scalar conf_level = 0.95

32

The results show that the set estimate of the IGE of expected income generated by the

bracketing strategy is highly informative. At the same time, the fact that the confidence interval

for that IGE is rather wide underscores the fact that obtaining satisfactory levels of precision

with that strategy requires samples substantially larger than those that are required to obtain

satisfactory levels of precision with the individual estimators the strategy combines.

7. The igets command

6.1 Syntax

The syntax for the igets command is

igets depvar [ varlist1 ] [ if ] [ in ] [ weight ], instruments(varlist2) sampaux(varname1)

depvaraux(varname2) [ options ]

where varlist1 and varlist2 may contain factor variables, and all types of Stata weights (i.e.,

aweights, fweights, iweights, and pweights) are allowed.

6.2 Options

Required options

instruments(varlist2) specifies the instruments

c(0.05) 1.64530 95% Conf. interval 0.41285 - 0.84288 Set estimate 0.51703 - 0.70578Partially identified IGE of the expectation . sumresults. 10. end 9. di as txt "{hline 60}" 8. di as text _col(6) "c(`alpha')" _col(31) as res %6.5f c 7. di as text _col(6) "`level'% Conf. interval" _col(31) as res %6.5f ci_lb _col(39) as res "- " %6.5f ci_ub 6. di as text _col(6) "Set estimate" _col(31) as res %6.5f lb_pe _col(39) as res "- " %6.5f ub_pe 5. di as text "Partially identified IGE of the expectation" 4. di as txt "{hline 60}" 3. local alpha: di %3.2f `alpha' 2. local alpha = 1 - conf_level 1. local level = 100 * conf_level. program define sumresults. . scalar ci_ub = ub_pe + c * ub_se. scalar ci_lb = lb_pe - c * lb_se

33

sampaux(varname1) specifies the name of an indicator variable identifying the observations

from the auxiliary sample

depvaraux(varname2) specifies the name of the log-parental-income variable, which is the

dependent variable in the auxiliary equation

GMM weight matrix options

wmatrix(wmtype) specifies the weight matrix type; weight matrix type wmtype may be robust

(the default), cluster clustvar, or unadjusted; these types of matrices are as defined in

the Stata manual entry for the gmm command

winitial(iwtype) specifies the initial weight matrix; initial weight matrix iwtype may be

unadjusted, identity (the default), or the name of a Stata matrix; these types of matrices

are as defined in the Stata manual entry for the gmm command

SE/Robust options

vce(vcetype) specifies the type of standard error reported; vcetype may be robust, cluster

clustvar, bootstrap, jackknife, or unadjusted; the default vcetype is based on the

wmtype specified in the wmatrix() option; the types of standard errors, and the rules used

to define the default vcetype, are as described in the Stata manual entry for the gmm

command

Other options

nostandardize requires that, when estimating the elasticity of the expectation, the dependent

variable in the main equation not be standardized by dividing it by its mean (which

normally helps the model converge); the default is to standardize this variable

noinitvalues requires that initial values not be provided for GMM estimation; the default is to

provide initial values

34

show requires that the results of the regressions used to generate initial values be shown; the

default is not to show them

technique(optalg) specifies the optimization technique to use; optalg may be nr, bfgs, dfp, gn

(the default), or a combination of these algorithms, which are those used by the gmm

command

othergmmoptions( ) can be used to specify any option allowed by the command gmm not

listed above, as long as the option pertains to the interactive version of that command; for

instance, specifying “othergmm(igmm igmmiterate(8) quickderivatives)" requires igets

to use the iterative GMM estimator (doing up to 8 iterations), and to employ an

alternative method of computing numerical derivatives for the variance-covariance

matrix

geometricmean requires estimation of the elasticity of the conditional geometric mean; the

default is to estimate the elasticity of the conditional expectation

6.3. Description and remarks

The command igets estimates IGEs of children's income with respect to parental income

in contexts in which (a) the measure of children's income (the dependent variable) is available in

one sample (the main sample), the measure of parental income is available in a different sample

(the auxiliary sample), and other parental variables that can be used as instruments (e.g., parental

education, parental occupation) are available in both samples, and (b) the IGE is assumed to be

constant across levels of parental income. By default, igets estimates the IGE of the conditional

expectation of children's income, using the GMM-E-TS estimator introduced in Section 4. The

command can also estimate the IGE of the conditional geometric mean of children's income,

using a GMM version of the TSTSLS estimator, which requires specifying the option

geometricmean; here the two-step procedure employed by the TSTSLS estimator is replaced by

35

estimation of a two-equation model by GMM (relying on the approach described in Section 4.4

for the IGE of the expectation).

The program assumes that the names of variables used in estimation have been

harmonized across samples, the auxiliary sample has been appended to the main sample, and

there is an indicator variable, specified in sampaux(varname1), that is coded 0 for the

observations in the main sample and 1 for the observations in the auxiliary sample. It also

assumes that the dependent variable of the main equation, specified by depvar, is the children’s

income when estimating the IGE of the expectation and the logarithm of their income when

estimating the IGE of the geometric mean. In addition, the program assumes that depvar is coded

as 0 (rather than missing) in the auxiliary sample, and that the dependent variable in the auxiliary

equation, specified by option depvaraux(varname2), is coded as 0 (rather than missing) in the

main sample. If this is not the case, the program exits with an error message. At least one

instrument—and as many as desired—need to be specified with the option

instruments(varlist2). Control variables (e.g., a polynomial on children’s age), may be specified

in varilist1 if desired. If one of the samples includes weights (e.g., population weights) and the

other doesn't, weights equal to 1 need to be generated in the latter sample. Otherwise, the

program will exit with an error message if weighted estimation is requested.

The optimization algorithm employed sometimes makes a difference for whether the

model converges or not, and for the time it takes to converge. If the model under estimation has

difficulties converging, users should experiment with the various optimization algorithms

available (and combinations thereof) by using the option technique( ). Combining 15 iterations

of the modified Newton–Raphson algorithm followed by 5 of the Davidon–Fletcher–Powell

algorithm—i.e., specifying technique(nr 15 dfp 5)—has proven particularly useful in some

36

cases. However, estimation with this combination may be much slower than with the default

Gauss–Newton method, so it should only be used if needed.

Specifying the option othergmm(igmm), which requests use of the iterative GMM

estimator, may lead in some cases to estimates that are (non-negligibly) different. These

estimates are likely to be more precise (Hall 2005: Sec. 2.4 and 3.6) than those obtained without

that option. Specifying the option othergmm(igmm igmmiterate(#)), with # > 2 a small

maximum number of iterations (e.g., # = 8), may be a good compromise if the iterative GMM

estimator takes too long to converge.19 The iterative GMM estimator with a maximum of # > 2

iterations can still be expected to be more efficient than the default (for which # = 2). The

wmtype unadjusted and the vcetype unadjusted should in general not be used; they are allowed

by igets mostly because they might serve a pedagogical purpose in some contexts.

The option show displays the estimates generated by the two-step estimator on which the

relevant GMM estimator is based. By default, igets uses these estimates as initial values for the

latter. In addition, when estimating the IGE of the expectation, the dependent variable is divided

by its mean—which affects the intercept of the linear predictor but not the elasticity—as this

often facilitates convergence. Both behaviors can be changed by specifying the options

noinitvalues and nostandardize, respectively. In addition to what is stored by gmm, igets stores

a few macros in e( ), which are listed in the command’s help.

8. Estimating IGEs in the two-sample context: Examples

In this section I present examples of estimation of IGEs in the two-sample context, using

the command igets and a subset of the PSID-based data employed by Mitnik (2017c). Like the

data used in the previous section, the “artificial two-sample” data I use in this section represent

19 Moreover, there is no guarantee that the iterative GMM estimator will converge (e.g., Hall 2005:90).

37

U.S. men and women born between 1966 and 1974. I mostly focus on the estimation of the IGE

of the expectation with the TS-GMM-E estimator, but at the end I also present an example of

estimation of the IGE of the geometric mean with the GMM version of the TSTSLS estimator.

As required by igets, the names of the variables used in estimation are the same across

samples, and the auxiliary sample has been appended to the main sample. Most variables are as

in Section 6, although here I use three additional variables:

c_female Child female c_ln_inc Logarithm of child's family income aux Auxiliary sample cluster Cluster variable

There is one more condition that the data must satisfy for use with igets: The children’s

income variables need to be coded as zero (rather than missing) in the auxiliary sample, and the

parental income variables need to be coded as zero (rather than missing) in the main sample. I

start by reading the data and doing the needed replacements:

In all examples in this section I estimate IGEs with the short-run measure of parental

income based on one year of information, which is what is typically available in the two-sample

context. Like ivpoisson, igets does not accept the survey prefix svy but does allow population

weights, which need to be specified with the data at hand. It also allows to request a weight

matrix that accounts for arbitrary correlation among observations within clusters, and that

cluster-robust standard errors be computed, both of which are also needed with the data I use.20

20 This is needed because of the relationship between the observations in the main and the auxiliary samples (see Mitnik 2017c:22).

. qui replace p_ln_srinc_1y = 0 if aux == 0

. qui replace c_ln_inc = 0 if aux == 1

. qui replace c_inc = 0 if aux == 1

. sysuse twosample_data, clear

38

In all cases I specify the option othergmm(nolog) to save space, but it’s better not to do so (to be

able to examine the iteration log for potential convergence issues).

In the first example I use parents’ years of education as instrument. The command and

the corresponding output is this:

As with the poisson and ivpoisson commands, it is straightforward to add controls for

parents’ or children’s age, and to specify different intercepts for different subpopulations. Here I

specify a model in which the auxiliary equation and the linear predictor of the second equation

include different intercepts for men and women:

For estimation, dependent variable was standarized by dividing it by its meanInstruments for equation main: p_yeduc _consInstruments for equation aux: p_yeduc _cons /main_cons -7.829381 1.078968 -7.26 0.000 -9.944119 -5.714642 /main_ige .7601429 .096159 7.91 0.000 .5716748 .9486109 _cons 9.249505 .1560973 59.25 0.000 8.94356 9.55545 p_yeduc .151169 .0118903 12.71 0.000 .1278644 .1744736 Coef. Std. Err. z P>|z| [95% Conf. Interval] Robust (Std. Err. adjusted for 823 clusters in cluster)GMM-E-TS estimatorTwo-sample estimation of the intergenerational elasticity of the expectation

GMM weight matrix: Cluster (cluster)Initial weight matrix: Identity Number of obs = 1,646Number of moments = 4Number of parameters = 4

GMM estimation

note: model is exactly identified

Final GMM criterion Q(b) = 1.28e-31

> wmatrix (cluster cluster) othergmm(nolog). igets c_inc [pw=c_pweight], instruments(p_yeduc) sampaux(aux) depvaraux(p_ln_srinc_1y)

39

In the results table produced by igets, the IGE of the expectation is identified by the term

“ige” while all coefficients from the main equation are identified by the prefix “main”—

although, as we can see in the output from the first two examples, the structure of the table is

different when the equations include additional right-hand variables, beyond those that are

required, and when they don’t. The results show that allowing different intercepts by gender

makes essentially no difference for the estimated IGE, as both models put it at about 0.76.

In the above examples, the computation of cluster-robust standard errors is implied by the

use of a weight matrix of the wmtype cluster. This is so because the default vcetype in igets is

based on the wmtype specified in the wmatrix() option. Now, as the default wmtype is robust,

For estimation, dependent variable was standarized by dividing it by its meanInstruments for equation main: p_yeduc 0b.c_female 1.c_female _consInstruments for equation aux: p_yeduc 0b.c_female 1.c_female _cons _cons -7.753714 1.066004 -7.27 0.000 -9.843043 -5.664386 1.c_female -.0913637 .0750093 -1.22 0.223 -.2383793 .0556519main_other _cons .7572668 .0951306 7.96 0.000 .5708142 .9437193main_ige _cons 9.22405 .1599308 57.68 0.000 8.910591 9.537509 p_yeduc .151244 .0118557 12.76 0.000 .1280071 .1744808 1.c_female .0511126 .0561278 0.91 0.362 -.0588959 .1611211aux Coef. Std. Err. z P>|z| [95% Conf. Interval] Robust (Std. Err. adjusted for 823 clusters in cluster)GMM-E-TS estimatorTwo-sample estimation of the intergenerational elasticity of the expectation

GMM weight matrix: Cluster (cluster)Initial weight matrix: Identity Number of obs = 1,646Number of moments = 6Number of parameters = 6

GMM estimation

note: model is exactly identified

Final GMM criterion Q(b) = 1.01e-31

> wmatrix (cluster cluster) othergmm(nolog). igets c_inc i.c_female [pw=c_pweight], instruments(p_yeduc) sampaux(aux) depvaraux(p_ln_srinc_1y)

40

the default vcetype is also robust. Therefore, had the data been the result of SRS, estimation of

the first model could have been accomplished by the following command:

that is, without having to specify robust standard errors (which are mandatory with the GMM- E-TS estimator). This command only includes the minimum information required by igets: The

children’s income variable, one instrument, the indicator variable identifying the auxiliary

sample, and the parents’ income variable (in logarithms).

In the following example I specify a model in which both parents’ years of education and

its square are used as instruments. As the model converges very slowly with the default

algorithm, I specify the option technique(nr 15 dfp 5). For the reason explained earlier, I also

specify the option othergmm(igmm igmmiterate(8)). The results follow, showing a larger IGE

estimate than in the previous two examples:

. igets c_inc, instruments(p_yeduc) sampaux(aux) depvaraux(p_ln_srinc_1y)

For estimation, dependent variable was standarized by dividing it by its meanInstruments for equation main: p_yeduc c.p_yeduc#c.p_yeduc _consInstruments for equation aux: p_yeduc c.p_yeduc#c.p_yeduc _cons /main_cons -8.734743 1.129218 -7.74 0.000 -10.94797 -6.521516 /main_ige .8392476 .1008351 8.32 0.000 .6416145 1.036881 _cons 8.575028 .4330375 19.80 0.000 7.726291 9.423766 c.p_yeduc#c.p_yeduc -.0045586 .0027657 -1.65 0.099 -.0099792 .000862 p_yeduc .2641736 .0691984 3.82 0.000 .1285472 .3998 Coef. Std. Err. z P>|z| [95% Conf. Interval] Robust (Std. Err. adjusted for 823 clusters in cluster)GMM-E-TS estimatorTwo-sample estimation of the intergenerational elasticity of the expectation

GMM weight matrix: Cluster (cluster)Initial weight matrix: Identity Number of obs = 1,646Number of moments = 6Number of parameters = 5

GMM estimation

Final GMM criterion Q(b) = .0018108

note: iterative GMM parameter vector converged

> wmatrix (cluster cluster) othergmm(igmm igmmiterate(8) nolog) tech(nr 15 dfp 5). igets c_inc [pw=c_pweight], instruments(c.p_yeduc c.p_yeduc#c.p_yeduc) sampaux(aux) depvaraux(p_ln_srinc_1y)

41

The last example of estimation of the IGE of the expectation instruments the logarithm of

parental income with father’s occupation:

For estimation, dependent variable was standarized by dividing it by its meanWarning: convergence not achieved 20.f_occ 21.f_occ 22.f_occ 23.f_occ 24.f_occ 25.f_occ 999.f_occ _cons 8.f_occ 9.f_occ 10.f_occ 11.f_occ 13.f_occ 14.f_occ 15.f_occ 16.f_occ 17.f_occ 18.f_occ 19.f_occInstruments for equation main: 0b.f_occ 1.f_occ 2.f_occ 3.f_occ 4.f_occ 5.f_occ 6.f_occ 7.f_occ 20.f_occ 21.f_occ 22.f_occ 23.f_occ 24.f_occ 25.f_occ 999.f_occ _cons 8.f_occ 9.f_occ 10.f_occ 11.f_occ 13.f_occ 14.f_occ 15.f_occ 16.f_occ 17.f_occ 18.f_occ 19.f_occInstruments for equation aux: 0b.f_occ 1.f_occ 2.f_occ 3.f_occ 4.f_occ 5.f_occ 6.f_occ 7.f_occ /main_cons -7.217849 .9484083 -7.61 0.000 -9.076695 -5.359003 /main_ige .7011201 .0851389 8.24 0.000 .5342509 .8679894 _cons 10.62134 .2042857 51.99 0.000 10.22095 11.02174 DK; NA; refused .1579691 .2281777 0.69 0.489 -.289251 .6051892 Military Specific Occupations .0890164 .2589856 0.34 0.731 -.4185861 .596619 Transp. and Material Moving Occs. .1863462 .2209411 0.84 0.399 -.2466905 .6193829 Production Occupations .3117909 .2123928 1.47 0.142 -.1044913 .728073 Inst., Maint., and Repair Workers .6662887 .2197166 3.03 0.002 .2356521 1.096925 Extraction Workers -.1083873 .2137944 -0.51 0.612 -.5274166 .3106421 Construction Trades .3426165 .2404355 1.42 0.154 -.1286285 .8138614 Farming, Fish., and Forestry Occs. -.0956342 .2945735 -0.32 0.745 -.6729876 .4817193 Office and Admin. Support Occs. .6616165 .2328089 2.84 0.004 .2053194 1.117914 Sales Occupations .8022661 .2194437 3.66 0.000 .3721644 1.232368 Personal Care and Service Occs. -.7091987 .2726542 -2.60 0.009 -1.243591 -.1748064 Build. and Grounds Maint. Occs. .4490454 .2706518 1.66 0.097 -.0814224 .9795133 Food Prep. and Serving Occupations -.5128079 .2323927 -2.21 0.027 -.9682891 -.0573267 Protective Service Occupations .4813709 .2302279 2.09 0.037 .0301326 .9326092 Healthcare Pract. and Tech. Occs. 1.600922 .3486824 4.59 0.000 .9175171 2.284327Arts, Entert., Sports, and Media Oc.. .9253009 .2440438 3.79 0.000 .4469839 1.403618 Education, Train., and Libr. Occs. .8108341 .2153786 3.76 0.000 .3886997 1.232968 Legal Occupations .9632733 .204361 4.71 0.000 .5627331 1.363814 Community and Social Servs. Occs. .5672367 .2139011 2.65 0.008 .1479982 .9864752 Life, Physical, and Soc. Sc. Occs. .9000578 .2362636 3.81 0.000 .4369896 1.363126 Architecture and Engineering Occs. .861216 .2213654 3.89 0.000 .4273478 1.295084Computer and Mathematical Occupations .9330239 .2413472 3.87 0.000 .459992 1.406056 Financial Specialists .9509302 .2331198 4.08 0.000 .4940238 1.407837 Business Operations Specialists .4077267 .2318554 1.76 0.079 -.0467015 .8621549 Management Occupations .8470761 .2279301 3.72 0.000 .4003414 1.293811 f_occ Coef. Std. Err. z P>|z| [95% Conf. Interval] Robust (Std. Err. adjusted for 823 clusters in cluster)GMM-E-TS estimatorTwo-sample estimation of the intergenerational elasticity of the expectation

GMM weight matrix: Cluster (cluster)Initial weight matrix: Identity Number of obs = 1,646Number of moments = 52Number of parameters = 28

GMM estimation

Final GMM criterion Q(b) = .0086302

note: maximum number of GMM iterations reached

> cluster) othergmm(igmm igmmiterate(8) nolog). igets c_inc [pw=c_pweight], instruments(i.f_occ) sampaux(aux) depvaraux(p_ln_srinc_1y) wmatrix (cluster

42

As in all other examples in Section 6 and in this section in which I used multiple

instruments with a GMM estimator, here I requested that the iterative GMM estimator, with up

to 8 iterations, be used (rather than the default GMM estimator, where the number of iterations is

2). Unlike in all previous cases, however, in this case the GMM iterative estimator did not

converge in 8 iterations, and this is noted in two places in the output: Indirectly at the top, where

there is a note saying “maximum number of GMM iterations reached,” and directly at the

bottom, where there is a warning saying “convergence not achieved.” This is fully

unproblematic. Like the default estimator, any GMM iterative estimator based on # > 2 iterations

is consistent; moreover, as already indicated, it is likely more efficient that the default estimator

in finite samples.

As I am using data that are “artificially two-sample,” and the original data include a

measure of long-run parental income, the long-run IGE of the expectation can be estimated with

the PPML estimator. This estimate is the one I reported in Section 6, i.e. 0.60 (see the first

example in that section). Therefore, examining the results obtained in this section, we see that (a)

as expected, all IGE estimates produced by the GMM-E-TS estimator are upper-bound estimates,

and (b) as with the GMM-IVP estimator in the one-sample context, in the two-sample context the

estimate using father’s occupation as instrument, 0.70, provides the tightest upper bound, while

those obtained by using parents’ years of education, alone and as second degree polynomial, put

the IGE at 0.76 and 0.84, respectively.

I finish the section with an example of estimation of the conventional IGE, in which I use

parents’ years of education as instrument. Estimation of the conventional IGE is achieved by

substituting the logarithm of children’s income for children’s income as outcome variable, and

43

specifying the option geometricmean. I also specify the option show, just to show that, with a

single instrument, igets reproduces the point estimate obtained with the TSTSLS estimator.

The estimate of 0.91 is larger than the log-run estimate of the conventional IGE obtained with

Instruments for equation main: p_yeduc _consInstruments for equation aux: p_yeduc _cons /main_cons 1.060185 1.221688 0.87 0.386 -1.334279 3.454649/main_ige_gm .9117561 .1087229 8.39 0.000 .6986632 1.124849 _cons 9.249505 .1560973 59.25 0.000 8.94356 9.55545 p_yeduc .151169 .0118903 12.71 0.000 .1278644 .1744736 Coef. Std. Err. z P>|z| [95% Conf. Interval] Robust (Std. Err. adjusted for 823 clusters in cluster)GMM version of the TSTSLS estimatorTwo-sample estimation of the intergenerational elasticity of the geometric mean

GMM weight matrix: Cluster (cluster)Initial weight matrix: Identity Number of obs = 1,644Number of moments = 4Number of parameters = 4

GMM estimation

note: model is exactly identified

Final GMM criterion Q(b) = 1.20e-31

Joint estimation by GMM

_cons 1.060185 1.180002 0.90 0.369 -1.255999 3.376369p_ln_srinc_1y_hat .9117561 .1050486 8.68 0.000 .7055599 1.117952 c_ln_inc Coef. Std. Err. t P>|t| [95% Conf. Interval] Robust

Root MSE = .77855 R-squared = 0.1325 Prob > F = 0.0000 F(1, 819) = 75.33Linear regression Number of obs = 821

(sum of wgt is 3.1455e+04)Second-step regression

_cons 9.249505 .1562873 59.18 0.000 8.942735 9.556275 p_yeduc .151169 .0119048 12.70 0.000 .1278015 .1745364 p_ln_srin~1y Coef. Std. Err. t P>|t| [95% Conf. Interval] Robust

Root MSE = .66842 R-squared = 0.1996 Prob > F = 0.0000 F(1, 821) = 161.24Linear regression Number of obs = 823

(sum of wgt is 3.1493e+04)First-step regression

Generating initial values

> cluster) geometricmean show othergmm(nolog). igets c_ln_inc [pw = c_pweight], instruments(p_yeduc) sampaux(aux) depvaraux(p_ln_srinc_1y) wmatrix (cluster

44

the underlying data and the OLS estimator, which puts that IGE at 0.70. This is in agreement

with the standard approach in the literature, which treats TSTSLS estimates as upper-bound

estimates.

9. Concluding remarks

I have explained how to use Stata official commands to estimate the IGE of the

expectation in the one-sample context, employing the PPML and GMM-IVP estimators, and how

to construct confidence intervals for the partially identified IGE when the estimates obtained

with those estimators are combined to generate a set estimate. I have also introduced the user-

written program igets, which allows to estimate IGEs in the two-sample context, and have shown

how to use it to estimate the IGE of the expectation with the GMM-E-TS estimator. By

providing a suite of estimators and Stata commands to estimate the IGE of the expectation in all

situations in which the IGE of the geometric mean has been estimated by mobility scholars, I

hope to contribute to the replacement of the latter by the former as the workhorse

intergenerational elasticity.

10. References

Aaronson, David and Bhashkar Mazumder. 2008. “Intergenerational Economic Mobility in the US: 1940 to 2000.” Journal of Human Resources 43(1): 139-172.

Björklund, Anders and Markus Jäntti. 2000. “Intergenerational Mobility of Socio-Economic Status in Comparative Perspective.” Nordic Journal of Political Economy 26: 3-32.

Böhlmark, Anders and Matthew Lindquist. 2006. “Life-Cycle Variations in the Association between Current and Lifetime Income: Replication and Extension for Sweden,” Journal of Labor Economics, 24(4): 879–896.

Black, Sandra and Paul Devereux. 2011. “Recent Developments in Intergenerational Mobility.” Handbook of Labor Economics, Volume 4b, edited by David Card and Orley Ashenfelter. Amsterdam: Elsevier.

Bloome, Deirdre. 2015. “Income Inequality and Intergenerational Income Mobility in the United States.” Social Forces, 93(3): 1047-1080.

Bloome, Deirdre and Western. 2011. “Cohort Change and Racial Differences in Educational and Income Mobility.” Social Forces, 90(2): 375-395.

45

Bratsberg, Bernt, Knut Røed, Oddbjørn Raaum, Robin Naylor, Markus Jäntti, Tor Eriksson and Eva Österbacka. 2007. “Nonlinearities in Intergenerational Earnings Mobility. Consequences for Cross-Country Comparisons.” The Economic Journal, 117 (March) C72-C92.

Carroll, Raymond, David Ruppert, Leonard Stefanski, and Ciprian Crainiceanu. 2006. Measurement Error in Nonlinear Models: A Modern Perspective (second edition). Boca Raton: Chapman & Hall/CRC.

Chadwick, Laura and Gary Solon. 2002. “Intergenerational Income Mobility among Daughters.” The American Economic Review 92(1): 335-344.

Corak, Miles. 2006. “Do Poor Children Become Poor Adults? Lessons from a Cross Country Comparison of Generational Earnings Mobility.” IZA Discussion Paper No. 1993, IZA.

Corak, Miles. 2013. “Income Inequality, Equality of Opportunity, and Intergenerational Mobility.” Journal of Economic Perspectives 27 (3):79-102.

Gourieroux, C, A. Monfort and A. Trognon. 1984. “Pseudo Maximum Likelihood Methods: Theory.” Econometrica 52(3): 681-700.

Grawe, Nathan. 2003. “Life Cycle Bias in the Estimation of Intergenerational Earnings Persistence." Analytical Studies Branch Research Paper Series 2003207e, Statistics Canada, Analytical Studies Branch.

Haider, Steven and Gary Solon. 2006. “Life-Cycle Variation in the Association between Current and Lifetime Earnings.” American Economic Review 96(4): 1308-1320.

Hall, Alastair. 2005. Generalized Method of Moments. Oxford: Oxford University Press. Hardin, James. 2002. “The Robust Variance Estimator for Two-Stage Models.” The Stata

Journal 2(3): 253-266. Hertz, Tom. 2005. “Rags, Riches and Race: The Intergenerational Economic Mobility of Black

and White Families in the United States.” In Unequal Chances. Family Background and Economic Success, edited by Samuel Bowles, Herbert Gintis, and Melissa Osborne Groves. New York, Princeton and Oxford: Russell Sage and Princeton University Press.

Hertz, Tom. 2007. “Trends in the Intergenerational Elasticity of Family Income in the United States.” Industrial Relations 46(1): 22-50.

Imbens, Guido and Charles Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica 72(6): 1845-1857.

Inoue, Atsushi and Gary Solon. 2010. “Two-Sample Instrumental Variables Estimators.” The Review of Economics and Statistics 92(3):557-561.

Jäntti, Markus and Stephen Jenkins. 2015. “Income Mobility.” In Handbook of Income Distribution (Vol. 2A), edited by Anthony Atkinson and François Bourguignon. Amsterdam: Elsevier.

Jerrim, John, Álvaro Choi, and Rosa Simancas. 2016. “Two-Sample Two-Stage Least Squares (TSTSLS) Estimates of Earnings Mobility: How Consistent Are They?” Survey Research Methods 10(2):85-102.

46

Landersø, Rasmus and James Heckman. 2016. “The Scandinavian Fantasy: The Sources of Intergenerational Mobility in Denmark and the U.S.” NBER Working Paper 22465.

Manski, Charles. 2003. Partial Identification of Probability Distributions. New York: Springer-Verlag.

Mazumder, Bhashkar. 2001. “The Miss-measurement of Permanent Earnings: New Evidence from Social Security Earnings Data.” Federal Reserve Bank of Chicago Working Paper 2001-24.

Mazumder, Bhashkar. 2005. “Fortunate Sons: New Estimates of Intergenerational Mobility in the United States Using Social Security Earnings Data.” The Review of Economics and Statistics 87(2): 235-255.

Mazumder, Bhashkar. 2016. “Estimating the Intergenerational Elasticity and Rank Association in the United States: Overcoming the Current Limitation of Tax Data.” In Inequality: Causes and Consequences, edited by Lorenzo Cappellari, Solomon Polacheck, and Konstantinos Tatsiramos. Bingley: Emerald.

Mullahy, John. 1997. “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior.” Review of Economics and Statistics 79(4): 586-593.

Mayer, Susan E., and Leonard Lopoo. 2008. “Government Spending and Intergenerational Mobility.” Journal of Public Economics 92: 139-58.

Mitnik, Pablo. 2017a. “Estimating the Intergenerational Elasticity of Expected Income with Short-Run Income Measures: A Generalized Error-in-Variables Model.” Stanford Center on Poverty and Inequality Working Paper.

Mitnik, Pablo. 2017b. “Intergenerational Income Elasticities, Instrumental Variable Estimation, and Bracketing Strategies.” Stanford Center on Poverty and Inequality Working Paper.

Mitnik, Pablo. 2017c. “Two-Sample Estimation of the Intergenerational Elasticity of Expected Income.” Stanford Center on Poverty and Inequality Working Paper.

Mitnik, Pablo, Victoria Bryant, Michael Weber and David Grusky. 2015. “New Estimates of Intergenerational Mobility Using Administrative Data.” SOI Working Paper, Statistics of Income Division, Internal Revenue Service.

Mitnik, Pablo and David Grusky. 2017. The Intergenerational Elasticity of What? A Call for Redefining the Workhorse Measure of Economic Mobility.” Stanford Center on Poverty and Inequality Working Paper.

Mulligan, Casey. 1997. Parental Priorities and Economic Inequality. Chicago: University of Chicago Press.

Murphy, Kevin and Robert Topel. 1985. “Estimation and Inference in Two-Step Econometric Models.” Journal of Business & Economic Statistics 3(4): 370-379.

Newey, Whitney. 1984. “A Method of Moments Interpretation of Sequential Estimators.” Economics Letters 14(2-3):201-206.

Ng, Irene. 2007. “Intergenerational Income Mobility in Singapore,” The B.E. Journal of Economic Analysis & Policy 7(2): Article 3.

47

Nybom, Martin and Jan Stuhler. 2016. “Heterogeneous Income Profiles and Life-Cycle Bias in Intergenerational Mobility Estimation.” The Journal of Human Resources 15(1): 239-268.

Petersen, Trond. 2017. “Multiplicative Models for Continuous Dependent Variables: Estimation on Unlogged versus Logged Form.” Sociological Methodology 47:113-164.

Reiss, Peter. 2016, “Just How Sensitive are Instrumental Variable Estimates?”. Foundations and Trends in Accounting 10(2-4): 204–237.

Santos Silva, J. M. C. and Silvana Tenreyro. 2006. "The Log of Gravity." The Review of Economics and Statistics 88(4): 641-658.

Santos Silva, J. M. C. and Silvana Tenreyro. 2011. “Further Simulation Evidence on the Performance of the Poisson Pseudo-maximum Likelihood Estimator.” Economics Letters 112: 220-222.

Solon, Gary. 1992. “Intergenerational Income Mobility in the United States.” American Economic Review 82(3): 393-408.

Solon, Gary. 1999. “Intergenerational Mobility in the Labor Market.” Handbook of Labor Economics, Volume 3A, edited by Orley C. Ashenfelter and David Card. Amsterdam: Elsevier.

Solon, Gary. 2004. “A Model of Intergenerational Mobility Variation over Time and Place.” In Generational Income Mobility in North America in Europa, edited by Miles Corak. Cambridge: Cambridge University Press.

Stoye, Jörg. 2009. “More on Confidence Regions for Partially Identified Parameters.” Econometrica 77(4):1299–1315.

Tamer, Elie. 2010. “Partial Identification in Econometrics.” The Annual Review of Economics 2: 167-195.

Terza, Joseph, Anirban Basu, Paul J. Rathouz. 2008. “Two-Stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling.” Journal of Health Economics 27(3):531-543.

Windmeijer, F. A. G. and J. M. C. Santos Silva. 1997. “Endogeneity in Count Data Models: An Applications to Demand for Health Care.” Journal of Applied Econometrics 12: 281-294.

Winkelmann, Rainer. 2008. Econometric Analysis of Count Data. Berlin: Springer-Verlag. Wooldridge, Jeffrey. 1999. “Quasi-Likelihood Methods for Count Data.” In Handbook of

Applied Econometrics Volume II: Microeconomics, edited by M. Hashem Pesaran and Peter Schmidt. Blackwell Publishing.

Wooldridge, Jeffrey. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass: The MIT Press.

Zimmerman, David J. 1992. “Regression toward Mediocrity in Economic Stature.” American Economic Review 82 (3), pp. 409-429.

48