
selection bias and self-selection

workers' cooperation, mutual training, and tenure longevity is another old idea in economics. A recent neoclassical application, with abundant citations, is that of Oliver E. Williamson (1975), who argues that such institutional devices as implicit contracts, collective bargaining, internal promotion ladders, and seniority rights are economically efficient when jobs and workers are heterogeneous and idiosyncratic.

A fixed structure of wages for jobs, which is emphasized by segmentation economists, is descriptively accurate and useful for analysing short-run behaviour, but even in the short run a human capital model of supply-side productivity traits can explain the match of workers to a hierarchy of wage-fixed jobs. In the long run the human capital model can explain changes in workers' productivity traits, and neoclassical models generally would predict changes in the structure of both jobs and wages.

A discussion of empirical work and policy issues concerning segmented labour markets is beyond the scope of this entry (see the bibliography below). It should be stated, however, that the sometime claim that the neoclassical economists ignore the demand side of the market in policy discussions is unfounded.

That labour market outcomes and processes are complex and controversial is evident in the intellectual legacy of the above-listed five sources of inequality. The criticisms and empirical work of the segmented labour market economists have added to this legacy, but they, like the earlier dissenters, the Marxists and the Institutionalists, remain on the bank of the mainstream.

GLEN G. CAIN

BIBLIOGRAPHY
The literature on segmented labour markets is extensive and diversified, and there are disputes about who are the leading theorists and which are the landmark articles. These characteristics make it difficult to provide a brief bibliography. In addition to the items cited in the text, several survey articles and books contain lengthy bibliographies: Taubman and Wachter (1986); Gordon, Edwards and Reich (1982); Wilkinson (1981); Cain (1976). The application of segmented labour market theories to development economics is not, however, covered in these sources, and the author is unaware of any survey or bibliographic sources for this application.

Becker, G.S. 1964. Human Capital. New York: Columbia University Press for the National Bureau of Economic Research.

Berger, S. and Piore, M.J. 1980. Dualism and Discontinuity in Industrial Societies. Cambridge: Cambridge University Press.

Braverman, H. 1974. Labor and Monopoly Capital. New York: Monthly Review Press.

Cain, G. 1976. The challenge of segmented labor market theories to orthodox theory: a survey. Journal of Economic Literature 14(4), December, 1215-57.

Cairnes, J.E. 1874. Some Leading Principles of Political Economy. New York: Harper & Brothers.

Gordon, D.M., Edwards, R.C. and Reich, M.S. 1982. Segmented Work, Divided Workers: The Historical Transformation of Labor in the United States. Cambridge: Cambridge University Press.

Marshall, A. [1890] 1959. Principles of Economics. 8th edn, London: Macmillan.

Mill, J.S. [1848] 1900. The Principles of Political Economy, Vol. I. Revised edn, The World's Greatest Classics, New York: Colonial Press.

Myrdal, G. 1944. An American Dilemma. New York: Harper & Row.

Taubman, P. and Wachter, M.L. 1986. Segmented labor markets. In Handbook of Labor Economics, ed. O. Ashenfelter and R. Layard, Amsterdam: Elsevier Science Publishers.

Wertheim, W.F. 1967. Economy, dual. In International Encyclopedia of the Social Sciences, Vol. 4, New York: Macmillan and Free Press, 495-500.

Wilkinson, F. (ed.) 1981. The Dynamics of Labor Market Segmentation. New York: Academic Press.

Williamson, O.E. 1975. Markets and Hierarchies: Analysis and Antitrust Implications. New York: Free Press.

seigniorage. Full-bodied monies such as gold coin contain metal approximately equal in value to the face value of the coin. Under the gold standard, metal could be brought to the mint and freely coined into gold, less a small seigniorage charge for the privilege. Subsidiary or token coin and paper money by contrast cost much less to produce than their face value. The excess of the face value over the cost of production of currency is also called seigniorage, because it accrued to the seigneur or ruler who issued the currency, in early times.

The use of paper money instead of full-bodied coin by modern governments generates a very large social saving in the use of the resources that would otherwise have to be expended in mining and smelting large quantities of metal. The value of this seigniorage can be measured by considering the aggregate demand curve for currency, as a function of the rate of interest. The area under this demand curve represents the aggregate flow of social benefits from holding currency, under certain assumptions. The social cost of holding currency is measured by the opportunity cost of the resources it takes to produce the currency. If gold were used for currency, its opportunity cost would be measured by the rate of interest that could be earned on those resources if transferred to some other use. Thus the area under the demand curve between the market rate of interest and the cost of providing paper currency represents the flow of seigniorage or social saving that accrues from the use of paper currency instead of gold.

In the international monetary system, gold remains a very large fraction of total holdings of international reserves (about 45 per cent of total reserves valued at market prices at the end of March 1985). Substitution of fiduciary reserve assets such as Special Drawing Rights created by the International Monetary Fund or United States dollars for gold would generate a substantial social gain in the form of seigniorage equal to the excess of the opportunity cost of capital over the costs of providing the fiduciary asset. If interest is paid to the holders of the reserve asset, the seigniorage is split between the issuer and the holder.

The existence of these large seigniorage gains is what led to the development of the gold exchange standard, under which first British sterling, before World War II, and since then United States dollars and other currencies have substituted for gold in international reserve holdings. As interest rates paid on these reserve assets have risen, more of the seigniorage has accrued to holders of reserve assets.

Further substitution of fiduciary reserve assets for gold in the international monetary system has frequently been suggested, and the Second Amendment to the Charter of the International Monetary Fund adopted in 1978 proposed such a goal. Little progress has been made, however, since the underlying issue is one of trust in the financial probity of the issuer and its continued political stability, as well as its continued willingness to convert reserve assets into usable currencies over long periods of time.

S. BLACK

selection. See COMPETITION AND SELECTION.

selection bias and self-selection. The problem of selection bias in economic and social statistics arises when a rule other than simple random sampling is used to sample the underlying

population that is the object of interest. The distorted representation of a true population as a consequence of a sampling rule is the essence of the selection problem. Distorting selection rules may be the outcome of decisions of sample survey statisticians, self-selection decisions by the agents being studied or both.

A random sample of a population produces a description of the population distribution of characteristics that has many desirable properties. One attractive feature of a random sample generated by the known rule that all individuals are equally likely to be sampled is that it produces a description of the population distribution of characteristics that becomes increasingly accurate as sample size expands.

A sample selected by any rule not equivalent to random sampling produces a description of the population distribution of characteristics that does not accurately describe the true population distribution of characteristics no matter how big the sample size. Unless the rule by which the sample is selected is known or can be recovered from the data, the selected sample cannot be used to produce an accurate description of the underlying population. For certain sampling rules, even knowledge of the rule generating the sample does not suffice to recover the population distribution from the sampled distribution.

This entry defines the problem of selection bias and presents conditions required to solve the problem. Examples of various types of commonly encountered sampling frames are given and specific economic selection mechanisms are presented. Assumptions required to use selected samples to determine features of the population distribution are discussed.

The analytical framework developed to understand the inferential problems raised by selection bias is also fruitful in understanding the economics of self-selection. The prototypical choice theoretic model of self-selection is that of Roy (1951). In his model, agents choose among a variety of discrete 'occupational' opportunities. Agents can pursue only one 'occupation' at a time. While every person can, in principle, do the work in each 'occupation', at least at some level of competence, self-interest drives individuals to choose that 'occupation' which produces the highest income (utility) for them. As in the statistical selection bias problem, there is a latent population (of skills). Observed (utilized) skill distributions are the outcome of a selection rule by agents. The relationship between observed and latent skill distributions is of considerable interest and underlies recent work on worker hierarchies (see Willis and Rosen, 1979). The 'occupations' can be: (a) market work or non-market work, (b) unemployment and searching or working at the offered wage, (c) working in one province or working in another, or (d) any choice among a set of mutually exclusive opportunities.

Because the insights in the Roy model underlie much recent research, we present a brief exposition of it and demonstrate how it can be or has been fruitfully extended to a variety of settings. An important issue, closely linked to the problem of identifying population parameters from selected sample distributions, is the empirical content of economic models of self-selection and worker hierarchies. Are they artefacts of distributional assumptions for unobservable skills or are they genuine behavioural hypotheses?

I. A DEFINITION AND SOME EXAMPLES OF SELECTION BIAS

Any selection bias model can be described by the following set-up. Let Y be a vector of outcomes of interest and let X be a vector of 'control' or 'explanatory' variables. The population distribution of (Y, X) is F(y, x). To simplify the exposition we assume that the density is well defined and write it as f(y, x).

Any sampling rule can be interpreted as producing a non-negative weighting function w(y, x) that alters the population density.

Let (Y*, X*) denote the sampled random variables. The density of the sampled data g(y*, x*) may be written as

$$g(y^*, x^*) = \frac{w(y^*, x^*)\, f(y^*, x^*)}{\iint w(y^*, x^*)\, f(y^*, x^*)\, dy^*\, dx^*} \tag{1.1}$$

where the denominator of the expression is introduced to make the density g(y*, x*) integrate to one as is required for proper densities.

Alternatively, the weight may be defined as

$$w^*(y^*, x^*) = \frac{w(y^*, x^*)}{\iint w(y^*, x^*)\, f(y^*, x^*)\, dy^*\, dx^*}$$

so that

$$g(y^*, x^*) = w^*(y^*, x^*)\, f(y^*, x^*). \tag{1.2}$$

Sampling schemes for which w(y, x) = 0 for some values of (Y, X) create special problems. For such schemes, not all values of (Y, X) are sampled. Let indicator variable i(y, x) = 0 if a potential observation at values y, x cannot be sampled and let i(y, x) = 1 otherwise. Let Δ = 1 record the occurrence of the event 'a potential observation is sampled, i.e. the value of y, x is observed' and let Δ = 0 if it is not. In the population, the proportion that is sampled is

$$\Pr(\Delta = 1) = \iint i(y, x)\, f(y, x)\, dy\, dx \tag{1.3}$$

while

$$\Pr(\Delta = 0) = 1 - \Pr(\Delta = 1).$$

For samples in which w(y, x) = 0 for a non-negligible proportion of the population (Pr(Δ = 0) > 0), it is clarifying to consider two cases. A truncated sample is one for which Pr(Δ = 1) is not known and cannot be consistently estimated. For such a sample, (1.1) is the density of all of the sampled Y and X values. A censored sample is one for which Pr(Δ = 1) is known or can be consistently estimated. The sampling rule in this case is such that values of y, x for which w(y, x) = 0 are not known but it is known whether or not i(y, x) = 0 for all values of Y, X. In this case it is notationally convenient to define (Y*, X*) = (0, 0) for values of y, x such that w(y, x) = i(y, x) = 0. Such a definition is innocuous provided that in the population there is no point mass (concentration of probability mass) at (0, 0). (Any value other than (0, 0) can be selected provided that there is no point mass at that value.) Given Δ = 0, the distribution of Y*, X* is

$$G(y^*, x^*) = 1 \quad \text{for } \Delta = 0$$

at Y* = 0 and X* = 0.

The joint density of Y*, X*, Δ for the case of a censored sample is obtained by combining (1.1) and (1.3). Thus

$$g(y^*, x^*, \delta) = \left[\frac{w(y^*, x^*)\, f(y^*, x^*)}{\iint w(y^*, x^*)\, f(y^*, x^*)\, dy^*\, dx^*}\right]^{\delta} \left[\iint i(y, x)\, f(y, x)\, dy\, dx\right]^{\delta} \times [1]^{1-\delta} \left[\iint \bigl(1 - i(y, x)\bigr) f(y, x)\, dy\, dx\right]^{1-\delta}. \tag{1.4}$$

The first term on the right-hand side of (1.4) is the conditional density of Y*, X* given Δ = 1. The second term is the probability that Δ = 1. The third term is the conditional density of Y*, X* given Δ = 0. This density assigns unit mass to Y* = 0, X* = 0 when Δ = 0. The fourth term is the probability that Δ = 0. Notice that in the case in which w(y, x) > 0 for all y, x, Δ = 1 and (1.4) is identical to (1.1).

In a random sample w(y*, x*) = 1 (and so w*(y*, x*) = 1). In a selected sample, the sampling rule weights the data differently. Values of (Y, X) are over-sampled or under-sampled relative to their occurrence in the population. In the case of truncated samples, the weight is zero for certain values of the outcome.

In many problems in economics, attention focuses on f(y | x), the conditional density of Y given X = x. In such problems knowledge of the population distribution of X is of no direct interest. If samples are selected solely on the x variables ('selection on the exogenous variables'), w(y, x) = w(x) and there is no problem about using selected samples to make valid inference about the population conditional density. This is so because in the case of selection on the exogenous variables

$$g(y^*, x^*) = f(y^* \mid x^*)\, \frac{w(x^*)\, f(x^*)}{\int w(x^*)\, f(x^*)\, dx^*}$$

and

$$g(x^*) = \frac{w(x^*)\, f(x^*)}{\int w(x^*)\, f(x^*)\, dx^*}.$$

Thus

$$g(y^* \mid x^*) = \frac{g(y^*, x^*)}{g(x^*)} = f(y^* \mid x^*).$$

For such problems, sample selection distorts inference only if selection occurs on y (or y and x). Sampling on both y and x is termed general stratified sampling.

From a sample of data, it is not possible to recover the true density f(y, x) without knowledge of the weighting rule. On the other hand, if the weighting rule is known (w(y*, x*)), the density of the sampled data is known (g(y*, x*)), the support of (y, x) is known and w(y, x) is nonzero, then f(y, x) can always be recovered because

$$\frac{g(y^*, x^*)}{w(y^*, x^*)} = \frac{f(y^*, x^*)}{\iint w(y^*, x^*)\, f(y^*, x^*)\, dy^*\, dx^*} \tag{1.5}$$

and by hypothesis both the numerator and denominator of the left-hand side are known. From the requirement that (y*, x*) has a well defined density

$$\iint f(y^*, x^*)\, dy^*\, dx^* = 1.$$

Integrating the left-hand side of (1.5) it is possible to determine ∫∫ w(y*, x*) f(y*, x*) dy* dx* and hence to use (1.5) to recover the population density of the data.

The requirements that (a) the support of (y, x) is known and (b) w(y, x) is nonzero are not innocuous. In many important problems in economics requirement (b) is not satisfied: the sampling rule excludes observations for certain values of y, x and hence it is impossible without invoking further assumptions to determine the population distribution of (Y, X) at those values. If neither the support nor the weight is known, it is impossible, without invoking strong assumptions, to determine whether the fact that data are missing at certain y, x values is due to the sampling plan or that the population density has no support at those values. We now turn to some specific sampling plans of interest in economics.

Example 1. Data are collected on incomes of individuals whose income Y exceeds a certain value c (for cutoff value). The rule is to observe Y if Y > c. Thus w(y) = 1 if y > c and w(y) = 0 if y ≤ c. Because the weight is zero for some values of y, we know that knowledge of the sampling rule does not suffice to recover the population distribution. From a random sample of the entire population, the social scientist knows or can consistently estimate (a) the sample distribution of Y above c and (b) the proportion of the original random sample with income below c (F(c) where F is the distribution function of Y). The social scientist does not observe values of Y below c.

In this example, observed income is a truncated random variable. The point of truncation is c. The sample of observed income is said to be censored. If the proportion of the original random sample with income below c is not known and cannot be consistently estimated, the sample is truncated. In a truncated sample, nothing is known about the proportion of the underlying population that can appear in the sample. A sample is truncated only if w(y) = 0 for some intervals of y (for y continuous) or if w(y) = 0 at values of y at which there is finite probability mass. In a censored sample, the proportion of the underlying population that can appear in the sample is known, at least to an arbitrarily high degree of approximation, as sample size increases.

Let Y* = Y if Y > c. Define Y* = 0 otherwise (the choice of the value for Y* when Y is not observed is inessential and any value can be used in place of 0 provided that the true distribution places no mass at the selected value). Define an indicator variable Δ = 1 if Y > c, Δ = 0 otherwise. Then the distribution of Y* is

$$G(y^* \mid Y^* > 0) = F(y^* \mid Y > c) = F(y^* \mid \Delta = 1) = \frac{F(y^*)}{1 - F(c)}, \quad y^* > c \tag{1.6a}$$

$$G(y^* \mid Y^* = 0) = 1 \quad \text{for } Y^* = 0\ (\Delta = 0). \tag{1.6b}$$

Observe that (1.6a) is obtained from (1.1) by setting w(y*) = 1 if y > c, and w(y*) = 0 otherwise, and integrating up with respect to y*. The distribution of Δ is

$$\Pr(\delta) = [1 - F(c)]^{\delta}\, [F(c)]^{1-\delta}.$$

The joint distribution of (Y*, Δ) is

$$F(y^*, \delta) = F(y^* \mid \delta)\Pr(\delta) = \left\{\frac{F(y^*)}{1 - F(c)}\right\}^{\delta} [1 - F(c)]^{\delta}\, (1)^{1-\delta}\, [F(c)]^{1-\delta} = [F(y^*)]^{\delta}\, [F(c)]^{1-\delta}. \tag{1.7}$$

Note that (1.7) is obtained from (1.4) by setting w(y) = 0, y ≤ c, w(y) = 1 otherwise, by setting i(y) = w(y), and by integrating up with respect to y*. For normally distributed Y, (1.7) is the 'Tobit' distribution.

The difference between the information in a truncated sample and the information in a censored sample is encapsulated in the contrast between (1.6a) and (1.7). Clearly there is more information in a censored sample than in a truncated sample because one can obtain (1.6a) from (1.7) (by conditioning on Δ = 1) but not vice versa.

Inferences about the population distribution based on assuming that F(y* | Y > c) closely approximates F(y) are potentially very misleading. A description of population income inequality based on a subsample of high income people may convey no information about the true population distribution.


Without further information about F and its support, it is not possible to recover F from G(y*) from either a censored or a truncated sample. Access to a censored sample enables the analyst to recover F(y) for y > c but obviously does not provide any information on the shape of the true distribution for values of y ≤ c.
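The contrast between the population and the sample selected on Y > c can be illustrated with a small simulation. This is only a sketch under stated assumptions: the normal distribution and the values of `mu`, `sigma`, `c` and the sample size are hypothetical choices made for the illustration, not taken from the entry.

```python
import random

def simulate_income_samples(n=100_000, mu=10.0, sigma=1.0, c=10.5, seed=0):
    """Draw a random sample of Y and apply the rule 'observe Y if Y > c'.

    Assumption for illustration only: Y is normal with mean mu and
    standard deviation sigma; none of these values come from the source.
    """
    rng = random.Random(seed)
    population = [rng.gauss(mu, sigma) for _ in range(n)]
    # Values of Y that survive the sampling rule (seen in both the
    # truncated and the censored sample).
    observed = [y for y in population if y > c]
    # Share of the original random sample at or below c: an estimate of F(c).
    # It is available in a censored sample but not in a truncated one.
    share_below = 1 - len(observed) / n
    return population, observed, share_below

def mean(values):
    return sum(values) / len(values)

population, observed, share_below = simulate_income_samples()
pop_mean = mean(population)
selected_mean = mean(observed)

print(f"population mean of Y       : {pop_mean:.3f}")
print(f"mean of observed Y (Y > c) : {selected_mean:.3f}")
print(f"estimated F(c)             : {share_below:.3f}")
```

In this sketch the mean of the selected sample overstates the population mean, and nothing in `observed` describes the shape of the distribution at or below `c`; only the censored-sample quantity `share_below` records how much of the population is missing.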

This problem is routinely 'solved' by assuming that F is of a known functional form. This solution strategy does not always work. If F is normal, then it can be recovered from a censored or truncated sample (Pearson, 1901). If F is Pareto, F cannot be recovered from either a truncated or a censored sample (see Flinn and Heckman, 1982). If F is real analytic (i.e. possesses derivatives of all order) and the support of Y is known, then F can be recovered (Heckman and Singer, 1985).
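For the normal case, the way a functional form assumption is used can be sketched through the log-likelihoods of the two sampling schemes. The notation here is an assumption introduced for the illustration and does not appear in the entry: Y is taken to be normal with mean μ and standard deviation σ, φ and Φ are the standard normal density and distribution function, y_1, ..., y_n are the observed values above c, and n_0 is the number of censored observations.

```latex
% Truncated sample: only y_i > c are seen, so each density is
% renormalized by the probability of being sampled.
\ell_{\text{trunc}}(\mu, \sigma)
  = \sum_{i=1}^{n} \left[ \ln \phi\!\left(\frac{y_i - \mu}{\sigma}\right)
      - \ln \sigma
      - \ln\!\left(1 - \Phi\!\left(\frac{c - \mu}{\sigma}\right)\right) \right]

% Censored sample: the n_0 unobserved cases contribute Pr(Y <= c).
\ell_{\text{cens}}(\mu, \sigma)
  = \sum_{i=1}^{n} \left[ \ln \phi\!\left(\frac{y_i - \mu}{\sigma}\right)
      - \ln \sigma \right]
    + n_0 \ln \Phi\!\left(\frac{c - \mu}{\sigma}\right)
```

Under the normality assumption both expressions depend only on (μ, σ), so maximizing either one yields estimates of the whole distribution, including the part at or below c that is never observed. The censored form corresponds to the Tobit structure of (1.7).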

Example 2. Expand the discussion in the previous example to a linear regression setting. Let

$$Y = X\beta + U \tag{1.8}$$

be the population earnings function where Y is earnings, X is a regressor vector assumed to be distributed independently of mean zero disturbance U. β is a suitably dimensioned parameter vector. Conventional assumptions are invoked to ensure that ordinary least squares applied to a random sample of earnings data consistently estimates β.

Data are collected on incomes of persons for whom Y exceeds c. Again the weight depends solely on y, i.e. w(y, x) = 0, y ≤ c, w(y, x) = 1, y > c. The social scientist knows or can consistently estimate (a) the sample distribution of Y above c, (b) the sample distribution of the X for Y above c and (c) the proportion of the original random sample with income below c. The social scientist does not observe values of Y below c.

As before, let Y* = Y if Y > c. Define Y* = 0 otherwise. Δ = 1 if Y > c, Δ = 0 otherwise. The probability of the event Δ = 1 given X = x is

$$\Pr(\Delta = 1 \mid X = x) = \Pr(Y > c \mid X = x) = \Pr(U > c - x\beta \mid X = x).$$

Invoking independence between U and X and letting F_u denote the distribution of U,

$$\Pr(\Delta = 1 \mid X = x) = 1 - F_u(c - x\beta)$$

and

$$\Pr(\Delta = 0 \mid X = x) = F_u(c - x\beta).$$

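The distribution of Y* conditional on X is

$$G(y^* \mid Y^* > 0, X = x) = F(y^* \mid X = x, Y > c) = F(y^* \mid X = x, \Delta = 1) = \frac{F_u(y^* - x\beta)}{1 - F_u(c - x\beta)}, \quad y^* > c \tag{1.10a}$$

$$G(y^* \mid Y^* = 0) = 1 \quad \text{for } Y^* = 0\ (\Delta = 0). \tag{1.10b}$$

The joint distribution of (Y*, Δ) given X = x is

$$F(y^*, \delta \mid X = x) = F(y^* \mid \delta, x)\Pr(\delta \mid x) = \{F_u(y^* - x\beta)\}^{\delta}\, \{F_u(c - x\beta)\}^{1-\delta}. \tag{1.11}$$

In particular,

$$E(Y^* \mid X = x, \Delta = 1) = x\beta + E(U \mid X = x, \Delta = 1) = x\beta + \frac{\int_{c - x\beta}^{\infty} z\, dF_u(z)}{1 - F_u(c - x\beta)}, \tag{1.12}$$

where z is a dummy variable of integration. In contrast, the population mean regression function is

$$E(Y \mid X = x) = x\beta. \tag{1.13}$$

[Figure 1: scatter of Y against X, showing the population regression line and the selected sample regression line]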

The contrast between (1.12) and (1.13) is illuminating. Many behavioural theories in social science produce empirical counterparts of (1.8) with population conditional expectations like (1.13). Such theories sometimes restrict the signs, permissible values and other relationships among the coefficients in β. When the theoretical model is estimated on a selected sample (Δ = 1), the true conditional expectation is (1.12) not (1.13). The conditional mean of U depends on x. In terms of conventional omitted variable analysis, E(U | X = x, Δ = 1) is omitted from the regression. Since this term is a function of x it is likely to be correlated with x. Least squares estimates of β obtained on selected samples which do not account for selection are biased and inconsistent.

To illustrate the nature of the bias, it is useful to draw on the work of Cain and Watts (1973). Suppose that X is a scalar random variable (e.g. education) and that its associated coefficient is positive (β > 0). Under conventional assumptions about U (e.g. mean zero, independently and identically distributed and distributed independently of X), the population regression of Y on X is a straight line. The scatter about the regression line and the regression line are given in Figure 1. When Y > c is imposed as a sample inclusion requirement, lower population values of U are excluded from the sample in a way that systematically depends on x (Y > c or U > c - xβ). As x increases, the conditional mean of U, E(U | X = x, Δ = 1), decreases. Regression estimates of β that do not correct for sample selection (i.e. that do not include E(U | X = x, Δ = 1) as a regressor) are downward biased because of the negative correlation between x and E(U | X = x, Δ = 1). See the flattened regression line for the selected sample in Figure 1.
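The flattening of the regression line can be reproduced in a small simulation. The following is a minimal sketch under assumptions chosen only for illustration (a scalar X uniform on [0, 10], normal U with standard deviation 2, `beta = 1`, no intercept in the data-generating process, cutoff `c = 5`); none of these values come from the entry.

```python
import random

def ols_slope(xs, ys):
    """Slope of a least squares regression of y on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n=50_000, beta=1.0, c=5.0, seed=1):
    """Compare OLS on a random sample with OLS on the sample with Y > c.

    Hypothetical design for illustration: X ~ Uniform(0, 10) and
    U ~ N(0, 2^2), drawn independently of X.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, 2.0) for x in xs]
    # Sample inclusion requirement Y > c.
    kept = [(x, y) for x, y in zip(xs, ys) if y > c]
    slope_full = ols_slope(xs, ys)
    slope_selected = ols_slope([x for x, _ in kept], [y for _, y in kept])
    return slope_full, slope_selected

slope_full, slope_selected = simulate_slopes()
print(f"slope on random sample   : {slope_full:.3f}")
print(f"slope on selected sample : {slope_selected:.3f}")
```

In this design the slope from the random sample is close to `beta`, while the slope from the selected sample is smaller, because low values of U are removed mostly at low values of x.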

In models with more than one regressor, no sharp result on the sign of the bias in the regression estimate that results from ignoring the selected nature of the sample is available except when the $X$ variables are from certain distributions (e.g. normal, see Goldberger, 1983). None the less, the key result, that conventional least squares estimates of $\beta$ obtained from selected samples are biased and inconsistent, remains true.
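The flattening of the regression line under the inclusion rule $Y > c$ can be illustrated with a small simulation. This is only a sketch under assumed values: a scalar regressor uniform on [0, 4], $\beta = 1$, standard normal $U$ and $c = 2$ are illustrative choices, not values taken from the text, and the function names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least squares slope of ys on xs (regression with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(beta=1.0, c=2.0, n=50_000, seed=0):
    """Return (slope on the full sample, slope on the sample with Y > c)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 4.0) for _ in range(n)]      # assumed regressor distribution
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]   # U ~ N(0, 1), independent of X
    kept = [(x, y) for x, y in zip(xs, ys) if y > c]    # sample inclusion rule Y > c
    full = ols_slope(xs, ys)
    selected = ols_slope([x for x, _ in kept], [y for _, y in kept])
    return full, selected


full_slope, selected_slope = simulate()
print(f"full sample slope:     {full_slope:.3f}")
print(f"selected sample slope: {selected_slope:.3f}")
```

Under these assumptions the full-sample slope is close to $\beta$, while the slope fitted on the selected sample is pulled toward zero, as in the flattened line described for Figure 1.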

As in example 1, it is fruitful to distinguish between the case of a truncated sample and the case of a censored sample. In the truncated sample case, no information is available about the fraction of the population that would be allocated to the truncated sample [$\Pr(\Delta = 1)$]. In the censored sample case, this fraction is known or can be consistently estimated. In the censored sample case it is fruitful to distinguish two further cases: (a) the case in which $X$ is not observed when $\Delta = 0$ and (b) the case in which it is. Case (b) is the one most fully developed in the literature (Heckman and MaCurdy, 1981).

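Note that the conditional mean $E(U \mid X = x, \Delta = 1)$ is a function of $c - x\beta$ solely through $\Pr(\Delta = 1 \mid x)$. Since $\Pr(\Delta = 1 \mid x)$ is monotonic in $c - x\beta$, the conditional mean depends solely on $\Pr(\Delta = 1 \mid x)$ and the parameters of $F_u$, i.e. since

$$F_u^{-1}\bigl(1 - \Pr(\Delta = 1 \mid x)\bigr) = c - x\beta,$$

$$E(U \mid X = x, \Delta = 1) = \frac{\int_{F_u^{-1}(1 - \Pr(\Delta = 1 \mid x))}^{\infty} z \, dF_u(z)}{\Pr(\Delta = 1 \mid x)}.$$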

This relationship demonstrates that the conditional mean is a function of the probability of selection. As the probability of selection goes to 1, the conditional mean goes to zero. For samples chosen so that the values of $x$ are such that the observations are certain to be included in the sample, there is no problem in using ordinary least squares on selected samples to estimate $\beta$. Thus in Figure 1, ordinary least squares regressions fit on samples selected to have large $x$ values closely approximate the true regression function and become arbitrarily close as $x$ becomes large. The conditional mean in (1.12) is a surrogate for $\Pr(\Delta = 1 \mid x)$. As this probability goes to one, the problem of sample selection in regression analysis becomes negligibly small.
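To make the dependence on the selection probability concrete, the following sketch evaluates the conditional mean for the special case of a standard normal $U$. Normality is an assumption made here for illustration only; the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed distribution of U: N(0, 1)


def selected_mean_of_u(p_select):
    """E(U | selected) when U is standard normal and Pr(selected) = p_select."""
    # Threshold c - x*beta implied by the selection probability.
    threshold = STD_NORMAL.inv_cdf(1.0 - p_select)
    # For a standard normal, the integral of z dF(z) above the threshold
    # equals the density at the threshold.
    return STD_NORMAL.pdf(threshold) / p_select


for p in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(f"Pr(selected) = {p:5.3f}   E(U | selected) = {selected_mean_of_u(p):.4f}")
```

In this normal case the conditional mean falls monotonically as the selection probability rises and is close to zero once nearly every observation is retained.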

Heckman (1976) demonstrates that $\beta$ and $F_u$ are identified if $U$ is normally distributed and standard conditions invoked in regression analysis are satisfied. Gallant and Nychka (1984) and Cosslett (1984) establish conditions for identification for non-normal $U$. In their analyses, $F_u$ is consistently non-parametrically estimated.

Example 3. The next example considers censored random variables. This concept extends the notion of a truncated random variable by letting a more general rule than truncation on the outcome of interest generate the selected sample. Because the sample generating rule may be different from a simple truncation of the outcome being studied, the concept of a censored random variable in general requires at least two distinct random variables.

Let $Y_1$ be the outcome of interest. Let $Y_2$ be another random variable. Denote observed $Y_1$ by $Y_1^*$. If $Y_2 < c$, $Y_1$ is observed. Otherwise $Y_1$ is not observed and we can set $Y_1^* = 0$ or any other convenient value (assuming that $Y_1$ has no point mass at $Y_1 = 0$ or at the alternative convenient value). In terms of the weighting function $w$, $w(y_1, y_2) = 0$ if $y_2 > c$, $w(y_1, y_2) = 1$ if $y_2 \le c$.

Selection rule $Y_2 < c$ does not necessarily restrict the range of $Y_1$. Thus $Y_1^*$ is not in general a truncated random variable. Define $\Delta = 1$ if $Y_2 < c$; $\Delta = 0$ otherwise. If $F(y_1, y_2)$ is the population distribution of $(Y_1, Y_2)$, the distribution of $\Delta$ is

$$\Pr(\Delta = \delta) = [1 - F_2(c)]^{1 - \delta} [F_2(c)]^{\delta}, \qquad \delta = 0, 1,$$

where $F_2$ is the marginal distribution of $Y_2$. The distribution of $Y_1^*$ is

$$G(y_1^*) = F(y_1^* \mid \delta = 1) = \frac{F(y_1^*, c)}{F_2(c)}, \qquad \Delta = 1, \tag{1.14a}$$

$$G(y_1^* = 0) = 1, \qquad \Delta = 0. \tag{1.14b}$$

Note that (1.14a) is the distribution function corresponding to the density in (1.1) when $w(y_1, y_2) = 1$ if $y_2 \le c$ and $w(y_1, y_2) = 0$ otherwise.

The joint distribution of $(Y_1^*, \Delta)$ is

$$G(y_1^*, \delta) = [F(y_1^*, c)]^{\delta} [1 - F_2(c)]^{1 - \delta}. \tag{1.15}$$


This is the distribution function corresponding to density (1.4) for the special weighting rule of this example. In a censored sample, under general conditions it is possible to consistently estimate $\Pr(\Delta = \delta)$ and $G(y_1^*)$. In a truncated sample, only conditional distribution (1.14a) can be estimated. A degenerate version of this model has $Y_1 = Y_2$. In that case, censored random variable $Y_1$ is also a truncated random variable. Note that a censored random variable may be defined for a truncated or censored sample.

Example 3 and variants of it have wide applicability in economics. Let $Y_1$ be the wage of a woman. Wages of women are observed only if women work. Let $Y_2$ be an index of a woman's propensity to work. In Gronau (1974) and Heckman (1974), $Y_2$ is postulated as the difference between reservation wages (the value of time at home determined from household preference functions) and potential market wages $Y_1$. Then if $Y_2 < 0$, the woman works. Otherwise, she does not. $Y_1^* = Y_1$ if $Y_2 < 0$ is the observed wage.

If $Y_1$ is the offered wage of an unemployed worker, and $Y_2$ is the difference between reservation wages (the return to searching) and offered market wages, $Y_1^* = Y_1$ if $Y_2 < 0$ is the accepted wage for an unemployed worker (see Flinn and Heckman, 1982). If $Y_1$ is the potential output of a firm and $Y_2$ is its profitability, $Y_1^* = Y_1$ if $Y_2 > 0$. If $Y_1$ is the potential income in occupation one and $Y_2$ is the potential income in occupation two, $Y_1^* = Y_1$ if $Y_1 - Y_2 < 0$ while $Y_2^* = Y_2$ if $Y_1 - Y_2 \ge 0$. We develop this example at length in section 2 where we consider explicit economic models of self-selection. There we discuss the identifiability of this model.
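The wage example can be sketched numerically. The simulation below is a minimal illustration, assuming normally distributed (log) offered and reservation wages that are independent of each other with equal means; these distributional choices and all names are assumptions made for the sketch, not part of the models cited above.

```python
import random


def simulate_observed_wages(n=100_000, seed=1):
    """Return (all offered wages Y1, wages Y1* observed for those who work)."""
    rng = random.Random(seed)
    offered, observed = [], []
    for _ in range(n):
        y1 = rng.gauss(2.0, 0.5)            # potential market wage Y1 (assumed normal)
        reservation = rng.gauss(2.0, 0.5)   # reservation wage, assumed independent of Y1
        y2 = reservation - y1               # index Y2: the person works if Y2 < 0
        offered.append(y1)
        if y2 < 0:
            observed.append(y1)             # Y1* = Y1 is recorded only for workers
    return offered, observed


offered, observed = simulate_observed_wages()
mean_offered = sum(offered) / len(offered)
mean_observed = sum(observed) / len(observed)
share_working = len(observed) / len(offered)
print(f"share working:      {share_working:.3f}")
print(f"mean offered wage:  {mean_offered:.3f}")
print(f"mean observed wage: {mean_observed:.3f}")
```

Under these assumptions about half the population works, and the mean of observed wages exceeds the mean of offered wages, because those with high offers relative to their reservation wage are the ones who select into work.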

Example 4. This example builds on example 3 by introducing regressors. This produces the censored regression model (Heckman, 1976; 1979). In example 3 set

$$Y_1 = X_1\beta_1 + U_1 \tag{1.16a}$$

$$Y_2 = X_2\beta_2 + U_2 \tag{1.16b}$$

where $(X_1, X_2)$ are distributed independently of $(U_1, U_2)$, a mean zero, finite variance random vector. Conventional assumptions are invoked to ensure that if $Y_1$ and $Y_2$ can be observed, least squares applied to a random sample of data on $(Y_1, Y_2, X_1, X_2)$ would consistently estimate $\beta_1$ and $\beta_2$. $Y_1^* = Y_1$ if $Y_2 < 0$. If $Y_2 < 0$, $\Delta = 1$. Then the regression function for the selected sample is

$$E(Y_1^* \mid X_1 = x_1, Y_2 < 0) = E(Y_1^* \mid X_1 = x_1, \Delta = 1) = x_1\beta_1 + E(U_1 \mid X_1 = x_1, \Delta = 1) \tag{1.17}$$

and the regression function for the population is

$$E(Y_1 \mid X_1 = x_1) = x_1\beta_1. \tag{1.18}$$

As in the regression analysis of truncated random variables, there is an illuminating contrast between the conditional expectation for the selected sample (1.17) and the population regression function (1.18). The two functions differ by the conditional mean of $U_1$ [$E(U_1 \mid X_1 = x_1, \Delta = 1)$]. In the regression analysis of truncated random variables, ordinary least squares estimates of $\beta$ (in equation (1.14)) are biased and inconsistent because the conditional mean is improperly omitted from the selected sample regression. The same analysis applies to the regression analysis of censored random variables. The conditional mean is a surrogate for the probability of selection [$\Pr(\Delta = 1 \mid x_2)$]. As $\Pr(\Delta = 1 \mid x_2)$ goes to one, the problem of sample selection bias becomes negligible. However, in the censored regression case, a new phenomenon appears. If there are variables in $X_2$ not in $X_1$, such variables may appear to be statistically important determinants of $Y_1$ when ordinary least squares is applied to data generated from censored samples.

As an example, suppose that survey statisticians use some extraneous (to $X_1$) variables to determine sample enrolment. Such variables may appear to be important determinants of $Y_1$ when in fact they are not. They are important determinants of $Y_1^*$. In an analysis of self-selection, let $Y_1$ be the wage that a potential worker could earn were he to accept a market offer. Let $Y_2$ be the difference between the best non-market opportunity available to the potential worker and $Y_1$. If $Y_2 < 0$, the agent works. The conditional expectation of observed wages ($Y_1^* = Y_1$ if $Y_2 < 0$) given $X_1$ and $X_2$ will be a non-trivial function of $X_2$. Thus variables determining non-market opportunities will determine $Y_1^*$, even though they do not determine $Y_1$. For example, the number of children less than six may appear to be a significant determinant of $Y_1$ when inadequate account is taken of sample selection, even though the market does not place any value or penalty on small children in generating wage offers for potential workers.

Heckman (1976) develops the analysis of this model when $(U_1, U_2)$ is normally distributed. Gallant and Nychka (1984) and Cosslett (1984) demonstrate that under mild restrictions on $F(u_1, u_2)$, if there is one continuous valued variable in $X_2$ not in $X_1$ (so that there is no exact linear dependence between $X_2$ and $X_1$), $\beta_1$, $\beta_2$ and $F(u_1, u_2)$ can be consistently non-parametrically estimated. Heckman and MaCurdy (1986) develop this class of models at length.

Example 5. This example demonstrates how self-selection bias affects the interpretation placed on estimated consumer demand functions when there is self-selection. We postulate a population of consumers with a quasi-concave utility function $U(Z, E)$ which depends on the consumption of goods $Z$ and preference shock $E$ which represents heterogeneity in preferences among consumers. The support of $E$ is $\mathcal{E}$. For price vector $P$ and endowment income $M$, the consumer's problem is to

$$\max U(Z, E) \quad \text{subject to} \quad P'Z \le M.$$

In the population $P$ and $M$ are distributed independently of $E$. First order conditions for this problem are

$$\frac{\partial U(Z, E)}{\partial Z} \le \lambda P, \tag{1.19}$$

where $\lambda$ is the Lagrange multiplier associated with the budget constraint. Focusing on the demand for the first good, $Z_1$, none of it is purchased if at zero consumption of $Z_1$

$$\left. \frac{\partial U(Z, E)}{\partial Z_1} \right|_{Z_1 = 0} \le \lambda P_1, \tag{1.20}$$

i.e. marginal valuation is less than marginal cost in utility terms. Conventional interior solution demand functions for $Z$ are defined for a given $P$, $M$ only for values of $E$ such that

$$\left. \frac{\partial U(Z, E)}{\partial Z_1} \right|_{Z_1 = 0} \ge \lambda P_1. \tag{1.21}$$

Let the set of $E$ for which conventional interior solution consumer demand functions for $Z_1$ are defined be denoted by $\bar{\mathcal{E}}$. Then

$$\bar{\mathcal{E}} = \left\{ E : \left. \frac{\partial U(Z, E)}{\partial Z_1} \right|_{Z_1 = 0} \ge \lambda P_1 \text{ for given } P, M \right\}.$$

Let $\Delta_1 = 0$ if the consumer does not purchase $Z_1$. Let $\Delta_1 = 1$ otherwise. If $F(\varepsilon)$ is the population distribution of $E$, the proportion purchasing none of good $Z_1$ given $P$, $M$ is

$$\Pr(\Delta_1 = 0 \mid P, M) = 1 - \int_{\bar{\mathcal{E}}} dF(\varepsilon).$$

Provided inequality (1.21) is satisfied, $\Delta_1 = 1$ and interior solution demand function

$$Z_1 = Z_1(P, M, E) \tag{1.22}$$

is well defined and $Z_1 = Z_1^*$. When $\Delta_1 = 0$, observed $Z_1 = Z_1^* = 0$.

Equation (1.22) is the conventional object of interest in consumer theory. Partial derivatives of that function holding $E$ and the other arguments constant have well defined economic interpretations. Suppose that some non-negligible proportion of the population buys none of good $Z_1$. Regression estimates of the parameters of (1.22) using $Z_1^*$ approximate the conditional expectation

$$E(Z_1 \mid \Delta_1 = 1, P, M) = \int_{\bar{\mathcal{E}}} Z_1(P, M, \varepsilon) \, dF(\varepsilon). \tag{1.23}$$

The derivatives of (1.23) are different from the derivatives of (1.22). In order to define these derivatives, it is helpful to define $I_{\bar{\mathcal{E}}}(\varepsilon)$ as an indicator function for set $\bar{\mathcal{E}}$ which equals one if $\varepsilon \in \bar{\mathcal{E}}$ and equals zero otherwise. When prices or income change, the set of values of $E$ that satisfy inequality (1.21) changes. Let $\bar{\mathcal{E}} + \Delta\bar{\mathcal{E}}_P$ be the set of $E$ values that satisfy (1.21) when there is a finite price change $\Delta P$. $I_{\bar{\mathcal{E}} + \Delta\bar{\mathcal{E}}_P}(\varepsilon)$ is an indicator function which equals one when $\varepsilon \in \bar{\mathcal{E}} + \Delta\bar{\mathcal{E}}_P$. Then the derivatives of (1.23) are, for the $j$th price,

$$\frac{\partial E(Z_1 \mid \Delta_1 = 1, P, M)}{\partial P_j} = \int_{\bar{\mathcal{E}}} \frac{\partial Z_1(P, M, \varepsilon)}{\partial P_j} \, dF(\varepsilon) + \lim_{\Delta P_j \to 0} \int \frac{[I_{\bar{\mathcal{E}} + \Delta\bar{\mathcal{E}}_P}(\varepsilon) - I_{\bar{\mathcal{E}}}(\varepsilon)] \, Z_1(P, M, \varepsilon)}{\Delta P_j} \, dF(\varepsilon). \tag{1.24}$$

When the limit in the second term does not exist, the derivative does not exist. We assume for expositional convenience that the limit is well defined.

The first expression on the right-hand side of (1.24) is the average effect of price change on commodity demand. The second term on the right-hand side of (1.24) arises from the change in sample composition of $E$ as the proportion of non-purchasers changes in response to price change. This term generates the selection bias.

Neither term is the same as the price derivative of (1.22) for an arbitrary value of $E = \varepsilon$, although the first term on the right-hand side of (1.24) approximates the price derivative of (1.22) for some value of $E = \varepsilon$.

A similar decomposition of the derivatives of the conditional demand function can be performed if it is defined solely for a sample of non-zero purchasers (see Heckman and MaCurdy, 1981, 1986).

Just as in the statistical sample selection bias problem, there is a population of interest. In this case, the population parameters of interest are the distribution of $E$ and the parameters of $U(Z, E)$. Those who buy $Z_1$ are a self-selected sample of the population. Estimates of population parameters estimated on self-selected samples are biased and inconsistent. There is a population distribution of $Z_1(P, M, E)$ generated by the distribution of $E$. Observations of $Z_1$ are obtained only if $E \in \bar{\mathcal{E}}$ ($w(E) = 1$ if $E \in \bar{\mathcal{E}}$, $w(E) = 0$ otherwise). Alternatively one can express the inclusion criteria in terms of the latent population distribution of $Z_1$ induced by $E$ (given $P$ and $M$) and write $w(z_1) = 1$ if $z_1 > 0$, $w(z_1) = 0$ if $z_1 \le 0$.

Heckman (1974) and Heckman and MaCurdy (1981) provide further discussion of this type of model which is widely used in applied economics and consider issues of identifiability for such models.
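A stylized numerical case may help to show why the price response of purchasers' mean demand differs from the structural price derivative. The sketch assumes a linear latent demand $\alpha - \beta P + \varepsilon$ with a normal taste shock, and uses the mean demand among purchasers, $E(Z_1 \mid Z_1 > 0, P)$; the functional form, parameter values and function names are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def purchasers_mean_demand(price, alpha=1.0, beta=1.0, sigma=1.0):
    """E(Z1 | Z1 > 0, P) for latent demand alpha - beta*P + eps, eps ~ N(0, sigma^2)."""
    m = alpha - beta * price   # mean of latent demand at this price
    c = m / sigma
    # Mean of a normal variable truncated from below at zero.
    return m + sigma * STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def price_derivative(price, step=1e-5, **params):
    """Central-difference derivative of purchasers' mean demand with respect to price."""
    up = purchasers_mean_demand(price + step, **params)
    down = purchasers_mean_demand(price - step, **params)
    return (up - down) / (2.0 * step)


STRUCTURAL_DERIVATIVE = -1.0   # equals -beta under the assumed linear demand
for price in (0.0, 1.0, 2.0):
    print(f"P = {price:.1f}: purchasers' derivative = {price_derivative(price):.4f}, "
          f"structural derivative = {STRUCTURAL_DERIVATIVE:.4f}")
```

In this assumed case every purchaser's own demand falls one-for-one with price, yet the mean among purchasers falls by less, because a price rise removes the marginal low-demand purchasers from the sample.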

Example 6. Length biased sampling. Let $T$ be the duration of an event such as a completed unemployment spell or a completed duration of a job with an employer. The population distribution of $T$ is $F(t)$ with density $f(t)$. The sampling rule is such that individuals are sampled at random. Data are recorded on a completed spell provided that at the time of the interview the individual is experiencing the event. Such sampling rules are in wide use in many national surveys of employment and unemployment.

In order to have a sampled completed spell, a person must be in the state at the time of the interview. Let '0' be the date of the survey. Decompose any completed spell $T$ into a component that occurs before the survey $T_b$ and a component that occurs after the survey $T_a$. Then $T = T_a + T_b$. For a person to be sampled, $T_b > 0$. The density of $T$ given $T_b = t_b$ is

$$f(t \mid t_b) = \frac{f(t)}{1 - F(t_b)}, \qquad t \ge t_b. \tag{1.25}$$

Suppose that the environment is stationary. The population entry rate into the state at each instant of time is $k$. From each vintage of entrants into the state distinguished by their distance from the survey date $t_b$, only $1 - F(t_b) = \Pr(T > t_b)$ survive. Aggregating over all cohorts of entrants, the population proportion in the state at the date of the interview is $P$ where

$$P = \int_0^{\infty} k (1 - F(t_b)) \, dt_b \tag{1.26}$$

which is assumed to exist. The density of $T_b^*$, sampled pre-survey duration, is

$$g(t_b^* \mid t_b^* > 0) = \frac{k (1 - F(t_b^*))}{P}.$$

The density of sampled completed durations is thus

$$g(t^*) = \int_0^{t^*} f(t^* \mid t_b^*) \, g(t_b^* \mid t_b^* > 0) \, dt_b^* = k \int_0^{t^*} \frac{f(t^*)}{1 - F(t_b^*)} \, \frac{1 - F(t_b^*)}{P} \, dt_b^* = \frac{k \, t^* f(t^*)}{P}.$$

Observe from (1.26) that by a standard integration by parts argument

$$P = k \int_0^{\infty} (1 - F(z)) \, dz = k \int_0^{\infty} z \, dF(z) = k E(T). \tag{1.27}$$
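The sampled density derived above, $k t^* f(t^*)/P$ with $P = kE(T)$, implies that the mean of sampled completed spells is $E(T^2)/E(T)$ rather than $E(T)$. The simulation below is a sketch of the stationary case, assuming exponential spell lengths with mean one and entry dates spread uniformly over a long window before the survey; both assumptions and all names are illustrative.

```python
import random


def sample_in_progress_spells(n_entrants=200_000, horizon=30.0, seed=2):
    """Return (all completed spell lengths, lengths of spells in progress at date 0)."""
    rng = random.Random(seed)
    population, sampled = [], []
    for _ in range(n_entrants):
        start = -rng.uniform(0.0, horizon)   # constant entry rate before the survey date 0
        duration = rng.expovariate(1.0)      # completed spell T, assumed exponential, E(T) = 1
        population.append(duration)
        if start + duration > 0.0:           # spell is in progress at the survey date
            sampled.append(duration)
    return population, sampled


population, sampled = sample_in_progress_spells()
mean_population = sum(population) / len(population)
mean_sampled = sum(sampled) / len(sampled)
print(f"mean completed spell, population:        {mean_population:.3f}")
print(f"mean completed spell, in-progress sample: {mean_sampled:.3f}")
```

For the exponential case $E(T^2)/E(T) = 2$, so under these assumptions the spells caught in progress are on average about twice as long as spells in the population.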

Note that

$$g(t^*) = \frac{t^* f(t^*)}{E(T)}. \tag{1.28}$$

In this form (1.28) is equivalent to (1.1) with $\omega(t) = t$. Hence the term 'length biased sampling'. Intuitively, longer spells are oversampled when the requirement is imposed that a spell be in progress at the time the survey is conducted ($T_b > 0$). Suppose, instead, that individuals are randomly sampled and data are recorded on the next spell of the event (after the survey date). As long as successive spells are independent, such a sampling frame does not distort the sampled distribution because no requirement is imposed that the sampled spell be in progress at the date of the interview. It is important to notice that the source of the bias is the requirement that $T_b > 0$, not that only a fraction of the population experiences the event ($P < 1$).

The simple length weight ($\omega(t) = t$) that produces (1.28) is an artefact of the stationarity assumption. Heckman and Singer (1985) consider the consequences of non-stationarity and unobservables when there is selection on the event that a person be in the state at the time of the interview. They also demonstrate the bias that results from estimating parametric models on samples generated by length biased sampling rules when inadequate account is taken of the sampling plan. Vardi (1983, 1985) and Gill and Wellner (1985) consider nonparametric identification and estimation of models with densities of the form (1.28).

It is unfortunate that the lessons of length biased sampling are not adequately appreciated in economics. Two widely cited studies by Clark and Summers (1979) and Hall (1982) use length biased data to prove, respectively, that unemployment and employment spells are 'surprisingly long'. Whether their findings are artefacts of sampling plans remains to be determined.

Example 7. Choice based sampling. Let $D$ be a discrete valued random variable which assumes a finite number of values $I$. $D = i$, $i = 1, \ldots, I$ corresponds to the occurrence of state $i$. States are mutually exclusive. In the literature the states may be modes of transportation choice for commuters (Domencich and McFadden, 1975), occupations, migration destinations, financial solvency status of firms, schooling choices of students, etc. Interest centres on estimating a population choice model

$$\Pr(D = i \mid X = x), \qquad i = 1, \ldots, I. \tag{1.29}$$

The population density of $(D, X)$ is

$$f(d, x) = \Pr(D = d \mid X = x) \, h(x) \tag{1.30}$$

where $h(x)$ is the density of the data.

In many problems, plentiful data are available on certain outcomes while data are scarce for other outcomes. For example, interviews about transportation preferences conducted at train stations tend to over-sample train riders and under-sample bus riders. Interviews about occupational choice preferences conducted at leading universities over-sample those who select professional occupations.

In choice based sampling, selection occurs solely on the $D$ coordinate of $(D, X)$. In terms of (1.1) (extended to allow for discrete random variables), $\omega(d, x) = \omega(d)$. Then sampled $(D^*, X^*)$ has density

$$g(d^*, x^*) = \frac{\omega(d^*) f(d^*, x^*)}{\sum_{i=1}^{I} \int \omega(i) f(i, x^*) \, dx^*}. \tag{1.31}$$

Notice that the denominator can be simplified to

$$\sum_{i=1}^{I} \omega(i) f(i)$$

where $f(d^*)$ is the marginal distribution of $D$ so that

$$g(d^*, x^*) = \frac{\omega(d^*) f(d^*, x^*)}{\sum_{i=1}^{I} \omega(i) f(i)}. \tag{1.32}$$

Also, integrating (1.31) with respect to $x$ using (1.32) we obtain

$$g(d^*) = \frac{\omega(d^*) f(d^*)}{\sum_{i=1}^{I} \omega(i) f(i)} \tag{1.33}$$

Page 8: selection bias and self-selection - jenni.uchicago.edujenni.uchicago.edu/econ312/papers/scanned-palgrave.pdf · selection. See COMPEfI110N AND SELECTION. selection bias and self-selection

selection bias and self-selection

which makes transparent how the sampling rule causes the sampled proportions to deviate from the population proportions. Note further that as a consequence of sampling only on $D$, the population conditional density

$$h(x^* \mid d^*) = \frac{f(d^*, x^*)}{f(d^*)} \tag{1.34}$$

can be recovered from the choice based sample. The density of $x$ in the sample is thus

$$g(x^*) = \sum_{i=1}^{I} h(x^* \mid i) \, g(i). \tag{1.35}$$

Then using (1.32)-(1.35) we reach

$$g(d^* \mid x^*) = f(d^* \mid x^*) \left\{ \frac{\omega(d^*)}{\sum_{i=1}^{I} \omega(i) f(i)} \cdot \frac{1}{\sum_{i=1}^{I} f(i \mid x^*) \, \dfrac{g(i)}{f(i)}} \right\}. \tag{1.36}$$

The bias that results from using choice based samples to make inference about $f(d^* \mid x^*)$ is a consequence of neglecting the terms in braces on the right-hand side of (1.36). Notice that if the data are generated by a random sampling rule, $\omega(d^*) = 1$, $g(d^*) = f(d^*)$ and the term in braces is one.

Manski and Lerman (1977), Manski and McFadden (1981) and Cosslett (1981) provide illuminating discussions of choice based sampling.
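Because the weight depends only on $d$, dividing (1.32) by its sum over $d$ gives the sampled conditional choice probability as $\omega(d) f(d \mid x) / \sum_i \omega(i) f(i \mid x)$, and the population probability can be recovered by reversing the weighting when the weights are known. The sketch below illustrates this with two states; the probabilities, weights and function names are hypothetical.

```python
def sampled_choice_probs(pop_probs_given_x, weights):
    """g(d | x) = w(d) f(d | x) / sum_i w(i) f(i | x)."""
    weighted = [w * f for w, f in zip(weights, pop_probs_given_x)]
    total = sum(weighted)
    return [v / total for v in weighted]


def population_choice_probs(sample_probs_given_x, weights):
    """Undo the weighting: f(d | x) is proportional to g(d | x) / w(d)."""
    unweighted = [g / w for g, w in zip(sample_probs_given_x, weights)]
    total = sum(unweighted)
    return [v / total for v in unweighted]


f_given_x = [0.2, 0.8]   # hypothetical population Pr(train | x), Pr(bus | x)
weights = [4.0, 1.0]     # hypothetical design: train riders sampled four times as intensively

g_given_x = sampled_choice_probs(f_given_x, weights)
recovered = population_choice_probs(g_given_x, weights)
print("sampled  Pr(d | x):", g_given_x)
print("recovered Pr(d | x):", recovered)
```

With equal weights the two functions return their input unchanged, which corresponds to the random sampling case in which the term in braces in (1.36) equals one.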

Example 8. Size biased sampling. Let $N$ be the number of children in a family. $f(N)$ is the density of discrete random variable $N$. Suppose that family size is recorded only when at least one child is interviewed. Suppose further that each child has an independent and identical chance $P$ of being interviewed. The probability of sampled family size of $N^* = n^*$ is

$$g(n^*) = \frac{\omega(n^*) f(n^*)}{E[\omega(N^*)]}, \tag{1.37}$$

where $\omega(n^*) = 1 - (1 - P)^{n^*}$ (the probability that at least one child from a family of size $n^*$ will be sampled) and

$$E[\omega(N^*)] = \sum_{n^*} \bigl(1 - (1 - P)^{n^*}\bigr) f(n^*)$$

is the probability of observing a family. In a large population $P \to 0$ with increasing population size. Using l'Hospital's rule, and assuming that passage to the limit under the summation sign is valid,

$$\lim_{P \to 0} g(n^*) = \frac{n^* f(n^*)}{E(N^*)}. \tag{1.38}$$

Thus the limit form of (1.37) is identical to (1.28). Larger families tend to be oversampled and hence a misleading estimate of family size will be produced from such samples. Since the model is formally equivalent to the length biased sampling model, all references and statements about identification given in example 6 apply with full force to this example. See the discussion in Rao (1965).

2. ECONOMIC MODELS OF SELF-SELECTION

We begin our analysis by expositing the Roy model of self-selection for workers with heterogeneous skills. The statistical framework for this model has been outlined in examples 3 and 4. Following Roy, we assume that there are two market sectors in which income-maximizing agents can work. Agents are free to enter the sector that gives them the highest income. However, they can work in only one sector at a time.

Each sector requires a unique sector-specific task. Each agent has two skills, $T_1$ and $T_2$, which he cannot use simultaneously. The model is short run in that aggregate skill distributions are assumed to be given. There are no costs of changing sectors, and investment is ignored. Because of this assumption, the model presented here applies to environments with certain or uncertain prices for sector-specific tasks. For simplicity and without any loss of generality (given the preceding assumptions), we assume an environment of perfect certainty.

Let $T_i$ be the amount of sector $i$ specific task a worker can perform. The price of task $i$ is $\pi_i$. An agent works in sector 1 if his income is higher there, that is

$$\pi_1 T_1 > \pi_2 T_2. \tag{2.1}$$

Indifference between sectors is a negligible probability event if the $T_i$, $i = 1, 2$, are assumed to be continuous nondegenerate random variables. Throughout we assume that prices are positive ($\pi_i > 0$).

The log wage in task $i$ of an individual with endowment $T_i$ is

$$\ln W_i = \ln \pi_i + \ln T_i. \tag{2.2}$$

The proportion of the population working at task 1 is the proportion of the population for whom

$$T_1 > \frac{\pi_2}{\pi_1} T_2.$$

Roy assumes that $(\ln T_1, \ln T_2)$ is normally distributed with mean $(\mu_1, \mu_2)$ and covariance matrix $\Sigma$. Letting $(U_1, U_2)$ be a mean zero normal vector, agents in the Roy model choose between two possible wages:

$$\ln W_1 = \ln \pi_1 + \mu_1 + U_1$$

$$\ln W_2 = \ln \pi_2 + \mu_2 + U_2.$$

Workers enter sector 1 if $\ln W_1 > \ln W_2$. Otherwise they enter sector 2.

Letting

$$\sigma^* = \sqrt{\operatorname{var}(U_1 - U_2)}$$

and

$$c_i = \frac{\ln(\pi_i / \pi_j) + \mu_i - \mu_j}{\sigma^*}, \qquad i \ne j,$$

$$\Pr(i) = P(\ln W_i > \ln W_j) = \Phi(c_i), \qquad i \ne j, \quad i, j = 1, 2,$$

where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal variable. When standard sample selection bias formulae are used (see, e.g. Heckman 1976), the mean of log wages observed in sector $i$ is

$$E(\ln W_i \mid \ln W_i > \ln W_j) = \ln \pi_i + \mu_i + \frac{\sigma_{ii} - \sigma_{ij}}{\sigma^*} \lambda(c_i), \qquad i, j = 1, 2, \quad i \ne j, \tag{2.3}$$

where

$$\lambda(c) = \frac{\dfrac{1}{\sqrt{2\pi}} \exp\left(-\dfrac{c^2}{2}\right)}{\Phi(c)}$$

is a convex monotone decreasing function of $c$ with $\lambda(c) \ge 0$, and

$$\lim_{c \to \infty} \lambda(c) = 0, \qquad \lim_{c \to -\infty} \lambda(c) = \infty.$$

Convexity is proved in Heckman and Honore (1986).
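The sector share $\Phi(c_i)$ and the selected mean (2.3) can be checked against a direct simulation of the Roy model. The sketch below does this for sector 1 with both task prices set to one; the parameter values are illustrative assumptions chosen so that the covariance matrix is valid, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def roy_sector1_moments(mu1, mu2, s11, s22, s12, log_pi1=0.0, log_pi2=0.0):
    """Pr(sector 1) and E(ln W1 | sector 1) from the normal selection formulae."""
    sigma_star = math.sqrt(s11 + s22 - 2.0 * s12)          # sd of U1 - U2
    c1 = (log_pi1 - log_pi2 + mu1 - mu2) / sigma_star
    lam = STD_NORMAL.pdf(c1) / STD_NORMAL.cdf(c1)          # lambda(c1)
    mean_w1 = log_pi1 + mu1 + (s11 - s12) / sigma_star * lam
    return STD_NORMAL.cdf(c1), mean_w1


def simulate_sector1(mu1, mu2, s11, s22, s12, n=200_000, seed=3):
    """Simulated share in sector 1 and mean log wage there (prices equal to one)."""
    rng = random.Random(seed)
    a = math.sqrt(s11)
    b = s12 / a
    d = math.sqrt(s22 - b * b)   # requires a valid covariance matrix
    chosen = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ln_w1 = mu1 + a * z1               # ln W1 = mu1 + U1
        ln_w2 = mu2 + b * z1 + d * z2      # ln W2 = mu2 + U2, with cov(U1, U2) = s12
        if ln_w1 > ln_w2:
            chosen.append(ln_w1)
    return len(chosen) / n, sum(chosen) / len(chosen)


params = dict(mu1=1.0, mu2=1.2, s11=1.0, s22=0.5, s12=0.3)   # assumed values
share_formula, mean_formula = roy_sector1_moments(**params)
share_sim, mean_sim = simulate_sector1(**params)
print(f"share in sector 1: formula {share_formula:.4f}, simulation {share_sim:.4f}")
print(f"mean log wage:     formula {mean_formula:.4f}, simulation {mean_sim:.4f}")
```

With these values $\sigma_{11} - \sigma_{12} > 0$, so the mean log wage observed in sector 1 lies above $\mu_1$.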


The variance of log wages observed in sector $i$ is

$$\operatorname{var}(\ln W_i \mid \ln W_i > \ln W_j) = \sigma_{ii} \left\{ \rho_i^2 \left[ 1 - c_i \lambda(c_i) - \lambda^2(c_i) \right] + (1 - \rho_i^2) \right\}, \qquad i \ne j, \tag{2.4}$$

where $\rho_i = \operatorname{correl}(U_i, U_i - U_j)$, $i \ne j = 1, 2$. The variance of the log of observed wages never exceeds $\sigma_{ii}$, the population variance, because the term in braces in (2.4) is never greater than unity. In general, sectoral variances decrease with increased selection. For example, if $\rho_1$ and $\rho_2$ do not equal zero, as $\pi_1$ increases with $\pi_2$ held fixed so that people shift from sector 2 to sector 1, the variance in the log of wages in sector 1 increases while the variance in the log of wages in sector 2 decreases.

Using the fact that $W_i = \pi_i T_i$, we may use (2.3) to write

$$E(\ln T_1 \mid \ln W_1 > \ln W_2) = \mu_1 + \frac{\sigma_{11} - \sigma_{12}}{\sigma^*} \lambda(c_1), \tag{2.5a}$$

$$E(\ln T_2 \mid \ln W_2 > \ln W_1) = \mu_2 + \frac{\sigma_{22} - \sigma_{12}}{\sigma^*} \lambda(c_2). \tag{2.5b}$$

Focusing on (2.5a) and noting that $\lambda$ is positive for all values of $c_1$ (except $c_1 = \infty$), the mean of log task 1 used in sector 1 exceeds, equals, or falls short of the population mean endowment of log task 1 as $\sigma_{11} - \sigma_{12}$ is greater than, equal to, or less than zero. If endowments of tasks are uncorrelated ($\sigma_{12} = 0$), self-selection always causes the mean of $\ln T_1$ employed in sector 1 to be above the population mean $\mu_1$. The opposite case occurs when $\sigma_{11} - \sigma_{12}$ is negative. This case can arise only when values of $\ln T_1$ and $\ln T_2$ are sufficiently positively correlated. If this occurs, the mean of log task 1 used in sector 1 falls below the population mean $\mu_1$. Since covariance matrices must be positive semidefinite, $\sigma_{11} + \sigma_{22} - 2\sigma_{12} \ge 0$. Thus if $\sigma_{11} - \sigma_{12} < 0$, $\sigma_{22} - \sigma_{12} > 0$ so the mean of log task 2 employed in sector 2 necessarily lies above the population mean $\mu_2$. In the Roy model the unusual case can arise in at most one sector. Notice from (2.4) that only if $\sigma_{11} - \sigma_{12} = 0$ (so $\rho_1 = 0$) is the variance of log task 1 employed in sector 1 identical to the variance of log task 1 in the population. Otherwise, the sectoral variance of observed log task 1 is less than the population variance of log task 1.

[Figure 2]

To gain further insight into the effect of self-selection on the distribution of earnings for workers in sector 1, it is helpful to draw on some results from normal regression theory. The regression equation for $\ln T_2$ conditional on $\ln T_1$ is

$$\ln T_2 = \mu_2 + \frac{\sigma_{12}}{\sigma_{11}} (\ln T_1 - \mu_1) + \varepsilon_2, \tag{2.6}$$

where $E(\varepsilon_2) = 0$ and $\operatorname{var}(\varepsilon_2) = \sigma_{22} [1 - (\sigma_{12}^2 / \sigma_{11}\sigma_{22})]$.

Figure 2 plots regression function (2.6) for the case $\sigma_{12} = \sigma_{11}$ and $\mu_2 > \mu_1 > 0$. For each value of $\ln T_1$, the population values of $\ln T_2$ are normally distributed around the regression line. Individuals with high values of $\ln T_1$ also tend to have a high value of $\ln T_2$. Assuming $\pi_1 = \pi_2$, individuals with $(\ln T_1, \ln T_2)$ endowments above the 45° line of equal income shown in Figure 2 choose to work in sector 2, while those individuals with endowments below this line work in sector 1. Because $\sigma_{12} = \sigma_{11}$, the regression function is parallel to the line of equal income.

The distribution of $\varepsilon_2$ about the regression line is the same for all values of $\ln T_1$. When individuals are classified on the basis of their $\ln T_1$ values the same proportion of individuals work in sector 1 at all values of $\ln T_1$. For this reason the distribution of $\ln T_1$ employed in sector 1 is the same as the latent population distribution. If $\pi_1$ is raised (or $\pi_2$ is lowered) so that the 45° equal income line is shifted upward, the same proportion of people enter sector 1 at each value of $T_1 = t_1$. Figure 3 plots regression function (2.6) for the case $\sigma_{12} > \sigma_{11}$ and $\mu_2 > \mu_1 > 0$.

As before we set $\pi_1 = \pi_2$. Individuals with endowments above the 45° line choose to work in sector 2, while those with endowments below this line work in sector 1. When individuals are classified on the basis of their $T_1$ values, the fraction of people working in sector 1 decreases the higher the value of $T_1$. Self-selection causes the mean of log task 1 employed in sector 1 to be less than the mean of log task 1 in the total population. People with high values of $T_1$ are under-represented in sector 1 and low $T_1$ values are over-represented. In the extreme, when $\ln T_1$ and $\ln T_2$ are perfectly positively correlated, all high-income individuals are in sector 2, while all the low-income individuals are in sector 1. The highest-paid sector 1 worker earns the same as the lowest-paid sector 2 worker (Roy, 1951; Willis and Rosen, 1979). In this case there is really only one skill dimension and individuals can be unambiguously ranked along this scale.

If $\pi_1$ is raised (or $\pi_2$ is lowered) so that the line of equal income is shifted upward, the mean of $\ln T_1$ employed in sector 1 must rise. The only place left to get $T_1$ is from the high end of the $T_1$ distribution. Unlike the case of $\sigma_{12} = \sigma_{11}$, in which a 10 per cent increase in $\pi_1$ results in a 10 per cent increase in measured average earnings in sector 1, when $\sigma_{12} > \sigma_{11}$, a 10 per cent increase in $\pi_1$ results in a greater than 10 per cent increase in the measured average earnings in sector 1 as the average quality of the sector 1 work-force increases. The variance of log wages in sector 1 increases.

If $\sigma_{11} < \sigma_{12}$, then $\sigma_{12} < \sigma_{22}$ in order for $\Sigma$ to be a covariance matrix. In the population, log task 2 must have greater variability than log task 1. Individuals with high $T_1$ values tend to have high $T_2$ values. But the population distribution of log task 2 has more mass in the tails. The higher an agent's value of $T_1$, the more likely it is that he will be able to get higher income in sector 2. At the lower end of the distribution, the process works in reverse: lower $T_1$ individuals on average have poor $T_2$ values. Self-selection causes the $\ln T_1$ distribution in sector 1 to have an evacuated right tail, an exaggerated left tail, and a lower mean than the population mean of $\ln T_1$.


Q;(ZI)= r !(z"t2)dt2

J;12112<Z'}

=' iz/(ZI,t2)dtJ

The density of Z2 is

Q2(Z2) = 1"/(/" zJ d/l.

Note that Q;(n) and Q2(n) summarize all of the available data

on observed earnings.

Now if TI' T2 are independent with cdf's Ft and Ft re-

spectively

Q;(n) = ft(n)Ff(n)

Qz(n) = Ft(n)/f(n).

Define

Q(n) = In[Q;(I) + Q2(1)] dl

= Ff(n)F1(n).

Figure]

r"'~dn= r"'~dnJ; Q(n) J; Fr(n)

= -In Fr(q,).

296

If (112 < (111 (a case not depicted graphically), the proportionof each T1 group working in sector I increases, the higher thevalue of Ti. The mean of the log task employed in sector Iexceeds Ill' A 10 per cent increase in 7t1 produces an increaseof less than 10 per cent in the average earnings of workers insector I as the mean of In T1 employed in sector I declines. Infact if (112> (122 it is possible for an increase in 7t1 to causemeasured sector I wages to decline. Thus through a selectionphenomenon it is possible for the average wage of peopleworking in sector I to decline even though the price per unitskill increases there.

How robust are these conclusions if the normality ass~p-,tion is relaxed? Heckman and Sedlacek (1985) show that many

propositions derived from assumed normality of skills ~ nothold up for more general distributions. For example, increasingselection need not decrease sectoral variances. The effects ofselection on mean employed skill levels are ambiguous. Heck-man and Honore (1986) demonstrate that in a single cross-section of data, it is possible to identify all of the parametersof the model from the data if the normality asumption isinvoked. However, in a single cross-section many other modelscan explain the data equally well. In particular, intuitive notionsabout the degree of correlation or dependence among skillshave no empirical content and so models of skill 'hierarchies'based on the extent of such dependence have no content forsingle cross-sections of data with all individuals facing con\~onprices.

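To show this, write the density of skills as $f(t_1, t_2)$.

$$Z_1 = \begin{cases} T_1 & \text{if } T_1 > T_2 \\ 0 & \text{otherwise} \end{cases}$$

$$Z_2 = \begin{cases} T_2 & \text{if } T_2 > T_1 \\ 0 & \text{otherwise} \end{cases}$$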

Thus we can write

$$F_i^*(\varphi) = \exp\left(-\int_\varphi^\infty \frac{g_i(n)}{G(n)}\,dn\right), \qquad i = 1, 2,$$

so that we can always rationalize the data on wages in a single cross-section by a model of skill independence, and economic models of skill hierarchies have no empirical content for a single cross-section of data.
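A sample analogue of this formula can be sketched as follows. The code is an illustration, not part of the original entry: the function name `independent_cdf`, the exponential skill distributions and the sample size are all assumptions. Each observation carries mass $1/N$, and the empirical counterpart of $G$ at an observation is its rank divided by $N$, so the integral reduces to a sum of reciprocal ranks over workers in the chosen sector who earn more than $\varphi$. The check uses skills that really are independent, so the recovered $F_i^*$ should be close to the true marginal distributions.

```python
import math
import random


def independent_cdf(earnings, sector, which, phi):
    """Estimate F_which^*(phi) = exp(-integral_phi^inf g_which(n)/G(n) dn)
    from observed (earnings, sector) pairs only."""
    n = len(earnings)
    order = sorted(range(n), key=lambda i: earnings[i])
    integral = 0.0
    for rank, i in enumerate(order, start=1):
        if earnings[i] > phi and sector[i] == which:
            # Mass 1/n at this observation divided by empirical G = rank/n.
            integral += 1.0 / rank
    return math.exp(-integral)


rng = random.Random(1)
earnings, sector = [], []
for _ in range(50_000):
    t1 = rng.expovariate(1.0)  # independent skills: an illustrative assumption
    t2 = rng.expovariate(2.0)
    earnings.append(max(t1, t2))          # unit skill prices in both sectors
    sector.append(1 if t1 > t2 else 2)

est_f1 = independent_cdf(earnings, sector, which=1, phi=1.0)
est_f2 = independent_cdf(earnings, sector, which=2, phi=1.0)
true_f1 = 1.0 - math.exp(-1.0)  # cdf of T1 at 1
true_f2 = 1.0 - math.exp(-2.0)  # cdf of T2 at 1

print(f"F1*(1): estimate {est_f1:.3f}, true {true_f1:.3f}")
print(f"F2*(1): estimate {est_f2:.3f}, true {true_f2:.3f}")
```

Applied to data generated by dependent skills, the same function still returns a pair of distributions, and independent skills drawn from that pair would reproduce the observed wage data. This is the sense in which a single cross-section cannot reveal dependence.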

Suppose, however, that the observing economist has access to data on skill distributions in different market settings, i.e. settings in which relative skill prices vary. To take an extreme case, suppose that we observe a continuum of values of $\pi_1/\pi_2$ ranging from zero to infinity. Then it is possible to identify $F(t_1, t_2)$ and it is possible to give empirical content to models based on the degrees of dependence among latent skills.

This point is made most simply in a situation in which $Z$ is observed but the analyst does not know $Z_1$ or $Z_2$ (i.e. which occupation is chosen). When $\pi_1/\pi_2 = 0$, everyone works in occupation two. Thus we can observe the marginal density of $T_2$. When $\pi_1/\pi_2 = \infty$, everyone works in occupation one. As $\pi_1/\pi_2$ pivots from zero to infinity it is thus possible to trace out the full joint distribution of $(T_1, T_2)$.

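To establish the general result, set $\sigma = \pi_2/\pi_1$. Let $F(t_1, t_2)$ be the distribution function of $T_1$, $T_2$. Then

$$\Pr(Z \le n) = \Pr(\max(T_1, \sigma T_2) \le n) = \Pr\left(T_1 \le n,\; T_2 \le \frac{n}{\sigma}\right) = F\left(n, \frac{n}{\sigma}\right).$$

As $\sigma$ varies between 0 and $\infty$, the entire distribution can be traced out.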
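The identity $\Pr(Z \le n) = F(n, n/\sigma)$ suggests a simple recipe for recovering $F(a, b)$ at any point: observe earnings in a market with $\sigma = a/b$ and evaluate their distribution at $n = a$. The sketch below illustrates this by simulation. It is not part of the original entry: the correlated lognormal skills, the value of `rho` and the function names are assumptions, and earnings are measured in units of $\pi_1$.

```python
import math
import random


def draw_skills(n, rho=0.6, seed=2):
    """Draw n pairs of correlated lognormal skills (T1, T2); rho is illustrative."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ln_t2 = rho * e1 + math.sqrt(1.0 - rho * rho) * e2
        pairs.append((math.exp(e1), math.exp(ln_t2)))
    return pairs


def joint_cdf_from_earnings(skills, a, b):
    """Recover F(a, b) from earnings alone in a market with sigma = pi2/pi1 = a/b."""
    sigma = a / b
    # Observed earnings Z = max(T1, sigma * T2); the chosen sector is not used.
    earnings = [max(t1, sigma * t2) for t1, t2 in skills]
    return sum(z <= a for z in earnings) / len(earnings)  # Pr(Z <= n) at n = a


def joint_cdf_direct(skills, a, b):
    """Empirical F(a, b) computed from the latent skills, which the analyst never sees."""
    return sum(t1 <= a and t2 <= b for t1, t2 in skills) / len(skills)


skills = draw_skills(20_000)
points = [(0.5, 2.0), (1.0, 1.0), (2.0, 0.7)]
for a, b in points:
    recovered = joint_cdf_from_earnings(skills, a, b)
    direct = joint_cdf_direct(skills, a, b)
    print(f"F({a}, {b}): from earnings {recovered:.4f}, from latent skills {direct:.4f}")
```

Each evaluation point requires its own relative price $\sigma = a/b$, which is why variation in relative skill prices across markets is needed to trace out the joint distribution.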


This proposition establishes the benefit of having access to data from more than one market. Heckman and Honore (1986) show how access to data from various market settings and information about the choices of agents aid in the identification of the latent skill distributions.

The Roy model is the prototype for many models of self-selection in economics. If $T_1$ is potential market productivity and $T_2$ is non-market productivity (or the reservation wage) for housewives or unemployed individuals, precisely the same model can be used to explore the effects of self-selection on measured productivity. In such a model, $T_2$ is never observed. This creates certain problems of identification discussed in Heckman and Honore (1986). The model has been extended to allow for more general choice mechanisms. In particular, selection may occur as a function of variables other than or in addition to $T_1$ and $T_2$. Applications of the Roy model include studies of the union-non-union wage differential (Lee, 1978), the returns to schooling (Willis and Rosen, 1979), and the returns to training (Bjorklund and Moffitt, 1986; Heckman and Robb, 1985). Amemiya (1984) and Heckman and Honore (1986) present comprehensive surveys of empirical studies based on the Roy model and its extensions.

JAMES J. HECKMAN

BIBLIOGRAPHY

Amemiya, T. 1984. Tobit models: a survey. Journal of Econometrics 24, 3-61.

Bjorklund, A. and Moffitt, R. 1986. Estimation of wage gains and welfare gains from self selection models. Review of Economics and Statistics 24, 1-63.

Cain, G. and Watts, H. 1973. Toward a summary and synthesis of the evidence. In Income Maintenance and Labor Supply, ed. G. Cain and H. Watts, Madison: University of Wisconsin Press.

Clark, K. and Summers, L. 1979. Labor market dynamics and unemployment: a reconsideration. Brookings Papers on Economic Activity, 13-60.

Cosslett, S. 1981. Maximum likelihood estimation from choice based samples. Econometrica.

Cosslett, S. 1984. Distribution free estimator of regression model with sample selectivity. Unpublished manuscript, University of Florida.

Domencich, T. and McFadden, D. 1975. Urban Travel Demand. Amsterdam: North-Holland.

Flinn, C. and Heckman, J. 1982. New methods for analyzing structural models of labor force dynamics. Journal of Econometrics 18, 5-168.

Gallant, R. and Nychka, R. 1984. Consistent estimation of the censored regression model. Unpublished manuscript, North Carolina State.

Gill, R. and Wellner, J. 1985. Large sample theory of empirical distributions in biased sampling models. Unpublished manuscript, University of Washington.

Goldberger, A. 1983. Abnormal selection bias. In Studies in Econometrics, Time Series and Multivariate Statistics, ed. S. Karlin, T. Amemiya and L. Goodman, New York: Wiley.

Gronau, R. 1974. Wage comparisons - a selectivity bias. Journal of Political Economy 82(6), 1119-44.

Hall, R. 1982. The importance of lifetime jobs in the U.S. economy. American Economic Review 72, September, 716-24.

Heckman, J. 1974. Shadow prices, market wages and labor supply. Econometrica 42(4), 679-94.

Heckman, J. 1976. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement 5(4), 475-92.

Heckman, J. 1977. Sample selection bias as a specification error. Econometrica 47(1), 153-62.

Heckman, J. and Honore, B. 1986. The empirical content of the Roy model. Unpublished manuscript, University of Chicago.

Heckman, J. and MaCurdy, T. 1981. New methods for estimating labor supply functions. In Research in Labor Economics, Vol. 4, ed. R. Ehrenberg, Greenwich, Conn.: JAI Press.

Heckman, J. and Robb, R. 1985. Alternative methods for evaluating the effect of training on earnings. In Longitudinal Analysis of Labor Market Data, ed. J. Heckman and B. Singer, Cambridge: Cambridge University Press.

Heckman, J. and Sedlacek, G. 1985. Heterogeneity, aggregation and market wage functions. Journal of Political Economy 93, December, 1077-125.

Heckman, J. and Singer, B. 1985. Econometric analysis of longitudinal data. In Handbook of Econometrics, Vol. III, ed. Z. Griliches and M. Intriligator, Amsterdam: North-Holland.

Lee, L. F. 1978. Unionism and wage rates: a simultaneous equations model with qualitative and limited dependent variables. International Economic Review 19, 415-33.

Manski, C. and Lerman, S. 1977. The estimation of choice probabilities from choice based samples. Econometrica 45, 1977-88.

Manski, C. and McFadden, D. 1981. Alternative estimates and sample designs for discrete choice analysis. In Structural Analysis of Discrete Data with Econometric Applications, ed. C. Manski and D. McFadden, Cambridge: MIT Press.

Pearson, K. 1901. Mathematical contributions to the theory of evolution. Philosophical Transactions 195, 1-47.

Rao, C. R. 1965. On discrete distributions arising out of methods of ascertainment. In Classical and Contagious Distributions, ed. G. Patil, Calcutta: Pergamon Press.

Roy, A. D. 1951. Some thoughts on the distribution of earnings. Oxford Economic Papers 3, 135-46.

Vardi, Y. 1983. Nonparametric estimation in the presence of length bias. Annals of Statistics 10, 616-20.

Vardi, Y. 1985. Empirical distributions in selection bias models. Annals of Statistics 13, 178-203.

Willis, R. and Rosen, S. 1979. Education and self selection. Journal of Political Economy 87, S7-S36.


self-interest. Two of the basic questions with which moral philosophers have been concerned are: (a) what are the fundamental principles of morality? (b) why should we obey them? One tempting answer to the second question is: because obeying them is in your own interest. Tempting, because any other answer simply invites a further 'why?'. For example, 'why bother about helping others to get what they want?' clearly demands an answer. But 'why bother about getting what you want?', though of course it can be asked, hardly makes sense.

Self-interest as the answer to the second question, however, implies a similar answer to the first. Self-interest can only be a reason for obeying moral principles if those principles do always benefit us as individuals, so that the fundamental one becomes: Do whatever will enable you to satisfy your own desires. And this seems perverse, since most moralists tell us to consider others rather than ourselves. Self-sacrifice, we are told, is noble, and self-seeking base.

Thomas Hobbes answers this objection by pointing out that, while human desires are diverse, so that there is no common end, there is a single means common to all ends. They all require the cooperation of other people, or at least their non-interference. Everyone has an interest in maintaining a peaceful and harmonious society. Moral principles are simply the rules which everyone must follow in order to obtain such a society. We should obey them because obeying them makes for peace and security, and without peace and security no one has much chance of satisfying any desires. If morality requires us to consider others and not ourselves, it is for our own sakes in the long run.

To suppose that men imposed moral restraints on themselves for this reason might suggest a far-sightedness greater than most of us are capable of. Bernard Mandeville suggested that men are motivated less by this consideration than by vanity. Morality, he conjectured, came about through the artifice of a
