Omitted Variable Bias
OLS is used to estimate the causal effect of X on Y. It is possible, however, that the direction of causality goes both ways: from X to Y and from Y to X.
(A) Simultaneity
E.g. impact of smoking on health. Does smoking determine health outcomes, or do health outcomes determine smoking behaviour?
(B) Omitted variable bias
E.g. impact of schooling on earnings. The observed association between the outcome variable (Y) and the explanatory variable (X) can be misleading: it partly reflects omitted factors that are related to both variables.
If these factors could be measured and held constant in a regression, omitted variable bias would be eliminated. In practice this is difficult.
The innate ability of one’s parents affects the earnings and schooling of children, but we cannot perfectly control for ability, which is essentially unobservable.
Formally:
An assumption of OLS is that X and the regression error u are not correlated. This assumption is violated if there are omitted variables which determine both X and Y and which we cannot control for.
In other words, β is not identified: we cannot deduce it from the joint distribution of X and Y.
Econ 495 - Econometric Review 52
• Here, we could add demographic characteristics such as marital status,
geographic location, etc.
1.13.2 Omitted Variables Bias
• If we omit variables that do belong, then the OLS estimate will likely
be biased, $E(\hat\beta_1) \neq \beta_1$
• For example, suppose that the true wage equation model was
$$wages_i = \beta_0 + \beta_1 educ_i + \beta_2 abil_i + u_i$$
but that since we do not observe ability, we estimate
$$wages_i = \beta_0 + \beta_1 educ_i + v_i \qquad (13)$$
where $v_i = \beta_2 abil_i + u_i$
• Then, calling $\tilde\beta_1$ the estimate from equation (13), which omits ability,
we can show that
$$E[\tilde\beta_1] = \beta_1 + \beta_2 \tilde\delta_1$$
where $\tilde\delta_1 = \dfrac{Cov(educ_i, abil_i)}{Var(educ_i)}$
• More generally, when $X_1$ and $X_2$ are correlated and $\beta_2 \neq 0$, the
estimate $\tilde\beta_1$ will be biased.
• The sign of the bias depends on both the sign of $\beta_2$ and of $\tilde\delta_1$
              Corr(X1, X2) > 0    Corr(X1, X2) < 0
β2 > 0        positive bias       negative bias
β2 < 0        negative bias       positive bias
• In the case of the wage equation, because more ability leads to higher
productivity and a higher wage, $\beta_2 > 0$. There are also reasons to
believe that educ and ability are positively correlated, so we would
think that the OLS estimates from equation (13) are too large.
• What to do about it? This is not an easy problem to correct if we do
not have some measure of ability in our sample. (One has to take a
quasi-experimental approach, using IV for example.)
• However, in terms of reporting results, one should be aware of the
possibility of an “omitted variables” bias and qualify the results as
likely “upward biased” or “downward biased”.
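The bias formula for the regression that omits ability can be checked numerically. The following is a minimal simulation sketch, not taken from the notes: the coefficient values, the data-generating process, and the helper names `cov` and `slope` are all assumptions chosen for illustration. It compares the slope from the short regression with $\beta_1 + \beta_2 \tilde\delta_1$.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """OLS slope from a simple regression of y on x."""
    return cov(x, y) / cov(x, x)

random.seed(0)
n = 20000
beta1, beta2 = 0.08, 0.05  # assumed true coefficients (illustrative only)

# Assumed data-generating process: education rises with unobserved ability.
abil = [random.gauss(0, 1) for _ in range(n)]
educ = [12 + 1.5 * a + random.gauss(0, 2) for a in abil]
wage = [1.0 + beta1 * e + beta2 * a + random.gauss(0, 0.3)
        for e, a in zip(educ, abil)]

short_slope = slope(wage, educ)     # regression that omits ability
delta1 = slope(abil, educ)          # Cov(educ, abil) / Var(educ)
predicted = beta1 + beta2 * delta1  # what the bias formula implies

print(f"short-regression slope: {short_slope:.4f}")
print(f"beta1 + beta2 * delta1: {predicted:.4f}")
print(f"true beta1:             {beta1:.4f}")
```

With $\beta_2 > 0$ and a positive correlation between education and ability, the short-regression slope should come out above the true $\beta_1$, which is the "upward biased" case in the sign table.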
1.13.3 Heteroscedasticity
• When the variance of the error terms is not constant across observations,
we have a problem of heteroscedasticity: $Var(u_i \mid educ_i) = \sigma_i^2 = \sigma^2(X_i)$
• The OLS estimates are still unbiased and consistent, but the standard
errors of the estimates are biased if we have heteroscedasticity
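To illustrate why the usual standard errors can mislead under heteroscedasticity, here is a small sketch under assumed conditions: the data-generating process is invented, and the White (HC0) heteroscedasticity-robust formula is brought in for comparison (the notes do not name a specific correction). It computes the conventional and the robust standard error of the slope in a simple regression.

```python
import math
import random

random.seed(2)
n = 5000

# Assumed data-generating process: the error spread grows with x.
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 0.2 * xi ** 2) for xi in x]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Conventional standard error: assumes one common error variance.
sigma2 = sum(e ** 2 for e in resid) / (n - 2)
se_conventional = math.sqrt(sigma2 / sxx)

# White (HC0) standard error: weights each squared residual by (x - x_bar)^2.
meat = sum(((xi - x_bar) ** 2) * e ** 2 for xi, e in zip(x, resid))
se_robust = math.sqrt(meat / sxx ** 2)

print(f"slope estimate:  {slope:.3f}")
print(f"conventional SE: {se_conventional:.4f}")
print(f"robust (HC0) SE: {se_robust:.4f}")
```

In this assumed setup the slope estimate stays close to its true value of 2, while the conventional formula understates the sampling uncertainty relative to the robust one.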
Ways to address omitted variable bias:
(1) Experiments which randomly assign X so that it is no longer correlated with u
E.g. a job training program run as a social experiment, which
randomly assigns training to a subset of individuals. Random assignment ensures that participation in the program is
not correlated with omitted personal or social factors.
In practice randomization is often not feasible: it is not easy to run social experiments on a population (outside of a lab).
(2) Instrumental Variables
Suppose we have a third variable Z (“the instrument”) which is correlated with X but not with u. Hence Z is uncorrelated with the omitted variables and the regression error.
The instrumental variable technique allows us to estimate the coefficient of interest consistently (free of bias caused by the omitted variables) without having data on the omitted variables.
Intuitively, instrumental variables uses only part of the variability in X (the part which is uncorrelated with u) to estimate the relationship between X and Y.
Classic example: estimation of demand and supply elasticities.
Observed data on quantities and prices reflect a set of equilibrium points on both the demand and supply curves. Consequently, an OLS regression of quantities on prices fails to identify, that is trace out, either the supply or the demand relationship.
We can solve this problem by finding certain “curve shifters” (now called instrumental variables): additional factors which affect demand conditions without affecting supply conditions, and vice versa.
Example of linseed oil:
For the demand curve shifter we can use the price of substitute goods (cottonseed).
For the supply curve shifter we can use factors that affect costs (yield per acre), such as weather patterns.
Intuitively:
weather-related shifts (which shift the supply curve) are used to trace out the demand curve;
changes in the price of substitute goods are used to shift the demand curve so as to trace out the supply curve.
EXAMPLE ON SUPPLY/DEMAND [from: Stock and Watson, Introduction to Econometrics, chapter 12]
Simultaneous causality bias in the OLS regression of quantities on prices arises because price and quantity are determined by the interaction of demand and supply.
The interaction between demand and supply could reasonably produce a scatter of equilibrium points that is not useful for our purposes, because it traces out neither curve.
But what if only supply shifts?
TSLS estimates the demand curve by isolating shifts in price and quantity that arise from shifts in supply; Z is a variable that shifts supply but not demand.
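The supply and demand argument can be illustrated with a small simulation. This is a sketch under assumed linear curves with made-up slopes and intercepts (none of these numbers come from Stock and Watson); `z` plays the role of a supply shifter that does not enter demand.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(3)
n = 20000
demand_slope = -2.0  # assumed demand curve: q = 10 - 2 p + u_d
supply_slope = 1.0   # assumed supply curve: q = 1 + 1 p + z + u_s

price, quantity, shifter = [], [], []
for _ in range(n):
    u_d = random.gauss(0, 1)  # demand shock
    u_s = random.gauss(0, 1)  # supply shock
    z = random.gauss(0, 1)    # supply shifter (the instrument)
    # Market clearing: 10 + demand_slope * p + u_d = 1 + supply_slope * p + z + u_s
    p = (10 - 1 + u_d - u_s - z) / (supply_slope - demand_slope)
    q = 10 + demand_slope * p + u_d
    price.append(p)
    quantity.append(q)
    shifter.append(z)

# OLS of quantity on price mixes movements along both curves.
ols_slope = cov(price, quantity) / cov(price, price)
# IV uses only the price variation induced by the supply shifter.
iv_slope = cov(shifter, quantity) / cov(shifter, price)

print(f"true demand slope: {demand_slope:.2f}")
print(f"OLS slope:         {ols_slope:.2f}")
print(f"IV slope:          {iv_slope:.2f}")
```

Under these assumptions the OLS slope settles near -1, which is neither the demand slope nor the supply slope, while the IV estimate is close to the demand slope of -2.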
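Instrumental Variables Method
First stage estimation: $X = \pi_0 + \pi_1 Z + v$
Obtain predicted values: $\hat X = \hat\pi_0 + \hat\pi_1 Z$
The predicted value, $\hat X$, is a function of Z only and hence is not correlated with u
Second stage estimation: regress Y on $\hat X$, that is, $Y = \beta_0 + \beta_1 \hat X + \text{error}$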
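The two stages can be carried out by hand on simulated data. The sketch below is illustrative only: the data-generating process, the true coefficient `beta1`, and the helper functions `mean`, `cov` and `ols` are assumptions, not part of the slides. It also reports the ratio Cov(Z, Y) / Cov(Z, X), which coincides with the two-stage estimate when there is a single instrument.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols(y, x):
    """Return (intercept, slope) from a simple regression of y on x."""
    b = cov(x, y) / cov(x, x)
    return mean(y) - b * mean(x), b

random.seed(1)
n = 20000
beta1 = 0.5  # assumed true effect of X on Y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
w = [random.gauss(0, 1) for _ in range(n)]  # omitted variable, never observed
x = [0.8 * zi + wi + random.gauss(0, 1) for zi, wi in zip(z, w)]
y = [1.0 + beta1 * xi + wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

# Naive OLS of Y on X picks up the omitted variable.
_, ols_slope = ols(y, x)

# First stage: regress X on Z and form the predicted values.
pi0_hat, pi1_hat = ols(x, z)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Second stage: regress Y on the predicted values.
_, tsls_slope = ols(y, x_hat)

# Equivalent ratio-of-covariances form with one instrument.
cov_ratio = cov(z, y) / cov(z, x)

print(f"true beta1:     {beta1:.3f}")
print(f"OLS slope:      {ols_slope:.3f}")
print(f"2SLS slope:     {tsls_slope:.3f}")
print(f"Cov(Z,Y)/Cov(Z,X): {cov_ratio:.3f}")
```

Note that running the second stage manually gives the right point estimate but not the right standard errors, because the second-stage residuals are computed with the predicted rather than the actual X; packaged IV routines correct for this.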
4 Instrumental Variables
4.1 Single endogenous variable – One continuous instrument
• Instrumental Variables (IV) estimation is used when a model
$$Y = \beta_0 + \beta_1 X + u \qquad (1)$$
has an endogenous X, that is, whenever $Cov(X, u) \neq 0$
• In other words, IV can be used to address the problem of omitted
variable bias
• But what is an instrumental variable?
• In order for a variable, Z, to serve as a valid instrument for X, the
following must be true
• Assumption 1: Exclusion Restriction. The instrument must be exogenous,
that is, uncorrelated with the error term, $Cov(Z, u) = 0$
• Assumption 2: Instrument Relevance. The instrument must be correlated
with the endogenous variable X, that is, $Cov(Z, X) \neq 0$
• How do we know that Z is a valid instrument?
• The main problem is that we have to use common sense and economic
theory to decide if it makes sense to assume Cov(Z, u) = 0
• In the case of multiple instruments, we can use the overid test below
• However, we can test whether $Cov(Z, X) \neq 0$
• We simply test $H_0: \pi_1 = 0$ in the regression
$$X = \pi_0 + \pi_1 Z + v \qquad (2)$$
• This regression is called the first-stage regression
• Card (1995) has used proximity to a four-year college, nearc4, as an
instrument for education
. reg educ nearc4 exper expersq black south smsa smsa66 reg661-reg668 ;

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =  182.13
       Model |  10287.6179    15  685.841194           Prob > F      =  0.0000
    Residual |  11274.4622  2994  3.76568542           R-squared     =  0.4771
-------------+------------------------------           Adj R-squared =  0.4745
       Total |  21562.0801  3009  7.16586243           Root MSE      =  1.9405

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3198989   .0878638     3.64   0.000     .1476194    .4921785
       exper |  -.4125334   .0336996   -12.24   0.000    -.4786101   -.3464566
     expersq |   .0008686   .0016504     0.53   0.599    -.0023674    .0041046
       black |  -.9355287   .0937348    -9.98   0.000     -1.11932   -.7517377
       south |  -.0516126   .1354284    -0.38   0.703    -.3171548    .2139296
        smsa |   .4021825   .1048112     3.84   0.000     .1966732    .6076918
      smsa66 |   .0254805   .1057692     0.24   0.810    -.1819071    .2328682
      reg661 |   -.210271   .2024568    -1.04   0.299    -.6072395    .1866975
      reg662 |  -.2889073   .1473395    -1.96   0.050    -.5778042   -.0000105
      reg663 |  -.2382099   .1426357    -1.67   0.095    -.5178838    .0414639
      reg664 |   -.093089   .1859827    -0.50   0.617    -.4577559    .2715779
      reg665 |  -.4828875   .1881872    -2.57   0.010    -.8518767   -.1138982
      reg666 |  -.5130857   .2096352    -2.45   0.014    -.9241293   -.1020421
      reg667 |  -.4270887   .2056208    -2.08   0.038    -.8302611   -.0239163
      reg668 |   .3136204   .2416739     1.30   0.194    -.1602434    .7874841
       _cons |   16.84852   .2111222    79.80   0.000     16.43456    17.26248
------------------------------------------------------------------------------

. test nearc4;

 ( 1)  nearc4 = 0

       F(  1,  2994) =   13.26
            Prob > F =    0.0003
• Rule-of-thumb: you need to worry about weak instruments if the
first-stage F-statistic is less than 10
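With a single instrument, the first-stage F-statistic for the test that its coefficient is zero equals the square of that coefficient's t-statistic. As a quick arithmetic check, the sketch below recomputes it from the nearc4 coefficient and standard error in the first-stage output; the variable names are ours, and the threshold of 10 is the rule of thumb stated above.

```python
# Values copied from the nearc4 row of the first-stage regression output.
coef = 0.3198989
se = 0.0878638

t_stat = coef / se     # t-statistic for H0: pi_1 = 0
f_stat = t_stat ** 2   # F(1, 2994) for a single restriction

RULE_OF_THUMB = 10     # weak-instrument threshold from the notes

print(f"t = {t_stat:.2f}, F = {f_stat:.2f}")
print("weak instrument concern" if f_stat < RULE_OF_THUMB else "passes the rule of thumb")
```

The result reproduces the F of 13.26 reported by `test nearc4`, so nearc4 on its own clears the threshold.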
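• Given equation (1) and our assumptions 1 and 2,
$$Cov(Z, Y) = Cov[Z, (\beta_0 + \beta_1 X + u)] = \beta_1 Cov(Z, X) + Cov(Z, u),$$
and since $Cov(Z, u) = 0$ (assumption 1) and $Cov(Z, X) \neq 0$ (assumption 2),
$$\beta_1^{IV} = \frac{Cov(Z, Y)}{Cov(Z, X)}$$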
. ivreg lwage (educ=nearc4) exper expersq black south smsa smsa66 reg661-reg668 ;

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =   51.01
       Model |  141.146813    15  9.40978752           Prob > F      =  0.0000
    Residual |  451.494832  2994  .150799877           R-squared     =  0.2382
-------------+------------------------------           Adj R-squared =  0.2343
       Total |  592.641645  3009  .196956346           Root MSE      =  .38833

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1315038   .0549637     2.39   0.017     .0237335    .2392742
       exper |   .1082711   .0236586     4.58   0.000     .0618824    .1546598
     expersq |  -.0023349   .0003335    -7.00   0.000    -.0029888    -.001681
       black |  -.1467757   .0538999    -2.72   0.007    -.2524603   -.0410912
       south |  -.1446715   .0272846    -5.30   0.000      -.19817    -.091173
        smsa |   .1118083    .031662     3.53   0.000     .0497269    .1738898
      smsa66 |   .0185311   .0216086     0.86   0.391    -.0238381    .0609003
      reg661 |  -.1078142   .0418137    -2.58   0.010    -.1898007   -.0258278
      reg662 |  -.0070465   .0329073    -0.21   0.830    -.0715696    .0574767
      reg663 |   .0404445   .0317806     1.27   0.203    -.0218694    .1027585
      reg664 |  -.0579172   .0376059    -1.54   0.124    -.1316532    .0158189
      reg665 |   .0384577   .0469387     0.82   0.413    -.0535777     .130493
      reg666 |   .0550887   .0526597     1.05   0.296    -.0481642    .1583416
      reg667 |    .026758   .0488287     0.55   0.584    -.0689832    .1224992
      reg668 |  -.1908912   .0507113    -3.76   0.000    -.2903238   -.0914586
       _cons |   3.773965    .934947     4.04   0.000     1.940762    5.607169
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq black south smsa smsa66 reg661 reg662 reg663
               reg664 reg665 reg666 reg667 reg668 nearc4
------------------------------------------------------------------------------
• Notice that $\hat\beta^{IV}_{educ} = 0.132 > \hat\beta^{OLS}_{educ} = 0.075$ (see LATE effect below)
• Which estimator should we prefer, IV or OLS?
4.2 Single endogenous variable – more than one continuous instrument
• Consider the following structural model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u_1 \qquad (6)$$
where $X_1$ is an endogenous variable and $X_2$ is an exogenous variable
• Suppose now that we have two exogenous variables, $Z_1$ and $Z_2$, excluded from
equation (6)
$$X_1 = \pi_0 + \pi_1 Z_1 + \pi_2 Z_2 + v \qquad (7)$$
where $Z_1$ and $Z_2$ are valid instruments in that they do not appear
in the structural model and are uncorrelated with the structural error
term $u_1$, but are correlated with $X_1$
• With more than one instrument, the IV estimator is also called the
two-stage least squares (2SLS) estimator
• In our returns to education example, we can add proximity to a
two-year college, nearc2
. reg educ nearc4 nearc2 exper expersq black south smsa smsa66 reg661-reg668 ;

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 16,  2993) =  170.99
       Model |  10297.1164    16  643.569774           Prob > F      =  0.0000
    Residual |  11264.9637  2993  3.76377002           R-squared     =  0.4776
-------------+------------------------------           Adj R-squared =  0.4748
       Total |  21562.0801  3009  7.16586243           Root MSE      =    1.94

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3205819   .0878425     3.65   0.000      .148344    .4928197
      nearc2 |   .1229986   .0774256     1.59   0.112    -.0288142    .2748114
       exper |  -.4122915   .0336914   -12.24   0.000    -.4783521   -.3462309
     expersq |   .0008479     .00165     0.51   0.607    -.0023874    .0040832
       black |  -.9451729   .0939073   -10.06   0.000    -1.129302   -.7610434
       south |  -.0419115   .1355316    -0.31   0.757    -.3076561    .2238331
        smsa |   .4013708   .1047858     3.83   0.000     .1959113    .6068303
      smsa66 |   .0000782   .1069445     0.00   0.999    -.2096139    .2097704
      reg661 |  -.1687829   .2040832    -0.83   0.408    -.5689405    .2313747
      reg662 |   -.269031   .1478324    -1.82   0.069    -.5588944    .0208325
      reg663 |  -.1902114   .1457652    -1.30   0.192    -.4760216    .0955987
      reg664 |   -.037715   .1891745    -0.20   0.842    -.4086403    .3332102
      reg665 |  -.4371387   .1903306    -2.30   0.022    -.8103307   -.0639467
      reg666 |  -.5022265   .2096933    -2.40   0.017    -.9133841   -.0910688
      reg667 |  -.3775317    .207922    -1.82   0.070    -.7852162    .0301529
      reg668 |   .3820043   .2454171     1.56   0.120    -.0991991    .8632076
       _cons |   16.77306   .2163481    77.53   0.000     16.34885    17.19727
------------------------------------------------------------------------------

. predict peduc;
(option xb assumed; fitted values)

. predict reseduc, res;

. test nearc4=nearc2=0;

 ( 1)  nearc4 - nearc2 = 0
 ( 2)  nearc4 = 0

       F(  2,  2993) =    7.89
            Prob > F =    0.0004
• Here nearc2 is a weak instrument (its first-stage t-statistic is only 1.59), and
the joint F-statistic of 7.89 is below 10, so we no longer pass the
rule-of-thumb test; we would prefer the IV using only nearc4
• In a more general case, we could use either Z1 or Z2 as an instrument
INSTITUTIONS AND ECONOMIC DEVELOPMENT
Notes from: “Colonial origins of comparative development” (Acemoglu et al.)
What are the fundamental causes of the large differences in income per capita across countries?
Differences in institutions and property rights have received attention. This view receives support from cross-country correlations between measures of property rights and economic development.
At some level it is obvious that institutions matter: North and South Korea, East and West Germany. One part of the country stagnated under central planning and collective ownership while the other prospered with private property and a market economy.
To estimate the impact of institutions on economic performance we need a source of exogenous variation in institutions (an instrument).
The authors propose a theory of institutional differences among countries colonized by Europeans and exploit this theory to derive a possible source of exogenous variation. The theory rests on three premises:
(1) Different types of colonization policies created different sets of institutions
At one extreme:
Colonizers did not settle and set up extractive institutions: they did not introduce protection for private property and did not provide checks and balances against government expropriation. The main purpose was to transfer as much of the resources of the colony to the colonizer as possible. Examples: Latin America and the Belgian Congo.
At the other extreme:
Colonizers settled and replicated European institutions, with strong emphasis on private property and checks against government power. Examples: Australia, New Zealand, Canada and the U.S.
(2) Colonization strategy was influenced by the feasibility of settlements
In places where the disease environment was not favorable to European settlement, the formation of the extractive state was more likely.
(3) The colonial state and institutions persisted even after independence.
Based on these three premises: use the mortality rates of the first European settlers as an instrument for current institutions in these countries
settler mortality → settlements → early institutions → current institutions → current economic performance
[Figure: log GDP per capita (PPP, 1995) plotted against log of settler mortality; points are labelled with country codes.]
Colonies where Europeans faced higher mortality rates are today substantially poorer than colonies that were healthy for Europeans.
The theory implies this relationship reflects the effect of settler mortality working through the institutions brought by Europeans; it assumes there is no direct effect of settler mortality on economic performance today.
Under these assumptions: regress current performance on current institutions and instrument the latter by settler mortality rates.
Focus on property rights and checks against government power: use the protection against risk of expropriation index as a proxy for institutions.
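Estimation Strategy
$$\log y_i = \mu + \alpha R_i + X_i'\gamma + \varepsilon_i$$
$y_i$ is income per capita in country i
$R_i$ is protection against expropriation (institutions)
$X_i$ is a vector of other control variables (geography, legal origins)
$R_i$ is not an exogenous variable: many omitted variables determine both $y_i$ and $R_i$, so an OLS regression would suffer from omitted variable bias
First stage estimation:
$$R_i = \zeta + \beta \log M_i + X_i'\delta + v_i$$
where $M_i$ is the settler mortality rate
Second stage estimation:
$$\log y_i = \mu + \alpha \hat R_i + X_i'\gamma + \varepsilon_i$$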
[Figure: log GDP per capita (PPP, 1995) plotted against average expropriation risk, 1985-95; points are labelled with country codes.]
[Figure: log GDP per capita (PPP, 1995) plotted against log of settler mortality; points are labelled with country codes.]
[Figure: average expropriation risk, 1985-95, plotted against log of settler mortality; points are labelled with country codes.]
Notes from “The long-term effects of Africa’s slave trades” (Nunn)
Africa’s economic performance in the second half of the 20th century has been very poor.
One explanation for Africa’s underdevelopment is its history of extraction, characterized by two events: the slave trades and colonialism.
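Estimation Equation
$$\ln y_i = \beta_0 + \beta_1 \ln(exports_i/area_i) + C_i'\delta + X_i'\gamma + \varepsilon_i$$
$y_i$ is per capita GDP in country i
$C_i$ is a vector of variables reflecting the origin of the colonizer prior to independence
$X_i$ is a vector of variables reflecting geography and climate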
THE LONG-TERM EFFECTS OF AFRICA’S SLAVE TRADES 155
TABLE III
RELATIONSHIP BETWEEN SLAVE EXPORTS AND INCOME
Dependent variable is log real per capita GDP in 2000, ln y

                        (1)        (2)        (3)        (4)        (5)        (6)
ln(exports/area)        −0.112***  −0.076***  −0.108***  −0.085**   −0.103***  −0.128***
                        (0.024)    (0.029)    (0.037)    (0.035)    (0.034)    (0.034)
Distance from equator              0.016      −0.005     0.019      0.023      0.006
                                   (0.017)    (0.020)    (0.018)    (0.017)    (0.017)
Longitude                          0.001      −0.007     −0.004     −0.004     −0.009
                                   (0.005)    (0.006)    (0.006)    (0.005)    (0.006)
Lowest monthly rainfall            −0.001     0.008      0.0001     −0.001     −0.002
                                   (0.007)    (0.008)    (0.007)    (0.006)    (0.008)
Avg max humidity                   0.009      0.008      0.009      0.015      0.013
                                   (0.012)    (0.012)    (0.012)    (0.011)    (0.010)
Avg min temperature                −0.019     −0.039     −0.005     −0.015     −0.037
                                   (0.028)    (0.028)    (0.027)    (0.026)    (0.025)
ln(coastline/area)                 0.085**    0.092**    0.095**    0.082**    0.083**
                                   (0.039)    (0.042)    (0.042)    (0.040)    (0.037)
Island indicator                                         −0.398     −0.150
                                                         (0.529)    (0.516)
Percent Islamic                                          −0.008***  −0.006*    −0.003
                                                         (0.003)    (0.003)    (0.003)
French legal origin                                      0.755      0.643      −0.141
                                                         (0.503)    (0.470)    (0.734)
North Africa indicator                                   0.382      −0.304
                                                         (0.484)    (0.517)
ln(gold prod/pop)                                                   0.011      0.014
                                                                    (0.017)    (0.015)
ln(oil prod/pop)                                                    0.078***   0.088***
                                                                    (0.027)    (0.025)
ln(diamond prod/pop)                                                −0.039     −0.048
                                                                    (0.043)    (0.041)
Colonizer fixed effects Yes        Yes        Yes        Yes        Yes        Yes
Number obs.             52         52         42         52         52         42
R2                      .51        .60        .63        .71        .77        .80

Notes. OLS estimates of (1) are reported. The dependent variable is the natural log of real per capita GDP in 2000, ln y. The slave export variable ln(exports/area) is the natural log of the total number of slaves exported from each country between 1400 and 1900 in the four slave trades normalized by land area. The colonizer fixed effects are indicator variables for the identity of the colonizer at the time of independence. Coefficients are reported with standard errors in brackets. ***, **, and * indicate significance at the 1%, 5%, and 10% levels.
for slave exports remains negative and significant, and the magnitude of the estimated coefficient actually increases.9
9. One may also be concerned that the inclusion of the countries in southern Africa, namely South Africa, Swaziland, and Lesotho, may also be biasing the results. As I report in the Appendix, the results are robust to also omitting this
Direction of causality
OLS estimates show there is a relationship between slave exports and current economic performance. It is still unclear whether the slave trades have a causal impact on current income.
Alternative explanation for the relationship: societies that were initially underdeveloped selected into the slave trades, and these societies continue to be underdeveloped today.
In that case we would observe a negative relationship between slave exports and current income even though the slave trades did not have any effect on subsequent economic development.
Two strategies to evaluate whether there is a causal effect of the slave trades on income:
(1) Using historic data: can evaluate importance and characteristics of selection into the slave trades
(2) Find instruments for slave exports
Historical Evidence on Selection during the Slave Trades
Using data on initial population densities, check whether more or less prosperous areas selected into the slave trades.
Population density is a reasonable indicator of economic prosperity.
158 QUARTERLY JOURNAL OF ECONOMICS
FIGURE IV
Relationship between Initial Population Density and Slave Exports
obtained if civil wars or conflicts could be instigated (Barry 1992; Inikori 2003). As well, societies that were the most violent and hostile, and therefore the least developed, were often best able to resist European efforts to purchase slaves. For example, the slave trade in Gabon was limited because of the defiance and violence of its inhabitants toward the Portuguese. This resistance continued for centuries, and as a result the Portuguese were forced to concentrate their efforts along the coast further south (Hall 2005, pp. 60–64).
Using data on initial population densities, I check whether it was the more prosperous or less prosperous areas that selected into the slave trades. Acemoglu, Johnson, and Robinson (2002) have shown that population density is a reasonable indicator of economic prosperity. Figure IV shows the relationship between the natural log of population density in 1400 and ln(exports/area). The data confirm the historical evidence on selection during the slave trades.12 The figure shows that the parts of Africa that were
12. The relationship is similar if one excludes island and North African countries, or if one normalizes slave exports by population rather than land area.
The figure shows that the parts of Africa that were the most prosperous in 1400 (measured by population density) tend also to be the areas that were most impacted by the slave trades.
The evidence suggests that it was the societies that were the most prosperous, not the most underdeveloped, that selected into the slave trades.
It is therefore unlikely that the strong relationship between slave exports and current income is driven by selection.
Instrumental Variables
Use instruments that are correlated with slave exports but are uncorrelated with other country characteristics.
As instruments for slave exports, use the distances from each African country to the locations where the slaves were demanded.
The validity of these instruments relies on a presumption: although the location of demand influenced the location of supply, the location of supply did not influence the location of demand.
If sugar plantations were established in the West Indies because the West Indies were close to the western coast of Africa, the instruments are not valid.
If instead slaves were taken from western Africa because it was relatively close to the plantation economies in the West Indies, the instruments are valid.
Historical evidence suggests the latter to be true: the location of demand for African slaves was determined by a number of factors all unrelated to the supply of slaves.
In the West Indies and southern U.S., slaves were imported because of climates suitable for growing commodities such as sugar and tobacco.
The existence of gold and silver mines was a determinant of the demand for slaves in Brazil.
In the northern Sahara, Arabia, and Persia, slaves were needed to work in salt mines.
In the Red Sea area, slaves were used as pearl divers.
FIGURE V
Example Showing the Distance Instruments for Burkina Faso
4. The overland distance from a country’s centroid to the closest port of export for the Red Sea slave trade. The ports are Massawa, Suakin, and Djibouti.14
The instruments are illustrated in Figure V, which shows the four distances for Burkina Faso. The ports in each of the four slave trades are represented by different colored symbols, and the shortest distances by colored lines. Details of the construction of the instruments are given in the Appendix.15
The IV estimates are reported in Table IV. The first column reports estimates without control variables, the second column includes colonizer fixed effects, and the third and fourth columns include colonizer fixed effects and geography controls. In column (4), the sample excludes islands and North African countries.
14. For island countries, one cannot reach the ports of the Saharan or Red Sea slave trades by traveling overland. For these countries I use the sum of the sailing distance and overland distance.
15. An alternative strategy is to also include the distance from the centroid to the coast (which is also shown in Figure V) as an additional instrument, since this distance is part of the total distance to the markets in the Indian Ocean and trans-Atlantic slave trades. The results are essentially identical if this distance is also included as an additional instrument.
TABLE IV
ESTIMATES OF THE RELATIONSHIP BETWEEN SLAVE EXPORTS AND INCOME

                          (1)              (2)              (3)              (4)
Second Stage. Dependent variable is log income in 2000, ln y
ln(exports/area)          −0.208***        −0.201***        −0.286*          −0.248***
                          (0.053)          (0.047)          (0.153)          (0.071)
                          [−0.51, −0.14]   [−0.42, −0.13]   [−∞, +∞]         [−0.62, −0.12]
Colonizer fixed effects   No               Yes              Yes              Yes
Geography controls        No               No               Yes              Yes
Restricted sample         No               No               No               Yes
F-stat                    15.4             4.32             1.73             2.17
Number of obs.            52               52               52               42

First Stage. Dependent variable is slave exports, ln(exports/area)
Atlantic distance         −1.31***         −1.74***         −1.32*           −1.69**
                          (0.357)          (0.425)          (0.761)          (0.680)
Indian distance           −1.10***         −1.43***         −1.08            −1.57*
                          (0.380)          (0.531)          (0.697)          (0.801)
Saharan distance          −2.43***         −3.00***         −1.14            −4.08**
                          (0.823)          (1.05)           (1.59)           (1.55)
Red Sea distance          −0.002           −0.152           −1.22            2.13
                          (0.710)          (0.813)          (1.82)           (2.40)
F-stat                    4.55             2.38             1.82             4.01
Colonizer fixed effects   No               Yes              Yes              Yes
Geography controls        No               No               Yes              Yes
Restricted sample         No               No               No               Yes
Hausman test (p-value)    .02              .01              .02              .04
Sargan test (p-value)     .18              .30              .65              .51

Notes. IV estimates of (1) are reported. Slave exports ln(exports/area) is the natural log of the total number of slaves exported from each country between 1400 and 1900 in the four slave trades normalized by land area. The colonizer fixed effects are indicator variables for the identity of the colonizer at the time of independence. Coefficients are reported, with standard errors in brackets. For the endogenous variable ln(exports/area), I also report 95% confidence regions based on Moreira’s (2003) conditional likelihood ratio (CLR) approach. These are reported in square brackets. The p-value of the Hausman test is for the Wu–Hausman chi-squared test. ***, **, and * indicate significance at the 1%, 5%, and 10% levels. The “restricted sample” excludes island and North African countries. The “geography controls” are distance from equator, longitude, lowest monthly rainfall, avg max humidity, avg min temperature, and ln(coastline/area).
The first-stage estimates are reported in the bottom panel of the table. The coefficients for the instruments are generally negative, suggesting that the further a country was from slave markets, the fewer slaves it exported.16 The exception is the distance
16. The specifications assume a linear first-stage relationship. The estimates are similar if one also allows for a nonlinear relationship between slave exports
Possible Channels of Causality
Channels through which the slave trades may affect current economic development:
(1) Slave trades weakened ties between villages, discouraging the formation of larger communities and broader ethnic identities
Evidence shows that ethnic fractionalization reduces the provision of public goods (education, health facilities, access to water, transportation, infrastructure) that are important for economic development
FIGURE VI
Relationship between Slave Exports and Current Ethnic Fractionalization
preliminary and exploratory. With only 52 observations it is not possible to pin down the precise channels and mechanism underlying the relationships with any reasonable degree of certainty. My strategy here is to simply investigate whether the data are consistent with the historic events described in Section II.
An important consequence of the slave trades was that they tended to weaken ties between villages, thus discouraging the formation of larger communities and broader ethnic identities. I explore whether the data are consistent with this channel by examining the relationship between slave exports and a measure of current ethnic fractionalization from Alesina et al. (2003). As shown in Figure VI, there is a strong positive relationship between the two variables.18 This is consistent with the historic accounts of the slave trades impeding the formation of broader ethnic identities.
This consequence of the slave trades is important because of the increasing evidence showing that ethnic fractionalization is an
18. The results are also similar if other measures of ethnic fractionalization are used.
(2) Slave trades weakened and underdeveloped states
There is a negative relationship between slave exports and 19th century state centralization.
This is consistent with the slave trades causing long-term political instability, weakened and fragmented states, and the underdevelopment of political structures (institutions).
FIGURE VII
Relationship between Slave Exports and Nineteenth-Century State Development
growth between 1960 and 1995. Looking within Africa, Gennaioli and Rainer (2006) find that countries with ethnicities that had centralized precolonial state institutions today provide more public goods, such as education, health, and infrastructure.
Herbst (1997, 2000) also focuses on the importance of state development for economic success, arguing that Africa’s poor economic performance is a result of postcolonial state failure, the roots of which lie in the underdevelopment and instability of precolonial polities. Herbst (2000, chaps. 2–4) argues that because of a lack of significant political development during colonial rule, the limited precolonial political structures continued to exist after independence.19 As a result, Africa’s postindependence leaders inherited nation states that did not have the infrastructure necessary to extend authority and control over the whole country. Many states were, and still are, unable to collect taxes from their citizens, and as a result they are also unable to provide a minimum level of public goods and services.
19. On the continuity between Africa’s precolonial and postcolonial political systems also see Hargreaves (1969, p. 200).