NBER WORKING PAPER SERIES
UNDERSTANDING AND MISUNDERSTANDING RANDOMIZED CONTROLLED TRIALS
Angus Deaton
Nancy Cartwright
Working Paper 22595
http://www.nber.org/papers/w22595
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
September 2016, Revised October 2017
We acknowledge helpful discussions with many people over the several years this paper has been in preparation. We would particularly like to note comments from seminar participants at Princeton, Columbia, and Chicago, the CHESS research group at Durham, as well as discussions with Orley Ashenfelter, Anne Case, Nick Cowen, Hank Farber, Jim Heckman, Bo Honoré, Chuck Manski, and Julian Reiss. Ulrich Mueller had a major influence on shaping Section 1. We have benefited from generous comments on an earlier version by Christopher Adams, Tim Besley, Chris Blattman, Sylvain Chassang, Jishnu Das, Jean Drèze, William Easterly, Jonathan Fuller, Lars Hansen, Jeff Hammer, Glenn Harrison, Macartan Humphreys, Michal Kolesár, Helen Milner, Tamlyn Munslow, Suresh Naidu, Lant Pritchett, Dani Rodrik, Burt Singer, Richard Williams, Richard Zeckhauser, and Steve Ziliak. Cartwright’s research for this paper has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 667526 K4U), the Spencer Foundation, and the National Science Foundation (award 1632471). Deaton acknowledges financial support from the National Institute on Aging through the National Bureau of Economic Research, Grants 5R01AG040629-02 and P01AG05842-14 and through Princeton University’s Roybal Center, Grant P30 AG024928. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.
NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.
© 2016 by Angus Deaton and Nancy Cartwright. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.
Understanding and Misunderstanding Randomized Controlled Trials
Angus Deaton and Nancy Cartwright
NBER Working Paper No. 22595
September 2016, Revised October 2017
JEL No. C10, C26, C93, O22
ABSTRACT
RCTs would be more useful if there were more realistic expectations of them and if their pitfalls were better recognized. For example, and contrary to many claims in the applied literature, randomization does not equalize everything but the treatment across treatments and controls, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) confounders. Estimates apply to the trial sample only, sometimes a convenience sample, and usually selected; justification is required to extend them to other groups, including any population to which the trial sample belongs. Demanding “external validity” is unhelpful because it expects too much of an RCT while undervaluing its contribution. Statistical inference on ATEs involves hazards that are not always recognized. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon and not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not “what works,” but “why things work”.
Angus Deaton
361 Wallace Hall
Woodrow Wilson School
Princeton University
Princeton, NJ 08544-1013
and [email protected]
Nancy Cartwright
Department of Philosophy
Durham University
50 Old Elvet
Durham, UK DH1 3HN
and University of California, San [email protected]
Introduction
Randomized controlled trials (RCTs) are widely visible in economics today and have been used in the subject at least since the 1960s (see Greenberg and Shroder (2004) for a compendium). It is often claimed that such trials can discover “what works” in economics, as well as in political science, education, and social policy. Among both researchers and the general public, RCTs are perceived to yield causal inferences and estimates of average treatment effects (ATEs) that are more reliable and more credible than those from any other empirical method. They are taken to be largely exempt from the myriad econometric problems that characterize observational studies, to require minimal substantive assumptions, little or no prior information, and to be largely independent of “expert” knowledge that is often regarded as manipulable, politically biased, or otherwise suspect.
There are now “What Works” centers using and recommending RCTs in a range of areas of social concern across Europe and the Anglophone world. These centers see RCTs as their preferred tool and indeed often prefer RCT evidence lexicographically. As one of many examples, the US Department of Education’s standard for “strong evidence of effectiveness” requires a “well-designed and implemented” RCT; no observational study can earn such a label. This “gold standard” claim about RCTs is less common in economics, but Imbens (2010, 407) writes that “randomized experiments do occupy a special place in the hierarchy of evidence, namely at the very top.” The Abdul Latif Jameel Poverty Action Lab (J-PAL), whose stated mission is “to reduce poverty by ensuring that policy is informed by scientific evidence”, advertises that its affiliated professors “conduct randomized evaluations to test and improve the effectiveness of programs and policies aimed at reducing poverty”, J-PAL (2017). The lead page of its website (echoed in the ‘Evaluation’ section) notes “843 ongoing and completed randomized evaluations in 80 countries” with no mention of any studies that are not randomized.
In medicine, the gold standard view has long been widespread, e.g. for drug trials by the FDA; a notable exception is the recent paper by Frieden (2017), ex-director of the U.S. Centers for Disease Control and Prevention, who lists key limitations of RCTs as well as a range of contexts where RCTs, even when feasible, are dominated by other methods.
We argue that any special status for RCTs is unwarranted. Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what knowledge is already available. When little prior knowledge is available, no method is likely to yield well-supported conclusions. This paper is not a criticism of RCTs in and of themselves, let alone an attempt to identify good and bad studies. Instead, we will argue that, depending on what we want to discover, why we want to discover it, and what we already know, there will often be superior routes of investigation.
We present two sets of arguments. The first is an enquiry into the idea that ATEs estimated from RCTs are likely to be closer to the truth than those estimated in other ways. The second explores how to use the results of RCTs once we have them. In the first section, our discussion runs in familiar terms of bias and efficiency, or expected loss. None of this material is new, but we know of no similar treatment, and we wish to dispute many of the claims that are frequently made in the applied literature. Some routine misunderstandings are: (a) randomization ensures a fair trial by ensuring that, at least with high probability, treatment and control groups differ only in the treatment; (b) RCTs provide not only unbiased estimates of ATEs but also precise estimates; (c) statistical inference in RCTs, which requires only the simple comparison of means, is straightforward, so that standard significance tests are reliable.
Nothing we say in the paper should be taken as a general argument against RCTs; we are simply trying to challenge unjustifiable claims, and expose misunderstandings. We are not against RCTs, only magical thinking about them. The misunderstandings are important because we believe that they contribute to the common perception that RCTs always provide the strongest evidence for causality and for effectiveness.
In the second part of the paper, we discuss how to use the evidence from RCTs. The non-parametric and theory-free nature of RCTs, which is arguably an advantage in estimation, is often a disadvantage when we try to use the results outside of the context in which the results were obtained; credibility in estimation can lead to incredibility in use. Much of the literature, perhaps inspired by Campbell and Stanley’s (1963) famous “primacy of internal validity”, appears to believe that internal validity is not only necessary but almost sufficient to guarantee the usefulness of the estimates in different contexts. But you cannot know how to use trial results without first understanding how the results from RCTs relate to the knowledge that you already possess about the world, and much of this knowledge is obtained by other methods. Once the commitment has been made to seeing RCTs within this broader structure of knowledge and inference, and when they are designed to enhance it, they can be enormously useful, not just for warranting claims of effectiveness but for scientific progress more generally. Cumulative science is not advanced through magical thinking.
The literature on the precision of ATEs estimated from RCTs goes back to the very beginning. Gosset (writing as ‘Student’) never accepted Fisher’s arguments for randomization in agricultural field trials and argued convincingly that his own non-random designs for the placement of treatment and controls yielded more precise estimates of treatment effects (see Student (1938) and Ziliak (2014)). Gosset worked for Guinness where inefficiency meant lost revenue, so he had reasons to care, as should we. Fisher won the argument in the end, not because Gosset was wrong about efficiency, but because, unlike Gosset’s procedures, randomization provides a sound basis for statistical inference, and thus for judging whether an estimated ATE is different from zero by chance. Moreover, Fisher’s blocking procedures can limit the inefficiency from randomization (see Yates (1939)). Gosset’s reservations were echoed much later in Savage’s (1962) comment that a Bayesian should not choose the allocation of treatments and controls at random but in such a way that, given what else is known about the topic and the subjects, their placement reveals the most to the researcher. These issues about how to incorporate prior information into randomized trials are central to Section 1.
In economics, the strengths and weaknesses of RCTs are well explored in the volumes by Hausman and Wise (1985) and by Garfinkel and Manski (1992); in the latter, the introduction by Garfinkel and Manski is a balanced summary of what randomized trials can and cannot do. The paper in that volume by Heckman (1992) raises many of the issues that he and his coauthors have explored in subsequent papers; see in particular Heckman and Smith (1995), and Heckman, Lalonde and Smith (1999), who focus on labor market experiments. Manski (2013) contains a good summary of both strengths and weaknesses.
There is also a more contested recent literature. On the one hand, there are procedures that take as fundamental the unrestricted treatment effects of individuals and seek non-parametric approaches to estimating their average. On the other hand, these procedures are contrasted with an approach that uses elements of economic theory to define parameters of interest and to identify magnitudes that are likely to be invariant to policy manipulation or across contexts, where invariance is defined in the sense of Hurwicz (1966). The introduction in Imbens and Wooldridge (2009) provides an eloquent defense of the treatment-effect formulation. It emphasizes the credibility that comes from a theory-free specification with almost unlimited heterogeneity in treatment effects. The introduction in Heckman and Vytlacil (2007) makes an equally eloquent case against, noting that the crucial ingredients of treatments in RCTs are often not clearly specified (so that we often do not know what the treatment really is) and that the treatment effects are hard to link to invariant parameters that would be useful elsewhere. Aspects of the same debate feature in Imbens (2010), Athey and Imbens (2017), Angrist and Pischke (2017), Heckman (2005, 2008, 2010) and Heckman and Urzua (2010).
Deaton (2010) complains about the use of instrumental variables, including randomization, as a substitute for thinking about and constructing models of economic development. He argues against the idea that using RCTs to evaluate projects to discover “what works” can ever yield a systematic body of scientific knowledge that can be used to reduce or eliminate poverty. That paper is an argument against the usefulness of the heterogeneous treatment approach. It argues that refusing to model heterogeneity, though avoiding assumptions, precludes the sort of cumulative research program that might yield useful policy. The paper’s contention that RCTs have no special claim to generate credible and useful knowledge was challenged by Imbens (2010); some of his arguments are answered below. Cartwright (2007) and Cartwright and Munro (2010) challenge any “gold standard” view of RCTs. Cartwright (2011, 2012, 2016) and Cartwright and Hardie (2012) focus on the question of how to use the results of RCTs and what we can learn when an experiment shows that some policy works somewhere. Section 2 pursues these issues in general and through case studies.
Section 1: Do RCTs give good estimates of Average Treatment Effects?
In this section, we explore how to estimate average treatment effects (ATEs) and the role of randomization. We note first that estimating ATEs is only one of many uses for the data generated by an RCT. We start from a trial sample, a collection of subjects that will be allocated randomly to either the treatment or control arm of the trial. This “sample” might be, but rarely is, a random sample from some population of interest. More frequently, it is selected in some way, for example restricted to those willing to participate, or is simply a convenience sample that is available to the trialists. Given random allocation to treatments and controls, the data from the trial allow the identification of two distributions, $F_1(Y_1)$ and $F_0(Y_0)$, of outcomes $Y_1$ and $Y_0$ in the treated and untreated cases within the trial sample. The estimated ATE is the difference in means of the two distributions and is the focus of much of the literature in social science and medicine. Yet policymakers and researchers may well be interested in other features of the two distributions. For example, if $Y$ is income, they may be interested in whether a treatment reduced income inequality, or in what it did to the 10th or 90th percentiles of the income distribution, even though different people occupy those percentiles in the treatment and control distributions (see Bitler et al. (2006) for an example in US welfare policy). Cancer trials standardly use the median difference in survival, which compares the times until half the patients have died in each arm. More comprehensively, policymakers may wish to compare expected utilities for treated and untreated under the two distributions and consider optimal expected-utility maximizing treatment rules conditional on the characteristics of subjects (see Manski (2004) and Manski and Tetenov (2016); Bhattacharya and Dupas (2012) contains an application). These uses are important, but we focus on ATEs here and do not consider these other uses of RCTs any further in this paper.
1.1 What does randomization do?
A useful way to think about the estimation of treatment effects is to use a schematic linear causal model of the form:
$$Y_i = \beta_i T_i + \sum_{j=1}^{J} \gamma_j x_{ij} \qquad (1)$$
where $Y_i$ is the outcome for unit $i$, $T_i$ is a dichotomous (1,0) treatment dummy indicating whether or not $i$ is treated, and $\beta_i$ is the individual treatment effect of the treatment on $i$. The x’s are the observed or unobserved other linear causes of the outcome, and we suppose that (1) captures a minimal set of causes of $Y_i$ sufficient to fix its value. $J$ may be (very) large. Because the heterogeneity of the individual treatment effects, $\beta_i$, is unrestricted, we allow the possibility that the treatment interacts with the x’s or other variables, so that the effects of $T$ can depend on any other variables. Note that we do not need $i$ subscripts on the $\gamma$’s that control the effects of the other causes; if their effects differ across individuals, we include the interactions of individual characteristics with the original x’s as new x’s. Given that the x’s can be unobservable, this is not restrictive.
Consider an experiment that aims to tell us something about the treatment effects; this might or might not use randomization. Either way, we can represent the treatment group as having $T_i = 1$ and the control group as having $T_i = 0$. Given the study (or trial) sample, subtracting the average outcomes among the controls from the average outcomes among the treatments, we get
$$\bar{Y}_1 - \bar{Y}_0 = \bar{\beta}_1 + \sum_{j=1}^{J} \gamma_j \left(\bar{x}_{ij}^{1} - \bar{x}_{ij}^{0}\right) = \bar{\beta}_1 + (\bar{S}_1 - \bar{S}_0) \qquad (2)$$
The first term on the far-right-hand side of (2), which is the ATE in the trial sample, is what we want, but the second term or error term, which is the sum of the net average balance of other causes across the two groups, will generally be non-zero and needs to be dealt with somehow. We get what we want when the means of all the other causes are identical in the two groups, or more precisely (and less onerously) when the sum of their net differences $\bar{S}_1 - \bar{S}_0$ is zero; this is the case of perfect balance. With perfect balance, the difference between the two means is exactly equal to the average of the treatment effects among the treated, so that we have the ultimate precision in that we know the truth in the trial sample, at least in this linear case. As always, the “truth” here refers to the trial sample, and it is always important to be aware that the trial sample may not be representative of the population that is ultimately of interest, including the population from which the trial sample comes; any such extension requires further argument.
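The decomposition in (2) can be checked numerically. The following is a minimal sketch rather than anything taken from the paper: the function name, the sample size, the three other causes, and the distributions used to draw the $\beta_i$, $\gamma_j$ and $x_{ij}$ are all illustrative assumptions. It builds one hypothetical trial from model (1), allocates half the sample to treatment at random, and returns the difference in means together with its two components.

```python
import random


def difference_in_means_decomposition(n=200, n_causes=3, seed=0):
    """Build one hypothetical trial from model (1) and split the difference
    in means into the two terms on the right-hand side of equation (2)."""
    rng = random.Random(seed)

    # Illustrative assumption: effects gamma_j of the other causes.
    gammas = [rng.uniform(-1.0, 1.0) for _ in range(n_causes)]
    # Each unit has its own treatment effect beta_i and its own causes x_ij.
    betas = [rng.gauss(2.0, 1.0) for _ in range(n)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_causes)] for _ in range(n)]

    # Randomly allocate half of the trial sample to treatment.
    order = list(range(n))
    rng.shuffle(order)
    treated, controls = order[: n // 2], order[n // 2:]

    def other_causes(i):
        # S_i = sum_j gamma_j * x_ij, the net effect of the other causes on i.
        return sum(g * x for g, x in zip(gammas, xs[i]))

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Outcomes follow model (1): Y_i = beta_i * T_i + S_i.
    y_bar_1 = mean(betas[i] + other_causes(i) for i in treated)
    y_bar_0 = mean(other_causes(i) for i in controls)

    return {
        "difference_in_means": y_bar_1 - y_bar_0,
        "ate_among_treated": mean(betas[i] for i in treated),
        "imbalance": mean(other_causes(i) for i in treated)
        - mean(other_causes(i) for i in controls),
    }


if __name__ == "__main__":
    print(difference_in_means_decomposition())
```

In any one allocation the `imbalance` entry is generally non-zero, so the difference in means differs from the average treatment effect among the treated by exactly that amount.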
How do we get balance, or something close to it? What, exactly, is the role of randomization? In a laboratory experiment, where there is usually much prior knowledge of the other causes, the experimenter has a good chance of controlling (or subtracting away the effects of) the other causes, aiming to ensure that the last term in (2) is close to zero. Failing such knowledge and control, an alternative is matching, which is frequently used in statistical, medical, and econometric work. For each subject, a match is found that is as close as possible on all suspected causes, so that, once again, the last term in (2) can be kept small. When we have a good idea of the causes, matching may also deliver a precise estimate. Of course, when there are unknown or unobservable causes that have important effects, neither laboratory control nor matching offers protection.
What does randomization do? Since the treatments and controls come from the same underlying distribution, randomization guarantees, by construction, that the last term on the right in (2) is zero in expectation, subject to the caveat that no correlations of the x’s with Y are introduced post-randomization, for example by subjects not accepting their assignment. The expectation here is taken over repeated randomizations on the trial sample, each with its own allocation of treatments and controls. Assuming that our caveat holds, the last term in (2) will be zero when averaged over this infinite number of (entirely hypothetical) replications, and the average of the estimated ATEs will be the true ATE in the trial sample. So $\bar{\beta}_1$ delivers an unbiased estimate of the ATE among the treated in the trial sample, and it does so whether or not the causes are observed. Unbiasedness does not require us to know anything about the other causes, though it does require that they not change after randomization so as to make them correlated with the treatment, which is an important caveat to which we shall return. Of course, none of this is true in any one trial, where the difference in means will be equal to the average treatment effect among those treated plus the term that reflects the imbalance in the net effects of the other causes. We do not know the size of this error term, and there is nothing in randomization that limits its size; by chance the randomization in our single trial can over-represent an important excluded cause(s) in one arm over the other, in which case there will be a difference between the means of the two groups that is not caused by the treatment.
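The contrast between unbiasedness over hypothetical replications and error in any one trial can be illustrated by simulation. The sketch below makes several assumptions that are not in the paper: a fixed trial sample of 100 units, a single unobserved cause with a large coefficient (`gamma = 5.0`), individual treatment effects drawn around 1, and a hypothetical function name. It re-randomizes the same trial sample many times and records the difference in means each time.

```python
import random
import statistics


def rerandomize(n=100, replications=2000, seed=0):
    """Re-randomize one fixed trial sample many times and collect the
    difference-in-means estimate from each hypothetical replication."""
    rng = random.Random(seed)

    # Fixed trial sample (illustrative assumptions): heterogeneous effects
    # beta_i and one strong, unobserved other cause x_i.
    betas = [rng.gauss(1.0, 0.5) for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    gamma = 5.0
    true_ate = statistics.fmean(betas)  # ATE in the trial sample

    units = list(range(n))
    estimates = []
    for _ in range(replications):
        rng.shuffle(units)  # a fresh random allocation of the same units
        treated, controls = units[: n // 2], units[n // 2:]
        y_bar_1 = statistics.fmean(betas[i] + gamma * x[i] for i in treated)
        y_bar_0 = statistics.fmean(gamma * x[i] for i in controls)
        estimates.append(y_bar_1 - y_bar_0)
    return true_ate, estimates


if __name__ == "__main__":
    true_ate, estimates = rerandomize()
    print("true ATE in the trial sample:", round(true_ate, 3))
    print("average estimate over replications:", round(statistics.fmean(estimates), 3))
    print("spread of single-trial estimates:", round(statistics.pstdev(estimates), 3))
    print("worst single-trial error:", round(max(abs(e - true_ate) for e in estimates), 3))
```

Under these assumptions the average over replications sits close to the true ATE, while individual replications can miss it by an amount comparable to the effect itself, because the unobserved cause happens to be unevenly split between the arms.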
The unbiasedness result can easily be compromised. In particular, the treatment must not be correlated with any other cause. Random assignment is designed to aid with this, but it is not sufficient if, for example, there is lack of blinding so that individuals are aware of their assignment, or if those administering the treatment are so aware, and if that awareness triggers another cause. Similarly, researchers sometimes return to individuals who were randomized years before, so that there has been time for the subjects or others to learn their assignment or for other causes to be influenced by the assignment. This again opens up the possibility of unbalanced effects of causes other than the treatment we are interested in. We have already noted that unbiasedness refers to the trial sample, which may or may not be representative of the population of interest.
If we were to repeat the trial many times, the over-representation of the unbalanced causes would sometimes be in the treatments and sometimes in the controls. The imbalance will vary over replications of the trial, and although we cannot see this from our single trial, we should be able to capture its effects on our estimate of the ATE from an estimated standard error. This was Fisher’s innovation: not that randomization balanced other causes between treatments and controls but that, conditional on our caveat above, randomization provides the basis for calculating the size of the error. Getting the standard error and associated significance statements right is of the greatest importance. Given the absence of treatment-related post-randomization changes in other causes, randomization yields an unbiased estimate of the ATE in the trial sample as well as a sound method for measuring error of estimation in that sample; therein lies its virtue, not that it yields precise estimates through balance.
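To illustrate how a single trial can convey the variability that it cannot itself display, the sketch below compares a standard error computed from one allocation with the spread of estimates over many hypothetical re-randomizations of the same sample. The conventional unequal-variance formula $\sqrt{s_1^2/n_1 + s_0^2/n_0}$ is used here as an assumption of the example; the data-generating choices and the function name are likewise hypothetical.

```python
import math
import random
import statistics


def standard_error_vs_spread(n=100, replications=2000, seed=0):
    """Compare the standard error estimated within one trial with the actual
    spread of the estimate across re-randomizations of the same sample."""
    rng = random.Random(seed)

    # Illustrative fixed trial sample with one strong other cause.
    gamma = 5.0
    betas = [rng.gauss(1.0, 0.5) for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    units = list(range(n))

    def one_trial():
        rng.shuffle(units)  # random allocation to the two arms
        treated, controls = units[: n // 2], units[n // 2:]
        y1 = [betas[i] + gamma * x[i] for i in treated]
        y0 = [gamma * x[i] for i in controls]
        estimate = statistics.fmean(y1) - statistics.fmean(y0)
        # Conventional two-sample standard error from this trial's data only.
        std_error = math.sqrt(
            statistics.variance(y1) / len(y1) + statistics.variance(y0) / len(y0)
        )
        return estimate, std_error

    _, single_trial_se = one_trial()
    spread = statistics.pstdev([one_trial()[0] for _ in range(replications)])
    return single_trial_se, spread


if __name__ == "__main__":
    se, spread = standard_error_vs_spread()
    print("standard error from one trial:", round(se, 3))
    print("spread across re-randomizations:", round(spread, 3))
```

The two numbers are of similar magnitude in this setup, which is the sense in which the standard error measures the error of estimation; it does nothing to make that error small.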
1.2 Misunderstandings: claiming too much
Everything so far should be perfectly familiar, but exactly what randomization does is frequently lost in the practical literature. There is often confusion between perfect control, on the one hand (as in a laboratory experiment or perfect matching with no unobservable causes), and control in expectation on the other, which is what randomization contributes. If we knew enough about the problem to be able to control well, that is what we would do. Randomization is an alternative when we do not know enough, but is generally inferior to good control. We suspect that at least some of the popular and professional enthusiasm for RCTs, as well as the belief that they are precise by construction, comes from misunderstandings about balance. These misunderstandings are not so much among the trialists, who will often give a correct account when pressed. They come from imprecise statements by trialists that are taken literally by the lay audience that the trialists are keen to reach.
Such a misunderstanding is well captured by a quote from the second edition of the online manual on impact evaluation jointly issued by the Inter-American Development Bank and the World Bank (the first, 2011 edition is similar):
“We can be confident that our estimated impact constitutes the true impact of the program, since we have eliminated all observed and unobserved factors that might otherwise plausibly explain the difference in outcomes.” Gertler, Martinez, Premand, Rawlings, and Vermeersch (2016, 69).
This statement is false, because it confuses actual balance in any single trial with balance in expectation over many (hypothetical) trials. If it were true, and if all factors were indeed controlled (and no imbalances were introduced post-randomization), the difference would be an exact measure of the average treatment effect among the treated in the trial population (at least in the absence of measurement error). We should not only be confident of our estimate but, as the quote says, we would know that it is the truth. Note that the statement contains no reference to sample size; we get the truth by virtue of balance, not from a large number of observations.
A similar quote comes from John List, one of the most imaginative and successful scholars who use RCTs:
“complications that are difficult to understand and control represent key reasons to conduct experiments, not a point of skepticism. This is because randomization acts as an instrumental variable, balancing unobservables across control and treatment groups.” Al-Ubaydli and List (2013) (italics in the original.)
And from Dean Karlan, founder and President of Yale’s Innovations for Poverty Action, which runs development RCTs around the world:
“As in medical trials, we isolate the impact of an intervention by randomly assigning subjects to treatments and control groups. This makes it so that all those other factors which could influence the outcome are present in treatment and control, and thus any difference in outcome can be confidently attributed to the intervention.” Karlan, Goldberg and Copestake (2009)
And from the medical literature, from a distinguished psychiatrist who is deeply skeptical of the use of evidence from RCTs,
“The beauty of a randomized trial is that the researcher does not need to understand all the factors that influence outcomes. Say that an undiscovered genetic variation makes certain people unresponsive to medication. The randomizing process will ensure—or make it highly probable—that the arms of the trial contain equal numbers of subjects with that variation. The result will be a fair test.” (Kramer, 2016, p. 18)
Claims are even made that RCTs reveal knowledge without possibility of error. Judy Gueron, the long-time president of MDRC, which has been running RCTs on US government policy for 45 years, asks why federal and state officials were prepared to support randomization in spite of frequent difficulties and in spite of the availability of other methods and concludes that it was because “they wanted to learn the truth,” Gueron and Rolston (2013, 429). There are many statements of the form “We know that [project X] worked because it was evaluated with a randomized trial,” Dynarski (2015).
It is common to treat the ATE from an RCT as if it were the truth, not just in the trial sample but more generally. In economics, a famous example is Lalonde’s (1986) study of training programs, whose results were at odds with a number of previous non-randomized studies. The paper prompted a large-scale re-examination of the observational studies to try to bring them into line, though it now seems just as likely that the differences lie in the fact that the study results apply to different populations (Heckman, Lalonde, and Smith (1999)). In epidemiology, Davey-Smith and Ibrahim (2002) state that “observational studies propose, RCTs dispose”. A good example is the RCT of hormone replacement therapy (HRT) for post-menopausal women. HRT had previously been supported by positive results from a high-quality and long-running observational study, but the RCT was stopped in the face of excess deaths in the treatment group. The negative result of the RCT led to widespread abandonment of the therapy, which might have been a mistake (see Vandenbroucke (2009) and Frieden (2017)). Yet the medical and popular literature routinely states that the RCT was right and the earlier study wrong, simply because the earlier study was not randomized. The gold standard or “truth” view does harm when it undermines the obligation of science to reconcile RCT results with other evidence in a process of cumulative understanding.
The false belief in automatic precision suggests that we need pay no attention to the other causes in (1) or (2). Indeed, Gerber and Green (2012), in their standard text for RCTs in political science, write that running an RCT is “a research strategy that does not require, let alone measure, all potential confounders.” This is true if we are happy with estimates that are arbitrarily far from the truth, just so long as the errors cancel out over a series of imaginary experiments. In reality, the causality that is being attributed to the treatment might, in fact, be coming from an imbalance in some other cause in our particular trial; limiting this requires serious thought about possible confounders.
1.3 Sample size, balance, and precision
At the time of randomization and in the absence of post-randomization changes in other causes, a trial is more likely to be balanced when the sample size is large. As the sample size tends to infinity, the means of the x’s in the treatment and control groups will become arbitrarily close. Yet this is of little help in finite samples. As Fisher (1926) noted: “Most experimenters on carrying out a random assignment will be shocked to find how far from equally the plots distribute themselves,” quoted in Morgan and Rubin (2012). Even with very large sample sizes, if there is a large number of causes, balance on each cause may be infeasible. Even with just three causes with three values each, there are 27 cells to balance, and in most social and medical cases there will be more. Vandenbroucke (2004) notes that there are three billion base pairs in the human genome, many or all of which could be relevant prognostic factors for the biological outcome that we are seeking to influence. It is true, as (2) makes clear, that we do not need balance on each cause individually, only on their net effect, the term $\bar{S}_1 - \bar{S}_0$. But consider the human genome base pairs. Out of all of those billions, only one might be important, and if that one is unbalanced, the results of a single trial can be far from the truth. Statements about large samples guaranteeing balance are not useful without guidelines about how large is large enough, and such statements cannot be made without knowledge of other causes and how they affect outcomes. Of course, lack of balance in the net effect of either observables or non-observables in (2) does not compromise the inference in an RCT in the sense of obtaining a standard error for the unbiased ATE (see Senn (2013) for a particularly clear statement).
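Both points, the count of cells and the slow improvement of balance with sample size, can be made concrete with a short sketch. Everything in it beyond the "three causes with three values each" of the text is an assumption for illustration: the single standard-normal cause, the sample sizes, the number of replications, and the function name.

```python
import itertools
import random
import statistics

# Three causes with three values each: enumerate the cells to be balanced.
cells = list(itertools.product(range(3), repeat=3))


def typical_imbalance(n, replications=500, seed=0):
    """Root-mean-square gap between the two arms in the mean of one cause x,
    taken over repeated random allocations of a fixed sample of size n."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # illustrative single cause
    units = list(range(n))
    squared_gaps = []
    for _ in range(replications):
        rng.shuffle(units)  # a fresh random allocation
        gap = statistics.fmean(x[i] for i in units[: n // 2]) - statistics.fmean(
            x[i] for i in units[n // 2:]
        )
        squared_gaps.append(gap ** 2)
    return statistics.fmean(squared_gaps) ** 0.5


if __name__ == "__main__":
    print("cells to balance:", len(cells))
    for n in (40, 200, 1000):
        print(n, round(typical_imbalance(n), 3))
```

The typical gap shrinks only in proportion to the square root of the sample size. Whether the gap that remains matters depends on how strongly the cause affects the outcome, which the trial alone does not reveal.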
Having run an RCT, it makes good sense to examine any available covariates for balance between the treatments and controls; if we suspect that an observed variable x is a possible cause, and its means in the two groups are very different, we should treat our results with appropriate suspicion. In practice, trialists in economics (and in some other disciplines) usually carry out a statistical test for balance after randomization but before analysis, presumably with the aim of taking some appropriate action if balance fails. The first table of the paper typically presents the sample means of observable covariates (the observable x’s in (1) or interactive factors represented in $\beta$) for the control and treatment groups, together with their differences, and tests for whether or not they are significantly different from zero, either variable by variable, or jointly. These tests are appropriate for unbiasedness if we are concerned that the random number generator might have failed, or if we are worried that the randomization is undermined by non-blinded subjects who systematically undermine the allocation. Otherwise, unbiasedness is guaranteed by the randomization, whatever the tests show, and as the next paragraph demonstrates, the test is not informative about the balance that would lead to precision.

If we write $\mu_0$ and $\mu_1$ for the (vectors of) true means in the trial sample (i.e.
the means over all possible randomizations) of the observed causes of Y in the
control and treatment groups at the point of assignment, the null hypothesis is
(presumably, as judged by the typical balance test) that the two vectors are identical,
with the alternative being that they are not. But if the randomization has been
correctly done the null hypothesis is true by construction (see e.g. Altman (1985) and
Senn (1994)), which may help explain why it so rarely fails in practice. As Begg
(1990) notes, “(I)t is a test of a null hypothesis that is known to be true. Therefore, if
the test turns out to be significant it is, by definition, a false positive.” This is, of
course, consistent with Fisher’s comments about the plots in the field, which note
that two samples of plots randomly drawn from the same field can look very
unbalanced. Indeed, although we cannot “test” it in this way, we know that the null
hypothesis is also true for the unobservable causes. Note the contrast with the
statement quoted above claiming that RCTs guarantee balance on causes across
treatment and control groups. Those statements refer to balance of causes at the point of
assignment in any single trial, which is not guaranteed by randomization, whereas
the balance tests are about the balance of causes at the point of assignment in
expectation over many trials, which is guaranteed by randomization. The confusion is
perhaps understandable, but it is a confusion nevertheless. Of course, it is always good
practice to look for imbalances between observed covariates in any single trial using
some more appropriate distance measure, for example the normalized difference in
means (Imbens and Wooldridge (2009, equation 3)). Similarly, it would have been
good practice for Fisher to abandon a randomization in which there were clear
patterns in the (random) distribution of plots across the field, even though the
treatment and control plots were random selections that, by construction, could not
differ “significantly” using the standard (incorrect) balance test. Whether such
imbalances should be seen as undermining the estimate of the ATE depends on our
priors about which covariates are likely to be important, and how important, which
is (not coincidentally) the same thought experiment that is routinely undertaken in
observational studies when we worry about confounding.
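
A normalized difference in means of the kind mentioned above might be computed as in
the sketch below. The scaling used here, the square root of the average of the two
sample variances, is one common variant and is an assumption; the exact normalization
in the cited equation may differ (for instance by a constant factor). The function
name and the sample ages are hypothetical.

```python
import math
import statistics


def normalized_difference(treated, control):
    """Difference in covariate means, scaled by a pooled standard deviation.

    Assumption: the scale is sqrt((s_t^2 + s_c^2) / 2); other conventions exist.
    """
    mean_gap = statistics.fmean(treated) - statistics.fmean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return mean_gap / pooled_sd


# Hypothetical covariate (age) in a small trial.
treated_ages = [34.0, 41.0, 29.0, 50.0, 38.0]
control_ages = [31.0, 36.0, 28.0, 45.0, 33.0]
print(f"normalized difference: {normalized_difference(treated_ages, control_ages):.3f}")
```

Unlike a t-statistic, this measure does not grow mechanically with the sample size, so
it describes how far apart the two groups are in this particular trial rather than
testing a null hypothesis that is true by construction.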

One procedure to improve balance is to adapt the design before randomization,
for example, by stratification. Fisher, who, as the quote above illustrates, was
well aware of the loss of precision from randomization, argued for “blocking”
(stratification) in agricultural trials or for using Latin Squares, both of which restrict the
amount of imbalance. Stratification, to be useful, requires some prior understanding
of the factors that are likely to be important, and so it takes us away from the “no
knowledge required” or “no priors accepted” appeal of RCTs; it requires thinking
about and measuring covariates. But as Scriven (1974, 103) notes: “(C)ause hunting,
like lion hunting, is only likely to be successful if we have a considerable amount of
relevant background knowledge”. Cartwright (1994, Chapter 2) puts it even more
strongly, “no causes in, no causes out”. Stratification in RCTs, as in other forms of
sampling, is a standard method for using background knowledge to increase the
precision of an estimator. It has the further advantage that it allows for the
exploration of different ATEs in different strata, which can be useful in adapting or
transporting the results to other locations (see Section 2).

Stratification is not possible if there are too many covariates, or if each has
many values, so that there are more cells than can be filled given the sample size.
With five covariates, and ten values on each, and no priors to limit the structure, we
would have 100,000 possible strata. Filling these is well beyond the sample sizes in
most trials. An alternative that works more generally is to re-randomize. If the
randomization gives an obvious imbalance on known covariates (treatment plots all
on one side of the field, all of the treatment clinics in one region, too many rich and
too few poor in the control group) we try again, and keep trying until we get a
balance measured as a small enough distance between the means of the observed
covariates in the two groups. Morgan and Rubin (2012) suggest the Mahalanobis
D-statistic be used as a criterion and use Fisher’s randomization inference (to be
discussed further below) to calculate standard errors that take the re-randomization
into account. An alternative, widely adopted in practice, is to adjust for covariates by
running a regression (or covariance) analysis, with the outcome on the left-hand
side and the treatment dummy and the covariates as explanatory variables,
including possible interactions between covariates and treatment dummies. Freedman
(2008) shows that the adjusted estimate of the ATE is biased in finite samples, with
the bias depending on the correlation between the squared treatment effect and the
covariates. Accepting some bias in exchange for greater precision will often make
sense, though it certainly undermines any gold standard argument that relies on
unbiasedness without consideration of precision.
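
The re-randomization procedure described above can be sketched as a simple
accept-or-retry loop. The sketch makes several assumptions that are not in the text:
it uses the largest absolute normalized difference across covariates as the balance
criterion, a simplified stand-in for the Mahalanobis criterion and not that statistic
itself; the threshold of 0.1, the simulated covariates, and all function names are
hypothetical.

```python
import math
import random
import statistics


def worst_normalized_gap(covariates, treated):
    """Largest |normalized difference in means| across covariate columns."""
    worst = 0.0
    for column in covariates:
        t = [x for i, x in enumerate(column) if i in treated]
        c = [x for i, x in enumerate(column) if i not in treated]
        pooled_sd = math.sqrt((statistics.variance(t) + statistics.variance(c)) / 2)
        gap = abs(statistics.fmean(t) - statistics.fmean(c)) / pooled_sd
        worst = max(worst, gap)
    return worst


def rerandomize(covariates, threshold, rng, max_draws=10_000):
    """Redraw the allocation until the balance criterion falls below the threshold."""
    n_units = len(covariates[0])
    for draw in range(1, max_draws + 1):
        treated = set(rng.sample(range(n_units), n_units // 2))
        if worst_normalized_gap(covariates, treated) <= threshold:
            return treated, draw
    raise RuntimeError("no acceptable allocation found; loosen the threshold")


rng = random.Random(1)
# Three simulated covariates observed on 40 units before assignment.
covariates = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(3)]
treated, draws = rerandomize(covariates, threshold=0.1, rng=rng)
print(f"accepted an allocation after {draws} draw(s)")
```

Because allocations are discarded until one passes the criterion, the accepted
allocation is no longer a draw from the set of all possible allocations, which is why
the standard errors need to take the re-randomization into account.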

1.4 Should we randomize?

The tension between randomization and precision that goes back to Fisher, Gosset,
and Savage has been reopened in recent papers by Kasy (2016), Banerjee, Chassang,
and Snowberg (BCS) (2016) and Banerjee, Chassang, Montero, and Snowberg
(BCMS) (2016).

The trade-off between bias and precision can be formalized in several ways,
for example by specifying a loss or utility function that depends on how a user is
affected by deviations of the estimate of the ATE from the truth and then choosing an
estimator or an experimental design that minimizes expected loss or maximizes
expected utility. As Savage (1962) noted, for a Bayesian, this involves allocating
treatments and controls in “the specific layout that promised to tell him the most,” but
without randomization. Of course, this requires serious and perhaps difficult thought
about the mechanisms underlying the ATE, which randomization avoids. Savage also
notes that several people with different priors may be involved in an investigation
and that individual priors may be unreliable because of “vagueness and temptation
to self-deception,” defects that randomization may alleviate, or at least evade. BCMS
(2016) provide a proof of a Bayesian no-randomization theorem, and BCS (2016)
provide an illustration of a school administrator who has long believed that school
outcomes are determined, not by school quality, but by parental background, and
who can learn the most by placing deprived children in (supposed) high-quality
schools and privileged children in (supposed) low-quality schools, which is the kind
of study setting that case study methodology is well attuned to. As BCS note, this
allocation would not persuade those with different priors, and they propose
randomization as a means of satisfying skeptical observers.

Several points are important. First, the anti-randomization theorem is not a
justification of any non-randomized design, for example, one that allows selection
on unobservables, but only of the optimal design that is most informative. According to
Chalmers (2001) and Bothwell and Podolsky (2016), the development of
randomization in medicine originated with Bradford-Hill, who used randomization in the
first RCT in medicine (the streptomycin trial) because it prevented doctors
selecting patients on the basis of perceived need (or against perceived need, leaning over
backward as it were), an argument recently echoed by Worrall (2007).
Randomization serves this purpose, but so do other non-discretionary schemes; what is
required is that hidden information should not be allowed to affect the allocation.

Second, the ideal rules by which units are allocated to treatment or control
depend on the covariates and on the investigators’ priors about how the covariates
affect the outcomes. This opens up all sorts of methods of inference that are long
familiar to economists but that are excluded by pure randomization. For example,
what philosophers call the hypothetico-deductive method works by using theory to
make a prediction that can be taken to the data for potential falsification (as in the
school example above). This is the way that physicists learn, as do economists when
they use theory to derive predictions that can be tested against the data, perhaps in
an RCT, but more frequently not. Some of the most fruitful research programs in
economics have been generated by the puzzles that result when the data fail to
match such theoretical predictions, such as the equity premium puzzle, various
purchasing power parity puzzles, the Feldstein-Horioka puzzle, the consumption
smoothness puzzle, the puzzle of calorie decline in the face of malnourishment and
income growth, and many others.

Third, randomization, by running roughshod over prior information from
theory and from covariates, is wasteful and even unethical when it unnecessarily
exposes people, or unnecessarily many people, to possible harm in a risky experiment.
Worrall (2008) documents the (extreme) case of ECMO, a new treatment for
newborns with persistent pulmonary hypertension that was developed in the 1970s by
intelligent and directed trial and error within a well-understood theory of the
disease. In early experimentation by the inventors, mortality was reduced from 80 to
20 percent. The investigators felt compelled to conduct an RCT, albeit with an
adaptive ‘play-the-winner’ design in which each success in an arm increased the
probability of the next baby being assigned to that arm. One baby received conventional
therapy and died, 11 received ECMO and lived. Even so, a standard randomized
controlled trial was thought necessary. With a stopping rule of four deaths, four more
babies (out of ten) died in the control group and none of the nine who received
ECMO.
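
An adaptive design of the play-the-winner kind can be sketched as an urn scheme. This
is an illustration only and not a description of the actual ECMO protocol: the urn
starts with one ball per arm, a success adds a ball for the same arm, and a failure
adds a ball for the other arm (the failure rule is an assumption; the text mentions
only the effect of successes). The survival probabilities simply reuse the 20 and 80
percent figures quoted above, and the function name is hypothetical.

```python
import random


def play_the_winner(n_patients, survival_prob, rng):
    """Assign patients one at a time, drawing the arm from an urn that adapts."""
    urn = {"conventional": 1, "ecmo": 1}       # one ball per arm to start
    assigned = {"conventional": 0, "ecmo": 0}
    for _ in range(n_patients):
        arms = list(urn)
        # Draw an arm with probability proportional to its number of balls.
        arm = rng.choices(arms, weights=[urn[a] for a in arms])[0]
        assigned[arm] += 1
        survived = rng.random() < survival_prob[arm]
        other = "ecmo" if arm == "conventional" else "conventional"
        # A success rewards the same arm; a failure rewards the other arm.
        urn[arm if survived else other] += 1
    return assigned


rng = random.Random(0)
result = play_the_winner(12, {"conventional": 0.2, "ecmo": 0.8}, rng)
print(result)
```

When one arm is much better than the other, schemes of this kind tend to send most
patients to the better arm, which is how an allocation as lopsided as one against
eleven can arise.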

Fourth, the non-random methods use prior information, which is why they
do better than randomization. This is both an advantage and a disadvantage,
depending on one’s perspective. If prior information is not widely accepted, or is seen
as non-credible by those we are seeking to persuade, we will generate more credible
estimates if we do not use those priors. Indeed, this is why BCS (2016) recommend
randomized designs, including in medicine and in development economics. They
develop a theory of an investigator who is facing an adversarial audience who will
challenge any prior information and can even potentially veto results based on it
(think of administrative agencies such as the FDA or journal referees). The
experimenter trades off his or her own desire for precision (and preventing possible harm
to subjects), which would require prior information, against the wishes of the
audience, who wants nothing to do with those priors. Even then, the approval of the
audience is only ex ante; once the fully randomized experiment has been done, nothing
stops critics arguing that, in fact, the randomization did not offer a fair test because
important other causes were not balanced. Among doctors who use RCTs, and
especially meta-analysis, such arguments are (appropriately) common (see Kramer
(2016)).

Today, when the public has come to question expert prior knowledge, RCTs
will flourish. In cases where there is good reason to doubt the good faith of
experimenters, as in many pharmaceutical trials, randomization will indeed be an
appropriate response. But we believe such arguments are destructive for scientific
endeavor (which is not the purpose of the FDA) and should be resisted as a general
prescription in scientific research. Previous knowledge needs to be built on and
incorporated into new knowledge, not discarded in the face of aggressive ignorance.
The systematic refusal to use prior knowledge and the associated preference for
RCTs are recipes for preventing cumulative scientific progress. In the end, it is also
self-defeating. To quote Rodrik (2016) “the promise of RCTs as theory-free learning
machines is a false one.”

1.5 Statistical inference in RCTs

The estimated ATE in a simple RCT is the difference in the means between the
treatment and control groups. When covariates are allowed for, as in most RCTs in
economics, the ATE is usually estimated from the coefficient on the treatment dummy
in a regression that looks like (1), but with the heterogeneity in $\beta$ ignored. Modern
work calculates standard errors allowing for the possibility that residual variances
may be different in the treatment and control groups, usually by clustering the
standard errors, which is equivalent to the familiar two sample standard error in
the case with no covariates. Statistical inference is done with t-values in the usual
way. These procedures do not always give the right answer.

Looking back at (1), the underlying objects of interest are the individual
treatment effects $\beta_i$ for each of the individuals in the trial sample. Neither they, nor
their distribution $G(\beta)$, is identified from an RCT; because RCTs make so few
assumptions, which in many cases is their strength, they can identify only the mean of
the distribution. In many observational studies, researchers are prepared to make
more assumptions on functional forms or on distributions, and for that price we are
able to identify other quantities of interest. Without these assumptions, inferences
must be based on the difference in the two means, a statistic that is sometimes
ill-behaved, as we shall discuss below. This ill-behavior has nothing to do with RCTs,
per se, but within RCTs, and their minimal assumptions, we cannot easily switch
from the mean to some other quantity of interest.

Fisher proposed that statistical inference should be done using what has
become known as “randomization” inference, a procedure that is as non-parametric as
the RCT-based estimate of an ATE itself. To test the null hypothesis that $\beta_i = 0$ for
all i, note that, under the null that the treatment has no effect on any individual, an
estimated nonzero ATE must be a consequence of the particular random allocation
that generated it. By tabulating all possible combinations of treatments and controls
in our trial sample, and the ATE associated with each, we can calculate the exact
distribution of the estimated ATE under the null. This allows us to calculate the
probability of obtaining an estimate as large as our actual estimate when there are no
effects of treatment. This randomization test requires a finite sample, but it will work
for any sample size (see Imbens and Wooldridge (2009) for an excellent account of
the procedure). Imbens (2010) argues that it is this randomization inference plus
the unbiasedness of the ATE that provides the twin non-parametric pillars that
support placing RCTs at the “very top” of the hierarchy of evidence.
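
The tabulation described above can be carried out exactly when the trial is small. The
following is a minimal sketch, assuming a two-sided comparison (estimates at least as
large in absolute value as the observed one) and using made-up outcomes; the function
name and the data are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def randomization_p_value(outcomes, treated_idx):
    """Exact p-value for the sharp null that no unit is affected by treatment."""
    n = len(outcomes)
    k = len(treated_idx)

    def estimated_ate(idx):
        idx = set(idx)
        treated_mean = fmean(outcomes[i] for i in idx)
        control_mean = fmean(outcomes[i] for i in range(n) if i not in idx)
        return treated_mean - control_mean

    observed = abs(estimated_ate(treated_idx))
    # Under the sharp null the outcomes are fixed, so every possible allocation
    # of k units to treatment yields an ATE we can compute from the same data.
    all_ates = [abs(estimated_ate(c)) for c in combinations(range(n), k)]
    # A small tolerance guards against floating-point ties.
    return sum(a >= observed - 1e-12 for a in all_ates) / len(all_ates)


outcomes = [12, 15, 11, 19, 9, 8, 10, 7]   # hypothetical outcomes for 8 units
treated_idx = (0, 1, 2, 3)                 # the first four units were treated
p_value = randomization_p_value(outcomes, treated_idx)
print(f"randomization p-value: {p_value:.4f}")
```

In this example only 2 of the 70 possible allocations give an estimate as extreme as
the observed one. For larger trials the full enumeration becomes infeasible and a
random sample of allocations is typically used instead.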

Randomization inference can be used for null hypotheses that specify that all
of the treatment effects are zero, as in the above example, but it cannot be used to
test the hypothesis that the average treatment effect is zero, which will often be of
interest. In agricultural trials, and in medicine, the stronger (sharp) hypothesis that
the treatment has no effect whatever is often of interest. In many economic
applications that involve money, such as welfare experiments or cost-benefit analyses, we
are interested in whether the net effect of the treatment is positive or negative, and
in these cases, randomization inference cannot be used. None of which argues
against its wider use in social sciences when appropriate.

In cases where randomization inference cannot be used, we must construct
tests for the differences in two means. Standard procedures will often work well, but
there are two potential pitfalls. One, the ‘Fisher-Behrens problem’, comes from the
fact that, when the two samples have different variances (which we typically want
to permit), the usual t-statistic does not have the t-distribution. The second
problem, which is much harder to address, occurs when the distribution of treatment
effects is not symmetric (Bahadur and Savage (1956)). Neither pitfall is specific to
RCTs, but RCTs force us to work with means in estimating treatment effects and,
with only a very few exceptions in the literature, social scientists who use RCTs
appear to be unaware of the difficulties.

In the simple case of comparing two means in an RCT without covariates,
inference is usually based on the two-sample t-statistic which is computed by
dividing the ATE by the estimated standard error whose square is given by

$$\hat{\sigma}^2 = \frac{(n_1 - 1)^{-1} \sum_{i \in 1} (Y_i - \bar{Y}_1)^2}{n_1} + \frac{(n_0 - 1)^{-1} \sum_{i \in 0} (Y_i - \bar{Y}_0)^2}{n_0} \qquad (3)$$

where 0 refers to controls and 1 to treatments, so that there are $n_1$ treatments and
$n_0$ controls, and $\bar{Y}_1$ and $\bar{Y}_0$ are the two means. As has long been known, the
“t-statistic” based on (3) is not distributed as Student’s t if the two variances (treatment and
control) are not identical but has the Behrens–Fisher distribution. In extreme cases,
when one of the variances is zero, the t-statistic has effective degrees of freedom
half of that of the nominal degrees of freedom, so that the test-statistic has thicker
tails than allowed for, and there will be too many rejections when the null is true.
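
The standard error in (3) and the loss of degrees of freedom can be illustrated with a
short calculation. The sketch below computes the t-statistic from (3) and, as an
assumption not taken from the text, uses the standard Welch–Satterthwaite
approximation for the effective degrees of freedom. The function name and the data
are hypothetical.

```python
from statistics import fmean, variance


def two_sample_t(treated, control):
    """ATE, standard error from (3), t-statistic, and approximate degrees of freedom."""
    n1, n0 = len(treated), len(control)
    # Each term is a sample variance (divisor n - 1) divided by the group size.
    v1 = variance(treated) / n1
    v0 = variance(control) / n0
    ate = fmean(treated) - fmean(control)
    se = (v1 + v0) ** 0.5
    # Welch-Satterthwaite approximation (a textbook device, assumed here).
    df = (v1 + v0) ** 2 / (v1 ** 2 / (n1 - 1) + v0 ** 2 / (n0 - 1))
    return ate, se, ate / se, df


# Extreme case: the control group has no variation at all.
treated = [1.0, 2.0, 3.0, 4.0, 5.0]
control = [3.0, 3.0, 3.0, 3.0, 3.0]
ate, se, t_stat, df = two_sample_t(treated, control)
nominal_df = len(treated) + len(control) - 2
print(f"effective df = {df:.1f}, nominal df = {nominal_df}")
```

With one variance equal to zero and equal group sizes, the approximation gives 4
effective degrees of freedom against a nominal 8, matching the halving described in
the text.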

Young (2016) argues that this problem is worse when the trial results are
analyzed by regressing outcomes not only on the treatment dummy but also on
additional controls and when using clustered or robust standard errors. When the
design matrix is such that the maximal influence is large, so that for some observations
outcomes have large influence on their own predicted values, there is a reduction in
the effective degrees of freedom for the t-value(s) of the average treatment effect(s)
leading to spurious findings of significance. Young looks at 2,003 regressions
reported in 53 RCT papers in the American Economic Association journals and
recalculates the significance of the estimates using randomization inference applied to the
authors’ original data. In 30 to 40 percent of the estimated treatment effects in
individual equations with coefficients that are reported as significant, he cannot reject
the null of no effect for any observation; the fraction of spuriously significant results
increases further when he simultaneously tests for all results in each paper. These
spurious findings come in part from issues of multiple-hypothesis testing, both
within regressions with several treatments and across regressions. Within
regressions, treatments are largely orthogonal, but authors tend to emphasize significant
t-values even when the corresponding F-tests are insignificant. Across equations,
results are often strongly correlated, so that, at worst, different regressions are
reporting variants of the same result, thus spuriously adding to the “kill count” of
significant effects. At the same time, the pervasiveness of observations with high
influence generates spurious significance on its own.

These issues are now being taken more seriously. In addition to Young
(2016), Imbens and Kolesár (2016) provide practical advice for dealing with the
Fisher-Behrens problem, and the best current practice tries to be careful about
multiple hypothesis testing. Yet it remains the case that many of the results in the
literature are spuriously significant.

Spurious significance also arises when the distribution of treatment effects
contains outliers or, more generally, is not symmetric. Standard t-tests break down
in distributions with enough skewness (see Lehmann and Romano (2005, pp. 466–8)).
How difficult is it to maintain symmetry? And how badly is inference affected
when the distribution of treatment effects is not symmetric? In economics, many
trials have outcomes valued in money. Does an anti-poverty innovation, for example
microfinance, increase the incomes of the participants? Income itself is not
symmetrically distributed, and this might be true of the treatment effects too if there are
a few people who are talented but credit-constrained entrepreneurs and who have
treatment effects that are large and positive, while the vast majority of borrowers
fritter away their loans, or at best make positive but modest profits. A recent
summary of the literature is consistent with this (see Banerjee, Karlan, and Zinman
(2015)). Another important example is expenditures on healthcare. Most people
have zero expenditure in any given period, but among those who do incur
expenditures, a few individuals spend huge amounts that account for a large share of the
total. Indeed, in the famous Rand health experiment (see Manning, Newhouse et al.
(1987, 1988)), there is a single very large outlier. The authors realize that the
comparison of means across treatment arms is fragile, and, although they do not see
their problem exactly as described here, they obtain their preferred estimates using
a structural approach that is explicitly designed to model the skewness of
expenditures.

In some cases, it will be appropriate to deal with outliers by trimming,
transforming, or eliminating observations that have large effects on the estimates. But if
the experiment is a project evaluation designed to estimate the net benefits of a
policy, the elimination of genuine outliers, as in the Rand Health Experiment, will
vitiate the analysis. It is precisely the outliers that make or break the program.
Transformations, such as taking logarithms, may help to produce symmetry, but they
change the nature of the question being asked; a cost benefit analysis must be done
in dollars, not log dollars.
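
A small made-up example shows how taking logarithms can change the answer and not
just the scale. The numbers below are hypothetical and chosen only to make the point:
three treated units do badly and one does spectacularly well.

```python
import math
from statistics import fmean

control = [100.0, 100.0, 100.0, 100.0]   # hypothetical outcomes in dollars
treated = [10.0, 10.0, 10.0, 1000.0]     # one very large positive outlier

# Difference in mean dollars: this is what a cost-benefit analysis needs.
effect_in_dollars = fmean(treated) - fmean(control)

# Difference in mean log dollars: the outlier is compressed by the logarithm.
effect_in_logs = fmean(math.log(y) for y in treated) - fmean(
    math.log(y) for y in control
)

print(f"effect in dollars:     {effect_in_dollars:+.1f}")
print(f"effect in log dollars: {effect_in_logs:+.3f}")
```

The treatment raises mean dollars but lowers mean log dollars, so the two scales give
opposite signs for the estimated effect.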

We consider an example that illustrates what can happen in a realistic but
simplified case; the full results are reported in the Appendix. We imagine a
population of individuals, each with a treatment effect $\beta_i$. The parent population mean of
the treatment effects is zero, but there is a long tail of positive values; we use a
left-shifted lognormal distribution. We have a microfinance trial in mind, where there is
a long positive tail of rare individuals who can do amazing things with credit while
most people cannot use it effectively. A trial sample of 2n individuals is randomly
drawn from the parent population and is randomly split between n treatments and
n controls. Within each trial sample, whose true ATE will generally differ from zero
because of the sampling, we run many RCTs and tabulate the values of the ATE for
each.

Using standard t-tests, the (true in the parent distribution) hypothesis that
the ATE is zero is rejected between 14 ($n = 25$) and 6 percent ($n = 500$) of the time.
These rejections come from two separate issues, both of which are relevant in
practice; (a) that the ATE in the trial sample differs from the ATE in the parent population of
interest, and (b) that the t-values are not distributed as t in the presence of outliers.
The problem cases are when the trial sample happens to contain one or more
outliers, something that is always a risk given the long positive tail of the parent
distribution. When this happens, everything depends on whether the outlier is among the
treatments or the controls; in effect, the outliers become the sample, reducing the
effective number of degrees of freedom. In extreme cases, one of which is illustrated
in Figure A.1, the distribution of estimated ATEs is bimodal, depending on the group
to which the outlier is assigned. When the outlier is in the treatment group, the
dispersion across outcomes is large, as is the estimated standard error, and so those
outcomes rarely reject the null using the standard table of t-values. The
over-rejections come from cases when the outlier is in the control group, the outcomes are not
so dispersed, and the t-values can be large, negative, and significant. While these
cases of bimodal distributions may not be common and depend on the existence of
large outliers, they illustrate the process that generates the over-rejections and
spurious significance. Note that there is no remedy through randomization inference
here, given that our interest is in the hypothesis that the average treatment effect is
zero.
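
The flavor of this exercise can be conveyed with a rough simulation. It does not
reproduce the Appendix results and rests on assumptions that are not in the text: the
lognormal has log-scale standard deviation 1, outcomes are the treatment effect for
treated units plus standard normal noise for everyone, a fresh trial sample is drawn
for each simulated RCT, and the critical value 2.01 is approximately the 97.5th
percentile of Student's t with 48 degrees of freedom. All function names are
hypothetical.

```python
import math
import random
from statistics import fmean, variance


def draw_effect(rng, sigma=1.0):
    """Treatment effect from a lognormal shifted left so its population mean is zero."""
    # The mean of a lognormal(0, sigma) variable is exp(sigma^2 / 2).
    return rng.lognormvariate(0.0, sigma) - math.exp(sigma ** 2 / 2)


def rejection_rate(n, n_trials, rng, critical_value):
    """Share of simulated RCTs in which |t| exceeds the critical value."""
    rejections = 0
    for _ in range(n_trials):
        effects = [draw_effect(rng) for _ in range(2 * n)]
        rng.shuffle(effects)  # random split into n treated and n controls
        # Assumed outcome model: Y = beta * T + noise, with N(0, 1) noise.
        treated = [b + rng.gauss(0.0, 1.0) for b in effects[:n]]
        control = [rng.gauss(0.0, 1.0) for _ in effects[n:]]
        se = math.sqrt(variance(treated) / n + variance(control) / n)
        t_value = (fmean(treated) - fmean(control)) / se
        rejections += abs(t_value) > critical_value
    return rejections / n_trials


rng = random.Random(0)
rate = rejection_rate(n=25, n_trials=2000, rng=rng, critical_value=2.01)
print(f"share of trials rejecting a zero ATE at the nominal 5 percent level: {rate:.3f}")
```

How far the rejection rate departs from the nominal 5 percent depends on the assumed
skewness and noise; the sketch is meant only to make the mechanics of the thought
experiment concrete.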

Our reading of the literature on RCTs in development economics suggests
that they are not exempt from these concerns. Many development trials are run on
(sometimes very) small samples, they have treatment effects where asymmetry is
hard to rule out, especially when the outcomes are in money, and they often give
results that are puzzling, or at least not easily interpreted in terms of economic
theory. Neither Banerjee and Duflo (2012) nor Karlan and Appel (2011), who cite many
RCTs, raise concerns about misleading inference, implicitly treating all results as
reliable. No doubt there are behaviors in the world that are inconsistent with standard
economics, and some can be explained by standard biases in behavioral economics,
but it would also be good to be suspicious of the significance tests before accepting
that an unexpected finding is well-supported and that theory must be revised.
Replication of results in different settings may be helpful, if they are the right kind of
places (see our discussion in Section 2). Yet it hardly solves the problem given that
the asymmetry may be in the same direction in different settings, that it seems likely
to be so in just those settings that are sufficiently like the original trial setting to be
of use for inference about the population of interest, and that the “significant”
t-values will show departures from the null in the same direction. This, then, replicates
the spurious findings.

A summary

What do the arguments of this section mean about the importance of randomization
and the interpretation that should be given to an estimated ATE from a randomized
trial? First, we should be sure that an unbiased estimate of an ATE for the trial
population is likely to be useful enough to warrant the costs of running the trial. Second,
since randomization does not ensure orthogonality, care must be taken (e.g. by
blinding) that there are no significant post-randomization correlates with the
treatment. This is a well-known lesson but many social and economic trials are not
blinded and insufficient defense is offered that unbiasedness is not undermined.
Indeed, lack of blinding is not the only source of post-randomization bias. Treatments
and controls may be handled in different places, or by differently trained
practitioners, or at different times of day, and these differences can bring with them
systematic differences in the other causes to which the two groups are exposed. These can,
and should, be guarded against. But doing so requires an understanding of what
these causally relevant factors might be. Third, the inference problems reviewed
here cannot just be presumed away. When there is substantial heterogeneity, the
ATE in the trial sample can be quite different from the ATE in the population of
interest, even if the trial is randomly selected from that population; in practice, the
relationship between the trial sample and the population is often obscure.

Beyond that, in many cases, the statistical inference will be fine, but serious
attention should be given to the possibility that there are outliers in treatment
effects, something that knowledge of the problem can suggest and where inspection of
the marginal distributions of treatments and controls may be informative. For
example, if both are symmetric, it seems unlikely (though certainly not impossible) that
the treatment effects are highly skewed. Measures to deal with Fisher-Behrens
should be used and randomization inference considered when appropriate to the
hypothesis of interest.

All of this can be regarded as recommendations for improvement to current
practice, not a challenge to it. More fundamentally, we strongly contest the
often-expressed idea that the ATE calculated from an RCT is automatically reliable, that
randomization automatically controls for unobservables, or worst of all, that the
calculated ATE is true. If, by chance, it is close to the truth, the truth we are referring to is
the truth in the trial sample only. To make any inference beyond that requires an
argument of the kind we consider in the next section. We have also argued that,
depending on what we are trying to measure and what we want to use that measure
for, there is no presumption that an RCT is the best means of estimating it. That too
requires an argument, not a presumption.

Section 2: Using the results of randomized controlled trials

2.1 Introduction

Suppose we have estimated an ATE from a well-conducted RCT on a trial sample,
and our standard error gives us reason to believe that the effect did not come about
by chance. We thus have good warrant that the treatment causes the effect in our
trial sample, up to the limits of statistical inference. What are such findings good for?

The literature in economics, as indeed in medicine and in social policy, has
paid more attention to obtaining results than to considering what can be done with
them. There is little theoretical or empirical work to guide us on how and for what
purposes to use the findings of RCTs, such as the conditions under which the same
results hold outside of the original settings, how they might be adapted for use
elsewhere, or how they might be used for formulating, testing, understanding, or
probing hypotheses beyond the immediate relation between the treatment and the
outcome investigated in the study. Yet it cannot be that knowing how to use results is
less important than knowing how to demonstrate them. Any chain of evidence is only
as strong as its weakest link, so that a rigorously established effect whose
applicability is justified by a loose declaration of simile warrants little. If trials are to be useful,
we need paths to their use that are as carefully constructed as are the trials
themselves.

The argument for the “primacy of internal validity” made by Shadish, Cook,
and Campbell (2002) may be reasonable as a warning that bad RCTs are unlikely to
generalize, but it is sometimes incorrectly taken to imply that results of an internally
valid trial will automatically, or often, apply ‘as is’ elsewhere, or that this should be
the default assumption failing arguments to the contrary, as if a parameter, once
well established, can be expected to be invariant across settings. An invariance
assumption is often made in medicine, for example, where it is sometimes plausible
that a particular procedure or drug works the same way everywhere (though see
Horton (2000) for a strong dissent and Rothwell (2005) for examples on both sides
of the question). We should also note the recent movement to ensure that testing of
drugs includes women and minorities because members of those groups suppose
that the results of trials on mostly healthy young white males do not apply to them.
2.2 Using results, transportability, and external validity
Suppose a trial has established a result in a specific setting. If ‘the same’ result holds elsewhere, it is said to have ‘external validity’. External validity may refer just to the transportability of the causal connection, or go further and require replication of the magnitude of the ATE. Either way, the result holds (everywhere, or widely, or in some specific elsewhere) or it does not.
This binary concept of external validity is often unhelpful because it asks the results of an RCT to satisfy a condition that is neither necessary nor sufficient for a trial to be useful, and so both overstates and understates their value. It directs us toward simple extrapolation (whether the same result holds elsewhere) or simple generalization (it holds universally or at least widely) and away from more complex but more useful applications of the results. The failure of external validity interpreted as simple generalization or extrapolation says little about the value of the trial.
First, there are several uses of RCTs that do not require transportability beyond the original context; we discuss these in the next subsection. Second, there are often good reasons to expect that the results from a well-conducted, informative, and potentially useful RCT will not apply elsewhere in any simple way. Without further understanding and analysis, even successful replication tells us little either for or against simple generalization, and does little to support the conclusion that the next one will work in the same way. Nor do failures of replication make the original result useless. We often learn much from coming to understand why replication failed and can use that knowledge in looking for how the factors that caused the original result might operate differently in different settings. Third, and particularly important for scientific progress, the RCT result can be incorporated into a network of evidence and hypotheses that test or explore claims that look very different from the results reported from the RCT. We shall give examples below of extremely useful RCTs that are not externally valid in the (usual) sense that their results do not hold elsewhere, whether in a specific target setting or in the more sweeping sense of holding everywhere.
Bertrand Russell’s chicken (Russell (1912)) provides an excellent example of the limitations to straightforward extrapolation from repeated successful replication. The bird infers, on repeated evidence, that when the farmer comes in the morning, he feeds her. The inference serves her well until Christmas morning, when he wrings her neck and serves her for dinner. Though this chicken did not base her inference on an RCT, had we constructed one for her, we would have obtained the same result that she did. Her problem was not her methodology, but rather that she did not understand the social and economic structure that gave rise to the causal relations that she observed.
So, establishing causality does nothing in and of itself to guarantee generalizability. Nor does the ability of an ideal RCT to eliminate bias from selection or from omitted variables mean that the resulting ATE from the trial sample will apply anywhere else. The issue is worth mentioning only because of the enormous weight that is currently attached in economics to the discovery and labeling of causal relations, a weight that is hard to justify for effects that may have only local applicability, what might be labeled ‘anecdotal causality’. The operation of a cause generally requires the presence of “support factors”, without which a cause that produces the targeted effect in one place, even though it may be present and have the capacity to operate elsewhere, will remain latent and inoperative. What Mackie (1974) called INUS causality (Insufficient but Non-redundant parts of a condition that is itself Unnecessary but Sufficient for a contribution to the outcome) is often the kind of causality we see. A standard example is a house burning down because the television was left on, although televisions do not operate in this way without support factors, such as wiring faults, the presence of tinder, and so on. This is standard fare in epidemiology, which uses the term ‘causal pie’ to refer to a set of causes that are jointly but not separately sufficient for an effect.
We can rewrite (1) in the form
$$Y_i = \beta_i T_i + \sum_{j=1}^{J} \gamma_j x_{ij} = \theta(w_i) T_i + \sum_{j=1}^{J} \gamma_j x_{ij} \qquad (4)$$
where the function $\theta(\cdot)$ controls how a k-vector $w_i$ of k ‘support factors’ affects individual i’s treatment effect $\beta_i$. The support factors may include some of the x’s. Since the ATE is the average of the $\beta_i$s, two populations will have the same ATE if and only if they have the same average for the net effect of the support factors necessary for the treatment to work, i.e. for the quantity in front of $T_i$. These are however just the kind of factors that are likely to be differently distributed in different populations, and indeed we do generally find different ATEs in different economic (and other social policy) RCTs in different places even in the cases where (unusually) they all point in the same direction.
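As a minimal sketch of the point about equation (4), the Python below computes an ATE as the share-weighted average of θ(w) in two populations. The function `theta`, the single binary support factor, and all population shares are hypothetical values chosen for illustration; none of them comes from an actual trial.

```python
import math
from typing import Callable, Dict


def average_treatment_effect(theta: Callable[[int], float],
                             w_shares: Dict[int, float]) -> float:
    """Return the ATE as the share-weighted average of theta(w) over values of w."""
    if not math.isclose(sum(w_shares.values()), 1.0):
        raise ValueError("population shares must sum to 1")
    return sum(share * theta(w) for w, share in w_shares.items())


def theta(w: int) -> float:
    """Hypothetical effect: the treatment works only where the support factor is present."""
    return 2.0 if w == 1 else 0.0


# Hypothetical shares of people with (w = 1) and without (w = 0) the support factor.
trial_population = {1: 0.75, 0: 0.25}
target_population = {1: 0.25, 0: 0.75}

ate_trial = average_treatment_effect(theta, trial_population)
ate_target = average_treatment_effect(theta, target_population)
print(ate_trial, ate_target)
```

With these made-up numbers the same θ gives an ATE of 1.5 in the trial population and 0.5 in the target, purely because the support factor is distributed differently.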
Causal processes often require highly specialized economic, cultural, or social structures to enable them to work. Consider the Rube Goldberg machine that is rigged up so that flying a kite sharpens a pencil (Cartwright and Hardie (2012, 77)). The underlying structure affords a very specific form of (4) that will not describe causal processes elsewhere. Neither the same ATE nor the same qualitative causal relations can be expected to hold where the specific form for (4) is different. Indeed, we continually attempt to design systems that will generate causal relations that we like and that will rule out causal relations that we do not like. Health care systems are designed to prevent nurses and doctors making errors; cars are designed so that drivers cannot start them in reverse; work schedules for pilots are designed so they do not fly too many consecutive hours without rest because alertness and performance are compromised.
As in the Rube Goldberg machine and in the design of cars and work schedules, the economic structure and equilibrium may differ in ways that support different kinds of causal relations and thus render a trial in one setting useless in another. For example, a trial that relies on providing incentives for personal promotion is of no use in a state in which a political system locks people into their social and economic positions. Cash transfers that are conditional on parents taking their children to clinics cannot improve child health in the absence of functioning clinics. Policies targeted at men may not work for women. We use a lever to toast our bread, but levers only operate to toast bread in a toaster; we cannot brown toast by pressing an accelerator, even if the principle of the lever is the same in both a toaster and a car. If we misunderstand the setting, if we do not understand why the treatment in our RCT works, we run the same risks as Russell’s chicken.
2.3 When RCTs speak for themselves: no transportability required
For some things we want to learn, an RCT is enough by itself. An RCT may provide a counterexample to a general theoretical proposition, either to the proposition itself (a simple refutation test) or to some consequence of it (a complex refutation test). An RCT may also confirm a prediction of a theory, and although this does not confirm the theory, it is evidence in its favor, especially if the prediction seems inherently unlikely in advance. This is all familiar territory, and there is nothing unique about an RCT; it is simply one among many possible testing procedures. Even when there is no theory, or very weak theory, an RCT, by demonstrating causality in some population, can be thought of as proof of concept, that the treatment is capable of working somewhere. This is one of the arguments for the importance of internal validity.
Nor is transportation called for when an RCT is used for evaluation, for example to satisfy donors that the project they funded achieved its aims in the population in which it was conducted. Even so, for such evaluations, say by the World Bank, to be global public goods requires arguments and guidelines that justify using the results in some way elsewhere; the global public good is not an automatic by-product of the Bank fulfilling its fiduciary responsibility. When the components of treatments change across studies, evaluations need not lead to cumulative knowledge. Or, as Heckman et al (1999, 1934) note, “the data produced from them [social experiments] are far from ideal for estimating the structural parameters of behavioral models. This makes it difficult to generalize findings across experiments or to use experiments to identify the policy-invariant structural parameters that are required for econometric policy evaluation.”
Of course, when we ask exactly what those invariant structural parameters are, whether they exist, and how they should be modeled, we open up major fault lines in modern applied economics. For example, we do not intend to endorse intertemporal dynamic models of behavior as the only way of recovering the parameters that we need. We also recognize that the usefulness of simple price theory is not as universally accepted as it once was. But the point remains that we need something, some regularity or some invariance, and that something can rarely be recovered by simply generalizing across trials.
A third non-problematic and important use of an RCT is when the parameter of interest is the ATE in a well-defined population from which the trial sample is itself a random sample. In this case the sample average treatment effect (SATE) is an unbiased estimator of the population average treatment effect (PATE) that, by assumption, is our target (see Imbens (2004) for these terms). We refer to this as the ‘public health’ case; like many public health interventions, the target is the average, ‘population health,’ not the health of individuals. One major (and widely recognized) danger of this use of RCTs is that scaling up from (even a random) sample to the population will not go through in any simple way if the outcomes of individuals or groups of individuals change the behavior of others, which will be common in economic examples but perhaps less so in health. There is also an issue of timing if time elapses between the trial and the implementation.
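The unbiasedness claim can be checked by brute force in a toy case. The sketch below enumerates every possible simple random sample of size two from a four-person population and averages the resulting SATEs. The individual treatment effects are hypothetical and are treated as known only for the sake of illustration; in a real trial they are never observed directly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical individual treatment effects for a population of four people.
population_effects = [Fraction(1), Fraction(2), Fraction(3), Fraction(6)]


def mean(values):
    """Exact arithmetic mean of a collection of Fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


# Population average treatment effect (PATE).
pate = mean(population_effects)

# SATE for every possible sample of size 2; under simple random sampling
# each of these samples is equally likely.
sates = [mean(sample) for sample in combinations(population_effects, 2)]

# Expected value of the SATE over the sampling distribution.
expected_sate = mean(sates)
print(pate, expected_sate)
```

Individual samples give SATEs ranging from 3/2 to 9/2, but their average over all equally likely samples equals the PATE of 3. The sketch says nothing about spillovers or timing, which are the dangers the text raises.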
In economics, a ‘public-health-style’ example is the imposition of a commodity tax, where the total tax revenue is of interest and policymakers do not care who pays the tax. Indeed, theory can often identify a specific, well-defined quantity whose measurement is key for a policy (see Deaton and Ng (1998) for an example of what Chetty (2009) calls a “sufficient” statistic). In this case, the behavior of a random sample of individuals might well provide a good guide to the tax revenue that can be expected. Another case comes from work on poverty programs where the sponsors are most concerned about the budget; we discuss these cases at the end of this Section. Even here, it is easy to imagine behavioral effects coming into play that drive a wedge between the trial and its full-scale implementation, for example if compliance is higher when the scheme is widely publicized, or if government agencies implement the scheme differently from trialists.
2.4 Transporting results laterally and globally
The program of RCTs in economics, as in other areas of social science, has the broader goal of finding out ‘what works.’ At its most ambitious, this aims for universal reach, and the development economics literature frequently argues that “credible impact evaluations are global public goods in the sense that they can offer reliable guidance to international organizations, governments, donors, and nongovernmental organizations (NGOs) beyond national borders”, Duflo and Kremer (2008, 93). Sometimes the results of a single RCT are advocated as having wide applicability, with especially strong endorsement when there is at least one replication. For example, Kremer and Holla (2009, 3) use a Kenyan trial as the basis for a blanket statement without specifying context, “Provision of free school uniforms, for example, leads to 10%-15% reductions in teen pregnancy and dropout rates.” Duflo and Kremer (2008, 104), writing about another trial, are more cautious, citing two evaluations and restricting themselves to India: “One can be relatively confident about recommending the scaling-up of this program, at least in India, on the basis of these estimates, since the program was continued for a period of time, was evaluated in two different contexts, and has shown its ability to be rolled out on a large scale.” Even a number of replications do not provide a sound basis for inference. Without theory to support the projection of results, this is just induction by simple enumeration: swan 1 is white, swan 2 is white, ..., so all swans are white.
The problem of generalization extends beyond RCTs, both to ‘fully controlled’ laboratory experiments and to most non-experimental findings. Our argument here is that evidence from RCTs is not automatically simply generalizable, and that its superior internal validity, if and when it exists, does not provide it with any unique invariance across context. That transportation is far from automatic also tells us why (even ideal) RCTs of similar interventions give different answers in different settings. Such differences do not necessarily reflect methodological failings and will hold across perfectly executed RCTs just as they do across observational studies.
Many advocates of RCTs understand that ‘what works’ needs to be qualified to ‘what works under which circumstances’ and try to say something about what those circumstances might be, for example, by replicating RCTs in different places and thinking intelligently about the differences in outcomes when they find them. Sometimes this is done in a systematic way, for example by having multiple treatments within the same trial so that it is possible to estimate a ‘response surface’ that links outcomes to various combinations of treatments (see Greenberg and Schroder (2004) or Shadish et al (2002)). For example, the RAND health experiment had multiple treatments, allowing investigation of how much health insurance increased expenditures under different circumstances. Some of the negative income tax experiments (NITs) in the 1960s and 1970s were designed to estimate response surfaces, with the number of treatments and controls in each arm optimized to maximize precision of estimated response functions subject to an overall cost limit (see Conlisk (1973)). Experiments on time-of-day pricing for electricity had a similar structure (see Aigner (1985)).
The experiments by MDRC (originally known as the Manpower Development Research Corporation) have also been analyzed across cities in an effort to link city features to the results of the RCTs within them (see Bloom, Hill, and Riccio (2005)). Unlike the RAND and NIT examples, these are ex post analyses of completed trials; the same is true of Vivalt (2015), who finds, for the collection of trials she studied, that development-related RCTs run by government agencies typically find smaller (standardized) effect sizes than RCTs run by academics or by NGOs. Bold et al (2013), who ran parallel RCTs on an intervention implemented either by an NGO or by the government of Kenya, found similar results there. Note that these analyses have a different purpose from meta-analyses that assume that different trials estimate the same parameter up to noise and average in order to increase precision.
Although there are issues with all of these methods of investigating differences across trials, without some discipline it is too easy to come up with ‘just-so’ or fairy stories that account for differences. We risk a procedure that, if a result is replicated in full or in part in at least two places, puts that treatment into the ‘it works’ box and, if the result does not replicate, casually interprets the difference in a way that allows at least some of the findings to survive.
How can we do better than simple generalization and simple extrapolation? Many writers emphasize the role of theory in transporting and using the results of trials, and we shall discuss this in the next subsection. But statistical approaches are also widely used; these are designed to deal with the possibility that treatment effects vary systematically with other variables. Referring back to (4), it is clear that, supposing the same form of (4) obtains, if the distribution of the w values is the same in the new circumstances as in the old, the ATE in the original trial will hold in the new circumstances. In general, of course, this condition will not hold, nor do we have any obvious way of checking it unless we know what the support factors are in both places. One procedure to deal with interactions is post-experimental stratification, which parallels post-survey stratification in sample surveys. The trial is broken up into subgroups that have the same combination of known, observable w’s (age, race, gender for example), then the ATEs within each of the subgroups are calculated, and then they are reassembled according to the configuration of w’s in the new context. This can be used to estimate the ATE in a new context, or to correct estimates to the parent population when the trial sample is not a random sample of the parent. Other methods can be used when there are too many w’s for stratification, for example by estimating the probability that each observation in the population is included in the trial sample as a function of the w’s, then weighting each observation by the inverse of these propensity scores. A good reference for these methods is Stuart et al (2011), or in economics, Angrist (2004) and Hotz, Imbens, and Mortimer (2005).
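The post-experimental stratification procedure described above can be sketched in a few lines of Python. The strata labels, subgroup ATEs, and population shares are all hypothetical, and the function name `transported_ate` is an illustrative choice rather than anything from the literature cited. The sketch also assumes that the strata capture every relevant support factor and that the same form of (4) holds in both settings.

```python
import math
from typing import Dict


def transported_ate(stratum_ates: Dict[str, float],
                    target_shares: Dict[str, float]) -> float:
    """Reassemble subgroup ATEs using the target population's stratum shares."""
    missing = set(target_shares) - set(stratum_ates)
    if missing:
        # The trial tells us nothing about strata it did not contain.
        raise ValueError(f"no trial estimate for strata: {sorted(missing)}")
    if not math.isclose(sum(target_shares.values()), 1.0):
        raise ValueError("target shares must sum to 1")
    return sum(share * stratum_ates[s] for s, share in target_shares.items())


# Hypothetical ATEs estimated within each subgroup of the trial.
stratum_ates = {"young": 4.0, "old": 1.0}

# Hypothetical composition of the trial sample and of the new context.
trial_shares = {"young": 0.5, "old": 0.5}
target_shares = {"young": 0.25, "old": 0.75}

ate_in_trial = transported_ate(stratum_ates, trial_shares)
ate_in_target = transported_ate(stratum_ates, target_shares)
print(ate_in_trial, ate_in_target)
```

With these made-up numbers the trial ATE is 2.5 and the reweighted ATE for the target is 1.75. The explicit error for a stratum absent from the trial reflects the requirement that the weighting variables be present in both contexts.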
These methods are often not applicable, however. First, reweighting works only when the observable factors used for reweighting include all (and only) genuine interactive causes. Second, as with any form of reweighting, the variables used to construct the weights must be present in both the original and new context. For example, if we are to carry a result forward in time, we may not be able to extrapolate from a period of low inflation to a period of high inflation. As Hotz et al (2005) note, it will typically be necessary to rule out such ‘macro’ effects, whether over time, or over locations. Third, it also depends on assuming that the same governing equation (4) covers the trial and the target population.
Pearl and Bareinboim (2011, 2014) and Bareinboim and Pearl (2013, 2014) provide strategies for inferring information about new populations from trial results that are more general than reweighting. They suppose we have available both causal information and probabilistic information for population A (e.g. the experimental one), while for population B (the target) we have only (some) probabilistic information, and also that we know that certain probabilistic and causal facts are shared between the two and certain ones are not. They offer theorems describing what causal conclusions about population B are thereby fixed. Their work underlines the fact that exactly what conclusions about one population can be supported by information about another depends on exactly what causal and probabilistic facts they have in common. But as Muller (2015) notes, this, like the problem with simple reweighting, takes us back to the situation that RCTs are designed to avoid, where we need to start from a complete and correct specification of the causal structure. RCTs can avoid this in estimation (which is one of their strengths, supporting their credibility) but the benefit vanishes as soon as we try to carry their results to a new context.
This discussion leads to a number of points. First, we cannot get to general claims by simple generalization; there is no warrant for the convenient assumption that the ATE estimated in a specific RCT is an invariant parameter, nor that the kinds of interventions and outcomes we measure in typical RCTs participate in general causal relations. While it is true that general causal claims exist (that gravitational masses attract each other, or that people respond to incentives), these use relatively abstract concepts and operate at a much higher level than the claims that can be reasonably inferred from a typical RCT.
Second, thoughtful pre-experimental stratification in RCTs is likely to be valuable, or failing that, subgroup analysis, because it can provide information that may be useful for generalization or transportation. For example, Kremer and Holla (2009) note that, in their trials, school attendance is surprisingly sensitive to small subsidies, which they suggest is because there are a large number of students and parents who are on the (financial) margin between attending and not attending school; if this is indeed the mechanism for their results, a good variable for stratification would be distance from the relevant cutoff. We also need to know that this same mechanism works in any new target setting.
Third, we need to be explicit about causal structure, even if that means more model building and more (or different) assumptions than advocates of RCTs are often comfortable with. To be clear, modeling causal structure does not commit us to the elaborate and often incredible assumptions that characterize some structural modeling in economics, but there is no escape from thinking about the way things work; the why as well as the what.
Fourth, we will typically need to know more than the results of the RCT itself, for example about differences in social, economic, and cultural structures and about the joint distributions of causal variables, knowledge that will often only be available through observational studies. We will also need external information, both theoretical and empirical, to settle on an informative characterization of the population enrolled in the RCT because how that population is described is commonly taken to be some indication of which other populations the results are likely to be exportable to. Many medical and psychological journals are explicit about this. For instance, the rules for submission recommended by the International Committee of Medical Journal Editors, ICMJE (2015, 14) insist that article abstracts “Clearly describe the selection of observational or experimental participants (healthy individuals or patients, including controls), including eligibility and exclusion criteria and a description of the source population.” An RCT is conducted on a specific trial sample, somehow drawn from a population of specific individuals. The results obtained are features of that sample, of those very individuals at that very time, not any other population with any different individuals that might, for example, satisfy one of the infinite set of descriptions that the trial sample satisfies.
This same issue is confronted already in study design. Apart from special cases, like post hoc evaluation for payment-for-results, we are not especially concerned to learn about the very individuals enrolled in the trial. Most experiments are, and should be, conducted with an eye to what the results can help us learn about other populations. This cannot be done without substantial assumptions about what might and what might not be relevant to the production of the outcome studied. (For example, the ICMJE guidelines (2015, 14) go on to say: “Because the relevance of such variables as age, sex, or ethnicity is not always known at the time of study design, researchers should aim for inclusion of representative populations into all study types and at a minimum provide descriptive data for these and other relevant demographic variables.”) So both intelligent study design and responsible reporting of study results involve substantial background assumptions.
Of course, this is true for all studies. But RCTs require special conditions if they are to be conducted at all and especially if they are to be conducted successfully (for example, local agreements, compliant subjects, affordable administrators, multiple blinding, people competent to measure and record outcomes reliably, a setting where random allocation is morally and politically acceptable, etc.), whereas observational data are often more readily and widely available. In the case of RCTs, there is danger that these kinds of considerations have too much effect. This is especially worrisome where the features that the trial sample should have are not justified, made explicit, or subjected to serious critical review.
The need for observational knowledge is one of many reasons why it is counter-productive to insist that RCTs are the gold standard or that some categories of evidence should be prioritized over others; these strategies leave us helpless in using RCTs beyond their original context. The results of RCTs must be integrated with other knowledge, including the practical wisdom of policymakers, if they are to be useable outside the context in which they were constructed.
Contrary to much practice in medicine as well as in economics, conflicts between RCTs and observational results need to be explained, for example by reference to the different populations in each, a process that will sometimes yield important evidence, including on the range of applicability of the RCT results themselves. While the validity of the RCT will sometimes provide an understanding of why the observational study found a different answer, there is no basis (or excuse) for the common practice of dismissing the observational study simply because it was not an RCT and therefore must be invalid. It is a basic tenet of scientific advance that, as collective knowledge advances, new findings must be able to explain and be integrated with previous results, even results that are now thought to be invalid; methodological prejudice is not an explanation.
2.5 Using theory for generalization
Economists have been combining theory and randomized controlled trials since the early experiments. Orcutt and Orcutt (1968) laid out the inspiration for the income tax trials using a simple static theory of labor supply. According to this, people choose how to divide their time between work and leisure in an environment in which they receive a minimum G if they do not work, and where they receive an additional amount (1 - t)w for each hour they work, where w is the wage rate, and t is a tax rate. The trials assigned different combinations of G and t to different trial groups, so that the results traced out the labor supply function, allowing estimation of the parameters of preferences, which could then be used in a wide range of policy calculations, for example to raise revenue at minimum utility loss to workers.
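The budget constraint in this static model can be written as a one-line function. The sketch below follows only the simple description in the text (a guarantee G plus (1 - t)w per hour worked); the function name `net_income`, the wage, and the two (G, t) arms are hypothetical, and real negative income tax schedules had further features that are not modeled here.

```python
def net_income(hours: float, wage: float, guarantee: float, tax_rate: float) -> float:
    """Income under the scheme: G if no work, plus (1 - t) * w for each hour worked."""
    return guarantee + (1.0 - tax_rate) * wage * hours


# Hypothetical trial arms, each a (G, t) combination assigned to a different group.
arms = [(100.0, 0.5), (200.0, 0.75)]
wage = 10.0  # hypothetical hourly wage, held fixed across arms

for guarantee, tax_rate in arms:
    # Income at a few hours choices traces out the budget line faced in this arm.
    incomes = [net_income(h, wage, guarantee, tax_rate) for h in (0, 20, 40)]
    print(guarantee, tax_rate, incomes)
```

Each arm presents a different intercept and slope, so observing hours worked across arms is what lets the trial trace out a labor supply response rather than a single treatment effect.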
Following these early trials, there has been a continuing tradition of using trial results, together with the baseline data collected for the trial, to fit structural models that are to be used more generally. Early examples include Moffitt (1979) on labor supply and Wise (1985) on housing; a more recent example is Heckman, Pinto and Savelyev (2013) for the Perry pre-school program. Development economics examples include Attanasio, Meghir, and Santiago (2012), Attanasio et al (2015), Todd and Wolpin (2006), Wolpin (2013), and Duflo, Hanna, and Ryan (2012). These structural models sometimes require formidable auxiliary assumptions on functional forms or the distributions of unobservables, but they have compensating advantages, including the ability to integrate theory and evidence, to make out-of-sample predictions, and to analyze welfare, and the use of RCT evidence allows the relaxation of at least some of the assumptions that are needed for identification. In this way, the structural models borrow credibility from the RCTs and in return help set the RCT results within a coherent framework. Without some such interpretation, the welfare implications of RCT results can be problematic; knowing how people in general (let alone just people in the trial population) respond to some policy is rarely enough to tell whether or not they are made better off, Harrison (2014a, b). Traditional welfare economics draws a link from preferences to behavior, a link that is respected in structural work but often lost in the ‘what works’ literature, and without which we have no basis for inferring welfare from behavior. What works is not equivalent to what should be.
Light-touch theory can do much to interpret, to extend, and to use RCT results. In both the RAND Health Experiment and negative income tax experiments, an immediate issue concerned the difference between short-run and long-run responses; indeed, differences between immediate and ultimate effects occur in a wide range of RCTs. Both health and tax RCTs aimed to discover what would happen if consumers/workers were permanently faced with higher or lower prices/wages, but the trials could only run for a limited period. A temporarily high tax rate on earnings is effectively a ‘fire sale’ on leisure, so that the experiment provided an opportunity to take a vacation and make up the earnings later, an incentive that would be absent in a permanent scheme. How do we get from the short-run responses that come from the trial to the long-run responses that we want to know? Metcalf (1973) and Ashenfelter (1978) provided answers for the income tax experiments, as did Arrow (1975) for the RAND Health Experiment.
Arrow’s analysis illustrates how to use both structure and observational data to transport and adapt results from one setting to another. He models the health experiment as a two-period model in which the price of medical care is lowered in the first period only, and shows how to derive what we want, which is the response in the first period if prices were lowered by the same proportion in both periods. The magnitude that we want is S, the compensated price derivative of medical care in period 1 in the face of identical increases in $p_1$ and $p_2$, the prices in periods 1 and 2. This is equal to $s_{11} + s_{12}$, the sum of the derivatives of period 1’s demand with respect to the two prices. The trial gives only $s_{11}$. But if we have post-trial data on medical services for both treatments and controls, we can infer $s_{21}$, the effect of the experimental price manipulation on post-experimental care. Choice theory, in the form of Slutsky symmetry, says that $s_{12} = s_{21}$ and so allows Arrow to infer $s_{12}$ and thus S. He contrasts this with Metcalf’s alternative solution, which makes a different assumption, namely that two-period preferences are intertemporally additive, in which case the long-run elasticity can be obtained from knowledge of the income elasticity of post-experimental medical care, which would have to come from an observational analysis. These two alternative approaches show how we can choose, based on our willingness to make assumptions and on the data that we have, a suitable combination of (elementary and transparent) theoretical assumptions and observational data in order to adapt and use trial results. Such analysis can also help design the original trial by clarifying what we need to know in order to use the results of a temporary treatment to estimate the permanent effects that we need. Ashenfelter provides a third solution, noting that the two-period model is formally identical to a two-person model, so that we can use information on two-person labor supply to tell us about the dynamics.
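Arrow’s argument, as summarized above, can be laid out as a short chain of equalities. This is only a restatement of the steps in the text using the same symbols; the annotations on the right are added for illustration.

```latex
\begin{align*}
S &= s_{11} + s_{12}
  && \text{(long-run response: both prices change together)} \\
  &= s_{11} + s_{21}
  && \text{(Slutsky symmetry: } s_{12} = s_{21}\text{)}
\end{align*}
% s_{11}: identified by the trial (own-period price response).
% s_{21}: inferred from post-trial data on treatments and controls.
```

Both terms on the last line are measurable, the first from the experiment and the second from follow-up observation, which is why the symmetry assumption is enough to recover S.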
Theorycanoftenallowustoreclassifyneworunknownsituationsasanalo-
goustosituationswherewealreadyhavebackgroundknowledge.Onefrequently
usefulwayofdoingthisiswhenthenewpolicycanberecastasequivalenttoa
changeinthebudgetconstraintthatrespondentsface.Theconsequencesofanew
policymaybeeasiertopredictifwecanreduceittoequivalentchangesinincome
andprices,whoseeffectsareoftenwellunderstoodandwell-studied.Toddand
Wolpin(2008)andWolpin(2013)makethispointandprovideexamples.Inthela-
borsupplycase,anincreaseinthetaxratehasthesameeffectasadecreaseinthe
wagerate,sothatwecanrelyonpreviousliteraturetopredictwhatwillhappen
whentaxratesarechanged.InthecaseofMexico’sPROGRESAconditionalcash
transfer program, Todd and Wolpin note that the subsidies paid to parents if their children go to school can be thought of as a combination of reduction in children’s wages and an increase in parents’ income, which allows them to predict the results of the conditional cash experiment with limited additional assumptions. If this works, as it partially does in their analysis, the trial helps consolidate previous knowledge and contributes to an evolving body of theory and empirical, including trial, evidence.
The program of thinking about policy changes as equivalent to price and income changes has a long history in economics; much of rational choice theory can be so interpreted (see Deaton and Muellbauer (1980) for many examples). When this conversion is credible, and when a trial on some apparently unrelated topic can be modeled as equivalent to a change in prices and incomes, and when we can assume that people in different settings respond relevantly similarly to changes in prices and incomes, we have a ready-made framework for incorporating the trial results into previous knowledge, as well as for extending the trial results and using them elsewhere. Of course, all depends on the validity and credibility of the theory; people may not in fact treat a tax increase as a decrease in the price of leisure, and behavioral economics is full of examples where apparently equivalent stimuli generate non-equivalent outcomes. The embrace of behavioral economics by many of the current generation of trialists may account for their limited willingness to use conventional choice theory in this way. Unfortunately, behavioral economics does not yet offer a replacement for the general framework of choice theory that is so useful in this regard.
Theory can also help with the problem we raised of delineating the population to which the trial results immediately apply and for thinking about moving from this population to populations of interest. Ashenfelter’s (1978) analysis is again a good illustration and predates much similar work in later literature. The income tax experiments offered participation in the trial to a random sample of the population of interest. Because there was no blinding and no compulsion, people who were randomized into the treatment group were free to choose to refuse treatment. As in many subsequent analyses, Ashenfelter supposes that people choose to
participate if it is in their interest to do so, depending on what has become known in the RCT and Instrumental Variables literature as their own idiosyncratic `gain.’ The simple labor supply model gives an approximate condition: If the treatment increases the tax rate from t0 to t1 with an offsetting increase in G, then an individual assigned to the experimental group will decline to participate if
$$(t_1 - t_0)\, w_0 h_0 + \tfrac{1}{2}\, s_{00}\, (t_1 - t_0)^2 > G_1 - G_0 \qquad (5)$$
where subscript 1 refers to the treatment situation, 0 to the control, h0 is hours worked, and s00 is the (negative) utility-constant response of hours worked to the tax rate. If there is no substitution, the second term on the left-hand side is zero, and people will accept treatment if the increase in G more than makes up for the increases in taxes payable, the `breakeven’ condition. In consequence, those with higher earnings are less likely to accept treatment. Some better-off people with high substitution effects will also accept treatment if the opportunity to buy more cheap leisure is sufficient enticement.
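
Condition (5) can be explored numerically. The Python sketch below is illustrative only: it assumes that the substitution term is quadratic in the tax change (as in a second-order approximation), and the function name, the scaling of s00, and all the numbers are hypothetical rather than taken from Ashenfelter or from the income tax experiments.

```python
def declines_treatment(t0, t1, earnings0, s00, g0, g1):
    """Return True if condition (5) holds, so that the person declines treatment.

    earnings0 stands for pre-treatment earnings w0 * h0. s00 is the (negative)
    utility-constant response of hours to the tax rate, scaled here so that
    0.5 * s00 * (t1 - t0) ** 2 is in the same money units as earnings0.
    That scaling is an assumption of this sketch.
    """
    dt = t1 - t0
    cost_of_treatment = dt * earnings0 + 0.5 * s00 * dt ** 2
    return cost_of_treatment > (g1 - g0)


# Hypothetical trial: the tax rate rises from 0.2 to 0.5 and G rises by 4,000.
low_earner = declines_treatment(0.2, 0.5, 10_000, 0.0, 0.0, 4_000)
high_earner = declines_treatment(0.2, 0.5, 20_000, 0.0, 0.0, 4_000)
high_earner_elastic = declines_treatment(0.2, 0.5, 20_000, -50_000.0, 0.0, 4_000)

print(low_earner, high_earner, high_earner_elastic)
```

With these made-up numbers the low earner accepts treatment, the high earner with no substitution response declines, and the high earner with a strong substitution response accepts, which is the pattern of selective acceptance described in the text.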
The selective acceptance of treatment limits the analyst’s ability to learn about the better-off or low-substitution people who decline treatment but who would have to accept it if the policy were implemented. Both the intention-to-treat estimator and the `as treated’ estimator that compares the treated and the untreated are affected, not just by the labor supply effects that the trial is designed to induce, but by the kind of selection effects that randomization is designed to eliminate. Of course, the analysis that leads to (5) can perhaps help us say something about this and help us adjust the trial estimates back to what we would like to know. Yet this is no easy matter because selection depends, not only on observables, such as pre-experimental earnings and hours worked, but on (much harder to observe) labor supply responses that likely vary across individuals. Paraphrasing Ashenfelter, we cannot estimate the effects of a permanent compulsory negative income tax program from a transitory voluntary trial without strong assumptions or additional evidence.
Much of the modern literature, for example on training programs, wrestles with the issue of exactly who is represented by the RCT results, including not only
who participates in the first place but who leaves before the trial is completed (see again Heckman, Lalonde and Smith (1999)). As in the examples above, modeling attrition within a trial can yield estimates of behavioral responses that can be used to transport the findings to other settings (see Chan and Hamilton (2006), Chassang, Padró i Miguel, and Snowberg (2012) and Chassang et al (2015)). When people are allowed to reject their randomly assigned treatment according to their own (real or perceived) advantage, or to drop out of a trial on an estimate of the benefits and costs from doing so, we have come a long way away from the random allocation in the standard conception of a randomized controlled trial. Moreover, the absence of blinding is common in social and economic RCTs, and while there are trials, such as welfare trials, that effectively compel people to accept their assignments, and some where the treatment is generous enough to do so, there are trials where subjects have much freedom and, in those cases, it is less than obvious to us what role, if any, randomization plays in warranting the results.
2.6 Scaling up: using the average for populations
Many RCTs are small-scale and local, for example in a few schools, clinics, or farms in a particular geographic, cultural, socio-economic setting. If an intervention is successful according to, for example, a cost-effectiveness criterion, it is a candidate for scaling-up, applying the same intervention for a much larger area, often a whole country, or sometimes even beyond, as when some treatment is considered for all relevant World Bank projects. The fact that the intervention might work differently at scale has long been noted in the economics literature, e.g. Garfinkel and Manski (1992), Heckman (1992), and Moffitt (1992), and is recognized in the recent review by Banerjee and Duflo (2009). We want here to emphasize the pervasiveness of such effects as well as to note again that this should not be taken as an argument against using RCTs but only against the idea that effects at scale are likely to be the same as in the trial.
An example of what are often called `general equilibrium effects’ comes from agriculture. Suppose an RCT demonstrates that in the study population a new way of using fertilizer had a substantial positive effect on, say, cocoa yields, so that farmers who used the new methods saw increases in production and in incomes compared to those in the control group. If the procedure is scaled up to the whole country, or
to all cocoa farmers worldwide, the price will drop, and if the demand for cocoa is price inelastic (as is usually thought to be the case, at least in the short run) cocoa farmers’ incomes will fall. Indeed, the conventional wisdom for many crops is that farmers do best when the harvest is small, not large. Of course, these considerations might not be decisive in deciding whether or not to promote the innovation, and there may still be long-term gains if, for example, some farmers find something better to do than growing cocoa. But, in this case, the scaled-up effect is opposite in sign to the trial effect. The problem is not with the trial results, which can be usefully incorporated into a more comprehensive market model that incorporates the responses estimated by the trial. The problem arises only if we assume that the aggregate looks like the individual. That other ingredients of the aggregate model must come from observational studies should not be a criticism, even for those who favor RCTs; it is simply the price of doing serious analysis.
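
The sign reversal in the cocoa example can be illustrated with a deliberately simple market model. The Python sketch below is not a model from the literature cited here: it assumes constant-elasticity market demand, identical farmers, and unchanged costs, and the function name and parameter values are hypothetical.

```python
def adopter_income_ratio(yield_gain, demand_elasticity, share_adopting):
    """Adopting farmer's revenue relative to the pre-intervention baseline.

    Assumes constant-elasticity market demand, identical farmers, and no
    change in costs. demand_elasticity is the absolute value of the price
    elasticity of demand; share_adopting is the share of all output
    produced by farmers who adopt the new method.
    """
    total_output = 1.0 + share_adopting * yield_gain    # baseline output = 1
    price = total_output ** (-1.0 / demand_elasticity)  # baseline price = 1
    return (1.0 + yield_gain) * price


# A trial too small to move the market price, then full scale-up with
# inelastic demand, then full scale-up with elastic demand.
in_trial = adopter_income_ratio(0.20, 0.5, share_adopting=0.0)
at_scale = adopter_income_ratio(0.20, 0.5, share_adopting=1.0)
at_scale_elastic = adopter_income_ratio(0.20, 2.0, share_adopting=1.0)

print(round(in_trial, 3), round(at_scale, 3), round(at_scale_elastic, 3))
```

Under these assumptions a 20 percent yield gain raises the adopter's revenue by 20 percent in the trial, but lowers it by about 17 percent when everyone adopts and the demand elasticity is 0.5; with elastic demand the scaled-up effect would remain positive, though smaller than in the trial.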
There are many possible interventions that alter supply or demand whose effect, in aggregate, will change a price or a wage that is held constant in the original RCT. Education will change the supplies of skilled versus unskilled labor, with implications for relative wage rates. Conditional cash transfers increase the demand for (and perhaps supply of) schools and clinics, which will change prices or waiting lines, or both. There are interactions between people that will operate only at scale. Giving one child a voucher to go to private school might improve her future, but doing so for everyone can decrease the quality of education for those children who are left in the public schools (see the contrasting studies of Angrist et al (2002) and Hsieh and Urquiola (2002)). Educational or training programs may benefit those who are treated but harm those left behind; Crépon et al (2014) recognize the issue and show how to adapt an RCT to deal with it.
Scaling up can also disturb the political equilibrium. An exploitative government may not allow the mass transfer of money from abroad to a powerless segment of the population, though it may permit a small-scale RCT of cash transfers, perhaps even in the hope that a large-scale implementation will yield opportunities for predation. Provision of healthcare by foreign NGOs may be successful in trials, but have unintended negative consequences at scale because of general equilibrium
effects on the supply of healthcare personnel, or because it disturbs the nature of the contract between the people and a government that is using tax revenue to provide services. In India, the government spends large sums on food subsidies through a system (the PDS) that is both corrupt and inefficient, with much of the grain that is procured failing to find its way to the intended beneficiaries. Localized RCTs on whether or not families are better off with cash transfers are not informative about how politicians would change the amount of the transfer if faced with unanticipated inflation, and at least as important, whether the government could cut procurement from relatively wealthy and politically powerful farmers. Without a political and general equilibrium analysis, it is impossible to think about the effects of replacing food subsidies with cash transfers (see e.g. Basu (2010)).
Even in medicine, where biological interactions between people are less common than are social interactions in social science, interactions can be important. Infectious diseases are one well-known example, where immunization programs affect the dynamics of disease transmission through herd immunity (see Fine and Clarkson (1986) and Manski (2013, 52)). The social and economic setting also affects how drugs are actually used and the same issues can arise; the distinction between efficacy and effectiveness in clinical trials is in part recognition of the fact.
2.7 Drilling down: using the average for individuals
Just as there are issues with scaling-up, it is not obvious how to use the results from RCTs at the level of individual units, even individual units that were included in the trial. A well-conducted RCT delivers an ATE for the trial population but, in general, that average does not apply to everyone. It is not true, for example, as argued in the American Medical Association’s “Users’ guide to the medical literature” that “if the patient would have been enrolled in the study had she been there, that is she meets all of the inclusion criteria and doesn’t violate any of the exclusion criteria, there is little question that the results are applicable” (see Guyatt et al (1994, 60)). Even more misleading are the often-heard statements that an RCT with an average treatment effect insignificantly different from zero has shown that the treatment works for no one.
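
A toy calculation shows why a zero average says little about individuals. The Python sketch below uses hypothetical individual treatment effects, which a real trial never observes directly; it is meant only to show that an ATE of zero is compatible with the treatment helping half of the population.

```python
from statistics import mean

# Hypothetical individual treatment effects for a trial population of 100:
# the treatment helps half of the units and harms the other half.
individual_effects = [2.0] * 50 + [-2.0] * 50

# The average treatment effect is the mean of the individual effects.
ate = mean(individual_effects)

# Share of units for whom the treatment has a positive effect.
share_helped = sum(1 for e in individual_effects if e > 0) / len(individual_effects)

print(ate, share_helped)
```

Here the ATE is exactly zero although the treatment works for half of the units, so a null average cannot by itself establish that the treatment works for no one.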
These issues are familiar to physicians practicing evidence-based medicine whose guidelines require “integrating individual clinical expertise with the best available external clinical evidence from systematic research,” Sackett et al (1996, 71). Exactly what this means is unclear; physicians know much more about their patients than is allowed for in the ATE from the RCT (though, once again, stratification in the trial is likely to be helpful) and they often have intuitive expertise from long practice that can help them identify features in a particular patient that may influence the effectiveness of a given treatment for that patient. But there is an odd balance struck here. These judgments are deemed admissible in discussion with the individual patient, but they don’t add up to evidence to be made publicly available, with the usual cautions about credibility, by the standards adopted by most EBM sites. It is also true that physicians can have prejudices and `knowledge’ that might be anything but. Clearly, there are situations where forcing practitioners to follow the average will do better, even for individual patients, and others where the opposite is true, Kahneman and Klein (2009).
Whether or not averages are useful to individuals raises the same issue throughout social science research. Imagine two schools, St Joseph’s and St Mary’s, both of which were included in an RCT of a classroom innovation. The innovation is successful on average, but should the schools adopt it? Should St Mary’s be influenced by a previous attempt in St Joseph’s that was judged a failure? Many would dismiss this experience as anecdotal and ask how St Joseph’s could have known that it was a failure without benefit of `rigorous’ evidence. Yet if St Mary’s is like St Joseph’s, with a similar mix of pupils, a similar curriculum, and similar academic standing, might not St Joseph’s experience be more relevant to what might happen at St Mary’s than is the positive average from the RCT? And might it not be a good idea for the teachers and governors of St Mary’s to go to St Joseph’s and find out what happened and why? They may be able to observe the mechanism of the failure, if such it was, and figure out whether the same problems would apply for them, or whether they might be able to adapt the innovation to make it work for them, perhaps even more successfully than the positive average in the trial.
Once again, these questions are unlikely to be easily answered in practice; but, as with transportability, there is no serious alternative to trying. Assuming that the average works for you will often be wrong, and it will at least sometimes be possible to do better. As in the medical case, the advice to individual schools often lacks specificity. For example, the U.S. Institute of Education Sciences has provided a “user-friendly” guide to practices supported by rigorous evidence, US Department of Education (2003). The advice, which is similar to recommendations in development economics, is that the intervention be demonstrated effective through well-designed RCTs in more than one site and that “the trials should demonstrate the intervention’s effectiveness in school settings similar to yours” (2003, 17). No operational definition of “similar” is provided.
2.8 Examples and illustrations from economics
Our arguments in this Section should not be controversial, yet we believe that they represent an approach that is different from most current practice. To document this and to fill out the arguments, we provide some examples. While these are occasionally critical, our purpose is constructive; indeed, we believe that misunderstandings about how to use RCTs have artificially limited their usefulness, as well as alienated some who would otherwise use them.
Conditional cash transfers (CCTs) are interventions that have been tested using RCTs (and other RCT-like methods) and are often cited as a leading example of how an evaluation with strong internal validity leads to a rapid spread of the policy, e.g. Angrist and Pischke (2010) among many others. Think through the causal chain that is required for CCTs to be successful: People must like money, they must like (or not object too much to) their children being educated and vaccinated, there must exist schools and clinics that are close enough and well enough staffed to do their job, and the government or agency that is running the scheme must care about the wellbeing of families and their children. That such conditions hold in a wide range of (although certainly not all) countries makes it unsurprising that CCTs `work’ in many replications, though they certainly will not work in places where the schools and clinics do not exist, e.g. Levy (2001), nor in places where people strongly oppose education or vaccination.
Similarly, given that the support factors will operate with different strengths and effectiveness in different places, it is also not surprising that the size of the ATE differs from place to place; for example, Vivalt’s AidGrade website lists 29 estimates from a range of countries of the standardized (divided by local standard deviation of the outcome) effects of CCTs on school attendance; all but four show the expected positive effect, and the range runs from –8 to +38 percentage points, Vivalt (2015). Even in this leading case, where we might reasonably conclude that CCTs `work’ in getting children into school, it would be hard to calculate credible cost-effectiveness numbers or to come to a general conclusion about whether CCTs are more or less cost effective than other possible policies. Both costs and effect sizes can be expected to differ in new settings, just as they have in observed ones, making these predictions difficult.
The range of estimates illustrates that the simple view of external validity, that the ATE transports from one place to another, is not reasonable. AidGrade uses standardized measures of effect size divided by standard deviation of outcome at baseline, as does the major multi-country study by Banerjee et al (2015). But we might prefer measures that have an economic interpretation, such as additional months of schooling per $100 spent (for example if a donor is trying to decide where to spend, see below). Nutrition might be measured by height, or by the log of height. Even if the ATE by one measure carries across, it will only do so using another measure if the relationship between the two measures is the same in both situations. This is exactly the sort of thing that a formal analysis of transportability forces us to think about. (Note also that the ATE in the original RCT can differ depending on whether the outcome is measured in levels or in logs; it is easy to construct examples where the two ATEs have different signs.)
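
One such example, with hypothetical potential outcomes for just two units, can be written down directly. The Python sketch below is a constructed illustration and does not use data from any of the studies discussed.

```python
import math
from statistics import mean

# Hypothetical potential outcomes (untreated, treated) for two units:
# a low-outcome unit that gains a lot proportionally, and a high-outcome
# unit that loses a little proportionally but more in absolute terms.
potential_outcomes = [(1.0, 4.0), (100.0, 90.0)]

# ATE with the outcome measured in levels.
ate_levels = mean([y1 - y0 for y0, y1 in potential_outcomes])

# ATE with the outcome measured in logs.
ate_logs = mean([math.log(y1) - math.log(y0) for y0, y1 in potential_outcomes])

print(ate_levels, round(ate_logs, 3))
```

In this constructed case the ATE in levels is negative while the ATE in logs is positive, because the log transformation weights proportional rather than absolute changes.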
Much of the economics literature, like the medical literature, works with the view of external validity that, unless there is evidence to the contrary, the direction and size of treatment effects can be transported from one place to another. The J-PAL website reports its findings under a general heading of policy relevance, subdivided by a selection of topics. Under each topic, there is a list of relevant RCTs from a range of different settings around the world. These are conveniently converted
into a common cost-effectiveness measure so that, for example, under “education”, subhead “student participation”, there are four studies from Africa: on informing parents about the returns to education in Madagascar, and on deworming, on school uniforms, and on merit scholarships, the latter three all from Kenya. The units of measurement are additional years of student education per $100, and among these four studies, the average effects of spending $100 are 20.7 years, 13.9 years, 0.71 years and 0.27 years respectively. (Note that this is a different, and much superior, standardization from the effect size standardization discussed below.)
What can we conclude from such comparisons? For a philanthropic donor interested in education, and if marginal and average effects are the same, they might indicate that the best place to devote a marginal dollar is in Madagascar, where it would be used to inform parents about the value of education. This is certainly useful, but it is not as useful as statements that information or deworming programs are everywhere more cost-effective than programs involving school uniforms or scholarships, or if not everywhere, at least over some domain, and it is these second kinds of comparison that would genuinely fulfill the promise of `finding out what works.’ But such comparisons only make sense if we can transport the results from one place to another, if the Kenyan results also hold in Madagascar, Mali, or Namibia, or some other list of places. J-PAL’s manual for cost-effectiveness, Dhaliwal et al (2012), explains in (entirely appropriate) detail how to handle variation in costs across sites, noting variable factors such as population density, prices, exchange rates, discount rates, inflation, and bulk discounts. But it gives short shrift to cross-site variation in the size of ATEs, which play an equal part in the calculations of cost effectiveness. The manual briefly notes that diminishing returns (or the last-mile problem) might be important in theory but argues that the baseline levels of outcomes are likely to be similar in the pilot and replication areas, so that the ATE can be safely transported as is. All of this lacks a justification for transportability, some understanding of when results transport, when they do not, or better still, how they should be modified to make them transportable.
One of the largest and most technically impressive of the development RCTs is by Banerjee et al (2015), which tests a “graduation” program designed to permanently lift extremely poor people from poverty by providing them with a gift of a productive asset (from guinea-pigs, (regular-) pigs, sheep, goats, or chickens depending on locale), training and support, and life-skills coaching, as well as support for consumption, saving, and health services. The idea is that this package of aid can help people break out of poverty traps in a way that would not be possible with one intervention at a time. Comparable versions of the program were tested in Ethiopia, Ghana, Honduras, India, Pakistan, and Peru and, excepting Honduras (where the chickens died), the authors find largely positive and persistent effects, with similar (standardized) effect sizes, for a range of outcomes (economic, mental and physical health, and female empowerment). One site apart, essentially everyone accepted their assignment. Replication of positive ATEs over such a wide range of places certainly provides proof of concept for such a scheme. Yet Bauchet, Morduch, and Ravi (2015) fail to replicate the result in South India, where the control group got access to much the same benefits, what Heckman, Hohman, and Smith (2000) call “substitution bias”. Even so, the results are important because, although there is a longstanding interest in poverty traps, many economists have been skeptical of their existence or that they could be sprung by such aid-based policies. In this sense, the study is an important contribution to the theory of economic development; it tests a theoretical proposition and will (or should) change minds about it.
A number of difficulties remain. As the authors note, such trials cannot tell us which component of the treatment accounted for the results, or which might be dispensable (a much more expensive multifactorial trial would be required), though it seems likely in practice that the costliest component, the repeated visits for training and support, will be the first to be cut by cash-strapped politicians or administrators. And as noted, it is not clear what should count as (simple) replication in international comparisons; it is hard to think of the uses of standardized effect sizes, except to document that effects exist everywhere and that they are similarly large relative to local variation in such things.
The effect size (the ATE standardized by being expressed in numbers of standard deviations of the original outcome), though conveniently dimensionless, has little to recommend it. As with much of RCT practice, it strips out any economic content (no rates of return, or benefits minus costs) and it removes any discipline on what is being compared. Apples and oranges become immediately comparable, as do treatments whose inclusion in a meta-analysis is limited only by the imagination of the analysts in claiming similarity. Training programs for physical fitness can be pooled with training programs for welding, or marketing, or even obedience training for pets. In psychology, where the concept originated, this results in endless disputes about what should and should not be pooled in a meta-analysis. Goldberger and Manski (1995, 769) note that “standardization accomplishes nothing except to give quantities in noncomparable units the superficial appearance of being in comparable units. This accomplishment is worse than useless; it yields misleading inferences.” Beyond that, Simpson (2017) notes that restrictions on the trial sample (often good practice to reduce background noise and to help detect an effect) will reduce the baseline standard deviation and inflate the effect size. More generally, effect sizes are open to manipulation by exclusion rules. It makes no sense to claim replicability on the basis of effect sizes, let alone to use them to rank projects. Effect sizes are irrelevant for policy making.
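
The inflation of effect sizes by sample restriction is simple arithmetic. The Python sketch below uses hypothetical baseline outcomes and a hypothetical ATE that is the same in both samples; it is an illustration of the mechanism, not a reproduction of Simpson's analysis.

```python
from statistics import stdev


def effect_size(ate, baseline_outcomes):
    """ATE expressed in units of the baseline standard deviation."""
    return ate / stdev(baseline_outcomes)


# Hypothetical baseline outcomes: the full population of interest, and a
# trial sample restricted to units near the middle of the distribution.
full_population = [10, 20, 30, 40, 50, 60, 70, 80, 90]
restricted_sample = [40, 50, 60]

# The same ATE, in the outcome's natural units, is assumed in both samples.
ate = 5.0

es_full = effect_size(ate, full_population)
es_restricted = effect_size(ate, restricted_sample)

print(round(es_full, 3), round(es_restricted, 3))
```

Although the treatment effect in natural units is identical by construction, the standardized effect size is several times larger in the restricted sample, purely because the exclusion rule shrinks the baseline standard deviation.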
The graduation study can be taken as the closest to fulfilling the `finding out what works’ aim of the RCT movement in development. Yet it is silent on perhaps the crucial aspect for policy, which is that the trial was run in partnership with NGOs, whereas what we would like to know is whether it could be replicated by governments, including those governments that are incapable of getting doctors, nurses, and teachers to show up to clinics or schools, Chaudhury et al (2005), Banerjee, Deaton and Duflo (2004), or of regulating the quality of medical care in either the public or private sectors, Filmer, Hammer and Pritchett (2000) or Das and Hammer (2005). In fact, we already know a great deal about `what works.’ Vaccinations work, maternal and child healthcare services work, and classroom teaching works. Yet knowing this does not get those things done. Adding another program that works under ideal conditions is useful only where conditions are in fact ideal, in
which case it would likely be unnecessary. Finding out what works is not the magic key to economic development. Technical knowledge, though always worth having, requires suitable institutions and suitable incentives if it is to do any good.
A similar point is documented in the contrast between a successful trial that used cameras and threats of wage reductions to incentivize attendance of teachers in schools run by an NGO in Rajasthan in India, Duflo, Hanna, and Ryan (2012), and the subsequent failure of a follow-up program in the same state to tackle mass absenteeism of health workers, Banerjee, Duflo, and Glennerster (2008). In the schools, the cameras and timekeeping worked as intended, and teacher attendance increased. In the clinics, there was a short-run effect on nurse attendance, but it was quickly eliminated. (The ability of agents eventually to undermine policies that are initially effective is common enough and not easily handled within an RCT.) In both trials, there were incentives to improve attendance, and there were incentives to find a way to sabotage the monitoring and restore workers to their accustomed positions; the force of these incentives is a `high-level’ cause, like gravity, or the principle of the lever, that works in much the same way everywhere. For the clinics, some sabotage was direct (the smashing of cameras) and some was subtler, when government supervisors provided official, though specious, reasons for missing work. We can only conjecture why the causality was switched in the move from NGO to government; we suspect that working for a highly respected local NGO is a different contract from working for the government, where not showing up for work is widely (if informally) understood to be part of the deal. The incentive lever works when it is wired up right, as with the NGOs, but not when the wiring cuts it out, as with the government. Knowing `what works’ in the sense of the treatment effect on the trial population is of limited value without understanding the political and institutional environment in which it is set. This underlines the need to understand the underlying social, economic, and cultural structures, including the incentives and agency problems that inhibit service delivery, that are required to support the causal pathways that we should like to see at work.
Trials in economic development often take place in artificial environments. Drèze (2016) notes, based on extensive experience in India, “when a foreign agency
comes in with its heavy boots and suitcases of dollars to administer a `treatment,’ whether through a local NGO or government or whatever, there is a lot going on other than the treatment.” There is also the suspicion that a treatment that works does so because of the presence of the `treators,’ often from abroad, and may not do so with the people who will work it in practice.
There is also much to be learned from many years of economic trials in the United States, particularly from the work of MDRC, from the early income tax trials, as well as from the RAND Health Experiment. Following the income tax trials, MDRC has run many randomized trials since the 1970s, mostly for the Federal government but also for individual states and for Canada (see the thorough and informative account by Gueron and Rolston (2011) for the factual information underlying the following discussion). MDRC’s program, like that of J-PAL in development, is intended to find out `what works’ in the state and federal welfare programs. These programs are conditional cash transfers in which poor recipients are given cash provided they satisfy certain conditions such as work requirements or training, which are often the subject of the trial. What are the benefits and costs of various alternatives, both to the recipients and to the local and federal taxpayers? All of these programs are deeply politicized, with sharply different views over both facts and desirability. Many engaged in these disputes feel certain of what should be done and what its consequences will be so that, by their lights, control groups are unethical because they deprive some people of what the advocates `know’ will be certain benefits. Given this, it is perhaps surprising that RCTs have become the accepted norm for this kind of policy evaluation in the US.
The reasons owe much to political institutions, as well as to the common belief, explored in Section 1, that RCTs can reveal the truth. At the Federal level, prospective policies are vetted by the non-partisan Congressional Budget Office (CBO), which makes its own estimates of the budgetary implications of the program. Ideologues whose programs are scored poorly by the CBO have an incentive to support an RCT, not to convince themselves, but to convince opponents; once again, RCTs are valuable when your opponents do not share your prior. And control groups are
easier to put in place when there are insufficient funds to cover the whole population. There was also a widespread and largely uncritical belief that RCTs give the right answer, at least for the budgetary implications, which, rather than the wellbeing of the recipients, were often the primary concern; note that all of these trials are on poor people by rich people who are typically more concerned with cost than with the wellbeing of the poor, Greenberg, Schroder and Onstott (1999). MDRC’s trials could therefore be effective dispute-reconciliation mechanisms both for those who saw the need for evidence and for those who did not (except instrumentally). The outcome here fits with our `public health’ case; what the politicians need to know is not the outcomes for individuals, or even how the outcomes in one state might transport to another, but the average budgetary cost in a specific place, something that a good RCT conducted on a representative sample of the target population can deliver, at least in the absence of general equilibrium effects, timing effects, etc.
These RCTs by MDRC and other contractors have demonstrated the feasibility of large-scale social trials, including the possibility of randomization in these settings (where many participants were hostile to the idea), as well as their usefulness to policy makers. They also seem to have changed beliefs, for example in favor of the desirability of work requirements as a condition of welfare, even among many originally opposed. There are also limitations; the trials appear to have had at best a minor influence on scientific thinking about behavior in labor markets and, in that sense, they are more about `plumbing’ than science, Duflo (2017). The results of similar programs have often been different across different sites, and there has to date been no firm understanding of why; indeed, the trials are not designed to reveal this, Moffitt (2004). Finally, and perhaps crucially for the potential contribution to economic science, there has been little success in understanding either the underlying structures or chains of causation, in spite of a determined effort from the beginning to open the black boxes.
The RAND health experiment, Manning et al. (1988a, b), provides a different
but equally instructive story if only because its results have permeated the academic
and policy discussions about healthcare ever since. It was originally designed to test
whether more generous insurance causes people to use more medical care and, if so,
by how much. The incentive effects are hardly in doubt today; the immortality of the
study comes rather from the fact that its multi-arm (response surface) design allowed
the calculation of an elasticity for the study population, that medical expenditures
changed by –0.1 to –0.2 percent for every one percent increase in the copayment.
According to Aron-Dine, Einav, and Finkelstein (2013), it is this dimensionless
and thus apparently transportable number that has been used ever since to discuss
the design of healthcare policy; the elasticity has come to be treated as a universal
constant. Ironically, they argue that the estimate cannot be replicated in recent studies,
and it is even unclear that it is firmly based on the original evidence. This points,
once again, to the central importance of transportability for the usefulness, both
short and long term, of a trial. Here, the simple direct transportability of the result
seems to have been largely illusory though, as we have argued, this does not mean
that more complex constructions based on the results of the trial would not have
done better.
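To make concrete what treating the elasticity as a universal constant amounts to, the following minimal sketch applies an elasticity of –0.1 or –0.2 to a change in the copayment. The function names, the baseline spending of 1000, and the copayment rates of 0.25 and 0.50 are hypothetical and chosen only for illustration; nothing here is taken from the RAND data, and the constant-elasticity form is itself an assumption.

```python
# Hypothetical illustration of extrapolating with a fixed elasticity.
# All numbers below are invented; none come from the RAND experiment.

def pct_change_in_spending(elasticity: float, pct_change_in_copay: float) -> float:
    """First-order approximation: % change in spending = elasticity * % change in copayment."""
    return elasticity * pct_change_in_copay


def project_spending(spending: float, old_copay: float, new_copay: float,
                     elasticity: float) -> float:
    """Constant-elasticity projection: spending scales with (new / old) ** elasticity."""
    return spending * (new_copay / old_copay) ** elasticity


if __name__ == "__main__":
    for e in (-0.1, -0.2):
        approx = pct_change_in_spending(e, 10.0)
        projected = project_spending(1000.0, 0.25, 0.50, e)
        print(f"elasticity {e}: a 10% copayment rise changes spending by {approx:.1f}%; "
              f"doubling the copayment projects spending of {projected:.2f}")
```

The projection is only as good as the assumption that the same elasticity holds in the new population and over the new range of copayments, which is exactly the transportability question raised above.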
Conclusions
It is useful to respond to two challenges that are often put to us, one from medicine
and one from social science. The medical challenge is, “If you are being prescribed a
new drug, wouldn’t you want it to have been through an RCT?” The second (related)
challenge is, “OK, you have highlighted some of the problems with RCTs, but other
methods have all of those problems, plus problems of their own.” We believe that we
have answered both of these in the paper but that it is helpful to recapitulate.
The medical challenge is about you, a specific person, so that one answer
would be that you may be different from the average, and you are entitled to and
ought to ask about theory and evidence about whether it will work for you. This
would be in the form of a conversation between you and your physician, who knows
a lot about you. You would want to know how this class of drug is supposed to work
and whether that mechanism is likely to work for you. Is there any evidence from
other patients, especially patients like you, with your condition and in your circumstances,
or are there suggestions from theory? What scientific work has been done
to identify what support factors matter for success with this kind of drug? If the only
information available is from the pharmaceutical company, an RCT might seem like
a good idea. But even then, and although knowledge of the mean effect among some
group is certainly of value, you might give little weight to an RCT whose participants
are selected in the way they were selected in the trial, or where there is little information
about whether the outcomes are relevant to you. Recall that many new drugs
are prescribed ‘off-label’, for a purpose for which they were not tested, and beyond
that, that many new drugs are administered in the absence of an RCT because you
are actually being enrolled in one. For patients whose last chance is to participate in
a trial of some new drug, this is exactly the sort of conversation you should have
with your physician (followed by one asking her to reveal whether you are in the active
arm, so that you can switch if not), and such conversations need to take place
for all prescriptions that are new to you. In these conversations, the results of an
RCT may have marginal value. If your physician tells you that she endorses evidence-based
medicine, and that the drug will most likely work for you because an RCT has
shown that ‘it works’, it is time to find a new physician.
The second challenge claims that other methods are always dominated by an
RCT. This kind of challenge is not well-formulated. Dominated for answering what
question, for what purposes? The chief advantage of the RCT is that it can, if well-conducted,
give an unbiased estimate of an ATE in a study (trial) sample and thus
provide evidence that the treatment caused the outcome in some individuals in that
sample. If that is what you want to know and there’s little background knowledge
available and the price is right, then an RCT may be the best choice. As to other
questions, the RCT result can be part (but usually only a small part) of the defense
of (a) a general claim, (b) a claim that the treatment will cause that outcome for
some other individuals, or even (c) a claim about what the ATE will be in some other
population. But RCT results do little for these enterprises on their own. What is the best
overall package of research work for tackling these questions (most cost-effective
and most likely to produce correct results) depends on what we know and what
different kinds of research will cost.
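The sense in which a well-conducted RCT is unbiased for the ATE in the trial sample can be checked by brute force in a toy case. The sketch below assumes a hypothetical sample of six participants whose potential outcomes with and without treatment are both known (which is never true in practice), enumerates every way of assigning three of them to treatment, and averages the resulting difference-in-means estimates. The outcome values and function names are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def difference_in_means(y0, y1, treated):
    """Estimate from one assignment: y1 is observed for treated units, y0 for the rest."""
    treated = set(treated)
    treated_outcomes = [y1[i] for i in range(len(y0)) if i in treated]
    control_outcomes = [y0[i] for i in range(len(y0)) if i not in treated]
    return mean(treated_outcomes) - mean(control_outcomes)


def all_estimates(y0, y1, n_treated):
    """Difference-in-means estimate under every equally likely assignment."""
    units = range(len(y0))
    return [difference_in_means(y0, y1, t) for t in combinations(units, n_treated)]


# Hypothetical potential outcomes for six trial participants.
Y0 = [3.0, 5.0, 2.0, 8.0, 4.0, 6.0]  # outcome if untreated
Y1 = [4.0, 5.0, 6.0, 7.0, 9.0, 6.0]  # outcome if treated

sample_ate = mean(b - a for a, b in zip(Y0, Y1))
estimates = all_estimates(Y0, Y1, n_treated=3)

if __name__ == "__main__":
    print("ATE in the trial sample:", sample_ate)
    print("average estimate over all assignments:", mean(estimates))
    print("smallest and largest single estimate:", min(estimates), max(estimates))
```

Averaged over all assignments the estimate matches the sample ATE, but any single trial delivers only one of these estimates, and the individual effects in this invented sample range from negative to positive, so the average says nothing about which individuals benefit.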
There are examples where an RCT does better than an observational study,
and these seem to be the cases that come to mind for defenders of RCTs. For example,
regressions of whether people who get Medicaid do better or worse than people
with private insurance are vitiated by gross differences in the other characteristics
of the two populations. But it is a long step from that to saying that an RCT can solve
the problem, let alone that it is the only way to solve the problem. It will not only be
expensive per subject, but it can only enroll a selected and almost certainly unrepresentative
study sample, it can be run only temporarily, and the recruitment to the
experiment will necessarily be different from recruitment in a scheme that is permanent
and open to the full qualified population. None of this removes the blemishes
of the observational study, but there are many methods of mitigating its difficulties,
so that, in the end, an observational study with credible corrections and a
more relevant and much larger study sample (today often the complete population
of interest through administrative records) may provide a better estimate. Everything
has to be judged on a case-by-case basis. There is no rigorous argument for a
lexicographic preference for RCTs.
There is also an important line of enquiry that goes, not only beyond RCTs,
but beyond the ‘method of differences’ that is common to RCTs, regressions, or any
form of controlled or uncontrolled comparison. The hypothetico-deductive method
confronts theory-based deductions with the data, either observational or experimental.
As noted above, economists routinely use theory to tease out a new implication
that can be taken to the data, and there are also good examples in medicine
such as Bleyer and Welch’s (2012) demonstration of the limited impact of mammography
screening on breast cancer incidence, a topic where other methods have
generated great controversy and little consensus.
RCTs are the ultimate in non-parametric estimation of average treatment effects
in the trial samples because they make so few assumptions about heterogeneity,
causal structure, choice of variables, and functional form. RCTs are often convenient
ways to introduce experimenter-controlled variance (if you want to see what
happens, then kick it and see, twist the lion’s tail), but note that many experiments,
including many of the most important (and Nobel Prize winning) experiments in
economics, do not and did not use randomization, Harrison (2013), Svorencik
(2015). But the credibility of the results, even internally, can be undermined by unbalanced
covariates and by excessive heterogeneity in responses, especially when
the distribution of effects is asymmetric, where inference on means can be hazardous.
Ironically, the price of the credibility in RCTs is that we can only recover the
mean of the distribution of treatment effects, and that only for the trial sample. Yet,
in the presence of outliers, reliable inference on means is difficult. And randomization
in and of itself does nothing unless the details are right; purposive selection into
the experimental population, like purposive selection into and out of assignment,
undermines inference in just the same way as does selection in observational studies.
Lack of blinding, whether of participants, trialists, data collectors, or analysts,
undermines inference, akin to a failure of exclusion restrictions in instrumental variable
analysis.
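The hazard that an asymmetric distribution of effects poses for inference on means can be seen in a deliberately extreme toy case. The sketch below assumes a hypothetical trial sample of eight units in which the treatment effect is 1 for seven units and 100 for the eighth, and tabulates the difference-in-means estimate over every assignment of four units to treatment. The numbers are invented to make the point starkly.

```python
from itertools import combinations
from collections import Counter

# Hypothetical trial sample: the effect is 1 for seven units and 100 for one outlier.
Y0 = [0.0] * 8            # outcome if untreated
Y1 = [1.0] * 7 + [100.0]  # outcome if treated


def estimate(treated):
    """Difference in means for one assignment of four treated and four control units."""
    t = [Y1[i] for i in treated]
    c = [Y0[i] for i in range(len(Y0)) if i not in treated]
    return sum(t) / len(t) - sum(c) / len(c)


true_ate = sum(b - a for a, b in zip(Y0, Y1)) / len(Y0)

# Tabulate the estimate under every equally likely assignment.
counts = Counter(estimate(t) for t in combinations(range(8), 4))
average_estimate = sum(est * n for est, n in counts.items()) / sum(counts.values())

if __name__ == "__main__":
    print("ATE in the trial sample:", true_ate)
    print("estimates and how often each occurs:", dict(counts))
    print("average over all assignments:", average_estimate)
```

The estimator is unbiased, since the average over assignments equals the sample ATE, yet every individual trial returns one of two values that are both far from it, depending only on whether the outlier happens to land in the treatment group.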
The lack of structure can be seriously disabling when we try to use RCT results
outside of a few contexts, such as program evaluation, hypothesis testing, or
establishing proof of concept. Beyond that, the results cannot be used to help make
predictions beyond the trial sample without more structure, without more prior information,
and without having some idea of what makes treatment effects vary from
place to place or time to time. There is no option but to commit to some causal
structure if we are to know how to use RCT evidence out of the original context.
Simple generalization and simple extrapolation do not cut the mustard. This is true
of any study, experimental or observational. But observational studies are familiar
with, and routinely work with, the sort of assumptions that RCTs claim to avoid, so
that if the aim is to use empirical evidence, any credibility advantage that RCTs have
in estimation is no longer operative. And because RCTs tell us so little about why results
happen, they have a disadvantage over studies that use a wider range of prior
information and data to help nail down mechanisms.
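One simple example of what committing to some structure can look like is reweighting. The sketch below assumes, purely for illustration, that treatment effects differ only between two strata (labelled urban and rural here) and that the effect within each stratum is the same in the trial sample and in the target population. Under that assumption, and only under it, the trial's stratum effects can be re-averaged with the target population's stratum shares. The strata, effects, and shares are hypothetical.

```python
def weighted_ate(stratum_effects, shares):
    """Average the stratum effects using the given population shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to one")
    return sum(stratum_effects[k] * share for k, share in shares.items())


# Hypothetical stratum-level effects estimated in the trial.
trial_effects = {"urban": 2.0, "rural": 6.0}

# Hypothetical composition of the trial sample and of the target population.
trial_shares = {"urban": 0.75, "rural": 0.25}
target_shares = {"urban": 0.25, "rural": 0.75}

trial_ate = weighted_ate(trial_effects, trial_shares)
# Valid only if effects depend on the stratum alone and carry over unchanged.
target_ate = weighted_ate(trial_effects, target_shares)

if __name__ == "__main__":
    print("ATE in the trial sample:", trial_ate)
    print("ATE implied for the target population:", target_ate)
```

Simple extrapolation would carry the trial ATE over unchanged; the reweighted figure differs because the two populations differ in composition. If effects also vary for reasons not captured by the strata, neither number need be right, which is why the choice of structure has to be argued for rather than assumed.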
Yet once that commitment has been made, RCT evidence can be extremely
useful, pinning down part of a structure, helping to build stronger understanding
and knowledge, and helping to assess welfare consequences. As our examples show,
this can often be done without committing to the full complexity of what are often
thought of as structural models. Yet without the structure that allows us to place
RCT results in context, or to understand the mechanisms behind those results, not
only can we not transport whether ‘it works’ elsewhere, but we cannot do the standard
stuff of economics, which is to say whether the intervention is actually welfare
improving. Without knowing why things happen and why people do things, we run
the risk of worthless casual (‘fairy story’) causal theorizing and have given up on
one of the central tasks of economics.
We must back away from the refusal to theorize, from the exultation in our
ability to handle unlimited heterogeneity, and actually SAY something. Perhaps paradoxically,
unless we are prepared to make assumptions, and to say what we know,
making statements that will be incredible to some, all the credibility of the RCT is for
naught.
RCTs in economics on health, labor, and development have proven their
worth in providing proofs of concept and in testing predictions that some policies
must always work or can never work. But, as elsewhere in economics, we cannot
find out why something works by simply demonstrating that it does work, no matter
how often, which leaves us uninformed as to whether the policy should be implemented.
Beyond that, small scale, demonstration RCTs are not capable of telling us
what would happen if these policies were implemented to scale, of capturing unintended
consequences that typically cannot be included in the protocols, or of modeling
what will happen if schemes are implemented differently than in the trial, for example
by governments, whose motives and operating principles are different from
those of the NGOs or academics who typically run trials. While it is true that abstract
knowledge is always likely to be beneficial, successful policy depends on institutions
and on politics, matters on which RCTs have little to say. The results of RCTs can and
should feed into public debate about what should be done, but we are on dangerous
ground when they are used, on grounds of their supposed epistemic superiority, to
insulate policy from democratic processes.
Citations
Aigner, Dennis J., 1985, “The residential electricity time-of-use pricing experiments. What have we learned?” in David A. Wise and Jerry A. Hausman, Social experimentation, Chicago, Il. Chicago University Press for National Bureau of Economic Research, 11–54.
Al-Ubaydli, Omar, and John A. List, 2013, “On the generalizability of experimental results in economics,” in G. Frechette and A. Schotter, Methods of modern experimental economics, Oxford University Press.
Altman, Douglas G., 1985, “Comparability of randomized groups,” Journal of the Royal Statistical Society, Series D (The Statistician), 34(1), Statistics in health, 125–36.
Angrist, Joshua D., 2004, “Treatment effect heterogeneity in theory and practice,” Economic Journal, 114, C52–C83.
Angrist, Joshua D., Eric Bettinger, Erik Bloom, Elizabeth King, and Michael Kremer, 2002, “Vouchers for private schooling in Colombia: evidence from a randomized natural experiment,” American Economic Review, 92(5), 1535–58.
Angrist, Joshua D. and Jörn-Steffen Pischke, 2010, “The credibility revolution in empirical economics: how better research design is taking the con out of econometrics,” Journal of Economic Perspectives, 24(2), 3–30.
Angrist, Joshua D. and Jörn-Steffen Pischke, 2017, “Undergraduate econometrics instruction: through our classes, darkly,” Journal of Economic Perspectives, 31(2), 125–44.
Aron-Dine, Aviva, Liran Einav, and Amy Finkelstein, 2013, “The RAND health insurance experiment, three decades later,” Journal of Economic Perspectives, 27(1), 197–222.
Arrow, Kenneth J., 1975, “Two notes on inferring long run behavior from social experiments,” Document No. P-5546, Santa Monica, CA. Rand Corporation.
Ashenfelter, Orley, 1978, “The labor supply response of wage earners,” in John L. Palmer and Joseph A. Pechman, eds., Welfare in rural areas: the North Carolina–Iowa Income Maintenance Experiment, Washington, DC. The Brookings Institution. 109–38.
Athey, Susan and Guido W. Imbens, 2017, “The state of applied econometrics: causality and policy evaluation,” Journal of Economic Perspectives, 31(2), 3–32.
Attanasio, Orazio, Costas Meghir, and Ana Santiago, 2012, “Education choices in Mexico: using a structural model and a randomized experiment to evaluate PROGRESA,” Review of Economic Studies, 79(1), 37–66.
Attanasio, Orazio, Sarah Cattan, Emla Fitzsimons, Costas Meghir, and Marta Rubio Codina, 2015, “Estimating the production function for human capital: results from a randomized controlled trial in Colombia,” London. Institute for Fiscal Studies, Working Paper no W15/06.
Bahadur, R. R., and Leonard J. Savage, 1956, “The non-existence of certain statistical procedures in nonparametric problems,” Annals of Mathematical Statistics, 27, 1115–22.
Banerjee, Abhijit, Sylvain Chassang, Sergio Montero, and Erik Snowberg, 2016, “A theory of experimenters,” processed, July 2016.
Banerjee, Abhijit, Sylvain Chassang, and Erik Snowberg, 2016, “Decision theoretic approaches to experiment design and external validity,” Cambridge, MA. NBER Working Paper no 22167, April.
Banerjee, Abhijit, Angus Deaton, and Esther Duflo, 2004, “Health care delivery in rural Rajasthan,” Economic and Political Weekly, 39(9), 944–9.
Banerjee, Abhijit and Esther Duflo, 2009, “The experimental approach to development economics,” Annual Review of Economics, 1, 151–78.
Banerjee, Abhijit and Esther Duflo, 2012, Poor economics: a radical rethinking of the way to fight global poverty, Public Affairs.
Banerjee, Abhijit, Esther Duflo, Nathanael Goldberg, Dean Karlan, Robert Osei, William Parienté, Jeremy Shapiro, Bram Thuysbaert, and Christopher Udry, 2015, “A multifaceted program causes lasting progress for the very poor: evidence from six countries,” Science, 348(6236), 1260799.
Banerjee, Abhijit, Esther Duflo, and Rachel Glennerster, 2008, “Putting a band-aid on a corpse: incentives for nurses in the Indian public health care system,” Journal of the European Economic Association, 6(2–3), 487–500.
Banerjee, Abhijit V. and Ruimin He, 2003, “The World Bank of the future,” American Economic Review, 93(2), 39–44.
Banerjee, Abhijit, Dean Karlan, and Jonathan Zinman, 2015, “Six randomized evaluations of microcredit: introduction and further steps,” American Economic Journal: Applied Economics, 7(1), 1–21.
Bareinboim, Elias and Judea Pearl, 2013, “A general algorithm for deciding transportability of experimental results,” Journal of Causal Inference, 1(1), 107–34.
Bareinboim, Elias and Judea Pearl, 2014, “Transportability from multiple environments with limited experiments: completeness results,” in M. Welling, Z. Ghahramani, C. Cortes, and N. Lawrence, eds., Advances of Neural Information Processing, 27, (NIPS Proceedings), 280–8.
Bauchet, Jonathan, Jonathan Morduch, and Shamika Ravi, 2015, “Failure vs displacement: why an innovative anti-poverty program showed no net impact in South India,” Journal of Development Economics, 116, 1–16.
Basu, Kaushik, 2010, “The economics of foodgrain management in India,” Ministry of Finance, Delhi. http://finmin.nic.in/workingpaper/Foodgrain.pdf
Begg, Colin B., 1990, “Significance tests of covariance imbalance in clinical trials,” Controlled Clinical Trials, 11(4), 223–5.
Bhattacharya, Debopam and Pascaline Dupas, 2012, “Inferring welfare maximizing treatment assignment under budget constraints,” Journal of Econometrics, 167(1), 168–96.
Bitler, Marianne P., Jonah B. Gelbach, and Hilary W. Hoynes, 2006, “What mean impacts miss: distributional effects of welfare reform experiments,” American Economic Review, 96(4), 988–1012.
Bleyer, Archie, and H. Gilbert Welch, 2012, “Effect of three decades of screening mammography on breast-cancer incidence,” New England Journal of Medicine, 367, 1998–2005.
Bloom, Howard S., Carolyn J. Hill, and James A. Riccio, 2005, “Modeling cross-site experimental differences to find out why program effectiveness varies,” in Howard S. Bloom, ed., Learning more from social experiments: evolving analytical approaches, New York, NY. Russell Sage.
Bold, Tessa, Mwangi Kimenyi, Germano Mwabu, Alice Ng’ang’a, and Justin Sandefur, 2013, “Scaling up what works: experimental evidence on external validity in Kenyan education,” Washington, DC. Center for Global Development, Working Paper 321.
Bothwell, Laura E. and Scott H. Podolsky, 2016, “The emergence of the randomized, controlled trial,” New England Journal of Medicine, 375(6), 501–4. doi:10.1056/NEJMp1604635
Campbell, D. T. and J. C. Stanley, 1963, Experimental and quasi-experimental designs for research. Chicago. Rand McNally.
Cartwright, Nancy, 1994, Nature’s capacities and their measurement. Oxford. Clarendon Press.
Cartwright, Nancy, 2007, “Are RCTs the gold standard?” Biosocieties, 2, 11–20.
Cartwright, Nancy, 2011, “A philosopher’s view of the long road from RCTs to effectiveness,” The Lancet, 377, 1400–01.
Cartwright, Nancy, 2012, “Presidential address: will this policy work for you? Predicting effectiveness better: how philosophy helps,” Philosophy of Science, 79, 973–89.
Cartwright, Nancy, 2016, “Where is the rigor when you need it?” in I. Marinovic, ed., Foundations and Trends in Accounting: special issue on causal inference in capital markets research, 10(2–4), 106–24.
Cartwright, Nancy and Jeremy Hardie, 2012, Evidence based policy: a practical guide to doing it better, Oxford. Oxford University Press.
Cartwright, Nancy and Eileen Munro, 2010, “The limitations of RCTs in predicting effectiveness,” Journal of Experimental Child Psychology, 16(2),
Chalmers, Iain, 2001, “Comparing like with like: some historical milestones in the evolution of methods to create unbiased comparison groups in therapeutic experiments,” International Journal of Epidemiology, 30, 1156–64.
Chan, Tat Y. and Barton H. Hamilton, 2006, “Learning, private information, and the economic evaluation of randomized experiments,” Journal of Political Economy, 114(6), 997–1040.
Chassang, Sylvain, Gerard Padró i Miguel, and Erik Snowberg, 2012, “Selective trials: a principal–agent approach to randomized controlled experiments,” American Economic Review, 102(4), 1279–1309.
Chassang, Sylvain, Erik Snowberg, Ben Seymour, and Cayley Bowles, 2015, “Accounting for behavior in treatment effects: new applications for blind trials,” PLoS One, 10(6), e0127227. doi:10.1371/journal.pone.0127227.
Chaudhury, Nazmul, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan, and F. Halsey Rogers, 2005, “Missing in action: teacher and health worker absence in developing countries,” Journal of Economic Perspectives, 19(4), 91–116.
Chyn, Eric, 2016, “Moved to opportunity: the long-run effect of public housing demolition on labor market outcomes of children,” University of Michigan. http://www-personal.umich.edu/~ericchyn/Chyn_Moved_to_Opportunity.pdf
Chetty, Raj, 2009, “Sufficient statistics for welfare analysis: a bridge between structural and reduced-form methods,” Annual Review of Economics, 1, 451–87.
Conlisk, John, 1973, “Choice of response functional form in designing subsidy experiments,” Econometrica, 41(4), 643–56.
Crépon, Bruno, Esther Duflo, Marc Gurgand, Roland Rathelot, and Philippe Zamora, 2014, “Do labor market policies have displacement effects? evidence from a clustered randomized experiment,” Quarterly Journal of Economics, 128(2), 531–80.
Das, Jishnu and Jeffrey Hammer, 2005, “Which doctor? Combining vignettes and item response to measure clinical competence,” Journal of Development Economics, 78, 348–83.
Davey-Smith, George, and Shah Ibrahim, 2002, “Data dredging, bias, or confounding,” British Medical Journal, 325, 1437–8.
Deaton, Angus, 2010, “Instruments, randomization, and learning about development,” Journal of Economic Literature, 48(2), 424–55.
Deaton, Angus and Nancy Cartwright, 2016, “Understanding and misunderstanding randomized controlled trials,” http://www.princeton.edu/~deaton/down-load.html?pdf=Deaton_Cartwright_RCTs_with_ABSTRACT_August_25.pdf
Deaton, Angus and John Muellbauer, 1980, Economics and consumer behavior, New York. Cambridge University Press.
Deaton, Angus and Serena Ng, 1998, “Parametric and nonparametric approaches to price and tax reform,” Journal of the American Statistical Association, 93(443), 900–9.
Dhaliwal, Iqbal, Esther Duflo, Rachel Glennerster, and Caitlin Tulloch, 2012, “Comparative cost-effectiveness analysis to inform policy in developing countries: a general framework with applications for education,” J-PAL, MIT, December 3rd. http://www.povertyactionlab.org/publication/cost-effectiveness
Drèze, Jean, 2016, Personal email communication.
Duflo, Esther, 2017, “The economist as plumber,” American Economic Review, 107(5), 1–26.
Duflo, Esther, Rema Hanna, and Stephen P. Ryan, 2012, “Incentives work: getting teachers to come to school,” American Economic Review, 102(4), 1241–78.
Duflo, Esther and Michael Kremer, 2008, “Use of randomization in the evaluation of development effectiveness,” in William Easterly, ed., Reinventing foreign aid. Washington, DC. Brookings, 93–120.
Dynarski, Susan, 2015, “Helping the poor in education: the power of a simple nudge,” New York Times, Jan 17, 2015.
Fine, Paul E. M. and Jacqueline A. Clarkson, 1986, “Individual versus public priorities in the determination of optimal vaccination policies,” American Journal of Epidemiology, 124(6), 1012–20.
Fisher, Ronald A., 1926, “The arrangement of field experiments,” Journal of the Ministry of Agriculture of Great Britain, 33, 503–13.
Filmer, Deon, Jeffrey Hammer, and Lant Pritchett, 2000, “Weak links in the chain: a diagnosis of health policy in poor countries,” World Bank Research Observer, 15(2), 199–204.
Freedman, David A., 2008, “On regression adjustments to experimental data,” Advances in Applied Mathematics, 40, 180–93.
Frieden, Thomas R., 2017, “Evidence for health decision making: beyond randomized, controlled trials,” New England Journal of Medicine, 377, 465–75.
Garfinkel, Irwin and Charles F. Manski, 1992, “Introduction,” in Irwin Garfinkel and Charles F. Manski, eds., Evaluating welfare and training programs, Cambridge, MA. Harvard University Press. 1–22.
Gerber, Alan S. and Donald P. Green, 2012, Field Experiments, New York. Norton.
Gertler, Paul J., Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch, 2016, Impact evaluation in practice, 2nd Edition, Washington, DC. Inter-American Development Bank and World Bank.
Goldberger, Arthur S. and Charles F. Manski, 1995, “Review Article: The Bell Curve by Herrnstein and Murray,” Journal of Economic Literature, 33(2), 762–76.
Greenberg, David and Mark Shroder, 2004, The digest of social experiments (3rd ed.), Washington, DC. Urban Institute Press.
Greenberg, David, Mark Shroder, and Matthew Onstott, 1999, “The social experiment market,” Journal of Economic Perspectives, 13(3), 157–72.
Gueron, Judith M. and Howard Rolston, 2013, Fighting for reliable evidence, New York, Russell Sage.
Guyatt, Gordon, David L. Sackett, and Deborah J. Cook for the Evidence-Based Medicine Working Group, 1994, “Users’ guides to the medical literature II: how to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients?” Journal of the American Medical Association, 271(1), 59–63.
Harrison, Glenn W., 2013, “Field experiments and methodological intolerance,” Journal of Economic Methodology, 20(2), 103–17.
Harrison, Glenn W., 2014, “Impact evaluation and welfare evaluation,” European Journal of Development Research, 26, 39–45.
Harrison, Glenn W., 2014, “Cautionary notes on the use of field experiments to address policy issues,” Oxford Review of Economic Policy, 30(4), 753–63.
Hausman, Jerry A. and David A. Wise, 1985, “Technical problems in social experimentation: cost versus ease of analysis,” in Jerry A. Hausman and David A. Wise, eds., Social Experimentation, Chicago, IL. Chicago University Press. 187–220.
Heckman, James J., 1992, “Randomization and social policy evaluation,” in Charles F. Manski and Irwin Garfinkel, eds., Evaluating welfare and training programs, Cambridge, MA. Harvard University Press. 547–70.
Heckman, James J., 1997, “Instrumental variables: a study of implicit behavioral assumptions used in making program evaluations,” Journal of Human Resources, 32(3), 441–62.
Heckman, James J., 2005, “The scientific model of causality,” Sociological Methodology, 35(1), 1–97.
Heckman, James J., 2008, “Econometric causality,” International Statistical Review, 76(1), 1–27.
Heckman, James J., 2010, “Building bridges between structural and program evaluation approaches to evaluating policy,” Journal of Economic Literature, 48(2), 356–98.
Heckman, James J., Neil Hohman, and Jeffrey Smith, with the assistance of Michael Khoo, 2000, “Substitution and dropout bias in social experiments: a study of an influential social experiment,” Quarterly Journal of Economics, 115(2), 651–94.
Heckman, James J., Robert J. Lalonde, and Jeffrey A. Smith, 1999, “The economics and econometrics of active labor markets,” Chapter 31 in Ashenfelter, Orley and David Card, eds., Handbook of labor economics, Amsterdam. North-Holland, 3(A), 1866–2097.
Heckman, James J., Rodrigo Pinto, and Peter Savelyev, 2013, “Understanding the mechanisms through which an influential early childhood program boosted adult outcomes,” American Economic Review, 103(6), 2052–86.
Heckman, James J. and Jeffrey Smith, 1995, “Assessing the case for social experiments,” Journal of Economic Perspectives, 9(2), 85–110.
Heckman, James J., Jeffrey Smith, and Nancy Clements, 1997, “Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts,” Review of Economic Studies, 64(4), 487–535.
Heckman, James J. and Sergio Urzúa, 2010, “Comparing IV with structural models: what simple IV can and cannot identify,” Journal of Econometrics, 156, 27–37.
Heckman, James J. and Edward Vytlacil, 2005, “Structural equations, treatment effects, and econometric policy evaluation,” Econometrica, 73(3), 669–738.
Heckman, James J. and Edward J. Vytlacil, 2007, “Econometric evaluation of social programs, Part 1: causal models, structural models, and econometric policy evaluation,” Chapter 70 in James J. Heckman and Edward E. Leamer, eds., Handbook of Econometrics, 6B, 4779–874.
Horton, Richard, 2000, “Common sense and figures: the rhetoric of validity in medicine: Bradford Hill memorial lecture 1999,” Statistics in medicine, 19, 3149–64.
Hotz, V. Joseph, Guido W. Imbens, and Julie H. Mortimer, 2005, “Predicting the efficacy of future training programs using past experience at other locations,” Journal of Econometrics, 125, 241–70.
Hsieh, Chang-tai and Miguel Urquiola, 2006, “The effects of generalized school choice on achievement and stratification: evidence from Chile’s voucher program,” Journal of Public Economics, 90, 1477–1503.
Hurwicz, Leonid, 1966, “On the structural form of interdependent systems,” Studies in Logic and the Foundations of Mathematics, 44, 232–9.
Imbens, Guido W., 2004, “Nonparametric estimation of average treatment effects under exogeneity: a review,” Review of Economics and Statistics, 86(1), 4–29.
Imbens, Guido W., 2010, “Better LATE than nothing: some comments on Deaton (2009) and Heckman and Urzua,” Journal of Economic Literature, 48(2), 399–423.
Imbens, Guido W. and Joshua D. Angrist, 1994, “Identification and estimation of local average treatment effects,” Econometrica, 62(2), 467–75.
Imbens, Guido W. and Michal Kolesár, 2016, “Robust standard errors in small samples: some practical advice,” Review of Economics and Statistics, 98(4), 701–12.
Imbens, Guido W. and Jeffrey M. Wooldridge, 2009, “Recent developments in the econometrics of program evaluation,” Journal of Economic Literature, 47(1), 5–86.
International Committee of Medical Journal Editors, 2015, Recommendations for the conduct, reporting, editing, and publication of scholarly work in medical journals, http://www.icmje.org/icmje-recommendations.pdf (accessed August 20, 2016).
J-PAL, 2017, https://www.povertyactionlab.org/about-j-pal (accessed August 21, 2017).
Kahneman, Daniel and Gary Klein, 2009, “Conditions for intuitive expertise: a failure to disagree,” American Psychologist, 64(6), 515–26.
Karlan, Dean and Jacob Appel, 2011, More than good intentions: how a new economics is helping to solve global poverty, New York. Dutton.
Karlan, Dean, Nathanael Goldberg and James Copestake, 2009, “Randomized controlled trials are the best way to measure impact of microfinance programs and improve microfinance product designs,” Enterprise Development and Microfinance, 20(3), 167–76.
Kasy, Maximilian, 2016, “Why experimenters might not want to randomize, and what they could do instead,” Political Analysis, 1–15. doi:10.1093/pan/mpw012
Kramer, Peter, 2016, Ordinarily well: the case for antidepressants, New York. Farrar, Straus, and Giroux.
Kremer, Michael and Alaka Holla, 2009, “Improving education in the developing world: what have we learned from randomized evaluations?” Annual Review of Economics, 1, 513–42.
Lalonde, Robert J., 1986, “Evaluating the econometric evaluations of training programs with experimental data,” American Economic Review, 76(4), 604–20.
Lehmann, Erich L. and Joseph P. Romano, 2005, Testing statistical hypotheses (third edition), New York. Springer.
Levy, Santiago, 2006, Progress against poverty: sustaining Mexico’s Progresa-Oportunidades program, Washington, DC. Brookings.
Mackie, John L., 1974, The cement of the universe: a study of causation, Oxford. Oxford University Press.
Manning, Willard G., Joseph P. Newhouse, Naihua Duan, Emmett Keeler, and Arleen Leibowitz, 1988a, “Health insurance and the demand for medical care: evidence from a randomized experiment,” American Economic Review, 77(3), 251–77.
Manning, Willard G., Joseph P. Newhouse, Naihua Duan, Emmett Keeler, Bernadette Benjamin, Arleen Leibowitz, M. Susan Marquis, and Jack Zwanziger, 1988b, Health insurance and the demand for medical care: evidence from a randomized experiment, Santa Monica, CA. RAND.
Manski, Charles F., 2004, “Treatment rules for heterogeneous populations,” Econometrica, 72(4), 1221–46.
Manski, Charles F., 2013, Public policy in an uncertain world: analysis and decisions, Cambridge, MA. Harvard University Press.
Manski, Charles F. and Aleksey Tetenov, 2016, “Sufficient trial size to inform clinical practice,” PNAS, 113(38), 10518–23.
![Page 68: NBER WORKING PAPER SERIES UNDERSTANDING AND ...Lars Hansen, Jeff Hammer, Glenn Harrison, Macartan Humphreys, Michal Kolesár, Helen Milner, Tamlyn Munslow, Suresh Naidu, Lant Pritchett,](https://reader033.vdocuments.net/reader033/viewer/2022060100/60b128e9329a544c0969daf7/html5/thumbnails/68.jpg)
67
Metcalf,CharlesE.,1973,“Makinginferencesfromcontrolledincomemaintenanceexperiments,”AmericanEconomicReview,63(3),478–83.
Moffitt,Robert,1979,“ThelaborsupplyresponseintheGaryexperiment,”JournalofHumanResources,14(4),477–87.
Moffitt,Robert,1992,“Evaluationmethodsforprogramentryeffects,”Chapter6inCharlesManskiandIrwinGarfinkel,Evaluatingwelfareandtrainingprograms,Cambridge,MA.HarvardUniversityPress,231–52.
Moffitt,Robert,2004,“Theroleofrandomizedfieldtrialsinsocialscienceresearch:aperspectivefromevaluationsofreformsofsocialwelfareprograms,”AmericanBehavioralScientist,47(5),506–40
Morgan,KariLockandDonaldB.Rubin,2012,“Rerandomizationtoimprovecovari-atebalanceinexperiments,”AnnalsofStatistics,40(2),1263–82.
Muller,SeánM.,2015,“Causalinteractionandexternalvalidity:obstaclestothepol-icyrelevanceofrandomizedevaluations,”WorldBankEconomicReview,29,S217–S225.
Orcutt,GuyH.andAliceG.Orcutt,1968,“Incentiveanddisincentiveexperimenta-tionforincomemaintenancepolicypurposes,”AmericanEconomicReview,58(4),754–72.
Pearl,JudeaandEliasBareinboim,2011,“Transportabilityofcausalandstatisticalrelations:aformalapproach,”Proceedingsofthe25thAAAIConferenceonArtificialIntelligence,AAAIPress,247-54,
Pearl, Judea and Elias Bareinboim, 2014, “External validity: from do-calculus to trans-portability across populations,” Statistical Science, 29(4), 579-95.
Rodrik,Dani,2006,personalemailcommunication.Rothwell,PeterM.,2005,“Externalvalidityofrandomizedcontrolledtrials:‘towhomdotheresultsofthetrialapply’”,Lancet,365,82–93.
Russell,Bertrand,2008[1912],Theproblemsofphilosophy,Rockville,MD.ArcManor.
Sackett,DavidL.,WilliamM.C.Rosenberg,J.A.MuirGray,R.BrianHaynesandW.ScottRichardson,1996,“Evidencebasedmedicine:whatitisandwhatitisn’t,”BritishMedicalJournal,312(January13),71–2.
Savage,LeonardJ.,1962,“Subjectiveprobabilityandstatisticalpractice,”inG.A.Bar-nardandD.R.Cox,eds.,TheFoundationsofStatisticalInference,London.Me-thuen.9-35.
Scriven,Michael,1974,“Evaluationperspectivesandprocedures,”inW.JamesPop-ham,ed.,Evaluationineducation—currentapplications,Berkeley,CA.McCutchanPublishingCorporation.
Senn,Stephen,1994,“Testingforbaselinebalanceinclinicaltrials,”StatisticsinMedicine,13,1715–26.
Senn,Stephen,2013,“Sevenmythsofrandomizationinclinicaltrials,”StatisticsinMedicine32,1439–50.
Shadish,WilliamR.,ThomasD.Cook,andDonaldT.Campbell,2002,Experimentalandquasi-experimentaldesignsforgeneralizedcausalinference,Boston,MA.HoughtonMifflin.
Simpson,Adrian,2017,“Themisdirectionofpublicpolicy:comparingandcombin-ingstandardisedeffectsizes,”JournalofEducationalPolicy32(4),450-66.
![Page 69: NBER WORKING PAPER SERIES UNDERSTANDING AND ...Lars Hansen, Jeff Hammer, Glenn Harrison, Macartan Humphreys, Michal Kolesár, Helen Milner, Tamlyn Munslow, Suresh Naidu, Lant Pritchett,](https://reader033.vdocuments.net/reader033/viewer/2022060100/60b128e9329a544c0969daf7/html5/thumbnails/69.jpg)
68
Stuart,ElizabethA.,StephenR.Cole,andCatharineP.Bradshaw,andPhilipJ.Leaf,2011,“Theuseofpropensityscorestoassessthegeneralizabilityofresultsfromrandomizedtrials,”JournaloftheRoyalStatisticalSocietyA,174(2),369–86.
Student(W.S.Gosset),1938,“Comparisonbetweenbalancedandrandomarrange-mentsoffieldplots,”Biometrika,29(3/4),363-78.
Svorencik,Andrej,2015,Theexperimentalturnineconomics:ahistoryofexperi-mentaleconomics,UtrechtSchoolofEconomics,DissertationSeries#29,http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2560026
Todd,PetraE.andKennethJ.Wolpin,2006,“Assessingtheimpactofaschoolsub-sidyprograminMexico:usingasocialexperimenttovalidateadynamicbehav-ioralmodelofchildschoolingandfertility,”AmericanEconomicReview,96(5),1384–1417.
Todd,PetraE.andKennethJ.Wolpin,2008,“Exanteevaluationofsocialprograms,”Annalesd’EconomieetdelaStatistique,91/92,263–91.
U.S.DepartmentofEducation,InstituteofEducationSciences,NationalCenterforEducationEvaluationandRegionalAssistance,2003,Identifyingandimplement-ingeducationalpracticessupportedbyrigorousevidence:auserfriendlyguide,Washington,DC.InstituteofEducationSciences.
Vandenbroucke,JanP.,2004,“Whenareobservationalstudiesascredibleasran-domizedcontrolledtrials?”TheLancet,363:1728–31.
Vandenbroucke,JanP..2009,“TheHRTcontroversy:observationalstudiesandRCTsfallinline,”TheLancet,373,1233-5.
Vivalt,Eva,2015,“Howmuchcanwegeneralizefromimpactevaluations?”NYU,un-published.http://evavivalt.com/wp-content/uploads/2014/10/Vivalt-JMP-10.27.14.pdf
White,Halbert,1980,“Aheteroskedasticity-consistentcovariancematrixestimatorandadirecttestforheteroskedasticity,”Econometrica,50(1),1–25.
Wise,DavidA.,1985,“Abehavioralmodelversusexperimentation:theeffectsofhousingsubsidiesonrent,”inP.BruckerandR.Pauly,eds.MethodsofOperationsResearch,50,VerlagAnonHain.441–89.
Wolpin,KennethI.,2013,Thelimitsofinferencewithouttheory,Camridge,MA.MITPress.
Worrall,John,2007,“Evidenceinmedicineandevidence-basedmedicine,”Philoso-phyCompass,2/6,981–1022.
Worrall,John,2008,“Evidenceandethicsinmedicine,”PerspectivesinBiologyandMedicine,51(3),418-31.
Yates,Frank,1939,“Thecomparativeadvantagesofsystematicandrandomizedar-rangementsinthedesignofagriculturalandbiologicalexperiments,”Biometrika,30(3/4),440-66.
Young,Alwyn,2016,“ChannelingFisher:randomizationtestsandthestatisticalin-significanceofseeminglysignificantexperimentalresults,”LondonSchoolofEco-nomics,WorkingPaper,Feb.
Ziliak,StephenT.,2014,“Balancedversusrandomizedfieldexperimentsineconom-ics:whyW.S.Gossetaka‘Student’matters,”ReviewofBehavioralEconomics,1,167–208.
Appendix: Monte Carlo experiment for an RCT with outliers

In this illustrative example, there is a parent population, each member of which has his or her own treatment effect; these are continuously distributed with a shifted lognormal distribution with zero mean, so that the population ATE is zero. The individual treatment effects β are distributed so that $\beta + e^{0.5} \sim \Lambda(0,1)$, for the standardized lognormal distribution Λ; because the mean of Λ(0,1) is $e^{0.5}$, subtracting it gives β a zero mean. In the absence of treatment, everyone in the sample records zero, so the sample average treatment effect in any one trial is simply the mean outcome among the n treatments. For values of n equal to 25, 50, 100, 200, and 500, we draw from the parent population 100 trial samples, each of size 2n; with five values of n, this gives us 500 trial samples in all. Because of sampling, the true ATEs in each trial sample will not be zero. For each of these 500 samples, we randomize into n controls and n treatments, estimate the ATE and its estimated t-value (using the standard two-sample t-value or, equivalently, by running a regression with robust t-values), and then repeat 1,000 times, so we have 1,000 ATE estimates and t-values for each of the 500 trial samples. These allow us to assess the distribution of ATE estimates and their nominal t-values for each trial.
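The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`draw_trial_sample`, `randomize_once`, `rejection_rate`) are hypothetical, and the two-sided critical value of 1.96 for a nominal 5 percent test is an assumption. Because every untreated outcome is zero, the control arm has mean and variance zero, so with equal arm sizes the two-sample t-value reduces to the treated-arm mean divided by its standard error.

```python
import math
import random
import statistics

SHIFT = math.exp(0.5)  # mean of a standard lognormal, so beta has mean zero


def draw_trial_sample(n, rng):
    """Draw 2n individual treatment effects beta, with beta + e^0.5 ~ Lambda(0, 1)."""
    return [rng.lognormvariate(0.0, 1.0) - SHIFT for _ in range(2 * n)]


def randomize_once(betas, n, rng):
    """Randomly assign n units to treatment; return (ATE estimate, t-value).

    Control outcomes are all zero (mean 0, variance 0), so the two-sample
    t-value is mean(treated) / sqrt(var(treated) / n).
    """
    treated = rng.sample(betas, n)
    ate_hat = statistics.fmean(treated)
    se = math.sqrt(statistics.variance(treated) / n)
    return ate_hat, ate_hat / se


def rejection_rate(n, n_samples, n_reps, rng, crit=1.96):
    """Share of randomizations with |t| > crit (crit=1.96 is an assumed cutoff)."""
    rejections = 0
    for _ in range(n_samples):
        betas = draw_trial_sample(n, rng)  # one fixed trial sample
        for _ in range(n_reps):            # re-randomize within that sample
            _, t = randomize_once(betas, n, rng)
            rejections += abs(t) > crit
    return rejections / (n_samples * n_reps)


if __name__ == "__main__":
    # Scaled down from the 100 samples x 1,000 randomizations in the text.
    rng = random.Random(0)
    for n in (25, 50, 100):
        print(n, rejection_rate(n, n_samples=20, n_reps=200, rng=rng))
```

With so few replications the printed rates are noisy; raising `n_samples` and `n_reps` toward the values in the text gives a closer counterpart to the design described above.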
The results are shown in Table A1. Each row corresponds to a sample size. In each row, we show the results of 100,000 individual trials, composed of 1,000 replications on each of the 100 trial (experimental) samples. The columns are averaged over all 100,000 trials.
Table A1: RCTs with skewed treatment effects

| Sample size (n per arm) | Mean of ATE estimates | Mean of nominal t-values | Fraction null rejected (percent) |
|---|---|---|---|
| 25 | 0.0268 | –0.4274 | 13.54 |
| 50 | 0.0266 | –0.2952 | 11.20 |
| 100 | –0.0018 | –0.2600 | 8.71 |
| 200 | 0.0184 | –0.1748 | 7.09 |
| 500 | –0.0024 | –0.1362 | 6.06 |

Note: 1,000 randomizations on each of 100 draws of the trial sample randomly drawn from a lognormal distribution of treatment effects shifted to have a zero mean.
The last column shows the fraction of times that the null, which is true in the population, is rejected in the trial samples; this is our key result. When there are only 50 treatments and 50 controls (row 2), the (true) null is rejected 11.2 percent of the time, instead of the 5 percent that we would like and expect if we were unaware of the problem. When there are 500 units in each arm, the rejection rate is 6.06 percent, much closer to the nominal 5 percent.
Figure A1: Estimates of an ATE with an outlier in the trial sample

Figure A1 illustrates the estimated ATEs from an extreme trial sample from the simulations in the second row with 100 observations in total; the histogram shows the 1,000 estimates of the ATE for that trial sample. This trial sample has a single large outlying treatment effect of 48.3; the mean (s.d.) of the other 99 observations is –0.51 (2.1). When the outlier is in the treatment group, we get the right-hand side of the figure; when it is in the control group, we get the left-hand side.
[Figure A1: histogram. Horizontal axis: 1,000 estimates of average treatment effect; vertical axis: density.]
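The location of the two clusters in Figure A1 can be checked with a little arithmetic. This is a rough sketch using the rounded figures quoted in the text, so the results are approximate; the symbols $\hat{\beta}$ (the ATE estimate) and $T$ (the set of treated units) are notation introduced here for illustration. Because untreated outcomes are zero, the estimate is the mean treatment effect among the 50 treated units, and the non-outlying treated units are a random draw from the other 99 observations, whose mean is –0.51.

$$
\begin{aligned}
E[\hat{\beta} \mid \text{outlier} \in T] &\approx \frac{48.3 + 49 \times (-0.51)}{50} = \frac{23.31}{50} \approx 0.47, \\
E[\hat{\beta} \mid \text{outlier} \notin T] &\approx -0.51 .
\end{aligned}
$$

Each case occurs with probability one half, so the average across randomizations is about $(0.47 - 0.51)/2 \approx -0.02$, which agrees with the mean of the full trial sample, $(48.3 + 99 \times (-0.51))/100 \approx -0.02$.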