epidemiology of exposure to mixtures: we can’t be casual ... · 1 epidemiology of exposure to...
Post on 09-Jul-2020
5 Views
Preview:
TRANSCRIPT
1
Epidemiologyofexposuretomixtures:wecan’tbecasualaboutcausality
whenusingortestingmethods
ThomasF.Webster1andMarcG.Weisskopf2
1DepartmentofEnvironmentalHealth,BostonUniversitySchoolofPublicHealth,
Boston,MA,USA
2DepartmentofEnvironmentalHealth,HarvardT.H.ChanSchoolofPublicHealth,
Boston,MA,USA
Correspondingauthor:
Dr.ThomasF.Webster
ORCiD:0000-0003-4896-9323
DepartmentofEnvironmentalHealth(T4W)
BostonUniversitySchoolofPublicHealth
715AlbanySt
Boston,MA02118USA
email:twebster@bu.edu
2
Abstract
Background:Thereisincreasinginterestinapproachesforanalyzingtheeffectof
exposuremixturesonhealth.Akeyissueishowtosimultaneouslyanalyzeoften
highlycollinearcomponentsofthemixture,whichcancreateproblemssuchas
confoundingbyco-exposureandco-exposureamplificationbias(CAB).Evaluation
ofnovelmixturesmethods,typicallyusingsyntheticdata,iscriticaltotheirultimate
utility.
Objectives:Thispaperaimstoanswertwoquestions.Howdocausalmodelsinform
theinterpretationofstatisticalmodelsandthecreationofsyntheticdatausedtotest
them?ArenovelmixturesmethodssusceptibletoCAB?
Methods:Weusedirectedacyclicgraphs(DAGs)andlinearmodelstoderiveclosed
formsolutionsformodelparameterstoexaminehowunderlyingcausal
assumptionsaffecttheinterpretationofmodelresults.
Results:Thesamebetacoefficientsestimatedbyastatisticalmodelcanhave
differentinterpretationsdependingontheassumedcausalstructure.Similarly,the
methodusedtosimulatedatacanhaveimplicationsfortheunderlyingDAG(and
viceversa),andthereforetheidentificationoftheparameterbeingestimatedwith
ananalyticapproach.Wedemonstratethatmethodsthatcanreproduceresultsof
linearregression,suchasBayesiankernelmachineregressionandthenewquantile
g-computationapproach,willbesubjecttoCAB.However,undersomeconditions,
estimatesofanoveralleffectofthemixtureisnotsubjecttoCABandevenhas
reduceduncontrolledbias.
3
Discussion:JustasDAGsencodeapriorisubjectmatterknowledgeallowing
identificationofvariablecontrolneededtoblockanalyticbias,werecommend
explicitlyidentifyingDAGsunderlyingsyntheticdatacreatedtoteststatistical
mixturesapproaches.Estimatesofthetotaleffectofamixtureisanimportantbut
relativelyunderexploredtopicthatwarrantsfurtherinvestigation.
4
Introduction
Ithaslongbeenrecognizedthathumansandecosystemsareexposedto
multiplechemicalandnon-chemicalagentsthatmayaffecthealth.While
environmentalepidemiologiststendtoanalyzeexposuresoneatatime,methodsfor
examiningexposuretomixturesofexposureshaverecentlyreceivedincreased
attention,includingconferencesandgrantscomparingapproaches(Carlinetal
2013,Tayloretal.2016,PRIME).Questionsofinteresttomixturesepidemiology
includeselectionofexposurevariablesthatcontributetotheoutcome,interactions,
jointeffectofthemixtureasawholeandconstructionofexposuresummary
measures.Importantissuesthatcomeupinthiscontextincludecollinearityand
confoundingbyco-exposures(Braunetal2016).
Causalityisofgreatconcerntoepidemiology,buttherehasbeenarevolution
incausalmethodsinthelastdecadeortwo.Theuseofdirectedacyclicgraphs
(DAGs)isonesuchpowerfulmethod(e.g.,HernanandRobins2020).Causal
methodsarenowwidelyseenascriticaltothedesignandanalysisofetiologic
epidemiologicstudies(exemptfromthisrulearedescriptivestudiesorpurely
predictivemodels,aslongasitisclearthattheirpurposeisnotetiologic
investigation).Inparticular,oneneedstohaveacausalmodelinmindwhen
interpretingepidemiologicresults.
However,causalityhassofarreceivedrelativelylittleattentioninmixtures
epidemiology(Weisskopfetal.2018,Keiletal2020).Analogouslytotheideathat
causalmodelsarenecessarytoevaluatepossibleconfoundingaswellascollider
bias(Hernánetal2002),wewillalsoargueherethatcausalmodelsarenecessary
5
whentestingandcomparingmethodsformixturesepidemiology.Thisincludes
constructionandanalysisofsyntheticdatathatareoftenusedforevaluationof
approacheswithoutclosedformsolutions,asistrueformanynovelmethods.
Correlationbetweenexposuresiscommonandcreatesbothepidemiologic
andstatisticalissues.Inthispaperweexploretheintersectionofmixtures
epidemiologyandcausalitybyexaminingtwoquestionsthatrelatetocollinearity.
1)Howisagivenmixturesepidemiologymethodaffectedbyincreasingcollinearity?
Inparticular,howshouldwecreatesyntheticdatatoanswerthisquestion?2)Are
newermixturesmethods,suchasBayesiankernelmachineregression(bkmr)
(Bobbetal2015)andquantileg-computation(qgcomp)(Keiletal2020),
susceptibletoco-exposureamplificationbias(CAB),whichwehavepreviously
describedasapotentialproblemwhenexaminingmultiplecorrelatedexposures
(Weisskopfetal.2018)?Beforeaddressingthesequestions,wewillbrieflyreview
causalmodelsandmixturesepidemiology.
1.Background:Causalmodelsandmixturesepidemiology
Tokeepthingssimple,wewillrestrictourselvestothesituationofone
outcome,twoexposuresandalimitednumberofconfounders.Exposuretomixtures
canobviouslybefarmorecomplex(e.g.,Weisskopfetal.2018).Whilethis
representsaverysimplemixturesproblem,interpretationcanstillbecomplicated.
Fornow,let’sconsidertwopossiblecausalmodels:
1a.Confoundingbyco-exposure
6
AsshownintheDAGinFigure1a,assumethatthetwoexposures,X1andX2,
arecorrelatedduetoanunknownorunmeasuredcauseU,e.g.,acommonsource.
TheDAG,iftrue,representsthecausalconnectionsbetweenvariables(thearrows).
ForareviewofDAGsandrelatedterminologysee(GlymourandGreenland2008;
HernanandRobins2020;Greenlandetal.1999).Here,bothX1andX2arecausally
associatedwiththeoutcomeY(Throughoutthispaperweassumenoeffectmeasure
modificationoftheserelationshipsorinteractions,whicharetypicallynotreflected
inDAGs).Supposethatweareinterestedinthecontributionofeachmixture
componenttotheoutcome.ThecrudeX1-YassociationisconfoundedbyX2because
inadditiontothecausalpathwayfromX1toY,thereisalsoanopenpathwayfrom
X1toYviaX2.Similarly,theX2-YassociationisconfoundedbyX1.Wecallthis
problemconfoundingbyco-exposure.Atypicalsolutionistomutuallyadjustfor
bothX1andX2,e.g.,putbothinaregressionmodel.ForthisDAG,mutual
adjustmentproducesunbiasedresults.TheX1-Yassociationisunbiasedbecause
adjustingforX2blocksthepathwayfromX1toYviaX2,andviceversa.These
conclusionsdonotdependonthemathematicalformoftherelationships
underlyingthelinksbetweenvariables(Greenlandetal1999).
Themagnitudeofthecrudeandmutuallyadjustedeffectestimateshave
straightforwardclosed-formsolutionswhenthevariablesarecontinuous,the
associationsarelinear,andweapplylinearregression.Letb1bethecausal
coefficientlinkingX1toY,i.e.,theincreaseinYcausedbyaoneunitincreaseinX1,
andsimilarlyforb2linkingX2toY.Forsimplicity,we’llassumethatthevariables
arestandardized(centeredwithunitvariances).Letρbethecorrelationcoefficient
7
forX1andX2(combiningthetwocausalcoefficientsb3andb4linkingUtoX1and
X2.NotethattheotherpathwaybetweenX1andX2viaYisblockedbecauseYisa
collider).Table1showsthecrudeandadjustedregressioncoefficients(see
Weisskopfetal.2018forderivations).
1b.Co-exposureamplificationbias
Nowlet’sexamineanotherpossibleDAGforX1,X2andY.Figure1b
describesasituationcalledco-exposureamplificationbias(CAB)(Weisskopfetal.
2018),avariationofz-amplificationbias(e.g.,Pearl2010).X1andX2arecorrelated
asbeforebecauseofU,butonlyX1iscausallyassociatedwithY.Supposethatthere
isalsoanunknown(orunmeasured)variableU’thatconfoundsthecrudeX1-Y
association.Forexample,thistypeofsituationmightoccurwhenX1isabiomarker
measuredinserumandU’isaphysiologicalfactorthataffectsboththebiomarker
X1andtheoutcome.Inthisexample,thecrudeX1-YassociationisconfoundedbyU’,
butnotbyco-exposureX2.ThecrudeX2-YassociationisconfoundedbyX1,butnot
byU’asthatpathwayisblockedbythecolliderX1.
ThenaturaltendencyisagaintomutuallyadjustforX1andX2,particularlyif
wedon’tknowthatU’exists.Whathappens?ItturnsoutthattheX1-Yassociation
(adjustedforX2)canbemorebiasedthanthecrudeestimate(Weisskopfetal.
2018).Meanwhile,adjustingforX1eliminatestheconfoundingbyco-exposurefor
theX2-Yassociation(byblockingthatpathway),butleadstoconfoundingbyU’
(openingthepathwaythroughU’byconditioningonthecolliderX1).
8
Wecanquantifytheresultswhentheassociationsarelinearandweapply
linearregression,againassumingstandardizedvariables.Aspreviouslyshown
(Weisskopfetal.2018),thebiasinthemutuallyadjustedX1-Yassociationisalways
amplifiedby1/(1-ρ2)comparedtothecruderesult.TheX2-Yassociationisalso
typicallybiased.Forexample,ifallfourcoefficients(c1,c2,c3,andρ=c4c5here)are
positive,thentheX2-Yassociationswitchessignfrompositiveforthecrude
associationtonegativeforthemutuallyadjustedassociation.SeeTable2.
2.EquivalencyofdataandregressionresultsbetweendifferentDAGs
Thetwoexamplesillustratetheimportanceofhavingacausalmodelinmind
whenanalyzingdataandinterpretingresults.Withthesamethreevariables(X1,X2,
Y),mutualadjustmenteliminatesbiasintheconfoundingbyco-exposurecase,while
mutualadjustmentcanmakebiasworseintheCABcase.Importantly,onecannot
tellfromtheregressionresultswhichoftheseDAGs(ormanyothers)iscorrect.
Subjectareaspecificinformationisrequired.
Indeed,thesameunderlyingdatacanbeconsistentwithbothDAGs,i.e.,both
DAGscangenerateexactlythesamecrudeandadjustedregressionresults.
ComparisonofTables1and2showthecorrespondencebetweenparameters.
Equatingtheadjustedregressionresults,
β1|2 = b1 = c1 +
c2c3
1− ρ 2 (1a)
β2|1 = b2 = -c2c3ρ
1− ρ 2
⎛
⎝⎜
⎞
⎠⎟ (1b)
9
wedirectlyobtainthecausalcoefficientsforfigure1a(b1,b2)fromthoseforfigure
1b(c1,c2,c3).Somealgebra(notshown)allowsonetoexpress(c1,c2c3)asafunction
of(b1,b2);notethatonlytheproductc2c3isdetermined(andnotc2andc3
individually).
c1 = b1 +
b2
ρ (2a)
c2c3 = −b21− ρ 2
ρ
⎛
⎝⎜
⎞
⎠⎟ (2b)
Thismeansthattheregressionresultβ1doesn’ttelluswhetherweare
obtainingthetruecausalestimatethatwewant(b1)underDAG1a,orabiased
estimate(c1+c2c3/(1-ρ2))ofthecausalestimatewewant(c1),underDAG1b,and
similarlyforβ2.
Thesamecorrespondences(equations1,2)areobtainedusingthecrude
results.(ToderiveequivalentparametersbetweenDAGs,theunitvariance
assumptionsmaysometimesneedtoberelaxed).
ThereareofcourseotherDAGsconsistentwithtwoexposurevariablesand
oneoutcomevariable,manymoreifoneallowsthenumberofunknownor
unmeasuredvariablestoincrease.Clearly,regressioncannotinandofitselftellus
thecorrectunderlyingDAG.Wecannotdeterminefromdataalonethebestwayto
analyzedataorinterpretresults.Thisistrueforbothrealworlddataandsynthetic
data.
3.Generationofsyntheticmixturesepidemiologydata
10
Collinearityandconfoundingbyco-exposurearetwooftheissuesfrequently
mentionedinmixturesepidemiology.Forexample,inFigure1a,wemightwonder
abouttheeffectofhighcorrelationbetweenX1andX2onbiasineffectestimatesor
selectionofvariables.Forexample,ifb2=0,howwouldourabilitytoselectonlyX1
beaffectedbyincreasingρ?Insomesituationsandforsomestatisticalmethodswe
cananswerthesequestionsmathematically.Inothercases,wewouldtypically
generatesyntheticdataandcomparetheresultswiththetruemodelusedto
generatethedata.Butinthiscase,wemustpayattentiontothewaywesimulatethe
data.
SupposewewanttogeneratesyntheticdatacorrespondingtoFig1a,
confoundingbyco-exposure,foralinearmodelwithparametersb1,b2,andρ=b3b4.
Let’sfurtherassume,forsimplicity,thatX1andX2arestandardnormal(these
assumptionscanberelaxedifdesired,e.g.,assuminglog-normaldataorcorrelated
quantiledata,e.g.,Keiletal2020).Atleasttwomethodshavebeenused.
3a.Method1:
Followthesestepstosimulatedataforalinearmodelforthescenariodescribedby
Figure1a,buildingdirectlyfromtheDAG:
1)Pickvaluesforthecausalcoefficientsb1,b2,b3,b4aswellasthesample
sizen.
2)GeneratenvaluesofUassumingastandardnormaldistribution.
3)GenerateX1andX2usingthefollowingequations:
11
X1= b3U +δ1X 2 = b4U +δ2
(3)
whereδ1andδ2areerrorterms.IntermsofDAGs,thisisequivalenttoaddingnew
variablesδ1andδ2thatpointonlyintoX1andX2,respectively,asinFig1c;such
variablesaretypicallyomittedinDAGsbecausetheyarenotacauseofmorethan
onevariable(andthereforecannotintroducestructuralbias).Asdiscussedinthe
SupplementalMaterial,weincludetheseerrorterms(withappropriatevariances)
sothatX1andX2arestandardnormalwithρ=b3b4equaltotheircorrelation
coefficient.
4)GeneratetheexpectedvaluesofYusingthefollowingequation:
E[Y ]= b1X1 +b2X 2 (4)
ThetwotermscorrespondtothetwoarrowspointingdirectlyintoYfromX1and
X2.Forsimplicity,weomittheconstanttermin(4).
5)ObtainthevectorYofoutcomedatawithsomeaddednoise,byaddingto
(4)avectorofnormalerrortermsε(centeredatzero)withvarianceσ2:
Y = E[Y ]+ε = b1X1 +b2X 2 +ε (5)
Again,thiscorrespondstoanewvariableεpointingintoY(seeFig1c).Ifwealso
wantYtohaveaspecifiedvariance,e.g.,V(Y)=1,thenweapplythestandard
equationforthevarianceofsumsto(5)andrearrange,obtaining
V (ε) =V (Y )−b12 −b2
2 − 2ρb1b2 (6)
Sincevariancesmustbenon-negative,thisplacesconstraintsontheparametersρ,
b1,b2.NotethatV(ε)dependsonρ.
12
3b.Method2:
Thereisasecondapproachthatcansometimessimplifythesimulation(e.g.,
Carricoetal2015,Czarnotaetal2015).Supposewewanttosimulatedataforthe
DAGinfigure1a,assumingthatX1andX2aremultivariatenormalwithcorrelation
ρandtheunderlyinglinearmodelforYasin(4).ThenYisalsonormal,i.e.,(X1,X2,
Y)aremultivariatenormal.Withthisinsight,onecangenerateasetofsynthetic
dataforFigure1aasfollows:
1)Pickvaluesforρ,b1,b2aswellasthesamplesizen.
2)Computethecrudecorrelationcoefficientsbetweenthethreevariables.
ThecorrelationcoefficientforX1-X2isofcourseρ.Whenvariancesequalone,the
correlationsforX1-YandX2-YareequaltothecrudecoefficientsinTable1:r1y=b1+
ρb2andr2y=b2+ρb1.
3)Createthesymmetricvariance-covariancematrixforthethreevariables
(X1,X2,Y):
Σ =
1 ρ r1yρ 1 r2 yr1y r2 y 1
⎡
⎣
⎢⎢⎢⎢
⎤
⎦
⎥⎥⎥⎥
(7)
WehaveassumedunitvariancesforX1,X2andY,butthiscanberelaxed.Notethat
Σmustbepositivedefinite.
4)Generatethemultivariatenormaldata(X1,X2,Y)directlyusing(7),forexample
byusingthemvrnormfunctioninR.
3c.Hybridmethods
13
Inpractice,athirdapproachforcreatingsyntheticdataisoftenusedthat
combineselementsofbothmethods1and2.Inourexampleformethod1above,we
firstcreatedUandthenbuiltX1andX2fromthat,includingtheextraerrortermsδ1
andδ2.ThisextraeffortisnotnecessaryifthegoalistoobtainX1andX2as
bivariatenormal.Hence,onemightuseamethod2approachtosimulateX1andX2
andthenamethod1approachtosimulateY.Ouropinionisthatitdoesn’treally
matteraslongasoneisclearabouttheDAGbeingsimulatedandtheadditional
assumptionsneededforturningaqualitativeDAGintoadataset.Toguide
constructionofsimulationsandinterpretationofresultswerecommend
researchersalwaysincludeaDAGandexplicitlyindicatingwithreferencetothe
DAGwhichparameter(s)theyareattemptingtorecover(asthe“truth”)withtheir
mixturesanalysismeasure.
3d.Calculatingbiasusingsyntheticdata
Tonumericallytestastatisticalmethod,e.g.,linearregression,onewould
typicallygenerateadataset(usingoneofthesemethod),runlinearregression—
perhapsbothcrudeandmutuallyadjusted—savetheresults,andrepeat,say1000
times.Tocalculatebiasoftheindividualregressionestimatesassumingfigure1a,
averagetheX1-Yestimatesofthe1000datasetsandsubtractthetruevalueofb1,
andsimilarlyforX2-Yusingb2.Thevarianceoftheregressioncoefficientscanalso
becomputed.
Methods1and2canalsobeusedtogeneratedataforfigure1b.Hereone
wouldpickparametersc1,c2c3,andn.Amethod1approachmightfirstsimulateU
14
andU’,usetheseresultstosimulateX1andX2,andfinallyconstructY.Amethod2
approachwouldusethecrudecoefficientsfromTable2forr1yandr2y.Tocalculate
biasoftheregressionestimatesforFigure1b,averagetheX1-Yestimatesofthe
simulateddatasetsandsubtractc1,andsimilarlyforX2-Y,wherethetruevalueis
zero.
Notethatmethod1ismoreflexibleandcanaccommodatenonlinear
functions.Method2isoftensimplerwhentherequiredassumptionsaremet,often
allowingonetoskipconstructionofextravariables,e.g.,U,U’.Butforasinglesetof
parametersandalinearmodel,bothmethodscaninprinciplegenerateexactlythe
samedata,dataconsistentwitheitherFigure1aor1b.
4.Howisagivenmixturesepidemiologymethodaffectedbyincreasing
collinearity?Howshouldwecreatesyntheticdatatoanswerthisquestion?
Wecannowaddressthequestionofgeneratingsyntheticdatatoexamine
collinearity.Thegeneraltopicofthepatternofcorrelationsfoundinexposuredata
isbeyondthescopeofthispaper(Weisskopfetal.2018,Webster2018).Forour
purposeshere,itissufficienttonotethatcorrelationsbetweenexposurestypically
varyfromzerotonearlyoneandtendtoformblocksofcorrelatedvariables;
negativecorrelationscanalsooccurbutdonotappeartobeascommon.Thehigh
degreeofcorrelationshownbysomeexposuresraisesthespectreofcollinearity,a
standardissueforregression.
Hence,theeffectofcollinearityisapotentiallyimportantstatisticalissuefor
mixturesmethods.Indeed,itisoneofthemotivationsforsomenovelmethodssuch
15
asweightedquantilesumregression(WQS)(Carricoetal2015).Itiswellknown
thatinlinearregression,highdegreesofcorrelationdonotleadtobiasedregression
coefficients—biasedinthestatisticalsenseofdifferencefromtruthonaverage—but
increasethesizeofstandarderrorsandconfidenceintervals(e.g.,Schistermanetal
2017).Theeffectsofcollinearityonnovelmixturesmethodsarelesswellknown
andoftenlessamenabletotheoreticalanalysis.
SupposewewanttoexaminecollinearityfortheDAGinFigure1a,
confoundingbyco-exposure.Morespecifically,wewanttodeterminetheeffectof
increasingthecorrelationbetweenX1andX2(ρ)whilekeepingthecausal
coefficientslinkingX1toY(b1)andX2toY(b2)fixed.Wehypothesizethatmany
researchershavethistypeofmodelinmindwheninvestigatingeffectsof
collinearity,evenifnotexplicitlystated.
WeearlierdescribedmethodsforsimulatingalinearmodelforFigure1a.To
simulateincreasingcollinearity,wemight,forexample,usethehybridmethodto
generateaseriesofdatasets,keepingb1,b2thesame,whileincreasingρ(As
discussedearlier,somecaremaybeneededwithrespecttoV(ε)ifvarianceswillbe
examined).Toexaminebias,wewouldthencomparetheaverageX1-YandX2-Y
associationswithb1andb2,respectively.
Aswesawearlier,methods1,2andthehybridapproachcangeneratethe
sameresultsforanysingledataset.Onemightthereforebetemptedtousemethod
2,increasingρ,butkeepingr1yandr2ythesame.However,recallthatfortheDAGin
Figure1a(andcorrespondingTable1),r1yisthecrudeassociationbetweenX1and
Y,similarlyforX2:
16
r1y = b1 + ρb2 (7a)
r2 y = b2 + ρb1 (7b)
Theseequationstellusthatforfixedb1andb2,changingρwillchanger1yandr2y.
Putanotherway,changingρwhilefixingr1yandr2ymeansthatb1andb2have
changedfromthoseinitiallyspecifiedinstep1ofMethod2above.Butthen
comparingregressionresultswiththeoriginalb1andb2toexaminebiasdoesn’t
makesense.Theproblemisthatwehavechangedallofthecausalcoefficientsin
Figure1a(b1,b2andρ=b3b4)ratherthanjustthosegoverningtheX1-X2correlation
(orwe’veabandonedtheDAGinFigure1a—seebelow).Infact,inordertokeepthe
r1yandr2yfixedwhilechangingρ,oneofb1orb2willmovehigherandtheother
lower—aphenomenonreferredtoasthereversalparadoxthatisoften
misinterpretedasastatisticaleffectonregressionbetaestimateswhenputting
correlatedvariablesinamodeltogetherratherthanaconsequenceofhaving
(usuallyunwittingly)changedtheunderlyingcausalparametersbeingmodeled
(e.g.,Tuetal2008,Vatchevaetal2016).Asimilarproblemwouldbeassociated
withtheDAGinFig1b.Supposewewanttofixc1,c2,c3,andonlychangeρ(=c4c5).
Then,asshowninTable2,r1yandr2y(whicharetheequationsinthecrudecolumn)
change.
Figures2aand2bshowtwoDAGswherechangingthecausalcoefficients
governingtheX1-X2correlationdoesallowr1yandr2ytoremainfixed(thereare
otherpossibilities,e.g.,ahybridoffigures1aand2a).Examinationoffigure2a,
nicknamed“threehandles”becausetherearethreeunknownvariables(U,U’,U’’),
showsthatr1y(=c3c4)isinsulatedfromchangesinρ(=c1c2)bythecollidersX1and
17
X2,andsimilarlyforr2y(=c5c6).CrudeanalysisoftheX1-YandX2-Yassociations
usinglinearregressionyieldsc3c4andc5c6,respectively,andcanbeseentobe
independentofρ(=c1c2).Amixturesmethodthatrecoversthesevalueswould,ina
certainsense,berecoveringthecorrectvalues,althoughtheydonotreflectcausal
X1-YandX2-YassociationsbecausetheyareactuallyduetoconfoundingbyU’and
U’’.Mutualadjustmentproducesothernon-causalestimates(Table3)asit
conditionsonthecollidersX1andX2.Forexample,foracrudeanalysisofX1-Y,the
onlyopenpathwayisX1çU’èY.AdjustingforX2opensuptheadditionalpathway
X1çUèX2çU’’èY.SimilarlogicappliesforanalysisofX2.
AnalternativeDAGthatwouldallowforρtobechangedwiththerremaining
fixedisoneofreversecausation(Figure2b).Inthiscase,theonlyopenpathwayina
crudeanalysisoftheX1-YassociationisX1çYsinceX2isacollider,andviceversa.
Again,assuminglinearmodelsandunitvariance,theDAGshowsthat
r1y = c3 (8a)
r2 y = c4 (8b)
ρ = c1c2 + c3c4 (8c)
wheretheciarethecausalcoefficientsinFigure2bandρistheX1-X2correlation.
Equation(8)showsthatwecanchangeρ(bychangingc1c2)whilekeepingother
causalcoefficientsand,therefore,r1yandr2yfixed.Inthisexample,crudeanalysisof
theX1-YandX2-Yassociationsyieldsc3andc4,buttheyreflectreversecausality.
MutualadjustmentconditionsoncollidersX1andX2andsointroducesanon-causal
pathwayintoestimationofeachparameter(notshown).
Summingup,usingmethod2togeneratesyntheticdatabyincreasingρ=b3b4
andfixingtheriymeansthatweareeitherchangingothercausalcoefficients(b1,b2)
18
inFigure1a,orthatweareusingadifferentDAG(e.g.,figure2a).Apossibleexample
ofthisistheevaluationofWQSandothermethodsbyCarricoetal.(2015).While
thesimulationscenariosincludemorethantwoexposures,theyinvolvechanging
thecorrelationsbetweenexposureswhilefixingtheriy(withsomeequaltozero)or
changingtheriywhilefixingthecorrelationsbetweenexposures—ascenariothatis
possibleunderaDAGlikethatinfigure2a,butnot1a.Ratherthancalculatebias,
theydeterminesensitivityandspecificityofdetectinganassociationwiththe
outcome,treatingnon-zeroriyastruth.However,fortheDAGinfigure2a,crude
regressionanalysisprovidesthe“correct”answers–i.e.,thecruderiy,which,
however,arenon-causalassociations–whileadjustedregressionanalysis(i.e.,
includingallexposuresinthemodelasWQSdoes)estimatesresultsdifferentfrom
theoriginalriywithadditionalbiasduetoconditioningoncolliders.Indeed,Carrico
etal.foundthatcrudeordinaryregressionhadgoodsensitivityandspecificitywhen
theriywerelarge,whichisexactlywhatonewouldexpect.Thissimulation
procedureisnotconsistentwiththeconfoundingbyco-exposureDAG(Figure1a,
whereadjustedanalysesshouldprovidethecorrectanswer),unlesstheunderlying
causalcoefficientslinkingexposureandoutcomes(bi)change.Incontrast,method1
orthehybridmethodmakesitstraightforwardtofixb1andb2inFigure1awhile
increasingρ.
Noneofthismeansthatmethod2forgeneratingsyntheticdataiswrong.It
does,however,meanthatitisimportanttohaveaDAGexplicitlyinmindwhen
generatingsyntheticdataandinterpretingregressionresultsintermsofwhether
theyarereflectingthecausalparameterswehopedtoestimateorconfounded
19
associations.Thisisparticularlyimportantwhenexaminingtheeffectsofincreased
collinearitybetweenexposures.
5.Aremoresophisticatedmixturesmethodssusceptibletoco-exposure
amplificationbias(CAB)?
Asdiscussedinourearlierpaper(Weisskopfetal.2018),itisanopen
questionwhethermixturesmethodsotherthanlinearregressionarealsosubjectto
co-exposureamplificationbias(CAB).Formanysuchmethods,onecannoteasily
writeclosedformsolutionsakintoTables1-3.Thestandardapproachtoanswering
thisquestionwouldbetocreatesyntheticdatabasedontheDAGforCABandtest
themethods.
Butthereisanotherconceptualapproachbasedonthefactthatdifferent
DAGscangeneratethesamedata.ThiscorrespondenceofDAGsmeansthatany
mixturesmethodthatcanreplicatetheresultsoflinearregressionforindividual
componentsinfigure1a(mutualconfoundingbyco-exposure)willalsobesubject
toCAB(figure1b).
Forexample,withsmallamountsoferrorandtherightdegreeofsmoothing,
smoothingmethods(e.g.,splines,generalizedadditivemodels)canveryclosely
reproducelinearregressionoflinearmodels.Eventhoughsmoothingmethodsdon’t
directlyprovideregressioncoefficients,thesmoothsthencloselyapproximate
linearregressionandtheunderlyingmodel.Thus,bkmrandsimilarsmoothing
approachesshouldalsobesubjecttoCAB.
20
Therecentlyproposedquantileg-computation(qgcomp)method(Keiletal
2020)isamoreinterestingcase.Wewillrestrictourdiscussiontothesituations
we’vediscussedinthispaper:theDAGswe’vedescribedwithunderlyinglinear
modelsandnoeffectmeasuremodificationorinteractionsandtime-fixed
exposures.Theunderlyingcausalmethod,g-computation,shouldthenyieldbeta
estimatesidenticaltolinearregression.Indeed,itwouldtypicallyusealinear
regressionmodelasthefirststep.Thus,g-computationshouldbesusceptibleto
CAB.qgcompconvertsdataintoquantilesbeforeanalysis,butonalreadyquantiled
data,gcomputationshouldagainyieldthesamebetacoefficientsforindividual
exposuresaslinearregression.
Onedifferenceinqgcompistheestimationoftheoveralleffectofthemixture
(italsocomputesweightsforcomponents).Fortheconfoundingbyco-exposure
DAG(fig.1a)withthelinearmodelofequation(5),itshouldonexpectationproduce
theoverallcausaleffectofthemixture
ψ = b1 +b2 (9)
i.e.,thesumoftheunderlyingcausalcoefficients(Keiletal2020).Notably,qgcomp
doesnotassumethatb1andb2havethesamesign.FortheCABDAG(fig.1b),orthe
reparameterizationof1bintermsof1a,thetruevalueofψisjustc1.AsU’is
unknownandthereforeomitted(violatingtheassumptionofthecausalmodel),the
expectedoveralleffectisfoundbyaddingthebetacoefficientsintherightcolumnof
Table2:
ψ = c1 +c2c31− ρ 2
⎛
⎝⎜
⎞
⎠⎟+ −c2c3
ρ1− ρ 2
⎛
⎝⎜
⎞
⎠⎟= c1 +
c2c31+ ρ
(10)
21
Fornocorrelationbetweenexposures(ρ=0),weobtainc1+c2c3(sincethenU
disappearsinFig.1b)buttheconfoundingbyU’(c1c2)remains.Thebiasofthetotal
effectofthemixtureψisjustequation10minusthetruecausalvaluec1:
ψbias = c2c31+ ρ
(11)
Forpositivecorrelationsbetweenexposures,theabsolutevalueofthebiasofψ
decreasesasweincreaseρ.Thisoccursbecauseeventhougheachindividualbeta
coefficientbecomesmorebiased(duetoCAB),thetwobetacoefficientshave
oppositesignsandpartlycanceleachotherout.NotonlyistherenoCABforthe
totaleffectofthemixture(ψ),butthebiasfromtheuncontrolledconfounding(byU’
inFigure1b)isreducedascorrelationincreases.Thesameconclusionholdstruefor
theestimateofψderivedfromlinearregressionbyaddingthetwobetacoefficients
(Notethatqgcomphasotherusefulpropertiescomparedtolinearregression(Keil
etal2020)).However,fornegativecorrelationsbetweenexposures,theabsolute
valueofthebiasofψ,calculatedusingeithermethod,increasesasρbecomesmore
negative.Estimatesofthetotaleffectofamixturemayhaveadvantages,especially
sincenegativecorrelationsbetweenexposuresappeartoberarerthanpositive
correlations(Webster2018).Moreresearchisneededonthesusceptibilityofother
mixturesmethodssuchasWQStoCABaswellasthepropertiesofmeasuresoftotal
effect.
6.Conclusion
22
Insummary,whenresultsfromagivenanalysisareconsidered,precise
assumptionsabouttheunderlyingdatastructure—whichareoftenrepresentedina
DAG—mustbemadewhentryingtomakecausalinferencesfromtheresults.While
analysisresultscanruleoutsomeDAGs,anumberofDAGscanbeconsistentwith
thesameresults.Thus,onecannotinfertheDAGfromthedata.Thisalsohas
implicationsforthegenerationofsyntheticdata,whichareveryusefulfortesting
thecapabilitiesofnewmixturesmethods.Thereareanumberofwaysthatsuch
syntheticdatamaybecreated,butwestronglyrecommendthatresearchers
explicitlyuseaDAGwhendoingso.Finally,weusetheequivalencyofDAGsto
provideatestofwhethernovelmixturesmethodsaresubjecttoco-exposure
amplificationbias.Ourresultsindicatethatbkmrissusceptibletothisproblem,
whileestimatesofthetotaleffectofamixture,obtainedvialinearregressionor
gqcomp,maysometimesavoidit.Estimatesofthetotaleffectofamixtureisan
importantbutrelativelyunderexploredtopicthatwarrantsfurtherinvestigation.
Acknowledgements:
ThisworkwassupportedbyNIEHHgrantR01ES028800.
References
BobbJF,ValeriL,ClausHennB,ChristianiDC,WrightRO,MazumdarM,etal.
Bayesiankernelmachineregressionforestimatingthehealtheffectsofmulti-
pollutantmixtures.Biostatistics2015;16:493–508;
doi:10.1093/biostatistics/kxu058.
23
BraunJM,GenningsC,HauserR,WebsterTF.WhatcanEpidemiologicalStudiesTell
UsabouttheImpactofChemicalMixturesonHumanHealth?EnvironHealth
Perspect2016;124:A6-9
CarlinDJ,RiderCV,WoychikR,BirnbaumLS.Unravelingthehealtheffectsof
environmentalmixtures:anNIEHSpriority.EnvironHealthPerspect.2013;
121(1):A6-8.
CarricoC,GenningsC,WheelerDC,Factor-LitvakP.Characterizationofweighted
quantilesumregressionforhighlycorrelateddatainariskanalysissetting.J
AgricBiolEnvironStat2015;20:100-120.
CzarnotaJ,GenningsC,WheelerDC.Assessmentofweightedquantilesum
regressionformodelingchemicalmixturesandcancerrisk.CancerInform2015;
14:159–171
GlymourM,GreenlandS.CausalDiagrams.In:ModernEpidemiology,3rdedition,
Part2nd(RothmanKJ,GreenlandS,LashT,eds).Philadelphia,PA:Wolters
KluwerHealth/LippincottWilliams&Wilkins,2008.
GreenlandS,PearlJ,RobinsJM.Causaldiagramsforepidemiologicresearch.
Epidemiology1999;10(1):37-48
HernánM,Hernández-DíazS,WerlerMM,MitchellAA.Causalknowledgeasa
prerequisiteforconfoundingevaluation.Anapplicationtobirthdefects
epidemiology.AmerJEpidemiol2002;155:176-84
HernanMA,RobinsJM.CausalInference.BocaRaton:Chapman&Hall/CRC,
forthcoming.2020
24
KeilAP,BuckleyJP,O’BrienKM,FergusonKK,ZhaoS,WhiteAJ.Aquantile-basedg-
computationapproachtoaddressingtheeffectsofexposuremixtures.Environ
HealthPerspect2020;128(4):047004-1
PearlJ.Onaclassofbias-amplifyingvariablesthatendangereffectestimates.In:
ProceedingsoftheProceedingsoftheTwenty-SixthConferenceonUncertainty
inArtificialIntelligence(UAI2010),2010.Corvallis,OR,(GrünwaldP,SpirtesP
eds).AssociationforUncertaintyinArtificialIntelligence,425–432.
PoweringResearchThroughInnovativeMethodsforMixturesinEpidemiology
(PRIME)Program.NIEHS.
https://www.niehs.nih.gov/research/supported/exposure/mixtures/prime_pro
gram/index.cfm(accessed10April2020)
SchistermanEF,PerkinsNJ,MumfordSL,AhrensKA,MitchellEM.Collinearityand
causaldiagrams–alessonontheimportanceofmodelspecification.
Epidemiology.2017;28(1):47–53.
TaylorKW,JoubertBR,BraunJM,DilworthC,GenningsC,HauserR,etal.Statistical
ApproachesforAssessingHealthEffectsofEnvironmentalChemicalMixturesin
Epidemiology:LessonsfromanInnovativeWorkshop.EnvironHealthPerspect.
2016;124(12):A227-A229.
TuY-K,GunnellD,GilthorpeMS.Simpson'sParadox,Lord'sParadox,and
SuppressionEffectsarethesamephenomenon–thereversalparadox.Emerging
ThemesinEpidemiology2008,5:2
VatchevaKP,LeeM,McCormickJB,RahbarMH.MulticollinearityinRegression
AnalysesConductedinEpidemiologicStudies.Epidemiology(Sunnyvale).2016;
25
6(2)doi:10.4172/2161-1165.1000227.
WebsterTF.Mixtures:ContrastingPerspectivesfromToxicologyand
Epidemiology.In:RiderC.,SimmonsJ.(eds)ChemicalMixturesandCombined
ChemicalandNonchemicalStressors.Springer,Cham,2018.
WeisskopfMG,SealsRM,WebsterTF.Biasamplificationinepidemiologicanalysisof
exposuretomixtures.EnvironHealthPerspect2018;126(4):047003
26
Table1.LinearregressionresultsforDAG1a:confoundingbyco-exposure(ρ=b3b4)
Association Crude(riy) Mutuallyadjustedβ1(forX1-Y) b1+ρb2 b1β2(forX2-Y) b2+ρb1 b2
Table2.LinearregressionresultsforDAG1b:co-exposureamplificationbias(ρ=b3b4)
Association Crude(riy) Mutuallyadjustedβ1(forX1-Y) c1+c2c3 c1+c2c3/(1-ρ2)β2(forX2-Y) ρc1 -c2c3ρ/(1-ρ2)
Table3.LinearregressionresultsforDAG2a“3handles”(ρ=c1c2)
Association Crude(riy) Mutuallyadjustedβ1(forX1-Y) c3c4 (c3c4-ρc5c6)/(1-ρ2)β2(forX2-Y) c5c6 (c5c6-ρc3c4)/(1-ρ2)
27
Figure1.A)DAGforconfoundingbyco-exposurewithcorrelationbetweenexposuresρ=b3b4.B)DAGforco-exposureamplificationbias,ρ=c4c5.C)DAGforconfoundingbyco-exposurewithnoise(errors)explicitlyshown,ρ=b3b4.
U
X1
X2
Yb3
b1
b2b4
A.
U
X1
X2
Yb3 b1
b2
ε
δ1
δ1
b4
C.
U
X1
X2
Yc1
U’c3c2
c4
c5
B.
28
Figure2.TwoDAGsthatallowtheX1-X2correlationtobechangewhilefixingtheX1-YandX2-Ycorrelations:A)“3handles”(ρ=c1c2,r1y=c3c4,r2y=c5c6).B)Reversecausation(ρ=c1c2+c3c4,r1y=c3,r2y=c4).Thereareotherpossibilities,e.g.,addingtoFigure2athecausalX1-YandX2-YlinksfromFigure1a.
top related