Epidemiology of exposure to mixtures: we can’t be casual about causality when using or testing methods

Thomas F. Webster 1 and Marc G. Weisskopf 2

1 Department of Environmental Health, Boston University School of Public Health, Boston, MA, USA
2 Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA

Corresponding author:
Dr. Thomas F. Webster
ORCiD: 0000-0003-4896-9323
Department of Environmental Health (T4W)
Boston University School of Public Health
715 Albany St
Boston, MA 02118 USA
email: [email protected]



Abstract

Background: There is increasing interest in approaches for analyzing the effect of exposure mixtures on health. A key issue is how to simultaneously analyze often highly collinear components of the mixture, which can create problems such as confounding by co-exposure and co-exposure amplification bias (CAB). Evaluation of novel mixtures methods, typically using synthetic data, is critical to their ultimate utility.

Objectives: This paper aims to answer two questions. How do causal models inform the interpretation of statistical models and the creation of synthetic data used to test them? Are novel mixtures methods susceptible to CAB?

Methods: We use directed acyclic graphs (DAGs) and linear models to derive closed form solutions for model parameters to examine how underlying causal assumptions affect the interpretation of model results.

Results: The same beta coefficients estimated by a statistical model can have different interpretations depending on the assumed causal structure. Similarly, the method used to simulate data can have implications for the underlying DAG (and vice versa), and therefore the identification of the parameter being estimated with an analytic approach. We demonstrate that methods that can reproduce results of linear regression, such as Bayesian kernel machine regression and the new quantile g-computation approach, will be subject to CAB. However, under some conditions, estimates of an overall effect of the mixture are not subject to CAB and even have reduced uncontrolled bias.

Discussion: Just as DAGs encode a priori subject matter knowledge allowing identification of variable control needed to block analytic bias, we recommend explicitly identifying DAGs underlying synthetic data created to test statistical mixtures approaches. Estimation of the total effect of a mixture is an important but relatively underexplored topic that warrants further investigation.


Introduction

It has long been recognized that humans and ecosystems are exposed to multiple chemical and non-chemical agents that may affect health. While environmental epidemiologists tend to analyze exposures one at a time, methods for examining exposure to mixtures have recently received increased attention, including conferences and grants comparing approaches (Carlin et al. 2013, Taylor et al. 2016, PRIME). Questions of interest to mixtures epidemiology include selection of exposure variables that contribute to the outcome, interactions, the joint effect of the mixture as a whole, and construction of exposure summary measures. Important issues that come up in this context include collinearity and confounding by co-exposures (Braun et al. 2016).

Causality is of great concern to epidemiology, and there has been a revolution in causal methods in the last decade or two. The use of directed acyclic graphs (DAGs) is one such powerful method (e.g., Hernan and Robins 2020). Causal methods are now widely seen as critical to the design and analysis of etiologic epidemiologic studies (exempt from this rule are descriptive studies or purely predictive models, as long as it is clear that their purpose is not etiologic investigation). In particular, one needs to have a causal model in mind when interpreting epidemiologic results.

However, causality has so far received relatively little attention in mixtures epidemiology (Weisskopf et al. 2018, Keil et al. 2020). Analogously to the idea that causal models are necessary to evaluate possible confounding as well as collider bias (Hernán et al. 2002), we will also argue here that causal models are necessary when testing and comparing methods for mixtures epidemiology. This includes construction and analysis of synthetic data that are often used for evaluation of approaches without closed form solutions, as is true for many novel methods.

Correlation between exposures is common and creates both epidemiologic and statistical issues. In this paper we explore the intersection of mixtures epidemiology and causality by examining two questions that relate to collinearity. 1) How is a given mixtures epidemiology method affected by increasing collinearity? In particular, how should we create synthetic data to answer this question? 2) Are newer mixtures methods, such as Bayesian kernel machine regression (bkmr) (Bobb et al. 2015) and quantile g-computation (qgcomp) (Keil et al. 2020), susceptible to co-exposure amplification bias (CAB), which we have previously described as a potential problem when examining multiple correlated exposures (Weisskopf et al. 2018)? Before addressing these questions, we will briefly review causal models and mixtures epidemiology.

1. Background: Causal models and mixtures epidemiology

To keep things simple, we will restrict ourselves to the situation of one outcome, two exposures and a limited number of confounders. Exposure to mixtures can obviously be far more complex (e.g., Weisskopf et al. 2018). While this represents a very simple mixtures problem, interpretation can still be complicated. For now, let’s consider two possible causal models:

1a. Confounding by co-exposure

As shown in the DAG in Figure 1a, assume that the two exposures, X1 and X2, are correlated due to an unknown or unmeasured cause U, e.g., a common source. The DAG, if true, represents the causal connections between variables (the arrows). For a review of DAGs and related terminology see Glymour and Greenland (2008), Hernan and Robins (2020) and Greenland et al. (1999). Here, both X1 and X2 are causally associated with the outcome Y. (Throughout this paper we assume no effect measure modification of these relationships or interactions, which are typically not reflected in DAGs.) Suppose that we are interested in the contribution of each mixture component to the outcome. The crude X1-Y association is confounded by X2 because in addition to the causal pathway from X1 to Y, there is also an open pathway from X1 to Y via X2. Similarly, the X2-Y association is confounded by X1. We call this problem confounding by co-exposure. A typical solution is to mutually adjust for both X1 and X2, e.g., put both in a regression model. For this DAG, mutual adjustment produces unbiased results. The X1-Y association is unbiased because adjusting for X2 blocks the pathway from X1 to Y via X2, and vice versa. These conclusions do not depend on the mathematical form of the relationships underlying the links between variables (Greenland et al. 1999).

The magnitudes of the crude and mutually adjusted effect estimates have straightforward closed-form solutions when the variables are continuous, the associations are linear, and we apply linear regression. Let b1 be the causal coefficient linking X1 to Y, i.e., the increase in Y caused by a one unit increase in X1, and similarly for b2 linking X2 to Y. For simplicity, we’ll assume that the variables are standardized (centered with unit variances). Let ρ be the correlation coefficient for X1 and X2, which combines the two causal coefficients b3 and b4 linking U to X1 and X2 (ρ = b3b4). Note that the other pathway between X1 and X2 via Y is blocked because Y is a collider. Table 1 shows the crude and adjusted regression coefficients (see Weisskopf et al. 2018 for derivations).
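The closed-form results for Figure 1a can be checked numerically. The sketch below is illustrative only: the function names and the parameter values are assumptions, not taken from the paper. It uses the crude coefficients r1y = b1 + ρb2 and r2y = b2 + ρb1 given for standardized variables, together with the standard two-predictor least-squares formula, to show that mutual adjustment returns b1 and b2.

```python
def fig1a_correlations(b1, b2, rho):
    """Crude X1-Y and X2-Y coefficients implied by Figure 1a (unit variances)."""
    return b1 + rho * b2, b2 + rho * b1


def crude_and_adjusted(r1y, r2y, rho):
    """Crude and mutually adjusted slopes for two standardized exposures."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    denom = 1.0 - rho ** 2
    adj1 = (r1y - rho * r2y) / denom   # X1-Y slope adjusted for X2
    adj2 = (r2y - rho * r1y) / denom   # X2-Y slope adjusted for X1
    return {"crude": (r1y, r2y), "adjusted": (adj1, adj2)}


# Hypothetical parameter values chosen only for illustration.
b1, b2, rho = 0.5, 0.25, 0.5
r1y, r2y = fig1a_correlations(b1, b2, rho)
result = crude_and_adjusted(r1y, r2y, rho)
print(result)
```

With these values the crude coefficients differ from b1 and b2 by ρb2 and ρb1 respectively (the confounding by co-exposure), while the adjusted coefficients equal the causal coefficients.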

1b. Co-exposure amplification bias

Now let’s examine another possible DAG for X1, X2 and Y. Figure 1b describes a situation called co-exposure amplification bias (CAB) (Weisskopf et al. 2018), a variation of z-amplification bias (e.g., Pearl 2010). X1 and X2 are correlated as before because of U, but only X1 is causally associated with Y. Suppose that there is also an unknown (or unmeasured) variable U’ that confounds the crude X1-Y association. For example, this type of situation might occur when X1 is a biomarker measured in serum and U’ is a physiological factor that affects both the biomarker X1 and the outcome. In this example, the crude X1-Y association is confounded by U’, but not by co-exposure X2. The crude X2-Y association is confounded by X1, but not by U’ as that pathway is blocked by the collider X1.

The natural tendency is again to mutually adjust for X1 and X2, particularly if we don’t know that U’ exists. What happens? It turns out that the X1-Y association (adjusted for X2) can be more biased than the crude estimate (Weisskopf et al. 2018). Meanwhile, adjusting for X1 eliminates the confounding by co-exposure for the X2-Y association (by blocking that pathway), but leads to confounding by U’ (opening the pathway through U’ by conditioning on the collider X1).

We can quantify the results when the associations are linear and we apply linear regression, again assuming standardized variables. As previously shown (Weisskopf et al. 2018), the bias in the mutually adjusted X1-Y association is always amplified by a factor of 1/(1-ρ²) compared to the crude result. The X2-Y association is also typically biased. For example, if all four coefficients (c1, c2, c3, and ρ = c4c5 here) are positive, then the X2-Y association switches sign from positive for the crude association to negative for the mutually adjusted association. See Table 2.
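A small numerical sketch can make the amplification concrete. Table 2 is not reproduced here, so the crude expressions below (c1 + c2c3 for X1-Y and ρc1 for X2-Y) are derived under the unit-variance assumption rather than copied from the table; the function names and parameter values are likewise hypothetical. Only the product c2c3 enters the formulas.

```python
def adjusted_from_crude(r1y, r2y, rho):
    """Two-predictor least-squares slopes for standardized exposures."""
    denom = 1.0 - rho ** 2
    return (r1y - rho * r2y) / denom, (r2y - rho * r1y) / denom


def cab_table(c1, c2c3, rho):
    """Crude and adjusted coefficients for Figure 1b (assumed unit variances)."""
    crude1 = c1 + c2c3   # causal effect of X1 plus confounding by U'
    crude2 = rho * c1    # X2 has no effect of its own; it only tracks X1
    adj1, adj2 = adjusted_from_crude(crude1, crude2, rho)
    return {
        "crude": (crude1, crude2),
        "adjusted": (adj1, adj2),
        "bias_crude": (crude1 - c1, crude2 - 0.0),
        "bias_adjusted": (adj1 - c1, adj2 - 0.0),
    }


# Hypothetical values: all coefficients positive, as in the text's example.
table = cab_table(c1=0.3, c2c3=0.2, rho=0.5)
print(table)
```

In this example the bias in the X1-Y estimate grows from c2c3 to c2c3/(1-ρ²) after adjustment, and the X2-Y estimate changes from positive to negative, matching the behaviour described above.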

2. Equivalency of data and regression results between different DAGs

The two examples illustrate the importance of having a causal model in mind when analyzing data and interpreting results. With the same three variables (X1, X2, Y), mutual adjustment eliminates bias in the confounding by co-exposure case, while mutual adjustment can make bias worse in the CAB case. Importantly, one cannot tell from the regression results which of these DAGs (or many others) is correct. Subject area specific information is required.

Indeed, the same underlying data can be consistent with both DAGs, i.e., both DAGs can generate exactly the same crude and adjusted regression results. Comparison of Tables 1 and 2 shows the correspondence between parameters. Equating the adjusted regression results,

$$\beta_{1|2} = b_1 = c_1 + \frac{c_2 c_3}{1-\rho^2} \qquad (1a)$$

$$\beta_{2|1} = b_2 = -c_2 c_3 \left( \frac{\rho}{1-\rho^2} \right) \qquad (1b)$$

we directly obtain the causal coefficients for Figure 1a (b1, b2) from those for Figure 1b (c1, c2, c3). Some algebra (not shown) allows one to express (c1, c2c3) as a function of (b1, b2); note that only the product c2c3 is determined (and not c2 and c3 individually).

$$c_1 = b_1 + \frac{b_2}{\rho} \qquad (2a)$$

$$c_2 c_3 = -b_2 \left( \frac{1-\rho^2}{\rho} \right) \qquad (2b)$$

This means that the regression result β1 doesn’t tell us whether we are obtaining the true causal estimate that we want (b1) under DAG 1a, or a biased estimate (c1 + c2c3/(1-ρ²)) of the causal estimate we want (c1) under DAG 1b, and similarly for β2.

The same correspondences (equations 1, 2) are obtained using the crude results. (To derive equivalent parameters between DAGs, the unit variance assumptions may sometimes need to be relaxed.)
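The algebra linking equations (1) and (2) is short. The following sketch assumes ρ ≠ 0 and |ρ| < 1. The final two lines check the crude results, assuming the crude entries for Figure 1b are c1 + c2c3 and ρc1; these follow from the unit-variance assumption and are not copied from Table 2.

```latex
% Solve (1b) for the product c_2 c_3:
b_2 = -\,c_2 c_3\,\frac{\rho}{1-\rho^2}
\;\Longrightarrow\;
c_2 c_3 = -\,b_2\,\frac{1-\rho^2}{\rho}                      % (2b)

% Substitute into (1a):
c_1 = b_1 - \frac{c_2 c_3}{1-\rho^2} = b_1 + \frac{b_2}{\rho} % (2a)

% Check against the crude coefficients of Figure 1a:
c_1 + c_2 c_3 = b_1 + \frac{b_2}{\rho} - \frac{b_2 (1-\rho^2)}{\rho} = b_1 + \rho\, b_2
\rho\, c_1 = \rho\, b_1 + b_2
```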

There are of course other DAGs consistent with two exposure variables and one outcome variable, many more if one allows the number of unknown or unmeasured variables to increase. Clearly, regression cannot in and of itself tell us the correct underlying DAG. We cannot determine from data alone the best way to analyze data or interpret results. This is true for both real world data and synthetic data.

3. Generation of synthetic mixtures epidemiology data

Collinearity and confounding by co-exposure are two of the issues frequently mentioned in mixtures epidemiology. For example, in Figure 1a, we might wonder about the effect of high correlation between X1 and X2 on bias in effect estimates or selection of variables. For instance, if b2 = 0, how would our ability to select only X1 be affected by increasing ρ? In some situations and for some statistical methods we can answer these questions mathematically. In other cases, we would typically generate synthetic data and compare the results with the true model used to generate the data. But in this case, we must pay attention to the way we simulate the data.

Suppose we want to generate synthetic data corresponding to Fig 1a, confounding by co-exposure, for a linear model with parameters b1, b2, and ρ = b3b4. Let’s further assume, for simplicity, that X1 and X2 are standard normal (these assumptions can be relaxed if desired, e.g., assuming log-normal data or correlated quantile data, e.g., Keil et al. 2020). At least two methods have been used.

3a. Method 1:

Follow these steps to simulate data for a linear model for the scenario described by Figure 1a, building directly from the DAG:

1) Pick values for the causal coefficients b1, b2, b3, b4 as well as the sample size n.

2) Generate n values of U assuming a standard normal distribution.

3) Generate X1 and X2 using the following equations:

$$X_1 = b_3 U + \delta_1, \qquad X_2 = b_4 U + \delta_2 \qquad (3)$$

where δ1 and δ2 are error terms. In terms of DAGs, this is equivalent to adding new variables δ1 and δ2 that point only into X1 and X2, respectively, as in Fig 1c; such variables are typically omitted in DAGs because they are not a cause of more than one variable (and therefore cannot introduce structural bias). As discussed in the Supplemental Material, we include these error terms (with appropriate variances) so that X1 and X2 are standard normal with ρ = b3b4 equal to their correlation coefficient.

4) Generate the expected values of Y using the following equation:

$$E[Y] = b_1 X_1 + b_2 X_2 \qquad (4)$$

The two terms correspond to the two arrows pointing directly into Y from X1 and X2. For simplicity, we omit the constant term in (4).

5) Obtain the vector Y of outcome data with some added noise, by adding to (4) a vector of normal error terms ε (centered at zero) with variance σ²:

$$Y = E[Y] + \varepsilon = b_1 X_1 + b_2 X_2 + \varepsilon \qquad (5)$$

Again, this corresponds to a new variable ε pointing into Y (see Fig 1c). If we also want Y to have a specified variance, e.g., V(Y) = 1, then we apply the standard equation for the variance of sums to (5) and rearrange, obtaining

$$V(\varepsilon) = V(Y) - b_1^2 - b_2^2 - 2\rho b_1 b_2 \qquad (6)$$

Since variances must be non-negative, this places constraints on the parameters ρ, b1, b2. Note that V(ε) depends on ρ.
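The five steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the error variances 1 - b3² and 1 - b4² are an assumption derived from requiring unit variance for X1 and X2 (the Supplemental Material that specifies them is not reproduced here).

```python
import math
import random


def simulate_method1(b1, b2, b3, b4, n, var_y=1.0, seed=None):
    """Simulate (X1, X2, Y) for Figure 1a by building directly from the DAG."""
    if abs(b3) > 1.0 or abs(b4) > 1.0:
        raise ValueError("b3 and b4 must lie in [-1, 1] for unit-variance exposures")
    rng = random.Random(seed)
    rho = b3 * b4
    # Assumed error standard deviations so that V(X1) = V(X2) = 1.
    sd_delta1 = math.sqrt(1.0 - b3 ** 2)
    sd_delta2 = math.sqrt(1.0 - b4 ** 2)
    # Equation (6): residual variance that gives V(Y) = var_y.
    var_eps = var_y - b1 ** 2 - b2 ** 2 - 2.0 * rho * b1 * b2
    if var_eps < 0.0:
        raise ValueError("equation (6) gives a negative V(epsilon)")
    sd_eps = math.sqrt(var_eps)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                          # step 2
        x1 = b3 * u + rng.gauss(0.0, sd_delta1)          # step 3, equation (3)
        x2 = b4 * u + rng.gauss(0.0, sd_delta2)
        y = b1 * x1 + b2 * x2 + rng.gauss(0.0, sd_eps)   # steps 4-5, equation (5)
        rows.append((x1, x2, y))
    return rows


def sample_corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)
```

For a large n, the sample correlation of X1 and X2 should be close to b3b4, and the sample variance of Y close to the requested V(Y). The explicit check on equation (6) makes the parameter constraint visible rather than failing silently.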

3b. Method 2:

There is a second approach that can sometimes simplify the simulation (e.g., Carrico et al. 2015, Czarnota et al. 2015). Suppose we want to simulate data for the DAG in Figure 1a, assuming that X1 and X2 are multivariate normal with correlation ρ and the underlying linear model for Y as in (4). Then Y is also normal, i.e., (X1, X2, Y) are multivariate normal. With this insight, one can generate a set of synthetic data for Figure 1a as follows:

1) Pick values for ρ, b1, b2 as well as the sample size n.

2) Compute the crude correlation coefficients between the three variables. The correlation coefficient for X1-X2 is of course ρ. When variances equal one, the correlations for X1-Y and X2-Y are equal to the crude coefficients in Table 1: r1y = b1 + ρb2 and r2y = b2 + ρb1.

3) Create the symmetric variance-covariance matrix for the three variables (X1, X2, Y):

$$\Sigma = \begin{bmatrix} 1 & \rho & r_{1y} \\ \rho & 1 & r_{2y} \\ r_{1y} & r_{2y} & 1 \end{bmatrix} \qquad (7)$$

We have assumed unit variances for X1, X2 and Y, but this can be relaxed. Note that Σ must be positive definite.

4) Generate the multivariate normal data (X1, X2, Y) directly using (7), for example by using the mvrnorm function in R.
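For readers without R at hand, the same steps can be sketched with the Python standard library. This is an illustrative stand-in for mvrnorm, not a port of it: it draws independent standard normals and multiplies them by a hand-written Cholesky factor of Σ. The function names are hypothetical. The Cholesky step also serves as the positive-definiteness check mentioned in step 3.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor; raises if matrix is not positive definite."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("Sigma is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_method2(b1, b2, rho, n, seed=None):
    """Draw (X1, X2, Y) from the multivariate normal defined by equation (7)."""
    rng = random.Random(seed)
    r1y = b1 + rho * b2          # step 2: crude coefficients from Table 1
    r2y = b2 + rho * b1
    sigma = [[1.0, rho, r1y],    # step 3: equation (7)
             [rho, 1.0, r2y],
             [r1y, r2y, 1.0]]
    lower = cholesky(sigma)
    rows = []
    for _ in range(n):           # step 4: correlated draws = L times z
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        rows.append(tuple(sum(lower[i][k] * z[k] for k in range(i + 1))
                          for i in range(3)))
    return rows


def sample_corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)
```

Parameter choices for which Σ is not positive definite (for example, b1 and b2 so large that an implied correlation would exceed one) raise an error instead of producing data.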

3c. Hybrid methods

In practice, a third approach for creating synthetic data is often used that combines elements of both methods 1 and 2. In our example for method 1 above, we first created U and then built X1 and X2 from that, including the extra error terms δ1 and δ2. This extra effort is not necessary if the goal is to obtain X1 and X2 as bivariate normal. Hence, one might use a method 2 approach to simulate X1 and X2 and then a method 1 approach to simulate Y. Our opinion is that it doesn’t really matter as long as one is clear about the DAG being simulated and the additional assumptions needed for turning a qualitative DAG into a data set. To guide construction of simulations and interpretation of results, we recommend that researchers always include a DAG and explicitly indicate, with reference to the DAG, which parameter(s) they are attempting to recover (as the “truth”) with their mixtures analysis measure.

3d. Calculating bias using synthetic data

To numerically test a statistical method, e.g., linear regression, one would typically generate a data set (using one of these methods), run linear regression (perhaps both crude and mutually adjusted), save the results, and repeat, say, 1000 times. To calculate bias of the individual regression estimates assuming Figure 1a, average the X1-Y estimates of the 1000 data sets and subtract the true value of b1, and similarly for X2-Y using b2. The variance of the regression coefficients can also be computed.
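The repeat-and-average procedure can be sketched as follows for Figure 1a. This is a minimal standard-library illustration under stated assumptions: data are generated with a hybrid approach (bivariate normal exposures, then equation (5) for Y with V(Y) = 1), the regression is solved directly from the normal equations, and all function names are hypothetical.

```python
import math
import random


def simulate_fig1a(b1, b2, rho, n, rng):
    """Hybrid simulation: bivariate normal (X1, X2), then Y from equation (5)."""
    sd_eps = math.sqrt(1.0 - b1 ** 2 - b2 ** 2 - 2.0 * rho * b1 * b2)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y = b1 * x1 + b2 * x2 + rng.gauss(0.0, sd_eps)
        rows.append((x1, x2, y))
    return rows


def fit_slopes(rows):
    """Crude and mutually adjusted least-squares slopes (intercept included)."""
    n = len(rows)
    means = [sum(r[k] for r in rows) / n for k in range(3)]

    def cross(i, j):
        return sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)

    s11, s22, s12 = cross(0, 0), cross(1, 1), cross(0, 1)
    s1y, s2y = cross(0, 2), cross(1, 2)
    det = s11 * s22 - s12 ** 2
    adjusted = ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)
    return {"crude": (s1y / s11, s2y / s22), "adjusted": adjusted}


def estimate_bias(b1, b2, rho, n=200, reps=1000, seed=0):
    """Average the estimates over reps data sets and subtract the true values."""
    rng = random.Random(seed)
    totals = {"crude": [0.0, 0.0], "adjusted": [0.0, 0.0]}
    for _ in range(reps):
        fit = fit_slopes(simulate_fig1a(b1, b2, rho, n, rng))
        for kind in totals:
            totals[kind][0] += fit[kind][0]
            totals[kind][1] += fit[kind][1]
    return {kind: (t[0] / reps - b1, t[1] / reps - b2)
            for kind, t in totals.items()}
```

Under Figure 1a, the adjusted biases should be close to zero, while the crude biases should be close to ρb2 and ρb1. The key design point is that the "truth" subtracted in the last step is tied to a specific DAG.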

Methods 1 and 2 can also be used to generate data for Figure 1b. Here one would pick parameters c1, c2c3, and n. A method 1 approach might first simulate U and U’, use these results to simulate X1 and X2, and finally construct Y. A method 2 approach would use the crude coefficients from Table 2 for r1y and r2y. To calculate bias of the regression estimates for Figure 1b, average the X1-Y estimates of the simulated data sets and subtract c1, and similarly for X2-Y, where the true value is zero.

Note that method 1 is more flexible and can accommodate nonlinear functions. Method 2 is often simpler when the required assumptions are met, often allowing one to skip construction of extra variables, e.g., U, U’. But for a single set of parameters and a linear model, both methods can in principle generate exactly the same data, data consistent with either Figure 1a or 1b.

4. How is a given mixtures epidemiology method affected by increasing collinearity? How should we create synthetic data to answer this question?

We can now address the question of generating synthetic data to examine collinearity. The general topic of the pattern of correlations found in exposure data is beyond the scope of this paper (see Weisskopf et al. 2018, Webster 2018). For our purposes here, it is sufficient to note that correlations between exposures typically vary from zero to nearly one and tend to form blocks of correlated variables; negative correlations can also occur but do not appear to be as common. The high degree of correlation shown by some exposures raises the spectre of collinearity, a standard issue for regression.

Hence, the effect of collinearity is a potentially important statistical issue for mixtures methods. Indeed, it is one of the motivations for some novel methods such as weighted quantile sum regression (WQS) (Carrico et al. 2015). It is well known that in linear regression, high degrees of correlation do not lead to biased regression coefficients (biased in the statistical sense of difference from truth on average) but increase the size of standard errors and confidence intervals (e.g., Schisterman et al. 2017). The effects of collinearity on novel mixtures methods are less well known and often less amenable to theoretical analysis.
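For the two-exposure linear model this statement can be written down explicitly. The expression below is the standard large-sample least-squares result for standardized exposures. It assumes the model of equation (5) is correctly specified, i.e., Figure 1a holds, with σ² = V(ε) and sample size n.

```latex
% Mutually adjusted estimate of b_1 with two standardized exposures:
E\!\left[\hat{\beta}_{1|2}\right] = b_1,
\qquad
\operatorname{Var}\!\left(\hat{\beta}_{1|2}\right) \approx \frac{\sigma^2}{n\,(1-\rho^2)}
```

The expectation does not depend on ρ, but the variance grows as |ρ| approaches one. At ρ = 0.9, for example, the variance is about 5.3 times that at ρ = 0, so the standard error is roughly 2.3 times larger.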

Suppose we want to examine collinearity for the DAG in Figure 1a, confounding by co-exposure. More specifically, we want to determine the effect of increasing the correlation between X1 and X2 (ρ) while keeping the causal coefficients linking X1 to Y (b1) and X2 to Y (b2) fixed. We hypothesize that many researchers have this type of model in mind when investigating effects of collinearity, even if not explicitly stated.

We earlier described methods for simulating a linear model for Figure 1a. To simulate increasing collinearity, we might, for example, use the hybrid method to generate a series of data sets, keeping b1 and b2 the same while increasing ρ. (As discussed earlier, some care may be needed with respect to V(ε) if variances will be examined.) To examine bias, we would then compare the average X1-Y and X2-Y associations with b1 and b2, respectively.

As we saw earlier, methods 1, 2 and the hybrid approach can generate the same results for any single data set. One might therefore be tempted to use method 2, increasing ρ, but keeping r1y and r2y the same. However, recall that for the DAG in Figure 1a (and corresponding Table 1), r1y is the crude association between X1 and Y, and similarly for X2:

$$r_{1y} = b_1 + \rho b_2 \qquad (7a)$$

$$r_{2y} = b_2 + \rho b_1 \qquad (7b)$$

These equations tell us that for fixed b1 and b2, changing ρ will change r1y and r2y. Put another way, changing ρ while fixing r1y and r2y means that b1 and b2 have changed from those initially specified in step 1 of Method 2 above. But then comparing regression results with the original b1 and b2 to examine bias doesn’t make sense. The problem is that we have changed all of the causal coefficients in Figure 1a (b1, b2 and ρ = b3b4) rather than just those governing the X1-X2 correlation (or we’ve abandoned the DAG in Figure 1a; see below). In fact, in order to keep r1y and r2y fixed while changing ρ, the implied b1 and b2 must shift, typically with one moving higher and the other lower. This phenomenon, referred to as the reversal paradox, is often misinterpreted as a statistical effect on regression beta estimates of putting correlated variables in a model together, rather than as a consequence of having (usually unwittingly) changed the underlying causal parameters being modeled (e.g., Tu et al. 2008, Vatcheva et al. 2016). A similar problem would be associated with the DAG in Fig 1b. Suppose we want to fix c1, c2, c3, and only change ρ (= c4c5). Then, as shown in Table 2, r1y and r2y (which are the equations in the crude column) change.
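The point can be illustrated by inverting equations (7a) and (7b): for fixed r1y and r2y, each value of ρ implies a different pair (b1, b2) under Figure 1a. The sketch below uses hypothetical values and a hypothetical function name; the inversion itself is just the solution of the two linear equations.

```python
def implied_coefficients(r1y, r2y, rho):
    """Solve equations (7a) and (7b) for (b1, b2) given r1y, r2y and rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    denom = 1.0 - rho ** 2
    b1 = (r1y - rho * r2y) / denom
    b2 = (r2y - rho * r1y) / denom
    return b1, b2


# Hold the crude correlations fixed (illustrative values) and vary rho.
r1y, r2y = 0.5, 0.1
implied = {rho: implied_coefficients(r1y, r2y, rho) for rho in (0.0, 0.3, 0.6)}
for rho, (b1, b2) in implied.items():
    print(f"rho={rho:.1f}  implied b1={b1:.4f}  implied b2={b2:.4f}")
```

In this example the implied b1 rises and the implied b2 falls and changes sign as ρ increases, even though r1y and r2y never change. A simulation that fixes r1y and r2y is therefore simulating a different set of causal coefficients at each value of ρ.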

Figures 2a and 2b show two DAGs where changing the causal coefficients governing the X1-X2 correlation does allow r1y and r2y to remain fixed (there are other possibilities, e.g., a hybrid of Figures 1a and 2a). Examination of Figure 2a, nicknamed “three handles” because there are three unknown variables (U, U’, U’’), shows that r1y (= c3c4) is insulated from changes in ρ (= c1c2) by the colliders X1 and X2, and similarly for r2y (= c5c6). Crude analysis of the X1-Y and X2-Y associations using linear regression yields c3c4 and c5c6, respectively, and can be seen to be independent of ρ (= c1c2). A mixtures method that recovers these values would, in a certain sense, be recovering the correct values, although they do not reflect causal X1-Y and X2-Y associations because they are actually due to confounding by U’ and U’’. Mutual adjustment produces other non-causal estimates (Table 3) as it conditions on the colliders X1 and X2. For example, for a crude analysis of X1-Y, the only open pathway is X1 ← U’ → Y. Adjusting for X2 opens up the additional pathway X1 ← U → X2 ← U’’ → Y. Similar logic applies for analysis of X2.

An alternative DAG that would allow for ρ to be changed with the r remaining fixed is one of reverse causation (Figure 2b). In this case, the only open pathway in a crude analysis of the X1-Y association is X1 ← Y since X2 is a collider, and vice versa. Again, assuming linear models and unit variance, the DAG shows that

$$r_{1y} = c_3 \qquad (8a)$$

$$r_{2y} = c_4 \qquad (8b)$$

$$\rho = c_1 c_2 + c_3 c_4 \qquad (8c)$$

where the ci are the causal coefficients in Figure 2b and ρ is the X1-X2 correlation. Equation (8) shows that we can change ρ (by changing c1c2) while keeping other causal coefficients and, therefore, r1y and r2y fixed. In this example, crude analysis of the X1-Y and X2-Y associations yields c3 and c4, but they reflect reverse causality. Mutual adjustment conditions on colliders X1 and X2 and so introduces a non-causal pathway into estimation of each parameter (not shown).

Summing up, using method 2 to generate synthetic data by increasing ρ = b3b4 and fixing the riy means that we are either changing other causal coefficients (b1, b2) in Figure 1a, or that we are using a different DAG (e.g., Figure 2a). A possible example of this is the evaluation of WQS and other methods by Carrico et al. (2015). While the simulation scenarios include more than two exposures, they involve changing the correlations between exposures while fixing the riy (with some equal to zero) or changing the riy while fixing the correlations between exposures, a scenario that is possible under a DAG like that in Figure 2a, but not 1a. Rather than calculate bias, they determine sensitivity and specificity of detecting an association with the outcome, treating non-zero riy as truth. However, for the DAG in Figure 2a, crude regression analysis provides the “correct” answers (i.e., the crude riy, which are nonetheless non-causal associations), while adjusted regression analysis (i.e., including all exposures in the model as WQS does) estimates results different from the original riy with additional bias due to conditioning on colliders. Indeed, Carrico et al. found that crude ordinary regression had good sensitivity and specificity when the riy were large, which is exactly what one would expect. This simulation procedure is not consistent with the confounding by co-exposure DAG (Figure 1a, where adjusted analyses should provide the correct answer), unless the underlying causal coefficients linking exposure and outcomes (bi) change. In contrast, method 1 or the hybrid method makes it straightforward to fix b1 and b2 in Figure 1a while increasing ρ.

None of this means that method 2 for generating synthetic data is wrong. It does, however, mean that it is important to have a DAG explicitly in mind when generating synthetic data and interpreting regression results in terms of whether they are reflecting the causal parameters we hoped to estimate or confounded associations. This is particularly important when examining the effects of increased collinearity between exposures.

5. Are more sophisticated mixtures methods susceptible to co-exposure amplification bias (CAB)?

As discussed in our earlier paper (Weisskopf et al. 2018), it is an open question whether mixtures methods other than linear regression are also subject to co-exposure amplification bias (CAB). For many such methods, one cannot easily write closed form solutions akin to Tables 1-3. The standard approach to answering this question would be to create synthetic data based on the DAG for CAB and test the methods.

But there is another conceptual approach based on the fact that different DAGs can generate the same data. This correspondence of DAGs means that any mixtures method that can replicate the results of linear regression for individual components in Figure 1a (mutual confounding by co-exposure) will also be subject to CAB (Figure 1b).

For example, with small amounts of error and the right degree of smoothing, smoothing methods (e.g., splines, generalized additive models) can very closely reproduce linear regression of linear models. Even though smoothing methods don’t directly provide regression coefficients, the smooths then closely approximate linear regression and the underlying model. Thus, bkmr and similar smoothing approaches should also be subject to CAB.

The recently proposed quantile g-computation (qgcomp) method (Keil et al. 2020) is a more interesting case. We will restrict our discussion to the situations we’ve discussed in this paper: the DAGs we’ve described with underlying linear models, no effect measure modification or interactions, and time-fixed exposures. The underlying causal method, g-computation, should then yield beta estimates identical to linear regression. Indeed, it would typically use a linear regression model as the first step. Thus, g-computation should be susceptible to CAB. qgcomp converts data into quantiles before analysis, but on already quantiled data, g-computation should again yield the same beta coefficients for individual exposures as linear regression.

One difference in qgcomp is the estimation of the overall effect of the mixture (it also computes weights for components). For the confounding by co-exposure DAG (Fig. 1a) with the linear model of equation (5), it should on expectation produce the overall causal effect of the mixture

$$\psi = b_1 + b_2 \qquad (9)$$

i.e., the sum of the underlying causal coefficients (Keil et al. 2020). Notably, qgcomp does not assume that b1 and b2 have the same sign. For the CAB DAG (Fig. 1b), or the reparameterization of 1b in terms of 1a, the true value of ψ is just c1. As U’ is unknown and therefore omitted (violating the assumption of the causal model), the expected overall effect is found by adding the beta coefficients in the right column of Table 2:

$$\psi = \left( c_1 + \frac{c_2 c_3}{1-\rho^2} \right) + \left( -c_2 c_3 \frac{\rho}{1-\rho^2} \right) = c_1 + \frac{c_2 c_3}{1+\rho} \qquad (10)$$

For no correlation between exposures (ρ = 0), we obtain c1 + c2c3: U then disappears from Fig. 1b, but the confounding by U’ (c2c3) remains. The bias of the total effect of the mixture ψ is just equation 10 minus the true causal value c1:

$$\psi_{bias} = \frac{c_2 c_3}{1+\rho} \qquad (11)$$

For positive correlations between exposures, the absolute value of the bias of ψ decreases as we increase ρ. This occurs because even though each individual beta coefficient becomes more biased (due to CAB), the biases in the two beta coefficients have opposite signs and partly cancel each other out. Not only is there no CAB for the total effect of the mixture (ψ), but the bias from the uncontrolled confounding (by U’ in Figure 1b) is reduced as correlation increases. The same conclusion holds true for the estimate of ψ derived from linear regression by adding the two beta coefficients. (Note that qgcomp has other useful properties compared to linear regression (Keil et al. 2020).) However, for negative correlations between exposures, the absolute value of the bias of ψ, calculated using either method, increases as ρ becomes more negative. Estimates of the total effect of a mixture may have advantages, especially since negative correlations between exposures appear to be rarer than positive correlations (Webster 2018). More research is needed on the susceptibility of other mixtures methods such as WQS to CAB as well as the properties of measures of total effect.
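The cancellation can be seen numerically with a short sketch based on equations (1a), (1b) and (11). The function names and the value of c2c3 are hypothetical; the true X2-Y effect is taken to be zero, as in Figure 1b.

```python
def individual_biases(c2c3, rho):
    """Bias of the mutually adjusted X1-Y and X2-Y coefficients under Figure 1b."""
    denom = 1.0 - rho ** 2
    bias1 = c2c3 / denom            # from equation (1a): estimate minus c1
    bias2 = -c2c3 * rho / denom     # from equation (1b): estimate minus zero
    return bias1, bias2


def psi_bias(c2c3, rho):
    """Bias of the total mixture effect, equation (11)."""
    return c2c3 / (1.0 + rho)


c2c3 = 0.2  # illustrative strength of the confounding by U'
summary = {}
for rho in (-0.5, 0.0, 0.5, 0.9):
    bias1, bias2 = individual_biases(c2c3, rho)
    summary[rho] = (bias1, bias2, psi_bias(c2c3, rho))
    print(f"rho={rho:+.1f}  bias(beta1)={bias1:.3f}  "
          f"bias(beta2)={bias2:.3f}  bias(psi)={summary[rho][2]:.3f}")
```

For positive ρ the two individual biases grow in magnitude but have opposite signs, so their sum shrinks; for negative ρ they share a sign and their sum grows, in line with the discussion above.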

6. Conclusion

In summary, when results from a given analysis are considered, precise assumptions about the underlying data structure (which are often represented in a DAG) must be made when trying to make causal inferences from the results. While analysis results can rule out some DAGs, a number of DAGs can be consistent with the same results. Thus, one cannot infer the DAG from the data. This also has implications for the generation of synthetic data, which are very useful for testing the capabilities of new mixtures methods. There are a number of ways that such synthetic data may be created, but we strongly recommend that researchers explicitly use a DAG when doing so. Finally, we use the equivalency of DAGs to provide a test of whether novel mixtures methods are subject to co-exposure amplification bias. Our results indicate that bkmr is susceptible to this problem, while estimates of the total effect of a mixture, obtained via linear regression or qgcomp, may sometimes avoid it. Estimation of the total effect of a mixture is an important but relatively underexplored topic that warrants further investigation.

Acknowledgements:

This work was supported by NIEHS grant R01ES028800.

References

Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, et al.
Bayesian kernel machine regression for estimating the health effects of
multi-pollutant mixtures. Biostatistics 2015; 16:493–508;
doi:10.1093/biostatistics/kxu058.

Braun JM, Gennings C, Hauser R, Webster TF. What can Epidemiological Studies Tell
Us about the Impact of Chemical Mixtures on Human Health? Environ Health
Perspect 2016; 124:A6-9.

Carlin DJ, Rider CV, Woychik R, Birnbaum LS. Unraveling the health effects of
environmental mixtures: an NIEHS priority. Environ Health Perspect. 2013;
121(1):A6-8.

Carrico C, Gennings C, Wheeler DC, Factor-Litvak P. Characterization of weighted
quantile sum regression for highly correlated data in a risk analysis setting. J
Agric Biol Environ Stat 2015; 20:100-120.

Czarnota J, Gennings C, Wheeler DC. Assessment of weighted quantile sum
regression for modeling chemical mixtures and cancer risk. Cancer Inform 2015;
14:159–171.

Glymour M, Greenland S. Causal Diagrams. In: Modern Epidemiology, 3rd edition,
Part 2nd (Rothman KJ, Greenland S, Lash T, eds). Philadelphia, PA: Wolters
Kluwer Health/Lippincott Williams & Wilkins, 2008.

Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research.
Epidemiology 1999; 10(1):37-48.

Hernán M, Hernández-Díaz S, Werler MM, Mitchell AA. Causal knowledge as a
prerequisite for confounding evaluation. An application to birth defects
epidemiology. Amer J Epidemiol 2002; 155:176-84.

Hernán MA, Robins JM. Causal Inference. Boca Raton: Chapman & Hall/CRC,
forthcoming. 2020.

Keil AP, Buckley JP, O’Brien KM, Ferguson KK, Zhao S, White AJ. A quantile-based
g-computation approach to addressing the effects of exposure mixtures. Environ
Health Perspect 2020; 128(4):047004-1.

Pearl J. On a class of bias-amplifying variables that endanger effect estimates. In:
Proceedings of the Twenty-Sixth Conference on Uncertainty
in Artificial Intelligence (UAI 2010), 2010. Corvallis, OR, (Grünwald P, Spirtes P
eds). Association for Uncertainty in Artificial Intelligence, 425–432.

Powering Research Through Innovative Methods for Mixtures in Epidemiology
(PRIME) Program. NIEHS.
https://www.niehs.nih.gov/research/supported/exposure/mixtures/prime_program/index.cfm
(accessed 10 April 2020)

Schisterman EF, Perkins NJ, Mumford SL, Ahrens KA, Mitchell EM. Collinearity and
causal diagrams – a lesson on the importance of model specification.
Epidemiology. 2017; 28(1):47–53.

Taylor KW, Joubert BR, Braun JM, Dilworth C, Gennings C, Hauser R, et al. Statistical
Approaches for Assessing Health Effects of Environmental Chemical Mixtures in
Epidemiology: Lessons from an Innovative Workshop. Environ Health Perspect.
2016; 124(12):A227-A229.

Tu Y-K, Gunnell D, Gilthorpe MS. Simpson's Paradox, Lord's Paradox, and
Suppression Effects are the same phenomenon – the reversal paradox. Emerging
Themes in Epidemiology 2008, 5:2.

Vatcheva KP, Lee M, McCormick JB, Rahbar MH. Multicollinearity in Regression
Analyses Conducted in Epidemiologic Studies. Epidemiology (Sunnyvale). 2016;
6(2) doi:10.4172/2161-1165.1000227.

Webster TF. Mixtures: Contrasting Perspectives from Toxicology and
Epidemiology. In: Rider C., Simmons J. (eds) Chemical Mixtures and Combined
Chemical and Nonchemical Stressors. Springer, Cham, 2018.

Weisskopf MG, Seals RM, Webster TF. Bias amplification in epidemiologic analysis of
exposure to mixtures. Environ Health Perspect 2018; 126(4):047003.

Table 1. Linear regression results for DAG 1a: confounding by co-exposure (ρ = b3b4)

Association        Crude (r_iy)    Mutually adjusted
β1 (for X1-Y)      b1 + ρb2        b1
β2 (for X2-Y)      b2 + ρb1        b2

Table 2. Linear regression results for DAG 1b: co-exposure amplification bias (ρ = c4c5)

Association        Crude (r_iy)    Mutually adjusted
β1 (for X1-Y)      c1 + c2c3       c1 + c2c3/(1-ρ²)
β2 (for X2-Y)      ρc1             -c2c3ρ/(1-ρ²)

Table 3. Linear regression results for DAG 2a “3 handles” (ρ = c1c2)

Association        Crude (r_iy)    Mutually adjusted
β1 (for X1-Y)      c3c4            (c3c4 - ρc5c6)/(1-ρ²)
β2 (for X2-Y)      c5c6            (c5c6 - ρc3c4)/(1-ρ²)

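Figure 1. A) DAG for confounding by co-exposure with correlation between exposures ρ = b3b4. B) DAG for co-exposure amplification bias, ρ = c4c5. C) DAG for confounding by co-exposure with noise (errors) explicitly shown, ρ = b3b4.

[Figure 1 image not reproduced. Panel A: nodes U, X1, X2, Y with path coefficients b1 to b4. Panel B: nodes U, U’, X1, X2, Y with path coefficients c1 to c5. Panel C: as Panel A, with noise terms ε and δ added.]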

Figure 2. Two DAGs that allow the X1-X2 correlation to be changed while fixing the X1-Y and X2-Y correlations: A) “3 handles” (ρ = c1c2, r1y = c3c4, r2y = c5c6). B) Reverse causation (ρ = c1c2 + c3c4, r1y = c3, r2y = c4). There are other possibilities, e.g., adding to Figure 2a the causal X1-Y and X2-Y links from Figure 1a.