variability in the uk biobank · 1 1 genotype-by-environment interactions inferred from genetic...

24
1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 1 variability in the UK Biobank 2 3 Huanwei Wang 1 , Futao Zhang 1 , Jian Zeng 1 , Yang Wu 1 , Kathryn E. Kemper 1 , Angli Xue 1 , Min 4 Zhang 1 , Joseph E. Powell 1,2,3 , Michael E. Goddard 4,5 , Naomi R. Wray 1,6 , Peter M. Visscher 1,6 , Allan 5 F. McRae 1 , Jian Yang 1,6,7 6 7 1 Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, 8 Australia 9 2 Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute for Medical Research, 10 Sydney, NSW 2010, Australia 11 3 Faculty of Medicine, University of New South Wales, Sydney, NSW 2052, Australia 12 4 Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, Victoria, 13 Australia; 14 5 Biosciences Research Division, Department of Economic Development, Jobs, Transport and 15 Resources, Bundoora, Victoria, Australia 16 6 Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4072, 17 Australia 18 7 Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang 325027, 19 China 20 Correspondence: Jian Yang ([email protected]) 21 22 Abstract 23 Genotype-by-environment interaction (GEI) is a fundamental component in understanding 24 complex trait variation. However, it remains challenging to identify genetic variants with GEI 25 effects in humans largely because of the small effect sizes and the difficulty of monitoring 26 environmental fluctuations. Here, we demonstrate that GEI can be inferred from genetic 27 variants associated with phenotypic variability in a large sample without the need of measuring 28 environmental factors. We performed a genome-wide variance quantitative trait locus (vQTL) 29 analysis of ~5.6 million variants on 348,501 unrelated individuals of European ancestry for 13 30 quantitative traits in the UK Biobank, and identified 75 significant vQTLs with P<2.0´10 -9 for 9 31 traits, especially for those related to obesity. Direct GEI analysis with five environmental factors 32 showed that the vQTLs were strongly enriched with GEI effects. Our results indicate pervasive 33 GEI effects for obesity-related traits and demonstrate the detection of GEI without 34 environmental data. 35

Upload: others

Post on 26-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

1

Genotype-by-environmentinteractionsinferredfromgeneticeffectsonphenotypic1 variabilityintheUKBiobank2 3 HuanweiWang1,FutaoZhang1,JianZeng1,YangWu1,KathrynE.Kemper1,AngliXue1,Min4 Zhang1,JosephE.Powell1,2,3,MichaelE.Goddard4,5,NaomiR.Wray1,6,PeterM.Visscher1,6,Allan5 F.McRae1,JianYang1,6,76 7 1InstituteforMolecularBioscience,TheUniversityofQueensland,Brisbane,Queensland4072,8 Australia9 2Garvan-WeizmannCentreforCellularGenomics,GarvanInstituteforMedicalResearch,10 Sydney,NSW2010,Australia11 3FacultyofMedicine,UniversityofNewSouthWales,Sydney,NSW2052,Australia12 4FacultyofVeterinaryandAgriculturalScience,UniversityofMelbourne,Parkville,Victoria,13 Australia;14 5BiosciencesResearchDivision,DepartmentofEconomicDevelopment,Jobs,Transportand15 Resources,Bundoora,Victoria,Australia16 6QueenslandBrainInstitute,TheUniversityofQueensland,Brisbane,Queensland4072,17 Australia18 7InstituteforAdvancedResearch,WenzhouMedicalUniversity,Wenzhou,Zhejiang325027,19 China20 Correspondence:JianYang([email protected])21 22 Abstract23 Genotype-by-environmentinteraction(GEI)isafundamentalcomponentinunderstanding24 complextraitvariation.However,itremainschallengingtoidentifygeneticvariantswithGEI25 effectsinhumanslargelybecauseofthesmalleffectsizesandthedifficultyofmonitoring26 environmentalfluctuations.Here,wedemonstratethatGEIcanbeinferredfromgenetic27 variantsassociatedwithphenotypicvariabilityinalargesamplewithouttheneedofmeasuring28 environmentalfactors.Weperformedagenome-widevariancequantitativetraitlocus(vQTL)29 analysisof~5.6millionvariantson348,501unrelatedindividualsofEuropeanancestryfor1330

quantitativetraitsintheUKBiobank,andidentified75significantvQTLswithP<2.0´10-9for931 traits,especiallyforthoserelatedtoobesity.DirectGEIanalysiswithfiveenvironmentalfactors32 showedthatthevQTLswerestronglyenrichedwithGEIeffects.Ourresultsindicatepervasive33 GEIeffectsforobesity-relatedtraitsanddemonstratethedetectionofGEIwithout34 environmentaldata. 35

Page 2: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

2

Introduction36 Mosthumantraitsarecomplexbecausetheyareaffectedbymanygeneticandenvironmental37 factorsaswellaspotentialinteractionsbetweenthem1,2.Despitethelonghistoryofeffort3-5,38 therehasbeenlimitedsuccessinidentifyinggenotype-by-environmentinteraction(GEI)effects39 inhumans5-8.Thisislikelybecausemanyenvironmentalexposuresareunknownordifficultto40 recordduringthelifecourse,andbecausetheeffectsizesofGEIaresmallgiventhepolygenic41 natureofmosthumantraits9-11sothatthesamplesizesofmostpreviousstudiesarenotlarge42 enoughtodetectthesmallGEIeffects.43 44 TheGEIeffectofageneticvariantonaquantitativetraitcouldleadtodifferencesinvarianceof45 thetraitamonggroupsofindividualswithdifferentvariantgenotypes(Figure1a-b).GEIeffects46 canthereforebeinferredfromavariancequantitativetraitlocus(vQTL)analysis12.Unlikethe47 classicalquantitativetraitlocus(QTL)analysisthatteststheallelicsubstitutioneffectofa48 variantonthemeanofaphenotype(Figure1c),vQTLanalysisteststheallelicsubstitutioneffect49 onthetraitvariance(Figure1bor1d).IncomparisontotheanalysesthatperformdirectGEI50 tests,vQTLanalysiscouldbeamorepowerfulapproachtoidentifyGEIbecauseitdoesnot51 requiremeasuresofenvironmentalfactorsandthuscanbeperformedindatawithverylarge52 samplesizes13.Althoughtherehadbeenempiricalevidenceforthegeneticcontrolof53 phenotypicvarianceinlivestockfordecades14,15,itwasnotuntilrecentyearsthatgenome-wide54 vQTLanalysiswasappliedinhumans12,16,17,andonlyahandfulofvQTLshavebeenidentified55 foralimitednumberoftraits(e.g.theFTOlocusforbodymassindex(BMI)17)owingtothe56 smalleffectsizesofthevQTLs.Theavailabilityofdatafromlargebiobank-basedgenome-wide57 associationstudies(GWAS)18,19provideanopportunitytointerrogatethegenomeforvQTLsfor58 arangeofphenotypesincohortswithunprecedentedsamplesize.59 60 Ontheotherhand,thestatisticalmethodsforvQTLanalysisarenotentirelymature13.There61 havebeenaseriesofclassicalnon-parametricmethods20,originallydevelopedtodetect62 violationofthehomogeneousvarianceassumptioninlinearregressionmodel,whichcanbe63 usedtodetectvQTLs,includingtheBartlett’stest21,theLevene’stest22,23andtheFligner-Killen64 test24.Recently,moreflexibleparametricmodelshavebeenproposed,includingthedouble65 generalizedlinearmodel(DGLM)25-27andthelikelihoodratiotest28.Inaddition,ithasbeen66 suggestedthatthetransformationofphenotypethataltersphenotypedistributionalsohasan67 influenceonthepowerand/orfalsepositiverate(FPR)ofavQTLanalysis16,29.68 69 Inthisstudy,wecalibratedthemostcommonlyusedstatisticalmethodsforvQTLanalysisby70 extensivesimulations.Wethenusedthebestperformingmethodtoconductagenome-wide71

Page 3: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

3

vQTLanalysisfor13quantitativetraitsin348,501unrelatedindividualsusingthefullreleaseof72 theUKBiobank(UKB)data18.WefurtherinvestigatedwhetherthedetectedvQTLsareenriched73 forGEIbyconductingadirectGEItestforthevQTLswithfiveenvironmentalfactors.74 75 Results76 EvaluationofthevQTLmethodsbysimulation77 WeusedsimulationstoquantifytheFPRandpower(i.e.,truepositiverate)forthevQTL78 methodsandphenotypeprocessingstrategies(Methods).Wefirstsimulatedaquantitativetrait79 basedonasimulatedsinglenucleotidepolymorphism(SNP),i.e.,asingle-SNPmodel,undera80 numberofdifferentscenarios,namely:1)fivedifferentdistributionsfortherandomerrorterm81 (i.e.,individual-specificenvironmenteffect);2)fourdifferenttypesofSNPwithorwithoutthe82 effectonmeanorvariance(Methods).Weusedthesimulateddatatocomparefourmostwidely83 usedvQTLmethods,namelytheBartlett’stest21,theLevene’stest22,23,theFligner-Killen(FK)84 test24andtheDGLM25-27.WeobservednoinflationinFPRfortheLevene’stestunderthenull85 (i.e.,novQTLeffect)regardlessoftheskewnessorkurtosisofthephenotypedistributionorthe86 presenceorabsenceoftheSNPeffectonmean(SupplementaryFigure1a).Thesefindingsarein87 linewiththeresultsfrompreviousstudies16,20,30thatdemonstratetheLevene’stestisrobustto88 thedistributionofphenotype.TheFPRoftheBartlett’stestorDGLMwasinflatedifthe89 phenotypedistributionwasskewedorheavy-tailed(SupplementaryFigure1a).TheFKtest90 seemedtoberobusttokurtosisbutvulnerabletoskewnessofthephenotypedistribution91 (SupplementaryFigure1a).Wealsoobservedthatlogarithmorrank-basedinverse-normal92 transformation(RINT)couldresultininflatedteststatisticsinthepresenceofQTLeffect(i.e.,93 SNPeffectonmean;SupplementaryFigure1b).94 95 Tosimulatemorecomplexscenarios,weusedamultiple-SNPmodelwithtwocovariates(age96 andsex)anddifferentnumbersofSNPs(Figure2).Theresultsweresimilartothoseobserved97 above,althoughthepoweroftheLevene’stestdecreasedwithanincreaseofthenumberof98 causalSNPs(Figure2a).Again,logarithmtransformationorRINTgaverisetoaninflatedFPRin99 thepresenceofSNPeffectonmean,andRINTledtoafurtherlossofpower(Figure2b).These100 resultsalsosuggestedthatpre-adjustingthephenotypebycovariatesslightlyincreasedthe101 powerofvQTLdetection(Figure2b).WethereforeusedtheLevene’stestforrealdataanalysis102 withthephenotypespre-adjustedforcovariateswithoutlogarithmtransformationorRINT.103 104 Genome-widevQTLanalysisfor13UKBtraits105 Weperformedagenome-widevQTLanalysisusingtheLevene’stestwith5,554,549genotyped106 orimputedcommonvariantson348,501unrelatedindividualsofEuropeanancestryfor13107

Page 4: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

4

quantitativetraitsintheUKB18(Methods,SupplementaryTable1andSupplementaryFigure2).108 Foreachtrait,wepre-adjustedthephenotypeforageandthefirst10principalcomponents109 (PCs,derivedfromSNPdata)andstandardisedtheresidualstoz-scoresineachgendergroup110 (Methods).Thisprocessremovednotonlytheeffectsofageandthefirst10PCsonthe111 phenotypebutalsothedifferencesinmeanandvariancebetweenthetwogenders.Weexcluded112 individualswithadjustedphenotypesmorethan5standarddeviations(SD)fromthemeanand113 removedSNPswithminorallelefrequency(MAF)smallerthan0.05toavoidpotentialfalse114 positiveassociationsduetothecoincidenceofalow-frequencyvariantwithanoutlier115 phenotype(seeSupplementaryFigure3foranexample).Weacknowledgethatthisprocess116 couldpotentiallyresultinalossofpower,butthiscanbecompensatedforbytheuseofavery117 largesample(n~350,000).118 119 Withanexperiment-wisesignificantthreshold2.0´10-9(i.e.,1´10-8/5.03with1´10-8beinga120

morestringentgenome-widesignificantthresholdrecommendedbyrecentstudies31,32and5.03121 beingtheeffectivenumberofindependenttraits(SupplementaryNote3)),weidentified75122 vQTLsfor9traits(Figure3,Table1andSupplementaryTable2).TherewasnovQTLforheight,123 consistentwiththeobservationinapreviousstudy17.Weidentifiedmorethan15vQTLsfor124 eachofthethreeobesity-relatedtraits,i.e.,BMI,waistcircumference(WC),andhip125 circumference(HC)(Table1).The75vQTLswerelocatedat40near-independentlociafter126 excludingoneofeachpairoftopvQTLSNPs(i.e.,theSNPwithlowestvQTLp-valueateach127 vQTLassociationpeak)withlinkagedisequilibrium(LD)r2>0.01,suggestingthatsomeofthe128 lociwereassociatedwiththephenotypicvarianceofmultipletraits.Forexample,theFTOlocus129 wasassociatedwiththephenotypicvarianceofWC,HC,BMI,bodyfatpercentage(BFP)and130 basalmetabolicrate(BMR)(Figure4).Forthelung-function-relatedtraits,therewasno131 significantvQTLforforcedexpiratoryvolumeinonesecond(FEV1)andforcedvitalcapacity132 (FVC)butwere3vQTLsforFEV1/FVCratio(FFR).133 134 TheLevene’stestassessesthedifferenceinvarianceamongthreegenotypegroupsfreeofthe135 assumptionaboutadditivity(i.e.,thevQTLeffectofcarryingtwocopiesoftheeffectalleleisnot136 assumedtobetwicethatcarryingonecopy).WefoundtwovQTLs(i.e.,rs141783576and137 rs10456362)potentiallyshowingnon-additivegeneticeffectonthevarianceofHCandBMR,138 respectively(SupplementaryTable2).139 140 GWASanalysisforthe13UKBtraits141 ToinvestigatewhethertheSNPswitheffectsonvariancealsohaveeffectsonmean,we142 performedGWAS(orgenome-wideQTL)analysesforthe13UKBtraitsdescribedabove.We143

Page 5: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

5

identified3,803QTLsatanexperiment-wisesignificancelevel(i.e.,PQTL<2.0´10-9)forthe13144 traitsintotal,amuchlargernumberthanthatofthevQTLs(Table1andFigure5).Amongthe145 75vQTLs,thetopvQTLSNPsat9locididnotpasstheexperiment-wisesignificancelevelinthe146 QTLanalysis(SupplementaryTable2).Forexample,theCCDC92locusshowedasignificant147 vQTLeffectbutnosignificantQTLeffectonWC(SupplementaryTable2andFigure6a),148 whereastheFTOlocusshowedbothsignificantQTLandvQTLeffectsonWC(Figure6b).Forthe149 66vQTLswithbothQTLandQTLeffects,thevQTLeffectswereallinthesamedirectionsasthe150 QTLeffects,meaningthatforanyoftheseSNPsthegenotypegroupwithlargerphenotypic151 meanalsotendstohavelargerphenotypicvariancethantheothergroups.Forthe9lociwith152 vQTLeffectsonly,itisequivalenttoascenariowhereaQTLhasaGEIeffectwithno(ora153 substantiallyreduced)effectonaverageacrossdifferentlevelsofanenvironmentalfactor154 (Figure1b).155 156 vQTLandGEI157 TofurtherinvestigatewhethertheassociationsbetweenvQTLsandphenotypicvariancecanbe158 explainedbyGEI,weperformedadirectGEItestbasedonanadditivegeneticmodelwithan159 interactiontermbetweenatopvQTLSNPandoneoffiveenvironmental/covariatefactorsinthe160 UKBdata(Methods).Thefiveenvironmentalfactorsaresex,age,physicalactivity(PA),161 sedentarybehaviour(SB),andeversmoking(SupplementaryNote4,SupplementaryFigure4162 andSupplementaryTable3).Weobserved16vQTLsshowingasignificantGEIeffectwithat163

leastoneoffiveenvironmentalfactorsaftercorrectingformultipletests(p<1.3´10-4=164 0.05/(75*5);Figure7aandSupplementaryTable4).165 166 TotestwhethertheGEIeffectsareenrichedamongvQTLsincomparisonwiththesamenumber167 ofQTLs,weperformedGEItestfor75topGWASSNPsrandomlyselectedfromalltheQTLsand168 repeatedtheanalysis1000times.Ofthe75topSNPswithQTLeffects,thenumberofSNPswith169 significantGEIeffectswas1.39averagedfromthe1000repeatedsamplingswithaSDof1.15170 (Figure7b),significantlylowerthenumber(16)observedforthevQTLs(thedifferenceislarger171 than12SDs,equivalenttop=6.6´10-37).ThisresultshowsthatSNPswithvQTLeffectsare172

muchmoreenrichedwithGEIeffectscomparedtothosewithQTLeffects.Toexcludethe173 possibilitythattheGEIsignalsweredrivenbyphenotypeprocessing(e.g.,theadjustmentof174 phenotypeforsexandage),werepeatedtheGEIanalysesusingrawphenotypedatawithout175 covariatesadjustment;theresultsremainlargelyunchanged(SupplementaryFigure5).176 177 Discussion178

Page 6: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

6

Inthisstudy,weleveragedthegeneticeffectsassociatedwithphenotypicvariabilitytoinfer179 GEI.WecalibratedthemostcommonlyusedvQTLmethodsbysimulation.Wefoundthatthe180 FPRoftheLevene’stestwaswell-calibratedacrossallsimulationscenarioswhereastheother181 methodsshowedaninflatedFPRifthephenotypedistributionwasskewedorheavy-tailed182 underthenullhypothesis(i.e.,novQTLeffect),despitethattheLevene’stestappearedtobeless183 powerfulthantheothermethodsunderthealternativehypothesisinparticularwhentheper-184 variantvQTLeffectwassmall(Figure2andSupplementaryFigure1).Parametricbootstrapor185 permutationprocedureshavebeenproposedtoreducetheinflationinthetest-statisticsof186 DGLMandLRT-basedmethod,bothofwhichareexpectedtobemorepowerfulthanthe187 Levene’stest28,30,butbootstrapandpermutationarecomputationallyinefficientandthusnot188 practicallyapplicabletobiobankdatasuchastheUKB.Inaddition,weobservedinflatedFPRfor189 theLevene’stestintheabsenceofvQTLeffectsbutinthepresenceofQTLeffectsifthe190 phenotypewastransformedbylogarithmtransformationorRINT.Wethereforerecommend191 theuseoftheLevene’stestinpracticewithoutlogarithmtransformationorRINTofthe192 phenotype.Inaddition,averyrecentstudybyYoungetal.33developedanefficientalgorithmto193 performaDGLManalysisandproposedamethod(calleddispersioneffecttest(DET))to194 removethefoundinginvQTLassociations(identifiedbyDGLM)duetotheQTLeffects.We195 showedbysimulationthatwhenthenumberofsimulatedcausalvariantswasrelativelylarge196 (notethattheDETtestisnotapplicabletooligogenictraits),theYoungetal.method(DGLM197 followedbyDET)performedsimilarlyastheLevene’stestwithdifferencesdependingonhow198 thephenotypewasprocessed(SupplementaryFigure6).199 200 Weidentified75geneticvariantswithvQTLeffectsfor9quantitativetraitsintheUKBata201 stringentsignificancelevelandobservedstrongenrichmentofGEIeffectsamongthegenetic202 variantswithvQTLeffectscomparedtothosewithQTLeffects.ThereareseveralvQTLsfor203 whichtheGEIeffecthasbeenreportedinpreviousstudies.Thefirstexampleistheinteraction204 effectoftheCHRNA5-A3-B4locus(rs56077333)withsmokinglungfunction(asmeasuredby205 FFRratio,i.e.,FEV1/FVC),PvQTL=1.1´10-14andPGEI(smoking)=4.6´10-25(SupplementaryTable2206

and4).TheCHRNA5-A3-B4geneclusterisknowntobeassociatedwithsmokingandnicotine207 dependence34-36.However,resultsfromrecentGWASstudies37-39donotsupporttheassociation208 ofthislocuswithlungfunction.WehypothesizethattheeffectoftheCHRNA5-A3-B4locuson209 lungfunctiondependsonsmoking40(SupplementaryTable5).ThevQTLsignalatthislocus210

remained(PvQTL=5.2´10-12)afteradjustingthephenotypeforarrayeffect,whichwasreported211 toaffecttheQTLassociationsignalatthislocus18.Thesecondexampleistheinteractionofthe212

WNT16-CPED1locuswithageforBMD(rs10254825:PvQTL=2.0´10-45andPGEI(age)=1.2´10-7).213 TheWNT16-CPED1locusisoneofthestrongestBMD-associatedlociidentifiedfromGWAS41,42.214

Page 7: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

7

Weobservedagenotype-by-ageinteractioneffectatthislocusforBMD(SupplementaryTable215 6),inlinewiththeresultsfrompreviousstudiesthattheeffectofthetopSNPatWNT16-CPED1216 onBMDinhumans43andtheknock-outeffectofWnt16onbonemassinmice44areage-217 dependent.ThethirdexampleistheinteractionoftheFTOlocuswithphysicalactivityand218 sedentarybehaviourforobesity-relatedtraits(PvQTL<1´10-10forBMI,WC,HC,BFPandBMR;219

PGEI(PA)=1.3´10-10forBMI,1.4´10-7forWC,5.3´10-7forHCand2.6´10-7forBMR).TheFTO220 locuswasoneofthefirstlociidentifiedbytheGWASofobesity-relatedtraits45although221 subsequentstudies46,47showthatIRX3andIRX5(ratherthanFTO)arethefunctionalgenes222 responsiblefortheGWASassociation.ThetopassociatedSNPattheFTOlocusisnotassociated223 withphysicalactivitybutitseffectonBMIdecreaseswiththeincreaseofphysicalactivity224 level48,49,consistentwiththeinteractioneffectsoftheFTOlocuswithphysicalactivityor225 sedentarybehaviourforobesity-relatedtraitsidentifiedinthisstudy(SupplementaryTables7226 and8).Inaddition,5ofthe22BMIvQTLswereinLD(r2>0.5)withthevariants(identifiedbya227 recentlydevelopedmultiple-environmentGEItest)showingsignificantinteractioneffectsat228

FDR<5%(correspondingtop<1.16´10-3)withatleastoneof64environmentalfactorsfor229 BMIintheUKB50.230 231 ApartfromGEI,thereareotherpossibleinterpretationsofanobservedvQTLsignal,including232 “phantomvQTLs”28,51andepistasis(genotype-by-genotypeinteraction).Iftheunderlyingcausal233 QTLisnotwellimputedornotwelltaggedbyagenotyped/imputedvariant,theuntagged234 variationatthecausalQTLwillinflatethevQTLtest-statistic,potentiallyleadingtoaspurious235 vQTLassociation,i.e.,theso-calledphantomvQTL.Weshowedbytheoreticaldeviationsthat236 theLevene’stest-statisticduetothephantomvQTLeffectwasafunctionofsamplesize,effect237 sizeofthecausalQTL,allelefrequencyofthecausalQTL,allelefrequencyofthephantomvQTL,238 andLDbetweenthecausalQTLandthephantomvQTL(SupplementaryNote5and239 SupplementaryFigure7).Fromourdeviations,wecomputedthenumericaldistributionofthe240 expectedphantomvQTLF-statisticsgivenanumberofparametersincludingthesamplesize(n241 =350,000),varianceexplainedbythecausalQTL(q2=0.005,0.01or0.02),andMAFsofthe242 causalQTLandthephantomvQTL(MAF=0.05–0.5).TheresultshowedthatforacausalQTL243 withq2<0.005andMAF>0.05,thelargestpossiblephantomvQTLF-statisticwassmallerthan244 2.69(correspondingtoap-valueof6.8´10-2;SupplementaryFigure8).Thisexplainswhythere245

werethousandsofgenome-widesignificantQTLsbutnosignificantvQTLforheight(Table1246 andFigure3).ThisresultalsosuggeststhatthevQTLsdetectedinthisstudyareveryunlikelyto247 bephantomvQTLsbecausetheestimatedvarianceexplainedbytheirQTLeffectswereall248 smallerthan0.005exceptforrs10254825attheWNT16locusonBMD(q2=0.014)249 (SupplementaryFigure9).However,ournumericalcalculationalsoindicatedthatforaQTL250

Page 8: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

8

withMAF>0.3andq2<0.02,thelargestpossiblephantomvQTLF-statisticwassmallerthan251

5.64(correspondingtoap-valueof3.6´10-3),suggestingrs10254825isalsounlikelytobea252 phantomvQTL.NotethatweusedthevarianceexplainedestimatedatthetopGWASSNPto253 approximateq2ofthecausalQTLsothatq2waslikelytobeunderestimatedbecauseof254 imperfecttagging.However,consideringtheextremelyhighimputationaccuracyforcommon255 variants52,thestrongLDbetweenthecausalQTLsandtheGWAStopSNPsobservedina256 previoussimulationstudybasedonwhole-genome-sequencedata31,andtheoverestimationof257 varianceexplainedbytheGWAStopSNPsbecauseofwinner’scurse,theunderestimationin258 causalQTLq2islikelytobesmall.Inaddition,were-ranthevQTLanalysiswiththephenotype259 adjustedforthetopGWASvariantswithin10MbdistanceofthetopvQTLSNP;thevQTLsignals260 afterthisadjustmentwerehighlyconcordantwiththosewithoutadjustment(Supplementary261 Figure10).Wefurthershowedthattherewasnoevidenceforepistaticinteractionsbetweenthe262 topvQTLSNPsandanyotherSNPinmorethan10Mbdistanceoronadifferentchromosome263 (SupplementaryFigure11).264 265 Inconclusion,wesystematicallyquantifiedtheFPRandthepoweroffourcommonlyusedvQTL266 methodsbyextensivesimulationsanddemonstratedtherobustnessoftheLevene’stest.We267 alsoshowedthatinthepresenceofQTLeffectstheLevene’steststatisticcouldbeinflatedifthe268 phenotypewastransformedbylogarithmtransformationorRINT.Weimplementedthe269 Levene’stestaspartoftheOSCAsoftwarepackage53(URLs)forefficientgenome-widevQTL270 analysis,andappliedOSCA-vQTLto13quantitativetraitsintheUKBandidentified75vQTL(at271 40independentloci)associatedwith9traits,9ofwhichdidnotshowasignificantQTLeffect.272 Asaproof-of-principle,weperformedGEIanalysesintheUKBwith5environmentalfactors,273 anddemonstratedtheenrichmentofGEIeffectsamongthedetectedvQTLs.Wefurtherderived274 thetheorytocomputetheexpected“phantomvQTL”test-statisticduetountaggedcausalQTL275 effect,andshowedbynumericalcalculationthatourobservedvQTLswereveryunlikelytobe276 drivenbyimperfectlytaggedQTLeffects.Ourtheoryisalsoconsistentwiththeobservationof277 pervasivephantomvQTLsformoleculartraitswithlarge-effectQTLs(e.g.,DNAmethylation51).278 However,theconclusionsfromthisstudymaybeonlyapplicabletoquantitativetraitsof279 polygenicarchitecture.WecautionvQTLanalysisforbinaryorcategoricaltraits,ormolecular280 traits(e.g.,geneexpressionorDNAmethylation),forwhichthemethodsneedfurther281 investigation.282 283

Page 9: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

9

Methods284 Simulationstudy285 WeusedaDGLM25-27tosimulatethephenotypebasedontwomodelswithsimulatedSNPdata286 inasampleof10,000individuals,i.e.,asingle-SNPmodelandmultiple-SNPmodelwithtwo287 covariates(i.e.ageandsex).Thesingle-SNPmodelcanbewrittenas288

𝑦 = 𝑤𝛽% + 𝑒with𝑙𝑜𝑔(𝜎-.) = 𝑤𝜙% + 𝑙𝑜𝑔(𝜎.)289

andthemultiple-SNPmodelcanbeexpressedas290

𝑦 = ∑ 𝑐345 𝛽67 + ∑ 𝑤89

5 𝛽%: + 𝑒with𝑙𝑜𝑔(𝜎-.) = ∑ 𝑐34

5 𝜙67 + ∑ 𝑤885 𝜙%: + 𝑙𝑜𝑔(𝜎

.),291

where𝑦isasimulatedphenotype;𝑤or𝑤8isastandardizedSNPgenotype,i.e.,𝑤 = (𝑥 −292

2𝑓)/@2𝑓(1 − 𝑓)with𝑥beingthegenotypeindicatorvariablecodedas0,1or2,generatedfrom293

binomial(2,f)andfbeingtheMAFgeneratedfromuniform(0.01,0.5);cjisastandardized294 covariatewithc1(sex)generatedfrombinomial(1,0.5)andc2(age)generatedfromuniform(20,295 60);eisanerrortermnormallydistributedwithmean0andvariance𝜎-..Tosimulatetheerror296 termwithdifferentlevelsofskewnessandkurtosis,wegenerated𝑒fromfivedifferent297 distributions,includingnormaldistribution,t-distributionwithdegreeoffreedom(df)=10or3298 and𝜒.distributionwithdf=15or1.𝛽(𝜙)istheeffectonmean(variance)generatedfrom299 N(0,1)ifexists,0otherwise.𝑙𝑜𝑔(𝜎.)istheinterceptofthesecondlinearmodelwhichwassetto300 0.Were-scaledthedifferentcomponentstocontrolthevarianceexplained,i.e.,0.1and0.9for301 thegenotypecomponentanderrorterm,respectively,forthesingle-SNPmodel,and0.2,0.4and302 0.4forthecovariatecomponent,genotypecomponentanderrorterm,respectively,forthe303 multiple-SNPmodel.WesimulatedtheSNPeffectsinfourdifferentscenarios:1)effecton304 neithermeannorvariance(nei),2)effectonmeanonly(mean),3)effectonvarianceonly(var),305 or4)effectonbothmeanandvariance(both).WesimulatedonlyonecausalSNPinthesingle-306 SNPmodeland4,40or80causalSNPsinthemultiple-SNPmodel.307 308 WeperformedvQTLanalysesusingthesimulatedphenotypeandSNPdatatocomparefour309 vQTLmethods,includingtheBartlett’stest21,theLevene’stest23,theFligner-Killeentest24and310 theDGLM(SupplementaryNote1).WealsoperformedtheLevene’stestwithfourphenotype311 processstrategies,includingrawphenotype(raw),rawphenotypeadjustedforcovariates(adj),312 RNITaftercovariateadjustment(rint),andlogarithmtransformationaftercovariateadjustment313 (log)(SupplementaryNote2).Werepeatedthesimulation1,000timesandcalculatedtheFPR314 andpoweratp<0.05atasingleSNPlevel.315 316 TheUKBiobankdata317

Page 10: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

10

ThefullreleaseoftheUKBdatacomprisedofgenotypeandphenotypedatafor~500,000318 participatesacrosstheUK18.ThegenotypedatawerecleanedandimputedtotheHaplotype319 ReferenceConsortium(HRC)52andUK10K54referencepanelsbytheUKBteam.Genotype320 probabilitiesfromimputationwereconvertedtohard-callgenotypesusingPLINK255(--hard-321 call0.1).WeexcludedgeneticvariantswithMAF<0.05,Hardy-Weinbergequilibriumtestp322

value<1´10-5,missinggenotyperate>0.05orimputationINFOscore<0.3,andretained323 5,554,549variantsforanalysis.324 325 WeidentifiedasubsetofindividualsofEuropeanancestry(n=456,422)byprojectingtheUKB326 PCsontothoseof1000GenomeProject(1KGP)56.Furthermore,weremovedoneofeachpairof327 individualswithSNP-derived(basedonHapMap3SNPs)genomicrelatedness>0.05using328 GCTA-GRM57andretained348,501unrelatedEuropeanindividualsforfurtheranalysis.329 330 Weselected13quantitativetraitsforouranalysis(SupplementaryTable1andSupplementary331 Figure2).Therawphenotypevalueswereadjustedforageandthefirst10PCsineachgender332 group.Weexcludedfromtheanalysisphenotypevaluesthatweremorethan5SDfromthe333 mean.Thephenotypeswerethenstandardizedtoz-scoreswithmean0andvariance1.334 335 Genome-widevQTLanalysis336 Thegenome-widevQTLanalysiswasconductedusingtheLevene’stestimplementedinthe337 softwaretoolOSCA53(URLs).TheLevene’stestusedinthestudy(alsoknownasthemedian-338 basedLevene’stestortheBrown-Forsythetest23)isamodifiedversionoftheoriginalLevene’s339 test22developedin1960thatisessentiallyanone-wayanalysisofvariance(ANOVA)ofthe340 variable𝑧D3 = |𝑦D3 − 𝑦FG|,where𝑦D3isphenotypeofthej-thindividualinthei-thgroupand𝑦FG is341

themedianofthei-thgroup.TheLevene’steststatistic342

(𝑛 − 𝑘)(𝑘 − 1)

∑ 𝑛D8DJ5 (𝑧D. − 𝑧..).

∑ ∑ (LM3J5

8DJ5 𝑧D3 − 𝑧D.).

343

followsaFdistributionwith𝑘 − 1and𝑛 − 𝑘degreesoffreedom,wherenisthetotalsample344 size,kisthenumberofgroups(𝑘 = 3invQTLanalysis),𝑛D isthesamplesizeofthei-thgroup,345

i.e.𝑛 = ∑ 𝑛D8DJ5 ,𝑧D3 = |𝑦D3 − 𝑦FG|,𝑧D. =

5LM∑ 𝑧D3LM3J5 ,and𝑧.. =

5O∑ ∑ 𝑧D3

LM3J5

8DJ5 .346

347

Theexperiment-wisesignificancelevelwassetto2.0´10-9,whichisthegenome-wide348

significancelevel(i.e.1´10-8)31,32dividedbytheeffectivenumberofindependenttraits(i.e.5.03349

for13traits).Theeffectivenumberofindependenttraitswasestimatedbasedonthe350 phenotypiccorrelationmatrix58(SupplementaryNote3).Todeterminethenumberof351

Page 11: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

11

independentvQTLs,weperformedanLDclumpinganalysisforeachtraitusingPLINK255(--352 clumpoptionwithparameters--clump-p12.0e-9--clump-p22.0e-9--clump-r20.01and--353 clump-kb5000).Tovisualizetheresults,wegeneratedtheManhattanandregionalassociation354 plotsusingggplot2packageinR.355 356 GWASanalysis357 TheGWAS(orgenome-wideQTL)analysiswasconductedusingPLINK255(--assocoption)using358 thesamedataasusedinthevQTLanalysis(notethatthephenotypehadbeenpre-adjustedfor359 covariatesandPCs).Theotheranalyses,includingLDclumping,andvisualization,were360 performedusingthesamepipelinesasthoseforgenome-widevQTLanalysisdescribedabove.361 362 GEIanalysis363 Fiveenvironmental/covariatefactors(i.e.,sex,age,PA,SBandsmoking)wereusedforthe364 directGEItests.Sexwascodedas0or1forfemaleormale.Agewasanintegernumberranging365 from40to74.PAwasassessedbyathree-levelcategoricalscore(i.e.,low,intermediateand366 high)basedontheshortformoftheInternationalPhysicalActivityQuestionnaire(IPAQ)367 guideline59.SBwasanintegernumberdefinedasthecombinedtime(hours)spentdriving,non-368 work-relatedcomputerusingorTVwatching.Thesmokingfactor“eversmoked”wascodedas0369 or1forneveroreversmoker.Moredetailsaboutthedefinitionandderivationof370 environmentalfactorPA,SBandsmokingcanbefoundintheSupplementaryNote4,Figure4371 andTable3.372 373 WeperformedaGEIanalysistotesttheinteractioneffectbetweenthetopvQTLSNPandoneof374 thefiveenvironmentalfactorsbasedonthefollowingmodel375

𝑦 = 𝜇 + 𝛽%𝑥% + 𝛽Q𝑥Q + 𝛽%Q𝑥%Q + 𝑒,376

whereyisphenotype,𝜇isthemeanterm,𝑥%ismean-centredSNPgenotypeindicator,𝑥Qis377

mean-centredenvironmentalfactor,and𝑥%Q = 𝑥%𝑥Q .WeusedastandardANOVAanalysisto378

testfor𝛽%QandappliedastringentBonferroni-correctedthreshold1.33´10-4(i.e.0.05/(75´5))379

toclaimasignificantGEIeffect. 380

Page 12: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

12

URLs381 OSCA,http://cnsgenomics.com/software/osca382 PLINK2,http://www.cog-genomics.org/plink2383 GCTA,http://cnsgenomics.com/software/gcta384 UCSCGenomeBrowser,https://genome.ucsc.edu/385 UKB,http://www.ukbiobank.ac.uk/386 387 Acknowledgements388 ThisresearchwassupportedbytheAustralianResearchCouncil(DP160101343and389 DP160101056),theAustralianNationalHealthandMedicalResearchCouncil(1078037,390 1078901,1113400,1107258and1083656),andtheSylvia&CharlesViertelCharitable391 Foundation.ThisstudymakesuseofdatafromtheUKBiobank(projectID:12514).Afulllistof392 acknowledgmentsofthisdatasetcanbefoundinSupplementaryNote6.393 394 Authorcontributions395 J.Y.andA.F.M.conceivedthestudy.J.Y.,H.W.andA.F.M.designedtheexperiment.F.Z.developed396 thesoftwaretool.H.W.performedsimulationsanddataanalysesundertheassistanceor397 guidancefromJ.Y.,J.Z.,Y.W.,K.K.,A.X.andM.Z..J.E.P.,M.E.G.,N.R.W.andP.M.V.providedcritical398 advicethatsignificantlyimprovedtheexperimentaldesignand/orinterpretationoftheresults.399 P.M.V.,N.R.W.andJ.Y.contributedresourcesandfunding.H.W.andJ.Y.wrotethemanuscript400 withtheparticipationofallauthors.401 402 Competinginterests403 Theauthorsdeclarenocompetinginterests.404 405 References406 1. FalconerDS,MackayTFC.Introductiontoquantitativegenetics.4thed:Longman,407

Harlow;1996.408 2. LynchM,WalshB.Geneticsandanalysisofquantitativetraits.SinauerAssociates,409

Sunderland,Ma;1998.410 3. GarrodA.Theincidenceofalkaptonuria:astudyinchemicalindividuality.TheLancet.411

1902;160(4137):1616-1620.412 4. HaldaneJ.Heredityandpolitics.WWNorton&Co.,NY;1938.413 5. KraftP,HunterD.Integratingepidemiologyandgeneticassociation:thechallengeof414

gene–environmentinteraction.PhilosophicalTransactionsoftheRoyalSocietyB:415 BiologicalSciences.2005;360(1460):1609-1616.416

6. ThomasD.Gene–environment-wideassociationstudies:emergingapproaches.Nature417 ReviewsGenetics.2010;11(4):259.418

7. AschardH,LutzS,MausB,etal.Challengesandopportunitiesingenome-wide419 environmentalinteraction(GWEI)studies.HumGenet.2012;131(10):1591-1613.420

Page 13: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

13

8. McAllisterK,MechanicLE,AmosC,etal.CurrentChallengesandNewOpportunitiesfor421 Gene-EnvironmentInteractionStudiesofComplexDiseases.AmJEpidemiol.422 2017;186(7):753-761.423

9. YangJ,LeeT,KimJ,etal.Ubiquitouspolygenicityofhumancomplextraits:genome-424 wideanalysisof49traitsinKoreans.PLoSgenetics.2013;9(3):e1003355.425

10. ShiH,KichaevG,PasaniucB.Contrastingthegeneticarchitectureof30complextraits426 fromsummaryassociationdata.TheAmericanJournalofHumanGenetics.427 2016;99(1):139-153.428

11. MaierRM,VisscherPM,RobinsonMR,WrayNR.Embracingpolygenicity:areviewof429 methodsandtoolsforpsychiatricgeneticsresearch.PsycholMed.2017:1-19.430

12. PareG,CookNR,RidkerPM,ChasmanDI.Ontheuseofvariancepergenotypeasatool431 toidentifyquantitativetraitinteractioneffects:areportfromtheWomen'sGenome432 HealthStudy.PLoSGenet.2010;6(6):e1000981.433

13. RönnegårdL,ValdarW.Recentdevelopmentsinstatisticalmethodsfordetecting434 geneticlociaffectingphenotypicvariability.BMCgenetics.2012;13(1):63.435

14. VanVleckLD.Variationofmilkrecordswithinpaternal-sibgroups.JournalofDairy436 Science.1968;51(9):1465-1470.437

15. HillWG,MulderHA.Geneticanalysisofenvironmentalvariation.GeneticsResearch.438 2010;92(5-6):381-395.439

16. StruchalinMV,DehghanA,WittemanJC,vanDuijnC,AulchenkoYS.Variance440 heterogeneityanalysisfordetectionofpotentiallyinteractinggeneticloci:methodand441 itslimitations.BMCGenet.2010;11:92.442

17. YangJ,LoosRJ,PowellJE,etal.FTOgenotypeisassociatedwithphenotypicvariability443 ofbodymassindex.Nature.2012;490(7419):267-272.444

18. BycroftC,FreemanC,PetkovaD,etal.TheUKBiobankresourcewithdeepphenotyping445 andgenomicdata.Nature.2018;562(7726):203-209.446

19. CollinsFS,VarmusH.Anewinitiativeonprecisionmedicine.NewEnglandJournalof447 Medicine.2015;372(9):793-795.448

20. ConoverWJ,JohnsonME,JohnsonMM.Acomparativestudyoftestsforhomogeneityof449 variances,withapplicationstotheoutercontinentalshelfbiddingdata.Technometrics.450 1981;23(4):351-361.451

21. BartlettMS.Propertiesofsufficiencyandstatisticaltests.Paperpresentedat:Proc.R.452 Soc.Lond.A1937.453

22. LeveneH.RobustTestsforEqualityofVariances.InIngramOlkin;HaroldHotelling;etal454 ContributionstoProbabilityandStatistics:EssaysinHonorofHaroldHotellingStanford455 UniversityPress,Stanford.1960:278–292.456

23. BrownMB,ForsytheAB.Robusttestsfortheequalityofvariances.Journalofthe457 AmericanStatisticalAssociation.1974;69(346):364-367.458

24. FlignerMA,KilleenTJ.Distribution-freetwo-sampletestsforscale.Journalofthe459 AmericanStatisticalAssociation.1976;71(353):210-213.460

25. RonnegardL,FellekiM,FikseF,MulderHA,StrandbergE.Geneticheterogeneityof461 residualvariance-estimationofvariancecomponentsusingdoublehierarchical462 generalizedlinearmodels.GenetSelEvol.2010;42:8.463

26. RonnegardL,ValdarW.Detectingmajorgeneticlocicontrollingphenotypicvariability464 inexperimentalcrosses.Genetics.2011;188(2):435-447.465

27. SmythGK.Generalizedlinearmodelswithvaryingdispersion.JournaloftheRoyal466 StatisticalSocietySeriesB(Methodological).1989:47-60.467

28. CaoY,WeiP,BaileyM,KauweJSK,MaxwellTJ.Aversatileomnibustestfordetecting468 meanandvarianceheterogeneity.GenetEpidemiol.2014;38(1):51-59.469

29. SunX,ElstonR,MorrisN,ZhuX.Whatisthesignificanceofdifferenceinphenotypic470 variabilityacrossSNPgenotypes?AmJHumGenet.2013;93(2):390-397.471

30. CortyRW,ValdarW.Mean-VarianceQTLMappingonaBackgroundofVariance472 Heterogeneity.bioRxiv.2018:276980.473

Page 14: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

14

31. WuY,ZhengZ,VisscherPM,YangJ.Quantifyingthemappingprecisionofgenome-wide474 associationstudiesusingwhole-genomesequencingdata.Genomebiology.475 2017;18(1):86.476

32. PulitSL,deWithSA,deBakkerPI.Resettingthebar:Statisticalsignificanceinwhole-477 genomesequencing-basedassociationstudiesofglobalpopulations.Genetic478 epidemiology.2017;41(2):145-151.479

33. YoungAI,WauthierFL,DonnellyP.Identifyinglociaffectingtraitvariabilityand480 detectinginteractionsingenome-wideassociationstudies.NaturePublishingGroup;2018.481 1546-1718.482

34. SacconeSF,HinrichsAL,SacconeNL,etal.Cholinergicnicotinicreceptorgenes483 implicatedinanicotinedependenceassociationstudytargeting348candidategenes484 with3713SNPs.Humanmoleculargenetics.2006;16(1):36-49.485

35. ThorgeirssonTE,GellerF,SulemP,etal.Avariantassociatedwithnicotinedependence,486 lungcancerandperipheralarterialdisease.Nature.2008;452(7187):638.487

36. FowlerCD,LuQ,JohnsonPM,MarksMJ,KennyPJ.Habenularα5nicotinicreceptor488 subunitsignallingcontrolsnicotineintake.Nature.2011;471(7340):597.489

37. RepapiE,SayersI,WainLV,etal.Genome-wideassociationstudyidentifiesfiveloci490 associatedwithlungfunction.Naturegenetics.2010;42(1):36.491

38. HancockDB,EijgelsheimM,WilkJB,etal.Meta-analysesofgenome-wideassociation492 studiesidentifymultiplelociassociatedwithpulmonaryfunction.Naturegenetics.493 2010;42(1):45.494

39. WainLV,ShrineN,ArtigasMS,etal.Genome-wideassociationanalysesforlungfunction495 andchronicobstructivepulmonarydiseaseidentifynewlociandpotentialdruggable496 targets.Naturegenetics.2017;49(3):416.497

40. Kaur-KnudsenD,NordestgaardBG,BojesenSE.CHRNA3genotype,nicotinedependence,498 lungfunctionanddiseaseinthegeneralpopulation.EuropeanRespiratoryJournal.499 2012;40(6):1538-1544.500

41. EstradaK,StyrkarsdottirU,EvangelouE,etal.Genome-widemeta-analysisidentifies56501 bonemineraldensitylociandreveals14lociassociatedwithriskoffracture.Nature502 genetics.2012;44(5):491.503

42. KempJP,MorrisJA,Medina-GomezC,etal.Identificationof153newlociassociatedwith504 heelbonemineraldensityandfunctionalinvolvementofGPC6inosteoporosis.Nature505 genetics.2017;49(10):1468.506

43. Medina-GomezC,KempJP,EstradaK,etal.Meta-analysisofgenome-widescansfortotal507 bodyBMDinchildrenandadultsrevealsallelicheterogeneityandage-specificeffectsat508 theWNT16locus.PLoSgenetics.2012;8(7):e1002718.509

44. Movérare-SkrticS,HenningP,LiuX,etal.Osteoblast-derivedWNT16represses510 osteoclastogenesisandpreventscorticalbonefragilityfractures.Naturemedicine.511 2014;20(11):1279.512

45. FraylingTM,TimpsonNJ,WeedonMN,etal.AcommonvariantintheFTOgeneis513 associatedwithbodymassindexandpredisposestochildhoodandadultobesity.514 Science.2007;316(5826):889-894.515

46. SmemoS,TenaJJ,KimK-H,etal.Obesity-associatedvariantswithinFTOformlong-516 rangefunctionalconnectionswithIRX3.Nature.2014;507(7492):371.517

47. ClaussnitzerM,DankelSN,KimK-H,etal.FTOobesityvariantcircuitryandadipocyte518 browninginhumans.NewEnglandJournalofMedicine.2015;373(10):895-907.519

48. KilpeläinenTO,QiL,BrageS,etal.PhysicalactivityattenuatestheinfluenceofFTO520 variantsonobesityrisk:ameta-analysisof218,166adultsand19,268children.PLoS521 medicine.2011;8(11):e1001116.522

49. LoosRJ,YeoGS.ThebiggerpictureofFTO—thefirstGWAS-identifiedobesitygene.523 NatureReviewsEndocrinology.2014;10(1):51.524

50. MooreR,CasaleFP,JanBonderM,etal.Alinearmixed-modelapproachtostudy525 multivariategene–environmentinteractions.NatureGenetics.2018.526

Page 15: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

15

51. EkWE,Rask-AndersenM,KarlssonT,EnrothS,GyllenstenU,JohanssonA.Genetic527 variantsinfluencingphenotypicvarianceheterogeneity.HumMolGenet.528 2018;27(5):799-810.529

52. McCarthyS,DasS,KretzschmarW,etal.Areferencepanelof64,976haplotypesfor530 genotypeimputation.NatGenet.2016;48(10):1279-1283.531

53. ZhangF,ChenW,ZhuZ,etal.OSCA:atoolforomic-data-basedcomplextraitanalysis.532 bioRxiv.2018:445163.533

54. TheUK10KConsortium.TheUK10Kprojectidentifiesrarevariantsinhealthand534 disease.Nature.2015;526(7571):82-90.535

55. ChangCC,ChowCC,TellierLC,VattikutiS,PurcellSM,LeeJJ.Second-generationPLINK:536 risingtothechallengeoflargerandricherdatasets.Gigascience.2015;4:7.537

56. GenomesProjectConsortium.Amapofhumangenomevariationfrompopulation-scale538 sequencing.Nature.2010;467(7319):1061.539

57. YangJ,LeeSH,GoddardME,VisscherPM.GCTA:atoolforgenome-widecomplextrait540 analysis.AmJHumGenet.2011;88(1):76-82.541

58. BrethertonCS,WidmannM,DymnikovVP,WallaceJM,BladéI.Theeffectivenumberof542 spatialdegreesoffreedomofatime-varyingfield.Journalofclimate.1999;12(7):1990-543 2009.544

59. IPAQResearchCommittee.Guidelinesfordataprocessingandanalysisofthe545 InternationalPhysicalActivityQuestionnaire(IPAQ)-shortandlongforms.546 www.ipaq.ki.se.2005.547

548

549

Page 16: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

16

Figures550

551

552

Figure1.Schematicofthedifferencesinmeanorvarianceamonggenotypegroupsinthe553 presenceofGEI,QTLandvQTLeffect.Thephenotypesof1,000individualsweresimulated554 basedonageneticvariant(MAF=0.3)witha)bothQTLandGEIeffects,(b)GEIeffectonly(no555 QTLeffect),(c)QTLeffectonly(noGEIorvQTLeffect),or(d)vQTLonly(noQTLeffect).556

557

Page 17: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

17

558 Figure2.Evaluationofthestatisticalmethodsandphenotypeprocessingstrategiesfor559 vQTLanalysisbysimulation.Phenotypesof10,000individualsweresimulatedbasedon560 differentnumberofSNPs(i.e.4,40or80),twocovariates(i.e.sexandage)andoneerrorterm561 inamultiple-SNPmodel(Methods).TheSNPeffectsweresimulatedunderfourscenarios:1)562 effectonneithermeannorvariance(nei),2)effectonmeanonly(mean),3)effectonvariance563 only(var),or4)effectonbothmeanandvariance(both).Theerrortermwasgeneratedfrom564 fivedifferentdistributions:normaldistribution,t-distributionwithdf=10or3,or𝜒.565 distributionwithdf=15or1.Inpanela,fourstatisticaltestmethods,i.e.,theBartlett’stest566

raw adj rint log

0.000.250.500.751.00

0.000.250.500.751.00

0.000.250.500.751.00

0.000.250.500.751.00

4 SNPs modelraw adj rint log

40 SNPs modelraw adj rint log

neim

eanvar

both

80 SNPs model

Bart Lev FK DGLM

0.000.250.500.751.00

0.000.250.500.751.00

0.000.250.500.751.00

0.000.250.500.751.00

4 SNPs modelBart Lev FK DGLM

40 SNPs modelBart Lev FK DGLM

neim

eanvar

both80 SNPs modela

b

Distribution of the residualsNormal distribution

t-distribution (df=10)

t-distribution (df=3)

Chi-squared distribution (df=15)

Chi-squared distribution (df=1)

FPR

FPR

Powe

rPo

wer

FPR

FPR

Powe

rPo

wer

Page 18: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

18

(Bart),theLevene’stest(Lev),theFligner-Killentest(FK)andtheDGLM,wereusedtodetect567 vQTLs.Inpanelb,theLevene’stestwasusedtoanalysephenotypesprocessedusingfour568 strategies,i.e.,rawphenotype(raw),rawphenotypeadjustedforcovariates(adj),rank-based569 inverse-normaltransformationaftercovariateadjustment(rint),andlogarithmtransformation570 aftercovariateadjustment(log).TheFPRorpowerwascalculatedasthenumberofvQTLswith571 p<0.05dividedbythetotalnumberoftestsacross1,000simulations.Theredhorizontalline572 representsanFPRof0.05.573 574

Page 19: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

19

575 Figure3.Manhattanplotsofgenome-widevQTLanalysisfor13traitsintheUKB.Foreach576 ofthe13traits(seeTable1forfullnamesofthetraits),teststatistics(-log10(PvQTL))ofall577

common(MAF³0.05)SNPsfromthevQTLanalysisareplottedagainsttheirphysicalpositions.578

Thedashlinerepresentsthegenome-widesignificancelevel1.0´10-8andthesolidline579

representstheexperiment-wisesignificancelevel2.0´10-9.Forgraphicalclarity,SNPswithPvQTL580

<1´10-25areomitted,SNPswithPvQTL<2.0´10-9arecolour-codedinorange,thetopvQTLSNP581 isrepresentedbyadiamond,andtheremainingSNPsarecolour-codedingreyorblueforodd582 orevenchromosome.583 584

Page 20: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

20

585 Figure4.RegionalplotsoftheFTOlocusassociatedwiththephenotypicvariabilityof5586 traits.Foreachofthese5traitsforwhichthephenotypicvarianceissignificantlyassociated587 withtheFTOlocus,vQTLteststatistics(-log10(PvQTL))areplottedagainstSNPpositions588 surroundingthetopvQTLSNP(representedbyapurplediamond)attheFTOlocus.SNPsin589 differentlevelsofLDwiththetopvQTLSNPareshownindifferentcolours.TheRefSeqgenesin590 thetoppanelareextractedfromtheUCSCGenomeBrowser(URLs).591 592

Page 21: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

21

593 Figure5.Manhattanplotsofgenome-widevQTLorQTLanalysisforwaistcircumference594 intheUKB.Teststatistics(-log10(PvQTL))ofallcommonSNPsfromvQTL(a)orQTL(b)analysis595 areplottedagainsttheirphysicalpositions.Thedashlinerepresentsthegenome-wide596

significancelevel1´10-8andthesolidlinerepresentstheexperiment-wisesignificancelevel597

2.0´10-9.Forgraphicalclarity,SNPswithPvQTL<1´10-25orPQTL<1´10-75areomitted,SNPswith598

P<2.0´10-9arecolour-codedinorange,thetopvQTLorQTLSNPisrepresentedbyadiamond,599 andtheremainingSNPsarecolour-codedingreyorblueforvQTLanalysis(a)orgreyorpink600 forQTLanalysis(b)foroddorevenchromosomes.601 602

Page 22: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

22

603 Figure6.QTLandvQTLregionalplotsoftheCCDC92orFTOlocusforwaist604 circumference.TheQTLandvQTLteststatistics(i.e.,-log10(Pvalues))forwaistcircumference605 areplottedagainstSNPpositionssurroundingthetopvQTLSNPattheCCDC92(panela)orFTO606 locus(panelb).ThetopvQTLSNPisrepresentedbyapurplediamond.SNPsindifferentlevels607 ofLDwiththetopvQTLSNPareshownindifferentcolours.TheRefSeqgenesinthetoppanel608 areextractedfromtheUCSCGenomeBrowser(URLs).609 610

Page 23: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

23

611 Figure7.EnrichmentofGEIeffectsamongthe75vQTLsincomparedwitharandomsetof612 QTLs.Fiveenvironmentalfactors,i.e.,sex,age,physicalactivity(PA),sedentarybehaviour(SB),613 andsmoking,wereusedintheGEIanalysis.(a)TheheatmapplotofGEIteststatistics(-614 log10(PGEI))forthe75topvQTLSNPs.“*”denotessignificantGEIeffectsafterBonferroni615

correction(PGEI<1.33´10-4=0.05/(75*5)).(b)ThedistributionofthenumberofsignificantGEI616 effectsfor75topQTLSNPsrandomlyselectedfromallthetopQTLSNPswith1000repeats617 (mean1.39andSD1.15).TheredlinerepresentsthenumberofsignificantGEIeffectsforthe75618 topvQTLSNPs(i.e.,16).619 620

0

100

200

300

0 5 10 15 20The number of top SNPs

Cou

nts

1

2

3

4

>=5

−log10(PGxE)

Trait

WC

HC

BMD

BW

BMI

BFP

BMR

WHR

FFR

a b

*

*

*

***

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

Sex Age PA SB Smoking

rs476828(MC4R)rs1421085(FTO)

rs78719460(GOLGA3)rs80083564(BDNF)

rs10456362(ZKSCAN4)rs62033406(FTO)

rs2523625(HLA−B)rs900399(CCNL1)rs1128249(GRB14)

rs2820468(LYPLAL1)rs459193(C5orf67)rs2238691(GIPR)

rs11152213(MC4R)rs1421085(FTO)

rs34898535(STX1B)rs8056890(ATP2A1)

rs10846580(CCDC92)rs17789506(KLF14)

rs141783576(RSPO3)rs72891717(TFAP2B)rs3132947(GPSM3)

rs34158769(BTN3A2)rs10200566(ADCY3)rs6751993(TMEM18)

rs62104180(FAM150B)rs2605098(LYPLAL1)

rs6685593(OPTC)rs1800437(GIPR)

rs11152213(MC4R)rs1421085(FTO)

rs34898535(STX1B)rs8056890(ATP2A1)rs7133378(CCDC92)rs12667251(KLF14)rs987237(TFAP2B)

rs4472337(UHRF1BP1)rs1062070(RNF5)rs13198716(ABT1)

rs12507026(GNPDA2)rs7649970(PPARG)

rs13412194(TMEM18)rs62104180(FAM150B)rs10913469(SEC16B)

rs2238691(GIPR)rs10871777(MC4R)rs11642015(FTO)

rs12716979(STX1B)rs4072402(RABEP2)

rs11057413(ZNF664−FAM101A)rs7132908(BCDIN3D)

rs2049045(BDNF)rs4132670(TCF7L2)rs17150703(MSRA)rs987237(TFAP2B)rs3132947(GPSM3)

rs34817112(PRSS16)rs12507026(GNPDA2)

rs10016841(SLIT2)rs1225053(CPNE4)rs1641155(FANCL)

rs10203386(ADCY3)rs6751993(TMEM18)

rs62104180(FAM150B)rs6689335(LYPLAL1)rs545608(SEC16B)rs13322435(CCNL1)rs603140(TMEM135)rs10254825(WNT16)

rs4576334(STARD3NL)rs3020332(ESR1)

rs9371221(CCDC170)rs1414660(GREM2)

rs56077333(CHRNA3)rs12374521(FBXO38)

rs6537292(HHIP)

Page 24: variability in the UK Biobank · 1 1 Genotype-by-environment interactions inferred from genetic effects on phenotypic 2 variability in the UK Biobank 3 4 Huanwei Wang1, Futao Zhang1,

24

Table1.Thenumberofexperiment-wisesignificantvQTLsorQTLsforthe13UKBtraits.621

Trait DescriptionNumber

ofvQTLs

Numberof

QTLs

HT Standingheight 0 1063

FVC Forcedvitalcapacity 0 325

FEV1 Forcedexpiratoryvolumein1-second 0 266

FFR FEV1andFVCratio 3 17

BMD HeelbonemineraldensityT-score,automated 6 267

BW Birthweight 1 57

BMI Bodymassindex 22 271

WC Waistcircumference 16 196

HC Hipcircumference 16 249

WHR WaisttoHipRatio 1 157

WHRadjBMI WHRadjustedforBMI 0 221

BFP Bodyfatpercentage 5 249

BMR Basalmetabolicrate 5 465

Total 75 3,803

622 623