QTL - Biostatistics and Medical Informatics (kbroman/teaching/...)


Introduction to QTL mapping in experimental crosses

Karl W Broman

Department of Biostatistics

The Johns Hopkins University

http://biosun01.biostat.jhsph.edu/~kbroman

Outline

• Experiments and data
• Models
• ANOVA at marker loci
• Interval mapping
• LOD thresholds
• LOD support intervals
• Power to detect QTLs
• How many markers/mice?
• Selection bias
• Errors in the map
• Genotyping errors
• Selective genotyping
• Covariates
• Non-normal traits

Backcross experiment

[Diagram: parental strains P1 (AA) and P2 (BB); F1 (AB); backcross (BC) progeny]

Intercross experiment

[Diagram: parental strains P1 (AA) and P2 (BB); F1 (AB); intercross (F2) progeny]

Phenotype distributions

• Within each of the parental and F1 strains, individuals are genetically identical.
• Environmental variation may or may not be constant with genotype.
• The backcross generation exhibits genetic as well as environmental variation.

[Figure: phenotype distributions (phenotype axis 0 to 100) for the parental strains, the F1 generation, and the backcross generation]

Data and Goals

Phenotypes: $y_i$ = phenotype for mouse $i$

Genotypes: $x_{ij}$ = 1/0 if mouse $i$ is BB/AB at marker $j$ (for a backcross)

Genetic map: Locations of markers

Goals:

• Identify the (or at least one) genomic regions (QTLs) that contribute to variation in the phenotype.
• Form confidence intervals for QTL locations.
• Estimate QTL effects.

[Figure: Genetic map. Marker locations (0 to 80 cM) on chromosomes 1 to 19 and X.]

[Figure: Missing genotypes. Individuals by markers, chromosomes 1 to 19 and X.]

Models: Recombination

We assume:

• Mendel's rules: $\Pr(x_{ij} = 0) = \Pr(x_{ij} = 1) = 1/2$
• No crossover interference

Locations of crossovers are according to a Poisson process.

$\{x_{ij}\}$ form a Markov chain with transition probabilities:

$$\Pr(x_{i,j+1} = 0 \mid x_{ij} = 1) = \Pr(x_{i,j+1} = 1 \mid x_{ij} = 0) = r_j$$

$r_j$ = recombination fraction = $\frac{1}{2}(1 - e^{-2 d_j})$

$d_j$ is the genetic distance in Morgans.
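As a rough illustration of the recombination model above, the sketch below converts a map distance to a recombination fraction under no interference and simulates backcross genotypes along one chromosome as a Markov chain. The function names, the marker positions, and the seed are hypothetical choices made for this example; they do not come from the slides.

```python
import math
import random

def haldane_r(d_morgans):
    """Recombination fraction for a genetic distance in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def simulate_backcross_chromosome(marker_pos_cM, rng):
    """Simulate 1/0 (BB/AB) genotypes along one chromosome as a Markov chain."""
    geno = [1 if rng.random() < 0.5 else 0]        # first marker: BB or AB with probability 1/2
    for left, right in zip(marker_pos_cM, marker_pos_cM[1:]):
        r = haldane_r((right - left) / 100.0)      # convert cM to Morgans
        prev = geno[-1]
        # switch genotype with probability r, otherwise copy the previous marker
        geno.append(1 - prev if rng.random() < r else prev)
    return geno

rng = random.Random(1)                              # fixed seed, for reproducibility only
print(round(haldane_r(0.10), 4))                    # 10 cM -> 0.0906
print(simulate_backcross_chromosome([0, 10, 20, 30, 40], rng))
```

Note that a 10 cM interval gives a recombination fraction slightly below 0.10, because double crossovers within the interval go unobserved.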

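Markov chain: $\Pr(x_{i,j+1} \mid x_{ij}, x_{i,j-1}, \ldots, x_{i1}) = \Pr(x_{i,j+1} \mid x_{ij})$

Models: Genotype → Phenotype

Let $y$ = phenotype

$g$ = whole genome genotype

Imagine a small number of QTLs with genotypes $g_1, \ldots, g_p$. ($2^p$ distinct genotypes)

$$E(y \mid g) = \mu_{g_1,\ldots,g_p} \qquad \operatorname{var}(y \mid g) = \sigma^2_{g_1,\ldots,g_p}$$

Homoscedasticity (constant variance): $\sigma^2_g \equiv \sigma^2$

Normally distributed residual variation: $y \mid g \sim N(\mu_g, \sigma^2)$

Additivity: $\mu_{g_1,\ldots,g_p} = \mu + \sum_{j=1}^{p} \Delta_j g_j$ ($g_j$ = 1 or 0)

Epistasis: Any deviations from additivity.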

The simplest method: ANOVA

• Also known as marker regression.
• Split mice into groups according to genotype at a marker.
• Do a t-test / ANOVA.
• Repeat for each marker.

[Figure: phenotype (40 to 80) by genotype (BB, AB) at markers D1M30 and D2M99]
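The marker-by-marker procedure in the bullets above can be sketched as follows for a backcross. This is a minimal illustration, assuming genotypes coded 1 = BB, 0 = AB, and None for missing; the pooled-variance t statistic is one reasonable choice, and the data are made up.

```python
import math
from statistics import mean, variance

def marker_t_statistic(phenotypes, genotypes):
    """Pooled-variance t statistic comparing BB (1) and AB (0) mice at one marker.
    Mice with a missing genotype (None) fall into neither group and are dropped."""
    bb = [y for y, g in zip(phenotypes, genotypes) if g == 1]
    ab = [y for y, g in zip(phenotypes, genotypes) if g == 0]
    pooled_var = ((len(bb) - 1) * variance(bb) + (len(ab) - 1) * variance(ab)) \
        / (len(bb) + len(ab) - 2)
    se = math.sqrt(pooled_var * (1 / len(bb) + 1 / len(ab)))
    return (mean(bb) - mean(ab)) / se

def scan_markers(phenotypes, genotype_matrix):
    """genotype_matrix[j] holds the genotypes of all mice at marker j."""
    return [marker_t_statistic(phenotypes, col) for col in genotype_matrix]

y = [62, 58, 65, 60, 48, 52, 50, 46, 55]          # made-up phenotypes
markers = [
    [1, 1, 1, 1, 0, 0, 0, 0, None],               # hypothetical marker linked to a QTL
    [1, 0, 1, 0, 1, 0, 1, 0, 1],                  # hypothetical unlinked marker
]
print([round(t, 2) for t in scan_markers(y, markers)])
```

The first marker gives a large t statistic and the second a small one. The dropped mouse at the first marker illustrates why missing genotype data is listed as a disadvantage of this method.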

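Effect at a marker

Consider the case of a single QTL with effect $\Delta = \mu_{BB} - \mu_{AB}$. Consider a marker linked to the QTL, with $r$ = recomb. frac.

Of individuals with marker genotype BB, mean phenotype is:

$$\mu_{BB}(1 - r) + \mu_{AB}\, r = \mu_{BB} - r\Delta$$

Of individuals with marker genotype AB, mean phenotype is:

$$\mu_{AB}(1 - r) + \mu_{BB}\, r = \mu_{AB} + r\Delta$$

Difference: $(\mu_{BB} - r\Delta) - (\mu_{AB} + r\Delta) = \Delta(1 - 2r)$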
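A quick numerical check of the attenuation factor $(1 - 2r)$, using made-up means $\mu_{BB} = 60$ and $\mu_{AB} = 50$ (so $\Delta = 10$). The function name is hypothetical.

```python
def marker_mean_difference(mu_bb, mu_ab, r):
    """Difference in mean phenotype between marker genotype groups BB and AB,
    for a marker at recombination fraction r from the QTL."""
    mean_bb_marker = mu_bb * (1 - r) + mu_ab * r   # BB at marker: QTL is BB w.p. 1 - r
    mean_ab_marker = mu_ab * (1 - r) + mu_bb * r   # AB at marker: QTL is AB w.p. 1 - r
    return mean_bb_marker - mean_ab_marker

for r in (0.0, 0.1, 0.25, 0.5):
    # each difference should match delta * (1 - 2r) up to rounding error
    print(r, round(marker_mean_difference(60.0, 50.0, r), 6))
```

At $r = 0.5$ (an unlinked marker) the difference vanishes, which is why a marker far from the QTL shows no effect.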

ANOVA at marker loci

Advantages

• Simple.
• Easily incorporates covariates.
• Easily extended to more complex models.
• Doesn't require a genetic map.

Disadvantages

• Must exclude individuals with missing genotype data.
• Imperfect information about QTL location.
• Suffers in low density scans.
• Only considers one QTL at a time.

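Interval mapping (IM)

Lander & Botstein (1989)

• Assume a single QTL model.
• Each position in the genome, one at a time, is posited as the putative QTL.
• Let $z$ = 1/0 if the (unobserved) QTL genotype is BB/AB. Assume $y = \mu + \Delta z + \epsilon$ where $\epsilon \sim N(0, \sigma^2)$.
• Given genotypes at linked markers, $y \sim$ mixture of normal dist'ns with mixing proportion $\Pr(z = 1 \mid \text{marker data})$:

Conditional probabilities of the QTL genotype given the genotypes at flanking markers $M_1$ and $M_2$. Here $r_L$ and $r_R$ are the recombination fractions between the QTL and the left and right markers, and $r$ is the recombination fraction between the two markers.

    M_1   M_2   QTL genotype BB              QTL genotype AB
    BB    BB    $(1-r_L)(1-r_R)/(1-r)$       $r_L r_R/(1-r)$
    BB    AB    $(1-r_L)\, r_R/r$            $r_L (1-r_R)/r$
    AB    BB    $r_L (1-r_R)/r$              $(1-r_L)\, r_R/r$
    AB    AB    $r_L r_R/(1-r)$              $(1-r_L)(1-r_R)/(1-r)$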
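The conditional probabilities for the QTL genotype can be computed directly from the two flanking recombination fractions. The sketch below assumes no interference (the Haldane map function), so that $r = r_L + r_R - 2 r_L r_R$; the function names are hypothetical, and the 7 cM / 13 cM distances match the example on the next slide.

```python
import math

def haldane_r(d_cM):
    """Recombination fraction for a distance in cM, assuming no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))

def prob_qtl_bb(left_geno, right_geno, d_left_cM, d_right_cM):
    """Pr(QTL genotype is BB | flanking marker genotypes), coded 1 = BB, 0 = AB."""
    r_l = haldane_r(d_left_cM)     # QTL to left marker
    r_r = haldane_r(d_right_cM)    # QTL to right marker

    def joint(q):
        # Probability of both marker genotypes given QTL genotype q:
        # each marker matches q unless a recombination occurred in between.
        p_left = (1 - r_l) if q == left_geno else r_l
        p_right = (1 - r_r) if q == right_geno else r_r
        return p_left * p_right

    # The two QTL genotypes are equally likely a priori, so the prior cancels.
    return joint(1) / (joint(1) + joint(0))

for left, right in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(left, right, round(prob_qtl_bb(left, right, 7.0, 13.0), 3))
```

With the QTL closer to the left marker, a mouse that is BB at the left marker and AB at the right is more likely BB than AB at the QTL.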

The normal mixtures

[Diagram: markers $M_1$ and $M_2$ with the QTL between them, 7 cM from $M_1$ and 13 cM from $M_2$]

• Two markers separated by 20 cM, with the QTL closer to the left marker.
• The figure at right shows the distributions of the phenotype conditional on the genotypes at the two markers.
• The dashed curves correspond to the components of the mixtures.

[Figure: phenotype distributions (phenotype axis 20 to 100) for marker genotypes BB/BB, BB/AB, AB/BB, and AB/AB, with mixture components centered at µ and µ+∆]

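Interval mapping (continued)

Let $p_i = \Pr(z_i = 1 \mid \text{marker data})$

$y_i \mid z_i \sim N(\mu + \Delta z_i, \sigma^2)$

$$\Pr(y_i \mid \text{marker data}, \mu, \Delta, \sigma) = p_i\, f(y_i; \mu + \Delta, \sigma) + (1 - p_i)\, f(y_i; \mu, \sigma)$$

where $f(y; \mu, \sigma) = \exp[-(y - \mu)^2 / (2\sigma^2)] / \sqrt{2\pi\sigma^2}$

Log likelihood: $l(\mu, \Delta, \sigma) = \sum_i \log \Pr(y_i \mid \text{marker data}, \mu, \Delta, \sigma)$

Maximum likelihood estimates (MLEs) of $\mu$, $\Delta$, $\sigma$: values for which $l(\mu, \Delta, \sigma)$ is maximized.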
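The mixture log likelihood can be written out in a few lines. This is a sketch only: `p` holds the conditional probabilities $p_i$ at the putative QTL position, and the phenotypes and probabilities below are invented for illustration.

```python
import math

def normal_density(y, mu, sigma):
    """Density f(y; mu, sigma) of a normal distribution with mean mu and SD sigma."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def log_likelihood(y, p, mu, delta, sigma):
    """Sum over mice of the log of the two-component normal mixture density."""
    total = 0.0
    for y_i, p_i in zip(y, p):
        total += math.log(p_i * normal_density(y_i, mu + delta, sigma)
                          + (1 - p_i) * normal_density(y_i, mu, sigma))
    return total

y = [48.0, 52.0, 50.0, 61.0, 59.0, 63.0]           # made-up phenotypes
p = [0.05, 0.10, 0.30, 0.90, 0.95, 0.70]           # made-up Pr(z_i = 1 | marker data)
print(log_likelihood(y, p, mu=50.0, delta=10.0, sigma=2.0))
print(log_likelihood(y, p, mu=55.0, delta=0.0, sigma=6.0))
```

The first parameter set, with a QTL effect of 10, gives the higher log likelihood on these data.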

EM algorithm

Dempster et al. (1977)

E step:

Let
$$w_i^{(k+1)} = \Pr(z_i = 1 \mid y_i, \text{marker data}, \hat\mu^{(k)}, \hat\Delta^{(k)}, \hat\sigma^{(k)}) = \frac{p_i\, f(y_i; \hat\mu^{(k)} + \hat\Delta^{(k)}, \hat\sigma^{(k)})}{p_i\, f(y_i; \hat\mu^{(k)} + \hat\Delta^{(k)}, \hat\sigma^{(k)}) + (1 - p_i)\, f(y_i; \hat\mu^{(k)}, \hat\sigma^{(k)})}$$

M step:

Let
$$\hat\mu^{(k+1)} = \frac{\sum_i y_i (1 - w_i^{(k+1)})}{\sum_i (1 - w_i^{(k+1)})}$$

$$\hat\Delta^{(k+1)} = \frac{\sum_i y_i w_i^{(k+1)}}{\sum_i w_i^{(k+1)}} - \hat\mu^{(k+1)}$$

$\hat\sigma^{(k+1)}$ = a bit complicated

The algorithm:

Start with $w_i^{(1)} = p_i$; iterate the E & M steps until convergence.
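A minimal sketch of the iteration follows. The updates for $\hat\mu$ and $\hat\Delta$ follow the M step above; the update for $\hat\sigma$ is not spelled out on the slide, so the code assumes the usual weighted residual variance for a normal mixture with known mixing proportions. The function name, tolerance, and data are hypothetical.

```python
import math

def normal_density(y, mu, sigma):
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def em_single_qtl(y, p, tol=1e-8, max_iter=1000):
    """EM for y_i ~ p_i N(mu + delta, sigma^2) + (1 - p_i) N(mu, sigma^2)."""
    n = len(y)
    w = list(p)                                   # start with w_i = p_i
    mu = delta = sigma = None
    for _ in range(max_iter):
        # M step: weighted group means, then a weighted residual variance
        sum_w = sum(w)
        new_mu = sum(y_i * (1 - w_i) for y_i, w_i in zip(y, w)) / (n - sum_w)
        new_delta = sum(y_i * w_i for y_i, w_i in zip(y, w)) / sum_w - new_mu
        rss = sum(w_i * (y_i - new_mu - new_delta) ** 2 + (1 - w_i) * (y_i - new_mu) ** 2
                  for y_i, w_i in zip(y, w))
        new_sigma = math.sqrt(rss / n)            # assumed update; not given on the slide
        converged = mu is not None and max(abs(new_mu - mu), abs(new_delta - delta),
                                           abs(new_sigma - sigma)) < tol
        mu, delta, sigma = new_mu, new_delta, new_sigma
        if converged:
            break
        # E step: posterior probability that the QTL genotype is BB
        w = []
        for y_i, p_i in zip(y, p):
            f_bb = p_i * normal_density(y_i, mu + delta, sigma)
            f_ab = (1 - p_i) * normal_density(y_i, mu, sigma)
            w.append(f_bb / (f_bb + f_ab))
    return mu, delta, sigma

y = [48.0, 52.0, 50.0, 47.0, 61.0, 59.0, 63.0, 58.0]      # made-up phenotypes
p = [0.05, 0.10, 0.05, 0.30, 0.90, 0.95, 0.70, 0.95]      # made-up Pr(z_i = 1 | marker data)
print(em_single_qtl(y, p))
```

When every $p_i$ is 0 or 1 (the putative QTL sits at a fully typed marker), the weights never change and the estimates reduce to the group means used in ANOVA.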

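LOD scores

The LOD score is a measure of the strength of evidence for the presence of a QTL at a particular location.

$\mathrm{LOD}(z)$ = $\log_{10}$ likelihood ratio comparing the hypothesis of a QTL at position $z$ versus that of no QTL

$$\mathrm{LOD}(z) = \log_{10} \left\{ \frac{\Pr(y \mid \text{QTL at } z, \hat\mu_z, \hat\Delta_z, \hat\sigma_z)}{\Pr(y \mid \text{no QTL}, \hat\mu, \hat\sigma)} \right\}$$

$\hat\mu_z, \hat\Delta_z, \hat\sigma_z$ are the MLEs, assuming a single QTL at position $z$.

No QTL model: The phenotypes are independent and identically distributed (iid) $N(\mu, \sigma^2)$.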
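For the special case of a putative QTL located exactly at a fully typed marker, both likelihoods are ordinary normal likelihoods and the LOD score has a closed form: with $\hat\sigma^2 = \mathrm{RSS}/n$ under each model, $\mathrm{LOD} = \frac{n}{2} \log_{10}(\mathrm{RSS}_0 / \mathrm{RSS}_1)$. The sketch below covers only that special case (no missing genotypes, no positions between markers); the function name and data are made up.

```python
import math

def lod_at_marker(y, geno):
    """LOD score for a QTL at a fully typed marker (genotypes coded 1 = BB, 0 = AB)."""
    n = len(y)
    mean_all = sum(y) / n
    rss0 = sum((v - mean_all) ** 2 for v in y)          # no-QTL model: one common mean
    rss1 = 0.0
    for g in (0, 1):                                    # QTL model: one mean per genotype
        group = [v for v, x in zip(y, geno) if x == g]
        group_mean = sum(group) / len(group)
        rss1 += sum((v - group_mean) ** 2 for v in group)
    return (n / 2) * math.log10(rss0 / rss1)

y = [50.0, 52.0, 48.0, 60.0, 62.0, 58.0]                # made-up phenotypes
print(round(lod_at_marker(y, [0, 0, 0, 1, 1, 1]), 3))   # genotypes that track the phenotype
print(round(lod_at_marker(y, [0, 1, 0, 1, 0, 1]), 3))   # genotypes that do not
```

Between markers, or with missing genotypes, the numerator likelihood is the mixture likelihood maximized by EM rather than this closed form.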

An example LOD curve

[Figure: LOD (0 to 12) versus chromosome position (0 to 100 cM)]

[Figure: LOD curves. LOD (0 to 3) versus map position (cM) across chromosomes 1 to 19.]

Interval mapping

Advantages

• Takes proper account of missing data.
• Allows examination of positions between markers.
• Gives improved estimates of QTL effects.
• Provides pretty graphs.

Disadvantages

• Increased computation time.
• Requires specialized software.
• Difficult to generalize.
• Only considers one QTL at a time.

Multiple QTL methods

Why consider multiple QTLs at once?

• Reduce residual variation.
• Separate linked QTLs.
• Investigate interactions between QTLs (epistasis).

LOD thresholds

Large LOD scores indicate evidence for the presence of a QTL.

Q: How large is large?

→ We consider the distribution of the LOD score under the null hypothesis of no QTL.

Key point: We must make some adjustment for our examination of multiple putative QTL locations.

→ We seek the distribution of the maximum LOD score, genome-wide. The 95th %ile of this distribution serves as a genome-wide LOD threshold.

Estimating the threshold: simulations, analytical calculations, permutation (randomization) tests.

Null distribution of the LOD score

• Null distribution derived by computer simulation of backcross with genome of typical size.
• Solid curve: distribution of LOD score at any one point.
• Dashed curve: distribution of maximum LOD score, genome-wide.

[Figure: null distributions of the LOD score (LOD score axis 0 to 4)]

Permutation tests

[Diagram: genotype data (mice by markers) and phenotypes → $\mathrm{LOD}(z)$ (a set of curves) → $M = \max_z \mathrm{LOD}(z)$]

• Permute/shuffle the phenotypes; keep the genotype data intact.
• Calculate $\mathrm{LOD}^*(z) \to M^* = \max_z \mathrm{LOD}^*(z)$.
• We wish to compare the observed $M$ to the distribution of $M^*$.
• $\Pr(M^* \ge M)$ is a genome-wide P-value.
• The 95th %ile of $M^*$ is a genome-wide LOD threshold.
• We can't look at all $n!$ possible permutations, but a random set of 1000 is feasible and provides reasonable estimates of P-values and thresholds.
• Value: conditions on observed phenotypes, marker density, and pattern of missing data; doesn't rely on normality assumptions or asymptotics.
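The permutation procedure can be sketched as below. To keep the example short, the LOD score is computed only at fully typed markers (the closed form $\frac{n}{2}\log_{10}(\mathrm{RSS}_0/\mathrm{RSS}_1)$), not by interval mapping; the function names, the simulated data, and the choice of 200 permutations are assumptions made for this illustration.

```python
import math
import random

def lod_at_marker(y, geno):
    """LOD score at a fully typed marker (1 = BB, 0 = AB)."""
    n = len(y)
    mean_all = sum(y) / n
    rss0 = sum((v - mean_all) ** 2 for v in y)
    rss1 = 0.0
    for g in (0, 1):
        group = [v for v, x in zip(y, geno) if x == g]
        group_mean = sum(group) / len(group)
        rss1 += sum((v - group_mean) ** 2 for v in group)
    return (n / 2) * math.log10(rss0 / rss1)

def max_lod(y, genotype_matrix):
    """Maximum LOD over all markers; genotype_matrix[j] is the column for marker j."""
    return max(lod_at_marker(y, col) for col in genotype_matrix)

def permutation_threshold(y, genotype_matrix, n_perm=1000, alpha=0.05, seed=0):
    """Return (observed M, approximate (1 - alpha) quantile of M*, genome-wide P-value)."""
    rng = random.Random(seed)
    observed = max_lod(y, genotype_matrix)
    shuffled = list(y)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                      # permute phenotypes; genotypes stay intact
        null_max.append(max_lod(shuffled, genotype_matrix))
    null_max.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    p_value = sum(m >= observed for m in null_max) / n_perm
    return observed, null_max[index], p_value

# Simulated example: 60 mice, 5 unlinked markers, a QTL at the third marker.
rng = random.Random(3)
geno = [[rng.randrange(2) for _ in range(60)] for _ in range(5)]
y = [50 + 10 * g + rng.gauss(0, 3) for g in geno[2]]
print(permutation_threshold(y, geno, n_perm=200, seed=1))
```

Because the genotype data are never shuffled, the marker density and the correlation between markers are the same in every permutation replicate, which is the "conditions on" property noted in the last bullet.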

LOD support intervals

$s$-LOD support interval

• Chromosomal region for which the LOD score is within $s$ of its maximum.
• Generally $s$ = 1 or 2; I prefer $s$ = 1.5.
• Plot of LOD − max(LOD) depicts evidence for QTL location.
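Given LOD scores on a grid of positions, an $s$-LOD support interval can be read off as sketched below. This is one simple convention, assumed for the example: it returns the outermost grid points in the contiguous run around the peak whose LOD is within $s$ of the maximum. The LOD values are invented.

```python
def lod_support_interval(positions_cM, lod, s=1.5):
    """Contiguous region around the peak where LOD >= max(LOD) - s, on the given grid."""
    peak = max(lod)
    peak_idx = lod.index(peak)
    lo = peak_idx
    while lo > 0 and lod[lo - 1] >= peak - s:              # extend left while within s of the peak
        lo -= 1
    hi = peak_idx
    while hi < len(lod) - 1 and lod[hi + 1] >= peak - s:   # extend right likewise
        hi += 1
    return positions_cM[lo], positions_cM[hi]

positions = [20, 25, 30, 35, 40, 45, 50, 55, 60]
lod = [1.0, 2.2, 3.9, 5.1, 5.6, 4.8, 3.5, 2.0, 0.9]        # made-up LOD curve
print(lod_support_interval(positions, lod))                # (35, 45)
print(lod_support_interval(positions, lod, s=2))           # (30, 45)
```

A finer grid gives a more precise interval; with a coarse grid one might instead extend to the first grid point that falls below the cutoff, to be conservative.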

[Figure: LOD − max(LOD) (−2.0 to 0.0) versus chromosome position (20 to 60 cM)]

Power to detect QTLs

• The power to detect a QTL is the chance that its LOD score exceeds the genome-wide threshold.
• Power depends on
  – Size of the QTL effect.
  – Number of progeny.
  – Type of cross.
  – Density of markers.
  – Stringency of the LOD threshold.
• At right:
  – Dashed curve: dist'n of max LOD under null hypothesis.
  – Solid and dotted curves: dist'n of LOD score at QTL, for two sample sizes $n$.
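Power can be estimated by simulation, as in the rough sketch below. It is not the simulation behind the figure: it assumes a backcross with the QTL at a fully typed marker, a residual SD of 10, and a fixed LOD threshold of 3.0, all of which are hypothetical settings chosen for illustration. In practice the threshold would be the genome-wide one.

```python
import math
import random

def lod_at_marker(y, geno):
    """LOD score at a fully typed marker (1 = BB, 0 = AB)."""
    n = len(y)
    mean_all = sum(y) / n
    rss0 = sum((v - mean_all) ** 2 for v in y)
    rss1 = 0.0
    for g in (0, 1):
        group = [v for v, x in zip(y, geno) if x == g]
        group_mean = sum(group) / len(group)
        rss1 += sum((v - group_mean) ** 2 for v in group)
    return (n / 2) * math.log10(rss0 / rss1)

def estimate_power(effect, n_mice, sigma, lod_threshold, n_sim=500, seed=0):
    """Fraction of simulated backcrosses in which the LOD at the QTL exceeds the threshold."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sim):
        geno = [rng.randrange(2) for _ in range(n_mice)]          # BB/AB with probability 1/2
        y = [effect * g + rng.gauss(0.0, sigma) for g in geno]    # additive effect plus noise
        if lod_at_marker(y, geno) > lod_threshold:
            detected += 1
    return detected / n_sim

for effect in (5.0, 8.0, 11.0):
    print(effect, estimate_power(effect, n_mice=100, sigma=10.0, lod_threshold=3.0))
```

Rerunning with a larger `n_mice` or a lower threshold shows the dependence on number of progeny and threshold stringency listed above.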

[Figure: three panels of LOD score distributions (LOD score axis 0 to 9). QTL effect = 5: power = 2%. QTL effect = 8: power = 41%. QTL effect = 11: power = 97%.]

How many markers/mice?

• At right:
  – Top: $n$ = 100
  – Bottom: $n$ = 200
  – Solid: 10 cM spacing
  – Dashed: 1 cM spacing
• More mice:
  – More recombination breakpoints.
  – Reduced sampling variation.
• More markers:
  – More detailed genotype information.
  – Not necessarily increased precision (depends on number of mice, size of QTL effect, and luck).
• Note: The figures should be taken with a grain of salt.

[Figure: LOD − max(LOD) (−2.0 to 0.0) versus chromosome position (20 to 50 cM), for 100 mice (top) and 200 mice (bottom)]

Selection bias

• The estimated effect of a QTL will vary somewhat from its true effect.
• Only when the estimated effect is large will the QTL be detected.
• Among those experiments in which the QTL is detected, the estimated QTL effect will be, on average, larger than its true effect.
• This is selection bias.
• Selection bias is largest in QTLs with small or moderate effects.
• The true effects of QTLs that we identify are likely smaller than was observed.
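Selection bias is easy to reproduce by simulation. The sketch below averages the estimated effect over only those simulated experiments in which the QTL is detected. As before, the settings are assumptions for illustration (QTL at a fully typed marker, 100 backcross mice, residual SD 10, fixed LOD threshold 3.0), so the percentages will not match those in the figure.

```python
import math
import random

def selection_bias(effect, n_mice=100, sigma=10.0, lod_threshold=3.0, n_sim=1000, seed=0):
    """Return (power, percent bias of the estimated effect among detected QTLs)."""
    rng = random.Random(seed)
    detected = []
    for _ in range(n_sim):
        geno = [rng.randrange(2) for _ in range(n_mice)]
        y = [effect * g + rng.gauss(0.0, sigma) for g in geno]
        bb = [v for v, g in zip(y, geno) if g == 1]
        ab = [v for v, g in zip(y, geno) if g == 0]
        m_all, m_bb, m_ab = sum(y) / n_mice, sum(bb) / len(bb), sum(ab) / len(ab)
        rss0 = sum((v - m_all) ** 2 for v in y)
        rss1 = sum((v - m_bb) ** 2 for v in bb) + sum((v - m_ab) ** 2 for v in ab)
        lod = (n_mice / 2) * math.log10(rss0 / rss1)
        if lod > lod_threshold:                     # keep the estimate only if the QTL is detected
            detected.append(m_bb - m_ab)
    power = len(detected) / n_sim
    if not detected:
        return power, float("nan")
    bias_pct = 100 * (sum(detected) / len(detected) - effect) / effect
    return power, bias_pct

for effect in (5.0, 8.0, 11.0):
    power, bias = selection_bias(effect)
    print(effect, round(power, 2), round(bias, 1))
```

The bias shrinks as the effect grows, because a QTL that is almost always detected is hardly selected at all.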

[Figure: three panels of the distribution of the estimated QTL effect (axis 0 to 15). QTL effect = 5: bias = 79%. QTL effect = 8: bias = 18%. QTL effect = 11: bias = 1%.]

The genetic map: effects of errors

• Marker order
  – Causes wiggly LOD curves.
  – Shouldn't completely eliminate a signal.
• Map distances
  – Doesn't seem to make much difference.
  – Makes a big difference in perceived length of LOD support intervals.
• Greater effects of errors with appreciable missing data.

[Figure: LOD versus map position (cM) on chromosome 1. Panels: error in marker order; error in marker spacing.]

The genetic map: finding problems

• Consider:
  – Estimated genetic map
  – Pairwise recombination fractions
• Misplaced markers cause:
  – Big gaps in the map
  – Large recombination fractions

[Figure: Comparison of genetic maps. Location (0 to 1200 cM) on chromosome 1.]

[Figure: Pairwise recombination fractions and LOD scores. Markers by markers, chromosome 1.]

Genotyping errors: effects

• With genotyping errors, individuals are placed in the wrong genotype group.
• With widely spaced markers, there is little effect.
• With dense markers, errors make the LOD curve have more dips.

[Figure: LOD versus map position (0 to 100 cM) on chromosome 1. Panels: genotyping errors with 10 cM spacing; genotyping errors with 2 cM spacing.]

Identifying genotyping errors

• Look for tight double crossovers. (Crossover interference is often strong.)
• Error LOD scores (Lincoln & Lander 1992)
  – Model for genotyping errors.
  – Assumed error rate.
  – Assumption of no interference.
  – At marker $j$ in mouse $i$, LOD = $\log_{10} \left\{ \dfrac{\Pr(\text{genotype in error} \mid \text{marker data})}{\Pr(\text{genotype correct} \mid \text{marker data})} \right\}$
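The first suggestion, looking for tight double crossovers, can be sketched as a simple screen. This is only a heuristic written for illustration, not the Lincoln & Lander error LOD score: it flags a marker whose genotype differs from both of its nearest typed neighbors when those neighbors agree and lie within a short span. The function name, the 10 cM default, and the data are hypothetical.

```python
def tight_double_crossovers(genotypes, positions_cM, max_span_cM=10.0):
    """Indices of markers that imply a double crossover within max_span_cM.
    Genotypes are coded 1 = BB, 0 = AB, None = missing, for one mouse on one chromosome."""
    typed = [(j, g) for j, g in enumerate(genotypes) if g is not None]   # skip missing genotypes
    flagged = []
    for (jl, gl), (jm, gm), (jr, gr) in zip(typed, typed[1:], typed[2:]):
        flanks_agree = gl == gr
        middle_differs = gm != gl
        close_together = positions_cM[jr] - positions_cM[jl] <= max_span_cM
        if flanks_agree and middle_differs and close_together:
            flagged.append(jm)
    return flagged

genotypes = [1, 1, 0, 1, 1, None, 0, 0]
positions = [0, 3, 6, 9, 12, 15, 40, 60]
print(tight_double_crossovers(genotypes, positions))   # [2]
```

A flagged genotype is a candidate for re-typing rather than proof of an error, since genuine double crossovers in a short interval are rare but possible.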

[Figure: Chromosome 5. Position (cM) by individual.]

[Figure: Genotyping error LOD scores. Individuals by markers.]

Selective genotyping

• Save effort by only typing the most informative individuals (say, top & bottom 10%).
• Useful in context of a single, inexpensive trait.
• Tricky to estimate the effects of QTLs: use IM with all phenotypes.
• Can't get at interactions.
• Likely better to also genotype some random portion of the rest of the individuals.
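Choosing which individuals to genotype is a matter of ranking the phenotypes, as in the short sketch below. The function name and the phenotype values are made up, and the sketch assumes the two tails do not overlap (that is, `fraction` is below 0.5).

```python
def select_extremes(phenotypes, fraction=0.10):
    """Indices of the individuals in the bottom and top `fraction` of the phenotype distribution."""
    n = len(phenotypes)
    k = max(1, round(n * fraction))                      # number of individuals in each tail
    order = sorted(range(n), key=lambda i: phenotypes[i])
    return sorted(order[:k] + order[-k:])

phenotypes = [61, 47, 55, 72, 58, 50, 66, 43, 59, 53,
              78, 57, 49, 62, 56, 69, 52, 60, 45, 64]   # made-up phenotypes for 20 mice
print(select_extremes(phenotypes))                       # indices of the 2 lowest and 2 highest
```

Phenotypes of the ungenotyped individuals should still be kept for the analysis, since the slide recommends interval mapping with all phenotypes when estimating QTL effects.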

[Figure: phenotype (40 to 80) by genotype (BB, AB). Panels: all genotypes; top and bottom 10%.]

Covariates

• Examples: treatment, sex, litter, lab, age.
• Control residual variation.
• Avoid confounding.
• Look for QTL × environ't interactions.
• Adjust before interval mapping (IM) versus adjust within IM.

[Figure: LOD (0 to 4) versus map position (0 to 120 cM) on chromosome 7. Panels: intercross, split by sex; reverse intercross.]

Non-normal traits

• Standard interval mapping assumes normally distributed residual variation. (Thus the phenotype distribution is a mixture of normals.)
• In reality: we see dichotomous traits, counts, skewed distributions, outliers, and all sorts of odd things.
• Interval mapping, with LOD thresholds derived from permutation tests, generally performs just fine anyway.
• Alternatives to consider:
  – Nonparametric approaches (Kruglyak & Lander 1995)
  – Transformations (e.g., log, square root)
  – Specially-tailored models (e.g., a generalized linear model, the Cox proportional hazard model, and the model in Broman et al. 2000)

Summary I

• ANOVA at marker loci (aka marker regression) is simple and easily extended to include covariates or accommodate complex models.
• Interval mapping improves on ANOVA by allowing inference of QTLs to positions between markers and taking proper account of missing genotype data.
• ANOVA and IM consider only single-QTL models. Multiple QTL methods allow the better separation of linked QTLs and are necessary for the investigation of epistasis.
• Statistical significance of LOD peaks requires consideration of the maximum LOD score, genome-wide, under the null hypothesis of no QTLs. Permutation tests are extremely useful for this.
• 1.5-LOD support intervals indicate the plausible location of a QTL. A plot of the LOD curve, re-centered so that its maximum is at 0, is a valuable tool for depicting evidence for QTL location.
• Once you've achieved a 10 cM marker spacing, more mice will probably be more important than more markers. But this depends on the number of mice, the size of the QTL effect, and luck.

Summary II

• Estimates of QTL effects are subject to selection bias. Such estimated effects are often too large.
• Study your data. Look for errors in the genetic map, genotyping errors and phenotype outliers. But don't worry about them too much.
• Selective genotyping can save you time and money, but proceed with caution.
• Study your data. The consideration of covariates may reveal extremely interesting phenomena.
• Interval mapping works reasonably well even with non-normal traits. But consider transformations or specially-tailored models. If interval mapping software is not available for your preferred model, start with some version of ANOVA.