resemblance between relaves - university of washington

230
Resemblance between rela.ves Mike Goddard

Upload: others

Post on 16-Oct-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Resemblance between relaves - University of Washington

Resemblancebetweenrela.ves

MikeGoddard

Page 2: Resemblance between relaves - University of Washington

Whatdowemeanbyresemble?

Similarvaluesofquan.ta.vetraitsMeasurebycorrela.on

=Covariance(yi,yj)/variance(y)

Page 3: Resemblance between relaves - University of Washington

Whydorela.vesresembleeachother?

Page 4: Resemblance between relaves - University of Washington

Whydorela.vesresembleeachother?

Similar

GenesFamilyenvironmentCountrySchool

Page 5: Resemblance between relaves - University of Washington

ModelphenotypePhenotype=gene.ceffect

+country +yearofbirth +familyenvironment

FixedeffectsCountry,yearofbirth

Randomeffects

Gene.ceffect,familyenvironmentWeneedamodelofthecovariancesbetweenterms

Page 6: Resemblance between relaves - University of Washington

ModelphenotypePhenotype=gene.ceffect

+country +yearofbirth +familyenvironment +individualenvironment

V(phenotype)=V(gene.ceffects)+V(familyenvironment)

+V(individualenvironment)Cov(phenotypei,phenotypej)=Cov(gene.ceffects)

+Cov(familyenvironments)

Page 7: Resemblance between relaves - University of Washington

ModelphenotypeRandomeffects

Gene.ceffect,familyenvironmentWeneedamodelofthecovariancesbetweenterms

C(familyenvironments)= 0ifdifferentfamilies

1*VCEifsamefamily

Page 8: Resemblance between relaves - University of Washington

Covariancebetweengene.ceffectsofrela.ves

Modelwith1gene,2allelesandaddi.vegeneac.onWeneedgene.cvariancesandcovariancesGenotype BB Bb bbEffect a 0 -aFrequency p2 2pq q2 (p+q=1)Mean=a*p2 +0*2pq–a*q2=(p-q)*aVariance(gene.ceffect)=gene.cvariance=VG

=E(effect2)–E(effect)2

VG =a2*p2+0*2pq+a2*q2–[(p-q)*a]2=2pqa2

Page 9: Resemblance between relaves - University of Washington

Modelwith1gene,2allelesandaddi.vegeneac.on

CovariancebetweenparentandoffspringParent OffspringGenotype frequency BB Bb bb meanBB a p2 p q paBb 0 2pq 0.5p 0.5 0.5q 0.5(p-q)abb -a q2 p q -qaCov(parentgene.cvalue,offspringgene.cvalue)

=p2*a*pa+q2*(-a)*(-qa)–[(p-q)a]*[(p-q)a]=pqa2=0.5VG

Page 10: Resemblance between relaves - University of Washington

Modelwith1gene,2allelesandaddi.vegeneac.on

Covariancebetweenparentandoffspring(anotherway)Modelgene.cvalueassumofgame.ceffectsfrommotherandfatherg=xm+xfV(g)=V(xm)+V(xf)=2V(x)C(gp,go)=C(xmp+xfp,xmo+xfo)

=C(xmp,xmo)+C(xmp,xfo)+C(xfp,xmo)+C(xfp,xfo)=0 +? +0 +?

C(xmp,xfo) =V(x)ifxmpisibdtoxfo =0otherwise

C(xmp,xfo)=C(xfp,xfo)=0.5V(x)C(gp,go)=0+0.5V(x)+0+0.5V(x)=V(x)=0.5VG

Page 11: Resemblance between relaves - University of Washington

Probabilitythatrela.vesshareallelesIBD

Covariancebetweenrela.vesdependsonprobabilitythattheirallelesareIBDThisprobabilitycanbecalculatedfrompedigreesAssumethatbaseindividualsatthetopofthepedigree(iethosewithoutapedigree)haveunrelatedallelesietheindividualsareunrelatedRecurrenceformulaeforP(IBD)ifiandjarebaseindividuals,P(x.i≡x.j)=0Otherwise,P(x.i≡x_)=0.5[P(x.i≡x`)+P(x.i≡xmk)]wherekisthefatherofj

Page 12: Resemblance between relaves - University of Washington

Probabilitythatrela.vesshareallelesIBD

k(mk,`)i(mi,fi) j(mj,_)

Page 13: Resemblance between relaves - University of Washington

Rela.onshipsbetweenindividualsP(gametesareIBD)canbestoredinagame.crela.onshipmatrixG(wi,zj)=P(wi≡zj)ButusuallyweanalysemeasurementsondiploidindividualsC(gi,gj)=A(i,j)VG=[G(mi,mj)+G(mi,_)+G(fi,mj)+G(fi,_)]V(x)

=[G(mi,mj)+G(mi,_)+G(fi,mj)+G(fi,_)]VG/2A(i,j)=[G(mi,mj)+G(mi,_)+G(fi,mj)+G(fi,_)]/2whereAisthenumeratorrela.onshipmatrix

Page 14: Resemblance between relaves - University of Washington

Rela.onshipsbetweenindividualsExample:Rela.onshipofindividualwithherselfGame.crela.onshipmatrix

mi fimi 1 0fi 0 1Numeratorrela.onshipA(i,i)=[1+0+0+1]/2=1

Page 15: Resemblance between relaves - University of Washington

Rela.onshipsbetweenindividualsExample:Rela.onshipofsistersGame.crela.onshipmatrix

mi fimj 0.5 0_ 0 0.5Numeratorrela.onshipA(i,j)=[0.5+0+0+0.5]/2=0.5

Page 16: Resemblance between relaves - University of Washington

Rela.onshipsbetweenindividualsi=(im,if)andj=(jm,jf)Co-ancestryofiandj

=Inbreedingco-efficientofanoffspringofiandj

=Prob(offspringgetstwoallelesthatareIBD)=(P(im≡jm)+P(im≡jf)+P(if≡jm)+P(if≡jf))/4=A(i,j)/2

Addi.verela.onship(NRM)=2*co-ancestry

=2*kinship

Page 17: Resemblance between relaves - University of Washington

Es.ma.nggene.cvariance

Dataonphenotypes(y)ofrelatedsubjectsy=fixedeffects+g+eV(g)=AVG

V(e)=IVE

UseMLorREMLtoes.matevariances

Page 18: Resemblance between relaves - University of Washington

Es.ma.nggene.cvarianceUseMLorREMLtoes.matevariances

MLfindsthevalueofVGthatmaximisestheprobabilityofobservingthedataMLes.matesallparameterstogether

=es.matesvariancesassumingthatfixedeffectshavebeenes.matedwithouterror

REMLallowsforlossofdfines.ma.ngfixedeffectsMLσ2=Σ(y-mean)2/NREMLσ2=Σ(y-mean)2/(N-1)LikledifferenceunlessmanyfixedeffectsUseREMLcomputerprogramssuchasASREML

Page 19: Resemblance between relaves - University of Washington

Es.ma.nggene.cvarianceExample:Dataonphenotypes(y)offullsibsy=fixedeffects=g+eCov(gi,gj)=A(i,j)VG=0.5VGifiandjaresibsThereforees.mateVGby2cov(full-sibs)

h2by2correla.onbetweenfull-sibsWhatisthecovariancebetweentwins?

Page 20: Resemblance between relaves - University of Washington

Modelwithdominance

Page 21: Resemblance between relaves - University of Washington

Covariancebetweengene.ceffectsofrela.ves

Modelwith1gene,2allelesandaddi.veanddominantgeneac.onWeneedgene.cvariancesandcovariancesGenotype BB Bb bbEffect a d -aFrequency p2 2pq q2 (p+q=1)Mean=a*p2 +d*2pq–a*q2 =(p-q)*a+2pqdVariance(gene.ceffect)=gene.cvariance=VG

=E(effect2)–E(effect)2

VG =a2*p2+d2*2pq+a2*q2–[(p-q)*a+2pqd]2=2pqα2+(2pqd)2

whereα=a+(q-p)d

Page 22: Resemblance between relaves - University of Washington

Covariancebetweengene.ceffectsofrela.ves

Modelwith1gene,2allelesandaddi.veanddominantgeneac.onbutthecovariancebetweenrela.vesdoesn’tdependdirectlyonVG.WeneedtodecomposeVGintoanaddi.veanddominancevariance.Parameterisethegene.cvalueas

g=mean+addi.veeffect+dominancedevia.ong=mean+paternalalleleeffect+maternalalleleeffect+interac.onofalleles

Genotype BB Bb bbEffect a d -aFrequency p2 2pq q2 (p+q=1)mean (p-q)a+2pqd (p-q)a+2pqd (p-q)a+2pqdaddi.ve 2qα (q-p)α -2pα α=a+(q-p)ddominancedev. -q2d 2pqd -p2dMean(addi.veeffect)=0,mean(dominancedevia.on)=0,cov(addi.veeffect,dominancedev)=0Gene.cvariance=VG =2pqα2 +(2pqd)2

=VA +VD

Page 23: Resemblance between relaves - University of Washington

Covariancebetweengene.ceffectsofrela.ves

Modelwith1gene,2allelesandaddi.veanddominantgeneac.onCov(gi,gj)=Cov(ai+di,aj+dj)=Cov(ai,aj)+cov(di,dj)

=A(i,j)VA+D(i,j)VD

D(i,j)=prob(iandjinheritthesamegenotypeIBD)EgD(i,j)=1forMZtwins,0.25forfull-sibs,0forparentandoffspring

Page 24: Resemblance between relaves - University of Washington

Covariancebetweengene.ceffectsofrela.ves

Modelwith1gene,2allelesandaddi.veanddominantgeneac.onRela.onships MZtwins full-sibs 1/2sibs P-OA 1 0.5 0.25 0.5D 1 0.25 0 0Thereforecanes.matebothVAandVDbyusingmul.plerela.onships

Page 25: Resemblance between relaves - University of Washington

Covariancebetweenenvironmentaleffectsofrela.ves

y=mean+gene.ceffect+commonenvironmenteffect+individualenvironmenteffecty=mean+g+ec+eModelwithacommonenvironmentaleffectwithinthesamefamilyCov(eci,ecj)=VCifiandjinsamefamily,zerootherwiseRela.onships MZtwins full-sibs 1/2sibs P-OA 1 0.5 0.25 0.5D 1 0.25 0 0Ecommon 1 1 ? ?

Page 26: Resemblance between relaves - University of Washington

Covariancebetweenrela.vesEs.ma.ngVA,VDandVC

Difficult!AssumeVD=0VA=2(cov(MZtwins)–cov(full-sibs))Rela.onships MZtwins full-sibs 1/2sibs P-OA 1 0.5 0.25 0.5D 1 0.25 0 0Ecommon 1 1 ? ?

Page 27: Resemblance between relaves - University of Washington

Covariancebetweenrela.vesCanaddepista.cinterac.onstomodelg=mean+addi.ve+dominace+epistasisegg=mean+a+d+aaCov(gi,gj)=A(i,j)VA+D(i,j)VD+A(i,j)2VAA

Rela.onships MZtwins full-sibs 1/2sibs P-OA 1 0.5 0.25 0.5D 1 0.25 0 0AxA 1 0.25 0.0625 0.25

Page 28: Resemblance between relaves - University of Washington

28

Page 29: Resemblance between relaves - University of Washington

Resemblancebetweenrela.ves(height)

29

y=0.747x+0.107R²=0.965

0

0.2

0.4

0.6

0.8

0.00 0.25 0.50 0.75 1.00

Phen

otypiccorrela/o

n

Gene/crela/onship

Pearson&LeeVirginiaFHMLiu

Page 30: Resemblance between relaves - University of Washington

Moredataonheight

30[MagnusJohannesson,DavidCesarini]

Datafrom~172,00018-yearoldbrotherpairs

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

MZtwins DZtwins Fullsibs Fullsibslivingapart

Halfsibs Halfsibslivingapart

Phen

otypiccorrela/o

nPhenotypiccorrela/on

Page 31: Resemblance between relaves - University of Washington

31

Averagecorrela.onsMZ 0.74DZ(samesex) 0.36DZ(oppositesex) 0.25

Page 32: Resemblance between relaves - University of Washington

32

Page 33: Resemblance between relaves - University of Washington

BMI

33[MagnusJohannesson,DavidCesarini]

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

MZtwins DZtwins Fullsibs Fullsibslivingapart

Halfsibs Halfsibslivingapart

Phen

otypiccorrela/o

n

Datafrom~172,00018-yearoldbrotherpairs

Page 34: Resemblance between relaves - University of Washington

SummaryResemblancebetweenrela.ves

Modelphenotypesbyfixedeffectsandrandomeffectsincludinggene.cvalue(addi.ve,dominance,epista.c)Modelcovarianceofgene.ceffectsbyrela.onshipes.matedfrompedigree(orSNPgenotypes)Es.mategene.cvariancebyREML

Page 35: Resemblance between relaves - University of Washington

Estimating genetic variation within families

Peter M. Visscher [email protected]

1

Page 36: Resemblance between relaves - University of Washington

Key concepts

1. There is variation in realised relationships given the expected value from the pedigree;

2. Variation in realised relationships can be captured with genetic markers;

3. Variation in realised relationships can be exploited to estimate genetic variation

2

Page 37: Resemblance between relaves - University of Washington

Genetic covariance between relatives

covG(yi,yj) = aijσA2 + dijσD

2 a = additive coefficient of relationship

= 2 * θ (= E(πa)) d = coefficient of fraternity

= Prob(2 alleles are IBD) = Δ = E(πd)

3

Page 38: Resemblance between relaves - University of Washington

Examples (no inbreeding)

Relatives a d MZ twins 1 1 Parent-offspring ½ 0 Fullsibs ½ ¼ Double first cousins ¼ 1/16

4

Page 39: Resemblance between relaves - University of Washington

Controversy/confounding: nature vs nurture

•  Is observed resemblance between relatives genetic or environmental? – MZ & DZ twins (shared environment) – Fullsibs (dominance & shared environment)

•  Estimation and statistical inference – Different models with many parameters may

fit data equally well

5

Page 40: Resemblance between relaves - University of Washington

Actual or realised genetic relationship

= proportion of genome shared IBD (πa)

•  Varies around the expectation – Apart from parent-offspring and MZ twins

•  Can be estimated using marker data

6

Page 41: Resemblance between relaves - University of Washington

x

1/4 1/4 1/4 1/4 7

Page 42: Resemblance between relaves - University of Washington

IDENTITY BY DESCENT

Sib 1

Sib 2

4/16 = 1/4 sibs share BOTH parental alleles IBD = 2

8/16 = 1/2 sibs share ONE parental allele IBD = 1

4/16 = 1/4 sibs share NO parental alleles IBD = 0 8

Page 43: Resemblance between relaves - University of Washington

Single locus

Relatives E(πa) var(πa) Fullsibs ½ 1/8 Halfsibs ¼ 1/16 Double 1st cousins ¼ 3/32

9

Page 44: Resemblance between relaves - University of Washington

Several notations

IBD Probability Actual IBD0 k0 0 or 1 IBD1 k1 0 or 1 IBD2 k2 0 or 1

Σ=1 Σ=1 πa = ½k1 + k2 = R = 2θ πd = k2 = Δxy

10 [e.g., LW Chapter 7; Weir and Hill 2011, Genetics Research]

Realisations k0 k1 k2 1 0 0 0 1 0 0 0 1

Page 45: Resemblance between relaves - University of Washington

n multiple unlinked loci

Relatives E(πa) var(πa) Fullsibs ½ 1/8n

Halfsibs ¼ 1/16n Double 1st cousins ¼ 3/32n

11

Page 46: Resemblance between relaves - University of Washington

Loci are on chromosomes

•  Segregation of large chromosome segments within families –  increasing variance of IBD sharing

•  Independent segregation of chromosomes – decreasing variance of IBD sharing

12

Page 47: Resemblance between relaves - University of Washington

Theoretical SD of πa Relatives 1 chrom (1 M) genome (35 M) Fullsibs 0.217 0.038 Halfsibs 0.154 0.027 Double 1st cousins 0.173 0.030

[Stam 1980; Hill 1993; Guo 1996; Hill & Weir 2011] 13

Page 48: Resemblance between relaves - University of Washington

Fullsibs: genome-wide (Total length L Morgan)

var(πa) ≈ 1/(16L) – 1/(3L2) var(πd) ≈ 5/(64L) – 1/(3L2) var(πd)/ var(πa) ≈ 1.3 if L = 35

[Stam 1980; Hill 1993; Guo 1996]

Genome-wide variance depends more on total genome length than on the number of chromosomes

14

Page 49: Resemblance between relaves - University of Washington

Fullsibs: Correlation additive and dominance relationships

r(πa, πd) = σ(πa) / σ(πd) ≈ [1/(16L) / (5/(64L))]0.5 = 0.89.

Using β(πa on πd) = 1 Difficult but not impossible to disentangle additive and dominance variance NB Practical 15

Page 50: Resemblance between relaves - University of Washington

Summary Additive and dominance (fullsibs)

SD(πa) SD(πd)

Single locus 0.354 0.433 One chromsome (1M) 0.217 0.247 Whole genome (35M) 0.038 0.043 Predicted correlation 0.89 (genome-wide πa and πd)

16

Page 51: Resemblance between relaves - University of Washington

Estimating IBD from marker data •  Elston-Stewart algorithm Handles large pedigrees, but small nr of loci, exact IBD

distributions (Elston and Stewart, 1971)

•  Lander-Green algorithm Handles small pedigrees, but large nr of loci, exact IBD

distributions (Lander and Green, 1987). Software: Merlin

•  MCMC methods Calculates approximate IBD distributions (Heath, 1997). Software:

Loki •  Average sharing methods. Calculates approximate IBD distributions (Fulker et al., 1995; Almasy

and Blangero, 1998). Software: SOLAR

17

Page 52: Resemblance between relaves - University of Washington

Estimating π when marker is not fully informative

•  Using: – Mendelian segregation rules – Marker allele frequencies in the population

18

Page 53: Resemblance between relaves - University of Washington

IBD can be trivial…

1

1 1

1

/ 2 2 /

2 / 2 /

IBD=0 19

Page 54: Resemblance between relaves - University of Washington

Two Other Simple Cases…

1

1 1

1

/

2 / 2 /

1 1 /

1 1 2 / 2 /

IBD=2

2 2 / 2 2 / 20

Page 55: Resemblance between relaves - University of Washington

A little more complicated…

1 2 /

IBD=1 (50% chance)

2 2 /

1 2 / 1 2 /

IBD=2 (50% chance)

21

Page 56: Resemblance between relaves - University of Washington

And even more complicated…

1 1 / IBD=? 1 1 / 22

Page 57: Resemblance between relaves - University of Washington

Bayes Theorem for IBD Probabilities

∑ ==

===

===

===

jjIBDGPjIBDP

iIBDGPiIBDPGP

iIBDGPiIBDPGP

GiGiIBDP

)|()()|()(

)()|()(

)(), P(IBD)|(

prior

Prob(data)

posterior

23

Page 58: Resemblance between relaves - University of Washington

P(Marker Genotype|IBD State)

IBD Sib CoSib 0 1 2 (a,b) (c,d) papbpcpd 0 0 (a,a) (b,c) pa

2pbpc 0 0 (a,a) (b,b) pa

2pb2 0 0

(a,b) (a,c) pa2pbpc papbpc 0

(a,a) (a,b) pa3pb pa

2pb 0 (a,b) (a,b) pa

2pb2 papb

2+pa2pb papb

(a,a) (a,a) pa4 pa

3 pa2

Prior Probability ¼ ½ ¼ [Assumes Hardy-Weinberg proportions of genotypes in the population]

24

Page 59: Resemblance between relaves - University of Washington

Worked Example

1 1 / 1 1 / 94

)(41

)|2(

94

)(21

)|1(

91

)(41

)|0(

649

41

21

41)(

41)2|(81)1|(161)0|(

5.0

21

3

41

21

31

41

21

31

41

1

1

===

===

===

=++=

===

===

===

=

GP

pGIBDP

GP

pGIBDP

GP

pGIBDP

pppGP

pIBDGP

pIBDGP

pIBDGP

p

32ˆ =π

25

Page 60: Resemblance between relaves - University of Washington

Application (1) Aim: estimate genetic variance from actual

relationships between fullsib pairs

•  Two cohorts of Australian twin families

Adolescent Adult Families 500 1512 Individuals 1201 3804 Sibpairs with genotypes 950 3451 Markers per individual 211-791 201-1717 Average marker spacing 6 cM 5 cM

26

Page 61: Resemblance between relaves - University of Washington

Application (1)

•  Phenotype = height Number of sibpairs with phenotypes and genotypes

Adolescent cohort 931 Adult cohort 2444 Combined 3375

27

Page 62: Resemblance between relaves - University of Washington

Mean IBD sharing across the genome for the jth sib pair was based on IBD estimated every centimorgan and

averaged over 3500 points (L = 35)

π a( j ) = π a(ij )i=1

3500

∑ / 3500

π d ( j ) = p2(ij )i=1

3500

∑ / 3500

additive

dominance

28

Page 63: Resemblance between relaves - University of Washington

And for the cth chromosome of length lc cM

c

l

i

cc lc

ijaja/ˆˆ

1)()( ∑

=

= ππ

c

l

iij

c lpc

jd/ˆ

1)(2)( ∑

=

additive

dominance

29

Page 64: Resemblance between relaves - University of Washington

Mean and SD of genome-wide additive relationships

30

Page 65: Resemblance between relaves - University of Washington

Mean and SD of genome-wide dominance relationships

31

Page 66: Resemblance between relaves - University of Washington

Empirical and theoretical SD of additive relationships correlation = 0.98 (n = 4401)

32

Page 67: Resemblance between relaves - University of Washington

Empirical and theoretical SD of dominance relationships correlation = 0.98 (n = 4401)

33

Page 68: Resemblance between relaves - University of Washington

Additive and dominance relationships correlation = 0.91 (n= 4401)

34

Page 69: Resemblance between relaves - University of Washington

Phenotypes

After adjustment for sex and age: σp = 7.7 cm σp = 6.9 cm 35

Page 70: Resemblance between relaves - University of Washington

Phenotypic correlation between siblings

Raw After age & sex Adolescents 0.33 0.40 Adults 0.24 0.39

36

Page 71: Resemblance between relaves - University of Washington

Models

C = Family effect A = Genome-wide additive genetic E = Residual Full model C + A + E Reduced model C + E

37

yij = µ + ci + aij + eijvar(y) =σ c

2 +σ a2 +σ e

2

cov(yij, yik ) =σ c2 +π a( jk )σ a

2

Page 72: Resemblance between relaves - University of Washington

Estimation

•  Maximum Likelihood variance components

•  Likelihood-ratio-test (LRT) to calculate P-values for hypotheses H0: A = 0 H1: A > 0

38

Page 73: Resemblance between relaves - University of Washington

Estimates: null model (CE)

Cohort Family effect (C) Adolescent 0.40 (0.34 – 0.45) Adult 0.39 (0.36 – 0.43) Combined 0.39 (0.36 – 0.42)

39

Page 74: Resemblance between relaves - University of Washington

Estimates: full model (ACE)

Cohort C A P Adolescent 0 0.80 0.0869 Adult 0 0.80 0.0009 Combined 0 0.80 0.0003 ►All family resemblance due to

additive genetic variation 40

Page 75: Resemblance between relaves - University of Washington

Sampling variances are large

Cohort A (95% CI) Adolescent 0.80 (0.00 – 0.90) Adult 0.80 (0.43 – 0.86) Combined 0.80 (0.46 – 0.85)

41

Page 76: Resemblance between relaves - University of Washington

Power and SE of estimates

•  True parameter (t = intra-class correlation) •  Sample size (n pairs) •  Variance in genome-wide IBD sharing (var(π))

NCP = nh4var(π)(1+t2) / (1-t2)2

[ ]))var()(1(/)1()ˆvar( 2222 πntth +−≈

42

Page 77: Resemblance between relaves - University of Washington

•  Aims – Estimate genetic variance from genome-wide

IBD in larger sample – Partition genetic variance to individual

chromosomes •  using chromosome-wide coefficients of relationship

– Test hypotheses about the distribution of genetic variance in the genome

Application (2) Genome partitioning of additive

genetic variance for height

43

Page 78: Resemblance between relaves - University of Washington

Sample # Sibpairs Sib Correlation AU 5952 0.43 US 3996 0.50 NL 1266 0.45 Total 11,214 0.46

44

Page 79: Resemblance between relaves - University of Washington

Realised relationships Mean 0.499 Range 0.31 – 0.64 SD 0.036

45

Page 80: Resemblance between relaves - University of Washington

46

Page 81: Resemblance between relaves - University of Washington

222120

19

18

17

16

15

14

13

12

11 10

9

8

7

6

5

4

3

2

1

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

50 100 150 200 250 300

Length of chromosome (cM)

Estim

ate

of h

erita

bilit

y

WLS analysis: P<0.001; intercept NS

Longer chromosomes explain more additive genetic variance: ~0.03 per 100 cM

47

Page 82: Resemblance between relaves - University of Washington

Application (3)

•  Using SNP data to estimate IBD •  Data from ~20,000 fullsib pairs •  Height and BMI

48

Page 83: Resemblance between relaves - University of Washington

IBD

π = p(IBD1) 2 + p(IBD2)

Frequency

0.35 0.40 0.45 0.50 0.55 0.60 0.65

0200

400

600

800

1000

μ=0.500σ=0.037

49

Heritabilityes6matesfrom~20,000fullsibpairs:Height 0.7(SE0.14)BMI 0.4(SE0.17)

Gene6cvaria6onwithinfamiliesusingSNPdata

n=1507 n=1819 n=2722 n=4607 n=9585 n=20240 Meta n=20240

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.13

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 0.96

●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.12

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.14

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 0.97

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.29

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 0.93

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.58

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.42

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.76

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.3

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 2.14

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 1.27

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

GC = 2.08

0

2

4

6

0

2

4

6

BMI

HT

0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5Expected

Obs

erve

d

Page 84: Resemblance between relaves - University of Washington

Conclusions

•  Empirical variation in genome-wide IBD sharing follows theoretical predictions

•  Genetic variance can be estimated from genome-wide IBD within families –  results for height consistent with estimates from

between-relative comparisons –  no assumptions about nature/nurture causes of family

resemblance •  Genetic variance can be partitioned onto

chromosomes 50

Page 85: Resemblance between relaves - University of Washington

Key concepts

1. There is variation in realised relationships given the expected value from the pedigree;

2. Variation in realised relationships can be captured with genetic markers;

3. Variation in realised relationships can be exploited to estimate genetic variation

51

Page 86: Resemblance between relaves - University of Washington

Es#ma#ngrela#onshipfrommarkergenotypes

MikeGoddard

Page 87: Resemblance between relaves - University of Washington

Rela#onships

Weuserela#onshipdatatoes#mategene#cvariancetoes#matedemographichistory…

Page 88: Resemblance between relaves - University of Washington

Rela#onships

Addi#vegene#crela#onshipG(i,j)=propor#onofthegenomeiniandjthat isIBD

Pedigreerela#onshipA(i,j)=Prob(IBD)

=E(G(i,j))Actualrela#onshipdeviatesrandomlyfromthisexpecta#on

Page 89: Resemblance between relaves - University of Washington

Rela#onshipsSinglelocuscase,fullsibsParentsA1A2 x A3A4offspring A1A3

A1A4 A2A3 A2A4

Pairsofsibsshare0alleles 25%ofthe#me1allele 50%2alleles 25%

E(G)=A=0.5butGvariesfrom0to1

Page 90: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersGisamoreaccuratedescrip#onofrela#onshipthanA

Gcapturesunknownpedigreeinforma#onpedigreecanbeincorrectGcapturesdevia#onsfromA

Therefore,canuseGin

Randomsampleofpopula#on(“unrelatedindividuals”)Individualswithsamepedigree

Page 91: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers1.  Welldefined(recent)base2.  Nowelldefinedbase3.  Welldefined,recentbaseEgDataonfamiliesoffull-sibsandparentsofsibsarethebase

Page 92: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersEgDataonfamiliesoffull-sibsandparentsofsibsarethebaseConsiderasingleSNPFullsibscanbeIBDateithermaternalorpaternalallele

IBDstatus P(IBDstatus)Maternal Paternalyes yes 0.25yes no 0.25no yes 0.25no no 0.25

Page 93: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersEgDataonfamiliesoffull-sibsandparentsofsibsarethebaseAtthisSNP,onesibhasgenotypeAAandtheotherisAB,mother=AB,father=AAP(IBDstatus|SNPgenotypes)

=P(SNPgenotypes|IBDstatus)*P(IBDstatus) P(SNPgenotypes)

=P(SNPgenotypes|IBDstatus)*P(IBDstatus)ΣP(SNPgenotypes|IBDstatus)*P(IBDstatus)

Page 94: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers=P(SNPgenotypes|IBDstatus)*P(IBDstatus)ΣP(SNPgenotypes|IBDstatus)*P(IBDstatus)

IBDstatus P(IBDstatus)P(genotypes|IBDstatus) P(IBDstatus|genotypes)Maternal Paternal Gyes yes 0.25 0 0 1yes no 0.25 0 0 0.5no yes 0.25 1 0.5 0.5no no 0.25 1 0.5 0ΣP(SNPgenotypes|IBDstatus)*P(IBDstatus)=0.5E(G)=0.25comparedwithA=0.5

Page 95: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers1.  Welldefined,recentbaseEgDataonfamiliesoffull-sibsandparentsofsibsarethebasea)CalculateBayesianprobabilityofIBDstatusateachSNP

àE(G)ateachSNPaverageoverSNPs

b)Usehaplotypes?

Page 96: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers2.Lesswelldefined,lessrecentbaseEgDataoncurrentpopula#on,base=ancestors1000yearsagoandallelefrequenciesinbaseareknown(pandq)ConsiderhaploidgametesofSNPallelesinsteadofgenotypesWhatfrac#onofthegametesareIBD(G)?AtasingleSNP,thereare3possibledatasetsandtheirprobabili#esareAandA AandB BandBp2+pqG 2pq(1-G) q2+pqG

Page 97: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersSNPgenotypes AandA AandB BandBProbability p2+pqG 2pq(1-G) q2+pqGscore(x) q/p -1 p/qEs#mateG(i,j)fromthemeanvalueofxoverSNPsThisisarela#onshipbetweengametes.CalculateGforindividualsfromthe4game#crela#onships.SeeYangetal(2010)andPowelletal(2010)forthediploidformulae.

Page 98: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Score(x)forpairsofgametesfrompopula#oninH-Wp(A)=0.9,q(B)=0.1

A B (0.9) (0.1)

A(0.9) 0.11 -1B(0.1) -1 9MeanG=0.81*0.11+0.18*(-1)+0.01*9=0

Page 99: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Score(x)forpairsofgametesfrompopula#oninH-Wp(A)=0.9,q(B)=0.1AAAAAAAAAAAAAAAAAABBAandAorAandA BandB

Page 100: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Score(x)forpairsofgametesfromsameparentp(A)=0.9,q(B)=0.1Parent AA AB BBFreq. 0.81 0.18 0.01

AA(x=0.11) AA(0.11) BB(9) AB(-1) BB(9)

MeanG=0.81*0.11+0.18*(0.25*0.11+0.5*(-1)+0.25*9)+0.01*9

=0.5

Page 101: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Score(x)forpairsofgametesfrompopula#oninH-Wbutaierallelefrequencyhasdriiedtop(A)=0.8,q(B)=0.2

A B (0.8) (0.2)

A(0.8) 0.11 -1B(0.2) -1 9MeanG=0.64*0.11+0.32*(-1)+0.04*9=0.11

Page 102: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers2.NowelldefinedbaseEgrandomsamplefrompopula#onbutdon’tknowallelefrequencyinthebase.a)Usethecurrentpopula#onasthebaseProblem:SomeG<0Cannotinterpretasprobabili#esbuts#llinterpretascovariancesIfg=gene#cvalue,V(g)=GVA

whereGiscalculatedasabovebutusingallelefrequenciesincurrentpopula#on.

Page 103: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Score(x)forpairsofgametesfrompopula#oninH-Wbutaierallelefrequencyhasdriiedtop(A)=0.8,q(B)=0.2andusingallelefrequenciesinmodernpopula#on

A B (0.8) (0.2)

A(0.8) 0.25 -1B(0.2) -1 4MeanG=0.64*0.25+0.32*(-1)+0.04*4=0

Page 104: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers2.Nowelldefinedbaseb)AssumeSNPsarearandomsampleoflociasareQTLy=mean+g+ey=mean+Zu+eZij=0forAA,1forABor2forBBu~N(0,Iσu2)àg=Zu~N(0,ZZ’σu2),ZZ’σu2=Gσg2,ifσg2=Nσu2whereN=Σ2pqacrossSNPsTherefore,G=ZZ’/N

Page 105: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Scoreforpairsofgametesfrompopula#oninH-Wp(A)=0.8,q(B)=0.2

A B (0.8) (0.2)z 0 1

A(0.8) 0 0 0B(0.2) 1 0 1MeanG=0.04*1=0.04

Page 106: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Scoreforpairsofgametesfrompopula#oninH-Wp(A)=0.8,q(B)=0.2

A B (0.8) (0.2)z -0.2 0.8

A(0.8) -0.2 0.04 -0.16B(0.2) 0.8 -0.16 0.64MeanG=0.64*0.04+0.32*(-0.16)+0.04*0.64=0

Page 107: Resemblance between relaves - University of Washington

Comparing2aand2bE.g.p(A)=0.8,q(B)=0.2

2b 2a A B A B (0.8) (0.2)z -0.2 0.8

A(0.8) -0.2 0.04 -0.16 A 0.25 -1 B(0.2) 0.8 -0.16 0.64 B -1 4

Page 108: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers2aand2bcomparedforgame#crela#onshipsSNPdata AandA AandB BandBscore(x) q/p -1 p/qweight(w) pq pq pq2a)G=meanofx2b)G=weightedmeanofx=Σwx/ΣwThiscouldbedescribedasusingtheIBSstatusofSNPsinsteadofIBD

Page 109: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersE.g.Score(xi.e.method2a)forpairsofgametesp(A)=0.8,q(B)=0.2andweigh#ngbypq=0.16

A B (0.8) (0.2)

A(0.8) 0.25*0.16 -1*0.16

=0.04 =-0.16B(0.2) -1*0.16 4*0.16

=-0.16 =0.64Sameas2b

Page 110: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers2a)G=meanofx

givesmoreemphasistosharingrareallelesMakessensebecauseindividualswhosharerareallelesaremorelikelytobecloselyrelatedthanindividualswhosharecommonalleles.Givesminimumerrorvarianceofrela#onshipundersomecondi#ons

Page 111: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkers2.Nowelldefinedbasec)AssumeSNPsarearandomsampleoflociasareQTLbuteffectofSNPdecreasesasheterozygosityincreasesy=mean+g+ ey=mean+Zu+eZij=0forAA,1forABor2forBBu~N(0,Dσu2)àg=Zu~N(0,ZDZ’σu2),ZDZ’σu2=Gσg2,ifσg2=Nσu2whereN=Σ(piqi)Therefore,G=ZDZ’/NDii=1/(piqi)Thatis,assumetheeffectofSNPsispropor#onalto√(piqi)SovarianceexplainedbySNPsisnotaffectedbyallelefrequency2c=2a

Page 112: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersRela#onshipdependsonthemarkersorQTLEgQTLareduetorecentmuta#onsAQAQ AqMarkeristhesamebutQTLisdifferentRareSNPallelestendtobearecentmuta#onTherefore,treatSNPsdifferentlyaccordingtoMAF

Page 113: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersRela#onshipdependsonthemarkersorQTLTherefore,treatSNPsdifferentlyaccordingtoMAFy=mean+g1+g2+g3+g4+g5+eV(gi)=(ZZ’/N)σi2forSNPsinMAFbini

Page 114: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersUsehaplotypesofmarkersNewdefini#onofIBDforchromosomesegmentsTwosegmentsareIBDiftheycoalescewithoutrecombina#on

Avoidsdefini#onofabasepopula#onChromosomesegmenthomozygosity(CSH)

=P(2segmentsareIBD)E(csh)=1/(1+4Nec)

Page 115: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersProblem:cantobserveCSHdirectly

onlyobservehaplotypehomozygosity(HH) orrunsofhomozygosity(ROH)

Page 116: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersCanuseHHorROHinQTLmappingAddi#veeffectsCalculateP(QTLinposi#onxisIBD)=P(cshforsurroundingchr)EgP(QTLIBD)=0.9ifinmiddleof10iden#calmarkersRecessiveeffectsROHwithinindividualàhomozygousQTLwithintherun

Page 117: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersRecessiveeffectsROHwithinindividualàhomozygousQTLwithintherunDetectembryoniclethalsbymissingROH

Page 118: Resemblance between relaves - University of Washington

Es#materela#onshipfrommarkersSummary

1.  Infamilies

2.  Inthegeneralpopula#onExpressrela#onshiprela#vetocurrentpopula#on Gcanbenega#ve Gisnotaprobability V(g)=Gσg2

twoformulae(2aand2b)Sameexcept2agivesmoreweighttorarealleles

Page 119: Resemblance between relaves - University of Washington

(Genome-wide) association analysis

Peter M. Visscher [email protected]

1

Page 120: Resemblance between relaves - University of Washington

Key concepts

•  Mapping QTL by association relies on linkage disequilibrium in the population;

•  LD can be caused by close linkage between a QTL and marker (= good) or by confounding between a marker and other effects (= usually bad);

•  The power of QTL detection by LD depends on the proportion of phenotypic variance explained at a marker;

•  Mixed models are good for performing GWAS •  Genetic (co)variance can be estimated from GWAS

summary statistics

2

Page 121: Resemblance between relaves - University of Washington

Outline

•  Association vs linkage •  Linkage disequilibrium •  Analysis: single SNP

•  GWAS: design, power •  GWAS: analysis

3

Page 122: Resemblance between relaves - University of Washington

Linkage Association

Families Populations 4

Page 123: Resemblance between relaves - University of Washington

Linkage disequilibrium around an ancestral mutation

[Ardlie et al. 2002] 5

Page 124: Resemblance between relaves - University of Washington

LD

•  Non-random association between alleles at different loci

•  Many possible causes – mutation –  drift / inbreeding / founder effects –  population stratification –  selection

•  Broken down by recombination

6

Page 125: Resemblance between relaves - University of Washington

Definition of D

•  2 bi-allelic loci – Locus 1, alleles A & a, with freq. p and (1-p) – Locus 2, alleles B & b with freq. q and (1-q) – Haplotype frequencies pAB, pAb, paB, pab

D = pAB - pq

7

Page 126: Resemblance between relaves - University of Washington

r2

r2 = D2 / [pq(1-p)(1-q)]

•  Squared correlation between presence and absence of the alleles in the population

•  ‘Nice’ statistical properties

[Hill and Robertson 1968] 8

Page 127: Resemblance between relaves - University of Washington

Properties of r and r2

•  Population in ‘equilibrium’ E(r) = 0 E(r2) = var(r) ≈ 1/[1 + 4Nc] + 1/n

N = effective population size n = sample size (haplotypes) c = recombination rate

•  nr2 ~ χ(1)2 •  Human population is NOT in equilibrium

[Sved 1971; Weir and Hill 1980]

LD depends on population size and recombination distance

9

Page 128: Resemblance between relaves - University of Washington

Analysis

•  Single locus association •  GWAS

•  Least squares •  ML •  Bayesian methods

10

Page 129: Resemblance between relaves - University of Washington

Falconer model for single biallelic QTL

Var (X) = Regression Variance + Residual Variance = Additive Variance + Dominance Variance

bb Bb BB

m

-a

a d

11

Page 130: Resemblance between relaves - University of Washington

12

Page 131: Resemblance between relaves - University of Washington

Statistical power (linear regression)

y = µ + β*x + e, x = 0, 1, 2 σy

2 = σq2 + σe

2 regression + residual σx

2 = 2p(1-p) p = allele frequency for indicator x {HWE: note x is usually considered fixed in regression}

σq

2 = β2σx2 = [a + d(1-2p)]2 * 2p(1-p)

q2 = σq

2/ σy2 {QTL heritability}

13

Page 132: Resemblance between relaves - University of Washington

Statistical Power χ2 test with 1 df: E(X2) = 1 + n R2 / (1 – R2) = 1 + nq2/(1-q2) = 1 + NCP NCP = non-centrality parameter

Power of association proportional to q2 (Power of linkage proportional to q4) 14

Page 133: Resemblance between relaves - University of Washington

Statistical Power (R)

alpha= 5e-8 threshold= qchisq(1-alpha,1)

q2= 0.005

n= 10000

ncp= n*q2/(1-q2)

power= 1-pchisq(threshold,1,ncp)

threshold

ncp

power

> alpha= 5e-8 > threshold= qchisq(1-alpha,1) > q2= 0.005 > n= 10000 > ncp= n*q2/(1-q2) > power= 1-pchisq(threshold,1,ncp) > threshold [1] 29.71679 > ncp [1] 50.25126 > power [1] 0.9492371

15

Page 134: Resemblance between relaves - University of Washington

Power by association with SNP

(small effect; HWE) NCP(SNP) = n r2 q2 = r2 * NCP(causal variant) = n * {r2 q2} = n * (variance explained by SNP)

Power of LD mapping depends on the experimental sample size, variance explained by the causal variant and LD with a genotyped SNP 16

Page 135: Resemblance between relaves - University of Washington

GWAS •  Same principle as single locus association,

but additional information – QC

•  Duplications, sample swaps, contamination – Power of multi-locus data

•  Unbiased genome-wide association •  Relatedness •  Population structure •  Ancestry •  More powerful statistical analyses

17

Page 136: Resemblance between relaves - University of Washington

The multiple testing burden

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 5 10 15 20 25 30 35 40 45 50

Number of independent tests performed

P(at

leas

t 1 fa

lse

posi

tive)

per test false positive rate 0.05

per test false positive rate 0.001 = 0.05/50

P = 1 – (1-α)n

18

Page 137: Resemblance between relaves - University of Washington

Population stratification (association unlinked genes)

Allele frequency Haplotype frequency pA1 pB1 pA1B1 pA1B2 pA2B1 pA2B2 Pop. 1 0.9 0.9 0.81 0.09 0.09 0.01 Pop. 2 0.1 0.1 0.01 0.09 0.09 0.81 Average 0.5 0.5 0.41 0.09 0.09 0.41

Both populations are in linkage equilibrium; genes unlinked

Combined population: D = 0.16 and r2 = 0.41

19

Page 138: Resemblance between relaves - University of Washington

Population stratification (genes and phenotypes)

[Hamer & Sirota 2010 Mol Psych] 20

Page 139: Resemblance between relaves - University of Washington

Population stratification (genes and phenotypes)

[Hamer & Sirota 2010 Mol Psych] 21

Page 140: Resemblance between relaves - University of Washington

Population stratification (genes and phenotypes)

[Vilhjalmsson & Nordborg 2013 Nature Reviews Genetics ] 22

Page 141: Resemblance between relaves - University of Washington

23

Page 142: Resemblance between relaves - University of Washington

Stratification

y = Σgi + Σei r(y,gi) due to •  causal association with gi •  correlation gi and gj and causal association with gj

(LTC and height) •  correlation gi and environmental factor ej

(chopsticks)

24

Page 143: Resemblance between relaves - University of Washington

How to deal with structure?

•  Detect and discard ‘outliers’ •  Detect, analysis and adjustment

– E.g. genomic control •  Account for structure during analysis

– Fit a few principal components as covariates – Fit GRM

25

Page 144: Resemblance between relaves - University of Washington

GWAS using mixed linear models

y = Xb + β*x + g + e var(g) = Gσg

2 G = genetic relationship matrix (GRM) Model conditions on effects of all other variants Power depends on whether x is included (MLMi) or excluded (MLMe) from the construction of G. [Yang et al. 2014 Nature Genetics]

26

Page 145: Resemblance between relaves - University of Washington

GWAS using mixed linear models: statistical power

[Yang et al. 2014 Nature Genetics] 27

r2 here is the squared correlation between g-hat and g

Page 146: Resemblance between relaves - University of Washington

How does LD shape association A set of markers along a chromosome region:

Superimpose LD between markers

Consider causal SNPs

All markers correlated with a causal variant show association

28

Page 147: Resemblance between relaves - University of Washington

How does LD shape association Consider causal SNPs

All markers correlated with a causal variant show association. Lonely SNPs only show association if they are causal The more you tag the more likely you are to tag a causal variant Assuming all SNPs gave an equal probability of association given LD status, we expect to see more association for SNPs with more LD friends. This is a reasonable assumption under a polygenic genetic architecture

29

Page 148: Resemblance between relaves - University of Washington

LD score regression

[Yang 2011 EJHG; Bulik-Sullivan 2015]

Quantifies local LD for SNP j

à regression of test statistic on LD score provides an estimate of SNP heritability Use GWAS summary statistics and reference sample for LD score estimation

Test statistic is linear in LD score

30

Page 149: Resemblance between relaves - University of Washington

Same principle for genetic covariance

31 [Bulik-Sullivan 2015 Nature Genetics]

z = test statistics from GWAS summary statistics N = sample size M = number of markers ρg = genetic covariance between traits ρ = phenotypic correlation between traits

Ns is the number of overlapping samples

Page 150: Resemblance between relaves - University of Washington

Key concepts

•  Mapping QTL by association relies on linkage disequilibrium in the population;

•  LD can be caused by close linkage between a QTL and marker (= good) or by confounding between a marker and other effects (= usually bad);

•  The power of QTL detection by LD depends on the proportion of phenotypic variance explained at a marker;

•  Mixed models are good for performing GWAS •  Genetic (co)variance can be estimated from GWAS

summary statistics

32

Page 151: Resemblance between relaves - University of Washington

Es#ma#onofquan#ta#vegene#cparametersfromdistant

rela#vesusingmarkerdata

[email protected]

1

Page 152: Resemblance between relaves - University of Washington

Keyconcepts•  DenseSNPpanelsallowthees#ma#onoftheexpected

gene#ccovariancebetweendistantrela#ves(‘unrelateds’)•  Amodelbasedupones#matedrela#onshipsfromSNPsis

equivalenttoamodelfiLngallSNPssimultaneously•  Thetotalgene#cvarianceduetoLDbetweencommonSNPs

and(unknown)causalvariantscanbees#mated•  Gene#cvariancecapturedbycommonSNPscanbeassigned

tochromosomesandchromosomesegments

2

Page 153: Resemblance between relaves - University of Washington

1886

3

Page 154: Resemblance between relaves - University of Washington

4

Page 155: Resemblance between relaves - University of Washington

100yearslaterHeritabilityofhumanheight

h2 ~ 80% 5

Page 156: Resemblance between relaves - University of Washington

6

Page 157: Resemblance between relaves - University of Washington

Disease Number of loci

Percent of Heritability Measure Explained

Heritability Measure

Age-related macular degeneration

5 50% Sibling recurrence risk

Crohn’s disease 32 20% Genetic risk (liability)

Systemic lupus erythematosus

6 15% Sibling recurrence risk

Type 2 diabetes 18 6% Sibling recurrence risk

HDL cholesterol 7 5.2% Phenotypic variance

Height 40 5% Phenotypic variance

Early onset myocardial infarction

9 2.8% Phenotypic variance

Fasting glucose 4 1.5% Phenotypic variance

WhereistheDarkMaYer?

7

Page 158: Resemblance between relaves - University of Washington

Hypothesistes#ngvs.Es#ma#on

•  GWAS=hypothesistes#ng– Stringentp-valuethreshold– Es#matesofeffectsbiased(“Winner’sCurse”)

•  E(bhat|test(bhat)>T)>b{bfixed}•  var(bhat)=var(b)+var(bhat|b){brandom}

•  Canwees#matethetotalpropor#onofvaria#onaccountedforbyallSNPs?

8

Page 159: Resemblance between relaves - University of Washington

Areverydistantrela.vesthatsharemoreoftheirgenomebydescentphenotypicallymoresimilarthanthosethatshareless?

9

Page 160: Resemblance between relaves - University of Washington

Basicidea•  Es#matesofaddi#vegene#cvariancefromknownpedigreeisunbiased–  Ifmodeliscorrect–  Despitevaria#oniniden#tygiventhepedigree–  PedigreegivescorrectexpectedIBD

•  Unknownpedigree:es#mategenome-wideIBDfrommarkerdata–  Es#mateaddi#vegene#cvariancegiventhises#mateofrelatedness

•  Ideaisnotnew–  (Evolu#onary)gene#csliterature(Ritland,Lynch,Hill,others)

10

Page 161: Resemblance between relaves - University of Washington

Closevsdistantrela#ves

•  Detec#onofcloserela#ves(fullsibs,parent-offspring,halfsibs)frommarkerdataisrela#velystraighnorward

•  Butcloserela#vesmayshareenvironmentalfactors–  Biasedes#matesofgene#cvariance

•  Solu#on:useonly(very)distantrela#ves

11

Page 162: Resemblance between relaves - University of Washington

Amodelforasinglecausalvariant AA AB BB

frequency (1-p)2 2p(1-p) p2x 0 1 2effect 0 b 2bz=[x-E(x)]/σx -2p/√{2p(1-p)} (1-p)/√{2p(1-p)} 2(1-p)/√{2p(1-p)}yj = µ’+xijbi+ej x=0,1,2{standardassocia#onmodel}yj = µ+zijuj+ej u=bσx;µ=µ’+bσx

12

Page 163: Resemblance between relaves - University of Washington

Mul#ple(m)causalvariants

yj= µ+Σzijuj+ej= µ+gj+ej

y = µ1+g+e= µ1+Zu+e

13

Page 164: Resemblance between relaves - University of Washington

Equivalence

Letubearandomvariable,u~N(0,σu2)

Thenσg2=mσu

2andvar(y) =ZZ’σu

2+Iσe2

=ZZ’(σg2/m)+Iσe

2 =Gσg

2+Iσe2

14

Model with individual genome-wide additive values using relationships (G) at the causal variants is equivalent to a model fitting all causal variants We can estimate genetic variance just as if we would do using pedigree relationships

Page 165: Resemblance between relaves - University of Washington

Butwedon’thavethecausalvariantsIfwees#mateGfromSNPs:

–  loseinforma#onduetoimperfectLDbetweenSNPsandcausalvariants

–  howmuchwelosedependson•  densityofSNPs•  allelefrequencyspectrumofSNPsvs.causalvariants

– es#mateofvarianceàmissingheritability

15

LetAbethees#mateofGfromNSNPs:Ajk =(1/N)Σ{xij–2pi)(xik–2pi)/{2pi(1-pi)}

=(1/N)Σzijzik

Page 166: Resemblance between relaves - University of Washington

Data •  ~4000 ‘unrelated’ individuals •  Ancestry ~British Isles •  Measurement on height (self-report or clinically measured) •  GWAS on 300k (‘adults’) or 600k (16-year olds) SNPs

16

Page 167: Resemblance between relaves - University of Washington

Lack of evidence for population stratification within the Australian sample

17

Page 168: Resemblance between relaves - University of Washington

Methods

•  Estimate realised relationship matrix from SNPs

•  Estimate additive genetic variance 22)var( eg σσ IAVy +==

)1(2),cov(

)var()var(),cov(

ii

ikij

iikiij

iikiijijk pp

xxaxax

axaxA

−==

iii egy +=

Base population = current population

⎪⎪

⎪⎪

=−

++−+

≠−

−−

==

∑∑

kjpp

pxpxN

kjpp

pxpxN

AN

A

iii

iijiij

iii

iikiij

i ijkjk

,)1(22)21(11

,)1(2

)2)(2(11

22

18

Page 169: Resemblance between relaves - University of Washington

Statistical analysis

22)var( eg σσ IAVy +==

y standardised ~N(0,1) No fixed effects other than mean A estimated from SNPs Residual maximum likelihood (REML)

19

Page 170: Resemblance between relaves - University of Washington

h2 ~ 0.5 (SE 0.1)

20

Page 171: Resemblance between relaves - University of Washington

Checking for population structure

21

Page 172: Resemblance between relaves - University of Washington

Partitioning variation

•  If we can estimate the variance captured by SNPs genome-wide, we should be able to partition it and attribute variance to regions of the genome

•  “Population based linkage analysis”

22

Page 173: Resemblance between relaves - University of Washington

Genome partitioning

23

•  Partition additive genetic variance according to groups of SNPs –  Chromosomes –  Chromosome segments –  MAF bins –  Genic vs non-genic regions –  Etc.

•  Estimate genetic relationship matrix from SNP groups

•  Analyse phenotypes by fitting multiple relationship matrices

•  Linear model & REML (restricted maximum likelihood)

Page 174: Resemblance between relaves - University of Washington

Data from the GENEVA Consortium •  Investigators: Bruce Weir, Teri Manolio and many others •  Data

–  ~14,000 European Americans •  ARIC •  NHS •  HPFS

–  Affy 6.0 genotype data •  ~600,000 after stringent QC

–  Phenotypes on height, BMI, vWF and QT Interval

24

Page 175: Resemblance between relaves - University of Washington

QC of SNPs

•  780,062 SNPs after QC steps listed in the table.

•  Exclude 141,772 SNPs with MAF < 0.02 in European-ancestry group.

•  Exclude 36,949 SNPs with missingness > 2% in all samples.

•  Include autosomal SNPs only.

•  End up with 577,778 SNPs. 25

Page 176: Resemblance between relaves - University of Washington

Results (genome-wide)

26

Page 177: Resemblance between relaves - University of Washington

1

2

3

4

5

6

789

101112

13

14

15

16

17

1819

20

21

22

0

0.01

0.02

0.03

0.04

0.05

0 50 100 150 200 250

Varianceexplainedbyeachchromosom

e

Chromosomelength(Mb)

Slope=1.6×10-4

P =1.4×10-6

R2 =0.695

Height(combined)

12

34

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

2122

0

0.005

0.01

0.015

0.02

0.025

0 50 100 150 200 250

Varianceexplainedbyeachchromosom

eChromosomelength(Mb)

Slope=2.3×10-5

P =0.214R2 =0.076

BMI(combined)

27

Genome-partitioning: longer chromosomes explain more variation

Page 178: Resemblance between relaves - University of Washington

1

234

5

6

7

8

9

10

11

12

13 14

15

16

17

1819

20

21 22

R²=0.511

0.000

0.002

0.004

0.006

0.008

0.010

0.00 0.01 0.02 0.03 0.04 0.05

Varia

nceexplainedbyGIANTheightSNPson

eachch

romosom

e

Varianceexplainedbyeachchromosome

Height (11,586 unrelated)

1

2

34

5

6

7

8

9

10

11

12

13

14

1516

17

18

19

20

2122

0.000

0.004

0.008

0.012

0.016

0.020

0.000 0.004 0.008 0.012 0.016 0.020

Varia

nceexplainedbychrom

osom

e(adjustedfortheFTO

SNP)

Varianceexplainedbychromosome(noadjustment)

BMI(11,586unrelated)

FTO

28

Results are consistent with reported GWAS

Page 179: Resemblance between relaves - University of Washington

1

234

5

6

78

9

1011

12

13

14

151617

18

192021220.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0 50 100 150 200 250

Varianceexplainedbyeachchromosom

e

Chromosomelength(Mb)

Slope=6.9×10-5

P =0.524R2 =0.021

vWF(ARIC)

29

Inference robust with respect to genetic architecture

1

2

3

4

5

6

7

8

9

1011

12

13

14

151617

18

192021

22

0.00

0.03

0.06

0.09

0.12

0.15

0.00 0.03 0.06 0.09 0.12 0.15

Varia

nceexplainedbychrom

osom

e(adjustedfortheABO

SNP)

Varianceexplainedbychromosome(noadjustment)

vWF(6,662unrelated)

ABO

Page 180: Resemblance between relaves - University of Washington

0

0.01

0.02

0.03

0.04

0.05

0.06

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varianceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

Height(combined)17,277proteincoding geneshGg

2 =0.328(s.e.=0.024)hGi

2 =0.126(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=2.1x10-10

0

0.005

0.01

0.015

0.02

0.025

0.03

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varia

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

BMI(combined)17,277proteincoding geneshGg

2 =0.117(s.e.=0.023)hGi

2 =0.047(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=0.022

30

Genic regions explain variation disproportionately

Page 181: Resemblance between relaves - University of Washington

Key concepts •  Dense SNP panels allow the estimation of the expected

genetic covariance between distant relatives (‘unrelateds’)

•  A model based upon estimated relationships from SNPs is equivalent to a model fitting all SNPs simultaneously

•  The total genetic variance due to LD between common SNPs and (unknown) causal variants can be estimated

•  Genetic variance captured by common SNPs can be assigned to chromosomes and chromosome segments

31

Page 182: Resemblance between relaves - University of Washington

Predictionofquantitativetraitsusingmarkerdata

PeterM.Visscher&[email protected]

[email protected]

1

Page 183: Resemblance between relaves - University of Washington

Keyconcepts• Predictionofphenotypicvaluesislimitedbyheritability• Accuracyofpredictiondependson

– howwellmarkereffectsareestimated(samplesize)– howwellmarkereffectsarecorrelatedwithcausalvariants(LD)

• Estimationofmarkereffectsandpredictioninthesamedataleadsto(severe)bias– winner’scurse;over-fitting

• VarianceexplainedbyaSNP-basedpredictorisnotthesameasthevarianceexplainedbythoseSNPs

• Markerdatacapturesbothbetweenandwithinfamilygeneticvariation• Bestpredictionmethodstakegeneticvaluesasrandomeffects

2

Page 184: Resemblance between relaves - University of Washington

3“Genomicselection”=individualprediction inacommercialsetting

Page 185: Resemblance between relaves - University of Washington

Take-homefromanimalbreeding

(1) Don’tneedgenome-widesignificanteffects(2) Don’tneedtoknowcausalvariants(3) Don’tneedtoknowfunction(4) FitallSNPssimultaneously

4

Page 186: Resemblance between relaves - University of Washington

5

Page 187: Resemblance between relaves - University of Washington

Aquantitativegeneticsmodel

y=fixedeffects+G+EG=A+D+I

Possiblepredictions:• PredictyfromfixedeffectsandG• PredictGfromA• PredictyfromA• PredictyfromAusingmarkers

6

Page 188: Resemblance between relaves - University of Washington

Predictionusinglinearregression

y=β*x+e

• Usually,β andxareconsidered‘fixed’

• ForSNPs,xisrandomwithvariance2p(1-p)assumingHWE

• Laterwewillconsiderthecasewhereβ israndom

7

Page 189: Resemblance between relaves - University of Washington

Chanceassociation

m markers,samplesizeNAllβ =0Multiplelinearregressionofy onmmarkers

E(R2)=m/N {strictlym/(N-1)}

à Variation“explained”bychance

8[Wishart, 1931]

Page 190: Resemblance between relaves - University of Washington

Selectionbias• Selectm ‘best’markersoutofM intotal• ‘Prediction’insamesample(in-sampleprediction)

E(R2)>>m/Nà Lotsofvariationexplainedbychance

9~15 best markers selected from 2.5 million markers

Page 191: Resemblance between relaves - University of Washington

Leastsquaresprediction

10

Rm2 = var(a) / var(y) = h2

E(Ry, y2 ) ≈ h2 / [1+m / {Nh2}]

Even if we knew all m causal variants but needed to estimate their effect sizes then the variance explained by the predictor is less than the variance explained by the causal variants in the population.

[Daetwyler et al. 2008, PLoS Genetics; Visscher, Yang, Goddard 2010, Twin Research Human Genetics 2010]

Page 192: Resemblance between relaves - University of Washington

Take-home

(4)Estimationofvariancecontributedby(all)lociisnotthesameaspredictionaccuracy

unlesstheeffectsizesareestimatedwithouterror

11

Page 193: Resemblance between relaves - University of Washington

12

Page 194: Resemblance between relaves - University of Washington

Measuresofhowwellapredictorworks

• “Accuracy”(animalbreeding)– Correlationbetweentruegenome-widegeneticvalueanditspredictor

• R2 fromaregressionofoutcomeonpredictor(humangenetics)

• Area-under-curvefromROCanalyses(diseaseclassification)

13

Page 195: Resemblance between relaves - University of Washington

Limitsofprediction

• AperfectpredictorofAcanbealousypredictorofaphenotype

• TheregressionR2 hasamaximumthatdependsonheritability

• TheregressionR2 islimitedbyunknown(egfuture)fixedeffectsandcovariates

14

Page 196: Resemblance between relaves - University of Washington

Predictionsfromknownvariants

15

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.2 0.4 0.6 0.8 1

Precision

Proportionofheritabilityexplainedbygenomicprofile

r(A,genomicprofile)

R^2(y,genomicprofile),h2=0.4

R^2(y,genomicprofile),h2=0.8

Page 197: Resemblance between relaves - University of Washington

Prediction using genetic markers: using between and within-family genetic variation

16

Page 198: Resemblance between relaves - University of Washington

Inclassdemo

• 180heightvariantsfromLango-Allenetal.2010– Estimationofbfromdata(N~4000)

• NotethatE(R2)=180/4000=0.045bychance!

– UsingbfromLango-Allenpaper• Takingthetop180SNPsfromGWAS

17

Page 199: Resemblance between relaves - University of Washington

Analysisdemonstration

18

• Data:– Genotype data:3,924unrelated individuals and ~2.5MSNPs.– Phenotype data:height z-scores (adjusted for age and sex)– 180SNPsidentified by the GIANTmeta-analysis(MA)of height (n =~180,000)

• Analyses:– Estimating effect sizes of the 180height SNPsinthe data.– PLINKscoring:180GIANTSNPs,using effect sizes estimated from GIANTMA.– GWASanalysis inthe data,selecting topSNPsat 180lociand predicting the

phenotypes inthe samedata.

• Results:– Estimation:R2 =0.134(R2 =0.046by chance),adjusted R2 =0.093– Prediction:R2 =0.099– Prediction using the topSNPsselected inthe samedata:R2 =0.429

Page 200: Resemblance between relaves - University of Washington

Identifyingpeopleathighrisk:T1D

Per 10,000 people

40 casesRatio 1:250

32 cases in 1800 at most risk Ratio 1:56

Most disease is due to people most at risk

Polychronakos & Li NRG 2011Clayton PLoS Genetics 2009

Page 201: Resemblance between relaves - University of Washington

Predictionofgeneticvalueusingbetterpredictors

Modelwithadditiveinheritance

y=g+e

V(g)=Gσg2,V(e)=Iσe2,V(y)=V=Gσg2 +Iσe2,

AimistopredictgforindividualsEg topredictfutureriskofadisease

20

Page 202: Resemblance between relaves - University of Washington

Predictionofgeneticvalue

y=g+e

V(g)=Gσg2,V(e)=Iσe2,V(y)=V=Gσg2 +Iσe2,

Bestpredictionisg-hat=E(g|y)IfyandgarebivariatenormalE(g|y)=b’y =σg2 GV-1 y

21

Page 203: Resemblance between relaves - University of Washington

Predictionofgeneticvalue

Eg Unrelatedindividuals

V(g)=Ih2,V(e)=I(1-h2),V(y)=I,

Bestpredictionisg-hat=E(g|y)=b’y =σg2 GV-1 y=h2 y

22

Page 204: Resemblance between relaves - University of Washington

Predictionofgeneticvalue

y=g+e,g=Zu

V(u)=Iσu2,V(Zu)=ZZ’σu2,

Bestpredictionisu-hat=E(u|y)IfyanduaremultivariatenormalE(u|y)=b’y =σu2Z’V-1 y

23

Page 205: Resemblance between relaves - University of Washington

Predictionofgeneticvalue

y=g+e,g=Zu

V(u)=Iσu2,V(Zu)=ZZ’σu2,

u-hat=E(u|y)=b’y =σu2Z’V-1 yg-hat=Zu-hat=σu2ZZ’V-1 y=σg2GV-1 y

24

Page 206: Resemblance between relaves - University of Washington

Predictionofgeneticvalue

y=g+e,g=ZuIfyanduaremultivariatenormalE(u|y)=b’y =σu2Z’V-1 y

TheSNPeffectsareunlikelytobenormallydistributedwithequalvariance

25

Page 207: Resemblance between relaves - University of Washington

Best prediction

u-hat = E(u | y)

= ∫ u P(u | y) du

Bayes theoremP(u | y) = P(y | u) P(u) / P(data)

Likelihood prior

Predictionofgeneticvalue

26

Page 208: Resemblance between relaves - University of Washington

Bayesian estimation

E(u | y) = ∫ u P(y | u) P(u) / P(y) du

Distribution of SNP effects

Normal à BLUPt-distribution à Bayes AMixture à Bayes B (Meuwissen et al 2001)

Mixture of N à Bayes R (Erbe et al 2012)u ~ N(0,σi

2) with probability πiσi

2 = {0, 0.0001, 0.001,0.01} σg2

Accuracy is greatest if assumed distribution matchesreal distribution.

Predictionofgeneticvalue

27

Page 209: Resemblance between relaves - University of Washington

Prediction equations

28

Page 210: Resemblance between relaves - University of Washington

PredictionofgeneticvalueOthermethodsofprediction

EstimateeffectofeachSNPoneatatimeandaddg-hat=Zu-hatu-hatestimatedfromsingleSNPregression

BiasedE(g|g-hat)≠g-hatLessaccuratebecauseignoresLDbetweenSNPs

andtreatsuasfixedeffects

29

Page 211: Resemblance between relaves - University of Washington

Real data4500 bulls and 12000 cows (Holstein and Jersey)600,000 SNPs genotypedTrain using bulls born < 2005Test using bulls born >= 2005

Correlation of EBV and daughter averageProtein Stature Milk Fat%

BLUP 0.66 0.52 0.65 0.72Bayes R 0.66 0.54 0.68 0.82

Prediction of genetic value

30

Page 212: Resemblance between relaves - University of Washington

Genetic architecture

ProportionofSNPsfromdistributionwithvariance

Trait 0.01% 0.1% 1% polygenic(%)

RFI 7498 296 6 11LDPF 1419 254 36 27

Mean4029 271 19 2531

Page 213: Resemblance between relaves - University of Washington

Integrationofpredictionandmappingofcausalvariants

SameBayesianmodelsasusedforpredictioncanbeusedformappingcausalvariantsofcomplextraits

32

Page 214: Resemblance between relaves - University of Washington

Prediction equations

33

Page 215: Resemblance between relaves - University of Washington

Mapping QTL – Milk on BTA5

34

Page 216: Resemblance between relaves - University of Washington

Mapping QTL – Milk on BTA5

Page 217: Resemblance between relaves - University of Washington

Applicationtohumandiseasedata(WTCCC)

36

Page 218: Resemblance between relaves - University of Washington

Model• AssumestrueSNPeffectsarederivedfromaseriesofnormaldistributions• Priorassumptions

– EffectssizeofSNPk

– Mixingproportion,π• Dirichlet distribution,

– Geneticvariance• hyper-parameterestimatedfromdata, !!!~!!! !! , !!! !

! !! ,… ,!! !~!! !,… , ! ,!"#ℎ!! = 1!

−0.15 −0.10 −0.05 0.00 0.05 0.10 0.15

05

1015

20

Effect size

Den

sity

!!! =

!!!×!! 0, 0!×!!!!!!!×!! 0, 10!!×!!!!!!!×!! 0, 10!!!×!!!!!!!×!! 0, 10!!!×!!!!

!

Page 219: Resemblance between relaves - University of Washington

CAD HT T2D BD CD RA T1D

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

SNP−

base

d he

ritab

ility

MethodBayesRBSLMMLMM

CAD HT T2D BD CD RA T1D

●●

●●

●●●

0.5

0.6

0.7

0.8

0.9

Pred

ictio

n ac

cura

cy (A

UC

)

MethodBayesRBSLMMLMMGPRS

Figure 4. Comparison of performance of BayesR, BSLMM, LMM and GPRS in WTCCC data. (A) Estimates of SNP-based heritability on the observed scale. Antennas are standard deviations of posterior samples for BayesR and BSLMM or standard errors for LMM. GPRS does not provide estimates of heritability. (B) Distribution of the area under the curve (AUC). The single boxplots display the variation in estimates among 20 replicates. In each replicate, the data set was randomly split into a training sample containing 80% of individuals and a validation sample containing the remaining 20%.

Page 220: Resemblance between relaves - University of Washington

ExpectedproportionoftotalSNPvarianceexplainedbyeachmixture

●●

●●

●●

●●

0

25

50

75

100

T1D RA CD CAD T2D HT BD SCZDisease

Varia

nce

expl

aine

d (%

)

Mixturecomponent

10−4 x σ2g

10−3 x σ2g

10−2 x σ2g

(NumberofSNPsinclass× varianceassignedtoSNP)/sumofmarkervariance

Page 221: Resemblance between relaves - University of Washington

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●0

20

40

60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22Chromosome

Varia

nce

expl

aine

d (%

) T1D

● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●

● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●0

10

20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22Chromosome

Varia

nce

expl

aine

d (%

) RA

● ●

●● ● ●

● ●●

●● ●

●● ● ●

●●

● ●

● ●● ●

● ●

● ● ●

● ●● ● ● ●

●●

● ● ● ●

● ●

● ●

●● ● ●

● ● ● ● ●

●●

● ● ● ●0.0

2.5

5.0

7.5

10.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22Chromosome

Varia

nce

expl

aine

d (%

) CD

●●

●●

● ●

● ●

●● ●

● ● ●

● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●0

2

4

6

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22Chromosome

Varia

nce

expl

aine

d (%

) CAD

●●

●● ● ●

● ●

●● ●

● ● ●

● ●● ● ● ● ● ●

● ● ●● ● ● ● ● ● ●

● ●● ● ● ●● ● ● ● ● ● ● ● ●

● ● ● ● ●

● ● ● ● ● ●0

2

4

6

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22Chromosome

Varia

nce

expl

aine

d (%

) T2D

●●

●● ● ●

●●

●●

●● ●

● ●● ● ● ● ● ●● ● ●

● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●0

2

4

6

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22Chromosome

Varia

nce

expl

aine

d (%

) HT

●●

●● ● ●

● ●

●● ●

●●

● ●

● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●0.0

2.5

5.0

7.5

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22Chromosome

Varia

nce

expl

aine

d (%

) BD

Mixture component●

10−4 x σ2g

10−3 x σ2g

10−2 x σ2g

Figure 6. Proportion of genetic variance on each chromosome explained by SNPs with different effect sizes underlying seven traits in WTCCC. Proportion of additive genetic variation contributed by individual chromosomes and the proportion of variance on each chromosome explained by SNPs with different effect sizes. For each chromosome we calculated the proportion of variance in each mixture component as the sum of the square of the sampled effect sizes of the SNPs allocated to each component divided by the sum of the total variance explained by SNPs. The colored bars partition the genetic variance in contributions from each mixture class.

Page 222: Resemblance between relaves - University of Washington

PosteriormeanofnumberofSNPsestimatedbyBayesR

●●

● ● ●

● ●

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

11000

T1D RA CD CAD T2D HT BD SCZDisease

Num

ber o

f SN

P

Model size

• Posterior mean and 95% posterior credible interval • WTCC1+SCZ swedish

Page 223: Resemblance between relaves - University of Washington

PredictionofgeneticvalueSummary

Bestpredictionisg-hat=E(g|y)Geneticvaluestreatedasrandomeffects

Eg g~N(0,Gσg2)

EquivalentmodeltopredictSNPeffectsuE(u|y)dependsonpriordistributionofu

à Bayesianmodelsg-hat=Zu-hatgiveshigheraccuracythanassuming

g~N(0,Gσg2)Bayesianmodelsintegratepredictionandmappingofcausalvariants

42

Page 224: Resemblance between relaves - University of Washington

Keyconcepts• Predictionofphenotypicvaluesislimitedbyheritability• Accuracyofpredictiondependson

– howwellmarkereffectsareestimated(samplesize)– howwellmarkereffectsarecorrelatedwithcausalvariants(LD)

• Estimationofmarkereffectsandpredictioninthesamedataleadsto(severe)bias– winner’scurse;over-fitting

• VarianceexplainedbyaSNP-basedpredictorisnotthesameasthevarianceexplainedbythoseSNPs

• Markerdatacapturesbothbetweenandwithinfamilygeneticvariation• Bestpredictionmethodstakegeneticvaluesasrandomeffects

43

Page 225: Resemblance between relaves - University of Washington

Supplementaryderivations

44

Page 226: Resemblance between relaves - University of Washington

Theory(additivemodel)m unlinkedcausalvariants

45

yi = xijbj + eij=1

m

∑ = ai + ei

var(y) = var(x j )b2j + var(e)

j=1

m

∑ = var(a)+ var(e)

cov(yi, yk ) = cov(xij, xkj )b2j

j=1

m

∑ + cov(ei,ek )

= cov(ai,ak )+ cov(ei,ek )= cov(ai,ak ) if cov(ei,ek ) = 0

Page 227: Resemblance between relaves - University of Washington

Prediction

46

yi = xijbjj=1

m

∑ = ai

var(y) = var(x j )b2j

j=1

m

∑ = var(a)

cov(yi, yk ) = cov(xij, xkj )b2j

j=1

m

∑ = cov(ai, ak )

Page 228: Resemblance between relaves - University of Washington

- theory-

47

cov(yi, yi ) = cov{ (xijbj ), xijbj + eij=1

m

∑ }j=1

m

= var(xij )bjbj + xij cov(bj,j=1

m

∑ ei )j=1

m

If b estimated from the same data in which prediction is made, then the second term is non-zero

Page 229: Resemblance between relaves - University of Washington

EffectoferrorsinestimatingSNPeffects(leastsquares;singleSNP)

48

yi = xib+ eib = b+ε

E(b) = b

var(b) = var(ε) =σ e2 / Σx2 ≈ var(y) / {N var(x)}

var(x) = 2p(1− p) under HWEDefine RSNP

2 = var(x)b2 / var(y)

= contribution of single SNP to heritability

Page 230: Resemblance between relaves - University of Washington

- effectsoferrors-

49

Ry, y2 = cov(y, y)2 / {var(y)var(y)}

E[cov(y, y)]= E[cov(xb, xb)]= var(xi )E(b)b= var(x)b2

E[var(y)]= E[var(xb)]= var(x)E[b2 ]

= var(x)[b2 + var(b)] ≈ var(x)b2 + var(x)var(y) / [N var(x)]= var(x)b2 + var(y) / N

E(Ry, y2 ) ≈ RSNP

2 / [1+1/ {NRSNP2 }]