lecture 5: allelic effects and genetic...

1

Lecture 5: Allelic Effects and Genetic Variances

Bruce Walsh lecture notes
Synbreed course

version 12 July 2013

2

Quantitative Genetics

The analysis of traits whose variation is determined by both a number of genes and environmental factors

Phenotype is highly uninformative as to underlying genotype

3

Complex (or Quantitative) trait

• No (apparent) simple Mendelian basis for variation in the trait

• May be a single gene strongly influenced by environmental factors

• May be the result of a number of genes of equal (or differing) effect

• Most likely, a combination of both multiple genes and environmental factors

• Example: Blood pressure, cholesterol levels
– Known genetic and environmental risk factors

• Molecular traits can also be quantitative traits
– mRNA level on a microarray analysis
– Protein spot volume on a 2-D gel

4

Phenotypic distribution of a trait

5

Consider a specific locus influencing the trait

For QQ individuals at this locus, mean phenotype = 0.15, while the overall mean phenotype = 0

Hence, it is very hard to distinguish the QQ individuals from all others simply from their phenotypic values

Values for QQ individuals shaded in dark green

6

Goals of Quantitative Genetics

• Partition total trait variation into genetic (nature) vs. environmental (nurture) components

• Predict resemblance between relatives
– If a sib has a disease/trait, what are your odds?
– Selection response
– Change in mean under inbreeding, outcrossing, assortative mating

• Find the underlying loci contributing to genetic variation
– QTL -- quantitative trait loci

• Deduce molecular basis for genetic trait variation

• eQTLs -- expression QTLs, loci with a quantitative influence on gene expression
– e.g., QTLs influencing mRNA abundance on a microarray

7

Dichotomous (binary) traits

Presence/absence traits (such as a disease) can (and usually do) have a complex genetic basis

Consider a disease susceptibility (DS) locus underlying a disease, with alleles D and d, where allele D significantly increases your disease risk

In particular, Pr(disease | DD) = 0.5, so that the penetrance of genotype DD is 50%

Suppose Pr(disease | Dd) = 0.2, Pr(disease | dd) = 0.05

dd individuals only rarely display the disease, largely because of exposure to adverse environmental conditions

8

If freq(d) = 0.9, what is Pr(DD | show disease)?

freq(disease) = $0.1^2 \times 0.5 + 2 \times 0.1 \times 0.9 \times 0.2 + 0.9^2 \times 0.05 = 0.0815$ (Hardy-Weinberg assumption)

From Bayes’ theorem, Pr(DD | disease) = Pr(disease | DD) × Pr(DD) / Pr(disease) = $0.1^2 \times 0.5 / 0.0815 = 0.06$ (6%)

dd individuals can give rise to phenocopies 5% of the time, showing the disease but not as a result of carrying the risk allele

Pr(Dd | disease) = 0.442, Pr(dd | disease) = 0.497

Thus about 50% of the diseased individuals are phenocopies
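The Bayes calculation on this slide can be checked numerically. The sketch below is only an illustration using the slide's numbers; the variable names (`prior`, `penetrance`, `posterior`) are not from the lecture, and Hardy-Weinberg genotype frequencies are assumed as on the slide.

```python
# Posterior genotype probabilities given disease, using the slide's values.
# Assumption: Hardy-Weinberg genotype frequencies, as stated on the slide.
freq_d = 0.9
freq_D = 1.0 - freq_d

# Prior genotype frequencies under Hardy-Weinberg
prior = {
    "DD": freq_D ** 2,
    "Dd": 2 * freq_D * freq_d,
    "dd": freq_d ** 2,
}

# Penetrances Pr(disease | genotype) from the slide
penetrance = {"DD": 0.5, "Dd": 0.2, "dd": 0.05}

# Overall disease frequency (denominator in Bayes' theorem)
prevalence = sum(prior[g] * penetrance[g] for g in prior)

# Pr(genotype | disease) for each genotype
posterior = {g: prior[g] * penetrance[g] / prevalence for g in prior}

print(f"freq(disease) = {prevalence:.4f}")
for genotype, prob in posterior.items():
    print(f"Pr({genotype} | disease) = {prob:.3f}")
```

The three posteriors sum to one, and the dd entry is the phenocopy fraction discussed on the slide.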

9

Basic model of Quantitative Genetics

Basic model: P = G + E

P = phenotypic value (we also use z for this value)

G = genotypic value

E = environmental value

G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]

Hence, genotypic values are functions of the environments experienced.

10

Basic model of Quantitative Genetics

Basic model: P = G + E

G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]

G x E interaction --- The performance of a particular genotype in a particular environment differs from the sum of the average performance of that genotype over all environments and the average performance of that environment over all genotypes. Basic model now becomes P = G + E + GE

G = average value of an inbred line over a series of environments

11

The transmission of genotypes versus alleles

• With fully inbred lines, offspring have the same genotype as their parent (i.e., they are clones), and hence the entire parental genotypic value G is passed along
– Hence, favorable interactions between alleles (such as with dominance) are not lost by randomization under random mating but rather passed along.

• When offspring are generated by crossing (or random mating), each parent contributes a single allele at each locus to its offspring, and hence only passes along a PART of its genotypic value

• This part is determined by the average effect of the allele
– Favorable interactions between alleles are NOT passed along to their offspring in a diploid (but, as we will see, are in an autotetraploid)

12

Contribution of a locus to a trait

Genotype            Q1Q1     Q2Q1          Q2Q2
Genotypic value     C        C + a(1+k)    C + 2a
                    C        C + a + d     C + 2a
                    C - a    C + d         C + a

(The three rows are alternative parameterizations of the same genotypic values.)

2a = G(Q2Q2) - G(Q1Q1)

d = ak = G(Q1Q2) - [G(Q2Q2) + G(Q1Q1)]/2

d measures dominance, with d = 0 if the heterozygote is exactly intermediate to the two homozygotes

k = d/a is a scaled measure of the dominance

13

Example: Apolipoprotein E & Alzheimer’s

Genotype                ee      Ee      EE
Average age of onset    68.4    75.5    84.3

2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95

ak = d = G(Ee) - [G(EE) + G(ee)]/2 = -0.85

k = d/a = -0.11. Only a small amount of dominance

14

Example: Booroola (B) gene

Genotype               bb      Bb      BB
Average litter size    1.48    2.17    2.66

2a = G(BB) - G(bb) = 2.66 - 1.48 --> a = 0.59

ak = d = G(Bb) - [G(BB) + G(bb)]/2 = 0.10

k = d/a = 0.17
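As a quick check of the two worked examples, the definitions 2a = G(high) - G(low), d = G(het) - midparent and k = d/a can be evaluated directly. This is a minimal sketch; the helper name `locus_parameters` is made up for illustration, and the inputs are the genotype means quoted on the slides.

```python
def locus_parameters(g_low, g_het, g_high):
    """Return (a, d, k) from the three genotypic values of a diallelic locus.

    a is half the difference between the homozygotes, d is the deviation of
    the heterozygote from the homozygote midpoint, and k = d / a.
    """
    a = (g_high - g_low) / 2
    d = g_het - (g_high + g_low) / 2
    k = d / a
    return a, d, k


# ApoE example: average age of onset for ee, Ee, EE
apoe = locus_parameters(68.4, 75.5, 84.3)

# Booroola example: average litter size for bb, Bb, BB
booroola = locus_parameters(1.48, 2.17, 2.66)

for name, (a, d, k) in [("ApoE", apoe), ("Booroola", booroola)]:
    print(f"{name}: a = {a:.2f}, d = {d:.2f}, k = {k:.2f}")
```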

15

Population means: Random mating

Let p = freq(A), q = 1-p = freq(a). Assuming random mating (Hardy-Weinberg frequencies),

Genotype     aa       Aa       AA
Value        C - a    C + d    C + a
Frequency    q^2      2pq      p^2

Mean = $q^2(C - a) + 2pq(C + d) + p^2(C + a)$

$\mu_{RM} = C + a(p - q) + d(2pq)$

Here $a(p - q)$ is the contribution from homozygotes and $d(2pq)$ is the contribution from heterozygotes.
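The closed form for the random-mating mean can be compared against the direct frequency-weighted sum. The following is an illustrative sketch only; the function names and the parameter values in the loop are arbitrary choices, not taken from the lecture.

```python
def mean_direct(C, a, d, p):
    """Frequency-weighted mean over aa, Aa, AA under Hardy-Weinberg."""
    q = 1 - p
    return q**2 * (C - a) + 2 * p * q * (C + d) + p**2 * (C + a)


def mean_closed_form(C, a, d, p):
    """Closed form: C + a(p - q) + d(2pq)."""
    q = 1 - p
    return C + a * (p - q) + d * (2 * p * q)


# Arbitrary illustrative values for C, a and d
C, a, d = 10.0, 2.0, 0.5
for p in (0.1, 0.5, 0.9):
    direct = mean_direct(C, a, d, p)
    closed = mean_closed_form(C, a, d, p)
    print(f"p = {p}: direct = {direct:.4f}, closed form = {closed:.4f}")
```

At p = 1/2 the homozygote term vanishes and the mean is C + d/2.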

16

Population means: Inbred cross F2

Suppose two inbred lines are crossed. If A is fixed in one population and a in the other, then p = q = 1/2

Genotype     aa     Aa     AA
Frequency    1/4    1/2    1/4

Mean = (1/4)(C - a) + (1/2)(C + d) + (1/4)(C + a)

$\mu_{RM} = C + d/2$

Note that C is the average of the two parental lines, so when d > 0, the F2 exceeds this. Note also that the F1 exceeds this average by d, so only half of this is passed on to the F2.

17

Population means: RILs from an F2

A large number of F2 individuals are fully inbred, either by selfing for many generations or by generating doubled haploids. If p and q denote the F2 frequencies of A and a, what is the expected mean over the set of resulting RILs?

Genotype     aa    Aa    AA
Frequency    q     0     p

$\mu_{RILs} = C + a(p - q)$

Note this is independent of the amount of dominance (d)

18

The average effect of an allele

• The average effect $\alpha_A$ of an allele A is defined by the difference between offspring that get that allele and a random offspring.
– $\alpha_A$ = mean(offspring value given parent transmits A) - mean(all offspring)
– Similar definition for $\alpha_a$.

• Note that while C, a and d (the genotypic parameters) do not change with allele frequency, $\alpha_x$ is clearly a function of the frequencies of alleles with which allele x combines.

19

Random mating

Consider the average effect of allele A when a parent is randomly mated to another individual from its population

Suppose parent contributes A

Allele from other parent    Probability    Genotype    Value
A                           p              AA          C + a
a                           q              Aa          C + d

Mean(A transmitted) = p(C + a) + q(C + d) = C + pa + qd

$\alpha_A$ = Mean(A transmitted) - µ = q[a + d(q-p)]

20

Random mating

Now suppose parent contributes a

Allele from other parent    Probability    Genotype    Value
A                           p              Aa          C + d
a                           q              aa          C - a

Mean(a transmitted) = p(C + d) + q(C - a) = C - qa + pd

$\alpha_a$ = Mean(a transmitted) - µ = -p[a + d(q-p)]

21

$\alpha$, the average effect of an allelic substitution

• $\alpha = \alpha_A - \alpha_a$ is the average effect of an allelic substitution, the change in mean trait value when an a allele in a random individual is replaced by an A allele
– $\alpha = a + d(q - p)$. Note that
• $\alpha_A = q\alpha$ and $\alpha_a = -p\alpha$.
• $E(\alpha_X) = p\alpha_A + q\alpha_a = pq\alpha - qp\alpha = 0$,
• The average effect of a random allele is zero, hence average effects are deviations from the mean
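These identities can be confirmed by computing the average effects straight from their definition (mean of offspring receiving the allele minus the population mean) and comparing with the closed forms. This is a sketch under the random-mating assumption used on the slides; the function names and numerical values are illustrative only.

```python
def average_effects_from_definition(C, a, d, p):
    """Average effects of A and a as deviations from the random-mating mean."""
    q = 1 - p
    mu = C + a * (p - q) + 2 * p * q * d
    mean_A_transmitted = p * (C + a) + q * (C + d)
    mean_a_transmitted = p * (C + d) + q * (C - a)
    return mean_A_transmitted - mu, mean_a_transmitted - mu


def average_effects_closed_form(a, d, p):
    """Closed forms: alpha = a + d(q - p), alpha_A = q*alpha, alpha_a = -p*alpha."""
    q = 1 - p
    alpha = a + d * (q - p)
    return q * alpha, -p * alpha, alpha


# Arbitrary illustrative values
C, a, d, p = 10.0, 2.0, 0.5, 0.3
alpha_A_def, alpha_a_def = average_effects_from_definition(C, a, d, p)
alpha_A, alpha_a, alpha = average_effects_closed_form(a, d, p)

print(f"alpha_A: definition = {alpha_A_def:.4f}, closed form = {alpha_A:.4f}")
print(f"alpha_a: definition = {alpha_a_def:.4f}, closed form = {alpha_a:.4f}")
print(f"alpha_A - alpha_a = {alpha_A - alpha_a:.4f}, alpha = {alpha:.4f}")
print(f"E[alpha_X] = {p * alpha_A + (1 - p) * alpha_a:.2e}")
```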

22

Dominance deviations

• Fisher (1918) decomposed the contribution to the genotypic value from a single locus as $G_{ij} = \mu + \alpha_i + \alpha_j + \delta_{ij}$
– Here, µ is the mean (a function of p)
– $\alpha_i$ are the average effects
– Hence, $\mu + \alpha_i + \alpha_j$ is the predicted genotypic value given the average effect (over all genotypes) of alleles i and j.
– The dominance deviation associated with genotype $G_{ij}$ is the difference between its true value and its value predicted from the sum of average effects (essentially a residual)

23

Fisher’s (1918) Decomposition of G

One of Fisher’s key insights was that the genotypic value consists of a fraction that can be passed from parent to offspring and a fraction that cannot. In particular, under sexual reproduction, diploid parents only pass along SINGLE ALLELES to their offspring

Consider the genotypic value $G_{ij}$ resulting from an AiAj individual

$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$

Mean value $\mu_G = \sum G_{ij} \, \mathrm{Freq}(A_iA_j)$

$\alpha_i$ = average contribution to genotypic value for allele i

24

Since parents pass along single alleles to their offspring, the $\alpha_i$ (the average effect of allele i) represent these contributions

The genotypic value predicted from the individual allelic effects is thus

$\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$

The average effect for an allele is POPULATION-SPECIFIC, as it depends on the types and frequencies of alleles that it pairs with

25


The genotypic value predicted from the individual allelic effects is thus

$\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$

Dominance deviations --- the difference (for genotype AiAj) between the genotypic value predicted from the two single alleles and the actual genotypic value, namely any interactions (dominance) between the two alleles

$G_{ij} - \hat{G}_{ij} = \delta_{ij}$

26

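[Figure: genotypic value plotted against N = # copies of allele 2 (0, 1, 2) for genotypes 11, 21 and 22. The actual values $G_{11}$, $G_{21}$, $G_{22}$ deviate from the fitted values $\mu + 2\alpha_1$, $\mu + \alpha_1 + \alpha_2$, $\mu + 2\alpha_2$ on the regression line by the dominance deviations $\delta_{11}$, $\delta_{12}$, $\delta_{22}$. Slope = $\alpha = \alpha_2 - \alpha_1$.]

This decomposition is a regression of G on N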

27

Fisher’s decomposition is a Regression

A notational change clearly shows this is a regression,

$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$

Predicted value: $\mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N$

Residual error: $\delta_{ij}$

Independent (predictor) variable N = # of A2 alleles

Note that the slope $\alpha_2 - \alpha_1 = \alpha$, the average effect of an allelic substitution

28

$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$

Intercept: $\mu_G + 2\alpha_1$. Regression slope: $\alpha_2 - \alpha_1$.

$2\alpha_1 + (\alpha_2 - \alpha_1)N$ =
  $2\alpha_1$ for N = 0, e.g., A1A1
  $\alpha_1 + \alpha_2$ for N = 1, e.g., A1A2
  $2\alpha_2$ for N = 2, e.g., A2A2

A key point is that the average effects change with allele frequencies. Indeed, if overdominance is present they can change sign with allele frequencies.
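The regression view can be made concrete by fitting a frequency-weighted least-squares line of G on N. The sketch below assumes Hardy-Weinberg weights and uses the genotypic values 0, a(1+k), 2a with an overdominant locus (k = 2); the function name `regression_slope` and the chosen numbers are illustrative, not from the lecture.

```python
def regression_slope(values, p2):
    """Weighted least-squares slope of genotypic value G on N.

    values: genotypic values for N = 0, 1, 2 copies of allele A2.
    p2: frequency of allele A2; weights are Hardy-Weinberg frequencies.
    Requires 0 < p2 < 1 so that N has nonzero variance.
    """
    p1 = 1 - p2
    weights = [p1**2, 2 * p1 * p2, p2**2]
    counts = [0, 1, 2]
    mean_n = sum(w * n for w, n in zip(weights, counts))
    mean_g = sum(w * g for w, g in zip(weights, values))
    cov_gn = sum(w * (n - mean_n) * (g - mean_g)
                 for w, n, g in zip(weights, counts, values))
    var_n = sum(w * (n - mean_n) ** 2 for w, n in zip(weights, counts))
    return cov_gn / var_n


# Overdominant locus: a = 1, k = 2, so the heterozygote exceeds both homozygotes
a, k = 1.0, 2.0
values = [0.0, a * (1 + k), 2 * a]

for p2 in (0.1, 0.75, 0.9):
    p1 = 1 - p2
    expected = a * (1 + k * (p1 - p2))  # alpha_2 - alpha_1 from the closed form
    slope = regression_slope(values, p2)
    print(f"p2 = {p2}: slope = {slope:+.3f}, closed form = {expected:+.3f}")
```

With these values the slope is positive when A2 is rare, zero at p2 = 0.75, and negative when A2 is common, even though the genotypic values never change.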

29

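[Figure: genotypic values $G_{11}$, $G_{21}$, $G_{22}$ plotted against N (0, 1, 2), with circles sized by genotype frequency.]

Allele A2 common, $\alpha_1 > \alpha_2$

The size of the circle denotes the weight associated with that genotype. While the genotypic values do not change, their frequencies (and hence weights) do.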

30

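[Figure: genotypic values $G_{11}$, $G_{21}$, $G_{22}$ plotted against N (0, 1, 2), with circles sized by genotype frequency. Slope = $\alpha_2 - \alpha_1$.]

Allele A1 common, $\alpha_2 > \alpha_1$

Again, same genotypic values as previous slide, but different weights, and hence a different slope (here a change in sign!)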

31

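[Figure: genotypic values $G_{11}$, $G_{21}$, $G_{22}$ plotted against N (0, 1, 2), with circles sized by genotype frequency.]

Both A1 and A2 frequent, $\alpha_1 = \alpha_2 = 0$

With these allele frequencies, both alleles have the same mean value when transmitted, so that all parents have the same average offspring value -- no response to selection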

32

Consider a diallelic locus, where p1 = freq(Q1)

Genotype           Q1Q1    Q2Q1      Q2Q2
Genotypic value    0       a(1+k)    2a

Mean: $\mu_G = 2p_2 a(1 + p_1 k)$

Allelic effects:

$\alpha_2 = p_1 a[1 + k(p_1 - p_2)]$

$\alpha_1 = -p_2 a[1 + k(p_1 - p_2)]$

Dominance deviations: $\delta_{ij} = G_{ij} - \mu_G - \alpha_i - \alpha_j$

33

Average Effects and Additive Genetic Values

The $\alpha$ values are the average effects of an allele

A key concept is the Additive Genetic Value (A) of an individual,

$A(G_{ij}) = \alpha_i + \alpha_j$

$A = \sum_{k=1}^{n} \left( \alpha_i^{(k)} + \alpha_j^{(k)} \right)$

$\alpha_i^{(k)}$ = effect of allele i at locus k

A is called the Breeding value or Additive genetic value

34

Why all the fuss over A?

$A = \sum_{k=1}^{n} \left( \alpha_i^{(k)} + \alpha_j^{(k)} \right)$

Suppose pollen parent has A = 10 and seed parent has A = -2 for plant height

Expected average offspring height is (10-2)/2 = 4 units above the population mean. Expected offspring A = average of parental A’s

KEY: parents only pass single alleles to their offspring. Hence, they only pass along the A part of their genotypic value G.

35

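Genetic Variances

Writing the genotypic value as

$G_{ij} = \mu_g + (\alpha_i + \alpha_j) + \delta_{ij}$

The genetic variance can be written as

$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2\left(\alpha_i^{(k)} + \alpha_j^{(k)}\right) + \sum_{k=1}^{n} \sigma^2\left(\delta_{ij}^{(k)}\right)$

This follows since

$\sigma^2(G) = \sigma^2(\mu_g + (\alpha_i + \alpha_j) + \delta_{ij}) = \sigma^2(\alpha_i + \alpha_j) + \sigma^2(\delta_{ij})$

as $\mathrm{Cov}(\alpha, \delta) = 0$ (under random mating)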

36

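Genetic Variances

$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2\left(\alpha_i^{(k)} + \alpha_j^{(k)}\right) + \sum_{k=1}^{n} \sigma^2\left(\delta_{ij}^{(k)}\right)$

Hence, total genetic variance = additive + dominance variances,

$\sigma^2_G = \sigma^2_A + \sigma^2_D$

$\sigma^2_A$ = Additive Genetic Variance (or simply Additive Variance)

$\sigma^2_D$ = Dominance Genetic Variance (or simply Dominance Variance)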

37

Key concepts (so far)

• $\alpha_i$ = average effect of allele i
– Property of a single allele in a particular population (depends on genetic background)

• A = Additive Genetic Value
– A = sum (over all loci) of average effects
– Fraction of G that parents pass along to their offspring
– Property of an individual in a particular population

• Var(A) = additive genetic variance
– Variance in additive genetic values
– Property of a population

• Can estimate A or Var(A) without knowing any of the underlying genetic detail (forthcoming)

38

$\sigma^2_A = 2E[\alpha^2] = 2\sum_{i=1}^{m} \alpha_i^2 p_i$

Since $E[\alpha] = 0$, $\mathrm{Var}(\alpha) = E[(\alpha - \mu_\alpha)^2] = E[\alpha^2]$

One locus, 2 alleles:

Genotype           Q1Q1    Q1Q2      Q2Q2
Genotypic value    0       a(1+k)    2a

$\sigma^2_A = 2p_1 p_2 a^2 [1 + k(p_1 - p_2)]^2$

Dominance alters additive variance

When dominance is present, additive variance is an asymmetric function of allele frequencies

39

Dominance variance

$\sigma^2_D = E[\delta^2] = \sum_{i=1}^{m} \sum_{j=1}^{m} \delta_{ij}^2 p_i p_j$

One locus, 2 alleles:

Genotype           Q1Q1    Q1Q2      Q2Q2
Genotypic value    0       a(1+k)    2a

$\sigma^2_D = (2p_1 p_2 ak)^2$

Equals zero if k = 0

This is a symmetric function of allele frequencies

Can also be expressed in terms of d = ak
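For a single diallelic locus under random mating, the additive and dominance variances should add up to the total genotypic variance computed directly from the three genotypic values. The sketch below checks this numerically; the function name and parameter values are illustrative assumptions, not part of the lecture.

```python
def variance_components(a, k, p2):
    """Return (var_G, var_A, var_D) for genotypic values 0, a(1+k), 2a.

    p2 is the frequency of allele Q2; Hardy-Weinberg frequencies are assumed.
    """
    p1 = 1 - p2
    values = [0.0, a * (1 + k), 2 * a]
    freqs = [p1**2, 2 * p1 * p2, p2**2]

    # Total genotypic variance computed directly from values and frequencies
    mean_g = sum(f * g for f, g in zip(freqs, values))
    var_g = sum(f * (g - mean_g) ** 2 for f, g in zip(freqs, values))

    # Closed forms for the additive and dominance variances
    var_a = 2 * p1 * p2 * a**2 * (1 + k * (p1 - p2)) ** 2
    var_d = (2 * p1 * p2 * a * k) ** 2
    return var_g, var_a, var_d


# Complete dominance (k = 1) at a few allele frequencies
for p2 in (0.2, 0.5, 0.8):
    var_g, var_a, var_d = variance_components(a=1.0, k=1.0, p2=p2)
    print(f"p2 = {p2}: var_G = {var_g:.4f}, var_A = {var_a:.4f}, "
          f"var_D = {var_d:.4f}, var_A + var_D = {var_a + var_d:.4f}")
```

Comparing p2 = 0.2 with p2 = 0.8 shows the asymmetry of the additive variance under dominance, while the dominance variance is the same at both frequencies.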

40

[Figure: additive variance, VA, plotted against allele frequency, p, with no dominance (k = 0).]

41

[Figure: additive variance VA and dominance variance VD plotted against allele frequency, p, under complete dominance (k = 1).]

42

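Epistasis

The two-locus decomposition allowing for all possible interactions is given by

$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl}) + (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl}) + (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij}) + (\delta\delta_{ijkl})$

$= \mu_G + A + D + AA + AD + DD$

These components are defined to be uncorrelated (or orthogonal), so that

$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$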

43

$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl}) + (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl}) + (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij}) + (\delta\delta_{ijkl})$

$= \mu_G + A + D + AA + AD + DD$

Additive x Additive interactions -- $\alpha\alpha$, AA

interactions between a single allele at one locus with a single allele at another

Additive x Dominance interactions -- $\alpha\delta$, AD

interactions between an allele at one locus with the genotype at another, e.g. allele Ai and genotype Bkl

Dominance x Dominance interactions -- $\delta\delta$, DD

the interaction between the dominance deviation at one locus with the dominance deviation at another.

44

Effects and Variance when using a testor

• A common design in plant breeding is to cross members from a population to a testor to generate a testcross.
– Testor can be either an inbred line or an outcrossing population
– Often from a different heterotic group from the population being tested
– Often testor is an elite genotype

• The average effect of an allele in a testcross, its variance, and its additive (General combining ability, GCA) and interaction (Specific combining ability, SCA) effects all follow in analogous fashion to previous results for crosses within a population

45

The average effect of an allele in a testcross

• The concept of the average effect of an allele when crossed within its population is easily extended to the average effect of an allele when crossed to a testor.
– Called the testcross average effect.

• The average effect of allele X in this testcross, $\alpha_x^T$, is defined as the difference between the mean value of offspring getting this allele from the population versus the mean value of a random offspring from this cross
– Will turn out to be a function of the frequencies of alleles in both the tested and the testor population.

46

Mean value for a testcross

Suppose the frequency of A is p in the population and pT in the testor (with q and qT similarly defined for a).

Each cell gives the frequency and value of the resulting testcross genotype:

                          Testor allele
Parental line allele      A (pT)              a (qT)
A (p)                     p pT: C + a         p qT: C + d
a (q)                     q pT: C + d         q qT: C - a

Mean of cross = $C + a(p p_T - q q_T) + d(p q_T + q p_T)$

47

Average testcross mean in a series of RILs

• Slide 17 gave an expression for the expected average performance from a series of RILs formed by crossing two populations.

• A similar expression exists for the average testcross performance for a series of RILs from a cross of A x B
– Mean = $(1/2)\mu_A^T + (1/2)\mu_B^T$, namely the average of the testcross means for A and B
– More generally (since lines can, by chance, give an unequal contribution of alleles),
• Mean = $f_A \mu_A^T + f_B \mu_B^T$, where $f_A = 1 - f_B$ is the fraction of alleles from A in the sample of RILs

• Can use molecular markers to estimate the $f_x$ directly. Here $f_x$ is the fraction of SNP alleles from line x.

48

$\alpha_A^T$, testcross effect of allele A

Suppose parent contributes A

Allele from testor parent    Probability    Genotype    Value
A                            pT             AA          C + a
a                            qT             Aa          C + d

Mean(A transmitted) = pT(C + a) + qT(C + d) = C + pT a + qT d

$\alpha_A^T$ = Mean(A transmitted) - µ = $q[a + d(q_T - p_T)]$

Likewise,

$\alpha_a^T$ = Mean(a transmitted) - µ = $-p[a + d(q_T - p_T)]$

49

$\alpha^T$, the average testcross effect of an allelic substitution

• $\alpha^T = \alpha_A^T - \alpha_a^T$ is the average testcross effect of an allelic substitution, the change in mean trait value when an a allele in a random testcrossed individual is replaced by an A allele
– $\alpha^T = a + d(q_T - p_T)$. Note that this is independent of the allele frequencies in the parental population, and depends ONLY on the testor allele frequencies (pT, qT).

• $\alpha_A^T = q\alpha^T$, $\alpha_a^T = -p\alpha^T$, and $E(\alpha_x^T) = 0$

50

Testcross variance

• Just as the additive genetic variance was the population variance in the sum of the average effects of an allele, the testcross variance is the variance in the average testcross effects of a random allele
– $\mathrm{Var}(A^T) = \mathrm{Var}(\alpha_x^T)$
– $\mathrm{Var}(\alpha_x^T) = p(\alpha_A^T)^2 + q(\alpha_a^T)^2$
– $= p(q[a + d(q_T - p_T)])^2 + q(-p[a + d(q_T - p_T)])^2$
• $= pq[a + d(q_T - p_T)]^2$
– Hence, $\mathrm{Var}(\alpha_x^T) = pq[a + d(q_T - p_T)]^2$

51

GCA and SCA

• Consider a cross between individuals from population 1 and population 2

• Let $\mu_{1 \times 2}$ denote the average value for all of these crosses, and let $G_{ij}$ be the average genotypic value of an individual from a cross of individual (or line) i in population one and individual (or line) j from population two.

• Analogous to Fisher’s decomposition, we can write this in terms of two additive effects and one interaction effect.

52

$\alpha_i^2$ is the testcross average effect for allele i (more generally an allele from individual i) when tested using population 2 as a testor, with $\alpha_j^1$ similarly defined for allele j (from pop 2) using population 1 as the testor

$\delta_{ij}$ is the interaction between allele i from population 1 and allele j from population 2 in the testcross of 1 and 2

The sum over all loci of the $\alpha_i^2$ values is the general combining ability (GCA) of line i when crossed to line 2 (note these are cross-specific)!

The sum of the $\delta$ is the specific combining ability (SCA)

53

$G_{ij} = \mu + GCA_i^2 + GCA_j^1 + SCA_{ij}^{12}$

The superscripts denoting the population in which the allele is being tested are often suppressed

The GCA is akin to the breeding value from one parent, but now it is the testcross value of that parent

The predicted mean of a particular cross is the sum of the two GCAs for those individuals/lines

As with average effects and dominance deviations, these are only defined with respect to a particular reference set of crosses (i.e., lines from Pop 1 X lines from pop 2)

54

Within-population crosses vs. testors

                                 Within-pop          Testor
Allelic effects                  $\alpha$            $\alpha^T$
Additive transmitting factor     Breeding value A    GCA
Predicting offspring mean        A1/2 + A2/2         GCA1 + GCA2
Genetic variances                Var(A), Var(D)      Var(GCA), Var(SCA)
Nonadditive component            Dominance value     SCA
