1
Lecture 5: Allelic Effects and Genetic
Variances
Bruce Walsh lecture notes, Synbreed course
version 12 July 2013
2
Quantitative Genetics
The analysis of traits whose variation is determined by both
a number of genes and environmental factors
Phenotype is highly uninformative as to underlying genotype
3
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation in the trait
• May be a single gene strongly influenced by environmental factors
• May be the result of a number of genes of equal (or differing) effect
• Most likely, a combination of both multiple genes and environmental factors
• Example: Blood pressure, cholesterol levels
– Known genetic and environmental risk factors
• Molecular traits can also be quantitative traits
– mRNA level on a microarray analysis
– Protein spot volume on a 2-D gel
4
Phenotypic distribution of a trait
5
Consider a specific locus influencing the trait
For this locus, mean phenotype = 0.15, while overall mean phenotype = 0
Hence, it is very hard to distinguish the QQ individuals from all others simply from their phenotypic values
Values for QQ individuals shaded in dark green
6
Goals of Quantitative Genetics
• Partition total trait variation into genetic (nature) vs. environmental (nurture) components
• Predict resemblance between relatives
– If a sib has a disease/trait, what are your odds?
– Selection response
– Change in mean under inbreeding, outcrossing, assortative mating
• Find the underlying loci contributing to genetic variation
– QTL -- quantitative trait loci
• Deduce molecular basis for genetic trait variation
• eQTLs -- expression QTLs, loci with a quantitative influence on gene expression
– e.g., QTLs influencing mRNA abundance on a microarray
7
Dichotomous (binary) traits
Presence/absence traits (such as a disease) can (and usually do) have a complex genetic basis
Consider a disease susceptibility (DS) locus underlying a disease, with alleles D and d, where allele D significantly increases your disease risk
In particular, Pr(disease | DD) = 0.5, so that the penetrance of genotype DD is 50%
Suppose Pr(disease | Dd) = 0.2, Pr(disease | dd) = 0.05
dd individuals can rarely display the disease, largely because of exposure to adverse environmental conditions
8
If freq(d) = 0.9, what is Pr(DD | show disease)?
freq(disease) = 0.1^2*0.5 + 2*0.1*0.9*0.2 + 0.9^2*0.05 = 0.0815 (Hardy-Weinberg assumption)
From Bayes’ theorem, Pr(DD | disease) = Pr(disease | DD)*Pr(DD)/Pr(disease) = 0.1^2*0.5 / 0.0815 = 0.06 (6%)
dd individuals can give rise to phenocopies 5% of the time, showing the disease but not as a result of carrying the risk allele
Pr(Dd | disease) = 0.442, Pr(dd | disease) = 0.497
Thus about 50% of the diseased individuals are phenocopies
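The Bayes calculation on this slide can be checked numerically. The following is a minimal sketch, assuming Hardy-Weinberg genotype frequencies as stated on the slide; the function name `genotype_posteriors` is an illustrative choice and not part of the lecture.

```python
def genotype_posteriors(freq_d, penetrance):
    """Return Pr(disease) and Pr(genotype | disease) for a two-allele DS locus.

    Assumes Hardy-Weinberg genotype frequencies. `penetrance` maps each
    genotype to Pr(disease | genotype).
    """
    freq_D = 1.0 - freq_d
    prior = {"DD": freq_D ** 2, "Dd": 2 * freq_D * freq_d, "dd": freq_d ** 2}
    # Joint probability of carrying the genotype and showing the disease
    joint = {g: prior[g] * penetrance[g] for g in prior}
    p_disease = sum(joint.values())
    # Bayes' theorem: divide each joint probability by Pr(disease)
    posterior = {g: joint[g] / p_disease for g in joint}
    return p_disease, posterior


p_disease, post = genotype_posteriors(0.9, {"DD": 0.5, "Dd": 0.2, "dd": 0.05})
print(round(p_disease, 4), {g: round(v, 3) for g, v in post.items()})
```

The posterior for dd is the phenocopy fraction discussed on the slide.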
9
Basic model of Quantitative Genetics
Basic model: P = G + E
P: Phenotypic value -- we also use z for this value
G: Genotypic value
E: Environmental value
G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]
Hence, genotypic values are functions of the environments experienced.
10
Basic model of Quantitative Genetics
Basic model: P = G + E
G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]
G x E interaction --- The performance of a particular genotype in a particular environment differs from the sum of the average performance of that genotype over all environments and the average performance of that environment over all genotypes.
Basic model now becomes P = G + E + GE
G = average value of an inbred line over a series of environments
11
The transmission of genotypes versus alleles
• With fully inbred lines, offspring have the same genotype as their parent (i.e., they are clones), and hence the entire parental genotypic value G is passed along
– Hence, favorable interactions between alleles (such as with dominance) are not lost by randomization under random mating but rather passed along.
• When offspring are generated by crossing (or random mating), each parent contributes a single allele at each locus to its offspring, and hence only passes along a PART of its genotypic value
• This part is determined by the average effect of the allele
– Favorable interactions between alleles are NOT passed along to their offspring in a diploid (but, as we will see, are in an autotetraploid)
12
Contribution of a locus to a trait
Genotype                          Q1Q1     Q2Q1          Q2Q2
Genotypic value (three scales)    C        C + a(1+k)    C + 2a
                                  C        C + a + d     C + 2a
                                  C - a    C + d         C + a
2a = G(Q2Q2) - G(Q1Q1)
d = ak = G(Q1Q2) - [G(Q2Q2) + G(Q1Q1)]/2
d measures dominance, with d = 0 if the heterozygote is exactly intermediate to the two homozygotes
k = d/a is a scaled measure of the dominance
13
Example: Apolipoprotein E & Alzheimer’s
Genotype                ee      Ee      EE
Average age of onset    68.4    75.5    84.3
2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95
ak = d = G(Ee) - [G(EE) + G(ee)]/2 = -0.85
k = d/a = -0.11 Only small amount of dominance
14
Example: Booroola (B) gene
Genotype               bb      Bb      BB
Average litter size    1.48    2.17    2.66
2a = G(BB) - G(bb) = 2.66 - 1.48 --> a = 0.59
ak = d = G(Bb) - [G(BB) + G(bb)]/2 = 0.10
k = d/a = 0.17
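The two worked examples use the same three formulas, so they can be wrapped in one small helper. This is a minimal sketch using the tabulated genotype means; the function name `locus_parameters` and its argument order (low homozygote, heterozygote, high homozygote) are assumptions made for illustration.

```python
def locus_parameters(g_low, g_het, g_high):
    """Return (a, d, k) from the three genotypic values of a diallelic locus.

    a is half the difference between the homozygotes, d is the deviation of
    the heterozygote from the homozygote midpoint, and k = d/a.
    """
    a = (g_high - g_low) / 2.0
    d = g_het - (g_high + g_low) / 2.0
    k = d / a
    return a, d, k


# Genotype means taken from the ApoE and Booroola tables
apoe = locus_parameters(68.4, 75.5, 84.3)
booroola = locus_parameters(1.48, 2.17, 2.66)
print([round(x, 2) for x in apoe])
print([round(x, 2) for x in booroola])
```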
15
Population means: Random mating
Let p = freq(A), q = 1-p = freq(a). Assuming random mating (Hardy-Weinberg frequencies),
Genotype     aa       Aa       AA
Value        C - a    C + d    C + a
Frequency    q^2      2pq      p^2
Mean = q^2(C - a) + 2pq(C + d) + p^2(C + a)
µRM = C + a(p-q) + d(2pq)
Here a(p-q) is the contribution from homozygotes and d(2pq) is the contribution from heterozygotes
16
Population means: Inbred cross F2
Suppose two inbred lines are crossed. If A is fixed in one population and a in the other, then p = q = 1/2
Genotype     aa       Aa       AA
Value        C - a    C + d    C + a
Frequency    1/4      1/2      1/4
Mean = (1/4)(C - a) + (1/2)(C + d) + (1/4)(C + a)
µF2 = C + d/2
Note that C is the average of the two parental lines, so when d > 0, the F2 exceeds this. Note also that the F1 exceeds this average by d, so only half of this is passed on to the F2.
17
Population means: RILs from an F2
A large number of F2 individuals are fully inbred, either by selfing for many generations or by generating doubled haploids. If p and q denote the F2 frequencies of A and a, what is the expected mean over the set of resulting RILs?
Genotype     aa       Aa       AA
Value        C - a    C + d    C + a
Frequency    q        0        p
µRILs = C + a(p-q)
Note this is independent of the amount of dominance (d)
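The three population means (random mating, F2, and RILs) differ only in the genotype frequencies used as weights. The sketch below computes each mean directly from the genotype tables so it can be compared with the closed forms; the function names and the numerical values of C, a and d are hypothetical.

```python
def mean_random_mating(C, a, d, p):
    """Mean under Hardy-Weinberg frequencies q^2, 2pq, p^2."""
    q = 1.0 - p
    return q ** 2 * (C - a) + 2 * p * q * (C + d) + p ** 2 * (C + a)


def mean_rils(C, a, d, p):
    """Mean over fully inbred lines: frequencies q, 0, p (no heterozygotes)."""
    q = 1.0 - p
    return q * (C - a) + 0.0 * (C + d) + p * (C + a)


C, a, d = 10.0, 2.0, 1.0          # hypothetical genotypic parameters
mu_rm = mean_random_mating(C, a, d, 0.3)
mu_f2 = mean_random_mating(C, a, d, 0.5)   # F2 of an inbred cross: p = q = 1/2
mu_ril = mean_rils(C, a, d, 0.3)
print(mu_rm, mu_f2, mu_ril)
```

Changing d leaves `mean_rils` unchanged, which is the independence from dominance noted on the RIL slide.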
18
The average effect of an allele
• The average effect $\alpha_A$ of an allele A is defined by the difference between offspring that get that allele and a random offspring.
– $\alpha_A$ = mean(offspring value given parent transmits A) - mean(all offspring)
– Similar definition for $\alpha_a$.
• Note that while C, a and d (the genotypic parameters) do not change with allele frequency, $\alpha_x$ is clearly a function of the frequencies of alleles with which allele x combines.
19
Random mating
Consider the average effect of allele A when a parent is randomly mated to another individual from its population
Suppose parent contributes A
Allele from other parent    Probability    Genotype    Value
A                           p              AA          C + a
a                           q              Aa          C + d
Mean(A transmitted) = p(C + a) + q(C + d) = C + pa + qd
$\alpha_A$ = Mean(A transmitted) - µ = q[a + d(q-p)]
20
Random mating
Now suppose parent contributes a
Allele from other parent    Probability    Genotype    Value
A                           p              Aa          C + d
a                           q              aa          C - a
Mean(a transmitted) = p(C + d) + q(C - a) = C - qa + pd
$\alpha_a$ = Mean(a transmitted) - µ = -p[a + d(q-p)]
21
$\alpha$, the average effect of an allelic substitution
• $\alpha = \alpha_A - \alpha_a$ is the average effect of an allelic substitution, the change in mean trait value when an a allele in a random individual is replaced by an A allele
– $\alpha = a + d(q-p)$. Note that
• $\alpha_A = q\alpha$ and $\alpha_a = -p\alpha$.
• $E(\alpha_X) = p\alpha_A + q\alpha_a = pq\alpha - qp\alpha = 0$,
• The average effect of a random allele is zero, hence average effects are deviations from the mean
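The definitions of the average effects can be evaluated directly from the conditional offspring means and then compared with the closed forms. This is a minimal sketch under random mating; the function name `average_effects` and the parameter values are illustrative assumptions.

```python
def average_effects(a, d, p, C=0.0):
    """Return (alpha_A, alpha_a, alpha) for a diallelic locus under random mating.

    Each average effect is computed from its definition: the mean of offspring
    receiving that allele minus the population mean.
    """
    q = 1.0 - p
    mu = C + a * (p - q) + d * (2 * p * q)      # random-mating mean
    mean_A = p * (C + a) + q * (C + d)          # parent transmits A
    mean_a = p * (C + d) + q * (C - a)          # parent transmits a
    alpha_A = mean_A - mu
    alpha_a = mean_a - mu
    return alpha_A, alpha_a, alpha_A - alpha_a


a, d, p = 2.0, 1.0, 0.3                         # hypothetical values
alpha_A, alpha_a, alpha = average_effects(a, d, p)
print(alpha_A, alpha_a, alpha)
```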
22
Dominance deviations
• Fisher (1918) decomposed the contribution to the genotypic value from a single locus as $G_{ij} = \mu + \alpha_i + \alpha_j + \delta_{ij}$
– Here, µ is the mean (a function of p)
– $\alpha_i$ are the average effects
– Hence, $\mu + \alpha_i + \alpha_j$ is the predicted genotypic value given the average effect (over all genotypes) of alleles i and j.
– The dominance deviation associated with genotype $G_{ij}$ is the difference between its true value and its value predicted from the sum of average effects (essentially a residual)
23
Fisher’s (1918) Decomposition of G
One of Fisher’s key insights was that the genotypic value consists of a fraction that can be passed from parent to offspring and a fraction that cannot.
In particular, under sexual reproduction, diploid parents only pass along SINGLE ALLELES to their offspring
Consider the genotypic value $G_{ij}$ resulting from an $A_iA_j$ individual
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Mean value $\mu_G = \sum G_{ij} \, \mathrm{Freq}(A_iA_j)$
$\alpha_i$ = average contribution to genotypic value for allele i
24
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Since parents pass along single alleles to their offspring, the $\alpha_i$ (the average effect of allele i) represent these contributions
The genotypic value predicted from the individual allelic effects is thus
$\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$
The average effect for an allele is POPULATION-SPECIFIC, as it depends on the types and frequencies of alleles that it pairs with
25
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
The genotypic value predicted from the individual allelic effects is thus
$\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$
$G_{ij} - \hat{G}_{ij} = \delta_{ij}$
Dominance deviations --- the difference (for genotype $A_iA_j$) between the genotypic value predicted from the two single alleles and the actual genotypic value, namely any interactions (dominance) between the two alleles
26
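[Figure: genotypic value plotted against N = # copies of allele 2 (0, 1, 2) for genotypes 11, 21 and 22. The regression line passes through the predicted values $\mu + 2\alpha_1$, $\mu + \alpha_1 + \alpha_2$ and $\mu + 2\alpha_2$, with slope = $\alpha = \alpha_2 - \alpha_1$. The dominance deviations $\delta_{11}$, $\delta_{12}$ and $\delta_{22}$ are the vertical distances of G11, G21 and G22 from the line.]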
This decomposition is a regression of G
27
Fisher’s decomposition is a Regression
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
A notational change clearly shows this is a regression,
$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$
Predicted value: $\mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N$. Residual error: $\delta_{ij}$
Independent (predictor) variable N = # of A2 alleles
Note that the slope $\alpha_2 - \alpha_1 = \alpha$, the average effect of an allelic substitution
28
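$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$
Intercept: $\mu_G + 2\alpha_1$. Regression slope: $\alpha_2 - \alpha_1$
$$2\alpha_1 + (\alpha_2 - \alpha_1)N = \begin{cases} 2\alpha_1 & \text{for } N = 0 \text{, e.g., } A_1A_1 \\ \alpha_1 + \alpha_2 & \text{for } N = 1 \text{, e.g., } A_1A_2 \\ 2\alpha_2 & \text{for } N = 2 \text{, e.g., } A_2A_2 \end{cases}$$
A key point is that the average effects change with allele frequencies. Indeed, if overdominance is present they can change sign with allele frequencies.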
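The regression view can be made concrete by fitting a weighted least-squares line of genotypic value on N, with Hardy-Weinberg genotype frequencies as weights. The sketch below is one way to do this in plain Python; the function name `fisher_regression` and the example genotypic values are hypothetical, and it requires 0 < p2 < 1 so that N has nonzero variance.

```python
def fisher_regression(g11, g21, g22, p2):
    """Weighted regression of genotypic value on N = copies of allele A2.

    Weights are Hardy-Weinberg frequencies with p2 = freq(A2).
    Returns (slope, intercept, residuals), where the residuals are the
    dominance deviations for genotypes 11, 21 and 22.
    """
    p1 = 1.0 - p2
    n_copies = [0, 1, 2]
    values = [g11, g21, g22]
    weights = [p1 ** 2, 2 * p1 * p2, p2 ** 2]
    mean_n = sum(w * n for w, n in zip(weights, n_copies))
    mean_g = sum(w * g for w, g in zip(weights, values))
    cov_ng = sum(w * (n - mean_n) * (g - mean_g)
                 for w, n, g in zip(weights, n_copies, values))
    var_n = sum(w * (n - mean_n) ** 2 for w, n in zip(weights, n_copies))
    slope = cov_ng / var_n                      # alpha = alpha_2 - alpha_1
    intercept = mean_g - slope * mean_n         # mu_G + 2 * alpha_1
    residuals = [g - (intercept + slope * n) for n, g in zip(n_copies, values)]
    return slope, intercept, residuals


# Hypothetical locus with values 0, a(1+k), 2a for a = 1, k = 0.5
slope, intercept, residuals = fisher_regression(0.0, 1.5, 2.0, p2=0.3)
print(slope, intercept, residuals)
```

With these values the slope agrees with a[1 + k(p1 - p2)], and the frequency-weighted residuals sum to zero, as expected for regression residuals.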
29
[Figure: genotypic value G plotted against N (0, 1, 2), with points G11, G21 and G22]
Allele A2 common, $\alpha_1 > \alpha_2$
The size of the circle denotes the weight associated with that genotype. While the genotypic values do not change, their frequencies (and hence weights) do.
30
[Figure: genotypic value G plotted against N (0, 1, 2), with points G11, G21 and G22]
Allele A1 common, $\alpha_2 > \alpha_1$
Slope = $\alpha_2 - \alpha_1$
Again, same genotypic values as previous slide, but different weights, and hence a different slope (here a change in sign!)
31
[Figure: genotypic value G plotted against N (0, 1, 2), with points G11, G21 and G22]
Both A1 and A2 frequent, $\alpha_1 = \alpha_2 = 0$
With these allele frequencies, both alleles have the same mean value when transmitted, so that all parents have the same average offspring value -- no response to selection
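The three panels can be mimicked numerically with an overdominant locus. The sketch below uses hypothetical genotypic values (heterozygote above both homozygotes) and the closed form for the substitution effect, with A2 playing the role of allele A; the function name `substitution_effect` is an assumption.

```python
def substitution_effect(g11, g21, g22, p2):
    """Average effect of substituting A2 for A1, alpha = a + d(p1 - p2)."""
    a = (g22 - g11) / 2.0
    d = g21 - (g11 + g22) / 2.0
    p1 = 1.0 - p2
    return a + d * (p1 - p2)


# Hypothetical overdominant locus: G11 = 0, G21 = 2, G22 = 1
values = (0.0, 2.0, 1.0)
alpha_a2_common = substitution_effect(*values, p2=0.9)   # A2 common
alpha_a1_common = substitution_effect(*values, p2=0.1)   # A1 common
alpha_balanced = substitution_effect(*values, p2=2 / 3)  # both frequent
print(alpha_a2_common, alpha_a1_common, alpha_balanced)
```

The genotypic values are identical in all three calls; only the allele frequency changes, which is enough to flip the sign of the slope or drive it to zero.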
32
Consider a diallelic locus, where p1 = freq(Q1)
Genotype           Q1Q1    Q2Q1      Q2Q2
Genotypic value    0       a(1+k)    2a
Mean: $\mu_G = 2p_2 a(1 + p_1 k)$
Allelic effects: $\alpha_2 = p_1 a[1 + k(p_1 - p_2)]$, $\alpha_1 = -p_2 a[1 + k(p_1 - p_2)]$
Dominance deviations: $\delta_{ij} = G_{ij} - \mu_G - \alpha_i - \alpha_j$
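Putting the mean, the allelic effects and the residual definition together gives the dominance deviations for each genotype. This is a minimal sketch for the 0, a(1+k), 2a scale under Hardy-Weinberg frequencies; the function name and the parameter values are illustrative assumptions.

```python
def diallelic_decomposition(a, k, p1):
    """Return (mu_G, (alpha_1, alpha_2), delta, freq) for a diallelic locus.

    Genotypic values are 0, a(1+k), 2a for Q1Q1, Q2Q1, Q2Q2, with
    p1 = freq(Q1) and Hardy-Weinberg genotype frequencies.
    """
    p2 = 1.0 - p1
    genotypic = {"Q1Q1": 0.0, "Q2Q1": a * (1 + k), "Q2Q2": 2 * a}
    freq = {"Q1Q1": p1 ** 2, "Q2Q1": 2 * p1 * p2, "Q2Q2": p2 ** 2}
    mu = sum(freq[g] * genotypic[g] for g in genotypic)
    alpha = a * (1 + k * (p1 - p2))       # effect of an allelic substitution
    alpha_1, alpha_2 = -p2 * alpha, p1 * alpha
    predicted = {
        "Q1Q1": mu + 2 * alpha_1,
        "Q2Q1": mu + alpha_1 + alpha_2,
        "Q2Q2": mu + 2 * alpha_2,
    }
    # Dominance deviation = actual genotypic value minus additive prediction
    delta = {g: genotypic[g] - predicted[g] for g in genotypic}
    return mu, (alpha_1, alpha_2), delta, freq


mu, (alpha_1, alpha_2), delta, freq = diallelic_decomposition(a=1.0, k=0.5, p1=0.7)
print(mu, alpha_1, alpha_2, delta)
```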
33
Average Effects and Additive Genetic Values
The $\alpha$ values are the average effects of an allele
A key concept is the Additive Genetic Value (A) of an individual,
$A(G_{ij}) = \alpha_i + \alpha_j$
$A = \sum_{k=1}^{n} \left( \alpha_i^{(k)} + \alpha_j^{(k)} \right)$
$\alpha_i^{(k)}$ = effect of allele i at locus k
A is called the Breeding value or Additive genetic value
34
Why all the fuss over A?
$A = \sum_{k=1}^{n} \left( \alpha_i^{(k)} + \alpha_j^{(k)} \right)$
Suppose pollen parent has A = 10 and seed parent has A = -2 for plant height
Expected average offspring height is (10-2)/2 = 4 units above the population mean. Expected offspring A = average of parental A’s
KEY: parents only pass single alleles to their offspring. Hence, they only pass along the A part of their genotypic value G.
35
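Genetic Variances
Writing the genotypic value as
$G_{ij} = \mu_G + (\alpha_i + \alpha_j) + \delta_{ij}$
The genetic variance can be written as
$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2(\alpha_i^{(k)} + \alpha_j^{(k)}) + \sum_{k=1}^{n} \sigma^2(\delta_{ij}^{(k)})$
This follows since
$\sigma^2(G) = \sigma^2(\mu_G + (\alpha_i + \alpha_j) + \delta_{ij}) = \sigma^2(\alpha_i + \alpha_j) + \sigma^2(\delta_{ij})$
as $\mathrm{Cov}(\alpha, \delta) = 0$ (under random mating)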
36
Genetic Variances
$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2(\alpha_i^{(k)} + \alpha_j^{(k)}) + \sum_{k=1}^{n} \sigma^2(\delta_{ij}^{(k)})$
Hence, total genetic variance = additive + dominance variances,
$\sigma^2_G = \sigma^2_A + \sigma^2_D$
$\sigma^2_A$: Additive Genetic Variance (or simply Additive Variance)
$\sigma^2_D$: Dominance Genetic Variance (or simply Dominance Variance)
37
Key concepts (so far)
• $\alpha_i$ = average effect of allele i
– Property of a single allele in a particular population (depends on genetic background)
• A = Additive Genetic Value
– A = sum (over all loci) of average effects
– Fraction of G that parents pass along to their offspring
– Property of an individual in a particular population
• Var(A) = additive genetic variance
– Variance in additive genetic values
– Property of a population
• Can estimate A or Var(A) without knowing any of the underlying genetic detail (forthcoming)
38
$\sigma^2_A = 2E[\alpha^2] = 2\sum_{i=1}^{m} \alpha_i^2 p_i$
Since $E[\alpha] = 0$, $\mathrm{Var}(\alpha) = E[(\alpha - \mu_\alpha)^2] = E[\alpha^2]$
One locus, 2 alleles:
Genotype           Q1Q1    Q1Q2      Q2Q2
Genotypic value    0       a(1+k)    2a
$\sigma^2_A = 2p_1 p_2 a^2[1 + k(p_1 - p_2)]^2$
Dominance alters additive variance
When dominance is present, additive variance is an asymmetric function of allele frequencies
39
Dominance variance
$\sigma^2_D = E[\delta^2] = \sum_{i=1}^{m} \sum_{j=1}^{m} \delta_{ij}^2 p_i p_j$
One locus, 2 alleles:
Genotype           Q1Q1    Q1Q2      Q2Q2
Genotypic value    0       a(1+k)    2a
$\sigma^2_D = (2p_1 p_2 ak)^2$
Equals zero if k = 0
This is a symmetric function of allele frequencies
Can also be expressed in terms of d = ak
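The one-locus formulas for the additive and dominance variances can be checked against the total genetic variance computed directly from the genotype frequencies. This is a minimal sketch assuming Hardy-Weinberg frequencies and the 0, a(1+k), 2a scale; the function names and parameter values are illustrative.

```python
def variance_components(a, k, p1):
    """Return (V_A, V_D) from the one-locus, two-allele closed forms."""
    p2 = 1.0 - p1
    v_a = 2 * p1 * p2 * a ** 2 * (1 + k * (p1 - p2)) ** 2
    v_d = (2 * p1 * p2 * a * k) ** 2
    return v_a, v_d


def total_genetic_variance(a, k, p1):
    """Variance of genotypic values 0, a(1+k), 2a under Hardy-Weinberg."""
    p2 = 1.0 - p1
    freqs = [p1 ** 2, 2 * p1 * p2, p2 ** 2]
    values = [0.0, a * (1 + k), 2 * a]
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


v_a, v_d = variance_components(a=1.0, k=0.5, p1=0.7)
v_g = total_genetic_variance(a=1.0, k=0.5, p1=0.7)
print(v_a, v_d, v_g)
```

Swapping p1 and p2 leaves the dominance variance unchanged but alters the additive variance whenever k is nonzero, which is the symmetry contrast noted on the slides.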
40
Additive variance, VA, with no dominance (k = 0)
[Figure: VA plotted against allele frequency, p]
41
Complete dominance (k = 1)
[Figure: VA and VD plotted against allele frequency, p]
42
Epistasis
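The two-locus decomposition allowing for all possible interactions is given by
$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl}) + (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl}) + (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij}) + (\delta\delta_{ijkl})$
$= \mu_G + A + D + AA + AD + DD$
These components are defined to be uncorrelated (or orthogonal), so that
$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$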
43
$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl}) + (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl}) + (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij}) + (\delta\delta_{ijkl})$
$= \mu_G + A + D + AA + AD + DD$
Additive x Additive interactions -- $\alpha\alpha$, AA
interactions between a single allele at one locus with a single allele at another
Additive x Dominance interactions -- $\alpha\delta$, AD
interactions between an allele at one locus with the genotype at another, e.g. allele Ai and genotype BkBl
Dominance x dominance interaction --- $\delta\delta$, DD
the interaction between the dominance deviation at one locus with the dominance deviation at another.
44
Effects and Variance when using a tester
• A common design in plant breeding is to cross members from a population to a tester to generate a testcross.
– Tester can be either an inbred line or an outcrossing population
– Often from a different heterotic group from the population being tested
– Often tester is an elite genotype
• The average effect of an allele in a testcross, its variance, and its additive (General combining ability, GCA) and interaction (Specific combining ability, SCA) effects all follow in analogous fashion to previous results for crosses within a population
45
The average effect of an allele in a testcross
• The concept of the average effect of an allele when crossed within its population is easily extended to the average effect of an allele when crossed to a tester.
– Called the testcross average effect.
• The average effect of allele X in this testcross, $\alpha_x^T$, is defined as the difference between the mean value of offspring getting this allele from the population versus the mean value of a random offspring from this cross
– Will turn out to be a function of the frequencies of alleles in both the tested and the tester population.
46
Mean value for a testcross
Suppose the frequency of A is p in the population and pT in the tester (with q and qT similarly defined for a).
Allele from parental line    Allele from tester    Frequency    Value
A (p)                        A (pT)                ppT          C + a
A (p)                        a (qT)                pqT          C + d
a (q)                        A (pT)                qpT          C + d
a (q)                        a (qT)                qqT          C - a
Mean of cross = C + a(ppT - qqT) + d(pqT + qpT)
47
Average testcross mean in a series of RILs
• Slide 17 gave an expression for the expected average performance from a series of RILs formed by crossing two populations.
• A similar expression exists for the average testcross performance for a series of RILs from a cross of A x B
– Mean = $(1/2)\mu_A^T + (1/2)\mu_B^T$, namely the average of the testcross means for A and B
– More generally (since lines can, by chance, give an unequal contribution of alleles),
• Mean = $\lambda_A \mu_A^T + \lambda_B \mu_B^T$, where $\lambda_A = (1 - \lambda_B)$ is the fraction of alleles from A in the sample of RILs
• Can use molecular markers to estimate the $\lambda_x$ directly. Here $\lambda_x$ is the fraction of SNP alleles from line x.
48
$\alpha_A^T$, testcross effect of allele A
Suppose parent contributes A
Allele from tester parent    Probability    Genotype    Value
A                            pT             AA          C + a
a                            qT             Aa          C + d
Mean(A transmitted) = pT(C + a) + qT(C + d) = C + pTa + qTd
$\alpha_A^T$ = Mean(A transmitted) - µ = q[a + d(qT-pT)]
Likewise,
$\alpha_a^T$ = Mean(a transmitted) - µ = -p[a + d(qT-pT)]
$\alpha^T$, the average testcross effect of an allelic substitution
• $\alpha^T = \alpha_A^T - \alpha_a^T$ is the average testcross effect of an allelic substitution, the change in mean trait value when an a allele in a random testcrossed individual is replaced by an A allele
– $\alpha^T = a + d(q_T - p_T)$. Note that this is independent of the allele frequencies in the parental population, and depends ONLY on the tester allele frequencies (pT, qT).
• $\alpha_A^T = q\alpha^T$, $\alpha_a^T = -p\alpha^T$, and $E(\alpha_x^T) = 0$
50
Testcross variance
• Just as the additive genetic variance was the population variance in the sum of the average effects of an allele, the testcross variance is the variance in the average testcross effects of a random allele
– $\mathrm{Var}(A^T) = \mathrm{Var}(\alpha_x^T)$
– $\mathrm{Var}(\alpha_x^T) = p(\alpha_A^T)^2 + q(\alpha_a^T)^2$
– $= p(q[a + d(q_T - p_T)])^2 + q(-p[a + d(q_T - p_T)])^2$
• $= pq[a + d(q_T - p_T)]^2$
– Hence, $\mathrm{Var}(\alpha_x^T) = pq[a + d(q_T - p_T)]^2$
51
GCA and SCA
• Consider a cross between individuals from population 1 and population 2
• Let µ1 x 2 denote the average value for all of these crosses, and let Gij be the average genotypic value of an individual from a cross from individual (or line) i in population one and individual (or line) j from population two.
• Analogous to Fisher’s decomposition, we can write this in terms of two additive effects and one interaction effect.
52
$\alpha_i^{2}$ is the testcross average effect for allele i (more generally an allele from individual i) when tested using population 2 as a tester, with $\alpha_j^{1}$ similarly defined for allele j (from pop 2) using population 1 as the tester
$\delta_{ij}^{12}$ is the interaction between allele i from population 1 and allele j from population 2 in the testcross of 1 and 2
The sum over all loci of the $\alpha_i^{2}$ values is the general combining ability (GCA) of line i when crossed to line 2 (note these are cross-specific)
The sum of the $\delta$ is the specific combining ability (SCA)
53
$G_{ij} = \mu + GCA_i^{2} + GCA_j^{1} + SCA_{ij}^{12}$
The superscripts denoting the population in which the allele is being tested are often suppressed
The GCA is akin to the breeding value from one parent, but now it is the testcross value of that parent
The predicted mean of a particular cross is the sum of the two GCAs for those individuals/lines
As with average effects and dominance deviations, these are only defined with respect to a particular reference set of crosses (i.e., lines from Pop 1 X lines from pop 2)
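One common way to obtain GCA and SCA values from data is from a table of cross means, with rows as lines from population 1 and columns as lines from population 2. The sketch below assumes a complete, balanced table with all lines weighted equally (an assumption not stated in the lecture); the function name and the cross means are hypothetical.

```python
def combining_abilities(cross_means):
    """Decompose a table of cross means into mu, GCAs and SCAs.

    cross_means[i][j] is the mean of the cross between line i (population 1)
    and line j (population 2). Assumes a complete table and equal weights.
    """
    n_rows, n_cols = len(cross_means), len(cross_means[0])
    mu = sum(sum(row) for row in cross_means) / (n_rows * n_cols)
    # GCA of a line = its average cross performance minus the overall mean
    gca_rows = [sum(row) / n_cols - mu for row in cross_means]
    gca_cols = [sum(cross_means[i][j] for i in range(n_rows)) / n_rows - mu
                for j in range(n_cols)]
    # SCA = what remains after removing the mean and both GCAs
    sca = [[cross_means[i][j] - mu - gca_rows[i] - gca_cols[j]
            for j in range(n_cols)] for i in range(n_rows)]
    return mu, gca_rows, gca_cols, sca


table = [[10.0, 12.0],
         [14.0, 20.0]]                       # hypothetical cross means
mu, gca_rows, gca_cols, sca = combining_abilities(table)
print(mu, gca_rows, gca_cols, sca)
```

Because the GCAs and SCAs are deviations within this particular set of crosses, changing the set of lines in either population changes all of them, which is the reference-set dependence noted on the slide.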
54
Within-population crosses vs. testers
                                 Within-pop           Tester
Allelic effects                  $\alpha$             $\alpha^T$
Additive transmitting factor     Breeding value A     GCA
Predicting offspring mean        A1/2 + A2/2          GCA1 + GCA2
Genetic variances                Var(A), Var(D)       Var(GCA), Var(SCA)
Nonadditive component            Dominance value      SCA