the$effectof$par-al$sweeps$on$ diversity$
TRANSCRIPT
The effect of par-al sweeps on diversity
A par-al dra7 of hitchhiking
Graham Coop and Peter Ralph UC Davis
The effect of selective sweeps on linked neutral variants
sweep recovery
Reduced diversity high frequency derived alleles
New muta-ons lead to a skew towards rare alleles
Maynard Smith and Haigh, Kaplan et al, Braverman et al etc
Background selec-on can also lead to a reduc-on in diversity, but lead to only a weak skew towards rare alleles
Recombina-on rate cM/kb
Data from Drosophila Data from Humans
e.g. Cai et al 2009
Polymorph
ism
Rate of recombination (cM/Mb)
Data from Drosophila melanogaster
Recombina-on rate cM/kb
Only a very weak skew towards singletons in low recombination regions in humans. Ta
jima’s D
e.g. Shapiro et al 2007
Polymorph
ism e.g. Shapiro et al 2007
Evidence for linked selec-on
What if most newly arisen selected alleles do not sweep to rapidly fixation? E.g. due to changing environment or genomic background (Due to parallel mutation, other variation etc)
of each copy of the favored allele, assuming an additivemodel (e.g., from [50], assuming 2NmL favored mutationsper generation). Consider the worst case, where only onepossible mutation at a single site will work, so that L = 1/3.Then, if s = 1% and 4Nem = 10-3, the expected waiting timeis extremely long, namely 300,000 generations. In thisextreme case, adaptation to the new environment is essen-tially ineffective. It has been proposed that recent populationgrowth of humans could have supplied a greater input of newmutations [51,52]; additionally, even modest populationgrowth can greatly increase fixation probabilities of favoredalleles [53]. The latter effect could substantially reduce thewaiting time for new favored mutations to start spreadingat loci with very small mutational targets, although it shouldbe noted that it is still unclear when growth in censuspopulation size began to increase the effective populationsize [54].
Thesemodels assume that environmental change, broadlydefined, is a primary driver of adaptation. If this is the case,then the results would argue that soft sweeps are likely tobe common, and perhaps the main mode of adaptation.The exact balance between hard and soft sweeps woulddepend on the distribution of mutational target sizes, whichwe do not yet know. For traits where the mutational targetsize is hundreds of base pairs or more, we can expect thatadaptation from standing variation is likely to be the ruleand, additionally, that very often multiple favored mutationsmay sweep up simultaneously, with none reaching fixationduring the selective sweep [47].
Polygenic Adaptation from Standing VariationMost of the recent literature in human population geneticsfocuses on models of selection at one, or a small numberof loci, as in the previous section. This is in contrast to clas-sical models of natural and artificial selection in quantitativegenetics, where it is assumed that most traits of interestare highly polygenic, and are influenced to a small degreeby standing variation at many loci [55]. The quantitativegenetics view is supported both by classical breeding andselection experiments and occasionally field observations[56–58], as well as by recent genome-wide associationstudies showing that many traits are highly polygenic.
We would argue that for many traits, the quantitativeperspective may be closer to reality: that is, that short-termadaptation takes place by selection on standing variation
atmany loci simultaneously (e.g., [22,59–61]). Consider a traitthat is affected by a large (finite) number of loci. If the envi-ronment shifts so that there is a new phenotypic optimum,then the population will adapt by allele frequency shifts atmany loci (Figure 3). Once the typical phenotype in the pop-ulation matches the new optimum, selection will weaken.This means that it may be very common for selection topush alleles upwards in frequency, but generally not to fixa-tion [22,62,63]. In principle, this type of process could allowvery rapid adaptation, yet be difficult to detect using mostcurrent population genetic methods.The example of human height illustrates these issues.
Height has long been a textbook example of a polygenictrait [64]; recently, three genome-wide association studiesidentified a total of around 50 loci that contribute to adultheight in Europeans [65–68]. Each associated allele affectstotal height by about 3–6mm, and together these loci explainabout 5% of the population variation in height, after control-ling for sex. Since height is extremely heritable [69], manymore loci remain to be found. If there were a sudden onsetof strong selection for increased height, we could expecta rapid upward shift in average height [55]. However, theresponse to selection would be generated by modest allelefrequency shifts at many loci that are already polymorphic.Even with very strong selection, and a strong phenotypicresponse, standard methods for detecting selective sweepswould have little power. In the final section of this review, wewill discuss possible approaches to studying polygenicadaptation.The idea that polygenic adaptation from standing variation
is important could help to explain key aspects of the data.This would allow rapid phenotypic adaptation (for example,to high altitude, as described above) without necessarilygenerating any large differences in allele frequenciesbetween populations. This could also help to explain theapparent scarcity of rapid, hard sweeps. Qualitatively,increased drift of neutral alleles within genes due to effectsof polygenic selection on nearby sites [70] should createthe types of differences that are observed between genicand non-genic regions. However, it is not yet clear whetherthe magnitude of this effect could explain the observedpatterns (background selection may also contribute [29]).Of course, we do not mean to imply that all adaptation
occurs in this way. Indeed, it is notable that some of themost impressive selection signals involve loci that act in a
Bef
ore
sele
ctio
nA
fter
sele
ctio
n
Hard sweep Polygenic adaptation
Current Biology
Figure 3. A cartoon illustration of the hardsweep and polygenic adaptation models.
The horizontal blue lines represent haplo-types, and the red lines indicate regionsthat are identical by descent (IBD). The redcircles indicate alleles that are favoredfollowing an environmental change. In themonogenic (hard selection) model, selectiondrives a new mutation to fixation, creatinga large region of IBD. In the polygenicmodel, prior to selection red alleles exist atmodest frequencies at various loci acrossthe genome. (The red alleles can be thoughtof as being alleles that all shift a particularphenotype in the same direction, e.g.,alleles that increase height.) After selection,the genome-wide abundance of favored
alleles has increased, but in this cartoon they have not fixed at any locus. In this example, at some loci selection has acted on new variants,creating signals of partial sweeps at those loci (the x-axis scale is not necessarily the same in the left- and right-hand plots).
Current Biology Vol 20 No 4R212
Pritchard, Pickrell, Coop E.g. Chevin and Hospital
What if most newly arisen selected alleles do not sweep to rapidly fixation? E.g. due to changing environment or genomic background (Due to parallel mutation, other variation etc)
The coalescent process with a par-al sweep A model of recurrent par-al sweeps Model where these sweeps occur along the genome -‐Effect on diversity -‐Effect on frequency spectrum
0.00
0.05
0.10
0.15
0.0 0.2 0.4 0.6 0.8 1.0
sweep.traj$V1
sweep.traj$V2
B. Coalescent with trajectory
0 The derived allele arose τ Genera-ons ago
X(t) is the frequency of the Derived allele at -me t
Condi-ons on trajectory: Selected allele ini-ally quickly increases In frequency.
τ
0.00
0.05
0.10
0.15
0.0 0.2 0.4 0.6 0.8 1.0
sweep.traj$V1
sweep.traj$V2
0 neutral site a gene-c distance r away from the selected locus Consider the coalescent process of a pair of lineages at this site.
0.00
0.05
0.10
0.15
0.0 0.2 0.4 0.6 0.8 1.0
sweep.traj$V1
sweep.traj$V2
0
Fail to recombine off derived background, forced to coalesce
Recombine off derived background Not forced to coalesce.
r
0.00
0.05
0.10
0.15
0.0 0.2 0.4 0.6 0.8 1.0
sweep.traj$V1
sweep.traj$V2
B. Coalescent with trajectory
0
t
Probability that the lineage is of the derived type at -me 0 =
r τ >>1
Imagine a neutral site a gene-c distance r away from the selected locus
0.00
0.05
0.10
0.15
0.0 0.2 0.4 0.6 0.8 1.0
sweep.traj$V1
sweep.traj$V2
B. Coalescent with trajectory
0
Probability that 2 lineages Are forced to coalesce At -me 0 by sweep = q2 q2
0.00
0.05
0.10
0.15
0.0 0.2 0.4 0.6 0.8 1.0
sweep.traj$V1
sweep.traj$V2
B. Coalescent with trajectory
Probability that out of k lineages i are force to coalesce is binomial:
Simple trajectories
Also holds for other Trajectories when r >> s2
Selected allele moves quickly from 1/2N to x in -me tx Then stays at x, or goes to fixa-on, or loss on a slower -me-‐scale (e.g. with selec-on coefficient s2, -‐s2, or 0 respec-vely)
q ⇡ xe
�rtx
0 tx τ
If instead the allele on reaching x is now deleterious with a selectioncoefficient, −s2, then as long as x is not too high the deterministic decreaseof the selected allele forward in time should be well approximated by X1(t) =xe−s2(t−tx) t ≥ tx. Inserting this into equation (5), we obtain
q = xe−rtx r
r + s2(7)
If sd # r then q ≈ e−rtxx(1 − s2/r). While if s2 % r then q ≈ ertxxr/sd,in this case strong selection is acting to remove the allele means so there islittle chance of it contributing genetic material to the population.
Consideration of these two extreme cases, shows that if after the selectedallele reaches the frequency x its frequency stays close to x for a time greaterthan 1/r, then
q ≈ xe−rtx (8)
this will hold regardless of whether the derived allele eventually goes to lossor fixation, or whether it at some later point it begins to change frequencyvery rapidly again.
To demonstrate this independence of partial linked variation from thefate of the selected allele: consider an allele with an additive selective ad-vantage s that arose τ generations ago and reached frequency x within tgenerations, its trajectory follows the standard logistic curve. On reaching xits selection coefficient changes to either s2, −s2 or it remains perfectly bal-anced at frequency x. These three possible trajectories are plotted in Figure2A. I simulated using mssel neutral genealogies for a recombining sequencesurrounding the selected locus. The expected neutral pairwise coalescencetime is shown in Figure 2B and C as we move away from the selected locus.Close to the selected site the level of neutral diversity is dependent on thewhole trajectory. However, as we move further away from the selected sitethe three curves converge (as q is converging to qx for all three trajectories).
Using qx to approximate the probability that a lineage is caught by thesweep at neutral sites partially linked to the selected site the expected pair-wise coalescent time is given by
2N(1− q2xeτ/(2N)) (9)
Found by considering whether a pair of lineages coalesce before, during, orafter the sweep. This approximation is plotted as a line in figure 2. With q
as inequa-tion(8)?Hm,andFigure 2shows π,notmeancoales-cencetime?
9
tx /2N = 0.0053 τ/2N = 0.05
Average
Recurrent sweep process
• Sweeps happen at rate ν • Rate of sweep coalescence ν q2
• Neutral rate of coalescence: 1/(2N) • Total rate of coalescence νq2 + 1/(2Ν)
u=muta-on rate at neutral locus.
0 200 400 600 800 1000
0.20.4
0.60.8
1.0
x = 0.4
/e
Position, 4Nr
i = 2i = 4i = 8StepTop−hatTheory
0 200 400 600 800 1000
0.20.4
0.60.8
1.0
x = 0.8
/e
Position, 4Nr0 200 400 600 800 1000
0.20.4
0.60.8
1.0
x = 0.95
/e
Position, 4Nr
• For our simple approxima-on
What if once the allele reaches a frequency x and then its selection co-efficient is reduced to s2? If we model the frequency of the selected alleleforward in time, after it reaches x, by X1(t) = 1 − (1 − x)e−s2(t−tx) t ≥ tx.(Assuming the allele has past the inflexion point in the logistic curve). Inthis case, using eq. (5)
q = (1 −r
s2 + r(1 − x))e−rtx (6)
Thus if the second selection coefficient is still large s2 >> r, then we obtainq = e−rtx as the selected allele has gone quickly to fixation as in a full sweep,and the only time for recombination is in the early phase of the trajectorytx. While if the second selection coefficient is weak s2 << r then q = xe−rtx .
If instead the allele on reaching x is now deleterious with a selectioncoefficient, −s2, then as long as x is not too high the deterministic decreaseof the selected allele forward in time should be well approximated by X1(t) =xe−s2(t−tx) t ≥ tx. Inserting this into equation (5), we obtain
q = xe−rtx r
r + s2(7)
If sd << r then q ≈ e−rtxx(1 − s2/r). While if s2 >> r then q ≈ ertxxr/sd,in this case strong selection is acting to remove the allele means so there islittle chance of it contributing genetic material to the population.
Thus, if after the selected allele reaches the frequency x, it then beginsto move much more slowly, such that it is close to x for a time greater than1/r. Then
q ≈ xe−rtx (8)
this will hold regardless of whether the derived allele eventually goes to lossor fixation, or whether it at some later point it begins to change frequencyvery rapidly again.
To demonstrate this independence of partial linked variation from thefate of the selected allele: consider an allele that arose T + t generations, andreached frequency x within t generations. Upon reaching x its selection coef-ficient changes to either sd, −sd or it remains perfectly balanced at frequencyx. These three possible trajectories are plotted in Figure 1A. The averagelevel of neutral diversity moving away from the selected locus is shown inFigure 1B and C. Close to the selected site the level of neutral diversity isvery dependent on the trajectory. However, as we move further away from
6
4Nν=1, 4Nν=2 4Nν=4 Approx.
recurrent loss traj. recurrent fix traj.
x=0.8
u= muta-on rate π/(4Nu)
Average
Homogeneous sweeps
Infinitely long chromosome recombining at rate rBP Sweeps occur homogeneous at rate νBP per genera-on
Under our simple par-al sweep model: J = x2/tx
Where J depends only on the form taken by the trajectory.
J =
Z 1
0q(r)2dr
●●●
●
●●●
●●●
●●●
●●
●
●●
●●●
● ●
●
●●●●●●
●●●
● ●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●● ●
●●●
●
●
●
● ●
●●
●
●
●●●
●
●
●
●● ●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
● ●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
0.0 0.5 1.0 1.5 2.0 2.5
01
23
45
6
rec rate (cM/Mb)
Syn
dive
rsity
(%)
tx = 1000 gens (s~0.1%)
N=106 vBP x2 = 3x10-‐13
x = 100% 20% 5% vBP= 3e-‐13 8e-‐12 1e-‐10
Assuming none of the reduc-on is due to Background Selec-on
Recombina-on rate cM/kb
Syn. Diversity (%
) Data from Drosophila melanogaster (Shapiro et al 2007)
2NvBP x2/tx=7x10-‐9
0.00
0.05
0.10
0.15
0.0 0.2 0.4 0.6 0.8 1.0
sweep.traj$V1
sweep.traj$V2
A. Mul-ple mergers coalescent B. Coalescent with trajectory
0
Where Jk,i depend only on the form taken by trajectories E.g. for our simple trajectory J is a func-on of x (freq. sweeps achieve) So rate of coalescence controlled by And number of lineages forced to coalesce by x (or distribu-on on x).
Homogeneous sweeps at rate vBP ,recombina-on at rate rBP. Then i out of k lineages coalesce at rate:
⌫BP
rBP
⇡/(4Nu) = 0.1
For same reduc-on in diversity we can get very different distor-ons to frequency spectrum
⇡/(4Nu) = 0.1
Conclusions
• Par-al sweeps may be responsible for reduc-on in diversity in low recombina-on regions
• If they only sweep to low frequency may result in likle distor-on in neutral frequency spectrum
• Relaxing our assump-ons about sweeps leads to a (depressingly) much broader set of predic-ons.
• Idea: Rather than es-ma-ng one model why not es-mate rates of different types of coalescence (Jk,i).