the$effectof$par-al$sweeps$on$ diversity$

21
The effect of par-al sweeps on diversity A par-al dra7 of hitchhiking Graham Coop and Peter Ralph UC Davis

Upload: others

Post on 09-Nov-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

The  effect  of  par-al  sweeps  on  diversity  

A  par-al  dra7  of  hitchhiking  

Graham  Coop  and  Peter  Ralph  UC  Davis  

The effect of selective sweeps on linked neutral variants

sweep recovery

Reduced  diversity  high  frequency  derived  alleles    

New  muta-ons    lead  to  a  skew  towards    rare  alleles  

Maynard  Smith  and  Haigh,  Kaplan  et  al,  Braverman  et  al  etc  

Background  selec-on  can  also  lead  to  a  reduc-on  in  diversity,    but  lead  to  only  a  weak  skew  towards  rare  alleles    

Recombina-on  rate  cM/kb  

Data from Drosophila Data from Humans

e.g.  Cai  et  al  2009    

Polymorph

ism  

Rate of recombination (cM/Mb)

Data from Drosophila melanogaster

Recombina-on  rate  cM/kb  

Only a very weak skew towards singletons in low recombination regions in humans. Ta

jima’s  D

 

e.g.  Shapiro  et  al  2007  

Polymorph

ism  e.g.  Shapiro  et  al  2007  

Evidence  for  linked  selec-on    

What if most newly arisen selected alleles do not sweep to rapidly fixation? E.g. due to changing environment or genomic background (Due to parallel mutation, other variation etc)

of each copy of the favored allele, assuming an additivemodel (e.g., from [50], assuming 2NmL favored mutationsper generation). Consider the worst case, where only onepossible mutation at a single site will work, so that L = 1/3.Then, if s = 1% and 4Nem = 10-3, the expected waiting timeis extremely long, namely 300,000 generations. In thisextreme case, adaptation to the new environment is essen-tially ineffective. It has been proposed that recent populationgrowth of humans could have supplied a greater input of newmutations [51,52]; additionally, even modest populationgrowth can greatly increase fixation probabilities of favoredalleles [53]. The latter effect could substantially reduce thewaiting time for new favored mutations to start spreadingat loci with very small mutational targets, although it shouldbe noted that it is still unclear when growth in censuspopulation size began to increase the effective populationsize [54].

Thesemodels assume that environmental change, broadlydefined, is a primary driver of adaptation. If this is the case,then the results would argue that soft sweeps are likely tobe common, and perhaps the main mode of adaptation.The exact balance between hard and soft sweeps woulddepend on the distribution of mutational target sizes, whichwe do not yet know. For traits where the mutational targetsize is hundreds of base pairs or more, we can expect thatadaptation from standing variation is likely to be the ruleand, additionally, that very often multiple favored mutationsmay sweep up simultaneously, with none reaching fixationduring the selective sweep [47].

Polygenic Adaptation from Standing VariationMost of the recent literature in human population geneticsfocuses on models of selection at one, or a small numberof loci, as in the previous section. This is in contrast to clas-sical models of natural and artificial selection in quantitativegenetics, where it is assumed that most traits of interestare highly polygenic, and are influenced to a small degreeby standing variation at many loci [55]. The quantitativegenetics view is supported both by classical breeding andselection experiments and occasionally field observations[56–58], as well as by recent genome-wide associationstudies showing that many traits are highly polygenic.

We would argue that for many traits, the quantitativeperspective may be closer to reality: that is, that short-termadaptation takes place by selection on standing variation

atmany loci simultaneously (e.g., [22,59–61]). Consider a traitthat is affected by a large (finite) number of loci. If the envi-ronment shifts so that there is a new phenotypic optimum,then the population will adapt by allele frequency shifts atmany loci (Figure 3). Once the typical phenotype in the pop-ulation matches the new optimum, selection will weaken.This means that it may be very common for selection topush alleles upwards in frequency, but generally not to fixa-tion [22,62,63]. In principle, this type of process could allowvery rapid adaptation, yet be difficult to detect using mostcurrent population genetic methods.The example of human height illustrates these issues.

Height has long been a textbook example of a polygenictrait [64]; recently, three genome-wide association studiesidentified a total of around 50 loci that contribute to adultheight in Europeans [65–68]. Each associated allele affectstotal height by about 3–6mm, and together these loci explainabout 5% of the population variation in height, after control-ling for sex. Since height is extremely heritable [69], manymore loci remain to be found. If there were a sudden onsetof strong selection for increased height, we could expecta rapid upward shift in average height [55]. However, theresponse to selection would be generated by modest allelefrequency shifts at many loci that are already polymorphic.Even with very strong selection, and a strong phenotypicresponse, standard methods for detecting selective sweepswould have little power. In the final section of this review, wewill discuss possible approaches to studying polygenicadaptation.The idea that polygenic adaptation from standing variation

is important could help to explain key aspects of the data.This would allow rapid phenotypic adaptation (for example,to high altitude, as described above) without necessarilygenerating any large differences in allele frequenciesbetween populations. This could also help to explain theapparent scarcity of rapid, hard sweeps. Qualitatively,increased drift of neutral alleles within genes due to effectsof polygenic selection on nearby sites [70] should createthe types of differences that are observed between genicand non-genic regions. However, it is not yet clear whetherthe magnitude of this effect could explain the observedpatterns (background selection may also contribute [29]).Of course, we do not mean to imply that all adaptation

occurs in this way. Indeed, it is notable that some of themost impressive selection signals involve loci that act in a

Bef

ore

sele

ctio

nA

fter

sele

ctio

n

Hard sweep Polygenic adaptation

Current Biology

Figure 3. A cartoon illustration of the hardsweep and polygenic adaptation models.

The horizontal blue lines represent haplo-types, and the red lines indicate regionsthat are identical by descent (IBD). The redcircles indicate alleles that are favoredfollowing an environmental change. In themonogenic (hard selection) model, selectiondrives a new mutation to fixation, creatinga large region of IBD. In the polygenicmodel, prior to selection red alleles exist atmodest frequencies at various loci acrossthe genome. (The red alleles can be thoughtof as being alleles that all shift a particularphenotype in the same direction, e.g.,alleles that increase height.) After selection,the genome-wide abundance of favored

alleles has increased, but in this cartoon they have not fixed at any locus. In this example, at some loci selection has acted on new variants,creating signals of partial sweeps at those loci (the x-axis scale is not necessarily the same in the left- and right-hand plots).

Current Biology Vol 20 No 4R212

Pritchard,  Pickrell,  Coop  E.g.  Chevin  and  Hospital  

What if most newly arisen selected alleles do not sweep to rapidly fixation? E.g. due to changing environment or genomic background (Due to parallel mutation, other variation etc)

The  coalescent  process  with  a  par-al  sweep  A  model  of  recurrent  par-al  sweeps  Model  where  these  sweeps  occur  along  the  genome  -­‐Effect  on  diversity  -­‐Effect  on  frequency  spectrum    

0.00

0.05

0.10

0.15

0.0 0.2 0.4 0.6 0.8 1.0

sweep.traj$V1

sweep.traj$V2

B.  Coalescent  with  trajectory  

0  The  derived  allele  arose  τ    Genera-ons  ago  

X(t)  is  the  frequency  of  the    Derived  allele  at  -me  t    

Condi-ons  on  trajectory:  Selected  allele  ini-ally  quickly  increases    In  frequency.    

τ

0.00

0.05

0.10

0.15

0.0 0.2 0.4 0.6 0.8 1.0

sweep.traj$V1

sweep.traj$V2

0 neutral  site    a  gene-c  distance  r  away  from  the  selected  locus      Consider  the  coalescent  process    of  a  pair  of  lineages  at  this  site.    

0.00

0.05

0.10

0.15

0.0 0.2 0.4 0.6 0.8 1.0

sweep.traj$V1

sweep.traj$V2

0

Fail  to  recombine  off  derived  background,    forced  to  coalesce  

Recombine  off  derived  background  Not  forced  to  coalesce.    

r  

0.00

0.05

0.10

0.15

0.0 0.2 0.4 0.6 0.8 1.0

sweep.traj$V1

sweep.traj$V2

B.  Coalescent  with  trajectory  

0

t  

Probability  that  the  lineage  is  of  the  derived  type  at  -me  0 =      

r  τ >>1  

Imagine  a  neutral  site    a  gene-c  distance  r  away  from  the  selected  locus    

0.00

0.05

0.10

0.15

0.0 0.2 0.4 0.6 0.8 1.0

sweep.traj$V1

sweep.traj$V2

B.  Coalescent  with  trajectory  

0

Probability  that  2  lineages  Are  forced  to  coalesce  At  -me  0 by  sweep  =  q2  q2

0.00

0.05

0.10

0.15

0.0 0.2 0.4 0.6 0.8 1.0

sweep.traj$V1

sweep.traj$V2

B.  Coalescent  with  trajectory  

Probability  that  out  of  k  lineages  i  are  force  to  coalesce  is  binomial:  

Simple  trajectories  

Also  holds  for  other  Trajectories  when    r  >>  s2  

Selected  allele  moves  quickly  from  1/2N  to  x  in  -me  tx    Then  stays  at  x,  or  goes  to  fixa-on,  or  loss  on  a  slower  -me-­‐scale    (e.g.  with  selec-on  coefficient  s2,  -­‐s2,  or  0  respec-vely)    

q ⇡ xe

�rtx

0      tx   τ

If instead the allele on reaching x is now deleterious with a selectioncoefficient, −s2, then as long as x is not too high the deterministic decreaseof the selected allele forward in time should be well approximated by X1(t) =xe−s2(t−tx) t ≥ tx. Inserting this into equation (5), we obtain

q = xe−rtx r

r + s2(7)

If sd # r then q ≈ e−rtxx(1 − s2/r). While if s2 % r then q ≈ ertxxr/sd,in this case strong selection is acting to remove the allele means so there islittle chance of it contributing genetic material to the population.

Consideration of these two extreme cases, shows that if after the selectedallele reaches the frequency x its frequency stays close to x for a time greaterthan 1/r, then

q ≈ xe−rtx (8)

this will hold regardless of whether the derived allele eventually goes to lossor fixation, or whether it at some later point it begins to change frequencyvery rapidly again.

To demonstrate this independence of partial linked variation from thefate of the selected allele: consider an allele with an additive selective ad-vantage s that arose τ generations ago and reached frequency x within tgenerations, its trajectory follows the standard logistic curve. On reaching xits selection coefficient changes to either s2, −s2 or it remains perfectly bal-anced at frequency x. These three possible trajectories are plotted in Figure2A. I simulated using mssel neutral genealogies for a recombining sequencesurrounding the selected locus. The expected neutral pairwise coalescencetime is shown in Figure 2B and C as we move away from the selected locus.Close to the selected site the level of neutral diversity is dependent on thewhole trajectory. However, as we move further away from the selected sitethe three curves converge (as q is converging to qx for all three trajectories).

Using qx to approximate the probability that a lineage is caught by thesweep at neutral sites partially linked to the selected site the expected pair-wise coalescent time is given by

2N(1− q2xeτ/(2N)) (9)

Found by considering whether a pair of lineages coalesce before, during, orafter the sweep. This approximation is plotted as a line in figure 2. With q

as inequa-tion(8)?Hm,andFigure 2shows π,notmeancoales-cencetime?

9

tx  /2N  =  0.0053  τ/2N  =  0.05  

Average  

Recurrent  sweep  process  

•  Sweeps  happen  at  rate  ν  •  Rate  of  sweep  coalescence  ν q2

•  Neutral  rate  of  coalescence:  1/(2N)  •  Total  rate  of  coalescence  νq2 + 1/(2Ν)

 

u=muta-on  rate  at  neutral  locus.  

0 200 400 600 800 1000

0.20.4

0.60.8

1.0

x = 0.4

/e

Position, 4Nr

i = 2i = 4i = 8StepTop−hatTheory

0 200 400 600 800 1000

0.20.4

0.60.8

1.0

x = 0.8

/e

Position, 4Nr0 200 400 600 800 1000

0.20.4

0.60.8

1.0

x = 0.95

/e

Position, 4Nr

•  For  our  simple  approxima-on  

What if once the allele reaches a frequency x and then its selection co-efficient is reduced to s2? If we model the frequency of the selected alleleforward in time, after it reaches x, by X1(t) = 1 − (1 − x)e−s2(t−tx) t ≥ tx.(Assuming the allele has past the inflexion point in the logistic curve). Inthis case, using eq. (5)

q = (1 −r

s2 + r(1 − x))e−rtx (6)

Thus if the second selection coefficient is still large s2 >> r, then we obtainq = e−rtx as the selected allele has gone quickly to fixation as in a full sweep,and the only time for recombination is in the early phase of the trajectorytx. While if the second selection coefficient is weak s2 << r then q = xe−rtx .

If instead the allele on reaching x is now deleterious with a selectioncoefficient, −s2, then as long as x is not too high the deterministic decreaseof the selected allele forward in time should be well approximated by X1(t) =xe−s2(t−tx) t ≥ tx. Inserting this into equation (5), we obtain

q = xe−rtx r

r + s2(7)

If sd << r then q ≈ e−rtxx(1 − s2/r). While if s2 >> r then q ≈ ertxxr/sd,in this case strong selection is acting to remove the allele means so there islittle chance of it contributing genetic material to the population.

Thus, if after the selected allele reaches the frequency x, it then beginsto move much more slowly, such that it is close to x for a time greater than1/r. Then

q ≈ xe−rtx (8)

this will hold regardless of whether the derived allele eventually goes to lossor fixation, or whether it at some later point it begins to change frequencyvery rapidly again.

To demonstrate this independence of partial linked variation from thefate of the selected allele: consider an allele that arose T + t generations, andreached frequency x within t generations. Upon reaching x its selection coef-ficient changes to either sd, −sd or it remains perfectly balanced at frequencyx. These three possible trajectories are plotted in Figure 1A. The averagelevel of neutral diversity moving away from the selected locus is shown inFigure 1B and C. Close to the selected site the level of neutral diversity isvery dependent on the trajectory. However, as we move further away from

6

4Nν=1,    4Nν=2  4Nν=4  Approx.  

recurrent  loss  traj.  recurrent  fix  traj.  

x=0.8  

u=  muta-on  rate  π/(4Nu)  

Average  

Homogeneous  sweeps  

     

Infinitely  long  chromosome  recombining  at  rate  rBP  Sweeps  occur  homogeneous  at  rate  νBP per  genera-on

Under  our  simple  par-al  sweep  model:  J  =  x2/tx  

Where  J  depends  only  on  the  form  taken  by  the  trajectory.    

J =

Z 1

0q(r)2dr

●●●

●●●

●●●

●●●

●●

●●

●●●

● ●

●●●●●●

●●●

● ●

●●

●●

●● ●

●●●

● ●

●●

●●●

●● ●●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

0.0 0.5 1.0 1.5 2.0 2.5

01

23

45

6

rec rate (cM/Mb)

Syn

dive

rsity

(%)

tx  =  1000  gens    (s~0.1%)  

N=106  vBP  x2  =  3x10-­‐13    

x  =    100%    20%    5%  vBP=  3e-­‐13        8e-­‐12      1e-­‐10  

Assuming  none  of  the  reduc-on  is  due  to  Background  Selec-on  

Recombina-on  rate  cM/kb  

Syn.  Diversity  (%

)  Data from Drosophila melanogaster (Shapiro  et  al  2007)  

2NvBP  x2/tx=7x10-­‐9  

0.00

0.05

0.10

0.15

0.0 0.2 0.4 0.6 0.8 1.0

sweep.traj$V1

sweep.traj$V2

A.  Mul-ple  mergers  coalescent   B.  Coalescent  with  trajectory  

0

Where  Jk,i  depend  only  on  the  form  taken  by  trajectories    E.g.  for  our  simple  trajectory  J  is  a  func-on  of  x  (freq.  sweeps  achieve)    So  rate  of  coalescence  controlled  by    And  number  of  lineages  forced  to  coalesce  by  x  (or  distribu-on  on  x).  

Homogeneous  sweeps  at  rate  vBP  ,recombina-on  at  rate  rBP.    Then  i  out  of  k  lineages  coalesce  at  rate:    

⌫BP

rBP

⇡/(4Nu) = 0.1

For  same  reduc-on  in  diversity  we  can  get  very  different    distor-ons  to  frequency  spectrum  

⇡/(4Nu) = 0.1

Conclusions  

•  Par-al  sweeps  may  be  responsible  for  reduc-on  in  diversity  in  low  recombina-on  regions  

•  If  they  only  sweep  to  low  frequency  may  result  in  likle  distor-on  in  neutral  frequency  spectrum  

•  Relaxing  our  assump-ons  about  sweeps  leads  to  a  (depressingly)  much  broader  set  of  predic-ons.  

•  Idea:  Rather  than  es-ma-ng  one  model  why  not  es-mate  rates  of  different  types  of  coalescence  (Jk,i).      

Thanks  

Peter  Ralph  

Thanks  to  Yaniv  Brandvain,  Chuck  Langley,  Molly  Przeworski,    Alisa  Sedghifar,  and  Guy  Sella  for  helpful  conversa-ons