the effec otf recombination on background selection*€¦ · effect of background selection than...

16
Genet. Res., Camb. (1996), 61, pp. 159-174 With 3 text-figures Copyright © 1996 Cambridge University Press 159 The effect of recombination on background selection* MAGNUS NORDBORGf, BRIAN CHARLESWORTH AND DEBORAH CHARLESWORTH Department of Ecology & Evolution, The University of Chicago, 1101 E. 57th St, Chicago, IL 60637-1573, USA. Tel: (312) 702-1040, Fax: {312) 702-9740, E-mail: [email protected] (Received 3 July 1995 and in revised form 20 October 1995) Summary An approximate equation is derived, which predicts the effect on variability at a neutral locus of background selection due to a set of partly linked deleterious mutations. Random mating, multiplicative fitnesses, and sufficiently large population size that the selected loci are in mutation/selection equilibrium are assumed. Given these assumptions, the equation is valid for an arbitrary genetic map, and for an arbitrary distribution of selection coefficients across loci. Monte Carlo computer simulations show that the formula performs well for small population sizes under a wide range of conditions, and even seems to apply when there are epistatic fitness interactions among the selected loci. Failure occurred only with very weak selection and tight linkage. The formula is shown to imply that weakly selected mutations are more likely than strongly selected mutations to produce regional patterning of variability along a chromosome in response to local variation in recombination rates. Loci at the extreme tip of a chromosome experience a smaller effect of background selection than loci closer to the centre. It is shown that background selection can produce a considerable overall reduction in variation in organisms with small numbers of chromosomes and short maps, such as Drosophila. Large overall effects are less likely in species with higher levels of genetic recombination, such as mammals, although local reductions in regions of reduced recombination might be detectable. 1. Introduction The possible effects of natural selection on variation and evolution at neutral or weakly selected linked loci have recently attracted a good deal of attention (Birky & Walsh, 1988; Kaplan et al. 1989; Stephan et al. 1992; Charlesworth et al. 1993; Wiehe & Stephan, 1993; Barton, 1994, 1995; Charlesworth, 1994; Gillespie, 1994; Hudson, 1994; Hudson & Kaplan, 1994, 1995; Braverman et al. 1995; Charlesworth et al. 1995; Simonsen et al. 1995), 20 yr after the pioneering work on this subject (Maynard Smith & Haigh, 1974; Ohta & Kimura, 1975; Thomson, 1977). In addition, the effect of artificial selection in reducing the effective population size at neutral loci has been studied theoretically (Santiago & Caballero, 1995; Santiago, in preparation). This work has in part been * This paper is dedicated to Richard Lewontin on the occasion of his 65th birthday, t Corresponding author. stimulated by data from surveys of DNA variation in natural populations of Drosophila, which consistently indicate that genetic variability is lower for loci in regions where genetic recombination is relatively infrequent, compared with regions in which it occurs at higher frequencies (Aquadro et al. 1994; Kreitman & Wayne, 1994). In addition, codon bias in D. melanogaster appears to be lower in regions of reduced recombination, suggesting that selection at weakly selected sites is less effective when recombination is infrequent (Kliman & Hey, 1993). Two main hypotheses have been proposed to explain these observations. The 'selective sweep' model (Berry el al. 1991) appeals to hitch-hiking of neutral (or weakly selected) variants by favourable mutations that arise at closely linked loci, and which cause a substantial loss in variation as a result of the fixation of surrounding chromosomal regions (Maynard Smith & Haigh, 1974; Kapian et al. 1989; Stephan et al. 1992; Wiehe & Stephan, 1993; Barton, 1994, 1995; Braverman et al. 1995; Simonsen et al. https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619 Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

Upload: others

Post on 19-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

  • Genet. Res., Camb. (1996), 61, pp. 159-174 With 3 text-figures Copyright © 1996 Cambridge University Press 159

    The effect of recombination on background selection*

    MAGNUS NORDBORGf, BRIAN CHARLESWORTHAND DEBORAH CHARLESWORTHDepartment of Ecology & Evolution, The University of Chicago, 1101 E. 57th St, Chicago, IL 60637-1573, USA. Tel: (312) 702-1040,Fax: {312) 702-9740, E-mail: [email protected]

    (Received 3 July 1995 and in revised form 20 October 1995)

    Summary

    An approximate equation is derived, which predicts the effect on variability at a neutral locus ofbackground selection due to a set of partly linked deleterious mutations. Random mating,multiplicative fitnesses, and sufficiently large population size that the selected loci are inmutation/selection equilibrium are assumed. Given these assumptions, the equation is valid for anarbitrary genetic map, and for an arbitrary distribution of selection coefficients across loci. MonteCarlo computer simulations show that the formula performs well for small population sizes undera wide range of conditions, and even seems to apply when there are epistatic fitness interactionsamong the selected loci. Failure occurred only with very weak selection and tight linkage. Theformula is shown to imply that weakly selected mutations are more likely than strongly selectedmutations to produce regional patterning of variability along a chromosome in response to localvariation in recombination rates. Loci at the extreme tip of a chromosome experience a smallereffect of background selection than loci closer to the centre. It is shown that background selectioncan produce a considerable overall reduction in variation in organisms with small numbers ofchromosomes and short maps, such as Drosophila. Large overall effects are less likely in specieswith higher levels of genetic recombination, such as mammals, although local reductions in regionsof reduced recombination might be detectable.

    1. Introduction

    The possible effects of natural selection on variationand evolution at neutral or weakly selected linked locihave recently attracted a good deal of attention (Birky& Walsh, 1988; Kaplan et al. 1989; Stephan et al.1992; Charlesworth et al. 1993; Wiehe & Stephan,1993; Barton, 1994, 1995; Charlesworth, 1994;Gillespie, 1994; Hudson, 1994; Hudson & Kaplan,1994, 1995; Braverman et al. 1995; Charlesworth etal. 1995; Simonsen et al. 1995), 20 yr after thepioneering work on this subject (Maynard Smith &Haigh, 1974; Ohta & Kimura, 1975; Thomson, 1977).In addition, the effect of artificial selection in reducingthe effective population size at neutral loci has beenstudied theoretically (Santiago & Caballero, 1995;Santiago, in preparation). This work has in part been

    * This paper is dedicated to Richard Lewontin on the occasion ofhis 65th birthday,

    t Corresponding author.

    stimulated by data from surveys of DNA variation innatural populations of Drosophila, which consistentlyindicate that genetic variability is lower for loci inregions where genetic recombination is relativelyinfrequent, compared with regions in which it occursat higher frequencies (Aquadro et al. 1994; Kreitman& Wayne, 1994). In addition, codon bias in D.melanogaster appears to be lower in regions of reducedrecombination, suggesting that selection at weaklyselected sites is less effective when recombination isinfrequent (Kliman & Hey, 1993).

    Two main hypotheses have been proposed toexplain these observations. The 'selective sweep'model (Berry el al. 1991) appeals to hitch-hiking ofneutral (or weakly selected) variants by favourablemutations that arise at closely linked loci, and whichcause a substantial loss in variation as a result of thefixation of surrounding chromosomal regions(Maynard Smith & Haigh, 1974; Kapian et al. 1989;Stephan et al. 1992; Wiehe & Stephan, 1993; Barton,1994, 1995; Braverman et al. 1995; Simonsen et al.

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 160

    1995; Stephan, 1995). The 'background selection'model involves the loss of neutral or nearly neutralvariants as a result of elimination of linked deleteriousmutant alleles from the population (Charlesworth etal. 1993; Charlesworth, 1994; Hudson, 1994; Hudson& Kaplan, 1994, 1995; Barton, 1995; Charlesworth etal. 1995).

    To evaluate the relative contributions of theseprocesses to the patterns discerned in the data, it isnecessary to develop theoretical models which aresufficiently realistic to allow quantitative predictionsabout the relations between chromosomal location(i.e. recombinational environment) and genetic varia-bility and rate of evolution at neutral or weaklyselected loci. Since genetic recombination occurs at alow rate even in regions of the Drosophila genomewhere its frequency is reduced, models which includethe effects of genetic recombination are essential forthis purpose. Although useful results can be obtainedby computer simulation (Charlesworth et al. 1993,1995; Braverman et al. 1995; Hudson & Kaplan,1995; Simonsen et al. 1995), it is clearly valuable tohave analytic results, provided that sufficiently ac-curate approximations to reality can be achieved.Several such models have been developed for theanalysis of selective sweeps (Maynard Smith & Haigh,1974; Kaplan et al. 1989; Stephan et al. 1992; Wiehe& Stephan, 1993; Stephan, 1995). With respect tobackground selection, Hudson & Kaplan (1994) haveobtained a formula for the reduction in geneticdiversity at a neutral locus, taking into accountselection at a single partly linked locus subject torecurrent deleterious mutation. Barton (1995) deriveda similar result for the fixation probability of afavourable allele. Hudson & Kaplan (1995) derivedpredictions of the effect of mutation at multiple locisubject to selection on variability at a linked neutrallocus, using the simplifying assumptions of onecrossover per chromosome, and equal selective effectsat each locus. They used their results to predict thepattern of genetic variability as a function of chromo-somal location in the genome of D. melanogaster.

    In this paper, we present an alternative derivationof the effect on variability at a neutral site due to alinked locus subject to deleterious mutations, andshow how an arbitrary number of selected loci can betreated without the restrictive assumptions made byHudson & Kaplan (1995). We also investigate theconditions required for the analytical results to hold,and relate our results to those on the reduction ineffective population size due to selection, derived bySantiago & Caballero (1995). We use computersimulations to evaluate the performance of theformulae under a wide range of parameter values.These results have been used to predict patterns ofvariability in Drosophila under less restrictive assump-tions than those made by Hudson & Kaplan (1995)(B. Charlesworth, submitted).

    2. Analytical results

    (i) Formulation of the model

    The model assumes m autosomal diallelic loci subjectto mutation-selection balance, in a randomly matingpopulation. The wild-type allele at the rth locus isdenoted by A, and the mutant allele by at; theirfrequencies are p, and qt = \ —pt, respectively. Themutant alleles are assumed to be so rare that terms oforder q\ are negligible. Selection can thus be assumedto take place exclusively against heterozygous carriersof mutant alleles. Let the mutation rate from the wild-type to the mutant allele at the rth locus be ut. Themean number of new mutations per diploid individualis U = 2 Tit ut. Let the fitness of mutant heterozygotesrelative to that of wild-type homozygotes be 1 —1{ (t isthe product of the selection coefficient for mutanthomozygotes, s, and the dominance coefficient h). Ifthe fitness effects of different loci are multiplicative, aswill be assumed in this section, there is no linkagedisequilibrium in an infinite population at equilibriumunder mutation and selection (Felsenstein, 1965;Feldman et al. 1980; Charlesworth, 1990), and theequilibrium allele frequency at the rth locus isapproximately u{jtt, provided that u,

  • Recombination and background selection 161

    segregating for a pair of alleles. Allele B is arbitrarilychosen as the allele which originates by mutation. Theexpected nucleotide site diversity at statistical equi-librium, n, is thus given by the expectation of G.Knowing the mean and variance terms for thefrequencies of B in the different gamete classes, it isstraightforward (see Appendix) to apply the lineardifferential operator method of Ohta & Kimura (1969)in order to obtain an expression for n.

    The final expression for the ratio of n to theclassical neutral value, n0, is found to be

    n - i (3)

    where pi = fi/ti and ft = rt(\ — tt). This formula isequivalent to equation (3) of Hudson & Kaplan(1995), with the minor difference that their formulainvolves r, not rt. It is also similar to the formuladerived by Barton (1995) for the probability of fixationof a favourable mutation linked to a locus subject tomutation and selection. It is easily seen that, to theorder of the approximations used here, equation (3)with no recombination is equivalent to the expression77 = f0 na (where f0 = 1 — qt in this case) derivedpreviously for the case of an arbitrary number of loci(Charlesworth et al. 1993).

    Some modification is needed for the case of X-linked mutations, which are selected against in thehemizygous state in males (assuming male hetero-gamety). The largest difference from the autosomalcase is likely to be when all deleterious mutations actin both males and females. The equilibrium frequencyof a mutant allele at locus / is then qi = ujtt, wherett = (2ti + s()/3, and s( is its homozygous or hemi-zygous effect on fitness (Haldane, 1927). This can besubstituted into equation (3), replacing tt by t{. Ifselection acts only in males, tt = s,/3; if its acts onlyon females, t{ = 2tJ3. The averaging of recombinationrates across the sexes must be done differently fromthe autosomal case, since an X-linked gene spendstwo-thirds of its time in the homogametic sex,compared with one-half for autosomes. The effectivepopulation size for the classical neutral case must alsobe adjusted appropriately (Charlesworth 1994;Caballero, 1995; Nagylaki, 1995).

    (iii) The effects of multiple loci

    The above results can be generalized (see Appendix)to incorporate the effects of multiple selected loci.With small effects of each locus, we have

    ai

    i 0 +Pt)(4)

    This is similar to equation (5) of Hudson & Kaplan(1995), which was derived on the assumption of equalselective effects of each locus, and a single chromosomewith at most one crossover per chromosome. As with

    their formula, the summation can be replaced byintegration over the genomic region in question, ifselected loci are sufficiently densely packed intochromosomes that a chromosome can be treated as acontinuum.

    We can then assume that loci are distributeduniformly along the physical map of the genomicregion in question, and denote by a variable z thephysical position of a locus subjects to mutation andselection. The frequency of recombination between aselected locus at position z and the neutral site underconsideration is denoted by r(z). Let u be the frequencyof new mutations per unit physical distance for ahaploid genome. The joint probability density of uand the selection coefficient t (i.e. the 'mutationspectrum') is

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 162

    (M J> t), and for loci which are not close to the ends(so that PQM $> t), we have

    U(10)

    i.e. the proportional effect of background selection isapproximately the same as the density of newmutations per map unit, as pointed out previously byHudson & Kaplan (1994, 1995) and Barton (1995) forthe case of a neutral locus located in the centre of ablock of selected loci. Remarkably, the effect ofbackground selection with M t> Ms virtually inde-pendent of the position of the neutral locus, except atthe extreme ends of the chromosome, so that thisresult holds for almost any chromosomal regionwhere the recombination rate per nucleotide isapproximately independent of position. There is,however, an edge effect, as may be seen by comparingthe value of E for the case of a neutral locus at theextreme left end of the chromosome (PQ = 0, E =U/{2[t + M(\-t)]}) with the value for a locus in themiddle (P = Q = 1/2, E x U/[M-(\ -/)]). This im-plies that there is a weaker effect of backgroundselection on a neutral site located at the end of achromosome than on one in the middle, as would beexpected from the fact that it experiences the effects ofselection only from one side.

    (iv) The effect of the strength of selection

    It is useful to look more closely at the qualitativebehaviour of the results derived in the previoussections. Equation (3), the background selection effectdue to a single selected locus, can be furtherapproximated by

    — ~\ —n0 i

    (11)

    using q( « ujt, and assuming small tt. As is intuitivelyexpected, diversity decreases with lower rt and higherut. The effect of the selection coefficient is slightlymore complicated. Differentiating (11) with respect tot(, we see that the effect on diversity increases as ttdecreases, up to a maximum at /( = r(, and decreasesthereafter. In other words, a weakly selected locus cancause strong background selection if it is tightlylinked, but its importance declines rapidly withincreasing rt, whereas a strongly selected locus causesweaker background selection (i.e. greater n/n0), butcan do so from a greater distance.

    It is thus clear that only mutations within a certaindistance will matter when we sum over all loci toobtain the total effect. From the above discussion, it isalso clear that this distance depends on t. For weaklyselected mutations, the total effect comes fromsumming over a few closely linked loci, each causingstrong background selection, whereas for stronglyselected loci the total effect comes from summing over

    1

    0-8

    S-0-4

    0-2

    0-8

    2 0-6

    : div

    ! 0-4

    10-2

    («)

    /

    /

    (b)\

    V

    /\

    /f

    ItIt

    ItIt

    ItIt

    It

    Physical position

    Fig. 1. The effect of background selection on achromosome of approximate map length 1 Morgan withreduced recombination in the centre. Plot (a) shows therelationship assumed between physical distance and mapdistance [r(z) = (z + sinz)/6]. Plot (b) shows n/n0 withC/= 0-1 for weak ( ) (t = 0-0003) and strong ( )(/ = 003) selection. Strong selection gives a smootherrelationship between the recombination rate and the effectof background selection. Note also that whether stronglyor weakly selected mutations result in strongerbackground selection depends on the position on themap.

    a larger fraction of the chromosome, each locuscausing relatively weak background selection. If thereis a distribution of mutation effects, weakly selectedmutations may cause strong background selection inregions of reduced recombination but be unimportantin other parts of the genome. Also, since the regionimportant for background selection is smaller whenmutations are weaker, the resulting effect is expectedto follow any variation in recombination rate moreclosely if the mutations are weak rather than strong.This is illustrated in Fig. 1. The expected values ofn/n0 from equation (7) are shown in Fig. 1 b for twoselection coefficients, for a chromosome with anapproximate map length of one Morgan with reducedrecombination in the centre (Fig. 1 a). Fig. 1 alsoillustrates the edge effect referred to in the previoussection. Near the tips of the chromosome, 7T/TT0increases sharply. Notice that the region affected ismuch smaller under weak than under strong selection,in accordance with the argument just given: whenselection is weak, only closely linked loci matter.

    3. Simulation methods

    The approximations derived in Section 2 were appliedto results from Monte Carlo simulations of smallpopulations. The program was designed to follow

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • Recombination and background selection 163

    closely the Wright-Fisher sampling model (e.g. Ewens,1979, pp. 16-19). Populations of Ndiploid individualswere simulated, following the general method ofFelsenstein & Yokoyama (1976) and Charlesworth elal. (1993). Pseudo-random numbers were generatedusing ACG from the GNU C+ +Library (FreeSoftware Foundation, 1992). Details of the programand source code can be obtained from the authors.

    Each chromosome in a haploid genome wasassumed to have a large number of loci subject todeleterious mutations (assumed to be the same for allchromosomes when genomes with more than onechromosome were modelled). The exact number ofloci is not important as long as it is large enough(compared to the mutation rate and the expectedsurvival time of a deleterious mutation) for theprobability of double mutations (i.e. a gamete carryinga mutant allele at a certain locus experiencing anothermutation at this locus) to be negligible. When doublemutations did occur, the resulting alleles were given aselective disadvantage equal to the sum of those of thetwo mutations. This occurred too rarely to beimportant. If a mutation became fixed (again, thishappened with extremely low probability), it wastreated as a new wild-type allele.

    Fitnesses were calculated according to the fitnessfunction in use (see below). These values were thendivided by the greatest individual fitness present(almost always 1). An individual was then drawn atrandom from the population, and used as the firstparent if its fitness was greater than a uniform variatein the interval [0,1). If not, then a new individual wasdrawn (possibly the same), and the process wasrepeated. Using the parent, a gamete was formed asfollows. A number of recombination events perchromosome was generated as a Poisson variate withmean equal to the total map length of the chromosome,and positions uniformly distributed over the chromo-some (i.e. no interference). Finally, mutations wereadded to the new gamete, the number being de-termined by a Poisson variate with mean equal to thehaploid mutation rate, and the positions again beinguniformly distributed. A second new gamete wascreated by the same method (and could originate fromthe same parent as the first one). This procedure wasrepeated until N new individuals had been created.

    Several different mutation schemes were used. Inthe simplest case, all mutations were identical, but theprogram could also model weaker selection on every«th site, or could pick the selective effects from agamma distribution. The fitness of an individual withn heterozygous mutations, wn, was calculated by oneof three methods, namely:

    (i) Multiplicative. Fitness was given by the productof the single-locus fitnesses. For instance, in the caseof n identical mutations with heterozygous selectioncoefficient t, wn = (1 — /)".

    (ii) Additive. Fitness was equal to one minus thesum of the single-locus selection coefficients, or zero,

    if the sum is negative. For instance, in the case of nidentical mutations with heterozygous selection coef-ficient /, wn = max(l — nt,0).

    (iii) Synergistic epistasis. Each additional mutationhad a progressively larger effect, following theexponential quadratic model described byCharlesworth (1990). This model is defined only formutations of equal effect, for which wn =exp —«(a + «/?/2). The fitness of an individual with amixture of heterozygous and homozygous mutationswas determined by the method described byCharlesworth et al. (1991).

    To monitor the effects of the selective backgroundon neutral loci, the method of Charlesworth et al.(1993) was used. The neutral mutation process in itselfwas not modelled, but neutral alleles were introduced,one at a time, and observed until either fixed or lost.The number of generations until either fixation orloss, T, was recorded, as was the quantity

    = 2 2 *,(!-*,), (12)

    where x, is the frequency of the allele. The geneticdiversity under background selection, relative to theclassical neutral value, was calculated as n/n0 = H/2(Charlesworth et al. 1993).

    A number of neutral loci were placed at evenlyspaced intervals from the centre of each chromosome,and the simulation was run until a given number ofalleles (usually 16000) had been fixed or lost at eachsite. When more than a single neutral allele was to beintroduced in the population, they were introduced indifferent individuals. The purpose of this was to speedup the simulations by allowing several simultaneousobservations of different loci. Although the values ofH observed for different sites are not independent, themean properties we are measuring should not beaffected. However, as is shown in Section 4(vii), theposition of a neutral site can sometimes influence itsdistribution, and the simulation method used providesa convenient way of investigating this. Thus, whenresults from neutral sites are pooled in what follows,only sites deemed to be free of such effects were used.Confidence intervals for n/n0 were calculated as-suming that the means of the H/2 values overreplicates were approximately normally distributed.Although the individual values of H/2 are far fromnormally distributed, this procedure is justified by thevery large sample sizes.

    Except where otherwise noted, the predicted meanswere calculated from equation (7), with P = Q = 1/2.We used two extremes for r(z), either the linear one[equation (8)], or Haldane's mapping function(Haldane, 1919),

    r{z) =1-e"2~

    03)

    which, like the simulations, assumes no interference.In addition to making it possible to evaluate the

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 164

    integral (7) explicitly, these two functions serve asuseful limits. Because the map distance increases fastwith physical distance when there are no doublecrossovers, n/n0 is always smaller when (13) is usedthan with (8). However, as most of the effect iscontributed by closely linked loci [cf. Section 2(iv)],the precise choice of mapping function rarely mattersgreatly. Other mapping functions, which allow forsome degree of positive interference among crossovers(e.g. McPeek & Speed, 1995), will give resultsintermediate between the above.

    Except where noted, we used a population size ofTV = 3200, and a single chromosome. As in previousstudies (Charlesworth et al. 1993; Hudson & Kaplan,1995), approximations for the effect of backgroundselection derived under the assumption of largepopulation size were found to work remarkably wellfor small N (see Section 5). The number of neutral lociper chromosome varied, as did the required number ofobservations (introductions followed by loss or fix-ation) per locus, but all simulations recorded at leasta total of one and a half million such observations. Itwas verified that the frequency of fixations was notsignificantly different from the expected \/2N. Weassumed a dominance coefficient of h = 0-2, so thatonly values of t are given below. The main variablesinvestigated in the simulations were the map length,M, the diploid mutation rate, U, and the selectionscheme (the distribution of effects of mutations andtheir interaction). The parameter values were chosenpartly to test the model with simulations usingparameters comparable with previous work(Charlesworth et al. 1993, 1995; Hudson & Kaplan,1994, 1995), and partly because they (barring N) liewithin biologically reasonable bounds.

    4. Simulation results

    (i) Identical mutations, multiplicative selection,short maps

    The approximate equation (7) for the effect ofbackground selection in a region with recombinationwas derived assuming multiplicative interactions be-tween loci of equal effect, and should thereforeaccurately predict the outcome of simulations incor-porating this assumption. Cases 1-3 of Table 1 showthat this expectation is fulfilled, regardless of themapping function used for the theoretical expecta-tions. The results agree with those obtained by Hudson& Kaplan (1995), who simulated the same cases,allowing only a single crossover per chromosome.

    (ii) Effect of map length

    The choice of mapping function is expected to be mostimportant for longer maps. Cases 4 and 5 of Table 1

    include the results of two simulations with long maps(M = 1), the first with a low mutation rate (U = 01),and the second with a high mutation rate {U = 0-4). Inthe latter case, when background selection is strong,the prediction using Haldane's mapping function issignificantly better than that obtained using equation(8), but even here the difference is not great, illustratingthe conclusion already noted that, because most of thebackground selection effect is contributed by closelylinked loci, the recombination function does notmatter much.

    The most important effect of map length, however,is that, ceteris paribus, the shorter the map, thestronger the background selection. This is illustratedby cases 5-8 in Table 1 where we decrease the maplength from M = 1 to M = 0-1, while keeping every-thing else constant (see also Fig. 2).

    (iii) Weak selection

    The analysis in Section 2 assumes that the sojourntime of a deleterious allele is so short that a linkedneutral allele contributes nothing to nucleotide di-versity (see Section 5). If selection is weak, this latterassumption will not hold, and the approximationsshould overestimate the effect of background selection.Cases 9-12 of Table 1 show the results of simulationswith weak selection. When a strong effect of back-ground selection is predicted (cases 11-12), thepredictions consistently overestimate the effect.

    (iv) Mutations with unequal effects

    When the selective effects varied across loci, theexpected effect can be calculated using the relevantform of equation (5), namely,

    U ft

    dz TJo

    dz. (14)

    To test this equation, two different distributions ofmutation effects were used. In the first, every third sitewas subject to weaker selection (/3) than the rest (tl 2).Table 2 (case 1) shows the good agreement with theprediction from equation (14).

    In the second test, the selective effects were drawnfrom a gamma distribution with parameters a = 0-70,P = 0032, which yields a mean t of 0022, and fits theobservations from Drosophila well (Keightley, 1994).Note, however, that this distribution will yield somemutations whose effect is too small to satisfy theassumption discussed above. We therefore used agamma distribution truncated below at Nt = 5, aswell as the full distribution, in the simulations. Sincemutations of very small effect are not expected to

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • Recombination and background selection 165

    Table 1. Summary of simulation results for multiplicative selection withmutations of equal effect

    Case

    12345678

    9101112

    Selection

    Multiplicativet = 0005t = 001/ = 003/ = 002t = 002t = 002t = 002/ = 002

    u

    008008008010-40-40-40-4

    M

    016016016110-750-501

    Multiplicative, weak selection/ = 3-125x10-"/ = 6-25 x 10"5

    t = 10"3

    t = 10"3

    00250005008016

    0160160160-32

    n/n0

    Observed (95 % c.i.)

    0-64 (0-60-0-69)0-61 (0-56-0-66)0-72 (0-66-0-78)0-93 (0-85-1 00)0-61 (0-56-0-66)0-55 (0-51-0-59)0-46 (0-43-0-50)0069 (0055-0072)

    0-93 (0-87-1-00)100(0-93-108)0-71 (0-66-0-76)0-67 (0-62-0-71)

    Expected

    0-620-63-0-640-68-0-690-900-65-0-680-57-0-600-45-0-470052-0055

    0-860-970-610-61

    The results are discussed in Sections 4 (i)-(iii)- The expected values were calculatedfrom equation (7), using both Haldane's mapping function (13) and the linearfunction (8). If the two functions gave the same prediction only a single value isgiven, otherwise the low value comes from using Haldane's function and the highone from the linear function.

    0-25

    0-2

    015

    01

    Mrk f1-2

    1

    0-8

    0 6 .

    0-4

    0-2

    (b)

    1 III |1 III' 11000 2000 3000

    Position4000 5000

    Fig. 2. The effect of background selection (means and95 % confidence intervals) along chromosomes of maplengths (a) M = 01 and (b) M = 0-75 (cases 8 and 6 ofTable 1). The mutation rate is U = 0-4 and selection ismultiplicative with / = 002. Neutral sites were placed atpositions 0, ±101, ±202,. . . , ±5005, but, since thechromosomes are symmetric, statistics were computed bycombining results from the two halves. The numbers onthe abscissa therefore represent loci counting from thecentre (0). Means with 95 % confidence intervals areshown. The expected values were calculated fromequation (7), using both Haldane's mapping function( ) (13) and the linear function ( ) (8).

    cause much background selection, the simulationresults with truncation should show stronger effect ofbackground selection than those without truncation.Table 2 (cases 2-9) shows that the simulation resultsagree with this prediction. The predictions based onthe full distribution might be expected to overestimatethe effect of background selection, but this effectseems to be too small to detect except for the case withvery tight linkage (case 5), when the truncated resultand prediction also disagree significantly (case 9). Werepeated the simulations of cases 7 and 9 with thegamma distribution truncated at Nt = 25. As expected(see Section 5), this makes no difference when themap is long (case 10), but significantly improves theprediction when the map is short (case 11). It isnot clear what causes these disagreements but it isprobably due to the small population size (seeSection 5).

    (v) Non-multiplicative selection

    The effects of deviations from the assumption ofmultiplicativity were tested using two models of non-multiplicative interactions, additivity and synergisticepistasis (both described in Section 3). Given thesmall selective values used, and the observation that(1 — /)" x 1 — nt for small values of t, additiveinteractions are expected to give very similar resultsto multiplicative interactions. This is upheld for theadditive case by the simulation results (Table 2,cases 12 and 13). The same is true for synergisticepistasis with 0 = 0 (case 14) which should beidentical to multiplicative selection (Charlesworth,

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 166

    Table 2. Summary of simulation results examining the sensitivity of the predictions to the distribution ofselective effects and the selection scheme

    7T/7T0

    Case Selection U M

    1

    01610-501

    Observed (95 % c.i.)

    0-67 (0-62-0-69)

    0-68 (0-63-0-72)0-67 (0-62-0-72)0-5 (0-46-0-53)016(015-017)

    truncated at Nt = 5016 0-68 (0-63-0-73)1 0-63 (0-59-0-68)0-5 0-43 (0-40-0-46)01 012(011-013)

    truncated at Nt = 251 0-64 (0-59-0-69)01 0088 (0081-0096)

    016

    016

    0160160160-50-5

    0-65 (0-60-0-70)

    0-65 (0-60-0-70)

    0-68 (0-63-0-73)0-75 (0-69-0-81)0-79 (0-73-0-85)0-47 (0-43-0-50)0-59 (0-54-0-64)

    Expected

    0-65-0-68

    0-65-0-660-66-0-680-46-0-470045-0047

    0-66-0-670-66-0-680-46-0-470051-0053

    0-65-0-680070-0072

    0-66-0-67

    0-66-0-67

    0-66-0-670-70-0-710-800-49-0-510-57-0-60

    2345

    6789

    1011

    12

    13

    1415161715

    Multiplicative, unequal effectst12 = 004, /, = 001 0-4Multiplicative, gamma-distributed effectsa = 0-70,/? = 0032 008a = 0-70,/? = 0032 0-4a = 0-70,/? = 0032 0-4a = 0-70,/? = 0032 0-4Multiplicative, gamma-distributed effects.a = 0-70,/? = 0032 0-08a = 0-70,/? = 0-032 0-4a = 0-70,/? = 0-032 0-4a = 0-70,/? = 0-032 0-4Multiplicative, gamma-distributed effects.a = 0-70,/? = 0032 0-4a = 0-70,/? = 0032 0-4Multiplicative

    008

    0-08Additiver = 002

    Synergistic epistasis, a = 002/? = 0 008/? = 0-01 (« = 2-067) 008/? = 0-1 (« = 0-7393) 008/? = 001 (« = 6-111) 0-4/? = 01 {n = 2-199) 0-4

    The results are discussed in Section 4(iv)-(v). The expected values were calculated from equation (7) or (14) as appropriate,using both Haldane's mapping function (13) and the linear function (8) as in Table 1. Under synergistic epistasis, the expectedeffect was calculated by the method explained in the text, n is the mean number of mutations per individual.

    1990). For /? 4= 0, we estimated the 'effective't as U/n(Charlesworth, 1990, p. 204), where n is the meannumber of deleterious mutations per individual,calculated numerically by the method of Kimura &Maruyama (1966), assuming segregation but norecombination. The resulting value was then used as /in equation (7) to calculate the prediction for n/n0.The calculated values of n, the simulation results, andthe predictions are shown in Table 2 (cases 15-18).Again, the agreement is clearly satisfactory, even withstrong epistasis (/? J> a).

    (vi) Multiple chromosomes

    The approximations derived in Section 2 can easilybe generalized to yield predicted values for the case ofmultiple chromosomes, simply by letting r{z) = 1/2for all loci that lie on chromosomes other than the onecarrying the neutral locus under consideration. Thiswas tested using four chromosomes, each with U =0-2 and M = 0-25. Multiplicative interactions betweenmutations of equal effect, t = 002, and a populationsize of N = 1600 were assumed. Simulation of thismodel gave n/n0 = 0-46, with a 95% confidenceinterval of (0-44-0-48). The predicted value using

    Haldane's mapping function (13) is 0-47 and thatusing the linear mapping function (8) is 0-48. Theeffect of the additional three chromosomes is extremelysmall; the predictions for a single chromosome withU = 0-2 would be 0-48 and 0-50 for the Haldane andlinear mapping functions respectively. The correctionfor multiple chromosomes can also be calculated fromthe effect of unlinked loci on effective population size[cf. Appendix (iii)].

    (vii) Background selection close to chromosome tips

    As we have seen, the effect of background selectionshould be less pronounced close to the tips ofchromosomes. Figure 3 shows this very clearly. Theresults shown are those from the simulation with fourchromosomes described in Section 4(vi). There were atotal of 12007 loci, with seven neutral loci at positions{-6003,-4002,-2001,0,2001,4002,6003} on eachchromosome. To calculate confidence intervals foreach position, results from equivalent positions (e.g.— 4002 and 4002 from all four chromosomes) werepooled. The predicted values were obtained fromequation (7) with the correction for multiple chromo-

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • Recombination and background selection 167

    0-65

    4000 5000 6000Position

    0-45

    Fig. 3. An illustration of the decreased effect ofbackground selection close to tips of chromosomes. Thenumbers on the abscissa represent loci counting from thecentre (0). The dots are means with 95 % confidenceintervals. The broken and whole lines show the predicted7r/n0 using Haldane's mapping function ( ) (13) andthe linear function ( ) (8), respectively. Parameters areU = 0-2, M = 0-25, and t = 002. See Section 4(vii) fordetails.

    somes given in Section 4(vi). The prediction isextremely good for the centre, but seems to over-estimate the edge effect slightly.

    5. Discussion

    The present results, and the results of Hudson &Kaplan (1995) for short maps, show that a relativelysimple formula [equation (4)] can predict the effects ofbackground selection on genetic diversity at a neutrallocus in a random-mating population, under a widerange of models of selection and map lengths. Themajor assumptions used to obtain this result are that:(a) the population size is sufficiently large that thefrequencies of mutant alleles are close to theirequilibrium values for an infinite population; (b) thereis a negligible contribution to genetic diversity fromneutral variants associated with deleterious mutations[see Appendix, Section (ii)]; (c) different loci actmultiplicatively to determine fitness, and are inde-pendently distributed. Our simulations violate (a)grossly, and it is therefore rather surprising that theresults agree with the theoretical predictions at all.The likely reason for the agreement is that, in thesimulations, a given point on the chromosome issurrounded by a large group of closely linked selectedloci. Since Nt, §> 1, the total number of mutant allelesin the group will be close to the expected number forlarge populations, even though each locus is far frombeing at deterministic equilibrium. A neutral locuselsewhere on the chromosome thus perceives thegroup as if it were a single locus in mutation-selectionequilibrium. Given this, our simulation results in-dicate that the equation (4) applies both when eachlocus is subject to the same selection intensity, andwhen there is a wide distribution of selection coeffi-cients (Tables 1 and 2). It even seems to work wellwhen there is substantial synergistic epistasis for thefitness effects of mutant alleles (Table 2), which is apotentially important feature of systems with multiple

    loci subject to mutation-selection balance (Kimura &Maruyama, 1966; Crow, 1970; Feldman et al. 1980;Kondrashov, 1988; Charlesworth, 1990).

    An important question concerning the applicabilityof these results is whether very weak selection is oftenlikely to cause the failure of predictions based onequation (4). Assumption (b) can fail even if Ne t is ofthe order of 100 or more, as indicated by our previoussimulation results, leading to overestimates of theeffect of background selection on genetic diversity (seeCharlesworth et al. 1993, table 2). One reason for thiseffect appears to be that the mean time to loss of avery weakly deleterious allele in a finite population isnot very different from the neutral value (Kimura &Ohta, 1969). A neutral variant associated with achromosome carrying one or more weakly deleteriousmutations therefore persists in the population almostas long as if there is no selection, if it does notrecombine onto a mutation-free background, eventhough it is ultimately destined for elimination. Inaddition, as pointed out by Charlesworth et al. (1993),when the mean number of deleterious mutations in anon-recombining chromosomal region is substantiallygreater than one, a chromosome with a small numberof deleterious mutations is actually at a selectiveadvantage relative to the overall population. If U issmall, such a chromosome can persist in the popu-lation for a long time before it accumulates furtherdeleterious mutations and becomes selectively dis-advantageous. Both of these factors can lead to failureof the assumption that neutral variants contributelittle to diversity if they are associated with mutant-carrying chromosomes. Coalescent-process simula-tions of the case with no recombination indicate thatrather weak selection (with t of the order of 10"3 orless) is required for large deviations from the predictedeffects of background selection, if the population sizeis larger than 500000 (Charlesworth et al. 1995).Bigger deviations from the predictions are found insmall populations than in large populations with thesame selection coefficient, although the effect is notvery sensitive to population size after a value of a fewtens of thousands is reached, except if the expectedreduction in diversity is very large (Table 3).

    While the slowness of our Monte Carlo simulationswith recombination and large population size pre-cludes an exhaustive analysis of the conditions underwhich the theoretical predictions breakdown, theexamples we have studied suggest that the aboveconclusions are qualitatively valid when there is somerecombination (Table 1). However, the argumentgiven in the Appendix [Section (ii)] suggests thatassumption (b) is less likely to be violated whenrecombination occurs, consistent with the results inTable 3, where the largest discrepancies due to weakselection are observed with tight linkage. Of course, ifpopulation size and selection coefficients are bothso small that Ne t is of the order of one or less, thenthe fundamental assumption that the number of

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 168

    Table 3. Dependence of background selection effects on selectioncoefficient and population size, when there is weak selection and norecombination

    Theoretical n/no( x 10"1): 0-82 0-24 007

    USimulated TT/TTO{ X 10"

    1):N = 5000N= 25000N= 100000N = 500000N = 1000000

    r = 2x10""uSimulated TT/T70(X 10"

    1):N = 5000N = 25000N = 100000N = 500000N= 1000000

    1-OxlO-2

    1-797 ±00181050 ±00040-846 ±00270-831 ±00270-789 ±0029

    1-OxlO"3

    8-684 ±00722-735 ±0-0451-311 ±00310-969 ±00300-989 + 0035

    1-5 xlO"2

    l-354±00130-478 ±00140-318 + 0-0140-251 ±0-0140-223 ±0-014

    1-5 xlO"3

    8-721 ±0-0712-391 ±00410-823 ±00200-336±00140-314±0016

    20 x 10~2

    1-255 + 0-0140-330 + 0-0100136 ±00080081 ±00080071 ±0006

    2-0 xlO-3

    8-672 + 0-0762-265 + 0-0200-698 + 00180-230±0018O130±0-008

    The simulated values of n/n0 were obtained from the means of 2000 replicates ofthe coalescent process simulation procedure of Charlesworth et al. (1995), with asample size of 100 haploid genomes per replicate.

    deleterious mutations per individual is close to equi-librium will be invalid even in the multi-locus case,and background selection will be ineffective(Charlesworth et al. 1993). Although there is evidencefor variation in the strength of selection against newmutations affecting viability in D. melanogaster(Keightley, 1994), the fact that the harmonic mean oft is about 002 (Crow & Simmons, 1983) means thatonly about 2% of mutations would have t < 10"4

    under the gamma distribution parameters fitted inSection 4(iv). Even with an Ne of 20000, as estimatedfor a local population of D. melanogaster by Mukai &Yamaguchi (1974), a / value of 10~4 would be sufficientto prevent fixation of deleterious alleles (Crow &Kimura, 1970).

    These considerations suggest that we can be fairlyconfident that investigations based on the validity ofequation (4) and its corollaries, such as that ofHudson & Kaplan (1995), will provide useful pre-dictions of the background selection effects of all butthe most weakly selected loci, at least as far asDrosophila populations are concerned. This providesa justification for using the analytical results basedon equation (4) to make detailed predictions aboutthe effect of background selection on the relationbetween chromosomal location and DNA variationfor D. melanogaster, as has been done by Hudson &Kaplan (1995) and B. Charlesworth (submitted).These predictions may help to distinguish betweenalternative explanations of the empirical associationbetween low local recombination rates and low geneticvariation (Aquadro et al. 1994; Kreitman & Wayne,1994; Braverman et al. 1995; Charlesworth et al.1995; Hudson & Kaplan, 1995; Simonsen et al. 1995).

    Some significant implications about the effects ofbackground selection in the presence of recombinationfollow from equation (4) and its corollaries. First,provided that recombination does not vary with mapposition and that the map length is less than oneMorgan or so, equation (10) can be applied to neutralloci that are not too close to the tips of thechromosome. This equation implies that the reductionin genetic diversity for such loci is a negativeexponential function of the ratio of the diploidmutation rate U for the chromosome to its map lengthin Morgans. This result was also obtained by adifferent method by Hudson & Kaplan (1995), and asimilar formula was derived by Barton (1995) for theeffect of background selection on the probability offixation of a selectively favourable mutation. For D.melanogaster, a U of 0-4 for one of the majorautosomes is probably close to the true value, or evensomewhat conservative (Keightley, 1994). Given thatthere is no recombination in males, the effective maplength of an autosome is about 0-5 Morgans, so thata reduction in neutral diversity to about 45 % of theclassical neutral value would be predicted. Moreaccurate calculations, taking into account the sup-pression of crossing over near the centromeres andhigher frequencies of exchange elsewhere, give a valueclose to 62 % for loci in the most freely recombiningsections of the chromosome, with much smaller valuesin the centromeric regions (Hudson & Kaplan, 1995;B. Charlesworth, submitted). A somewhat smallereffect (70 % of the neutral value) is predicted for the Xchromosome. This suggests that all loci on the majorchromosomes of D. melanogaster experience the effectsof background selection from genes on the same

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • Recombination and background selection 169

    chromosome as themselves [the results of the Ap-pendix, Section (iii), indicate that unlinked loci arelikely to have little effect].

    Drosophila species with the basic complement offour acrocentric autosomes, rather than the twometacentrics derived from fusions of acrocentrics,which are characteristic of the melanogaster speciessubgroup (Ashburner, 1989, ch. 37), would beexpected to have substantially smaller effects ofbackground selection, since the mutation rate perautosome is then half the value for D. melanogaster.Additional effects will be produced by higher rates ofrecombination per nucleotide site. The combinedeffect of these factors could be substantial. Forexample, with U/M = 0-2/1 instead of 0-4/0-5, theexpected relative diversity, n/n0, is 82% instead of45 %. An autosomal neutral locus in a species such asD. virilis or D. subobscura, which have much largermap lengths per chromosome arm than D. melano-gaster and four major autosomes instead of two(Alexander, 1976; Krimbas, 1993), might thus havenearly twice as much neutral variation than a mid-armlocus of D. melanogaster, other things being equal.Differences among members of the melanogastersubgroup are discussed by B. Charlesworth (sub-mitted).

    It is difficult to predict the likely effect of back-ground selection in other taxa, in the absence ofestimates of U values for higher organisms. Currently,the only direct estimates of this parameter are forDrosophila (Crow & Simmons, 1983; Keightley, 1994).While there are still considerable uncertainties asso-ciated with these estimates, it seems likely that thevalue of U for Drosophila is approximately 1, yieldingthe value of 0-4 for the major autosomes of D.melanogaster used above. This would presumablyapply to other taxa of higher insects. An indirectapproach for highly self-fertilizing species of plantsyields a similar value (Charlesworth et al. 1990;Charlesworth et al. 1994; Johnston & Schoen, 1995).A method for determining U for mammals from dataon rates of molecular evolution has recently beenproposed, but no results have yet been reported(Kondrashov & Crow, 1993). The mutation rate perlocus for visible mutations in mammals seems to besimilar to that for Drosophila (Kondrashov & Crow,1993), but there are more than four times as manygenes (Bird, 1995). This suggests that U for mammalsmay be at least 4, which we may adopt as a workingestimate. In humans, with 23 chromosomes, the totalsex-averaged map length is about 40 Morgans(Morton, 1991), so that the mean map length perchromosome is approximately 1-75 Morgans, and themean U per chromosome is 0-17. With the Kosambimapping function (Kosambi, 1944), which is com-monly used in mammalian mapping studies, theexpected nucleotide site diversity relative to theclassical neutral value would be 0-90 for a centrallylocated gene, and 0-95 for a distal one, assuming that

    t = 002. For the mouse, with a much shorter totalmap length of about 14 Morgans (Dietrich et al.1992), the central and distal relative diversity valueswould be predicted to be 0-77 and 0-87 respectively.This suggests that recombinational differences couldcontribute to differences in genetic diversity and ratesof molecular evolution among mammalian species.No data are currently available to determine whetheror not such differences are observed. In plants, maplengths of about 1-2 Morgans per chromosome havebeen found in genome mapping projects (Tanksleyet al. 1992; Ahn & Tanksley, 1993). With a mutationrate of 1 per genome, and 10 chromosomes with amap length of 1-2, as in tomatoes (Tanksley et al.1992), this would yield central and distal relativediversity values of 0-92 and 0-96, suggesting a ratherweak overall effect of background selection.

    The above predictions ignore regional differences inrecombination within chromosomes. In humans, thereis evidence for pericentric reductions but telomericincreases in exchange rates per nucleotide, comparedwith the intervening regions of chromosome arms(NIH/CEPH Collaborative Mapping Group, 1992).In plants, there is often strong centromeric andtelomeric suppression of recombination, extendingover large sections of the chromosomes (Neuffer &Coe, 1974; Tanksley etal. 1992). Background selectionand selective sweeps may thus produce significantwithin-chromosome structuring of patterns of geneticdiversity and molecular evolution in these species, asin Drosophila. This possibility should be taken intoconsideration in future studies of molecular evolutionand variation.

    As pointed out in Section 2(iv), the backgroundselection effects of weakly selected loci are much moresensitive to recombination than those of stronglyselected loci, although (for the same mutation rate) aweakly selected locus can have a larger backgroundselection effect than a strongly selected locus if linkageis very tight. As displayed in Fig. 1, if there is regionalvariation in the frequency of recombination pernucleotide, weakly selected loci will contribute moreto regional differences in the level of neutral variabilitythan strongly selected loci. It has been suggested thatthere may be a long tail of deleterious mutations withvery small effects among the viability mutationsdetected in the Drosophila mutation accumulationexperiments (Keightley, 1994; Lande, 1994). Inaddition, there is evidence that the abundant trans-posable elements found in natural Drosophila popula-tions have very slightly deleterious fitness effects, withmean selection coefficients of approximately 2 x 10"4

    (Charlesworth et al. 1992). As suggested by Hudson(1994), they could make a substantial contribution tothe reduction in variability in regions of low re-combination. This possibility will be explored in moredetail in a subsequent paper (B. Chariesworth, sub-mitted).

    Finally, it is interesting to note that similar formulae

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 170

    for the influence of linked loci subject to deleteriousmutations have emerged from the study of their effectson the fixation probabilities of favourable mutationswith selection coefficients which are sufficiently largethat branching process theory can be used (Barton,1995), and on nucleotide site diversity at neutral sites(Hudson, 1994; Hudson & Kaplan, 1994, 1995).Related results have also been derived for the effect onneutral loci of directional selection on a quantitativetrait (Santiago & Caballero, 1995; Santiago, inpreparation), and for the effects of selective sweepsand temporally fluctuating selection coefficients(Barton, 1995). But, except for the case of completelinkage (Charlesworth, 1994), we are currently lackingresults on the effects of background selection onweakly selected mutations, for which branchingprocess theory cannot be used. With free recom-bination, all approaches lead to the conclusion thatthe effects of selection can be approximately rep-resented by a reduction in effective population size Ne.The factor by which Ne is divided to obtain therelevant expression in each case is approximatelyequal to one plus four times the additive geneticvariance in family size, if fitness is measured relative tothe population mean. More complex formulae ob-viously apply with linkage (cf. equation [4]). But it isimportant to note that Ne is not a sufficient descriptorof the effects of selection at linked sites. As shown byCharlesworth et al. (1993) and Hudson (1994) forbackground selection, and by Braverman et al. (1995)and Simonsen et al. (1995) for selective sweeps, theremay be very different effects on the nucleotide sitediversity and on the number of segregating sites. Thiswould not be expected if these processes could bedescribed simply in terms of a reduction in Ne. Rather,both the genealogical structure of the population andthe expected time to coalescence between a pair ofgenes (which is controlled by Ne in the classical neutralmodel), are affected by selection at linked sites. It ismuch more difficult to obtain useful analytical resultsfor this effect than for the reduction in nucleotide sitediversity, or the fixation probability.

    Appendix

    The application of the linear diffusion operatormethod (Ohta & Kimura, 1969) to the case of aneutral locus and a single selected locus will bepresented here.

    (i) Diffusion coefficients

    Deterministic equations for the changes in the x, caneasily be written down on the assumption that thelocus under selection is in equilibrium. Second-orderterms in qt, tt and ut will be neglected in what follows.To the assumed order of approximation, the mean

    fitness of the population is 1 — 2u( (Crow & Kimura1970, Chap. 6), the marginal fitness of a gametecarrying a wild-type allele at the selected locus is1 — qt f, « 1 — u(, and the marginal fitness of a mutant-carrying gamete is 1 — rs — ut. Let the frequencies of thegametes A(B, Atb, a(B, and atb be yl = p{x0, y2 =AO -*o)> y3 = atxx, and y4 = qt(\ -xx), respectively.The coefficient of linkage disequilibrium is y1yi —

    These expressions can be substituted into thestandard equations for two loci (Crow & Kimura,1970, ch. 5), with the addition of terms to included theeffect of mutation from At to at, in order to obtainexpressions for the deterministic changes in x0 and xvWriting Mt for the changes in x( from selection,mutation and recombination, letting ft = r,(l — tt),and noting that the net frequencies of the alleles at theselected locus remain unchanged over a generation,we obtain

    M = ^ ~ 10 Pt pt{\ -2ut)

    and

    M = !Z± x -1i 1i^-2u()

    -{\-2ul)y1], (A 1)

    i, (A 2)

    which, upon neglecting second-order terms andutilizing the relations ufurther simplified to yield

    and

    qtt( and ut

  • Recombination and background selection 171

    (Crow & Kimura, 1970, pp. 443^44), this conditionshould automatically be met whenever the model isappropriate.

    (ii) Genetic diversity at statistical equilibrium

    For details of the method used in this section, seeEwens (1979, section 4.10), Kimura & Ohta (1971,appendix 3), or Stephan et al. (1992). For a functiong(x0, xx), we have

    (A 7)

    where ET denotes expectation taken at a given point rin time. The object is to find a set of equations infunctions gt that result in a soluble set of equations forthe expectations. For the set

    = 2xo(l-xo),

    = 2ql(\-2x0)8,

    = 8\

    (AS a)

    (A 86)

    (A 8 c)

    we have the following equations for the effects ofselection, recombination and drift:

    AT 2ptNe(A 9 a)

    ^r(E(g2), (A 9 b)

    drE(gr}_

    4PiqfNe 4q*Nt

    1+ 2qt r, + 2pt(t( + rt) E(g2), (A 9 c)

    where all expectations are taken at time T.At statistical equilibrium, the sum of the con-

    tribution from mutation at the neutral sites to theexpected change in each function and the change inthe expectation of the function due to drift and theother deterministic forces [represented by the ap-propriate member of equations (A 9)] must be equalto zero. The mutational term for a given function isequal to the change in the value of the function due toa single new mutation that arises in a non-segregatingpopulation, times the rate per generation at whichmutations arise (Kimura & Ohta, 1971, pp. 186-187).If the total breeding population size is N, the changein g0 due to a single new mutation that arises in amutant-free chromosome is 1 /(Np() (neglecting termsof order i/N2). The expected number of mutationsthat arise per generation is 2Nv, and the probabilitythat each arises in a mutant-free chromosome is p(, so

    that the mutational change in the expectation of g0is 2v. It is easily seen that the mutational change inthe expectation of 8 is zero, so that the change in theexpectation of g1 is of order \/N

    2. The change in theexpectation of g2 is v/2Nqt, which is negligible forsufficiently small v/N. Thus we have

    (A 10a)

    (A

    E(g0) E(gl)4ptqtN. 4q*N,

    1+ 2qt rt + 2Pi(t( + r,) E(g2), (A 10 c)

    where all expectations are now taken at statisticalequilibrium. Equations (A 10) form a linear system ofequations. Their approximate solutions can be foundas follows.

    The leading term in the factor of E(g2) in equation(A 10 c) is 2pi(ti + ri) provided that 4qi(ti + rt)Ne > 1.The neglected term \/2qtNe only contributes toequation (A 10ft) if rt 4= 0, and its contribution to thefinal result is negligible if 4qt N£tt + r() $> 1. Neglectingthe other terms, equation (A 10 c) gives

    E{g2) ( A l l )

    Substitution of this into equation (A 10 ft), againretaining only the leading term in E(gi) and neglectingsecond-order terms, gives

    (A 12)

    can now be obtained by substituting thisexpression into equation (A 10a). Using this inconjunction with equation (A 12), we have

    (A 13)

    Write n for the expectation of the genetic diversity, G.Then n in the absence of background selection is TT0 =4Nev (Kimura, 1969). Rearranging equation (A 13),and using equations (2) and (A 11), we obtain

    n =E(g0)-E(gl)7rn 4N. v

    1

    1+-(A 14)

    where pt = ft/i( measures the frequency of recom-bination relative to the selection coefficient for the /thlocus.

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 172

    We can therefore write

    (A 15)1 +

    0 +A)2

    provided that q(

  • Recombination and background selection 173

    Molecular Data (ed. G. B. Golding), pp. 46-56. London:Chapman and Hall.

    Ashburner, M. (1989). Drosphila. A Laboratory Handbook.Cold Spring Harbor, New York: Cold Spring HarborLaboratory Press.

    Barton, N. H. (1994). The reduction in fixation probabilitycaused by substitutions at linked loci. Genetical Research64, 199-208.

    Barton, N. H. (1995). Linkage and the limits to naturalselection. Genetics 140, 821-841.

    Berry, A. J., Ajioka, J. W. & Kreitman, M. (1991). Lack ofpolymorphism on the Drosophila fourth chromosomeresulting from selection. Genetics 129, 1085-1098.

    Bird, A. P. (1995). Gene number, noise reduction andbiological complexity. Trends in Genetics 11, 77-117.

    Birky, Jr., C. W. & Walsh, J. B. (1988). Effects of linkage onrates of molecular evolution. Proceedings of the NationalAcademy of Sciences, USA 85, 6414-6418.

    Braverman, J. M., Hudson, R. R., Kaplan, N. L., Langley,C. H. & Stephan, W. (1995). The hitchhiking effect on thesite frequency spectrum of DNA polymorphism. Genetics140, 783-796.

    Caballero, A. (1995). On the effective size of populationswith separate sexes, with particular reference to sex-linked genes. Genetics 139, 1007-1011.

    Charlesworth, B. (1990). Mutation-selection balance andthe evolutionary advantage of sex and recombination.Genetical Research 55, 199-221.

    Charlesworth, B. (1994). The effect of background selectionagainst deleterious mutations on weakly selected, linkedvariants. Genetial Research 63, 213-227.

    Charlesworth, B., Charlesworth, D. & Morgan, M. T.(1990). Genetic loads and estimates of mutation rates inhighly inbred plant populations. Nature 347, 380-382.

    Charlesworth, B., Charlesworth, D. & Morgan, M. T.(1991). Multilocus models of inbreeding depression withsynergistic selection and partial self-fertilisation, GeneticalResearch 57, 177-194.

    Charlesworth, B., Lapid, A. & Canada, D. (1992). Thedistribution of transposable elements within and betweenchromosomes in a population of Drosophila melanogaster.I. Element frequencies and distribution. Genetical Re-search 60, 103-114.

    Charlesworth, B., Morgan, M. T. & Charlesworth, D.(1993). The effect of deleterious mutations on neutralmolecular variation. Genetics 134, 1289-1303.

    Charlesworth, D., Charlesworth, B. & Morgan, M. T.(1995). The pattern of neutral molecular variation underthe background selection model. Genetics 141,1619-1632.

    Charlesworth, D., Lyons, E. E. & Litchfield, L. B. (1994).Inbreeding depression in two highly inbreeding popula-tions of Leavenworthia. Proceedings of the Royal Society,London, B 258, 209-214.

    Crow, J. F. (1970). Genetic loads and the cost of naturalselection. In Mathematical Topics in Population Genetics(ed. K. Kojima), pp. 128-177. Berlin: Springer-Verlag.

    Crow, J. F. & Kimura, M. (1970). An Introduction toPopulation Genetics Theory. New York: Harper & Row.

    Crow, J. F. & Simmons, M. J. (1983). The mutation load inDrosophila. In The Genetics and Biology of Drosophila(ed. H. L. Carson, M. Ashburner and J. N. Thomson),vol. 3e, pp. 1-35. London: Academic Press.

    Dietrich, W., Katz, H., Lincoln, S. E., Shin, H.-S.,Friedman, J., Dracopoli, N. C. & Lander, E. S. (1992). Agenetic map of the mouse suitable for typing intraspecificcrosses. Genetics 131, 423—447.

    Ewens, W. J. (1979). Mathematical Population Genetics.Berlin: Springer-Verlag.

    Feldman, M. W., Christiansen, F. B. & Brooks, L. D.

    (1980). Evolution of recombination in a constant en-vironment. Proceedings of the National Academy ofSciences, USA 11, 4838-4841.

    Felsenstein, J. (1965). The effect of linkage on directionalselection. Genetics 52, 349-363.

    Felsenstein, J. & Yokoyama, S. (1976). The evolutionaryadvantage of recombination. II. Individual selection forrecombination. Genetics 83, 845-859.

    Free Software Foundation (1992). GNU C + + Library.Publicly available via ftp:/1 prep, ai.mit.edu /.

    Gillespie, J. H. (1994). Alternatives to the neutral theory. InNon-Neutral Evolution: Theories and Molecular Data (ed.G. B. Golding), pp. 1-17. London: Chapman and Hall.

    Haldane, J. B. S. (1919). The combination of linkage valuesand the calculation of distance between loci of linkedfactors. Journal of Genetics 8, 299-309.

    Haldane, J. B. S. (1927). A mathematical theory of naturaland artifical selection. Part V. Selection and mutation.Proceedings of the Cambridge Philosophical Society 23,838-844.

    Hudson, R. R. (1994). How can the low levels of Drosophilasequence variation in regions of the genome with lowlevels of recombination be explained? Proceedings of theNational Academy of Sciences, USA 91, 6815-6818.

    Hudson, R. R. & Kaplan, N. L. (1994). Gene trees withbackground selection. In Non-Neutral Evolution: Theoriesand Molecular Data (ed. G. B. Golding), pp. 140-153.New York: Chapman & Hall.

    Hudson, R. R. & Kaplan, N. L. (1995). Deleterious back-ground selection with recombination. Genetics 141,1605-1617.

    Johnston, M. O. & Schoen, D. J. (1995). Mutation rates anddominance levels of genes affecting total fitness in twoangiosperm species. Science 267, 226-229.

    Kaplan, N. L., Hudson, R. R. & Langley, C. H. (1989). The'hitch-hiking' effect revisited. Genetics 123, 887-899.

    Keightley, P. D. (1994). The distribution of mutation effectson viability in Drosophila melanogaster. Genetics 138,1315-1322.

    Kimura, M. (1969). The number of heterozygous nucleotidesites maintained in a finite population due to steady fluxof mutations. Genetics 61, 893-903.

    Kimura, M. & Maruyama, T. (1966). The mutational loadwith epistatic gene interactions in fitness. Genetics 54,1337-1351.

    Kimura, M. & Ohta, T. (1969). The average number ofgenerations until extinction of an individual mutant genein a population. Genetics 63, 701-709.

    Kimura, M. & Ohta, T. (1971). Theoretical Aspects ofPopulation Genetics. Princeton: Princeton UniversityPress.

    Kliman, R. M. & Hey, J. (1993). Reduced natural selectionassociated with low recombination in Drosophila melano-gaster. Molecular Biology and Evolution 10, 1239-1258.

    Kondrashov, A. S. (1988). Deleterious mutations and theevolution of sexual reproduction. Nature 336, 435—440.

    Kondrashov, A. S. & Crow, J. F. (1993). A molecularapproach to estimating the human deleterious mutationrate. Human Mutation 2, 229-234.

    Kosambi, D. D. (1944). The estimation of map distancefrom recombination values. Annals of Eugenics 12,172-175.

    Kreitman, M. & Wayne, M. L. (1994). Organization ofgenetic variation at the molecular level: lessons fromDrosophila. In Molecular Ecology and Evolution: Approa-ches and Applications (ed. B. Schierwater, B. Streit, G. P.Wagner and R. DeSalle), pp. 157-184. Basel: Birkhauser.

    Krimbas, C. B. (1993). Drosophila subobscura. Hamburg:Verlag Dr Kovac.

    Lande, R. (1994). Risk of population extinction from

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core

  • M. Nordborg, B. Charlesworth and Deborah Charlesworth 174

    fixation of new deleterious mutations. Evolution 48,1460-1469.

    Maynard Smith, J. & Haigh, J. (1974). The hitchhikingeffect of a favourable gene. Genetical Research 23, 23-35.

    McPeek, M. S. & Speed, T. P. (1995). Modeling interferencein genetic recombination. Genetics 139, 1031-1044.

    Morton, N. E. (1991). Parameters of the human genome.Proceedings of the National Academy of Sciences, USA 88,7474-7476.

    Mukai, T., Cardellino, R. K., Watanabe, T. K. & Crow,J. F. (1974). The genetic variance for viability and itscomponents in a population of Drosophila melanogaster.Genetics 78, 1195-1208.

    Mukai, T. & Yamaguchi, O. (1974). The genetic structure ofnatural populations of Drosophila melanogaster. XI.Genetic variability in a local population. Genetics 76,339-366.

    Nagylaki, T. (1995). The inbreeding effective populationnumber in dioecious populations. Genetics 139, 473^485.

    Nei, M. (1987). Molecular Evolutionary Genetics. NewYork: Columbia University Press.

    Nei, M. & Murata, M. (1966). Effective population sizewhen fertility is inherited. Genetical Research 8, 257-260.

    Neuffer, M. G. & Coe, E. H. (1974). Corn (maize). InHandbook of Genetics (ed. R. C. King), vol. 2, pp. 3-30.New York: Plenum.

    NIH/CEPH Collaborative Mapping Group (1992). Acomprehensive genetic linkage map of the human genome.Science 258, 67-86.

    Ohta, T. & Kimura, M. (1969). Linkage disequilibrium dueto random genetic drift. Genetical Research 13, 47-55.

    Ohta, T. & Kimura, M. (1975). The effect of a selected locuson heterozygosity of neutral alleles (the hitch-hikingeffect). Genetical Research 25, 313-326.

    Robertson, A. (1961). Inbreeding in artificial selectionprogrammes. Genetical Research 2, 189-194.

    Santiago, E. & Caballero. A. (1995). Effective size ofpopulations under selection. Genetics 139, 1013-1030.

    Simonsen, K. L., Churchill, G. A. & Aquadro, C. F. (1995).Properties of statistical tests of neutrality for DNApolymorphism data. Genetics 141, 413^29.

    Stephan, W. (1995). An improved method for estimating therate of fixation of favorable mutations based on DNApolymorphism data. Molecular Biology and Evolution 12,959-962.

    Stephan, W., Wiehe, T. H. E. & Lenz, M. W. (1992). Theeffect of strongly selected substitutions on neutral poly-morphism: analytical results based on diffusion theory.Theoretical Population Biology 41, 237-254.

    Tanksley, S. D., Ganal, M. W., Prince, J. P., deVicente,M. C, Bonierbale, M. W., Broun, P., Fulton, T. M.,Giovannoni, J. J., Grandillo, S., Martin, G. B.,Messeguer, R., Miller, J. C, Miller, L., Paterson, A. H.,Pineda, O., Roder, M. S., Wing, R. A., Wu, W. & Young,N. D. (1992). High density molecular linkage maps of thetomato and potato genomes. Genetics 132, 1141-1160.

    Thomson, G. (1977). The effect of a selected locus on linkedneutral loci. Genetics 85, 753-788.

    Wiehe, T. H. E. & Stephan, W. (1993). Analysis of a genetichitchhiking model and its application to DNA poly-morphism data. Molecular Biology and Evolution 10,842-854.

    https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0016672300033619Downloaded from https://www.cambridge.org/core. IP address: 54.39.106.173, on 30 Mar 2021 at 16:16:57, subject to the Cambridge Core terms of use, available at

    https://www.cambridge.org/core/termshttps://doi.org/10.1017/S0016672300033619https://www.cambridge.org/core