Germ banks affect the inference of past demographic events

DANIEL ŽIVKOVIĆ* and AURÉLIEN TELLIER*†

*Section of Evolutionary Biology, Department of Biology II, BioCenter, LMU Munich, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany, †Section of Population Genetics, Center of Life and Food Sciences Weihenstephan, Technische Universität München, 85354 Freising, Germany

Abstract

Continuous progress in empirical population genetics based on whole-genome polymorphism data requires the theoretical analysis of refined models in order to interpret the evolutionary history of populations with adequate accuracy. Recent studies focus predominantly on demography and adaptation, whereas age structure (for example, in plants via the maintenance of seed banks) has attracted less attention. Germ banking, that is, seed or egg dormancy, is a prevalent and important life-history trait in plants and invertebrates, which buffers against environmental variability and modulates species extinction in fragmented habitats. In this study, we investigate the combined effect of germ banking and time-varying population size on the neutral coalescent and, in particular, derive the allele frequency spectrum under some simplifying assumptions. We then perform an ABC analysis using two simple demographic scenarios: a population expansion and an instantaneous decline. We demonstrate the appreciable influence of seed banks on the estimation of demographic parameters depending on the germination rate, with biases scaled by the square of the germination rate. In the more complex case of a population bottleneck, which comprises an instantaneous decline and an expansion phase, ignoring information on the germination rate precludes reliable estimates of the bottleneck parameters via the allelic spectrum. In particular, when seeds remain in the bank over several generations, recent expansions may remain invisible in the frequency spectrum, whereas ancient declines leave signatures much longer than in the absence of a seed bank.

Keywords: allele frequency spectrum, approximate Bayesian computation, coalescent, seed and

germ bank, time-varying population size

Received 10 May 2012; revision received 10 August 2012; accepted 21 August 2012

Introduction

Since the beginning of the 20th century, molecular data

have been used to estimate the recent evolutionary his-

tory of populations (e.g. Hirschfeld & Hirschfeld 1919).

This has been made largely feasible by theoretical

advances demonstrating the possibility to detect depar-

tures from equilibrium conditions (e.g. panmictic popu-

lation, mutation–drift equilibrium), as for instance

deviations from demographic stationarity (e.g. Watter-

son 1984). Understanding which demographic events or

selective forces shape the patterns of polymorphism is

fundamental in evolutionary genetics (e.g. Stephan

2010) and of practical relevance for conservation biol-

ogy (e.g. Olivieri et al. 2008). Genetic data are indeed

increasingly used to reconstruct the demographic his-

tory of species or populations, such as past bottlenecks

due to hunting, the introduction of alien species or hab-

itat loss and fragmentation for endangered species (e.g.

Olivieri et al. 2008). It becomes therefore important to

quantify which ecological factors or life-history traits

affect the precision of these inferences and thus potentially the conclusions of these studies. For example, the existence of metapopulation structure may create spurious bottleneck signals depending on the genetic differentiation (or gene flow), genetic diversity and sampling scheme. As a result, several conservation biology studies may have overestimated or incorrectly detected bottlenecks (discussed in Chikhi et al. 2010).

Correspondence: Aurélien Tellier, Fax: +49 89 2180 74 104; E-mail: [email protected]

© 2012 Blackwell Publishing Ltd

Molecular Ecology (2012) doi: 10.1111/mec.12039

Theoretical studies in population genetics were prev-

alently based on the continuous-time approximation of

the Wright–Fisher model (Fisher 1930; Wright 1931),

which inter alia assumes a constant population size, ran-

domly mating individuals without structure and nonov-

erlapping generations in discrete time. Kimura (1955)

introduced the time rescaling, which generalizes several

results that were originally derived for the basic model

to deterministic changes in population size. Thereafter,

studies often considered particular demographic scenar-

ios such as instantaneous changes (e.g. Watterson 1984)

or exponential growth (e.g. Slatkin & Hudson 1991),

before Griffiths & Tavare (1994) investigated the coales-

cent for deterministic changes in population size. Fur-

thermore, the allele frequency spectrum that serves as

an essential statistic to find agreements between demo-

graphic models and samples of DNA sequences has

been derived (e.g. Griffiths & Tavare 1998; Živkovic &

Stephan 2011). Although deterministic changes in popu-

lation size are arguably more popular regarding the

interpretation of biological data, studies have also

focussed on stochastically varying population sizes (e.g.

Kaj & Krone 2003).

Departure from the Wright–Fisher assumptions

arises also if species show age-structured populations

due to specific life-history traits (Charlesworth 1994)

such as created by overlapping generations or germ

banks. Germ banks spanning over several generations

are ubiquitous characteristics to many species (Evans

& Dennehy 2005) encompassing seed dormancy in

plants (Templeton & Levin 1979; Nunney 2002; Evans

et al. 2007), resting eggs, for example, in pond sedi-

ments of Daphnia (Decaestecker et al. 2007) and sur-

vival of spores in bacteria (e.g. Lennon & Jones 2011).

It has been suggested theoretically (e.g. Templeton &

Levin 1979) and shown empirically (Evans et al. 2007;

Tielbörger et al. 2012) that adaptation for dormancy is a

bet-hedging strategy to magnify the evolutionary effect

of ‘good’ years and to dampen the effect of ‘bad’

years, that is, to buffer environmental variability.

Importantly, germ dormancy generates an increase in

the effective population size compared to the census

size of the observable population by (i) promoting the

storage of genetic diversity (Templeton & Levin 1979;

Nunney 2002) and (ii) counter-acting habitat fragmen-

tation by buffering against the extinction of small

and isolated populations—a phenomenon known as

‘temporal rescue effect’ (Brown & Kodric-Brown 1977;

Honnay et al. 2008). Seed banks are also key for the

conservation of endangered plant species as a life-his-

tory trait modulating habitat fragmentation. We denote

here the observable population as a general term

describing the census size of reproductive individuals

such as the above-ground plants, and the free living

individuals of invertebrates and bacteria. For simplic-

ity, we interchangeably refer to seed and germ banks

hereafter.

Based on the theoretical work of Kaj et al. (2001), seed

banking leads to an increase in the effective population

size because a coalescent event of two lineages can only

occur in a reproducing individual in the observable

population. Assuming a population of constant size in

time as Kaj et al. (2001), the relative allele frequencies

within a sample remain unchanged in the seed bank

model. This is because the underlying mean waiting

times to coalescence are equivalently stretched com-

pared to a population without a bank. However, we

reason that if a population undergoes size changes in

time, long germ banks with small germination rates

may buffer or enhance the effect of the demography.

The polymorphism signature, that is, the observed

genetic variability and the allelic spectrum, of a given

past demographic event may then be affected compared

to a population without banks. Germ banks would thus

create spurious signatures of a past population expan-

sion or a bottleneck, or make these signatures nondiffer-

entiable based on SNP data.

In this study, we analyse the effect of germ banks

on the detection of past population size changes under

neutrality. First, we derive the frequency spectrum for

a Wright–Fisher-type dynamics with germ bank and

for widely used models of population size changes

based on the work of Kaj et al. (2001), Tavare (1984),

Griffiths & Tavare (1998) and Živkovic & Stephan

(2011). By means of a simple expansion model, we

exemplify that numerous parameter combinations of

rate and time of growth lead to equivalent relative fre-

quency spectra depending on the germination rate.

Second, we investigate if this confounding effect of

seed banks is likely to strongly impede inference of

past demographic events. We simulate two simple

demographic scenarios—a population expansion and

an instantaneous decline—with variable parameters

assuming the presence of banks with different germi-

nation rates. To mimic current studies, we infer with

approximate Bayesian computation (ABC) (e.g. Beau-

mont et al. 2002) the past demography ignoring the

effect of seed banks on the polymorphism data. We

show that the model choice of the ABC method is

robust to differentiate between these two demographic

models. However, the parameter inference procedure

presents strong biases in the estimates. Finally, we

show that more complex demographic histories such as

bottlenecks, which include a decline and an expansion,

can even lead to an excess of low-, intermediate- or

high-frequency derived alleles depending on the usu-

ally unknown germination rate. The characterization of

certain demographic tendencies based on the frequency

spectrum is in these cases cumbersome. On the posi-

tive side, if knowledge on the germination rate is

available, relatively older changes in population size

remain detectable in the polymorphism data in com-

parison with the model without seed bank due to the

enlarged coalescent tree.

Methods and results

The coalescent for seed bank models with constant population size

Our model is based on the elegant urn model by Kaj

et al. (2001) describing the neutral seed bank dynamics

for a haploid population of constant size. In each generation, the population consists of N individuals with proportion $b_i$ originating from seeds produced $i = 1,\ldots,m$ generations ago, where m denotes the maximum number of generations a seed may spend in the bank. In a given generation, each individual is randomly drawn independently from the others with probabilities $b_1, \ldots, b_m$ from the appropriate generation. The population of a new generation is thus formed via multinomial

sampling from the previous m generations. Viewing

time retrospectively, each individual in a given genera-

tion is assigned randomly to an ancestor from the

previous m generations according to the above probabil-

ities. This procedure can be seen as a process in which

a sample of balls of initial size n at present is relocated

across the previous generations by sliding a window

that comprises m consecutive generations as cells in a

stepwise manner. When the window is slid one generation backwards, all balls from the first cell of the previous window are relocated into one of the m cells of

the actual window. More precisely, each ball is relo-

cated into one of the N slots of a given cell. Each slot

represents an individual of the population in the

respective generation. During the relocation process,

two types of coalescent events may occur: either two

balls are relocated into the same slot of the same cell or

a ball is relocated into a previously occupied slot. It has

been shown that more than one coalescent event happens with the negligible probability of $O(1/N^2)$ at a

time (Kaj et al. 2001). The probability of one coalescent

event is O(1/N) at each step, so that coalescences occur

in O(N) steps. In contrast, the configuration process

describing the distribution of balls across the cells of

the windows over time offers transitions between the

states at each step. So the configuration process has

time to reach an equilibrium between coalescent events

for large N (Kaj et al. 2001). This separation of time-

scales into a slow and a fast process has been applied

to several population genetic models (e.g. Nordborg &

Krone 2002).
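The relocation process described above can be explored with a small simulation. The following Python sketch is not the authors' simulator: it follows only two lineages, and the function names, the example age distribution and all parameter values are assumptions made for illustration. Each lineage jumps backwards by a seed age drawn from the probabilities $b_1, \ldots, b_m$; whenever both lineages sit in the same generation, they fall into the same slot with probability 1/N. If the separation of timescales holds, the mean coalescence time should be close to N times the squared mean seed age.

```python
import random

def pairwise_coalescence_time(N, age_probs, rng):
    """Generations back until two sampled lineages coalesce (toy two-lineage urn)."""
    ages = range(1, len(age_probs) + 1)
    pos = [0, 0]  # generation (counted backwards) in which each lineage currently sits
    while True:
        # Move the more recent lineage (lineage 0 on ties) back by a random seed age.
        mover = 0 if pos[0] <= pos[1] else 1
        pos[mover] += rng.choices(ages, weights=age_probs)[0]
        # Both lineages in the same generation: same slot with probability 1/N.
        if pos[0] == pos[1] and rng.random() < 1.0 / N:
            return pos[0]

def mean_coalescence_time(N, age_probs, replicates, seed=1):
    """Average coalescence time over independent replicates (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(N, age_probs, rng) for _ in range(replicates))
    return total / replicates

if __name__ == "__main__":
    # Hypothetical bank: half of the seeds germinate after one generation, half after two.
    probs = [0.5, 0.5]
    mean_age = sum(i * p for i, p in enumerate(probs, start=1))
    print("simulated:", mean_coalescence_time(50, probs, 2000))
    print("N * mean_age^2:", 50 * mean_age ** 2)
```

The sketch deliberately ignores samples larger than two and multiple mergers, so it only illustrates why dormancy stretches the waiting time to coalescence.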

The ancestral process of the seed bank model is denoted by $(A_n^N(k))_{k \ge 0}$, where $A_n^N(k)$ is the number of ancestors at step k with population size N and initial sample size n. Let $b = 1/E(B)$, where $E(B) = \sum_{i=1}^{m} i\, b_i$ is the expected value of the seed bank age distribution $P(B = j) = b_j$, $j = 1,\ldots,m$, or simply the mean time a seed will spend in the bank. Biologically, b is approximately the germination rate (Tellier et al. 2011a), so that we will refer to b as the germination rate throughout. The main result of Kaj et al. (2001) states that the time-rescaled ancestral process $(A_n^N([Nt]))_{t \ge 0}$ converges as $N \to \infty$ to the continuous-time Markov chain $(A_n(t))_{t \ge 0}$ with infinitesimal generator matrix $Q = (q_{ij})_{i,j \in \{1,\ldots,n\}}$ defined by

$$
\begin{aligned}
q_{ii} &= -b^2 \binom{i}{2}, & 2 \le i \le n,\\
q_{i,i-1} &= b^2 \binom{i}{2}, & 2 \le i \le n,\\
q_{ij} &= 0, & \text{otherwise}.
\end{aligned}
\tag{1}
$$

So the limiting process of the seed bank model is the n-

coalescent (Kingman 1982) run on a slower timescale.
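As a minimal sketch of the two definitions above, the following Python code computes the germination rate b from an age distribution and fills the generator matrix of eqn 1. The function names and the list-of-lists matrix layout are assumptions made for illustration; index 0 of the matrix stands for the state with one ancestor.

```python
from math import comb

def germination_rate(age_probs):
    """b = 1/E(B), where age_probs[i-1] = b_i = P(B = i)."""
    mean_age = sum(i * p for i, p in enumerate(age_probs, start=1))
    return 1.0 / mean_age

def generator_matrix(n, b):
    """Generator Q of eqn 1; row and column index i-1 correspond to i ancestors."""
    Q = [[0.0] * n for _ in range(n)]
    for i in range(2, n + 1):
        rate = b ** 2 * comb(i, 2)
        Q[i - 1][i - 1] = -rate  # rate of leaving the state with i ancestors
        Q[i - 1][i - 2] = rate   # transition i -> i - 1 (one coalescence)
    return Q

if __name__ == "__main__":
    b = germination_rate([0.5, 0.5])  # hypothetical two-generation bank
    for row in generator_matrix(4, b):
        print(row)
```

Every row of Q sums to zero and the state with a single ancestor is absorbing, as expected for Kingman's coalescent with all rates multiplied by $b^2$.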

From eqn 1, it is straightforward to derive the probability that the process $A_n(t)$ is in a certain state, in which there are j = n,…,2 ancestors, at time t via the matrix method (e.g. Tavaré 1984; Živković & Stephan 2011). After some algebra, one obtains

$$
P(A_n(t) = j) = \sum_{k=j}^{n} c_{nk}\, r_{kj} \exp\left(-b^2 \binom{k}{2} t\right), \tag{2}
$$

where $c_{nk} = \binom{n}{k} k_{(k)} / n_{(k)}$ and $r_{kj} = (-1)^{k-j} \binom{k}{j} j_{(k-1)} / k_{(k-1)}$ are the elements of the matrices of column and row eigenvectors of Q, respectively, and $a_{(b)} = a(a+1) \cdots (a+b-1)$, $a_{(0)} = 1$. The mean waiting times between coalescent events are given by

$$
E(T_j) = \int_0^\infty P(A_n(t) = j)\, dt = \cdots = \left(b^2 \binom{j}{2}\right)^{-1}, \tag{3}
$$

as the inverse of the coalescent rate. The germination rate, b, is bounded as $1/m \le b \le 1$. The lower and upper bounds result from the scenarios in which all seeds rest m generations and one generation in the bank, respectively. So the expected coalescent tree can be up to $m^2$ times longer in the seed bank model compared with the usual Wright–Fisher model.
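Eqns 2 and 3 can be checked numerically. The Python sketch below evaluates the state probabilities of eqn 2 and integrates them term by term, which should reproduce the closed form of eqn 3. All function names are assumptions made for illustration, and the alternating sum is only numerically reliable for moderate sample sizes.

```python
from math import comb, exp, prod

def rising(a, k):
    """Rising factorial a_(k) = a (a+1) ... (a+k-1), with a_(0) = 1."""
    return prod(a + i for i in range(k))

def coeff(n, k, j):
    """Product c_nk * r_kj of the eigenvector elements in eqn 2."""
    c_nk = comb(n, k) * rising(k, k) / rising(n, k)
    r_kj = (-1) ** (k - j) * comb(k, j) * rising(j, k - 1) / rising(k, k - 1)
    return c_nk * r_kj

def prob_ancestors(n, j, t, b):
    """P(A_n(t) = j) for 2 <= j <= n (eqn 2)."""
    return sum(coeff(n, k, j) * exp(-b ** 2 * comb(k, 2) * t)
               for k in range(j, n + 1))

def mean_waiting_time(n, j, b):
    """E(T_j): each exponential in eqn 2 integrated over t from 0 to infinity."""
    return sum(coeff(n, k, j) / (b ** 2 * comb(k, 2)) for k in range(j, n + 1))

if __name__ == "__main__":
    n, b = 6, 0.5
    for j in range(2, n + 1):
        print(j, mean_waiting_time(n, j, b), 1.0 / (b ** 2 * comb(j, 2)))
```

For b = 0.5 each waiting time is four times its value without a seed bank, which is the $1/b^2$ stretching of the tree discussed above.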

The coalescent for seed bank models with variable population size

When population size changes occur, plants and seeds

of all age classes are assumed to be equivalently

affected such that the relative proportions of all types of seeds remain constant over time. Then the probabilities $b_1, \ldots, b_m$ and therefore the germination rate b remain

constant over time as well, such that in the urn model a

change in population size solely alters the number of

slots in the corresponding cell. One may also think of a

more complex neutral model, in which an environmen-

tal change affects solely the plants but not the seeds of

the corresponding generation, such that subsequently

the proportions of seeds of different age classes could

very well change. However, for mathematical conve-

nience, we focus on the simplified setting.

In discrete time, let $q_N(i+k-1) = N(i+k-1)/N$ be the ratio of the population size, $N(i+k-1)$, at the ith

cell of the kth m-window, relative to the population

size, N, at time of sampling. As usual, the population

size is assumed to be large in each generation, which

will particularly allow the configuration process to

reach an equilibrium between coalescence events as in

the case of constant population size. The demographic

changes here occur on the coalescent time scale and not

generation-wise (as in Nunney 2002). Moreover, we

require that the population size remains approximately

constant over a given m-window $k_0$ as determined by the population size of the first cell, that is, $q_N(i+k_0-1) \approx q_N(k_0)$. This simplification holds in

particular for a geometrically growing population, if the

growth rate and m are chosen realistically small. In the

case of an instantaneous population decline, this relationship is violated just for $m-1$ generations, so that for

small m instantaneous changes within a window can be

neglected due to the small corresponding coalescence

probability for large population sizes. In summary,

models encompassing these forms of demographic

changes can be approximately treated as the usual

Wright–Fisher model regarding changes in population

size (e.g. Griffiths & Tavare 1994) as being determined

by the first cell of each window.

In continuous time, let the function q(t), which arises from $q_N([Nt])$ as $N \to \infty$ and time being measured in units of N generations, be piecewise continuous and bounded. The time-rescaling argument for the coalescent approximation of the usual Wright–Fisher model (e.g. Griffiths & Tavaré 1994), $t \to \int_0^t q(s)^{-1}\,ds$, can be applied to the ancestral process $(A_n(t))_{t \ge 0}$ to obtain the process with time-varying population size $(A_n^q(t))_{t \ge 0}$ according to the convention in discrete time. Therefore, the corresponding results to eqns 2 and 3 are given by

$$
P(A_n^q(t) = j) = \sum_{k=j}^{n} c_{nk}\, r_{kj} \exp\left(-b^2 \binom{k}{2} \int_0^t q(s)^{-1}\,ds\right) \tag{4}
$$

and

$$
E(T_j) = \int_0^\infty P(A_n^q(t) = j)\, dt, \tag{5}
$$

respectively. It might be worth mentioning that all of

the above equations hold for a diploid population of

size N and scaling time in units of 2N generations. For

simplicity, we will mostly consider demographies that

comprise exponential growth and instantaneous

declines. As a first example, we illustrate in Fig. 1 the

effect of different germination rates on the genealogy of

a sample from an exponentially growing population.

The ratio of the means of external and total branch

lengths, which are both simply obtained from eqns 4

and 5, is used as a tree measure. These ratios are clearly

elevated compared with the basic model of a constant

population size (without dormancy). Equivalent curves

are found for different values of b, as the time of expan-

sion, te, is shifted to the past, and the growth rate, R, is

adequately reduced to keep the ratio of the ancestral

and the actual population size, d, constant. The strong-

est accumulation of coalescent events as represented by

the peaks of this measure is shifted to the past with

decreasing b, as seeds remaining longer in the bank

compensate the demographic coalescence pressure.
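The statement that equivalent curves arise when the expansion is shifted to the past and the growth rate is reduced can be checked numerically from eqns 4 and 5. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the growth phase is integrated with Simpson's rule (a choice made here), and the constant ancestral phase is integrated in closed form. It uses the expansion model q(t) = exp(−Rt) for t < te and q(t) = d = exp(−R te) afterwards.

```python
from math import comb, exp, expm1, prod

def rising(a, k):
    """Rising factorial a_(k) = a (a+1) ... (a+k-1), with a_(0) = 1."""
    return prod(a + i for i in range(k))

def coeff(n, k, j):
    """Product c_nk * r_kj of the eigenvector elements in eqn 2."""
    c_nk = comb(n, k) * rising(k, k) / rising(n, k)
    r_kj = (-1) ** (k - j) * comb(k, j) * rising(j, k - 1) / rising(k, k - 1)
    return c_nk * r_kj

def integral_survival(rate, t_e, R, steps=2000):
    """Integral over t >= 0 of exp(-rate * L(t)), with L(t) = int_0^t q(s)^-1 ds."""
    def rescaled(t):
        # Rescaled time during the growth phase; the limit R -> 0 is L(t) = t.
        return t if R == 0 else expm1(R * t) / R
    # Simpson's rule on [0, t_e]; steps must be even.
    h = t_e / steps
    s = exp(-rate * rescaled(0.0)) + exp(-rate * rescaled(t_e))
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * exp(-rate * rescaled(i * h))
    growth_part = s * h / 3
    # Closed form on [t_e, infinity), where the relative size is constant at d.
    d = exp(-R * t_e)
    ancestral_part = exp(-rate * rescaled(t_e)) * d / rate
    return growth_part + ancestral_part

def mean_waiting_time(n, j, b, t_e, R):
    """E(T_j) from eqns 4 and 5 for the expansion model."""
    return sum(coeff(n, k, j) * integral_survival(b ** 2 * comb(k, 2), t_e, R)
               for k in range(j, n + 1))

if __name__ == "__main__":
    n = 5
    for j in range(2, n + 1):
        with_bank = mean_waiting_time(n, j, 0.2, 2.88, 0.8)
        without_bank = mean_waiting_time(n, j, 1.0, 2.88 * 0.2 ** 2, 0.8 / 0.2 ** 2)
        print(j, with_bank * 0.2 ** 2, without_bank)
```

In this sketch the waiting times with a bank equal those without a bank divided by $b^2$ once te is multiplied by $b^2$ and R is divided by $b^2$, so all ratios of branch lengths coincide.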

Fig. 1 The ratio of the expected external branch length, $E(T_e)$, and the expected total tree length, $E(T_c)$, is plotted over the time of expansion, $t_e$, for a sample of size n = 20 and various values of b, which is the inverse of the mean time a seed will spend in the bank. The underlying demography is a population expansion from previously constant size, that is, $q(t) = \exp(-Rt)$, $0 \le t < t_e$, $q(t) = d$, $t_e \le t$. R is determined by $t_e$ as $R = -\log(d)/t_e$, and d = 0.1. The basic model refers to a constant population size without seed bank.

Adding mutations to the genealogy

Kaj et al. (2001) modelled neutral mutations as being age-

dependent, so that older seeds accumulate more muta-

tions than younger ones. However, to shorten notation,

we assume the mutation rate to be identical for seeds of

all age classes, as (i) following Kaj et al. (2001) the differ-

ent rates are summarized into an overall mutation rate,

so that the results below are applicable in their age-

dependent model as well, and (ii) evidence for such a

dependency is scarce (Vitalis et al. 2004; Honnay et al.

2008; but see Levin 1990). Neglecting seed banks, muta-

tions are commonly assumed to occur independently

according to Poisson processes of rate h/2 along the

edges of the coalescent tree, where the population mutation rate $h = \lim_{N \to \infty} 2Nl$ with N being the population

size at the time of sampling and l being the mutation

probability per sequence per generation. The seed bank

model requires in addition that mutations are solely

superimposed on the above-ground edges of the coalescent tree. Similarly to Kaj et al. (2001), we thus introduce the scaled mutation rate for seed bank models, $h_b$, referred to as the b-scaled mutation rate, via the relationship $h_b = bh$, as seeds germinate on average every 1/b generations. This relationship particularly holds for time-varying population size, as each ancestral line on average

remains above-ground an equivalent amount of time, and

mutations occur along the above-ground edges condi-

tional on their lengths. Therefore, and assuming an infi-

nitely many sites mutation model (Kimura 1969), where

each mutation arises at a previously monomorphic site,

results regarding allele frequency spectra of general bin-

ary coalescent trees (Griffiths & Tavare 1998) are applica-

ble and given in eqns 6 and 7. The allele or site frequency

spectrum, hereafter denoted as frequency spectrum, is

the distribution of the number of derived alleles in a sam-

ple of size n over a large number of polymorphic sites. As

mutations can be either counted absolutely or relative to

the total number of segregating sites, we note and use

both results. The absolute and relative frequency spectra,

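$f_i$ and $r_i$, $1 \le i \le n-1$, are, respectively, given by

$$
f_i = \frac{h_b}{2} \sum_{k=2}^{n-i+1} k\, \frac{\binom{n-i-1}{k-2}}{\binom{n-1}{k-1}}\, E(T_k), \tag{6}
$$

and

$$
r_i = \sum_{k=2}^{n-i+1} k\, \frac{\binom{n-i-1}{k-2}}{\binom{n-1}{k-1}}\, E(T_k) \Bigg/ \sum_{k=2}^{n} k\, E(T_k). \tag{7}
$$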

Again, these equations are applicable in the diploid

case as well, where time is scaled in units of 2N generations using $h_b = bh$ and $h = \lim_{N \to \infty} 4Nl$. For constant

population size, the number of mutations is elevated by

1/b in each class of the absolute frequency spectrum, fi,

compared to the model without dormancy, whereas the

relative frequency spectrum, ri, is equivalent with and

without dormancy. We revisit the demographic exam-

ple of Fig. 1 in terms of relative allele frequencies, ri, by

applying eqn 5 to eqn 7. The expansions (Fig. 2a) are

chosen so that the corresponding frequency spectra, ri,

are equivalent for the different germination rates

(Fig. 2b), which holds for arbitrary values of b, te and R

as long as the values of $t_e b^2$, d and $R/b^2$ remain the same. The amount of singletons corresponds to the maximum value of $E(T_e)/E(T_c)$ in Fig. 1. This example particularly shows that a recent ($t_e = 0.12$) and strong (R = 20) expansion without seed bank has an identical frequency spectrum as an old ($t_e = 2.88$) and weak

(R = 0.8) expansion with a small germination rate

(b = 0.2). Thus, expansions will be dated as too recent

and growth rates overestimated, when seed banks are

not taken into account. The impact of the germination rate, b, on the estimation of demographic changes is studied in more detail in the next section.

Fig. 2 Three different parameter combinations of the rate and time of growth (a) leading to equivalent relative frequency spectra, $r_i$, depending on the germination rate (b). The parameter combinations of the curves from left to right (a) are given top down in the legend of (b).
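Eqns 6 and 7 translate directly into code once the mean waiting times E(T_k) are available. The following Python sketch uses hypothetical function names and, as a simplifying assumption, the constant-size waiting times of eqn 3; it illustrates that the absolute spectrum is elevated by 1/b while the relative spectrum does not depend on b.

```python
from math import comb

def spectrum_weights(n, mean_times):
    """Sum over k of k * C(n-i-1, k-2) / C(n-1, k-1) * E(T_k), for i = 1..n-1.

    mean_times[k] must hold E(T_k) for k = 2..n.
    """
    weights = []
    for i in range(1, n):
        w = sum(k * comb(n - i - 1, k - 2) / comb(n - 1, k - 1) * mean_times[k]
                for k in range(2, n - i + 2))
        weights.append(w)
    return weights

def absolute_spectrum(n, mean_times, h_b):
    """f_i of eqn 6, with h_b the b-scaled mutation rate."""
    return [h_b / 2 * w for w in spectrum_weights(n, mean_times)]

def relative_spectrum(n, mean_times):
    """r_i of eqn 7: weights normalized by the expected total tree length."""
    total = sum(k * mean_times[k] for k in range(2, n + 1))
    return [w / total for w in spectrum_weights(n, mean_times)]

def constant_size_times(n, b):
    """E(T_k) of eqn 3 for a population of constant size."""
    return {k: 1.0 / (b ** 2 * comb(k, 2)) for k in range(2, n + 1)}

if __name__ == "__main__":
    n, h = 10, 12.5
    for b in (1.0, 0.5):
        times = constant_size_times(n, b)
        print(b, absolute_spectrum(n, times, b * h)[:3], relative_spectrum(n, times)[:3])
```

Waiting times for a time-varying population size (eqn 5) can be passed in the same dictionary format, for example after numerical integration.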

Simulation procedure and pseudo-observed datasets

On the basis of two simple models of time-varying pop-

ulation size, we study how well the demographic history

of a given single population can be retrieved in a species

with seed bank. The estimation is based on approximate

Bayesian computation (ABC) (e.g. Beaumont et al. 2002)

using the mean of the absolute frequency spectrum

across loci as the set of summary statistics. In the follow-

ing, we use the absolute instead of the relative frequency

spectrum, as it captures information on the number of

segregating sites. Two situations are modelled. First, we

mimic the common situation where no information on

the seed bank is available, that is, seed banks are ignored

and the demographic coalescent inference rests on the

simple Wright–Fisher model with past population size

changes. Second, and for comparison, we assume that

the germination rate b is known, which corresponds to a

few rare cases (Tellier et al. 2011a). Here, the estimation

procedure is conducted taking the existence of seed

banks into account using a coalescence model with seed

banks and known values of b. The third possibility of

simultaneously estimating the demography and b is not

taken into account due to the various combinations of

demographic parameters and b-values, which result in

equivalent frequency spectra (Fig. 2).

The studied population experiences either an expo-

nential growth from previously constant population size

or an instantaneous population decline. The sequences

sampled from the population are thereafter denoted as

pseudo-observed data sets. Exponential growth is mod-

elled as above with three parameters: the time, te, at

which the population expansion starts, the growth rate,

R, and the population mutation rate, h. The ratio, d, of

the ancestral population size before the expansion and

the current population size is determined by te and R

(Fig. 1). The decline model has also three parameters:

the time of decline, td, the ratio of the ancestral and the

current population size, d, and the population mutation

rate, h. Simulations are performed using a modified

version of the coalescent program ms (Hudson 2002) as

previously developed in Tellier et al. (2011a). The

coalescent simulator (C++ code available in the asso-

ciated Dryad patch) follows the expectations of the the-

oretical model as described in eqns 4–7 previously.

The pseudo-observed data sets are composed of a

sample of 20 chromosomes, sequenced at 1000 neutral

and independent loci without intra-locus recombination.

Such data sets capture the signatures of past demogra-

phy on the genome at multiple independent and

neutral loci and represent typically next-generation

sequencing or whole-genome data. These data sets are

simulated under the expansion model with parameters

$t_{e,obs}$, $R_{obs}$ and $h_{obs}$ or under the decline model with parameters $t_{d,obs}$, $d_{obs}$ and $h_{obs}$, assuming a seed bank

with b-values of 0.1, 0.2, 0.3, 0.4, 0.5, 0.75 and 0.95,

mimicking long to very short times of dormancy. For

each b-value, 500 pseudo-observed data sets are gener-

ated by drawing values of the three model parameters

randomly from a uniform distribution (Table 1). Note

that the population mutation rate is identical for all

data sets ($h_{obs} = 12.5$). This value is chosen (i) to generate a sufficient number of segregating sites at each locus

to perform the statistical estimation procedure and (ii)

based on the very high observed genetic diversity in

wild tomato species exhibiting seed banks (Tellier et al.

2011a). Note that data sets with fewer segregating sites and/or loci will yield more ambiguous

results. Furthermore, the highly idealized simulation

conditions, which assume known mutation and germi-

nation rates, are chosen to narrow down the effect of

seed banks on the estimation of the demographic

parameters.

Table 1 Ranges of values for the set of pseudo-observed data under population expansion and decline. The b-scaled mutation rate, $\theta_{b,\mathrm{obs}}$, is chosen so that the population mutation rate, $\theta_{\mathrm{obs}} = \theta_{b,\mathrm{obs}}/b$, has a fixed value of 12.5 per locus. The ranges of $t_{e,\mathrm{obs}}$ and $t_{d,\mathrm{obs}}$ are obtained by dividing the ranges without germ bank (b = 1) by $b^2$.

| b | $\theta_{b,\mathrm{obs}}$ | Expansion: $R_{\mathrm{obs}}$ | Expansion: $t_{e,\mathrm{obs}}$ | Decline: $d_{\mathrm{obs}}$ | Decline: $t_{d,\mathrm{obs}}$ |
|---|---|---|---|---|---|
| 0.1 | 1.25 | 0–5 | 20–400 | 1–20 | 20–1000 |
| 0.2 | 2.5 | 0–5 | 5–100 | 1–20 | 5–250 |
| 0.3 | 3.75 | 0–5 | 2.22–44.44 | 1–20 | 2.22–111.11 |
| 0.4 | 5 | 0–5 | 1.25–25 | 1–20 | 1.25–62.5 |
| 0.5 | 6.25 | 0–5 | 0.8–16 | 1–20 | 0.8–40 |
| 0.75 | 9.375 | 0–5 | 0.3556–7.11 | 1–20 | 0.3556–17.778 |
| 0.95 | 11.875 | 0–5 | 0.2216–4.432 | 1–20 | 0.2216–11.0803 |
| 1 | 12.5 | 0–5 | 0.2–4 | 1–20 | 0.2–10 |

Practically, the b-scaled mutation rate varies for different b-values (Table 1) via the relationship $\theta_{b,\mathrm{obs}} = b\,\theta_{\mathrm{obs}}$. This allows for a statistical comparison of the results for the various b-values based on a

comparable average number of segregating sites across

all data sets. The growth rate, $R_{\mathrm{obs}}$, of the expansion model ranges conservatively from 0 (no expansion) to 5 for all values of b. The time of expansion, $t_{e,\mathrm{obs}}$, varies depending on the b-values, with larger ranges for lower values of b (Table 1). The rationale for the choice of these ranges derives from the previous rescaling argument. For the decline model, the ratio of the ancestral to the present population size, $d_{\mathrm{obs}}$, ranges from 1 (no decline) to 20. The time of decline, $t_{d,\mathrm{obs}}$, is also chosen depending on the b-values (Table 1). Each pseudo-observed data set is summarized as the absolute frequency spectrum across 1000 loci using a combination of R, C++ codes and the libsequence library (Thornton 2003).
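The way the parameter ranges of Table 1 follow from the ranges without germ bank can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dictionary layout are hypothetical and are not taken from the archived R and C++ codes; the base ranges, the division of times by the square of b and the relationship between the b-scaled and the population mutation rate are those given in Table 1 and the text.

```python
import random

# Ranges without germ bank (b = 1), taken from the last row of Table 1.
BASE_RANGES = {
    "expansion": {"R": (0.0, 5.0), "t": (0.2, 4.0)},
    "decline": {"d": (1.0, 20.0), "t": (0.2, 10.0)},
}
THETA_OBS = 12.5  # population mutation rate per locus, fixed for all data sets


def parameter_ranges(model, b):
    """Return the sampling ranges for a given model and germination rate b."""
    ranges = dict(BASE_RANGES[model])
    t_low, t_high = ranges["t"]
    # Times are divided by b^2, following the rescaling argument in the text.
    ranges["t"] = (t_low / b**2, t_high / b**2)
    return ranges


def draw_pseudo_observed(model, b, rng):
    """Draw one parameter set uniformly within the ranges for this b."""
    draw = {name: rng.uniform(low, high)
            for name, (low, high) in parameter_ranges(model, b).items()}
    draw["theta_b"] = b * THETA_OBS  # b-scaled mutation rate
    return draw


rng = random.Random(1)
print(parameter_ranges("expansion", 0.5))  # time range 0.8 to 16, as in Table 1
print(draw_pseudo_observed("decline", 0.1, rng))
```

Repeating the draw 500 times per b-value would give parameter sets of the kind described above; the coalescent simulation itself is not part of this sketch.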

In the following, we estimate for each pseudo-

observed data set generated under population expan-

sion or decline (i) the demographic scenario, that is

expansion, constant size or decline, using the ABC

model choice procedure (e.g. Beaumont et al. 2002) and

(ii) the demographic parameters by means of the local

regression algorithm of the ABC (Excoffier et al. 2005).

Simulation step of the ABC

The simulation step of the ABC comprises 1 000 000

data sets without and with seed banks (for each of the

various b-values as noted previously), respectively, for

each demographic model (expansion, constant size and

decline) and the same number of sampled individuals

and independent neutral loci as for the pseudo-observed data sets. Each demographic scenario is simulated given a set of three parameters ($t_{e,\mathrm{sim}}$ or $t_{d,\mathrm{sim}}$, $R_{\mathrm{sim}}$ or $d_{\mathrm{sim}}$, and $\theta_{\mathrm{sim}}$), each drawn randomly from a uniform prior distribution (Table S1, Supporting information). The

ABCest program (Excoffier et al. 2005) is used to

retrieve the 2000 simulations with the smallest Euclid-

ean distance to the pseudo-observed data sets regarding

the absolute frequency spectrum across loci. The prior

distributions (Table S1, Supporting information) encom-

pass the range of values of the pseudo-observed data

sets (Table 1). To avoid potential variability in the number of segregating sites, the prior range of $\theta_{b,\mathrm{sim}}$ (Table S1, Supporting information) is adjusted in the seed bank models according to the respective values of $\theta_{b,\mathrm{obs}}$ in the pseudo-observed data sets (Table 1).
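The rejection step described above, retaining the simulations whose frequency spectra lie closest to a pseudo-observed spectrum, can be illustrated with a small Python sketch. It is not the ABCest implementation: the pairing of parameters with spectra, the function names and the toy numbers are assumptions made for this example only.

```python
import heapq
import math


def euclidean_distance(spectrum_a, spectrum_b):
    """Euclidean distance between two absolute frequency spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spectrum_a, spectrum_b)))


def retain_closest(observed_spectrum, simulations, n_keep=2000):
    """Keep the n_keep simulations whose spectra are closest to the observed one.

    `simulations` is an iterable of (parameters, spectrum) pairs; this pairing
    is an assumption of the sketch, not the ABCest input format.
    """
    scored = ((euclidean_distance(observed_spectrum, spectrum), index, params)
              for index, (params, spectrum) in enumerate(simulations))
    # The index breaks ties so that parameter dicts are never compared.
    closest = heapq.nsmallest(n_keep, scored)
    return [(distance, params) for distance, _, params in closest]


# Toy usage with three simulated spectra and n_keep = 2.
observed = [120, 60, 40, 30]
simulated = [
    ({"R": 1.0}, [118, 62, 41, 29]),
    ({"R": 4.0}, [200, 50, 20, 10]),
    ({"R": 2.0}, [125, 58, 38, 30]),
]
for distance, params in retain_closest(observed, simulated, n_keep=2):
    print(round(distance, 3), params)
```

In the toy example the simulations with R = 1.0 and R = 2.0 are retained. Using a bounded selection such as `heapq.nsmallest` avoids sorting all 1 000 000 distances when only 2000 are kept.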

Model choice procedure and performance for the chosen demographic scenarios

The model choice procedure rests on a weighted

multinomial logistic regression computed on the 2000

simulations closest to the pseudo-observed data

(Beaumont et al. 2002). Bayes factors are calculated as

the ratio of the posterior probabilities for an expansion

or a decline against the respective other models (Kass

& Raftery 1995). We record the Bayes factors for the

500 pseudo-observed data sets for both demographic

models with and without seed banks. The correct

model is considered to be chosen conservatively if

its Bayes factor is higher than five (Kass & Raftery

1995).
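As a rough illustration of the Bayes factor criterion, the sketch below replaces the weighted multinomial logistic regression by a much simpler estimate: the posterior probability of each model is taken as its frequency among the retained simulations. This substitution, the function names and the counts are assumptions for illustration; only the ratio of posterior probabilities and the threshold of five come from the text.

```python
from collections import Counter


def model_posteriors(retained_labels):
    """Estimate posterior model probabilities as frequencies among retained simulations."""
    counts = Counter(retained_labels)
    total = sum(counts.values())
    return {model: count / total for model, count in counts.items()}


def bayes_factor(posteriors, model):
    """Ratio of the posterior probability of `model` to that of all other models."""
    p_model = posteriors.get(model, 0.0)
    p_others = 1.0 - p_model
    return float("inf") if p_others == 0.0 else p_model / p_others


# Toy example: model labels of 2000 retained simulations (made-up counts).
labels = ["expansion"] * 1800 + ["constant"] * 150 + ["decline"] * 50
posteriors = model_posteriors(labels)
bf = bayes_factor(posteriors, "expansion")
print(bf, bf > 5)  # a Bayes factor above five counts as conservative support
```

With 1800 of 2000 retained simulations from the expansion model, the ratio is about 9 and the expansion model would be considered chosen conservatively.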

In at least 95% of the pseudo-observed data sets, the occurrence of a past demographic expansion (against constant size and decline models; Table S2, Supporting information) is correctly inferred, irrespective of whether seed banks are taken into account. For a population decline, on the other hand, the correct model is recovered well when seed banks are ignored, but less well when the seed bank parameter is known (Table S2,

Supporting information). The genomic signature of a

past population expansion, such as an excess of low-

frequency derived polymorphisms in the frequency

spectrum, in a species with seed bank is thus not con-

founded with decline or constant population size mod-

els without seed bank. Similarly, the signature of a past

decline in the frequency spectrum appears to be distin-

guishable from that of the other two demographic mod-

els. In other words, even when ignoring the effect of a

seed bank, it is mostly possible to distinguish the signa-

tures of population expansion and decline with reason-

able certainty.

Method for parameter inference

The parameter estimation procedure is based on the

generated data from the simulation step of the ABC.

The parameters of the expansion model with a certain seed bank parameter b0, for example, are estimated for an expansion model without a germ bank and for one with a germ bank characterized by b0. Estimates of the posterior distributions (mode and 95% credibility intervals) of each of the three model parameters ($t_{e,\mathrm{est}}$ or $t_{d,\mathrm{est}}$, $R_{\mathrm{est}}$ or $d_{\mathrm{est}}$, and $\theta_{\mathrm{est}}$) are obtained by applying the locally weighted multivariate regression method implemented in the ABCest program (Beaumont et al. 2002) to the 2000 simulated data sets closest to each of the 500 pseudo-observed data sets. We summarize the accuracy of the parameter estimates by calculating the relative error (RE) for each of the 500 pseudo-observed data sets and the root mean square error (RMSE) across them (e.g. Tellier et al. 2011b). The relative error of the time of expansion, for example, is given by $\mathrm{RE}_{t_e} = (t_{e,\mathrm{est}} - t_{e,\mathrm{obs}})/t_{e,\mathrm{obs}}$, with a negative and a positive value indicating that the parameter is under- and overestimated, respectively. The RMSE is the square root of the average squared relative error over #sim (here: 500) data sets. For the time of expansion, for example, it is given by

$$\mathrm{RMSE}_{t_e} = \sqrt{\frac{1}{\#\mathrm{sim}} \sum \left( \frac{t_{e,\mathrm{est}} - t_{e,\mathrm{obs}}}{t_{e,\mathrm{obs}}} \right)^2},$$

where higher values of RMSE indicate a greater estima-

tion inaccuracy. For the presentation of the results and

the calculation of the RMSEs, the 10 most extreme val-

ues of RE are removed for each model parameter. This

allows us to reject unrealistic overestimates of the

observed value, which can be due to a boundary effect

in the prior or in the observed parameter values (Tellier

et al. 2011b).
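The two accuracy measures and the removal of extreme values can be written out as a short Python sketch. The function names and the toy numbers are hypothetical, and "most extreme" is read here as largest absolute relative error, which is an assumption about the procedure.

```python
import math


def relative_errors(estimates, true_values):
    """RE = (estimate - true) / true for each pseudo-observed data set."""
    return [(est - obs) / obs for est, obs in zip(estimates, true_values)]


def trim_extremes(errors, n_drop=10):
    """Drop the n_drop values of largest absolute relative error."""
    # Assumption: "most extreme" is read here as largest |RE|.
    keep = len(errors) - n_drop
    return sorted(errors, key=abs)[:max(keep, 0)]


def rmse(errors):
    """Square root of the mean squared relative error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


estimates = [1.1, 0.9, 2.4, 0.5, 30.0]
true_values = [1.0, 1.0, 2.0, 1.0, 1.0]
errors = relative_errors(estimates, true_values)
print(rmse(trim_extremes(errors, n_drop=1)))
```

In the toy data the single estimate of 30.0 has a relative error of 29 and would dominate the RMSE if it were not removed, which is the kind of boundary effect the trimming is meant to guard against.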

We compare here the accuracies of models with and

without germ banks in estimating the demographic

parameters from pseudo-observed sequence data with a

given germ bank. Note that the demographic model,

that is, a past decline or an expansion, of the pseudo-

observed data is assumed to be known, which allows

us to focus on the impact of the germ bank in the

parameter estimation procedure.

Inference of past expansion

When seed banks are disregarded, the expansion rate, R, is overestimated (Fig. 3b) compared to the estimates assuming the correct b-values (Fig. 3a). The values of RE are higher for lower values of b and proportional to $1/b^2$ (Fig. 3a, b), as also indicated by higher values of RMSE for smaller b (Table S3, Supporting information). Conversely, the time of expansion, $t_e$, is strongly underestimated when ignoring seed banks (Fig. 4b) compared to the estimates with known seed bank (Fig. 4a). Estimates of $\theta$ are similarly accurate irrespective of whether seed banks are taken into account (Fig. S1, Table S3, Supporting information), as this parameter has been scaled by b, so that $\theta_b$ provides a similar number of segregating sites for all models.

Based on our previous theoretical argument that equivalent frequency spectra are obtained for arbitrary values of b, $t_e$ and R when $t_e b^2$ and $R/b^2$ are fixed (Fig. 1), the overestimation of the growth rate R, with RE proportional to $1/b^2$, is expected when seed banks are ignored. After rescaling the parameter values estimated without seed banks (Figs 3b and 4b), that is, multiplying the estimated growth rate, $R_{\mathrm{est}}$, and dividing the estimated time of expansion, $t_{e,\mathrm{est}}$, by $b^2$, the values of RE and RMSE are similar to those obtained with seed banks (Figs 3c and 4c, Table S3, Supporting information). We therefore confirm (i) that ignoring seed banks when estimating the demographic parameters leads to an overestimation of the growth rate by a factor of $1/b^2$ and an underestimation of the time of expansion by a factor of $b^2$ and (ii) that these errors are not due to our statistical method or the choice of the absolute frequency spectrum as the summary statistic.

Fig. 3 Relative errors for the growth rate, R, of a past population expansion. The x-axis indicates the b-values under which the pseudo-observed data are generated, that is, for a model with germ bank. The relative error distribution over the 500 data sets is shown assuming (a) a model with germ bank and parameter b equal to that of the pseudo-observed data set, (b) a model without germ bank and (c) a model without germ bank with values from (b) multiplied by $b^2$.
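The rescaling of estimates obtained without seed banks amounts to one multiplication and one division. A minimal Python sketch, with a hypothetical function name and made-up input values, might look as follows.

```python
def rescale_no_seedbank_estimates(r_est, te_est, b):
    """Map estimates from a model without seed bank onto the seed bank scale.

    Follows the rescaling described in the text: the growth rate is
    multiplied by b^2 and the time of expansion is divided by b^2.
    """
    if not 0.0 < b <= 1.0:
        raise ValueError("germination rate b must lie in (0, 1]")
    return r_est * b**2, te_est / b**2


# Hypothetical estimates obtained while ignoring a seed bank with b = 0.5.
r_rescaled, te_rescaled = rescale_no_seedbank_estimates(r_est=8.0, te_est=1.0, b=0.5)
print(r_rescaled, te_rescaled)  # 2.0 4.0
```

For b = 0.5, a growth rate estimated as 8 without seed bank corresponds to 2 on the seed bank scale, and an expansion time estimated as 1 corresponds to 4.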

Inference of past decline

Demographic parameters for decline models are misestimated, as shown in Figs S2, S3 and Table S4, Supporting information. Note that the RMSE-values are low because the decline ratio, d, and more so the time of decline, $t_d$, are strongly underestimated irrespective of the value of b in models with and without seed banks. The population mutation rate, $\theta$, is most accurately estimated, and in the model with seed bank $\theta$ is slightly more overestimated for lower b-values (Fig. S4a, Supporting information). Finally, rescaling the parameter estimates of the model without seed bank via $b^2$ does not improve the results, in contrast to the expansion model. This points to the insufficiency of the allelic spectrum for the estimation of decline parameters, as (i) declines show a large variance in polymorphism patterns among loci in contrast to expansion scenarios (Živkovic & Wiehe 2008) and (ii) relatively old events become barely distinguishable from the basic model.
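Why strong underestimation goes together with low RMSE-values can be seen from the definition of the relative error: for a positive parameter, an underestimate can never give a relative error below −1, whereas an overestimate is unbounded. The following small Python example uses made-up values to show the asymmetry.

```python
def relative_error(estimate, true_value):
    """Relative error of a single estimate of a positive parameter."""
    return (estimate - true_value) / true_value


true_td = 10.0  # hypothetical true time of decline
print(relative_error(0.5, true_td))    # strong underestimate: -0.95
print(relative_error(200.0, true_td))  # overestimate by the same factor of 20: 19.0
```

An estimate that is 20 times too small thus contributes less than 1 to the squared error, while one that is 20 times too large contributes 361.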

A more complex model

We define here a model of a population bottleneck that combines a population expansion and an instantaneous decline with fixed parameters (Fig. 5a). We study the signatures of such a bottleneck on the relative allele frequencies, $r_i$, depending on the b-values (Fig. 5b). Without seed dormancy (b = 1), the expansion phase appears more evident than the rather old decline, through an excess of low-frequency alleles relative to the basic model of a constant population size. Both demographic events, the expansion and the instantaneous decline, are visible in the relative allele frequencies, $r_i$, for b = 0.6, as there is an excess of low- and high-frequency derived alleles relative to the basic model. As the probability of coalescence decreases as b becomes smaller, the expansion phase is no longer detectable in the frequency spectrum for b = 0.2, and only the instantaneous decline can be observed through an excess of alleles at intermediate to high frequencies. This effect is enhanced with larger ancestral population sizes. Similarly, if one decreases the duration of the bottleneck for small germination rates, b, even the decline becomes harder to detect, as the relative allele frequencies, $r_i$, approach those of the basic model again. In conclusion, for small values of b, recent or short-lived demographic changes may be barely or not at all accessible to estimation, whereas old events leave signatures for longer than in models without seed bank.

Fig. 4 Relative errors for the time, $t_e$, of a past population expansion. The x-axis indicates the b-values under which the pseudo-observed data are generated, that is, for a model with germ bank. The relative error distribution over the 500 data sets is shown assuming (a) a model with germ bank and parameter b equal to that of the pseudo-observed data set, (b) a model without germ bank and (c) a model without germ bank with values from (b) divided by $b^2$.

Discussion

We study here the impact of the simultaneous occur-

rence of germ dormancy and simple deterministic mod-

els of time-varying population size on neutral

polymorphism patterns. First, based on the results of

Kaj et al. (2001), we obtain the probability distribution

of the ancestral process for a sample of size n, the mean

waiting times between coalescence events and the fre-

quency spectra for various demographic models. We

show then in our simulation study that the model

choice procedure of the ABC retrieves quite well the

correct model between a population expansion and an

instantaneous decline, irrespective of whether the seed

bank is included. However, the germination rate, b, is shown to have a substantial impact on the parameter estimation. This follows from our theoretical result for the case of a past population expansion: a seed bank model with germination rate b and growth from time $t_e$ at rate R leaves a polymorphism pattern equivalent to that of a model without seed banks in which growth starts more recently, at $b^2 t_e$, and at a higher rate, $R/b^2$. From the more complex bottleneck model, we conclude that for small values of b, recent or short-lived demographic changes may be barely or not at all accessible to estimation, whereas old events leave signatures for longer than in models without seed bank. Moreover, if complex

demographic scenarios such as population bottlenecks

appear biologically relevant for many species, their

inference may be impossible without a priori informa-

tion on the germination rate (Fig. 5).
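The equivalence between the two expansion models can be checked by a change of time variable. The lines below are a short sketch, assuming exponential growth of the form $\exp(-Rt)$ backwards in time and that times in the model without seed banks are $b^2$ times those in the seed bank model; the primed symbols are introduced here for illustration only.

```latex
% Assumption: primed quantities belong to the model without seed banks.
\begin{align*}
t' &= b^2 t, \qquad R' = R/b^2, \qquad t_e' = b^2 t_e, \\
e^{-R' t'} &= e^{-(R/b^2)(b^2 t)} = e^{-Rt}, \\
R' t_e' &= R\, t_e .
\end{align*}
```

The relative population size at corresponding times and the total size change over the growth phase are therefore the same in both models, which is consistent with growth starting more recently and proceeding at a higher rate when seed banks are ignored.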

The recent burst of genomic data available for many

taxa is prompting the need for further refinements of

the population genetics theory based on the classic

Wright–Fisher model. A general problem is to quantify

the extent to which we can violate assumptions, that is,

ignore the ecological reality, when estimating past

demography. Germ banking or overlapping of genera-

tions, changes in population size and spatial structuring

are realistic assumptions common to many species with

potential consequences for inference. For example, over-

lapping generations in combination with varying popu-

lation size are shown to generate deviations of the

molecular clock from expected patterns of neutrality

(Balloux & Lehmann 2012). We show here that age

structure due to seed banks is a common factor to

account for in population genetics analysis, when seed

dormancy is a bet-hedging strategy and the germination

rate is lower than 0.5 (Figs 3 and 4; Evans et al. 2007;

Tielborger et al. 2012). Germ banks and spatial structur-

ing lead in principle to a similar departure from the

assumption of random mating of the usual Wright–

Fisher model, as there is a separation of individuals

either into different age classes (Charlesworth 1994; Kaj

et al. 2001; Nunney 2002) or into different spatial demes

(Charlesworth et al. 2003). Furthermore, ignoring the

effect of seed banks in spatially structured populations

may lead to misinterpretations of the amount of genetic

differentiation among demes, local effective population

sizes (Vitalis et al. 2004) and demographic changes

within the metapopulations. Another possible ecologi-

cally realistic assumption, but further complication, is

that some species may exhibit long-term seed banks

only in parts of their range. Examples and exhaustive

studies are so far lacking, but in Arabidopsis species,

long-term seed banks may only be prevalent in north-

ern European populations (Lundemo et al. 2009; Falah-

ati-Anbaran et al. 2011). Seed banks may thus not affect

estimates of the whole species’ past demography

(Francois et al. 2008), but may be important for understanding the demography and local adaptation in northern populations. Finally, we suggest that the statistical inference based on SNP data of evolutionary parameters in speciation scenarios, for example, the so-called isolation with migration model (Wakeley & Hey 1997; Tellier et al. 2011a), may be as well affected by long-term seed banks.

Fig. 5 The underlying demography (a) is a bottleneck model, that is, $\rho(t) = \exp(-Rt)$ for $0 \le t < t_e$, $\rho(t) = d_1$ for $t_e \le t < t_d$ and $\rho(t) = d_2$ for $t_d \le t$, with parameter combination R = 20, $d_1 = 0.1$, $t_d = 0.6$ and $d_2 = 0.5$. $t_e$ is determined by R as $t_e = -\log(d_1)/R$, such that $t_e \approx 0.12$. Relative frequency spectra, $r_i$, (b) for this demographic model and germination rates as given in the legend box are illustrated.
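For readers who wish to reproduce the demography of Fig. 5a, the piecewise function given in the figure caption can be coded directly. The Python sketch below uses the parameter values from the caption; the function name is hypothetical, and time is assumed to be measured backwards from the present, as is usual in coalescent models.

```python
import math

# Parameter values from the caption of Fig. 5.
R = 20.0
D1 = 0.1
D2 = 0.5
T_D = 0.6
T_E = -math.log(D1) / R  # end of the exponential phase, about 0.12


def relative_size(t):
    """Relative population size at time t in the past (t = 0 is the present)."""
    if t < 0:
        raise ValueError("time must be non-negative")
    if t < T_E:
        return math.exp(-R * t)
    if t < T_D:
        return D1
    return D2


for t in (0.0, 0.05, T_E, 0.3, 0.6, 1.0):
    print(round(t, 3), round(relative_size(t), 3))
```

Read forwards in time, the population declines instantaneously from a relative size of 0.5 to 0.1 and later expands exponentially to its present size.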

In some cases, the inferred neutral null model of

demography with seed banks may serve as a basis to

study the rate of genetic adaptation to biotic and abiotic

environments and the genes under natural selection.

We advocate here also that taking seed banks into

account may be essential to infer the existence of selec-

tion. In the case of balancing selection, the interaction

of a persistent seed bank and temporally fluctuating

selection promotes the maintenance of stable polymor-

phism (Turelli et al. 2001; Tellier & Brown 2009). For

example, seed banks explained the stable single-locus

polymorphism for flower colour found in Linanthus par-

ryae (Turelli et al. 2001), whereas ignoring their effect

led to erroneous evolutionary inference (Schemske &

Bierzychudek 2001). Concerning positive selection, seed

banks slow down the rate of selection (Hairston & De Stasio 1988), that is, positively selected alleles take longer to fix, which decreases the rate of local adaptation in spatially structured populations. In plant

species, recent positive selection may then not be

detectable in sequence data due to the existence of seed

banks (e.g. Gossmann et al. 2010) similarly as in the

case of a recent strong population expansion.

For simplicity, we have estimated in this study the

past demography of a population assuming a known

germination parameter, b, or the known absence of seed

banks. However, in reality, the presence of seed banks

and values of the germination rates are often unknown.

The first option to perform statistical inference of past

events lies in the simultaneous estimation of past

demography and b based, for example, on the fre-

quency spectrum at numerous neutral or reference loci.

However, we show that different combinations of

demographic parameters and b-values may result in

similar genomic signatures (Fig. 2). To circumvent this

difficulty, information on the above-ground population

based on ecological observations is needed and has to

be integrated to define the priors of the population size.

This follows from the results that smaller values of b increase the observed nucleotide diversity (Kaj et al. 2001; Nunney 2002; Tellier et al. 2011a). A corollary is

that all hypotheses, which could explain an increase in

the observed nucleotide diversity compared to expecta-

tions based on population sizes, should be accounted

for in the model. It is, for example, crucial to incorpo-

rate spatial structuring of populations and limited gene

flow among demes in the population models because (i)

spatial structure may increase genetic diversity com-

pared to expectations based on census sizes and num-

ber of demes (e.g. Charlesworth et al. 2003) and (ii) low

germination rates decrease the genetic differentiation

among demes (Vitalis et al. 2004). The usefulness of

such an approach was recently demonstrated in wild

tomato species, where metapopulation structure is a

key evolutionary factor (Tellier et al. 2011a).

The second option relies on field observations and

measurement of germination rates, which can be used

to define priors on b-values in the model for inference.

The germination rate and dormancy of seeds are, how-

ever, determined by the interactions of genetic (Bent-

sink et al. 2010) as well as physical, climatic and

ecological factors (Fenner & Thompson 2004). Disentan-

gling the influence of these factors on population

dynamics is a key requirement to demonstrate that seed

banks are bet-hedging strategies (e.g. Evans et al. 2007;

Tielborger et al. 2012). Such studies have thus generated

collections of plants and seeds at different points in

time and ecological surveys on population census sizes

(Honnay et al. 2008). We suggest that these data can be

combined with nucleotide sequences and analysed with

new statistical methods of inference as for instance the

ABC procedure, to reveal the evolutionary importance

of long-term germ banks in plants as well as in inverte-

brate and bacterial species.

Acknowledgements

The authors would like to thank Wolfgang Stephan for valu-

able comments on this article. This research was supported by

grant I/84232 from the Volkswagen Foundation to D.Z. and

grant HU1776/1 from the Deutsche Forschungsgemeinschaft to

Stephan Hutter and A.T.

References

Balloux F, Lehmann L (2012) Substitution rates at neutral genes depend on population size under fluctuating demography and overlapping generations. Evolution, 66, 605–611.

Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics, 162, 2025–2035.

Bentsink L, Hanson J, Hanhart C, et al. (2010) Natural variation for seed dormancy in Arabidopsis is regulated by additive genetic and molecular pathways. Proceedings of the National Academy of Sciences of the United States of America, 107, 4264–4269.

Brown JH, Kodric-Brown A (1977) Turnover rates in insular biogeography: effect of immigration on extinction. Ecology, 58, 445–449.

Charlesworth B (1994) Evolution in Age-Structured Populations. Cambridge University Press, Cambridge, UK.

Charlesworth B, Charlesworth D, Barton NH (2003) The effects of genetic and geographic structure on neutral variation. Annual Review of Ecology, Evolution and Systematics, 34, 99–125.

Chikhi L, Sousa VC, Luisi P, Goossens B, Beaumont MA (2010) The confounding effects of population structure, genetic diversity and the sampling scheme on the detection and quantification of population size changes. Genetics, 186, 983–995.

Decaestecker E, Gaba S, Raeymaekers JAM, et al. (2007) Host–parasite ‘red queen’ dynamics archived in pond sediment. Nature, 450, 870–873.

Evans MEK, Dennehy JJ (2005) Germ banking: bet-hedging and variable release from egg and seed dormancy. The Quarterly Review of Biology, 80, 431–451.

Evans MEK, Ferriere R, Kane MJ, Venable DL (2007) Bet hedging via seed banking in desert evening primroses (Oenothera, Onagraceae): demographic evidence from natural populations. American Naturalist, 169, 184–194.

Excoffier L, Estoup A, Cornuet JM (2005) Bayesian analysis of an admixture model with mutations and arbitrarily linked markers. Genetics, 169, 1727–1738.

Falahati-Anbaran M, Lundemo S, Agren J, Stenøien H (2011) Genetic consequences of seed banks in the perennial herb Arabidopsis lyrata subsp. petraea (Brassicaceae). American Journal of Botany, 98, 1475–1485.

Fenner M, Thompson K (2004) The Ecology of Seeds. Cambridge University Press, Cambridge, UK.

Fisher RA (1930) The Genetical Theory of Natural Selection. Clarendon Press, Oxford.

Francois O, Blum M, Jakobsson M, Rosenberg N (2008) Demographic history of European populations of Arabidopsis thaliana. PLoS Genetics, 4, e1000075.

Gossmann TI, Song BH, Windsor AJ, et al. (2010) Genome wide analyses reveal little evidence for adaptive evolution in many plant species. Molecular Biology and Evolution, 27, 1822–1832.

Griffiths RC, Tavare S (1994) Sampling theory for neutral alleles in a varying environment. Philosophical Transactions of the Royal Society B: Biological Sciences, 344, 403–410.

Griffiths RC, Tavare S (1998) The age of a mutation in a general coalescent tree. Stochastic Models, 14, 273–295.

Hairston Jr NG, De Stasio Jr BT (1988) Rate of evolution slowed by a dormant propagule pool. Nature, 336, 239–242.

Hirschfeld L, Hirschfeld H (1919) Serological differences between the blood of different races - the result of researches on the Macedonian front. Lancet, 2, 675–679.

Honnay O, Bossuyt B, Jacquemyn H, Shimono A, Uchiyama K (2008) Can a seed bank maintain the genetic variation in the above ground plant population? Oikos, 117, 1–5.

Hudson RR (2002) Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics, 18, 337–338.

Kaj I, Krone SM (2003) The coalescent process in a population of stochastically varying size. Journal of Applied Probability, 40, 33–48.

Kaj I, Krone SM, Lascoux M (2001) Coalescent theory for seed bank models. Journal of Applied Probability, 38, 285–300.

Kass RE, Raftery AE (1995) Bayes factors. Journal of the American Statistical Association, 90, 773–795.

Kimura M (1955) Random genetic drift in multi-allelic locus. Evolution, 9, 419–435.

Kimura M (1969) The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics, 61, 893–903.

Kingman JFC (1982) On the genealogy of large populations. Journal of Applied Probability, 19A, 27–43.

Lennon JT, Jones SE (2011) Microbial seed banks: the ecological and evolutionary implications of dormancy. Nature Reviews Microbiology, 9, 119–130.

Levin DA (1990) The seed bank as a source of genetic novelty in plants. American Naturalist, 135, 563–572.

Lundemo S, Falahati-Anbaran M, Stenøien H (2009) Seed banks cause elevated generation times and effective population sizes of Arabidopsis thaliana in northern Europe. Molecular Ecology, 18, 2798–2811.

Nordborg M, Krone S (2002) Separation of time scales and convergence to the coalescent in structured populations. In: Modern Developments in Theoretical Population Genetics: The Legacy of Gustave Malecot (eds Slatkin M and Veuille M), pp. 194–232. Oxford University Press, Oxford, UK.

Nunney L (2002) The effective size of annual plant populations: the interaction of a seed bank with fluctuating population size in maintaining genetic variation. American Naturalist, 160, 195–204.

Olivieri GL, Sousa V, Chikhi L, Radespiel U (2008) From genetic diversity and structure to conservation: genetic signature of recent population declines in three mouse lemur species (Microcebus spp.). Biological Conservation, 141, 1257–1271.

Schemske DW, Bierzychudek P (2001) Evolution of flower color in the desert annual Linanthus parryae: Wright revisited. Evolution, 55, 1269–1282.

Slatkin M, Hudson RR (1991) Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics, 129, 555–562.

Stephan W (2010) Detecting strong positive selection in the genome. Molecular Ecology Resources, 10, 863–872.

Tavare S (1984) Line-of-descent and genealogical processes, and their application in population genetics model. Theoretical Population Biology, 26, 119–164.

Tellier A, Brown JKM (2009) The influence of perenniality and seed banks on polymorphism in plant-parasite interactions. American Naturalist, 174, 769–779.

Tellier A, Laurent SJY, Lainer H, Pavlidis P, Stephan W (2011a) Inference of seed bank parameters in two wild tomato species using ecological and genetic data. Proceedings of the National Academy of Sciences of the United States of America, 108, 17052–17057.

Tellier A, Pfaffelhuber P, Haubold B, et al. (2011b) Estimating parameters of speciation models based on refined summaries of the joint site-frequency spectrum. PLoS ONE, 6, e18155.

Templeton AR, Levin DA (1979) Evolutionary consequences of seed pools. American Naturalist, 114, 232–249.

Thornton K (2003) libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics, 19, 2325–2327.

Tielborger K, Petruu M, Lampei C (2012) Bet-hedging germination in annual plants: a sound empirical test of the theoretical foundations. Oikos, doi:10.1111/j.1600-0706.2011.20236.x.

Turelli M, Schemske DW, Bierzychudek P (2001) Stable two-allele polymorphisms maintained by fluctuating fitnesses and seed banks: protecting the blues in Linanthus parryae. Evolution, 55, 1283–1298.

Vitalis R, Glemin S, Olivieri I (2004) When genes go to sleep: the population genetic consequences of seed dormancy and monocarpic perenniality. American Naturalist, 163, 295–311.

Wakeley J, Hey J (1997) Estimating ancestral population parameters. Genetics, 145, 847–855.

Watterson GA (1984) Allele frequencies after a bottleneck. Theoretical Population Biology, 26, 387–407.

Wright S (1931) Evolution in Mendelian populations. Genetics, 16, 97–159.

Živkovic D, Stephan W (2011) Analytical results on the neutral non-equilibrium allele frequency spectrum based on diffusion theory. Theoretical Population Biology, 79, 184–191.

Živkovic D, Wiehe T (2008) Second-order moments of segregating sites under variable population size. Genetics, 180, 341–357.

D.Z. is a postdoctoral researcher at LMU Munich. His research

focuses on the enhancement of coalescent and diffusion theory

regarding the inclusion of ecologically realistic assumptions

such as varying population size, seed banks and natural selec-

tion. A.T. is a professor of population genetics at TUM and

focuses on combining theoretical approaches with the use of

new sequencing technologies, for studying seed banks and

plant-pathogen coevolution.

Data accessibility

The codes used in this article are archived on Dryad

(doi:10.5061/dryad.7kp90).

Supporting information

Additional Supporting Information may be found in the online ver-

sion of this article.

Table S1 Ranges of values for the ABC priors under popula-

tion expansion and decline, which encompass the correspond-

ing ranges of the pseudo-observed datasets (Table 1).

Table S2 Results of the model choice procedure for population

expansion and decline.

Table S3 Root mean square errors for the estimates of the expansion model parameters $\theta$, R and $t_e$ assuming a model without or with germ banks and the correct b-value.

Table S4 Root mean square errors for the estimates of the decline model parameters $\theta$, d and $t_d$ assuming a model without or with germ banks and the correct b-value.

Fig. S1 Relative errors for the population mutation rate, $\theta$, of a past population expansion. The x-axis indicates the b-values under which the observed data were generated, that is, for a model with germ bank.

Fig. S2 Relative errors for the decline ratio, d, of a past population decline. The x-axis indicates the b-value under which the observed data were generated, that is, for a model with germ bank.

Fig. S3 Relative errors for the time of decline, $t_d$, of a past population decline.

Fig. S4 Relative errors for the population mutation rate, $\theta$, of a past population decline.

Please note: Wiley-Blackwell are not responsible for the content

or functionality of any supporting materials supplied by the

authors. Any queries (other than missing material) should be

directed to the corresponding author for the article.
