genome evolution and adaptation of a successful...

52
ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2018 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1632 Genome evolution and adaptation of a successful allopolyploid, Capsella bursa-pastoris DMYTRO KRYVOKHYZHA ISSN 1651-6214 ISBN 978-91-513-0236-2 urn:nbn:se:uu:diva-341709

Upload: dotuyen

Post on 10-Nov-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2018

Digital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1632

Genome evolution and adaptationof a successful allopolyploid,Capsella bursa-pastoris

DMYTRO KRYVOKHYZHA

ISSN 1651-6214ISBN 978-91-513-0236-2urn:nbn:se:uu:diva-341709

Dissertation presented at Uppsala University to be publicly examined in Lindhalsalen,Norbyväagen 18, Uppsala, Tuesday, 27 March 2018 at 10:00 for the degree of Doctor ofPhilosophy. The examination will be conducted in English. Faculty examiner: Dr. VincentCastric (CNRS / Université de Lille 1, France).

AbstractKryvokhyzha, D. 2018. Genome evolution and adaptation of a successful allopolyploid,Capsella bursa-pastoris. Digital Comprehensive Summaries of Uppsala Dissertations fromthe Faculty of Science and Technology 1632. 51 pp. Uppsala: Acta Universitatis Upsaliensis.ISBN 978-91-513-0236-2.

The term allopolyploid refers to an organism that originated through hybridization and increasedits ploidy level by retaining the unreduced genomes of its parents. Both hybridization andpolyploidy usually have negative consequences for the organism. However, there are speciesthat not only survive these modifications but even thrive and can outcompete their diploidrelatives. There are many intuitive explanations for the success of polyploids, but the numberof empirical studies is limited.

The shepherd's purse (Capsella bursa-pastoris) is an emerging model for studying asuccessful allopolyploid species. C. bursa-pastoris occurs worldwide, whereas its parentalspecies, Capsella grandiflora and Capsella orientalis, have more limited distribution range.C. grandiflora is confined to Northern Greece and Albania, and C. orientalis is found only inthe steppes of Central Asia. We described the genetic variation within C. bursa-pastoris andshowed that it is not homogeneous across Eurasia but rather subdivided into three geneticallydistinct populations: one comprises accessions from Europe and Eastern Siberia, the secondone is located in Eastern Asia and the third one groups accessions around the Middle East.Reconstruction of the colonization history suggested that this species originated in the MiddleEast and subsequently spread to Europe and Eastern Asia. This colonization was probablyhuman-mediated. Interestingly, these three populations survive in different environmentalconditions, and yet most gene expression differences between them could be explained byneutral processes. We also found that despite a common history within one species, the twosubgenomes retained differences already present between the parental species. In particular,the genetic load was still higher on the subgenome inherited from C. orientalis than on theone inherited from C. grandiflora. The two subgenomes were also differentially influenced byintrogression and selection in the three genetic clusters. Gene expression variation was highlycorrelated between the two subgenomes but the total level of expression showed variation inparental dominance across flower, leaf, and root tissues.

This thesis for the first time shows that the evolutionary pathways of allopolyploids maydiffer not only on the species level but also between populations within one species. It alsosupports the theory that alloploidy provides an increased amount of genetic material that enablesevolutionary flexibility.

Keywords: allopolyploidy, population structure, selection, genetic drift, gene expression,parental legacy, genetic load

Dmytro Kryvokhyzha, Department of Ecology and Genetics, Norbyvägen 18 D, UppsalaUniversity, SE-752 36 Uppsala, Sweden.

© Dmytro Kryvokhyzha 2018

ISSN 1651-6214ISBN 978-91-513-0236-2urn:nbn:se:uu:diva-341709 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-341709)

”If you thought that science was certain - well,that is just an error on your part.”

– Richard P. Feynman

List of papers

This thesis is based on the following papers, which are referred to in the textby their Roman numerals.

I Cornille A., Salcedo A., Kryvokhyzha D., Glémin S., Holm K.,Wright S.I. and Lascoux M., 2016. Genomic signature of successfulcolonization of Eurasia by the allopolyploid shepherd’s purse(Capsella bursa-pastoris). Molecular ecology, 25(2), pp. 616-629.

II Kryvokhyzha D.*, Holm K.*, Chen J., Cornille A., Glémin S., WrightS.I., Lagercrantz U. and Lascoux M., 2016. The influence ofpopulation structure on gene expression and flowering time variation inthe ubiquitous weed Capsella bursa-pastoris (Brassicaceae).Molecular ecology, 25(5), pp. 1106-1121.

III Kryvokhyzha D.*, Salcedo A.*, Eriksson M., Duan T., Tawari N.,Chen J., Guerrina M., Kreiner J.M., Kent T.V., Lagercrantz U.,Stinchcombe J., Glémin S., Wright S.I., and Lascoux M., 2017.Parental legacy, demography, and introgression influenced theevolution of the two subgenomes of the tetraploid Capsellabursa-pastoris (Brassicaceae). bioRxiv, 234096.

IV Kryvokhyzha D., Duan T., Orsucci M., Milesi P., Glémin, S., Wright,S.I., and Lascoux, M., 2018. Towards the new normal: Genomic andtranscriptomic changes in the two subgenomes of a 100,000 years oldtetraploid, Capsella bursa-pastoris. (Manuscript)

* These authors contributed equally to the work.Reprints were made with permission from the publishers.

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.1 Evolutionary significance of polyploidy and hybridization . . . . . . . . 101.2 Population structure, genetic drift and adaptation . . . . . . . . . . . . . . . . . . . . . . . 141.3 Challenges of studying allopolyploid organisms . . . . . . . . . . . . . . . . . . . . . . . . . 181.4 A promising model system to understand allopolyploidy:

C. bursa-pastoris . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2 Present study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3 Conclusions and future perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Svensk sammanfattning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Резюме українською . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

1. Introduction

If you ate oatmeal or a sandwich for breakfast today, you ate a polyploid. Oat-meal is made from oat Avena sativa, which is a polyploid. Bread flour is usu-ally produced by graining seeds of either common wheat Triticum aestivum ordurum wheat Triticum turgidum, both of which are also polyploids. Moreover,these three species are also hybrids. Polyploids and hybrids are quite commonamong domesticated plants, and with the recent advent of molecular tools, ithas become evident that they are also abundant in the wild.Polyploid is an organism that has more than two paired sets of chromo-

somes. The majority of macroorganisms have a diploid set of chromosomes(2n), but some animals and many plants have tri- (3n), tetra- (4n), hexa- (6n)and even octoploid (8n) sets of chromosomes. Such increase in the ploidy levelmay arise from a doubling of the same genome within a cell, resulting in anautopolyploid, or frommerging two genomes from different species during hy-bridization, resulting in an allopolyploid (Fig. 1.1). Both autopolyploidy andallopolyploidy imply significant genetic changes that disrupt cellular functionsand usually lead to extinction. However, hybrids and polyploids that manage tosurvive often become successful species and can even outperform their diploidrelatives.Studying the phenomena associated to polyploidizationmay shedmore light

on our understanding of the link between the content of a genome and fitness.The genetic variability of polyploids has often been used in the domesticationprocess. More than half of all crop species are polyploids. The frequencyof polyploids in the wild, while significantly lower, remains high ( 40%) [1].There has been a lot of progress in studying the genomic consequences of poly-ploidy in crops [2, 3]. However, polyploidy was rather a precondition of do-mestication than a consequence [1]. So, we also need to study the origin andmechanisms of evolution in natural polyploid species.The genome-wide analysis of allopolyploid species is, however, not easy.

The significant size of polyploid genomes, the aggregate of two or more diver-gent genomes and the presence of multiple alleles and loci, as well as massivestructural and functional rearrangements, make it challenging to assemble thegenome, obtain accurate genotype data and analyze it with existing popula-tion genetics models that are primarily developed for diploid data. This is themain reason why genomic studies of polyploids are still lagging behind thoseof diploids.This thesis is a contribution to the progress of genomic studies of allopoly-

ploids. Here, I summarize our investigations of genome evolution and adap-tation in one of the most successful natural polyploids, the shepherd’s purse

9

> Homoploid hybrid

Allopolyploid

Autopolyploid

>

>

Figure 1.1. Scheme of origin of a homoploid hybrid, allopolyploid and autopoly-ploid species. Parental species contain two chromosomes (2n). Diploid hybrid (homo-ploid) species arises through merging of one chromosome from each parental species.Allopolyploid species originates the same way except that it combines the entire chro-mosomal sets of both parental species. Autopolyploid species emerges by chromosomedoubling of the same species.

Capsella bursa-pastoris. The results show that evolution of allopolyploids ismore complex than previously imagined. In particular, our results indicate thatboth demography and introgression played an important role in the evolutionof the shepherd’s purse.

1.1 Evolutionary significance of polyploidy andhybridization

The evolutionary role of polyploidy has long been a subject of debate. In the20th century, G. L. Stebbins stated that polyploidy contributed little to progres-sive evolution [4]. W. H. Wagner also denied the significance of polyploidyand regarded it as ”evolutionary noise” [5]. On the other hand, R. J. Schultzinsisted that polyploidy was far from playing a secondary role in evolutionand had actually contributed to major steps in evolution [6]. Despite importantprogress in our understanding of polyploids since these early studies [7], it isstill debated whether polyploidy is an evolutionary dead-end or a major driverof diversification [8, 9, 10, 11].The evolutionary significance of hybridization has also been controversial.

Charles Darwin himself changed his point of view from discounting hybridiza-

10

tion as a source of new forms (”I am very far from believing in hybrids...”,Darwin’s letter to Joseph Hooker in 1856) to emphasizing its significance (”theseveral kinds of dogs are almost certainly descended from more than one spe-cies, and so it is with cattle, pigs and some other domesticated animals.”, Vari-ation of Animals and Plants under Domestication, 1890). The progressive roleof hybridization was totally discarded during the era of the biological speciesconcept that viewed species as discrete reproductively isolated units [12]. Dur-ing the last two decades, the view of hybridization as a source of diversificationand adaptation has been revived [13]. Moreover, with the advent of DNA se-quencing and genome-scale analyses, it has become evident that hybridizationresulting in genomic introgression and speciation is widespread [14, 15].The evolutionary significance of polyploidy is usually attributed to its abil-

ity to rapidly produce new vigorous forms. An increase of ploidy results innearly instantaneous reproductive isolation from parental species [16]. Forexample, if a newly emerged tetraploid (4n) individual crossed with a diploid(2n) ancestor, it would produce a triploid (3n) offspring. Having an odd num-ber of chromosomes, such triploid progeny would be sterile due to problemswith chromosome pairing during meiosis. Reproductive isolation may alsoarise from switching to the asexual reproduction mode, e.i. reproduction frommaternal tissues bypassing sexual fusion of an egg and a sperm. Asexual re-production in polyploids is often realized through parthenogenesis, a processwhen an embryo develops from an unfertilized egg cell. This is a commonreproduction mode in polyploid animals [17]. Polyploids may also reproduceby selfing. The change of ploidy level often disrupts the self-incompatibilitysystem and allows self-fertilization, which is especially common among plants[18, 19]. Thus, polyploidy is one of the few mechanisms that may lead to sym-patric speciation (speciation without geographic isolation).New polyploid lineages often demonstrate superiority relative to diploid

species, a phenomenon known as polyploid vigor. This vigor is generallyattributed to increased genetic variability. First, polyploid genomes are usu-ally unstable at early stages and experience massive rearrangements and frag-ment loss [20]. These major genomic changes introduce more variation into apolyploid population that survives initial stages of formation and thus providesmore material that can be selected for. Second, polyploids have several copiesof each gene. This gene redundancy also benefits a polyploid because genescopies can gain new functions while retaining the old function. Third, poly-ploids have greater ability to mask deleterious mutations and suffer less frominbreeding depression. Because polyploids have several genomic copies, dele-terious mutations that are usually recessive have smaller chance to influence aphenotype. For instance, take a diploid that has a genotype with two dominantalleles AA. If one of its two alleles mutates to the deleterious recessive allele a,the resulting genotype is Aa, and a allele needs to replace only one dominantallele A to produce a genotype aa and change the phenotype. Whereas onerecessive allele would need to replace three dominant alleles in a tetraploid to

11

achieve the same phenotypic change because the genotype of a tetraploid isAAAa. Several genomic copies also ensure that crossing between closely re-lated individuals, known as inbreeding, rarely leads to homozygous offspringand thus does not have a deleterious effect. Thus, deleterious mutations need toreach higher frequency in polyploids than in diploids to achieve a homozygousstate and influence the phenotype. Fourth, polyploids often experience rapidchanges in patterns of gene expression due to gene duplication. Duplicatedgenes may gain new functions (neo-functionalization), partition their origi-nal ancestral function (subfunctionalization) or completely lose their function(pseudogenization) [21, 22, 23]. Finally, these genetic changes also influencethe proteome, metabolome and phenotypic variation. A few studies that haveinvestigated the effect of polyploidy on proteome and metabolome show thatthe variation in proteome and metabolome may substantially differ betweenpolyploids and parental species [24, 25]. Polyploids also tend to possess en-larged organs and overall larger size [26]. All these genetic and phenotypicmodifications of polyploids provide more variability that may facilitate sur-vival.Hybridization also increases genetic variance and can produce new vigorous

forms. The genetic variance is boosted by genomic re-patterning because of theconflict between two divergent genomes and release of transposable elementsfrom the suppression mechanisms established within each parent. The ge-nomic conflict between two divergent genomes also disrupts gene expressionregulation networks and lead to non-additive gene expression patterns whenexpression levels are not non-intermediate between parental species. This newvariation and mixture of different parental traits may result in transgressivephenotypes, whereby a hybrid exhibits extreme traits relative to its two par-ents. Such traits may facilitate colonization of new ecological niches that arenot accessible to parental species. Hence, hybridization may result into hy-brid speciation. The hybrid species is then isolated from its parental speciesby its novel ecological niche. The most famous examples of such a speciationscenario include Helianthus sunflowers, Rhagoletis flies, Cottus sculpins, Ly-caeides butterflies [27, 28, 29, 30]. Isolation from parental species can alsoarise due to assortative mating when hybrids have different mating seasonsor phenology from the parental species and mate only among themselves as insome hybridHeliconius butterflies [31]. These are scenarios of homoploid hy-brid speciation because the chromosome number of a hybrid does not increase(Fig. 1.1).However, hybridization can also be accompanied by chromosomal dou-

bling, i.e. allopolyploidy (Fig. 1.1). Allopolyploid forms can emerge fromchromosome doubling in a formed diploid hybrid or from a fusion of twounreduced gametes during the hybridization event [32, 33]. There is also away through a triploid bridge when an unreduced diploid gamete fuses with ahaploid gamete from a diploid species to form a triploid, and later this triploidproduces an unreduced triploid gamete that fuses with a haploid gamete of a

12

diploid. There are numerous extant allopolyploid species but in most casesit is still unclear how they originated. A few documented examples includePrimula kewensis that originated from somatic chromosome doubling in adiploid hybrid [34], a tetraploid strain of silk moths (Bombyx) that was ob-tained through a triploid bridge [35], and fusion of diploid gametes was usedin the production of synthetic polyploids of red clover and potato [36, 37]. Al-lopolyploids combine features of both hybrids and polyploids. They are morediverse than autopolyploids because they comprise two divergent genomes in-stead of one duplicated genome. They also suffer less from aneuploidy thanhomoploid hybrids and autopolyploids because each divergent subgenome isdoubled and thus can find a pair during meiosis. This is why allopolyploidsusually have even chromosome number, most frequently 4n, and exhibit di-somic inheritance when divergent subgenomes do not recombine. Allopoly-ploids also benefit from having two divergent genomes by differentially ex-pressing genes from the two subgenomes (homeologue-specific expression).Hybridization may also contribute to adaptability and diversification thro-

ugh genomic introgression when one species just acquires a fragment of ge-netic material from another species. It is well known that bacteria thrive onintrogression that often confers adaptive traits [38]. There is also growing ev-idence of such benefits among plants and animals. For example, transfer ofherbivore resistance in sunflower Helianthus [39] and insecticide resistance inmalaria mosquitoes [40] have been attributed to introgressive hybridization.Introgressive hybridization has also contributed to color pattern diversity ofHeliconius butterflies [41] and adaptive radiations in Darwin’s finches [42].There are even evidence for archaic adaptive introgression in humans [43].All these factors that increase genetic diversity and were described above

as advantageous can have negative effects as well. Assuming that the genomestructure and regulatory networks were optimized under selection in the paren-tal species, both increase of ploidy level and hybridization would disrupt thislong-established design and result in fitness loss. Moreover, an increase inthe copy number of chromosomes causes problems with mitosis and meiosisduring cell divisions and changes cellular architecture due to an increased nu-cleus size [44]. Forms that switch to self-fertilization suffer from the negativeeffect of inbreeding depression. Hybrid forms experience conflicts betweendivergent genomes and may suffer from outbreeding depression. Maybe thisis the reason why some allopolyploids evolve towards functional diploidiza-tion when subgenomes experience differential gene loss, deletion, and down-regulation of gene expression [45, 46, 47, 48]. On balance, hybridization andpolyploidy are more often deleterious than advantageous but those individualsthat survive frequently become evolutionary successful lineages.

13

1.2 Population structure, genetic drift and adaptationSpecies are highly dynamics entities: their size, density, and geographical lo-cation fluctuate over time; they are also often highly fragmented over space.This seriously complicates the study of species and the reconstruction of theirhistories. Even if one is primarily interested in a trait that, at first glance, is notrelated to demographics, for instance, the relation between the two subgenomesof a polyploid, the details of the demographic history of the polyploid specieswill need to be considered. Demography will for instance influence the evolu-tion of the trait through the local and overall amount of random genetic drift.Structured populations will also face different environmental conditions andbe under different selection pressures. A textbook example of the impact ofenvironmental changes and local adaptation is the Peppered moth (Biston be-tularia). It was found that dark moths were prevalent in the polluted environ-ment around Birmingham, UK, because they were less conspicuous on darktree trunks, while light moths were common in a clean environment such asDevon, UK, where tree trunks were covered with lichens and thus were lightin color [49]. This difference is proven to be due to differential selection [50].However, as hinted above, evolutionary differences between populations are

not necessarily the result of selection. Differences may also arise simply due toneutral processes. Suppose there are two well-isolated populations and a newmutation arises in one of these populations. Its probability to disappear fromthe population is high, even if it is positively selected. This is nicely illustratedby J. Haldane’s [51] early result on the probability of fixation of a positivelyselected allele: the probability of fixation of a mutation with a selected advan-tage s is 2s. If s = 0.01, a large value, the probability of fixation will be amere 2%! Of course, a neutral mutation generally has an even bleaker future.If by chance the mutation does survive and increases in frequency, a popula-tion with this new allele will become divergent from a population without thisallele because the two populations do not exchange alleles. Over time therewill be many mutations, some will go to fixation, primarily neutral or nearlyneutral, and populations will diverge through random genetic drift [52].The concept of random genetic drift stemmed from the work of J. Haldane,

S. Wright, and R. Fisher but gained prominence through M. Kimura’s neutraltheory of molecular evolution [53]. The neutral theory of molecular evolu-tion states that most changes observed at the molecular level reflect a balancebetween mutations that are functionally neutral or nearly neutral and randomgenetic drift acting on these mutations in a population of finite size. The neu-tral theory of evolution does not state that adaptive selection does not occur butassumes that it plays a minor part in changes at the genome level. The neutraltheory has been challenged by a selection theory of evolution which states thattruly neutral mutations represent a minority of all changes and that selection isa major force of evolution [54]. Whether the majority of changes are neutralor adaptive is still hotly debated [54, 55, 56]. Independently of this debate,

14

the neutral theory laid a foundation for statistical frameworks to distinguishnatural selection from random genetic drift at the molecular level [57].The difficulty in telling apart different theories of molecular evolution stem-

med in great part on the difficulty to measure random genetic drift. Since theseminal work of S. Wright, this is done through estimation of the so-called ef-fective population size,Ne [58]. The effective population size of a population,Ne, is the size of an idealized population, the Wright-Fisher population, thathas experienced the same amount of random genetic drift than the populationof interest. The Wright-Fisher population is a population in which individualsmate randomly and where the only source of change in allele frequencies isallele sampling. In other words, a population without structure and without se-lection or migration. Of course, this is a highly idealized situation. At neutralsite, the effective size is directly and simply related to genetic diversity and thisis captured by the quantity θ = 4Neµ, where µ is the mutation rate and θ isgenetic diversity that can be estimated by the number of pairwise differencesbetween DNA sequences (Tajima’s π) [59] or by the number of segregatingsites in DNA sequences (Watterson’s θ) [60]. Hence, assuming equal or sim-ilar mutation rate, a more genetically diverse population has a larger Ne thana less diverse one. The relationship between random genetic drift and Ne iswell illustrated in Figure 1.2. In a large population of 2000 individuals, anallele does not deviate too much from its original frequency of 0.5 over thetime of 50 generations. However, when the population size is as small as 20individuals, there is an increasing effect of randomness and an allele can getfixed (frequency = 1) or get lost (frequency = 0) even within 10 generations.If the three populations from the Figure 1.2 were real populations, they wouldstrongly differ in their allele frequency simply due to stochasticity.The effectiveness of selection also depends on Ne. Selection is more ef-

fective in populations with large Ne because they are less affected by geneticdrift, and the contrary is true for populations with small Ne. More precisely,the probabilities of an advantageous mutation to go to fixation and a deleteri-ous mutation to be eliminated is a function of its initial frequency 1/2Ne andthe product Nes, where s measures the strength of selection. The fate of newmutation is largely determined by selection when Nes is much greater thanone, and by random genetic drift when Nes is much smaller than one. Thus,if the strength of selection is the same in two populations but their Ne differ,levels of adaptation as the outcome of this selection will be different in thesetwo populations.The level ofNe is affected by several factors. A population may go through

a bottleneck, i.e. a strong decrease in size over a short period of time. Thiscan greatly reduce Ne and such populations will evolve largely under randomgenetic drift. A population bottleneck can be caused by some environmentalevent (e.g. drought, fire, overhunting), or when a small fraction of the originalpopulations colonizes a new location (founder event). For example, selfingspecies likely go through repeated bottlenecks when they expand across a new

15

Figure 1.2. Effect of population size on genetic drift. Ten simulations each of ran-dom change in the frequency distribution of a single hypothetical allele over 50 gen-erations for different sized populations; first population size n=20, second populationn=200, and third population n=2000. (Image by Professor marginalia, distributed un-der a CC BY-SA 3.0 license).

16

territory. Ne can also decrease with the switch of reproductive mode fromoutcrossing to self-fertilization, which frequently occurs in plants [61]. Ne

can gradually increase when a population is growing or when new variation isintroduced by migrants from other populations.It can be concluded from the considerations above that natural populations

are constantly under selection, leading to the evolution of adaptive traits, andgenetic drift, leading to neutral variation. Distinguishing the two processes isone of the greatest challenges of evolutionary biology. This problem becomesparticularly obvious in structured populations. Samples from two divergentpopulations exhibit differences that may look adaptive, but in fact, they arecaused by genetic drift. One of the ways to separate the two processes is to as-sume that populations evolved largely neutrally if the variation between themcan be explained by population structure. This is done through various math-ematical models that take into account the population structure during testsfor adaptive differences [62, 63, 64]. This control for population structure,however, may also mask real effects if the distribution of adaptive variationcoincides with population structure. For example, the first association studybetween variation in Dwarf8, a gene regulating flowering, and flowering timein maize revealed that the association was due to population structure [65], buta later study with larger sampling proved that it was significant [66]. Thus,knowledge on the population structure of a species is extremely important ifthe aim is to find adaptive variation.With growing evidence of inter-specific hybridization [14], admixture with

other species in structured populations also need to be considered. One pop-ulation of a species may be in contact with other not reproductively isolatedspecies and this may lead to genomic introgression specifically in this popula-tion. These introgressed genomic regions can be neutral, but they may appearas the regions under selection in some selection tests (e.g. selective sweepsscans) because they stand out from the genome-wide and species-wide varia-tion. Without the information on possible admixture with other species, suchresults may spuriously be interpreted as a local adaptive change because it ispresent only in one population. Admixture can also be strong enough to ho-mogenize the difference between species and disrupt the whole-genome phy-logeny for some populations. For example, a phylogenetic analyses ofHelico-nius elevatus andHeliconius pardalinus butterflies revealed that they are moregenetically related to each other in the area where their distribution rangesoverlap than to conspecific populations where these two species are geograph-ically isolated [67]. An introgression can, of course, be under selection andtruly adaptive. For example, in Spain, where house mice (Mus musculus do-mesticus) can hybridize with the Algerian mouse (Mus spretus), the majorityof house mice have resistance to anticoagulant rodenticides that they acquiredthrough introgression of the gene vkorc1 from the Algerian mouse [68]. Housemice from other parts of Europe (Greece, Italy, UK), where the Algerianmousedoes not occur, do not have vkorc1 that provide such resistance. Thus, know-

17

ing whether some populations of a species can potentially be admixed with anyother species is extremely important.The population genetics analyses of Capsella bursa-pastoris presented in

this thesis illustrate well how a species can be structured into several popula-tions and how these populations differ in their environments, effects of geneticdrift, targets and strengths of selection. This thesis also demonstrates how dif-ferent populations of one species can be influenced by admixture with sisterspecies. It also provides a good example of challenges associated with analysesrequiring a correction for population structure.

1.3 Challenges of studying allopolyploid organismsTo unravel the complex evolutionary history of allopolyploid organisms, onehas to solve many practical and theoretical issues. Many allopolyploid speciesdo not have a reference genome or the quality of their genomes is poor. Thisis primarily because genome assembly tools that reconstruct a whole genomefrom short DNA reads coming from a sequencing machine are not designedto resolve genome complexity of allopolyploids that have large genomes withmany repeats, paralogs, and high heterozygosity. Genome assembly tools alsogenerally do not handle properly the hybrid nature of allopolyploid genomes -regions of low divergence between two subgenomes would be collapsed to asingle sequence by an assembler. Even such economically important species asbread wheat (Triticum aestivum) had only 61% of its genome sequence assem-bled [69] and only recent developments in both genomics and bioinformaticsled to near-completion of its genome [70]A general approach to deal with the absence of a reference genome for an

allopolyploid species is to use an available reference genome of the closestdiploid species. However, this approach has some issues that a researchershould be aware of. First, one of the subgenomes of the allopolyploid maybe more related to the reference genome than the other subgenome and thiswill increase the success of mapping genomic reads from the more closely re-lated subgenome. This mapping bias is not that critical for DNA data becauseat sufficient coverage most of the variation will be captured correctly. Thisis, however, a serious problem for gene expression analyses because mappingbias in RNA reads would directly influence the results. To reduce the severityof this mapping bias, one has to specify the correct divergence from the refer-ence genome during mapping. Bias in the RNA data can also be reduced byremoving sites that show strong biases in the DNA data. Second, to separatethe two subgenomes, the mapping data needs to be phased. There are severalprograms that can phase the data into haplotype blocks [71], but the concatena-tion of these blocks into continuous sequences representing two subgenomesis only possible if the origin of an allopolyploid is known and parental ge-nomic data is available. Allopolyploids usually exhibit disomic inheritance

18

(two subgenomes do not recombine) and this allows the use of the informa-tion provided by fixed differences between the two subgenomes and the twoparental species to perform the concatenation. Third, most of genome analysissoftware, as well as population genetic models, are developed for haploid ordiploid data and they expect bi-allelic variation (two alleles per variable site).Allopolyploids, however, have four alleles at each genomic site and becauseof their hybrid nature, some variable sites may include more than two alleles.To deal with this issue, one usually has to treat tetraploid data as diploid dataand remove all non-biallelic sites. This simplification is based on the fact thatselfing and disomic inheritance that are common in allopolyploids lead to thestrong reduction of heterozygosity within each subgenome and hence variationwithin subgenomes can be neglected.The nature of allopolyploid data also makes it complicated to compare gene

expression data between the two subgenomes of allopolyploid and the genomesof its diploid parental species. Traditionally, the gene expression level of agiven gene is estimated by counting all reads that map to a gene. This is howthe data is usually obtained for diploid parental species. On the other hand,gene expression data for the two subgenomes of an allopolyploid species canbe obtained only by phasing the count data per variable site. For example, if asite is heterozygous AATT, one can obtain the number of reads with allele Aand T correspondingly. Reads that are mapped to the region without polymor-phism cannot be distinguished between two subgenomes and thus are skipped.This results in the reduction of the total read counts and under-estimation ofexpression level for genes with a low number of heterozygous sites. To correctthis bias, one can use the unphased reads counts of an allopolyploid and trans-form them into the phased data by using the ratio between the two subgenomesthat was estimated on the phased data. Suppose 30 reads map to a gene, 16 ofwhich map to a polymorphic site AATT, 10 with allele A and 6 with allele T.Hence, the proportion of one subgenome is 6/(6 + 10) = 0.375 and the otherone is 1− 0.375 = 0.625. Extrapolating this proportion to total read count pergene gives 0.375 ∗ 30 = 11 reads for the first subgenome and 0.625 ∗ 30 = 19reads for the second one. This method is not ideal because allopolyploid geneswithout any polymorphism are still skipped, but it makes the diploid and al-lopolyploid data more comparable.All these challenges have been faced during the analyses ofC. bursa-pastoris

in this thesis. Some of the solutions presented above also have been developedduring the work on this thesis.

1.4 A promising model system to understandallopolyploidy: C. bursa-pastoris

Genetic studies on the shepherd’s purse Capsella bursa-pastoris were startedin the early twentieth century by G. Shull. G. Shull, then one of the most

19

prominent geneticists of his time [72] (among other things he introduced theconcept and word ”heterosis”), wanted to ”carry Mendelian analysis beyondthe field of domesticated plants and animals into the realm of wild nature whererelationships have been unmodified by artificial breeding and selection by hu-man agency”. He and his associates carried out a very large amount of crosses(4000 pedigreed families!) from a worldwide collection and also performedsome cytological studies. These studies led to a delineation of the species ofthe Capsella genus [73, 74, 75]. In particular, S. Hill showed that the speciescould be divided into two groups based on their ploidy level, a tetraploid anda diploid one [74]. The Capsella genus is today considered to consist of 4-5species: C. grandiflora, C. orientalis, C. rubella, C. bursa-pastoris, and possi-blyC. thracica [76]. Among these species onlyC. bursa-pastoris is a polyploidwith a worldwide distribution (Fig. 1.3). Actually, C. bursa-pastoris is one ofthe most common weed in the world [77]. C. thracica is also a polyploid thatis endemic to Bulgaria and was described as a separate from C. bursa-pastorisspecies [76]. However, given the recent information on the origin of C. bursa-pastoris [78], the status of C. thracica needs to be verified. It could as well bea divergent population of C. bursa-pastoris.The widespread allotetraploid C. bursa-pastoris has recently emerged as

a promising system for studying the evolution of allopolyploidy. First, in-formation on its two progenitor diploid species and their current distributionis readily available. C. bursa-pastoris, a selfing species, originated from thehybridization of the Capsella orientalis and Capsella grandiflora/rubella lin-eages some 100-300 kya [78] (Fig. 1.3). Nowadays, C. orientalis, a geneti-cally depauperate selfer, occurs across the steppes of Central Asia and EasternEurope, while C. grandiflora, an extremely genetically diverse obligate out-crosser, is primarily confined to a tiny distribution range in the mountains ofNorthern Greece and Albania (Fig. 1.3). Second, self-fertilization, small size,yearly reproduction and low requirements for growing conditions make it easyto use C. bursa-pastoris in laboratory experiments (Fig. 1.4). Third, genomicdata of C. bursa-pastoris can easily be phased into subgenomes [78, 79] due toa disomic inheritance [80]; thus, meiotic pairing happens between homologouschromosomes from the same ancestor and subgenomes inherited from differentancestors do not recombine. Fourth, genomic data of C. bursa-pastoris can beanalyzed by mapping to the reference genome of C. rubella, a selfer recentlyderived fromC. grandiflora, that was sequenced, assembled and edited in 2013[81]. Moreover, high quality genome assemblies ofC. bursa-pastoris,C. gran-diflora andC. orientalis should be available soon (D.Weigel, personal comm.).Fifth, genomic studies of C. bursa-pastoris and other Capsella species bene-fit from their close relatedness to the best studied plant model species Ara-bidopsis thaliana. The estimated divergence time between the Arabidopsisand Capsella genera is about 10 million years [82, 83]. Sixth, the compositionof the genus Capsella provides a good set up to study consequences of selfingand polyploidy. There have already been interesting studies on the genetic ba-

20

~900kya

~100kya

~25-50kya

C. rubella C. grandiflora C. orientalisC. bursa-pastoris

Figure 1.3. Origin and distribution of the Capsella species. The origin schemeshows the most likely scenario of the origin of C. bursa-pastoris through the hy-bridization between C. orientalis and C. grandiflora/rubella lineages and the increaseof ploidy level to 4n. The approximate time of species splits is shown in thousands ofyears ago (kya). Reduced flowers in C. bursa-pastoris, C. rubella, and C. orientalisdemonstrate the consequence of selfing when there is no need to attract pollinators.The map shows the limited distribution ranges of the diploid Capsella species in Eura-sia, but C. bursa-pastoris is distributed worldwide because of its invasiveness.

21

sis of the shift to selfing and the so-called selfing syndrome [84, 85] and onthe establishment of reproduction barriers [86, 87] in the diploid species of thegenus. Our group also recently started a series of growth chamber experimentscomparing the competitive ability of the different Capsella species. The firststudies indicate that the two diploid selfing species have a much lower com-petitive ability than C. grandiflora, but C. bursa-pastoris has a competitiveability as high if not higher than C. grandiflora [88, 89]. It is, hence, as ifthe genome doubling associated with polyploidization restored the competi-tive ability lost because of the shift in the mating system. Seventh, worldwidedistribution of C. bursa-pastoris exemplifies colonizing success [90]. Eighth,this worldwide distribution also enable studies of clinal variation in such traitsas flowering time [91, 92, 93, 94]. Finally, evidence for gene flow betweenC. bursa-pastoris and its close relatives [95, 79] indicate thatC. bursa-pastorisand the genus Capsella can also be used to investigate genomic introgressionbetween diploids and polyploids.In addition, this thesis demonstrates thatC. bursa-pastoris provides a model

to study allopolyploid evolution in structured populations [96, 94]. In Eurasia,C. bursa-pastoris is divided into three genetic clusters - Middle East, Europe,and Asia - with low gene flow among them and strong differentiation both atthe nucleotide and gene expression levels (Papers I and II). This thesis also ex-emplifies that the genomic data of C. bursa-pastoris can easily be phased intosubgenomes [79] (Paper III). It also reveals sophisticated regulation of gene ex-pression that varies across different tissues (Paper IV). So, C. bursa-pastoriscan be also used to study allopolyploid gene expression evolution across tis-sues and populations.The puzzle posed byC. bursa-pastoris had already fascinated G. Shull, who

wrote in 1929 ”It is considered a matter of fundamental significance that theincrease in the number of chromosomes in the bursa-pastoris group is cor-related with greater variability, greater adaptability, greater vigor and greaterhardiness” [75]. The results presented in this thesis further highlight the intri-cacy of C. bursa-pastoris and more generally the complexity of allopoplyploidevolution. I believe that after reading this thesis C. bursa-pastoris will fasci-nate you too.

22

Figure 1.4. C. bursa-pastoris growing in laboratory conditions.

23

2. Present study

The four papers of this thesis are sequential studies of the shepherd’s purseC. bursa-pastoris. First, we determined its population structure and coloniza-tion history. Then we compared phenotypic variation in the discovered pop-ulations to assess how much of this variation was due to local adaption andgenetic drift. Finally, we investigated the evolution of the two subgenomes ofC. bursa-pastoris by comparing their genomic and gene expression variationwith each other and with the parental species overall and in different popula-tions. The summaries of each of these studies are provided below.

Paper I. Population structure and colonization historyPolyploid species usually inhabit larger geographic areas and have wider eco-logical niches than their diploid progenitors [97] but our understanding of howthey gained the capacity to colonize different localities with wide ranges ofbiotic and abiotic conditions remains limited. The first step to address thisquestion is to characterize the population structure of a polyploid species anddetermine where it originated and how it spread. To that end, one has to ob-tain a wide-range sampling of a polyploid species, generate molecular data formany samples, and use fast and powerful computational tools to infer demo-graphic histories. C. bursa-pastoris is a polyploid species that grows under alarge range of environmental conditions world-wide [90] and its global pop-ulation structure, place of origin and colonization history was not known. Inthis study, we genotyped a large sample of C. bursa-pastoris accessions withindividuals sampled across Eurasia and a few sites from North America. Wethen analyzed its genetic structure and demographic history.We genotyped 261 accessions of C. bursa-pastoris with genotyping-by-

sequencing (GBS) and obtained 4,274 SNPs after quality control. The GBStechnique uses restriction enzymes to selectively target DNA sequences out-side of repetitive regions and to obtain SNPs in multiple samples with highefficiency [98]. Phasing of the GBS data was not feasible due to low cover-age, so we analyzed the unphased data. To avoid bias in estimates of polymor-phism, we removed sites showing fixed heterozygosity because most of themrepresented differences between the two subgenomes and not real polymor-phism. We determined the population genetic structure with model-based an-cestry inference implemented in ADMIXTURE [99]. The demographic history

24

was inferred by comparing alternative demographic models in the coalescent-based approximate Bayesian computation framework (ABC) [100]. To furthercharacterize the inferred populations, we also estimated pairwise genetic dif-ferentiation between them [101] and their genetic diversity with Watterson‘stheta (θW ) [60] and mean pairwise nucleotide differences (π) [59].

The ABC approach is a likelihood-free method based on summarystatistics of the data that allows estimating several parameters (e.g. di-vergence time, migration rates, effective population size) of a givendemographic model. For the sake of simplicity, let us assume a stan-dard coalescent model with a single parameter, θ = 4Neµ. Briefly, onefirst starts by calculating summary statistics of the data, for instance, thenucleotide diversity in the sample, πobs. In a second step, values of themodel parameter, in our example, values of θ, are randomly sampledfrom a prior distribution and used in coalescent simulations under thedemographic model which parameters we want to estimate. For eachsimulation run, we then compute the simulated summary statistic, πsim,and compare it to the observed estimate, πobs. If the difference betweenπsim andπobs is less than a threshold value (tolerance) we keep the valueof θ used in the simulation run. Otherwise, we discard it. We repeat thisa certain number of times until we obtain a posterior distribution of theparameter. ABC can handle complex demographics models and meth-ods have been developed to compare the probability of the differentmodels [102].

C. bursa-pastoris could be divided into three genetically distinct clusters:one cluster grouped accessions fromWestern Europe and Southeastern Siberia(hereafter EUR), a second included samples from theMiddle East andNorthernAfrica (hereafter ME), and the third one united accessions from Eastern Asia(hereafter ASI) (Fig. 2.1). The ABC analysis supported the hypothesis thatC. bursa-pastoris most likely originated in the Middle East and then spreadtowards Europe and later into Asia. The ME population diverged 167 250years ago from an ancestral population, the EUR population diverged fromME7 967 years ago, and the Asian population diverged from the Middle Easternpopulation 942 years ago. The estimated current effective population sizes(Ne) were 14 748, 21 130 and 25 376 for ASI, EUR and ME, respectively.The gene flow between these populations was moderate and each populationunderwent a bottleneck at its origin with the strongest bottleneck in ASI. TheME population, as the putative origin population, was the most geneticallydiverse and the ASI population was the least diverse.A recent colonization of the world by C. bursa-pastoris was in agreement

with earlier studies. Previous estimates of the origin time of C. bursa-pastoriswere around 67,770 ya [95], 80,000 ya [103] and 128,000 ya [78], which wereclose to our estimate of the origin of theME population some 167,250 year ago.

25

FST=0.19

FST=0.16FST=0.10

A

B

Figure 2.1. Genetic structure and ancestry ofC. bursa-pastoris accessions in Eura-sia and Northern America. A.Map with the mean membership proportions inferredby ADMIXTURE for the three detected clusters in samples collected from the samesite; FST : pairwise genetic differentiation [101]. B. Population structure of C. bursa-pastoris inferred with ADMIXTURE. Each vertical line represents an individual. Col-ors represent the inferred ancestry from three ancestral populations.

The discovered genetic differentiation between ASI, EUR and ME was also inline with previous studies [104, 95]. We further proposed that the colonizationhistory of C. bursa-pastoris was human-mediated. This was primarily sup-ported by the accessions from the Russian Far East that were geographicallycloser to the Asian cluster but genetically belonged to the European one. Thislong-distance dispersion was likely carried out by humans along the Trans-Siberian Railway, or through human colonization of Siberia in the 16th cen-tury. Similarly, the origin of the North American populations from the EURand ME populations was best explained by human-mediated movement (see[76] and references therein).As illustrated by the following three papers, this strong population structure

and the recent multistage colonization history played a key role in the evolutionof C. bursa-pastoris.

Paper II. Adaptation and neutral processesThe complex population structure and demographic history of species add tothe difficulty of telling apart the effects of natural selection and genetic drift.In structured populations, observed phenotypic differences can simply be dueto genetic drift, and if the genetic distance between populations is not con-

26

A B C

Figure 2.2. Differentiation between the three populations of C. bursa-pastoris inclimatic parameters, gene expression levels and genetic variation. A. A PCA of20 climatic variables for 65 populations. B. A multidimensional scaling plot of theexpression levels in 19 484 genes from 24 accessions. Distances can be interpretedin terms of log2-fold-change (logFC). C. A PCA of 261 accessions based on 4 274nuclear SNPs obtained by GBS.

sidered, the differentiation may be falsely interpreted as adaptive. The preva-lence of neutral evolutionary processes has been suggested for gene expressionevolution in various organisms, including humans [105], plants [106] and fish[107]. The variation in gene expression was often found to be correlated withgenetic distance and thus was more likely a consequence of genetic drift thanadaptive divergence. In the present study, we assessed the effect of popula-tion genetic structure on variation in gene expression and flowering time inC. bursa-pastoris.To assess population structure, we used the GBS data generated for 261 ac-

cessions in Paper I [96]. The gene expression data was generated with RNA-Seq for a subset of 24 accessions. We also measured the variation in floweringtime and circadian period length in the 261 accessions. We used flowering timeand circadian rhythm as potentially adaptive traits because the onset of flower-ing at the right time ensures reproductive success, and the regulatory networkof flowering time overlaps with the regulatory pathways of circadian rhythm[108, 109]. To compare the environmental differences between populations,we retrieved climatic data for the collection sites (20 bioclimatic variables)[110, 111]. The variation in climatic variables, gene expression, and GBS datawas explored with a principal component analysis (PCA), multidimensionalscaling plot (MDS), correlation tests and linear models. The correction forthe population structure was done by using the phylogenetic distance betweensamples as a random effect in generalized linear mixed models.The Asian (ASI), European (EUR) and Middle Eastern (ME) populations

of C. bursa-pastoris showed clear differentiation in climatic conditions, geneexpression variation and genetic polymorphism (Fig. 2.2). Variation in flow-ering time and circadian period length was significantly associated with thesegenetic groups. Controlling for population structure did not affect this associ-

27

ation indicating that variation in these traits could be adaptive. We also foundsignificant differences between the three populations in gene expression. Outof 19 484 analyzed genes, 2 457 genes were differentially expressed betweenASI and EUR, 1 317 between ASI and ME, 903 between EUR and ME. Someof these genes control flowering time variation and thus these differences couldbe adaptive. However, all significant expression differences were removedwhen correcting for the population genetic structure. This indicated that mostdifferences were due to genetic drift. We assumed that these results could alsobe due to the confounding of population structure and distribution of adaptivetraits. To bypass this effect, we also compared gene expression differences be-tween early and late flowering ecotypes within the ASI and EUR clusters. Wefound different sets of genes showing differential expression in each of thesetwo populations. Among these genes, FLOWERING LOCUS C (FLC), a geneknown to regulate flowering time, was the best candidates for local adaptationin EUR.These results exemplify how a demographic history that results in a strong

population structure may complicate the interpretation of gene expression dif-ferences within a species. On the one hand, gene expression differences be-tween populations of C. bursa-pastoris detected without the correction of pop-ulation structure could be interpreted as adaptive. However, all these differ-ences were explained by the population structure. Thus, the potentially adap-tive changes were likely false positives. It has been shown how easy it is todevelop biologically sensible explanations of the adaptive importance for thedifference detected in the data simulated under a purely neutral model [112].On the other hand, the three populations inhabit regions with different climaticconditions and variation in gene expression between these populations is un-likely to be purely neutral. For example, variation in flowering time and cir-cadian rhythm was not purely neutral. Thus, either the distribution of adaptivetraits in C. bursa-pastoris coincides with population structure or the effect oflocal adaptation, if present, is weak. Finally, local adaptation may get evenmore complicated if selection varies over the entire geographical range but actsat a very local scale. For example, it seems that neither humans norArabidopsisthaliana have experienced large-scale local adaptation across their geograph-ical range, but rather local selection within different populations [113, 114].Evidence of selection on FLC and some other genes in EUR and ASI alsosupports this hypothesis of a more local adaptation for C. bursa-pastoris.

Paper III. Differential evolution of two subgenomesacross populationsIt remains largely unknown how the two genomes evolve once they have fusedin an allopolyploid species and how strongly their evolutionary trajectories de-pend on the initial differences between the two parental species and the specific

28

0.04

0.05

0.06

0.07

0.08

0.09

0.10

Dele

teri

ous / T

ota

l

Co Cg Co Cg Co Cg

ASI EUR MECO CG

Figure 2.3. Genetic load in the subgenomes of C. bursa-pastoris and its parentalspecies. The proportion of deleterious nonsynonymous changes was estimated on de-rived alleles, i.e. alleles accumulated after the speciation of C. bursa-pastoris. Coand Cg are the two subgenomes of C. bursa-pastoris. ASI, EUR, ME, CO, CG indi-cate Asian, European and Middle Eastern populations C. bursa-pastoris, and parentalspecies C. orientalis and C. grandiflora, respectively.

demographic history of the newly formed allopolyploid species. For example,species that went through repeated bottlenecks during their range expansionare expected to have reduced genetic variation and higher genetic load thanmore ancient central populations [115, 116]. Populations can also admix withdifferent related species and such admixture could shift the evolutionary pathof the local population. Finally, range expansion will expose newly formedallopolyploid populations to divergent selective pressures, providing the pos-sibility of differentially exploiting the two subgenomes, and creating asym-metrical patterns of adaptive evolution in different populations. In this paper,we tested whether the two subgenomes of C. bursa-pastoris have similar ordifferent evolutionary trajectories in term of hybridization, selection and geneexpression in three known populations.We phased 31 genomes and 24 transcriptomes for variation between the two

subgenomes, one descended from the outcrossing and highly diverse C. gran-diflora (hereafterCbpCg) and the other one descended from the selfing and ge-netically depauperate C. orientalis (hereafterCbpCo). To perform the phasing,we used genomic data of 10 C. orientalis and 13 C. grandiflora as references.For each subgenome, we assessed its phylogenetic relationship with the diploidrelatives, temporal changes of effective population size (Ne), signatures of pos-itive selection, amount of accumulated deleterious mutations (genetic load),genetic diversity, and levels of gene expression. We also performed tests foradmixture of the phased subgenomes of C. bursa-pastoris with other Capsellaspecies.

29

Figure 2.4. Phylogenetic relationships between subgenomes of C. bursa-pastorisand other Capsella species. A. Whole genome neighbor-joining phylogenetic tree.The bootstrap support based on 100 replicates is shown only for the major clades.The root (Neslia paniculata) is not shown. B. Overlapping of 1002 neighbor-joiningtrees reconstructed with 100 Kb sliding genomic windows. ASI, EUR ME, CO, CG,CR indicate Asian, European and Middle Eastern populations of C. bursa-pastoris,C. orientalis, C. grandiflora, and C. rubella, respectively. The CbpCo and CbpCg

subgenomes are marked with Co and Cg, respectively.

30

We found that during∼100,000 generations of co-existence within the samespecies, the effective population sizes, Ne, of the two subgenomes decreasedgradually and converged in all three regions. However, the two subgenomesretained part of the initial difference between the two parental species. TheCbpCg subgenome, inherited from a genetically more diverse parent, was alsomore diverse than theCbpCo subgenome inherited from a selfing parent with avery low genetic diversity. The amount of genetic load also differed betweensubgenomes and indicated that purifying selection removed deleterious muta-tions more efficiently from CbpCg (Fig. 2.3) that had a larger effective popu-lation size than CbpCo. However, purifying selection was not as efficient asin outcrossing C. grandiflora, but it was more efficient than in selfing diploidC. orientalis. These differences in diversity and load, as well as differencesin patterns of positive selection and levels of gene expression, also stronglydepended on the specific histories of the three populations considered. TheAsian population that was the most distant from the tentative place of originof the species was the least genetically diverse and had the highest number ofdeleteriousmutations. This population also experiencedmore selective sweepson the CbpCo subgenome, whereas the European and Middle Eastern popula-tions experienced positive selection mostly on the CbpCg subgenome. Moststrikingly, and unexpectedly, the three populations hybridized with differentdiploid relatives. There was strong hybridization between C. orientalis andC. bursa-pastoris in Asia and this admixture profoundly affected the phylo-genetic relationship between subgenomes and parental species (Fig. 2.4). Itlooked like C. orientalis was derived from the Asian C. bursa-pastoris. Thispattern could also be retrieved if one assumes a scenario of multiple originsof C. bursa-pastoris, but it seems more likely to be the result of admixture.Admixture was also detected between C. rubella and C. bursa-pastoris in Eu-rope. There were no major changes in gene expression between subgenomes,though the CbpCg subgenome was slightly more expressed than CbpCo in Eu-rope and the Middle-East, whereas there was rather equal expression betweenthe subgenomes in Asia.In summary, the shift to selfing and polyploidy had a global impact on the

two subgenomes of C. bursa-pastoris resulting in a sharp reduction of the ef-fective population size in all populations, that was accompanied by relaxedpurifying selection and accumulation of deleterious mutations. However, thetwo subgenomes still retained a strong signature of parental legacy and weredifferentially affected by introgression and selection in different geographi-cal regions. Hence, this study illustrates that, first, the parental legacy hasa long-term impact on the evolution of allopolyploid species, and second, theevolutionary trajectories of allopolyploid populations may differ depending onthe ecological arena.

31

Paper IV. Convergence and parental legacy in thesubgenomesThe impact of allopolyploidy on genome diversity and expression pattern stillremains to be understood and put in a clear temporal perspective. Some studieshave described genome upheaval and ”transcriptomic shock” [117, 118, 119],whereas others detected a strong legacy from the parental species and muchmore subtle and concerted changes [120, 121]. These differences across stud-ies could result from many factors such as age and demographic history of thealloploid species or extent of the divergence between the parental species. Inthe present study, we investigated genomic and gene expression changes in thetwo subgenomes of the recent allopolyploid C. bursa-pastoris.We sequenced the transcriptomes of three tissues (root, leaf, and flower) of

C. bursa-pastoris, its two parental species, C. grandiflora and C. orientalis,and its close relative C. rubella. We compared the overall and homeologue-specific levels of expression betweenC. bursa-pastoris and its parental species.To control for mapping bias in gene expression data, we used the genomic datagenerated in the previous study (Paper III) [79]. We compared both the totalexpression level (CbpCo + CbpCg) as well as homeologue-specific expres-sion with correlation, differential expression analyses and a similarity index(S), which measured relative expression deviance of each subgenome fromthe average parental expression level. We also compared the distribution ofsynonymous and deleterious mutations between the two subgenomes with amodel-based approach that accounts for an inherent bias towards one of thesubgenomes. We then accounted for this bias and tested if the deleterious mu-tations were randomly distributed between the two subgenomes or whetherthey occurred more frequently on one of them than expected by chance. Fi-nally, we checked whether the proportion of deleterious mutations per geneand gene expression levels were correlated in any way.The gene expression ratio between subgenomes was overall equal (0.496)

and not different from the ratio inferred in the DNA count data (0.497), whereno difference between subgenomes is expected. Levels of gene expressionwere strongly correlated between the subgenomes (Spearman’s ρ 0.87-0.94),and expression in each subgenome was even more correlated with the corre-sponding parental species (Spearman’s ρ 0.88-0.97). These strong correlationswere further supported by the differential gene expression analysis, which re-vealed that 55-80% of the genes depending on the tissue were of the same levelin all three species (Table 2.1). However, a small proportion of genes showedthe expression dominance of one of the parental species in the total level ofexpression. Notably, this dominance was opposite between the two tissues:the total expression level was more similar to C. orientalis in flower, and toC. grandiflora in root (Fig. 2.1). Interestingly, the expression of the selfingspecies, C. rubella, was the most similar to the expression pattern of its closestrelative, the outcrossing C. grandiflora in all tissues including flower. This

32

Table 2.1. Gene expression levels in C. bursa-pastoris and the parental species

Expression pattern Flower Leaf Root

CO Cbp CGCO Cbp CG

CO Cbp CGCO Cbp CG

CO Cbp CGCO Cbp CG

CO Cbp CGCO Cbp CG

CO Cbp CG

8 805 12 656 10 278

2 164 506 687

1 229 720 1 297

1 498 713 941

565 1 226

(55.6%)

(13.7%) (3.2%) (4.3%)

(80.0%) (65.0%)

(7.8%) (4.6%) (8.2%)

(9.5%) (4.5%) (5.9%)

(3.6%) (7.7%)

CO Cbp CG

CO Cbp CGCO Cbp CG CO Cbp CG

1 139(7.2%)

664 1 395(4.2%) (8.8%)

989(6.2%)

No difference

Additivity

Dominance

Transgressive

CO, CG, and Cbp correspond to C. orientalis, C. grandiflora, and C. bursa-pastoris,respectively. The y-axes show levels of expression. The levels were considereddifferent if they showed significant differential expression (FDR < 0.05). The moststriking differences showing reversed levels of expression between tissues arehighlighted in bold.

33

suggested that the detected similarity of C. bursa-pastoris and C. orientalis inflower tissues was not purely due to selfing. The variation between tissues wasfurther confirmed by computing a similarity index (S), which was systemati-cally biased toward the corresponding parental species in each subgenome butthis bias varied between tissues. In addition, the correlation of S values ofhomeologous genes of two subgenomes suggested that these genes were gen-erally regulated into the same direction. The slope of the regression betweenthe expression ratio of the subgenomes over the expression ratio of the parentalspecies was always less than one, indicating the presence of co-regulation ofthe two subgenomes through a mixture of trans- and cis-regulation [122, 123].The slope was less in flower tissue than in leaf and root tissues, suggesting astronger trans-regulation in the former than in the latter. Finally, the deleteri-ous mutations accumulated more on the CbpCo subgenome relative to the syn-onymous mutations, the distribution of deleterious mutations between the twogenomes had a smaller variance compared to the distribution of synonymousmutations. This suggests that both copies are needed and prevent to accumu-late too many deleterious mutations. We also found no association betweenlevels of expression and the number of deleterious mutations in genes.To conclude, since the speciation ofC. bursa-pastoris around 100,000 years

ago, some of the major differences between the ancestral genomes started tobe erased and the two subgenomes have started to act in a coordinated manner.However, a significant legacy effect on the number of deleterious mutationscarried by the two subgenomes, and on the total expression level in the threetissues can still be detected.

34

3. Conclusions and future perspectives

In the twenty-first century, the research on polyploidy has become particularlymolecular-centered due to the availability of huge amount of whole-genomesequencing data. Although this thesis is also primarily based on the whole-genome data, it differs from many modern works by considering not onlymolecular but also demographic and ecological aspects of polyploidy evolu-tion. In Paper I, we have shown that C. bursa-pastoris is structured into threegenetically distinct populations that have different effective population sizeand thus are unequally affected by random genetic drift. In Paper II, we fur-ther bolster the differentiation between these populations by showing the dif-ferences in climatic, phenotypic and gene expression variation between them.In Paper III, we reveal that selection targets different subgenomes across thesepopulations and that the populations carry a different amount of genetic load.Moreover, C. bursa-pastoris hybridizes with different species depending ongeographical region and this hybridization has a profound effect on molecularvariation in some populations. All these results emphasize the importance ofstudying polyploid species across their distribution range because evolutionaryprocesses may substantially differ across populations and sampling only onepopulation could lead to biased results and misleading conclusions.The major global changes relative to the parental species would, of course,

be evident in all populations of an allopolyploid. In C. bursa-pastoris, thesechanges include a steep decrease in the effective population size, relaxed pu-rifying selection and no strong expression dominance between subgenomes.There is also a long-lasting effect of parental legacy in the amount of ge-netic load between subgenomes. From these findings, it can be concludedthat C. bursa-pastoris experienced some events shared by many polyploids.For example, it went through a demographic bottleneck and shifted to self-ing which led to reduced efficacy of natural selection. Gene expression inC. bursa-pastoris as in many other allopolyploids was not disrupted, thoughsome allopolyploids experience profound expression changes (so-called ”tran-scriptomic shock”). However, I still would like to point out that smaller scalevariation in these effects across population was significant and cannot be ig-nored.This thesis also addresses the question of adaptation in polyploids. Our

results on local adaptation seem to strongly depend on the levels of integra-tion and geographical scale that are investigated. Signs of global adaptation inflowering time and circadian rhythm variation are observed in Paper II. On theother hand, gene expression variation does not depart from neutral expecta-tions. In Paper III, variation in positive selection across subgenomes depends

35

on population and could be considered as indirect evidence of local adaptation.However, this inference is not straightforward as the observed patterns couldalso be due to complex demographic history, especially admixture with differ-ent sister species. The general view of low local adaptation, or possibly mal-adaptation in some subpopulations, is also supported by a large-scale, multi-site, common garden experiment that is not included in this thesis (Manuscriptin prep.). Hence, despite being distributed across diverse environments andclimates around the globe, the evolution of C. bursa-pastoris seems to havebeen dominated by neutral rather than adaptive processes.In addition, this thesis further exemplifies how C. bursa-pastoris can be

used as a model species to study polyploid evolution. In Paper III and IV, wesuccessfully phased a major part ofC. bursa-pastoris genome. Such phasing isonly possible becauseC. bursa-pastoris is of known origin, it exhibits a strictlydisomic inheritance, the genomic data of parental species is available, it has asmall genome, and the fully assembled and annotated genome of a closely re-lated species is available. These conditions are not met in many allopolyploidspecies. Paper III shows how this phased data sheds new light on the differen-tial evolution of the subgenomes across populations of C. bursa-pastoris. Pa-per IV, that is based on the phased expression data of C. bursa-pastoris and itsparental species, demonstrates that gene expression regulation can vary acrossdifferent tissues and that there is a strong legacy from the parental species.Finally, these thesis has already shed light on many questions of polyploid

evolution, but if I had a chance to continue working on this project, I wouldwork in the following directions. First, I would further extend Paper IV withadditional gene expression analyses and a stronger focus on different popula-tions. Second, I would verify the demographic history reconstructed in PaperII but this time with the ABC analysis of the phased genomic data. Third,I think one of the most intriguing questions would be to compare the abso-lute expression levels between C. bursa-pastoris and its diploid relatives at thesingle-cell level. Fourth, it would be most exciting to analyze genomic datawith good quality reference genome assemblies of all four Capsella species toeliminate all technical issues I had to cope with during the work on this thesis.Finally, as a long time perspective, it would also be interesting to look at smallRNAs and epigenetic regulation in C. bursa-pastoris.

36

Svensk sammanfattning

Termen allopolyploid avser en organism som har sitt ursprung genom hybri-disering och polyploidi. Hybridisering förenar två genom från olika arter tillen organism, medan polyploidi dubblerar (ibland tripplerar eller till och medfyrdubblar) uppsättningarna av kromosomer. Dessa modifieringar har vanligt-vis negativa konsekvenser för organismen. Det finns dock arter som inte baraöverlever dessa modifieringar utan till och med frodas och konkurrerar ut nor-mala arter. Det finns flera teoretiska förklaringar till detta fenomen, men antaletempiriska studier är begränsat.

Capsella bursa-pastoris, känd som lomme, håller på att bli en modell förstudier av en framgångsrik allopolyploid art. C. bursa-pastoris förekommeröver hela världen, medan dess föräldraarter, Capsella grandiflora och Cap-sella orientalis, har mer begränsade distributionsområden. C. grandiflora ärbegränsad till norra Grekland och Albanien, och C. orientalis finns på stäp-perna i Centralasien. Genetiska studier på C. bursa-pastoris startades i börjanav 1900-talet. Studier omC. bursa-pastoris evolutionsbiologi påbörjades docksenare. Det var exempelvis osäkerhet runt ursprunget av C. bursa-pastoris ochdet var endast några få år sedan det blev klarlagt att C. bursa-pastoris är en al-lopolyploid. Den världsomspännande distributionen av C. bursa-pastoris varlänge känd, men om det fanns någon populationsstruktur, var arten härstamma-de ifrån och hur den spred sig fastställdes inte. C. bursa-pastoris allopolyplo-ida ursprung leder också till frågor om utvecklingen av de två subgenom somärvts av ganska olika föräldrar, en genetiskt fattig självbefruktare (C. orienta-lis) och en extremt genetiskt varierad fakultativ utkorsare (C. grandiflora). OmC. bursa-pastoris står inför olika miljöförhållanden runt om i världen, anpas-sar den sig lokalt till dessa olika miljöer? Hur påverkar självreproduktionen,som är känd för att leda till ökad genetisk belastning och inavelsdepression,C. bursa-pastoris prestanda? Jag har analyserat genomiska och transkriptomis-ka data samt några fenotypiska egenskaper för att svara på alla dessa frågor imin avhandling.I Artikel I beskrev vi den genetiska variationen i C. bursa-pastoris och

visade att den inte är homogen över Eurasien utan snarare indelad i tre gene-tiskt distinkta populationer: den första består av individer från Europa och öst-ra Sibirien, den andra ligger i östra Asien, och den tredje grupperar individerfrån Mellanöstern och Nordafrika. Rekonstruktion av koloniseringshistorienföreslog att denna art härstammar från Mellanöstern och därefter spridits tillEuropa och östra Asien. Vi upptäckte också att anslutningar från ryska FjärranÖstern tillhörde den europeiska gruppen genetiskt men befann sig närmare den

37

asiatiska gruppen. Detta indikerade att denna spridning mest sannolikt utfördesav människor längs den transsibiriska järnvägen, eller genom kolonisering avSibirien avmänniskor under 1500-talet. På samma sätt är ursprunget för de nor-damerikanska populationerna från Europeiska och Mellanöstern-populationersannolikt också resultatet av människoförmedlad spridning.I Artikel II testade vi om det finns tecken på lokal anpassning i genuttryck

och variation av blomningstid i C. bursa-pastoris samt om det finns någoneffekt av den upptäckta populationsgenetiska struktur på variation av dessatvå egenskaper. I strukturerade populationer kan de observerade fenotypiskaskillnaderna helt enkelt bero på genetisk drift, och om det genetiska avståndetmellan populationer inte beaktas, kan differentieringen felaktigt tolkas som ad-aptiv. Vi upptäckte att populationerna av C. bursa-pastoris från Asien, EuropaochMellanöstern visade tydlig differentiering i klimatförhållanden, genuttryckoch blomningstid. Variation i blomningstid, en potentiellt adaptiv egenskap ef-tersom blomningsstart vid rätt tidpunkt säkerställer reproduktiv framgång, på-verkades svagt av populationens genetiska struktur vilket indikerade att varia-tion i denna egenskap kunde vara adaptiv. Emellertid försvann alla skillnadernai uttryck efter korrigering för populationens genetiska struktur. Detta indike-rade antingen att de flesta skillnaderna i genuttryck var neutrala eller att denlokala anpassningen, om den var närvarande, var svag och korrelerad med po-pulationsstrukturen. Vi upptäckte också några bevis för att anpassningen kundevara mer lokal och variera inom varje genetisk grupp.I Artikel III och IV fokuserade vi på utvecklingen av de två subgenom-

en hos C. bursa-pastoris och mängden kvarvarande föräldraarv i dessa tvåsubgenom. Vi upptäckte att båda subgenomen uttryckte kraftig minskning avgenetisk mångfald efter artbildningen av C. bursa-pastoris, förmodligen pågrund av självbefruktning. Trots deras gemensam historia som en art behöllde två subgenomen också skillnader i mängden genetisk belastning som ärvtsav föräldraarterna och påverkades differentiellt av introgression och urval i detre populationerna. Variationen i genuttryck var högt korrelerad mellan de tvåsubgenomen men en liten del av generna visade differentiering över populatio-ner och vävnader. Det fanns flera gener som visade överuttryck av subgenometsom ärvts av C. grandiflora i Mellanöstern och Europa, medan de två subge-nomen uttrycktes lika i Asien. I blommor hade C. bursa-pastoris ett uttrycksom liknar C. orientalis både i den övergripande nivån av transkript och påsubgenomnivån, medan den i blad och rötter liknade C. grandiflora.Sammanfattningsvis visar denna avhandling för första gången att allopoly-

ploider kanske inte är homogena i sina distributionsområden och de kan ocksåmarkant följa olika utvecklingsvägar i olika populationer. Denna avhandlingvisar också att trots om sammanslagningen av två divergerande genom i al-lopolyploider resulterar i deras konvergens, så kan vissa skillnader bibehållasoch troligtvis möjliggör dessa skillnader evolutionär flexibilitet hos allopoly-ploida genom.

38

Резюме українською

Алополіплоїд - це організм, який утворився шляхом гібридизації та одно-часного збільшення набору хромосом (поліплоїдія). Гібридизація та полі-плоїдія зазвичай мають негативні наслідки для організму. Однак є види,які не тільки виживаюсь при цих модифікаціях, а навіть процвітають таконкурують з батьківськими видами. Є кілька теоретичних пояснень цьо-го явища, але кількість практичних досліджень обмежена.Вид Capsella bursa-pastoris, відомий як грицики звичайні, є перспе-

ктивною моделлю для вивчення таких успішних алополіплоїдних видів.C. bursa-pastoris поширена по всьому світі, тоді як її батьківські види,Capsella grandiflora та Capsella orientalis, мають значно менші ареали.ПоширенняC. grandiflora обмежується Грецією та Албанією, аC. orienta-lis зустрічається в Середній Азії. Генетичні дослідження грициків звичай-них були розпочаті на початку XX століття. Проте дослідження еволюцій-ної біології C. bursa-pastoris було розпочато значно пізніше. Наприклад,походження C. bursa-pastoris залишалось невідомим досить довго і тіль-ки кілька років тому з’ясувалося що C. bursa-pastoris є алополіплоїдом.Про всесвітнє поширенняC. bursa-pastoris було відомо давно, але не буловстановлено чи існує у цього виду певна генетична структура популяції,звідки цей вид походить і яким чином він поширився. Алополіплоїдне по-ходження C. bursa-pastoris також ставить питання щодо еволюції її двохсубгеномів, які були успадковані від досить різних батьків - генетичнобідного пращура C. orientalis, який розмножується самозапиленням, танадзвичайно генетично різноманітного пращура C. grandiflora, який роз-множується перехресним запиленням. Також якщо C. bursa-pastoris зу-стрічається в різних середовищах по всьому світі, як вона адаптується доцих різноманітних умов? Я проаналізував геномні, транскриптомні та де-які фенотипічні дані C. bursa-pastoris для вирішення всіх цих питань вцій дисертації.У статті I ми описали генетичні варіації C. bursa-pastoris і показали,

що цей вид не є однорідними, а поділяться на три генетично відміннихпопуляції на території Євразії: одна розповсюджена в Європі та на СходіСибіру, друга - в Східній Азії, а третя - на Близькому Сході та ПівнічнійАфриці. Реконструкція історії колонізації цих регіонів показала, що цейвид зародився на Близькому Сході й згодом поширився в Європу та Схі-дну Азію. Ми також виявили, що зразки зі Сходу Сибіру генетично нале-жали до європейської групи, але були розташовані ближче до азіатської.Це дозволило припустити, що така далека колонізаціяшвидше за все здій-снилась за допомогою людей вздовж Транссибірської магістралі, або під

39

час колонізації Сибіру людьми в XVI столітті. Аналогічним чином, похо-дження північноамериканської популяції від європейської та близькосхі-дної популяції імовірно є наслідком розповсюдження з людьми.У статті IIми перевірили наявність ознак локальної адаптації в експре-

сії генів та варіації часу початку цвітіння, а також перевірили чи варіаціїцих двох рис залежать від виявленої генетичної структури. Фенотипічнівідмінності, що спостерігаються у структурованих популяціях, можутьбути пов’язані з дрейфом генів, і якщо не брати до уваги генетичну від-стань між популяціями, відмінності можуть бути помилково інтерпрето-вані як адаптивні. Ми виявили, що всі три популяції демонструють чі-тке розмежування в кліматичних умовах, експресії генів та в часі поча-тку цвітіння. На варіацію в часі цвітіння, як потенційно адаптивної риси,оскільки початок цвітіння в потрібний час забезпечує репродуктивнийуспіх, популяційна генетична структура мала слабкий вплив, що вказуєна те, що варіація цієї ознаки може бути адаптивною. Проте всі відмінно-сті в експресії генів зникли після корекції даних на генетичну структурупопуляції. Це вказує на те, що більшість відмінностей в експресії генів єнейтральними або місцева адаптація, якщо вона присутня, є слабкою тазбігатися зі структурою популяції. Ми також виявили деякі докази, щоадаптація може бути більш локальною в межах кожної генетичної групи.У статтях III та IV ми зосередили увагу на еволюції двох субгеномів

та кількості батьківських рис, що залишились в цих двох субгеномах. Мивиявили, що обидва субгенома проявляють зниження генетичної різно-манітності після видоутворення C. bursa-pastoris, ймовірно через самоза-пилення. Проте, не зважаючи на спільну історію в межах одного виду, цідва субгеноми також зберегли відмінності в кількості генетичного тягаря,успадкованого від батьківських видів, а також зазнали різного впливу від-бору та гібридизації з близькими видами. Варіація експресії генів сильнокорелювала між субгеномами, але мала частка генів відрізнялась в екс-пресії між популяціями та тканинами. На Близькому Сході та в Європібільша частка генів демонструвала надмірну експресію субгеному успад-кованого від C. grandiflora, тоді як в Азії субгеноми були більш рівні міжсобою. У квітах експресія генів була більш подібна до C. orientalis, тодіяк у листках та корінні вона була більш схожа з C. grandiflora.На завершення варто зазначити, що ця дисертація вперше демонструє,

що алополіплоїди можуть бути генетично неоднорідними по територіїпоширення й еволюційні шляхи можуть різнитися між популяціями. Цяробота також виявила, що хоча об’єднання двох різних геномів у алопо-ліплоїдах призводить до їх еволюційного зближення, деякі відмінностіміж ними можуть бути збережені досить довго, і ці відмінності ймовірнозабезпечують еволюційну гнучкість алополіплідних видів.

40

Acknowledgements

There were many people who helped me during the course of this thesis as wellas on the way to it and I would like to reflect on all of them here.First of all, I would like to express my sincere gratitude toMartin Lascoux,

my Ph.D. supervisor, for giving me a chance to do this Ph.D. project. I am verythankful for your patient guidance, enthusiastic encouragement, gentle criti-cism and constant positivity. These four years have been exciting, challengingand gratifying. You taught me to like negative and intricate results. I also haveto admit that thanks to you I do appreciate neutral evolutionary processes now.I would like to express my very great appreciation to Stephen Wright and

Sylvain Glémin for their advice and assistance during the work on this Ph.D.thesis. Stephen, thank you for your hospitality during my visit to Toronto andfor your prompt replies during our remote communication. Sylvain, your ideaswere fascinating and inspiring. I would also like to thank Ulf Lagercrantz forco-supervising this project. Your assistance on the RNA-Seq experiments wasvery valuable.Special thanks go to Karl Holm, Amandine Cornille, Jun Chen, Maria

Guerrina, Clément Lafon-Placette and Marion Orsucci for their coopera-tion in some projects of this thesis. Karl, your time and willingness to helpwhen I just started my Ph.D. studies is greatly appreciated. Amandine, it hasbeen fun to share the office with you and to experience fickle Swedish weatherduring the common garden experiment. Jun, your expertise in genomic analy-ses was invaluable. Maria and Clément, thank you for helping with the crosses.Marion, I greatly appreciate your help at the final stage of this thesis when therewas strong time pressure.I also wish to thank Mimmi Eriksson and Tianlin Duan for doing their

Master’s project with us. Your persistence and independence was impressiveand the results you produced were a great contribution to this thesis. My grate-ful thanks also go to our collaborators in Toronto. Adriana Salcedo, JuliaKreiner, Tyler Kent, Huirun Huang and John Stinchcombe, working withyou has been pleasant and productive.I also wish to recognize the valuable help of Kerstin Jeppssonwhose assis-

tance in the lab was vital. Without your responsive help and timely provisionof chemicals, none of my Ph.D. projects would be finished in time.I express my very profound gratitude to my friend Glib Mazepa, who en-

couraged me to apply to this Ph.D. program and helped with the applicationprocess. Your contribution to this thesis was indirect but extremely valuable.

41

My sincere thanks also goes toYevgen Ryeznik for his continuous encourage-ment. Talking to you in some down moments helped me to keep going. It wasalso a pleasure to work with you on our Data Science project.My grateful thanks are also extended to all people at Evolutionary Biology

Center and the Department of Plant Ecology and Evolution, in particular. I amthankful toAndres Cortes, Xiaodong Liu, Lili Li, Judith Trunschke, ElodieChapurlat, Giulia Zacchello, Luis Leal, Anna-Malin Linde, Fia Bengts-son, Charles Campbell, Matthew Tye, Linus Söderquist, Froukje Postma,Thomas Ellis, Venkat Talla, Ludovic Dutoit, Nagarjun Vijay, Moos Blom,Foteini Spagopoulou, Homa Papoli, Paulina Bolivar, Willian Silva, Ser-gio Tusso, Lore Ament, Kristaps Sokolovskis, Josefine Stångberg, IvainMartinossi, Matthias Weissensteiner, and Jesper Svedberg for stimulatingscientific discussions and all the fun we had together.I also would like to thank the organizers of all courses I took during my

Ph.D. studies. I am especially grateful to Lucas Sinclair, Ludovic Dutoit,and Alexander Eiler for the Python course, Raazesh Sainudiin for the DataScience course,Andreas Dahlin for the Visualize your Science course, TereseBergfors for the Writing Scientific English course, and Göran Arnqvist forthe Statistics course. These courses had the most profound impact on my sci-entific skills.Probably, I would never be able to do this Ph.D. thesis without the knowl-

edge and experience I gained during the Erasmus Mundus Master Program inEvolutionary Biology. I would like to pay my regards to people who run thisprogram, in particular, Franjo Weissing, Irma Knevel and Jacob Höglund.When I look back and think about the time I decided to pursue a scientific

career, I wish to acknowledge my very first scientific supervisor, GennadiyShandikov. You kept telling me to do what I like and dispelled all my doubts.I am glad I followed your advice and I owe you a debt of gratitude for the veryfirst step towards a Ph.D. degree.I must express my very profound gratitude to my English teacher, Nikolay

Yakymenko. You opened me a door to international science and made thisPh.D. thesis possible.Finally, this accomplishment would not have been possible without unfail-

ing support from my parents, sister, and wife. They have given up manythings for me to do this Ph.D. thesis. Thank you for your patience, under-standing and motivation during these last four years and my life in general.My sister deserves special thanks for designing the cover of this thesis. I alsowish to thank my little daughter who was born just at the time of writing thisthesis. I will blissfully remember restless days and sleepless night in the strug-gle to write this thesis and to calm you down.

42

References

[1] A. Salman-Minkov, N. Sabath, and I. Mayrose, “Whole-genome duplication asa key factor in crop domestication,” Nature Plants, vol. 2, p. 16115, 2016.

[2] S. Renny-Byfield and J. F. Wendel, “Doubling down on genomes: Polyploidyand crop plants,” American Journal of Botany, vol. 101, no. 10,pp. 1711–1725, 2014.

[3] M. C. Sattler, C. R. Carvalho, and W. R. Clarindo, “The polyploidy and its keyrole in plant breeding,” Planta, vol. 243, no. 2, pp. 281–296, 2016.

[4] G. L. Stebbins, Processes of organic evolution. Prentice-Hall, 1971.[5] W. Wagner Jr, “Biosystematics and evolutionary noise,” Taxon, pp. 146–151,

1970.[6] R. J. Schultz, “Role of polyploidy in the evolution of fishes,” in Polyploidy,

pp. 313–340, Springer, 1980.[7] D. E. Soltis, C. J. Visger, D. B. Marchant, and P. S. Soltis, “Polyploidy: Pitfalls

and paths to a paradigm,” American Journal of Botany, vol. 103, no. 7,pp. 1146–1166, 2016.

[8] I. Mayrose, S. H. Zhan, C. J. Rothfels, K. Magnuson-Ford, M. S. Barker, L. H.Rieseberg, and S. P. Otto, “Recently formed polyploid plants diversify at lowerrates,” Science, vol. 333, no. 6047, pp. 1257–1257, 2011.

[9] D. E. Soltis, M. C. Segovia-Salcedo, I. Jordon-Thaden, L. Majure, N. M.Miles, E. V. Mavrodiev, W. Mei, M. B. Cortez, P. S. Soltis, and M. A.Gitzendanner, “Are polyploids really evolutionary dead-ends (again)? Acritical reappraisal of Mayrose et al. (2011),” New Phytologist, vol. 202, no. 4,pp. 1105–1117, 2014.

[10] I. Mayrose, S. H. Zhan, C. J. Rothfels, N. Arrigo, M. S. Barker, L. H.Rieseberg, and S. P. Otto, “Methods for studying polyploid diversification andthe dead end hypothesis: a reply to Soltis et al. (2014),” New Phytologist,vol. 206, no. 1, pp. 27–35, 2015.

[11] E. A. Kellogg, “Has the connection between polyploidy and diversificationactually been tested?,” Current Opinion in Plant Biology, vol. 30, pp. 25–32,2016.

[12] E. Mayr, Systematics and the origin of species, from the viewpoint of azoologist. Harvard University Press, 1942.

[13] M. L. Arnold, Natural hybridization and evolution. Oxford University Press,1997.

[14] J. Mallet, “Hybridization as an invasion of the genome,” Trends in Ecology &Evolution, vol. 20, no. 5, pp. 229–237, 2005.

[15] J. Mallet, “Hybrid speciation,” Nature, vol. 446, no. 7133, pp. 279–283, 2007.[16] J. A. Coyne and H. A. Orr, Speciation. Sinauer Associates, Inc, 2004.[17] I. Schön, K. Martens, and P. van Dijk, “Lost sex,” The evolutionary biology of

parthenogenesis, 2009.

43

[18] B. C. Barringer, “Polyploidy and self-fertilization in flowering plants,”American Journal of Botany, vol. 94, no. 9, pp. 1527–1533, 2007.

[19] B. C. Husband, B. Ozimec, S. L. Martin, and L. Pollock, “Matingconsequences of polyploid evolution in flowering plants: current trends andinsights from synthetic polyploids,” International Journal of Plant Sciences,vol. 169, no. 1, pp. 195–206, 2008.

[20] J. F. Wendel, “Genome evolution in polyploids,” in Plant molecular evolution,pp. 225–249, Springer, 2000.

[21] S. Ohno, Evolution by Gene Duplication. Springer, 1970.[22] G. Blanc and K. H. Wolfe, “Functional divergence of duplicated genes formed

by polyploidy during Arabidopsis evolution,” The Plant Cell, vol. 16, no. 7,pp. 1679–1691, 2004.

[23] K. L. Adams, R. Percifield, and J. F. Wendel, “Organ-specific silencing ofduplicated genes in a newly synthesized cotton allotetraploid,” Genetics,vol. 168, no. 4, pp. 2217–2226, 2004.

[24] W. Albertin, T. Balliau, P. Brabant, A.-M. Chevre, F. Eber, C. Malosse, andH. Thiellement, “Numerous and rapid nonstochastic modifications of geneproducts in newly synthesized Brassica napus allotetraploids,” Genetics,vol. 173, no. 2, pp. 1101–1113, 2006.

[25] I. S. Pearse, T. Krügel, and I. T. Baldwin, “Innovation in anti-herbivore defensesystems during neopolypoloidy – the functional consequences of instantaneousspeciation,” The Plant Journal, vol. 47, no. 2, pp. 196–210, 2006.

[26] S. P. Otto and J. Whitton, “Polyploid incidence and evolution,” Annual Reviewof Genetics, vol. 34, no. 1, pp. 401–437, 2000.

[27] L. H. Rieseberg, O. Raymond, D. M. Rosenthal, Z. Lai, K. Livingstone,T. Nakazato, J. L. Durphy, A. E. Schwarzbach, L. A. Donovan, and C. Lexer,“Major ecological transitions in wild sunflowers facilitated by hybridization,”Science, vol. 301, no. 5637, pp. 1211–1216, 2003.

[28] D. Schwarz, K. D. Shoemaker, N. L. Botteri, and B. A. McPheron, “A novelpreference for an invasive plant as a mechanism for animal hybrid speciation,”Evolution, vol. 61, no. 2, pp. 245–256, 2007.

[29] A. W. Nolte, J. Freyhof, K. C. Stemshorn, and D. Tautz, “An invasive lineageof sculpins, Cottus sp.(Pisces, Teleostei) in the rhine with new habitatadaptations has originated from hybridization between old phylogeographicgroups,” Proceedings of the Royal Society of London B: Biological Sciences,vol. 272, no. 1579, pp. 2379–2387, 2005.

[30] Z. Gompert, J. A. Fordyce, M. L. Forister, A. M. Shapiro, and C. C. Nice,“Homoploid hybrid speciation in an extreme habitat,” Science, vol. 314,no. 5807, pp. 1923–1925, 2006.

[31] J. Mavárez, C. A. Salazar, E. Bermingham, C. Salcedo, C. D. Jiggins, andM. Linares, “Speciation by hybridization in Heliconius butterflies,” Nature,vol. 441, no. 7095, pp. 868–871, 2006.

[32] J. M. Kreiner, P. Kron, and B. C. Husband, “Evolutionary dynamics ofunreduced gametes.,” Trends In Genetics, vol. 33, no. 9, pp. 583–593, 2017.

[33] J. M. Kreiner, P. Kron, and B. C. Husband, “Frequency and maintenance ofunreduced gametes in natural plant populations: associations withreproductive mode, life history and genome size.,” The New Phytologist,

44

vol. 214, no. 2, pp. 879–889, 2017.[34] M. Upcott, “The nature of tetraploidy in Primula kewensis,” Journal of

Genetics, vol. 39, pp. 79–100, Nov 1939.[35] B. Astaurov, “Experimental polyploidy in animals,” Annual Review of

Genetics, vol. 3, no. 1, pp. 99–126, 1969.[36] W. Parrott, R. Smith, and M. Smith, “Bilateral sexual tetraploidization in red

clover,” Canadian journal of genetics and cytology, vol. 27, no. 1, pp. 64–68,1985.

[37] J. E. Werner and S. J. Peloquin, “Yield and tuber characteristics of 4x progenyfrom 2x×2x crosses,” Potato Research, vol. 34, no. 3, pp. 261–267, 1991.

[38] H. Ochman, J. G. Lawrence, and E. A. Groisman, “Lateral gene transfer andthe nature of bacterial innovation,” Nature, vol. 405, no. 6784, pp. 299–304,2000.

[39] K. D. Whitney, R. A. Randell, and L. H. Rieseberg, “Adaptive introgression ofherbivore resistance traits in the weedy sunflower Helianthus annuus,” TheAmerican Naturalist, vol. 167, no. 6, pp. 794–807, 2006.

[40] M. Weill, F. Chandre, C. Brengues, S. Manguin, M. Akogbeto, N. Pasteur,P. Guillet, and M. Raymond, “The kdr mutation occurs in the Mopti form ofAnopheles gambiaes. s. through introgression,” Insect Molecular Biology,vol. 9, no. 5, pp. 451–455, 2000.

[41] K. K. Dasmahapatra, J. R. Walters, A. D. Briscoe, J. W. Davey, A. Whibley,N. J. Nadeau, A. V. Zimin, D. S. Hughes, L. C. Ferguson, S. H. Martin, et al.,“Butterfly genome reveals promiscuous exchange of mimicry adaptationsamong species,” Nature, vol. 487, no. 7405, p. 94, 2012.

[42] P. R. Grant and B. R. Grant, 40 Years of Evolution: Darwin’s Finches onDaphne Major Island. Princeton University Press, 2014.

[43] F. Racimo, S. Sankararaman, R. Nielsen, and E. Huerta-Sánchez, “Evidencefor archaic adaptive introgression in humans,” Nature Reviews Genetics,vol. 16, no. 6, p. 359, 2015.

[44] L. Comai, “The advantages and disadvantages of being polyploid,” NatureReviews Genetics, vol. 6, no. 11, pp. 836–846, 2005.

[45] J. A. Tate, P. Joshi, K. A. Soltis, P. S. Soltis, and D. E. Soltis, “On the road todiploidization? Homoeolog loss in independently formed populations of theallopolyploid Tragopogon miscellus (Asteraceae),” BMC Plant Biology, vol. 9,no. 1, p. 80, 2009.

[46] J. C. Schnable, X. Wang, J. C. Pires, and M. Freeling, “Escape frompreferential retention following repeated whole genome duplications inplants,” Frontiers in plant science, vol. 3, p. 94, 2012.

[47] R. De Smet, K. L. Adams, K. Vandepoele, M. C. Van Montagu, S. Maere, andY. Van de Peer, “Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants,” Proceedings ofthe National Academy of Sciences, vol. 110, no. 8, pp. 2898–2903, 2013.

[48] A. M. Session, Y. Uno, T. Kwon, J. A. Chapman, A. Toyoda, S. Takahashi,A. Fukui, A. Hikosaka, A. Suzuki, M. Kondo, et al., “Genome evolution in theallotetraploid frog Xenopus laevis,” Nature, vol. 538, no. 7625, pp. 336–343,2016.

[49] H. B. D. Kettlewell, “Selection experiments on industrial melanism in the

45

lepidoptera,” Heredity, vol. 9, no. 3, p. 323, 1955.[50] L. Cook, B. Grant, I. Saccheri, and J. Mallet, “Selective bird predation on the

peppered moth: the last experiment of Michael Majerus,” Biology Letters,vol. 8, no. 4, pp. 609–612, 2012.

[51] J. B. S. Haldane, “A mathematical theory of natural and artificial selection,part V: selection and mutation,” inMathematical Proceedings of theCambridge Philosophical Society, vol. 23, pp. 838–844, Cambridge UniversityPress, 1927.

[52] J. Masel, “Genetic drift,” Current Biology, vol. 21, no. 20, pp. R837–R838,2011.

[53] M. Kimura et al., “Evolutionary rate at the molecular level,” Nature, vol. 217,no. 5129, pp. 624–626, 1968.

[54] M. W. Hahn, “Toward a selection theory of molecular evolution,” Evolution,vol. 62, no. 2, pp. 255–265, 2008.

[55] A. Wagner, “Neutralism and selectionism: a network-based reconciliation,”Nature Reviews Genetics, vol. 9, no. 12, pp. 965–974, 2008.

[56] M. Nei, Y. Suzuki, and M. Nozawa, “The neutral theory of molecularevolution in the genomic era,” Annual Review of Genomics and HumanGenetics, vol. 11, pp. 265–289, 2010.

[57] J. J. Vitti, S. R. Grossman, and P. C. Sabeti, “Detecting natural selection ingenomic data,” Annual Review of Genetics, vol. 47, pp. 97–120, 2013.

[58] S. Wright, “Evolution in Mendelian populations,” Genetics, vol. 16, no. 2,pp. 97–159, 1931.

[59] F. Tajima, “Statistical method for testing the neutral mutation hypothesis byDNA polymorphism.,” Genetics, vol. 123, no. 3, pp. 585–595, 1989.

[60] G. Watterson, “On the number of segregating sites in genetical models withoutrecombination,” Theoretical Population Biology, vol. 7, no. 2, pp. 256–276,1975.

[61] V. Grant, Plant speciation. New York: Columbia University Press, 1981.[62] G. N. Stone, S. Nee, and J. Felsenstein, “Controlling for non-independence in

comparative analysis of patterns across populations within species,”Philosophical Transactions of the Royal Society of London B: BiologicalSciences, vol. 366, no. 1569, pp. 1410–1424, 2011.

[63] B. Devlin and K. Roeder, “Genomic control for association studies,”Biometrics, vol. 55, no. 4, pp. 997–1004, 1999.

[64] J. K. Pritchard and P. Donnelly, “Case-control studies of association instructured or admixed populations,” Theoretical Population Biology, vol. 60,no. 3, pp. 227–237, 2001.

[65] J. R. Andersen, T. Schrag, A. E. Melchinger, I. Zein, and T. Lübberstedt,“Validation of Dwarf8 polymorphisms associated with flowering time in eliteEuropean inbred lines of maize (Zea mays l.),” Theoretical and AppliedGenetics, vol. 111, no. 2, pp. 206–217, 2005.

[66] L. Camus-Kulandaivelu, J.-B. Veyrieras, D. Madur, V. Combes, M. Fourmann,S. Barraud, P. Dubreuil, B. Gouesnard, D. Manicacci, and A. Charcosset,“Maize adaptation to temperate climate: relationship between populationstructure and polymorphism in the Dwarf8 gene,” Genetics, vol. 172, no. 4,pp. 2449–2463, 2006.

46

[67] D. Kryvokhyzha, “Whole genome resequencing of Heliconius butterfliesrevolutionizes our view of the level of admixture between species,” Master’sthesis, Uppsala University, 2014.

[68] Y. Song, S. Endepols, N. Klemann, D. Richter, F.-R. Matuschka, C.-H. Shih,M. W. Nachman, and M. H. Kohn, “Adaptive introgression of anticoagulantrodent poison resistance by hybridization between old world mice,” CurrentBiology, vol. 21, no. 15, pp. 1296–1301, 2011.

[69] I. W. G. S. Consortium et al., “A chromosome-based draft sequence of thehexaploid bread wheat (Triticum aestivum) genome,” Science, vol. 345,no. 6194, p. 1251788, 2014.

[70] A. V. Zimin, D. Puiu, R. Hall, S. Kingan, B. J. Clavijo, and S. L. Salzberg,“The first near-complete assembly of the hexaploid bread wheat genome,Triticum aestivum.,” GigaScience, vol. 6, no. 11, pp. 1–7, 2017.

[71] S. R. Browning and B. L. Browning, “Haplotype phasing: existing methodsand new developments,” Nature Reviews Genetics, vol. 12, no. 10, p. 703,2011.

[72] J. F. Crow, “90 years ago: The beginning of hybrid maize,” Genetics, vol. 148,no. 3, pp. 923–928, 1998.

[73] G. H. Shull, “Elementary species and hybrids of Bursa,” Science, vol. 25,no. 641, p. 590, 1907.

[74] S. E. Hill, “Chromosome numbers in the genus Bursa,” The BiologicalBulletin, vol. 53, no. 6, pp. 413–415, 1927.

[75] G. H. Shull, “Species hybridization among old and new species of shepherd’spurse,” Int Congr Plant Sci., no. 1, pp. 837–888, 1929.

[76] H. Hurka, N. Friesen, D. A. German, A. Franzke, and B. Neuffer, “‘Missinglink’ species Capsella orientalis and Capsella thracica elucidate evolution ofmodel plant genus Capsella (Brassicaceae),”Molecular Ecology, vol. 21,no. 5, pp. 1223–1238, 2012.

[77] M. Coquillat, “Sur les plantes les plus communes a la surface du globe,”Bulletin mensuel de la Société linnéenne de Lyon, vol. 20, no. 7, pp. 165–170,1951.

[78] G. M. Douglas, G. Gos, K. A. Steige, A. Salcedo, K. Holm, E. B. Josephs,R. Arunkumar, J. A. Ågren, K. M. Hazzouri, W. Wang, et al., “Hybrid originsand the earliest stages of diploidization in the highly successful recentpolyploid Capsella bursa-pastoris,” Proceedings of the National Academy ofSciences, vol. 112, no. 9, pp. 2806–2811, 2015.

[79] D. Kryvokhyzha, A. Salcedo, M. C. Eriksson, T. Duan, N. Tawari, J. Chen,M. Guerrina, J. M. Kreiner, T. V. Kent, U. Lagercrantz, et al., “Parental legacy,demography, and introgression influenced the evolution of the twosubgenomes of the tetraploid Capsella bursa-pastoris (Brassicaceae),”bioRxiv, p. 234096, 2017.

[80] H. Hurka, S. Freundner, A. H. Brown, and U. Plantholt, “Aspartateaminotransferase isozymes in the genus Capsella (Brassicaceae): Subcellularlocation, gene duplication, and polymorphism,” Biochemical genetics, vol. 27,no. 1, pp. 77–90, 1989.

[81] T. Slotte, K. M. Hazzouri, J. A. Ågren, D. Koenig, F. Maumus, Y.-L. Guo,K. Steige, A. E. Platts, J. S. Escobar, L. K. Newman, et al., “The Capsella

47

rubella genome and the genomic consequences of rapid mating systemevolution,” Nature Genetics, vol. 45, no. 7, pp. 831–835, 2013.

[82] M. A. Koch, B. Haubold, and T. Mitchell-Olds, “Comparative evolutionaryanalysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis,Arabis, and related genera (Brassicaceae),”Molecular Biology and Evolution,vol. 17, no. 10, pp. 1483–1498, 2000.

[83] M. Koch, B. Haubold, and T. Mitchell-Olds, “Molecular systematics of theBrassicaceae: evidence from coding plastidic matK and nuclear Chssequences,” American Journal of Botany, vol. 88, no. 3, pp. 534–544, 2001.

[84] A. Sicard, N. Stacey, K. Hermann, J. Dessoly, B. Neuffer, I. Bäurle, andM. Lenhard, “Genetics, evolution, and adaptive significance of the selfingsyndrome in the genus Capsella,” The Plant Cell, vol. 23, no. 9,pp. 3156–3171, 2011.

[85] A. Sicard, C. Kappel, Y. W. Lee, N. J. Woźniak, C. Marona, J. R. Stinchcombe,S. I. Wright, and M. Lenhard, “Standing genetic variation in a tissue-specificenhancer underlies selfing-syndrome evolution in Capsella,” Proceedings ofthe National Academy of Sciences, vol. 113, no. 48, pp. 13911–13916, 2016.

[86] C. A. Rebernig, C. Lafon-Placette, M. R. Hatorangan, T. Slotte, and C. Köhler,“Non-reciprocal interspecies hybridization barriers in the Capsella genus areestablished in the endosperm,” PLoS Genetics, vol. 11, no. 6, p. e1005295,2015.

[87] A. Sicard, C. Kappel, E. B. Josephs, Y. W. Lee, C. Marona, J. R. Stinchcombe,S. I. Wright, and M. Lenhard, “Divergent sorting of a balanced ancestralpolymorphism underlies the establishment of gene-flow barriers in Capsella,”Nature Communications, vol. 6, 2015.

[88] S. Petrone, M. Lascoux, and S. Glémin, “Competitive ability of Capsellaspecies with different mating systems and ploidy levels,” Annals of Botany,p. in press, 2018.

[89] X. Yang, M. Lascoux, and S. Glemin, “Competitive ability of a tetraploidselfing species (Capsella bursa-pastoris) across its expansion range andcomparison with its sister species,” bioRxiv, p. 214866, 2017.

[90] H. Hurka and B. Neuffer, “Evolutionary processes in the genus Capsella(Brassicaceae),” Plant Systematics and Evolution, vol. 206, no. 1-4,pp. 295–316, 1997.

[91] T. Slotte, K. Holm, L. M. McIntyre, U. Lagercrantz, and M. Lascoux,“Differential expression of genes important for adaptation in Capsellabursa-pastoris (Brassicaceae),” Plant physiology, vol. 145, no. 1, pp. 160–173,2007.

[92] T. Slotte, H.-R. Huang, K. Holm, A. Ceplitis, K. S. Onge, J. Chen,U. Lagercrantz, and M. Lascoux, “Splicing variation at a FLOWERINGLOCUS C homeolog is associated with flowering time variation in thetetraploid Capsella bursa-pastoris,” Genetics, vol. 183, no. 1, pp. 337–345,2009.

[93] H.-R. Huang, P.-C. Yan, M. Lascoux, and X.-J. Ge, “Flowering time andtranscriptome variation in Capsella bursa-pastoris (Brassicaceae),” NewPhytologist, vol. 194, no. 3, pp. 676–689, 2012.

[94] D. Kryvokhyzha, K. Holm, J. Chen, A. Cornille, S. Glémin, S. I. Wright,

48

U. Lagercrantz, and M. Lascoux, “The influence of population structure ongene expression and flowering time variation in the ubiquitous weed Capsellabursa-pastoris (Brassicaceae),”Molecular Ecology, vol. 25, no. 5,pp. 1106–1121, 2016.

[95] T. Slotte, H. Huang, M. Lascoux, and A. Ceplitis, “Polyploid speciation didnot confer instant reproductive isolation in Capsella (Brassicaceae),”Molecular Biology and Evolution, vol. 25, no. 7, pp. 1472–1481, 2008.

[96] A. Cornille, A. Salcedo, D. Kryvokhyzha, S. Glémin, K. Holm, S. I. Wright,and M. Lascoux, “Genomic signature of successful colonization of eurasia bythe allopolyploid shepherd’s purse (Capsella bursa-pastoris),”MolecularEcology, vol. 25, no. 2, pp. 616–629, 2016.

[97] C. Parisod, “Polyploids integrate genomic changes and ecological shifts,” NewPhytologist, vol. 193, no. 2, pp. 297–300, 2012.

[98] R. J. Elshire, J. C. Glaubitz, Q. Sun, J. A. Poland, K. Kawamoto, E. S. Buckler,and S. E. Mitchell, “A robust, simple genotyping-by-sequencing (GBS)approach for high diversity species,” PloS one, vol. 6, no. 5, p. e19379, 2011.

[99] D. H. Alexander, J. Novembre, and K. Lange, “Fast model-based estimation ofancestry in unrelated individuals,” Genome research, vol. 19, no. 9,pp. 1655–1664, 2009.

[100] M. A. Beaumont, W. Zhang, and D. J. Balding, “Approximate Bayesiancomputation in population genetics.,” Genetics, vol. 162, no. 4,pp. 2025–2035, 2002.

[101] B. S. Weir and C. C. Cockerham, “Estimating F-statistics for the analysis ofpopulation structure,” evolution, vol. 38, no. 6, pp. 1358–1370, 1984.

[102] M. A. Beaumont, “Approximate Bayesian Computation in Evolution andEcology,” dx.doi.org, 2010.

[103] C. Roux and J. R. Pannell, “Inferring the mode of origin of polyploid speciesfrom next-generation sequence data,”Molecular ecology, vol. 24, no. 5,pp. 1047–1059, 2015.

[104] A. Ceplitis, Y. Su, and M. Lascoux, “Bayesian inference of evolutionaryhistory from chloroplast microsatellites in the cosmopolitan weed Capsellabursa-pastoris (Brassicaceae),”Molecular ecology, vol. 14, no. 14,pp. 4221–4233, 2005.

[105] P. Khaitovich, G. Weiss, M. Lachmann, I. Hellmann, W. Enard, B. Muetzel,U. Wirkner, W. Ansorge, and S. Pääbo, “A neutral model of transcriptomeevolution,” PLoS Biology, vol. 2, no. 5, p. e132, 2004.

[106] M. Broadley, P. White, J. Hammond, N. Graham, H. Bowen, Z. Emmerson,R. Fray, P. Iannetta, J. McNicol, and S. May, “Evidence of neutraltranscriptome evolution in plants,” New Phytologist, vol. 180, no. 3,pp. 587–593, 2008.

[107] A. Whitehead and D. L. Crawford, “Neutral and adaptive variation in geneexpression,” Proceedings of the National Academy of Sciences, vol. 103,no. 14, pp. 5425–5430, 2006.

[108] P. Suárez-López, K. Wheatley, F. Robson, H. Onouchi, F. Valverde, andG. Coupland, “CONSTANS mediates between the circadian clock and thecontrol of flowering in Arabidopsis,” Nature, vol. 410, no. 6832, p. 1116, 2001.

[109] F. Valverde, A. Mouradov, W. Soppe, D. Ravenscroft, A. Samach, and

49

G. Coupland, “Photoreceptor regulation of constans protein in photoperiodicflowering,” Science, vol. 303, no. 5660, pp. 1003–1006, 2004.

[110] R. J. Hijmans, S. E. Cameron, J. L. Parra, P. G. Jones, and A. Jarvis, “Veryhigh resolution interpolated climate surfaces for global land areas,”International Journal of Climatology, vol. 25, no. 15, pp. 1965–1978, 2005.

[111] A. Trabucco and R. J. Zomer, “Global aridity index (global-aridity) and globalpotential evapo-transpiration (global-PET) geospatial database,” CGIARConsortium for Spatial Information, 2009.

[112] P. Pavlidis, J. D. Jensen, W. Stephan, and A. Stamatakis, “A critical assessmentof storytelling: gene ontology categories and the importance of validatinggenomic scans,” Molecular Biology and Evolution, vol. 29, no. 10,pp. 3237–3248, 2012.

[113] C. D. Huber, M. Nordborg, J. Hermisson, and I. Hellmann, “Keeping it local:Evidence for positive selection in Swedish Arabidopsis thaliana,” MolecularBiology and Evolution, vol. 31, no. 11, pp. 3026–3039, 2014.

[114] B. F. Voight, S. Kudaravalli, X. Wen, and J. K. Pritchard, “A map of recentpositive selection in the human genome,” PLoS Biology, vol. 4, no. 3, p. e72,2006.

[115] S. Peischl, I. Dupanloup, L. Bosshard, and L. Excoffier, “Genetic surfing inhuman populations: from genes to genomes,” Current Opinion in Genetics &Development, vol. 41, pp. 53–61, 2016.

[116] K. J. Gilbert, N. P. Sharp, A. L. Angert, G. L. Conte, J. A. Draghi,F. Guillaume, A. L. Hargreaves, R. Matthey-Doret, and M. C. Whitlock,“Local adaptation interacts with expansion load during range expansion:Maladaptation reduces expansion load,” The American Naturalist, vol. 189,no. 4, pp. 368–380, 2017.

[117] K. L. Adams, R. Cronn, R. Percifield, and J. F. Wendel, “Genes duplicated bypolyploidy show unequal contributions to the transcriptome and organ-specificreciprocal silencing,” Proceedings of the National Academy of Sciences,vol. 100, no. 8, pp. 4649–4654, 2003.

[118] R. J. Buggs, L. Zhang, N. Miles, J. A. Tate, L. Gao, W. Wei, P. S. Schnable,W. B. Barbazuk, P. S. Soltis, and D. E. Soltis, “Transcriptomic shock generatesevolutionary novelty in a newly formed, natural allopolyploid plant,” CurrentBiology, vol. 21, no. 7, pp. 551–556, 2011.

[119] R. J. Buggs, S. Chamala, W. Wu, J. A. Tate, P. S. Schnable, D. E. Soltis, P. S.Soltis, and W. B. Barbazuk, “Rapid, repeated, and clustered loss of duplicategenes in allopolyploid plant populations of independent origin,” CurrentBiology, vol. 22, no. 3, pp. 248–252, 2012.

[120] M.-J. Yoo, X. Liu, J. C. Pires, P. S. Soltis, and D. E. Soltis, “Nonadditive geneexpression in polyploids,” Annual Review of Genetics, vol. 48, pp. 485–517,2014.

[121] M. Pfeifer, K. G. Kugler, S. R. Sandve, B. Zhan, H. Rudi, T. R. Hvidsten, K. F.Mayer, O.-A. Olsen, I. W. G. S. Consortium, et al., “Genome interplay in thegrain transcriptome of hexaploid bread wheat,” Science, vol. 345, no. 6194,p. 1250091, 2014.

[122] P. J. Wittkopp, B. K. Haerum, and A. G. Clark, “Evolutionary changes in cisand trans gene regulation,” Nature, vol. 430, no. 6995, p. 85, 2004.

50

[123] M.-C. Combes, A. Dereeper, D. Severac, B. Bertrand, and P. Lashermes,“Contribution of subgenomes to the transcriptome and their intertwinedregulation in the allopolyploid Coffea arabica grown at contrastedtemperatures.,” The New Phytologist, vol. 200, no. 1, pp. 251–260, 2013.

51

Acta Universitatis UpsaliensisDigital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1632

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science andTechnology, Uppsala University, is usually a summary of anumber of papers. A few copies of the complete dissertationare kept at major Swedish research libraries, while thesummary alone is distributed internationally throughthe series Digital Comprehensive Summaries of UppsalaDissertations from the Faculty of Science and Technology.(Prior to January, 2005, the series was published under thetitle “Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology”.)

Distribution: publications.uu.seurn:nbn:se:uu:diva-341709

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2018