Exploring genomic diversity in wild and
cultivated barley using high throughput DNA
sequencing technology
By Cong Tan
This thesis was presented for the degree of Doctor of Philosophy of
Murdoch University
February 2018
Declaration
I declare that this thesis is my own account of my research and contains as its main
content work which has not previously been submitted for a degree at any tertiary
education institution.
26/02/2018
Acknowledgements
The completion of this thesis could not have succeeded without the support and help
from several institutions and numerous individuals.
My research was funded by a Murdoch University Strategic Scholarship. I thank
Murdoch University for providing the facilities, supporting my research and giving
me such an opportunity to pursue my research career.
I acknowledge and thank the Grains Research and Development Corporation for their
grant supporting my research work. Without this financial support it would have
been impossible to perform all the work in my thesis.
I wish to express my sincere gratitude to my supervisor Professor Chengdao Li, for
his unfailing encouragement, generosity and patient guidance. Without his support it
would not have been possible, for me, to overcome the challenges of pursuing my
research career in Australia. I am grateful to him for giving me the invaluable
opportunity to join the barley group (Western Barley Genetics Alliance) and
including me in the community of barley research. I will never forget the moments
when he patiently explained the details of my research which I needed to pay
attention to. He changed my life and brought me into the holy place of science.
I appreciate Professor Matthew Bellgard, Director, Centre for Comparative
Genomics for his support and guidance in bioinformatic analysis and providing
computing facilities for my research. I thank him for giving me the opportunity to
join the Centre for Comparative Genomics. I thank Dr Roberto Barrero, Dr Kathryn
Napier, Dr Brett Chapman and Dr Mark Black for their kind help and patient
guidance.
I appreciate the support and help from my colleagues in the barley group, including Dr
Qisen Zhang, Dr Gaofeng Zhou, Mrs Ruth Abbott, Ms Sharon Westcott, Dr Tefera
Angessa, Dr Hao Zhou, Dr Yong Jia, Dr Yong Han and Dr Camilla Hill. I thank Dr
Qisen Zhang for the time he took to correct my spoken English and revise my
manuscripts. To Dr Gaofeng Zhou, thank you for your guidance during my research
and for giving me a hand whenever I was in difficulty. Thank you to Ruth for helping
me order “stuff” needed for my research and prepare documents for my visa application
and graduation. Thank you to Sharon, Tefera, Hao, Yong Jia and Yong Han for your
kindness and patience in teaching me experimental techniques and for your excellent
suggestions whenever I turned to you for help. Thanks, Camilla, for your suggestions
throughout my research and the time taken to revise my manuscripts.
Thank you to Professor Guoping Zhang, Professor Fei Dai, Dr Xiaolei Wang from
Zhejiang University for their collaboration and contribution to the RNA sequencing
of wild barley. I thank Professor Eviatar Nevo for providing seeds of wild barleys
collected from the “Evolution Canyon” and revising the manuscript of Chapter 3.
I thank Professor Zujun Yang, Guangrong Li, Bin Li and Tao Lang from University
of Electronic Science and Technology of China for their contribution to the
experiment “Fluorescence in situ hybridization” in Chapter 3. Thank you to Beijing
Genomic Institute (Shenzhen) for their excellent service in DNA sequencing.
I thank Murdoch University staff Mr Dale Banks, Ms Julie Blake, Ms Jacqueline
Dyer and Ms Amy Sheppard Moeen for their kind help and support during my
research.
Australia is the first foreign country that my wife and I have travelled to. It was a big
challenge to settle down and adapt to life abroad. It was Dr Xiao-Qi Zhang who put a
lot of time and effort into helping us find accommodation, buy living necessities and
settle in to the Australian way of life. We will never forget her compassionate care
like a mother, especially in the first several months. We were so moved when she
sent us warm dumplings and a pair of pillows once she knew we were having some
difficulties. In our first six months she drove us to buy our food every week and
sometimes taught us how to cook. Her kindness and care allowed us to quickly adapt
to life abroad and enabled me to focus on my study and research.
Of course, this thesis could not have been undertaken without the strong support
from my family. I appreciate and am grateful to my parents for their love,
commitment and effort in supporting my studies and research for so many years.
To my wife Ping Zhou, thank you for your unstinting support, love, care and being
there for me during the past four years. For giving me such a lovely daughter
Yutong Tan and carrying almost all the responsibility in taking care of her when
things were difficult. To my daughter Yutong Tan thank you for bringing me such
happiness. Thank you to my sisters and young brother for their love and support
during my studies abroad.
Finally, I thank Ms Christine Davies for editing and proofreading the
literature review part of this thesis.
Abstract
Barley (Hordeum vulgare L.) was domesticated from wild barley (Hordeum
spontaneum L.) around 10,000 years ago in the Fertile Crescent. It has been widely
used as a model organism to research plant adaptive evolutionary processes as well
as a vital genetic repository to explore plant abiotic and biotic stress tolerance. This
thesis used the recent barley genome sequence as a reference to characterise the
genomic evolution of wild barley, explore genomic diversity, and create a database
of barley genomic variation.
Wild barley accessions from opposite slopes of the ‘Evolution Canyon’ in Israel,
separated by only 250 m but with drastically different microclimates, were
sequenced at the transcriptome and whole-genome levels. Sympatric speciation was
identified between the wild barleys from the two slopes, and the distance between
them was even greater than that between cultivated barley and Tibetan wild barley.
About 243 Mb of genomic regions were identified as being under environmental selection. The
number of upregulated genes from the south-facing slope wild barley was about eight
times that of the north-facing slope under drought stress.
To explore genomic diversity in cultivated barley germplasm from Australia,
whole-genome sequencing was conducted on eleven cultivated barley varieties. In total,
29,550,641 single nucleotide polymorphisms (SNPs) and 1,592,034 short insertions
and deletions (InDels) were identified. These SNPs and InDels can be used to
develop Kompetitive Allele Specific PCR (KASP) and InDel markers for genome-
assisted breeding.
To facilitate access to and use of the genomic resources exploited in this study, a
database of barley genomic variations, BarleyVarDB, was created. This database
provides three main sets of information: (1) SNPs and InDels, (2) barley genome
reference and its annotation, and (3) 522,212 genome-wide simple sequence
repeats (SSRs).
Using the resources developed in this study, I characterised the mutation pattern of a
barley mutant derived from gamma radiation. The results indicate that mutations
induced by gamma radiation are not evenly distributed in the whole genome but
occur in several concentrated regions in chromosomes. In addition, I narrowed down the
candidate genes for a high-tillering mutant phenotype from 139 to 15 by comparing
sequence differences in the 10 Mb target region between a pair of near isogenic lines
using transcriptome sequencing.
List of abbreviations
Abbr. Full name
BAC Bacterial Artificial Chromosome
cDNA Complementary DNA
ChIP-seq Chromatin Immunoprecipitation Sequencing
DArT Diversity Arrays Technology
dNTP Deoxyribonucleoside Triphosphates
EC The Evolution Canyon, in Israel
EST Expressed Sequence Tag
FAO Food and Agriculture Organization
Fst Wright's Fixation Index
InDel Small Fragment Insertion and Deletion
KASP Kompetitive Allele Specific PCR
LD Linkage Disequilibrium
N50 the Shortest Sequence Length At 50% Of the Genome
NCBI National Center for Biotechnology Information
NFS European North-Facing Slope of Evolution Canyon
NGS Next-generation sequencing
NIL Near Isogenic Line
NT Nucleotide-NCBI
PCR Polymerase Chain Reaction
QTL Quantitative Trait Loci
RFLP Restriction Fragment Length Polymorphism
RIL Recombinant Inbred Line
RNA-seq RNA sequencing
SFS African South-Facing Slope of Evolution Canyon
SNP Single Nucleotide Polymorphism
SSR Simple Sequence Repeat
UN The United Nations
Contents
Declaration ........................................................................................................... I
Acknowledgements ............................................................................................. II
Abstract ............................................................................................................. VI
List of abbreviations ....................................................................................... VIII
List of figures ................................................................................................... XIV
List of tables .................................................................................................. XVII
Chapter 1 Introduction .................................................................................. 1
Chapter 2 Literature review .......................................................................... 4
2.1 Barley genomic research ..........................................................................5
2.1.1 Barley domestication and evolution ......................................................6
2.1.2 Constructing the barley reference genome ............................................7
2.1.3 Molecular markers and genotyping .......................................................9
2.1.4 Linkage mapping ............................................................................... 12
2.1.5 Association mapping.......................................................................... 14
2.2 DNA sequencing technology and application.......................................... 17
2.2.1 DNA sequencing technology .............................................................. 18
2.2.2 Constructing reference genome .......................................................... 26
2.2.3 Detecting genetic variants .................................................................. 29
2.2.4 RNA-seq for analysing gene expression level ..................................... 30
2.2.5 ChIP-seq determining protein binding sites ......................................... 32
2.2.6 Bisulfite sequencing for detecting methylation .................................... 33
Chapter 3 Sympatric speciation and adaptive evolution of wild barley in
“Evolution Canyon” .......................................................................................... 36
3.1 Introduction........................................................................................... 38
3.2 Methods ................................................................................................ 41
3.3 Result.................................................................................................... 47
3.3.1 Dramatic divergence between wild barleys from SFS and NFS............ 47
3.3.2 Bigger genomic divergence between SFS and NFS wild barleys than that
between Tibetan wild barleys and domesticated barleys.................................. 49
3.3.3 Genomic regions under environmental selection of NFS and SFS ........ 56
3.3.4 Differential drought stress response of SFS and NFS wild barleys ....... 61
3.4 Discussion............................................................................................. 73
Chapter 4 Genomic diversity of cultivated barley ....................................... 81
4.1 Introduction........................................................................................... 82
4.2 Methods ................................................................................................ 84
4.3 Results .................................................................................................. 88
4.3.1 Whole genome sequencing of eleven cultivated barleys ...................... 88
4.3.2 SNP and InDel variants in eleven cultivated barleys ............................ 90
4.3.3 Evolutionary relationship analysis ...................................................... 91
4.3.4 Genetic divergence between Australian cultivated barley and EC wild
barley 92
4.4 Discussion............................................................................................. 96
Chapter 5 BarleyVarDB, a database of barley genomic variation............... 99
5.1 Introduction......................................................................................... 101
5.2 Database construction .......................................................................... 103
5.3 Database description ............................................................................ 108
5.4 Future improvement ............................................................................ 113
Chapter 6 Characterization of genome-wide variations induced by gamma-
ray radiation in barley using RNAseq ............................................................. 114
6.1 Introduction......................................................................................... 116
6.2 Methods .............................................................................................. 119
6.3 Results ................................................................................................ 121
6.3.1 SNPs and InDels screening through transcriptome sequencing .......... 121
6.3.2 Genetic mutations induced by gamma ray radiation........................... 123
6.3.3 Genes affected by mutations............................................................. 124
6.4 Discussion........................................................................................... 128
Chapter 7 Characterization of a barley mutant with high-tillering and
narrow leaf 132
7.1 Introduction......................................................................................... 133
7.2 Methods .............................................................................................. 135
7.3 Result.................................................................................................. 139
7.3.1 Phenotype of BW370 and Bowman .................................................. 139
7.3.2 Narrow down candidate genes by RNAseq ....................................... 141
7.4 Discussion........................................................................................... 144
Reference ......................................................................................................... 146
Appendix 1 - Bioinformatic Pipeline ............................................................... 179
Appendix 2 - List of publications..................................................................... 187
Appendix 3- One manuscript........................................................................... 189
List of figures
Figure 1-1 Thesis structure......................................................................................2
Figure 2-1 Main content of literature review. ...........................................................4
Figure 2-2 DNA sequencing technology and application. ....................................... 18
Figure 2-3 Diagrams of chain termination sequencing. ........................................... 19
Figure 2-4 Illumina/Solexa sequencing (Mardis, 2008) .......................................... 21
Figure 2-5 PACBIO SMRT technology. ................................................................ 23
Figure 2-6 Distribution of read length in SMRT sequencing (from PACBIO
website) ............................................................................................................... 23
Figure 2-7 Schematic diagram of DNA methylation sequencing (Tollefsbol, 2011). 34
Figure 3-1 Cross-section view and Sample collection sites of the “Evolution Canyon”
I (EC-I), Lower Nahal Oren Mount Carmel. .......................................................... 39
Figure 3-2 Diagram of methods and aims .............................................................. 41
Figure 3-3 Genomic diversity and evolution analysis based on RNA sequencing of
ten samples from two slopes of ‘’Evolution Canyon”. ............................................ 48
Figure 3-4 Genomic diversity and evolution related analysis based on whole genome
sequencing of ten samples from four different groups. ........................................... 53
Figure 3-5 Distribution of FST values across the whole genome. ............................. 57
Figure 3-6 Selection sweep analysis in whole genome. .......................................... 58
Figure 3-7 Statistics of DEGs under drought stress treatment in SFS and NFS. ....... 62
Figure 3-8 Density of DEGs in each 10 Mb region along chromosomes under
drought stress treatment in SFS and NFS. .............................................................. 63
Figure 3-9 Over-represented and under-represented GO terms for differentially
expressed genes in SFS wild barley under drought stress........................................ 64
Figure 3-10 Over-represented and under-represented GO terms for differentially
expressed genes in NFS wild barley under drought stress. ...................................... 65
Figure 3-11 Karyotypical comparison of wild barley NFS6.1 and SFS2.3 and
cultivated barley Golden Promise by FISH. ........................................................... 74
Figure 4-1 Evolutionary relationships of 26 barley accessions. ............................... 92
Figure 4-2 Genomic difference between Australian cultivated barleys and EC wild
barleys. ................................................................................................................ 94
Figure 4-3 GO enrichment of genes under environmental selection. ....................... 95
Figure 5-1 Framework, implementation, and data collection of BarleyVarDB. ...... 104
Figure 5-2 Web interfaces of retrieving SNPs, INDELs, SSRs and gene annotation in
BarleyVarDB. .................................................................................................... 109
Figure 5-3 Examples of software applications in BarleyVarDB. ........................... 112
Figure 6-1 Phenotype of Vla-WT and Vla-MT and distribution of mutations in
chromosome 5 and chromosome 7....................................................................... 122
Figure 6-2 Genetic effects of mutations and biological function of genes with mutant
genes.................................................................................................................. 126
Figure 7-1 The generation of BW370. ................................................................. 136
Figure 7-2 Phenotype of BW370 and Bowman. ................................................... 140
List of tables
Table 2-1 Assembly and annotation statistics of the barley genome (Mascher et al. ,
2017) .....................................................................................................................9
Table 2-2 Performance comparison of dominant sequencers................................... 24
Table 2-3 Genome assembly and gene prediction of major crops and model
organisms............................................................................................................. 27
Table 3-1 Output statistics of whole-genome shotgun sequencing of ten barley
accessions. ........................................................................................................... 51
Table 3-2 Assembly statistics of eight barley accessions. ....................................... 52
Table 3-3 Evolutionary distance for ten EC wild barleys using whole genome
shotgun sequencing .............................................................................................. 55
Table 3-4 Plant reproductive related homology genes in barley .............................. 60
Table 3-5 Summary of DEGs in SFS and NFS wild barley under drought treatment 62
Table 3-6 Homology of plant drought related genes in barley ................................. 67
Table 4-1 Sample and data information ................................................................. 85
Table 4-2 Summary of sequencing output and variants calling ............................... 89
Table 4-3 Pairwise comparison of seven Australian barley varieties ....................... 90
Table 6-1 Genetic mutation counts in each chromosome. ..................................... 124
Table 7-1 Candidate genes and function description............................................. 141
Chapter 1 Introduction
Food security is one of the major global issues. The rapidly expanding demand for
agricultural products comes mainly from a growing population and changing diets.
Increasing food production faces several challenges, including less arable land,
climate change and water scarcity (Godfray et al., 2010). Improving crop yield is
considered a key measure for ensuring food security. Increases in crop yield in the
last century relied on fertiliser use, which has caused severe environmental
pollution, including land degradation. This century, the principal approach for
increasing crop yields is to breed competitive crop varieties with improved grain and
environmental stress tolerance. Barley (Hordeum vulgare L.) is one of the founder
crops of the Old World and is now the fourth most important cereal crop. It was
domesticated from wild barley (Hordeum spontaneum L.) around 10,000 years ago in
the Fertile Crescent. Barley can grow in diverse environments, including extreme
climates such as the Tibetan Plateau. It has been widely used as a model organism to
research plant adaptive evolutionary processes as well as a vital genetic repository
to explore plant abiotic and biotic stress tolerance.
This thesis used the recent barley genome sequence as a reference to characterise
genomic evolution of wild barley, explore genomic diversity, and develop tools for
the management and implementation of barley genomic diversity datasets. This
thesis comprises a literature review (Chapter 2) and five experimental chapters
(Chapters 3–7) (Figure 1-1).
Figure 1-1 Thesis structure
This thesis aimed to:
1. Confirm whether sympatric speciation occurred between wild barleys from
the south-facing and north-facing slopes of the ‘Evolution Canyon’ in Israel
and identify genomic regions and potential genes under environmental
selection.
2. Exploit genomic diversity in eleven cultivated barley varieties and identify
signatures of domestication that distinguish cultivated barley from wild barley.
3. Create a barley genomic variation database for the barley research
community to access barley genome reference information and genomic
variations identified from wild barley and cultivated barley.
4. Characterise the pattern of mutations induced by gamma-ray radiation.
5. Narrow down the causal genes for the high tiller number mutant.
Chapter 2 Literature review
Figure 2-1 Main content of literature review.
2.1 Barley genomic research
Barley is the fourth most important crop in the world, both in terms of grain
production and cultivation area, according to the Food and Agriculture Organization
(FAO) of the United Nations (www.fao.org/faostat). It is the second most important
crop in Australia, where it is widely planted from Western Australia to
southern Queensland. In 2013, about 7.47 million tonnes of barley grain were
harvested from four million hectares in Australia.
Barley provides abundant energy and nutrition for humans and animals. Barley grain
is mostly used as animal fodder, while malting barley is an important raw material
for production of beer and certain distilled beverages. The remainder is used as a
functional food for humans. Barley can grow in diverse environments and even
extreme climates such as the Tibetan Plateau, where barley grain is a staple human
food (Harlan and Zohary, 1966). Barley grains contain the eight essential amino
acids (valine, isoleucine, leucine, threonine, phenylalanine, tryptophan, methionine,
and lysine) that are not produced naturally by the human body. Barley also contains
soluble fibre, which plays an essential role in our digestive, heart and skin health, and
may improve blood sugar control and weight management.
Barley is one of the founder crops of the Old World. It was domesticated from wild
barley (Hordeum spontaneum L.) around 10,000 years ago in the Fertile Crescent
and can grow in diverse environments, including extreme climates such as the Tibetan
Plateau (Harlan and Zohary, 1966). As a result, barley has been widely used as a model
organism to research plant adaptive evolutionary processes as well as a vital genetic
repository to explore plant abiotic and biotic stress tolerance (International Barley
Genome Sequencing et al., 2012).
2.1.1 Barley domestication and evolution
Barley (Hordeum vulgare L. subsp. vulgare) is a member of the grass family. It is
one of the earliest domesticated cereal crops, which supplied essential energy for the
first human civilisation in the Near East Fertile Crescent about 10,000 years ago
(Harlan and Zohary, 1966). Recently, the study of an old beer recipe indicated that
barley grain was used to brew beer as early as 5,000 years ago, as determined
from archaeological evidence including brewing tools and chemical residues found at
the Mijiaya site in China (Wang et al., 2016). Beer making was also assumed to play
an important role in promoting the spread of barley cultivation from Western Eurasia
to Eastern Asia including China.
Cultivated barley is derived from its progenitor, wild barley (Hordeum vulgare L.
subsp. spontaneum), which has been found in multiple regions including Turkey,
Israel, Jordan, Iraq and Afghanistan (Harlan and Zohary, 1966). Compared with
cultivated barley, wild barley features brittle rachises that increase the possibility of
propagation in a natural environment. However, the fragile rachis is not an ideal trait
for artificial selection because it reduces grain yields due to seed dispersal. Loss of
seed dispersal is assumed to be a crucial step in the process of crop domestication
(Fuller and Allaby, 2009; Pourkheirandish et al., 2015). In barley, the transition from
shattering to non-shattering spikes arose through mutations in two adjacent and
complementary genes, Btr1 and Btr2, on chromosome 3H (Pourkheirandish et al.,
2015; Takahashi and Hayashi, 1964). These genes were involved in the independent
evolution of landraces in eastern and western Asia, which strongly suggests the
presence of at least two centres of independent domestication.
The Near East Fertile Crescent is a well-recognised centre of barley evolution. It
extends from the Levant in the west to the regions around the Euphrates and Tigris
Rivers in the east. A second domestication of barley, 1500–3000 km east of the
Fertile Crescent, was inferred from differences in haplotype frequency among
geographic regions at multiple loci (Morrell and Clegg, 2007). The second
domestication contributed mainly to diversity from Central Asia to the Far East, while
the first centre contributed most of the diversity in Europe and America. Recently,
Tibet was identified as another domestication centre of cultivated barley after
analysing the genotypic division of worldwide cultivated barley and wild barley from
the Near East and Tibet (Dai et al., 2012). A polyphyletic origin was also proposed after
cluster analysis of 186 barley accessions (including wild barley and barley landraces)
from worldwide locations with chloroplast DNA microsatellite markers
(Molina-Cano et al., 2005).
2.1.2 Constructing the barley reference genome
Challenges associated with constructing the barley reference genome include its
large genome size and the complexity of its genomic content. Barley is diploid
(2n=2x=14) with an estimated genome size of 5.1 Gb, about 13 times that of rice and
1.5 times that of human. Up to 84% of the barley genome consists of repetitive
sequence structures such as retrotransposons (International Barley Genome
Sequencing et al., 2012). Given these challenges, it is not possible to generate a
high-quality barley genome reference using a whole-genome shotgun sequencing
strategy alone, even though sequencing technology and sequence assembly algorithms
have made huge progress in the past decades.
In 2012, a barley physical map with a total length of 4.83 Gb was generated using
high-information-content fingerprinting together with contig assembly (International
Barley Genome Sequencing et al., 2012). The physical map represented up to 98% of
the barley genome. It comprised 9,265 contigs generated by assembling 571,000
BAC clones, which can be represented by a minimum tiling path of 67,000 BAC
clones. Sequencing 5,341 gene-containing BACs and 304,523 BAC-ends provided
the fundamental information to integrate the physical map with a high-density
genetic map, whole-genome shotgun sequencing results, and transcript sequences
(Ariyadasa et al., 2014; Mascher et al., 2013a). Integrating the above information
resulted in a barley genome framework with 26,159 high-confidence genes. This
framework provided limited information for functional gene isolation and
molecular breeding in barley (Nakamura et al., 2016; Pourkheirandish et al., 2015;
Russell et al., 2016; Sato et al., 2016).
A high-quality reference for the barley genome was attained in 2017 (Beier et al., 2017;
Mascher et al., 2017) through the cooperation of barley genetic researchers
from Australia, China, Germany, the UK, the US, Denmark and other countries. Based
on the previous barley genome framework, 87,075 BACs covering the minimum tiling
path were selected for hierarchical shotgun sequencing and de novo assembly
(Munoz-Amatriain et al., 2015). Super-scaffolds were generated by overlapping
adjacent BAC assemblies and were assigned to chromosomes based on multifaceted
information including a barley physical map, barley genetic map, optical map, and
chromosome conformation capture sequencing (Ariyadasa et al., 2014; Mascher et al.,
2013a). The resulting chromosome-scale assembly of the barley genome is about
4.79 Gb (Table 2-1) and consists of 6,347 super-scaffolds with an N50 of 1.9 Mb
(Beier et al., 2017; Mascher et al., 2017). In total, 39,734 high-confidence genes were
annotated based on gene model prediction and evidence from RNA-seq and sequence
homology with other species.
Table 2-1 Assembly and annotation statistics of the barley genome (Mascher et al.,
2017)
Statistic                                                  Value
Number and cumulative length of sequenced BACs             87,075 (11.3 Gb)
Length of non-redundant sequence                           4.79 Gb
Number of sequence contigs                                 466,070
BAC sequence contig N50                                    79 kb
Number and cumulative length of BAC super-scaffolds        4,235 (4.58 Gb)
Number and cumulative length of singleton BACs             2,123 (205 Mb)
Super-scaffold N50                                         1.9 Mb
Sequence anchored to the POPSEQ genetic map                4.63 Gb (97%)
Sequence anchored to the Hi-C map                          4.54 Gb (95%)
Number of annotated high-confidence genes                  39,734
Annotated coding sequence                                  65.3 Mb (1.4%)
Annotated transposable elements                            3.70 Gb (80.8%)
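The N50 values reported for contigs and super-scaffolds can be illustrated with a short calculation. The sketch below is not taken from the cited assembly pipeline; it assumes the common definition of N50 as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (arbitrary units).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

In this toy case the total length is 32, and the two longest sequences (8 + 8) already reach half of it, so the N50 is 8. A larger N50 therefore indicates that more of the assembly sits in long sequences.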
2.1.3 Molecular markers and genotyping
Molecular markers have been widely used to compare genetic differences between
individuals and are a fundamental technology of modern genetic research, including
linkage analysis and association mapping. Molecular marker resources in barley are
classified into three types: traditional molecular markers, array-based molecular
markers, and sequencing-based molecular markers.
Traditional molecular markers include Restriction Fragment Length Polymorphism
(RFLP) and Simple Sequence Repeats (SSR). RFLP was the first generation of
molecular markers used to construct genetic maps in barley (Sherman et al., 1995).
Genotyping by RFLP involves hybridisation and isotope labelling, which is labour
intensive and hazardous to operators’ health. SSR markers are more sensitive than
RFLP markers and relatively easy to use because they are PCR-based. Three sets of
barley SSR markers were developed using an original method involving the
construction of a small-insert genomic DNA library, hybridisation with tandemly
repeated oligonucleotides, and sequencing of positive clones (Liu et al., 1996;
Ramsay et al., 2000; Struss and Plieske, 1998). With the accumulation of sequences,
especially expressed sequence tags (ESTs), in public databases, software was designed
to predict SSR motifs for SSR marker development in barley (Thiel et al., 2003). In
2007, a barley SSR consensus map comprising 775 markers was built by integrating
SSR markers from six genetic maps (Varshney et al., 2007). However, SSR genotyping
is low density, labour intensive and time-consuming, so it cannot meet the demand
for high-density, high-throughput genotyping of large numbers of individuals.
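To illustrate how SSR motifs can be predicted from sequence data, the sketch below scans a sequence for perfect di- and trinucleotide tandem repeats with a regular expression. It is not the software of Thiel et al. (2003); the function name, the repeat threshold and the toy sequence are assumptions chosen for illustration only.

```python
import re


def find_ssrs(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect di-/trinucleotide repeats."""
    seq = sequence.upper()
    hits = []
    for motif_len in (2, 3):
        # A motif followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


# Hypothetical toy sequence: (GA)6 and (ATG)5 embedded in flanking bases.
toy_seq = "TT" + "GA" * 6 + "CCC" + "ATG" * 5 + "TT"
print(find_ssrs(toy_seq))
```

Primers designed in the unique sequence flanking each hit would then amplify a fragment whose length varies with the repeat number, which is what an SSR marker scores.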
To circumvent the limitations of traditional molecular markers, Diversity Arrays
Technology (DArT) and SNP arrays were developed based on microarray
technology. They detect DNA variation at hundreds to thousands of loci in
parallel. DArT works by detecting whether specific fragments are present in a
genomic representation of the sample (Jaccoud et al., 2001). In barley, DArT was first
used to genotype an F2 population derived from a Steptoe × Morex cross and resulted
in a genetic map comprising 384 DArT markers (Wenzl et al., 2004). In the following
two years, 2,085 DArT markers were developed from ten biparental populations and
integrated, together with 850 traditional markers, into a medium-density consensus
map (Wenzl et al., 2006). Although DArT genotyping is more efficient than genotyping
with traditional molecular markers, it cannot provide the marker density required
for high-density genetic maps and high-resolution gene mapping.
Single nucleotide polymorphisms (SNPs) are the most abundant type of molecular
marker. With the accumulation of sequence information from diverse barley germplasm,
SNPs were increasingly exploited. Close et al. (2009) identified about
22,000 SNPs from publicly available ESTs. The performance of 4,596 selected SNPs
was tested on the Illumina GoldenGate oligonucleotide platform, and the minor
allele frequency of each SNP was estimated in a collection of ~200 barley
accessions. In total, 3,072 SNPs with good performance and suitable allele frequencies
were selected and incorporated into two SNP arrays. One of these arrays, with
1,536 loci, was used to genotype 500 selected UK barley cultivars, and association
mapping was performed for 15 morphological traits based on the genotype dataset
(Cockram et al., 2010). Two years later, another 220 spring barley cultivars collected
worldwide were genotyped using the same SNP array for a genome-wide association
study (Pasam et al., 2012). Although SNP arrays are useful for genotyping
populations with large sample numbers, they can only detect variation at
predefined genomic positions. Genotyping by SNP and DArT arrays is efficient and
reproducible, but the arrays need to be developed first.
Sequencing-based markers do not need to be developed before they are used for
genotyping. Whole-genome sequencing can detect almost all sequence variants in the
genome. However, it is too expensive to genotype a large barley population by
whole-genome sequencing. Restriction site associated DNA (RAD) sequencing (Ren et
al., 2016; Zhou et al., 2015) is a widely used alternative for genotyping large
numbers of individuals. RAD sequencing only sequences the regions flanking
restriction enzyme recognition sites, which reduces the complexity of the genome.
Zhou et al. (2015) conducted RAD sequencing for a DH population with 96 individuals
from a cv. AC Metcalfe × cv. Baudin cross. The sequencing library for each sample was
constructed from genomic DNA digested with the restriction enzyme EcoRI. Clean reads
were mapped to the assembly of barley cv. Morex for SNP calling in each sample.
Finally, a high-density genetic map was constructed based on the genotype data of
13,547 SNP markers in the population. Based on this genetic map, a major quantitative
trait locus (QTL) controlling plant height was identified on chromosome 3H, which
explained 44.5% of the phenotypic variance.
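The complexity reduction achieved by RAD sequencing can be illustrated with a small in-silico digest. The sketch below locates EcoRI recognition sites (GAATTC) in a sequence and returns the bases flanking each site, which correspond to the regions a RAD library would sample. The function name, the flank length and the toy sequence are hypothetical; real RAD protocols involve adapter ligation, shearing and size selection, which are not modelled here.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def rad_flanks(sequence, site=ECORI_SITE, flank=4):
    """Return (site_start, left_flank, right_flank) for each recognition site."""
    seq = sequence.upper()
    flanks = []
    start = seq.find(site)
    while start != -1:
        left = seq[max(0, start - flank):start]
        right = seq[start + len(site):start + len(site) + flank]
        flanks.append((start, left, right))
        start = seq.find(site, start + 1)
    return flanks


# Hypothetical toy sequence containing two EcoRI sites.
toy_seq = "ACGTACGTAA" + ECORI_SITE + "TTGCAAGGCCTT" + ECORI_SITE + "AAAC"
for site_start, left, right in rad_flanks(toy_seq):
    print(site_start, left, right)
```

Only the bases next to the recognition sites are sampled, so the same subset of the genome is sequenced in every individual, which makes the genotypes directly comparable across a population.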
2.1.4 Linkage mapping
Most agronomic traits of cereal crops, such as flowering time, plant height, grain
number and grain weight, are controlled by QTLs. The identification and molecular
dissection of these QTLs will provide fundamental knowledge and genetic resources
for rational design and molecular breeding of new crop varieties with high yield,
environmental stress resistance and superior grain quality (Zeng et al., 2017).
Linkage mapping and association mapping are two major approaches widely adopted
to isolate QTLs of agricultural importance.
Linkage mapping is a method for identifying functional QTLs in biparental
populations using genetic maps, which are constructed by calculating the rate of
recombination between each pair of molecular markers. The closer two loci are in
physical distance, the lower the rate of recombination between them. The advantage
of map-based cloning is that it does not require knowledge of the protein products
of the genes that control the traits. In the past decades, linkage mapping has been
adopted to identify QTLs controlling diverse agronomic traits in barley, such as seed
dormancy (Han et al., 1996; Kleinhofs et al., 1993; Nakamura et al., 2016; Sato et al.,
2016), nonbrittle rachis (Komatsuda and Mano, 2002; Komatsuda et al., 2004;
Pourkheirandish et al., 2015), and powdery mildew resistance (Buschges et al., 1997;
Hinze et al., 1991). To dissect the genetic basis of seed dormancy in barley, a genetic
map comprising 295 traditional molecular markers was constructed using a doubled
haploid line (DHL) population with 150 individuals from a cv. Steptoe (high level of
dormancy) × cv. Morex (no dormancy) cross, from which four QTLs regulating seed
dormancy (SD1, SD2, SD3, SD4) were identified (Kleinhofs et al., 1993). SD1 and SD2
were further validated and fine mapped using three F2 segregating populations whose
parents were selected from the previous DHLs (Han et al., 1996). Eventually, SD1 was
found to encode an alanine aminotransferase (AlaAT) (Sato et al., 2016), while the
causal gene for SD2 encodes Mitogen-activated Protein Kinase Kinase 3 (MKK3)
(Nakamura et al., 2016). The nonbrittle rachis trait arose during the domestication of
cultivated barley from wild barley, and two closely linked QTLs (Brt1 and Brt2) were
found to jointly control this phenotype using an F8 recombinant inbred line (RIL)
population of 215 individuals derived from a cv. Azumamugi (AZ) × cv. Kanto
Nakate Gold (KNG) cross and genotyped with 272 markers (Komatsuda and Mano,
2002). The two QTLs were localised to a 0.19 cM region on chromosome 3H,
covering a genomic region of 403 kb, using an F2 segregating population of 10,084
individuals (Pourkheirandish et al., 2015). More SNP markers were
developed in the candidate region, and two enlarged F2 segregating populations
(14,058 and 12,257 individuals) were created. According to recombination analysis,
Brt1 was delimited to a 1.2 kb interval containing only one ORF (ORF1) and Brt2 to a
4.9 kb interval containing three ORFs (ORF2, ORF3 and ORF4). Sequence analysis of the
three candidate genes for brt2 in the two parents revealed that an 11 bp deletion in
ORF3 causes a frameshift in cv. Azumamugi (Pourkheirandish et al., 2015).
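The relationship between recombination rate and genetic map distance described in this section can be made concrete with a short calculation. The sketch below estimates the recombination fraction between two markers from counts of recombinant lines and converts it to centimorgans with the standard Haldane and Kosambi mapping functions. The studies cited above do not state here which mapping function they used, and the counts in the example are hypothetical.

```python
import math


def recombination_fraction(recombinants, total):
    """Proportion of lines that are recombinant between two markers."""
    return recombinants / total


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for crossover interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical example: 15 recombinant lines among 150 DH lines.
r = recombination_fraction(15, 150)
print(round(r, 3), round(haldane_cm(r), 2), round(kosambi_cm(r), 2))
```

For small recombination fractions both functions give distances close to 100 × r cM, which is why closely linked markers can be ordered reliably from recombination counts alone.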
2.1.5 Association mapping
Association mapping is a method for identifying functional QTLs within a natural
population based on linkage disequilibrium (LD) between functional loci and
molecular markers. LD refers to the non-random association of alleles at two loci
(genes or markers). Compared with linkage mapping, association mapping has several
major advantages. Firstly, more alleles can be investigated in a natural population
using association mapping, compared with only two alleles in a biparental population
using linkage mapping. Secondly, recombination events are more abundant in a natural
population than in a biparental population because they have accumulated over
millions of generations. The level of recombination is a major factor that
determines the resolution of gene mapping. Thirdly, it is quicker and easier to
collect a natural population than to create a crossing population, which is labour
intensive and time-consuming. Given these advantages, association mapping is
considered a promising tool for dissecting the genetic basis of complex agronomic
traits in crops. With the advances in high-throughput genotyping by sequencing,
association mapping has been successfully used for functional QTL mapping in
plants over the last decade, including Arabidopsis (Atwell et al., 2010), maize
(Buckler et al., 2009), and rice (Huang et al., 2010).
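The non-random association of alleles that defines LD is commonly quantified with the coefficient D and its normalised form r². The sketch below computes both from haplotype counts at two biallelic loci. The function name and the counts are hypothetical, and phased haplotypes are assumed; with unphased genotype data, haplotype frequencies would first have to be estimated.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2
    d = p_AB - p_A * p_B          # deviation from independent association
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Hypothetical counts: alleles A and B tend to occur together.
d, r2 = ld_from_haplotype_counts(40, 10, 10, 40)
print(round(d, 3), round(r2, 3))
```

A marker in strong LD with a causal locus (r² close to 1) will show nearly the same trait association as the causal variant itself, whereas an r² close to 0 means the marker carries almost no information about it.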
In barley, association mapping has been used to identify functional QTLs related to
morphological traits (Cockram et al., 2010; Pasam et al., 2012; Wang et al., 2012),
disease resistance (Richards et al., 2017; Roy et al., 2010; Sallam et al., 2017;
Turuspekov et al., 2016), drought tolerance (Wehner et al., 2015), and soil acidity and
salinity tolerance (Fan et al., 2016; Zhou et al., 2016). To identify QTLs underlying
morphological traits, Cockram et al. (2010) conducted a genome-wide association
study with a natural population consisting of 500 UK barley cultivars. The
association panel was genotyped using 1,536 genome-wide SNPs, and 32
morphological traits were scored and used to scan for marker-trait associations.
Eighteen QTLs with significant associations with 15 traits were identified. To
identify QTLs controlling net blotch resistance in barley, Richards et al. (2017)
genotyped a core collection of 957 barley accessions with 6,525 SNPs; 16 unique
genomic loci had significant associations with net blotch resistance, and five
of them were novel QTLs. Improving the drought tolerance of crops can reduce grain
production losses due to water shortages. To identify QTLs underlying drought
tolerance, Wehner et al. (2015) conducted a genome-wide association study with 156
winter barley accessions. The association panel was genotyped using the Illumina 9k
iSelect SNP chip. Two major QTLs associated with drought stress response and leaf
senescence were located on chromosomes 2H and 5H. To dissect the genetic basis of
barley salinity stress tolerance, Fan et al. (2016) conducted a genome-wide
association study with 206 barley accessions collected worldwide, which were
genotyped using 408 DArT markers. Of the 24 QTLs showing significant marker-trait
associations with salinity stress tolerance, one was detected consistently.
2.2 DNA sequencing technology and application
DNA is the main carrier of genetic information and ultimately determines the amino
acid sequence of each protein in a biological system. DNA sequencing is the process
of determining the order of the four bases, adenine (A), thymine (T), cytosine (C)
and guanine (G), in a piece of DNA. The first attempt to characterise a DNA
sequence was conducted in bacteriophage λ, in which the nucleotide composition of
the 5’-terminated strand was determined (Wu and Kaiser, 1968). Wu (1970)
developed a DNA sequencing method based on a primer-extension strategy and
determined the DNA sequence of the cohesive ends of bacteriophage λ. This
method was later improved to become the well-known ‘first-generation DNA
sequencing’ (Maxam and Gilbert, 1977; Sanger and Coulson, 1975; Sanger et al.,
1977). The past half-century has witnessed dramatic advances in both DNA
sequencing technology and its application to various aspects of life science. In this
section, I first compare the advantages and disadvantages of the existing
major DNA sequencing platforms and then summarise the achievements in building
reference genomes, detecting genomic variation, and deciphering gene expression
profiles based on DNA sequencing (Figure 2-2).
Figure 2-2 DNA sequencing technology and application.
2.2.1 DNA sequencing technology
First-generation DNA sequencing is also referred to as chain-termination sequencing
or capillary sequencing. In this method, dideoxynucleotide triphosphates (ddNTPs),
chemical analogues of the four types of deoxynucleotide triphosphates (dNTPs), are
used to terminate the elongation of DNA synthesis. Compared with dNTPs, ddNTPs lack
the 3’ hydroxyl group that is required to form a phosphodiester bond between the DNA
terminus and the next dNTP (Figure 2-3a). The four types of ddNTPs are labelled with
different fluorescent dyes and mixed into a DNA synthesis reaction at a defined
concentration. ddNTPs are incorporated randomly during synthesis, generating
polynucleotides of different lengths with a ddNTP at their 3’ terminus.
The resulting polynucleotides are analysed by capillary electrophoresis with a
fluorescence detector (Figure 2-3b). The ABI3730 is a widely used capillary sequencing machine
in recent years, with a read length of up to 1,100 bp and a base accuracy as high as
99.999%. First-generation sequencing technology is limited by slow data
generation, about 3.4 Mb per 24 h for each ABI3730 sequencer, because only
one DNA template can be added to each reaction and only 384 reactions can be run
simultaneously in each run of about 3 hours.
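The daily throughput quoted above can be reproduced from the other figures in the paragraph. The short sketch below multiplies the maximum read length by the number of parallel reactions and the number of runs per day; it assumes back-to-back runs at the maximum read length, so it gives an upper bound rather than a measured value. The function and parameter names are illustrative.

```python
def sanger_mb_per_day(read_length_bp, reactions_per_run, run_hours):
    """Upper-bound daily output (Mb) of a capillary sequencer."""
    runs_per_day = 24 / run_hours
    return read_length_bp * reactions_per_run * runs_per_day / 1e6


# Figures quoted in the text for the ABI3730.
abi3730_mb = sanger_mb_per_day(read_length_bp=1100,
                               reactions_per_run=384,
                               run_hours=3)
print(round(abi3730_mb, 2))
```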
Figure 2-3 Diagrams of chain-termination sequencing.
a, 3’-5’ phosphodiester bond formation during DNA synthesis; b, primer extension
with ddNTPs and determination of the DNA sequence (Garrido-Cardenas et al., 2017).
The output of DNA sequencing made a quantitative leap with the introduction
of massively parallel sequencing technology, designated ‘second-generation
sequencing (SGS)’. SGS has made it possible to sequence billions of
DNA templates simultaneously through spatial separation. While several
SGS platforms are commercially available, Illumina platforms are the most
popular. Illumina sequencing uses a sequencing-by-synthesis approach and
reversible dye-terminator technology. The steps involved
in Illumina sequencing include preparing the sequencing library, attaching DNA
templates to the surface of flow cell channels, generating template clusters by bridge
amplification, incorporating one fluorescently labelled nucleotide, imaging for
base calling, and removing the 3’ blocking group. At present, Illumina offers a range
of sequencers, including MiSeq, NextSeq, HiSeq and NovaSeq
(https://www.illumina.com). NovaSeq can generate a maximum output of 6,000 Gb in
about 44 h, with up to 20 billion reads (paired-end 150 bp) in each run. That equates
to about 3,272 Gb per day, or about one million times the throughput of the ABI3730
capillary sequencer. SGS has dramatically increased the speed of DNA sequencing
and decreased its cost. The limitation of SGS is the short length of the output reads,
which is challenging when assembling complex genomes with a large proportion of
repetitive sequences.
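The throughput comparison in this paragraph is simple arithmetic and can be checked as follows. The sketch uses only the figures quoted in the text (6,000 Gb in 44 h for NovaSeq and about 3.4 Mb per day for the ABI3730). The last line assumes that ‘20 billion reads (paired-end 150 bp)’ refers to 20 billion read pairs, which is an interpretation rather than something stated in the text.

```python
def gb_per_day(output_gb, run_hours):
    """Average output per 24 h for a run of the given size and duration."""
    return output_gb * 24 / run_hours


NOVASEQ_OUTPUT_GB = 6000    # maximum output per run, from the text
NOVASEQ_RUN_HOURS = 44      # run time, from the text
ABI3730_MB_PER_DAY = 3.4    # capillary sequencer output, from the text

novaseq_gb_per_day = gb_per_day(NOVASEQ_OUTPUT_GB, NOVASEQ_RUN_HOURS)
fold_increase = novaseq_gb_per_day * 1000 / ABI3730_MB_PER_DAY  # Gb -> Mb

# Assumption: 20 billion read pairs, each pair giving 2 x 150 bp.
output_from_reads_gb = 20e9 * 2 * 150 / 1e9

print(round(novaseq_gb_per_day), round(fold_increase), output_from_reads_gb)
```

Under that assumption the read count and the stated maximum output agree, and the fold increase comes out just under one million.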
Figure 2-4 Illumina/Solexa sequencing (Mardis, 2008)
To overcome the challenge of read length in SGS, the concept of single-molecule
real-time (SMRT) sequencing was proposed. Sequencing technologies based on this
concept are classified as ‘third-generation sequencing (TGS)’. In SMRT, the DNA
polymerase is immobilised and the deoxyribonucleoside triphosphates (dNTPs) are
labelled with four different fluorescent dyes. The fluorescence of a dNTP can be
detected when it is recruited by the immobilised DNA polymerase for incorporation
into the end of a growing DNA strand (Figure 2-5). The challenge is that the
signal of the recruited dNTP is obscured by the background signal from
free dNTPs. This challenge has been circumvented by the zero-mode waveguide
(ZMW) technology developed by Pacific Biosciences (PACBIO) (Figure 2-5). ZMWs
provide nanostructures that confine each DNA synthesis reaction to a separate
volume, so that the background signal is reduced and millions of reactions can be
processed simultaneously (Eid et al., 2009). Because the DNA templates are PCR-free
and the process of DNA synthesis is uninterrupted, the DNA templates can be
sequenced from start to end. The reads produced by SMRT sequencing can be as
long as 70 kb (Figure 2-6).
Figure 2-5 PACBIO SMRT technology.
A, the structure of a ZMW; B, signal detection during dNTP
incorporation (Eid et al., 2009)
Figure 2-6 Distribution of read length in SMRT sequencing (from PACBIO
website)
Table 2-2 Performance comparison of dominant sequencers
Method                           Read length   Accuracy  Reads per run         Time per run   US dollars per 1 Mb     Comments
Sanger sequencing (ABI3730)      400–900 bp    99.90%    384                   20 min to 3 h  $2,400                  Low throughput, high quality
HiSeq series (Illumina)          50–300 bp     99.90%    up to 3 billion       1 to 11 days   $0.05–0.15              High throughput, cheap, short reads
Ion semiconductor (Ion Torrent)  up to 600 bp  98%       up to 80 million      2 h            $1.00                   High throughput, a bit expensive
SMRT (PACBIO)                    10–40 kb      87%       50,000 per SMRT cell  30 min to 4 h  $0.13–0.60              Long reads, relatively cheap
Nanopore (Oxford)                up to 500 kb  92–97%    NA                    1 min to 48 h  $500–999 per Flow Cell  Long reads, convenient
2.2.2 Constructing reference genome
A high-quality reference genome sequence is fundamental for genomic
research. According to NCBI Genome statistics, by the end of 2017, genomes from
34,605 organisms had been sequenced and assembled at different levels (contigs,
scaffolds or chromosomes), and these include 5,135 eukaryotes, 130,201 prokaryotes,
and 14,001 viruses. Within the eukaryotes, there are 2,777 fungal species, 1,298
animal species and 495 plant species. The genomes of major crops such as wheat
(International Wheat Genome Sequencing, 2014), maize (Jiao et al., 2017; Schnable
et al., 2009), rice (Goff et al., 2002; International Rice Genome Sequencing, 2005),
barley (Dai et al., 2018; Mascher et al., 2017), foxtail millet (Zhang et al., 2012),
sorghum (Paterson et al., 2009), rye (Bauer et al., 2017), rapeseed (Chalhoub et al.,
2014), and soybean (Schmutz et al., 2010), and of common model organisms such as
Arabidopsis (Arabidopsis Genome, 2000) and purple false brome (Vogel et al., 2010),
have all been sequenced (Table 2-3).
Table 2-3 Genome assembly and gene prediction of major crops and model organisms
Common name         Scientific name          Chromosomes  Genome size (Mb)  Assembly size (Mb)  Predicted gene number  Assemblies  Reference
Arabidopsis         Arabidopsis thaliana     5            125               119                 25,498                 10          (Arabidopsis Genome, 2000)
Rice                Oryza sativa             12           389               374                 37,544                 23          (International Rice Genome Sequencing, 2005)
Sorghum             Sorghum bicolor          10           730               709                 27,640                 4           (Paterson et al., 2009)
Maize               Zea mays                 10           2,300             2,135               32,540                 10          (Jiao et al., 2017; Schnable et al., 2009)
Purple false brome  Brachypodium distachyon  5            355               272                 25,532                 4           (Vogel et al., 2010)
Soybean             Glycine max              20           1,100             978                 46,430                 2           (Schmutz et al., 2010)
Foxtail millet      Setaria italica          9            491               423                 38,801                 2           (Zhang et al., 2012)
Rapeseed            Brassica napus           19           1,130             976                 101,040                2           (Chalhoub et al., 2014)
Bread wheat         Triticum aestivum        21           17,000            15,000              124,201                15          (International Wheat Genome Sequencing, 2014)
Barley              Hordeum vulgare          7            5,100             4,790               39,734                 8           (Mascher et al., 2017)
Rye                 Secale cereale           7            7,900             2,800               27,784                 2           (Bauer et al., 2017)
Most draft genomes were completed using one of three major sequencing
strategies: whole-genome shotgun sequencing (Goff et al., 2002), map-based clone
sequencing (International Rice Genome Sequencing, 2005) or chromosome
conformation capture sequencing (Mascher et al., 2017). Whole-genome shotgun
sequencing is a fast and efficient method to obtain a draft genome, whereby genomic
DNA is randomly fragmented and ligated with universal sequencing adapters before
being sequenced. Goff et al. (2002) completed the first draft genome of rice using
this method, with 42,109 contigs covering 389,809,244 bp assembled from 5.5
million reads, and about 40,000 genes predicted. Whole-genome shotgun sequencing
is an efficient method for obtaining the sequence of low-copy regions,
but it fails to assemble regions rich in high-copy repetitive sequences.
The rice reference genome was completed using the map-based clone sequencing
method (International Rice Genome Sequencing, 2005). Firstly, a physical map was
established by fingerprinting BACs from nine independent genomic libraries together
with clone-end sequencing. Secondly, 3,453 representative clones selected according
to the physical map were randomly fragmented, sequenced with the shotgun
sequencing method to about 10-fold coverage, and assembled. Thirdly, the assemblies
of adjacent clones were overlapped to generate super-scaffolds, and the sequence
gaps were filled using long-range PCR sequencing. Finally, the assembly of 12
chromosomes generated a pseudomolecule genome with a total length of about 370 Mb,
with only 36 estimated gaps.
Barley has a large genome (estimated at 5.1 Gb), about 13 times the size of the rice
genome. Moreover, up to 80% of the barley genome consists of repetitive
sequences. The large genome size and complex genome content make it
difficult to generate a high-quality genome assembly. The dramatic advances in DNA
sequencing and related molecular biology technologies, especially three-dimensional
chromatin conformation capture sequencing, offer new insight and
convenience for assembling the barley genome (Jin et al., 2013). A high-quality
barley genome was achieved in 2017 by the International Barley Sequencing
Consortium (Mascher et al., 2017). To overcome the challenge of abundant repetitive
elements in the barley genome, several advanced technologies were used, including
hierarchical shotgun sequencing, optical mapping, and chromosome conformation
capture sequencing. Eventually, a barley pseudomolecule genome was assembled
with a total length of 4.79 Gb, and 39,734 high-confidence genes were predicted from
the pseudomolecule sequence, supported by evidence including transcriptome
sequences and well-characterised homologous proteins stored in
public databases.
2.2.3 Detecting genetic variants
The relationship between genetic variants and phenotype is the major focus of
genetic research. High-throughput DNA sequencing makes it possible to detect
genetic variants across the whole genome. Genetic variants can be detected in specific
genomic regions and at different densities by choosing whole-genome sequencing
(International Barley Genome Sequencing et al., 2012; Zeng et al., 2015),
transcriptome sequencing (Dai et al., 2014), exome sequencing (Mascher et al., 2013b;
Russell et al., 2016) or restriction site associated DNA sequencing (Zhou et al.,
2015). To exploit the natural diversity in barley, four barley cultivars (Bowman, Igri,
Barke and Haruna Nijo) and one wild barley were sequenced at a depth of 5- to
25-fold using the whole-genome shotgun strategy. Clean reads for each accession were
mapped to the cv. Morex assembly, which identified about 15 million SNPs
(International Barley Genome Sequencing et al., 2012). Zeng et al. (2015) conducted
whole-genome sequencing on five Tibetan wild barleys and five Tibetan cultivated
barleys with an average coverage of 16.4-fold. About 36 million SNPs and 2.28
million short InDels were identified. Due to the large size of the barley genome,
detecting genomic diversity by whole-genome sequencing is not affordable for
researchers with limited funding. Compared with whole-genome sequencing,
uncovering genetic diversity by transcriptome sequencing is more efficient. To study
the origin of cultivated barley, Dai et al. (2014) conducted transcriptome sequencing
of 21 barley accessions (12 wild barleys and nine cultivated barleys), which
generated about 48 Gb of clean data. After mapping these clean reads to the cv.
Morex assembly, 55,973 unique SNPs were identified. Exome sequencing is another
method that focuses on the gene coding regions. Mascher et al. (2013b) developed a
barley exome capture array that can be used to enrich fragments from coding regions.
Russell et al. (2016) sequenced 267 barley landraces and wild accessions using the
exome sequencing strategy to study the environmental adaptation of barley, which
identified about 1.7 million SNPs and 143,872 short InDels.
2.2.4 RNA-seq for analysing gene expression level
RNA-seq refers to whole-transcriptome sequencing, which provides a new strategy
for studying RNA transcripts, alternative splicing, changes in gene expression,
co-expression networks, and gene fusions. Prior to RNA-seq, gene expression changes
were usually analysed with gene expression microarrays, which are based on molecular
hybridisation technology. The experimental steps for preparing RNA sequencing
libraries are as follows: (1) total RNA is extracted from the sample and
treated with DNase I to remove DNA contamination; (2) mRNA is enriched using
beads carrying oligo(dT), which is complementary to the poly-A tail at the 3’ end of
eukaryotic mRNA; (3) the enriched mRNA is fragmented into smaller pieces before
the first and second strands of cDNA are synthesised; and (4) the double-stranded
cDNA is end-repaired and sequencing adapters are ligated to both ends before it is
sequenced on a next-generation sequencing platform.
Discovering new transcripts is an important aim of RNA-seq for biologists. For
organisms with a complete genome sequence, transcripts can be assembled using the
genome sequence as a guide. In this approach, the first step is to align the short
RNA-seq reads to the genomic DNA sequence. The biggest challenge in aligning RNA-seq
reads to genomic DNA is spliced alignment, because genes in
eukaryotic genomes contain introns, whereas mature mRNA does not.
To overcome this challenge, TopHat2, a widely used RNA-seq alignment program,
divides the process into three steps (Kim et al., 2013). TopHat2 begins by aligning all
reads to the provided transcriptome, allowing fewer than three mismatches. Then the
reads that did not map to the transcriptome are aligned to the genome sequence, and
reads spanning a single exon are kept. For multi-exon spanning reads, the reads are
split into 31 bp segments and aligned to the reference sequence. The resulting
alignments are taken as input files for transcript assembly using Cufflinks software.
Alternative splicing events can be detected by SOAPsplice (version 1.10), which had
the highest call rate (number of true junctions detected/total number of true
junctions) and the lowest false positive rate (number of false junctions
detected/total number of junctions detected) compared with TopHat, SpliceMap and
MapSplice (Huang et al., 2011a). The general steps in this process are summarised as
follows. Reads that align perfectly to the reference are kept as intact alignments,
and the remaining reads are referred to as initially unmapped reads (IUMs). IUMs are
split into two segments of 30 bp and realigned to the reference, allowing no gaps and
fewer than five mismatches. The distance between the two mapped positions should
range from 50 bp to 50 Mb, and the intron boundaries should follow the GT-AG, GC-AG
or AT-AC rule (Burset et al., 2000). Based on the detected splice junctions,
alternative splicing events are identified according to their definitions.
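The two filters applied to candidate junctions above (intron length and boundary dinucleotides) can be expressed compactly. The sketch below is not SOAPsplice code; it is a minimal illustration that checks a candidate intron on the forward strand only, using the 50 bp to 50 Mb range and the GT-AG, GC-AG and AT-AC rules quoted in the text. The function name, coordinate convention and toy sequence are assumptions.

```python
CANONICAL_INTRON_ENDS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def is_plausible_intron(genome, intron_start, intron_end,
                        min_len=50, max_len=50_000_000):
    """Check a candidate intron occupying genome[intron_start:intron_end].

    Coordinates are 0-based and end-exclusive; forward strand only.
    """
    length = intron_end - intron_start
    if not (min_len <= length <= max_len):
        return False
    intron = genome[intron_start:intron_end].upper()
    # First two and last two intron bases must form an accepted pair.
    return (intron[:2], intron[-2:]) in CANONICAL_INTRON_ENDS


# Hypothetical toy locus: exon, 64 bp GT...AG intron, exon.
toy_genome = "A" * 20 + "GT" + "C" * 60 + "AG" + "T" * 20
print(is_plausible_intron(toy_genome, 20, 84))
```

For genes on the reverse strand, the same test would be applied to the reverse complement of the candidate intron.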
2.2.5 ChIP-seq determining protein binding sites
The ChIP-seq strategy combines chromatin immunoprecipitation (ChIP) with
next-generation sequencing to determine whether a target protein interacts with
specific genomic regions. ChIP is used to investigate the interaction between
proteins and DNA. Studying the way proteins interact with DNA helps to understand
how the target protein affects certain phenotypes. The main aim of ChIP-seq is to
identify specific DNA-binding sites.
The key step in ChIP-seq is to use ChIP to enrich the target DNA sequences that are
bound by the specific protein. The workflow of ChIP is briefly as follows: (1)
DNA-protein complexes are cross-linked in living cells or tissues, and the chromatin
is then sheared into DNA fragments of about 500 bp; (2) target DNA fragments
cross-linked with the protein are specifically immunoprecipitated using
protein-specific antibodies; (3) the bound DNA fragments are released from the
DNA-protein complexes and purified; and (4) the enriched target DNA fragments are
used to prepare sequencing libraries that are sequenced using next-generation
sequencing technology.
To identify enhancer sequences that regulate the spatial and temporal expression of
genes in humans and mice, an antibody against the enhancer-associated protein p300
was used to enrich DNA-p300 complexes after cross-linking, chromatin extraction
and shearing by sonication, and the enriched DNA was then sequenced on an Illumina
sequencing platform (Visel et al., 2009). Several thousand genomic binding sites were
identified for p300 in mouse midbrain, embryonic forebrain, and limb tissue. These
predictions were validated by testing the sequences of 86 of these binding sites in a
transgenic mouse assay (Visel et al., 2009). Their study confirmed that ChIP-seq is an
accurate approach for identifying and isolating regulatory sequences such as
enhancers.
2.2.6 Bisulfite sequencing for detecting methylation
DNA methylation is a typical epigenetic modification that contributes to the
determination of biological phenotypes. Methylation in the promoter or within the
transcribed region of a gene is highly correlated with its transcription level
(Zhang et al., 2006b; Zilberman et al., 2007). The aim of DNA methylation
sequencing is to identify the methylation pattern of genomic DNA; the method
is based on sodium bisulphite treatment.
Figure 2-7 Schematic diagram of DNA methylation sequencing (Tollefsbol, 2011).
Bisulphite treatment is the key step of methylation sequencing. Unmethylated cytosine
residues are converted into uracil when DNA fragments are treated with bisulphite,
whereas 5-methylcytosine residues are not affected. It is
assumed that all unmethylated cytosines in the whole genome are converted to uracil
after being treated with bisulphite. As a result, it is possible to identify the
methylation status of CpG dinucleotides by sequencing bisulphite-treated genomic
DNA, because methylated cytosines are read as cytosines whereas unmethylated
cytosines are read as thymines after PCR amplification during sequencing
(Frommer et al., 1992).
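The logic of bisulphite sequencing can be illustrated with a small simulation. In the sketch below, unmethylated cytosines in a reference sequence are converted to thymine (the base read after PCR), methylated cytosines are left unchanged, and the methylation status is then recovered by comparing the converted read with the reference. The function names and the toy sequence are hypothetical, and the sketch assumes complete conversion, a perfectly aligned read without sequencing errors, and the top strand only.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate a read after bisulphite treatment and PCR (top strand)."""
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(sequence.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")   # unmethylated C -> U -> read as T
        else:
            converted.append(base)  # methylated C and other bases unchanged
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Return {position: is_methylated} for each reference cytosine."""
    calls = {}
    pairs = zip(reference.upper(), converted_read.upper())
    for i, (ref_base, read_base) in enumerate(pairs):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


# Hypothetical toy reference with cytosines at positions 1, 4 and 5.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1, 5])
print(read, call_methylation(reference, read))
```

In real data many reads cover each cytosine, so the methylation level at a site is estimated as the fraction of reads that retain a C at that position.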
To study the DNA methylation pattern in Arabidopsis, a map of cytosine methylation
at single-base-pair resolution was generated using the
shotgun bisulphite sequencing approach (Zhang et al., 2006b). Genomic DNA
was treated with sodium bisulphite to convert unmethylated cytosines to uracils and
then sequenced on the Illumina sequencing platform. About 20-fold
coverage of clean reads was produced, revealing 1.7% CHH, 6.7% CHG and 24% CG
methylation under their experimental conditions.
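The CG, CHG and CHH categories reported above are defined by the two bases that follow a cytosine, where H stands for A, C or T. A minimal sketch of this classification is given below; the function name and the toy sequence are hypothetical, and only the top strand is considered.

```python
def cytosine_context(sequence, position):
    """Classify the cytosine at `position` as 'CG', 'CHG' or 'CHH'.

    Returns None when too few downstream bases are available or when
    the downstream bases are ambiguous (for example N).
    """
    seq = sequence.upper()
    if seq[position] != "C":
        raise ValueError("position does not point to a cytosine")
    downstream = seq[position + 1:position + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) == 2 and downstream[0] in "ACT":
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return None


# Hypothetical toy sequence with one cytosine in each context.
toy_seq = "CGACAGCTTC"
for pos in (0, 3, 6, 9):
    print(pos, cytosine_context(toy_seq, pos))
```

Context-specific methylation levels such as those quoted above are then obtained by averaging the per-site methylation calls within each of the three categories.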
Chapter 3 Sympatric speciation and adaptive evolution of wild barley in “Evolution Canyon”
Abstract
Sympatric speciation is a highly contentious concept and it is under strong debate
whether natural selection can outperform the continuous homogenizing force of free
gene flow in a sympatric ecological condition and generate reproductive
incompatibility. Here, we provide substantial new genomic evidence for adaptation-
driven incipient sympatric ecological speciation of two wild barley populations
within 250 m from the ‘Evolution Canyon’ (EC), consisting of two abutting slopes
with drastically different microclimates. Phylogenetic relationship and population
structure analysis uncovered extraordinary genetic divergence between inter-slope
EC populations, considerably greater than between the Tibetan wild and cultivated
barleys. A genome-wide FST analysis revealed detailed information on the genomic
differentiation with a total of 243 Mb genomic regions under environmental
selection, representing ~5% of the barley reference genome. One prominent
‘chromosome island’, spanning a continuous region of around 200 Mb, was
identified on chromosome 3H under strong natural selection. A homologue of a gene
involved in indica-japonica rice speciation was located in the selective sweep region
and carried a unique amino acid substitution between the two populations. In-situ
hybridization results demonstrated potential chromosome structure changes in this
region. RNA-seq analysis demonstrated that the genotype from one slope has
evolved unique mechanisms in responding to heat and drought. Our study provides
comprehensive genomic evidence and new insights to understand the process of
incipient sympatric speciation of wild barleys.
3.1 Introduction
Wild barley (Hordeum spontaneum L.) is the ancestor of cultivated barley (Hordeum
vulgare L.) with a wide geographic distribution across highly diverse environments
throughout the Near East, from the Eastern Mediterranean coasts to Tibet (Zohary et
al., 2012). Genomic resources obtained from wild crop relatives have contributed
dramatically to enhancing crop productivity (Hajjar and Hodgkin, 2007; Nevo, 2012)
and to understanding the genetic basis of adaptation and speciation. The evolutionary
processes that drive the extraordinary adaptive genomic divergence in wild crops
hold great potential for crop improvement (Nevo, 2012; Nevo et al., 1992; Tester and
Langridge, 2010).
The high genetic and environmental heterogeneity across the distribution range of
wild barley suggests local adaptation and genomic differentiation along macro and
micro environmental gradients. The ‘Evolution Canyon’ I (EC), located in Lower
Nahal Oren, Mount Carmel in Israel, is a microsite model for biodiversity evolution,
adaptation and incipient sympatric speciation across life (Nevo, 1995, 2006; Nevo,
2014a). Different biotic and abiotic stress pressures at EC make it well suited to
characterize the causes and driving forces of evolution in action (Nevo, 2012; Nevo,
2013). EC consists of two abutting slopes separated (on average) by a distance of
only 250 m: the temperate, forested, cool and humid, ‘European’ north-facing slope
(NFS) and the tropical, savannoid, hot and dry, ‘African’ south-facing slope (SFS).
The geology and macro climates are very similar on the opposing slopes, but they
differ sharply in their micro climates (Figure 3-1). The SFS features extreme levels of solar
radiation (200‒800% more than NFS) with higher daily temperatures and drought,
whereas NFS is characterized by a cooler climate with a higher relative humidity (1–
7%) than SFS (Nevo, 1995, 2012; Pavlícek et al., 2003). The sharp micro climatic
divergence between the abutting slopes (Pavlícek et al., 2003) has resulted in
contrasting populations of microbial, animal, fungal and plant species (Nevo, 2014a).
Figure 3-1 Cross-section view and sample collection sites of the “Evolution
Canyon” I (EC-I), Lower Nahal Oren, Mount Carmel.
a, the microclimate model of the “Evolution Canyon” indicating the difference in
sunshine reception between the opposing slopes. b, the north-facing slope (NFS) and
the south-facing slope (SFS) in the “Evolution Canyon”, with an average distance of
about 250 meters. c, an aerial view of the “Evolution Canyon”. d, wild barley, Hordeum
spontaneum, growing on the Natufian terrace and cemetery of the Oren cave at
“Evolution Canyon” (from Nevo, 2012).
We investigated the genomic and transcriptomic signatures of microevolution of wild
barley from EC, collected from mid-stations on the abutting slopes along the North–
South microclimatic gradient of EC. We used high coverage (~40X) whole genome
sequencing, with a comparison to Tibetan wild as well as cultivated barley varieties.
This study highlights genetic variants (SNPs and short insertions/deletions) including
genes with significant mutations (missense, lost or gained start or stop codons)
identified using the map-based reference sequence of the barley genome (Mascher et
al., 2017). Genome-wide RNA profiling identified differentially
expressed genes (DEGs) between NFS and SFS. This study provides tangible
evidence of ecological stress as a force that generates and maintains local adaptive
patterns, leading to incipient sympatric adaptive ecological speciation, and linking
micro and macro-evolutionary processes in wild barley populations.
3.2 Methods
Figure 3-2 Diagram of methods and aims
Sample collection.
The two slopes of ‘Evolution Canyon’ (EC) have seven sampling stations (Figure 3-1). The
ten barley accessions used in this study for transcriptome sequencing were independent
accessions collected from SFS station #2 and NFS station #6. The ten barley
accessions used in this study for whole-genome sequencing were from four different
groups: Tibetan wild barley (W1 and X1), Australian and Canadian malting barley
cultivars (Baudin and AC Metcalfe), and wild barley from EC, Mount Carmel, Israel
(South Face Slope station SFS2.1, SFS2.2, SFS2.3 and North Face Slope station
NFS6.1, NFS7.1 and NFS7.2). The Tibetan wild barleys were collected on the
Tibetan Plateau by T.W. Xu (Xu, 1982). The two commercial barley cultivars
Baudin (Australia) and AC Metcalfe (Canada) are premium malting quality varieties.
RNA sequencing and differential expression gene analysis
Wild barley seeds were sown in pots filled with peat and grown under well-watered
conditions in a glasshouse under normal growth conditions. When the plants reached the
third-leaf stage, watering was ceased for the drought treatment but continued for the
controls for a further 80 days. Fully expanded leaves on both the control and drought plants
were sampled with three replicates and flash frozen in liquid nitrogen for RNA
extraction. Total RNA was extracted using TRIzol Reagent (Invitrogen, Carlsbad,
CA, USA) and purified using the RNeasy Mini Kit (Qiagen, Germantown, MD,
USA). 2×150 bp paired-end sequencing libraries were constructed and sequenced on
an Illumina HiSeq2000 platform (Illumina, San Diego, CA, USA) as described
previously (Dai et al., 2014).
From an initial 368,900,936 paired-end raw reads, 295,697,023 clean reads were
retained after filtering. Retained paired-end reads were aligned against the up-to-date
version of the Morex barley pseudomolecule sequences using the open-source k-mer
adaptive aligner Biokanga (version 3.8.7) with default parameters. A count matrix
was generated and loaded into R (version 3.3.1) for downstream statistical analysis.
Read counts were normalized using the model-based Trimmed Mean of M-values
(TMM) method (Robinson and Oshlack, 2010), followed
by fitting of a negative binomial generalized linear model (GLM) accommodating the
genotype differences and the effects induced by the drought treatment. The edgeR
package (Robinson et al., 2010) (version 3.14.0) was used for transcript differential
expression analysis. A total of 21,122 transcripts with more than ten reads in at least
three samples were included in the differential expression (DE) analysis. A
likelihood ratio test comparing the genotypes was conducted to obtain DE
transcripts, and the resulting P-values were adjusted for multiple testing using
the Benjamini-Hochberg FDR adjustment (Benjamini and Hochberg, 1995). Transcripts
were considered highly differentially expressed when the adjusted P-value was
<0.05. GO enrichment analysis was then performed on the differentially
expressed genes. The gene subsets were compared to the high-confidence gene set of
the barley reference genome (Mascher et al., 2017) to determine enriched or depleted
gene ontology terms. For the enrichment analysis, the free and open-source R
package GOstats (Gentleman et al., 2004) (hypergeometric test) was used with a
P-value cutoff of 0.05.
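The multiple-testing adjustment named above can be sketched in a few lines of Python. This is an illustrative re-implementation of the Benjamini-Hochberg step-up procedure, not the code used in this study (which relied on R); the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.6]   # hypothetical raw P-values
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj)
print(significant)
```

With these toy values, the third and fourth tests have raw P-values below 0.05 but adjusted values just above it, which illustrates why the cutoff is applied to the adjusted rather than the raw P-values.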
Whole-genome sequencing.
Genomic DNA was extracted from the leaves of ten barley plants, one plant per
accession. The DNA was analyzed for quality and concentration using agarose gels
and an Agilent Technologies 2100 Bioanalyzer (Agilent Technologies, Santa Clara,
CA, USA). Genomic DNA was fragmented and then used to prepare Illumina paired-end
libraries, following the manufacturer’s instructions (Paired-End Sample
Preparation Guide, Illumina, 1005063). Insert fragment lengths of 500 bp, 700 bp
and 1 kb were generated. A final set of 48 libraries was then sequenced on the HiSeq
2000 platform at BGI-Shenzhen (Beijing Genomics Institute-Shenzhen, Shenzhen,
China).
Genome assembly
All reads were filtered for quality using Trimmomatic (Bolger et al., 2014), with a
window of 4 bp requiring an average >Q20 and minimum read length of 50 bp.
Genome assemblies for each of the eight accessions were performed at the Pawsey
Supercomputing Centre (Murdoch University, Australia) with SOAPdenovo2 (Luo et al.,
2012) at k=51, using 32 GB of memory and six cores. Each assembly was
assessed by its N50 and by the percentage of contigs that could be
mapped to the barley reference genome using BLASTN with a similarity of no less
than 85%.
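For readers unfamiliar with the N50 statistic used to assess the assemblies, the following short Python sketch shows how it can be computed from a list of scaffold lengths. The function name and the toy lengths are hypothetical; the sketch uses the common definition in which N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly.

```python
def n50_stats(lengths):
    """Return (N50, number of scaffolds in the N50 set) for scaffold lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Add scaffolds from longest to shortest until half the assembly is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return length, count


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical scaffold lengths
print(n50_stats(toy_lengths))
```

The second returned value corresponds to the number of scaffolds in the N50 set.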
Variant calling and validation.
The barley pseudomolecule genome (Mascher et al., 2017) was used as the
reference to map the clean reads using BWA-MEM (Li and Durbin, 2010) with
default parameters. PCR duplicates were marked and removed using Picard
(version 1.129). Only reads with a unique mapping position in the reference
genome were retained for detecting genomic variation (SNPs and
InDels). Sequence variations (insertions, deletions and SNPs) were detected by
running three rounds of a variant-calling pipeline combining SAMTools (version 1.7)
plus BCFTools (version 1.7) and the Genome Analysis ToolKit (GATK, version 3.8). Briefly,
the first round was performed with SAMTools plus BCFTools, and the calls were filtered
based on mapping quality and variant-calling quality. The result of the first round was
used to guide realignment around potential InDels and variant calling in the
second round with GATK. The variants detected by both the SAMTools plus
BCFTools pipeline and the GATK pipeline were then used to guide variant calling in the
third round with GATK.
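The step of retaining only variants detected by both pipelines can be pictured as a set intersection. The Python sketch below is a simplified illustration under the assumption that each call is reduced to a (chromosome, position, reference allele, alternative allele) tuple; the function name and the example calls are hypothetical, and real pipelines operate on VCF files rather than in-memory tuples.

```python
def consensus_calls(first_caller, second_caller):
    """Return variants reported by both callers, sorted by chromosome and position."""
    return sorted(set(first_caller) & set(second_caller))


# Hypothetical calls: (chromosome, position, reference allele, alternative allele)
samtools_calls = [
    ("chr3H", 1200, "A", "G"),
    ("chr3H", 5400, "C", "T"),
    ("chr1H", 880, "G", "GA"),
]
gatk_calls = [
    ("chr3H", 1200, "A", "G"),
    ("chr1H", 880, "G", "GA"),
    ("chr2H", 77, "T", "C"),
]

shared = consensus_calls(samtools_calls, gatk_calls)
print(shared)
```

Only calls that agree in position and in both alleles are kept, so a site called with different alternative alleles by the two pipelines is excluded in this sketch.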
Population structure analysis.
The phylogenetic tree was constructed with the neighbour-joining method using
PHYLIP (version 3) based on evolutionary distances, which were computed with the
F84 model using the DNADIST utility in the PHYLIP package. The tree file was
constructed and annotated using MEGA (Tamura et al., 2013) (version 6) and
TreeView (version 1.6.6). Principal component analysis was performed with the
EIGENSOFT program (version 6). Subpopulation components were estimated using
the ADMIXTURE program (version 1.23) with the number of clusters K ranging from 1 to 6.
Selective sweep analysis.
Selective sweep analysis is one of the main methods for detecting signatures of
environmental or natural selection. Genomic regions under selective sweeps due to
environmental stresses on the opposing slopes were identified using the fixation index
FST. The analysis was conducted by measuring the patterns of allele
frequencies in each 1 Mb fragment along the chromosomes using the genome-wide
SNP variation data. Genomic regions with FST values above 0.89 were
considered to be under strong selective sweeps.
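To make the windowed FST scan more concrete, a minimal Python sketch is given below. It is an assumption-laden illustration, not the procedure used in this study: it uses a simple heterozygosity-based estimator, FST = (HT - HS) / HT, pooled over the SNPs in a window for two equally weighted populations, and the function names, the 250 kb default step and the toy allele frequencies are hypothetical choices for the example.

```python
def window_fst(snps, window_start, window_end):
    """Pooled FST for SNPs with window_start <= position < window_end.

    `snps` is a list of (position, p1, p2), where p1 and p2 are the
    frequencies of the same allele in population 1 and population 2.
    Returns None when the window holds no polymorphic SNP.
    """
    numerator = 0.0
    denominator = 0.0
    for pos, p1, p2 in snps:
        if window_start <= pos < window_end:
            p_bar = (p1 + p2) / 2
            h_total = 2 * p_bar * (1 - p_bar)                    # HT
            h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # HS
            numerator += h_total - h_within
            denominator += h_total
    return numerator / denominator if denominator > 0 else None


def sliding_fst(snps, chrom_length, window=1_000_000, step=250_000):
    """Scan a chromosome with overlapping windows; returns (start, end, FST)."""
    results = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        results.append((start, end, window_fst(snps, start, end)))
        start += step
    return results


# Hypothetical SNPs: one fixed difference and one SNP shared at equal frequency.
toy_snps = [(100, 1.0, 0.0), (200, 0.5, 0.5)]
scan = sliding_fst(toy_snps, chrom_length=500_000)
candidates = [w for w in scan if w[2] is not None and w[2] > 0.89]
print(scan)
print(candidates)
```

A window containing only fixed differences yields an FST of 1, whereas SNPs with identical frequencies in both populations pull the pooled value towards 0; windows above the chosen threshold would then be reported as candidate sweep regions.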
Function and pathway annotation.
Function and pathway annotation based on protein homology was performed using
an automatic functional annotation and classification tool (AutoFACT). Briefly,
coding sequences were aligned to the NT database using blastn and to protein
databases including NR, Swiss-Prot and KEGG using blastx with the same
threshold. The function and pathway annotations of the best hits with a
biologically informative description were then assigned to the query sequences.
3.3 Results
3.3.1 Dramatic divergence between wild barleys from SFS
and NFS
To investigate the genetic diversity of wild barleys in the ‘Evolution Canyon’ I (EC)
in Israel, transcriptome sequencing was performed with ten samples from SFS and
NFS (Figure 3-1) in this study. About 3.70 Gb of clean sequencing data was obtained for
each wild barley accession on average. A total of 218,709 SNPs and 8,640 InDels were
detected for these ten barley accessions at the transcriptome level. The SNP number per
individual ranged from 92,556 to 119,675 and the InDel number from 4,174 to 5,407. The
SNP frequency distribution of the ten wild barleys in each 1 Mb region along the seven
chromosomes is shown in a circular plot (Figure 3-3a). Overall, the SNP frequencies
within distal regions are higher than those in centromeric regions. The distribution
patterns are similar for wild barley accessions from the same slope but different for
accessions from opposing slopes.
Figure 3-3 Genomic diversity and evolution analysis based on RNA sequencing of
ten samples from two slopes of “Evolution Canyon”.
a, SNP frequency distribution is displayed as ten circular histograms representing ten
barley accessions from two slopes. From inner to outer, circles 1-10 indicate barley
accessions NFS148, NFS146, NFS145, NFS144, NFS143, SFS125, SFS124,
SFS123, SFS122 and SFS120. The height of the bars indicates the SNP number (logarithm
to the base 10) in each 1 Mb genomic region. b, Neighbour-joining tree
construction. c, Principal component analysis. This analysis was conducted based on
218,709 SNPs without missing data using EIGENSOFT. d, Population structure
analysis was conducted using ADMIXTURE.
To characterize the genetic divergence of wild barleys from the opposing slopes in
the “Evolution Canyon”, neighbour-joining tree, principal component and population
structure analyses were conducted based on the SNP variants. In the
neighbour-joining tree (Figure 3-3b), the five accessions from the SFS were clearly
separated from the five accessions from the NFS. The evolutionary distances were
calculated using the F84 model (Felsenstein, 1981) based on the SNP variants of these ten
wild barley accessions (Table 3-1). The evolutionary distances among the five accessions
in SFS are 0.14~0.23 and those in NFS are 0.16~0.18. However, the distance between the
SFS group and the NFS group is up to 1.03, which indicates dramatic genetic divergence
between SFS and NFS wild barleys. Wild barleys from the opposing slopes were also
clearly split into two clusters along the first eigenvector, which explains
up to 56% of the genetic difference between SFS and NFS wild barleys (Figure 3-3c). The
marked genetic divergence of wild barley between the two slopes was also confirmed
by population structure analysis (Figure 3-3d). When the K value was set to two, the five
wild barleys from SFS were inferred to originate from one ancestor while the five NFS
wild barleys were from another ancestor.
3.3.2 Bigger genomic divergence between SFS and NFS wild
barleys than that between Tibetan wild barleys and
domesticated barleys.
To further investigate the genomic divergence of wild barleys from the opposing
slopes, we performed high coverage whole genome sequencing with another ten
barley accessions. They represent four barley groups: six wild barley accessions
collected from the opposing slopes of the ‘Evolution Canyon’ (SFS and NFS), two
wild barley accessions from the Tibetan Plateau (TB1, TB2) and two cultivated
barleys (DB1 and DB2) (Table 3-1). About 190 Gb of clean data was obtained for
each barley accession on average, which corresponded to about 39.4× genome
coverage of the recently published barley genome reference (Mascher et al., 2017).
De novo assembly was performed for eight out of the ten accessions with the
assembly length ranging from 3.63 to 4.54 Gb and N50 ranging from 7.18 to 10.89
Kb (Table 3-2). More than 99% of clean reads were mapped to the barley reference
genome (Mascher et al., 2017). Up to 84% of the reference sequence positions were
sequenced at least five times. Based solely on reads with a unique mapping position in
the reference genome, 63,565,829 single nucleotide polymorphisms (SNPs) and 2,673,996
short insertions/deletions (InDels) were identified across the ten barley accessions.
The distribution of SNPs in each 1 Mb region along the chromosomes is plotted in
Figure 3-4a. In contrast to the RNA sequencing data, the SNPs were evenly distributed in
the wild barley genomes from EC. However, a loss of genetic diversity in some
chromosome regions was clearly evident in the cultivated barleys and even in the
Tibetan wild barleys.
Table 3-1 Output statistics of whole-genome shotgun sequencing of ten barley accessions.
NCBI-ID Sample Description Reads Read length (bp) Clean data (nt) Coverage (×)
SAMN05207298 DB1 Canadian cultivated barley 2,025,830,279 95 192,453,876,505 39.81
SAMN05207297 DB2 Australian cultivated barley 2,083,773,587 95 197,958,490,765 40.95
SAMN05207291 TB1 Tibetan wild barley 2,107,917,374 95 200,252,150,530 41.42
SAMN05207292 TB2 Tibetan wild barley 2,003,555,938 95 190,337,814,110 39.38
SAMN05207293 SFS2.1 SFS wild barley 2,025,527,964 95 192,425,156,580 39.81
SAMN05207294 SFS2.2 SFS wild barley 1,902,396,508 100 190,239,650,800 39.36
SAMN08038525 SFS2.3 SFS wild barley 721,898,538 150 108,284,780,700 22.36
SAMN08038526 NFS6.1 NFS wild barley 787,729,618 150 118,189,116,000 24.43
SAMN05207295 NFS7.1 NFS wild barley 1,668,485,369 95 158,506,110,055 32.79
SAMN05207296 NFS7.2 NFS wild barley 2,016,099,086 100 201,609,908,600 41.71
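The relationship between the columns of Table 3-1 can be checked with a short Python sketch: clean data is the number of reads multiplied by the read length, and coverage is clean data divided by the genome size. The genome size of about 4.83 Gb used here is an assumption back-calculated from the DB1 row of the table, not a value stated in the text, and the function names are hypothetical.

```python
# Assumption: genome size inferred from Table 3-1 (clean data / coverage for DB1).
ASSUMED_GENOME_SIZE_BP = 4.834e9


def clean_bases(n_reads, read_length_bp):
    """Total number of sequenced bases for reads of uniform length."""
    return n_reads * read_length_bp


def fold_coverage(total_bases, genome_size_bp=ASSUMED_GENOME_SIZE_BP):
    """Average sequencing depth over the genome."""
    return total_bases / genome_size_bp


# DB1 row of Table 3-1: 2,025,830,279 reads of 95 bp.
db1_bases = clean_bases(2_025_830_279, 95)
db1_coverage = fold_coverage(db1_bases)
print(db1_bases, round(db1_coverage, 2))
```

Under the assumed genome size, the DB1 row reproduces the tabulated clean data and a coverage close to the reported 39.81.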
Table 3-2 Assembly statistics of eight barley accessions.
Variety | Genome length (Gbp) | GC (%) | Total number of scaffolds (>= 100 bp) | Number of scaffolds | Number of scaffolds in N50 set | Minimum scaffold length (bp) | N50 (bp) | Maximum scaffold length (bp)
DB1 | 4.53 | 43.8 | 7,237,863 | 353,225 | 63,409 | 1,000 | 7,184 | 85,370
DB2 | 4.54 | 44.3 | 4,714,325 | 366,546 | 64,118 | 1,000 | 8,288 | 141,105
SFS2.1 | 4.49 | 44.4 | 4,818,935 | 352,645 | 60,957 | 1,000 | 8,764 | 113,245
SFS2.2 | 3.63 | 44.2 | 5,063,088 | 351,782 | 58,896 | 1,000 | 9,499 | 113,496
NFS7.1 | 4.47 | 44.3 | 5,138,819 | 354,303 | 62,415 | 1,000 | 8,374 | 132,282
NFS7.2 | 3.66 | 44.3 | 4,353,671 | 357,168 | 56,429 | 1,000 | 10,885 | 149,690
TB1 | 4.49 | 44.2 | 5,042,536 | 362,319 | 64,848 | 1,000 | 8,020 | 94,844
TB2 | 4.42 | 44.2 | 4,950,063 | 320,524 | 52,623 | 1,000 | 10,199 | 145,095
Figure 3-4 Genomic diversity and evolution analysis based on whole genome
sequencing of ten samples from four different groups.
a, SNP frequency distribution is displayed as ten circular histograms representing
ten barley accessions. From inner to outer, circle 1-10 indicates barley accessions of
DB2, DB1, TB2, TB1, SFS2.2, SFS2.1, SFS2.3, NFS6.1, NFS7.2, NFS7.1. The
height of bars indicates the SNP number (logarithm to the base 10) in each 1 Mb
genomic region. b, Neighbour-joining tree construction. It was conducted using
PHYLIP based on the sequence distance, which was calculated from 29,429 SNPs
evenly distributed on each chromosome. c, Principal component analysis. This
analysis was conducted based on 54,255,907 SNPs without missing data using
EIGENSOFT. d, Population structure analysis was conducted using ADMIXTURE.
Based on the genome-wide SNPs, neighbour-joining tree, principal component and
population structure analyses were conducted to compare the evolutionary distance of
wild barley between the opposing slopes with that between Tibetan wild barley and
cultivated barley. In the neighbour-joining tree (Figure 3-4b), the six wild barleys from
the “Evolution Canyon” were separated from both cultivated barley and Tibetan wild
barley. The evolutionary distance between SFS and NFS wild barleys is about 0.60, which
is about twice that between Tibetan wild barley and cultivated barley (~0.30) (Table
3-3). In the principal component analysis plot (Figure 3-4c), the six EC wild barleys were
separated from Tibetan wild barley and cultivated barley along the first eigenvector
(~25%). The three SFS wild barleys were separated from the three NFS wild barleys along
the second eigenvector (~21%). Both the neighbour-joining tree and the PCA revealed a
distinct separation of the NFS and SFS wild barley populations, in line with the
sharply divergent microclimates of the two slopes at EC. To estimate the individual
evolutionary ancestry and admixture proportions, a population genetic structure
analysis with K values from 2 to 4 was conducted (Figure 3-4d). When K was set to 3,
NFS and SFS wild barleys were inferred to be derived from two different ancestors
while domesticated barley and Tibetan wild barleys were derived from a third ancestor.
When K was set to 4, the four groups of barleys were inferred to be derived from four
different ancestors. This result also shows that the evolutionary relationship between
SFS and NFS wild barleys is much more distant than that between Tibetan wild barley
and cultivated barley.
Table 3-3 Evolutionary distance among ten barley accessions based on whole-genome shotgun sequencing
sample DB1 DB2 TB1 TB2 SFS2.1 SFS2.2 SFS2.3 NFS6.1 NFS7.1 NFS7.2
DB1 0.00
DB2 0.06 0.00
TB1 0.28 0.30 0.00
TB2 0.29 0.31 0.39 0.00
SFS2.1 0.58 0.60 0.63 0.65 0.00
SFS2.2 0.57 0.59 0.63 0.60 0.40 0.00
SFS2.3 0.55 0.57 0.61 0.63 0.39 0.36 0.00
NFS6.1 0.57 0.59 0.57 0.60 0.61 0.55 0.50 0.00
NFS7.1 0.57 0.59 0.56 0.57 0.56 0.52 0.64 0.43 0.00
NFS7.2 0.58 0.60 0.57 0.58 0.57 0.52 0.65 0.43 0.02 0.00
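The group-level comparison drawn from Table 3-3 can be reproduced with a brief Python sketch that averages the pairwise distances between two groups of accessions. The distances below are copied from the table; the dictionary layout and function name are hypothetical choices made for this illustration.

```python
# Pairwise distances copied from Table 3-3 (only the pairs needed here).
DISTANCE = {
    ("TB1", "DB1"): 0.28, ("TB1", "DB2"): 0.30,
    ("TB2", "DB1"): 0.29, ("TB2", "DB2"): 0.31,
    ("NFS6.1", "SFS2.1"): 0.61, ("NFS6.1", "SFS2.2"): 0.55, ("NFS6.1", "SFS2.3"): 0.50,
    ("NFS7.1", "SFS2.1"): 0.56, ("NFS7.1", "SFS2.2"): 0.52, ("NFS7.1", "SFS2.3"): 0.64,
    ("NFS7.2", "SFS2.1"): 0.57, ("NFS7.2", "SFS2.2"): 0.52, ("NFS7.2", "SFS2.3"): 0.65,
}


def mean_between_groups(group_a, group_b, distance):
    """Average distance over all pairs with one member from each group."""
    values = []
    for a in group_a:
        for b in group_b:
            # The table is lower-triangular, so look the pair up in either order.
            key = (a, b) if (a, b) in distance else (b, a)
            values.append(distance[key])
    return sum(values) / len(values)


sfs = ["SFS2.1", "SFS2.2", "SFS2.3"]
nfs = ["NFS6.1", "NFS7.1", "NFS7.2"]
tibetan = ["TB1", "TB2"]
cultivated = ["DB1", "DB2"]

sfs_vs_nfs = mean_between_groups(sfs, nfs, DISTANCE)
tibetan_vs_cultivated = mean_between_groups(tibetan, cultivated, DISTANCE)
print(round(sfs_vs_nfs, 2), round(tibetan_vs_cultivated, 2))
```

The two averages come out close to the values of about 0.60 and 0.30 quoted in the text, that is, roughly a two-fold difference.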
3.3.3 Genomic regions under environmental selection of NFS
and SFS
To uncover the genetic basis of the remarkable genetic divergence and to infer
genomic regions subjected to environmental selection, Wright’s F-statistic FST was
calculated between the SFS and NFS wild barley groups. The frequency distribution of
FST values between SFS and NFS wild barleys was multimodal.
The distribution of the windowed FST values between NFS and SFS wild
barley along each chromosome is shown in Figure 3-6. The total length of genomic regions
with FST values above the threshold of 0.89 was about 243 Mb, containing 919
high-confidence genes. A continuous genomic region of about 200 Mb in the central
part of chromosome 3H showed strong environmental selection.
Figure 3-5 Distribution of FST values across the whole genome.
The FST value was calculated in each 1 Mb window with a step of 250 kb. The x-axis
indicates the FST value, and the y-axis shows its frequency.
Figure 3-6 Selection sweep analysis in whole genome.
It was conducted based on 54,255,907 SNPs from whole genome sequencing
without missing data using VCFtools. The red horizontal line indicates the 95%
threshold, which corresponds to an FST value of 0.89.
To further investigate whether the environmental selection resulted in speciation
between the SFS and NFS wild barleys, we searched for homologues of candidate genes
responsible for reproductive isolation in several plant species including crops (Bikard
et al., 2009; Chen et al., 2008; Long et al., 2008; Mizuta et al., 2010; Yamagata et al.,
2010; Yang et al., 2012). Using BLASTP, with a threshold E-value ≤2.00E-56 and
an aligned amino acid length >100, we identified 22 homologous genes for
reproductive isolation in the barley genome. A total of 88 unique SNPs or InDels were
found between the NFS and SFS genotypes in twelve genes related to reproductive
isolation (Table 3-4). Five of these genes were in the above selective sweep regions
(HORVU4Hr1G084220, HORVU3Hr1G052250, HORVU1Hr1G060040,
HORVU3Hr1G107970 and HORVU3Hr1G107990). Notably, two genes
(HORVU3Hr1G052250 and HORVU1Hr1G060040) mapped in the selection region
showed unique variants between the SFS and NFS populations. A gene homologous to
the rice hybrid male sterility locus SaF (HORVU3Hr1G052250), which resulted in
indica-japonica speciation through a two-gene/three-component interaction model (Long
et al., 2008), was located in the selective sweep signature region of chromosome 3H
(Figure 3-6). A substitution of Phe287Ser between SaF+ and SaF- impairs biological
function and results in male sterility in the indica-japonica hybrid. It is interesting to
observe a substitution of Thr179Ala between the SFS and NFS genotypes. Whether
this substitution impacts reproduction between inter-slope wild barley populations
remains to be validated.
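The BLASTP filtering criteria stated above (E-value ≤2.00E-56 and aligned length >100 amino acids) can be expressed as a small Python filter. This is a sketch only: the function name and the tuple layout are hypothetical, the first two hits are taken from Table 3-4, and the last two are invented counter-examples labelled as such.

```python
def passes_filter(evalue, aligned_length, max_evalue=2e-56, min_aligned=100):
    """Apply the two thresholds: E-value <= 2e-56 and aligned length > 100."""
    return evalue <= max_evalue and aligned_length > min_aligned


# (query, target, aligned length, E-value)
hits = [
    ("SaF+", "HORVU3Hr1G052250.6", 472, 2e-84),   # from Table 3-4
    ("TTG2", "HORVU2Hr1G036320.4", 261, 2e-56),   # from Table 3-4, at the E-value limit
    ("hypothetical_query", "hypothetical_short_hit", 80, 1e-60),   # invented: too short
    ("hypothetical_query", "hypothetical_weak_hit", 300, 1e-20),   # invented: weak E-value
]

kept = [hit for hit in hits if passes_filter(hit[3], hit[2])]
print([hit[1] for hit in kept])
```

Both thresholds must hold at the same time, so a long alignment with a weak E-value and a strong but short alignment are both discarded in this sketch.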
Table 3-4 Barley homologues of plant reproductive isolation genes
Speciation Related Genes AA-len Target Target Length Aligned-Length Identity E-value Mismatch Gaps
HBD2 463 HORVU6Hr1G058230.9 418 431 88.17 0 35 16
L27 145 HORVU2Hr1G066830.8 146 145 84.83 9.00E-90 22 0
HPA1 417 HORVU6Hr1G069390.6 263 260 74.62 5.00E-145 66 0
S5-3 470 HORVU7Hr1G107190.2 455 326 50.61 7.00E-77 131 30
S5 357 HORVU4Hr1G084220.1 574 364 47.8 2.00E-106 168 22
CF2 344 HORVU7Hr1G044230.3 341 313 46.96 1.00E-95 156 10
CF2 344 HORVU6Hr1G084520.5 340 342 45.91 2.00E-98 172 13
CF2 344 HORVU1Hr1G060040.1 342 342 45.91 2.00E-102 172 13
TTG2 429 HORVU5Hr1G028340.3 356 256 45.7 1.00E-61 90 49
CF2 344 HORVU2Hr1G009150.1 342 339 45.13 5.00E-99 172 14
CF2 344 HORVU4Hr1G006310.1 346 344 43.6 9.00E-97 184 10
TTG2 429 HORVU2Hr1G036320.4 553 261 43.3 2.00E-56 116 32
TTG2 429 HORVU3Hr1G088200.6 405 319 41.38 3.00E-57 123 64
S5-3 470 HORVU5Hr1G078400.1 672 391 39.64 1.00E-75 213 23
S5-3 470 HORVU5Hr1G014490.1 430 421 39.43 7.00E-76 193 62
S5-3 470 HORVU6Hr1G008880.6 557 402 38.81 4.00E-76 221 25
SaF+ 476 HORVU4Hr1G021110.1 512 397 38.04 2.00E-71 235 11
SaF+ 476 HORVU6Hr1G025790.1 544 487 37.37 1.00E-79 260 45
SaF+ 476 HORVU3Hr1G052250.6 503 472 37.29 2.00E-84 266 30
Tcr:Smoke1 504 HORVU1Hr1G081310.3 359 319 36.05 3.00E-59 196 8
Tcr:Smoke1 504 HORVU3Hr1G107990.1 553 316 35.76 1.00E-60 193 10
Tcr:Smoke1 504 HORVU3Hr1G107970.4 518 316 35.76 2.00E-60 193 10
3.3.4 Differential drought stress response of SFS and NFS
wild barleys
Drought is a key climatic stress that distinguishes SFS and NFS. To compare the
drought stress response of wild barleys from the two slopes, transcriptome
sequencing was performed for SFS and NFS wild barleys with a treatment simulating
the drought stress of the SFS. About 44.40 Gb of clean data was generated for twelve
samples comprising SFS and NFS wild barleys under two conditions with three
replicates. Differential gene expression was analysed for SFS and NFS wild barley
under drought stress, and an overview is given in Table 3-5. A total of 274 genes
were differentially expressed in both SFS and NFS wild
barley under drought. A total of 2,791 genes were differentially expressed only in SFS
wild barley under drought stress, whereas 1,030 genes were differentially expressed only
in NFS wild barley (Figure 3-7). The density of uniquely differentially expressed genes in
each 10 Mb region along the seven chromosomes is shown in Figure 3-8. We also
plotted a normal distribution for comparison and found that the density distribution of
DEGs was more pronounced than the normal distribution. GO enrichment analysis was
conducted for DEGs in SFS and NFS wild barleys under drought stress (Figures 3-9 and
3-10). Over-represented GO terms for DEGs in SFS wild barley include translational
elongation, ribosome biogenesis, organic acid metabolic process and lipid metabolic
process, among others (Figure 3-9). The over-represented GO terms in NFS wild barley
differ from those in SFS (Figure 3-10). For example, lipid metabolic process is
over-represented in the DEGs of SFS wild barley, while it is under-represented in the
DEGs of NFS wild barley. The over-represented GO terms in NFS wild barley
include amine metabolic process and tryptophan metabolic process, among others.
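The hypergeometric test underlying the GO over-representation analysis can be illustrated with a minimal Python sketch. It computes the upper-tail probability of observing at least k annotated genes among n DEGs when K of the N background genes carry the GO term. The function name and the toy numbers are hypothetical, and this is not the GOstats implementation itself.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    of which K carry the GO term of interest."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy example: 10 background genes, 4 annotated with the term,
# 3 DEGs of which 2 are annotated.
p_over = hypergeom_upper_tail(k=2, n=3, K=4, N=10)
print(p_over)
```

A small upper-tail probability indicates over-representation of the term among the DEGs; under-representation can be assessed analogously with the lower tail.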
Table 3-5 Summary of DEGs in SFS and NFS wild barley under drought treatment
Treatments Total DEG Up-regulation Down-regulation
SFS 2791 1607 1184
NFS 1030 212 818
overlap 274 129 145
Figure 3-7 Statistics of DEGs under drought stress treatment in SFS and NFS.
Yellow circle indicates numbers of DEGs in SFS, blue for DEGs in NFS and overlap
between them is the DEGs detected in both SFS and NFS.
Figure 3-8 Density of DEGs in each 10 Mb region along chromosomes under
drought stress treatment in SFS and NFS.
The y-axis indicates the number of differentially expressed genes in each 10 Mb region
and the x-axis indicates physical position. Blue dots represent DEGs detected only in SFS
under drought stress treatment, red dots represent DEGs in NFS and green dots represent
DEGs detected in both SFS and NFS.
Figure 3-9 Over-represented and under-represented GO terms for differentially
expressed genes in SFS wild barley under drought stress.
Figure 3-10 Over-represented and under-represented GO terms for differentially
expressed genes in NFS wild barley under drought stress.
In theory, DEGs responding to drought treatment in SFS wild barley are more likely to
play important roles in enhancing drought tolerance. Thus, we compared the protein
similarity between genes shown to be responsible for drought tolerance in nine
species (Alter et al., 2015) and the DEGs detected under drought in SFS but not in
NFS wild barley. In particular, we found that 46 of them have homologous genes with
experimental evidence of controlling plant drought resistance (Table 3-6). For example,
the protein sequence identity between HORVU2Hr1G060350 and CIPK3 is up to
64% with a coverage of 95% (425/446 amino acids). CIPK3 was reported to regulate
abscisic acid and cold signal transduction as a calcium sensor-associated protein
kinase in Arabidopsis (Kim et al., 2003). It was also demonstrated that the expression
of CIPK3 is responsive to ABA and stress conditions and that the expression patterns of
a number of stress gene markers were altered by the disruption of CIPK3. Another
drought-responsive DEG in SFS, HORVU2Hr1G075470, was
found to have 72% protein sequence identity with the SRK2C gene at 92% coverage.
SRK2C was reported to improve drought tolerance by controlling the
expression level of stress-responsive genes in Arabidopsis (Umezawa et al., 2004).
Table 3-6 Homology of plant drought related genes in barley
Index | DEGs-SFS | drought-genes | gene / ID / organism | function
1 | HORVU1Hr1G004150 | LOC_Os01g42860 | OCPI1 / LOC_Os01g42860 / Oryza sativa | Oryza sativa chymotrypsin inhibitor-like 1
2 | HORVU1Hr1G020690 | Solyc11g018800 | PO2 / Solyc11g018800 / Capsicum annuum | extracellular peroxidase 2
3 | HORVU1Hr1G049210 | AT1G10370 | GSTU17 / AT1G10370 / Arabidopsis thaliana | glutathione S-transferase U17
4 | HORVU1Hr1G049270 | AT1G10370 | GSTU17 / AT1G10370 / Arabidopsis thaliana | glutathione S-transferase U17
5 | HORVU1Hr1G063560 | AT2G18960 | OST2 / AT2G18960 / Arabidopsis thaliana | plasma membrane proton ATPase
6 | HORVU2Hr1G009580 | AT2G47800 | MRP4 / AT2G47800 / Arabidopsis thaliana | multidrug resistance-associated protein, ABC transporter
7 | HORVU2Hr1G009940 | Solyc01g111510 | APX / Solyc01g111510 / Populus tomentosa | ascorbate peroxidase
8 | HORVU2Hr1G010990 | AT3G53420 | BnPIP1 / AT3G53420 / Brassica napus | PIP
9 | HORVU2Hr1G018260 | LOC_Os11g02240 | CIPK15 / LOC_Os11g02240 / Oryza sativa | calcineurin B-like protein-interacting protein kinase
10 | HORVU2Hr1G018390 | Solyc11g018800 | PO2 / Solyc11g018800 / Capsicum annuum | extracellular peroxidase 2
11 | HORVU2Hr1G045200 | AT1G10370 | GSTU17 / AT1G10370 / Arabidopsis thaliana | glutathione S-transferase U17
12 | HORVU2Hr1G060350 | LOC_Os07g48760 | CIPK03 / LOC_Os07g48760 / Oryza sativa | calcineurin B-like protein-interacting protein kinase
13 | HORVU2Hr1G075470 | AT1G78290 | SRK2C / AT1G78290 / Arabidopsis thaliana | SNF1-related protein kinase 2, osmotic-stress-activated protein kinase
14 | HORVU2Hr1G078570 | Solyc00g015750 | ALR / Solyc00g015750 / Medicago sativa | aldose/aldehyde reductase
15 | HORVU2Hr1G079920 | AT1G15820 | LHCB6 / AT1G15820 / Arabidopsis thaliana | light harvesting chlorophyll a/b binding protein
16 | HORVU2Hr1G105510 | AT2G47800 | MRP4 / AT2G47800 / Arabidopsis thaliana | multidrug resistance-associated protein, ABC transporter
17 | HORVU2Hr1G111760 | LOC_Os04g56130 | GbRLK / LOC_Os04g56130 / Gossypium barbadense | Receptor-like kinase
18 | HORVU3Hr1G007410 | Solyc00g015750 | ALR / Solyc00g015750 / Medicago sativa | aldose/aldehyde reductase
19 | HORVU3Hr1G023220 | AT5G49890 | CLCc / AT5G49890 / Arabidopsis thaliana | chloride channel
20 | HORVU3Hr1G029350 | LOC_Os11g02240 | CIPK15 / LOC_Os11g02240 / Oryza sativa | calcineurin B-like protein-interacting protein kinase
21 | HORVU3Hr1G038290 | Solyc11g018800 | PO2 / Solyc11g018800 / Capsicum annuum | extracellular peroxidase 2
22 | HORVU3Hr1G073100 | LOC_Os01g55450 | CIPK12 / LOC_Os01g55450 / Oryza sativa | calcineurin B-like protein-interacting protein kinase
23 | HORVU3Hr1G111640 | AT1G10370 | GSTU17 / AT1G10370 / Arabidopsis thaliana | glutathione S-transferase U17
24 | HORVU4Hr1G022630 | LOC_Os11g02240 | CIPK15 / LOC_Os11g02240 / Oryza sativa | calcineurin B-like protein-interacting protein kinase
25 | HORVU4Hr1G052200 | LOC_Os07g48760 | CIPK03 / LOC_Os07g48760 / Oryza sativa | calcineurin B-like protein-interacting protein kinase
26 | HORVU4Hr1G055220 | AT1G01360 | PYL9/RCAR1 / AT1G01360 / Arabidopsis thaliana | soluble ABA receptor interacts with and regulates PP2Cs ABI1 and ABI2
27 | HORVU4Hr1G080820 | MLOC_77777 | HVA1 / MLOC_77777 / Hordeum vulgare | group 3 late embryogenesis abundant protein
28 | HORVU5Hr1G000780 | AT2G18960 | OST2 / AT2G18960 / Arabidopsis thaliana | plasma membrane proton ATPase
29 | HORVU5Hr1G030810 | LOC_Os03g03660 | CDPK7 / LOC_Os03g03660 / Oryza sativa | calcium-dependent protein kinase
30 | HORVU5Hr1G045850 | AT1G10370 | GSTU17 / AT1G10370 / Arabidopsis thaliana | glutathione S-transferase U17
31 | HORVU5Hr1G048240 | AT2G29980 | FAD3 / AT2G29980 / Brassica napus | fatty acid desaturase
32 | HORVU5Hr1G054510 | LOC_Os09g13570 | OsbZIP71 / LOC_Os09g13570 / Oryza sativa | bZIP transcription factor
33 | HORVU5Hr1G077910 | AT1G52400 | AtBG1 / AT1G52400 / Arabidopsis thaliana | beta-glucosidase; hydrolyzes glucose-conjugated, inactive ABA to active ABA
34 | HORVU5Hr1G080300 | MLOC_54227 | CBF4 / MLOC_54227 / Hordeum vulgare | CBF/DREB TF
35 HORVU5Hr1G093660 LOC_Os01g55450 CIPK12 / LOC_Os01g55450 / Oryza
sativa calcineurin B-like protein-interacting protein kinase
36 HORVU5Hr1G097500 Solyc04g077980 DgZFP3 / Solyc04g077980 /
Chrysanthemum x morifolium
Cys2/His2-type zinc finger
protein gene
37 HORVU5Hr1G103430 AT1G10370 GSTU17 / AT1G10370 / Arabidopsis
thaliana glutathion s-transferase U17
38 HORVU5Hr1G103890 AT2G27150 AAO3 / AT2G27150 / Arabidopsis
thaliana
Arabidopsis aldehyde oxidase, catalyzes final step in ABA
biosynthesis
39 HORVU6Hr1G019540 AT4G31120 CAU1 / AT4G31120 / Arabidopsis
thaliana
H4R3sme2 (for histone H4 Arg
3 with symmetric dimethylation)-type histone
72
Index DEGs-SFS drought-genes gene / ID / organism function
methylase protein arginine methytransferase5/Shk1
binding protein1
40 HORVU6Hr1G035470 AT5G67300 MYB44 / AT5G67300 / Arabidopsis
thaliana MYB type TF
41 HORVU7Hr1G025000 AT2G47800 MRP4 / AT2G47800 / Arabidopsis
thaliana
multidrug resistance-associated
protein, ABC transporter
42 HORVU7Hr1G028910 AT1G15690 AVP1 / AT1G15690 / Arabidopsis
thaliana vacuolar membrane H+-
Pyrophosphatase
43 HORVU7Hr1G038770 AT1G75270 DHAR2 / AT1G75270 / Arabidopsis
thaliana dehydroascorbate reductase
44 HORVU7Hr1G078440 MLOC_62301 Srg6 / MLOC_62301 / Triticum
aestivum stress-responsive gene, TF
45 HORVU7Hr1G080550 Solyc11g018800 PO2 / Solyc11g018800 / Capsicum
annuum extracellular peroxidase 2
46 HORVU7Hr1G089510 LOC_Os01g55450 CIPK12 / LOC_Os01g55450 / Oryza
sativa calcineurin B-like protein-interacting protein kinase
73
3.4 Discussion
Cultivated barley was domesticated around 10,000 years ago in the Fertile Crescent,
where its wild relatives still thrive today. However, a large proportion of the suitable
habitat for wild barley has been lost to climate change since the last glacial
age, when the climate was much wetter and cooler (Russell et al., 2014). Located at
Mount Carmel in Israel, EC is a natural habitat of wild barley and is often referred to
as the ‘Israeli Galapagos’ (Nevo, 2014a). The SFS in EC features extreme levels of
solar radiation (200‒800% higher than the NFS) and the associated higher daily
temperatures and drought (Nevo, 1995, 2012; Pavlícek et al., 2003). The NFS is
characterized by a cooler climate with higher relative humidity (1–7%) than the SFS.
Thus, EC provides an ideal model to understand the impact of global warming on
genome evolution. Despite evidence of substantial gene flow between the abutting
slopes, significant interslope genetic variation and diversity were detected. In this
study, genome-wide diversity was investigated for both SFS and NFS wild
barleys, and dramatic genetic divergence was found between them at both the
transcriptome level and the whole-genome level. The results also
suggest that the genetic divergence between SFS and NFS wild barleys is much
larger than that between Tibetan wild barley and cultivated barley. All this evidence
indicates that sympatric speciation is occurring in wild barley in the EC microclimate
environment, which coincides with reports in other species (Hadid et al., 2014;
Li et al., 2016b; Parnas, 2006).
Considering that limited gene flow may be a major factor driving the sympatric
speciation of wild barley on the opposing slopes, allele frequencies were analysed
across the whole genome using Wright’s F-statistic, FST. Genomic regions totalling up
to 243 Mb and containing 919 high-confidence genes were shown to be under selection
by the differing microclimates of the SFS and NFS. Of these 919 genes, 52 were
differentially expressed in SFS wild barley under drought stress and 18 were
differentially expressed in NFS wild barley. A large genomic region of about
200 Mb in the central part of chromosome 3H showed dramatic environmental
selection. We also performed fluorescence in situ hybridization (FISH) to compare
the overall karyotype of each chromosome between SFS and NFS wild
barleys (Figure 3-11). A dramatic karyotypic difference was detected on
chromosome 3H between SFS and NFS wild barley.
Figure 3-11 Karyotypical comparison of wild barley NFS6.1 and SFS2.3 and
cultivated barley Golden Promise by FISH.
The most contrasting environmental differences between the opposing slopes are
drought and heat, which result from different levels of solar radiation. Drought alters
the expression of genes encoding compounds that protect cells from dehydration,
including genes belonging to the Dehydrin (Dhn) gene group and heat shock protein 17.
Suprunova et al. (2004) measured the expression levels of Dhn genes in tolerant and
sensitive wild barley genotypes, and found that Dhn levels increased earlier and with
higher intensity in tolerant genotypes than in sensitive genotypes. Confirming these
findings, Yang et al. (2009) detected higher levels of Dhn expression in the resistant
SFS than in the sensitive NFS wild barley after dehydration. To investigate the
influence of drought stress on transcription in wild barley from the
opposing slopes, we performed transcriptome sequencing and detected DEGs
responding to the drought treatment. The number of DEGs in SFS barley was about
three times that in NFS barley after dehydration stress, further supporting interslope
genotypic divergence between tolerant genotypes on the SFS and sensitive genotypes
on the NFS. The lipid metabolic process was over-represented among the DEGs of SFS
wild barley but under-represented among the DEGs of NFS wild barley. It was
previously reported that adjustment of the lipid content of cell membranes plays a
vital role in the response to drought stress in Arabidopsis (Gigon et al., 2004). Thus,
adjustment of lipid metabolism may be an important mechanism by which wild barley
on the SFS adapts to the extreme drought and heat.
Diverse stress factors such as drought can either accelerate or delay flowering in
many plant species. Alteration of flowering in response to stress is known as
stress-induced flowering, which has recently been considered a third category of
flowering response in addition to photoperiodic flowering and vernalisation (Takeno, 2016).
Wild barley from the SFS flowered one to two weeks earlier than NFS wild barley, even
under well-watered conditions (Parnas, 2006). Differences in flowering date are also
assumed to be one factor limiting gene flow between SFS and NFS wild barley.
Recently, Nevo et al. (Nevo, 2012) reported that wild barley and wild emmer wheat
(Triticum dicoccoides) populations across Israel flowered about ten days earlier in
2009 than in 1980 as a result of rising average temperatures due to global
warming. In this study, seven genes with roles in flowering time were detected
within the selective sweep regions, including PHYB and GI. PHYB modulates gene
activity to influence plant growth and development and increases drought tolerance
by enhancing abscisic acid (ABA) sensitivity in Arabidopsis thaliana
(Gonzalez-Guzman et al., 2012). Here, the PHYB transcript was present at higher levels
in SFS and lower levels in NFS wild barley after drought, suggesting that higher PHYB
expression alleviates the symptoms of stress more effectively in SFS wild barley.
Drought also alters the circadian clock, affecting the expression of circadian clock
genes such as GIGANTEA (GI), which mediate different signalling pathways that
modulate drought stress perception and responses (Riboni et al., 2013). In
Arabidopsis, GI enables the drought escape response via ABA-dependent activation
of the flowering time genes FLOWERING LOCUS T (FT), TWIN SISTER OF FT
(TSF) and SUPPRESSOR OF OVEREXPRESSION OF CONSTANS (SOC1).
Drought is a stress that affects both SFS and NFS plants; accordingly, both had
increased GI levels after drought. The genotype from NFS was particularly vulnerable to drought,
such that GI was more up-regulated than down-regulated in SFS to potentially induce
earlier flowering. Compared with cultivated barley, in which photoperiodic response
and vernalisation are the major factors controlling flowering, the wild barleys in EC
have evolved a different mechanism to control flowering time, acting through the
light signalling pathway.
Sympatric speciation, the evolution of reproductive isolation without geographic
barriers, is a highly contentious concept in evolutionary biology (Bird et al., 2012;
Bolnick and Fitzpatrick, 2007). Whether natural selection can overcome the
continuous homogenizing force of free gene flow under sympatric ecological conditions
and generate reproductive incompatibility is strongly debated. Our study is the
first to use whole-genome sequencing and transcriptome sequencing to
investigate the pattern of genomic differentiation under sympatric speciation in a plant
species. A genome-wide FST analysis revealed detailed information on the genomic
differentiation between SFS and NFS barley. Notably, the total genomic region
identified as a divergence outlier (FST > 0.89) was about 243 Mb ( 3-6), representing
~5% of the barley reference genome, which falls within the 0.4–35.5% range reported
for other plants under divergent selection (Strasburg et al., 2012). This value is
higher than the 3.6% reported for Oenanthe conioides and Oenanthe aquatica, two
Apiaceae species undergoing sympatric speciation (Westberg and Kadereit, 2014). In
another study, on the sympatric speciation of a palm on an oceanic island, only
four of the 274 AFLP loci differed strongly between the two diverging populations
(Savolainen et al., 2006). In our study, the wild barleys on the two opposing slopes are
under diverse and strong environmental selection involving heat, drought,
illuminance and pathogen stresses (Nevo, 1995; Nevo, 2014a), which may explain
why our result differs from those of the other two sympatric speciation cases, where
ecological selection in the corresponding plant populations was driven by a single
environmental factor such as soil pH or freshwater tides (Savolainen et al., 2006).
Moreover, the proportion of the genome under divergence in both of those studies
was estimated from AFLP analyses, which may be biased and unable to uncover all
the genomic regions under natural selection. Interestingly, genomic divergence
analyses of the SFS and NFS barley revealed one prominent ‘chromosome island’ on
chromosome 3H under strong natural selection, spanning a continuous region of
around 200 Mb ( 3-6). This ‘unusual’ divergence signature represents a typical
genomic differentiation pattern expected for sympatric speciation, where only those
genetic regions directly related to the specific environmental selection pressure (most
likely heat, drought and illuminance stresses) are differentiated, while the remainder
of the genome is subjected to continuous homogenizing forces due to free gene flow
in the sympatric setting (Strasburg et al., 2012). This is consistent with the genetic
differentiation pattern reported in the other two sympatric speciation cases in plants
(Savolainen et al., 2006). In contrast, genetic differences are expected to accumulate
throughout the whole genome under allopatric speciation with geographic isolation (Via
and West, 2008). Our results lend further support, at the level of genetic divergence,
to the sympatric speciation of wild barley at EC. Detailed examination of the genomic
function of this chromosome island may shed light on the speciation process in
plants and allow us to understand better how wild barley evolves to adapt to highly
diverse environmental conditions across the world. Our results are also consistent with
previous studies based on F1 hybrids from 40 intraslope and 50 interslope crosses
tested for a range of vegetative and reproductive traits, which support incipient
sympatric speciation in wild barley from the opposing slopes at EC (Nevo, 2006; Parnas, 2006).
For most traits, interslope crossbred plants were significantly inferior to intraslope
hybrids.
In summary, we observed strong genomic differentiation of wild barley across a
short distance of 250 m due to the sharp microclimatic divergence between the
abutting slope populations at Evolution Canyon I, Israel. To put the local patterns of
genetic variation into a broader geographic context, we also analyzed two Tibetan
wild barley accessions and two Canadian and Australian cultivated barley varieties.
Sample accessions from both slopes at EC confirmed the presence of
strong genomic differentiation underlying adaptive incipient sympatric ecological
speciation. The process of adaptive incipient sympatric speciation in wild barley at
EC, driven by sharp interslope microclimatic divergence, is supported by molecular
marker, genomic and transcriptomic evidence, and is shared with organisms across
the phylogeny, from bacteria to mammals, identified in previous studies (Nevo, 2014a).
The genetic base of cultivated barley has rapidly narrowed, resulting in a loss of genetic
diversity and increasing the risk that cultivated barley will become more
susceptible to diseases and environmental stresses. Wild and cultivated barley are
both diploid and inter-fertile, opening up the possibility of using wild barley as a
source of genetic variation for improving the tolerance of cultivated barley to abiotic
and biotic environmental stresses.
Contribution statement
Eviatar Nevo (EN) and Chengdao Li (CL) collected the samples. Xiaoqi Zhang (XZ), Fei
Dai (FD) and CL performed DNA sequencing. FD, Xiaolei Wang (XW) and Guoping Zhang
(GZ) performed RNA sequencing and the drought experiment. Cong Tan (CT),
Penghao Wang (PW), Brett Chapman (BC), Brett Chapman (RB) and Matthew Bellgard
(MB) processed the data and performed computational analyses. CT, Camilla Beate
Hill (CBH), PW, BC, RB, Qisen Zhang (QZ) and CL analysed the data and interpreted
the results. CL, EN and GZ conceived and designed the study.
Chapter 4 Genomic diversity of cultivated barley
Abstract
Barley (Hordeum vulgare L.) is one of the earliest domesticated crops and ranks
fourth in the world. It provides abundant nutrition and energy for humans and
animals, and it has been widely used as a model plant for Triticeae research. Recently,
a high-quality pseudomolecule genome sequence was generated for barley cv.
Morex, which paved the way to investigate genomic diversity and isolate
functional genes by DNA sequencing in barley. In this study, we selected eleven
cultivated barleys from Australian cultivated barley germplasm to investigate their
genomic diversity. They were sequenced with a whole-genome shotgun sequencing
strategy at high coverage, from 25-fold to 42-fold. In total, 29,550,641 SNPs and
1,592,034 InDels were detected for the eleven cultivated barleys. After comparing
the divergence between Australian cultivated barleys and wild barleys from ‘the
Evolution Canyon’, genomic regions totalling 612 Mb and containing 2,649
high-confidence genes were shown to be under domestication selection. The
genomic variants and candidate genes under domestication selection identified in
this study will benefit the isolation of QTLs controlling important agronomic traits
and the development of genome-wide molecular markers for molecular breeding of
new barley varieties.
4.1 Introduction
Barley is the second most important crop in Australia and the fourth in the world,
both in terms of grain production and harvested area (www.fao.org/faostat). Barley
provides abundant energy and nutrition for humans and animals. In addition,
malting barley is an important raw material for brewing beer and producing distilled
beverages. Cultivated barley (Hordeum vulgare L.) was domesticated from wild barley
(Hordeum spontaneum L.) around 10,000 years ago in the Fertile Crescent (Harlan and
Zohary, 1966). Barley can grow in highly diverse environments, from the Eastern
Mediterranean coasts to Tibet (Dai et al., 2014; Zeng et al., 2015). It has been widely
used as a model organism for research on plant adaptive evolution as well as a
vital genetic repository for exploring plant abiotic and biotic stress tolerance.
Barley is diploid (2n=2x=14) with an estimated haploid genome of 5.1 Gb. The barley
genome is characterised by an extreme abundance of repetitive elements, which hindered
its assembly. In 2012, a barley physical map with a total length of 4.83
Gb was assembled based on high-information-content fingerprinting, BAC-end
sequencing and a high-density genetic map (International Barley Genome
Sequencing et al., 2012; Mascher et al., 2013a). The physical map could be
represented by a minimum tiling path of 67,000 BAC clones. A gene space
assembly was generated by sequencing and assembling 5,341 gene-containing
BACs, which has benefited functional QTL identification by linkage mapping (Zhou
et al., 2015) and association mapping (Russell et al., 2016), and the isolation of genes
such as MKK3 and AlaAT, which regulate seed dormancy (Nakamura et al., 2016; Sato
et al., 2016). The past several years have witnessed dramatic development of DNA
sequencing technology and molecular biology methods, such as hierarchical shotgun
sequencing, optical mapping, and chromosome conformation capture sequencing,
which provide new approaches to resolve the challenges of barley genome assembly.
A high-quality pseudomolecule genome of barley was eventually achieved by the
International Barley Genome Sequencing Consortium in 2017 (Beier et al., 2017;
Mascher et al., 2017). The pseudomolecule assembly (4.79 Gb) covers 94% of the barley
genome, within which 39,734 high-confidence genes were annotated. The completion of
the barley reference genome is an important milestone in barley genetic research, which
marks the full entry of barley genetics into a genomic era represented by
high-throughput DNA sequencing.
In this study, we used the completed barley genome sequence as a reference to
investigate the genomic diversity in Australian cultivated barleys. Eleven accessions
were selected from Australian cultivated barley germplasm and sequenced with a
whole-genome shotgun sequencing approach. Firstly, we detected genome-wide
SNPs and short InDels in the eleven Australian cultivated barleys.
Secondly, we investigated the evolutionary relationship of Australian cultivated
barley with 15 other barley accessions and the genomic divergence between Australian
cultivated barley and wild barley from ‘the Evolution Canyon’.
4.2 Methods
Materials and datasets
In this study, eleven Australian cultivated barleys (Table 4-1, No. 1-11) were selected
and sequenced with a whole-genome shotgun strategy. They were grown in the Murdoch
University glasshouse under standard management. Total genomic DNA was
extracted from leaves at the three-leaf stage for each sample and randomly fragmented
using an ultrasound method for sequencing library preparation. Sequencing libraries
were prepared according to the manual of the Illumina sequencing library kit
(Paired-End Sample Preparation Guide, Illumina, 1005063). They were then sequenced
on the HiSeq 2500 platform at BGI-Shenzhen (Beijing Genomics Institute-Shenzhen,
Shenzhen, China).
In addition, whole-genome sequencing data for 10 barley accessions from our previous
project were included (Table 4-1, No. 12-21). Data for five samples (Bowman, Barke,
Haruna_Nijo, Igri, B1k-04-12) sequenced by the Leibniz Institute of Plant Genetics and
Crop Plant Research (IPK) were downloaded from the NCBI-SRA database. We
incorporated the sequencing data of these samples into our analysis only to compare
the genomic differences between Australian barleys and European
barleys. Detailed information on these samples and data is given in Table 4-1.
Table 4-1 Sample and data information
No. Sample Type Data_origin Institute
1 Scope AUS cultivar sequencing WBGA
2 La_Trobe AUS cultivar sequencing WBGA
3 WI4304 AUS breeding line sequencing WBGA
4 Commander AUS cultivar sequencing WBGA
5 Fleet AUS cultivar sequencing WBGA
6 Hindmarsh AUS cultivar sequencing WBGA
7 Vlamingh AUS cultivar sequencing WBGA
8 Franklin AUS cultivar sequencing WBGA
9 YYXT China cultivar sequencing WBGA
10 XYH0 China cultivar sequencing WBGA
11 XYH2 China cultivar sequencing WBGA
12 AC_metcarlfe AUS cultivar previous project WBGA
13 Baudin AUS cultivar previous project WBGA
14 SFS2.1 EC wild barley previous project WBGA
15 SFS2.2 EC wild barley previous project WBGA
16 SFS2.3 EC wild barley previous project WBGA
17 NFS6.1 EC wild barley previous project WBGA
18 NFS7.1 EC wild barley previous project WBGA
19 NFS7.2 EC wild barley previous project WBGA
20 TB1 Tibetan wild barley previous project WBGA
21 TB2 Tibetan wild barley previous project WBGA
22 Igri Japan cultivated barley EMBL/ENA IPK
23 Haruna Nijo Japan cultivated barley EMBL/ENA IPK
24 Bowman US cultivated barley EMBL/ENA IPK
25 Barke German cultivated barley EMBL/ENA IPK
26 B1k-04-12 Israel wild barley EMBL/ENA IPK
Mapping reads to reference and detecting genomic variants
The detailed procedures for mapping reads against the reference and detecting genomic
variation in the collected sequencing data are described in Chapter 3; a summary
is given as follows. Firstly, strict filtering was performed for each accession to remove
reads with adapter contamination and trim poor-quality bases using
Cutadapt (Martin, 2011). The quality of the clean data was assessed using FastQC
(Andrews, 2014) before further analysis. Then, the clean reads were mapped to the
barley pseudomolecule reference genome using BWA-MEM (Li, 2013). Next,
only reads with unique mapping positions were retained and used to detect
genomic variation (SNPs and InDels). The final SNPs and InDels were generated by
running two rounds of variant calling using the SAMtools/BCFtools
(Hofmann, 2014; Ibarra et al., 2016) and GATK (Graeber et al., 2012) pipelines. The
first round was performed with SAMtools and BCFtools for all accessions, and
the result was filtered based on variant quality. The calling result of the
first round was then used as the input file to guide realignment around potential InDels
and variant calling in the second round in GATK. SNPs and InDels
detected by both the SAMtools/BCFtools pipeline and GATK were retained and then
strictly filtered, mainly based on base quality, mapping quality and
variant-supporting depth (Li, 2014).
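The final retention step described above (keep only variants reported by both callers, then filter on quality and supporting depth) can be sketched as follows. This is a minimal illustration and not the pipeline used in this study: the `Variant` record, the function name `consensus_variants` and the cut-offs `min_qual=30.0` and `min_depth=5` are hypothetical, and real calls would be read from VCF files rather than built by hand.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One variant call; field names are illustrative, not a VCF parser."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # variant quality reported by the caller
    depth: int   # number of reads supporting the variant


def consensus_variants(calls_a, calls_b, min_qual=30.0, min_depth=5):
    """Keep variants found by both callers that pass simple cut-offs.

    The thresholds are hypothetical placeholders, not values from the thesis.
    """
    def key(v):
        return (v.chrom, v.pos, v.ref, v.alt)

    index_b = {key(v): v for v in calls_b}
    kept = []
    for v in calls_a:
        other = index_b.get(key(v))
        if other is None:
            continue  # reported by only one caller
        # Require both callers to pass the quality and depth cut-offs.
        if min(v.qual, other.qual) >= min_qual and min(v.depth, other.depth) >= min_depth:
            kept.append(v)
    return sorted(kept, key=key)
```

Matching on chromosome, position and both alleles means that two callers reporting different alternate alleles at the same site are not treated as agreeing.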
Evolutionary relationship and selective sweep analysis
Evolutionary distances were calculated using the Maximum Composite
Likelihood method, based on SNP variants evenly distributed across the whole genome
and without missing data. A phylogenetic tree was constructed using the
Neighbour-Joining method in MEGA and validated with a bootstrap test (1000 replicates).
Selective sweep analysis was conducted with VCFtools, which measures the patterns
of allele frequencies in each 1 Mb window along the chromosomes using the
genome-wide SNP variation data. Genomic regions with FST values above 0.89 were
considered to be under strong selective sweeps.
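To make the windowed FST scan concrete, the sketch below computes FST for one window from per-SNP allele frequencies and flags windows above the 0.89 cut-off. It is an illustration under stated assumptions: it uses Hudson's ratio-of-averages estimator for brevity, whereas VCFtools reports the Weir and Cockerham estimator, so the values would not match the thesis results exactly. The function names and the input layout are hypothetical.

```python
import math


def hudson_fst(sites, n1, n2):
    """Ratio-of-averages Hudson FST for one window.

    sites: iterable of (p1, p2) alternate-allele frequencies at each SNP.
    n1, n2: number of sampled chromosomes in each population (each > 1).
    Returns NaN when the window holds no informative SNP.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in sites:
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else math.nan


def outlier_windows(fst_by_window, threshold=0.89):
    """Return the keys of windows whose FST exceeds the threshold, sorted."""
    return sorted(w for w, fst in fst_by_window.items()
                  if not math.isnan(fst) and fst > threshold)
```

With small samples the estimate can be negative when the two populations share similar allele frequencies; such windows simply fall below the threshold.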
Genetic effect prediction and gene function annotation
The genetic effect of each variant was predicted using SnpEff (Cingolani et al., 2012)
based on its position in the gene models and the resulting change in the coding product.
Functions of genes carrying high-effect variants were predicted based on sequence
homology with well-studied genes in the NR, Swiss-Prot, KEGG, COG and GO databases.
This step was carried out using AutoFACT (Koski et al., 2005).
4.3 Results
4.3.1 Whole genome sequencing of eleven cultivated barleys
To investigate the genomic diversity in cultivated barleys, eleven barley cultivars
were selected as representatives for whole-genome shotgun sequencing. The identity
of the Australian barleys was verified by DNA fingerprinting.
For each sample, genomic DNA was fragmented and fragments of 300
to 500 bp were selected for preparing sequencing libraries. For the eleven cultivated
barleys, paired-end sequencing with read lengths of 100 to 150 bp produced a total of
16,052,460,585 clean reads after filtration. Detailed information on the sequencing
output for each sample is given in Table 4-2. Sequencing coverage of individual samples
ranged from 25-fold to 42-fold. Clean reads of each accession were mapped to the
barley reference genome using BWA-MEM, and mapped reads with gaps were
realigned using GATK. The mapping rate was about 95% on average for these eleven
barley accessions. Only reads with unique mapping positions and high quality
scores were used to detect SNPs and short InDels.
Table 4-2 Summary of sequencing output and variants calling
No. Sample Read length (bp) Clean reads Clean bases Coverage (X) SNP number InDel number
1 Scope PE-100 2,003,430,854 200,343,085,400 41.48 11,004,995 873,208
2 La_Trobe PE-100 1,867,321,773 186,732,177,300 38.66 11,462,699 892,956
3 WI4304 PE-100 2,198,504,849 219,850,484,900 45.52 12,345,774 935,979
4 Commander PE-100 2,164,074,376 216,407,437,600 44.80 13,125,947 987,294
5 Fleet PE-100 2,138,529,486 213,852,948,600 44.28 12,082,089 952,594
6 Hindmarsh PE-100 2,029,068,470 202,906,847,000 42.01 11,486,896 896,416
7 Vlamingh PE-100 1,997,091,617 199,709,161,700 41.35 12,511,246 943,524
8 Franklin PE-150 821,367,392 123,205,108,800 25.51 12,003,604 731,997
9 YYXT PE-150 853,749,138 128,062,370,700 26.68 13,342,817 766,413
10 XYH0 PE-150 668,729,618 100,309,442,700 20.90 9,896,567 552,573
11 XYH2 PE-150 810,200,780 121,530,117,000 25.32 10,244,910 614,319
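The relationship between the columns of Table 4-2 can be checked with a short calculation: clean bases are clean reads multiplied by read length, and coverage is clean bases divided by a genome size. The sketch below is illustrative only. The 4.83 Gb denominator is an assumption inferred from rows such as Scope and Franklin; it is not stated in the text, and not every row of the table reproduces exactly under it. The function and constant names are hypothetical.

```python
# Assumed genome size used as the coverage denominator (inferred, not stated).
GENOME_SIZE_BP = 4_830_000_000


def clean_bases(clean_reads, read_length_bp):
    """Total sequenced bases from a read count and a fixed read length."""
    return clean_reads * read_length_bp


def coverage(bases, genome_size_bp=GENOME_SIZE_BP):
    """Fold coverage: sequenced bases divided by the genome size."""
    return bases / genome_size_bp


# Scope (PE-100) as listed in Table 4-2.
scope_bases = clean_bases(2_003_430_854, 100)
scope_coverage = round(coverage(scope_bases), 2)
```

Under this assumption the Scope row gives 200,343,085,400 bases and a coverage that rounds to the tabulated 41.48.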
4.3.2 SNP and InDel variants in eleven cultivated barleys
In total, 29,550,641 SNPs and 1,592,034 InDels were detected for the eleven
Australian cultivated barleys. Detailed variant information for each sample is given in
Table 4-2. On average, about 11 million SNPs and 867 thousand InDels were
detected for each cultivated barley. Barley cultivar Commander had the most InDels
(987,294) and one of the highest SNP counts (13,125,947) relative to the barley
reference cv. Morex. The effects of all detected SNPs and InDels were predicted based
on their positions in the gene models of the barley reference and the resulting change
in the coding product. Variants located in intergenic regions accounted for about
88.35% of all variants, while 0.49% of
variants were in exon regions. Of those in exon regions, 194,656 caused
missense mutations and 25,235 caused frameshift mutations.
Table 4-3 Pairwise comparison of seven Australian barley varieties
Difference Commander Fleet Hindmarsh La_Trobe Scope Vlamingh WI4304
Commander 0 1,825,408 1,905,598 1,861,098 2,361,191 2,188,033 2,509,814
Fleet 1,825,408 0 1,987,588 1,940,139 2,435,437 2,282,326 2,445,763
Hindmarsh 1,905,598 1,987,588 0 765,570 2,369,199 2,112,996 2,194,204
La_Trobe 1,861,098 1,940,139 765,570 0 2,294,582 2,071,247 2,154,517
Scope 2,361,191 2,435,437 2,369,199 2,294,582 0 2,580,188 2,560,647
Vlamingh 2,188,033 2,282,326 2,112,996 2,071,247 2,580,188 0 2,469,176
WI4304 2,509,814 2,445,763 2,194,204 2,154,517 2,560,647 2,469,176 0
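A symmetric matrix like Table 4-3 can be produced by counting, for each pair of varieties, the variants carried by one but not the other. The sketch below is one possible reading of how such counts arise, not a description of the method used here: it assumes each variety is represented by the set of its variants relative to the reference, and the function name and data layout are hypothetical.

```python
from itertools import combinations


def pairwise_differences(variants_by_sample):
    """Count variants present in exactly one sample of each pair.

    variants_by_sample: dict mapping a sample name to a set of hashable
    variant keys, e.g. (chromosome, position, alternate allele).
    Returns a nested dict forming a symmetric matrix with a zero diagonal.
    """
    samples = sorted(variants_by_sample)
    matrix = {s: {t: 0 for t in samples} for s in samples}
    for a, b in combinations(samples, 2):
        # Symmetric difference: variants found in a or b but not in both.
        diff = len(variants_by_sample[a] ^ variants_by_sample[b])
        matrix[a][b] = matrix[b][a] = diff
    return matrix
```

Under this reading, closely related lines such as Hindmarsh and La_Trobe share most of their variants and therefore show the smallest off-diagonal count.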
4.3.3 Evolutionary relationship analysis
To investigate the evolutionary relationship of Australian cultivated barley with
cultivated barley from other continents and wild barleys, we incorporated the
whole-genome sequencing data of ten samples from our previous project and five samples
sequenced by the Leibniz Institute of Plant Genetics and Crop Plant Research.
Evolutionary distance was measured using the Maximum Composite Likelihood
method. The final phylogenetic tree is shown in Figure 4-1. Most Australian
cultivated barleys clustered in one group, while the seven Israeli wild barleys
clustered in another group.
Figure 4-1 Evolutionary relationships of 26 barley accessions.
The phylogenetic tree was inferred using the Neighbour-Joining method with
1000 bootstrap replicates. The evolutionary distances were computed using the
Maximum Composite Likelihood method. The analysis was based on 1,864 SNP
variants evenly distributed across the whole genome of the 26 barley accessions.
4.3.4 Genetic divergence between Australian cultivated barley and EC wild barley
The genomic difference between the Australian and wild barleys was calculated
using the FST statistic, which tests whether the allele frequencies in each
screening window differ significantly between the two groups. FST was
calculated using VCFtools with a screening window size of 1 Mb and a moving step
of 250 kb. Only SNPs with an allele frequency above 5% were kept for the selective
sweep analysis. The FST between these two barley groups is plotted in
Figure 4-2. Large continuous genomic regions under environmental selection can be
seen around the centromere regions of chromosomes 1H, 4H and 6H (Figure 4-2). In
total, 2,447 out of 19,341 windows had an FST value greater than 0.89, covering
612 Mb of the genome. There are 2,649 high-confidence genes within the genomic
regions under environmental selection. GO enrichment was conducted for 933 of the
2,649 genes and the result is shown in Figure 4-3. Most of the genes under
environmental selection were classified into binding and catalytic activity in the
molecular function category and into metabolic process in the biological process category.
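Because the 1 Mb windows advance in 250 kb steps, neighbouring windows overlap, so the genomic span covered by outlier windows depends on how the overlap is counted. As an arithmetic observation, 2,447 windows multiplied by the 250 kb step gives 611.75 Mb, which is close to the reported 612 Mb; whether the span was computed this way is an assumption. The sketch below shows the alternative of merging overlapping windows before summing their lengths. The function names and the half-open interval convention are hypothetical.

```python
def merge_windows(windows):
    """Merge overlapping or touching (start, end) half-open intervals.

    All intervals are assumed to lie on the same chromosome.
    """
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(windows):
    """Total number of base pairs covered by at least one window."""
    return sum(end - start for start, end in merge_windows(windows))


# Step-based span for the counts reported in the text (an assumption).
step_based_span_bp = 2_447 * 250_000
```

The two approaches give similar totals when outlier windows form long continuous runs, and diverge when outliers are scattered as isolated windows.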
Figure 4-2 Genomic difference between Australian cultivated barleys and EC wild
barleys.
FST was calculated from genome-wide SNPs between Australian cultivated
barleys and ‘the Evolution Canyon’ wild barleys. The red horizontal line indicates the
95% threshold, which corresponds to an FST value of 0.89.
Figure 4-3 GO enrichment of genes under environmental selection.
4.4 Discussion
The completion of a high-quality barley reference genome is an important
milestone in barley genetic research, providing a blueprint to investigate the
genomic evolution of barley and to isolate the genes controlling important agronomic
traits (Mascher et al., 2017). In this study, we sequenced eleven Australian cultivated
barleys using a whole-genome shotgun sequencing strategy to investigate the genomic
diversity in Australian cultivated barley and its domestication. A total of 29,550,641
SNPs and 1,592,034 InDels were detected for the eleven Australian cultivated
barleys. Based on these genomic variants, we investigated the evolutionary relationship
between Australian cultivated barleys and fifteen other barley accessions, including
wild barleys and cultivated barleys from other continents. Most Australian cultivated
barleys clustered into one group, apart from cultivated barleys from Asia and
America. This pattern is presumed to result from selection under the Australian
environment and climate. La Trobe is a sister line of Hindmarsh, having been
selected from Hindmarsh as a spontaneous mutant. Similarly, XYH2
is a mutant line derived from XYH0. Moreover, we investigated the genetic divergence
between Australian cultivated barley and wild barley from ‘the Evolution Canyon’.
Genomic regions totalling 612 Mb and containing 2,649 high-confidence genes were
shown to be under domestication selection.
Traditional molecular markers such as RFLP (Sherman et al., 1995) and SSR (Varshney
et al., 2007), and array-based markers such as the DArT array (Wenzl et al., 2004) and
the SNP array (Close et al., 2009), have played important roles in the primary mapping
of many important agronomic QTLs. However, the limited number of these molecular
markers cannot meet the requirements of QTL fine mapping. The lack of polymorphic
molecular markers is one of the two major challenges in QTL fine mapping in
barley. The genomic variants discovered in this study can be used to develop
molecular markers at very high density. For example, Jia et al. (2016) narrowed
the target region of the semi-dwarf gene ari-e from 6.8 cM to a 0.58 Mb region
using InDel markers developed from the InDel information between cv.
Hindmarsh and Tibetan wild barley W1. Wang et al. (2017) developed InDel markers
based on genetic variants between cv. Scope and cv. Vlamingh in a target region of
9.85 Mb. After genotyping with the new InDel markers, the candidate region was
further narrowed to a 0.482 Mb region containing only 11 high-confidence
genes.
Discovering genomic diversity is fundamental to further exploring the
invaluable genetic resources in each species. To discover the genomic diversity in rice,
an international cooperative project performed whole genome shotgun
sequencing of about 3,000 rice accessions (Li et al., 2014a). This resource has been used
to study the evolution and domestication of rice. Discovering the genomic
diversity of barley germplasm is limited by the large genome size and the cost. In this
study, we performed high-coverage whole genome sequencing of several major
Australian barleys. For example, ‘Baudin’ and ‘LaTrobe’ are two of the popular
malting barleys widely grown in Australia. This partly uncovered the genomic
diversity in Australian barley germplasm. This genomic diversity information for
Australian core varieties is very important for finding out which functional genes are
critical for barley to adapt better to the special environment and climate in Australia. It
remains expensive to perform whole genome sequencing for a large number of
barley accessions given the large size of the barley genome. Compared with whole
genome sequencing, exome sequencing is another option for discovering the genomic
diversity in gene coding regions of species with large genomes like barley
(Mascher et al., 2013b; Warr et al., 2015). A whole exome capture platform for barley
has been developed, which makes it possible to selectively capture the coding
regions (about 61.6 Mb) of the barley genome (Mascher et al., 2013b). Using this barley
exome capture platform, a collection of 267 barley landraces and wild barleys was
sequenced to understand the environmental adaptation of barley (Russell et al.,
2016). Based on the genomic diversity discovered in this germplasm, it was
indicated that seasonal temperature and the amount of rainfall are two of the major
drivers of barley’s environmental adaptation. With advances in DNA
sequencing technology and falling costs, it will become possible to investigate
genome-wide diversity in a large number of barley accessions.
Chapter 5 BarleyVarDB, a database of barley
genomic variation
Abstract
Barley (Hordeum vulgare L.) is one of the first domesticated grain crops and is
the fourth most important cereal source for human and animal consumption.
BarleyVarDB is a database of barley genomic variation. It is publicly accessible
through the website at http://146.118.64.11/BarleyVar. This database mainly
provides three sets of information. Firstly, BarleyVarDB includes 57,754,224 SNPs and
3,600,663 InDels, which were identified from high-coverage whole
genome sequencing of twenty barley accessions, including seven wild barley
accessions from three centres of barley origin and 13 barley landraces
from different continents. Secondly, it makes the latest barley reference genome,
produced by the International Barley Genome Sequencing Consortium (IBSC), and
its annotation information publicly accessible. Thirdly, 522,212
genome-wide microsatellites (SSRs), identified from the barley pseudomolecule
genome sequence, are also included in this database. In addition,
several useful web-based applications are incorporated in BarleyVarDB, such as
JBrowse, BLAST and Primer3. BarleyVarDB features automatic design of
PCR primers for detecting any variant deposited in this database and a friendly
interface for accessing the barley reference. In short, BarleyVarDB will benefit the
barley genetic research community by giving access to, and enabling full use of, all
publicly available barley genomic variation information and the barley reference genome,
as well as by providing ultra-high-density SNP and InDel markers for molecular
breeding and for identifying functional genes underlying important agronomic traits in
barley.
5.1 Introduction
Single Nucleotide Polymorphisms (SNPs) and Insertions or Deletions (InDels) are the
two most common types of genetic variation among living organisms. They have
played important roles in examining genetic diversity (Seberg and Petersen, 2009),
positional cloning (Li et al., 2011; Xue et al., 2008; Yan et al., 2014), association
mapping (Huang et al., 2010; Huang et al., 2011b; Huang et al., 2012b; Yano et al.,
2016), and evolutionary biology (Huang et al., 2012a; Lin et al., 2014) in the past
decades. Barley (Hordeum vulgare L.) is one of the first domesticated grain crops
(Badr et al., 2000; Zohary et al., 2012) and is the fourth most important cereal
source for human and animal consumption. It is widely used as a model organism to
study the genetic basis of plant adaptive evolution and as a vital
genetic repository for exploring plant abiotic and biotic stress tolerance, as it can be
grown in diverse and extreme environments such as the warm, dry Near East and cold,
dry Tibet (Nevo, 2014b). The availability of the barley reference genome released this
year (Mascher et al., 2017) and the dramatic advances in next-generation
sequencing technology will make it more efficient to explore genomic variation in
barley germplasm, which will accelerate progress in barley genomic and genetic
research. There are several genomic variation databases, such as RiceVarMap (Zhao
et al., 2015), SNP-Seek (Alexandrov et al., 2015) and HapRice (Yonemaru et al.,
2014) for rice, and dbSNP at the National Center for Biotechnology
Information (NCBI) (Sherry et al., 2001). However, until recently there has not been any
public database of genomic variation information in barley, apart from
several databases for molecular markers such as GrainGenes (Carollo et al., 2005)
and for barley physical map information such as Barlex (Colmsee et al., 2015). At this
stage, a comprehensive database of barley genomic variation is urgently needed for the
barley genomic research community to share and make good use of the genomic
variation information generated by different research groups. Here, we constructed a
comprehensive database specializing in barley genomic variation and designated it
BarleyVarDB. In summary, BarleyVarDB mainly includes three sets of
data. Firstly, BarleyVarDB provides 57,754,224 SNPs and 3,600,663 InDels,
which were identified from high-coverage whole genome sequencing
(~40X) of twenty diverse barley accessions representing wild barley accessions
from three widely accepted centres of barley origin and cultivated barley accessions
from different continents. Secondly, it makes the latest barley reference genome and
the corresponding gene annotation information, released by the International Barley
Genome Sequencing Consortium (IBSC) this year (Mascher et al., 2017), accessible to
the research public. Thirdly, 522,212 microsatellites, also called
simple sequence repeats (SSRs), identified from the barley pseudomolecule genome
sequences, are included in this database. In addition, several friendly and
useful web-based tools are incorporated into BarleyVarDB, such as the
genome browser JBrowse (Holton, 2002) for displaying an overview of the barley genomic
data, the basic local alignment search tool (BLAST) (Altschul et al., 1990) for
mapping sequences against the barley genome and annotation sequences, and Primer3 (Li
et al., 2015) for primer design. BarleyVarDB is publicly accessible through the
website interface at http://146.118.64.11/BarleyVar.
5.2 Database construction
The framework of BarleyVarDB is shown in Figure 5-1a. The first aim of this
database is to share and make full use of the genomic variants (SNPs and InDels) mined
from the steadily accumulating genome sequencing data in the barley research
community. These genomic variants can be retrieved using different query keywords
such as variant identifier, target region and candidate gene. The second aim is
to distribute the updated high-quality barley reference genome to the public and
to make it easily accessible to barley genetic researchers without programming
experience through web-based applications such as JBrowse and BLAST. Both the
genomic variants and the barley reference genome pave the way for gene mapping of
important agronomic traits, breeding of new competitive varieties using
marker-assisted selection, and research into the adaptive evolution and
domestication of barley.
Figure 5-1 Framework, implementation, and data collection of BarleyVarDB.
a, the framework and aims for building BarleyVarDB; b, the overview of the
implementation of BarleyVarDB; c, the geographic distribution map of barley
accessions collected in BarleyVarDB.
BarleyVarDB was mainly implemented using the open-source HTTP server
Apache2 and the relational database management system MySQL (Figure 5-1b).
The Apache server provides the web interface through which clients visit and interact
with the database server using internet browsers. All the genomic variations and barley
reference genome annotation information are stored and maintained in a MySQL
database server. Clients can retrieve this information through an internet browser and
the Apache server. The genomic variant database can be maintained and updated using
MySQL server commands in Linux or using phpMyAdmin through a web browser.
In addition, several Perl, Python and PHP scripts are used to make the display of the
retrieved results friendlier and to carry out basic functions such as batch keyword
searching.
At present, we have collected about 3.70 Tb of high-coverage whole genome
sequencing data from twenty-one barley accessions, which represent wild
barley from three evolution centres and cultivated barley from different continents
(Figure 5-1c). The first dataset comprises six wild barley accessions from the
Tibetan plateau and the opposing slopes of “Evolution Canyon” as well as nine cultivated
barley accessions from Australia and Canada. A detailed description of sample
collection and whole genome sequencing for this dataset is given in Chapter 3 and
Chapter 4. The main purpose of this chapter is to create a barley genomic variation
database that makes the genomic information explored in Chapter 3 and Chapter 4
accessible to the barley research community. In short, these barley accessions were
planted in the glasshouse and genomic DNA was extracted from leaves of the plants
at the three-leaf stage using the CTAB (cetyl trimethylammonium bromide) method. The
purity and concentration of the genomic DNA were then assessed. A sequencing library
for each sample was prepared following the manufacturer’s instructions (Paired-End Sample
Preparation Guide, Illumina, 1005063) and then sequenced on the Illumina HiSeq
2000 platform at BGI-Shenzhen (Beijing Genomics Institute-Shenzhen, Shenzhen,
China). Eventually, about 2.96 Tb of high-quality reads were produced from these
fifteen barley accessions and the raw data were submitted to NCBI/SRA under BioProject
accession number PRJNA324520. The second dataset (about 736 Gb) comprises
five barley cultivars and one wild barley accession (International Barley Genome
Sequencing et al., 2012), sequenced by the Leibniz Institute of Plant Genetics
and Crop Plant Research (IPK). It was downloaded from the European Nucleotide
Archive (ENA, EMBL-EBI) under accession number PRJEB628.
The detailed methodology for read mapping and genomic
variation detection for the collected sequencing data is described in Chapter 3 and
Chapter 4; a summary is given here. Firstly, strict filtering was
performed to remove contaminated reads and trim poor-quality bases for
each accession. Then, the clean reads were mapped to the barley pseudomolecule
reference genome using BWA-MEM (Li, 2013). On average, about 98% of clean
reads were mapped to the reference sequence and about 49% of them had unique
mapping positions in the reference with high mapping quality. Next, only the reads
with unique mapping positions were retained and used to detect genomic
variation (SNPs and InDels). The final SNPs and InDels were generated by running
two rounds of variant calling using the SAMtools/BCFtools (Hofmann, 2014;
Ibarra et al., 2016) and GATK (Graeber et al., 2012) pipelines. The first round was
performed with SAMtools and BCFtools for all accessions and the result was
filtered based on variant quality. Then, the calls from the first round
were used as the input to guide realignment around potential InDels and
variant calling for the second round in GATK. The SNPs and InDels detected
by both the SAMtools/BCFtools pipeline and GATK were retained and then strictly
filtered mainly based on base quality, mapping quality and variant-supporting depth
(Li, 2014).
Currently, we do not have additional data for validation. However, our analysis pipeline
has been tested using benchmark datasets, with an accuracy of up to 95%. The variant
information in this chapter has been widely used for molecular marker development
in our own group and other groups.
5.3 Database description
Search SNP/InDel/SSR by Id: In this interface, web users can retrieve a specific
SNP/InDel/SSR by entering its identifier (such as “SNP|chr1H|000000749”,
“IND|chr1H|000023712”, “SSR|chr1H|000068174”), which is unique in the
database. Two pieces of information are included in the identifier: the variation type
(“SNP”, “IND”, “SSR”) and the chromosome coordinate (e.g. chr1H|000000749). For
example, “SNP|chr1H|000000749” means a SNP at position 749 bp on chromosome 1H of
the barley reference.
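The identifier layout described above can be illustrated with a small parser. This
is a sketch only: the function and class names are hypothetical and do not come from
the BarleyVarDB code, and the nine-digit zero padding is inferred from the three
example identifiers rather than documented.

```python
from typing import NamedTuple


class VariantId(NamedTuple):
    kind: str        # "SNP", "IND" or "SSR"
    chromosome: str  # e.g. "chr1H"
    position: int    # coordinate in bp on the barley reference


VALID_KINDS = {"SNP", "IND", "SSR"}


def parse_variant_id(identifier: str) -> VariantId:
    """Split an identifier such as 'SNP|chr1H|000000749' into its parts."""
    parts = identifier.strip().split("|")
    if len(parts) != 3:
        raise ValueError(f"expected TYPE|CHROMOSOME|POSITION, got {identifier!r}")
    kind, chromosome, position = parts
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown variation type: {kind!r}")
    if not position.isdigit():
        raise ValueError(f"position is not a number: {position!r}")
    return VariantId(kind, chromosome, int(position))


def format_variant_id(kind: str, chromosome: str, position: int, width: int = 9) -> str:
    """Rebuild an identifier; the zero-padded width of 9 is an assumption."""
    return f"{kind}|{chromosome}|{position:0{width}d}"


parsed = parse_variant_id("SNP|chr1H|000000749")
print(parsed.kind, parsed.chromosome, parsed.position)
print(format_variant_id("IND", "chr1H", 23712))
```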
Search SNP/InDel/SSR in target region: In this interface, a target region together
with a set of barley accessions can be specified to retrieve the SNP and InDel
information in that region among the given accessions. For an SSR search,
a region can likewise be given to retrieve the SSR records within it.
Figure 5-2 Web interfaces of retrieving SNPs, INDELs, SSRs and gene annotation
in BarleyVarDB.
a, main interface of SNP search input and result output; b, main interface of INDEL
search input and result output; c, main interface of SSRs search input and result
output; d, main interface of gene annotation search input and result output.
Search SNP/InDel/SSR within gene: Genomic variants in gene regions are particularly
useful, because different alleles may change the phenotype and they can
be used as functional markers to track target genes of agronomic importance. The
genetic effect of each SNP or InDel was assessed based on its location in the gene
structure (5’UTR, exon, intron, 3’UTR) and the change in the gene coding product
(nonsense mutation, missense mutation, open reading frame shift and so on) using
snpEff (Cingolani et al., 2012). Because of the high coverage of the whole genome
sequencing, the location and genetic effect of these SNPs and InDels will
also be useful for predicting candidate genes in the target region in map-based cloning.
Search polymorphic SNP/InDel between two Samples: Comparing the genetic
background of the two parent lines of a mapping population is a common task in
genetic research. In this interface, the user can directly obtain SNP/InDel sites with
different alleles in two given samples. This information is very useful for developing
polymorphic molecular markers for agronomic QTL mapping or marker-assisted
selection breeding. PCR primers for examining polymorphic InDels can be
designed automatically after selecting the hyperlinked InDel names. Polymorphic
SNP sites can easily be converted to CAPS markers if they are located in the
recognition sequence of a restriction enzyme. Automatic primer design is
performed by a background processing script.
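The comparison between two samples can be pictured as a simple filter over
per-site allele calls. The sketch below is not the BarleyVarDB implementation: the
in-memory dictionary stands in for the database tables, and the sample names,
alleles and function name are all hypothetical. Sites with a missing call in either
sample are skipped, which is an assumption about how missing data would be handled.

```python
from typing import Dict, List, Optional

# Hypothetical layout: variant identifier -> {sample name -> allele call or None}.
Genotypes = Dict[str, Dict[str, Optional[str]]]


def polymorphic_sites(genotypes: Genotypes, sample_a: str, sample_b: str) -> List[str]:
    """Return identifiers of sites where the two samples carry different alleles."""
    sites = []
    for variant_id, calls in genotypes.items():
        allele_a = calls.get(sample_a)
        allele_b = calls.get(sample_b)
        if allele_a is None or allele_b is None:
            continue  # no reliable call in one of the samples
        if allele_a != allele_b:
            sites.append(variant_id)
    return sites


# Made-up calls for two hypothetical parent lines.
demo_genotypes: Genotypes = {
    "SNP|chr1H|000000749": {"parent_A": "A", "parent_B": "G"},
    "SNP|chr1H|000001200": {"parent_A": "C", "parent_B": "C"},
    "IND|chr1H|000023712": {"parent_A": "AT", "parent_B": "A"},
    "SNP|chr1H|000030000": {"parent_A": "T", "parent_B": None},
}

print(polymorphic_sites(demo_genotypes, "parent_A", "parent_B"))
```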
Web-based tools for barley genetic researchers: The JavaScript-based genome
browser (JBrowse) (Skinner et al., 2009) is a user-friendly and interactive web
interface for displaying and manipulating genome-wide datasets such as the
reference genome sequence, reference annotation information, genomic variations
and gene expression levels. All these large datasets are stored in a
MySQL database, which accelerates retrieval and display.
Meanwhile, a BLAST server for the latest version of the barley reference genome is set
up on this website. The pre-built BLAST databases include the whole genome
sequence, coding sequences, transcript sequences and predicted protein sequences of
the barley reference. In addition, a web-based PCR primer design system is
implemented with Primer3Plus as the core program. Besides the standard primer
design function, some special features for the barley genome are added. For example,
users can directly pick primers by entering the chromosome coordinates of the
target region in barley. This will save barley genetic researchers considerable time and
increase the efficiency of their research.
Figure 5-3 Examples of software applications in BarleyVarDB.
a, example of gene annotations of the barley reference genome displayed in JBrowse; b,
search interface of the barley reference genome BLAST server; c, example of primer design
in barley using Primer3Plus.
5.4 Future improvement
We are applying for a Domain Name System (DNS) domain name for this database.
Until the application is approved, the database can be accessed
through the current IP address. Later on, we will link the domain name to the current
IP address so that users can access the database through either the IP address or the
domain name.
As more barley accessions are sequenced, we will continue to mine genomic
variation from the new sequencing data and add it
to BarleyVarDB to keep it the most comprehensive database for barley
genomic variation. Meanwhile, more web-based software resources for the barley
genetic research community to explore barley genomic data will be developed and
incorporated into BarleyVarDB. Feedback from users will be taken into consideration
when making BarleyVarDB more user-friendly.
Chapter 6 Characterization of genome-wide
variations induced by gamma-ray
radiation in barley using RNAseq
Abstract
Induced mutants not only provide a new approach to increasing the diversity of
desirable traits for breeding new varieties, but also help characterize the
genetic basis of functional genes. In the past decades, many mutated genes have been
identified as responsible for the phenotype changes in mutants of various species,
including Arabidopsis and rice. However, the overall features of mutations in
induced mutants and the underlying mechanisms of the various types of artificial
mutagenesis remain unclear. In this study, we adopted a transcriptome sequencing
strategy to characterize all mutations in coding regions in a barley dwarf mutant
induced by gamma radiation. In total, 1,193 genetic mutations introduced by gamma
radiation were detected in transcribed gene regions. Interestingly, most of them
are concentrated in certain regions on two chromosomes. Eventually, 140 out
of 26,745 expressed genes were affected by gamma radiation, and their biological
functions include cellular process and metabolic process. Our results
indicate that mutations induced by gamma radiation are not evenly distributed across
the whole genome, but tend to occur in several concentrated regions of chromosomes.
Our study provides an overview of the features of the genetic mutations and affected
genes caused by gamma radiation, which will benefit a deeper understanding of the
mechanism of radiation mutagenesis and their use in gene function analysis.
6.1 Introduction
In the past century, thousands of new crop varieties have been released from induced
mutants and widely grown (Ahloowalia and Maluszynski, 2001). They have played
an important role in creating desirable agronomic traits including higher yield,
improved quality, and abiotic stress tolerance for crop breeding. It was reported that
new varieties derived from gamma radiation mutagenesis accounted for up to
64% of all radiation-induced varieties from 1930 to 2004 (Ahloowalia et al., 2004).
For instance, ‘Calrose 76’, generated by gamma radiation, was the prominent
rice variety in California, USA, until the late 1970s. ‘Calrose 76’ is about 25
cm shorter and has a higher potential yield than its parent line ‘Calrose’ (Monna, 2002;
Rutger et al., 1977). Meanwhile, artificial mutants have also been widely used to
characterize the genetic basis of functional genes controlling complex agronomic
traits of great importance in crops (Abe et al., 2012; Arite et al., 2007; Arite et al.,
2009; Ishikawa et al., 2005; Yan et al., 2007). For example, it was demonstrated that
the phenotype change from ‘Calrose’ to ‘Calrose 76’ is due to a 1-bp substitution
induced by gamma radiation in exon 2 of the rice ‘green revolution gene’ sd-1,
which encodes a gibberellin 20-oxidase (Monna, 2002). Although mutants are widely
used in both breeding and genetic research, the molecular features and underlying
mechanisms of artificial mutations remain unclear.
The most widely used techniques for developing mutants include insertional mutagenesis,
chemical mutagenesis and physical mutagenesis (Wang et al., 2013). Different
mutagenesis technologies are characterized by diverse mechanisms and mutation patterns.
Comprehensive analysis of all flanking sequence tags isolated from a T-DNA
insertion library of 31,443 independent transformants showed that T-DNA
insertions were non-randomly distributed along chromosomes, occurring more
frequently in the distal ends and less frequently in the centromeric regions (Wu et al.,
2003; Zhang et al., 2007; Zhang et al., 2006a). Compared with insertional mutagenesis,
chemical mutagens such as ethyl methanesulfonate (EMS) can induce high-density
mutations with random distribution (Wang et al., 2013). It was suggested that
EMS causes mispairing by chemically modifying nucleotides and
mostly results in C/G to T/A transitions via C-to-T changes (Greene et al., 2003;
Kim et al., 2006; Koornneeff et al., 1982). It was also shown that EMS can
cause base-pair deletions or insertions as well as chromosome breakage (Sega, 1984).
Ionizing radiation was reported to be prone to causing chromosome alterations such as
deletions and inversions, which are the consequence of the repair of ionizing
radiation-induced damage (Sankaranarayanan, 1991; Shirley et al., 1992). Recently, a
study of six rice mutants induced by gamma radiation using
whole genome resequencing found no specific feature, except that mutations were almost
evenly distributed on each chromosome (Li et al., 2016c). However, the features of
mutations induced by gamma radiation in barley have not yet been characterized.
With the rapid advance of DNA sequencing and molecular biology technologies, it
has become possible to investigate all types of mutations at the whole-genome level (Li
et al., 2016a; Li et al., 2017; Nordstrom et al., 2013a). Integrating bulked
segregant analysis with Restriction site Associated DNA Sequencing (RAD-Seq) has
proved to be one of the most efficient approaches for identifying mutant genes
(Nakata et al., 2018; Takagi et al., 2013a; Takagi et al., 2013b). Whole genome
sequencing and comparative analysis have also been used to identify the causal genes
in mutants (Nordstrom et al., 2013b) or to characterize all the mutations in a large
mutant library (Li et al., 2016a; Li et al., 2017). For species with a large genome
like barley (4.83 Gb) (Beier et al., 2017; Mascher et al., 2017), whole
genome sequencing is still beyond the budget of most researchers. In contrast,
RNAseq is more cost-effective than whole genome sequencing for investigating genetic
variants in transcribed regions, which are more likely to be causal for changes in
phenotype.
In this study, transcriptome sequencing was adopted to identify all the genetic
mutations in transcribed regions induced by gamma radiation in the mutant
Vla-MT, derived from the barley cultivar Vlamingh. We then anchored these gamma
radiation mutations to annotated gene models and evaluated their genetic effects
on the gene coding products. We also investigated the distribution pattern
of these genetic mutations along chromosomes and their base changes.
Finally, the potential functions of genes with altered coding products were predicted
based on protein sequence similarity. Overall, this study provides insight for
a comprehensive understanding of the mechanism of mutation induced by gamma
radiation in plants, and will further accelerate the effective application of radiation
mutagenesis in exploiting important genetic resources and improving crop breeding.
6.2 Methods
Plant materials and transcriptome sequencing
Vlamingh (Vla-WT) is a high-yielding malting barley variety released by the
Department of Primary Industries and Regional Development, Western Australia. Two
kilograms of Vla-WT seeds were treated with 200 Gy of 60Co gamma radiation.
Vla-MT was one mutant selected from about 10,000 M2 individuals. Compared with its
wild type, Vla-MT is dwarfed and has more tillers. Vla-MT and the wild type
Vlamingh were grown in the Murdoch University glasshouse. Total RNA from two
replicates each of Vla-MT and Vlamingh was extracted from leaves at the six-week-old
seedling stage. RNAseq libraries were prepared following the protocol of the TruSeq
Pre-Kit (Illumina), and fragments with a length of 250-500 bp were selected and
sequenced on the HiSeq 2000 platform at the Beijing Genomics Institute.
Reads mapping and SNP/InDel calling
Raw reads (90 bp paired-end) produced by the sequencer were strictly filtered to
remove low-quality reads using Sickle (NA and JN, 2011) (version 1.33). The
quality of the cleaned data was assessed using the FastQC toolkit (Andrews). Clean reads
of each sample were mapped to the latest barley reference genome (Mascher et al.,
2017) using the aligner HISAT (Kim et al., 2015) (version 2.0.5). PCR duplicates
were removed using SAMtools (Li et al., 2009) (version 1.4.1). Only reads with a
unique mapping position in the reference were used to call SNPs and InDels for
each sample. This step was performed using the pipeline consisting of SAMtools and
BCFtools in the SAMtools package.
Eventually, SNPs and InDels were filtered with the following criteria: ⑴ all variants
within low-complexity regions (LCRs) were removed; ⑵ all variants with calling
quality less than 50 and mapping quality less than 40 were removed; ⑶ a variant site
was retained only if the number of reads supporting the non-reference allele was greater
than five and their proportion was larger than 0.2; otherwise, the variant site was
filtered out; ⑷ any SNP within the 15 bp flanking region of an InDel was filtered
out.
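The four criteria above can be sketched as a small filter. This is an illustrative
sketch, not the pipeline used in this study: the record fields and function names
are hypothetical, and three readings are assumptions. Criterion ⑵ is read as
requiring both a calling quality of at least 50 and a mapping quality of at least
40; an InDel is treated as a single position when measuring the 15 bp flank; and
only InDels that pass criteria ⑴ to ⑶ are used to mask nearby SNPs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    chrom: str
    pos: int
    kind: str         # "SNP" or "INDEL"
    qual: float       # variant calling quality
    mapq: float       # mapping quality
    alt_reads: int    # reads supporting the non-reference allele
    total_reads: int  # all reads covering the site
    in_lcr: bool = False  # True if the site lies in a low-complexity region


def passes_site_filters(v: Variant) -> bool:
    """Criteria (1) to (3), under the assumptions stated in the text above."""
    if v.in_lcr:
        return False
    if v.qual < 50 or v.mapq < 40:
        return False
    if v.total_reads == 0 or v.alt_reads <= 5:
        return False
    return v.alt_reads / v.total_reads > 0.2


def filter_variants(variants: List[Variant], indel_flank: int = 15) -> List[Variant]:
    """Apply the site filters, then drop SNPs close to a retained InDel (4)."""
    kept = [v for v in variants if passes_site_filters(v)]
    indels = [v for v in kept if v.kind == "INDEL"]
    result = []
    for v in kept:
        near_indel = any(
            i.chrom == v.chrom and abs(i.pos - v.pos) <= indel_flank for i in indels
        )
        if v.kind == "SNP" and near_indel:
            continue
        result.append(v)
    return result


# Made-up records that exercise each criterion.
demo_variants = [
    Variant("chr5H", 1000, "SNP", 60, 50, 10, 20),
    Variant("chr5H", 1500, "SNP", 30, 50, 10, 20),             # low calling quality
    Variant("chr5H", 2000, "INDEL", 80, 60, 8, 10),
    Variant("chr5H", 2010, "SNP", 90, 60, 12, 20),             # within 15 bp of InDel
    Variant("chr5H", 3000, "SNP", 90, 60, 6, 60),              # proportion 0.1
    Variant("chr5H", 4000, "SNP", 90, 60, 10, 20, in_lcr=True),
]

print([v.pos for v in filter_variants(demo_variants)])
```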
Genetic effects assessment and function annotation for mutations
The genetic effect of each mutation was predicted based on its position in the gene
models and the change in the coding product using SnpEff. Then those genes
with a significant genetic effect were chosen for function and pathway
annotation. This step was achieved using the AutoFACT program based on the
homology of their encoded proteins with well-characterized genes and proteins stored in
the NR, Swiss-Prot, KEGG, COG and GO databases.
6.3 Results
6.3.1 SNPs and InDels screening through transcriptome
sequencing
To understand the features of genetic mutations induced by gamma radiation, the
barley cultivar Vlamingh was irradiated with 200 Gy gamma rays. One dwarf mutant
(Vla-MT) was selected (Figure 6-1a). In this study, our focus is on genetic mutations
in coding regions, because these mutations are more likely to cause phenotype
changes. Transcriptome sequencing was performed for both the
wild type Vlamingh (Vla-WT) and the dwarf mutant Vla-MT to identify genetic
mutations in coding regions caused by gamma radiation. Two biological
replicates were sequenced for both Vla-MT and Vla-WT to increase the accuracy of
transcriptome sequencing and variant detection. The raw data from the sequencer for
each sample were strictly filtered and quality-checked to make them reliable for further
bioinformatic processing. As a result, about 4.67 Gb (on average) of high-quality clean
data was generated for each sample. Then, the clean reads of each sample
were aligned to the barley reference genome with the annotated high-confidence
genes as guidance. Eventually, about 95% of the clean reads of each sample were
mapped to the barley genome and around 89% of them had unique alignment
positions on barley chromosomes.
Figure 6-1 Phenotype of Vla-WT and Vla-MT and distribution of mutations in
chromosome 5 and chromosome 7.
a, overview of the phenotype of wild type Vla-WT (right) and mutant type Vla-MT
(left). b, distribution of detected mutations in chromosome 5 and 7. The X-axis indicates
the physical coordinate in the chromosome and the Y-axis indicates the number of
mutations in each 10 Mb region.
In the process of SNP and short InDel calling, only uniquely aligned reads
were used to increase accuracy. In theory, the variants relative to the barley reference
in the two replicates of each sample should be the same if sequencing errors and
false negatives could be avoided. Remarkably, the concordance rate of SNP and
InDel calling was 99.98% (45,637 out of 45,645) for the two replicates of Vla-MT and
99.99% (46,582 out of 46,585) for the two replicates of Vla-WT. This means that the
potential errors introduced during sequencing and variant calling were
well controlled (to 0.01%-0.02%), and the detected variants are
reliable for further investigating the genetic mutations induced by gamma radiation
in the mutant.
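The concordance rates quoted above follow directly from the reported counts. The
short check below recomputes them; the function name is hypothetical and the only
inputs are the four numbers given in the text.

```python
def concordance_percent(shared: int, total: int) -> float:
    """Percentage of variant sites called identically in both replicates."""
    return 100.0 * shared / total


# Counts reported in the text for the two replicates of each sample.
vla_mt = concordance_percent(45_637, 45_645)
vla_wt = concordance_percent(46_582, 46_585)

print(f"Vla-MT: {vla_mt:.2f}% concordant, {100 - vla_mt:.2f}% discordant")
print(f"Vla-WT: {vla_wt:.2f}% concordant, {100 - vla_wt:.2f}% discordant")
```

Only 8 and 3 sites, respectively, differ between replicates, which is where the
0.02% and 0.01% discordance figures come from after rounding.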
6.3.2 Genetic mutations induced by gamma ray radiation
Vla-MT was derived from Vla-WT solely by exposure to gamma ray radiation, so all
sequence differences between Vla-MT and Vla-WT are assumed to be genetic
mutations induced by gamma ray radiation. We aligned the transcriptomic sequences
of both Vla-MT and Vla-WT to the Morex reference genome sequence. We
found that 97% (42,686 out of 43,879) of the variants were identical between Vla-MT and
Vla-WT; these reflect the differences between Morex and Vlamingh. The remaining
1,193 variants differed between Vla-MT and Vla-WT and were consistently detected in
the two repeats of each line; they are assumed to be the genetic mutations induced by
gamma ray radiation.
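The filtering logic described above can be sketched with set operations. This is an illustrative sketch under stated assumptions: the function names and the toy variant calls are hypothetical, and using the symmetric difference (variants present in exactly one of the two lines) is an assumption, since the text does not spell out the exact comparison rule.

```python
from typing import Iterable, Set, Tuple

# A variant relative to the reference: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consistent_calls(rep1: Iterable[Variant], rep2: Iterable[Variant]) -> Set[Variant]:
    """Keep only variants called identically in both repeats of a line."""
    return set(rep1) & set(rep2)


def candidate_mutations(mutant: Set[Variant], wild_type: Set[Variant]) -> Set[Variant]:
    """Return variants present in exactly one of the two lines."""
    return mutant ^ wild_type


# Hypothetical toy calls, not data from this study.
mt = consistent_calls(
    [("chr5H", 100, "A", "G"), ("chr1H", 50, "C", "T")],
    [("chr5H", 100, "A", "G"), ("chr1H", 50, "C", "T"), ("chr2H", 7, "G", "A")],
)
wt = consistent_calls([("chr1H", 50, "C", "T")], [("chr1H", 50, "C", "T")])
print(candidate_mutations(mt, wt))

# The reported counts follow the same logic: all variants minus shared variants.
print(43_879 - 42_686)
```

In the toy data, the call seen in only one repeat is discarded, the call shared with the wild type is treated as a Morex/Vlamingh difference, and only the remaining call is kept as a candidate induced mutation.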
The 1,193 genetic mutations induced by gamma ray radiation are not
evenly distributed across the chromosomes (Table 6-1). Interestingly, they are mainly
located on chromosomes 5H and 7H; only three mutations were detected on
chromosome 1H and one on each of 2H, 4H and 6H. Chromosome 3H is the
only chromosome on which no mutation was identified. We also found that the
mutations on chromosomes 5H and 7H are concentrated in a few regions
(Figure 6-1b). This is particularly obvious on chromosome 5H, on
which 550 mutations lie within a 40 Mb region (chr5H:590 Mb-630 Mb).
On chromosome 7H the mutations are less concentrated than on 5H, but they still lie
in a relatively continuous region (chr7H:230 Mb-520 Mb).
Table 6-1 Genetic mutation counts in each chromosome.
Chromosome Length (Mb) Variants
1H 558.54 3
2H 768.08 1
3H 699.71 0
4H 647.06 1
5H 670.03 550
6H 583.38 1
7H 657.22 612
Un 249.77 25
Total 4,833.79 1,193
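The uneven distribution in Table 6-1 can be summarised numerically. The sketch below copies the table values into a dictionary and computes the share of mutations on 5H and 7H; the helper `window_counts` illustrates how mutation positions could be binned into 10 Mb windows as in Figure 6-1b. The helper names and the example positions are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Chromosome -> (length in Mb, number of mutations), copied from Table 6-1.
TABLE_6_1: Dict[str, Tuple[float, int]] = {
    "1H": (558.54, 3),
    "2H": (768.08, 1),
    "3H": (699.71, 0),
    "4H": (647.06, 1),
    "5H": (670.03, 550),
    "6H": (583.38, 1),
    "7H": (657.22, 612),
    "Un": (249.77, 25),
}


def window_counts(positions_bp: Iterable[int], window_bp: int = 10_000_000) -> Counter:
    """Count mutations per fixed-size window; keys are window start positions in bp."""
    return Counter((pos // window_bp) * window_bp for pos in positions_bp)


total_length_mb = sum(length for length, _ in TABLE_6_1.values())
total_mutations = sum(count for _, count in TABLE_6_1.values())
share_5h_7h = (TABLE_6_1["5H"][1] + TABLE_6_1["7H"][1]) / total_mutations

print(f"total length: {total_length_mb:.2f} Mb, total mutations: {total_mutations}")
print(f"share on 5H and 7H: {100 * share_5h_7h:.1f}%")

# Hypothetical positions on one chromosome, for illustration only.
example_bins = window_counts([591_000_000, 595_500_000, 612_000_000])
print(dict(example_bins))
```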
Among the 1,193 mutations induced by gamma ray radiation, 96.65% are
SNPs; the remainder are 26 short deletions and 14 short insertions. The exact numbers of base
changes are given in table 2. There were four dominant changes: A→G, C→T,
G→A and T→C. Both adenine (A) and guanine (G) are purine bases, differing in only two
chemical groups in the side chain.
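The four dominant base changes can be classified with a small helper. This is a sketch under an assumption: the dominant changes are read here as A to G, C to T, G to A and T to C substitutions, which is an interpretation of the notation in the text. The function name `classify_snp` is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_snp(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid SNP: {ref!r} -> {alt!r}")
    # A transition stays within one base class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


dominant_changes = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C")]
for ref, alt in dominant_changes:
    print(f"{ref} to {alt}: {classify_snp(ref, alt)}")

# SNP share implied by the counts in the text: all mutations minus short InDels.
snp_count = 1193 - 26 - 14
snp_share = 100 * snp_count / 1193
print(f"{snp_count} SNPs, {snp_share:.2f}% of all mutations")
```

Under this reading, all four dominant changes are transitions, that is, exchanges between two purines or between two pyrimidines.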
6.3.3 Genes affected by mutations
We also investigated which genes these mutations are located in and which
regions they fall into by comparing their physical positions with the predicted gene
models. In theory, all of them should be anchored to transcribed regions, because
only mRNA fragments were selected for sequencing in this study. It is
interesting to note that 14% of them nevertheless mapped to intergenic
regions and 6% to intron regions (Figure 6-2a). We speculate that
some gene regions have not yet been included in the gene
annotation, or that alternative splicing differs between germplasm
accessions. Apart from these mutations, 33% of all the detected
mutations are anchored in gene exon regions, and the others fall into upstream,
5’-UTR, 3’-UTR and downstream regions (Figure 6-2a). We
also predicted the effects of the mutations anchored in gene
regions based on the change in the coding product. As shown in the right pie chart of
Figure 6-2a, there are 209 missense mutations, three nonsense mutations, two
frameshift mutations and one disruptive frameshift mutation. The others are
synonymous mutations, which are assumed to have less effect on the coding
products.
126
Figure 6-2 Genetic effects of mutations and biological functions of the mutated
genes.
a, genetic effect prediction of all detected mutations: the left pie shows the percentage of
mutations anchored in different types of genomic regions and the right pie shows the
percentage of genetic effects of mutations anchored in exon regions. b, biological
function classification of affected genes at the cellular component, molecular function
and biological process levels. The X-axis indicates the different types of functions and
the Y-axis indicates the number and percentage of genes falling into each
functional category.
The 215 mutations classified as missense, nonsense, frameshift or disruptive
frameshift are assumed to be high-risk mutations, because they lie within
exon regions and change the coding products. These 215 mutations are harboured in
140 genes. Based on protein sequence similarity with well-characterized
proteins, these 140 genes were annotated with Gene Ontology terms. Their functions were
classified at three levels: biological process, molecular function and
cellular component (Figure 6-2b). At the biological process level, cellular process and
metabolic process are the two dominant categories among the genes carrying high-risk
mutations, followed by cellular localization and biological regulation. At the
molecular function level, binding and catalytic activity are the two main categories
for the mutated genes.
6.4 Discussion
Most previously reported mutants affect qualitative traits,
such as the single-tiller mutant (Li et al., 2003) and the never-flowering mutant (Wu et al.,
2008) in rice and the six-rowed mutant in barley (Komatsuda et al., 2007). In this
chapter, we analysed the mutations in coding regions in one mutant induced by
gamma radiation. Vla-MT, characterized by increased tiller number and reduced plant height
(Figure 6-1a), was initially selected as a qualitative mutant. In this study, we found
as many as 1,193 genetic mutations in the transcribed gene
regions alone of this barley mutant generated by gamma ray radiation. They are
concentrated on two chromosomes rather than evenly distributed across the whole
genome, and even on these two
chromosomes they lie within a few small genomic regions. One possible explanation is
that the two regions with concentrated mutations result from the
segregation of heterozygous regions in wild-type Vlamingh. However, this
possibility was excluded by examining the variants in wild-type
Vlamingh using high-coverage whole genome shotgun sequencing data.
A limitation is that we surveyed the mutation pattern in only one gamma
radiation mutant. The current result needs to be validated in a larger mutant
population. The causal gene responsible for the increased tiller number and reduced
plant height has been isolated by a map-based approach and confirmed in the present
study (data not shown). The result of this study suggests that a mutant selected after
gamma radiation may also carry wider genetic variation in other genes and traits. In
other words, it may be possible to improve quantitative traits through gamma radiation, as up
to 140 genes were modified in a single mutant in the present study.
Except for 26
short deletions and 14 short insertions, SNPs accounted for about 96.65% of
all the identified mutations. Among the SNPs, A→G, C→T, G→A and T→C are
the four major types. We assume that the chemical groups in the side chain are prone
to damage by gamma ray radiation and are then repaired by the
DNA repair machinery (Collins, 2004; Friedberg, 2003; Rastogi et al., 2010). Some
damaged bases are restored to their original chemical structure, while
others are converted to a similar chemical structure
(Friedberg, 2003). The situation is assumed to be the same for cytosine (C) and
thymine (T), because they share the same single ring and differ in two chemical
groups of the side chain.
Among the 1,193 genetic mutations, 450 are anchored in gene exon
regions and 215 are predicted to have a high-level
genetic effect, causing disruptive in-frameshift, frameshift, nonsense or missense
changes. In total, a change in the coding product was predicted for 140
genes as a result of the gamma radiation. This result contrasts with the traditional view that
only a few genes are affected and responsible for the mutant phenotype (Arite et
al., 2009; Monna, 2002; Yan et al., 2007).
Although whole genome resequencing has been used in recent years to identify mutations and
isolate mutant genes in species with less complex genomes
(Li et al., 2016a; Li et al., 2017), it is not affordable for most researchers without
abundant funding. To achieve reliable results, a minimum sequencing depth of
30 times the genome size has been suggested (Sims et al., 2014).
This means that about 150 Gb of data would be needed for each sample to investigate
mutations across the whole barley genome (Beier et al., 2017; Mascher et al., 2017). As
most variants related to loss of gene function have been reported to lie in gene coding
sequences, it should be more efficient to identify functional mutations by RNA
sequencing. In this study, an average of 4.6 Gb of RNAseq data was obtained for
each sample, which corresponds to about 78.41-fold coverage of the total length of the barley
high-confidence gene transcripts. In terms of the amount of data required (150 Gb versus
4.6 Gb), RNAseq is thus about 33 times more efficient than
whole genome sequencing. RNAseq data can also be used to analyse
differentially expressed genes between mutant and wild-type plants, which is critical for
understanding the genetic and biological basis of mutant genes. In addition, novel
transcripts can be detected to improve the gene annotation of the genome. However,
detecting variants by RNA sequencing depends largely on gene expression:
variants in genes that are not expressed at the sample
collection stage will be missed. Exome capture sequencing is another option for
targeting coding regions (Warr et al., 2015). One barley exome capture
platform has been developed to enrich exon regions of the barley genome (Mascher et al., 2013b).
This platform is based on in-solution hybridization technology and
can be used to enrich 61.6 Mb of coding regions of the barley genome (Mascher et al.,
2013b). A collection of 267 georeferenced barley landraces and wild barleys was
sequenced using this exome capture platform (Russell et al., 2016). Both RNA
sequencing and exome capture sequencing greatly reduce genomic complexity
and sequencing cost. The disadvantage of transcriptome sequencing and exome
sequencing is that variants located in introns and non-coding regions may
be missed.
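The data volumes compared in this paragraph can be checked with a short calculation. This is a rough sketch only: the 150 Gb, 4.6 Gb and 78.41-fold figures are taken from the text, while the 5 Gb genome size passed to `required_data_gb` is an assumed round number chosen to be consistent with the 150 Gb estimate, and the variable names are hypothetical.

```python
def required_data_gb(genome_size_gb: float, depth: float) -> float:
    """Sequence data needed to cover a genome at a given depth."""
    return genome_size_gb * depth


wgs_gb = 150.0           # whole genome data per sample, as quoted in the text
rnaseq_gb = 4.6          # average RNAseq data per sample, as quoted in the text
transcript_fold = 78.41  # coverage of high-confidence transcripts, as quoted in the text

# How many times less data RNAseq needs than 30-fold whole genome sequencing.
data_ratio = wgs_gb / rnaseq_gb

# Total transcript length implied by the coverage figure, in Mb.
implied_transcript_mb = rnaseq_gb * 1000 / transcript_fold

print(f"WGS at 30x for an assumed 5 Gb genome: {required_data_gb(5.0, 30):.0f} Gb")
print(f"WGS / RNAseq data ratio: {data_ratio:.1f}")
print(f"implied total transcript length: {implied_transcript_mb:.1f} Mb")
```

The implied transcript length is of the same order as the 61.6 Mb targeted by the exome capture platform mentioned above, which is what one would expect if both approaches target mostly coding sequence.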
Over the past several decades, map-based cloning has played a critical role in
identifying genes related to important agricultural traits in most crop species and
understanding the genetic basis of plant development (Fan et al., 2006; Li et al.,
2011; Li et al., 2014b; Xue et al., 2008). As molecular marker density has increased
dramatically with the application of genotyping-by-sequencing, the largest challenge
for map-based cloning is now population size, which needs to be large enough to produce
sufficient recombination events (Wang et al., 2011). However, developing such a large
population is laborious and expensive, and
it is particularly challenging for species with large genomes such as barley. By
combining induced mutation with RNA sequencing, the present study provides an
important complementary strategy to classical biparental cross-mapping for identifying
functional genes.
Chapter 7 Characterization of a barley mutant
with high-tillering and narrow leaf
Abstract
Tiller number is one of the most important agronomic traits and a determinant of grain
yield in cereal crops. Several key genes controlling tiller development have been
isolated in rice and Arabidopsis in the past decade. However, the genetic basis of
tiller development is still not clear. In this study, we attempt to narrow down the
candidate genes responsible for increased tiller number by comparing genomic
variants in the target region between a pair of near-isogenic lines. The target region
delimited by two SNP markers covers about 10 Mb and contains 139 high-
confidence genes (Druka et al., 2011). In total, 225 variants were detected in the
target region between the near-isogenic lines after transcriptome sequencing. Four
genes in the target region contain frameshift mutations and eleven genes contain
missense mutations. These fifteen genes were assumed to be candidate genes
responsible for the mutant phenotype. This study provides new insights to accelerate
the isolation of functional QTLs by combining traditional linkage mapping and
transcriptome sequencing.
7.1 Introduction
Tiller number is one of the three major components of grain yield in cereal crops.
Tiller development involves two stages: formation of dormant axillary buds
and bud outgrowth. The final number of tillers is determined jointly by these two processes.
Cytokinin, auxin and strigolactone are three plant hormones that participate in the
regulation of bud outgrowth (Domagalska and Leyser, 2011). Cytokinin, an
activator of dormant axillary buds, is mostly synthesized in the root and
transported upwards in the xylem (Tanaka et al., 2006). Auxin is assumed to
inhibit the outgrowth of axillary buds indirectly, because it does not enter the buds
(Bonner, 1933; Zhao, 2010). It is mostly synthesized at the shoot apex and
transported downward through the polar auxin transport stream. Strigolactone, which
can be produced by both shoots and roots, suppresses
the activation of dormant axillary buds (Gomez-Roldan et al., 2008; Lin et al., 2009).
In the past decade, several genes have been identified in the strigolactone
(SL) pathway regulating tiller development in rice and Arabidopsis. For example, D27 (Lin et
al., 2009), HTD1 (Zou et al., 2006) and D10 (Arite et al., 2007) are required for
SL synthesis. D27 is mainly expressed in vascular cells of roots and shoots and
encodes an iron-containing protein. The d27 mutant exhibits increased tillering and a dwarf
phenotype in rice, which can be rescued by exogenous SLs (Lin et al., 2009).
HTD1 is the rice orthologue of Arabidopsis MAX3 and encodes a carotenoid
cleavage dioxygenase required for SL synthesis (Booker et al., 2004;
Zou et al., 2006). The htd1 mutant exhibits a high number of tillers and reduced plant
height, which can be rescued by the introduction of Pro35S:HTD1. D10 is the rice
orthologue of Arabidopsis MAX4 and encodes carotenoid cleavage dioxygenase 8.
D14 (Nakamura et al., 2013), D3 (Ishikawa et al., 2005), D53 (Jiang et al., 2013;
Zhou et al., 2013) and IPA1 (Jiao et al., 2010; Miura et al., 2010) participate in
SL signalling. D14 encodes an α/β-hydrolase protein that participates in
SL perception (Chevalier et al., 2014; Nakamura et al., 2013). D3 is the
rice orthologue of Arabidopsis MAX2 and encodes an F-box leucine-rich
repeat protein (Ishikawa et al., 2005). The d53/e9 mutant exhibits increased tillering and a dwarf
phenotype and is resistant to treatment with exogenous SLs (Zhou et al., 2013).
Knocking down the expression of D53 rescues the mutant phenotype of d14 and
d3, while overexpression has no effect on the phenotype. Meanwhile, degradation
of D53 can be detected in the d27 mutant treated with exogenous SLs, but
not in d3 or d14. Zhou et al. (2013) and Jiang et al. (2013) therefore concluded that D53
acts downstream of D14 and D3. IPA1 reduces rice tiller number and
encodes SQUAMOSA PROMOTER BINDING PROTEIN-LIKE 14 (OsSPL14)
(Jiao et al., 2010; Miura et al., 2010). Song et al. (2017) demonstrated that IPA1 is a
direct downstream target of D53 and that D53 negatively regulates the expression of
IPA1. However, little is known about the genetic basis and molecular mechanism of
tiller development in barley.
In this study, we aim to narrow down candidate genes responsible for the mutant
phenotype in a high-tillering mutant using transcriptome sequencing.
7.2 Methods
Materials
BW370 was obtained from the NordGen genebank (http://www.nordgen.org/, stock
number BW370 or BGS-73) (Druka et al., 2011). BW370 is a near-isogenic line
(NIL) of Bowman carrying a fragment of chromosome 2H from an X-ray induced mutant
of cv. Proctor. The generation of BW370 is shown in Figure 7-1. Briefly,
BW370 was generated by six rounds of backcrossing, using Bowman as the recurrent
parent and the Proctor mutant as the donor parent. BW370 is featured by narrower leaf,
higher number of tiller and grass
Figure 7-1 The generation of BW370.
Transcriptome sequencing
Total RNA of two replicates of BW370 was extracted from leaves four weeks after
sowing. RNAseq libraries were prepared following the protocol of the TruSeq Pre-Kit
(Illumina), and fragments of 250-500 bp were selected and sequenced on
the HiSeq 2000 platform at the Beijing Genomics Institute-Shenzhen. RNA sequencing
generated 35,173,858 and 34,832,902 clean reads with a read length of 90 bp for the two
replicates of BW370. Whole genome sequencing data of Bowman were downloaded
from EMBL/ENA under accession ERP001449; these data were produced by the
International Barley Genome Sequencing Consortium in 2012 (International Barley
Genome Sequencing et al., 2012).
Reads mapping and variants detection
Raw reads (90 bp paired-end) produced by the sequencer were filtered
and trimmed to remove poor-quality bases and sequencing adapter contamination using
cutadapt. The quality of the cleaned data was assessed using the FastQC toolkit
(Andrews). Clean reads of each sample were mapped to the latest barley reference
genome (Mascher et al., 2017) using STAR (version 020201) with default arguments
(Dobin et al., 2013). PCR duplicates were removed using SAMtools (Li et al., 2009)
(version 1.4.1). Only reads with a unique position in the reference were used for calling
SNPs and InDels for each sample. This step was performed using a pipeline
consisting of SAMtools and BCFtools from the SAMtools package.
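The mapping and variant calling steps described above could be assembled along the following lines. This is a hedged sketch, not the commands actually run in the study: the exact options are not given in the text, so the file names, the function `build_commands` and the choice of options are assumptions. It assumes an existing STAR genome index, uncompressed FASTQ files, and that uniquely mapped reads are selected by STAR's default mapping quality of 255 for unique alignments. The function only builds command strings and does not execute anything.

```python
import shlex
from typing import List


def build_commands(sample: str, fq1: str, fq2: str,
                   genome_dir: str, ref_fasta: str) -> List[str]:
    """Return shell command strings for one sample; nothing is executed here."""
    q = shlex.quote
    prefix = f"{sample}."
    aligned = f"{prefix}Aligned.sortedByCoord.out.bam"  # STAR's name for sorted BAM output
    dedup = f"{prefix}dedup.bam"
    unique = f"{prefix}unique.bam"
    vcf = f"{prefix}variants.vcf"
    return [
        # 1. Align paired-end reads with STAR, otherwise using default settings.
        f"STAR --genomeDir {q(genome_dir)} --readFilesIn {q(fq1)} {q(fq2)} "
        f"--outSAMtype BAM SortedByCoordinate --outFileNamePrefix {q(prefix)}",
        # 2. Remove PCR duplicates.
        f"samtools rmdup {q(aligned)} {q(dedup)}",
        # 3. Keep uniquely mapped reads only (assumed MAPQ 255 for unique alignments).
        f"samtools view -b -q 255 -o {q(unique)} {q(dedup)}",
        # 4. Pile up reads and call SNPs and InDels.
        f"samtools mpileup -u -g -f {q(ref_fasta)} {q(unique)} | "
        f"bcftools call -m -v -o {q(vcf)}",
    ]


# Hypothetical sample and file names, for illustration only.
commands = build_commands("BW370_rep1", "rep1_R1.fq", "rep1_R2.fq",
                          "star_index", "barley_ref.fa")
for command in commands:
    print(command)
```

Keeping the steps as plain command strings makes it easy to inspect them before running and to adapt the options to the tool versions actually installed.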
Genetic effects prediction and gene function annotation
The genetic effects of the detected variants were assessed using SnpEff (Cingolani et al.,
2012), based on the position of each variant in the annotated gene model and
on how it changes the coding product. Gene function annotation was performed using
AutoFACT, based on sequence homology with well-characterized genes stored
in the NR, Swiss-Prot, KEGG, COG and GO databases.
7.3 Result
7.3.1 Phenotype of BW370 and Bowman
BW370 is a near-isogenic line (NIL) of the barley cultivar Bowman. It was developed
by crossing an X-ray induced mutant of Proctor with Bowman; offspring with the
mutant phenotype were then backcrossed five times with cv. Bowman as the recurrent parent
(Druka et al., 2011). As shown in Figure 7-2a, BW370 has narrower
leaves and more tillers than Bowman. The photo was taken four weeks after sowing. At
that stage, the flag leaf shown in Figure 7-2b was about 18 cm long in BW370 and about
25 cm long in Bowman. The blade width ranged from 0.4 to 0.6 cm in BW370 and
from 0.8 to 1.0 cm in Bowman (Figure 7-2c). The average leaf length
was about 19 cm in BW370 and 21 cm in Bowman (Figure 7-2d). The morphology of
Vla-MT and its wild type cv. Vlamingh is shown in Figure 6-1a in the previous
chapter. Briefly, Vla-MT is also characterized by a dwarf habit with narrow leaves and a high
number of tillers.
Figure 7-2 Phenotype of BW370 and Bowman.
a, Photos of mutant BW370 and wild plant Bowman; b, photos of the flag leaf of
wild plant Bowman and BW370; c, box plot of leaf width of BW370 and Bowman;
d, box plot of leaf length of BW370 and Bowman.
7.3.2 Narrow down candidate genes by RNAseq
According to a previous study, the target region of the mutant gene in BW370 is
delimited by SNP markers 1_0214 and 1_1250 (Druka et al., 2011). This
was based on a genotype comparison of 3,072 SNPs between BW370 and its recurrent
parent Bowman. After anchoring the sequences of 1_0214 and 1_1250 to the barley
reference genome, their physical positions were chr2H:672,009,517 and
chr2H:682,243,690. The candidate region therefore covers 10,234,173 bp. In this candidate
region, there are 139 annotated high-confidence genes.
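The size of the candidate region and the test for whether a gene falls inside it can be sketched as follows. The marker positions come from the text and the first gene's coordinates are those listed for HORVU2Hr1G096070.9 in Table 7-1; the `Region` type, the helper names and the second gene are hypothetical and serve only as an illustration.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def span_bp(region: Region) -> int:
    """Distance in bp between the two boundaries of a region."""
    return region.end - region.start


def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


# Interval delimited by SNP markers 1_0214 and 1_1250.
target = Region("chr2H", 672_009_517, 682_243_690)

gene_inside = Region("chr2H", 672_895_170, 672_898_932)   # HORVU2Hr1G096070.9
gene_outside = Region("chr2H", 700_000_000, 700_005_000)  # hypothetical gene

print(span_bp(target))
print(contains(target, gene_inside), contains(target, gene_outside))
```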
In the target region (about 10 Mb), 225 variants relative to cv. Bowman were detected in both
replicates of BW370. The variants comprise 196 SNPs, 15
insertions and 14 deletions. The effects of the variants were predicted from their
position in the gene models and the change in the coding product. Frameshift and
missense variants are the two types that change the coding product. Four genes
carried frameshift mutations and eleven genes carried missense mutations (Table
7-1).
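The split of variants into SNPs, insertions and deletions can be derived from the reference and alternative alleles. The sketch below assumes VCF-style allele strings; the function name `variant_type` and the example alleles are hypothetical, and only the three counts at the end are taken from the text.

```python
from collections import Counter


def variant_type(ref: str, alt: str) -> str:
    """Classify a variant from VCF-style reference and alternative alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. multi-base substitutions of equal length


# Hypothetical alleles, for illustration only.
example_alleles = [("A", "G"), ("C", "CTT"), ("GAT", "G"), ("T", "C")]
type_counts = Counter(variant_type(ref, alt) for ref, alt in example_alleles)
print(dict(type_counts))

# Counts reported in the text for the target region.
reported = {"SNP": 196, "insertion": 15, "deletion": 14}
print(sum(reported.values()))
```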
Table 7-1 Candidate genes and function description
No.  Gene                 Chr    Start      End        Function description                                Frameshift  Missense
1    HORVU2Hr1G096070.9   chr2H  672895170  672898932  CYCLIN B2;4                                         2           0
2    HORVU2Hr1G096080.12  chr2H  672929572  672934477  Transmembrane protein 184C                          1           0
3    HORVU2Hr1G096250.4   chr2H  673529137  673531288  L-ascorbate oxidase homolog                         1           0
4    HORVU2Hr1G096300.1   chr2H  673848612  673856734  Lysine-specific histone demethylase 1 homolog 3     1           0
5    HORVU2Hr1G096360.13  chr2H  674156225  674158128  Aquaporin-like superfamily protein                  0           2
6    HORVU2Hr1G096820.1   chr2H  676940821  676941724  Cactin                                              0           2
7    HORVU2Hr1G096960.3   chr2H  677191475  677199159  glutathione peroxidase 6                            0           1
8    HORVU2Hr1G097250.5   chr2H  678234539  678253890  serine carboxypeptidase-like 19                     0           1
9    HORVU2Hr1G097270.14  chr2H  678269589  678296530  serine carboxypeptidase-like 19                     0           1
10   HORVU2Hr1G097380.3   chr2H  678621349  678624087  pectinesterase 11                                   0           1
11   HORVU2Hr1G097800.8   chr2H  680457446  680461502  Cysteine and histidine-rich domain-containing prot  0           1
12   HORVU2Hr1G098010.1   chr2H  681647064  681648828  nudix hydrolase homolog 8                           0           1
13   HORVU2Hr1G098020.29  chr2H  681778557  681787864  U3 small nucleolar RNA-associated protein 10        0           1
14   HORVU2Hr1G098100.2   chr2H  681924154  681927069  Disease resistance protein                          0           1
7.4 Discussion
In this study, we used transcriptome sequencing to compare the sequences
of a target region of about 10 Mb between a pair of NILs. There are 139
high-confidence genes annotated in the target region. Transcriptome sequencing
revealed that four genes in the target region have frameshift mutations and eleven genes
have missense mutations. These fifteen genes are therefore the most likely candidates for
the mutant phenotype in BW370. However, more precise experiments are needed to
determine which gene causes the mutant phenotype and in which pathway it
acts to regulate tiller development in barley.
Our study demonstrates that combining traditional gene mapping
based on biparental populations with high throughput sequencing technology can
accelerate the identification of functional QTLs. The dramatic advance of DNA
sequencing makes it possible to detect sequence variants across the whole genome at
single-nucleotide resolution. Barley is diploid (2n=2x=14) with an estimated haploid
genome of 5.1 Gb, which is 1.5 times the size of the human genome and 13.4 times the
size of the rice genome. As a result, comparing sequence differences between individuals by
whole genome sequencing is quite expensive in barley. RNA sequencing
is a good alternative for investigating sequence differences in barley: it focuses on
the transcribed regions of interest rather than the whole genome, so it is an efficient
strategy for investigating genetic diversity in gene coding regions only.
Map-based cloning and genome-wide association studies are the two most popular
approaches for identifying functional genes in crops such as rice (Huang et al., 2010; Huang
et al., 2011b; Wang et al., 2011). Up to 2000 genes controlling important agronomic
traits have been cloned in rice, providing vital gene resources for the rational
breeding of competitive rice varieties (Yao et al., 2018; Zeng et al., 2017).
However, functional gene isolation in barley is hampered by the large genome
size. In species with large genomes such as barley, gene mapping using mutants
has proved to be a relatively efficient approach. The Bowman mutant-containing isogenic
lines are the major mutant resource in barley (Druka et al., 2011). In recent years,
dozens of functional genes related to important agronomic traits have been isolated,
such as VRS2 (Youssef et al., 2017), VRS3 (Bull et al., 2017) and Zeo (Houston et al., 2013).
Combining mutant resources with high throughput DNA sequencing technology
greatly facilitates functional gene mapping. Jiao et al. (2018) performed
whole genome sequencing of one mutant bulk and one wild-type bulk and identified
MSD1, which regulates pedicellate spikelet fertility in sorghum. Given the 5.1 Gb
barley genome, it remains expensive to perform whole genome sequencing for gene
mapping in barley. Mascher et al. (2014) adopted an exome capture sequencing
strategy to isolate the mutant gene responsible for the many-noded dwarf (mnd)
phenotype, setting a good example of mutant gene isolation by high throughput
sequencing. In this chapter, we detected the mutations in gene coding
regions of the target fragment using RNA sequencing. However, we still cannot
pinpoint the causal gene for the mutant phenotype, because background mutations
cannot be excluded. In addition, mutant gene mapping by RNA sequencing can only
isolate genes that are expressed at the sample collection stage. In future, an F2
segregating population will still be required to exclude the background
mutations.
Reference
Abe, A., Kosugi, S., Yoshida, K., Natsume, S., Takagi, H., Kanzaki, H., Matsumura,
H., Yoshida, K., Mitsuoka, C., Tamiru, M., et al. (2012). Genome sequencing reveals
agronomically important loci in rice using MutMap. Nat Biotechnol 30, 174-178.
Ahloowalia, B.S., and Maluszynski, M. (2001). Induced mutations - A new paradigm
in plant breeding. Euphytica 118, 167-173.
Ahloowalia, B.S., Maluszynski, M., and Nichterlein, K. (2004). Global impact of
mutation-derived varieties. Euphytica 135, 187-204.
Alexandrov, N., Tai, S., Wang, W., Mansueto, L., Palis, K., Fuentes, R.R., Ulat, V.J.,
Chebotarov, D., Zhang, G., Li, Z., et al. (2015). SNP-Seek database of SNPs derived
from 3000 rice genomes. Nucleic Acids Res 43, D1023-1027.
Alter, S., Bader, K.C., Spannagl, M., Wang, Y., Bauer, E., Schon, C.C., and Mayer,
K.F. (2015). DroughtDB: an expert-curated compilation of plant drought stress genes
and their homologs in nine species. Database (Oxford) 2015, bav046.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic
local alignment search tool. J Mol Biol 215, 403-410.
Andrews, S. FastQC: a quality control tool for high throughput sequence data.
Andrews, S. (2014). FastQC: a quality control tool for high throughput sequence
data. Version 0.11.2. Babraham Institute, Cambridge, UK.
http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Arabidopsis Genome, I. (2000). Analysis of the genome sequence of the flowering
plant Arabidopsis thaliana. Nature 408, 796-815.
Arite, T., Iwata, H., Ohshima, K., Maekawa, M., Nakajima, M., Kojima, M.,
Sakakibara, H., and Kyozuka, J. (2007). DWARF10, an RMS1/MAX4/DAD1
ortholog, controls lateral bud outgrowth in rice. Plant J 51, 1019-1029.
Arite, T., Umehara, M., Ishikawa, S., Hanada, A., Maekawa, M., Yamaguchi, S., and
Kyozuka, J. (2009). d14, a strigolactone-insensitive mutant of rice, shows an
accelerated outgrowth of tillers. Plant Cell Physiol 50, 1416-1424.
Ariyadasa, R., Mascher, M., Nussbaumer, T., Schulte, D., Frenkel, Z., Poursarebani,
N., Zhou, R., Steuernagel, B., Gundlach, H., Taudien, S., et al. (2014). A sequence-ready
physical map of barley anchored genetically by two million single-nucleotide
polymorphisms. Plant Physiol 164, 412-423.
Atwell, S., Huang, Y.S., Vilhjalmsson, B.J., Willems, G., Horton, M., Li, Y., Meng,
D., Platt, A., Tarone, A.M., Hu, T.T., et al. (2010). Genome-wide association study
of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627-631.
Badr, A., Muller, K., Schafer-Pregl, R., El Rabey, H., Effgen, S., Ibrahim, H.H.,
Pozzi, C., Rohde, W., and Salamini, F. (2000). On the origin and domestication
history of Barley (Hordeum vulgare). Mol Biol Evol 17, 499-510.
Bauer, E., Schmutzer, T., Barilar, I., Mascher, M., Gundlach, H., Martis, M.M.,
Twardziok, S.O., Hackauf, B., Gordillo, A., Wilde, P., et al. (2017). Towards a
whole-genome sequence for rye (Secale cereale L.). Plant J 89, 853-869.
Beier, S., Himmelbach, A., Colmsee, C., Zhang, X.Q., Barrero, R.A., Zhang, Q., Li,
L., Bayer, M., Bolser, D., Taudien, S., et al. (2017). Construction of a map-based
reference genome sequence for barley, Hordeum vulgare L. Sci Data 4, 170044.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate - a
Practical and Powerful Approach to Multiple Testing. J R Stat Soc Series B Stat
Methodol 57, 289-300.
Bikard, D., Patel, D., Le Mette, C., Giorgi, V., Camilleri, C., Bennett, M.J., and
Loudet, O. (2009). Divergent evolution of duplicate genes leads to genetic
incompatibilities within A. thaliana. Science 323, 623-626.
Bird, C.E., Fernandez-Silva, I., Skillings, D.J., and Toonen, R.J. (2012). Sympatric
Speciation in the Post “Modern Synthesis” Era of Evolutionary Biology. Evol Biol
39, 158-180.
Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer
for Illumina sequence data. Bioinformatics 30, 2114-2120.
Bolnick, D.I., and Fitzpatrick, B.M. (2007). Sympatric Speciation: Models and
Empirical Evidence. Annu Rev Ecol Evol Syst 38, 459-487.
Bonner, J. (1933). Studies on the Growth Hormone of Plants: IV. On the Mechanism
of the Action. Proc Natl Acad Sci U S A 19, 717-719.
Booker, J., Auldridge, M., Wills, S., McCarty, D., Klee, H., and Leyser, O. (2004).
MAX3/CCD7 is a carotenoid cleavage dioxygenase required for the synthesis of a
novel plant signaling molecule. Curr Biol 14, 1232-1238.
Buckler, E.S., Holland, J.B., Bradbury, P.J., Acharya, C.B., Brown, P.J., Browne, C.,
Ersoz, E., Flint-Garcia, S., Garcia, A., Glaubitz, J.C., et al. (2009). The genetic
architecture of maize flowering time. Science 325, 714-718.
Bull, H., Casao, M.C., Zwirek, M., Flavell, A.J., Thomas, W.T.B., Guo, W., Zhang,
R., Rapazote-Flores, P., Kyriakidis, S., Russell, J., et al. (2017). Barley SIX-
ROWED SPIKE3 encodes a putative Jumonji C-type H3K9me2/me3 demethylase
that represses lateral spikelet fertility. Nat Commun 8, 936.
Burset, M., Seledtsov, I.A., and Solovyev, V.V. (2000). Analysis of canonical and
non-canonical splice sites in mammalian genomes. Nucleic Acids Res 28, 4364-
4375.
Buschges, R., Hollricher, K., Panstruga, R., Simons, G., Wolter, M., Frijters, A., van
Daelen, R., van der Lee, T., Diergaarde, P., Groenendijk, J., et al. (1997). The barley
Mlo gene: a novel control element of plant pathogen resistance. Cell 88, 695-705.
Carollo, V., Matthews, D.E., Lazo, G.R., Blake, T.K., Hummel, D.D., Lui, N., Hane,
D.L., and Anderson, O.D. (2005). GrainGenes 2.0. an improved resource for the
small-grains community. Plant Physiol 139, 643-651.
Chalhoub, B., Denoeud, F., Liu, S., Parkin, I.A., Tang, H., Wang, X., Chiquet, J.,
Belcram, H., Tong, C., Samans, B., et al. (2014). Plant genetics. Early allopolyploid
evolution in the post-Neolithic Brassica napus oilseed genome. Science 345, 950-
953.
Chen, J., Ding, J., Ouyang, Y., Du, H., Yang, J., Cheng, K., Zhao, J., Qiu, S., Zhang,
X., Yao, J., et al. (2008). A triallelic system of S5 is a major regulator of the
reproductive barrier and compatibility of indica-japonica hybrids in rice. Proc Natl
Acad Sci U S A 105, 11436-11441.
Chevalier, F., Nieminen, K., Sanchez-Ferrero, J.C., Rodriguez, M.L., Chagoyen, M.,
Hardtke, C.S., and Cubas, P. (2014). Strigolactone promotes degradation of
DWARF14, an alpha/beta hydrolase essential for strigolactone signaling in
Arabidopsis. Plant Cell 26, 1134-1150.
Cingolani, P., Platts, A., Wang le, L., Coon, M., Nguyen, T., Wang, L., Land, S.J.,
Lu, X., and Ruden, D.M. (2012). A program for annotating and predicting the effects
of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila
melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92.
Close, T.J., Bhat, P.R., Lonardi, S., Wu, Y., Rostoks, N., Ramsay, L., Druka, A.,
Stein, N., Svensson, J.T., Wanamaker, S., et al. (2009). Development and
implementation of high-throughput SNP genotyping in barley. BMC Genomics 10,
582.
Cockram, J., White, J., Zuluaga, D.L., Smith, D., Comadran, J., Macaulay, M., Luo,
Z., Kearsey, M.J., Werner, P., Harrap, D., et al. (2010). Genome-wide association
mapping to candidate polymorphism resolution in the unsequenced barley genome.
Proc Natl Acad Sci U S A 107, 21611-21616.
Collins, A.R. (2004). The comet assay for DNA damage and repair: principles,
applications, and limitations. Mol Biotechnol 26, 249-261.
Colmsee, C., Beier, S., Himmelbach, A., Schmutzer, T., Stein, N., Scholz, U., and
Mascher, M. (2015). BARLEX - the Barley Draft Genome Explorer. Mol Plant 8,
964-966.
Dai, F., Chen, Z.H., Wang, X., Li, Z., Jin, G., Wu, D., Cai, S., Wang, N., Wu, F.,
Nevo, E., et al. (2014). Transcriptome profiling reveals mosaic genomic origins of
modern cultivated barley. Proc Natl Acad Sci U S A 111, 13403-13408.
Dai, F., Nevo, E., Wu, D., Comadran, J., Zhou, M., Qiu, L., Chen, Z., Beiles, A.,
Chen, G., and Zhang, G. (2012). Tibet is one of the centers of domestication of
cultivated barley. Proc Natl Acad Sci U S A 109, 16969-16973.
Dai, F., Wang, X., Zhang, X.Q., Chen, Z., Nevo, E., Jin, G., Wu, D., Li, C., and
Zhang, G. (2018). Assembly and analysis of a qingke reference genome demonstrate
its close genetic relation to modern cultivated barley. Plant Biotechnol J 16, 760-770.
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P.,
Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq
aligner. Bioinformatics 29, 15-21.
Domagalska, M.A., and Leyser, O. (2011). Signal integration in the control of shoot
branching. Nat Rev Mol Cell Biol 12, 211-221.
Druka, A., Franckowiak, J., Lundqvist, U., Bonar, N., Alexander, J., Houston, K.,
Radovic, S., Shahinnia, F., Vendramin, V., Morgante, M., et al. (2011). Genetic
dissection of barley morphology and development. Plant Physiol 155, 617-627.
Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D.,
Baybayan, P., Bettman, B., et al. (2009). Real-time DNA sequencing from single
polymerase molecules. Science 323, 133-138.
Fan, C., Xing, Y., Mao, H., Lu, T., Han, B., Xu, C., Li, X., and Zhang, Q. (2006).
GS3, a major QTL for grain length and weight and minor QTL for grain width and
thickness in rice, encodes a putative transmembrane protein. Theor Appl Genet 112,
1164-1171.
Fan, Y., Zhou, G., Shabala, S., Chen, Z.H., Cai, S., Li, C., and Zhou, M. (2016).
Genome-Wide Association Study Reveals a New QTL for Salinity Tolerance in
Barley (Hordeum vulgare L.). Front Plant Sci 7, 946.
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum
likelihood approach. J Mol Evol 17, 368-376.
Friedberg, E.C. (2003). DNA damage and repair. Nature 421, 436-440.
Frommer, M., Mcdonald, L.E., Millar, D.S., Collis, C.M., Watt, F., Grigg, G.W.,
Molloy, P.L., and Paul, C.L. (1992). A Genomic Sequencing Protocol That Yields a
Positive Display of 5-Methylcytosine Residues in Individual DNA Strands. Proc Natl
Acad Sci U S A 89, 1827-1831.
Fuller, D.Q., and Allaby, R. (2009). Seed Dispersal and Crop Domestication:
Shattering, Germination and Seasonality in Evolution under Cultivation. In Fruit
Development and Seed Dispersal (Wiley-Blackwell), pp. 238-295.
Garrido-Cardenas, J.A., Garcia-Maroto, F., Alvarez-Bermejo, J.A., and Manzano-
Agugliaro, F. (2017). DNA Sequencing Sensors: An Overview. Sensors (Basel) 17.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S.,
Ellis, B., Gautier, L., Ge, Y., Gentry, J., et al. (2004). Bioconductor: open software
development for computational biology and bioinformatics. Genome Biol 5, R80.
Gigon, A., Matos, A.R., Laffray, D., Zuily-Fodil, Y., and Pham-Thi, A.T. (2004).
Effect of drought stress on lipid metabolism in the leaves of Arabidopsis thaliana
(ecotype Columbia). Ann Bot 94, 345-351.
Godfray, H.C., Beddington, J.R., Crute, I.R., Haddad, L., Lawrence, D., Muir, J.F.,
Pretty, J., Robinson, S., Thomas, S.M., and Toulmin, C. (2010). Food security: the
challenge of feeding 9 billion people. Science 327, 812-818.
Goff, S.A., Ricke, D., Lan, T.H., Presting, G., Wang, R., Dunn, M., Glazebrook, J.,
Sessions, A., Oeller, P., Varma, H., et al. (2002). A draft sequence of the rice
genome (Oryza sativa L. ssp. japonica). Science 296, 92-100.
Gomez-Roldan, V., Fermas, S., Brewer, P.B., Puech-Pages, V., Dun, E.A., Pillot,
J.P., Letisse, F., Matusova, R., Danoun, S., Portais, J.C., et al. (2008). Strigolactone
inhibition of shoot branching. Nature 455, 189-194.
Gonzalez-Guzman, M., Pizzio, G.A., Antoni, R., Vera-Sirera, F., Merilo, E., Bassel,
G.W., Fernandez, M.A., Holdsworth, M.J., Perez-Amador, M.A., Kollist, H., et al.
(2012). Arabidopsis PYR/PYL/RCAR receptors play a major role in quantitative
regulation of stomatal aperture and transcriptional response to abscisic acid. Plant
Cell 24, 2483-2496.
Graeber, K., Nakabayashi, K., Miatton, E., Leubner-Metzger, G., and Soppe, W.J.
(2012). Molecular mechanisms of seed dormancy. Plant Cell Environ 35, 1769-1786.
Greene, E.A., Codomo, C.A., Taylor, N.E., Henikoff, J.G., Till, B.J., Reynolds, S.H.,
Enns, L.C., Burtner, C., Johnson, J.E., Odden, A.R., et al. (2003). Spectrum of
chemically induced mutations from a large-scale reverse-genetic screen in
Arabidopsis. Genetics 164, 731-740.
Hadid, Y., Pavlíček, T., Beiles, A., Ianovici, R., Raz, S., and Nevo, E. (2014).
Sympatric incipient speciation of spiny mice Acomys at “Evolution Canyon,” Israel.
Proc Natl Acad Sci U S A 111, 1043-1048.
Hajjar, R., and Hodgkin, T. (2007). The use of wild relatives in crop improvement: a
survey of developments over the last 20 years. Euphytica 156, 1-13.
Han, F., Ullrich, S.E., Clancy, J.A., Jitkov, V., Kilian, A., and Romagosa, I. (1996).
Verification of barley seed dormancy loci via linked molecular markers. Theor Appl
Genet 92, 87-91.
Harlan, J.R., and Zohary, D. (1966). Distribution of wild wheats and barley. Science
153, 1074-1080.
Hinze, K., Thompson, R.D., Ritter, E., Salamini, F., and Schulze-Lefert, P. (1991).
Restriction fragment length polymorphism-mediated targeting of the ml-o resistance
locus in barley (Hordeum vulgare). Proc Natl Acad Sci U S A 88, 3691-3695.
Hofmann, N. (2014). Cryptochromes and seed dormancy: the molecular mechanism
of blue light inhibition of grain germination. Plant Cell 26, 846.
Holton, T.A. (2002). Identification and mapping of polymorphic SSR markers from
expressed gene sequences of barley and wheat. Mol Breed 9, 63-71.
Houston, K., McKim, S.M., Comadran, J., Bonar, N., Druka, I., Uzrek, N., Cirillo,
E., Guzy-Wrobelska, J., Collins, N.C., Halpin, C., et al. (2013). Variation in the
interaction between alleles of HvAPETALA2 and microRNA172 determines the
density of grains on the barley inflorescence. Proc Natl Acad Sci U S A 110, 16675-
16680.
Huang, S., Zhang, J., Li, R., Zhang, W., He, Z., Lam, T.W., Peng, Z., and Yiu, S.M.
(2011a). SOAPsplice: Genome-Wide ab initio Detection of Splice Junctions from
RNA-Seq Data. Front Genet 2, 46.
Huang, X., Kurata, N., Wei, X., Wang, Z.X., Wang, A., Zhao, Q., Zhao, Y., Liu, K.,
Lu, H., Li, W., et al. (2012a). A map of rice genome variation reveals the origin of
cultivated rice. Nature 490, 497-501.
Huang, X., Wei, X., Sang, T., Zhao, Q., Feng, Q., Zhao, Y., Li, C., Zhu, C., Lu, T.,
Zhang, Z., et al. (2010). Genome-wide association studies of 14 agronomic traits in
rice landraces. Nat Genet 42, 961-967.
Huang, X., Zhao, Y., Wei, X., Li, C., Wang, A., Zhao, Q., Li, W., Guo, Y., Deng, L.,
Zhu, C., et al. (2011b). Genome-wide association study of flowering time and grain
yield traits in a worldwide collection of rice germplasm. Nat Genet 44, 32-39.
Huang, X.H., Zhao, Y., Wei, X.H., Li, C.Y., Wang, A., Zhao, Q., Li, W.J., Guo,
Y.L., Deng, L.W., Zhu, C.R., et al. (2012b). Genome-wide association study of
flowering time and grain yield traits in a worldwide collection of rice germplasm.
Nat Genet 44, 32-U53.
Ibarra, S.E., Tognacca, R.S., Dave, A., Graham, I.A., Sanchez, R.A., and Botto, J.F.
(2016). Molecular mechanisms underlying the entrance in secondary dormancy of
Arabidopsis seeds. Plant Cell Environ 39, 213-221.
International Barley Genome Sequencing Consortium, Mayer, K.F., Waugh, R., Brown, J.W.,
Schulman, A., Langridge, P., Platzer, M., Fincher, G.B., Muehlbauer, G.J., Sato, K.,
et al. (2012). A physical, genetic and functional sequence assembly of the barley
genome. Nature 491, 711-716.
International Rice Genome Sequencing Project (2005). The map-based sequence of the
rice genome. Nature 436, 793-800.
International Wheat Genome Sequencing Consortium (2014). A chromosome-based draft
sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345,
1251788.
Ishikawa, S., Maekawa, M., Arite, T., Onishi, K., Takamure, I., and Kyozuka, J.
(2005). Suppression of tiller bud activity in tillering dwarf mutants of rice. Plant Cell
Physiol 46, 79-86.
Jaccoud, D., Peng, K., Feinstein, D., and Kilian, A. (2001). Diversity arrays: a solid
state technology for sequence information independent genotyping. Nucleic Acids
Res 29, E25.
Jia, Q.J., Tan, C., Wang, J.M., Zhang, X.Q., Zhu, J.H., Luo, H., Yang, J.M.,
Westcott, S., Broughton, S., Moody, D., et al. (2016). Marker development using
SLAF-seq and whole-genome shotgun strategy to fine-map the semi-dwarf gene ari-e
in barley. BMC Genomics 17, 911.
Jiang, L., Liu, X., Xiong, G., Liu, H., Chen, F., Wang, L., Meng, X., Liu, G., Yu, H.,
Yuan, Y., et al. (2013). DWARF 53 acts as a repressor of strigolactone signalling in
rice. Nature 504, 401-405.
Jiao, Y., Lee, Y.K., Gladman, N., Chopra, R., Christensen, S.A., Regulski, M.,
Burow, G., Hayes, C., Burke, J., Ware, D., et al. (2018). MSD1 regulates pedicellate
spikelet fertility in sorghum through the jasmonic acid pathway. Nat Commun 9,
822.
Jiao, Y., Peluso, P., Shi, J., Liang, T., Stitzer, M.C., Wang, B., Campbell, M.S.,
Stein, J.C., Wei, X., Chin, C.S., et al. (2017). Improved maize reference genome
with single-molecule technologies. Nature 546, 524-527.
Jiao, Y., Wang, Y., Xue, D., Wang, J., Yan, M., Liu, G., Dong, G., Zeng, D., Lu, Z.,
Zhu, X., et al. (2010). Regulation of OsSPL14 by OsmiR156 defines ideal plant
architecture in rice. Nat Genet 42, 541-544.
Jin, F., Li, Y., Dixon, J.R., Selvaraj, S., Ye, Z., Lee, A.Y., Yen, C.A., Schmitt, A.D.,
Espinoza, C.A., and Ren, B. (2013). A high-resolution map of the three-dimensional
chromatin interactome in human cells. Nature 503, 290-294.
Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner
with low memory requirements. Nat Methods 12, 357-360.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013).
TopHat2: accurate alignment of transcriptomes in the presence of insertions,
deletions and gene fusions. Genome Biol 14, R36.
Kim, K.N., Cheong, Y.H., Grant, J.J., Pandey, G.K., and Luan, S. (2003). CIPK3, a
calcium sensor-associated protein kinase that regulates abscisic acid and cold signal
transduction in Arabidopsis. Plant Cell 15, 411-423.
Kim, Y., Schumaker, K.S., and Zhu, J.-K. (2006). EMS mutagenesis of Arabidopsis.
In Arabidopsis Protocols (Springer), pp. 101-103.
Kleinhofs, A., Kilian, A., Saghai Maroof, M.A., Biyashev, R.M., Hayes, P., Chen,
F.Q., Lapitan, N., Fenwick, A., Blake, T.K., Kanazin, V., et al. (1993). A molecular,
isozyme and morphological map of the barley (Hordeum vulgare) genome. Theor
Appl Genet 86, 705-712.
Komatsuda, T., and Mano, Y. (2002). Molecular mapping of the intermedium spike-c
(int-c) and non-brittle rachis 1 (btr1) loci in barley (Hordeum vulgare L.). Theor
Appl Genet 105, 85-90.
Komatsuda, T., Maxim, P., Senthil, N., and Mano, Y. (2004). High-density AFLP
map of nonbrittle rachis 1 (btr1) and 2 (btr2) genes in barley (Hordeum vulgare L.).
Theor Appl Genet 109, 986-995.
Komatsuda, T., Pourkheirandish, M., He, C., Azhaguvel, P., Kanamori, H., Perovic,
D., Stein, N., Graner, A., Wicker, T., Tagiri, A., et al. (2007). Six-rowed barley
originated from a mutation in a homeodomain-leucine zipper I-class homeobox gene.
Proc Natl Acad Sci U S A 104, 1424-1429.
Koornneef, M., Dellaert, L., and Van der Veen, J. (1982). EMS- and radiation-
induced mutation frequencies at individual loci in Arabidopsis thaliana (L.) Heynh.
Mutat Res 93, 109-123.
Koski, L.B., Gray, M.W., Lang, B.F., and Burger, G. (2005). AutoFACT: An
Automatic Functional Annotation and Classification Tool. BMC Bioinformatics 6, 151.
Li, G., Chern, M., Jain, R., Martin, J.A., Schackwitz, W.S., Jiang, L., Vega-Sanchez,
M.E., Lipzen, A.M., Barry, K.W., Schmutz, J., et al. (2016a). Genome-Wide
Sequencing of 41 Rice (Oryza sativa L.) Mutated Lines Reveals Diverse Mutations
Induced by Fast-Neutron Irradiation. Mol Plant 9, 1078-1081.
Li, G., Jain, R., Chern, M., Pham, N.T., Martin, J.A., Wei, T., Schackwitz, W.S.,
Lipzen, A.M., Duong, P.Q., Jones, K.C., et al. (2017). The Sequences of 1504
Mutants in the Model Rice Variety Kitaake Facilitate Rapid Functional Genomic
Studies. Plant Cell 29, 1218-1231.
Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with
BWA-MEM. arXiv preprint arXiv:1303.3997.
Li, H. (2014). Toward better understanding of artifacts in variant calling from high-
coverage samples. Bioinformatics 30, 2843-2851.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-
Wheeler transform. Bioinformatics 26, 589-595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G.,
Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009). The
Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
Li, J.Y., Wang, J., and Zeigler, R.S. (2014a). The 3,000 rice genomes project: new
opportunities and challenges for future rice research. Gigascience 3, 8.
Li, K., Wang, H., Cai, Z., Wang, L., Xu, Q., Lovy, M., Wang, Z., and Nevo, E.
(2016b). Sympatric speciation of spiny mice, Acomys, unfolded transcriptomically at
Evolution Canyon, Israel. Proc Natl Acad Sci U S A 113, 8254-8259.
Li, Q., Shao, J., Tang, S., Shen, Q., Wang, T., Chen, W., and Hong, Y. (2015).
Corrigendum: Wrinkled1 Accelerates Flowering and Regulates Lipid Homeostasis
between Oil Accumulation and Membrane Lipid Anabolism in Brassica napus. Front
Plant Sci 6, 1270.
Li, S., Zheng, Y.C., Cui, H.R., Fu, H.W., Shu, Q.Y., and Huang, J.Z. (2016c).
Frequency and type of inheritable mutations induced by gamma rays in rice as
revealed by whole genome sequencing. J Zhejiang Univ Sci B 17, 905-915.
Li, X., Qian, Q., Fu, Z., Wang, Y., Xiong, G., Zeng, D., Wang, X., Liu, X., Teng, S.,
Hiroshi, F., et al. (2003). Control of tillering in rice. Nature 422, 618-621.
Li, Y., Fan, C., Xing, Y., Jiang, Y., Luo, L., Sun, L., Shao, D., Xu, C., Li, X., Xiao,
J., et al. (2011). Natural variation in GS5 plays an important role in regulating grain
size and yield in rice. Nat Genet 43, 1266-1269.
Li, Y., Fan, C., Xing, Y., Yun, P., Luo, L., Yan, B., Peng, B., Xie, W., Wang, G., Li,
X., et al. (2014b). Chalk5 encodes a vacuolar H(+)-translocating pyrophosphatase
influencing grain chalkiness in rice. Nat Genet 46, 398-404.
Lin, H., Wang, R., Qian, Q., Yan, M., Meng, X., Fu, Z., Yan, C., Jiang, B., Su, Z.,
Li, J., et al. (2009). DWARF27, an iron-containing protein required for the
biosynthesis of strigolactones, regulates rice tiller bud outgrowth. Plant Cell 21,
1512-1525.
Lin, T., Zhu, G., Zhang, J., Xu, X., Yu, Q., Zheng, Z., Zhang, Z., Lun, Y., Li, S.,
Wang, X., et al. (2014). Genomic analyses provide insights into the history of tomato
breeding. Nat Genet 46, 1220-1226.
Liu, Z.W., Biyashev, R.M., and Maroof, M.A. (1996). Development of simple
sequence repeat DNA markers and their integration into a barley linkage map. Theor
Appl Genet 93, 869-876.
Long, Y., Zhao, L., Niu, B., Su, J., Wu, H., Chen, Y., Zhang, Q., Guo, J., Zhuang, C.,
Mei, M., et al. (2008). Hybrid male sterility in rice controlled by interaction between
divergent alleles of two adjacent genes. Proc Natl Acad Sci U S A 105, 18871-
18876.
Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu,
Y., et al. (2012). SOAPdenovo2: an empirically improved memory-efficient short-
read de novo assembler. Gigascience 1, 18.
Mardis, E.R. (2008). Next-generation DNA sequencing methods. Annu Rev
Genomics Hum Genet 9, 387-402.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput
sequencing reads. EMBnet J 17, 10-12.
Mascher, M., Gundlach, H., Himmelbach, A., Beier, S., Twardziok, S.O., Wicker, T.,
Radchuk, V., Dockter, C., Hedley, P.E., Russell, J., et al. (2017). A chromosome
conformation capture ordered sequence of the barley genome. Nature 544, 427-433.
Mascher, M., Jost, M., Kuon, J.E., Himmelbach, A., Assfalg, A., Beier, S., Scholz,
U., Graner, A., and Stein, N. (2014). Mapping-by-sequencing accelerates forward
genetics in barley. Genome Biol 15, R78.
Mascher, M., Muehlbauer, G.J., Rokhsar, D.S., Chapman, J., Schmutz, J., Barry, K.,
Munoz-Amatriain, M., Close, T.J., Wise, R.P., Schulman, A.H., et al. (2013a).
Anchoring and ordering NGS contig assemblies by population sequencing
(POPSEQ). Plant J 76, 718-727.
Mascher, M., Richmond, T.A., Gerhardt, D.J., Himmelbach, A., Clissold, L.,
Sampath, D., Ayling, S., Steuernagel, B., Pfeifer, M., D'Ascenzo, M., et al. (2013b).
Barley whole exome capture: a tool for genomic research in the genus Hordeum and
beyond. Plant J 76, 494-505.
Maxam, A.M., and Gilbert, W. (1977). A new method for sequencing DNA. Proc
Natl Acad Sci U S A 74, 560-564.
Miura, K., Ikeda, M., Matsubara, A., Song, X.J., Ito, M., Asano, K., Matsuoka, M.,
Kitano, H., and Ashikari, M. (2010). OsSPL14 promotes panicle branching and
higher grain productivity in rice. Nat Genet 42, 545-549.
Mizuta, Y., Harushima, Y., and Kurata, N. (2010). Rice pollen hybrid
incompatibility caused by reciprocal gene loss of duplicated genes. Proc Natl Acad
Sci U S A 107, 20417-20422.
Molina-Cano, J.L., Russell, J.R., Moralejo, M.A., Escacena, J.L., Arias, G., and
Powell, W. (2005). Chloroplast DNA microsatellite analysis supports a polyphyletic
origin for barley. Theor Appl Genet 110, 613-619.
Monna, L. (2002). Positional Cloning of Rice Semidwarfing Gene, sd-1: Rice "Green
Revolution Gene" Encodes a Mutant Enzyme Involved in Gibberellin Synthesis.
DNA Res 9, 11-17.
Morrell, P.L., and Clegg, M.T. (2007). Genetic evidence for a second domestication
of barley (Hordeum vulgare) east of the Fertile Crescent. Proc Natl Acad Sci U S A
104, 3289-3294.
Munoz-Amatriain, M., Lonardi, S., Luo, M., Madishetty, K., Svensson, J.T.,
Moscou, M.J., Wanamaker, S., Jiang, T., Kleinhofs, A., Muehlbauer, G.J., et al.
(2015). Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of
the barley genome. Plant J 84, 216-227.
Joshi, N.A., and Fass, J.N. (2011). Sickle: A sliding-window, adaptive, quality-based
trimming tool for FastQ files.
Nakamura, H., Xue, Y.L., Miyakawa, T., Hou, F., Qin, H.M., Fukui, K., Shi, X., Ito,
E., Ito, S., Park, S.H., et al. (2013). Molecular mechanism of strigolactone perception
by DWARF14. Nat Commun 4, 2613.
Nakamura, S., Pourkheirandish, M., Morishige, H., Kubo, Y., Nakamura, M.,
Ichimura, K., Seo, S., Kanamori, H., Wu, J., Ando, T., et al. (2016). Mitogen-
Activated Protein Kinase Kinase 3 Regulates Seed Dormancy in Barley. Curr Biol
26, 775-781.
Nakata, M., Miyashita, T., Kimura, R., Nakata, Y., Takagi, H., Kuroda, M.,
Yamaguchi, T., Umemoto, T., and Yamakawa, H. (2018). MutMapPlus identified
novel mutant alleles of a rice starch branching enzyme IIb gene for fine-tuning of
cooked rice texture. Plant Biotechnol J 16, 111-123.
Nevo, E. (1995). Asian, African and European Biota Meet at 'Evolution Canyon'
Israel: Local Tests of Global Biodiversity and Genetic Diversity Patterns. Proc R Soc
Lond B Biol Sci 262, 149-155.
Nevo, E. (2006). "Evolution Canyon": A microcosm of life's evolution focusing on
adaptation and speciation. Isr J Ecol Evol 52, 485-506.
Nevo, E. (2012). "Evolution Canyon," a potential microscale monitor of global
warming across life. Proc Natl Acad Sci U S A 109, 2960-2965.
Nevo, E. (2013). Darwinian Evolution: Evolution in Action Across Life at
"Evolution Canyon", Israel. Isr J Ecol Evol 55, 215-225.
Nevo, E. (2014a). Evolution in action: adaptation and incipient sympatric speciation
with gene flow across life at “Evolution Canyon”, Israel. Isr J Ecol Evol 60, 85-98.
Nevo, E. (2014b). Evolution of wild barley at “Evolution Canyon”: adaptation,
speciation, pre-agricultural collection, and barley improvement. Isr J Plant Sci 62,
22-32.
Nevo, E., Shewry, P.R., and others (1992). Origin, evolution, population genetics and
resources for breeding of wild barley, Hordeum spontaneum, in the Fertile Crescent
(Wallingford, UK: CAB International).
Nordstrom, K.J., Albani, M.C., James, G.V., Gutjahr, C., Hartwig, B., Turck, F.,
Paszkowski, U., Coupland, G., and Schneeberger, K. (2013a). Mutation
identification by direct comparison of whole-genome sequencing data from mutant
and wild-type individuals using k-mers. Nat Biotechnol 31, 325-330.
Nordstrom, K.J.V., Albani, M.C., James, G.V., Gutjahr, C., Hartwig, B., Turck, F.,
Paszkowski, U., Coupland, G., and Schneeberger, K. (2013b). Mutation
identification by direct comparison of whole-genome sequencing data from mutant
and wild-type individuals using k-mers. Nat Biotechnol 31, 325-+.
Parnas, T. (2006). Evidence for Incipient Sympatric Speciation in Wild Barley
Hordeum Spontaneum, at "Evolution Canyon", Mount Carmel, Israel, Based on
Hybridization and Physiological and Genetic Diversity Estimates (University of
Haifa, Faculty of Science and Science Education, Department of Evolutionary and
Environmental Biology).
Pasam, R.K., Sharma, R., Malosetti, M., van Eeuwijk, F.A., Haseneyer, G., Kilian,
B., and Graner, A. (2012). Genome-wide association studies for agronomical traits in
a world wide spring barley collection. BMC Plant Biol 12, 16.
Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach,
H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., et al. (2009). The Sorghum
bicolor genome and the diversification of grasses. Nature 457, 551-556.
Pavlícek, T., Sharon, D., Kravchenko, V., Saaroni, H., and Nevo, E. (2003).
Microclimatic interslope differences underlying biodiversity contrasts in "Evolution
Canyon", Mt. Carmel, Israel. Isr J Earth Sci 52.
Pourkheirandish, M., Hensel, G., Kilian, B., Senthil, N., Chen, G., Sameri, M.,
Azhaguvel, P., Sakuma, S., Dhanagond, S., Sharma, R., et al. (2015). Evolution of
the Grain Dispersal System in Barley. Cell 162, 527-539.
Ramsay, L., Macaulay, M., degli Ivanissevich, S., MacLean, K., Cardle, L., Fuller,
J., Edwards, K.J., Tuvesson, S., Morgante, M., Massari, A., et al. (2000). A simple
sequence repeat-based linkage map of barley. Genetics 156, 1997-2005.
Rastogi, R.P., Richa, Kumar, A., Tyagi, M.B., and Sinha, R.P. (2010). Molecular
mechanisms of ultraviolet radiation-induced DNA damage and repair. J Nucleic
Acids 2010, 592980.
Ren, X., Wang, J., Liu, L., Sun, G., Li, C., Luo, H., and Sun, D. (2016). SNP-based
high density genetic map and mapping of btwd1 dwarfing gene in barley. Sci Rep 6,
31741.
Riboni, M., Galbiati, M., Tonelli, C., and Conti, L. (2013). GIGANTEA enables
drought escape response via abscisic acid-dependent activation of the florigens and
SUPPRESSOR OF OVEREXPRESSION OF CONSTANS. Plant Physiol 162, 1706-
1719.
Richards, J.K., Friesen, T.L., and Brueggeman, R.S. (2017). Association mapping
utilizing diverse barley lines reveals net form net blotch seedling
resistance/susceptibility loci. Theor Appl Genet 130, 915-927.
Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor
package for differential expression analysis of digital gene expression data.
Bioinformatics 26, 139-140.
Robinson, M.D., and Oshlack, A. (2010). A scaling normalization method for
differential expression analysis of RNA-seq data. Genome Biol 11, R25.
Roy, J.K., Smith, K.P., Muehlbauer, G.J., Chao, S., Close, T.J., and Steffenson, B.J.
(2010). Association mapping of spot blotch resistance in wild barley. Mol Breed 26,
243-256.
Russell, J., Mascher, M., Dawson, I.K., Kyriakidis, S., Calixto, C., Freund, F., Bayer,
M., Milne, I., Marshall-Griffiths, T., Heinen, S., et al. (2016). Exome sequencing of
geographically diverse barley landraces and wild relatives gives insights into
environmental adaptation. Nat Genet 48, 1024-1030.
Russell, J., van Zonneveld, M., Dawson, I.K., Booth, A., Waugh, R., and Steffenson,
B. (2014). Genetic diversity and ecological niche modelling of wild barley: refugia,
large-scale post-LGM range expansion and limited mid-future climate threats? PLoS
One 9, e86021.
Rutger, J.N., Peterson, M.L., and Hu, C. (1977). Registration of Calrose 76 Rice
(Reg. No. 45). Crop Sci 17, 978-978.
Sallam, A.H., Tyagi, P., Brown-Guedira, G., Muehlbauer, G.J., Hulse, A., and
Steffenson, B.J. (2017). Genome-Wide Association Mapping of Stem Rust
Resistance in Hordeum vulgare subsp. spontaneum. G3 (Bethesda) 7, 3491-3507.
Sanger, F., and Coulson, A.R. (1975). A rapid method for determining sequences in
DNA by primed synthesis with DNA polymerase. J Mol Biol 94, 441-448.
Sanger, F., Nicklen, S., and Coulson, A.R. (1977). DNA sequencing with chain-
terminating inhibitors. Proc Natl Acad Sci U S A 74, 5463-5467.
Sankaranarayanan, K. (1991). Ionizing radiation and genetic risks. III. Nature of
spontaneous and radiation-induced mutations in mammalian in vitro systems and
mechanisms of induction of mutations by radiation. Mutat Res 258, 75-97.
Sato, K., Yamane, M., Yamaji, N., Kanamori, H., Tagiri, A., Schwerdt, J.G., Fincher,
G.B., Matsumoto, T., Takeda, K., and Komatsuda, T. (2016). Alanine
aminotransferase controls seed dormancy in barley. Nat Commun 7, 11625.
Savolainen, V., Anstett, M.C., Lexer, C., Hutton, I., Clarkson, J.J., Norup, M.V.,
Powell, M.P., Springate, D., Salamin, N., and Baker, W.J. (2006). Sympatric
speciation in palms on an oceanic island. Nature 441, 210-213.
Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L.,
Song, Q., Thelen, J.J., Cheng, J., et al. (2010). Erratum: Genome sequence of the
palaeopolyploid soybean. Nature 465, 120-120.
Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C.,
Zhang, J., Fulton, L., Graves, T.A., et al. (2009). The B73 maize genome:
complexity, diversity, and dynamics. Science 326, 1112-1115.
Seberg, O., and Petersen, G. (2009). A unified classification system for eukaryotic
transposable elements should reflect their phylogeny. Nat Rev Genet 10, 276.
Sega, G.A. (1984). A review of the genetic effects of ethyl methanesulfonate. Mutat
Res 134, 113-142.
Sherman, J.D., Fenwick, A.L., Namuth, D.M., and Lapitan, N.L. (1995). A barley
RFLP map: alignment of three barley maps and comparisons to Gramineae species.
Theor Appl Genet 91, 681-690.
Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., and
Sirotkin, K. (2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids
Res 29, 308-311.
Shirley, B.W., Hanley, S., and Goodman, H.M. (1992). Effects of ionizing radiation
on a plant genome: analysis of two Arabidopsis transparent testa mutations. Plant
Cell 4, 333-347.
Sims, D., Sudbery, I., Ilott, N.E., Heger, A., and Ponting, C.P. (2014). Sequencing
depth and coverage: key considerations in genomic analyses. Nat Rev Genet 15, 121-
132.
Skinner, M.E., Uzilov, A.V., Stein, L.D., Mungall, C.J., and Holmes, I.H. (2009).
JBrowse: a next-generation genome browser. Genome Res 19, 1630-1638.
Song, X., Lu, Z., Yu, H., Shao, G., Xiong, J., Meng, X., Jing, Y., Liu, G., Xiong, G.,
Duan, J., et al. (2017). IPA1 functions as a downstream transcription factor repressed
by D53 in strigolactone signaling in rice. Cell Res 27, 1128-1141.
Strasburg, J.L., Sherman, N.A., Wright, K.M., Moyle, L.C., Willis, J.H., and
Rieseberg, L.H. (2012). What can patterns of differentiation across plant genomes
tell us about adaptation and speciation? Philos Trans R Soc Lond B Biol Sci 367,
364-373.
Struss, D., and Plieske, J. (1998). The use of microsatellite markers for detection of
genetic diversity in barley populations. Theor Appl Genet 97, 308-315.
Suprunova, T., Krugman, T., Fahima, T., Chen, G., Shams, I., Korol, A., and Nevo,
E. (2004). Differential expression of dehydrin (Dhn) in response to water stress in
resistant and sensitive wild barley (Hordeum spontaneum). Plant Cell Environ 27,
297-308.
Takagi, H., Abe, A., Yoshida, K., Kosugi, S., Natsume, S., Mitsuoka, C., Uemura,
A., Utsushi, H., Tamiru, M., Takuno, S., et al. (2013a). QTL-seq: rapid mapping of
quantitative trait loci in rice by whole genome resequencing of DNA from two
bulked populations. Plant J 74, 174-183.
Takagi, H., Uemura, A., Yaegashi, H., Tamiru, M., Abe, A., Mitsuoka, C., Utsushi,
H., Natsume, S., Kanzaki, H., Matsumura, H., et al. (2013b). MutMap-Gap: whole-
genome resequencing of mutant F2 progeny bulk combined with de novo assembly
of gap regions identifies the rice blast resistance gene Pii. New Phytol 200, 276-283.
Takahashi, R., and Hayashi, J. (1964). Linkage study of two complementary genes
for brittle rachis in barley. Berichte des Ohara Instituts für Landwirtschaftliche
Biologie, Okayama Universität 12, 99-105.
Takeno, K. (2016). Stress-induced flowering: the third category of flowering
response. J Exp Bot 67, 4925-4934.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6:
Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30, 2725-
2729.
Tanaka, M., Takei, K., Kojima, M., Sakakibara, H., and Mori, H. (2006). Auxin
controls local cytokinin biosynthesis in the nodal stem in apical dominance. Plant J
45, 1028-1036.
Tester, M., and Langridge, P. (2010). Breeding technologies to increase crop
production in a changing world. Science 327, 818-822.
Thiel, T., Michalek, W., Varshney, R.K., and Graner, A. (2003). Exploiting EST
databases for the development and characterization of gene-derived SSR-markers in
barley (Hordeum vulgare L.). Theor Appl Genet 106, 411-422.
Tollefsbol, T.O. (2011). Epigenetics: The new science of genetics. In Handbook of
Epigenetics (Elsevier), pp. 1-6.
Turuspekov, Y., Ormanbekova, D., Rsaliev, A., and Abugalieva, S. (2016). Genome-
wide association study on stem rust resistance in Kazakh spring barley lines. BMC
Plant Biol 16 Suppl 1, 6.
Umezawa, T., Yoshida, R., Maruyama, K., Yamaguchi-Shinozaki, K., and
Shinozaki, K. (2004). SRK2C, a SNF1-related protein kinase 2, improves drought
tolerance by controlling stress-responsive gene expression in Arabidopsis thaliana.
Proc Natl Acad Sci U S A 101, 17306-17311.
Varshney, R.K., Marcel, T.C., Ramsay, L., Russell, J., Roder, M.S., Stein, N.,
Waugh, R., Langridge, P., Niks, R.E., and Graner, A. (2007). A high density barley
microsatellite consensus map with 775 SSR loci. Theor Appl Genet 114, 1091-1103.
Via, S., and West, J. (2008). The genetic mosaic suggests a new role for hitchhiking
in ecological speciation. Mol Ecol 17, 4334-4345.
Visel, A., Blow, M.J., Li, Z., Zhang, T., Akiyama, J.A., Holt, A., Plajzer-Frick, I.,
Shoukry, M., Wright, C., Chen, F., et al. (2009). ChIP-seq accurately predicts tissue-
specific activity of enhancers. Nature 457, 854-858.
Vogel, J.P., Garvin, D.F., Mockler, T.C., Schmutz, J., Rokhsar, D., Bevan, M.W.,
Barry, K., Lucas, S., Harmon-Smith, M., and Lail, K. (2010). Genome sequencing
and analysis of the model grass Brachypodium distachyon. Nature 463, 763-768.
Wang, J., Liu, L., Ball, T., Yu, L., Li, Y., and Xing, F. (2016). Revealing a 5,000-y-
old beer recipe in China. Proc Natl Acad Sci U S A 113, 6444-6448.
Wang, L., Wang, A., Huang, X., Zhao, Q., Dong, G., Qian, Q., Sang, T., and Han, B.
(2011). Mapping 49 quantitative trait loci at high resolution through sequencing-
based genotyping of rice recombinant inbred lines. Theor Appl Genet 122, 327-340.
Wang, M., Jiang, N., Jia, T., Leach, L., Cockram, J., Comadran, J., Shaw, P., Waugh,
R., and Luo, Z. (2012). Genome-wide association mapping of agronomic and
morphologic traits in highly structured populations of barley cultivars. Theor Appl
Genet 124, 233-246.
Wang, N., Long, T., Yao, W., Xiong, L., Zhang, Q., and Wu, C. (2013). Mutant
resources for the functional analysis of the rice genome. Mol Plant 6, 596-604.
Wang, R., Yang, F., Zhang, X.Q., Wu, D., Tan, C., Westcott, S., Broughton, S., Li,
C., Zhang, W., and Xu, Y. (2017). Characterization of a Thermo-Inducible
Chlorophyll-Deficient Mutant in Barley. Front Plant Sci 8, 1936.
Warr, A., Robert, C., Hume, D., Archibald, A., Deeb, N., and Watson, M. (2015).
Exome Sequencing: Current and Future Perspectives. G3 (Bethesda) 5, 1543-1550.
Wehner, G.G., Balko, C.C., Enders, M.M., Humbeck, K.K., and Ordon, F.F. (2015).
Identification of genomic regions involved in tolerance to drought stress and drought
stress induced leaf senescence in juvenile barley. BMC Plant Biol 15, 125.
Wenzl, P., Carling, J., Kudrna, D., Jaccoud, D., Huttner, E., Kleinhofs, A., and
Kilian, A. (2004). Diversity Arrays Technology (DArT) for whole-genome profiling
of barley. Proc Natl Acad Sci U S A 101, 9915-9920.
Wenzl, P., Li, H., Carling, J., Zhou, M., Raman, H., Paul, E., Hearnden, P., Maier,
C., Xia, L., Caig, V., et al. (2006). A high-density consensus map of barley linking
DArT markers to SSR, RFLP and STS loci and agricultural traits. BMC Genomics 7,
206.
Westberg, E., and Kadereit, J.W. (2014). Genetic evidence for divergent selection
on Oenanthe conioides and Oe. aquatica (Apiaceae), a candidate case for sympatric
speciation. Biol J Linn Soc 113, 50-56.
Wu, C., Li, X., Yuan, W., Chen, G., Kilian, A., Li, J., Xu, C., Li, X., Zhou, D.X.,
Wang, S., et al. (2003). Development of enhancer trap lines for functional analysis of
the rice genome. Plant J 35, 418-427.
Wu, C., You, C., Li, C., Long, T., Chen, G., Byrne, M.E., and Zhang, Q. (2008).
RID1, encoding a Cys2/His2-type zinc finger transcription factor, acts as a master
switch from vegetative to floral development in rice. Proc Natl Acad Sci U S A 105,
12915-12920.
Wu, R. (1970). Nucleotide sequence analysis of DNA. I. Partial sequence of the
cohesive ends of bacteriophage lambda and 186 DNA. J Mol Biol 51, 501-521.
Wu, R., and Kaiser, A.D. (1968). Structure and base sequence in the cohesive ends of
bacteriophage lambda DNA. J Mol Biol 35, 523-537.
Xu, T. (1982). Origin and Evolution of Cultivated Barley in China [J]. Acta Genetica
Sinica 6.
Xue, W., Xing, Y., Weng, X., Zhao, Y., Tang, W., Wang, L., Zhou, H., Yu, S., Xu,
C., Li, X., et al. (2008). Natural variation in Ghd7 is an important regulator of
heading date and yield potential in rice. Nat Genet 40, 761-767.
Yamagata, Y., Yamamoto, E., Aya, K., Win, K.T., Doi, K., Sobrizal, Ito, T.,
Kanamori, H., Wu, J., Matsumoto, T., et al. (2010). Mitochondrial gene in the
nuclear genome induces reproductive barrier in rice. Proc Natl Acad Sci U S A 107,
1494-1499.
Yan, H., Saika, H., Maekawa, M., Takamure, I., Tsutsumi, N., Kyozuka, J., and
Nakazono, M. (2007). Rice tillering dwarf mutant dwarf3 has increased leaf
longevity during darkness-induced senescence or hydrogen peroxide-induced cell
death. Genes Genet Syst 82, 361-366.
Yan, W., Liu, H., Zhou, X., Li, Q., Zhang, J., Lu, L., Liu, T., Liu, H., Zhang, C.,
Zhang, Z., et al. (2014). Natural variation in Ghd7.1 plays an important role in grain
yield and adaptation in rice. Cell Res 23, 2013-2023.
Yang, J., Zhao, X., Cheng, K., Du, H., Ouyang, Y., Chen, J., Qiu, S., Huang, J.,
Jiang, Y., Jiang, L., et al. (2012). A killer-protector system regulates both hybrid
sterility and segregation distortion in rice. Science 337, 1336-1340.
Yang, Z., Zhang, T., Bolshoy, A., Beharav, A., and Nevo, E. (2009). Adaptive
microclimatic structural and expressional dehydrin 1 evolution in wild barley,
Hordeum spontaneum, at 'Evolution Canyon', Mount Carmel, Israel. Mol Ecol 18,
2063-2075.
Yano, K., Yamamoto, E., Aya, K., Takeuchi, H., Lo, P.C., Hu, L., Yamasaki, M.,
Yoshida, S., Kitano, H., Hirano, K., et al. (2016). Genome-wide association study
using whole-genome sequencing rapidly identifies new genes influencing agronomic
traits in rice. Nat Genet 48, 927-934.
Yao, W., Li, G., Yu, Y., and Ouyang, Y. (2018). funRiceGenes dataset for
comprehensive understanding and application of rice functional genes. Gigascience
7, 1-9.
Yonemaru, J., Ebana, K., and Yano, M. (2014). HapRice, an SNP haplotype database
and a web tool for rice. Plant Cell Physiol 55, e9.
Youssef, H.M., Eggert, K., Koppolu, R., Alqudah, A.M., Poursarebani, N., Fazeli,
A., Sakuma, S., Tagiri, A., Rutten, T., Govind, G., et al. (2017). VRS2 regulates
hormone-mediated inflorescence patterning in barley. Nat Genet 49, 157-161.
Zeng, D., Tian, Z., Rao, Y., Dong, G., Yang, Y., Huang, L., Leng, Y., Xu, J., Sun,
C., Zhang, G., et al. (2017). Rational design of high-yield and superior-quality rice.
Nat Plants 3, 17031.
Zeng, X., Long, H., Wang, Z., Zhao, S., Tang, Y., Huang, Z., Wang, Y., Xu, Q.,
Mao, L., Deng, G., et al. (2015). The draft genome of Tibetan hulless barley reveals
adaptive patterns to the high stressful Tibetan Plateau. Proc Natl Acad Sci U S A
112, 1095-1100.
Zhang, G., Liu, X., Quan, Z., Cheng, S., Xu, X., Pan, S., Xie, M., Zeng, P., Yue, Z.,
Wang, W., et al. (2012). Genome sequence of foxtail millet (Setaria italica) provides
insights into grass evolution and biofuel potential. Nat Biotechnol 30, 549-554.
Zhang, J., Guo, D., Chang, Y., You, C., Li, X., Dai, X., Weng, Q., Zhang, J., Chen,
G., Li, X., et al. (2007). Non-random distribution of T-DNA insertions at various
levels of the genome hierarchy as revealed by analyzing 13 804 T-DNA flanking
sequences from an enhancer-trap mutant library. Plant J 49, 947-959.
Zhang, J., Li, C., Wu, C., Xiong, L., Chen, G., Zhang, Q., and Wang, S. (2006a).
RMD: a rice mutant database for functional analysis of the rice genome. Nucleic
Acids Res 34, D745-748.
Zhang, X., Yazaki, J., Sundaresan, A., Cokus, S., Chan, S.W., Chen, H., Henderson,
I.R., Shinn, P., Pellegrini, M., Jacobsen, S.E., et al. (2006b). Genome-wide high-
resolution mapping and functional analysis of DNA methylation in arabidopsis. Cell
126, 1189-1201.
Zhao, H., Yao, W., Ouyang, Y., Yang, W., Wang, G., Lian, X., Xing, Y., Chen, L.,
and Xie, W. (2015). RiceVarMap: a comprehensive database of rice genomic
variations. Nucleic Acids Res 43, D1018-1022.
Zhao, Y. (2010). Auxin biosynthesis and its role in plant development. Annu Rev
Plant Biol 61, 49-64.
Zhou, F., Lin, Q., Zhu, L., Ren, Y., Zhou, K., Shabek, N., Wu, F., Mao, H., Dong,
W., Gan, L., et al. (2013). D14-SCF(D3)-dependent degradation of D53 regulates
strigolactone signalling. Nature 504, 406-410.
Zhou, G., Broughton, S., Zhang, X.Q., Ma, Y., Zhou, M., and Li, C. (2016).
Genome-Wide Association Mapping of Acid Soil Resistance in Barley (Hordeum
vulgare L.). Front Plant Sci 7, 406.
Zhou, G., Zhang, Q., Zhang, X.Q., Tan, C., and Li, C. (2015). Construction of High-
Density Genetic Map in Barley through Restriction-Site Associated DNA
Sequencing. PLoS One 10, e0133161.
Zilberman, D., Gehring, M., Tran, R.K., Ballinger, T., and Henikoff, S. (2007).
Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an
interdependence between methylation and transcription. Nat Genet 39, 61-69.
Zohary, D., Hopf, M., and Weiss, E. (2012). Domestication of Plants in the Old
World: The origin and spread of domesticated plants in Southwest Asia, Europe, and
the Mediterranean Basin (Oxford University Press on Demand).
Zou, J., Zhang, S., Zhang, W., Li, G., Chen, Z., Zhai, W., Zhao, X., Pan, X., Xie, Q.,
and Zhu, L. (2006). The rice HIGH-TILLERING DWARF1 encoding an ortholog of
Arabidopsis MAX3 is required for negative regulation of the outgrowth of axillary
buds. Plant J 48, 687-698.
Appendix 1 - Bioinformatic Pipeline
#########################################################################
## Creator: Cong Tan
## Contact: [email protected]
## Date: 08-July-2018
#########################################################################
Whole genome sequencing SNP and InDel analysis pipeline
#########################################################################
## part1: sequencing result control and filtration
## trimming
rule clean_sickle:
    message: """------ start cleaning ------"""
    input: "raw/{smp}_1.clean.fq.gz",
           "raw/{smp}_2.clean.fq.gz"
    # output[2] collects reads whose mate was discarded (singletons)
    output: "clean/{smp}_1.fq.gz",
            "clean/{smp}_2.fq.gz",
            "clean/{smp}_0.fq.gz"
    params: "-l 80 -q 30 -t sanger -g"
    log: "logs/clean/{smp}.log"
    shell: """
        sickle pe {params} -f {input[0]} -r {input[1]} \
            -o {output[0]} -p {output[1]} -s {output[2]} &>{log}
    """
## fastqc
rule QA_fastqc:
    input: "clean/{smp}_1.fq.gz",
           "clean/{smp}_2.fq.gz",
           "clean/{smp}_0.fq.gz"
    output: "clean/fastqc/{smp}_1_fastqc.zip",
            "clean/fastqc/{smp}_2_fastqc.zip",
            "clean/fastqc/{smp}_0_fastqc.zip"
    params: " -o clean/fastqc/ -f fastq --extract "
    log: "logs/fastqc/{smp}.log"
    shell: """
        fastqc {params} {input[0]} {input[1]} {input[2]} 2>{log}
    """
#########################################################################
## part2: mapping reads to the reference
## mapping for each lane
rule mapping_pe:
    input: fwd="clean/{smp}/{id}_1.fq.gz",
           rev="clean/{smp}/{id}_2.fq.gz"
    # raw string keeps "\t" escaped; recent bwa versions reject literal tabs in -R
    params: r" -M -R '@RG\tID:{id}\tSM:{smp}' "
    output: pe="mapped/{smp}/{id}.pe.bam"
    log: "logs/bwa/{smp}/{id}.bwa.log"
    threads: 16
    # REF (the reference FASTA) must be defined at the top of the Snakefile
    shell: """
        bwa mem {params} -t {threads} {REF} {input.fwd} {input.rev} 2>{log} |\
        samtools view -bSh -o {output.pe} -
    """
rule mapping_se:
    input: se="clean/{smp}/{id}_0.fq.gz"
    # raw string keeps "\t" escaped; recent bwa versions reject literal tabs in -R
    params: r" -M -R '@RG\tID:{id}\tSM:{smp}' "
    output: se="mapped/{smp}/{id}.se.bam"
    threads: 16
    shell: """
        bwa mem {params} -t {threads} {REF} {input.se} |\
        samtools view -bhS -o {output.se} -
    """
## merge PE and SE for each lane
rule merge_id:
    input: pe="mapped/{smp}/{id}.pe.bam",
           se="mapped/{smp}/{id}.se.bam"
    output: bam="mapped/{smp}/{id}.bam"
    log: "logs/samtools/{smp}/{id}.merge.log"
    # the intermediate BAMs are removed only if the merge succeeds
    shell: """
        samtools merge -f {output.bam} {input.pe} {input.se} 2>{log} && \
        rm {input.pe} {input.se}
    """
## sort for each lane
rule sort_samtools:
    input: "mapped/{smp}/{id}.bam"
    params: "-T mapped/{smp}/{id}"
    output: "mapped/{smp}/{id}.srt.bam"
    log: "logs/samtools/{smp}/{id}.sort.log"
    threads: 16
    # log the sort itself, then drop the unsorted BAM
    shell: """
        samtools sort {params} -@ {threads} -o {output} {input} 2>{log} \
        && rm {input}
    """
rule rmdup:
    input: "mapped/{smp}/{id}.srt.bam"
    output: "mapped/{smp}/{id}.rmdup.bam"
    log: "logs/rmdup_{smp}_{id}.log"
    shell: "samtools rmdup {input} {output} 2>{log} && rm {input}"
## merge lanes for each sample
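rule bam_list:
    # SAMPLES and IDS are parallel lists (one entry per lane), defined at the top of the Snakefile
    input: expand("mapped/{smp}/{id}.rmdup.bam", zip, smp=SAMPLES, id=IDS)
    output: expand("mapped/{smp}/{smp}.bam.txt", smp=list(set(SAMPLES)))
    run:
        # append each lane-level BAM to the list file of its sample
        for one in input:
            ones = one.split("/")
            list_file = "/".join([ones[0], ones[1], ones[1]]) + ".bam.txt"
            with open(list_file, "a") as ofile:
                ofile.write(one + "\n")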
rule merge_sample:
    input: "mapped/{smp}/{smp}.bam.txt"
    output: "mapped/{smp}/{smp}.bam"
    run:
        import os
        # one BAM path per line, written by rule bam_list
        with open(input[0]) as handle:
            bams = [line.strip() for line in handle if line.strip()]
        if len(bams) == 1:
            # a single lane: rename instead of merging
            os.rename(bams[0], output[0])
        elif len(bams) > 1:
            # -b takes the list file; the output BAM follows the options
            shell("samtools merge -b {input[0]} {output[0]}")
rule bam_index:
    input: "mapped/{smp}/{smp}.bam"
    output: "mapped/{smp}/{smp}.bam.bai"
    log: "logs/samtools/samtools_index_{smp}.log"
    shell: "samtools index {input}"
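The rules in part 2 refer to the names REF, SAMPLES and IDS, which are not defined in this listing. The sketch below shows one way such parallel lists could be built so that `expand(..., zip, smp=SAMPLES, id=IDS)` pairs each lane with its sample. The function name `collect_lanes`, the example paths and the lane identifiers are assumptions made for illustration; they are not part of the original pipeline.

```python
import re

def collect_lanes(paths):
    """Return parallel SAMPLES and IDS lists from paths like clean/<smp>/<id>_1.fq.gz."""
    pattern = re.compile(r"^clean/([^/]+)/([^/]+)_1\.fq\.gz$")
    samples, ids = [], []
    for path in sorted(paths):
        match = pattern.match(path)
        if match:  # only forward-read files define a lane; mates and singletons are skipped
            samples.append(match.group(1))
            ids.append(match.group(2))
    return samples, ids

# Hypothetical layout: sample S1 sequenced on two lanes, sample S2 on one lane.
example = [
    "clean/S1/L1_1.fq.gz", "clean/S1/L1_2.fq.gz",
    "clean/S1/L2_1.fq.gz", "clean/S1/L2_2.fq.gz",
    "clean/S2/L3_1.fq.gz", "clean/S2/L3_2.fq.gz",
]
SAMPLES, IDS = collect_lanes(example)
```

Because the two lists are kept in step, a sample sequenced on several lanes appears several times in SAMPLES, which is why the listing wraps it in `set()` when naming the per-sample list files.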
#########################################################################
## part3: SNP and InDel calling by bcftools (round 1)
## variants calling
rule call_bcftools:
    input: "bam/{smp}.rmdup.bam"
    output: "bcf/{smp}.raw.bcf"
    # the log path carries the {smp} wildcard, as Snakemake requires
    log: "logs/bcf/{smp}.call.log"
    params: sam="-C50 -Q20 -m3 -t AD,DP -u ", bcf="-m -O b"
    threads: 8
    # the BAM is passed directly; -b would expect a text file listing BAMs
    shell: """
        samtools mpileup {params.sam} -f {ref} {input[0]} |\
        bcftools call {params.bcf} -o {output[0]} 2>{log}
    """
## variants filtering
rule filter_bcftools:
    input: "bcf/{smp}.raw.bcf"
    output: "bcf/{smp}.flt.bcf"
    log: "logs/bcf/{smp}.filter.log"
    # exclude (-e) sites with QUAL < 50 or MQ < 40; write compressed BCF
    params: "-e 'QUAL<50||MQ<40' -O b"
    shell: "bcftools filter {params} -o {output} {input} 2>{log}"
## variants result VCF file index
rule index_bcftools:
    input: "bcf/{smp}.flt.bcf"
    output: "bcf/{smp}.flt.bcf.csi"
    log: "logs/bcf/{smp}.index.log"
    shell: "bcftools index {input} 2>{log}"
#########################################################################
## part4: SNP and InDel calling by GATK (round 2 and 3)
##gatk-target_creator
rule target_creator:
    input: "mapped/{smp}/{id}.rmdup.bam"
    output: "mapped/{smp}/{id}.intervals"
    threads: 8
    # 'gatk' is assumed here to be a local wrapper around GATK 3.x
    shell: """
        gatk RealignerTargetCreator \
            -I {input} -R {ref} -o {output} \
            -fixMisencodedQuals -nt {threads}
    """
## gatk-realign
rule realign2:
    input: "mapped/{smp}/{id}.rmdup.bam", "mapped/{smp}/{id}.intervals"
    output: "mapped/{smp}/{id}.realign.bam"
    threads: 8
    shell: """
        gatk IndelRealigner \
            -R {ref} \
            -I {input[0]} \
            -targetIntervals {input[1]} \
            -o {output}
    """
## gatk-calling
rule call_gatk:
    input: "bam/{smp}_realign.bam"
    output: "gatk/{smp}_raw.g.vcf"
    threads: 8
    # emits one GVCF per sample for joint genotyping in the next rule
    shell: """
        gatk HaplotypeCaller \
            -R {ref} \
            -I {input} \
            --emitRefConfidence GVCF \
            --variant_index_type LINEAR --variant_index_parameter 128000 \
            --genotyping_mode DISCOVERY \
            -stand_emit_conf 10 -stand_call_conf 30 \
            -o {output}
    """
## gatk-genotype
rule genotype:
    input: "gvcf.list"
    output: "genotype.vcf"
    threads: 30
    shell: """
        gatk GenotypeGVCFs \
            -R {ref} --variant {input} -o {output}
    """
Appendix 2 - List of publications
1. Mascher M, (… Tan C) et al. A chromosome conformation capture ordered
sequence of the barley genome. Nature. 2017; doi:10.1038/nature22043.
2. Beier S, (…Tan C) et al. Construction of a map-based reference genome
sequence for barley, Hordeum vulgare L. Sci Data. 2017;
doi:10.1038/sdata.2017.44.
3. Prade VM, (… Tan C) et al. The pseudogenes of barley. Plant J. 2017;
doi:10.1111/tpj.13794.
4. Wang R, (… Tan C) et al. Characterization of a Thermo-Inducible Chlorophyll-
Deficient Mutant in Barley. Front Plant Sci. 2017; doi:10.3389/fpls.2017.01936.
5. Zhang Q, Zhang X, Wang S, Tan C, Zhou G, Li C. Involvement of alternative
splicing in barley seed germination. PLoS One. 2016;
doi:10.1371/journal.pone.0152824.
6. Jia Q, Tan C, Wang J et al. Marker development using SLAF-seq and whole-
genome shotgun strategy to fine-map the semi-dwarf gene ari-e in barley. BMC
Genomics. 2016; doi:10.1186/s12864-016-3247-4.
7. Zhou G, Zhang Q, Zhang XQ, Tan C, Li C. Construction of high-density genetic
map in barley through restriction-site associated DNA sequencing. PLoS One.
2015; doi:10.1371/journal.pone.0133161.
8. Zhou G, Zhang Q, Tan C, Zhang XQ, Li C. Development of genome-wide InDel
markers and their integration with SSR, DArT and SNP markers in single barley
map. BMC Genomics. 2016; doi:10.1186/s12864-015-2027-x.
9. Yang H, Jian J, Li X, Renshaw D, Clements J, Sweetingham MW, Tan C, Li C.
Application of whole genome re-sequencing data in the development of
diagnostic DNA markers tightly linked to a disease-resistance locus for marker-
assisted selection in lupin (Lupinus angustifolius). BMC Genomics. 2015;
doi:10.1186/s12864-015-1878-5.
Appendix 3 - One manuscript
Sympatric ecological speciation of wild barley driven by microclimates at Evolution
Canyon, Israel
Cong Tan1#, Fei Dai2#, Camilla Beate Hill1#, Penghao Wang3, Brett Chapman4, Qisen Zhang5,
Yong Jia1, Xiaolei Wang2, Xiao-Qi Zhang1, Roberto Barrero4, Gaofeng Zhou1, Yong Han1,
Wenying Zhang6, Manuel Spannagl6, Ilka Braumann7, Alan H. Schulman8,9, Peter Langridge10,
Matthew Bellgard4, Robbie Waugh11, Dongfa Sun12, Eviatar Nevo13*, Guoping Zhang2*,
Chengdao Li1,14*
1Western Barley Genetics Alliance, Western Australian State Agricultural Biotechnology
Centre, School of Veterinary and Life Sciences, Murdoch University, WA 6150, Australia
2Department of Agronomy, Zhejiang Key Laboratory of Crop Germplasm, Zhejiang
University, Hangzhou 310058, China
3School of Veterinary and Life Sciences, Murdoch University, Murdoch, WA 6150, Australia
4Centre for Comparative Genomics, Murdoch University, South Street, Murdoch, Perth, WA
6150, Australia
5Australian Export Grains Innovation Centre, 3 Baron-Hay Court, South Perth, WA 6155,
Australia
6Plant Genome and Systems Biology, Helmholtz Center Munich – German Research Center
for Environmental Health, 85764 Neuherberg, Germany
7Carlsberg Laboratory, Gamle Carlsberg Vej 10, 1799 Copenhagen V, Denmark
8LUKE/BI Plant Genomics Lab, Institute of Biotechnology, Viikki Biocenter, University of
Helsinki
9LUKE Natural Resources Institute Finland
10School of Agriculture, Food and Wine, Waite Research Institute, The University of
Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
11The James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK
12Huazhong Agricultural University, Wuhan, China
13Institute of Evolution, University of Haifa, Haifa 3498838, Israel
14Department of Primary Industries and Regional Development, South Perth, WA 6155,
Australia
#these authors contributed equally to this work
*Corresponding authors: Chengdao Li ([email protected]); Guoping Zhang
([email protected]); Eviatar Nevo ([email protected])
Abstract
Sympatric speciation is a highly contentious concept and it is under strong debate whether
natural selection can outperform the continuous homogenizing force of free gene flow in a
sympatric ecological condition and generate reproductive incompatibility. Here, we provide
substantial new genomic evidence for adaptation-driven incipient sympatric ecological
speciation of two wild barley populations within 250 m from the ‘Evolution Canyon’ (EC),
consisting of two abutting slopes with drastically different microclimates. Phylogenetic
relationship and population structure analysis uncovered extraordinary genetic divergence
between interslope EC populations, considerably greater than between the Tibetan wild and
cultivated barleys. A genome-wide FST analysis revealed detailed information on the
genomic differentiation with a total of 243 Mb genomic regions under environmental
selection, representing ~5% of the barley reference genome. One prominent ‘chromosome
island’, spanning a continuous region of around 200 Mb, was identified on chromosome 3H
under strong natural selection. A homolog of a gene involved in indica-japonica rice
speciation was located in the selective sweep region and carried a unique amino acid
substitution between the two populations. In-situ hybridization results demonstrated potential chromosome
structure changes in this region. RNA-seq analysis demonstrated that the genotype from one
slope has evolved unique mechanisms in responding to heat and drought. Our study provides
comprehensive genomic evidence and new insights to understand the process of incipient
sympatric speciation of wild barleys.
Wild barley (Hordeum spontaneum L.) is the ancestor of cultivated barley (Hordeum vulgare
L.) and has a wide geographic distribution across highly diverse environments throughout the
Near East, from the Eastern Mediterranean coasts to Tibet1. Genomic resources obtained
from wild crop relatives have contributed dramatically to enhancing crop productivity2,3 and
to understanding the genetic basis of adaptation and speciation. The evolutionary processes
that drive the extraordinary adaptive genomic divergence in wild crops hold great potential
for crop improvement3-5.
The high genetic and environmental heterogeneity across the distribution range of wild barley
suggests local adaptation and genomic differentiation along macro- and microenvironmental
gradients. The ‘Evolution Canyon’ I (EC), located in Lower Nahal Oren, Mount Carmel,
Israel, is a microsite model for biodiversity evolution, adaptation and incipient sympatric
speciation across life6-8. Different biotic and abiotic stress pressures at EC make it well
suited to characterizing the causes and driving forces of evolution in action3,9. EC consists
of two abutting slopes separated (on average) by a distance of only 250 m: the temperate,
forested, cool and humid ‘European’ north-facing slope (NFS) and the tropical, savannoid,
hot and dry ‘African’ south-facing slope (SFS). The geology and macroclimates are very
similar on the opposing slopes, but they differ sharply in their microclimates (Fig. S1). The
SFS features extreme levels of solar radiation (200‒800% more than NFS) with higher daily
temperatures and drought, whereas NFS is characterized by a cooler climate with a higher
relative humidity (1–7%) than SFS3,6,10. The sharp microclimatic divergence between the
abutting slopes10 has resulted in contrasting populations of microbial, animal, fungal and
plant species8.
We investigated the genomic and transcriptomic signatures of microevolution of wild barley
from EC, collected from mid-stations on the abutting slopes along the North–South
microclimatic gradient of EC. We used high coverage (~40X) whole genome sequencing,
with a comparison to Tibetan wild as well as cultivated barley varieties. This study highlights
genetic variants (SNPs and short insertions/deletions), including genes with significant
mutations (missense, lost or gained start or stop codons), identified using the map-based
reference sequence of the barley genome11. Genome-wide RNA profiling identified
differentially expressed genes (DEGs) between NFS and SFS. This study provides
tangible evidence of ecological stress as a force that generates and maintains local adaptive
patterns, leading to incipient sympatric adaptive ecological speciation, and linking micro- and
macro-evolutionary processes in wild barley populations.
Results
Dramatic divergence between wild barleys from SFS and NFS
To investigate the genetic diversity of wild barleys in the ‘Evolution Canyon’ I (EC) in Israel,
transcriptome sequencing was performed on ten samples from SFS and NFS (Fig. S1). On
average, about 3.70 Gb of clean sequencing data was obtained for each wild barley accession.
A total of 218,709 SNPs and 8,640 InDels were detected across these ten accessions at the
transcriptome level. The number of SNPs per individual ranged from 92,556 to 119,675 and
the number of InDels from 4,174 to 5,407. The SNP frequency of the ten wild barleys in each
1 Mb region along the seven chromosomes is shown in a Circos plot (Fig. 1a). Overall, the
SNP frequency is higher in distal regions than in centromeric regions. The distribution
patterns are similar among accessions from the same slope but differ between accessions
from opposing slopes.
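The per-window SNP frequency described above amounts to binning variant positions into fixed 1 Mb intervals. A minimal sketch follows, assuming SNPs are available as (chromosome, 1-based position) pairs; the function name `snp_density` and the toy coordinates are illustrative and not taken from the study.

```python
from collections import Counter

WINDOW = 1_000_000  # 1 Mb bins, as in the text

def snp_density(snps, window=WINDOW):
    """Count SNPs per (chromosome, window index); positions are 1-based."""
    counts = Counter()
    for chrom, pos in snps:
        counts[(chrom, (pos - 1) // window)] += 1
    return counts

# Toy positions, invented for illustration only.
toy = [("chr1H", 10), ("chr1H", 999_999), ("chr1H", 1_000_001), ("chr3H", 2_500_000)]
density = snp_density(toy)
```

Window index 0 covers positions 1 to 1,000,000, index 1 the next megabase, and so on; windows without SNPs simply do not appear in the counter.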
To characterize the genetic divergence of wild barleys from the opposing slopes of the
‘Evolution Canyon’, neighbour-joining tree, principal component and population structure
analyses were conducted based on the SNP variants. In the neighbour-joining tree (Fig. 1b),
the five accessions from SFS were clearly separated from the five accessions from NFS. The
evolutionary distance was calculated using the F84 model12 based on the SNP variants of
these ten wild barley accessions (Table S3). The distances among the five accessions are
0.14~0.23 within SFS and 0.16~0.18 within NFS. However, the distance between the SFS
group and the NFS group is up to 1.03, which indicates dramatic genetic divergence between
SFS and NFS wild barleys. Wild barleys from the opposing slopes were also clearly split
into two clusters along the first eigenvector, which explains up to 56% of the genetic
variation (Fig. 1c). The clear genetic divergence of wild barley between the two slopes was
also confirmed by population structure analysis (Fig. 1d). When K was set to two, the five
wild barleys from SFS were inferred to originate from one ancestor and the five NFS wild
barleys from another.
Greater genomic divergence between SFS and NFS wild barleys than between
Tibetan wild barleys and domesticated barleys
To further investigate the genomic divergence of wild barleys from the opposing slopes, we
performed high coverage whole genome sequencing on another ten barley accessions. They
represent four barley groups: six wild barley accessions collected from the opposing slopes of
the ‘Evolution Canyon’ (SFS and NFS), two wild barley accessions from the Tibetan Plateau
(TB1 and TB2) and two cultivated barleys (DB1 and DB2) (Table S1). On average, about
190 Gb of clean data was obtained for each accession, corresponding to about 39.4×
coverage of the recently published barley reference genome11. De novo assembly was
performed for eight of the ten accessions, with assembly lengths ranging from 3.63 to
4.54 Gb and N50 ranging from 7.18 to 10.89 Kb (Table S2). More than 99% of clean reads
were mapped to the barley reference genome11. Up to 84% of the reference sequence
positions were sequenced at least five times. Based solely on reads with a unique mapping
position in the reference genome, a total of 63,565,829 single nucleotide polymorphisms
(SNPs) and 2,673,996 short insertions/deletions (InDels) were identified across the ten barley
accessions. The distribution of SNP frequency in each 1 Mb region along the chromosomes is
plotted in Fig. 2a. In contrast to the RNA sequencing data, SNPs were evenly distributed in
the wild barley genomes from EC. However, loss of genetic diversity in some chromosome
regions was clearly evident in the cultivated barleys and even in the Tibetan wild barleys.
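The coverage figure quoted above follows from dividing the sequenced bases by the genome size. As a small worked check, the sketch below back-calculates the genome size implied by the two reported numbers (190 Gb and 39.4×); the function names are illustrative, and the resulting size is an inference from those two values rather than a figure stated in the text.

```python
def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size

def implied_genome_size(total_bases, coverage):
    """Genome length implied by a reported data volume and fold coverage."""
    return total_bases / coverage

# Values reported in the text: ~190 Gb of clean data at ~39.4x coverage.
size = implied_genome_size(190e9, 39.4)   # roughly 4.8 Gb
depth = fold_coverage(190e9, size)        # recovers 39.4
```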
Based on the genome-wide SNPs, neighbour-joining tree, principal component and population
structure analyses were conducted to compare the evolutionary distance between wild barleys
from the opposing slopes with that between Tibetan wild barley and cultivated barley. In the
neighbour-joining tree (Fig. 2b), the six wild barleys from the ‘Evolution Canyon’ were
separated from both cultivated barley and Tibetan wild barley. The evolutionary distance
between SFS and NFS wild barleys is about 0.60, twice that between Tibetan wild barley and
cultivated barley (~0.30) (Table S4). In the principal component analysis plot (Fig. 2c), the
six EC wild barleys were separated from Tibetan wild barley and cultivated barley along the
first eigenvector (~25%). The three SFS wild barleys were separated from the three NFS wild
barleys along the second eigenvector (~21%). Both the neighbour-joining tree and the PCA
revealed a distinct separation of the NFS and SFS wild barley populations, in line with the
sharply divergent microclimates of the two slopes at EC. To estimate individual ancestry and
admixture proportions, a population genetic structure analysis with K values from 2 to 4 was
conducted (Fig. 2d). When K was set to 3, NFS and SFS wild barleys were inferred to derive
from two different ancestors, while domesticated barley and Tibetan wild barleys derived
from a third. When K was set to 4, the four groups of barleys were inferred to derive from
four different ancestors. This result also shows that the evolutionary relationship between
SFS and NFS wild barleys is much more distant than that between Tibetan wild barley and
cultivated barley.
Genomic regions under environmental selection of NFS and SFS
To uncover the genetic basis of the remarkable genetic divergence and to infer genomic
regions subjected to environmental selection, Wright’s F-statistic FST was calculated between
the SFS and NFS wild barley groups. The frequency distribution of FST values between SFS
and NFS wild barleys was multimodal (Fig. S2). The distribution of the windowed FST values
between NFS and SFS wild barley along each chromosome is shown in Fig. 3. The total
length of genomic regions with FST values above the threshold of 0.89 was about 243 Mb,
containing 919 high confidence genes. A continuous genomic region of about 200 Mb in the
central part of chromosome 3H showed strong environmental selection.
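A windowed FST scan of this kind can be sketched as follows. The text does not state which estimator or window size was used, so the code assumes a simple heterozygosity-based formulation, FST = (H_T - H_S) / H_T with equal population weights, summed over the biallelic SNPs in each window, and a hypothetical 1 Mb window. Only the 0.89 threshold comes from the text; the function names and toy allele frequencies are illustrative.

```python
from collections import defaultdict

def site_components(p1, p2):
    """Return (H_T - H_S, H_T) for one biallelic SNP with allele frequencies p1 and p2."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # heterozygosity of the pooled population
    return h_t - h_s, h_t

def windowed_fst(sites, window=1_000_000):
    """sites: iterable of (chrom, pos, p1, p2). Returns {(chrom, window_index): FST}."""
    num = defaultdict(float)
    den = defaultdict(float)
    for chrom, pos, p1, p2 in sites:
        diff, h_t = site_components(p1, p2)
        key = (chrom, (pos - 1) // window)
        num[key] += diff
        den[key] += h_t
    # ratio of sums per window; windows without polymorphism are skipped
    return {key: num[key] / den[key] for key in den if den[key] > 0}

def regions_above(fst_by_window, threshold=0.89):
    """Windows whose FST exceeds the threshold (0.89 in the text)."""
    return sorted(key for key, value in fst_by_window.items() if value > threshold)

# Toy data: two fixed differences in window 0, one shared polymorphism in window 1.
toy_sites = [
    ("chr3H", 100, 1.0, 0.0),
    ("chr3H", 200, 1.0, 0.0),
    ("chr3H", 1_500_000, 0.5, 0.5),
]
fst = windowed_fst(toy_sites)
candidates = regions_above(fst)
```

Summing numerator and denominator before dividing keeps low-diversity SNPs from dominating a window, which is one common design choice; averaging per-site FST values would be an alternative.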
To further investigate whether the environmental selection resulted in speciation between
the SFS and NFS wild barleys, we searched for homologs of key genes responsible for
reproductive isolation in several plant species, including crops13-18. Using BLASTP with a
threshold E-value ≤2.00E-56 and an aligned amino acid length >100, we identified 22
homologous genes for reproductive isolation in the barley genome. A total of 88 unique SNPs
or InDels were found between the NFS and SFS genotypes in twelve genes related to
reproductive isolation (Supplementary 2). Five of these genes were located in the selective
sweep regions described above (HORVU4Hr1G084220, HORVU3Hr1G052250,
HORVU1Hr1G060040, HORVU3Hr1G107970 and HORVU3Hr1G107990). Notably, two of
them (HORVU3Hr1G052250 and HORVU1Hr1G060040) showed unique variants between
the SFS and NFS populations. A gene homologous to the rice hybrid male sterility locus SaF
(HORVU3Hr1G052250), which contributes to indica-japonica speciation through a
two-gene/three-component interaction model15, was located in the selective sweep signature
region of chromosome 3H (Fig. 3). In rice, a Phe287Ser substitution between SaF+ and SaF-
impairs biological function and results in male sterility in the indica-japonica hybrid.
Interestingly, we observed a Thr179Ala substitution between the SFS and NFS genotypes.
Whether this substitution affects reproduction between the interslope wild barley populations
remains to be validated.
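The two BLASTP thresholds stated above can be applied to tabular BLAST output in a few lines. The sketch assumes the standard 12-column tabular format (alignment length in column 4, E-value in column 11); the function name `filter_hits` and the sequence identifiers in the toy input are hypothetical.

```python
def filter_hits(lines, max_evalue=2.00e-56, min_length=100):
    """Keep hits with E-value <= max_evalue and alignment length > min_length."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        length = int(fields[3])     # column 4: alignment length
        evalue = float(fields[10])  # column 11: E-value
        if evalue <= max_evalue and length > min_length:
            kept.append((fields[0], fields[1], length, evalue))
    return kept

# Toy hits with invented identifiers: only the first passes both thresholds.
toy_hits = [
    "queryA\tbarleyGene1\t62.5\t350\t120\t3\t1\t350\t5\t352\t1e-120\t410",
    "queryA\tbarleyGene2\t40.0\t90\t50\t2\t1\t90\t1\t90\t1e-60\t200",
    "queryB\tbarleyGene3\t35.0\t300\t180\t5\t1\t300\t1\t300\t1e-30\t150",
]
kept = filter_hits(toy_hits)
```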
Different drought stress response of SFS and NFS wild barleys
Drought is a key climatic stress that distinguishes SFS from NFS. To compare the drought
stress response of wild barleys from the two slopes, transcriptome sequencing was performed
on SFS and NFS wild barleys subjected to a treatment simulating the drought stress of the
SFS. About 44.40 Gb of clean data was generated for twelve samples, comprising SFS and
NFS wild barleys under two conditions with three replicates each. Differential gene
expression under drought stress was analysed for SFS and NFS wild barley, and an overview
is given in Table 1. A total of 274 genes were differentially expressed in both SFS and NFS
wild barley under drought. A further 2,791 genes were differentially expressed only in SFS
wild barley, whereas 1,030 genes were differentially expressed only in NFS wild barley
(Fig. 4). The frequency of uniquely differentially expressed genes in each 10 Mb region along
the seven chromosomes is shown in Fig. 5. More genes were up-regulated in SFS than in NFS
wild barley, whereas more genes were down-regulated in NFS than in SFS wild barley. GO
enrichment analysis was conducted for the DEGs in SFS and NFS wild barleys under drought
stress (Fig. S4, Fig. S5). Over-represented GO terms among the DEGs in SFS wild barley
include translational elongation, ribosome biogenesis, organic acid metabolic process and
lipid metabolic process (Fig. S4). The over-represented GO terms in NFS wild barley differ
from those in SFS. For example, lipid metabolic process is over-represented among the DEGs
of SFS wild barley but under-represented among the DEGs of NFS wild barley (Fig. S5). The
over-represented GO terms in NFS wild barley include amine metabolic process and
tryptophan metabolic process, among others.
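The split into shared, SFS-only and NFS-only DEGs is a plain set partition of the two DEG lists. A minimal sketch with placeholder gene identifiers follows; the function name `partition_degs` and the toy lists are illustrative.

```python
def partition_degs(sfs_degs, nfs_degs):
    """Split two DEG lists into shared, SFS-only and NFS-only sets."""
    sfs, nfs = set(sfs_degs), set(nfs_degs)
    return {
        "shared": sfs & nfs,    # differentially expressed on both slopes
        "sfs_only": sfs - nfs,  # responsive only in SFS plants
        "nfs_only": nfs - sfs,  # responsive only in NFS plants
    }

# Placeholder identifiers, not real barley genes.
groups = partition_degs(["g1", "g2", "g3", "g4"], ["g3", "g4", "g5"])
sizes = {name: len(genes) for name, genes in groups.items()}
```

Applied to the counts in the text, the three classes (274 shared, 2,791 SFS-only, 1,030 NFS-only) would add up to 3,065 DEGs in SFS and 1,304 in NFS, assuming the classes partition each list exactly.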
In theory, DEGs that respond to drought treatment in SFS wild barley are more likely to play
important roles in enhancing drought tolerance. We therefore compared protein sequence
similarity between genes shown to be responsible for drought tolerance in nine species19 and
the drought-responsive DEGs found in SFS but not NFS wild barley. We found that 46 of these
DEGs have homologues with experimental evidence of a role in plant drought resistance
(Table S4). For example, HORVU2Hr1G060350 shares up to 64% protein sequence identity with
CIPK3 at a coverage of 95% (425/446 residues). CIPK3 was reported to regulate abscisic acid
(ABA) and cold signal transduction as a calcium sensor-associated protein kinase in
Arabidopsis20. It was also demonstrated that the expression of CIPK3 is responsive to ABA
and stress conditions, and that disruption of CIPK3 altered the expression pattern of a
number of stress gene markers. Another drought-responsive DEG in SFS, HORVU2Hr1G075470,
shares 72% protein sequence identity with SRK2C at 92% coverage. SRK2C was reported to
improve drought tolerance by controlling the expression level of stress-responsive genes in
Arabidopsis21.
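The identity and coverage figures quoted above can be illustrated with a minimal sketch. It assumes a gapped pairwise protein alignment is already available as two equal-length strings; the function names and the toy sequences are hypothetical and are not taken from the pipeline used in this study. Identity is taken over the alignment length and coverage over the full length of the reference protein, which is one common convention among several.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def identity_and_coverage(aligned_query, aligned_subject, subject_length):
    """Percent identity and subject coverage for one gapped pairwise alignment.

    aligned_query / aligned_subject: equal-length strings, '-' marks a gap.
    subject_length: full length of the subject (reference) protein.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    # Identical, non-gap columns.
    matches = sum(1 for q, s in zip(aligned_query, aligned_subject)
                  if q == s and q != "-")
    # Subject residues that take part in the alignment.
    subject_residues = sum(1 for s in aligned_subject if s != "-")
    identity = percent(matches, len(aligned_query))
    coverage = percent(subject_residues, subject_length)
    return identity, coverage


# Coverage reported in the text: 425 aligned residues out of 446.
cipk3_coverage = round(percent(425, 446), 1)
```

With these definitions, 425 of 446 residues corresponds to a coverage of about 95%, matching the value reported for the CIPK3 comparison.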
Discussion
Cultivated barley was domesticated around 10,000 years ago in the Fertile Crescent, where
its wild relatives still thrive today. However, a large proportion of the habitat suitable
for wild barley has been lost to climate change since the last glacial age, when the climate
was much wetter and cooler22. Located at Mount Carmel in Israel, EC is a natural habitat of
wild barley and is often referred to as the ‘Israeli Galapagos’8. The SFS in EC features
extreme levels of solar radiation (200–800% higher than the NFS) and the associated higher
daily temperatures and drought3,6,10. The NFS is characterized by a cooler climate with
higher relative humidity (1–7%) than the SFS. Thus, EC provides an ideal model for
understanding the impact of global warming on genome evolution. Despite evidence of
substantial gene flow between the abutting slopes, significant interslope genetic variation
and diversity were detected13-16. In this study, genome-wide diversity was investigated in
both SFS and NFS wild barley, and pronounced genetic divergence between them was found at
both the transcriptome and the whole-genome level. The results also suggest that the genetic
divergence between SFS and NFS wild barley is much larger than that between Tibetan wild
barley and cultivated barley. Together, this evidence indicates that sympatric speciation is
occurring in wild barley under the contrasting microclimates of EC, which is consistent with
reports for other species23-25.
Considering that limited gene flow may be a major factor driving the sympatric speciation of
wild barley on the opposing slopes, allele frequencies were analysed across the whole genome
using Wright’s F-statistic FST. Genomic regions totalling up to 243 Mb and containing 919
high-confidence genes were indicated to be under selection by the differential microclimates
of the SFS and NFS. Of these 919 genes, 52 were differentially expressed in SFS wild barley
under drought stress and 18 were differentially expressed in NFS wild barley. A large
genomic region of about 200 Mb in the central part of chromosome 3H shows a dramatic signal
of environmental selection. We also performed fluorescence in situ hybridization (FISH) to
compare the overall karyotype of each chromosome between SFS and NFS wild barley (Fig. 6). A
dramatic karyotypic difference was detected on chromosome 3H between SFS and NFS wild
barley. To pinpoint the exact genes under environmental selection in these genomic regions,
more controllable experimental materials such as advanced mapping populations need to be
constructed.
The most contrasting environmental difference between the opposing slopes is drought and
heat due to different levels of solar radiation. Drought alters the expression of genes
encoding compounds that protect cells from dehydration, including genes belonging to the
Dehydrin (Dhn) gene group and heat shock protein 17. Suprunova et al26 measured the
expression levels of Dhn genes in tolerant and sensitive wild barley genotypes, and found
that Dhn levels increased earlier and with higher intensity in tolerant genotypes than in
sensitive genotypes. Confirming these findings, Yang et al27,28 detected higher levels of
Dhn expression in the resistant SFS compared to sensitive NFS wild barley after
dehydration24. To investigate the influence of drought stress on the transcriptional level
of wild barley from the opposing slopes, we performed transcriptome sequencing and detected
DEGs responding to the drought treatment. We showed that the number of DEGs in SFS barley is
about three times that in NFS barley after dehydration stress, further supporting interslope
genotypic divergence between tolerant genotypes on the SFS and sensitive genotypes on the
NFS. Drought treatment altered the expression of more genes, and with greater magnitude, in
SFS than in NFS wild barley, including genes related to drought tolerance and earlier
flowering. Lipid metabolic process is over-represented among the DEGs of SFS wild barley but
under-represented among the DEGs of NFS wild barley. A previous study indicated that
adjustment of the lipid content of cell membranes plays a vital role in the response to
drought stress in Arabidopsis29. Thus, adjustment of lipid metabolism may be an important
mechanism by which wild barley on the SFS adapts to the extreme drought and heat
environment.
Diverse stress factors such as drought can either accelerate or delay flowering in many plant
species. Alteration of flowering in response to stress is known as stress-induced flowering,
which has recently been considered a third category of flowering response in addition to
photoperiodic flowering and vernalisation30. Wild barley from the SFS flowered one to two
weeks earlier than NFS wild barley, even under well-watered conditions25. The difference in
flowering date is also assumed to be one factor limiting gene flow between SFS and NFS wild
barley. Recently, Nevo3 reported that wild barley and wild emmer wheat (Triticum
dicoccoides) populations across Israel flowered about ten days earlier in 2009 compared to
1980 as a result of rising average temperatures due to global warming. In this study, seven
genes with roles in flowering time were detected within the selective sweep regions,
including PHYB and GI. PHYB modulates gene activity to influence plant growth and
development and increases drought tolerance by enhancing abscisic acid (ABA) sensitivity in
Arabidopsis thaliana31. Here, the PHYB transcript was present at higher levels in SFS and
lower levels in NFS wild barley after drought, suggesting that higher PHYB expression
alleviates the symptoms of stress more effectively in SFS wild barley. Drought also alters
the circadian clock, impacting the expression of circadian clock genes such as GIGANTEA (GI)
to mediate different signaling pathways that modulate drought stress perception and
responses32. In Arabidopsis, GI enabled the drought escape response via ABA-dependent
activation of the flowering time genes FLOWERING LOCUS T (FT), TWIN SISTER OF FT (TSF) and
SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1). Drought is a stress that affects both SFS
and NFS wild barley; accordingly, both had increased GI levels after drought. The genotype
from NFS was particularly vulnerable to drought, such that GI was more up-regulated than
down-regulated in SFS to potentially induce earlier flowering. The wild barleys in EC have
evolved a different mechanism to control flowering time, through the light signalling
pathway, compared with cultivated barley, in which photoperiodic response and vernalisation
are the major factors controlling flowering.
Sympatric speciation, the evolution of reproductive isolation without geographic barriers,
is a highly contentious concept in evolutionary biology33,34. Whether natural selection can
outperform the continuous homogenizing force of free gene flow under sympatric ecological
conditions and generate reproductive incompatibility remains strongly debated. Our study is
the first to use whole-genome sequencing and transcriptome sequencing to investigate the
genomic differentiation pattern of sympatric speciation in a plant species. A genome-wide
FST analysis revealed detailed information on the genomic differentiation between SFS and
NFS barley. Notably, the genomic regions identified as divergence outliers (FST > 0.89)
totalled about 243 Mb (Fig. 3), representing ~5% of the barley reference genome, which falls
within the 0.4–35.5% range reported for other plants under divergent selection35. This value
is higher than the 3.6% reported for Oenanthe conioides and Oenanthe aquatica, two Apiaceae
plant species under sympatric speciation36. In another study, on the sympatric speciation of
a palm tree on an oceanic island, only four of the 274 AFLP loci differed strongly between
the two diverging populations37. In our study, wild barley on the two opposing slopes is
under diverse and strong environmental selection involving heat, drought, illuminance and
pathogen stresses6,8, which may explain why our result differs from those of the other two
sympatric speciation cases, where ecological selection in the corresponding plant
populations was driven by a single environmental factor, namely fresh water tides36 or soil
pH37, respectively.
Moreover, the proportion of the genome under divergence in both studies was estimated from
AFLP analyses, which may be biased and unable to uncover all the genomic regions under
natural selection. Interestingly, genomic divergence analyses of the SFS and NFS barley
revealed one prominent ‘chromosome island’ on chromosome 3H under strong natural selection,
spanning a continuous region of around 200 Mb (Fig. 3). This ‘unusual’ divergence signature
represents a typical genomic differentiation pattern expected for sympatric speciation,
where only those genetic regions directly related to the specific environmental selection
pressure (most likely heat, drought and illuminance stresses) are differentiated, while the
remainder of the genome is subject to the continuous homogenizing force of free gene flow in
the sympatric setting35. This is consistent with the genetic differentiation pattern
reported in the other two sympatric speciation cases in plants36,37. In contrast, genetic
differences are expected to accumulate throughout the whole genome under allopatric
speciation with geographic isolation38. Our results lend further support to the sympatric
speciation of wild barley at EC at the genetic divergence level. Detailed examination of the
genomic function of this chromosome island may shed light on the speciation process in
plants and help us to understand better how wild barley evolves to adapt to highly diverse
environmental conditions across the world. Our results are also consistent with previous
studies of F1 hybrids from 40 intraslope and 50 interslope crosses, tested for a range of
vegetative and reproductive traits, which support incipient sympatric speciation in wild
barley from the opposing slopes at EC7,25. For most traits, interslope crossbred plants were
significantly inferior to intraslope hybrids.
In summary, we observed strong genomic differentiation of wild barley across a short
distance of 250 m due to the sharp microclimatic divergence between the abutting slope
populations at Evolution Canyon I, Israel. To put the local patterns of genetic variation
into a broader geographic context, we also analyzed two Tibetan wild barley accessions and
two cultivated barley varieties, one Canadian and one Australian. Accessions sampled from
both slopes at EC confirmed the presence of strong genomic differentiation underlying
adaptive incipient sympatric ecological speciation. The process of adaptive incipient
sympatric speciation in wild barley at EC, due to sharp interslope microclimatic divergence,
is supported by molecular marker, genomic and transcriptomic evidence, and is shared with
organisms across phylogeny, from bacteria to mammals, identified in previous studies8. The
genetic base of cultivated barley has rapidly narrowed, resulting in a loss of genetic
diversity and increasing the risk that cultivated barley will become more susceptible to
diseases and environmental stresses. Wild and cultivated barley are both diploid and
inter-fertile, opening up the possibility of using genetic variation from wild barley to
improve cultivated barley against abiotic and biotic environmental stresses.
Methods
Sample collection. The two slopes of ‘Evolution Canyon’ (EC) have seven sampling stations
(Fig. S1). The ten barley accessions used for transcriptome sequencing were independent
accessions collected from SFS station #2 and NFS station #6. The ten barley accessions used
for whole-genome sequencing came from four different groups: Tibetan wild barley (W1 and
X1), Australian and Canadian malting barley cultivars (Baudin and AC Metcalfe), and wild
barley from EC, Mount Carmel, Israel (South Face Slope accessions SFS2.1, SFS2.2 and SFS2.3,
and North Face Slope accessions NFS6.1, NFS7.1 and NFS7.2). The Tibetan wild barleys were
collected on the Tibetan Plateau by T.W. Xu39. The two commercial barley cultivars Baudin
(Australia) and AC Metcalfe (Canada) are premium malting quality varieties.
Transcriptome sequencing. Wild barley seeds were sown in pots filled with peat under
well-watered conditions. When the plants reached the third leaf stage, watering was ceased
for the drought treatment but continued for the controls for a further 80 days. Fully expanded
leaves on both the control and droughted plants were sampled with three replicates and flash
frozen in liquid nitrogen for RNA extraction. Total RNA was extracted using TRIzol Reagent
(Invitrogen, Carlsbad, CA, USA) and purified using the RNeasy Mini Kit (Qiagen,
Germantown, MD, USA). Paired-end (2×150 bp) sequencing libraries were constructed and
sequenced on an Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA) as described
previously40.
Whole-genome sequencing. Genomic DNA was extracted from the leaves of ten barley
plants, one plant per accession. The DNA was analyzed for quality and concentration using
agarose gels and an Agilent Technologies 2100 Bioanalyzer (Agilent Technologies, Santa
Clara, CA, USA). Genomic DNA was fragmented and then used to prepare Illumina paired-end
libraries, following the manufacturer’s instructions (Paired-End Sample Preparation
Guide, Illumina, 1005063). Insert fragment lengths of 500 bp, 700 bp and 1 kb were
generated. A final set of 48 libraries was then sequenced on the HiSeq 2000 platform at
BGI-Shenzhen (Beijing Genomics Institute-Shenzhen, Shenzhen, China).
Genome assembly and assessment. All reads were filtered for quality using Trimmomatic41,
with a 4 bp sliding window requiring an average quality above Q20 and a minimum read length
of 50 bp. Genome assemblies for each of the eight accessions were performed at the Pawsey
Supercomputing Centre (Murdoch University, Australia) with 32 GB of memory and six cores,
using SOAPdenovo242 with k=51. The overall scaffold numbers and lengths of the assemblies
were determined, and the level of completeness was assessed using BLASTN43.
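The read-filtering rule described above (a 4 bp window, an average quality of Q20 and a minimum length of 50 bp) can be sketched as follows. This is a simplified illustration of sliding-window trimming, not Trimmomatic's actual implementation; the function name and the choice to cut at the start of the first failing window are assumptions made for the example.

```python
def sliding_window_trim(qualities, window=4, min_avg=20, min_len=50):
    """Return how many leading bases of a read to keep (0 = discard the read).

    qualities: per-base Phred scores, 5' to 3'.
    The read is cut at the start of the first window whose mean quality
    falls below min_avg; reads shorter than min_len after trimming are dropped.
    """
    keep = len(qualities)
    for start in range(len(qualities) - window + 1):
        window_scores = qualities[start:start + window]
        if sum(window_scores) / window < min_avg:
            keep = start  # cut here and ignore the rest of the read
            break
    return keep if keep >= min_len else 0
```

For example, a read of 60 bases at Q30 followed by 10 bases at Q10 would be cut shortly before the low-quality tail and kept, whereas a read whose quality collapses after 40 bases would fall under the 50 bp minimum and be discarded.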
Variant calling and validation. The barley pseudomolecule genome11 was used as the reference
for mapping the clean reads with BWA-MEM44 using default parameters. PCR duplicates were
marked and removed using Picard (version 1.129). Only reads with a unique mapping position
in the reference genome were retained for the detection of genomic variation (SNPs and
InDels). Sequence variants (insertions, deletions and SNPs) were detected in three rounds
using SAMtools plus BCFtools and the Genome Analysis Toolkit (GATK) variant-calling
pipeline. Briefly, the first round was performed with SAMtools plus BCFtools, and the calls
were filtered on mapping quality and variant-calling quality. The result of the first round
was used to guide realignment around potential InDels and variant calling in the second
round with GATK. The variants detected by both the SAMtools plus BCFtools pipeline and the
GATK pipeline were then used to guide variant calling in the third round with GATK.
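The step that keeps only variants detected by both pipelines amounts to a set intersection on the variant key. The sketch below is illustrative only: the `Variant` record and the function name are hypothetical, and parsing of the VCF files produced by the two callers is left out.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Minimal variant key: chromosome, 1-based position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_variants(calls_a: Iterable[Variant],
                       calls_b: Iterable[Variant]) -> List[Variant]:
    """Return the variants reported by both callers, sorted by chromosome and position."""
    return sorted(set(calls_a) & set(calls_b))
```

A variant is treated as shared only when chromosome, position and both alleles agree, so two callers reporting different alternate alleles at the same site do not count as concordant.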
Population structure analysis. The phylogenetic tree was constructed with the
neighbour-joining method using PHYLIP (version 3) based on evolutionary distance, which was
computed with the F84 model using the DNADIST utility in the PHYLIP package. The tree file
was constructed and annotated using MEGA45 (version 6) and TreeView (version 1.6.6).
Principal component analysis was performed with the EIGENSOFT program (version 6).
Subpopulation components were estimated using the ADMIXTURE program (version 1.23) with the
number of clusters K ranging from 1 to 6.
Selective sweep analysis. Selective sweep analysis is one of the main methods for detecting
environmental or natural selection signatures. Genomic regions under selective sweeps due to
environmental stresses on opposing slopes were measured using the fixation index FST.
Selective sweep analysis was conducted by measuring the patterns of allele frequencies in
each 1 Mb fragment along the chromosomes using the genome-wide SNP variation data. Genomic
regions with FST values above 0.89, the 95% threshold (Fig. 3), were considered to be under
strong selective sweeps.
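The windowed scan described above can be sketched as follows. This is a simplified illustration that uses the textbook form of Wright's FST, (HT - HS) / HT, for two equally weighted populations and averages per-SNP values within each 1 Mb window; it is not the estimator implemented in VCFtools, and the function names and input layout are assumptions made for the example.

```python
def wright_fst(p1, p2):
    """Wright's FST at one biallelic SNP from the allele frequencies of two populations.

    Returns None when the SNP is monomorphic across both populations.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)           # expected heterozygosity, pooled
    if h_total == 0.0:
        return None
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def windowed_fst(snps, window=1_000_000):
    """Mean per-SNP FST in non-overlapping windows.

    snps: iterable of (chrom, pos, p1, p2) with 1-based positions.
    Returns {(chrom, window_index): mean FST}.
    """
    sums = {}
    for chrom, pos, p1, p2 in snps:
        fst = wright_fst(p1, p2)
        if fst is None:
            continue
        key = (chrom, (pos - 1) // window)
        total, n = sums.get(key, (0.0, 0))
        sums[key] = (total + fst, n + 1)
    return {key: total / n for key, (total, n) in sums.items()}


def outlier_windows(window_fst, threshold=0.89):
    """Windows whose mean FST exceeds the threshold (0.89 in this study)."""
    return sorted(key for key, value in window_fst.items() if value > threshold)
```

Summing the lengths of the windows returned by `outlier_windows` would give the total size of the candidate sweep regions, which is how a figure such as the 243 Mb reported here could be tallied.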
Differential expression analysis. From an initial 368,900,936 paired-end raw reads,
295,697,023 clean reads were retained after filtering. Retained paired-end reads were
aligned against the latest version of the Morex barley pseudomolecule sequences using the
open-source k-mer adaptive aligner Biokanga (version 3.8.7). In total, 130,419,949 unique
alignments were obtained against 81,683 transcripts. A count matrix was generated and loaded
into R (version 3.3.1) for downstream statistical analysis. Read counts were normalized
using the model-based trimmed mean of M-values (TMM) method of Robinson and Oshlack46,
followed by fitting of a negative binomial generalized linear model (GLM) accommodating the
genotype differences and the effects induced by the drought treatment. The edgeR package47
(version 3.14.0) was used for transcript differential expression analysis. A total of 21,122
transcripts with more than ten reads in at least three samples were included in the
differential expression (DE) analysis. A likelihood ratio test comparing the genotypes was
conducted to obtain DE transcripts, and the resulting P-values were adjusted for multiple
testing using the Benjamini–Hochberg FDR adjustment approach48. Transcripts were considered
highly differentially expressed when the adjusted P-value was <0.05. GO enrichment analysis
was also performed on the differentially expressed genes. The gene subsets were compared
with the high-confidence gene set of the barley reference genome11 to determine enriched or
depleted gene ontology terms. For the enrichment analysis, the free and open-source R
package GOstats49 (hypergeometric test) was used with a P-value cutoff of 0.05.
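Two of the statistical steps named above, the FDR adjustment of P-values and the hypergeometric test for GO term over-representation, can be sketched in plain Python. The analysis itself was run in R with edgeR and GOstats; the functions below are only a minimal illustration of the underlying calculations, with hypothetical names, and assume Python 3.8 or later for `math.comb`.

```python
from math import comb


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background set; K: background genes annotated with the term;
    n: genes in the DEG subset; k: DEGs annotated with the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

A term would be called over-represented when `hypergeom_upper_tail` falls below the chosen cutoff (0.05 here); the corresponding lower tail would be used to flag under-represented terms.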
Function and pathway annotation. Function and pathway annotation based on protein
homology was performed using the automatic functional annotation and classification tool
AutoFACT. Briefly, coding sequences were aligned to the NT database using blastn and to
protein databases including NR, Swiss-Prot and KEGG using blastx with the same threshold.
The function and pathway annotations of the best hit with a biologically informative
description were then assigned to each query sequence.
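The rule of assigning the best hit that carries an informative description can be sketched as below. The list of uninformative terms, the function names and the (description, bit score) input layout are all assumptions made for illustration; they do not reproduce AutoFACT's own term list or decision logic.

```python
# Hypothetical examples of descriptions that carry no functional information.
UNINFORMATIVE_TERMS = ("hypothetical", "unknown", "uncharacterized",
                       "unnamed", "predicted protein")


def is_informative(description):
    """True when the description contains none of the uninformative terms."""
    text = description.lower()
    return not any(term in text for term in UNINFORMATIVE_TERMS)


def best_informative_hit(hits):
    """Return the description of the highest-scoring informative hit, or None.

    hits: list of (description, bit_score) tuples for one query sequence.
    """
    informative = [hit for hit in hits if is_informative(hit[0])]
    if not informative:
        return None
    return max(informative, key=lambda hit: hit[1])[0]
```

Under this rule a top-scoring hit labelled only as a hypothetical protein is skipped in favour of a lower-scoring hit with a named function.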
References
1. Zohary, D., Hopf, M. & Weiss, E. Domestication of Plants in the Old World: The origin and spread of domesticated plants in Southwest Asia, Europe, and the Mediterranean Basin, (Oxford University Press on Demand, 2012).
2. Hajjar, R. & Hodgkin, T. The use of wild relatives in crop improvement: A survey of developments over the last 20 years. Euphytica 156, 1-13 (2007).
3. Nevo, E. "Evolution Canyon," a potential microscale monitor of global warming across life. Proc Natl Acad Sci U S A 109, 2960-5 (2012).
4. Nevo, E., Shewry, P.R. & others. Origin, evolution, population genetics and resources for breeding of wild barley, Hordeum spontaneum, in the Fertile Crescent, 19-43 (CAB International, Wallingford, UK, 1992).
5. Tester, M. & Langridge, P. Breeding technologies to increase crop production in a changing world. Science 327, 818-22 (2010).
6. Nevo, E. Asian, African and European Biota Meet at Evolution-Canyon Israel - Local Tests of Global Biodiversity and Genetic Diversity Patterns. P Roy Soc B-Biol Sci 262, 149-155 (1995).
7. Nevo, E. "Evolution canyon": A microcosm of life's evolution focusing on adaptation and speciation. Isr J Ecol Evol 52, 485-506 (2006).
8. Nevo, E. Evolution in action: adaptation and incipient sympatric speciation with gene flow across life at "Evolution Canyon", Israel. Isr J Ecol Evol 60, 85-98 (2014).
9. Nevo, E. Darwinian Evolution: Evolution in Action across Life at "Evolution Canyon", Israel. Isr J Ecol Evol 55, 215-225 (2009).
10. Pavlícek, T., Sharon, D., Kravchenko, V., Saaroni, H. & Nevo, E. Microclimatic interslope differences underlying biodiversity contrasts in "Evolution Canyon", Mt. Carmel, Israel. Isr J Earth Sci 52(2003).
11. Mascher, M. et al. A chromosome conformation capture ordered sequence of the barley genome. Nature 544, 427-433 (2017).
12. Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17, 368-76 (1981).
13. Bikard, D. et al. Divergent evolution of duplicate genes leads to genetic incompatibilities within A. thaliana. Science 323, 623-6 (2009).
14. Chen, J. et al. A triallelic system of S5 is a major regulator of the reproductive barrier and compatibility of indica-japonica hybrids in rice. Proc Natl Acad Sci U S A 105, 11436-41 (2008).
15. Long, Y. et al. Hybrid male sterility in rice controlled by interaction between divergent alleles of two adjacent genes. Proc Natl Acad Sci U S A 105, 18871-6 (2008).
16. Yamagata, Y. et al. Mitochondrial gene in the nuclear genome induces reproductive barrier in rice. Proc Natl Acad Sci U S A 107, 1494-9 (2010).
17. Yang, J. et al. A killer-protector system regulates both hybrid sterility and segregation distortion in rice. Science 337, 1336-40 (2012).
18. Mizuta, Y., Harushima, Y. & Kurata, N. Rice pollen hybrid incompatibility caused by reciprocal gene loss of duplicated genes. Proc Natl Acad Sci U S A 107, 20417-22 (2010).
19. Alter, S. et al. DroughtDB: an expert-curated compilation of plant drought stress genes and their homologs in nine species. Database (Oxford) (2015).
20. Kim, K.N., Cheong, Y.H., Grant, J.J., Pandey, G.K. & Luan, S. CIPK3, a calcium sensor-associated protein kinase that regulates abscisic acid and cold signal transduction in Arabidopsis. Plant Cell 15, 411-423 (2003).
21. Umezawa, T., Yoshida, R., Maruyama, K., Yamaguchi-Shinozaki, K. & Shinozaki, K. SRK2C, a SNF1-related protein kinase 2, improves drought tolerance by controlling stress-responsive gene expression in Arabidopsis thaliana. Proc Natl Acad Sci U S A 101, 17306-17311 (2004).
22. Russell, J. et al. Genetic diversity and ecological niche modelling of wild barley: refugia, large-scale post-LGM range expansion and limited mid-future climate threats? PLoS One 9, e86021 (2014).
23. Li, K. et al. Sympatric speciation of spiny mice, Acomys, unfolded transcriptomically at Evolution Canyon, Israel. Proc Natl Acad Sci U S A 113, 8254-9 (2016).
24. Hadid, Y. et al. Sympatric incipient speciation of spiny mice Acomys at “Evolution Canyon,” Israel. Proc Natl Acad Sci U S A 111, 1043-1048 (2014).
25. Parnas, T. Evidence for Incipient Sympatric Speciation in Wild Barley Hordeum Spontaneum, at "Evolution Canyon", Mount Carmel, Israel, Based on Hybridization and Physiological and Genetic Diversity Estimates, (University of Haifa, Faculty of Science and Science Education, Department of Evolutionary and Environmental Biology, 2006).
26. Suprunova, T. et al. Differential expression of dehydrin (Dhn) in response to water stress in resistant and sensitive wild barley (Hordeum spontaneum). Plant Cell Environ 27, 297-308 (2004).
27. Yang, Z.J., Zhang, T., Bolshoy, A., Beharav, A. & Nevo, E. Adaptive microclimatic structural and expressional dehydrin 1 evolution in wild barley, Hordeum spontaneum, at 'Evolution Canyon', Mount Carmel, Israel. Mol Ecol 18, 2063-2075 (2009).
28. Yang, Z.J., Zhang, T., Li, G.R. & Nevo, E. Adaptive microclimatic evolution of the dehydrin 6 gene in wild barley at "Evolution Canyon", Israel. Genetica 139, 1429-1438 (2011).
29. Gigon, A., Matos, A.R., Laffray, D., Zuily-Fodil, Y. & Pham-Thi, A.T. Effect of drought stress on lipid metabolism in the leaves of Arabidopsis thaliana (ecotype Columbia). Ann Bot 94, 345-51 (2004).
30. Takeno, K. Stress-induced flowering: the third category of flowering response. J Exp Bot 67, 4925-34 (2016).
31. Gonzalez-Guzman, M. et al. Arabidopsis PYR/PYL/RCAR receptors play a major role in quantitative regulation of stomatal aperture and transcriptional response to abscisic acid. Plant Cell 24, 2483-96 (2012).
32. Riboni, M., Galbiati, M., Tonelli, C. & Conti, L. GIGANTEA Enables Drought Escape Response via Abscisic Acid-Dependent Activation of the Florigens and SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1. Plant Physiol 162, 1706-1719 (2013).
33. Bird, C.E., Fernandez-Silva, I., Skillings, D.J. & Toonen, R.J. Sympatric Speciation in the Post "Modern Synthesis" Era of Evolutionary Biology. Evol Biol 39, 158-180 (2012).
34. Bolnick, D.I. & Fitzpatrick, B.M. Sympatric speciation: Models and empirical evidence. Annu Rev Ecol Evol S 38, 459-487 (2007).
35. Strasburg, J.L. et al. What can patterns of differentiation across plant genomes tell us about adaptation and speciation? Philos Trans R Soc Lond B Biol Sci 367, 364-373 (2012).
36. Westberg, E. & Kadereit, J.W. Genetic evidence for divergent selection on Oenanthe conioides and Oe. aquatica (Apiaceae), a candidate case for sympatric speciation. Biol J Linn Soc 113, 50-56 (2014).
37. Savolainen, V. et al. Sympatric speciation in palms on an oceanic island. Nature 441, 210-3 (2006).
38. Via, S. & West, J. The genetic mosaic suggests a new role for hitchhiking in ecological speciation. Mol Ecol 17, 4334-45 (2008).
39. Tingwen, X. Origin and Evolution of Cultivated Barley in China [J]. Acta Genetica Sinica 6(1982).
40. Dai, F. et al. Transcriptome profiling reveals mosaic genomic origins of modern cultivated barley. Proc Natl Acad Sci U S A 111, 13403-8 (2014).
41. Bolger, A.M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-20 (2014).
42. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
43. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J Mol Biol 215, 403-10 (1990).
44. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-95 (2010).
45. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30, 2725-9 (2013).
46. Robinson, M.D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11, R25 (2010).
47. Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-40 (2010).
48. Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B Met 57, 289-300 (1995).
49. Gentleman, R.C. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80 (2004).
Acknowledgements
This work was supported by the Australian Grains Research and Development Corporation
(DAW00233), the National Natural Science Foundation of China (31620103912 and 31471480)
and the Natural Science Foundation of Zhejiang Province (grant LR15C130001).
Bioinformatic resources were provided by the Pawsey Supercomputing Centre with funding
from the Australian Government and the Government of Western Australia, and the NeCTAR
Research Cloud, a collaborative Australian research platform supported by the National
Collaborative Research Infrastructure Strategy (NCRIS). The International Barley Genome
Sequence Consortium provided pre-publication access to the barley reference genome
sequence with particular thanks to Nils Stein, Martin Mascher, Tim Close and Mats Hansson.
Authorship contributions
EN, DS, WZ, CL collected the samples. XZ, FD, WZ, CL performed DNA sequencing. FD,
XW, GZ performed RNA sequencing and the drought experiment. CT, PW, BC, RB, MB
processed the data and performed computational analyses. CT, CBH, PW, BC, RB, QZ, CL
analyzed the data and interpreted the results. CT, CBH, EN, RB, BC, YJ, QZ, PW, YH and
CL wrote the manuscript. MS, IB, AS, PL, RW, GZ and CL provided the reference genome
sequence. CL, EN and GZ conceived and designed the study. All authors have read and
approved the manuscript for publication.
Competing financial interests: The authors declare no competing financial interests.
Materials & Correspondence: Correspondence and requests for materials should be
addressed to C.L.
Data availability
All of the raw reads for the whole-genome shotgun sequencing were deposited in NCBI-SRA
with accession number SRP076351, and sample IDs SAMN05207291 to SAMN05207298,
SAMN08038525 and SAMN08038526. Accession identifiers for ten RNA-seq samples are
SAMN05200841 and SAMN05207047-SAMN05207055.
FIGURE LEGENDS
Fig. 1 | Genomic diversity and evolution related analysis based on RNA sequencing of
ten samples from two slopes of ‘Evolution Canyon’. a, SNP frequency distribution is
displayed as ten circular histograms representing ten barley accessions from two slopes. From
inner to outer, circles 1-10 indicate barley accessions NFS148, NFS146, NFS145, NFS144,
NFS143, SFS125, SFS124, SFS123, SFS122 and SFS120. The height of the bars indicates the
number of SNPs (log10 scale) in each 1 Mb genomic region. b, Neighbour-joining tree.
c, Principal component analysis. This analysis was conducted based on 218,709 SNPs without
missing data using EIGENSOFT. d, Population structure analysis was conducted using
ADMIXTURE.
Fig. 2 | Genomic diversity and evolution related analysis based on whole genome
sequencing of ten samples from four different groups. a, SNP frequency distribution is
displayed as ten circular histograms representing ten barley accessions. From inner to outer,
circles 1-10 indicate barley accessions DB2, DB1, TB2, TB1, SFS2.2, SFS2.1, SFS2.3, NFS6.1,
NFS7.2 and NFS7.1. The height of the bars indicates the number of SNPs (log10 scale) in each
1 Mb genomic region. b, Neighbour-joining tree, constructed using PHYLIP from sequence
distances calculated from 29,429 SNPs evenly distributed across each chromosome.
c, Principal component analysis. This analysis was conducted based on 54,255,907 SNPs
without missing data using EIGENSOFT. d, Population structure analysis was conducted using
ADMIXTURE.
Fig. 3 | Selective sweep analysis across the whole genome. The analysis was conducted with
VCFtools based on 54,255,907 SNPs without missing data from whole genome sequencing. The red
horizontal line indicates the 95% threshold, which corresponds to an FST value of 0.89.
Fig. 4 | Statistics of DEGs under drought stress treatment in SFS and NFS. The yellow circle
indicates the number of DEGs in SFS and the blue circle the number of DEGs in NFS; the
overlap represents the DEGs detected in both SFS and NFS.
Fig. 5 | Distribution of DEGs under drought stress treatment in SFS and NFS. The Y-axis
indicates the number of DEGs in each 10 Mb region and the X-axis indicates physical
position. Blue dots represent DEGs detected only in SFS under drought stress treatment, red
dots represent DEGs in NFS and green dots represent DEGs detected in both SFS and NFS.
Fig. 6 | Karyotypical comparison of wild barley NFS6.1 and SFS2.3 and cultivated barley
Golden Promise by FISH.
Tables
Table 1 | Summary of the number of differentially expressed genes (DEGs) in samples
from the SFS and NFS after drought stress.
Group      Total DEGs   Up-regulated   Down-regulated
SFS        2,791        1,607          1,184
NFS        1,030        212            818
Overlap    274          129            145