molecular markers in plant systematicsand population biology · molecular markers in plant...
TRANSCRIPT
Molecular markers in plant systematics and population
biology
7. DNA sequencing (cpDNA)
Tomáš Fér
DNA sequencing
• detection the order of nucleotides in a DNA strand
…ATATATAGGCAAGGAATCTCTATTATTAAATCATT…
• use the information to model evolutionary processes
• make hypothesis about similarity and relationships among taxa
Sequencing principle
• PCR with a primer pair• amplification of the target region
• cycle sequencing (dideoxy, Sanger)• use of one primer only• dNTP as well as ddNTP are present in the mixture• produce fragments differing exactly by one base
• electrophoretic separation of fragments in the gel• automated sequencer
2´, 3´‐ dideoxy NTPs
dTTP ddTTP
3´-CTGGACTGCA-5´5´-GACCT
Cycle sequencing
5´-ATGCATGC-3´3´-TACG-5´ primer
template
ddGTP
ATGCATGCGTACG
ddCTP
ATGCATGCCGTACG
ddATP
ATGCATGCACGTACG
ATGCATGC
ddTTP
TACGTACG
Automated sequencer
ABI (Applied Biosystems) – gel, capillary systems (up to 96)
Genome structure• genetic information – order of nucleotides (ACGT)
• coding regions – exons – conserved• non‐coding regions – introns, spacers – variable
• nuclear, chloroplast and mitochondrial genome
5´ UTR exon 1 intron exon 2 3´ UTR intergenic spacer exon
GENE 1 GENE 2
Sequence evolutionary rate
Types of variability in DNA sequences
5bp indel point mutations (SNPs)
Chloroplast genome
• many genes are single‐copy (only 1 copy in the whole genome)
• conserved evolution of the chloroplast genome• disadvantage when studying intraspecific or population variability
• many conserved regions can be used as priming sites
• structural rearrangements of chloroplast genome• mainly on larger evolotionary scale
• inversion – e.g., 30kb inversion differentiates bryophytes and higher plants
• extensive deletions
• loss of specific genes and intrones
• chloroplast capture• chloroplast transfer from one species to another by introgression
• can influence phylogeny in a wrong way (when not recognized)
Chloroplast genome
• 4 rRNAs
• 30‐11 tRNAs
• 21 ribosomal proteins (rps)
• 4 RNA polymerase subunits (rpo)
• 28 thylakoid proteins (ps)
• rbcL (large RuBisCO subunit)
• 11 proteins similar to NADH (ndh)
Frequently sequenced cpDNA regions
+ many others…
rbcL
• gene for large subunit of ribuloso‐1,5‐bisphosphate‐carboxylase/oxygenase(RUBISCO)
• 1,428, 1,431 or 1,434 bp in length – indelsare extremely rare
• one of the first sequenced genes• very conserved, systematics at family or generic level, in some groups at species level
atpB• gene coding beta subunit of ATP synthase
• 1,497 bp in length, indels not found
• similar use as rbcL
• codes a subunit of chloroplast NADH‐dehydrogenase
• 2,233 bp in length (tobacco)
• about 2x more substitutions then rbcL
• for generic level
ndhF
matK
• gene coding maturase (splicing of plastid genes)
• about 1,550 bp in length – low number of indels
• systematics at family and generic level
trnL intron and spacer between trnL and trnF
• tRNA genes – secondary structure• accumulation of insertions/deletionswith the same rate as nucleotide substitutions
• alignment problems, especially in distant organisms (sometimes already at family level)
• suitable for systematics of (closely) related species
trnLUAA5´exon
trnLUAA3´exon trnFGAA
intron spacer
atpB‐rbcL
• spacer of about 900‐1,000 bp in length
• systematics at family and generic level
Variable non‐coding cpDNA regionsShaw et al. (2005): The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA
sequences for phylogenetic analysis. Am. J. Bot. 92: 142‐166
trnH‐psbApsbA‐trnK
rpS16
trnS‐trnG
rpoB‐trnC
trnD‐trnT
trnC‐ycf6ycf6‐psbMpsbM‐ trnD
trnS‐trnfM
trnS‐rpS4rpS4‐trnTtrnT‐trnLtrnL‐trnF
5´rpS12‐rpL20
psbB‐psbH
rpL16
trnG‐trnG
Variable non‐coding cpDNA regions
Shaw et al. (2007): Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am. J. Bot. 94: 275–288.• another 13 regions
Shaw et al. (2014): Chloroplast DNA sequence utility for the lowest phylogenetic and phylogeographic inferences in angiosperms: The tortoise and the hare IV. Am. J. Bot. 101: 1987–2004.• top 13 regions within each major evolutionary lineage
Use of chloroplast sequences
• phylogeny of angiosperms
• among‐species relationships within a genus
• within‐species phylogeography (haplotype definition)
• hybridization – inference of the maternal taxon (individual) – cpDNA maternaly inherited in angiosperms
Jansen et al. (2007)81 plastid genes76,583 nucleotides
Angiosperm phylogeny
Relationships among species
CapsicumatpB‐rbcL spacerWalsh & Hoot (2001)
Phylogeography
Arabis alpinatrnL‐trnFKoch et al. 2006
Inter‐specific hybridization
PersicariamatK, psbA‐trnH, trnL‐trnFversus ITSKim & Donoghue 2008
incongruence between cpDNA and nDNA
Data analysis• multiple alignment
• construction of phylogenetic tree• distance methods
• maximum parsimony (MP)
• maximum likelihood (ML)
• Bayesian inference (BI)
S206 ATATATATATAGGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA
S207 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA
S208 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA
S209 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA
S210 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA
S0G3 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA
TL ATATATATATAGGCAAGGAATCTCTATTATTAAATCATTCATAATTCATA
Maximum parsimony (MP)
• cladistic method• search for the simplest tree (most parsimonioustree)
• i.e., tree in which the evolution is explained by minimum number of substitutions
• software• PAUP * Phylogenetic Analysis Using Parsimony(* and other methods)
• TNTTree Analysis Using New Technology
Maximum likelihood (ML)
• search for tree with the highest probability (likelihood – L)
• probability that observed sequences evolved under given tree topology (and under given evolutionary model)
• software GARLI, PhyML, RAxML, PAML…
Evolutionary models for DNA sequences
• models for sequence changes
• parameters• base frequences• substitution types (transitions, transversions)• heterogeneity in substitution rates (G)• proportion of invariant sites (I)
A
T C
G purines
pyrimidines
transition
transversion
Substitution modelsJC (Jukes‐Cantor 1969)– same substitution rates– same base frequencies
A
CT
G
a
aa
a
a
a
A
CT
G
a
aa
a
a
a
A
CT
G
a
aa
b
b
a
A
CT
G
d
ce
f
a
b
K2P (Kimura 2 parameter 1980)– two different substitution rates– same base frequencies
F81 (Felsenstein 1981)– same substitution rates– different base frequencies HKY (Hasegawa, Kishino & Yano 1985)
– two different substitution rates– different base frequencies
GTR (General time‐reversible model)(Tavaré et al. 1986)– six different substitution rates– different base frequencies
Increasing
num
ber of param
eters
b
A
CT
G
a
aa
b
a
Which model to select?
• MODELTEST: A tool to select the best‐fit model of nucleotide substitution (Posada et al.)
• testing different models – selecting the simpliestthat sufficiently explain the data using• hierarchical likelihood ratio tests (hLRTs)
• Akaike information criterion (AIC)
• jModelTest2 (https://code.google.com/p/jmodeltest2/)
Saturation• signal and noise in the data
• corrected versus uncorrecteddistance
• skewness (g1‐statistics), ISS
Gontcharov & Melkonian 2008
Molecular clock
• strict (global)• clocklike evolution
• local
• relaxed clocks• autocorrelated (closely related taxa have similar mutation rates)
• uncorrelated (lognormal, exponential)
• calibration• substitution rates from another study or generally assumed rate (e.g., for cpDNA)
• fossils
• biogeography
• software• BEAST (Bayesian), r8s (non‐parametric rate smoothing, penalized likelihood), …
http://sydney.edu.au/science/biology/meep/documents/Workshop_Lecture4.pdf
Estimates of divergence times(BEAST – Bayesian Evolutionary Analysis Sampling Trees)
Antonelli A. (2009): Have giant lobelias evolved several times independently? Life form shifts and historical biogeography of the cosmopolitan and highly diverse subfamily Lobelioideae (Campanulaceae). BMC Biol. 7:82
Gene banks – databases of sequences
• GenBankNational Centre for Biotechnology Information (NCBI)http://www.ncbi.nlm.nih.gov/
• EMBLEuropean Bioinformatics Institute (EBI) http://www.ebi.ac.uk/embl/
GenBank example
Population studySanz M. et al. (2014): Southern isolation and northern long‐distance dispersal shaped the phylogeography of the widespread, but highly disjunct, European high mountain plant Artemisia eriantha (Asteraceae). Botanical Journal of the Linnean Society 174: 214–226.
Systematic study
Renner S.S. (2004): A chloroplast phylogeny of Arisaema (Araceae) illustrates Tertiary floristic links between Asia, North America, and East Africa. American Journal of Botany 91(6): 881–888
Literature
Soltis D.E. & al. [eds.] (1998): Molecular systematics of plants.II. DNA sequencing.
Hollingsworth & al. [eds.] (1999): Molecular systematics and plant evolution.
Hall B.G. (2001): Phylogenetic trees made easy.
Felsenstein J. (2004): Inferring phylogenies.
Lemey P. & al. [eds.] (2009): The phylogenetic handbook. 2nd ed.
Wiley E.O. & Lieberman B.S. (2011): Phylogenetics. Theory and Practice of Phylogenetic Systematics. 2nd ed.
Wheeler W. C. (2012): Systematics. A course of lectures.
Drummond et al. (2009): Relaxed phylogenetics and dating with confidence. PLoS Biol 4(5): e88