molecular markers in plant systematicsand population biology · molecular markers in plant...

38
Molecular markers in plant systematics and population biology 7. DNA sequencing (cpDNA) Tomáš Fér [email protected]

Upload: vukiet

Post on 06-May-2018

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Molecular markers in plant systematics and population 

biology

7. DNA sequencing (cpDNA)

Tomáš Fér

[email protected]

Page 2: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

DNA sequencing

• detection the order of nucleotides in a DNA strand

…ATATATAGGCAAGGAATCTCTATTATTAAATCATT…

• use the information to model evolutionary processes

• make hypothesis about similarity and relationships among taxa

Page 3: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Sequencing principle

• PCR with a primer pair• amplification of the target region

• cycle sequencing (dideoxy, Sanger)• use of one primer only• dNTP as well as ddNTP are present in the mixture• produce fragments differing exactly by one base

• electrophoretic separation of fragments in the gel• automated sequencer

Page 4: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

2´, 3´‐ dideoxy NTPs

dTTP ddTTP

3´-CTGGACTGCA-5´5´-GACCT

Page 5: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Cycle sequencing

5´-ATGCATGC-3´3´-TACG-5´ primer

template

ddGTP

ATGCATGCGTACG

ddCTP

ATGCATGCCGTACG

ddATP

ATGCATGCACGTACG

ATGCATGC

ddTTP

TACGTACG

Page 6: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Automated sequencer

ABI (Applied Biosystems) – gel, capillary systems (up to 96) 

Page 7: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Genome structure• genetic information – order of nucleotides (ACGT)

• coding regions – exons – conserved• non‐coding regions – introns, spacers – variable

• nuclear, chloroplast and mitochondrial genome

5´ UTR       exon 1      intron       exon 2    3´ UTR       intergenic spacer exon

GENE 1 GENE 2

Page 8: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Sequence evolutionary rate

Page 9: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Types of variability in DNA sequences

5bp indel point mutations (SNPs)

Page 10: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Chloroplast genome

• many genes are single‐copy (only 1 copy in the whole genome)

• conserved evolution of the chloroplast genome• disadvantage when studying intraspecific or population variability

• many conserved regions can be used as priming sites

• structural rearrangements of chloroplast genome• mainly on larger evolotionary scale

• inversion – e.g., 30kb inversion differentiates bryophytes and higher plants

• extensive deletions

• loss of specific genes and intrones

• chloroplast capture• chloroplast transfer from one species to another by introgression

• can influence phylogeny in a wrong way (when not recognized)

Page 11: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Chloroplast genome

• 4 rRNAs

• 30‐11 tRNAs

• 21 ribosomal proteins (rps)

• 4 RNA polymerase subunits (rpo)

• 28 thylakoid proteins (ps)

• rbcL (large RuBisCO subunit)

• 11 proteins similar to NADH (ndh)

Page 12: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Frequently sequenced cpDNA regions

+ many others…

Page 13: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

rbcL

• gene for large subunit of ribuloso‐1,5‐bisphosphate‐carboxylase/oxygenase(RUBISCO)

• 1,428, 1,431 or 1,434 bp in length – indelsare extremely rare

• one of the first sequenced genes• very conserved, systematics at family or generic level, in some groups at species level

Page 14: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

atpB• gene coding beta subunit of ATP synthase

• 1,497 bp in length, indels not found

• similar use as rbcL

• codes a subunit of chloroplast NADH‐dehydrogenase

• 2,233 bp in length (tobacco)

• about 2x more substitutions then rbcL

• for generic level

ndhF

Page 15: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

matK

• gene coding maturase (splicing of plastid genes)

• about 1,550 bp in length – low number of indels

• systematics at family and generic level

Page 16: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

trnL intron and spacer between trnL and trnF

• tRNA genes – secondary structure• accumulation of insertions/deletionswith the same rate as nucleotide substitutions

• alignment problems, especially in distant organisms (sometimes already at family level)

• suitable for systematics of (closely) related species

trnLUAA5´exon

trnLUAA3´exon trnFGAA

intron spacer

Page 17: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

atpB‐rbcL

• spacer of about 900‐1,000 bp in length

• systematics at family and generic level

Page 18: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Variable non‐coding cpDNA regionsShaw et al. (2005): The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA 

sequences for phylogenetic analysis. Am. J. Bot. 92: 142‐166

trnH‐psbApsbA‐trnK

rpS16

trnS‐trnG

rpoB‐trnC

trnD‐trnT

trnC‐ycf6ycf6‐psbMpsbM‐ trnD

trnS‐trnfM

trnS‐rpS4rpS4‐trnTtrnT‐trnLtrnL‐trnF

5´rpS12‐rpL20

psbB‐psbH

rpL16

trnG‐trnG

Page 19: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Variable non‐coding cpDNA regions

Shaw et al. (2007): Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am. J. Bot. 94: 275–288.• another 13 regions

Shaw et al. (2014): Chloroplast DNA sequence utility for the lowest phylogenetic and phylogeographic inferences in angiosperms: The tortoise and the hare IV. Am. J. Bot. 101: 1987–2004.• top 13 regions within each major evolutionary lineage

Page 20: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Use of chloroplast sequences

• phylogeny of angiosperms

• among‐species relationships within a genus

• within‐species phylogeography (haplotype definition)

• hybridization – inference of the maternal taxon (individual) – cpDNA maternaly inherited in angiosperms

Page 21: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Jansen et al. (2007)81 plastid genes76,583 nucleotides

Angiosperm phylogeny

Page 22: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Relationships among species

CapsicumatpB‐rbcL spacerWalsh & Hoot (2001)

Page 23: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Phylogeography

Arabis alpinatrnL‐trnFKoch et al. 2006

Page 24: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Inter‐specific hybridization

PersicariamatK, psbA‐trnH, trnL‐trnFversus ITSKim & Donoghue 2008

incongruence between cpDNA and nDNA

Page 25: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Data analysis• multiple alignment

• construction of phylogenetic tree• distance methods

• maximum parsimony (MP)

• maximum likelihood (ML)

• Bayesian inference (BI)

S206 ATATATATATAGGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA

S207 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA

S208 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA

S209 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA

S210 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA

S0G3 ATATATATA--GGCAAGGAATCTCTATTATTAAATCATTTAGAATCCATA

TL ATATATATATAGGCAAGGAATCTCTATTATTAAATCATTCATAATTCATA

Page 26: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Maximum parsimony (MP)

• cladistic method• search for the simplest tree (most parsimonioustree)

• i.e., tree in which the evolution is explained by minimum number of substitutions

• software• PAUP * Phylogenetic Analysis Using Parsimony(* and other methods)

• TNTTree Analysis Using New Technology

Page 27: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Maximum likelihood (ML)

• search for tree with the highest probability (likelihood – L)

• probability that observed sequences evolved under given tree topology (and under given evolutionary model)

• software GARLI, PhyML, RAxML, PAML…

Page 28: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Evolutionary models for DNA sequences

• models for sequence changes

• parameters• base frequences• substitution types (transitions, transversions)• heterogeneity in substitution rates (G)• proportion of invariant sites (I)

A

T C

G purines

pyrimidines

transition

transversion

Page 29: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Substitution modelsJC (Jukes‐Cantor 1969)– same substitution rates– same base frequencies

A

CT

G

a

aa

a

a

a

A

CT

G

a

aa

a

a

a

A

CT

G

a

aa

b

b

a

A

CT

G

d

ce

f

a

b

K2P (Kimura 2 parameter 1980)– two different substitution rates– same base frequencies

F81 (Felsenstein 1981)– same substitution rates– different base frequencies HKY (Hasegawa, Kishino & Yano 1985)

– two different substitution rates– different base frequencies

GTR (General time‐reversible model)(Tavaré et al. 1986)– six different substitution rates– different base frequencies

Increasing

 num

ber of param

eters

b

A

CT

G

a

aa

b

a

Page 30: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Which model to select?

• MODELTEST: A tool to select the best‐fit model of nucleotide substitution (Posada et al.)

• testing different models – selecting the simpliestthat sufficiently explain the data using• hierarchical likelihood ratio tests (hLRTs)

• Akaike information criterion (AIC)

• jModelTest2 (https://code.google.com/p/jmodeltest2/)

Page 31: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Saturation• signal and noise in the data

• corrected versus uncorrecteddistance

• skewness (g1‐statistics), ISS

Gontcharov & Melkonian 2008

Page 32: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Molecular clock

• strict (global)• clocklike evolution

• local

• relaxed clocks• autocorrelated (closely related taxa have similar mutation rates)

• uncorrelated (lognormal, exponential)

• calibration• substitution rates from another study or generally assumed rate (e.g., for cpDNA)

• fossils

• biogeography

• software• BEAST (Bayesian), r8s (non‐parametric rate smoothing, penalized likelihood), … 

http://sydney.edu.au/science/biology/meep/documents/Workshop_Lecture4.pdf

Page 33: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Estimates of divergence times(BEAST – Bayesian Evolutionary Analysis Sampling Trees)

Antonelli A. (2009): Have giant lobelias evolved several times independently? Life form shifts and historical biogeography of the cosmopolitan and highly diverse subfamily Lobelioideae (Campanulaceae). BMC Biol. 7:82

Page 34: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Gene banks – databases of sequences

• GenBankNational Centre for Biotechnology Information (NCBI)http://www.ncbi.nlm.nih.gov/

• EMBLEuropean Bioinformatics Institute (EBI) http://www.ebi.ac.uk/embl/

Page 35: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

GenBank example

Page 36: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Population studySanz M. et al. (2014): Southern isolation and northern long‐distance dispersal shaped the phylogeography of the widespread, but highly disjunct, European high mountain plant Artemisia eriantha (Asteraceae). Botanical Journal of the Linnean Society 174: 214–226.

Page 37: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Systematic study

Renner S.S. (2004): A chloroplast phylogeny of Arisaema (Araceae) illustrates Tertiary floristic links between Asia, North America, and East Africa. American Journal of Botany 91(6): 881–888

Page 38: Molecular markers in plant systematicsand population biology · Molecular markers in plant systematicsand population biology 7. DNA sequencing (cpDNA) ... Microsoft PowerPoint

Literature

Soltis D.E. & al. [eds.] (1998): Molecular systematics of plants.II. DNA sequencing.

Hollingsworth & al. [eds.] (1999): Molecular systematics and plant evolution.

Hall B.G. (2001): Phylogenetic trees made easy.

Felsenstein J. (2004): Inferring phylogenies.

Lemey P. & al. [eds.] (2009): The phylogenetic handbook. 2nd ed.

Wiley E.O. & Lieberman B.S. (2011): Phylogenetics. Theory and Practice of Phylogenetic Systematics. 2nd ed.

Wheeler W. C. (2012): Systematics. A course of lectures.

Drummond et al. (2009): Relaxed phylogenetics and dating with confidence. PLoS Biol 4(5): e88