Molecular phylogenetics 4: Level 3 Molecular Evolution and Bioinformatics, Jim Provan (Page and Holmes: Sections 6.7-8)

Upload: janis-barnett

Post on 03-Jan-2016


TRANSCRIPT

Page 1: Molecular phylogenetics 4 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections 6.7-8

Molecular phylogenetics 4

Level 3 Molecular Evolution and Bioinformatics

Jim Provan

Page and Holmes: Sections 6.7-8

Page 2:

Have we got the true tree?

Several approaches have been developed to answer this question:

Analysis:
– In some cases (e.g. UPGMA) the phylogenetic method is simple enough that we can establish mathematically the exact conditions under which it will fail
– Parsimony can fail under particular distributions of edge lengths

Known phylogenies:
– The best evidence for the success of a tree-building method would be if it could accurately reconstruct a known phylogeny
– Typically, “known” phylogenies exist only for crop plants and laboratory animals, and even these are often suspect
– Growth of bacteriophage T7 in the presence of mutagens allowed comparison of tree-building methods
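The Analysis point above, that the failure conditions of a simple method such as UPGMA can be worked out exactly, can be illustrated with a small sketch. It assumes a hypothetical four-taxon tree ((A,B),(C,D)) in which B and D sit on long branches; all branch lengths and names are invented for illustration. Because UPGMA starts by joining the closest pair, it groups the two short-branched taxa A and C, which are not sisters, whereas the four-point condition on the same distances still identifies the true split.

```python
from itertools import combinations

# Hypothetical four-taxon tree ((A,B),(C,D)); all branch lengths are assumptions.
# B and D sit on long branches, A and C on short ones.
TERMINAL = {"A": 0.1, "B": 1.0, "C": 0.1, "D": 1.0}
INTERNAL = 0.1                      # branch separating (A,B) from (C,D)
SISTERS = [{"A", "B"}, {"C", "D"}]  # the true sister pairs


def path_length(x, y):
    """Distance between two tips, measured along the true tree."""
    d = TERMINAL[x] + TERMINAL[y]
    if {x, y} not in SISTERS:
        d += INTERNAL  # non-sisters are also separated by the internal branch
    return d


distances = {frozenset(p): path_length(*p) for p in combinations("ABCD", 2)}

# UPGMA's first step joins the pair with the smallest distance.
first_join = set(min(distances, key=distances.get))


def split_sum(pair):
    """Sum of the two within-group distances for the split pair | rest."""
    rest = set("ABCD") - set(pair)
    return distances[frozenset(pair)] + distances[frozenset(rest)]


# Four-point condition: with additive distances the true split has the smallest sum.
best_split = min((set(p) for p in combinations("ABCD", 2)), key=split_sum)

print("UPGMA joins first:", sorted(first_join))
print("Four-point condition prefers:", sorted(best_split))
```

With equal terminal branch lengths the closest pair would be a true sister pair, so the failure here comes entirely from the unequal rates.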

Page 3:

Have we got the true tree?

Several approaches (continued):

Simulation:
– Provide software with a tree and “evolve” DNA sequences along branches according to some model
– Supply the resulting sequences to a range of tree-building methods and determine which (if any) recover the original tree
– An advantage of this approach is that we can explore the effects of a wide range of parameters on the performance of tree reconstruction methods
– A disadvantage is that the models used to generate the new sequences may be unrealistic and, in particular, may be biased towards a particular method
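The first simulation step above can be sketched in a few lines. This is a minimal illustration rather than any particular simulation package: it assumes the Jukes-Cantor model, a hypothetical rooted tree with made-up branch lengths, and the invented names `EDGES`, `evolve` and `simulate`. Under Jukes-Cantor, a site differs from its ancestor after a branch of length t (expected substitutions per site) with probability 3/4 (1 - exp(-4t/3)), and the new base is equally likely to be any of the other three.

```python
import math
import random

BASES = "ACGT"

# Hypothetical rooted tree: (parent, child, branch length in expected
# substitutions per site), listed so that each parent precedes its children.
EDGES = [
    ("root", "n1", 0.05),
    ("root", "n2", 0.05),
    ("n1", "A", 0.10),
    ("n1", "B", 0.10),
    ("n2", "C", 0.10),
    ("n2", "D", 0.10),
]


def evolve(seq, t, rng):
    """Evolve seq along a branch of length t under the Jukes-Cantor model."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # A changed site becomes one of the three other bases, uniformly.
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def simulate(edges, n_sites, seed=0):
    """Return simulated sequences for the tips of the tree described by edges."""
    rng = random.Random(seed)
    seqs = {"root": "".join(rng.choice(BASES) for _ in range(n_sites))}
    for parent, child, t in edges:
        seqs[child] = evolve(seqs[parent], t, rng)
    children = {c for _, c, _ in edges}
    parents = {p for p, _, _ in edges}
    return {name: seqs[name] for name in sorted(children - parents)}


alignment = simulate(EDGES, n_sites=200)
for name, seq in alignment.items():
    print(name, seq[:40])
```

The resulting tip sequences would then be passed to each tree-building method under test. Changing the branch lengths in `EDGES` is one way to explore how performance depends on the parameters of the true tree.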

Page 4:

The “Felsenstein Zone”

[Figure: two panels, UPGMA and Parsimony]

Page 5:

Congruence

Congruence is the agreement between estimates of phylogeny based on different characters:

If data sets are independent, the probability of obtaining similar trees by chance is extremely small
Conversely, if different data sets give similar trees then this suggests that both reflect the same underlying cause, namely the same evolutionary history

Two ways of using congruence:

To validate a method of inference: a method that consistently recovers similar trees from different data sets will be preferred to a method that produces different trees from different data sets
To validate a new source of data: does a newly sequenced gene contain phylogenetic information?
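To give a feel for why chance agreement between independent data sets is so unlikely, the sketch below counts the distinct unrooted bifurcating trees for n taxa, which is (2n-5)!! = 3 × 5 × ... × (2n-5). The function name is invented for illustration. Treating the second tree as a uniform random draw from all topologies is a simplifying assumption, so the resulting probabilities are only a rough guide.

```python
def n_unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees: (2n-5)!! = 3 * 5 * ... * (2n-5)."""
    if n_taxa < 3:
        raise ValueError("needs at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (5, 10, 20):
    trees = n_unrooted_trees(n)
    # Chance that a second, uniformly random tree has exactly the same topology.
    print(f"{n} taxa: {trees} trees, chance of an identical random tree = {1 / trees:.3g}")
```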

Page 6:

Sampling error

If a data set contains homoplasy then different nucleotide sites support different trees:

Which tree(s) a given data set supports depends on which characters have been sampled
Estimates of phylogeny based on samples will be accompanied by sampling error

Effects of sampling error are evident when comparing trees for different mitochondrial genes:

Since there is no recombination, all mitochondrial genes share the same evolutionary history
Nevertheless, several different trees were obtained

Sampling of taxa is also important

Page 7:

Bootstrapping

Bootstrapping is a way of estimating sampling error without taking repeated samples from the population / species under study:

Mimics the technique of repeated sampling from the original population by resampling from the original sample
Each resampling is a pseudoreplicate

Bootstrapping can be applied to phylogenetics by taking several pseudoreplicates:

Sampling with replacement gives a new data set based on the original sample:
– Some sites represented more than once
– Some sites not represented at all

Each pseudoreplicate can be used to construct a new tree
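The resampling step described above can be sketched as follows. The example uses a small five-taxon, nine-site alignment; the function name `pseudoreplicate` and the fixed random seed are assumptions made for illustration. Whole alignment columns (sites) are drawn with replacement, and the same draw is applied to every taxon, so some sites appear more than once and others not at all.

```python
import random

# Small illustrative alignment: nine sites for five taxa.
ALIGNMENT = {
    "Human":      "TCCTTAAAA",
    "Chimp":      "TTCTATAAA",
    "Gorilla":    "TTACAATAA",
    "Orang-utan": "CCACAAATA",
    "Gibbon":     "CCACAAAAT",
}


def pseudoreplicate(alignment, rng):
    """Resample alignment columns with replacement; return (columns, new alignment)."""
    n_sites = len(next(iter(alignment.values())))
    # Draw as many column indices as there are sites, with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same columns to every taxon so that sites stay intact.
    resampled = {
        name: "".join(seq[i] for i in cols) for name, seq in alignment.items()
    }
    return cols, resampled


rng = random.Random(1)  # fixed seed only so that the sketch is repeatable
cols, replicate = pseudoreplicate(ALIGNMENT, rng)
print("sampled sites:", [i + 1 for i in cols])
for name, seq in replicate.items():
    print(f"{name:<11}{seq}")
```

In a full analysis this would be repeated many times (for example 100), with a tree built from each pseudoreplicate by the chosen method.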

Page 8:

Bootstrapping

Original data:

Site        1 2 3 4 5 6 7 8 9
Human       T C C T T A A A A
Chimp       T T C T A T A A A
Gorilla     T T A C A A T A A
Orang-utan  C C A C A A A T A
Gibbon      C C A C A A A A T

Pseudoreplicate:

Site        2 7 7 3 1 7 4 9 6
Human       C A A C T A T A A
Chimp       C A A C T A T A T
Gorilla     A T T A T T C A A
Orang-utan  A A A A C A C A A
Gibbon      A A A A C A C T A

[Figure: Original tree and Bootstrap tree]

Page 9:

Bootstrapping

[Figure: three trees, with tips ordered C, G, H, B, O (41/100); B, O, G, H, C (28/100); and B, O, C, H, G (31/100)]
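Bootstrap results of this kind are commonly summarised as the proportion of pseudoreplicate trees that contain a given group. The sketch below shows one way to tally that. The counts 41, 28 and 31 echo the labels in the figure, but the clades assigned to each count are hypothetical, since the branching order of the figure's trees is not recoverable here. The function name `clade_support` is likewise invented.

```python
from collections import Counter

# Hypothetical bootstrap output: each pseudoreplicate tree is summarised by the
# set of clades it contains. The counts 41, 28 and 31 follow the figure labels;
# the clades assigned to each count are assumptions made for illustration only.
trees = (
    [{frozenset("HC"), frozenset("HCG")}] * 41
    + [{frozenset("GH"), frozenset("GHC")}] * 28
    + [{frozenset("CG"), frozenset("CGH")}] * 31
)


def clade_support(trees):
    """Fraction of trees in which each clade appears."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


support = clade_support(trees)
for clade, frac in sorted(support.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), f"{frac:.2f}")
```

In this made-up example the three-taxon group appears in every pseudoreplicate even though no single pairing within it is found in a majority, which is why support is usually reported per group rather than per whole tree.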

Page 10:

What can go wrong?

Sampling error:

Almost all phylogenies are based on a sample of some sort
Especially true given the vagaries of homoplasy

Incorrect model of sequence evolution:

All methods make implicit or explicit assumptions about the evolutionary process
An example is the problem of base composition:
– An AT-rich part of a gene may be more similar to an AT-rich part of a different gene purely by chance
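The base composition point can be made concrete with a short calculation. For two unrelated sequences whose sites are drawn independently from the same base frequencies, the chance that a given site matches is the sum of the squared frequencies. The AT-rich frequencies below are hypothetical values chosen for illustration, and the independence of sites is a simplifying assumption.

```python
def expected_identity(freqs):
    """Chance that two independent random sites match, given base frequencies."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    return sum(p * p for p in freqs.values())


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.40, "T": 0.40, "C": 0.10, "G": 0.10}  # hypothetical AT-rich region

print(f"equal base frequencies: {expected_identity(uniform):.2f}")
print(f"AT-rich frequencies:    {expected_identity(at_rich):.2f}")
```

Two unrelated AT-rich regions are therefore expected to match at more sites than two unrelated regions with equal base frequencies, and a method that ignores composition can read that extra similarity as shared ancestry.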

Tree structure:

Evolutionary history is not always simple:
– Rapid cladogenesis
– Widely differing rates of divergence
– Horizontal gene transfer