[Figure: phylogenetic tree whose leaves are labeled with genome and gene identifiers (ECH3937, SCRI1043, PlTTO1, SenLT2, SenCT18, SenTy2, MG1655, EcoRIM, EDL933, CFT073, Sfl301, Sfl2457T, YP91001, YPCO92, YPKIM)]
Phylogeny
Vocabulary of Phylogenetic Trees
• A graph of edges and nodes that illustrates the evolutionary relationships among Operational Taxonomic Units (OTUs)
• Topology refers to the branching pattern
http://www.ncbi.nlm.nih.gov/About/primer/phylo.html
Rooting and Scaling – Same tree, different look?
Three different rooted trees consistent with a four-taxon unrooted tree
What is the total number of possible rooted trees consistent with this unrooted tree?
How many possible trees for n taxa?
Number of Rooted Trees = $\dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$
Number of Unrooted Trees = $\dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
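Both formulas can be checked numerically. This is a minimal Python sketch that assumes fully bifurcating trees with labeled tips; the function names are illustrative. It also checks that each unrooted tree can be rooted on any of its 2n - 3 branches. For four taxa that means 5 rooted trees per unrooted tree, which answers the question above.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Number of rooted, fully bifurcating trees with n labeled tips (n >= 2)."""
    if n < 2:
        raise ValueError("a rooted tree needs at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """Number of unrooted, fully bifurcating trees with n labeled tips (n >= 3)."""
    if n < 3:
        raise ValueError("an unrooted tree needs at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in range(3, 11):
    # An unrooted tree with n tips has 2n - 3 branches; the root can go on any of them.
    assert rooted_trees(n) == unrooted_trees(n) * (2 * n - 3)
    print(f"{n:2d} taxa: {unrooted_trees(n):>10,} unrooted, {rooted_trees(n):>12,} rooted")
```

For 10 taxa this gives 2,027,025 unrooted and 34,459,425 rooted trees. Numbers like these are why exhaustive tree searches quickly become impractical.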
Phylogeny and Genomics
• A species tree provides a framework for analyzing presence and absence of genes in genomes (or traits in organisms)
  – The species tree may be unknown
• A genome is a (comprehensive) source of DNA and (predicted) protein sequences to use for phylogenetic reconstruction
  – Different regions of the genome may support different trees
• Trees are useful for examining evolutionary history of gene families
  – Knowledge of the species tree affects interpretation of gene family trees.
Knowing the relationship between strains and species provides a framework for interpretation
[Figure: tree of Erwinia carotovora, Pantoea stewartii, Salmonella enterica and Yersinia pestis]
[Figure: the same four species grouped by the character “host type”: a reasonable guess]
But is this a good choice if the goal is to reconstruct the “species tree”?
Why might you choose to build your tree from molecular sequence data rather than phenotype, even if what you are really interested in is the evolution of host range?
Best tree from molecular phylogenetic analysis using multiple core metabolism proteins
“True” species tree?
[Figure: tree of Pantoea stewartii, Salmonella enterica, Erwinia carotovora and Yersinia pestis]
Why choose to use multiple genes or proteins instead of one?
Why choose core metabolism proteins?
Why might it be a bad idea?
Mapping the trait of interest (phenotypes, presence/absence of genes) onto the species tree
“True” species tree
[Figure: species tree of Pantoea stewartii, Salmonella enterica, Erwinia carotovora and Yersinia pestis, each marked + or - for the trait/gene of interest, a signaling system (+, -, +, - in the order listed)]
ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta  Organism A
............................................................  Organism B
............................tg.................g.......g....  Organism C
.................a.....................................g....  Organism D
.......a...............................................g....  Organism E
[Figure: tree of Organisms A to E]
From Multiple Alignment to Phylogeny
Four Approaches to Tree Reconstruction
• Distance Methods (MEGA, PAUP, Phylip)
  – Estimate a distance matrix
  – Infer topology and branch lengths
• Maximum Parsimony (PAUP)
  – Sift through all possible trees to find “the one” that requires the smallest number of evolutionary events
• Maximum Likelihood (PAUP)
  – Find the tree most likely to have generated the sequence data
• Bayesian (MrBayes)
  – Produce a probability distribution over all possible trees (or a well-sampled subset) using MCMC to explore tree space
Distance matrices and data types
• DNA sequence
• Protein sequence
• Shared gene content
• Similarity of gene expression profile
• Anything you can represent as a pair-wise distance between OTUs
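Shared gene content is one example of a non-sequence distance. Here is a sketch that assumes each genome has been reduced to a set of gene-family identifiers. It uses one minus the Jaccard index (shared families divided by all families) as the pairwise distance. The genome names and gene families are hypothetical.

```python
from itertools import combinations


def gene_content_distance(a: set, b: set) -> float:
    """1 - Jaccard index: 0 for identical gene content, 1 for no shared genes."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical genomes, each summarized as the set of gene families it contains.
genomes = {
    "X": {"g1", "g2", "g3", "g4"},
    "Y": {"g1", "g2", "g3"},
    "Z": {"g1", "g5"},
}

matrix = {
    (p, q): gene_content_distance(genomes[p], genomes[q])
    for p, q in combinations(genomes, 2)
}

for (p, q), d in matrix.items():
    print(f"{p}-{q}: {d:.2f}")
```

The output is a pairwise matrix like the sequence-based ones. Any of the distance methods below can therefore be applied to it unchanged.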
DNA or Protein?
DNA
• Well-developed evolutionary models
• Varies even among closely related OTUs
• Can be used for regions other than protein-coding genes
• Can be partitioned into synonymous/nonsynonymous changes
• “Saturates” faster than protein because there are only 4 characters (GATC)
Protein
• Evolutionary models are available (empirical)
• Conserved enough to use for distantly related OTUs
• Can only be used for proteins
• 20 characters
ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta  Organism A
............................................................  Organism B
............................tg.................g.......g....  Organism C
.........a.......a.....................................g....  Organism D
.......a.a.............................................g....  Organism E
                 A     B     C     D     E
Organism A       -     0     4     3     3
Organism B             -     4     3     3
Organism C                   -     4     4
Organism D                         -     2
Organism E                               -
USE AN EVOLUTIONARY MODEL TO CORRECT THE DISTANCE MATRIX FOR UNOBSERVED CHANGES.
Distance, in its simplest form, is a count of the differences between two sequences.
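Reading the dot notation and counting differences can be sketched in Python. The sketch assumes that a dot means "same base as Organism A", and it encodes the alignment given just before the distance matrix. For each pair it reports the raw count of differing columns and the p-distance (count divided by alignment length). The helper names are illustrative.

```python
from itertools import combinations

# Organism A is written out in full; for the others a dot means
# "same base as Organism A", as in the alignment above.
REFERENCE = "ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta"
DOTTED = {
    "A": REFERENCE,
    "B": "." * 60,
    "C": "." * 28 + "tg" + "." * 17 + "g" + "." * 7 + "g" + "." * 4,
    "D": "." * 9 + "a" + "." * 7 + "a" + "." * 37 + "g" + "." * 4,
    "E": "." * 7 + "a.a" + "." * 45 + "g" + "." * 4,
}


def expand(dotted: str, reference: str) -> str:
    """Replace each dot with the reference base in the same column."""
    return "".join(ref if ch == "." else ch for ch, ref in zip(dotted, reference))


def count_differences(s: str, t: str) -> int:
    """Simplest distance: number of aligned columns where the bases differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


seqs = {name: expand(s, REFERENCE) for name, s in DOTTED.items()}
assert all(len(s) == len(REFERENCE) for s in seqs.values())

for x, y in combinations(seqs, 2):
    n = count_differences(seqs[x], seqs[y])
    print(f"{x}-{y}: {n} differences, p-distance {n / len(REFERENCE):.3f}")
```

The counts for Organism A against B, C, D and E (0, 4, 3, 3) reproduce the first row of the matrix.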
Five Models for Nucleotide Substitution (there are others)
Jukes and Cantor, 1969: all substitutions are equally likely
Kimura, 1980: transitions are more likely than transversions
Tamura: transitions are more likely than transversions, and GC content does not equal AT content
Tamura and Nei: transitions are more likely than transversions, GC content does not equal AT content, and G-A and T-C transitions occur at different rates
Unrestricted: there is no discernible relationship between rates
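An element e_ij of the matrix stands for the substitution rate from the nucleotide in the ith row to the nucleotide in the jth column.
Models of Nucleotide Substitution
Jukes-Cantor (every substitution occurs at the same rate α)
       A   T   C   G
  A    -   α   α   α
  T    α   -   α   α
  C    α   α   -   α
  G    α   α   α   -
Kimura (transitions A-G and C-T at rate α, transversions at rate β)
       A   T   C   G
  A    -   β   β   α
  T    β   -   α   β
  C    β   α   -   β
  G    α   β   β   -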
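Each of these two models leads to a closed-form correction of the observed distance. The Python sketch below uses the standard formulas. For Jukes-Cantor, d = -(3/4) ln(1 - (4/3) p). For Kimura, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). Here p is the proportion of differing sites, and P and Q are the proportions of transitions and transversions. The example counts come from comparing Organism A with Organism C in the five-organism alignments above. They differ in four of 60 columns: three a/g transitions and one a/t transversion.

```python
import math


def jukes_cantor(p: float) -> float:
    """JC69 distance from the proportion p of differing sites (needs p < 0.75)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: too saturated for a JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(P: float, Q: float) -> float:
    """K80 distance from the proportions of transitions (P) and transversions (Q)."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0 or b <= 0:
        raise ValueError("too saturated for a K80 correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def is_transition(x: str, y: str) -> bool:
    """Transitions stay within purines (a, g) or within pyrimidines (c, t)."""
    purines = {"a", "g"}
    return x != y and ((x in purines) == (y in purines))


# Organism A vs Organism C: (A base, C base) at each differing column.
differences = [("a", "t"), ("a", "g"), ("a", "g"), ("a", "g")]
sites = 60
P = sum(is_transition(x, y) for x, y in differences) / sites
Q = sum(not is_transition(x, y) for x, y in differences) / sites
p = len(differences) / sites

print(f"p-distance {p:.4f}, JC69 {jukes_cantor(p):.4f}, K80 {kimura_2p(P, Q):.4f}")
```

The script prints p-distance 0.0667, JC69 0.0698 and K80 0.0705. Both corrected distances exceed the raw proportion because they account for changes hidden by multiple substitutions at the same site.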
Rate Heterogeneity
• Instead of assuming a uniform rate across the alignment, allow the rate to vary among sites according to the gamma family of distributions
Alpha (the shape parameter) < 1: strong among-site variation
Higher alpha, lower heterogeneity
Alpha can be estimated for individual data sets
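With gamma-distributed rates, the Jukes-Cantor correction still has a closed form: d = (3/4) α [(1 - (4/3) p)^(-1/α) - 1]. It approaches the ordinary JC69 distance as α grows. The sketch below shows how the estimated distance changes with α. The proportion p = 0.30 and the α values are illustrative only and were not estimated from any data set here.

```python
import math


def jc69(p: float) -> float:
    """Jukes-Cantor distance with a single rate for all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_gamma(p: float, alpha: float) -> float:
    """Jukes-Cantor distance when site rates follow a gamma distribution with shape alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p = 0.30  # hypothetical proportion of differing sites
for alpha in (0.25, 0.5, 1.0, 2.0, 100.0):
    print(f"alpha = {alpha:>6}: d = {jc69_gamma(p, alpha):.3f}")
print(f"no rate variation: d = {jc69(p):.3f}")
```

Small α, meaning strong among-site variation, gives a much larger corrected distance. The model then assumes many changes are piled onto a few fast sites, while the slow sites remain unchanged.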
UPGMA (Unweighted Pair Group Method with Arithmetic mean) is a simple method that is also used for microarray clustering.
It assumes a constant rate of evolution in all lineages (a molecular clock), so distance is linearly related to time.
                 A     B     C     D     E
Organism A       -  0.00  0.04  0.03  0.03
Organism B             -  0.04  0.03  0.03
Organism C                   -  0.04  0.04
Organism D                         -  0.02
Organism E                               -
Infer topology and branch lengths from the matrix using an algorithm like UPGMA
UPGMA Step 1: Cluster the Operational Taxonomic Units (OTUs) with the smallest distance, with branch length = d/2
[Figure: Organisms A and B joined first; time axis]
UPGMA Step 2: Collapse the distance matrix to give the distance from the AB group to each other OTU, taking the average of the distances from A and from B
                AB     C     D     E
Group AB         -  0.04  0.03  0.03
Organism C             -  0.04  0.04
Organism D                   -  0.02
Organism E                         -
UPGMA Step 3: Repeat Step 1 with the collapsed distance matrix
Step 1: Cluster the OTUs with the smallest distance, with branch length = d/2
[Figure: Organisms D and E (d = 0.02) joined, each on a branch of length 0.01, beside the A and B pair; time axis]
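UPGMA Step 4: Continue to collapse and join until all taxa are added
                AB     C    DE
Group AB         -  0.04  0.03
Organism C             -  0.04
Group DE                     -
[Figure: groups AB and DE (d = 0.03) joined at height 0.015, giving a branch of 0.015 above AB and 0.015 - 0.01 = 0.005 above DE; time axis]
              ABDE     C
Group ABDE       -  0.04
Organism C             -
[Figure: Organism C joined to group ABDE (d = 0.04) at height 0.02, giving a branch of 0.02 to C and 0.005 above ABDE]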
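The UPGMA steps above can be sketched in Python. The sketch makes four assumptions: it uses the distance matrix from the slides; group-to-group distances are size-weighted averages, i.e. the arithmetic mean over all member pairs; ties go to the first pair found; and each branch length is node height minus child height. The function name and the Newick output format are choices made for this sketch.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA clustering. dist maps frozenset({x, y}) -> distance for every pair.
    Returns the tree as a Newick string with branch lengths."""
    d = dict(dist)
    # Each active cluster: id -> (Newick text, node height, number of OTUs)
    clusters = {name: (name, 0.0, 1) for name in names}
    while len(clusters) > 1:
        # Step 1: closest pair; their common ancestor sits at height d/2.
        x, y = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((x, y))] / 2
        tx, hx, nx = clusters.pop(x)
        ty, hy, ny = clusters.pop(y)
        merged = (x, y)  # tuple ids cannot collide with the original string names
        # Step 2: collapse the matrix; the new group's distance to each other
        # cluster is the mean over all member pairs (weighted by group size).
        for other in clusters:
            d[frozenset((merged, other))] = (
                nx * d[frozenset((x, other))] + ny * d[frozenset((y, other))]
            ) / (nx + ny)
        clusters[merged] = (f"({tx}:{height - hx:.4g},{ty}:{height - hy:.4g})", height, nx + ny)
    newick = next(iter(clusters.values()))[0]
    return newick + ";"


matrix = {
    ("A", "B"): 0.00, ("A", "C"): 0.04, ("A", "D"): 0.03, ("A", "E"): 0.03,
    ("B", "C"): 0.04, ("B", "D"): 0.03, ("B", "E"): 0.03,
    ("C", "D"): 0.04, ("C", "E"): 0.04,
    ("D", "E"): 0.02,
}
tree = upgma("ABCDE", {frozenset(k): v for k, v in matrix.items()})
print(tree)
```

This prints `(C:0.02,((A:0,B:0):0.015,(D:0.01,E:0.01):0.005):0.005);`, which matches the branch lengths worked out by hand (0.01, 0.015, 0.005 and 0.02).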
Neighbor-joining: an alternative to UPGMA that does not assume a constant evolutionary rate
Neighbor-joining takes a step-wise approach similar to UPGMA, but at every step it joins the pair of OTUs that minimizes the total branch length of the tree (minimum evolution).
It is not guaranteed to find the overall optimal (minimal total branch length) tree because it is a greedy algorithm.
Distance methods are fast and scale well to large numbers of taxa.
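Neighbor-joining can be sketched in a similar way. This follows the usual Saitou-Nei formulation. At each step it joins the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the row sum of the current matrix. The five-taxon matrix (taxa a to e) is a hypothetical, tree-additive example chosen so the answer can be checked by hand, not data from the slides. The internal-node names `node0`, `node1`, ... are also made up for the sketch.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor-joining. dist maps frozenset({x, y}) -> distance.
    Returns the unrooted tree as a list of (node, node, branch length) edges."""
    d = dict(dist)

    def D(x, y):
        return d[frozenset((x, y))]

    nodes = list(names)
    edges = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Greedy step: join the pair with the smallest Q value.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        u = f"node{new_id}"
        new_id += 1
        bi = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, bi), (u, j, D(i, j) - bi)]
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    x, y = nodes
    edges.append((x, y, D(x, y)))
    return edges


pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}
edges = neighbor_joining("abcde", {frozenset(k): v for k, v in pairs.items()})
for a, b, length in edges:
    print(f"{a} -- {b}: {length:g}")
print("total length:", sum(length for _, _, length in edges))
```

For this matrix the pendant branches come out as a 2, b 3, c 4, d 2 and e 1, and the total tree length is 17. Unlike the UPGMA tree, the result is unrooted and its tips need not be equidistant from any root.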
Maximum Parsimony: sift through all possible trees to find “the one” that requires the smallest number of evolutionary events
With so many possible trees, exhaustive search is often infeasible. Branch and bound finds the most parsimonious tree without scoring every tree, and heuristic searches such as tree bisection and reconnection (TBR) examine only a subset of all possible trees.
[Figure: tree of Organisms A to E; time axis]
Number of possible trees for n taxa: rooted $\dfrac{(2n-3)!}{2^{n-2}\,(n-2)!}$, unrooted $\dfrac{(2n-5)!}{2^{n-3}\,(n-3)!}$
[Figure: tree with A and B grouped, D and E grouped, and C outside, showing alignment position 56 (Organism A a, Organism B a, Organism C g, Organism D g, Organism E g); all ancestral nodes are g, and a single change g -> a on the branch leading to A and B explains the data: 1 event]
Maximum Parsimony
[Figure: alternative tree with A grouped with E and D grouped with B, same alignment position; with a as the ancestral state, three changes a -> g are drawn: 3 events. Wrong tree?]
[Figure: the two trees side by side: 1 event (Right tree?) versus 3 events (Wrong tree?)]
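For a fixed tree, the minimum number of changes at one alignment column can be counted with Fitch's small-parsimony algorithm. This is a minimal sketch for bifurcating trees written as nested tuples. The two topologies, ((A,B),(D,E)),C and ((A,E),(D,B)),C, are read from the figures and should be treated as assumptions. They are scored on the column where A and B have a while C, D and E have g.

```python
def fitch(tree, states):
    """Fitch small-parsimony count for one alignment column.
    tree: a leaf name (str) or a 2-tuple of subtrees; states: leaf name -> base.
    Returns (candidate states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


column = {"A": "a", "B": "a", "C": "g", "D": "g", "E": "g"}
tree_1 = ((("A", "B"), ("D", "E")), "C")
tree_2 = ((("A", "E"), ("D", "B")), "C")

for label, tree in (("((A,B),(D,E)),C", tree_1), ("((A,E),(D,B)),C", tree_2)):
    _, changes = fitch(tree, column)
    print(f"{label}: {changes} change(s)")
```

The first tree needs a single change, as on the slide. For the second topology, Fitch's algorithm finds a reconstruction with 2 changes (g at the root and two independent g -> a changes). That is still more than the first tree, so parsimony prefers the first tree. Summing the count over all columns gives the parsimony score of a whole tree.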
Maximum Likelihood Methods
• Given an evolutionary model, evaluate candidate tree topologies (all of them, or a heuristic subset) and calculate for each the probability of generating the observed data (the likelihood).
• Choose the tree with the highest likelihood (generally expressed as the log-likelihood).
• Computationally intensive and sensitive to model selection
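The likelihood of one alignment column on a given tree can be computed with Felsenstein's pruning algorithm. This is a minimal sketch under the Jukes-Cantor model with equal base frequencies. The nested-tuple tree format, the `join` helper, and the branch length of 0.1 on every branch are assumptions made for illustration. Real programs also optimize branch lengths for each topology and sum log-likelihoods over all columns.

```python
import math
from itertools import product

BASES = "acgt"


def jc_prob(i: str, j: str, t: float) -> float:
    """JC69 probability that base i becomes base j along a branch of length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def conditional(node, column):
    """Pruning step: for each base b, the probability of the leaf states below
    node given base b at node. A node is a leaf name (str) or a tuple of
    (child, branch_length) pairs."""
    if isinstance(node, str):
        return {b: 1.0 if column[node] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        below = conditional(child, column)
        for b in BASES:
            result[b] *= sum(jc_prob(b, c, t) * below[c] for c in BASES)
    return result


def site_log_likelihood(tree, column):
    """Log-likelihood of one column, with base frequency 1/4 at the root."""
    root = conditional(tree, column)
    return math.log(sum(0.25 * root[b] for b in BASES))


def join(*children, t=0.1):
    """Hypothetical helper: internal node whose children all hang on branches of length t."""
    return tuple((child, t) for child in children)


# Sanity check: over all 4**3 possible columns on a three-taxon tree the
# probabilities sum to 1.
star = join("A", "B", "C")
total = sum(math.exp(site_log_likelihood(star, dict(zip("ABC", bases))))
            for bases in product(BASES, repeat=3))
assert abs(total - 1.0) < 1e-9

column = {"A": "a", "B": "a", "C": "g", "D": "g", "E": "g"}
tree_1 = join(join(join("A", "B"), join("D", "E")), "C")
tree_2 = join(join(join("A", "E"), join("D", "B")), "C")
for label, tree in (("((A,B),(D,E)),C", tree_1), ("((A,E),(D,B)),C", tree_2)):
    print(f"{label}: log L = {site_log_likelihood(tree, column):.3f}")
```

With every branch set to 0.1, the first tree gives the higher log-likelihood for this column. That agrees with the parsimony comparison. Because the model is reversible, where the root is placed does not change the result.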
Bootstrapping: a method of testing the reliability of the tree
[Figure: many copies of the alignment of Organisms A to E, and the resulting tree of Organisms A to E with branch support values of 100%, 100% and 50%]
Bootstrap to Assess Confidence in Branches
Resample alignment columns with replacement to produce 1000 alignments of the same size.
ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta  Organism A
............................................................  Organism B
............................tg.................g.......g....  Organism C
.................a.....................................g....  Organism D
.......a...............................................g....  Organism E
Columns drawn at random, with replacement (first seven draws shown):
c c a t g g a  Organism A
. . . . . . .  Organism B
. . g . . . g  Organism C
. . . . . a .  Organism D
. . . . . . .  Organism E
Many different Alignments
[Figure: many resampled alignments of Organisms A to E, a tree built from each, and the summary tree of Organisms A to E with branch support values of 100%, 100% and 50%]
What percentage of the datasets support each branch?
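The resample-and-rebuild loop can be sketched in Python. To keep the sketch short, it makes three assumptions. The tree builder is UPGMA on uncorrected p-distances; any method could be substituted, and normally it would be the same method used for the original tree. The five-organism alignment is encoded in the same dot notation. For each clade of the original tree, the sketch reports the fraction of replicates that recover it. Ties go to the first pair found, which can affect the percentages when many distances are equal.

```python
import random
from itertools import combinations

REFERENCE = "ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta"
DOTTED = {  # a dot means "same base as Organism A"
    "A": REFERENCE,
    "B": "." * 60,
    "C": "." * 28 + "tg" + "." * 17 + "g" + "." * 7 + "g" + "." * 4,
    "D": "." * 17 + "a" + "." * 37 + "g" + "." * 4,
    "E": "." * 7 + "a" + "." * 47 + "g" + "." * 4,
}
ALIGNMENT = {
    name: "".join(r if c == "." else c for c, r in zip(s, REFERENCE))
    for name, s in DOTTED.items()
}


def p_distance(s, t):
    return sum(a != b for a, b in zip(s, t)) / len(s)


def upgma_clades(alignment):
    """Clusters (frozensets of names) created by UPGMA on p-distances."""
    leaf_d = {(x, y): p_distance(alignment[x], alignment[y])
              for x in alignment for y in alignment}

    def average(X, Y):  # UPGMA group distance = mean over all member pairs
        return sum(leaf_d[x, y] for x in X for y in Y) / (len(X) * len(Y))

    clusters = [frozenset([name]) for name in alignment]
    clades = set()
    while len(clusters) > 1:
        X, Y = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters = [c for c in clusters if c not in (X, Y)] + [X | Y]
        clades.add(X | Y)
    return clades


def resample(alignment, rng):
    """Draw columns with replacement to build an alignment of the same length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, replicates=1000, seed=1):
    rng = random.Random(seed)
    original = upgma_clades(alignment)
    hits = dict.fromkeys(original, 0)
    for _ in range(replicates):
        found = upgma_clades(resample(alignment, rng))
        for clade in original:
            hits[clade] += clade in found
    return {clade: count / replicates for clade, count in hits.items()}


support = bootstrap_support(ALIGNMENT)
for clade, value in sorted(support.items(), key=lambda kv: len(kv[0])):
    print("".join(sorted(clade)), f"{value:.0%}")
```

With the seed fixed the percentages are reproducible. The A+B clade is recovered in every replicate because A and B are identical, so no resampling can separate them.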
Bootstrapping and what it really tells us.
The underlying rationale behind bootstrapping is to predict what would happen if more data were collected or if small perturbations were made to the existing data. A bootstrap value does not indicate the probability that the branch is correct. (Holder, M., Lewis, P. 2003)
[Figure: many copies of the alignment of Organisms A to E]
More simulated data
Genome-scale phylogeny
• Total Evidence approach - generate one tree from all available data
• Consensus approach – generate a tree for each gene and summarize them in a consensus tree
• Network approach – show different relationships for different genes rather than a single bifurcating tree
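The consensus approach can be illustrated with majority-rule consensus, which keeps every clade that appears in more than half of the gene trees. This sketch assumes each gene tree has already been reduced to its set of clades (frozensets of taxon names). The three gene trees are hypothetical.

```python
from collections import Counter


def majority_rule(gene_trees):
    """Clades found in more than half of the gene trees, with their frequencies.
    Any two clades that each occur in a majority of trees must share a tree,
    so they are compatible and together describe a single (possibly unresolved) tree."""
    counts = Counter(clade for tree in gene_trees for clade in tree)
    n = len(gene_trees)
    return {clade: k / n for clade, k in counts.items() if k > n / 2}


# Hypothetical rooted gene trees on taxa A-D, each given as its set of clades.
gene_trees = [
    {frozenset("AB"), frozenset("ABC")},  # ((A,B),C),D
    {frozenset("AB"), frozenset("ABD")},  # ((A,B),D),C
    {frozenset("AC"), frozenset("ACD")},  # ((A,C),D),B
]

for clade, freq in majority_rule(gene_trees).items():
    print("".join(sorted(clade)), f"{freq:.0%}")
```

Only the A+B clade (in 2 of the 3 trees) passes the 50% threshold, so the consensus is ((A,B),C,D) with C and D unresolved. Conflicting gene trees thus show up as lost resolution rather than as an average.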
Ma et al. unpublished analysis of 976 sets of orthologs from 8 enterobacteria and an outgroup.
Total Evidence – concatenated 976 protein multiple alignments
Majority Rule Consensus – 976 separate Bayesian phylogenies
Network representation of all topologies
An example of incongruence between different genes in Lactobacillus genomes
Nicolas et al. BMC Evolutionary Biology 2007; 7:141
Analyzed 480 proteins
3:2 ratio of genes supporting Ta vs. Tb, but Tc is almost never seen.
Touchon et al. 2009 PLoS Genetics