phylogeny


Upload: afric

Post on 14-Jan-2016


DESCRIPTION

Phylogeny. Vocabulary of Phylogenetic Trees. Graph of edges and nodes that illustrates the evolutionary relationships among “Operational Taxonomic Units or OTUs” Topology refers to the branching pattern. http://www.ncbi.nlm.nih.gov/About/primer/phylo.html. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Phylogeny

[Figure: large phylogenetic tree. Each tip label pairs a genome identifier (ECH3937, SCRI1043, PlTTO1, YP91001, YPCO92, YPKIM, SenLT2, SenCT18, SenTy2, MG1655, EcoRIM, EDL933, CFT073, Sfl301, Sfl2457T) with a numeric identifier, for example "ECH3937 20431" or "SCRI1043 57558".]

Phylogeny

Page 2: Phylogeny

Vocabulary of Phylogenetic Trees

• Graph of edges and nodes that illustrates the evolutionary relationships among “Operational Taxonomic Units or OTUs”

• Topology refers to the branching pattern

http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

Page 3: Phylogeny

Rooting and Scaling – Same tree, different look?

http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

Page 4: Phylogeny

Three different rooted trees consistent with a four taxon unrooted tree

What is the total number of possible rooted trees consistent with this unrooted tree?

http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

Page 5: Phylogeny

How many possible trees for n taxa?

Number of Rooted Trees = $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$

Number of Unrooted Trees = $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
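To get a feel for how quickly these counts grow, the two formulas can be evaluated directly. The following is a minimal sketch; the function names are illustrative and not part of the original slides.

```python
from math import factorial

def count_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n labeled taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

def count_rooted_trees(n):
    """Number of rooted bifurcating trees for n labeled taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

An unrooted tree with n tips has 2n - 3 branches, and a root can be placed on any of them, so the rooted count is the unrooted count multiplied by 2n - 3.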

Page 6: Phylogeny

Phylogeny and Genomics

• A species tree provides a framework for analyzing presence and absence of genes in genomes (or traits in organisms)
  – The species tree may be unknown

• A genome is a (comprehensive) source of DNA and (predicted) protein sequences to use for phylogenetic reconstruction
  – Different regions of the genome may support different trees

• Trees are useful for examining evolutionary history of gene families
  – Knowledge of the species tree affects interpretation of gene family trees.

Page 7: Phylogeny

Erwinia carotovora

Knowing the relationship between strains and species provides a framework for interpretation

Pantoea stewartii

Salmonella enterica

Yersinia pestis

Page 8: Phylogeny

Erwinia carotovora

A reasonable guess based on the character “host type”

Pantoea stewartii

Salmonella enterica

Yersinia pestis

But is this a good choice if the goal is to reconstruct the “species tree”?

Why might you choose to build your tree from molecular sequence data rather than phenotype, even if what you are really interested in is the evolution of host range?

Page 9: Phylogeny

Best tree from molecular phylogenetic analysis using multiple core metabolism proteins

“True” species tree?

Pantoea stewartii

Salmonella enterica

Erwinia carotovora

Yersinia pestis

Why choose to use multiple genes or proteins instead of one?

Why choose core metabolism proteins?

Why might it be a bad idea?

Page 10: Phylogeny

Mapping the trait of interest (phenotypes, presence/absence of genes) onto the species tree

“True” species tree

Pantoea stewartii

Salmonella enterica

Erwinia carotovora

Yersinia pestis

Signaling system

+

-

+

-

Trait/Gene of Interest

Page 11: Phylogeny

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.................a.....................................g.... Organism D
.......a...............................................g.... Organism E

Organism A

Organism B

Organism C

Organism D

Organism E

From Multiple Alignment to Phylogeny

Page 12: Phylogeny

Four Approaches to Tree Reconstruction

• Distance Methods (MEGA, PAUP, Phylip)
  – Estimate a distance matrix
  – Infer topology and branch lengths

• Maximum Parsimony (PAUP)
  – Sift through all possible trees to find “the one” that requires the smallest number of evolutionary events

• Maximum Likelihood (PAUP)
  – Find the tree most likely to have generated the sequence data

• Bayesian (MrBayes)
  – Produce a probability distribution for all (or a well sampled subset) possible trees using MCMC to explore tree space

Page 13: Phylogeny

Distance matrices and data types

• DNA sequence

• Protein sequence

• Shared gene content

• Similarity of gene expression profile

• Anything you can represent as a pair-wise distance between OTUs

Page 14: Phylogeny

DNA or Protein?

DNA
• Well developed evolutionary models
• Vary among closely related OTUs
• Can be used for regions other than protein coding genes
• Can be partitioned into synonymous/nonsynonymous sites
• “Saturate” faster than proteins because there are only 4 characters (GATC)

Protein
• Evolutionary models are available (empirical)
• Conserved enough to use for distantly related OTUs
• Can only be used for proteins
• 20 characters

Page 15: Phylogeny

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.........a.......a.....................................g.... Organism D
.......a.a.............................................g.... Organism E

            A  B  C  D  E
Organism A  -  0  4  3  3
Organism B     -  4  3  3
Organism C        -  4  4
Organism D           -  2
Organism E              -

USE AN EVOLUTIONARY MODEL TO CORRECT THE DISTANCE MATRIX FOR UNOBSERVED CHANGES.

Distance – in its simplest form is a count of the differences between two sequences
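The simple count of differences can be sketched in a few lines of Python. This is an illustration only: the 10-column alignment below is hypothetical (it is not the slide's alignment), and the helper names `expand`, `count_differences`, and `p_distance` are made up for the example. As on the slides, a "." means "same base as the first sequence".

```python
def expand(reference, row):
    """Replace each '.' in an alignment row with the reference base."""
    return "".join(r if c == "." else c for r, c in zip(reference, row))

def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a.lower(), seq_b.lower()))

def p_distance(seq_a, seq_b):
    """Proportion of differing sites (uncorrected distance)."""
    return count_differences(seq_a, seq_b) / len(seq_a)

# Hypothetical toy alignment in dot notation (assumption, not from the slides).
reference = "ggaccttcgg"
rows = {"A": reference, "B": "..........", "C": "...t.....a", "D": ".......g.a"}
seqs = {name: expand(reference, row) for name, row in rows.items()}

if __name__ == "__main__":
    names = sorted(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b, count_differences(seqs[a], seqs[b]))
```

Note that C and D both carry the same change in the last column, so that column does not contribute to the C-D count.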

Page 16: Phylogeny

Five Models for Nucleotide Substitution (There are others)

Jukes and Cantor, 1969: All substitutions are equally likely

Kimura, 1980: Transitions are more likely than transversions

Tamura: Transitions are more likely than transversions and GC content does not equal AT content

Tamura and Nei: Transitions are more likely than transversions AND GC content does not equal AT content AND there is a rate difference between G-A and T-C transitions

Unrestricted: There is no discernible relationship between rates

Page 17: Phylogeny

Models of Nucleotide Substitution

An element e_ij of the matrix is the substitution rate from the nucleotide in the i-th row to the nucleotide in the j-th column.

Jukes-Cantor

[Figure: 4 x 4 rate matrix with rows and columns A, T, C, G and "-" on the diagonal]

Kimura

[Figure: 4 x 4 rate matrix with rows and columns A, T, C, G and "-" on the diagonal]
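As a hedged reconstruction of what such matrices usually look like, the two models can be written out with rows and columns in the order A, T, C, G. The symbols $\alpha$ and $\beta$ are assumptions chosen for illustration (the slide's own entries are not available): in Jukes-Cantor every substitution has the same rate $\alpha$, and in Kimura's two-parameter model transitions (A<->G, C<->T) have rate $\alpha$ while transversions have rate $\beta$.

```latex
% Rows and columns ordered A, T, C, G; "-" marks the diagonal.
\[
E_{\mathrm{JC}} =
\begin{pmatrix}
-      & \alpha & \alpha & \alpha \\
\alpha & -      & \alpha & \alpha \\
\alpha & \alpha & -      & \alpha \\
\alpha & \alpha & \alpha & -
\end{pmatrix}
\qquad
E_{\mathrm{K2P}} =
\begin{pmatrix}
-      & \beta  & \beta  & \alpha \\
\beta  & -      & \alpha & \beta  \\
\beta  & \alpha & -      & \beta  \\
\alpha & \beta  & \beta  & -
\end{pmatrix}
\]

% Corrected distances, with p the observed proportion of differing sites,
% P the proportion of transition differences and Q the proportion of
% transversion differences (p = P + Q):
\[
d_{\mathrm{JC}} = -\tfrac{3}{4}\,\ln\!\left(1 - \tfrac{4}{3}\,p\right)
\qquad
d_{\mathrm{K2P}} = -\tfrac{1}{2}\,\ln\!\left(1 - 2P - Q\right) - \tfrac{1}{4}\,\ln\!\left(1 - 2Q\right)
\]
```

Both corrections give a distance larger than the raw proportion of differences, because some sites will have changed more than once.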

Page 18: Phylogeny

Rate Heterogeneity

• Instead of assuming a uniform substitution rate across the alignment, allow the rate to vary among sites according to the gamma family of distributions

When alpha < 1, there is strong among-site rate variation

Higher alpha, lower heterogeneity

Can be estimated for individual data sets
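The effect of alpha can be illustrated by drawing per-site rates from a gamma distribution. This sketch assumes the common convention that the distribution is scaled to have mean 1 (shape alpha, scale 1/alpha), so its variance is 1/alpha; the function name and the sample sizes are illustrative.

```python
import random
from statistics import mean, pvariance

def sample_site_rates(alpha, n_sites, seed=0):
    """Draw per-site relative rates from a gamma distribution with mean 1.

    Shape = alpha and scale = 1/alpha, so the variance is 1/alpha:
    small alpha means strong rate heterogeneity among sites.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 5.0):
        rates = sample_site_rates(alpha, 20000)
        print(f"alpha={alpha}: mean={mean(rates):.2f} variance={pvariance(rates):.2f}")
```

With a small alpha most sites evolve very slowly and a few evolve very fast; with a large alpha the rates cluster around 1.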

Page 19: Phylogeny

UPGMA (Unweighted Pair Group Method with Arithmetic mean) is a simple method that is also used for microarray clustering.

Assumes constant rates of evolution among different lineages -> linear relationship between distance and time

            A     B     C     D     E
Organism A  -     0.00  0.04  0.03  0.03
Organism B        -     0.04  0.03  0.03
Organism C              -     0.04  0.04
Organism D                    -     0.02
Organism E                          -

Infer topology and branch lengths from the matrix using an algorithm like UPGMA

Page 20: Phylogeny

UPGMA Step 1-

Cluster the Operational Taxonomic Units OTUs with the smallest distance with branch length = d/2

[Figure: Organism A and Organism B joined into a cluster, drawn against a time axis]

            A     B     C     D     E
Organism A  -     0.00  0.04  0.03  0.03
Organism B        -     0.04  0.03  0.03
Organism C              -     0.04  0.04
Organism D                    -     0.02
Organism E                          -

Page 21: Phylogeny

            A     B     C     D     E
Organism A  -     0.00  0.04  0.03  0.03
Organism B        -     0.04  0.03  0.03
Organism C              -     0.04  0.04
Organism D                    -     0.02
Organism E                          -

UPGMA Step 2- Collapse the distance matrix to reflect distance from the AB group by taking the average of the distance from A-all others and B-all others

            AB    C     D     E
Group AB    -     0.04  0.03  0.03
Organism C        -     0.04  0.04
Organism D              -     0.02
Organism E                    -

Page 22: Phylogeny

UPGMA Step 3- Repeat Step 1 with the collapsed distance matrix

Step 1- Cluster OTUs with the smallest distance with branch length = d/2

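            AB    C     D     E
Group AB    -     0.04  0.03  0.03
Organism C        -     0.04  0.04
Organism D              -     0.02
Organism E                    -

[Figure: tree under construction against a time axis. Organism A and Organism B are joined; Organism D and Organism E are joined, each with branch length 0.01.]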

Page 23: Phylogeny

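UPGMA Step 4- Continue to collapse and join until all taxa are added

            AB    C     DE
Group AB    -     0.04  0.03
Organism C        -     0.04
Group DE                -

            ABDE  C
Group ABDE  -     0.04
Organism C        -

[Figure: finished UPGMA tree against a time axis with tips Organism A, Organism B, Organism D, Organism E, and Organism C. Branch lengths shown: 0.01 and 0.01 (Organism D and Organism E), 0.015 and 0.005 (joining group AB with group DE), 0.005 and 0.02 (joining group ABDE with Organism C).]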
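The cluster-and-collapse steps above can be sketched in Python using the slide's distance matrix. This is a minimal illustration, not a production implementation: the function name `upgma`, the dictionary-of-pairs input format, and the single-letter names A to E (standing for Organism A to Organism E) are assumptions made for the example.

```python
from itertools import combinations

def upgma(names, dist):
    """Minimal UPGMA sketch.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> distance.
    Returns a list of (cluster_label, node_height) in joining order,
    where node_height = (distance between the joined clusters) / 2.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of OTUs inside
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Step 1: find the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        new = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Step 2: collapse the matrix, averaging over all members of the new cluster.
        for c in sizes:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[new] = na + nb
        merges.append((new, height))
    return merges

# Distance matrix from the slides (A..E stand for Organism A..E).
raw = {
    ("A", "B"): 0.00, ("A", "C"): 0.04, ("A", "D"): 0.03, ("A", "E"): 0.03,
    ("B", "C"): 0.04, ("B", "D"): 0.03, ("B", "E"): 0.03,
    ("C", "D"): 0.04, ("C", "E"): 0.04, ("D", "E"): 0.02,
}
slide_matrix = {frozenset(pair): value for pair, value in raw.items()}

if __name__ == "__main__":
    for label, height in upgma(["A", "B", "C", "D", "E"], slide_matrix):
        print(f"{label}: node height {height:.3f}")
```

The node heights translate into branch lengths by subtraction: the branch from the DE node (height 0.01) up to the ABDE node (height 0.015) has length 0.005, and the branch from the AB node (height 0) to the ABDE node has length 0.015.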

Page 24: Phylogeny

Alternative to UPGMA that does not assume a constant evolutionary rate

Neighbor-joining takes a step-wise approach similar to UPGMA, but chooses branch lengths that minimize the total branch length (minimum evolution) at every step.

Not guaranteed to get the overall optimal (minimal branch length) tree because it is a greedy algorithm.

Distance methods are fast and scale well to large numbers of taxa.
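One way to see how neighbor-joining differs from UPGMA is its pair-selection rule. The sketch below uses the standard Q-criterion, Q(i,j) = (n - 2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), and joins the pair with the smallest Q. The five-taxon matrix is a hypothetical example (not from the slides), and the function name is illustrative; a full implementation would also compute branch lengths and update the matrix after each join.

```python
def nj_pick_pair(names, d):
    """Return (Q, i, j) for the pair that neighbor-joining would join first.

    names: list of OTU labels.
    d:     dict mapping frozenset({a, b}) -> distance.
    """
    n = len(names)
    # Sum of distances from each OTU to all others.
    total = {i: sum(d[frozenset((i, k))] for k in names if k != i) for i in names}
    best = None
    for idx, i in enumerate(names):
        for j in names[idx + 1:]:
            q = (n - 2) * d[frozenset((i, j))] - total[i] - total[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best

# Hypothetical distance matrix for five taxa (assumption, for illustration only).
raw_nj = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
nj_matrix = {frozenset(pair): value for pair, value in raw_nj.items()}

if __name__ == "__main__":
    print(nj_pick_pair(["a", "b", "c", "d", "e"], nj_matrix))
```

In this example the smallest raw distance is between d and e, which is the pair UPGMA would cluster first, but the Q-criterion selects a and b because it also accounts for how far each taxon is from everything else.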

Page 25: Phylogeny

Maximum Parsimony - Sift through all possible trees to find “the one” that requires the smallest number of evolutionary events

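With so many trees, an exhaustive search is rarely feasible. Branch and bound still guarantees the optimal tree without scoring every tree, but only for modest numbers of taxa; for larger data sets it is often necessary to use a heuristic approach that looks at a subset of all possible trees (for example, TBR branch swapping).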

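[Figure: tree with tips Organism A, Organism B, Organism D, Organism E, and Organism C, drawn against a time axis]

Number of unrooted trees = $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

Number of rooted trees = $\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$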

Page 26: Phylogeny

Maximum Parsimony

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.................a.....................................g.... Organism D
.......a...............................................g.... Organism E

[Figure: tree against a time axis with tips Organism A (a), Organism B (a), Organism D (g), Organism E (g), and Organism C (g). Internal nodes are labeled g, and a single g -> a change is marked.]

1 event

Page 27: Phylogeny

Maximum Parsimony

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.................a.....................................g.... Organism D
.......a...............................................g.... Organism E

[Figure: alternative tree against a time axis with tips in the order Organism A (a), Organism E (g), Organism D (g), Organism B (a), Organism C (g). Internal nodes are labeled a, and three a -> g changes are marked.]

3 events

Page 28: Phylogeny

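Maximum Parsimony

[Figure: first tree against a time axis with tips Organism A, Organism B, Organism D, Organism E, and Organism C. Internal nodes are labeled g, and a single g -> a change is marked.]

1 event. Right tree?

[Figure: second tree against a time axis with tips in the order Organism A (a), Organism E (g), Organism D (g), Organism B (a), Organism C (g). Internal nodes are labeled a, and three a -> g changes are marked.]

3 events. Wrong tree?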
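Counting events for one alignment column on a given tree can be sketched with Fitch's algorithm, a standard way to obtain the minimum number of changes. The tree shapes below are assumptions read off the tip order on the slides (the transcript does not preserve the branching pattern), written as nested pairs with A to E standing for Organism A to Organism E; the function name is illustrative.

```python
def fitch(tree, states):
    """Fitch parsimony for a single site.

    tree:   a tip name (str) or a 2-tuple of subtrees.
    states: dict mapping tip name -> observed base.
    Returns (possible_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1

# Site states from the slides: A and B carry "a"; C, D, and E carry "g".
site = {"A": "a", "B": "a", "C": "g", "D": "g", "E": "g"}

# Assumed topologies (hypothetical reading of the slide figures).
tree_1 = ((("A", "B"), ("D", "E")), "C")
tree_2 = ((("A", "E"), ("D", "B")), "C")

if __name__ == "__main__":
    print("tree 1:", fitch(tree_1, site)[1])
    print("tree 2:", fitch(tree_2, site)[1])
```

Under these assumed topologies the first tree needs 1 change. For the second tree Fitch's minimum is 2 (a g ancestor with independent g -> a changes leading to A and to B), which is one fewer than the particular reconstruction drawn on the slide, where the ancestor is a and three a -> g changes are marked. Either way the second tree costs more than the first, so parsimony prefers the first.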

Page 30: Phylogeny

Maximum Likelihood Methods

• Given an evolutionary model, evaluate all possible tree topologies and calculate the probability of generating the observed data.

• Choose the tree with the highest probability (generally expressed as the log likelihood)

• Computationally intensive and sensitive to model selection
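A full likelihood search over tree topologies is beyond a short example, but the core idea, scoring how probable the observed data are under a model, can be shown for the simplest case: two aligned sequences under the Jukes-Cantor model, where the only parameter is the distance d between them. This is a minimal sketch under those assumptions; the function names, the grid search, and the example counts (4 differences in 60 sites) are illustrative choices.

```python
import math

def jc_log_likelihood(d, n_same, n_diff):
    """Log likelihood of a pairwise alignment under Jukes-Cantor at distance d > 0."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability a site shows the same base at both ends
    p_diff = 0.25 - 0.25 * e  # probability a site shows one specific different base
    # 0.25 is the equilibrium frequency of the base at one end.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)

def ml_distance(n_same, n_diff, step=1e-4, max_d=2.0):
    """Grid search for the distance with the highest log likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_same, n_diff))

def jc_distance(n_same, n_diff):
    """Closed-form Jukes-Cantor distance, for comparison."""
    p = n_diff / (n_same + n_diff)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    print(ml_distance(56, 4), jc_distance(56, 4))
```

The grid search recovers the same value as the closed-form Jukes-Cantor correction, which is the maximum likelihood estimate for this two-sequence case. Real maximum likelihood programs apply the same principle to whole trees, optimizing every branch length for each candidate topology, which is why the method is computationally intensive.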

Page 31: Phylogeny

Bootstrapping

A method of testing the reliability of the tree

[Figure: many resampled copies of the five-organism alignment (Organism A to Organism E), and a tree of Organism A to Organism E with branch support values of 100%, 100%, and 50%]

Page 32: Phylogeny

Bootstrap to Assess Confidence in Branches

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.................a.....................................g.... Organism D
.......a...............................................g.... Organism E

Resample with replacement to produce 1000 alignments of the same size

Columns sampled so far (one row per organism, A to E):

c
.
.
.
.

Page 33: Phylogeny

Bootstrap to Assess Confidence in Branches

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.................a.....................................g.... Organism D
.......a...............................................g.... Organism E

Resample with replacement to produce 1000 alignments of the same size

Columns sampled so far (one row per organism, A to E):

c c
. .
. .
. .
. .

Page 34: Phylogeny

Bootstrap to Assess Confidence in Branches

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.................a.....................................g.... Organism D
.......a...............................................g.... Organism E

Resample with replacement to produce 1000 alignments of the same size

Columns sampled so far (one row per organism, A to E):

c c a
. . .
. . g
. . .
. . .

Page 35: Phylogeny

Bootstrap to Assess Confidence in Branches

ggaccttcgggcctcacgccatcggatgaacccagatgggattagctagtaggtgaggta Organism A
............................................................ Organism B
............................tg.................g.......g.... Organism C
.................a.....................................g.... Organism D
.......a...............................................g.... Organism E

Resample with replacement to produce 1000 alignments of the same size

Columns sampled so far (one row per organism, A to E):

c c a t g g a
. . . . . . .
. . g . . . g
. . . . . a .
. . . . . . .
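The column resampling shown on these slides can be sketched as follows. This is an illustration under stated assumptions: the alignment is a hypothetical 10-column toy, and `has_branch` is a placeholder for whatever tree-building step is used (it should build a tree from the replicate and report whether the branch of interest is present). The stand-in predicate here only checks whether D is closer to E than to C, which is a crude substitute for building a tree.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def bootstrap_support(alignment, has_branch, n_replicates=1000, seed=0):
    """Percentage of bootstrap replicates in which has_branch(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_replicates)
        if has_branch(bootstrap_replicate(alignment, rng))
    )
    return 100.0 * hits / n_replicates

def differences(a, b):
    return sum(x != y for x, y in zip(a, b))

# Hypothetical toy alignment (assumption, not the slide's data).
toy = {
    "A": "ggaccttcga",
    "B": "ggaccttcga",
    "C": "ggatcttcgg",
    "D": "ggaccttggg",
    "E": "ggaccttggg",
}

def d_groups_with_e(aln):
    """Crude stand-in for tree building: is D closer to E than to C?"""
    return differences(aln["D"], aln["E"]) < differences(aln["D"], aln["C"])

if __name__ == "__main__":
    print(bootstrap_support(toy, d_groups_with_e))
```

Because columns are drawn with replacement, some original columns appear several times in a replicate and others not at all; a grouping that depends on only one or two columns will therefore be missing from a noticeable fraction of replicates.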

Page 36: Phylogeny

Many different Alignments

[Figure: the original five-organism alignment (Organism A to Organism E) followed by many resampled alignments of the same size]

Page 37: Phylogeny

Many different Alignments

[Figure: many resampled alignments of the five organisms, and a tree of Organism A to Organism E with branch support values of 100%, 100%, and 50%]

What percentage of the datasets support each branch?

Page 38: Phylogeny

Bootstrapping and what it really tells us.

The underlying rationale behind bootstrapping is to predict what would happen if more data were collected or small perturbations were made to the existing data. A bootstrap value does not indicate the chance that a branch is placed correctly in the tree. (Holder, M., Lewis, P. 2003)

[Figure: several resampled copies of the five-organism alignment (Organism A to Organism E)]

More simulated data

Page 39: Phylogeny

Genome-scale phylogeny

• Total Evidence approach - generate one tree from all available data

• Consensus approach – generate a tree for each gene and generate an average tree

• Network approach – show different relationships for different genes rather than a single bifurcating tree
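The consensus approach can be illustrated with a majority-rule sketch: each gene tree is reduced to the set of groupings (clades) it contains, and a grouping is kept if more than half of the gene trees contain it. The representation of a tree as a set of frozensets, the function name, and the five hypothetical gene trees are all assumptions made for this example.

```python
from collections import Counter

def majority_rule_clades(gene_trees, threshold=0.5):
    """Return {clade: fraction_of_trees} for clades found in more than
    `threshold` of the gene trees.

    gene_trees: list of trees, each given as a set of clades,
                where a clade is a frozenset of tip names.
    """
    counts = Counter(clade for tree in gene_trees for clade in set(tree))
    n = len(gene_trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}

# Hypothetical gene trees: three support grouping A with B, two support A with C,
# and all five support grouping D with E.
ta = {frozenset("AB"), frozenset("DE")}
tb = {frozenset("AC"), frozenset("DE")}
gene_trees = [ta, ta, ta, tb, tb]

if __name__ == "__main__":
    for clade, fraction in majority_rule_clades(gene_trees).items():
        print(sorted(clade), fraction)
```

The minority grouping (A with C) disappears from the consensus even though 40% of the genes support it, which is the kind of conflicting signal a network representation is meant to keep visible.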

Page 40: Phylogeny

Ma et al. unpublished analysis of 976 sets of orthologs from 8 enterobacteria and an outgroup.

Total Evidence: 976 concatenated protein multiple alignments

Majority Rule Consensus: 976 separate Bayesian phylogenies

Network representation of all topologies

Page 41: Phylogeny

An example of incongruence between different genes in Lactobacillus genomes

Nicolas et al. BMC Evolutionary Biology 2007; 7:141

Analyzed 480 proteins

3:2 ratio of genes supporting Ta vs. Tb, but Tc is almost never seen.

Page 42: Phylogeny

Touchon et al. 2009 PLoS Genetics

Page 43: Phylogeny
Page 44: Phylogeny