phylogenetic reconstruction. what is phylogenetic analysis and why should we perform it?...

72
Phylogenetic reconstruction

Upload: arturo-whittemore

Post on 31-Mar-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Phylogenetic reconstruction

Page 2: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

What is phylogenetic analysis and why should we perform it?

Phylogenetic analysis has two major components:

1. Phylogeny inference or “tree building”

2. Character and rate analysis

Page 3: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

A

B

C

D

E

TAXA (genes,populations,species, etc.)used to inferthe phylogeny

Common Phylogenetic Tree Terminology

Vertex or Nodes

Edges

Page 4: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Ancestral Node or ROOT of

the TreeInternal Nodes orDivergence Points

(represent hypothetical ancestors of the taxa)

Branches, Lineages or Clades

Terminal Nodes (Leaves)

A

B

C

D

E

Represent theTAXA (genes,populations,species, etc.)used to inferthe phylogeny

Common Phylogenetic Tree Terminology

Page 5: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Phylogenetic trees diagram the evolutionary relationships between the taxa

((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses

Taxon A

Taxon B

Taxon C

Taxon E

Taxon D

No meaning to thespacing or to order

- no scale (for ‘cladograms’),- proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), - proportional to time (for ‘ultrametric trees’ or true evolutionary trees).

Page 6: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

A few examples of what can be inferred from phylogenetic trees built from DNAor protein sequence data:

• Which species are the closest living relatives of modern humans?

• Did the infamous Florida Dentist infect his patients with HIV?

• What were the origins of specific transposable elements?

• Plus countless others…..

Page 7: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Using Phylogeny to Understand Gene Duplication and Loss

A. A gene tree.B. The gene tree superimposed on a species tree, allowing

identification of the duplication and loss events.

Page 8: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Which species are the closest living relatives of modern humans?

Mitochondrial DNA and most nuclear DNA-encoded genes,

The pre-molecular view

MYA

Chimpanzees

Orangutans Humans

Bonobos

GorillasHumans

Bonobos

Gorillas Orangutans

Chimpanzees

MYA015-30014

Page 9: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Did the Florida Dentist infect his patients with HIV?

DENTIST

DENTIST

Patient D

Patient F

Patient C

Patient A

Patient G

Patient BPatient E

Patient A

Local control 2

Local control 3

Local control 9

Local control 35

Local control 3

Yes:The HIV sequences fromthese patients fall withinthe clade of HIV sequences found in the dentist.

No

No

From Ou et al. (1992) and Page & Holmes (1998)

Phylogenetic treeof HIV sequencesfrom the DENTIST,his Patients, & LocalHIV-infected People:

Page 10: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

A few examples of what can be learned from character analysis using phylogenies as analytical frameworks:

• When did specific episodes of positive Darwinian selection occur during evolutionary history?

• What was the most likely geographical location of the common ancestor of the African apes and humans?

• Plus countless others…..

Page 11: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Phylogenetic Resources

• NCBI Taxonomy Browser• http://www.ncbi.nlm.nih.gov/Taxonomy/

• “Tree of Life”• http://phylogeny.arizona.edu/tree/phylogeny.html

Page 12: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Completely unresolvedor "star" phylogeny

Partially resolvedphylogeny

Fully resolved,bifurcating phylogeny

A A A

B

B B

C

C

C

E

E

E

D

D D

Polytomy or multifurcation A bifurcation

The goal of phylogeny inference is to resolve the branching orders of lineages in evolutionary trees:

Page 13: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

There are three possible unrooted trees for four taxa (A, B, C, D)

A C

B D

Tree 1

A B

C D

Tree 2

A B

D C

Tree 3

Which one is correct?

Page 14: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

The number of unrooted trees increases in a greater than exponential manner with number of taxa

(2N - 5)!! = # unrooted trees for N taxa

CA

B D

A B

C

A D

B E

C

A D

B E

C

F

Page 15: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Inferring evolutionary relationships between the taxa requires rooting the tree:

A

BC

Root D

A B C D

RootNote that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

Rooted tree

Unrooted tree

B

Page 16: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Now, try it again with the root at another position:

A

BC

Root

D

Unrooted tree

Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

C D

Root

Rooted tree

A

BB

Page 17: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees

The unrooted tree 1:

A C

B D

Rooted tree 1d

C

D

A

B

4

Rooted tree 1c

A

B

C

D

3

Rooted tree 1e

D

C

A

B

5

Rooted tree 1b

A

B

C

D

2

Rooted tree 1a

B

A

C

D

1

These trees show five different evolutionary relationships among the taxa!

Page 18: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

All of these rearrangements show the same evolutionary relationships between the taxa

B

A

C

D

A

B

D

C

B

C

A

D

B

D

A

C

B

AC

DRooted tree 1a

B

A

C

D

A

B

C

D

Page 19: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Think for yourself

• How many unrooted trees are there with 4 taxa?

• With 5 taxa?

• How many rooted trees are there with 4 taxa?

• With 5 taxa?

Page 20: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

x =

CA

B D

A D

B E

C

A D

B E

C

F (2N - 3)!! = # unrooted trees for N taxa

Each unrooted tree theoretically can be rooted anywhere along any of its branches

Page 21: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Finding the best tree

• Number of (rooted) trees• 3 taxa -> 3 trees• 4 taxa -> 15 trees• 10 taxa -> 34 459 425 trees• 25 taxa -> 1,19·1030 trees• 52 taxa -> 2,75·1080 trees

• Finding the optimal tree is an NP-complete problem

Search strategies• Exact

• Exhaustive• Branch and bound

• Algorithmic• Greedy algorithms

(including Neighbor-joining)

• Heuristic• Systematic; branch-

swapping (NNI, SPR, TBR)

• Stochastic • Markov Chain Monte

Carlo (MCMC)• Genetic algorithms

Page 22: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Rooting Trees

• Molecular Clock• Root=midpoint, longest span

• Extrinsic Evidence (Outgroup)• select fungus as root for plants

A

B

C

D

10

2

3

5

2

d (A,D) = 10 + 3 + 5 = 18Midpoint = 18 / 2 = 9

outgroup

Page 23: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Phylogenetic Models

• All sequences are homologous

• Each position in alignment homologous

• Positions evolve independently

Page 24: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Steps in Analysis

• Data Model (Alignment)

• DNA base substitution model

• Build Trees• Algorithm based vs Criterion based• Distance based vs Character-based

Page 25: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Practicalities

• Quality of input data critical• Examine data from all possible angles

• distance, parsimony, likelihood

• Outgroup taxon critical• problem if outgroup shares a selective

property with a subset of ingroup

• Order of input can be problematic• Try different orders!

• Assess the variation in your data in some way

Page 26: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Types of data used in phylogenetic inference:Character-based methods: Use the aligned characters, such as DNA

or protein sequences, directly during tree inference. Taxa Characters

Species A ATGGCTATTCTTATAGTACGSpecies B ATCGCTAGTCTTATATTACASpecies C TTCACTAGACCTGTGGTCCASpecies D TTGACCAGACCTGTGGTCCGSpecies E TTGACCAGTTCTCTAGTTCG

Distance-based methods: Transform the sequence data into pairwise distances, and use the matrix during tree building.

A B C D E Species A ---- 0.20 0.50 0.45 0.40 Species B 0.23 ---- 0.40 0.55 0.50 Species C 0.87 0.59 ---- 0.15 0.40 Species D 0.73 1.12 0.17 ---- 0.25 Species E 0.59 0.89 0.61 0.31 ----

Example 1: Uncorrected“p” distance(=observed percentsequence difference)

Example 2: Kimura 2-parameter distance(estimate of the true number of substitutions between taxa)

Page 27: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Types of computational methods:

Clustering algorithms: Use pairwise distances. Are purely algorithmic methods.

Optimality approaches: Use either character or distance data. - minimum branch lengths, - fewest number of events, - highest likelihood

Warning: Finding an optimal tree is not necessarily the same as finding the "true” tree.

Page 28: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Molecular phylogenetic tree building methods:

COMPUTATIONAL METHOD

Clustering algorithmOptimality criterion

DA

TA

TY

PE

Ch

arac

ters

Dis

tan

ces

PARSIMONY

MAXIMUM LIKELIHOOD

UPGMA

NEIGHBOR-JOINING

MINIMUM EVOLUTION

LEAST SQUARES

Page 29: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Tree-Building Methods

• Distance• UPGMA, NJ, FM, ME

• Character• Maximum Parsimony• Maximum Likelihood

Page 30: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Distance Methods

• Measure distance (dissimilarity)

• Methods• UPGMA (Unweighted pair group

method with Arithmetic Mean)• NJ (Neighbor joining)• FM (Fitch-Margoliash)• ME (Minimal Evolution)

Page 31: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Inferring Trees and Ancestors

CCCAGGCCCAAG->

CCCAAGCCCAAA->

CCCAAACCCAAA->

CCCAAC

Page 32: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Different Criteria

1 CCCAGG2 CCCAAG3 CCCAAA4 CCCAAC

1-2 1

1-3 2

1-4 2

2-3 1

2-4 1

3-4 1

1,2 can be sister taxaAND

3,4 can be sister taxa

Infer ancestor of 1,2 and 3,4

Page 33: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

UPGMA: Clustering

Page 34: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

UPGMA: Distance measure

where |Ci| and |Cj| denote the number of sequences in cluster i and j, respectively.

ji CqCp

pqji

ij dCC

din ,in ||||

1

Clustering: All leaves are assigned to a cluster, which then are iteratively merged according to their distance.

The distance between two clusters i and j is defined as:

Page 35: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

||||

||||

ii

jjliilkl CC

CdCdd

jik CCC (1)

(2)

UPGMA: ReplacingNode k replaces nodes i and j with their union:

The new distances between the new node k and all other clusters l are computed according to:

Page 36: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

UPGMA: Algorithm

• Initialization:• Assign each sequence i to its own cluster Ci .• Define one leaf of T for each sequence, and place at height zero.

• Iteration • Determine the two clusters i, j for which di,j is minimal.• Define a new cluster k by Ck = Ci U Cj, and define dkl for all l by (2).• Define a node k with daughter nodes i and j, and place it at height

di,j/2.• Add k to the current clusters and remove i.

• Termination• When only two clusters i, j remain, place the root at height di,j/2.

Page 37: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

UPGMA example: Step 1 Alignment -> distance

A B C D E F G

A -

B 63 -

C 94 79 -

D 111 96 47 -

E 67 20 83 100 -

F 23 58 89 106 62 -

G 107 92 43 16 96 102 -

Example 1: Uncorrected“p” distance(=observed percentsequence difference)

Example 2: Kimura 2-parameter distance(estimate of the true number of substitutions between taxa)

Distance:

Distance matrix:

Page 38: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

GD

Step 2: distance -> cladeA B C D E F G

A -

B 63 -

C 94 79 -

D 111 96 47 -

E 67 23 83 100 -

F 20 58 89 106 62 -

G 107 92 43 16 96 102 -

Page 39: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

A B C E F DG

A -

B 63 -

C 94 79 -

E 67 23 83 -

F 20 58 89 62 -

DG 109 94 45 98 104 -

GD

Step 3: merge D and G

||||

||||

ii

jjliilkl CC

CdCdd

Page 40: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

A B C E F DG

A -

B 63 -

C 94 79 -

E 67 23 83 -

F 20 58 89 62 -

DG 109 94 45 98 104 -

GD A F

Step 4

Page 41: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

AF B C E DG

AF -

B 61 -

C 92 79 -

E 65 23 62 -

DG 107 94 45 98 -

A F

Step 5

GD

Page 42: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

AF B C E DG

AF -

B 61 -

C 92 79 -

E 65 23 62 -

DG 107 94 45 98 -

A F

Step 6

GD B E

Page 43: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

AF BE C DG

AF -

BE 63 -

C 92 71 -

DG 107 96 45 -

Step 7

A FGD B E

Page 44: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

AF BE C DG

AF -

BE 63 -

C 92 71 -

DG 107 96 45 -

GD C

Step 8

A F B E

Page 45: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

GD C

Step 9

A F B E

AF BE CDG

AF -

BE 63 -

CDG 102 88 -

Page 46: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

GD C

Step 10

AF BE CDG

AF -

BE 63 -

CDG 102 88 -

B EA F

Page 47: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Root

B E

AFBE CDG

AFBE -

CDG 94 -

GD C A F

UPGMA: distance -> phylogeny

B E GD CA F

Page 48: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

COMPUTATIONAL METHOD

Clustering algorithmOptimality criterion

DA

TA

TY

PE

Ch

arac

ters

Dis

tan

ces

PARSIMONY

MAXIMUM LIKELIHOOD

UPGMA

NEIGHBOR-JOINING

MINIMUM EVOLUTION

LEAST SQUARES

Classification of phylogenetic inference methods

Page 49: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Clustering methods (UPGMA & N-J)

Optimality criterion: NONE. The algorithm itself builds‘the’ tree.

Page 50: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Minimum evolution (ME) methods

Optimality criterion: The tree(s) with the shortest sum of thebranch lengths (or overall tree length) is chosen as the best tree.

Page 51: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Character Methods

• Maximum Parsimony• minimal changes to produce data

• Maximum Likelihood

Page 52: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Parsimony methods:Optimality criterion: The ‘most-parsimonious’ tree is the one thatrequires the fewest number of evolutionary events (e.g., nucleotidesubstitutions, amino acid replacements) to explain the sequences.

Page 53: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Parsimony

CCCAGGCCCAAG->

CCCAAGCCCAAA->

CCCAAACCCAAA->

CCCAAC

4 TAXA, 3 changes minimum

Page 54: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Parsimony – an example

acgtatgga acgggtgca aacggtgga aactgtgca

: c

: c

: a

: a

: c

: c

: a

: a

: c

: a

: a

: c

Total tree length: 7 Total tree length: 8 Total tree length: 8

Page 55: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Maximum likelihood (ML) methodsOptimality criterion: ML methods evaluate phylogenetic hypothesesin terms of the probability that a proposed model of the evolutionaryprocess and the proposed unrooted tree would give rise to theobserved data. The tree found to have the highest ML value isconsidered to be the preferred tree.

Page 56: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Using modelsObserved differences

Actual changes

A G

C T

Q

3 3 3 3

Example: Jukes-Cantor

pij 1

4

1

4e 4t

pij 1

4

3

4e 4t

, if i=j

, if i≠j

pt : proportion of different nucleotides

Page 57: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Maximum likelihood

• Given two trees, the one with the higher likelihood, i.e. the one with the higher conditional probability of observing the data, is the better

• Site likelihood is the conditional probability of the data at one site given the assumed model of evolution and parameters of the model

• Data set likelihood is the product of the site likelihoods

• Likelihood values under different models are comparable, thus giving us a way to test the adequacy of the model

LH P(D | H ) P(D | T ,)

Page 58: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

-55.0

-54.5

-54.0

-53.5

-53.0

-52.5

-52.0

-51.5

-51.0

-50.5

0 0.02 0.04 0.06 0.08 0.1

30 nucleotides from -globin genes of two primates on a one-edge tree

* *Gorilla GAAGTCCTTGAGAAATAAACTGCACACTGGOrangutan GGACTCCTTGAGAAATAAACTGCACACTGG

There are two differences and 28 similarities

28

42

4 3116

11

16

1

tt eeL

t

lnL

t= 0.02327lnL= -51.133956

Page 59: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Likelihoods of a more interesting tree

• Data for one site is shown on the tree• Edge lengths are defined as =3(t)i

• Computational root is chosen arbitrarily (homogenous models) at an internal node (arrow)

• u is the state at the root node, v at the other internal node

Li 1

4PuA(d1)

v

u PuA(d2 )Puv(d5 )PvC (d3 )PvT (d4 )

A

A

C

T

d1 d3

d4d2

d5

Page 60: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Assessing the variation

• Jack-knife – resampling without replacement• Bootstrap – resampling with replacement

• From the characters (sites) draw randomly as many times as there are number of characters

• Analyze this re-sampled data set in the same way as the study

• Repeat this 100+ times and summarize, for example, as a majority rule consensus tree

• Bootstrap proportions between 0.5 and 1 can be interpreted as a measure of confidence or support

Page 61: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20Aus C G A C G G T G G T C T A T A C A C G ABeus C G G C G G T G A T C T A T G C A C G GCeus T G G C G G C G T C T C A T A C A A T ADeus T A A C G A T G A C C C G A C T A T T G

Original data set with n characters.

2 3 13 8 3 19 14 6 20 20 7 1 9 11 17 10 6 14 8 16Aus G A A G A G T G A A T C G C A T G T G CBeus G G A G G G T G G G T C A C A T G T G CCeus G G A G G T T G A A C T T T A C G T G CDeus A A G G A T A A G G T T A C A C A A G T

Draw n characters randomly with re-placement. Repeat m times.

m pseudo-replicates, each with n characters.

Aus

Beus

Ceus

Deus

Original analysis, e.g. MP, ML, NJ.

Aus

Beus

Ceus

Deus

75%

Evaluate the results from the m analyses.

Aus

Beus

Ceus

Deus

Aus

Beus

Ceus

Deus

Aus

Beus

Ceus

Deus

Aus

Beus

Ceus

Deus

Aus

Beus

Ceus

Deus

Aus

Beus

Ceus

Deus

Repeat original analysis on each of the pseudo-replicate data sets.

Bootstrap

Page 62: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Resampling statistics in phylogenetics

• ”…provides us with a confidence interval…[of] the phylogeny that would be estimated on repeated sampling of many characters from the underlying pool of characters” (Felsenstein 1985)

• True? We don’t know. The exact statistical interpretation remains unclear.

Page 63: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Pros and cons of some methods• Pair-wise, algorithmic approach

+ Fast

+ Models can be used when transforming to distances

- Information is lost when transforming to pair-wise distances

- One will get a tree, but no measure of goodness to compare with other hypotheses

• Parsimony+ Philosophically appealing – Occam’s razor

- Can be inconsistent

- Can be computationally slow

• Maximum likelihood+ Model based

- Model based

- Computationally veeeeery slow

Page 64: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Computation

• For large data sets (many taxa) exact solutions for any method employing an optimality criterion (parsimony, likelihood, minimum evolution) are not possible

Page 65: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

What can go wrong?

• Sampling error• Assessed by - for example - the bootstrap

• Systematic error (inconsistent method)• Tests of the adequacy of models used

• Reality• A tree may be a poor model of the real history• Information has been lost by subsequent

evolutionary changes

• “Species” vs. “gene” trees

Page 66: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Canis MusGadus

What is wrong with this tree?

• Negligible (within sequence) sampling error

• Tree estimated by a consistent method

100

100

Page 67: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Gene duplication

“Species” tree

“Gene” trees

The expected tree…

Page 68: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Canis Mus Gadus Gadus Mus Canis

Two copies (paralogs) present in the genomes

Paralogous

Orthologous Orthologous

Page 69: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

Canis Gadus Mus

What we have studied…

Page 70: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

HIV Genome Diversity

• Error prone (RT) replication

• High rate of replication• 1010 virions/day

• In vivo selection pressure

Page 71: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

HIV tree

Recombinants!

ENV

GAG

AIDS 1996, 10:S13

Page 72: Phylogenetic reconstruction. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: 1.Phylogeny inference

To conclude–

• Shit in, shit out : Alignment crucial• Try several methods, for consistency• Beware of paralogues• If recombinations possible: every site its tree.