phylogenies: parsimony plus a tantalizing taste of likelihood...tree of life diagrams depict the...

21
Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood CSEP 590 A Spring 2013 1

Upload: others

Post on 08-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood

CSEP 590 ASpring 2013

1

Page 2: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Phylogenies (aka Evolutionary Trees)

“Nothing in biology makes sense, except in the light of evolution”

-- Theodosius Dobzhansky, 1973

2

Page 3: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Comb Jellies: Evolutionary enigma

http://www.sciencenews.org/view/feature/id/350120/description/Evolutionary_enigmas3

Page 4: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

TREE OF LIFEDiagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage that shares an ancestor with all of the animals that branch after the point where it splits from the tree. Biologists traditionally build trees by comparing species’ anatomies; now they also compare DNA sequences.

4

Page 5: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

5

Page 6: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

A Complex Question:

Given data (sequences, anatomy, ...) infer the phylogeny

A Simpler Question:

Given data and a phylogeny, evaluate “how much change” is needed to fit data to tree

(The former question is usually tackled by sampling tree topologies & comparing them by the later metric…)

6

Page 7: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Human A T G A T ...

Chimp A T G A T ...

Gorilla A T G A G ...

Rat A T G C G ...

Mouse A T G C T ...

Parsimony

General idea ~ Occam’s Razor: Given data where change is rare, prefer an explanation that requires few events

7

Page 8: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Human A T G A T ...

Chimp A T G A T ...

Gorilla A T G A G ...

Rat A T G C G ...

Mouse A T G C T ...

General idea ~ Occam’s Razor: Given data where change is rare, prefer an explanation that requires few events

Parsimony

A

A

A

AA

A

A

A

A

0 changes

(of course other, less

parsimonious, answers possible)

8

Page 9: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Human A T G A T ...

Chimp A T G A T ...

Gorilla A T G A G ...

Rat A T G C G ...

Mouse A T G C T ...

General idea ~ Occam’s Razor: Given data where change is rare, prefer an explanation that requires few events

Parsimony

T

T

T

TT

T

T

T

T

0 changes

9

Page 10: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Human A T G A T ...

Chimp A T G A T ...

Gorilla A T G A G ...

Rat A T G C G ...

Mouse A T G C T ...

General idea ~ Occam’s Razor: Given data where change is rare, prefer an explanation that requires few events

Parsimony

G

G

G

GG

G

G

G

G

0 changes

10

Page 11: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Human A T G A T ...

Chimp A T G A T ...

Gorilla A T G A G ...

Rat A T G C G ...

Mouse A T G C T ...

General idea ~ Occam’s Razor: Given data where change is rare, prefer an explanation that requires few events

Parsimony

A

C

A/C

AA

A

A

C

C

1 change

11

Page 12: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Human A T G A T ...

Chimp A T G A T ...

Gorilla A T G A G ...

Rat A T G C G ...

Mouse A T G C T ...

General idea ~ Occam’s Razor: Given data where change is rare, prefer an explanation that requires few events

Parsimony

T

G/T

G/T

G/TT

T

G

T

G

2 changes

12

Page 13: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Counting Events Parsimoniously

Lesson of example – no unique reconstruction

But there is a unique minimum number, of course

How to find it?

Early solutions 1965-75

13

Page 14: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

G G TTT

A C G T A C G T A C G T A C G T A C G T

A C G T A C G T

A C G T

A C G T

Sankoff & Rousseau, ‘75Pu(s) = best parsimony score of subtree rooted at node u, assuming u is labeled by character s

14

Page 15: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

For leaf u:

Pu(s) =

⇢0 if u is a leaf labeled s1 if u is a leaf not labeled s

For internal node u:

Pu(s) =

X

v2child(u)

min

t2{A,C,G,T}cost(s, t) + Pv(t)

Sankoff-Rousseau Recurrence

For Leaf u:

For Internal node u:

Time: O(alphabet2 x tree size)

Pu(s) = best parsimony score of subtree rooted at node u, assuming u is labeled by character s

15

Page 16: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

A C G T A C G T

Sankoff & Rousseau, ‘75Pu(s) = best parsimony score of subtree rooted at node u, assuming u is labeled by character s

A C G T

For leaf u:

Pu(s) =

⇢0 if u is a leaf labeled s1 if u is a leaf not labeled s

For internal node u:

Pu(s) =

X

v2child(u)

min

t2{A,C,G,T}cost(s, t) + Pv(t)

u

v1 v2

s v t cost(s,t)+Pv(t) min

v1

A

v1

Cv1 G

v1

T

v2

A

v2

Cv2 G

v2

Tsum: Pu(s) = sum: Pu(s) = sum: Pu(s) = sum: Pu(s) =

16

Page 17: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

TT

A C G T A C G T

Sankoff & Rousseau, ‘75Pu(s) = best parsimony score of subtree rooted at node u, assuming u is labeled by character s

∞ ∞ ∞ 0 ∞ ∞ ∞ 0

A C G T

2 2 2 0

For leaf u:

Pu(s) =

⇢0 if u is a leaf labeled s1 if u is a leaf not labeled s

For internal node u:

Pu(s) =

X

v2child(u)

min

t2{A,C,G,T}cost(s, t) + Pv(t)

u

v1 v2

s v t cost(s,t)+Pv(t) min

A

v1

A 0 + ∞1

A

v1

C 1 + ∞ 1

A

v1 G 1 + ∞ 1

A

v1

T 1 + 0

1

A

v2

A 0 + ∞1

A

v2

C 1 + ∞ 1

A

v2 G 1 + ∞

1

A

v2

T 1 + 0

1

sum: Pu(s) = sum: Pu(s) = sum: Pu(s) = sum: Pu(s) = 217

Page 18: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

G G TTT

A C G T A C G T A C G T A C G T A C G T

A C G T A C G T

A C G T

A C G T

Sankoff & Rousseau, ‘75Pu(s) = best parsimony score of subtree rooted at node u, assuming u is labeled by character s

∞ ∞ ∞ 0 ∞ ∞ ∞ 0 ∞ ∞ 0 ∞ ∞ ∞ 0 ∞ ∞ ∞ ∞ 0

2 2 2 0 2 2 1 1

2 2 1 1

4 4 2 2Min = 2 (G or T)

18

Page 19: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Which tree is better?

Which has smaller parsimony score?

Which is more likely, assuming edge length proportional to evolutionary rate?

A

A

G

G

A

A

G

G

19

Page 20: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Parsimony – Generalities

Parsimony is not the best way to evaluate a phylogeny (maximum likelihood generally preferred - as previous slide suggests)

But it is a natural approach, works well in many cases, and is fast.

Finding the best tree: a much harder problem

Much is known about these problems; Inferring

Phylogenies by Joe Felsenstein is a great resource.

20

Page 21: Phylogenies: Parsimony Plus a Tantalizing Taste of Likelihood...TREE OF LIFE Diagrams depict the history of animal lineages as they evolved over time. Each branch represents a lineage

Phylogenetic Footprinting

A lovely extension of the above ideas. E.g., suppose promoters of orthologous genes in multiple species all contain (variants of) a common k-base transcription factor binding site. Roughly as above, but 4k table entries per node…

1. M Blanchette, B Schwikowski, M Tompa, Algorithms for Phylogenetic Footprinting. J Comp Biol, vol. 9, no. 2, 2002, 211-223

2. M Blanchette and M Tompa, FootPrinter: a Program Designed for Phylogenetic Footprinting. Nucleic Acids Research, vol. 31, no. 13, July 2003, 3840-3842

21