lecture 2. tree space and searching tree...

48
Lecture 2. Tree space and searching tree space Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 2. Tree space and searching tree space – p.1/48

Upload: others

Post on 30-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Lecture 2. Tree space and searching tree space

Joe Felsenstein

Department of Genome Sciences and Department of Biology

Lecture 2. Tree space and searching tree space – p.1/48

Page 2: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

The same tree?

Hum

an

Chi

mp

Gor

illa

Ora

ng

Gib

bon

Mac

acqu

e

Col

obus

Hum

an

Chi

mp

Gor

illa

Ora

ng

Gib

bon

Mac

acqu

e

Col

obus

Lecture 2. Tree space and searching tree space – p.2/48

Page 3: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

All possible treesa b

Forming all 4-species trees by adding the next species in all possibleplaces

Lecture 2. Tree space and searching tree space – p.3/48

Page 4: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

All possible treesa b

a bca bc bc a

Forming all 4-species trees by adding the next species in all possibleplaces

Lecture 2. Tree space and searching tree space – p.4/48

Page 5: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

All possible treesa b

a bca bc bc a

a d c b

c bd a

c b

c

a d

a b d

a c b d

etc. etc.

Forming all 4-species trees by adding the next species in all possibleplaces

Lecture 2. Tree space and searching tree space – p.5/48

Page 6: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

The number of rooted bifurcating trees:

1 × 3 × 5 × 7 × . . . × (2n − 3)

= (2n − 3)!/(

(n − 2)! 2n−2)

Lecture 2. Tree space and searching tree space – p.6/48

Page 7: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

which is:species number of trees

1 12 13 34 155 1056 9457 10,3958 135,1359 2,027,025

10 34,459,42511 654,729,07512 13,749,310,57513 316,234,143,22514 7,905,853,580,62515 213,458,046,676,87516 6,190,283,353,629,37517 191,898,783,962,510,62518 6,332,659,870,762,850,62519 221,643,095,476,699,771,87520 8,200,794,532,637,891,559,37530 4.9518 ×10

38

40 1.00985 ×1057

50 2.75292 ×1076

Lecture 2. Tree space and searching tree space – p.7/48

Page 8: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Mapping an unrooted tree into a rooted tree

12

3

4

5 6

7

8

1

6

47

5

2

8 3

12

3

4

5 6

7

8

1

6

47

5

2

8 3

... one with one fewer species.

Lecture 2. Tree space and searching tree space – p.8/48

Page 9: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

For one tree topology

The space of trees varying all 2n − 3 branch lengths, each a nonegativenumber, defines an “orthant" (open corner) of a (2n − 3)-dimensional realspace:

A

B

C

D

E

F

v 1

v

vv

v

v

v 23

v 4

5

6

78

v 9

wall wall

floor v 9

Lecture 2. Tree space and searching tree space – p.9/48

Page 10: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Through the looking glass ...

A

B

C

D

E

F

v 1

v

vv

v

v

v 23

v 4

5

6

78

v 9

Lecture 2. Tree space and searching tree space – p.10/48

Page 11: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Through the looking glass ...

A

B

C

D

E

F

v 1

v

vv

v

v

v 23

v 4

5

6

78

v 9

A

B

C

E

F

v 1

v

vv

vv 2

3

D

v

v 4

56

78

Lecture 2. Tree space and searching tree space – p.11/48

Page 12: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Through the looking glass ...

A

B

C

D

E

F

v 1

v

vv

v

v

v 23

v 4

5

6

78

v 9

A

B

C

E

F

v 1

v

vv

vv 2

3

D

v

v 4

56

78

A

B

C

D

E

F

v 1

v

vv

v

v

v 23

v 4

56

78

v 9

Lecture 2. Tree space and searching tree space – p.12/48

Page 13: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Through the looking glass ...

A

B

C

D

E

F

v 1

v

vv

v

v

v 23

v 4

5

6

78

v 9

A

B

C

E

F

v 1

v

vv

vv 2

3

D

v

v 4

56

78

A

B

C

D

E

F

v 1

v

v

vv

v

v 23

v 4

5

6

7

8

v 9

Lecture 2. Tree space and searching tree space – p.13/48

Page 14: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Tree space

t1t2

t1t2

an example: three species with a clock

A B C

t 1

t 2

t 1

t 2

OK

not possible

trifurcation

etc.

when we consider all three possible topologies, the space looks like:

Lecture 2. Tree space and searching tree space – p.14/48

Page 15: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.15/48

Page 16: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.16/48

Page 17: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.17/48

Page 18: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.18/48

Page 19: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.19/48

Page 20: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.20/48

Page 21: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.21/48

Page 22: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.22/48

Page 23: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.23/48

Page 24: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.24/48

Page 25: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.25/48

Page 26: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.26/48

Page 27: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.27/48

Page 28: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

If start here

Lecture 2. Tree space and searching tree space – p.28/48

Page 29: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

end up here

If start here

Lecture 2. Tree space and searching tree space – p.29/48

Page 30: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

A global maximum is not easy to find

end up here but global maximum is here

If start here

Lecture 2. Tree space and searching tree space – p.30/48

Page 31: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Nearest-neighbor interchanges (NNIs)

U V

U V

U V

S T

S T

S T S T

U V

and reforming them in one of the two possible alternative ways:

is rearranged by dissolving the connections to an interior branch

A subtree

(The triangles are subtrees)Lecture 2. Tree space and searching tree space – p.31/48

Page 32: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

all 15 trees, connected by NNIs

D B

C E

A

C E

D A

B D C

A E

B

A C

D E

B

E B

C D

A B C

D E

A

C B

D E

A

A B

D E

C

E B

D C

AB D

C E

A

B C

E D

A

A B

E C

D

D B

E C

AC D

B E

A

E C

B D

A

The graph of all 15 5-species unrooted trees, connected by NNIs

Lecture 2. Tree space and searching tree space – p.32/48

Page 33: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

with parsimony scores

11

9

11

11

11

10

11

9

11

9

11

8

9

10

9

The same graph with parsimony scores (try a “greedy" search with NNI’s)

Lecture 2. Tree space and searching tree space – p.33/48

Page 34: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Subtree pruning and regrafting (SPR) rearrangement

A

BC

D

E

F

G

H

I

J

KA

BC

D

E

F

G

H

I

J

K

Break a branch, remove a subtree

Add it in, attaching it to one (*)

A

C

D

E

G

K

*

A

C

D

G

E

K

B

F

H

I

J

Here is the result:

of the other branches

Lecture 2. Tree space and searching tree space – p.34/48

Page 35: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Tree bisection and reconnection (TBR) rearrangement

A

BC

D

E

F

G

H

I

J

KA

BC

D

E

F

G

H

I

J

K

Here is the result:

Break a branch, separate the subtrees

Connect a branch of one

A

BC

D

E

F

G

H

I

J

KA

B

C

D

E

GF

HI

J

K

to a branch of the other

Lecture 2. Tree space and searching tree space – p.35/48

Page 36: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Greedy search by sequential addition

A

D

B

C

A

B

C

B

C

D

A

D

A

C

B

8 7 9

BA

D

C

E11

A

D 9

E

C

B

A

D E9

BC

A

9 C

B

E

D

D 9 C

BEA

Greedy search by addition of species in a fixed order (A, B, C, D, E) in thebest place each time. Lecture 2. Tree space and searching tree space – p.36/48

Page 37: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Goloboff’s time-saving trick

H−K

L

M−

R

S−U

A

V−Z

V−Z

A−G H−R

S−U

B−G

Goloboff’s economy in computing scores of rearranged treesOnce the “views” have been computed, they can be taken to

represent subtrees, without going inside those subtrees

Lecture 2. Tree space and searching tree space – p.37/48

Page 38: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Star decomposition

A

C

D

E F

B

E

C

D

A

B

F

B C

D

A

E F

E

C

D

A

B

F

B C

D

A

F

E

C

D

A

B

F

E

“Star decomposition" search for best tree can happen in multiple ways

Lecture 2. Tree space and searching tree space – p.38/48

Page 39: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Disk-covering

A

B

C D

EF

0.1

0.05

0.1 0.04 0.1

0.030.030.02

0.05

“Disk covering" – assembly of a tree from overlapping estimated subtrees

Lecture 2. Tree space and searching tree space – p.39/48

Page 40: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Shortest Hamiltonian path problem(a) (b)

(c) (d)

Lecture 2. Tree space and searching tree space – p.40/48

Page 41: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Search tree for this problem

etc. etc.

etc.etc.

add 1 add 2 add 3

add 2 add 3 add 4 add 5

add 3 add 5

add 8 add 10add 9

add 9

add 9add 3add 10

add 10 add 8

add 8add 3add 10

add 10 add 8

add 8add 3

add 9

etc. etc.

start

(1,2,3,4,5,6,7,8,9,10) (1,2,3,4,5,6,7,9,8,10) (1,2,3,4,5,6,7,10,8,9)

(1,2,3,4,5,6,7,8,10,9) (1,2,3,4,5,6,7,9,10,8) (1,2,3,4,5,6,7,10,9,8)

add 4

etc.

add 9

Lecture 2. Tree space and searching tree space – p.41/48

Page 42: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Search tree of trees

C

A

B

D C

A B

A

C

B

D

A

B

C

D

A

ED

B

C

E

DA

B

C

D

AE

B

C

D

AC

B

E

D

AB

C

E

E

AC

B

D

E

C A B

DC

AE

B

D

C

AD

B

E

C

AB

D

E

E

AB

C

D

E

BA

C

D

B

AE

C

D

B

AD

C

E

B

A C D

ELecture 2. Tree space and searching tree space – p.42/48

Page 43: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

same, with parsimony scores in place of trees

8

11

11

9

3

9

7 8

9

9

9

10

10

1111

11

11

9

11

Lecture 2. Tree space and searching tree space – p.43/48

Page 44: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Polynomial time and exponential time

1 10 10010

0

101

102

103

104

105

106

Tim

e

Problem size

6n +4n−33

e0.5n

How does the time taken by an algorithm depend on the size of theproblem? If it is a polynomial (even one with big coefficients), with a bigenough case it is faster than one that depends on the size exponentially.

Lecture 2. Tree space and searching tree space – p.44/48

Page 45: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

NP completeness and NP hardness

P

NP

does thispart exist?

NP Hard

is P = NP?

NP Complete

(This diagram is not quite correct – see the diagrams on the Wikipedia page for “NP-hard”).

P = problems that can be solved by a polynomial time algorithm

NP complete = problems for which a proposed solution can be checked in polynomial timebut for which it can be proven that if one of them is in P, all are.

NP hard = problems for which a solution can be checked in polynomial time, but might be notsolvable in polynomial time.

Lecture 2. Tree space and searching tree space – p.45/48

Page 46: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

Some referencesFelsenstein, J. 1978. The number of evolutionary trees. Systematic Zoology27: 27-33.

(Correction, vol. 30, p. 122, 1981) [Review of counting tip-labelled trees, recursion forcounting multifurcating case]

Cavalli-Sforza, L. L. and A. W. F. Edwards. 1967. Phylogenetic analysis: models and estimationprocedures. American Journal of Human Genetics19: 233-257. also Evolution21: 550-570.[Includes counting and tree shapes]

Camin, J. H. and R. R. Sokal. 1965. A method for deducing branching sequences in phylogeny.Evolution19: 311-326. [Early parsimony paper includes rearrangement of trees]

Waterman, M. S. and T. F. Smith. 1978. On the similarity of dendrograms. Journal of Theoretical

Biology73: 789-800. [Defines NNIs. Uses them to get a distance between trees.]Maddison, D. R. 1991. The discovery and importance of multiple islands of most-parsimonious

trees. Systematic Zoology40: 315-328. [Discusses heuristic search strategy involving ties,multiple starts]

Farris, J. S. 1970. Methods for computing Wagner trees. Systematic Zoology19: 83-92. [Earlyparsimony algorithms paper is one of first to mention sequential addition strategy]

Lecture 2. Tree space and searching tree space – p.46/48

Page 47: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

continuedSaitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing

phylogenetic trees. Molecular Biology and Evolution4: 406-425. [First mention ofstar-decomposition search for best trees, sort of]

Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihoodmethod for reconstructing tree topologies. Molecular Biology and Evolution13: 964-969.[Assembles trees out of quartets]

Huson, D., S. Nettles, L. Parida, T. Warnow, and S. Yooseph. 1998. The disk-covering method fortree reconstruction. pp. 62-75 in Proceedings of “Algorithms and Experiments” (ALEX98), Trento,

Italy, Feb. 9-11, 1998, ed. R. Battiti and A. A. Bertossi. [“Disk-covering method” for longstringy trees]

Swofford, D. L. and G. J. Olsen. 1990. Phylogeny reconstruction. Chapter 11, Pp. 411-501 inMolecular Systematics,ed. D. M. Hillis and C. Moritz. Sinauer Associates, Sunderland,Massachusetts. [Review that discusses strategies, names SPR and TBR rearrangementmethods]

Foulds, L. R. and R. L. Graham. 1982. The Steiner problem in phylogeny is NP-complete.Advances in Applied Mathematics3: 43-49. [Parsimony is NP-hard]

Graham, R. L. and L. R. Foulds. 1982. Unlikelihood that minimal phylogenies for a realisticbiological study can be constructed in reasonable computat ional time. Mathematical

Biosciences60: 133-142. [ ... and more]

Lecture 2. Tree space and searching tree space – p.47/48

Page 48: Lecture 2. Tree space and searching tree spaceevolution.gs.washington.edu/gs541/2010/lecture2.pdf · All possible trees a b Forming all 4-species trees by adding the next species

continuedHendy, M. D. and D. Penny. 1982. Branch and bound algorithms to determine minimal

evolutionary trees. Mathematical Biosciences60: 133-142 [Introduced branch-and-bound forphylogenies]

Felsenstein, J. 2004. Inferring Phylogenies.Sinauer Associates, Sunderland, Massachusetts. [Forthis lecture the material is chapters 3, 4, and 5]

Semple, C. and M. Steel. 2003. Phylogenetics.Oxford University Press, Oxford. [Also coverssearch strategies]

Lecture 2. Tree space and searching tree space – p.48/48