lecture 2. tree space and searching tree...
TRANSCRIPT
Lecture 2. Tree space and searching tree space
Joe Felsenstein
Department of Genome Sciences and Department of Biology
Lecture 2. Tree space and searching tree space – p.1/48
The same tree?
Hum
an
Chi
mp
Gor
illa
Ora
ng
Gib
bon
Mac
acqu
e
Col
obus
Hum
an
Chi
mp
Gor
illa
Ora
ng
Gib
bon
Mac
acqu
e
Col
obus
Lecture 2. Tree space and searching tree space – p.2/48
All possible treesa b
Forming all 4-species trees by adding the next species in all possibleplaces
Lecture 2. Tree space and searching tree space – p.3/48
All possible treesa b
a bca bc bc a
Forming all 4-species trees by adding the next species in all possibleplaces
Lecture 2. Tree space and searching tree space – p.4/48
All possible treesa b
a bca bc bc a
a d c b
c bd a
c b
c
a d
a b d
a c b d
etc. etc.
Forming all 4-species trees by adding the next species in all possibleplaces
Lecture 2. Tree space and searching tree space – p.5/48
The number of rooted bifurcating trees:
1 × 3 × 5 × 7 × . . . × (2n − 3)
= (2n − 3)!/(
(n − 2)! 2n−2)
Lecture 2. Tree space and searching tree space – p.6/48
which is:species number of trees
1 12 13 34 155 1056 9457 10,3958 135,1359 2,027,025
10 34,459,42511 654,729,07512 13,749,310,57513 316,234,143,22514 7,905,853,580,62515 213,458,046,676,87516 6,190,283,353,629,37517 191,898,783,962,510,62518 6,332,659,870,762,850,62519 221,643,095,476,699,771,87520 8,200,794,532,637,891,559,37530 4.9518 ×10
38
40 1.00985 ×1057
50 2.75292 ×1076
Lecture 2. Tree space and searching tree space – p.7/48
Mapping an unrooted tree into a rooted tree
12
3
4
5 6
7
8
1
6
47
5
2
8 3
12
3
4
5 6
7
8
1
6
47
5
2
8 3
... one with one fewer species.
Lecture 2. Tree space and searching tree space – p.8/48
For one tree topology
The space of trees varying all 2n − 3 branch lengths, each a nonegativenumber, defines an “orthant" (open corner) of a (2n − 3)-dimensional realspace:
A
B
C
D
E
F
v 1
v
vv
v
v
v 23
v 4
5
6
78
v 9
wall wall
floor v 9
Lecture 2. Tree space and searching tree space – p.9/48
Through the looking glass ...
A
B
C
D
E
F
v 1
v
vv
v
v
v 23
v 4
5
6
78
v 9
Lecture 2. Tree space and searching tree space – p.10/48
Through the looking glass ...
A
B
C
D
E
F
v 1
v
vv
v
v
v 23
v 4
5
6
78
v 9
A
B
C
E
F
v 1
v
vv
vv 2
3
D
v
v 4
56
78
Lecture 2. Tree space and searching tree space – p.11/48
Through the looking glass ...
A
B
C
D
E
F
v 1
v
vv
v
v
v 23
v 4
5
6
78
v 9
A
B
C
E
F
v 1
v
vv
vv 2
3
D
v
v 4
56
78
A
B
C
D
E
F
v 1
v
vv
v
v
v 23
v 4
56
78
v 9
Lecture 2. Tree space and searching tree space – p.12/48
Through the looking glass ...
A
B
C
D
E
F
v 1
v
vv
v
v
v 23
v 4
5
6
78
v 9
A
B
C
E
F
v 1
v
vv
vv 2
3
D
v
v 4
56
78
A
B
C
D
E
F
v 1
v
v
vv
v
v 23
v 4
5
6
7
8
v 9
Lecture 2. Tree space and searching tree space – p.13/48
Tree space
t1t2
t1t2
an example: three species with a clock
A B C
t 1
t 2
t 1
t 2
OK
not possible
trifurcation
etc.
when we consider all three possible topologies, the space looks like:
Lecture 2. Tree space and searching tree space – p.14/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.15/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.16/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.17/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.18/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.19/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.20/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.21/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.22/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.23/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.24/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.25/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.26/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.27/48
A global maximum is not easy to find
If start here
Lecture 2. Tree space and searching tree space – p.28/48
A global maximum is not easy to find
end up here
If start here
Lecture 2. Tree space and searching tree space – p.29/48
A global maximum is not easy to find
end up here but global maximum is here
If start here
Lecture 2. Tree space and searching tree space – p.30/48
Nearest-neighbor interchanges (NNIs)
U V
U V
U V
S T
S T
S T S T
U V
and reforming them in one of the two possible alternative ways:
is rearranged by dissolving the connections to an interior branch
A subtree
(The triangles are subtrees)Lecture 2. Tree space and searching tree space – p.31/48
all 15 trees, connected by NNIs
D B
C E
A
C E
D A
B D C
A E
B
A C
D E
B
E B
C D
A B C
D E
A
C B
D E
A
A B
D E
C
E B
D C
AB D
C E
A
B C
E D
A
A B
E C
D
D B
E C
AC D
B E
A
E C
B D
A
The graph of all 15 5-species unrooted trees, connected by NNIs
Lecture 2. Tree space and searching tree space – p.32/48
with parsimony scores
11
9
11
11
11
10
11
9
11
9
11
8
9
10
9
The same graph with parsimony scores (try a “greedy" search with NNI’s)
Lecture 2. Tree space and searching tree space – p.33/48
Subtree pruning and regrafting (SPR) rearrangement
A
BC
D
E
F
G
H
I
J
KA
BC
D
E
F
G
H
I
J
K
Break a branch, remove a subtree
Add it in, attaching it to one (*)
A
C
D
E
G
K
*
A
C
D
G
E
K
B
F
H
I
J
Here is the result:
of the other branches
Lecture 2. Tree space and searching tree space – p.34/48
Tree bisection and reconnection (TBR) rearrangement
A
BC
D
E
F
G
H
I
J
KA
BC
D
E
F
G
H
I
J
K
Here is the result:
Break a branch, separate the subtrees
Connect a branch of one
A
BC
D
E
F
G
H
I
J
KA
B
C
D
E
GF
HI
J
K
to a branch of the other
Lecture 2. Tree space and searching tree space – p.35/48
Greedy search by sequential addition
A
D
B
C
A
B
C
B
C
D
A
D
A
C
B
8 7 9
BA
D
C
E11
A
D 9
E
C
B
A
D E9
BC
A
9 C
B
E
D
D 9 C
BEA
Greedy search by addition of species in a fixed order (A, B, C, D, E) in thebest place each time. Lecture 2. Tree space and searching tree space – p.36/48
Goloboff’s time-saving trick
H−K
L
M−
R
S−U
A
V−Z
V−Z
A−G H−R
S−U
B−G
Goloboff’s economy in computing scores of rearranged treesOnce the “views” have been computed, they can be taken to
represent subtrees, without going inside those subtrees
Lecture 2. Tree space and searching tree space – p.37/48
Star decomposition
A
C
D
E F
B
E
C
D
A
B
F
B C
D
A
E F
E
C
D
A
B
F
B C
D
A
F
E
C
D
A
B
F
E
“Star decomposition" search for best tree can happen in multiple ways
Lecture 2. Tree space and searching tree space – p.38/48
Disk-covering
A
B
C D
EF
0.1
0.05
0.1 0.04 0.1
0.030.030.02
0.05
“Disk covering" – assembly of a tree from overlapping estimated subtrees
Lecture 2. Tree space and searching tree space – p.39/48
Shortest Hamiltonian path problem(a) (b)
(c) (d)
Lecture 2. Tree space and searching tree space – p.40/48
Search tree for this problem
etc. etc.
etc.etc.
add 1 add 2 add 3
add 2 add 3 add 4 add 5
add 3 add 5
add 8 add 10add 9
add 9
add 9add 3add 10
add 10 add 8
add 8add 3add 10
add 10 add 8
add 8add 3
add 9
etc. etc.
start
(1,2,3,4,5,6,7,8,9,10) (1,2,3,4,5,6,7,9,8,10) (1,2,3,4,5,6,7,10,8,9)
(1,2,3,4,5,6,7,8,10,9) (1,2,3,4,5,6,7,9,10,8) (1,2,3,4,5,6,7,10,9,8)
add 4
etc.
add 9
Lecture 2. Tree space and searching tree space – p.41/48
Search tree of trees
C
A
B
D C
A B
A
C
B
D
A
B
C
D
A
ED
B
C
E
DA
B
C
D
AE
B
C
D
AC
B
E
D
AB
C
E
E
AC
B
D
E
C A B
DC
AE
B
D
C
AD
B
E
C
AB
D
E
E
AB
C
D
E
BA
C
D
B
AE
C
D
B
AD
C
E
B
A C D
ELecture 2. Tree space and searching tree space – p.42/48
same, with parsimony scores in place of trees
8
11
11
9
3
9
7 8
9
9
9
10
10
1111
11
11
9
11
Lecture 2. Tree space and searching tree space – p.43/48
Polynomial time and exponential time
1 10 10010
0
101
102
103
104
105
106
Tim
e
Problem size
6n +4n−33
e0.5n
How does the time taken by an algorithm depend on the size of theproblem? If it is a polynomial (even one with big coefficients), with a bigenough case it is faster than one that depends on the size exponentially.
Lecture 2. Tree space and searching tree space – p.44/48
NP completeness and NP hardness
P
NP
does thispart exist?
NP Hard
is P = NP?
NP Complete
(This diagram is not quite correct – see the diagrams on the Wikipedia page for “NP-hard”).
P = problems that can be solved by a polynomial time algorithm
NP complete = problems for which a proposed solution can be checked in polynomial timebut for which it can be proven that if one of them is in P, all are.
NP hard = problems for which a solution can be checked in polynomial time, but might be notsolvable in polynomial time.
Lecture 2. Tree space and searching tree space – p.45/48
Some referencesFelsenstein, J. 1978. The number of evolutionary trees. Systematic Zoology27: 27-33.
(Correction, vol. 30, p. 122, 1981) [Review of counting tip-labelled trees, recursion forcounting multifurcating case]
Cavalli-Sforza, L. L. and A. W. F. Edwards. 1967. Phylogenetic analysis: models and estimationprocedures. American Journal of Human Genetics19: 233-257. also Evolution21: 550-570.[Includes counting and tree shapes]
Camin, J. H. and R. R. Sokal. 1965. A method for deducing branching sequences in phylogeny.Evolution19: 311-326. [Early parsimony paper includes rearrangement of trees]
Waterman, M. S. and T. F. Smith. 1978. On the similarity of dendrograms. Journal of Theoretical
Biology73: 789-800. [Defines NNIs. Uses them to get a distance between trees.]Maddison, D. R. 1991. The discovery and importance of multiple islands of most-parsimonious
trees. Systematic Zoology40: 315-328. [Discusses heuristic search strategy involving ties,multiple starts]
Farris, J. S. 1970. Methods for computing Wagner trees. Systematic Zoology19: 83-92. [Earlyparsimony algorithms paper is one of first to mention sequential addition strategy]
Lecture 2. Tree space and searching tree space – p.46/48
continuedSaitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing
phylogenetic trees. Molecular Biology and Evolution4: 406-425. [First mention ofstar-decomposition search for best trees, sort of]
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihoodmethod for reconstructing tree topologies. Molecular Biology and Evolution13: 964-969.[Assembles trees out of quartets]
Huson, D., S. Nettles, L. Parida, T. Warnow, and S. Yooseph. 1998. The disk-covering method fortree reconstruction. pp. 62-75 in Proceedings of “Algorithms and Experiments” (ALEX98), Trento,
Italy, Feb. 9-11, 1998, ed. R. Battiti and A. A. Bertossi. [“Disk-covering method” for longstringy trees]
Swofford, D. L. and G. J. Olsen. 1990. Phylogeny reconstruction. Chapter 11, Pp. 411-501 inMolecular Systematics,ed. D. M. Hillis and C. Moritz. Sinauer Associates, Sunderland,Massachusetts. [Review that discusses strategies, names SPR and TBR rearrangementmethods]
Foulds, L. R. and R. L. Graham. 1982. The Steiner problem in phylogeny is NP-complete.Advances in Applied Mathematics3: 43-49. [Parsimony is NP-hard]
Graham, R. L. and L. R. Foulds. 1982. Unlikelihood that minimal phylogenies for a realisticbiological study can be constructed in reasonable computat ional time. Mathematical
Biosciences60: 133-142. [ ... and more]
Lecture 2. Tree space and searching tree space – p.47/48
continuedHendy, M. D. and D. Penny. 1982. Branch and bound algorithms to determine minimal
evolutionary trees. Mathematical Biosciences60: 133-142 [Introduced branch-and-bound forphylogenies]
Felsenstein, J. 2004. Inferring Phylogenies.Sinauer Associates, Sunderland, Massachusetts. [Forthis lecture the material is chapters 3, 4, and 5]
Semple, C. and M. Steel. 2003. Phylogenetics.Oxford University Press, Oxford. [Also coverssearch strategies]
Lecture 2. Tree space and searching tree space – p.48/48