Phylogenetic Trees (2): Lecture 12
Based on: Durbin et al Section 7.3, 7.8, Gusfield: Algorithms on Strings, Trees, and Sequences Section 17.
Recall: The Four Points Condition
Theorem: A set M of L objects is additive iff any subset of four objects can be labeled i, j, k, l so that:

$$d(i,k) + d(j,l) = d(i,l) + d(k,j) \ge d(i,j) + d(k,l)$$

We call {{i,j},{k,l}} the “split” of {i,j,k,l}.
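As a rough illustration of the condition, the sketch below checks it by brute force over all subsets of four objects: of the three pairwise sums, the two largest must be equal. The function name `is_additive`, the tolerance `tol`, and the list-of-lists input format are assumptions made for this example; the matrix is the four-object distance matrix used later in the lecture.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Check the four-point condition on a symmetric distance matrix d (list of lists)."""
    for i, j, k, l in combinations(range(len(d)), 4):
        # The three ways to split {i, j, k, l} into two pairs.
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # The condition holds iff the two largest sums coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Distances among four objects a, b, c, d (rows and columns in that order).
D = [[0, 3, 9, 7],
     [3, 0, 8, 6],
     [9, 8, 0, 6],
     [7, 6, 6, 0]]
print(is_additive(D))
```

This check takes O(L^4) time and only decides additivity; it does not build a tree.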
The four point condition doesn’t provide an algorithm to construct a tree from a distance matrix, or to decide that there is no such tree (i.e., that the set is not additive). The first methods for constructing trees for additive sets used neighbor joining methods.
Constructing additive trees: The neighbor joining problem
Let i, j be neighboring leaves in a tree, let k be their parent, and let m be any other vertex.
The formula

$$d(k,m) = \frac{1}{2}\left[d(i,m) + d(j,m) - d(i,j)\right]$$

shows that we can compute the distances of k to all other leaves.
This suggests the following method to construct a tree from a distance matrix:
1. Find neighboring leaves i, j in the tree.
2. Replace i, j by their parent k and recursively construct a tree T for the smaller set.
3. Add i, j as children of k in T.
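The parent-distance formula can be tried on a small case. The sketch below assumes distances are stored as a dict of dicts and uses a hypothetical helper `parent_distances`; the numbers are the four-leaf distance matrix from the example later in the lecture, where c and d are neighboring leaves.

```python
def parent_distances(d, i, j):
    """Distances from the parent k of neighboring leaves i, j to every other vertex m."""
    return {m: 0.5 * (d[i][m] + d[j][m] - d[i][j])
            for m in d if m not in (i, j)}

d = {
    "a": {"a": 0, "b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "b": 0, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "c": 0, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6, "d": 0},
}
to_parent = parent_distances(d, "c", "d")
print(to_parent)  # {'a': 5.0, 'b': 4.0}
```

In that tree the parent of c and d is at distance 2 + 3 = 5 from a and 1 + 3 = 4 from b, which agrees with the computed values.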
Neighbor Finding
How can we find from distances alone a pair of nodes which are neighboring leaves?
Closest nodes aren’t necessarily neighboring leaves.
[Figure: a tree with leaves A, B, C, D in which the closest pair of leaves are not neighbors]
Next we show one way to find neighbors from distances.
Neighbor Finding: Saitou & Nei method
Definition: For a leaf i, let

$$r_i = \sum_{m \text{ is a leaf}} d(i,m).$$

Let i, j be leaves. Then

$$D(i,j) = (L-2)\,d(i,j) - (r_i + r_j),$$

where L is the number of leaves in T.
Theorem (Saitou & Nei): Assume all edge weights are positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree.
[Figure: leaves i and j joined through internal vertices l and k, with subtrees T1 and T2 and another leaf m]
The proof is rather involved, and will be skipped.
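A minimal sketch of the Saitou & Nei criterion, assuming a dict-of-dicts distance table over the leaves; the function name `saitou_nei_pair` is made up for this illustration. It computes r_i for every leaf and returns a pair minimizing D(i,j).

```python
from itertools import combinations

def saitou_nei_pair(d):
    """Return a pair of leaves (i, j) minimizing D(i,j) = (L-2) d(i,j) - (r_i + r_j)."""
    leaves = list(d)
    L = len(leaves)
    # r_i is the sum of distances from leaf i to all leaves (d[i][i] = 0 adds nothing).
    r = {i: sum(d[i][m] for m in leaves) for i in leaves}

    def D(i, j):
        return (L - 2) * d[i][j] - (r[i] + r[j])

    return min(combinations(leaves, 2), key=lambda p: D(*p))

d = {
    "a": {"a": 0, "b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "b": 0, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "c": 0, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6, "d": 0},
}
pair = saitou_nei_pair(d)
print(pair)
```

For this four-leaf matrix D(a,b) = D(c,d) = -30 while every other pair gives -24, so the minimum is attained at the two pairs of neighboring leaves; which of the tied pairs is returned depends on iteration order.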
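Saitou & Nei proof (to be skipped)
Notations used in the proof:
p(i,j) = the path from vertex i to vertex j. For example, P(D,C) = (e1,e2,e3) = (D,E,F,C).
[Figure: tree with leaves A, B, C, D and internal vertices E, F; the path from D to C uses edges e1, e2, e3]
For a vertex i and an edge e = (i’,j’): $N_i(e) = |\{k : k \text{ is a leaf and } e \text{ is on } p(i,k)\}|$.
In the example, $N_D(e_1) = 3$, $N_D(e_2) = 2$, $N_D(e_3) = 1$, $N_C(e_1) = 1$.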
Saitou&Nei proof
Observe that

$$r_i = \sum_{m \text{ is a leaf}} d(i,m) = \sum_{e \in E} d(e)\,N_i(e).$$

For leaves i, j connected by a path (i, l, ..., k, j):

$$r_i - r_j = \sum_{e \in p(i,j)} d(e)\,[N_i(e) - N_j(e)] = (L-2)\,[d(i,l) - d(k,j)] + \sum_{e \in p(l,k)} d(e)\,[N_i(e) - N_j(e)].$$

[Figure: the path from i to j through l and k, with the rest of T attached along it]
Saitou&Nei proof
Proof of Theorem: Assume for contradiction that D(i,j) is minimized for i, j which are not neighboring leaves.
Let (i,l,...,k,j) be the path from i to j. Let T1 and T2 be the subtrees rooted at k and l which do not contain edges from P(i,j) (see figure).
[Figure: leaves i and j joined by the path through l and k, with subtrees T1 and T2 hanging off the path]
Notation: |T| = #(leaves in T).
Saitou&Nei proof
Case 1: i or j has a neighboring leaf. WLOG j has a neighbor leaf m.
[Figure: path from i to j through l and k; m is a neighbor of j at k, and T2 hangs off l]

A. $D(i,j) - D(m,j) = (L-2)\,(d(i,j) - d(j,m)) - (r_i + r_j) + (r_m + r_j) = (L-2)\,(d(i,k) - d(k,m)) + r_m - r_i$

B. $r_m - r_i \ge (L-2)\,(d(k,m) - d(i,l)) + (4-L)\,d(k,l)$
(since for each edge $e \in P(k,l)$, $N_m(e) \ge 2$ and $N_i(e) \le L-2$)

Substituting B in A:

$$D(i,j) - D(m,j) \ge (L-2)\,(d(i,k) - d(i,l)) + (4-L)\,d(k,l) = 2\,d(k,l) > 0,$$

contradicting the minimality assumption.
Saitou&Nei proof
Case 2: Not case 1. Then both T1 and T2 contain 2 neighboring leaves. WLOG |T2| ≥ |T1|. Let n, m be neighboring leaves in T1. We shall prove that D(m,n) < D(i,j), which will again contradict the minimality assumption.
[Figure: path from i to j through l and k; T1 contains the neighboring leaves m and n, joined at p, and T2 is the other subtree]
Saitou&Nei proof
[Figure: the same tree as in Case 2, with leaves i, j, m, n, vertices k, l, p and subtrees T1, T2]

A. $0 \le D(m,n) - D(i,j) = (L-2)\,(d(m,n) - d(i,j)) + (r_i + r_j) - (r_m + r_n)$

B. $r_j - r_m < (L-2)\,(d(j,k) - d(m,p)) + (|T_1| - |T_2|)\,d(k,p)$

C. $r_i - r_n < (L-2)\,(d(i,k) - d(n,p)) + (|T_1| - |T_2|)\,d(l,p)$

Adding B and C, noting that $d(l,p) > d(k,p)$:

D. $(r_i + r_j) - (r_m + r_n) < (L-2)\,(d(i,j) - d(n,m)) + 2\,(|T_1| - |T_2|)\,d(k,p)$

Substituting D in the right hand side of A:

$$D(m,n) - D(i,j) < 2\,(|T_1| - |T_2|)\,d(k,p) \le 0,$$

as claimed.
A simpler neighbor finding method:
Select an arbitrary node r. For each pair of labeled nodes (i,j), let C(i,j) be defined by the following figure and formula:
[Figure: node r with paths to i and j; C(i,j) is marked on the part shared by both paths]

$$C(i,j) = \frac{1}{2}\left[d(i,r) + d(j,r) - d(i,j)\right]$$

Claim (from final exam, Winter 02-3): Let i, j be such that C(i,j) is maximized. Then i and j are neighboring leaves.
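A small sketch of this neighbor-finding rule, under the assumption that r is one of the leaves and that candidate pairs exclude r (any pair containing r has C = 0). The helper name `best_pair` and the dict-of-dicts input are choices made for this example.

```python
from itertools import combinations

def best_pair(d, r):
    """Return a pair (i, j), both different from r, that maximizes C(i,j)."""
    others = [x for x in d if x != r]

    def C(i, j):
        return 0.5 * (d[i][r] + d[j][r] - d[i][j])

    return max(combinations(others, 2), key=lambda p: C(*p))

d = {
    "a": {"a": 0, "b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "b": 0, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "c": 0, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6, "d": 0},
}
print(best_pair(d, "a"))  # ('c', 'd')
```

With r = a, the values are C(b,c) = 2, C(b,d) = 2 and C(c,d) = 5, so the rule picks c and d, which are neighboring leaves in the four-leaf example tree used later in the lecture.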
Neighbor Joining Algorithm
Set M to contain all leaves, and select a root r. |M| = L.
If L = 2, return a tree of two vertices.
Iteration:
1. Choose i, j such that C(i,j) is maximal.
2. Create a new vertex k, and set

$$d(i,k) = \frac{1}{2}\left[d(i,j) + d(i,r) - d(j,r)\right]$$

$$d(j,k) = d(i,j) - d(i,k)$$

$$d(k,m) = \frac{1}{2}\left[d(i,m) + d(j,m) - d(i,j)\right] \quad \text{for each node } m$$

3. Remove i, j, and add k to M.
4. Recursively construct a tree on the smaller set, then add i, j as children of k, at distances d(i,k) and d(j,k).
[Figure: new vertex k with children i and j, and another node m]
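The steps of the algorithm can be sketched as follows. This is an illustrative version under several assumptions: the input is an additive dict-of-dicts distance table, the root r is one of the leaves, the recursion is unrolled into a loop, and new vertices get the made-up names `k0`, `k1`, and so on. The function name `neighbor_join` and the edge-list output format are also choices made for this example.

```python
from itertools import combinations

def neighbor_join(d, root):
    """Build a tree from an additive distance table; returns a list of edges (u, v, weight)."""
    d = {u: dict(row) for u, row in d.items()}  # work on a copy
    nodes = set(d)
    edges = []
    counter = 0
    while len(nodes) > 2:
        # Choose i, j (both different from the root) maximizing C(i,j).
        candidates = combinations(sorted(nodes - {root}), 2)
        i, j = max(candidates,
                   key=lambda p: 0.5 * (d[p[0]][root] + d[p[1]][root] - d[p[0]][p[1]]))
        k = f"k{counter}"
        counter += 1
        d_ik = 0.5 * (d[i][j] + d[i][root] - d[j][root])
        d_jk = d[i][j] - d_ik
        # Distances from the new vertex k to every remaining node m.
        d[k] = {k: 0.0}
        for m in nodes - {i, j}:
            d[k][m] = d[m][k] = 0.5 * (d[i][m] + d[j][m] - d[i][j])
        edges += [(k, i, d_ik), (k, j, d_jk)]
        nodes = (nodes - {i, j}) | {k}
    u, v = sorted(nodes)
    edges.append((u, v, d[u][v]))  # base case: a tree of two vertices
    return edges

d = {
    "a": {"a": 0, "b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "b": 0, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "c": 0, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6, "d": 0},
}
edges = neighbor_join(d, "a")
for e in edges:
    print(e)
```

On this input the sketch first joins c and d (at distances 4 and 2 from their parent), then joins b with that parent, and finally connects the result to a. The edge weights 4, 2, 1, 3, 2 are those of the four-leaf example tree used later in the lecture.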
Complexity of Neighbor Joining Algorithm
Naive Implementation:
Initialization: Θ(L²) to compute the C(i,j)’s.
Each iteration:
O(L) to update {C(i,k) : i ∈ M} for the new node k.
O(L²) to find the maximal C(i,j).
Total of O(L³).
Complexity of Neighbor Joining Algorithm
Using a heap to store the C(i,j)’s:
Initialization: Θ(L²) to compute and heapify the C(i,j)’s.
Each iteration:
O(1) to find the maximal C(i,j).
O(L log L) to delete {C(m,i), C(m,j)} and add C(m,k) for all vertices m.
Total of O(L² log L).
(implementation details are omitted)
Ultrametric trees
A more recent (and more efficient) way of constructing and identifying additive trees.
Idea: Reduce the problem to constructing trees by the “heights” of the internal nodes. For leaves i, j, D(i,j) represents the “height” of the least common ancestor of i and j.
[Figure: tree with leaves A, E, D, C, B and internal nodes labeled 8, 5, 3, 3]
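Ultrametric Trees
Definition: T is an ultrametric tree for a symmetric positive real matrix D if:
1. The leaves of T correspond to the rows and columns of D.
2. Internal nodes have at least two children, and the Least Common Ancestor of i and j is labeled by D(i,j).
3. The labels decrease along paths from root to leaves.

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 0 | 8 | 8 | 5 | 3 |
| B |   | 0 | 3 | 8 | 8 |
| C |   |   | 0 | 8 | 8 |
| D |   |   |   | 0 | 5 |
| E |   |   |   |   | 0 |

[Figure: the ultrametric tree for this matrix, with leaves A, E, D, C, B and internal nodes labeled 8, 5, 3, 3]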
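To make the definition concrete, the sketch below reads the matrix back off a labeled tree: every pair of leaves gets the label of its least common ancestor. The nested-tuple encoding `(label, [children])`, with leaves as strings, and the function name `lca_matrix` are assumptions for this example. The tree shape (root 8; one child labeled 5 over D and a node 3 over A, E; the other child labeled 3 over B, C) is the one implied by the matrix above.

```python
def lca_matrix(tree):
    """Return a dict mapping (x, y) to the label of the LCA of leaves x and y."""
    D = {}

    def visit(node):
        if isinstance(node, str):          # a leaf
            return [node]
        label, children = node
        groups = [visit(child) for child in children]
        # Leaves in different child subtrees have this node as their LCA.
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for x in groups[a]:
                    for y in groups[b]:
                        D[x, y] = D[y, x] = label
        return [leaf for group in groups for leaf in group]

    visit(tree)
    return D

tree = (8, [(5, ["D", (3, ["A", "E"])]), (3, ["B", "C"])])
D = lca_matrix(tree)
print(D["A", "E"], D["A", "D"], D["B", "C"], D["A", "B"])  # 3 5 3 8
```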
Centrality of Ultrametric Trees
We will study later the following question: Given a symmetric positive real matrix D, is there an ultrametric tree T for D?
But first we show that an algorithm that constructs ultrametric trees from a matrix (or decides that no such tree exists) can be used to construct trees for additive sets and to solve other related problems.
Transforming Ultrametric Trees to Weighted Trees
Use the labels to define weights for all edges in the natural way: the weight of an edge is the difference between the labels of its endpoints. For this, consider the labels of leaves to be 0. We get an additive ultrametric tree whose height is the label of the root.
[Figure: the ultrametric tree on leaves A, E, D, C, B with labels 8, 5, 3, 3, shown next to the same tree with edge weights 2, 5, 3, 3, 3, 5, 3, 3]
Note that in this tree all leaves are at the same height. This is why it is called ultrametric.
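A brief sketch of this transformation, assuming the same nested-tuple encoding `(label, [children])` with leaves as strings (an encoding chosen for this example, not taken from the lecture). Each edge gets weight "parent label minus child label", with leaves labeled 0, and the code accumulates the distance of every leaf from the root.

```python
def label(node):
    """Label of a node; leaves are treated as having label 0."""
    return 0 if isinstance(node, str) else node[0]

def leaf_depths(node, depth=0, out=None):
    """Distance from the root to every leaf, using edge weight = parent label - child label."""
    out = {} if out is None else out
    if isinstance(node, str):
        out[node] = depth
    else:
        for child in node[1]:
            leaf_depths(child, depth + label(node) - label(child), out)
    return out

tree = (8, [(5, ["D", (3, ["A", "E"])]), (3, ["B", "C"])])
depths = leaf_depths(tree)
print(depths)  # every leaf is at distance 8, the label of the root
```

The edge weights telescope along each root-to-leaf path, which is why all leaves end up at the same height.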
Transforming Weighted Trees to Ultrametric Trees
A weighted Tree T can be transformed to an ultrametric tree T’ as follows:
Step 1: Pick a node k as a root, and “hang” the tree at k.
[Figure: the weighted tree T with leaves a, b, c, d and edge weights 2, 1, 3, 4, 2, shown before and after hanging it at k = a]
Transforming Weighted Trees to Ultrametric Trees
Step 2: Let $M = \max_i d(i,k)$. M is taken to be the height of T’. Label the root by M, and label each internal node j by M - d(k,j).
[Figure: the tree hung at k = a, with M = 9 and node labels 9, 7, 4]
Transforming Weighted Trees to Ultrametric Trees
Step 3: “Stretch” the edges to the leaves so that all leaves are at distance M from the root.
[Figure: the labeled tree (M = 9) before and after stretching; the amounts added to the leaf edges are (9), (6), (2), (0)]
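Steps 2 and 3 can be traced on the example tree with a few lines of code. The sketch assumes the tree is given as an edge list; the internal node names `x` and `y` are hypothetical (the slides do not name them), and the helper `distances_from` is made up for this illustration.

```python
def distances_from(adj, k):
    """Distance from k to every node of a weighted tree given as an adjacency dict."""
    dist = {k: 0}
    stack = [k]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

# Example tree: a-x (2), x-b (1), x-y (3), y-c (4), y-d (2); x and y are internal nodes.
edges = [("a", "x", 2), ("x", "b", 1), ("x", "y", 3), ("y", "c", 4), ("y", "d", 2)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))

leaves = ["a", "b", "c", "d"]
k = "a"
dist = distances_from(adj, k)
M = max(dist[i] for i in leaves)                              # height of T'
labels = {j: M - dist[j] for j in adj if j not in leaves}     # Step 2: internal node labels
stretch = {i: M - dist[i] for i in leaves}                    # Step 3: amount added per leaf edge
print(M, labels, stretch)
```

Here M = 9, the internal nodes get labels 7 and 4 (with the root labeled 9), and the leaf edges of a, b, c, d are stretched by 9, 6, 0 and 2 respectively.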
Reconstructing Weighted Trees from Ultrametric Trees
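M = 9
The weight of an internal edge is the difference between the labels of its endpoints. The weight of an edge to leaf i is obtained by subtracting M - d(k,i) from its current weight.
[Figure: the ultrametric tree T’ with leaf-edge corrections 7 (-6), 9 (-9), 4 (-2), and the reconstructed weighted tree with edge weights 1, 2, 3, 4, 0, 2]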
Solving the Additive Tree Problem by the Ultrametric Problem: Outline
We solve the additive tree problem by reducing it to the ultrametric problem as follows:
1. Given an input matrix D = D(i,j) of distances, transform it to a
matrix D’= D’(i,j), where D’(i,j) is the height of the LCA of i
and j in the corresponding ultrametric tree T’.
2. Construct the ultrametric tree, T’, for D’.
3. Reconstruct the additive tree T from T’.
How D’ is constructed from D
D’(i,j) should be the height of the Least Common Ancestor of i and j in T’, the ultrametric tree hung at k. Let m denote this ancestor.
Thus, D’(i,j) = M - d(k,m), where d(k,m) is computed by:

$$d(k,m) = \frac{1}{2}\left(d(i,k) + d(j,k) - d(i,j)\right)$$

(Here, k = a, i = b, j = c.)
[Figure: the tree hung at k = a, with edge weights 2, 1, 3, 4, 2 and labels 9, 7]
The transformation of D to D’
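Distance matrix D (tree T):

|   | a | b | c | d |
|---|---|---|---|---|
| a |   | 3 | 9 | 7 |
| b |   |   | 8 | 6 |
| c |   |   |   | 6 |
| d |   |   |   |   |

Ultrametric matrix D’ (tree T’, M = 9):

|   | a | b | c | d |
|---|---|---|---|---|
| a |   | 9 | 9 | 9 |
| b |   |   | 7 | 7 |
| c |   |   |   | 4 |
| d |   |   |   |   |

[Figure: T, the weighted tree with edge weights 2, 1, 3, 4, 2, and T’, the ultrametric tree with labels 9, 7, 4]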
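The transformation of D to D’ can be reproduced directly from the formula D’(i,j) = M - (d(i,k) + d(j,k) - d(i,j))/2. The sketch below is an illustration only: the function name `to_ultrametric_matrix` and the dict-of-dicts format are assumptions, and the diagonal is set to 0 by convention since the slides leave it blank.

```python
def to_ultrametric_matrix(d, k):
    """Compute D'(i,j) = M - (d(i,k) + d(j,k) - d(i,j)) / 2 for all i != j."""
    M = max(d[k].values())
    names = list(d)
    return {
        i: {j: 0 if i == j else M - 0.5 * (d[i][k] + d[j][k] - d[i][j])
            for j in names}
        for i in names
    }

d = {
    "a": {"a": 0, "b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "b": 0, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "c": 0, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6, "d": 0},
}
Dp = to_ultrametric_matrix(d, "a")
print(Dp["a"]["b"], Dp["b"]["c"], Dp["b"]["d"], Dp["c"]["d"])  # 9.0 7.0 7.0 4.0
```

With k = a the computed entries agree with the ultrametric matrix D’ given for this example.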
Identifying Ultrametric Trees
Definition: A symmetric matrix D is ultrametric iff for every three indices i, j, k
D(i,j) ≤ max {D(i,k), D(j,k)}
(i.e., among D(i,j), D(i,k) and D(j,k) there is a tie for the maximum value).
Theorem: D has an ultrametric tree iff it is ultrametric
Proof: Next lecture.
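The definition can be checked by brute force over all triples: of the three values D(i,j), D(i,k), D(j,k), the two largest must be equal. The sketch below assumes a dict-of-dicts matrix and exact (integer) entries; the function name `is_ultrametric` is chosen for this example. It only tests the condition and does not construct the tree.

```python
from itertools import combinations

def is_ultrametric(D):
    """Check that in every triple of indices the maximum value is attained at least twice."""
    for i, j, k in combinations(list(D), 3):
        values = sorted([D[i][j], D[i][k], D[j][k]])
        if values[1] != values[2]:
            return False
    return True

def symmetric(names, upper):
    """Build a dict-of-dicts matrix from the rows of an upper triangle (diagonal = 0)."""
    D = {x: {x: 0} for x in names}
    for a, row in zip(range(len(names)), upper):
        for b, value in zip(range(a + 1, len(names)), row):
            D[names[a]][names[b]] = D[names[b]][names[a]] = value
    return D

# The matrix of the ultrametric tree on A..E, and the additive (non-ultrametric) matrix D.
U = symmetric("ABCDE", [[8, 8, 5, 3], [3, 8, 8], [8, 8], [5]])
T = symmetric("abcd", [[3, 9, 7], [8, 6], [6]])
print(is_ultrametric(U), is_ultrametric(T))  # True False
```

The second matrix fails on the triple a, b, c, where the three values are 3, 9 and 8.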