Computational Molecular Biology
Lecture Thirteen: Neighbour-joining algorithm
Semester I, 2009-10
Graham Ellis, NUI Galway, Ireland
About the algorithm
Neighbour-joining is a method used for the construction of phylogenetic trees.
It is usually used for trees based on DNA or protein sequence data.
The algorithm’s input
The algorithm inputs an r × r symmetric matrix D with zeros on the diagonal.
No assumption is made about the triangle inequality or the four-point condition.
The idea is that the matrix arises from experimental data from r taxa (e.g. DNA samples).
The algorithm’s output
The algorithm outputs a phylogenetic tree with r leaves and with lengths assigned to edges.
The algorithm
Neighbour-joining is an iterative algorithm. Each iteration consists of the following steps:
1. Based on the current distance matrix, calculate the matrix Q (explained below).
2. Find the pair of distinct taxa with the lowest value in Q. Create a node on the tree that joins these two taxa.
3. Calculate the distance of each of the taxa in the pair to this new node (using the formula below).
4. Calculate the distance of all taxa outside of this pair to the new node.
5. Start the algorithm again, considering the pair of joined neighbours as a single taxon and using the distances calculated in the previous step.
The Q matrix
Let D be our distance data relating r taxa. We calculate Q as follows:
\[
Q(i,j) = (r-2)\,D(i,j) - \sum_{k=1}^{r} D(i,k) - \sum_{k=1}^{r} D(j,k)
\]
Example
Suppose we start with the following distance data.
D     A    B    C    D
A     0    7   11   14
B     7    0    6    9
C    11    6    0    7
D    14    9    7    0
We get the following Q matrix
Q     A    B    C    D
A   −64  −40  −34  −34
B   −40  −44  −34  −34
C   −34  −34  −48  −40
D   −34  −34  −40  −60
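The Q matrix above can be checked with a short computation. The following is a minimal Python sketch, assuming the distance data is held as a list of lists; the function name q_matrix is an illustrative choice, not something from the lecture.

```python
def q_matrix(D):
    """Return Q(i, j) = (r - 2) D(i, j) - sum_k D(i, k) - sum_k D(j, k)."""
    r = len(D)
    row_sums = [sum(row) for row in D]
    return [[(r - 2) * D[i][j] - row_sums[i] - row_sums[j] for j in range(r)]
            for i in range(r)]

# Distance data from the example, rows and columns ordered A, B, C, D.
D = [[0, 7, 11, 14],
     [7, 0, 6, 9],
     [11, 6, 0, 7],
     [14, 9, 7, 0]]

Q = q_matrix(D)
for label, row in zip("ABCD", Q):
    print(label, row)
```

Note that applying the formula with i = j gives the diagonal entries shown in the table; only the off-diagonal entries are used when choosing neighbours.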
Example (cont.)
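The pairs (A,B) and (C,D) both attain the lowest off-diagonal Q value, −40. We choose either pair and join them.
Let’s choose (A,B) and join A and B to a new node E. Our graph starts to look like
[Figure: the leaves A, B, C, D, with A and B each joined by an edge to the new node E.]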
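The choice of neighbours can be sketched in Python as a minimum over pairs of distinct taxa. The helper name closest_pair is hypothetical, and the tie-breaking rule here (take the first minimal pair in index order) is an assumption; the lecture allows either pair.

```python
from itertools import combinations

def closest_pair(Q):
    """Return indices (i, j), i < j, of a smallest off-diagonal entry of Q."""
    r = len(Q)
    # min() keeps the first minimal pair, so ties go to the earliest pair.
    return min(combinations(range(r), 2), key=lambda p: Q[p[0]][p[1]])

labels = ["A", "B", "C", "D"]
Q = [[-64, -40, -34, -34],
     [-40, -44, -34, -34],
     [-34, -34, -48, -40],
     [-34, -34, -40, -60]]

i, j = closest_pair(Q)
print(labels[i], labels[j])  # prints "A B"
```

Restricting to pairs with i < j skips the diagonal, whose entries are lower than −40 but do not correspond to a pair of taxa.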
Example (cont.)
We now calculate the distance from E to the paired taxa A, B using the formula
\[
D(A,E) = \frac{1}{2} D(A,B) + \frac{1}{2(r-2)} \left[ \sum_{k=1}^{r} D(A,k) - \sum_{k=1}^{r} D(B,k) \right].
\]
The formula gives D(A,E) = 7/2 + (32 − 22)/4 = 6, from which we deduce D(B,E) = D(A,B) − D(A,E) = 7 − 6 = 1.
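As a check on the arithmetic, here is a small Python sketch of this step. The function name branch_lengths is illustrative only, and the second length is obtained as D(i, j) minus the first, which assumes the two new edges together account for the whole distance between the joined taxa.

```python
def branch_lengths(D, i, j):
    """Distances from taxa i and j to the new node that joins them."""
    r = len(D)
    d_i = 0.5 * D[i][j] + (sum(D[i]) - sum(D[j])) / (2 * (r - 2))
    d_j = D[i][j] - d_i
    return d_i, d_j

# Distance data from the example, rows and columns ordered A, B, C, D.
D = [[0, 7, 11, 14],
     [7, 0, 6, 9],
     [11, 6, 0, 7],
     [14, 9, 7, 0]]

d_AE, d_BE = branch_lengths(D, 0, 1)
print(d_AE, d_BE)  # prints "6.0 1.0"
```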
Example (cont.)
We now calculate the distance from E to each of the other nodes X using the formula
\[
D(E,X) = \frac{1}{2}\left[ D(A,X) - D(A,E) \right] + \frac{1}{2}\left[ D(B,X) - D(B,E) \right]
\]
Here E is the new node, X is the node whose distance from E we are computing, and A, B are the two nodes just joined.
We get
D    E    C    D
E    0    5    8
C    5    0    7
D    8    7    0
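The reduced matrix can be reproduced with the sketch below. The function name reduce_matrix and its argument order are assumptions made for illustration; the new node is placed first, matching the layout of the table above.

```python
def reduce_matrix(D, labels, i, j, d_i, d_j, new_label):
    """Replace taxa i and j by a new node and return (new_D, new_labels)."""
    keep = [k for k in range(len(D)) if k not in (i, j)]
    # Distance from the new node to each remaining node X.
    to_new = [0.5 * (D[i][k] - d_i) + 0.5 * (D[j][k] - d_j) for k in keep]
    new_D = [[0.0] + to_new]
    for pos, k in enumerate(keep):
        new_D.append([to_new[pos]] + [D[k][m] for m in keep])
    return new_D, [new_label] + [labels[k] for k in keep]

# Distance data from the example, with D(A,E) = 6 and D(B,E) = 1.
D = [[0, 7, 11, 14],
     [7, 0, 6, 9],
     [11, 6, 0, 7],
     [14, 9, 7, 0]]

new_D, new_labels = reduce_matrix(D, ["A", "B", "C", "D"], 0, 1, 6, 1, "E")
print(new_labels)
for row in new_D:
    print(row)
```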
Example (cont.)
Now we find the next Q matrix. Use it to adjoin a new node to our tree. Then calculate a new distance matrix D.
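To see where the iteration leads, the whole procedure can be put into one loop. This is only a sketch under stated assumptions: new nodes are named N1, N2, ... (so N1 plays the role of E), ties in Q are broken by taking the first minimal pair, the input has at least two taxa, and the loop stops when two nodes remain, which are then joined by a single edge. The lecture does not specify the stopping rule, so that last choice in particular is an assumption.

```python
from itertools import combinations

def neighbour_joining(D, labels):
    """Return the tree as a list of (node, node, edge_length) triples."""
    D = [[float(x) for x in row] for row in D]
    labels = list(labels)
    edges = []
    created = 0
    while len(labels) > 2:
        r = len(labels)
        sums = [sum(row) for row in D]
        # Steps 1-2: pair of distinct nodes with the lowest Q value.
        i, j = min(combinations(range(r), 2),
                   key=lambda p: (r - 2) * D[p[0]][p[1]] - sums[p[0]] - sums[p[1]])
        # Step 3: distances from the joined pair to the new node.
        d_i = 0.5 * D[i][j] + (sums[i] - sums[j]) / (2 * (r - 2))
        d_j = D[i][j] - d_i
        created += 1
        new = f"N{created}"  # hypothetical naming scheme for new nodes
        edges += [(labels[i], new, d_i), (labels[j], new, d_j)]
        # Step 4: distances from the new node to every other node.
        keep = [k for k in range(r) if k not in (i, j)]
        to_new = [0.5 * (D[i][k] - d_i) + 0.5 * (D[j][k] - d_j) for k in keep]
        # Step 5: reduced matrix, with the new node listed first.
        D = [[0.0] + to_new] + [[to_new[a]] + [D[k][m] for m in keep]
                                for a, k in enumerate(keep)]
        labels = [new] + [labels[k] for k in keep]
    # Assumed stopping rule: join the last two nodes directly.
    edges.append((labels[0], labels[1], D[0][1]))
    return edges

D = [[0, 7, 11, 14],
     [7, 0, 6, 9],
     [11, 6, 0, 7],
     [14, 9, 7, 0]]

edges = neighbour_joining(D, "ABCD")
for edge in edges:
    print(edge)
```

On the example data, the path lengths in the resulting tree reproduce the original distances; for instance, the path from A to C has length 6 + 3 + 2 = 11.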