computational molecular biology - nuig mathematics

Computational Molecular Biology

Lecture Thirteen: Neighbour-joining algorithm

Semester I, 2009-10

Graham Ellis, NUI Galway, Ireland

About the algorithm

Neighbour-joining is a method used for the construction of phylogenetic trees.

It is usually used for trees based on DNA or protein sequence data.

The algorithm’s input

The algorithm inputs an r × r symmetric matrix D with zeros on the diagonal.



No assumption is made about the triangle inequality, or the four-point condition.



The idea is that the matrix arises from experimental data from r taxa (e.g. DNA samples).

The algorithm’s output

The algorithm outputs a phylogenetic tree with r leaves and with lengths assigned to edges.

The algorithm

Neighbour-joining is an iterative algorithm. Each iteration consists of the following steps:

1. Based on the current distance matrix, calculate the matrix Q (explained below).

2. Find the pair of distinct taxa with the lowest value in Q. Create a node on the tree that joins these two taxa.

3. Calculate the distance of each of the taxa in the pair to this new node (using the formula below).

4. Calculate the distance of all taxa outside of this pair to the new node.

5. Start the algorithm again, considering the pair of joined neighbours as a single taxon and using the distances calculated in the previous step.
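The five steps above can be sketched in Python as follows. This is a minimal sketch rather than the lecture's own implementation: the function name `neighbour_joining`, the internal node labels `N1`, `N2`, ..., the edge-list output, the tie-breaking rule (first lowest pair found) and the stopping rule (join the last two remaining nodes directly) are all assumptions made for illustration. The formulas used inside the loop are the ones stated later in these slides.

```python
from itertools import combinations


def neighbour_joining(names, D):
    """Return the tree as a list of (node, node, length) edges.

    names: taxon labels; D: symmetric list-of-lists distance matrix.
    """
    # Dict-of-dicts copy so that nodes can be added and dropped by label.
    dist = {a: {b: D[i][j] for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 1

    while len(nodes) > 2:
        r = len(nodes)
        total = {a: sum(dist[a][k] for k in nodes) for a in nodes}

        # Steps 1 and 2: evaluate Q on distinct pairs, take the lowest
        # (assumption: ties go to the first pair encountered).
        def q(pair):
            a, b = pair
            return (r - 2) * dist[a][b] - total[a] - total[b]

        a, b = min(combinations(nodes, 2), key=q)

        # Step 3: branch lengths from the joined pair to the new node.
        new = f"N{next_id}"  # hypothetical label for internal nodes
        next_id += 1
        len_a = 0.5 * dist[a][b] + (total[a] - total[b]) / (2 * (r - 2))
        len_b = dist[a][b] - len_a
        edges += [(a, new, len_a), (b, new, len_b)]

        # Step 4: distances from every remaining node to the new node.
        dist[new] = {new: 0.0}
        for x in nodes:
            if x not in (a, b):
                d = 0.5 * (dist[a][x] - len_a) + 0.5 * (dist[b][x] - len_b)
                dist[new][x] = dist[x][new] = d

        # Step 5: treat the joined pair as a single taxon and iterate.
        nodes = [x for x in nodes if x not in (a, b)] + [new]

    # Assumed stopping rule: connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, dist[a][b]))
    return edges


example = [
    [0, 7, 11, 14],
    [7, 0, 6, 9],
    [11, 6, 0, 7],
    [14, 9, 7, 0],
]
for edge in neighbour_joining(list("ABCD"), example):
    print(edge)
```

Working on labelled dictionaries rather than re-indexed matrices keeps the bookkeeping in step 5 simple, at the cost of leaving stale entries for nodes that have already been joined.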

The Q matrix

Let D be our distance data relating r taxa. We calculate Q as follows:

\[ Q(i,j) = (r-2)\,D(i,j) - \sum_{k=1}^{r} D(i,k) - \sum_{k=1}^{r} D(j,k) \]

Example

Suppose we start with the following distance data.

D    A   B   C   D
A    0   7  11  14
B    7   0   6   9
C   11   6   0   7
D   14   9   7   0

We get the following Q matrix

Q     A    B    C    D
A   −64  −40  −34  −34
B   −40  −44  −34  −34
C   −34  −34  −48  −40
D   −34  −34  −40  −60
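The Q matrix for the example data can be reproduced with a few lines of Python. This is a minimal sketch: the function name `q_matrix` and the list-of-lists representation are assumptions, and the formula is applied to every entry, including the diagonal.

```python
def q_matrix(D):
    """Q(i, j) = (r - 2) D(i, j) - sum_k D(i, k) - sum_k D(j, k)."""
    r = len(D)
    row_sums = [sum(row) for row in D]
    return [[(r - 2) * D[i][j] - row_sums[i] - row_sums[j] for j in range(r)]
            for i in range(r)]


# Distance data from the example, rows and columns in the order A, B, C, D.
D = [
    [0, 7, 11, 14],
    [7, 0, 6, 9],
    [11, 6, 0, 7],
    [14, 9, 7, 0],
]
Q = q_matrix(D)
for label, row in zip("ABCD", Q):
    print(label, row)
```

Computing the row sums once keeps the cost at O(r²) per Q matrix instead of O(r³).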

Example (cont.)

The neighbours (A,B) and the neighbours (C,D) both have the lowest off-diagonal Q value, −40. (Diagonal entries are ignored, since a taxon is not joined to itself.) We choose either pair and join them.

Let’s choose (A,B). Our graph starts to look like

[Figure: leaves A, B, C, D, with A and B joined to a new node E]
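The tie between (A,B) and (C,D) can be checked by scanning only the entries of Q for distinct pairs. A minimal sketch, with the Q matrix of the example typed in by hand; the variable names are illustrative.

```python
from itertools import combinations

labels = "ABCD"
Q = [
    [-64, -40, -34, -34],
    [-40, -44, -34, -34],
    [-34, -34, -48, -40],
    [-34, -34, -40, -60],
]

# combinations() yields each unordered pair i < j once, so the diagonal
# (which holds lower values) is never considered.
pairs = list(combinations(range(len(Q)), 2))
lowest = min(Q[i][j] for i, j in pairs)
tied = [(labels[i], labels[j]) for i, j in pairs if Q[i][j] == lowest]
print(lowest, tied)  # -40 [('A', 'B'), ('C', 'D')]
```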

Example (cont.)

We now calculate the distance from E to the paired taxa A, B using the formula

\[ D(A,E) = \frac{1}{2}\,D(A,B) + \frac{1}{2(r-2)} \left[ \sum_{k=1}^{r} D(A,k) - \sum_{k=1}^{r} D(B,k) \right]. \]

The formula gives D(A,E) = 6, from which we deduce D(B,E) = D(A,B) − D(A,E) = 7 − 6 = 1.
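The two branch lengths can be checked numerically. This is a minimal sketch under the assumption that taxa are referred to by their row index (A = 0, B = 1); the function name `branch_lengths` is illustrative.

```python
def branch_lengths(D, a, b):
    """Distances from the joined taxa a and b to their new node."""
    r = len(D)
    d_a = 0.5 * D[a][b] + (sum(D[a]) - sum(D[b])) / (2 * (r - 2))
    d_b = D[a][b] - d_a  # the two branches together span D(a, b)
    return d_a, d_b


D = [
    [0, 7, 11, 14],
    [7, 0, 6, 9],
    [11, 6, 0, 7],
    [14, 9, 7, 0],
]
print(branch_lengths(D, 0, 1))  # (6.0, 1.0)
```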

Example (cont.)

We now calculate the distance from E to each of the other nodes X (here C and D) using the formula

\[ D(E,X) = \frac{1}{2}\,[D(A,X) - D(A,E)] + \frac{1}{2}\,[D(B,X) - D(B,E)] \]

Here E is the new node, X is the node whose distance from E we are computing, and A, B are the two nodes just joined.

We get

D    E   C   D
E    0   5   8
C    5   0   7
D    8   7   0
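The entries in the E row of the reduced matrix can be recomputed as follows. A minimal sketch, assuming row indices A = 0, B = 1, C = 2, D = 3 and taking the branch lengths D(A,E) = 6 and D(B,E) = 1 from the previous step; the function name `distance_to_new_node` is illustrative.

```python
def distance_to_new_node(D, a, b, d_a, d_b, x):
    """Distance from node x to the node that joins a and b."""
    return 0.5 * (D[a][x] - d_a) + 0.5 * (D[b][x] - d_b)


D = [
    [0, 7, 11, 14],
    [7, 0, 6, 9],
    [11, 6, 0, 7],
    [14, 9, 7, 0],
]
d_EC = distance_to_new_node(D, 0, 1, 6, 1, 2)
d_ED = distance_to_new_node(D, 0, 1, 6, 1, 3)
print(d_EC, d_ED)  # 5.0 8.0
```

The distance between C and D is untouched by the join, so it is carried over unchanged as 7.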

Example (cont.)

Now we find the next Q matrix. Use it to adjoin a new node to our tree. Then calculate a new distance matrix D.
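As a check on this step, here is one possible worked continuation using the same formulas with r = 3. The row sums of the reduced matrix are 13 for E, 12 for C and 15 for D. All three pairs tie, so the choice of (C,D) and the label F for the new node are assumptions made for illustration.

```latex
\begin{align*}
Q(E,C) &= (3-2)\cdot 5 - 13 - 12 = -20, \\
Q(E,D) &= (3-2)\cdot 8 - 13 - 15 = -20, \\
Q(C,D) &= (3-2)\cdot 7 - 12 - 15 = -20, \\[4pt]
D(C,F) &= \tfrac{1}{2}\cdot 7 + \tfrac{1}{2(3-2)}\,[12 - 15] = 2, \\
D(D,F) &= D(C,D) - D(C,F) = 7 - 2 = 5, \\
D(E,F) &= \tfrac{1}{2}\,[D(C,E) - D(C,F)] + \tfrac{1}{2}\,[D(D,E) - D(D,F)]
        = \tfrac{1}{2}(5-2) + \tfrac{1}{2}(8-5) = 3.
\end{align*}
```

With these lengths the path distances in the tree reproduce the input data exactly, for example A to C is 6 + 3 + 2 = 11 and B to D is 1 + 3 + 5 = 9.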
