The Saitou&Nei Neighbor Joining Algorithm ©Shlomo Moran & Ilan Gronau


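Recall: Distance-Based Reconstruction:

• Input: distances between all taxon-pairs
• Output: a tree (edge-weighted) best-describing the distances

Distance matrix D (lower triangle):

 0
 3  0
 9  8  0
15 14 18  0
17 16 20 22  0
16 15 19 21  9  0

[Figure: the output edge-weighted tree, with edge weights 4, 5, 7, 2, 1, 2, 10, 6, 1.]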


Requirements from Distance-Based Tree-Reconstruction Algorithms

1. Consistency: If the input metric is a tree metric, the returned tree should be the (unique) tree which fits this metric.

2. Efficiency: polynomial time, preferably no more than O(n^3), where n is the number of leaves (i.e., the distance matrix is n✕n).

3. Robustness: if the input matrix is “close” to a tree metric, the algorithm should return the corresponding tree.

Definition: Tree metric or additive distances are distances which can be realized by a weighted tree.

A natural family of algorithms which satisfy 1 and 2 is called “Neighbor Joining”, presented next. Then we present one such algorithm which is known to be robust in practice.
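To make the definition of a tree metric concrete, here is a minimal Python sketch that tests whether a matrix is additive. It relies on the four-point condition, a standard characterization of tree metrics that is not stated in these slides, so treat it as an assumption. The name `is_additive` is illustrative, the check covers only quadruples of distinct leaves, and D is assumed to be symmetric with a zero diagonal. The matrix is the 6✕6 example D from the recall slide.

```python
from itertools import combinations

# Example matrix D from the recall slide (leaves 1..6 stored at indices 0..5).
D_EXAMPLE = [
    [0, 3, 9, 15, 17, 16],
    [3, 0, 8, 14, 16, 15],
    [9, 8, 0, 18, 20, 19],
    [15, 14, 18, 0, 22, 21],
    [17, 16, 20, 22, 0, 9],
    [16, 15, 19, 21, 9, 0],
]


def is_additive(D, tol=1e-9):
    """Four-point condition (assumed characterization, not from the slides):
    for every four distinct leaves, the two largest of the three pairwise
    sums must be equal."""
    n = len(D)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


if __name__ == "__main__":
    print(is_additive(D_EXAMPLE))
```

The check runs in O(n^4) time, so it is meant for illustrating the definition on small inputs, not as part of a reconstruction algorithm.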


The Neighbor Joining Tree-Reconstruction Scheme

Start with an n✕n distance matrix D over a set S of n taxa (or vertices, or leaves).

1. Use D to select a pair of neighboring leaves (cherries) i,j.

2. Define a new vertex v as the parent of the cherries i,j.

3. Compute a reduced (n-1)✕(n-1) distance matrix D’ over S’ = (S \ {i,j}) ∪ {v}.

Important: the distances from v to the other vertices in S’ must be computed so that D’ is the distance matrix of the reduced tree T’, obtained by pruning i,j from T.

[Figure: the matrix D, the reduced matrix D’, and the cherries i, j with their new parent v.]

The Neighbor Joining Tree-Reconstruction Scheme (cont)

4. Apply the method recursively on the reduced matrix D’ to get the reduced tree T’.

5. In T’, add i,j as children of v (and possibly update edge lengths).

Recursion base: when there are only two objects, return a tree with 2 leaves.
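The five steps of the scheme can be sketched in Python as a recursive function that takes the cherry-selection rule as a parameter. This is a minimal sketch under stated assumptions: the slides do not give the formula for the distances from v, so the code uses the reduction D’(v,k) = (D(i,k) + D(j,k) - D(i,j)) / 2, which holds for additive distances. The name `neighbor_joining_scheme`, the nested-tuple tree encoding, and the omission of edge lengths are also choices made for illustration.

```python
def neighbor_joining_scheme(D, labels, select_pair):
    """Generic NJ scheme. Returns the tree topology as nested tuples.

    D           -- square distance matrix (list of lists)
    labels      -- one label per row of D
    select_pair -- function D -> (i, j), assumed to return a pair of cherries
    """
    n = len(labels)
    if n == 2:                      # recursion base: a tree with 2 leaves
        return (labels[0], labels[1])

    i, j = select_pair(D)           # step 1: select cherries i, j
    keep = [k for k in range(n) if k != i and k != j]

    # Steps 2-3: new vertex v replaces i, j. The distance from v to each
    # remaining k uses the additive-metric reduction (an assumption here).
    dv = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]
    reduced = [[D[a][b] for b in keep] + [dv[x]] for x, a in enumerate(keep)]
    reduced.append(dv + [0.0])

    # Labelling v by the pair of its children performs step 5 implicitly:
    # wherever v appears in the reduced tree, i and j hang below it.
    reduced_labels = [labels[k] for k in keep] + [(labels[i], labels[j])]

    # Step 4: recurse on the reduced matrix.
    return neighbor_joining_scheme(reduced, reduced_labels, select_pair)


if __name__ == "__main__":
    # With three leaves every pair is a cherry, so any selector is valid.
    D3 = [[0, 3, 4],
          [3, 0, 5],
          [4, 5, 0]]
    print(neighbor_joining_scheme(D3, ["a", "b", "c"], lambda D: (0, 1)))
```

Passing the selection rule as an argument keeps the scheme separate from the question of how cherries are found.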

[Figure: the reduced matrix D’ and the reduced tree T’, with i, j added as children of v.]

Question: how can we find cherries?


Least Common Ancestor Depth

Let i,j be leaves in T, and let r ≠ i,j be a vertex in T. LCAr(i,j) is the Least Common Ancestor of i and j when r is viewed as a root. If r is fixed we just write LCA(i,j). dT(r,LCA(i,j)) is the “depth of LCAr(i,j)”.

[Figure: tree rooted at r with leaves i, j; the path from r to LCA(i,j) has length dT(r,LCA(i,j)).]


Cherries maximize the LCA Depth

Let T be a weighted tree with a root r. For leaves i,j ≠ r, let L(i,j) = dT(r,LCA(i,j)).

Then if

$L(i_0, j_0) = \max_{i,j} \{ L(i,j) \}$

then $i_0$ and $j_0$ are cherries. This property can be used to select cherry pairs. The “Saitou&Nei” NJ algorithm uses a variant of this property.

[Figure: tree rooted at r in which the leaves i, j hang from their common parent v.]
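As an illustration of this property, the following Python sketch selects a cherry by maximizing the LCA depth for a fixed root leaf r. It assumes D is a tree metric, so that the depth can be read off the distances through the identity 2·dT(r,LCA(i,j)) = D(r,i) + D(r,j) - D(i,j). The function names are hypothetical, and the matrix is the 6✕6 example D from the recall slide (leaves 1..6 stored at indices 0..5).

```python
D_EXAMPLE = [
    [0, 3, 9, 15, 17, 16],
    [3, 0, 8, 14, 16, 15],
    [9, 8, 0, 18, 20, 19],
    [15, 14, 18, 0, 22, 21],
    [17, 16, 20, 22, 0, 9],
    [16, 15, 19, 21, 9, 0],
]


def lca_depths(D, r):
    """L(i,j) = d_T(r, LCA_r(i,j)) for all pairs i < j with i, j != r.

    Assumes D is additive, so that
    2 * L(i,j) = D(r,i) + D(r,j) - D(i,j).
    """
    n = len(D)
    return {(i, j): (D[r][i] + D[r][j] - D[i][j]) / 2
            for i in range(n) for j in range(i + 1, n)
            if i != r and j != r}


def select_by_lca_depth(D, r):
    """Return a pair (i, j) with the deepest LCA relative to root r."""
    depths = lca_depths(D, r)
    return max(depths, key=depths.get)


if __name__ == "__main__":
    # Different roots may expose different cherries of the same tree.
    print(select_by_lca_depth(D_EXAMPLE, r=5))
    print(select_by_lca_depth(D_EXAMPLE, r=0))
```

The selected pair depends on the choice of r, which is one motivation for a criterion that takes every r into account.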


Saitou & Nei’s Neighbor Joining Algorithm (1987)

• ~13,000 citations (Science Citation Index)
• Implemented in numerous phylogenetic packages
• Fastest implementation - Θ(n^3)
• Usually referred to as “the NJ algorithm”
• Identified by its neighbor selection criterion

Saitou & Nei’s neighbor-selection criterion: select i,j which maximize the sum

$Q(i,j) = \sum_{r \neq i} D(r,i) + \sum_{r \neq j} D(r,j) - (n-2)\,D(i,j)$
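The selection criterion can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, ties are broken by whichever maximal pair is found first, and the matrix is the 6✕6 example D from the recall slide (leaves 1..6 stored at indices 0..5). Since D(i,i) = 0, the sum over r ≠ i equals the full row sum.

```python
D_EXAMPLE = [
    [0, 3, 9, 15, 17, 16],
    [3, 0, 8, 14, 16, 15],
    [9, 8, 0, 18, 20, 19],
    [15, 14, 18, 0, 22, 21],
    [17, 16, 20, 22, 0, 9],
    [16, 15, 19, 21, 9, 0],
]


def saitou_nei_q(D):
    """Q(i,j) = sum_r D(r,i) + sum_r D(r,j) - (n-2) * D(i,j) for all i < j."""
    n = len(D)
    row_sums = [sum(row) for row in D]   # row_sums[i] = sum over r of D(r, i)
    return {(i, j): row_sums[i] + row_sums[j] - (n - 2) * D[i][j]
            for i in range(n) for j in range(i + 1, n)}


def select_pair(D):
    """Return a pair (i, j) that maximizes Q(i, j)."""
    Q = saitou_nei_q(D)
    return max(Q, key=Q.get)


if __name__ == "__main__":
    print(select_pair(D_EXAMPLE))
```

Precomputing the row sums once means each Q(i,j) costs O(1), so the whole table costs Θ(n^2).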


Consistency of Saitou&Nei method

Theorem (Saitou&Nei): Assume all edge weights of T are positive. If $Q(i,j) = \max_{i',j'} Q(i',j')$, where

$Q(i,j) = \sum_{r \neq i} D(r,i) + \sum_{r \neq j} D(r,j) - (n-2)\,D(i,j)$

then i and j are cherries in the tree.

Proof: in the following slides.


1st step in the proof: Express Saitou&Nei selection criterion in terms of LCA distances

Saitou & Nei’s selection criterion: select i,j which maximize

$Q(i,j) = \sum_{r \neq i} D(r,i) + \sum_{r \neq j} D(r,j) - (n-2)\,D(i,j) = 2\Big( D(i,j) + \sum_{r \neq i,j} d(r, \mathrm{LCA}_r(i,j)) \Big)$

Intuition: NJ “tries” to select taxon-pairs with the deepest average LCA.

The addition of D(i,j) is needed to make the formula consistent.

Next we prove the above equality.

Note (Shlomo Moran): the expression in terms of LCA distances originated in Mirkin96 and is mentioned in Gascuel-unj.

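Proof of equality in previous slide

$$
\begin{aligned}
Q(i,j) &= \sum_{r \neq i} D(r,i) + \sum_{r \neq j} D(r,j) - (n-2)\,D(i,j) \\
&= D(i,j) + \sum_{r \neq i,j} D(i,r) + D(j,i) + \sum_{r \neq i,j} D(j,r) - (n-2)\,D(i,j) \\
&= \sum_{r \neq i,j} \big[ D(i,r) + D(j,r) - D(i,j) \big] + 2\,D(i,j) \\
&= 2\Big( D(i,j) + \sum_{r \neq i,j} d(r, \mathrm{LCA}_r(i,j)) \Big)
\end{aligned}
$$

The last step uses D(i,r) + D(j,r) - D(i,j) = 2d(r,LCAr(i,j)) for every r ≠ i,j; the third line uses the fact that there are n-2 such r.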


2nd step in proof: Consistency of Saitou&Nei Neighbor Selection

We need to show that a pair of leaves i,j which maximize

$Q'(i,j) = Q(i,j)/2 = D(i,j) + \sum_{r \neq i,j} D(r, \mathrm{LCA}_r(i,j))$

must be cherries. First we express Q’ as a sum of edge weights.

For a vertex i and an edge e: Ni(e) = |{r∈S : e is on path(i,r)}|. Then:

$Q'(i,j) = D(i,j) + \sum_{r \neq i,j} D(r, \mathrm{LCA}_r(i,j)) = \sum_{e \in \mathrm{path}(i,j)} w(e) + \sum_{e \notin \mathrm{path}(i,j)} N_i(e)\, w(e)$

Note: If e’ is a “leaf edge”, then w(e’) is added exactly once to Q’(i,j).
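The edge-weight expression for Q’ can be checked numerically on a small tree. The sketch below is illustrative only: the 5-leaf tree, its weights, and all function names are hypothetical and do not come from the slides. It computes Q’(i,j) once from distances (using 2d(r,LCAr(i,j)) = D(i,r) + D(j,r) - D(i,j)) and once as a sum over edges with the multiplicities Ni(e), then compares the two for every pair of leaves.

```python
from itertools import combinations

# Hypothetical tree (an assumption for illustration): leaves a..e,
# internal vertices x, y, z. Each key is an undirected edge.
WEIGHTS = {
    frozenset(("a", "x")): 2, frozenset(("b", "x")): 1,
    frozenset(("x", "y")): 1, frozenset(("c", "y")): 6,
    frozenset(("y", "z")): 2, frozenset(("d", "z")): 3,
    frozenset(("e", "z")): 4,
}
LEAVES = ["a", "b", "c", "d", "e"]

ADJ = {}
for edge in WEIGHTS:
    u, v = tuple(edge)
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)


def path(s, t):
    """Set of edges on the unique s-t path, found by an iterative DFS."""
    stack = [(s, None, [])]
    while stack:
        node, parent, edges = stack.pop()
        if node == t:
            return set(edges)
        for nxt in ADJ[node]:
            if nxt != parent:
                stack.append((nxt, node, edges + [frozenset((node, nxt))]))
    raise ValueError("vertices are not connected")


def dist(s, t):
    return sum(WEIGHTS[e] for e in path(s, t))


def q_prime_from_distances(i, j):
    """D(i,j) + sum over r != i,j of d(r, LCA_r(i,j))."""
    total = dist(i, j)
    for r in LEAVES:
        if r not in (i, j):
            total += (dist(i, r) + dist(j, r) - dist(i, j)) / 2
    return total


def q_prime_from_edges(i, j):
    """Edges on path(i,j) count once; every other edge e counts N_i(e) times."""
    on_path = path(i, j)
    total = sum(WEIGHTS[e] for e in on_path)
    for e in WEIGHTS:
        if e not in on_path:
            n_i = sum(1 for r in LEAVES if e in path(i, r))
            total += n_i * WEIGHTS[e]
    return total


def identity_holds():
    return all(
        abs(q_prime_from_distances(i, j) - q_prime_from_edges(i, j)) < 1e-9
        for i, j in combinations(LEAVES, 2))


if __name__ == "__main__":
    print(identity_holds())
```

A numerical check on one tree is not a proof, but it helps confirm that the multiplicities Ni(e) are being read correctly.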

[Figure: leaves i, j joined by path(i,j); a leaf r in the rest of T; an edge e outside path(i,j).]

Consistency of Saitou&Nei (cont)

Assume for contradiction that Q’(i,j) is maximized for i,j which are not cherries.

Let (see the figure below):
• path(i,j) = (i,...,k,j).
• T1 = the subtree rooted at k. Assume WLOG that T1 has at most n/2 leaves.
• T2 = T \ T1.

Let i’,j’ be any two cherries in T1. We will show that Q’(i’,j’) > Q’(i,j).

[Figure: leaves i, j, the vertex k, the subtrees T1 and T2, and the cherries i’, j’ in T1.]


Consistency of Saitou&Nei (cont)

Proof that Q’(i’,j’) > Q’(i,j):

$$
\begin{aligned}
Q'(i,j) &= \sum_{e \in \mathrm{path}(i,j)} w(e) + \sum_{e \notin \mathrm{path}(i,j)} N_i(e)\, w(e) \\
Q'(i',j') &= \sum_{e \in \mathrm{path}(i',j')} w(e) + \sum_{e \notin \mathrm{path}(i',j')} N_{i'}(e)\, w(e)
\end{aligned}
$$

Each leaf edge e adds w(e) both to Q’(i,j) and to Q’(i’,j’), so we can ignore the contribution of leaf edges to both Q’(i,j) and Q’(i’,j’).

Note: Ni(e) is the number of leaves in the subtree on the side of e away from i.

Consistency of Saitou&Nei (end)

Contribution of internal edges to Q’(i,j) and to Q’(i’,j’):

Location of internal edge e    # w(e) added to Q’(i,j)    # w(e) added to Q’(i’,j’)
e ∈ path(i,j)                  1                          Ni’(e) ≥ 2
e ∈ path(i’,j)                 Ni(e) < n/2                Ni’(e) ≥ n/2
e ∈ T \ path(i,i’)             Ni(e)                      Ni’(e) = Ni(e)

Since there is at least one internal edge e in path(i,j), Q’(i’,j’) > Q’(i,j). QED


Complexity of Saitou&Nei NJ Algorithm

Initialization: Θ(n^2) to compute Q(i,j) for all i,j ∈ L.

Each iteration: O(n^2) to find the maximal Q(i,j), and to update the values of Q(x,y).

Total: O(n^3), since there are O(n) iterations.
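The complexity bounds can be seen in the following iterative Python sketch of the whole algorithm. It is a sketch under stated assumptions, not the authors' implementation: the reduction formula D’(v,k) = (D(i,k) + D(j,k) - D(i,j)) / 2 is the usual one for additive distances and is not given in these slides, edge lengths are omitted, ties are broken by scan order, and the function name and nested-tuple output are illustrative. Row sums are maintained so that each Q(i,j) is available in O(1), which makes every iteration O(n^2).

```python
def neighbor_joining(D, labels):
    """Return the NJ tree topology as nested tuples (edge lengths omitted)."""
    D = [[float(x) for x in row] for row in D]
    nodes = list(labels)
    row_sums = [sum(row) for row in D]            # initialization: Theta(n^2)

    while len(nodes) > 2:                          # n - 2 iterations
        n = len(nodes)

        # O(n^2): scan all pairs for the maximal Q(i, j).
        best, i, j = None, -1, -1
        for a in range(n):
            for b in range(a + 1, n):
                q = row_sums[a] + row_sums[b] - (n - 2) * D[a][b]
                if best is None or q > best:
                    best, i, j = q, a, b

        keep = [k for k in range(n) if k != i and k != j]

        # Distances from the new vertex v (assumed reduction formula).
        dv = [(D[i][k] + D[j][k] - D[i][j]) / 2 for k in keep]

        # O(n): update row sums by removing i, j and adding v.
        new_sums = [row_sums[k] - D[i][k] - D[j][k] + dv[x]
                    for x, k in enumerate(keep)]
        new_sums.append(sum(dv))

        # O(n^2): build the reduced matrix D'.
        D = [[D[a][b] for b in keep] + [dv[x]] for x, a in enumerate(keep)]
        D.append(dv + [0.0])

        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]
        row_sums = new_sums

    return (nodes[0], nodes[1])


if __name__ == "__main__":
    # Example matrix D from the recall slide, leaves labelled 1..6.
    D_EXAMPLE = [
        [0, 3, 9, 15, 17, 16],
        [3, 0, 8, 14, 16, 15],
        [9, 8, 0, 18, 20, 19],
        [15, 14, 18, 0, 22, 21],
        [17, 16, 20, 22, 0, 9],
        [16, 15, 19, 21, 9, 0],
    ]
    print(neighbor_joining(D_EXAMPLE, [1, 2, 3, 4, 5, 6]))
```

Copying the reduced matrix each round keeps the code short; an in-place update would have the same O(n^3) total cost.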