
Upload: lee-ramsey

Post on 17-Jan-2016



Lecture 6A – Introduction to Trees & Optimality Criteria

Branches: an unrooted, fully bifurcating tree of n taxa has 2n-3 branches (here n = 5, so 7 branches).
1, 2, 4, 6, & 7 are external branches (leaves).
3 & 5 are internal branches (edges).

Nodes: A – E are terminals; x, y, & z are internal nodes (vertices).
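The counts above can be checked with a minimal sketch. The function name `unrooted_counts` is an illustrative assumption, and the tree is assumed to be unrooted and fully bifurcating.

```python
def unrooted_counts(n_taxa):
    """Branch and node counts for an unrooted, fully bifurcating tree."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return {
        "branches": 2 * n_taxa - 3,        # total branches
        "external_branches": n_taxa,       # one per terminal
        "internal_branches": n_taxa - 3,   # the remainder
        "internal_nodes": n_taxa - 2,      # vertices joining three branches
    }

# The five-taxon tree (A - E) from the text.
print(unrooted_counts(5))
```

For five taxa this gives 7 branches (5 external, 2 internal) and 3 internal nodes, matching branches 1 – 7 and nodes x, y, & z.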


If we break branch 3, we have two sub-trees (A,B) and (C,(D,E)).

Newick format: ((A,B),C,(D,E));
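Breaking an internal branch can be sketched in code by writing the tree as nested Python tuples that mirror the Newick string. This representation and the helper names `leaves` and `internal_splits` are assumptions made for illustration, not part of the lecture.

```python
def leaves(subtree):
    """Return the set of terminal labels under a nested-tuple subtree."""
    if isinstance(subtree, str):
        return frozenset([subtree])
    return frozenset().union(*(leaves(child) for child in subtree))

def internal_splits(tree):
    """List the two leaf sets obtained by breaking each internal branch."""
    all_leaves = leaves(tree)
    splits = []

    def visit(node, is_top):
        if isinstance(node, str):
            return                      # external branch: nothing to record
        if not is_top:                  # the branch above this clade is internal
            inside = leaves(node)
            splits.append((sorted(inside), sorted(all_leaves - inside)))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return splits

tree = (("A", "B"), "C", ("D", "E"))    # mirrors ((A,B),C,(D,E))
for side_a, side_b in internal_splits(tree):
    print(side_a, "|", side_b)
```

The first split printed, A B | C D E, corresponds to breaking branch 3; the second corresponds to the other internal branch.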


Rooting – The tree is an unrooted tree.


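Also note that there is free rotation around nodes: rotating the branches around any node does not change the tree, so ((A,B),C,(D,E)) and ((B,A),C,(E,D)) describe the same tree.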
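One way to see that rotation does not matter is to sort the children of every node before comparing two trees. The sketch below assumes a nested-tuple representation of the tree, and the name `canonical` is hypothetical. It only handles rotations: the same unrooted tree written from a different top node would need extra handling.

```python
def canonical(node):
    """Sort the children of every node so rotated trees compare equal."""
    if isinstance(node, str):
        return node
    children = [canonical(child) for child in node]
    # repr() gives a consistent sort key for both labels and sub-tuples.
    return tuple(sorted(children, key=repr))

tree_1 = (("A", "B"), "C", ("D", "E"))
tree_2 = (("E", "D"), ("B", "A"), "C")   # same tree, branches rotated
tree_3 = (("A", "C"), "B", ("D", "E"))   # a different topology

print(canonical(tree_1) == canonical(tree_2))
print(canonical(tree_1) == canonical(tree_3))
```

The first comparison is true (only rotations differ) and the second is false (the groupings differ).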


The Scope of the Problem

Taxa      Unrooted trees
3         1
4         3
5         15
6         105
7         945
8         10,395
9         135,135
10        2.027 x 10^6
22        3 x 10^23
50        3 x 10^74
100       2 x 10^182
1000      2 x 10^2,860
10 mil    5 x 10^68,667,340
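The slide lists only the counts. The standard closed form for the number of unrooted bifurcating trees on n taxa is the double factorial (2n-5)!! = 1 x 3 x 5 x ... x (2n-5), which is consistent with the small entries in the table. A minimal sketch, with a hypothetical function name (`math.prod` assumes Python 3.8 or later):

```python
import math

def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees: (2n-5)!! = 1 * 3 * ... * (2n-5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    # Odd numbers 1, 3, ..., 2n-5 (the stop value 2n-4 is exclusive).
    return math.prod(range(1, 2 * n_taxa - 4, 2))

for n in range(3, 11):
    print(n, n_unrooted_trees(n))

# Larger cases are easier to read in scientific notation.
print(22, f"{n_unrooted_trees(22):.1e}")
```

Python integers have arbitrary precision, so the exact count is available even for 100 or 1000 taxa. It is the number of trees to evaluate, not the arithmetic, that makes exhaustive search impossible.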


II. Optimality Criteria

A. Parsimony

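First, the score of a tree $t$ (i.e., its length) for the entire data set is given by the weighted sum of the character lengths:

$$ L(t) = \sum_{i=1}^{N} w_i \, l_i $$

$N$ is the number of characters.

$l_i$ is the length of character $i$ when optimized on tree $t$.

$w_i$ is the weight we assign to character $i$.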
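As a small illustration of the weighted sum, the sketch below adds up per-character lengths on one tree. The function name `tree_length` and the three character lengths are hypothetical values chosen only for the example.

```python
def tree_length(char_lengths, weights=None):
    """Weighted sum of per-character lengths on one tree."""
    if weights is None:
        weights = [1] * len(char_lengths)   # equal weighting by default
    if len(weights) != len(char_lengths):
        raise ValueError("one weight per character is required")
    return sum(w * l for w, l in zip(weights, char_lengths))

lengths = [2, 1, 3]                      # hypothetical l_i for three characters
print(tree_length(lengths))              # equal weights: 2 + 1 + 3
print(tree_length(lengths, [1, 2, 1]))   # second character counted twice
```

With equal weights the tree length is 6; doubling the weight of the second character raises it to 7.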


The Fitch Algorithm: state sets and accumulated lengths.

We erect a state set at each terminal node and assign an accumulated length of zero to terminal nodes. The accumulated length of a node is the minimum number of changes in its daughter subtree.


1 – Form the intersection of the state sets of the two daughter nodes. If the intersection is non-empty, it becomes the state set of the internal node. The accumulated length of the internal node is the sum of those of the daughter nodes.

2 – If the intersection is empty, the state set of the internal node is the union of the state sets of the two daughter nodes. The accumulated length is the sum of those of the daughter nodes plus one.

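Worked example (from the figure):

First internal node: the intersection is empty, so take the union. Length: 0 + 0 + 1 = 1.

Second internal node: the intersection is non-empty, so take the intersection. Length: 0 + 0 = 0.

Final node: the intersection is empty, so take the union. Length: 1 + 0 + 1 = 2. So $l_i = 2$.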
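The two rules can be sketched as a short recursive function. This is a minimal illustration, not the lecture's own code. The tree is assumed to be rooted and binary, written as nested pairs, and the tip names `t1` to `t4` with states G, A, C, C are assumptions chosen to reproduce the lengths in the worked example (1, 0, then 2).

```python
def fitch(node, tip_states):
    """Return (state_set, accumulated_length) for a rooted binary tree.

    `node` is a tip name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(node, str):
        return {tip_states[node]}, 0          # terminal: own state, length 0
    left, right = node
    left_set, left_len = fitch(left, tip_states)
    right_set, right_len = fitch(right, tip_states)
    common = left_set & right_set
    if common:                                # rule 1: non-empty intersection
        return common, left_len + right_len
    # rule 2: empty intersection, take the union and add one step
    return left_set | right_set, left_len + right_len + 1

tree = (("t1", "t2"), ("t3", "t4"))           # hypothetical four-taxon tree
tip_states = {"t1": "G", "t2": "A", "t3": "C", "t4": "C"}

state_set, length = fitch(tree, tip_states)
print(sorted(state_set), length)
```

Under these assumed states, the pair (t1, t2) needs one change, the pair (t3, t4) needs none, and joining them adds one more, for a character length of 2.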


Sankoff Algorithm – Character-state vectors and step-matrices

Step matrix – defines $c_{ij}$, the cost of a change between state $i$ and state $j$:

      A    C    G    T
A     --   4    1    4
C     4    --   4    1
G     1    4    --   4
T     4    1    4    --

Step one: Fill in the character-state vectors for terminal nodes.

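Each cell is $s_k(i)$: the minimum length of the subtree descended from node $k$, given that node $k$ is assigned state $i$.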


Step two: Fill in vectors for other nodes, descending tree.

Node 1:

$s_1(A) = c_{AG} + c_{AA} = 1 + 0 = 1$

$s_1(C) = c_{CG} + c_{CA} = 4 + 4 = 8$

$s_1(G) = c_{GG} + c_{GA} = 0 + 1 = 1$

$s_1(T) = c_{TG} + c_{TA} = 4 + 4 = 8$

Node 2:

$s_2(A) = 4 + 4 = 8$

$s_2(C) = 0 + 0 = 0$

$s_2(G) = 4 + 4 = 8$

$s_2(T) = 1 + 1 = 2$


For nodes below, we must calculate the cost for each possible state assignment for daughter nodes.

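$s_3(A) = \min_j [s_1(j) + c_{Aj}] + \min_j [s_2(j) + c_{Aj}]$

$s_3(C) = \min_j [s_1(j) + c_{Cj}] + \min_j [s_2(j) + c_{Cj}]$

$s_3(G) = \min_j [s_1(j) + c_{Gj}] + \min_j [s_2(j) + c_{Gj}]$

$s_3(T) = \min_j [s_1(j) + c_{Tj}] + \min_j [s_2(j) + c_{Tj}]$

Here $j$ ranges over the possible states (A, C, G, T) of the daughter node.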

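So we fill in the character-state vector for node 3. Within each bracket, the first term of every sum comes from the daughter node's vector and the second from the step matrix.

$s_3(A) = \min[1, 12, 2, 12] + \min[8, 4, 9, 6] = 1 + 4 = 5$

$s_3(C) = \min[5, 8, 5, 9] + \min[12, 0, 12, 3] = 5 + 0 = 5$

$s_3(G) = \min[2, 12, 1, 12] + \min[9, 4, 8, 6] = 1 + 4 = 5$

$s_3(T) = \min[5, 9, 5, 8] + \min[12, 1, 12, 2] = 5 + 1 = 6$

The character-state vector for node 3 is therefore (A, C, G, T) = (5, 5, 5, 6).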
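The whole calculation can be reproduced with a short recursive sketch. Several things here are assumptions rather than part of the lecture: the nested-pair tree, the tip names `t1` to `t4`, the tip states G, A, C, C (inferred from the worked numbers for nodes 1 and 2), and the usual convention that a terminal's vector holds 0 for its observed state and infinity for every other state.

```python
import math

STATES = "ACGT"
# 4:1 step matrix from the text: transitions (A<->G, C<->T) cost 1,
# transversions cost 4, and no change costs 0.
COST = {
    "A": {"A": 0, "C": 4, "G": 1, "T": 4},
    "C": {"A": 4, "C": 0, "G": 4, "T": 1},
    "G": {"A": 1, "C": 4, "G": 0, "T": 4},
    "T": {"A": 4, "C": 1, "G": 4, "T": 0},
}

def sankoff(node, tip_states):
    """Return {state: s_k(state)} for a tip name or a (left, right) pair."""
    if isinstance(node, str):
        # Terminal: cost 0 for the observed state, infinite for all others.
        return {s: 0 if s == tip_states[node] else math.inf for s in STATES}
    vectors = [sankoff(child, tip_states) for child in node]
    # For each state i here, add the cheapest option for each daughter.
    return {
        i: sum(min(vec[j] + COST[i][j] for j in STATES) for vec in vectors)
        for i in STATES
    }

tip_states = {"t1": "G", "t2": "A", "t3": "C", "t4": "C"}   # assumed states
node_1 = sankoff(("t1", "t2"), tip_states)
node_2 = sankoff(("t3", "t4"), tip_states)
node_3 = sankoff((("t1", "t2"), ("t3", "t4")), tip_states)

print(node_1, node_2, node_3)
print(min(node_3.values()))   # length of the character on this tree
```

The vectors for nodes 1, 2, and 3 match the hand calculation, and the smallest entry in the node 3 vector, 5, is the length of the character under this step matrix.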


Points to note:

1) Two types of weighting are possible: weighting of transformations within characters (which we demonstrated with the step matrix) and weighting among characters, which is reflected in the weighted sum of lengths across characters.

2) One can't compare tree lengths across weighting schemes. In the first example, with all transformations having the same cost, the length of the character on this tree was 2. In the second, with a 4:1 step matrix to weight transversions, the length was 5.