1
Modified Mincut Supertrees
Roderic Page, University of Glasgow
2
Tree of Life
About 1.7 million species described.
What we have so far:
• TreeBASE database (15,000 taxa)
• Ribosomal Database Project (RDP II) (20,000 sequences)
• The Tree of Life Project (11,000 taxa)
3
Recent interest in the Tree of Life
Assembling the Tree of Life: Science, Relevance, and Challenges AMNH, New York, May 2002
$US 10 million “to construct a phylogeny for the 1.7 million described species of Life” announced February 15th 2002
NSF sponsored “Tree of Life” workshops (2000-2001)
European initiative (ATOL) under FP6
4
Problem: how to build the tree of life
Solutions:
• Find one or more “magic markers” that will allow us to recover the whole tree in one go (problems: combinability and complexity)
• Assemble big tree from many smaller trees derived from many kinds of data (supertrees)
5
Tree terminology
[Figure: rooted tree on leaves a, b, c, d with clusters {a,b}, {a,b,c}, and {a,b,c,d}; labels mark the root, a leaf, an internal node, a cluster, and an edge]
6
Nestings and triplets
[Figure: tree on leaves a, b, c, d]
Nestings:
• {a,b} <T {a,b,c,d}
• {b,c} <T {a,b,c,d}
Triplets:
• (bc)d, also written bc|d
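The triplets of a tree can be listed mechanically. The sketch below is only an illustration: it assumes trees are written as nested Python tuples with string leaves (a representation chosen here, not one from the talk), and it uses the tree with clusters {a,b}, {a,b,c}, {a,b,c,d}. A triplet xy|z is reported whenever some cluster contains x and y but not z.

```python
from itertools import combinations

def leaves(tree):
    """Leaf labels below a node; a leaf is a plain string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Cluster (leaf set) of every internal node, root included."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clusters(child))
    return found

def triplets(tree):
    """Rooted triplets xy|z displayed by the tree, as (frozenset({x, y}), z)."""
    everything = leaves(tree)
    found = set()
    for cluster in clusters(tree):
        # Any two leaves inside a cluster are closer to each other
        # than to any leaf outside it.
        for x, y in combinations(sorted(cluster), 2):
            for z in everything - cluster:
                found.add((frozenset((x, y)), z))
    return found

# Hypothetical encoding of the tree (((a,b),c),d).
tree = ((("a", "b"), "c"), "d")
for pair, z in sorted((sorted(p), z) for p, z in triplets(tree)):
    print("".join(pair) + "|" + z)
```

For this tree the list includes bc|d, the triplet shown on the slide.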
7
Supertree
[Figure: T1 (leaves a, b, c) + T2 (leaves b, c, d) = supertree (leaves a, b, c, d)]
8
Some desirable properties of a supertree method
(Steel et al., 2000)
• The supertree can be computed in polynomial time
• A grouping in one or more trees that is not contradicted by any other tree occurs in the supertree
9
MRP (Matrix Representation Parsimony)
                  1  2  3
Homo sapiens      1  1  1
Pan paniscus      1  1  1
Gorilla gorilla   1  1  0
Pongo pygmaeus    1  0  0
Hylobates         0  0  0
[Figure: tree of these five taxa with internal nodes numbered 1, 2, 3, matching the matrix columns]
• NP-hard
• Can generate many solutions
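The matrix coding step can be sketched in a few lines. This is an illustration only, assuming trees are nested Python tuples with string leaves and that columns follow a preorder walk of the non-root internal nodes (both choices are assumptions made here). Taxa absent from a tree are coded "?", the usual MRP convention for missing data. The parsimony search itself, which is the NP-hard part, is not shown.

```python
def leaf_list(tree):
    """Leaf labels in left-to-right order; a leaf is a plain string."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaf_list(child)]

def internal_clusters(tree, is_root=True):
    """Clusters of the non-root internal nodes, in preorder."""
    if isinstance(tree, str):
        return []
    found = [] if is_root else [set(leaf_list(tree))]
    for child in tree:
        found.extend(internal_clusters(child, is_root=False))
    return found

def mrp_matrix(trees):
    """Map each taxon to its row of 0/1/? characters, one column per cluster."""
    taxa = []
    for tree in trees:
        for leaf in leaf_list(tree):
            if leaf not in taxa:
                taxa.append(leaf)
    rows = {taxon: "" for taxon in taxa}
    for tree in trees:
        present = set(leaf_list(tree))
        for cluster in internal_clusters(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon] += "?"  # taxon missing from this tree
                else:
                    rows[taxon] += "1" if taxon in cluster else "0"
    return rows

# Hypothetical encoding of the ape tree on the slide.
apes = (((("Homo sapiens", "Pan paniscus"), "Gorilla gorilla"),
         "Pongo pygmaeus"), "Hylobates")
for taxon, row in mrp_matrix([apes]).items():
    print(f"{taxon:16s} {row}")
```

With this single tree the rows reproduce the matrix above: 111, 111, 110, 100, 000.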
10
Aho et al.’s algorithm (OneTree)
Aho, A. V., Sagiv, Y., Szymanski, T. G., and Ullman, J. D. 1981. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10: 405-421.
Input: set of rooted trees
1. If set is compatible (i.e., will agree on a tree), output that tree.
2. If set is not compatible, stop!
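A minimal sketch of the two steps above, in the graph-based form used later for mincut supertrees: join two leaves whenever some input tree places them together in a proper cluster, split the leaf set into connected components, and recurse. Trees are assumed to be nested Python tuples with string leaves, and the function names are hypothetical. Returning `None` stands in for "stop!".

```python
def leaves(tree):
    """Leaf labels below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Cluster (leaf set) of every internal node, root included."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clusters(child))
    return found

def components(vertices, edges):
    """Connected components of an undirected graph, by repeated merging."""
    block = {v: {v} for v in vertices}
    for u, v in edges:
        if block[u] is not block[v]:
            merged = block[u] | block[v]
            for w in merged:
                block[w] = merged
    return sorted({frozenset(b) for b in block.values()}, key=min)

def one_tree(trees, subset=None):
    """Return a tree displaying all input trees, or None if they are incompatible."""
    if subset is None:
        subset = frozenset().union(*(leaves(t) for t in trees))
    if len(subset) <= 2:
        return tuple(sorted(subset)) if len(subset) == 2 else next(iter(subset))
    edges = []
    for tree in trees:
        present = leaves(tree) & subset
        for cluster in clusters(tree):
            part = sorted(cluster & subset)
            # Proper cluster of the tree restricted to `subset`:
            # a star of edges gives the same components as a clique.
            if 2 <= len(part) < len(present):
                edges.extend((part[0], other) for other in part[1:])
    parts = components(subset, edges)
    if len(parts) == 1:
        return None  # graph is connected: the trees conflict here
    subtrees = [one_tree(trees, part) for part in parts]
    return None if any(s is None for s in subtrees) else tuple(subtrees)

# Hypothetical input trees on leaves a, b, c and b, c, d.
print(one_tree([(("a", "b"), "c"), (("b", "c"), "d")]))
print(one_tree([(("a", "b"), "c"), (("a", "c"), "b")]))
```

The first call yields a tree on a, b, c, d; the second pair of trees disagrees about a, b, and c, so the sketch gives up.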
11
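Aho et al.’s OneTree algorithm
[Figure: worked example with input trees T1 (leaves a, b, c) and T2 (leaves b, c, d). The leaf set a, b, c, d is split into a, b, c and d; then a, b, c is split into a, b and c, giving the supertree]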
12
Mincut supertrees
Semple, C., and Steel, M. 2000. A supertree method for rooted trees. Discrete Appl. Math. 105: 147-158.
• Modifies OneTree by cutting graph
• Requires rooted trees (no analogue of OneTree for unrooted trees)
• Recursive
• Polynomial time
13
[Figure: input trees T1 (leaves a, b, c, d, e) and T2 (leaves a, b, c, d), and the graph S_{T1,T2} on vertices a, b, c, d, e]
Semple and Steel (2000)
14
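Collapsing the graph (Semple and Steel mincut algorithm)
[Figure: the graph S_{T1,T2} on vertices a, b, c, d, e, with edge weights of 1 except for one edge of weight 2. That edge has maximum weight and is contracted, merging a and b, to give S_{T1,T2}/E^max_{T1,T2} on vertices {a,b}, c, d, e]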
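The collapsing step can be illustrated with a short sketch. Several things here are assumptions rather than details from the slides: the two input trees are hypothetical (the slide's topologies are not recoverable from the transcript), every tree has unit weight so the maximum edge weight is k (the number of trees), and parallel edges created by a contraction are merged by summing their weights. Function names are illustrative only.

```python
from collections import Counter
from itertools import combinations

def leaves(tree):
    """Leaf labels below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Cluster (leaf set) of every internal node, root included."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clusters(child))
    return found

def proper_cluster_graph(trees):
    """Edge weight = number of trees with a proper cluster containing both leaves."""
    weights = Counter()
    for tree in trees:
        root = leaves(tree)
        joined = set()
        for cluster in clusters(tree):
            if cluster != root:
                joined.update(frozenset(p) for p in combinations(sorted(cluster), 2))
        weights.update(joined)  # each tree counts once per pair
    return weights

def collapse_max_edges(weights, vertices, k):
    """Contract every edge of weight k; sum the weights of parallel edges."""
    block = {v: frozenset([v]) for v in vertices}
    for pair, w in weights.items():
        if w == k:
            x, y = pair
            merged = block[x] | block[y]
            for v in merged:
                block[v] = merged
    collapsed = Counter()
    for pair, w in weights.items():
        x, y = pair
        if block[x] != block[y]:
            collapsed[frozenset((block[x], block[y]))] += w
    return collapsed

# Hypothetical trees on leaves a..e and a..d.
trees = [((("a", "b"), "c"), ("d", "e")), (("a", "b"), ("c", "d"))]
weights = proper_cluster_graph(trees)
vertices = frozenset().union(*(leaves(t) for t in trees))
collapsed = collapse_max_edges(weights, vertices, k=len(trees))
for edge, w in collapsed.items():
    print(sorted("".join(sorted(b)) for b in edge), w)
```

In this example only the edge a-b has weight 2, so a and b become a single vertex before any cut is taken, which protects a grouping found in every tree.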
15
Cut the graph to get supertree
[Figure: minimum cut of S_{T1,T2}/E^max_{T1,T2} on vertices {a,b}, c, d, e (edge weights of 1), and the resulting supertree on leaves a, b, c, d, e]
16
My mincut supertree implementation
darwin.zoology.gla.ac.uk/~rpage/supertree
• Written in C++
• Uses GTL (Graph Template Library) to handle graphs (formerly a free alternative to LEDA)
• Finds all mincuts of a graph faster than Semple and Steel’s algorithm
17
A counterexample: two input trees...
[Figure: T1 with leaves a, b, c, x1, x2, x3; T2 with leaves c, b, a, y1, y2, y3, y4]
18
Mincut gives this (strange) result
[Figure: mincut supertree with leaves c, x1, x2, x3, b, a, y1, y2, y3, y4]
• Disputed relationships among a, b, and c are resolved
• x1, x2, and x3 collapsed into polytomy
19
Problem: Cuts depend on connectivity (in this example it is a function of tree size)
[Figure: graph S_{T1,T2} on vertices a, b, c, x1, x2, x3, y1, y2, y3, y4]
20
So, mincut doesn’t work
• But, Semple and Steel said it did
• My program seems to work
• Argh!!! What is happening….?
21
What mincut does... and does not do
• Mincut supertree is guaranteed to include any nesting which occurs in all input trees
• Makes no claims about nestings which occur in only some of the trees
• “Does exactly what it says on the tin™”
22
Modifying mincut supertree
• Can we incorporate more of the information in the input trees?
• Three categories of information
  • Unanimous (all trees have that grouping)
  • Contradicted (trees explicitly disagree)
  • Uncontradicted (some trees have information that no other tree disagrees with)
23
Uncontradicted information
Assume we have k input trees. For an edge joining a and b:
• c = number of trees in which a and b co-occur
• n = number of trees in which a and b are nested
• c - n = 0: uncontradicted (if c = k then unanimous)
• c - n > 0: contradicted
24
Uncontradicted information
Assume we have k input trees. For an edge joining a and b:
• c = number of trees in which a and b co-occur
• n = number of trees in which a and b are nested
• f = number of trees in which a and b are in a fan
• c - n - f = 0: uncontradicted (if c = k then unanimous)
• c - n - f > 0: contradicted
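The counts c, n, and f can be computed directly from the input trees. The sketch below is an illustration under stated assumptions: trees are nested Python tuples with string leaves, "nested" means some proper cluster contains both a and b, and "in a fan" is read as a and b co-occurring without being nested in a tree whose root is a polytomy (more than two children). That reading of a fan is an assumption made here, not a definition taken from the slides.

```python
def leaves(tree):
    """Leaf labels below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Cluster (leaf set) of every internal node, root included."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clusters(child))
    return found

def pair_counts(trees, a, b):
    """Return (c, n, f) for the pair a, b over all input trees."""
    c = n = f = 0
    for tree in trees:
        root = leaves(tree)
        if a not in root or b not in root:
            continue  # the pair does not co-occur in this tree
        c += 1
        if any(a in cl and b in cl for cl in clusters(tree) if cl != root):
            n += 1
        elif len(tree) > 2:
            f += 1  # assumed fan: unresolved root, so no explicit disagreement
    return c, n, f

def classify(trees, a, b):
    """Label the edge a-b following the c - n - f rule on the slide."""
    c, n, f = pair_counts(trees, a, b)
    if n == 0:
        return "no edge"  # a and b are never nested, so the graph has no edge
    if c - n - f > 0:
        return "contradicted"
    return "unanimous" if c == len(trees) else "uncontradicted"

# Hypothetical trees: t2 separates a and b at a binary root, fan leaves them unresolved.
t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
fan = ("a", "b", "c")
print(pair_counts([t1, t2], "a", "b"), classify([t1, t2], "a", "b"))
print(pair_counts([t1, fan], "a", "b"), classify([t1, fan], "a", "b"))
```

The second call shows why f is subtracted: a polytomy fails to nest a and b but does not count against the grouping.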
25
Classifying edges
[Figure: graph S_{T1,T2} on vertices a, b, c, x1, x2, x3, y1, y2, y3, y4, with each edge drawn as one of:]
• Uncontradicted
• Uncontradicted but adjacent to contradicted
• Contradicted
26
Modified mincut
• Species a, b, and c form a polytomy
• x1, x2, and x3 resolved as per the input tree
[Figure: modified mincut supertree with leaves a, b, c, x1, x2, x3, y1, y2, y3, y4]
27
If no tree contradicts an item of information, is that information always in the supertree?
[Figure: four trees, each on leaves 1, 2, 3, 4, 5, displaying the triplets (12)5, (45)1, (23)5, and (34)1]
28
No! (Steel, Dress, & Böcker 2000)
[Figure: leaves 1, 2, 3, 4, 5]
• The four trees display (12)5, (23)5, (34)1, and (45)1
• No tree displays (IK)J or (JK)I for any (IJ)K above
• Triplets are uncontradicted, but cannot form a tree
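The last point can be checked with a small compatibility test in the style of Aho et al.: join x and y for every triplet xy|z, and if the leaves form a single connected component, no tree can display all the triplets. This is a sketch with a hypothetical function name; triplets are written as `((x, y), z)`.

```python
def compatible(triplets, leaf_set):
    """True if some rooted tree displays every triplet ((x, y), z), read xy|z."""
    leaf_set = set(leaf_set)
    if len(leaf_set) <= 2:
        return True
    # Only triplets whose three leaves all lie in the current leaf set matter.
    relevant = [t for t in triplets if {t[0][0], t[0][1], t[1]} <= leaf_set]
    component = {leaf: {leaf} for leaf in leaf_set}
    for (x, y), _ in relevant:
        if component[x] is not component[y]:
            merged = component[x] | component[y]
            for leaf in merged:
                component[leaf] = merged
    parts = {frozenset(part) for part in component.values()}
    if len(parts) == 1:
        return False  # one connected block: the leaves cannot be split
    return all(compatible(relevant, part) for part in parts)

four = [(("1", "2"), "5"), (("2", "3"), "5"), (("3", "4"), "1"), (("4", "5"), "1")]
print(compatible(four, "12345"))
print(compatible(four[:3], "12345"))
```

The four triplets chain the leaves together as 1-2-3-4-5, so the first test fails even though no two triplets conflict directly; dropping any one of them breaks the chain.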
29
Future directions
• Improve handling of uncontradicted information
• Add support for constraints
• Visualising very big trees
• Better integration into phylogeny databases (www.treebase.org)
darwin.zoology.gla.ac.uk/~rpage/supertree