Perfect Phylogeny Tutorial #10. © Ilan Gronau. Original slides by Shlomo Moran.

Post on 08-Jan-2018



Page 1: Perfect Phylogeny Tutorial #10 © Ilan Gronau Original slides by Shlomo Moran


Page 2: Perfect Phylogeny

The underlying model:
• A character vector is given for every species in S.
• Each character represents some observable trait.
• Each character takes values from a finite set.
• Basic underlying assumption: characters are homoplasy-free.

Page 3: Homoplasy-Free Characters

A character is homoplasy-free if it shows:
• no reversals (a state that was left never reappears), and
• no convergence (the same state never arises independently in separate lineages).

Homoplasy-free characters induce a convex coloring of the phylogenetic tree (the vertices sharing a state form a connected subtree).

The Perfect Phylogeny Problem:

Given character vectors for S, find (if they exist):
- a phylogenetic tree T over S (S is the leaf set of T), and
- convex character assignments to all vertices of T.

! This problem is NP-hard in general !
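Convexity can be tested directly once every vertex of a tree carries a state. The following is a minimal sketch, not taken from the slides: it assumes the tree is given as an adjacency dictionary and the coloring as a dictionary from vertex to state (the representation and the name `is_convex` are choices made here), and it checks that the vertices of each state form a connected subtree.

```python
from collections import defaultdict, deque


def is_convex(adjacency, coloring):
    """Return True if every state class induces a connected subtree."""
    by_state = defaultdict(set)
    for vertex, state in coloring.items():
        by_state[state].add(vertex)
    for vertices in by_state.values():
        # Breadth-first search restricted to the vertices of this state.
        start = next(iter(vertices))
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w in vertices and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != vertices:
            return False
    return True


# Toy tree (assumed for illustration): the path a - b - c.
PATH = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
CONVEX = {"a": 0, "b": 0, "c": 1}      # state 0 occupies a connected part
NOT_CONVEX = {"a": 0, "b": 1, "c": 0}  # state 0 is split by b
```

In the second coloring state 0 appears on both sides of a vertex with state 1, which is the kind of pattern that homoplasy-free characters rule out.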

Page 4: Directed Binary Perfect Phylogeny

Directed binary characters:
• 0 – property does not exist
• 1 – property exists
• Initially (at the root) no property exists.

Input: a binary coloring (C1,…,Cm) of a set S, given as an n×m binary matrix M.

Problem: Find a phylogenetic tree T over S (if one exists), such that:
1. For j=1,…,m, the partial coloring induced by Cj is convex in T.
2. The root has state 0 in all characters.

We will present a polynomial-time solution.

Page 5: Example

Input (n species × m characters):

   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0

Possible output:

[Figure: a tree rooted at the zero-root (00000), with internal vertices (01000), (11000) and (00100) and leaves A (11000), B (00100), C (11001), D (00110), E (01000). Edge labels shown: C2, C3.]

Page 6: An Important Observation

A tree is a directed perfect phylogeny for a given 0/1 matrix iff we can map each character to an edge/vertex on which this character was “turned on”, so that an object has the character exactly when its leaf lies below that point.

Example:

   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0

[Figure: a tree over leaves A, B, C, D, E with each character C1, C2, C3, C4, C5 marked on the edge where it turns on; the edge marked C2 is annotated “origin of C2”.]
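The observation can be checked mechanically: walking from a leaf up to the root and collecting the characters turned on along the way should reproduce that leaf's row of the matrix. The sketch below assumes a parent-pointer encoding; the internal node names `u`, `v`, `w` and the tree shape are a reconstruction consistent with the example matrix (the slide's figure is not available in this transcript), not a quotation of it.

```python
CHARACTERS = ["C1", "C2", "C3", "C4", "C5"]

# node -> (parent, characters turned on along the edge from the parent).
# Hypothetical encoding; u, v, w are made-up names for internal vertices.
PARENT = {
    "u": ("root", ["C2"]),
    "v": ("u", ["C1"]),
    "w": ("root", ["C3"]),
    "E": ("u", []),
    "A": ("v", []),
    "C": ("v", ["C5"]),
    "B": ("w", []),
    "D": ("w", ["C4"]),
}

MATRIX = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 0, 1],
    "D": [0, 0, 1, 1, 0],
    "E": [0, 1, 0, 0, 0],
}


def leaf_vector(parent, leaf, characters):
    """Collect the characters turned on along the path from `leaf` to the root."""
    on = set()
    node = leaf
    while node in parent:  # the root has no entry, so the walk stops there
        node, edge_characters = parent[node]
        on.update(edge_characters)
    return [1 if c in on else 0 for c in characters]


def matches(parent, matrix, characters):
    """True if every leaf's path reproduces its row of the matrix."""
    return all(leaf_vector(parent, leaf, characters) == row
               for leaf, row in matrix.items())
```

Because each character sits on exactly one edge, it can never be turned on twice or turned off again, which is the directed, homoplasy-free requirement.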

Page 7: Laminar Matrices

Definitions:
• Oj – the set of objects that have character Cj (Oj = {i : Mij = 1}).
• A collection of sets {S1,…,Sk} is laminar if for all i, j, either Si and Sj are disjoint, or one includes the other.

Theorem: A binary matrix M has a perfect phylogenetic tree iff the collection {O1,…,Om} is laminar.

Laminar:

   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0

Not laminar:

   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  1
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  1
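The laminar condition can be tested straight from the definition by comparing every pair of object sets. The sketch below is a naive pairwise check, not the efficient procedure developed later in the tutorial; the function names are chosen here, and rows are indexed 0 to 4 for A to E.

```python
from itertools import combinations


def object_sets(matrix):
    """O_j = set of row indices i with matrix[i][j] == 1, for every column j."""
    columns = len(matrix[0])
    return [{i for i, row in enumerate(matrix) if row[j] == 1}
            for j in range(columns)]


def is_laminar(sets):
    """True if every two sets are disjoint or one includes the other."""
    for a, b in combinations(sets, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


# The two example matrices (rows A, B, C, D, E; columns C1..C5).
LAMINAR = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
NOT_LAMINAR = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]
```

In the second matrix O5 = {B, C, E} meets O1 = {A, C} in C, yet neither set contains the other, so the check fails.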

Page 8: Proof of Theorem

Assume M has a perfect phylogeny. Consider the edges labeled Ci and Cj:
• If there is a root-to-leaf path containing both edges (C1, C2 in the figure), then Oi includes Oj or vice versa.
• Otherwise, Oi and Oj are disjoint (C1, C3 in the figure).

[Figure: the example tree over leaves A, B, C, D, E with edges labeled C1, C2, C3, C4, C5.]

Page 9: Proof of Theorem (cont.)

Assume that the collection {O1,…,Ok} is laminar. We prove by induction on the number of characters k that M has a perfect phylogenetic tree.

Basis: one character. There are at most two (distinct) objects, one with and one without this character.

   C1
A   1
B   0

[Figure: a root with two leaves, A and B; the edge to A is labeled C1.]

Page 10: Proof of Theorem (cont.)

Assume that the collection {O1,…,Ok} is laminar.

Induction step: assume correctness for k-1 characters. Consider a matrix with k characters (non-zero columns), and assume WLOG that O1 is not contained in Oj for any j > 1.
• S1 – the set of objects i for which Mi1 = 1.
• S2 – the remaining objects.

Claim: each character belongs to objects in S1 or S2, but not to both. (Why is this? By laminarity, each Oj is either disjoint from O1, and then lies in S2, or comparable with O1, and then is contained in O1 = S1 by the choice of O1.)

By induction there are trees T1 and T2 for S1 and S2. Attach the root of T1 below a new edge labeled C1 and join it with T2 at a common root.

   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   1  0  0  0  0

S1 = {A,C,E}, S2 = {B,D}

[Figure: a root with the subtree T1 hanging below an edge labeled C1, next to T2.]
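The induction step translates into a short recursion: pick a set that no other set contains, build the subtree for its objects below an edge with that label, and continue with the remaining objects. The following is a sketch under stated assumptions rather than the tutorial's algorithm: the input is assumed laminar with non-empty sets, a largest set is used as the maximal one, and the nested-list output format and the name `build` are choices made here.

```python
def build(objects, sets):
    """Return the children of a root whose leaves are `objects`.

    `sets` maps each character name to its non-empty object set and is
    assumed to be laminar. A child is either a leaf name or a pair
    (character, children) for a subtree below an edge with that label.
    """
    children = []
    remaining = set(objects)
    pending = dict(sets)
    while pending:
        # A largest set cannot be properly contained in any other set.
        top = max(pending, key=lambda name: len(pending[name]))
        s1 = pending.pop(top)
        # By laminarity, every other set is inside s1 or disjoint from it.
        inside = {name: s for name, s in pending.items() if s <= s1}
        for name in inside:
            del pending[name]
        children.append((top, build(s1, inside)))  # T1 below the edge `top`
        remaining -= s1                             # S2 is handled by the loop
    children.extend(sorted(remaining))              # objects with no character left
    return children


# Object sets of the matrix used in the induction step (S1 = {A, C, E}).
SETS = {
    "C1": {"A", "C", "E"},
    "C2": {"A", "C"},
    "C3": {"B", "D"},
    "C4": {"D"},
    "C5": {"C"},
}
TREE = build({"A", "B", "C", "D", "E"}, SETS)
```

For this input the first split is on C1, and the recursion then handles {A, C, E} and {B, D} separately, mirroring T1 and T2.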

Page 11: Efficient Implementation

1. Sort the columns (characters) according to decreasing binary value (each column is read top to bottom as a binary number).

Claim: If the binary value of column i is larger than that of column j, then Oi is not a proper subset of Oj.

Proof: If Oi were a proper subset of Oj, column j would have a 1 wherever column i has one, plus at least one more, so its binary value would be the larger one.

Before sorting:

   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0

After sorting:

   C2 C1 C3 C5 C4
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  1  0
D   0  0  1  0  1
E   1  0  0  0  0
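The sorting step can be sketched as follows. The code assumes each column is read top to bottom as a binary number with the first row as the most significant bit, which is the reading that reproduces the ordering C2, C1, C3, C5, C4 of the example. It uses Python's built-in comparison sort for brevity, so it illustrates the ordering only and does not achieve the linear-time bound that a radix sort would give; the function names are chosen here.

```python
def column_value(matrix, j):
    """Binary value of column j, first row = most significant bit."""
    value = 0
    for row in matrix:
        value = 2 * value + row[j]
    return value


def sort_columns(matrix, names):
    """Reorder columns by decreasing binary value; return (names, matrix)."""
    order = sorted(range(len(names)),
                   key=lambda j: column_value(matrix, j),
                   reverse=True)
    sorted_names = [names[j] for j in order]
    sorted_matrix = [[row[j] for j in order] for row in matrix]
    return sorted_names, sorted_matrix


# Example matrix (rows A, B, C, D, E).
NAMES = ["C1", "C2", "C3", "C4", "C5"]
MATRIX = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
SORTED_NAMES, SORTED_MATRIX = sort_columns(MATRIX, NAMES)
```

Here the column values are C1 = 20, C2 = 21, C3 = 10, C4 = 2 and C5 = 4, which gives the order shown in the sorted matrix.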

Page 12: Efficient Implementation (cont.)

2. Make a backwards linked list of the 1’s in each row (each 1 points to the nearest 1 to its left in the same row).

Claim: If the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. (Why is this?)

If the matrix is laminar, then these pointers define the inclusion hierarchy.

Laminar (all links leaving a column agree):

   C2 C1 C3 C5 C4
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  1  0
D   0  0  1  0  1
E   1  0  0  0  0

Not laminar (the links leaving C5 point at C1 in row C and at C3 in row E):

   C2 C1 C3 C5 C4
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  1  0
D   0  0  1  0  1
E   0  0  1  1  0
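The link test can be sketched in one pass over the sorted matrix. The code below assumes that a backwards link goes from each 1 to the nearest 1 to its left in the same row, with -1 standing for "no 1 to the left" (the root); it also assumes the columns are already sorted and non-zero. The name `link_targets` and the return convention (a list of targets, or `None` on a conflict) are choices made here.

```python
def link_targets(sorted_matrix):
    """Return, for each column, the single column its links point at.

    -1 means the links point at the root. Returns None if two links
    leaving the same column disagree, i.e. the columns are not laminar.
    """
    columns = len(sorted_matrix[0])
    target = [None] * columns  # None = no link seen for this column yet
    for row in sorted_matrix:
        previous = -1
        for j in range(columns):
            if row[j] == 1:
                if target[j] is None:
                    target[j] = previous
                elif target[j] != previous:
                    return None
                previous = j
    return target


# Sorted example matrices (columns C2, C1, C3, C5, C4; rows A..E).
LAMINAR = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
NOT_LAMINAR = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
```

For the laminar matrix the targets read as an inclusion hierarchy: C2 and C3 hang off the root, C1 lies inside C2, C5 inside C1, and C4 inside C3.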

Page 13: Efficient Implementation (cont.)

3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.

   C2 C1 C3 C5 C4
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  1  0
D   0  0  1  0  1
E   1  0  0  0  0

[Figure: the inclusion hierarchy of the columns (C2 contains C1, which contains C5; C3 contains C4) and the resulting tree over leaves A, B, C, D, E, with edges labeled C1 to C5 and vertices labeled by their character vectors.]
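Steps 3 and 4 can be sketched once the inclusion hierarchy is known. The code below assumes the hierarchy of the sorted example is given as a list of parent column indices (-1 for the root), names each internal vertex after the character that turns on just above it, and hangs every object below the vertex of its rightmost 1. These conventions and the function names are assumptions made for illustration.

```python
NAMES = ["C2", "C1", "C3", "C5", "C4"]   # sorted column order
OBJECTS = ["A", "B", "C", "D", "E"]
SORTED_MATRIX = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
# Assumed hierarchy: C1 inside C2, C5 inside C1, C4 inside C3.
PARENT = [-1, 0, -1, 1, 2]


def build_tree(objects, names, sorted_matrix, parent):
    """Return a dict vertex -> list of children (characters, then objects)."""
    tree = {"root": []}
    for name in names:
        tree[name] = []
    for j, p in enumerate(parent):
        tree["root" if p == -1 else names[p]].append(names[j])
    for obj, row in zip(objects, sorted_matrix):
        ones = [j for j, bit in enumerate(row) if bit == 1]
        # The rightmost 1 is the smallest set containing the object.
        tree[names[ones[-1]] if ones else "root"].append(obj)
    return tree


def ancestral_state(j, parent, columns):
    """Character vector (in sorted column order) of the vertex of column j."""
    state = [0] * columns
    while j != -1:
        state[j] = 1
        j = parent[j]
    return state


TREE = build_tree(OBJECTS, NAMES, SORTED_MATRIX, PARENT)
```

Each ancestral state is obtained by switching on the vertex's own character together with those of all its ancestors, so the root keeps the all-zero vector.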

Page 14: Efficient Implementation - Summary

1. Sort the columns (characters) according to decreasing binary value.
2. Make a backwards linked list of the 1’s in each row.
3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.

Complexity: O(mn) – use radix (bucket) sort in step 1.

Before sorting:

   C1 C2 C3 C4 C5
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  0  1
D   0  0  1  1  0
E   0  1  0  0  0

After sorting:

   C2 C1 C3 C5 C4
A   1  1  0  0  0
B   0  0  1  0  0
C   1  1  0  1  0
D   0  0  1  0  1
E   1  0  0  0  0