week 3: parsimony variants, compatibility, statistics and...

Post on 24-May-2020

19 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Week 3: Parsimony variants, compatibility, statistics andparsimony

Genome 570

January, 2010

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.1/46

Camin-Sokal parsimony

1 0 1

0

0 1 0

0 1

1

1

4 changes

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.2/46

Dollo parsimony

1 0 1

0

0 1 0

0 1

1

1

4 changes

(3 losses)

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.3/46

Polymorphism parsimony

1 0 1

0

0 1 0

0 1

1

1

5 retentions ofpolymorphism

01

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.4/46

nonadditive binary coding

0

1

2

3

4

original

new characters

0

0

0

0

0

a b

0

1

1

1

1

c

−1 1 0 0 0 0

d e

0 0 0

0

1

1

0

0

0

1

0

0

0

0

1

−1

0

1

2 3

4

a b

c d

e

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.5/46

Dollo parsimony – a paradox

−1

0

1

2 3

4 A B C D

3 32 4

y

x

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.6/46

Weighting using a function

Suppose that each step is to be weighted 1/(n + 2) if the character has atotal of n steps.

Then the total weight of steps when there are n steps in that character isn × 1/(n + 2) which is n/(n + 2)

Total steps in Weight of Total weightthe character each step of all steps

0 0 01 0.3333 0.33332 0.25 0.53 0.2 0.64 0.1667 0.66675 0.142857 0.71428

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.7/46

Successive weighting

J.S. Farris suggested (1969):

1. Infer a tree with unweighted parsimony

2. Calculate weights for each character (or site) based on the numberof changes in that character on that tree

3. Use those weights to infer a new tree

4. Unless the tree hasn’t changed, go back to step 2.

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.8/46

Successive weighting

There are 15 possible unrooted trees, which fall into 5 types according tohow many changes they have in each character. The table shows the totalweighted number of changes when each tree type is evaluated using theweights implied by the 5 different tree types.

number have pattern type tree type used for weightsof trees of changes: of tree I II III IV V

1 (1,1,1,2,2,1) I 2.333 2.250 2.167 2.417 2.0832 (1,2,1,2,2,1) II 2.667 2.500 2.500 2.667 2.3332 (2,1,2,2,2,1) III 3.000 2.917 2.667 2.917 2.5833 (2,2,2,1,1,1) IV 2.833 2.667 2.500 2.500 2.3337 (2,2,2,2,2,1) V 3.333 3.167 3.000 3.167 2.833

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.9/46

Ties in successive weighting

An example of successive weighting that would show the difficulty it has indetecting ties. The table shows the total weighted number of changeswhen each tree type is evaluated using the weights implied by the differenttree types.

number have pattern type tree type used for weightsof trees of changes: of tree I II III

1 (1,1,2,2) I 1.667 1.833 1.51 (2,2,1,1) II 1.833 1.667 1.51 (2,2,2,2) III 2.333 2.333 2

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.10/46

Nonsuccessive weighting

We can use a different algorithm to avoid the issue of dependence onstarting point:

Search over tree space in the usual way

For each tree, evaluate the number of steps in each character (orsite)

Calculate the weight for the character from its number of steps onthe current tree in the search.Add up the weighted steps across characters to evaluate that tree.

This is equivalent to looking only at the diagonals in the preceding tablesof trees. Weights are never based on a different tree.

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.11/46

Evaluating compatibility with 0/1 characters

E. O. Wilson (Systematic Zoology, 1965) pointed out that for two characters,if all four combinations 00, 01, 10 and 11 exist in one or another species,the two characters must be incompatible with each other, in the sense thatthere can be no tree on which they each change only once (so the derivedstate is uniquely derived).

Compatiblecharacters

0 10 X X1 X

Incompatiblecharacters

0 10 X X1 X X

The logic is straightforward: to originate a new combination requires astep in one character (or both). With four combinations there are 3 stepsrequired, one more than the number that would be needed if bothcharacters were incompatible.

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.12/46

Data example for compatibility

The data set Table 1.1 with an added species all of whose characters are0.

CharactersSpecies 1 2 3 4 5 6Alpha 1 0 0 1 1 0Beta 0 0 1 0 0 0Gamma 1 1 0 0 0 0Delta 1 1 0 1 1 1Epsilon 0 0 1 1 1 0Omega 0 0 0 0 0 0

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.13/46

The compatibility matrix for this data set

1 2 3 4 5 61 2 3 4 5 6

123456

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.14/46

The Pairwise Compatibility Theorem

A set S of characters has all pairs of characterscompatible with each other if and only if all ofthe characters in the set are jointly compatible (inthat there exists a tree with which all of them arecompatible).

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.15/46

The compatibility graph

Maximal cliques? Largest?

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.16/46

One of the cliques it contains

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.17/46

A maximal clique

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.18/46

The largest maximal clique

1

2 3

4

56

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.19/46

Making the tree by popping

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

4 5AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1Character 3

1 2 3 6101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.20/46

Making the tree by popping

Gamma

Delta

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

Beta

EpsilonAlpha

4 5AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1

Character 2

Character 3

1 2 3 6101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.21/46

Making the tree by popping

Gamma

DeltaGamma

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

Beta

EpsilonAlpha

Delta

Beta

EpsilonAlpha

4 5AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1

Character 2

Character 3

Character 6

1 2 3 6101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.22/46

Making the tree by popping

Alpha

Beta

Gamma

Delta

Epsilon

AlphaBeta

Gamma

DeltaEpsilon

Beta

EpsilonAlpha

Alpha

Gamma

Delta

Beta

Epsilon

Tree is:

Gamma

Delta

Gamma

Delta

Beta

EpsilonAlpha

4 5AlphaBetaGammaDeltaEpsilon

0011

10011

1

Character 1

Character 2

Character 3

Character 6

1 2 3 6101

01

11

00

0

1

1

0

00

00010

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.23/46

Unknown states and compatibility

A data set that has all pairs of characters compatible, but which cannothave all characters compatible with the same tree. This violates thePairwise Compatibility Theorem, owing to the unknown ("?") states.

Alpha 0 0 0Beta ? 0 1Gamma 1 ? 0Delta 0 1 ?Epsilon 1 1 1

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.24/46

Multiple character states and compatibility

Fitch’s set of nucleotide sequences that have each pair of sitescompatible, but which are not all compatible with the same tree.

Alpha A A ABeta A C CGamma C G CDelta C C GEpsilon G A G

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.25/46

Transition probabilities in a two-state model

In a symmetric two-state model with rate of change r per unit time, where

0 1rr we have Prob (1 | 0, t, r) = 1

2(1 − e

−2rt)

0.0 0.5 1.0 1.5 2.00.0

0.2

0.4

0.6

0.8

1.0

2 (1 1 − e

r t

r t

Pro

babi

lity

of c

hang

e

)− 2 r t

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.26/46

Likelihood and parsimony

L = Prob (Data|Tree)

=chars∏

i=1

recon−structions

(

12

B∏

j=1

{

ritj if this character changes1 − ritj if it does not change

)

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.27/46

approximating ...

For each character the likelihood is approximately

1

2

B∏

i=1

(ritj)nij .

For them all it is approximately

L ≃chars∏

i=1

1

2

branches∏

j=1

(ritj)nij

.

Tossing out the 1/2 terms and taking logarithms:

− ln L ≃chars∑

i=1

branches∑

j=1

nij (− ln (ritj)) .

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.28/46

Weights and likelihood

Probabilities of change and resulting weights for an imaginary case.

Character ri tj changes total weight

1 0.01 1 4.6052 0.01 2 9.2103 0.00001 1 11.519

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.29/46

with one-tenth as much change ...

Character ri tj changes total weight

1 0.001 1 6.9082 0.001 2 13.8163 0.000001 1 13.816

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.30/46

... and with one-tenth less again

Character ri tj changes total weight

1 0.0001 1 9.2102 0.0001 2 18.4213 0.0000001 1 16.118

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.31/46

Trees, steps, and patterns

0 0 0

1 1 1

1 1 1

1 1 1

1 1 1

0000

0001

0010

0100

0111

1000

1011

1101

1110

1111

1 1 1

1 1 1

1 2 2

1 2 2

0011

0101

0110

1001

1010

22 122 1

22 122 1

1100

1 1 1

111

0 0 0

a c

b d

a b

c d

a b

d c

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.32/46

Trees, steps, and patterns with DNA

AAAA

AAAC

AAAG

AAAT

AACA

AACG

AACT

AAGA

AAGC

AAGT

AATA

AATC

AATG

ACAA

ACAG

ACAT

ACCC

TTTT

...

0 0 0

1 1 1

1 1 1

1 1 1

1 1 11 1 1

2 2 2

2 2 2

1 1 1

2 2 2

2 2 2

1 1 1

2 2 2

2 2 2

1 1 1

2 2 2

2 2 2

AACC

AAGG

AATT

ACAC

ACCA

1 2 2

1 2 2

1 2 2

2 1 2

2 2 11 1 1

0 0 0

...

a c a b a b

b d c d d c

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.33/46

Parsimony and patterns

nxxyy + 2nxyxy + 2nxyyx = 2(nxxyy + nxyxy + nxyyx) − nxxyy

2nxxyy + nxyxy + 2nxyyx = 2(nxxyy + nxyxy + nxyyx) − nxyxy

2nxxyy + 2nxyxy + nxyyx = 2(nxxyy + nxyxy + nxyyx) − nxyyx.

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.34/46

The counterexample

p pq

q q

A

B

C

D

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.35/46

Pattern probabilities

If tip pattern is 1100, and internal nodes are 1, 1 the probability is

1

2(1 − p)(1 − q)(1 − q)pq

and in general summing over all four possibilities:

P1100 = 12

(

(1 − p)(1 − q)2pq + (1 − p)2(1 − q)2q

+ p2q3 + pq(1 − p)(1 − q)2)

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.36/46

Pattern probabilities

Pxxyy = (1 − p)(1 − q)[q(1 − q)(1 − p) + q(1 − q)p]

+ pq[(1 − q)2(1 − p) + q2p]

Pxyxy = (1 − p)q[q(1 − q)p + q(1 − q)(1 − p)]

+ p(1 − q)[p(1 − q)2 + (1 − p)q2]

Pxyyx = (1 − p)q[(1 − p)q2 + p(1 − q)2]

+ p(1 − q)[q(1 − q)p + q(1 − q)(1 − p)]

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.37/46

Taking differences

Pxyxy − Pxyyx = (1 − 2q)[

q2(1 − p)2 + (1 − q)2p2]

Which is always positive as long as q < 1/2 and either p or q ispositive. Thus Pxyxy > Pxyyx

To have Pxxyy be the largest of the three, we only need to know that

Pxxyy − Pxyxy > 0

and after a struggle that turns out to require

(1 − 2q)[

q(1 − q) − p2]

> 0

which (provided q < 1/2) is true if and only if

q(1 − q) > p2

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.38/46

Conditions for inconsistency

p

q0.1 0.2 0.3 0.4 0.50

0.1

0.2

0.3

0.4

0.5

0

consistent

inconsistent

p2< q (1−q)

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.39/46

Example for patterns with DNA

root

z

p

qq

w

p

q

1(C) 3(A)

4(A)2(C)

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.40/46

Calculating a pattern frequency

Prob [CCAA] = 118

(1 − p)(1 − q)2pq + 127

pq2(1 − p)(1 − q)

+ 1162

p2q2(1 − q) + 7972

p2q3

+ 112

(1 − p)2(1 − q)2q.

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.41/46

Pattern frequencies

Prob [xxyy] = (1 − p)2q(1 − q)2 + 23p(1 − p)q(1 − q)2

+ 49p(1 − p)q2(1 − q) + 2

27p2q2(1 − q)

+ 781

p2q3

Prob [xyxy] = 13(1 − p)2q2(1 − q) + 2

9p(1 − p)q2(1 − q)

+ 427

p(1 − p)q3 + 13p2(1 − q)3

+ 29p2q2(1 − q) + 2

81p2q3

Prob [xyyx] = 181

(1 − p)2q3 + 23p(1 − p)q(1 − q)2

+ 49p(1 − p)q3 + 1

9p2q(1 − q)2

+ 627

p2q2(1 − q) + 281

p2q3.

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.42/46

Conditions for inconsistency with DNA

p <−18 q + 24 q2 +

243 q − 567 q2 + 648 q3 − 288 q4

9 − 24 q + 32 q2

For small p and q this is approximately

1

3p2 < q

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.43/46

Conditions for inconsistency with DNA

0.7

0.6

0.5

0.4

0.3

0.2

0.1

00.2 0.4 0.60

q

p

inconsistent

consistent

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.44/46

Inconsistency with a clock

A B C D E A B C D E

x x

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.45/46

Example showing inconsistency with a clock

200

100

0100 200 500 1000 2000

sitestr

ees

corr

ect o

ut o

f 100

0

b

b

E

A

B

C

D

0.50

0.30

0.20−b

b = 0.03

b = 0.02

Week 3: Parsimony variants, compatibility, statistics andparsimony – p.46/46

top related