Uncorrelated and Autocorrelated Relaxed Phylogenetics

Michaël Defoin-Platel and Alexei Drummond
June 2008, bioinf.cs.auckland.ac.nz


Page 1: Uncorrelated and Autocorrelated relaxed phylogenetics

June 2008, bioinf.cs.auckland.ac.nz

Uncorrelated and Autocorrelated relaxed phylogenetics

Michaël Defoin-Platel and Alexei Drummond

Page 2: Uncorrelated and Autocorrelated relaxed phylogenetics

Relaxed Phylogenetics 2

(Bayesian) RELAXED PHYLOGENETICS

Relaxed Phylogenetics
• allows the co-estimation of divergence times together with a phylogenetic reconstruction
• should be compared with:

[Figure: tree with branches b1 to b5 and node times t0, t1, t2 along a time axis]

Unrooted (2n-3 parameters)

Rooted with a strict clock (n-1 divergence times)
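The counts quoted on this slide can be tabulated for any number of species. The following is a minimal sketch; the function name and dictionary keys are hypothetical, and the 2n-2 branch rates are the count given on the later relaxed clock slides.

```python
def parameter_counts(n_tips: int) -> dict:
    """Count time/length/rate parameters for a binary tree with n_tips species."""
    if n_tips < 3:
        raise ValueError("need at least 3 tips")
    return {
        # One length per branch of an unrooted binary tree.
        "unrooted_branch_lengths": 2 * n_tips - 3,
        # One age per internal node of a rooted binary tree.
        "strict_clock_divergence_times": n_tips - 1,
        # One rate per branch of a rooted binary tree (relaxed clock).
        "relaxed_clock_branch_rates": 2 * n_tips - 2,
    }


for n in (4, 7, 36):
    print(n, parameter_counts(n))
```

With n = 4 this gives 5 branch lengths and 3 divergence times, which matches the labels b1 to b5 and t0 to t2 in the slide's drawing.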

Page 3: Uncorrelated and Autocorrelated relaxed phylogenetics

TIME, SUBSTITUTIONS, and RATES

• Expected number of substitutions per site on a particular branch i:

$$ b_i(T) = \int_0^T R_i(t)\,dt $$

• Substitution rate R(t) cannot be directly observed!
→ Only the product of rate and time is identifiable
→ Without information external to the data, rate and time cannot be separated…

[Figure: branch i spanning time 0 to T]
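The identifiability problem can be checked numerically: different combinations of rate and time give the same expected number of substitutions. The sketch below integrates R(t) from 0 to T with the midpoint rule; the three rate functions and all numbers are hypothetical illustrations, not values from the talk.

```python
def branch_length(rate, T, steps=10_000):
    """Approximate b(T) = integral from 0 to T of R(t) dt with the midpoint rule."""
    dt = T / steps
    return sum(rate((k + 0.5) * dt) for k in range(steps)) * dt


# Hypothetical scenario 1: slow constant rate over a long time.
slow_and_long = branch_length(lambda t: 0.001, T=50.0)
# Hypothetical scenario 2: ten times the rate over a tenth of the time.
fast_and_short = branch_length(lambda t: 0.010, T=5.0)
# Hypothetical scenario 3: rate declining linearly from 0.002 to 0 over 50 time units.
declining = branch_length(lambda t: 0.002 * (1.0 - t / 50.0), T=50.0)

print(slow_and_long, fast_and_short, declining)
```

All three scenarios yield a branch length of about 0.05 substitutions per site, so sequence data alone cannot tell them apart.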

Page 4: Uncorrelated and Autocorrelated relaxed phylogenetics

MOLECULAR CLOCK HYPOTHESIS

Molecular Clock Hypothesis (MCH) (Zuckerkandl and Pauling 1965)
• DNA and protein sequences change at a rate that is constant over time
• First the substitution rate is estimated, then time corresponds to sequence divergence divided by the rate
→ Estimation of relative rates and relative divergence times

Calibration
• Time reference, scaling
• Bayesian phylogenetics: priors on node heights or on tips
→ Transform relative to absolute rates

$$ \frac{\text{time}_{\text{evaluated}}}{\text{divergence}_{\text{evaluated}}} = \frac{\text{time}_{\text{calibration}}}{\text{divergence}_{\text{calibration}}} $$
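The calibration proportion can be worked through in a few lines. This is a sketch under a strict clock with a single calibration; the function name and the numbers are hypothetical, and both divergences are assumed to be measured in the same way so that any constant factor cancels.

```python
def strict_clock_time(divergence, calib_divergence, calib_time):
    """Convert a sequence divergence to absolute time using one calibration."""
    # Rate implied by the calibration: divergence per unit of time.
    rate = calib_divergence / calib_time
    # Under a strict clock the same rate applies to the evaluated divergence.
    return divergence / rate


# Hypothetical calibration: a divergence of 0.032 corresponds to 16 time units.
estimated_time = strict_clock_time(divergence=0.012,
                                   calib_divergence=0.032,
                                   calib_time=16.0)
print(estimated_time)
```

The estimate scales linearly with the calibration time, so any error in the calibration propagates directly to every estimated date.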

Page 5: Uncorrelated and Autocorrelated relaxed phylogenetics

MOLECULAR CLOCK HYPOTHESIS

Substitution rate depends on
• Natural selection, population size, body mass, generation time, mutation rate, mutation pattern, …
→ MCH is often violated!

How to deal with non-clock-like data
• Keep them!
• Remove them!
• Relax the MCH
→ Allow the rate of evolution to vary
→ Make assumptions about the variations

Page 6: Uncorrelated and Autocorrelated relaxed phylogenetics

RELAXING THE MCH

Modeling the “rate of evolution of the rate of evolution”
• Sanderson “nonparametric” model
• (Random) Local Clock model
• Uncorrelated relaxed clock model
• Autocorrelated relaxed clock model
• Compound Poisson process

The implementation of relaxed clock models in Beast makes it possible to co-estimate
• the substitution parameters
• the clock parameters
• the ancestral phylogenies
• the demography
• …
→ Relaxed phylogenetics

Page 7: Uncorrelated and Autocorrelated relaxed phylogenetics

UNCORRELATED RELAXED CLOCK (UC) Drummond et al 2006

Hypothesis
• The rate of evolution is probably never exactly the same for all evolutionary lineages
• Rates follow a given distribution

Prior on rates

$$ r \sim \mathrm{LogNormal}(\mu, \sigma^2) \quad \text{or} \quad r \sim \mathrm{Exp}(\lambda) $$

→ Distribution of the rates given by the hyperparameters μ and σ², or λ

Page 8: Uncorrelated and Autocorrelated relaxed phylogenetics

UNCORRELATED RELAXED CLOCK (UC) Drummond et al 2006

Implementation
• Different rates in a tree
• But a constant rate per branch
• On a given rooted tree of n species: 2n-2 rates, n-1 divergence times
• The distribution is discretized
• Each branch of the tree is assigned a given rate category
• Category mixing: swapped, drawn (uniform), random walk

[Figure: rooted tree on tips 1 to 4 with node times t0, t1, t2 and branch rates r0 to r5; density of the relative rate r ~ LN(μ, σ²) for r from 0 to 10]
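The discretization and the "swapped" category move can be sketched as follows. This is an illustration under stated assumptions, not the Beast code: it assumes as many categories as branches, takes each category's rate at the midpoint quantile of the log-normal distribution, and all function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lognormal_rate_categories(mu, sigma, n_categories):
    """Midpoint-quantile rates of LogNormal(mu, sigma^2), one rate per category."""
    z = NormalDist()  # standard normal, used for its inverse CDF
    return [
        math.exp(mu + sigma * z.inv_cdf((k + 0.5) / n_categories))
        for k in range(n_categories)
    ]


def assign_categories(n_branches, rng):
    """Give every branch a distinct category by shuffling 0..n_branches-1."""
    cats = list(range(n_branches))
    rng.shuffle(cats)
    return cats


def swap_move(cats, rng):
    """One 'swapped' proposal: exchange the categories of two random branches."""
    i, j = rng.sample(range(len(cats)), 2)
    proposal = list(cats)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    return proposal


rng = random.Random(1)
n_tips = 4
n_branches = 2 * n_tips - 2  # 2n-2 rates on a rooted tree
rates = lognormal_rate_categories(mu=0.0, sigma=0.5, n_categories=n_branches)
cats = assign_categories(n_branches, rng)
branch_rates = [rates[c] for c in cats]
proposal = swap_move(cats, rng)
print([round(r, 3) for r in branch_rates])
```

Because the rates are fixed quantiles, changing μ or σ moves all branch rates at once, while the category moves only change which branch gets which rate.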

Page 9: Uncorrelated and Autocorrelated relaxed phylogenetics

AUTOCORRELATED RELAXED CLOCK (AC) Thorne and Kishino 1998, 2001, 2002

Hypothesis
• The rate is probably never exactly the same for all evolutionary lineages
• For closely related lineages the rates should be similar

Prior on rates
• The log of the rates follows a Normal distribution
• The expectation of a rate r is its ancestor rate rA

$$ \log(r) \mid \log(r_A) \sim N\left(\log(r_A) - \frac{t\sigma^2}{2},\; t\sigma^2\right) $$

→ Rate at the root node is given by a hyperparameter
→ Amount of variation is given by the hyperparameter σ²

[Figure: branch of duration t leading from the ancestor rate rA to the rate r]
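The autocorrelated prior can be simulated by walking down a tree and drawing each branch rate from a log-normal distribution centred on its parent's rate. The sketch below assumes the time-dependent form log(r) | log(rA) ~ N(log(rA) - tσ²/2, tσ²); the tree, branch durations, root rate and σ² are hypothetical.

```python
import math
import random


def draw_child_rate(parent_rate, t, sigma2, rng):
    """Draw r with log(r) ~ N(log(parent_rate) - t*sigma2/2, t*sigma2)."""
    mean = math.log(parent_rate) - t * sigma2 / 2.0
    sd = math.sqrt(t * sigma2)
    return math.exp(rng.gauss(mean, sd))


def simulate_rates(children, branch_time, root, root_rate, sigma2, rng):
    """Walk the tree from the root, drawing each rate from its parent's rate."""
    rates = {root: root_rate}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            rates[child] = draw_child_rate(
                rates[node], branch_time[child], sigma2, rng)
            stack.append(child)
    return rates


# Hypothetical 4-tip tree ((A,B),(C,D)) with made-up branch durations.
children = {"root": ["AB", "CD"], "AB": ["A", "B"], "CD": ["C", "D"]}
branch_time = {"AB": 2.0, "CD": 1.0, "A": 3.0, "B": 3.0, "C": 4.0, "D": 4.0}
rng = random.Random(42)
rates = simulate_rates(children, branch_time, "root", 0.001, 0.05, rng)

# Monte Carlo check that the expected child rate equals the ancestor rate.
draws = [draw_child_rate(0.001, 2.0, 0.05, rng) for _ in range(100_000)]
mean_draw = sum(draws) / len(draws)
print(mean_draw)
```

The -tσ²/2 term in the mean is what keeps the expectation of r equal to rA; without it the expected rate would drift upward along every branch.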

Page 10: Uncorrelated and Autocorrelated relaxed phylogenetics

AUTOCORRELATED RELAXED CLOCK (AC) Thorne and Kishino 1998, 2001, 2002

Implementation
• Different rates in a tree
• But a constant rate per branch
• On a given rooted tree of n species: 2n-2 rates, n-1 divergence times

Episodic vs Time dependent
• Episodic: variance = σ²
• Time dependent: variance = tσ²

[Figure: rooted tree on tips 1 to 4 with node times t0, t1, t2 and branch rates r0 to r5]
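The practical difference between the two variants shows up in how much variance of log(rate) accumulates between the root and a tip. The sketch below assumes independent normal increments per branch, so that variances add along a path; the paths and the value of σ² are hypothetical.

```python
def log_rate_variance(branch_times, sigma2, episodic):
    """Variance of log(rate) accumulated along a root-to-tip path of branches."""
    if episodic:
        # Each branch contributes sigma2, whatever its duration.
        return sigma2 * len(branch_times)
    # Each branch contributes t * sigma2.
    return sigma2 * sum(branch_times)


# Two hypothetical root-to-tip paths with the same total duration.
few_long = [5.0, 5.0]
many_short = [1.0] * 10

for path in (few_long, many_short):
    print(log_rate_variance(path, 0.02, episodic=False),
          log_rate_variance(path, 0.02, episodic=True))
```

Under the time-dependent variant both paths accumulate the same variance, whereas under the episodic variant the path with more branching events accumulates more.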

Page 11: Uncorrelated and Autocorrelated relaxed phylogenetics

GOALS of this TALK

Validation of the model implementations

Comparison of models
• Fit the data
• Deal with calibrations
• Estimate of divergence times
• Estimate of rates
• Reconstruct the tree topology

Page 12: Uncorrelated and Autocorrelated relaxed phylogenetics

PHYLOGENETIC ANALYSIS

Dataset 1: Lemurs (Yoder et al 2000)
• 36 species (lemurs + mammals outgroup)
• alignment of 1812 nucleotides (2 genes)
• 7 calibration points

Settings
• HKY substitution model + gamma rate heterogeneity
• Yule tree prior
• 4 independent runs of 20 M steps of MCMC for each setting

Page 13: Uncorrelated and Autocorrelated relaxed phylogenetics

PHYLOGENETIC ANALYSIS

Dataset 2: Primates (Peter Waddell)
• 7 species of primates: human, chimp, gorilla, orangutan, gibbon, macaque and marmoset
• alignment of 1,362,261 nucleotides
• Non-coding regions
• calibration: 16 MYA divergence time of human-orangutan

Settings
• GTR substitution model + gamma rate heterogeneity + invariant sites
• Coalescent or Yule tree prior
• 4 independent runs of 50 M steps of MCMC for each setting

Page 14: Uncorrelated and Autocorrelated relaxed phylogenetics

PHYLOGENETIC ANALYSIS

Dataset 3: Yeast (Rokas et al 2003)
• 8 species of yeast
• alignment of 127,026 nucleotides (106 genes)
• calibration: Normal prior on the root height N(1, 0.025)

Settings
• GTR substitution model + gamma rate heterogeneity + invariant sites
• Yule tree prior
• 4 independent runs of 50 M steps of MCMC for each setting

Page 15: Uncorrelated and Autocorrelated relaxed phylogenetics

PHYLOGENETIC ANALYSIS

Dataset 4: Dengue (Rambaut 2000)
• 17 serotype 4 sequences
• alignment of 1,485 nucleotides
• serial sampling (1956-1994)

Settings
• HKY substitution model
• Coalescent tree prior
• 4 independent runs of 10 M steps of MCMC for each setting

Page 16: Uncorrelated and Autocorrelated relaxed phylogenetics

PHYLOGENETIC ANALYSIS

Dataset 5: Influenza A virus (Drummond et al 2006)
• 69 sequences
• each sequence represents a consensus of the viral population
• alignment of 98 nucleotides
• serial sampling (1981-1998)

Settings
• HKY substitution model + gamma rate heterogeneity
• Coalescent tree prior
• Constant population size
• 4 independent runs of 20 M steps of MCMC for each setting

Page 17: Uncorrelated and Autocorrelated relaxed phylogenetics

MODEL COMPARISON

Bayes Factor (Kass and Raftery 1995, Marc Suchard 2005)
• Quantifies the real support of two competing hypotheses given the observed data
→ Ratio of the marginal likelihoods of two models M1 and M2
→ Bayesian analogue of the likelihood ratio test (LRT)

$$ K = \frac{\Pr(D \mid M_1)}{\Pr(D \mid M_2)} $$

Page 18: Uncorrelated and Autocorrelated relaxed phylogenetics

MARGINAL LOG LIKELIHOOD

            SC              UC              AC              eAC
Lemurs      -31 524.7       -31 349.3       -31 355.4       -31 352.3
Primates    -3 090 089.90   -3 089 592.76   -3 089 591.72   -3 089 591.37
Yeast       -684 380.8      -683 754.6      -683 754.4      -683 754.6
Dengue      -3 861.7        -3 861.5        -3 861.9        -3 861.7
Influenza   -4 288.8        -4 263.9        -4 272.1        -4 275.7

A priori

            Clock-like   Correlated   Calibrations
Lemurs      No           ?            7 internal (hard)
Primates    Nearly       Yes          1 internal (soft)
Yeast       No           ?            root node (soft)
Dengue      Yes          Yes          Serial Sampling
Influenza   No           No           Serial Sampling
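On the log scale, the Bayes factor between two models is the difference of their marginal log likelihoods. The sketch below applies this to the values in the marginal log likelihood table, assuming they are natural logarithms; the function names are hypothetical and the evidence labels follow the 2 ln K scale of Kass and Raftery (1995).

```python
# Marginal log likelihoods copied from the table (assumed to be natural logs).
MARGINAL_LOG_LIK = {
    "Lemurs":    {"SC": -31524.7, "UC": -31349.3,
                  "AC": -31355.4, "eAC": -31352.3},
    "Primates":  {"SC": -3090089.90, "UC": -3089592.76,
                  "AC": -3089591.72, "eAC": -3089591.37},
    "Yeast":     {"SC": -684380.8, "UC": -683754.6,
                  "AC": -683754.4, "eAC": -683754.6},
    "Dengue":    {"SC": -3861.7, "UC": -3861.5,
                  "AC": -3861.9, "eAC": -3861.7},
    "Influenza": {"SC": -4288.8, "UC": -4263.9,
                  "AC": -4272.1, "eAC": -4275.7},
}


def log_bayes_factor(log_ml_1, log_ml_2):
    """ln K for model 1 against model 2, from their marginal log likelihoods."""
    return log_ml_1 - log_ml_2


def kass_raftery_label(log_k):
    """Evidence category of Kass and Raftery (1995) on the 2 ln K scale."""
    two_ln_k = 2.0 * log_k
    if two_ln_k < 0:
        return "favours model 2"
    if two_ln_k <= 2:
        return "not worth more than a bare mention"
    if two_ln_k <= 6:
        return "positive"
    if two_ln_k <= 10:
        return "strong"
    return "very strong"


for dataset, ml in MARGINAL_LOG_LIK.items():
    best = max(ml, key=ml.get)  # model with the highest marginal likelihood
    ln_k = log_bayes_factor(ml["UC"], ml["SC"])
    print(f"{dataset}: best={best}, ln K (UC vs SC)={ln_k:.1f}, "
          f"{kass_raftery_label(ln_k)}")
```

Read this way, the relaxed clocks are strongly preferred over the strict clock for every dataset except Dengue, where the four models are nearly indistinguishable.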

Page 19: Uncorrelated and Autocorrelated relaxed phylogenetics

Influenza dataset: consensus trees

[Figure: consensus trees under the Uncorrelated and AutoCorrelated models]

Page 20: Uncorrelated and Autocorrelated relaxed phylogenetics

DIVERGENCE TIMES

[Figure: three rows of panels, one panel per dataset: Lemurs, Primates, Yeast, Dengue, Influenza]

Page 21: Uncorrelated and Autocorrelated relaxed phylogenetics

DIVERGENCE TIMES

Beast: mean of the posterior distributions, error bars are 95% lower and upper HPDs
Glazko et al: error bars are +/- standard error

[Figure: Posterior distribution of the root height. x-axis: root height (20 to 93); y-axis: marginal density (0 to 0.07); series: UC+Coalescent, UC+Yule, AC+Coalescent, AC+Yule]

[Figure: Divergence times of Human from other Primates. x-axis: Chimp, Gorilla, Orangutan, Gibbon, OWM, NWM; y-axis: MYA (0 to 100); series: Table 5 Glazko et al (2003), UC + Coalescent, UC + Yule, AC + Coalescent, AC + Yule]
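The 95% HPD error bars can be computed from MCMC samples as the shortest interval that contains 95% of them. The following is a minimal sketch that assumes a unimodal posterior; the samples are synthetic stand-ins for a root height trace, not values from the talk.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


rng = random.Random(0)
# Hypothetical right-skewed "root height" samples.
samples = [20.0 + rng.gammavariate(4.0, 5.0) for _ in range(20_000)]
lower, upper = hpd_interval(samples)
posterior_mean = sum(samples) / len(samples)
print(round(posterior_mean, 1), round(lower, 1), round(upper, 1))
```

For a skewed posterior the HPD interval is not symmetric around the mean, which is why such error bars can look lopsided.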

Page 22: Uncorrelated and Autocorrelated relaxed phylogenetics

DIVERGENCE TIMES

[Figures: Uncorrelated Relaxed Clock and Autocorrelated Relaxed Clock. x-axis: Mya; y-axis: branch rate (8.00E-04 to 1.20E-03); lineages: Human, Chimp, Gorilla, Orang, Gibbon, Macaque, Marmoset]

Page 23: Uncorrelated and Autocorrelated relaxed phylogenetics

RATE OF EVOLUTION

                 Mean      External   Coefficient    Coefficient
                 Rate      Rate       of Variation   of Correlation
Lemurs     SC    0.00297              -              -
           UC    0.00309   0.00357    0.39           0.01
           AC    0.00325   0.00419    0.37           0.88
           eAC   0.00325   0.00472    0.49           0.88
Primates   SC    0.00095              -              -
           UC    0.00098   0.00099    0.12           -0.14
           AC    0.00105   0.00100    0.11           0.56
           eAC   0.00104   0.00099    0.11           0.74
Yeast      SC    1.03                 -              -
           UC    0.87      0.83       0.46           -0.13
           AC    0.83      0.79       0.37           0.19
           eAC   0.90      0.98       0.44           0.33

Page 24: Uncorrelated and Autocorrelated relaxed phylogenetics

RATE OF EVOLUTION

                 Mean      External   Coefficient    Coefficient
                 Rate      Rate       of Variation   of Correlation
Dengue     SC    0.00080              -              -
           UC    0.00081   0.00082    0.06           -0.03
           AC    0.00079   0.00080    0.06           0.69
           eAC   0.00079   0.00081    0.05           0.69
Influenza  SC    0.0048               -              -
           UC    0.0050    0.0061     0.58           -0.01
           AC    0.0050    0.0052     0.37           0.87
           eAC   0.0045    0.0052     0.38           0.89
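The two summary statistics in these tables can be illustrated on a small tree. The sketch below rests on assumptions that the slides do not spell out: the coefficient of variation is taken as the standard deviation of the branch rates divided by their mean, and the coefficient of correlation as the Pearson correlation between each branch rate and the rate of its parent branch. The tree and rates are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(rates):
    """Standard deviation of the branch rates divided by their mean."""
    return pstdev(rates) / mean(rates)


def parent_child_correlation(rates, parent):
    """Pearson correlation between each branch rate and its parent's rate."""
    # Branches attached to the root have no parent branch and are skipped.
    pairs = [(rates[parent[b]], rates[b])
             for b in rates if parent.get(b) in rates]
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical relative rates on the 6 branches of the tree ((A,B),(C,D)).
rates = {"AB": 1.2, "CD": 0.8, "A": 1.3, "B": 1.1, "C": 0.7, "D": 0.9}
parent = {"A": "AB", "B": "AB", "C": "CD", "D": "CD"}

cv = coefficient_of_variation(list(rates.values()))
corr = parent_child_correlation(rates, parent)
print(round(cv, 3), round(corr, 3))
```

Under these definitions, a coefficient of variation near zero indicates clock-like data, and a correlation near zero indicates that rates are not inherited from parent to child branches.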

Page 25: Uncorrelated and Autocorrelated relaxed phylogenetics


RATE OF EVOLUTION

Page 26: Uncorrelated and Autocorrelated relaxed phylogenetics


RATE OF EVOLUTION

Page 27: Uncorrelated and Autocorrelated relaxed phylogenetics

GENES RATE VS SPECIES RATE

[Figure: Mean rate per “locus”; panels: Primates, Yeast]

Page 28: Uncorrelated and Autocorrelated relaxed phylogenetics

NAÏVE MULTIPLE LOCUS APPROACH

Super Matrix (SM)
→ Genes share the same divergence time

Multiple Locus (mL)
→ Perform a relaxed phylogenetic analysis for each “gene”

                 SC              UC              AC              eAC
Yeast (SM)       -684 380.8      -683 754.6      -683 754.4      -683 754.6
Yeast (mL)       -672 854.3      -672 135.5      -672 115.8      -672 128.86
Primates (SM)    -3 090 089.90   -3 089 592.76   -3 089 591.72   -3 089 591.37
Primates (mL)    -3 078 315.48   -3 077 756.50   -3 077 784.95   -3 078 136.58

Page 29: Uncorrelated and Autocorrelated relaxed phylogenetics

GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES

Page 30: Uncorrelated and Autocorrelated relaxed phylogenetics

GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES

Root Height in the primates dataset

        Genome            Multiple Locus
        Mean     Error    Mean     Error
SC      56.91    0.04     57.91    0.51
UC      55.7     0.61     55.47    0.60
AC      49.7     0.08     51.52    0.39
eAC     51.06    0.58     54.9     0.47

Page 31: Uncorrelated and Autocorrelated relaxed phylogenetics

GENES RATE VS SPECIES RATE

                 Coefficient of Variation         Coefficient of Correlation
                 Super Matrix   Multiple Locus    Super Matrix   Multiple Locus
Yeast      UC    0.46           0.75              -0.13          -0.07
           AC    0.37           0.71              0.19           0.39
           eAC   0.44           0.77              0.33           0.34
Primates   UC    0.12           0.16              -0.14          -0.08
           AC    0.11           0.10              0.56           0.44
           eAC   0.11           0.03              0.74           0.49

Page 32: Uncorrelated and Autocorrelated relaxed phylogenetics

GENES TREE VS SPECIES TREE

                 % True Tree in   Size of        True Tree
                 95% Cred Set     95% Cred Set   Posterior
Yeast      SC    64.7             2.9            25.4
           UC    92.4             24.7           20.6
           AC    88.6             17.8           15.7
           eAC   88.6             15.1           19.1
Primates   SC    86.7             1.1            79.4
           UC    87.5             1.3            75.7
           AC    87.5             1.2            77.7
           eAC   87.5             1.1            79.1
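A 95% credible set of trees can be built from the sampled topologies by adding them in order of decreasing posterior frequency until 95% of the samples are covered. The sketch below uses hypothetical topology labels and counts; it shows one common construction, not necessarily the exact procedure behind the table.

```python
from collections import Counter


def credible_set(sampled_topologies, mass=0.95):
    """Smallest set of most frequent topologies covering `mass` of the samples."""
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    chosen, covered = [], 0
    for topology, count in counts.most_common():
        if covered >= mass * total:
            break
        chosen.append(topology)
        covered += count
    posterior = {t: c / total for t, c in counts.items()}
    return chosen, posterior


# Hypothetical posterior sample of 1000 trees over five topologies.
sample = (["T1"] * 700 + ["T2"] * 200 + ["T3"] * 60
          + ["T4"] * 30 + ["T5"] * 10)
cred_set, posterior = credible_set(sample)
true_tree = "T2"  # hypothetical reference topology
print(len(cred_set), true_tree in cred_set, posterior[true_tree])
```

The three printed quantities correspond to the table's columns for a single analysis: the size of the credible set, whether the reference tree falls inside it, and the posterior probability of that tree.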

Page 33: Uncorrelated and Autocorrelated relaxed phylogenetics

GENES TREE VS SPECIES TREE

Page 34: Uncorrelated and Autocorrelated relaxed phylogenetics

Conclusions

Validation of the implementation in Beast

Model comparison
• Fit the data
• Uncorrelated vs Autocorrelated: prior knowledge
• Calibrations
• Estimate of rates
• Disagree in the multiple locus approach
• Reconstruct the tree topology

Page 35: Uncorrelated and Autocorrelated relaxed phylogenetics


THANKS