Talk at Evolution 2014 -- Raleigh NC USA

Guenomu – a Bayesian supertree program for species tree reconstruction
Leonardo de Oliveira Martins, Diego Mallo, David Posada

Upload: leonardo-de-oliveira-martins

Post on 04-Dec-2014

DESCRIPTION

Talk given on 23/6/2014 at the Evolution 2014 meeting, describing new results with guenomu on a wide range of simulation scenarios. The talk can be seen on YouTube: https://www.youtube.com/watch?v=gWnecpEZH0Q

TRANSCRIPT

Pages 1-2: Talk at Evolution 2014 -- Raleigh NC USA

Guenomu – a Bayesian supertree program for species tree reconstruction

Leonardo de Oliveira Martins, Diego Mallo, David Posada

With help from Klaus Schliep, Mateus Patricio and Nicolas Lartillot

Page 3:

Model for the evolution of gene families

[Diagram: species tree S, gene trees G1, G2, ..., Gn and their data sets D1, D2, ..., Dn]

Pages 4-7:

Model for the evolution of gene families

Our assumption: we just need to consider the simplest explanation for the difference between the gene and species trees

● we may use several such simple explanations

● work with unrooted gene trees

[Diagram: D1, G1, S]

P(G|S): distance between G and S

Steel and Rodrigo 2008 (doi:10.1080/10635150802033014)
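A minimal sketch of how a distance can be turned into P(G|S), assuming the exponential form of Steel and Rodrigo (2008), where P(G|S) is proportional to exp(-λ d(G,S)), and assuming that "several such simple explanations" are combined as a penalty-weighted sum of distances. The function name, the dictionary keys and the numeric values are hypothetical and are not taken from guenomu.

```python
def log_prob_gene_tree(distances, penalties):
    """Unnormalized log P(G|S): minus the penalty-weighted sum of distances.

    Both arguments are dicts keyed by distance name. The names used below
    ("dup", "loss", "rf") are illustrative only, not guenomu's own.
    """
    if distances.keys() != penalties.keys():
        raise ValueError("distances and penalties must use the same names")
    return -sum(penalties[name] * distances[name] for name in distances)


# Made-up values, for illustration only.
penalties = {"dup": 0.5, "loss": 0.5, "rf": 1.0}
close_tree = {"dup": 0, "loss": 0, "rf": 0}   # gene tree identical to S
far_tree = {"dup": 1, "loss": 3, "rf": 2}     # gene tree that disagrees with S

print(log_prob_gene_tree(close_tree, penalties))
print(log_prob_gene_tree(far_tree, penalties))
```

Under these assumptions a gene tree that is farther from the species tree always receives a lower unnormalized log-probability, and a larger penalty makes that drop steeper.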

Pages 8-9:

Quantifying the disagreement

[Diagram: gene tree, species tree, reconciliation]

assuming deepcoal: 1 deepcoal

assuming duplosses: 1 dup, 3 losses

assuming HGT: 1 event

Stochastic error/nonparametric (RF distance etc.)

Must work with mul-trees
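Of the measures on this slide, the RF (Robinson-Foulds) distance is the simplest to sketch. The example below assumes trees written as nested Python tuples of leaf names and, unlike the mul-tree case the slide calls for, assumes every leaf label appears exactly once. The function names are hypothetical; this is not guenomu's implementation.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of the tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the result does not depend on where the nested
    tuple happens to be rooted.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Skip trivial splits (a single leaf against everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


gene_tree = ((("A", "B"), "C"), ("D", "E"))
species_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(gene_tree, species_tree))
```

Because splits are compared rather than rooted clades, re-rooting either tuple leaves the distance unchanged, which matches the choice to work with unrooted gene trees.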

Pages 10-14:

Distribution of gene trees: probabilistic model

[Diagram: species tree S; gene trees G1, ..., Gn with data sets D1, ..., Dn and parameters θ1, ..., θn; per-gene parameters λdup1, ..., λdupn, λloss1, ..., λlossn, λRF1, ..., λRFn, ..., together with λdupprior, λlossprior, λRFprior, ...]

Pages 15-17:

guenomu algorithm

[Diagram: S; G1, ..., Gn; λdup1, ..., λdupn, λloss1, ..., λlossn, λRF1, ..., λRFn, ... with λdupprior, λlossprior, λRFprior, ...; labels: Importance Sampling, Input, Output]

Importance sampling, so we can use complex, state-of-the-art software for phylogenetic inference

Do not rely on single estimates of gene phylogenies
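The slides name importance sampling without giving details. A minimal sketch of one way such a step could work, under two assumptions that are not stated in the talk: an external phylogenetics program supplies, for each gene family, a sample of trees with posterior frequencies, and each sampled tree is reweighted by exp(-λ d), where d is its distance to the current species tree. The function name and all values are hypothetical.

```python
import math


def reweight_gene_trees(frequencies, distances, penalty):
    """Self-normalized importance weights for one gene family.

    frequencies: posterior frequency of each sampled tree (all > 0),
                 assumed to come from an external program.
    distances:   distance of each sampled tree to the current species tree.
    penalty:     the lambda that scales the distance (assumed exponential form).
    """
    log_w = [math.log(f) - penalty * d for f, d in zip(frequencies, distances)]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


# Made-up sample: the most frequent tree is also the farthest from S.
frequencies = [0.5, 0.3, 0.2]
distances = [4, 1, 0]
print(reweight_gene_trees(frequencies, distances, penalty=1.0))
```

With a penalty of zero the weights fall back to the input frequencies; as the penalty grows, trees closer to the species tree dominate, so no single input tree estimate has to be trusted on its own.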

Pages 18-20:

Simulation scenarios

1 – SimPhy (Mallo et al. in prep)

2 – generation of tree uncertainty

Rasmussen, Kellis 2012 (doi:10.1101/gr.123901.111)

Pages 21-25:

Analysis of simulated data sets

● Distance matrix methods: Helmkamp et al. 2012 (doi:10.1089/cmb.2012.0042)

● Gene Tree Parsimony: Chaudhary et al. 2010 (doi:10.1186/1471-2105-11-574)

● guenomu

Page 26:

Species tree accuracy distributions per method

Pages 27-29:

Species tree accuracy by simulation model

~7k samples under the coalescent

~15k under pure-birth

[Figure legend, accuracy bins: 0 ~ 0.2, 0.2 ~ 0.4, 0.4 ~ 0.6, 0.6 ~ 0.8, 0.8 ~ 1]

Pages 30-32:

Gene family tree reconstruction accuracy

[Figure panels: input, output]

Page 33:

Thank you!

http://bitbucket.org/leomrtns/guenomu

Page 34:

Species tree accuracy distributions, per simulation

Page 35:

Species tree accuracy distribution – mulRF distance only

Page 36: