Talk at Evolution 2014, Raleigh, NC, USA
DESCRIPTION
Talk given on 23 June 2014 at the Evolution 2014 meeting, describing new results with guenomu on a wide range of simulation scenarios. The talk can be seen on YouTube: https://www.youtube.com/watch?v=gWnecpEZH0Q
TRANSCRIPT
Guenomu – a Bayesian supertree program for species tree reconstruction
Leonardo de Oliveira Martins
Diego Mallo
David Posada
With help from Klaus Schliep, Mateus Patricio and Nicolas Lartillot
Model for the evolution of gene families
[Diagram: D1, D2, ..., Dn, each paired with its gene tree G1, G2, ..., Gn, all connected to a single species tree S]
Our assumption: we just need to consider the simplest explanation for the difference between the gene and species trees
● we may use several such simple explanations
● work with unrooted gene trees
[Diagram: D1, G1 and S; the link P(G | S) is labelled "distance between G and S"]
Steel and Rodrigo 2008 (doi:10.1080/10635150802033014)
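The slide ties P(G | S) to a distance between the gene tree and the species tree. A minimal sketch of that idea, assuming the exponential form used by Steel and Rodrigo (probability decaying as exp(-penalty × distance)); the function names and the `penalty` parameter are hypothetical and are not taken from guenomu:

```python
import math


def log_prob_gene_given_species(distance, penalty=1.0):
    """Unnormalised log P(G | S): a larger distance gives a lower probability."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return -penalty * distance


def relative_probabilities(distances, penalty=1.0):
    """Normalise exp(-penalty * d) over a finite list of candidate gene trees.

    Assumption: the candidates listed are the only gene trees considered,
    so the normalising constant is just the sum over this list.
    """
    logs = [log_prob_gene_given_species(d, penalty) for d in distances]
    top = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]
```

With distances 0, 1 and 2 and a penalty of 1, each extra unit of distance divides the relative probability by e.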
Quantifying the disagreement
[Diagram: a gene tree, a species tree, and their reconciliation]
● assuming deep coalescence: 1 deep coalescence
● assuming duplications and losses: 1 duplication, 3 losses
● assuming HGT: 1 event
● stochastic error/nonparametric (RF distance etc.)
Must work with mul-trees
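As an illustration of the nonparametric option named on the slide, here is a minimal sketch of the Robinson-Foulds (RF) distance between two unrooted trees. The nested-tuple tree representation is an assumption made for this example, and the sketch handles only singly-labelled trees; it does not cover the mul-tree case that the slide says is required.

```python
def leaf_set(tree):
    """Return the set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of the tree, ignoring the root position."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # each split is stored as the side without this leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # keep only splits with at least two leaves on each side
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))
```

Because each split is stored relative to a fixed anchor leaf, two different rootings of the same unrooted tree give a distance of zero.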
Distribution of gene trees: probabilistic model
[Diagram: species tree S; D1, ..., Dn with gene trees G1, ..., Gn and parameters θ1, ..., θn; per-gene parameters λdup1, ..., λdupn, λloss1, ..., λlossn, λRF1, ..., λRFn, ...; shared λdupprior, λlossprior, λRFprior, ...]
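One possible reading of this diagram, written as a sketch rather than as guenomu's actual likelihood: each gene family has its own penalty per distance type (dup, loss, RF), each penalty acts on the corresponding distance to S, and the penalties of one type share a prior. The exponential prior, the dictionary layout, and all function names below are assumptions made for illustration, and normalising constants of the distance model are ignored.

```python
import math


def log_exponential_density(value, rate):
    """Log density of an exponential distribution with the given rate."""
    if value < 0 or rate <= 0:
        raise ValueError("value must be >= 0 and rate must be > 0")
    return math.log(rate) - rate * value


def gene_family_log_score(distances, penalties, prior_rates):
    """Unnormalised log score of one gene tree given the species tree.

    distances:   distance type -> d(G, S), e.g. {"dup": 1, "loss": 3, "RF": 2}
    penalties:   distance type -> this gene family's lambda for that type
    prior_rates: distance type -> rate of the shared (assumed exponential) prior
    """
    score = 0.0
    for name, distance in distances.items():
        lam = penalties[name]
        score += -lam * distance  # exponential penalty on this distance
        score += log_exponential_density(lam, prior_rates[name])  # shared prior
    return score
```

Summing this score over all gene families gives a joint log score in which the shared prior rates couple the per-gene penalties.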
guenomu algorithm
[Diagram: species tree S and gene trees G1, ..., Gn with the parameters λdup1, ..., λdupn, λdupprior, λloss1, ..., λlossn, λlossprior, λRF1, ..., λRFn, λRFprior, ...; successive builds label the Input and the Output]
● Importance sampling, so we can use complex, state-of-the-art software for phylogenetic inference
● Do not rely on single estimates of gene phylogenies
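The importance-sampling idea can be sketched generically: gene trees sampled beforehand by external phylogenetic software are reweighted according to how well they agree with a candidate species tree. This is a textbook illustration under stated assumptions (weights proportional to exp(-penalty × distance), trees supplied as opaque labels), not a description of guenomu's implementation; every name below is hypothetical.

```python
import math
import random


def importance_weights(distances_to_species_tree, penalty=1.0):
    """Normalised weights proportional to exp(-penalty * d) for each sampled tree."""
    logs = [-penalty * d for d in distances_to_species_tree]
    top = max(logs)  # subtract the maximum for numerical stability
    raw = [math.exp(value - top) for value in logs]
    total = sum(raw)
    return [w / total for w in raw]


def effective_sample_size(weights):
    """Kish effective sample size for normalised weights."""
    return 1.0 / sum(w * w for w in weights)


def resample(trees, weights, size, seed=0):
    """Draw gene trees with replacement according to their importance weights."""
    rng = random.Random(seed)
    return rng.choices(trees, weights=weights, k=size)
```

A low effective sample size signals that only a few of the pre-sampled gene trees are compatible with the candidate species tree, which is one reason to start from a whole sample of gene trees rather than a single estimate.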
Simulation scenarios
1 – SimPhy (Mallo et al. in prep)
2 – generation of tree uncertainty
Rasmussen and Kellis 2012 (doi:10.1101/gr.123901.111)
Analysis of simulated data sets
● Distance matrix methods: Helmkamp et al. 2012 (doi:10.1089/cmb.2012.0042)
● Gene Tree Parsimony: Chaudhary et al. 2010 (doi:10.1186/1471-2105-11-574)
● guenomu
Species tree accuracy distributions per method
Species tree accuracy by simulation model
~7k samples under the coalescent
~15k under pure-birth
[Figure legend bins: 0 to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8, 0.8 to 1]
Gene family tree reconstruction accuracy
[Figure: panels labelled "input" and "output"]
Thank you!
http://bitbucket.org/leomrtns/guenomu
Species tree accuracy distributions, per simulation
Species tree accuracy distribution – mulRF distance only