Bayesian II Spring 2010

Upload: dayna-ross

Post on 20-Jan-2018



Major Issues in Phylogenetic BI

• Have we reached convergence?
• If so, do we have a large enough sample of the posterior?


Have we reached convergence?

• Look at the trace plots of the posterior probability (not reliable)

• Convergence diagnostics:
  – Average standard deviation of the split frequencies: compares the node frequencies between the independent runs (the closer to zero, the better, but it is not clear how small it should be)
  – Potential scale reduction factor (PSRF): the closer to one, the better
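The split-frequency diagnostic can be illustrated with a small Python sketch. It is not MrBayes's own code. The function name, the dictionary input format, the use of the population standard deviation, and the `min_freq` cutoff for ignoring rare splits are all assumptions made for illustration.

```python
from statistics import pstdev


def average_sd_split_freqs(runs, min_freq=0.1):
    """Average, over splits, of the standard deviation of each split's
    frequency across independent runs.

    runs     -- list of dicts, one per run, mapping a split label to the
                fraction of sampled trees in that run containing the split
    min_freq -- hypothetical cutoff: splits that are rare in every run
                are ignored (an assumption, not a documented default)
    """
    all_splits = set().union(*runs)
    sds = []
    for split in all_splits:
        # A split never sampled in a run has frequency 0 in that run.
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) < min_freq:
            continue
        sds.append(pstdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


if __name__ == "__main__":
    run1 = {"AB|CD": 0.90, "AC|BD": 0.10}
    run2 = {"AB|CD": 0.70, "AC|BD": 0.30}
    print(average_sd_split_freqs([run1, run2]))
```

Identical runs give a value of zero; the more the runs disagree about how often each split appears, the larger the value.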


Convergence Diagnostics

• By default, MrBayes performs two independent analyses starting from different random trees (mcmc nruns=2)

• Average standard deviation of clade frequencies calculated and presented during the run (mcmc mcmcdiagn=yes diagnfreq=1000) and written to file (.mcmc)

• Standard deviation of each clade frequency and potential scale reduction for branch lengths calculated with sumt

• Potential scale reduction calculated for all substitution model parameters with sump


Have we reached convergence? PSRF (sump command)

Model parameter summaries over all 3 runs sampled in files
"Ligia16SCOI28SGulfnoADLGGTRGbygene.nex.run1.p", "Ligia16SCOI28SGulfnoADLGGTRGbygene.nex.run2.p" etc:
(Summaries are based on a total of 162003 samples from 3 runs)
(Each run produced 60001 samples of which 54001 samples were included)

                                          95% Cred. Interval
                                        ----------------------
Parameter      Mean       Variance      Lower       Upper       Median     PSRF *
-----------------------------------------------------------------------------------
TL{all}        1.578535   0.008255      1.417000    1.771000    1.573000   1.000
r(A<->C){1}    0.036558   0.000133      0.016961    0.061987    0.035555   1.000
r(A<->G){1}    0.405681   0.002305      0.314283    0.502156    0.404811   1.000
r(A<->T){1}    0.044360   0.000120      0.025482    0.068209    0.043489   1.000
r(C<->G){1}    0.022006   0.000120      0.004949    0.047283    0.020532   1.000
r(C<->T){1}    0.472371   0.002269      0.379396    0.565910    0.472279   1.000
r(G<->T){1}    0.019024   0.000079      0.005074    0.039579    0.017873   1.000
pi(A){1}       0.292047   0.000390      0.254211    0.331575    0.291819   1.000
pi(C){1}       0.204875   0.000278      0.173364    0.238514    0.204491   1.000
pi(G){1}       0.214907   0.000324      0.180784    0.251150    0.214522   1.000
pi(T){1}       0.288171   0.000367      0.251355    0.326301    0.287933   1.000
alpha{1}       0.183087   0.000419      0.146653    0.226929    0.181792   1.000
-----------------------------------------------------------------------------------
* Convergence diagnostic (PSRF = Potential scale reduction factor [Gelman and Rubin, 1992], uncorrected) should approach 1 as runs converge. The values may be unreliable if you have a small number of samples. PSRF should only be used as a rough guide to convergence since all the assumptions that allow one to interpret it as a scale reduction factor are not met in the phylogenetic context.
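To show what the PSRF column measures, here is a minimal Python sketch of the uncorrected Gelman and Rubin (1992) statistic for one scalar parameter. It assumes equal-length runs with burn-in already discarded; the function name and input format are illustrative, and this is not the code that sump runs.

```python
from statistics import mean, variance


def psrf(chains):
    """Uncorrected potential scale reduction factor for one parameter.

    chains -- list of equal-length lists of sampled values, one list per
              independent run (burn-in already removed)
    """
    n = len(chains[0])
    if any(len(chain) != n for chain in chains):
        raise ValueError("all chains must have the same length")

    chain_means = [mean(chain) for chain in chains]
    # W: average within-run variance.
    w = mean([variance(chain) for chain in chains])
    # B/n: variance of the run means (between-run component).
    b_over_n = variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b_over_n
    return (var_hat / w) ** 0.5


if __name__ == "__main__":
    agree = [[0.18, 0.19, 0.17, 0.18], [0.17, 0.18, 0.19, 0.18]]
    disagree = [[0.18, 0.19, 0.17, 0.18], [0.48, 0.49, 0.47, 0.48]]
    print(psrf(agree), psrf(disagree))
```

When the runs sample the same distribution, the between-run component is small relative to the within-run variance and the ratio stays close to one; runs stuck in different regions push it well above one.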


Are We There Yet (AWTY)?

Nylander, J. A. A. et al. Bioinformatics 2008 24:581-583; doi:10.1093/bioinformatics/btm388


Empirical data: two independent runs, 300,000,000 generations, complex model with three partitions (by codon). The bad news:

[Figure: plotted in Tracer]


Empirical data: two independent runs, 300,000,000 generations, complex model with three partitions (by codon). The good news: splits were highly correlated between the two runs.

[Figure: plotted in AWTY, http://ceb.scs.fsu.edu/awty]


What caused the difference in posterior probabilities? Estimation of particular parameters.


Yes, we have reached convergence: do we have a large enough sample of the posterior?

• Long runs are better than short ones, but how long?
• Good mixing: “Examine the acceptance rates of the proposal mechanisms used in your analysis (output at the end of the run). The Metropolis proposals used by MrBayes work best when their acceptance rate is neither too low nor too high. A rough guide is to try to get them within the range of 10 % to 70 %”


Acceptance Rates

Analysis used 2373953.05 seconds of CPU time on processor 0
Likelihood of best state for "cold" chain of run 1 was -9688.70
Likelihood of best state for "cold" chain of run 2 was -9865.21
Likelihood of best state for "cold" chain of run 3 was -9887.59
Likelihood of best state for "cold" chain of run 4 was -9895.35

Acceptance rates for the moves in the "cold" chain of run 1:
   With prob.   Chain accepted changes to
     42.86 %    param. 1 (revmat) with Dirichlet proposal
     21.42 %    param. 2 (revmat) with Dirichlet proposal
     55.92 %    param. 3 (revmat) with Dirichlet proposal
     29.18 %    param. 4 (state frequencies) with Dirichlet proposal
     12.22 %    param. 5 (state frequencies) with Dirichlet proposal
     24.61 %    param. 6 (state frequencies) with Dirichlet proposal
     41.02 %    param. 7 (gamma shape) with multiplier
     31.74 %    param. 8 (gamma shape) with multiplier
     79.95 %    param. 9 (gamma shape) with multiplier
     40.16 %    param. 10 (rate multiplier) with Dirichlet proposal
     15.80 %    param. 11 (topology and branch lengths) with extending TBR
     23.62 %    param. 11 (topology and branch lengths) with LOCAL
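The 10 % to 70 % rule of thumb can be applied to this output mechanically. The short Python sketch below copies the run-1 acceptance rates by hand and flags any move outside the range; the names `RATES` and `flag_outside_range` are made up for illustration and are not part of MrBayes.

```python
# Acceptance rates (percent) copied by hand from the run-1 "cold" chain output.
RATES = [
    ("param. 1 (revmat), Dirichlet proposal", 42.86),
    ("param. 2 (revmat), Dirichlet proposal", 21.42),
    ("param. 3 (revmat), Dirichlet proposal", 55.92),
    ("param. 4 (state frequencies), Dirichlet proposal", 29.18),
    ("param. 5 (state frequencies), Dirichlet proposal", 12.22),
    ("param. 6 (state frequencies), Dirichlet proposal", 24.61),
    ("param. 7 (gamma shape), multiplier", 41.02),
    ("param. 8 (gamma shape), multiplier", 31.74),
    ("param. 9 (gamma shape), multiplier", 79.95),
    ("param. 10 (rate multiplier), Dirichlet proposal", 40.16),
    ("param. 11 (topology and branch lengths), extending TBR", 15.80),
    ("param. 11 (topology and branch lengths), LOCAL", 23.62),
]


def flag_outside_range(rates, low=10.0, high=70.0):
    """Return the (move, rate) pairs whose rate lies outside [low, high] percent."""
    return [(move, rate) for move, rate in rates if not (low <= rate <= high)]


if __name__ == "__main__":
    for move, rate in flag_outside_range(RATES):
        print(f"{rate:6.2f} %  {move}")
```

For this run only one move is flagged: the multiplier proposal for param. 9 (gamma shape), at 79.95 %, is above the upper bound.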


[Figure: posterior probability distribution f(·|X) over parameter space, with tree 1, tree 2, and tree 3 marked. Vertical axis: posterior probability. Horizontal axis: parameter space.]


Metropolis-coupled Markov chain Monte Carlo, a.k.a. MCMCMC, a.k.a. (MC)³

[Figure: a cold chain and a heated chain]

[Animation, slides 14 to 24: a cold chain and a hot chain shown over successive frames. The first attempted swap between them is unsuccessful; two later swaps are successful.]
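The cold chain, hot chain, and swap steps can be sketched in a few lines of Python. This is a toy illustration rather than what MrBayes does internally: the one-dimensional two-peaked target, the inverse temperature of 0.2 for the hot chain, the proposal width, and every name in the code are assumptions chosen for the example.

```python
import math
import random


def log_target(x):
    """Log of an unnormalized two-peaked density (peaks at -3 and +3)."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_steps, betas=(1.0, 0.2), step=1.0, seed=1):
    """Toy Metropolis-coupled MCMC. betas[0] = 1 is the cold chain;
    smaller values flatten the target for the heated chains."""
    rng = random.Random(seed)
    states = [-3.0 for _ in betas]  # every chain starts on the left peak
    cold_samples = []
    tried = accepted = 0
    for _ in range(n_steps):
        # 1. Ordinary Metropolis update within each chain, on target**beta.
        for i, beta in enumerate(betas):
            prop = states[i] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(prop) - log_target(states[i]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                states[i] = prop
        # 2. Attempt to swap the states of one random pair of adjacent chains.
        if len(betas) > 1:
            i = rng.randrange(len(betas) - 1)
            j = i + 1
            log_swap = (betas[i] - betas[j]) * (
                log_target(states[j]) - log_target(states[i])
            )
            tried += 1
            if rng.random() < math.exp(min(0.0, log_swap)):
                states[i], states[j] = states[j], states[i]
                accepted += 1
        # Only the cold chain is sampled.
        cold_samples.append(states[0])
    return cold_samples, (accepted / tried if tried else 0.0)


if __name__ == "__main__":
    samples, swap_rate = mc3(20000)
    right = sum(1 for s in samples if s > 0) / len(samples)
    print(f"swap acceptance {swap_rate:.2f}, fraction on right peak {right:.2f}")
```

The heated chain sees a flattened landscape and crosses the valley between the peaks far more easily than the cold chain; a successful swap then hands that position to the cold chain, which is the only one whose states are recorded.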


Improving Convergence

(Only if convergence diagnostics indicate a problem!)

• Change tuning parameters of proposals to bring acceptance rate into the range 10 % to 70 %
• Propose changes to ‘difficult’ parameters more often
• Use different proposal mechanisms
• Change heating temperature to bring acceptance rate of swaps between adjacent chains into the range 10 % to 70 %
• Run the chain longer
• Increase the number of heated chains
• Make the model more realistic


[Figure: three panels of sampled value against the target distribution]

• Too modest proposals: acceptance rate too high, poor mixing
• Too bold proposals: acceptance rate too low, poor mixing
• Moderately bold proposals: acceptance rate intermediate, good mixing
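The link between proposal boldness and acceptance rate can be reproduced with a toy sampler. The Python sketch below runs random-walk Metropolis on a standard normal target with three proposal widths; the target, the widths, and the function name are illustrative assumptions and say nothing about MrBayes's actual tuning parameters.

```python
import math
import random


def acceptance_rate(step, n_steps=5000, seed=0):
    """Fraction of accepted proposals for random-walk Metropolis on a
    standard normal target, with normal proposals of width `step`."""
    rng = random.Random(seed)
    x = 0.0
    accepted = 0
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, step)
        # log of target(prop) / target(x) for a standard normal target
        log_ratio = 0.5 * (x * x - prop * prop)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = prop
            accepted += 1
    return accepted / n_steps


if __name__ == "__main__":
    for label, step in [("too modest", 0.05), ("moderate", 2.4), ("too bold", 50.0)]:
        print(f"{label:>10}: step={step:<5} acceptance={acceptance_rate(step):.2f}")
```

Tiny steps are almost always accepted but barely move the chain, and huge steps are almost always rejected, so the chain sits still; both mix poorly. The intermediate width lands inside the 10 % to 70 % band.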