molecular phylogenetics - nui galwaycathal/teaching/msc11/phylogenetics.pdf · rate of evolution...
![Page 1: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/1.jpg)
Molecular phylogenetics
![Page 2: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/2.jpg)
Genetic distance
Define genetic distance between a pair of ‘homologous’ sequences x and y as the number of substitutions that have occurred (per alignment site) since x and y diverged from their common ancestor
![Page 3: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/3.jpg)
Genetic distance
Given the following sequence alignment, infer the genetic distance
A C G T T C A T T - - T G
A G - T C C C T G G G G G
Simplification: Ignore alignment positions with gaps
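As a first pass at this exercise, the sketch below computes the observed proportion of differing sites (the p-distance) after dropping gapped columns. The function name `p_distance` is illustrative and not from the slides. This proportion only counts visible differences, so it tends to underestimate the number of substitutions per site, which is the motivation for the evolutionary model on the next slide.

```python
def p_distance(x, y, gap="-"):
    """Proportion of differing sites, skipping any column that contains a gap."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap
    pairs = [(a, b) for a, b in zip(x, y) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs), len(pairs)


# The two aligned sequences from the slide
x = "ACGTTCATT--TG"
y = "AG-TCCCTGGGGG"
p, n_sites = p_distance(x, y)
print(p, n_sites)
```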
![Page 4: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/4.jpg)
Model of evolution
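Continuous-time process over {A,C,G,T}
$$p_{ij}(t) = \left(e^{Gt}\right)_{ij}$$
$$L(A \mid t) = \prod_i p_{A_i^1 A_i^2}(t)$$
$$\hat{t} = \arg\max_t L(A \mid t)$$
(G is the generator matrix; $A_i^1$ and $A_i^2$ are the residues of the two sequences at alignment column i)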
![Page 5: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/5.jpg)
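Some standard models of nucleotide evolution: Jukes and Cantor
$$G = \begin{array}{c|cccc}
 & A & T & C & G \\ \hline
A & -3\alpha & \alpha & \alpha & \alpha \\
T & \alpha & -3\alpha & \alpha & \alpha \\
C & \alpha & \alpha & -3\alpha & \alpha \\
G & \alpha & \alpha & \alpha & -3\alpha
\end{array}$$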
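A minimal sketch of the maximum-likelihood distance under Jukes and Cantor, assuming the generator is scaled so that one substitution is expected per unit time (3α = 1). The function names and the golden-section search are illustrative choices, not taken from the slides. The counts (5 identical and 5 differing ungapped columns) are those of the example alignment on the earlier slide.

```python
import math


def jc_prob(same, t):
    """Jukes-Cantor transition probability after time t (assumes 3 * alpha = 1)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def log_likelihood(t, n_same, n_diff):
    """log L(A | t) for a gap-free pairwise alignment, constant terms dropped."""
    return (n_same * math.log(jc_prob(True, t))
            + n_diff * math.log(jc_prob(False, t)))


def ml_distance(n_same, n_diff, lo=1e-9, hi=10.0, tol=1e-10):
    """Maximise log L over t by golden-section search (assumes a single peak)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if log_likelihood(c, n_same, n_diff) > log_likelihood(d, n_same, n_diff):
            b = d   # the maximum lies in [a, d]
        else:
            a = c   # the maximum lies in [c, b]
    return (a + b) / 2.0


t_hat = ml_distance(n_same=5, n_diff=5)
# Known closed form for this model: d = -(3/4) ln(1 - 4p/3), with p = 0.5 here
closed_form = -0.75 * math.log(1.0 - 4.0 * 0.5 / 3.0)
print(t_hat, closed_form)
```

For Jukes and Cantor the numerical search is not strictly needed because the closed form exists; the search is shown because the same approach carries over to models without one.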
![Page 6: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/6.jpg)
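Kimura two-parameter model (1980)
$$\begin{array}{c|cccc}
 & A & T & C & G \\ \hline
A & \cdot & \beta & \beta & \alpha \\
T & \beta & \cdot & \alpha & \beta \\
C & \beta & \alpha & \cdot & \beta \\
G & \alpha & \beta & \beta & \cdot
\end{array}$$
With
$$q_{ii} = -\sum_{j \ne i} q_{ij}$$
Models a difference in the rate of transitions (α) and transversions (β)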
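To illustrate how the transition/transversion distinction enters a distance estimate, here is a sketch of the standard Kimura two-parameter distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The function name is illustrative, and the input is the example alignment from the genetic distance slide with gapped columns ignored.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(x, y, gap="-"):
    """Kimura (1980) two-parameter distance from an aligned pair of sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(x, y):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    P = transitions / sites
    Q = transversions / sites
    d = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return d, P, Q


d, P, Q = k2p_distance("ACGTTCATT--TG", "AG-TCCCTGGGGG")
print(round(d, 4), P, Q)
```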
![Page 7: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/7.jpg)
Recall:
$$\sum_i \pi_i = 1$$
$$\pi_i p_{ij} = \pi_j p_{ji} \quad \text{for all } i, j$$
imply that the πi are the limiting probabilities of the chain and that the chain is reversible
Analogous result applies to the gij of a continuous-time chain
![Page 8: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/8.jpg)
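Hasegawa-Kishino-Yano (1985)
$$\begin{array}{c|cccc}
 & A & T & C & G \\ \hline
A & \cdot & \beta\pi_T & \beta\pi_C & \alpha\pi_G \\
T & \beta\pi_A & \cdot & \alpha\pi_C & \beta\pi_G \\
C & \beta\pi_A & \alpha\pi_T & \cdot & \beta\pi_G \\
G & \alpha\pi_A & \beta\pi_T & \beta\pi_C & \cdot
\end{array}$$
(α for transitions, β for transversions; diagonal entries set so that each row sums to zero)
Includes parameters πk for the equilibrium nucleotide frequencies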
![Page 9: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/9.jpg)
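General Time-Reversible Model (Simon Tavaré 1986)
$$\begin{array}{c|cccc}
 & A & T & C & G \\ \hline
A & \cdot & a\pi_T & b\pi_C & c\pi_G \\
T & a\pi_A & \cdot & d\pi_C & e\pi_G \\
C & b\pi_A & d\pi_T & \cdot & f\pi_G \\
G & c\pi_A & e\pi_T & f\pi_C & \cdot
\end{array}$$
(a to f are the six exchangeability parameters; diagonal entries set so that each row sums to zero)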
![Page 10: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/10.jpg)
Generator matrix (or equivalently time) scaled so that one substitution is expected in one unit of time
$$\sum_i \pi_i \sum_{j \ne i} g_{ij} = 1$$
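The scaling and the reversibility condition can be checked numerically. The sketch below builds an HKY-style generator using a single transition/transversion ratio `kappa` (a common parameterisation, assumed here rather than taken from the slides), rescales it to one expected substitution per unit time, and tests detailed balance. The frequencies are hypothetical.

```python
NUCS = "ATCG"
PURINES = {"A", "G"}


def hky_generator(pi, kappa):
    """g_ij = kappa * pi_j for transitions and pi_j for transversions."""
    n = len(NUCS)
    G = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(NUCS):
        for j, b in enumerate(NUCS):
            if i == j:
                continue
            is_transition = (a in PURINES) == (b in PURINES)
            G[i][j] = (kappa if is_transition else 1.0) * pi[j]
        G[i][i] = -sum(G[i])  # diagonal is still 0 here, so rows sum to zero
    return G


def scale_to_unit_rate(G, pi):
    """Rescale G so that sum_i pi_i * sum_{j != i} g_ij = 1."""
    rate = -sum(pi[i] * G[i][i] for i in range(len(pi)))
    return [[g / rate for g in row] for row in G]


pi = [0.3, 0.2, 0.2, 0.3]  # hypothetical frequencies for A, T, C, G
G = scale_to_unit_rate(hky_generator(pi, kappa=2.0), pi)

expected_rate = -sum(pi[i] * G[i][i] for i in range(4))
reversible = all(abs(pi[i] * G[i][j] - pi[j] * G[j][i]) < 1e-12
                 for i in range(4) for j in range(4))
print(expected_rate, reversible)
```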
![Page 11: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/11.jpg)
Commonly used evolutionary models also allow heterogeneity in rate of evolution across alignment sites (typically modeled with discretized gamma distribution)
In general, simpler nucleotide substitution models are nested within successively more complex models; standard model comparison techniques can be used to select an appropriate model.
![Page 12: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/12.jpg)
Phylogenetic tree: binary tree with edges representing genetic distance
An evolving sequence can bifurcate (e.g. speciation), giving rise to two daughter sequences
[Figure: tree with tips S1, S2, S3, S4]
Branch length represents genetic distance between sequences (or hypothesized sequences) at the nodes
‘Rooted’ tree
![Page 13: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/13.jpg)
‘Unrooted’ tree
![Page 14: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/14.jpg)
Rooted versus unrooted tree
Most substitution models are reversible (see previous slides). Therefore the models cannot distinguish the time-direction of evolution. External information is usually incorporated to decide the position of the root (hypothetical ancestor of all of the sequences represented in the alignment)
![Page 15: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/15.jpg)
Alternative tree representations…
[Figure: tree of ECP1 MOUSE, ECP2 MOUSE, ECP RAT, ECP HUMAN and ECP PONPY; scale bar 0.1]
![Page 16: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/16.jpg)
Alternative tree representations…
[Figure: tree of ECP1 MOUSE, ECP2 MOUSE, ECP RAT, ECP HUMAN and ECP PONPY; scale bar 0.1]
![Page 17: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/17.jpg)
Alternative tree representations…
[Figure: tree of ECP1 MOUSE, ECP2 MOUSE, ECP RAT, ECP HUMAN and ECP PONPY; scale bar 0.1]
![Page 18: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/18.jpg)
Most conventional representation of a ‘rooted’ phylogenetic tree
[Figure: tree of ECP1 MOUSE, ECP2 MOUSE, ECP RAT, ECP HUMAN and ECP PONPY; scale bar 0.1]
![Page 19: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/19.jpg)
How many trees?
$$\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
[Figure: log10(#T) plotted against number of sequences, 10 to 50]
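A short sketch that evaluates this count, which is the number of rooted bifurcating trees on n labelled sequences. The function name is illustrative. Integer arithmetic keeps the result exact, and the logarithm reproduces the quantity plotted on the slide.

```python
import math


def n_rooted_trees(n):
    """(2n - 3)! / (2^(n-2) * (n - 2)!) for n >= 2 labelled sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return math.factorial(2 * n - 3) // (2 ** (n - 2) * math.factorial(n - 2))


for n in (3, 4, 5, 10, 50):
    count = n_rooted_trees(n)
    # Print exact counts while they are readable, then switch to log10
    print(n, count if n <= 10 else f"about 10^{math.log10(count):.1f}")
```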
![Page 20: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/20.jpg)
The phylogeny problem
Given a set of aligned DNA or amino acid sequences, infer the phylogenetic tree representing the evolution of the set of taxa
Requires:
- An optimality criterion (what constitutes the ‘best’ tree)
- Search algorithm
Commonly applied optimality criteria are
- Minimum evolution (tree with shortest sum of branch lengths)
- Maximum parsimony (tree requiring smallest number of steps to explain the data)
- Maximum Likelihood
- Maximum a posteriori Probability (MAP)
![Page 21: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/21.jpg)
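The likelihood of a tree:
$$P(D \mid T) = \sum_{a_1}\sum_{a_2}\sum_{a_3}\cdots\sum_{a_n} P(a_1)\,P(a_2 \mid a_1, b_2)\,P(a_3 \mid a_1, b_3)\cdots P(a_n \mid a_{n-1}, b_n)$$
[Figure: tree with node states a1, a2, a3, …, an-1, an and branch lengths b2, b3, …, bn]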
![Page 22: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/22.jpg)
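A recursive algorithm is used to avoid doing all the summations (Felsenstein’s Pruning Algorithm)
Let $L_m^k$ be the likelihood of the subtree descended from node k, given that the nucleotide present at node k is m. Then
$$L_m^k = \left(\sum_s L_s^i\, P(s \mid m, b_1)\right)\left(\sum_s L_s^j\, P(s \mid m, b_2)\right)$$
[Figure: node k with daughter nodes i and j, joined to k by branches b1 and b2]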
![Page 23: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/23.jpg)
The L’s can be worked out easily for the leaf nodes:
Consider position i in sequence X
If b is a leaf node, then
$L_a^b = 1$ if $X_i = a$, and 0 otherwise
Missing information can be handled easily (using intermediate values at terminal nodes)
![Page 24: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/24.jpg)
$$P(D \mid T) = \sum_s L_s^r\, p(s)$$
(r is the root node)
Complexity: O(n · m · k²)
(n = # sequences; m = sequence length; k = alphabet size)
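The recursion, the leaf initialisation and the sum at the root can be put together in a few lines. This is a sketch for a single alignment column under Jukes and Cantor (scaled to one substitution per unit time), not under the rate matrix of any exercise in these slides. The tuple encoding of the tree and the example tree itself are assumptions made for illustration.

```python
import math

NUCS = "ACGT"
PI = {s: 0.25 for s in NUCS}  # Jukes-Cantor equilibrium frequencies


def p_jc(child, parent, b):
    """P(child | parent, b) under Jukes-Cantor with branch length b."""
    decay = math.exp(-4.0 * b / 3.0)
    return 0.25 + 0.75 * decay if child == parent else 0.25 - 0.25 * decay


def partial(node):
    """Return {m: L_m^k} for node k.

    A leaf is a one-letter string; an internal node is a tuple
    (left_subtree, left_branch, right_subtree, right_branch).
    """
    if isinstance(node, str):
        # Leaf: 1 for the observed nucleotide, 0 for the others
        return {m: 1.0 if m == node else 0.0 for m in NUCS}
    left, b1, right, b2 = node
    L_i, L_j = partial(left), partial(right)
    return {
        m: sum(L_i[s] * p_jc(s, m, b1) for s in NUCS)
        * sum(L_j[s] * p_jc(s, m, b2) for s in NUCS)
        for m in NUCS
    }


def site_likelihood(root):
    """P(D | T) = sum_s L_s^r p(s) for one alignment column."""
    L_r = partial(root)
    return sum(PI[s] * L_r[s] for s in NUCS)


# Hypothetical tree: ((A:0.05, A:0.05):0.01, (T:0.05, G:0.05):0.01)
tree = (("A", 0.05, "A", 0.05), 0.01, ("T", 0.05, "G", 0.05), 0.01)
print(site_likelihood(tree))
```

Each node is visited once and does k² work per site, which matches the O(n · m · k²) complexity quoted above.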
![Page 25: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/25.jpg)
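Exercise: Given the instantaneous transition rate matrix and tree shown, calculate the likelihood of the single alignment column shown at the tips of the tree.
$$\begin{array}{c|cccc}
 & A & T & C & G \\ \hline
A & \cdot & 1 & 2 & 3 \\
T & 1 & \cdot & 5 & 4 \\
C & 2 & 5 & \cdot & 6 \\
G & 3 & 4 & 6 & \cdot
\end{array}$$
[Figure: tree with tips A, A, T, G; branch lengths 0.05, 0.05, 0.05, 0.05, 0.01, 0.01]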
![Page 26: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/26.jpg)
Optimizing branch lengths
• If all branch lengths are known except one then the likelihood of the tree can be expressed as a function of the unknown branch length
• Standard problem of maximization in 1D for a single branch (e.g. Newton-Raphson)
• Although branches are not independent, branch maximizations tend not to interfere to a great extent
• A small number of successive maximizations normally succeeds in achieving the maximum likelihood set of branch lengths
![Page 27: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/27.jpg)
Searching for optimal trees
Branch & Bound
Heuristics – usually local perturbations with hill-climbing
Markov-Chain Monte Carlo (MC3)
Genetic algorithms
etc.
![Page 28: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/28.jpg)
Common heuristic algorithm: Neighbor-Joining, an approximation to the minimum evolution tree
[Figure: two trees on taxa 1 to 8]
Choose the pair that minimizes the length of the resulting tree
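The pair-selection step is usually computed with the Q criterion, Q(i, j) = (n - 2) d_ij - Σ_k d_ik - Σ_k d_jk, and the pair with the smallest Q is joined. The sketch below shows only that step. The distance matrix is hypothetical, and a full Neighbor-Joining implementation would go on to compute branch lengths and update the matrix after each join.

```python
def nj_pick_pair(d):
    """Return ((i, j), Q) for the pair minimising the Q criterion."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    best, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best, best_q = (i, j), q
    return best, best_q


# Hypothetical distance matrix for five taxa (symmetric, zero diagonal)
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(d))
```

Note that the pair chosen here (taxa 0 and 1, distance 5) is not the closest pair in the matrix (taxa 3 and 4, distance 3): the criterion corrects for how far each taxon is from all the others.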
![Page 29: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/29.jpg)
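[Figure: four-taxon tree; A and B attach by branches r and s, C and D by branches u and v, joined by internal branch t]
dAB ~ r + s
dCD ~ u + v
dAD ~ r + t + v
dBC ~ s + t + u
Tree length = u + v + t + r + s
(r, s, u, v, t are estimated using the least squares method)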
![Page 30: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/30.jpg)
Branch & Bound
Exact
Can be used with several different optimality criteria
Algorithm:
Traverse the search tree in some order
Exclude a subtree from the search if the score on the root node of the subtree is worse than the best score achieved so far
Can improve speed by starting with a tree inferred using a different method
Works because the score only gets worse as you proceed towards the tips of the tree
Complexity: At worst equal to the complexity of the exhaustive search
![Page 31: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/31.jpg)
Branch and bound
http://artedi.ebc.uu.se/course/X3-2004/Phylogeny/Phylogeny-TreeSearch/Phylogeny-Search.html
![Page 32: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/32.jpg)
Heuristic search algorithms
Greedy algorithms - Hill climbing approach
• NNI (Nearest Neighbour Interchange): break an interior branch and replace with one of the two alternative branches
• SPR (Subtree Pruning and Regrafting): remove a subtree from the tree and reinsert elsewhere
• TBR (Tree Bisection and Reconnection): break the tree to form two subtrees. Reconnect the two subtrees with a new branch between two existing branches in the two subtrees
Genetic Algorithms (e.g. MetaPig; Garli)
![Page 33: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/33.jpg)
Heuristic search algorithms
Can be sped up by starting with a reasonable tree (e.g. tree inferred with NJ algorithm).
Speed up also by estimating other parameters using an approximate tree prior to inferring the final tree topology (iterate if necessary).
Start tree can be from
- a tree inferred by another method (e.g. NJ)
- Stepwise addition
- Star decomposition
![Page 34: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/34.jpg)
Star decomposition
http://artedi.ebc.uu.se/course/X3-2004/Phylogeny/Phylogeny-TreeSearch/Phylogeny-Search.html
![Page 35: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/35.jpg)
Stepwise addition
http://artedi.ebc.uu.se/course/X3-2004/Phylogeny/Phylogeny-TreeSearch/Phylogeny-Search.html
![Page 36: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/36.jpg)
Traversing a tree
Tree traversals:
Preorder: node; left subtree; right subtree
Inorder: left subtree; node; right subtree
Postorder: left subtree; right subtree; node
All of these can be implemented using recursive functions
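A minimal recursive sketch of the three traversals on a binary tree. The `Node` class and the labels are hypothetical. Postorder is the order a bottom-up calculation such as the pruning algorithm needs, because both daughter nodes are visited before their parent.

```python
class Node:
    """Binary tree node; leaves have no children."""

    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right


def preorder(node):
    """node; left subtree; right subtree"""
    if node is None:
        return []
    return [node.label] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """left subtree; node; right subtree"""
    if node is None:
        return []
    return inorder(node.left) + [node.label] + inorder(node.right)


def postorder(node):
    """left subtree; right subtree; node"""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.label]


# Hypothetical rooted tree ((S1, S2)u, (S3, S4)v)r
tree = Node("r",
            Node("u", Node("S1"), Node("S2")),
            Node("v", Node("S3"), Node("S4")))
print(preorder(tree))
print(inorder(tree))
print(postorder(tree))
```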
![Page 37: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/37.jpg)
Exercise: Sketch this tree and label its nodes in the order in which they would be visited on preorder, inorder and postorder traversal, starting the algorithm at the root node
![Page 38: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/38.jpg)
Bayesian MCMC in phylogenetics
Prior over trees (often flat)
Starting tree (star decomposition, step-wise addition, NJ)
Proposals: tree perturbations
Acceptance depends on ratio of posterior probabilities (= ratio of likelihoods when the prior is flat)
Determine burn-in and convergence
![Page 39: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/39.jpg)
In molecular phylogenetics the prior is usually ‘flat’
Why bother?
1. We get the answer as a probability
2. We get to use MCMC to sample over trees/search for ‘best’ tree
3. Allows us to integrate over nuisance parameters rather than using their optimal values
![Page 40: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/40.jpg)
![Page 41: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/41.jpg)
Running a phylogenetic MCMC
Generate long chain of trees/parameters sampled according to their joint posterior probability
The number of times the chain visits tree X is proportional to the probability of tree X
The number of times a specific branch is sampled can be used to estimate the posterior probability that the ‘clade’ of taxa specified by the branch is correct
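A toy illustration of the visit-frequency idea, assuming a flat prior and a symmetric proposal so that acceptance depends only on the likelihood ratio. The three topologies and their log-likelihoods are hypothetical, and a real analysis would also sample branch lengths and model parameters.

```python
import math
import random


def metropolis_over_trees(log_lik, n_steps, burn_in, seed=0):
    """Metropolis sampling over a tiny discrete 'tree space' with a flat prior.

    Proposals pick one of the other trees uniformly, so the proposal is
    symmetric and acceptance reduces to min(1, likelihood ratio).
    """
    rng = random.Random(seed)
    trees = list(log_lik)
    current = trees[0]
    visits = {t: 0 for t in trees}
    for step in range(n_steps):
        proposal = rng.choice([t for t in trees if t != current])
        log_ratio = log_lik[proposal] - log_lik[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        if step >= burn_in:          # discard samples from the burn-in period
            visits[current] += 1
    total = sum(visits.values())
    return {t: v / total for t, v in visits.items()}


# Hypothetical log-likelihoods for the three unrooted topologies of four taxa
log_lik = {
    "((A,B),(C,D))": -100.0,
    "((A,C),(B,D))": -100.7,
    "((A,D),(B,C))": -101.8,
}
freqs = metropolis_over_trees(log_lik, n_steps=200_000, burn_in=1_000)

# Exact posterior under the flat prior, for comparison
norm = sum(math.exp(v + 100.0) for v in log_lik.values())
posterior = {t: math.exp(v + 100.0) / norm for t, v in log_lik.items()}
for t in log_lik:
    print(t, round(freqs[t], 3), round(posterior[t], 3))
```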
![Page 42: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/42.jpg)
Multiple chains may be used (Metropolis coupled MCMC = MC3)
• Only one chain is sampled
• The other chains are heated (i.e. they can take bigger steps)
• Chains can swap states
• Allows crossing of valleys
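The swap step can be sketched as follows, under the usual formulation in which chain c targets the posterior raised to a power β_c (β = 1 for the cold chain, β < 1 for heated chains). The function name and the numbers are illustrative assumptions.

```python
import math


def swap_accept_prob(log_post_i, log_post_j, beta_i, beta_j):
    """Probability of swapping the current states of chains i and j.

    log_post_i and log_post_j are the unheated log posteriors of the
    states currently held by chains i and j.
    """
    log_ratio = (beta_i - beta_j) * (log_post_j - log_post_i)
    if log_ratio >= 0:
        return 1.0
    return math.exp(log_ratio)


# Cold chain (beta = 1) stuck at -120 while a heated chain (beta = 0.5)
# has found a better state at -110: the swap is always accepted.
print(swap_accept_prob(-120.0, -110.0, beta_i=1.0, beta_j=0.5))

# The reverse situation: the cold chain rarely gives up its better state.
print(swap_accept_prob(-110.0, -120.0, beta_i=1.0, beta_j=0.5))
```

Because a heated chain sees a flattened posterior, it accepts downhill moves more readily, and swaps pass any good states it finds to the cold chain.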
![Page 43: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/43.jpg)
Burn-in
• From an arbitrary starting point the chain can take some time to equilibrate
• Consequently, the chain takes some time before samples are obtained according to their posterior probabilities
• Initially probability of trees increases with time
• Programmes are allowed to run until the probabilities are fluctuating randomly about a constant mean
• Data generated before the chain equilibrates are discarded
![Page 44: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/44.jpg)
[Figure: trace plots of lnL1 and lnL2 against sample index (0 to 1200); log-likelihood axis from -25000 to -10000]
![Page 45: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/45.jpg)
Proposals
• Topology (e.g. NNI) or ‘coalescence time’ perturbations have been used
• Choice of proposal significant
– Too aggressive results in rejection of most proposals
– Too conservative takes too long to provide adequate sampling of parameter space
![Page 46: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/46.jpg)
Advantages of Bayesian methods
- relatively fast
- easily interpretable
- often very accurate
Disadvantages of Bayesian methods
- can be difficult to be sure of convergence (this has improved with availability of better diagnostics)
- still controversial in molecular phylogenetics: choice of prior can be difficult to justify
- thought by some to exaggerate confidence
Software: e.g. MrBayes
![Page 47: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/47.jpg)
Further reading (molecular phylogenetics)
Joseph Felsenstein, 2004. Inferring Phylogenies. Sinauer
![Page 48: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/48.jpg)
Models of codon sequence evolution and inference of positive Darwinian selection
‘Universal’ genetic code is degenerate => natural classification of mutations as:
Nonsynonymous: amino acid changing (rate - dN)
Synonymous: no amino acid change (rate - dS)
ω: dN/dS
ω > 1 => adaptive evolution (actually ‘diversifying selection’)
![Page 49: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/49.jpg)
![Page 50: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/50.jpg)
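Example: Analysis of selection using simple discrete ω distribution
Neutral model:
[Figure: discrete distribution over ω; axis marked at 0 and 1]
Free parameters: ω- < 1; pω-
Selection model:
[Figure: discrete distribution over ω; axis marked at 0 and 1]
Free parameters: ω- < 1; pω-; ω+ > 1; pω+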
![Page 51: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/51.jpg)
Questions of interest
Is there a subset of sites with ω > 1?
- model comparison techniques
Which sites are evolving adaptively? (empirical Bayes method)
- fix all parameter values to their ML estimates
- using ML estimates as priors, calculate posterior probabilities of belonging to selection site class
$$P(\omega_i > 1 \mid D) \propto L(D \mid \omega_i > 1)\, P(\omega_i > 1)$$
![Page 52: Molecular phylogenetics - NUI Galwaycathal/Teaching/MSc11/Phylogenetics.pdf · rate of evolution across alignment sites (typically modeled with discretized gamma distribution)](https://reader034.vdocuments.net/reader034/viewer/2022042219/5ec5091be9f9bc484151f5db/html5/thumbnails/52.jpg)
Analysis of selection
- Infer a phylogenetic tree
- Obtain ML estimates of all parameters
- Use LRT (or other model comparison method) to evaluate evidence for selection
- Use empirical Bayes method (or a variant) to estimate posterior probabilities of belonging to the selection site class
$$P(\omega_i \mid D) = \frac{P(D \mid \omega_i)\, p_{\omega_i}}{\sum_k P(D \mid \omega_k)\, p_{\omega_k}}$$
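The empirical Bayes step is an application of Bayes' rule at each site, with the ML estimates of the class proportions acting as the prior. A minimal sketch follows, assuming three site classes (ω- < 1, ω = 1, ω+ > 1). The proportions and per-class site likelihoods are hypothetical numbers; in practice they would come from the fitted codon model.

```python
def site_class_posteriors(site_likelihoods, class_probs):
    """Posterior of each class for one site: P(D | class k) * p_k, normalised."""
    joint = [lik * p for lik, p in zip(site_likelihoods, class_probs)]
    total = sum(joint)
    return [x / total for x in joint]


# Hypothetical ML estimates of class proportions: omega- < 1, omega = 1, omega+ > 1
class_probs = [0.70, 0.25, 0.05]
# Hypothetical likelihoods of one alignment column under each class
site_likelihoods = [1.0e-6, 4.0e-6, 3.0e-5]

post = site_class_posteriors(site_likelihoods, class_probs)
print([round(x, 4) for x in post])
```

In this example the last entry is the posterior probability that the site belongs to the selection class, even though that class has the smallest prior proportion.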