Selecting Methods and Models for Phylogenetic Inference & Assessing the Reliability of Inferred Trees
Jim Wilgenbusch
School of Computational Science and Information Technology
Overview
Part 1: Distance Methods (picking up from last week)
• Clustering Algorithms
• As an Optimality Criterion
Part 2: Method and Model Selection
• Methods: Assumptions, Efficiency, Power, Robustness, Falsifiability, and Accuracy
• Models: Likelihood Ratio Test, Parametric Bootstrap, AIC
Part 3: Assessing Reliability of Trees
• Bootstrap/Jackknife, KH-test, Parametric Bootstrap, Bayesian Posterior Probability of Tree Distribution
Part 4: Practical Component
• Align Sequences
• Infer Tree(s)
Resources
This lecture closely follows: Page, R. D. and Holmes, E. C. 1998. Molecular Evolution: A Phylogenetic Approach. Blackwell Science, Oxford.
For more details see: Li, W. 1997. Molecular Evolution. Sinauer Associates. Sunderland, Massachusetts.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pages 407-514 in D. M. Hillis, C. Moritz, and B. Mable (eds.) Molecular Systematics (2nd ed.), Sinauer Associates, Sunderland, Massachusetts.
Sequences to Distances
Lemur AAGCTTCATAG…TTACATCATCCA
Homo  AAGCTTCACCG…TTACATCCTCAT
Pan   AAGCTTCACCG…TTACATCCTCAT
Goril AAGCTTCACCG…CCCACGGACTTA
Pongo AAGCTTCACCG…GCAACCACCCTC
Hylo  AAGCTTTACAG…TGCAACCGTCCT
Maca  AAGCTTTTCCG…CGCAACCATCCT
Uncorrected ("p") distance matrix

                    1        2        3        4        5        6        7
1 Lemur catta       -
2 Homo sapiens      0.30794  -
3 Pan               0.30906  0.08929  -
4 Gorilla           0.29341  0.10379  0.10603  -
5 Pongo             0.29265  0.16087  0.17093  0.16645  -
6 Hylobates         0.29559  0.18192  0.18862  0.18862  0.18768  -
7 Macaca fuscata    0.28215  0.23214  0.24330  0.23661  0.24467  0.24665  -
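The uncorrected p distance is the proportion of aligned sites at which two sequences differ. The sketch below is minimal and assumes ungapped sequences of equal length. The helper names `p_distance` and `distance_matrix` are illustrative. The toy input is just the first 11 sites of three sequences from the alignment above, not the full data behind the matrix.

```python
from itertools import combinations

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def distance_matrix(seqs: dict) -> dict:
    """Uncorrected p distance for every unordered pair of named sequences."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}

if __name__ == "__main__":
    # First 11 sites of three sequences from the alignment above
    toy = {"Lemur": "AAGCTTCATAG", "Homo": "AAGCTTCACCG", "Hylo": "AAGCTTTACAG"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 5))
```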
Distance Methods
• Clustering Algorithms
  – Unweighted Pair-Group Method with Arithmetic Means (UPGMA)
    • Ultrametric Tree
  – Neighbor Joining
    • Additive trees
• Distance as an Optimality Criterion
  – Fitch-Margoliash and Related Methods
  – Minimum Evolution
Metric Distances:
• d(a,b) ≥ 0 (non-negativity)
• d(a,b) = d(b,a) (symmetry)
• d(a,c) ≤ d(a,b) + d(b,c) (triangle inequality)
• d(a,b) = 0 if and only if a = b (distinctness)
Ultrametric Distances:
• d(a,b) ≤ max[d(a,c), d(b,c)] (isosceles triangle)
Additive Distances:
• d(a,b) + d(c,d) ≤ max[d(a,c) + d(b,d), d(a,d) + d(b,c)] (four-point condition)
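Each of the three conditions can be checked directly on a distance matrix. The sketch below assumes distances are stored once per unordered pair. It tests the four-point condition in an equivalent form: of the three pairwise sums for any four taxa, the two largest must be equal. The matrix D is the one from the UPGMA example that follows. The function names are illustrative.

```python
from itertools import combinations, permutations

# The a-d matrix from the UPGMA example, stored once per unordered pair
D = {("a", "b"): 2, ("a", "c"): 6, ("b", "c"): 6,
     ("a", "d"): 10, ("b", "d"): 10, ("c", "d"): 10}
TAXA = ["a", "b", "c", "d"]

def dist(d, x, y):
    """Symmetric lookup with d(x, x) = 0, so symmetry holds by construction."""
    if x == y:
        return 0
    return d[(x, y)] if (x, y) in d else d[(y, x)]

def is_metric(taxa, d, tol=1e-9):
    """Positive distances between distinct taxa plus the triangle inequality."""
    distinct = all(dist(d, a, b) > 0 for a, b in combinations(taxa, 2))
    triangle = all(dist(d, a, c) <= dist(d, a, b) + dist(d, b, c) + tol
                   for a, b, c in permutations(taxa, 3))
    return distinct and triangle

def is_ultrametric(taxa, d, tol=1e-9):
    """d(a,b) <= max[d(a,c), d(b,c)] for every ordered triple."""
    return all(dist(d, a, b) <= max(dist(d, a, c), dist(d, b, c)) + tol
               for a, b, c in permutations(taxa, 3))

def is_additive(taxa, d, tol=1e-9):
    """Four-point condition: the two largest of the three sums must tie."""
    for a, b, c, e in combinations(taxa, 4):
        sums = sorted([dist(d, a, b) + dist(d, c, e),
                       dist(d, a, c) + dist(d, b, e),
                       dist(d, a, e) + dist(d, b, c)])
        if sums[2] - sums[1] > tol:
            return False
    return True

print(is_metric(TAXA, D), is_ultrametric(TAXA, D), is_additive(TAXA, D))
# → True True True
```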
[Figure: an additive tree and an ultrametric tree for taxa a, b, and c, with branch lengths]
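UPGMA: Ultrametric Tree
      a   b   c   d
a     -
b     2   -
c     6   6   -
d     10  10  10  -
Join a and b (distance 2); each gets a branch of length 1.
       a-b  c   d
a-b    -
c      6    -
d      10   10  -
Join a-b and c (distance 6) at height 3: c gets a branch of length 3, and the a-b node a branch of length 2.
        a-b-c  d
a-b-c   -
d       10     -
Join a-b-c and d (distance 10) at height 5: d gets a branch of length 5, and the a-b-c node a branch of length 2.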
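The steps above can be sketched as a short UPGMA routine. It repeatedly joins the closest pair of clusters at half their distance. It then replaces their rows with size-weighted average distances. The Newick output and the cluster-naming scheme (joining labels with "-") are assumptions made for illustration. Ties between equal distances are broken by dictionary order.

```python
# The matrix from the example above, one entry per unordered pair
D = {("a", "b"): 2, ("a", "c"): 6, ("b", "c"): 6,
     ("a", "d"): 10, ("b", "d"): 10, ("c", "d"): 10}

def upgma(taxa, d):
    """Return a Newick string (with branch lengths) for the UPGMA tree."""
    # cluster name -> (newick subtree, number of taxa, height of its root)
    clusters = {t: (t, 1, 0.0) for t in taxa}
    dist = {frozenset(k): float(v) for k, v in d.items()}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)            # closest pair of clusters
        x, y = sorted(pair)
        (nx, sx, hx), (ny, sy, hy) = clusters.pop(x), clusters.pop(y)
        h = dist.pop(pair) / 2.0                  # ultrametric: join at half the distance
        new = f"{x}-{y}"
        newick = f"({nx}:{h - hx:g},{ny}:{h - hy:g})"
        for z in clusters:                        # arithmetic mean over all member taxa
            dxz = dist.pop(frozenset((x, z)))
            dyz = dist.pop(frozenset((y, z)))
            dist[frozenset((new, z))] = (sx * dxz + sy * dyz) / (sx + sy)
        clusters[new] = (newick, sx + sy, h)
    (newick, _, _), = clusters.values()
    return newick + ";"

print(upgma(["a", "b", "c", "d"], D))
# → (((a:1,b:1):2,c:3):2,d:5);
```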
Neighbor Joining Method: Additive Tree
• Similar to other clustering methods, except that the ultrametric assumption is removed.
  – In practical terms, this means the tree is no longer constrained to be clock-like.
• NJ finds neighbors sequentially, at each step joining the pair that minimizes the total length of the tree; the final tree is not guaranteed to be the shortest possible.
• The separation between each pair of nodes is adjusted on the basis of their average divergence from all other nodes.
• The NJ algorithm is a special case of the star decomposition method.
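The adjustment for average divergence can be made concrete with one neighbor-joining step. The sketch below uses the standard NJ criterion Q(i,j) = (n − 2)·d(i,j) − r(i) − r(j), where r(i) is the sum of i's distances to all other taxa. It picks the pair with the smallest Q, gives branch lengths to the new node, and computes the reduced distances for the next round. The five-taxon matrix is a hypothetical example, not lecture data.

```python
from itertools import combinations

# Hypothetical five-taxon distance matrix (not from the lecture)
D = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
     ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
     ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}

def nj_step(taxa, d):
    """One NJ step: (i, j, branch_i, branch_j, distances from new node u)."""
    n = len(taxa)

    def dist(x, y):
        return d[(x, y)] if (x, y) in d else d[(y, x)]

    # r(i): total divergence of i from all other taxa
    r = {i: sum(dist(i, k) for k in taxa if k != i) for i in taxa}
    # Q(i, j) corrects each distance for the average divergence of i and j
    q = {(i, j): (n - 2) * dist(i, j) - r[i] - r[j]
         for i, j in combinations(taxa, 2)}
    i, j = min(q, key=q.get)
    # Branch lengths from i and j to their new parent node u
    bi = dist(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
    bj = dist(i, j) - bi
    # Distances from u to every remaining taxon, used in the next round
    reduced = {k: (dist(i, k) + dist(j, k) - dist(i, j)) / 2
               for k in taxa if k not in (i, j)}
    return i, j, bi, bj, reduced

print(nj_step(["a", "b", "c", "d", "e"], D))
# → ('a', 'b', 2.0, 3.0, {'c': 7.0, 'd': 7.0, 'e': 6.0})
```

Repeating this step on the reduced matrix, with u treated as a new taxon, resolves the star tree one pair at a time.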
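Star Decomposition
[Figure: star decomposition for taxa A–E, starting from an unresolved star tree and successively separating pairs of neighbors (first A and B, then C and D) until the tree is resolved]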
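Advantages of Clustering Methods
Clustering algorithms build a single tree directly rather than searching among all possible trees, whose number grows explosively with the number of sequences:

Sequences   Number of unrooted trees
3           1
4           3
5           15
6           105
7           945
8           10,395
9           135,135
10          2,027,025
11          34,459,425
12          654,729,075
13          13,749,310,575
14          316,234,143,225
15          7,905,853,580,625

[Figure: the three possible unrooted trees for four taxa (sp1–sp4)]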
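The counts in the table come from a simple product. An unrooted binary tree with k taxa has 2k − 3 branches, so a (k+1)th taxon can be attached in 2k − 3 ways. Multiplying these choices gives (2n − 5)!! = 3 · 5 · … · (2n − 5) trees for n taxa. The sketch below computes that product. The function name is illustrative.

```python
def num_unrooted_trees(n: int) -> int:
    """Number of unrooted, fully resolved trees for n >= 3 labeled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

print(num_unrooted_trees(10))
# → 2027025
```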
Distance as an Optimality Criterion
• Real data are rarely metric
• Fitch and Margoliash (1967)
  – Attempts to minimize the difference between the observed pairwise distances and the path-length distances between two taxa on a tree topology:

$$E = \sum_{i=1}^{T-1} \sum_{j=i+1}^{T} w_{ij} \, \lvert d_{ij} - p_{ij} \rvert^{\alpha}$$

  where T is the number of taxa,
  d_ij is the observed pairwise distance,
  p_ij is the path-length distance between taxa i and j on the tree, and
  w_ij is a weighting factor that in some cases might be used to assign less weight to those distances likely to be more variable.
  – α is often set to 2, so that E becomes the least-squares fit criterion
• Minimum Evolution (Kidd and Sgaramella-Zonta, 1971)
  – E from FM is used to fit branch lengths
  – The best tree is the one with the smallest sum of branch lengths
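The criterion E translates directly into a double loop over the upper triangle of the matrices. The sketch below uses hypothetical observed distances and path lengths from a candidate three-taxon tree. The weighting w_ij = 1/d_ij² is shown as one common way to down-weight large, more variable distances. It is an illustrative choice, not a claim about any particular program's default.

```python
def fm_error(d, p, w=None, alpha=2.0):
    """E = sum over i < j of w_ij * |d_ij - p_ij| ** alpha for T x T matrices."""
    T = len(d)
    total = 0.0
    for i in range(T - 1):            # i = 1 .. T-1 in the formula (0-based here)
        for j in range(i + 1, T):     # j = i+1 .. T
            weight = 1.0 if w is None else w[i][j]
            total += weight * abs(d[i][j] - p[i][j]) ** alpha
    return total

# Hypothetical observed distances and path lengths on a candidate tree
d_obs = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
p_tree = [[0, 3, 4], [3, 0, 4], [4, 4, 0]]
print(fm_error(d_obs, p_tree))        # → 1.0
# Down-weighting large distances with w_ij = 1 / d_ij**2
w = [[1 / v ** 2 if v else 0.0 for v in row] for row in d_obs]
print(fm_error(d_obs, p_tree, w))     # → 0.04
```

Under minimum evolution, a fit like this would only set the branch lengths. Candidate trees would then be compared by the sum of those branch lengths.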
Part II: A. Choosing Among General Methods
Parsimony, Likelihood, or Distance?
Choosing Among General Methods (from Page and Holmes, 1998)
• Efficiency
  – How fast is a method?
• Power
  – How much data does the method need to produce a reasonable result?
• Robustness
  – Will minor violations of the method’s assumptions result in poor estimates of phylogeny?
• Falsifiability
  – Will the method tell us when its assumptions are violated (i.e., that we should not be using the method at all)?
Testing the Accuracy of Methods
• Known phylogenies
  – Fitch and Atchley (1985); Atchley and Fitch (1991)
    • used genetic data from inbred strains of laboratory mice to test phylogenetic methods
  – Hillis et al. (1992, 1994)
    • used bacteriophage T7, which has a fast mutation rate, and sped things up further by introducing mutagens
• Simulate data
  – Select a known tree (topology and branch lengths)
  – “Mutate” data on the tree according to the model, so that the expected number of changes on a branch is proportional to its defined branch length
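The "mutate data on the tree" step can be sketched for one branch under the Jukes-Cantor (JC) model, with branch length t measured in expected substitutions per site. Under JC, a site ends in a different base with probability 3/4 · (1 − e^(−4t/3)), and each of the three other bases is equally likely. The model choice and the function name `evolve_jc` are assumptions for illustration. Note that observed differences saturate: they grow more slowly than t.

```python
import math
import random

BASES = "ACGT"

def evolve_jc(seq: str, t: float, rng: random.Random) -> str:
    """Evolve seq along a branch of length t (expected substitutions/site) under JC."""
    # Probability that a site ends in a different base after time t
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))  # one of the 3 others
        else:
            out.append(base)
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(0)
    root = "".join(rng.choice(BASES) for _ in range(20))
    print(root)
    print(evolve_jc(root, 0.3, rng))
```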
Testing Accuracy of Methods: Simulate Data on Four-taxon Trees (Parametric Bootstrap)
Generate data matrices under a model of sequence evolution
seqA agctaagcctcaagctaagcctca
seqB accagagtctcgagctaagcctca
seqC accagagtctcaagctaagcctca
seqD agctaagcctcaagctaagcctca
[Figure: data are simulated on the model tree ((seqA, seqB), (seqC, seqD)); each simulated matrix is analyzed with parsimony, distance, and likelihood, and the estimated tree (seqA grouped with seqB, seqC, or seqD) is recorded; repeat many times]
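The simulate-then-estimate loop can be sketched end to end for four taxa. The sketch makes three assumptions: data are simulated under JC on the tree ((A,B),(C,D)), only parsimony is run, and the scoring uses a four-taxon shortcut. With four taxa, only sites with an xxyy pattern are parsimony-informative, and each such site supports exactly one of the three splits. Ties are broken at random. The branch lengths in the demo are hypothetical: they place two long, non-sister branches (A and C) in the setting where long-branch attraction is expected. The output is not shown, because it depends on the random draws.

```python
import math
import random

BASES = "ACGT"
SPLITS = ("AB|CD", "AC|BD", "AD|BC")

def evolve_jc(seq, t, rng):
    """Jukes-Cantor evolution of seq along a branch of length t."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    return "".join(rng.choice([b for b in BASES if b != s]) if rng.random() < p else s
                   for s in seq)

def simulate_quartet(n_sites, ext, internal, rng):
    """Simulate on ((A,B),(C,D)); ext maps taxon -> external branch length."""
    left = "".join(rng.choice(BASES) for _ in range(n_sites))  # node joining A and B
    right = evolve_jc(left, internal, rng)                     # node joining C and D
    return {"A": evolve_jc(left, ext["A"], rng), "B": evolve_jc(left, ext["B"], rng),
            "C": evolve_jc(right, ext["C"], rng), "D": evolve_jc(right, ext["D"], rng)}

def parsimony_quartet(seqs, rng):
    """Pick the split supported by the most xxyy (informative) sites."""
    support = dict.fromkeys(SPLITS, 0)
    for a, b, c, d in zip(seqs["A"], seqs["B"], seqs["C"], seqs["D"]):
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    best = max(support.values())
    return rng.choice([s for s in SPLITS if support[s] == best])

def accuracy(n_reps, n_sites, ext, internal, seed=1):
    """Fraction of simulated data sets for which parsimony recovers AB|CD."""
    rng = random.Random(seed)
    hits = sum(parsimony_quartet(simulate_quartet(n_sites, ext, internal, rng), rng)
               == "AB|CD" for _ in range(n_reps))
    return hits / n_reps

if __name__ == "__main__":
    long_ac = {"A": 0.75, "B": 0.05, "C": 0.75, "D": 0.05}  # hypothetical settings
    print(accuracy(200, 500, long_ac, internal=0.05))
```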
The Methods
• Distance
  – Pick the tree or set of trees that minimizes the difference between observed pairwise distances and path-length distances calculated on a tree (i.e., the sum of branch lengths between a pair of external nodes).
• Maximum Parsimony
  – Pick the tree or set of trees that minimizes the total amount of implied evolutionary change (often counted as steps).
• Maximum Likelihood
  – Pick the tree or set of trees that maximizes the probability of having observed the data.
Accuracy of Methods: Simulated Data on a Four-taxon Tree
[Figure: Fig. 3 from Huelsenbeck and Hillis, 1996]
[Figure: performance of methods over parameter space (from Holder, 2001)]
Distance Methods
• Pros
  – Fast
    • Algorithmic methods (UPGMA, NJ) are very fast
    • NP-complete when used as an optimality criterion (not enough time in the universe to solve exactly when N is large)
  – When distances are small, corrected distances can perform well
    • but if distances are too small, you might observe large statistical error
• Cons
  – UPGMA requires ultrametric data (i.e., a clock)
    • even when distances are small, the method may fail
  – Loss of information
    • cannot trace character transformations over the tree
  – Branch lengths may be difficult to interpret for some distances
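As an example of a "corrected distance", the Jukes-Cantor correction d = −(3/4) ln(1 − 4p/3) converts an uncorrected p distance into an estimate of substitutions per site that accounts for unobserved multiple hits. The sketch below applies it to two values from the p-distance matrix above. The correction is small when p is small and grows steeply as p approaches the saturation value of 0.75, where it is undefined.

```python
import math

def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) ln(1 - 4p/3), for 0 <= p < 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    # Homo sapiens vs. Pan, and Homo sapiens vs. Lemur catta, from the matrix above
    for p in (0.08929, 0.30794):
        print(p, round(jc_distance(p), 5))
```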
Parsimony
• Pros
  – Relatively fast; tree scores are easy to calculate (finding the best tree is still an NP-complete problem)
  – No explicit assumptions about the evolutionary process
  – For large trees, fairly robust to long-branch attraction (LBA)
• Cons
  – No explicit assumptions
  – No objective function for determining the best weighting scheme
  – Inconsistent when branch lengths are heterogeneous (i.e., long branches attract)
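A tree's parsimony score is "easy to calculate" because of Fitch's small-parsimony algorithm, sketched below. Working up from the leaves, a node takes the intersection of its children's state sets when they overlap. Otherwise it takes their union and the change count goes up by one. The nested-tuple tree format and the toy alignment are assumptions for illustration. The root placement does not affect the score.

```python
def fitch(tree, states):
    """Fitch count for one site: returns (state set, number of changes).
    tree: nested 2-tuples of leaf names; states: leaf name -> base."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left, lcost), (right, rcost) = (fitch(child, states) for child in tree)
    common = left & right
    if common:
        return common, lcost + rcost
    return left | right, lcost + rcost + 1

def parsimony_score(tree, alignment):
    """Sum of Fitch counts over all columns (alignment: name -> sequence)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    return sum(fitch(tree, {n: alignment[n][i] for n in names})[1]
               for i in range(length))

# Hypothetical alignment and two candidate trees
aln = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTA"}
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(parsimony_score(t1, aln), parsimony_score(t2, aln))
# → 3 5
```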
Long-Branch Attraction
[Figure: True Tree vs. Estimated Tree; when the true tree is reconstructed with parsimony, the two long branches are incorrectly grouped together]
Breaking Up Long Branches: Can parsimony artifacts be cured by dense taxonomic sampling?
Sometimes, but the problem is more complicated (see Poe and Swofford, 1999).
[Figure: True Tree vs. Estimated Tree, reconstructed with parsimony after adding a taxon that breaks up a long branch]
Maximum Likelihood
• Pros
  – Requires an explicit model of nucleotide evolution
  – Compensates well for unobserved changes
  – The likelihood can be used as an objective function to determine the relative fit between data and model
  – Robust to violations of the underlying model
• Cons
  – Requires an explicit model of nucleotide evolution
  – Relatively slow: estimating parameters and calculating tree scores can take a long time for more than 20 sequences (finding the best tree is also an NP-complete problem)
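A two-sequence case shows the likelihood acting as an objective function that compensates for unobserved changes. Under JC with equal base frequencies, a site that is identical at both ends has probability ¼(¼ + ¾e^(−4t/3)). A site showing one specific different base has probability ¼(¼ − ¼e^(−4t/3)). The sketch maximizes the resulting log-likelihood over t by golden-section search. The search bounds and the function names are assumptions. For this model, the maximizing t coincides with the Jukes-Cantor corrected distance.

```python
import math

def jc_pair_loglik(t, n_same, n_diff):
    """Log-likelihood of two aligned sequences at distance t under JC."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)   # base x at one end, same x at the other
    p_diff = 0.25 * (0.25 - 0.25 * e)   # base x at one end, a specific y != x
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)

def ml_distance(n_same, n_diff, lo=1e-8, hi=10.0, iters=200):
    """Golden-section search for the t maximizing jc_pair_loglik (needs p < 0.75)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if jc_pair_loglik(c, n_same, n_diff) < jc_pair_loglik(d, n_same, n_diff):
            a = c          # maximum lies in [c, b]
        else:
            b = d          # maximum lies in [a, d]
    return (a + b) / 2

if __name__ == "__main__":
    t_hat = ml_distance(900, 100)                  # hypothetical counts, p = 0.1
    print(round(t_hat, 6), round(-0.75 * math.log(1 - 4 * 0.1 / 3), 6))
```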
Part II: B. Choosing among models within each general class of method
Parsimony: Fitch, Transversion, or Generalized?
Likelihood: GTR, TrN, HKY85, K3ST, K2P, F81, or JC (each optionally +Pinv)?
Distance: Absolute, Mean, Evolutionary, or LogDet?
Choosing a model
• Goal: “parsimonious” models
  – Find the simplest model (fewest parameters) that is adequate to explain the data
  – Don’t improve the “fit” of the model by trying to explain everything
Overparameterizing a Model
[Figure: the same data (y vs. x, x from 0 to 100) fit by (left) a straight line, y = 1.30 + 0.965x (r² = 0.963), and (right) a seventh-degree polynomial, y = −330 + 134x − 15.5x² + 0.816x³ − 0.0225x⁴ + 0.000335x⁵ − 0.00000255x⁶ + 0.00000000777x⁷ (r² = 1.000)]
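The right-hand panel's perfect fit says nothing about the process, and a small sketch shows why. With as many parameters as data points, the fit is perfect by construction. The data below are hypothetical: a straight line plus noise at eight x values. They are fit once with a 2-parameter line and once with the 8-parameter (degree-7) polynomial that passes through every point, written in Lagrange form. The printed numbers depend on the random noise, so they are not shown.

```python
import random

def r_squared(y, y_hat):
    """Coefficient of determination for fitted values y_hat."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Least-squares line y = b0 + b1*x (2 parameters)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return lambda v: b0 + b1 * v

def fit_interpolant(x, y):
    """Polynomial of degree len(x) - 1 through every point, in Lagrange form."""
    def f(v):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(x, y)):
            term = yi
            for j, xj in enumerate(x):
                if j != i:
                    term *= (v - xj) / (xi - xj)
            total += term
        return total
    return f

# Hypothetical data: a straight line plus noise, 8 points from x = 0 to 98
rng = random.Random(42)
x = [float(v) for v in range(0, 100, 14)]
y = [1.3 + 0.965 * v + rng.gauss(0.0, 8.0) for v in x]

line, poly = fit_line(x, y), fit_interpolant(x, y)
print("line     r2 =", round(r_squared(y, [line(v) for v in x]), 3))
print("degree-7 r2 =", round(r_squared(y, [poly(v) for v in x]), 3))
# Between the data points the degree-7 curve is free to swing away from the trend
print("prediction at x = 91:", round(line(91), 1), "vs", round(poly(91), 1))
```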
Choosing among Parsimony Methods
• Equal Weighted Parsimony
  – Change from one base to another = 1
• Transversion Parsimony
  – Transversions (purines {A, G} ↔ pyrimidines {C, T}) = 1
  – Transitions (A ↔ G or C ↔ T) = 0
• General or Weighted Parsimony

     A  G  C  T
  A  -  1  3  3
  G  1  -  3  3
  C  3  3  -  1
  T  3  3  1  -

  A “step matrix” or cost matrix for transformations from one base to another.
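Fitch counting cannot handle unequal costs. A step matrix like the one above is usually scored with Sankoff's dynamic-programming algorithm, sketched below. For every node and every possible state, it stores the minimum cost of the subtree below that node. Equal-weighted and transversion parsimony are then just different cost matrices. The nested-tuple tree format and the leaf names are assumptions for illustration.

```python
INF = float("inf")
BASES = "AGCT"
# The step matrix above: transitions (A<->G, C<->T) cost 1, transversions cost 3
STEP = {"A": {"A": 0, "G": 1, "C": 3, "T": 3},
        "G": {"A": 1, "G": 0, "C": 3, "T": 3},
        "C": {"A": 3, "G": 3, "C": 0, "T": 1},
        "T": {"A": 3, "G": 3, "C": 1, "T": 0}}

def sankoff(tree, states, cost=STEP):
    """Minimum subtree cost for each possible state at this node."""
    if isinstance(tree, str):                       # leaf: only its observed base is free
        return {b: (0 if b == states[tree] else INF) for b in BASES}
    children = [sankoff(child, states, cost) for child in tree]
    return {s: sum(min(cost[s][x] + child[x] for x in BASES) for child in children)
            for s in BASES}

def site_score(tree, states, cost=STEP):
    """Weighted parsimony length of one site."""
    return min(sankoff(tree, states, cost).values())

tree = (("w", "x"), ("y", "z"))
leaves = {"w": "A", "x": "G", "y": "C", "z": "T"}
print(site_score(tree, leaves))
# → 5   (two transitions at cost 1, one transversion at cost 3)
```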
Comparing Parsimony Scores
[Figure: trees for Homo sapiens, Pan, Gorilla, Pongo, and Hylobates, with branch lengths under each of three parsimony schemes]
Unweighted or Fitch parsimony score: 356
Transversion parsimony score: 73
General or weighted parsimony (Tv:Ti of 3:1) score: 504
No common currency: each scheme counts cost on its own scale, so these scores cannot be compared to choose among the schemes.
Choosing among Likelihood Methods
The likelihood can be considered a “goodness of fit” for the data (i.e., common currency)
• Likelihood Ratio test
• Monte Carlo Simulations
• Akaike Information Criterion
A Family of Substitution Models (from Swofford et al., 1996)
[Figure: a hierarchy of nested substitution models, ranging from the most general (GTR) to the simplest (JC):
GTR (general time-reversible)
SYM and TrN (Tamura-Nei)
K3ST (Kimura 3-subst. type): 3 substitution types (transitions, 2 transversion classes), equal base frequencies
HKY85 and F84 (Felsenstein): 2 substitution types (transitions vs. transversions)
K2P (Kimura 2-parameter): 2 substitution types (transitions vs. transversions), equal base frequencies
F81 (Felsenstein): single substitution type
JC (Jukes-Cantor): single substitution type, equal base frequencies]
Likelihood Ratio For Nested Phylogenetic Models
[Figure: trees for Gorilla, Pan, Homo sapiens, Hylobates, and Pongo estimated under GTR+ (with rate heterogeneity) and under GTR]
−lnL1 = 2625.73859   −lnL0 = 2664.43013
∆ = 38.69154
Likelihood Ratio For Nested Models

$$\Lambda = \frac{L_0}{L_1}$$

We are less likely to accept the model restriction when Λ is small.
  – L1 = maximum likelihood given the data when fewer restrictions are imposed on the model of nucleotide substitution (more free parameters, or with rate heterogeneity).
  – L0 = maximum likelihood given the data when certain parameters are restricted (fewer free parameters, or without rate heterogeneity).
• To assess significance, we use the fact that −2 ln Λ is approximately χ² distributed with k degrees of freedom:

$$-2\ln\Lambda = 2(\ln L_1 - \ln L_0) \sim \chi^2_k$$

  – where k is the difference between the number of free parameters used to calculate L0 and L1 (L0 has k fewer parameters than L1).
Applied to the GTR+ vs. GTR comparison:
δ = 2(lnL1 − lnL0) = 2[(−lnL0) − (−lnL1)] = 2(2664.43013 − 2625.73859) = 77.38308
χ² (df = 1) = 77.38308, P < 0.0001
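The test above can be reproduced from the two −lnL values. The sketch computes δ = 2(lnL1 − lnL0) and the χ² upper-tail probability. To stay within the standard library, it uses the closed forms for integer degrees of freedom: a finite exponential series for even k, and erfc plus a finite series for odd k. This is only a substitute for a statistics library. As on the slide, it relies on the large-sample χ² approximation.

```python
import math

def chi2_sf(x: float, k: int) -> float:
    """P(X > x) for a chi-square variable with integer k degrees of freedom."""
    y = x / 2.0
    if k % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(k // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (k + 1) // 2))

def likelihood_ratio_test(neg_lnl0, neg_lnl1, k):
    """delta = 2(lnL1 - lnL0); L0 is the restricted model with k fewer parameters."""
    delta = 2.0 * (neg_lnl0 - neg_lnl1)
    return delta, chi2_sf(delta, k)

# -lnL values from the GTR+ vs. GTR example above
delta, p = likelihood_ratio_test(neg_lnl0=2664.43013, neg_lnl1=2625.73859, k=1)
print(f"delta = {delta:.5f}, P = {p:.2e}")
```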
Parametric Bootstrap For Non-Nested Models (Efron, 1985)
[Figure: the observed data (seqA–seqE) and the best tree for Gorilla, Pan, Homo sapiens, Hylobates, and Pongo (0.05 substitutions/site); n replicate data sets are simulated on this tree, and for each replicate i the statistic δ̂_i = 2(lnL1 − lnL0) is computed]
• Generate data sets on the best tree using the parameters estimated under the null model from the observed data.
• Calculate the LRT statistic for each replicate.
• Use the replicate LRT statistics as your null distribution.
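The bookkeeping for the three steps above is simple once the simulate-and-refit step exists. That step is the expensive part, and here it is a placeholder callable (`simulate_null_stat`, a hypothetical name). The sketch collects the null distribution and reports the fraction of replicates at least as extreme as the observed δ. It counts the observed value as one extra draw, so the p-value is never exactly zero. The toy simulator in the demo is not phylogenetic. It only stands in for a real simulation.

```python
import random

def parametric_bootstrap_pvalue(observed, simulate_null_stat, n_reps=1000, seed=1):
    """Null distribution of an LRT statistic by simulation.

    simulate_null_stat(rng) is a placeholder for: simulate one data set on the
    best tree under the null-model parameter estimates, fit both models, and
    return delta = 2(lnL1 - lnL0) for that replicate."""
    rng = random.Random(seed)
    null_stats = [simulate_null_stat(rng) for _ in range(n_reps)]
    exceed = sum(stat >= observed for stat in null_stats)
    # Counting the observed value as one more draw keeps the p-value above zero
    return (exceed + 1) / (n_reps + 1), null_stats

def toy_null_stat(rng):
    """Hypothetical stand-in for the simulate-and-refit step."""
    return rng.gauss(0.0, 1.0) ** 2

if __name__ == "__main__":
    p, stats = parametric_bootstrap_pvalue(77.38308, toy_null_stat, n_reps=500)
    print(f"P = {p:.4f} from {len(stats)} replicates")
```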
Parametric Bootstrap For Non-Nested Phylogenetic Models (Goldman, 1993)
[Figure: histogram (Frequency) of δ = 2(lnL1 − lnL0) across simulated replicates, with the observed value of δ̂ marked]
Akaike Information Criterion (AIC) (Akaike, 1974)
• AIC = −2 lnL + 2n
• where lnL is the maximum log-likelihood of a specific model of nucleotide sequence evolution and tree topology given the data,
  and n is the number of parameters free to vary
• A smaller AIC indicates a better model
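AIC can be applied to the two models from the likelihood-ratio example. The lnL values come from that slide. The parameter counts are assumptions: 5 free rates plus 3 free base frequencies for GTR, and one extra rate-heterogeneity parameter for the richer model, consistent with df = 1 in that test. Branch lengths are left out because both models share them, so they cancel in the comparison.

```python
def aic(lnL: float, n_params: int) -> float:
    """AIC = -2 lnL + 2n; smaller is better."""
    return -2.0 * lnL + 2.0 * n_params

# lnL from the likelihood-ratio example; parameter counts are assumptions (see text)
models = {"GTR": (-2664.43013, 8), "GTR + rate heterogeneity": (-2625.73859, 9)}
scores = {name: aic(lnL, n) for name, (lnL, n) in models.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} AIC = {score:.5f}  (delta AIC = {score - scores[best]:.5f})")
```

Unlike the likelihood ratio test, this comparison does not require the models to be nested.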
Part III: Assessing Confidence of the Phylogenetic Tree
[Figure: distribution of tree scores (frequency vs. score, roughly 1289 to 1673)]
MP tree = 1300 ± ?
Ways of assessing support for a tree topology
• Bootstrap/Jackknife analyses
• Parametric bootstrap
• KH-test and others
• Bayesian Posterior Probabilities
Bootstrap Technique (Efron, 1979)
[Figure: a population with true mean μ = 5; one sample gives a sample mean μ̂ = 5.3; pseudo-replicates resampled with replacement from that sample give μ̂ = 4.8 (pseudo-replicate 1) … μ̂ = 5.1 (pseudo-replicate n)]
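The pseudo-replicate idea from the figure fits in a few lines. The sketch resamples a single sample with replacement, recomputes the mean each time, and uses the spread of those means as an estimate of the sampling error. The sample here is hypothetical and drawn around a true mean of 5 to mirror the figure. The number of replicates is an illustrative choice.

```python
import random
import statistics

def bootstrap_means(sample, n_reps=2000, seed=1):
    """Means of pseudo-replicates drawn from the sample with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_reps)]

# A hypothetical sample around a true mean of 5 (not the lecture's data)
rng = random.Random(0)
sample = [rng.gauss(5.0, 2.0) for _ in range(25)]
reps = bootstrap_means(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"bootstrap standard error = {statistics.stdev(reps):.2f}")
```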
Bootstrapping Phylogenetic Data (Felsenstein, 1985)
[Figure: a tree is inferred from the original data set (the seven-taxon alignment shown earlier); pseudo-replicate data sets 1 … n are built by resampling alignment columns with replacement, a tree is inferred from each, and the trees are summarized as a majority-rule consensus tree with bootstrap support values of 100, 91, 100, and 57]
Taxa: (1) Lemur catta, (2) Homo sapiens, (3) Pan, (4) Gorilla, (5) Pongo, (6) Hylobates, (7) Macaca fuscata

1234567   Freq
-------   ------
.***...   100.00
.*****.   100.00
.****..    91.17
..**...    57.33
.**....    42.00
.***.*.     7.83
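The two mechanical parts of the phylogenetic bootstrap are sketched below: resampling alignment columns with replacement, and tallying how often each group of taxa appears across replicate trees. The tally is what produces a frequency table like the one above and the majority-rule consensus. Tree inference from each pseudo-replicate is left out, since any of the methods above could be plugged in. The replicate groups in the demo are hypothetical. They use the taxon numbering of the table.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """One bootstrap pseudo-replicate: alignment columns drawn with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}

def split_frequencies(replicate_groups):
    """Percentage of replicate trees containing each group of taxa.
    replicate_groups: one set of frozensets per replicate tree."""
    counts = Counter(g for groups in replicate_groups for g in groups)
    n = len(replicate_groups)
    return {group: 100.0 * c / n for group, c in counts.items()}

def majority_rule(freqs):
    """Groups found in more than 50% of replicates."""
    return {group: pct for group, pct in freqs.items() if pct > 50.0}

if __name__ == "__main__":
    print(resample_columns({"Homo": "AAGCTTCACCG", "Lemur": "AAGCTTCATAG"},
                           random.Random(0)))
    # Hypothetical groups from three replicate trees, numbered as in the table above
    reps = [{frozenset({2, 3, 4}), frozenset({2, 3})},
            {frozenset({2, 3, 4}), frozenset({3, 4})},
            {frozenset({2, 3, 4}), frozenset({3, 4})}]
    for group, pct in majority_rule(split_frequencies(reps)).items():
        print(sorted(group), round(pct, 2))
```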
Jackknifing Phylogenetic Data
• Also used to assess support for nodes on a given tree
• Data are sampled without replacement
• Replicates represent some fraction of the total data set
• The jackknife tree is also displayed as a consensus tree, where support for a node is given as the percentage of jackknife replicates that contain the node
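A jackknife replicate differs from a bootstrap replicate in one respect. Columns are drawn without replacement, so each replicate is a smaller subset of the original sites. The sketch keeps the chosen columns in their original order. The retained fraction is a user choice, and 0.5 in the demo is only illustrative. Support values are then tallied across replicate trees exactly as for the bootstrap.

```python
import random

def jackknife_columns(alignment, fraction, rng):
    """One jackknife replicate: a random subset of columns, without replacement,
    kept in their original order."""
    names = list(alignment)
    length = len(alignment[names[0]])
    k = round(fraction * length)
    cols = sorted(rng.sample(range(length), k))
    return {name: "".join(alignment[name][c] for c in cols) for name in names}

if __name__ == "__main__":
    # fraction=0.5 is illustrative, not a recommendation
    aln = {"Homo": "AAGCTTCACCG", "Lemur": "AAGCTTCATAG"}
    print(jackknife_columns(aln, 0.5, random.Random(0)))
```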
Parametric Bootstrap
[Figure: a tree for Homo sapiens, Pan, Gorilla, Pongo, and Hylobates, estimated from the original alignment, is used to generate simulated data sets; the tree is re-estimated from each simulated data set, and replicates may recover different topologies]
Generate data sets on the tree given branch lengths and substitution parameters … re-estimate the tree from each simulated data set.
Is Cyclura Sister to Iguana? A Case of Long-Branch Attraction? The Parametric Bootstrap Applied
[Figure: Hyp A and Hyp B, alternative trees for Outgroup, Cyclura, Iguana, Sauromalus, and Others; bar charts over categories A–E for parsimony and likelihood analyses, shown in two sets of panels]
Kishino-Hasegawa Test (KH-test) (Kishino and Hasegawa, 1989)
[Figure: two alternative trees for Lemur catta, Tarsius syrichta, Saimiri sciureus, Macaca fuscata, M. mulatta, M. fascicularis, M. sylvanus, Homo sapiens, Pan, Gorilla, Pongo, and Hylobates, with −lnL = 5735.81631 and −lnL = 5728.06210 (difference 7.75420)]
Site log likelihood scores

Site         1        2        3        4        5        6        7        …  n-sites  SUM
Tree 1       -1.8035  -1.8035  -3.2596  -4.8071  -2.2836  -2.2836  -9.7611  …  -2.1021  -5728.0621
Tree 2       -1.8045  -1.8045  -3.2608  -4.8120  -2.2846  -2.2846  -9.7737  …  -2.1030  -5735.8163
Difference    0.0010   0.0010   0.0012   0.0048   0.0010   0.0010   0.0126  …   0.0009   7.7542
Distribution of Log Likelihood Differences
[Figure: histogram (Frequency) of per-site log-likelihood differences between the two trees, spanning roughly −3.0 to 0.5]
KH-test Assumptions
• A large number of sites is sampled; in this case the paired t-test essentially equals a z-test
• Sites are independently and identically distributed
• Trees must be selected a priori
• Alternatives to the KH-test exist that relax some of these assumptions
  – SH-test (see Goldman et al., 2000)
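Under these assumptions, the KH test is a paired z-test on per-site log-likelihood differences. The statistic δ is the sum of the differences. Its standard deviation is estimated as √(n · s²), where s² is the sample variance of the site differences. The sketch reports a two-sided normal p-value, which is one common way to frame the test. The demo uses only the eight site columns printed in the table above, so its output is illustrative and not a real test of the two trees.

```python
import math
import statistics

def kh_test(site_lnl_1, site_lnl_2):
    """Normal-approximation KH test for two trees chosen a priori.
    Returns (delta, z, two-sided p)."""
    diffs = [a - b for a, b in zip(site_lnl_1, site_lnl_2)]
    n = len(diffs)
    delta = sum(diffs)                                  # lnL(tree 1) - lnL(tree 2)
    se = math.sqrt(n * statistics.variance(diffs))      # SD of the summed difference
    z = delta / se
    p = math.erfc(abs(z) / math.sqrt(2))                # two-sided normal tail
    return delta, z, p

if __name__ == "__main__":
    # Only the eight site columns shown in the table above (illustrative)
    tree1 = [-1.8035, -1.8035, -3.2596, -4.8071, -2.2836, -2.2836, -9.7611, -2.1021]
    tree2 = [-1.8045, -1.8045, -3.2608, -4.8120, -2.2846, -2.2846, -9.7737, -2.1030]
    delta, z, p = kh_test(tree1, tree2)
    print(f"delta = {delta:.4f}, z = {z:.2f}, P = {p:.3g}")
```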
Bayesian Approach to Phylogenetic Inference (see Huelsenbeck et al., 2001)
• Based on a quantity called the posterior probability of a tree
• The posterior probability of a tree is the probability of the tree conditional on the observations. It is obtained by combining the prior and the likelihood for each tree using Bayes’ rule.
• Bayes’ rule
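$$\Pr[\text{Tree} \mid \text{Data}] = \frac{\Pr[\text{Data} \mid \text{Tree}] \times \Pr[\text{Tree}]}{\Pr[\text{Data}]}$$

where Pr[Data | Tree] is the likelihood, Pr[Tree] is the prior probability, and Pr[Data] is the normalizing constant.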
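For a small, fixed set of trees, Bayes' rule can be applied directly. Each tree's posterior is its likelihood times its prior, divided by the sum of that product over all trees, which is the normalizing constant. The sketch works in log space with the log-sum-exp trick to avoid underflow. It is a toy. It treats each tree's likelihood as a single known number, whereas a real analysis integrates over branch lengths and substitution parameters. The log-likelihoods in the demo are hypothetical.

```python
import math

def posterior_probabilities(log_likelihoods, log_priors=None):
    """Bayes' rule over a finite set of trees, computed in log space."""
    n = len(log_likelihoods)
    if log_priors is None:                      # equal prior for every tree
        log_priors = [-math.log(n)] * n
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    m = max(log_joint)                          # log-sum-exp for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_joint))  # ln Pr[Data]
    return [math.exp(v - log_norm) for v in log_joint]

if __name__ == "__main__":
    # Hypothetical log-likelihoods for the three unrooted four-taxon trees
    print([round(p, 3) for p in posterior_probabilities([-10.0, -11.0, -12.0])])
    # → [0.665, 0.245, 0.09]
```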
Calculating the Posterior Probability of a tree
• Usually the prior P[Tree] is set equal for all trees
• The likelihood is calculated in the same way as was described last week
• Easy to formulate but hard to solve analytically
  – requires summation over all trees and, for each tree, integration over all combinations of branch lengths and substitution parameters
• Markov chain Monte Carlo (MCMC) can be used to approximate the posterior probability
• MCMC algorithm
  – propose a new tree by stochastically perturbing the current tree
  – accept or reject it with a probability given by the Metropolis-Hastings algorithm
  – repeat many thousands of times (AKA generations)
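The MCMC loop above can be sketched on a toy problem: a fixed set of three trees with known, hypothetical log-likelihoods and equal priors. This is much simpler than real phylogenetic MCMC, which perturbs topologies, branch lengths, and parameters. Each generation proposes a different tree, chosen uniformly. Because that proposal is symmetric, the Metropolis-Hastings acceptance probability reduces to min(1, L(new)/L(current)). The fraction of generations spent on each tree approximates its posterior probability. Real runs also discard an initial burn-in.

```python
import math
import random
from collections import Counter

def mcmc_trees(log_likelihoods, n_generations=50_000, seed=1):
    """Metropolis-Hastings over a fixed set of trees with equal priors."""
    rng = random.Random(seed)
    n = len(log_likelihoods)
    current = 0
    visits = Counter()
    for _ in range(n_generations):
        proposal = rng.choice([t for t in range(n) if t != current])  # symmetric move
        log_ratio = log_likelihoods[proposal] - log_likelihoods[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal                                         # accept
        visits[current] += 1                                           # sample the state
    return [visits[t] / n_generations for t in range(n)]

if __name__ == "__main__":
    lnls = [-10.0, -11.0, -12.0]                 # hypothetical, as in the previous sketch
    exact = [math.exp(v) for v in lnls]
    exact = [v / sum(exact) for v in exact]
    print([round(f, 3) for f in mcmc_trees(lnls)], [round(e, 3) for e in exact])
```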
Approximation of the Posterior Probabilities using MCMC
[Figure: −lnL plotted against generations (×10⁶)]
Tree with Bayesian Posterior Probabilities for node support
[Figure: tree for taxa from Uta stansburiana and Urosaurus ornatus to Sceloporus jarrovii, with Bayesian posterior probabilities and bootstrap values at the nodes]
Bibliography
Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:716-723.
Atchley, W. R. and W. M. Fitch. 1991. Gene trees and the origins of inbred strains of mice. Science 254: 554-558.
Efron, B. 1985. Bootstrap confidence intervals for a class of parametric problems. Biometrika 72:45-58.
Fitch, W. M. and W.R. Atchley. 1985. Evolution in inbred strains of mice appears rapid. Science 228:1169-1175.
Fitch, W. M. and E. Margoliash. 1967. Construction of phylogenetic trees. Science 155:279-284.
Hillis, D. M., J. J. Bull, M. E. White, M. R. Badgett, and I. J. Molineux. 1992. Experimental phylogenetics: generation of a known phylogeny. Science 255:589-592.
Hillis, D. M., J. P. Huelsenbeck, and C. W. Cunningham. 1994. Application and accuracy of molecular phylogenies. Science 264:671-677.
Holder, M. T. 2001. Using a Complex Model of Sequence Evolution to Evaluate and Improve Phylogenetic Methods. Ph.D. Dissertation. Univ. of Texas at Austin.
Huelsenbeck, J. P., Hillis, D. M. and Jones, R. 1996. Parametric bootstrapping in molecular phylogenetics: Applications and performance. In Ferraris, J. D. and Palumbi, S. R. (eds.), Molecular Zoology. Advances, strategies and protocols. Wiley-Liss, New York, pp. 19-45.
Huelsenbeck, J. P., Ronquist, F., Nielsen, R., Bollback, J. P. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:2310-2314.
Goldman, N. 1993. Statistical tests of models of DNA substitution. Journal of Molecular Evolution 36:182-198.
Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49:652-670.
Kishino, H. and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29:170-179.
Lewis, P. O. 2001. Phylogenetic systematics turns a new leaf. Trends in Ecology and Evolution 16:30-36.
Li, W. 1997. Molecular Evolution. Sinauer Associates. Sunderland, Massachusetts.
Page, R. D. and Holmes, E. C. 1998. Molecular Evolution: A Phylogenetic Approach. Blackwell Science, Oxford.
Poe, S., and D. L. Swofford. 1999. Taxon sampling revisited. Nature 398:299-300.
Shimodaira, H. and M. Hasegawa. 1999. Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference. Molecular Biology and Evolution 16:1114-1116.
Steel, M. and Penny, D. 2000. Parsimony, likelihood, and the role of models in molecular phylogenetics. Molecular Biology and Evolution 17:839-850.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pages 407-514 in D. M. Hillis, C. Moritz, and B. Mable (eds.) Molecular Systematics (2nd ed.), Sinauer Associates, Sunderland, Massachusetts.