a note on the relaxation time of two markov chains on rooted phylogenetic tree spaces

Statistics and Probability Letters 84 (2014) 247–252


A note on the relaxation time of two Markov chains on rooted phylogenetic tree spaces

David A. Spade a,∗, Radu Herbei b, Laura S. Kubatko b

a University of Missouri–Kansas City, Kansas City, MO, 64110, United States
b The Ohio State University, Columbus, OH, 43210, United States

Article info

Article history:
Received 8 May 2012
Received in revised form 19 August 2013
Accepted 18 September 2013
Available online 29 October 2013

Keywords:
Markov chains
Phylogenetic trees
Relaxation time
Distinguished paths

Abstract

Phylogenetic trees are commonly used to model the evolutionary relationships among a collection of biological species. Over the past fifteen years, the convergence properties of Markov chains defined on phylogenetic trees have been studied, yielding results about the time required for such chains to converge to their stationary distributions. In this work we derive an upper bound on the relaxation time of two Markov chains on rooted binary trees: one defined by nearest neighbor interchanges (NNI) and the other defined by subtree prune and regraft (SPR) moves.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

In biology, it is often of interest to study the patterns of evolution among a collection of species. Typical assumptions are that the species have evolved from a common ancestor and that the process of speciation results in the formation of two new species at a single point in time. To describe these assumptions visually, biologists commonly use a rooted binary tree called a phylogenetic tree. This undirected, acyclic graph has n external vertices, called leaves, and n − 2 internal nodes of degree 3. The graph also has one node of degree 2, which we call the root. Markov chain Monte Carlo methods are frequently used to estimate the distribution of these trees given DNA sequences at the leaves. Understanding the rates of convergence of Markov chains widely used in phylogenetics is therefore important for estimating phylogenetic trees efficiently.

While a considerable amount of work has been done on mixing times for Markov chains, the application of these techniques to phylogenetics has only been considered in the past fifteen years. Diaconis and Stroock (1991) develop inequalities that are integral to our study of relaxation and mixing times. In particular, they establish bounds on the spectral gap of the transition matrix of an irreducible, aperiodic, and reversible Markov chain. Aldous (2000) explores the idea of using chain coupling to establish bounds on the mixing time of a Markov chain on unrooted phylogenies. His work gives an $O(n^3)$ upper bound on the relaxation time of a chain in which a step consists of removing a leaf from a tree and then attaching it to another edge. Aldous (2000) also extends the ideas of Diaconis and Stroock (1991), proving that the relaxation time of his chain is bounded below by a constant multiple of $n^2$. He conjectures that the relaxation time is also bounded above by $O(n^2)$, and Schweinsberg (2002) later proves Aldous's conjecture using the method of distinguished paths.

Randall and Tetali (1999) investigate the time required for a Markov chain on rooted phylogenetic trees to converge to its stationary distribution. Their chain moves about the set $T_n$ of n-leaf rooted phylogenetic trees by performing, at each step of the chain, a tree rearrangement similar to those in one of the Markov chains we describe below. They also establish that the mixing time of their chain is $O(n^5 \log n)$.

∗ Corresponding author. Tel.: +1 816 235 2853.

0167-7152/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.spl.2013.09.017


Fig. 1. Example SPR moves. Tree x is split into the two subtrees shown to its right along the branch marked with a black dot. Tree $y_1$ is formed by re-attaching the rightmost subtree to the branch ancestral to the clade containing leaves $r_5$ and $r_6$. Tree $y_2$ is formed by re-attaching the leftmost subtree to the branch ancestral to leaf $r_1$. Tree $y_3$ is formed by re-attaching the two branches extending back from the roots of the trees formed by splitting x.

In the work we present here, we establish upper bounds on the relaxation time of two particular Markov chains on rooted binary trees. One of these Markov chains moves about $T_n$ in a fashion similar to the chain of Aldous (2000) and Schweinsberg (2002). We add some conventions to these tree rearrangements in order to handle situations that arise with rooted trees but do not occur with unrooted trees. The other Markov chain moves about the tree space by making moves similar to those in the work of Randall and Tetali (1999).

2. Background and notation

Let $T_n$ be the set of n-leaf rooted trees, which has cardinality $c_n \equiv (2n-3)!!$ (Felsenstein, 2004). A homogeneous Markov chain $\{X_t\}_{t\ge 0}$ on $T_n$ is defined via a transition probability matrix $P(x,y) = \Pr(X_{m+1} = y \mid X_m = x)$, for each $x, y \in T_n$. We assume that $P(\cdot,\cdot)$ satisfies the usual regularity conditions ensuring the existence and uniqueness of a stationary distribution $\pi(\cdot)$. In this paper we focus on two types of transitions on $T_n$: (1) subtree prune and regraft (SPR) and (2) nearest neighbor interchange (NNI).
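The count $c_n = (2n-3)!!$ can be checked by brute force for small n. The sketch below assumes a representation that is not part of the paper: a leaf is an integer label and an internal node is a Python `frozenset` of its two children, so child order does not matter. Every tree on n leaves arises exactly once by grafting leaf n onto one of the $2(n-1)-1 = 2n-3$ edges of a tree on n − 1 leaves (counting the edge above the root), which is where the odd double factorial comes from. The helper names `attach_everywhere`, `all_rooted_trees` and `c` are hypothetical.

```python
import math

# Assumed representation (not from the paper): a leaf is an int label and an
# internal node is a frozenset of its two children, so child order is ignored.


def attach_everywhere(tree, piece):
    """All trees obtained by grafting `piece` onto an edge of `tree`,
    including the edge extending back from the root of `tree`."""
    results = [frozenset({tree, piece})]  # graft above the current root
    if isinstance(tree, frozenset):
        left, right = tuple(tree)
        for new_left in attach_everywhere(left, piece):
            results.append(frozenset({new_left, right}))
        for new_right in attach_everywhere(right, piece):
            results.append(frozenset({left, new_right}))
    return results


def all_rooted_trees(n):
    """Every rooted binary tree on leaves 1..n (n >= 1), built by grafting
    leaves 2, ..., n one at a time onto every available edge."""
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [t for old in trees for t in attach_everywhere(old, leaf)]
    return trees


def c(n):
    """c_n = (2n - 3)!!, with the convention c_1 = 1."""
    return math.prod(range(1, 2 * n - 2, 2))


for n in range(1, 8):
    trees = all_rooted_trees(n)
    # each tree should be produced exactly once, and there should be c_n of them
    print(n, len(trees), len(set(trees)), c(n))
```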

An SPR transition consists of choosing an edge uniformly at random and pruning the subtree that descends from this edge. The pruned subtree is viewed as being rooted at the node that descends from the selected edge, keeping the edge extending out from this root. The remaining tree is viewed as being rooted at the most recent common ancestor (MRCA) of the remaining leaves, with an added edge that extends back from this tree. An edge is then selected at random (from either of the two subtrees), and the edge extending back from the root of the other subtree is attached to this edge. The root of the resulting tree is either the root of the subtree to which the edge is attached (when the edge selected for re-attachment is not the one extending back from the root) or the node formed by re-attaching the subtree (when the edge selected for re-attachment is the one extending back from the root). Fig. 1 gives an example of three possible SPR moves.
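One way to read the SPR description above in code, under the same assumed `frozenset` representation: cutting the edge above a non-root node gives a pruned piece and a remaining tree, and either the pruned piece is grafted onto an edge of the remaining tree or the remaining tree is grafted onto an edge of the pruned piece, the edge above each piece's root included. With $2n-2$ edges to cut and $(2m-1) + (2(n-m)-1) = 2n-2$ re-attachment edges when the pruned piece has m leaves, each (cut, re-attach) pair has probability $1/(2n-2)^2$, which is the source of Lemma 1(i). The helper names are hypothetical, and the uniform choice of the re-attachment edge is an assumption consistent with that count.

```python
from collections import Counter
from fractions import Fraction

# Assumed representation (not from the paper): int leaves, frozenset internal
# nodes. All helper names are hypothetical.


def attach_everywhere(tree, piece):
    """Graft `piece` onto every edge of `tree`, the edge above its root included."""
    results = [frozenset({tree, piece})]
    if isinstance(tree, frozenset):
        left, right = tuple(tree)
        results += [frozenset({t, right}) for t in attach_everywhere(left, piece)]
        results += [frozenset({left, t}) for t in attach_everywhere(right, piece)]
    return results


def n_leaves(tree):
    return sum(n_leaves(child) for child in tree) if isinstance(tree, frozenset) else 1


def non_root_nodes(tree):
    """Every node except the root; each one identifies the edge above it."""
    if isinstance(tree, frozenset):
        for child in tree:
            yield child
            yield from non_root_nodes(child)


def prune(tree, node):
    """Remove the subtree `node` and suppress its parent, now left with one child."""
    if tree == node:
        return None
    if not isinstance(tree, frozenset):
        return tree
    kept = [s for s in (prune(child, node) for child in tree) if s is not None]
    return kept[0] if len(kept) == 1 else frozenset(kept)


def spr_row(x):
    """P1(x, .) when the cut edge and the re-attachment edge are both uniform."""
    n = n_leaves(x)
    counts = Counter()
    for node in non_root_nodes(x):                 # 2n - 2 edges to cut
        rest = prune(x, node)
        # graft the pruned piece into the rest, or the rest into the pruned piece
        for y in attach_everywhere(rest, node) + attach_everywhere(node, rest):
            counts[y] += 1
    total = (2 * n - 2) ** 2
    return {y: Fraction(k, total) for y, k in counts.items()}


def all_rooted_trees(n):
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [t for old in trees for t in attach_everywhere(old, leaf)]
    return trees


rows = {x: spr_row(x) for x in all_rooted_trees(4)}
print(len(rows), min(p for row in rows.values() for p in row.values()))
```

Building every row for n = 4 also makes it possible to confirm that the resulting matrix is symmetric, as stated below Lemma 1.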

An NNI transition is performed by first choosing an internal node (other than the root node) to be the target node. The target node has two child nodes and a sibling node. Two of these three nodes, along with their descendant subtrees, are selected to become the new children of the target node. With probability 0.5 we select the two current children (and the tree does not change), and with probability 0.5 we select the sibling node and one of the two children at random. In the latter case, the child that is not selected becomes the new sibling of the target node. Let $\{X_t\}_{t\ge 0}$ be the Markov chain resulting from SPR transitions and $\{Y_t\}_{t\ge 0}$ be the chain resulting from NNI transitions. The following lemma describes the transition probability matrices for the two chains.
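A matching sketch for the NNI move, again under the assumed `frozenset` representation and with hypothetical helper names: for every non-root internal node t with sibling s, swapping s with either child of t yields one neighbor. The function below collects these neighbors and builds the row $P_2(x,\cdot)$ of Lemma 1(ii); because each non-trivial NNI replaces exactly one clade of the rooted tree, the $2(n-2)$ neighbors it finds are all distinct and the row sums to one.

```python
from fractions import Fraction

# Assumed representation (not from the paper): int leaves, frozenset internal
# nodes. Helper names are hypothetical.


def n_leaves(tree):
    return sum(n_leaves(child) for child in tree) if isinstance(tree, frozenset) else 1


def nni_neighbors(tree):
    """Trees reachable by one NNI that changes the tree: for a non-root
    internal node t with sibling s, swap s with one of the children of t."""
    result = set()
    if not isinstance(tree, frozenset):
        return result
    first, second = tuple(tree)
    for t, s in ((first, second), (second, first)):
        if isinstance(t, frozenset):           # t is a target whose parent is `tree`
            for a in t:
                (b,) = t - {a}                 # b stays below t, a becomes t's sibling
                result.add(frozenset({frozenset({s, b}), a}))
        for new_t in nni_neighbors(t):         # targets further down inside t
            result.add(frozenset({new_t, s}))
    return result


def nni_row(x):
    """P2(x, .) from Lemma 1(ii); assumes n >= 3 leaves."""
    n = n_leaves(x)
    row = {x: Fraction(1, 2)}                  # the two current children are kept
    for y in nni_neighbors(x):
        row[y] = Fraction(1, 4 * (n - 2))      # target w.p. 1/(n-2), then 1/2 * 1/2
    return row


caterpillar = frozenset({frozenset({frozenset({frozenset({1, 2}), 3}), 4}), 5})
print(len(nni_neighbors(caterpillar)), sum(nni_row(caterpillar).values()))  # 6 1
```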

Lemma 1. Let $P_1$ and $P_2$ be the transition probability matrices for the SPR and NNI chains, respectively. Then

(i) for each $x, y \in T_n$ such that $P_1(x,y) > 0$, $P_1(x,y) \ge \frac{1}{(2n-2)^2}$;

(ii) for each $x, y \in T_n$ such that $P_2(x,y) > 0$, $P_2(x,y) = \frac{1}{2}$ if $x = y$ and $P_2(x,y) = \frac{1}{4(n-2)}$ otherwise.

Part (i) of this lemma follows from the fact that there are $2n-2$ choices for the edge that is cut and, given the edge that is cut, there are $2n-2$ ways to re-attach the two resulting subtrees; some of these choices may give the same transition between trees. The proof of part (ii) is straightforward and is thus omitted from this manuscript. Note that both the NNI and the SPR transitions are reversible and symmetric and that the resulting Markov chains are aperiodic and irreducible (Karlin and Taylor, 1975), thus ergodic. Since a symmetric transition matrix is doubly stochastic, in both cases the unique stationary distribution is the uniform distribution, $\pi(x) = 1/c_n$ for each $x \in T_n$.

Often, it is of interest to study the rate at which an ergodic Markov chain converges to its stationary distribution. This is typically done by obtaining upper and lower bounds on the mixing time of the Markov chain. The mixing time of the process $\{X_m\}_{m\ge 0}$ is defined as

$$\tau_{\mathrm{mix}}(\epsilon) := \min\Big\{ m : \max_{x\in T_n} \big\| P^m(x,\cdot) - \pi(\cdot) \big\|_{TV} < \epsilon \Big\},$$

where $\|\mu(\cdot) - \nu(\cdot)\|_{TV}$ denotes the total variation distance between the probability measures $\mu(\cdot)$ and $\nu(\cdot)$. For background and some elementary properties of the total variation distance and the mixing time of a Markov chain, see for example Chapter 4 of Levin et al. (2009). Deriving an upper bound on the mixing time of a Markov chain has long been a research topic (see Montenegro and Tetali (2006) for a recent methodological survey). Alternatively, researchers have developed bounds for the relaxation time $\tau_{\mathrm{rel}}$ of a Markov chain, which is defined as $1/(1-\lambda_2)$, where $\lambda_2$ is the second largest eigenvalue of the transition matrix P. There is a well-known connection between the mixing time and the relaxation time of a Markov chain, given for example in Theorem 12.3 of Levin et al. (2009), which states that

$$\tau_{\mathrm{mix}}(\epsilon) \le \log\Big(\frac{1}{\epsilon\,\pi_{\min}}\Big)\,\tau_{\mathrm{rel}}.$$

Here $\pi_{\min} = \min_x \pi(x)$. In this work we use the method of distinguished paths to derive upper bounds on the relaxation times of the Markov chains $\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$. This approach has been used in various settings; see for example Diaconis and Stroock (1991), Jerrum and Sinclair (1989) and Schweinsberg (2002).
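The quantities in the last two paragraphs can be computed directly for a small symmetric chain. The sketch below uses NumPy on a toy lazy random walk on a cycle of six states (an illustrative assumption, not one of the tree chains) to compute the relaxation time and the mixing time and then checks the inequality quoted from Levin et al. (2009). As noted in its comments, it takes the eigenvalue of largest absolute value other than 1, which coincides with $\lambda_2$ here because a lazy chain has only nonnegative eigenvalues. Function names are hypothetical.

```python
import numpy as np


def relaxation_time(P):
    """1 / (1 - lambda_star) for a symmetric, irreducible transition matrix P,
    where lambda_star is the largest |eigenvalue| other than the eigenvalue 1.
    For a lazy chain (P(x, x) >= 1/2) every eigenvalue is >= 0, so this is
    the same as 1 / (1 - lambda_2)."""
    eigenvalues = np.linalg.eigvalsh(P)        # ascending; the last one is 1
    lambda_star = np.max(np.abs(eigenvalues[:-1]))
    return 1.0 / (1.0 - lambda_star)


def mixing_time(P, eps, max_steps=100_000):
    """Smallest m with max_x ||P^m(x, .) - pi||_TV < eps, where pi is uniform
    (the stationary distribution of a symmetric chain)."""
    size = P.shape[0]
    pi = np.full(size, 1.0 / size)
    Pm = np.eye(size)
    for m in range(max_steps + 1):
        worst = 0.5 * np.abs(Pm - pi).sum(axis=1).max()   # TV distance, worst start
        if worst < eps:
            return m
        Pm = Pm @ P
    raise RuntimeError("no mixing within max_steps")


# Toy chain (an assumption, not a tree chain): lazy random walk on a 6-cycle.
size = 6
P = np.zeros((size, size))
for i in range(size):
    P[i, i] += 0.5
    P[i, (i + 1) % size] += 0.25
    P[i, (i - 1) % size] += 0.25

eps = 0.25
t_rel = relaxation_time(P)
t_mix = mixing_time(P, eps)
bound = np.log(1.0 / (eps * (1.0 / size))) * t_rel   # pi_min = 1 / size
print(t_rel, t_mix, bound)
```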

3. Main result

In this section we obtain upper bounds on the relaxation times of $\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$. We begin by stating three lemmas from Schweinsberg (2002), modified to handle rooted trees, that will be used to establish the upper bounds.

Consider a subset U of the leaves {1, 2, ..., n}. For an unrooted tree x, consider the tree formed by removing from x all of the leaves whose labels are not in U (and suppressing the internal nodes that are left with degree 2). The resulting tree is called the U-tree derived from x. The following result is stated in Schweinsberg (2002).

Lemma 2. If U is a k-element subset of {1, 2, ..., n} and x is a uniform random n-leaf tree, then the U-tree derived from x is a uniform random k-leaf tree.
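Lemma 2 can be verified exhaustively for small rooted trees. The sketch below (assumed `frozenset` representation, hypothetical helper names) computes the U-tree derived from x by deleting the leaves outside U and suppressing the nodes left with one child, and then tabulates the U-trees over all $c_5 = 105$ rooted trees for $U = \{1, 3, 5\}$. If the lemma holds, each of the $c_3 = 3$ rooted trees on U appears $105/3 = 35$ times.

```python
from collections import Counter

# Assumed representation: int leaves, frozenset internal nodes (hypothetical helpers).


def attach_everywhere(tree, piece):
    """Graft `piece` onto every edge of `tree`, the edge above its root included."""
    results = [frozenset({tree, piece})]
    if isinstance(tree, frozenset):
        left, right = tuple(tree)
        results += [frozenset({t, right}) for t in attach_everywhere(left, piece)]
        results += [frozenset({left, t}) for t in attach_everywhere(right, piece)]
    return results


def all_rooted_trees(n):
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [t for old in trees for t in attach_everywhere(old, leaf)]
    return trees


def restrict(tree, keep):
    """The U-tree derived from `tree` with U = keep: delete the other leaves
    and suppress every internal node that is left with a single child."""
    if not isinstance(tree, frozenset):
        return tree if tree in keep else None
    kept = [s for s in (restrict(child, keep) for child in tree) if s is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else frozenset(kept)


counts = Counter(restrict(x, {1, 3, 5}) for x in all_rooted_trees(5))
print(sorted(counts.values()))  # [35, 35, 35]
```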

To see that this result holds for rooted trees as well, note that a rooted n-leaf tree can be viewed as an unrooted (n + 1)-leaf tree with the (n + 1)st leaf extending from the root.

For $x \in T_n$, the diameter of x is denoted by diam(x) and is defined to be the number of edges traversed in the longest path between any two leaves, where the path does not intersect itself. The following lemma states that most trees have a diameter of order $O(\sqrt{n})$.

Lemma 3. There exists a constant $C_1 < \infty$ such that the set $B = \{x \in T_n : \mathrm{diam}(x) \le C_1 n^{1/2}\}$ has $\pi(B) \ge \frac{1}{2}$, where $\pi(\cdot)$ is the uniform distribution on $T_n$.

Proof. Schweinsberg (2002) shows that if $\pi^*$ is the uniform distribution on the space of unrooted tree topologies, then there exists a constant $C_2 < \infty$ such that the median diameter with respect to $\pi^*$ is no larger than $C_2\sqrt{n}$. Since a rooted tree topology can be viewed as an unrooted tree topology with n + 1 leaves, it follows that the median diameter of a rooted tree topology with respect to π is no larger than $C_2\sqrt{n+1} \le \sqrt{2}\,C_2\sqrt{n}$, so Lemma 3 holds with $C_1 = \sqrt{2}\,C_2$. □

Let $G = (T_n, E)$ be the underlying graph corresponding to the transition matrix $P_1$. The vertex set of G is $T_n$, and the edge set is $E = \{e \equiv (x,y) : P_1(x,y) > 0\}$. Given an edge $e \in E$, we set $Q(e) = Q(x,y) = \pi(x)P_1(x,y)$. The following result, stated in Schweinsberg (2002), is used to provide an upper bound on the relaxation time $\tau_{\mathrm{rel}}(X)$.

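Lemma 4. Let $B \subset T_n$. Suppose that for all $x \in T_n$ and $y \in B$, $\gamma_{xy}$ is a path in G, possibly random, from x to y which has at most L edges. Then

$$\tau_{\mathrm{rel}}(X) \le \frac{4L}{\pi(B)} \max_{e\in E} \Bigg\{ \frac{1}{Q(e)} \sum_{x\in T_n}\sum_{y\in B} \pi(x)\pi(y)\Pr(e\in\gamma_{xy}) \Bigg\}. \qquad (1)$$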

We are now ready to state the first of our two main results.

Theorem 1. Let $\tau_{\mathrm{rel}}(X)$ be the relaxation time of the Markov chain $\{X_t\}_{t\ge 0}$. Then there exists a finite constant $M_1 < \infty$ such that $\tau_{\mathrm{rel}}(X) \le M_1 n^{5/2}$.

Proof. Our approach consists of analyzing the factors appearing on the right-hand side of (1). Using Lemma 3, select the subset $B \subset T_n$ such that $\pi(B) \ge 1/2$ and every rooted tree $x \in B$ has diameter of order $O(\sqrt{n})$. Our next step is to provide an upper bound for the lengths of paths from elements $x \in T_n$ to $y \in B$. We construct a random path $\gamma_{xy} = (x_1, x_2, \ldots, x_m)$ such that $x_1 = x$ and $x_m = y$. At step k of the path, the tree $x_{k+1}$ is formed by removing a leaf from $x_k$ and re-attaching it in such a way that the arrangement of the leaves moved in steps 1, ..., k is the same in $x_{k+1}$ as the arrangement of the same leaves in y.

Formally, let $R = \{r_1, r_2, \ldots, r_n\}$ be a uniform random permutation of the leaf labels of x. For a set $A \subset R$ and a tree $x \in T_n$, let $x(A)$ be the subtree of x that results from removing from x all leaves whose labels are not in A.


Fig. 2. Illustration of the first two steps of the SPR path from x (leftmost tree) to y (rightmost tree). In the first step, the leaf labeled $r_2$ has been removed and re-attached to the branch immediately ancestral to the leaf $r_1$. In the second step, the leaf labeled $r_3$ has been removed and re-attached to the branch immediately ancestral to leaf $r_1$.

The path starts with $x_1 = x$. The first step of the path consists of removing the leaf with label $r_2$ and attaching it to the leaf labeled $r_1$. The resulting tree is denoted by $x_2$. The tree $x_2$ contains a subtree $S_2$, which we define to be the rooted subtree that contains only the leaves $r_1$ and $r_2$ and the edge ascending from its root. Assume that for $k \ge 2$ we have constructed a tree $x_k$ that contains a subtree $S_k$ with the properties that (i) the root of $S_k$ is the MRCA of $\{r_1, r_2, \ldots, r_k\}$; (ii) $x_k(r_1, r_2, \ldots, r_k) = y(r_1, r_2, \ldots, r_k)$; and (iii) $S_k$ includes an edge ascending from its root. The tree $x_{k+1}$ is defined by removing the leaf $r_{k+1}$ from $x_k$ and attaching it to the unique edge of $S_k$ that produces $x_{k+1}$ with the property $x_{k+1}(r_1, \ldots, r_{k+1}) = y(r_1, \ldots, r_{k+1})$. At step n − 1, we remove leaf $r_n$ from $x_{n-1}$ and attach it to $S_{n-1}$ such that $x_n = S_n = y$. The first two steps of a typical path of this type are illustrated in Fig. 2.
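The path construction can be sketched in a few lines under the assumed `frozenset` representation. A convenient observation is that $S_{k+1}$, viewed as a tree, is simply $y(r_1, \ldots, r_{k+1})$, so step k removes $r_{k+1}$ and replaces the clade $S_k = y(r_1, \ldots, r_k)$ by $S_{k+1}$; this is the same as regrafting $r_{k+1}$ onto the unique edge of $S_k$ described above. Function names and the example trees are hypothetical.

```python
# Assumed representation: int leaves, frozenset internal nodes (hypothetical helpers).


def leaf_set(tree):
    if not isinstance(tree, frozenset):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def restrict(tree, keep):
    """Tree derived from `tree` on the leaf set `keep` (degree-2 nodes suppressed)."""
    if not isinstance(tree, frozenset):
        return tree if tree in keep else None
    kept = [s for s in (restrict(child, keep) for child in tree) if s is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else frozenset(kept)


def replace_subtree(tree, old, new):
    """Replace the clade `old` (which occurs at most once) by `new`."""
    if tree == old:
        return new
    if not isinstance(tree, frozenset):
        return tree
    return frozenset(replace_subtree(child, old, new) for child in tree)


def spr_leaf_path(x, y, order):
    """Path x = x_1, ..., x_n = y with order = (r_1, ..., r_n). Step k prunes
    leaf r_{k+1} and regrafts it inside S_k so that the new tree agrees with y
    on {r_1, ..., r_{k+1}}."""
    everything = leaf_set(x)
    path = [x]
    for k in range(1, len(order)):
        placed, moving = set(order[:k]), order[k]        # {r_1..r_k}, r_{k+1}
        s_k = restrict(y, placed)                         # clade S_k of the current tree
        s_next = restrict(y, placed | {moving})           # clade S_{k+1}
        pruned = restrict(path[-1], everything - {moving})
        path.append(replace_subtree(pruned, s_k, s_next))
    return path


x = frozenset({frozenset({frozenset({frozenset({1, 2}), 3}), 4}), 5})
y = frozenset({frozenset({1, 4}), frozenset({frozenset({2, 5}), 3})})
path = spr_leaf_path(x, y, order=(3, 1, 5, 2, 4))
print(len(path), path[-1] == y)   # 5 True
```

Each consecutive pair of trees agrees once the moved leaf is deleted, which is what makes every step a single SPR move of a pendant leaf edge.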

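This approach defines a path $\gamma_{xy}$ from x to y. If an edge $e \in E$ connects $x_k$ to $x_{k+1}$ in the path $\gamma_{xy}$, we say that $e \in \gamma_{xy}$ at the kth step. Clearly, $|\gamma_{xy}| \le n-1$ for all $x, y \in T_n$. By Lemma 4,

$$\tau_{\mathrm{rel}}(X) \le \frac{4(n-1)}{\pi(B)} \max_{e\in E} \Bigg\{ \frac{1}{Q(e)} \sum_{x\in T_n}\sum_{y\in B} \pi(x)\pi(y)\Pr(e\in\gamma_{xy}) \Bigg\}.$$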

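The chain $\{X_m\}$ is irreducible and aperiodic with a symmetric transition matrix, so its stationary distribution is uniform on $T_n$. If $P_1(x,y) > 0$, then $Q(e) \ge 1/\big((2n-2)^2 c_n\big)$, which, together with $\pi(B) \ge 1/2$, implies that

$$\tau_{\mathrm{rel}}(X) \le 8(n-1)(2n-2)^2 c_n \max_{e\in E} \sum_{x\in T_n}\sum_{y\in B} \pi(x)\pi(y)\Pr(e\in\gamma_{xy}) \le 32 n^3 c_n \max_{e\in E} \sum_{x\in T_n}\sum_{y\in B} \pi(x)\pi(y)\Pr(e\in\gamma_{xy}),$$

where the second inequality uses $8(n-1)(2n-2)^2 = 32(n-1)^3 \le 32n^3$.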

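Consider the set $K(e) = \{k : \Pr(e \in \gamma_{xy} \text{ at step } k) > 0 \text{ for some } x \in T_n,\ y \in B\}$. Schweinsberg (2002) shows that in the case of unrooted trees $|K(e)| = O(n^{1/2})$. For rooted trees, the argument is similar; hence,

$$\tau_{\mathrm{rel}}(X) \le 32 n^3 c_n \max_{e\in E} \sum_{x\in T_n}\sum_{y\in B} \pi(x)\pi(y) \sum_{k\in K(e)} \Pr(e\in\gamma_{xy} \text{ at step } k) \le 32 C_1 n^{7/2} c_n \max_{e\in E}\max_{k\in K(e)} \sum_{x\in T_n}\sum_{y\in T_n} \pi(x)\pi(y)\Pr(e\in\gamma_{xy} \text{ at step } k), \qquad (2)$$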

for some constant $C_1 < \infty$. We derive a further upper bound for (2) as follows. Assume that $x, y \in T_n$ are independent uniform random n-leaf trees and $\{r_1, r_2, \ldots, r_n\}$ is a random permutation of the leaves. We consider a fixed edge $e \in E$ and a fixed $k \in K(e)$. Let v and w be the trees connected by edge e, so that e being on the path $\gamma_{xy}$ at step k means that $x_k = v$ and $x_{k+1} = w$. In this case, there are three independent events that must occur:

(a) The subtree $S_k$ contains the leaves $r_1, r_2, \ldots, r_k$, and the leaf being moved is $r_{k+1}$.
(b) The $\{r_1, r_2, \ldots, r_{k+1}\}$-tree derived from y is the same as the $\{r_1, r_2, \ldots, r_{k+1}\}$-tree derived from w.
(c) The $\{r_1, r_{k+1}, r_{k+2}, \ldots, r_n\}$-tree derived from x is the same as the $\{r_1, r_{k+1}, r_{k+2}, \ldots, r_n\}$-tree derived from v.

Event (a) has probability $1/\binom{n}{k} \times 1/(n-k)$ because $r_1, r_2, \ldots, r_n$ is a random permutation of the leaf labels. Events (b) and (c) have probabilities $1/c_{k+1}$ and $1/c_{n-k+1}$, respectively, by Lemma 2. Thus we have

$$\sum_{x\in T_n}\sum_{y\in T_n} \pi(x)\pi(y)\Pr(e\in\gamma_{xy} \text{ at step } k) \le \frac{1}{\binom{n}{k}(n-k)\,c_{k+1}\,c_{n-k+1}}.$$

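Stirling's formula applied to $c_n = (2n-2)!/\big(2^{n-1}(n-1)!\big)$ shows that for rooted trees $c_n \sim 2^{n-1/2} n^{n-1} e^{-n}$, so that $c_n$ is of order $2^n n^{n-1} e^{-n}$, and it is then straightforward to show that

$$\frac{c_n}{\binom{n}{k}(n-k)\,c_{k+1}\,c_{n-k+1}} \le \frac{C_2}{n},$$

for some constant $C_2 < \infty$. Combining this with (2) yields $\tau_{\mathrm{rel}}(X) \le 32 C_1 C_2 n^{5/2}$, which is the desired result. □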


Fig. 3. (a) An NNI that results in leaf $r_5$ being moved up one level in the tree. In the starting tree, the target node is identified by a large black dot. The sibling and the child $r_5$ are interchanged, yielding the tree on the right. (b) An NNI that results in leaf $r_5$ being moved down one level of the tree. The left child and the sibling ($r_5$) of the target are interchanged to yield the tree on the right.

3.1. The NNI chain

An argument similar to that given above can be used to derive an upper bound on the relaxation time of the Markov chain $\{Y_t\}_{t\ge 0}$, which we give in the following theorem.

Theorem 2. Let $\tau_{\mathrm{rel}}(Y)$ be the relaxation time of $\{Y_t\}_{t\ge 0}$. There exists a finite constant $M_2 < \infty$ such that $\tau_{\mathrm{rel}}(Y) \le M_2 n^4$.

Proof. We construct a path from $x \in T_n$ to $y \in T_n$, where each step of the path is completed by performing an NNI transition on the tree that results from the previous step. We develop the path by starting with the SPR path described above and splitting each SPR transition into a series of NNI steps, thus constructing an NNI path. Lemma 4 can then be used to develop an upper bound on the relaxation time of $\{Y_t\}_{t\ge 0}$.

Construction of the NNI path. Consider an SPR transition that converts tree $v \in T_n$ to tree $w \in T_n$ by pruning and regrafting a leaf l. We decompose this move into a sequence of NNI transitions in which leaf l is moved ‘‘up’’ the tree one level at a time and/or moved ‘‘down’’ the tree until w is reached. Fig. 3 shows how any leaf can be moved up or down one level of the tree by performing an NNI transition. This idea enables us to construct an NNI path $\psi_{xy}$ between any two trees x and y in $T_n$ by first constructing an SPR path $\gamma_{xy}$ as before, and then decomposing each SPR transition into a series of NNI steps. Formally, let $x, y \in T_n$ and let $\gamma_{xy} = (x_1, x_2, \ldots, x_n)$ be the SPR path described above, with $x = x_1$ and $y = x_n$. The corresponding NNI path can be written as
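The ‘‘up’’ move of Fig. 3(a) can be written as a single operation under the assumed `frozenset` representation: the target is the parent t of the leaf, and the sibling of t is interchanged with the leaf. The sketch below (hypothetical helper names) applies it repeatedly until the leaf becomes a child of the root, and a clade-based check confirms that each step replaces exactly one clade, as a rooted NNI should. The ‘‘down’’ move of Fig. 3(b) would be written analogously and is omitted.

```python
# Assumed representation: int leaves, frozenset internal nodes (hypothetical helpers).


def depth(tree, leaf, d=0):
    """Number of edges from the root down to `leaf`, or None if it is absent."""
    if tree == leaf:
        return d
    if not isinstance(tree, frozenset):
        return None
    for child in tree:
        found = depth(child, leaf, d + 1)
        if found is not None:
            return found
    return None


def move_up(tree, leaf):
    """NNI of Fig. 3(a): the target is the parent t of `leaf`, and the sibling
    of t is interchanged with `leaf`. Returns None when the parent of `leaf`
    is the root, since no such NNI exists."""
    if not isinstance(tree, frozenset):
        return None
    first, second = tuple(tree)
    for t, u in ((first, second), (second, first)):
        if isinstance(t, frozenset) and leaf in t:     # `tree` is the grandparent
            (b,) = t - {leaf}
            return frozenset({frozenset({u, b}), leaf})
    for child, other in ((first, second), (second, first)):
        moved = move_up(child, leaf)
        if moved is not None:
            return frozenset({moved, other})
    return None


def clades(tree):
    """(leaf set of `tree`, set of leaf sets of its internal nodes)."""
    if not isinstance(tree, frozenset):
        return frozenset({tree}), set()
    below, found = frozenset(), set()
    for child in tree:
        leaves, inner = clades(child)
        below, found = below | leaves, found | inner
    found.add(below)
    return below, found


caterpillar = frozenset({frozenset({frozenset({frozenset({1, 2}), 3}), 4}), 5})
trail = [caterpillar]
while (step := move_up(trail[-1], 1)) is not None:
    trail.append(step)
print([depth(t, 1) for t in trail])   # [4, 3, 2, 1]
```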

$$\psi_{xy} = \big(x_1^{(1)}, x_1^{(2)}, \ldots, x_1^{(n_1-1)}, x_2^{(1)}, x_2^{(2)}, \ldots, x_2^{(n_2-1)}, \ldots, x_k^{(1)}, \ldots, x_k^{(n_k-1)}, x_{k+1}^{(1)}, \ldots, x_{n-1}^{(n_{n-1}-1)}, x_n^{(1)}\big) \qquad (3)$$

where $n_k - 1$ denotes the number of NNI moves needed to convert tree $x_k$ to tree $x_{k+1}$, and $x_1 = x_1^{(1)}, x_2 = x_2^{(1)}, \ldots, x_n = x_n^{(1)}$. The length of such a path is no longer than $n(n-1)$. Also, observe that each intermediate tree $x_k^{(i)}$ contains a subtree $S_k$ having properties (i)–(iii) as in the SPR case.

Define the subpath $\psi_{xy}^{(k)} = \big(x_k^{(1)}, x_k^{(2)}, \ldots, x_k^{(n_k-1)}, x_{k+1}^{(1)}\big)$ and note that this is an NNI path from $x_k$ to $x_{k+1}$. Let A denote the edge set of the underlying graph corresponding to the NNI chain with transition matrix $P_2$. Using Lemma 4 again, the relaxation time of the NNI chain is bounded by

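$$\tau_{\mathrm{rel}}(Y) \le D_1 n^3 c_n \max_{a\in A} \sum_{x\in T_n}\sum_{y\in B} \pi(x)\pi(y)\Pr(a\in\psi_{xy}) \le D_1 n^3 c_n \max_{a\in A} \sum_{x\in T_n}\sum_{y\in B} \pi(x)\pi(y) \sum_k \Pr\big(a\in\psi_{xy}^{(k)}\big)$$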

for some finite constant $D_1 < \infty$ (here $L \le n(n-1)$, $\pi(B) \ge 1/2$, and $Q(a) \ge 1/\big(4(n-2)c_n\big)$ by Lemma 1(ii)). We claim that the rightmost sum above contains only $O(n^{1/2})$ non-zero terms (as in the SPR case). To see this, note that the event ‘‘$a \in \psi_{xy}^{(k)}$’’ can only occur if edge a is on the NNI path that produces edge $e = (x_k, x_{k+1})$ in the SPR path at step k; in other words, the events ‘‘$a \in \psi_{xy}^{(k)}$’’ and ‘‘$e \in \gamma_{xy}$ at step k’’ are equivalent. As previously noted, there are at most $O(n^{1/2})$ possible values of the integer k such that $\Pr(e \in \gamma_{xy} \text{ at step } k) > 0$, and thus it follows that

$$\tau_{\mathrm{rel}}(Y) \le D_1 n^{7/2} c_n \max_{a\in A}\max_k \sum_{x\in T_n}\sum_{y\in T_n} \pi(x)\pi(y)\Pr\big(a\in\psi_{xy}^{(k)}\big). \qquad (4)$$

Also, observe that if $a \in \psi_{xy}^{(k)}$, then $a = (v, w)$ for some $v, w \in T_n$ and the following three events must occur:

(a’) The subtree $S_k$ contains the leaves $r_1, r_2, \ldots, r_k$, and the leaf being moved is $r_{k+1}$.
(b’) The $\{r_1, r_2, \ldots, r_k\}$-tree derived from y is the same as the $\{r_1, r_2, \ldots, r_k\}$-tree derived from w, and the leaf being moved is $r_{k+1}$.
(c’) The $\{r_1, r_{k+2}, r_{k+3}, \ldots, r_n\}$-tree derived from x is the same as the $\{r_1, r_{k+2}, r_{k+3}, \ldots, r_n\}$-tree derived from v, and the leaf being moved is $r_{k+1}$.

Event (a’) has probability $1/\binom{n}{k} \times 1/(n-k)$, because a random permutation of the leaves is used to form the SPR path. Conditional on (a’), events (b’) and (c’) are independent and have probabilities $1/c_k$ and $1/c_{n-k}$, respectively, by Lemma 2, and thus

$$\sum_{x\in T_n}\sum_{y\in T_n} \pi(x)\pi(y)\Pr\big(a\in\psi_{xy}^{(k)}\big) \le \frac{1}{\binom{n}{k}(n-k)\,c_k\,c_{n-k}}.$$

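The desired bound now follows in a similar way as in the proof of Theorem 1: by Stirling's formula, $c_n / \big(\binom{n}{k}(n-k)\,c_k\,c_{n-k}\big) \le C_3 n^{1/2}$ for some constant $C_3 < \infty$ and all $1 \le k \le n-1$, and combining this with (4) gives $\tau_{\mathrm{rel}}(Y) \le D_1 C_3 n^4$. □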
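As with Theorem 1, the growth of the ratio $c_n / \big(\binom{n}{k}(n-k)\,c_k\,c_{n-k}\big)$ can be checked exactly for moderate n. The sketch below (hypothetical function names) prints its maximum over $1 \le k \le n-1$ divided by $\sqrt{n}$; values that stay bounded are consistent with the $n^{1/2}$ factor that turns the $n^{7/2}$ in (4) into $n^4$. This is a numerical check under the convention $c_1 = 1$, not a proof.

```python
import math
from fractions import Fraction


def c(n):
    """c_n = (2n - 3)!!, with c_1 = 1."""
    return math.prod(range(1, 2 * n - 2, 2))


def theorem2_ratio(n, k):
    """c_n / (C(n, k) (n - k) c_k c_{n-k}), exactly."""
    return Fraction(c(n), math.comb(n, k) * (n - k) * c(k) * c(n - k))


for n in (10, 20, 40, 80, 160):
    worst = max(theorem2_ratio(n, k) for k in range(1, n))
    print(n, float(worst) / math.sqrt(n))   # largest term divided by sqrt(n)
```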


4. Conclusions

In this paper, we investigate the convergence of two classes of commonly used Markov chains on rooted phylogenetic tree spaces. We develop an upper bound of $O(n^{5/2})$ on the relaxation time of the SPR chain, as well as an upper bound of $O(n^4)$ on the relaxation time of the NNI chain. We note that while the bounds given here are the best established to date, they are unlikely to be sharp. For example, Aldous has recently conjectured (Aldous, 2012) that the relaxation time for a chain similar to those discussed here is $O(n^{3/2})$. Herbei and Kubatko (2013) have obtained similar results using a simulation-based method. It is important to continue the study of the mixing time for chains of these types, because these results have implications for the design of MCMC algorithms to infer phylogenetic trees given data on the leaves in a Bayesian setting.

References

Aldous, D., 2000. Mixing time for a Markov chain on cladograms. Combinatorics, Probability and Computing 9, 191–204.
Aldous, D., 2012. http://www.stat.berkeley.edu/~aldous/Research/OP/clad-mix.pdf. Date visited 09/20/2012.
Diaconis, P., Stroock, D., 1991. Geometric bounds for eigenvalues of Markov chains. The Annals of Applied Probability 1, 36–61.
Felsenstein, J., 2004. Inferring Phylogenies, second ed. Sinauer Associates, Inc.
Herbei, R., Kubatko, L., 2013. Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics. Statistical Applications in Genetics and Molecular Biology 12, 39–48.
Jerrum, M., Sinclair, A., 1989. Approximating the permanent. SIAM Journal on Computing 18, 1149–1178.
Karlin, S., Taylor, H., 1975. A First Course in Stochastic Processes, second ed. Academic Press, Inc.
Levin, D.A., Peres, Y., Wilmer, E.L., 2009. Markov Chains and Mixing Times. American Mathematical Society.
Montenegro, R., Tetali, P., 2006. Mathematical aspects of mixing times in Markov chains. Foundations and Trends in Theoretical Computer Science 1 (3), 237–354.
Randall, D., Tetali, P., 1999. Analyzing Glauber dynamics by comparison of Markov chains. Journal of Mathematical Physics 41, 1598–1615.
Schweinsberg, J., 2002. An O(n^2) bound for the relaxation time of a Markov chain on cladograms. Random Structures & Algorithms 20, 59–70.










