
CUTOFF ON ALL RAMANUJAN GRAPHS

EYAL LUBETZKY AND YUVAL PERES

Abstract. We show that on every Ramanujan graph G, the simple

random walk exhibits cutoff: when G has n vertices and degree d, the

total-variation distance of the walk from the uniform distribution at

time $t = \frac{d}{d-2}\log_{d-1} n + s\sqrt{\log n}$ is asymptotically $P(Z > c\,s)$ where

Z is a standard normal variable and c = c(d) is an explicit constant.

Furthermore, for all 1 ≤ p ≤ ∞, d-regular Ramanujan graphs minimize

the asymptotic Lp-mixing time for SRW among all d-regular graphs.

Our proof also shows that, for every vertex x in G as above, its distance

from $n - o(n)$ of the vertices is asymptotically $\log_{d-1} n$.

1. Introduction

A family of d-regular graphs Gn with d ≥ 3 fixed is called an expander,

following the works of Alon and Milman [4,6] from the 1980’s, if all nontrivial

eigenvalues of the adjacency matrices are uniformly bounded away from d.

Lubotzky, Phillips, and Sarnak [25] defined a connected d-regular graph G

with d ≥ 3 to be Ramanujan iff every eigenvalue λ of its adjacency matrix

is either $\pm d$ or satisfies $|\lambda| \le 2\sqrt{d-1}$. Such expanders, which in light of

the Alon–Boppana Theorem [29] have an asymptotically optimal spectral

gap, were first constructed, using deep number theoretic tools, in [25] and

independently by Margulis [28] (see also [13, 24] and Fig. 1). Due to their

remarkable expansion properties, Ramanujan graphs have found numerous

applications (cf. [19] and the references therein). However, after 25 years of

study, the geometry of these objects is still mysterious, and in particular,

determining the profile of distances between vertices in such a graph and

the precise mixing time of simple random walk (SRW) remained open.

Formally, letting ‖µ − ν‖tv = supA[µ(A) − ν(A)] denote total-variation

distance, the (L1) mixing time of a finite Markov chain with transition kernel

P and stationary distribution π is defined as

$t_{\mathrm{mix}}(\varepsilon) = \min\{t : D_{\mathrm{tv}}(t) \le \varepsilon\}$ where $D_{\mathrm{tv}}(t) = \max_x \|P^t(x,\cdot) - \pi\|_{\mathrm{tv}}$.

A sequence of finite ergodic Markov chains is said to exhibit cutoff if its

total-variation distance from stationarity drops abruptly, over a period of

time referred to as the cutoff window, from near 1 to near 0; that is, there

is cutoff iff tmix(ε) = (1 + o(1))tmix(ε′) for any fixed 0 < ε, ε′ < 1.
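The definitions of $D_{\mathrm{tv}}(t)$ and $t_{\mathrm{mix}}(\varepsilon)$ can be made concrete on a tiny example. The following is a minimal sketch, not part of the paper: it uses the Petersen graph (chosen here only because it is a small non-bipartite 3-regular graph), and the helper names `petersen_adjacency`, `srw_step`, `d_tv`, `t_mix` and the cap `t_max` are illustrative assumptions.

```python
def petersen_adjacency():
    """Adjacency sets of the Petersen graph (3-regular, 10 vertices, non-bipartite)."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        edges = (
            (i, (i + 1) % 5),          # outer 5-cycle
            (i, i + 5),                # spoke
            (i + 5, (i + 2) % 5 + 5),  # inner pentagram
        )
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def srw_step(law, adj):
    """Apply one step of simple random walk to a probability vector."""
    new = [0.0] * len(law)
    for v, mass in enumerate(law):
        for w in adj[v]:
            new[w] += mass / len(adj[v])
    return new


def d_tv(adj, t):
    """D_tv(t): worst-case total-variation distance to uniform after t steps."""
    n = len(adj)
    worst = 0.0
    for x in range(n):
        law = [0.0] * n
        law[x] = 1.0
        for _ in range(t):
            law = srw_step(law, adj)
        # For distributions on a finite set, sup_A [mu(A) - nu(A)] = (1/2) * L1 distance.
        worst = max(worst, 0.5 * sum(abs(p - 1.0 / n) for p in law))
    return worst


def t_mix(adj, eps, t_max=10_000):
    """Smallest t with D_tv(t) <= eps, searched up to the illustrative cap t_max."""
    for t in range(t_max + 1):
        if d_tv(adj, t) <= eps:
            return t
    raise ValueError("not mixed within t_max steps")
```

Calling `t_mix(petersen_adjacency(), 0.25)` returns the first time at which the worst-case distance drops below 1/4; on such a small graph there is of course no asymptotic cutoff to observe, only the definition at work.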

Our main result shows that the Ramanujan assumption implies cutoff

with $t_{\mathrm{mix}}(\varepsilon) = \big(\frac{d}{d-2} + o(1)\big)\log_{d-1} n$ and window $O(\sqrt{\log n})$.


Figure 1. A ball of radius 4 in the Lubotzky–Phillips–Sarnak 6-regular Ramanujan graph on n = 12180 vertices via PSL(2,F29).

Theorem 1. On any sequence of d-regular non-bipartite Ramanujan graphs,

SRW exhibits cutoff. More precisely, let G be such a graph on n vertices and

$t_\star(n) := \frac{d}{d-2}\log_{d-1} n$.

Then for every fixed $s \in \mathbb{R}$ and every initial vertex $x$, the SRW satisfies

$D_{\mathrm{tv}}\big(t_\star(n) + s\sqrt{\log_{d-1} n}\big) \to P(Z > c_d\, s)$ as $n \to \infty$, (1.1)

where $Z$ is a standard normal random variable and $c_d = \frac{(d-2)^{3/2}}{2\sqrt{d(d-1)}}$.
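To get a feel for the quantities in Theorem 1, the cutoff location, the constant $c_d$ and the limiting profile $P(Z > c_d s)$ can be evaluated directly. This is only an illustrative sketch; the function names `cutoff_location`, `c_d` and `limiting_profile` are assumptions introduced here, and the Gaussian tail is computed through the standard identity $P(Z > x) = \frac12\mathrm{erfc}(x/\sqrt2)$.

```python
import math


def cutoff_location(n, d):
    """t_star(n) = d/(d-2) * log_{d-1} n, the cutoff location in Theorem 1."""
    return d / (d - 2) * math.log(n, d - 1)


def c_d(d):
    """The constant c_d = (d-2)^{3/2} / (2 sqrt(d (d-1)))."""
    return (d - 2) ** 1.5 / (2 * math.sqrt(d * (d - 1)))


def limiting_profile(s, d):
    """P(Z > c_d * s) for a standard normal Z, i.e. the limit in (1.1)."""
    return 0.5 * math.erfc(c_d(d) * s / math.sqrt(2))
```

For instance, `cutoff_location(12180, 6)` gives the location for the parameters of the graph in Fig. 1, and `limiting_profile(s, 6)` traces the predicted shape of the distance inside the window as `s` varies.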

Consequently, we obtain that the profile of graph distances from every

vertex x in a d-regular Ramanujan graph G concentrates on logd−1 n (the

minimum possible value it can concentrate on in a d-regular graph).

Corollary 2. Let G be a d-regular Ramanujan graph on n vertices. Then

for every vertex x in G,

$\#\{y : |\mathrm{dist}(x,y) - \log_{d-1} n| > 3\log_{d-1}\log n\} = o(n)$,

and furthermore, for all except $o(n)$ vertices $y$ there is a nonbacktracking cycle (see footnote 1) through $x, y$ of length at most $2\log_{d-1} n + 6\log_{d-1}\log n$.

More can be said about high-girth Ramanujan graphs, e.g., the bipartite

LPS expanders whose girth is asymptotically $\frac43\log_{d-1} n$ (see, e.g., [24, §7]).

Corollary 3. Let G be a d-regular Ramanujan graph with n vertices and

girth $g$, and set $R = \lceil \log_{d-1} n + 5\log_{d-1}\log n \rceil$. For every $k \le g - R$ and simple path $(x_i)_{i=1}^k$ in $G$, for all but a $o(1)$-fraction of simple paths $(y_i)_{i=1}^k$ in $G$, there are vertex-disjoint paths of length $R$ from $x_i$ to $y_i$ for all $i$.

Footnote 1. A nonbacktracking cycle is a sequence of adjacent vertices $v_0, \ldots, v_k$ such that $v_0 = v_k$ and $v_i \ne v_{(i+2) \bmod k}$ for all $i$.


Figure 2. Distance of SRW from equilibrium in $L^1$ (blue) and $L^2$ (red, capped at 1). On left, the LPS graph on PSL(2,F29) shown in Fig. 1; on right, asymptotics via Theorem 1 and Proposition 6, with the time axis marked at $\frac{d}{d-2}\log_{d-1} n$ and $\frac12\log_{1/\rho} n$.

Figure 3. The analogue of Fig. 2 for the NBRW (see §1.4), with the time axis marked at $\log_{d-1} n$.

That Corollaries 2–3 also cover bipartite Ramanujan graphs (recently

shown to exist for every degree d ≥ 3 in [27]) follows from an extension of

the proof of Theorem 1 to the bipartite setting (see Corollary 3.9). Moreover,

it extends to the case where the graph G is weakly Ramanujan (see §1.2).

1.1. Background and related work. The cutoff phenomenon was first

identified in pioneering studies of Diaconis, Shahshahani and Aldous [1,2,14]

in the early 1980’s, and while believed to be widespread, rigorous examples

where it was confirmed were scarce. In view of the canonical example where

there is no cutoff—SRW on a cycle—and the fact that a necessary condition

for any reversible Markov chain to have cutoff is for its inverse spectral-gap to

be negligible compared to its mixing time, the second author conjectured [30]

in 2004 that on every transitive expander SRW has cutoff.

Durrett [15, §6] conjectured in 2008 that the random walk should have

cutoff on a uniformly chosen d-regular graph on n vertices (typically a good

expander) with probability tending to 1 as n→∞; indeed this is the case, as

was verified by the first author and Sly [22] in 2010. Subsequently, expanders

without cutoff were constructed in [23], but these were highly asymmetric.

The conjectured behavior of cutoff for all transitive expanders was reiterated

in the latter works (see [22, Conjecture 6.1] and [23, §3]), yet this was not

verified nor refuted on any single example to date.


As a special case, Theorem 1 confirms cutoff on all transitive Ramanujan

graphs—in particular for the Lubotzky–Phillips–Sarnak graphs (see Fig. 2).

The concentration of measure phenomenon in expanders, discovered by

Alon and Milman [6], implies that the distance from a prescribed vertex

is concentrated up to an O(1)-window. Formally, for every sequence of

expander graphs Gn on n vertices and vertex x ∈ V (Gn) there exists a

sequence $m_{n,x}$ and constants $a, C > 0$ so that, for every $r > 0$,

$\#\{y \in V(G_n) : |\mathrm{dist}_{G_n}(x,y) - m_{n,x}| > r\} \le C a^{-r} n$. (1.2)

Corollary 2 shows that $m_{n,x} = \log_{d-1} n + O(\log\log n)$ for Ramanujan graphs.

As for the diameter, Alon and Milman ([6, Theorem 2.7]) showed that

$\mathrm{diam}(G) \le 2\sqrt{2d/(d-\lambda)}\,\log_2 n$ for every d-regular graph G on n vertices where all nontrivial eigenvalues are at most λ in absolute value. This bound was improved to $\lceil \log_{d/\lambda}(n-1) \rceil$ by Chung [11, Theorem 1], and then to $\big\lfloor \frac{\cosh^{-1}(n-1)}{\cosh^{-1}(d/\lambda)} \big\rfloor + 1$ in [12] using properties of $T_k(x)$, the Chebyshev polynomials of the first kind. Since $\cosh(\frac12\log(d-1)) = d/(2\sqrt{d-1})$ for any d, this bound

translates to 2 logd−1 n+O(1) for Ramanujan graphs, and remains the best

known upper bound on the diameter of the LPS expanders (for which this

was proved directly in [25] via the polynomials Tk(x) as later used in [12]).

Corollary 2 implies this asymptotically for every Ramanujan graph: as the

distance from any vertex x to most of the vertices is (1 + o(1)) logd−1 n,

the distance between any two vertices x, y is at most (2 + o(1)) logd−1 n.

Moreover, one can deduce that for every two vertices x, y and every integer $\ell \ge (2 + o(1))\log_{d-1} n$, there exists a path of length exactly $\ell$ between x, y.

A new impetus for understanding distances in Ramanujan graphs is due

to their role as building blocks in quantum computing; see the influential

letter by P. Sarnak [32]. Some of Sarnak’s ideas were developed further

by his student N.T. Sardari in an insightful paper [31] posted to the arXiv

a few months after the initial posting of the present paper. For a certain

infinite family of (p + 1)-regular n-vertex Ramanujan graphs, Sardari [31]

shows that the diameter is at least $\lfloor \frac43\log_p(n) \rfloor$ and also gives an alternative

proof of the first part of Corollary 2.

1.2. Extensions. A sequence of connected d-regular graphs (d ≥ 3 fixed)

$G_n$ on n vertices is called weakly Ramanujan if, for some $\delta_n = o(1)$ as $n \to \infty$, every eigenvalue λ of $G_n$ is either $\pm d$ or has $|\lambda| \le 2\sqrt{d-1} + \delta_n$.

Theorem 4. On any sequence of d-regular non-bipartite weakly Ramanujan

graphs, SRW exhibits cutoff. More precisely, if Gn is such a graph on n

vertices then for every initial vertex x, the SRW has

$t_{\mathrm{mix}}(\varepsilon) = \big(\frac{d}{d-2} + o(1)\big)\log_{d-1} n$ for every fixed $0 < \varepsilon < 1$.


Corollary 5. Let Gn be a d-regular weakly Ramanujan sequence of graphs

on n vertices. Then for every vertex x in Gn,

$\#\big\{y : \big|\frac{\mathrm{dist}(x,y)}{\log_{d-1} n} - 1\big| > \varepsilon\big\} = o(n)$ for every fixed $\varepsilon > 0$.

Remark. The weakly Ramanujan hypothesis in Theorem 4 and Corollary 5

may be relaxed to allow some exceptional eigenvalues; for instance, we can

allow $n^{o(1)}$ eigenvalues λ to only satisfy $|\lambda| < d - \varepsilon'$ for some $\varepsilon' > 0$ fixed.

By the result of Friedman [17] that a random (uniformly chosen) d-regular

graph on n vertices is typically weakly Ramanujan (as conjectured by Alon),

Theorem 4 then implies cutoff, re-deriving the above mentioned result of [22].

More generally, for two graphs F and G, a covering map $\phi : V(G) \to V(F)$

is a graph homomorphism that, for every x ∈ V (G), induces a bijection

between the edges incident to x and those incident to φ(x). If such a map

exists, we say G is a lift (or a cover) of F ; a random n-lift of F is a uniformly

chosen lift out of all those with cover number n (i.e., $|\phi^{-1}(x)| = n$ for all x).

Friedman and Kohler [18] recently proved (see also [9, Corollary 20] by

Bordenave) that for every fixed d-regular base graph F and δ > 0, if G is

a random n-lift of F then typically all of its “new” eigenvalues (those not

inherited from F via pullback) are at most $2\sqrt{d-1} + \delta$. By the remark

above, Theorem 4 and Corollary 5 apply here (for any fixed regular F ).

1.3. Cutoff in Lp-distance. Theorem 1 showed that Ramanujan graphs

have an optimal tmix for SRW: the total-variation distance from (1.1) matches

a lower bound valid for every d-regular graph on n vertices (Fact 2.1 in §2).

It turns out that Ramanujan graphs are extremal for Lp-mixing for all p ≥ 1.

For 1 ≤ p ≤ ∞, the Lp-mixing time of a Markov chain with transition

kernel P from its stationary distribution π is defined as

$t^{(L^p)}_{\mathrm{mix}}(\varepsilon) = \min\{t : D_p(t) \le \varepsilon\}$ where $D_p(t) = \max_x \big\|\frac{P^t(x,\cdot)}{\pi} - 1\big\|_{L^p(\pi)}$

(note that p = 1 measures total-variation mixing since D1(t) = 2Dtv(t),

whereas the L2-distance D2(t) is also known as the chi-square distance).

Chen and Saloff-Coste [10, Theorem 1.5] showed that a lazy random walk

on a family of expander graphs exhibits Lp-cutoff, at some unknown location,

for all p ∈ (1,∞]. (On the notable exception of p = 1, it is said in [10] that

there “the question is more subtle and no good general answer is known.”)

The following theorem gives a lower bound on $t^{(L^p)}_{\mathrm{mix}}$ for SRW on a d-regular

graph, asymptotically achieved by Ramanujan graphs for all p ∈ (1,∞].


Figure 4. Normalized eigenvalues of Ramanujan graphs on $n \approx 10^4$ vertices: the 6-regular LPS expanders on PSL(2,Fq) for q = 29 (front; every nontrivial eigenvalue has multiplicity at least $\frac{q-1}{2}$) and a 1000-lift of the 3-regular Petersen graph (back).

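Proposition 6. Fix $d \ge 3$ and let $\rho = 2\sqrt{d-1}/d$. Then for all connected d-regular graphs G on n vertices and every fixed ε > 0, the SRW satisfies

$t^{(L^p)}_{\mathrm{mix}}(\varepsilon) \ge \begin{cases} c_{d,p}\log_{d-1} n - O(\log\log n) & \text{if } p \in (1,2] \\ \frac{p-1}{p}\log_{1/\rho} n - O(\log\log n) & \text{if } p \in [2,\infty] \end{cases}$ , (1.3)

where $c_{d,p} = \big[2\beta - 1 + \frac{p}{p-1}H_{d-1}\big(\beta \,\|\, \frac{d-1}{d}\big)\big]^{-1}$ for $\beta = [1 + (d-1)^{(p-2)/p}]^{-1}$ and $H_b(\beta \,\|\, \alpha) = \beta\log_b\big(\frac{\beta}{\alpha}\big) + (1-\beta)\log_b\big(\frac{1-\beta}{1-\alpha}\big)$ is the relative entropy function.

Furthermore, if G is non-bipartite Ramanujan then, with the same notation,

$t^{(L^p)}_{\mathrm{mix}}(\varepsilon) = \begin{cases} c_{d,p}\log_{d-1} n + O(\log\log n) & \text{if } p \in (1,2] \\ \frac{p-1}{p}\log_{1/\rho} n + O(\log\log n) & \text{if } p \in [2,\infty] \end{cases}$ . (1.4)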

1.4. Method of proof. The natural route to exploit spectral details on

the transition kernel P for an upper bound on the L1-distance from the

stationary distribution π is via the L2-distance (see, e.g., [19, Theorem 3.2]).

However, this fails to give the sought bound $\frac{d+o(1)}{d-2}\log_{d-1} n$ for the SRW, as we see from Proposition 6 that the SRW on Ramanujan graphs exhibits an $L^2$-cutoff at $\frac12\log_{1/\rho} n > (1+\eta)\frac{d}{d-2}\log_{d-1} n$ for some $\eta(d) > 0$ (see (2.7)).

To remedy this, we turn to the nonbacktracking random walk (NBRW),

which moves from the directed edge (x, y) to a uniformly chosen edge (y, z)

such that $z \ne x$. In recent years, delicate spectral information on random

graphs has been extracted by counting nonbacktracking paths; notably, this

was essential in the proofs that random d-regular graphs and random lifts

are weakly Ramanujan [9,17,18]. Here we follow the reverse route, and use

spectral information on the graph to control the nonbacktracking paths.


Figure 5. Eigenvalues of the nonbacktracking operator B of the graphs from Fig. 4 (with the LPS expander on the left). Colors of chords between pairs of eigenvalues θ, θ′ depict the inner product of their corresponding eigenvectors w, w′ (blue near 1).

The known relation between the spectrum of G and the spectrum of

the nonbacktracking operator B implies that if G is Ramanujan, each of

its nontrivial eigenvalues λ is mapped to eigenvalues $\theta, \theta' \in \mathbb{C}$ of B with modulus $\sqrt{d-1}$ (see Fig. 4–5 showing this effect for two Ramanujan graphs with drastically different spectral features). For intuition, note that, had the operator B been self-adjoint and transitive (it is neither), we would have gotten that the $L^2$-distance at time t is $O(\sqrt{n}\,(d-1)^{-t/2})$, implying the correct upper bound of $(1+o(1))\log_{d-1} n$ for the NBRW.

Fortunately, it turns out that, while not a normal operator, B is unitarily

similar to a matrix Λ that is block-diagonal with n− 1 non-singleton blocks

(n− 2 if G is bipartite), each of which has size 2× 2 (despite potential high

multiplicities in the eigenvalues of G) and corresponds to an eigenvector

pair w, w′ with matching eigenvalues θ, θ′. This description of B appears in

Proposition 3.1 and may be of independent interest.

1.5. Organization. The rest of this paper is organized as follows. Section 2

describes the reduction of L1-mixing for the SRW to that of the NBRW and

establishes the optimality of the Lp-cutoff of SRW on Ramanujan graphs for

all p > 1 (Proposition 6). Section 3 studies the NBRW, beginning in §3.1

with the aforementioned spectral decomposition and its properties (an exact

computation of the off-diagonal entries is deferred to Proposition 4.1 in §4).

In §3.2 we give the proof of the non-bipartite case, which implies Theorem 1

and Corollary 2; and §3.3 includes the proofs of the extensions to bipartite

and weakly Ramanujan graphs, which imply Theorem 4 and Corollary 5.


2. Simple random walk

2.1. Reduction to NBRW. As described in [22] (see §2.3 and §5.2 there),

cutoff for SRW can be reduced to cutoff for the NBRW as follows. Let

$G = (V,E)$ be a d-regular graph and let $\mathbb{T}_d$ be the infinite regular tree rooted at ξ, the universal cover of G. For a given vertex $x \in V$, consider a cover map $\phi : \mathbb{T}_d \to V$ with $\phi(\xi) = x$, and observe that if $(\mathcal{X}_t)$ is SRW on $\mathbb{T}_d$ started at ξ, then $X_t = \phi(\mathcal{X}_t)$ is SRW on G started at x. (This was also used in the proof of the Alon–Boppana Theorem given in [25, Proposition 4.2].) Similarly, if $\mathcal{Y}_t$ is NBRW on $\mathbb{T}_d$ started at $(\xi,\sigma) \in \vec{E}(\mathbb{T}_d)$, and we write $\mathcal{Y}_t = (\mathcal{Y}'_t, \mathcal{Y}''_t)$ to denote its endpoint vertices, then $Y_t = (Y'_t, Y''_t)$ given by $Y'_t = \phi(\mathcal{Y}'_t)$ and $Y''_t = \phi(\mathcal{Y}''_t)$ is NBRW on G started at $(x, \phi(\sigma))$. By symmetry, if

$E_{t,\ell} := \{\mathrm{dist}(\xi, \mathcal{X}_t) = \ell\}$,

then the conditional distribution of $\mathcal{X}_t$ given $E_{t,\ell}$ is uniform over the vertices at distance $\ell$ from ξ in $\mathbb{T}_d$. Therefore,

$P_x(X_t \in \cdot \mid E_{t,\ell}) = \frac{1}{d}\sum_{\sigma : \xi\sigma \in E(\mathbb{T}_d)} P_{(x,\phi(\sigma))}\big(Y''_{\ell-1} \in \cdot\big)$.

As a projection can only decrease total-variation distance, letting $\ell = t_{\mathrm{mix}}(\varepsilon)$ for the NBRW on G and π be the uniform distribution over V(G), we get

$\|P_x(X_t \in \cdot) - \pi\|_{\mathrm{tv}} \le \varepsilon + P(\mathrm{dist}(\xi, \mathcal{X}_t) < \ell)$,

and in particular, taking a maximum over x shows that the SRW on G has

$D_{\mathrm{tv}}(t) \le \varepsilon + P(\mathrm{dist}(\xi, \mathcal{X}_t) < \ell)$. (2.1)

Finally, since SRW on $\mathbb{T}_d$ ($d \ge 3$) is transient, $\mathcal{X}_t$ returns to ξ only a finite number of times almost surely. If $\mathcal{X}_t \ne \xi$ then $\mathrm{dist}(\mathcal{X}_{t+1}, \xi) - \mathrm{dist}(\mathcal{X}_t, \xi)$ is equal to −1 with probability 1/d and +1 otherwise. Therefore, by the CLT,

$\frac{\mathrm{dist}(\mathcal{X}_t, \xi) - ((d-2)/d)\,t}{(2\sqrt{d-1}/d)\sqrt{t}} \Rightarrow N(0,1)$. (2.2)
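The centering and scaling in (2.2) can be checked against the exact law of the distance from the root, which is a birth-and-death chain on the nonnegative integers. The sketch below is illustrative only (the names `distance_law` and `standardized_moments` are assumptions, not notation from the paper): it propagates the law by dynamic programming and compares the mean and variance with $((d-2)/d)\,t$ and $(2\sqrt{d-1}/d)^2\,t$.

```python
import math


def distance_law(t, d):
    """Exact law of dist(X_t, root) for SRW on the infinite d-regular tree.

    From 0 the walk moves to 1; from k > 0 it moves to k - 1 with
    probability 1/d and to k + 1 with probability (d - 1)/d.
    """
    law = [1.0] + [0.0] * (t + 1)
    for _ in range(t):
        new = [0.0] * (t + 2)
        for k, mass in enumerate(law):
            if mass == 0.0:
                continue
            if k == 0:
                new[1] += mass
            else:
                new[k - 1] += mass / d
                new[k + 1] += mass * (d - 1) / d
        law = new
    return law


def standardized_moments(t, d):
    """Return (standardized mean offset, variance ratio) relative to (2.2)."""
    law = distance_law(t, d)
    mean = sum(k * p for k, p in enumerate(law))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(law))
    drift = (d - 2) / d
    sigma = 2 * math.sqrt(d - 1) / d
    return (mean - drift * t) / (sigma * math.sqrt(t)), var / (sigma ** 2 * t)
```

For large `t` the first returned value should be close to 0 and the second close to 1, reflecting that the finitely many visits to the root only perturb the biased walk by a bounded amount.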

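Thus, if $\ell \to \infty$ then by (2.1), for every fixed $s \ge 0$ the SRW on G satisfies

$\limsup_{n\to\infty} D_{\mathrm{tv}}\big(\tfrac{d}{d-2}\ell + s\sqrt{\ell}\big) \le \varepsilon + P(Z > c_d\, s)$, (2.3)

where $Z$ is a standard normal random variable and $c_d = \frac{(d-2)^{3/2}}{2\sqrt{d(d-1)}}$.

Conversely, the number of vertices at distance $\ell$ from a given vertex $x \in V$ is at most $d(d-1)^{\ell}$. So, on the event that the walk on the cover tree has distance less than $\log_{d-1}(\varepsilon n/d)$ from ξ at time t, the SRW $X_t$ is confined to a set of at most $\varepsilon n$ vertices of G, thus its total-variation distance from π is at least $1 - \varepsilon$. Altogether, (2.2) implies the following.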


Figure 6. The $L^p$-distance (p ≥ 1) from equilibrium of SRW on the LPS graph on PSL(2,F29) shown in Fig. 1, highlighting p = 1, 2, ∞.

Fact 2.1. For every d-regular graph G on n vertices with d ≥ 3 fixed, and every fixed s, ε > 0, the SRW on G satisfies

$\liminf_{n\to\infty} D_{\mathrm{tv}}\big(t - s\sqrt{\log_{d-1} n}\big) \ge 1 - \varepsilon - P(Z > c_d\, s)$ (2.4)

at $t = \frac{d}{d-2}\log_{d-1}(\varepsilon n/d)$, where $c_d = \frac{(d-2)^{3/2}}{2\sqrt{d(d-1)}}$ and $Z \sim N(0,1)$.

Comparing the two bounds (2.3)–(2.4) with the desired estimate (1.1) for

the SRW in Theorem 1, we see that the latter will follow if we show that the

NBRW has cutoff at time $\log_{d-1} n + o(\sqrt{\log n})$ with window $o(\sqrt{\log n})$. This

will be achieved in §3 via a spectral analysis of the nonbacktracking walk.

2.2. Optimal Lp-mixing on Ramanujan graphs. We begin with the

special case p = 2 of Proposition 6.

Lemma 2.2. Fix $d \ge 3$ and let $\rho = 2\sqrt{d-1}/d$. For every fixed ε > 0 and every connected d-regular graph G on n vertices, the SRW satisfies

$t^{(L^2)}_{\mathrm{mix}}(\varepsilon) \ge \frac12\log_{1/\rho} n - O(\log\log n)$.

Moreover, if G is non-bipartite Ramanujan then this is tight: SRW has an $L^2$-cutoff at $\frac12\log_{1/\rho} n > (1+\eta)\frac{d}{d-2}\log_{d-1} n$ for some constant $\eta = \eta(d) > 0$.

Proof of Lemma 2.2. Let $P^t$ be the t-step transition kernel of SRW, and let π be the uniform distribution on V(G). For any $x \in V(G)$,

$\sum_y P^t(x,y)^2 = P^{2t}(x,x) \ge Q^{2t}(\xi,\xi)$,

where $Q^t$ is the t-step transition kernel of SRW on $\mathbb{T}_d$, the infinite d-regular tree rooted at ξ; indeed, as argued above, if $\mathcal{X}_t$ is SRW on the cover tree $\mathbb{T}_d$ then $X_t = \phi(\mathcal{X}_t)$ is SRW on G, where φ is the cover map, and in particular a return to the root in the former implies a return to the origin in the latter.

The probability $Q^{2t}(\xi,\xi)$ is nothing but the probability of a $\frac1d$-biased walk, reflected at 0, to be 0 at time 2t, well-known (cf. [35, §5, p128]) to be

Q2t(ξ, ξ) =2ρ2

1− ρρ2t

t 22t−2

(2t− 2

t− 1

)∼ 2ρ2

1− ρ2ρ2t√πt3

for ρ =2√d− 1

d.

(2.5)

In particular, using the standard expansion of the $L^2$-distance,

$\sum_x \Big(\frac{\mu(x)}{\pi(x)} - 1\Big)^2 \pi(x) = \sum_x \frac{\mu^2(x)}{\pi(x)} - 1$, (2.6)

which holds for every probability distribution μ, we have

$\Big\|\frac{P^t(x,\cdot)}{\pi} - 1\Big\|^2_{L^2(\pi)} \ge c_d\, n\rho^{2t} t^{-3/2} - 1$

for $c_d = 2\rho^2[(1-\rho^2)\sqrt{\pi}]^{-1}$. Consequently, from any initial $x \in V$ we have

$t^{(L^2)}_{\mathrm{mix}}(x,\varepsilon) \ge \frac{\log(n/\varepsilon)}{2\log(1/\rho)} - O(\log\log n)$,

where $t^{(L^2)}_{\mathrm{mix}}(x,\varepsilon)$ is the first t where $\|P^t(x,\cdot)/\pi - 1\|_{L^2(\pi)}$ becomes at most ε.

We next argue that

$\frac12\log_{1/\rho} n > \frac{d}{d-2}\log_{d-1} n$ for every real $d \in (2,\infty)$ and $n \ge 2$. (2.7)

Indeed, (2.7) is equivalent to having $\frac{d-2}{d}\log(d-1) > 2\log\big(\frac{d}{2\sqrt{d-1}}\big)$ for all real $d \in (2,\infty)$, which, in turn, immediately follows from the fact that

$f(d) := \frac{d-2}{d}\log(d-1) - 2\log\Big(\frac{d}{2\sqrt{d-1}}\Big)$

has $f'(d) = 2d^{-2}\log(d-1)$, so $f(2) = 0$ whereas $f'(d) > 0$ for all $d > 2$.
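The inequality (2.7) and the derivative formula for $f$ are easy to sanity-check numerically. The following sketch is only a check, not a proof; the names `f`, `l2_location` and `l1_location` are introduced here for illustration, and natural logarithms are used throughout.

```python
import math


def f(d):
    """f(d) = ((d-2)/d) log(d-1) - 2 log(d / (2 sqrt(d-1))), as in the proof of (2.7)."""
    return (d - 2) / d * math.log(d - 1) - 2 * math.log(d / (2 * math.sqrt(d - 1)))


def l2_location(n, d):
    """(1/2) log_{1/rho} n with rho = 2 sqrt(d-1) / d."""
    rho = 2 * math.sqrt(d - 1) / d
    return 0.5 * math.log(n) / math.log(1 / rho)


def l1_location(n, d):
    """(d/(d-2)) log_{d-1} n."""
    return d / (d - 2) * math.log(n) / math.log(d - 1)
```

Evaluating `f` on a range of degrees, and comparing `l2_location` with `l1_location` for a fixed `n`, shows the strict gap between the $L^2$ and $L^1$ locations; a central finite difference of `f` can be compared with $2d^{-2}\log(d-1)$.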

Finally, when G is Ramanujan, the sought upper bound on the $L^2$-distance follows from considering the spectral representation (see, e.g., [3])

$\|P^t(x,\cdot)/\pi - 1\|^2_{L^2(\pi)} = n\sum_{i=2}^{n} |f_i(x)|^2 (\lambda_i/d)^{2t}$ (2.8)

for $\{f_i\}_{i=1}^n$ an orthonormal basis of eigenfunctions with eigenvalues $\{\lambda_i\}_{i=1}^n$ of the adjacency matrix and $\lambda_1 = d$, and plugging in $|\lambda_i| \le 2\sqrt{d-1}$. □

Remark 2.3. A different perspective on Lemma 2.2 is given by the next

proof of a slightly weaker statement. By the generalization by Serre [33]

(see [13, Theorem 1.4.9]) of the Alon–Boppana Theorem [29], for every ε > 0

there exists cε,d > 0 such that G has at least cε,d n eigenvalues λ with


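$|\lambda| > 2\sqrt{d-1} - \varepsilon$. Applying this fact for some ε(d) > 0 to be specified later, since $\frac1n\sum_x \big\|\frac{P^t(x,\cdot)}{\pi} - 1\big\|^2_{L^2(\pi)} = \sum_{i=2}^n (\lambda_i/d)^{2t}$ where $\lambda_2, \ldots, \lambda_n$ are the nontrivial eigenvalues of G (this follows from (2.8) since an average over x allows one to replace $\sum_x |f_i(x)|^2$ by 1 for each i), we deduce that

$\max_x \Big\|\frac{P^t(x,\cdot)}{\pi} - 1\Big\|^2_{L^2(\pi)} \ge \frac1n\sum_x \Big\|\frac{P^t(x,\cdot)}{\pi} - 1\Big\|^2_{L^2(\pi)} \ge c_{\varepsilon,d}\, n\Big(\frac{2\sqrt{d-1} - \varepsilon}{d}\Big)^{2t}$.

Consequently,

$t^{(L^2)}_{\mathrm{mix}}(\delta) \ge \frac{\log(n/\delta)}{2\log\big(\frac{d}{2\sqrt{d-1} - \varepsilon}\big)} - O(1)$. (2.9)

The proof now follows from (2.7) as we may choose ε(d), η(d) > 0 so that the right-hand side of (2.9) would be at least $(1 + \eta - o(1))\frac{d}{d-2}\log_{d-1} n$, as needed.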

For the general case of p ∈ [1,∞], we need the following simple claims.

Claim 2.4. Let G be a d-regular graph on n vertices, and let $\mathbb{T}_d$ be the infinite d-regular tree rooted at ξ. For every $1 \le p < \infty$, SRW on G satisfies

$\|P^t(x,\cdot)/\pi - 1\|_{L^p(\pi)} \ge n^{(p-1)/p}\|Q^t(\xi,\cdot)\|_p - 1$

for all x, t, where P and Q are the transition kernels of SRW on G and $\mathbb{T}_d$.

Proof. By the triangle inequality w.r.t. $\|\cdot\|_{L^p(\pi)}$,

$\|P^t(x,\cdot)/\pi - 1\|_{L^p(\pi)} \ge n^{(p-1)/p}\|P^t(x,\cdot)\|_p - 1$.

Since $P^t(x,y) = \sum_{\eta \in \phi^{-1}(y)} Q^t(\xi,\eta)$ for every cover map $\phi : V(\mathbb{T}_d) \to V(G)$ with $\phi(\xi) = x$, using the fact $\big(\sum_{i=1}^k a_i\big)^p \ge \sum_{i=1}^k a_i^p$ for every $a_1, \ldots, a_k > 0$ and $p \ge 1$ gives

$\big(P^t(x,y)\big)^p \ge \sum_{\eta \in \phi^{-1}(y)} \big(Q^t(\xi,\eta)\big)^p$.

Summing over all y gives $\|P^t(x,\cdot)\|_p \ge \|Q^t(\xi,\cdot)\|_p$, as required. □

Claim 2.5. Fix d ≥ 3 and let $\mathbb{T}_d$ be the infinite d-regular tree rooted at ξ. There exist constants $c_1(d), c_2(d) > 0$ such that, for all k and t,

$c_1(d) \le \frac{P_\xi(|X_t| = k)}{\frac{k+1}{t}\, P\big(Z_t = \frac{k+t}{2}\big)} \le c_2(d)$

where $|X_t|$ is the distance of $X_t$ from its origin ξ, and $Z_t \sim \mathrm{Bin}(t, \frac{d-1}{d})$.

Proof. The case k = 0 follows from (2.5), since $P(Z_{2t} = t) = \big(\frac{d-1}{d}\big)^t d^{-t}\binom{2t}{t}$, which is $(\rho/2)^{2t}\binom{2t}{t}$. This extends to all k using the decomposition

$P_\xi(|X_t| = k) = \sum_{\ell=0}^{t-1} P_\xi(|X_\ell| = 0)\, P_\xi\big(\{|X_j| > 0 : 1 \le j \le t-\ell\},\ |X_{t-\ell}| = k\big)$

and the Ballot Theorem (see, e.g., [16, §III.1]). □


Figure 7. $L^p$-cutoff location (normalized by $\log_{d-1} n$) as a function of p ≥ 1 for Ramanujan graphs with degree d = 3, . . . , 12. The functions are $C^1$, but not $C^2$ at p = 2.

For more general local limit theorems on trees, see, e.g., [21].

Proof of Proposition 6. With Claims 2.4 and 2.5 in mind, and using their notation, for every t and p ∈ [1,∞] we have

$\|Q^t(\xi,\cdot)\|_p^p = \sum_{k\ge0} (d-1)^k \big((d-1)^{-k} P_\xi(|X_t| = k)\big)^p \ge \Big(\frac{c_1(d)}{t}\Big)^p \sum_{k\ge0} \big((d-1)^{k(1-p)/p}\, P(Z_t = (k+t)/2)\big)^p$. (2.10)

Writing $\beta t = (k+t)/2$ (so that $k = (2\beta - 1)t$), the large deviation estimate

$P(Z_t = \beta t) \asymp t^{-1/2}\exp\big[-H_e\big(\beta \,\|\, \tfrac{d-1}{d}\big)\,t\big]$ (2.11)

for the binomial variable $Z_t$ thus leads to the following optimization problem:

$\min\Big\{\frac{p-1}{p}(2\beta - 1) + H_{d-1}\big(\beta \,\|\, \tfrac{d-1}{d}\big) : \tfrac12 \le \beta \le 1\Big\}$. (2.12)

(Observe that in fact $\beta \le \frac{d-1}{d}$ since for $\beta > \frac{d-1}{d}$ both terms are increasing.)

Let f(β) denote the objective in (2.12). Then

$f'(\beta) = \frac{2(p-1)}{p} + \log_{d-1}\Big(\frac{\beta}{(1-\beta)(d-1)}\Big)$,

and solving $f'(\beta) = 0$ we get $\frac{1-\beta}{\beta} = (d-1)^{(p-2)/p}$. Since $f''(\beta)$ is positive, it follows that the minimizer of (2.12) is at

$\beta_* = \frac{1}{(d-1)^{(p-2)/p} + 1} \vee \frac12$. (2.13)

(Observe that $\beta_* = 1/2$ iff p ≥ 2, hence the two regimes for the $L^p$-cutoff location as a function of p.) By (2.10)–(2.11), for some $c = c_d > 0$,

$\|Q^t(\xi,\cdot)\|_p \ge c_d\, t^{-3/2} (d-1)^{-f(\beta_*)t}$,

and therefore, by Claim 2.4, for every starting vertex x,

$\Big\|\frac{P^t(x,\cdot)}{\pi} - 1\Big\|_{L^p(\pi)} \ge c_d\, n^{(p-1)/p}\, t^{-3/2} (d-1)^{-f(\beta_*)t} - 1$. (2.14)

This implies (1.3) (and is furthermore valid for every starting vertex x).
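The optimization (2.12) and the resulting cutoff constant can be evaluated numerically. The sketch below is illustrative and uses names introduced here (`rel_entropy`, `objective`, `beta_star`, `lp_cutoff_constant`); it relies on the observation that balancing $n^{(p-1)/p}$ against $(d-1)^{-f(\beta_*)t}$ in (2.14) gives $t \approx \frac{p-1}{p}\,\log_{d-1} n / f(\beta_*)$, which is the same quantity as $c_{d,p}\log_{d-1} n$ in (1.3).

```python
import math


def rel_entropy(beta, alpha, base):
    """H_b(beta || alpha), with the convention 0 * log 0 = 0."""
    total = 0.0
    if beta > 0:
        total += beta * math.log(beta / alpha, base)
    if beta < 1:
        total += (1 - beta) * math.log((1 - beta) / (1 - alpha), base)
    return total


def objective(beta, d, p):
    """The objective f(beta) of the minimization problem (2.12)."""
    return (p - 1) / p * (2 * beta - 1) + rel_entropy(beta, (d - 1) / d, d - 1)


def beta_star(d, p):
    """The minimizer (2.13): the unconstrained critical point, truncated below at 1/2."""
    return max(1.0 / ((d - 1) ** ((p - 2) / p) + 1.0), 0.5)


def lp_cutoff_constant(d, p):
    """L^p location normalized by log_{d-1} n, namely ((p-1)/p) / f(beta_star)."""
    return (p - 1) / p / objective(beta_star(d, p), d, p)
```

At `p = 2` this returns $\frac12\log_{1/\rho}(d-1)$, matching the second regime of (1.3), and as `p` decreases to 1 it approaches $d/(d-2)$, the total-variation location of Theorem 1.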

For matching upper bounds in case G is a Ramanujan graph, first take p ≥ 2. The lower bound established above is $D_p(t) \ge c_d\, n^{(p-1)/p} t^{-3/2}\rho^t - 1$. Recalling Lemma 2.2, for Ramanujan graphs,

$D_2(t) \le \sqrt{n}\,\rho^t$, $D_\infty(t) \le D_2(\lfloor t/2 \rfloor)^2 \le n\rho^t$,

using the well-known fact (a routine application of Cauchy–Schwarz) that

$D_\infty(s+t) \le D_2(t)\, D_2^*(s)$, (2.15)

where $D_2^*(s)$ corresponds to the reversed chain (here $D_2(s) = D_2^*(s)$ as SRW is reversible). So, by the Riesz–Thorin Interpolation Theorem (see, e.g., [34, Theorem 1.3, p. 179]), for 2 ≤ p ≤ ∞, we deduce that $D_p(t) \le n^{(p-1)/p}\rho^t$.

Having established (1.4) for p ≥ 2, now take 1 < p ≤ 2. Let

$P^t(x,\cdot) = \sum_k P_\xi(|X_t| = k)\,\mu_k(x,\cdot)$ (2.16)

where $\mu_k$ is the law of the projection of NBRW on the endpoint of its directed edge, started at a uniform edge originating from x. By Jensen's inequality,

$\big(d(d-1)^{k-1}\big)^{-1/p}\,\|\mu_k(x,\cdot) - \tfrac1n\|_p \le \big(d(d-1)^{k-1}\big)^{-1/2}\,\|\mu_k(x,\cdot) - \tfrac1n\|_2$.

Notice that

$n\big\|\mu_k(x,\cdot) - \tfrac1n\big\|_2^2 = \Big\|\frac{\mu_k(x,\cdot)}{\pi} - 1\Big\|^2_{L^2(\pi)} \le \max_{y : xy \in E(G)} \Big\|\frac{\mu_{k-1}((x,y),\cdot)}{\pi_{\vec{E}}} - 1\Big\|^2_{L^2(\pi_{\vec{E}})}$

where $\mu_k((x,y),\cdot)$ denotes the k-step transition kernel of the NBRW and $\pi_{\vec{E}}$ is its stationary distribution. In our analysis of the NBRW in §3, we will show (see (3.13)) that the right-hand side of the last display is $O(n k^2 (d-1)^{-k})$, whence

$\|\mu_k(x,\cdot) - \tfrac1n\|_p \le c_d\, k\,(d-1)^{k(1-p)/p}$.

Recalling (2.16), it now follows that

$\|P^t(x,\cdot) - \tfrac1n\|_p \le (t+1)\max_{0\le k\le t} P_\xi(|X_t| = k)\,\|\mu_k(x,\cdot) - \tfrac1n\|_p \le c'_d\, t^2 \max_{0\le k\le t} (d-1)^{k(1-p)/p}\, P_\xi(|X_t| = k)$,

which, in view of (2.10), gives rise to the same optimization problem (2.12). Therefore, the right-hand side of the last display is at most $t^C (d-1)^{-f(\beta_*)t}$ for C > 0 fixed. Taking t as in (1.4) with a suitable additive O(log log n) term gives $(d-1)^{f(\beta_*)t} \ge n^{(p-1)/p}\, t^{2C}$. Thus,

$\|P^t(x,\cdot)/\pi - 1\|_{L^p(\pi)} = n^{(p-1)/p}\,\|P^t(x,\cdot) - \tfrac1n\|_p \le t^{-C}$,

establishing (1.4) for all 1 < p ≤ 2. □


3. Nonbacktracking walks

3.1. Spectral decomposition. The spectrum of the nonbacktracking walk

has been thoroughly studied, in part due to the fact that its eigenvalues are

precisely the inverse of the poles of the so-called Ihara Zeta function of

the graph (cf. [8, 20]). Our analysis here, on the other hand, hinges on

the structure of the eigenfunctions, starting with a spectral decomposition

of the nonbacktracking operator; this builds on properties of this operator

that appear implicitly in [20] (see also [5,7,8] as well as [26, Exercise 6.59]).

Proposition 3.1 below gives a more complete picture.

Throughout this section, for a graph G = (V,E), we denote its adjacency

matrix by A = A(G) and let λ1 = d ≥ λ2 ≥ . . . ≥ λn be its eigenvalues.

Denote by $\vec{E}$ the set of $N = 2|E|$ directed edges of G; we refer to undirected edges as $xy \in E$ and to directed ones as $(x,y) \in \vec{E}$ for the sake of clarity. The nonbacktracking walk matrix B is the $(\vec{E} \times \vec{E})$-matrix given by

$B_{(u,v),(x,y)} = 1_{\{v = x,\ u \ne y\}}$ for $(u,v)$ and $(x,y)$ in $\vec{E}$. (3.1)

Though B may not be a normal operator, it can be decomposed as follows.

Proposition 3.1. Let G = (V,E) be a connected d-regular graph (d ≥ 3) on n vertices. Let N = dn and let $\{\lambda_i\}_{i=1}^n$ be the eigenvalues of the adjacency matrix, with $\lambda_1 = d$. Then the operator B from (3.1) is unitarily similar to

$\Lambda = \mathrm{diag}\Big(d-1,\ \begin{pmatrix}\theta_2 & \alpha_2 \\ 0 & \theta'_2\end{pmatrix},\ \ldots,\ \begin{pmatrix}\theta_n & \alpha_n \\ 0 & \theta'_n\end{pmatrix},\ \overbrace{-1, \ldots, -1}^{N/2-n},\ \overbrace{1, \ldots, 1}^{N/2-n+1}\Big)$ (3.2)

where $|\alpha_i| < 2(d-1)$ for all i and $\theta_i, \theta'_i \in \mathbb{C}$ are defined as the solutions to

$\theta^2 - \lambda_i\theta + d - 1 = 0$. (3.3)

Remark 3.2. The exact value of $|\alpha_i|$ is shown in Proposition 4.1 to be $d-2$ for every $|\lambda_i| \le 2\sqrt{d-1}$ and $\sqrt{d^2 - \lambda_i^2}$ for every $2\sqrt{d-1} < |\lambda_i| < d$.

Remark 3.3. We see that every eigenvalue $\theta \ne \pm1$ of B is of the form $\lambda/2 \pm \sqrt{(\lambda/2)^2 - (d-1)}$ for some eigenvalue λ of A (with θ = d−1 matching the principal eigenvalue λ = d). Indeed, this well-known fact follows from Bass's Formula [8], which in the d-regular case is equivalent to the statement that $f_B(\theta) = (1-\theta^2)^{N/2-n} f_A(\theta + (d-1)/\theta)$ for $f_A$ and $f_B$ the characteristic polynomials of A and B, respectively.

(i) λ = d corresponds to θ = d−1, the principal eigenvalue of B matching the eigenvector $w_1 \equiv N^{-1/2}$; the second solution, θ′ = 1, was already accounted for in (3.2). An eigenvalue of λ = −d (when G is bipartite) yields θ = −(d−1) and an extra −1 eigenvalue (N −n+ 1 altogether).

(ii) $2\sqrt{d-1} < |\lambda| < d$ yields two eigenvalues $\theta \ne \theta' \in \mathbb{R}$ of B.

(iii) $|\lambda| < 2\sqrt{d-1}$ yields $\theta = \overline{\theta'} \in \mathbb{C} \setminus \mathbb{R}$ with $|\theta| = \sqrt{d-1}$ (for instance, λ = 0 corresponds to $\theta = i\sqrt{d-1}$ and $\theta' = -i\sqrt{d-1}$).

(iv) $\lambda = \pm2\sqrt{d-1}$ gives a single solution $\theta = \pm\sqrt{d-1}$ with multiplicity 2.
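The case analysis (i)–(iv) amounts to solving the quadratic (3.3) for a given adjacency eigenvalue. A minimal sketch follows; the function name `nb_eigenvalues` is an assumption made for illustration, and complex square roots are taken with Python's `cmath`.

```python
import cmath
import math


def nb_eigenvalues(lam, d):
    """Return the two roots theta, theta' of theta^2 - lam * theta + (d - 1) = 0."""
    disc = cmath.sqrt((lam / 2) ** 2 - (d - 1))
    return lam / 2 + disc, lam / 2 - disc
```

For `d = 6`, calling `nb_eigenvalues(6, 6)` returns the pair 5 and 1 from case (i), while any `lam` strictly inside the Ramanujan range returns a complex-conjugate pair of modulus $\sqrt{5}$, as in case (iii). At the edge of the range, case (iv), the two computed roots agree only up to floating-point error.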

Remark 3.4. For each $\theta \in \mathbb{C}$, define $T_\theta : \ell^2(V) \to \ell^2(\vec{E})$ by

$(T_\theta f)(x,y) := \theta f(y) - f(x)$. (3.4)

Each solution $\theta \ne \pm1$ of equation (3.3), for some λ such that Af = λf, is an eigenvalue of B corresponding to the eigenvector $T_\theta f$; indeed,

$(BT_\theta f)(x,y) = \sum_{z : yz \in E,\ z \ne x} (\theta f(z) - f(y)) = \theta[(Af)(y) - f(x)] - (d-1)f(y) = [\theta\lambda - (d-1)]\,f(y) - \theta f(x) = \theta\,(T_\theta f)(x,y)$;

where the last equality used (3.3) to replace θλ by $\theta^2 + d - 1$; thus, $T_\theta f$ is an eigenfunction of B corresponding to θ as long as $T_\theta f \ne 0$, and clearly $T_\theta f = 0$ only if θ = ±1 (which, in turn, occurs iff λ = ±d).
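The identity $BT_\theta f = \theta\,T_\theta f$ can be verified by hand on a small graph. The sketch below is an illustration only: it uses the complete graph $K_4$ (3-regular, with adjacency eigenvalue −1 on the vector chosen here), and the helper names `complete_graph`, `apply_B`, `T` and `theta_root` are assumptions introduced for this example.

```python
import cmath


def complete_graph(n):
    """Adjacency lists of the complete graph K_n, which is (n-1)-regular."""
    return {v: [w for w in range(n) if w != v] for v in range(n)}


def apply_B(w, adj):
    """(Bw)(x, y) = sum of w(y, z) over neighbours z of y with z != x, as in (3.1)."""
    return {(x, y): sum(w[(y, z)] for z in adj[y] if z != x) for (x, y) in w}


def T(theta, f, adj):
    """(T_theta f)(x, y) = theta * f(y) - f(x) on directed edges, as in (3.4)."""
    return {(x, y): theta * f[y] - f[x] for x in adj for y in adj[x]}


def theta_root(lam, d):
    """One root of theta^2 - lam * theta + (d - 1) = 0, i.e. of equation (3.3)."""
    return (lam + cmath.sqrt(lam ** 2 - 4 * (d - 1))) / 2


adj = complete_graph(4)          # d = 3
f = [1, -1, 0, 0]                # eigenvector of the adjacency matrix with eigenvalue -1
theta = theta_root(-1, 3)        # (-1 + i sqrt(7)) / 2, of modulus sqrt(2)
w = T(theta, f, adj)
Bw = apply_B(w, adj)
residual = max(abs(Bw[e] - theta * w[e]) for e in w)
```

Here `residual` measures how far `w` is from being an eigenvector of B with eigenvalue `theta`; it should vanish up to rounding, and `abs(theta)` equals $\sqrt{d-1}$ because $|\lambda| < 2\sqrt{d-1}$.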

Proof of Proposition 3.1. Observe that $\ell^2(\vec{E}) = \ell^2_+(\vec{E}) \oplus \ell^2_-(\vec{E})$ where

$\ell^2_+(\vec{E}) = \{w : w(x,y) = w(y,x)\}$, $\ell^2_-(\vec{E}) = \{w : w(x,y) = -w(y,x)\}$,

as the term for (x, y) in $\langle w_+, w_- \rangle$ cancels with that of (y, x) if $w_\pm \in \ell^2_\pm(\vec{E})$. With this in mind, the eigenspaces of 1 and −1 in B are straightforward: the star spaces $S_- \subset \ell^2_-(\vec{E})$ and $S_+ \subset \ell^2_+(\vec{E})$ are defined by

$S_\pm = \mathrm{Span}\big(\{s^\pm_x : x \in V\}\big)$, where $s^\pm_x(u,v) = \begin{cases} 1 & u = x, \\ \pm1 & v = x, \\ 0 & \text{otherwise}. \end{cases}$

For every $w \in \ell^2_-(\vec{E})$ and $s^-_x$ as above $\langle w, s^-_x \rangle = 2\sum_{y : xy \in E} w(x,y)$, and so $(Bw)(x,y) = -w(y,x) = w(x,y)$ when in addition $w \perp s^-_y$. Thus,

$Bw = w$ for every $w \in \ell^2_-(\vec{E}) \cap S_-^\perp$, (3.5)

and similarly,

$Bw = -w$ for every $w \in \ell^2_+(\vec{E}) \cap S_+^\perp$. (3.6)

As for the dimension of these spaces, note that if $\{a_x\}_{x \in V}$ is such that $\sum a_x s^-_x = 0$ then $a_x = a_y$ for every xy ∈ E; since G is connected, this implies that $\dim(S_-) = n - 1$, thus B has an orthonormal system of N/2 − (n − 1) eigenvectors with eigenvalue 1. Similarly, if $\sum a_x s^+_x = 0$ then $a_x = -a_y$ for every xy ∈ E, so the eigenspace of −1 has dimension N/2 − (n − 1) if G is bipartite and dimension N/2 − n otherwise.


Having specified these eigenvectors of B as well as those corresponding to $\theta_i, \theta'_i$ in Remark 3.4, we proceed to analyzing their inner products. Observe that after appropriate permutations of its rows and columns, B becomes block diagonal with blocks $J_d - I_d$, where $J_d$ and $I_d$ are the all-one matrix and identity matrix of order d, respectively; thus, B has an inverse, which under the same permutations is block diagonal with blocks $(d-1)^{-1}J_d - I_d$, so the matrix $C := (d-1)B^{-1} + B$ (which, of course, satisfies $Cw = ((d-1)/\theta + \theta)\,w$ for every eigenfunction w of B with eigenvalue θ) is given by

$C_{(u,v),(x,y)} = \begin{cases} 1 & \{v = x,\ u \ne y\} \text{ or } \{y = u,\ v \ne x\}, \\ -(d-2) & (u,v) = (y,x), \\ 0 & \text{otherwise}. \end{cases}$

Thus, C is real symmetric, and $\ell^2_+(\vec{E})$ and $\ell^2_-(\vec{E})$ are invariant under it. Furthermore, if $f \in \ell^2(V)$ and $w_f \in \ell^2(\vec{E})$ is given by $w_f(x,y) := f(y)$ then

$(Cw_f)(x,y) = \sum_{z : yz \in E,\ z \ne x} f(z) + \sum_{v : vx \in E,\ v \ne y} f(x) - (d-2)f(x) = \sum_{z : yz \in E,\ z \ne x} f(z) + f(x) = (Af)(y)$,

and similarly, if $w'_f(x,y) := f(x)$ then $(Cw'_f)(x,y) = (Af)(x)$. Moreover,

$\langle w_f, w_g \rangle = \langle w'_f, w'_g \rangle = d\,\langle f, g \rangle$ and $\langle w_f, w'_g \rangle = \langle f, Ag \rangle$ for $f, g \in \ell^2(V)$.

In particular, the eigenfunctions $(f_i)_{i=1}^n$ correspond in this way to pairwise orthogonal eigenspaces of C with eigenvalues $(\lambda_i)_{i=1}^n$; the dimension of each eigenspace is 1 if $\lambda_i = \pm d$ and 2 otherwise (as before, $w_f$ can be a multiple of $w'_f$ only if w ≡ c or when G is bipartite and w ≡ c on one part and w ≡ −c on the other), and they notably include the eigenfunctions $T_{\theta_i} f_i$ of B.

Of course, every such 2-dimensional eigenspace corresponding to $\lambda_i \ne \pm d$ is orthogonal to the eigenvectors of B from (3.5)–(3.6) (corresponding to the eigenvalues ±1), as those are also eigenvectors of ±d for the self-adjoint C. Finally, the eigenvector w ≡ 1 with the eigenvalue d−1 of B (and eigenvalue d of C) is orthogonal to $\ell^2_-(\vec{E})$ (thus to all eigenvectors from (3.5)), whereas if G is bipartite and we take w ≡ 1 on outgoing edges from a prescribed part of G and w ≡ −1 on the incoming ones (with eigenvalue −(d − 1) of B) then $w \perp \ell^2_+(\vec{E})$, thus it is orthogonal to all eigenvectors from (3.6).

Suppose for now that $A$ has no eigenvalue $\lambda_i$ such that $|\lambda_i| = 2\sqrt{d-1}$. Then there are two distinct solutions to (3.3) for each of the $\lambda_i$'s, and so, in particular, the eigenspace of $C$ corresponding to $\lambda_i \neq \pm d$ has two linearly independent eigenvectors of $B$, corresponding to eigenvalues $\theta_i$ and $\theta'_i$. The orthogonality of the eigenspaces from the discussion above now establishes the form of $\Lambda$ from (3.2).


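When there exist eigenvalues of $A$ such that $|\lambda_i| = 2\sqrt{d-1}$, we have the unique solution $\theta_i = \lambda_i/2$ for (3.3), and claim that this gives rise to a Jordan block $\begin{pmatrix} \lambda_i/2 & 1 \\ 0 & \lambda_i/2 \end{pmatrix}$. Indeed, recalling that $B T_{\theta_i} f_i = \theta_i T_{\theta_i} f_i$, observe that

\[
\begin{aligned}
(B T_{1+\theta_i} f_i)(x,y) &= \big[(1+\theta_i)\lambda_i - (d-1)\big] f_i(y) - (1+\theta_i) f_i(x) \\
&= \theta_i \big[(1+\theta_i) f_i(y) - f_i(x)\big] + \big[\theta_i^2 + \theta_i - (d-1)\big] f_i(y) - f_i(x) \\
&= \theta_i (T_{1+\theta_i} f_i)(x,y) + (T_{\theta_i} f_i)(x,y),
\end{aligned}
\tag{3.7}
\]

where the second equality used $\theta_i = \lambda_i/2$ and the last one used $\theta_i^2 = d-1$. As these both belong to the corresponding eigenspace of $C$, we arrive at (3.2).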

To conclude the proof, it remains to show that $|\alpha_i| < 2(d-1)$ if $\lambda_i \neq \pm d$. Recall that there exist unit vectors $w_i, w'_i$ such that $Bw'_i = \alpha_i w_i + \theta'_i w'_i$ (these can be taken as columns $2i$ and $2i+1$ of $U$ as above). Hence,

\[
(B - \theta'_i I) w'_i = \alpha_i w_i. \tag{3.8}
\]

Let $\|\cdot\|_{2\to 2}$ be the $\ell^2(\vec E) \to \ell^2(\vec E)$ operator norm; we claim $\|B\|_{2\to 2} = d-1$. Indeed, it is easy to verify that

\[
(BB^*)_{(u,v),(x,y)} =
\begin{cases}
d-1 & x = u \text{ and } y = v, \\
d-2 & x \neq u \text{ and } y = v, \\
0 & \text{otherwise}.
\end{cases}
\]

We see that $BB^*$ has $\|BB^*\|_{\infty\to\infty} = (d-1)^2$ and an eigenvalue $(d-1)^2$ corresponding to the eigenvector $w \equiv 1$; thus, $\|B\|_{2\to 2} = d-1$. By (3.8), using $|\theta'_i| < d-1$ and $\|w_i\| = \|w'_i\| = 1$, we infer that $|\alpha_i| \le \|B\|_{2\to 2} + |\theta'_i| < 2(d-1)$, concluding the proof of the proposition. □
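The structure of $BB^*$ used in this norm computation can be checked directly. The sketch below is only an illustration: it assumes the complete bipartite graph $K_{4,4}$ (so $d = 4$), and the helper names are hypothetical. It compares $BB^*$ entrywise with the case formula and confirms that every row sums to $(d-1)^2$, which gives both the $\infty\to\infty$ norm and the eigenvalue for $w \equiv 1$.

```python
# Assumption for illustration: G = K_{4,4}, a 4-regular bipartite graph.
d = 4
adj = {x: ([4, 5, 6, 7] if x < 4 else [0, 1, 2, 3]) for x in range(8)}
edges = [(x, y) for x in sorted(adj) for y in adj[x]]
N = len(edges)

# Nonbacktracking matrix: B[(u,v),(x,y)] = 1 iff v == x and y != u.
B = [[1 if (v == x and y != u) else 0 for (x, y) in edges] for (u, v) in edges]

# B is real, so B* is its transpose and (BB*)[i][j] = <row i of B, row j of B>.
BBt = [[sum(B[i][k] * B[j][k] for k in range(N)) for j in range(N)]
       for i in range(N)]


def expected_entry(u, v, x, y):
    """Case formula for (BB*)_{(u,v),(x,y)} from the text."""
    if y != v:
        return 0
    return d - 1 if x == u else d - 2


entries_match = all(
    BBt[i][j] == expected_entry(*edges[i], *edges[j])
    for i in range(N) for j in range(N)
)
row_sums = {sum(row) for row in BBt}   # all entries are nonnegative
print(entries_match, row_sums)
```

Since the rows of $BB^*$ are nonnegative with common sum $(d-1)^2$, the all-ones vector is an eigenvector for that value and no eigenvalue can exceed the maximal row sum.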

3.2. Cutoff on non-bipartite Ramanujan graphs. On every $d$-regular graph on $n$ vertices, the number of directed edges at distance $\ell$ from a given $(x,y) \in \vec E$ is at most $(d-1)^\ell$; this readily implies (as stated in [22, Claim 4.8]) that the nonbacktracking random walk satisfies

\[
t_{\mathrm{mix}}(1-\varepsilon) \ge \lceil \log_{d-1}(dn) \rceil - \lceil \log_{d-1}(1/\varepsilon) \rceil \quad \text{for any } 0 < \varepsilon < 1. \tag{3.9}
\]

Our goal in this section is to show an asymptotically tight upper bound on

tmix using the spectral decomposition of the nonbacktracking operator B.
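For concreteness, the lower bound (3.9) can be evaluated exactly with integer arithmetic. The following is a small sketch; the function names are hypothetical, and `eps` is taken as a `Fraction` (or any value that `Fraction` converts exactly) to avoid floating-point rounding inside the ceilings.

```python
from fractions import Fraction


def ceil_log(base, value):
    """Smallest integer k >= 0 with base**k >= value (exact, for value >= 1)."""
    k, power = 0, 1
    while power < value:
        power *= base
        k += 1
    return k


def nbrw_mixing_lower_bound(d, n, eps):
    """Right-hand side of (3.9): ceil(log_{d-1}(dn)) - ceil(log_{d-1}(1/eps))."""
    eps = Fraction(eps)
    if d < 3 or not 0 < eps < 1:
        raise ValueError("need d >= 3 and 0 < eps < 1")
    return ceil_log(d - 1, d * n) - ceil_log(d - 1, 1 / eps)


# Example: a cubic graph on 1000 vertices with eps = 1/4.
print(nbrw_mixing_lower_bound(3, 1000, Fraction(1, 4)))
```

The underlying reason is the one stated in the text: after $t$ steps the walk is supported on at most $(d-1)^t$ of the $N = dn$ directed edges.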

Theorem 3.5. Let $G$ be a non-bipartite Ramanujan graph on $n$ vertices with degree $d \ge 3$. Let $\mu_t$ be the $t$-step transition kernel of the NBRW, and let $\pi$ be the uniform distribution on $\vec E$. Then for some fixed $c(d) > 0$,

\[
\max_{(x,y)\in\vec E} \left\| \frac{\mu_t((x,y),\cdot)}{\pi} - 1 \right\|^2_{L^2(\pi)} \le \frac{c(d)}{\log n}
\quad\text{at}\quad
t = \big\lceil \log_{d-1} n + 3\log_{d-1}\log n \big\rceil .
\]

Consequently, on any sequence of such graphs, the NBRW exhibits $L^1$-cutoff and $L^2$-cutoff both at time $\log_{d-1} n$.


Remark 3.6. The constant $c(d)$ in the above theorem can be taken to be $8(d-1)\log^{-2}(d-1) + 1$ for any sufficiently large $n$ (cf. (3.13) below).
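To get a feel for how large $n$ must be before the estimate becomes small, the sketch below evaluates the right-hand side of (3.13), namely $2N(d-1)^{-t}(4(d-1)t^2+1)$ with $N = dn$, at the time $t$ from Theorem 3.5. It is an illustration only; the function names are hypothetical, and the computation goes through logarithms so that very large $n$ does not overflow.

```python
import math


def designated_time(d, n):
    """t = ceil(log_{d-1} n + 3 log_{d-1} log n), as in Theorem 3.5."""
    return math.ceil(math.log(n, d - 1) + 3 * math.log(math.log(n), d - 1))


def l2_bound(d, n, t):
    """Right-hand side of (3.13): 2N (d-1)^{-t} (4(d-1)t^2 + 1) with N = dn."""
    log_prefactor = math.log(2 * d) + math.log(n) - t * math.log(d - 1)
    return math.exp(log_prefactor) * (4 * (d - 1) * t * t + 1)


for n in (10**6, 10**100):
    t = designated_time(3, n)
    bound = l2_bound(3, n, t)
    print(n, t, bound, bound * math.log(n))
```

For $d = 3$ the bound is still above 1 at $n = 10^6$ and only drops below 1 for astronomically large $n$, which is consistent with an $O(1/\log n)$ rate carrying a sizeable constant.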

Proof of Theorem 3.5. Appealing to Proposition 3.1, let $U$ be the unitary matrix such that $B = U\Lambda U^*$ with $\Lambda$ from (3.2), and write

\[
U = \big( w_1 \mid w_2 \mid w'_2 \mid w_3 \mid w'_3 \mid \cdots \mid w_n \mid w'_n \mid u_1 \mid \cdots \mid u_{N-(2n-1)} \big),
\]

in which $w_1 \equiv N^{-1/2}$. Recalling Remark 3.3, observe that the assumption that $G$ is non-bipartite Ramanujan implies that for all $i = 2, \ldots, n$, the solutions $\theta_i, \theta'_i$ to (3.3) satisfy $\theta'_i = \bar\theta_i$ and $|\theta_i| = \sqrt{d-1}$.

Let $(x_0,y_0) \in \vec E$ be some initial edge for the NBRW; by the expansion (2.6) of the $L^2$-distance, the $t$-step transition kernel $\mu_t = (d-1)^{-t}B^t$ satisfies

\[
\left\| \frac{\mu_t((x_0,y_0),\cdot)}{\pi} - 1 \right\|^2_{L^2(\pi)}
= N \sum_{(x,y)} \big|\mu_t((x_0,y_0),(x,y))\big|^2 - 1
= N \big\|\mu_t((x_0,y_0),\cdot)\big\|^2 - 1 . \tag{3.10}
\]

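Using $B = U\Lambda U^*$ with $\Lambda$ from (3.2) and $U$ as specified above we find that

\[
\begin{aligned}
B^t\big((x_0,y_0),\cdot\big) ={}& (d-1)^t w_1(x_0,y_0)\, w_1 + \sum_i (\pm 1)^t u_i(x_0,y_0)\, u_i \\
&+ \sum_{i=2}^n \Big[ \theta_i^t w_i(x_0,y_0)\, w_i + \big( \bar\theta_i^{\,t} w'_i(x_0,y_0) + \gamma_i(t) w_i(x_0,y_0) \big) w'_i \Big],
\end{aligned}
\]

where

\[
\gamma_i(t) := \alpha_i \sum_{j=0}^{t-1} \theta_i^{\,j}\, \bar\theta_i^{\,t-1-j}
\]

with $\alpha_i$ from Proposition 3.1. Note that in particular, as $|\alpha_i| < 2(d-1)$ and $|\bar\theta_i| = |\theta_i|$,

\[
|\gamma_i(t)| \le 2(d-1)\, t\, |\theta_i|^{t-1} . \tag{3.11}
\]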

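From the above expansion of $B^t$, since $U$ is unitary and $w_1 \equiv N^{-1/2}$,

\[
\begin{aligned}
\big\|\mu_t((x_0,y_0),\cdot)\big\|^2 ={}& \frac1N + \sum_i (d-1)^{-2t} |u_i(x_0,y_0)|^2 \\
&+ (d-1)^{-2t} \sum_{i=2}^n \Big( |\theta_i|^{2t} |w_i(x_0,y_0)|^2 + \big| \bar\theta_i^{\,t} w'_i(x_0,y_0) + \gamma_i(t) w_i(x_0,y_0) \big|^2 \Big).
\end{aligned}
\tag{3.12}
\]

Now we exploit the fact that $G$ is Ramanujan: since $|\theta_i| = \sqrt{d-1}$ for every $2 \le i \le n$, the expression in the second line of (3.12) is at most

\[
(d-1)^{-t} \sum_{i=2}^n \Big( |w_i(x_0,y_0)|^2 + 2|w'_i(x_0,y_0)|^2 + \frac{2|\gamma_i(t)|^2}{(d-1)^t} |w_i(x_0,y_0)|^2 \Big),
\]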


using the parallelogram law. Since by Parseval's identity,

\[
\sum_i |u_i(x_0,y_0)|^2 + \sum_i \big( |w_i(x_0,y_0)|^2 + |w'_i(x_0,y_0)|^2 \big) = \|\delta_{(x_0,y_0)}\|^2 = 1 ,
\]

and with (3.10) in mind, we infer that

\[
\left\| \frac{\mu_t((x_0,y_0),\cdot)}{\pi} - 1 \right\|^2_{L^2(\pi)} \le 2N(d-1)^{-t} \Big( 1 + \max_i \frac{|\gamma_i(t)|^2}{(d-1)^t} \Big).
\]

Substituting the bound (3.11) on $\gamma_i(t)$, again using that $G$ is Ramanujan,

\[
\left\| \frac{\mu_t((x_0,y_0),\cdot)}{\pi} - 1 \right\|^2_{L^2(\pi)} \le 2N(d-1)^{-t} \big( 4(d-1)t^2 + 1 \big). \tag{3.13}
\]

In particular, for $t = \lceil \log_{d-1} n + 3\log_{d-1}\log n \rceil$,

\[
\left\| \frac{\mu_t((x_0,y_0),\cdot)}{\pi} - 1 \right\|^2_{L^2(\pi)} \le O(1/\log n) = o(1) ,
\]

thus concluding the proof of Theorem 3.5. □
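The inequality (3.13) can be observed on a concrete graph. The sketch below is an illustration rather than part of the proof: it assumes the Petersen graph, which is 3-regular, non-bipartite, and has adjacency spectrum $\{3, 1, -2\}$, hence is Ramanujan. It evolves the NBRW exactly with rational arithmetic and compares the squared $L^2$-distance from (3.10) with the right-hand side of (3.13). All function names are hypothetical.

```python
from fractions import Fraction


def petersen():
    """Adjacency lists of the Petersen graph."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        for a, b in ((i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5)):
            adj[a].add(b)
            adj[b].add(a)
    return {v: sorted(nb) for v, nb in adj.items()}


def nbrw_l2_distance_sq(adj, start, t):
    """N * ||mu_t(start, .)||^2 - 1, the squared L2(pi)-distance as in (3.10)."""
    d = len(adj[start[0]])
    edges = [(x, y) for x in adj for y in adj[x]]
    mu = {e: Fraction(0) for e in edges}
    mu[start] = Fraction(1)
    for _ in range(t):
        nxt = {e: Fraction(0) for e in edges}
        for (x, y), mass in mu.items():
            if mass:
                for z in adj[y]:
                    if z != x:                      # never step back along (y, x)
                        nxt[(y, z)] += mass / (d - 1)
        mu = nxt
    return len(edges) * sum(m * m for m in mu.values()) - 1


def bound_3_13(d, n, t):
    """Right-hand side of (3.13) with N = dn."""
    return Fraction(2 * d * n * (4 * (d - 1) * t * t + 1), (d - 1) ** t)


adj = petersen()
for t in (1, 4, 8, 12):
    dist = nbrw_l2_distance_sq(adj, (0, 1), t)
    print(t, float(dist), float(bound_3_13(3, 10, t)))
```

On a graph this small the bound is far from tight; the point of the comparison is only that the exact distance never exceeds it.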

Using the reduction in §2.1 from SRW to NBRW (see (2.3)–(2.4)), one can deduce Theorem 1 from Theorem 3.5, as the $O(\log\log n)$ window for the NBRW is negligible compared with the term $s\sqrt{\log_{d-1} n}$ in (1.1).

Note that for every integer $\ell \ge (2+o(1))\log_{d-1} n$ there is a path of length exactly $\ell$ between every pair of vertices $x, y$, using $D_\infty(2t) \le D_2(t)\, D_2^*(t)$ for the NBRW (recall (2.15), and that the chain and its reversal are isomorphic).

Proof of Corollary 2. Since max(x,y) ‖µt((x, y), ·)−π‖tv = o(1) at time t

as per Theorem 3.5, for every x, all but o(n) directed edges can be reached

by a nonbacktracking path of length t from x. The remark above (3.9)

on the growth of balls in a d-regular graph thus implies the corollary: the

statement on a nonbacktracking cycle follows from applying this argument

once on a directed edge originating from x (and reaching almost every y

within the proper length bound) and once on an arbitrarily chosen other

directed edge ending at x, in the reversed NBRW. □

Proof of Corollary 3. Note that at time $R$, the $L^2$-distance of the NBRW from equilibrium is $O(1/\log^{3/2} n)$ by (3.13), and that $k = O(\log n)$ since $k \le g$. For a uniformly chosen path $(y_i)_{i=1}^k$ in $G$, each $y_i$ is uniform by the stationarity of the NBRW. Thus, by a union bound over the vertices $y_i$, for each $i$ there exists a path of length $R$ from the edge $(x_i, z_i)$ to $(y_i, z'_i)$, except with probability $O(k/\log^{3/2} n) = o(1)$, where $z_i$ and $z'_i$ are not on the paths $(x_i)$ and $(y_i)$, respectively. The conclusion now follows since, if vertex $\ell$ of the path from $x_i$ coincides with vertex $\ell'$ of the path from $x_j$, then $\ell + \ell' + k > g$ and $(R-\ell) + (R-\ell') + k > g$, so $k > g - R$, a contradiction. □


Remark 3.7. In the setting of Theorem 3.5, if $G$ is in addition transitive then, by using the exact value $|\alpha_i| = d-2$ from Proposition 4.1 below, the $L^2$-mixing time of the NBRW can be pinpointed precisely: let

\[
\Upsilon_G(k) := (d-2)^2 (d-1)^{-1} \int U_{k-1}(x)^2 \, d\mu_G
\]

for $\mu_G = \frac1n \sum_i \delta_{\lambda_i/(2\sqrt{d-1})}$ the empirical spectral distribution (ESD) of $G$ and $U_k(\cos x) = \frac{\sin((k+1)x)}{\sin x}$ the Chebyshev polynomial of the second kind. Then for any fixed $\varepsilon > 0$,

\[
t^{(L^2)}_{\mathrm{mix}}(\varepsilon) = \Big\lceil \log_{d-1}(n) + \log_{d-1}\big( \Upsilon_G(\log_{d-1} n) + 2 \big) + \log_{d-1}(1/\varepsilon) \Big\rceil . \tag{3.14}
\]

Indeed, from (3.12) we see that for any non-bipartite Ramanujan graph $G$ (not necessarily transitive), averaging over the initial state $(x_0,y_0)$ gives

\[
\frac1N \sum_{(x_0,y_0)} \big\|\mu_t((x_0,y_0),\cdot)\big\|^2
= \frac1N + \frac1N \bigg[ \frac{N-2n+1}{(d-1)^{2t}} + \frac{2n-2}{(d-1)^{t}} + \frac{\sum_i |\gamma_i(t)|^2}{(d-1)^{2t}} \bigg],
\]

using that $w_i \perp w'_i$ and $\|u_i\| = \|w_i\| = \|w'_i\| = 1$ for all $i$. Thus, by (3.10),

\[
\frac1N \sum_{(x_0,y_0)} \left\| \frac{\mu_t((x_0,y_0),\cdot)}{\pi} - 1 \right\|^2_{L^2(\pi)}
= (1+o(1)) \bigg( \frac1n \sum_i \frac{|\gamma_i(t)|^2}{(d-1)^{t}} + 2 \bigg) \frac{n}{(d-1)^t} ,
\]

provided that $t \to \infty$ with $n$. Writing $\theta_i = \sqrt{d-1}\, e^{\mathrm{i}\varphi_i}$ (so $\cos\varphi_i = \lambda_i/(2\sqrt{d-1})$ for $i = 2, \ldots, n$) and using Proposition 4.1,

\[
|\gamma_i(t)| = (d-2) \left| \frac{\theta_i^t - \bar\theta_i^{\,t}}{\theta_i - \bar\theta_i} \right| = (d-2)(d-1)^{(t-1)/2} \left| \frac{\sin(t\varphi_i)}{\sin\varphi_i} \right| ,
\]

which implies the analogue of (3.14) for the average of the mixing times over the initial states $(x_0,y_0)$, thus establishing (3.14) for the transitive case.
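The closed form for $|\gamma_i(t)|$ can be compared numerically with the defining sum. The sketch below is an illustration under stated assumptions: it takes $\alpha_i = d-2$ (a positive real value with the modulus given by Proposition 4.1; any phase would give the same modulus) and $\theta_i = \sqrt{d-1}\,e^{\mathrm{i}\varphi}$ for an arbitrary angle $\varphi$; the function names are hypothetical.

```python
import cmath
import math


def gamma_from_sum(d, phi, t):
    """alpha * sum_{j<t} theta^j * conj(theta)^(t-1-j), with alpha = d - 2 (assumed)."""
    theta = math.sqrt(d - 1) * cmath.exp(1j * phi)
    alpha = d - 2
    return alpha * sum(theta ** j * theta.conjugate() ** (t - 1 - j)
                       for j in range(t))


def gamma_closed_form(d, phi, t):
    """(d-2) (d-1)^((t-1)/2) |sin(t phi) / sin(phi)|."""
    return (d - 2) * (d - 1) ** ((t - 1) / 2) * abs(math.sin(t * phi) / math.sin(phi))


for d, phi, t in ((4, 0.7, 9), (3, 1.1, 7), (6, 2.5, 12)):
    print(d, phi, t, abs(gamma_from_sum(d, phi, t)), gamma_closed_form(d, phi, t))
```

The ratio $\sin(t\varphi)/\sin\varphi$ is exactly $U_{t-1}(\cos\varphi)$, which is how the Chebyshev polynomials enter $\Upsilon_G$.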

3.3. Extensions. We conclude with corollaries of the proof of Theorem 3.5.

3.3.1. Bipartite Ramanujan graphs. Following is the analog for NBRW in the

bipartite case; its SRW counterpart follows from the cover-tree reduction.

Corollary 3.8. Let $G = (V_0, V_1, E)$ be a bipartite Ramanujan graph on $n$ vertices with degree $d \ge 3$. Let $\mu_t$ be the $t$-step transition kernel of the NBRW, and let $\pi_0$ and $\pi_1$ be the uniform distribution on the $N/2$ directed edges originating from $V_0$ and $V_1$, respectively. Then for some fixed $c(d) > 0$,

\[
\max_{\substack{(x_0,y_0)\in\vec E \\ x_0 \in V_0}} \left\| \frac{\mu_t((x_0,y_0),\cdot)}{\pi_{(t \bmod 2)}} - 1 \right\|^2_{L^2(\pi_{(t \bmod 2)})} \le \frac{c(d)}{\log n}
\quad\text{at time}\quad
t = \big\lceil \log_{d-1} n + 3\log_{d-1}\log n \big\rceil .
\]

Consequently, on any sequence of such graphs, the NBRW that is modified to be lazy in its first step exhibits $L^1$-cutoff and $L^2$-cutoff at time $\log_{d-1} n$.


Proof. Following the arguments used to prove Theorem 3.5, observe that in computing $E\big[\, |\mu_t((x_0,y_0),(x,y))/\pi_{(t \bmod 2)} - 1|^2 \,\big]$, the identity (3.10) becomes valid once we replace $N$ by $N/2$. The only other modification needed is to treat $\lambda_n = -d$, which produces the eigenvalue $\theta_n = -(d-1)$. Since all the coordinates of $w_n$ are $\pm N^{-1/2}$, the contribution of this eigenvalue to the right-hand side of (3.12) is $1/N$, exactly that of the eigenvalue $d-1$ of $B$. The combined $2/N$ cancels via the modified identity (3.10), thus (3.13) becomes

\[
\left\| \frac{\mu_t((x_0,y_0),\cdot)}{\pi_{(t \bmod 2)}} - 1 \right\|^2_{L^2(\pi_{(t \bmod 2)})} \le N(d-1)^{-t} \big( 4(d-1)t^2 + 1 \big),
\]

which is $O(1/\log n)$ at the same value of $t$. □

Corollary 3.9. Let $G = (V_0, V_1, E)$ be a bipartite Ramanujan graph on $n$ vertices with degree $d \ge 3$. Let $P^t$ be the $t$-step transition kernel of the SRW, and let $\pi_0$ and $\pi_1$ be the uniform distribution on $V_0$ and $V_1$, respectively. Then for every fixed $s \in \mathbb{R}$ and every initial vertex $x$, the SRW at time

\[
t = \frac{d}{d-2} \log_{d-1} n + s\sqrt{\log_{d-1} n}
\]

satisfies

\[
\max_{x_0 \in V_0} \big\| P^t(x_0,\cdot) - \pi_{(t \bmod 2)} \big\|_{\mathrm{tv}} \to P(Z > c_d\, s) \quad \text{as } n \to \infty ,
\]

where $Z$ is a standard normal variable and $c_d$ is the explicit constant from Theorem 1 (its denominator is $2\sqrt{d(d-1)}$). Consequently, on any sequence of such graphs, the SRW that is modified to be lazy in its first step exhibits $L^1$-cutoff and $L^2$-cutoff at time $\frac{d}{d-2}\log_{d-1} n$.

3.3.2. Weakly Ramanujan graphs. It suffices to establish the result for the NBRW (here we do not specify $D_{\mathrm{tv}}$ for the SRW within the cutoff window, thus there is no need to control the NBRW within a window of $o(\sqrt{\log n})$), and Theorem 4 and Corollary 5 will then follow using the above reduction.

Corollary 3.10. Fix $d \ge 3$ and let $G$ be a $d$-regular graph on $n$ vertices whose nontrivial eigenvalues $\{\lambda_i\}_{i=2}^n$ all satisfy $|\lambda_i| \le (1+\delta_n)\, 2\sqrt{d-1}$ for some $\delta_n$ going to $0$ as $n \to \infty$. Let $\mu_t$ be the $t$-step transition kernel of the NBRW, and let $\pi$ be the uniform distribution on $\vec E$. For some fixed $c(d) > 0$,

\[
\max_{(x,y)\in\vec E} \left\| \frac{\mu_t((x,y),\cdot)}{\pi} - 1 \right\|^2_{L^2(\pi)} \le \frac{c(d)}{\log n}
\quad\text{at time}\quad
t = \Big\lceil \big(1 + 5\sqrt{\delta_n}\big) \log_{d-1} n + 3\log_{d-1}\log n \Big\rceil .
\]

Consequently, on any sequence of such graphs, the NBRW exhibits $L^1$-cutoff and $L^2$-cutoff both at time $\log_{d-1} n$.


Proof. The analysis of blocks of $\Lambda$ corresponding to eigenvalues $\lambda_i$ ($i \ge 2$) of $A$ such that $|\lambda_i| \le 2\sqrt{d-1}$ remains valid unchanged, and it remains to consider the effect of

\[
|\lambda_i| = (1+\varepsilon)\, 2\sqrt{d-1} \quad \text{for some } 0 < \varepsilon \le \delta_n . \tag{3.15}
\]

As mentioned in the proof of Theorem 3.5, the fact that $G$ is Ramanujan is exploited when replacing $|\theta_i|^{2t}$ by $(d-1)^t$ for all $i \ge 2$ in the spectral decomposition (3.12), and once again (just above (3.13)) in the bound (3.11) on $\gamma_i(t)$. For $\lambda_i$ as in (3.15), the corresponding real eigenvalues $\theta_i, \theta'_i$ of $B$ have moduli given, as per (3.3), by

\[
\big( 1 + \varepsilon \pm \sqrt{\varepsilon(2+\varepsilon)} \big) \sqrt{d-1} ;
\]

in particular, labeling them so that $|\theta_i| > |\theta'_i|$, we have $|\theta_i| = (1 + \sqrt{2\varepsilon} + O(\varepsilon))\sqrt{d-1}$ (while at the same time $|\theta'_i| < \sqrt{d-1}$). We account for this modified value of $|\theta_i|^{2t}$ in the spectral decomposition of $\|\mu_t((x,y),\cdot)\|^2$ via the pre-factor

\[
\big( 1 + \sqrt{2\varepsilon} + O(\varepsilon) \big)^{2t} \le \exp\Big[ \big( 2\sqrt{2\delta_n} + O(\delta_n) \big) t \Big],
\]

thus replacing the right-hand side of (3.13) by

\[
2N(d-1)^{-t}\, e^{[2\sqrt{2\delta_n} + O(\delta_n)]\, t}\, \big( 4(d-1)t^2 + 1 \big).
\]

For the designated value of $t$ (in which there is an extra additive term of $5\sqrt{\delta_n}\log_{d-1} n$ compared to $t$ from Theorem 3.5) and using that $\delta_n \to 0$, we find that there exists some fixed $c(d) > 0$ such that $\|\mu_t((x,y),\cdot)/\pi - 1\|^2_{L^2(\pi)}$ is at most

\[
\frac{c(d) + o(1)}{\log n} \exp\Big[ \big( 2\sqrt2 - 5\log(d-1) + o(1) \big) \sqrt{\delta_n}\, t \Big],
\]

which is $O(1/\log n)$ since $2\sqrt2 < 5\log(d-1)$ for all $d \ge 3$. □
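The two numerical facts used in this proof are easy to confirm. The sketch below is illustrative only and makes two assumptions: that (3.3) is the quadratic $\theta^2 - \lambda\theta + (d-1) = 0$ (consistent with the relations $\theta\theta' = d-1$ and $\theta + \theta' = \lambda$ used in Section 4), and that $\lambda > 0$ for simplicity. It compares the roots with the closed form above and checks the inequality $2\sqrt2 < 5\log(d-1)$ over a range of degrees. Function names are hypothetical.

```python
import math


def theta_pair(d, eps):
    """Roots of theta^2 - lam*theta + (d-1) = 0 for lam = (1+eps) * 2*sqrt(d-1)."""
    lam = 2 * (1 + eps) * math.sqrt(d - 1)
    root = math.sqrt(lam * lam - 4 * (d - 1))
    return (lam + root) / 2, (lam - root) / 2


def closed_form_pair(d, eps):
    """(1 + eps +/- sqrt(eps (2 + eps))) * sqrt(d-1)."""
    r = math.sqrt(d - 1)
    s = math.sqrt(eps * (2 + eps))
    return (1 + eps + s) * r, (1 + eps - s) * r


d, eps = 5, 0.01
big, small = theta_pair(d, eps)
print(big, small, closed_form_pair(d, eps))
print(big / math.sqrt(d - 1), 1 + math.sqrt(2 * eps))   # agree up to O(eps)

# The final step of the proof needs 2*sqrt(2) < 5*log(d-1) for every d >= 3.
print(all(2 * math.sqrt(2) < 5 * math.log(k - 1) for k in range(3, 1000)))
```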

Remark 3.11. Suppose that, for some $\delta_n = o(1)$ and fixed $\varepsilon' > 0$, the graph $G$ has $|\lambda| \le 2\sqrt{d-1} + \delta_n$ for all eigenvalues $\lambda$ except for $n^{o(1)}$ exceptional ones, which instead satisfy $|\lambda| < d - \varepsilon'$. Each eigenvalue of the latter form corresponds to an additive term of $O(a^{2t})$ in the right-hand side of (3.13), where $0 < a < 1$ depends only on $d$ and $\varepsilon'$. For the prescribed $t$ from Corollary 3.10, this amounts to $O(n^{-\varepsilon''})$ for some fixed $\varepsilon'' > 0$, thus the overall contribution of these $n^{o(1)}$ exceptional eigenvalues is negligible and the same result holds.

4. Pinpointing the spectral decomposition

The following proposition gives the precise moduli of the off-diagonal

terms in Λ from the spectral decomposition (3.2) in Proposition 3.1.


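Proposition 4.1. In the setting of Proposition 3.1, for all $i \ge 2$ we have $\alpha_i = 0$ if $\lambda_i = -d$ (and $i = n$), and otherwise

\[
|\alpha_i| =
\begin{cases}
d-2 & \text{if } |\lambda_i| \le 2\sqrt{d-1}, \\
\sqrt{d^2 - \lambda_i^2} & \text{if } |\lambda_i| > 2\sqrt{d-1}.
\end{cases}
\]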

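Proof. Let $i \ge 2$, and for simplicity, omit its indices from the corresponding subscripts; namely, let $\theta, \theta'$ correspond to the eigenvalue $\lambda \neq \pm d$, and let $f$ be so that $Af = \lambda f$ and $\|f\| = 1$, where $A$ is the adjacency matrix of $G$.

Case (1): $|\lambda| \neq 2\sqrt{d-1}$: Recalling $T_\theta$ from (3.4), we claim that

\[
\alpha = \frac{\beta(\theta' - \theta)}{\sqrt{1 - |\beta|^2}}
\quad\text{where}\quad
\beta := \frac{\langle T_{\theta'} f, T_\theta f \rangle}{\|T_{\theta'} f\|\, \|T_\theta f\|} . \tag{4.1}
\]

Indeed, taking

\[
w = \frac{T_\theta f}{\|T_\theta f\|}, \qquad
w' = \frac{T_{\theta'} f}{\|T_{\theta'} f\|}, \qquad
w'' = \frac{w' - \beta w}{\|w' - \beta w\|}
\]

for $\beta$ as above gives $Bw = \theta w$, $Bw' = \theta' w'$, and

\[
\|w' - \beta w\|^2 = 1 + |\beta|^2 - \beta \langle w, w' \rangle - \bar\beta \langle w', w \rangle = 1 - |\beta|^2 , \tag{4.2}
\]

so $w'' = (1 - |\beta|^2)^{-1/2}(w' - \beta w)$ satisfies

\[
Bw'' = \frac{\theta' w' - \beta\theta w}{\sqrt{1 - |\beta|^2}} = \theta' w'' + \frac{\beta(\theta' - \theta)}{\sqrt{1 - |\beta|^2}}\, w = \theta' w'' + \alpha w ,
\]

as claimed. To estimate $\alpha$, observe that for every $f \in \ell^2(V)$,

\[
\|T_\theta f\|^2_{\ell^2(\vec E)} = d\big( |\theta|^2 + 1 \big) \|f\|^2_{\ell^2(V)} - \big( \theta + \bar\theta \big) \langle Af, f \rangle_{\ell^2(V)} . \tag{4.3}
\]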

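Case (1.a): below the Ramanujan threshold. When $|\lambda| < 2\sqrt{d-1}$ we have $\theta' = \bar\theta \in \mathbb{C} \setminus \mathbb{R}$. Since $\theta^2 + d - 1 = \lambda\theta$ and $|\theta| = \sqrt{d-1}$,

\[
\begin{aligned}
\|T_\theta f\|^2 &= d^2 - (\theta + \bar\theta)\lambda = d^2 - 2(d-1) - (\theta^2 + \bar\theta^2) \\
&= d^2 - 2(d-1)\big[ 1 + \cos(2\varphi) \big] = (d-2)^2 + 2(d-1)\big( 1 - \cos(2\varphi) \big),
\end{aligned}
\]

where we let $\theta = \sqrt{d-1}\, \exp(\mathrm{i}\varphi)$. Similarly,

\[
\beta = \frac{d(\theta^2 + 1) - 2\theta\lambda}{(d-2)^2 + 2(d-1)(1 - \cos(2\varphi))} = \frac{(d-2)(\theta^2 - 1)}{(d-2)^2 + 2(d-1)(1 - \cos(2\varphi))} ,
\]

and so

\[
1 - |\beta|^2 = 1 - \frac{(d-1)^2 + 1 - 2(d-1)\cos(2\varphi)}{\big[ d - 2 + 2\tfrac{d-1}{d-2}(1 - \cos(2\varphi)) \big]^2}
= \frac{4\big( \tfrac{d-1}{d-2} \big)^2 (1 - \cos(2\varphi))^2 + 2(d-1)(1 - \cos(2\varphi))}{\big[ d - 2 + 2\tfrac{d-1}{d-2}(1 - \cos(2\varphi)) \big]^2} .
\]

Substituting $\cos(2\varphi) = 1 - 2\sin^2\varphi$ we see that

\[
1 - |\beta|^2 = \frac{4(d-1)\big( 1 + 4\tfrac{d-1}{(d-2)^2}\sin^2\varphi \big) \sin^2\varphi}{\big( d - 2 + 4\tfrac{d-1}{d-2}\sin^2\varphi \big)^2} = \frac{4(d-1)\sin^2\varphi}{(d-2)^2 + 4(d-1)\sin^2\varphi} ,
\]

and so

\[
\frac{|\beta|^2}{1 - |\beta|^2} = \frac{1}{1 - |\beta|^2} - 1 = \frac{(d-2)^2}{4(d-1)\sin^2\varphi} .
\]

Since $|\theta - \theta'| = 2\sqrt{d-1}\, |\sin\varphi|$, we conclude from (4.1) that $|\alpha| = d-2$.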

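Case (1.b): above the Ramanujan threshold. For $2\sqrt{d-1} < |\lambda| < d$, we have $\theta \neq \theta' \in \mathbb{R}$, and assume w.l.o.g. that $\theta > \theta'$. By (4.3) we get

\[
\|T_\theta f\|^2 = d(\theta^2 + 1) - 2\theta\lambda = d(\theta^2 + 1) - 2(\theta^2 + d - 1) = (d-2)(\theta^2 - 1) ,
\]

and for the same reason, $\|T_{\theta'} f\|^2 = (d-2)(\theta'^2 - 1)$. Similarly,

\[
\langle T_\theta f, T_{\theta'} f \rangle = d(\theta\theta' + 1) - (\theta + \theta')\lambda = d^2 - \lambda^2 ,
\]

using that $\theta\theta' = d-1$ whereas $\theta + \theta' = \lambda$ through their definition in (3.3). Since we also have $\theta^2 + \theta'^2 = \lambda^2 - 2(d-1)$, we see that

\[
(\theta^2 - 1)(\theta'^2 - 1) = (d-1)^2 - \big( \lambda^2 - 2(d-1) \big) + 1 = d^2 - \lambda^2 ,
\]

and altogether deduce that

\[
\beta = \frac{d^2 - \lambda^2}{\big[ (d-2)(\theta^2 - 1) \big]^{1/2} \big[ (d-2)(\theta'^2 - 1) \big]^{1/2}} = \frac{\sqrt{d^2 - \lambda^2}}{d-2} .
\]

Therefore,

\[
\frac{\beta^2}{1 - \beta^2} = \frac{1}{1 - \beta^2} - 1 = \frac{d^2 - \lambda^2}{(d-2)^2 - (d^2 - \lambda^2)} = \frac{d^2 - \lambda^2}{\lambda^2 - 4(d-1)} .
\]

Recalling the definition (4.1) of $\alpha$, and using that $\theta - \theta' = \sqrt{\lambda^2 - 4(d-1)}$, we infer that $\alpha^2 = d^2 - \lambda^2$.

Case (2): at the Ramanujan threshold: For $|\lambda| = 2\sqrt{d-1}$ we claim

\[
\alpha = \frac{\|T_\theta f\|}{\|T_{1+\theta} f\| \sqrt{1 - |\beta|^2}}
\quad\text{where}\quad
\beta := \frac{\langle T_{1+\theta} f, T_\theta f \rangle}{\|T_{1+\theta} f\|\, \|T_\theta f\|} . \tag{4.4}
\]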

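To see this, take

\[
w = \frac{T_\theta f}{\|T_\theta f\|}, \qquad
w' = \frac{T_{1+\theta} f}{\|T_{1+\theta} f\|}, \qquad
w'' = \frac{w' - \beta w}{\|w' - \beta w\|} ;
\]

since $Bw' = \theta w' + (\|T_\theta f\| / \|T_{1+\theta} f\|)\, w$ by (3.7), while $\|w' - \beta w\|^2 = 1 - |\beta|^2$ (by the same calculation as in (4.2)),

\[
Bw'' = \frac{\theta w' + \|T_\theta f\|\, \|T_{1+\theta} f\|^{-1} w - \beta\theta w}{\sqrt{1 - |\beta|^2}} = \theta w'' + \alpha w ,
\]

as claimed. To compute $\alpha$, we recall that $\theta = \lambda/2$, and infer from (4.3) that

\[
\|T_\theta f\|^2 = d(\theta^2 + 1) - 2\theta\lambda = d^2 - 2\theta\lambda = (d-2)^2 , \tag{4.5}
\]

\[
\|T_{1+\theta} f\|^2 = d\big( (1+\theta)^2 + 1 \big) - 2(1+\theta)\lambda = (d-2)(d + 2\theta - 2) + d ,
\]

as well as that

\[
\langle T_{1+\theta} f, T_\theta f \rangle = d\big( \theta(1+\theta) + 1 \big) - (2\theta + 1)\lambda = (d-2)(d + \theta - 2) .
\]

We therefore have

\[
\begin{aligned}
1 - \beta^2 &= 1 - \frac{(d + \theta - 2)^2}{(d-2)(d + 2\theta - 2) + d} = \frac{(d + \theta - 2)(-\theta) + \theta(d-2) + d}{(d-2)(d + 2\theta - 2) + d} \\
&= \frac{d - \theta^2}{(d-2)(d + 2\theta - 2) + d} = \frac{1}{\|T_{1+\theta} f\|^2}
\end{aligned}
\]

and so, by (4.4)–(4.5), $\alpha = \|T_\theta f\| = d-2$. □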
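The three cases of Proposition 4.1 can be cross-checked numerically from the formulas (4.1), (4.3) and (4.4) alone. The sketch below rests on two assumptions that are inferred rather than quoted: that $(T_\theta f)(x,y) = \theta f(y) - f(x)$ (the form consistent with (3.7) and (4.3)), and that $f$ is a real unit eigenvector, so that $\langle T_a f, T_b f\rangle = d(a\bar b + 1) - (a + \bar b)\lambda$. The function names are hypothetical.

```python
import cmath
import math


def inner_T(a, b, d, lam):
    """<T_a f, T_b f> for a real unit f with Af = lam f (assumed form of T)."""
    a, b = complex(a), complex(b)
    return d * (a * b.conjugate() + 1) - (a + b.conjugate()) * lam


def alpha_off_threshold(d, lam):
    """|alpha| via (4.1) when |lam| != 2 sqrt(d-1)."""
    disc = cmath.sqrt(lam * lam - 4 * (d - 1))
    theta, theta_p = (lam + disc) / 2, (lam - disc) / 2
    norm = math.sqrt(inner_T(theta, theta, d, lam).real)
    norm_p = math.sqrt(inner_T(theta_p, theta_p, d, lam).real)
    beta = inner_T(theta_p, theta, d, lam) / (norm_p * norm)
    return abs(beta * (theta_p - theta)) / math.sqrt(1 - abs(beta) ** 2)


def alpha_at_threshold(d, sign=1):
    """alpha via (4.4) when lam = sign * 2 sqrt(d-1), so theta = lam / 2."""
    theta = sign * math.sqrt(d - 1)
    lam = 2 * theta
    norm0 = math.sqrt(inner_T(theta, theta, d, lam).real)
    norm1 = math.sqrt(inner_T(1 + theta, 1 + theta, d, lam).real)
    beta = inner_T(1 + theta, theta, d, lam).real / (norm1 * norm0)
    return norm0 / (norm1 * math.sqrt(1 - beta ** 2))


d = 5
print(alpha_off_threshold(d, 1.3), d - 2)                     # case (1.a)
print(alpha_off_threshold(d, 4.5), math.sqrt(d * d - 4.5 ** 2))  # case (1.b)
print(alpha_at_threshold(d), alpha_at_threshold(d, -1), d - 2)   # case (2)
```

Note that the value in case (1.b) tends to $d-2$ as $|\lambda| \downarrow 2\sqrt{d-1}$, so the three cases fit together continuously.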

Acknowledgements. We thank Shayan Oveis Gharan for suggesting that

we study cutoff on Ramanujan graphs, and Perla Sousi for comments on an

earlier version of this manuscript. The research of E.L. was supported in

part by NSF grant DMS-1513403.

References

[1] D. Aldous. Random walks on finite groups and rapidly mixing Markov chains. In

Seminar on probability, XVII, volume 986 of Lecture Notes in Math., pages 243–297.

Springer, Berlin, 1983.

[2] D. Aldous and P. Diaconis. Shuffling cards and stopping times. Amer. Math. Monthly,

93(5):333–348, 1986.

[3] D. Aldous and J. A. Fill. Reversible Markov chains and random walks on graphs, 2002. Available at http://www.stat.berkeley.edu/~aldous/RWG/book.html.

[4] N. Alon. Eigenvalues and expanders. Combinatorica, 6(2):83–96, 1986.

[5] N. Alon, I. Benjamini, E. Lubetzky, and S. Sodin. Non-backtracking random walks

mix faster. Commun. Contemp. Math., 9(4):585–603, 2007.

[6] N. Alon and V. D. Milman. λ1, isoperimetric inequalities for graphs, and superconcentrators. J. Combin. Theory Ser. B, 38(1):73–88, 1985.

[7] O. Angel, J. Friedman, and S. Hoory. The non-backtracking spectrum of the universal

cover of a graph. Trans. Amer. Math. Soc., 367(6):4287–4318, 2015.

[8] H. Bass. The Ihara-Selberg zeta function of a tree lattice. Internat. J. Math., 3(6):717–797, 1992.

[9] C. Bordenave. A new proof of Friedman’s second eigenvalue Theorem and its extension to random lifts. 2015. Preprint, available at arXiv:1502.04482.

[10] G.-Y. Chen and L. Saloff-Coste. The cutoff phenomenon for ergodic Markov processes.

Electron. J. Probab., 13:no. 3, 26–78, 2008.

[11] F. R. K. Chung. Diameters and eigenvalues. J. Amer. Math. Soc., 2(2):187–196, 1989.

[12] F. R. K. Chung, V. Faber, and T. A. Manteuffel. An upper bound on the diameter

of a graph from eigenvalues associated with its Laplacian. SIAM J. Discrete Math.,

7(3):443–457, 1994.


[13] G. Davidoff, P. Sarnak, and A. Valette. Elementary number theory, group theory,

and Ramanujan graphs, volume 55 of London Mathematical Society Student Texts.

Cambridge University Press, Cambridge, 2003.

[14] P. Diaconis and M. Shahshahani. Generating a random permutation with random

transpositions. Z. Wahrsch. Verw. Gebiete, 57(2):159–179, 1981.

[15] R. Durrett. Random graph dynamics. Cambridge Series in Statistical and Probabilistic

Mathematics. Cambridge University Press, Cambridge, 2010.

[16] W. Feller. An introduction to probability theory and its applications. Vol. I. Third

edition. John Wiley & Sons, Inc., New York-London-Sydney, 1968.

[17] J. Friedman. A proof of Alon’s second eigenvalue conjecture and related problems.

Mem. Amer. Math. Soc., 195(910):viii+100, 2008.

[18] J. Friedman and D. Kohler. The relativized second eigenvalue conjecture of Alon.

2014. Preprint, available at arXiv:1403.3462.

[19] S. Hoory, N. Linial, and A. Wigderson. Expander graphs and their applications. Bull.

Amer. Math. Soc. (N.S.), 43(4):439–561 (electronic), 2006.

[20] M. Kotani and T. Sunada. Zeta functions of finite graphs. J. Math. Sci. Univ. Tokyo,

7(1):7–25, 2000.

[21] S. P. Lalley. Finite range random walk on free groups and homogeneous trees. Ann.

Probab., 21(4):2087–2130, 1993.

[22] E. Lubetzky and A. Sly. Cutoff phenomena for random walks on random regular

graphs. Duke Math. J., 153(3):475–510, 2010.

[23] E. Lubetzky and A. Sly. Explicit expanders with cutoff phenomena. Electron. J.

Probab., 16:no. 15, 419–435, 2011.

[24] A. Lubotzky. Discrete groups, expanding graphs and invariant measures. Modern Birkhäuser Classics. Birkhäuser Verlag, Basel, 2010.

[25] A. Lubotzky, R. Phillips, and P. Sarnak. Ramanujan graphs. Combinatorica,

8(3):261–277, 1988.

[26] R. Lyons and Y. Peres. Probability on Trees and Networks. Cambridge University

Press. In preparation. Current version available at http://pages.iu.edu/~rdlyons/.

[27] A. Marcus, D. A. Spielman, and N. Srivastava. Interlacing families I: bipartite Ramanujan graphs of all degrees. Ann. of Math., 182(1):307–325, 2015.

[28] G. A. Margulis. Explicit group-theoretic constructions of combinatorial schemes and

their applications in the construction of expanders and concentrators. Problemy

Peredachi Informatsii, 24(1):51–60, 1988.

[29] A. Nilli. On the second eigenvalue of a graph. Discrete Math., 91(2):207–210, 1991.

[30] Y. Peres. American Institute of Mathematics (AIM) research workshop “Sharp

Thresholds for Mixing Times”, Palo Alto, December 2004. Summary available at

http://www.aimath.org/WWN/mixingtimes.

[31] N. T. Sardari. Diameter of Ramanujan graphs and random Cayley graphs with numerics. 2015. Preprint, available at arXiv:1511.09340.

[32] P. Sarnak. Letter to Scott Aaronson and Andrew Pollington on the Solovay–Kitaev

Theorem and Golden Gates (with an appendix on optimal lifting of integral points).

February 2015. Available at http://publications.ias.edu/sarnak/paper/2637.

[33] J.-P. Serre. Répartition asymptotique des valeurs propres de l’opérateur de Hecke Tp. J. Amer. Math. Soc., 10(1):75–102, 1997.

[34] E. M. Stein and G. Weiss. Introduction to Fourier analysis on Euclidean spaces.

Princeton University Press, Princeton, N.J., 1971.


[35] W. Woess. Denumerable Markov chains. European Mathematical Society (EMS), Zürich, 2009. Generating functions, boundary theory, random walks on trees.

E. Lubetzky

Courant Institute, New York University, 251 Mercer St., New York, NY 10012.

E-mail address: [email protected]

Y. Peres

Microsoft Research, One Microsoft Way, Redmond, WA 98052.

E-mail address: [email protected]
