
Topics in Probabilistic Method

Selected mainly from Introduction to Graph Ramsey Theory

by Y. Li and W. Zang

Summer School 2017, Supported by NSFC

Jiangsu Normal University

Yusheng Li

Tongji University


Contents

1 Semi-random method
  1.1 An example for random method
  1.2 Semi-random method
  1.3 New functions
  1.4 Independence number of sparse graph

2 Basic probabilistic method
  2.1 Random graphs
  2.2 Elementary Examples
  2.3 Label Vertices Randomly
  2.4 Pick Vertices Randomly

3 The Lovász Local Lemma
  3.1 The local lemma
  3.2 Applications of the local lemma
  3.3 Triangle-free process ⋆

4 Concentration
  4.1 Chernoff's Inequality
  4.2 Applications of Chernoff's Bounds
  4.3 Martingales on Random Graphs ⋆
  4.4 Parameters of Random Graphs

5 Quasi-random graphs
  5.1 Properties of dense graphs
  5.2 Paley Graphs
  5.3 Graphs with small second eigenvalue
  5.4 Erdős–Rényi graphs
  5.5 Applications of characters ⋆
  5.6 Some multi-color Ramsey numbers

6 Real-world Networks
  6.1 Data and empirical research
  6.2 Six degrees of separation
  6.3 Clustering coefficient
  6.4 Small-world networks
  6.5 Power law and scale-free networks
  6.6 Network Structure
  6.7 References


Chapter 1

Semi-random method

1.1 An example for random method

Let G be a graph and let α(G) be its independence number. We shall prove the following result, called the Turán bound, by the random method, although the statement itself involves no randomness.

Theorem 1.1 Let G = (V, E) be a graph of order N and average degree d. Then

α(G) ≥ ∑_{v∈V} 1/(1 + d(v)) ≥ N/(1 + d).

Proof. Label all vertices in V(G) randomly by 1, 2, ..., N. Define a set

I = {v ∈ V : ℓ(v) < ℓ(w) for every w ∈ N(v)},

where ℓ(v) is the label of v. Note that I is a random set determined by ℓ. Let X_v be the indicator random variable for the event v ∈ I, and let X = ∑_{v∈V} X_v. Clearly X = |I|, and

E(X) = ∑_v E(X_v) = ∑_v Pr[v ∈ I] = ∑_v 1/(1 + d(v)),

since v ∈ I if and only if v receives the least label among v and its neighbors, which happens with probability 1/(1 + d(v)). So there exists a specific labeling for which |I| ≥ E(X). Since I is always an independent set, α(G) ≥ |I|, and the first inequality follows. The second comes from the convexity of the function f(x) = 1/(1 + x). □
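The proof is easy to simulate. The sketch below (my own illustration, not from the text; the 5-cycle and helper names are arbitrary choices) draws random labelings and checks that the resulting set I is independent and matches the Turán bound in expectation.

```python
import random

def random_label_set(adj, rng):
    # A vertex enters I iff its label is smaller than all its neighbours'
    # labels, exactly as in the proof of Theorem 1.1.
    label = list(range(len(adj)))
    rng.shuffle(label)
    return [v for v in adj if all(label[v] < label[w] for w in adj[v])]

# A 5-cycle: d(v) = 2 for every v, so the Turan bound is 5/(1+2) = 5/3.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
rng = random.Random(0)
bound = sum(1.0 / (1 + len(adj[v])) for v in adj)
sizes = [len(random_label_set(adj, rng)) for _ in range(2000)]
# E|I| equals the bound, so the empirical mean should sit near 5/3,
# and some labeling attains at least ceil(5/3) = 2.
print(bound, sum(sizes) / len(sizes), max(sizes))
```

Note that the argument only needs one labeling achieving the expectation, which the simulation finds immediately.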


1.2 Semi-random method

For graphs F and H, the Ramsey number r(F, H) is defined to be the smallest N such that for every graph G of order N, either G contains F or the complement G̅ contains H. We denote r(K_k, K_n) by r(k, n) and call it the classical Ramsey number. Thus r(k, n) is the smallest N such that every graph G of order N satisfies ω(G) ≥ k or α(G) ≥ n. Note that ω(G) = α(G̅), so good bounds for α(G) are expected to give good estimates for r(k, n). It is well known that r(k, n) ≤ \binom{k+n−2}{k−1}. Ajtai, Komlós and Szemerédi (1980) proved that

r(k, n) ≤ (5000)^k n^{k−1}/(log n)^{k−2}

for fixed k ≥ 2. Let us get an overview of the technique they employed.

A greedy algorithm to obtain an independent set is to put a vertex v into the independent set, delete v together with all its neighbors, and repeat the process on the remaining graph.
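As a sketch (mine, not the text's), the greedy scheme can be written with the selection rule left abstract; the minimum-degree rule below is just one plausible criterion, which the rest of this chapter refines via Q(v) and R(v).

```python
def greedy_independent_set(adj, choose):
    """Repeatedly choose a vertex v, put it into the independent set,
    then delete v together with all of its neighbours."""
    alive = set(adj)
    indep = []
    while alive:
        v = choose(alive, adj)
        indep.append(v)
        alive -= {v} | set(adj[v])
    return indep

def min_degree_choice(alive, adj):
    # one natural rule: pick a vertex of minimum degree in the surviving graph
    return min(alive, key=lambda v: sum(1 for w in adj[v] if w in alive))

# A triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
indep = greedy_independent_set(adj, min_degree_choice)
print(indep)
```

Whatever rule `choose` implements, the output is independent by construction; the whole question, addressed next, is which rule keeps the remaining graph sparse.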

In order to produce a larger independent set by the above algorithm, we hope to delete fewer vertices and more edges in each step, so that the remaining graph is sparser. Which v should be chosen? To obtain a criterion, define Q(v) to be the number of edges incident with v or with one of its neighbors, and define

Q_0(v) = ∑_{u∈N(v)} deg(u).

Note that if we delete a vertex v and its neighbors, we delete exactly Q(v) edges.

Lemma 1.1 Let v be a vertex of a graph G. Then

Q(v) ≤ Q_0(v),

and equality holds if and only if N(v) contains no edge.

We shall establish a criterion R(v) ≥ 0 for choosing v. Before that, let us record a property of Q_0(v).

Lemma 1.2 Let G = (V, E) be any graph with average degree d. Then the average value of Q_0(v) over v ∈ V is at least d².


Proof. Let N denote the order of G. Then

(1/N) ∑_{v∈V} Q_0(v) = (1/N) ∑_{v∈V} ∑_{u∈N(v)} deg(u) = (1/N) ∑_{u∈V} deg(u) ∑_{v∈N(u)} 1
= (1/N) ∑_{u∈V} deg²(u) ≥ ( (1/N) ∑_{u∈V} deg(u) )² = d²,

where we have used the convexity of the function f(x) = x². □

Ajtai, Komlós and Szemerédi defined a vertex v to be a groupie if the average degree of its neighbors is at least the average degree of G. Every graph has a groupie: there is some vertex v ∈ V with Q_0(v) − d·deg(v) ≥ 0, since by Lemma 1.2 this inequality holds on average. By deleting a groupie and its neighbors recursively, they proved that for any triangle-free graph G of order N and average degree d,

α(G) ≥ cN (log d)/d,    (1.1)

where c = 1/100 and log x is the natural logarithm. Ajtai, Erdős, Komlós and Szemerédi conjectured in 1981 that the best constant is 1 − o(1), where the term o(1) tends to zero as d → ∞. By adding details to the proof of (1.1), Griggs (1983) improved the constant to 1/2.4. These results considerably improve the Turán bound when d is large. Now let N = r(3, n) − 1, so there is a triangle-free graph G on N vertices with independence number at most n − 1. Since each neighborhood in a triangle-free graph is an independent set, α(G) ≥ ∆(G) ≥ d; hence d ≤ n − 1 and, as log x/x is decreasing, n − 1 ≥ cN log(n − 1)/(n − 1). It follows that

r(3, n) = N + 1 ≤ (1/c + o(1)) n²/log n

as n → ∞. This bound is much stronger than the bound r(3, n) ≤ \binom{n+1}{2} = n(n + 1)/2. A deep result of Kim (1995) shows that

r(3, n) ≥ (1/162 − o(1)) n²/log n.


This fact and the argument just mentioned imply that one cannot expect to improve the lower bound (1.1) by more than a multiplicative constant, even just for triangle-free graphs.

The method of Ajtai, Komlós and Szemerédi is now called the "semi-random method" or "nibble method", initiated by Rödl (1985): the objects are selected in many small "nibbles" rather than one big "bite", and one analyzes how the nibbles change the structure of the remainder. It was this method that Kim used elaborately in his random construction to obtain his lower bound for r(3, n).

In order to find a larger independent set, the key step is to choose the vertex which, together with its neighbors, will be deleted. We now look at how Shearer (1983) chose this vertex for triangle-free graphs. Suppose that f(x) is the function for which we want to prove

α(G) ≥ Nf(d)

for every triangle-free graph G. Naturally, we assume that f(x) is positive, decreasing and, more importantly, satisfies f(x) ≥ c log x/x for some constant c > 1/100 when x is sufficiently large. Let P(v) = d(v) + 1, and let Q(v) denote the number of edges incident with v or one of its neighbors. Let H denote the graph obtained from G by deleting v and its neighbors. Note that we delete exactly P(v) vertices and Q(v) = Q_0(v) edges, since G is triangle-free. Then H has N − P(v) vertices and Nd/2 − Q(v) edges, so its average degree is

d_H = (Nd − 2Q(v))/(N − P(v)).

By induction,

α(G) ≥ 1 + α(H) ≥ 1 + [N − P(v)]f(d_H).

We do not know which of d and d_H is bigger; that is why the algorithm of Ajtai et al. deletes a groupie and its neighbors. However, we can swap f(d_H) for f(d) if we further assume that f(x) is convex, so that f(x) ≥ f(d) + f′(d)(x − d). Thus we have

1 + [N − P(v)]f(d_H) ≥ 1 + [N − P(v)] ( f(d) + f′(d) ( (Nd − 2Q(v))/(N − P(v)) − d ) ) = Nf(d) + R(v),


where

R(v) = 1 − P(v)f(d) − [2Q(v) − dP(v)]f′(d).

The only thing left is to prove that R(v) ≥ 0 for some v. To find such a vertex v, consider the average of R(v):

(1/N) ∑_v R(v) ≥ 1 − (d + 1)f(d) − [2d² − d(d + 1)]f′(d) = 1 − (d + 1)f(d) − d(d − 1)f′(d),

where we used the fact that the average of Q(v) is at least d² by Lemma 1.2 and the assumption f′(x) ≤ 0. So we need the function f(x) to satisfy the differential equation

x(x − 1)f′(x) + (x + 1)f(x) = 1.

Solving this differential equation, Shearer obtained

f(x) = (x log x − x + 1)/(x − 1)².

Luckily enough, f(x) is positive, decreasing and convex. Moreover, f(x) ∼ log x/x as x → ∞.
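Shearer's function is easy to check numerically. The sketch below (stdlib only; my own sanity check, not part of the text) verifies the differential equation at a few points via a central-difference derivative, and the asymptotics f(x) ∼ log x/x.

```python
import math

def shearer_f(x):
    # Shearer's solution of x(x-1)f'(x) + (x+1)f(x) = 1
    return (x * math.log(x) - x + 1) / (x - 1) ** 2

def ode_residual(x, h=1e-6):
    # central-difference derivative; accurate enough to see the residual vanish
    fp = (shearer_f(x + h) - shearer_f(x - h)) / (2 * h)
    return x * (x - 1) * fp + (x + 1) * shearer_f(x) - 1

print([round(ode_residual(x), 6) for x in (2.0, 10.0, 100.0)])
print(shearer_f(1e6) * 1e6 / math.log(1e6))  # close to 1, since f ~ log x / x
```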

1.3 New functions

We shall generalize Shearer's result to improve the upper bound on r(k, n) for general k.

Lemma 1.3 For m ≥ 1 and x ≥ 0, the function

f_m(x) = ∫_0^1 (1 − t)^{1/m} / (m + (x − m)t) dt

satisfies the differential equation

x(x − m)f_m′(x) + (x + 1)f_m(x) = 1.    (1.2)

Moreover, f_m(x) is completely monotonic on [0, ∞), that is to say, (−1)^k f_m^{(k)}(x) > 0 for all k ≥ 0 and x ≥ 0. In particular, f_m(x) is positive, decreasing and convex.


Proof. By differentiating under the integral sign and then integrating by parts, we have

x(x − m)f_m′(x)
= −x(x − m) ∫_0^1 (1 − t)^{1/m} t / (m + (x − m)t)² dt
= x ∫_0^1 (1 − t)^{1/m} t (d/dt)( 1/(m + (x − m)t) ) dt
= −x ∫_0^1 ( 1 − t/(m(1 − t)) ) (1 − t)^{1/m} / (m + (x − m)t) dt
= −x f_m(x) + ∫_0^1 ((1 − t)^{1/m}/m) [ 1/(1 − t) − m/(m + (x − m)t) ] dt
= −x f_m(x) + 1 − f_m(x).

Hence (1.2) follows. The complete monotonicity of f_m(x) can be seen by repeatedly differentiating under the integral sign. □

Corollary 1.1 For 0 ≤ x ≤ m, f_m(x) ≤ 1/(1 + x); and for m ≥ 1,

f_m(x) ≥ (log(x/m) − 1)/x.

Proof. The first statement comes from the differential equation in Lemma 1.3 immediately: since f_m′(x) < 0 and x(x − m) ≤ 0, we have (x + 1)f_m(x) = 1 − x(x − m)f_m′(x) ≤ 1. To justify the second statement for the case x > m, note that (1 − t)^{1/m} ≥ 1 − t on [0, 1], so

f_m(x) ≥ ∫_0^1 (1 − t)/(m + (x − m)t) dt = (x log(x/m) − (x − m))/(x − m)² > (log(x/m) − 1)/x.

The last inequality holds since it amounts to

(2x − m) log(x/m) > x − m,  that is,  (2t − 1) log t > t − 1

for t = x/m > 1, which is easy to prove. □
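Both statements of the corollary can be checked numerically. The sketch below (my own check with stdlib midpoint quadrature; the step count is an arbitrary choice) evaluates f_m and compares it with the two bounds and with the exact value f_m(m) = 1/(m + 1) obtained by direct integration.

```python
import math

def f_m(x, m, steps=20000):
    # f_m(x) = integral over [0,1] of (1-t)^(1/m) / (m + (x-m)t), midpoint rule
    h = 1.0 / steps
    return sum(
        (1 - (i + 0.5) * h) ** (1.0 / m) / (m + (x - m) * (i + 0.5) * h)
        for i in range(steps)
    ) * h

print(f_m(1.0, 2), 0.5)                            # f_m(x) <= 1/(1+x) for x <= m
print(f_m(2, 2), 1 / 3)                            # at x = m the value is 1/(m+1)
print(f_m(50.0, 2), (math.log(25.0) - 1) / 50.0)   # lower bound for x > m
```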

1.4 Independence number of sparse graph

Theorem 1.2 Let G be a graph with N vertices and average degree d. If every subgraph induced by a neighborhood has maximum degree at most a, then

α(G) ≥ N f_{a+1}(d).


Proof. We proceed by induction on N, the number of vertices of G. If N ≤ a + 2, then d ≤ a + 1. By Corollary 1.1, 1/(d + 1) ≥ f_{a+1}(d), and it follows from Turán's theorem that α(G) ≥ N/(d + 1) ≥ N f_{a+1}(d). So we suppose N > a + 2 hereafter. By the preceding argument, we may also assume d > a + 1. We shall let G_v stand for the subgraph of G induced by the neighborhood of v. In case some vertex v of G has degree N − 1, Turán's theorem gives α(G_v) ≥ (N − 1)/(a + 1), as the maximum degree of G_v is at most a. It follows that

α(G) ≥ α(G_v) ≥ (N − 1)/(a + 1) ≥ N/(a + 2) = N f_{a+1}(a + 1) ≥ N f_{a+1}(d).

So we suppose henceforth that the maximum degree of G is at most N − 2.

For each v ∈ V(G), let P(v) = d(v) + 1 and let Q(v) denote the number of edges of G that are incident with either v or one of its neighbors. Since the average degree of G_v is at most a, G_v contains at most (a/2)d(v) edges. It follows that

Q(v) ≥ ∑_{u∈N(v)} d(u) − (a/2) d(v).

Consequently, the average value of Q satisfies

(1/N) ∑_{v∈V} Q(v) ≥ d² − ad/2.

Set

R(v) = 1 + [P(v)d − 2Q(v)]f_{a+1}′(d) − P(v)f_{a+1}(d).

Note that the coefficient of Q(v) is positive since f_{a+1}′(d) < 0. Then

(1/N) ∑_{v∈V} R(v) ≥ 1 + [(d + 1)d − 2d² + ad]f_{a+1}′(d) − (d + 1)f_{a+1}(d)
= 1 − d(d − a − 1)f_{a+1}′(d) − (d + 1)f_{a+1}(d) = 0,

by (1.2) with x = d and m = a + 1. Hence there exists a vertex v_0 ∈ V(G) such that R(v_0) ≥ 0. Let R = R(v_0), P = P(v_0) and Q = Q(v_0). Then

R = 1 + (Pd − 2Q)f_{a+1}′(d) − P f_{a+1}(d) ≥ 0.


Delete v_0 and its neighbors from G. In view of the fact that the maximum degree of G is at most N − 2, we obtain a nontrivial graph H with N − P vertices and Nd/2 − Q edges. Note that any subgraph of H induced by a neighborhood still has maximum degree at most a, so by the induction hypothesis,

α(H) ≥ (N − P) f_{a+1}( (Nd − 2Q)/(N − P) ).

Clearly α(G) ≥ 1 + α(H). Since f_{a+1} is convex, f_{a+1}(x) ≥ f_{a+1}(d) + f_{a+1}′(d)(x − d) for all x ≥ 0. Combining these two facts, we obtain

α(G) ≥ 1 + α(H) ≥ 1 + (N − P) f_{a+1}( (Nd − 2Q)/(N − P) )
≥ 1 + (N − P) [ f_{a+1}(d) + f_{a+1}′(d) ( (Pd − 2Q)/(N − P) ) ]
= 1 + (N − P)f_{a+1}(d) + (Pd − 2Q)f_{a+1}′(d)
= N f_{a+1}(d) + R ≥ N f_{a+1}(d).

This completes the proof. □

Theorem 1.3 Let k ≥ 2 be fixed. Then for all large n,

r(k, n) ≤ (1 + o(1)) n^{k−1}/(log n)^{k−2}.


Proof. We prove the assertion by induction on k. For k = 2, it is the trivial case r(2, n) = n.

Consider the case k = 3. Let G be a graph of order N = r(3, n) − 1 that contains no triangle and has α(G) ≤ n − 1. Since G is triangle-free, each subgraph induced by a neighborhood is empty, so its maximum degree is zero. Let d be the average degree of G. Since each neighborhood is an independent set, n − 1 ≥ α(G) ≥ d. By Theorem 1.2 with a = 0 and the bounds on f_1,

n − 1 ≥ N f_1(d) ≥ N f_1(n − 1) ≥ N (log(n − 1) − 1)/(n − 1).


Then r(3, n) − 1 < (n − 1)²/(log(n − 1) − 1), and it follows that r(3, n) ≤ n²/log(n/e) for large n.

Suppose the statement holds for 2, 3, ..., k; we proceed to the induction step. Let G be a graph of order N = r(k + 1, n) − 1 such that G contains no K_{k+1} and α(G) ≤ n − 1. Then for each vertex v of G,

• the degree of v is at most r(k, n) − 1, and
• the maximum degree of ⟨N(v)⟩ is at most r(k − 1, n) − 1,

where ⟨N(v)⟩ is the subgraph induced by N(v). For any ε > 0, let

d = (1 + ε) n^{k−1}/(log n)^{k−2}  and  m = ⌊(1 + ε) n^{k−2}/(log n)^{k−3}⌋.

From the induction hypothesis, r(k, n) < d and r(k − 1, n) < m for large n. Thus Theorem 1.2, together with the fact that f_m(x) is decreasing in each of x and m, implies that

n > α(G) ≥ N f_m(r(k, n) − 1) > N f_m(r(k, n)).

We may assume that r(k, n) > m, as otherwise we are done. Since f_m is decreasing, f_m(r(k, n)) ≥ f_m(d), and by Corollary 1.1 together with d/m ≥ n/log n, we obtain

n > N f_m(d) ≥ N (log(d/m) − 1)/d ≥ N (log(n/log n) − 1)/((1 + ε) n^{k−1}/(log n)^{k−2}),

which implies that for large n,

r(k + 1, n) = N + 1 ≤ (1 + 2ε) n^k/(log n)^{k−1},

as asserted for the case k + 1. □


Chapter 2

Basic probabilistic method

2.1 Random graphs

The probabilistic method based on computing mathematical expectations is called the basic probabilistic method; we saw an example in Lecture 1. To use this method, we must set up an appropriate probability space. The space of random graphs is the one used most often, and graph Ramsey theory is the birthplace of random graphs.

Every probability space whose points are graphs gives a notion of a random graph. For a family of graphs G = {G_1, G_2, ...} with probabilities Pr(G_i) such that 0 ≤ Pr(G_i) ≤ 1 and ∑_{i≥1} Pr(G_i) = 1, we have a probability space of random graphs; each G_i is called a random graph of G with probability Pr(G_i). We shall consider the probability space consisting of graphs on a fixed set V = [n] = {1, 2, ..., n}, where the vertices in V are distinguishable, so the edges on V are distinguishable too. Note that the complete graph K_n on vertex set V has

\binom{n}{1} + \binom{n}{2}·2 + ··· + \binom{n}{k}·2^{\binom{k}{2}} + ··· + \binom{n}{n}·2^{\binom{n}{2}}

subgraphs. The general term counts the subgraphs that have exactly k vertices, and the last term 2^{\binom{n}{2}} counts all spanning subgraphs.

Let us label all edges of K_n on vertex set V = [n] as e_1, e_2, ..., e_m, where m = \binom{n}{2}. Note that the number of graphs on vertex set [n] is 2^m, since the edges are distinguishable. The space G(n; p_1, ..., p_m) is defined for 0 ≤ p_i ≤ 1 as follows. To get a random element of this space, one selects each edge e_i independently, with probability p_i. Put another way, the ground set of G(n; p_1, ..., p_m) is the set of all 2^m graphs on V = [n]. For a specific graph H in the space with E(H) = {e_j : j ∈ S}, where S ⊆ {1, ..., m} is the index set of the edges of H, the probability that H appears is

∏_{j∈S} p_j ∏_{j∉S} (1 − p_j).

That is to say, each edge of H has to be selected, and no edge outside H is allowed to be selected. Write q_j = 1 − p_j and G(p_1, ..., p_m) for a random element of G(n; p_1, ..., p_m); then

Pr(G(p_1, ..., p_m) = H) = ( ∏_{j∈S} p_j )( ∏_{j∉S} q_j ).

Since the vertices (and hence the edges) are distinguishable, the event G(p_1, ..., p_m) = H is different from the event that G(p_1, ..., p_m) is isomorphic to H. To see that G(n; p_1, ..., p_m) is truly a probability space, let us verify that

∑_H Pr(G(p_1, ..., p_m) = H) = ∑_{S⊆[m]} ( ∏_{j∈S} p_j )( ∏_{j∉S} q_j ) = ∏_{j=1}^m (p_j + q_j) = 1.

We shall concentrate on the case p_1 = p_2 = ··· = p_m = p, for which the probability space G(n; p_1, ..., p_m) is written as G(n, p).

In the space G(n, p), the probability of a specific graph H with k edges is p^k(1 − p)^{m−k}: each of the k edges of H has to be selected, and no edge outside H may be selected. Write G_p for a random element of G(n, p); then

Pr(G_p = H) = p^{e(H)} q^{m−e(H)}.

In the space G(n, 0), the empty graph K̄_n appears with probability one, and every other graph appears with probability zero. Similarly, in the space G(n, 1) the only graph that appears is K_n. Apart from these two extreme cases, for 0 < p < 1 every graph on vertex set [n] appears with positive probability. As p increases from 0 to 1, the random graph G_p evolves from empty to complete.


Another convenient point of view is to color the edges of the complete graph K_n with two colors, randomly and independently, each edge receiving the first color with probability p. Thus the random graph G_p is viewed as a random coloring of the edge set of K_n, which we call a coloring of K_n for short. Recalling the definition of Ramsey numbers, we see why the relation between the random method and Ramsey theory is so natural and tight.

It is worth remarking that p = p(n) is often a function of n. The space G(n, p) is of great interest for fixed values of p as well; in particular, G(n, 1/2) is a classical probability space: it consists of all 2^m graphs on V = [n], each occurring with the same probability 2^{−m}. Thus G_{1/2} is obtained by picking one of the 2^m graphs on V = [n] uniformly at random. Whether p is fixed or a function of n, we tend to be interested in what happens as n → ∞.

Once we have a space of random graphs, every graph invariant becomes a random variable, whose nature depends crucially on the space. For instance, the number X_k(G) of complete subgraphs of order k in G is a random variable on our space of random graphs.

To be proficient in the probabilistic method, one must have a feeling for asymptotic calculation. For convenience, we state some simple inequalities that will be used in the calculations. The following precise formula is Stirling's formula.

Lemma 2.1 For all n ≥ 1,

n! = √(2πn) (n/e)^n exp( 1/(12n + θ) ),

where 0 < θ = θ_n < 1. Thus

√(2πn) (n/e)^n < n! < √(2πn) (n/e)^n e^{1/(12n)},

and

n! = (1 + o(1)) √(2πn) (n/e)^n > (n/e)^n. □
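These bounds are easy to confirm for concrete n; a quick stdlib check (mine, not part of the text):

```python
import math

def stirling_main(n):
    # leading Stirling term sqrt(2*pi*n) * (n/e)^n
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (5, 10, 20):
    fact = math.factorial(n)
    # the two-sided estimate of Lemma 2.1, and the cruder n! > (n/e)^n
    assert stirling_main(n) < fact < stirling_main(n) * math.exp(1 / (12 * n))
    assert fact > (n / math.e) ** n
print("Stirling bounds hold for n = 5, 10, 20")
```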


Lemma 2.2 For any positive integers N ≥ n,

\binom{N}{n} ≤ N^n/n! ≤ (eN/n)^n.

If n = o(√N) as N → ∞, then

\binom{N}{n} ∼ N^n/n!.

Proof. The first inequality is clear, and the second follows from n! > (n/e)^n in Stirling's formula. For the asymptotic statement, it suffices to see that

N(N − 1)···(N − n + 1)/N^n = exp[ ∑_{i=1}^{n−1} log(1 − i/N) ] = exp[ −∑_{i=1}^{n−1} i/N − O(n³/N²) ],

which tends to 1 when n = o(√N), and the desired asymptotic formula follows. □

The following simple fact from calculus is often used.

Lemma 2.3 For any 0 ≤ x ≤ 1 and n ≥ 0, (1 − x)^n ≤ e^{−nx}. If x = x(n) → 0 and x²n → 0 as n → ∞, then (1 − x)^n ∼ e^{−nx}.

2.2 Elementary Examples

In his original proof of the exponential lower bound for r(n, n) in 1947, Erdős did not use formal probabilistic language, so his paper has been considered an informal starting point of random graphs. But in two papers published in 1959 and 1961, in which he gave a lower bound c(n/log n)² for r(3, n), he even wrote probabilities in the titles.

Theorem 2.1 For n ≥ 1,

r(n, n) > (n/(e√2)) 2^{n/2}.


Proof. Consider the random graphs in G(N, 1/2); equivalently, color K_N randomly and independently with p = 1/2, where N is a positive integer to be chosen. Let S be a set of n vertices, and let A_S be the event that S is monochromatic. Then

Pr[A_S] = 2·(1/2)^{\binom{n}{2}} = 2^{1−\binom{n}{2}},

as for A_S to hold all \binom{n}{2} edges inside S must be colored the same. Consider the event ⋃ A_S over all n-sets in [N]. We use the simple fact that the probability of a union is at most the sum of the probabilities of the events. Thus

Pr[⋃ A_S] ≤ ∑ Pr[A_S] = \binom{N}{n} 2^{1−\binom{n}{2}}.

If this probability is less than one, then the complementary event B = ⋂ A̅_S has positive probability, so B is not the null event, and there is a point of the probability space at which B holds. But a point of the probability space is precisely a coloring of the edges of K_N, and the event B is precisely that under this coloring there is no monochromatic K_n. Hence r(n, n) > N.

We need to find the maximum possible N such that Pr[⋃ A_S] < 1. From Stirling's formula,

\binom{N}{n} 2^{1−\binom{n}{2}} ≤ (N^n/n!) 2^{1−\binom{n}{2}} < (2/√(2πn)) ( e√2 N/(n 2^{n/2}) )^n.

This can be ensured by setting N = ⌊(n/(e√2)) 2^{n/2}⌋, so that the fraction in parentheses is at most one. Therefore r(n, n) ≥ N + 1 > (n/(e√2)) 2^{n/2}. □
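The inequality Pr[⋃ A_S] < 1 can be verified exactly with integer arithmetic for moderate n. A small check (my own, not from the text):

```python
import math

def union_bound_ok(N, n):
    # exact integer test of binom(N, n) * 2^(1 - n(n-1)/2) < 1
    return math.comb(N, n) < 2 ** (n * (n - 1) // 2 - 1)

for n in (10, 20, 40):
    N = math.floor(n / (math.e * math.sqrt(2)) * 2 ** (n / 2))
    assert union_bound_ok(N, n)  # hence r(n, n) > N
    print(n, N)
```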

Remark. Erdős's original proof used a counting argument; in effect he worked in the space G(N, 1/2), which, as mentioned, is a classical probability space. It is interesting that this is essentially the only space in which the counting argument works. For a property Q of graphs, if the probability that G_p in G(n, p) satisfies Q tends to 1 as n → ∞, we say that almost all graphs G_p satisfy Q. The above argument is an "almost all" argument, yet nobody can construct a single (family of) such colorings explicitly.


Theorem 2.2 Let m, n and N be positive integers. If for some 0 < p < 1,

\binom{N}{m} p^{\binom{m}{2}} + \binom{N}{n} (1 − p)^{\binom{n}{2}} < 1,

then r(m, n) > N.

Proof. Consider random graphs G_p in G(N, p). Let S be a set of m vertices, and let A_S be the event that S induces a complete graph. Then Pr[A_S] = p^{\binom{m}{2}}, and

Pr[⋃ A_S] ≤ ∑ Pr[A_S] = \binom{N}{m} p^{\binom{m}{2}}.

Let T be a set of n vertices, and let B_T be the event that T induces an independent set. Then Pr[B_T] = (1 − p)^{\binom{n}{2}}, and

Pr[⋃ B_T] ≤ ∑ Pr[B_T] = \binom{N}{n} (1 − p)^{\binom{n}{2}}.

Thus

Pr[(⋃ A_S) ∪ (⋃ B_T)] < 1.

So there exists a graph on N vertices containing neither a clique K_m nor an independent set of size n, and hence r(m, n) > N. □
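Theorem 2.2 turns directly into a computer search for Ramsey lower bounds: fix m and n, pick a p, and grow N while the two-term sum stays below 1. A sketch (my own illustration; the choice of p follows the calculation later in this section, and m = 4, n = 30 are arbitrary):

```python
import math

def ramsey_condition(N, m, n, p):
    # binom(N,m) p^C(m,2) + binom(N,n) (1-p)^C(n,2); r(m,n) > N if this is < 1
    return (math.comb(N, m) * p ** math.comb(m, 2)
            + math.comb(N, n) * (1 - p) ** math.comb(n, 2))

m, n = 4, 30
p = (m - 3) * math.log(n) / (n - 1)
N = m + n
while ramsey_condition(N + 1, m, n, p) < 1:
    N += 1
print(N)  # Theorem 2.2 then gives r(4, 30) > N
```

Both terms of the condition are increasing in N, so the loop stops at the largest N the theorem certifies for this particular p.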

The above result is ineffective in bounding r(3, n). We now examine the lower bound for r(4, n), giving the details of the calculation for choosing a suitable value of p, and of N as large as possible for large n. To make the condition in Theorem 2.2 hold, we roughly estimate \binom{N}{n} as (eN/n)^n and (1 − p)^{\binom{n}{2}} as e^{−p\binom{n}{2}}, hence \binom{N}{n}(1 − p)^{\binom{n}{2}} as

(eN/n)^n exp( −p\binom{n}{2} ) = ( eN/(n e^{p(n−1)/2}) )^n.

We know from the last chapter that r(4, n) ≤ (1 + o(1)) n³/(log n)², so e^{p(n−1)/2} should be n^{a+o(1)} for some constant a, and we take p = (c_1 log n)/(n − 1). Then

\binom{N}{4} p⁶ ∼ (1/24) N⁴ (c_1 log n/n)⁶ = (c_1⁶/24) ( N (log n/n)^{3/2} )⁴ < 1,

so N ∼ c_2 (n/log n)^{3/2} for some constant c_2.

Formally, we let p = (c_1 log n)/(n − 1) and N = ⌊c_2 (n/log n)^{3/2}⌋, where c_1 and c_2 are positive constants to be chosen with c_1⁶ c_2⁴ < 24. Then

\binom{N}{4} p⁶ < (N⁴/24) p⁶ ≤ (c_1⁶ c_2⁴/24) ( n/(n − 1) )⁶ ≤ c_3

for large n, where c_3 < 1 is a constant. For the second term, we estimate (1 − p)^{\binom{n}{2}} < e^{−pn(n−1)/2} = n^{−c_1 n/2} and

\binom{N}{n} (1 − p)^{\binom{n}{2}} < (eN/n)^n n^{−c_1 n/2} = ( eN/n^{1+c_1/2} )^n.

In order to make the above tend to zero, we have to take c_1 ≥ 1. On the other hand, in order to take c_2 as large as possible subject to c_1⁶ c_2⁴ < 24, we have to take c_1 as small as possible; so it has to be c_1 = 1. Now we may hope to optimize the constant c_2: since we only need c_2 < 24^{1/4}, any c_2 = 24^{1/4} − ε will do. Thus we have

r(4, n) ≥ (24^{1/4} − o(1)) (n/log n)^{3/2}.

Hereafter we will choose p with some foresight. For general m ≥ 4, taking p = (m − 3) log n/(n − 1), a similar calculation yields

r(m, n) ≥ c (n/log n)^{(m−1)/2}.

One often replaces log n/(n − 1) in the expression for p by log n/n.

We have seen that the properties of G_p are sensitive to the value of p. To ensure that G_p contains no K_m (with positive probability; more precisely, that \binom{N}{m} p^{\binom{m}{2}} is small), it is better to take p smaller. But it is better to take p bigger to ensure that there is no independent set of size n (i.e., that \binom{N}{n}(1 − p)^{\binom{n}{2}} is small). Our task is to balance the two sides so as to obtain N as large as possible.

We shall improve the lower bounds obtained for r(n, n) and r(m, n) by proofs using the so-called deletion method.

Page 21: Topics in Probabilistic Method - jsnu.edu.cnmathsqxx.jsnu.edu.cn/_upload/article/08/2c/e24807884307ab72b309… · Topics in Probabilistic Method Selected mainly from Introduction

18 CHAPTER 2. BASIC PROBABILISTIC METHOD

Theorem 2.3 As n → ∞,

r(n, n) ≥ (1 − o(1)) (n/e) 2^{n/2}.

Proof. Consider the random graphs in G(N, 1/2). Let X be the number of cliques or independent sets of size n. Then

X = ∑ X_S,

the sum over all n-sets S, where X_S is the indicator random variable of the event A_S that S induces a clique or an independent set; that is, X_S = 1 if S induces K_n or K̄_n, and X_S = 0 otherwise. Therefore

E[X_S] = Pr[A_S] = 2·(1/2)^{\binom{n}{2}},

and by linearity of expectation,

E[X] = ∑ E[X_S] = \binom{N}{n} 2^{1−\binom{n}{2}}.

There is a point of the probability space at which X does not exceed its expectation; that is, there is a graph with at most \binom{N}{n} 2^{1−\binom{n}{2}} sets S inducing a K_n or a K̄_n. Fix that graph. From each such S, select a vertex x ∈ S arbitrarily and delete it from the vertex set. The remaining vertex set V* spans neither K_n nor K̄_n. Thus

r(n, n) > |V*| ≥ N − E[X].

It remains to choose N so that |V*| is as large as possible. Taking N = ⌊n 2^{n/2}/e⌋, Stirling's formula gives

\binom{N}{n} 2^{1−\binom{n}{2}} < (eN/n)^n 2^{1−\binom{n}{2}} < 2 ( e√2 N/(n 2^{n/2}) )^n ≤ 2^{n/2+1},

which is o(N). Thus r(n, n) ≥ (1 − o(1))N. □
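Comparing the two choices of N numerically shows what the deletion step buys over the plain union bound. A small check (mine; exact binomials, float powers of two):

```python
import math

def expected_bad_sets(N, n):
    # E[X] = binom(N, n) * 2^(1 - n(n-1)/2)
    return math.comb(N, n) * 2.0 ** (1 - n * (n - 1) // 2)

n = 20
N_union = math.floor(n / (math.e * math.sqrt(2)) * 2 ** (n / 2))  # Theorem 2.1
N_del = math.floor(n / math.e * 2 ** (n / 2))                     # Theorem 2.3
survivors = N_del - expected_bad_sets(N_del, n)  # vertices left after deletion
print(N_union, N_del, round(survivors))
```

Even after deleting one vertex per bad n-set, the surviving vertex count exceeds the union-bound value of N, matching the √2 gain between the two theorems.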


Theorem 2.4 For any positive integers m, n and N, and any real number 0 < p < 1,

r(m, n) > N − \binom{N}{m} p^{\binom{m}{2}} − \binom{N}{n} (1 − p)^{\binom{n}{2}}.

Consequently, for some constant c > 0,

r(m, n) ≥ c (n/log n)^{m/2} for all large n.

Proof. The first assertion is obvious. For the second, set N = ⌊a(n/log n)^{m/2}⌋ and p = (m − 2) log n/(n − 1), with a > 0 chosen so that

a − (m − 2)^{\binom{m}{2}} a^m/m! > 0;

this is possible since a^m = o(a) as a → 0⁺. Then

N_1 = \binom{N}{m} p^{\binom{m}{2}} ∼ ( (m − 2)^{\binom{m}{2}} a^m/m! ) (n/log n)^{m/2},

and

N_2 = \binom{N}{n} (1 − p)^{\binom{n}{2}} < (eN/n)^n e^{−pn(n−1)/2} = ( eN/n^{m/2} )^n → 0.

So if c < a − (m − 2)^{\binom{m}{2}} a^m/m!, then

r(m, n) ≥ N − N_1 − N_2 > c (n/log n)^{m/2}. □

2.3 Label Vertices Randomly

We proved the Turán bound α(G) ≥ ∑_v 1/(1 + d(v)) in Lecture 1 by labeling vertices randomly. Let us prove another result in a similar way, after introducing some notions. Given a graph G = (V, E), set

N_i(v) = {w ∈ V : d(w, v) = i},


which is the set of all vertices at distance i from v in G, and set d_i(v) = |N_i(v)|. Thus d_0(v) = 1 and d_1(v) = d(v). We do not distinguish the subset N_i(v) from the subgraph of G induced by N_i(v) when there is no danger of confusion. The graph G is called (m, k)-colorable if N_i(v) is k-colorable for every vertex v and every i ≤ m; that is, there is an assignment of k colors to the vertices of N_i(v) such that no two adjacent vertices receive the same color.

Theorem 2.5 Let m \ge 2 and k \ge 1 be integers and let G = (V, E) be an (m, k)-colorable graph. Then

\alpha(G) \ge c\left(\sum_{v\in V} d(v)^{1/(m-1)}\right)^{(m-1)/m},

where c = \frac{1}{k\,2^{(m-1)/m}}. So if G is d-regular, then \alpha(G) \ge c|V|^{1-1/m}d^{1/m}.

Lemma 2.4 Let G = (V, E) be a (1, k)-colorable graph. Then

\alpha(G) \ge \frac{1}{k}\sum_{v\in V}\frac{d_1(v)}{1 + d_1(v) + d_2(v)}.

Proof. Randomly label the vertices of G with a permutation of the integers 1, 2, ..., N, where N = |V|. Let X be the set of all vertices v such that the minimum label on the vertices in \{v\} \cup N_1(v) \cup N_2(v) is on a vertex of N_1(v). The probability that a vertex v belongs to X is d_1(v)/(1 + d_1(v) + d_2(v)), so the expected size of X is \sum_{v\in V} d_1(v)/(1 + d_1(v) + d_2(v)). It follows that for some fixed permutation of the integers from 1 to N, we have |X| \ge \sum_{v\in V} d_1(v)/(1 + d_1(v) + d_2(v)). We aim to prove that there is an independent set in this X of size at least |X|/k.

To this end, we define a relation R on X as follows. For u, v \in X, say that u and v satisfy the relation R if the minimum label on \{u\} \cup N_1(u) \cup N_2(u) is precisely the same as that on the vertices in \{v\} \cup N_1(v) \cup N_2(v). Clearly R is an equivalence relation, and thus X can be partitioned into equivalence classes X_1, X_2, ..., X_p for some positive integer p. For each 1 \le i \le p, by the definition of the relation R, all vertices in X_i share a common neighbor v_i, namely the vertex such that for any u \in X_i, the label of v_i is the minimum label on the vertices in \{u\} \cup N_1(u) \cup N_2(u). Hence


X_i \subseteq N_1(v_i), and clearly v_i \ne v_j for i \ne j. We claim that there is no edge between X_i and X_j whenever 1 \le i \ne j \le p. To justify this, assume the contrary: some w_i \in X_i is adjacent to some w_j \in X_j. Since w_i \in X and v_j \in N_1(w_i) \cup N_2(w_i), by the definition of X the label on v_i is less than that on v_j. Similarly, by considering w_j, we conclude that the label on v_j is less than that on v_i, yielding a contradiction.

Since X_i \subseteq N_1(v_i) is k-colorable for each 1 \le i \le p, there is an independent set Y_i in X_i with |Y_i| \ge |X_i|/k. It follows from the above claim that \cup_{i=1}^p Y_i is an independent set of size at least \sum_{i=1}^p |X_i|/k = |X|/k, as desired. 2

For a triangle-free graph G, the proof is easier since X itself isindependent, so there is no need to introduce the equivalence relation.
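As a concrete sanity check of Lemma 2.4 in the triangle-free case (so k = 1), one can carry out the random labeling on a small graph. The sketch below uses the Petersen graph, a choice of ours rather than of the text; since its diameter is 2, the radius-2 ball around any vertex is the whole vertex set, so X is exactly the neighborhood of the vertex carrying the minimum label, and |X| = 3 = \sum_v d_1(v)/(1 + d_1(v) + d_2(v)) in every trial.

```python
import random

# Petersen graph: outer 5-cycle 0..4, inner pentagram 5..9, spokes i -- i+5.
edges = ([(i, (i + 1) % 5) for i in range(5)]
         + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
         + [(i, i + 5) for i in range(5)])
adj = {v: set() for v in range(10)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def ball2(v):
    """Vertices at distance at most 2 from v."""
    return {v} | adj[v] | {w for u in adj[v] for w in adj[u]}

def labeled_X(perm):
    """X of Lemma 2.4: vertices v whose minimum label over the radius-2 ball
    is attained on a vertex of N_1(v)."""
    label = {v: i for i, v in enumerate(perm)}
    return {v for v in range(10) if min(ball2(v), key=label.get) in adj[v]}

rng = random.Random(0)
for _ in range(100):
    perm = list(range(10))
    rng.shuffle(perm)
    X = labeled_X(perm)
    assert len(X) == 3                                  # = sum_v d1/(1+d1+d2) here
    assert all(u not in adj[v] for u in X for v in X)   # X itself is independent
```

Triangle-freeness makes the neighborhood of the minimum-label vertex independent, which is exactly the remark above.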

Lemma 2.5 Let G = (V, E) be an (m, k)-colorable graph. Then for any 1 \le \ell \le m+1, we have

\alpha(G) \ge \frac{1}{2k}\sum_{v\in V}\frac{1 + d_1(v) + \cdots + d_{\ell-1}(v)}{1 + d_1(v) + \cdots + d_\ell(v)}.

Proof. The proof goes along the same lines as that of the preceding lemma, so we only give a sketch here.

Randomly label the vertices of G with a permutation of the integers 1, 2, ..., N, where N = |V|. Let X be the set of all vertices v such that the minimum label on the vertices in \cup_{j=0}^{\ell} N_j(v) is on a vertex of \cup_{j=0}^{\ell-1} N_j(v). Then for some fixed permutation of the integers from 1 to N, we have

|X| \ge \sum_{v\in V}\frac{1 + d_1(v) + \cdots + d_{\ell-1}(v)}{1 + d_1(v) + \cdots + d_\ell(v)}.

We aim to prove that there is an independent set in this X of size at least |X|/(2k).

To this end, define an equivalence relation R on X such that u and v satisfy R if the minimum label on the vertices in \cup_{j=0}^{\ell} N_j(u) is precisely the same as that on the vertices in \cup_{j=0}^{\ell} N_j(v). Then X can be partitioned into equivalence classes X_1, X_2, ..., X_p for some


integer p \ge 1. It can be shown that for each 1 \le i \le p, there exists a vertex v_i such that

• the distance between each vertex in X_i and v_i is at most \ell - 1; and

• v_i is the vertex with the minimum label in \cup_{j=0}^{\ell} N_j(v) for each v \in X_i.

Based on these v_i, we can deduce that there is no edge between X_i and X_j whenever 1 \le i \ne j \le p. Now partition each X_i into subsets X_{i,j}, 0 \le j \le \ell - 1, such that the distance between every vertex in X_{i,j} and v_i is j. Each X_{i,j} contains an independent set Y_{i,j} of size at least |X_{i,j}|/k, where 1 \le i \le p and 0 \le j \le \ell - 1. Thus one of \cup_{i=1}^p \cup_{\text{odd } j} Y_{i,j} and \cup_{i=1}^p \cup_{\text{even } j} Y_{i,j} is an independent set of size at least \frac{1}{2k}\left|\cup_{i=1}^p \cup_{j=0}^{\ell-1} X_{i,j}\right| = \frac{|X|}{2k}, completing the proof. 2

Proof of Theorem 2.5. Applying Lemma 2.4 and Lemma 2.5 repeatedly, we have

\alpha(G) \ge \frac{1}{k(m-1)}\sum_{v\in V}\left(\frac{d_1(v)}{1+d_1(v)+d_2(v)} + \frac{1+d_1(v)+d_2(v)}{2(1+d_1(v)+d_2(v)+d_3(v))} + \cdots + \frac{1+d_1(v)+\cdots+d_{m-1}(v)}{2(1+d_1(v)+\cdots+d_m(v))}\right).

Since the arithmetic mean is no less than the geometric mean, we obtain

\alpha(G) \ge \frac{1}{k2^{(m-2)/(m-1)}}\sum_{v\in V}\left(\frac{d_1(v)}{1+d_1(v)+\cdots+d_m(v)}\right)^{1/(m-1)}.

By the condition that N_i(v) is k-colorable, there is an independent set in N_i(v) of size \alpha_i(v) \ge d_i(v)/k, and by the fact that there is no edge between N_i(v) and N_j(v) whenever i \ne j and i - j \equiv 0 \pmod 2,

2\alpha(G) \ge 1 + \alpha_1(v) + \cdots + \alpha_m(v) \ge \frac{1}{k}[1 + d_1(v) + \cdots + d_m(v)].

Therefore

\alpha(G) \ge \frac{1}{k2^{(m-2)/(m-1)}}\sum_{v\in V}\left(\frac{d_1(v)}{2k\alpha(G)}\right)^{1/(m-1)},


and the desired statement follows by solving this inequality for \alpha(G). 2

Theorem 2.5 can be used to show that r(C_{2m+1}, K_n) \le c\left(\frac{n^{m+1}}{\log n}\right)^{1/m} for large n. We give the proof for the case m = 2; the proof of the general case is similar.

Theorem 2.6 Let n \ge e^{e^2}. Then

r(C_5, K_n) \le \frac{6n^{3/2}}{\sqrt{\log n}}.

Proof. Let G be a graph on N = r(C_5, K_n) - 1 vertices that contains no C_5 and satisfies \alpha(G) \le n - 1. We consider two cases depending on the value of d, the average degree of G. Note that if G is C_5-free, then G is (2, 3)-colorable.

Case 1. d > 3\sqrt{n\log n}. By Theorem 2.5, we have \alpha(G) \ge \sqrt{Nd/18}. It follows that n - 1 > \sqrt{3N\sqrt{n\log n}/18}, implying

N + 1 \le \frac{6(n-1)^2}{\sqrt{n\log n}} + 1 \le \frac{6n^{3/2}}{\sqrt{\log n}}.

Case 2. d \le 3\sqrt{n\log n}. Since G is C_5-free, no neighborhood in G contains a path of length 3, so according to a theorem in Lecture 1 we have

n - 1 \ge N\,\frac{\log(3\sqrt{n\log n}/3) - 1}{3\sqrt{n\log n}} \ge \frac{N\sqrt{\log n}}{6\sqrt{n}}.

It follows that

N + 1 \le \frac{6(n-1)\sqrt{n}}{\sqrt{\log n}} + 1 \le \frac{6n^{3/2}}{\sqrt{\log n}},

completing the proof. 2

2.4 Pick Vertices Randomly

A dominating set of a graph G = (V,E) is a set U ⊆ V such thatU ∪ N(U) = V , where N(U) = ∪u∈UN(u). The domination number


is the smallest cardinality among all dominating sets of G. Let \beta(G) denote the domination number of G. Clearly

\alpha(G) \ge \beta(G)

since any maximal independent set is a dominating set. We would like to ask what G looks like if G is triangle-free and \alpha(G) and \beta(G) are close.

Recall that for any triangle-free graph G with N vertices and average degree d, we have \alpha(G) \ge Nf(d), where f(x) \sim \log x/x as x \to \infty. A similar bound for the domination number, due to Alon (1990), is as follows; the function involved has the same asymptotic form. We shall write N[v] for N(v) \cup \{v\} and N[X] for X \cup N(X).

Theorem 2.7 Let G be a graph with N vertices and minimum degree \delta \ge 1. Then

\beta(G) \le N\,\frac{1 + \log(\delta+1)}{\delta+1}.

Proof. Pick each vertex v of G randomly and independently with probability p = \log(\delta+1)/(\delta+1). Let X be the random set of picked vertices and let Y = V \setminus N[X]. The set X \cup Y is clearly a dominating set of G, and some outcome achieves cardinality at most the expected value of |X| + |Y|. The expected value of |X| is Np. A vertex v belongs to Y if and only if neither v nor any of its neighbors belongs to X, so

E(|Y|) = \sum_{v\in V}(1-p)^{1+d(v)} \le N(1-p)^{1+\delta}.

Thus

E(|X| + |Y|) \le N\left[p + (1-p)^{1+\delta}\right].

Since

(1-p)^{\delta+1} \le e^{-p(\delta+1)} = e^{-\log(\delta+1)} = \frac{1}{\delta+1},

we have

E(|X| + |Y|) \le N\,\frac{1 + \log(\delta+1)}{\delta+1},

as desired. 2
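The random choice in this proof is easy to simulate. The sketch below (the cycle C_10, with \delta = 2, is our illustrative choice, not from the text) draws X with p = \log(\delta+1)/(\delta+1), completes it with the undominated leftovers Y, and checks that the average size stays below N(1 + \log(\delta+1))/(\delta+1):

```python
import math
import random

def random_dominating_set(adj, p, rng):
    """One round of the proof: a random X, plus the undominated leftovers
    Y = V \\ N[X]; the union is always a dominating set."""
    X = {v for v in adj if rng.random() < p}
    dominated = X | {w for v in X for w in adj[v]}
    return X | (set(adj) - dominated)

n, delta = 10, 2                                   # the cycle C_10
adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
p = math.log(delta + 1) / (delta + 1)

rng = random.Random(1)
sizes = []
for _ in range(4000):
    D = random_dominating_set(adj, p, rng)
    assert all(v in D or adj[v] & D for v in adj)  # always dominating
    sizes.append(len(D))

avg = sum(sizes) / len(sizes)
bound = n * (1 + math.log(delta + 1)) / (delta + 1)
print(avg, bound)
```

The average hovers around E(|X|+|Y|) = Np + N(1-p)^{1+\delta}, strictly below the cruder bound of the theorem.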


In many cases, the probabilistic method supplies effective randomized algorithms for various problems. In some cases, these algorithms can be converted into deterministic ones. The aim of derandomization is to convert probabilistic proofs of the existence of combinatorial structures into efficient deterministic algorithms that actually construct them. The following proof of Alon (1990) can be viewed as a derandomization of the proof of the above theorem.

A constructive proof of Theorem 2.7. Let us run the following greedy algorithm. At the initial step, let X = \{x\}, where deg(x) = \Delta(G), and let Y = V \setminus N[X], where V = V(G); the set Y consists of the vertices not dominated by X. At a general step, we select a vertex z \in V \setminus X that dominates the most vertices of Y, and enlarge X by adding z. We claim that z dominates at least (\delta+1)|Y|/N vertices of Y. Indeed, note the trivial fact that

\sum_{y\in Y}|N[y]| \ge (\delta+1)|Y|,

in which the sum counts vertices of V \setminus X only. On average, each vertex in V \setminus X is counted

\frac{1}{N - |X|}\sum_{y\in Y}|N[y]| > \frac{(\delta+1)|Y|}{N}

times by the sets N[y] in the sum, so some vertex z \in V \setminus X appears at least (\delta+1)|Y|/N times, proving the claim.

We iteratively select a vertex of V \setminus X that dominates the most vertices of Y. After each selection, the fraction of vertices of Y that remain undominated is at most 1 - (\delta+1)/N. Hence after N\log(\delta+1)/(\delta+1) steps, the number of remaining vertices is at most

N\left(1 - \frac{\delta+1}{N}\right)^{N\log(\delta+1)/(\delta+1)} < Ne^{-\log(\delta+1)} = \frac{N}{\delta+1}.

The selected vertices and these remaining vertices together form a dominating set of size at most N(1 + \log(\delta+1))/(\delta+1). 2
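The greedy procedure of the constructive proof can be written down directly. The sketch below follows those steps, again on C_10 as an illustration of our own choosing:

```python
import math

def greedy_dominating_set(adj):
    """The derandomized greedy: start at a maximum-degree vertex, then keep
    adding the vertex that dominates the most still-undominated vertices."""
    V = set(adj)
    x = max(V, key=lambda v: len(adj[v]))
    X, Y = {x}, V - ({x} | adj[x])
    while Y:
        z = max(V - X, key=lambda v: len(({v} | adj[v]) & Y))
        X.add(z)
        Y -= {z} | adj[z]
    return X

n = 10                                             # again C_10, so delta = 2
adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
D = greedy_dominating_set(adj)
assert all(v in D or adj[v] & D for v in adj)
# guarantee of the theorem: |D| <= N(1 + log(delta+1))/(delta+1), about 6.99
assert len(D) <= n * (1 + math.log(3)) / 3
print(sorted(D))
```

On the cycle the greedy set comes out well under the guarantee, since each new vertex covers up to three undominated ones.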


Chapter 3

The Lovasz Local Lemma

Laszlo Lovasz (born March 9, 1948; recipient of the 1999 Wolf Prize) is a Hungarian-American mathematician, best known for his work in combinatorics, who served as president of the International Mathematical Union between January 1, 2007 and December 31, 2010.

In probability theory, if a large number of events are all independent of one another and each has probability less than 1, then there is a positive (possibly small) probability that none of the events occurs. The Lovasz local lemma (a weaker version was proved in 1975 by Lovasz and Erdos) allows one to relax the independence condition slightly: as long as the events are "mostly" independent of one another and are not individually too likely, there will still be a positive probability that none of them occurs. It is most commonly used in the probabilistic method, in particular to give existence proofs. Differing from the "almost all" argument, we are concerned with the existence of an event of small positive probability. There are several versions of the lemma. The simplest and most frequently used is the symmetric version; the general form is more complicated and can be used to improve some results of the previous chapters or to simplify their proofs. The last section is a brief description of a constrained random graph process, which makes the desired event appear with larger probability in the constrained process.


3.1 The local lemma

Let A_1, A_2, ..., A_n be events in a probability space (\Omega, \mathcal{F}, Pr), and write \bar{A}_i for the complement of A_i. In combinatorial applications, such as a coloring of the edges of K_N, each A_i is a "bad" event. We wish that no "bad" event happens, that is to say, we wish to show

Pr(\cap_i \bar{A}_i) > 0 (3.1)

so that there is a point (a coloring) which is good. In the proofs in Chapter 4 of the lower bound for r(n, n), A_S is the event that S is monochromatic, for an n-set S. The event A_S is "bad". Then \cap_S \bar{A}_S is the event that no n-set is monochromatic, and Pr(\cap_S \bar{A}_S) > 0 means that there must be a coloring in which no "bad" thing happens. That is, there is an edge coloring of K_N, which determines a graph G, such that G contains no K_n and whose complement contains no K_n either.

It is a trivial fact that if each Pr(A_i) is so small that

\sum_{i=1}^n Pr(A_i) < 1,

then

Pr(\cap_i \bar{A}_i) = 1 - Pr(\cup_i A_i) \ge 1 - \sum_i Pr(A_i) > 0.

On the other hand, if A_1, ..., A_n are mutually independent events, i.e., each A_i is independent of any Boolean function of the other A_j, and Pr(A_i) = x_i < 1 for 1 \le i \le n, then

Pr(\cap_i \bar{A}_i) = \prod_{i=1}^n (1 - x_i) > 0,

with no further restriction on the probabilities Pr(A_i).

The Lovasz Local Lemma may be understood as taking advantage of partial independence of the events A_1, A_2, ..., A_n, so that (3.1) can be ensured with far weaker bounds on the probabilities Pr(A_i) than those needed for \sum_i Pr(A_i) < 1.

The argument to follow uses conditional probability. Recall that for events A and B with Pr(B) > 0, the conditional probability Pr(A|B) is given by

Pr(A|B) = \frac{Pr(A \cap B)}{Pr(B)}.


If A and B are independent, then Pr(A|B) = Pr(A). By admitting that an event of zero probability is independent of any other event, two events A and B are independent if and only if Pr(A \cap B) = Pr(A) Pr(B).

Let us introduce a graph to describe the dependency of events. A graph D on the vertices [n] = \{1, 2, ..., n\} (the set of indices of the events A_i) is called a dependency graph for the events A_1, A_2, ..., A_n if for every i, the event A_i is mutually independent of all A_j with ij \notin E(D) and j \ne i. That is, A_i is independent of any Boolean function of the events in \{A_j : j \notin N[i]\}. Such a graph must contain an edge between every pair of dependent events, and in most applications it contains only such edges, whence the term dependency graph. The original local lemma is as follows.

Theorem 3.1 If each of the events A_1, A_2, ..., A_n has probability at most p, each vertex in the dependency graph has degree at most d \ge 1, and

4dp \le 1,

then Pr(\cap_{i=1}^n \bar{A}_i) > 0.

The following form of the Lovasz Local Lemma is called its generalform, see Spencer (1977).

Theorem 3.2 Let A_1, A_2, ..., A_n be events in a probability space (\Omega, \mathcal{F}, Pr). Suppose that there exist real numbers x_1, x_2, ..., x_n such that 0 < x_i < 1 and, for i = 1, 2, ..., n,

Pr(A_i) \le x_i \prod_{j:\, ij \in E(D)} (1 - x_j).

Then Pr(\cap_{i=1}^n \bar{A}_i) \ge \prod_{i=1}^n (1 - x_i) > 0.

We remark that if i is an isolated vertex in D, that is, the event Aiis mutually independent of all other events, then the neighborhood ofvertex i in D is empty, and Πj∈∅(1− xj) = 1.

Proof. The desired result follows directly from the following claim.

Claim. For S \subset [n], set

C_S = \cap_{j\in S} \bar{A}_j.


(For S = \emptyset, we take C_S to be \Omega.) If i \notin S, then

Pr(A_i | C_S) \le x_i.

Proof of the claim. The proof is by induction on |S|. If |S| = 0, the desired result is immediate, since the hypothesis of the local lemma yields

Pr(A_i|C_S) = Pr(A_i|\Omega) = Pr(A_i) \le x_i \prod_{j:\, ij\in E(D)}(1 - x_j) \le x_i.

Now assume that |S| \ge 1 and form a partition S = S_1 \cup S_2, where

S_1 = \{j \in S : ij \in E(D)\} and S_2 = S \setminus S_1.

Let us write Pr(A_i|C_S) as

\frac{Pr(A_i \cap C_S)}{Pr(C_S)} = \frac{Pr(A_i \cap C_{S_1} \cap C_{S_2})}{Pr(C_{S_1} \cap C_{S_2})} = \frac{Pr(A_i \cap C_{S_1}|C_{S_2})}{Pr(C_{S_1}|C_{S_2})},

and bound the numerator and denominator separately. First, since A_i and C_{S_2} are independent,

Pr(A_i \cap C_{S_1}|C_{S_2}) \le Pr(A_i|C_{S_2}) = Pr(A_i) \le x_i \prod_{j\in S_1}(1 - x_j).

To bound the denominator, we use the induction hypothesis. If |S_1| = 0, then

Pr(C_{S_1}|C_{S_2}) = Pr(\Omega|C_{S_2}) = 1

and the claim follows. Otherwise, suppose S_1 = \{j_1, j_2, ..., j_r\}, where r \ge 1. Let D_0, D_1, ..., D_r be the events defined recursively by

D_0 = C_{S_2} = \cap_{j\in S_2}\bar{A}_j

and, for k = 1, 2, ..., r,

D_k = D_{k-1} \cap \bar{A}_{j_k} = \left(\cap_{j\in S_2}\bar{A}_j\right) \cap \left(\cap_{t=1}^k \bar{A}_{j_t}\right).

They start with D_0 = C_{S_2} and end with D_r = C_S. Note that for each k = 0, 1, ..., r - 1, the event D_k has the form C_T for some set T \subseteq S


with |T| < |S|, and that Pr(\bar{A}_j|D_k) = 1 - Pr(A_j|D_k). Using the induction hypothesis on C_T repeatedly, we have

Pr(C_{S_1}|C_{S_2}) = \frac{Pr(C_S)}{Pr(D_0)} = \frac{Pr(D_r)}{Pr(D_0)} = \frac{Pr(D_r)}{Pr(D_{r-1})} \cdots \frac{Pr(D_1)}{Pr(D_0)}

= Pr(\bar{A}_{j_r}|D_{r-1}) \cdots Pr(\bar{A}_{j_1}|D_0)

= (1 - Pr(A_{j_r}|D_{r-1})) \cdots (1 - Pr(A_{j_1}|D_0))

\ge \prod_{j\in S_1}(1 - x_j).

Combining this with the bound on the numerator, we have established the claim.

Note that \cap_{i=k+1}^n \bar{A}_i has the form C_S with k \notin S. In view of the claim just established,

Pr(\cap_{i=1}^n \bar{A}_i) = Pr(\bar{A}_1 | \cap_{i=2}^n \bar{A}_i)\, Pr(\cap_{i=2}^n \bar{A}_i)

= Pr(\bar{A}_1|\cap_{i=2}^n \bar{A}_i)\, Pr(\bar{A}_2|\cap_{i=3}^n \bar{A}_i) \cdots Pr(\bar{A}_n|\Omega)

= (1 - Pr(A_1|\cap_{i=2}^n \bar{A}_i)) \cdots (1 - Pr(A_n|\Omega))

\ge \prod_{i=1}^n (1 - x_i).

This completes the proof of the local lemma. 2

Let us call the following form of the local lemma the symmetric form.

Theorem 3.3 If each of the events A_1, ..., A_n has probability at most p, each vertex in the dependency graph has degree at most d, and

e(d+1)p \le 1,

where e is the base of the natural logarithm, then Pr(\cap_{i=1}^n \bar{A}_i) > 0.

Proof. Taking x_i = 1/(d+1) for i = 1, 2, ..., n, we shall show that

Pr(A_i) \le x_i \prod_{j:\, ij\in E(D)}(1 - x_j).

Indeed, for any i the right-hand side is at least

\frac{1}{d+1}\left(1 - \frac{1}{d+1}\right)^d > \frac{1}{e(d+1)} \ge p,


where the first inequality follows from the fact that \left(1 + \frac{1}{k}\right)^k < e. 2

Note that the original condition 4dp \le 1 can be recovered from Theorem 3.3: for d \ge 3 we have 4dp \ge e(d+1)p, so 4dp \le 1 implies e(d+1)p \le 1; and for d = 1, 2, if 4dp \le 1 then \frac{1}{d+1}\left(1 - \frac{1}{d+1}\right)^d \ge p holds directly.

We also need the following form of the local lemma, due to Spencer, who used it to obtain his lower bound for non-diagonal classical Ramsey numbers. This form is slightly more convenient for some applications.

Corollary 3.1 Let A_1, A_2, ..., A_n be events in a probability space. If there exist numbers y_1, y_2, ..., y_n such that for each i, 0 < y_i Pr(A_i) < 1, and

\log y_i \ge -\sum_{j:\, ij\in E(D)} \log(1 - y_j Pr(A_j)),

then Pr(\cap_i \bar{A}_i) > 0.

Proof. We may suppose that each probability Pr(A_i) is positive. Let x_i be as in the general form of the local lemma and set y_i = x_i/Pr(A_i) for i = 1, 2, ..., n. The hypothesis of the local lemma

Pr(A_i) \le x_i \prod_{j:\, ij\in E(D)}(1 - x_j)

then takes the form

y_i \ge \prod_{j:\, ij\in E(D)} \frac{1}{1 - y_j Pr(A_j)}.

The assertion follows by taking logarithms on both sides of this inequality. 2

Let us give an example to explain that, in the local lemma, the dependency graph D may have to contain edges even between events that are pairwise independent: pairwise independence is not the same as mutual independence.

Let \{1, 2, 3\} be the vertex set of a K_3, and let the probability space \Omega consist of all 2-colorings of the vertices, in which each vertex is colored red or blue with probability 1/2, randomly and independently. Then |\Omega| = 8. For i < j, let A_{ij} be the event that the edge \{i, j\} is monochromatic. Clearly Pr(A_{12}) = Pr(A_{13}) = Pr(A_{23}) = 1/2. Also the events A_{12}, A_{13}, A_{23} are pairwise independent, as

Pr(A_{12}A_{13}) = Pr(A_{12}A_{23}) = Pr(A_{13}A_{23}) = \frac{1}{4}.


If we mistakenly used the local lemma with E(D) = \emptyset, we would set x_{ij} = 1/2, so that Pr(A_{ij}) \le x_{ij}\prod_{\emptyset}(1 - x_{k\ell}), and we would reach the wrong conclusion, via Pr(\cap \bar{A}_{ij}) > 0, that \chi(K_3) is at most two.
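The eight colorings can be enumerated to confirm both halves of this cautionary example: the three events are pairwise independent, yet their complements can never occur simultaneously, so the empty graph is not a valid dependency graph for them. A small enumeration sketch:

```python
from itertools import product

omega = list(product("rb", repeat=3))     # the 8 equiprobable 2-colorings of V(K_3)
pairs = [(0, 1), (0, 2), (1, 2)]
mono = {e: (lambda c, i=e[0], j=e[1]: c[i] == c[j]) for e in pairs}

def prob(event):
    """Probability of an event (a predicate on colorings) in the uniform space."""
    return sum(1 for c in omega if event(c)) / len(omega)

# Each A_ij has probability 1/2 and the three events are pairwise independent ...
assert all(prob(mono[e]) == 1 / 2 for e in pairs)
assert all(prob(lambda c: mono[e](c) and mono[f](c)) == 1 / 4
           for e in pairs for f in pairs if e != f)

# ... yet three vertices in two colors must repeat a color on some edge,
# so the intersection of the complements has probability 0, not (1/2)^3.
assert prob(lambda c: not any(mono[e](c) for e in pairs)) == 0
```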

Erdos and Spencer (1991) pointed out that the dependency graph D can be replaced by a graph F on [n] if F satisfies, for each i and each S \subseteq [n] \setminus N_F[i],

Pr(A_i \mid \cap_{j\in S} \bar{A}_j) \le x_i \prod_{j:\, ij\in E(F)}(1 - x_j).

This condition involves conditional probabilities. To avoid computing them in applications, and to obtain a slightly stronger form, we shall develop their idea further. Let us first recall the following definitions from Erdos and Spencer (1991) and Lu and Szekely (2007).

Let A and B be events. Then B is said to be positive or negative to A if Pr(A|B) \ge Pr(A) or Pr(A|B) \le Pr(A), respectively. When the inequality is strict, B is said to be strictly positive or strictly negative to A, respectively. For a set S of indices of events, we write

C_S = \cap_{j\in S}\bar{A}_j,

where C_S is taken to be \Omega if S = \emptyset. A graph F on [n] (the set of indices of the events) is called a negative graph of the events A_1, A_2, ..., A_n if for every i and any set S \subseteq [n] \setminus N[i], the event C_S is negative to A_i, where N[i] is the closed neighborhood of i in F.

We call F a negative graph since it is the "real" negative graph in most applications, containing edges ij such that A_i and A_j are strictly negative to each other.

Note that a dependency graph is a negative graph, but the latter may contain fewer edges, and the dependency graph in the local lemma can be replaced by any negative graph.

It is easy to verify that the local lemma holds if we replace the dependency graph by a negative graph.

Property 3.1 The local lemma holds when the dependency graph is replaced by a negative graph of the events.

Since negative graphs are bipartite in most applications, the local lemma with a negative graph is often easier to apply.


Let us remark that the dependency of events can be described by a directed graph instead of an undirected one. A directed graph D on the vertices [n] = \{1, 2, ..., n\} is called a directed dependency graph for the events A_1, A_2, ..., A_n if each event A_i is mutually independent of the events in \{A_j : j \notin N^+[i]\}, where N^+[i] is the closed out-neighborhood of i. Then the condition guaranteeing Pr(\cap_i \bar{A}_i) > 0 is that there exist 0 < x_1, x_2, ..., x_n < 1 such that

Pr(A_i) \le x_i \prod_{j:\, (i,j)\in E(D)}(1 - x_j),

where (i, j) denotes the arc from i to j in the directed dependency graph D for the events.

Similarly, the negative graph in the local lemma can be replaced bya directed negative graph.

However, whether one uses a negative graph or a directed dependency graph in the local lemma, the idea is the same: to reduce the number of edges in the dependency graph.

3.2 Applications of the local lemma

The Local Lemma is one of the important mathematical contributions of Lovasz, and in recent years it has found many applications in many fields. The local lemma was invented to prove a result on coloring of hypergraphs. Call a coloring of the vertices of a hypergraph H proper if no edge is monochromatic, and call H k-colorable if there is a proper k-coloring of its vertices. Using the original condition 4dp \le 1, Erdos and Lovasz (1975) proved that an r-uniform hypergraph H is 2-colorable if each edge of H intersects at most 2^{r-3} other edges. As the first application of the local lemma, this result has become a benchmark problem in the derandomization of the local lemma; see, e.g., Beck (1991). We now improve this result slightly.

Theorem 3.4 Let H = (V, \mathcal{E}) be an r-uniform hypergraph. If each edge of H intersects at most e^{-1}2^{r-2} other edges, then H is 2-colorable.

Proof. Set V = \{1, 2, ..., n\} and \mathcal{E} = \{e_1, e_2, ..., e_m\}. Let the probability space consist of all 2-colorings of V, in which each vertex is


colored red or blue with probability 1/2, randomly and independently. Let A_i and B_i be the events that e_i is monochromatically red and monochromatically blue, respectively. Then Pr(A_i) = Pr(B_j) = 1/2^r. Clearly the events A_i and B_j are strictly negative to each other if and only if e_i \cap e_j \ne \emptyset. Connecting exactly such pairs A_i and B_j, we obtain a negative graph F of the events A_1, ..., A_m, B_1, ..., B_m, which is bipartite. By assumption, each vertex of F has degree at most d \le e^{-1}2^{r-1}, and thus ep(d+1) \le 1 with p = 2^{-r}. So Pr\left((\cap_i \bar{A}_i) \cap (\cap_j \bar{B}_j)\right) > 0, and there is a proper vertex coloring of H in two colors. 2
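The proof is existential, but in practice a proper 2-coloring can be found by a simple resampling procedure in the spirit of the algorithmic local lemma of Moser and Tardos — a method not covered in this text, shown here only as an illustration. The small 5-uniform hypergraph below is our own example satisfying the hypothesis of Theorem 3.4:

```python
import random

def two_color(vertices, hyperedges, rng, max_rounds=100000):
    """Resample monochromatic edges until none remains (Moser-Tardos style)."""
    color = {v: rng.choice("rb") for v in vertices}
    for _ in range(max_rounds):
        bad = next((e for e in hyperedges
                    if len({color[v] for v in e}) == 1), None)
        if bad is None:
            return color
        for v in bad:                     # resample the offending edge
            color[v] = rng.choice("rb")
    raise RuntimeError("did not converge")

# A 5-uniform hypergraph whose edges each meet at most 2 others, within
# the e^{-1} 2^{r-2} ~ 2.9 allowed by Theorem 3.4.
edges = [frozenset(range(i, i + 5)) for i in (0, 4, 8, 12)]
coloring = two_color(set().union(*edges), edges, random.Random(0))
assert all(len({coloring[v] for v in e}) == 2 for e in edges)
```

Each resampling step fixes a monochromatic edge with probability 1 - 2^{1-r}, so the loop terminates quickly in practice.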

Before the above-mentioned result of Erdos and Lovasz, a similar result, proved by the basic probabilistic method, had already appeared, as follows.

Theorem 3.5 (Erdos-Selfridge) Let H = (V, \mathcal{E}) be an r-uniform hypergraph. If |\mathcal{E}| < 2^{r-1}, then H is 2-colorable.

Proof. The basic probabilistic method gives

Pr(\cup_e A_e) \le \sum_e Pr(A_e) = \frac{|\mathcal{E}|}{2^{r-1}} < 1,

where A_e is the event that the edge e is monochromatic, in the space defined as in the proof of the last theorem. 2

Let us return to our main purpose. The following theorem of Spencer (1975) improves Theorem 4.4 in Chapter 4 by another factor of \sqrt{2}. This is negligible when viewed in the light of the gap between the upper and lower bounds, but we do what we can. Progress on this difficult problem has been slow.

Theorem 3.6 As n \to \infty,

r(n,n) \ge (1 - o(1))\,\frac{\sqrt{2}}{e}\, n2^{n/2}.

Proof. Consider the random graph space G(N, 1/2), or equivalently color each edge of K_N with one of two colors with probability 1/2, randomly and independently. Let S be an n-subset of V(K_N) and let A_S signify the event "S is monochromatic", S ranging over the n-subsets. Define a graph D whose vertex set consists of all such S, and connect vertices S and T in


D if and only if |S \cap T| \ge 2. Then A_S is mutually independent of all A_T with T not adjacent to S, since those A_T give information only about edges outside of S. Hence D is a dependency graph. We apply the local lemma with

p = Pr(A_S) = 2^{1-\binom{n}{2}}.

For any S, its degree d in D can be bounded as

d = |\{T : |S \cap T| \ge 2\}| < \binom{n}{2}\binom{N}{n-2}.

If ep(d+1) < 1, then Pr(\cap_S \bar{A}_S) > 0, and thus r(n,n) > N. So we want

e\binom{n}{2}\binom{N}{n-2}2^{1-\binom{n}{2}} < 1.

As before, the left-hand side is less than

\frac{en^2}{2}\left(\frac{eN}{n-2}\right)^{n-2}\frac{2}{2^{n(n-1)/2}} = \frac{en^2}{2}\left(\frac{n}{n-2}\right)^{n-2}\left(\frac{eN}{\sqrt{2}\,n2^{n/2}}\right)^{n-2}.

For any \varepsilon > 0, if we take N = \lceil (1-\varepsilon)\frac{\sqrt{2}}{e}n2^{n/2} \rceil, then the above tends to zero. 2
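For concrete n, one can compare the largest N certified by the plain union bound \binom{N}{n}2^{1-\binom{n}{2}} < 1 with the largest N certified by the local lemma condition e\binom{n}{2}\binom{N}{n-2}2^{1-\binom{n}{2}} < 1 above. The \sqrt{2} gain is asymptotic; in the rough computation below (our own), the local lemma already wins for n = 12:

```python
from math import comb, e

def union_bound_N(n):
    """Largest N with C(N,n) 2^{1-C(n,2)} < 1 (the basic probabilistic bound)."""
    N = n
    while comb(N + 1, n) * 2 < 2 ** comb(n, 2):
        N += 1
    return N

def lll_N(n):
    """Largest N with e C(n,2) C(N,n-2) 2^{1-C(n,2)} < 1 (the condition above)."""
    N = n
    while e * comb(n, 2) * comb(N + 1, n - 2) * 2 < 2 ** comb(n, 2):
        N += 1
    return N

for n in (12, 14):
    assert lll_N(n) > union_bound_N(n)    # the local lemma certifies a larger N
print(union_bound_N(12), lll_N(12))
```

The crossover reflects the comparison \binom{n}{2}\binom{N}{n-2} versus \binom{N}{n}: the local lemma only needs to control pairs of overlapping n-sets rather than all of them.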

The first application of the general form of the local lemma was made by Spencer (1977), who gave a lower bound

r(m,n) \ge c\left(\frac{n}{\log n}\right)^{(m+1)/2},

improving that obtained in Chapter 4. Erdos, Faudree, Rousseau and Schelp (1987) and Krivelevich (1995) generalized Spencer's lower bound from K_m to a fixed graph F; Li and Zang did the same for r(F, G_n), where G_n has order n and e(G_n) = n^{2-o(1)}; Dong, Li and Lin (2009) went further. Set

\rho(F) = \frac{e(F) - 1}{v(F) - 2},

where v(F) and e(F) are the order and the size of F, respectively.

Recall the automorphism group of a graph G on n vertices, defined in Chapter 4 and denoted by A(G), for which |A(G)| \le n!.


Theorem 3.7 Let F be a fixed graph with v(F) \ge 3, and let G_n be a graph of order n with average degree d_n \to \infty. Then for all large n,

r(F, G_n) \ge c\left(\frac{d_n}{\log d_n}\right)^{\rho(F)},

where c = c(F) > 0 is a constant.

Proof. Let m = v(F) and \rho = \rho(F). Clearly we may assume that \rho > 1 and that d_n is sufficiently large. Color the edges of K_N red and blue randomly and independently, each edge red with probability p and blue with probability q = 1 - p. For each subgraph S of K_N isomorphic to F, let A_S be the event that S spans a red F. For each subgraph T of K_N isomorphic to G_n, let B_T be the event that T spans a blue G_n. Then Pr(A_S) = p^{e(F)} and Pr(B_T) = q^{e(G_n)}. Since F has m vertices, there are (N)_m/|A(F)| events of the form A_S; similarly, there are (N)_n/|A(G_n)| events of the form B_T. Obviously, a pair of distinct events are strictly negative to each other if and only if they are of different types and the corresponding subgraphs have edges in common. Any pair of events of the same type are positive to each other, and a pair of events of different types that have no common edges are independent. Hence each A event is strictly negative to at most (N)_n/|A(G_n)| < N^n of the B events, and each B event is strictly negative to at most e(G_n)(N-2)_{m-2} < e(G_n)N^{m-2} of the A events. Connecting the events of different types that have common edges, we obtain a negative graph for these events.

We aim to prove that there exist positive numbers a and b satisfying Spencer's form of the local lemma, namely that ap^{e(F)} < 1 and bq^{e(G_n)} < 1 hold with y_i = a for each A event and y_j = b for each B event, and specifically

\log a \ge -N^n \log(1 - bq^{e(G_n)}), (3.2)

\log b \ge -e(G_n)N^{m-2}\log(1 - ap^{e(F)}). (3.3)

If such a and b are available, then r(F, G_n) > N. To this end, set a = 2,

p = \frac{6\rho\log d_n}{d_n}, \quad b = \exp(\rho\, n\log d_n), \quad N = c\left(\frac{d_n}{\log d_n}\right)^{\rho},


where c = c(F) is a constant to be chosen. Using the basic inequality q = 1 - p < e^{-p} for p > 0, we have

N^n b q^{e(G_n)} \le N^n b e^{-pe(G_n)} = \exp\{n\log N + \log b - pnd_n/2\} \le \exp\{-(1 - o(1))\rho\, n\log d_n\} \to 0.

So bq^{e(G_n)} \to 0, and thus \log(1-x) \sim -x for x = bq^{e(G_n)}; hence the right-hand side of (3.2) tends to zero, and (3.2) holds for all large n.

Note that the right-hand side of (3.3) is asymptotically

e(G_n)N^{m-2}ap^{e(F)} \sim (6\rho)^{e(F)}c^{m-2}\, n\log d_n.

So (3.3) holds if we choose c such that

\rho > (6\rho)^{e(F)}c^{m-2},

and the proof is completed. 2

Corollary 3.2 For fixed m \ge 3, as n \to \infty,

c_m\left(\frac{n}{\log n}\right)^{(m+1)/2} \le r(m,n) \le (1 + o(1))\frac{n^{m-1}}{(\log n)^{m-2}},

where c_m = c(m) > 0 is a constant.

The lower bound on classical Ramsey numbers will be improved in the next section. However, the local lemma remains a generally powerful tool with comparatively simple proofs in applications, particularly for problems involving systems without specific structure.

The following result improves the lower bound r(C_m, K_n) \ge c(n/\log n)^{m/(m-1)} obtained by the deletion method.

Corollary 3.3 Let m \ge 3 be a fixed integer. Then there exists a constant c = c(m) > 0 such that

r(C_m, K_n) \ge c\left(\frac{n}{\log n}\right)^{(m-1)/(m-2)}

for all large n.


Corollary 3.4 For any fixed integers k \ge m \ge 2, there exists a constant c = c(m, k) > 0 such that

r(K_{m,k}, K_n) \ge c\left(\frac{n}{\log n}\right)^{(mk-1)/(m+k-2)}

for all large n.

We will prove later that r(K_{m,k}, K_n) \le c\left(\frac{n}{\log n}\right)^m. Note that the exponent (mk-1)/(m+k-2) in the lower bound can be made arbitrarily close to the exponent m in the upper bound by taking k large for fixed m.

We have seen that the probabilistic method has many applications, with much better results than those obtained by elementary combinatorial methods. However, in the next several chapters we shall see that some other methods, such as algebraic methods, have had much success on certain topics in Ramsey theory. Let us conclude this section with two jokes given by Spencer (1994), which illustrate that for many topics, unlike Turán's bound for the independence number shown in Chapter 4, the probabilistic method often cannot provide "exact" results. The problem Spencer joked about is serious. For verisimilitude, we state the joked results as usual in "academic language", but without indices.

The proofs of the following results are due to Joker, who used the basic probabilistic method.

Theorem (Joker) Let S and T be nonempty sets. If |T| > \binom{|S|}{2}, then there exists an injection f : S → T.

Proof. Consider the probability space consisting of all maps from S to T, in which each map appears equiprobably and independently. For any unordered pair of points x and y of S, let A_{xy} = A_{yx} signify the event f(x) = f(y). Since for a fixed pair x and y

|{f : S → T : f(x) = f(y)}| = |T|^{|S|-1},

we have Pr(A_{xy}) = 1/|T| and

Pr(∪_{{x,y}⊆S} A_{xy}) ≤ Σ_{{x,y}⊆S} 1/|T| = (1/|T|) \binom{|S|}{2} < 1,


40 CHAPTER 3. THE LOVASZ LOCAL LEMMA

which implies that Pr(∩_{{x,y}⊆S} \overline{A_{xy}}) > 0, and the desired injection exists. □

Later, Joker amused himself by improving the above result using the local lemma. The new result is tight up to a multiplicative constant.

Theorem (Joker) Let S and T be nonempty sets. If |T| ≥ 2e|S|, then there exists an injection f : S → T.

Proof. The same as that for the first theorem, but apply the local lemma. In the dependence graph, the vertex A_{xy} is adjacent to A_{xy'} with y' ∈ S \ {x, y} and to A_{x'y} with x' ∈ S \ {x, y}. Let d = 2(|S| - 1); then every vertex of the dependence graph has degree less than d, and the event A_{xy} is mutually independent of all its non-neighbors. As p = 1/|T|, the condition |T| ≥ 2e|S| ensures e(d + 1)p < 1, so the symmetric form of the local lemma gives Pr(∩_{{x,y}⊆S} \overline{A_{xy}}) > 0, implying the existence of the desired injection. □

3.3 Triangle-free process ?

It is often difficult to show the existence of small events. The local lemma is a tool for such proofs, and it improved most lower bounds obtained from the basic probabilistic method. The key to the proof of the local lemma itself is conditional probability. A revolutionary idea for finding small events is also "conditional": if we know a condition under which the event is likely to appear, then the probability of the event is large under that condition. In other words, we try to switch a small event into a large one conditionally. However, we may encounter difficulty in figuring out the conditional probability.

Obtaining the right order of magnitude of r(m, n), even of r(3, n), was certainly a challenge for decades. A celebrated result of Kim (1995) showed that the order of r(3, n) is n²/log n, which was obtained again by Bohman (2009). They used different analyses of the same random graph process, called the triangle-free process. For general constrained graph processes, see, e.g., Rucinski and Wormald (1992), Erdos, Suen and Winkler (1995), Bollobas and Riordan (2000), and Osthus and Taraz (2001).

The triangle-free process can be described as follows. We begin with the empty graph, denoted by G_0, on N vertices. At step i we form the graph G_i by adding to G_{i-1} a new edge chosen uniformly at random


from the collection of pairs of vertices that neither appear as edges in G_{i-1} nor form triangles when added as edges to G_{i-1}. The process terminates at a maximal triangle-free graph G_M, where the random variable M is the number of edges of G_M. Since a maximal triangle-free graph is connected and the number of edges in a triangle-free graph of order N is at most N²/4 (see Chapter 8), we have

N - 1 ≤ M ≤ N²/4.
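The process is easy to simulate directly. The sketch below (a minimal Python implementation, not from the text; the function name triangle_free_process is ours) maintains the set O_i of open pairs and returns the final maximal triangle-free graph, whose edge count M can be checked against the bounds above.

```python
import random

def triangle_free_process(n, seed=None):
    """Run the triangle-free process on vertex set {0, ..., n-1}:
    starting from the empty graph, repeatedly add a uniformly random
    open pair (a non-edge creating no triangle) until none remains.
    Returns the edge list of the maximal triangle-free graph G_M."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    open_pairs = {(u, v) for u in range(n) for v in range(u + 1, n)}
    edges = []
    while open_pairs:
        e = rng.choice(tuple(open_pairs))   # e_{i+1} uniform over O_i
        u, v = e
        open_pairs.remove(e)
        edges.append(e)
        adj[u].add(v); adj[v].add(u)
        # Pairs {v, w} with w ~ u and {u, w} with w ~ v are now closed.
        for a, b in ((u, v), (v, u)):
            for w in adj[a]:
                if w != b:
                    open_pairs.discard((min(b, w), max(b, w)))
    return edges
```

By construction the result is triangle-free and maximal, so N - 1 ≤ M ≤ N²/4 holds in every run.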

However, Bohman (2009) proved that almost surely (a.s.)

c_1 N^{3/2} √(log N) ≤ M ≤ c_2 N^{3/2} √(log N).

From a result in Chapter 3, we have that the independence numbers of such graphs are at least Ω(√(N log N)). Remarkably, Kim and Bohman showed that a.s. the independence numbers of such graphs are at most O(√(N log N)), which implies that r(3, n) ≥ Ω(n²/log n).

Theorem 3.8 For some constant c > 0,

r(3, n) ≥ c n²/log n.

Let us talk a bit more about the process employed by Bohman. For a set V, let V^{(2)} be the set of all pairs {u, v} of V, which is the edge set of the complete graph on V. The vertex set of our complete graph of order N is [N] = {1, 2, . . . , N}. In the evolution of the triangle-free process, we shall track some random sets. Recall that G_i is the graph given by the first i edges selected by the process. The graph G_i partitions [N]^{(2)} into three parts: E_i, O_i and C_i. The set E_i is simply the edge set of G_i. A pair of [N]^{(2)} is open, and in the set O_i, if it can still be added as an edge without violating the triangle-free condition. A pair of [N]^{(2)} is closed, and in the set C_i, if it is neither an edge of the graph nor open; that is, a pair e = {u, v} is in C_i if e ∉ E_i ∪ O_i and there exists a vertex w such that {u, w}, {v, w} ∈ E_i. Note that e_{i+1} is chosen uniformly at random from O_i. That is to say, each pair of O_i has the same probability 1/|O_i| of being chosen as e_{i+1}. We do not


express this as Pr(e_{i+1} ∈ O_i) = 1/|O_i|, as only pairs in the random set O_i are available.

The proofs of Kim and Bohman are hard and tedious and thus are omitted. With a more complicated analysis of the K_4-free process, Bohman (2009) also improved the known lower bound of r(4, n); generally, Bohman and Keevash (2010) improved the lower bound of r(m, n) obtained from the local lemma by a factor (log n)^{1/(m-2)}:

r(m, n) ≥ c (n/log n)^{(m+1)/2} (log n)^{1/(m-2)}.

References

J. Beck, An algorithmic approach to the Lovasz local lemma, Random Structures Algorithms, 2 (1991), 343-365.

T. Bohman, The triangle-free process, Adv. Math., 221 (2009), 1653-1677.

T. Bohman and P. Keevash, The early evolution of the H-free process, Invent. Math., 181 (2010), 291-336.

B. Bollobas and O. Riordan, Constrained graph processes, Electron. J. Combin., 7 (2000), R18.

L. Dong, Y. Li and Q. Lin, Ramsey numbers involving graphs with large degrees, Appl. Math. Lett., 22 (2009), 1577-1580.

P. Erdos, R. Faudree, C. Rousseau and R. Schelp, A Ramsey problem of Harary on graphs with prescribed size, Discrete Math., 67 (1987), 227-233.

P. Erdos and L. Lovasz, Problems and results on 3-chromatic hypergraphs and some related questions, in: Infinite and Finite Sets (to Paul Erdos on His 60th Birthday) II, A. Hajnal, R. Rado and V. Sos, Eds., Colloq. Math. Soc. Janos Bolyai, North-Holland, Amsterdam/London, 1975.

P. Erdos and J. Spencer, Lopsided Lovasz local lemma and Latin transversals, Discrete Appl. Math., 30 (1991), 151-154.

P. Erdos, S. Suen and P. Winkler, On the size of a random maximal graph, Random Structures Algorithms, 6 (1995), 309-318.

J. Kim, The Ramsey number R(3, t) has order of magnitude t²/log t, Random Structures Algorithms, 7 (1995), 173-207.

M. Krivelevich, Bounding Ramsey numbers through large deviation inequalities, Random Structures Algorithms, 7 (1995), 145-155.

Y. Li and W. Zang, Ramsey numbers involving large dense graphs and bipartite Turan numbers, J. Combin. Theory Ser. B, 87 (2003), 280-288.

L. Lu and L. Szekely, Using Lovasz local lemma in the space of random injections, Electron. J. Combin., 14 (2007), R63.

D. Osthus and A. Taraz, Random maximal H-free graphs, Random Structures Algorithms, 18 (2001), 61-82.

A. Rucinski and N. Wormald, Random graph processes with degree restrictions, Combin. Probab. Comput., 1 (1992), 169-180.

J. Spencer, Ramsey's theorem - a new lower bound, J. Combin. Theory Ser. A, 18 (1975), 108-115.

J. Spencer, Asymptotic lower bounds for Ramsey functions, Discrete Math., 20 (1977), 69-76.

J. Spencer, Ten Lectures on the Probabilistic Method, 2nd Edition, SIAM, Philadelphia, 1994.


Chapter 4

Concentration

4.1 The Chernoff’s Inequality

The probability spaces we consider in graph Ramsey theory have only finitely many possible outcomes, and the random variables are often nonnegative. Let X be a random variable; the expected value of X is defined to be E(X) = Σ_i a_i Pr(X = a_i), where the summation is taken over all values a_i that X can take.

Theorem 4.1 (Markov's Inequality) Let a > 0 and let X be a nonnegative random variable. Then

Pr(X ≥ a) ≤ E(X)/a.

Proof. Suppose that {a_i} is the set of all values that X takes. Then

E(X) = Σ_i a_i Pr(X = a_i) ≥ Σ_{a_i ≥ a} a_i Pr(X = a_i) ≥ a Σ_{a_i ≥ a} Pr(X = a_i) = a Pr(X ≥ a),

as required. □


Corollary 4.1 If a random variable X takes only nonnegative integer values and E(X) < 1, then Pr(X ≥ 1) < 1, hence Pr(X = 0) > 0.

This is exactly what we used to obtain lower bounds for Ramsey numbers in the last chapter.

For a positive integer k, the kth moment of a real-valued random variable X is defined to be E(X^k), and so the first moment is simply the expected value. Denote µ = E(X), and define the variance of X as E((X - µ)²), which is denoted by σ². Call

σ = √(E((X - µ)²))

the standard deviation of X. A basic identity is

σ² = E(X²) - µ².

Theorem 4.2 (Chebyshev's Inequality) Let X be a random variable and let a be a positive number. Then

Pr(|X - µ| ≥ a) ≤ σ²/a².

Proof. By Markov's inequality, for any a > 0,

σ² = E((X - µ)²) ≥ a² Pr((X - µ)² ≥ a²) = a² Pr(|X - µ| ≥ a),

and the required statement follows. □

In importance, the second moment E(X²) is second only to the first moment E(X).

Lemma 4.1 (Second Moment Method) If X is a random variable, then

Pr(X = 0) ≤ σ²/µ² = (E(X²) - µ²)/µ²,

where µ = E(X). In particular, Pr(X = 0) → 0 if E(X²)/µ² → 1.


The proof follows immediately from Chebyshev's Inequality and the trivial fact that Pr(X = 0) ≤ Pr(|X - µ| ≥ µ). Intuitively, if σ grows more slowly than µ, then Pr(X = 0) → 0, since σ "pulls" X close to µ and thus far away from zero.
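As a quick sanity check (our illustration, not from the text), the lemma can be verified exactly for X ~ B(n, p), where Pr(X = 0) = q^n, µ = np and σ² = npq:

```python
# Second moment method for X ~ B(n, p):
# Pr(X = 0) = q^n,  mu = n*p,  sigma^2 = n*p*q,
# so Lemma 4.1 asserts q**n <= sigma^2/mu^2 = q/(n*p).
for n in (10, 50, 200):
    for p in (0.1, 0.3, 0.5):
        q = 1 - p
        assert q**n <= q / (n * p)
```

Here the bound is very loose, as one may expect for a variable whose mass is far from zero.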

The Chebshev’s inequality is in fact the Markov’s inequality onrandom variable |X − µ|. However, Chebshev’s inequality states theprobability of a random variable X apart from E(X) is bounded. Whenthis is the case, we say that X is concentrated. A concentration boundsis used to show that a random variable is very close to its expectedvalue with high probability, so it behaves approximately as one may“expect” it to be. When Sn is the sum of n independent variables, eachvariable equals to 1 with probability p and −1 with probability 1 − p,respectively, the bound can be sharper. Such random variables arebounded in Chernoff’s inequality. Most of the results in this chaptermay be found in, or immediately derived from, the seminal paper ofChernoff (1952) while our proofs are self-contained. A set of randomvariables X1, X2, . . . are said to be mutually independent means eachXi is independent of any Boolean expression formed from other (Xj)

′s.In any form of Chernoff bounds, we assume that

Assumption A : On the independence of variables in Chernoffbound Let X1, X2, . . . be mutually independent variables and they havethe same binomial distribution. Set

Sn =n∑i=1

Xi.

All concentration bounds in the remaining part of this section are Chernoff bounds of different forms, which estimate the probability

Pr(S_n ≥ n(µ + δ)),

where µ = E(X_i). The symmetric bound on Pr(S_n ≤ n(µ - δ)) can be obtained similarly.

Theorem 4.3 Under Assumption A, suppose

Pr(X_i = 1) = Pr(X_i = -1) = 1/2

for i = 1, 2, . . .. Then for any δ > 0,

Pr(S_n ≥ nδ) < exp{-nδ²/2},

and for any a > 0,

Pr(S_n ≥ a) < exp{-a²/(2n)}.

Proof. Let λ > 0 be arbitrary. Then

E(e^{λX_i}) = (e^λ + e^{-λ})/2.

Note that

E(e^{λS_n}) = E(e^{λX_1})E(e^{λX_2}) · · · E(e^{λX_n}) = ((e^λ + e^{-λ})/2)^n = (Σ_{j=0}^∞ λ^{2j}/(2j)!)^n < (Σ_{j=0}^∞ (1/j!)(λ²/2)^j)^n = e^{nλ²/2},

where we use the fact that (2j)! ≥ 2^j j! for all j ≥ 0, with strict inequality when j ≥ 2. Now by Markov's inequality,

Pr(S_n ≥ nδ) = Pr(e^{λS_n} ≥ e^{λnδ}) ≤ E(e^{λS_n})/e^{λnδ} < exp{n(λ²/2 - λδ)}

for all λ > 0. Setting λ = δ, we obtain the desired result. □
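Since S_n = 2B - n with B ~ B(n, 1/2), the tail probability can be computed exactly and compared with the bound of Theorem 4.3; the snippet below is a check we add for illustration, not part of the text.

```python
import math

def signed_tail(n, delta):
    """Exact Pr(S_n >= n*delta) for a sum of n fair +/-1 signs,
    using S_n = 2*B - n with B ~ Binomial(n, 1/2)."""
    k_min = math.ceil(n * (1 + delta) / 2)  # S_n >= n*delta iff B >= k_min
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

# The exact tail sits strictly below exp(-n*delta^2/2) in each case.
for n, delta in [(50, 0.2), (100, 0.3), (200, 0.25)]:
    assert signed_tail(n, delta) < math.exp(-n * delta**2 / 2)
```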

For large n, the central limit theorem implies that S_n is approximately normal with zero mean and standard deviation √n. For any fixed u,

lim_{n→∞} Pr(S_n ≥ u√n) = ∫_u^∞ (1/√(2π)) e^{-t²/2} dt < e^{-u²/2}.

However, the Chernoff bound holds for all positive n and a. Since X_i is often an indicator variable of some random event, X_i takes the value 1 when the event occurs and 0 otherwise. The following form of the Chernoff bound may be used in more cases.


Theorem 4.4 Under Assumption A, suppose

Pr(X_i = 1) = Pr(X_i = 0) = 1/2

for i = 1, 2, . . .. Then for any δ > 0,

Pr(S_n ≥ n(1 + δ)/2) < exp{-nδ²/2}.

Namely,

Pr(S_n ≥ n(1/2 + δ)) < exp{-2nδ²}.

Proof. Set Y_i = 2X_i - 1 and T_n = Σ_{i=1}^n Y_i = 2S_n - n. Then

Pr(Y_i = 1) = Pr(Y_i = -1) = 1/2,

and the Y_i satisfy Assumption A. Note that T_n ≥ nδ if and only if S_n ≥ n(1 + δ)/2. Applying Theorem 4.3 to Y_i and T_n, we have

Pr(S_n ≥ n(1 + δ)/2) = Pr(T_n ≥ nδ) < exp{-nδ²/2},

as claimed. □

Under Assumption A, suppose

Pr(X_i = 1) = p and Pr(X_i = 0) = 1 - p

for i = 1, 2, . . .. Then we say that the sum S_n = Σ_{i=1}^n X_i has binomial distribution, denoted by B(n, p). Involved in Theorem 4.4 is the special binomial distribution B(n, 1/2). For the general case, the calculation is slightly more complicated, but the technique is the same. As usual, denote by q the quantity 1 - p.

Theorem 4.5 Under Assumption A, suppose

Pr(X_i = 1) = p and Pr(X_i = 0) = q

for i = 1, 2, . . .. Then there exists δ_0 = δ_0(p) > 0 such that if 0 < δ < δ_0, then

Pr(S_n ≥ n(p + δ)) < exp{-nδ²/(3pq)}.


Proof. Denote by a the quantity p + δ. By the same argument as used before,

Pr(S_n ≥ na) = Pr(e^{λS_n} ≥ e^{λna}) ≤ (1/e^{λna}) E(e^{λS_n}) = (1/e^{λna})(pe^λ + q)^n = (pe^{λ(1-a)} + qe^{-λa})^n

for all λ > 0. Let c = 1 - a = q - δ > 0, so that a + c = 1. By taking λ = log(aq/(cp)), we have

min_{λ>0} (pe^{λc} + qe^{-λa}) = e^{-λa}(pe^λ + q) = (cp/(aq))^a (q/c) = (p/a)^a (q/c)^c.

Setting 0 < δ < 1 - p, and expanding in powers of δ with the fact that

log(1 + x) = x - x²/2 + x³/3 + O(x⁴),

we find

log (p/a)^a = (p + δ) log(1 - δ/(p + δ)) = -δ - δ²/(2(p + δ)) - δ³/(3(p + δ)²) + o(δ³),

and

log (q/c)^c = (q - δ) log(1 + δ/(q - δ)) = δ - δ²/(2(q - δ)) + δ³/(3(q - δ)²) + o(δ³).

Adding these term by term, the first-order terms cancel, and the second-order terms give

-(δ²/2)(1/(p + δ) + 1/(q - δ)) = -(δ²/2)(1/(p(1 + δ/p)) + 1/(q(1 - δ/q))) = -(δ²/2)(1/(pq) - (q² - p²)δ/(p²q²) + o(δ)) = -δ²/(2pq) + (q - p)δ³/(2p²q²) + o(δ³),


and the third-order terms give

(δ³/3)(1/(q - δ)² - 1/(p + δ)²) = (δ³/3)(1/q² - 1/p² + o(1)) = -(q - p)δ³/(3p²q²) + o(δ³).

Hence, for small δ > 0,

log[(p/a)^a (q/c)^c] = -δ²/(2pq) + (q - p)δ³/(6p²q²) + o(δ³) < -δ²/(3pq).

Thus

Pr(S_n ≥ n(p + δ)) < exp{-nδ²/(3pq)},

completing the proof. □

From the above proof for p > q, and from Theorem 4.4 for p = q = 1/2, we see that if p ≥ 1/2, the bound can be slightly improved to

Pr(S_n > n(p + δ)) < exp{-nδ²/(2pq)}.

We now write out a symmetric form of Theorem 4.5, and omit those for Theorem 4.3 and Theorem 4.4.

Theorem 4.6 Under Assumption A, suppose

Pr(X_i = 1) = p and Pr(X_i = 0) = q

for i = 1, 2, . . .. Then there exists δ_0 = δ_0(p) > 0 such that if 0 < δ < δ_0, then

Pr(S_n ≤ n(p - δ)) < exp{-nδ²/(3pq)}.

Therefore

Pr(|S_n - np| > nδ) < 2 exp{-nδ²/(3pq)}. □


From the above proof, we have

Pr(S_n ≥ na) ≤ ((p/a)^a (q/c)^c)^n = exp{n(a log(p/a) + (1 - a) log(q/(1 - a)))},

where a = p + δ and c = 1 - a. Set k = na; then k > np and

Pr(S_n ≥ k) ≤ exp{n((k/n) log(p/(k/n)) + (1 - k/n) log(q/(1 - k/n)))}.

Let H(x) signify the entropy function

H(x) = x log(p/x) + (1 - x) log(q/(1 - x)), 0 < x < 1;

then

Pr(S_n ≥ k) ≤ exp{nH(k/n)},

which is valid also for k = np since H(p) = 0. The following form of Chernoff's inequality was used by Beck (1983).

Theorem 4.7 Under Assumption A, suppose

Pr(X_i = 1) = p and Pr(X_i = 0) = q

for i = 1, 2, . . .. If k ≥ np, then

Pr(S_n ≥ k) ≤ (np/k)^k (nq/(n - k))^{n-k}.

Consequently,

Pr(S_n ≥ k) ≤ (npe/k)^k.

Proof. The right-hand side of the first inequality is just exp{nH(k/n)}. For the second inequality, simply note that

(nq/(n - k))^{n-k} ≤ (n/(n - k))^{n-k} = (1 + k/(n - k))^{n-k} < e^k.

Thus the required result follows. □


4.2 Applications of Chernoff’s Bounds

Let us first see that almost all graphs are nearly regular.

Theorem 4.8 Let 0 < p < 1 and ε > 0 be fixed. Then almost all graphs G in G(n, p) satisfy

| deg(v) - (n - 1)p| ≤ ε(n - 1)p

for each vertex v.

Proof. Let G be a random graph in G(n, p) and let v be a fixed vertex of G. Then deg(v) has binomial distribution B(n - 1, p). From the Chernoff bounds (Theorem 4.6 with δ = εp), we have

Pr(| deg(v) - (n - 1)p| > ε(n - 1)p) < 2 exp(-(n - 1)ε²p/(3q)).

Hence the probability that there is at least one vertex v with | deg(v) - (n - 1)p| > ε(n - 1)p is at most (2 + o(1))n exp(-nε²p/(3q)), which tends to zero as n → ∞. □

The condition that p be fixed can be weakened to p = (log n/n)ω(n) with ω(n) → ∞; see Alon and Spencer (1992).

Let us enjoy an application of the Chernoff bound in the style of Erdos, which disproved a conjecture by means of almost all graphs.

A suspended path in a graph G is a path (x_0, x_1, . . . , x_k) in which x_1, . . . , x_{k-1} have degree two in G. A graph H is a subdivision of G if H is obtained from G by replacing edges of G with suspended paths; that is to say, H is obtained by adding vertices on the edges of G.

An often used measure for sparseness of graphs is K_r-freeness, as we have met previously. However, there are K_3-free graphs whose chromatic number can be arbitrarily large; see Mycielski's construction (1955) in the exercises. A more general measure for sparseness is to forbid subdivisions. Hajos conjectured that every graph G with χ(G) ≥ r contains a subdivision of K_r as a subgraph. This conjecture is trivial for r = 2, 3; it was confirmed by Dirac (1952) for r = 4, and it is open for r = 5, 6. Catlin (1979) disproved the conjecture for r ≥ 7 by


a constructive proof, but the disproof of Erdos and Fajtlowicz (1981) was more powerful. Let γ(G) denote the largest r such that G contains a subdivision of K_r as a subgraph. Hajos' conjecture is equivalent to the assertion that γ(G) ≥ χ(G).

Theorem 4.9 Almost all graphs G ∈ G(n, 1/2) satisfy

χ(G) ≥ n/(2 log_2 n) and γ(G) ≤ √(6n).

Proof. Set k = ⌊2 log_2 n⌋. Since

Pr(α(G) ≥ k) ≤ \binom{n}{k} 2^{-\binom{k}{2}} < (e√2 n/(k 2^{k/2}))^k → 0,

and since

α(G)χ(G) ≥ n

for any graph G, the first statement follows immediately. Set r = ⌈√(6n)⌉. Then n ≤ r²/6. There are

\binom{n}{r} ≤ (en/r)^r ≤ (er/6)^r

potential K_r subdivisions, one for each r-element subset of V(G). Fix such a subset X, and notice that each subdivided edge has to use a distinct vertex of V(G) \ X. There are \binom{r}{2} suspended paths in a subdivision, and at most n - r of them have length two or more, i.e., are "really" subdivided edges. So the number of edges in the subgraph induced by X is at least

\binom{r}{2} - (n - r) ≥ \binom{r}{2} + r - r²/6 ≥ (2/3)\binom{r}{2}.

But the number of edges in the subgraph induced by X, denoted by e(X), has binomial distribution B(N, 1/2), where N = \binom{r}{2}. From the Chernoff bound in the last section,

Pr(e(X) ≥ N(1 + δ)/2) ≤ exp{-Nδ²/2};


taking δ = 1/3, so that (2/3)\binom{r}{2} = \binom{r}{2}(1 + δ)/2, we have

Pr(e(X) ≥ (2/3)\binom{r}{2}) ≤ exp{-Nδ²/2} = exp{-(1/18)\binom{r}{2}}.

Thus we bound the probability that our random graph G contains a subdivision of K_r as follows:

Pr(γ(G) ≥ r) ≤ Σ_X Pr(e(X) ≥ (2/3)\binom{r}{2}) ≤ \binom{n}{r} exp{-(1/18)\binom{r}{2}} = ((er/6) exp{-(r - 1)/36})^r,

which tends to zero as n → ∞. □

Since for almost all G in G(n, 1/2),

χ(G) - γ(G) ≥ n/(2 log_2 n) - √(6n) → ∞

as n → ∞, Hajos' conjecture fails badly, and almost all graphs in G(n, 1/2) are counterexamples. Furthermore, the gap between the truth and the conjecture is big.

The following is an application of Chernoff's bounds to the Ramsey number r(K_{m,n}, K_n).

Theorem 4.10 Let integer m ≥ 2 be fixed. Then there exists a constant c = c(m) > 0 such that

r(K_{m,n}, K_n) ≥ c n^{m+1}/(log n)^m.

Proof. The lower bound is obtained through a simple application of the Chernoff bound (Theorem 4.7). Let

N = ⌊n^{m+1}/(3(2m log n)^m)⌋,


and let G(N, p) be a random graph of order N and edge probability p = (2m log n)/n. The probability that m chosen vertices in G(N, p) are all adjacent to another fixed vertex is p^m. So the probability that they have at least n common neighbors is Pr(S ≥ n), where S has the binomial distribution B(N - m, p^m). Then n > Np^m, and Theorem 4.7 yields

Pr(K_{m,n} ⊆ G(N, p)) ≤ \binom{N}{m} ((N - m)p^m e/n)^n < (N^m/m!)(Np^m e/n)^n < c_1 (n^{m(m+1)}/(log n)^{m²}) (e/3)^n,

where c_1 = c_1(m) > 0 is a constant. Hence Pr(K_{m,n} ⊆ G(N, p)) → 0. At the same time, by the standard estimates \binom{N}{n} ≤ (Ne/n)^n and 1 - p < e^{-p}, we obtain a bound on the probability that G(N, p) has an independent set of size at least n as follows:

Pr(α(G(N, p)) ≥ n) ≤ \binom{N}{n} (1 - p)^{n(n-1)/2} ≤ ((Ne/n) e^{-p(n-1)/2})^n ≤ (c_2/(3(2m log n)^m))^n,

where c_2 = c_2(m) > 0 is a constant, so Pr(α(G(N, p)) ≥ n) → 0. Hence the probability that G(N, p) contains neither K_{m,n} as a subgraph nor an independent set of size n is positive (in fact, close to 1). Thus r(K_{m,n}, K_n) > N. □

Using an upper bound for the Turan number of K_{m,n} and the main result in Chapter 2, we can show that the lower bound in the above theorem gives the right order of r(K_{m,n}, K_n).

4.3 Martingales on Random Graphs ?

Most parameters of a random graph are concentrated around their expectations. To describe such phenomena, the martingale is a powerful tool, which may liberate us from computational drudgery.


Let X and Y be random variables on a probability space Ω. Given Y = y with Pr(Y = y) > 0, we define the conditional expectation E(X|Y = y) as

E(X|Y = y) = Σ_x x Pr(X = x|Y = y),

which is a number depending on y. As Y is random, we obtain a new random variable E(X|Y). For an element s ∈ Ω, if Y(s) = y, then E(X|Y) takes the value E(X|Y = y) at s.

Lemma 4.2 E[E(X|Y )] = E[X].

Proof. From the definition, we have

E[E(X|Y)] = Σ_y E[X|Y = y] Pr(Y = y) = Σ_y (Σ_x x Pr[X = x|Y = y]) Pr(Y = y) = Σ_x x (Σ_y Pr[X = x|Y = y] Pr(Y = y)) = Σ_x x Pr(X = x) = E(X),

as asserted. □
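The identity is easy to verify exactly on a toy space (our example, not from the text): roll two fair dice, let X be the sum and Y the first die, so that E(X | Y = y) = y + 3.5 and averaging over y recovers E(X) = 7.

```python
from itertools import product
from statistics import mean

outcomes = list(product(range(1, 7), repeat=2))   # 36 equiprobable pairs

def cond_exp_sum(y):
    """E(X | Y = y) with X = a + b the sum and Y = a the first die."""
    return mean(a + b for a, b in outcomes if a == y)

e_x = mean(a + b for a, b in outcomes)                 # E(X)
e_tower = mean(cond_exp_sum(y) for y in range(1, 7))   # E[E(X|Y)]
assert e_x == e_tower == 7
```

Since Y is uniform, E[E(X|Y)] here is a plain average of the six conditional expectations.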

A martingale is a sequence X_0, X_1, · · · , X_m of random variables such that for 0 ≤ i < m,

E(X_{i+1}|X_i) = X_i;

namely, E(X_{i+1}|X_i = x) = x for any given X_i = x.

Imagine one walks on a line randomly: at each step he moves one unit to the left or to the right with probability p each, or stands still with probability 1 - 2p. Let X_i be the position after i steps. This is a martingale, as the expected position after i + 1 steps equals the actual position after i steps.
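For the lazy walk the martingale property is a one-line computation; the check below (with an assumed p = 1/4) confirms E(X_{i+1} | X_i = x) = x exactly.

```python
def next_expectation(x, p):
    """E(X_{i+1} | X_i = x) for the lazy walk: step to x - 1 or x + 1
    with probability p each, stay at x with probability 1 - 2p."""
    return p * (x - 1) + p * (x + 1) + (1 - 2 * p) * x

# The expected next position equals the current one for every x.
assert all(next_expectation(x, 0.25) == x for x in range(-5, 6))
```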

Let us look at some martingales used in graph theory. The first is called the edge exposure martingale on chromatic numbers, in which we reveal G_p one edge-slot at a time. Let the random graph space


G(n, p) be the underlying probability space. Set m = \binom{n}{2}, and label the potential edges on vertex set [n] by e_1, e_2, · · · , e_m in any manner. For a given graph H on vertex set [n], we define X_0(H), X_1(H), · · · , X_m(H), which are random variables if H is a random graph in G(n, p). Let X_0(H) = E(χ(G_p)). For general i,

X_i(H) = E[χ(G_p) | e_j ∈ E(G_p) iff e_j ∈ E(H), 1 ≤ j ≤ i].

In other words, X_i(H) is the expected value of χ(G_p) under the condition that the set of the first i edges of G_p equals that of H, while the remaining edges are not seen and are considered random. Note that X_0 is the constant E(χ(G_p)) and X_m = χ(H).

Figure 5.1 shows why this is a martingale on the random space G(3, 0.5). Of course, we can consider other graph parameters as well.

[Figure: a tree diagram tracking the conditional expectations of χ(G_p) on G(3, 0.5) as the edge-slots e_1, e_2, e_3 of H are revealed: X_0 = 2, then X_1 ∈ {2.25, 1.75}, then the values of X_2 and X_3.]

Fig. 5.1 An edge exposure martingale


[Figure: the corresponding tree diagram for vertex exposure on G(3, 0.5): X_1 = 2, then X_2 ∈ {2.25, 1.75}, then X_3 = χ(H).]

Fig. 5.2 A vertex exposure martingale


The second is called the vertex exposure martingale on chromatic numbers, in which we reveal G_p one vertex-slot at a time. Let the random graph space G(n, p) be the underlying probability space. We define X_1 = E(χ(G_p)) and

X_i(H) = E[χ(G_p) | E_i(G_p) = E_i(H)],

where E_i(H) is the set of edges of H induced by the vertex set {1, · · · , i}. In other words, X_i(H) is the expected value of χ(G_p) under the condition that the set of edges of G_p induced by the first i vertices equals that of H, while the remaining edges are not seen and are considered random. Note that X_1 is the constant E(χ(G_p)) and X_n = χ(H). Note also that the vertex exposure martingale is a subsequence of the edge exposure martingale.

In Fig. 5.1, the probability space is G(3, 0.5), so X_0 = E(χ(G_p)) = 2, and X_1(H) = 2.25 if e_1 ∈ E(H) and X_1(H) = 1.75 otherwise. Thus E(X_1|X_0) = 2 = X_0. The random variables X_2 and X_3 take 4 values and 8 values, respectively, and E(X_{i+1}|X_i) = X_i.
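These values can be reproduced by brute-force enumeration over the eight graphs on three vertices (our sketch; the helper X(i, h) is a name we introduce for the conditional expectation X_i(H)):

```python
from itertools import product

# Edge-slots e1 = {1,2}, e2 = {1,3}, e3 = {2,3}; a graph is a triple of
# indicators, so there are 8 graphs, each with probability 1/8.
graphs = list(product((0, 1), repeat=3))

def chi(g):
    """Chromatic number of the 3-vertex graph with edge indicators g."""
    k = sum(g)
    return 1 if k == 0 else (3 if k == 3 else 2)

def X(i, h):
    """X_i(H): E[chi(G_p)] given that the first i slots agree with h."""
    consistent = [g for g in graphs if g[:i] == h[:i]]
    return sum(chi(g) for g in consistent) / len(consistent)

h = (1, 0, 1)                       # H contains e1 and e3
assert X(0, h) == 2.0               # X0 = E(chi(G_p)) = 2
assert X(1, h) == 2.25              # e1 in E(H)
assert X(1, (0, 1, 1)) == 1.75      # e1 not in E(H)
assert X(3, h) == 2.0               # X3 = chi(H)
assert (X(1, (1, 0, 0)) + X(1, (0, 0, 0))) / 2 == X(0, h)   # martingale step
```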

Lemma 4.3 Let Y be a (discrete) random variable such that E(Y) = 0 and |Y| ≤ 1. Then E(e^{tY}) ≤ (e^t + e^{-t})/2 for all t ≥ 0.

Proof. For fixed t ≥ 0, set

h(y) = (e^t + e^{-t})/2 + ((e^t - e^{-t})/2) y, -1 ≤ y ≤ 1.

Note that the function f(y) = e^{ty} is convex, and h(y) is the line through the points (-1, f(-1)) and (1, f(1)), as f(-1) = h(-1) and f(1) = h(1); hence e^{ty} ≤ h(y) on [-1, 1], and

E(e^{tY}) ≤ E(h(Y)) = (e^t + e^{-t})/2

as E(Y) = 0. Thus the assertion follows. □

Theorem 4.11 (Azuma's Inequality) Let X_0, X_1, · · · , X_m be a martingale with

|X_{i+1} - X_i| ≤ 1

for all 0 ≤ i < m, and let λ > 0. Then

Pr[X_m - X_0 ≥ λ√m] < e^{-λ²/2}

and

Pr[X_m - X_0 ≤ -λ√m] < e^{-λ²/2}.

Proof. We may assume that X_0 = 0 by translation. Set Y_i = X_i - X_{i-1}; then |Y_i| ≤ 1 and E(Y_i|X_{i-1}) = 0. Lemma 4.3 yields that

E(e^{tY_i}|X_{i-1}) ≤ (e^t + e^{-t})/2 ≤ e^{t²/2}

for any t > 0, where the last inequality has been proved in the first section of this chapter. Hence by Lemma 4.2, we have

E(e^{tX_m}) = E[e^{tX_{m-1}} e^{tY_m}] = E[E(e^{tX_{m-1}} e^{tY_m} | X_{m-1})] = Σ_x E(e^{tX_{m-1}} e^{tY_m} | X_{m-1} = x) Pr(X_{m-1} = x) = Σ_x e^{tx} E(e^{tY_m} | X_{m-1} = x) Pr(X_{m-1} = x) ≤ e^{t²/2} Σ_x e^{tx} Pr(X_{m-1} = x) = e^{t²/2} E(e^{tX_{m-1}}).

This and induction give E(e^{tX_m}) ≤ e^{mt²/2}. Using Markov's Inequality, we obtain

Pr(X_m ≥ λ√m) = Pr(e^{tX_m} ≥ e^{tλ√m}) ≤ E(e^{tX_m})/e^{tλ√m} ≤ e^{mt²/2}/e^{tλ√m}.

The assertion follows by letting t = λ/√m. □

Page 65: Topics in Probabilistic Method - jsnu.edu.cnmathsqxx.jsnu.edu.cn/_upload/article/08/2c/e24807884307ab72b309… · Topics in Probabilistic Method Selected mainly from Introduction

62 CHAPTER 4. CONCENTRATION

4.4 Parameters of Random Graphs

We are ready to discuss some parameters of the random graph G_p for fixed p. It is easy to see that some parameters are concentrated around their expectations. The following result is due to Shamir and Spencer (1987).

Theorem 4.12 Let n and p be arbitrary and let G_p ∈ G(n, p). Then

Pr(|χ(G_p) - E(χ(G_p))| > λ√(n - 1)) < 2e^{-λ²/2}.

Proof. Consider the vertex exposure martingale X_1, · · · , X_n on G(n, p) with the parameter χ(G). A single vertex can always be given a new color, so |X_{i+1} - X_i| ≤ 1 and Azuma's Inequality applies. □

Similarly, we have

Pr(|ω(G_p) - E(ω(G_p))| > λ√(n - 1)) < 2e^{-λ²/2}

and

Pr(|e(G_p) - E(e(G_p))| > λ√m) < 2e^{-λ²/2},

where m = \binom{n}{2}. However, the proofs give no clue as to what these expectations are.

Lemma 4.4 Let 0 < p < 1, a = 1/p and ε > 0 be fixed, and let

f(x) = \binom{n}{x} p^{\binom{x}{2}} for 0 ≤ x ≤ n.

Define an integer k such that

f(k - 1) > 1 ≥ f(k).

Then as n → ∞,

⌈ω_n - ε⌉ ≤ k ≤ ⌊ω_n + ε⌋ + 1,

where

ω_n = 2 log_a n - 2 log_a log_a n + 2 log_a(e/2) + 1,

and f(k - 4) > c(n/log_a n)³ = n^{3-o(1)}, where c > 0 is a constant.


Proof. It is easy to see that k → ∞ and k = o(√n); thus by Stirling's formula, we have

f(k) = \binom{n}{k} p^{\binom{k}{2}} ∼ (n^k/k!) p^{k(k-1)/2} ∼ (1/√(2πk)) ((en/k) p^{(k-1)/2})^k.

So if δ > 0 is fixed, for all large n,

(en/k) p^{(k-1)/2} ≤ 1 + δ

as f(k) ≤ 1. This is equivalent to

k ≥ 2 log_a n - 2 log_a k + 2 log_a e + 1 - 2 log_a(1 + δ).

Let us first take k ∼ 2 log_a n. Then the difference between the right-hand side of the above inequality and ω_n is

2 log_a(2 log_a n/k) - 2 log_a(1 + δ) → -2 log_a(1 + δ),

so k - ω_n ≥ -2 log_a(1 + δ) + o(1) ≥ -ε if we take δ small enough. Hence k ≥ ω_n - ε.

Similarly, from

f(k - 1) ∼ (1/√(2π(k - 1))) ((en/(k - 1)) p^{(k-2)/2})^{k-1},

we have (en/(k - 1)) p^{(k-2)/2} ≥ 1, which gives

k ≤ 2 log_a n - 2 log_a(k - 1) + 2 log_a e + 2.

Again taking k ∼ 2 log_a n first, we obtain k ≤ ω_n + 1 + o(1) ≤ ω_n + ε + 1, and the desired upper bound for k follows.

Finally, note that

f(k−2) > f(k−2)/f(k−1) = ((k−1)/(n−k+2)) a^{k−2} ∼ p² (k/n) a^k > cn/log_a n,

and the assertion for f(k−4) follows immediately. 2


Lemma 4.5 For fixed 0 < p < 1, a = 1/p and ε > 0, almost all graphs Gp ∈ G(n, p) satisfy

ω(Gp) ≤ ⌊ωn + ε⌋ < 2 log_a n,

where ωn is defined in Lemma 4.4.

Proof. Let Xr be the number of r-cliques of Gp, where r is an integer. Then

E(Xr) = f(r) = C(n,r) p^{C(r,2)} ≤ (n^r/r!) p^{r(r−1)/2} < (1/√(2πr)) ((en/r) p^{(r−1)/2})^r.

We shall find some r = r(n) → ∞ such that E(Xr) → 0. This is certainly true if (en/r) p^{(r−1)/2} ≤ 1 (hence r → ∞). The same argument as in the proof of Lemma 4.4 shows that if r = ⌈ωn + ε⌉, then E(Xr) → 0; thus, by Markov's inequality, Pr[ω(Gp) ≥ r] ≤ E(Xr) → 0 and Pr[ω(Gp) ≤ ⌊ωn + ε⌋] → 1. 2
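The first-moment computation in the proof can be reproduced directly. The sketch below (an illustration with p = 1/2 and ε = 1; expected_cliques and omega_n are hypothetical helper names) verifies that E(Xr) = f(r) < 1 already holds at r = ⌈ωn + ε⌉ for several values of n.

```python
from math import comb, log, e, ceil

# Illustration (assumption: p = 1/2 and eps = 1, a concrete stand-in
# for the lemma's "fixed eps > 0").
p = 0.5
a = 1 / p

def expected_cliques(n, r):
    # E(X_r) = f(r) = C(n, r) p^{C(r,2)}
    return comb(n, r) * p ** comb(r, 2)

def omega_n(n):
    la = lambda x: log(x) / log(a)
    return 2 * la(n) - 2 * la(la(n)) + 2 * la(e / 2) + 1

for n in (100, 1000, 10000):
    r = ceil(omega_n(n) + 1)
    # first moment already below 1, so Pr[omega(G_p) >= r] < 1 by Markov
    assert expected_cliques(n, r) < 1
```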

Remark. The above result can be stated as

Pr(ω(Gp) ≤ ⌈ωn + ε⌉ − 1) → 1.

Matula (1970, 1972, 1976) was the first to notice that for fixed values of p, almost all Gp ∈ G(n, p) have clique number concentrated on (at most) two values,

⌊ωn − ε⌋ ≤ ω(Gp) ≤ ⌊ωn + ε⌋.

Results asserting this phenomenon were proved by Grimmett and McDiarmid (1975), and these were further strengthened by Bollobás and Erdős (1976).

To reduce the difficulty of the proof while preserving its typical flavor, we shall slightly weaken the above lower bound ⌊ωn − ε⌋ to its asymptotic form a little later. Let us discuss the chromatic number first. A technical lemma is as follows.

Lemma 4.6 Let k be the integer defined in Lemma 4.4 and let ℓ = k − 4. Let Y = Y(G) be the maximum size of a family of edge-disjoint cliques of size ℓ in G ∈ G(n, p). Then

E(Y) ≥ c n²/ℓ⁴,

where c > 0 is a constant.


Proof. Let L denote the family of ℓ-cliques of G. Then by Lemma 4.4, we have

μ = E(|L|) = f(ℓ) = C(n,ℓ) p^{C(ℓ,2)} ≥ c1 (n/ℓ)³.

Let W denote the number of unordered pairs {A, B} of ℓ-cliques of G with A ∼ B, where A ∼ B signifies that 2 ≤ |A ∩ B| < ℓ. Let

Δ = ∑_{A∼B} Pr(both A and B span cliques),

where the sum is taken over all ordered pairs (A, B). Then E(W) = Δ/2 and

Δ = C(n,ℓ) ∑_{i=2}^{ℓ−1} C(ℓ,i) C(n−ℓ, ℓ−i) p^{2C(ℓ,2)−C(i,2)} = μ ∑_{i=2}^{ℓ−1} C(ℓ,i) C(n−ℓ, ℓ−i) p^{C(ℓ,2)−C(i,2)} = μ ∑_{i=2}^{ℓ−1} Ri.

Setting a = 1/p, we have

R_{i+1}/R_i = (ℓ−i)² a^i / ((i+1)(n−2ℓ+i+1)).

If i is small, say bounded, then this ratio is O((log_a n)²/n), and if i is large, say ℓ − i = O(1), then the ratio is at least √n. Since the ratio is increasing in i, the terms Ri first decrease and then increase geometrically, so

Δ = μ ∑_{i=2}^{ℓ−1} Ri ≤ 2μ(R2 + R_{ℓ−1}).

Here

R2 = C(ℓ,2) C(n−ℓ, ℓ−2) p^{C(ℓ,2)−1} = (ℓ²(ℓ−1)² / (2p(n−ℓ+2)(n−ℓ+1))) μ ≤ (ℓ⁴/(2pn²)) μ,

and

R_{ℓ−1} = ℓ(n−ℓ) p^{C(ℓ,2)−C(ℓ−1,2)} ≤ nℓ p^{ℓ−1},

thus

Δ ≤ 2μ((ℓ⁴/(2pn²)) μ + nℓ p^{ℓ−1}) ≤ C μ²ℓ⁴/n².

Let C be a random subfamily of L defined by setting, independently for each A ∈ L,

Pr[A ∈ C] = p1,

where 0 < p1 < 1 will be determined. Then E(|C|) = μp1. Let W′ be the number of unordered pairs {A, B} of ℓ-cliques in C with A ∼ B. Then

E(W′) = E(W) p1² = Δp1²/2.

Delete from C one member of each such pair {A, B}. This yields a family C* of edge-disjoint ℓ-cliques of G, and

E(Y) ≥ E(|C*|) ≥ E(|C|) − E(W′) = μp1 − Δp1²/2.

By choosing p1 = μ/Δ < 1, we have

E(Y) ≥ μ²/(2Δ) ≥ c n²/ℓ⁴

as asserted. 2

Theorem 4.13 (Bollobás) Let 0 < p < 1, a = 1/p be fixed, and let m = ⌈n/log_a² n⌉. Then for almost all graphs Gp ∈ G(n, p), each induced subgraph of Gp of order m has a clique of size at least r = 2 log_a n − 7 log_a log_a n.

Proof. Let S be an m-set of vertices. We shall bound the probability that S induces no r-clique by e^{−m^{1+δ}} for all large n (hence all large m), where δ > 0 is a constant. Then the probability that there exists an m-set with no r-clique is at most

C(n,m) e^{−m^{1+δ}} < (en/m)^m e^{−m^{1+δ}} = exp(m ln(en/m) − m^{1+δ}),

which goes to zero, and the assertion follows.


Let X be the maximum number of pairwise edge-disjoint r-cliques in the graph induced by S, where edge-disjoint means that any two of the cliques share at most one vertex. We shall show that X ≥ 1 holds almost surely. To do this, we invoke Azuma's Inequality. Consider the edge exposure martingale for X that results from revealing the induced graph one edge slot at a time. We have X0 = E(X) and X_{C(m,2)} = X. Clearly the Lipschitz condition |X_{i+1} − X_i| ≤ 1 is satisfied, so Azuma's Inequality gives

Pr(X = 0) ≤ Pr[X − E(X) ≤ −E(X)] = Pr[X − E(X) ≤ −λ C(m,2)^{1/2}] ≤ e^{−λ²/2} = exp(−E²(X)/(m(m−1))),

where λ = E(X)/C(m,2)^{1/2}. Hence it suffices to find δ > 0 such that E²(X) ≥ m^{3+δ} for all large n.

Now, let t0 be the integer such that f(t0 − 1) > 1 ≥ f(t0), where f(x) = C(m,x) p^{C(x,2)}, and let t = t0 − 4. Then by Lemma 4.4, we have

t ≥ 2 log_a m − 2 log_a log_a m − 3 > 2 log_a n − 7 log_a log_a n,

so t > r. Let T be the maximum number of edge-disjoint cliques of size t in the induced graph. Then E(X) ≥ E(T) and E(T) ≥ cm²/t⁴ by Lemma 4.6, hence

E(X) ≥ cm²/t⁴ ∼ cn²/(16 (log_a n)⁸),

implying that E²(X) ≥ n^{4−o(1)} ≥ n^{3+δ} ≥ m^{3+δ} for any 0 < δ < 1 if n is large, which completes the proof. 2

Theorem 4.14 (Bollobás, 1988) Let 0 < p < 1 be fixed and let b = 1/q = 1/(1 − p). Then for any fixed ε > 0, almost all graphs Gp ∈ G(n, p) satisfy

n/(2 log_b n) ≤ χ(Gp) ≤ (1 + ε) n/(2 log_b n).


Proof. The lower bound holds because almost all Gp satisfy α(Gp) ≤ 2 log_b n and χ(G)α(G) ≥ n. The upper bound follows from the above theorem, applied to independent sets instead of cliques: we can almost always select an independent set of size 2 log_b n − 7 log_b log_b n until only n/log_b² n < (ε/2) n/(2 log_b n) vertices are left. We first use at most

n/(2 log_b n − 7 log_b log_b n) < (1 + ε/2) n/(2 log_b n)

colors, and then we can complete the coloring by using a distinct new color on each of the remaining vertices. 2
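The two bounds of Theorem 4.14 can be probed by simulation. The sketch below (an illustration, not the theorem's argument: it uses the greedy coloring algorithm, which is known to use roughly twice the optimal number of colors on G(n, 1/2), together with a fixed random seed) checks that the greedy color count for a sample of G(300, 1/2) lands between the lower bound n/(2 log_b n) and a generous ceiling of 2n/log_b n.

```python
import random
from math import log

# Illustration (assumptions: p = 1/2, n = 300, seeded RNG, greedy coloring
# rather than the true chromatic number).
random.seed(1)
n, p = 300, 0.5

# sample G(n, p) as an adjacency matrix
adj = [[False] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < p:
            adj[i][j] = adj[j][i] = True

# greedy coloring: give each vertex the smallest color unused by neighbors
color = [-1] * n
for v in range(n):
    used = {color[u] for u in range(n) if adj[v][u] and color[u] >= 0}
    c = 0
    while c in used:
        c += 1
    color[v] = c

num_colors = max(color) + 1
lower = n / (2 * log(n, 2))   # lower bound on chi, a.a.s. (b = 2 here)
upper = 2 * n / log(n, 2)     # generous ceiling for greedy, about 2x optimal
assert lower <= num_colors <= upper
```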

Let us remark that Achlioptas and Naor recently obtained a result on sparser random graphs as follows. Given d > 0, let kd be the smallest integer k such that d < 2k log k. Then for almost all Gp ∈ G(n, d/n), χ(Gp) is either kd or kd + 1. This result improves an earlier result of Luczak (1991) by specifying the form of kd.

Theorem 4.15 Let 0 < p < 1 and ε > 0 be fixed. Then almost all graphs Gp ∈ G(n, p) satisfy

(1 − ε) 2 log_b n ≤ α(Gp) < 2 log_b n.

Proof. The upper bound is the complement of that in Lemma 4.5. The lower bound follows from Theorem 4.14 and the fact that α(G) ≥ n/χ(G). 2

Theorem 4.16 Let 0 < p < 1 and ε > 0 be fixed. Then almost all graphs Gp ∈ G(n, p) satisfy

(1 − ε) 2 log_a n ≤ ω(Gp) < 2 log_a n.

Proof. This is the complement of Theorem 4.15. 2

For some graph parameters f(G), we have seen that there is a function g(n) such that almost all graphs Gp in G(n, p) satisfy

(1 − ε) g(n) ≤ f(Gp) ≤ (1 + ε) g(n),

hence f(Gp) concentrates in a small range. We shall call the function g(n) a threshold for the parameter f. In the next chapter we will discuss thresholds for probability p = p(n) instead of fixed p, and will consider some other graph parameters.


Chapter 5

Quasi-random graphs

Random graphs have proven to be one of the most important tools in modern graph theory. Their tremendous success raises the following general question: what are the essential properties of random graphs, and how can we tell when a given graph behaves like a random graph Gp in G(n, p)? Here a typical property of random graphs is one that almost all Gp satisfy. This leads us to the concept of quasi-random graphs. It was Thomason (1987) who introduced the notion of jumbled graphs to measure the similarity between the edge distribution of quasi-random graphs and that of random graphs. Quasi-random graphs are also called pseudo-random graphs. A cornerstone contribution of Chung, Graham and Wilson (1989) showed that many properties of a different nature are equivalent to the notion of quasi-randomness. For a survey on quasi-random graphs, see Krivelevich and Sudakov (2006). This chapter focuses on quasi-random graphs. In recent years, several quasi-random families of graphs have appeared, all constructed from finite fields. Their algebraic parameters are easier to compute, some of which are related to characters of finite fields, and thus the third section is devoted to these topics. The last section is an application of quasi-random graphs in Ramsey theory.


5.1 Properties of dense graphs

Speaking informally, a quasi-random graph G of order n is a graph that behaves like a random graph G(n, p) with p = e(G)/C(n,2). For 0 < p < 1 ≤ α, a graph G is called (p, α)-jumbled if each induced subgraph H on h vertices of G satisfies

|e(H) − p C(h,2)| ≤ αh.

Equivalently, G is (p, α)-jumbled if the average degree d(H) of each induced subgraph H of G satisfies

|d(H) − p(h − 1)| ≤ 2α.

The following result of Thomason (1987) contains a simple local condition for a graph to be jumbled.

Theorem 5.1 Let G be a graph of order n with δ(G) ≥ pn. If any pair of vertices has at most p²n + ℓ common neighbors, where ℓ > 0, then G is (p, √((p + ℓ)n)/2)-jumbled.

Proof. Let H be an induced subgraph of G of order h < n with d(H) = d. Write V(G) = {v1, v2, ..., vn} and V(H) = {v1, v2, ..., vh}, say. Let di be the number of neighbors of vi in H for 1 ≤ i ≤ n. Then ∑_{i=1}^{h} di = hd and

∑_{j=h+1}^{n} dj ≥ ∑_{i=1}^{h} (pn − di) = h(pn − d).

Since any pair of vertices of H is covered by at most p²n + ℓ common neighbors, we have

∑_{i=1}^{n} C(di,2) ≤ C(h,2)(p²n + ℓ).

The above and the convexity of the function C(x,2) imply that

h C(d,2) + (n−h) C(h(pn−d)/(n−h), 2) ≤ C(h,2)(p²n + ℓ).

Equivalently,

(d − ph)² ≤ ((n−h)/n) [(h−1)ℓ + p(1−p)n],

which gives

|d − p(h−1)| ≤ √((p + ℓ)n)

as claimed. Finally, note that the same inequality holds for h = n. 2

For given graphs G and H, let N*_G(H) be the number of labeled occurrences of H as an induced subgraph of G, which is the number of injections from V(H) to V(G) that preserve both adjacency and non-adjacency, so that the image is the vertex set of an induced copy of H in G. Let N_G(H) be the number of labeled copies of H as a (not necessarily induced) subgraph of G. Then

N_G(H) = ∑_{H′} N*_G(H′),

where H′ ranges over all graphs on V(H) obtained from H by adding a set of edges. For example, if G = H = Ct, then N*_G(H) = N_G(H) = 2t; if G = Kn and n ≥ t ≥ 4, then N*_G(Ct) = 0 and N_G(Ct) = N*_G(Kt) = (n)t. If G = K_{n/2,n/2} and n is even, then N_G(C4) = 2((n/2)(n/2 − 1))² ∼ 2(n/2)⁴ for large n.

Let G be a (p, α)-jumbled graph of order n, where α = αn = o(n) as n → ∞. Then, as shown by Thomason, for fixed p and a fixed graph H of order h,

N*_G(H) ∼ p^{e(H)} (1−p)^{C(h,2)−e(H)} n^h.

Let x and y be vertices of G. Denote by s(x, y) the number of vertices of G adjacent to x and y in the same way: either to both or to neither. Let λi be the eigenvalues of G with |λ1| ≥ |λ2| ≥ ··· ≥ |λn|, and let λ = λ(G) = |λ2|. For two (not necessarily disjoint) subsets B and C, let e(B, C) denote the number of edges from B to C, in which each edge within B ∩ C is counted twice. If B ∩ C = ∅, then e(B, C) is simply the number of edges between B and C.

The quasi-random graphs defined by Chung, Graham and Wilson in fact form a family of simple graphs satisfying any (hence all) of the equivalent properties in the following theorem. It is remarkable that these properties ignore "small" local structures. The expressions of the properties are related to the edge density p; here p = 1/2.

Theorem 5.2 Let G be a sequence of graphs, where G = Gn is a graph of order n. Then the following properties are equivalent:

P1(h): For any fixed h ≥ 4 and any graph H of order h, N*_G(H) ∼ (1/2)^{C(h,2)} n^h.

P2(t): e(G) ∼ n²/4 and N_G(Ct) ≤ (n/2)^t + o(n^t) for any even t ≥ 4.

P3: e(G) ≥ n²/4 + o(n²), λ1 ∼ n/2 and λ2 = o(n).

P4: For each U ⊆ V(G), e(U) = (1/2) C(|U|,2) + o(n²).

P5: For each U ⊆ V(G) with |U| = ⌊n/2⌋, e(U) ∼ n²/16.

P6: ∑_{x,y} |s(x,y) − n/2| = o(n³).

P7: ∑_{x,y} ||N(x) ∩ N(y)| − n/4| = o(n³).

Proof.* The steps of the proof of Chung, Graham and Wilson are P1(h+1) ⇒ P1(h) and

P1(2t) ⇒ P2(2t) ⇒ P2(4) ⇒ P3 ⇒ P4 ⟺ P5 ⇒ P6 ⇒ P1(2t),

so that all properties but P7 are proven equivalent. They then add P7 to the equivalence chain by proving

P2(t) ⇒ P7 ⇒ P6.

Here, we omit some steps but keep most of them to preserve the typical flavor.

Fact 1. P1(h+1) ⇒ P1(h), and P1(3) implies the property

P0: ∑_v |deg(v) − n/2| = o(n²).

Let us remark that P0 is equivalent to

P0′: All but o(n) vertices of G have degree (1 + o(1)) n/2

by the Cauchy–Schwarz inequality, and P0 implies that

e(G) ∼ n²/4.

Assume that P1(h+1) holds. Let H be a graph of order h. There are 2^h ways to extend it to a graph H′ of order h + 1, and each copy of H is contained in n − h subgraphs of order h + 1. By P1(h+1), we have

N*_G(H′) ∼ n^{h+1} 2^{−C(h+1,2)},

thus

N*_G(H) ∼ n^{h+1} 2^{−C(h+1,2)} · 2^h/(n − h) ∼ n^h 2^{−C(h,2)},

which is P1(h). Suppose that G satisfies P1(3). Let Hi be the graph of order 3 with i edges, 1 ≤ i ≤ 3. By counting how often each edge contributes to the various N*_G(Hi), we have

(n−2) ∑_v deg(v) = N*_G(H1) + 2N*_G(H2) + N*_G(H3) ∼ n³/2,

thus ∑_v deg(v) ∼ n²/2 and e(G) ∼ n²/4. Also

∑_v deg(v)(deg(v) − 1) = N*_G(H2) + N*_G(H3) ∼ n³/4,

implying that ∑_v deg²(v) ∼ n³/4. Then, by Cauchy–Schwarz,

∑_v |deg(v) − n/2| ≤ √n (∑_v |deg(v) − n/2|²)^{1/2} = √n (∑_v deg²(v) − n ∑_v deg(v) + n³/4)^{1/2},

which is o(n²).

Fact 2. P1(2t) ⇒ P2(2t) (t ≥ 2). Fact 1 has shown that e(G) ∼ n²/4. We then show that

N_G(C_{2t}) = ∑_{H′} N*_G(H′) ≤ (1 + o(1)) (n/2)^{2t}.

As H′ ranges over all graphs on V(C_{2t}) obtained from C_{2t} by adding to it a set of edges, the number of such sets is 2^{C(2t,2)−2t}. This and P1(2t) imply P2(2t).

Fact 3. P2(2t) ⇒ P2(4) ⇒ P3. There is nothing to prove for the first implication, so we prove the second. Let A be the adjacency matrix of G and d the average degree of G. We first claim that λ1 ≥ d. Let us verify that for any unit vector X, λ1 ≥ XᵗAX. Let Λ be the diagonal matrix with diagonal entries λ1, λ2, ..., λn and P an orthogonal matrix such that PAPᵗ = Λ. Then PX is a unit vector, and

λ1 = λ1 (PX)ᵗ(PX) ≥ (PX)ᵗΛ(PX) = Xᵗ(PᵗΛP)X = XᵗAX.

By taking X = J/√n, where J = (1, 1, ..., 1)ᵗ, we obtain

λ1 ≥ (1/n) JᵗAJ = (1/n) ∑_v deg(v) = d

as claimed. This and e(G) ∼ n²/4 imply λ1 ≥ n/2 + o(n). Next, consider the trace of A⁴. Clearly,

tr(A⁴) = ∑_{i=1}^{n} λi⁴ ≥ λ1⁴ ≥ (1 + o(1)) n⁴/16.

On the other hand, this trace is precisely the number of labeled closed walks of length 4 in G, i.e., the number of sequences (v0, v1, v2, v3, v4 = v0) such that each vi v_{i+1} is an edge. This number is N_G(C4) plus the number of degenerate sequences, in which v2 = v0 or v3 = v1; the number of degenerate sequences is O(n³). Thus

∑_{i=1}^{n} λi⁴ = N_G(C4) + o(n⁴) ∼ (n/2)⁴.

It follows that tr(A⁴) ∼ n⁴/16, thus λ1 ∼ n/2 and ∑_{i=2}^{n} λi⁴ = o(n⁴), hence λ2 = o(n) as desired.


Fact 4. P3 ⇒ P4. To simplify the proof, we suppose that G is regular. Then Fact 4 follows from Corollary 5.2 in the next section.

Fact 5. P4 ⟺ P5. The implication P4 ⇒ P5 is immediate, so we show P5 ⇒ P4. By ignoring one vertex if necessary, we may assume that n is even so that n/2 is an integer. Suppose that for any subset S with |S| = n/2, |e(S) − n²/16| < εn², where ε > 0 is fixed. We shall show that for any subset T,

|e(T) − (1/2) C(t,2)| < 20εn²,

where t = |T|. Let us consider two cases.

Case 1. t = |T| ≥ n/2. By averaging over all S ⊆ T with |S| = n/2, we have

e(T) = C(t−2, n/2−2)^{−1} ∑ {e(S) : S ⊆ T, |S| = n/2},

as each edge is counted exactly C(t−2, n/2−2) times. Thus

e(T) ≤ (C(t, n/2)/C(t−2, n/2−2)) (n²/16 + εn²) ≤ C(t,2) (1/2 + 9ε).

Similarly,

e(T) ≥ C(t,2) (1/2 − 9ε).

Case 2. t = |T| < n/2. We shall show that the assumption

e(T) ≥ (1/2) C(t,2) + 20εn²

leads to a contradiction. Set T̄ = V \ T. Then |T̄| = n − t > n/2, and by Case 1 we have

C(n−t,2) (1/2 − 9ε) < e(T̄) < C(n−t,2) (1/2 + 9ε).

Consider the average value A of e(T ∪ T′), where T′ ranges over all subsets of T̄ with |T′| = n/2 − t, so that |T ∪ T′| = n/2:

A = C(n−t, n/2−t)^{−1} ∑ {e(T ∪ T′) : T′ ⊆ T̄, |T′| = n/2 − t}.


Counting how much different edges contribute to the sum, we see that the sum equals

e(T) C(n−t, n/2−t) + e(T̄) C(n−t−2, n/2−t−2) + e(T, T̄) C(n−t−1, n/2−t−1).

From the fact that e(T, T̄) = e(G) − e(T) − e(T̄), we obtain

A = (n/2)/(n−t) · e(T) − ((n/2−t)(n/2))/((n−t)(n−t−1)) · e(T̄) + ((n/2−t)/(n−t)) · e(G),

which satisfies

A ≥ (n/2)/(n−t) · ((1/2) C(t,2) + 20εn²) − ((n/2−t)(n/2))/((n−t)(n−t−1)) · C(n−t,2) (1/2 + 9ε) + ((n/2−t)/(n−t)) · C(n,2) (1/2 − 9ε) > n²/16 + εn²,

contradicting the assumption that |e(S) − n²/16| < εn² for every S with |S| = n/2. Similarly, the assumption

e(T) < (1/2) C(t,2) − 20εn²

leads to a contradiction with the property P5, too. 2

A property is called a quasi-random property for p = 1/2 if it is equivalent to the properties in Theorem 5.2. It is surprising that P2(4), which seems to be much weaker, is a quasi-random property for p = 1/2.

Theorem 5.3 The property

P2(4): e(G) ∼ n²/4 and N_G(C4) ≤ (n/2)⁴ + o(n⁴)

is a quasi-random property for p = 1/2.

Proof. See Fact 3 in the proof of the last theorem. 2

Some other properties can be added to the list, one of which is in the next theorem.


Theorem 5.4 The property

P8: For all U, V ⊆ V(G), e(U, V) = (1/2)|U||V| + o(n²)

is a quasi-random property for p = 1/2.

Proof. Let us prove the result via P4 ⟺ P8. Since P8 implies P4 by taking V = U, it suffices to show that P4 ⇒ P8. Suppose that P4 holds. If U and V are disjoint, then

e(U, V) = e(U ∪ V) − e(U) − e(V) = (1/4)(u + v)² − (1/4)u² − (1/4)v² + o(n²) = (1/2)uv + o(n²),

where u = |U| and v = |V|. In case U and V are not disjoint, write |U ∩ V| = x. From P4 and what we just proved, e(U, V) equals

e(U \ V, V \ U) + e(U ∩ V, U \ V) + e(U ∩ V, V \ U) + 2e(U ∩ V)
= (1/2)(u−x)(v−x) + (1/2)x(u−x) + (1/2)x(v−x) + (1/2)x² + o(n²)
= (1/2)uv + o(n²),

which is P8. 2

The following theorem is for general edge density p; here 0 < p < 1 is fixed.

Theorem 5.5 Let G be a sequence of graphs, where G = Gn is a graph of order n, and let 0 < p < 1 be fixed. Then the following properties are equivalent:

P1(h): For any fixed h ≥ 4 and graph H of order h,

N*_G(H) ∼ p^{e(H)} (1−p)^{C(h,2)−e(H)} n^h.

P2(t): e(G) ∼ pn²/2 and N_G(Ct) ≤ (pn)^t + o(n^t) for any even t ≥ 4.

P3: e(G) ≥ pn²/2 + o(n²), λ1 ∼ pn and λ2 = o(λ1).

P4: For each U ⊆ V(G), e(U) = p C(|U|,2) + o(n²).

P5: For each U ⊆ V(G) with |U| = ⌊n/2⌋, e(U) ∼ (p/8)n².

P6: ∑_{x,y} |s(x,y) − (p² + (1−p)²)n| = o(n³).

P7: ∑_{x,y} ||N(x) ∩ N(y)| − p²n| = o(n³).

5.2 Paley Graphs

Let q be a prime power. An element a ∈ F(q) is said to be quadratic if a = b² for some b ∈ F(q). A quadratic element of Zp = F(p) is usually called a quadratic residue (mod p) to signify the modular arithmetic.

Let us define a function χ(x) on F(q) by

χ(x) = x^{(q−1)/2}.

This function is usually called the quadratic residue character of F(q).

Lemma 5.1 Let q be an odd prime power. Then χ(x) ∈ {−1, 0, 1}. If x ≠ 0, then χ(x) = 1 if and only if x is quadratic; namely,

χ(x) = 1 if x is quadratic and x ≠ 0, χ(x) = 0 if x = 0, and χ(x) = −1 if x is not quadratic.

Furthermore, half of the elements of F*(q) are quadratic, and half of them are non-quadratic.

Proof. Let x be an element of F*(q). Then χ(x) = ±1, as

(χ(x) − 1)(χ(x) + 1) = χ²(x) − 1 = x^{q−1} − 1 = 0.

Let ν be a primitive element of F(q). Then

F*(q) = {ν, ν², ..., ν^{q−2}, ν^{q−1} = 1}.

As ν is primitive, it is not quadratic and χ(ν) ≠ 1, hence χ(ν) = −1. The set of non-zero quadratic elements is

S0 = {ν², ν⁴, ..., ν^{q−1} = 1},

and the set of non-quadratic elements is

S1 = {ν, ν³, ..., ν^{q−2}}.

Using the facts that χ(ν) = −1 and χ(ν^k) = χ^k(ν), we have χ(x) = 1 if and only if x ∈ S0, as claimed. 2
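Lemma 5.1 can be verified directly in a small field. The sketch below (an illustration for q = 13; since 13 is prime, F(13) is just arithmetic mod 13) computes χ(x) = x^{(q−1)/2} by modular exponentiation, with q − 1 representing −1.

```python
# Illustration (assumption: q = 13, a prime, so F(13) = Z/13).
q = 13
chi = {x: pow(x, (q - 1) // 2, q) for x in range(q)}

squares = {(b * b) % q for b in range(1, q)}     # non-zero quadratic elements
assert chi[0] == 0
for x in range(1, q):
    assert chi[x] in (1, q - 1)                  # q - 1 represents -1 mod q
    assert (chi[x] == 1) == (x in squares)       # chi = 1 iff x is quadratic
assert len(squares) == (q - 1) // 2              # exactly half are quadratic
```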

Note that, asymptotically, half of the primes p ≤ n are of the form p ≡ 1 (mod 4), and half are of the form p ≡ 3 (mod 4).

The Paley graph Pq is defined as follows. Let q ≡ 1 (mod 4) be a prime power. The vertex set of Pq is F(q), and distinct vertices x and y are adjacent if and only if

χ(x − y) = (x − y)^{(q−1)/2} = 1.

So x and y are adjacent if and only if x − y is a non-zero quadratic element. Note that χ(x − y) = χ(y − x), as χ(−1) = 1 since q ≡ 1 (mod 4).

Let A be an additive group and let S be an inverse-closed subset of A*. A graph, called the Cayley graph with respect to S, is defined as follows: its vertex set is A, and u and v are adjacent if u − v ∈ S. Clearly, the Paley graph is the Cayley graph with respect to the subset of non-zero quadratic elements. As an example, it is easy to verify that P5 is C5, which is the Ramsey graph for r(3, 3).

A graph G of order n is said to be a strongly regular graph with parameters (n, d, λ, μ), denoted by srg(n, d, λ, μ) for short, if it is d-regular, and any pair of vertices has λ common neighbors if they are adjacent, and μ common neighbors otherwise. For example, C5 is an srg(5, 2, 0, 1). Strongly regular graphs were introduced by Bose (1963). It is easy to see that the complement of an srg is also an srg.

Proposition 5.1 Let G be an srg(n, d, λ, μ). Then its complement is also an srg(n, d1, λ1, μ1), where

d1 = n − d − 1,
λ1 = n − 2d + μ − 2,
μ1 = n − 2d + λ.


Proof. The value of d1 is determined by d + d1 = n − 1. Let u and v be distinct vertices of G. If they are non-adjacent, then |N(u) ∪ N(v)| = 2d − μ. The remaining n − 2d + μ − 2 vertices (other than u and v) are the common neighbors of u and v in the complement, giving λ1 as claimed. If u and v are adjacent, then {u, v} ⊆ N(u) ∪ N(v) and |N(u) ∪ N(v)| = 2d − λ. The remaining n − 2d + λ vertices are common neighbors of u and v in the complement, yielding μ1 as claimed. 2

For vertex-disjoint graphs G and H, let G ∪ H be the graph with vertex set V(G) ∪ V(H) and edge set E(G) ∪ E(H), which is called the union of G and H. Let mG be the union of m copies of G. The union mKk is an srg(mk, k−1, k−2, 0). On the other hand, if G is an srg(n, k, λ, 0), then G is a union of complete graphs of the same order. We sometimes exclude complete and empty graphs as srgs, to avoid having to define μ and λ, respectively. A relation among the parameters is as follows.

Proposition 5.2 Let G be an srg(n, d, λ, μ). Then

d(d − λ − 1) = μ(n − d − 1).

Proof. Let v be a vertex and let M(v) be the set of non-neighbors of v. Consider the partition V(G) = {v} ∪ N(v) ∪ M(v). By definition, N(v) contains d vertices, and M(v) contains n − d − 1 vertices. Each vertex of N(v) is adjacent to λ vertices in N(v), and hence to d − λ − 1 vertices in M(v). Each vertex of M(v) is adjacent to μ vertices in N(v). Counting the edges between N(v) and M(v) in two ways, the required equality follows. 2

A graph G is called vertex-transitive if for any two vertices a and b of G there is an automorphism mapping a to b; it is called edge-transitive if for any two edges ab and uv of G there is an automorphism mapping {a, b} to {u, v}.

Theorem 5.6 Let q ≡ 1 (mod 4) be a prime power. Then the Paley graph Pq is an

srg(q, (q−1)/2, (q−5)/4, (q−1)/4).

Furthermore, it is self-complementary, vertex-transitive and edge-transitive.


Proof. Lemma 5.1 implies that Pq is (q−1)/2-regular. Note that ∑_x χ(x) = 0, and the number of common neighbors of two distinct vertices a and b is

∑_{x≠a,b} ((1 + χ(x−a))/2) · ((1 + χ(x−b))/2) = (q−2)/4 − χ(a−b)/2 + (1/4) ∑_{x≠a,b} χ(x−a)χ(x−b).

Using the fact that for x ≠ b, χ(x−b) = χ((x−b)^{−1}), and the multiplicativity of χ, we can write the last sum as

(1/4) ∑_{x≠a,b} χ((x−a)/(x−b)) = (1/4) ∑_{x≠0,1} χ(x) = −1/4.

Thus the number of common neighbors of a and b is (q−3)/4 − χ(a−b)/2, which is (q−5)/4 if a and b are adjacent and (q−1)/4 otherwise.

Fix a ∈ F*(q) with χ(a) = −1, and define a map φ0 by

φ0: V(Pq) → V(Pq), φ0(x) = ax.

Then φ0 is an isomorphism between Pq and its complement. Hence Pq is self-complementary.

It is easy to verify that the map φ1(x) = a + b − x is an automorphism mapping a to b, and that the map φ2(x) = ((u−v)/(a−b))(x−b) + v is an automorphism mapping an edge ab to an edge uv. 2

Thus Pq is (q−1)/2-regular, and the distinct eigenvalues of Pq are (q−1)/2, (√q − 1)/2 and −(√q + 1)/2. Therefore,

e(Pq) = q(q−1)/4 ∼ q²/4, λ1 = (q−1)/2 ∼ q/2, λ = (√q + 1)/2 = o(q).

Thus Pq satisfies the quasi-random property P3, hence all other quasi-random properties with p = 1/2.


5.3 Graph with small second eigenvalue

The last section was devoted to quasi-random graphs of fixed edge density. Let us now switch to the case of density p = p(n) = o(1), which is more important for some applications.

In applications, we shall allow the graphs to be semi-simple, that is, each vertex carries at most one loop. When p → 0, the situation is significantly more complicated, as revealed by Chung and Graham (2002). The first remarkable fact is that the properties defined for quasi-random graphs with fixed edge density may no longer be equivalent. Let E^o_q be the Erdős–Rényi graph of order n = q² + q + 1. The graph is (q+1)-regular, in which q + 1 vertices have loops (each such vertex has one). So the edge density is p ∼ 1/√n. We have found in Chapter 9 that λ1 = q + 1 ∼ pn and λ = √q = o(λ1). So the property P3 holds. However,

p⁴(1−p)² n⁴ ∼ n²,

and thus the property P1(4) does not hold, as E^o_q does not contain C4.

Recall that in the quasi-random property P3, the magnitude of λ = λ(G) is a measure of quasi-randomness. As termed by Alon, a graph G is an (n, d, λ)-graph if G is d-regular with n vertices and

λ = λ(G) = max{|λi| : 2 ≤ i ≤ n},

where λ1 = d and λ2, ..., λn are the eigenvalues of G. Here he connected quasi-randomness to the eigenvalue gap. For sparse graphs with p = o(1), Chung and Graham (2002) found some equivalent properties under certain conditions. One of the properties is that λ1 ∼ pn and λ = o(λ1).

We shall present more results on (n, d, λ)-graphs, which are due to Alon et al., particularly Alon and Spencer (2008). For two (not necessarily disjoint) subsets B and C, we have defined e(B, C) as the number of ordered pairs (u, v) with u ∈ B, v ∈ C and uv ∈ E. If G is simple, then e(B, C) is the same as defined in the last section, i.e., it counts each edge from B \ C to C \ B once and each edge within B ∩ C twice. When G is semi-simple, it also counts each loop in B ∩ C once. For disjoint subsets B and C in a random graph, e(B, C) is expected to be (d/n)|B||C|, which is close to the right-hand side of the inequality in the following theorem if λ is much smaller than d.


Theorem 5.7 Let G = (V, E) be a semi-simple (n, d, λ)-graph. Then for each partition of V into disjoint subsets B and C,

e(B, C) ≥ (d − λ)|B||C|/n.

Proof. Let A be the adjacency matrix of G and I the identity matrix of order n. Observe that for any real vector x of dimension n (as a real-valued function on V), we have

((dI − A)x, x) = ∑_{u∈V} (d x_u² − ∑_{v: uv∈E} x_v x_u) = d ∑_{u∈V} x_u² − 2 ∑_{uv∈E} x_v x_u = ∑_{uv∈E} (x_u − x_v)².

Set b = |B| and c = |C| = n − b. Define a vector x = (x_v) by x_v = −c for v ∈ B and x_v = b for v ∈ C.

Note that dI − A and A have the same eigenvectors, and that the eigenvalues of dI − A are precisely d − μ as μ ranges over all eigenvalues of A. Also, d is the largest eigenvalue of A, corresponding to the eigenvector J = (1, 1, ..., 1)ᵗ, and (x, J) = 0. Hence x is orthogonal to the eigenvector of the smallest eigenvalue of dI − A.

Since dI − A is a symmetric matrix, its eigenvectors are mutually orthogonal and form a basis of the n-dimensional space, so x is a linear combination of the eigenvectors other than J/√n. Together with the fact that d − λ is at most the second smallest eigenvalue of dI − A, this gives

((dI − A)x, x) ≥ (d − λ)(x, x) = (d − λ)(bc² + cb²) = (d − λ)bcn.

However, as B and C form a partition of V,

∑_{uv∈E} (x_u − x_v)² = e(B, C)(b + c)² = e(B, C)n²,

implying the desired inequality. 2
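Theorem 5.7 can be checked exhaustively on a small (n, d, λ)-graph. The sketch below (an illustration: it takes G = P13, which is a (13, 6, (√13 + 1)/2)-graph by the eigenvalues noted in the last section) verifies the inequality over every partition of the vertex set.

```python
from math import sqrt

# Illustration (assumption: G = P_13, a (13, 6, (sqrt(13)+1)/2)-graph).
q = 13
squares = {(b * b) % q for b in range(1, q)}
adj = [[(x - y) % q in squares for y in range(q)] for x in range(q)]

n, d = q, (q - 1) // 2
lam = (sqrt(q) + 1) / 2       # second largest absolute eigenvalue of P_q

# all 2^13 - 2 partitions (B, C) with both parts non-empty
for mask in range(1, 2 ** n - 1):
    B = [v for v in range(n) if mask >> v & 1]
    C = [v for v in range(n) if not mask >> v & 1]
    eBC = sum(adj[u][v] for u in B for v in C)
    assert eBC >= (d - lam) * len(B) * len(C) / n - 1e-9
```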

The next theorem bounds a kind of variance. In a random d-regular graph, we expect a vertex v to have (d/n)|B| neighbors in B. The theorem shows that if λ is small, then |NB(v)| is not too far from this expectation for most vertices v, where NB(v) = N(v) ∩ B.


Theorem 5.8 Let G = (V, E) be a semi-simple (n, d, λ)-graph. Then for each B ⊆ V,

∑_{v∈V} (|NB(v)| − (d/n)|B|)² ≤ λ² |B|(n − |B|)/n.

Proof. Let A be the adjacency matrix of G. Define a vector f : V → R by

f_u = 1 − b/n if u ∈ B,  and  f_u = −b/n if u ∉ B,

where b = |B|. Then ∑_u f_u = 0, and f is orthogonal to the eigenvector J = (1, 1, . . . , 1)ᵀ of the largest eigenvalue d of A. Thus f is a linear combination of eigenvectors other than J, and

(Af, Af) ≤ λ²(f, f) = λ² b(n − b)/n.

Let A_v be the row of A corresponding to vertex v. Then the coordinate (Af)_v of Af at v is

A_v f = (1 − b/n)|N_B(v)| − (b/n)(d − |N_B(v)|) = |N_B(v)| − db/n,

and thus

(Af, Af) = ∑_v ( |N_B(v)| − db/n )²,

and the desired inequality follows. □
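As a quick numerical illustration (again assuming the Petersen graph as the test case with λ = 2), the following sketch checks the variance bound of Theorem 5.8 over every subset B.

```python
# Sanity check of Theorem 5.8 on the Petersen graph, an (n, d, lambda) = (10, 3, 2)
# graph; the example graph is an assumption of this illustration.
from itertools import combinations

n, d, lam = 10, 3, 2
adj = {v: set() for v in range(n)}
for u, v in ([(i, (i + 1) % 5) for i in range(5)] +          # outer 5-cycle
             [(i, i + 5) for i in range(5)] +                # spokes
             [(i + 5, (i + 2) % 5 + 5) for i in range(5)]):  # inner pentagram
    adj[u].add(v); adj[v].add(u)

for r in range(n + 1):
    for B in combinations(range(n), r):
        Bs = set(B)
        lhs = sum((len(adj[v] & Bs) - d * len(Bs) / n) ** 2 for v in range(n))
        assert lhs <= lam ** 2 * len(Bs) * (n - len(Bs)) / n + 1e-9
print("Theorem 5.8 holds for all subsets B")
```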

Corollary 5.1 Let G = (V,E) be a semi-simple (n, d, λ)-graph. Then for every two subsets B and C of G, we have

| e(B,C) − (d/n)|B||C| | ≤ λ√(|B||C|).

Proof. Set b = |B| and c = |C|. Note that

| e(B,C) − dbc/n | = | ∑_{v∈C} ( |N_B(v)| − db/n ) | ≤ ∑_{v∈C} | |N_B(v)| − db/n | ≤ √c [ ∑_{v∈C} ( |N_B(v)| − db/n )² ]^{1/2},

where the Cauchy-Schwarz inequality is used. From Theorem 5.8, we have

| e(B,C) − dbc/n | ≤ √c [ ∑_{v∈V} ( |N_B(v)| − db/n )² ]^{1/2} ≤ λ √c √(b(1 − b/n)) ≤ λ√(bc),

as desired. □

Let e(B) and ℓ(B) be the number of edges and loops in B, respectively. Then

e(B,B) = 2e(B) + ℓ(B).

Note that ℓ(B) ≤ |B| if G is semi-simple.

Corollary 5.2 Let G = (V,E) be a semi-simple (n, d, λ)-graph, and let B be a subset of G. Then

| e(B) − (d/2n)|B|² | ≤ ((λ + 1)/2)|B|.

Remark. By setting e(B) = 0, we have α(G) ≤ ((λ + 1)/d)n, which is slightly weaker than a similar bound obtained in Chapter 9.

For an (n, d, λ)-graph G = (V,E) and B ⊆ V, define B̄ as the set of vertices u for which the proportion of N(u) in B, namely |N_B(u)|/|B|, is at most half of that in V. Then |B||B̄| is at most Θ(n²/d) if λ = Θ(√d).

Corollary 5.3 Let G = (V,E) be a semi-simple (n, d, λ)-graph and B ⊆ V. Define

B̄ = { u ∈ V : |N_B(u)| ≤ (d/2n)|B| },

where N_B(u) = N(u) ∩ B. Then

|B||B̄| ≤ (2λn/d)².

Consequently, |B ∩ B̄| ≤ 2λn/d.


Proof. From Theorem 5.8, we have

∑_{v∈V} ( |N_B(v)| − (d/n)|B| )² ≤ λ² |B|(n − |B|)/n ≤ λ²|B|.

Each v ∈ B̄ contributes at least ( d|B|/(2n) )² to the left-hand side, thus

|B̄| ( d|B|/(2n) )² ≤ λ²|B|,

implying the claim. □

For an (n, d, λ)-graph, the spectral gap between d and λ is a measure of its quasi-randomness: the smaller λ is compared to d, the closer the edge distribution is to the ideal uniform distribution. How small can λ be?

Theorem 5.9 Let G be an (n, d, λ)-graph and let ε > 0. If d ≤ (1 − ε)n, then

λ ≥ √(εd).

Proof. Let A be the adjacency matrix of G. Then

nd = 2e(G) = tr(A²) = ∑_{i=1}^n λᵢ² ≤ d² + (n − 1)λ² ≤ (1 − ε)nd + nλ²,

from which the claim follows. □

On this estimate we can say, loosely, that an (n, d, λ)-graph with λ = O(√d) has good quasi-randomness. Recall a result in Chapter 2: if G is an srg(n, d, µ1, µ2) with n ≥ 3, then, apart from λ1 = d, the eigenvalues are solutions of the equation

λ² + (µ2 − µ1)λ + (µ2 − d) = 0.

Thus when µ1 − µ2 is small compared to d, which implies that λ is close to √d, G has good quasi-randomness.


5.4 Erdos-Renyi graphs

The starting point of problems involving complete bipartite graphs is usually C4 = K_{2,2}. We begin with a construction of Erdos and Renyi (1962) of a graph that contains no C4.

Let F = F(q) be the Galois field with q elements. Define an equivalence relation ≡ on (F³)* = F³ \ {(0, 0, 0)} by letting (a1, a2, a3) ≡ (b1, b2, b3) if there is λ ∈ F* = F \ {0} such that (a1, a2, a3) = λ(b1, b2, b3). Let 〈a1, a2, a3〉 denote the equivalence class containing (a1, a2, a3), and let V be the set of all equivalence classes.

Define a graph E_q on vertex set V by letting distinct vertices 〈a1, a2, a3〉 and 〈x1, x2, x3〉 be adjacent if and only if

a1x1 + a2x2 + a3x3 = 0.

This definition is clearly compatible, i.e., it does not depend on the choice of representatives of the equivalence classes. It is trivial to see that

|V| = (q³ − 1)/(q − 1) = q² + q + 1.

For a vertex A = 〈a1, a2, a3〉, since a1x1 + a2x2 + a3x3 = 0 has q² − 1 non-zero solutions, forming q + 1 vertices,

deg(A) = q if a1² + a2² + a3² = 0, and deg(A) = q + 1 otherwise.

We now come to the most important fact about E_q: it contains no C4.

Theorem 5.10 The graph E_q contains no C4.

Proof. Let 〈a1, a2, a3〉 and 〈b1, b2, b3〉 be distinct vertices. Then the vectors (a1, a2, a3) and (b1, b2, b3) are linearly independent. Consider the system

a1x1 + a2x2 + a3x3 = 0,
b1x1 + b2x2 + b3x3 = 0,

which has exactly q − 1 non-zero solutions, forming only one vertex. Hence any two distinct vertices have at most one common neighbor, and the desired assertion follows. □
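The construction is easy to reproduce for a small prime field. The following Python sketch (an illustration; the choice q = 3 is an assumption) builds E_q from canonical projective representatives, checks the vertex count and the degrees, and confirms C4-freeness by verifying that any two vertices share at most one common neighbor.

```python
from itertools import combinations

q = 3  # a prime, so F(q) is arithmetic mod q
# Projective points <a1,a2,a3>: representatives whose first nonzero entry is 1.
pts = [(a, b, c) for a in range(q) for b in range(q) for c in range(q)
       if (a, b, c) != (0, 0, 0) and next(x for x in (a, b, c) if x) == 1]
assert len(pts) == q * q + q + 1

dot = lambda u, v: sum(x * y for x, y in zip(u, v)) % q
adj = {u: {v for v in pts if v != u and dot(u, v) == 0} for u in pts}

# Degrees: q on the conic x1^2 + x2^2 + x3^2 = 0, and q + 1 off it.
for u in pts:
    assert len(adj[u]) == (q if dot(u, u) == 0 else q + 1)

# C4-freeness: any two distinct vertices share at most one common neighbor.
for u, v in combinations(pts, 2):
    assert len(adj[u] & adj[v]) <= 1
print("E_3 is C4-free on", len(pts), "vertices")
```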


Let n = q² + q + 1 and let e(E_q) be the number of edges of E_q. Then, as q → ∞,

ex(n; C4) ≥ e(E_q) ≥ (1 − o(1)) (1/2) n^{3/2}.

Let us relate the graph E_q to a more general structure, a (q + 1)-uniform hypergraph (X, L) called a projective plane. The members of L are called lines, and the order of such a plane does not mean the cardinality of X. We will need projective planes in the next section.

A projective plane of order q consists of a set X of q² + q + 1 elements called points, and a family L of subsets of X called lines, with the following properties:

(P1) Every line has q + 1 points.
(P2) Any pair of distinct points lies on a unique line.

The only possible projective plane of order q = 1 is a triangle. The unique projective plane of order q = 2 is the famous Fano plane, which has 7 points and 7 lines, each line containing 3 points; see Fig. 9.1.

[Figure: the seven points of the Fano plane, joined by six straight lines and one circle.]

Fig. 9.1. The Fano plane

Additional properties of projective planes are as follows.

Corollary 5.4 A projective plane of order q has the following properties.

(P3) Every point lies on q + 1 lines.
(P4) There are q² + q + 1 lines.
(P5) Any two lines meet in a unique point.

Proof. Fix a point x ∈ X. There are q(q + 1) points different from x; each line through x contains q further points, and there are no other overlaps between these lines (apart from x). So the q(q + 1) points of X \ {x} are partitioned into equal parts by these lines, and therefore there must be q + 1 lines through x.

To show (P4), count in two ways the pairs (x, L) with x ∈ L: we obtain |L|(q + 1) = (q² + q + 1)(q + 1), so |L| = q² + q + 1.

Finally, let L1 and L2 be distinct lines, and let x be a point of L1 not on L2. Then the q + 1 points of L2 are joined to x by distinct lines. Since there are only q + 1 lines through x, every line through x meets L2 in a point. In particular, L1 meets L2 in a point. □
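Properties (P1)-(P5) can be confirmed directly on a small example. The sketch below (illustrative; it assumes q = 3 and the line construction of PG(2, q) described later in this section, namely L(a) = {x : a·x = 0}) checks all five axioms.

```python
from itertools import combinations

q = 3
pts = [(a, b, c) for a in range(q) for b in range(q) for c in range(q)
       if (a, b, c) != (0, 0, 0) and next(x for x in (a, b, c) if x) == 1]
dot = lambda u, v: sum(x * y for x, y in zip(u, v)) % q
# The line L(a) consists of all projective points x with a.x = 0.
lines = {frozenset(x for x in pts if dot(a, x) == 0) for a in pts}

assert all(len(L) == q + 1 for L in lines)                 # (P1)
for u, v in combinations(pts, 2):                          # (P2)
    assert sum(1 for L in lines if u in L and v in L) == 1
for u in pts:                                              # (P3)
    assert sum(1 for L in lines if u in L) == q + 1
assert len(lines) == q * q + q + 1                         # (P4)
for L1, L2 in combinations(lines, 2):                      # (P5)
    assert len(L1 & L2) == 1
print("PG(2, 3): 13 points, 13 lines, all axioms verified")
```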

A nice property of projective planes is duality. Let (X, L) be a projective plane of order q, and let M = (m_{x,L}) be its incidence matrix, whose rows correspond to points and whose columns correspond to lines. Each row and each column of M has exactly q + 1 ones, and any two rows, as well as any two columns, share exactly one common 1. Hence the transpose of M is again the incidence matrix of a projective plane of order q, the dual plane.

Return to the graph E_q, whose vertex set V consists of the q² + q + 1 points (equivalence classes in (F_q³)*). Let 〈a1, a2, a3〉 be a point of V. Define a line L(a1, a2, a3) to be the set of all points 〈x1, x2, x3〉 in V (not vectors (x1, x2, x3) in (F_q³)*) for which

a1x1 + a2x2 + a3x3 = 0.

It is easy to see that this definition of lines is compatible, and each line contains exactly q + 1 points. Note that some lines L(a1, a2, a3) contain the point 〈a1, a2, a3〉 and some do not. Any pair of distinct points 〈x1, x2, x3〉 and 〈y1, y2, y3〉 lies on the unique line L(a1, a2, a3) with

a1x1 + a2x2 + a3x3 = 0,
a1y1 + a2y2 + a3y3 = 0.

Therefore we obtain a projective plane (V, L), where L consists of all lines so defined. This projective plane is usually denoted by PG(2, q).


Some authors use PG(2, q) to signify the Erdos-Renyi graph, or the bipartite graph whose parts are the points and the lines, in which a point is adjacent to a line if and only if the point lies on the line.

No projective plane whose order is not a prime power is known to exist, and it is conjectured that there is none. It is known that there is no projective plane of order 6, 10 or 14. It is not known whether there is a projective plane of order 12.

We now derive an exact expression for e(E_q).

Lemma 5.2 Let q = p^m with p prime and m odd. Then there are precisely q² − 1 non-zero solutions (x1, x2, x3) of the equation

x1² + x2² + x3² = 0

in F(q), and hence precisely q + 1 vertices of E_q of degree q and q² vertices of degree q + 1.

Proof. Label the vertex set of E_q as

V(E_q) = {A, B, . . . , X, . . . , Y, . . . , Z}

in some order. We write X ⊥ Y if and only if

x1y1 + x2y2 + x3y3 = 0,

where X = 〈x1, x2, x3〉 and Y = 〈y1, y2, y3〉. Let n = q² + q + 1 and define an n × n real matrix M = (m_{ij}) by

m_{ij} = 1 if X ⊥ Y, and m_{ij} = 0 otherwise,

where X and Y represent the ith and jth vertices, associated with the ith and jth rows of M, respectively. Note that the diagonal elements of M differ from those of the adjacency matrix of E_q: we have m_{ii} = 1 if X ⊥ X, that is, if X lies on the conic x1² + x2² + x3² = 0. All that remains is to show that

tr(M) = q + 1,


where tr(M) = ∑_{i=1}^n m_{ii} is the trace of M. We know that the trace equals the sum of the eigenvalues.

Fact 1. Any row of M contains precisely q + 1 ones; hence q + 1 is an eigenvalue of M.

This is because ML = (q + 1)L, where L = (1, 1, . . . , 1)ᵀ.

Fact 2. For i ≠ j, there is exactly one column with 1 in both the ith row and the jth row; namely, Mᵢ · Mⱼ = 1, where Mᵢ and Mⱼ are the ith and jth rows of M, respectively.

Indeed, suppose that X and Y represent the ith and jth rows, respectively. Then there is a unique vertex, say the kth, corresponding to the solutions (w1, w2, w3) of the system

x1w1 + x2w2 + x3w3 = 0,
y1w1 + y2w2 + y3w3 = 0.

That is, m_{ik} = m_{jk} = 1, and since M is symmetric, only in the kth column are the entries of both the ith and jth rows equal to 1.

Using these two facts and the symmetry of M, we have

M² = qI + J,

where I is the identity matrix and J is the all-ones matrix. It is easy to see that J has eigenvalues n = q² + q + 1 (of multiplicity 1) and 0 (of multiplicity n − 1 = q² + q). It follows that M² has eigenvalues q + n = (q + 1)² (of multiplicity 1) and q (of multiplicity n − 1 = q² + q).

Since M is symmetric, it is diagonalizable, with diagonal entries λ1, λ2, . . . , λn, the eigenvalues of M. Hence M² is diagonalizable with diagonal entries λ1², λ2², . . . , λn², implying that λ1² = (q + 1)² and λ2² = · · · = λn² = q, after relabelling the eigenvalues if necessary. So λ1 = q + 1, as q + 1 is an eigenvalue of M, and λᵢ = ±√q for i = 2, . . . , n. Let s and t be the numbers of eigenvalues of M equal to √q and −√q, respectively. Then s + t = n − 1 and

tr(M) = (q + 1) + (s − t)√q.

Since the trace is an integer and √q is irrational (q = p^m with m odd is not a perfect square), we must have s = t; hence tr(M) = q + 1, completing the proof. □

Let us remark that in fact the assertion of the above lemma holds for any prime power q.

Theorem 5.11 The order of the graph E_q is q² + q + 1 and

e(E_q) = (1/2) q(q + 1)²

for any prime power q. □
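The edge count of Theorem 5.11 can be checked by brute force for small prime fields; the following sketch (an illustration, with q = 3 and q = 5 chosen as test primes) recounts the edges of E_q directly.

```python
# Numerical check of Theorem 5.11: e(E_q) = q(q+1)^2/2, for the primes q = 3, 5.
results = {}
for q in (3, 5):
    # canonical projective representatives: first nonzero coordinate equals 1
    pts = [(a, b, c) for a in range(q) for b in range(q) for c in range(q)
           if (a, b, c) != (0, 0, 0) and next(x for x in (a, b, c) if x) == 1]
    dot = lambda u, w, q=q: sum(x * y for x, y in zip(u, w)) % q
    results[q] = sum(1 for i, u in enumerate(pts)
                     for w in pts[i + 1:] if dot(u, w) == 0)
    assert results[q] == q * (q + 1) ** 2 // 2
print(results)   # {3: 24, 5: 90}
```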

Remark. If we attach a loop at each vertex 〈a1, a2, a3〉 with a1² + a2² + a3² = 0, obtaining a graph E_q° from E_q, then E_q° is (q + 1)-regular. The proof of Lemma 5.2 in fact shows that the distinct eigenvalues of E_q° are q + 1, √q and −√q, so λ ∼ √d. These eigenvalues are close to those of E_q.

Lemma 5.3 Let A and B be real symmetric matrices of order n, and let the eigenvalues λᵢ(A), λᵢ(B) and λᵢ(A + B) of A, B and A + B, respectively, be labelled in non-increasing order. Then, for each 1 ≤ i ≤ n, we have

λᵢ(A) + λ1(B) ≥ λᵢ(A + B) ≥ λᵢ(A) + λₙ(B).


Therefore, writing E_q = E_q° − D, where D is the 0-1 diagonal matrix recording the loops, Lemma 5.3 shows that the eigenvalues λᵢ of E_q satisfy q ≤ λ1 ≤ q + 1, that the other positive ones satisfy √q − 1 ≤ λᵢ ≤ √q, and that the negative ones satisfy −√q − 1 ≤ λᵢ ≤ −√q.

Note that E_q does not have a fixed positive edge density. Nevertheless, it has good quasi-randomness, as its second eigenvalue is O(√d).

5.5 Applications of characters ?

We shall find the spectrum of G_{q,t} defined in Chapter 9. Let us define the characters of a finite field F(q), which are group homomorphisms from F(q) or F*(q) to

S¹ = {z : |z| = 1} = {e^{iθ} : 0 ≤ θ < 2π},

respectively, where S¹ is the multiplicative group of complex numbers of modulus one. An additive character of F(q) is a function ψ : F(q) → S¹ such that for any x, y ∈ F(q),

ψ(x + y) = ψ(x)ψ(y).

Clearly ψ(0) = 1 and ψ(−x) = ψ̄(x). The trivial function ψ0 with ψ0(x) ≡ 1 is called the principal additive character of F(q).

A multiplicative character of F(q) is a function χ : F*(q) → S¹ such that for any x, y ∈ F*(q),

χ(xy) = χ(x)χ(y).

Clearly χ(1) = 1 and χ(x⁻¹) = χ̄(x). The trivial function χ0 with χ0(x) ≡ 1 is called the principal multiplicative character of F(q). It is customary to extend the domain of a multiplicative character χ to all of F(q) by setting χ(0) = 0 if χ ≠ χ0, and χ0(0) = 1. The character χ with χ(x) = x^{(q−1)/2} is often called the quadratic residue character.

In the following proofs, we shall not distinguish the elements of F(p) from the integers 0, 1, . . . , p − 1.

Lemma 5.4 The numbers of additive characters and multiplicative characters of F(q) are q and q − 1, respectively.


Proof. Let us begin with the multiplicative group F*(q), which is a cyclic group of order q − 1: F*(q) = {1, µ, . . . , µ^{q−2}}, where µ is a primitive element of F(q). Each multiplicative character χ of F(q) is uniquely determined by χ(µ). From 1 = χ(µ^{q−1}) = χ(µ)^{q−1}, we have χ(µ) = ζ_{q−1}^k for some 0 ≤ k ≤ q − 2, where ζ_{q−1} = e^{2πi/(q−1)}. If we use χ1 to denote the multiplicative character of F(q) with χ1(µ) = ζ_{q−1}, then the set of all multiplicative characters of F(q) is {χ1^k : 0 ≤ k ≤ q − 2}, in which χ1⁰ is the trivial character χ0. Thus F(q) has q − 1 multiplicative characters, forming a group isomorphic to F*(q).

Let q = p^m and let ζ_p = e^{2πi/p}. For each a = (a1, a2, . . . , am) ∈ F^m(p), set

ψ_a : F(q) → S¹,  ψ_a(x) = ζ_p^{a1x1 + a2x2 + · · · + amxm},

where x = (x1, x2, . . . , xm) is the unique expression of x as a vector in F^m(p). Then ψ_a is an additive character of F(q). For a ≠ a′, we show that ψ_a ≠ ψ_{a′}; since ψ_a ψ_{a′}⁻¹ = ψ_{a−a′}, it suffices to show that ψ_a is not the trivial character for a ≠ 0. For a ≠ 0 there is some k with 1 ≤ a_k ≤ p − 1, so for e_k = (0, . . . , 0, 1, 0, . . . , 0), the unit vector with 1 at the kth coordinate, ψ_a(e_k) = ζ_p^{a_k} ≠ 1. Thus the group of additive characters of F(q) contains at least q elements, and hence exactly q, since a finite abelian group of order q has exactly q characters. □

As usual, δ(x, y) is the Kronecker symbol:

δ(x, y) = 1 if x = y, and δ(x, y) = 0 otherwise.

The proof of Lemma 5.4 implies the following result.

Lemma 5.5 Let χ be a multiplicative character of F(q). Then

∑_{t∈F(q)} χ(t) = q δ(χ, χ0).
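As a small numerical illustration of Lemmas 5.4 and 5.5 (with the test field F(7) and the primitive element µ = 3 as assumptions of this example), the following sketch enumerates all q − 1 multiplicative characters and verifies the character-sum identity.

```python
# Numerical check of Lemma 5.5 over F(7); mu = 3 has order 6 mod 7.
import cmath

p = 7
mu = 3
log = {pow(mu, l, p): l for l in range(p - 1)}   # discrete log base mu
zeta = cmath.exp(2j * cmath.pi / (p - 1))

def chi(k, x):
    """The character chi_1^k, extended by chi(0) = 0 for k != 0, chi_0(0) = 1."""
    if x % p == 0:
        return 1.0 if k % (p - 1) == 0 else 0.0
    return zeta ** (k * log[x % p])

for k in range(p - 1):
    s = sum(chi(k, t) for t in range(p))
    assert abs(s - (p if k == 0 else 0)) < 1e-9   # sum = q * delta(chi, chi_0)
print("Lemma 5.5 verified over F(7)")
```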

Let us define the Gaussian sum as

G(χ, ψ) = ∑_{x∈F(q)} χ(x)ψ(x).


Theorem 5.12 Let χ be a multiplicative character and ψ an additive character of F(q). Then G(χ0, ψ0) = q, and G(χ, ψ) = 0 if exactly one of χ and ψ is trivial. Furthermore,

|G(χ, ψ)| = √q

if neither χ nor ψ is trivial.

Proof. The first two equalities are easy, and we shall verify the last. To simplify matters, we first prove it in the case that q is a prime p, which is the most important special case. The proof for general q, which is more involved, will be given afterwards; the reader is welcome to skip it.

Let ψ and χ be an additive character and a multiplicative character of F(p), neither of which is trivial. From the proof of Lemma 5.4, we have ψ(x) = ζ_p^{ax}, where ζ_p = e^{2πi/p} and a ≠ 0. Let g_a(χ) = ∑_{x∈F(p)} χ(x)ζ_p^{ax}, which is G(χ, ψ) on F(p).

We first verify that g_a(χ) = χ̄(a)g_1(χ). This follows from

g_a(χ) = ∑_{x∈F(p)} χ(x)ζ_p^{ax} = ∑_{y∈F(p)} χ(a⁻¹y)ζ_p^y = χ(a⁻¹) ∑_{y∈F(p)} χ(y)ζ_p^y = χ̄(a)g_1(χ),

whence

|g_a(χ)|² = g_a(χ) ḡ_a(χ) = |χ̄(a)|² |g_1(χ)|² = |g_1(χ)|².

That is, |g_a(χ)|² has the same value for every a ≠ 0. On the other hand, for any a ∈ F(p),

g_a(χ) ḡ_a(χ) = ∑_{x∈F(p)} χ(x)ζ_p^{ax} ∑_{y∈F(p)} χ̄(y)ζ_p^{−ay} = ∑_{x,y∈F(p)} χ(x)χ̄(y)ζ_p^{a(x−y)}.

It is easy to see that ∑_{a∈F(p)} ζ_p^{a(x−y)} = p δ(x, y), as a(x − y) ranges over all of F(p) when x − y ≠ 0. This, together with the fact that χ(0) = 0 as χ ≠ χ0, implies that

∑_{a∈F(p)} g_a(χ) ḡ_a(χ) = ∑_{x,y∈F(p)} χ(x)χ̄(y) p δ(x, y) = (p − 1)p.


Since g_0(χ) = 0 as χ ≠ χ0, we obtain (p − 1)|g_1(χ)|² = (p − 1)p; hence |g_a(χ)| = |g_1(χ)| = √p. □
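The equality |G(χ, ψ)| = √p can be observed numerically; the sketch below (an illustration, assuming p = 11 and the primitive root µ = 2) checks it for every non-trivial pair of characters.

```python
# Numerical check that |G(chi, psi)| = sqrt(p) for p = 11 and all nontrivial
# character pairs.
import cmath, math

p = 11
mu = 2
log = {pow(mu, l, p): l for l in range(p - 1)}
zp = cmath.exp(2j * cmath.pi / p)          # psi_a(x) = zp^(a x)
zq = cmath.exp(2j * cmath.pi / (p - 1))    # chi_1^k(x) = zq^(k log x)

for k in range(1, p - 1):                  # nontrivial chi
    for a in range(1, p):                  # nontrivial psi
        G = sum(zq ** (k * log[x]) * zp ** (a * x % p) for x in range(1, p))
        assert abs(abs(G) - math.sqrt(p)) < 1e-7
print("|G(chi, psi)| = sqrt(11) for all 90 nontrivial pairs")
```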

The proof of Theorem 5.12 for general q = p^m

The forms of the additive characters in the proof of Lemma 5.4 are simple, but we shall express them in another way for the proof of Theorem 5.12 in the general case.

For α ∈ F(q) = F(p^m), define the trace of α to be

tr(α) = α + α^p + α^{p²} + · · · + α^{p^{m−1}}.

Lemma 5.6 If α, β ∈ F(q) and a ∈ F(p), then
(a) tr(α) ∈ F(p).
(b) tr(α + β) = tr(α) + tr(β).
(c) tr(aα) = a tr(α).
(d) For fixed α ≠ 0, the map x ↦ tr(αx) sends F(q) onto F(p).

Proof. Properties (a), (b) and (c) follow from the facts that tr(α)^p = tr(α), (α + β)^p = α^p + β^p, α^q = α and a^p = a. To show property (d), note that tr(αx), as a polynomial in x, has at most p^{m−1} roots, while αx ranges over all p^m elements of F(q); hence there is x ∈ F(q) = F(p^m) such that tr(αx) = c ≠ 0, where c ∈ F(p). If b ∈ F(p), then using property (c) we see that tr((b/c)αx) = (b/c) tr(αx) = b. Thus x ↦ tr(αx) is onto. □

For fixed α ∈ F(q), we now define ψ_α : F(q) → S¹ by

ψ_α(x) = ζ_p^{tr(αx)},

where ζ_p = e^{2πi/p}. Note that ψ_0 is the trivial additive character of F(q). In the case q = p, ψ_α(x) = ζ_p^{αx} is exactly what we used above.

Lemma 5.7 The function ψ_α has the following properties.
(a) ψ_α(x + y) = ψ_α(x)ψ_α(y) for any x, y ∈ F(q).
(b) If α ≠ 0, then there is x ∈ F(q) such that ψ_α(x) ≠ 1.
(c) If α ≠ 0, then ∑_{x∈F(q)} ψ_α(x) = 0; if x ≠ 0, then ∑_{α∈F(q)} ψ_α(x) = 0.


Proof. Property (a) follows from tr(α(x + y)) = tr(αx) + tr(αy). Property (b) follows from the fact that x ↦ tr(αx) is onto for α ≠ 0, so there is x ∈ F(q) with tr(αx) = 1, whence ψ_α(x) = ζ_p ≠ 1. As ψ_α(x) = ψ_x(α), we need only verify the first equality in property (c). Let S = ∑_{x∈F(q)} ψ_α(x), and choose y such that ψ_α(y) ≠ 1. Then

ψ_α(y)S = ∑_{x∈F(q)} ψ_α(x)ψ_α(y) = ∑_{x∈F(q)} ψ_α(x + y) = S,

thus S = 0. □

Lemma 5.8 For any fixed α ∈ F(q), the function ψ_α is an additive character of F(q), and every additive character of F(q) is of this form. Furthermore, for any x, y ∈ F(q),

∑_{α∈F(q)} ψ_α(x − y) = q δ(x, y).

Proof. The first assertion follows from property (a) in Lemma 5.7. For the second, since F(q) has exactly q additive characters by Lemma 5.4, it suffices to show that ψ_α and ψ_β are distinct whenever α ≠ β. If ψ_α(x) = ψ_β(x) for all x ∈ F(q), then

ζ_p^{tr((α−β)x)} = ψ_{α−β}(x) = 1

for all x ∈ F(q), implying α = β by property (b) in Lemma 5.7.

Finally,

∑_{α∈F(q)} ψ_α(x − y) = ∑_{α∈F(q)} ζ_p^{tr(α(x−y))},

which equals q for x = y; and for x ≠ y the sum is 0, by the fact that α(x − y) ranges over all of F(q) together with property (c) in Lemma 5.7. □

We now write the Gaussian sum in the form

G(χ, ψ_α) = ∑_{x∈F(q)} χ(x)ψ_α(x).

We shall prove that if χ ≠ χ0 and α ≠ 0, then

|G(χ, ψ_α)| = √q.


Proof. The proof is analogous to that for the case q = p. For any α ≠ 0, we first verify that G(χ, ψ_α) = χ̄(α)G(χ, ψ_1). Indeed,

G(χ, ψ_α) = ∑_{x∈F(q)} χ(x)ζ_p^{tr(αx)} = ∑_{y∈F(q)} χ(α⁻¹y)ζ_p^{tr(y)} = χ(α⁻¹) ∑_{y∈F(q)} χ(y)ζ_p^{tr(y)} = χ̄(α)G(χ, ψ_1).

Therefore |G(χ, ψ_α)|² = |G(χ, ψ_1)|² for α ≠ 0. On the other hand, for any α ∈ F(q),

G(χ, ψ_α) Ḡ(χ, ψ_α) = ∑_{x∈F(q)} χ(x)ζ_p^{tr(αx)} ∑_{y∈F(q)} χ̄(y)ζ_p^{−tr(αy)} = ∑_{x,y∈F(q)} χ(x)χ̄(y)ζ_p^{tr(α(x−y))}.

Since χ(0) = 0 as χ ≠ χ0, Lemma 5.8 gives

∑_{α∈F(q)} G(χ, ψ_α) Ḡ(χ, ψ_α) = ∑_{x,y} χ(x)χ̄(y) q δ(x, y) = q(q − 1).

Observing that G(χ, ψ_0) = 0 as χ ≠ χ0, we also have

∑_{α∈F(q)} G(χ, ψ_α) Ḡ(χ, ψ_α) = ∑_{α∈F*(q)} |G(χ, ψ_1)|² = (q − 1)|G(χ, ψ_1)|²,

yielding (q − 1)|G(χ, ψ_1)|² = q(q − 1); hence |G(χ, ψ_α)| = |G(χ, ψ_1)| = √q. This completes the proof of Theorem 5.12 in the general case. □

The order of a multiplicative character χ is the smallest positive integer d such that χ^d = χ0. A more sophisticated result on character sums is the following theorem of Weil. Let χ be a multiplicative character of F_q = F(q) of order d > 1 and let f(x) be a polynomial over F_q. If f(x) has precisely s distinct zeros and is not of the form c(g(x))^d with c ∈ F_q and g(x) ∈ F_q[x], then

| ∑_{x∈F(q)} χ(f(x)) | ≤ (s − 1)√q.    (5.1)


In particular, the inequality holds when χ is the quadratic residue character and f(x) is not of the form cg²(x) with c ∈ F_q and g(x) ∈ F_q[x]. Similarly, for an additive character ψ ≠ ψ0, if g(x) is a polynomial of degree n < q with gcd(n, q) = 1, then | ∑_{x∈F(q)} ψ(g(x)) | ≤ (n − 1)√q.
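The Weil bound (5.1) is easy to witness numerically in its most common special case; the sketch below (an illustration, with the arbitrary test prime p = 101 and f(x) = x(x − 1)(x + 1), which has s = 3 distinct zeros and is not of the form cg(x)²) checks that the character sum stays within (s − 1)√p.

```python
import math

p = 101

def chi(x):
    """Quadratic residue character: chi(0) = 0, else x^((p-1)/2) = +/-1 mod p."""
    if x % p == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

S = sum(chi(x * (x - 1) * (x + 1)) for x in range(p))
assert abs(S) <= (3 - 1) * math.sqrt(p)   # Weil: |sum| <= (s-1) sqrt(p)
print("character sum", S, "is within the Weil bound", 2 * math.sqrt(p))
```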

The prepared results are enough to introduce the following result of Szabo (2003) on the spectrum of G°_{q,t}, which is constructed in Chapter 9. Note that G°_{q,t} is q^{t−1}(q − 1)-regular, and each vertex (A, a) ∈ F(q^{t−1}) × F*(q) with N(2A) = a² has a loop.

Theorem 5.13 Let t ≥ 2 be an integer and q an odd prime power. The spectrum of G°_{q,t} is as follows.

eigenvalue:   q^{t−1} − 1 | q^{(t−1)/2}            | 1               | 0     | −1              | −q^{(t−1)/2}
multiplicity: 1           | (q^{t−1} − 1)(q − 2)/2 | (q^{t−1} − 1)/2 | q − 2 | (q^{t−1} − 1)/2 | (q^{t−1} − 1)(q − 2)/2

Proof. Let M be the adjacency matrix of G°_{q,t}. Let ψ be an additive character of F(q^{t−1}) and χ a multiplicative character of F(q). Let V(ψ, χ) be the column vector whose coordinates are labelled by the vertices of G°_{q,t}, with entry ψ(X)χ(x) at (X, x). Then the entry of the column vector MV(ψ, χ) at (A, a) is

∑_{(B,b)∈F(q^{t−1})×F*(q), N(A+B)=ab} ψ(B)χ(b) = ∑_{B∈F(q^{t−1})\{−A}} ψ(B) χ(N(A + B)/a)
  = ∑_{C∈F*(q^{t−1})} ψ(C − A) χ(N(C)/a)
  = ( ∑_{C∈F*(q^{t−1})} ψ(C)χ(N(C)) ) ψ̄(A)χ̄(a).

Setting

λ = λ(ψ, χ) = ∑_{C∈F*(q^{t−1})} ψ(C)χ(N(C)),

so that λ(ψ̄, χ̄) = λ̄(ψ, χ), we obtain

MV(ψ, χ) = λ V(ψ̄, χ̄),    (5.2)

Page 103: Topics in Probabilistic Method - jsnu.edu.cnmathsqxx.jsnu.edu.cn/_upload/article/08/2c/e24807884307ab72b309… · Topics in Probabilistic Method Selected mainly from Introduction

100 CHAPTER 5. QUASI-RANDOM GRAPHS

and likewise MV(ψ̄, χ̄) = λ̄ V(ψ, χ). Thus

M²V(ψ, χ) = λλ̄ V(ψ, χ) = |λ|² V(ψ, χ).

Hence V(ψ, χ) is an eigenvector of M² with corresponding eigenvalue |λ(ψ, χ)|².

Observe that in the multiplicative group of additive characters, the inverse ψ⁻¹ of ψ is ψ̄, and the same holds in the group of multiplicative characters. We claim that the eigenvectors of the form V(ψ, χ) are pairwise orthogonal. Let (ψ′, χ′) ≠ (ψ, χ), and set ψ′′ = ψ′ψ⁻¹ = ψ′ψ̄ and χ′′ = χ′χ⁻¹ = χ′χ̄. Then (ψ′′, χ′′) ≠ (ψ0, χ0), where ψ0 and χ0 are the trivial additive and multiplicative characters, respectively. The inner product of the complex vectors V(ψ′, χ′) and V(ψ, χ) is

V(ψ′, χ′)ᵀ V̄(ψ, χ) = ∑_{(X,x)∈F(q^{t−1})×F*(q)} ψ′′(X)χ′′(x) = ∑_{X∈F(q^{t−1})} ψ′′(X) ∑_{x∈F*(q)} χ′′(x) = 0,

as one of the two sums in the last expression is 0. The number of vectors of the form V(ψ, χ) equals the order of G°_{q,t} by Lemma 5.4, hence all eigenvalues of M² are of the form |λ(ψ, χ)|². Therefore, any eigenvalue of M is of the form

±|λ(ψ, χ)| = ± | ∑_{C∈F*(q^{t−1})} ψ(C)χ(N(C)) |.

When ψ = ψ0 and χ = χ0, the corresponding eigenvalue is q^{t−1} − 1, which by the Perron-Frobenius Theorem has multiplicity 1.

Let µ be a primitive element of F(q^{t−1}), and let

A_k = {µ^{k+j(q−1)} : 0 ≤ j ≤ ℓ − 1},

where ℓ = (q^{t−1} − 1)/(q − 1). Then A0, A1, . . . , A_{q−2} form a partition of F*(q^{t−1}) with |A_k| = ℓ, and it is easy to see that N(x) = N(y) whenever x and y lie in the same A_k. Therefore, when ψ = ψ0 and χ ≠ χ0, we have

|λ(ψ0, χ)| = | ∑_{C∈F*(q^{t−1})} χ(N(C)) | = | ℓ ∑_{c∈F*(q)} χ(c) | = 0,


so 0 is an eigenvalue of M with multiplicity q − 2, the number of multiplicative characters of F(q) other than χ0.

When ψ ≠ ψ0 and χ = χ0,

λ(ψ, χ0) = ∑_{C∈F*(q^{t−1})} ψ(C) = −ψ(0) = −1.

So 1 is an eigenvalue of M² with multiplicity q^{t−1} − 1, and ±1 are eigenvalues of M with multiplicities summing to q^{t−1} − 1. Let W(ψ) = V(ψ, χ0) + V(ψ̄, χ0). It follows from (5.2) that MW(ψ) = −W(ψ) for any ψ ≠ ψ0. For any ψ, ψ′ ≠ ψ0 with ψ′ ∉ {ψ, ψ̄}, it is easy to see that the complex vectors W(ψ) and W(ψ′) are orthogonal, so −1 is an eigenvalue of M with multiplicity at least (q^{t−1} − 1)/2. Similarly, by considering V(ψ, χ0) − V(ψ̄, χ0), we see that 1 is an eigenvalue of M with multiplicity at least (q^{t−1} − 1)/2; hence each multiplicity is exactly (q^{t−1} − 1)/2.

When ψ ≠ ψ0 and χ ≠ χ0, observe that χ∘N is a non-trivial multiplicative character of F(q^{t−1}), so λ is a Gaussian sum over F(q^{t−1}); by Theorem 5.12,

|λ| = | ∑_{C∈F*(q^{t−1})} ψ(C)χ(N(C)) | = q^{(t−1)/2}.

Let S and T be the multiplicities of the eigenvalues q^{(t−1)/2} and −q^{(t−1)/2} of M, respectively. As (q^{t−1} − 1)(q − 2) is the number of vectors of the form V(ψ, χ) with ψ ≠ ψ0 and χ ≠ χ0,

S + T = (q^{t−1} − 1)(q − 2).

By definition, the graph G°_{q,t} has a loop at a vertex (A, a) if and only if N(2A) = a². Since exactly (q − 1)/2 elements of F*(q) are squares and the equation N(X) = y has (q^{t−1} − 1)/(q − 1) solutions in X for each fixed y ∈ F*(q), there are (q^{t−1} − 1)/2 elements A ∈ F(q^{t−1}) with N(2A) a non-zero square. Once N(2A) is a non-zero square, there are two distinct elements a, −a ∈ F*(q) with N(2A) = a² = (−a)². Thus G°_{q,t} contains q^{t−1} − 1 loops, which is the trace of M. Hence

q^{t−1} − 1 = tr(M) = ∑_{j=1}^{q^{t−1}(q−1)} λ_j


 = (q^{t−1} − 1) + (q^{t−1} − 1)/2 − (q^{t−1} − 1)/2 + q^{(t−1)/2}(S − T),

implying that S = T = (q^{t−1} − 1)(q − 2)/2. □

The above theorem has the following corollary, which together with the lower bound in Chapter 9 implies α(G_{q,t}) = α(G°_{q,t}) = Θ(q^{(t+1)/2}).

Corollary 5.5 Let t ≥ 2 be an integer and q an odd prime power. Then

α(G_{q,t}) ≤ (q^t − q^{t−1})(q^{(t−1)/2} + 1) / (q^{t−1} + q^{(t−1)/2} − 1) ∼ q^{(t+1)/2} ∼ n^{(t+1)/(2t)},

where n = q^{t−1}(q − 1) is the order of G_{q,t}.

Let us conclude this section with an algebraic construction that almost matches the probabilistic bound r_k(K_{m,n}) ≥ k^m n − n^{1/2+o(1)} of Chapter 5.

Theorem 5.14 Let positive integers k and m be fixed. Then

r_k(K_{m,n}) ≥ k^m n − n^{0.525}

for large n.

Proof. As the assertion is trivial for k = 1, we assume k ≥ 2. Let p ≡ 1 (mod 2k) be a prime and F_p the finite field of p elements. Let µ be a primitive element of F_p. Define a logarithm-like function log_µ : F_p* → Z_{p−1} = {0, 1, . . . , p − 2} by

log_µ(x) = ℓ if x = µ^ℓ, 0 ≤ ℓ ≤ p − 2.

For every j with 0 ≤ j ≤ k − 1, define a graph H_j on vertex set F_p, in which x and y are adjacent in H_j if and only if

log_µ(x − y) ≡ j (mod k).

As p ≡ 1 (mod 2k), we have −1 = µ^{(p−1)/2} with (p − 1)/2 ≡ 0 (mod k), and thus log_µ(x − y) ≡ log_µ(y − x) (mod k), so the definition is compatible. In the case k = 2, the graph H0 is the Paley graph.


Lemma 5.9 Let k ≥ 2 be an integer and p ≡ 1 (mod 2k) a prime. Let H_j, 0 ≤ j ≤ k − 1, be the graphs defined with respect to a primitive element µ of F_p. Then these H_j are pairwise isomorphic.

Proof. We verify that each H_j is isomorphic to H0. Define a bijection φ on F(p) by φ(z) = µ^j z. Then {x, y} is an edge of H0 if and only if x − y = µ^ℓ for some ℓ ≡ 0 (mod k). As φ(x) − φ(y) = µ^{j+ℓ}, it follows that {x, y} is an edge of H0 if and only if {φ(x), φ(y)} is an edge of H_j. Thus H_j is isomorphic to H0. For any vertex x, its neighborhood in H0 is

{x + µ^k, x + µ^{2k}, . . . , x + µ^{p−1}},

so the degree of x in H0 is (p − 1)/k. This proves the lemma. □
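Both the isomorphism φ(z) = µ^j z and the regularity claim are easy to confirm on a small instance. The sketch below is illustrative; the parameters p = 13, k = 3 (so p ≡ 1 (mod 2k)) and the primitive root µ = 2 are assumptions of this example.

```python
p, k, mu = 13, 3, 2
log = {pow(mu, l, p): l for l in range(p - 1)}   # discrete log base mu

def edges(j):
    """Edge set of H_j: pairs {x, y} with log_mu(x - y) ≡ j (mod k)."""
    return {frozenset((x, y)) for x in range(p) for y in range(x)
            if log[(x - y) % p] % k == j}

E0 = edges(0)
for j in range(k):
    phi = lambda z, j=j: pow(mu, j, p) * z % p   # phi(z) = mu^j z
    assert {frozenset(map(phi, e)) for e in E0} == edges(j)   # H_0 ≅ H_j
    for v in range(p):                            # each H_j is (p-1)/k-regular
        assert sum(1 for e in edges(j) if v in e) == (p - 1) // k
print("H_0, H_1, H_2 are pairwise isomorphic and 4-regular")
```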

Let ζ_k = e^{2πi/k}. It is easy to see that the following identity holds:

(x − ζ_k)(x − ζ_k²) · · · (x − ζ_k^{k−1}) = 1 + x + x² + · · · + x^{k−1}.    (5.3)

Define a function χ on F_p* by

χ(x) = ζ_k^ℓ, where ℓ ≡ log_µ x (mod k).

Extend χ to all of F_p by setting χ(0) = 0. Then χ is a multiplicative character of F_p of order k.

Let U ⊆ F_p be a set of vertices of the graph H0 with |U| = m, and write J(U) = ∩_{u∈U} N(u). If |J(U)| < n for every such U, then r_k(K_{m,n}) > p by Lemma 5.9. For a fixed U, define a function f on F_p by

f(x) = ∏_{u∈U} ∏_{j=1}^{k−1} (χ(x − u) − ζ_k^j) = ∏_{u∈U} ∑_{j=0}^{k−1} χ^j(x − u),

where we use the identity (5.3). For x ∉ U: if x ∉ J(U), then f(x) = 0, since χ(x − u) = ζ_k^j for some u ∈ U and some 1 ≤ j ≤ k − 1; if x ∈ J(U), then f(x) = k^m, since log_µ(x − u) ≡ 0 (mod k) and hence χ(x − u) = 1 for every u ∈ U. Therefore

∑_{x∉U} f(x) = k^m|J(U)|.


Set U = {u1, u2, . . . , um}. Since χ is multiplicative,

f(x) = ∏_{t=1}^m ( 1 + χ(x − u_t) + · · · + χ^{k−1}(x − u_t) )
     = ∑_{0≤j1,...,jm≤k−1} χ( (x − u1)^{j1} · · · (x − um)^{jm} )
     = 1 + ∑_{0≤j1,...,jm≤k−1, j1+···+jm≥1} χ( (x − u1)^{j1} · · · (x − um)^{jm} ).

Applying the Weil’s theorem for the the polynomial (x− u1)j1 · · · (x−um)jm with j1 + · · ·+ jm ≥ 1, which is not the form chk(x) with c ∈ Fpand h(x) ∈ Fp[x] as j1, . . . , jm < k, from (5.1), we have∣∣∣ ∑

x∈Fpχ((x− u1)j1 · · · (x− um)jm

)∣∣∣ ≤ (m− 1)√p.

Hence we obtain that

    |p − ∑_{x∈F_p} f(x)| = |∑_{x∈F_p} ∑_{0≤j_1,...,j_m≤k−1, j_1+···+j_m≥1} χ((x − u_1)^{j_1} · · · (x − u_m)^{j_m})|
                         = |∑_{0≤j_1,...,j_m≤k−1, j_1+···+j_m≥1} ∑_{x∈F_p} χ((x − u_1)^{j_1} · · · (x − u_m)^{j_m})|
                         ≤ ∑_{0≤j_1,...,j_m≤k−1, j_1+···+j_m≥1} (m − 1)√p.

It is well-known that the number of solutions in nonnegative integers (j_1, . . . , j_m) of the equation

    j_1 + j_2 + · · · + j_m = s

is \binom{s+m−1}{s} for a fixed integer s. Omitting the constraint that j_1, . . . , j_m ≤ k − 1, we obtain that

    |p − ∑_{x∈F_p} f(x)| ≤ ∑_{s=1}^{m(k−1)} \binom{s+m−1}{s} (m − 1)√p = A√p,



where A = A(k, m) is independent of p. Note that |f(x)| ≤ k^m, thus |∑_{x∈U} f(x)| ≤ m k^m and

    |p − k^m |J(U)|| = |p − ∑_{x∉U} f(x)| ≤ |p − ∑_{x∈F_p} f(x)| + |∑_{x∈U} f(x)| ≤ A√p + m k^m ≤ (A + 1)√p

for large p, which implies that

    k^m |J(U)| ≤ p + (A + 1)√p.

It is known that there are asymptotically N/(φ(2k) log N) primes p of the form p ≡ 1 (mod 2k) between 1 and N, where φ(2k) is the number of integers from 1 to 2k that are relatively prime to 2k. Let p ≡ 1 (mod 2k) be a prime between k^m n − n^{0.525} and k^m n − n^{0.525}/2. The existence of such a prime for large n is ensured by results estimating the difference between consecutive primes, see Baker, Harman and Pintz (2001). The exponent 0.525 would improve to 0.5 + o(1) under the famous Riemann hypothesis. By choosing such p, we have

    |J(U)| ≤ n − n^{0.525}/2 + (A + 1)√(k^m n) < n

for large n. Thus H_0 contains no K_{m,n}, implying

    r_k(K_{m,n}) > p ≥ k^m n − n^{0.525}

as each H_i is isomorphic to H_0. □

The largest difference between consecutive primes is conjectured to be p^{1/2+o(1)}. If so, we would have r_k(K_{m,n}) ≥ k^m n − n^{1/2+o(1)}, which is the same as that in Chapter 5.

5.6 Some multi-color Ramsey numbers

For H_1 = H_2 = · · · = H_k = H, let us write

    r_{k+1}(H; H_{k+1}) = r(H_1, . . . , H_k, H_{k+1}).



Alon and Rodl (2005) gave sharp bounds for multi-color Ramsey numbers of the form r_{k+1}(H; K_n) with k ≥ 2, where H is a bipartite graph (of a certain kind) or K_3. Their main idea is to estimate the number of independent sets of given size in a quasi-random graph G that contains no H, and to consider random shifts of G. The number of shifts is k; the bigger k is, the tighter the bound. When k = 1, which means no shift at all, the method is ineffective for bounding r(H, K_n). It is interesting that G is a Turan graph when H is bipartite. Recall that the graphs constructed in Chapter 9 are regular; they may contain loops, but each vertex has at most one loop. We call such graphs semi-simple.

Theorem 5.15 Let G = (V, E) be a semi-simple (N, d, λ)-graph, and let n_0 = 2N log N/d. Then for any n ≥ n_0, the number M of independent sets of size n in G satisfies

    M ≤ (edn/(2λn_0))^{n_0} (2eλN/(dn))^n.

Proof. Consider the number of ways to choose an ordered set v_1, v_2, . . . , v_n of n vertices which form an independent set. Starting with B_0 = V, we choose v_1 arbitrarily. Define

    B_i = V \ ∪_{j=1}^i N[v_j].

Then B_i is the set of vertices obtained by deleting v_1, v_2, . . . , v_i and their neighbors, where v_1, . . . , v_i have been chosen. Clearly B_i is decreasing in the sense that B_i ⊇ B_{i+1}, and v_{i+1} has to lie in B_i. Define

    B̄_i = {u ∈ V : |N_{B_i}(u)| ≤ (d/2N)|B_i|},

where N_{B_i}(u) = N(u) ∩ B_i. If the next chosen vertex v_{i+1} from B_i is not in B̄_i, then B_{i+1} is obtained by deleting v_{i+1} and at least (d/2N)|B_i| vertices from B_i, and so

    |B_{i+1}| < (1 − d/2N)|B_i|.



Hence throughout the process there cannot be more than n_0 choices like that, since otherwise the corresponding set of non-neighbors would be empty before the process terminates, from

    (1 − d/2N)^{n_0} = (1 − d/2N)^{2N log N/d} < 1/N.

It follows that, with at most n_0 possible exceptions, each vertex v_{i+1} has to lie in B_i ∩ B̄_i. By Corollary 5.3, we have

    |B_i ∩ B̄_i| ≤ 2λN/d.

Therefore, the total number of choices for the ordered set v_1, v_2, . . . , v_n is at most

    \binom{n}{n_0} N^{n_0} (2λN/d)^{n−n_0} ≤ (edn/(2λn_0))^{n_0} (2λN/d)^n.

Indeed, there are \binom{n}{n_0} possibilities to choose a set of indices covering all indices i for which the vertex v_{i+1} has not been chosen in B_i ∩ B̄_i. Then there are at most N ways to choose each such vertex v_i, and at most 2λN/d ways to choose each vertex v_{j+1} for each other index j.

Now, dividing the above bound by n! ≥ (n/e)^n, we obtain an upper bound for the number of unordered independent sets of size n as claimed. □

Theorem 5.16 Let G be a graph of order N that contains no H, and let M be the number of independent sets of size n in G. If

    M^k < \binom{N}{n}^{k−1},

then r_{k+1}(H; K_n) > N.

Proof. For each i, 1 ≤ i ≤ k, let G_i be a random copy of G on the same vertex set V, that is, a graph obtained from G by mapping its vertices to those of V according to a random one-to-one mapping. The probability that a fixed set of n vertices of V will be an independent set in each G_i is

    (M/\binom{N}{n})^k < 1/\binom{N}{n},



implying, by summing over all \binom{N}{n} sets of n vertices, that with positive probability there is no such independent set.

Color each edge of K_N on V(G) by the minimum i such that the edge belongs to G_i; if the edge belongs to no G_i, color it by k + 1. Then there is no monochromatic H in the first k colors and no K_n in the last color k + 1, so r_{k+1}(H, K_n) > N. □
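The coloring in this proof can be sketched as follows; this is a toy illustration with our own helper names (not the authors' code), using G = C_5 (which is triangle-free) and k = 2 shifts on N = 5 vertices.

```python
# Sketch (our own helpers): k random copies of an H-free graph G, and the
# edge coloring of K_N by the minimum color class, as in Theorem 5.16.
import random
from itertools import combinations

def random_copy(edges, vertices):
    # A random copy of G: relabel the vertices by a uniform permutation.
    perm = dict(zip(vertices, random.sample(vertices, len(vertices))))
    return {frozenset((perm[u], perm[v])) for u, v in edges}

V = list(range(5))
G = {frozenset((i, (i + 1) % 5)) for i in V}      # C5, triangle-free
copies = [random_copy(G, V) for _ in range(2)]     # k = 2 shifts

# Color each edge of K_5 by the minimum i with the edge in G_i,
# and by k + 1 = 3 if it lies in no copy.
color = {}
for e in map(frozenset, combinations(V, 2)):
    color[e] = next((i + 1 for i, Gi in enumerate(copies) if e in Gi), 3)

assert all(1 <= c <= 3 for c in color.values())
```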

Let us recall an upper bound in Chapter 8 that for any fixed integers s ≥ t ≥ 2,

    r_{k+1}(K_{t,s}; K_n) ≤ c (n/log n)^t,

where c = c(k, t, s) > 0 is a constant.

Theorem 5.17 The Ramsey number r_{k+1}(C_4; K_n) satisfies the following:

(1) For any fixed k ≥ 3, r_{k+1}(C_4; K_n) = Θ((n/log n)^2).

(2) There are positive constants c_1 and c_2 such that

    c_1 (n log log n/(log n)^2)^2 ≤ r(C_4, C_4, K_n) ≤ c_2 (n/log n)^2.

Proof. It suffices to prove the lower bounds. Consider the Erdos-Renyi graph E^o_q of order N = q^2 + q + 1 and let M be the number of independent sets of size n. By Theorem 5.16, we shall show that M^k < \binom{N}{n}^{k−1}. The graph E^o_q is d-regular, where d = q + 1. Let n_0 = 2N log N/d. Then for large q,

    4q log q < n_0 < 4(q + 1) log q.

From Theorem 5.15, if n ≥ n_0, which can be guaranteed by taking c > 4, we have

    M ≤ (edn/(2λn_0))^{n_0} (2eλN/(dn))^n,

where λ = √q.

(1) For k ≥ 3, it suffices to show that r(C_4, C_4, C_4, K_n) = Θ((n/log n)^2) as r_{k+1}(C_4; K_n) ≥ r(C_4, C_4, C_4, K_n). Set n = cq log q, where c is a large



constant to be chosen. We shall show that

    M^{3/n} < \binom{N}{n}^{2/n}.  (5.4)

Substituting d, λ, n_0, n, N by their values in terms of q, we have

    M^{3/n} ≤ (ce(q + 1)/(8√q))^{(12/c)(1+1/q)} (2e√q/(c log q))^3 ∼ c_1 q^{12/c+3/2}/(log q)^3,

where c_1 is a positive constant, and

    \binom{N}{n}^{2/n} ∼ (eN/n)^2 ∼ (eq/(c log q))^2.

Thus the inequality (5.4) holds if 12/c + 3/2 ≤ 2, that is, c ≥ 24. Then we have n > n_0 and N ∼ q^2 ∼ n^2/(c log n)^2 as q → ∞, completing the proof for k ≥ 3.

(2) For r(C_4, C_4, K_n), set n = cq log^2 q/log log q, where c is a positive constant to be chosen. We shall show that M^{2/n} < \binom{N}{n}^{1/n}. Note that for some constants c_i > 0,

    M^{2/n} ≤ c_1 (√q log q/log log q)^{8 log log q/(c log q)} (√q log log q/log^2 q)^2
            ≤ c_2 q^{1 + 4 log log q/(c log q)} (log log q/log^2 q)^2
            = c_2 q (log log q)^2/(log q)^{4−4/c},

and

    \binom{N}{n}^{1/n} ≥ c_3 eN/n ≥ c_4 q log log q/log q.

We are done by taking c > 4/3 so that 4 − 4/c > 1. □

Note that we found the spectrum of the projective norm graph G_{q,t} in the last section, and it contains no K_{t,s} for s ≥ (t − 1)! + 1. A similar argument gives the following result.

Theorem 5.18 For any fixed t ≥ 2 and s ≥ (t − 1)! + 1, the Ramsey number r_{k+1}(K_{t,s}; K_n) satisfies the following:



(1) For any k ≥ 3,

    r_{k+1}(K_{t,s}; K_n) = Θ((n/log n)^t).

(2) There are positive constants c_1 and c_2 such that

    c_1 (n log log n/(log n)^2)^t ≤ r(K_{t,s}, K_{t,s}, K_n) ≤ c_2 (n/log n)^t.

Alon and Rodl (2005) also confirmed a conjecture of Erdos and Sos that

    lim_{n→∞} r(K_3, K_3, K_n)/r(K_3, K_n) = ∞.

Recall that in Chapter 2 we defined how to "blow up" G with H, where each vertex v of G is replaced by a copy of H, denoted by H_v, in which any pair of vertices from distinct H_u and H_v are adjacent if and only if u and v are adjacent. If H is an independent set of r vertices, we call the resulting graph an r-blow-up of G.

Lemma 5.10 There is some constant c = c_k > 0 such that

    r_{k+1}(K_3; K_n) ≥ c n^{k+1}/(log n)^{2k}

for all large n.

Proof. Let N = c_1 s^2/log s < r(K_3, K_{s+1}), where c_1 > 0 is a fixed constant. Then there is a graph F of order N with no K_3 and α(F) ≤ s. Let G be the r-blow-up of F and M the number of independent sets of size n in G, where r = r(s) will be chosen. Then

    M ≤ \binom{N}{s} (rs)^n / n! < (eN/s)^s (ers/n)^n.

The first inequality follows from the facts that each independent set A of size n can be partitioned into at most s blocks (blown-up vertices); there are at most \binom{N}{s} ways to choose these blocks, and each vertex of A is one of the rs vertices in a block.



Since G has rN vertices and it contains no K_3, by Theorem 5.16, we have r_{k+1}(K_3; K_n) > rN if

    M^k < \binom{rN}{n}^{k−1}.

We now take r = s^{k−1}(log s)^{2−k} and n = cs log s, where c > 0 is a constant to be chosen. Then

    M^{k/n} < (c_1 es/log s)^{k/(c log s)} (es^{k−1}/(c(log s)^{k−1}))^k ≤ (c_2/c^k)(s/log s)^{k(k−1)},

where c_2, and henceforth c_3 and c_4, are positive constants independent of c, and

    \binom{rN}{n}^{(k−1)/n} > c_3 (erN/n)^{k−1} ≥ (c_4/c^{k−1})(s/log s)^{k(k−1)}.

Thus the condition is satisfied if we take c large such that c_2/c^k < c_4/c^{k−1}, and hence

    r_{k+1}(K_3; K_n) > rN = c_1 s^{k+1}/(log s)^{k−1} = Θ(n^{k+1}/(log n)^{2k}),

completing the proof. □

Theorem 5.19 There are constants c_i = c_i(k) > 0 such that

    c_1 n^{k+1}/(log n)^{2k} ≤ r_{k+1}(K_3; K_n) ≤ c_2 n^{k+1}/(log n)^k

for all large n.

Proof. It remains to show the upper bound, which holds for k = 0 and k = 1. We next prove the result by induction on k ≥ 2. Assuming that the result holds for k − 1, we prove it for k. Let N = r_{k+1}(K_3; K_n) − 1. There is an edge-coloring of K_N by colors 1, 2, . . . , k + 1 with no monochromatic K_3 in any of the first k colors, and no monochromatic



K_n in the color k + 1. Consider the graph T consisting of all edges of the first k colors. Then D = ∆(T) satisfies that

    D ≤ k(r_k(K_3; K_n) − 1) < k r_k(K_3, K_n).

The neighborhood N(v) of v in T is ∪_{i=1}^k N_i(v), where N_i(v) is the set of neighbors of v that are connected to v by an edge in the color i, 1 ≤ i ≤ k. For a vertex u in N(v), we consider the neighbors of u in the subgraph induced by N(v) in T. Suppose u ∈ N_1(v), say. Such neighbors are those in

    N(u) ∩ N(v) = ∪_{j=1}^k (∪_{i=1}^k (N_i(u) ∩ N_j(v))).

First of all, N_1(u) ∩ N_1(v) = ∅ since there is no monochromatic triangle in the color 1. For 2 ≤ i ≤ k, N_i(u) ∩ N_1(v) contains no edge in the colors i and 1, thus |N_i(u) ∩ N_1(v)| ≤ r_{k−1}(K_3, K_n) − 1. Hence

    |∪_{i=1}^k (N_i(u) ∩ N_1(v))| < (k − 1) r_{k−1}(K_3, K_n).

Similarly, for 2 ≤ j ≤ k,

    |∪_{i=1}^k (N_i(u) ∩ N_j(v))| < (k − 1) r_{k−1}(K_3, K_n).

Thus the maximum degree of the subgraph induced by N(v) in T is less than m = k^2 r_{k−1}(K_3, K_n). By the main result in Chapter 3, we have

    n ≥ α(T) ≥ (N/D)(log(D/m) − 1).

Then, using the induction hypothesis for D and Lemma 5.10 for m, we have the desired upper bound. □

References

N. Alon and V. Rodl, Sharp bounds for some multicolor Ramsey numbers, Combinatorica, 25 (2005), 125-141.

N. Alon and J. Spencer, The Probabilistic Method, 3rd ed., Wiley-Interscience, New York, 2008.

R. Baker, G. Harman and J. Pintz, The difference between consecutive primes, II, Proc. Lond. Math. Soc., 83 (2001), 532-562.

F. R. Chung and R. Graham, Sparse quasi-random graphs, Combinatorica, 22 (2002), 217-244.

F. R. Chung, R. Graham and R. Wilson, Quasi-random graphs, Combinatorica, 9 (1989), 345-362.

M. Krivelevich and B. Sudakov, Pseudo-random graphs, Bolyai Soc. Math. Stud., 15 (2006), 199-262.

J. Seidel, A survey of two-graphs, in: Colloquio Internazionale sulle Teorie Combinatorie (Rome, 1973), Vol. I, Atti dei Convegni Lincei, No. 17, Accad. Naz. Lincei, Rome, 1976, 481-511.

T. Szabo, On the spectrum of projective norm-graphs, Inform. Process. Lett., 86 (2) (2003), 71-74.

A. Thomason, Pseudo-random graphs, in: Proceedings of Random Graphs, Poznan 1985, M. Karonski, Ed., Ann. Discrete Math., 33 (1987), 307-331.




Chapter 6

Real-world Networks

Complex systems from various fields, such as physics, biology, or sociology, can be systematically analyzed using their network representation. A network (also known as a graph) is composed of vertices (or nodes) and edges, where vertices represent the constituents of the system and edges represent the relationships between these constituents.

We shall introduce some basic concepts for real-world networks in this chapter.

Note that in some papers on graphs produced by a random process, "typical" graphs (instead of random graphs) are chosen to represent the graphs in the random graph space.

6.1 Data and empirical research

Big data is a term for large or complex data sets. The term often refers simply to the fact that traditional data processing applications are inadequate, and seldom to a particular size of data set. Most big data come from real-world networks, and analysis of such data sets can find new correlations to spot business trends, prevent diseases, combat crime and so on.

Empirical research is research using empirical evidence, where the empirical evidence is the record of direct observations or experiences in the form of data and big data. Through quantifying the data, a researcher can answer empirical questions from the real world. In particular,




we are interested in the empirical research and the data from real-world networks.

It is usual that researchers aim at the common case, so they often describe the behavior by ignoring cases that are not significant for the research. This is similar to the situation in random graphs, where we describe an event by saying "almost all" to signify that the probability of the event goes to 1. Often we are more concerned with the averages of parameters, since they are concentrated at the average in most cases. The average of some parameters may be more meaningful than their extremal values.

However, this is not always the case. When investigating a social network, the nodes of large degree, called "hubs", such as internet celebrities, attract much attention, as these nodes are very important for the structure of the network.

Collecting the data that is needed is a challenge before analysis. For example, the data in Barabasi and Albert (2009) came from software designed to collect the links in the World Wide Web pointing from one page to another. The data in Backstrom and Kleinberg (2014) and Ugander, Backstrom, Marlow and Kleinberg (2012) came from Facebook Inc. directly, as several of the co-authors are employees of the company.

6.2 Six degrees of separation

Six degrees of separation is the theory that any pair of persons is six or fewer steps away in the world viewed as a network connected by friendship; in the language of graph theory, this means the maximum distance between nodes in the network is at most six. However, as claimed before, "any pair" for social networks in sociology may mean most pairs.

The term small world became famous with a paper of S. Milgram (1967), an American psychologist. Some seminal works had been conducted before Milgram took up the experiments reported as the small world problem, and the experiment is called "the small-world experiment", in which Milgram and other researchers examined the average path length for social networks of people in the United States. The research suggested that human society is a small-world-type network, and the experiments are often associated with the phrase "six



degrees of separation", although Milgram did not use this term himself.

Milgram’s experiment developed out of a desire to learn more aboutthe probability that two randomly selected people would know eachother. This is one way of looking at the small world problem.

Though the experiment went through several variations, Milgram typically chose individuals in the cities of Omaha, Nebraska, and Wichita, Kansas, to be the starting points, and Boston, Massachusetts, to be the end point of a chain of correspondence. These cities were selected because they were thought to represent a great distance in the US, both socially and geographically.

Information packets (a letter, a roster and postcards) were initially sent to randomly selected individuals in Omaha or Wichita. In the more likely case that the person did not personally know the target, the person was to think of a friend or relative he knew personally who was more likely to know the target. He was then directed to sign his name on a roster in the information packet and forward the packet to that person. When and if the package eventually reached the contact person in Boston, the researchers could examine the roster to count the number of times it had been forwarded from person to person.

However, a significant problem was that people often refused to pass the letter forward, and thus the chain never reached its destination. In one case, only 64 of the 296 letters eventually reached the target contact. Among these chains, the average path length fell around five and a half or six. Hence, the researchers concluded that people in the US are separated by about six people on average.

Smaller communities, such as mathematicians and actors, have been found to be densely connected by chains of personal or professional associations. Mathematicians have created the Erdos number to describe their distance from Paul Erdos based on shared publications. A similar exercise has been carried out for the actor Kevin Bacon and other actors who appeared in movies together with him.

In 2001, D. Watts attempted to recreate Milgram's experiment on the Internet, using an e-mail message as the "package" that needed to be delivered, with 48,000 senders and 19 targets (in 157 countries). Watts found that the average number of intermediaries was around six, as reported in Watts (1998). Today, the phrase "six degrees of separation" is often used as a synonym for the idea of the "small world" phenomenon.

Watts and Strogatz (1998) showed that the average path length between two nodes in a random network is equal to log N/log K, where N is the number of nodes and K is the degree (number of acquaintances) per node. Thus, assuming 10% of the population of the US is too young to participate, N = 300,000,000 (90% of the US population) and K = 30 give degrees of separation 5.7. If N = 6,000,000,000 (90% of the world population) and K = 30, then the degrees of separation is 6.6.
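The two numerical estimates can be reproduced directly from the formula log N/log K:

```python
# Average path length log N / log K in a random network (Watts-Strogatz).
import math

def degrees_of_separation(N, K):
    return math.log(N) / math.log(K)

us    = degrees_of_separation(300_000_000, 30)     # 90% of the US population
world = degrees_of_separation(6_000_000_000, 30)   # 90% of the world population
assert round(us, 1) == 5.7 and round(world, 1) == 6.6
```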

However, the convenient ways of communication in a social network make the average distance smaller and smaller. Facebook's data team released data in online papers describing that, amongst all Facebook users at the time of research, the average distances of friendship links were 5.28 in 2008, 4.74 in 2011 and 3.57 in February 2016 (the year of this writing). The world has changed from "six degrees of separation" to "four degrees of separation".

6.3 Clustering coefficient

An important measure of network topology, called the clustering coefficient, assesses the triangular pattern as well as the connectivity in a vertex's neighborhood: a vertex has a high clustering coefficient if its neighbors tend to be directly connected with each other. Let d_v be the degree of a vertex v and e_v the number of edges among the neighbors of v. The clustering coefficient c_v of a vertex v can be calculated as

    c_v = 0 if d_v = 0, and c_v = e_v/\binom{d_v}{2} if d_v ≥ 2.

For d_v = 1, it is a convention to define c_v ∈ [0, 1] depending on the situation. Thus 0 ≤ c_v ≤ 1. The clustering coefficient c_v for d_v ≥ 2 is the ratio of the number of triangles through v to the number of all possible triangles that share vertex v.

Let G^k be the graph obtained from G by adding new edges connecting vertices of distance at most k in G. It is easy to see that if n ≥ 8, then c_v = 1/2 for each v in the circular lattice C_n^2.



For a graph G of order N (i.e., G contains N vertices) and minimum degree δ(G) ≥ 2, its average clustering coefficient is defined as

    c(G) = (1/N) ∑_{v∈V} c_v = (2/N) ∑_{v∈V} e_v/(d_v(d_v − 1)).

The average clustering coefficient explains the clustering (triangulation) within a network by averaging the clustering coefficients of all its nodes. The idea of the clustering coefficient was proposed (especially in the analysis of social networks) to measure the local connectivity or "cliqueness" of a social network. If a network has a high average clustering coefficient and a small average distance, it is often called a "small-world" network.
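A direct computation of c_v and c(G) from an adjacency list (a sketch with our own helper names) on the circular lattice C_8^2 reproduces the value c_v = 1/2 from the example above:

```python
# Sketch (our own helpers): c_v = e_v / C(d_v, 2) and the average c(G).
from itertools import combinations

def clustering(adj, v):
    # Fraction of pairs of neighbors of v that are themselves adjacent.
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    e_v = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
    return e_v / (d * (d - 1) / 2)

def average_clustering(adj):
    return sum(clustering(adj, v) for v in adj) / len(adj)

# Circular lattice C_8^2: vertex i is joined to i±1 and i±2 (mod 8).
n = 8
adj = {i: {(i + s) % n for s in (1, 2, -1, -2)} for i in range(n)}
assert all(clustering(adj, v) == 0.5 for v in adj)   # c_v = 1/2 as in the text
assert average_clustering(adj) == 0.5
```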

Let us label the vertices of G of order N as v_1, v_2, . . . , v_N. Recall that A = (a_ij)_{N×N} is the adjacency matrix of G, where

    a_ij = 1 if v_i v_j ∈ E, and a_ij = 0 otherwise.

We also call the eigenvalues of A the eigenvalues of G. Let λ_1 ≥ λ_2 ≥ · · · ≥ λ_N be the eigenvalues of G in a non-increasing order. Set

    λ = λ(G) = max{|λ_i| : 2 ≤ i ≤ N}.

As called by Alon, a graph G is an (N, d, λ)-graph if G is d-regular with N vertices and λ = λ(G). Note that a d-regular connected graph satisfies λ_1 = d. For an (N, d, λ)-graph, the spectral gap between d and λ is a measure of its quasi-random property. The smaller the value of λ compared to d, the closer the edge distribution is to the ideal uniform distribution (i.e., it behaves like a random graph). We may say, not precisely, that an (N, d, λ)-graph with λ = O(√d) has good quasi-randomness. Generally, this is a weak condition, as most random graphs are such graphs.

Theorem 6.1 Let G be an (N, d, λ)-graph that is connected. If λ = O(√d) as d → ∞, then

    c(G) ∼ d/N.



Proof. Let A be the adjacency matrix of G. Note that A is symmetric, and thus it is diagonalizable. Let λ_1, λ_2, . . . , λ_N be the eigenvalues of A. Then the eigenvalues of A^k are λ_1^k, λ_2^k, . . . , λ_N^k. Note that the (i, j) element of A^k is the number of walks of length k from vertex v_i to vertex v_j, and a closed walk of length 3 is exactly a triangle. Thus the ith diagonal element of A^3 is 2e_{v_i}, and

    c(G) = (2/N) ∑_v e_v/(d_v(d_v − 1)) = (1/(Nd(d − 1))) ∑_{i=1}^N λ_i^3 = (1/(Nd(d − 1)))(d^3 + ∑_{i=2}^N λ_i^3),

where we used the fact that λ_1 = d as G is d-regular and connected. The assumption λ = O(√d) implies that

    |∑_{i=2}^N λ_i^3|/(Nd(d − 1)) ≤ Nλ^3/(Nd(d − 1)) = O(d^{3/2})/d^2 → 0.

Thus

    c(G) ∼ d^2/(N(d − 1)) ∼ d/N

for large d. □
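The trace identity used in this proof can be checked numerically (our own check, not from the text) on the circular lattice C_8^2: the diagonal of A^3 is 2e_v, and tr(A^3)/(Nd(d − 1)) recovers c(G) = 1/2.

```python
# Our own numerical check of the proof on C_8^2 (N = 8, d = 4):
# the ith diagonal entry of A^3 equals 2*e_{v_i}, and
# c(G) = tr(A^3) / (N d (d - 1)).
n, d = 8, 4
A = [[1 if min((i - j) % n, (j - i) % n) in (1, 2) else 0 for j in range(n)]
     for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

A3 = matmul(matmul(A, A), A)
trace = sum(A3[i][i] for i in range(n))
c = trace / (n * d * (d - 1))    # average clustering coefficient
assert A3[0][0] == 2 * 3         # e_v = 3 triangles at each vertex
assert c == 0.5
```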

6.4 Small-world networks

The small-world phenomenon is typical for random graphs that have small maximum distance. A definition for a small-world network describes it as a network in which the typical distance L between two randomly chosen nodes grows proportionally to log N, where N is the number of nodes in the network. Namely,

    E(L) = Θ(log N),

which grows slowly as N → ∞, so the average distance between nodes is small.

A certain category of small-world networks was identified as a class of random graphs by D. Watts and S. Strogatz in 1998. They measured that in fact many real-world networks have a small average distance, but also a clustering coefficient significantly higher than expected by random chance. They noted that graphs can be classified according to two independent structural features, namely the clustering coefficient and the average distance. Purely random graphs, built according to the Erdos-Renyi (ER) model, exhibit a small average distance (typically Θ(log N)) along with a small clustering coefficient (typically d/N, where d is the expected degree).

Many biological, technological and social networks lie between completely regular and completely random. Typically, these networks have many vertices and are sparse in the sense that the average degree is much less than the number of vertices.

Watts and Strogatz modelled the small-world networks by starting at a graph C_n^k with

    n ≫ k ≫ log n ≫ 1,

where k ≫ log n guarantees that a random graph will be connected. Then, they choose vertices in order, and edges adjacent to the chosen vertices, and reconnect these edges to vertices chosen uniformly at random. In the process, the average clustering coefficient decreases slowly while the average distance decreases rapidly, and thus they obtained a network between the regular ring lattice C_n^k and a completely random network. The obtained network has a large average clustering coefficient and a small average distance, and is called a small-world network.
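One common way to implement the rewiring idea is sketched below; this is our simplified variant (rewire each lattice edge with probability β, keeping the graph simple), not the exact procedure described above.

```python
# Sketch (our simplification): start from the ring lattice C_n^k, then
# rewire each edge with probability beta to a uniformly random endpoint.
import random

def watts_strogatz(n, k, beta, seed=0):
    rng = random.Random(seed)
    edges = [(i, (i + s) % n) for i in range(n) for s in range(1, k + 1)]
    for t, (u, v) in enumerate(edges):
        if rng.random() < beta:
            w = rng.randrange(n)
            # keep the graph simple: no loops, no duplicate edges
            if w != u and (u, w) not in edges and (w, u) not in edges:
                edges[t] = (u, w)
    return edges

G = watts_strogatz(100, 2, 0.2)
assert len(G) == 100 * 2            # rewiring preserves the edge count
assert all(u != v for u, v in G)    # and creates no loops
```

With β = 0 the lattice is unchanged; small β already shortens the average distance while the clustering coefficient stays high.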

6.5 Power law and scale-free networks

Let X be a discrete random variable taking positive integer values. If

    Pr(X = k) = c/k^γ,

where c, γ > 0 are constants, then X is said to have a power law distribution. This distribution is also called the Pareto distribution, as the economist Pareto originally used it to describe the allocation of wealth: a larger portion of the wealth of society is owned by a smaller percentage of the people (the so-called 80-20 rule). In contrast to the exponential distribution, which decreases rapidly, the power law is also called a heavy-tailed distribution.



Let P(k) be the fraction of nodes in the network G that have degree k, namely

    P(k) = |{v : deg(v) = k}|/N,

where N is the number of nodes in G. If P(k) is equal (or close) to a power law, then the network G is said to be scale-free. For such networks, typically 2 < γ ≤ 3.
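The degree fraction P(k), and a crude log-log estimate of the exponent γ, can be computed as follows (a sketch on a toy degree sequence; the helper names are ours):

```python
# Sketch (our own helpers): P(k) = |{v : deg(v) = k}| / N and a crude
# estimate of the power-law exponent gamma from a log-log slope.
import math
from collections import Counter

def degree_fraction(degrees):
    N = len(degrees)
    return {k: c / N for k, c in Counter(degrees).items()}

# Toy degree sequence roughly following P(k) ~ c / k^3:
degrees = [1] * 81 + [2] * 10 + [3] * 3
P = degree_fraction(degrees)

# Slope between k = 1 and k = 2 on a log-log scale:
# gamma ≈ log(P(1)/P(2)) / log 2.
gamma = math.log(P[1] / P[2]) / math.log(2)
assert 2 < gamma < 3.2       # in the typical scale-free range
```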

The networks of citations between scientific papers are interesting. In 1965, D. Price (1965) found that the vertices of degree k in such networks have a heavy-tailed distribution following a power law. He did not, however, use the term "scale-free network", which was not coined until some decades later. In 1976, Price also proposed a mechanism to explain the occurrence of power laws in citation networks, which he called cumulative advantage, but which is today more commonly known under the name preferential attachment.

In 1999, A. Barabasi and colleagues coined the term scale-free network when they found that some nodes had much bigger degrees than those "expected" in a random network; they were surprised and used the term "scale-free", which is now used to describe the class of networks that exhibit a power-law degree distribution.

Albert L. Barabasi is a physicist best known for his work in network theory, and Reka Albert, the co-author of the paper (2009), is a professor of physics and biology.

In their earlier study (1999), Albert, Jeong and Barabasi found that the World Wide Web is not a random network: the number of links per node, often called the degree distribution, follows a power law. Subsequently, researchers found that not only the WWW but many other networks follow the same distribution. These different datasets together indicated that we are dealing with a potentially universal behavior, which might have a common explanation.

Barabasi and Albert (2009) proposed a generative mechanism to explain the appearance of power-law distributions, which they called "preferential attachment" and which is essentially the same as that proposed by Price. Analytic solutions for this mechanism (also similar to the solution of Price) were presented earlier by Dorogovtsev, Mendes and Samukhin (2002). Finally, it was rigorously proved by the mathematicians Bollobas, Riordan, Spencer and Tusnady (2001).



To explain this phenomenon, Barabasi and Albert (2009) suggested the following random graph process as a model, called the BA model.

Consider a random graph process in which vertices are added to the graph one at a time and joined to a fixed number of earlier vertices, selected with probabilities proportional to their degrees. Let v_1, v_2, . . . be a sequence of vertices. Assume that m_0 ≥ 2 is the number of vertices to start the process, and let d(v_i) be the degree of the early vertex v_i in the existing graph.

They described the process as starting with a small number m_0 of vertices; at every time step we add a new vertex with m ≤ m_0 edges that link the new vertex to m different vertices already present in the system. If the new vertex is v_{t+1}, then the probability that v_{t+1} is adjacent to v_i is proportional to

    d(v_i)/∑_{j=1}^t d(v_j).

The above probability makes the new vertex incorporate preferential attachment. Note that, to have a clear start, there should exist at least one edge among the first m_0 vertices. A subtlety is that if we connect each early vertex to the new vertex independently with the above probability, then the expected number of new edges is one.
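A minimal sketch of preferential attachment (our simplification of the BA model, starting from a star on m_0 vertices) keeps a degree-weighted urn so that high-degree early vertices are chosen more often:

```python
# Sketch (our simplification of the BA model): each new vertex sends m
# edges to earlier vertices chosen with probability proportional to degree.
import random

def ba_graph(m0, m, T, seed=0):
    rng = random.Random(seed)
    edges = [(0, i) for i in range(1, m0)]       # start: a star on m0 vertices
    ends = [u for e in edges for u in e]         # vertex i appears deg(i) times
    for t in range(m0, m0 + T):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))        # degree-biased choice
        edges += [(t, u) for u in targets]
        ends += [x for u in targets for x in (t, u)]
    return edges

G = ba_graph(m0=3, m=2, T=50)
assert len(G) == (3 - 1) + 2 * 50    # m0 - 1 initial edges plus m per new vertex
```

Sampling from the urn `ends` is exactly the choice with probability d(v_i)/∑_j d(v_j).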

The research in Barabasi and Albert (2009) is empirical, and the proof is heuristic. The process defined in Bollobas et al. (2001) preserves the idea of preferential attachment, though its description is much more complex, and the power law has been shown for degrees at most N^{1/15} with γ = 3.

On a theoretical level, some other abstract definitions of scale-free have been proposed. For example, Li et al. (2005) offered a potentially more precise “scale-free metric”. Let G = (V,E) be a simple graph, and define

s(G) = ∑_{uv∈E} d(u)d(v)  and  S(G) = s(G)/smax,

where smax is the maximum value of s(H) among simple graphs H on the same vertex set V with degree distribution identical to that of G. The quantity S(G) is a metric between 0 and 1, where a G with small S(G) is “scale-rich”, and a G with S(G) close to 1 is “scale-free”. Note that s(G) is maximized when high-degree nodes are connected to other high-degree nodes, so S(G) captures the notion of self-similarity implied in the name “scale-free”.
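The quantity s(G) is straightforward to compute from the definition above; here is a small sketch (the function name s_metric and the toy graphs are our own choices). Computing smax, and hence S(G), requires maximizing over all graphs with the same degree distribution, which we do not attempt here.

```python
def s_metric(edges):
    """s(G) = sum over edges uv of d(u) * d(v)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sum(deg[u] * deg[v] for u, v in edges)

# Path 0-1-2-3: degrees 1, 2, 2, 1, so s = 1*2 + 2*2 + 2*1 = 8.
path = [(0, 1), (1, 2), (2, 3)]
# Star with center 0: degrees 3, 1, 1, 1, so s = 3*1 + 3*1 + 3*1 = 9.
star = [(0, 1), (0, 2), (0, 3)]
print(s_metric(path), s_metric(star))  # 8 9
```

Both graphs have three edges, but the star, which concentrates them on its highest-degree node, achieves the larger s; among graphs with a fixed degree sequence, s is largest when high-degree vertices attach to each other.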


Some properties are often listed as the characteristics of scale-freenetworks, which are as follows.

• Power-law degree distribution;

• Generated by certain random processes with preferential attachment;

• Highly connected hubs that hold the network together, with the “robust yet fragile” feature of error tolerance: the network is robust when some nodes are removed at random, but fragile when some hubs are removed deliberately;

• Generic in the sense of being preserved under random degree-preserving rewiring;

• Self-similar;

• Universal in the sense of not depending on domain-specific details.

6.6 Network Structure

As pointed out by Newman (2003), the research on networks may provide new insight into the study of complex systems. Networks have many notable properties, such as the small-world property, the scale-free property, and the community structure property, and the links between objects usually display diversity.

By collecting data from mobile phones, Eagle, Pentland and Lazer (2009) found that such data have the potential to provide insight into the relational dynamics of individuals, and allow the prediction of individual-level outcomes such as job satisfaction.

The concept of contagion has expanded from its original grounding in epidemic disease to describe many processes that spread across networks, such as fads, political opinions, the adoption of new technologies, and financial decisions; see, e.g., R. Pastor-Satorras and A. Vespignani (2001) and M. Newman, D. Watts and S. Strogatz (2002).

In traditional models of social contagion, the probability that an individual is affected by the contagion grows monotonically with the


size of the neighborhood. By analyzing the growth of Facebook, Ugander, Backstrom, Marlow and Kleinberg (2012) found that the probability of contagion is tightly controlled by the number of connected components in an individual's neighborhood, rather than by the actual size of the neighborhood.

A crucial task in the analysis of on-line social-networking systems is to identify important people linked by strong social ties. Drawing data from e-mail, Kossinets and Watts (2006) developed a method of analyzing and estimating tie strength in on-line domains, in which the key structure is embeddedness, the number |N(u) ∩ N(v)| of mutual friends of two people u and v, a quantity that typically increases with tie strength.

Embeddedness is not necessarily the most appropriate measure for characterizing particular types of strong ties. Backstrom and Kleinberg (2014) proposed a network-based characterization for intimate relationships, those involving spouses or romantic partners. Using data from a large sample of Facebook users, they try to recognize these people with high accuracy. They found that embeddedness is in fact a comparatively weak means of characterizing romantic relations, and that an alternate network measure that they term dispersion is significantly more effective. Roughly, a link between two people has high dispersion when their mutual friends are not well connected to one another. Their research has an important contingent nature: given that a user has declared a relationship partner, they want to understand how effectively they can find that partner.

Note that the links to a person's relationship partner or other closest friends may have lower embeddedness, but they often involve mutual neighbors from several foci, reflecting the fact that the social orbits of these close friends are not bounded within any one focus. Consider, for example, a husband who knows several of his wife's co-workers, family members, and former classmates, even though these people belong to different foci and do not know each other. Thus, Backstrom and Kleinberg proposed the following definitions.

For a network G = (V,E) and a pair of nodes u and v, denote by Cuv = N(u) ∩ N(v) the set of mutual friends of u and v, and let cuv = |Cuv|. For a vertex set S, let d(s, t, S) be the graph-theoretic distance between s and t in the subgraph of G induced on S. For


distinct s and t in Cuv, define

duv(s, t) = 1 if d(s, t, Cuv) ≥ 3, and duv(s, t) = 0 if d(s, t, Cuv) ≤ 2.

Then, define the absolute dispersion of u and v as

disp(u, v) = ∑_{s,t ∈ Cuv, s ≠ t} duv(s, t).

Note that disp(u, v) depends on both Cuv and the distances d(s, t, Cuv). Define

norm(u, v) = disp(u, v) / cuv,

which is called normalized dispersion. Predicting u’s partner to be theindividual v with

norm(u, v) = max{norm(u, x) : x ∈ V} gives the correct answer in 48.0% of all instances.
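The definitions of disp(u, v) and norm(u, v) can be checked on a toy graph. The following is a minimal sketch (the helper names neighbors, dist_in, dispersion and the toy graph are our own choices); distances are computed by breadth-first search inside the induced subgraph on Cuv, with unreachable pairs treated as distance ∞ ≥ 3.

```python
from collections import deque
from itertools import combinations

def neighbors(edges):
    """Adjacency sets of an undirected graph given as an edge list."""
    nbr = {}
    for a, b in edges:
        nbr.setdefault(a, set()).add(b)
        nbr.setdefault(b, set()).add(a)
    return nbr

def dist_in(allowed, nbr, s, t):
    """BFS distance from s to t using only vertices in `allowed`;
    returns float('inf') if t is unreachable."""
    seen, queue = {s}, deque([(s, 0)])
    while queue:
        x, d = queue.popleft()
        if x == t:
            return d
        for y in nbr.get(x, ()):
            if y in allowed and y not in seen:
                seen.add(y)
                queue.append((y, d + 1))
    return float('inf')

def dispersion(edges, u, v):
    """Return (disp(u, v), norm(u, v)) per the definitions above."""
    nbr = neighbors(edges)
    C = nbr[u] & nbr[v]
    disp = sum(1 for s, t in combinations(sorted(C), 2)
               if dist_in(C, nbr, s, t) >= 3)
    return disp, (disp / len(C) if C else 0.0)

# Toy graph: u = 0 and v = 1 share mutual friends 2, 3, 4, 5;
# only 2 and 3 are adjacent to each other inside C01, so 5 of the
# 6 pairs are "far apart" and disp = 5, norm = 5/4.
toy = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
       (1, 2), (1, 3), (1, 4), (1, 5), (2, 3)]
print(dispersion(toy, 0, 1))  # (5, 1.25)
```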

There are two ways to strengthen normalized dispersion that lead to increased performance. The first is to rank pairs u, v by a function of the form

(disp(u, v) + b)^α / (cuv + c).

Searching over choices of α, b and c leads to a maximum performance of 50.5%, attained at α = 0.61, b = 0, c = 5.

The second way is to apply the idea of dispersion recursively. For a fixed node u, first set xv = 1 for all neighbors v of u. Then, iteratively update each xv to be

( ∑_{w ∈ Cuv} xw^2 + 2 ∑_{s,t ∈ Cuv, s ≠ t} duv(s, t) xs xt ) / cuv.

Note that after the first iteration, xv = 1 + 2 · norm(u, v), and hence ranking nodes by xv after the first iteration is equivalent to ranking nodes by norm(u, v). Backstrom and Kleinberg found that the highest performance comes from ranking nodes by the values of xv after the third iteration; they call such xv the recursive dispersion. The performance of embeddedness and recursive dispersion for romantic relationships is 24.7% and 50.6%, respectively; for (married) spouses it is 32.1% and 60.7%, respectively.
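The recursive update can be sketched as a self-contained Python fragment (the function name, helper structure, and toy graph are our own illustrative choices; distances inside Cuv are BFS distances in the induced subgraph, with unreachable pairs counted as far). Starting from xv = 1, one iteration reproduces the identity xv = 1 + 2 · norm(u, v).

```python
from collections import deque
from itertools import combinations

def recursive_dispersion(edges, u, iterations=3):
    """Iterate  x_v <- (sum_{w in Cuv} x_w^2
                        + 2 * sum_{far pairs s,t in Cuv} x_s * x_t) / c_uv
    starting from x_v = 1 on the neighbors of u."""
    nbr = {}
    for a, b in edges:
        nbr.setdefault(a, set()).add(b)
        nbr.setdefault(b, set()).add(a)

    def far(C, s, t):
        # True when the distance from s to t inside C is >= 3 (or infinite).
        seen, queue = {s}, deque([(s, 0)])
        while queue:
            x, d = queue.popleft()
            if x == t:
                return d >= 3
            for y in nbr.get(x, ()):
                if y in C and y not in seen:
                    seen.add(y)
                    queue.append((y, d + 1))
        return True

    x = {v: 1.0 for v in nbr[u]}
    for _ in range(iterations):
        new = {}
        for v in nbr[u]:
            C = nbr[u] & nbr[v]
            if not C:
                new[v] = 0.0
                continue
            pairs = [(s, t) for s, t in combinations(sorted(C), 2)
                     if far(C, s, t)]
            new[v] = (sum(x[w] ** 2 for w in C)
                      + 2 * sum(x[s] * x[t] for s, t in pairs)) / len(C)
        x = new
    return x

# Toy graph in which norm(0, 1) = 5/4: after one iteration,
# x_1 = 1 + 2 * (5/4) = 3.5.
toy = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
       (1, 2), (1, 3), (1, 4), (1, 5), (2, 3)]
print(recursive_dispersion(toy, 0, iterations=1)[1])  # 3.5
```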


6.7 References

R. Albert, H. Jeong and A.L. Barabasi, Internet: Diameter of the World-Wide Web, Nature, 401 (6749) (1999), 130-131.

L. Backstrom and J. Kleinberg, Romantic partnerships and the dispersion of social ties: A network analysis of relationship status on Facebook, Proc. 17th ACM Conference on Computer Supported Cooperative Work and Social Computing, 2014.

A. Barabasi and R. Albert, Emergence of scaling in random net-works, Science, 286 (5439) (1999), 509-512.

B. Bollobas, O. Riordan, J. Spencer and G. Tusnady, The degreesequence of a scale-free random graph process, Random Struct. Algor.,18 (2001), 279-290.

S. Dorogovtsev and J. Mendes, Evolution of networks, Advances in Physics, 51 (4) (2002), 1079.

N. Eagle, A. Pentland and D. Lazer, Inferring friendship network structure by using mobile phone data, Proc. Natl. Acad. Sci. USA, 106 (36) (2009), 15274-15278.

G. Kossinets and D. Watts, Empirical analysis of an evolving socialnetwork, Science, 311 (2006), 88-90.

L. Li, D. Alderson, J. Doyle and W. Willinger, Towards a theoryof Scale-free graphs: Definitions, properties and implications, InternetMath., 2 (4) (2005), 431-523.

S. Milgram, The small-world problem, Psychology Today, 2 (1967), 60-67.

M. Newman, The structure and function of complex networks, SIAM Review, 45 (2003), 167-256.

M. Newman, D. Watts and S. Strogatz, Random graph models of social networks, Proc. Natl. Acad. Sci. USA, 99 (Suppl 1) (2002), 2566-2572.

R. Pastor-Satorras and A. Vespignani, Epidemic spreading in scale-free networks, Phys. Rev. Lett., 86 (2001), 3200-3203.


D. Price, Networks of scientific papers, Science, 149 (3683) (1965),510-515.

J. Ugander, L. Backstrom, C. Marlow and J. Kleinberg, Structuraldiversity in social contagion, Proc. Natl. Acad. Sci. USA, 109 (16)(2012), 5962-5966.

D. Watts and S. Strogatz, Collective dynamics of ‘small-world’ net-works, Nature, 393 (6684) (1998), 440-442.