IMPA Master's Thesis

Low-Complexity Decompositions of Combinatorial Objects

Author: Davi Castro-Silva
Supervisor: Roberto Imbuzeiro Oliveira

A thesis submitted in fulfillment of the requirements for the degree of Master in Mathematics at Instituto de Matemática Pura e Aplicada

April 15, 2018


IMPA

Master’s Thesis

Low-Complexity Decompositions ofCombinatorial Objects

Author:Davi Castro-Silva

Supervisor:Roberto Imbuzeiro Oliveira

A thesis submitted in fulfillment of the requirementsfor the degree of Master in Mathematics

at

Instituto de Matematica Pura e Aplicada

April 15, 2018


Contents

1 Introduction
  1.1 High-level overview of the framework
  1.2 Remarks on notation and terminology
  1.3 Examples and structure of the thesis

2 Abstract decomposition theorems
  2.1 Probabilistic setting
  2.2 Structure and pseudorandomness
  2.3 Weak decompositions
    2.3.1 Weak Regularity Lemma
  2.4 Strong decompositions
    2.4.1 Szemerédi Regularity Lemma

3 Counting subgraphs and the Graph Removal Lemma
  3.1 Counting subgraphs globally
  3.2 Counting subgraphs locally and the Removal Lemma
  3.3 Application to property testing

4 Extensions of graph regularity
  4.1 Regular approximation
  4.2 Relative regularity

5 Hypergraph regularity
  5.1 Intuition and definitions
  5.2 Regularity at a single level
  5.3 Regularizing all levels simultaneously

6 Dealing with sparsity: transference principles
  6.1 Subsets of pseudorandom sets
  6.2 Upper-regular functions
  6.3 Green-Tao-Ziegler Dense Model Theorem

7 Transference results for L1 structure
  7.1 Relationships between cut norm and L1 norm
  7.2 Inheritance of structure lemmas
  7.3 A "coarse" structural correspondence
  7.4 A "fine" structural correspondence
    7.4.1 Proof of Theorem 7.2

8 Extensions and open problems

Bibliography


Chapter 1

Introduction

Many times in Mathematics and Computer Science we are dealing with a large and general class of objects of a certain kind, and we wish to obtain non-trivial results which are valid for all objects belonging to this class. This may be a very hard task if the possible spectrum of behavior for the members of this class is very broad, since it is unlikely that any single argument will hold uniformly along this whole spectrum.

Such results may be easy (or easier) to obtain when the class we are dealing with is highly structured, in the sense that one can encode its elements in such a way that the description of each object has a relatively small size; then it may be possible to use this structure to prove results valid uniformly over all objects in this class, or to do a case-by-case analysis to obtain such results.

At the other end of the spectrum there are the random objects, which have a very high complexity in the sense that any description of a randomly chosen object must specify the random choices made at each point separately, and thus be very large if the object in consideration is large. However, for such objects there are various "concentration inequalities" which may be used to obtain results valid with high probability over the set of random choices made.

Therefore, if we can decompose every object belonging to the general class we are interested in into a "highly structured" component (which has low complexity) and a "pseudorandom" component (which mimics the behavior of random objects in certain key statistics), then we may analyze each of these components separately by different means and so be able to obtain results which are valid for all such objects.

An illustrative example of a "structure-pseudorandomness" decomposition of this kind is Szemerédi's celebrated Regularity Lemma [31]. This important result roughly asserts that the vertices of any graph G may be partitioned into a bounded number of equal-sized parts, in such a way that for almost all pairs of partition classes the bipartite graph between them is random-like. Both the upper bound we get for the order of this partition and the quality of the pseudorandom behavior of the edges between these pairs depend only on an accuracy parameter ε we are at liberty to choose.

In this example, the object to be decomposed is the edge set E of a given arbitrary graph G = (V,E), which belongs to the "general class" of all graphs. The structured component then represents the pairs (Vi, Vj) of partition classes together with the density of edges between them, and it has low complexity because the order of the partition is uniformly bounded for all graphs. The pseudorandom component represents the actual edges between these pairs, and has a random-like property known as ε-regularity which we will define in the next chapter.

This result has many applications in Combinatorics and Computer Science (see, e.g., [20, 21] for a survey), and it has inspired numerous other decomposition results in a similar spirit both inside and outside Graph Theory.

In this work we aim to survey many decomposition theorems of this form present in the literature. We provide a unified framework for proving them and present some new results along these same lines.

1.1 High-level overview of the framework

In our setting, the combinatorial objects to be decomposed will be represented as functions defined over a discrete space X. This identification does not give much loss in generality, since given a combinatorial object O (such as a graph, hypergraph or additive group), we may usually identify some underlying discrete space X for this kind of object and then represent O as a function fO defined on X.

We endow X with a probability measure P, so that the objects considered may be viewed as random variables, and define a family C of "low-complexity" subsets of X. The specifics of both the probability measure P and the structured family C will depend on the application at hand, and it is from them that we will define our notions of complexity and pseudorandomness.

The sets belonging to C are seen as the basic structured sets, which have complexity 1, and any subset of X which may be obtained by boolean operations from at most k of these basic structured sets A1, · · · , Ak ∈ C is said to have complexity at most k according to C.

We then say two functions g, h : X → R are ε-indistinguishable according to C if, for all sets A ∈ C, we have that |E[(g − h) 1A]| ≤ ε. Intuitively, this means that we are not able to effectively distinguish between h and g by taking their empirical averages over random elements chosen from one of the basic sets in C.
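On a finite space with the uniform measure, ε-indistinguishability can be checked by brute force over the basic sets. The following sketch is our own minimal illustration (the space, basic sets and functions below are hypothetical, not from the text):

```python
def indistinguishable(g, h, basic_sets, X, eps):
    """Check eps-indistinguishability according to the family `basic_sets`
    of basic structured subsets of the finite space X, under the uniform
    measure: |E[(g - h) 1_A]| <= eps must hold for every basic set A."""
    n = len(X)
    return all(abs(sum(g[x] - h[x] for x in A)) / n <= eps for A in basic_sets)

# Tiny example: X = {0,...,7} and two "halves" as the basic structured sets.
X = list(range(8))
halves = [set(range(4)), set(range(4, 8))]
g = {x: 1.0 for x in X}
h = {x: 1.0 + (0.01 if x % 2 == 0 else -0.01) for x in X}  # alternating noise
print(indistinguishable(g, h, halves, X, 0.05))  # True: the noise cancels on each half
```

Note that h differs from g pointwise, yet the averages over each basic set agree; only a basic set correlated with the perturbation would distinguish them.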

A function f : X → R is then said to be ε-pseudorandom if it is ε-indistinguishable from the constant function 1 on X. Thus pseudorandom functions are in some sense uniformly distributed over structured sets, mimicking random functions of mean 1 defined on X.

These concepts are closely related to the notions of pseudorandomness and indistinguishability in the area of Computational Complexity Theory (in the non-uniform setting). In that case, we have a collection F of "efficiently computable" boolean functions f : X → {0, 1} (which are thought of as adversaries), and two distributions A and B on X are said to be ε-indistinguishable by F if

|P (f(A) = 1)− P (f(B) = 1)| ≤ ε ∀f ∈ F

A distribution R is then said to be ε-pseudorandom for F if it is ε-indistinguishable from the uniform distribution UX on X. Intuitively, this means that no adversary from the class F is able to distinguish R from UX with non-negligible advantage.

This is completely equivalent to our definitions, if we identify each function f in F with its support f−1(1) in X, and identify the distributions A, B with the functions g(x) := P(A = x) · |X| and h(x) := P(B = x) · |X|. Then

|P(f(A) = 1) − P(f(B) = 1)| = |E[(g − h) 1f−1(1)]|,

where the expectation on the right-hand side is with respect to the uniform distribution.
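This identity is easy to verify numerically on a small example; in the sketch below the distributions and the adversary f are arbitrary choices of ours:

```python
# Finite space X = {0,1,2,3}, two distributions given as probability vectors.
X = [0, 1, 2, 3]
pA = [0.4, 0.3, 0.2, 0.1]
pB = [0.25, 0.25, 0.25, 0.25]
f = lambda x: 1 if x in (0, 2) else 0   # adversary with support f^{-1}(1) = {0, 2}

# Left-hand side: distinguishing advantage |P(f(A) = 1) - P(f(B) = 1)|.
lhs = abs(sum(pA[x] - pB[x] for x in X if f(x) == 1))

# Right-hand side: |E[(g - h) 1_{f^{-1}(1)}]| under the uniform measure,
# with g(x) = P(A = x)|X| and h(x) = P(B = x)|X|.
n = len(X)
g = [pA[x] * n for x in X]
h = [pB[x] * n for x in X]
rhs = abs(sum(g[x] - h[x] for x in X if f(x) == 1)) / n

print(abs(lhs - rhs) < 1e-12)  # True: the two sides agree
```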

In our abstract decomposition theorems given in Chapter 2, it will be convenient to deal with σ-algebras on X rather than with subsets of X; since a σ-algebra on a finite space X is a finite collection of subsets of X, the intuition will be essentially the same. However, this change will make it simpler to apply tools such as the Cauchy-Schwarz inequality and Pythagoras' theorem, which will both be very important in our energy-increment arguments.

Moreover, we will also require pseudorandom functions to have no correlation in average value with the structured sets, and thus be ε-indistinguishable from the zero function on X. Since the expected value function is linear, this "translation" in our definition makes no important difference.

The framework as described here will be revisited in Chapter 6, when we talk about transference principles and the Dense Model Theorem.

1.2 Remarks on notation and terminology

We will be mainly interested in very large objects, and use the usual asymptotic notation O, Ω, and Θ with subscripts indicating the parameters the implied constant is allowed to depend on. For instance, Oα,β(X) denotes a quantity bounded in absolute value by Cα,β|X| for some quantity Cα,β depending only on α and β.

We write Ea∈A,b∈B to denote the expectation when a is chosen uniformly from the set A and b is chosen uniformly from the set B, both choices being independent. For any real numbers a and b, we write x = a ± b to denote a − b ≤ x ≤ a + b.

Given an integer n ≥ 1, we write [n] for the set {1, · · · , n}. If A is a set and k is an integer, we write (A choose k) to denote the collection of all k-element subsets of A. We write A △ B to denote the symmetric difference (A \ B) ∪ (B \ A) of the sets A and B.


Formally, a graph G is given by a pair (V,E), where V is a finite set called the vertex set and E ⊆ (V choose 2), called the edge set, is a subset of the (unordered) pairs of vertices. We will sometimes write uv or vu to denote an edge {u, v} ∈ E, and denote by 1G the edge indicator function 1G(x, y) := 1{xy ∈ E}.

For any graph G, we will refer to its vertex set by V(G) and its edge set by E(G); if there is no risk of confusion, we may denote them simply by V and E. For subsets A, B ⊆ V(G), we write eG(A,B) to denote the number of edges in G with one vertex in A and the other in B, counting twice the edges inside A ∩ B. We also write dG(A,B) := eG(A,B)/|A × B| to denote the edge density of the pair (A,B).
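These counting conventions (in particular the double-counting of edges inside A ∩ B) are easy to get wrong; a small helper makes them concrete. The code below is our own illustration, with the graph stored as a list of unordered edge pairs:

```python
def e_G(edges, A, B):
    """eG(A, B): edges with one endpoint in A and the other in B,
    counting edges inside A ∩ B twice, as in the text."""
    return sum((u in A and v in B) + (v in A and u in B) for (u, v) in edges)

def d_G(edges, A, B):
    """dG(A, B) := eG(A, B) / |A × B|, the edge density of the pair (A, B)."""
    return e_G(edges, A, B) / (len(A) * len(B))

# Path on four vertices: 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
print(e_G(edges, {0, 1}, {1, 2}))   # 2
print(e_G(edges, {1, 2}, {1, 2}))   # 2: the edge 12 inside A ∩ B counts twice
print(d_G(edges, {0, 1}, {1, 2}))   # 0.5
```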

As customary, we say that H is a subgraph of G, and write H ⊆ G, if V(H) ⊆ V(G) and E(H) ⊆ E(G). Moreover, if V(H) = V(G), then we say that H is a spanning subgraph of G. If W ⊆ V(G), then the subgraph of G induced by W is the graph G[W] := (W, E(G) ∩ (W choose 2)). A subgraph H of G is an induced subgraph if H = G[V(H)].

The collection of all partitions of a set A is denoted by P(A). If P0 := (Vi)i∈[k] ∈ P(V) is a partition of a vertex set V, we say a graph G = (V,E) is P0-partite if there are no edges inside each vertex class in P0, i.e., if eG(Vi, Vi) = 0 for all i ∈ [k].

1.3 Examples and structure of the thesis

In Combinatorics, probably the most important and widely known decomposition result of the kind we discussed is Szemerédi's Regularity Lemma (Theorem 2.3, proven in the next chapter), already mentioned in the first part of this chapter. This lemma, in a slightly weaker earlier version, was originally used by Szemerédi to prove that all sets of integers with positive upper density contain arbitrarily long arithmetic progressions [30], a result now known as Szemerédi's Theorem.

More recently, it was used by Lovász and B. Szegedy [22] to construct limit objects for infinite sequences of graphs. The Regularity Lemma was shown to imply the compactness of a metric space on two-variable functions in which finite graphs may be naturally embedded, and thus that every sequence of graphs has a converging subsequence in this space. The authors then showed how the compactness of this metric space may be used to prove a stronger version of the Regularity Lemma, known as the Regular Approximation Lemma, which we will prove in Section 4.1.

In Computer Science, Trevisan, Tulsiani and Vadhan [37] proved a general decomposition theorem with the same philosophy as the results presented here, and in a framework similar to the one discussed in Section 1.1 (but with a different notion of complexity, more adapted to applications in Computer Science). They used this result to show that every high-entropy distribution is indistinguishable from an efficiently samplable distribution of the same entropy. They also showed how this decomposition theorem may be used to prove an important result in Computational Complexity Theory known as Impagliazzo's Hardcore Theorem [17].

Still in the realm of Computer Science, an important theme which falls within the scope of our subject matter is that of graph property testing, introduced in the seminal paper of Goldreich, Goldwasser and Ron [13]. Let us now quickly present the main problem in this area and its connection to our philosophy and objectives; we will mention this theme again in Section 3.3 and in the introduction of Chapter 4.

Given ε > 0, let us say a graph G on n vertices is ε-far from satisfying a graph property P if one needs to add and/or delete at least εn² edges of G in order to turn it into a graph that satisfies P. An ε-test for P is a randomized algorithm T making a total number of edge queries bounded by a function of ε only, which can distinguish with probability at least 2/3 between graphs satisfying P and graphs that are ε-far from satisfying P. A graph property P is then said to be testable if, for any given ε > 0, there exists an ε-test for P.
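To make the query model concrete, here is a toy ε-test of our own devising (not an example from the text) for the trivial property "G has no edges". If G is ε-far from empty then, under the convention that at least εn² of the n² ordered vertex pairs are edges, each uniformly random query finds an edge with probability at least ε, so O(1/ε) queries suffice to reject with probability well above 2/3:

```python
import random

def eps_test_empty(adj, n, eps, rng=random):
    """A toy eps-test for the property "G has no edges": it makes O(1/eps)
    edge queries via adj(u, v), always accepts an empty graph, and rejects
    any graph that is eps-far from empty with probability at least 2/3,
    since (1 - eps)^(2/eps) <= e^(-2) < 1/3."""
    for _ in range(int(2 / eps) + 1):
        u, v = rng.randrange(n), rng.randrange(n)
        if adj(u, v):
            return False   # an edge was found: G is certainly not empty
    return True

n = 100
empty = lambda u, v: False
complete = lambda u, v: u != v   # eps-far from empty for any reasonable eps
print(eps_test_empty(empty, n, 0.1))      # True: one-sided error
print(eps_test_empty(complete, n, 0.1))   # False with overwhelming probability
```

Note the one-sided error: graphs satisfying the property are always accepted, which is stronger than the 2/3 guarantee in the definition requires.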

The central problem in graph property testing is to determine which properties are testable, and also to devise efficient ε-tests for these properties. To see its relation to our subject of study, suppose we have a decomposition of a graph G into a structured low-complexity component and a pseudorandom component. Intuitively, if we query a large (constant) number of randomly chosen edges from G, then with high probability we will have queried the expected proportion of edges from each of the classes in the structured component, and the effects of the pseudorandom component will be averaged out; thus a property P should be testable if and only if knowing the structured component of a graph is enough to tell whether P is satisfied (or is close to being satisfied) for that graph.

It turns out that this intuition may be formalized, and the testable graph properties were completely characterized in this sense in a remarkable paper by Alon, Fischer, Newman and Shapira [1]. Their characterization roughly says that a graph property P is testable if and only if, for every ε > 0, ε-testing P can be reduced (in a specific property-testing sense) to the property of satisfying one of finitely many Szemerédi-partition instances. This characterization is an illustrative example of our "low-complexity decomposition" interpretation of the Regularity Lemma, and this interpretation was key in proving the characterization in [1].

Another class of results which is closely related to our subject matter of low-complexity decompositions is that of transference principles, which allow us to transfer some combinatorial theorems from the "dense setting" over to the "sparse setting", where the objects may be much harder to handle.

To account for the vanishing density of the objects in the sparse setting, it is usual to renormalize the functions representing these objects so that they have average close to 1. This renormalization causes the functions considered to become unbounded as the size of the universe X grows, which is a major source of difficulties. The transference principles then assert that, if the sparse (unbounded) functions satisfy some mild "uniformity" conditions, then they may be modeled by bounded functions which have key properties similar to those of the original functions.

Transference principles of this form were essential ingredients in the papers [16, 36], where the authors transferred Szemerédi's Theorem, and its generalization to polynomial progressions due to Bergelson and Leibman [4], to dense subsets of a sufficiently pseudorandom subset of the integers. Using this result, they were able to prove that the theorems mentioned above hold also for the set of primes, even though it has zero density inside the integers. In a subsequent paper [35], Tao transferred the Hypergraph Removal Lemma (see [32, 28]) to sub-hypergraphs of pseudorandom sparse hypergraphs, and then used this result to prove that the Gaussian primes contain arbitrarily shaped constellations.

In this work we will not focus on giving applications of the decomposition theorems mentioned, but rather concentrate on the abstract mathematical ideas behind these results. These ideas may be viewed as representing a dichotomy between structure and randomness, as brilliantly advocated by Tao [34, 33], which seems to permeate many areas of Mathematics.

We will focus here on the case of (finite) sets, graphs and hypergraphs, with the main interest being the case of graphs. However, our methods are presented in a general context, and may also be used in other settings.

In Chapter 2 we will present the general framework for establishing our results and the precise definitions of complexity and pseudorandomness we will use. We will then show how to apply our decomposition theorems by using them to prove a weaker form of the Regularity Lemma due to Frieze and Kannan [12], and then Szemerédi's Regularity Lemma itself.

In Chapter 3 we show how the regularity properties of the partitions given in each form of the Regularity Lemma may be used to approximate the number of copies of any fixed graph H inside a graph G. This approximate counting is then used to prove an important result in Graph Theory known as the Graph Removal Lemma, which roughly says that every graph G on n vertices having o(n^|V(H)|) copies of a given graph H can be made H-free by deleting o(n²) edges.

We next prove two extensions of the Regularity Lemma for graphs in Chapter 4, each made to handle a different issue that the original Regularity Lemma left unaddressed. The first extension, called the Regular Approximation Lemma (Theorem 4.1), intuitively asserts that it is possible to greatly enhance the regularity of a graph by making very few edge modifications, and so gives us more control over the pseudorandom component in terms of the complexity of the structured component. The second extension (Theorem 4.2) is a relative form of the Regularity Lemma, useful for dealing with arbitrary spanning subgraphs of a known fixed graph, and is especially useful for dealing with very sparse graphs.

We will then give in Chapter 5 the generalization of the Regularity Lemma to the setting of uniform hypergraphs, which are "higher order" versions of graphs whose edges are now composed of d vertices, for some integer d ≥ 3 called the uniformity of the hypergraph. We remark that this higher number of vertices inside each edge introduces a much more intricate structure than that present in graphs, and the corresponding regularity lemma is accordingly much more involved than Szemerédi's Regularity Lemma.


In Chapter 6 we will consider transference principles, which were already discussed above, and which permit us to transfer some combinatorial theorems from the usual "positive density" setting to objects having asymptotically negligible density but which satisfy some mild uniformity conditions. We will present three results in this direction, which concern different uniformity conditions we require the sparse objects to satisfy, but which give similar conclusions.

Chapter 7 will be dedicated to obtaining transference results in the graph setting for L1 structure, which is stronger than the cut structure preserved by the transference principles given in Chapter 6. These results are in some sense a strengthening of the theorems given in Chapter 6 when applied to the setting of graphs, and may be seen as requiring the "transference function" from the sparse space to the dense space to be continuous in the L1 norm, so as to preserve the underlying L1 geometry. The results presented in this chapter are the main original contributions of this work.

Chapter 8 then mentions some possible extensions to the results shown in Chapter 7, indicating a path to be taken for future work.


Chapter 2

Abstract decomposition theorems

This chapter is aimed at developing a general method to decompose an arbitrary object f of some kind into a sum f = fstr + fpsd, where fstr is a low-complexity structured component and fpsd behaves randomly. As mentioned in the introduction, such a decomposition is useful because we may then use different methods to analyze each one of the components separately, taking advantage of their structure, making it much easier to analyze the arbitrary original object f.

In such a decomposition we must always perform a trade-off, increasing our control on one of the components at the expense of worsening our control on the other. In many situations, it turns out to be useful to allow a third term ferr into the decomposition f = fstr + fpsd + ferr, which can be seen as an error term and may be made sufficiently small for the application at hand. We will see in Subsection 2.4.1 that the presence of the error component is in fact essential if we wish to use this decomposition to prove Szemerédi's Regularity Lemma.

The method we will use to prove such "decomposition theorems" is a simple energy-increment argument (in this form due to Tao [33]), which will be described in the next sections.

2.1 Probabilistic setting

Let (X, Bmax, P) be a probability space, and for brevity let us call a sub-σ-algebra B of Bmax simply a factor of Bmax.

Given measurable sets A1, · · · , Ak ∈ Bmax, we denote by σ(A1, · · · , Ak) the smallest factor of Bmax which contains all these sets. Given factors B1, · · · , Bk ⊆ Bmax, we denote by B1 ∨ · · · ∨ Bk the join of these factors, which is the smallest factor of Bmax which contains all of them.

Given a square-integrable function f ∈ L2(Bmax) and a factor B ⊆ Bmax, we define the conditional expectation E[f|B] ∈ L2(B) as the orthogonal projection of f onto the closed subspace L2(B) of L2(Bmax) consisting of the B-measurable square-integrable functions.

A simple application of Pythagoras’ theorem then gives the following lemma:

Lemma 2.1 (Pythagoras' theorem). Let B ⊆ B′ be two factors of Bmax. Then for any function f ∈ L2(Bmax) we have

‖E[f|B′]‖²L² = ‖E[f|B]‖²L² + ‖E[f|B′] − E[f|B]‖²L².

We remark that, even though the general decomposition theorems in this chapter are stated and proven in full generality, in applications we will only deal with finite probability spaces equipped with the discrete σ-algebra Bmax = 2^X. In this restricted setting, X will be a finite set, every subset A ⊆ X will be measurable, and a factor B of Bmax may be identified with the partition of X induced by its atoms; this identification will be made throughout the rest of this work without further comment.

If in addition we suppose that the probability P is the uniform probability distribution over X, then for any partition B : X = X1 ∪ · · · ∪ Xk of X into k atoms we have that

E[f|B](x) = (1/|Xi|) Σy∈Xi f(y)  whenever x ∈ Xi;

thus the conditional expectation is just an averaging of the function over the atoms of B.
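Both the averaging description above and Lemma 2.1 are easy to check on a toy finite space. The sketch below (our own small example, uniform measure) computes E[f|B] atom by atom and verifies the Pythagoras identity for a nested pair of factors:

```python
import random

def cond_exp(f, partition):
    """E[f | B] on a finite space with the uniform measure: replace f by its
    average over each atom of the partition B, as in the formula above."""
    out = [0.0] * len(f)
    for atom in partition:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            out[x] = avg
    return out

def l2_sq(g):
    """‖g‖²L² = E[g²] under the uniform measure."""
    return sum(v * v for v in g) / len(g)

random.seed(0)
f = [random.gauss(0, 1) for _ in range(12)]
coarse = [range(0, 6), range(6, 12)]                           # factor B
fine = [range(0, 3), range(3, 6), range(6, 9), range(9, 12)]   # refinement B′

# Lemma 2.1: ‖E[f|B′]‖² = ‖E[f|B]‖² + ‖E[f|B′] − E[f|B]‖².
lhs = l2_sq(cond_exp(f, fine))
diff = [a - b for a, b in zip(cond_exp(f, fine), cond_exp(f, coarse))]
rhs = l2_sq(cond_exp(f, coarse)) + l2_sq(diff)
print(abs(lhs - rhs) < 1e-10)  # True
```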


As an important example, consider the case of a bipartite graph G = (V1 ∪ V2, E). We then take X = V1 × V2, Bmax = 2^(V1×V2) and the uniform probability distribution over (X, Bmax); this corresponds to picking a pair (x1, x2) ∈ V1 × V2 uniformly at random. If we have a partition B : V1 × V2 = ⋃i,j Xi × Yj, then E[1G|B](x, y) = eG(Xi, Yj)/|Xi × Yj| is just the edge density between the parts Xi ∋ x and Yj ∋ y.

For any sets A ⊆ V1, B ⊆ V2 we then have that

|E[(1G − E[1G]) 1A×B]| = (1/|V1 × V2|) |eG(A,B) − (eG(V1, V2)/|V1 × V2|) |A × B||,

which may be seen as a kind of discrepancy of the edges over the pair (A,B) and, if made smaller than ε, resembles the usual definition of ε-regularity (which we recall below in Remark 2.1).

It will therefore be more appropriate in our setting to define ε-regularity in the following less standard (but essentially equivalent) way:

Definition 2.1 (ε-regularity). A bipartite graph G = (V1 ∪ V2, E) is ε-regular for some ε > 0 if, for all sets A ⊆ V1, B ⊆ V2, we have that

|eG(A,B) − (eG(V1, V2)/|V1 × V2|) |A × B|| ≤ ε|V1 × V2|.  (2.1)

Similarly, a (non-bipartite) graph G = (V,E) is ε-regular if, for all A, B ⊆ V, we have

|eG(A,B) − (2|E|/|V|²) |A × B|| ≤ ε|V|².

Remark 2.1. The usual definition of ε-regularity requires instead the left-hand side of (2.1) to be smaller than ε|A × B| whenever |A| ≥ ε|V1|, |B| ≥ ε|V2|; this requirement implies ε-regularity in our definition, which in turn implies ε^(1/3)-regularity in the usual definition.
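For very small bipartite graphs, Definition 2.1 can be checked by exhaustive enumeration of the subsets A and B. The following brute-force checker is our own illustration (exponential time, so toy sizes only):

```python
from itertools import chain, combinations

def subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

def is_eps_regular(V1, V2, edges, eps):
    """Brute-force check of Definition 2.1 for a bipartite graph G = (V1 ∪ V2, E):
    for all A ⊆ V1 and B ⊆ V2,
        |eG(A, B) − (eG(V1, V2)/|V1 × V2|)·|A × B|| ≤ ε·|V1 × V2|.
    Exponential in |V1| + |V2|, so only usable on toy examples."""
    E = set(edges)
    N = len(V1) * len(V2)
    density = sum(1 for u in V1 for v in V2 if (u, v) in E) / N
    for A in subsets(V1):
        for B in subsets(V2):
            e_AB = sum(1 for u in A for v in B if (u, v) in E)
            if abs(e_AB - density * len(A) * len(B)) > eps * N:
                return False
    return True

V1, V2 = [0, 1, 2], ['a', 'b', 'c']
K33 = [(u, v) for u in V1 for v in V2]   # complete bipartite graph
star = [(0, v) for v in V2]              # all edges at a single vertex
print(is_eps_regular(V1, V2, K33, 0.01))  # True: zero discrepancy everywhere
print(is_eps_regular(V1, V2, star, 0.1))  # False: A = {0}, B = V2 witnesses it
```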

2.2 Structure and pseudorandomness

We will now define the notions of structure and pseudorandomness we will need for the decomposition theorems in the next two sections; these notions were first introduced by Tao in [33].

The basic structured objects will be factors of Bmax belonging to a collection S fixed at the beginning. These factors are supposed to be of low complexity and represent the information we can efficiently obtain about our random variables. It is from them that we define the complexity of other objects.

Remark 2.2. As explained in Section 1.1, the idea was to define the complexity of arbitrary sets in our space X by how they relate to some "basic structured sets" belonging to a given family C. While this is indeed the spirit of our definitions, in order to apply the energy-increment method it is better to work with factors of Bmax instead of subsets of X; this is why the basic objects with which we define complexity are factors instead of sets. It may be instructive to think of these basic factors in S as each being generated by a single basic structured set in C.

Definition 2.2 (Complexity). We say that a factor B ⊆ Bmax has S-complexity at most M, and denote this by complexS(B) ≤ M, if it may be written as the join B = Y1 ∨ · · · ∨ Ym of m factors Yi ∈ S for some m ≤ M.

Definition 2.3 (Pseudorandomness). Given ε > 0, we say that a function f ∈ L2(Bmax) is ε-pseudorandom according to S if ‖E[f|Y]‖L² ≤ ε holds for all Y ∈ S.

Intuitively, a function f : X → R is pseudorandom if it has negligible correlation with all structured factors. This may be seen by using the Cauchy-Schwarz inequality: for every set A ⊆ X which is measurable with respect to some structured factor Y ∈ S, we have

|E[f 1A]| = |E[E[f|Y] 1A]| ≤ ‖E[f|Y]‖L² ‖1A‖L² ≤ ‖E[f|Y]‖L²,  (2.2)

and so |E[f 1A]| ≤ ε if f is ε-pseudorandom.
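The chain of inequalities (2.2) can be spot-checked numerically. In the sketch below (our own toy example, uniform measure) the factor Y is a partition into three atoms, and A ranges over the Y-measurable sets, i.e. the unions of atoms:

```python
def cond_exp(f, partition):
    """E[f | Y] on a finite space with the uniform measure."""
    out = {}
    for atom in partition:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            out[x] = avg
    return out

X = list(range(6))
Y = [(0, 1), (2, 3), (4, 5)]   # a structured factor Y ∈ S, given by its atoms
f = {0: 0.3, 1: -0.3, 2: 0.4, 3: 0.2, 4: -0.1, 5: 0.1}

l2 = lambda g: (sum(g[x] ** 2 for x in X) / len(X)) ** 0.5
bound = l2(cond_exp(f, Y))     # ‖E[f|Y]‖L²

# Check |E[f 1_A]| <= ‖E[f|Y]‖L² for a few Y-measurable sets A.
measurable_sets = [(), (0, 1), (2, 3), (0, 1, 2, 3), (2, 3, 4, 5), tuple(X)]
ok = all(abs(sum(f[x] for x in A)) / len(X) <= bound + 1e-12 for A in measurable_sets)
print(ok)  # True
```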


2.3 Weak decompositions

To ease the presentation, we will give in this chapter the general theorems for "one-dimensional" functions, i.e., functions whose image is in R. However, this restriction is not of importance in the proofs, and the theorems may then be straightforwardly generalized to functions whose image is in Rk. We will say more about this generalization in a later chapter, where we will need a "multi-dimensional" structure theorem in order to prove the Hypergraph Regularity Lemma.

From the definitions presented in the last section, we get the following energy-increment result:

Lemma 2.2 (Lack of pseudorandomness implies energy increment [33]). Let f ∈ L2(Bmax), ε > 0 and B ⊆ Bmax be such that f − E[f|B] is not ε-pseudorandom. Then there exists a factor Y ∈ S such that

‖E[f|B ∨ Y]‖²L² > ‖E[f|B]‖²L² + ε².

Proof. By hypothesis, there exists Y ∈ S such that ‖E[f − E[f|B] | Y]‖²L² > ε². By Pythagoras' theorem, this implies that

‖E[f − E[f|B] | B ∨ Y]‖²L² > ε².

By Pythagoras' theorem again, we have

‖E[f − E[f|B] | B ∨ Y]‖²L² = ‖E[f|B ∨ Y] − E[f|B]‖²L² = ‖E[f|B ∨ Y]‖²L² − ‖E[f|B]‖²L²,

and so ‖E[f|B ∨ Y]‖²L² > ‖E[f|B]‖²L² + ε².

To draw a parallel with Graph Theory, note that in the bipartite graph setting, if B : V1 × V2 = ⋃i,j Xi × Yj is a (product) partition of V1 × V2, then

‖E[1G|B]‖²L² = Σi Σj (|Xi|/|V1|) (|Yj|/|V2|) (eG(Xi, Yj)/|Xi × Yj|)²

is the so-called index of the partition B, which is an essential ingredient in the usual proof of the Regularity Lemma; here, it will represent the "energy" we wish to maximize and is at the core of our energy-increment arguments.

By a simple iteration of Lemma 2.2, we easily obtain a "weak" decomposition theorem (which, following Tao, we will call the Weak Structure Theorem):

Lemma 2.3 (Weak Structure Theorem [33]). Let f ∈ L2(Bmax) be such that ‖f‖L2 ≤ 1, let B be a factor of Bmax and let 0 < ε ≤ 1. Then there exists a decomposition f = fstr + fpsd where:

• fstr = E[f|B ∨ Z] for some factor Z of S-complexity less than 1/ε²

• fpsd is ε-pseudorandom according to S

Proof. We will choose factors Y1, Y2, · · · , Ym ∈ S, for some m < 1/ε², using the following algorithm (which relies on Lemma 2.2):

– Step 0: Initialize i = 0.

– Step 1: Define Z := Y1 ∨ · · · ∨ Yi, fstr := E[f|B ∨ Z] and fi := f − fstr.

– Step 2: If fi is ε-pseudorandom, let fpsd := fi and STOP. Otherwise, choose Yi+1 ∈ S such that ‖E[f|B ∨ Z ∨ Yi+1]‖²L2 > ‖E[f|B ∨ Z]‖²L2 + ε².

– Step 3: Increment i to i + 1 and return to Step 1.

Since the energy ‖E[f|B ∨ Z]‖²L2 is bounded between 0 and 1 (by the hypothesis ‖f‖L2 ≤ 1) and increases by more than ε² at each iteration, the algorithm must terminate in fewer than 1/ε² iterations, and the lemma follows.
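The greedy refinement above is easy to simulate on a finite probability space. The sketch below is purely illustrative (the names weak_decomposition, cond_exp, join are ours, not from the text): factors are partitions of a finite set X with uniform measure, and S is a list of candidate structured partitions.

```python
from itertools import product

def cond_exp(f, partition):
    """E[f | partition]: average f over each block of the partition."""
    g = {}
    for block in partition:
        avg = sum(f[x] for x in block) / len(block)
        for x in block:
            g[x] = avg
    return g

def l2_norm_sq(g, X):
    """Squared L2 norm under the uniform measure on X."""
    return sum(g[x] ** 2 for x in X) / len(X)

def join(p, q):
    """Common refinement p ∨ q of two partitions."""
    blocks = [set(a) & set(b) for a, b in product(p, q)]
    return [sorted(b) for b in blocks if b]

def weak_decomposition(f, X, S, eps):
    """Refine the trivial factor until f - E[f|Z] is eps-pseudorandom
    with respect to every partition in S (the loop of Lemma 2.3)."""
    Z = [sorted(X)]                         # trivial factor
    while True:
        fstr = cond_exp(f, Z)
        fpsd = {x: f[x] - fstr[x] for x in X}
        bad = next((Y for Y in S
                    if l2_norm_sq(cond_exp(fpsd, Y), X) > eps ** 2), None)
        if bad is None:
            return fstr, fpsd, Z
        Z = join(Z, bad)                    # energy increases by > eps^2

X = list(range(8))
f = {x: float(x % 2) for x in X}            # parity function on X
S = [[[0, 1, 2, 3], [4, 5, 6, 7]], [[0, 2, 4, 6], [1, 3, 5, 7]]]
fstr, fpsd, Z = weak_decomposition(f, X, S, eps=0.1)
```

Here the parity function correlates with the second candidate partition, so the algorithm refines once and then f itself becomes the structured part, leaving fpsd ≡ 0.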


2.3.1 Weak Regularity Lemma

As a simple and important application of Lemma 2.3, we will prove Frieze and Kannan's Weak Regularity Lemma:

Theorem 2.1 (Weak Regularity Lemma [12]). For every ε > 0 and every graph G = (V, E), there exists a partition P : V = V1 ∪ · · · ∪ Vk of V into k ≤ 2^{2/ε²} parts satisfying

|eG(A, B) − Σ_{i,j∈[k]} (eG(Vi, Vj)/|Vi × Vj|) |A ∩ Vi||B ∩ Vj|| ≤ ε|V|², ∀A, B ⊆ V. (2.3)

Remark 2.3. A partition P satisfying (2.3) is called a weak ε-regular partition of V .

Proof. This is basically a restatement of the Weak Structure Theorem specialized to the graph setting.

The probability space here is given by (V × V, 2^{V×V}, P), with P being the uniform probability distribution over V × V. We define the structured set S := {σ(A × V, V × B) : A, B ⊆ V}, which is chosen so that any product set A × B ⊆ V × V is measurable in some factor Y ∈ S, and any factor Y ∈ S has only product sets as its atoms.

By the Weak Structure Theorem applied to the edge indicator function 1G and the trivial σ-algebra B = {∅, V × V}, we obtain a factor Z of S-complexity at most 1/ε² for which

‖E[1G − E[1G|Z] | Y]‖L2 ≤ ε, ∀Y ∈ S.

By construction, the factor Z is a product σ-algebra, and each "coordinate" of Z induces a partition of V into at most 2^{1/ε²} atoms; we refine these two partitions and so obtain a single partition P : V = V1 ∪ · · · ∪ Vk into k ≤ 2^{2/ε²} parts.

Since for any sets A, B ⊆ V there exists a structured factor Y ∈ S for which A × B ∈ Y, by Cauchy-Schwarz we obtain

max_{A,B⊆V} E[(1G − E[1G|Z]) 1_{A×B}] = max_{A,B⊆V} E[(E[1G − E[1G|Z] | σ(A × V, V × B)]) 1_{A×B}]
                                     ≤ max_{Y∈S} ‖E[1G − E[1G|Z] | Y]‖L2 ≤ ε.

We then finish the proof by noting that

E[(1G − E[1G|Z]) 1_{A×B}] = (1/|V|²) (eG(A, B) − Σ_{i,j∈[k]} (eG(Vi, Vj)/|Vi × Vj|) |A ∩ Vi||B ∩ Vj|).

2.4 Strong decompositions

The Weak Structure Theorem (Lemma 2.3) already gives an interesting and non-trivial decomposition result, but its applications are limited because the pseudorandomness of the component fpsd is relatively weak compared to the complexity bound we have for the component fstr. As already noted before, the way to increase our control on both of these terms simultaneously is to allow for a small error term ferr in the decomposition.

This is done in the following theorem:

Theorem 2.2 (Strong Structure Theorem [33]). Let f ∈ L2(Bmax) be such that ‖f‖L2 ≤ 1, let 0 < ε ≤ 1 and let F : R+ → R+ be an arbitrary increasing function. Then there exists an integer M = O_{ε,F}(1) and a decomposition f = fstr + fpsd + ferr such that:

• fstr = E[f|B], for some factor B of S-complexity at most M

• fpsd = f − E[f|B′], where B ⊆ B′ ⊆ Bmax, is 1/F(M)-pseudorandom

• ferr = E[f|B′] − E[f|B] satisfies ‖ferr‖L2 ≤ ε

Proof. By increasing F if necessary, we may assume that F(x) ≥ x + 1 for all x ∈ R+ and that F is strictly increasing.


We will recursively define real numbers M0 < M1 < M2 < · · · and factors B0 ⊆ B1 ⊆ B2 ⊆ · · · ⊆ Bmax in the following way. First, set M0 := 0 and B0 := {∅, X}. Then, for every i ≥ 1, use Lemma 2.3 with ε being 1/F(M_{i−1}) and B being B_{i−1} to obtain a factor Z_i of S-complexity at most F(M_{i−1})² such that f − E[f|B_{i−1} ∨ Z_i] is 1/F(M_{i−1})-pseudorandom; then set M_i := M_{i−1} + F(M_{i−1})² and B_i := B_{i−1} ∨ Z_i.

Note that complex_S(B_i) ≤ M_i for all i ≥ 0. By Pythagoras' theorem and the hypothesis ‖f‖L2 ≤ 1, the energy ‖E[f|B_i]‖²L2 is increasing in i and bounded between 0 and 1. By the pigeonhole principle, we may then find an index 1 ≤ j ≤ 1/ε² such that

‖E[f|B_j]‖²L2 − ‖E[f|B_{j−1}]‖²L2 ≤ ε².

By Pythagoras' theorem, this implies that ‖E[f|B_j] − E[f|B_{j−1}]‖L2 ≤ ε. We may then set fstr := E[f|B_{j−1}], fpsd := f − E[f|B_j], M := M_{j−1} and ferr := E[f|B_j] − E[f|B_{j−1}] to obtain the claim.

We remark that the upper bound we get for M in this theorem is extremely large. Indeed, this bound is obtained by iteratively applying the transformation x ↦ x + F(x)² a total of 1/ε² times, starting from x = 0.

If F is an exponential function (as will be the case in the proof of Szemerédi's Regularity Lemma), then we obtain an exponential tower of height Θ(ε⁻²). Unfortunately, as we will briefly discuss in Remark 2.5 at the end of this chapter, these terrible bounds cannot be improved in general.
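To get a feel for how fast this bound grows, one can iterate the map x ↦ x + F(x)² directly; the toy computation below (function name ours) uses an exponential F as in the Regularity Lemma proof.

```python
# Iterate x -> x + F(x)^2 for `steps` rounds starting from x = 0,
# mimicking the recursion M_i = M_{i-1} + F(M_{i-1})^2 from the proof.
def strong_bound(F, steps):
    x = 0
    for _ in range(steps):
        x += F(x) ** 2
    return x

F = lambda x: 2 ** x                 # exponential growth function
vals = [strong_bound(F, s) for s in range(5)]
# vals starts 0, 1, 5, 1029; the next value already exceeds 2^2058
```

After only four iterations the bound is astronomically large, which is the tower-type behavior described above.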

2.4.1 Szemerédi Regularity Lemma

Using the Strong Structure Theorem, we may now prove Szemerédi's celebrated Regularity Lemma:

Theorem 2.3 (Szemerédi Regularity Lemma [31]). For every ε > 0 and k0 ≥ 1, there exists an integer K0 such that the following holds. Every graph G = (V, E) with |V| ≥ K0 admits a partition P : V = V0 ∪ V1 ∪ · · · ∪ Vk of its vertex set with the following properties:

• k0 ≤ k ≤ K0

• |V0| < ε|V| and |V1| = |V2| = · · · = |Vk|

• all but at most εk² of the pairs (Vi, Vj) are ε-regular

Remark 2.4. The set V0 is called the exceptional set, and a partition P satisfying the second and third properties above is called an ε-regular partition of V.

Proof. We will apply the Strong Structure Theorem in the graph setting: X = V × V, Bmax = 2^{V×V} and uniform probability distribution P; this is the space generated by selecting pairs of vertices (x, y) from V² independently and uniformly at random. The set of basic structured objects will be S := {σ(A × V, V × B) : A, B ⊆ V}, which represents the information of whether vertex x belongs to a subset A and whether y belongs to a subset B of V.

Define the error parameter α and the function F by

α := ε^{3/2}/8,   F(x) := (8/ε)(k0 + 2^{2x}/α)².   (2.4)

Applying the Strong Structure Theorem to the edge indicator function 1G, with ε substituted by α and the function F as defined above, we obtain an integer M = O_{α,F}(1) and a pair of factors B ⊆ B′ ⊆ 2^{V×V} such that:

– complex_S(B) ≤ M

– ‖E[1G − E[1G|B′] | Y]‖L2 ≤ 1/F(M) for all Y ∈ S

– ‖E[1G|B′] − E[1G|B]‖L2 ≤ α


We note that, for any function f and any Y-measurable set Y, we can apply Cauchy-Schwarz to obtain

|E[f 1_Y]| = |E[E[f|Y] 1_Y]| ≤ ‖E[f|Y]‖L2 ‖1_Y‖L2 ≤ ‖E[f|Y]‖L2.

By the definition of the set S, the second property then implies that

|E[(1G − E[1G|B′]) 1_{A×B}]| ≤ 1/F(M), ∀A, B ⊆ V. (2.5)

By construction, the factor B is a product σ-algebra and each coordinate of B is generated by at most complex_S(B) ≤ M sets. Thus, each coordinate of B induces a partition of V into at most 2^M parts, and their common refinement (which we will call P) partitions V into at most 2^{2M} atoms. We further refine this partition into a new, more "equitable" partition V = V0 ∪ V1 ∪ · · · ∪ Vk in such a way that:

– k0 ≤ k = O_{M,α}(1)

– |V0| < α|V| and |V1| = |V2| = · · · = |Vk|

– each non-exceptional set Vi is entirely contained within an atom of P

This may be accomplished by setting k := max{k0, ⌈2^{2M}/α⌉}, partitioning V greedily inside the atoms of P, and uniting all remaining vertices into the exceptional set V0; this way we have

|V0| ≤ (|V|/k − 1) · 2^{2M} < α|V|   and   |Vi| = (|V| − |V0|)/k ≥ (1 − α)|V|/k, ∀i ∈ [k].

As each Vi × Vj is contained inside an atom of B, we see that E[1G|B] is constant over Vi × Vj for all i, j ∈ [k]; let d_{i,j} denote this value. To show that the pair (Vi, Vj) is ε-regular, that is,

|eG(A, B) − (eG(Vi, Vj)/|Vi × Vj|) |A × B|| ≤ ε|Vi × Vj|, ∀A ⊆ Vi, B ⊆ Vj,

by the triangle inequality it suffices to show that

|eG(A, B) − d_{i,j} |A × B|| ≤ (ε/2)|Vi × Vj|, ∀A ⊆ Vi, B ⊆ Vj.

Dividing by |V|², this is equivalent to

|E[(1G − E[1G|B]) 1_{A×B}]| ≤ ε|Vi × Vj|/(2|V|²).

Since |Vi × Vj| ≥ |V|²/(2k²), it suffices to show that |E[(1G − E[1G|B]) 1_{A×B}]| ≤ ε/(4k²). By our choice of F and inequality (2.5), we have

|E[(1G − E[1G|B′]) 1_{A×B}]| ≤ (ε/8)(k0 + 2^{2M}/α)^{−2} ≤ ε/(8k²),

so by the triangle inequality it is sufficient to prove that

|E[(E[1G|B′] − E[1G|B]) 1_{A×B}]| ≤ ε/(8k²).

Because ‖1_{A×B}‖²L2 ≤ ‖1_{Vi×Vj}‖²L2 ≤ 1/k², by Cauchy-Schwarz it suffices to show that

E[(E[1G|B′] − E[1G|B])² 1_{Vi×Vj}] ≤ ε²/(64k²).

This last inequality must be satisfied by all but at most εk² pairs (Vi, Vj): otherwise we would have

E[(E[1G|B′] − E[1G|B])²] > εk² · ε²/(64k²) = ε³/64 = α²,

contradicting the fact that ‖E[1G|B′] − E[1G|B]‖L2 ≤ α and finishing the proof.


Repeating the same proof but starting with a non-trivial partition of the vertex set into at most k0 parts, we can make sure the final regular partition of the graph refines a given partition, which is important in some applications of the Regularity Lemma (especially when dealing with partite graphs).

Also, by applying this theorem with a somewhat smaller value of ε and then equitably redistributing the vertices of the exceptional set V0 among the other parts, we can easily obtain a (nearly) equitable partition of V without an exceptional class which satisfies the same regularity conditions.

We present these simple remarks as a corollary below:

Corollary 2.1. For every ε > 0 and k0 ≥ 1, there exists an integer K0 such that the following holds. For every graph G = (V, E) and every equitable partition P0 of V into at most k0 parts, there exists a partition P : V = V1 ∪ V2 ∪ · · · ∪ Vk which refines P0 and satisfies the following properties:

• k0 ≤ k ≤ K0

• ||Vi| − |Vj|| ≤ 1 for all i, j ∈ [k]

• all but at most εk² of the pairs (Vi, Vj) are ε-regular

Observe that our proof of the Szemerédi Regularity Lemma relied on the decomposition 1G = fstr + fpsd + ferr given by the Strong Structure Theorem (Theorem 2.2). As noted in the beginning of Chapter 1, the structured component fstr = E[1G|B] gave us our regular partition of the vertex set and the edge densities d_{i,j} between its classes, while the pseudorandom component fpsd = 1G − E[1G|B′] gave us the random-like distribution of the actual edges of G inside pairs of classes of a finer partition given by B′.

Here we also have the error term ferr, which essentially measures the difference in edge densities between pairs of the regular partition we constructed and those of the finer partition given by B′, and it is this term which is responsible for the (possible) existence of up to εk² irregular pairs. It is then a natural question whether the presence of these irregular pairs is truly necessary or whether it can be eliminated.

It turns out that these irregular pairs are necessary: for a given n ∈ N, let us define the half graph on 2n vertices as the bipartite graph Hn with vertex classes A = {a1, · · · , an} and B = {b1, · · · , bn}, in which a_i b_j is an edge if and only if i ≤ j. As was noted in [3], the half graphs give us an infinite family of graphs for which every ε-regular partition of their vertex sets into k parts must contain many irregular pairs (at least ck, for some c > 0 depending only on ε).
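For concreteness, the half graph is easy to generate explicitly; a small illustrative sketch (the function name is ours):

```python
# Edges of the half graph H_n: a_i b_j is an edge exactly when i <= j.
def half_graph(n):
    return {(('a', i), ('b', j))
            for i in range(1, n + 1)
            for j in range(1, n + 1) if i <= j}

E5 = half_graph(5)          # n(n + 1)/2 = 15 edges for n = 5
```

The nested "staircase" of neighborhoods produced here is exactly what forces irregular pairs in any regular partition.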

As the presence of irregular pairs comes from the error component ferr, this justifies the claim made at the beginning of this chapter that the existence of the error component is necessary if we wish to prove the Regularity Lemma.

Remark 2.5. As already noted above, because the function F chosen in the proof of the Regularity Lemma has exponential growth, the upper bound K0 we obtain for the number of parts in the regular partition is an exponential tower of height Θ(ε⁻²). By an ingenious probabilistic construction, Gowers [15] was able to prove a lower bound on K0 which is a tower of exponentials of height Θ(ε^{−1/48}). More recently, Fox and Miklós Lovász [10] were able to improve this bound and show that the upper bound given by our proof is in fact tight: there exist graphs for which any ε-regular partition must have at least a tower of exponentials of height Θ(ε⁻²) many parts.


Chapter 3

Counting subgraphs and the Graph Removal Lemma

The regularity lemmas we have seen may be viewed as approximation results stating that every graph G can be approximated by some structure of bounded complexity. This structure is essentially a "rounded" version of G, and may be identified with a weighted graph on the same vertex set as G whose weights are given by the edge densities between the classes of the regular partition. With this interpretation in mind, we will use the following notation:

Definition 3.1 (Rounded graph). If G = (V, E) is a graph and P : V = V1 ∪ · · · ∪ Vk is a partition of V, we denote by GP := E[1G | P ⊗ P] the function which maps each pair (x, y) ∈ V² to the edge density dG(Vi, Vj) between the classes Vi ∋ x and Vj ∋ y.
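In matrix terms, GP is obtained by averaging the adjacency matrix over the blocks Vi × Vj; the small sketch below is illustrative (function and variable names ours):

```python
# Build the rounded graph G_P of Definition 3.1: replace each entry of
# the adjacency matrix by the edge density between the classes
# containing its endpoints.
def rounded_graph(adj, parts):
    cls = {x: i for i, P in enumerate(parts) for x in P}
    dens = {}
    for i, P in enumerate(parts):
        for j, Q in enumerate(parts):
            e = sum(adj[x][y] for x in P for y in Q)
            dens[i, j] = e / (len(P) * len(Q))
    n = len(adj)
    return [[dens[cls[x], cls[y]] for y in range(n)] for x in range(n)]

# 4-cycle 0-1-2-3 with parts {0,2} and {1,3}: density 1 across, 0 inside.
adj = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
GP = rounded_graph(adj, [[0, 2], [1, 3]])
```

For this bipartition the rounding is exact between the classes: every cross entry of GP equals 1 and every within-class entry equals 0.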

The notion of approximation given by the regularity lemmas is related to indistinguishability by cuts. More precisely, following the discussion in Section 1.1, we say that G and GP are ε-indistinguishable by cuts if |E_{x,y∈V}[(1G(x, y) − GP(x, y)) 1_{A×B}(x, y)]| ≤ ε holds for all cuts A × B ⊆ V × V.

This suggests that we work with the following norm, which will greatly simplify the presentation and the proofs of some results to follow:

Definition 3.2 (Cut norm). Let V1, V2 be (not necessarily distinct) finite sets. Given any function f : V1 × V2 → R, we define the cut norm of f as

‖f‖_□ := max_{A⊆V1, B⊆V2} |E_{x∈V1, y∈V2}[f(x, y) 1_{A×B}(x, y)]|. (3.1)

It is easy to see that the cut norm is indeed a norm. Using this notation, our definition of ε-regularity of a graph G is equivalent to the inequality ‖1G − E[1G]‖_□ ≤ ε. Moreover, a partition P of V(G) is weak ε-regular for G if and only if ‖1G − GP‖_□ ≤ ε.

One can easily obtain the following equivalent expression for the cut norm, which will prove to be useful below:

‖f‖_□ = max_{a:V1→[0,1], b:V2→[0,1]} |E_{x∈V1, y∈V2}[f(x, y) a(x) b(y)]|. (3.2)

Indeed, since the expectation above is bilinear in a and b, the extrema occur when a and b are {0, 1}-valued, and so (3.1) and (3.2) are equivalent.
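Definition (3.1) can be evaluated by brute force on small examples (computing the cut norm exactly is NP-hard in general); an illustrative sketch with names of our choosing:

```python
from itertools import chain, combinations

def subsets(V):
    """All subsets of V, including the empty set."""
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

def cut_norm(f, V1, V2):
    """Brute-force cut norm (3.1): maximize |E[f 1_{A x B}]| over A, B."""
    N = len(V1) * len(V2)
    return max(abs(sum(f[x][y] for x in A for y in B)) / N
               for A in subsets(V1) for B in subsets(V2))

# f = 1_G - E[1_G] for the 4-cycle on {0, 1, 2, 3}, viewed on V x V.
V = [0, 1, 2, 3]
adj = {(i, j) for i in V for j in V if (i - j) % 4 in (1, 3)}
d = len(adj) / 16                    # overall edge density = 1/2
f = [[(1.0 if (i, j) in adj else 0.0) - d for j in V] for i in V]
norm = cut_norm(f, V, V)
```

For the 4-cycle the maximizing cut is given by the two sides of the bipartition, e.g. A = {0, 2}, B = {1, 3}, which yields cut norm 1/8.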

The counting lemmas proven in the next sections are standard in Graph Theory, and concern approximately counting copies of a fixed graph H inside a large graph G using only the information given by the rounded graph GP (we define a copy of H in G as a subgraph of G which is isomorphic to H).

3.1 Counting subgraphs globally

Instead of counting copies of a subgraph H in G, it will be more convenient (and essentially equivalent when G is large) to count homomorphisms from H to G.

Definition 3.3 (Homomorphism). Given two graphs G and H, a homomorphism from H to G is a map ϕ : V(H) → V(G) which preserves adjacency between vertices:

∀x, y ∈ V(H), {x, y} ∈ E(H) ⇒ {ϕ(x), ϕ(y)} ∈ E(G).


We denote the number of homomorphisms from H to G by hom(H,G).

Denoting by n the number of vertices of G and by h the number of vertices of H, we note that hom(H, G) differs from the number of labeled copies of H in G by O(n^{h−1}), which becomes negligible when compared to n^h as n grows large; this justifies our claim that counting copies of H in a large graph G is essentially equivalent to counting homomorphisms from H to G.
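For small graphs, hom(H, G) can be computed by brute force over all maps V(H) → V(G); an illustrative sketch (names ours):

```python
from itertools import combinations, product

# hom(H, G): count maps V(H) -> V(G) sending every edge of H to an edge of G.
# A graph is a pair (vertex list, collection of edges).
def hom(H, G):
    VH, EH = H
    VG, EG = G
    return sum(
        all(frozenset((m[x], m[y])) in EG for x, y in EH)
        for phi in product(VG, repeat=len(VH))
        for m in [dict(zip(VH, phi))]
    )

K3 = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
K4 = (list(range(4)), {frozenset(e) for e in combinations(range(4), 2)})
# hom(K3, K4) = 4 * 3 * 2 = 24: exactly the injective maps, since K4
# is complete and has no loops.
```

Note that non-injective maps are rejected automatically here, because a repeated endpoint yields a one-element frozenset, which is never an edge.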

We then have the following lemma, which roughly says that we can count copies of H inside G up to an o(n^h) additive error by only knowing a weak o(1)-regular partition of G:

Lemma 3.1 (Global counting lemma). Let H be a graph with vertex set V(H) = [h] and let ε > 0 be a positive number. Then for any graph G = (V, E) with |V| = n and any partition P = (Vi)_{i∈[k]} of V which is weak ε-regular for G, we have

hom(H, G) = Σ_{φ:[h]→[k]} ∏_{ij∈E(H)} dG(V_{φ(i)}, V_{φ(j)}) ∏_{i∈[h]} |V_{φ(i)}| ± ε|E(H)| n^h,

where the sum is over all functions from [h] to [k].

Proof. By definition, for any x1 ∈ V_{φ(1)}, x2 ∈ V_{φ(2)}, · · · , xh ∈ V_{φ(h)} and any i, j ∈ [h], we have GP(xi, xj) = dG(V_{φ(i)}, V_{φ(j)}). It follows that

∏_{ij∈E(H)} dG(V_{φ(i)}, V_{φ(j)}) ∏_{i∈[h]} 1_{xi∈V_{φ(i)}} = ∏_{ij∈E(H)} GP(xi, xj) ∏_{i∈[h]} 1_{xi∈V_{φ(i)}}. (3.3)

We note also that

∏_{i∈[h]} |V_{φ(i)}| = Σ_{x1,··· ,xh∈V} ∏_{i∈[h]} 1_{xi∈V_{φ(i)}}. (3.4)

Using equations (3.3) and (3.4), we obtain

Σ_{φ:[h]→[k]} ∏_{ij∈E(H)} dG(V_{φ(i)}, V_{φ(j)}) ∏_{i∈[h]} |V_{φ(i)}|
  = Σ_{φ:[h]→[k]} ∏_{ij∈E(H)} dG(V_{φ(i)}, V_{φ(j)}) Σ_{x1,··· ,xh∈V} ∏_{i∈[h]} 1_{xi∈V_{φ(i)}}
  = Σ_{φ:[h]→[k]} Σ_{x1,··· ,xh∈V} ∏_{ij∈E(H)} GP(xi, xj) ∏_{i∈[h]} 1_{xi∈V_{φ(i)}}
  = Σ_{x1,··· ,xh∈V} Σ_{φ:[h]→[k]} ∏_{i∈[h]} 1_{xi∈V_{φ(i)}} ∏_{ij∈E(H)} GP(xi, xj)
  = Σ_{x1,··· ,xh∈V} ∏_{ij∈E(H)} GP(xi, xj).

It then suffices to prove the inequality

|E_{x1,··· ,xh∈V}[∏_{ij∈E(H)} 1G(xi, xj) − ∏_{ij∈E(H)} GP(xi, xj)]| ≤ ε|E(H)|.

We will prove this by a simple telescoping sum argument, which was given in [5]. Let us arbitrarily label the edges of H by e1, e2, · · · , e_{|E(H)|}, and for an edge e = {i, j} write x_e := (xi, xj); we then have the identity

∏_{ij∈E(H)} 1G(xi, xj) − ∏_{ij∈E(H)} GP(xi, xj) = Σ_{t=1}^{|E(H)|} (∏_{r=1}^{t−1} GP(x_{e_r})) (1G(x_{e_t}) − GP(x_{e_t})) ∏_{s=t+1}^{|E(H)|} 1G(x_{e_s}).


Suppose for notational convenience that e_t = {1, 2}; then for any fixed x3, · · · , xh ∈ V we have

|E_{x1,x2∈V}[(∏_{r=1}^{t−1} GP(x_{e_r})) (1G(x1, x2) − GP(x1, x2)) ∏_{s=t+1}^{|E(H)|} 1G(x_{e_s})]|
  ≤ |E_{x1,x2∈V}[(1G(x1, x2) − GP(x1, x2)) a_t(x1) b_t(x2)]|,

where a_t and b_t are the functions given by

a_t(x1) := ∏_{r<t: 1∈e_r} GP(x_{e_r}) ∏_{s>t: 1∈e_s} 1G(x_{e_s})   and   b_t(x2) := ∏_{r<t: 2∈e_r} GP(x_{e_r}) ∏_{s>t: 2∈e_s} 1G(x_{e_s}).

By hypothesis we have ‖1G − GP‖_□ ≤ ε, so by equation (3.2) the expression on the right is at most ε for all fixed x3, · · · , xh.

Applying this same reasoning for all edges {i, j} ∈ E(H) and using the triangle inequality, we obtain

|E_{x1,··· ,xh∈V}[∏_{ij∈E(H)} 1G(xi, xj) − ∏_{ij∈E(H)} GP(xi, xj)]|
  ≤ Σ_{t=1}^{|E(H)|} |E_{x1,··· ,xh∈V}[(∏_{r=1}^{t−1} GP(x_{e_r})) (1G(x_{e_t}) − GP(x_{e_t})) ∏_{s=t+1}^{|E(H)|} 1G(x_{e_s})]|
  ≤ ε|E(H)|.

Using this lemma and a new efficient algorithm for finding a weak regular partition of a graph, Fox, Miklós Lovász and Zhao [11] recently obtained a deterministic algorithm running in time O(ε^{−O_H(1)} n²) which, for any given graph G on n vertices, finds the number of copies of H in G up to an error of at most εn^h.

Note that the bounds obtained from the Weak Regularity Lemma (Theorem 2.1) are exponential in 1/ε, while the running time of the algorithm just mentioned is polynomial in 1/ε (for any fixed graph H). This is possible because the weak ε-regular partition obtained in the proof of Theorem 2.1 is actually generated by only 2/ε² sets, and it is possible to use these generating sets in the algorithm for counting copies of H instead of the partition they induce on V.

3.2 Counting subgraphs locally and the Removal Lemma

The Global Counting Lemma proven in the last section allows us to count the total number of copies of a given graph H inside a large graph G up to an additive error which is small when compared to |V(G)|^{|V(H)|}, the number of maps from V(H) to V(G).

However, due to the global nature of this counting, this lemma is unsuitable for the purpose of counting copies of H inside a small subset of V(G), which will be needed in the proof of the Graph Removal Lemma. For this we need a somewhat stronger counting lemma, which counts the copies of H locally instead of globally.

The next definition makes this idea of “local counting” more precise.

Definition 3.4 (Canonical homomorphisms). Let H be a graph on [h] and let V1, · · · , Vh be (not necessarily distinct) vertex sets. If G is a graph on V1 ∪ · · · ∪ Vh, then a homomorphism ϕ from H to G is said to be canonical if it maps each i ∈ [h] to a vertex in the corresponding set Vi. We then denote by hom*(H, G) the number of canonical homomorphisms from H to G.

The corresponding counting lemma for canonical homomorphisms is the following:

Lemma 3.2 (Local Counting Lemma). Let H be a graph with vertex set V(H) = [h], and let V1, · · · , Vh be (not necessarily distinct) vertex sets. If G is a graph on V1 ∪ · · · ∪ Vh for which the pair (Vi, Vj) is ε-regular whenever ij ∈ E(H), then the number hom*(H, G) of canonical homomorphisms of H in G satisfies

|hom*(H, G) − ∏_{ij∈E(H)} dG(Vi, Vj) ∏_{i∈[h]} |Vi|| ≤ ε|E(H)| ∏_{i∈[h]} |Vi|.

The proof of Lemma 3.2 is very simple and proceeds by the same telescoping sum argument as that used in the proof of Lemma 3.1, but now using the inequality

|E_{xi∈Vi, xj∈Vj}[(1G(xi, xj) − dG(Vi, Vj)) a(xi) b(xj)]| ≤ ε,

valid for all {i, j} ∈ E(H) and all functions a : Vi → [0, 1], b : Vj → [0, 1].
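Canonical homomorphisms can likewise be counted by brute force on small instances; an illustrative sketch (names ours):

```python
from itertools import product

# hom*(H, G): count maps phi with phi(i) in V_i that send every edge of
# H (on vertex set [h]) to an edge of G.
def hom_star(H_edges, parts, G_edges):
    return sum(
        all(frozenset((phi[i - 1], phi[j - 1])) in G_edges for i, j in H_edges)
        for phi in product(*parts)
    )

# Complete bipartite graph between V1 = {a1, a2} and V2 = {b1, b2}.
V1, V2 = ['a1', 'a2'], ['b1', 'b2']
G_edges = {frozenset((a, b)) for a in V1 for b in V2}
# With H a single edge {1, 2}, every choice of (x1, x2) works: hom* = 4.
```

A triangle on the parts (V1, V2, V1) gets canonical count 0 here, since the edge {1, 3} would need an edge inside V1.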

Note the asymmetry in the conditions required by the two lemmas above: while in the Global Counting Lemma we only need to know a weak ε-regular partition of the graph G, in the Local Counting Lemma we require all pairs (Vi, Vj) with ij ∈ E(H) to be ε-regular, which is a much stronger condition.

The reason for this asymmetry is that the Global Counting Lemma only estimates the average of hom*_φ(H, G) over all mappings φ : [h] → [k] (where hom*_φ(H, G) denotes the number of homomorphisms where each i ∈ [h] is mapped to a vertex xi ∈ V_{φ(i)}), while the Local Counting Lemma amounts to estimating a single one of these values.

Using Lemma 3.2, we may now state and prove the Graph Removal Lemma:

Theorem 3.1 (Graph Removal Lemma [9]). For every graph H and every ε > 0, there exist n0 ∈ N and δ > 0 such that the following holds. Any graph G on n ≥ n0 vertices which contains at most δn^{|V(H)|} copies of H may be made H-free by removing at most εn² edges.

Before proving Theorem 3.1, it is important to make an observation. This theorem may be informally thought of as saying that, if a given large graph G contains "few" copies of a fixed graph H, then we can destroy all these copies of H by removing "few" edges from G.

However, while this is indeed a valid interpretation of the result, it hides the crucial fact that the two occurrences of the word "few" are actually very different in nature. Indeed, if G has n vertices, then "few" copies of H means a small constant times n^{|V(H)|}, while "few" edges means a small constant times n². It is not at all clear why, say, 2n^{7/2} copies of the complete graph K4 on four vertices cannot be fitted inside G in such a way as to obtain at least n²/1000 edge-disjoint copies of K4.

Proof. Let h be the number of vertices of H, and define γ := min{ε/4, ε^{|E(H)|}/(2|E(H)|)}. We will prove the theorem for δ := ε^{|E(H)|}(4h!)^{−1}(2K(γ))^{−h} and n0 := δ^{−1}, where K(γ) is the bound given by the Szemerédi Regularity Lemma (Theorem 2.3) for error parameter γ and lower bound k0 = 1.

Suppose that G has at most δn^h copies of H. Applying the Szemerédi Regularity Lemma to G with error parameter γ and lower bound k0 = 1, we obtain a γ-regular partition V = V0 ∪ V1 ∪ · · · ∪ Vk into k + 1 ≤ K(γ) + 1 parts.

We now construct a subgraph G′ of G by deleting the following edges:

– Edges incident to a vertex in the exceptional set V0;

– Edges between pairs (Vi, Vj) which are not γ-regular;

– Edges between pairs (Vi, Vj) with edge-density at most ε.

The number of edges deleted is then at most

γn · n + γk² · (n/k)² + ε(n choose 2) ≤ εn².

If G′ does not contain a copy of H we are done; suppose for contradiction that G′ contains a copy of H, and fix such a copy.

For every i ∈ [h], let φ(i) ∈ [k] be the index of the set V_{φ(i)} which contains the i-th vertex of this copy of H in G′; applying the Local Counting Lemma (Lemma 3.2) to the graph G′ and


vertex sets V_{φ(1)}, · · · , V_{φ(h)}, we obtain the existence of at least

(∏_{ij∈E(H)} dG′(V_{φ(i)}, V_{φ(j)}) − γ|E(H)|) ∏_{i∈[h]} |V_{φ(i)}| ≥ (ε^{|E(H)|} − γ|E(H)|) ((1 − γ)n/k)^h ≥ (ε^{|E(H)|}/2) (n/(2K(γ)))^h = 2h!δn^h

canonical homomorphisms of H in G′, and hence at least 2h!δn^h − O(n^{h−1}) labeled copies of H in G′, which is more than h!δn^h since n ≥ n0 (enlarging n0 if necessary). This contradicts the fact that G has at most δn^h copies of H, completing the proof.

3.3 Application to property testing

As a simple application of Theorem 3.1, we will use it to derive an interesting result in the area of property testing, namely that the property of being H-free is testable for any graph H. Let us first briefly recall some definitions given in Section 1.3.

Given a graph property P and a constant ε > 0, we say that a graph G on n vertices is ε-far from satisfying P if one needs to add and/or delete at least εn² edges of G in order to turn it into a graph that satisfies P. An ε-test for P is a randomized algorithm which, by making only O_ε(1) edge queries, can distinguish with probability at least 2/3 between graphs satisfying P and graphs that are ε-far from satisfying P. Finally, a graph property P is testable if, for every ε > 0, there exists an ε-test for P.

For a given graph H, we say that a graph G satisfies the property of being H-free if it has no subgraph isomorphic to H. Let us now use Theorem 3.1 to show that the property of being H-free is testable for any fixed graph H.

Let δ = δ(ε) be the constant obtained from Theorem 3.1 applied to the given graph H and parameter ε > 0. Our ε-test for H-freeness then proceeds as follows: take k := ⌈2/δ⌉ sets of h := |V(H)| vertices each from G, chosen uniformly and independently at random, and declare G to be H-free if and only if none of them contains a copy of H.

If G is indeed H-free, it is clear that we will not find a copy of H this way, so the algorithm gives the correct answer with probability 1. If G is ε-far from being H-free, then by Theorem 3.1 there exist at least δn^h copies of H in G. Since each set of h vertices of G can contain at most h! copies of H, it follows that the probability of finding a copy of H inside a uniformly chosen set of h vertices of G is at least

(δn^h/h!) / (n choose h) ≥ δ.

The probability that the algorithm errs in this case (that is, fails to find a copy of H) is then at most (1 − δ)^k ≤ e^{−δk} ≤ e^{−2} < 1/3, proving that this algorithm is indeed an ε-test.

Using a stronger variant of the Regularity Lemma which is very close in spirit to our Strong Structure Theorem (when specialized to the graph setting), Alon, Fischer, Krivelevich and M. Szegedy [2] were able to prove the natural generalization of the Graph Removal Lemma for induced subgraphs (where we are now allowed to add and/or remove up to εn² edges in order to destroy all induced copies of H in G), and thus that being induced H-free is also testable. They in fact obtained a much stronger result in property testing, as we will briefly discuss in the introductory part of the next chapter.


Chapter 4

Extensions of graph regularity

There have been various extensions of Szemerédi's Regularity Lemma for graphs (see [27] for some of them), which involve many of the same principles and ideas as the original Regularity Lemma but are each tailored to a specific application.

We have already seen Frieze and Kannan's Weak Regularity Lemma (Theorem 2.1), which has a weaker notion of regularity but provides much better bounds on the size of the regular partitions, and may thus be used for algorithmic purposes (see, for instance, [12] and [11]).

In the opposite direction, there is the Strong Regularity Lemma of Alon, Fischer, Krivelevich and M. Szegedy [2] mentioned at the end of the last chapter, which provides partitions with stronger regularity properties at the cost of further worsening the upper bound we get for the order of these partitions. Roughly speaking, given a constant ν > 0 and any decreasing function ε : N → (0, 1], this lemma provides two equitable partitions P ⊂ Q of the graph such that P is ν-regular, Q is ε(|P|)-regular, and P is ν-close to Q in a sense related to the edge densities between the classes inside each partition.

This lemma was originally obtained in order to prove a far-reaching generalization of the Induced Graph Removal Lemma for a family of colored graphs, and with this result (and some additional ideas) prove that every first-order graph property not containing a quantifier alternation of the form ∀∃ is testable (in the sense described in Section 1.3).

It is not hard to prove the Strong Regularity Lemma by iterating our Strong Structure Theorem or the Szemerédi Regularity Lemma (we will essentially do so in our proof of Theorem 4.1), but its statement is somewhat technical and we refrain from giving it in order not to make our presentation too repetitive. We only remark that, even in the tame case where the function ε(k) is a polynomial in 1/k, the bound we get for the size of the partitions P and Q is a wowzer-type function (one level higher in the Ackermann hierarchy than the exponential tower function) in a power of 1/ν. Moreover, this cannot be improved (see [6]).

In this chapter we will focus on two variants of the Regularity Lemma which have a somewhatdistinct flavor.

4.1 Regular approximation

As noted in Remark 2.5, the number of parts in an ε-regular partition of an arbitrary graph cannot be guaranteed to be smaller than a tower of exponentials of height Θ(ε⁻²); even a weak ε-regular partition can only be guaranteed to have size at most 2^{Θ(ε⁻²)} [6], which is still a reasonably large function of ε.

Despite these results, the next theorem allows us to have arbitrarily good control on the size of the regular partition in terms of its regularity parameter, as long as we are allowed to modify a small fraction of the edges of the graph. It was first obtained by Rödl and Schacht [26] in a more general form (stated for uniform hypergraphs), and is a byproduct of the hypergraph generalization of the Regularity Lemma, which we will see in the next chapter.

Theorem 4.1 (Regular Approximation Lemma [26]). For every ν > 0 and every function ε : N → (0, 1] there exist integers n0 and K0 such that the following holds. For every graph G = (V, E) on |V| = n ≥ n0 vertices, there exists an equitable partition V = V1 ∪ V2 ∪ · · · ∪ Vk into k ≤ K0 parts and a graph H = (V, E′) on the same vertex set as G such that:

• |E △ E′| ≤ νn²

• In the graph H, every pair (Vi, Vj) is ε(k)-regular


22 Chapter 4. Extensions of graph regularity

Proof. We will apply an energy-increment argument similar to that used in the proof of Theorem 2.2, the main difference being that we now iterate the Szemerédi Regularity Lemma (in the form in which it appears in Corollary 2.1) instead of Lemma 2.3.

Start with the trivial partition P_0 := {V} and k_0 := |P_0| = 1. For each i ≥ 0, having P_i and k_i already known, apply Corollary 2.1 to the graph G with initial partition P_i and error parameter ε(k_i)/4. We then obtain an equitable partition P_{i+1} of size k_{i+1} = O_{k_i,ε(k_i)}(1) which refines P_i and such that all but at most ε(k_i)k²_{i+1}/4 pairs of classes in P_{i+1} are ε(k_i)/4-regular for G.

Since each P_{i+1} refines P_i, if we denote by P_i ⊗ P_i the product σ-algebra on V × V generated by the products of classes in P_i, we see by Pythagoras' theorem that ‖E[1_G | P_i ⊗ P_i]‖²_{L²} is increasing and bounded between 0 and 1. By the pigeonhole principle, there must exist a value of j ≤ 9/ν² such that

‖E[1_G | P_{j+1} ⊗ P_{j+1}]‖²_{L²} ≤ ‖E[1_G | P_j ⊗ P_j]‖²_{L²} + ν²/9.   (4.1)

For such a value of j, let us denote P := P_j and Q := P_{j+1}, and note that k := |P_j| is bounded by a function depending only on ε(·) and ν. By Pythagoras' theorem and equation (4.1), we conclude that

‖E[1_G | P ⊗ P] − E[1_G | Q ⊗ Q]‖_{L²} ≤ ν/3.

We relabel the classes of the partitions P and Q in such a way that

P = (V_i)_{i∈[k]},  Q = (V_{i,r})_{i∈[k],r∈[m]},  and  V_i = ⋃_{r∈[m]} V_{i,r} for all i ∈ [k].

Since all refining partitions P_i in our argument are required to be equitable and have bounded order, this can always be done if |V| = n is sufficiently large.
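The monotonicity of the energy just invoked can be checked numerically. The following Python sketch (an illustration with made-up data, not part of the proof) computes ‖E[1_G | P ⊗ P]‖²_{L²} for a random adjacency matrix, for a partition and for a refinement of it, and verifies that refining does not decrease the energy.

```python
import itertools
import random

# Energy-increment illustration: refining a partition can only increase
# ||E[1_G | P⊗P]||_{L2}^2, which always stays between 0 and 1.
random.seed(0)
n = 12
adj = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]

def energy(partition):
    """Mean square of the block averages of adj over the product partition."""
    total = 0.0
    for P, Q in itertools.product(partition, repeat=2):
        block = [adj[x][y] for x in P for y in Q]
        avg = sum(block) / len(block)
        total += avg * avg * len(block)  # each cell of the block contributes avg^2
    return total / (n * n)

coarse = [list(range(0, 6)), list(range(6, 12))]
fine = [list(range(0, 3)), list(range(3, 6)),
        list(range(6, 9)), list(range(9, 12))]

e_coarse, e_fine = energy(coarse), energy(fine)
assert 0.0 <= e_coarse <= e_fine <= 1.0
```

The inequality e_coarse ≤ e_fine is exactly Pythagoras' theorem for the conditional expectation onto the two product σ-algebras.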

Now, for every pair of classes (V_{i,r}, V_{j,s}) with i, j ∈ [k], r, s ∈ [m], we add or delete edges randomly so as to change the (expected) density of this pair to d_G(V_i, V_j). Note that the expected number of changed edges in each pair (V_{i,r}, V_{j,s}) is

|d_G(V_i, V_j) − d_G(V_{i,r}, V_{j,s})| · |V_{i,r}||V_{j,s}|.   (4.2)

Because for every (x, y) ∈ V_{i,r} × V_{j,s} we have E[1_G | Q ⊗ Q](x, y) = d_G(V_{i,r}, V_{j,s}) and E[1_G | P ⊗ P](x, y) = d_G(V_i, V_j), it follows from (4.2) that the expected total number of edges changed in G is

∑_{i,j∈[k]} ∑_{r,s∈[m]} |d_G(V_i, V_j) − d_G(V_{i,r}, V_{j,s})| · |V_{i,r}||V_{j,s}|
  = n² E_{x,y∈V}[ |E[1_G | P ⊗ P](x, y) − E[1_G | Q ⊗ Q](x, y)| ]
  = n² ‖E[1_G | P ⊗ P] − E[1_G | Q ⊗ Q]‖_{L¹}
  ≤ νn²/3.

By concentration inequalities, this value will be less than νn²/2 with high probability. Moreover, it follows from the Chernoff bound (see the proof of Lemma 7.2 in Chapter 7) that, after these changes, with high probability all pairs (V_{i,r}, V_{j,s}) which were ε(k)/4-regular in G will be ε(k)/2-regular and have density d_G(V_i, V_j) ± ε(k)/4.

By definition, at most (ε(k)/4)k²m² pairs (V_{i,r}, V_{j,s}) are not ε(k)/4-regular in G. For each of them, we substitute the graph G[V_{i,r}, V_{j,s}] by a random graph on V_{i,r} × V_{j,s} with expected density d_G(V_i, V_j). Then with high probability we change a further at most

2 · (ε(k)k²m²/4) · (2n²/(k²m²)) ≤ νn²/2

edges, and all of these pairs (V_{i,r}, V_{j,s}) will also be ε(k)/2-regular and have density d_G(V_i, V_j) ± ε(k)/4 in the modified graph.
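The randomization step above can be sketched in code. The following Python fragment (illustrative only; the class sizes and target density are made up) redraws the bipartite graph between two classes as a random graph with a prescribed expected density, and checks that the realized density concentrates around the target, as the Chernoff bound predicts.

```python
import random

# Redraw each potential edge between two classes independently so that
# the expected density becomes the target value (standing in for d_G(V_i, V_j)).
random.seed(1)
V_ir, V_js = range(50), range(50, 100)
target = 0.3  # hypothetical target density

edges = set()
for x in V_ir:
    for y in V_js:
        if random.random() < target:
            edges.add((x, y))

density = len(edges) / (len(V_ir) * len(V_js))
# By the Chernoff bound, the realized density concentrates around the target.
assert abs(density - target) < 0.05
```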


Denote by H the graph obtained after all these modifications, and note that

|E(G) △ E(H)| ≤ νn².

Consider now a pair of indices i, j ∈ [k]. For all r, s ∈ [m], we know that (V_{i,r}, V_{j,s}) is ε(k)/2-regular in H and has density d_H(V_{i,r}, V_{j,s}) = d_G(V_i, V_j) ± ε(k)/4. Thus, for any sets A ⊆ V_i, B ⊆ V_j we have

e_H(A, B) = ∑_{r,s∈[m]} ( d_H(V_{i,r}, V_{j,s}) |A ∩ V_{i,r}| |B ∩ V_{j,s}| ± (ε(k)/2) |V_{i,r}||V_{j,s}| )
  = ∑_{r,s∈[m]} ( (d_G(V_i, V_j) ± ε(k)/4) |A ∩ V_{i,r}| |B ∩ V_{j,s}| ± (ε(k)/2) |V_{i,r}||V_{j,s}| )
  = d_G(V_i, V_j) |A||B| ± (3ε(k)/4) |V_i||V_j|
  = d_H(V_i, V_j) |A||B| ± ε(k) |V_i||V_j|,

showing that (V_i, V_j) is ε(k)-regular in H and completing the proof.

4.2 Relative regularity

In some applications we are dealing with spanning subgraphs of a fixed graph G (which may be easier for us to analyze), and we wish to obtain some regularity result for these subgraphs relative to the host graph G.

One important example of this is when we are dealing with subgraphs of the random graph G(n, p), with p = p(n) tending to zero as n grows, and we wish to obtain results valid with high probability for all spanning subgraphs of G(n, p) (see [8] for many such results).

A closely related situation is that of subgraphs of sparse pseudorandom graphs (see [7]). In this case, instead of having a random model for graphs and obtaining results valid with high probability over the random choices made, we have a fixed (very sparse) graph G which exhibits random-like behavior in the distribution of its edges, and we wish to use this behavior to extend results from the usual "dense" setting to all spanning subgraphs of G (this philosophy will be revisited in Chapters 6 and 7).

In this section we will seek the most general conditions on the host graph G which allow us to use our framework to prove "relative regularity" results for its spanning subgraphs. This is done so that we may use the properties we know the graph G satisfies in order to obtain similar properties for its subgraphs.

The precise notion of relative regularity we will use is defined below:

Definition 4.1 ((ε, H, G)-regularity). Let G be a graph on V and H be a spanning subgraph of G. We say a pair (U, W) of subsets of V is (ε, H, G)-regular if

| e_H(A, B)/e_G(A, B) − e_H(U, W)/e_G(U, W) | ≤ ε   ∀A ⊆ U, B ⊆ W : |A × B| ≥ ε|U × W|,

where we define e_H(A, B)/e_G(A, B) := 0 when e_G(A, B) = 0.
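On very small examples this definition can be checked directly by exhausting over all admissible pairs (A, B). The following Python sketch (our own illustration; all helper names are hypothetical) does this for a pair of three-element classes, where H is the complete bipartite host G minus one edge.

```python
from itertools import combinations

def e(graph, A, B):
    """Number of graph edges between A and B (graph is a set of ordered pairs)."""
    return sum(1 for a in A for b in B if (a, b) in graph)

def ratio(H, G, A, B):
    """Relative density e_H/e_G, with the convention 0 when e_G(A, B) = 0."""
    eg = e(G, A, B)
    return e(H, A, B) / eg if eg > 0 else 0.0

def is_relatively_regular(eps, H, G, U, W):
    """Brute-force check of (eps, H, G)-regularity of the pair (U, W)."""
    base = ratio(H, G, U, W)
    for i in range(1, len(U) + 1):
        for j in range(1, len(W) + 1):
            if i * j < eps * len(U) * len(W):
                continue  # |A×B| below the threshold eps|U×W|
            for A in combinations(U, i):
                for B in combinations(W, j):
                    if abs(ratio(H, G, A, B) - base) > eps:
                        return False
    return True

U, W = [0, 1, 2], [3, 4, 5]
G = {(u, w) for u in U for w in W}   # complete bipartite host graph
H = G - {(0, 3)}                     # spanning subgraph missing one edge
assert is_relatively_regular(0.5, H, G, U, W)
```

Deleting a single edge barely disturbs the relative densities, so the pair stays (1/2, H, G)-regular; a subgraph concentrating all its edges on one vertex would fail the check.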

In our analysis we will only consider the case where G is bipartite, but it is easy to extend this analysis to the non-partite case by using the same arguments that we used in the proofs of Theorems 2.1 and 2.3.

Let G = (V1 ∪ V2, E(G)) be a bipartite graph; we wish to see under which conditions we are able to obtain regularity-like results for a subgraph H ⊆ G relative to G. Let P_G be the probability distribution on V1 × V2 given by P_G(x, y) := 1_G(x, y)/|E(G)|, and denote the corresponding L² norm by ‖·‖_{L²(G)}.

Applying the Strong Structure Theorem to 1_H in this space, with a small error parameter α, an increasing function F, and structured set

S := {σ(A × V2, V1 × B) : A ⊆ V1, B ⊆ V2},


we obtain a factor B ⊂ 2^{V1×V2} of S-complexity at most M = O_{F,α}(1) and a decomposition 1_H = f_str + f_psd + f_err, where f_str = E_G[1_H | B], f_psd is 1/F(M)-pseudorandom and ‖f_err‖_{L²(G)} ≤ α.

The factor B is formed by the join of at most M factors of the form σ(A_i × V2, V1 × B_i), with A_i ⊆ V1 and B_i ⊆ V2. If V1 = U_1 ∪ U_2 ∪ · · · ∪ U_K is the partition of V1 induced by the sets A_i and V2 = W_1 ∪ W_2 ∪ · · · ∪ W_L is the partition of V2 induced by the sets B_i, then we know their orders K, L are at most 2^M and every atom of B is of the form U_r × W_s.

We may refine these partitions in order to obtain equitable partitions

V1 = V_{1,0} ∪ V_{1,1} ∪ · · · ∪ V_{1,k},  V2 = V_{2,0} ∪ V_{2,1} ∪ · · · ∪ V_{2,k}

into k := ⌈2^M/α⌉ sets of equal size plus the exceptional sets V_{1,0}, V_{2,0} satisfying |V_{1,0}| < α|V1|, |V_{2,0}| < α|V2|. We then wish to show that

| e_H(A, B)/e_G(A, B) − e_H(V_{1,i}, V_{2,j})/e_G(V_{1,i}, V_{2,j}) | ≤ ε

whenever A ⊆ V_{1,i}, B ⊆ V_{2,j} satisfy |A × B| ≥ ε|V_{1,i} × V_{2,j}|, for some i, j ∈ [k].

Because each V_{1,i} × V_{2,j} is contained within a single atom U_r × W_s of B (if i, j ∈ [k]) and

|V_{1,i} × V_{2,j}| ≥ (1 − α)²|V1 × V2|/k² ≥ |V1 × V2|/2k²,

by the triangle inequality it suffices to show that

| e_H(A, B)/e_G(A, B) − e_H(U_r, W_s)/e_G(U_r, W_s) | ≤ ε/2

holds whenever A × B ⊆ U_r × W_s ∈ B satisfies |A × B| ≥ ε|V1 × V2|/2k². Now we note that, for every (x, y) ∈ U_r × W_s, we have

E_G[1_H | B](x, y) = E_G[1_H 1_{U_r×W_s}] / P_G(U_r × W_s) = e_H(U_r, W_s)/e_G(U_r, W_s).

This implies that, whenever A × B ⊆ U_r × W_s, we have

E_G[(1_H − f_str) 1_{A×B}] = ( e_H(A, B)/e_G(A, B) − e_H(U_r, W_s)/e_G(U_r, W_s) ) · e_G(A, B)/|E(G)|.

As 1_H − f_str = f_psd + f_err, by the triangle inequality it then suffices to show that

|E_G[f_psd 1_{A×B}]| ≤ (ε/4) · e_G(A, B)/|E(G)|  and  |E_G[f_err 1_{A×B}]| ≤ (ε/4) · e_G(A, B)/|E(G)|   (4.3)

whenever |A × B| ≥ ε|V1 × V2|/2k² and A × B is contained inside a single product set V_{1,i} × V_{2,j}.

For the first inequality, we note that

|E_G[f_psd 1_{A×B}]| ≤ ‖E_G[f_psd | σ(A × V2, V1 × B)]‖_{L²(G)} ≤ 1/F(M).   (4.4)

For the second inequality, we apply Cauchy-Schwarz and obtain

|E_G[f_err 1_{A×B}]|² ≤ ‖f_err 1_{V_{1,i}×V_{2,j}}‖²_{L²(G)} · ‖1_{A×B}‖²_{L²(G)} = E_G[f_err² 1_{V_{1,i}×V_{2,j}}] · e_G(A, B)/|E(G)|,

so the second inequality of (4.3) would be satisfied if we could ascertain that

E_G[f_err² 1_{V_{1,i}×V_{2,j}}] ≤ (ε²/16) · min_{A×B ⊆ V1×V2 : |A×B| ≥ ε|V1×V2|/2k²} e_G(A, B)/|E(G)|.   (4.5)


Suppose then we have a lower bound γ > 0 on the relative density of product sets A × B of size greater than ε|V1 × V2|/2k², i.e.

d_G(A, B)/d_G(V1, V2) = ( e_G(A, B)/|A × B| ) / ( e_G(V1, V2)/|V1 × V2| ) ≥ γ,  ∀A ⊆ V1, B ⊆ V2 : |A × B| ≥ ε|V1 × V2|/2k².

Then, if we take α = ε²√(γ/32), equation (4.5) must be satisfied by all but at most εk² pairs (V_{1,i}, V_{2,j}); otherwise we would have

α² ≥ ‖f_err‖²_{L²(G)} = E_G[f_err²] > εk² · (ε²/16) · γ|A × B|/|V1 × V2| ≥ (ε⁴/32)γ = α².

This proves that the second inequality in (4.3) holds for all but at most εk² of the pairs (V_{1,i}, V_{2,j}). Likewise, if we take the function F(x) := ε⁻²α⁻²γ⁻¹2^{2x+5}, we have that

1/F(M) = (ε²γ/8) · ( α/(2 · 2^M) )² ≤ ε²γ/8k² ≤ (ε/4) · ( d_G(A, B)/d_G(V1, V2) ) · |A × B|/|V1 × V2| = (ε/4) · e_G(A, B)/|E(G)|

holds whenever |A × B| ≥ ε|V1 × V2|/2k². Together with equation (4.4), this proves the first inequality in (4.3).

It is a simple exercise to extend this argument to the multi-partite case by repeating it for each pair (with a smaller error parameter) and refining the partitions obtained. We then obtain the following relative regularity theorem.

Theorem 4.2 (Relative Regularity Lemma). For every ε, γ > 0 and k0, ℓ ≥ 1, there exist constants η > 0 and K0 ≥ k0 such that the following holds.

Let G = (V, E(G)) be a P0-partite graph on n vertices, where P0 : V = V1 ∪ · · · ∪ Vℓ is an equitable ℓ-partition of V, and suppose that

d_G(A, B)/d_G(V_i, V_j) ≥ γ   ∀A ⊆ V_i, B ⊆ V_j : |A × B| ≥ η|V_i × V_j|   (4.6)

is valid for every i, j ∈ [ℓ]. Then every spanning subgraph H ⊆ G admits an equitable partition P into k parts refining P0 which satisfies:

• k0 ≤ k ≤ K0

• All but at most εk² pairs of parts in P are (ε, H, G)-regular

The condition (4.6) given in the statement of the theorem amounts to saying that the graph G contains no reasonably large pair of vertex sets whose density is much smaller than the expected one. It may intuitively be thought of as having "no sparse spots" (apart from those given by the partition P0).

One of the simplest classes of graphs satisfying this condition is the class of η-uniform graphs, which satisfy a natural kind of pseudorandomness condition that takes into account the graph's edge density. Below we give its definition in the more general situation of partite graphs, as in [19]:

Definition 4.2 ((P0, η)-uniform graphs). Let a partition P0 = (V_i)_{i∈[ℓ]} of V be fixed. We write (A, B) ≺ P0 if either ℓ = 1 or A ⊂ V_i, B ⊂ V_j for some i ≠ j in [ℓ]. Given a constant η > 0, we then say a graph G = (V, E) of density p := 2|E|/|V|² is (P0, η)-uniform if

| e_G(A, B)/|A × B| − p | ≤ ηp   ∀(A, B) ≺ P0 : |A × B| ≥ η|V|².

If P0 is the trivial partition of V into a single part, we say simply that G is η-uniform.
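For intuition, a dense random graph is η-uniform with high probability. The following Python sketch (a heuristic spot-check over randomly sampled large subset pairs, not an exhaustive verification of the definition; all parameters are made up) illustrates this for the trivial partition.

```python
import random

# Spot-check η-uniformity of a dense random graph: the density of every
# sampled pair of large vertex subsets should stay within ηp of the
# global density p.
random.seed(2)
n, eta = 60, 0.25
V = list(range(n))
E = {(u, v) for u in V for v in V if u < v and random.random() < 0.5}
p = 2 * len(E) / n ** 2

def e_G(A, B):
    """Ordered pairs (a, b) with a ≠ b spanned by an edge of the graph."""
    return sum(1 for a in A for b in B
               if a != b and (min(a, b), max(a, b)) in E)

uniform = True
for _ in range(200):  # sample large pairs instead of checking all of them
    A = random.sample(V, n // 2)
    B = random.sample(V, n // 2)
    if abs(e_G(A, B) / (len(A) * len(B)) - p) > eta * p:
        uniform = False
assert uniform
```

An exhaustive check would range over all pairs with |A × B| ≥ η|V|², which is infeasible even for n = 60; the sampling only gives evidence, not a certificate.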

The Sparse Regularity Lemma of Kohayakawa and Rödl then follows as an immediate corollary of Theorem 4.2:

Corollary 4.1 (Sparse Regularity Lemma I [18, 19]). For every ε > 0 and k0 ≥ 1, there exist constants η > 0 and K0 ≥ k0 such that the following holds.


Suppose G is a (P0, η)-uniform graph, where P0 is an equitable partition of V(G) into at most k0 parts. Then every spanning subgraph H ⊆ G admits an equitable partition P into k parts refining P0 which satisfies:

• k0 ≤ k ≤ K0

• All but at most εk² pairs of parts in P are (ε, H, G)-regular

Another version of the Sparse Regularity Lemma considered in these same papers deals with upper-regular graphs, a condition which may roughly be described as having "no dense spots" and which is in some sense dual to the condition given by equation (4.6). We will show how to derive this version in Section 6.2.


Chapter 5

Hypergraph regularity

In this chapter we will extend the Regularity Lemma to uniform hypergraphs. This result, called the Hypergraph Regularity Lemma, was first obtained by Nagle, Rödl, Schacht and Skokan [23, 29, 28] and, independently, by Gowers [14].

The version we will present here is due to Tao [32], who proved it in order to obtain his result that the Gaussian primes contain arbitrarily shaped constellations [35], a version of the Green-Tao theorem for the Gaussian primes. The proof presented here is closely related to the one given by Tao, but adapted to our setting and using the methods already developed in earlier sections.

5.1 Intuition and definitions

Definition 5.1. Given an integer d ≥ 2, a d-uniform hypergraph is a pair H = (V, E) where V is a vertex set and E ⊆ (V choose d) is a collection of unordered d-tuples of vertices (which we will call edges).

The Hypergraph Regularity Lemma may be seen as a "higher-order" version of Szemerédi's Regularity Lemma for graphs (Theorem 2.3): while the graph version seeks to regularize the set of edges of a graph (which is of "second order", as a subset of the pairs of vertices) by partitioning its vertex set (which is of "first order"), the hypergraph regularity lemma seeks to regularize the d-th order set of edges of a d-uniform hypergraph by (d − 1)-th order sets of (d − 1)-tuples of vertices, then regularize these new sets by (d − 2)-th order sets of (d − 2)-tuples of vertices, and so on, until we end up with a partition of the vertex set.

This way, we get a sequence

P_d = { E, (V choose d) \ E },  P_{d−1} ⊂ P((V choose d−1)),  · · · ,  P_2 ⊂ P((V choose 2)),  P_1 ⊂ P(V)

of partitions at each order such that the j-th order partition P_j is well approximated in a certain sense by the (j − 1)-th order partition P_{j−1}, for all 2 ≤ j ≤ d.

Remark 5.1. It might seem strange that we need so many partitions, and at all different "orders". It is possible to obtain a "regularity lemma" partitioning only the set of vertices and not higher-order tuples, but the regularity properties of the partitions obtained are not sufficiently strong to imply some important applications, such as a hypergraph counting lemma. This is related to a similar problem arising when one tries to construct limit objects for hypergraphs using the Hypergraph Regularity Lemma (see [38]), which requires considering a (2^d − 2)-dimensional object for limits of d-uniform hypergraphs.

To obtain such partitions, we will make use of a "multidimensional" generalization of the Strong Structure Theorem (SST, Theorem 2.2) given in Section 2.4, whose proof is essentially identical to that of the original SST. For this, we will need to define a kind of multidimensional conditional expectation:

Definition 5.2. Let f := (f^{(1)}, · · · , f^{(k)}) ∈ ⊗_{i∈[k]} L²(B_max) be a k-tuple of square-integrable real functions. Given a factor B := ⊗_{i∈[k]} B^{(i)} of B^k_max := ⊗_{i∈[k]} B_max, define E[f | B] ∈ ⊗_{i∈[k]} L²(B^{(i)}) by

E[f | B](x) := ( E[f^{(1)} | B^{(1)}](x), · · · , E[f^{(k)} | B^{(k)}](x) ),  ∀x ∈ X.


We also define the norm

‖f‖_{L²∗k} := √( ∑_{i=1}^{k} ‖f^{(i)}‖²_{L²} ).

With these definitions, we obtain:

Theorem 5.1 (Multidimensional Strong Structure Theorem). Let S be a collection of "structured factors" of B^k_max. Suppose f := (f^{(1)}, · · · , f^{(k)}) ∈ ⊗_{i∈[k]} L²(B_max) satisfies ‖f‖²_{L²∗k} ≤ k, let 0 < ε ≤ 1 and m ≥ 1 be constants, and let F : R+ → R+ be an arbitrary increasing function. Then there exists an integer M = O_{ε,F,m,k}(1) satisfying M ≥ m, factors B ⊆ B′ ⊆ B^k_max, and a decomposition f = f_str + f_psd + f_err such that:

• f_str = E[f | B], with complex_S(B) ≤ M

• f_psd = f − E[f | B′] is 1/F(M)-pseudorandom

• f_err = E[f | B′] − E[f | B] satisfies ‖f_err‖_{L²∗k} ≤ ε

The main difference between this multidimensional version of the SST and simple repeated applications of the original SST is that in Theorem 5.1 the individual factors B^{(1)}, · · · , B^{(k)} may be made correlated with each other, depending on the structure of the set S. This correlation cannot be obtained simply by applying the original SST k times, and it will be crucial for establishing the Hypergraph Regularity Lemma.

Proof. We repeat the proof of the Strong Structure Theorem (Theorem 2.2), but using the new conventions. Set M_0 := m and B_0 := ⊗_{i∈[k]} {∅, X}. For each i ≥ 1, use (a multidimensional version of) the Weak Structure Theorem (Lemma 2.3) with ε being 1/F(M_{i−1}) and B being B_{i−1}. We obtain a factor Z_i of complexity at most kF(M_{i−1})² relative to S, and such that f − E[f | B_{i−1} ∨ Z_i] is 1/F(M_{i−1})-pseudorandom; set then B_i := B_{i−1} ∨ Z_i and M_i := M_{i−1} + kF(M_{i−1})².

Because ‖f‖²_{L²∗k} ≤ k, by the pigeonhole principle there exists j ≤ k/ε² such that

‖E[f | B_{j+1}] − E[f | B_j]‖²_{L²∗k} = ‖E[f | B_{j+1}]‖²_{L²∗k} − ‖E[f | B_j]‖²_{L²∗k} ≤ ε².

We may then take B := B_j, B′ := B_{j+1} and M := M_j.

We will now fix some notation for the hypergraph setting before stating the Hypergraph Regularity Lemma. It will be convenient to restrict ourselves to the case of partite hypergraphs.

Let then H = ((V_j)_{j∈[ℓ]}, E) be an ℓ-partite d-uniform hypergraph whose vertex set is indexed by [ℓ]; this means that each hyperedge has exactly d vertices, no two of which belong to the same class V_j.

For any subset f ⊆ [ℓ], we define V_f = (V_j)_{j∈f} := ∏_{j∈f} V_j and let π_f : V_{[ℓ]} → V_f be the canonical projection map onto the coordinates in f. We then define on V_{[ℓ]} the σ-algebra A_f := {π_f^{−1}(E) : E ⊆ V_f}, which is the collection of all subsets of V_{[ℓ]} which depend only on the coordinates whose index belongs to the set f.

As an example, consider a 3-partite 3-uniform hypergraph H = ((V1, V2, V3), E). Then

E ⊆ V1 × V2 × V3,  V_{{1,3}} = V1 × V3,  V_{{1}} = V1,
π_{{1,3}}(x1, x2, x3) = (x1, x3) ∈ V_{{1,3}}  ∀(x1, x2, x3) ∈ V1 × V2 × V3,

and A_{{1}} = {E1 × V2 × V3 : E1 ⊆ V1}, A_{{1,2}} = {E12 × V3 : E12 ⊆ V1 × V2}, A_{{1,2,3}} = 2^{V1×V2×V3}.

Given a factor B ⊆ A_{[ℓ]}, define the complexity of B (written complex(B)) as the smallest number of sets needed to generate B as a σ-algebra. Given a set e ⊆ [ℓ], define the skeleton ∂e of e as the collection {f ⊂ e : |f| = |e| − 1}.
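The skeleton operator is simple enough to state in code; the following Python sketch (an illustration, with the function name `skeleton` our own) lists the (|e| − 1)-element subsets of a given index set e.

```python
from itertools import combinations

def skeleton(e):
    """The skeleton ∂e: all subsets of e with one fewer element."""
    return [set(f) for f in combinations(sorted(e), len(e) - 1)]

print(skeleton({1, 2, 3}))  # [{1, 2}, {1, 3}, {2, 3}]
```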

5.2 Regularity at a single level

Let us now give a high-level overview of our Hypergraph Regularity Lemma, to be proven in the next section.


Consider a subset g ∈ ([ℓ] choose d) of d indices in [ℓ], and let E_g := E ∩ V_g be the set of edges of the hypergraph H with one vertex in each V_j, j ∈ g. We then start with the factor

B_g := {∅, π_g^{−1}(E_g), π_g^{−1}(V_g \ E_g), V_{[ℓ]}} ⊂ A_g

generated by the edges E_g (which may be an arbitrary subset of V_g), and try to capture some of its structure by lower-order σ-algebras B_e ⊂ A_e, for e ∈ (g choose d−1). We then try to model these σ-algebras B_e by still lower-order approximations B_f ⊂ A_f, for f ∈ (g choose d−2), and repeat this procedure of regularizing higher-order σ-algebras by lower-order ones until we end up with a partition of each vertex class V_j, j ∈ g, represented by a factor B_{{j}} of A_{{j}}.

The constructed σ-algebras B_e should capture most of the structure of the hyperedges E_g present at the |e|-th order level which is measurable by A_e, so that the actual hyperedges between the atoms of B_e behave randomly, while the B_e still maintain bounded complexity.

This must be performed for every set of indices g ∈ ([ℓ] choose d), and the lower-order σ-algebras B_e we construct must depend only on the set of indices e, and not on the sets g ⊃ e we started with. This is the reason we need the dependencies between the factors B^{(i)} given by Theorem 5.1, which are not guaranteed by merely applying the Strong Structure Theorem repeatedly.

Our first step is then to obtain a lemma regularizing each "level" of our construction separately. As before, we will need to construct two σ-algebras (the coarse and fine approximations) to obtain an optimal result.

Lemma 5.1 (Regularity at the j-th Level [32]). Let m ≥ 1 and ℓ ≥ d ≥ 2 be integers, and let H = ((V_i)_{i∈[ℓ]}, E) be an ℓ-partite d-uniform hypergraph on V = V1 ∪ · · · ∪ Vℓ.

Let 2 ≤ j ≤ d be an integer and, for each e ∈ ([ℓ] choose j), let B_e ⊆ A_e be a σ-algebra satisfying complex(B_e) ≤ m. Let ε > 0 be a positive number and F : R+ → R+ be an arbitrary increasing function.

Then there exists M = O_{ε,F,m,ℓ}(1) satisfying the bound M ≥ F(m) and, for each f ∈ ([ℓ] choose j−1), there exists a pair of σ-algebras B_f ⊆ B′_f ⊆ A_f (the coarse and fine approximations) such that the following holds. Every j-th level measurable set E_e ∈ B_e, e ∈ ([ℓ] choose j), admits a lower-order decomposition 1_{E_e} = f_str^{(E_e)} + f_psd^{(E_e)} + f_err^{(E_e)} where:

• f_str^{(E_e)} := E[1_{E_e} | ∨_{f∈∂e} B_f], with complex(B_f) ≤ M for all f ∈ ([ℓ] choose j−1)

• f_psd^{(E_e)} := 1_{E_e} − E[1_{E_e} | ∨_{f∈∂e} B′_f] satisfies

sup_{E_f∈A_f ∀f∈∂e} | E[ f_psd^{(E_e)} ∏_{f∈∂e} 1_{E_f} ] | ≤ 1/F(M)

• f_err^{(E_e)} := E[1_{E_e} | ∨_{f∈∂e} B′_f] − E[1_{E_e} | ∨_{f∈∂e} B_f] satisfies ‖f_err^{(E_e)}‖_{L²} ≤ ε

This lemma is basically a restatement of the Multidimensional Strong Structure Theorem (Theorem 5.1) applied to the partite hypergraph setting. Here B_max = A_{[ℓ]} is the discrete σ-algebra on V_{[ℓ]} = ∏_{i∈[ℓ]} V_i, we have a function 1_{E_e} for each set E_e measurable with respect to one of the original σ-algebras B_e we are given at the beginning, and we wish to decompose all of them at once by using bounded-complexity σ-algebras B_f one level lower than the B_e.

This follows essentially by enumerating all measurable sets E_e ∈ B_e and taking the collection S to be a suitable "multidimensional" version of

S_j = { ∨_{f∈∂e} σ(E_f) : e ∈ ([ℓ] choose j), E_f ∈ A_f ∀f ∈ ∂e },   (5.1)

which represents the information in B_e which is measurable with respect to lower-order σ-algebras B_f, f ∈ ∂e.

Proof. We enumerate the "original" measurable sets E_e by an index from 1 to k := ∑_{e∈([ℓ] choose j)} |B_e|, so that for every e ∈ ([ℓ] choose j), each set E_e ∈ B_e is mapped to an index ı(E_e) ∈ [k]. We denote also by


e : [k] → ([ℓ] choose j) the membership function which associates to each index i ∈ [k] the set e(i) ∈ ([ℓ] choose j) from whose σ-algebra B_{e(i)} came the set E_e indexed by i.

With this enumeration, we join all these measurable sets' indicator functions into a single k-tuple f = ( 1_{E_e} : e ∈ ([ℓ] choose j), E_e ∈ B_e ), so that the i-th coordinate f^{(i)} is the indicator function of the set indexed by i. Note that k is bounded, since

k = ∑_{e∈([ℓ] choose j)} |B_e| ≤ ∑_{e∈([ℓ] choose j)} 2^{2^{complex(B_e)}} ≤ 2^ℓ 2^{2^m}.

The version of the set S_j (equation (5.1)) which is adapted to our setting is

S_j := { ⊗_{i=1}^{k} ∨_{f∈∂e(i)∩∂g} σ(E_f) : g ∈ ([ℓ] choose j), E_f ∈ A_f ∀f ∈ ∂g }.

It represents taking a factor Y_g = ∨_{f∈∂g} σ(E_f) ∈ S_j ∩ A_g for some g ∈ ([ℓ] choose j) and pulling it back to a factor of the whole space of f in the natural way. Any factor B = Y_1 ∨ · · · ∨ Y_r with complexity r relative to S_j may then be written as B = ⊗_{i∈[k]} ∨_{f∈∂e(i)} B_f, for some factors B_f ⊆ A_f generated by at most r sets E_f ∈ A_f, ∀f ∈ ([ℓ] choose j−1).

The Multidimensional Strong Structure Theorem (Theorem 5.1) applied to f and S_j then permits us to conclude the proof, since for every e ∈ ([ℓ] choose j) and E_f ∈ A_f, ∀f ∈ ∂e, the set ⋂_{f∈∂e} E_f is measurable by a factor Y_e ∈ S_j, and so

| E[ f_psd^{ı(E_e)} ∏_{f∈∂e} 1_{E_f} ] | ≤ ‖E[f_psd^{ı(E_e)} | Y_e]‖_{L²} ≤ sup_{Y∈S_j} ‖E[f_psd | Y]‖_{L²∗k} ≤ 1/F(M).

Intuitively, this lemma provides "coarse" low-order σ-algebras {B_f}_{f∈∂e} and "fine" low-order σ-algebras {B′_f}_{f∈∂e} which approximate the higher-order σ-algebras B_e given at the beginning. The coarse σ-algebras have bounded complexity, the fine approximation is close to the coarse approximation in L² norm, and the higher-order σ-algebras are very pseudorandom with respect to the fine lower-order σ-algebras.

5.3 Regularizing all levels simultaneously

The full regularity lemma now follows from the previous lemma by recursion, applied so as to regularize all the different levels at once:

Theorem 5.2 (Hypergraph Regularity Lemma [32]). Let ℓ ≥ d ≥ 2 be integers and let H = ((V_i)_{i∈[ℓ]}, E) be an ℓ-partite d-uniform hypergraph on V = V1 ∪ · · · ∪ Vℓ.

Let F : R+ → R+ be an arbitrary increasing function and, for all e ∈ ([ℓ] choose d), define E_e := E ∩ V_e as the set of edges of H in V_e and B_e := {∅, π_e^{−1}(E_e), π_e^{−1}(V_e \ E_e), V_{[ℓ]}} as the factor of A_e generated by π_e^{−1}(E_e).

Define M_d := 1. Then there exist numbers M_1, M_2, · · · , M_{d−1} satisfying

F(1) ≤ M_{d−1} ≤ F(M_{d−1}) ≤ M_{d−2} ≤ · · · ≤ F(M_2) ≤ M_1 = O_{F,ℓ}(1)

and, for each 1 ≤ j < d and each f ∈ ([ℓ] choose j), there exists a pair of σ-algebras B_f ⊆ B′_f ⊆ A_f such that:

• complex(B_f) ≤ M_j for all 1 ≤ j ≤ d, f ∈ ([ℓ] choose j)

• ‖ E[1_{E_e} | ∨_{f∈∂e} B′_f] − E[1_{E_e} | ∨_{f∈∂e} B_f] ‖_{L²} ≤ 1/F(M_j) for all 2 ≤ j ≤ d, e ∈ ([ℓ] choose j), E_e ∈ B_e

• sup_{E_f∈A_f ∀f∈∂e} | E[ (1_{E_e} − E[1_{E_e} | ∨_{f∈∂e} B′_f]) ∏_{f∈∂e} 1_{E_f} ] | ≤ 1/F(M_1) for all 2 ≤ j ≤ d, e ∈ ([ℓ] choose j), E_e ∈ B_e

Proof. We proceed by recursion on the level j, from j = d down to j = 2. We first use Regularity at the j-th Level (Lemma 5.1) with j = d, the σ-algebras B_e for e ∈ ([ℓ] choose d), m = 1, ε = 1/F(1), and the function F in the lemma substituted by a function F_{d−1} we will choose at the end. We then obtain a number M_{d−1} satisfying the bounds F_{d−1}(1) ≤ M_{d−1} = O_{ℓ,F(1),F_{d−1}}(1) and, for each f ∈ ([ℓ] choose d−1), a pair of σ-algebras B_f ⊂ B′_f ⊂ A_f such that:

− complex(B_f) ≤ M_{d−1} for all f ∈ ([ℓ] choose d−1)

− ‖ E[1_{E_e} | ∨_{f∈∂e} B′_f] − E[1_{E_e} | ∨_{f∈∂e} B_f] ‖_{L²} ≤ 1/F(1) = 1/F(M_d) for all e ∈ ([ℓ] choose d), E_e ∈ B_e

− sup_{E_f∈A_f ∀f∈∂e} | E[ (1_{E_e} − E[1_{E_e} | ∨_{f∈∂e} B′_f]) ∏_{f∈∂e} 1_{E_f} ] | ≤ 1/F_{d−1}(M_{d−1}) for all e ∈ ([ℓ] choose d), E_e ∈ B_e

Supposing we have already constructed the σ-algebras B_e ⊂ B′_e ⊂ A_e for all e ∈ ([ℓ] choose j), with 2 ≤ j < d, together with the number M_j, we now use Regularity at the j-th Level for these σ-algebras (B_e)_{e∈([ℓ] choose j)} with m = M_j, ε = 1/F(M_j) and F being F_{j−1} (for some function F_{j−1} we will choose at the end). We then obtain a number M_{j−1} satisfying F_{j−1}(M_j) ≤ M_{j−1} = O_{M_j,ℓ,F(M_j),F_{j−1}}(1) and, for each f ∈ ([ℓ] choose j−1), a pair of σ-algebras B_f ⊂ B′_f ⊂ A_f such that:

− complex(B_f) ≤ M_{j−1} for all f ∈ ([ℓ] choose j−1)

− ‖ E[1_{E_e} | ∨_{f∈∂e} B′_f] − E[1_{E_e} | ∨_{f∈∂e} B_f] ‖_{L²} ≤ 1/F(M_j) for all e ∈ ([ℓ] choose j), E_e ∈ B_e

− sup_{E_f∈A_f ∀f∈∂e} | E[ (1_{E_e} − E[1_{E_e} | ∨_{f∈∂e} B′_f]) ∏_{f∈∂e} 1_{E_f} ] | ≤ 1/F_{j−1}(M_{j−1}) for all e ∈ ([ℓ] choose j), E_e ∈ B_e

Now it suffices to choose the functions F_1, F_2, · · · , F_{d−1} in such a way that F_i(M_i) ≥ F(M_1) for all i ∈ [d − 1] (and that M_1 remains bounded by O_{F,ℓ}(1)). We then choose F_1 = F and, for each j from 2 to d − 1, choose F_j in such a way that F_j(M_j) ≥ F_{j−1}(M_{j−1}); this is possible because F_{j−1} was already chosen and M_{j−1} depends only on ℓ, M_j, F and F_{j−1}, so it suffices to choose F_j sufficiently large depending on ℓ, F and F_{j−1} (and thus ultimately only on ℓ and F).

This way we have that F_i(M_i) ≥ F(M_1) for all i ∈ [d − 1], and

M_1 = O_{ℓ,F,F_1,F_2,··· ,F_{d−1}}(1) = O_{ℓ,F}(1).

This theorem may be used to obtain a Hypergraph Counting Lemma and a Hypergraph Removal Lemma which generalize those proven for graphs in Chapter 3 to arbitrary uniform hypergraphs. We will not do so here, but instead refer the interested reader to [32] or [28].


Chapter 6

Dealing with sparsity: transference principles

This chapter is devoted to proving results which allow us to transfer some combinatorial theorems from the usual "positive density" setting to the very sparse setting, in which we deal with objects having asymptotically negligible density as the size of our universe increases. The moral of these results is that, if a sparse object S satisfies some mild regularity conditions, then it may be modeled by a dense object D which is in some sense indistinguishable from S.

For the rest of this chapter, fix a finite set X to be our universe and a probability distribution P on X. While P may be arbitrary, it is instructive to think of it as being the uniform probability distribution over the elements of X, and this is what we will assume in our informal discussions.

We will use the ideas and definitions discussed in Section 1.1, which we briefly recall below. Let C ⊆ 2^X be a collection of subsets of X, which we think of as the basic structured subsets of X; they are supposed to be of low complexity.

We say that two functions g, h : X → R are ε-indistinguishable according to C if for all A ∈ C we have |E[(g − h)1_A]| ≤ ε. A set A′ ⊆ X is said to have complexity at most K with respect to C if it may be expressed as a boolean combination of at most K sets of C; we denote this by complex_C(A′) ≤ K.

6.1 Subsets of pseudorandom sets

The aim of this section is to show that every subset D of a (possibly very sparse) pseudorandom set R ⊂ X may be modeled by a set M ⊂ X whose density inside X is the same as the density of D in R.

We will actually prove a slightly more general result, regarding arbitrary nonnegative functions g : X → R+ (which we will henceforth call measures) instead of sets. This result concerns when it is possible to approximate a measure g by a bounded function h : X → [0, 1] in such a way that g is indistinguishable from h according to the collection C.

To see the relation of this problem to the one described in the first paragraph, let us represent the pseudorandom set R by its normalized indicator function g_R := 1_R · |X|/|R|, and represent any subset D ⊂ R by g_D := 1_D · |X|/|R|. This normalization accounts for the possibly very small density of R in X, and makes the expectation of g_D equal to the relative density of D in R.

If we can approximate the measure gD by a bounded function h : X → [0, 1], then by sampling each element x ∈ X independently with probability h(x), we can construct a set M which (with high probability) will have the same density in X as that of D in R, and also have the same proportion of elements intersecting each of the sets in C.
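The sampling step just described takes only a few lines of Python; the dense model h below is a toy constant function, an assumption for illustration:

```python
import random

def sample_model_set(h, seed=0):
    """Keep each element x independently with probability h(x); the set M
    then has expected density E[h], and E[|M ∩ A|]/|X| = E[h 1_A] for all A."""
    rng = random.Random(seed)
    return {x for x, p in h.items() if rng.random() < p}

# Toy dense model h of constant value 0.3 (an assumption for illustration).
h = {x: 0.3 for x in range(10_000)}
M = sample_model_set(h)
density = len(M) / len(h)
print(f"density of M: {density:.3f} (expected 0.300)")
```

Standard concentration inequalities guarantee that, for a universe this large, the realized density is close to the expected one with high probability.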

The main difference between this problem and the one considered in Chapter 2 is that the measure g we wish to model is no longer bounded, which introduces some additional complications. What our next theorem will show is that, if instead g is bounded by a sufficiently pseudorandom measure ν : X → R+ (which may itself be unbounded), then there exists a function h : X → [0, 1] which is indistinguishable from g according to C.

The notion of pseudorandomness we need here is that of being indistinguishable from the constant function 1 by a somewhat larger collection than C, comprising all sets of a given complexity K with respect to C:


34 Chapter 6. Dealing with sparsity: transference principles

Definition 6.1. A measure ν : X → R+ is (η,K)-pseudorandom according to C if

|E [(ν − 1)1A]| ≤ η ∀A ⊆ X : complexC(A) ≤ K

With this notion of pseudorandomness, we obtain the following theorem:

Theorem 6.1 (Pseudorandom Transference Principle [24]). Let ε ∈ (0, 1] be an error parameter and suppose ν : X → R+ is (ε, ⌈1/ε²⌉)-pseudorandom according to C.

Then any measure g : X → R+ satisfying 0 ≤ g ≤ ν admits a “dense model” h : X → [0, 1] which is 3ε-indistinguishable from g:

|E [(g − h)1A]| ≤ 3ε, ∀A ∈ C

Proof. Define the probability distribution Pν on X by Pν(A) := E[ν1A]/E[ν] for all A ⊆ X, and let Eν be the associated expectation operator. Define the bounded function

f(x) := g(x)/ν(x) if ν(x) > 0, and f(x) := 0 if ν(x) = 0,

and apply the Weak Structure Theorem (Lemma 2.3) to f, with probability distribution Pν and structured set S = {σ(A) : A ∈ C}.

We then obtain a factor B ⊆ 2^X of complexity less than 1/ε² with respect to C and such that the function h := Eν[f | B] satisfies

|Eν [(f − h)1A]| ≤ ‖Eν [f − h | σ(A)]‖L2(ν) ≤ ε, ∀A ∈ C

Since 0 ≤ g ≤ ν, it is clear that h satisfies 0 ≤ h ≤ 1; let us now prove that |E[(g − h)1A]| ≤ 3ε for all A ∈ C.

Fix any A ∈ C; by the triangle inequality we have

|E [(g − h)1A]| = |E [(νf − h)1A]| ≤ |E [ν(f − h)1A]|+ |E [(ν − 1)h1A]|

The first term on the right-hand side is easily seen to be at most 2ε, since

|E [ν(f − h)1A]| = |Eν [(f − h)1A]| · E[ν] ≤ ε(1 + ε) ≤ 2ε

For the second term, note that B ∨ σ(A) has complexity at most ⌈1/ε²⌉ and h1A is measurable with respect to B ∨ σ(A). Because 0 ≤ h ≤ 1 and the function y 7→ E[(ν − 1)y1A′] is linear for any fixed set A′ ⊆ X, there exists a set A′ ∈ B ∨ σ(A) such that

|E[(ν − 1)h1A]| ≤ |E[(ν − 1)1A′ ]|

By the pseudorandomness hypothesis on ν we obtain that |E[(ν − 1)1A′]| ≤ ε, thus completing the proof.

This theorem is closely related to the transference principle used by Green, Tao and Ziegler [16, 36] to transfer Additive Combinatorics results from dense subsets of the integers to dense subsets of sparse pseudorandom sets of integers. We will give their result, called here the Dense Model Theorem, in Section 6.3.

We remark that Theorem 6.1 applied to the case X = V × V and C = {A × B : A, B ⊆ V} allows us to transfer the Szemerédi Regularity Lemma (Theorem 2.3) to the “subgraph of an η-uniform graph” setting and with this reprove Corollary 4.1; see the details in [24].

6.2 Upper-regular functions

In some situations, the sparse set or unbounded measure we are interested in analyzing is not majorized by any fixed pseudorandom measure. In such cases, it is still possible to obtain a transference result similar to that of the last section, provided the object in question satisfies a mild uniformity condition which we call upper-regularity:


Definition 6.2 (Upper regularity). Given constants η > 0 and D,K ≥ 1, we say that a functionf : X → R+ is (η,D,K)-upper regular with respect to C if

E [f1A] ≤ DP(A)‖f‖L1 ∀A ⊆ X : complexC(A) ≤ K, P(A) ≥ η

Intuitively this definition says that, while the function f may sometimes take values much higher than its average, inside reasonably large sets of bounded complexity these values are to some extent averaged out.

As a more “analytical” example of upper-regular functions, note that whenever we have ‖f‖Lp ≤ C‖f‖L1 for some value p > 1 and some constant C ≥ 1, by Hölder's inequality we obtain that

E[f1A] ≤ ‖f‖Lp ‖1A‖Lq ≤ C‖f‖L1 P(A)^{1−1/p}

The value on the right-hand side is at most Cη^{−1/p} P(A)‖f‖L1 if P(A) ≥ η, showing that in this case f is (η, Cη^{−1/p}, K)-upper regular for any η > 0, K ≥ 1 and any class of distinguishers C ⊆ 2^X.
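The Hölder computation above can be checked numerically. The following sketch (with exponent p = 2 and an arbitrary heavy-tailed sample f, both assumptions for illustration) verifies the resulting upper-regularity bound E[f 1A] ≤ Cη^{−1/p} P(A) ‖f‖L1 on random sets A with P(A) ≥ η:

```python
import random

random.seed(1)
n, p_exp, eta = 1000, 2.0, 0.1
f = [random.paretovariate(3.0) for _ in range(n)]  # a heavy-tailed sample

l1 = sum(f) / n
lp = (sum(v ** p_exp for v in f) / n) ** (1 / p_exp)
C = lp / l1                      # so that ||f||_Lp = C ||f||_L1
D = C * eta ** (-1 / p_exp)      # the claimed upper-regularity constant

for _ in range(200):
    m = random.randint(int(eta * n), n)      # ensures P(A) >= eta
    A = random.sample(range(n), m)
    lhs = sum(f[x] for x in A) / n           # E[f 1_A]
    assert lhs <= D * (m / n) * l1 + 1e-9
print("upper-regularity bound holds on 200 random large sets")
```

Since the bound follows deterministically from Hölder's inequality, the assertion can never fail, regardless of the random sample.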

The definition of upper regularity may be seen as a weak “L1 Cauchy-Schwarz” inequality, and it will allow us to apply (with some care) the energy-increment method to a function f of bounded L1 norm even when we have no bounds on the L2 norm of f.

The following lemma uses this idea to obtain an analogue of the Weak Structure Theorem (Lemma 2.3) for upper-regular functions:

Lemma 6.1. Given ε > 0 and D ≥ 1, define K := 9D⁴/ε² and η := (ε/3D)^K. Then for every (η,D,K)-upper regular function f : X → R+ there exists a factor B ⊆ 2^X such that:

• complexC(A′) ≤ K, ∀A′ ∈ B

• P(A′) ≥ η, ∀A′ ∈ B \ {∅}

• |E[(f − E[f |B])1A]| ≤ ε‖f‖L1 , ∀A ∈ C

Proof. We will recursively choose sets A1, A2, · · · , Am ⊂ X, for some 1 ≤ m ≤ K, in the following way. First, define Z0 := {∅, X} and set i = 1. Given Zi−1, we will construct a collection Ci which approximates C in the following sense:

1. For all A ∈ C there exists A′ ∈ Ci such that P(A △ A′) ≤ α and A′ ∈ Zi−1 ∨ σ(A)

2. P(A′ ∩ A)/P(A) ∈ [α, 1 − α] ∪ {0, 1} for all A′ ∈ Ci and every atom A of Zi−1

To do this we decompose Zi−1 into atoms A1 ∪ A2 ∪ · · · ∪ A_{Mi−1} and construct, for each set A ∈ C, the “approximating set” A′ := ⋃_{j=1}^{Mi−1} A′j, where

A′j := ∅ if P(A ∩ Aj) < αP(Aj);   A′j := A ∩ Aj if P(A ∩ Aj)/P(Aj) ∈ [α, 1 − α];   A′j := Aj if P(A ∩ Aj) > (1 − α)P(Aj).

This way, for every j ∈ [Mi−1] we have that

P((A ∩ Aj) △ A′j) ≤ αP(Aj) ⇒ P(A △ A′) ≤ α,   and   A′j ∈ Zi−1 ∨ σ(A) ⇒ A′ ∈ Zi−1 ∨ σ(A),

and condition 2 is satisfied by construction. The set Ci is then formed by all these sets A′.

If ‖E[f − E[f | Zi−1] | σ(A′)]‖L2 ≤ α‖f‖L1 for all A′ ∈ Ci, then set B = Zi−1. Otherwise,

choose (any) Ai ∈ Ci such that

‖E[f | Zi−1 ∨ σ(Ai)]‖L2² ≥ ‖E[f | Zi−1]‖L2² + α²‖f‖L1²

(the existence of such a set is guaranteed by Lemma 2.2). Define Zi := Zi−1 ∨ σ(Ai) and increment i to i + 1.


We see that, for all i ≤ K, any non-empty set B ∈ Zi will have complexity at most i and probability at least α^i ≥ α^K = η, so

‖E[f | Zi]‖L∞ = max_{A atom of Zi} |E[f1A]| / P(A) ≤ D‖f‖L1

This way, since ‖E[f | Zi]‖L2² is bounded by D²‖f‖L1² and increases by at least α²‖f‖L1² at each step, the algorithm must terminate at a time m ≤ D²/α². At the end, by Cauchy-Schwarz and our stopping condition, we must have

∀A′ ∈ Cm+1, |E [(f − E[f |B]) 1A′ ]| ≤ ‖E [f − E[f |B] | σ(A′)]‖L2 ≤ α‖f‖L1

Given any A ∈ C, we then have

|E[(f − E[f | B]) 1A]| ≤ |E[(f − E[f | B]) 1A′]| + |E[(f − E[f | B]) 1A\A′]| + |E[(f − E[f | B]) 1A′\A]|
≤ α‖f‖L1 + D‖f‖L1 (P(A △ A′) + η)
≤ 3Dα‖f‖L1

It then suffices to take α = ε/3D, K = 9D⁴/ε² and η = (ε/3D)^K.

By taking the “dense model” function h = E[f | B] obtained in Lemma 6.1, we immediately obtain the corresponding transference principle for upper-regular functions:

Theorem 6.2 (Upper-regular Transference Principle). For every ε > 0 and D ≥ 1, there exist K = (D/ε)^{O(1)} and η = 2^{−(D/ε)^{O(1)}} such that the following holds.

If f : X → R+ is an (η,D,K)-upper regular function with respect to C satisfying ‖f‖L1 ≤ 1, then f admits a 1/D-dense model h : X → [0, D] which is ε-indistinguishable from f:

|E [(f − h)1A]| ≤ ε, ∀A ∈ C

This notion of upper-regularity given for functions may be naturally specialized to the case of graphs. Given constants 0 < η ≤ 1 and D ≥ 1, we say a graph G = (V,E) is (η,D)-upper regular if dG(A,B) ≤ Dp for all subsets A,B ⊆ V satisfying |A||B| ≥ η|V|², where p := 2|E|/|V|² is the edge density of G.

Intuitively, this condition means that there are no reasonably large sets A,B ⊆ V which have a density much higher than the average density of the graph. It is easy to see that every subgraph G with relative density 1/D inside an η-uniform graph Γ is (η, (1+η)D)-upper regular. We will next use Theorem 6.2 to prove a partial converse to this observation:

Lemma 6.2. For every ε > 0 and D ≥ 1, there exist η = 2^{−(D/ε)^{O(1)}} and n0 ∈ N such that the following holds. For every (η,D)-upper regular graph G on n ≥ n0 vertices and with density p satisfying 1/n ≪ p ≪ 1, there exists an ε-uniform graph Γ ⊇ G on the same vertex set such that G is 1/D-dense on Γ.

Proof. Let us denote by V the vertex set of G. Apply the Upper-regular Transference Principle (Theorem 6.2) to the normalized edge indicator function p−11G on the space V × V, with uniform probability distribution and collection of distinguishers C = {A × B : A,B ⊆ V}. We then obtain a function h : V × V → [0, D] which satisfies the inequality ‖p−11G − h‖ ≤ ε, and by specializing the proof of Theorem 6.2 we may easily require h to be symmetric (that is, h(x, y) = h(y, x) for all x, y ∈ V).

Define the function fΓ := 1G + p(1 − 1G)(D − h), and note that 0 ≤ fΓ ≤ 1. Let Γ be a random graph on V with P(xy ∈ Γ) = fΓ(x, y) for all pairs x, y of vertices in V, all choices being independent.

With probability 1 the graph Γ will contain G as a subgraph, and if p ≫ 1/n then standard concentration inequalities imply that ‖1Γ − fΓ‖ ≤ εp with high probability (see the details in the proof of Lemma 7.2 in the next chapter). Then

‖p−11Γ − D‖ ≤ p−1‖1Γ − fΓ‖ + ‖p−1fΓ − D‖
≤ ε + ‖p−11G − h‖ + ‖1G(D − h)‖
≤ ε + ε + Dp,


which is less than 3ε if p ≤ ε/D.

With positive probability, the value of ‖1Γ‖L1 will be at most its expectation ‖fΓ‖L1 ≤ Dp; thus there exists a graph Γ ⊇ G of density at most Dp which is (3ε)^{1/2}-uniform.

This lemma together with Corollary 4.1 allows us to immediately prove a sparse regularity lemma for upper-regular graphs, provided the density of the upper-regular graph satisfies 1/n ≪ p ≪ 1. Iterating Lemma 6.1 instead of Lemma 2.3 in the proof of the Strong Structure Theorem (Theorem 2.2) and then repeating our proof of the Szemerédi Regularity Lemma with minimal changes, it is easy to obtain the full sparse regularity lemma we give below without these further conditions. We note that this version of sparse regularity was first proven by Kohayakawa and Rödl [18, 19].

Corollary 6.1 (Sparse Regularity Lemma II [18, 19]). For every ε > 0, k0 ≥ 1 and D ≥ 1 there exist constants η > 0 and K0 ≥ k0 such that the following holds.

Every (η,D)-upper regular graph G = (V,E) admits an equitable partition P = (Vi)i∈[k] into k parts such that:

• k0 ≤ k ≤ K0

• ||Vi| − |Vj || ≤ 1 for all i, j ∈ [k]

• all but at most εk² pairs (Vi, Vj) are (ε, p)-regular, where p := 2|E|/|V|²

6.3 Green-Tao-Ziegler Dense Model Theorem

One of the key ingredients in Green and Tao's proof that the primes contain arbitrarily long arithmetic progressions [16] was a relative version of Szemerédi's Theorem. Szemerédi's Theorem [30] states that any subset of the integers with positive upper density contains arbitrarily long arithmetic progressions. The relative version of this theorem proven by Green and Tao gives the same conclusion when the ground set is no longer the integers, but instead an arbitrary pseudorandom subset (or more generally some pseudorandom measure).

The proof of this relative version is split into two parts, the Dense Model Theorem and the Counting Lemma. The Dense Model Theorem will be given below, and asserts that any relatively dense subset A of a sufficiently pseudorandom subset of N may be modeled by a dense subset Ã of N (this is the same idea as Theorem 6.1 given in Section 6.1, but uses a slightly different measure of pseudorandomness). The Counting Lemma then says that the number of arithmetic progressions in A is close (when properly normalized) to the number of arithmetic progressions in Ã. Szemerédi's Theorem applied to the set Ã then permits us to conclude that A has arbitrarily long arithmetic progressions.

The Dense Model Theorem was made more explicit in a subsequent paper of Tao and Ziegler [36], where they used similar methods to obtain the stronger result that the primes contain arbitrarily long polynomial progressions. The pseudorandomness condition they use in this theorem is equivalent to the one given below, taken from [25]:

Definition 6.3. If F is a collection of bounded functions f : X → [−1, 1], we denote by Fk the collection of all functions of the form ∏_{i=1}^{k′} fi, where fi ∈ F and k′ ≤ k. We say ν : X → R+ is η-pseudorandom according to Fk if

|E[(ν − 1)f′]| ≤ η ∀f′ ∈ Fk
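The product family Fk is easy to generate mechanically; a minimal Python sketch, where the universe and the family F are toy assumptions for illustration:

```python
from itertools import combinations_with_replacement
from math import prod

def product_family(F, k, X):
    """All pointwise products of at most k members of F (with repetition),
    as in Definition 6.3, returned as tables of values on the universe X."""
    Fk = []
    for kp in range(1, k + 1):
        for combo in combinations_with_replacement(F, kp):
            Fk.append([prod(f(x) for f in combo) for x in X])
    return Fk

# Toy universe and family F of [-1, 1]-valued functions (assumptions).
X = list(range(8))
F = [lambda x: 1.0 if x % 2 == 0 else -1.0,
     lambda x: x / 7.0]
F3 = product_family(F, 3, X)
print(len(F3), "functions in F_3")  # multisets of size 1, 2, 3 from 2 functions
```

Since every member of F is bounded by 1 in absolute value, so is every product in Fk.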

In this context, the transference principle they proved says roughly the following: if ν is η(ε)-pseudorandom according to F_{K(ε)}, then any measure g satisfying 0 ≤ g ≤ ν admits a dense model h : X → [0, 1] which is ε-indistinguishable from g according to F (where η(ε), K(ε) depend only on ε).

The full statement of their theorem will be given below (Theorem 6.3), but before stating it formally we will discuss how to obtain such a result using our setting and previous theorems.

By using the transformation f 7→ (1 + f)/2, we may assume without loss of generality that all functions f ∈ F have image in [0, 1]; this assumption will slightly simplify the exposition.

If all functions in the family F were boolean functions, then Theorem 6.1 would easily permit us to conclude. Indeed, in this case we could associate each boolean function to its support


and take for C the family of the supports of all f ∈ F; the product of k functions in F would then simply be the intersection of their supports. Since for k sets A1, · · · , Ak ∈ C the atoms of σ(A1, · · · , Ak) are exactly the (non-empty) intersections ⋂_{i≤k} Bi, where each Bi is either Ai or Ai^C, and since there are at most 2^{1/ε²} atoms in a factor generated by k = 1/ε² sets, it follows that if ν is ε2^{−1/ε²}-pseudorandom according to F_{1/ε²}, then ν is also (ε, 1/ε²)-pseudorandom according to C. Theorem 6.1 immediately allows us to conclude (substituting ε by ε/3).

If the functions in F are not boolean, we construct a family C of sets having the same distinguishing power as F and try to apply Theorem 6.1 to the family C. To construct C, let us first bound the complexity of the functions in F by approximating each f ∈ F by a step function f̄ with steps of size ε/2; more precisely, given f ∈ F we define the function

f̄(x) := ⌊f(x)/(ε/2)⌋ · ε/2,

and note that 0 ≤ f(x) − f̄(x) < ε/2 for all x ∈ X. By the bounds we have on g and the sought dense model h, we obtain

0 ≤ E[g(f − f̄)] < ε/2   and   0 ≤ E[h(f − f̄)] < ε/2,

from which we get |E[(g − h)f] − E[(g − h)f̄]| < ε/2.

We now write the approximating function f̄ as the sum of a bounded number of boolean functions:

f̄(x) = ∑_{j=1}^{⌊2/ε⌋} (ε/2) 1_{f(x)≥jε/2}

Then, if |E[(g − h)f]| > ε, we have that

ε/2 < |E[(g − h)f̄]| ≤ (ε/2) ∑_{j=1}^{⌊2/ε⌋} |E[(g − h)1_{f(x)≥jε/2}]|,

so there must be some 1 ≤ j ≤ ⌊2/ε⌋ such that |E[(g − h)1_{f(x)≥jε/2}]| > ε/2.

This way, whenever the family of functions F ε-distinguishes g from h, the family of sets

CF,ε := { {x ∈ X : f(x) ≥ jε/2} : f ∈ F, 1 ≤ j ≤ ⌊2/ε⌋ }

will ε/2-distinguish g from h. By Theorem 6.1 we just need to make sure that |E[(ν − 1)1A′]| ≤ ε/6 for all sets A′ ⊆ X of complexity at most 36/ε² relative to CF,ε to conclude this cannot happen.
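The reduction from the family of functions F to a family of threshold sets can be mirrored directly in code; in this sketch the function f and the parameter ε are toy assumptions for illustration:

```python
import math

def round_down(f, x, eps):
    """Step approximation: largest multiple of eps/2 not exceeding f(x)."""
    return math.floor(f(x) / (eps / 2)) * (eps / 2)

def threshold_sets(f, X, eps):
    """The sets {x : f(x) >= j*eps/2} for 1 <= j <= floor(2/eps),
    i.e. the sets contributed by f to the family C_{F,eps}."""
    return [{x for x in X if f(x) >= j * eps / 2}
            for j in range(1, math.floor(2 / eps) + 1)]

X = list(range(10))
f = lambda x: x / 9.0          # toy function with values in [0, 1]
eps = 0.5
fb = {x: round_down(f, x, eps) for x in X}
assert all(0 <= f(x) - fb[x] < eps / 2 for x in X)
print(len(threshold_sets(f, X, eps)), "threshold sets")
```

Summing the indicators of these threshold sets, each weighted by ε/2, recovers exactly the rounded function.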

Suppose then there exists a set A′ ⊆ X of complexity k ≤ 36/ε² relative to CF,ε for which |E[(ν − 1)1A′]| > ε/6. By definition, there exist

A1 = {x ∈ X : f1(x) ≥ j1ε/2}, · · · , Ak = {x ∈ X : fk(x) ≥ jkε/2} ∈ CF,ε

and a set Ω ⊆ {−1,+1}^k such that we may write A′ as a disjoint union of atoms

A′ = ⋃_{ω∈Ω} ⋂_{i=1}^{k} Ai^{ωi},

where we define Ai^{+1} := Ai and Ai^{−1} := X \ Ai. This implies there exists ω ∈ Ω for which

|E[(ν − 1) ∏_{i=1}^{k} 1_{Ai^{ωi}}]| > ε/(6 · 2^k) = 2^{−1/ε^{O(1)}}

By a “quantitative” version of the Weierstrass polynomial approximation theorem, we may approximate any threshold function 1_{x≥t} inside [0, 1] by a polynomial p : [0, 1] → [0, 1] whose degree and height depend only on the desired accuracy. To be more precise, given an accuracy parameter α > 0, the notion of approximation we require is for the polynomial p to have distance


at most α to 1_{x≥t} in L∞ norm inside the smaller set [0, 1] \ [t − α, t], where we take away a small interval around t to account for the discontinuity of the function 1_{x≥t} at that point.

Approximating each indicator function 1_{Ai^{ωi}}(x) in this way by a polynomial pi(fi(x)) with sufficient accuracy, we obtain

|E[(ν − 1) ∏_{i=1}^{k} pi(fi)]| > 2^{−1/ε^{O(1)}}

Expanding the product into a sum of monomials in f1, · · · , fk and using the fact that the degree and height of each polynomial is bounded as a function of ε, we see that one of the terms |E[(ν − 1) f1^{n1} f2^{n2} · · · fk^{nk}]| must be greater than some cε > 0 depending only on ε.

If we then take K = K(ε) to be 36/ε² times the largest possible degree of such a polynomial pi, we will have obtained a function f′ ∈ FK which satisfies |E[(ν − 1)f′]| > cε. It then suffices to require the pseudorandomness hypothesis

|E[(ν − 1)f ′]| ≤ cε ∀f ′ ∈ FK

to guarantee that every function 0 ≤ g ≤ ν has a dense model h : X → [0, 1] which is ε-indistinguishable from g by the family of functions F .

With some more care (see [24] or [25]), we may obtain better bounds, as in the theorem below:

Theorem 6.3 (Green-Tao-Ziegler Dense Model Theorem [16, 36]). For every ε > 0, there exist k = 1/ε^{O(1)} and η = 2^{−1/ε^{O(1)}} such that the following holds.

Suppose that F is a finite collection of bounded functions f : X → [−1, 1], ν : X → R+ is η-pseudorandom according to Fk, and g : X → R+ is a function such that g ≤ ν.

Then there exists a bounded function h : X → [0, 1] such that

|E [(g − h)f ]| ≤ ε ∀f ∈ F

We remark that in the papers [16, 36] cited above the result appeared without an explicit bound on the constants involved (as they were only interested in asymptotic results), and with very different notation. The theorem stated here appeared in [24, 25] and, apart from the differences noted above, is equivalent to Theorem 7.1 in [36].


Chapter 7

Transference results for L1 structure

The transference principles shown in the last chapter provide a means of transferring some combinatorial results from the dense setting (where they are easier to prove) to the sparse setting, provided the sparse objects satisfy some mild regularity conditions.

Consider for instance the Pseudorandom Transference Principle (Theorem 6.1) specialized to the case of graphs. The sparse setting (which we will call “sparse space”) may be identified with the space of all subgraphs of a given sparse pseudorandom graph Γ, while the “dense space” is identified with the set of all graphs on the same vertex set V as Γ. This result is then a kind of correspondence between these two spaces, roughly saying that every subgraph G of Γ admits a model graph f(G) on the same vertex set which is dense and indistinguishable from G by cuts (when G is properly normalized).

The aim of this chapter is to show that this dense model function f may be made “continuous” in L1 norm, so that sparse graphs whose edge sets are close to each other will have dense models whose edge sets are close to each other, and also to obtain the same result for a “sparse model” function g from the dense space to the sparse space. This will allow us to pass from one space of graphs to the other while preserving the underlying L1 geometry, which may be important in some applications.

7.1 Relationships between cut norm and L1 norm

The notion of approximation given by the transference principles (when applied to the graph setting) is most naturally expressed in terms of the cut norm, and it will be crucial for our objectives to understand the relationship between the cut norm and the L1 norm for functions defined on V × V.

We recall from Chapter 3 that, for any f : V × V → R, the cut norm of f is defined as

‖f‖ := max_{A,B⊆V} |E_{x,y∈V}[f(x, y) 1A×B(x, y)]|,

and that by linearity this definition is equivalent to

‖f‖ = max_{u,v : V→[0,1]} |E_{x,y∈V}[f(x, y) u(x) v(y)]|

The L1 norm of f is here (and for the rest of this chapter) defined as

‖f‖L1 := Ex,y∈V [|f(x, y)|]

It is easy to see that ‖f‖ ≤ ‖f‖L1, with equality holding if f is either non-negative or non-positive. However, no inequality in the other direction holds for all functions, as can be seen by taking a uniformly random assignment of 1 or −1 to each pair (x, y) ∈ V². This random function will clearly have unit L1 norm, but with high probability its cut norm will be O(|V|^{−1/2}).
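This gap between the two norms is easy to observe by brute force; in the sketch below the vertex count is kept tiny so that the maximum over all pairs (A, B) is computable, and all names are assumptions for illustration:

```python
import itertools, random

def cut_norm(f, n):
    """Brute-force cut norm: max over all A, B ⊆ V of |E[f 1_{A×B}]|."""
    subsets = [s for r in range(n + 1) for s in itertools.combinations(range(n), r)]
    return max(abs(sum(f[x][y] for x in A for y in B)) / n ** 2
               for A in subsets for B in subsets)

random.seed(0)
n = 7
f = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
l1 = sum(abs(v) for row in f for v in row) / n ** 2  # equals 1 by construction
print(f"L1 norm: {l1:.2f}, cut norm: {cut_norm(f, n):.2f}")
```

Even at this small scale the cut norm of a random sign matrix is visibly smaller than its L1 norm, and the gap widens as n grows.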

The main reason why such an assignment f has unbounded ratio ‖f‖L1/‖f‖ is that there is no correlation between the values taken by f and the cuts A × B used to define the cut norm.


If we require f to be correlated with such structures, then it is possible to obtain some bounds on the ratio ‖f‖L1/‖f‖, which will be important for our results.

We then define the following notion:

Definition 7.1 (Step function). We say a function f : V × V → R is a step function on k steps if there exists a partition P = (Vi)i∈[k] of V into k parts such that f is constant inside each set Vi × Vj, i, j ∈ [k].

If we suppose that the function f is a step function on k steps, then a simple relationship between the cut norm and the L1 norm is that ‖f‖L1 ≤ 2k‖f‖. Indeed, suppose the steps are V1, V2, · · · , Vk and, for each i ∈ [k], define the sets

Pi := ⋃_{j∈[k] : f|Vi×Vj > 0} Vj   and   Ni := ⋃_{j∈[k] : f|Vi×Vj < 0} Vj

Then

‖f‖L1 = ∑_{i=1}^{k} ‖f 1Vi×V‖L1 = ∑_{i=1}^{k} (|E[f 1Vi×Pi]| + |E[f 1Vi×Ni]|) ≤ 2k‖f‖

This way, functions with a relatively small number of steps play an important role when we wish to relate the cut norm to the L1 norm. This suggests considering the following important operator, related to the “rounded graph” defined in Chapter 3:

Definition 7.2 (Stepping operator). Let P be a partition of the set V, whose atoms are V1, V2, · · · , Vk. Given any function f : V² → R, we define (f)P := E[f | P ⊗ P] as the function

(f)P(x, y) = (1/(|Vi||Vj|)) ∑_{u∈Vi, v∈Vj} f(u, v),

where Vi is the part which contains x and Vj is the part which contains y.
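A direct implementation of the stepping operator, together with a check of two basic conditional-expectation properties (idempotence and mean preservation); the partition and function below are toy assumptions for illustration:

```python
def step(f, parts, n):
    """(f)_P: replace f on each block Vi × Vj by its average over that block."""
    g = [[0.0] * n for _ in range(n)]
    for Vi in parts:
        for Vj in parts:
            avg = sum(f[x][y] for x in Vi for y in Vj) / (len(Vi) * len(Vj))
            for x in Vi:
                for y in Vj:
                    g[x][y] = avg
    return g

# Toy function and partition (assumptions for illustration).
n = 6
parts = [[0, 1, 2], [3, 4, 5]]
f = [[(x * y) % 3 for y in range(n)] for x in range(n)]
g = step(f, parts, n)

# As a conditional expectation, stepping is idempotent and mean-preserving.
g2 = step(g, parts, n)
assert all(abs(a - b) < 1e-9 for ra, rb in zip(g2, g) for a, b in zip(ra, rb))
mean = lambda h: sum(map(sum, h)) / n ** 2
assert abs(mean(f) - mean(g)) < 1e-9
print("stepping operator is idempotent and mean-preserving")
```

The output of `step` is by construction a step function on the given partition, which is where the relationship ‖·‖L1 ≤ 2k‖·‖ applies.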

It is easy to prove that the stepping operator is a contraction in cut norm. Moreover, since it is a conditional expectation, we know from Probability Theory that it is also a contraction in Lp norm for all p ≥ 1.

Using these facts, we may prove the following crucial “rigidity” result:

Lemma 7.1 (Rigidity of Strong Regularity). Let f1, f2 : V² → R be two functions and P1, P2 be two partitions of V into at most k parts. If ‖f1 − (f1)P1‖, ‖f2 − (f2)P2‖ ≤ ε/(4k²), then

‖(f1)P1 − (f2)P2‖L1 ≤ ‖f1 − f2‖L1 + ε

Proof. Let Q := P1 ∨ P2 be the common refinement of P1 and P2. Because the stepping operator is a contraction in L1 norm, we conclude that

‖(f1)Q − (f2)Q‖L1 ≤ ‖f1 − f2‖L1

Also, because the stepping operator is a contraction in cut norm, we have

‖(f1)P1 − (f1)Q‖ = ‖((f1)P1 − f1)Q‖ ≤ ‖(f1)P1 − f1‖ ≤ ε/(4k²)

Notice the partition Q has at most k² steps (since each of P1 and P2 has at most k steps), and so

‖(f1)P1 − (f1)Q‖L1 ≤ 2k² ‖(f1)P1 − (f1)Q‖ ≤ ε/2

The same reasoning as above similarly implies that ‖(f2)P2 − (f2)Q‖L1 ≤ ε/2, and thus by the triangle inequality

‖(f1)P1 − (f2)P2‖L1 ≤ ‖(f1)P1 − (f1)Q‖L1 + ‖(f1)Q − (f2)Q‖L1 + ‖(f2)Q − (f2)P2‖L1
≤ ‖(f1)Q − (f2)Q‖L1 + ε
≤ ‖f1 − f2‖L1 + ε


7.2 Inheritance of structure lemmas

This section is devoted to establishing two important lemmas on “inheriting structure” by taking random choices inside a sufficiently uniform graph Γ.

Recall from Section 4.2 that a graph Γ = (V,E) with density p := 2|E|/|V|² is η-uniform if |dΓ(A,B) − p| ≤ ηp holds for all sets A,B ⊆ V satisfying |A × B| ≥ η|V|². We note the simple fact that this condition implies ‖p−11Γ − 1‖ ≤ 2η.

Suppose then we are given a symmetric function T : V² → [0, 1] with a bounded number of steps and a sufficiently uniform graph Γ on V. The lemmas proven in this section roughly say that, if we randomly choose a subgraph G of Γ by adding each edge xy ∈ Γ to G with probability T(x, y), then G will inherit both the cut and the L1 structures from the function T.
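The random choice of G described above takes only a few lines; the graph Γ and the two-step function T in this sketch are toy assumptions for illustration:

```python
import random

def random_subgraph(gamma_edges, T, seed=0):
    """Keep each edge xy of Γ independently with probability T(x, y)."""
    rng = random.Random(seed)
    return {(x, y) for (x, y) in gamma_edges if rng.random() < T(x, y)}

# Toy dense "uniform" Γ on 200 vertices and a two-step symmetric T (assumptions).
rng = random.Random(1)
n = 200
gamma = {(x, y) for x in range(n) for y in range(x + 1, n) if rng.random() < 0.2}
T = lambda x, y: 0.25 if (x < n // 2) == (y < n // 2) else 0.75
G = random_subgraph(gamma, T)
assert G <= gamma  # G is a spanning subgraph of Γ by construction
print(f"|E(Γ)| = {len(gamma)}, |E(G)| = {len(G)}")
```

The lemmas below quantify how closely such a sample G tracks T in cut norm and in L1 norm.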

Below we state the first of these lemmas, which deals with the cut structure:

Lemma 7.2 (Inheritance of cut structure). Let ε > 0, k ≥ 1 be given and define η := ε/(4k). Then the following holds for every sufficiently large n ∈ N.

Let V be a set of n vertices, Γ be an η-uniform graph on V with density p ≥ Ωε(1/n), and T : V² → [0, 1] be a symmetric function with at most k steps. If G is a random (spanning) subgraph of Γ with

P(xy ∈ G) = T (x, y) ∀xy ∈ Γ,

all choices being independent, then

P( ‖p−11G − T‖ > ε ) < e^{−(ε²/8) pn²}

In this lemma, the condition p ≥ Ωε(1/n) on the density means that p ≥ Cε/n for some quantity Cε > 0 depending only on ε.

The proof of Lemma 7.2 (and also that of Lemma 7.3 below) proceeds as follows. We first break the statement into a part that uses only the “pseudorandomness” of Γ and a part that involves only the “real randomness” of G. The pseudorandom part is in fact deterministic, and is dealt with by using the uniformity of Γ and the triangle inequality for the cut norm. The random part follows by standard applications of the Chernoff bound and the union bound.

We give the details below.

Proof. Because G ⊆ Γ, we have that

‖p−11G − T‖ ≤ ‖p−11Γ(1G − T)‖ + ‖(p−11Γ − 1)T‖ ≤ p−1‖1Γ(1G − T)‖ + ∑_{i∈[ℓ]} ‖(p−11Γ − 1)T 1Vi×V‖,   (7.1)

where V1, V2, · · · , Vℓ are the steps of T (and so ℓ ≤ k).

For a fixed i ∈ [ℓ], let A ⊆ Vi, B ⊆ V be sets satisfying

|E[(p−11Γ − 1)T 1A×B]| = ‖(p−11Γ − 1)T 1Vi×V‖,

let σ ∈ {−1,+1} be the sign of E[(p−11Γ − 1)T 1A×B], and let Ti,j ∈ [0, 1] be the value of T over Vi × Vj. Note that for every j ∈ [ℓ] we must have

Ti,j σE[(p−11Γ − 1) 1A×(B∩Vj)] ≤ σE[(p−11Γ − 1) 1A×(B∩Vj)],

otherwise for a contradicting value of j we would have

σE[(p−11Γ − 1)T 1A×(B\Vj)] > σE[(p−11Γ − 1)T 1A×B] = ‖(p−11Γ − 1)T 1Vi×V‖,

which is impossible. Then

‖(p−11Γ − 1)T 1Vi×V‖ = σE[ ∑_{j∈[ℓ]} (p−11Γ − 1)T 1A×(B∩Vj) ] = ∑_{j∈[ℓ]} Ti,j σE[(p−11Γ − 1) 1A×(B∩Vj)]


≤ ∑_{j∈[ℓ]} σE[(p−11Γ − 1) 1A×(B∩Vj)] = σE[(p−11Γ − 1) 1A×B] ≤ ‖p−11Γ − 1‖

From inequality (7.1) we then obtain

‖p−11G − T‖ ≤ p−1‖1Γ(1G − T)‖ + ∑_{i∈[ℓ]} ‖p−11Γ − 1‖ ≤ p−1‖1Γ(1G − T)‖ + ℓ · 2η ≤ p−1‖1Γ(1G − T)‖ + ε/2,

and so P( ‖p−11G − T‖ > ε ) ≤ P( ‖1Γ(1G − T)‖ > εp/2 ).

For any given sets A,B ⊆ V , the Chernoff bound implies that

P( |∑_{xy∈Γ∩A×B} (1_{xy∈G} − T(x, y))| > (ε/2) · (pn²/2) ) ≤ 2 exp(−2 ((ε/2)(pn²/2))² / |Γ ∩ A×B|) ≤ 2 e^{−(ε²/4) pn²}

Since there are only 2^{2n} pairs (A,B) of subsets of V, by the union bound we obtain that

P(‖1Γ(1G − T )‖ >

εp

2

)= P

(∃A,B ⊆ V : |E [1Γ(1G − T )1A×B ]| > εp

2

)≤

∑A,B⊆V

P

∣∣∣∣∣∣∑

xy∈Γ∩A×B(1xy∈G − T (x, y))

∣∣∣∣∣∣ > ε

2

pn2

2

≤ 22n · 2e− ε

2

4 pn2

,

which is smaller than e^{−(ε²/8) pn²} for sufficiently large n ∈ N, provided p ≥ Ωε(1/n).

The second lemma has a similar philosophy, but deals with the L1 structure:

Lemma 7.3 (Inheritance of L1 structure). Let ε > 0, k ≥ 1 be given and define η := ε/(4k⁴). Then the following holds for every n ≥ 1.

Let V be a set of n vertices, Γ be an η-uniform graph on V with density p, and let T1, T2 : V² → [0, 1] be symmetric functions with at most k steps. For each edge xy ∈ Γ, we draw σxy uniformly and independently at random from [0, 1], and construct the (random) graphs G1, G2 on V by including edge xy ∈ Γ in G1 (resp. G2) if and only if σxy ≤ T1(x, y) (resp. σxy ≤ T2(x, y)). Then

P( |p−1‖1_{G1△G2}‖L1 − ‖T1 − T2‖L1| > ε ) < 2 e^{−(ε²/4) pn²}

Proof. Define T := T1 − T2, so that T has at most k² steps (which we will call V1, V2, · · · , Vℓ, for some ℓ ≤ k²) and ‖T‖L∞ ≤ 1. We have that

|p−1‖1_{G1△G2}‖L1 − ‖T1 − T2‖L1| ≤ p−1 |‖1_{G1△G2}‖L1 − ‖1ΓT‖L1| + |p−1‖1ΓT‖L1 − ‖T‖L1|

Let Ti,j be the value of T over Vi × Vj for i, j ∈ [ℓ]. Then

|p−1‖1ΓT‖L1 − ‖T‖L1| = (1/n²) |∑_{x,y∈V} p−11_{xy∈Γ}|T(x, y)| − ∑_{x,y∈V} |T(x, y)||
= (1/n²) |∑_{i,j∈[ℓ]} (p−1 eΓ(Vi, Vj) − |Vi||Vj|) |Ti,j||
≤ (1/n²) ∑_{i,j∈[ℓ]} |p−1 eΓ(Vi, Vj) − |Vi||Vj||


If |Vi||Vj| ≥ ηn², then |p−1 eΓ(Vi, Vj) − |Vi||Vj|| ≤ η|Vi||Vj| ≤ ηn²,

while if |Vi||Vj| < ηn² we have |p−1 eΓ(Vi, Vj) − |Vi||Vj|| ≤ (1 + η)ηn² ≤ 2ηn²

We then conclude that

|p−1‖1ΓT‖L1 − ‖T‖L1| ≤ (1/n²) ∑_{i,j∈[ℓ]} 2ηn² = 2ℓ²η ≤ ε/2,

and so by the Chernoff bound

P( |p−1‖1_{G1△G2}‖L1 − ‖T1 − T2‖L1| > ε ) ≤ P( |‖1_{G1△G2}‖L1 − ‖1ΓT‖L1| > εp/2 )
= P( |∑_{xy∈Γ} (1_{G1△G2}(x, y) − P(xy ∈ G1△G2))| > (ε/2) · (pn²/2) )
≤ 2 e^{−2(ε/2)²(pn²/2)} = 2 e^{−(ε²/4) pn²}

7.3 A “coarse” structural correspondence

In this section we give a first transference result between the dense and the sparse settings which also takes into account the L1 structure of the graphs.

To state this result, it will be convenient to first establish notation for the different spaces of graphs we will work with:

Definition 7.3 (Spaces of graphs). Given a graph Γ = (V(Γ), E(Γ)), we denote by S(Γ) the collection of all spanning subgraphs of Γ, that is

S(Γ) := {G = (V(Γ), E(G)) : E(G) ⊆ E(Γ)}

There are two different kinds of graphs which will be used here to generate these spaces. The first is the complete graph on a given vertex set V, which we will denote by KV, and whose space S(KV) will represent the “dense space” of graphs on this vertex set. The second is a sparse η-uniform graph for some small constant η, which we will usually denote by Γ, and whose space S(Γ) represents the “sparse space” of graphs.

We then wish to prove that the sparse space S(Γ) and the dense space S(K_{V(Γ)}) on the same vertex set are in some sense equivalent, having the same "geometries" given both by the cut structure and by the L1 structure.

For this first result, we will try to construct a bijection between representative subsets of each space, in such a way that this bijection preserves both the cut structure of each graph and the pairwise L1 distance between them.

Perhaps the most natural way of obtaining representative subsets suitable to our purposes is to take an ε-net of each space in L1 norm, since this would imply that every graph has a representative in the corresponding set at distance at most ε in both L1 and cut norms. However, it is not hard to prove that any ε-net of S(K_{V(Γ)}) will be larger than the entire space S(Γ) if the density of Γ is sufficiently small compared to ε, which makes such a bijection impossible for sparse graphs Γ.
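The ε-nets used here are the usual metric-space notion: a set N such that every point of the space lies within ε of some element of N. For a finite space a greedy construction always produces one; the sketch below (illustrative only, with the normalized symmetric-difference distance standing in for the L1 metric on subgraphs) makes this concrete:

```python
from itertools import combinations

def greedy_eps_net(points, dist, eps):
    """Greedy eps-net: scan the points and keep those farther than eps
    from every center kept so far.  Every point ends up within eps of
    some center, and the kept centers are pairwise more than eps apart."""
    net = []
    for x in points:
        if all(dist(x, c) > eps for c in net):
            net.append(x)
    return net

# Points: all spanning subgraphs of a 3-edge host graph, as edge sets.
E_Gamma = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]
points = [frozenset(s) for r in range(4) for s in combinations(E_Gamma, r)]
dist = lambda A, B: len(A ^ B) / len(E_Gamma)   # normalized L1-type distance

net = greedy_eps_net(points, dist, 0.4)
assert all(any(dist(x, c) <= 0.4 for c in net) for x in points)
```

The greedy net is also a packing, which is exactly the property exploited in the counting argument of Lemma 8.1 at the end of this thesis.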

Fortunately, our proof of Theorem 6.1 implies that both spaces are equivalent in cut norm, so we may take ε-nets in cut norm from each of these spaces and exhibit a bijection preserving the cut structure of each graph. The following theorem says we can do this in such a way as to also preserve the pairwise L1 distance between them:

Theorem 7.1. For every ε > 0 there exist a constant η = 2^{−(1/ε)^{O(1)}} and a function M : ℕ → ℕ such that the following holds.


Suppose (Γ_n)_{n∈ℕ} is a sequence of η-uniform graphs with |V(Γ_n)| = n and density p = p(n) ≥ Ω_ε(1/n). Then there exist subsets

S^ε_n = {G_1, ..., G_{M(n)}} ⊂ S(Γ_n),   D^ε_n = {H_1, ..., H_{M(n)}} ⊂ S(K_{V(Γ_n)})

which satisfy the following conditions for every sufficiently large n ∈ N:

• S^ε_n and D^ε_n are ε-nets in cut norm of S(Γ_n) and S(K_{V(Γ_n)}), respectively:

∀G ∈ S(Γ_n) ∃i ∈ [M(n)] : p^{−1}‖1_G − 1_{G_i}‖ ≤ ε

∀H ∈ S(K_{V(Γ_n)}) ∃i ∈ [M(n)] : ‖1_H − 1_{H_i}‖ ≤ ε

• For all i, H_i is a dense model of G_i:

∀i ∈ [M(n)], ‖p^{−1}1_{G_i} − 1_{H_i}‖ ≤ ε

• S^ε_n and D^ε_n have the same L1 structure:

∀i, j ∈ [M(n)], ‖1_{H_i△H_j}‖_{L1} = (1 ± ε) p^{−1} ‖1_{G_i△G_j}‖_{L1}

Proof. Suppose n is sufficiently large, and call a symmetric function T : V(Γ_n)² → [0,1] a template on V(Γ_n). Then the Weak Regularity Lemma (Theorem 2.1) and the Pseudorandom Transference Principle (Theorem 6.1) imply that all graphs in S(Γ_n) and S(K_{V(Γ_n)}) are ε-close in cut norm to a template with at most K = 2^{1/ε^{O(1)}} steps.

Take an ε-net in L1 norm of the space of templates on V(Γ_n) with at most K steps which does not contain two templates that are ε-close in L1 norm. As there exists one such net with at most K^n (1/ε)^{K²} = e^{Θ_ε(n)} elements, we may apply the inheritance of structure lemmas at each template in this net (using the same random choices) and then apply a union bound to obtain two "model sets" S^ε_n ⊂ S(Γ_n), D^ε_n ⊂ S(K_{V(Γ_n)}) which have the same L1 and cut structures as the templates they model. The theorem follows by changing ε to ε/3.

7.4 A “fine” structural correspondence

Theorem 7.1, proven in the last section, provides a "coarse" structural correspondence between the sparse space S(Γ) and the dense space S(K_{V(Γ)}), given by means of a bijection between ε-nets in cut norm of each space, which preserves the cut structure of graphs and their pairwise L1 distance. However, these ε-nets contain only a vanishing fraction of the graphs in each space, and we have no guarantee that they are "well-spread" in L1 norm.

This section is devoted to showing that, if we do not require the correspondence map between the two spaces to be bijective, then we can get a much "finer" structural correspondence, which concerns almost all graphs from each space and almost every "L1 structure", in a sense we will now define.

For a given graph Γ and an integer m, let us define a constellation of order m in S(Γ) as simply a collection of m graphs {G_1, ..., G_m} ⊆ S(Γ). A constellation should be seen as being characterized by the cut structure of its elements and the (relative) L1 structure between them.

This definition is made so that we can talk about approximating a collection of several graphs at once while preserving the overall structure of the collection. This is the philosophy of the next definition:

Definition 7.4 (ε-similarity). A constellation {G_1, ..., G_m} is ε-similar to a constellation {H_1, ..., H_m} in S(Γ) if:

– ∀i ∈ [m] : p^{−1}‖1_{G_i} − 1_{H_i}‖ ≤ ε, and

– ∀i, j ∈ [m] : p^{−1}‖1_{G_i△G_j}‖_{L1} = p^{−1}‖1_{H_i△H_j}‖_{L1} ± ε,

where p := 2|E(Γ)|/|V(Γ)|² is the density of Γ.
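For spanning subgraphs of Γ, the quantity p^{−1}‖1_{G_i△G_j}‖_{L1} appearing in this definition has a concrete meaning: since ‖1_{G_i△G_j}‖_{L1} = 2|E(G_i)△E(G_j)|/n² and p = 2|E(Γ)|/n², it is exactly the fraction of Γ's edges on which the two subgraphs disagree. A minimal sketch (the helper name is ours):

```python
def rel_l1_distance(E_G1, E_G2, E_Gamma):
    """p^{-1} * ||1_{G1 sym-diff G2}||_{L1} for spanning subgraphs of Gamma.
    With the normalizations of this chapter this equals
    |E(G1) sym-diff E(G2)| / |E(Gamma)|."""
    return len(E_G1 ^ E_G2) / len(E_Gamma)

# Toy example: Gamma is a 4-cycle, G1 and G2 share one of their two edges.
E_Gamma = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
E_G1 = {frozenset(e) for e in [(0, 1), (1, 2)]}
E_G2 = {frozenset(e) for e in [(1, 2), (2, 3)]}
assert rel_l1_distance(E_G1, E_G2, E_Gamma) == 0.5   # disagree on 2 of 4 edges
```

This is why the second condition above is scale-free: it compares the two constellations edge-fraction by edge-fraction, independently of the density of Γ.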


We say that a subset S ⊆ S(Γ) ε-contains all constellations of order m in S(Γ) if, for any collection of m graphs {G_1, ..., G_m} ⊆ S(Γ), there exists a collection {H_1, ..., H_m} ⊆ S which is ε-similar to {G_1, ..., G_m}.

Note that a set S being an ε-net of S(Γ) in cut norm is equivalent to S ε-containing all constellations of order 1 in S(Γ), but this is strictly weaker than ε-containing all constellations of order m for any m ≥ 2. Intuitively, if we are dealing with at most m graphs at a time, ε-containing all constellations of order m is virtually the same as being an ε-net in L1 norm.

With this notion, we may now state the “fine” structural correspondence principle:

Theorem 7.2. For every ε > 0 and m ≥ 1, there exists a constant η = 2^{−(2m/ε)^{O(1)}} such that the following holds.

Suppose (Γ_n)_{n∈ℕ} is a sequence of η-uniform graphs with |V(Γ_n)| = n and density p = p(n) ≥ Ω_{ε,m}(1/n). Then for every sufficiently large n ∈ ℕ there exist subsets S′_n ⊆ S(Γ_n), D′_n ⊆ S(K_{V(Γ_n)}) and functions

f : S′n → D′n, g : D′n → S′n

which satisfy the following conditions:

• S′_n and D′_n contain almost all graphs in S(Γ_n) and S(K_{V(Γ_n)}):

|S′_n| > (1 − ε)|S(Γ_n)|,   |D′_n| > (1 − ε)|S(K_{V(Γ_n)})|

• S′_n and D′_n ε-contain all constellations of order m in S(Γ_n) and S(K_{V(Γ_n)}), respectively

• f(G) is a dense model of G and g(H) is a sparse model of H:

∀G ∈ S′_n : ‖p^{−1}1_G − 1_{f(G)}‖ ≤ ε

∀H ∈ D′_n : ‖1_H − p^{−1}1_{g(H)}‖ ≤ ε

• f and g are both continuous in L1 norm:

∀G_1, G_2 ∈ S′_n : ‖1_{f(G_1)△f(G_2)}‖_{L1} ≤ p^{−1}‖1_{G_1△G_2}‖_{L1} + ε

∀H_1, H_2 ∈ D′_n : p^{−1}‖1_{g(H_1)△g(H_2)}‖_{L1} ≤ ‖1_{H_1△H_2}‖_{L1} + ε

7.4.1 Proof of Theorem 7.2

For the proof of Theorem 7.2 we will need to define some additional spaces:

Definition 7.5. Given ε > 0, k ≥ 1 and a graph Γ of density p := 2|E(Γ)|/|V(Γ)|², we define:

• The space T_{V(Γ)} of templates on V(Γ):

T_{V(Γ)} := {T : V(Γ)² → [0,1], T symmetric}

• The space T_{V(Γ);k} of templates on V(Γ) with at most k steps:

T_{V(Γ);k} := {T ∈ T_{V(Γ)} : T has at most k steps}

• The space S(Γ; k, ε) of well-structured graphs with a partition of order k:

S(Γ; k, ε) := {G ⊆ Γ : ∃ T ∈ T_{V(Γ);k}, ‖p^{−1}1_G − T‖ ≤ ε/16k²}
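Templates with at most k steps typically arise by averaging a graph over a vertex partition: for x ∈ Q_i and y ∈ Q_j, the averaged value is e_G(Q_i, Q_j)/(|Q_i||Q_j|), which is the step function written (G)_Q in the proofs below. A small sketch of this averaging operator (names and example ours), checking that it preserves the normalized L1 mass of a nonnegative matrix — the fact used below when comparing ‖(G_I)_Q‖_{L1} with ‖1_{G_I}‖_{L1}:

```python
def partition_average(A, parts):
    """(A)_Q: replace A on each cell Q_i x Q_j of a vertex partition by its
    average over that cell, yielding a step function with <= len(parts)^2 steps."""
    n = len(A)
    B = [[0.0] * n for _ in range(n)]
    for Qi in parts:
        for Qj in parts:
            avg = sum(A[x][y] for x in Qi for y in Qj) / (len(Qi) * len(Qj))
            for x in Qi:
                for y in Qj:
                    B[x][y] = avg
    return B

# Adjacency matrix of a 4-cycle and the partition {0,1}, {2,3}:
A = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
B = partition_average(A, [[0, 1], [2, 3]])
l1 = lambda M: sum(map(sum, M)) / len(M) ** 2        # the ||.||_{L1} normalization
assert abs(l1(A) - l1(B)) < 1e-12   # averaging preserves L1 mass of A >= 0
```

Averaging also contracts the cut norm, which is why refining a partition (as done with the common refinement Q below) never increases the approximation error by more than the original one.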

The following simple lemma is an easy consequence of Lemma 7.2 applied to the constant template T ≡ 1/2:

Lemma 7.4. For every ε > 0, k ≥ 1 there exists η = (ε/k)^{O(1)} such that the following holds. For every sequence (Γ_n)_{n≥1} of η-uniform graphs with |V(Γ_n)| = n and density p = p(n) ≥ Ω_ε(1/n), we have

\[
\lim_{n \to \infty} \frac{|S(\Gamma_n; k, \varepsilon)|}{|S(\Gamma_n)|} = 1.
\]


The following lemma is the main technical step in the proof of Theorem 7.2, and roughly says that the space S(Γ_n; K, ε) is very "well spread" in L1 norm:

Lemma 7.5. For every ε > 0, m ≥ 1 there exist η = 2^{−(2m/ε)^{O(1)}} and K = 2^{(2m/ε)^{O(1)}} such that the following holds. If Γ_n is η-uniform, then S(Γ_n; K, ε) ε-contains all constellations of order m in S(Γ_n).

Proof. Let {G_1, ..., G_m} ⊂ S(Γ_n) be any constellation of order m. For every subset I ⊆ [m], define the graph

\[
G_I := \Bigl( \bigcap_{i \in I} G_i \Bigr) \cap \bigcap_{j \in [m] \setminus I} (\Gamma \setminus G_j).
\]
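Concretely, the graphs G_I are the 2^m "Venn cells" of the constellation inside Γ: each edge of Γ lies in G_I for exactly the one index set I = {i : xy ∈ G_i}. A short sketch (our notation) verifying that the cells are disjoint and cover Γ:

```python
def venn_cells(E_Gamma, subgraphs):
    """Split E(Gamma) into the cells G_I indexed by I = {i : e in G_i}.
    Each edge lands in exactly one cell, so the cells are pairwise
    disjoint and their union is E(Gamma)."""
    m = len(subgraphs)
    cells = {}
    for e in E_Gamma:
        I = frozenset(i for i in range(m) if e in subgraphs[i])
        cells.setdefault(I, set()).add(e)
    return cells

E_Gamma = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]}
G1 = {frozenset(e) for e in [(0, 1), (1, 2)]}
G2 = {frozenset(e) for e in [(1, 2), (2, 3)]}
cells = venn_cells(E_Gamma, [G1, G2])
assert set().union(*cells.values()) == E_Gamma                 # cells cover Gamma
assert sum(len(c) for c in cells.values()) == len(E_Gamma)     # and are disjoint
```

Each original G_i is then recovered as the union of the cells G_I with i ∈ I, which is what makes the triangle-inequality bookkeeping over the 2^{m−1} relevant cells work below.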

As G_I ⊆ Γ for all I ⊆ [m], we may apply the Weak Regularity Lemma with error parameter ε/2^{m+1} to each of these graphs. We then obtain, for each I ⊆ [m], a partition P_I of V(Γ) into at most 2^{(2^{m+1}/ε)²} parts such that p^{−1}‖1_{G_I} − (G_I)_{P_I}‖ ≤ ε/2^{m+1}.

Let Q := ⋁_{I⊆[m]} P_I be the common refinement of these partitions; then

\[
|Q| \le \Bigl( 2^{(2^{m+1}/\varepsilon)^2} \Bigr)^{2^m} = 2^{\varepsilon^{-2} 2^{3m+2}} =: K
\]

and

\[
p^{-1}\|1_{G_I} - (G_I)_Q\|
\le p^{-1}\|1_{G_I} - (G_I)_{P_I}\| + p^{-1}\|(G_I)_{P_I} - (G_I)_Q\|
\le \frac{\varepsilon}{2^{m+1}} + p^{-1}\bigl\| ((G_I)_{P_I})_Q - (G_I)_Q \bigr\|
\le \frac{\varepsilon}{2^m} \quad \forall I \subseteq [m].
\]

For each i ∈ [m] we then conclude that

\[
p^{-1}\|1_{G_i} - (G_i)_Q\| \le \sum_{I \subseteq [m]:\, i \in I} p^{-1}\|1_{G_I} - (G_I)_Q\| \le \frac{\varepsilon}{2}.
\]

Note that all the G_I are disjoint and Γ = ⋃_{I⊆[m]} G_I, so Σ_{I⊆[m]} (G_I)_Q = (Γ)_Q. For each I ⊆ [m], we may then construct a random graph H_I ⊆ Γ in the following way: for each edge xy ∈ Γ, independently of all other choices, we put xy in exactly one of the H_I with

\[
\mathbb{P}(xy \in H_I) = \frac{(G_I)_Q(x,y)}{(\Gamma)_Q(x,y)} \quad \forall I \subseteq [m].
\]
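Since Σ_{I⊆[m]} (G_I)_Q = (Γ)_Q, the numbers (G_I)_Q(x,y)/(Γ)_Q(x,y) do form a probability distribution over the index sets I for each fixed edge, so this random construction is well defined. A minimal simulation of the edge-splitting step (with made-up per-edge distributions; in the proof the probabilities come from the averaged templates):

```python
import random

def split_edges(E_Gamma, probs, rng):
    """Independently assign each edge e of Gamma to exactly one part I,
    with probability probs[e][I]; returns the resulting graphs H_I."""
    H = {}
    for e in E_Gamma:
        dist = probs[e]
        parts = list(dist)
        weights = [dist[I] for I in parts]
        I = rng.choices(parts, weights=weights)[0]
        H.setdefault(I, set()).add(e)
    return H

E_Gamma = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
# Made-up distributions over two parts "a" and "b" (summing to 1 per edge):
probs = {e: {"a": 0.5, "b": 0.5} for e in E_Gamma}
H = split_edges(E_Gamma, probs, random.Random(0))
assert sum(len(c) for c in H.values()) == len(E_Gamma)   # each edge in one part
```

By construction the H_I always partition E(Γ), whatever the random choices; the concentration estimates that follow are only needed to guarantee that each H_I also resembles its template (G_I)_Q/(Γ)_Q in cut norm and in L1 mass.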

If η ≤ ε/(2^{m+3}K²4^K) = 2^{−(2m/ε)^{O(1)}}, then by inheritance of cut structure (Lemma 7.2) we have

\[
\mathbb{P}\biggl( \Bigl\| p^{-1}1_{H_I} - \frac{(G_I)_Q}{(\Gamma)_Q} \Bigr\| > \frac{\varepsilon}{2^{m+3}K^2} \biggr) < e^{-(\varepsilon/2^{m+4}K^2)^2 pn^2} \quad \forall I \subseteq [m].
\]

In particular,

\[
\mathbb{P}\biggl( \Bigl| p^{-1}\|1_{H_I}\|_{L^1} - \Bigl\| \frac{(G_I)_Q}{(\Gamma)_Q} \Bigr\|_{L^1} \Bigr| > \frac{\varepsilon}{2^{m+3}K^2} \biggr) < e^{-\Theta_{\varepsilon,m}(pn^2)} \quad \forall I \subseteq [m]. \]  (7.2)

Let Q_1, ..., Q_ℓ (where ℓ ≤ K) be the atoms of Q; then

\[
\Bigl| \Bigl\| \frac{(G_I)_Q}{(\Gamma)_Q} \Bigr\|_{L^1} - p^{-1}\|1_{G_I}\|_{L^1} \Bigr|
\le \Bigl\| \frac{(G_I)_Q}{(\Gamma)_Q} - p^{-1}(G_I)_Q \Bigr\|_{L^1}
= \sum_{i,j \le \ell} \frac{|Q_i||Q_j|}{n^2} \biggl| \frac{e_{G_I}(Q_i,Q_j)}{e_\Gamma(Q_i,Q_j)} - \frac{p^{-1} e_{G_I}(Q_i,Q_j)}{|Q_i||Q_j|} \biggr|
= \sum_{i,j \le \ell} \frac{e_{G_I}(Q_i,Q_j)}{n^2} \biggl| \frac{|Q_i||Q_j|}{e_\Gamma(Q_i,Q_j)} - p^{-1} \biggr|
\]

\[
\le \sum_{\substack{i,j \le \ell \\ |Q_i||Q_j| \ge \eta n^2}} 2\eta p^{-1}\, \frac{e_{G_I}(Q_i,Q_j)}{n^2}
+ \sum_{\substack{i,j \le \ell \\ |Q_i||Q_j| < \eta n^2}} p^{-1}\, \frac{(1+\eta)p \cdot \eta n^2}{n^2}
\le 2\eta K^2 \le \frac{\varepsilon}{2^{m+3}K} \quad \forall I \subseteq [m].
\]

Together with (7.2), this implies

\[
\mathbb{P}\biggl( \bigl| p^{-1}\|1_{H_I}\|_{L^1} - p^{-1}\|1_{G_I}\|_{L^1} \bigr| > \frac{\varepsilon}{2^{m+2}K} \biggr) < e^{-\Theta_{\varepsilon,m}(pn^2)}.
\]

We may then find 2^m graphs H_I, I ⊆ [m], such that

\[
\Bigl\| p^{-1}1_{H_I} - \frac{(G_I)_Q}{(\Gamma)_Q} \Bigr\| \le \frac{\varepsilon}{2^{m+3}K^2}
\quad \text{and} \quad
p^{-1}\bigl| \|1_{H_I}\|_{L^1} - \|1_{G_I}\|_{L^1} \bigr| \le \frac{\varepsilon}{2^{m+2}K}.
\]

Fixing these graphs, we define for each i ∈ [m] the graph H_i := ⋃_{I∋i} H_I and the template T_i := Σ_{I∋i} (G_I)_Q/(Γ)_Q. We note that H_i ⊆ Γ and T_i ∈ T_{V(Γ);K} for all i ∈ [m], and

\[
\|p^{-1}1_{H_i} - T_i\| \le \sum_{I \ni i} \Bigl\| p^{-1}1_{H_I} - \frac{(G_I)_Q}{(\Gamma)_Q} \Bigr\| \le 2^{m-1}\, \frac{\varepsilon}{2^{m+3}K^2} = \frac{\varepsilon}{16K^2};
\]

by definition we conclude that H_1, ..., H_m ∈ S(Γ; K, ε).

Let us now prove that {H_1, ..., H_m} is ε-similar to {G_1, ..., G_m}. First,

\[
p^{-1}\|1_{G_i} - 1_{H_i}\|
\le p^{-1}\|1_{G_i} - (G_i)_Q\| + \|p^{-1}(G_i)_Q - T_i\| + \|T_i - p^{-1}1_{H_i}\|
\le \frac{\varepsilon}{2} + \sum_{I \ni i} \Bigl\| p^{-1}(G_I)_Q - \frac{(G_I)_Q}{(\Gamma)_Q} \Bigr\|_{L^1} + \frac{\varepsilon}{16K^2}
\le \frac{\varepsilon}{2} + 2^{m-1}\,\frac{\varepsilon}{2^{m+3}K} + \frac{\varepsilon}{16K^2} \le \varepsilon \quad \forall i \in [m].
\]

Also, because

\[
1_{H_i \triangle H_j} = \sum_{\substack{I \subseteq [m] \\ |I \cap \{i,j\}| = 1}} 1_{H_I}
\quad \text{and} \quad
1_{G_i \triangle G_j} = \sum_{\substack{I \subseteq [m] \\ |I \cap \{i,j\}| = 1}} 1_{G_I},
\]

we have that

\[
p^{-1}\bigl| \|1_{H_i \triangle H_j}\|_{L^1} - \|1_{G_i \triangle G_j}\|_{L^1} \bigr|
\le \sum_{\substack{I \subseteq [m] \\ |I \cap \{i,j\}| = 1}} p^{-1}\bigl| \|1_{H_I}\|_{L^1} - \|1_{G_I}\|_{L^1} \bigr|
\le 2^{m-1}\,\frac{\varepsilon}{2^{m+2}K} \le \varepsilon
\]

for all i, j ∈ [m], thus finishing the proof.

With these lemmas, we may now proceed to the proof of Theorem 7.2.

Proof of Theorem 7.2. Given ε > 0 and m ≥ 1, let us define

\[
K := 2^{\varepsilon^{-2} 2^{3m+2}} = 2^{(2m/\varepsilon)^{O(1)}}
\quad \text{and} \quad
\eta := \frac{\varepsilon}{2^{m+5}K^4} = 2^{-(2m/\varepsilon)^{O(1)}}.
\]

Suppose Γ is an η-uniform graph with |V(Γ)| = n and density p ≥ Ω_ε(1/n), and that n is sufficiently large. Let S′ := S(Γ; K, ε), D′ := S(K_{V(Γ)}; K, ε), and let T(ε) ⊂ T_{V(Γ);K} be an (ε/16)-net of T_{V(Γ);K} in L1 norm containing e^{Θ_{ε,K}(n)} elements. We apply the inheritance of structure lemmas at each template in this net (using the same random choices) and then apply a union bound to obtain the existence of "model graphs" φ(T) ∈ S(K_{V(Γ)}) and ψ(T) ∈ S(Γ) for every T ∈ T(ε) which satisfy the following conditions:

i) ∀T ∈ T(ε) : ‖1_{φ(T)} − T‖ ≤ ε/16K² and ‖p^{−1}1_{ψ(T)} − T‖ ≤ ε/16K²;

ii) ∀T_1, T_2 ∈ T(ε) : |‖1_{φ(T_1)△φ(T_2)}‖_{L1} − ‖T_1 − T_2‖_{L1}| ≤ ε/8 and |p^{−1}‖1_{ψ(T_1)△ψ(T_2)}‖_{L1} − ‖T_1 − T_2‖_{L1}| ≤ ε/8.

Condition i) tells us that φ(T(ε)) ⊆ D′ and ψ(T(ε)) ⊆ S′; for each T ∈ T(ε), we define f(ψ(T)) := φ(T) and g(φ(T)) := ψ(T). For every other G ∈ S′, H ∈ D′, we may find T_G, T_H ∈ T_{V(Γ);K} such that ‖p^{−1}1_G − T_G‖, ‖1_H − T_H‖ ≤ ε/16K², and also T′_G, T′_H ∈ T(ε) such that ‖T_G − T′_G‖_{L1}, ‖T_H − T′_H‖_{L1} ≤ ε/16 (if several templates satisfy these properties, take any one of them); we then define

f(G) := φ(T′_G),   g(H) := ψ(T′_H).

By Lemma 7.4, we already know that

|S(Γ; K, ε)| > (1 − ε)|S(Γ)|   and   |S(K_{V(Γ)}; K, ε)| > (1 − ε)|S(K_{V(Γ)})|

if n is sufficiently large. By Lemma 7.5, we also know that S(Γ; K, ε) and S(K_{V(Γ)}; K, ε) ε-contain all constellations of order m in S(Γ) and S(K_{V(Γ)}), respectively.

To finish the proof of the theorem it then suffices to prove the following two facts:

1. For all G ∈ S′, H ∈ D′, we have

‖p^{−1}1_G − 1_{f(G)}‖ ≤ ε,   ‖1_H − p^{−1}1_{g(H)}‖ ≤ ε.

This follows from

\[
\|p^{-1}1_G - 1_{f(G)}\| = \|p^{-1}1_G - 1_{\varphi(T'_G)}\|
\le \|p^{-1}1_G - T_G\| + \|T_G - T'_G\| + \|T'_G - 1_{\varphi(T'_G)}\|
\le \frac{\varepsilon}{16K^2} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16K^2} \le \varepsilon
\]

and

\[
\|1_H - p^{-1}1_{g(H)}\| = \|1_H - p^{-1}1_{\psi(T'_H)}\|
\le \|1_H - T_H\| + \|T_H - T'_H\| + \|T'_H - p^{-1}1_{\psi(T'_H)}\|
\le \frac{\varepsilon}{16K^2} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16K^2} \le \varepsilon.
\]

2. For all G_1, G_2 ∈ S′ and H_1, H_2 ∈ D′, we have

‖1_{f(G_1)△f(G_2)}‖_{L1} ≤ p^{−1}‖1_{G_1△G_2}‖_{L1} + ε,   p^{−1}‖1_{g(H_1)△g(H_2)}‖_{L1} ≤ ‖1_{H_1△H_2}‖_{L1} + ε.

This follows from the inequalities

\[
\|T_{G_1} - T_{G_2}\|_{L^1} \le p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} + \frac{\varepsilon}{4}
\quad \text{and} \quad
\bigl| \|1_{f(G_1) \triangle f(G_2)}\|_{L^1} - \|T'_{G_1} - T'_{G_2}\|_{L^1} \bigr| \le \frac{\varepsilon}{8},
\]

so that we have

\[
\|1_{f(G_1) \triangle f(G_2)}\|_{L^1}
\le \|T'_{G_1} - T'_{G_2}\|_{L^1} + \frac{\varepsilon}{8}
\le \|T'_{G_1} - T_{G_1}\|_{L^1} + \|T_{G_1} - T_{G_2}\|_{L^1} + \|T_{G_2} - T'_{G_2}\|_{L^1} + \frac{\varepsilon}{8}
\le \|T_{G_1} - T_{G_2}\|_{L^1} + \frac{\varepsilon}{4}
\le p^{-1}\|1_{G_1 \triangle G_2}\|_{L^1} + \varepsilon.
\]

The inequality p^{−1}‖1_{g(H_1)△g(H_2)}‖_{L1} ≤ ‖1_{H_1△H_2}‖_{L1} + ε is proven similarly.


Chapter 8

Extensions and open problems

In this last chapter we briefly mention some possible extensions of the work done in Chapter 7, which seem amenable to the methods exposed here and indicate a possible path for future work. We also state some natural questions our results leave open and which do not seem to follow from our methods, requiring substantially different ideas.

While the results presented in Chapter 7 focused solely on graphs (more specifically, on subgraphs of pseudorandom graphs), the arguments used in their proofs and the framework developed in this work seem to make it possible to extend them to other combinatorial objects and other notions of pseudorandomness.

Perhaps the simplest and most natural extension of these results would be to generalize them to the space of upper-regular graphs (which were defined in Section 6.2). Indeed, our Lemma 6.2 shows that, under the conditions 1/n ≪ p ≪ 1 we are interested in, every upper-regular graph G is a dense subgraph of some uniform graph Γ_G; it then seems likely that our methods will work in the space of upper-regular graphs too. The main difficulty is that there is no longer a fixed uniform graph Γ which contains all graphs in this space, so we cannot use it as a ground set for the random choices and probabilistic estimates as we did before. However, with a little more care and slightly different concentration inequalities, it seems possible to overcome these difficulties.

It seems natural to expect that the robust properties of the dense model function obtained in Theorem 7.2, as well as the structure-preserving correspondences obtained in Theorem 7.1, should help to transfer or sharpen stability results for graphs in the sparse setting we have worked in. Finding such applications of our results would of course be very interesting.

The robustness of the dense model function given by Theorem 7.2 may also be valuable in settings other than graphs. It would be interesting to see whether it is possible to obtain more general results in this direction, working abstractly with the framework as described in Section 1.1 or in Chapter 2.

In a different direction, it might also be interesting to combine our theorems with other known results in the same setting of subgraphs of pseudorandom graphs. Indeed, our theorems assume a very weak pseudorandomness condition on the host graph Γ, which is therefore satisfied in most cases where some pseudorandomness condition is required.

A natural question which our results leave open is whether a structural correspondence as given in Theorem 7.2 may be proven encompassing all graphs. Since our arguments require all graphs in each space to have a strongly regular partition into essentially the same number of parts, it does not seem likely that the energy-increment methods we have used throughout this work could yield such a universal result.

Another natural question is whether in Theorem 7.2 we could make the "sparse model" function g be the left-inverse of the "dense model" function f (the opposite direction is clearly impossible because of the distinct cardinalities of the two spaces). However, in some sense it is impossible to obtain such a result by using regularity as we have done throughout; more specifically, we cannot obtain this result by passing through the space of templates with a bounded number of steps, which are the approximations of graphs we get from the regularity lemmas.

Indeed, to get such a stronger result we would need to obtain, for each graph G belonging to the domain S′ of the dense model function f, a template T(G) for which the inequalities

\[
p^{-1}\|1_{G_1} - 1_{G_2}\|_{L^1} - \varepsilon \le \|T(G_1) - T(G_2)\|_{L^1} \le p^{-1}\|1_{G_1} - 1_{G_2}\|_{L^1} + \varepsilon
\]

are satisfied for all pairs G_1, G_2 ∈ S′.


Unfortunately, the next lemma shows that unless S′ contains only a negligible proportion of graphs from the sparse space (as was the case in Theorem 7.1), the space of templates is too small in L1 norm to admit such a correspondence:

Lemma 8.1. For every 0 < ε, α < 1/2 and every k ∈ ℕ there exists η > 0 such that the following holds. Let (Γ_n)_{n∈ℕ} be a sequence of η-uniform graphs with |V(Γ_n)| = n and density p = p(n) ≥ Ω_{ε,α}(1/n). Then for all sufficiently large n ∈ ℕ and all sets S′_n ⊆ S(Γ_n) with |S′_n| ≥ ε|S(Γ_n)|, there exists no function T : S′_n → T_{V(Γ_n);k} satisfying

\[
\|T(G_1) - T(G_2)\|_{L^1} \ge p^{-1}\|1_{G_1} - 1_{G_2}\|_{L^1} - \alpha \quad \forall G_1, G_2 \in S'_n.
\]

Proof. Let γ > 0 be a small constant depending on α, and let M be a large integer which we will choose later. Fix a γ-net of T_{V(Γ_n);k} in L1 norm with 2^{Θ_{γ,k}(n)} elements, and choose M graphs G_1, ..., G_M ∈ S′_n satisfying

\[
p^{-1}\|1_{G_i} - 1_{G_j}\|_{L^1} \ge \frac{1}{2} - \gamma \quad \forall i, j \in [M],\ i \ne j.
\]

Suppose the function T : S′_n → T_{V(Γ_n);k} satisfies the condition in the statement of the lemma. For every i, j ∈ [M], let T_i, T_j be templates in the fixed γ-net such that ‖T_i − T(G_i)‖_{L1}, ‖T_j − T(G_j)‖_{L1} ≤ γ. We then have

\[
\|T_i - T_j\|_{L^1}
\ge \|T(G_i) - T(G_j)\|_{L^1} - \|T_i - T(G_i)\|_{L^1} - \|T_j - T(G_j)\|_{L^1}
\ge \|T(G_i) - T(G_j)\|_{L^1} - 2\gamma
\ge p^{-1}\|1_{G_i} - 1_{G_j}\|_{L^1} - \alpha - 2\gamma
\ge \frac{1}{2} - \alpha - 3\gamma.
\]

By choosing γ = (1/2 − α)/4 > 0 we make sure these templates are all distinct. However, if η > 0 is small enough and n is large enough, we obtain that almost all pairs of graphs G_i, G_j ∈ S(Γ_n) satisfy p^{−1}‖1_{G_i} − 1_{G_j}‖_{L1} ≥ 1/2 − α; this gives a contradiction for big n, since the number of graphs in S′_n satisfying this last inequality will be at least (ε/2)|S′_n| ≫ 2^{Θ_{γ,k}(n)}.


Bibliography

[1] Noga Alon, Eldar Fischer, Ilan Newman, and Asaf Shapira. "A combinatorial characterization of the testable graph properties: it's all about regularity". In: SIAM Journal on Computing 39.1 (2009), pp. 143–167.

[2] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. "Efficient testing of large graphs". In: Combinatorica 20.4 (2000), pp. 451–476.

[3] Noga Alon, Richard A. Duke, Hanno Lefmann, Vojtěch Rödl, and Raphael Yuster. "The algorithmic aspects of the regularity lemma". In: Journal of Algorithms 16.1 (1994), pp. 80–109.

[4] Vitaly Bergelson and Alexander Leibman. "Polynomial extensions of van der Waerden's and Szemerédi's theorems". In: Journal of the American Mathematical Society 9.3 (1996), pp. 725–753.

[5] Christian Borgs, Jennifer T. Chayes, László Lovász, Vera T. Sós, and Katalin Vesztergombi. "Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing". In: Advances in Mathematics 219.6 (2008), pp. 1801–1851.

[6] David Conlon and Jacob Fox. "Bounds for graph regularity and removal lemmas". In: Geometric and Functional Analysis 22.5 (2012), pp. 1191–1256.

[7] David Conlon, Jacob Fox, and Yufei Zhao. "Extremal results in sparse pseudorandom graphs". In: Advances in Mathematics 256 (2014), pp. 206–290.

[8] David Conlon and William Timothy Gowers. "Combinatorial theorems in sparse random sets". In: arXiv preprint arXiv:1011.4310 (2010).

[9] Paul Erdős, Peter Frankl, and Vojtěch Rödl. "The asymptotic number of graphs not containing a fixed subgraph and a problem for hypergraphs having no exponent". In: Graphs and Combinatorics 2.1 (1986), pp. 113–121.

[10] Jacob Fox and László Miklós Lovász. "A tight lower bound for Szemerédi's regularity lemma". In: arXiv preprint arXiv:1403.1768 (2014).

[11] Jacob Fox, László Miklós Lovász, and Yufei Zhao. "A fast new algorithm for weak graph regularity". In: arXiv preprint arXiv:1801.05037 (2018).

[12] Alan Frieze and Ravi Kannan. "Quick approximation to matrices and applications". In: Combinatorica 19.2 (1999), pp. 175–220.

[13] Oded Goldreich, Shafi Goldwasser, and Dana Ron. "Property testing and its connection to learning and approximation". In: Journal of the ACM 45.4 (1998), pp. 653–750.

[14] William T. Gowers. "A new proof of Szemerédi's theorem". In: Geometric & Functional Analysis GAFA 11.3 (2001), pp. 465–588.

[15] William T. Gowers. "Lower bounds of tower type for Szemerédi's uniformity lemma". In: Geometric & Functional Analysis GAFA 7.2 (1997), pp. 322–337.

[16] Ben Green and Terence Tao. "The primes contain arbitrarily long arithmetic progressions". In: Annals of Mathematics (2008), pp. 481–547.

[17] Russell Impagliazzo. "Hard-core distributions for somewhat hard problems". In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science. IEEE, 1995, pp. 538–545.

[18] Yoshiharu Kohayakawa. "Szemerédi's regularity lemma for sparse graphs". In: Foundations of Computational Mathematics. Springer, 1997, pp. 216–230.

[19] Yoshiharu Kohayakawa and Vojtěch Rödl. "Szemerédi's regularity lemma and quasi-randomness". In: Recent Advances in Algorithms and Combinatorics. Springer, 2003, pp. 289–351.

[20] János Komlós and Miklós Simonovits. "Szemerédi's regularity lemma and its applications in graph theory". In: Combinatorics, Paul Erdős is Eighty, Vol. 2. Bolyai Society Mathematical Studies, 1996.

[21] János Komlós, Ali Shokoufandeh, Miklós Simonovits, and Endre Szemerédi. "The regularity lemma and its applications in graph theory". In: Theoretical Aspects of Computer Science. Springer, 2002, pp. 84–112.

[22] László Lovász and Balázs Szegedy. "Szemerédi's lemma for the analyst". In: GAFA Geometric And Functional Analysis 17.1 (2007), pp. 252–270.

[23] Brendan Nagle, Vojtěch Rödl, and Mathias Schacht. "The counting lemma for regular k-uniform hypergraphs". In: Random Structures & Algorithms 28.2 (2006), pp. 113–179.

[24] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. "Dense subsets of pseudorandom sets". In: Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS). IEEE, 2008, pp. 76–85.

[25] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. "New proofs of the Green-Tao-Ziegler dense model theorem: An exposition". In: arXiv preprint arXiv:0806.0381 (2008).

[26] Vojtěch Rödl and Mathias Schacht. "Regular partitions of hypergraphs: regularity lemmas". In: Combinatorics, Probability and Computing 16.6 (2007), pp. 833–885.

[27] Vojtěch Rödl and Mathias Schacht. "Regularity lemmas for graphs". In: Fete of Combinatorics and Computer Science. Springer, 2010, pp. 287–325.

[28] Vojtěch Rödl and Jozef Skokan. "Applications of the regularity lemma for uniform hypergraphs". In: Random Structures & Algorithms 28.2 (2006), pp. 180–194.

[29] Vojtěch Rödl and Jozef Skokan. "Regularity lemma for k-uniform hypergraphs". In: Random Structures & Algorithms 25.1 (2004), pp. 1–42.

[30] Endre Szemerédi. "On sets of integers containing no k elements in arithmetic progression". In: Acta Arithmetica 27 (1975), pp. 199–245.

[31] Endre Szemerédi. Regular Partitions of Graphs. Tech. rep., Stanford University, Department of Computer Science, 1975.

[32] Terence Tao. "A variant of the hypergraph removal lemma". In: Journal of Combinatorial Theory, Series A 113.7 (2006), pp. 1257–1280.

[33] Terence Tao. "Structure and randomness in combinatorics". In: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS). IEEE, 2007, pp. 3–15.

[34] Terence Tao. Structure and Randomness: Pages from Year One of a Mathematical Blog. American Mathematical Society, 2008.

[35] Terence Tao. "The Gaussian primes contain arbitrarily shaped constellations". In: Journal d'Analyse Mathématique 99.1 (2006), pp. 109–176.

[36] Terence Tao and Tamar Ziegler. "The primes contain arbitrarily long polynomial progressions". In: Acta Mathematica 201.2 (2008), pp. 213–305.

[37] Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. "Regularity, boosting, and efficiently simulating every high-entropy distribution". In: Proceedings of the 24th Annual IEEE Conference on Computational Complexity (CCC). IEEE, 2009, pp. 126–136.

[38] Yufei Zhao. "Hypergraph limits: a regularity approach". In: Random Structures & Algorithms 47.2 (2015), pp. 205–226.