
Universidade Federal de Minas Gerais

Instituto de Ciências Exatas

Departamento de Matemática

Agglomeration in scale-free random graphs

by

Rodrigo Botelho Ribeiro

Belo Horizonte

2016

Universidade Federal de Minas Gerais
Instituto de Ciências Exatas
Departamento de Matemática

Agglomeration in scale-free random graphs

by

Rodrigo Botelho Ribeiro
Advisor: Remy Sanchis

Thesis presented to the Departamento de Matemática of the Universidade Federal de Minas Gerais in partial fulfillment of the requirements for the degree of

DOCTOR IN MATHEMATICS

Belo Horizonte, July 18, 2016

* The author held a CAPES scholarship during the preparation of this work.


"Young man, in mathematics you do

not understand things. You just get

used to them."

John von Neumann

Acknowledgments

To my parents and to all the candies and chocolates they sold to pay for my basic education. To my advisor and to the public structure of UFMG, which showed me the fantastic world of science. And last but not least, to the lovely sprout that was born in my garden.

Summary

In this work we investigate three random graph models that generate scale-free graphs. Our interest lies in the formation of complete subgraphs, clustering coefficients, degree distributions, and diameter. Our main results show the existence, asymptotically almost surely, of a complete subgraph whose order goes to infinity, as well as an upper bound for the diameter in one of the models. We also show that in one of the models, known as Holme-Kim, the local and global clustering coefficients behave quite differently. While the former stays away from zero, the latter tends to zero as time goes to infinity.

Keywords: graphs, cliques, scale-free, agglomeration, diameter, power law.

Abstract

In this work we investigate three random graph models capable of generating scale-free graphs. Our main interest lies in the formation of cliques, clustering coefficients, the degree distribution, and diameters. Our main results show the existence, asymptotically almost surely, of a clique whose order goes to infinity as the graph's order goes to infinity, concentration inequalities for the degrees, and an upper bound for the diameter in one of these models. We also show that in the model known as the Holme-Kim model, the local and global clustering coefficients exhibit quite distinct behavior: whereas the former is bounded away from zero a.a.s., the latter goes to zero as time goes to infinity.

Keywords: graphs, cliques, scale-free, agglomeration, diameters, power law

CONTENTS

1. Introduction
1.1 Basic notation and definitions
1.1.1 Clustering coefficients
1.2 Overview
1.3 Brief discussion of our results

2. The Holme-Kim model
2.1 Definition of the process
2.1.1 Our results
2.1.2 Heuristics and a seemingly general phenomenon
2.1.3 Main technical ideas
2.1.4 Organization
2.2 Formal definition of the process
2.3 Technical estimates for vertex degrees
2.3.1 Degree increments
2.3.2 Upper bounds on vertex degrees
2.4 Positive local clustering
2.5 Vanishing global clustering
2.5.1 Preliminary estimates for number of cherries
2.5.2 The bootstrap argument
2.5.3 Wrapping up
2.6 Final comments on clustering
2.7 Power law degree distribution

3. The p-model
3.1 Bounds for the vertices' degree
3.1.1 Upper bounds
3.1.2 Lower bounds
3.2 The Community
3.3 Small-World Phenomenon

4. The p(t)-model
4.1 Continuity of the parameter
4.2 The case $p(t) = 1/t^{\gamma}$
4.2.1 Power-law distribution
4.2.2 The expected maximum degree
4.3 The case $p(t) = 1/\log(t)$
4.3.1 Power-law distribution
4.3.2 Lower bounds for the diameter

5. Appendix
5.1 Auxiliary Lemmas
5.2 Concentration Inequalities for Martingales


1. INTRODUCTION

Before we survey the area and introduce the problems treated in this thesis, we begin with the basic notation and definitions needed to understand our results. We believe this will make the reading easier for those who are not familiar with all the graph-theoretic notation. Readers already familiar with graph theory may start with the overview and return to the basic notation when necessary.

1.1 Basic notation and definitions

In this section, we give all the general notation and definitions needed to state and prove our results. We start with the most basic and general graph theory notation.

Recall that a graph $G = (V_G, E_G)$ consists of a set $V_G$ of vertices and a set $E_G \subset \binom{V_G}{2}$ of edges. Given $G$ and $v, w \in V_G$, we say $v$ and $w$ are neighbors, and write $v \leftrightarrow w$, if $\{v, w\} \in E_G$. We also write $\Gamma_G(v) = \{w \in V_G : w \leftrightarrow v\}$ for the neighborhood of $v \in V_G$ and $e(\Gamma_G(v))$ for the number of edges between the neighbors of $v$. In this case, we define the degree of a vertex $v$ in $G$ as the number of neighbors it has in $G$, and we denote this quantity by $d_G(v)$. However, some of our models may generate multiple edges, i.e., two vertices may be connected by more than one edge, or loops, i.e., edges whose endpoints are the same vertex. In these cases, we let $e_G(v, w)$ be the number of edges connecting $v$ and $w$ and define $d_G(v)$ in terms of $e_G(v, w)$ as $d_G(v) = \sum_{w \in \Gamma_G(v)} e_G(v, w)$, i.e., the total number of edges having $v$ as one of their ends.

Another quantity of interest is the maximum degree of a graph or multigraph. It is the value of the largest degree in $G$, and we denote it by $d_{\max}(G)$.

A complete subgraph in $G$ is a subgraph $S \subset G$ whose vertices are all connected, by at least one edge, to each other. The order of the complete subgraph is simply the number of vertices it has. Usually, we will call a complete subgraph in $G$ a clique or community.

A triangle in a graph $G = (V_G, E_G)$ is a subset $\{u, v, w\} \in \binom{V_G}{3}$ with $u \leftrightarrow v \leftrightarrow w \leftrightarrow u$. We denote the number of triangles formed by a set of vertices $S$ by $\Delta(S)$. Thus, the total number of triangles in $G$ is $\Delta(G)$.

A path of length $k$ in $G$ is an alternating sequence of vertices and edges $v_0, e_0, \dots, e_{k-1}, v_k$ in which $v_i \neq v_j$ for all $i \neq j$ with $i, j \le k$, and the edge $e_j$ connects the vertex $v_j$ to the vertex $v_{j+1}$. The diameter of $G$, denoted by $\mathrm{diam}(G)$, is the largest distance between two vertices of $G$, where the distance between two vertices is the length of a shortest path connecting them. We are also particularly interested in paths of length 2, which we call cherries. Observe that $\binom{d_G(v)}{2}$ counts the number of cherries whose middle vertex is $v$. Thus, $\sum_{v \in V_G} \binom{d_G(v)}{2}$ counts the total number of cherries in $G$, a quantity we denote by $C_G$. For a fixed vertex $v \in V_G$, we denote the number of triangles containing $v$ by $\Delta_G(v)$.

We will often consider a sequence of graphs $\{G_t\}_t$, indexed by a discrete time parameter $t \ge 0$. When considering this sequence, we replace the subscript $G_t$ with $t$ in our graph notation, so that $V_t := V_{G_t}$, $E_t := E_{G_t}$, etc. Given a sequence of numerical values $\{x_t\}_{t \ge 0}$ depending on $t$, we let $\Delta x_t := x_{t+1} - x_t$.

Finally, we will use the Landau $o$/$O$/$\Theta$ notation at several points of our discussion. This always presupposes some asymptotics as a parameter $n$ or $t$ of interest goes to $+\infty$. Which parameter this is will always be clear from the context.

1.1.1 Clustering coefficients

We now define the clustering coefficients that appear in our work; later we point out why their behavior must be clarified to avoid the confusion found in some works.

Definition 1 (Local clustering coefficient at $v$). Given a vertex $v \in G$ of degree at least two, the local clustering coefficient at $v$ is

$$C_G(v) = \frac{\Delta_G(v)}{\binom{d_G(v)}{2}}.$$

For vertices of degree one, we set the local clustering coefficient to zero.

Notice that $0 \le C_G(v) \le 1$ always, since there can be at most one triangle in $\{v\} \cup \Gamma_G(v)$ for each pair of neighbors of $v$. In probabilistic terms, $C_G(v)$ measures the probability that a pair of random neighbors of $v$ form an edge, i.e., how likely it is that “two friends of $v$ are also each other's friends.”

The two coefficients for the graph $G$ are as follows.

Definition 2 (Local and global clustering coefficients). The local clustering coefficient of $G$ is defined as

$$C^{loc}_G := \frac{\sum_{v \in V_G} C_G(v)}{|V_G|},$$

whereas the global coefficient is

$$C^{glo}_G := 3 \times \frac{\Delta(G)}{\sum_{v \in V_G} \binom{d_G(v)}{2}}.$$

Bollobás and Riordan [1, page 18] have observed that $C^{loc}_G$ and $C^{glo}_G$ are used interchangeably in the non-rigorous literature. They warned that:

In some papers it is not clear which of the two definitions above is intended; when it is clear, sometimes $C^{loc}$ is used, and sometimes $C^{glo}$; In more balanced graphs the definitions will give more similar values, but they will still differ by at least a constant factor much of the time.

In fact, more extreme differences are possible for non-regular graphs.

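Build a graph $G$ consisting of an edge $e$ and $n - 2$ other vertices connected to the two endpoints of $e$. The $n - 2$ added vertices have local clustering coefficient 1 and each endpoint of $e$ has coefficient $\frac{2}{n-1}$, so $C^{loc}_G = 1 - \frac{2}{n} + \frac{4}{n(n-1)} \ge 1 - \frac{2}{n}$. On the other hand, it is straightforward to see that $C^{glo}_G = \frac{3(n-2)}{(n-2) + (n-1)(n-2)} = \frac{3}{n}$.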
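The example can be checked numerically. The following is a minimal sketch, not code from the thesis: the function names (`example_graph`, `local_clustering`, `global_clustering`) and the adjacency-set representation are our own assumptions, and the code handles simple graphs only. It evaluates Definitions 1 and 2 exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def example_graph(n):
    """Adjacency sets for the example: edge {0, 1} plus n - 2 vertices joined to both 0 and 1."""
    adj = {v: set() for v in range(n)}

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    add_edge(0, 1)
    for v in range(2, n):
        add_edge(v, 0)
        add_edge(v, 1)
    return adj


def triangles_at(adj, v):
    """Number of triangles containing v (simple graphs only)."""
    return sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])


def cherries_at(adj, v):
    """binom(d(v), 2): number of cherries whose middle vertex is v."""
    d = len(adj[v])
    return d * (d - 1) // 2


def local_clustering(adj):
    """Average of the per-vertex coefficients; vertices of degree < 2 contribute zero."""
    total = Fraction(0)
    for v in adj:
        c = cherries_at(adj, v)
        if c > 0:
            total += Fraction(triangles_at(adj, v), c)
    return total / len(adj)


def global_clustering(adj):
    """3 * (number of triangles) / (number of cherries)."""
    # Summing over vertices counts each triangle three times, once per corner.
    triangles = sum(triangles_at(adj, v) for v in adj) // 3
    cherries = sum(cherries_at(adj, v) for v in adj)
    return Fraction(3 * triangles, cherries)
```

For instance, `global_clustering(example_graph(10))` evaluates to `Fraction(3, 10)`, matching $3/n$, while `local_clustering(example_graph(10))` evaluates to `Fraction(38, 45)`, which stays close to 1.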

One of our results shows that there is a random graph model which obeys a power-law distribution and satisfies the less extreme relation $C^{loc}_{G_t} > 0$ and $C^{glo}_{G_t} \to 0$ for $t$ large. This contradicts the numerical findings of [26], where the Holme-Kim model is cited as a model that exhibits positive “clustering coefficient”, but the definition given is $C^{glo}_G$, and also the statement made in [25], which affirms that both definitions are the same up to constant factors.

1.2 Overview

Since the 2000s, the scientific community has made efforts to understand the structure of large complex networks (e.g., Twitter [15], scientific coauthorship networks [23]) and also to propose and investigate random models capable of capturing properties pointed out by empirical results. Some models have been shown to reproduce some of these properties. Three good examples are: the model proposed by Strogatz and Watts [29], which captures a phenomenon known as small-world; the one proposed by Albert and Barabási [2], whose rule of evolution, known as preferential attachment, is capable of producing graphs obeying a power-law distribution; and another one proposed by Holme and Kim [18], which captures the tendency that some networks have of closing triangles. However, the same models have failed to capture other properties also seen in real-world networks: the Strogatz-Watts model does not obey power laws and the Albert-Barabási model does not have agglomeration [1].


Over the years, variations of the classic models have appeared, mainly of the Albert-Barabási model, followed by papers that have investigated whether these random graph models had certain properties beyond the power law. In [14], the authors determine, asymptotically, the expected value of the global clustering coefficient of a random graph process which is a slight modification of the Albert-Barabási model. Although this modification still obeys a power law, it has neither large cliques nor a high clustering coefficient. In [13], the diameter of a large class of models with preferential attachment rules is investigated, and all cases considered have diameters tending to infinity as the graph's order increases. In [4, 9, 12], a model combining preferential attachment rules and vertex fitness is investigated; however, little is known about its structure.

Other important related questions concern the spread of diseases in scale-free graphs and their vulnerability to deliberate attack (see [6] for an example), which are related to the existence of complete subgraphs. In the context of vulnerability, cliques play a significant role. When the attack is completely random, large cliques have a high probability of remaining connected. On the other hand, deliberate attacks directed towards them represent a threat to the network's connectedness. Still in the practical context, the presence of certain subgraphs in biological networks, called motifs, is related to functional properties selected by evolution [21].

The importance of cliques goes beyond the examples cited above. They are useful in graph theory, too. The clique number of $G$ provides a lower bound on the chromatic number of $G$, i.e., the minimum number of colors such that $G$ has a proper coloring. Furthermore, it also provides a lower bound on the number of triangles in $G$, a fundamental quantity for studying the so-called global clustering coefficient of $G$ (see e.g. [1, 14, 26, 28]). For more works related to cliques in scale-free random graphs, see also [5, 16, 17].

In short, we may say all these models wanted to achieve the following desiderata.

• Power-law degree distribution. This means that the fraction of nodes with degree $k$ should be roughly of the order $k^{-\beta}$ for some $\beta > 0$. This contrasts starkly with the situation in Erdős-Rényi graphs with a similar edge density, where the fraction $p_k$ of nodes with degree $k$ would have a Poisson shape. Power-law degree distributions are generally believed to exist in a wide variety of real-life complex networks [24]. The Barabási-Albert model is rigorously known to satisfy this property, with $\beta = 3$ [8].

• Clustering. In addition to the power law, social and metabolic networks have the feature that “friends of friends tend to be friends”, i.e., if a node $a$ is connected to both $b$ and $c$, the latter have a higher propensity of being connected. This trait is not captured by the Barabási-Albert model. The Small World model of Strogatz and Watts [29], which is also extremely well known, does have this property (but has no power law).

• Small distances. A third characteristic of real-life networks is that, unlike e.g. paths or patches of regular lattices, they tend to have diameter that is logarithmic in the number of nodes. This property is shared by the two aforementioned models and in fact by many others, including some homogeneous ones.

• Large communities. Real-life networks, mainly social networks, contain large complete subgraphs, also called communities in this context, which are explained by the tendency people have of forming large groups where everyone knows everyone.

Unfortunately, these models have failed in two aspects:

1. the graphs generated by them do not simultaneously capture all the properties seen in empirical results;

2. their dynamics do not take into account connections made by affinity among the vertices, i.e., their dynamics ignore the empirically observed phenomenon that vertices also tend to connect to other vertices of similar age or sharing some other characteristic.

Beyond questions about the models' capability of capturing desired properties, we have observed that even the set of properties is not well defined, in the sense that some of these properties still have definition problems, as is the case for the concept of the clustering coefficient, which we have discussed in Section 1.1.1.

To conclude this brief overview, we observe that despite many investigations of random graph models with preferential attachment rules, such as [22, 14, 13], we believe the scientific community has not yet explored the subject in its full range. There are relevant models in the computer science literature whose properties have not been established with mathematical rigor, and some structures of the graphs generated by these models, such as large cliques, have not received enough attention. The present thesis is a contribution to the matters above. We investigate models which may simultaneously capture all the aforementioned properties and also propose another one we believe is more realistic than those we have studied.


1.3 Brief discussion of our results

We have investigated three random graph models, which will be formally defined in separate chapters. For now we limit ourselves to informally stating the main results we have concerning each of them.

Holme-Kim model:

• The degree distribution follows a power law with exponent $\beta = 3$. We also prove sharp concentration for this power law;

• With high probability, the model has a local clustering coefficient bounded away from zero;

• With high probability, the model's global clustering coefficient goes to zero as the number of nodes in the graph goes to infinity; we also compute its decay speed;

• We give a formal explanation for the different behavior of the two coefficients;

• We also prove the Weak Law of Large Numbers for the number of paths of length two in the graphs generated by this model.

The p-model:

• The degree distribution follows a power law whose exponent is a function of the parameter $p$, given by $\beta = 2 + \frac{p}{2-p}$. Actually, this result may be found in [10], but we cite it here for completeness;

• The existence, with high probability, of a community whose order goes to infinity as the graph's size increases. We specify the community's order as a function of the model's parameter $p$. This result may be found in the preprint at http://arxiv.org/abs/1509.04650;

• By a coupling with the classical Albert-Barabási model, we show that the model exhibits the small-world phenomenon.

The p(t)-model:

• The degree distribution follows a power law whose exponent depends on how fast the function $p(t)$ goes to zero;

• A lower bound for the diameter for a specific function p(t).


2. THE HOLME-KIM MODEL

In this chapter we provide a rigorous analysis of a specific non-homogeneous random graph model, whose motivation was to combine scale-freeness and clustering. This model was introduced in 2001 by Holme and Kim [19]. The Holme-Kim or HK model describes a random sequence of graphs $\{G_t\}_{t \in \mathbb{N}}$ that will be formally defined in Section 2.2. Here is an informal description of this evolution. Fix two parameters $p \in [0, 1]$ and a positive integer $m > 1$, and start with a graph $G_1$. For $t > 1$, the evolution from $G_{t-1}$ to $G_t$ consists of the addition of a new vertex $v_t$ and $m$ new edges between $v_t$ and vertices already in $G_{t-1}$. These $m$ edges are added sequentially in the following fashion.

1. The first edge is always attached following the preferential attachment (PA) mechanism, that is, it connects to a previously existing node $w$ with probability proportional to the degree of $w$ in $G_{t-1}$.

2. Each of the $m - 1$ remaining edges makes a random decision on how to attach.

(a) With probability $p$, the edge is attached according to the triad formation (TF) mechanism. Let $w'$ be the node of $G_{t-1}$ to which the previous edge was attached. Then the current edge connects to a neighbor $w$ of $w'$ chosen with probability proportional to the number of edges between $w$ and $w'$.

(b) With probability $1 - p$ ($p \in [0, 1]$ fixed), the edge follows the same PA mechanism as the first edge (with fresh random choices).

The case $p = 0$ of this process, where only preferential attachment steps are performed, is essentially the Barabási-Albert model [3]. The triad formation steps, on the other hand, are reminiscent of the copying model by Kumar et al. [20]. Holme and Kim argued on the basis of simulations and non-rigorous analysis that their model has the properties of scale-freeness and positive clustering.

Our rigorous results partly confirm their findings. The degree power law can be checked by known methods. On the other hand, we show that there were aspects of the clustering phenomenon (or lack thereof) that were not made evident in [19] or in other papers in the large networks literature (e.g. [27]). We will see that the question of whether the Holme-Kim model has clustering admits two different answers depending on how we define clustering.

2.1 Definition of the process

In this section we give a more formal definition of the Holme-Kim process.

The model has two parameters: a positive integer $m \ge 2$ and a real number $p \in [0, 1]$. It produces a graph sequence $\{G_t\}_{t \ge 1}$ which is obtained inductively according to the growth rule we describe below.

Initial state. The initial graph $G_1$ is taken to be the graph with vertex set $V_1 = \{1\}$ and a single edge, which is a self-loop.

Evolution. For $t \ge 1$, obtain $G_{t+1}$ from $G_t$ by adding to it a vertex $t + 1$ and $m$ edges between $t + 1$ and vertices $Y^{(i)}_{t+1} \in V_t$, $1 \le i \le m$. These vertices are chosen as follows. Let $\mathcal{F}_t$ be the $\sigma$-field generated by all random choices made in our construction up to time $t$. Assume we are given i.i.d. Bernoulli($p$) random variables $(\xi^{(i)}_{t+1})$ independent from $\mathcal{F}_t$. We define:

$$P\left(Y^{(1)}_{t+1} = u \,\middle|\, \mathcal{F}_t\right) = \frac{d_t(u)}{2mt},$$

which means the first choice of vertex is always made using the preferential attachment mechanism. The next $m - 1$ choices $Y^{(i)}_{t+1}$, $2 \le i \le m$, are made as follows: let $\mathcal{F}^{(i-1)}_t$ be the $\sigma$-field generated by $\mathcal{F}_t$ and all subsequent random choices made in choosing $Y^{(j)}_{t+1}$ for $1 \le j \le i - 1$. Then:

$$P\left(Y^{(i)}_{t+1} = u \,\middle|\, \xi^{(i)}_{t+1} = x, \mathcal{F}^{(i-1)}_t\right) =
\begin{cases}
\dfrac{d_t(u)}{2mt} & \text{if } x = 0, \\[2mm]
\dfrac{e_t(Y^{(i-1)}_{t+1}, u)}{d_t(Y^{(i-1)}_{t+1})} & \text{if } x = 1 \text{ and } u \in \Gamma_t(Y^{(i-1)}_{t+1}), \\[2mm]
0 & \text{otherwise.}
\end{cases}$$

In words, for each choice of the $m - 1$ endpoints, we flip an independent coin of parameter $p$ and decide according to it which mechanism we use to choose the endpoint. With probability $p$ we use the triad formation mechanism, i.e., we choose the endpoint among the neighbors of the previously chosen vertex $Y^{(i-1)}_{t+1}$. With probability $1 - p$, we make a fresh choice from $V_t$ using the preferential attachment mechanism. In this sense, if $\xi^{(i)}_t = 1$ we say we have taken a TF-step. Otherwise, we say we have taken a PA-step.


2.1.1 Our results

The main results in this chapter show that such a disparity between local and global clustering does indeed occur in the specific case of the Holme-Kim model, albeit in a less extreme form than what the example in Section 1.1.1 suggests.

Theorem 2.1.1 (Positive local clustering for HK). Let $\{G_t\}_{t \ge 0}$ be the sequence of graphs generated by the Holme-Kim model with parameters $m \ge 2$ and $p \in (0, 1)$. Then there exists $c > 0$ depending only on $m$ and $p$ such that the local clustering coefficients $C^{loc}_{G_t}$ of the graphs $G_t$ satisfy:

$$\lim_{t \to +\infty} P\left(C^{loc}_{G_t} \ge c\right) = 1.$$

Theorem 2.1.2 (Vanishing global clustering for HK). Let $\{G_t\}_{t \ge 0}$ be as in Theorem 2.1.1. Then there exist constants $b_1, b_2 > 0$ depending only on $m$ and $p$ such that the global clustering coefficients $C^{glo}_{G_t}$ satisfy:

$$\lim_{t \to +\infty} P\left(\frac{b_1}{\log t} \le C^{glo}_{G_t} \le \frac{b_2}{\log t}\right) = 1.$$

Thus for large $t$, one of the two clustering coefficients is typically far from 0, whereas the other one goes to 0 in probability, albeit at a small rate. This shows that the remark by Bollobás and Riordan is very relevant in the analysis of at least one network model. Our results contradict the numerical findings of [27], where the Holme-Kim model is cited as a model that exhibits positive “clustering coefficient”, but the definition of clustering used corresponds to the global coefficient.¹

For completeness, we will also check in the Appendix that the HK model is scale-free with power-law exponent $\beta = 3$. The proof follows from standard methods in the literature.

Theorem 2.1.3 (The power law for HK). Let $\{G_t\}_{t \ge 0}$ be as in the previous theorem. Also let $N_t(d)$ be the number of vertices of degree $d$ in $G_t$ and set

$$D_t(d) := \frac{E N_t(d)}{t}.$$

Then

$$\lim_{t \to \infty} D_t(d) = \frac{2(m+3)(m+1)}{(d+3)(d+2)(d+1)}$$

and

$$P\left(|N_t(d) - D_t(d)\, t| \ge 16 d \cdot c \cdot \sqrt{t}\right) \le (t+1)^{d-m} e^{-c^2}.$$

¹ Their error may be partly explained by the slow decay of $C^{glo}_{G_t}$.


2.1.2 Heuristics and a seemingly general phenomenon

The disparity between $C^{loc}_G$ and $C^{glo}_G$ should be a general phenomenon for large scale-free graph models with many (but not too many) triangles. This will transpire from the following heuristic analysis of the Holme-Kim case with $p \in (0, 1)$.

To begin, it is not hard to understand why Theorem 2.1.1 should be true. By Theorem 2.1.3, there is a positive fraction of nodes with degree $m$. Moreover, a positive fraction of these vertices are contained in at least one triangle because of TF steps. A more general observation may be made.

Reason for positive local clustering: if a positive fraction of nodes have degree $d$ (assumed constant), and a positive fraction of these nodes are contained in at least one triangle, then the local clustering coefficient $C^{loc}_{G_t}$ must be bounded away from zero.

We now argue that the vanishing of $C^{glo}_{G_t}$ should be a consequence of the power law degree distribution. The global clustering coefficient $C^{glo}_{G_t}$ is essentially the ratio of the number of triangles to the number of cherries in $G_t$, the latter being denoted by $C_t$. Now, one can easily show that the number of triangles in $G_t$ grows linearly in $t$ with high probability, so:

$$C^{glo}_{G_t} \approx \frac{\#\text{ of triangles in } G_t}{C_t} \approx \frac{t}{C_t}.$$

To estimate $C_t$, we note that each vertex $v$ of degree $d$ in $G_t$ is the “middle vertex” of exactly $d(d-1)/2 \approx d^2$ cherries. This means

$$\frac{C_t}{t} \approx \sum_{d=1}^{t} \frac{N_t(d)}{t}\, d^2 \quad \left(\frac{N_t(d)}{t} \sim D_t(d) \approx d^{-3} \text{ by Theorem 2.1.3}\right) \quad \approx \sum_{d=1}^{t} \frac{1}{d} \approx \log t.$$

Our reasoning is not rigorous because it requires bounds on $N_t(d)$ for very large $d$. However, we feel our argument is compelling enough to be true for many models. In fact, considering the case where $N_t(d)/t \approx d^{-\beta}$ for $0 < \beta \le 3$, one is led to the following.

Heuristic reason for large number of cherries: if the fraction of nodes of degree $d$ in $G_t$ is $\approx d^{-\beta}$ for some $0 < \beta \le 3$, the number of cherries $C_t$ is superlinear in $t$. More precisely, we expect $C_t/t \approx t^{3-\beta}$ for $0 < \beta < 3$ and $C_t/t \approx \log t$ for $\beta = 3$.
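The exponents in this heuristic come from comparing a sum with an integral. The following worked computation is our own sketch of that step, under the same non-rigorous assumptions as the text: degrees are cut off at $d = t$, each vertex of degree $d$ contributes about $d^2$ cherries, and constant factors are ignored.

```latex
\[
  \frac{C_t}{t}
  \approx \sum_{d=1}^{t} d^{2}\cdot d^{-\beta}
  = \sum_{d=1}^{t} d^{2-\beta}
  \approx \int_{1}^{t} x^{2-\beta}\,dx
  = \begin{cases}
      \dfrac{t^{3-\beta}-1}{3-\beta} & \text{if } 0<\beta<3,\\[2mm]
      \log t & \text{if } \beta=3.
    \end{cases}
\]
```

In both cases the right-hand side diverges as $t \to \infty$, so a number of triangles that is linear in $t$ forces the ratio $t / C_t$, and hence the global coefficient, to vanish.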


The power law range $0 < \beta \le 3$ corresponds to most models of large networks in the literature. Likewise, we believe that the disparity between $C^{glo}_{G_t}$ and $C^{loc}_{G_t}$ should hold for all “natural” random graph sequences with many triangles and power law degree distribution with exponent $0 < \beta \le 3$. The general message is this.

Heuristic disparity between local and global clustering: achieving positive local clustering is “easy”: just introduce a density of triangles in sparse areas of the graph. On the other hand, if the number of triangles in $G_t$ grows linearly with time, and the fraction of nodes of degree $d$ in $G_t$ is $\approx d^{-\beta}$ for some $0 < \beta \le 3$, then one expects a vanishingly small global clustering coefficient.

2.1.3 Main technical ideas

At a high level, our proofs follow standard ideas from previous rigorous papers on complex networks. For instance, suppose one wants to keep track of the number of nodes of degree $d$ at time $t$, for $d = m, m+1, \dots, D$. Letting $N_t = (N_t(m), \dots, N_t(D))$, the basic strategy adopted in several previous papers is to find a deterministic matrix $M_{t-1}$ and a deterministic vector $r_{t-1}$, both measurable with respect to $G_0, \dots, G_{t-1}$, such that:

\[
N_t = M_{t-1} N_{t-1} + r_{t-1} + \epsilon_t, \quad \text{where } E(\epsilon_t \mid G_0, \dots, G_{t-1}) \approx 0.
\]

This can be seen as a “noisy version” of the deterministic recursion $N_t = M_{t-1} N_{t-1} + r_{t-1}$, with $\epsilon_t$ the “noise” term. One then studies the recursion and uses martingale techniques (especially the Azuma-Hoeffding inequality) to prove that $N_t$ concentrates around the solution of the deterministic recursion. Our own proof of the degree power law follows this outline, and is only slightly different from the one in [11].

Once the degree sequence is analyzed, Theorem 2.1.1 is then a matter of observing that a density of vertices of degree $m$ will be contained in at least one triangle, due to a TF step. On the other hand, the analysis of global clustering is harder due to the need to estimate the number of cherries $C_t$. Justifying the heuristic calculation above would require strong control of the degree distribution up to very large values of $d$. We opt instead to write a “noisy recursion” for $C_t$ itself. However, the increments in this noisy recursion can be quite large, and the Azuma-Hoeffding inequality is not enough to control the process. We use instead Freedman's concentration inequality, which involves the quadratic variation, but even that is delicate because the variation might “blow up” in certain unlikely events. In the end, we use a kind of “bootstrap” argument, whereby a preliminary estimate of $C_t$ is fed back into the martingale calculation to give sharper control of the predictable terms and the variation. The upshot is a Weak Law of Large Numbers for $C_t$, given in Theorem 2.1.4 below:


Theorem 2.1.4 (The Weak Law of Large Numbers for $C_t$). Let $C_t$ be the number of cherries in $G_t$. Then

\[
\frac{C_t}{t \log t} \xrightarrow{\;P\;} \binom{m+1}{2}.
\]

Overall, we believe our martingale analysis of $C_t$ is our main technical contribution.

2.1.4 Organization.

The remainder of this Chapter is organized as follows. Section 2.2 presents a formal definition of the model. In Section 2.3 we prove technical estimates for the degree which will be useful throughout the Chapter. Sections 2.4 and 2.5 are devoted to proving the bounds for the local and the global clustering coefficients, respectively. Section 2.6 presents a comparative explanation for the very different behavior of the two clustering coefficients. The proof of the power-law distribution is given at the end of the Chapter, in Section 2.7, because it follows well known martingale arguments.

2.2 Formal definition of the process

In this section we give a more formal definition of the Holme-Kim process (compare with the beginning of this Chapter).

The model has two parameters: a positive integer $m \ge 2$ and a real number $p \in [0, 1]$. It produces a graph sequence $\{G_t\}_{t \ge 1}$ which is obtained inductively according to the growth rule we describe below.

Initial state. The initial graph $G_1$ is taken to be the graph with vertex set $V_1 = \{1\}$ and a single edge, which is a self-loop.

Evolution. For $t > 1$, obtain $G_{t+1}$ from $G_t$ by adding to it a vertex $t+1$ and $m$ edges between $t+1$ and vertices $Y^{(i)}_{t+1} \in V_t$, $1 \le i \le m$. These vertices are chosen as follows. Let $\mathcal{F}_t$ be the $\sigma$-field generated by all random choices made in our construction up to time $t$. Assume we are given i.i.d. random variables $(\xi^{(i)}_{t+1})$ independent from $\mathcal{F}_t$. We define:

\[
P\left( Y^{(1)}_{t+1} = u \,\middle|\, \mathcal{F}_t \right) = \frac{d_t(u)}{2mt},
\]

which means the first choice of vertex is always made using the preferential attachment mechanism. The next $m-1$ choices $Y^{(i)}_{t+1}$, $2 \le i \le m$, are made as follows: let $\mathcal{F}^{(i-1)}_t$ be the $\sigma$-field generated by $\mathcal{F}_t$ and all subsequent random choices made in choosing $Y^{(j)}_{t+1}$ for $1 \le j \le i-1$. Then:

\[
P\left( Y^{(i)}_{t+1} = u \,\middle|\, \xi^{(i)}_{t+1} = x, \mathcal{F}^{(i-1)}_t \right) =
\begin{cases}
\dfrac{d_t(u)}{2mt} & \text{if } x = 0,\\[2mm]
\dfrac{e_t(Y^{(i-1)}_{t+1}, u)}{d_t(Y^{(i-1)}_{t+1})} & \text{if } x = 1 \text{ and } u \in \Gamma_t(Y^{(i-1)}_{t+1}),\\[2mm]
0 & \text{otherwise.}
\end{cases}
\]

In words, for each choice of the $m-1$ end points, we flip an independent coin of parameter $p$ and decide according to it which mechanism we use to choose the end point. With probability $p$ we use the triad formation mechanism, i.e., we choose the end point among the neighbors of the previously chosen vertex $Y^{(i-1)}_{t+1}$. With probability $1-p$, we make a fresh choice from $V_t$ using the preferential attachment mechanism. In this sense, if $\xi^{(i)}_t = 1$ we say we have taken a TF-step. Otherwise, we say a PA-step was performed.

2.3 Technical estimates for vertex degrees

In this section we collect several results on vertex degrees. Subsection 2.3.1 describes the probability of degree increments in a single step. In Subsection 2.3.2 we obtain upper bounds on all degrees. Some of these results are fairly technical and may be skipped in a first reading.

2.3.1 Degree increments

We begin with the following simple lemma.

Lemma 2.3.1. For all $k \in \{1, \dots, m\}$, there exist positive constants $c_{m,p,k}$ such that

\[
P\left( \Delta d_t(v) = k \mid G_t \right) = c_{m,p,k} \frac{d_t^k(v)}{t^k} + O\left( \frac{d_t^{k+1}(v)}{t^{k+1}} \right).
\]

In particular, for $k = 1$ we have $c_{m,p,1} = 1/2$.

Proof. We begin by proving the following claim involving the random variables $Y^{(i)}_t$.

Claim: For all $i \in \{1, 2, \dots, m\}$,

\[
P\left( Y^{(i)}_{t+1} = v \,\middle|\, G_t \right) = \frac{d_t(v)}{2mt}. \tag{2.3.2}
\]


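Proof of the claim: The proof follows by induction on $i$. For $i = 1$ we have nothing to do. So, suppose the claim holds for all choices up to $i-1$. Then,

\[
\begin{aligned}
P\left( Y^{(i)}_{t+1} = v \,\middle|\, G_t \right) &= P\left( Y^{(i)}_{t+1} = v, Y^{(i-1)}_{t+1} = v \,\middle|\, G_t \right) + P\left( Y^{(i)}_{t+1} = v, Y^{(i-1)}_{t+1} \neq v \,\middle|\, G_t \right)\\
&= \frac{(1-p)}{4m^2} \frac{d_t^2(v)}{t^2} + P\left( Y^{(i)}_{t+1} = v, Y^{(i-1)}_{t+1} \neq v \,\middle|\, G_t \right).
\end{aligned}
\tag{2.3.3}
\]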

For the first term on the r.h.s. the only way we can choose $v$ again is following a PA-step and then choosing $v$ according to the preferential attachment rule. This means

\[
P\left( Y^{(i)}_{t+1} = v, Y^{(i-1)}_{t+1} = v \,\middle|\, G_t \right) = \frac{(1-p)}{4m^2} \frac{d_t^2(v)}{t^2}. \tag{2.3.4}
\]

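For the second term, we split the sum according to whether the vertex chosen at the previous choice is a neighbor of $v$ or not.

\[
\begin{aligned}
P\left( Y^{(i)}_{t+1} = v, Y^{(i-1)}_{t+1} \neq v \,\middle|\, G_t \right)
&= \sum_{u \notin \Gamma_{G_t}(v)} P\left( Y^{(i)}_{t+1} = v, Y^{(i-1)}_{t+1} = u \,\middle|\, G_t \right) + \sum_{u \in \Gamma_{G_t}(v)} P\left( Y^{(i)}_{t+1} = v, Y^{(i-1)}_{t+1} = u \,\middle|\, G_t \right)\\
&= (1-p) \frac{d_t(v)}{2mt} \left( \sum_{u \notin \Gamma_{G_t}(v)} \frac{d_t(u)}{2mt} \right) + p \left( \sum_{u \in \Gamma_{G_t}(v)} \frac{e_{G_t}(u, v)}{d_t(u)} \frac{d_t(u)}{2mt} \right)\\
&\quad + \frac{(1-p)}{2m} \frac{d_t(v)}{t} \left( \sum_{u \in \Gamma_{G_t}(v)} \frac{d_t(u)}{2mt} \right)\\
&= \frac{d_t(v)}{2mt} - \frac{(1-p)}{4m^2} \frac{d_t^2(v)}{t^2}.
\end{aligned}
\]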

We used our inductive hypothesis and that

\[
\sum_{u \in \Gamma_{G_t}(v)} \frac{d_t(u)}{2mt} + \sum_{u \notin \Gamma_{G_t}(v)} \frac{d_t(u)}{2mt} = 1 - \frac{d_t(v)}{2mt}.
\]

Returning to (2.3.3) we prove the claim. $\blacksquare$
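The claim says that the degree-biased law is preserved by the mixed PA/TF transition. A small exact check of this invariance, on a fixed multigraph without self-loops, might look as follows. The helper `choice_marginals` and the example graph are hypothetical, and the degree-biased law is normalised by the total degree of the example graph rather than by $2mt$.

```python
from fractions import Fraction


def choice_marginals(edges, p, m):
    """Exact law of each of the m successive choices on a fixed multigraph.

    edges: list of pairs (a, b) with a != b; p: Fraction in [0, 1].
    Returns (pa, laws), where pa is the degree-biased law and laws[i] is the
    law of the (i+1)-th choice.
    """
    deg, mult = {}, {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
        mult[(a, b)] = mult.get((a, b), 0) + 1
        mult[(b, a)] = mult.get((b, a), 0) + 1
    total = sum(deg.values())
    pa = {v: Fraction(d, total) for v, d in deg.items()}
    law = dict(pa)                      # the first choice is a PA-step
    laws = [law]
    for _ in range(m - 1):
        nxt = {}
        for v in deg:
            # TF contribution: previous choice u, then a uniform edge at u leading to v.
            tf = sum(law[u] * Fraction(mult.get((u, v), 0), deg[u]) for u in deg)
            nxt[v] = (1 - p) * pa[v] + p * tf
        law = nxt
        laws.append(law)
    return pa, laws


if __name__ == "__main__":
    example = [(1, 2), (2, 3), (1, 3), (3, 4)]   # a triangle with a pendant vertex
    pa, laws = choice_marginals(example, Fraction(1, 3), m=4)
    print(all(law == pa for law in laws))
```

Exact rational arithmetic is used so that the equality of the marginals can be tested without rounding error.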

We will show the particular case $k = 1$, since we have particular interest in the value of $c_{m,p,1}$, and point out how to obtain the other cases. We begin by noticing that the process of choices is by definition a homogeneous Markovian process. This means that to evaluate the probability of a vertex increasing its degree by exactly one, the case $k = 1$, we just need to know the transition probabilities. We use the notation $P_t$ to denote the measure conditioned on $G_t$ and compute these transition probabilities. We start with the hardest one.

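\[
\begin{aligned}
P_t\left( Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} \neq v \right) = {} & \sum_{u \in \Gamma_{G_t}(v)} P_t\left( Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} = u \right) P_t\left( Y^{(i)}_{t+1} = u \,\middle|\, Y^{(i)}_{t+1} \neq v \right)\\
& + \sum_{u \notin \Gamma_{G_t}(v)} P_t\left( Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} = u \right) P_t\left( Y^{(i)}_{t+1} = u \,\middle|\, Y^{(i)}_{t+1} \neq v \right).
\end{aligned}
\tag{2.3.5}
\]

When $u \in \Gamma_{G_t}(v)$, we can choose $v$ taking any of the two steps. This implies the equation below

\[
P_t\left( Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} = u \right) = (1-p) \frac{d_t(v)}{2mt} + p \frac{e_t(u, v)}{d_t(u)}, \tag{2.3.6}
\]

but when $u \notin \Gamma_{G_t}(v)$ the only way we can choose $v$ is following a PA-step, which implies that

\[
P_t\left( Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} = u \right) = (1-p) \frac{d_t(v)}{2mt}. \tag{2.3.7}
\]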

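We also notice that the equation below holds, since $u \neq v$ and our claim is true

\[
P_t\left( Y^{(i)}_{t+1} = u \,\middle|\, Y^{(i)}_{t+1} \neq v \right) = \frac{P_t\left( Y^{(i)}_{t+1} = u \right)}{P_t\left( Y^{(i)}_{t+1} \neq v \right)} \overset{\text{Claim}}{=} \frac{d_t(u)}{2mt \, P_t\left( Y^{(i)}_{t+1} \neq v \right)}. \tag{2.3.8}
\]

And the same Claim also implies that

\[
\frac{1}{P_t\left( Y^{(i)}_{t+1} \neq v \right)} = \frac{1}{1 - \frac{d_t(v)}{2mt}} = 1 + \sum_{n=1}^{\infty} \left( \frac{d_t(v)}{2mt} \right)^n. \tag{2.3.9}
\]

Combining these equations with (2.3.5), we get

\[
\begin{aligned}
P_t\left( Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} \neq v \right) &= (1-p) \frac{d_t(v)}{2mt} + p \frac{d_t(v)}{2mt} \left( 1 + \sum_{n=1}^{\infty} \left( \frac{d_t(v)}{2mt} \right)^n \right)\\
&= \frac{d_t(v)}{2mt} + O\left( \frac{d_t^2(v)}{t^2} \right).
\end{aligned}
\tag{2.3.10}
\]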


If we chose $v$ at the previous choice, the only way we select it again is following a PA-step. This means that

\[
P_t\left( Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} = v \right) = (1-p) \frac{d_t(v)}{2mt}. \tag{2.3.11}
\]

From these two transition probabilities we may obtain the other ones.

To compute the probability of $\{\Delta d_t(v) = 1\}$ given $G_t$ we may split it into the $m$ possible ways to increase $d_t(v)$ by exactly one. Each possible way has an index $i \in \{1, \dots, m\}$ meaning the step at which we chose $v$; we must then avoid it at the other $m-1$ choices. This means that each of these $m$ ways has a probability similar to the expression below

\[
\left( 1 - \frac{d_t(v)}{2mt} - O\left( \frac{d_t^2(v)}{t^2} \right) \right)^{i-1} \left( \frac{d_t(v)}{2mt} + O\left( \frac{d_t^2(v)}{t^2} \right) \right) \left( 1 - \frac{d_t(v)}{2mt} - O\left( \frac{d_t^2(v)}{t^2} \right) \right)^{m-i},
\]

which implies that

\[
P\left( \Delta d_t(v) = 1 \mid G_t \right) = \frac{d_t(v)}{2t} + O\left( \frac{d_t^2(v)}{t^2} \right).
\]

The cases $k > 1$ are obtained in the same way, considering the $\binom{m}{k}$ ways of increasing $d_t(v)$ by $k$.

Observation 1. We must notice that the term $O\left( \frac{d_t^{k+1}(v)}{t^{k+1}} \right)$ given by the Lemma is actually a stronger statement than the usual big-O notation, since the Lemma's proof implies the existence of a positive constant $c_{m,p,k}$ such that

\[
O\left( \frac{d_t^{k+1}(v)}{t^{k+1}} \right) \le c_{m,p,k} \frac{d_t^{k+1}(v)}{t^{k+1}},
\]

for every vertex $v$ and time $t$.

2.3.2 Upper bounds on vertex degrees

To control the number of cherries, $C_t$, we will need upper bounds on vertex degrees. The bound is obtained by applying the Azuma-Hoeffding inequality to the degree of each vertex, which is a martingale after normalizing by the quantity defined below.

\[
\phi(t) := \prod_{s=1}^{t-1} \left( 1 + \frac{1}{2s} \right). \tag{2.3.12}
\]

A fact about $\phi(t)$ will be useful: there exist positive constants $b_1$ and $b_2$ such that

\[
b_1 \sqrt{t} \le \phi(t) \le b_2 \sqrt{t},
\]

for all $t$.
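The two-sided bound on the normalizing product can be observed numerically. The sketch below assumes the product in (2.3.12) is $\prod_{s=1}^{t-1}(1 + 1/(2s))$; the function name `phi` is chosen here for illustration.

```python
import math


def phi(t):
    """Normalising product from (2.3.12): prod over s = 1..t-1 of (1 + 1/(2s))."""
    value = 1.0
    for s in range(1, t):
        value *= 1.0 + 1.0 / (2.0 * s)
    return value


if __name__ == "__main__":
    for t in (1, 10, 100, 10_000, 1_000_000):
        # The ratio phi(t) / sqrt(t) stays within fixed positive bounds.
        print(t, phi(t) / math.sqrt(t))
```

Writing each factor as $(s + 1/2)/s$ gives $\phi(t) = \Gamma(t + 1/2) / (\Gamma(3/2)\Gamma(t))$, so the printed ratio increases from 1 towards $2/\sqrt{\pi} \approx 1.128$.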


Proposition 2.3.13. For each vertex $j$, the sequence $(X^{(j)}_t)_{t \ge j}$ defined as

\[
X^{(j)}_t := \frac{d_t(j)}{\phi(t)}
\]

is a martingale.

Proof. Since the vertex $j$ will remain fixed throughout the proof, we will write simply $X_t$ instead of $X^{(j)}_t$.

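Observe that we can write $d_{t+1}(j)$ as follows

\[
d_{t+1}(j) = d_t(j) + \sum_{k=1}^{m} \mathbb{1}\left\{ Y^{(k)}_{t+1} = j \right\}.
\]

In addition, we proved in Lemma 2.3.1 that, for all $k \in \{1, \dots, m\}$, we have

\[
P\left( Y^{(k)}_{t+1} = j \,\middle|\, G_t \right) = \frac{d_t(j)}{2mt}. \tag{2.3.14}
\]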

Thus, the following identity holds

\[
E\left[ d_{t+1}(j) \mid G_t \right] = \left( 1 + \frac{1}{2t} \right) d_t(j). \tag{2.3.15}
\]

Then, dividing the above equation by $\phi(t+1)$ the desired result follows.
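The division step can be written out explicitly. The sketch below assumes only the definition (2.3.12) of the normalizing product, written $\phi$ here, and the identity (2.3.15).

```latex
% By (2.3.12), phi(t+1) = (1 + 1/(2t)) * phi(t). Hence, using (2.3.15):
\[
E\left[ X_{t+1} \mid G_t \right]
= \frac{E\left[ d_{t+1}(j) \mid G_t \right]}{\phi(t+1)}
= \frac{\left( 1 + \frac{1}{2t} \right) d_t(j)}{\left( 1 + \frac{1}{2t} \right) \phi(t)}
= \frac{d_t(j)}{\phi(t)}
= X_t .
\]
```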

Once we have Proposition 2.3.13 we are able to obtain an upper bound for $d_t(j)$.

Theorem 2.3.16. There is a positive constant $b_3$ such that, for every vertex $j$,

\[
P\left( d_t(j) \ge b_3 \sqrt{t} \log(t) \right) \le t^{-100}.
\]

Proof. The proof is essentially an application of Azuma's inequality to the martingale we obtained in Proposition 2.3.13. Again we will write it as $X_t$.

Applying Azuma's inequality demands controlling $X_t$'s variation, which satisfies the upper bound below

\[
|\Delta X_s| = \left| \frac{d_{s+1}(j) - \left( 1 + \frac{1}{2s} \right) d_s(j)}{\phi(s+1)} \right| \le \frac{2m}{\phi(s+1)} \le \frac{b_4}{\sqrt{s}}. \tag{2.3.17}
\]

Thus,

\[
\sum_{s=j+1}^{t} |\Delta X_s|^2 \le b_5 \log(t).
\]


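We must notice that none of the above constants depend on $j$. Then, Azuma's inequality gives us that

\[
P\left( |X_t - X_0| > \lambda \right) \le 2 \exp\left( -\frac{\lambda^2}{b_5 \log(t)} \right). \tag{2.3.18}
\]

Choosing $\lambda = 10 \sqrt{b_5} \log(t)$ and recalling $X_t = d_t(j)/\phi(t)$ we obtain

\[
P\left( d_t(j) - \frac{m \phi(t)}{\phi(j)} > 10 \sqrt{b_5} \, \phi(t) \log(t) \right) \le t^{-100}.
\]

Finally, using that $b_1 \sqrt{t} \le \phi(t+1) \le b_2 \sqrt{t}$, we get

\[
P\left( d_t(j) - \frac{m \phi(t)}{\phi(j)} > 10 \sqrt{b_5} \, b_2 \sqrt{t} \log(t) \right) \le t^{-100},
\]

which implies the desired result.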

An immediate consequence of Theorem 2.3.16 is an upper bound for the maximum degree of $G_t$.

Corollary 2.3.19 (Upper bound for the maximum degree). There exists a positive constant $b_1$ such that

\[
P\left( d_{max}(G_t) \ge b_1 \sqrt{t} \log(t) \right) \le t^{-99}.
\]

Proof. The event involving $d_{max}(G_t)$ may be seen as follows

\[
\left\{ d_{max}(G_t) \ge b_1 \sqrt{t} \log(t) \right\} = \bigcup_{j \le t} \left\{ d_t(j) \ge b_1 \sqrt{t} \log(t) \right\}.
\]

Using the union bound and applying Theorem 2.3.16 we prove the Corollary.

The next three lemmas are of a technical nature. Their statements will become clearer in the proof of the upper bound for $C_t$.

Lemma 2.3.20. There are positive constants $b_4$ and $b_5$ such that, for every vertex $j$ and all times $t_0 \le t$, we have

\[
P\left( d_{t_0}(j) > b_4 \sqrt{\frac{t_0}{t}} \, d_t(j) + b_5 \sqrt{t_0} \log(t) \right) \le t^{-100}.
\]

Proof. For each vertex $j$ and $t_0 \le t$, consider the sequence of random variables $(Z_s)_{s \ge 0}$ defined as $Z_s = X_{t_0+s}$, which is adapted to the filtration $\mathcal{F}_s := G_{t_0+s}$.


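Concerning $Z_s$'s variation, using (2.3.17), we have the following upper bound

\[
|\Delta Z_s| = |\Delta X_{t_0+s}| \le \frac{b_4}{\sqrt{t_0 + s}}.
\]

Thus,

\[
\sum_{s=0}^{t-t_0} |\Delta Z_s|^2 \le b_5 \log(t).
\]

Applying Azuma's inequality, we obtain

\[
P\left( |Z_{t-t_0} - Z_0| \ge \lambda \right) \le 2 \exp\left( -\frac{\lambda^2}{b_5 \log(t)} \right). \tag{2.3.21}
\]

But, by the definition of $Z_s$ and the fact that $\phi(t) = \Theta(\sqrt{t})$, the inclusion of events below is true

\[
\left\{ d_{t_0}(j) > b_2 \sqrt{t_0} \, \lambda + \frac{b_2 \sqrt{t_0}}{b_1 \sqrt{t}} \, d_t(j) \right\} \subset \left\{ |Z_{t-t_0} - Z_0| \ge \lambda \right\},
\]

which, combined with (2.3.21), proves the lemma if we choose $\lambda = 10 \sqrt{b_5} \log(t)$.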

Lemma 2.3.22. There is a positive constant $b_8$ such that

\[
P\left( \bigcup_{j=1}^{t} \bigcup_{t_0=j}^{t} \left\{ d_{t_0}(j) > b_8 \sqrt{t_0} \log(t) \right\} \right) \le 2 t^{-98}.
\]

Proof. This Lemma is a consequence of Theorem 2.3.16 and Lemma 2.3.20, which state, respectively,

\[
P\left( d_t(j) \ge b_3 \sqrt{t} \log(t) \right) \le t^{-100}, \tag{2.3.23}
\]

\[
P\left( d_{t_0}(j) > b_4 \sqrt{\frac{t_0}{t}} \, d_t(j) + b_5 \sqrt{t_0} \log(t) \right) \le t^{-100}, \tag{2.3.24}
\]

in which the constants $b_3$, $b_4$ and $b_5$ depend neither on the vertex $j$ nor on the times $t_0$ and $t$.

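Now, for each $t_0 \le t$ and vertex $j$, consider the events below

\[
A_{t_0,j} := \left\{ d_{t_0}(j) > b_8 \sqrt{t_0} \log(t) \right\},
\]

\[
B_{t_0,j} := \left\{ d_{t_0}(j) > b_4 \sqrt{\frac{t_0}{t}} \, d_t(j) + b_5 \sqrt{t_0} \log(t) \right\},
\]

\[
\tilde{C}_{t,j} := \left\{ d_t(j) \ge b_3 \sqrt{t} \log(t) \right\}.
\]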


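Now we obtain an upper bound for $P(A_{t_0,j})$ using the bounds we have obtained for the probabilities of $B_{t_0,j}$ and $\tilde{C}_{t,j}$.

\[
\begin{aligned}
P(A_{t_0,j}) &= P(A_{t_0,j} \cap B_{t_0,j}) + P\left( A_{t_0,j} \cap B^c_{t_0,j} \right)\\
&\le P(B_{t_0,j}) + P\left( A_{t_0,j} \cap B^c_{t_0,j} \cap \tilde{C}_{t,j} \right) + P\left( A_{t_0,j} \cap B^c_{t_0,j} \cap \tilde{C}^c_{t,j} \right)\\
&\le P(B_{t_0,j}) + P(\tilde{C}_{t,j}) + P\left( A_{t_0,j} \cap B^c_{t_0,j} \cap \tilde{C}^c_{t,j} \right).
\end{aligned}
\tag{2.3.25}
\]

However, notice we have the following inclusion of events

\[
B^c_{t_0,j} \cap \tilde{C}^c_{t,j} \subset \left\{ d_{t_0}(j) \le (b_4 b_3 + b_5) \sqrt{t_0} \log(t) \right\}.
\]

Thus, choosing $b_8 = 2(b_4 b_3 + b_5)$ we have $A_{t_0,j} \cap B^c_{t_0,j} \cap \tilde{C}^c_{t,j} = \emptyset$, which allows us to conclude that

\[
P(A_{t_0,j}) \le 2 t^{-100}.
\]

Finally, a union bound over $t_0$ followed by a union bound over $j$ implies the desired result.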

2.4 Positive local clustering

In this section we prove Theorem 2.1.1, which says that the local clustering coefficient is bounded away from 0 with high probability.

Proof of Theorem 2.1.1. We must find a lower bound for

\[
C^{loc}_{G_t} := \frac{1}{t} \sum_{v \in G_t} C_{G_t}(v).
\]

Let $v_m$ be a vertex in $G_t$ whose degree is $m$. Observe that each TF-step we took when $v_m$ was added increases $e(\Gamma_{G_t}(v_m))$ by one. So, denote by $T_v$ the number of TF-steps taken at the moment of creation of vertex $v$. Since all the choices of steps are made independently, $T_v$ follows a binomial distribution with parameters $m-1$ and $p$. Now, for every vertex we add to the graph, put a blue label on it if $T_v \ge 1$. The probability of labeling a vertex is bounded away from zero and we denote it by $p_b$.

By Theorem 2.1.3, with probability at least $1 - t^{-100}$, we have

\[
N_t(m) \ge b_1 t - b_2 \sqrt{t \log(t)}.
\]

Thus, the number of vertices in $G_t$ of degree $m$ which were labeled, $N^{(b)}_t(m)$, is bounded from below by a binomial random variable, $B_t$, with parameters $b_1 t - b_2 \sqrt{t \log(t)}$ and $p_b$. But, about $B_t$ we have, for all $\delta > 0$,

\[
P\left( B_t \le \frac{E[B_t]}{4} \right) \le \left( 1 - p_b + p_b e^{-\delta} \right)^{b_1 t - b_2 \sqrt{t \log(t)}} \exp\left\{ \delta p_b \left( b_1 t - b_2 \sqrt{t \log(t)} \right) / 4 \right\}
\]

and choosing $\delta$ properly we conclude that, w.h.p.,

\[
N^{(b)}_t(m) \ge p_b \left( b_1 t - b_2 \sqrt{t \log(t)} \right) / 4. \tag{2.4.1}
\]
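The exponential-moment bound used for the binomial lower tail can be compared with the exact tail on a small example. This is only a numerical sketch: the function names and the values of `n`, `p` and `delta` are hypothetical, and `n` plays the role of the number of trials $b_1 t - b_2\sqrt{t\log(t)}$.

```python
import math


def binomial_lower_tail(n, p, k):
    """Exact P(B <= k) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exponential_moment_bound(n, p, delta):
    """Markov bound on P(B <= n*p/4): E[exp(-delta*B)] * exp(delta*n*p/4)."""
    return (1 - p + p * math.exp(-delta)) ** n * math.exp(delta * p * n / 4)


if __name__ == "__main__":
    n, p = 200, 0.3                                       # hypothetical values; E[B] = 60
    exact = binomial_lower_tail(n, p, int(n * p / 4))     # P(B <= 15)
    bound = exponential_moment_bound(n, p, delta=1.0)
    print(exact, bound, exact <= bound)
```

Any fixed positive `delta` already gives a bound decaying exponentially in `n` in this example; the proof only needs some suitable choice, so no optimisation over `delta` is attempted here.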

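Finally, note that each blue vertex of degree $m$ has $C_{G_t}(v) > 2m^{-2}$. Combining this with (2.4.1) we have

\[
\begin{aligned}
C^{loc}_{G_t} &> \frac{1}{t} \sum_{v \in N^{(b)}_t(m)} C_{G_t}(v) > t^{-1} N^{(b)}_t(m) \, 2m^{-2}\\
&> t^{-1} \, 2m^{-2} \, p_b \left( b_1 t - b_2 \sqrt{t \log(t)} \right) / 4\\
&\to \frac{m^{-2} p_b b_1}{2} > 0 \quad \text{as } t \to +\infty,
\end{aligned}
\]

proving the theorem.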

2.5 Vanishing global clustering

This section is devoted to the proof of Theorem 2.1.2, which states that the global clustering of $G_t$ goes to zero at $1/\log t$ speed. Since the proof depends on estimates for the number of cherries, $C_t$, we first derive the necessary bounds and finally put all the pieces together at the end of this section.

2.5.1 Preliminary estimates for number of cherries

Let

\[
\tilde{C}_t := \sum_{j=1}^{t} d_t^2(j)
\]

denote the sum of the squares of degrees in $G_t$. We will prove bounds for $\tilde{C}_t$ instead of proving them directly for $C_t$. Since $C_t = \tilde{C}_t/2 - mt$, the results obtained for $\tilde{C}_t$ directly extend to $C_t$.
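The relation between the number of cherries and the sum of squared degrees is a one-line identity on any degree sequence. The following sketch checks it; the function names and the example degree list are hypothetical.

```python
def cherries(degrees):
    """Number of cherries: each vertex of degree d is the middle of d*(d-1)/2 of them."""
    return sum(d * (d - 1) // 2 for d in degrees)


def sum_of_squares(degrees):
    """Sum of the squared degrees."""
    return sum(d * d for d in degrees)


if __name__ == "__main__":
    degrees = [4, 3, 3, 2, 2, 2]    # hypothetical degree sequence, total degree 16
    # cherries = (sum of squares)/2 - (total degree)/2; when the total degree
    # equals 2mt, the subtracted term is mt.
    print(cherries(degrees), sum_of_squares(degrees) / 2 - sum(degrees) / 2)
```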

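Lemma 2.5.1. There is a positive constant $B_3$ such that

\[
E\left[ \left( \tilde{C}_{s+1} - \tilde{C}_s \right)^2 \,\middle|\, G_s \right] \le B_3 \frac{d_{max}(G_s) \, \tilde{C}_s}{s}.
\]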


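Proof. We start the proof by noticing that for every vertex $j$ we have $d^2_{s+1}(j) - d^2_s(j) \le 2m d_s(j) + m^2$ deterministically. From this remark the inequality below follows.

\[
\tilde{C}_{s+1} - \tilde{C}_s \le 2m \sum_{j=1}^{s} d_s(j) \mathbb{1}\{\Delta d_s(j) \ge 1\} + 2m^2. \tag{2.5.2}
\]

Since all vertices have degree at least $m$, we have $m^2 \le m \sum_{j=1}^{s} d_s(j) \mathbb{1}\{\Delta d_s(j) \ge 1\}$, thus

\[
\tilde{C}_{s+1} - \tilde{C}_s \le 4m \sum_{j=1}^{s} d_s(j) \mathbb{1}\{\Delta d_s(j) \ge 1\}. \tag{2.5.3}
\]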

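Applying Cauchy-Schwarz to the above inequality, we obtain

\[
\begin{aligned}
\left( \tilde{C}_{s+1} - \tilde{C}_s \right)^2 &\le 16m^2 \left( \sum_{j=1}^{s} d_s(j) \mathbb{1}\{\Delta d_s(j) \ge 1\} \cdot \mathbb{1}\{\Delta d_s(j) \ge 1\} \right)^2\\
&\le 16m^2 \left( \sum_{j=1}^{s} d_s^2(j) \mathbb{1}\{\Delta d_s(j) \ge 1\} \right) \left( \sum_{j=1}^{s} \mathbb{1}\{\Delta d_s(j) \ge 1\} \right)\\
&\le 16m^3 \sum_{j=1}^{s} d_s^2(j) \mathbb{1}\{\Delta d_s(j) \ge 1\}.
\end{aligned}
\tag{2.5.4}
\]

Recalling that

\[
P\left( \Delta d_s(j) \ge 1 \mid G_s \right) \le \frac{d_s(j)}{2s}
\]

we have

\[
E\left[ \left( \tilde{C}_{s+1} - \tilde{C}_s \right)^2 \,\middle|\, G_s \right] \le B_3 \sum_{j=1}^{s} \frac{d_s^3(j)}{s} \le B_3 \frac{d_{max}(G_s) \, \tilde{C}_s}{s},
\]

concluding the proof.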

Theorem 2.5.5 (Upper bound for $C_t$). There is a positive constant $B_1$ such that

\[
P\left( C_t \ge B_1 t \log^2(t) \right) \le t^{-98}.
\]

Proof. We show the result for $\tilde{C}_t$, which is greater than $C_t$. To do this, we need to determine $E[d^2_{t+1}(j) \mid G_t]$.

As in the proof of Proposition 2.3.13, write

\[
d_{t+1}(j) = d_t(j) + \sum_{k=1}^{m} \mathbb{1}\left\{ Y^{(k)}_{t+1} = j \right\}
\]

and denote $\sum_{k=1}^{m} \mathbb{1}\left\{ Y^{(k)}_{t+1} = j \right\}$ by $\Delta d_t(j)$. Thus,

\[
d^2_{t+1}(j) = d^2_t(j) \left( 1 + \frac{\Delta d_t(j)}{d_t(j)} \right)^2 = d^2_t(j) + 2 d_t(j) \Delta d_t(j) + (\Delta d_t(j))^2.
\]

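Combining the above equation with (2.3.14) and (2.3.15), we get

\[
E\left[ d^2_{t+1}(j) \mid G_t \right] = d^2_t(j) + \frac{d^2_t(j)}{t} + E\left[ (\Delta d_t(j))^2 \mid G_t \right]. \tag{2.5.6}
\]

Dividing the above equation by $t+1$, we get

\[
E\left[ \frac{d^2_{t+1}(j)}{t+1} \,\middle|\, G_t \right] = \frac{d^2_t(j)}{t} + \frac{E\left[ (\Delta d_t(j))^2 \mid G_t \right]}{t+1}, \tag{2.5.7}
\]

which implies

\[
E\left[ \frac{\tilde{C}_{t+1}}{t+1} \,\middle|\, G_t \right] = \frac{\tilde{C}_t}{t} + \frac{m^2}{t+1} + \sum_{j=1}^{t} \frac{E\left[ (\Delta d_t(j))^2 \mid G_t \right]}{t+1}. \tag{2.5.8}
\]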

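It is straightforward to see that $\Delta d_t(j) \le (\Delta d_t(j))^2 \le m \Delta d_t(j)$, which implies

\[
E\left[ (\Delta d_t(j))^2 \mid G_t \right] = \Theta\left( \frac{d_t(j)}{t} \right).
\]

Thus, (2.5.8) may be written as

\[
E\left[ \frac{\tilde{C}_{t+1}}{t+1} \,\middle|\, G_t \right] = \frac{\tilde{C}_t}{t} + \Theta\left( \frac{1}{t} \right). \tag{2.5.9}
\]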

Now, define

\[
X_t := \frac{\tilde{C}_t}{t}.
\]

Equation (2.5.9) states that $X_t$ is a martingale up to a term of magnitude $\Theta\left( \frac{1}{t} \right)$. In order to apply martingale concentration inequalities, we decompose $X_t$ as in Doob's Decomposition Theorem: $X_t$ can be written as $X_t = M_t + A_t$, in which $M_t$ is a martingale and $A_t$ is a predictable process. By Equation (2.5.9), we have

\[
A_t = \sum_{s=2}^{t} \left( E\left[ X_s \mid G_{s-1} \right] - X_{s-1} \right) = \sum_{s=2}^{t} \Theta\left( \frac{1}{s} \right). \tag{2.5.10}
\]

I.e., $A_t = \Theta(\log(t))$ almost surely.


The remainder of the proof is devoted to controlling $X_t$'s martingale component using Freedman's Inequality. Once again, by Doob's Decomposition Theorem, we have

\[
M_t := X_0 + \sum_{s=2}^{t} \left( X_s - E\left[ X_s \mid G_{s-1} \right] \right).
\]

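Observe that $M_{t+1} = M_t + X_{t+1} - E[X_{t+1} \mid G_t]$, thus

\[
\begin{aligned}
|\Delta M_s| = \left| X_{s+1} - E\left[ X_{s+1} \mid G_s \right] \right| &\le |X_{s+1} - X_s| + \frac{b_9}{s}\\
&= \left| \frac{\tilde{C}_{s+1} - \left( 1 + \frac{1}{s} \right) \tilde{C}_s}{s+1} \right| + \frac{b_9}{s}\\
&\le b_{10} \frac{d_{max}(G_s)}{s} + b_{11} \frac{\tilde{C}_s}{s^2} + \frac{b_9}{s}.
\end{aligned}
\tag{2.5.11}
\]

The last inequality holds because $\Delta \tilde{C}_s$ attains its maximum when the vertices of maximum degree in $G_s$ receive at least one new edge at time $s+1$. Furthermore, since $d_{max}(G_s) \le ms$ and $\tilde{C}_s \le m^2 s^2$, there exists a constant $b_{12}$ such that $\max_{s \le t} |\Delta M_s| \le b_{12}$ almost surely.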

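Combining

\[
|\Delta M_s| \le \left| \frac{\tilde{C}_{s+1} - \left( 1 + \frac{1}{s} \right) \tilde{C}_s}{s+1} \right| + \frac{b_9}{s}
\]

with Cauchy-Schwarz and Lemma 2.5.1, we obtain positive constants $b_{13}$, $b_{14}$ and $b_{15}$ such that

\[
\begin{aligned}
E\left[ (\Delta M_s)^2 \mid G_s \right] &\overset{\text{CS}}{\le} b_{13} \frac{E\left[ (\Delta \tilde{C}_s)^2 \mid G_s \right]}{s^2} + b_{14} \frac{\tilde{C}_s^2}{s^4} + \frac{b_{15}}{s^2}\\
&\overset{\text{Lemma 2.5.1}}{\le} b_{16} \frac{d_{max}(G_s) \tilde{C}_s}{s^3} + b_{14} \frac{\tilde{C}_s^2}{s^4} + \frac{b_{15}}{s^2}.
\end{aligned}
\tag{2.5.12}
\]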

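Now, define $V_t$ as

\[
V_t := \sum_{s=2}^{t} E\left[ (\Delta M_s)^2 \mid G_s \right]
\]

and call bad set the event below

\[
B_t := \bigcup_{j=1}^{t} \bigcup_{t_0=j}^{t} \left\{ d_{t_0}(j) > b_8 \sqrt{t_0} \log(t) \right\}.
\]

Observe that Lemma 2.3.22 guarantees $P(B_t) \le 2t^{-98}$.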


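Also notice that $\tilde{C}_s \le b_{17} d_{max}(G_s) s$ almost surely and in $B^c_t$ we have $d_{max}(G_s) \le b_8 \sqrt{s} \log(t)$ for all $s \le t$. Then, outside $B_t$ we have

\[
\begin{aligned}
V_t &\overset{(2.5.12)}{\le} \sum_{s=2}^{t} \left( b_{16} \frac{d_{max}(G_s) \tilde{C}_s}{s^3} + b_{14} \frac{\tilde{C}_s^2}{s^4} + \frac{b_{15}}{s^2} \right)\\
&\le \sum_{s=2}^{t} \left( \frac{b_{16} b_8^2 b_{17} \, s^2 \log^2(t)}{s^3} + \frac{b_{14} b_{17}^2 \, s^3 \log^2(t)}{s^4} + \frac{b_{15}}{s^2} \right)\\
&\le b_{18} \log^3(t).
\end{aligned}
\tag{2.5.13}
\]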

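So, by Freedman's inequality, we obtain

\[
P\left( M_t > \lambda, V_t \le b_{18} \log^3(t) \right) \le \exp\left( -\frac{\lambda^2}{2 b_{18} \log^3(t) + 2 b_{12} \lambda / 3} \right).
\]

Therefore, if $\lambda = b_{19} \log^2(t)$ with $b_{19}$ large enough, we get

\[
P\left( M_t > b_{19} \log^2(t), V_t \le b_{18} \log^3(t) \right) \le t^{-100}. \tag{2.5.14}
\]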
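To see how the choice $\lambda = b_{19}\log^2(t)$ produces a polynomially small probability, one can evaluate the logarithm of the Freedman bound directly. The sketch below uses hypothetical constants ($b_{18} = b_{12} = 1$, $b_{19} = 20$), which are not the constants of the proof, and returns the exponent rather than the bound itself to avoid floating-point underflow.

```python
import math


def freedman_log_bound(lam, variance_bound, increment_bound):
    """Logarithm of exp(-lam^2 / (2*variance_bound + 2*increment_bound*lam/3))."""
    return -lam**2 / (2.0 * variance_bound + 2.0 * increment_bound * lam / 3.0)


if __name__ == "__main__":
    b12, b18, b19 = 1.0, 1.0, 20.0           # hypothetical constants, for illustration only
    for log_t in (5.0, 10.0, 20.0):
        lam = b19 * log_t**2                  # lambda = b19 * log^2(t)
        variance = b18 * log_t**3             # V_t <= b18 * log^3(t)
        exponent = freedman_log_bound(lam, variance, b12)
        # Compare with log(t^-100) = -100 * log(t).
        print(log_t, exponent, exponent <= -100.0 * log_t)
```

With these values the exponent behaves like $-b_{19}^2 \log(t) / (2 b_{18})$ for large $t$, so taking $b_{19}$ large enough pushes it below $-100\log(t)$.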

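The inequality (2.5.13) guarantees the following inclusion of events

\[
B^c_t \subset \left\{ V_t \le b_{18} \log^3(t) \right\}. \tag{2.5.15}
\]

And also,

\[
\left\{ X_t \ge b_{21} \log^2(t) \right\} \subset \left\{ M_t \ge (b_{21} - b_{20}) \log^2(t) \right\}, \tag{2.5.16}
\]

since $A_t \le b_{20} \log(t)$ and $M_t \ge X_t - b_{20} \log(t)$.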

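Finally,

\[
\begin{aligned}
P\left( M_t > b_{19} \log^2(t) \right) &= P\left( M_t > b_{19} \log^2(t), V_t \le b_{18} \log^3(t) \right) + P\left( M_t > b_{19} \log^2(t), V_t > b_{18} \log^3(t) \right)\\
&\le t^{-100} + P(B_t),
\end{aligned}
\]

that is,

\[
P\left( M_t > b_{19} \log^2(t) \right) \le 3 t^{-98},
\]

proving the Theorem.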

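We notice that from equation (2.5.8) we may extract the recurrence below

\[
E\left[ \tilde{C}_t \right] = \left( 1 + \frac{1}{t-1} \right) E\left[ \tilde{C}_{t-1} \right] + c_0,
\]

in which $c_0$ is a positive constant depending on $m$ and $p$ only. Expanding it, we obtain

\[
E\left[ \tilde{C}_t \right] = \prod_{s=1}^{t-1} \left( 1 + \frac{1}{s} \right) E\left[ \tilde{C}_1 \right] + c_0 \sum_{s=1}^{t-1} \prod_{r=s}^{t-1} \left( 1 + \frac{1}{r} \right),
\]

which implies $E[\tilde{C}_t] = \Theta(t \log t)$. This means the upper bound for $\tilde{C}_t$ given by Theorem 2.5.5 is of order $E[\tilde{C}_t] \log(t)$.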
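The growth rate of the recurrence can be checked by iterating it. This is a sketch with hypothetical inputs: `first_value` stands for the expectation at time 1 and `c0` for the constant of the recurrence, neither of which is computed from $m$ and $p$ here.

```python
import math


def iterate_recurrence(t, first_value, c0):
    """Iterate x_s = (1 + 1/(s-1)) * x_{s-1} + c0 for s = 2..t, starting from x_1."""
    value = float(first_value)
    for s in range(2, t + 1):
        value = (1.0 + 1.0 / (s - 1)) * value + c0
    return value


if __name__ == "__main__":
    first_value, c0 = 4.0, 6.0                # hypothetical values, for illustration only
    for t in (10**2, 10**4, 10**6):
        x_t = iterate_recurrence(t, first_value, c0)
        print(t, x_t / (t * math.log(t)))     # slowly approaches c0
```

Dividing the recurrence by $s$ gives $x_s/s = x_{s-1}/(s-1) + c_0/s$, so $x_t = t\,(x_1 + c_0 (H_t - 1))$ with $H_t$ the $t$-th harmonic number, which is of order $t \log t$.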


2.5.2 The bootstrap argument

Obtaining bounds for $\tilde{C}_t$ requires some control of its quadratic variation, which in turn requires bounds for the maximum degree and for $\tilde{C}_t$ itself, as in Lemma 2.5.1. Applying some deterministic bounds and upper bounds on the maximum degree we were able to derive an upper bound for $\tilde{C}_t$, which is of order $E[\tilde{C}_t] \log t$. To improve this bound and obtain the right order, we proceed as in the proof of Theorem 2.5.5, but making use of the preliminary estimate just discussed. This is what we call the bootstrap argument.

The result we obtain is stated in Theorem 2.1.4 and consists of a Weak Law of Large Numbers, which states that $C_t$ divided by $t \log t$ actually converges in probability to a constant depending only on $m$.

Proof of Theorem 2.1.4. In the proof of Theorem 2.5.5, we decomposed the process $X_t = \tilde{C}_t / t$ in two components: $M_t$ and $A_t$. The first part of the proof will be dedicated to showing that $M_t = o(\log(t))$, w.h.p. Then we show that $A_t = (m^2 + m) \log(t)$ also w.h.p.

We repeat the proof given for Theorem 2.5.5, but this time we change our definition of bad set to

\[
B_t = \bigcup_{s = \log^{1/2}(t)}^{t} \left\{ \tilde{C}_s \ge b_{20} \, s \log^2(s) \right\}.
\]

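By Theorem 2.5.5 and the union bound, $P(B_t) \le \log^{-97/2}(t)$. Observe that an upper bound for $\tilde{C}_s$ gives an upper bound for $d_{max}(G_s)$, since

\[
d^2_{max}(G_s) \le \tilde{C}_s \implies d_{max}(G_s) \le \sqrt{\tilde{C}_s} \implies d_{max}(G_s) \le \sqrt{s} \log(s),
\]

when $\tilde{C}_s \le s \log^2(s)$.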

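Using (2.5.12) we have, in $B^c_t$,

\[
\begin{aligned}
V_t &\le \sum_{s=1}^{t-1} \left( b_{16} \frac{d_{max}(G_s) \tilde{C}_s}{s^3} + b_{14} \frac{\tilde{C}_s^2}{s^4} + \frac{b_{15}}{s^2} \right)\\
&\le \sum_{s=1}^{\log^{1/2}(t) - 1} b'_{16} + \sum_{s = \log^{1/2}(t)}^{t-1} \left( b_{17} \frac{\sqrt{s} \log(s) \, s \log^2(s)}{s^3} + b_{18} \frac{s^2 \log^4(s)}{s^4} + \frac{b_{15}}{s^2} \right)\\
&\le b_{19} \log^{1/2}(t),
\end{aligned}
\tag{2.5.17}
\]

since $d_{max}(G_s) \le m \cdot s$ and $\tilde{C}_s \le 2m^2 \cdot s^2$ for all $s$ and, in $B^c_t$, $d_{max}(G_s) \le \sqrt{b_{20} \, s} \log(s)$ and $\tilde{C}_s \le b_{20} \, s \log^2(s)$ for all $s \ge \log^{1/2}(t)$. Then, by Freedman's inequality,

\[
P\left( M_t \ge \log^{1/4+\delta}(t), V_t \le b_{19} \log^{1/2}(t) \right) = o(1). \tag{2.5.18}
\]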


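Recall equation (2.5.8)

\[
E\left[ \frac{\tilde{C}_{t+1}}{t+1} \,\middle|\, G_t \right] = \frac{\tilde{C}_t}{t} + \frac{m^2}{t+1} + \sum_{j=1}^{t} \frac{E\left[ (\Delta d_t(j))^2 \mid G_t \right]}{t+1}.
\]

Now, we recall from Lemma 2.3.1 that for all $k \in \{1, \dots, m\}$

\[
P\left( Y^{(k)}_{t+1} = v \,\middle|\, G_t \right) = \frac{d_t(v)}{2mt}.
\]

Furthermore,

\[
P\left( Y^{(k)}_{t+1} = v, Y^{(j)}_{t+1} = v \,\middle|\, G_t \right) = O\left( \frac{d_t^2(v)}{t^2} \right).
\]

Thus,

\[
E\left[ (\Delta d_t(v))^2 \mid G_t \right] = \frac{d_t(v)}{2t} + O\left( \frac{d_t^2(v)}{t^2} \right), \tag{2.5.19}
\]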

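which implies that

\[
E\left[ \frac{\tilde{C}_{t+1}}{t+1} \,\middle|\, G_t \right] = \frac{\tilde{C}_t}{t} + \frac{m^2 + m}{t+1} + O\left( \frac{\tilde{C}_t}{t^3} \right),
\]

and consequently

\[
A_t = \sum_{s=2}^{t} \left( \frac{m^2 + m}{s+1} + O\left( \frac{\tilde{C}_s}{s^3} \right) \right).
\]

As we have already noticed before, all the constants involved in the big-O notation do not depend on the time or the vertex. From the above equation we deduce that, in $B^c_t$,

\[
\sum_{s=2}^{t} O\left( \frac{\tilde{C}_s}{s^3} \right) \le b_{11} \sum_{s=2}^{\log^{1/2}(t)} \frac{\tilde{C}_s}{s^3} + \sum_{s = \log^{1/2}(t)}^{t} \frac{s \log^2(s)}{s^3} \le b_{12} \log(\log(t)).
\]

Thus, in $B^c_t$,

\[
A_t = (m^2 + m) \log(t) + o(\log(t)).
\]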

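Finally, fix a small positive $\varepsilon$. Then

\[
\begin{aligned}
P\left( \left| \frac{\tilde{C}_t}{t \log(t)} - (m^2 + m) \right| > \varepsilon \right) &= P\left( \left| \frac{M_t + A_t}{\log(t)} - (m^2 + m) \right| > \varepsilon \right)\\
&\le P\left( \left| \frac{M_t + A_t}{\log(t)} - (m^2 + m) \right| > \varepsilon, B^c_t \right) + P(B_t).
\end{aligned}
\tag{2.5.20}
\]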


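We also have that $B^c_t \subset \{ V_t \le b_{19} \log^{1/2}(t) \}$, which, combined with (2.5.18), implies

\[
P\left( \left| \frac{M_t + A_t}{\log(t)} - (m^2 + m) \right| > \varepsilon, M_t \ge \log^{1/4+\delta}(t), B^c_t \right) = o(1).
\]

On the other hand, when $B^c_t$ holds and $M_t$ is at most $\log^{1/4+\delta}(t)$, we have $A_t = (m^2 + m) \log(t) + o(\log(t))$, thus

\[
P\left( \left| \frac{M_t + A_t}{\log(t)} - (m^2 + m) \right| > \varepsilon, M_t < \log^{1/4+\delta}(t), B^c_t \right) = 0
\]

for large enough $t$.

Recalling that $C_t = \tilde{C}_t / 2 - mt$ we obtain the desired result.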

2.5.3 Wrapping up

So far we have devoted our efforts to properly controlling the number of cherries in $G_t$. Now, we combine these results with simple bounds for the number of triangles in $G_t$ to finally obtain the exact order of the global clustering.

Proof of Theorem 2.1.2. By Theorem 2.1.4 we have

\[
C_t = \Theta(t \log(t)), \quad \text{w.h.p.}
\]

On the other hand, the number of triangles in $G_t$, $\Delta_{G_t}$, is bounded from above by $\binom{m}{2} t$, and every TF-step we take increases $\Delta_{G_t}$ by one. Then,

\[
\Delta_{G_t} \ge Z_t = \sum_{s=1}^{t} T_s,
\]

where $T_s$ is the number of TF-steps we took at time $s$. Since all the choices concerning which kind of step we follow are independent, $T_s \sim \mathrm{bin}(m-1, p)$ and $Z_t \sim \mathrm{bin}((m-1)t, p)$. By Chernoff bounds, $Z_t \ge \delta (m-1) p t$, for a small $\delta$, w.h.p. Thus,

\[
\Delta_{G_t} = \Theta(t), \quad \text{w.h.p.},
\]

which concludes the proof.

2.6 Final comments on clustering

We end our discussion about clustering by comparing the two clustering coefficients from a different perspective than in Section 2.1.2. Recall that $C^{loc}_{G_t}$ is an unweighted average of local clustering coefficients.

\[
C^{loc}_{G_t} := \frac{1}{t} \sum_{v \in G_t} C_{G_t}(v).
\]

On the other hand, $C^{glo}_{G_t}$ is a weighted average, where the weight of vertex $v$ is the number of cherries that it belongs to,

\[
C^{glo}_{G_t} = 3 \times \frac{\sum_{v \in G_t} C_{G_t}(v) \binom{d_t(v)}{2}}{\sum_{v \in G_t} \binom{d_t(v)}{2}}. \tag{2.6.1}
\]

Thus the weight of $v$ in $C^{glo}_{G_t}$ is basically proportional to the square of the degree. This skews the distribution of weights towards high-degree nodes. The clustering of the high degree vertices is the reason why the two coefficients present such distinct behavior.
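The contrast between the unweighted and the cherry-weighted average can be seen on a toy graph. The sketch below is an illustration under stated assumptions: it works on simple graphs given as a dictionary of neighbour sets, the function name `clustering_coefficients` is hypothetical, and the global coefficient is computed as the plain weighted average (closed cherries over all cherries), so any constant prefactor used in a particular convention is left out.

```python
from itertools import combinations


def clustering_coefficients(adj):
    """Return (c_loc, c_glo) for a simple graph given as {vertex: set of neighbours}.

    c_loc: unweighted average of the local coefficients (0 for degree < 2).
    c_glo: cherry-weighted average, i.e. closed cherries divided by all cherries.
    """
    local, closed, total_cherries = [], 0, 0
    for v, nbrs in adj.items():
        d = len(nbrs)
        pairs = d * (d - 1) // 2                           # cherries centred at v
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total_cherries += pairs
        closed += links
        local.append(links / pairs if pairs else 0.0)
    c_loc = sum(local) / len(adj)
    c_glo = closed / total_cherries if total_cherries else 0.0
    return c_loc, c_glo


if __name__ == "__main__":
    k = 50                                                  # hub with k neighbours, one extra edge
    adj = {0: set(range(1, k + 1))}
    for v in range(1, k + 1):
        adj[v] = {0}
    adj[1].add(2)
    adj[2].add(1)
    print(clustering_coefficients(adj))
```

In this toy example the single high-degree hub carries almost all of the cherry weight and has a tiny local coefficient, so the weighted average is much smaller than it would be without the hub.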

We will show below that $C_{G_t}(v)$ for a vertex $v$ of high degree $d$ is of order $d^{-1}$, which explains why $C^{glo}_{G_t}$ goes to zero. Recall that the r.v. $e_t(\Gamma_v)$ counts the number of edges between the neighbors of $v$. Due to the definition of our model, one can only increase $e_t(\Gamma_v)$ by one if $d_t(v)$ is also increased by at least one unit. Since $e_t(\Gamma_v)$ can only increase by $m$ units in each time step, we have:

\[
e_t(\Gamma_v) \le m \, d_t(v),
\]

which implies an upper bound for $C_{G_t}(v) \approx e_t(\Gamma_v) / d_t(v)^2$ of order $d_t^{-1}(v)$. The next proposition gives a lower bound of the same order.

Proposition 2.6.2. Let $v$ be a vertex of $G_t$. Then, there are positive constants, $b_1$ and $b_2$, such that

\[
P\left( C_{G_t}(v) \le \frac{b_1}{d_t(v)} \,\middle|\, d_t(v) \ge b_2 \log(t) \right) \le t^{-100}.
\]

This proposition does not prove our clustering estimates, but seems interesting in any case.

Proof of the Proposition. Observe that if we choose $v$ and take a TF-step thereafter, we increase $e_t(\Gamma_v)$ by one. Then, if we look only at times in which this occurs, $e_t(\Gamma_v)$ must be greater than a binomial random variable with parameters: number of times we choose $v$ at the first choice and $p$. Since all the choices concerning the kind of step we take are made independently of the whole process, we just need to prove that the number of times we choose $v$ at the first choice, denoted by $d^{(1)}_t(v)$, is proportional to $d_t(v)$ w.h.p.

Recall that $Y^{(1)}_s$ indicates the vertex chosen at time $s$ at the first of our $m$ choices. The random variable $d^{(1)}_t(v)$ can be written in terms of the $Y^{(1)}$'s as follows

\[
d^{(1)}_t(v) = \sum_{s=v+1}^{t} \mathbb{1}\left\{ Y^{(1)}_s = v \right\}. \tag{2.6.3}
\]


We first claim that if $d_t(v)$ is large enough, a positive fraction of its value must come from $d^{(1)}_t(v)$.

Claim: There exist positive constants $b_1$ and $b_2$ such that

$$P\left(d^{(1)}_t(v) \le b_1 d_t(v) \,\middle|\, d_t(v) \ge b_2\log(t)\right) \le t^{-100}.$$

Proof of the claim: To prove the claim we condition on all possible trajectories of $d_t(v)$. In this direction, let $\omega$ be an event describing when $v$ was chosen and how many times at each step. Notice that $\omega$ does not record whether $v$ was chosen by a PA-step or a TF-step. The event $\omega$ can be regarded as a vector in $\{0, 1, \dots, m\}^{t-v-1}$ such that $\omega(s) = k$ means we chose $v$ $k$ times at time $s$. For each $\omega$, let $d_\omega(v)$ be the degree of $v$ obtained by the sequence of choices given by $\omega$.

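Recall Equations (2.3.10) and (2.3.11), which state that

$$P_t\left(Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} \ne v\right) = \frac{d_t(v)}{2mt} + O\left(\frac{d_t^2(v)}{t^2}\right)$$

and

$$P_t\left(Y^{(i+1)}_{t+1} = v \,\middle|\, Y^{(i)}_{t+1} = v\right) = (1-p)\frac{d_t(v)}{2mt}.$$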

For any $\omega$ such that $\omega(s) = k \ge 1$, we may show, using (2.3.10) and (2.3.11), that there exists a positive constant $\delta$ depending only on $m$ and $p$, such that

$$P\left(Y^{(1)}_s = v \,\middle|\, \omega\right) \ge \delta. \tag{2.6.4}$$

Furthermore, given $\omega$, the random variables $\mathbb{1}\{Y^{(1)}_s = v\}$ are independent. This implies that, given $\omega$, the random variable $d^{(1)}_t(v)$ stochastically dominates another random variable following a binomial distribution of parameters $d_\omega(v)/m$ and $\delta$. Thus, by Chernoff bounds, we can choose a small $b_1$ such that

$$P\left(d^{(1)}_t(v) \le b_1 d_\omega(v) \,\middle|\, \omega\right) \le \exp\left(-d_\omega(v)\right).$$

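Since we are on the event $D_t := \{d_t(v) \ge b_2\log(t)\}$, all $d_\omega(v) \ge b_2\log(t)$ for some $b_2$ that can be chosen in a way such that

$$P\left(d^{(1)}_t(v) \le b_1 d_\omega(v) \,\middle|\, \omega\right) \le t^{-100}$$

for all $\omega$ compatible with $D_t$. Finally, to estimate the probability of $\left\{d^{(1)}_t(v) \le b_1 d_t(v)\right\}$, we condition on all possible histories of choices $\omega$:

$$P\left(d^{(1)}_t(v) \le b_1 d_t(v) \,\middle|\, D_t\right) = \sum_{\omega} P\left(d^{(1)}_t(v) \le b_1 d_t(v) \,\middle|\, \omega, D_t\right) P(\omega \mid D_t) \le t^{-100} \tag{2.6.5}$$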


and this proves the claim. ∎

As we observed at the beginning, $e_t(\Gamma_v)$ dominates a random variable $\mathrm{bin}(d^{(1)}_t(v), p)$. And by the claim, $d^{(1)}_t(v)$ is proportional to $d_t(v)$, w.h.p. Using Chernoff bounds, we obtain the result.

2.7 Power law degree distribution

We end this Chapter by proving the power law distribution.

Lemma 2.7.1 (Lemma 3.1 [11]). Let $a_t$ be a sequence of positive real numbers satisfying the following recurrence relation

$$a_{t+1} = \left(1 - \frac{b_t}{t}\right)a_t + c_t.$$

Furthermore, suppose $b_t \to b > 0$ and $c_t \to c$. Then

$$\lim_{t\to\infty}\frac{a_t}{t} = \frac{c}{1+b}.$$
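The statement of the lemma can be checked numerically in the simplest case of constant sequences $b_t = b$ and $c_t = c$. The sketch below iterates the recurrence and returns $a_t/t$; the values of $b$, $c$ and the starting point are arbitrary choices made for the illustration.

```python
def iterate_recurrence(b, c, a1=1.0, steps=200_000):
    """Iterate a_{t+1} = (1 - b/t) a_t + c from a_1 and return a_steps / steps."""
    a = a1
    for t in range(1, steps):
        a = (1 - b / t) * a + c
    return a / steps


b, c = 1.5, 2.0
ratio = iterate_recurrence(b, c)
print(ratio, c / (1 + b))
```

With constant coefficients the deviation $a_t - ct/(1+b)$ satisfies the homogeneous recurrence and decays like $t^{-b}$, so the agreement with $c/(1+b)$ is already very close for moderate $t$.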

Proof of Theorem 2.1.3. We divide the proof into two parts. Part (a) is the power law for the expected value of the proportion of vertices with degree $d$. Part (b) is the concentration inequality for $N_t(d)$.

Proof of part (a). The proof is essentially the same as the one given in Section 3.2 of [11]. The key step is to obtain a recurrence relation involving $E[N_t(d)]$ which has the same form as that required by Lemma 5.1.1. To obtain the recurrence relation, observe that $N_{t+1}(d)$ can be written as follows:

$$N_{t+1}(d) = \sum_{v\in N_t(d)} \mathbb{1}\{\Delta d_t(v)=0\} + \sum_{v\in N_t(d-1)} \mathbb{1}\{\Delta d_t(v)=1\} + \dots + \sum_{v\in N_t(d-m)} \mathbb{1}\{\Delta d_t(v)=m\}. \tag{2.7.2}$$

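Taking the conditional expected value with respect to $G_t$ in the above equation, applying Lemma 2.3.1 and recalling that $N_t(d) \le t$, we obtain

$$E[N_{t+1}(d) \mid G_t] = N_t(d)\left[1 - \frac{d}{2t} + O\left(\frac{d^2}{t^2}\right)\right] + N_t(d-1)\left[\frac{(d-1)}{2t} + O\left(\frac{(d-1)^2}{t^2}\right)\right] + O\left(\frac{1}{t}\right).$$

Finally, taking the expected value on both sides and denoting $E N_t(d)$ by $a^{(d)}_t$, we have

$$a^{(d)}_{t+1} = \left[1 - \frac{\frac{d}{2} + O\left(\frac{d^2}{t}\right)}{t}\right] a^{(d)}_t + a^{(d-1)}_t\left[\frac{\frac{d-1}{2} + O\left(\frac{(d-1)^2}{t}\right)}{t}\right] + O\left(\frac{1}{t}\right). \tag{2.7.3}$$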


From here, the proof follows by induction on $d \ge m$ and an application of Lemma 5.1.1, assuming $\frac{E N_t(d-1)}{t} \to D_{d-1}$, which gives us

$$\frac{a^{(d)}_t}{t} \to \frac{D_{d-1}\frac{(d-1)}{2}}{1 + d/2} = D_{d-1}\,\frac{d-1}{2+d} =: D_d$$

and this gives us that

$$D_d = \frac{2}{2+m}\prod_{k=m+1}^{d}\frac{(k-1)}{k+2} = \frac{2(m+3)(m+1)}{(d+3)(d+2)(d+1)},$$

which proves part (a).

Proof of part (b). The proof is in line with the proof of Theorem 3.2 in [11]. For this reason we show only that the following process

$$X^{(d)}_t := \frac{N_t(d) - D_d t + 16 d\cdot c\cdot\sqrt{t}}{\psi_d(t)},$$

in which $\psi_d(t)$ is defined as

$$\psi_d(t) := \prod_{s=d}^{t-1}\left(1 - \frac{d}{2s}\right),$$

is a submartingale, and we give an upper bound for its variation.

As in Theorem 3.2 of [11], the proof follows by induction on $d$.

Inductive step: Suppose that for all $d' \le d-1$ we have

$$P\left(N_t(d') \le D_{d'} t - 16 d'\cdot c\cdot\sqrt{t}\right) \le (t+1)^{d'-m}e^{-c^2}. \tag{2.7.4}$$

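Recalling that

$$E[N_{t+1}(d) \mid G_t] = \left(1 - \frac{d}{2t} + O\left(\frac{d^2}{t^2}\right)\right)N_t(d) + \frac{(d-1)N_t(d-1)}{2t} + O\left(t^{-1}\right),$$

we have the following recurrence relation

$$E\left[\psi_d(t+1)X^{(d)}_{t+1} \,\middle|\, G_t\right] \ge \left(1 - \frac{d}{2t}\right)N_t(d) + \frac{(d-1)N_t(d-1)}{2t} + O\left(t^{-1}\right) + 16d\cdot c\cdot\sqrt{t} - D_d(t+1). \tag{2.7.5}$$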

The inductive hypothesis assures us that

$$N_t(d-1) \ge D_{d-1}t - 16(d-1)\cdot c\cdot\sqrt{t},$$


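with probability at least $1 - (t+1)^{d-1-m}e^{-c^2}$. Thus, returning to (2.7.5),

$$E\left[\psi_d(t+1)X^{(d)}_{t+1} \,\middle|\, G_t\right] \ge \left(1 - \frac{d}{2t}\right)N_t(d) + \frac{(d-1)D_{d-1}}{2} - D_d(t+1) + 16d\cdot c\cdot\sqrt{t} + O\left(t^{-1}\right). \tag{2.7.6}$$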

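Observe that, for the last terms on the r.h.s. of the above inequality, we have

$$\frac{(d-1)D_{d-1}}{2} - D_d(t+1) + 16d\cdot c\cdot\sqrt{t} + O\left(t^{-1}\right) \ge \left(1 - \frac{d}{2t}\right)\left(-D_d t + 16d\cdot c\cdot\sqrt{t}\right)$$

$$\iff \frac{(d-1)D_{d-1}}{2} - D_d + O\left(t^{-1}\right) \ge \frac{dD_d}{2} - \frac{8d^2c}{\sqrt{t}}$$

$$\iff \frac{(d-1)D_{d-1}}{2} + \frac{8d^2c}{\sqrt{t}} + O\left(t^{-1}\right) \ge D_d + \frac{dD_d}{2}, \tag{2.7.7}$$

and the last inequality is true since we have $(d-1)D_{d-1} = (d+2)D_d$ and $D_d = \frac{2(m+3)(m+1)}{(d+3)(d+2)(d+1)}$.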

Returning to (2.7.6), we just proved that $X^{(d)}_{t+1}$ is a submartingale with fail probability bounded from above by $(t+1)^{d-m}e^{-c^2}$. Its variation $\Delta X^{(d)}_t$ satisfies the upper bound below:

$$\left|\Delta X^{(d)}_s\right| \le \frac{\Delta N_s(d) + D_d + 16dcs^{-1/2} + dN_s(d)(2s)^{-1}}{\psi_d(s+1)} \le \frac{m + 2/(d+2) + 17dcs^{-1/2} + d/2}{\psi_d(s+1)} \le \frac{2d}{\psi_d(s+1)} + \frac{17dc}{\sqrt{s}\,\psi_d(s+1)}, \tag{2.7.8}$$

since $\Delta N_s(d) \le m$, $N_s(d) \le s$ and $D_d \le 2/(d+2)$ for all $s$ and $d$. Thus, there is a positive constant $M$ such that

$$\left|\Delta X^{(d)}_s\right|^2 \le \frac{16d^2}{\psi_d^2(s+1)} + \frac{Md^2c^2}{s\,\psi_d^2(s+1)}. \tag{2.7.9}$$

The lower bound for $N_t(d)$ is proven by applying Theorem 2.36 of [11] to $X^{(d)}_t$, setting

$$\lambda = 2c\cdot\sqrt{\sum_{s=d}^{t+1}\left|\Delta X^{(d)}_s\right|^2}.$$


The upper bound is obtained in the same way, but considering the process

$$\tilde{X}^{(d)}_t := \frac{N_t(d) - D_d t - 16 d\cdot c\cdot\sqrt{t}}{\psi_d(t)},$$

which is a supermartingale.


3. THE P-MODEL

In this chapter, we investigate a random graph model in a class known as GLP in the literature of Computer Science and Physics. Informally, it is a modification of the model proposed by Barabási and Albert in [2], in which links among existing vertices are allowed. This alteration has positive consequences, in the sense that the model outperforms other popular models when the task is predicting or mimicking real-world complex networks (in [30], the authors give a quantitative version of this statement).

Let us briefly describe the process. This model has two parameters: a real number $p \in [0, 1]$ and an initial graph $G_1$. For the sake of simplicity we will consider $G_1$ to be the graph with one vertex and one loop. We consider the following two stochastic operations that can be performed on the graph $G$:

• Vertex-step - Add a new vertex $v$, and add an edge $\{u, v\}$ by choosing $u \in G$ with probability proportional to its degree.

• Edge-step - Add a new edge $\{u_1, u_2\}$ by independently choosing vertices $u_1, u_2 \in G$ with probability proportional to their degrees. We note that we allow loops to be added, and we also allow a new connection to be added between vertices that already share an edge.

For a more formal definition, consider a sequence $(X_t)_{t\ge 1}$ of i.i.d. random variables such that $X_t \overset{d}{=} \mathrm{Ber}(p)$. We define inductively a random graph process $(G_t)_{t\ge 1}$ as follows: start with $G_1$, the graph with one vertex and one loop. Given $G_t$, form $G_{t+1}$ by performing a vertex-step on $G_t$ when $X_t = 1$, and performing an edge-step on $G_t$ when $X_t = 0$. The resulting process is the object of study of this chapter.

The chapter's goal is to investigate the existence of large complete subgraphs in $G_t$, which we refer to as communities. We are interested in communities whose vertex set cardinality, which we also call order, goes to infinity as the process evolves.

This chapter is organized as follows: in Section 3.1.1 we prove an upper bound for the degree of all vertices at all times using Azuma's inequality. This upper bound is used in the proof of the lower bound for the degrees. Section 3.1.2 is dedicated to lower bounds for the degrees.


These bounds require more work and some technical lemmas. The proof we present here does not use martingale inequalities. In Section 3.2, we rigorously prove the existence of a large community by applying the bounds obtained in Section 3.1.2. In Section 3.3 we present an upper bound for the diameter of the graph generated by this model, by coupling it with the graph generated by the Barabasi-Albert model.

3.1 Bounds for the vertices’ degree

3.1.1 Upper bounds

In this section we will establish a simple bound on the probability that a given vertex has a large degree. The result will follow from an application of Azuma's inequality (see Theorem 2.19 from [10]).

We will let $i$ and $j$ denote the $i$-th and $j$-th vertices to enter the random graph process, respectively. The random time at which the $j$-th vertex is added to the graph will be denoted by $T_{j,1}$.

Definition 3. Since the constant $1 - p/2$ is going to appear many times throughout this Chapter, it deserves a special notation. We write

$$c_p := 1 - p/2.$$

Proposition 3.1.1. For each vertex $j$ and each $j \le t_0 \le t$ the sequence of random variables $(Z_t)_{t\ge t_0}$ defined as

$$Z_t := \frac{d_t(j)\mathbb{1}\{T_{j,1}=t_0\}}{\prod_{s=1}^{t-1}\left(1 + \frac{c_p}{s}\right)} \tag{3.1.2}$$

is a martingale starting from $t_0$.

Proof. We consider the process $(G_t)_{t\ge 1}$ to be adapted to a filtration $(\mathcal{F}_t)_{t\ge 1}$. Recall $\Delta d_t(j) = d_{t+1}(j) - d_t(j)$. It is clear that $\Delta d_t(j) \in \{0, 1, 2\}$. Furthermore, conditioned on $\mathcal{F}_t$, we know the probability that $\Delta d_t(j)$ takes each of these values. So, assuming that $j$ already exists at time $t$, we have

$$E[\Delta d_t(j) \mid \mathcal{F}_t] = 1\cdot p\,\frac{d_t(j)}{2t} + 1\cdot(1-p)\,2\,\frac{d_t(j)}{2t}\left(1 - \frac{d_t(j)}{2t}\right) + 2\cdot(1-p)\frac{(d_t(j))^2}{4t^2} = c_p\frac{d_t(j)}{t}.$$
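The last equality is a short algebraic identity: writing $x = d_t(j)/2t$, the three terms add up to $px + 2(1-p)x = (2-p)x = c_p\,d_t(j)/t$. The sketch below verifies it with exact rational arithmetic for a few arbitrary values of $p$, $d$ and $t$; the function name and the test values are illustrative only.

```python
from fractions import Fraction


def expected_increment(p, d, t):
    """Sum of the three contributions to E[Delta d_t(j) | F_t]."""
    x = Fraction(d, 2 * t)                 # probability of one degree-biased draw hitting j
    vertex_step = p * x                    # new vertex attaches to j
    edge_one_end = (1 - p) * 2 * x * (1 - x)   # exactly one end of the new edge is j
    edge_both_ends = 2 * (1 - p) * x * x       # loop at j: degree grows by 2
    return vertex_step + edge_one_end + edge_both_ends


cases = [(Fraction(1, 3), 5, 40), (Fraction(3, 4), 7, 10), (Fraction(0), 2, 1)]
results = [(expected_increment(p, d, t), (1 - p / 2) * Fraction(d, t)) for p, d, t in cases]
print(results)
```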


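The information "vertex $j$ exists at time $t$" can be introduced in the equation using the random variable $\mathbb{1}\{T_{j,1}=t_0\}$. Using the fact that $\mathbb{1}\{T_{j,1}=t_0\}$ is $\mathcal{F}_{t_0}$-measurable, we gain

$$E\left[d_{t+1}(j)\mathbb{1}\{T_{j,1}=t_0\} \,\middle|\, \mathcal{F}_t\right] = \left(1 + \frac{c_p}{t}\right)d_t(j)\mathbb{1}\{T_{j,1}=t_0\}. \tag{3.1.3}$$

Dividing the above equation by $\prod_{s=1}^{t}\left(1 + \frac{c_p}{s}\right)$ we obtain the desired result.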

Now we prove the main result of the section.

Proposition 3.1.4 (Upper bound for the degree). There exists a universal positive constant $C_1$ such that for every vertex $j$ we have that

$$P\left(d_t(j) \ge C_1 t^{c_p}\sqrt{\frac{\log(t)}{j^{1-p}}}\right) \le \frac{1}{t^{100}}.$$

Proof. We define $\phi(t) := \prod_{s=1}^{t-1}\left(1 + \frac{c_p}{s}\right)$. Note that we can write

$$\frac{d_t(j)}{\phi(t)} = \sum_{t_0=j}^{t}\frac{d_t(j)}{\phi(t)}\mathbb{1}\{T_{j,1}=t_0\},$$

and by Proposition 3.1.1 each term in the sum is a martingale. We want to apply Azuma's inequality to each summand, but first we need some bounds on $\phi$. In this direction, Lemma 5.1.2 assures us of the existence of a positive constant $c$ such that $\phi(t) > ct^{c_p}$.
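The polynomial growth of this normalising product can be observed numerically. By the standard identity $\prod_{s=1}^{t-1}(1 + c/s) = \Gamma(t+c)/(\Gamma(1+c)\Gamma(t))$, the product divided by $t^{c}$ converges to $1/\Gamma(1+c)$. The sketch below, with a hypothetical value of $p$, computes this ratio for increasing $t$; it illustrates the growth rate and is not the argument of Lemma 5.1.2.

```python
from math import gamma


def normalising_product(t, c_p):
    """Product of (1 + c_p / s) for s = 1, ..., t - 1."""
    prod = 1.0
    for s in range(1, t):
        prod *= 1 + c_p / s
    return prod


p = 0.4                      # hypothetical parameter value
c_p = 1 - p / 2
ratios = [normalising_product(t, c_p) / t**c_p for t in (10**3, 10**4, 10**5)]
limit = 1 / gamma(1 + c_p)
print(ratios, limit)
```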

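In order to apply Azuma's inequality, we must bound the variation of the random variable $Z_t$ defined in (3.1.2), which satisfies the following upper bound:

$$a_t := \left|\frac{d_{t+1}(j)\mathbb{1}\{T_{j,1}=t_0\}}{\phi(t+1)} - \frac{d_t(j)\mathbb{1}\{T_{j,1}=t_0\}}{\phi(t)}\right| \le \frac{\left|d_{t+1}(j) - \left(1 + \frac{c_p}{t}\right)d_t(j)\right|}{\phi(t+1)} \le \frac{2 + c_p}{\phi(t+1)}, \tag{3.1.5}$$

since $\Delta d_t(j) \le 2$ and $d_t(j) \le t$. By the above discussion about $\phi$ we have that $a_t^2 < \frac{(2+c_p)^2}{c^2}\frac{1}{t^{2-p}}$, which implies $\sum_{s=t_0}^{t} a_s^2 < c' t_0^{-(1-p)}$ for some constant $c' > 0$. Then, applying Azuma's inequality, we obtain

$$P\left(\left|\frac{d_t(j)\mathbb{1}\{T_{j,1}=t_0\}}{\phi(t)} - E\left[\frac{d_{t_0}(j)\mathbb{1}\{T_{j,1}=t_0\}}{\phi(t_0)}\right]\right| > \lambda\right) \le 2\exp\left(-\lambda^2 t_0^{1-p}/2c'\right). \tag{3.1.6}$$

Note that

$$E\left[\frac{d_{t_0}(j)\mathbb{1}\{T_{j,1}=t_0\}}{\phi(t_0)}\right] = \phi(t_0)^{-1}P(T_{j,1} = t_0) \le \phi(t_0)^{-1}.$$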


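By choosing $\lambda = c_2\sqrt{t_0^{p-1}\log(t)}$, where $c_2$ is a sufficiently large positive constant depending only on $p$, we gain

$$P\left(d_t(j)\mathbb{1}\{T_{j,1}=t_0\} \ge c_2 t^{c_p}\sqrt{\frac{\log(t)}{t_0^{1-p}}} + \frac{\phi(t)}{\phi(t_0)}\right) \le \frac{1}{t^{101}}.$$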

Using the union bound and the asymptotic behaviour of $\phi$, we obtain the result for all possible times $t_0 \ge j$: there exists a positive constant $C_1$ depending only on $p$ such that

$$P\left(\bigcup_{t_0=j}^{t}\left\{d_t(j)\mathbb{1}\{T_{j,1}=t_0\} \ge C_1 t^{c_p}\sqrt{\frac{\log(t)}{j^{1-p}}}\right\}\right) \le \frac{1}{t^{100}}.$$

This finishes the proof of the Proposition.

Via a union bound over all vertices $j$ in $G_t$, we obtain the following result:

Corollary 3.1.7 (Upper bound for the maximum degree). Let $d_{max}(G_t)$ denote the maximum degree among all the vertices of $G_t$. Then, there exists a universal positive constant $C_2$ such that

$$P\left(d_{max}(G_t) \ge C_2 t^{c_p}\sqrt{\log(t)}\right) \le \frac{1}{t^{99}}.$$

3.1.2 Lower bounds

This section is devoted to proving the results needed to state a useful lower bound on the degree of the vertices that entered the random graph early in the history of the process. We prove two lemmas that let us control the tail of the random times $T^m_{j,k}$, defined below, before proving the main result, Proposition 3.1.17. First we need some new notation:

Definition 4. Fix a vertex $j$ and two integers $m, k \ge 1$. We define the random time

$$T^m_{j,k} := \inf\left\{t \ge 1 : \sum_{i=(j-1)m+1}^{jm} d_t(i) = k\right\}.$$

We also write $T_{j,k} := T^1_{j,k}$. In other words, $T_{j,k}$ is the first time that the $j$-th vertex has degree at least $k$. $T^m_{j,k}$ can then be explained in the following way: assume that we identify all the vertices 1 through $m$, then identify all the vertices $m+1$ through $2m$, and so on. We let $d_{t,m}(j)$ denote the degree of the $j$-th block of $m$ vertices. Then $T^m_{j,k}$ is the first time that $d_{t,m}(j)$ is larger than or equal to $k$.


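Lemma 3.1.8. Let $0 < \delta < (c_p^{-1} - 1)$. For every $k \ge 1$, every large enough $m \in \mathbb{Z}$ and $j \ge (m^{2/(1-p)} + 1)$, we can construct a sequence $\eta_m, \dots, \eta_k$ of independent random variables, with

$$\eta_i \overset{d}{=} \mathrm{Exp}\left(c_p\left(1 - \frac{1-p}{2(2-p)i^{\delta}}\right)i\right), \quad \text{for } i = m, \dots, k, \tag{3.1.9}$$

such that the whole sequence is independent of $T^m_{j,m}, \dots, T^m_{j,k+1}$, and such that

$$P\left(T^m_{j,k+1} > t\right) \le P\left(T^m_{j,m}\exp\left(\sum_{i=m}^{k}\eta_i\right) > t\right) + \frac{m}{[(j-1)m]^{99}}.$$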

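Proof. We follow the idea of the proof of Lemma 3.1 in [22]. But in our context the existence of the edge-step demands more attention and prevents a straightforward application of this lemma.

We begin by constructing the $k + 1 - m$ independent random variables $\eta_m, \dots, \eta_k$ with distribution given by (3.1.9), the whole sequence being independent of the random times $T^m_{j,m}, \dots, T^m_{j,k+1}$. Observe that

$$\begin{aligned}
P\left(T^m_{j,k+1} > t\right) &= \sum_{s=k}^{\infty} P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right)P\left(T^m_{j,k} = s\right)\\
&= \sum_{s=k}^{k^{1+\delta}} P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right)P\left(T^m_{j,k} = s\right) + \sum_{s=k^{1+\delta}}^{\infty} P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right)P\left(T^m_{j,k} = s\right)\\
&\le P\left(T^m_{j,k} \le k^{1+\delta}\right) + \sum_{s=k^{1+\delta}}^{\infty} P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right)P\left(T^m_{j,k} = s\right).
\end{aligned} \tag{3.1.10}$$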

We obtain an upper bound for the term $P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right)$ in the following way: once the vertex (block) $j$ reaches degree $k$ at time $s$, we must avoid choosing $j$ at all the subsequent steps until time $t$. We note that there exists the possibility that $T^m_{j,k+1} = T^m_{j,k}$, in the case that we add a loop to $j$ at time $T^m_{j,k}$, but in this case $P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right)$ is equal to 0, and our calculations remain the same. Noting that at each step $s+1$ we choose the vertex $j$ with probability

$$c_p\frac{d_s(j)}{s} - \frac{(1-p)d_s^2(j)}{4s^2},$$


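and recalling that $c_p := 1 - p/2$, we obtain, for $s \ge k^{1+\delta}$,

$$\begin{aligned}
P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right) &\le \prod_{r=s}^{t-1}\left(1 - c_p\frac{k}{r} + \frac{(1-p)k^2}{4r^2}\right)\\
&= \prod_{r=s}^{t-1}\left[1 - \frac{c_p k}{r}\left(1 - \frac{(1-p)k}{2(2-p)r}\right)\right]\\
&\le \prod_{r=s}^{t-1}\left[1 - \frac{c_p k}{r}\left(1 - \frac{(1-p)}{2(2-p)k^{\delta}}\right)\right].
\end{aligned} \tag{3.1.11}$$

We introduce the notation

$$\xi_k := \frac{(1-p)}{2(2-p)k^{\delta}}. \tag{3.1.12}$$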

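Observe that

$$1 - \frac{c_p k(1-\xi_k)}{s} \le \exp\left(\frac{1}{s}\right)^{-c_p k(1-\xi_k)} \le \left(1 + \frac{1}{s}\right)^{-c_p k(1-\xi_k)} = \left(\frac{s}{s+1}\right)^{c_p k(1-\xi_k)}.$$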

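Plugging the above inequality into (3.1.11), noting that this results in a telescopic product, and recalling the definition of $\eta_i$, we get

$$P\left(T^m_{j,k+1} > t \,\middle|\, T^m_{j,k} = s\right) \le \left(\frac{s}{t}\right)^{c_p(1-\xi_k)k} = P\left(T^m_{j,k}e^{\eta_k} > t \,\middle|\, T^m_{j,k} = s\right).$$

Combining the above inequality with (3.1.10), we obtain

$$\begin{aligned}
P\left(T^m_{j,k+1} > t\right) &\le P\left(T^m_{j,k} \le k^{1+\delta}\right) + \sum_{s=k^{1+\delta}}^{\infty} P\left(T^m_{j,k}e^{\eta_k} > t \,\middle|\, T^m_{j,k} = s\right)P\left(T^m_{j,k} = s\right)\\
&\le P\left(T^m_{j,k} \le k^{1+\delta}\right) + \sum_{s=k}^{\infty} P\left(T^m_{j,k}e^{\eta_k} > t \,\middle|\, T^m_{j,k} = s\right)P\left(T^m_{j,k} = s\right)\\
&= P\left(T^m_{j,k} \le k^{1+\delta}\right) + P\left(T^m_{j,k}e^{\eta_k} > t\right).
\end{aligned} \tag{3.1.13}$$

Write

$$err(k) := P\left(T^m_{j,k} \le k^{1+\delta}\right).$$

By the above equation, we also have, recalling that $\eta_k$ is independent of both $T^m_{j,k-1}$ and $\eta_{k-1}$,

$$\begin{aligned}
P\left(T^m_{j,k}e^{\eta_k} > t\right) &= \int_0^{\infty} P\left(T^m_{j,k} > \frac{t}{s}\right)P\left(e^{\eta_k} = ds\right)\\
&\le \int_0^{\infty}\left[P\left(T^m_{j,k-1}e^{\eta_{k-1}} > \frac{t}{s}\right) + err(k-1)\right]P\left(e^{\eta_k} = ds\right)\\
&\le P\left(T^m_{j,k-1}e^{(\eta_k+\eta_{k-1})} > t\right) + err(k-1),
\end{aligned} \tag{3.1.14}$$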


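where $P\left(e^{\eta_k} = ds\right)$ denotes the measure on $\mathbb{R}$ induced by the random variable $e^{\eta_k}$. Proceeding in this way, we obtain

$$P\left(T^m_{j,k+1} > t\right) \le P\left(T^m_{j,m}\exp\left(\sum_{i=m}^{k}\eta_i\right) > t\right) + \sum_{n=m}^{k} err(n).$$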

It remains to be shown that the sum of errors is sufficiently small. First we note that:

$$\left\{T^m_{j,n} \le n^{1+\delta}\right\} = \left\{\sum_{i=m(j-1)+1}^{mj} d_{n^{1+\delta}}(i) \ge n\right\}.$$

But for small $n$ and large $j$ the above event is actually empty, since none of the $m$ vertices in the $j$-th block has had enough time to be added by the process. In order for the above event to be non-empty, we need at least one of the random variables $d_{n^{1+\delta}}(i)$, for $i \in \{(j-1)m+1, \dots, jm\}$, to be not identically null. For this, $n$ and $j$ must satisfy the inequality below:

$$n^{1+\delta} \ge (j-1)m + 1 \iff n \ge [(j-1)m+1]^{\frac{1}{1+\delta}}.$$

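A straightforward application of Dirichlet's pigeon-hole principle shows that

$$\left\{T^m_{j,n} \le n^{1+\delta}\right\} \subset \bigcup_{i=(j-1)m+1}^{jm}\left\{d_{n^{1+\delta}}(i) \ge \frac{n}{m}\right\}.$$

We note that, if

$$\frac{n}{m} \ge C_1\frac{n^{(1+\delta)c_p}\sqrt{(1+\delta)\log n}}{(j-1)^{\frac{1-p}{2}}}, \tag{3.1.15}$$

then

$$\left\{d_{n^{1+\delta}}(i) \ge \frac{n}{m}\right\} \subset \left\{d_{n^{1+\delta}}(i) \ge C_1\frac{n^{(1+\delta)c_p}\sqrt{(1+\delta)\log n}}{(j-1)^{\frac{1-p}{2}}}\right\}.$$

But (3.1.15) is always valid for large $n$, since $(1+\delta)c_p < 1$ and $j \ge m^{2/(1-p)} + 1$. Since $i > (j-1)m$, Proposition 3.1.4 implies

$$P\left(T^m_{j,n} \le n^{1+\delta}\right) \le mn^{-100(1+\delta)}.$$

Consequently

$$\sum_{n=1}^{k} err(n) = \sum_{n=[(j-1)m+1]^{\frac{1}{1+\delta}}}^{k} err(n) \le \frac{m}{[(j-1)m]^{99}},$$

which concludes the proof.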


Lemma 3.1.16. For every vertex $j$ and all $m, R \in \mathbb{N}$, there exists a positive constant $c = c_{m,R,p} > 0$ such that

$$E\left[T^R_{jm,1}\right] \le cj^R.$$

Proof. Note that we can write

$$T_{jm,1} = 1 + \sum_{i=1}^{jm-1}(T_{i+1,1} - T_{i,1}),$$

so that $T_{jm,1}$ is distributed as 1 plus a sum of $jm-1$ independent geometric random variables of parameter $p$. Recall that a random variable which follows a negative binomial distribution of parameters $jm-1$ and $p$ has moment generating function

$$G(s) = \frac{(1-p)^{jm-1}}{(1-pe^s)^{jm-1}}.$$

By taking the $R$-th derivative of $G(s)$ and evaluating it at 0, one can conclude the Lemma's statement.

Now we state and prove the main theorem of this section.

Proposition 3.1.17 (Lower bound for the degree). Fix $m$ sufficiently large, and let

$$1 < R < mc_p(1-\xi_m).$$

Then there exists a positive constant $c = c(m, R, p)$ such that, for

$$\beta \in (0, c_p(1-\xi_m)),$$

we have:

$$P\left(d_{t,m}(j) < t^{\beta}\right) \le c\,\frac{j^R}{t^{R - \beta R/c_p(1-\xi_m)}} + \frac{m}{[(j-1)m]^{99}}.$$

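Proof. Fix $m$ sufficiently large. By Lemma 3.1.8,

$$P\left(d_{t,m}(j) < t^{\beta}\right) = P\left(T^m_{j,t^{\beta}} > t\right) \le P\left(T^m_{j,m}\exp\left(\sum_{i=m}^{t^{\beta}}\eta_i\right) > t\right) + \frac{m}{[(j-1)m]^{99}}. \tag{3.1.18}$$

We need to control the first term on the right-hand side of the inequality. We have that

$$P\left(T^m_{j,m}\exp\left(\sum_{i=m}^{t^{\beta}}\eta_i\right) > t\right) = \sum_{n=1}^{\infty} P\left(\exp\left(\sum_{i=m}^{t^{\beta}}\eta_i\right) > t/n\right)P\left(T^m_{j,m} = n\right), \tag{3.1.19}$$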


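because $T^m_{j,m}$ is independent of $\eta_i$ for all possible values of $i$. Since $\eta_i \overset{d}{=} \mathrm{Exp}(ic_p(1-\xi_i))$, we have that its moment generating function is given by

$$G_i(s) = \frac{1}{1 - \frac{s}{ic_p(1-\xi_i)}},$$

for $s < ic_p(1-\xi_i)$. Then, for $1 < R < mc_p(1-\xi_m)$, Markov's inequality implies

$$P\left(\exp\left(\sum_{i=m}^{t^{\beta}}\eta_i\right) > t/n\right) \le \frac{n^R}{t^R}\prod_{i=m}^{t^{\beta}}\left(1 - \frac{R/[c_p(1-\xi_i)]}{i}\right)^{-1} \le \frac{n^R}{t^R}\prod_{i=m}^{t^{\beta}}\left(1 - \frac{R/[c_p(1-\xi_m)]}{i}\right)^{-1}. \tag{3.1.20}$$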

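The last product can be written in terms of the Gamma function, using its multiplicative property. Note that

$$\prod_{i=m}^{t^{\beta}}\left(1 - \frac{R/c_p(1-\xi_m)}{i}\right) = \frac{\Gamma(m)\,\Gamma\left(t^{\beta} + 1 - R/c_p(1-\xi_m)\right)}{\Gamma\left(m - R/c_p(1-\xi_m)\right)\Gamma(t^{\beta} + 1)}.$$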

This in turn implies the existence of a constant $b = b_{m,R,p} > 0$ such that

$$\prod_{i=m}^{t^{\beta}}\left(1 - \frac{R/c_p(1-\xi_m)}{i}\right) > bt^{-\beta R(c_p(1-\xi_m))^{-1}}.$$

Then, combining this bound with inequality (3.1.20), we obtain

$$P\left(\exp\left(\sum_{i=m}^{t^{\beta}}\eta_i\right) > t/n\right) \le \frac{b\,n^R\,t^{\beta R(c_p(1-\xi_m))^{-1}}}{t^R}. \tag{3.1.21}$$

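Notice that, by the time that the $jm$-th vertex enters the graph, the $j$-th block of $m$ vertices has total degree at least $m$, that is, $T^m_{j,m} \le T_{jm,1}$. This fact, together with (3.1.19), Lemma 3.1.16, and the above inequality, implies

$$\begin{aligned}
P\left(T^m_{j,m}\exp\left(\sum_{i=m}^{t^{\beta}}\eta_i\right) > t\right) &\le \frac{b}{t^{R - \beta R(c_p(1-\xi_m))^{-1}}}\sum_{n=1}^{\infty} n^R P\left(T^m_{j,m} = n\right)\\
&\le \frac{b}{t^{R - \beta R(c_p(1-\xi_m))^{-1}}}\sum_{n=1}^{\infty} n^R P\left(T_{jm,1} = n\right)\\
&= \frac{b}{t^{R - \beta R(c_p(1-\xi_m))^{-1}}}\,E\left[T^R_{jm,1}\right]\\
&\le c\,\frac{j^R}{t^{R - \beta R(c_p(1-\xi_m))^{-1}}} \quad \text{(by Lemma 3.1.16)},
\end{aligned} \tag{3.1.22}$$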


for some constant $c = c_{m,p,R} > 0$. Combining (3.1.22) with (3.1.18) gives the desired result.

Recall that in Corollary 3.1.7 we proved an upper bound for the maximum degree, i.e., we showed that, w.h.p.,

$$d_{max}(G_t) \le C_1 t^{1-p/2}\sqrt{\log(t)}.$$

The Corollary below states that we also have $d_{max}(G_t) \ge t^{1-p/2-\varepsilon}$ w.h.p., in which $\varepsilon$ is as small as we want.

Corollary 3.1.23 (Lower bound for the maximum degree). For all $\varepsilon > 0$, there exists a positive constant $C_2 = c(\varepsilon)$ such that

$$P\left(d_{max}(G_t) \le C_2 t^{1-p/2-\varepsilon}\right) = o(1).$$

Proof. It is enough to show that there exists at least one vertex $j$ with degree large enough, w.h.p. For this, in Proposition 3.1.17, choose $m$ large enough so that $\xi_m \le \varepsilon$, take $\beta = (1-\varepsilon)c_p(1-\xi_m)$, and let $j$ be the $t^{\varepsilon R/2}$-th block of $m$ vertices. With all these choices, Proposition 3.1.17 gives us

$$P\left(d_{t,m}(t^{\varepsilon R/2}) < t^{(1-\varepsilon)(1-p/2)}\right) \le \frac{c}{t^{\varepsilon R/2}}.$$

The above inequality means that, w.h.p., at least one vertex in the $t^{\varepsilon R/2}$-th block of vertices has degree greater than $t^{(1-\varepsilon)(1-p/2)}/m$, which concludes the proof.

3.2 The Community

This section is devoted to proving the existence, asymptotically almost surely, of a large community (clique, complete subgraph) in the graph generated by the p-model's dynamics, i.e., the existence of a community in $G_t$ whose size grows to infinity polynomially in $t$. The statement is formalized in the theorem below.

Theorem 3.2.1 (The Community). Let $\varepsilon > 0$. Then $G_{2t}$ has a complete subgraph of order $t^{(1-\varepsilon)\frac{(1-p)}{2-p}}$, asymptotically almost surely.

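Proof. Fix $\varepsilon > 0$. Let $\alpha = (1-\varepsilon)\frac{(1-p)}{2-p}$, fix $\beta = \frac{1 + \varepsilon 2^{-1}(1-p)}{2}$ and choose $\varepsilon' > 0$ such that $\varepsilon' < \alpha$. By Proposition 3.1.17 and the union bound we have that

$$P\left(\bigcup_{j=t^{\varepsilon'}}^{t^{\alpha}}\left\{d_{t,m}(j) < t^{\beta}\right\}\right) \le \frac{C}{t^{R - \frac{\beta R}{c_p(1-\xi_m)} - \alpha(R+1)}} + \frac{m}{(mt)^{98\varepsilon'}}. \tag{3.2.2}$$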


For our choice of $\alpha$, $\beta$, and for sufficiently large $m$ and $R$, the right-hand side of the above inequality goes to 0 as $t$ goes to infinity. The immediate consequence of this fact is that inside each one of those $t^{\alpha} - t^{\varepsilon'}$ blocks of vertices of size $m$ there exists at least one vertex whose degree is larger than $t^{\beta}/m$, with high probability. We denote by $L_j = L_{j,m,p}$ the vertex of largest degree among the vertices of the $j$-th block, and we call $L_j$ the leader of its block. As we noted, $d_t(L_j) \ge t^{\beta}/m$.

We now prove that these vertices of large degree are connected with high probability. We write $i \nleftrightarrow j$ to denote the fact that there are no edges between the vertices $i$ and $j$. We define the event

$$B_t = \bigcup_{j=t^{\varepsilon'}}^{t^{\alpha}}\left\{d_{t,m}(j) < t^{\beta}\right\}.$$

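Let $g_s(i, j)$ be the indicator function of the event where we add an edge between $L_i$ and $L_j$ in an edge-step at time $s$. We define the random variable

$$Y^{ij}_{2t} = \prod_{s=t+1}^{2t}(1 - g_s(i, j)).$$

In other words, $Y^{ij}_{2t}$ is the indicator function of the event where we do not connect $L_i$ and $L_j$ in any of the edge-steps between times $t+1$ and $2t$. We have that

$$E\left[\mathbb{1}_{B_t^c}(1 - g_{2t}(i, j)) \,\middle|\, \mathcal{F}_{2t-1}\right] = \left(1 - \frac{2(1-p)d_{2t-1}(L_i)d_{2t-1}(L_j)}{4(2t-1)^2}\right)\mathbb{1}_{B_t^c} \le \left(1 - \frac{(1-p)t^{2\beta}}{8t^2m^2}\right)\mathbb{1}_{B_t^c}. \tag{3.2.3}$$

Now, observe that $Y^{ij}_{2t} = Y^{ij}_{2t-1}(1 - g_{2t}(i, j))$. So, by using (3.2.3), we obtain

$$E\left[\mathbb{1}_{B_t^c}Y^{ij}_{2t} \,\middle|\, \mathcal{F}_{2t-1}\right] \le \left(1 - \frac{(1-p)t^{2\beta}}{8t^2m^2}\right)Y^{ij}_{2t-1}\mathbb{1}_{B_t^c}. \tag{3.2.4}$$

Proceeding inductively, taking the conditional expectation with respect to $\mathcal{F}_{s-1}$ at each step $s$, we gain the following inequality:

$$E\left[\mathbb{1}_{B_t^c}Y^{ij}_{2t} \,\middle|\, \mathcal{F}_t\right] \le \left(1 - \frac{(1-p)t^{2\beta}}{8t^2m^2}\right)^{t}\mathbb{1}_{B_t^c} \le \exp\left(-c_1t^{2\beta-1}\right)\mathbb{1}_{B_t^c}, \tag{3.2.5}$$

where $c_1$ is a positive constant depending on both $p$ and $m$. Since $\mathbb{1}\{L_i \nleftrightarrow L_j \text{ in } G_{2t}\} \le Y^{ij}_{2t}$, the above inequality implies

$$P\left(L_i \nleftrightarrow L_j \text{ in } G_{2t},\, B_t^c\right) \le \exp\left(-c_1t^{2\beta-1}\right)P(B_t^c).$$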


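Now, by the union bound, we have

$$P\left(\bigcup_{t^{\varepsilon'} \le i,j \le t^{\alpha}}\left\{L_i \nleftrightarrow L_j \text{ in } G_{2t}\right\},\, B_t^c\right) \le t^{2\alpha}\exp\left(-c_1t^{2\beta-1}\right)P(B_t^c).$$

And finally,

$$\begin{aligned}
P\left(\bigcup_{t^{\varepsilon'} \le i,j \le t^{\alpha}}\left\{L_i \nleftrightarrow L_j \text{ in } G_{2t}\right\}\right) &= P\left(\bigcup_{t^{\varepsilon'} \le i,j \le t^{\alpha}}\left\{L_i \nleftrightarrow L_j \text{ in } G_{2t}\right\},\, B_t^c\right) + P\left(\bigcup_{t^{\varepsilon'} \le i,j \le t^{\alpha}}\left\{L_i \nleftrightarrow L_j \text{ in } G_{2t}\right\},\, B_t\right)\\
&\le t^{2\alpha}\exp\left(-c_1t^{2\beta-1}\right)P(B_t^c) + P(B_t).
\end{aligned}$$

The above inequality, together with our choice of $\alpha$, $\beta$, and inequality (3.2.2), implies the existence of a complete subgraph of $G_{2t}$ with order $t^{\alpha}(1 - o(1))$ asymptotically almost surely.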

3.3 Small-World Phenomenon

In this section we obtain an upper bound for $\mathrm{diam}(G_t)$ of order $\log(t)$. The bound is an immediate consequence of the next Theorem.

Theorem 3.3.1. Let $G^{(p)}_t$ be the graph generated by the p-model and $G^{(1)}_t$ the graph generated by the Barabasi-Albert model. Then there exists a coupling of the models which gives

$$P\left(\mathrm{diam}\left(G^{(p)}_t\right) \le \mathrm{diam}\left(G^{(1)}_t\right)\right) = 1.$$

Proof. We construct the coupling explicitly.

1. Start with $G_1$, the graph with one vertex and one loop attached to it.

2. Obtain $G_{t+1}$ from $G_t$ in the following way:

• With probability $p$ add a solid vertex and connect it to $u \in G_t$ chosen with probability $d_{G_t}(u)/2t$;

• With probability $1-p$ add a ghost vertex and let two edges leave from it. Choose $u_1, u_2 \in G_t$ to be the tails of each edge, independently and again with probability $d_{G_t}(u)/2t$.


We will call the first edge the attractive edge, and it will be represented by a dashed edge in the drawings below. The second one will be called the connective edge and will be represented by a solid edge. The ghost vertices will be represented by circles, whereas the solid vertices will be black disks.

Fig. 3.1: A ghost-vertex with its attractive edge and connective edge

Before we continue, we must make an important observation.

Observation. The degree of a vertex does not count attractive edges.

From the dynamics we have just described we may conclude that the graph $G^{(p)}_t$ may be obtained from $G_t$ simply by identifying the ghost vertices with the tails of their attractive edges, whereas the graph $G^{(1)}_t$ is obtained from $G_t$ just by declaring all vertices equal and removing the attractive edges.

To see why the first statement is true, observe that the presence of the attractive edge is just another way to add a new edge to $G^{(p)}_t$. The attractive edge tells us where the head of the new edge goes, whereas the connective edge tells us where its tail goes. For the second statement, if we always ignore attractive edges and treat all vertices as genuine vertices, then it does not matter which step we take: we always add a new vertex and connect it to an existing vertex with probability proportional to its degree.

Now, observe that for each path $\gamma_{G^{(1)}_t}$ in $G^{(1)}_t$ we have two options: it uses ghost vertices or it does not. A path $\gamma$ that uses only solid vertices is also contained in $G^{(p)}_t$, which means $\gamma$ has the same, or smaller, size in $G^{(p)}_t$. On the other hand, if $\gamma_{G^{(1)}_t}$ does use at least one ghost vertex, then we have two options: the attractive edge of this vertex points to another vertex outside of $\gamma_{G^{(1)}_t}$ or it points to another vertex in $\gamma_{G^{(1)}_t}$.

Fig. 3.2: A path with attractive edge pointing outside


Observe that when the attractive edge points outside $\gamma_{G^{(1)}_t}$ we do not increase the size of the path when we regard it as a path in $G^{(p)}_t$, whereas if the attractive edge points to a vertex in $\gamma_{G^{(1)}_t}$, then when we identify the vertices to regard $\gamma$ as a path in $G^{(p)}_t$ we may decrease its size.

Fig. 3.3: A path with attractive edge pointing inside

This proves that all paths in $G^{(p)}_t$ are smaller than the paths in $G^{(1)}_t$, which gives us the desired result.

The above Theorem, combined with the upper bound for $\mathrm{diam}(G^{(1)}_t)$ given in [7], gives us the following Corollary.

Corollary 3.3.2 (Small-world). There exists a positive constant $C$ such that

$$\mathbb{P}\left(\mathrm{diam}\left(G^{(p)}_t\right) \ge C\log(t)\right) = o(1).$$


4. THE p(t)-MODEL

The main objective of this chapter is to introduce a model we proposed and to report our progress in investigating it. We intend to show some results which may clarify why it is a good model to study.

The p(t)-model is a modification of the p-model in which the parameter p is taken to be a decreasing function of the time t. We also demand that this function p(t) goes to zero as t goes to infinity and that it is bounded from above by one. Recall that in the p-model, the parameter p is the probability of adding a new vertex to the graph. So, in the p(t)-model this probability goes to zero as time goes to infinity. This property is meant to make the model more realistic, since in many complex networks, such as social ones, the probability that new individuals join the network decreases as the network's order increases.

Since this model is very similar to the p-model, we describe its dynamics only briefly. The model has two parameters: a decreasing function of the time $p(t)$ and an initial graph $G_1$, which we will take to be the graph with one vertex and one loop. As in the p-model, we consider the following two stochastic operations that can be performed on the graph $G$:

• Vertex-step - Add a new vertex $v$, and add an edge $\{u, v\}$ by choosing $u \in G$ with probability proportional to its degree.

• Edge-step - Add a new edge $\{u_1, u_2\}$ by independently choosing vertices $u_1, u_2 \in G$ with probability proportional to their degrees. We note that we allow loops to be added, and we also allow a new connection to be added between vertices that already share an edge.

Then, we obtain $G_{t+1}$ by performing a vertex-step on $G_t$ with probability $p(t)$ or an edge-step with probability $1 - p(t)$.
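The dynamics above can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the thesis: the function name `simulate_pt_model`, the `endpoints` list used for degree-proportional sampling, and the example choice $p(t) = t^{-1/2}$ are all assumptions made for the sketch.

```python
import random


def simulate_pt_model(steps, p, seed=0):
    """Run `steps` steps of the p(t)-model and return the list of degrees.

    `p` maps a time t >= 1 to the probability of a vertex-step at time t.
    Picking a uniform entry of `endpoints` selects a vertex with
    probability proportional to its degree.
    """
    rng = random.Random(seed)
    degree = [2]        # G_1: a single vertex with one loop (degree 2)
    endpoints = [0, 0]  # every edge contributes both of its endpoints
    for t in range(1, steps + 1):
        if rng.random() < p(t):
            # Vertex-step: new vertex v attached to a degree-biased u.
            u = rng.choice(endpoints)
            v = len(degree)
            degree.append(0)
        else:
            # Edge-step: both endpoints are degree-biased and independent,
            # so loops and multiple edges are allowed.
            u = rng.choice(endpoints)
            v = rng.choice(endpoints)
        degree[u] += 1
        degree[v] += 1  # a loop (u == v) adds 2 to the same vertex
        endpoints.extend((u, v))
    return degree


if __name__ == "__main__":
    degrees = simulate_pt_model(10_000, lambda t: t ** -0.5, seed=1)
    print(len(degrees), max(degrees))
```

Each step adds exactly one edge, so the sum of the degrees after `steps` steps is always `2 * (steps + 1)`, which gives a quick sanity check on any run.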


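| $p(t)$ | Power-law exponent | Expected max. degree | Diameter |
|---|---|---|---|
| $p(t) = p$ | $2 + \frac{p}{2-p}$ | $\Theta\left(t^{\frac{2-p}{2}}\right)$ | $O(\log(t))$ |
| $p(t) = 1/\log(t)$ | $2$ | $\Theta\left(\frac{t}{\log(t)}\right)$ | $\ge \frac{\log(t)}{\log(\log(t))}$ |
| $p(t) = 1/t^{\gamma}$ | $2 - \gamma$ | $\Theta(t)$ | constant |

Tab. 4.1: Comparison table among the different values for p(t).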

4.1 Continuity of the parameter

From [10] and Proposition 3.1.1 two properties of the p-model are known:

1. It obeys a power-law distribution of exponent $\beta = 2 + p/(2-p)$;

2. The expected value of the maximum degree satisfies:

$$\mathbb{E}\left[d_{\max}(G_t)\right] = \Theta\left(t^{1-p/2}\right).$$

Looking at these two properties, we note that $\beta$ tends to 2 and the expected value of the maximum degree tends to $t$ as $p$ goes to zero. Of course, taking the limit in $p$ is meaningless, since $p$ is fixed in the p-model. Even if $p$ is very close to zero, we still have $\beta > 2$ and the expected value of the maximum degree far from being of order $t$. In this way, the p(t)-model gives a proper sense to this limit, since its parameter goes to zero. So, we would expect it to have the following two properties:

1. It obeys a power-law distribution of exponent $\beta = 2$;

2. The expected value of the maximum degree satisfies:

$$\mathbb{E}\left[d_{\max}(G_t)\right] = \Theta(t).$$

Curiously, that is not the case. The p(t)-model may behave very differently depending on how fast $p(t)$ goes to zero. Table 4.1 summarizes this brief discussion.


4.2 The case $p(t) = 1/t^{\gamma}$

In this section we denote by $G^{\gamma}_t$ the graph generated by the p(t)-model with $p(t) = t^{-\gamma}$, where $\gamma \in (0, 1)$. We begin by noticing that $\mathbb{E}[|V(G^{\gamma}_t)|] = \Theta(t^{1-\gamma})$, since we may write

$$|V(G^{\gamma}_t)| = \sum_{s=2}^{t} \mathrm{Ber}(p(s)),$$

in which $(\mathrm{Ber}(p(s)))_s$ are independent random variables following a Bernoulli distribution of parameter $p(s)$. By Chernoff bounds we may assure that the number of vertices in $G^{\gamma}_t$ is close to its mean w.h.p.
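As a quick numerical illustration of the order $t^{1-\gamma}$, the mean of the Bernoulli sum can be evaluated directly. The sketch below writes $\gamma$ for the exponent in $p(t) = t^{-\gamma}$; the helper names are hypothetical, and the comparison constant $1/(1-\gamma)$ comes from the standard integral comparison $\sum_{s \le t} s^{-\gamma} \approx t^{1-\gamma}/(1-\gamma)$ rather than from the text.

```python
def expected_vertices(t, gamma):
    """Mean of the Bernoulli sum: sum of s**(-gamma) for s = 2, ..., t."""
    return sum(s ** -gamma for s in range(2, t + 1))


def ratio_to_power(t, gamma):
    """Mean divided by t**(1 - gamma); tends to 1 / (1 - gamma) as t grows."""
    return expected_vertices(t, gamma) / t ** (1 - gamma)


if __name__ == "__main__":
    for t in (10**3, 10**4, 10**5, 10**6):
        print(t, ratio_to_power(t, 0.5))
```

The ratio stabilising near a positive constant is exactly the statement that the expected number of vertices is of order $t^{1-\gamma}$.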

4.2.1 Power-law distribution

The proof of the power-law distribution is essentially the same as the one we gave in Chapter 2 for Holme-Kim's model, which is very similar to that given in [10] for the p-model. For this reason we will just point out the main differences for this case.

We keep the notation used in Chapter 2, denoting the number of vertices of degree $d$ by $N_{t,\gamma}(d)$. We are interested in evaluating the limit of $\mathbb{E}[N_{t,\gamma}(d)]/t^{1-\gamma}$ when $t$ goes to infinity. Notice the term $t^{1-\gamma}$, which represents the order of the total number of vertices in $G^{\gamma}_t$, instead of the usual $t$ as in Theorem 2.1.3.

Theorem 4.2.1 (Power-law distribution). There exists a positive constant $c(\gamma)$ such that

$$\frac{\mathbb{E}\left[N_{t,\gamma}(d)\right]}{t^{1-\gamma}} \longrightarrow \frac{c(\gamma)}{d^{2-\gamma}}.$$

Proof. Let $a_t(d)$ be $\mathbb{E}[N_{t,\gamma}(d)]$. Proceeding as in the proof of Theorem 2.1.3 we may derive the following recurrence relation for $a_t$:

$$a_{t+1}(d) = \left(1 - \frac{(2-p(t))d}{2t}\right)a_t(d) + \frac{(d-1)(2-p(t))}{2t}\,a_t(d-1) + O\left(t^{-1}\right). \tag{4.2.2}$$

So, supposing

$$\frac{a_t(k)}{t^{1-\gamma}} \longrightarrow M_k$$

for all $k \le d-1$ and writing $a'_t(d) = t^{\gamma}a_t(d)$, we have

$$\frac{a'_t(d)}{t} = \frac{a_t(d)}{t^{1-\gamma}}$$

and that $a'_t(d)$ satisfies the following recurrence relation

$$a'_{t+1}(d) = \left(1 - \frac{(2-p(t))d}{2t}\right)\left(1+\frac{1}{t}\right)^{\gamma}a'_t(d) + \frac{(d-1)(2-p(t))}{2t^{1-\gamma}}\,a_t(d-1) + O\left(t^{-1+\gamma}\right). \tag{4.2.3}$$

Now, observe that as $t$ gets large we may write

$$\left(1+\frac{1}{t}\right)^{\gamma} = 1 + \frac{\gamma}{t} + O\left(t^{-2}\right).$$

Returning to (4.2.3) we have

$$a'_{t+1}(d) = \left(1 - \frac{(2-p(t))d - 2\gamma}{2t}\right)a'_t(d) + \frac{(d-1)(2-p(t))}{2t^{1-\gamma}}\,a_t(d-1) + O\left(t^{-1+\gamma}\right).$$

Finally, using Lemma 5.1.1 and the inductive hypothesis we obtain

$$\frac{\mathbb{E}\left[N_{t,\gamma}(d)\right]}{t^{1-\gamma}} = \frac{a'_t(d)}{t} \longrightarrow \frac{d-1}{d+1-\gamma}\,M_{d-1},$$

which is enough to conclude the desired result.
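To see numerically why the limiting recursion $M_d = \frac{d-1}{d+1-\gamma}M_{d-1}$ produces the exponent $2-\gamma$, one can iterate it and fit the slope of $\log M_d$ against $\log d$. The following is a minimal sketch with hypothetical helper names; it normalises by $M_1$ and uses the identity $M_d/M_1 = \Gamma(d)\Gamma(3-\gamma)/\Gamma(d+2-\gamma)$, which follows from telescoping the product.

```python
import math


def log_limit_ratio(d, gamma):
    """log(M_d / M_1) for the recursion M_k = (k-1)/(k+1-gamma) * M_{k-1}."""
    return sum(math.log((k - 1) / (k + 1 - gamma)) for k in range(2, d + 1))


def fitted_exponent(gamma, d_small=1_000, d_large=100_000):
    """Slope of log M_d against log d between two large degrees."""
    rise = log_limit_ratio(d_large, gamma) - log_limit_ratio(d_small, gamma)
    return rise / (math.log(d_large) - math.log(d_small))


if __name__ == "__main__":
    for gamma in (0.25, 0.5, 0.75):
        # The fitted slope should be close to -(2 - gamma).
        print(gamma, fitted_exponent(gamma))
```

The slope is only an empirical fit over a finite range of degrees; the exact statement is the asymptotic $M_d \sim \Gamma(3-\gamma)\,M_1\,d^{-(2-\gamma)}$.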

4.2.2 The expected maximum degree

The result is an immediate consequence of the Proposition below, which is essentially Proposition 3.1.1, proved for the case $p(t) = p$.

Proposition 4.2.4. For each vertex $j$ and each $j \le t_0 \le t$, the sequence of random variables $(Z_t)_{t \ge t_0}$ defined as

$$Z_t := \frac{d_t(j)\,\mathbb{1}\{T_{j,1} = t_0\}}{\prod_{s=1}^{t-1}\left(1 + \frac{1 - p(s)/2}{s}\right)} \tag{4.2.5}$$

is a martingale starting from $t_0$.

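Proof. As in the proof of Proposition 3.1.1, define $\Delta d_t(j) := d_{t+1}(j) - d_t(j)$. So, assuming that $j$ already exists at time $t$, we have

$$\begin{aligned}
\mathbb{E}\left[\Delta d_t(j) \mid \mathcal{F}_t\right] &= 1\cdot p(t)\,\frac{d_t(j)}{2t} + 1\cdot(1-p(t))\,\frac{2d_t(j)}{2t}\left(1 - \frac{d_t(j)}{2t}\right) + 2\cdot(1-p(t))\,\frac{(d_t(j))^2}{4t^2} \\
&= \left(1 - \frac{p(t)}{2}\right)\frac{d_t(j)}{t}.
\end{aligned} \tag{4.2.6}$$

Dividing the above equation by $\prod_{s=1}^{t}\left(1 + \frac{1 - p(s)/2}{s}\right)$ we obtain the desired result.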


From the Proposition, specifically from equation (4.2.6), we have that

$$\mathbb{E}\left[d_t(j)\right] = \prod_{s=j}^{t}\left(1 + \frac{1 - s^{-\gamma}/2}{s}\right) = \Theta\left(\frac{t}{j}\right),$$

which implies the desired result.

Observation. From the above equation we can also deduce the expected degree when $p(t) = 1/\log(t)$.

4.3 The case p(t) = 1/ log(t)

4.3.1 Power-law distribution

This time we denote the random graph by $G_t$ and we observe that $|V(G_t)| = \Theta(t/\log(t))$.

Theorem 4.3.1 (Power-law distribution). There exists a positive constant $c_1$ such that

$$\frac{\mathbb{E}\left[N_t(d)\right]}{t/\log(t)} \longrightarrow \frac{c_1}{(d+1)d}.$$

Proof. The proof is essentially the same as the one we gave for Theorem 4.2.1, but here we let

$$a'_t(d) = a_t(d)\log(t).$$

This leads to the following recurrence relation involving $a'_t(d)$:

$$a'_{t+1}(d) = \left(1 - \frac{(2-p(t))d}{2t}\right)\frac{\log(t+1)}{\log(t)}\,a'_t(d) + \frac{\log(t+1)}{\log(t)}\,\frac{(d-1)(2-p(t))}{2t}\,a'_t(d-1). \tag{4.3.2}$$

Now, observe that as $t$ gets large we may write

$$\frac{\log(t+1)}{\log(t)} = 1 + \frac{1}{t\log(t)} + O\left(t^{-2}\right),$$

which yields

$$a'_{t+1}(d) = \left(1 - \frac{(2-p(t))d}{2t} + O\left(\frac{1}{t\log(t)}\right)\right)a'_t(d) + \frac{\log(t+1)}{\log(t)}\,\frac{(d-1)(2-p(t))}{2t}\,a'_t(d-1).$$

Finally, using Lemma 5.1.1 and the inductive hypothesis we obtain

$$\frac{\mathbb{E}\left[N_t(d)\right]}{t/\log(t)} = \frac{a'_t(d)}{t} \longrightarrow \frac{d-1}{d+1}\,M_{d-1},$$

which is enough to conclude the desired result.


4.3.2 Lower bounds for the diameter

Here we prove a lower bound for the diameter of $G_t$ by showing that it contains a large number of paths of size $\log(t)/\log(\log(t))$, which we call snakes. The main result is the following.

Theorem 4.3.3. There exists a positive constant $c$ such that

$$\mathbb{P}\left(\mathrm{diam}(G_t) \ge c\,\frac{\log(t)}{\log(\log(t))}\right) = 1 - o(1).$$

We begin with the main object needed for the proof of the above theorem.

Definition 5 (Snake). Given a sequence of steps of size $l$, $\vec{t} = (t_1, \dots, t_l)$, a path of $G_t$ whose vertices are those created at times $t_1, t_2, \dots, t_l$ plus the neighbor of the vertex created at time $t_1$, and that doesn't have edges created at other times, will be called a snake of size $l$ and denoted by $s_{\vec{t}}$.

In other words, a snake $s_{\vec{t}}$ is a path created exactly as the vector of times $\vec{t} = (t_1, \dots, t_l)$ instructs: at time $t_1$ we take a vertex-step and connect the vertex created to another vertex, at time $t_2$ we again take a vertex-step, connecting the new vertex to the vertex created at time $t_1$, and so on. Beyond the sequence of vertex-steps, we require that no other edge is connected to the path.

Fig. 4.1: A snake-path of size $l$ (vertices labelled $t_0, t_1, \dots, t_l$)

Let $S_l(t)$ be the set containing all snakes of size $l$ in $G_t$ and denote by $N^l_{ct,t}$ the number of snakes whose vertices were created at times between $ct$ and $t$. We will need some lower bounds for $\mathbb{E}[N^l_{ct,t}]$, which are given in the two lemmas below.

Lemma 4.3.4 (Snake Lemma). For any $0 < c < 1$ and any integer $l \le \min\{(1-c)t, ct\}$, the following lower bound for $\mathbb{E}[N^l_{ct,t}]$ holds:

$$\mathbb{E}\left[N^l_{ct,t}\right] \ge \binom{(1-c)t}{l}\frac{1}{(2t)^{l-1}\log^l(t)}\left(1 - \frac{2l}{ct}\right)^{t}.$$

Proof. The random variable $N^l_{ct,t}$ can be written as

$$N^l_{ct,t} = \sum_{\substack{t_1 < t_2 < \dots < t_l \\ t_i \in [ct, t]}} \mathbb{1}\{s_{\vec{t}} \in S_l(t)\}. \tag{4.3.5}$$

Then, its expected value is

$$\mathbb{E}\left[N^l_{ct,t}\right] = \sum_{\substack{t_1 < t_2 < \dots < t_l \\ t_i \in [ct, t]}} \mathbb{P}\left(s_{\vec{t}} \in S_l(t)\right).$$

So, it suffices to obtain a proper lower bound for $\mathbb{P}(s_{\vec{t}} \in S_l(t))$. We claim that

$$\mathbb{P}\left(s_{\vec{t}} \in S_l(t)\right) \ge \frac{1}{(2t)^{l-1}\log^l(t)}\left(1 - \frac{2l}{ct}\right)^{t}.$$

To see why the claim above is true, observe that the snake requires the birth of $l$ vertices at times greater than $ct$. This explains the factor $\log^{-l}(t)$. When a vertex is born, it must connect to the previous vertex in the snake; this happens with probability at least $(2t)^{-1}$. Letting the vertex created at time $t_1$ connect to any vertex in $G_{t_1-1}$, we obtain the factor $(2t)^{-(l-1)}$. We must also protect the snake from each possible edge created at times not in $\vec{t}$. This happens with probability at least $\left(1 - \frac{2l}{ct}\right)^{t}$, because the probability of connecting to the snake at each step is at most $2l/ct$.

Finally we have the following lower bound for $\mathbb{E}[N^l_{ct,t}]$:

$$\mathbb{E}\left[N^l_{ct,t}\right] \ge \binom{(1-c)t}{l}\frac{1}{(2t)^{l-1}\log^l(t)}\left(1 - \frac{2l}{ct}\right)^{t}. \tag{4.3.6}$$

Lemma 4.3.7. There exist positive constants $c_1$, $c_2$ and $c_3$ such that, for $l = c_2\frac{\log(t)}{\log(\log(t))}$,

$$\mathbb{E}\left[N^l_{c_1t,t}\right] \ge t^{c_3}.$$

Proof. We use the lower bound provided by Lemma 4.3.4, substituting $l = c_2\frac{\log(t)}{\log(\log(t))}$. To do so, we must obtain a lower bound for $\binom{(1-c_1)t}{l}$. Since $l \ll t$, Stirling's formula assures that there exists $c_3$ such that

$$\binom{(1-c_1)t}{l} \ge c_3\,(1-c_1)^l\,t^l\,e^l\,l^{-l-\frac{1}{2}}.$$

Choosing $c_1 = 1 - e^{-1}$, we have

$$\binom{(1-c_1)t}{l} \ge c_3\,t^l\,l^{-l-\frac{1}{2}} \ge c_3\exp\left\{l\log(t) - 2l\log(l)\right\}.$$

Combining the above inequality and (4.3.6) we have, for $t$ large enough,

$$\mathbb{E}\left[N^l_{c_1t,t}\right] \ge c_4\exp\left\{l\log(t) - 2l\log(l) - (l-1)\log(2t) - l\log(\log(t)) - 2c_1^{-1}l\right\}.$$

Finally, substituting $l = c_2\frac{\log(t)}{\log(\log(t))}$ in the above inequality, we are able to choose $c_2$ such that

$$\mathbb{E}\left[N^l_{c_1t,t}\right] \ge t^{c_5},$$

and this proves the lemma.
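The polynomial growth claimed by the lemma can be illustrated by evaluating the logarithm of the Snake Lemma bound directly. The sketch below is only a numerical illustration under stated assumptions: the value $c_2 = 0.25$ is an arbitrary illustrative choice (the text only asserts that a suitable $c_2$ exists), $(1-c)t$ is treated as a real number in the binomial coefficient, $l$ is rounded down to an integer, and the function names are hypothetical.

```python
import math


def log_snake_lower_bound(t, l, c):
    """log of binom((1-c)t, l) * (2t)^-(l-1) * log(t)^-l * (1 - 2l/(ct))^t."""
    n = (1 - c) * t
    # log binom(n, l) = sum_i log((n - i) / (i + 1)); summing term by term
    # avoids the cancellation a difference of log-gammas would suffer.
    log_binom = sum(math.log((n - i) / (i + 1)) for i in range(l))
    return (log_binom
            - (l - 1) * math.log(2 * t)
            - l * math.log(math.log(t))
            + t * math.log1p(-2 * l / (c * t)))


def polynomial_rate(t, c2=0.25, c1=1 - math.exp(-1)):
    """log(bound) / log(t) at l = floor(c2 * log(t) / log(log(t)))."""
    l = max(1, math.floor(c2 * math.log(t) / math.log(math.log(t))))
    return log_snake_lower_bound(t, l, c1) / math.log(t)


if __name__ == "__main__":
    for t in (1e6, 1e12, 1e30):
        print(t, polynomial_rate(t))
```

A positive value of `polynomial_rate(t)` means the lower bound is at least a positive power of $t$ at that value of $t$, which is the behaviour the lemma asserts for large $t$.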

Some notation and definitions will be useful throughout the proof of Theorem 4.3.3. We write $\{e_s \nrightarrow s_{\vec{t}}\}$ for the event in which the edge created at the $s$-th step doesn't connect to any vertex of $s_{\vec{t}}$.

Definition 6 (Snake's degree). Given a snake $s_{\vec{t}}$, we denote by $d_r(s_{\vec{t}})$ the sum of the degrees of the snake's vertices, i.e.,

$$d_r(s_{\vec{t}}) = \sum_{t_i \in \vec{t}} d_r(\text{vertex added at time } t_i).$$

If $r > t$ and $s_{\vec{t}}$ has size $l$, then $d_r(s_{\vec{t}}) = 2l - 1$. Moreover,

$$\mathbb{P}\left(e_{s+1} \nrightarrow s_{\vec{t}} \mid s_{\vec{t}} \subset G_s\right) = 1 - \left(1 - \frac{p(s)}{2}\right)\frac{d_s(s_{\vec{t}})}{s}. \tag{4.3.8}$$

We write

$$c_p(s) := 1 - \frac{p(s)}{2}.$$

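Finally we are able to prove the main theorem of this section.

Proof of Theorem 4.3.3. The Paley-Zygmund inequality assures that, for any $0 \le \theta \le 1$,

$$\mathbb{P}\left(N^l_{ct,t} > \theta\,\mathbb{E}\left[N^l_{ct,t}\right]\right) \ge (1-\theta)^2\,\frac{\left(\mathbb{E}\left[N^l_{ct,t}\right]\right)^2}{\mathbb{E}\left[\left(N^l_{ct,t}\right)^2\right]}. \tag{4.3.9}$$

If we guarantee that

1. $\mathbb{E}\left[N^l_{ct,t}\right] \to \infty$;

2. $\left(\mathbb{E}\left[N^l_{ct,t}\right]\right)^2 = (1 - o(1))\,\mathbb{E}\left[\left(N^l_{ct,t}\right)^2\right]$,

then, choosing $\theta = \theta(t)$ such that $\theta(t)$ goes to zero slower than $\mathbb{E}[N^l_{ct,t}]$ goes to infinity, the Theorem is proven.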


Lemma 4.3.7 provides an option for the constant $c$. We let $c$ be the constant $c_2$ in the lemma's statement. Moreover, it also guarantees that $\mathbb{E}[N^l_{ct,t}]$ goes to infinity. So, the rest of the proof is dedicated to showing item 2.

To create the snake $s_{\vec{t}}$ we must add a new vertex at time $t_1$; then we avoid the snake from time $t_1 + 1$ to time $t_2 - 1$. At time $t_2$, we add a new vertex and connect it to $t_1$; next we again avoid the snake in all steps between $t_2 + 1$ and $t_3 - 1$. We keep following these rules until time $t$. That is,

$$\mathbb{P}\left(s_{\vec{t}} \in S_l(t)\right) = p(t_1)\prod_{r_1 = t_1+1}^{t_2-1}\left(1 - \frac{c_p(r_1)}{r_1}\right)p(t_2)\,\frac{1}{2(t_2-1)}\cdots\prod_{r_l = t_l+1}^{t}\left(1 - \frac{c_p(r_l)(2l-1)}{r_l}\right). \tag{4.3.10}$$

Note that if we have two snakes $s_{\vec{t}}$ and $s_{\vec{r}}$, by the definition of snakes they cannot occur simultaneously unless they have disjoint vertex sets. With this in mind we compare $\mathbb{P}(s_{\vec{t}}, s_{\vec{r}} \in S_l(t))$ with $\mathbb{P}(s_{\vec{t}} \in S_l(t))\,\mathbb{P}(s_{\vec{r}} \in S_l(t))$. We would like to show the following.

Claim: For two snakes with disjoint vertex sets, we have

$$\mathbb{P}\left(s_{\vec{t}}, s_{\vec{r}} \in S_l(t)\right) = (1 + o(1))\,\mathbb{P}\left(s_{\vec{t}} \in S_l(t)\right)\mathbb{P}\left(s_{\vec{r}} \in S_l(t)\right).$$

Proof of the claim: We separate into cases.

Case 1: $s \in \vec{t}$ but $s \notin \vec{r}$ (or $s \notin \vec{t}$ but $s \in \vec{r}$).

To calculate $\mathbb{P}(s_{\vec{t}}, s_{\vec{r}} \in S_l(t))$, we must add a new vertex at time $s$ and connect it to the last vertex added of $s_{\vec{t}}$. We do this with probability equal to $p(s)\frac{1}{2(s-1)}$. On the other hand, to determine $\mathbb{P}(s_{\vec{r}} \in S_l(t))$, we must avoid $s_{\vec{r}}$ and we do this with probability $1 - \frac{c_p(s-1)\,d_{s-1}(s_{\vec{r}})}{s-1}$, while in $\mathbb{P}(s_{\vec{t}} \in S_l(t))$ we see exactly $p(s)\frac{1}{2(s-1)}$. This means that in $\mathbb{P}(s_{\vec{t}} \in S_l(t))\,\mathbb{P}(s_{\vec{r}} \in S_l(t))$ we see

$$\left(1 - \frac{c_p(s-1)\,d_{s-1}(s_{\vec{r}})}{s-1}\right)p(s)\,\frac{1}{2(s-1)},$$

while in $\mathbb{P}(s_{\vec{t}}, s_{\vec{r}} \in S_l(t))$ we see the term $p(s)\frac{1}{2(s-1)}$. This case occurs $2l$ times since the snakes are disjoint. Thus, recalling that $s \in [ct, t]$, $l = c\log(t)/\log(\log(t))$ and that the degree of each snake is at most $2l - 1$, we can bound the product of all the terms of the form $\left(1 - \frac{c_p(s-1)\,d_{s-1}(s_{\vec{r}})}{s-1}\right)$ from above by

$$\left(1 - \frac{c_3}{t}\right)^{2l}$$

and from below by

$$\left(1 - \frac{c_4\,l}{t}\right)^{2l}.$$

Observe that both products go to 1 as $t$ goes to infinity.

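Case 2: $s \notin \vec{t}$ and $s \notin \vec{r}$.

In $\mathbb{P}(s_{\vec{r}} \in S_l(t))$, as in $\mathbb{P}(s_{\vec{t}} \in S_l(t))$, we see terms like $1 - \frac{c_p(s-1)\,d_{s-1}(s_{\vec{r}})}{s-1}$, since we must avoid the snakes in each event. But in $\mathbb{P}(s_{\vec{t}}, s_{\vec{r}} \in S_l(t))$ we observe

$$1 - \frac{c_p(s-1)\left(d_{s-1}(s_{\vec{t}}) + d_{s-1}(s_{\vec{r}})\right)}{s-1},$$

since we must avoid both snakes at the same time. Now, observe that

$$\left(1 - \frac{a}{s}\right)\left(1 - \frac{c}{s}\right) = \left(1 - \frac{a+c}{s}\right)\left(1 + \frac{ac}{s^2(1 - o(1))}\right).$$

We have $\Theta(t)$ terms like $\left(1 + \frac{ac}{s^2(1-o(1))}\right)$. But again, as in Case 1, their product still goes to 1. This proves the claim. ∎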

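Now, observe that

$$\mathbb{E}\left[\left(N^l_{ct,t}\right)^2\right] = \mathbb{E}\left[\left(\sum_{\vec{t}}\mathbb{1}\{s_{\vec{t}} \in S_l(t)\}\right)\left(\sum_{\vec{r}}\mathbb{1}\{s_{\vec{r}} \in S_l(t)\}\right)\right] = \sum_{\substack{\vec{t}, \vec{r} \\ \text{disjoint}}}\mathbb{P}\left(s_{\vec{t}}, s_{\vec{r}} \in S_l(t)\right) + \mathbb{E}\left[N^l_{ct,t}\right], \tag{4.3.11}$$

and that

$$\left(\mathbb{E}\left[N^l_{ct,t}\right]\right)^2 = \sum_{\substack{\vec{t}, \vec{r} \\ \text{disjoint}}}\mathbb{P}\left(s_{\vec{t}} \in S_l(t)\right)\mathbb{P}\left(s_{\vec{r}} \in S_l(t)\right) + \sum_{\substack{\vec{t}, \vec{r} \\ \vec{r} \cap \vec{t} \ne \emptyset}}\mathbb{P}\left(s_{\vec{t}} \in S_l(t)\right)\mathbb{P}\left(s_{\vec{r}} \in S_l(t)\right).$$

Thus, using the equations above, Lemma 4.3.7 and Jensen's Inequality, we are able to obtain

$$1 \le \frac{\mathbb{E}\left[\left(N^l_{ct,t}\right)^2\right]}{\left(\mathbb{E}\left[N^l_{ct,t}\right]\right)^2} \le \frac{\sum_{\vec{t}, \vec{r}\ \text{disjoint}}\mathbb{P}\left(s_{\vec{t}}, s_{\vec{r}} \in S_l(t)\right)}{\sum_{\vec{t}, \vec{r}\ \text{disjoint}}\mathbb{P}\left(s_{\vec{t}} \in S_l(t)\right)\mathbb{P}\left(s_{\vec{r}} \in S_l(t)\right)} + \frac{1}{\mathbb{E}\left[N^l_{ct,t}\right]} = 1 + o(1),$$

which proves the desired result.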


5. APPENDIX

5.1 Auxiliary Lemmas

Lemma 5.1.1 (Lemma 3.1 [10]). Let $a_t$ be a sequence of positive real numbers satisfying the following recurrence relation

$$a_{t+1} = \left(1 - \frac{b_t}{t}\right)a_t + c_t.$$

Furthermore, suppose $b_t \to b > 0$ and $c_t \to c$. Then

$$\lim_{t \to \infty}\frac{a_t}{t} = \frac{c}{1 + b}.$$
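The statement of the lemma can be checked numerically in its simplest instance. The sketch below assumes constant sequences $b_t = b$ and $c_t = c$ and starts the iteration at a time `t_start` with $t_{\text{start}} \ge b$, so that the factor $1 - b/t$ stays nonnegative; the function name and the parameter values are illustrative assumptions, not taken from [10].

```python
def iterate_recurrence(b, c, steps, a_start=1.0, t_start=1):
    """Iterate a_{t+1} = (1 - b/t) * a_t + c with constant b_t = b, c_t = c.

    Returns the pair (a_T, T) with T = t_start + steps.
    """
    a = a_start
    for t in range(t_start, t_start + steps):
        a = (1 - b / t) * a + c
    return a, t_start + steps


if __name__ == "__main__":
    b, c = 1.5, 2.0
    a, t = iterate_recurrence(b, c, steps=100_000, t_start=2)
    # The ratio a_t / t should be close to c / (1 + b) = 0.8.
    print(a / t, c / (1 + b))
```

In this constant-coefficient case $a_t = ct/(1+b)$ is an exact solution of the recurrence, which is why the ratio settles quickly.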

Lemma 5.1.2. Let $c > 0$ be a fixed constant and $\phi(t)$ be a real function defined as

$$\phi(t) := \prod_{s=1}^{t-1}\left(1 + \frac{c}{s}\right), \quad \text{for } t \ge 1.$$

Then, there exist positive constants $b_1$ and $b_2$ depending on $c$ only such that

$$b_1 t^c \le \phi(t+1) \le b_2 t^c, \quad \text{for all } t \ge 1.$$

Proof. The proof essentially amounts to controlling a Taylor series. Observe that, for all $0 < x < 1$, we have

$$e^x = (1+x)\left(1 + \frac{x^2}{1+x}\sum_{k=2}^{\infty}\frac{x^{k-2}}{k!}\right) \le (1+x)\left(1 + x^2\sum_{k=2}^{\infty}\frac{1}{k!}\right) \le (1+x)\left(1 + ex^2\right). \tag{5.1.3}$$

Since $1 + x \le e^x$, we also have

$$\prod_{s=1}^{t-1}\left(1 + \frac{(ce)^2}{s^2}\right) \le \exp\left\{\sum_{s=1}^{t-1}\frac{(ce)^2}{s^2}\right\} \le e^b, \tag{5.1.4}$$

in which $b = \exp\left\{\sum_{s=1}^{\infty}\frac{(ce)^2}{s^2}\right\}$. Finally, combining the inequalities above we have

$$\phi(t) \ge \frac{\exp\left\{\sum_{s=1}^{t-1}\frac{c}{s}\right\}}{\prod_{s=1}^{t-1}\left(1 + \frac{(ce)^2}{s^2}\right)} \ge b_1 t^c.$$

The upper bound comes from the bound $1 + x \le e^x$.
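The two-sided bound of the lemma can be illustrated numerically by evaluating the product and dividing by $t^c$. In the sketch below we write `phi` for the product in the lemma; the helper names are hypothetical, and the limiting constant $1/\Gamma(1+c)$ is a standard fact about ratios of Gamma functions rather than something stated in the text.

```python
import math


def phi(t, c):
    """Product of (1 + c/s) for s = 1, ..., t-1."""
    value = 1.0
    for s in range(1, t):
        value *= 1 + c / s
    return value


def normalized_values(c, ts):
    """phi(t + 1) / t**c for each t in ts; bounded above and below."""
    return [phi(t + 1, c) / t ** c for t in ts]


def limiting_constant(c):
    """Limit of phi(t + 1) / t**c as t grows, namely 1 / Gamma(1 + c)."""
    return 1 / math.gamma(1 + c)


if __name__ == "__main__":
    print(normalized_values(0.75, [10, 1_000, 100_000]))
    print(limiting_constant(0.75))
```

The normalised values staying inside a fixed positive interval is precisely what the constants $b_1$ and $b_2$ of the lemma express.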

5.2 Concentration Inequalities for Martingales

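Theorem 5.2.1 (Azuma's Inequality - [10]). Let $(M_n)_{n \ge 1}$ be a (super)martingale satisfying

$$|M_{i+1} - M_i| \le a_i.$$

Then, for all $\lambda > 0$ we have

$$\mathbb{P}\left(M_n - M_0 > \lambda\right) \le \exp\left(-\frac{\lambda^2}{\sum_{i=1}^{n}a_i^2}\right).$$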

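Theorem 5.2.2 (Freedman's Inequality). Let $(M_n, \mathcal{F}_n)_{n \ge 1}$ be a (super)martingale. Write

$$V_n := \sum_{k=1}^{n-1}\mathbb{E}\left[(M_{k+1} - M_k)^2 \mid \mathcal{F}_k\right]$$

and suppose that $M_0 = 0$ and

$$|M_{k+1} - M_k| \le R, \quad \text{for all } k.$$

Then, for all $\lambda > 0$ we have

$$\mathbb{P}\left(M_n \ge \lambda,\ V_n \le \sigma^2, \text{ for some } n\right) \le \exp\left(-\frac{\lambda^2}{2\sigma^2 + R\lambda}\right).$$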

Proof. Without loss of generality, we assume $R = 1$. If it is not, another martingale can be defined just by dividing by $R$. To simplify our writing, denote by $D_k$ the difference $M_k - M_{k-1}$ and put $\tau = \inf\{n : M_n > \lambda\}$. The proof requires the inequality below, which we just state since it may be derived via straightforward calculus arguments:

$$e^{sx} \le 1 + sx + (e^s - 1 - s)x^2, \tag{5.2.3}$$

for all $s > 0$ and $x \le 1$. The proof essentially rests on the following claim.

Claim: For all $s > 0$,

$$\mathbb{E}\left[e^{sD_{k+1}} \mid \mathcal{F}_k\right] \le e^{(e^s - 1 - s)\,\mathbb{E}\left[D_{k+1}^2 \mid \mathcal{F}_k\right]}.$$

Proof of the claim: By hypothesis $|M_{k+1} - M_k| \le 1$. Thus, (5.2.3) gives us

$$e^{sD_{k+1}} \le 1 + sD_{k+1} + (e^s - 1 - s)D_{k+1}^2.$$

Taking the conditional expectation on both sides and using that $M_n$ is a (super)martingale, we have

$$\mathbb{E}\left[e^{sD_{k+1}} \mid \mathcal{F}_k\right] \le 1 + (e^s - 1 - s)\,\mathbb{E}\left[D_{k+1}^2 \mid \mathcal{F}_k\right] \le e^{(e^s - 1 - s)\,\mathbb{E}\left[D_{k+1}^2 \mid \mathcal{F}_k\right]},$$

proving the claim. ∎

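Now, let $S_n$ be defined by

$$S_n := e^{sM_n - (e^s - 1 - s)V_n} = S_{n-1}\,e^{sD_n - (e^s - 1 - s)\,\mathbb{E}\left[D_n^2 \mid \mathcal{F}_{n-1}\right]}$$

and observe that our claim guarantees that $(S_n)_n$ is a positive supermartingale. Thus, by the Optional Stopping Theorem,

$$\mathbb{E}\left[S_{\tau \wedge n}\right] \le 1, \quad \text{for all } n,$$

and by Fatou's Lemma,

$$\int_{\{\tau < \infty\}}S_{\tau} \le \liminf_{n \to \infty}\int_{\{\tau < \infty\}}S_{\tau \wedge n} \le 1.$$

Now, let $A$ denote the event we are interested in and observe

$$S_{\tau}\mathbb{1}_A \ge \mathbb{1}_A\,e^{s\lambda - (e^s - 1 - s)\sigma^2},$$

which implies

$$\mathbb{P}(A) \le e^{-s\lambda + (e^s - 1 - s)\sigma^2}. \tag{5.2.4}$$

Minimizing the right-hand side of the above inequality over $s$, we obtain

$$s = \log\left(\frac{\lambda + \sigma^2}{\sigma^2}\right).$$

Returning to (5.2.4), we have

$$\mathbb{P}(A) \le \left(\frac{\sigma^2}{\lambda + \sigma^2}\right)^{\lambda + \sigma^2}e^{\lambda} \le \exp\left(-\frac{\lambda^2}{2\sigma^2 + \lambda}\right),$$

since $\log(1 + x) \ge 2x/(2 + x)$, which concludes the proof.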


BIBLIOGRAPHY

[1] B. Bollobas. Mathematical results on scale-free random graphs. In Handbook of Graphs and Networks, pages 1–37, 2003.

[2] A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, (5439):509–512, 1999.

[3] A-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 1999.

[4] G. Bianconi and A. L. Barabasi. Bose-Einstein Condensation in Complex Networks. Physical Review Letters, 86:5632–5635, 2001.

[5] G. Bianconi and M. Marsili. Emergence of large cliques in random scale-free networks. EPL (Europhysics Letters), 74(4):740, 2006.

[6] B. Bollobas and O. Riordan. Robustness and vulnerability of scale-free random graphs. Internet Math., 1(1):1–35, 2003.

[7] B. Bollobas and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24(1):5–34, 2004.

[8] Béla Bollobás, Oliver Riordan, Joel Spencer, and Gábor Tusnády. The degree sequence of a scale-free random graph process. Random Structures and Algorithms, 18:279–290, 2001.

[9] C. Borgs, J. T. Chayes, C. Daskalakis, and S. Roch. First to market is not everything: an analysis of preferential attachment with fitness. In STOC, pages 135–144. ACM, 2007.

[10] F. Chung and L. Lu. Complex Graphs and Networks (CBMS Regional Conference Series in Mathematics). American Mathematical Society, Boston, MA, USA, 2006.

[11] F. Chung and L. Lu. Complex Graphs and Networks (CBMS Regional Conference Series in Mathematics). American Mathematical Society, Boston, MA, USA, 2006.

[12] S. Dereich and M. Ortgiese. Robust analysis of preferential attachment models with fitness. Combinatorics, Probability and Computing, 23:386–411, 2014.

[13] S. Dommers, R. van der Hofstad, and G. Hooghiemstra. Diameters in preferential attachment models. Journal of Statistical Physics, (1):72–107, 2010.

[14] N. Eggemann and S.D. Noble. The clustering coefficient of a scale-free random graph. Discrete Applied Mathematics, 2011.

[15] M. Gabielkov and A. Legout. The Complete Picture of the Twitter Social Graph. In ACM CoNEXT 2012 Student Workshop, Nice, France, December 2012.

[16] M. Girvan and M.E.J Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12):7821–7826, 2002.

[17] S. Janson, T. Luczak, and I. Norros. Large cliques in a power-law random graph. J. App. Prob., 2010.

[18] B. Jim and P. Holme. Growing scale-free networks with tunable clustering. Phys. Rev. E, 2002.

[19] B. Jim and P. Holme. Growing scale-free networks with tunable clustering. Phys. Rev. E, 2002.

[20] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Random graph models for the web graph. In FOCS, pages 57–65, 2000.

[21] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, October 2002.

[22] T. Mori. The maximum degree of the Barabasi-Albert random tree. Comb. Probab. Computing, 2005.

[23] M. E. J. Newman. Who is the best connected scientist? A study of scientific coauthorship networks. Phys. Rev., page 016131, 2001.

[24] M. E. J. Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.

[25] M. E. J. Newman, D. J Watts, and S. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences of the United States of America, 99:2566–2572, 2002.

[26] L. Ostroumova, A. Ryabchenko, and E. Samosvat. Generalized preferential attachment: Tunable power-law degree distribution and clustering coefficient. Lecture Notes in Computer Science, 2013.

[27] L. Ostroumova, A. Ryabchenko, and E. Samosvat. Generalized preferential attachment: Tunable power-law degree distribution and clustering coefficient. Lecture Notes in Computer Science, 2013.

[28] L. Ostroumova and E. Samosvat. Global clustering coefficient in scale-free networks. Lecture Notes in Computer Science, 2014.

[29] S.H Strogatz and D. J. Watts. Collective dynamics of 'small-world' networks. Nature, 1998.

[30] W.-Q. Wang, Q.-M. Zhang, and T. Zhou. Evaluating network models: A likelihood analysis. EPL (Europhysics Letters), 98(2):28004, 2012.