degrees, power laws and popularitygmateosb/ece442/slides/... · power law and exponential degree...

45
Degrees, Power Laws and Popularity Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester [email protected] http://www.ece.rochester.edu/ ~ gmateosb/ February 13, 2020 Network Science Analytics Degrees, Power Laws and Popularity 1

Upload: others

Post on 22-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Degrees, Power Laws and Popularity

Gonzalo MateosDept. of ECE and Goergen Institute for Data Science

University of Rochester

[email protected]

http://www.ece.rochester.edu/~gmateosb/

February 13, 2020

Network Science Analytics Degrees, Power Laws and Popularity 1

Page 2: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Degree distributions

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and preferential attachment

Network Science Analytics Degrees, Power Laws and Popularity 2

Page 3: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Descriptive analysis of network characterstics

I Given a network graph representation of a complex system

⇒ Structural properties of G key to system-level understanding

Example

I Q1: Underpinning of various types of basic social dynamics?

A: Study vertex triplets (triads) and patterns of ties among them

I Q2: How can we formalize the notion of ‘importance’ in a network?

A: Define measures of individual vertex (or group) centrality

I Q3: Can we identify communities and cohesive subgroups?

A: Formulate as a graph partitioning (clustering) problem

I Characterization of individual vertices/edges and network cohesionI Social network analysis, math, computer science, statistical physics

Network Science Analytics Degrees, Power Laws and Popularity 3

Page 4: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Degree

I Def: The degree dv of vertex v is its number of incident edges

⇒ Degree sequence arranges degrees in non-decreasing order

1

2 3

45

6

2

2 4

33

2

I In figure ⇒ Vertex degrees shown in red, e.g., d1 = 2 and d5 = 3⇒ Graph’s degree sequence is 2,2,2,3,3,4

I In general, the degree sequence does not uniquely specify the graph

I High-degree vertices are likely to be influential, central, prominent

Network Science Analytics Degrees, Power Laws and Popularity 4

Page 5: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Degree distribution

I Let N(d) denote the number of vertices with degree d

⇒ Fraction of vertices with degree d is P (d) := N(d)Nv

I Def: The collection {P (d)}d≥0 is the degree distribution of GI Histogram formed from the degree sequence (bins of size one)

P(d)

d

I P (d) = probability that randomly chosen node has degree d

⇒ Summarizes the local connectivity in the network graph

Network Science Analytics Degrees, Power Laws and Popularity 5

Page 6: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Joint degree distribution

I Q: What about patterns of association among nodes of given degrees?

I A: Define the two-dimensional analogue of a degree distribution3

0 2 4 6 8 10

02

46

810

log2(Degree)

log 2(Degree)

0 2 4 6 80

24

68

log2(Degree)

log 2(Degree)

Fig. 4.3 Image representation of the logarithmically transformed joint degree distribution{log2 fd,d′}, for the router-level Internet data (left) and the protein interaction data (right). Col-ors range from blue (low relative frequency) to red (high relative frequency), with white indicatingareas with no data. Note that both x- and y-axes are also on base-2 logarithmic scales.

Router-level Internet Protein interaction

I Prob. of random edge having incident vertices with degrees (d1, d2)

Network Science Analytics Degrees, Power Laws and Popularity 6

Page 7: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

A simple random graph model

I Def: The Erdos-Renyi random graph model Gn,p

I Undirected graph with n vertices, i.e., of order Nv = nI Edge (u, v) present with probability p, independent of other edges

I Simulation is easy: draw(n2

)i.i.d. Bernoulli(p) RVs

Example

I Three realizations of G10, 16. The size Ne is a random variable

Network Science Analytics Degrees, Power Laws and Popularity 7

Page 8: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Degree distribution of Gn,p

I Q: Degree distribution P (d) of the Erdos-Renyi graph Gn,p?

I Define I {(v , u)} = 1 if (v , u) ∈ E , and I {(v , u)} = 0 otherwise.

⇒ Fix v . For all u 6= v , the indicator RVs are i.i.d. Bernoulli(p)

I Let Dv be the (random) degree of vertex v . Hence,

Dv =∑u 6=v

I {(v , u)}

⇒ Dv is binomial with parameters (n − 1, p) and

P (d) = P (Dv = d) =

(n − 1

d

)pd(1− p)(n−1)−d

I In words, the probability of having exactly d edges incident to v

⇒ Same for all v ∈ V , by independence of the Gn,p model

Network Science Analytics Degrees, Power Laws and Popularity 8

Page 9: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Behavior for large n

I Q: How does the degree distribution look like for a large network?

I Recall Dv is a sum of n − 1 i.i.d. Bernoulli(p) RVs

⇒ Central Limit Theorem: Dv ∼ N (np, np(1− p)) for large n

d0 5 10 15 20 25 30 35 40

P(d

)

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2p=0.5, n=20p=0.5, n=40p=0.5, n=60

d0 5 10 15 20 25

P(d

)

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2Binomial(20,1/2)Binomial(60,1/6)Poisson(10)

I Makes most sense to increase n with fixed E [Dv ] = (n − 1)p = µ

⇒ Law of rare events: Dv ∼ Poisson(µ) for large n

Network Science Analytics Degrees, Power Laws and Popularity 9

Page 10: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Law of rare events

I Substituting p = µ/n in the binomial PMF yields

Pn(d) =n!

(n − d)!d!

(µn

)d (1− µ

n

)n−d

=n(n − 1) . . . (n − d + 1)

nd

µd

d!

(1− µ/n)n

(1− µ/n)d

I In the limit, red term is limn→∞

(1− µ/n)n = e−µ

I Black and blue terms converge to 1. Limit is the Poisson PMF

limn→∞

Pn(d) = 1µd

d!

e−µ

1= e−µµ

d

d!

I Approximation usually called “law of rare events”I Individual edges happen with small probability p = µ/nI The aggregate (degree, number of edges), though, need not be rare

Network Science Analytics Degrees, Power Laws and Popularity 10

Page 11: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

The Gn,p model and real-world networks

I For large graphs, Gn,p suggests P (d) with an exponential tail

⇒ Unlikely to see degrees spanning several orders of magnitude

d0 10 20 30 40 50

P(d

)

0

0.02

0.04

0.06

0.08

0.1

0.12Linear scale

101 102 103

P(d

)10-300

10-250

10-200

10-150

10-100

10-50

100 Logarithmic scale

d

I Concentrated distribution around the mean E [Dv ] = (n − 1)p

I Q: Is this in agreement with real-world networks?

Network Science Analytics Degrees, Power Laws and Popularity 11

Page 12: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

World Wide Web

I Degree distributions of the WWW analyzed in [Broder et al ’00]

⇒ Web a digraph, study both in- and out-degree distributions

I Majority of vertices naturally have small degrees

⇒ Nontrivial amount with orders of magnitude higher degrees

Network Science Analytics Degrees, Power Laws and Popularity 12

Page 13: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Internet autonomous systems

I The topology of the AS-level Internet studied in [Faloutsos3 ’99]

I Right-skewed degree distributions also found for router-level Internet

Network Science Analytics Degrees, Power Laws and Popularity 13

Page 14: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Seems to be a structural pattern

I More heavy-tailed degree distributions found in [Barabasi-Albert ’99]P(

d)

d d d Author collaboration Web graph Power grid

I These heterogeneous, diffuse degree distributions are not exponential

Network Science Analytics Degrees, Power Laws and Popularity 14

Page 15: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Power laws

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and preferential attachment

Network Science Analytics Degrees, Power Laws and Popularity 15

Page 16: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Power-law degree distributions1

0 2 4 6 8 10

−15

−10

−5

log2(Degree)

log 2(Frequency)

0 2 4 6 8

−12

−10

−8−6

−4

log2(Degree)

log 2(Frequency)

Fig. 4.1 Degree distributions. Left: the router-level Internet network graph described in Sec-tion 3.5.2. Right: the network of measured interactions among proteins in S. cerevisiae (yeast),as of January 2007. In each plot, both x- and y-axes are in base-2 logarithmic scale.

Copyright 2009 Springer Science+Business Media, LLC. These figures may be used for noncom-mercial purposes as long as the source is cited: Kolaczyk, Eric D. Statistical Analysis of NetworkData: Methods and Models (2009) Springer Science+Business Media LLC.

I Log-log plots show roughly a linear decay, suggesting the power law

P (d) ∝ d−α ⇒ log P (d) = C − α log d

I Power-law exponent (negative slope) is typically α ∈ [2, 3]I Normalization constant C is mostly uninteresting

I Power laws often best followed in the tail, i.e., for d ≥ dmin

Network Science Analytics Degrees, Power Laws and Popularity 16

Page 17: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Power law and exponential degree distributions

THE SCALE-FREE PROPERTY 10

Poisson vs. Power-law DistributionsFigure 4.4

(d)

(b)(a)

(c)

(a) Comparing a Poisson function with a power-law function (ਠ= 2.1) on a linear plot. Both distributions have ࢭk10 =ࢮ.

(b) The same curves as in (a), but shown on a log-log plot, allowing us to inspect the dif-ference between the two functions in the high-k regime.

(c) A random network with ࢭk3 =ࢮ and N = 50, illustrating that most nodes have compara-ble degree k ࢭݍkࢮ.

(d) A scale-free network with ਠ=2.1 and ࢭkࢮ= 3, illustrating that numerous small-degree nodes coexist with a few highly connected hubs.

The Largest Hub

All real networks are finite. The size of the WWW is estimated to be N ݍ 1012 nodes; the size of the social network is the Earth’s population, about N -These numbers are huge, but finite. Other networks pale in com .109 × �7ݍparison: The genetic network in a human cell has approximately 20,000 genes while the metabolic network of the E. Coli bacteria has only about a thousand metabolites. This prompts us to ask: How does the network size affect the size of its hubs? To answer this we calculate the expected maxi-mum degree, kmax, called the natural cutoff of the degree distribution pk. It represents the expected size of the largest hub in a network.

It is instructive to perform the calculation first for the exponential dis-tribution

For a network with minimum degree kmin, the normalization condition

provides C = ਨeਨkmin. To calculate kmax we assume that in a network of N nodes we expect at most one node in the (kmax, ∞) regime (ADVANCED TOPICS 3.B). In other words the probability to observe a node whose degree exceeds kmax is 1/N:

(4.16)

(4.15)∫ =∞ p k dk( ) 1kmin

∫ =∞ p k dk N( ) 1 .kmax

1000 10 20 30 40 50

0.05

0.1

0.15

10-6

100

10-1

10-2

10-3

10-4

10-5

101 102 103

POISSON

kk

pkpk

pk ~ k-2.1

POISSON

pk ~ k-2.1

1000 10 20 30 40 50

0.05

0.1

0.15

10-6

100

10-1

10-2

10-3

10-4

10-5

101 102 103

POISSON

kk

pkpk

pk ~ k-2.1

POISSON

pk ~ k-2.1

1000 10 20 30 40 50

0.05

0.1

0.15

10-6

100

10-1

10-2

10-3

10-4

10-5

101 102 103

POISSON

kk

pkpk

pk ~ k-2.1

POISSON

pk ~ k-2.1

1000 10 20 30 40 50

0.05

0.1

0.15

10-6

100

10-1

10-2

10-3

10-4

10-5

101 102 103

POISSON

kk

pkpk

pk ~ k-2.1

POISSON

pk ~ k-2.1

p(k) = Ce��k .

P(d)

d   d  

P(d)=d-­‐2.1   P(d)=d-­‐2.1  

Poisson  

Poisson  

I Erdos-Renyi’s Poisson degree distribution exhibits a sharp cutoff

⇒ Power laws upper bound exponential tails for large enough d

Network Science Analytics Degrees, Power Laws and Popularity 17

Page 18: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Scale-free networks

I Scale-free network: degree distribution with power-law tailI Name motivated for the scale-invariance property of power laws

I Def: A scale-free function f (x) satisfies f (ax) = bf (x), for a, b ∈ R

Example

I Power-law functions f (x) = x−α are scale-free since

f (ax) = (ax)−α = a−αf (x) = bf (x), where b := a−α

I Exponential functions f (x) = cx are not scale-free because

f (ax) = cax = (cx)a = f a(x) 6= bf (x), except when a = b = 1

I No ‘characteristic scale’ for the degrees. More soon

⇒ Functional form of the distribution is invariant to scale

Network Science Analytics Degrees, Power Laws and Popularity 18

Page 19: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Power-law distributions are ubiquitous

I Power-law distributions widespread beyond networks [Clauset et al ’07]

Network Science Analytics Degrees, Power Laws and Popularity 19

Page 20: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Normalization

I The power-law degree distribution P (d) = Cd−α is a PMF, hence

1 =∞∑d=0

P (d) =∞∑d=0

Cd−α ⇒ C =1∑∞

d=0 d−α

I Often a power law is only valid for the tail d ≥ dmin, hence

C =1∑∞

d=dmind−α

≈ 1∫∞dmin

x−αdx= (α− 1)dα−1

min

⇒ Sound approximation since P (d) varies slowly for large d

I The normalized power-law degree distribution is

P (d) =α− 1

dmin

(d

dmin

)−α

, d ≥ dmin

Network Science Analytics Degrees, Power Laws and Popularity 20

Page 21: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Power-law probability density function

I Often convenient to treat degrees as real valued, i.e., d ∈ R+

I Define a power-law PDF for the tail of the degree distribution as

p(d) =α− 1

dmin

(d

dmin

)−α

, d ≥ dmin

⇒ A valid PDF, already showed that∫∞dmin

p(x)dx = 1

⇒ Convergence of the integral requires α > 1

I Ex: Probability that a random node has degree exceeding 100 is

P (Dv > 100) =

∫ ∞

100

α− 1

dmin

(x

dmin

)−α

dx =

(100

dmin

)1−α

Network Science Analytics Degrees, Power Laws and Popularity 21

Page 22: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Divergent moments

I Q: What is the m-th moment of a power-law distributed RV?

I From the definition of moment and the power-law PDF one has

E [Dmv ] =

∫ ∞

dmin

xmp(x)dx =α− 1

d1−αmin

[xm+1−α

m + 1− α

]∞dmin

⇒ Convergence of the integral requires m + 1 < α

I For real-world networks, typically α ∈ (2, 3) so

E [Dv ] =

(α− 1

α− 2

)dmin < ∞ and E [Dm

v ] = ∞, m ≥ 2

I In particular, the second moment and variance are infinite

⇒ Consistent with variability and heterogeneity of degrees

Network Science Analytics Degrees, Power Laws and Popularity 22

Page 23: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Revisiting the scale-free property

I A measure of scale of a RV is its standard deviation σ

P(d)

d  

P(d)=d-­‐2.1  

Poisson  

μ  

Large random network Gn,p

I Randomly chosen node has degree d = µ±√µ. The scale is µ

Scale-free networkI Randomly chosen node has degree d = µ±∞. There is no scale

Network Science Analytics Degrees, Power Laws and Popularity 23

Page 24: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Visualizing and fitting power laws

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and preferential attachment

Network Science Analytics Degrees, Power Laws and Popularity 24

Page 25: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Visualizing power-law degree distributions

I A simple histogram may be problematic for visualizing P (d)

⇒ Use log-log scale to warp probabilities and widespread degrees

P(d)

d   d  

P(d)

I Large statistical fluctuations (‘noise’) in the tail for large d

⇒ With bins of size one, high-degree counts are small

⇒ Makes sense to increase the bin size

Network Science Analytics Degrees, Power Laws and Popularity 25

Page 26: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Logarithmic binning

I Uniformly widening bins sacrifices resolution for small degrees

⇒ Use bins of different sizes in different parts of the histogram

P(d)

d  I Logarithmic binning is widely used. The n-th bin is

an−1 ≤ d < an, n = 1, 2, . . .

Ex: Common choice is a = 2, n-th bin has width 2n − 2n−1 = 2n−1

I Normalize by the bin width. Wider bins will accrue higher counts

Network Science Analytics Degrees, Power Laws and Popularity 26

Page 27: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Complementary cumulative distribution function

I Def: The complementary cumulative distribution function (CCDF) is

F (d) = P (Dv ≥ d)

⇒ Function F (d) is the fraction of vertices with degree at least d

I For a power-law PDF, the CCDF also obeys a power law since

P (Dv ≥ d) =

∫ ∞

d

α− 1

dmin

(x

dmin

)−α

dx =

(d

dmin

)−(α−1)

I If the PDF has exponent α, then CCDF F (d) has exponent α− 1

Network Science Analytics Degrees, Power Laws and Popularity 27

Page 28: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Computing the CCDF

Step 1: List the degrees dv in descending order

Step 2: Assign ranks rv (from 1 to Nv ) to vertices in that order

Step 3: The CCDF is the plot of rv/Nv versus degree dv

1

0.1 0.3 0.4

1 2 3 4 d

F(d)

dv rv rv/Nv

4

3

3

2

1

1

1

1

1

1

1

2

3

4

5

6

7

8

9

10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0.1

0.3

0.4

1.0

F(d)

I If degrees are repeated, CCDF is the largest value of rv/Nv

I If d not observed, F (d) = value for next (larger) observed degree

Network Science Analytics Degrees, Power Laws and Popularity 28

Page 29: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Visualizing power laws with the CCDF

I Plot the CCDF in a log-log scale and look for a straight-line behavior

d  

F(d)

I Mitigates noise using cumulative frequencies (cf. raw frequencies)

I No binning needed ⇒ Avoids information loss as bins widen

Network Science Analytics Degrees, Power Laws and Popularity 29

Page 30: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Fitting power-law distributions

I Basic, yet nontrivial task is to estimate the exponent α from data

I A power law implies the linear model log P (d) = C − α log d + ε

⇒ Natural to form the linear least-squares (LS) estimator

{α, C} = argminα,C

∑i

(log P (di )− C + α log di )2

I Simple, very popular, but not advisable for at least three reasons:

1) Extremely noisy high-degree data, where the counts are the lowest2) Estimates are biased. The log transform distorts unevenly the errors3) If the power law is only valid in the tail, need to hand pick dmin

Network Science Analytics Degrees, Power Laws and Popularity 30

Page 31: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Linear regression inference on the CCDF

I A solution to the noise problem is to use the CCDF F (d)

⇒ Cumulative frequencies smoothen the noise

I Recall the CCDF follows a power law with exponent α− 1

⇒ Can use a linear regression-based approach to find α, but . . .

P(d)

d   d  

P(d)

d  

F(d)

I Successive points in the CCDF plot are not mutually independent

⇒ (Ordinary) LS is not optimal for correlated errors

Network Science Analytics Degrees, Power Laws and Popularity 31

Page 32: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Maximum-likelihood estimator

I Suppose {di}Nv

i=1 are independent and follow a power law. MLE of α?

⇒ The data PDF is f (d ;α) = α−1dmin

(d

dmin

)−α

, d ≥ dmin

I The log-likelihood function is (up to constants independent of α)

`Nv (α) =Nv∑i=1

log f (di ;α) ∝ Nv log (α− 1)− α

Nv∑i=1

log

(didmin

)I The MLE α (a.k.a. Hill estimator) solves the equation

∂`Nv (α)

∂α

∣∣∣∣α=α

=Nv

α− 1−

Nv∑i=1

log

(didmin

)= 0

I The solution is

α = 1 +

[1

Nv

Nv∑i=1

log

(didmin

)]−1

Network Science Analytics Degrees, Power Laws and Popularity 32

Page 33: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Hill plot of ML estimates

I Q: How can we go around hand-picking the value of dmin?

1) Rank-order degrees to obtain the sequence d(1) ≤ . . . ≤ d(Nv )

2) For each k ∈ {1, . . . ,Nv − 1} let dmin = d(Nv−k). The MLEs are

α(k) = 1 +

[1

k

k−1∑i=0

log

(d(Nv−i)

d(Nv−k)

)]−1

3) Draw and examine the Hill plot of α(k) versus k

I If a power law is credible, the Hill plot should ‘settle down’

⇒ Identify stable α for a wide range of (intermediate) k values

I Q: Why focus on values on the intermediate range?I Small k: Inaccurate estimation due to limited dataI Large k: Bias if power law is only valid in the tail

Network Science Analytics Degrees, Power Laws and Popularity 33

Page 34: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Example: Internet and protein interaction data

2

0 50000 100000 150000

23

45

67

8

k

αk

0 1000 2000 3000 4000 5000

1.5

2.5

3.5

4.5

k

αk

Fig. 4.2 Hill plots of maximum likelihood estimates αk, as a function of k (i.e., the number oforder statistics used), for the Internet (left) and protein interaction (right) datasets.

1

0 2 4 6 8 10

−15

−10

−5

log2(Degree)

log 2(Frequency)

0 2 4 6 8

−12

−10

−8−6

−4

log2(Degree)

log 2(Frequency)

Fig. 4.1 Degree distributions. Left: the router-level Internet network graph described in Sec-tion 3.5.2. Right: the network of measured interactions among proteins in S. cerevisiae (yeast),as of January 2007. In each plot, both x- and y-axes are in base-2 logarithmic scale.

Copyright 2009 Springer Science+Business Media, LLC. These figures may be used for noncom-mercial purposes as long as the source is cited: Kolaczyk, Eric D. Statistical Analysis of NetworkData: Methods and Models (2009) Springer Science+Business Media LLC.

Power law is credible

Power law is inappropriate

I Sharp decay in α suggests a simple power-law model is inappropriate

Network Science Analytics Degrees, Power Laws and Popularity 34

Page 35: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Example: Flickr data

I Flickr social network: Nv ≈ 0.6M, Ne ≈ 3.5M [Leskovec et al ’08]

F(d)

F(d)

P

(d)

P(d

)

d   d  

d   d  

d-­‐1.75  

d-­‐0.75   d-­‐0.75  d-­‐0.75e-­‐0.002d  

Linear scale Log scale

CCDF, log scale Power law with exp. cutoff, log scale

I Good fit to a power law with exponential cutoff F (d) ∝ d−αe−βd

Network Science Analytics Degrees, Power Laws and Popularity 35

Page 36: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Popularity and preferential attachment

Degree distributions

Power-law degree distributions

Visualizing and fitting power laws

Popularity and preferential attachment

Network Science Analytics Degrees, Power Laws and Popularity 36

Page 37: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Popularity as a network phenomenon

I Popularity is a phenomenon characterized by extreme imbalancesI How can we quantify these imbalances? Why do they arise?

I Basic models of network behavior can be very insightful

⇒ Result of coupled decisions, correlated behavior in a population

Network Science Analytics Degrees, Power Laws and Popularity 37

Page 38: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Preferential attachment model

I Simple model for the creation of e.g., links among Web pages

I Vertices are created one at a time, denoted 1, . . . ,Nv

I When node j is created, it makes a single arc to i , 1 ≤ i < j

I Creation of (j , i) governed by a probabilistic rule:I With probability p, j links to i chosen uniformly at randomI With probability 1− p, j links to i with probability ∝ d in

i

I The resulting graph is directed, each vertex has doutv = 1

I Preferential attachment model leads to “rich-gets-richer” dynamics

⇒ Arcs formed preferentially to (currently) most popular nodes

⇒ Prob. that i increases its popularity ∝ i ’s current popularity

Network Science Analytics Degrees, Power Laws and Popularity 38

Page 39: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Preferential attachment yields power laws

TheoremThe preferential attachment model gives rise to a power-law in-degreedistribution with exponent α = 1 + 1

1−p , i.e.,

P(d in = d

)∝ d

−(1+ 1

1−p

)

I Key: “j links to i with probability ∝ d ini ” equivalent to copying, i.e.,

“j chooses k uniformly at random, and links to i if (k, i) ∈ E”

I Reflect: Copy other’s decision vs. independent decisions in Gn,p

I As p → 0 ⇒ Copying more frequent ⇒ Smaller α → 2I Intuitive: more likely to see extremely popular pages (heavier tail)

Network Science Analytics Degrees, Power Laws and Popularity 39

Page 40: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Continuous approximation

I In-degree d ini (t) of node i at time t ≥ i is a RV. Two facts

F1) Initial condition: d ini (i) = 0 since node i just created at time t = i

F2) Dynamics of d ini (t): Probability that new node t + 1 > i links to i is

P ((t + 1, i) ∈ E) = p × 1

t+ (1− p)× d in

i (t)

t

I Will study a deterministic, continuous approximation to the modelI Continuous time t ∈ [0,Nv ]I Continuous degrees x in

i (t) : [i ,Nv ] 7→ R+ are deterministic

I Require in-degrees to satisfy the following growth equation

dx ini (t)

dt=

p

t+

(1− p)x ini (t)

t, x ini (i) = 0

Network Science Analytics Degrees, Power Laws and Popularity 40

Page 41: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Solving the differential equation

I Solve the first-order differential equation for x ini (t) (let q = 1− p)

dx inidt

=p + qx ini

t

I Divide both sides by p + qx ini (t) and integrate over t∫1

p + qx ini

dx inidt

dt =

∫1

tdt

I Solving the integrals, we obtain (c is a constant)

ln (p + qx ini ) = q ln (t) + c

Network Science Analytics Degrees, Power Laws and Popularity 41

Page 42: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Solving the differential equation (cont.)

I Exponentiating and letting K = ec we find

ln (p + qx ini (t)) = q ln (t) + c ⇒ x ini (t) =1

q(Ktq − p)

I To determine the unknown constant K , use the initial condition

0 = x ini (i) =1

q(Kiq − p) ⇒ K =

p

iq

I Hence, the deterministic approximation of d ini (t) evolves as

x ini (t) =1

q

( p

iq× tq − p

)=

p

q

[( ti

)q

− 1

]

Network Science Analytics Degrees, Power Laws and Popularity 42

Page 43: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Obtaining the degree distribution

I Q: At time t, what fraction F (d) of all nodes have in-degree ≥ d?

Approximation: What fraction of all functions x ini (t) ≥ d by time t?

x ini (t) =p

q

[( ti

)q

− 1

]≥ d

I Can be rewritten in terms of i as

i ≤ t

[(q

p

)d + 1

]−1/q

I By time t there are exactly t nodes in the graph, so the fraction is

F (d) =

[(q

p

)d + 1

]−1/q

= 1− F (d)

Network Science Analytics Degrees, Power Laws and Popularity 43

Page 44: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Identifying the power law

I The degree distribution is given by the PDF p(d)

I Recall that the PDF, CDF and CCDF are related, namely

p(x) =dF (x)

dx= −dF (x)

dx

I Differentiating F (d) =[(

qp

)d + 1

]−1/q

yields

p(d) =1

p

[(q

p

)d + 1

]−(1+ 1q )

I Showed p(d) ∝ d−(1+1/q), a power law with exponent α = 1 + 11−p

⇒ Disclaimer: Relied on heuristic arguments

⇒ Rigorous, probabilistic analysis possible

Network Science Analytics Degrees, Power Laws and Popularity 44

Page 45: Degrees, Power Laws and Popularitygmateosb/ECE442/Slides/... · Power law and exponential degree distributions THE SCALE-FREE PROPERTY 10 Poisson vs. Power-law Distributions Figure

Glossary

I Degree distribution

I Erdos-Renyi model

I Binomial distribution

I Law of rare events

I Right-skewed distribution

I Logarithmic scale

I Power law

I Exponential and heavy tails

I Scale-free network

I Characteristic scale

I Logarithmic binning

I Cumulative frequencies

I Hill estimator and plot

I Exponential cutoff

I Coupled decisions

I Preferential attachment model

I Rich-gets-richer phenomena

I Growth equation

Network Science Analytics Degrees, Power Laws and Popularity 45