random walks: basic concepts and applicationspages.di.unipi.it/ricci/slidesrandomwalk.pdf · basic...


Random Walks: Basic Concepts and Applications

Laura Ricci

Dipartimento di Informatica

25 July 2012

PhD in Computer Science


Outline

1 Basic Concepts

2 Natural Random Walk

3 Random Walks Characterization

4 Metropolis Hastings

5 Applications


Random Walk: Basic Concepts

A random walk, in brief:

given an undirected graph and a starting point, select a neighbour at random

move to the selected neighbour and repeat the same process until a termination condition is met

the random sequence of points selected in this way is a random walk on the graph
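The process above can be sketched in a few lines of Python. The adjacency-dictionary representation, the function name `random_walk` and the example graph are illustrative assumptions, and the termination condition used here is simply a fixed number of steps.

```python
import random

def random_walk(adj, start, steps, rng=random):
    """Return the sequence of vertices visited by a walk of `steps` moves.

    `adj` maps each vertex to the list of its neighbours (undirected graph).
    Termination condition (an assumption): stop after a fixed number of moves.
    """
    walk = [start]
    current = start
    for _ in range(steps):
        current = rng.choice(adj[current])  # select a neighbour at random
        walk.append(current)
    return walk

# Hypothetical example graph: a square 0-1-2-3 with the diagonal 1-3.
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(random_walk(adj, start=0, steps=10, rng=random.Random(42)))
```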


A Random Walk


The Natural Random Walk

Natural Random Walk

Given an undirected graph G = (V, E), with n = |V| and m = |E|, a natural random walk is a stochastic process that starts from a given vertex and then, at each step, selects one of the neighbours of the current vertex uniformly at random to visit.

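The natural random walk is defined by the following transition matrix P:

$$P(x,y) = \begin{cases} \frac{1}{\mathrm{degree}(x)} & \text{if } y \text{ is a neighbour of } x \\ 0 & \text{otherwise} \end{cases}$$

where degree(x) is the degree of node x (in an undirected graph, its out-degree equals its degree)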
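A minimal Python sketch of this definition, assuming the graph is stored as a dictionary mapping each vertex to its list of neighbours (an illustrative choice, as is the name `natural_transition_matrix`). Exact `Fraction` entries make the row sums, and the fact that P(x, y) and P(y, x) generally differ, easy to inspect.

```python
from fractions import Fraction

def natural_transition_matrix(adj):
    """P[x][y] = 1/degree(x) if y is a neighbour of x, else 0."""
    nodes = sorted(adj)
    return {
        x: {y: (Fraction(1, len(adj[x])) if y in adj[x] else Fraction(0)) for y in nodes}
        for x in nodes
    }

# Hypothetical example graph: vertex 0 has degree 2, vertex 1 has degree 3.
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
P = natural_transition_matrix(adj)
print(P[1][0], P[0][1])  # 1/3 1/2
```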


Natural Random Walk

note that we assume an undirected graph, i.e. if the walker can go from i to j, it can also go from j to i

this does not imply that the probability of the transition i → j is the same as that of the transition j → i

it depends on the degrees of the nodes: P(i, j) = 1/degree(i), whereas P(j, i) = 1/degree(j)


Natural Random Walk: Stationary Distribution

Stationary Distribution

Given a connected, non-bipartite graph (so that the walk is irreducible and aperiodic) with a set of nodes N and a set E of edges, the probability of being at a particular node v converges to the stationary distribution

$$\pi_{RW}(v) = \frac{\mathrm{degree}(v)}{2|E|} \propto \mathrm{degree}(v)$$

if we run the random walk for sufficiently long, we get arbitrarily close to the distribution π_RW(v)

put in a different way: the fraction of time spent at a node is directly proportional to the degree of the node

the probability of sampling a node depends on its degree

the natural random walk is inherently biased towards nodes with higher degree
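The convergence to π_RW can be observed numerically. The sketch below assumes the same adjacency-dictionary representation and a hypothetical connected, non-bipartite example graph; it repeatedly applies P to a point mass at the starting vertex and compares the result with degree(v)/2|E|.

```python
def stationary_by_iteration(adj, start, iterations=200):
    """Push a point mass at `start` through the natural random walk `iterations` times."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(iterations):
        new = {v: 0.0 for v in adj}
        for x, mass in dist.items():
            for y in adj[x]:
                new[y] += mass / len(adj[x])  # P(x, y) = 1/degree(x)
        dist = new
    return dist

# Hypothetical graph with triangles, hence non-bipartite (aperiodic walk).
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2|E|
approx = stationary_by_iteration(adj, start=0)
for v in adj:
    print(v, round(approx[v], 4), len(adj[v]) / two_m)
```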


Natural Random Walk Stationary Distribution

Consider a random walk on the house graph below:

(d1, d2, d3, d4, d5) = (2, 3, 3, 2, 2)

the degrees sum to 2|E| = 12, so the stationary distribution is

(π1, π2, π3, π4, π5) = (2/12, 3/12, 3/12, 2/12, 2/12)
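The house-graph figure is not reproduced here, so the edge set below is an assumption chosen to match the degree sequence (2, 3, 3, 2, 2). With it, a short exact computation confirms that π = degree/2|E| satisfies πP = π.

```python
from fractions import Fraction

# Assumed house-graph edges (figure not reproduced): roof vertex 1 joined to
# 2 and 3, the edge 2-3, and the square 2-4, 4-5, 5-3.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5)]
adj = {v: [] for v in range(1, 6)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

degrees = {v: len(adj[v]) for v in adj}
two_m = 2 * len(edges)  # 12
pi = {v: Fraction(degrees[v], two_m) for v in adj}

# Stationarity: (pi P)(y) = sum over neighbours x of y of pi(x) / degree(x).
pi_P = {y: sum(pi[x] * Fraction(1, degrees[x]) for x in adj[y]) for y in adj}
print(pi == pi_P, [str(pi[v]) for v in sorted(pi)])
```

`Fraction` prints values in lowest terms, so 2/12 appears as 1/6 and 3/12 as 1/4.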


Natural Random Walk Stationary Distribution

A king moves at random on an 8 × 8 chessboard. The number of moves available from the various locations is as follows:

interior tiles: 8 moves

edge tiles: 5 moves

corner tiles: 3 moves

Summing over the 36 interior, 24 edge and 4 corner tiles gives 36 × 8 + 24 × 5 + 4 × 3 = 420 possible moves, i.e. the sum of the degrees, which is twice the number of edges (210)

Therefore, the stationary probability of an interior tile is 8/420, of a corner tile 3/420 and of an edge tile 5/420

This gives an idea of the time spent by the king on each kind of tile during the random walk
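The total of 420 can be reproduced by enumerating king moves; the helper name `king_degree` is illustrative.

```python
def king_degree(r, c, size=8):
    """Number of legal king moves from square (r, c) on a size x size board."""
    return sum(
        1
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < size and 0 <= c + dc < size
    )

degrees = [king_degree(r, c) for r in range(8) for c in range(8)]
total = sum(degrees)  # sum of degrees = 2 * (number of edges)
print(total, sorted(set(degrees)))  # 420 [3, 5, 8]
```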


Random Walks and Markov Chains

Time Reversible Markov Chain: in the stationary regime, the probability of a sequence of states occurring is equal to the probability of the reversed sequence occurring

running time backward does not affect the distribution at all

Consider a Markov chain with state space S = {s1, ..., sk} and transition matrix P. A stationary distribution π on S is said to be time reversible for the chain if, for all i, j ∈ {1, ..., k}:

$$\pi(i)\, P(i,j) = \pi(j)\, P(j,i)$$
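The definition translates directly into a check over all pairs of states. The helper name and the two-state example chain below are hypothetical; exact fractions avoid floating-point ties.

```python
from fractions import Fraction

def is_time_reversible(pi, P):
    """True if pi(i) * P[i][j] == pi(j) * P[j][i] for every pair of states."""
    k = len(pi)
    return all(pi[i] * P[i][j] == pi[j] * P[j][i] for i in range(k) for j in range(k))

# Hypothetical two-state chain whose stationary distribution is (2/3, 1/3).
P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]
pi = [Fraction(2, 3), Fraction(1, 3)]
print(is_time_reversible(pi, P))  # True
```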


Time Reversible Markov Chain

Time Reversibility interpretation

think of π(i) × P(i, j) as the long-run fraction of transitions made by the Markov chain that go from state i to state j

time reversibility requires that the long-run fraction of i-to-j transitions is the same as that of j-to-i transitions, for all i, j

note that this is a more stringent requirement than stationarity, which only equates the long-run fraction of transitions that go out of state i with the long-run fraction of transitions that go into state i


Natural Random Walk and Markov Chains

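the natural random walk on a graph is a time reversible Markov chain with respect to its stationary distribution

as a matter of fact, writing d_i for the degree of v_i and d = 2|E| for the sum of all degrees, for all neighbours v_i and v_j:

$$\pi(i)\, P(i,j) = \frac{d_i}{d}\cdot\frac{1}{d_i} = \frac{1}{d}, \qquad \pi(j)\, P(j,i) = \frac{d_j}{d}\cdot\frac{1}{d_j} = \frac{1}{d}$$

otherwise, if v_i and v_j are not neighbours, P(i, j) = P(j, i) = 0 and

$$\pi(i)\, P(i,j) = \pi(j)\, P(j,i) = 0 \qquad (1)$$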



not all walks on a graph are reversible Markov chains

consider the following walker on a cycle of four nodes: at each time step, the walker moves one step clockwise with probability 3/4 and one step counterclockwise with probability 1/4

π = (1/4, 1/4, 1/4, 1/4) is the only stationary distribution



The transition graph is the following one:

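since π = (1/4, 1/4, 1/4, 1/4) is the only stationary distribution, it is sufficient to show that it is not reversible to conclude that the chain is not reversible

$$\pi(1)\, P(1,2) = \frac{1}{4}\cdot\frac{3}{4} = \frac{3}{16}$$

$$\pi(2)\, P(2,1) = \frac{1}{4}\cdot\frac{1}{4} = \frac{1}{16}$$

since 3/16 ≠ 1/16, the detailed balance condition fails and the chain is not time reversible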
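The same check can be run on the clockwise-biased walker. In this sketch the nodes are numbered 0 to 3 instead of 1 to 4, and node (i + 1) mod 4 is assumed to be the clockwise neighbour of node i.

```python
from fractions import Fraction

k = 4
cw, ccw = Fraction(3, 4), Fraction(1, 4)
# Nodes 0..3 on a cycle; node i moves clockwise to (i + 1) % k (assumed orientation).
P = [[Fraction(0)] * k for _ in range(k)]
for i in range(k):
    P[i][(i + 1) % k] += cw
    P[i][(i - 1) % k] += ccw
pi = [Fraction(1, k)] * k

stationary = all(sum(pi[i] * P[i][j] for i in range(k)) == pi[j] for j in range(k))
balanced = all(pi[i] * P[i][j] == pi[j] * P[j][i] for i in range(k) for j in range(k))
print(stationary, balanced)  # True False
```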


Random Walk Metrics

Important measures of Random Walk

Access or Hitting Time, H_ij: expected number of steps before node j is visited, starting from node i.

Commute Time: expected number of steps in the random walk starting at i before node j is visited and then node i is reached again; it equals H_ij + H_ji.

Cover Time: expected number of steps to reach every node, starting from a given initial distribution.

Graph Cover Time: maximum cover time over all starting vertices.

Mixing Rate: measures how fast the random walk converges to the stationary distribution (steady state).
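These metrics can be estimated by simulation. The sketch below is an illustrative Monte Carlo approach rather than an exact method: it averages many simulated walks on a hypothetical path graph 0-1-2-3, for which the exact values are H(0,3) = 3² = 9, commute time 18, and cover time from 0 equal to H(0,3). Function names are illustrative.

```python
import random

def hitting_steps(adj, i, j, rng):
    """Number of steps one simulated walk from i needs to first visit j."""
    steps, current = 0, i
    while current != j:
        current = rng.choice(adj[current])
        steps += 1
    return steps

def cover_steps(adj, start, rng):
    """Number of steps one simulated walk from `start` needs to visit every vertex."""
    seen, current, steps = {start}, start, 0
    while len(seen) < len(adj):
        current = rng.choice(adj[current])
        seen.add(current)
        steps += 1
    return steps

def average(sample, runs=20_000):
    """Monte Carlo estimate: mean of `runs` independent samples."""
    return sum(sample() for _ in range(runs)) / runs

rng = random.Random(0)
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # hypothetical path 0 - 1 - 2 - 3
h03 = average(lambda: hitting_steps(path, 0, 3, rng))
h30 = average(lambda: hitting_steps(path, 3, 0, rng))
cover0 = average(lambda: cover_steps(path, 0, rng))
print(f"H(0,3) ~ {h03:.2f}, commute(0,3) ~ {h03 + h30:.2f}, cover from 0 ~ {cover0:.2f}")
```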


Computing Random Walk Metrics: A Warm Up Example

the values of the metrics depend on

the graph topology

the probability fluxes on the graph

let us consider the (simple and not realistic) case of a complete graph with nodes {0, ..., n − 1}

each pair of vertices is connected by an edge

consider a natural random walk on this graph and compute

the access time for a pair of nodes

the cover time of the random walk


A Warm Up Example: Hitting Time

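each node has the same number of connections to other nodes

so we can consider a generic pair of nodes, for instance node 0 and node 1, and compute H(0, 1) without loss of generality

at each step the walk moves to node 1 with probability 1/(n − 1), so the probability that, starting from node 0, we reach node 1 for the first time at the t-th step is

$$\left(\frac{n-2}{n-1}\right)^{t-1}\frac{1}{n-1}$$

so the expected hitting time is the mean of a geometric distribution with success probability 1/(n − 1):

$$H(0,1) = \sum_{t=1}^{\infty} t\left(\frac{n-2}{n-1}\right)^{t-1}\frac{1}{n-1} = n-1$$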
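The series can be checked numerically by truncating it; the function name `hitting_time_series` and the truncation length are illustrative choices.

```python
def hitting_time_series(n, terms=10_000):
    """Partial sum of sum_t t * ((n-2)/(n-1))**(t-1) * 1/(n-1) for the complete graph K_n."""
    q = (n - 2) / (n - 1)  # probability of missing node 1 in a single step
    return sum(t * q ** (t - 1) / (n - 1) for t in range(1, terms + 1))

for n in (3, 5, 10):
    print(n, round(hitting_time_series(n), 6), n - 1)
```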


A Warm Up Example: Cover Time

The problem of covering a complete graph is closely related to the Coupon Collector Problem

since you are fond of cereal, you often buy it

each box of cereal contains one of n different coupons

each coupon is chosen independently and uniformly at random from the n types

you cannot collaborate with other people to collect the coupons!

when you have collected one of every type of coupon, you win a prize!

under these conditions, what is the expected number of boxes of cereal you have to buy before you win the prize?



The coupon collector problem is modelled through geometric distributions

Geometric distribution:

a sequence of independent trials is repeated until the first success

each trial succeeds with probability p

$$P(X = k) = (1-p)^{k-1}\,p$$

$$E[X] = \frac{1}{p}$$


A Warm Up Example: Cover Time

Modelling the coverage problem as a coupon collector problem:

the coupons are the vertices of the graph

collecting a coupon corresponds to visiting a new node



Cover time may be modelled by a sequence of geometric variables

Let us define a vertex as collected when it has been visited at least once by the random walk

X: number of vertices visited by the random walk (counting repetitions) before all the vertices of the graph have been collected

X_i: number of visits made after having collected i − 1 vertices, up to and including the visit that collects the i-th vertex

$$X = \sum_{i=1}^{n} X_i$$

we are interested in computing the expected value E[X]



X_i is a geometric random variable with success probability

$$p_i = 1 - \frac{i-1}{n}, \qquad E[X_i] = \frac{1}{p_i} = \frac{n}{n-i+1}$$

by exploiting the linearity of expectation

$$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n}\frac{n}{n-i+1} = n\sum_{i=1}^{n}\frac{1}{i}$$

the summation $\sum_{i=1}^{n} 1/i$ is the harmonic number H_n ≈ ln n

we can conclude that the cover time (time to collect all the vertices of the graph) is ≈ n log n
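A simulation of the walk on K_n illustrates the n log n behaviour. One caveat, which the coupon collector model glosses over: a natural random walk on K_n never stays at its current vertex, so after i vertices have been collected a new one is found with probability (n − i)/(n − 1), which gives an exact expected cover time of (n − 1)H_{n−1}, slightly below n H_n; both are ≈ n log n. The function names and the choice n = 50 are illustrative.

```python
import random

def cover_time_complete(n, rng):
    """Steps a natural random walk on K_n (no self-loops) needs to visit every vertex."""
    current, seen, steps = 0, {0}, 0
    while len(seen) < n:
        # Jump to one of the other n - 1 vertices, uniformly at random.
        nxt = rng.randrange(n - 1)
        current = nxt if nxt < current else nxt + 1
        seen.add(current)
        steps += 1
    return steps

def harmonic(k):
    """k-th harmonic number H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1 / i for i in range(1, k + 1))

n, runs, rng = 50, 2000, random.Random(1)
estimate = sum(cover_time_complete(n, rng) for _ in range(runs)) / runs
print(f"simulated {estimate:.1f}, (n-1)H_(n-1) = {(n - 1) * harmonic(n - 1):.1f}, "
      f"n H_n = {n * harmonic(n):.1f}")
```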



Note that

if we want to collect all the nodes through a single random walk, we obviously need at least n − 1 steps (one for each node other than the starting one)

the log n factor is a reasonable overhead due to the randomness of the approach

this result has been obtained for a complete (hence regular) graph

in general, the cover time depends on the topology of the graph and on the probability fluxes on it


The Lollipop Graph

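in the "lollipop graph" (a clique with a path attached to one of its vertices), the asymmetry in the number of neighbours implies very different hitting times H_uv and H_vu, where u is a vertex of the clique and v is the free end of the path

every random walk starting at v has no option but to move along the path in the direction of u, so H_vu = Θ(n²)

a random walk starting at u has very little probability of proceeding along the path, since its many clique neighbours keep pulling it back into the clique, so H_uv = Θ(n³)

the cover time of the graph is high (Θ(n³)) for a similar reason.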
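Hitting times can also be computed exactly by solving the linear system h(target) = 0, h(x) = 1 + Σ_y P(x, y) h(y). The sketch below does this with plain Gaussian elimination; the `lollipop` constructor (a clique with a path hanging off vertex 0) and the choice of u as the attachment vertex are illustrative assumptions, since the figure is not reproduced.

```python
def hitting_times_to(adj, target):
    """Exact expected hitting times H(x, target) for the natural random walk.

    Solves (I - Q) h = 1 over the non-target vertices by Gauss-Jordan
    elimination with partial pivoting (pure Python, fine for small graphs).
    """
    nodes = [v for v in adj if v != target]
    index = {v: k for k, v in enumerate(nodes)}
    size = len(nodes)
    A = [[0.0] * (size + 1) for _ in range(size)]  # augmented matrix [I - Q | 1]
    for v, r in index.items():
        A[r][r] += 1.0
        A[r][size] = 1.0
        for w in adj[v]:
            if w != target:
                A[r][index[w]] -= 1.0 / len(adj[v])
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(size):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col] / A[col][col]
                for c in range(col, size + 1):
                    A[r][c] -= factor * A[col][c]
    h = {v: A[index[v]][size] / A[index[v]][index[v]] for v in nodes}
    h[target] = 0.0
    return h

def lollipop(clique, path):
    """Clique on vertices 0..clique-1; a path of `path` extra vertices hangs off vertex 0."""
    adj = {v: [w for w in range(clique) if w != v] for v in range(clique)}
    prev = 0
    for v in range(clique, clique + path):
        adj[v] = [prev]
        adj[prev].append(v)
        prev = v
    return adj

g = lollipop(10, 10)
u, v = 0, 19  # u: attachment vertex in the clique, v: free end of the path
print(round(hitting_times_to(g, v)[u], 3), round(hitting_times_to(g, u)[v], 3))
```

For this clique of 10 vertices with a path of 10 extra vertices, H(v, u) is 10² = 100 (the walk is confined to the path), while H(u, v) is 1000, an order of magnitude larger.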


The Metropolis Hastings Method

Markov chains (and random walks) are a very useful and general tool for simulation

suppose we want to simulate a random draw from some distribution π on a finite set S

for instance: generate a set of numbers distributed according to a power law, or to a Gaussian distribution, ...

we can exploit the basic theorem of Markov chains:

find an irreducible, aperiodic transition matrix P whose stationary distribution is π (πP = π)

run the corresponding Markov chain for a sufficiently long time

now the problem is to find a matrix P whose stationary distribution is π



start from a given connected undirected graph defined on a set S of states. For instance:

a P2P network overlay

the social relationship graph

define the transition matrix P such that the random walk on this graph converges to the stationary distribution π

the definition of P depends on:

the graph topology

the probability distribution π we want to obtain

different graph topologies lead to different matrices P with stationary distribution π.


The Metropolis Hastings Method

to obtain a stationary distribution where the probability of visiting each node is proportional to its degree, simply run a natural random walk

in general, however, we want to obtain a different, arbitrary distribution π, such that:

$$\pi(i) \propto \mathrm{degree}(i)\, f(i)$$

or

$$\pi(i) \propto \pi_{RW}(i)\, f(i)$$

Our Goal

find a simple way to modify the random walk transition probabilities so that the modified transition matrix has stationary distribution π.



To obtain the distribution π such that:

$$\pi(i) \propto \mathrm{degree}(i)\, f(i) \qquad (2)$$

the Metropolis Hastings method defines the following transition matrix:

$$P(i,j) = \begin{cases} \frac{1}{\mathrm{degree}(i)} \min\left\{1, \frac{f(j)}{f(i)}\right\} & \text{if } j \in N(i) \\ 1 - \sum_{k \in N(i)} P(i,k) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

Theorem

Let π be the distribution defined by (2) and P(i, j) be defined by (3); then πP = π, i.e. π is a stationary distribution.
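The theorem can be checked exactly on a small example. In this sketch `metropolis_hastings_matrix`, the path graph and the weights f are hypothetical; the code builds (3) and verifies πP = π for π(i) ∝ degree(i) f(i).

```python
from fractions import Fraction

def metropolis_hastings_matrix(adj, f):
    """Transition matrix (3): uniform neighbour proposal, acceptance min(1, f(j)/f(i))."""
    P = {i: {j: Fraction(0) for j in adj} for i in adj}
    for i in adj:
        for j in adj[i]:
            P[i][j] = Fraction(1, len(adj[i])) * min(Fraction(1), f[j] / f[i])
        P[i][i] = 1 - sum(P[i][k] for k in adj[i])  # rejected proposals stay at i
    return P

# Hypothetical example: a path 0 - 1 - 2 with arbitrary positive weights f.
adj = {0: [1], 1: [0, 2], 2: [1]}
f = {0: Fraction(1), 1: Fraction(3), 2: Fraction(2)}
P = metropolis_hastings_matrix(adj, f)

weights = {i: len(adj[i]) * f[i] for i in adj}  # pi(i) proportional to degree(i) * f(i)
total = sum(weights.values())
pi = {i: w / total for i, w in weights.items()}
pi_P = {j: sum(pi[i] * P[i][j] for i in adj) for j in adj}
print(pi_P == pi)  # True
```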



A modified version of the natural random walk

To run the natural random walk, at each step choose a random neighbour and go there.

To run Metropolis Hastings, suppose we are at node i and generate the next random transition as follows:

start out in the same way as the natural random walk, by choosing a random neighbour j of i; j is a "candidate" state

then make a further probabilistic decision: "accept the candidate" and move to j, or "reject it" and stay at i

the probability of accepting the candidate is given by the extra factor

$$\min\left\{1, \frac{f(j)}{f(i)}\right\}$$
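The propose-then-accept procedure can be sketched as a single step function. Run long enough on the same hypothetical path graph and weights used to check the theorem, the visit frequencies should approach π = (1/9, 6/9, 2/9). The name `mh_step` is illustrative; the test `rng.random() < a` accepts with probability exactly a because `random()` is uniform on [0, 1).

```python
import random

def mh_step(adj, f, i, rng):
    """One Metropolis Hastings move from node i."""
    j = rng.choice(adj[i])                   # candidate, as in the natural random walk
    if rng.random() < min(1.0, f[j] / f[i]):
        return j                             # accept the candidate
    return i                                 # reject it: stay at i

rng = random.Random(7)
adj = {0: [1], 1: [0, 2], 2: [1]}            # hypothetical path 0 - 1 - 2
f = {0: 1.0, 1: 3.0, 2: 2.0}                 # hypothetical weights
counts, node, steps = {v: 0 for v in adj}, 0, 200_000
for _ in range(steps):
    node = mh_step(adj, f, node, rng)
    counts[node] += 1
print({v: round(c / steps, 3) for v, c in counts.items()})  # close to 1/9, 6/9, 2/9
```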



Let us consider the "correction factor"

$$\min\left\{1, \frac{f(j)}{f(i)}\right\}$$

if f(j) ≥ f(i), the minimum is 1, and the chain definitely moves from i to j

if f(j) < f(i), the minimum is f(j)/f(i), and the chain moves to j with probability f(j)/f(i)

if f(j) is much smaller than f(i), the desired distribution π places much less probability on j, relative to i, than the natural random walk does

the chain should therefore make a transition from i to j much less frequently than the natural random walk does

this is accomplished in the Metropolis Hastings chain by usually rejecting the candidate j


Metropolis Hastings: Adjusting for Degree Bias

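avoid the bias toward higher degree nodes

build a uniform distribution even for non-regular graphs

define the transition matrix as follows:

$$P(i,j) = \begin{cases} \frac{1}{\mathrm{degree}(i)} \min\left\{1, \frac{\mathrm{degree}(i)}{\mathrm{degree}(j)}\right\} & \text{if } j \in N(i) \\ 1 - \sum_{k \in N(i)} P(i,k) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

the bias toward higher degree nodes is removed by reducing the probability of transitioning to higher degree nodes at each step

this instantiates the general formula with

$$f(j) = \frac{1}{\mathrm{degree}(j)}, \qquad f(i) = \frac{1}{\mathrm{degree}(i)}$$

so that π(i) ∝ degree(i) · f(i) = 1, i.e. π is uniform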


Metropolis Hastings: Adjusting for Degree Bias

The algorithm for selecting the next step from node x:

Select a neighbour y of x uniformly at random

Query y for the list of its neighbours, to determine its degree

Generate a random number p uniformly at random between 0 and 1

If p ≤ degree(x)/degree(y), y is the next step

Otherwise, remain at x as the next step
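A direct transcription of the five steps, assuming an in-memory adjacency dictionary where "querying y" is just `len(adj[y])` (in a real P2P overlay it would be a network request). The example graph is hypothetical and deliberately irregular; visit frequencies should approach 1/5 for every node.

```python
import random

def uniform_mh_step(adj, x, rng):
    """Next step of the degree-corrected walk (target: uniform over nodes)."""
    y = rng.choice(adj[x])               # 1. neighbour y chosen uniformly
    degree_y = len(adj[y])               # 2. "query" y for its degree
    p = rng.random()                     # 3. uniform number in [0, 1)
    if p <= len(adj[x]) / degree_y:      # 4. accept y ...
        return y
    return x                             # 5. ... otherwise stay at x

# Hypothetical graph: hub 0 linked to 1, 2, 3, 4, plus the edges 1-2 and 1-3.
adj = {0: [1, 2, 3, 4], 1: [0, 2, 3], 2: [0, 1], 3: [0, 1], 4: [0]}
rng = random.Random(3)
counts, node, steps = {v: 0 for v in adj}, 0, 200_000
for _ in range(steps):
    node = uniform_mh_step(adj, node, rng)
    counts[node] += 1
print({v: round(c / steps, 3) for v, c in counts.items()})  # each close to 0.2
```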


Metropolis Hastings: An Example

Consider again the house graph:

[Figure: on the left, the natural random walk stationary distribution; on the right, the target distribution]


Metropolis Hastings: An Example

Consider again the house graph:

to define f(i) for every i, exploit the relation

$$\pi_{TARGET}(i) = f(i)\,\pi_{RW}(i) = f(i)\,\frac{d(i)}{2M}$$

where d(i) is the degree of node i and 2M = 2|E| = 12

for instance

$$f(1) = \frac{4/12}{2/12} = 2$$

similarly, f(2) = 2/3 = f(3) and f(4) = 1 = f(5).


Metropolis Hastings: An Example

For instance, the transition probabilities are computed as follows:

$$P(1,2) = \frac{1}{2}\min\left\{1, \frac{2/3}{2}\right\} = \frac{1}{6}$$

since P(1, 3) = 1/6 as well,

$$P(1,1) = 1 - \frac{1}{6} - \frac{1}{6} = \frac{2}{3}$$

and so on ...
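The numbers of the example can be reproduced with exact fractions. The edge set is an assumed house graph matching the degrees (2, 3, 3, 2, 2), since the figure is not reproduced, and the target distribution (4, 2, 2, 2, 2)/12 is recovered from the f values given above; the final check confirms that the target is stationary for the resulting P.

```python
from fractions import Fraction

# Assumed house-graph edges (figure not reproduced): 1-2, 1-3, 2-3, 2-4, 3-5, 4-5.
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
target = {1: Fraction(4, 12), 2: Fraction(2, 12), 3: Fraction(2, 12),
          4: Fraction(2, 12), 5: Fraction(2, 12)}
two_m = sum(len(nbrs) for nbrs in adj.values())  # 12
# f(i) = pi_TARGET(i) / pi_RW(i), with pi_RW(i) = d(i) / 2M.
f = {i: target[i] / Fraction(len(adj[i]), two_m) for i in adj}

def P(i, j):
    """Metropolis Hastings transition probability from i to j."""
    if j in adj[i]:
        return Fraction(1, len(adj[i])) * min(Fraction(1), f[j] / f[i])
    if i == j:
        return 1 - sum(P(i, k) for k in adj[i])
    return Fraction(0)

print(f[1], f[2], f[4], P(1, 2), P(1, 1))  # 2 2/3 1 1/6 2/3
stationary = all(sum(target[i] * P(i, j) for i in adj) == target[j] for j in adj)
print(stationary)  # True
```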


Random Walks: Applications

Random walks have been exploited to model different scenarios in mathematics and physics:

Brownian motion of a dust particle

statistical mechanics

Random Walks in Computer Science

epidemic diffusion of information

generation of random samples from a large set (for instance, a set of nodes of a complex network)

computation of aggregate functions on complex sets ...


Random Walks in Computer Science

The graph:

nodes: nodes of a peer-to-peer network, vertices of a social graph

edges: P2P overlay connections, social relations in a social network

since a random walk can be viewed as a Markov chain, it is completely characterized by its stochastic matrix

we are interested in the probabilities to be assigned to the edges of the graph in order to obtain a given probability distribution


Random Walks: Sampling a Complex Network

measuring the characteristics of complex networks:

P2P networks

online social networks

World Wide Web

the complete dataset is not available due to

the huge size of the network

privacy concerns

sampling techniques are essential for the practical estimation of network properties

the properties of the network are studied on a small but representative sample.


Random Walks: Sampling a Complex Network

Parameters which may be estimated by a random walk:

topological characteristics

node degree distribution

clustering coefficient

network size

network diameter

node characteristics

link bandwidth

number of shared files

number of friends in a social network

A large number of proposals have appeared in recent years.


Random Walks: Sampling a Complex Network

Random walk convergence detection mechanisms are required to obtain a useful sample:

a valid sample of a network may be derived from a random walk only when the distribution is stationary

this is true only asymptotically

how many of the initial samples in each walk have to be discarded to lose the dependence on the starting point (burn-in problem)?

a further problem:

how many samples must be collected before we have a representative sample?


Random Walks: Convergence Detection

A naive approach:

run the sampling process long enough and proactively discard a number of initial burn-in samples

pro: simplicity

con: from a practical point of view, the length of the burn-in phase should be minimized to reduce bandwidth consumption and computational time

A more refined approach: estimate convergence from a set of statistical properties of the walks as they are collected

several techniques are available from the Markov chain diagnostics literature


Geweke’s Diagnostics

basic idea: take two non-overlapping parts of the random walk

compare the means of both parts, using a difference-of-means test

if the two parts are separated by many iterations, the correlation between them is low

check whether the two parts of the chain belong to the same distribution

the two parts should be identically distributed when the walk has converged


Geweke’s Diagnostics

Let X be a single sequence of samples of the metric of interest, obtained by a random walk. Consider:

a prefix X_a of X (usually the first 10% of X)

a longer suffix X_b of X (usually the last 50% of X)

compute the z statistic

$$z = \frac{E(X_a) - E(X_b)}{\sqrt{\mathrm{Var}(X_a) + \mathrm{Var}(X_b)}} \qquad (4)$$

if |z| > T, where T is a threshold value (usually 1.96), the iterates from the prefix segment were not yet drawn from the target distribution and should be discarded

otherwise, declare convergence
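A simplified sketch of the test, assuming that Var(·) in (4) denotes the variance of each segment's mean, estimated naively as sample variance divided by segment length. The original diagnostic uses spectral-density estimates to account for autocorrelation, which this sketch omits; the example sequences are synthetic and the function names are illustrative.

```python
import random
from statistics import fmean, variance

def geweke_z(samples, first=0.1, last=0.5):
    """Simplified Geweke z-score: prefix (first 10%) vs suffix (last 50%).

    Assumption: Var(.) in (4) is taken as the variance of each segment's mean,
    estimated as sample variance / segment length (no autocorrelation correction).
    """
    n = len(samples)
    xa = samples[: int(first * n)]
    xb = samples[n - int(last * n):]
    var_a = variance(xa) / len(xa)
    var_b = variance(xb) / len(xb)
    return (fmean(xa) - fmean(xb)) / (var_a + var_b) ** 0.5

def converged(samples, threshold=1.96):
    """Declare convergence when |z| does not exceed the threshold T."""
    return abs(geweke_z(samples)) <= threshold

rng = random.Random(5)
steady = [rng.gauss(0, 1) for _ in range(5000)]
# Synthetic walk whose first 500 samples come from a different distribution.
drifting = [rng.gauss(5, 1) for _ in range(500)] + steady[:4500]
print(round(geweke_z(steady), 2), converged(drifting))
```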


Convergence: Using Multiple Parallel Random Walks

a single walk may get trapped in a cluster while exploring the network

the chain may stay for a long time in some non-representative region, and this may lead to an erroneous diagnosis of convergence

to improve convergence detection, use multiple parallel random walks

Gelman-Rubin Diagnostics:

uses parallel chains and discards the initial values of each chain

checks whether all the chains converge to approximately the same target distribution

the test outputs a single value R that is a function of the means and variances of all chains

failure (R noticeably greater than 1) indicates the need to run longer chains: burn-in is yet to be completed
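The basic form of the statistic (without later refinements such as split chains) can be sketched as follows, for m chains of equal length n after burn-in has been discarded. Values close to 1 suggest convergence; a commonly used rule of thumb treats R below about 1.1 as acceptable. The synthetic chains and the function name are illustrative.

```python
import random
from statistics import fmean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor R for m chains of equal length n."""
    n = len(chains[0])
    means = [fmean(c) for c in chains]
    W = fmean([variance(c) for c in chains])  # mean within-chain variance
    B = n * variance(means)                   # between-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled estimate of the target variance
    return (var_hat / W) ** 0.5

rng = random.Random(11)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# One chain "trapped" in a region with a different mean.
stuck = mixed[:3] + [[rng.gauss(3, 1) for _ in range(1000)]]
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```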


Applications: Random Membership Management

a membership service provides a node with a list of the members of a dynamic network

the overhead of maintaining the full list of members may be too high for a complex network:

a node maintains a random subset of the nodes

A random-walk-based membership service: a random sample of size k is computed by node i as follows:

start k Metropolis Hastings random walks in parallel

the probability of visiting a node converges to the uniform one

each node visited by the random walk sends its membership information (for instance its IP address) to i, which updates its local membership set
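A sequential simulation of this procedure, under assumptions not stated above: the walks use the degree-corrected Metropolis Hastings step, each walk runs for a fixed number of steps `walk_length` as burn-in, and only the node it reaches at the end reports back to i, so that each walk contributes one (approximately uniform) member. `random_membership_sample` and the example graph are hypothetical.

```python
import random

def uniform_mh_step(adj, x, rng):
    """Degree-corrected step: accept neighbour y with probability min(1, deg(x)/deg(y))."""
    y = rng.choice(adj[x])
    return y if rng.random() <= len(adj[x]) / len(adj[y]) else x

def random_membership_sample(adj, i, k, walk_length, rng):
    """Hypothetical helper: node i launches k walks and collects where they end."""
    members = set()
    for _ in range(k):  # the k walks are independent; simulated one after another
        node = i
        for _ in range(walk_length):
            node = uniform_mh_step(adj, node, rng)
        members.add(node)  # this node "sends" its membership information to i
    return members

# Hypothetical irregular overlay: hub 0 with leaves 1..5, and a tail 5 - 6 - 7.
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0],
       5: [0, 6], 6: [5, 7], 7: [6]}
print(sorted(random_membership_sample(adj, i=7, k=5, walk_length=50, rng=random.Random(2))))
```

Duplicates collapse in the set, so the returned sample can contain fewer than k members.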


Applications: Load Balancing

Load imbalance in P2P networks may be caused by several factors:

uneven distribution of data

heterogeneity in node capacities

Load-biased random walks:

sample nodes with probabilities proportional to their load

discover overloaded nodes more often

each node issues a random walker that persistently runs as a node sampler

overloaded nodes exchange tasks with more lightly loaded nodes
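One way to realise a load-biased walk, assuming the Metropolis Hastings construction is used (the mechanism is not specified here): set f(i) = load(i)/degree(i), so that degree(i) f(i) = load(i) and π(i) ∝ load(i). The loads, the graph and the function name are hypothetical.

```python
import random

def load_biased_step(adj, load, x, rng):
    """MH step targeting pi(i) proportional to load(i) (loads must be positive).

    With f(i) = load(i) / degree(i), the acceptance ratio f(y) / f(x) becomes
    load(y) * degree(x) / (load(x) * degree(y)).
    """
    y = rng.choice(adj[x])
    ratio = (load[y] * len(adj[x])) / (load[x] * len(adj[y]))
    return y if rng.random() < min(1.0, ratio) else x

rng = random.Random(4)
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}  # hypothetical overlay
load = {0: 1.0, 1: 1.0, 2: 6.0, 3: 2.0}            # hypothetical loads; node 2 is overloaded
counts, node, steps = {v: 0 for v in adj}, 0, 200_000
for _ in range(steps):
    node = load_biased_step(adj, load, node, rng)
    counts[node] += 1
print({v: round(c / steps, 3) for v, c in counts.items()})  # roughly 0.1, 0.1, 0.6, 0.2
```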


Applications: Index Free Data Search

Index-free search methods as alternatives to structured DHTs:

flooding

random walks

Popularity-biased Random Walks

content popularity of a peer p_i: number of queries satisfied by p_i divided by the total number of queries received by p_i

define a bias in the search towards more popular peers

Index-free searching: each peer is probed with a probability proportional to the square root of its query popularity
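The probing rule can be sketched as follows, with hypothetical per-peer query counters and an illustrative function name. A random walk could realise these probabilities with the Metropolis Hastings construction by choosing f(i) = sqrt(popularity(i))/degree(i); that choice is an assumption, not stated above. Peers with zero popularity are never probed under this rule, and peers that have received no queries would need a default popularity (not handled here).

```python
import math

def probe_distribution(satisfied, received):
    """Probe probabilities proportional to sqrt(popularity), popularity = satisfied / received."""
    popularity = {p: satisfied[p] / received[p] for p in received}
    weights = {p: math.sqrt(popularity[p]) for p in popularity}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

# Hypothetical query counters for three peers.
received = {"p1": 100, "p2": 100, "p3": 50}
satisfied = {"p1": 64, "p2": 16, "p3": 8}
print(probe_distribution(satisfied, received))  # approximately p1: 0.5, p2: 0.25, p3: 0.25
```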
