Spacey Random Walks and Higher-Order Markov Chains



Spacey Random Walks on Higher-Order Markov Chains

David F. Gleich, Purdue University

Joint work with Austin Benson, Lek-Heng Lim; supported by NSF CAREER CCF-1149756 and IIS-1422918


[Image: “Spacey walk” on Google Images, from Film.com.]

WARNING! This talk presents the “forward” explicit derivation (i.e., lots of little steps) rather than the implicit “backwards” derivation (i.e., big intuitive leaps).


PageRank: The initial condition

My dissertation: Models & Algorithms for PageRank Sensitivity

The essence of PageRank: take any Markov chain P, and PageRank creates a related chain with great “utility”:
•  Unique stationary distribution
•  Fast convergence
•  Modeling flexibility

$(I - \alpha P)\, x = (1 - \alpha)\, v$
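For concreteness, here is a minimal sketch of solving this system by the standard fixed-point iteration $x \leftarrow \alpha P x + (1-\alpha) v$; the 3-state matrix and all names are illustrative, not from the talk.

```python
import numpy as np

def pagerank(P, v, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate x <- alpha*P@x + (1-alpha)*v; P column-stochastic, v stochastic."""
    x = v.copy()
    for _ in range(max_iter):
        x_next = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

# Tiny illustrative 3-state chain with uniform teleportation.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
v = np.ones(3) / 3
print(pagerank(P, v))
```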

PageRank beyond the Web, arXiv:1407.5107

[Press clipping by Jessica Leber, Fast Magazine.]


Be careful about what you discuss after a talk…

I gave a talk at the Univ. of Chicago and visited Lek-Heng Lim.

He told me about a new idea in Markov chain analysis and tensor eigenvalues.


Approximate stationary distributions of higher-order Markov chains

A higher-order Markov chain depends on the last few states. These become Markov chains on the product state space, but that space is usually too large for computing stationary distributions.

The approximation is that we form a rank-1 approximation of that stationary-distribution object.

Due to Michael Ng and collaborators

$P(X_{t+1} = i \mid \text{history}) = P(X_{t+1} = i \mid X_t = j,\ X_{t-1} = k)$

The stationary distribution lives on pairs of states: $P(X = [i,j]) = X_{i,j}$.

The rank-1 approximation: $P(X = [i,j]) \approx x_i x_j$.
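To make the product-state-space construction concrete, here is a minimal sketch using an arbitrary random tensor (all names and the rank-1 stand-in via the marginal are my assumptions, not the talk's code): it builds the $n^2$-state chain on pairs, computes its stationary matrix $X$, and measures how far $X$ is from a rank-1 outer product.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
P = rng.random((n, n, n))
P /= P.sum(axis=0)            # make each column P[:, j, k] stochastic

# Markov chain on pairs: state (j, k) -> (i, j) with probability P[i, j, k].
M = np.zeros((n * n, n * n))
for j in range(n):
    for k in range(n):
        for i in range(n):
            M[i * n + j, j * n + k] = P[i, j, k]

# Stationary distribution of the pair chain from the leading eigenvector.
w, V = np.linalg.eig(M)
x = np.real(V[:, np.argmax(np.real(w))])
X = (x / x.sum()).reshape(n, n)   # X[i, j] = Pr(X = [i, j])

# Crude rank-1 comparison: outer product of the marginal with itself.
marg = X.sum(axis=1)
print(np.linalg.norm(X - np.outer(marg, marg)))  # rank-1 approximation error
```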

Why?


Multidimensional, multi-faceted data from informatics and simulations are usually analyzed by two-dimensional matrix computations like the SVD (SVD, PCA, NMF, ROM, Eigenmaps), but two-dimensional analysis limits insights and methodology. This proposal investigates multidimensional representations and multi-dimensional clustering algorithms using new, structured tensor computations (multi-dimensional spectral clustering). These will impact methods to decompose hyperspectral images, determine network functions, and build reduced-order models of simulations.

We want to analyze higher-order relationships and multi-way data … things like
•  Enron emails
•  regular hypergraphs
And there are three or more indices! So it's a higher-order Markov chain.

Approximate stationary distributions of higher-order Markov chains

The new problem of computing an approximate stationary distribution is a tensor eigenvector problem.

The new problem:
•  existence is guaranteed under mild conditions
•  uniqueness and convergence require heroic algebra (and the conditions are hard to check)

Due to Michael Ng and collaborators

$x_i = \sum_{jk} P_{ijk}\, x_j x_k$, or compactly $x = P x^2$
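A minimal sketch of the tensor apply $z = P x^2$ and the naive fixed-point iteration for $x = P x^2$; convergence is exactly the delicate part noted above, so this loop may fail to settle. Illustrative code, not the authors' implementation.

```python
import numpy as np

def apply_Pxx(P, x):
    """(P x^2)_i = sum_{j,k} P[i, j, k] * x[j] * x[k]."""
    return np.einsum('ijk,j,k->i', P, x, x)

def li_ng_fixed_point(P, tol=1e-10, max_iter=1000):
    """Naive iteration x <- P x^2; x stays stochastic if P is."""
    n = P.shape[0]
    x = np.ones(n) / n
    for _ in range(max_iter):
        x_next = apply_Pxx(P, x)
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x   # may not have converged; uniqueness/convergence are subtle
```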


Some small quick notes

A stochastic matrix M is a Markov chain. A stochastic hypermatrix / tensor / probability table P is a higher-order Markov chain.



PageRank to the rescue! What if we looked at these approximate stationary distributions of a PageRank-modified higher-order chain?

Multilinear PageRank

•  Formally, the Li & Ng approximate stationary distribution of the PageRank-modified higher-order Markov chain

•  Guaranteed existence! •  Fast convergence? •  Uniqueness? Both hold when $\alpha < 1/\text{order}$.

$x = \alpha\, P x^2 + (1 - \alpha)\, v$

Multilinear PageRank. Gleich, Lim, Yu, arXiv:1409.1465.
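The corresponding fixed-point sketch, self-contained and with an arbitrary $\alpha = 0.3$ chosen safely below the $1/\text{order}$ threshold for an order-3 tensor (again illustrative, not the paper's code):

```python
import numpy as np

def multilinear_pagerank(P, v, alpha=0.3, tol=1e-10, max_iter=1000):
    """Fixed-point iteration for x = alpha * P x^2 + (1 - alpha) * v."""
    x = v.copy()
    for _ in range(max_iter):
        Pxx = np.einsum('ijk,j,k->i', P, x, x)   # (P x^2)_i
        x_next = alpha * Pxx + (1 - alpha) * v
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x
```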


One nagging question … Is there a stochastic process that underlies this approximation?


Meanwhile … Spectral clustering of tensors

Austin Benson (a colleague) asked if there were any interesting methods to “cluster” tensors.

“Recall” spectral clustering on graphs:

SIAM Data Mining 2015, arXiv:1502.05058

graph → random walk → second eigenvector → sweep cut partition


$M^T y = \lambda_2 y$

$\min_S\ \phi(S) = \min_S\ \frac{\#(\text{edges cut})}{\min(\mathrm{vol}(S),\ \mathrm{vol}(\bar{S}))}$
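To recall what this matrix pipeline looks like in code, here is a minimal sweep-cut sketch for an undirected graph: take the second eigenvector of the random-walk matrix, then scan prefixes of the sorted vertices for the best conductance. The adjacency matrix and all names are illustrative assumptions.

```python
import numpy as np

def sweep_cut(A):
    """Spectral sweep cut on a symmetric 0/1 adjacency matrix A."""
    n = A.shape[0]
    d = A.sum(axis=0).astype(float)
    M = A / d                                      # column-stochastic walk matrix
    w, V = np.linalg.eig(M.T)                      # M^T y = lambda y
    y = np.real(V[:, np.argsort(-np.real(w))[1]])  # second eigenvector
    order = np.argsort(y)                          # sweep order
    best_S, best_phi = None, np.inf
    in_S = np.zeros(n, dtype=bool)
    for u in order[:-1]:                           # prefixes S of the ordering
        in_S[u] = True
        cut = A[in_S][:, ~in_S].sum()              # edges leaving S
        vol = d[in_S].sum()
        phi = cut / min(vol, d.sum() - vol)        # conductance of S
        if phi < best_phi:
            best_phi, best_S = phi, in_S.copy()
    return best_S, best_phi
```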

Meanwhile … Spectral clustering of tensors

Austin Benson (a colleague) asked if there were any interesting methods to “cluster” tensors.

“Conjecture” spectral clustering on tensors:

SIAM Data Mining 2015, arXiv:1502.05058

graph/tensor → higher-order random walk → second eigenvector → sweep cut partition

??????


We tried many ideas for the second eigenvector that were a priori good and retrospectively bad.


Austin and I were talking one day …

... about the problem of the process. (He was using Multilinear PageRank as the “first” eigenvector.) He observed that one of the five algorithms for multilinear PageRank uses a sequence of Markov chains. Is there some way to turn this into a random walk?

$x^{(k+1)} = $ the stationary distribution of a Markov chain built from $\alpha$, $v$, $P$, and $x^{(k)}$


EUREKA!


The spacey random walk. Consider a higher-order Markov chain. If we were perfect, we'd figure out the stationary distribution of that. But we are spacey!
•  On arriving at state j, we promptly “space out” and forget we came from k.
•  But we still believe we are “higher-order.”
•  So we invent a state k by drawing a random state from our history.

$P(X_{t+1} = i \mid \text{history}) = P(X_{t+1} = i \mid X_t = j,\ X_{t-1} = k)$


The spacey random walk

This is a vertex-reinforced random walk (e.g., Pólya's urn). Pemantle, 1992; Benaïm, 1997; Pemantle, 2007.


Let $C_t(k) = 1 + \sum_{s=1}^{t} \mathrm{Ind}\{X_s = k\}$ count how often we have visited state $k$ in the past. Then

$P(X_{t+1} = i \mid X_t = j \text{ and the right filtration on the history}) = \sum_k P_{i,j,k}\, \frac{C_t(k)}{t + n}$
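The process is easy to simulate. A minimal sketch (illustrative names, not the authors' code): keep the visit counts $C_t(k)$, draw the forgotten state $k$ from the smoothed counts, then step using column $P[:, j, k]$.

```python
import numpy as np

def spacey_walk(P, steps=100_000, seed=0):
    """Simulate a spacey random walk on an order-3 transition tensor P."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    counts = np.ones(n)                 # C_t(k) = 1 + (number of visits to k)
    j = rng.integers(n)                 # arbitrary start state
    for _ in range(steps):
        k = rng.choice(n, p=counts / counts.sum())  # invent the "last" state
        j = rng.choice(n, p=P[:, j, k])             # step using P[:, j, k]
        counts[j] += 1
    return counts / counts.sum()        # empirical occupation frequencies
```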

Stationary distributions of vertex-reinforced random walks. A vertex-reinforced random walk at time t transitions according to a Markov matrix M determined by the observed visit frequencies. It has a stationary distribution iff the associated dynamical system converges.


$\frac{dx}{dt} = \pi[M(x)] - x$

$P(X_{t+1} = i \mid X_t = j \text{ and the right filtration on the history}) = [M(t)]_{i,j} = [M(c(t))]_{i,j}$, where $\pi[M]$ is the map from a Markov matrix to its stationary distribution.

M. Benaïm, 1997.
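A minimal sketch of evolving this dynamical system by forward Euler, with $\pi[M]$ computed from the leading eigenvector of $M$; the step size, iteration count, and all names are my assumptions, not the authors' code.

```python
import numpy as np

def stationary(M):
    """pi[M]: stationary distribution of a column-stochastic matrix M."""
    w, V = np.linalg.eig(M)
    pi = np.real(V[:, np.argmax(np.real(w))])
    return pi / pi.sum()    # assumes the stationary distribution is unique

def integrate_ode(P, h=0.1, steps=500):
    """Forward Euler on dx/dt = pi[M(x)] - x for a spacey walk on tensor P."""
    n = P.shape[0]
    x = np.ones(n) / n
    for _ in range(steps):
        M = np.einsum('ijk,k->ij', P, x)    # M(x) = sum_k P[:, :, k] x_k
        x = x + h * (stationary(M) - x)     # Euler step along the vector field
    return x
```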

The Markov matrix for Spacey Random Walks. A necessary condition for a stationary distribution (otherwise it makes no sense):


Property B. Let $P$ be an order-$m$, $n$-dimensional probability table. Then $P$ has property B if there is a unique stationary distribution associated with all stochastic combinations of the last $m-2$ modes. That is, $M = \sum_{k,\ell,\dots} P(:,:,k,\ell,\dots)\, \lambda_{k,\ell,\dots}$ defines a Markov chain with a unique Perron root when all the $\lambda$s are positive and sum to one.

$\frac{dx}{dt} = \pi[M(x)] - x$ with $M = \sum_k P(:,:,k)\, x_k$

This is the transition probability associated with guessing the last state based on history!

We have all sorts of cool results on spacey random walks… e.g., suppose you have a Pólya urn with memory… Then it always has a stationary distribution!


Back to Multilinear PageRank. The Multilinear PageRank problem is what we call a spacey random surfer model.
•  This is a spacey random walk
•  We add random jumps with probability $(1 - \alpha)$
It's also a vertex-reinforced random walk. Thus, it has a stationary distribution if the dynamical system converges.


$\frac{dx}{dt} = \pi[M(x)] - x, \qquad M(x) = \alpha \sum_k P(:,:,k)\, x_k + (1 - \alpha)\, v$

Convergence occurs when $\alpha < 1/\text{order}$.

Some interesting notes about vertex-reinforced random walks
•  The power method is NOT the natural algorithm! The natural algorithm is to evolve the ODE.
•  It's unclear if there are any structural properties that guarantee a stationary distribution (except for something like the Multilinear PageRank equation).
•  It can be tough to analyze the resulting ODEs.
•  Asymptotically, the process creates a Markov chain!


… back to spectral clustering …



Meanwhile … Spectral clustering of tensors (SIAM Data Mining 2015, arXiv:1502.05058)

The conjectured pipeline again: graph/tensor → higher-order random walk → second eigenvector → sweep cut partition


$M(x)^T y = \lambda_2 y$

Use the asymptotic Markov matrix!
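In code, the fix is small (a sketch under the same assumptions as the earlier snippets): evaluate $M(x)$ at the limiting vector $x^*$ from multilinear PageRank or the ODE, and take the second eigenvector of $M(x^*)^T$ in place of $M^T$.

```python
import numpy as np

def second_eigenvector(P, x_star):
    """Second eigenvector of M(x*)^T, where M(x) = sum_k P[:, :, k] * x_k."""
    M = np.einsum('ijk,k->ij', P, x_star)       # asymptotic Markov matrix
    w, V = np.linalg.eig(M.T)
    idx = np.argsort(-np.real(w))[1]            # skip the Perron eigenvalue 1
    return np.real(V[:, idx])                   # feed this into the sweep cut
```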

Problem: current methods only consider edges … and that is not enough for current problems.


In social networks, we want to penalize cutting triangles more than cutting edges. The triangle motif represents stronger social ties.

Problem: current methods only consider edges.


[Figure: transcription network example with nodes SPT16, HO, CLN1, CLN2, SWI4_SWI6.]

In transcription networks, the “feedforward loop” motif represents biological function. Thus, we want to look for clusters of this structure.

An example with a layered flow network


[Figure: a 12-node network, nodes 0–11, arranged in layers.]

§  The network “flows” downward
§  Use directed 3-cycles to model flow

[Figure: directed 3-cycle motifs on nodes i, j, k, with weights 1, 1, 1, 2.]

§  Tensor spectral clustering: {0,1,2,3}, {4,5,6,7}, {8,9,10,11}
§  Standard spectral: {0,1,2,3,4,5,6,7}, {8,10,11}, {9}
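One plausible way to turn the 3-cycle idea into a transition tensor, as a sketch: count directed 3-cycles through each pair and normalize. The cycle orientation convention and the uniform fallback for empty columns are my assumptions; the SDM paper's actual construction differs in its details.

```python
import numpy as np

def cycle_tensor(A):
    """P[:, j, k] proportional to directed 3-cycles i -> j -> k -> i in A."""
    n = A.shape[0]
    T = np.einsum('ij,jk,ki->ijk', A, A, A).astype(float)  # cycle indicators
    s = T.sum(axis=0)                             # cycle mass per pair (j, k)
    P = np.where(s > 0, T / np.where(s > 0, s, 1.0), 1.0 / n)
    return P                                      # each column is stochastic
```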


WAW2015, EURANDOM, Eindhoven, Netherlands

Workshop on Algorithms and Models for the Web Graph (but it's grown to cover all types of network analysis), December 10–11

Winter School on Complex Network and Graph Models, December 7–8

Submissions due July 25th!

Time for Lots of Questions!

Manuscripts
•  Li, Ng. On the limiting probability distribution of a transition probability tensor. Linear & Multilinear Algebra, 2013.
•  Gleich. PageRank beyond the Web. arXiv:1407.5107 (accepted at SIAM Review).
•  Gleich, Lim, Yu. Multilinear PageRank. arXiv:1409.1465 (under review).
•  Benson, Gleich, Leskovec. Tensor Spectral Clustering for partitioning higher order network structures. SDM 2015, arXiv:1502.05058. https://github.com/arbenson/tensor-sc
•  Benson, Gleich, Leskovec. Forthcoming. (Much better method…)
•  Benson, Gleich, Lim. The Spacey Random Walk. In prep.
