Relaxation methods for the matrix exponential on large networks


DESCRIPTION

My talk from the Stanford ICME seminar series on network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.

TRANSCRIPT

Page 1: Relaxation methods for the matrix exponential on large networks

Coordinate descent methods for the matrix exponential on large networks

David F. Gleich, Purdue University

Joint work with Kyle Kloster @ Purdue, supported by NSF CAREER 1149756-CCF

Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit

ICME David Gleich · Purdue 1

Page 2: Relaxation methods for the matrix exponential on large networks

This talk

x = exp(P) e_c

x — the solution: localized

P — the matrix: large, sparse, stochastic

e_c — the column

ICME David Gleich · Purdue 2

Page 3: Relaxation methods for the matrix exponential on large networks

Localized solutions

[Figure: plot(x) for x = exp(P) e_c with length(x) = 513,969 and nnz(x) = 513,969, alongside log-log plots of error versus number of nonzeros retained: the error falls from 10^0 toward 10^-15 as the number of nonzeros grows from 10^0 to 10^6.]

ICME David Gleich · Purdue 3

Page 4: Relaxation methods for the matrix exponential on large networks

Our mission: Find the solution with work roughly proportional to the localization, not the matrix.

ICME David Gleich · Purdue 4

Page 5: Relaxation methods for the matrix exponential on large networks

Our algorithm: www.cs.purdue.edu/homes/dgleich/codes/nexpokit

[Figure: the same log-log plots of error versus number of nonzeros as on the previous slide, now showing the approximations produced by our algorithm.]

ICME David Gleich · Purdue 5

Page 6: Relaxation methods for the matrix exponential on large networks

Outline

1. Motivation and setup
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments

ICME David Gleich · Purdue 6

Page 7: Relaxation methods for the matrix exponential on large networks

Models and algorithms for high performance matrix and network computations

ICME David Gleich · Purdue 7

[This slide is a collage of figures and excerpts from the speaker's prior work; the recoverable items are summarized below.]

Simulation data analysis: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12. (Includes an excerpt from Constantine, Gleich, Hou, and Templeton on reduced-order models for heat-transfer simulation data on Hadoop, where evaluating the ROM was about 50 times faster than computing a full simulation.)

Network alignment, and tensor eigenvalues and a power method: ICDM '09, SC '11, TKDE '13. Previous work tackled network alignment with matrix methods for edge overlap; the tensor approach matches triangles: if x_i, x_j, x_k are indicators for the edges (i,i'), (j,j'), and (k,k'), the triangle-overlap term adds the product x_i x_j x_k to the objective, yielding the tensor problem

maximize \sum_{ijk} T_{ijk} x_i x_j x_k subject to \|x\|_2 = 1,

solved with the SSHOPM method due to Kolda and Mayo via the iteration [x^{(next)}]_i = \rho \cdot (\sum_{jk} T_{ijk} x_j x_k + \gamma x_i), where \rho ensures the 2-norm constraint. Human protein interaction networks have 48,228 triangles; yeast protein interaction networks have 257,978 triangles; the tensor T has ~100,000,000,000 nonzeros, and we work with it implicitly.

Fast & scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, …

Data clustering: WSDM '12, KDD '12, CIKM '13, …

Massive matrix computations (Ax = b, min \|Ax - b\|, Ax = \lambda x) on multi-threaded and distributed architectures.

Page 8: Relaxation methods for the matrix exponential on large networks

Matrix exponentials

exp(A) is defined as \sum_{k=0}^{\infty} \frac{1}{k!} A^k, where A is n \times n, real. The series always converges; this is a special case of a function of a matrix.

Evolution operator for an ODE: \frac{dx}{dt} = A x(t) \Rightarrow x(t) = \exp(tA) x(0)

ICME David Gleich · Purdue 8
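As a quick sanity check (my addition, not from the talk), the truncated series can be compared against MATLAB's built-in expm; the matrix A and order N below are arbitrary choices:

A = [0 1; 1 0];            % a small symmetric matrix
N = 20;                    % truncation order
E = eye(2); term = eye(2);
for k = 1:N
    term = term * A / k;   % term holds A^k / k!
    E = E + term;
end
norm(E - expm(A))          % ~1e-16; the series converges for any square A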

Page 9: Relaxation methods for the matrix exponential on large networks

[This slide shows the first pages of two papers: Moler and Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix," SIAM Review 20(4), 1978, pp. 801–836, and Moler and Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later," SIAM Review 45(1), 2003, pp. 3–49. From the abstract: "In principle, the exponential of a matrix could be computed in many ways. Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed. In practice, consideration of computational stability and efficiency indicates that some of the methods are preferable to others but that none are completely satisfactory."]

ICME David Gleich · Purdue 9

Page 10: Relaxation methods for the matrix exponential on large networks

Matrix exponentials on large networks

exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k — If A is the adjacency matrix, then A^k counts the number of length-k paths between node pairs. [Estrada 2000; Farahat et al. 2002, 2006] Large entries denote important nodes or edges. Used for link prediction and centrality.

exp(P) = \sum_{k=0}^{\infty} \frac{1}{k!} P^k — If P is a transition matrix, then P^k gives the probability of a length-k walk between node pairs. [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007] Used for link prediction, kernels, and clustering or community detection.

ICME David Gleich · Purdue 10
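A tiny illustration (my addition): for the 3-node path graph, powers of the adjacency matrix count the walks directly.

A = [0 1 0; 1 0 1; 0 1 0];   % path graph 1-2-3
A2 = A^2;                    % A2(1,3) = 1: one length-2 walk from node 1 to node 3
A3 = A^3;                    % A3(1,2) = 2: two length-3 walks from node 1 to node 2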

Page 11: Relaxation methods for the matrix exponential on large networks

Another useful matrix exponential

P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix. If A is symmetric:

exp(P^T) = exp(D^{-1} A) = D^{-1} exp(A D^{-1}) D = D^{-1} exp(P) D

ICME David Gleich · Purdue 11

Page 12: Relaxation methods for the matrix exponential on large networks

Another useful matrix exponential

P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix. If A is symmetric, then for the negative normalized Laplacian:

exp(-L) = exp(D^{-1/2} A D^{-1/2} - I) = \frac{1}{e} exp(D^{-1/2} A D^{-1/2}) = \frac{1}{e} D^{-1/2} exp(A D^{-1}) D^{1/2} = \frac{1}{e} D^{-1/2} exp(P) D^{1/2}

ICME David Gleich · Purdue 12
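These identities on this and the previous slide are easy to verify numerically; here is a sketch (my addition) on a toy undirected graph, where the graph itself is an arbitrary choice:

A = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];      % small undirected graph
d = sum(A,2); D = diag(d);
P = A / D;                                      % P = A*D^{-1}; column stochastic since A = A'
norm(expm(D\A) - D \ (expm(P)*D))               % exp(P') = D^{-1} exp(P) D; ~1e-15
L = eye(4) - diag(d.^-0.5) * A * diag(d.^-0.5); % normalized Laplacian
norm(expm(-L) - exp(-1) * diag(d.^-0.5) * expm(P) * diag(d.^0.5))  % ~1e-15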

Page 13: Relaxation methods for the matrix exponential on large networks

Matrix exponentials on large networks: Is a single column interesting? Yes!

exp(P) e_c = \sum_{k=0}^{\infty} \frac{1}{k!} P^k e_c gives link prediction scores for node c and a community relative to node c.

But … modern networks are large, ~O(10^9) nodes, sparse, ~O(10^11) edges, and constantly changing … and so we'd like speed over accuracy.

ICME David Gleich · Purdue 13

Page 14: Relaxation methods for the matrix exponential on large networks

The issue with existing methods

We want good results in less than one matvec. Our graphs have small diameter and fast fill-in.

Krylov methods: a few matvecs, quick loss of sparsity due to orthogonality. exp(P) e_c \approx \rho V exp(H) e_1 [Sidje 1998, ExpoKit]

Direct expansion: a few matvecs, quick loss of sparsity due to fill-in. exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c

ICME David Gleich · Purdue 14

Page 15: Relaxation methods for the matrix exponential on large networks

Outline

1. Motivation and setup
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments

ICME David Gleich · Purdue 15

Page 16: Relaxation methods for the matrix exponential on large networks

Our underlying method

Direct expansion: a few matvecs, quick loss of sparsity due to fill-in. This method is stable for stochastic P — no cancellation, no unbounded norms, etc.

x = exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

Lemma: \|x - x_N\|_1 \le \frac{1}{N!\,N}

ICME David Gleich · Purdue 16
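A sketch (my addition) of the truncated expansion using only sparse matvecs, never forming exp(P); the toy graph is an arbitrary choice and N = 11 matches the code later in the talk:

A = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];
P = sparse(A * diag(1 ./ sum(A,1)));     % column-stochastic transition matrix
n = size(P,1); N = 11;
ec = zeros(n,1); ec(1) = 1;              % e_c for node c = 1
v = ec; x = ec;                          % v holds P^k e_c / k!; x accumulates x_N
for k = 1:N
    v = (P * v) / k;
    x = x + v;
end
norm(x - expm(full(P)) * ec, 1)          % <= 1/(N!*N) ~ 2.3e-9, per the lemma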

Page 17: Relaxation methods for the matrix exponential on large networks

Our underlying method as a linear system

Direct expansion:

x = exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N

\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i

Equivalently, (I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c.

Lemma: we approximate x_N well if we approximate v well.

ICME David Gleich · Purdue 17
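For intuition (my addition), the Kronecker system can be formed explicitly on a toy graph — the algorithm never does this — and checked against expm; here S_N carries 1/k on its subdiagonal:

A = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];
n = size(A,1); N = 11; c = 1;
P = A * diag(1 ./ sum(A,1));              % column stochastic
S = diag(1 ./ (1:N), -1);                 % (N+1)x(N+1), with S(k+1,k) = 1/k
M = eye(n*(N+1)) - kron(S, P);            % I (x) I_N - S_N (x) P
ec = zeros(n,1); ec(c) = 1;
b = kron(eye(N+1,1), ec);                 % e_1 (x) e_c
v = M \ b;                                % v = [v_0; v_1; ...; v_N]
xN = sum(reshape(v, n, N+1), 2);          % x_N = sum_i v_i
norm(xN - expm(P) * ec, 1)                % only the truncation error remains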

Page 18: Relaxation methods for the matrix exponential on large networks

Our mission (2): Approximately solve Ax = b when A and b are sparse and x is localized.

ICME David Gleich · Purdue 18

Page 19: Relaxation methods for the matrix exponential on large networks

Outline

1. Motivation and setup
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments

ICME David Gleich · Purdue 19

Page 20: Relaxation methods for the matrix exponential on large networks

Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods

ICME David Gleich · Purdue 20

Algebraically, for Ax = b:

r^{(k)} = b - A x^{(k)}

x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}

r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j

Procedurally:

Solve(A,b)
  x = sparse(size(A,1),1)
  r = b
  while (1)
    pick j where r(j) != 0
    z = r(j)
    x(j) = x(j) + z
    for i where A(i,j) != 0
      r(i) = r(i) - z*A(i,j)
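The loop above is easy to make runnable; here is a minimal MATLAB sketch (my addition), assuming diag(A) = I as on the justification slide, with a 1-norm stopping rule in place of the infinite loop:

function x = gs_solve(A, b, tol)
% Gauss-Southwell relaxation sketch for Ax = b with diag(A) = I.
% Convergence needs structure in A (e.g. the systems in this talk);
% it is not guaranteed for arbitrary A.
x = sparse(size(A,1), 1);
r = b;
while norm(r, 1) > tol
    [~, j] = max(abs(r));      % Gauss-Southwell: largest residual entry
    z = r(j);
    x(j) = x(j) + z;           % relax coordinate j
    r = r - z * A(:, j);       % vectorized form of the inner for-loop
end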

Page 21: Relaxation methods for the matrix exponential on large networks

It’s called the “push” method because of PageRank

(I - \alpha P) x = v

r^{(k)} = v - (I - \alpha P) x^{(k)}

x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}

"r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j" becomes

r_i^{(k+1)} = 0 if i = j; r_i^{(k)} + \alpha P_{i,j} r_j^{(k)} if P_{i,j} \ne 0; r_i^{(k)} otherwise.

PageRankPush(links,v,alpha)
  x = sparse(size(A,1),1)
  r = v
  while (1)
    pick j where r(j) != 0
    z = r(j)
    x(j) = x(j) + z
    r(j) = 0
    z = alpha * z / deg(j)
    for i where "j links to i"
      r(i) = r(i) + z

ICME David Gleich · Purdue 21
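A runnable version (my addition) of the push loop for a symmetric 0/1 adjacency matrix, with the \alpha factor made explicit as in the cases formula above:

function x = pagerank_push(A, v, alpha, tol)
% Push method for (I - alpha*P) x = v with P = A*D^{-1}.
n = size(A,1); deg = full(sum(A,1))';
x = zeros(n,1); r = v;
[rmax, j] = max(r);
while rmax > tol
    z = r(j);
    x(j) = x(j) + z;                         % move mass into the solution
    r(j) = 0;                                % zero the residual at j
    nbrs = find(A(:,j));
    r(nbrs) = r(nbrs) + alpha * z / deg(j);  % spread alpha*z over j's neighbors
    [rmax, j] = max(r);
end

For example, x = pagerank_push(A, ones(n,1)/n, 0.85, 1e-8); scaling the output by (1 - alpha) recovers the usual PageRank normalization x = (1-\alpha)(I - \alpha P)^{-1} v.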

Page 22: Relaxation methods for the matrix exponential on large networks

It’s called the “push” method because of PageRank

ICME David Gleich · Purdue 22

Demo

Page 23: Relaxation methods for the matrix exponential on large networks

Justification of terminology

This method is frequently “rediscovered” (3 times for PageRank!)

Let Ax = b with diag(A) = I.

It's Gauss-Seidel if j is chosen cyclically.
It's Gauss-Southwell if j is the largest entry in the residual.
It's coordinate descent if A is symmetric, pos. definite.
It's a relaxation step for any A.

Works great for other problems too! [Bonchi, Gleich, et al. J. Internet Math. 2012]

ICME David Gleich · Purdue 23

Page 24: Relaxation methods for the matrix exponential on large networks

Back to the exponential

ICME David Gleich · Purdue 24

\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i

Equivalently, (I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c.

Solve this system via the same method.
Optimization 1: build the system implicitly.
Optimization 2: don't store the v_i, just store the sum x_N.

Page 25: Relaxation methods for the matrix exponential on large networks

Code (inefficient, but working) for Gauss-Southwell to solve the linear system from slide 17:

function x = nexpm(P,c,tol)
n = size(P,1); N = 11; sumr = 1;
r = zeros(n,N+1); r(c,1) = 1;           % the residual ...
x = zeros(n,1);                         % ... and the solution
while sumr >= tol                       % use a max iteration count too
  [ml,q] = max(r(:));                   % use a heap in practice for the max
  i = mod(q-1,n)+1; k = ceil(q/n);
  r(q) = 0; x(i) = x(i) + ml;           % zero the residual, add to solution
  sumr = sumr - ml;
  [nset,~,vals] = find(P(:,i));         % look up the neighbors of node i
  ml = ml/k;
  for j = 1:numel(nset)                 % for all neighbors
    if k == N
      x(nset(j)) = x(nset(j)) + vals(j)*ml;          % add to solution
    else
      r(nset(j),k+1) = r(nset(j),k+1) + vals(j)*ml;  % or add to next residual
      sumr = sumr + vals(j)*ml;
    end
  end
end

Todo: use a dictionary for x, r and use a heap or queue for the residual.

ICME David Gleich · Purdue 25
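Example usage (my addition) of the routine on a toy graph; the expm comparison is only feasible at this small scale:

A = [0 1 1 0; 1 0 1 0; 1 1 0 1; 0 0 1 0];
P = sparse(A * diag(1 ./ sum(A,1)));     % column-stochastic transition matrix
x = nexpm(P, 1, 1e-5);                   % approximate column 1 of exp(P)
ec = [1;0;0;0];
norm(x - expm(full(P)) * ec, 1)          % roughly tol plus the truncation error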

Page 26: Relaxation methods for the matrix exponential on large networks

Outline

1. Motivation and setup
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments

ICME David Gleich · Purdue 26

✓ ✓

Page 27: Relaxation methods for the matrix exponential on large networks

Error analysis for Gauss-Southwell

ICME David Gleich · Purdue 27

Theorem. Assume P is column-stochastic and v^{(0)} = 0 in the system (I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c. Then:

(Nonnegativity — the "easy" part) The iterates and residuals are nonnegative: v^{(\ell)} \ge 0 and r^{(\ell)} \ge 0.

(Convergence — the "annoying" part) The residual goes to 0:

\|r^{(\ell)}\|_1 \le \prod_{k=1}^{\ell} \left(1 - \frac{1}{2dk}\right) \le \ell^{-1/(2d)}

where d is the largest degree.

Page 28: Relaxation methods for the matrix exponential on large networks

Proof sketch

Gauss-Southwell picks the largest residual
⇒ bound the update using the average number of nonzeros in the residual (sloppy)
⇒ algebraic convergence with a slow rate, but each update is REALLY fast: O(d_max log n).

If d is log log n, then our method runs in sub-linear time (but so does just about anything)
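The step from the product to the power is one line (my addition, not on the slide): since 1 - t \le e^{-t} and \sum_{k=1}^{\ell} 1/k \ge \ln \ell,

\prod_{k=1}^{\ell} \left(1 - \frac{1}{2dk}\right) \le \exp\left(-\frac{1}{2d} \sum_{k=1}^{\ell} \frac{1}{k}\right) \le \exp\left(-\frac{\ln \ell}{2d}\right) = \ell^{-1/(2d)}.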

ICME David Gleich · Purdue 28

Page 29: Relaxation methods for the matrix exponential on large networks

Overall error analysis

ICME David Gleich · Purdue 29

Components: truncation to N terms; residual to error; approximate solve.

Theorem. After ℓ steps of Gauss-Southwell,

\|x_N^{(\ell)} - x\|_1 \le \frac{1}{N!\,N} + \frac{1}{e} \cdot \ell^{-1/(2d)}

Page 30: Relaxation methods for the matrix exponential on large networks

Outline

1. Motivation and setup
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments

ICME David Gleich · Purdue 30

✓ ✓

Page 31: Relaxation methods for the matrix exponential on large networks

Our implementations

A C++ mex implementation with a heap to implement Gauss-Southwell.

A C++ mex implementation with a queue to store all residual entries ≥ 1/(tol nN).

At completion, the residual norm ≤ tol. We use the queue except for the runtime comparison.

ICME David Gleich · Purdue 31

Page 32: Relaxation methods for the matrix exponential on large networks

Accuracy vs. tolerance

ICME David Gleich · Purdue 32

[Figure: boxplot of precision at 100 versus log10 of residual tolerance (from -2 to -7) for the pgp-cc social graph, 10k vertices.]

For the pgp social graph, we study the precision in finding the 100 largest nodes as we vary the tolerance. This set of 100 does not include the node's immediate neighbors. (Boxplot over 50 trials)

Page 33: Relaxation methods for the matrix exponential on large networks

Accuracy vs. work

ICME David Gleich · Purdue 33

For the dblp collaboration graph (dblp-cc, 225k vertices), we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node's immediate neighbors. (One column, but representative)

[Figure: precision at 10, 25, 100, and 1000 versus effective matrix-vector products (10^-2 to 10^0), for tolerances 10^-4 and 10^-5.]

Page 34: Relaxation methods for the matrix exponential on large networks

Runtime

ICME David Gleich · Purdue 34

[Figure: runtime in seconds (10^-4 to 10^0) versus |E| + |V| (10^3 to 10^6) for TSGS, TSGSQ, EXPV, MEXPV, and TAYLOR on the Flickr social network, 500k nodes, 5M edges.]

Page 35: Relaxation methods for the matrix exponential on large networks

Outline

1. Motivation and setup
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments

ICME David Gleich · Purdue 35

✓ ✓

✓ ✓

Page 36: Relaxation methods for the matrix exponential on large networks

References and ongoing work

Kloster and Gleich. Workshop on Algorithms for the Web-graph, 2013 (forthcoming).
www.cs.purdue.edu/homes/dgleich/codes/nexpokit

• Error analysis using the queue
• Better linear systems for faster convergence
• Asynchronous coordinate descent methods
• Scaling up to billion-node graphs
• More explicit localization in algorithms

ICME David Gleich · Purdue 36