Relaxation methods for the matrix exponential on large networks
DESCRIPTION
My talk from the Stanford ICME seminar series on network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
TRANSCRIPT
Coordinate descent methods for the matrix exponential on large networks
David F. Gleich, Purdue University
Joint work with Kyle Kloster @ Purdue, supported by NSF CAREER 1149756-CCF
Code: www.cs.purdue.edu/homes/dgleich/codes/nexpokit
ICME David Gleich · Purdue 1
This talk
x = exp(P) e_c
x: the solution (localized)
P: the matrix (large, sparse, stochastic)
e_c: the column indicator for node c
ICME David Gleich · Purdue 2
Localized solutions
[Figure: plot(x) for x = exp(P) e_c, where length(x) = 513,969 and nnz(x) = 513,969; a log-log plot of error versus number of nonzeros retained shows the error falling from 10^0 to 10^-15.]
ICME David Gleich · Purdue 3
Our mission: Find the solution with work roughly proportional to the localization, not the matrix.
ICME David Gleich · Purdue 4
Our algorithm: www.cs.purdue.edu/homes/dgleich/codes/nexpokit
[Figure: log-log plot of error versus number of nonzeros for our algorithm, with the error falling from 10^0 to 10^-15.]
ICME David Gleich · Purdue 5
Outline
1. Motivation and setup
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments
ICME David Gleich · Purdue 6
Models and algorithms for high performance matrix and network computations
ICME David Gleich · Purdue 7
[Excerpt: P. G. Constantine, D. F. Gleich, Y. Hou, and J. Templeton, p. 18. Figure 4.5: Error in the reduced order model compared to the prediction standard deviation for one realization of the bubble locations at the final time, for two values of the bubble radius, s = 0.39 cm and s = 1.95 cm. Panels: (a) Error, s = 0.39 cm; (b) Std, s = 0.39 cm; (c) Error, s = 1.95 cm; (d) Std, s = 1.95 cm. (Colors are visible in the electronic version.)]
the varying conductivity fields took approximately twenty minutes to construct using Cubit after substantial optimizations.
Working with the simulation data involved a few pre- and post-processing steps: interpret 4TB of Exodus II files from Aria, globally transpose the data, compute the TSSVD, and compute predictions and errors. The preprocessing steps took approximately 8-15 hours. We collected precise timing information, but we do not report it as these times are from a multi-tenant, unoptimized Hadoop cluster where other jobs with sizes ranging between 100GB and 2TB of data sometimes ran concurrently. Also, during our computations, we observed failures in hard disk drives and issues causing entire nodes to fail. Given that the cluster has 40 cores, there was at most 2400 cpu-hours consumed via these calculations, compared to the 131,072 hours it took to compute 4096 heat transfer simulations on Red Sky. Thus, evaluating the ROM was about 50-times faster than computing a full simulation.
We used 20,000 reducers to convert the Exodus II simulation data. This choice determined how many map tasks each subsequent step utilized, around 33,000. We also found it advantageous to store matrices in blocks of about 16MB per record. The reduction in the data enabled us to use a laptop to compute the coefficients of the ROM and apply to the far face for the UQ study in Section 4.4.
Here are a few pertinent challenges we encountered while performing this study. Generating 8192 meshes with different material properties and running independent
Tensor eigenvalues and a power method
Tensor methods for network alignment
Network alignment is the problem of computing an approximate isomorphism between two networks. In collaboration with Mohsen Bayati, Amin Saberi, Ying Wang, and Margot Gerritsen, the PI has developed a state of the art belief propagation method (Bayati et al., 2009).
[Figure 6: Previous work from the PI tackled network alignment with matrix methods for edge overlap: edges (i, i') and (j, j') matched across graphs A and B through the matching L. This proposal is for matching triangles using tensor methods: edges (i, i'), (j, j'), and (k, k'). If x_i, x_j, and x_k are indicators associated with the edges (i, i'), (j, j'), and (k, k'), then we want to include the product x_i x_j x_k in the objective, yielding a tensor problem.]
We propose to study tensor methods to perform network alignment with triangle and other higher-order graph moment matching. Similar ideas were proposed by Svab (2007); Chertok and Keller (2010) also proposed using triangles to aid in network alignment problems. In Bayati et al. (2011), we found that triangles were a key missing component in a network alignment problem with a known solution. Given that preserving a triangle requires three edges between two graphs, this yields a tensor problem:
maximize \sum_{i \in L} w_i x_i + \sum_{i \in L} \sum_{j \in L} x_i x_j S_{i,j} + \underbrace{\sum_{i \in L} \sum_{j \in L} \sum_{k \in L} x_i x_j x_k T_{i,j,k}}_{\text{triangle overlap term}}
subject to x is a matching.
Here, T_{i,j,k} = 1 when the edges corresponding to i, j, and k in L result in a triangle in the induced matching. Maximizing this objective is an intractable problem. We plan to investigate a heuristic based on a rank-1 approximation of the tensor T and using a maximum-weight matching based rounding. Similar heuristics have been useful in other matrix-based network alignment algorithms (Singh et al., 2007; Bayati et al., 2009). The work involves enhancing the Symmetric-Shifted-Higher-Order Power Method due to Kolda and Mayo (2011) to incredibly large and sparse tensors. On this aspect, we plan to collaborate with Tamara G. Kolda. In an initial evaluation of this triangle matching on synthetic problems, using the tensor rank-1 approximation alone produced results that identified the correct solution whereas all matrix approaches could not.
vision for the future
All of these projects fit into the PI's vision for modernizing the matrix-computation paradigm to match the rapidly evolving space of network computations. This vision extends beyond the scope of the current proposal. For example, the web is a huge network with over one trillion unique URLs (Alpert and Hajaj, 2008), and search engines have indexed over 180 billion of them (Cuil, 2009). Yet, why do we need to compute with the entire network? By way of analogy, note that we do not often solve partial differential equations or model macro-scale physics by explicitly simulating the motion or interaction of elementary particles. We need something equivalent for the web and other large networks. Such investigations may take many forms: network models, network geometry, or network model reduction. It is the vision of the PI that the language, algebra, and methodology of matrix computations will
maximize \sum_{ijk} T_{ijk} x_i x_j x_k subject to \|x\|_2 = 1
Human protein interaction networks: 48,228 triangles. Yeast protein interaction networks: 257,978 triangles. The tensor T has ~100,000,000,000 nonzeros. We work with it implicitly.
[x^{(\text{next})}]_i = \rho \cdot \left( \sum_{jk} T_{ijk} x_j x_k + \gamma x_i \right), where \rho ensures the 2-norm.
SSHOPM method due to Kolda and Mayo.
Simulation data analysis: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12
Network alignment: ICDM '09, SC '11, TKDE '13
Fast & scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, ...
Data clustering: WSDM '12, KDD '12, CIKM '13, ...
Massive matrix computations on multi-threaded and distributed architectures: Ax = b, min \|Ax - b\|, Ax = \lambda x
Matrix exponentials
exp(A) is defined as \sum_{k=0}^{\infty} \frac{1}{k!} A^k; the series always converges. It is a special case of a function of a matrix.
\frac{dx}{dt} = A x(t) \iff x(t) = \exp(tA) x(0): the evolution operator for an ODE. Here A is n \times n, real.
ICME David Gleich · Purdue 8
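To make the definition concrete, here is a minimal MATLAB sketch (my own illustration, not from the slides; the test matrix and truncation length are arbitrary choices) checking the power series against expm and the ODE claim:

% Minimal sketch (assumed example, not from the talk): check the power
% series definition of exp(A) against expm, and the claim that
% x(t) = exp(tA) x(0) solves dx/dt = A x(t).
A = [0 1; -1 0];                  % arbitrary small real matrix
E = eye(2); term = eye(2);
for k = 1:25                      % truncate the always-convergent series
    term = term * A / k;          % term is now A^k / k!
    E = E + term;
end
fprintf('series vs expm: %g\n', norm(E - expm(A)));
x0 = [1; 0]; T = 2;
[~, X] = ode45(@(t,x) A*x, [0 T], x0);   % integrate the ODE numerically
fprintf('ODE vs expm:    %g\n', norm(X(end,:).' - expm(T*A)*x0));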
[Image: first page of the original article, SIAM Review, Vol. 20, No. 4, October 1978: "Nineteen Dubious Ways to Compute the Exponential of a Matrix" by Cleve Moler and Charles Van Loan; its abstract and introduction match the 2003 reprint quoted next.]
SIAM REVIEW, © 2003 Society for Industrial and Applied Mathematics, Vol. 45, No. 1, pp. 3-49
Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later
Cleve Moler† and Charles Van Loan‡
Abstract. In principle, the exponential of a matrix could be computed in many ways. Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed. In practice, consideration of computational stability and efficiency indicates that some of the methods are preferable to others but that none are completely satisfactory.
Most of this paper was originally published in 1978. An update, with a separate bibliography, describes a few recent developments.
Key words. matrix, exponential, roundoff error, truncation error, condition
AMS subject classifications. 15A15, 65F15, 65F30, 65L99
PII. S0036144502418010
1. Introduction. Mathematical models of many physical, biological, and economic processes involve systems of linear, constant coefficient ordinary differential equations
\dot{x}(t) = A x(t).
Here A is a given, fixed, real or complex n-by-n matrix. A solution vector x(t) is sought which satisfies an initial condition
x(0) = x_0.
In control theory, A is known as the state companion matrix and x(t) is the system response.
In principle, the solution is given by x(t) = e^{tA} x_0, where e^{tA} can be formally defined by the convergent power series
e^{tA} = I + tA + \frac{t^2 A^2}{2!} + \cdots.
The effective computation of this matrix function is the main topic of this survey.
[Footnotes: Published electronically February 3, 2003. A portion of this paper originally appeared in SIAM Review, Volume 20, Number 4, 1978, pages 801-836. http://www.siam.org/journals/sirev/45-1/41801.html. †The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 ([email protected]). ‡Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, NY 14853-7501.]
ICME David Gleich · Purdue 9
Matrix exponentials on large networks
exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k. If A is the adjacency matrix, then A^k counts the number of length-k paths between node pairs. [Estrada 2000; Farahat et al. 2002, 2006] Large entries denote important nodes or edges. Used for link prediction and centrality.
exp(P) = \sum_{k=0}^{\infty} \frac{1}{k!} P^k. If P is a transition matrix, then P^k gives the probability of a length-k walk between node pairs. [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007] Used for link prediction, kernels, and clustering or community detection.
ICME David Gleich · Purdue 10
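A tiny assumed example (not from the slides; the 4-node path graph is an arbitrary choice) of this interpretation in MATLAB:

% Assumed toy example (not from the talk): on a 4-node path graph,
% (A^k)(i,j) counts the length-k walks from j to i, and exp(A) e_c
% sums those counts with 1/k! weights.
A = sparse([0 1 0 0; 1 0 1 0; 0 1 0 1; 0 0 1 0]);  % path graph 1-2-3-4
full(A^3)                  % e.g., entry (1,2): number of length-3 walks 2 -> 1
c = 1; ec = zeros(4,1); ec(c) = 1;
x = expm(full(A)) * ec;    % centrality-style scores relative to node 1
disp(x.')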
Another useful matrix exponential
P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix (and D is the diagonal matrix of degrees). If A is symmetric, then
exp(P^T) = exp(D^{-1} A) = D^{-1} exp(A D^{-1}) D = D^{-1} exp(P) D
ICME David Gleich · Purdue 11
Another useful matrix exponential
P column stochastic, e.g. P = A^T D^{-1}, where A is the adjacency matrix. If A is symmetric, then for the negative normalized Laplacian:
exp(-L) = exp(D^{-1/2} A D^{-1/2} - I) = \frac{1}{e} exp(D^{-1/2} A D^{-1/2}) = \frac{1}{e} D^{-1/2} exp(A D^{-1}) D^{1/2} = \frac{1}{e} D^{-1/2} exp(P) D^{1/2}
ICME David Gleich · Purdue 12
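Both identities on this slide and the previous one are easy to check numerically. A small assumed MATLAB verification (the random symmetric graph is an arbitrary choice, not from the talk):

% Assumed numeric check (not from the talk) of the two identities:
% exp(P') = D^{-1} exp(P) D and exp(-L) = (1/e) D^{-1/2} exp(P) D^{1/2},
% for P = A' D^{-1} with A symmetric (so P = A D^{-1}).
n = 8;
A = double(rand(n) > 0.5); A = triu(A,1); A = A + A.';  % random symmetric adjacency
A = A + diag(double(sum(A,2) == 0));                    % self-loop for isolated nodes
d = sum(A,2); D = diag(d);
P = A * diag(1./d);                                     % column stochastic
fprintf('identity 1: %g\n', norm(expm(P.') - D \ expm(P) * D));
Dh = diag(1./sqrt(d));                                  % D^{-1/2}
L = eye(n) - Dh * A * Dh;                               % normalized Laplacian
fprintf('identity 2: %g\n', norm(expm(-L) - (1/exp(1)) * Dh * expm(P) / Dh));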
Matrix exponentials on large networks Is a single column interesting? Yes!
exp(P) e_c = \sum_{k=0}^{\infty} \frac{1}{k!} P^k e_c gives link prediction scores for node c and a community relative to node c.
But ... modern networks are large, ~O(10^9) nodes, sparse, ~O(10^11) edges, and constantly changing ... and so we'd like speed over accuracy.
ICME David Gleich · Purdue 13
The issue with existing methods
We want good results in less than one matvec. Our graphs have small diameter and fast fill-in.
Krylov methods: a few matvecs, quick loss of sparsity due to orthogonality. exp(P) e_c \approx \rho V \exp(H) e_1 [Sidje 1998] (ExpoKit).
Direct expansion: a few matvecs, quick loss of sparsity due to fill-in. exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c.
ICME David Gleich · Purdue 14
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) e_c into a linear system
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments
ICME David Gleich · Purdue 15
Our underlying method
Direct expansion: a few matvecs, quick loss of sparsity due to fill-in. This method is stable for stochastic P: no cancellation, no unbounded norms, etc.
ICME David Gleich · Purdue 16
x = exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N
Lemma. \|x - x_N\|_1 \le \frac{1}{N! \, N}
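A quick numeric sanity check of the lemma (an assumed example, not from the talk; the random stochastic P is arbitrary):

% Assumed check (not from the talk) of ||x - x_N||_1 <= 1/(N! N)
% for a small column-stochastic P.
n = 6; N = 5;
B = rand(n); P = B ./ sum(B,1);        % column stochastic
ec = zeros(n,1); ec(1) = 1;
x = expm(P) * ec;
xN = zeros(n,1); term = ec;
for k = 0:N
    xN = xN + term;                    % term = P^k ec / k!
    term = P * term / (k+1);
end
fprintf('error %g vs bound %g\n', norm(x - xN, 1), 1/(factorial(N)*N));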
Our underlying method as a linear system: direct expansion.
ICME David Gleich · Purdue 17
x = exp(P) e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N
\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i
Equivalently, (I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c, where S_N is the subdiagonal matrix with entries 1/1, ..., 1/N.
Lemma. We approximate x_N well if we approximate v well.
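A small sketch (assumed example, not from the talk) that builds this block system explicitly with kron on a tiny P and confirms that summing the v_i recovers x_N:

% Assumed sketch (not from the talk): form (I x I_N - S_N x P) directly
% and check that the sum of the v_i blocks matches exp(P) e_c up to
% the truncation error.
n = 5; N = 8;
B = rand(n); P = B ./ sum(B,1);                % small column-stochastic P
ec = zeros(n,1); ec(1) = 1;
S = sparse(2:N+1, 1:N, 1./(1:N), N+1, N+1);    % S_N: entry (k+1,k) is 1/k
M = speye((N+1)*n) - kron(S, sparse(P));       % I - (S_N kron P)
e1 = zeros(N+1,1); e1(1) = 1;
v = M \ kron(e1, ec);                          % solve the block system
xN = sum(reshape(v, n, N+1), 2);               % x_N = sum of the v_i
fprintf('x_N vs expm: %g\n', norm(xN - expm(P)*ec, 1));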
Our mission (2): Approximately solve Ax = b when A, b are sparse and x is localized.
ICME David Gleich · Purdue 18
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) e_c into a linear system ✓
3. Coordinate descent methods for linear systems from large networks
4. Error analysis
5. Experiments
ICME David Gleich · Purdue 19
Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods
ICME David Gleich · Purdue 20
Algebraically and procedurally:

function x = solve_relax(A,b,tol)
% Relaxation / "push" sweep; assumes diag(A) = I. (The slide calls this
% Solve(A,b); a tolerance is added so the loop terminates.)
x = sparse(size(A,1),1);
r = b;
while norm(r,1) > tol
    [~,j] = max(abs(r));         % pick j where r(j) ~= 0 (Gauss-Southwell choice)
    z = r(j);
    x(j) = x(j) + z;
    for i = find(A(:,j)).'       % for i where A(i,j) ~= 0
        r(i) = r(i) - z*A(i,j);  % note: this zeros r(j) since A(j,j) = 1
    end
end
Ax = b
r^{(k)} = b - A x^{(k)}
x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}
r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j
It’s called the “push” method because of PageRank
ICME David Gleich · Purdue 21
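A hypothetical usage of the solve_relax sketch above; the diagonally dominant test system is my assumption, chosen so the sweep converges:

% Hypothetical usage of solve_relax (assumed test system, not from the talk):
n = 100;
B = sprand(n, n, 0.05) * 0.1;        % weak off-diagonal couplings
B = B - spdiags(diag(B), 0, n, n);   % clear the diagonal
A = speye(n) + B;                    % diag(A) = I, diagonally dominant
b = zeros(n,1); b(1) = 1;
x = solve_relax(A, b, 1e-8);
fprintf('residual: %g, nnz(x) = %d\n', norm(b - A*x, 1), nnz(x));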
(I - \alpha P) x = v
r^{(k)} = v - (I - \alpha P) x^{(k)}
x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}
"r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j" becomes
r_i^{(k+1)} = \begin{cases} 0 & i = j \\ r_i^{(k)} + \alpha P_{i,j} r_j^{(k)} & P_{i,j} \ne 0 \\ r_i^{(k)} & \text{otherwise} \end{cases}
PageRankPush(links,v,alpha)
  x = sparse(numnodes,1)
  r = v
  While (1)
    Pick j where r(j) != 0
    z = r(j)
    x(j) = x(j) + z
    r(j) = 0
    z = alpha * z / deg(j)   % push alpha * r(j) / deg(j) to each out-neighbor
    For i where "j links to i"
      r(i) = r(i) + z
It’s called the “push” method because of PageRank
ICME David Gleich · Purdue 22
Demo
Justification of terminology
This method is frequently “rediscovered” (3 times for PageRank!)
Let Ax = b with diag(A) = I.
It's Gauss-Seidel if j is chosen cyclically.
It's Gauss-Southwell if j is the largest entry in the residual.
It's coordinate descent if A is symmetric, positive definite.
It's a relaxation step for any A.
Works great for other problems too! [Bonchi, Gleich, et al., J. Internet Math. 2012]
ICME David Gleich · Purdue 23
Back to the exponential
ICME David Gleich · Purdue 24
\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i
(I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c
Solve this system via the same method.
Optimization 1: build the system implicitly.
Optimization 2: don't store the v_i, just store the running sum x_N.
Code (inefficient, but working) for Gauss-Southwell to solve the system:

function x = nexpm(P,c,tol)
n = size(P,1); N = 11; sumr = 1;
r = zeros(n,N+1); r(c,1) = 1; x = zeros(n,1); % the residual and solution
while sumr >= tol                             % use max iteration too
  [ml,q] = max(r(:));
  i = mod(q-1,n)+1; k = ceil(q/n);            % use a heap in practice for max
  r(q) = 0; x(i) = x(i)+ml; sumr = sumr-ml;   % zero the residual, add to solution
  [nset,~,vals] = find(P(:,i));               % look up the neighbors of node i
  ml = ml/k;
  for j=1:numel(nset)                         % for all neighbors
    if k==N
      x(nset(j)) = x(nset(j)) + vals(j)*ml;   % add to solution
    else
      r(nset(j),k+1) = r(nset(j),k+1) + vals(j)*ml; % or add to next residual
      sumr = sumr + vals(j)*ml;
    end
  end
end
% Todo: use dictionary for x, r and use heap or queue for residual
ICME David Gleich · Purdue 25
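A hypothetical usage sketch for nexpm (the random test graph and tolerance are my assumptions, not from the talk), compared against a dense expm computation:

% Hypothetical usage of nexpm above (assumed small test graph):
n = 200;
B = sprand(n, n, 4/n); B = spones(B + B.');       % random undirected graph
B = B + spdiags(double(sum(B,2) == 0), 0, n, n);  % self-loop for isolated nodes
d = full(sum(B,1)).';
P = B * spdiags(1./d, 0, n, n);                   % column-stochastic P = A D^{-1}
x = nexpm(P, 1, 1e-5);                            % approximate exp(P) e_1
e1 = zeros(n,1); e1(1) = 1;
xtrue = expm(full(P)) * e1;
fprintf('1-norm error: %g\n', norm(x - xtrue, 1));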
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) e_c into a linear system ✓
3. Coordinate descent methods for linear systems from large networks ✓
4. Error analysis
5. Experiments
ICME David Gleich · Purdue 26
Error analysis for Gauss-Southwell
ICME David Gleich · Purdue 27
Theorem. Assume P is column-stochastic and v^{(0)} = 0, for the system (I \otimes I_N - S_N \otimes P) v = e_1 \otimes e_c.
(Nonnegativity, the "easy" part) The iterates and residuals are nonnegative: v^{(l)} \ge 0 and r^{(l)} \ge 0.
(Convergence, the "annoying" part) The residual goes to 0:
\|r^{(l)}\|_1 \le \prod_{k=1}^{l} \left(1 - \frac{1}{2dk}\right) \le l^{-1/(2d)}, where d is the largest degree.
Proof sketch. Gauss-Southwell picks the largest residual entry ⇒ bound the update by the average number of nonzeros in the residual (sloppy) ⇒ algebraic convergence with a slow rate, but each update is REALLY fast: O(d_max log n).
If d is log log n, then our method runs in sub-linear time (but so does just about anything).
ICME David Gleich · Purdue 28
Overall error analysis
ICME David Gleich · Purdue 29
Components: truncation to N terms; residual to error; approximate solve.
Theorem. After \ell steps of Gauss-Southwell,
\|x_N^{(\ell)} - x\|_1 \le \frac{1}{N! \, N} + \frac{1}{e} \cdot \ell^{-1/(2d)}
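To see what the bound says quantitatively, a small assumed calculation (not from the talk; N, d, and tol are arbitrary choices):

% Assumed back-of-the-envelope use of the bound (not from the talk):
% given N, max degree d, and a target tol, how many Gauss-Southwell
% steps does the worst case 1/(N! N) + (1/e) l^(-1/(2d)) <= tol demand?
N = 11; d = 5; tol = 1e-3;
trunc = 1/(factorial(N)*N);                    % truncation part of the bound
steps = ceil((exp(1)*(tol - trunc))^(-2*d));   % solve (1/e) l^(-1/(2d)) <= tol - trunc
fprintf('truncation %g, steps needed <= %g\n', trunc, steps);

The resulting step count is astronomically pessimistic, consistent with the "slow rate" caveat in the proof sketch; the experiments that follow show far less work suffices in practice.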
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) e_c into a linear system ✓
3. Coordinate descent methods for linear systems from large networks ✓
4. Error analysis ✓
5. Experiments
ICME David Gleich · Purdue 30
Our implementations
A C++ mex implementation with a heap to implement Gauss-Southwell.
A C++ mex implementation with a queue to store all residual entries ≥ 1/(tol nN).
At completion, the residual norm ≤ tol. We use the queue except for the runtime comparison.
ICME David Gleich · Purdue 31
Accuracy vs. tolerance
ICME David Gleich · Purdue 32
[Figure: boxplot of precision at 100 (0 to 1) versus log10 of residual tolerance (-2 through -7) for pgp-cc; pgp social graph, 10k vertices.]
For the pgp social graph, we study the precision in finding the 100 largest nodes as we vary the tolerance. This set of 100 does not include the node's immediate neighbors. (Boxplot over 50 trials.)
Accuracy vs. work
ICME David Gleich · Purdue 33
For the dblp collaboration graph, we study the precision in finding the 100 largest nodes as we vary the work. This set of 100 does not include the node's immediate neighbors. (One column, but representative.)
[Figure: precision (0 to 1) versus effective matrix-vector products (10^-2 to 10^0) on dblp-cc, with curves for precision @10, @25, @100, and @1000 and marks at tol = 10^-4 and tol = 10^-5; dblp collaboration graph, 225k vertices.]
Runtime
ICME David Gleich · Purdue 34
[Figure: runtime in seconds (10^-4 to 10^0) versus |E| + |V| (10^3 to 10^6) for the methods TSGS, TSGSQ, EXPV, MEXPV, and TAYLOR; Flickr social network, 500k nodes, 5M edges.]
Outline
1. Motivation and setup ✓
2. Converting x = exp(P) e_c into a linear system ✓
3. Coordinate descent methods for linear systems from large networks ✓
4. Error analysis ✓
5. Experiments ✓
ICME David Gleich · Purdue 35
References and ongoing work
Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013 (forthcoming). www.cs.purdue.edu/homes/dgleich/codes/nexpokit
• Error analysis using the queue
• Better linear systems for faster convergence
• Asynchronous coordinate descent methods
• Scaling up to billion node graphs
• More explicit localization in algorithms
ICME David Gleich · Purdue 36