

Parallel Sparse Operations in Matlab: Exploring Large Graphs

John R. Gilbert
University of California at Santa Barbara

Aydin Buluc (UCSB)
Brad McRae (NCEAS)
Steve Reinhardt (Interactive Supercomputing)
Viral Shah (ISC & UCSB)

with thanks to Alan Edelman (MIT & ISC) and Jeremy Kepner (MIT-LL)

Support: DOE, NSF, DARPA, SGI, ISC


3D Spectral Coordinates


2D Histogram: RMAT Graph


Strongly Connected Components


Social Network Analysis in Matlab: 1993

Co-author graph from 1993 Householder symposium


Combinatorial Scientific Computing

Emerging large scale, high-performance applications:

• Web search and information retrieval

• Knowledge discovery

• Computational biology

• Dynamical systems

• Machine learning

• Bioinformatics

• Sparse matrix methods

• Geometric modeling

• . . .

How will combinatorial methods be used by nonexperts?


Outline

• Infrastructure: Array-based sparse graph computation

• An application: Computational ecology

• Some nuts and bolts: Sparse matrix multiplication


Matlab*P

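% Power iteration in Matlab*P. The *p tag marks a dimension as distributed,
% so A and x live on the parallel server; the loop is ordinary Matlab.
A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
    y = x;              % previous iterate
    x = A*x;            % distributed matrix-vector product
    x = x / norm(x);    % normalize
end;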
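
For readers without Star-P, the same loop can be tried in serial Matlab by dropping the `*p` tags. The following is only a sketch under assumptions that are not on the slide: a much smaller `n`, an iteration cap `iters`, and a Rayleigh-quotient estimate `lambda` added as a sanity check.

```matlab
n = 200;                 % assumed small size so the serial run is quick
A = rand(n, n);          % entrywise positive, so a dominant eigenvalue exists
x = randn(n, 1);
x = x / norm(x);
y = zeros(size(x));
iters = 0;               % safety cap, not part of the original loop
while norm(x - y) / norm(x) > 1e-11 && iters < 1000
    y = x;               % previous iterate
    x = A * x;           % matrix-vector product
    x = x / norm(x);     % normalize
    iters = iters + 1;
end
lambda = x' * A * x;     % Rayleigh quotient estimate of the dominant eigenvalue
```

Only the array constructors differ from the Matlab*P version; the body of the loop is unchanged, which is the point the slide makes.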


Star-P Architecture

[Diagram. Components shown: MATLAB® client with ordinary Matlab variables; Star-P client manager; server manager; package manager; matrix manager with distributed matrices; processors #0, #1, #2, #3, . . ., #n-1; packages and interfaces: ScaLAPACK, FFTW, FPGA interface, sort, dense/sparse, UPC user code, MPI user code.]


Distributed Sparse Array Structure

[Figure: a sparse array distributed across processors P0, P1, P2, . . ., Pn, with one processor's local compressed storage shown.]

Each processor stores local vertices & edges in a compressed row structure.

Has been scaled to >10^8 vertices, >10^9 edges in interactive session.
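
As an illustration of what a compressed row structure holds, the sketch below builds the three arrays for a small matrix in serial Matlab. The matrix, the names `rowptr`, `col`, and `val`, and the two-way row split are assumptions made for the example; they are not taken from the Star-P implementation.

```matlab
% Small example matrix (entries chosen only for illustration)
A = sparse([1 1 2 4 4], [2 4 3 1 4], [31 53 59 41 26], 4, 4);

% find() walks a matrix column by column, so applying it to A' yields the
% nonzeros of A row by row: column index, row index, value.
[col, row, val] = find(A');

% rowptr(i) is the position in col/val where row i starts;
% row i occupies positions rowptr(i) : rowptr(i+1)-1 (empty for row 3 here).
nrows = size(A, 1);
rowptr = cumsum([1; accumarray(row, 1, [nrows 1])]);

% Hypothetical split of rows between two processors
rowsP0 = 1:2;
rowsP1 = 3:4;
nnzP0 = rowptr(rowsP0(end) + 1) - rowptr(rowsP0(1));   % nonzeros held by P0
nnzP1 = rowptr(rowsP1(end) + 1) - rowptr(rowsP1(1));   % nonzeros held by P1
```

Storage is one pointer per local row plus one index and one value per local nonzero, so each processor's footprint tracks the vertices and edges it owns.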


Sparse Array and Matrix Operations

• dsparse layout, same semantics as ordinary full & sparse

• Matrix arithmetic: +, max, sum, etc.

• matrix * matrix and matrix * vector

• Matrix indexing and concatenation

A (1:3, [4 5 2]) = [ B(:, J) C ] ;

• Linear solvers: x = A \ b; using SuperLU (MPI)

• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)
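
The operations listed above have the same syntax on ordinary Matlab sparse matrices, so they can be tried serially. A minimal sketch, with small made-up matrices `A`, `B`, `C` and index vector `J` that are assumptions for illustration only:

```matlab
n = 6;
A = speye(n);                                    % sparse identity
B = sparse([1 2 3], [1 2 3], [10 20 30], 3, 4);  % 3-by-4 sparse
C = sparse([7; 8; 9]);                           % 3-by-1 sparse
J = [1 3];

% Indexing and concatenation, as on the slide: the right side is 3-by-3
A(1:3, [4 5 2]) = [B(:, J) C];

% Matrix arithmetic and reductions
S = A + A';                % elementwise sum
rowmax = max(A, [], 2);    % largest entry in each row
colsum = sum(A);           % column sums

% matrix * matrix and matrix * vector
P = A * A;
b = ones(n, 1);
y = A * b;

% Linear solve (serial Matlab picks its own sparse direct solver here)
x = A \ b;
```

In Star-P the same statements would run on `dsparse` operands; the solver behind `\` differs (SuperLU there), but the calling syntax does not.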


Large-Scale Graph Algorithms

• Graph theory, algorithms, and data structures are ubiquitous in sparse matrix computation.

• Time to turn the relationship around!

• Represent a graph as a sparse adjacency matrix.

• A sparse matrix language is a good start on primitives for computing with graphs.

• Leverage the mature techniques and tools of high-performance numerical computation.


Sparse Adjacency Matrix and Graph

• Adjacency matrix: sparse array w/ nonzeros for graph edges

• Storage-efficient implementation from sparse data structures

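[Figure: a directed graph on vertices 1 to 7 shown beside its transposed adjacency matrix A^T, a vector x, and the product A^T x.]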


Breadth-First Search: sparse mat * vec

[Figure: the graph on vertices 1 to 7 and the matrix A^T, with a vector x and the product A^T x.]

• Multiply by adjacency matrix → step to neighbor vertices

• Work-efficient implementation from sparse data structures
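
The idea can be sketched in a few lines of serial Matlab. The 7-vertex edge list below is hypothetical (the slide's graph is not recoverable from the transcript), and the `level` and `frontier` names are assumptions for the example.

```matlab
% Hypothetical directed graph on 7 vertices; A(i,j) = 1 means edge i -> j
src = [1 1 2 4 5 7 6 3];
dst = [2 4 5 7 6 6 3 1];
n = 7;
A = sparse(src, dst, 1, n, n);

level = -ones(n, 1);            % -1 marks "not yet reached"
frontier = false(n, 1);
frontier(1) = true;             % start the search at vertex 1
level(1) = 0;
d = 0;
while any(frontier)
    d = d + 1;
    reached = (A' * double(frontier)) > 0;   % one step along outgoing edges
    frontier = reached & (level < 0);        % keep only newly reached vertices
    level(frontier) = d;
end
```

Each pass through the loop is one multiplication by A^T, so the number of passes equals the depth of the search. This sketch uses a full frontier vector for simplicity; keeping the frontier sparse is what makes the multiplication work-efficient.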


[Figure, continued: the same graph after a second multiplication, showing x, A^T x, and (A^T)^2 x.]


HPCS Graph Clustering Benchmark

Fine-grained, irregular data access

Searching and clustering

• Many tight clusters, loosely interconnected

• Input data is edge triples < i, j, label(i,j) >

• Vertices and edges permuted randomly


Clustering by Breadth-First Search

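% Grow each seed to vertices reached by at least k
% paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);   % column s is the indicator vector of seed s
C = A * C;                           % counts of length-1 paths from each seed
C = C + A * C;                       % add counts of length-2 paths
C = C >= k;                          % keep vertices reached by at least k paths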

• Grow local clusters from many seeds in parallel

• Breadth-first search by sparse matrix * matrix

• Cluster vertices connected by many short paths
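
To see the four lines at work, here is a small serial run. The graph (two triangles joined by one edge), the seeds, and the threshold `k = 2` are assumptions chosen so the result is easy to check by hand.

```matlab
% Hypothetical undirected graph: triangle {1,2,3}, triangle {4,5,6}, bridge 3-4
i = [1 1 2 3 4 4 5];
j = [2 3 3 4 5 6 6];
n = 6;
A = sparse([i j], [j i], 1, n, n);   % symmetric adjacency matrix

seeds = [1 6];                       % one seed in each triangle
ns = numel(seeds);
k = 2;

C = sparse(seeds, 1:ns, 1, n, ns);   % one indicator column per seed
C = A * C;                           % paths of length 1
C = C + A * C;                       % plus paths of length 2
C = C >= k;                          % threshold on the path count
```

Column 1 of `C` selects vertices 1, 2, 3 and column 2 selects vertices 4, 5, 6. Vertex 4 is reachable from seed 1 by only one short path (through the bridge), so it falls below the threshold, which is how loosely connected neighbors are kept out of a cluster.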


Toolbox for Graph Analysis and Pattern Discovery

Layer 1: Graph Theoretic Tools

• Graph operations

• Global structure of graphs

• Graph partitioning and clustering

• Graph generators

• Visualization and graphics

• Scan and combining operations

• Utilities


Typical Application Stack

Distributed Sparse Matrices: arithmetic, matrix multiplication, indexing, solvers (\, eigs)

Graph Analysis & PD Toolbox: graph querying & manipulation, connectivity, spanning trees, geometric partitioning, nested dissection, NNMF, . . .

Preconditioned Iterative Methods: CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)

Applications: computational ecology, CFD, data exploration


Landscape Connectivity Modeling

• Landscape type and features facilitate or impede movement of members of a species

• Different species have different criteria, scales, etc.

• Habitat quality, gene flow, population stability

• Corridor identification, conservation planning


Pumas in Southern California

[Map: habitat quality model, with L.A., Palm Springs, and Joshua Tree N.P. labeled.]


Predicting Gene Flow with Resistive Networks

Circuit model predictions:

Genetic vs. geographic distance:

[Figure panels labeled N = 100, m = 0.01 (label appears twice).]


Early Experience with Real Genetic Data

• Good results with wolverines, mahogany, pumas

• Matlab implementation

• Needed:

– Finer resolution

– Larger landscapes

– Faster interaction

5 km resolution (too coarse)


Circuitscape: Combinatorics and Numerics

• Model landscape (ideally at 100m resolution for pumas).

• Initial grid models connections to 4 or 8 neighbors.

• Partition landscape into connected components via GAPDT.

• Use GAPDT to contract habitats into single graph nodes.

• Compute resistance for pairs of habitats.

• Direct methods are too slow for largest problems.

• Use iterative solvers via Star-P: Hypre (PCG+AMG).
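
The resistance computation at the core of these steps can be sketched in serial Matlab. Everything specific below is an assumption for illustration: a 10-by-10 grid with 4-neighbor connections, unit conductance on every link, two corner cells standing in for habitats, and unpreconditioned `pcg` in place of the Hypre PCG+AMG solver named on the slide.

```matlab
% Hypothetical 10-by-10 landscape grid, unit conductance on each 4-neighbor link
r = 10; c = 10; n = r * c;
idx = reshape(1:n, r, c);
s = [reshape(idx(:, 1:end-1), [], 1); reshape(idx(1:end-1, :), [], 1)];
t = [reshape(idx(:, 2:end),   [], 1); reshape(idx(2:end,   :), [], 1)];
G = sparse([s; t], [t; s], 1, n, n);          % symmetric conductance matrix
L = spdiags(full(sum(G, 2)), 0, n, n) - G;    % graph Laplacian

% Effective resistance between nodes a and b: ground b, inject one unit
% of current at a, and read off the resulting voltage at a.
a = 1; b = n;
keep = setdiff(1:n, b);
Lr = L(keep, keep);                           % grounded Laplacian (SPD on a connected grid)
rhs = zeros(n - 1, 1);
rhs(keep == a) = 1;
[v, flag] = pcg(Lr, rhs, 1e-10, 1000);        % conjugate gradients, no preconditioner
R = v(keep == a);                             % effective resistance between a and b
```

On a grid this small a direct solve (`Lr \ rhs`) is just as easy; the iterative route is shown because it is the one that scales to the multi-million-node landscapes discussed here, where a good preconditioner becomes essential.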


Parallel Circuitscape Results

• Pumas in southern California:

– 12 million nodes

– Under 1 hour (16 processors)

– Original code took 3 days at coarser resolution

• Targeting much larger problems:

– Yellowstone-to-Yukon corridor

Figures courtesy of Brad McRae, NCEAS


Sparse Matrix times Sparse Matrix

• A primitive in many array-based graph algorithms:

– Parallel breadth-first search

– Shortest paths

– Graph contraction

– Subgraph / submatrix indexing

– Etc.

• Graphs are often not mesh-like, i.e. they lack geometric locality and good separators.

• Often do not want to optimize for one repeated operation, as in matvec for iterative methods


Sparse Matrix times Sparse Matrix

• Current work:

– Parallel algorithms with 2D data layout

– Sequential and parallel hypersparse algorithms

– Matrices over semirings


ParSpGEMM

[Figure: block product A(I,K) * B(K,J) = C(I,J), with row block I, column block J, and shared block index K.]

C(I,J) += A(I,K)*B(K,J)

• Based on SUMMA

• Simple for non-square matrices, etc.
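
The block update can be imitated serially to check the indexing. This is only a sketch of the SUMMA-style accumulation order, not the parallel algorithm: there is no communication, and the sizes, the 3-by-3 block grid `q`, and the random test matrices are assumptions for the example.

```matlab
n = 12; q = 3; bs = n / q;        % assumed 3-by-3 grid of 4-by-4 blocks
A = sprand(n, n, 0.3);
B = sprand(n, n, 0.3);
C = sparse(n, n);

for kk = 1:q                      % outer loop over the shared block index K
    K = (kk - 1) * bs + (1:bs);
    for ii = 1:q
        I = (ii - 1) * bs + (1:bs);
        for jj = 1:q
            J = (jj - 1) * bs + (1:bs);
            % In the parallel setting, the owner of C(I,J) would receive
            % A(I,K) and B(K,J) at this stage and do this local update.
            C(I, J) = C(I, J) + A(I, K) * B(K, J);
        end
    end
end
```

After the last stage `C` matches `A * B` up to rounding. Because each stage touches only whole blocks, nothing in the loop requires the blocks, or the matrices, to be square.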


How Sparse? HyperSparse !

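[Figure: a matrix with nnz(j) = c nonzeros per column, divided into blocks over p processors; within a local block the per-column count becomes nnz(j) = c/√p → 0 as p grows.]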

Any local data structure that depends on local submatrix dimension n (such as CSR or CSC) is too wasteful.


SparseDComp Data Structure

• “Doubly compressed” data structure

• Maintains both DCSC and DCSR

• C = A*B needs only A.DCSC and B.DCSR

• 4*nnz values communicated for A*B in the worst case (though we usually get away with much less)
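
A rough sense of what double compression saves can be had from a serial sketch. The layout below (arrays `jc`, `cp`, `ir`, `num`) is an assumed simplification of a doubly compressed column format for illustration, and the 1000-by-1000 matrix with four nonzeros is made up.

```matlab
n = 1000;
% Hypersparse example: 4 nonzeros in 3 nonempty columns of a 1000-column block
A = sparse([3 7 7 500], [10 10 400 999], [1 2 3 4], n, n);

[ir, j, num] = find(A);                 % row indices, column indices, values (column order)
[jc, first] = unique(j, 'first');       % nonempty columns and where each one starts
cp = [first; numel(j) + 1];             % pointers into ir/num, one per nonempty column

cscPointerWords  = n + 1;                   % ordinary CSC: one pointer per column
dcscPointerWords = numel(jc) + numel(cp);   % doubly compressed: nonempty columns only
```

Here the column bookkeeping drops from 1001 words to 7, while `ir` and `num` are the same in both formats. That gap is the waste the previous slide attributes to structures whose size depends on the local dimension n.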


Sequential Operation Counts

• Matlab: O(n+nnz(B)+f)

• SpGEMM: O(nzc(A)+nzr(B)+f*logk)

Here f is the number of required nonzero operations (flops), and nzc(A) is the number of columns of A containing at least one nonzero.

[Plot: break-even point between the two operation counts.]


Parallel Timings

• 16-processor Opteron, hypertransport, 64 GB memory

• R-MAT * R-MAT

• n = 2^20

• nnz = {8, 4, 2, 1, .5} * 2^20

[Plot: time vs n/nnz, log-log.]


Matrices over Semirings

• Matrix multiplication C = AB (or matrix/vector):

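$C_{i,j} = A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + \cdots + A_{i,n}B_{n,j}$

• Replace scalar operations $\times$ and $+$ by

$\otimes$ : associative, distributes over $\oplus$, identity 1

$\oplus$ : associative, commutative, identity 0, which annihilates under $\otimes$

• Then $C_{i,j} = A_{i,1} \otimes B_{1,j} \oplus A_{i,2} \otimes B_{2,j} \oplus \cdots \oplus A_{i,n} \otimes B_{n,j}$

• Examples: $(\times, +)$ ; (and, or) ; $(+, \min)$ ; . . .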

• Same data reference pattern and control flow
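
As a concrete instance, the sketch below multiplies a small weight matrix by itself with + playing the role of ⊗ and min playing the role of ⊕. The 3-vertex weight matrix `W` is an assumption for the example, and the matrix is dense for brevity; `Inf` is the ⊕ identity (no edge) and 0 is the ⊗ identity.

```matlab
% Hypothetical edge weights: W(i,j) is the cost of edge i -> j, Inf if absent
W = [0   3   Inf;
     Inf 0   1;
     2   Inf 0];
n = size(W, 1);

% C(i,j) = min over k of W(i,k) + W(k,j): the usual triple loop with
% (*, +) swapped for (+, min), written as n rank-one style updates.
C = Inf(n);
for k = 1:n
    C = min(C, bsxfun(@plus, W(:, k), W(k, :)));
end
```

`C` holds the cheapest cost of getting from i to j using at most two edges, for example `C(1,3) = 4` via vertex 2. The loop structure is identical to ordinary matrix multiplication, which is the "same data reference pattern and control flow" point above.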


Remarks

• Tools for combinatorial methods built on parallel sparse matrix infrastructure

• Easy-to-use interactive programming environment

– Rapid prototyping tool for algorithm development

– Interactive exploration and visualization of data

• Sparse matrix * sparse matrix is a key primitive

• Matrices over semirings like (min,+) as well as (+,*)