Carnegie Mellon University GraphLab Tutorial by Yucheng Low


TRANSCRIPT

Page 1: Carnegie Mellon University GraphLab Tutorial Yucheng Low

Carnegie Mellon University

GraphLab Tutorial
Yucheng Low

Page 2:

GraphLab Team

Yucheng Low

Aapo Kyrola

Jay Gu

Joseph Gonzalez

Danny Bickson

Carlos Guestrin

Page 3:

Development History

GraphLab 0.5 (2010)

Internal Experimental Code

Insanely Templatized

GraphLab 1 (2011)

Nearly Everything is Templatized

First Open Source Release (LGPL before June 2011, APL from June 2011 onward)

GraphLab 2 (2012)

Many Things are Templatized

Shared Memory: Jan 2012
Distributed: May 2012

Page 4:

GraphLab 2 Technical Design Goals

• Improved usability
• Decreased compile time
• As good or better performance than GraphLab 1
• Improved distributed scalability

… other abstraction changes … (come to the talk!)

Page 5:

Development History

Ever since GraphLab 1.0, all active development has been open source (APL):

code.google.com/p/graphlabapi/

(Even current experimental code, activated with the --experimental flag on ./configure.)

Page 6:

Guaranteed Target Platforms

• Any x86 Linux system with gcc >= 4.2
• Any x86 Mac system with gcc 4.2.1 (OS X 10.5 ??)

• Other platforms?

… We welcome contributors.

Page 7:

Tutorial Outline

• GraphLab in a few slides + PageRank
• Checking out GraphLab v2
• Implementing PageRank in GraphLab v2
• Overview of different GraphLab schedulers
• Preview of Distributed GraphLab v2 (may not work in your checkout!)
• Ongoing work… (as much as time allows)

Page 8:

Warning

A preview of code still in intensive development!

Things may or may not work for you!

Interface may still change!

GraphLab 2 still has a number of performance regressions relative to GraphLab 1 that we are ironing out.

Page 9:

PageRank Example

Iterate:

R[i] = α + (1 − α) Σ_{j ∈ N(i)} W_ji R[j]

Where:
α is the random reset probability
L[j] is the number of links on page j
(W_ji = 1/L[j] when page j links to page i)

[Figure: a small six-page link graph, vertices 1–6]
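Spelled out in standalone plain C++ (this is an illustrative sketch, not GraphLab code; the graph representation, the W_ji = 1/L[j] weighting, and the tolerance are filled in for the example), the iteration looks like:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One pass of R[i] = alpha + (1 - alpha) * sum_{j in N(i)} W_ji * R[j],
// with W_ji = 1/L[j] (L[j] = out-degree of page j).
// Returns the largest per-vertex change (the residual).
double pagerank_sweep(const std::vector<std::vector<int>>& out_links,
                      std::vector<double>& rank, double alpha) {
  const int n = static_cast<int>(rank.size());
  std::vector<double> next(n, alpha);
  for (int j = 0; j < n; ++j) {
    double w = 1.0 / out_links[j].size();  // W_ji = 1/L[j]
    for (int i : out_links[j]) next[i] += (1 - alpha) * w * rank[j];
  }
  double residual = 0;
  for (int i = 0; i < n; ++i)
    residual = std::max(residual, std::fabs(next[i] - rank[i]));
  rank = next;
  return residual;
}

// Iterate to convergence.
std::vector<double> pagerank(const std::vector<std::vector<int>>& out_links,
                             double alpha, double eps) {
  std::vector<double> rank(out_links.size(), 1.0);
  while (pagerank_sweep(out_links, rank, alpha) > eps) {}
  return rank;
}
```

On a 3-page cycle every page has one link, so the fixed point is R[i] = 1 for all i.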

Page 10:


The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 11:


Data Graph

A graph with arbitrary data (C++ objects) associated with each vertex and edge

Vertex Data:
• Webpage
• Webpage Features

Edge Data:
• Link weight

Graph:
• Link graph
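As a standalone sketch in plain C++ (this is not the GraphLab graph API; the struct fields and method names here are invented for illustration), a data graph simply pairs arbitrary user structs with each vertex and edge:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Arbitrary user data attached to each vertex and edge.
struct vertex_data { std::string url; double rank; };
struct edge_data   { double weight; };

// A minimal data graph: adjacency lists plus per-vertex / per-edge payloads.
struct data_graph {
  struct edge { int target; edge_data data; };
  std::vector<vertex_data> vertices;
  std::vector<std::vector<edge>> out_edges;

  int add_vertex(const vertex_data& vd) {
    vertices.push_back(vd);
    out_edges.push_back({});
    return static_cast<int>(vertices.size()) - 1;
  }
  void add_edge(int src, int dst, const edge_data& ed) {
    out_edges[src].push_back({dst, ed});
  }
};
```

The point of the abstraction is that the framework never interprets vertex_data or edge_data; update functions do.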

Page 12:


The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 13:

Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

pagerank(i, scope) {
  // Get neighborhood data (R[i], W_ji, R[j]) from scope

  // Update the vertex data:
  // R[i] = α + (1 − α) Σ_{j ∈ N(i)} W_ji R[j]

  // Reschedule neighbors if needed
  if R[i] changes then reschedule_neighbors_of(i);
}

Page 14:


Dynamic Schedule

[Figure: vertices a–k in a graph; CPU 1 and CPU 2 pull scheduled vertices (a, h, b, i, …) from a shared scheduler queue.]

Process repeats until scheduler is empty
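The loop in the figure can be sketched as a work queue. This standalone plain C++ version is single-threaded for clarity (the real engine drains the scheduler from multiple CPUs under a consistency model), and the function names are illustrative, not GraphLab's:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// Drain the scheduler: repeatedly pop a vertex, run the update function on
// it, and append any vertices it reschedules. Duplicates in the queue simply
// cause another application of the update. A cap guarantees termination even
// if updates keep rescheduling. Returns the number of updates applied.
int run_scheduler(std::deque<int> schedule,
                  const std::function<std::vector<int>(int)>& update,
                  int max_updates) {
  int applied = 0;
  while (!schedule.empty() && applied < max_updates) {
    int v = schedule.front();
    schedule.pop_front();
    for (int w : update(v)) schedule.push_back(w);  // rescheduled vertices
    ++applied;
  }
  return applied;
}
```

This is what "dynamic" means here: the set of work is not fixed per iteration; update functions themselves decide which neighbors need recomputation.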

Page 15:

Source Code Interjection 1

Graph, update functions, and schedulers

Page 16:

--scope=vertex
--scope=edge

Page 17:

Consistency Trade-off

Consistency vs. “Throughput” (# of “iterations” per second)

Goal of ML algorithm: Converge

A False Trade-off

Page 18:


Ensuring Race-Free Code

How much can computation overlap?

Page 19:


The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model

Page 20:

Importance of Consistency

Fast ML algorithm development cycle:

Build

Test

Debug

Tweak Model

The framework must behave predictably and consistently, avoiding problems caused by non-determinism. Otherwise: is the execution wrong, or is the model wrong?


Page 21:

Full Consistency

Guaranteed safety for all update functions

Page 22:

Full Consistency

Parallel updates are only allowed on vertices at least two hops apart, which reduces opportunities for parallelism.

Page 23:

Obtaining More Parallelism

Not all update functions will modify the entire scope!

Belief Propagation: Only uses edge data
Gibbs Sampling: Only needs to read adjacent vertices

Page 24:

Edge Consistency

Page 25:

Obtaining More Parallelism

“Map” operations, e.g. feature extraction on vertex data

Page 26:

Vertex Consistency

Page 27:

The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Scheduler

Consistency Model


Page 28:

Shared Variables

• Global aggregation through the Sync Operation
• A global parallel reduction over the graph data
• Synced variables are recomputed at defined intervals while update functions are running

Sync: Highest PageRank

Sync: Loglikelihood
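Conceptually a sync is just a fold over all the graph data into one shared variable, recomputed periodically while updates run. Here is a standalone single-pass sketch in plain C++ of the two syncs named above (this is not the GraphLab sync API; the function names are invented for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// A "sync" in miniature: reduce all vertex data into one shared variable.

// Highest PageRank seen anywhere in the graph.
double sync_highest_rank(const std::vector<double>& ranks) {
  double highest = 0;
  for (double r : ranks) highest = std::max(highest, r);
  return highest;
}

// A log-likelihood-style sync: sum a per-vertex log term.
double sync_loglikelihood(const std::vector<double>& per_vertex_prob) {
  double ll = 0;
  for (double p : per_vertex_prob) ll += std::log(p);
  return ll;
}
```

Because each of these is an associative, commutative reduction, the framework can compute it in parallel over graph partitions and merge the partial results.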


Page 29:

Source Code Interjection 2

Shared variables

Page 30:

What can we do with these primitives?

…many many things…

Page 31:

Matrix Factorization

Netflix Collaborative Filtering

Alternating Least Squares Matrix Factorization

Model: 0.5 million nodes, 99 million edges

[Figure: bipartite Users × Movies graph for Netflix, with latent dimension d]

Page 32:

Netflix Speedup

[Figure: speedup with increasing size of the matrix factorization]

Page 33:

Video Co-Segmentation

Discover “coherent” segment types across a video (extends Batra et al. ‘10)

1. Form super-voxels from the video
2. EM & inference in a Markov random field

Large model: 23 million nodes, 390 million edges

[Figure: speedup plot, GraphLab vs. Ideal]

Page 34:

Many More

• Tensor Factorization
• Bayesian Matrix Factorization
• Graphical Model Inference/Learning
• Linear SVM
• EM clustering
• Linear Solvers using GaBP
• SVD
• Etc.

Page 35:

Distributed Preview

Page 36:

GraphLab 2 Abstraction

Changes

(an overview of a couple of them)

(Come to the talk for the rest!)

Page 37:

Exploiting Update Functors

(for the greater good)

Page 38:

Exploiting Update Functors (for the greater good)

1. Update functors store state.
2. The scheduler schedules update functor instances.
3. We can use update functors as controlled asynchronous message passing to communicate between vertices!

Page 39:

Delta-Based Update Functors

struct pagerank : public iupdate_functor<graph, pagerank> {
  double delta;
  pagerank(double d) : delta(d) { }
  void operator+=(pagerank& other) { delta += other.delta; }
  void operator()(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();
    vdata.rank += delta;                       // apply the accumulated delta
    if (std::fabs(delta) > EPSILON) {          // forward the remainder
      double out_delta = delta * (1 - RESET_PROB) / context.num_out_edges();
      context.schedule_out_neighbors(pagerank(out_delta));
    }
  }
};
// Initial rank:     R[i] = 0;
// Initial schedule: pagerank(RESET_PROB);
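To see why this converges to the same fixed point as the iterative formulation, here is a standalone plain C++ simulation of the scheme (not GraphLab code; the graph and the message queue are invented for illustration): every vertex starts at rank 0, the initial schedule delivers RESET_PROB everywhere, and each processed message adds its delta and forwards the attenuated remainder.

```cpp
#include <cassert>
#include <cmath>
#include <deque>
#include <utility>
#include <vector>

// Delta-based PageRank: process (vertex, delta) messages until every
// remaining delta falls below epsilon. Deltas shrink geometrically by
// (1 - reset_prob) per hop, so the queue drains; only active vertices
// are ever touched.
std::vector<double> delta_pagerank(
    const std::vector<std::vector<int>>& out_links,
    double reset_prob, double epsilon) {
  const int n = static_cast<int>(out_links.size());
  std::vector<double> rank(n, 0.0);            // Initial rank: R[i] = 0
  std::deque<std::pair<int, double>> schedule;
  for (int v = 0; v < n; ++v)                  // Initial schedule
    schedule.push_back({v, reset_prob});
  while (!schedule.empty()) {
    int v = schedule.front().first;
    double delta = schedule.front().second;
    schedule.pop_front();
    rank[v] += delta;                          // apply the delta
    if (std::fabs(delta) > epsilon) {          // forward the remainder
      double out_delta = delta * (1 - reset_prob) / out_links[v].size();
      for (int w : out_links[v]) schedule.push_back({w, out_delta});
    }
  }
  return rank;
}
```

On a 3-page cycle the deltas delivered to each vertex sum to reset_prob · (1 + (1 − reset_prob) + …) = 1, matching the iterative fixed point R[i] = 1.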

Page 40:

Asynchronous Message Passing

Obviously not all computation can be written this way. But when it can be, it can be extremely fast.

Page 41:

Factorized Updates

Page 42:

PageRank in GraphLab

struct pagerank : public iupdate_functor<graph, pagerank> {
  void operator()(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();
    double sum = 0;
    foreach (edge_type edge, context.in_edges())
      sum += context.const_edge_data(edge).weight *
             context.const_vertex_data(edge.source()).rank;
    double old_rank = vdata.rank;
    vdata.rank = RESET_PROB + (1 - RESET_PROB) * sum;
    double residual = std::fabs(vdata.rank - old_rank) /
                      context.num_out_edges();
    if (residual > EPSILON)
      context.reschedule_out_neighbors(pagerank());
  }
};

Page 43:

PageRank in GraphLab

The same update function, with its three implicit phases marked:

struct pagerank : public iupdate_functor<graph, pagerank> {
  void operator()(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();
    // Parallel “Sum” Gather
    double sum = 0;
    foreach (edge_type edge, context.in_edges())
      sum += context.const_edge_data(edge).weight *
             context.const_vertex_data(edge.source()).rank;
    // Atomic Single Vertex Apply
    double old_rank = vdata.rank;
    vdata.rank = RESET_PROB + (1 - RESET_PROB) * sum;
    // Parallel Scatter [Reschedule]
    double residual = std::fabs(vdata.rank - old_rank) /
                      context.num_out_edges();
    if (residual > EPSILON)
      context.reschedule_out_neighbors(pagerank());
  }
};

Page 44:

Decomposable Update Functors

Decompose update functions into 3 phases:

Gather (user defined): a parallel reduction over the scope — partial results Δ1 + Δ2 + … are accumulated into a single Δ across adjacent edges.

Apply (user defined): apply the accumulated value Δ to the center vertex (an atomic, single-vertex operation).

Scatter (user defined): update adjacent edges and vertices, rescheduling as needed.
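The three phases can be exercised in isolation. Here is a standalone plain C++ sketch of one decomposed PageRank update (not the GraphLab interface; the edge layout and function signatures are invented for illustration): gather folds over in-edges, apply writes the vertex exactly once, and scatter decides what to reschedule.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A single in-edge as seen from the center vertex.
struct in_edge { double weight; double source_rank; };

// Gather: a fold over in-edges (parallelizable; sequential here).
double gather(const std::vector<in_edge>& in_edges) {
  double accum = 0;
  for (const in_edge& e : in_edges) accum += e.weight * e.source_rank;
  return accum;
}

// Apply: the atomic single-vertex write; returns the residual.
double apply(double& rank, double accum, double reset_prob,
             int num_out_edges) {
  double old_rank = rank;
  rank = reset_prob + (1 - reset_prob) * accum;
  return std::fabs(rank - old_rank) / num_out_edges;
}

// Scatter: per-out-edge decision; true means "reschedule the target".
bool scatter(double residual, double epsilon) {
  return residual > epsilon;
}
```

Because gather produces partial sums that merge associatively, the framework is free to run it over different edge subsets on different threads, which is exactly the parallelism the full-functor form forfeits.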

Page 45:

Factorized PageRank

struct pagerank : public iupdate_functor<graph, pagerank> {
  double accum, residual;
  pagerank() : accum(0), residual(0) { }
  void gather(icontext_type& context, const edge_type& edge) {
    accum += context.const_edge_data(edge).weight *
             context.const_vertex_data(edge.source()).rank;
  }
  void merge(const pagerank& other) { accum += other.accum; }
  void apply(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();
    double old_value = vdata.rank;
    vdata.rank = RESET_PROB + (1 - RESET_PROB) * accum;
    residual = fabs(vdata.rank - old_value) / context.num_out_edges();
  }
  void scatter(icontext_type& context, const edge_type& edge) {
    if (residual > EPSILON)
      context.schedule(edge.target(), pagerank());
  }
};

Page 46:

Demo of *everything*

PageRank

Page 47:

Ongoing Work

Extensions to improve performance on large graphs (see the GraphLab talk later!!):

• Better distributed graph representation methods
• Possibly better graph partitioning
• Off-core graph storage
• Continually changing graphs

All-new rewrite of distributed GraphLab (come back in May!)
