CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin


DESCRIPTION

Presentation CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin at the AMD Developer Summit (APU13) Nov. 11-13, 2013.

TRANSCRIPT

Page 1: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Large-Scale Machine Learning and Graphs


Yucheng Low

Page 2: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Phase  1:  POSSIBILITY  

Benz  Patent  Motorwagen  (1886)  

Page 3: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Phase  2:  SCALABILITY  

Model  T  Ford  (1908)  

Page 4: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Phase  3:  USABILITY  

Page 5: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Possibility

Graph + Scalability

Usability

Page 6: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

How will we design and implement parallel learning systems?

The Big Question of Big Learning

Page 7: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

MapReduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel (MapReduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel: Graphical Models (Gibbs Sampling, Belief Propagation, Variational Opt.), Semi-Supervised Learning (Label Propagation, CoEM), Graph Analysis (PageRank, Triangle Counting), Collaborative Filtering (Tensor Factorization)

Is there more to Machine Learning?

Page 8: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Estimate Political Bias: Liberal vs. Conservative

[Graph of posts connected through the social network; a few posts carry known labels, and most are unknown (?)]

Semi-Supervised & Transductive Learning

Page 9: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Flashback  to  1998  

First  Google  advantage:    a  Graph  Algorithm  &  a  System  to  Support  it!  

Page 10: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

The Power of Dependencies: where the value is!

Page 11: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

It’s  all  about  the  graphs…    

Page 12: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Social Media, Advertising, Science, Web

- Graphs encode the relationships between: People, Facts, Products, Interests, Ideas
- Big: 100s of billions of vertices and edges, and rich metadata
  - Facebook (10/2012): 1B users, 144B friendships
  - Twitter (2011): 15B follower edges

Page 13: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Examples of Graphs in Machine Learning

Page 14: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Collaborative Filtering: Exploiting Dependencies

City of God
Wild Strawberries
The Celebration
La Dolce Vita
Women on the Verge of a Nervous Breakdown

Latent Factor Models, Matrix Completion/Factorization Models

Page 15: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Topic  Modeling  

Cat  

Apple  

Growth  

Hat  

Plant  

Latent Dirichlet Allocation, etc.

Page 16: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Example  Topics  Discovered  from  Wikipedia  

Page 17: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Machine Learning Pipeline

Data (images, docs, movie ratings, social activity) → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data (face labels, doc topics, movie recommendations, sentiment analysis)

Page 18: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

ML Tasks Beyond Data-Parallelism

Data-Parallel (MapReduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel: Graphical Models (Gibbs Sampling, Belief Propagation, Variational Opt.), Semi-Supervised Learning (Label Propagation, CoEM), Graph Analysis (PageRank, Triangle Counting), Collaborative Filtering (Tensor Factorization)

Page 19: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Example of a Graph-Parallel Algorithm

Page 20: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

PageRank  

What’s the rank of this user?

Rank?  

Depends on rank of who follows her

Depends on rank of who follows them…

Loops in graph ⇒ Must iterate!

Page 21: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

PageRank Iteration

- α is the random reset probability
- w_ji is the probability of transitioning (similarity) from j to i

Iterate until convergence:

R[i] = α + (1 − α) Σ_{(j,i)∈E} w_ji R[j]

"My rank is the weighted average of my friends' ranks"
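To make the iteration concrete, here is a minimal plain-Python sketch of the update above (not GraphLab code); in_edges[i] is an assumed structure holding (j, w_ji) pairs for every edge (j, i) in E:

# Jacobi-style PageRank iteration over an in-edge adjacency structure.
def pagerank(in_edges, alpha=0.15, iters=100):
    R = {i: 1.0 for i in in_edges}                      # initial ranks
    for _ in range(iters):
        # new rank = reset probability + weighted average of in-neighbors' ranks
        R = {i: alpha + (1 - alpha) * sum(w * R[j] for j, w in in_edges[i])
             for i in in_edges}
    return R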

Page 22: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Properties of Graph-Parallel Algorithms

Dependency Graph

Iterative Computation: My Rank depends on my Friends' Ranks

Local Updates

Page 23: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

The Need for a New Abstraction

Data-Parallel (MapReduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel: Graphical Models (Gibbs Sampling, Belief Propagation, Variational Opt.), Semi-Supervised Learning (Label Propagation, CoEM), Data-Mining (PageRank, Triangle Counting), Collaborative Filtering (Tensor Factorization)

- Need: Asynchronous, Dynamic Parallel Computations

Page 24: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

The  GraphLab  Goals  

Efficient parallel predictions

Know how to solve ML problem on 1 machine

Page 25: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

POSSIBILITY  

Page 26: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Data Graph: data associated with vertices and edges

Vertex Data: user profile text, current interest estimates
Edge Data: similarity weights
Graph: social network

Page 27: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

How do we program graph computation?

"Think like a Vertex." - Malewicz et al. [SIGMOD'10]

Page 28: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Update Functions: a user-defined program, applied to a vertex, that transforms the data in the scope of the vertex.

pagerank(i, scope) {
    // Get neighborhood data
    (R[i], w_ji, R[j]) ← scope;

    // Update the vertex data
    R[i] ← α + (1 − α) Σ_{j∈N[i]} w_ji × R[j];

    // Reschedule neighbors if needed
    if R[i] changes then
        reschedule_neighbors_of(i);
}

Dynamic computation: the update function is applied (asynchronously) in parallel until convergence. Many schedulers are available to prioritize computation.
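As an illustration of the dynamic-computation idea, here is a minimal single-threaded worklist sketch in plain Python (not the actual GraphLab scheduler; the update callback, tolerance, and data structures are assumptions):

from collections import deque

def run_dynamic(vertices, in_edges, out_nbrs, update, tol=1e-4):
    # Worklist of vertices whose update function should run.
    queue, queued = deque(vertices), set(vertices)
    values = {v: 1.0 for v in vertices}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        new_val = update(v, values, in_edges)        # user-defined update function
        changed = abs(new_val - values[v]) > tol
        values[v] = new_val
        if changed:
            for nbr in out_nbrs[v]:                  # reschedule affected neighbors
                if nbr not in queued:
                    queue.append(nbr)
                    queued.add(nbr)
    return values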

Page 29: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

The GraphLab Framework

Graph-Based Data Representation | Update Functions (User Computation) | Scheduler | Consistency Model

Page 30: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Bayesian Tensor Factorization, Gibbs Sampling, Dynamic Block Gibbs Sampling, Matrix Factorization, Lasso, SVM, Belief Propagation, PageRank, CoEM, K-Means, SVD, LDA, Linear Solvers, Splash Sampler, Alternating Least Squares, …many others…

Page 31: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Never Ending Learner Project (CoEM)

Hadoop: 95 cores, 7.5 hrs
Distributed GraphLab: 32 EC2 machines, 80 secs

0.3% of Hadoop time: 2 orders of magnitude faster and 2 orders of magnitude cheaper

Page 32: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

- ML algorithms as vertex programs
- Asynchronous execution and consistency models

Page 33: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Thus far… GraphLab 1 provided exciting scaling performance.

But… we couldn't scale up to the AltaVista Webgraph (2002): 1.4B vertices, 6.7B edges.

Page 34: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Natural  Graphs  

[Image  from  WikiCommons]  

Page 35: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Problem: Existing distributed graph computation systems perform poorly on natural graphs.

Page 36: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Achilles Heel: Idealized Graph Assumption

Assumed: small degree ⇒ easy to partition.

But natural graphs have many high-degree vertices (power-law degree distribution) ⇒ very hard to partition.

Page 37: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Power-Law Degree Distribution

[Log-log plot of Number of Vertices vs. Degree for the AltaVista WebGraph: 1.4B vertices, 6.6B edges]

High-Degree Vertices: 1% of vertices are adjacent to 50% of edges.
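A small synthetic check of this skew (illustrative only; the exponent and the sampled degrees are assumptions, not the AltaVista data):

import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0                                   # assumed power-law exponent
degrees = rng.zipf(alpha, size=1_000_000)     # synthetic degree sequence

top = int(0.01 * len(degrees))                # top 1% of vertices by degree
top_mass = np.sort(degrees)[::-1][:top].sum()
print(f"Top 1% of vertices hold {top_mass / degrees.sum():.0%} of edge endpoints")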

Page 38: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

High-Degree Vertices are Common

- Netflix (Users, Movies): "social" people, popular movies
- LDA (Docs, Words, Hyperparameters): common words
  [LDA plate diagram: per-document topic assignments Z and words w, with hyperparameters α and β]
- Obama

Page 39: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Power-Law Degree Distribution: "Star-Like" Motif

President Obama and his followers

Page 40: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Problem: High-Degree Vertices ⇒ High Communication for Distributed Updates

Data transmitted across network: O(# cut edges)

Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]

Popular partitioning tools (Metis, Chaco, …) perform poorly [Abou-Rjeili et al. 06]: extremely slow and require substantial memory

Page 41: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Random Partitioning

- GraphLab 1, Pregel, Twitter, Facebook, … all rely on random (hashed) partitioning for natural graphs

Figure 4: (a) An edge-cut and (b) vertex-cut of a graph into three parts. Shaded vertices are ghosts and mirrors respectively.

5 Distributed Graph Placement

The PowerGraph abstraction relies on the distributed data-graph to store the computation state and encode the interaction between vertex programs. The placement of the data-graph structure and data plays a central role in minimizing communication and ensuring work balance.

A common approach to placing a graph on a cluster of p machines is to construct a balanced p-way edge-cut (e.g., Fig. 4a) in which vertices are evenly assigned to machines and the number of edges spanning machines is minimized. Unfortunately, the tools [21, 31] for constructing balanced edge-cuts perform poorly [1, 26, 23] or even fail on power-law graphs. When the graph is difficult to partition, both GraphLab and Pregel resort to hashed (random) vertex placement. While fast and easy to implement, hashed vertex placement cuts most of the edges:

Theorem 5.1. If vertices are randomly assigned to p machines then the expected fraction of edges cut is:

E[ |Edges Cut| / |E| ] = 1 − 1/p    (5.1)

For example, if just two machines are used, half of the edges will be cut, requiring order |E|/2 communication.

5.1 Balanced p-way Vertex-Cut

The PowerGraph abstraction enables a single vertex program to span multiple machines. Hence, we can ensure work balance by evenly assigning edges to machines. Communication is minimized by limiting the number of machines a single vertex spans. A balanced p-way vertex-cut formalizes this objective by assigning each edge e ∈ E to a machine A(e) ∈ {1, …, p}. Each vertex then spans the set of machines A(v) ⊆ {1, …, p} that contain its adjacent edges. We define the balanced vertex-cut objective:

min_A (1/|V|) Σ_{v∈V} |A(v)|    (5.2)

s.t. max_m |{e ∈ E | A(e) = m}| < λ |E|/p    (5.3)

where the imbalance factor λ ≥ 1 is a small constant. We use the term replicas of a vertex v to denote the |A(v)| copies of the vertex v: each machine in A(v) has a replica of v. The objective term (Eq. 5.2) therefore minimizes the average number of replicas in the graph and, as a consequence, the total storage and communication requirements of the PowerGraph engine.

Vertex-cuts address many of the major issues associated with edge-cuts in power-law graphs. Percolation theory [3] suggests that power-law graphs have good vertex-cuts. Intuitively, by cutting a small fraction of the very high degree vertices we can quickly shatter a graph. Furthermore, because the balance constraint (Eq. 5.3) ensures that edges are uniformly distributed over machines, we naturally achieve improved work balance even in the presence of very high-degree vertices.

The simplest method to construct a vertex cut is to randomly assign edges to machines. Random (hashed) edge placement is fully data-parallel, achieves nearly perfect balance on large graphs, and can be applied in the streaming setting. In the following we relate the expected normalized replication factor (Eq. 5.2) to the number of machines and the power-law constant α.

Theorem 5.2 (Randomized Vertex Cuts). Let D[v] denote the degree of vertex v. A uniform random edge placement on p machines has an expected replication factor

E[ (1/|V|) Σ_{v∈V} |A(v)| ] = (p/|V|) Σ_{v∈V} (1 − (1 − 1/p)^{D[v]}).    (5.4)

For a graph with power-law constant α we obtain:

E[ (1/|V|) Σ_{v∈V} |A(v)| ] = p − p Li_α((p−1)/p) / ζ(α)    (5.5)

where Li_α(x) is the transcendental polylog function and ζ(α) is the Riemann zeta function (plotted in Fig. 5a).

Higher α values imply a lower replication factor, confirming our earlier intuition. In contrast to a random 2-way edge-cut, which requires order |E|/2 communication, a random 2-way vertex-cut on an α = 2 power-law graph requires only order 0.3|V| communication, a substantial savings on natural graphs where |E| can be an order of magnitude larger than |V| (see Tab. 1a).
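A minimal sketch of evaluating Eq. 5.4 for an arbitrary degree sequence (the synthetic degrees below are assumptions, purely for illustration):

import numpy as np

def expected_replication(degrees, p):
    """E[ (1/|V|) * sum_v |A(v)| ] for random edge placement on p machines (Eq. 5.4)."""
    degrees = np.asarray(degrees, dtype=float)
    return (p * (1.0 - (1.0 - 1.0 / p) ** degrees)).mean()

rng = np.random.default_rng(0)
degrees = rng.zipf(2.0, size=100_000)         # synthetic power-law-like degrees
for p in (8, 64, 128):
    print(p, round(expected_replication(degrees, p), 2))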

5.2 Greedy Vertex-Cuts

We can improve upon the randomly constructed vertex-cut by de-randomizing the edge-placement process. The resulting algorithm is a sequential greedy heuristic which places the next edge on the machine that minimizes the conditional expected replication factor. To construct the de-randomization we consider the task of placing the (i+1)-th edge after having placed the previous i edges. Using the conditional expectation we define the objective:

argmin_k E[ Σ_{v∈V} |A(v)| | A_i, A(e_{i+1}) = k ]    (5.6)

For p machines: 10 machines → 90% of edges cut; 100 machines → 99% of edges cut!

All data is communicated… little advantage over MapReduce
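A quick numeric check of Theorem 5.1, reproducing the figures above:

# Expected fraction of edges cut by random (hashed) vertex placement: 1 - 1/p.
for p in (2, 10, 100):
    print(f"{p:>3} machines -> {1 - 1/p:.0%} of edges cut")
# 2 machines -> 50%, 10 machines -> 90%, 100 machines -> 99%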

Page 42: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

In Summary

GraphLab 1 and Pregel are not well suited for natural graphs:

- Poor performance on high-degree vertices
- Low-quality partitioning

Page 43: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

SCALABILITY  


Page 44: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Common Pattern for Update Fns.

Gather: information about neighborhood
Apply: update to vertex
Scatter: signal to neighbors & modify edge data

GraphLab_PageRank(i)
    // Compute sum over neighbors
    total = 0
    foreach (j in in_neighbors(i)):
        total = total + R[j] * wji

    // Update the PageRank
    R[i] = 0.1 + total

    // Trigger neighbors to run again
    if R[i] not converged then
        foreach (j in out_neighbors(i)):
            signal vertex-program on j

Page 45: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GAS Decomposition

Gather (Reduce): accumulate information about the neighborhood, as a parallel "sum"

Apply: apply the accumulated value to the center vertex

Scatter: update adjacent edges and vertices
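A minimal sketch of PageRank written in the gather/apply/scatter style, in plain Python rather than the actual PowerGraph API (function names, the adjacency structure, and the damping constant are illustrative assumptions):

# Gather: contribution of in-neighbor j to vertex i
def gather(R, w, j, i):
    return w[(j, i)] * R[j]

# Apply: combine the accumulated sum into the new rank of the center vertex
def apply_rank(acc, alpha=0.15):
    return alpha + (1 - alpha) * acc

def pagerank_gas(in_nbrs, w, iters=20):
    R = {v: 1.0 for v in in_nbrs}
    for _ in range(iters):
        new_R = {}
        for i in in_nbrs:
            acc = sum(gather(R, w, j, i) for j in in_nbrs[i])   # Gather (parallel sum)
            new_R[i] = apply_rank(acc)                          # Apply
        R = new_R        # Scatter/signaling of neighbors is omitted in this sketch
    return R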

Page 46: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Many ML Algorithms Fit the GAS Model

Graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …

Page 47: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Minimizing Communication in GL2 PowerGraph: Vertex Cuts

Communication is linear in the number of spanned machines; a vertex-cut minimizes the number of machines per vertex.

Percolation theory suggests power-law graphs can be split by removing only a small set of vertices [Albert et al. 2000] ⇒ small vertex cuts possible!

GL2 PowerGraph includes novel vertex-cut algorithms, providing order-of-magnitude gains in performance.

Page 48: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

   

From the Abstraction to a System

Page 49: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Triangle Counting on the Twitter Graph: 34.8 Billion Triangles

Hadoop [WWW'11]: 1636 machines, 423 minutes
GraphLab2: 64 machines, 15 seconds

S. Suri and S. Vassilvitskii, "Counting triangles and the curse of the last reducer," WWW'11

Why? Wrong abstraction: broadcast O(degree²) messages per vertex

Page 50: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Topic Modeling (LDA)

- English-language Wikipedia: 2.6M documents, 8.3M words, 500M tokens
- Computationally intensive algorithm

[Bar chart: throughput in million tokens per second]
Smola et al.: 100 Yahoo! machines, specifically engineered for this task
GL2 PowerGraph: 64 cc2.8xlarge EC2 nodes, 200 lines of code & 4 human hours

Page 51: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

How well does GraphLab scale?

Yahoo AltaVista Web Graph (2002): one of the largest publicly available webgraphs. 1.4B webpages, 6.7 billion links.

64 HPC nodes: 7 seconds per iteration, 1B links processed per second, 30 lines of user code.

Page 52: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GraphChi: Going small with GraphLab

Solve huge problems on small or embedded devices?

Key: Exploit non-volatile memory (starting with SSDs and HDs)

Page 53: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GraphChi – disk-based GraphLab

Challenge: random accesses

Novel GraphChi solution: parallel sliding windows method ⇒ minimizes the number of random accesses
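As a much-simplified illustration of the out-of-core idea (this is not GraphChi's actual parallel sliding windows algorithm), one can keep only the rank vector in memory and stream the edge list sequentially from disk each iteration, so random accesses to edge data become sequential reads; the "src dst" file format is an assumption:

def pagerank_out_of_core(edge_file, num_vertices, iters=10, alpha=0.15):
    rank = [1.0] * num_vertices
    out_deg = [0] * num_vertices
    with open(edge_file) as f:                 # one sequential pass to count out-degrees
        for line in f:
            src, _ = map(int, line.split())
            out_deg[src] += 1
    for _ in range(iters):
        acc = [0.0] * num_vertices
        with open(edge_file) as f:             # one sequential pass per iteration
            for line in f:
                src, dst = map(int, line.split())
                acc[dst] += rank[src] / out_deg[src]
        rank = [alpha + (1 - alpha) * a for a in acc]
    return rank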

Page 54: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Triangle Counting on the Twitter Graph: 40M users, 1.2B edges

Total: 34.8 billion triangles

Hadoop: 1636 machines, 423 minutes (results from [Suri & Vassilvitskii '11])
GraphLab2: 64 machines, 1024 cores, 15 seconds
GraphChi: 59 minutes, 1 Mac Mini!

Page 55: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

- ML algorithms as vertex programs
- Asynchronous execution and consistency models
- Natural graphs change the nature of computation
- Vertex cuts and the gather/apply/scatter model

Page 56: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GL2 PowerGraph focused on Scalability, at the loss of Usability

Page 57: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GraphLab 1

Explicitly described operations

PageRank(i, scope) {
    acc = 0
    for (j in InNeighbors) {
        acc += pr[j] * edge[j].weight
    }
    pr[i] = 0.15 + 0.85 * acc
}

Code is intuitive

Page 58: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GraphLab 1 vs. GL2 PowerGraph

GraphLab 1: explicitly described operations; code is intuitive.

PageRank(i, scope) {
    acc = 0
    for (j in InNeighbors) {
        acc += pr[j] * edge[j].weight
    }
    pr[i] = 0.15 + 0.85 * acc
}

GL2 PowerGraph: implicit operation, implicit aggregation; need to understand the engine to understand the code.

gather(edge) {
    return edge.source.value * edge.weight
}

merge(acc1, acc2) {
    return acc1 + acc2
}

apply(v, accum) {
    v.pr = 0.15 + 0.85 * accum
}

Page 59: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

What now?

GraphLab 1: great flexibility, but hit a scalability wall.

GL2 PowerGraph: scalability, but a very rigid abstraction (many contortions needed to implement SVD++, Restricted Boltzmann Machines).

Page 60: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

USABILITY  


Page 61: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GL3 WarpGraph Goals

Program Like GraphLab 1

Run Like GraphLab 2

Page 62: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Fine-Grained Primitives

Expose Neighborhood Operations through Parallel Iterators

R[i] = 0.15 + 0.85 Σ_{(j,i)∈E} w[j,i] · R[j]   (aggregate sum over neighbors)

PageRankUpdateFunction(Y) {
    Y.pagerank = 0.15 + 0.85 *
        MapReduceNeighbors(
            lambda nbr: nbr.pagerank * nbr.weight,
            lambda (a, b): a + b
        )
}
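A plain-Python analog of this neighborhood map-reduce primitive, to make the pattern concrete (map_reduce_neighbors here is a stand-in, not the real GL3 API):

from functools import reduce

def map_reduce_neighbors(neighbors, map_fn, reduce_fn, initial=0.0):
    """Map over in-neighbors, then reduce the mapped values to one accumulator."""
    return reduce(reduce_fn, (map_fn(nbr) for nbr in neighbors), initial)

# Usage, mirroring the PageRank update above:
# rank = 0.15 + 0.85 * map_reduce_neighbors(
#     in_neighbors(v),
#     lambda nbr: nbr.pagerank * nbr.weight,
#     lambda a, b: a + b)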

Page 63: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Expressive, Extensible Neighborhood API

- MapReduce over Neighbors: parallel sum
- Parallel Transform Adjacent Edges: modify adjacent edges
- Broadcast: schedule a selected subset of adjacent vertices

Page 64: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Can express every GL2 PowerGraph program (more easily) in GL3 WarpGraph

But GL3 is more expressive: multiple gathers, scatter before gather, conditional execution.

UpdateFunction(v) {
    if (v.data == 1)
        accum = MapReduceNeighs(g, m)
    else ...
}

Page 65: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Graph Coloring on the Twitter Graph: 41M vertices, 1.4B edges

32 nodes x 16 cores (EC2 HPC cc2.8x)

GL2 PowerGraph: 227 seconds
GL3 WarpGraph: 60 seconds (3.8x faster)

WarpGraph outperforms PowerGraph with simpler code

Page 66: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

- ML algorithms as vertex programs
- Asynchronous execution and consistency models
- Natural graphs change the nature of computation
- Vertex cuts and the gather/apply/scatter model
- Usability is key
- Access the neighborhood through parallelizable iterators and latency hiding

Page 67: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Usability for Whom???

PowerGraph, WarpGraph, …

Page 68: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Machine  Learning  PHASE  3  (part  2)  

USABILITY  

Page 69: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Exciting Time to Work in ML

"With Big Data, I'll take over the world!!!"
"We met because of Big Data"
"Why won't Big Data read my mind???"

Unique opportunities to change the world!! ☺ But every deployed system is a one-off solution, and requires PhDs to make it work…

Page 70: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

ML is key to any new service we want to build.

But…

- Even the basics of scalable ML can be challenging
- 6 months from R/Matlab to production, at best
- State-of-the-art ML algorithms trapped in research papers

Goal of GraphLab 3: Make huge-scale machine learning accessible to all!

Page 71: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Step 0: Learn ML with the GraphLab Notebook

Page 72: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Step 1: pip install graphlab; prototype on a local machine

GraphLab + Python
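A minimal sketch of what Step 1 might look like; the SFrame and recommender.create calls below are assumptions based on later GraphLab Create releases, not APIs shown in this talk:

import graphlab as gl

# Load a small local dataset and prototype a recommender on one machine.
ratings = gl.SFrame.read_csv('ratings.csv')          # assumed columns: user, movie, rating
model = gl.recommender.create(ratings,
                              user_id='user',
                              item_id='movie',
                              target='rating')
print(model.recommend(users=['alice'], k=5))          # top-5 movies for one user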

Page 73: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Step 2: scale to the full dataset in the cloud, with minimal code changes

GraphLab + Python

Page 74: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Step  3:  deploy  in  production      

GraphLab  +  Python  

Page 75: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Step  4:  ???      

GraphLab  +  Python  

Page 76: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Step  4:  Profit      

GraphLab  +  Python  

Page 77: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

GraphLab Toolkits: highly scalable, state-of-the-art machine learning methods… all accessible from Python

Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering

Page 78: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Now with GraphLab: Learn / Prototype / Deploy

- Even the basics of scalable ML can be challenging → Learn ML with the GraphLab Notebook
- 6 months from R/Matlab to production, at best → pip install graphlab, then deploy on EC2/Cluster
- State-of-the-art ML algorithms trapped in research papers → Fully integrated via GraphLab Toolkits

Page 79: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

We're selecting strategic partners

Help define our strategy & priorities, and get the value of GraphLab in your company

[email protected]  

Page 80: CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Define our future: [email protected]
Needless to say: [email protected]

C++ GraphLab 2.2 available now: graphlab.com
Beta Program: beta.graphlab.com
Follow us on Twitter: @graphlabteam