Data Parallel and Graph Parallel Systems for Large-scale Data Processing
Presenter: Kun Li
Threads, Locks, and Messages

• ML experts repeatedly solve the same parallel design challenges:
  – Implement and debug a complex parallel system
  – Tune for a specific parallel platform
  – Two months later the conference paper contains: "We implemented ______ in parallel."
• The resulting code:
  – is difficult to maintain
  – is difficult to extend
  – couples the learning model to the parallel implementation
Map-Reduce / Hadoop

...a better answer: build learning algorithms on top of high-level parallel abstractions.
Motivation
• Large-scale data processing
  – Want to use 1000s of CPUs
  – But don't want the hassle of managing things
• MapReduce provides:
  – Automatic parallelization & distribution
  – Fault tolerance
  – I/O scheduling
  – Monitoring & status updates
Map/Reduce

• map(key, val) is run on each item in the input set
  – emits new-key / new-val pairs
• reduce(key, vals) is run for each unique key emitted by map()
  – emits the final output
Count Words in Docs

map(key=url, val=contents):
  For each word w in contents, emit (w, "1")

reduce(key=word, values=uniq_counts):
  Sum all "1"s in values list
  Emit result "(word, sum)"

Input:
  see bob throw
  see spot run

Map output:
  see 1, bob 1, run 1, see 1, spot 1, throw 1

Reduce output:
  bob 1, run 1, see 2, spot 1, throw 1
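To make the data flow concrete, here is a minimal self-contained Python sketch of the map/shuffle/reduce pipeline above; the plain functions and the `run_mapreduce` driver are illustrative stand-ins, not Hadoop's actual API.

```python
from collections import defaultdict

def map_fn(url, contents):
    # For each word w in contents, emit (w, 1).
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Sum all the 1s in the values list for one word.
    yield (word, sum(counts))

def run_mapreduce(inputs, mapper, reducer):
    # Shuffle: group every (key, value) the mappers emit by key.
    groups = defaultdict(list)
    for key, val in inputs:
        for k, v in mapper(key, val):
            groups[k].append(v)
    # Run the reducer once per unique key.
    return [out for k, vs in sorted(groups.items()) for out in reducer(k, vs)]

docs = [("doc1", "see bob throw"), ("doc2", "see spot run")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('bob', 1), ('run', 1), ('see', 2), ('spot', 1), ('throw', 1)]
```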
Grep

• Input consists of (url+offset, single line)
• map(key=url+offset, val=line):
  – If the contents match the regexp, emit (line, "1")
• reduce(key=line, values=uniq_counts):
  – Don't do anything; just emit the line
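Reusing the toy `run_mapreduce` driver from the word-count sketch, distributed grep is just a different pair of functions (the pattern below is an arbitrary example):

```python
import re

PATTERN = re.compile(r"spot")  # example regexp to search for

def grep_map(key, line):
    # Emit the matching line itself as the key.
    if PATTERN.search(line):
        yield (line, 1)

def grep_reduce(line, counts):
    # Identity reduce: just emit the line.
    yield line

lines = [("doc1:0", "see bob throw"), ("doc2:0", "see spot run")]
print(run_mapreduce(lines, grep_map, grep_reduce))
# ['see spot run']
```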
Reverse Web-Link Graph

• Map
  – For each URL linking to a target, output <target, source> pairs
• Reduce
  – Concatenate the list of all source URLs
  – Outputs <target, list(source)> pairs
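The same driver expresses the reverse web-link graph; here each input pair is (source URL, list of targets it links to), a layout we assume for illustration:

```python
def reverse_map(source, targets):
    # For each URL this source links to, emit <target, source>.
    for target in targets:
        yield (target, source)

def reverse_reduce(target, sources):
    # Concatenate all sources that link to this target.
    yield (target, sorted(sources))

links = [("a.com", ["b.com", "c.com"]), ("b.com", ["c.com"])]
print(run_mapreduce(links, reverse_map, reverse_reduce))
# [('b.com', ['a.com']), ('c.com', ['a.com', 'b.com'])]
```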
Job Processing

The JobTracker coordinates six TaskTrackers (TaskTracker 0 through TaskTracker 5):

1. Client submits the "grep" job, indicating code and input files.
2. JobTracker breaks the input file into k chunks (in this case 6) and assigns work to TaskTrackers.
3. After map(), TaskTrackers exchange map output to build the reduce() keyspace.
4. JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work.
5. reduce() output may go to NDFS.
Execution

[Figure: MapReduce execution overview.]
Parallel Execution

[Figure: parallel execution of map and reduce tasks across workers.]
Refinement: Locality Optimization

• Master scheduling policy:
  – Asks GFS for the locations of replicas of input file blocks
  – Map tasks are scheduled so a GFS input block replica is on the same machine or the same rack
• Effect:
  – Thousands of machines read input at local-disk speed
  – Without this, rack switches limit the read rate
• Combiner:
  – Useful for saving network bandwidth (see the sketch below)
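A combiner is a mini-reduce applied to each mapper's local output before the shuffle. A minimal sketch of the idea (the `combine` function is our illustration, not Hadoop's Combiner interface):

```python
from collections import defaultdict

def combine(map_output):
    # Pre-sum counts per word on the mapper, so the shuffle ships one
    # record per distinct word instead of one record per occurrence.
    local = defaultdict(int)
    for word, count in map_output:
        local[word] += count
    return list(local.items())

mapper_output = [("see", 1), ("bob", 1), ("see", 1), ("see", 1)]
print(combine(mapper_output))
# [('see', 3), ('bob', 1)]: 2 records shipped instead of 4
```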
Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

Data-Parallel (Map-Reduce):
  Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel:
  Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso

Is there more to Machine Learning?
Properties of Graph-Parallel Algorithms

• Dependency graph (e.g., what I like depends on what my friends like)
• Factored computation
• Iterative computation
Map-Reduce for Data-Parallel ML (revisited)

• Excellent for large data-parallel tasks, but can Map-Reduce also cover the graph-parallel column?

Data-Parallel (Map-Reduce):
  Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Map-Reduce?):
  Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
Why not use Map-Reduce for Graph-Parallel Algorithms?
Data Dependencies

• Map-Reduce does not efficiently express dependent data
  – It assumes independent data rows
  – The user must code substantial data transformations
  – Costly data replication
Iterative Algorithms

• Map-Reduce does not efficiently express iterative algorithms:

[Figure: data partitioned across CPU 1, CPU 2, and CPU 3; each iteration ends at a barrier, so one slow processor stalls all the others.]
MapAbuse: Iterative MapReduce

• Only a subset of the data needs computation:

[Figure: the same barriered iterations across CPU 1, CPU 2, and CPU 3, reprocessing every data item each iteration even though only a subset has changed.]
MapAbuse: Iterative MapReduce

• System is not optimized for iteration:

[Figure: the same iterations, now with a startup penalty and a disk penalty at every iteration, since each one launches a fresh job and round-trips through disk.]
Map-Reduce for Data-Parallel ML (resolved)

• Excellent for large data-parallel tasks!

Data-Parallel (Map-Reduce):
  Feature Extraction, Cross Validation, Computing Sufficient Statistics

Graph-Parallel (Map-Reduce? GraphLab):
  Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
The GraphLab Framework

• Graph-based data representation
• Update functions (user computation)
• Scheduler
• Consistency model
Data Graph

A graph with arbitrary data (C++ objects) associated with each vertex and edge.

Example (a social network):
• Graph: the social network
• Vertex data: user profile text, current interest estimates
• Edge data: similarity weights
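A minimal Python sketch of such a data graph (GraphLab itself stores C++ objects; these class and field names are illustrative, not its real API):

```python
class DataGraph:
    """Graph structure plus arbitrary per-vertex and per-edge data."""

    def __init__(self):
        self.vertex_data = {}   # vid -> arbitrary vertex data
        self.edge_data = {}     # (src, dst) -> arbitrary edge data
        self.adjacency = {}     # vid -> list of neighboring vids

    def add_vertex(self, vid, data):
        self.vertex_data[vid] = data
        self.adjacency.setdefault(vid, [])

    def add_edge(self, src, dst, data):
        # Undirected: record the edge once, adjacency in both directions.
        self.edge_data[(src, dst)] = data
        self.adjacency[src].append(dst)
        self.adjacency[dst].append(src)

g = DataGraph()
g.add_vertex("alice", {"profile": "likes cycling", "interests": [0.5, 0.5]})
g.add_vertex("bob", {"profile": "likes racing", "interests": [0.9, 0.1]})
g.add_edge("alice", "bob", {"similarity": 0.8})
```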
Implementing the Data Graph: Multicore Setting

• In memory; relatively straightforward:
  – vertex_data(vid) → data
  – edge_data(vid, vid) → data
  – neighbors(vid) → vid_list
• Challenge: fast lookup, low overhead
• Solution (see the sketch below):
  – Dense data structures
  – Fixed vertex-data and edge-data types
  – Immutable graph structure
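One common way to get dense, low-overhead lookups over an immutable graph is a compressed sparse row (CSR) layout; a sketch assuming integer vertex IDs (not GraphLab's actual implementation):

```python
import array

class CSRGraph:
    """Immutable directed adjacency packed into two flat arrays."""

    def __init__(self, num_vertices, edges):
        # Prefix-sum of out-degrees gives each vertex's slice boundaries.
        degree = [0] * num_vertices
        for src, _ in edges:
            degree[src] += 1
        self.offsets = array.array("l", [0] * (num_vertices + 1))
        for v in range(num_vertices):
            self.offsets[v + 1] = self.offsets[v] + degree[v]
        # Pack all neighbor lists into one flat array.
        self.flat = array.array("l", [0] * len(edges))
        cursor = list(self.offsets[:-1])
        for src, dst in edges:
            self.flat[cursor[src]] = dst
            cursor[src] += 1

    def neighbors(self, vid):
        # O(1) slice lookup, no per-vertex allocation.
        return self.flat[self.offsets[vid]:self.offsets[vid + 1]]

g = CSRGraph(3, [(0, 1), (0, 2), (1, 2)])
print(list(g.neighbors(0)))  # [1, 2]
```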
Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], Wij, Likes[j]) ← scope;

  // Update the vertex data

  // Reschedule neighbors if needed
  if Likes[i] changes then reschedule_neighbors_of(i);
}
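A runnable Python sketch of this update function, built on the `DataGraph` example above; the slide elides the exact update formula, so the weighted-average rule below is our assumption of a standard label-propagation step:

```python
def edge_weight(graph, i, j):
    # Edges were stored once per pair; accept either orientation.
    data = graph.edge_data.get((i, j)) or graph.edge_data.get((j, i))
    return data["similarity"]

def label_prop_update(graph, i, scheduler):
    # Get neighborhood data (Likes[i], Wij, Likes[j]) from the scope.
    old = list(graph.vertex_data[i]["interests"])
    total_w = sum(edge_weight(graph, i, j) for j in graph.adjacency[i])
    if total_w == 0:
        return
    # Update the vertex data: weighted average of neighbors' estimates
    # (assumed formula; the slide omits it).
    new = [0.0] * len(old)
    for j in graph.adjacency[i]:
        w = edge_weight(graph, i, j) / total_w
        for k, v in enumerate(graph.vertex_data[j]["interests"]):
            new[k] += w * v
    graph.vertex_data[i]["interests"] = new
    # Reschedule neighbors if Likes[i] changed materially.
    if max(abs(a - b) for a, b in zip(old, new)) > 1e-3:
        for j in graph.adjacency[i]:
            scheduler.add(j)
```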
The Scheduler

The scheduler determines the order in which vertices are updated.

[Figure: CPU 1 and CPU 2 pull scheduled vertices (a, b, c, ...) from a shared scheduler queue; update functions may push neighbors back onto it.]

The process repeats until the scheduler is empty.
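A minimal single-threaded sketch of that loop (GraphLab runs it across many threads with locking; the FIFO queue here is just one of its scheduler choices):

```python
from collections import deque

class Scheduler:
    """FIFO vertex scheduler with de-duplication of pending vertices."""

    def __init__(self, initial_vertices):
        self.queue = deque(initial_vertices)
        self.pending = set(self.queue)

    def add(self, vid):
        if vid not in self.pending:
            self.pending.add(vid)
            self.queue.append(vid)

    def run(self, graph, update_fn):
        # The process repeats until the scheduler is empty.
        while self.queue:
            vid = self.queue.popleft()
            self.pending.discard(vid)
            update_fn(graph, vid, self)

# Example: start with every vertex scheduled, run label propagation.
# scheduler = Scheduler(g.adjacency)
# scheduler.run(g, label_prop_update)
```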
Ensuring Race-Free Code

• How much can computation overlap?
GraphLab Ensures Sequential Consistency

For each parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: a timeline comparing the parallel execution on CPU 1 and CPU 2 with an equivalent single-CPU sequential execution.]
Consistency Rules

Guaranteed sequential consistency for all update functions.
Full Consistency

[Figure: under full consistency, an update's entire scope (its vertex, adjacent edges, and neighboring vertices) is protected from concurrent access.]
Obtaining More Parallelism

[Figure: relaxing from full to edge consistency shrinks the protected scope, letting more non-adjacent updates run concurrently.]
Edge Consistency

[Figure: CPU 1 and CPU 2 safely update two non-adjacent vertices at once; each has write access to its own vertex and read access to adjacent data.]
Consistency Through R/W Locks

• Read/write locks:
  – Full consistency: write locks on the center vertex and all of its neighbors
  – Edge consistency: a write lock on the center vertex, read locks on adjacent vertices
• Canonical lock ordering prevents deadlock (see the sketch below)
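A sketch of edge-consistency locking with canonical, ID-sorted acquisition (our construction of the idea; Python's standard library has no reader/writer lock, so `RWLock` is a toy stand-in):

```python
import threading

class RWLock:
    """Toy reader/writer lock: many readers or one writer."""

    def __init__(self):
        self._write = threading.Lock()
        self._count = threading.Lock()
        self._readers = 0

    def acquire_read(self):
        with self._count:
            self._readers += 1
            if self._readers == 1:
                self._write.acquire()

    def release_read(self):
        with self._count:
            self._readers -= 1
            if self._readers == 0:
                self._write.release()

    def acquire_write(self):
        self._write.acquire()

    def release_write(self):
        self._write.release()

def lock_scope_edge_consistent(locks, vid, neighbors):
    # Canonical lock ordering: acquire in sorted vertex-ID order so two
    # overlapping scopes always contend in the same order (no deadlock).
    for v in sorted([vid] + list(neighbors)):
        if v == vid:
            locks[v].acquire_write()  # write lock on the updated vertex
        else:
            locks[v].acquire_read()   # read locks on adjacent vertices
```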
Consistency Through Scheduling

• Edge consistency model: two vertices can be updated simultaneously if they do not share an edge.
• Graph coloring: two vertices can be assigned the same color if they do not share an edge.
• So update all vertices of one color, then barrier, then the next: Phase 1, barrier, Phase 2, barrier, Phase 3, barrier (a sketch follows).
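A sketch of that phased execution: greedily color the graph, then update one color class per phase. Same-colored vertices share no edge, so each inner loop could run in parallel without locks under edge consistency (serial here for clarity; `update_fn` is any user update function of a graph and a vertex):

```python
def greedy_color(graph):
    # Give each vertex the smallest color not used by its neighbors.
    color = {}
    for v in graph.adjacency:
        taken = {color[n] for n in graph.adjacency[v] if n in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color

def run_by_color(graph, update_fn):
    color = greedy_color(graph)
    for phase in range(max(color.values()) + 1):
        # Phase k updates every vertex of color k; the barrier between
        # phases is implicit in finishing this loop before the next one.
        for v in [v for v, c in color.items() if c == phase]:
            update_fn(graph, v)
```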