
Data Parallel and Graph Parallel Systems for Large-scale Data Processing

Presenter: Kun Li

Threads, Locks, and Messages

• ML experts repeatedly solve the same parallel design challenges:
  – Implement and debug a complex parallel system
  – Tune for a specific parallel platform
  – Two months later the conference paper contains: "We implemented ______ in parallel."
• The resulting code:
  – is difficult to maintain
  – is difficult to extend
  – couples the learning model to the parallel implementation

... a better answer: Map-Reduce / Hadoop

Build learning algorithms on top of high-level parallel abstractions.

Motivation

• Large-Scale Data Processing
  – Want to use 1000s of CPUs
  – But don't want the hassle of managing things
• MapReduce provides:
  – Automatic parallelization & distribution
  – Fault tolerance
  – I/O scheduling
  – Monitoring & status updates

Map/Reduce

• map(key, val) is run on each item in the input set
  – emits new-key / new-val pairs
• reduce(key, vals) is run for each unique key emitted by map()
  – emits the final output

Example: count word occurrences in docs

map(key=url, val=contents):
  For each word w in contents, emit (w, "1")

reduce(key=word, values=uniq_counts):
  Sum all "1"s in the values list
  Emit result "(word, sum)"

Input documents:  "see bob throw", "see spot run"
Map output:       (see, 1) (bob, 1) (run, 1) (see, 1) (spot, 1) (throw, 1)
Reduce output:    (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1)
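The same flow as a minimal, runnable Python sketch (the in-memory shuffle below stands in for what Hadoop does across machines; names like map_fn are illustrative):

from collections import defaultdict

def map_fn(url, contents):
    # Emit (word, 1) for every word in the document.
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Sum the partial counts for one word.
    yield (word, sum(counts))

docs = {"doc1": "see bob throw", "doc2": "see spot run"}

# The shuffle: group map output by key (the framework does this in Hadoop).
groups = defaultdict(list)
for url, contents in docs.items():
    for word, count in map_fn(url, contents):
        groups[word].append(count)

for word in sorted(groups):
    print(next(reduce_fn(word, groups[word])))
# ('bob', 1) ('run', 1) ('see', 2) ('spot', 1) ('throw', 1)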

Grep

• Input consists of (url+offset, single line)
• map(key=url+offset, val=line):
  – If the line matches the regexp, emit (line, "1")
• reduce(key=line, values=uniq_counts):
  – Don't do anything; just emit the line
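A hedged sketch of grep in the same style (the pattern and input lines are made up for illustration):

import re

pattern = re.compile(r"spot")          # example pattern, chosen arbitrarily

def map_fn(key, line):
    # key is (url, offset); emit the matching line with a dummy count.
    if pattern.search(line):
        yield (line, 1)

def reduce_fn(line, counts):
    # Identity reduce: just emit the line.
    yield line

lines = {("doc1", 0): "see bob throw", ("doc2", 0): "see spot run"}
for key, line in lines.items():
    for matched, count in map_fn(key, line):
        print(next(reduce_fn(matched, [count])))   # prints: see spot run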

Reverse Web-Link Graph

• Map
  – For each URL linking to a target, output <target, source> pairs
• Reduce
  – Concatenate the list of all source URLs
  – Outputs: <target, list(source)> pairs
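Again as a small Python sketch (the example URLs are invented):

from collections import defaultdict

def map_fn(source, outlinks):
    # Invert each link: emit (target, source) for every outgoing link.
    for target in outlinks:
        yield (target, source)

def reduce_fn(target, sources):
    # Concatenate all sources pointing at this target.
    yield (target, list(sources))

web = {"a.com": ["b.com", "c.com"], "b.com": ["c.com"]}

groups = defaultdict(list)
for source, outlinks in web.items():
    for target, src in map_fn(source, outlinks):
        groups[target].append(src)

for target in sorted(groups):
    print(next(reduce_fn(target, groups[target])))
# ('b.com', ['a.com'])  ('c.com', ['a.com', 'b.com'])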

Job Processing

[Figure: a JobTracker coordinating TaskTracker 0 through TaskTracker 5 on the "grep" job.]

1. Client submits the "grep" job, indicating code and input files.
2. JobTracker breaks the input file into k chunks (in this case 6) and assigns work to the TaskTrackers.
3. After map(), TaskTrackers exchange map output to build the reduce() keyspace.
4. JobTracker breaks the reduce() keyspace into m chunks (in this case 6) and assigns work.
5. reduce() output may go to NDFS.
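To make steps 2–5 concrete, here is a toy single-process simulation of chunking, mapping, shuffling into a partitioned reduce keyspace, and reducing; the hash partitioning and helper names are assumptions for illustration, not Hadoop's actual code:

from collections import defaultdict

def run_job(lines, map_fn, reduce_fn, k=6, m=6):
    # Step 2: break the input into k chunks, one per map task.
    chunks = [lines[i::k] for i in range(k)]

    # Step 3: run map() per chunk, then shuffle: hash-partition the
    # emitted keys into the m-way reduce keyspace.
    partitions = [defaultdict(list) for _ in range(m)]
    for chunk in chunks:
        for line in chunk:
            for key, val in map_fn(line):
                partitions[hash(key) % m][key].append(val)

    # Step 4: run reduce() over each partition of the keyspace.
    out = []
    for part in partitions:
        for key, vals in sorted(part.items()):
            out.extend(reduce_fn(key, vals))
    return out   # Step 5: Hadoop would write this to the DFS instead.

Running run_job with the word-count map and reduce from earlier reproduces that example, with the k map chunks and m reduce partitions playing the role of the six TaskTrackers.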

Execution / Parallel Execution

[Figures: step-by-step execution of the "grep" job's map and reduce tasks, first serially, then in parallel.]

Refinement: Locality Optimization

• Master scheduling policy:
  – Asks GFS for the locations of replicas of the input file blocks
  – Map tasks are scheduled so that a GFS replica of their input block is on the same machine or the same rack
• Effect:
  – Thousands of machines read input at local disk speed
  – Without this, rack switches limit the read rate
• Combiner:
  – Useful for saving network bandwidth (sketched below)
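A combiner pre-aggregates map output on the mapper's machine before the shuffle; a minimal sketch for word count (combine_fn is an illustrative name, not Hadoop's API):

from collections import defaultdict

def combine_fn(pairs):
    # Pre-aggregate map output locally, before anything crosses the network.
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

map_output = [("see", 1), ("see", 1), ("bob", 1)]
print(combine_fn(map_output))   # [('see', 2), ('bob', 1)]
# Only 2 pairs cross the network instead of 3; on real workloads
# the savings are much larger.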

Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

[Figure: the machine-learning landscape split into Data-Parallel and Graph-Parallel. Map Reduce covers the data-parallel side (Cross Validation, Feature Extraction, Computing Sufficient Statistics); on the graph-parallel side sit Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, and Lasso.]

Is there more to Machine Learning?

Properties of Graph Parallel Algorithms

• Dependency Graph
• Factored Computation
• Iterative Computation

[Figure: "What I Like" depends on "What My Friends Like"; each vertex's estimate is computed from its neighbors' estimates along the dependency graph.]

[Slide repeats the Data-Parallel vs. Graph-Parallel figure, now asking whether Map Reduce can also cover the graph-parallel side.]

Why not use Map-Reduce for Graph Parallel Algorithms?

Data Dependencies

• Map-Reduce does not efficiently express dependent data
  – User must code substantial data transformations
  – Costly data replication

[Figure: a table of independent data rows; one slow processor holds up the whole job.]

Iterative Algorithms

• Map-Reduce does not efficiently express iterative algorithms:

[Figure: iterative computation as a sequence of MapReduce passes; in every iteration CPUs 1–3 each process their slice of the data, and a barrier separates one iteration from the next.]

MapAbuse: Iterative MapReduce

• Only a subset of the data needs computation:

[Figure: the same three-iteration diagram; every pass reprocesses all of the data even though only a subset changed.]

MapAbuse: Iterative MapReduce

• The system is not optimized for iteration:

[Figure: the same diagram again, annotated with a startup penalty and a disk penalty at every iteration barrier.]

Map-Reduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

[Figure: the Data-Parallel vs. Graph-Parallel split one final time. Map Reduce answers the data-parallel side; GraphLab is the answer for the graph-parallel side (Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso).]

The GraphLab Framework

• Graph-Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Data Graph

A graph with arbitrary data (C++ objects) associated with each vertex and edge.

• Graph: social network
• Vertex data: user profile text, current interest estimates
• Edge data: similarity weights
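A minimal sketch of such a data graph in Python (field names are illustrative; GraphLab itself attaches C++ objects):

from dataclasses import dataclass, field

@dataclass
class VertexData:
    profile_text: str = ""
    interests: dict = field(default_factory=dict)   # current estimates

@dataclass
class EdgeData:
    similarity: float = 0.0

# Graph: a tiny social network.
vertices = {0: VertexData("likes hiking"), 1: VertexData("likes skiing")}
edges = {(0, 1): EdgeData(similarity=0.8)}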

Implementing the Data Graph (Multicore Setting)

• In memory; relatively straightforward:
  – vertex_data(vid) → data
  – edge_data(vid, vid) → data
  – neighbors(vid) → vid_list
• Challenge: fast lookup, low overhead
• Solution:
  – Dense data structures
  – Fixed Vdata & Edata types
  – Immutable graph structure
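One standard way to get dense, low-overhead, immutable storage is a compressed-sparse-row (CSR) layout; a sketch of the idea (a common technique, not necessarily GraphLab's exact implementation):

# CSR-style immutable graph: dense arrays, O(1) vertex lookup,
# neighbors stored contiguously for cache-friendly scans.
# Edges of vertex v live in adj[offset[v]:offset[v+1]].
offset = [0, 2, 3, 3]        # 3 vertices
adj    = [1, 2, 2]           # vertex 0 -> {1, 2}, vertex 1 -> {2}
vdata  = ["v0", "v1", "v2"]  # fixed-type vertex data, indexed by vid
edata  = [0.5, 0.1, 0.9]     # fixed-type edge data, parallel to adj

def neighbors(vid):
    return adj[offset[vid]:offset[vid + 1]]

print(neighbors(0))   # [1, 2]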

[The framework overview slide repeats, now highlighting Update Functions.]

Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], W[i,j], Likes[j]) ← scope;

  // Update the vertex data
  Likes[i] ← sum over neighbors j of W[i,j] × Likes[j];

  // Reschedule neighbors if needed
  if Likes[i] changes then reschedule_neighbors_of(i);
}
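A runnable Python rendering of this update function, with a plain FIFO queue standing in for GraphLab's scheduler; the data layout and the convergence tolerance are assumptions:

from collections import deque

# likes[v] is a vector of topic affinities; w[(i, j)] is edge similarity.
likes = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
w = {(0, 1): 0.5, (1, 0): 0.5, (0, 2): 0.5, (2, 0): 0.5,
     (1, 2): 0.5, (2, 1): 0.5}
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def label_prop(i, queue, tol=1e-3):
    # New estimate: similarity-weighted sum of the neighbors' estimates.
    new = [sum(w[(i, j)] * likes[j][k] for j in nbrs[i])
           for k in range(len(likes[i]))]
    changed = max(abs(a - b) for a, b in zip(new, likes[i])) > tol
    likes[i] = new
    if changed:                      # reschedule neighbors if needed
        queue.extend(nbrs[i])

queue = deque(likes)                 # start with every vertex scheduled
while queue:
    label_prop(queue.popleft(), queue)
print(likes)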

[The framework overview slide repeats, now highlighting the Scheduler.]

The Scheduler

The scheduler determines the order in which vertices are updated.

[Figure: CPU 1 and CPU 2 pull vertices (a, b, c, e, f, h, i, j, ...) from a shared scheduler queue over a graph of vertices a–k; executing an update can push new vertices onto the queue.]

The process repeats until the scheduler is empty.
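The scheduler contract in miniature (a toy single-threaded FIFO version; GraphLab ships several real schedulers):

from collections import deque

def run(scheduler, update_fn):
    # Each worker would run this loop; one thread suffices to show the
    # contract: pop a vertex, apply the update, let the update schedule more.
    while scheduler:
        vertex = scheduler.popleft()
        update_fn(vertex, scheduler)

# Example update that schedules each vertex's successor once.
def update_fn(v, scheduler):
    print("updating", v)
    if v < 3:
        scheduler.append(v + 1)

run(deque([0]), update_fn)   # updating 0, 1, 2, 3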

[The framework overview slide repeats, now highlighting the Consistency Model.]

Ensuring Race-Free Code

• How much can computation overlap?

GraphLab Ensures Sequential Consistency

For each parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: a parallel timeline (CPU 1 and CPU 2) and a sequential timeline (a single CPU) over time, yielding the same final result.]

Consistency Rules

Guaranteed sequential consistency for all update functions.

Full Consistency

[Figure: full consistency. Each update owns its entire scope (its vertex, the adjacent edges, and the neighboring vertices), so updates with overlapping scopes never run at the same time.]

Obtaining More Parallelism

Edge Consistency

[Figure: edge consistency. An update gets write access to its vertex and adjacent edges but only read access to neighboring vertices, so CPU 1 and CPU 2 can safely update vertices two hops apart while overlapping on reads.]

Consistency Through R/W Locks

• Read/Write locks:
  – Full Consistency: write-lock the center vertex, its edges, and its neighbors (Write / Write / Write)
  – Edge Consistency: write-lock the center vertex and its edges, read-lock the neighbors (Read / Write / Read)
• Locks are acquired in a canonical ordering to avoid deadlock.
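A minimal sketch of canonical lock ordering: per-vertex locks are always acquired in ascending vertex id, so two overlapping scopes can never deadlock (the whole-vertex lock granularity here is a simplification):

import threading

locks = {v: threading.Lock() for v in range(5)}   # one lock per vertex

def lock_scope(center, neighbors):
    # Canonical ordering: always acquire in ascending vertex id.
    scope = sorted([center, *neighbors])
    for v in scope:
        locks[v].acquire()
    return scope

def unlock_scope(scope):
    for v in reversed(scope):
        locks[v].release()

scope = lock_scope(2, [1, 3])   # ... apply the update function here ...
unlock_scope(scope)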

Consistency Through Scheduling

• Edge Consistency Model: two vertices can be updated simultaneously if they do not share an edge.
• Graph Coloring: two vertices can be assigned the same color if they do not share an edge, so each color class can be updated in parallel (a sketch follows the phase figure below).

[Figure: execution proceeds color by color: Phase 1, barrier, Phase 2, barrier, Phase 3, barrier.]
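A sketch of color-then-phase execution with a greedy coloring (the graph is made up; real systems color in parallel and more carefully):

nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Greedy coloring: adjacent vertices never share a color.
color = {}
for v in sorted(nbrs):
    used = {color[u] for u in nbrs[v] if u in color}
    color[v] = next(c for c in range(len(nbrs)) if c not in used)

# Phase execution: all vertices of one color are edge-independent,
# so they could be updated in parallel; a barrier separates phases.
for phase in sorted(set(color.values())):
    batch = [v for v in sorted(color) if color[v] == phase]
    print(f"Phase {phase + 1}: update {batch} in parallel")  # then barrier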
