Intel Labs Cluster Computing Architecture
2 Cluster Computing Architecture
• INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
• Intel may make changes to specifications and product descriptions at any time, without notice.
• All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
• Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
• Code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
• Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
• Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
• *Other names and brands may be claimed as the property of others.
• Copyright © 2012 Intel Corporation.
Legal Notices
Peta-scale graphs: The end-to-end challenge
Full Internet Map [Lumeta]
Social Graph [Facebook]
• GraphLab is indeed promising
• But we struggled with feeding it and other practicalities
• Set out to study potential approaches…
#1: Get a budget
#2: Hire ninja coders with ninja resources!
Many of these challenges are solved for small problems... but what about Internet scale?
How do we construct the graph? Analyze it? _____ it? How do we store it? Query it?
[Figure: the end-to-end graph pipeline, with stages Analysis, Transforms, Construction, Storage, Interactive Query, and Viz]
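[Figure: existing tools placed on the pipeline stages]
• Storage and Interactive Query: *[Neo4j], Distributed Graph Databases, GBASE [CMU]
• Viz: Multiscale layout, Graphviz [ATT]
• Analysis: Peta-scale graph mining on … [CMU], [Stanford]
• Transforms: “Delete nodes of degree 10”
• Construction: You.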
The Data Spectrum
<?xml version="1.0"?>
<quiz>
<question>
Are we really separated
by six degrees of
separation?
</question>
<answer>
According to Facebook,
it’s more like 4.74.
</answer>
...
</quiz>
[Figure: the data spectrum, spanning Unstructured Data, Semi-structured Data, and Structured Data; examples shown: XML, EDI, OEM, Relational DBs, Spreadsheets, JSON, Text, Speech, Images, Graph Databases]
Building Graphs for Practical Apps
• What words are most associated with what topics?
– Raw Data: XML Docs
– Pre-processing: Extract Topics & Words
– Graph Formation: Bipartite (Topics, Words)
– Add Network Information: Count Word Frequency
• What does context tell me about the type (person, place, thing) of this noun?
– Raw Data: News Feeds
– Pre-processing: Extract Noun Phrases (NP) and Contexts
– Graph Formation: Bipartite (NP, Context)
– Add Network Information: Count NP Frequency & Initialize type Distribution
• What are the highest ranked pages?
– Raw Data: Web Pages
– Pre-processing: Extract Page URLs and Links
– Graph Formation: Directed Graph
– Add Network Information: N/A
And, in practice and at scale, we must:
• Minimize the use of system resources such as memory and storage
• Ensure GraphLab’s (GL’s) computational effort is load balanced for power-law graphs
• Do our best to ensure the graph we generated is the one we intended
… but the application programmer shouldn’t be
responsible for this domain expertise!
[Figure: pipeline stages Raw Data, Pre-processing, Graph Formation, Add Network Information, and Finalize for Parallel Computation]
GraphBuilder makes it easy.
• Fills a hole in the ecosystem
• Written in Java for easy use in Hadoop MapReduce and applications
• Offloads domain expertise
[Figure: the GraphBuilder pipeline from Raw Data to GraphLab, divided between App-Specific Code and the GraphBuilder Library. Steps shown: Parsing, Tokenization or Feature Extraction; Edge List Generation; E, V Data Tabulation; Graph Normalization; Graph Checks & Transforms; Graph Partitioning & Serialization]
Graph Building is MapReduce-able
[Figure: MapReduce data flow with a SHUFFLE stage]
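To make the "MapReduce-able" claim concrete, here is a minimal single-process sketch in Python. It is not GraphBuilder's Java code, which the slides do not show. The function names (`map_document`, `shuffle`, `reduce_edge`, `build_edge_list`) are hypothetical, and splitting text on whitespace is an assumed stand-in for real parsing. It builds a bipartite (document, word) edge list with word frequencies as edge data.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit one ((doc_id, word), 1) pair per word occurrence."""
    for word in text.lower().split():
        yield (doc_id, word), 1


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_edge(edge, counts):
    """Reduce step: collapse repeated edges into one edge with a frequency."""
    return edge, sum(counts)


def build_edge_list(documents):
    """Run map, shuffle, and reduce in sequence over {doc_id: text}."""
    mapped = [pair for doc_id, text in documents.items()
              for pair in map_document(doc_id, text)]
    grouped = shuffle(mapped)
    return dict(reduce_edge(edge, counts) for edge, counts in grouped.items())
```

Each map call and each reduce call touches only its own input, which is why the same steps can be spread over many machines with the shuffle as the only exchange of data.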
[Figure: graph of People and Interests; vertex labels shown: Kushal, Diana, Nilesh, Danny, Ted, Frank, Ivy, Jay]
Okay, we have a graph. Now what?
Graph Normalization
• We can save memory if we normalize the graph (e.g., this reduced the Wikipedia PageRank graph by 70%)
• But this seems to call for a global lookup in a framework that prefers independent subproblems
• A simple, scalable solution is to “shard” ordered lists:
[Figure: sharded dictionary lookup]
Dictionary: (Aaron,0) (AMD,4) (Brad,1) (CMU,2) (Dan,5) (Dave,3) (IBM,6) (Intel,7)
Unconverted edge list (source sorted), split across Shard 1 and Shard 2: (Aaron,IBM) (Brad,Intel) (Dan,AMD) (Dave,CMU)
After the source lookup (dest sorted): (AMD,5) (CMU,3) (IBM,0) (Intel,1)
Converted edge list on machines M1 and M2: (5,4) (3,2) (0,6) (1,7)
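The sharded lookup can be sketched in Python as follows, using the dictionary and edge list from the figure. This is an illustration under assumptions, not GraphBuilder's implementation: the function names, the case-insensitive ordering, and the single split point `"d"` are all hypothetical choices. The idea is that both lists are sorted by the same key and cut at the same split points, so each shard needs only its own slice of the dictionary.

```python
from bisect import bisect_right

# Dictionary and edge list copied from the slide's figure.
DICTIONARY = {"Aaron": 0, "AMD": 4, "Brad": 1, "CMU": 2,
              "Dan": 5, "Dave": 3, "IBM": 6, "Intel": 7}
EDGES = [("Aaron", "IBM"), ("Brad", "Intel"), ("Dan", "AMD"), ("Dave", "CMU")]


def shard_by_key(pairs, split_points):
    """Sort (key, value) pairs by key and cut them into contiguous key ranges."""
    shards = [[] for _ in range(len(split_points) + 1)]
    for key, value in sorted(pairs, key=lambda pair: pair[0].lower()):
        shards[bisect_right(split_points, key.lower())].append((key, value))
    return shards


def convert_first_field(edges, dictionary, split_points):
    """Look up the first field of every edge, one shard at a time.

    Returns (other_field, integer_id) pairs, so calling this twice converts
    both endpoints and restores the original (source, destination) order.
    """
    dictionary_shards = shard_by_key(dictionary.items(), split_points)
    edge_shards = shard_by_key(edges, split_points)
    converted = []
    for dictionary_shard, edge_shard in zip(dictionary_shards, edge_shards):
        local_lookup = dict(dictionary_shard)  # only this key range is loaded
        converted.extend((other, local_lookup[key]) for key, other in edge_shard)
    return converted


def normalize(edges, dictionary, split_points=("d",)):
    """Convert sources first, then destinations (assumed split point: "d")."""
    by_destination = convert_first_field(edges, dictionary, split_points)
    return convert_first_field(by_destination, dictionary, split_points)
```

With the figure's data, the first pass yields the destination-keyed pairs (IBM,0), (Intel,1), (AMD,5), (CMU,3), and the second pass yields the converted edges (5,4), (3,2), (0,6), (1,7).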
Graph Checks & Transforms
• Would like the ability to:
– Optionally filter duplicate and/or self edges
– Transform a directed graph into an undirected graph
• The library provides:
– Functions to perform self- and duplicate-edge removal
– Directionality transformation
• Solutions are based on a distributed hashing algorithm
[Figure: machines M1 and M2 pass edges (A, B), (B, A), (C, D), and (D, C) through a steering function, H(A, B) or H(C, D), so that both orientations of an edge reach the same detector]
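A small Python sketch of the hashing idea, under stated assumptions: the slides do not give the steering function, so `steer` below uses a CRC32 of the canonical edge as a stand-in, and all names (`canonical`, `steer`, `check_and_transform`) are hypothetical. For an undirected graph, (A, B) and (B, A) share a canonical form, so they hash to the same detector, which can drop the duplicate locally.

```python
import zlib
from collections import defaultdict


def canonical(edge, directed):
    """Return the key used for duplicate detection.

    For an undirected graph both orientations map to the same sorted pair.
    """
    u, v = edge
    return (u, v) if directed else tuple(sorted((u, v)))


def steer(key, num_detectors):
    """Assumed steering function: a deterministic hash of the canonical edge."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_detectors


def check_and_transform(machine_edges, num_detectors=2, directed=False,
                        keep_self_edges=False):
    """Route every edge to a detector; each detector keeps one copy per key."""
    detectors = defaultdict(set)
    for edges in machine_edges:            # one edge list per source machine
        for u, v in edges:
            if u == v and not keep_self_edges:
                continue                    # self-edge removal
            key = canonical((u, v), directed)
            detectors[steer(key, num_detectors)].add(key)
    return {d: sorted(keys) for d, keys in detectors.items()}
```

Because identical keys always land on the same detector, no detector needs to know what the others have seen.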
Graph Partitioning Objectives
• Minimize communications by minimizing the number of machines each vertex v spans
• Maximize the edges placed on each machine, subject to an imbalance factor
[Figure: vertices A, B, C, and D with their edges assigned to machines 1 and 2]
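Both objectives can be measured on a finished edge placement. The Python sketch below is an illustration only: the function names, the input format (a dict from edge to machine id), and the default imbalance factor of 1.1 are assumptions, not values from the slides. The first function computes the average number of machines a vertex spans, and the second checks the per-machine edge count against the imbalance limit.

```python
from collections import Counter, defaultdict


def replication_factor(assignment):
    """Average number of machines each vertex spans.

    `assignment` maps an edge (u, v) to the machine that holds it.
    """
    spans = defaultdict(set)
    for (u, v), machine in assignment.items():
        spans[u].add(machine)
        spans[v].add(machine)
    return sum(len(machines) for machines in spans.values()) / len(spans)


def is_balanced(assignment, num_machines, imbalance=1.1):
    """True if no machine holds more than imbalance * |E| / num_machines edges."""
    load = Counter(assignment.values())
    limit = imbalance * len(assignment) / num_machines
    return max(load.values()) <= limit
```

For example, placing edges (A, B) and (A, C) on machine 1 and edges (A, D) and (C, D) on machine 2 makes A and C span two machines each, for an average of 1.5 machines per vertex.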
Partitioning Strategies
• Random: assigns edges (to systems) at random
• Greedy: uses the global placement history to place edges
• Oblivious: implements a local version of the Greedy strategy
Oblivious Algorithm
[Figure: edges from Machine 1’s shard are placed one at a time into Partition 1 or Partition 2]
• CASE 1: Both endpoints have never been seen before. Randomly assign.
• CASE 2: Both endpoints have been seen before on the same partition. Assign to a partition which contains both endpoints.
• CASE 3: Both endpoints have been seen before but on different partitions. Assign to any partition that contains an endpoint.
• CASE 4: Only one endpoint has been seen before. Assign to a partition that contains the endpoint.
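The four cases translate into a short Python sketch. The class name `ObliviousPartitioner` and its methods are hypothetical, and one detail is an assumption the slides do not state: when several partitions qualify, this sketch picks the one with the fewest edges so far. Each machine would run its own instance over its own shard, so the history in `placed` is purely local.

```python
import random
from collections import defaultdict


class ObliviousPartitioner:
    """Places edges using only the history seen by one machine."""

    def __init__(self, num_partitions, seed=0):
        self.num_partitions = num_partitions
        self.placed = defaultdict(set)      # vertex -> partitions holding it
        self.load = [0] * num_partitions    # edges per partition (local view)
        self.rng = random.Random(seed)

    def _pick(self, candidates):
        # Assumption: break ties toward the least-loaded candidate partition.
        return min(sorted(candidates), key=lambda p: self.load[p])

    def place(self, u, v):
        pu = self.placed.get(u, set())
        pv = self.placed.get(v, set())
        if not pu and not pv:               # CASE 1: neither endpoint seen
            part = self.rng.randrange(self.num_partitions)
        elif pu & pv:                       # CASE 2: a shared partition exists
            part = self._pick(pu & pv)
        elif pu and pv:                     # CASE 3: seen on different partitions
            part = self._pick(pu | pv)
        else:                               # CASE 4: only one endpoint seen
            part = self._pick(pu or pv)
        self.placed[u].add(part)
        self.placed[v].add(part)
        self.load[part] += 1
        return part
```

A Greedy variant would share `placed` and `load` across all machines, which is the global history the Oblivious strategy avoids.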
Replication Results
Twitter Graph: 41M vertices, 1.4B edges
*Gonzalez et al., “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” submitted to OSDI’12
Graph Serialization
• Self-describing data format
– JSON
– JSON with compression
• Extensible
– Easy to extend to alternative frameworks like Giraph
– May port to Graph Databases
[Figure: Partitioning output passes through JSON Encoding into Edge Lists and Vertex Lists]
Edge list record:
{
  "src_id": 34,
  "dest_id": 45,
  "e-data": 30
}
Vertex list record:
{
  "ver_id": 34,
  "v-data": 56,
  "mirror": [1,2,3],
  "owner": 1
}
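A hedged Python sketch of writing and reading these records, using only the field names shown above. The helper names, the one-record-per-line layout, and the choice of gzip for "JSON with compression" are assumptions. The slides do not specify the file layout or the compression codec.

```python
import gzip
import json


def edge_record(src_id, dest_id, e_data):
    """Build an edge record with the field names from the slide."""
    return {"src_id": src_id, "dest_id": dest_id, "e-data": e_data}


def vertex_record(ver_id, v_data, mirror, owner):
    """Build a vertex record; `mirror` lists the partitions holding replicas."""
    return {"ver_id": ver_id, "v-data": v_data, "mirror": list(mirror), "owner": owner}


def dump_records(records, compress=False):
    """Encode records as one JSON object per line, optionally gzip-compressed."""
    payload = "\n".join(json.dumps(record) for record in records).encode("utf-8")
    return gzip.compress(payload) if compress else payload


def load_records(blob, compressed=False):
    """Decode the output of dump_records back into a list of dicts."""
    payload = gzip.decompress(blob) if compressed else blob
    return [json.loads(line) for line in payload.decode("utf-8").splitlines()]
```

Because every record names its own fields, a reader for another framework can pick out the fields it needs and ignore the rest.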
Build it!
GraphBuilder Stack
[Figure: the GraphBuilder stack. Layers shown: HDFS; MapReduce; Distributed Graph Computation (GLv2); GraphBuilder; GraphBuilder app; GraphLab app. GraphBuilder components shown: Built-in Parsers, User defined parser, Tabulations, User defined functions, Data Normalization, Graph Transformation, Graph Partitioning, Framework Adapter, JSON. Legend: User Defined]
Prototype Overview
• Hardware: 8 node cluster
– 1U Dual CPU (Intel SNB) Amazon build ZT systems
– 64 GB Memory, Four SATA Hard Drives
– Intel 10G Adapter and Switch
• Software:
– Apache Hadoop 1.0.1
– GraphLab v2.1
– GraphBuilder beta
Our Wikipedia Graphs
Graph      |V|    |E|
LDA        4.9M   478M   2.23
PageRank   9.7M   107M   2.41
Top 1% of vertices are adjacent to 49% of the edges!
[Figure: panels titled LDA and PageRank]
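A statistic like "top 1% of vertices are adjacent to 49% of the edges" can be computed from an edge list as sketched below in Python. The function name and the exact definition are assumptions, since the slides do not say how the figure was measured. Here an edge counts if at least one endpoint is among the highest-degree vertices.

```python
import math
from collections import Counter


def top_vertex_edge_share(edges, top_fraction=0.01):
    """Fraction of edges touching the top `top_fraction` of vertices by degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    top_count = max(1, math.ceil(top_fraction * len(degree)))
    top = {vertex for vertex, _ in degree.most_common(top_count)}
    touched = sum(1 for u, v in edges if u in top or v in top)
    return touched / len(edges)
```

On a graph with edges (a, b), (a, c), (a, d), and (b, c), the single highest-degree vertex a touches three of the four edges, so the share for the top 25% of vertices is 0.75.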
Scaling Experiment
[Figure: overall time in seconds (axis 0 to 1000) versus dataset size (4x, 2x, 1x), broken down into Pre-processing/Graph Formation, Partitioning, and Finalization]
How did we do at partitioning?
• Random: 7.5 replications/vertex
• Oblivious: 5.7 replications/vertex
For PageRank, the stats were 4.1 and 3.5, respectively.
What’s next for GraphBuilder?
• Improve partitioning quality and performance
• Explore better iterative MapReduce models
• Consider additional library functions
• Push to peta-scale graphs
• Prepare for open sourcing!
The Broader Perspective
• Collaborate with the Big Data ISTC to develop large-scale Graph DBs for GL2 and GraphBuilder
• Bring new technologies to large-scale ML systems, such as fast persistent memory-storage
• Explore new scale-out platform architectures for Big Data and Big Learning
Big Data ISTC
MIT
PSU
Brown
UW Seattle
Stanford
U Tenn Knoxville
UC Santa Barbara
Five focus areas:
• Databases & Analytics
• Math & Algorithms
• Visualization
• Architecture
• Streaming