
Intel Labs Cluster Computing Architecture


• INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

• Intel may make changes to specifications and product descriptions at any time, without notice.

• All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

• Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

• Code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.

• Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

• Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

• *Other names and brands may be claimed as the property of others.

• Copyright © 2012 Intel Corporation.

Legal Notices

Peta-scale graphs: The end-to-end challenge


Full Internet Map [Lumeta]

Social Graph [Facebook]

• GraphLab is indeed promising

• But we struggled with feeding it and other practicalities

• Set out to study potential approaches…


#1: Get a budget

#2: Hire ninja coders with ninja resources!

Many of these challenges are solved for small problems... but what about Internet scale?


How do we construct the graph?

Analyze it? _____ it? How do we store it? Query it?


[Figure, built up over five slides: the stages of a graph pipeline (Construction, Transforms, Analysis, Storage, Interactive Query, Viz) and the existing tools that cover each one]

• Storage and Interactive Query: *[Neo4j], Distributed Graph Databases, GBASE [CMU], http://hive.apache.org/, http://www.scidb.org/
• Viz: Multiscale layout, Graphviz [ATT]
• Analysis and Transforms: Peta-scale graph mining on Hadoop [CMU], [Stanford], e.g. “Delete nodes of degree 10” (http://www.cs.cmu.edu/~pegasus/index.htm, http://hadoop.apache.org/)
• Construction: You.

The Data Spectrum


<?xml version="1.0"?>
<quiz>
  <question>
    Are we really separated by six degrees of separation?
  </question>
  <answer>
    According to Facebook, it’s more like 4.74.
  </answer>
  ...
</quiz>

[Figure: data formats placed along a spectrum running from Unstructured Data through Semi-structured Data to Structured Data. Formats shown: XML, EDI, OEM, Relational DBs, Spreadsheets, JSON, Text, Speech, Images, Graph Databases]

Building Graphs for Practical Apps

• What words are most associated with what topics?
  – Raw Data: XML Docs
  – Pre-processing: Extract Topics & Words
  – Graph Formation: Bipartite (Topics, Words)
  – Add Network Information: Count Word Frequency
• What does context tell me about the type (person, place, thing) of this noun?
  – Raw Data: News Feeds
  – Pre-processing: Extract Noun Phrases and Contexts
  – Graph Formation: Bipartite (NP, Context)
  – Add Network Information: Count NP Frequency & Initialize Type Distribution
• What are the highest ranked pages?
  – Raw Data: Web Pages
  – Pre-processing: Extract Page URLs and Links
  – Graph Formation: Directed Graph
  – Add Network Information: N/A

And, in practice and at scale, we must:

• Minimize the use of system resources, like memory, storage, etc.
• Ensure GraphLab’s (GL’s) computational effort is load balanced for power-law graphs
• Do our best to ensure the graph we generated is the one we intended to generate

… but the application programmer shouldn’t be responsible for this domain expertise!

Pipeline: Raw Data → Pre-processing → Graph Formation → Add Network Information → Finalize for Parallel Computation

GraphBuilder makes it easy.


• Fills a hole in the ecosystem

• Written in Java for easy use in Hadoop MapReduce and applications

• Offloads domain expertise

[Figure: pipeline from Raw Data to GraphLab, divided between App-Specific Code and the GraphBuilder Library]

• Parsing, Tokenization or Feature Extraction
• Edge List Generation
• E, V Data Tabulation
• Graph Normalization
• Graph Checks & Transforms
• Graph Partitioning & Serialization

Graph Building is MapReduce-able

[Figure: a MapReduce SHUFFLE stage building a bipartite graph between People and Interests. Vertex labels shown: Kushal, Diana, Nilesh, Danny, Ted, Frank, Ivy, Jay]
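Why graph building fits MapReduce can be seen in a small single-process sketch: a map step emits one (person, interest) edge per parsed record, the shuffle groups edges by source vertex, and a reduce step tabulates each vertex's adjacency list. The record format, the interest values, and the function names (`map_record`, `shuffle`, `reduce_vertex`, `build_graph`) are assumptions made for illustration; they are not GraphBuilder's API.

```python
from collections import defaultdict

def map_record(record):
    """Map: parse one raw 'person: interest, interest' line into edges."""
    person, _, interests = record.partition(":")
    for interest in interests.split(","):
        if interest.strip():
            yield person.strip(), interest.strip()

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_vertex(vertex, neighbors):
    """Reduce: tabulate one vertex's sorted, de-duplicated adjacency list."""
    return vertex, sorted(set(neighbors))

def build_graph(records):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (edge for record in records for edge in map_record(record))
    return dict(reduce_vertex(v, n) for v, n in shuffle(pairs).items())

# Hypothetical input records; only the person names come from the slide.
records = ["Kushal: hiking, chess", "Diana: chess", "Kushal: chess"]
graph = build_graph(records)
```

Each step touches only one record or one key at a time, which is what lets a real framework spread the work across machines.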

Okay, we have a graph. Now what?


Graph Normalization

• We can save memory if we normalize the graph, replacing vertex names with integer IDs (e.g., reduced Wikipedia PageRank graph by 70%)
• But this seems to call for a global lookup in a framework that prefers independent subproblems
• A simple, scalable solution is to “shard” ordered lists:

[Figure: dictionary conversion of an edge list on two machines, M1 and M2]

• Dictionary Shard 1: (Aaron,0) (AMD,4) (Brad,1) (CMU,2)
• Dictionary Shard 2: (Dan,5) (Dave,3) (IBM,6) (Intel,7)
• Unconverted Edge List (Source Sorted): (Aaron,IBM) (Brad,Intel) (Dan,AMD) (Dave,CMU)
• After source conversion (Dest Sorted): (AMD,5) (CMU,3) (IBM,0) (Intel,1)
• Converted Edge List: (5,4) (3,2) (0,6) (1,7)
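The shard-and-join idea can be sketched in a few lines, using the dictionary and edge list from the slide. Each pass routes every edge to the one dictionary shard whose key range covers the field being converted, so no machine needs the whole dictionary. The helper names (`shard_of`, `convert_field`), the single split point `"d"`, and the case-insensitive ordering are assumptions for illustration, not GraphBuilder's implementation.

```python
def shard_of(name, boundaries):
    """Return the index of the ordered shard whose key range holds name."""
    key = name.lower()
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)

def convert_field(edges, field, dictionary_shards, boundaries):
    """Replace one field of every edge using only the matching shard."""
    buckets = [[] for _ in dictionary_shards]
    for edge in edges:                        # route each edge to a shard
        buckets[shard_of(edge[field], boundaries)].append(edge)
    converted = []
    for shard, bucket in zip(dictionary_shards, buckets):
        for edge in sorted(bucket, key=lambda e: e[field].lower()):
            edge = list(edge)
            edge[field] = shard[edge[field]]  # local lookup only
            converted.append(tuple(edge))
    return converted

# Dictionary from the slide, split into two ordered shards.
shards = [{"Aaron": 0, "AMD": 4, "Brad": 1, "CMU": 2},
          {"Dan": 5, "Dave": 3, "IBM": 6, "Intel": 7}]
boundaries = ["d"]  # assumed split: names sorting before "d" go to shard 0

edges = [("Aaron", "IBM"), ("Brad", "Intel"), ("Dan", "AMD"), ("Dave", "CMU")]
half = convert_field(edges, 0, shards, boundaries)  # convert sources
full = convert_field(half, 1, shards, boundaries)   # then destinations
```

Here `full` holds the same four pairs as the slide's converted edge list: (5,4), (3,2), (0,6), (1,7).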

• Would like the ability to:
  – Optionally filter duplicate and/or self edges
  – Transform a directed graph into an undirected graph
• The library provides:
  – Functions to perform self- and duplicate-edge removal
  – Directionality transformation
• Solutions are based on a distributed hashing algorithm



[Figure, animated over four slides: machines M1 and M2 hold the edges (A, B), (B, A), (C, D), and (D, C). A steering function hashes each edge, H(A, B) or H(C, D), so that an edge and its reverse arrive at the same Detector.]
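A minimal sketch of hash-based steering, assuming the hash is taken over the unordered endpoint pair so that (A, B) and (B, A) always land on the same detector. The function names (`steer`, `clean_edges`), the choice of CRC32 as the hash, and the in-memory `set` per detector are illustrative assumptions; the slides do not specify them.

```python
import zlib

def steer(edge, num_detectors):
    """Steering function: hash the unordered endpoint pair so that
    (A, B) and (B, A) are always routed to the same detector."""
    lo, hi = sorted(edge)
    return zlib.crc32(f"{lo}\t{hi}".encode()) % num_detectors

def clean_edges(edges, num_detectors=2, undirected=True):
    """Drop self edges and duplicates; optionally merge reverse edges."""
    detectors = [set() for _ in range(num_detectors)]
    kept = []
    for src, dst in edges:
        if src == dst:                       # self edge
            continue
        key = tuple(sorted((src, dst))) if undirected else (src, dst)
        seen = detectors[steer((src, dst), num_detectors)]
        if key not in seen:                  # first sighting at this detector
            seen.add(key)
            kept.append(key)
    return kept

edges = [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C"), ("A", "A"), ("A", "B")]
undirected = clean_edges(edges)
directed = clean_edges(edges, undirected=False)
```

Because each detector only ever compares edges that hash to it, the check needs no global view of the edge list.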






Graph Partitioning Objectives

• Minimize communications by minimizing the number of machines each vertex v spans
• Maximize the edges placed on each machine, subject to an imbalance factor

[Figure, two slides: a four-vertex graph (A, B, C, D) partitioned across machines 1 and 2]
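Both objectives can be measured for any edge-to-machine assignment. The sketch below computes the average number of machines a vertex spans (its replication) and the ratio of the heaviest machine's edge count to the mean. The function name `partition_quality`, the example assignment, and the exact form of the imbalance ratio are assumptions for illustration.

```python
from collections import defaultdict

def partition_quality(assignment, num_machines):
    """assignment maps each edge (u, v) to the machine that stores it."""
    spans = defaultdict(set)           # vertex -> machines it appears on
    load = [0] * num_machines          # edges stored per machine
    for (u, v), machine in assignment.items():
        spans[u].add(machine)
        spans[v].add(machine)
        load[machine] += 1
    replication = sum(len(m) for m in spans.values()) / len(spans)
    imbalance = max(load) / (len(assignment) / num_machines)
    return replication, imbalance

# Hypothetical placement of four edges on two machines.
assignment = {("A", "B"): 0, ("A", "C"): 0, ("B", "D"): 1, ("C", "D"): 1}
replication, imbalance = partition_quality(assignment, num_machines=2)
```

In this example B and C each span both machines while A and D span one, so the average is 1.5 machines per vertex with perfectly even edge load.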

Partitioning Strategies

• Random: random edge assignment (to systems)
• Greedy: uses the global placement history to place edges
• Oblivious: implements a local version of the Greedy strategy


Oblivious Algorithm

[Figure, animated over several slides: an example graph (vertices A through K) is split between Machine 1 and Machine 2. Each machine places the edges of its own shard into Partition 1 or Partition 2 according to the cases below.]

• CASE 1: Both end points have never been seen before. Randomly assign.
• CASE 2: Both end points have been seen before on the same partition. Assign to a partition which contains both endpoints.
• CASE 3: Both end points have been seen before but on different partitions. Assign to any partition that contains an endpoint.
• CASE 4: Only one end point has been seen before. Assign to a partition that contains the endpoint.
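The four cases translate fairly directly into code. In the sketch below, each machine would run `place_edges` on its own shard, keeping only local history. The function name, the seeded random generator, and the rule of preferring the least loaded candidate where the slides say "any partition" are assumptions for illustration.

```python
import random

def place_edges(edges, num_partitions, seed=0):
    """Place edges using only this machine's own placement history."""
    rng = random.Random(seed)
    seen = {}                        # vertex -> partitions already holding it
    load = [0] * num_partitions      # edges placed per partition so far
    placement = {}
    for u, v in edges:
        pu, pv = seen.get(u, set()), seen.get(v, set())
        if not pu and not pv:
            # CASE 1: neither endpoint seen before, assign randomly.
            choice = rng.randrange(num_partitions)
        else:
            # CASE 2: a shared partition exists, so stay inside it.
            # CASE 3 and 4: otherwise any partition holding an endpoint.
            candidates = (pu & pv) or (pu | pv)
            # Assumed tie-break: prefer the least loaded candidate.
            choice = min(sorted(candidates), key=lambda p: load[p])
        placement[(u, v)] = choice
        load[choice] += 1
        seen.setdefault(u, set()).add(choice)
        seen.setdefault(v, set()).add(choice)
    return placement

edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("A", "D")]
placement = place_edges(edges, num_partitions=2)
```

Only the first edge touching a previously unseen pair is placed at random; every later edge follows an endpoint it has already seen, which is what keeps replication down.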

Replication Results

Twitter Graph: 41M vertices, 1.4B edges

*Gonzalez et al., “PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs,” submitted to OSDI’12

Graph Serialization

• Self-describing data format

– JSON

– JSON with compression

• Extensible

– Easy to extend to alternative frameworks like Giraph

– May port to Graph Databases


[Figure: Partitioning → JSON Encoding → Edge Lists and Vertex Lists]

Edge record:

{
  "src_id": 34,
  "dest_id": 45,
  "e-data": 30
}

Vertex record:

{
  "ver_id": 34,
  "v-data": 56,
  "mirror": [1, 2, 3],
  "owner": 1
}
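A minimal sketch of writing and reading such records, assuming one JSON object per line and gzip for the compressed variant. The field names come from the slide; the function names (`encode`, `decode`), the line-per-record layout, and the choice of gzip are assumptions, since the slides do not name the compression scheme.

```python
import gzip
import json

def encode(records, compress=False):
    """Serialize records as newline-delimited JSON, optionally gzipped."""
    text = "".join(json.dumps(record) + "\n" for record in records)
    data = text.encode("utf-8")
    return gzip.compress(data) if compress else data

def decode(data, compress=False):
    """Inverse of encode: return the list of record dictionaries."""
    if compress:
        data = gzip.decompress(data)
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]

edge = {"src_id": 34, "dest_id": 45, "e-data": 30}
vertex = {"ver_id": 34, "v-data": 56, "mirror": [1, 2, 3], "owner": 1}

plain = encode([edge, vertex])
packed = encode([edge, vertex], compress=True)
```

Because every record carries its own field names, a reader for another framework can pick out the keys it needs and ignore the rest.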

Build it!


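GraphBuilder Stack

[Figure: layered stack. A GraphBuilder app sits on GraphBuilder, which runs on MapReduce; a GraphLab app sits on Distributed Graph Computation (GLv2); both stacks sit on HDFS. Some components are marked User Defined.]

GraphBuilder components:

• Built-in Parsers
• User defined parser
• Tabulations
• User defined functions
• Data Normalization
• Graph Transformation
• Graph Partitioning
• Framework Adapter (JSON)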

Prototype Overview

• Hardware: 8 node cluster
  – 1U Dual CPU (Intel SNB) Amazon build ZT systems
  – 64 GB Memory, Four SATA Hard Drives
  – Intel 10G Adapter and Switch
• Software:
  – Apache Hadoop 1.0.1
  – GraphLab v2.1
  – GraphBuilder beta


Our Wikipedia Graphs

Graph      |V|     |E|     [header missing]
LDA        4.9M    478M    2.23
PageRank   9.7M    107M    2.41

Top 1% of vertices are adjacent to 49% of the edges!

[Figure: two panels, LDA and PageRank]
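A skew statistic of this kind can be computed from an edge list by ranking vertices by degree and counting the edges that touch the top fraction. The sketch below is one plausible way to do it; the function name `top_share` and the toy graph are assumptions, and this is not the script behind the slide's number.

```python
from collections import Counter

def top_share(edges, fraction=0.01):
    """Fraction of edges adjacent to the highest-degree vertices."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    k = max(1, int(len(degree) * fraction))      # size of the top group
    top = {vertex for vertex, _ in degree.most_common(k)}
    touched = sum(1 for u, v in edges if u in top or v in top)
    return touched / len(edges)

# Hypothetical star-like graph: hub 0 linked to 1..98, plus one extra edge.
edges = [(0, i) for i in range(1, 99)] + [(98, 99)]
share = top_share(edges)
```

With 100 vertices the top 1% is the single hub, which touches 98 of the 99 edges.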

Scaling Experiment

[Figure: stacked bar chart of Overall Time (Seconds), axis from 0 to 1000, against Dataset Size (1x, 2x, 4x), broken down into Pre-processing/Graph Formation, Partitioning, and Finalization]

How did we do at partitioning?

• Random: 7.5 replications/vertex
• Oblivious: 5.7 replications/vertex

For PageRank, the stats were 4.1 and 3.5, respectively.

What’s next for GraphBuilder?

• Improve partitioning quality and performance
• Explore better iterative MapReduce models
• Consider additional library functions
• Push to peta-scale graphs
• Prepare for open sourcing!


The Broader Perspective

• Collaborate with the Big Data ISTC to develop large-scale Graph DBs for GL2 and GraphBuilder
• Bring new technologies to large-scale ML systems, such as fast persistent memory-storage
• Explore new scale-out platform architectures for Big Data and Big Learning


Big Data ISTC

MIT

PSU

Brown

UW Seattle

Stanford

U Tenn Knoxville

UC Santa Barbara

Five focus areas:

• Databases & Analytics
• Math & Algorithms
• Visualization
• Architecture
• Streaming
