cluster computing with dryadlinq

Post on 21-Mar-2016

36 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

Cluster Computing with DryadLINQ. Mihai Budiu Microsof t Research Silicon Valley IEEE Cloud Computing Conference July 19, 2008. Goal. Design Space. Grid. Internet. Data- parallel. Dryad. Search. Shared memory. Private d ata center. Transaction. HPC. Latency. Throughput. - PowerPoint PPT Presentation

TRANSCRIPT

Cluster Computing with DryadLINQ

Mihai BudiuMicrosoft Research Silicon Valley

IEEE Cloud Computing ConferenceJuly 19, 2008

2

Goal

3

Design Space

ThroughputLatency

Internet

Privatedata

center

Data-parallel

Sharedmemory

DryadSearch

HPC

Grid

Transaction

4

Data Partitioning

RAM

DATA

DATA

5

Data-Parallel Computation

Storage

Execution

Application

Parallel Databases

Map-Reduce

GFSBigTable

CosmosNTFS

Dryad

DryadLINQScope,PSQL

Sawzall, Pig

6

• Introduction• Dryad • DryadLINQ• Applications

Outline

7

2-D Piping• Unix Pipes: 1-D

grep | sed | sort | awk | perl

• Dryad: 2-D grep1000 | sed500 | sort1000 | awk500 | perl50

8

Virtualized 2-D Pipelines

9

Virtualized 2-D Pipelines

10

Virtualized 2-D Pipelines

11

Virtualized 2-D Pipelines

12

Virtualized 2-D Pipelines• 2D DAG• multi-machine• virtualized

13

Architecture

Files, TCP, FIFO, Networkjob schedule

data plane

control plane

NS PD PDPD

V V V

Job manager cluster

Fault Tolerance

15

• Introduction• Dryad • DryadLINQ• Applications

Outline

16

DryadLINQ

Dryad

17

LINQ = C# + Queries

Collection<T> collection;bool IsLegal(Key);string Hash(Key);

var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};

18

Collection<T> collection;bool IsLegal(Key k);string Hash(Key);

var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};

DryadLINQ = LINQ + Dryad

C#

collection

results

C# C# C#

Vertexcode

Queryplan(Dryad job)Data

19

Data Model

Partition

Collection

C# objects

20

Language Summary

WhereSelectGroupByOrderByAggregateJoinApplyMaterialize

21

Demo

Done

22

Example: Histogrampublic static IQueryable<Pair> Histogram( IQueryable<LineRecord> input, int k){ var words = input.SelectMany(x => x.line.Split(' ')); var groups = words.GroupBy(x => x); var counts = groups.Select(x => new Pair(x.Key, x.Count())); var ordered = counts.OrderByDescending(x => x.count); var top = ordered.Take(k); return top;}

“A line of words of wisdom”

[“A”, “line”, “of”, “words”, “of”, “wisdom”]

[[“A”], [“line”], [“of”, “of”], [“words”], [“wisdom”]]

[ {“A”, 1}, {“line”, 1}, {“of”, 2}, {“words”, 1}, {“wisdom”, 1}]

[{“of”, 2}, {“A”, 1}, {“line”, 1}, {“words”, 1}, {“wisdom”, 1}]

[{“of”, 2}, {“A”, 1}, {“line”, 1}]

23

Histogram Plan

SelectManyHashDistribute

MergeGroupBy

Select

OrderByDescendingTake

MergeSortTake

24

Map-Reduce in DryadLINQ

public static IQueryable<S> MapReduce<T,M,K,S>( this IQueryable<T> input, Expression<Func<T, IEnumerable<M>>> mapper, Expression<Func<M,K>> keySelector, Expression<Func<IGrouping<K,M>,S>> reducer) { var map = input.SelectMany(mapper); var group = map.GroupBy(keySelector); var result = group.Select(reducer); return result;}

25

Map-Reduce Plan

M

R

G

M

Q

G1

R

D

MS

G2

R

static dynamic

X

X

M

Q

G1

R

D

MS

G2

R

X

M

Q

G1

R

D

MS

G2

R

X

M

Q

G1

R

D

M

Q

G1

R

D

MS

G2

R

X

M

Q

G1

R

D

MS

G2

R

X

M

Q

G1

R

D

MS

G2

R

MS

G2

R

map

sort

groupby

reduce

distribute

mergesort

groupby

reduce

mergesort

groupby

reduce

consumer

map

parti

al a

ggre

gatio

nre

ducedynamic

26

• Introduction• Dryad • DryadLINQ• Applications

Outline

27

Applications

Dryad

DryadLINQ

Distributed Data Structures

Machine learningGraphsLog

analysisImage

processingCombinatorialoptimization

Raytracing

Dataanalysis

28

E.g: Linear Algebra

T U Vnmm ,,=, ,

T

Expectation Maximization (Gaussians)

29

• 160 lines • 3 iterations shown

Conclusions

• Dryad = distributed execution environment• Supports rich software ecosystem

• DryadLINQ = Compiles LINQ to Dryad• C# objects and declarative programming• .Net and Visual Studio

for distributed programming

30

31

Dryad Job Structure

grep

sed

sortawk

perlgrep

grepsed

sort

sort

awk

Inputfiles

Vertices (processes)

Outputfiles

ChannelsStage

32

Linear Regression

• Data

• Find

• S.t.

mt

nt yx ,

mnA

tt yAx

},...,1{ nt

33

Analytic Solution

X×XT X×XT X×XT Y×XT Y×XT Y×XT

Σ

X[0] X[1] X[2] Y[0] Y[1] Y[2]

Σ

[ ]-1

*

A

1))(( Ttt t

Ttt t xxxyA

Map

Reduce

34

Linear Regression Code

Vectors x = input(0), y = input(1);Matrices xx = x.Map(x, (a,b) => a.OuterProd(b));OneMatrix xxs = xx.Sum();Matrices yx = y.Map(x, (a,b) => a.OuterProd(b));OneMatrix yxs = yx.Sum();OneMatrix xxinv = xxs.Map(a => a.Inverse());OneMatrix A = yxs.Map(xxinv, (a, b) => a.Mult(b));

1))(( Ttt t

Ttt t xxxyA

35

Dryad = Execution Layer

Job (application)

Dryad

Cluster

Pipeline

Shell

Machine≈

36

• Many similarities• Exe + app. model• Map+sort+reduce• Few policies• Program=map+reduce• Simple• Mature (> 4 years)• Widely deployed• Hadoop

Dryad Map-Reduce

• Execution layer• Job = arbitrary DAG• Plug-in policies• Program=graph gen.• Complex ( features)• New (< 2 years)• Still growing• Internal

37

PLINQ

public static IEnumerable<TSource> DryadSort<TSource, TKey>(IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer, bool isDescending){

return source.AsParallel().OrderBy(keySelector, comparer);}

38

Operations on Large Vectors: Map 1

U

T

T Uf

f

f preserves partitioning

39

V

Map 2 (Pairwise)

T Uf

V

U

T

f

40

Map 3 (Vector-Scalar)T U

fV

V

40

U

T

f

Reduce (Fold)

41

U UU

U

f

f f f

fU U U

U

42

Software Stack

Windows Server

Cluster Services

Distributed Filesystem (Cosmos)

Dryad

Distributed Shell (Nebula)

PSQL

DryadLINQ

PerlSQL

server

C++

Windows Server

Windows Server

Windows Server

C++

CIFS/NTFS

legacycode

sed, awk, grep, etc.

SSISScope

C# Data structures

Applications

C#

Job

queu

eing

, mon

itorin

g

top related