on the convergence of cloud computing and desktop...

175
On the Convergence of Cloud Computing and Desktop Grids Presented by Derrick Kondo Many Slides by Jeff Barr, Amazon Inc. and Jeff Dean, Sanjay Ghemawat, Google, Inc.

Upload: others

Post on 24-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

On the Convergence of Cloud Computing and Desktop Grids

Presented by Derrick KondoMany Slides by

Jeff Barr, Amazon Inc.and Jeff Dean, Sanjay Ghemawat, Google, Inc.

Page 2: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Outline Cloud Computing

Background Architecture Map-Reduce

Desktop Grids Background & contract with clouds Architecture Prediction

Page 3: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Motivation

70% of Web Development Effort is “Muck”: Data Centers Bandwidth / Power / Cooling Operations Staffing

Scaling is Difficult and Expensive: Large Up-Front Investment Invest Ahead of Demand Load is Unpredictable

Page 4: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Dream or Nightmare?

Slashdot/Digg/TechCrunch EffectRapid, unexpected customer demand/growth

Same true for scien/fic workloads

Page 5: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Solution: Cloud Computing

Scale capacity on demandTurn fixed costs into variable costsAlways availableRock-solid reliabilitySimple APIs and conceptual modelsCost-effectiveReduced time to marketFocus on product & core competencies

Page 6: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

What is a cloud? Cloud computing is Internet-based ("cloud") development

and use of computer technology ("computing"). -- Wikipedia

A cloud is a distributed system where the user doesn't care exactly what resources are used to carry out an operation -- Prof. Douglas Thain

"A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider åand consumers.” -- Prof Raj Buyya

6

Page 7: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Cloud Providers Large-scale centralized systems

Low reliability, low-cost commodity components

Google 100,000 systems in 15 data centers [2005]

– Recent estimate: 500,000 systems in 30 data centers

7

!"#$ %&'()*+*,-.*$ /0*)$ (-'-#,0+$ /##)"#'-1$ /$ ./'()2.)1"0-$

1-*&3#$,4/,$,4-+$50/#$,)$4/6-$(-/1+$&#$,4-$*"..-($)7$899:;$$<4-&($

/55()/'4$4/*$./#+$*&.&0/(&,&-*$ ,)$ ,4-$)#-$1-*'(&=-1$4-(-$>?&3"(-$

@A;$ <4-$ '"((-#,$ !"#$ 5(),),+5-$ &*$ =/*-1$ "5)#$ /$ 8927)),$ *4&55&#3$

')#,/&#-($B&,4$ 8C8$ *+*,-.*;$ $ <4&*$ &*$ ()"340+$ 4/07$ ,4-$ 1-#*&,+$ )7$

,4-$D/'E/=0-$1-*&3#F$B&,4$&,*$GFG@8$*+*,-.*$&#$/$C927)),$')#,/&#-(;$$

$

$

!"#$%&'()'*$+',"-%./0/1&2/'345-6'3.7'

$

H"($ /55()/'4$ ="&01*$ )#$ ,4-$B)(E$ 1-*'(&=-1$ /=)6-$ &#$,4-$<-0')F$

5)B-($ 3-#-(/,&)#F$ /#1$ 1/,/$ *,)(/3-$ 1)./&#*$ /#1$ /550&-*$ &,$ ,)$

.)1"0/($ 1/,/$ '-#,-($ ')#*,("',&)#$ /#1I)($ -J,-#*&)#*;$K))30-$./+$

/0*)$4/6-$(-0/,-1$B)(E$"#1-(B/+$/,$,4-&($L/0&7)(#&/$4-/1M"/(,-(*F$

=",$1-,/&0*$#-6-($=--#$(-0-/*-1;$ $<4&*$B)(E$(-./&#*$"#')#7&(.-1$

/#1$)#0+$ (-5)(,-1$=+$/#$/",4)($5"=0&*4&#3$"#1-($ ,4-$5*-"1)#+.$

D)=-(,$N;$L(&#30-+$O@P;$

$

(8! 9:*;<=';,>?;@AB;C=*'A=9'!DBDE:'

FCEG'<4-$ (-'-#,$ B)(E$ )7$ D/'E/=0-$ !+*,-.*$ /#1$ !"#$ %&'()*+*,-.*$

*"33-*,*$ ,4/,$ ,4&*$ ')#,/&#-(2=/*-1$ *+*,-.*$ /55()/'4$ ./+$ =-$

')..-('&/00+$ 7-/*&=0-;$ $ %"'4$ 4/(1B/(-$ /#1$ *)7,B/(-$ B)(E$

(-./&#*$ ,)$ =-$ 1)#-$ &#$ (-7&#&#3$ ,4-$ 1/,/$ '-#,-($ 1-*&3#$ 5)&#,;$$

!5-'&7&'/00+F$ /(-$ ,4-$#)#27&-012*-(6&'-/=0-$')#,/&#-(*$5()5)*-1$ &#$

,4&*$5/5-($/$5(/',&'/0$1-*&3#$5)&#,Q$R/*&'/00+F$'/#$/$')#,/&#-($=-$

/**-.=0-1$ /,$ ,4-$ 7/',)(+$B&,4$ -#)"34$ (-1"#1/#'+$ ,)$("#$ 7)($ ,4-$

7"00$ S2+-/($ 50"*$ /.)(,&T/,&)#$ '+'0-$B&,4)",$ (-M"&(&#3$*-(6&'-Q$ $ U7$

,4-$ #)#27&-01$ *-(6&'-/=0-$ 1-*&3#$ /55()/'4$ &*$ ')*,$ -77-',&6-F$ ,4-#$

."'4$4&34-($1-#*&,&-*$/(-$5)**&=0-$ *&#'-$/&*0-$ *5/'-$ &*$#)$ 0)#3-($

#--1-1$ /#1$ .)(-$ ,&34,0+$ &#,-3(/,-1$ '))0&#3$ *+*,-.$ 1-*&3#*$ /(-$

5(/',&'/0;$$K-#-(/00+F$*+*,-.$*-(6&'-$')#*,(/&#,*$./E-$1&(-',$0&M"&1$

'))0&#3$1&77&'"0,$,)$"*-$&#$/$7&-012*-(6&'-/=0-$1-*&3#;$$D/'E/=0-$&*$

,/E&#3$/$*,-5$&#$,4&*$1&(-',&)#$=+$50/'&#3$.&#&2LDVL$"#&,*$=-*&1-$

-/'4$ (/'E;$ $ W)B-6-(F$ ,4-+$ *,&00$ "*-$ /&(2,)2B/,-($ '))0&#3$ &#$ ,4-&($

/55()/'4;$

$

!-6-(/0$ 6-#1)(*$ '"((-#,0+$ )77-($ .)(-$ 5)B-(2-77&'&-#,$ 1-*&3#*;$$

D/,4-($ ,4/#$ 4/6&#3$ /#$ &#-77&'&-#,$ *B&,'4&#3$ 5)B-($ *"550+$ B&,4$

-/'4$ *+*,-.F$ /$ (-',&7&-(I,(/#*7)(.-($ "#&,$ &*$ 5()6&1-1$ /,$ ,4-$ (/'E$

0-6-0$/#1$XL$5)B-($ &*$1&*,(&=",-1$ ,)$/00$*+*,-.*$B&,4&#$,4-$(/'E;$$

<)$ /6)&1$ 0)**-*$ )#$ ,4-$ B/+$ ,)$ ,4-$ (/'EF$ 4&34$ 6)0,/3-$ VL$ &*$

1&*,(&=",-1F$ B&,4$ CY9ZVL$ =-&#3$ /$ ')..)#$ '4)&'-;$ [*&#3$

-77&'&-#,$ XL$ ,(/#*7)(.-(*$ )#$ -/'4$ *+*,-.$ ')"50-1$ B&,4$ 4&34$

-77&'&-#'+$ (/'E20-6-0$ VL$ ,)$ XL$ (-',&7&-(I,(/#*7)(.-(*$ +&-01*$

*&3#&7&'/#,$5)B-($*/6&#3*;$U#$/11&,&)#$,)$=-&#3$.)(-$-77&'&-#,F$,4&*$

/55()/'4$4/*$5()6-#$,)$=-$.)(-$(-0&/=0-$&#$7&-01$"*/3-$/*$B-00;$

$

<4-$ 1-*&3#$ )7$ =(&#3&#3$ 4&34$ 6)0,/3-$ VL$ ,)$ ,4-$ (/'E$ /#1$ ,4-#$

1&*,(&=",&#3$0)B$6)0,/3-$XL$,)$-/'4$*+*,-.$4/*$1(/B=/'E*;$\&,4-($

XL$ 0)**-*$ ,)$ ,4-$ *+*,-.$."*,$=-$ ,)0-(/,-1$)($ ,4-$ *&T-$)7$ ,4-$="*$

=/(*$'/((+&#3$,4-$XL$."*,$=-$&#'(-/*-1;$$[*&#3$4&34-($6)0,/3-$XL$

1&*,(&=",&)#$)#$,4-$(/'E$4/*$*).-$5),-#,&/0$3/&#$=",F$/*$,4-$6)0,/3-$

'0&.=*F$ *)$ 1)-*$ ,4-$ (&*E$ ,)$ )5-(/,&)#*$ 5-(*)#/0;$ $ U7$ *+*,-.*$ '/#$

)#0+$=-$*/7-0+$.)6-1$=+$-0-',(&'&/#*F$')*,*$-*'/0/,-$,)$,4-$5)&#,$)7$

')#*".&#3$ /#+$ 5),-#,&/0$ */6&#3*;$ $ U#$ /$ #)#27&-012*-(6&'-/=0-$

')#,/&#-(F$ 4)B-6-(F$ 4&34-($ 6)0,/3-$ 0-6-0*$ ')"01$ =-$ -.50)+-1$

B&,4)",$,4&*$1)B#*&1-$')*,;$$]4/,$5)B-($1&*,(&=",&)#$&##)6/,&)#*$

/(-$5)**&=0-$&7$,4-$7&-01$*-(6&'-$')*,$')#*,(/&#,$&*$(-.)6-1Q$

$

U#$ ,4-$ 1-*&3#$ 1-*'(&=-1$ 4-(-F$ ,4-$1/,/$ '-#,-($ ="&01&#3$=0)'E$ &*$ /$

B-/,4-(5())7$ *4&55&#3$')#,/&#-($/#1$ ,4-*-$'/#$=-$*,/'E-1$S$ ,)$@$

4&34;$$]4/,$&*$,4-$&1-/0$1/,/$'-#,-($1-*&3#$B&,4$,4&*$.)1-0Q$$L/#$

B-$ 3)$ B&,4$ /$ *./00$ '-#,(/0$ #-,B)(E&#3F$ 5)B-(F$ '))0&#3F$ /#1$

*-'"(&,+$7/'&0&,+$*"(()"#1-1$=+$*,/'E-1$')#,/&#-(*Q$ $H($ &*$&,$')*,2

^"*,&7&-1$,)$E--5$,4-$')#,/&#-(*$&#$/$="&01&#3Q$$L"((-#,$1/,/$'-#,-($

1-*&3#$ ,4-)(+$ 7/6)(*$ ,4-$ G9$ ,)$ 892.-3/B/,,$ (/#3-$ /*$ ,4-$ &1-/0$

7/'&0&,+$ *&T-;$ $ W)B$ 1)-*$ ,4-$ .)1"0/($ 1/,/$ '-#,-($ '4/#3-$ ,4&*$

1-*&3#$5)&#,Q$$$

$

_)B-($1&*,(&=",&)#$/#1$-M"&5.-#,$')*,*$-J'--1$C9`$)7$,4-$')*,$)7$

/$ ,+5&'/0$ 1/,/$ '-#,-(F$ B4&0-$ ,4-$ ="&01&#3$ &,*-07$ ')*,*$ ^"*,$ )6-($

G@`S;$ $ V*$ /$ (-*"0,F$ /$ 1&77&'"0,$ ,(/1-2)77$ ."*,$ =-$ ./1-$ B4-#$

*-0-',&#3$ 5)B-($ 1-#*&,+;$ $ U7$ ,4-$ 1/,/$ '-#,-($ &*$ 5()6&*&)#-1$ ,)$

*"55)(,$ 6-(+$ 4&34$ 5)B-($ 1-#*&,&-*$ >4&34$ 5)B-($ ,)$ 70))($ *5/'-$

(/,&)A$ ,4-$ (&*E$ &*$ ,4/,$ *).-$ )7$ ,4-$ 5)B-($ B&00$ 3)$ "#"*-1$ &7$ ,4-$

/',"/0$(/'E*$,4/,$/(-$&#*,/00-1$')#*".-$0-**$5)B-(I*M"/(-$7)),$,4/#$

,4-$1/,/$'-#,-($1-*&3#$5)&#,;$ $<4&*$B)"01$B/*,-$-J5-#*&6-$5)B-($

-M"&5.-#,;$ $H#$ ,4-$),4-($ 4/#1F$ &7$ ,4-$1/,/$ '-#,-($ &*$ 5()6&*&)#-1$

B&,4$ /$ 0)B-($ 5)B-($ 1-#*&,+F$ ,4-$ (&*E$ &*$ ,4/,$ 70))($ *5/'-$ B&00$ =-$

B/*,-1;$ $ ]/*,-1$ 70))($ *5/'-$ &*F$ 4)B-6-(F$ ."'4$ '4-/5-($ ,4/#$

B/*,-1$ 5)B-($ '/5/'&,+;$ $ R-'/"*-$ ,4&*$ ,(/1-2)77$ &*$ 4/(1$ ,)$ 3-,$

-J/',0+$ ')((-',F$.)*,$ 7/'&0&,&-*$ -(($ ,)B/(1*$5()6&*&)#&#3$ ,)$ 0)B-($

5)B-($ 1-#*&,&-*$ *&#'-$ /$ .&*,/E-$ &#$ ,4&*$ 1&(-',&)#$ &*$ ."'4$ 0-**$

')*,0+$ ,4/#$ 5()6&1&#3$ .)(-$ 5)B-($ ,4/#$ '/#$ =-$ "*-1;$ $ V*$ /$

')#*-M"-#'-F$.)*,$ 7/'&0&,&-*$ /(-$5)B-($=)"#1$B&,4$."'4$B/*,-1$

70))($ *5/'-;$ $ <4&*$ &*$ /#$ "##-'-**/(+$ ')*,$ ,4/,$ /0*)$ #-3/,&6-0+$

&.5/',*$'))0&#3$/#1$5)B-($1&*,(&=",&)#$-77&'&-#'+;$$ U#$/$.)1"0/($

1-*&3#F$B-$'/#$-/*&0+$/1^"*,$1/,/$'-#,-($a70))($*5/'-b$,)$./,'4$,4-$

5)B-($ *"550+$ /#1$ 1&*,(&=",&)#$ '/5/=&0&,&-*$ )7$ ,4-$ '-#,-(;$ $ W)B$

1)-*$,4&*$(-1"',&)#$&#$')#*,(/&#,*$&.5/',$1/,/$'-#,-($1-*&3#Q$

$

<4-$ .)1"0/($ 1/,/$ '-#,-($ )5-#*$ "5$ ,4-$ 5)**&=&0&,+$ )7$ ./**&6-0+$

1&*,(&=",-1$ *-(6&'-$ 7/=(&'*;$ $ D/,4-($ ,4/#$ ')#'-#,(/,-$ /00$ ,4-$

(-*)"('-*$ &#$/$ *./00$#".=-($)7$ 0/(3-$1/,/$'-#,-(*$B&,4$G9$ ,)$892

.-3/B/,,$ 5)B-($ /55-,&,-*$ /#1$ 5()1&3&)"*$ #-,B)(E&#3$

(-M"&(-.-#,*F$'/#$B-$1&*,(&=",-$,4-$').5",&#3$'0)*-($,)$B4-(-$&,$&*$

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ $$$$$$$$$$$$$$$$$$$$$$$$$

S$]&#1)B*$c&6-$H5-(/,&)#*$')*,$1/,/;$

1,152 systems in 20x8x8 foot container

Page 8: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Types of Clouds Platform-as-a-service

E.g. Amazon’s EC2

Applications-as-a-service E.g. Google App Engine (DataStore/GQL,

MapReduce)

8

Page 9: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Google App Engine Run web applications (Python-based) API for data store, google accounts, URL

fetching, image manip., email Web-based admin console Free with up to 500MB of storage and 5

million views

9

Page 10: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Infrastructure Services

Compute

Store Message

Page 11: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Infrastructure Services

Compute

Store Message

Elas/c ComputeCloud

Simple Storage Service

Simple QueueService

Page 12: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Amazon Simple Storage Service

• 1 B – 5 GB / object• Fast, Reliable, Scalable• Redundant, Dispersed• 99.99% Availability Goal• Private or Public• Per‐object URLs & ACLs• BitTorrent Support

Page 13: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Pricing in EuropeStorage

    * $0.180 per GB – first 50 TB / month of storage used

    * $0.170 per GB – next 50 TB / month of storage used    * $0.160 per GB – next 400 TB / month of storage used    * $0.150 per GB – storage used / month over 500 TB

Data Transfer

    * $0.100 per GB – all data transfer in

    * $0.170 per GB – first 10 TB / month data transfer out    * $0.130 per GB – next 40 TB / month data transfer out    * $0.110 per GB – next 100 TB / month data transfer out    * $0.100 per GB – data transfer out / month over 150 TB

Requests

    * $0.012 per 1,000 PUT, COPY, POST, or LIST requests    * $0.012 per 10,000 GET and all other requests*

Page 14: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Amazon S3 Concepts

Objects:Opaque data to be stored (1 byte … 5 Gigabytes)Metadata (attribute-value, up to 4KB)

Authentication and access controls

Buckets (like directories):Object container – any number of objects100 buckets per account / buckets are “owned”

Keys:Unique object identifier within bucketUp to 1024 bytes longFlat object storage model

Functionality

– Simple put/get functionality – Limited search functionality – Objects are immutable, cannot be renamed

Standards-Based Interfaces:REST and SOAPURL-Addressability – every object has a URL

2‐level namespace

Page 15: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 16: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 17: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Amazon Elastic Compute Cloud

EC2

Page 18: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Amazon EC2 Virtual environment for linux/windows

applications Create Amazon Machine Image (AMI) with

app’s, lib’s, data, config settings, Upload image to S3, then start/stop/monitor

images

17

Page 19: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Amazon EC2 Features

•Elas/c: can increase number of resources as needed•Configurability: can configure hard resources (as instances) or soaware stack: OS, lib’s, app’s with root access•Reliability: 99.99%•For applica/ons•Persistant storage (independant of life of instance) •Mul/ple loca/ons: availability zones•Sta/c IP addresses associated with account (not instance)•Can remap IP addresses to another instance or availability zone as needed

Page 20: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Amazon EC2 Concepts

Amazon Machine Image (AMI):Bootable root diskPre-defined or user-builtCatalog of user-built AMIs

Instance:Running copy of an AMILaunch in less than 2 minutesStart/stop programmatically

Network Security Model:Explicit access controlSecurity groups

Inter-service bandwidth is free

Page 21: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Standard Instances

Small Instance (Default) 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform Extra Large Instance 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platformEC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

20

Page 22: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Large instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications. High-CPU Medium Instance 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform21

Page 23: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Operating Systems and Software

22

•Opera/ng Systems        •Red Hat Enterprise Linux  Windows Server 2003   Oracle Enterprise Linux•OpenSolaris   openSUSE Linux   Ubuntu Linux•Fedora   Gentoo Linux   Debian

•Soaware•Databases•Oracle 11g, MySQL Enterprise, Microsoa SQL Server Standard 2005

•Batch Processing•Hadoop,   Condor

•eb Hos/ng•Apache HTTP, IIS/Asp.Net     

Page 24: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Pricing

Pay as you useStandard Instances

LinuxSmall (Default) $0.10 per hourLarge $0.40 per hourExtra Large $0.80 per hour

High CPU Instances Medium $0.20 per hourExtra Large $0.80 per hour

Internet Data TransferData transfer in: $0.10 per GBData transfer out: $0.17 per GB 23

Page 25: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Amazon EC2 At Work

StartupsCruxy – Media transcodingGigaVox Media – Podcast Management

Fortune 500 clients:High-Impact, Short-Term ProjectsDevelopment Host

Science / Research:Hadoop / MapReducempiBLAST

Load-Management and Load Balancing Tools:Pound WeogeoRightscale

Page 26: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Can Clouds Work for Science? Applications don’t need durability,

availability, and access performance all bundled together

25

!"##$%&'() '")*&+!$++) ',-).//0&!.1&0&'()"2)$'&0&'() +'"3.4-)+(+'-#+)

+$!,).+)56)2"3)',3--)3-.+"%+7)

8&3+'9) +'"3.4-) $'&0&'&-+) ,.:-) /3":-%) /"/$0.3) 2"3) !"%+$#-3+) .%*)

+#.00) 1$+&%-++-+) .%*) !"%'&%$-) '") -:"0:-) 3./&*0(7) ;-&%:&4"3.'&%4)

',-) *&+!$++&"%) "%) "/'&#.0) +'"3.4-) $'&0&'() *-+&4%) <&00) .22-!') ',-)

*-+&4%)"2)2$'$3-)+'"3.4-)$'&0&'()+-3:&!-+)',.')'.34-')',-)%--*+)"2)',-)

+!&-%!-)!"##$%&'(7)

5-!"%*9) !"#/$'&%4) !-%'-3+) ,.:-) !,.%4-*) 2"!$+) 23"#) +$//"3'&%4)

&+"0.'-*) /3"=-!'+) '") +$//"3'&%4) -%'&3-) $+-3) !"##$%&'&-+) 43"$/-*)

.3"$%*) !"##"%) +!&-%'&2&!) 4".0+) >-7479) ?-3.@3&*) ABCDE7) F3":&*&%4)

+'"3.4-).+).)$'&0&'() &+)#"3-)./') '") +$//"3')$+-3)!"00.1"3.'&"%).%*)

&%'-43.'&"%) <&',) .//0&!.'&"%+) .%*) 2&'+) ',-) #.%*.'-) "2) -G&+'&%4)

!"#/$'&%4) !-%'-3+) .%*) 0.34-) +!.0-) !(1-3) &%23.+'3$!'$3-)

*-/0"(#-%'+7)

8&%.00(9) *.'.H&%'-%+&:-) +!&-%'&2&!) .//0&!.'&"%+) !"%'&%$-) '") +/$3H&%)

%-<)/3"*$!'&"%H#"*-)!"00.1"3.'&"%+7)I+)+$!,9)-!"%"#&-+)"2)+!.0-)

1-!"#-) .) +&4%&2&!.%') &%!-%'&:-) 2"3) 1"',) +'"3.4-) !"%+$#-3+) .%*)

/3":&*-3+)'").*"/')',-)$'&0&'()#"*-07)

?,-) 3-+') "2) ',&+) +-!'&"%) *&+!$++-+) /"++&10-) .3!,&'-!'$3.0) .%*)

2$%!'&"%.0&'() !,.%4-+) '") &#/3":-) ',-) +$&'.1&0&'()"2)56) '") +$//"3')

+!&-%!-).//0&!.'&"%+7)

!"#! $%&'%()*%+,-./01/23%4.,563/347./*87*48,J'&0&'() +-3:&!-+) +,"$0*) .&#) '") /3":&*-) !"#/.3.10-) /-32"3#.%!-)

<&',) &%H,"$+-) +-3:&!-+) .%*) -G/0"&') -!"%"#&-+) "2) +!.0-) '") 3-*$!-)

!"+'+7) K-'9) <-) -+'&#.'-) ',.'9) '"*.(9) <,&0-) 56)#.() "22-3) +&#&0.3)

/-32"3#.%!-)'").%)&%H,"$+-)+-3:&!-9)&')*"-+)%"')"22-3).)!"#/-00&%4)

!"+').*:.%'.4-7)

L-) 1-0&-:-) ',.') ',-) +"0$'&"%) '") 3-*$!-) +'"3.4-) !"+'+) &+) '")

$%*-3+'.%*) .%*) 3-+/"%*) '") .//0&!.'&"%) 3-M$&3-#-%'+7) N"3-)

+/-!&2&!.00(9) 56) 1$%*0-+) .') .) +&%40-) /3&!-) /"&%') ',3--) +'"3.4-)

+(+'-#) !,.3.!'-3&+'&!+O) &%2&%&'-) *.'.) *$3.1&0&'(9) ,&4,) .:.&0.1&0&'(9)

.%*)2.+')*.'.).!!-++7)N.%().//0&!.'&"%+9),"<-:-39)*")%"')%--*).00)

',-+-) ',3--) !,.3.!'-3&+'&!+) 1$%*0-*) '"4-',-3) P) ',$+) Q$%1$%*0&%4R)

1() ) "22-3&%4) #$0'&/0-) !0.++-+) "2) +-3:&!-) '.34-'-*) 2"3) +/-!&2&!)

.//0&!.'&"%)%--*+)#.()0-.*)'")3-*$!-*)!"+'+).%*)',$+)0"<-3)$'&0&'()

/3&!-+7)?,-)3-+')"2)',&+)+-!'&"%)/3-+-%'+).**&'&"%.0).34$#-%'+)',.')

$%1$%*0&%4) !.%) 3-*$!-) !"+'+) 2"3) +/-!&2&!) !0.++-+) "2) .//0&!.'&"%+)

.%*).34$-+)',.')&')&+)'-!,%&!.00()2-.+&10-7)

!"#$"%&'"() *+$&%) ,-%$*-) *+./.7) I+) ?.10-) S) +,"<+9) -.!,) "2) ',-)

',3--) +.0&-%') /3"/-3'&-+) ',.') !,.3.!'-3&T-) .) +'"3.4-) +(+'-#) >*.'.)

*$3.1&0&'(9).:.&0.1&0&'(9).%*).!!-++H/-32"3#.%!-E)3-M$&3-+)*&22-3-%')

3-+"$3!-+) .%*) -%4&%--3&%4) '-!,%&M$-+7) 8"3) -G.#/0-9) I#.T"%)

!"$0*)"22-3).)!0.++)"2)+-3:&!-)',.')"22-3+)*$3.1&0&'()1$')*.'.)#.()1-)

$%.:.&0.10-)2"3)$/)'")U)*.(+)-:-3()#"%',V)+$!,).)+-3:&!-)#&4,')1-)

!,-./-3)2"3)I#.T"%)',.%)&'+)!$33-%')"22-3&%47)

!"#$"%&'"()*0")#-)-12&+'/-%)#3)022&'*0/'+".7)I)+&4%&2&!.%')+-')"2)

+-3:&!-+)3-M$&3-+)+'"3.4-)',.')"22-3+),&4,H/-32"3#.%!-)"%0()"%)"%-)

"3) '<") "2) ',-) .1":-) /-32"3#.%!-) *&3-!'&"%+) >?.10-) 6E7) 8"3)

-G.#/0-9) .%) .3!,&:.0) +'"3.4-) +-3:&!-) ',.') /$'+) .) /3-#&$#) "%)

*$3.1&0&'()!.%)+$3:&:-)0&#&'-*)/-3&"*+)<,-3-)*.'.)&+)%"').:.&0.10-7)

W%)XY-3"9) ',-) 0.34-) +,.3-)"2)*.'.) ',.') &+) &%23-M$-%'0()$+-*)!"$0*)

1-)<-00)+'"3-*)"%)'./-)'")3-*$!-)!"+'+7)5&#&0.30(9)*&+'3&1$'-*)*.'.)

!.!,&%4)*-#.%*+)2.+').!!-++)1$')%"'),&4,)*$3.1&0&'(7)

)

93&).,:",96.,/.81'/4.8,%..(.(,71,;/1<*(.,6*+6,;./01/23%4.,

(373,344.88=,6*+6,(373,3<3*)3&*)*7>,3%(,)1%+,(373,('/3&*)*7>,3/.,

(*00./.%7,

563/347./*87*48,?.81'/4.8,3%(,7.46%*@'.8,71,;/1<*(.,

76.2,

Z&4,H

/-32"3#.%!-)

*.'.).!!-++)

@-"43./,&!.0)*.'.)>"3)+'"3.4-E)3-/0&!.'&"%)

'")&#/3":-).!!-++)0"!.0&'(9),&4,H+/--*)

+'"3.4-9)2.')%-'<"3[+)

X$3.1&0&'()

X.'.)3-/0&!.'&"%)H)/"++&10-).'):.3&"$+)

0-:-0+O),.3*<.3-)>;IWXE9)#$0'&/0-)

0"!.'&"%+9)#$0'&/0-)#-*&.V)-3.+$3-)!"*-+)

I:.&0.1&0&'()

5-3:-3\+-3:&!-)3-/0&!.'&"%9),"'H+<./)

'-!,%"0"4&-+9)#$0'&),"+'&%49)'-!,%&M$-+)'")

&%!3-.+-)).:.&0.1&0&'()2"3).$G&0&.3()+-3:&!-+)

>-7479).$',-%'&!.'&"%9).!!-++)!"%'3"0E)

)

93&).,A",B;;)*437*1%,4)388.8,3%(,76.*/,38814*37.(,/.@'*/.2.%78,

B;;)*437*1%,

4)388,C'/3&*)*7>, B<3*)3&*)*7>,

D*+6,344.88,

8;..(,

].!,-) ^") X-/-%*+) K-+)

_"%4H'-3#)

.3!,&:.0)K-+) ^") ^")

`%0&%-)

/3"*$!'&"%)^") K-+) K-+)

a.'!,)

/3"*$!'&"%)^") ^") K-+)

)

W%)+!&-%'&2&!)3-+-.3!,)',-)!"+').++"!&.'-*)<&',)0"+&%4)*.'.)*-/-%*+)

"%)<,-',-3)',-)*.'.)&+)3.<)*.'.)"3)*-3&:-*)*.'.)"1'.&%-*)23"#)3.<)

*.'.) ',3"$4,)/3"!-++&%47)X-3&:-*)*.'.)!.%).0<.(+)1-)3-/3"*$!-*)

.+)%-!-++.3()1.+-*)"%).3!,&:-*)3.<)*.'.7)J+-3+)!"$0*)1-).00"<-*)

'") !,""+-) ',-)1-+') !"+') '3.*-"22)1-'<--%) +'"3&%4)*-3&:-*)*.'.)"%)

,&4,0() *$3.10-) +'"3.4-) .%*) +'"3&%4) &') "%) !,-./-39) 0-++) 3-0&.10-)

+'"3.4-)',.')#&4,')3-+$0')&%)*.'.)0"++).%*)',-)%--*)'")3-!"#/$'-*)

0"+')*.'.).+)%-!-++.3(7)

L-) %"'-) ',.') .) /"++&10-) .34$#-%') .4.&%+') #$0'&/0-) !0.++-+) "2)

+-3:&!-)&+)',.')',-()#.()&%'3"*$!-)%"%'3&:&.0)#.%.4-#-%')!"+'+).')

',-) $'&0&'() +-3:&!-) /3":&*-3) >&%) '-3#+) "2) 3-+"$3!-) #.%.4-#-%'9)

1&00&%4) !0&-%'+9) .%*) -*$!.'&%4) .//0&!.'&"%) *-:-0"/-3+) '") ,.3%-++)

',-#E7) ) W%) 2.!'9) .) +&#&0.3) .34$#-%'),.+)1--%)#.*-) 3-4.3*&%4) ',-)

&%'3"*$!'&"%)"2)M$.0&'()"2)+-3:&!-)&%)',-)W%'-3%-')ASbD)ASBD7))L,&0-)

<-).43--)',.')',-+-)!"+'+)!.%)1-)%"%'3&:&.09)<-)1-0&-:-)',.')2$'$3-)

3-+-.3!,) &%) ',&+) .3-.) !.%) 3-*$!-) ',-+-) !"+'+) ',3"$4,) .$'"#.'&"%)

.%*)#.[-)*&22-3-%'&.'-*)+'"3.4-)+-3:&!-+) ',3"$4,)$%1$%*0&%4).%*)

.//-.0&%4).0'-3%.'&:-7)

!":! E%4/.38*%+,F).G*&*)*7>,`$3) 56) -G/-3&-%!-) .00"<+) $+) '") 3-!"##-%*) /3&#&'&:-+) ',.')<&00)

-G'-%*)',-)20-G&1&0&'()"22-3-*)1()%":-0)+'"3.4-)&%23.+'3$!'$3-+)',.')

!.'-3)'")',-)*-#.%*+)"2)*.'.H&%'-%+&:-)+!&-%!-).//0&!.'&"%+7)?,-+-)

3-!"##-%*.'&"%+) !"$0*) .0+") /3":-) :.0$.10-) '") 0.34-) @3&*)

*-/0"(#-%'+) 0&[-) ?-3.@3&*) "3)L-+'@3&*) ASSD) <,&!,) .3-)#":&%4)

'"<.3*+) "22-3&%4) &%23.+'3$!'$3-) +-3:&!-+) 2"3) +!&-%!-9) +&#&0.3) &%)

4".0+) '") ',"+-) "22-3-*) 1()I#.T"%R+) 56) .%*)c]S) '") !"##-3!&.0)

.//0&!.'&"%+7)

CPU costs dominatefor scientific workflowapplication called montage

Page 27: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

MapReduce:Simplified Data Processing on

These are slides from Dan Weld’s class at U. Washington(who in turn made his slides based on those by Jeff Dean, Sanjay

Ghemawat, Google, Inc.)

Page 28: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

An abstraction is a simple interface that allows you to scale up well-structured problems to run on hundreds or thousands of computers at once. -- Douglas Thain

Page 29: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Large-scale Management Issues

How to parallelize Data distribution Scheduling Load balancing Failure management Deployment

Page 30: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

MapReduce

MapReduce provides Automatic parallelization & distribution Fault tolerance I/O scheduling Monitoring & status updates

Page 31: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map/Reduce Map/Reduce

Programming model from Lisp (and other functional languages)▫ state what you want to do not how to get it

Many problems can be phrased this way Easy to distribute across nodes Nice retry/failure semantics

Page 32: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme)

Page 33: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

Page 34: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

Unary operator

Page 35: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

Unary operator

Page 36: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4))

Unary operator

Page 37: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4))

Unary operator

Binary operator

Page 38: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4))

Unary operator

Binary operator

Page 39: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4))

Unary operator

Binary operator

Page 40: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4))

Unary operator

Binary operator

Page 41: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4)) (1 4 9 16)

Unary operator

Binary operator

Page 42: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4)) (1 4 9 16)

Unary operator

Binary operator

Page 43: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4)) (1 4 9 16)

(reduce + ‘(1 4 9 16))

Unary operator

Binary operator

Page 44: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4)) (1 4 9 16)

(reduce + ‘(1 4 9 16)) (+ 16 (+ 9 (+ 4 1) ) )

Unary operator

Binary operator

Page 45: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map in Lisp (Scheme) (map f list [list2 list3 …])

(map square ‘(1 2 3 4)) (1 4 9 16)

(reduce + ‘(1 4 9 16)) (+ 16 (+ 9 (+ 4 1) ) ) 30

Unary operator

Binary operator

Page 46: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map/Reduce ala Google

Page 47: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Map/Reduce ala Google map(key, val) is run on each item in set

emits new key, val pairs

reduce(key, vals) is run for each unique key emitted by map() emits final output

Page 48: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

count words in docs

Page 49: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

count words in docs Input consists of (url, contents) pairs

Page 50: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

count words in docs Input consists of (url, contents) pairs

map(key=url, val=contents):▫ For each word w in contents, emit (w, “1”)

Page 51: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

count words in docs Input consists of (url, contents) pairs

map(key=url, val=contents):▫ For each word w in contents, emit (w, “1”)

reduce(key=word, values=uniq_counts):

Page 52: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

count words in docs Input consists of (url, contents) pairs

map(key=url, val=contents):▫ For each word w in contents, emit (w, “1”)

reduce(key=word, values=uniq_counts):▫ Sum all “1”s in values list

Page 53: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

count words in docs Input consists of (url, contents) pairs

map(key=url, val=contents):▫ For each word w in contents, emit (w, “1”)

reduce(key=word, values=uniq_counts):▫ Sum all “1”s in values list▫ Emit result “(word, sum)”

Page 54: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Count, map(key=url, val=contents):

For each word w in contents, emit (w, “1”)

reduce(key=word, values=uniq_counts):Sum all “1”s in values list

see bob throwsee spot run

Page 55: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Count, map(key=url, val=contents):

For each word w in contents, emit (w, “1”)

reduce(key=word, values=uniq_counts):Sum all “1”s in values list

see bob throwsee spot run

see 1bob 1 run 1see 1spot 1throw 1

Page 56: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Count, map(key=url, val=contents):

For each word w in contents, emit (w, “1”)

reduce(key=word, values=uniq_counts):Sum all “1”s in values list

see bob throwsee spot run

see 1bob 1 run 1see 1spot 1throw 1

bob 1 run 1see 2spot 1throw 1

Page 57: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Grep

Page 58: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Grep Input consists of (url+offset, single line) map(key=url+offset, val=line):

Page 59: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Grep Input consists of (url+offset, single line) map(key=url+offset, val=line):▫ If contents matches regexp, emit (line, “1”)

Page 60: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Grep Input consists of (url+offset, single line) map(key=url+offset, val=line):▫ If contents matches regexp, emit (line, “1”)

reduce(key=line, values=uniq_counts):

Page 61: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Grep Input consists of (url+offset, single line) map(key=url+offset, val=line):▫ If contents matches regexp, emit (line, “1”)

reduce(key=line, values=uniq_counts):▫ Don’t do anything; just emit line

Page 62: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Example uses: distributed grep   distributed sort   web link-graph reversal

term-vector / host web access log stats inverted index construction

document clustering machine learning statistical machine translation

... ... ...

Model is Widely Applicable

Page 63: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Typical cluster:

• 100s/1000s of 2-CPU x86 machines, 2-4 GB of memory • Limited bisection bandwidth • Storage is on local IDE disks • GFS: distributed file system manages data (SOSP'03) • Job scheduling system: jobs made up of tasks, scheduler assigns tasks to machines

Implementation is a C++ library linked into user programs

Implementation Overview

Page 64: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Execution Overview

Page 65: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Execution Overview How is this distributed?

Page 66: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Execution Overview How is this distributed?

1. Partition input key/value pairs into chunks, run map() tasks in parallel

Page 67: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Execution Overview How is this distributed?

1. Partition input key/value pairs into chunks, run map() tasks in parallel

2. After all map()s are complete, consolidate all emitted values for each unique emitted key

Page 68: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Execution Overview How is this distributed?

1. Partition input key/value pairs into chunks, run map() tasks in parallel

2. After all map()s are complete, consolidate all emitted values for each unique emitted key

3. Now partition space of output map keys, and run reduce() in parallel

Page 69: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Execution Overview How is this distributed?

1. Partition input key/value pairs into chunks, run map() tasks in parallel

2. After all map()s are complete, consolidate all emitted values for each unique emitted key

3. Now partition space of output map keys, and run reduce() in parallel

If map() or reduce() fails, reexecute!

Page 70: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

UserProgram

Master

(1) fork

worker

(1) fork

worker

(1) fork

(2)assignmap

(2)assignreduce

split 0

split 1

split 2

split 3

split 4

outputfile 0

(6) write

worker(3) read

worker

(4) local write

Mapphase

Intermediate files(on local disks)

worker outputfile 1

Inputfiles

(5) remote read

Reducephase

Outputfiles

Figure 1: Execution overview

Inverted Index: The map function parses each docu-ment, and emits a sequence of !word,document ID"pairs. The reduce function accepts all pairs for a givenword, sorts the corresponding document IDs and emits a!word, list(document ID)" pair. The set of all outputpairs forms a simple inverted index. It is easy to augmentthis computation to keep track of word positions.

Distributed Sort: The map function extracts the keyfrom each record, and emits a !key,record" pair. Thereduce function emits all pairs unchanged. This compu-tation depends on the partitioning facilities described inSection 4.1 and the ordering properties described in Sec-tion 4.2.

3 Implementation

Many different implementations of the MapReduce in-terface are possible. The right choice depends on theenvironment. For example, one implementation may besuitable for a small shared-memory machine, another fora large NUMA multi-processor, and yet another for aneven larger collection of networked machines.This section describes an implementation targetedto the computing environment in wide use at Google:

large clusters of commodity PCs connected together withswitched Ethernet [4]. In our environment:

(1)Machines are typically dual-processor x86 processorsrunning Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typicallyeither 100 megabits/second or 1 gigabit/second at themachine level, but averaging considerably less in over-all bisection bandwidth.

(3) A cluster consists of hundreds or thousands of ma-chines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks at-tached directly to individual machines. A distributed filesystem [8] developed in-house is used to manage the datastored on these disks. The file system uses replication toprovide availability and reliability on top of unreliablehardware.

(5) Users submit jobs to a scheduling system. Each jobconsists of a set of tasks, and is mapped by the schedulerto a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiplemachines by automatically partitioning the input data

To appear in OSDI 2004 3

Execution in more detail

39

Page 71: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

UserProgram

Master

(1) fork

worker

(1) fork

worker

(1) fork

(2)assignmap

(2)assignreduce

split 0

split 1

split 2

split 3

split 4

outputfile 0

(6) write

worker(3) read

worker

(4) local write

Mapphase

Intermediate files(on local disks)

worker outputfile 1

Inputfiles

(5) remote read

Reducephase

Outputfiles

Figure 1: Execution overview

Inverted Index: The map function parses each docu-ment, and emits a sequence of !word,document ID"pairs. The reduce function accepts all pairs for a givenword, sorts the corresponding document IDs and emits a!word, list(document ID)" pair. The set of all outputpairs forms a simple inverted index. It is easy to augmentthis computation to keep track of word positions.

Distributed Sort: The map function extracts the keyfrom each record, and emits a !key,record" pair. Thereduce function emits all pairs unchanged. This compu-tation depends on the partitioning facilities described inSection 4.1 and the ordering properties described in Sec-tion 4.2.

3 Implementation

Many different implementations of the MapReduce in-terface are possible. The right choice depends on theenvironment. For example, one implementation may besuitable for a small shared-memory machine, another fora large NUMA multi-processor, and yet another for aneven larger collection of networked machines.This section describes an implementation targetedto the computing environment in wide use at Google:

large clusters of commodity PCs connected together withswitched Ethernet [4]. In our environment:

(1)Machines are typically dual-processor x86 processorsrunning Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typicallyeither 100 megabits/second or 1 gigabit/second at themachine level, but averaging considerably less in over-all bisection bandwidth.

(3) A cluster consists of hundreds or thousands of ma-chines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks at-tached directly to individual machines. A distributed filesystem [8] developed in-house is used to manage the datastored on these disks. The file system uses replication toprovide availability and reliability on top of unreliablehardware.

(5) Users submit jobs to a scheduling system. Each jobconsists of a set of tasks, and is mapped by the schedulerto a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiplemachines by automatically partitioning the input data

To appear in OSDI 2004 3

Execution in more detail

39

MR lib splits input.Starts masterand worker processes

Page 72: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

UserProgram

Master

(1) fork

worker

(1) fork

worker

(1) fork

(2)assignmap

(2)assignreduce

split 0

split 1

split 2

split 3

split 4

outputfile 0

(6) write

worker(3) read

worker

(4) local write

Mapphase

Intermediate files(on local disks)

worker outputfile 1

Inputfiles

(5) remote read

Reducephase

Outputfiles

Figure 1: Execution overview

Inverted Index: The map function parses each docu-ment, and emits a sequence of !word,document ID"pairs. The reduce function accepts all pairs for a givenword, sorts the corresponding document IDs and emits a!word, list(document ID)" pair. The set of all outputpairs forms a simple inverted index. It is easy to augmentthis computation to keep track of word positions.

Distributed Sort: The map function extracts the keyfrom each record, and emits a !key,record" pair. Thereduce function emits all pairs unchanged. This compu-tation depends on the partitioning facilities described inSection 4.1 and the ordering properties described in Sec-tion 4.2.

3 Implementation

Many different implementations of the MapReduce in-terface are possible. The right choice depends on theenvironment. For example, one implementation may besuitable for a small shared-memory machine, another fora large NUMA multi-processor, and yet another for aneven larger collection of networked machines.This section describes an implementation targetedto the computing environment in wide use at Google:

large clusters of commodity PCs connected together withswitched Ethernet [4]. In our environment:

(1)Machines are typically dual-processor x86 processorsrunning Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typicallyeither 100 megabits/second or 1 gigabit/second at themachine level, but averaging considerably less in over-all bisection bandwidth.

(3) A cluster consists of hundreds or thousands of ma-chines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks at-tached directly to individual machines. A distributed filesystem [8] developed in-house is used to manage the datastored on these disks. The file system uses replication toprovide availability and reliability on top of unreliablehardware.

(5) Users submit jobs to a scheduling system. Each jobconsists of a set of tasks, and is mapped by the schedulerto a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiplemachines by automatically partitioning the input data

To appear in OSDI 2004 3

Execution in more detail

39

MR lib splits input.Starts masterand worker processes

Master assignsM map tasksR reduce tasks

Page 73: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

UserProgram

Master

(1) fork

worker

(1) fork

worker

(1) fork

(2)assignmap

(2)assignreduce

split 0

split 1

split 2

split 3

split 4

outputfile 0

(6) write

worker(3) read

worker

(4) local write

Mapphase

Intermediate files(on local disks)

worker outputfile 1

Inputfiles

(5) remote read

Reducephase

Outputfiles

Figure 1: Execution overview

Inverted Index: The map function parses each docu-ment, and emits a sequence of !word,document ID"pairs. The reduce function accepts all pairs for a givenword, sorts the corresponding document IDs and emits a!word, list(document ID)" pair. The set of all outputpairs forms a simple inverted index. It is easy to augmentthis computation to keep track of word positions.

Distributed Sort: The map function extracts the keyfrom each record, and emits a !key,record" pair. Thereduce function emits all pairs unchanged. This compu-tation depends on the partitioning facilities described inSection 4.1 and the ordering properties described in Sec-tion 4.2.

3 Implementation

Many different implementations of the MapReduce in-terface are possible. The right choice depends on theenvironment. For example, one implementation may besuitable for a small shared-memory machine, another fora large NUMA multi-processor, and yet another for aneven larger collection of networked machines.This section describes an implementation targetedto the computing environment in wide use at Google:

large clusters of commodity PCs connected together withswitched Ethernet [4]. In our environment:

(1)Machines are typically dual-processor x86 processorsrunning Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typicallyeither 100 megabits/second or 1 gigabit/second at themachine level, but averaging considerably less in over-all bisection bandwidth.

(3) A cluster consists of hundreds or thousands of ma-chines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks at-tached directly to individual machines. A distributed filesystem [8] developed in-house is used to manage the datastored on these disks. The file system uses replication toprovide availability and reliability on top of unreliablehardware.

(5) Users submit jobs to a scheduling system. Each jobconsists of a set of tasks, and is mapped by the schedulerto a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiplemachines by automatically partitioning the input data

To appear in OSDI 2004 3

Execution in more detail

39

MR lib splits input.Starts masterand worker processes

Master assignsM map tasksR reduce tasks

Worker reads chunk& passes <key,val>to map function

Page 74: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

UserProgram

Master

(1) fork

worker

(1) fork

worker

(1) fork

(2)assignmap

(2)assignreduce

split 0

split 1

split 2

split 3

split 4

outputfile 0

(6) write

worker(3) read

worker

(4) local write

Mapphase

Intermediate files(on local disks)

worker outputfile 1

Inputfiles

(5) remote read

Reducephase

Outputfiles

Figure 1: Execution overview

Inverted Index: The map function parses each docu-ment, and emits a sequence of !word,document ID"pairs. The reduce function accepts all pairs for a givenword, sorts the corresponding document IDs and emits a!word, list(document ID)" pair. The set of all outputpairs forms a simple inverted index. It is easy to augmentthis computation to keep track of word positions.

Distributed Sort: The map function extracts the keyfrom each record, and emits a !key,record" pair. Thereduce function emits all pairs unchanged. This compu-tation depends on the partitioning facilities described inSection 4.1 and the ordering properties described in Sec-tion 4.2.

3 Implementation

Many different implementations of the MapReduce in-terface are possible. The right choice depends on theenvironment. For example, one implementation may besuitable for a small shared-memory machine, another fora large NUMA multi-processor, and yet another for aneven larger collection of networked machines.This section describes an implementation targetedto the computing environment in wide use at Google:

large clusters of commodity PCs connected together withswitched Ethernet [4]. In our environment:

(1)Machines are typically dual-processor x86 processorsrunning Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typicallyeither 100 megabits/second or 1 gigabit/second at themachine level, but averaging considerably less in over-all bisection bandwidth.

(3) A cluster consists of hundreds or thousands of ma-chines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks at-tached directly to individual machines. A distributed filesystem [8] developed in-house is used to manage the datastored on these disks. The file system uses replication toprovide availability and reliability on top of unreliablehardware.

(5) Users submit jobs to a scheduling system. Each jobconsists of a set of tasks, and is mapped by the schedulerto a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiplemachines by automatically partitioning the input data

To appear in OSDI 2004 3

Execution in more detail

39

MR lib splits input.Starts masterand worker processes

Master assignsM map tasksR reduce tasks

Worker reads chunk& passes <key,val>to map function

Intermediate pairsstored in memory,written to local disk periodically.Split using user-specified partition function. Location sent back to master

Page 75: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

UserProgram

Master

(1) fork

worker

(1) fork

worker

(1) fork

(2)assignmap

(2)assignreduce

split 0

split 1

split 2

split 3

split 4

outputfile 0

(6) write

worker(3) read

worker

(4) local write

Mapphase

Intermediate files(on local disks)

worker outputfile 1

Inputfiles

(5) remote read

Reducephase

Outputfiles

Figure 1: Execution overview

Inverted Index: The map function parses each docu-ment, and emits a sequence of !word,document ID"pairs. The reduce function accepts all pairs for a givenword, sorts the corresponding document IDs and emits a!word, list(document ID)" pair. The set of all outputpairs forms a simple inverted index. It is easy to augmentthis computation to keep track of word positions.

Distributed Sort: The map function extracts the keyfrom each record, and emits a !key,record" pair. Thereduce function emits all pairs unchanged. This compu-tation depends on the partitioning facilities described inSection 4.1 and the ordering properties described in Sec-tion 4.2.

3 Implementation

Many different implementations of the MapReduce in-terface are possible. The right choice depends on theenvironment. For example, one implementation may besuitable for a small shared-memory machine, another fora large NUMA multi-processor, and yet another for aneven larger collection of networked machines.This section describes an implementation targetedto the computing environment in wide use at Google:

large clusters of commodity PCs connected together withswitched Ethernet [4]. In our environment:

(1)Machines are typically dual-processor x86 processorsrunning Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typicallyeither 100 megabits/second or 1 gigabit/second at themachine level, but averaging considerably less in over-all bisection bandwidth.

(3) A cluster consists of hundreds or thousands of ma-chines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks at-tached directly to individual machines. A distributed filesystem [8] developed in-house is used to manage the datastored on these disks. The file system uses replication toprovide availability and reliability on top of unreliablehardware.

(5) Users submit jobs to a scheduling system. Each jobconsists of a set of tasks, and is mapped by the schedulerto a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiplemachines by automatically partitioning the input data

To appear in OSDI 2004 3

Execution in more detail

39

MR lib splits input.Starts masterand worker processes

Master assignsM map tasksR reduce tasks

Worker reads chunk& passes <key,val>to map function

Intermediate pairsstored in memory,written to local disk periodically.Split using user-specified partition function. Location sent back to master

Given location from Master, Worker reads intermediate pairs, sorts by key,passes to reduce function. Result in R output files on global FS

Page 76: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Key Grouping

Page 77: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Parallel Execution Partition function hashes by key. E.g. hash(key) mod R.

Page 78: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Fault Tolerance / Workers

Page 79: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution

Fault Tolerance / Workers

Page 80: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution Detect failure via periodic heartbeats

Fault Tolerance / Workers

Page 81: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution Detect failure via periodic heartbeats Re-execute completed + in-progress map tasks

Fault Tolerance / Workers

Page 82: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution Detect failure via periodic heartbeats Re-execute completed + in-progress map tasks▫ Why??? (Complete tasks on local disk)

Fault Tolerance / Workers

Page 83: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution Detect failure via periodic heartbeats Re-execute completed + in-progress map tasks▫ Why??? (Complete tasks on local disk)

Re-execute in progress reduce tasks

Fault Tolerance / Workers

Page 84: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution Detect failure via periodic heartbeats Re-execute completed + in-progress map tasks▫ Why??? (Complete tasks on local disk)

Re-execute in progress reduce tasks Task completion committed through master

Fault Tolerance / Workers

Page 85: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution Detect failure via periodic heartbeats Re-execute completed + in-progress map tasks▫ Why??? (Complete tasks on local disk)

Re-execute in progress reduce tasks Task completion committed through master

Robust: lost 1600/1800 machines once finished ok

Fault Tolerance / Workers

Page 86: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task states idle, in-progress, completed

Handled via re-execution Detect failure via periodic heartbeats Re-execute completed + in-progress map tasks▫ Why??? (Complete tasks on local disk)

Re-execute in progress reduce tasks Task completion committed through master

Robust: lost 1600/1800 machines once finished okSemantics in presence of failures: see paper

Fault Tolerance / Workers

Page 87: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Master Failure

Could handle, … ? But don't yet

(master failure unlikely) Could use VM mechanism to hide

master failure

Page 88: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Slow workers significantly delay completion time Other jobs consuming resources on machine Bad disks w/ soft errors transfer data slowly Weird things: processor caches disabled (!!)

Solution: Near end of phase, spawn backup tasks Whichever one finishes first "wins"

Dramatically shortens job completion time

Refinement:

Page 89: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

RefinementSkipping Bad Records

Map/Reduce functions sometimes fail for particular inputs Best solution is to debug & fix▫ Not always possible ~ third-party source libraries

On segmentation fault: ▫ Send UDP packet to master from signal handler ▫ Include sequence number of record being

processed If master sees two failures for same record: ▫ Next worker is told to skip the record

Page 90: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Sorting guarantees within each reduce partition

Compression of intermediate data Combiner

Useful for saving network bandwidth

Local execution for debugging/testing User-defined counters

Other Refinements

Page 91: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Tests run on cluster of 1800 machines: 4 GB of memory Dual-processor 2 GHz Xeons with Hyperthreading Dual 160 GB IDE disks Gigabit Ethernet per machine Bisection bandwidth approximately 100 Gbps

Two benchmarks: MR_GrepScan 1010 100-byte records to extract records

matching a rare pattern (92K matching records)

Performance

Page 92: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

MR_Grep

Locality optimization helps: Input stored on FS in 64GB chunks

Workers are spawned near corresponding chucks 1800 machines read 1 TB at peak ~31 GB/s W/out this, rack switches would limit to 10 GB/s

Startup overhead is significant for short jobs

Rate at whichinput is scanned

Time

Start up overhead high as MR assigns tasks to workers.Must send program to all workers, and open 1000 inputfiles

Page 93: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

MR_Sort

sort program sorts 1010 100-byte records (approximately 1 terabyte of data)

map: extract 10-byte sorting key. emit key and line as value

reduce: built-in identity function

input data split into 64-MB pieces (M=15000)

output data in 4000 files (R=4000)

Partition function uses initial bytes of key to place in one of R chunks Local sort done for each R chunk by MR before the “reduce”

Map task send intermediate output to local disk before shuffling to form partition

49

Page 94: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

MR_Sort

Page 95: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Page 96: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Page 97: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Backup tasks reduce job completion time a lot! System deals well with failures

Page 98: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Backup tasks reduce job completion time a lot! System deals well with failures

IO bw < grep since desired pattern uncommonsort spends half of time writing output to local disks

Page 99: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Backup tasks reduce job completion time a lot! System deals well with failures

IO bw < grep since desired pattern uncommonsort spends half of time writing output to local disks

Rate data send from map tasks to reducetasks

Page 100: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Backup tasks reduce job completion time a lot! System deals well with failures

IO bw < grep since desired pattern uncommonsort spends half of time writing output to local disks

Rate data send from map tasks to reducetasks

Delay between first shuffling and startof writing due to sorting of intermediate data

Page 101: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Backup tasks reduce job completion time a lot! System deals well with failures

IO bw < grep since desired pattern uncommonsort spends half of time writing output to local disks

Rate data send from map tasks to reducetasks

First hump: 1700 reducetasks. 1 per host out of 1700 hosts

Delay between first shuffling and startof writing due to sorting of intermediate data

Page 102: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Normal No backup tasks 200 processes killed

MR_Sort

Backup tasks reduce job completion time a lot! System deals well with failures

IO bw < grep since desired pattern uncommonsort spends half of time writing output to local disks

Rate data send from map tasks to reducetasks

First hump: 1700 reducetasks. 1 per host out of 1700 hosts

Second hump: remaining reduce tasks

Delay between first shuffling and startof writing due to sorting of intermediate data

Page 103: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Usage in Aug 2004

Page 104: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Number of jobs 29,423 Average job completion time 634 secs Machine days used 79,186 days

Usage in Aug 2004

Page 105: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Number of jobs 29,423 Average job completion time 634 secs Machine days used 79,186 days

Input data read 3,288 TB Intermediate data produced 758 TB Output data written 193 TB

Usage in Aug 2004

Page 106: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Number of jobs 29,423 Average job completion time 634 secs Machine days used 79,186 days

Input data read 3,288 TB Intermediate data produced 758 TB Output data written 193 TB

Average worker machines per job 157 Average worker deaths per job 1.2

Usage in Aug 2004

Page 107: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Number of jobs 29,423 Average job completion time 634 secs Machine days used 79,186 days

Input data read 3,288 TB Intermediate data produced 758 TB Output data written 193 TB

Average worker machines per job 157 Average worker deaths per job 1.2 Average map tasks per job 3,351

Usage in Aug 2004

Page 108: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Number of jobs 29,423 Average job completion time 634 secs Machine days used 79,186 days

Input data read 3,288 TB Intermediate data produced 758 TB Output data written 193 TB

Average worker machines per job 157 Average worker deaths per job 1.2 Average map tasks per job 3,351 Average reduce tasks per job 55

Usage in Aug 2004

Page 109: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Number of jobs 29,423 Average job completion time 634 secs Machine days used 79,186 days

Input data read 3,288 TB Intermediate data produced 758 TB Output data written 193 TB

Average worker machines per job 157 Average worker deaths per job 1.2 Average map tasks per job 3,351 Average reduce tasks per job 55

Usage in Aug 2004

Page 110: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Conclusions MapReduce proven to be useful abstraction

Greatly simplifies large-scale computations

Fun to use: focus on problem, let library deal w/ messy details

Page 111: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

A major step backwards http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-

back.html

A giant step backward in the programming paradigm for large-scale data intensive applications

A sub-optimal implementation, in that it uses brute force instead of indexing (hash / B-trees)

Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago

Missing most of the features that are routinely included in current DBMS

Incompatible with all of the tools DBMS users have come to depend on

53

Page 112: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

• Use free compute, storage and network resources in Internet and Intranet environments

• Reuse existing (power, resource)infrastructure

• Motivation

• High return on investment

• Savings often a factor 5 or 10 compared to dedicated cluster

• Access to huge computational power and storage resources

Desktop Grids

Page 113: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

State of the Art

• 400 TeraFlops/sec, over one million hosts

State of the artState of the art

Tightly-coupled,Tightly-coupled,

multiple applications,multiple applications,

with time constraintswith time constraints

Loosely-coupled, Loosely-coupled,

single application,single application,

without time constraintswithout time constraints

Page 114: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Challenges• Volatility

• Resources are shared

• Mouse/keyboard activity, user processes

• Nondeterministic failures

• Often 50% failure rates

• Heterogeneity

• Accessibility

• Resources are behind NAT’s, firewalls

• Security

Page 115: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Outline

• BOINC

• XtremWeb

• Prediction

Page 116: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

BOINC• Background

• Led by David Anderson, UC Berkeley

• SETI@home

• Single astronomy application

• Too many resources

• Goals of BOINC

• Ability to share resources among multiple projects

• User autonomy

• Usability

Page 117: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

BOINC ArchitectureDATABASE

DISPATCHER

WORKER

DATA REPOSITORY

FILE

SYSTEM

GENERATOR

INTERNET

INPUT FILE

DOWNLOAD

AND OUTPUT

UPLOAD

BOINC SERVER

WORK UNIT

REQUEST,

DOWNLOAD

PROJECT

BOOKKEEPING

WORKERWORKER

Page 118: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

BOINC Architecture

WORKER

INTERNET

DATABASE

DISPATCHER

DATA REPOSITORY

FILE

SYSTEM

GENERATOR BOINC SERVER

PROJECT

BOOKKEEPING

DATABASE

DISPATCHER

DATA REPOSITORY

FILE

SYSTEM

GENERATOR BOINC SERVER

PROJECT

BOOKKEEPING

DATABASE

DISPATCHER

DATA REPOSITORY

FILE

SYSTEM

GENERATOR BOINC SERVER

PROJECT

BOOKKEEPING

Page 119: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

BOINC Worker Scheduling Problem

• Workers have resource share (CPU) allocation per project

• Work units per project have a deadline

• Goal: meet deadline and also resource share allocations

• Which project to schedule next on worker?

Page 120: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

BOINC Scheduling Approach

• Use weighted round robin until a project risks missing deadline

• If so, switch to earliest deadline first scheduling

• N.B.: scheduling depends on many different parameters (e.g., availability of the resources, resource hardware, user preferences, task deadlines, resource shares, estimates of task completion time, number and characteristics of projects )

Page 121: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

XtremWeb

• Led by Gilles Fedak ([email protected]), INRIA Futurs

• Goals

• Support symmetric needs of users

• Allow any node to play any role (client, worker)

• Fault tolerance

• Usability

Page 122: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

XtremWeb Architecture

CLIENT WORKER

DISPATCHER

INTERNET

TASK SUBMISSION TASK REQUEST,

DOWNLOAD,

UPLOAD

SCHEDULER

PROJECT

BOOKEEPING

DATABASE

FILE

SYSTEM

XTREMWEB SERVER

CLIENTWORKER

Page 123: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Artur Andrzejak Zuse-Institute Berlin (ZIB)

Derrick Kondo INRIA

David P. Anderson UC Berkeley

Ensuring Collective Availability in Volatile Resource Pools via

Forecasting

Page 124: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Motivation

• Goal: can we deploy serious services / apps over unreliable resources?

• How unreliable?– mostly non-dedicated PC's (used for other purposes)– e.g. volunteer computing Grids such as SETI@home– no control over availability, frequent churn

• What are "serious" services / apps?– large scale service deployment

• examples: Amazon's EC2, TeraGrid, EGEE– complex applications

• examples: DAG/message-passing applications

– high availability: around 99.999

Page 125: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

How to do this?• Difficult to get (many) hosts with high avail• Instead, we strive for collective availability:

– def.: guarantee that with high probability, in a group of R ≥ N hosts, at least N remain available over time T

R = 6, N = 3

Time T

Page 126: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

How to do this?• Difficult to get (many) hosts with high avail• Instead, we strive for collective availability:

– def.: guarantee that with high probability, in a group of R ≥ N hosts, at least N remain available over time T

R = 6, N = 3

Time T

Page 127: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

How to do this?• Difficult to get (many) hosts with high avail• Instead, we strive for collective availability:

– def.: guarantee that with high probability, in a group of R ≥ N hosts, at least N remain available over time T

R = 6, N = 3 4 ≥ N survived, col. availability achieved

Time T

Page 128: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Our Focus• We use statistical and prediction methods to

answer the question:– Given a pool of non-dedicated hosts and a request for N

hosts, how to select them such that the collective availability is maximized?

• i.e. at least N among R hosts "survive" interval T

• Then deployment:

Initialgroup

Usageover T

Whichfailed?

Predictionof next phase

Replacementfrom pool

Usageover T

...x xx

Initialprediction

Page 129: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Availability Prediction• We propose efficient and domain-adjusted

predictions of availability for individual hosts– efficient:

• fast pre-selection of predictable hosts• use simple and fast classification algorithm

– domain-adjusted• analyze the factors of predictability and adjust our methods

to them

• Then we use these individual predictions to achieve collective availability

Page 130: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Measurement Data• Availability traces for over 48,000 hosts

participating in SETI@home• Active in Dec 1st, 2007 to Feb 12th, 2008

• Availability recorded by a BOINC client – depends whether the machine was idle– The definition of idle depends on user settings

• Quantized to 1 hour intervals– regarded as available only if uninterrupted avail for the

whole hour – quite conservative

• For availability characterization, see:– Derrick Kondo, Artur Andrzejak, David P. Anderson: On Correlated

Availability in Internet-Distributed Systems, 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29-October 1, 2008

Page 131: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Prediction Process

Page 132: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Filtering Hosts By Predictability• We want to find out, for each host, whether its

availability predictions are likely to be accurate• I.e. we want hosts with high predictability:

– def.: expected accuracy of predictions from a model build on historical data

• To estimate it, we use indicators of predictability– fast to compute (at least faster than a prediction model)

– use only training data

Page 133: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Predictability Computation• To assess the accuracy of predictability indicators, we have

to compute for each host the true accuracy of model-based predictions

• To this end, we train a prediction model on the historical availability data (4 weeks @ 1 hour), and then compute the prediction error on the subsequent 2 weeks (1 hour => 2*7*24 predictions)– This is only the "laboratory" scenario, not done in real

deployment

– The predictability indicators should tell us, for which hosts it is not worth to build model / do predictions

Page 134: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Predictability Indicators• We have tested, among others:

– Average length of an uninterrupted availability segment

– Size of the compressed availability trace

• traces with predictable patterns are likely to compress better

– Prediction error tested on a part of the training data (as a "control indicator")

– Number of availability state changes per week (aveSwitches)

• Evaluation:

– correlation, scatter plots

Page 135: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

And the winner is..• Number of availability state changes per week: aveSwitches

Spearman's rank correlation coefficientindicator vs. prediction error

Scatter plotaveSwitches vs.prediction error

Page 136: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Why are AveSwitches good?• There are some "reasons" for data regularity →

high prediction accuracy1. Periodic behavior, e.g. daily periodicities

2. Long runs of availability / non-availability3. …

• We have studied which "reasons" are dominant:– by using data preprocessing which "helps" either 1 or 2

• results show that "reason" 2 is dominant

• highest accuracy for a mixture of both "reasons"

Page 137: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Filtering by predictability• We create two groups for further processing:

– low predictability, with aveSwitches ≥ 7.47

– high predictability, with aveSwitches < 7.47

Page 138: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Prediction Background> Def. of classifier: a function which learns its output value from

examples

> Function inputs are called attributes, in our study:

> Functions of availability represented as 01 binary string

> Time (e.g. hour in day), history bits (sum of recent k history bits)

> Output is an element from some fixed set, in our study:

> {0,1} representing availability

learn

← predict

Attribute1 ... Attributen Output

Example 1 [23,10] ... [21,5] 0

Example k [11,10] ... [5,0] 1

Prediction [1,7] ... [13,7] ?

}

Page 139: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Prediction• We have used a simple and fast classifier

– naive Bayes

• The classifier takes examples i.e. vectors of measured avail + preprocessed data over 30 days

• Predicts for each hour over two weeks– starting now, will the host be available in the next k

hours• this is prediction interval length, pil

1 h 1 h 1 h 1 h 1 h

"now" prediction interval with pil = 4h

Page 140: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

What drives accuracy?• Dependence upon

– prediction interval length, pil

– training interval length– host ownership type (private, school, work)

Page 141: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Simulation Approach• For each host in the high-predictability group

make prediction at t0 for pil time, and select random R among those predicted as availiable

• R depends upon:– N = the desired number of hosts (at least N should be

always available)

– the redundancy (R-N)/N

• Our simulations answer:– given N and α, the desired availability level, what is the

necessary redundancy, i.e. necessary R?

– a little weaker: success rate: ratio (# experiments with at least N hosts alive after time T) / (all experiments)

Page 142: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Necessary Redundancy• High predictability group (pil=4)

Page 143: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Necessary Redundancy• Low predictability group (pil=4)

Page 144: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Is this Redundancy too high?• In high predictability group, we have required

redundancy of 35%• However, we consider this dramatically low

– In comparison, SETI@home has 200% redundancy (also used for result validation)

– In terms of absolute savings, that equates to 165 TeraFLOPS saved in a 1 PetaFLOPS system (such as FOLDING@home) => significant power savings

• As a result, the BOINC consortium is interested in potentially applying our prediction schema in their job scheduling (preliminary talks)

Page 145: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Migration Overhead• We also evaluated the overhead due to host

migration, service restart between slices of len T• Threshold = a multiple of pil which describes the

total time (many T's) of running an app / service

• Turnover rate TR:– let S be a set of hosts predicted to be available at t0

– for those we predict which ones become not available after time pil, i.e. second prediction at t0+T

– TR is the fraction of hosts which change from avail to non-avail

– essentially, the higher, the more migration needed

Page 146: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Turnover Rates• about 2.5% for high predictability group• about 12% for low predictability group

Page 147: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Summary• Given that host redundancy is not an issue

("cheap" resources), high collective availability is achievable– even with low migration costs

• Predictability assessment and filtering is essential– improves accuracy– avoids many "wasted" predictions

• Future work:– hardest part: a new "application architecture" /

programming model for collective availability

– masking failures by virtualization and VM migration

Page 148: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

References• This work has been accepted at:

– 19th IFIP/IEEE Conference on Distributed Systems: Operations and Management (DSOM 2008) (part of Manweek 2008), Samos Island, Greece, September 22-26, 2008

• Pdf available on request, please send e mail (derrick.kondo [at] inria.fr)

Page 149: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Reverse Web-Link Graph Map

For each URL linking to target, … Output <target, source> pairs

Reduce Concatenate list of all source URLs Outputs: <target, list (source)> pairs

Page 150: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Inverted Index Map ()

emit <word, document ID>

Reduce emit <word, list (document ID)>

Page 151: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Job Processing

JobTracker

TaskTracker 0 TaskTracker 1 TaskTracker 2

TaskTracker 3 TaskTracker 4 TaskTracker 5

Page 152: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Job Processing

JobTracker

TaskTracker 0 TaskTracker 1 TaskTracker 2

TaskTracker 3 TaskTracker 4 TaskTracker 5

1. Client submits “grep” job, indicating code and input files

“grep”

Page 153: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Job Processing

JobTracker

TaskTracker 0 TaskTracker 1 TaskTracker 2

TaskTracker 3 TaskTracker 4 TaskTracker 5

1. Client submits “grep” job, indicating code and input files

2. JobTracker breaks input file into k chunks, (in this case 6). Assigns work to ttrackers.

Page 154: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Job Processing

JobTracker

TaskTracker 0 TaskTracker 1 TaskTracker 2

TaskTracker 3 TaskTracker 4 TaskTracker 5

1. Client submits “grep” job, indicating code and input files

2. JobTracker breaks input file into k chunks, (in this case 6). Assigns work to ttrackers.

3. After map(), tasktrackers exchange map-output to build reduce() keyspace

Page 155: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Job Processing

JobTracker

TaskTracker 0 TaskTracker 1 TaskTracker 2

TaskTracker 3 TaskTracker 4 TaskTracker 5

1. Client submits “grep” job, indicating code and input files

2. JobTracker breaks input file into k chunks, (in this case 6). Assigns work to ttrackers.

3. After map(), tasktrackers exchange map-output to build reduce() keyspace

4. JobTracker breaks reduce() keyspace into m chunks (in this case 6). Assigns work.

Page 156: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Job Processing

JobTracker

TaskTracker 0 TaskTracker 1 TaskTracker 2

TaskTracker 3 TaskTracker 4 TaskTracker 5

1. Client submits “grep” job, indicating code and input files

2. JobTracker breaks input file into k chunks, (in this case 6). Assigns work to ttrackers.

3. After map(), tasktrackers exchange map-output to build reduce() keyspace

4. JobTracker breaks reduce() keyspace into m chunks (in this case 6). Assigns work.

5. reduce() output may go to NDFS

Page 157: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Task Granularity & Pipelining Fine granularity tasks: map tasks >> machines

Minimizes time for fault recovery Can pipeline shuffling with map execution Better dynamic load balancing

Often use 200,000 map & 5000 reduce tasks Running on 2000 machines

Page 158: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 159: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 160: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 161: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 162: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 163: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 164: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 165: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 166: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 167: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 168: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting
Page 169: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Refinement: Master scheduling policy:

Asks GFS for locations of replicas of input file blocks Map tasks typically split into 64MB (GFS block size) Map tasks scheduled so GFS input block replica are on

same machine or same rack

Effect Thousands of machines read input at local disk speed

▫ Without this, rack switches limit read rate

Page 170: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

EC2 SOAP/Query API

Images:RegisterImageDescribeImagesDeregisterImage

Instances:RunInstancesDescribeInstancesTerminateInstancesGetConsoleOutputRebootInstances

Keypairs:CreateKeyPairDescribeKeyPairsDeleteKeyPair

Image Attributes:ModifyImageAttributeDescribeImageAttributeResetImageAttribute

Security Groups:CreateSecurityGroupDescribeSecurityGroupsDeleteSecurityGroupAuthorizeSecurityGroupIngressRevokeSecurityGroupIngress

Page 171: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

CloudFront

106

Page 172: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Rewrote Google's production indexingSystem using MapReduce

Set of 10, 14, 17, 21, 24 MapReduce operations

New code is simpler, easier to understand ▫ 3800 lines C++ 700

MapReduce handles failures, slow machines Easy to make indexing faster

Experience

Page 173: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Related Work Programming model inspired by functional

language primitives Partitioning/shuffling similar to many large-scale

sorting systems NOW-Sort ['97]

Re-execution for fault tolerance BAD-FS ['04] and TACC ['97]

Locality optimization has parallels with Active Disks/Diamond work Active Disks ['01], Diamond ['04]

Backup tasks similar to Eager Scheduling in Charlotte system Charlotte ['96]

Dynamic load balancing solves similar problem as

Page 174: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Cloud versus the Grid Geographically distributed Across multiple administrative domains App’s need high-level programming

abstractions (e.g. workflow)

109

Page 175: On the Convergence of Cloud Computing and Desktop Gridspolaris.imag.fr/arnaud.legrand/teaching/2011/PC_10_mapreduce.pdf · "A Cloud is a type of parallel and distributed system consisting

Steps Get Amazon account

http://www.amazonaws.com Boot instance of AMI image Log in with ssh Start Apache

110