1/11/2014 IT0483-PRINCIPLES OF CLOUD COMPUTING, N.ARIVAZHAGAN



Post on 26-Mar-2015



04/10/23 IT0483-PRINCIPLES OF CLOUD COMPUTING, N.ARIVAZHAGAN

IT0483-PRINCIPLES OF CLOUD COMPUTING

Unit 5. CASE STUDY: Amazon Case Study. Introduction to MapReduce: Discussion of Google Paper, GFS, HDFS, Hadoop Framework.


AMAZON WEB SERVICES

Using Amazon Web Services, an e-commerce web site can weather unforeseen demand with ease; a pharmaceutical company can “rent” computing power to execute large-scale simulations; a media company can serve large volumes of video, music, and more; and an enterprise can deploy bandwidth-consuming services and training to its mobile workforce.


BENEFITS

No contracts or commitments

Pay as you go

Transparent pricing

Better economics

Better use of your time

Better environmental impact


MAP REDUCE

The idea of map and reduce is more than 40 years old and is present in most functional programming languages; see, e.g., APL, Lisp and ML.

Alternate name for map: apply-all.

Higher-order functions take function definitions as arguments, or return a function as output.

Map and reduce are higher-order functions.
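Both properties of a higher-order function can be seen in a small Java sketch. The names `applyTwice` and `adder` are illustrative only: the first method takes a function as an argument, the second returns a newly built function as its output.

```java
import java.util.function.IntUnaryOperator;

class HigherOrder {
    // Takes a function as an argument: computes f(f(x)).
    static int applyTwice(IntUnaryOperator f, int x) {
        return f.applyAsInt(f.applyAsInt(x));
    }

    // Returns a function as its output: a function that adds n to its input.
    static IntUnaryOperator adder(int n) {
        return x -> x + n;
    }

    public static void main(String[] args) {
        IntUnaryOperator addThree = adder(3);          // function created at run time
        System.out.println(applyTwice(addThree, 10));  // 10 + 3 + 3 = 16
        System.out.println(applyTwice(x -> x * x, 3)); // (3 * 3) squared = 81
    }
}
```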


MAP REDUCE

Let F(x: int) return r: int, and let V be an array of integers.

W = map(F, V)

W[i] = F(V[i]) for all i, i.e., apply F to every element of V.
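The definition W = map(F, V) translates directly into Java. In the sketch below, `map` is a hand-written illustrative helper over an `int[]`; the standard library's `IntStream.map` gives the same result, as the second print shows.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

class MapDemo {
    // W = map(F, V): W[i] = F(V[i]) for every index i.
    static int[] map(IntUnaryOperator f, int[] v) {
        int[] w = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            w[i] = f.applyAsInt(v[i]);
        }
        return w;
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        IntUnaryOperator f = x -> x * x; // F(x) = x squared
        System.out.println(Arrays.toString(map(f, v)));                          // [1, 4, 9, 16]
        // The same computation with the standard stream API:
        System.out.println(Arrays.toString(Arrays.stream(v).map(f).toArray())); // [1, 4, 9, 16]
    }
}
```

Because each W[i] depends only on V[i], the calls to F are independent of one another and can run on different machines, which is what makes map a natural fit for a cluster.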


reduce: A Higher-Order Function

reduce is also known as fold, accumulate, compress or inject.

Reduce (fold) takes in a function and folds it in between the elements of a list.
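A left fold with an initial value can be sketched as follows. `fold` is an illustrative helper, and `Stream.reduce` is the standard-library counterpart. For example, folding + in between the elements of [1, 2, 3, 4] gives 1 + 2 + 3 + 4 = 10. The sketch assumes Java 9 or later for `List.of`.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class FoldDemo {
    // Left fold: (((init op x0) op x1) op x2) ... places op between the elements.
    static <T> T fold(BinaryOperator<T> op, T init, List<T> xs) {
        T acc = init;
        for (T x : xs) {
            acc = op.apply(acc, x);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        System.out.println(fold(Integer::sum, 0, xs));           // 0+1+2+3+4 = 10
        System.out.println(fold((a, b) -> a * b, 1, xs));        // 1*1*2*3*4 = 24
        // Standard-library equivalent:
        System.out.println(xs.stream().reduce(0, Integer::sum)); // 10
    }
}
```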


Map/Reduce Implementation Idea

MapReduce and distributed file system framework for large commodity clusters

Master/slave relationship:

JobTracker handles all scheduling and data flow between TaskTrackers.

TaskTracker handles all worker tasks on a node.

Each individual worker task runs a map or a reduce operation.

Integrates with HDFS for data locality: where possible, a map task is scheduled on a node that stores its input block.
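The data-locality idea can be illustrated with a toy scheduler. This is a hypothetical sketch, not Hadoop's JobTracker code, and the class and method names are invented for illustration: when the TaskTracker on some host has a free slot, the scheduler prefers a pending map task whose input block has a replica on that host, and otherwise falls back to any pending task, which then has to read its block over the network.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class LocalityScheduler {
    // Hypothetical pending map task plus the hosts that store a replica of its input block.
    static final class MapTask {
        final String id;
        final Set<String> blockHosts;

        MapTask(String id, Set<String> blockHosts) {
            this.id = id;
            this.blockHosts = blockHosts;
        }
    }

    private final List<MapTask> pending = new ArrayList<>();

    void submit(MapTask t) {
        pending.add(t);
    }

    // Called when the TaskTracker on `host` has a free slot; returns null if nothing is pending.
    MapTask assign(String host) {
        // First choice: a data-local task (its input block is already on this host).
        for (Iterator<MapTask> it = pending.iterator(); it.hasNext(); ) {
            MapTask t = it.next();
            if (t.blockHosts.contains(host)) {
                it.remove();
                return t;
            }
        }
        // Fallback: any remaining task; its input will be read remotely.
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        LocalityScheduler s = new LocalityScheduler();
        s.submit(new MapTask("m1", Set.of("nodeA", "nodeB")));
        s.submit(new MapTask("m2", Set.of("nodeC")));
        System.out.println(s.assign("nodeC").id); // m2: local to nodeC
        System.out.println(s.assign("nodeD").id); // m1: no local task, remote read
        System.out.println(s.assign("nodeA"));    // null: queue is empty
    }
}
```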


Hadoop Supported File Systems

HDFS: Hadoop's own file system.

Amazon S3 file system: targeted at clusters hosted on the Amazon Elastic Compute Cloud server-on-demand infrastructure. Not rack-aware.

CloudStore (previously Kosmos Distributed File System): like HDFS, this is rack-aware.

FTP file system: data stored on remote FTP servers.

Read-only HTTP and HTTPS file systems.


HDFS: Hadoop Distributed File System

Designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.

Master (“NameNode”) manages the file system namespace and handles file creation, deletion and replication.

Slave (“DataNode”) stores blocks and handles data retrieval.

Files are stored as many blocks.

Each block has a block ID, and each block ID is associated with several nodes (hostname:port), depending on the level of replication.
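A toy model of this bookkeeping: a file is cut into fixed-size blocks, each block gets an ID, and each block ID is mapped to `replication` DataNodes given as hostname:port. This is a simplified, hypothetical sketch. Real HDFS uses a rack-aware placement policy rather than the round-robin choice below and assigns its own block IDs; the sequential IDs, block size, host names and ports here are example values only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlockMapSketch {
    // Hypothetical helper: maps each block ID of one file to the DataNodes holding a replica.
    static Map<Long, List<String>> placeBlocks(long fileSize, long blockSize,
                                               List<String> dataNodes, int replication) {
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("not enough DataNodes for the replication level");
        }
        Map<Long, List<String>> blockMap = new LinkedHashMap<>();
        long numBlocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
        for (long blockId = 0; blockId < numBlocks; blockId++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Round-robin over the DataNodes; replicas land on distinct nodes
                // because replication <= number of nodes.
                int idx = (int) ((blockId + r) % dataNodes.size());
                replicas.add(dataNodes.get(idx));
            }
            blockMap.put(blockId, replicas);
        }
        return blockMap;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1:50010", "dn2:50010", "dn3:50010", "dn4:50010");
        long mb = 1024L * 1024L;
        // A 200 MB file with 64 MB blocks needs 4 blocks (the last one holds only 8 MB).
        Map<Long, List<String>> map = placeBlocks(200 * mb, 64 * mb, nodes, 3);
        map.forEach((id, hosts) -> System.out.println("block " + id + " -> " + hosts));
    }
}
```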


Hadoop v. MapReduce

MapReduce is also the name of a framework developed by Google.

Hadoop was developed largely at Yahoo and is now an Apache Software Foundation project.

Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.


Component | MapReduce | Hadoop
Organization | Google | Yahoo/Apache
Implementation language | C++ | Java
Distributed file system | GFS | HDFS
Database | Bigtable | HBase
Distributed lock manager | Chubby | ZooKeeper


WordCount

A Simple Hadoop Example: http://wiki.apache.org/hadoop/WordCount


Word Count Example

• Read text files and count how often words occur.
  – The input is text files.
  – The output is a text file; each line: word, tab, count.

• Map: produce a (word, 1) pair for each occurrence of a word.

• Reduce: for each word, sum up the counts.
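The three phases can be simulated in plain Java without a cluster. This is an in-memory sketch, not Hadoop code: `mapPhase` emits a (word, 1) pair per token, `shuffle` groups the pairs by word (Hadoop performs this grouping between the map and reduce phases), and `reducePhase` sums each group. Splitting with `StringTokenizer` mirrors the Hadoop mapper; all method names are illustrative.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordCountSim {
    // Map: for each line, emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tok = new StringTokenizer(line);
            while (tok.hasMoreTokens()) {
                pairs.add(new SimpleEntry<>(tok.nextToken(), 1));
            }
        }
        return pairs;
    }

    // Shuffle: group all values by key. Keys are kept sorted here; within each
    // reducer, Hadoop also presents keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: for each word, sum up the counts.
    static Map<String, Integer> reducePhase(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            int sum = 0;
            for (int c : g.getValue()) {
                sum += c;
            }
            counts.put(g.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> count(List<String> lines) {
        return reducePhase(shuffle(mapPhase(lines)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "the end");
        // Print in the output format: word, tab, count.
        count(input).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```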


WordCount Overview

```java
import ...

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper ... {

        public void map ...
    }

    public static class Reduce extends MapReduceBase implements Reducer ... {

        public void reduce ...
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        ...
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }

}
```


WordCount Mapper

```java
public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused objects: the constant count 1 and the current word.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // key: byte offset of the line in the input file; value: the line itself.
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // Emit (word, 1) for every whitespace-separated token.
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
```
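One consequence of tokenizing with `java.util.StringTokenizer` (default delimiters: space, tab, newline, carriage return and form feed) is that punctuation and case are kept, so "Hello," and "hello" become different keys. The stand-alone snippet below repeats the mapper's loop outside Hadoop to show which keys it would emit; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeDemo {
    // Same tokenization loop as the mapper, collecting the keys instead of emitting pairs.
    static List<String> keysFor(String line) {
        List<String> keys = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            keys.add(tokenizer.nextToken());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Punctuation stays attached to the word and case is preserved.
        System.out.println(keysFor("Hello, hello\tworld")); // [Hello,, hello, world]
    }
}
```

If case-insensitive counts without punctuation were wanted, each token could be normalized before `word.set(...)`; the original example does not do this.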


WordCount Reducer

```java
public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    // Called once per word with all the counts emitted for that word.
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Emit (word, total count).
        output.collect(key, new IntWritable(sum));
    }
}
```


WordCount JobConf

```java
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");

// Types of the final (key, value) output: (word, count).
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class); // the reducer is reused as a local combiner
conf.setReducerClass(Reduce.class);

// Read plain text lines; write "key<TAB>value" lines.
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
```
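`conf.setCombinerClass(Reduce.class)` reuses the reducer as a combiner that runs on each mapper's local output before the shuffle. This is safe here because addition is associative and commutative: summing per-mapper partial sums gives the same total as summing all the individual 1s. A plain-Java sketch of the arithmetic (not Hadoop code; the two mapper outputs are made-up example data):

```java
import java.util.List;

class CombinerDemo {
    // The same operation the Reduce class performs: add up a list of counts.
    static int sum(List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Counts for one word as emitted by two different mappers (example data).
        List<Integer> mapper1 = List.of(1, 1, 1);
        List<Integer> mapper2 = List.of(1, 1);

        // Without a combiner: the reducer receives all five 1s.
        int direct = sum(List.of(1, 1, 1, 1, 1));

        // With a combiner: each mapper pre-sums locally, the reducer sums the partials.
        int combined = sum(List.of(sum(mapper1), sum(mapper2)));

        System.out.println(direct + " " + combined); // 5 5
    }
}
```

With the combiner only two values cross the network instead of five. An operation such as an average could not be reused this way, because the average of partial averages is generally not the overall average.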


WordCount main

```java
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // args[0]: input directory; args[1]: output directory (must not exist yet).
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Submit the job and block until it completes.
    JobClient.runJob(conf);
}
```


Invocation of wordcount

1. /usr/local/bin/hadoop dfs -mkdir <hdfs-dir>

2. /usr/local/bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>

3. /usr/local/bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>

(In Hadoop 2.x and later, `hadoop dfs` is deprecated in favour of `hdfs dfs`.)


GFS

• Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google Inc. for its own use. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware.
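In the design described in Google's GFS paper, files are divided into fixed-size 64 MB chunks, and a client first translates a byte offset in a file into a chunk index before asking the master which chunkservers hold that chunk. The arithmetic is sketched below; the class and method names are illustrative.

```java
class GfsChunkIndex {
    // Chunk size used in the GFS paper: 64 MB.
    static final long CHUNK_SIZE = 64L * 1024 * 1024;

    // Which chunk of the file contains this byte offset (0-based)?
    static long chunkIndex(long byteOffset) {
        return byteOffset / CHUNK_SIZE;
    }

    // Position of the byte inside that chunk.
    static long offsetInChunk(long byteOffset) {
        return byteOffset % CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long offset = 200L * 1024 * 1024; // byte at 200 MB into the file
        System.out.println(chunkIndex(offset));    // 3: chunks 0..2 cover the first 192 MB
        System.out.println(offsetInChunk(offset)); // 8388608, i.e. 8 MB into chunk 3
    }
}
```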


HDFS

• Hadoop Distributed File System (HDFS™) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster, so that data survives node failures and computations can run on the nodes that already hold their data.


Review Questions

Part A

1) What is Apache Hadoop?
2) Mention the uses of Amazon EC2 cloud computing services.
3) What is meant by MapReduce?
4) Mention the hot spots of the MapReduce framework.
5) What are the different steps in the MapReduce framework?
6) What is the use of the Map partition function?
7) Mention the uses of the MapReduce function.
8) Differentiate between the JobTracker and the TaskTracker.
9) What algorithm is used for scheduling in Hadoop?
10) What is meant by the fair scheduler? Mention its uses.
11) What is meant by the capacity scheduler?
12) Mention any four applications of Hadoop.
13) Who are the main users of Hadoop?
14) Mention any four commercially supported Hadoop-related products.


Part B

1) Draw and explain the Hadoop architecture.

2) Explain the Hadoop Distributed File System.

3) Explain the Amazon EC2 cloud computing case study for a financial organization.

4) Explain the concept of MapReduce

5) Explain the concept of Google File System

