![Page 1: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/1.jpg)
Introduction to Google MapReduce
Based on materials from Internet
![Page 2: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/2.jpg)
What is MapReduce?
- A programming model (and its associated implementation)
- For processing large data sets
- Exploits a large set of commodity computers
- Executes processes in a distributed manner
- Offers a high degree of transparency
![Page 3: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/3.jpg)
Distributed Word Count
[Figure: very big data -> split data (several chunks) -> count (one per chunk) -> merge -> merged count]
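The split / count / merge flow in the figure can be sketched in plain Java on a single machine. This is only an illustration under assumptions: the class `SplitCountMerge` and its method names are hypothetical, the "very big data" is just a string, and parallel threads stand in for separate computers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical single-process illustration of split -> count -> merge. */
class SplitCountMerge {

    /** Splits the words into at most `parts` contiguous chunks (parts must be >= 1). */
    static List<List<String>> split(List<String> words, int parts) {
        List<List<String>> chunks = new ArrayList<>();
        int size = Math.max(1, (words.size() + parts - 1) / parts); // ceiling division
        for (int start = 0; start < words.size(); start += size) {
            chunks.add(words.subList(start, Math.min(words.size(), start + size)));
        }
        return chunks;
    }

    /** Counts word occurrences within one chunk. */
    static Map<String, Integer> count(List<String> chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : chunk) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    /** Merges the per-chunk counts by summing the counts of equal words. */
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((w, c) -> merged.merge(w, c, Integer::sum));
        }
        return merged;
    }

    /** Runs the whole pipeline; chunks are counted in parallel threads. */
    static Map<String, Integer> wordCount(String text, int parts) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        List<Map<String, Integer>> partials = split(words, parts)
                .parallelStream()
                .map(SplitCountMerge::count)
                .collect(Collectors.toList());
        return merge(partials);
    }

    public static void main(String[] args) {
        System.out.println(wordCount("to be or not to be", 4));
    }
}
```

Each chunk is counted without looking at any other chunk, which is what makes the counting step easy to distribute; only the merge needs to see all partial results.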
![Page 4: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/4.jpg)
Map Reduce
- Map:
  - Accepts an input key/value pair
  - Emits intermediate key/value pairs
- Reduce:
  - Accepts an intermediate key/value* pair (a key together with all values emitted for that key)
  - Emits output key/value pairs
[Figure: very big data -> MAP -> partitioning function -> REDUCE -> result]
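The two signatures above can be written down as Java interfaces with a small in-memory driver. This is a minimal sketch, not Google's or Hadoop's implementation: `MiniMapReduce`, `MapFn`, `ReduceFn`, and `run` are hypothetical names, everything runs in one process, and a sorted map stands in for the shuffle between the map and reduce phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Hypothetical in-memory driver illustrating the map -> group -> reduce flow. */
class MiniMapReduce {

    /** Map: one input key/value pair in, any number of intermediate pairs out. */
    interface MapFn<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }

    /** Reduce: one intermediate key plus all of its values in, output pairs out. */
    interface ReduceFn<K2, V2, K3, V3> {
        void reduce(K2 key, List<V2> values, BiConsumer<K3, V3> emit);
    }

    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            List<Map.Entry<K1, V1>> input,
            MapFn<K1, V1, K2, V2> mapper,
            ReduceFn<K2, V2, K3, V3> reducer) {
        // Map phase plus shuffle: collect every emitted value under its intermediate key.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input) {
            mapper.map(record.getKey(), record.getValue(),
                    (k, v) -> groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        // Reduce phase: one call per distinct intermediate key, in key order.
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue(),
                    (k, v) -> output.add(new SimpleEntry<>(k, v)));
        }
        return output;
    }

    /** Word count expressed with the driver: input pairs are (document name, text). */
    static List<Map.Entry<String, Integer>> wordCount(List<Map.Entry<String, String>> docs) {
        return MiniMapReduce.<String, String, String, Integer, String, Integer>run(
                docs,
                (name, text, emit) -> {
                    for (String w : text.trim().split("\\s+")) {
                        if (!w.isEmpty()) {
                            emit.accept(w, 1);
                        }
                    }
                },
                (word, ones, emit) -> {
                    int sum = 0;
                    for (int one : ones) {
                        sum += one;
                    }
                    emit.accept(word, sum);
                });
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> docs = new ArrayList<>();
        docs.add(new SimpleEntry<>("doc1", "to be or not to be"));
        System.out.println(wordCount(docs));
    }
}
```

The user supplies only the two functions; grouping values by key between the phases is the driver's job, which is the part a real system distributes across machines.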
![Page 5: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/5.jpg)
![Page 6: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/6.jpg)
![Page 7: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/7.jpg)
Partitioning Function
![Page 8: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/8.jpg)
Partitioning Function (2)
- Default: hash(key) mod R
- Guarantee:
  - Relatively well-balanced partitions
  - Ordering guarantee within partition
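The default rule hash(key) mod R is short enough to sketch directly. The class and method names below are hypothetical; the only subtlety is that Java's `%` can return a negative result for a negative `hashCode()`, so the sign bit is masked off first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the default partitioning rule: hash(key) mod R. */
class HashPartitionSketch {

    /** Returns the reduce partition in the range 0..R-1 for the given key. */
    static int partition(Object key, int numReduceTasks) {
        // Clear the sign bit so a negative hashCode() cannot produce a negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Groups keys by the partition they are assigned to. */
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partition(key, numReduceTasks), p -> new ArrayList<>())
                    .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        System.out.println(assign(Arrays.asList("to", "be", "or", "not"), 3));
    }
}
```

Because the partition depends only on the key, every occurrence of a key lands at the same reducer, and a reasonable hash function spreads distinct keys roughly evenly over the R partitions.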
Distributed Sort
- Map: emit(key, value)
- Reduce (with R=1): emit(key, value)
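Both functions in the sort are identities; the sorting comes from the ordering guarantee within a partition, and with R=1 there is only one partition. A minimal single-process sketch, assuming a `TreeMap` as a stand-in for the framework's sorted shuffle (the class name `SortSketch` is hypothetical):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: sorting falls out of the ordered shuffle when R = 1. */
class SortSketch {

    static List<Map.Entry<String, String>> sort(List<Map.Entry<String, String>> records) {
        // Map: identity, emit(key, value). The TreeMap plays the role of the shuffle,
        // which hands keys to the single reducer in sorted order.
        TreeMap<String, List<String>> shuffled = new TreeMap<>();
        for (Map.Entry<String, String> record : records) {
            shuffled.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                    .add(record.getValue());
        }
        // Reduce (R = 1): identity, emit(key, value) for every value of each key.
        List<Map.Entry<String, String>> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : shuffled.entrySet()) {
            for (String value : group.getValue()) {
                output.add(new SimpleEntry<>(group.getKey(), value));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("pear", "3"));
        records.add(new SimpleEntry<>("apple", "1"));
        records.add(new SimpleEntry<>("fig", "2"));
        System.out.println(sort(records));
    }
}
```

With more than one reducer, each output partition would be sorted on its own, but the default hash partitioning would not give a total order across partitions.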
Distributed Word Count
- Map: for all w in value do emit(w, 1)
- Reduce: emit(key, sum(value*))
![Page 9: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/9.jpg)
MapReduce

    class MapReduce {
        class Mapper … {
            // Map code
        }
        class Reducer … {
            // Reduce code
        }
        public static void main(String[] args) {
            JobConf conf = new JobConf(MapReduce.class);
            // Other code
        }
    }
![Page 10: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/10.jpg)
MapReduce Transparencies
Plus Google Distributed File System:
- Parallelization
- Fault-tolerance
- Locality optimization
- Load balancing
![Page 11: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/11.jpg)
MapReduce outside Google
- Hadoop (Java)
  - Emulates MapReduce and GFS
- The architecture of Hadoop MapReduce and DFS is master/slave

|           | Master     | Slave       |
|-----------|------------|-------------|
| MapReduce | jobtracker | tasktracker |
| DFS       | namenode   | datanode    |
![Page 12: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/12.jpg)
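Example Word Count: Map

    // Imports for the enclosing source file.
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    // Uses the raw (non-generic) org.apache.hadoop.mapred interfaces, as on the slide.
    public static class MapClass extends MapReduceBase implements Mapper {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(WritableComparable key, Writable value,
                        OutputCollector output, Reporter reporter) throws IOException {
            String line = ((Text) value).toString();
            StringTokenizer itr = new StringTokenizer(line);
            // Emit (word, 1) for every whitespace-separated token in the line.
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }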
![Page 13: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/13.jpg)
Example Word Count: Reduce

    // Also needs java.util.Iterator in the enclosing source file.
    public static class Reduce extends MapReduceBase implements Reducer {
        public void reduce(WritableComparable key, Iterator values,
                           OutputCollector output, Reporter reporter) throws IOException {
            // Sum all counts collected for this word.
            int sum = 0;
            while (values.hasNext()) {
                sum += ((IntWritable) values.next()).get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }
![Page 14: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/14.jpg)
Example Word Count: Main

    public static void main(String[] args) throws IOException {
        // Checking goes here
        JobConf conf = new JobConf();
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputPath(new Path(args[0]));
        conf.setOutputPath(new Path(args[1]));
        JobClient.runJob(conf);
    }
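The main method registers `Reduce` as the combiner as well as the reducer. That works for word count because summing is associative and commutative, so partial sums computed on the map side can be summed again later. The following plain-Java sketch (no Hadoop dependency; `CombinerSketch` and its methods are hypothetical names) compares the two paths:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical sketch of why the summing reducer can also serve as a combiner. */
class CombinerSketch {

    /** Map: emit (word, 1) for every token in a line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    /** Reduce logic: sum the values of each key. Used as both combiner and reducer. */
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    /** Every (word, 1) pair travels to the reducer unchanged. */
    static Map<String, Integer> withoutCombiner(List<String> lines) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String line : lines) {
            shuffled.addAll(map(line));
        }
        return sumByKey(shuffled);
    }

    /** Each map output is pre-summed locally before it travels to the reducer. */
    static Map<String, Integer> withCombiner(List<String> lines) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String line : lines) {
            shuffled.addAll(sumByKey(map(line)).entrySet());
        }
        return sumByKey(shuffled);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to do");
        System.out.println(withoutCombiner(lines).equals(withCombiner(lines)));
    }
}
```

For the line "to be or not to be" the map step emits six pairs, and local combining shrinks them to four before they are shuffled. A reduce function that is not associative and commutative (an average, for example) could not be reused as a combiner this way.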
![Page 15: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/15.jpg)
- One-time setup
  - Set hadoop-site.xml and slaves
  - Initiate namenode
- Run Hadoop MapReduce and DFS
- Upload your data to DFS
- Run your process…
- Download your data from DFS
![Page 16: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/16.jpg)
Summary
- A simple programming model for processing large data sets on large clusters of computers
- Fun to use: focus on the problem, and let the library deal with the messy details
![Page 17: Introduction to Google MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022032006/5681325e550346895d98eead/html5/thumbnails/17.jpg)
References
- Original paper (http://labs.google.com/papers/mapreduce.html)
- On Wikipedia (http://en.wikipedia.org/wiki/MapReduce)
- Hadoop - MapReduce in Java (http://lucene.apache.org/hadoop/)
- Starfish - MapReduce in Ruby (http://rufy.com/starfish/)