Map Reduce Basics
Posted on 22-Jan-2018
Abhishek Mukherjee, Utkarsh Srivastava
13th September
Not everything that can be counted counts, and not everything that counts can be counted.
WELCOME TO BIG DATA TRAINING
What are we going to cover today?
Uses of Big Data
What is Hadoop?
Short intro to the HDFS architecture.
What is Map Reduce?
The components of Map Reduce Algorithm
The "Hello World" of Map Reduce, i.e. the Word Count algorithm
Tips and tricks for Map Reduce
Distribution of Twitter data to test Map Reduce jars
Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.
Lots of data (terabytes, petabytes or even zettabytes)
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
An airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time.
What is Big Data?
HDFS ARCHITECTURE
HDFS ARCHITECTURE CONTD.
Map Phase
Combiner Phase (Optional)
Sort Phase
Shuffle Phase
Partition Phase (Optional)
Reducer Phase
Key points
Map Reduce Algorithm
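The two phases marked optional above can be sketched as a local Python simulation. This is not Hadoop code: `combine` and `partition` are hypothetical helper names. The combiner pre-sums the (word, 1) pairs produced by a single mapper so fewer pairs need to be shuffled; the partitioner decides which reducer receives a key. Hadoop's default partitioner hashes the key modulo the number of reduce tasks; here CRC32 is an assumed stand-in for that hash, because Python's built-in `hash()` for strings changes from run to run.

```python
import zlib
from collections import Counter
from typing import List, Tuple

def combine(pairs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Combiner: locally sum the counts emitted by ONE mapper before the shuffle.
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def partition(key: str, num_reducers: int) -> int:
    # Partitioner: map a key to a reducer index in [0, num_reducers).
    # CRC32 is an assumption chosen because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

mapper_output = [("Hello", 1), ("my", 1), ("Hello", 1)]
combined = combine(mapper_output)
print(combined)  # [('Hello', 2), ('my', 1)]
```

Because every occurrence of a key hashes to the same index, all counts for one word end up at the same reducer, which is what lets the reducer produce a single total per word.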
Imagine this as the input file:
Hello my name is abhishek Hello my name is utsav
Hello my passion is cricket
Map Phase
This file has 2 lines. Each line has its own byte offset, which serves as the key passed to the mapper; the value passed to the mapper is the text of the line itself.
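The map step described above can be sketched in plain Python. The names `read_records` and `word_count_map` are hypothetical; `read_records` mimics how a line-oriented input format (such as Hadoop's `TextInputFormat`) hands each line to the mapper as a (byte offset, line) pair, and the mapper emits (word, 1) for every whitespace-separated word.

```python
from typing import Iterator, Tuple

def read_records(text: str) -> Iterator[Tuple[int, str]]:
    # Yield (byte offset of the line, line text) for every line in the file.
    offset = 0
    for raw in text.splitlines(keepends=True):
        yield offset, raw.rstrip("\r\n")
        offset += len(raw.encode("utf-8"))

def word_count_map(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    # The offset key is not needed for word count; only the line text is used.
    for word in line.split():
        yield word, 1

input_file = ("Hello my name is abhishek Hello my name is utsav\n"
              "Hello my passion is cricket\n")

map_output = [pair
              for offset, line in read_records(input_file)
              for pair in word_count_map(offset, line)]
print(map_output[:3])  # [('Hello', 1), ('my', 1), ('name', 1)]
```

The second line starts at byte offset 49, since the first line is 48 characters plus a newline.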
Output of the map phase:
Hello 1
my 1
name 1
is 1
abhishek 1
Hello 1
my 1
name 1
is 1
utsav 1
Hello 1
my 1
passion 1
is 1
cricket 1
Hello(1,1,1)
my(1,1,1)
name(1,1)
is(1,1,1)
abhishek(1)
utsav(1)
passion(1)
cricket(1)
Key(tuple of values)
The key points are as follows:
Sort the key-value pairs by key.
Shuffle the mapped output so that all values with the same key are grouped into a single tuple.
This output is fed to the reducer, which returns a single value for each key's tuple of values (here, the sum of the 1s).
Explanation of the sort and shuffle phases
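The sort and shuffle steps above can be sketched in memory as follows. `sort_and_shuffle` is a hypothetical name, and on a real cluster the shuffle also moves data over the network to the reducers, which this sketch does not model. Sorting case-insensitively is an assumption made only to reproduce the alphabetical order used on these slides; Hadoop's default ordering for `Text` keys compares raw bytes, so "Hello" would sort before the lowercase words.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def sort_and_shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Sort phase: order the (key, value) pairs by key, ignoring case.
    ordered = sorted(pairs, key=lambda kv: kv[0].lower())
    # Shuffle phase: gather every value that shares a key into one list.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in ordered:
        groups[key].append(value)
    return dict(groups)

lines = ["Hello my name is abhishek Hello my name is utsav",
         "Hello my passion is cricket"]
map_output = [(word, 1) for line in lines for word in line.split()]
grouped = sort_and_shuffle(map_output)
print(grouped)
```

Here `grouped["Hello"]` is `[1, 1, 1]` and `grouped["name"]` is `[1, 1]`, one list per key, ready for the reducer.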
Reducer Phase
Input to the reducer:
Hello(1,1,1)
my(1,1,1)
name(1,1)
is(1,1,1)
abhishek(1)
utsav(1)
passion(1)
cricket(1)
Key(tuple of values)
Output of the reducer:
abhishek(1)
cricket(1)
Hello(3)
is(3)
my(3)
name(2)
passion(1)
utsav(1)
Key(single value)
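The reducer step above can be sketched as a function that collapses each key's tuple of 1s into a single count. `word_count_reduce` is a hypothetical name, and a plain dict stands in for the grouped input that a real framework would stream to the reducer one key at a time. The output is listed alphabetically ignoring case, an assumption made to match the slide.

```python
from typing import Dict, Iterable, List, Tuple

def word_count_reduce(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # Collapse the tuple of values for one key into a single value (their sum).
    return key, sum(values)

# Reducer input: the grouped output of the sort and shuffle phases.
reducer_input: Dict[str, List[int]] = {
    "Hello": [1, 1, 1], "my": [1, 1, 1], "name": [1, 1], "is": [1, 1, 1],
    "abhishek": [1], "utsav": [1], "passion": [1], "cricket": [1],
}

# One output record per key, ordered alphabetically ignoring case.
result: List[Tuple[str, int]] = [
    word_count_reduce(key, values)
    for key, values in sorted(reducer_input.items(), key=lambda kv: kv[0].lower())
]
for word, count in result:
    print(f"{word}({count})")
```

Summing works here because every mapped value is 1; if a combiner had already pre-summed some counts, the same reducer would still give the correct totals.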
ANY QUERIES?