Analysing Big Data Using MapReduce
TRANSCRIPT
1
Efficient Analysis of Big Data Using Map Reduce Framework
Presented by Rajshekhar (1BY14SCS15)
Under the guidance of Guru Prasad S
2
Outline
• Abstract
• Introduction
• Goals & Challenges
• Analyzing Data
• Applications
• HDFS
• Big Data Analytics
• MapReduce
• Conclusion
• References
3
Abstract
• Data now streams into daily life from phones, credit cards, televisions, and computers (especially from the Internet).
• The data flows fast: about five exabytes of data are generated every day!
• This huge collection of data is known as big data: data that is too diverse, fast-changing, and massive.
• It is difficult for the current computing infrastructure to handle big data.
• To overcome this drawback, Google introduced the "MapReduce" framework.
4
Introduction
• Big Data deals with large and complex datasets that can be structured, semi-structured, or unstructured.
• Big Data is so large that it is difficult to process with traditional databases and other software techniques. How can such large datasets be explored and analyzed?
• Analyzing big data is one of the challenges facing researchers and academicians, and it requires special analysis techniques.
• Hadoop MapReduce is a technique for analyzing big data. Hadoop programs perform two distinct tasks: Map and Reduce.
5
3 C's in Big Data
6
Goals and Challenges
Goals: The main goals of high-dimensional data analysis are:
• To develop effective methods that can accurately predict future observations.
• To explore the hidden structures of each subpopulation of the data.
• To extract important common features across many subpopulations.
7
Continued….
Challenges:
A. Meeting the need for speed.
B. Understanding the data.
C. Addressing data quality.
D. Displaying meaningful results.
E. Dealing with outliers.
8
Applications
• The Aadhaar project by the Govt. of India uses Hadoop.
• New applications that are becoming possible in the Big Data era include:
A. Personalized services.
B. Internet security.
C. Personalized medicine.
9
HDFS (Hadoop Distributed File System)
• Designed to hold very large amounts of data (petabytes or even zettabytes) and to provide high-throughput access to this information.
Characteristics:
• Fault tolerant.
• Runs on commodity hardware.
• Able to handle large datasets.
• Master-slave paradigm.
• Write-once file access only.
HDFS components:
• NameNode.
• DataNode.
• Secondary NameNode.
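The characteristics above (master-slave coordination, large files, fault tolerance through replication) can be illustrated with a toy model. This is only a sketch of the idea, not the HDFS API: the block size, replication factor, and DataNode names below are illustrative (real HDFS defaults are far larger, e.g. 128 MB blocks with replication factor 3).

```python
# Toy model of HDFS-style block splitting and replica placement.
# All sizes and node names are illustrative, not real HDFS defaults.

BLOCK_SIZE = 4            # bytes per block (toy value)
REPLICATION = 3           # copies kept of each block
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks, as the NameNode plans."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, datanodes=DATANODES, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes (round-robin toy policy)."""
    placement = {}
    for idx, _ in enumerate(blocks):
        placement[idx] = [datanodes[(idx + r) % len(datanodes)]
                          for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hdfs!")
placement = place_replicas(blocks)
print(blocks)      # [b'hell', b'o hd', b'fs!']
print(placement)   # block index -> the 3 DataNodes holding a copy
```

Because every block lives on several DataNodes, the loss of one machine (commodity hardware fails often) never loses data: the NameNode simply re-replicates from a surviving copy.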
10
HDFS continued….
Fig: HDFS architecture
11
BIG DATA ANALYTICS
• "The process of collecting, organizing, and analyzing large sets of data."
• The aim is to discover patterns and other useful information.
• It also helps organizations identify which data is most useful to them.
• Big data analysts chiefly want the knowledge that comes from analyzing the data.
12
MAP REDUCE
• Invented by Google.
• A programming model for processing large datasets distributed over a large cluster.
• MapReduce is the heart of Hadoop.
• Uses the concept of divide and conquer.
• Two methods: map() and reduce().
• map(): sorting and filtering.
• reduce(): counting and producing the result.
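The divide-and-conquer split between the two methods can be sketched in miniature with Python's built-in map() and functools.reduce(). This shows only the programming model, not Hadoop itself: Hadoop runs the same two phases distributed across a cluster, and the records and functions below are made-up examples.

```python
from functools import reduce

# The MapReduce programming model in miniature:
# a map phase transforms each record independently,
# then a reduce phase combines the results into one answer.

records = [3, 41, 7, 98, 12]

# Map phase: process each record on its own (trivially parallelisable).
mapped = list(map(lambda x: x * x, records))

# Reduce phase: combine the mapped values into a single result.
total = reduce(lambda acc, x: acc + x, mapped, 0)

print(total)  # sum of squares of the records
```

Because each map call touches only one record, the map phase can be scattered across many machines; only the reduce phase needs to see the combined results.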
13
Mapreduce continued
Fig: MapReduce architecture (input data flows through parallel map() tasks, then reduce() tasks, to the output data)
14
15
MapReduce algorithms:
• MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
• For example, Twitter data was processed on different servers on the basis of months.
• Hadoop is the physical implementation of MapReduce.
• It is a combination of two Java functions: Mapper() and Reducer().
• Example: checking the popularity of text (word count).
16
Continued….
The Mapper function maps the split files and provides input to the Reducer:

Mapper(filename, file-contents):
    for each word in file-contents:
        emit(word, 1)

The Reducer function combines the input provided by the Mapper and produces the output:

Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
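The word-count pseudocode above can be run in-memory as plain Python. This is a sketch of the algorithm, not the Hadoop Java API: in real Hadoop the framework shuffles the (word, 1) pairs between the Map and Reduce phases across the cluster, whereas here a dictionary plays that role.

```python
from collections import defaultdict

def mapper(file_contents):
    """Emit (word, 1) for each word, as in Mapper(filename, file-contents)."""
    return [(word, 1) for word in file_contents.split()]

def shuffle(pairs):
    """Group emitted values by key -- the step Hadoop performs between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reducer(word, values):
    """Sum the counts for one word, as in Reducer(word, values)."""
    return (word, sum(values))

text = "big data needs big tools"
counts = dict(reducer(w, vs) for w, vs in shuffle(mapper(text)).items())
print(counts)  # {'big': 2, 'data': 1, 'needs': 1, 'tools': 1}
```

The same mapper can run on many file splits at once; only the per-word groups must be brought together before the reducer sums them.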
17
Conclusion
• MapReduce is simple but provides good scalability and fault tolerance for massive data processing.
• Analysis tools like MapReduce over Hadoop guarantee faster advances in many scientific disciplines and improved profitability and success for many enterprises.
• MapReduce has received a lot of attention in many fields, including data mining, information retrieval, image retrieval, and pattern recognition.
18
References
[1] Hadoop, "Powered by Hadoop", http://wiki.apache.org/hadoop/Poweredby.
[2] Hadoop Tutorial, Yahoo Inc., https://developer.yahoo.com/hadoop/tutorial/index.html.
[3] Apache: Apache Hadoop, http://hadoop.apache.org.
[4] Hadoop Distributed File System (HDFS), http://hortonworks.com/hadoop/hdfs/.
[5] Jianqing Fan, Fang Han and Han Liu, "Challenges of Big Data Analysis", National Science Review, Advance Access, February 2014.
[6] Hadoop MapReduce, http://hadooptutorial.wikispaces.com/MapReduce.
[7] Jens Dittrich and Jorge-Arnulfo Quiané-Ruiz, "Efficient Big Data Processing in Hadoop MapReduce".
End of Presentation.