Analysing Big Data Using MapReduce
TRANSCRIPT
1
Efficient Analysis of Big Data Using Map Reduce Framework
Presented by Rajshekhar (1BY14SCS15)
Under the guidance of Guru Prasad S
2
Outline
• Abstract
• Introduction
• Goals & Challenges
• Analyzing Data
• Applications
• HDFS
• Big Data Analytics
• MapReduce
• Conclusion
• References
3
Abstract
• Data now streams into daily life from phones, credit cards, televisions, and computers (especially from the Internet).
• The data flows fast: about five exabytes of data are generated every day!
• This huge collection of data is known as big data: data that is too diverse, fast-changing, and massive.
• It is difficult for the current computing infrastructure to handle big data.
• To overcome this drawback, Google introduced the "MapReduce" framework.
4
Introduction
• Big Data deals with large and complex datasets that can be structured, semi-structured, or unstructured.
• Big Data is so large that it is difficult to process with traditional databases and other software techniques. How can such large datasets be explored and analyzed?
• Analyzing big data is one of the challenges facing researchers and academicians, and it requires special analysis techniques.
• Hadoop MapReduce is a technique for analyzing big data. Hadoop programs perform two distinct tasks: Map and Reduce.
5
3 C's in Big Data
6
Goals and Challenges
Goals: The main goals of high-dimensional data analysis are:
• To develop effective methods that can accurately predict future observations.
• To explore the hidden structures of each subpopulation of the data.
• To extract important common features across many subpopulations.
7
Continued….
Challenges:
A. Meeting the need for speed.
B. Understanding the data.
C. Addressing data quality.
D. Displaying meaningful results.
E. Dealing with outliers.
8
Applications
• The Aadhaar project by the Govt. of India uses Hadoop.
• New applications that are becoming possible in the Big Data era include:
A. Personalized services.
B. Internet security.
C. Personalized medicine.
9
HDFS (Hadoop Distributed File System)
• Designed to hold very large amounts of data (petabytes or even zettabytes) and to provide high-throughput access to this information.
Characteristics:
• Fault tolerant.
• Runs on commodity hardware.
• Able to handle large datasets.
• Master-slave paradigm.
• Write-once file access only.
HDFS components:
• NameNode.
• DataNode.
• Secondary NameNode.
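The characteristics above (master-slave coordination, large files, fault tolerance through replication) can be illustrated with a toy model. This is only a sketch of the idea, not the HDFS API: the block size, replication factor, and DataNode names below are illustrative (real HDFS defaults are far larger, e.g. 128 MB blocks with replication factor 3).

```python
# Toy model of HDFS-style block splitting and replica placement.
# All sizes and node names are illustrative, not real HDFS defaults.

BLOCK_SIZE = 4            # bytes per block (toy value)
REPLICATION = 3           # copies kept of each block
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks, as the NameNode plans."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, datanodes=DATANODES, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes (round-robin toy policy)."""
    placement = {}
    for idx, _ in enumerate(blocks):
        placement[idx] = [datanodes[(idx + r) % len(datanodes)]
                          for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hdfs!")
placement = place_replicas(blocks)
print(blocks)      # [b'hell', b'o hd', b'fs!']
print(placement)   # block index -> the 3 DataNodes holding a copy
```

Because every block lives on several DataNodes, the loss of one machine (commodity hardware fails often) never loses data: the NameNode simply re-replicates from a surviving copy.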
10
HDFS continued….
Fig: HDFS architecture
11
BIG DATA ANALYTICS
• "The process of collecting, organizing, and analyzing large sets of data."
• The aim is to discover patterns and other useful information.
• It also helps organizations identify which data is most useful to them.
• Big data analysts chiefly want the knowledge that comes from analyzing the data.
12
MAP REDUCE
• Invented by Google.
• A programming model for processing large datasets distributed over a large cluster.
• MapReduce is the heart of Hadoop.
• Uses the concept of divide and conquer.
• Two methods: map() and reduce().
• map(): sorting and filtering.
• reduce(): counting and producing the result.
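The divide-and-conquer split between the two methods can be sketched in miniature with Python's built-in map() and functools.reduce(). This shows only the programming model, not Hadoop itself: Hadoop runs the same two phases distributed across a cluster, and the records and functions below are made-up examples.

```python
from functools import reduce

# The MapReduce programming model in miniature:
# a map phase transforms each record independently,
# then a reduce phase combines the results into one answer.

records = [3, 41, 7, 98, 12]

# Map phase: process each record on its own (trivially parallelisable).
mapped = list(map(lambda x: x * x, records))

# Reduce phase: combine the mapped values into a single result.
total = reduce(lambda acc, x: acc + x, mapped, 0)

print(total)  # sum of squares of the records
```

Because each map call touches only one record, the map phase can be scattered across many machines; only the reduce phase needs to see the combined results.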
13
Mapreduce continued
Fig: MapReduce architecture (input data flows through parallel map() tasks, then reduce() tasks, to the output data)
14
15
MapReduce algorithms:
• MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
• For example, Twitter data was processed on different servers on the basis of months.
• Hadoop is the physical implementation of MapReduce.
• It is a combination of two Java functions: Mapper() and Reducer().
• Example: checking the popularity of text (word count).
16
Continued….
The Mapper function maps the split files and provides input to the Reducer:

Mapper(filename, file-contents):
    for each word in file-contents:
        emit(word, 1)

The Reducer function combines the input provided by the Mapper and produces the output:

Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
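The word-count pseudocode above can be run in-memory as plain Python. This is a sketch of the algorithm, not the Hadoop Java API: in real Hadoop the framework shuffles the (word, 1) pairs between the Map and Reduce phases across the cluster, whereas here a dictionary plays that role.

```python
from collections import defaultdict

def mapper(file_contents):
    """Emit (word, 1) for each word, as in Mapper(filename, file-contents)."""
    return [(word, 1) for word in file_contents.split()]

def shuffle(pairs):
    """Group emitted values by key -- the step Hadoop performs between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reducer(word, values):
    """Sum the counts for one word, as in Reducer(word, values)."""
    return (word, sum(values))

text = "big data needs big tools"
counts = dict(reducer(w, vs) for w, vs in shuffle(mapper(text)).items())
print(counts)  # {'big': 2, 'data': 1, 'needs': 1, 'tools': 1}
```

The same mapper can run on many file splits at once; only the per-word groups must be brought together before the reducer sums them.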
17
Conclusion
• MapReduce is simple but provides good scalability and fault tolerance for massive data processing.
• Analysis tools like MapReduce over Hadoop guarantee faster advances in many scientific disciplines and improved profitability and success for many enterprises.
• MapReduce has received a lot of attention in many fields, including data mining, information retrieval, image retrieval, and pattern recognition.
18
References
[1] Hadoop, "Powered by Hadoop", http://wiki.apache.org/hadoop/Poweredby.
[2] Hadoop Tutorial, Yahoo Inc., https://developer.yahoo.com/hadoop/tutorial/index.html.
[3] Apache: Apache Hadoop, http://hadoop.apache.org.
[4] Hadoop Distributed File System (HDFS), http://hortonworks.com/hadoop/hdfs/.
[5] Jianqing Fan, Fang Han and Han Liu, "Challenges of Big Data Analysis", National Science Review, Advance Access, February 2014.
[6] Hadoop MapReduce, http://hadooptutorial.wikispaces.com/MapReduce.
[7] Jens Dittrich and Jorge-Arnulfo Quiané-Ruiz, "Efficient Big Data Processing in Hadoop MapReduce".
End of Presentation.