analysing of big data using map reduce

Upload: bmsit-blore

Post on 16-Aug-2015

TRANSCRIPT

Page 1: Analysing of big data using map reduce

Efficient Analysis of Big Data Using Map Reduce Framework

Presented by Rajshekhar (1BY14SCS15)

Under the guidance of Guru Prasad S

Page 2: Analysing of big data using map reduce

Outline
• Abstract
• Introduction
• Goals & Challenges
• Analyzing Data
• Applications
• HDFS
• Big Data Analytics
• MapReduce
• Conclusion
• References

Page 3: Analysing of big data using map reduce

Abstract

• Data now streams from daily life: from phones, credit cards, televisions, and computers, and especially from the Internet.
• The data flows fast: five exabytes of data are generated every day.
• This huge collection of data is known as big data: data that is too diverse, fast-changing, and massive.
• It is difficult for current computing infrastructure to handle big data.
• To overcome this drawback, Google introduced the “MapReduce” framework.

Page 4: Analysing of big data using map reduce

Introduction

• Big Data deals with large and complex datasets that can be structured, semi-structured, or unstructured.
• Big Data is so large that it is difficult to process with traditional databases and other software techniques. How can such large datasets be explored and analyzed?
• Analyzing big data is a challenge for researchers and academicians, and it needs special analysis techniques.
• Hadoop MapReduce is a technique for analyzing big data. Hadoop programs perform two distinct tasks: Map and Reduce.

Page 5: Analysing of big data using map reduce

3 C’s in Big Data

Page 6: Analysing of big data using map reduce

Goals and Challenges

Goals: The main goals of high-dimensional data analysis are:
• To develop effective methods that can accurately predict future observations.
• To explore the hidden structures of each subpopulation of the data.
• To extract important common features across many subpopulations.

Page 7: Analysing of big data using map reduce

Continued…

Challenges:
A. Meeting the need for speed.
B. Understanding the data.
C. Addressing data quality.
D. Displaying meaningful results.
E. Dealing with outliers.

Page 8: Analysing of big data using map reduce

Applications

The Aadhaar project by the Government of India uses Hadoop.

New applications that are becoming possible in the Big Data era include:
A. Personalized services.
B. Internet security.
C. Personalized medicine.

Page 9: Analysing of big data using map reduce

HDFS (Hadoop Distributed File System)

• Designed to hold very large amounts of data (terabytes or even petabytes) and to provide high-throughput access to this information.

Characteristics:
• Fault tolerant.
• Runs on commodity hardware.
• Able to handle large datasets.
• Master-slave paradigm.
• Write-once file access only.

HDFS components:
• NameNode.
• DataNode.
• Secondary NameNode.
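The roles listed above can be illustrated with a toy model. This is only a sketch, not HDFS code: the block size (128 MB) and replication factor (3) are assumed values that match common HDFS defaults, the DataNode names are hypothetical, and the round-robin placement is a simplification of the real rack-aware policy.

```python
import math

BLOCK_SIZE_MB = 128      # assumed block size (a common HDFS default)
REPLICATION = 3          # assumed replication factor (a common HDFS default)


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the number of blocks needed to store a file of the given size."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Toy NameNode metadata: map each block id to the DataNodes holding a copy.

    Uses simple round-robin placement, NOT the real HDFS rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks that are still readable after one DataNode fails."""
    return [b for b, nodes in placement.items()
            if any(n != failed_node for n in nodes)]


datanodes = ["dn1", "dn2", "dn3", "dn4"]       # hypothetical DataNode names
blocks = split_into_blocks(file_size_mb=500)   # a 500 MB file
placement = place_blocks(blocks, datanodes)
```

Because every block has copies on several DataNodes, losing any single node still leaves all blocks readable, which is the fault-tolerance property named in the characteristics.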

Page 10: Analysing of big data using map reduce

HDFS continued….

Fig: HDFS architecture

Page 11: Analysing of big data using map reduce

BIG DATA ANALYTICS

• “The process of collecting, organizing and analyzing large sets of data” to discover patterns and other useful information.
• It also helps identify the data that matters most to the business and its future decisions.
• Big data analysts basically want the knowledge that comes from analyzing the data.

Page 12: Analysing of big data using map reduce

MAP REDUCE

• Invented by Google.
• A programming model for processing large datasets distributed across a large cluster.
• MapReduce is the heart of Hadoop.
• Uses the concept of divide and conquer.
• Two methods: map() and reduce().
• map(): filtering and sorting.
• reduce(): counting and producing the result.
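The divide-and-conquer idea above can be sketched in plain Python on a single machine. This is an illustration under stated assumptions, not Hadoop code: the input readings and the helper names split, map_chunk, and combine are hypothetical.

```python
from functools import reduce


def split(data, num_chunks):
    """Divide: cut the input list into roughly equal chunks."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_chunk(chunk):
    """Map: filter out negative readings and emit a partial (count, total)."""
    valid = [x for x in chunk if x >= 0]
    return (len(valid), sum(valid))


def combine(a, b):
    """Reduce: merge two partial results into one."""
    return (a[0] + b[0], a[1] + b[1])


readings = [4, -1, 7, 3, -5, 10, 6]          # hypothetical input data
partials = list(map(map_chunk, split(readings, 3)))
count, total = reduce(combine, partials, (0, 0))
```

Each chunk is mapped independently, so on a cluster the map calls could run on different machines; only the small partial results need to be brought together in the reduce step.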

Page 13: Analysing of big data using map reduce

MapReduce continued…

Fig: MapReduce architecture (INPUT DATA -> map( ), map( ), map( ) -> reduce( ), reduce( ) -> OUTPUT DATA)

Page 14: Analysing of big data using map reduce

Page 15: Analysing of big data using map reduce

MapReduce algorithms:

• MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
• For example, Twitter data was partitioned by month and each month was processed on a different server.
• Hadoop provides an open-source implementation of MapReduce.
• A job is a combination of two Java functions: Mapper() and Reducer().
• Example: checking the popularity of text by counting how often each word occurs.
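Splitting work into independent tasks by month, as in the Twitter example above, might look like the following sketch. The tweet records are made up, the function names are hypothetical, and local threads stand in for separate servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records: (ISO date, tweet text).
tweets = [
    ("2015-01-03", "hello hadoop"),
    ("2015-01-19", "big data"),
    ("2015-02-07", "map reduce"),
    ("2015-03-11", "hdfs blocks"),
    ("2015-03-12", "name node"),
    ("2015-03-30", "data node"),
]


def partition_by_month(records):
    """Group records by their YYYY-MM prefix so each month is an independent task."""
    partitions = defaultdict(list)
    for date, text in records:
        partitions[date[:7]].append(text)
    return dict(partitions)


def count_tweets(item):
    """Process one partition on its own; here the work is just counting tweets."""
    month, texts = item
    return month, len(texts)


partitions = partition_by_month(tweets)
with ThreadPoolExecutor(max_workers=3) as pool:   # threads stand in for servers
    tweets_per_month = dict(pool.map(count_tweets, partitions.items()))
```

No partition needs data from another one, which is what allows the tasks to run in parallel.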

Page 16: Analysing of big data using map reduce

Continued…

The Mapper function maps the split files and provides input to the Reducer.

Mapper(filename, file-contents):
    for each word in file-contents:
        emit(word, 1)

The Reducer function combines the input provided by the Mapper and produces the output.

Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
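The pseudocode above can be tried out with a small single-process simulation. This is a sketch, not the Hadoop Java API: the shuffle helper is an assumed stand-in for the grouping the framework performs between the two phases, and the file names and contents are hypothetical.

```python
from collections import defaultdict


def mapper(filename, file_contents):
    """Emit a (word, 1) pair for every word in the file contents."""
    for word in file_contents.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, values):
    """Sum the counts emitted for one word."""
    return word, sum(values)


def word_count(files):
    """Run map, shuffle, and reduce over a dict of {filename: contents}."""
    pairs = [p for name, text in files.items() for p in mapper(name, text)]
    return dict(reducer(w, vs) for w, vs in shuffle(pairs).items())


files = {  # hypothetical input splits
    "split1.txt": "big data needs big tools",
    "split2.txt": "hadoop handles big data",
}
counts = word_count(files)
```

Here counts["big"] is 3 and counts["data"] is 2, so the most frequent word across both splits is "big".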

Page 17: Analysing of big data using map reduce

Conclusion

• MapReduce is simple but provides good scalability and fault tolerance for massive data processing.
• Analysis tools such as MapReduce over Hadoop promise:
  - faster advances in many scientific disciplines, and
  - improved profitability and success for many enterprises.
• MapReduce has received a lot of attention in many fields, including data mining, information retrieval, image retrieval, and pattern recognition.

Page 18: Analysing of big data using map reduce

References

[1] Hadoop, “Powered by Hadoop”, http://wiki.apache.org/hadoop/Poweredby

[2] Hadoop Tutorial, Yahoo Inc., https://developer.yahoo.com/hadoop/tutorial/index.html

[3] Apache, Apache Hadoop, http://hadoop.apache.org

[4] Hadoop Distributed File System (HDFS), http://hortonworks.com/hadoop/hdfs/

[5] Jianqing Fan, Fang Han and Han Liu, Challenges of Big Data Analysis, National Science Review, Advance Access published February 2014.

[6] Hadoop MapReduce, http://hadooptutorial.wikispaces.com/MapReduce

[7] Jens Dittrich and Jorge-Arnulfo Quiané-Ruiz, Efficient Big Data Processing in Hadoop MapReduce.

Page 19: Analysing of big data using map reduce

End of Presentation.