hadoop hive presentation

Upload: arvind-kumar

Post on 02-Jul-2015

TRANSCRIPT

Page 1: Hadoop hive presentation

Hadoop

Page 2: Hadoop hive presentation

Agenda

• Problems with traditional large-scale systems

• Requirements for new approaches

• What is Hadoop?

• Why Hadoop?

• Overview of Hadoop

• HDFS

• Map Reduce

• Applications

• Conclusion

Page 3: Hadoop hive presentation
Page 4: Hadoop hive presentation

Problems with traditional large-scale systems

• The volume of data grows day by day

• Network failures

• Server failures

• Loss of data

• High cost

• Distributed computing requires manual processing

Page 5: Hadoop hive presentation

Requirements for new approaches

• Data should be stored in a distributed manner and processed in parallel

• High performance at low cost

• Should be scalable

• Should be simple to access and process

• Should be fault tolerant

Page 6: Hadoop hive presentation

What is Hadoop?

An open-source framework

Processes large amounts of data

Page 7: Hadoop hive presentation
Page 8: Hadoop hive presentation

Why Hadoop?

• Accessible

• Scalable

• Robust

• Simple

Page 9: Hadoop hive presentation

Overview of Hadoop

It handles 3 types of data

Structured

Semi-structured

Unstructured

Analyses and processes large amounts of data (petabyte scale)

Page 10: Hadoop hive presentation

Comparison with traditional databases

RDBMS

• Stores gigabytes of data

• Supports both batch and interactive processing

• Allows updates

• Schemas must be defined

• Only structured data

HADOOP

• Stores petabytes of data

• Batch processing only

• Does not allow updates; it follows WORM (write once, read many)

• Schemas not required

• Supports all three types of data (structured, semi-structured, unstructured)

Page 11: Hadoop hive presentation
Page 12: Hadoop hive presentation
Page 13: Hadoop hive presentation

Components

Hadoop can be divided into 2 parts

1. HDFS – Hadoop Distributed File System

2. MapReduce Programming model

Page 14: Hadoop hive presentation

Hadoop Distributed File System

It is a distributed file system

Runs on commodity hardware

Provides high-throughput access to application data

Suitable for applications that have large data sets

Designed to store very large amounts of data (terabytes or petabytes)
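
The slides do not say how HDFS lays out such large files. As a rough back-of-the-envelope sketch, assuming the commonly documented HDFS defaults of 128 MB blocks and three replicas per block (both are assumptions here, not figures from the slides, and both are configurable on a real cluster), the storage footprint of a file could be estimated like this. The function name `hdfs_footprint` is made up for illustration.

```python
BLOCK_SIZE_BYTES = 128 * 1024 * 1024  # assumed block size (128 MB)
REPLICATION_FACTOR = 3                # assumed number of copies per block


def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE_BYTES,
                   replication=REPLICATION_FACTOR):
    """Return (number of blocks, raw bytes stored across the cluster) for one file."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    num_blocks = (file_size_bytes + block_size - 1) // block_size
    # The last block is not padded, so raw usage is simply size times replicas.
    raw_bytes = file_size_bytes * replication
    return num_blocks, raw_bytes


if __name__ == "__main__":
    one_tb = 1024 ** 4
    blocks, raw = hdfs_footprint(one_tb)
    print(blocks, raw // 1024 ** 4)  # 8192 blocks, 3 TB of raw storage
```

Under these assumptions a 1 TB file is spread over 8192 blocks, which is why the data can be placed on many commodity machines rather than one large server.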

Page 15: Hadoop hive presentation
Page 16: Hadoop hive presentation

Core Architectural Goal of HDFS

An HDFS instance may consist of thousands of server machines, so at that scale some component is likely to be failing at any given time.

Detecting faults and recovering from them quickly and automatically

Page 17: Hadoop hive presentation

MapReduce Programming Model

MapReduce applies a divide-and-conquer approach to the data.

Schedules execution across a set of machines

Manages inter-process communication

The reducers process the output of all mappers and produce the final output
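
The divide-and-conquer idea on this slide can be illustrated with a small local sketch. It is not Hadoop code: threads stand in for the machines of a cluster, and the helper names (`split_into_chunks`, `process_chunk`, `combine`, `run_job`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_chunks):
    """Divide: cut the input into roughly equal contiguous chunks."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_chunk(chunk):
    """Conquer: each worker handles its chunk independently (here, a partial sum)."""
    return sum(chunk)


def combine(partials):
    """Combine: a single final step merges the partial results."""
    return sum(partials)


def run_job(records, num_workers=4):
    chunks = split_into_chunks(records, num_workers)
    # Threads stand in for the separate machines a real cluster would use.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return combine(partials)


if __name__ == "__main__":
    print(run_job(list(range(1, 101))))  # sum of 1..100 -> 5050
```

In a real cluster the framework, not the programmer, decides where each chunk runs and moves the partial results between machines; the sketch only shows the shape of the computation.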

Page 18: Hadoop hive presentation

MapReduce Programming Model

– MAP

• map() function that processes a key/value pair to generate a set of intermediate key/value pairs

– REDUCE

• reduce() function that merges all intermediate values associated with the same intermediate key
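
A word count is a common way to illustrate the two functions. The sketch below runs in a single Python process and does not use the Hadoop API; `map_fn`, `reduce_fn` and `run_mapreduce` are illustrative names, and the grouping step in the middle is written out by hand, whereas a framework would normally perform it between the two phases.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: turn one input (key, value) pair into intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(lines):
    # Map phase: the input key is the line number, the value is the line text.
    intermediate = []
    for line_no, line in enumerate(lines):
        intermediate.extend(map_fn(line_no, line))

    # Shuffle: group intermediate values by key (a framework does this step).
    grouped = defaultdict(list)
    for word, count in intermediate:
        grouped[word].append(count)

    # Reduce phase: one reduce call per distinct intermediate key.
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_mapreduce(["Hadoop stores data", "Hadoop processes data"]))
```

Because each map call looks at only one line and each reduce call at only one word, both phases could be spread over many machines without the calls needing to talk to each other.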

Page 19: Hadoop hive presentation
Page 20: Hadoop hive presentation

Applications

Page 21: Hadoop hive presentation
Page 22: Hadoop hive presentation
Page 23: Hadoop hive presentation
Page 24: Hadoop hive presentation
Page 25: Hadoop hive presentation

Page 26: Hadoop hive presentation

Conclusion

Page 27: Hadoop hive presentation