hadoop hive presentation

Hadoop

Agenda

• Problems with traditional large-scale systems

• Requirements for new approaches

• What is Hadoop?

• Why Hadoop?

• Overview of Hadoop

• HDFS

• Map Reduce

• Applications

• Conclusion

Problems with traditional large-scale systems

• Data volume grows day by day

• Network failures

• Server failures

• Loss of data

• High cost

• Distributed computing requires manual processing

Requirements for new approaches

• Data should be stored in a distributed manner and processed in parallel

• High performance at low cost

• Scalable

• Simple to access and process

• Fault tolerant

What is Hadoop?

• Open-source framework

• Processes large amounts of data

Why Hadoop?

• Accessible

• Scalable

• Robust

• Simple

Overview of Hadoop

It handles 3 types of data

Structured

Semi-structured

Unstructured

Analyzes and processes large amounts of data (petabyte scale)

Comparison with traditional databases

RDBMS

• Stores gigabytes of data

• Supports both batch and interactive processing

• Allows updates

• Schemas must be defined

• Only structured data

HADOOP

• Stores petabytes of data

• Supports batch processing only

• Does not allow updates; it follows WORM (write once, read many)

• Schemas not required

• Supports all 3 types of data

Components

Hadoop can be divided into 2 parts

1. HDFS – Hadoop Distributed File System

2. MapReduce programming model

Hadoop Distributed File System

It is a distributed file system

Runs on commodity hardware

Provides high throughput access to application data

Suitable for applications that have large data sets

Designed to store very large amounts of data (terabytes or petabytes)

Core Architectural Goal of HDFS

An HDFS instance may consist of thousands of server machines.

The core goal is to detect faults and recover from them quickly and automatically.
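One common way to automate fault detection in a cluster is a heartbeat timeout: each node reports in periodically, and a node that stays silent for too long is treated as failed. The sketch below illustrates that idea only. The slides do not describe how HDFS implements it, so the class name, node names, and timeout value are all hypothetical.

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds; hypothetical value chosen for illustration


class ClusterMonitor:
    """Tracks the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node, now=None):
        # Record that a node reported in; `now` can be injected for testing.
        self.last_seen[node] = time.monotonic() if now is None else now

    def dead_nodes(self, now=None):
        # A node is considered failed once it has been silent longer than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = ClusterMonitor()
monitor.heartbeat("node-1", now=0.0)
monitor.heartbeat("node-2", now=0.0)
monitor.heartbeat("node-1", now=4.0)  # node-1 reports again, node-2 stays silent
failed = monitor.dead_nodes(now=5.0)
print(failed)
```

Once a node is flagged this way, an automated system can start recovery without waiting for an operator.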

MapReduce Programming Model

MapReduce applies a divide-and-conquer approach to the data.

Schedules execution across a set of machines

Manages inter-process communication

The reducer processes the output of all mappers and produces the final output
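The divide-and-conquer idea can be sketched on a single machine: split the input into chunks, process each chunk independently, then combine the partial results. This is an illustration only, with a thread pool standing in for the set of machines; the helper names (split, map_chunk, combine) are hypothetical and not part of Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(data, parts):
    """Divide the input into roughly equal chunks."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_chunk(chunk):
    """Process one chunk independently of the others (here: a partial sum)."""
    return sum(chunk)


def combine(a, b):
    """Merge two partial results into one."""
    return a + b


numbers = list(range(1, 101))
chunks = split(numbers, 4)

# The pool plays the role of the cluster: each worker handles one chunk.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(map_chunk, chunks))

total = reduce(combine, partials)
print(total)  # 5050
```

In a real cluster the framework, not the programmer, decides which machine handles which chunk and moves the partial results between them.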

MapReduce Programming Model

MAP

• map() function that processes a key/value pair to generate a set of intermediate key/value pairs

REDUCE

• reduce() function that merges all intermediate values associated with the same intermediate key
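The two functions above can be illustrated with the classic word-count example. This is a minimal in-memory sketch in plain Python, not Hadoop's API: map_fn, reduce_fn, and run_job are hypothetical names, and the grouping step inside run_job stands in for the work the framework does between the map and reduce phases.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: take one (offset, line) pair and emit intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_job(lines):
    """Run map over every line, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


counts = run_job(["Hadoop stores data", "Hadoop processes data"])
print(counts)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Because each call to map_fn sees only one line and each call to reduce_fn sees only one key, both phases can be spread across many machines without the functions changing.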

Applications


Conclusion
