hadoop & hdfs architecture - ravi namboori network architect

9
Hadoop & HDFS Architecture Presentation By Ravi Namboori Visit us: http://ravinamboori.net

Upload: ravi-namboori

Post on 26-Jul-2016

220 views

Category:

Documents


1 download

DESCRIPTION

In this PPT Ravi Namboori, Director of Equinix has explained us about Hadoop, distributed filesystems, HDFS Architecture and design with proper image & Advantages and Why Is the Block in HDFS is so Large every thing mentioned clearly.

TRANSCRIPT

Hadoop & HDFS Architecture

Presentation By

Ravi NambooriVisit us: http://ravinamboori.net

What is HDFS?

Hadoop comes with a distributed filesystem called HDFS, which stands for Hadoop distributed filesystem.

http://ravinamboori.net

What actually a distributed filesystem?

When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines.

Filesystems that manage the storage across a network of machines are called distributed filesystems.

http://ravinamboori.net

Design of HDFS:

Very large files Streaming data access Commodity hardware

http://ravinamboori.net

Where HDFS Is Not A Good Fit

Low-latency data accessLots of small filesMultiple writers, arbitrary file modifications

http://ravinamboori.net

HDFS Architecture

An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients.

It also consist of secondary namenode. It updates the namespace image with datalog.

http://ravinamboori.net

Why Is a Block in HDFS So Large? HDFS blocks are large compared to disk blocks, and

the reason is to minimize the cost of seeks. By making a block large enough, the time to transfer

the data from the disk can be made to be significantly larger than the time to seek to the start of the block.

Thus the time to transfer a large file made of multiple blocks operates at the disk transfer rate.

Advantage of HDFS?

A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. HDFS provides interfaces for applications to move themselves closer to where the data is located.

“Moving Computation is Cheaper than Moving Data"

http://ravinamboori.net

Thank You

Presentation By Ravi Namboori

Visit us: http://ravinamboori.in