
PaoMin Wu, University at Buffalo

The Hadoop Distributed File System

ARCHITECTURE

1. NameNode: stores the metadata of the file system and keeps the entire namespace in RAM

2. DataNode: stores application data as block replicas

3. HDFS Client: user applications access the file system through the HDFS client, a code library that exports the file system interface
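The division of labor among these three components can be sketched as a toy, in-process model. All names here (NameNode, DataNode, HdfsClient, the 4-byte BLOCK_SIZE, the naive node rotation) are hypothetical illustrations, not the HDFS API. The point is that the NameNode holds only metadata in memory (file to block IDs, block to DataNode locations), file bytes live only on DataNodes, and the client asks the NameNode where blocks are and then reads them from DataNodes directly.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical tiny block size so the example splits data; real HDFS blocks are far larger


@dataclass
class DataNode:
    """Stores block replicas: block ID -> bytes of application data."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Keeps only metadata, all of it in memory."""
    files: dict[str, list[int]] = field(default_factory=dict)      # path -> ordered block IDs
    locations: dict[int, list[str]] = field(default_factory=dict)  # block ID -> DataNode names
    next_block_id: int = 0

    def allocate_block(self, path, datanodes):
        block_id = self.next_block_id
        self.next_block_id += 1
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = [dn.name for dn in datanodes]
        return block_id

    def get_block_locations(self, path):
        return [(b, self.locations[b]) for b in self.files[path]]


class HdfsClient:
    """Library used by applications; talks to the NameNode for metadata, DataNodes for data."""

    def __init__(self, namenode, datanodes):
        self.nn = namenode
        self.dns = {dn.name: dn for dn in datanodes}

    def write(self, path, data, replication=2):
        nodes = list(self.dns.values())
        for i in range(0, len(data), BLOCK_SIZE):
            targets = nodes[:replication]  # naive choice; real HDFS uses a rack-aware policy
            block_id = self.nn.allocate_block(path, targets)
            for dn in targets:  # bytes are stored only on DataNodes; the NameNode records where
                dn.blocks[block_id] = data[i:i + BLOCK_SIZE]
            nodes = nodes[1:] + nodes[:1]  # rotate so blocks spread across nodes

    def read(self, path):
        out = b""
        for block_id, names in self.nn.get_block_locations(path):
            out += self.dns[names[0]].blocks[block_id]  # read from the first listed replica
        return out


dns = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode()
client = HdfsClient(nn, dns)
client.write("/logs/a.txt", b"hello hdfs!")
print(client.read("/logs/a.txt"))  # b'hello hdfs!'
```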

HDFS Client Process

ARCHITECTURE

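4. Image and Journal: the namespace image is the file system metadata; the persistent record of the image on disk is called a checkpoint; the journal is a write-ahead commit log of changes to the file system

5. CheckpointNode (a NameNode role): protects the file system metadata by periodically combining the existing checkpoint and journal into a new checkpoint and an empty journal

6. BackupNode (a NameNode role): like the CheckpointNode, capable of creating periodic checkpoints; in addition, it maintains an up-to-date in-memory image of the namespace that stays synchronized with the NameNode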
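The relationship between image, journal, and checkpoint can be sketched in a few lines. This is a toy model with hypothetical names (Namespace, mutate, make_checkpoint) and a made-up operation format. In HDFS the merge is done by a separate CheckpointNode or BackupNode; this sketch folds it into one object for brevity.

```python
import copy


class Namespace:
    """Toy namespace: in-memory image, last checkpoint, and a write-ahead journal."""

    def __init__(self, checkpoint=None, journal=None):
        self.checkpoint = copy.deepcopy(checkpoint or {})  # persistent image "on disk"
        self.journal = list(journal or [])                 # changes made since the checkpoint
        self.image = self.recover()                        # in-memory image

    @staticmethod
    def apply(image, op):
        kind, path = op[0], op[1]
        if kind == "create":
            image[path] = []
        elif kind == "add_block":
            image[path].append(op[2])
        elif kind == "delete":
            del image[path]

    def recover(self):
        """Load the last checkpoint, then replay the journal on top of it."""
        image = copy.deepcopy(self.checkpoint)
        for op in self.journal:
            self.apply(image, op)
        return image

    def mutate(self, op):
        self.journal.append(op)     # write-ahead: record the change first...
        self.apply(self.image, op)  # ...then update the in-memory image

    def make_checkpoint(self):
        """CheckpointNode-style merge: new checkpoint = old checkpoint + journal; journal emptied."""
        self.checkpoint = self.recover()
        self.journal = []


ns = Namespace()
ns.mutate(("create", "/a"))
ns.mutate(("add_block", "/a", 7))
ns.make_checkpoint()
ns.mutate(("create", "/b"))
restarted = Namespace(ns.checkpoint, ns.journal)  # simulate a restart from disk state
print(restarted.image)  # {'/a': [7], '/b': []}
```

Keeping the journal short through periodic checkpoints is what keeps the replay step in recover() cheap after a restart.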

FILE I/O OPERATIONS AND REPLICA MANAGEMENT
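Part of replica management is choosing where the replicas of a new block go. The HDFS paper describes a default rack-aware policy: the first replica on the writer's node, the second and third on two different nodes in a different rack, and any further replicas on other nodes with at most one replica per node and at most two per rack (while the replica count is below twice the number of racks). The sketch below follows that description under stated assumptions: the topology is a plain dict of rack to node names, the writer is assumed to be a DataNode, and where HDFS picks randomly this sketch takes the first eligible node so results are reproducible. place_replicas is a hypothetical name.

```python
from collections import Counter


def place_replicas(writer, topology, replication=3):
    """Choose nodes for the replicas of a new block (sketch of the default HDFS policy).

    writer: node where the writer runs (assumed to be a DataNode).
    topology: hypothetical input format, rack name -> list of node names.
    Deterministic: takes the first eligible node where HDFS would pick randomly.
    May return fewer nodes than requested if the constraints cannot be met.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    chosen = [writer]  # 1st replica: the writer's own node
    if replication >= 2:
        # 2nd and 3rd replicas: two different nodes on one other rack
        remote = [r for r in sorted(topology)
                  if r != rack_of[writer] and len(topology[r]) >= 2]
        if remote:
            chosen += topology[remote[0]][:min(2, replication - 1)]
    # Remaining replicas: at most one per node, and at most two per rack
    # while the replica count is below twice the number of racks
    max_per_rack = 2 if replication < 2 * len(topology) else replication
    per_rack = Counter(rack_of[n] for n in chosen)
    for node in sorted(rack_of):
        if len(chosen) >= replication:
            break
        if node not in chosen and per_rack[rack_of[node]] < max_per_rack:
            chosen.append(node)
            per_rack[rack_of[node]] += 1
    return chosen


topology = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5", "n6"], "rack3": ["n7", "n8"]}
print(place_replicas("n2", topology))     # ['n2', 'n4', 'n5']
print(place_replicas("n2", topology, 5))  # ['n2', 'n4', 'n5', 'n1', 'n7']
```

Putting two replicas on one remote rack, rather than three racks, trades a little rack-failure tolerance for less cross-rack write traffic.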

Sort Benchmark

Future Work

Problem: a single NameNode holds all of the namespace metadata in memory, which limits how many files and blocks the cluster can address

Solution: allow multiple namespaces (and NameNodes) to share the physical storage within a cluster

PaoMin Wu, University at Buffalo

MapReduce: Simplified Data Processing on Large Clusters

Introduction

•computation expressed over key/value pairs

•execution scheduled across a set of machines

•handling of machine failures

•management of the required inter-machine communication

•runs on a large cluster of commodity machines

•simple but powerful interface

•automatic parallelization

•distribution of large-scale computations

Programming Model

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs.

The Reduce function, also written by the user, accepts an intermediate key and a set of values for that key, and merges these values to form a possibly smaller set of values.

The intermediate values are supplied to the user's reduce function via an iterator.
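To make the two user-supplied functions concrete, here is a minimal single-process sketch using word counting, the running example in the MapReduce paper. The names map_fn, reduce_fn, and run_sequential are hypothetical; run_sequential only imitates the grouping the library performs and does none of the distribution or fault tolerance.

```python
from collections import defaultdict
from typing import Iterator


def map_fn(doc_name: str, contents: str) -> Iterator[tuple[str, int]]:
    """User-written Map: (document name, contents) -> intermediate (word, 1) pairs."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterator[int]) -> int:
    """User-written Reduce: merges all values for one key; values arrive via an iterator."""
    return sum(counts)


def run_sequential(inputs: dict[str, str]) -> dict[str, int]:
    """Stand-in for the library: group intermediate values by key, then call Reduce per key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for doc_name, contents in inputs.items():
        for key, value in map_fn(doc_name, contents):
            groups[key].append(value)
    return {key: reduce_fn(key, iter(values)) for key, values in sorted(groups.items())}


print(run_sequential({"d1": "the quick fox", "d2": "the lazy dog the end"}))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Passing values as an iterator, as the paper does, lets Reduce handle value lists too large to fit in memory.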

Example:

Execution Overview:
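The execution flow described in the paper can be simulated in one process: the input is split into M pieces, each map task divides its intermediate pairs into R regions with a partitioning function such as hash(key) mod R, and each reduce task processes its keys in sorted order and writes one output. This is a sketch, not the library: there is no master, no failures, and intermediate data stays in memory rather than on local disks. It builds an inverted index, another example from the paper; run_job, partition, and the defaults M=2, R=3 are hypothetical, and zlib.crc32 stands in for the hash so the key-to-partition assignment is stable across runs.

```python
import zlib
from collections import defaultdict


def map_fn(doc_id, text):
    """Inverted-index Map: emit (word, doc_id) once per distinct word in the document."""
    for word in set(text.split()):
        yield word, doc_id


def reduce_fn(word, doc_ids):
    """Inverted-index Reduce: sorted list of the documents containing the word."""
    return sorted(doc_ids)


def partition(key, R):
    # hash(key) mod R, using crc32 so the result does not vary between runs
    return zlib.crc32(key.encode()) % R


def run_job(documents, M=2, R=3):
    items = sorted(documents.items())
    # 1. Split the input into M pieces, one per map task
    splits = [items[i::M] for i in range(M)]
    # 2. Each map task buffers its output in R regions, one per reduce task
    regions = [[defaultdict(list) for _ in range(R)] for _ in range(M)]
    for m, split in enumerate(splits):
        for doc_id, text in split:
            for key, value in map_fn(doc_id, text):
                regions[m][partition(key, R)][key].append(value)
    # 3. Reduce task r gathers region r from every map task, sorts keys, calls Reduce
    outputs = []
    for r in range(R):
        merged = defaultdict(list)
        for m in range(M):
            for key, values in regions[m][r].items():
                merged[key].extend(values)
        outputs.append({key: reduce_fn(key, merged[key]) for key in sorted(merged)})
    return outputs  # R outputs, one per reduce task


docs = {"d1": "map reduce", "d2": "reduce sort", "d3": "map sort"}
outputs = run_job(docs)
combined = dict(sorted((k, v) for part in outputs for k, v in part.items()))
print(combined)  # {'map': ['d1', 'd3'], 'reduce': ['d1', 'd2'], 'sort': ['d2', 'd3']}
```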

Backup Tasks:
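In the paper, when a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks, and a task is marked complete when either the primary or the backup execution finishes. The toy timing model below illustrates why this helps with stragglers. Everything in it is an assumption for illustration: every task starts at t=0 on its own worker, backups launch once a fraction threshold of primaries has finished, and the durations are made-up numbers, not measurements. job_completion_time is a hypothetical name.

```python
import math


def job_completion_time(durations, backup_durations=None, threshold=0.9):
    """Toy model of backup tasks.

    durations: run time of each primary execution (all start at t=0 on separate workers).
    backup_durations: run time a backup copy of each task would take; None disables backups.
    threshold: fraction of tasks that must finish before backups are launched.
    """
    if backup_durations is None:
        return max(durations)
    # Time at which the required fraction of primaries has finished
    k = max(1, math.ceil(threshold * len(durations)))
    t_backup = sorted(durations)[k - 1]
    completions = []
    for primary, backup in zip(durations, backup_durations):
        if primary <= t_backup:
            completions.append(primary)  # finished before backups launched
        else:
            # still in progress: done when either execution finishes
            completions.append(min(primary, t_backup + backup))
    return max(completions)


durations = [10] * 9 + [100]  # nine normal tasks and one straggler
print(job_completion_time(durations))              # 100
print(job_completion_time(durations, [10] * 10))   # 20
```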

Conclusions

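1. Restricting the programming model makes it easy to parallelize and distribute computations and to make them fault-tolerant

2. Network bandwidth is a scarce resource, so optimizations aim to reduce the data sent across the network (for example, reading input from local disks)

3. Redundant execution can reduce the impact of slow machines and handle machine failures and data loss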

References:

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler. The Hadoop Distributed File System. Yahoo!, Sunnyvale, California, USA. {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Google, Inc. jeff@google.com, sanjay@google.com
