The Hadoop Distributed File System

PaoMin Wu University at Buffalo The Hadoop Distributed File System

Post on 30-Dec-2015


TRANSCRIPT

Page 1: The  Hadoop  Distributed File System

PaoMin Wu University at Buffalo

The Hadoop Distributed File System

Page 2: The  Hadoop  Distributed File System

ARCHITECTURE

1. NameNode: stores the metadata of the file system and keeps the entire namespace in RAM

2. DataNode: stores application data as block replicas

3. HDFS Client: user applications access the file system through the HDFS client
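The division of labor among the three roles above can be sketched in a few lines of Python. This is a toy model, not Hadoop code: the class names, the 4-byte block size, the replication factor of 2, and the "first N DataNodes" placement rule are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Stores application data as block replicas: block id -> bytes."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    """Holds only metadata, entirely in memory (plain dicts stand in for RAM)."""
    files: dict = field(default_factory=dict)      # path -> list of block ids
    locations: dict = field(default_factory=dict)  # block id -> DataNode names
    next_block_id: int = 0

    def allocate_block(self, path, targets):
        # Record a new block for the file and remember where its replicas live.
        block_id = self.next_block_id
        self.next_block_id += 1
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = [dn.name for dn in targets]
        return block_id


class HdfsClient:
    """Hypothetical client: asks the NameNode for metadata, moves bytes to DataNodes."""

    def __init__(self, namenode, datanodes, block_size=4, replication=2):
        self.namenode = namenode
        self.datanodes = {dn.name: dn for dn in datanodes}
        self.block_size = block_size    # toy value, assumed for the example
        self.replication = replication  # toy value, assumed for the example

    def write(self, path, data):
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            # Naive placement (an assumption): always the first N DataNodes.
            targets = list(self.datanodes.values())[:self.replication]
            block_id = self.namenode.allocate_block(path, targets)
            for dn in targets:
                dn.blocks[block_id] = chunk

    def read(self, path):
        parts = []
        for block_id in self.namenode.files[path]:
            # Read each block from the first DataNode listed for it.
            holder = self.namenode.locations[block_id][0]
            parts.append(self.datanodes[holder].blocks[block_id])
        return b"".join(parts)
```

In this sketch the NameNode never sees file contents; it only records which blocks make up a path and which DataNodes hold them.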

Page 3: The  Hadoop  Distributed File System

HDFS Client Process

Page 4: The  Hadoop  Distributed File System

ARCHITECTURE

4. Image and Journal: the namespace image is the file system metadata; a persistent record of the image is called a checkpoint

5. CheckpointNode (NameNode): protects the file system metadata

6. BackupNode (NameNode): capable of creating periodic checkpoints
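The relationship between image, journal, and checkpoint can be illustrated with a small Python sketch. The slide names the journal without defining it; the sketch assumes it is a log of namespace changes made since the last checkpoint. The operation names (`create`, `delete`), the class `ToyNameNode`, and the function `create_checkpoint` are hypothetical and exist only for this illustration.

```python
def apply_change(image, change):
    """Apply one journal record to a namespace image (a dict: path -> metadata)."""
    op, path, meta = change
    if op == "create":
        image[path] = meta
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


class ToyNameNode:
    def __init__(self):
        self.image = {}       # in-memory namespace image
        self.journal = []     # assumed: log of changes since the last checkpoint
        self.checkpoint = {}  # persistent record of the image (a dict stands in for disk)

    def change(self, op, path, meta=None):
        # Log the change first, then apply it to the in-memory image.
        record = (op, path, meta)
        self.journal.append(record)
        apply_change(self.image, record)


def create_checkpoint(old_checkpoint, journal):
    """Build a new checkpoint by replaying the journal on top of the old one."""
    new_checkpoint = dict(old_checkpoint)
    for record in journal:
        apply_change(new_checkpoint, record)
    return new_checkpoint
```

Replaying the journal over the previous checkpoint reproduces the current in-memory image, which is why a fresh checkpoint lets the journal be truncated and the metadata be recovered after a restart.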

Page 5: The  Hadoop  Distributed File System

FILE I/O OPERATIONS AND REPLICA MANAGEMENT

Page 6: The  Hadoop  Distributed File System

FILE I/O OPERATIONS AND REPLICA MANAGEMENT

Page 7: The  Hadoop  Distributed File System

Sort Benchmark

Page 8: The  Hadoop  Distributed File System

Future Work

Problem: a single NameNode holds all of the important information

Solution: allow multiple namespaces (and NameNodes) to share the physical storage within a cluster
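One way to picture the proposed solution is a pool of storage in which every block is tagged with the namespace that owns it. The Python sketch below is an illustration under several assumptions, not a description of how Hadoop implements this: the names `SharedDataNode` and `ToyNameNode`, the `(namespace_id, block_id)` key, and the one-block-per-file simplification are all invented here. For brevity the NameNode object also forwards the data itself, which a real NameNode does not do.

```python
class SharedDataNode:
    """One physical store holding blocks for several namespaces."""

    def __init__(self):
        self.blocks = {}  # (namespace_id, block_id) -> bytes

    def put(self, namespace_id, block_id, data):
        self.blocks[(namespace_id, block_id)] = data

    def get(self, namespace_id, block_id):
        return self.blocks[(namespace_id, block_id)]


class ToyNameNode:
    """Owns one independent namespace but stores blocks in shared storage."""

    def __init__(self, namespace_id, datanode):
        self.namespace_id = namespace_id
        self.datanode = datanode
        self.files = {}        # path -> block id (one block per file, for brevity)
        self._next_block = 0

    def write(self, path, data):
        block_id = self._next_block
        self._next_block += 1
        self.files[path] = block_id
        self.datanode.put(self.namespace_id, block_id, data)

    def read(self, path):
        return self.datanode.get(self.namespace_id, self.files[path])
```

Because block keys include the namespace id, two NameNodes can use the same path and the same block numbers without colliding in the shared store.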

Page 9: The  Hadoop  Distributed File System

PaoMin Wu University at Buffalo

MapReduce: Simplified Data Processing on Large Clusters

Page 10: The  Hadoop  Distributed File System

Introduction

• key/value pair

• execution across a set of machines

• handling machine failures

• managing the required inter-machine communication

• runs on a large cluster

• powerful interface

• automatic parallelization

• distribution of large-scale computations

Page 11: The  Hadoop  Distributed File System

Programming Model

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs.

The Reduce function, also written by the user, accepts an intermediate key and a set of values for that key.

The intermediate values are supplied to the user's reduce function via an iterator.
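The model above can be sketched with word counting, a common illustration of MapReduce. The single-process driver `run_mapreduce` is a hypothetical stand-in for the framework: it runs everything in one Python process and only shows the data flow (map, group by intermediate key, reduce over an iterator), not the distributed execution.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: takes an input pair (document name, text) and emits (word, 1) pairs."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: takes an intermediate key and an iterator over its values."""
    return sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy driver (an assumption, not the real framework): one process, no failures."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Values reach the user's reduce function through an iterator.
    return {
        ikey: reduce_fn(ikey, iter(ivalues))
        for ikey, ivalues in sorted(intermediate.items())
    }
```

Calling `run_mapreduce([("a.txt", "the quick the"), ("b.txt", "quick fox")], map_fn, reduce_fn)` groups the emitted pairs by word and sums the counts for each word.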

Page 12: The  Hadoop  Distributed File System

Example:

Page 13: The  Hadoop  Distributed File System

Execution Overview:

Page 14: The  Hadoop  Distributed File System

Backup Tasks:
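The idea behind backup tasks is that a job finishes only when its slowest task finishes, so launching a second copy of a lagging task can shorten the whole job. The Python sketch below is a simplified simulation with made-up numbers; the function name `completion_time` and its parameters are hypothetical, and it assumes every backup copy takes the same fixed time.

```python
def completion_time(durations, backup_after=None, backup_duration=1.0):
    """Return when a job finishes, given each task's run time.

    If backup_after is set, any task still running at that moment gets a
    backup copy started then, and the task counts as done when either the
    original or the backup finishes (simplifying assumptions for this sketch).
    """
    finish_times = []
    for duration in durations:
        finish = duration
        if backup_after is not None and duration > backup_after:
            finish = min(duration, backup_after + backup_duration)
        finish_times.append(finish)
    # The job is complete only when its slowest task is complete.
    return max(finish_times)
```

With task times of 1.0, 1.2, and 10.0, the job takes 10.0 without backups; with `backup_after=2.0` and `backup_duration=1.0` the straggler is covered by a backup that finishes at 3.0, so the job takes 3.0.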

Page 15: The  Hadoop  Distributed File System

Conclusions

1. Restricting the programming model is beneficial

2. Network bandwidth is a scarce resource

3. Redundant execution can help

Page 16: The  Hadoop  Distributed File System

References:

The Hadoop Distributed File System. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler. Yahoo!, Sunnyvale, California, USA. {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com

MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. Google, Inc.