Hadoop Distributed File System

Upload: anand-kulkarni

Post on 16-Apr-2017

Page 1: Hadoop Distributed File System

Hadoop Distributed File System

A presentation by Anand L. Kulkarni

Page 2: Hadoop Distributed File System

May 3, 2023 Hadoop Distributed File System

BACKGROUND

Need for large-scale data processing

Challenges at large scale

What is a distributed file system (DFS)?

Page 3: Hadoop Distributed File System

What Is Hadoop?

“Framework for running [distributed] applications on large clusters built of commodity hardware.”

- From the Hadoop Wiki.

Originally created by Doug Cutting, who named the project after his son's toy elephant.

Inspired by Google's architecture: MapReduce and the Google File System (GFS).

Page 4: Hadoop Distributed File System

What Is Hadoop?

The name “Hadoop” has evolved to cover a family of products, but at its core it is essentially just:

- The MapReduce programming paradigm, and

- A distributed file system (HDFS).
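The MapReduce paradigm named on this slide can be illustrated with a single-process word count. This is only a sketch of the map, shuffle, and reduce steps in plain Python; it does not use Hadoop's API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

On a real cluster the map and reduce calls would run in parallel on different machines, with the input lines read from HDFS blocks.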

Page 5: Hadoop Distributed File System

Hadoop Ecosystem

Page 6: Hadoop Distributed File System

Hadoop Distributed File System

Master/slave architecture

Fault tolerant via replication.

Optimized for large files.

Hardware failures assumed in design.

[Diagram: one Name Node (master) above three Data Nodes (slaves).]

Page 7: Hadoop Distributed File System

Hadoop Distributed File System

Written in Java.

Focus on streaming data access (high throughput is prioritized over low latency).

Designed to run on commodity hardware

HDFS is a File System, not a DBMS.

Page 8: Hadoop Distributed File System

HDFS Terminologies

Block

Data Node

Name Node

Checkpoint Node

Backup Node
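To make the term "block" concrete: HDFS stores a file as a sequence of fixed-size blocks, with only the last block allowed to be shorter. The sketch below computes those block boundaries. The 128 MiB block size is an assumption for illustration (the real value is a cluster setting), and `split_into_blocks` is a hypothetical helper, not part of Hadoop.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed value for illustration; configurable in a real cluster


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block of the file."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks
```

Under these assumptions a 300 MiB file occupies three blocks: two of 128 MiB and a final one of 44 MiB.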

Page 9: Hadoop Distributed File System

HDFS Architecture

[Diagram: a Name Node and a Backup Node above five Data Nodes.]

Name Node: namespace and metadata operations

Backup Node: namespace backups

Data Nodes: write to local disks

Name Node to Data Nodes: replication, heartbeats, balancing
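The heartbeat link in this diagram can be sketched as a small bookkeeping class: each Data Node reports in periodically, and the Name Node treats a node as dead once it has been silent for too long. The class name `HeartbeatMonitor` and the 30-second timeout are hypothetical choices for illustration, not HDFS defaults.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each Data Node (illustrative only)."""

    def __init__(self, timeout_seconds=30.0):
        self.timeout = timeout_seconds  # assumed value, not an HDFS default
        self.last_seen = {}             # node id -> time of last heartbeat

    def record(self, node_id, now):
        """Called whenever a heartbeat arrives from a Data Node."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

Timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock.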

Page 10: Hadoop Distributed File System

HDFS Architecture

[Diagram: an HDFS Client holding a file, a Name Node with a Backup Node, and five Data Nodes.]

Client to Name Node: file locations, block size, file system operations

Client to Data Nodes: data transfer

Page 11: Hadoop Distributed File System

Putting Files On HDFS

[Diagram: an HDFS Client holding a file, a Name Node with a Backup Node, and five Data Nodes.]
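A rough model of the write path: the client asks the Name Node where each block should go, then sends the block data to the chosen Data Nodes. Everything below is a simplified, in-memory sketch. The `NameNode` class, `put_file`, the round-robin placement, and the replication factor of 3 are assumptions for illustration; real HDFS uses rack-aware placement and forwards replicas from one Data Node to the next.

```python
class NameNode:
    """Records which Data Nodes hold each block; it never stores file data."""

    def __init__(self, node_ids, replication=3):
        if replication > len(node_ids):
            raise ValueError("replication factor exceeds number of Data Nodes")
        self.node_ids = list(node_ids)
        self.replication = replication
        self.block_map = {}  # path -> list of (block_id, [node_id, ...])
        self._cursor = 0

    def allocate_block(self, path):
        """Pick distinct target nodes round-robin (a deliberate simplification)."""
        count = len(self.node_ids)
        targets = [
            self.node_ids[(self._cursor + i) % count]
            for i in range(self.replication)
        ]
        self._cursor = (self._cursor + 1) % count
        blocks = self.block_map.setdefault(path, [])
        block_id = f"{path}#blk_{len(blocks)}"
        blocks.append((block_id, targets))
        return block_id, targets


def put_file(name_node, storage, path, data, block_size):
    """Split data into blocks and store each block on every target node."""
    for offset in range(0, len(data), block_size):
        block_id, targets = name_node.allocate_block(path)
        for node_id in targets:
            # Here the client writes to each replica directly; that is a simplification.
            storage[node_id][block_id] = data[offset:offset + block_size]
```

Here `storage` is a plain dictionary standing in for the local disks of the Data Nodes.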

Page 12: Hadoop Distributed File System

Getting Files From HDFS

[Diagram: an HDFS Client holding a file, a Name Node with a Backup Node, and five Data Nodes.]

Name Node to Client: returns the locations of the blocks for a file
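The read path can be sketched the same way: the client obtains the block locations for a file, then fetches each block from any Data Node that still holds a copy. The names `get_file`, `block_locations`, and `storage` are hypothetical, and the dictionaries stand in for the Name Node's metadata and the Data Nodes' disks.

```python
def get_file(block_locations, storage, path):
    """Reassemble a file by reading each block from the first reachable replica."""
    data = bytearray()
    for block_id, replicas in block_locations[path]:
        for node_id in replicas:
            # A missing node or missing block models an unreachable replica.
            block = storage.get(node_id, {}).get(block_id)
            if block is not None:
                data += block
                break
        else:
            # No replica in the list could serve this block.
            raise IOError(f"no live replica for {block_id}")
    return bytes(data)
```

Because every block has several replicas, the read still succeeds when one Data Node is down, as long as another replica of each block is reachable.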

Page 13: Hadoop Distributed File System

HDFS Architecture

The file system namespace

Replica management

Replica Selection

Safe mode
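Replica selection means serving a read from the replica closest to the reader. A minimal sketch follows, assuming each node is described by a hypothetical `(rack, host)` tuple and that distance has just three levels: same host, same rack, different rack. The numeric distances are illustrative.

```python
def distance(reader, replica):
    """Illustrative network distance between two (rack, host) locations."""
    if reader == replica:
        return 0  # replica is on the reader's own machine
    if reader[0] == replica[0]:
        return 2  # same rack, different host
    return 4      # different rack


def select_replica(reader, replicas):
    """Choose the replica with the smallest distance to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))
```

Preferring a same-rack replica keeps read traffic off the links between racks.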

Page 14: Hadoop Distributed File System

HDFS Architecture

The Persistence Of File System Metadata

Robustness

Space Reclamation

◦ File deletes and undeletes

◦ Decreasing the replication factor
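Metadata persistence in HDFS rests on two pieces: a checkpointed image of the namespace and a log of edits made since that checkpoint. The sketch below models both in memory. The `Namespace` class and the operation names `create`, `set_replication`, and `delete` are hypothetical, and the namespace is reduced to a mapping from path to replication factor.

```python
class Namespace:
    """In-memory model of a namespace image plus an edit log (illustrative)."""

    def __init__(self):
        self.fsimage = {}   # last checkpoint: path -> replication factor
        self.edit_log = []  # operations recorded since that checkpoint

    def log(self, op, path, replication=None):
        """Append one namespace change to the edit log."""
        self.edit_log.append((op, path, replication))

    @staticmethod
    def _apply(image, edits):
        """Replay edits on top of a copy of the image."""
        state = dict(image)
        for op, path, replication in edits:
            if op in ("create", "set_replication"):
                state[path] = replication
            elif op == "delete":
                state.pop(path, None)
        return state

    def current(self):
        """Namespace as seen now: the image with all logged edits replayed."""
        return self._apply(self.fsimage, self.edit_log)

    def checkpoint(self):
        """Fold the edit log into a new image and start an empty log."""
        self.fsimage = self.current()
        self.edit_log = []
```

Replaying the log on top of the last image is also how the namespace would be rebuilt after a restart, which is why a long log makes recovery slower and periodic checkpoints help.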

Page 15: Hadoop Distributed File System

HDFS Architecture

Name Node Recovery.

Data Node Recovery.

Metadata Disk Failure.

Page 16: Hadoop Distributed File System

Data Node Failure

[Diagram: a Name Node and a Backup Node above five Data Nodes.]
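When a Data Node fails, the blocks it held become under-replicated and new copies have to be created on other nodes. A minimal sketch of that bookkeeping follows, assuming a hypothetical `block_map` from block id to the set of nodes holding it and a target replication factor of 3. Choosing replacement nodes in sorted order is a simplification that only keeps the example deterministic.

```python
def re_replicate(block_map, live_nodes, failed_node, replication=3):
    """Remove the failed node from every block and restore the replica count."""
    for holders in block_map.values():
        holders.discard(failed_node)
        # Candidates are live nodes that do not already hold this block.
        candidates = sorted(live_nodes - holders - {failed_node})
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return block_map
```

If too few live nodes remain, a block simply stays under-replicated until more nodes become available.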

Page 17: Hadoop Distributed File System

Name Node Failure

[Diagram: a Name Node and a Backup Node above five Data Nodes.]

Page 18: Hadoop Distributed File System

Future Work

Scalability of the Name Node.

Automation of Name Node recovery.

Page 19: Hadoop Distributed File System

Q & A