Hadoop Distributed File System
A Presentation By Anand L. Kulkarni
May 3, 2023 | Hadoop Distributed File System
BACKGROUND
Need for large-scale data processing
Challenges at large scale
What is a Distributed File System (DFS)?
What Is Hadoop?
“Framework for running [distributed] applications on large cluster built of commodity hardware.”
- From the Hadoop Wiki.
Originally created by Doug Cutting, who named the project after his son’s toy elephant.
Inspired by Google’s architecture: MapReduce and the Google File System (GFS).
What Is Hadoop?
The name “Hadoop” has now evolved to cover a family of products, but at its core it is essentially just:
- the MapReduce programming paradigm, and
- a distributed file system (HDFS).
Hadoop Ecosystem
Hadoop Distributed File System
Master/slave architecture.
Fault tolerant via replication.
Optimized for large files.
Hardware failures assumed in design.
[Diagram: one Name Node (master) above three Data Nodes (slaves).]
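The replication idea on this slide can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names `split_into_blocks` and `place_replicas`, the 4-byte block size, the node names, and the round-robin placement are all assumptions made for illustration. Real HDFS uses far larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, data_nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct data nodes, round-robin.

    Round-robin is an illustrative assumption, not the HDFS placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement


# Hypothetical 10-byte "file" stored on three hypothetical data nodes.
blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3"], replication=2)
print(blocks)     # [b'0123', b'4567', b'89']
print(placement)  # {0: ['dn1', 'dn2'], 1: ['dn2', 'dn3'], 2: ['dn3', 'dn1']}
```

Because every block lives on two different nodes in this sketch, losing any single node still leaves one copy of each block.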
Hadoop Distributed File System
Written in Java.
Focus on streaming data access (high throughput over low latency).
Designed to run on commodity hardware.
HDFS is a File System, not a DBMS.
HDFS Terminology
Block
Data Node
Name Node
Checkpoint Node
Backup Node
HDFS Architecture
[Diagram: Name Node (namespace, metadata operations); Backup Node (namespace backups); five Data Nodes (writes to local disks). Arrow labels: replication, heartbeats, balancing.]
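Heartbeats are how the Name Node learns that a Data Node is still alive. A minimal sketch of that bookkeeping follows, assuming a hypothetical `NameNodeMonitor` class, an arbitrary 10-second timeout, and timestamps passed in by hand so the example stays deterministic. None of these names or values come from HDFS itself.

```python
class NameNodeMonitor:
    """Toy tracker of the last heartbeat time reported by each data node."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> list[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = NameNodeMonitor(timeout_seconds=10.0)  # timeout value is an assumption
monitor.heartbeat("dn1", now=0.0)
monitor.heartbeat("dn2", now=0.0)
monitor.heartbeat("dn1", now=8.0)    # dn1 keeps reporting, dn2 goes silent
print(monitor.dead_nodes(now=12.0))  # ['dn2']
```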
HDFS Architecture
[Diagram: an HDFS Client holding a file; Name Node and Backup Node; five Data Nodes. The client exchanges file locations, block size, and file system operations with the Name Node, and transfers data directly with the Data Nodes.]
Putting Files On HDFS
[Diagram: an HDFS Client holding a file; Name Node and Backup Node; five Data Nodes.]
Getting Files From HDFS
[Diagram: an HDFS Client holding a file; Name Node and Backup Node; five Data Nodes. The Name Node returns the locations of the blocks for a file.]
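The read path on this slide, where the Name Node returns block locations and the client then fetches the data from Data Nodes, can be illustrated with a small in-memory model. Everything here is hypothetical: `ToyNameNode`, `ToyDataNode`, `read_file`, the block IDs, and the "first available replica" rule are assumptions for the sketch, not the HDFS client API.

```python
class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}

    def add_file(self, path, blocks):
        # blocks: ordered list of (block_id, [names of data nodes holding it])
        self.block_map[path] = blocks

    def get_block_locations(self, path):
        return self.block_map[path]


class ToyDataNode:
    """Holds block contents only; knows nothing about file names."""

    def __init__(self):
        self.blocks = {}


def read_file(path, name_node, data_nodes):
    """Ask the name node for locations, then fetch each block from a data node."""
    content = b""
    for block_id, holders in name_node.get_block_locations(path):
        for holder in holders:
            node = data_nodes.get(holder)  # None if that node is unreachable
            if node is not None and block_id in node.blocks:
                content += node.blocks[block_id]
                break
        else:
            raise IOError(f"no reachable replica for block {block_id}")
    return content


dn1, dn2 = ToyDataNode(), ToyDataNode()
dn1.blocks["b0"] = b"hello "
dn2.blocks["b0"] = b"hello "
dn2.blocks["b1"] = b"world"

nn = ToyNameNode()
nn.add_file("/demo.txt", [("b0", ["dn1", "dn2"]), ("b1", ["dn2"])])

print(read_file("/demo.txt", nn, {"dn1": dn1, "dn2": dn2}))  # b'hello world'
```

Note that the file bytes never pass through the name node in this model; it only answers the metadata query.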
HDFS Architecture
The file system namespace
Replica management
Replica Selection
Safe mode
HDFS Architecture
The Persistence Of File System Metadata
Robustness
Space Reclamation
- File Deletes And Undeletes
- Decrease Replication Factor
HDFS Architecture
Name Node Recovery.
Data Node Recovery.
Metadata Disk Failure.
Data Node Failure
[Diagram: Name Node and Backup Node above five Data Nodes.]
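When a Data Node fails, the blocks it held drop below their target replication and have to be copied onto other nodes. The sketch below shows one way to model that repair step. The function `re_replicate`, the node names, and the "pick the first live node that lacks the block" rule are assumptions for illustration; the caller is also assumed to pass a `live_nodes` list that excludes the failed node.

```python
def re_replicate(placement, failed_node, live_nodes, replication):
    """Return a new placement with replicas lost on `failed_node` restored.

    Replacement nodes are taken in list order from `live_nodes`, which is an
    illustrative choice rather than the real HDFS policy.
    """
    new_placement = {}
    for block_id, holders in placement.items():
        survivors = [n for n in holders if n != failed_node]
        candidates = [n for n in live_nodes if n not in survivors]
        missing = max(replication - len(survivors), 0)
        new_placement[block_id] = survivors + candidates[:missing]
    return new_placement


placement = {0: ["dn1", "dn2"], 1: ["dn2", "dn3"], 2: ["dn3", "dn1"]}
repaired = re_replicate(
    placement, failed_node="dn2", live_nodes=["dn1", "dn3", "dn4"], replication=2
)
print(repaired)  # {0: ['dn1', 'dn3'], 1: ['dn3', 'dn1'], 2: ['dn3', 'dn1']}
```

Block 2 never lived on the failed node, so its placement is left unchanged.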
Name Node Failure
[Diagram: Name Node and Backup Node above five Data Nodes.]
Future Work
Scalability of the Name Node.
Automation of Name Node recovery.
Q & A