Hadoop Session 2 : HDFS
TRANSCRIPT
![Page 1: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/1.jpg)
HDFS: HADOOP DISTRIBUTED FILE SYSTEM
![Page 2: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/2.jpg)
Revision
• What is Big Data?
• 3 V's of Big Data
• What can we do with Big Data?
• What is Hadoop?
• Components of Hadoop
![Page 3: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/3.jpg)
Revision
• DISTRIBUTED
• FAULT TOLERANT
• SCALABLE
• FLEXIBLE
• INTELLIGENT
![Page 4: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/4.jpg)
Hadoop Components
SELF HEALING DISTRIBUTED STORAGE
FAULT TOLERANT DISTRIBUTED COMPUTING
+ ABSTRACTION PARALLEL PROCESSING
![Page 5: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/5.jpg)
HDFS Overview
• Designed for a modest number of large files (millions instead of billions)
• Sequential access, not random access
• Write once, read many times
• Data is split into big chunks and stored on multiple nodes as blocks
• Blocks are replicated across multiple nodes
![Page 6: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/6.jpg)
HDFS Client Server Architecture
• Server (master): Name Node
• Clients (workers): Data Nodes
• Each file is split into multiple blocks
• Multiple copies of each block are stored
![Page 7: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/7.jpg)
[Figure: the Name Node (NN) and the DATA NODES, with numbered blocks 1 to 5 spread and replicated across the Data Nodes]
![Page 8: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/8.jpg)
TOPOLOGY OF HADOOP CLUSTER
[Figure: a NAME NODE and a SECONDARY NAME NODE, with four DATANODEs]
![Page 9: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/9.jpg)
Nodes in an HDFS cluster
• Name Node
• Secondary Name Node
• Data Node
• Job Tracker
• Task Tracker
![Page 10: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/10.jpg)
NAMENODE
• One NN per cluster
• Manages the file system namespace and metadata
• Single point of failure
• Runs on enterprise hardware, e.g. RAID machines
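To make "namespace and metadata" more concrete, here is a minimal sketch of the kind of bookkeeping a Name Node does: which blocks make up a file, and which Data Nodes hold each block. The class `ToyNameNode`, the block ids, the node names and the path are all hypothetical and only illustrate the idea; they are not the real HDFS data structures.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Illustrative stand-in for Name Node metadata (not the real HDFS structures)."""
    # file path -> ordered list of block ids
    namespace: dict = field(default_factory=dict)
    # block id -> set of Data Node names holding a replica
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def add_replica(self, block_id, datanode):
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return [(block_id, sorted Data Nodes)] for a file, in block order."""
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in self.namespace[path]]


nn = ToyNameNode()
nn.add_file("/user/demo/purchases.txt", ["blk_1", "blk_2"])
for blk, dn in [("blk_1", "dn1"), ("blk_1", "dn3"), ("blk_2", "dn2"), ("blk_2", "dn3")]:
    nn.add_replica(blk, dn)

print(nn.locate("/user/demo/purchases.txt"))
```

The point of the sketch is that the Name Node stores no file data itself, only the lookup tables; a client would ask it where the blocks are and then read them from the Data Nodes.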
![Page 11: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/11.jpg)
SECONDARY NAMENODE
• NOT a backup node for the NN
• Does NOT automatically replace the NN
• The NN is still a single point of failure
• Runs on enterprise hardware, e.g. RAID machines
![Page 12: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/12.jpg)
DATANODE
• Many per cluster
• Manages blocks and serves them to the client
• Periodically reports to the NN the list of blocks it stores
• Uses inexpensive commodity hardware
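The periodic block report can be pictured as follows. This is a rough sketch, assuming each Data Node simply sends the list of block ids it stores; the function names `build_block_map` and `under_replicated` and the sample reports are made up for illustration.

```python
from collections import defaultdict


def build_block_map(block_reports):
    """block_reports: {datanode_name: iterable of block ids that node stores}."""
    block_map = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block_id in blocks:
            block_map[block_id].add(datanode)
    return dict(block_map)


def under_replicated(block_map, replication=3):
    """Block ids that have fewer replicas than the target replication factor."""
    return sorted(b for b, nodes in block_map.items() if len(nodes) < replication)


# Hypothetical reports from four Data Nodes
reports = {
    "dn1": [1, 2, 3],
    "dn2": [1, 3, 4],
    "dn3": [1, 2, 4],
    "dn4": [2, 4],
}
block_map = build_block_map(reports)
print(under_replicated(block_map))  # [3]
```

Here block 3 is reported by only two nodes, so a Name Node following this logic would schedule another copy of it.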
![Page 13: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/13.jpg)
JOB TRACKER
• One per cluster
• Manages job requests submitted by the client
• Initial point of contact for the client
• A job starts at the Job Tracker
• Single point of failure
![Page 14: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/14.jpg)
TASK TRACKER
• Many per cluster
• Executes Map and Reduce operations
• Reads input splits for a MapReduce job
![Page 15: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/15.jpg)
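Block Replication
REPLICA PLACEMENT
Replication factor = 3
• 1st replica: on the client's node (a randomly chosen node if the client is not on a Data Node)
• 2nd replica: on a different rack than the first
• 3rd replica: on the same rack as the second, on a different node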
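The placement rule for a replication factor of 3 can be sketched in a few lines. This is a simplified illustration, not the actual HDFS placement code: the function `place_replicas`, the `topology` layout and the node names are hypothetical, and it assumes at least two racks with at least two nodes on the rack chosen for the second replica.

```python
import random


def place_replicas(topology, writer_node=None, rng=random):
    """topology: {rack_name: [node names]}. Returns three nodes for one block."""
    node_rack = {n: r for r, nodes in topology.items() for n in nodes}
    all_nodes = sorted(node_rack)

    # 1st replica: the writer's own node if it is a Data Node, else a random node
    first = writer_node if writer_node in node_rack else rng.choice(all_nodes)

    # 2nd replica: any node on a different rack than the first
    other_rack_nodes = [n for n in all_nodes if node_rack[n] != node_rack[first]]
    second = rng.choice(other_rack_nodes)

    # 3rd replica: same rack as the second, but a different node
    same_rack_nodes = [n for n in topology[node_rack[second]] if n != second]
    third = rng.choice(same_rack_nodes)

    return [first, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas(topology, writer_node="dn1", rng=random.Random(0))
print(replicas)
```

With this layout the first replica stays on `dn1`, and the other two land on `dn3` and `dn4` in some order, so losing either rack still leaves at least one copy.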
![Page 16: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/16.jpg)
HDFS Large Blocks of 64 MB/128 MB
Example: with a 64 MB block size, a 150 MB file is stored as three blocks: 64 MB + 64 MB + 22 MB
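The 150 MB example is plain integer arithmetic, sketched below. The helper name `split_into_blocks` is invented for this illustration; sizes are in whole megabytes to match the slide.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the block sizes (in MB) a file of the given size would occupy."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    # The last block only takes as much space as the data that is left
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


print(split_into_blocks(150))        # [64, 64, 22]
print(split_into_blocks(150, 128))   # [128, 22]
```

Note that the last block is not padded to the full block size; it holds only the remaining 22 MB.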
![Page 17: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/17.jpg)
HDFS CLI
![Page 18: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/18.jpg)
HDFS Files Read/Write
hadoop fs -ls <path>
hadoop fs -mkdir <path>
hadoop fs -cp <Source> <Destination>
hadoop fs -cat <File Path>
hadoop fs -tail <File Path>
hadoop fs -mv <Source> <Destination>
hadoop fs -rm <path>
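When these commands are driven from a script, it can help to build the argument lists first and inspect them before anything touches the cluster. The sketch below only builds and prints the commands; the helper `hadoop_fs` and the directory name `demo` are assumptions made for the example, while `hadoop/purchases.txt` is the file used elsewhere in this session.

```python
import shlex


def hadoop_fs(subcommand, *args):
    """Build the argv list for `hadoop fs -<subcommand> <args...>` without running it."""
    return ["hadoop", "fs", f"-{subcommand}", *args]


# A small dry-run session: create a directory, copy a file in, inspect, clean up
steps = [
    hadoop_fs("mkdir", "demo"),
    hadoop_fs("cp", "hadoop/purchases.txt", "demo/purchases.txt"),
    hadoop_fs("ls", "demo"),
    hadoop_fs("tail", "demo/purchases.txt"),
    hadoop_fs("rm", "demo/purchases.txt"),
]

for argv in steps:
    print(shlex.join(argv))
```

On a machine where the `hadoop` client is installed, each list could be passed to `subprocess.run(argv, check=True)` to actually execute it.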
![Page 19: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/19.jpg)
HDFS File Ownership
sudo -u hdfs hadoop fs -chmod 600 hadoop/purchases.txt
sudo -u hdfs hadoop fs -chown root:root hadoop/purchases.txt
sudo -u hdfs hadoop fs -chgrp training hadoop/purchases.txt
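The `600` passed to `-chmod` is an octal permission mode, read the same way as on a Unix file system. As a small illustration (the helper `mode_to_string` is made up here), the mode can be expanded into the `rwx` string that a directory listing shows:

```python
def mode_to_string(octal_mode):
    """Expand an octal mode such as "600" into an rwx string such as "rw-------"."""
    bits = int(octal_mode, 8)
    flags = "rwxrwxrwx"  # owner, group, others
    return "".join(
        flag if bits & (1 << (8 - i)) else "-"
        for i, flag in enumerate(flags)
    )


print(mode_to_string("600"))  # rw-------
print(mode_to_string("644"))  # rw-r--r--
```

So `600` means the owner can read and write the file, and the group and everyone else have no access.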
![Page 20: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/20.jpg)
HDFS Administration Commands
hadoop version
hadoop classpath
hadoop fsck /
hadoop balancer
hadoop fs -du -s -h <path>
hadoop fs -setrep -w 2 <File Path>
hadoop fs -expunge
hadoop fs -df hdfs:/
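Changing the replication factor with `-setrep` changes how much raw disk space a file consumes across the cluster. A back-of-the-envelope sketch, assuming the 150 MB file from the block-size example and a hypothetical helper `raw_storage_mb`:

```python
def raw_storage_mb(file_size_mb, replication=3):
    """Total disk space used across the cluster for one file, in MB."""
    return file_size_mb * replication


before = raw_storage_mb(150, replication=3)
after = raw_storage_mb(150, replication=2)
print(before, after, before - after)  # 450 300 150
```

Lowering the factor from 3 to 2 frees one full copy of the file, at the cost of tolerating one fewer lost replica.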
![Page 21: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/21.jpg)
HDFS Read/Write Between Local and HDFS
hadoop fs -copyFromLocal <Source - Local> <Destination - HDFS>
hadoop fs -copyToLocal <Source - HDFS> <Destination - Local>
hadoop fs -put <source> <destination>
hadoop fs -get <source> <destination>
![Page 22: Hadoop Session 2 : HDFS](https://reader036.vdocuments.net/reader036/viewer/2022062310/5873c03d1a28abbc788b64d3/html5/thumbnails/22.jpg)
Most Important: Help ??
hadoop fs -help