© inovex Academy
Cluster architecture
HDFS architecture
[Figure: HDFS block write. Shown are a client node, a name node, and data nodes 01 to 12 across rack 1, rack 2, and rack 3. The file consists of blk 1, blk 2, blk 3, and blk 4.]
Client node asks the name node: Where do I store block 1?
Name node answers: data nodes 03, 05, 08
blk 1 (03, 05, 08) is written to data nodes 03, 05, and 08
Each of the three data nodes reports: Done!
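The answer on the slide (03, 05, 08) fits HDFS's default rack-aware placement for three replicas: one replica on one rack, the other two on two different nodes of a second rack. The following is a minimal Python sketch of that idea, not HDFS code. The four-nodes-per-rack layout and the names `RACKS`, `choose_replicas`, `write_block`, and `rack_of` are assumptions made for illustration; the real name node also considers the writer's location, load, and free space.

```python
import random

# Hypothetical layout: twelve data nodes, four per rack (an assumption;
# the slide only shows the rack names and the node numbers).
RACKS = {
    "rack 1": ["01", "02", "03", "04"],
    "rack 2": ["05", "06", "07", "08"],
    "rack 3": ["09", "10", "11", "12"],
}


def choose_replicas(racks, rng):
    """Pick 3 data nodes: one on a first rack, two on a second rack."""
    first_rack, second_rack = rng.sample(sorted(racks), 2)
    first = rng.choice(racks[first_rack])
    second, third = rng.sample(racks[second_rack], 2)
    return [first, second, third]


def write_block(block, pipeline):
    """Pass the block down the pipeline; every node reports back when done."""
    return [(node, block, "Done!") for node in pipeline]


def rack_of(node, racks):
    """Return the name of the rack that holds the given data node."""
    return next(name for name, nodes in racks.items() if node in nodes)


if __name__ == "__main__":
    targets = choose_replicas(RACKS, random.Random(0))
    for node, block, status in write_block("blk 1", targets):
        print(f"data node {node} ({rack_of(node, RACKS)}) stored {block}: {status}")
```

Under the assumed layout, the slide's own answer has the same shape: 03 sits on rack 1, while 05 and 08 share rack 2.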
MapReduce architecture
1. MR v1
2. MR v2 (YARN)
MR (v1) Architecture
[Figure: MR v1 job run. Client node: client JVM with MR program and job. Jobtracker node: JobTracker. Tasktracker node: TaskTracker and a child JVM running the child with a map or reduce task. HDFS is shared by all nodes.]
1: run
2: get new job ID
3: copy job resources
4: submit job
5: initialize job
6: retrieve input splits
7: heartbeat (returns task)
8: retrieve job resources
9: launch
10: run
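Step 7 is the scheduling mechanism of MR v1: a TaskTracker is not sent work unprompted; it sends a heartbeat and the JobTracker's reply carries the task. A toy Python model of that exchange follows. The class and method names are hypothetical (this is not the Hadoop API), and data locality, failures, and the map/reduce slot distinction are left out.

```python
from collections import deque


class JobTracker:
    """Toy JobTracker: it both schedules tasks and tracks where they run."""

    def __init__(self):
        self.pending = deque()   # tasks waiting for a free slot
        self.running = {}        # task name -> tasktracker name

    def submit_job(self, job_id, num_tasks):
        """Step 4: queue one task per input split."""
        for i in range(num_tasks):
            self.pending.append(f"{job_id}_task_{i}")

    def heartbeat(self, tracker, free_slots):
        """Step 7: answer a heartbeat with at most `free_slots` new tasks."""
        assigned = []
        while self.pending and len(assigned) < free_slots:
            task = self.pending.popleft()
            self.running[task] = tracker
            assigned.append(task)
        return assigned


if __name__ == "__main__":
    jt = JobTracker()
    jt.submit_job("job_1", 3)
    print(jt.heartbeat("tasktracker-a", free_slots=2))
    print(jt.heartbeat("tasktracker-b", free_slots=2))
```

Note that a single object holds both the queue and the tracking table, which is the coupling the next slide criticizes.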
Shortcomings of MR v1
1. JobTracker is a single point of failure (SPOF)!
2. JobTracker limits scalability
   1. max. cluster size: 4000 nodes (Yahoo)
3. JobTracker is responsible for both scheduling and tracking
MR v2
1. YARN (Yet Another Resource Negotiator)
2. separates responsibilities
   1. Scheduling: Resource Manager
   2. Tracking: Application Masters
Result: restores scalability
MR (v2) Architecture
[Figure: MR v2 job run. Client node: client JVM with MR program and job. Resource manager node: ResourceManager. Node manager node: NodeManager and MR-AppMaster. Other node manager nodes: NodeManager and a task JVM running YarnChild with a map or reduce task. HDFS is shared by all nodes.]
1: run
2: get new application ID
3: copy job resources
4: submit application
5a: start container
5b: launch
6: initialize job
7: retrieve input splits
8: allocate resources
9a: start container
9b: launch
10: retrieve job resources
11: run
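The main difference from MR v1 is who does what: the ResourceManager only hands out containers, while a per-job MR-AppMaster initializes the job and keeps track of its tasks. A toy Python model of that split is sketched below. All class, method, and node names are hypothetical (this is not the YARN API), and the round-robin allocation merely stands in for a real scheduler.

```python
class ResourceManager:
    """Toy scheduler: hands out containers, knows nothing about tasks."""

    def __init__(self, node_managers):
        self.node_managers = node_managers
        self.next_app_id = 1

    def new_application_id(self):
        """Step 2: issue a fresh application ID."""
        app_id = f"application_{self.next_app_id}"
        self.next_app_id += 1
        return app_id

    def allocate(self, count):
        """Steps 5a and 8: return `count` node names, round-robin."""
        nodes = self.node_managers
        return [nodes[i % len(nodes)] for i in range(count)]


class MRAppMaster:
    """Toy per-job master: requests containers and tracks its own tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.tasks = {}   # task name -> node that runs it

    def run_job(self, num_splits):
        """Steps 6 to 11: one map task per input split."""
        for i, node in enumerate(self.rm.allocate(num_splits)):
            self.tasks[f"{self.app_id}_map_{i}"] = node
        return self.tasks


def submit(rm, num_splits):
    """Steps 2 to 5b: get an ID, then start the master in its own container."""
    app_id = rm.new_application_id()
    am_node = rm.allocate(1)[0]
    master = MRAppMaster(app_id, rm)
    return am_node, master.run_job(num_splits)


if __name__ == "__main__":
    rm = ResourceManager(["nm1", "nm2"])
    print(submit(rm, num_splits=3))
```

Because the task table lives in `MRAppMaster`, each job brings its own tracker and the central component stays small.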
MR v2: Resource manager
1. manages resources such as memory
2. sub-component: scheduler
   1. Capacity Scheduler or Fair Scheduler
   2. allocates containers for the application master (AM) and tasks
Allows for heterogeneous hardware; no more “slots”
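Why resource-based containers suit heterogeneous hardware can be shown with a small allocation sketch in Python: each node advertises how much memory it has, and each request states how much it needs, instead of every node offering a fixed number of identical slots. The function name, the node names, and the first-fit strategy are assumptions for illustration and are not how the Capacity or Fair Scheduler actually decides.

```python
def allocate_containers(node_memory_mb, requests_mb):
    """First-fit: place each request on the first node with enough free memory.

    Returns (placement, free), where placement is a list of
    (request_mb, node or None) and free maps node -> remaining memory.
    """
    free = dict(node_memory_mb)
    placement = []
    for request in requests_mb:
        for node, available in free.items():
            if available >= request:
                free[node] = available - request
                placement.append((request, node))
                break
        else:
            # No node can host this container right now.
            placement.append((request, None))
    return placement, free


if __name__ == "__main__":
    nodes = {"small-node": 4096, "large-node": 16384}
    placement, free = allocate_containers(nodes, [1024, 4096, 2048, 8192])
    for request, node in placement:
        print(f"{request} MB container -> {node}")
    print("free memory:", free)
```

A large node simply hosts more or bigger containers than a small one; nothing has to be configured as a per-node slot count.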
MR v2: Application master
1. manages application lifecycle
   1. task coordination: requests containers from the RM
   2. monitoring (via heartbeat)
   3. counter management
2. one instance per application
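The monitoring and counter duties can be pictured with a small Python sketch: tasks report in periodically, the master remembers when it last heard from each one, and it sums the latest counters into job-level totals. The class, its methods, the timeout value, and the counter name are hypothetical and only illustrate the idea; they are not the MR-AppMaster's real interface.

```python
from collections import Counter


class AppMaster:
    """Toy application master for one job: heartbeats and counters only."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}   # task name -> time of last heartbeat
        self.counters = {}    # task name -> latest counters from that task

    def heartbeat(self, task, now_s, counters):
        """Record that `task` is alive and store its current counters."""
        self.last_seen[task] = now_s
        self.counters[task] = Counter(counters)

    def job_counters(self):
        """Sum the latest per-task counters into job-level totals."""
        total = Counter()
        for task_counters in self.counters.values():
            total.update(task_counters)
        return total

    def lost_tasks(self, now_s):
        """Tasks whose last heartbeat is older than the timeout."""
        return sorted(
            task
            for task, seen in self.last_seen.items()
            if now_s - seen > self.timeout_s
        )


if __name__ == "__main__":
    am = AppMaster(timeout_s=10)
    am.heartbeat("map_0", now_s=0, counters={"records": 5})
    am.heartbeat("map_1", now_s=8, counters={"records": 7})
    am.heartbeat("map_0", now_s=3, counters={"records": 9})
    print(am.job_counters()["records"], am.lost_tasks(now_s=14))
```

Since there is one such master per application, this bookkeeping grows with the job, not with the cluster.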
MR v2: Node manager
1. responsible for actual execution of tasks
2. tracks task progress
MR v2: misc
1. Job history server
2. Shuffle service
3. co-existence with
   1. other Hadoop versions
   2. non-MR apps (e.g., Impala)