Meet Hadoop Family: part 2
YARN
• What is it? The resource management platform of a Hadoop cluster. It lets processing frameworks such as MapReduce, Spark, and others share memory and CPU dynamically
• Purpose:
  - More predictable performance
  - Better cluster utilization
• Compared to MapReduce v1:
  - MapReduce v1 starts to break down on clusters of more than 4,000 nodes
  - YARN allows other frameworks to run on the cluster and also supports multi-tenancy
  - YARN is backward compatible with MapReduce v1
YARN Architecture
Scheduler types
• FIFO Scheduler
• Capacity Scheduler:
  - Fixed pools of resources
  - FIFO scheduling within each pool
• Fair Scheduler:
  - Weighted pools of resources
  - Fair sharing between pools
Capacity Scheduler
• Guaranteed capacity for each pool, with hard and soft limits
• Hierarchical pools under a single root pool
• Elasticity: a pool can use idle capacity beyond its guarantee, with optional preemption to reclaim it
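Because pools are hierarchical, a pool's guaranteed (soft) and maximum (hard) share of the whole cluster comes from multiplying percentages down the tree. The Python sketch below illustrates this under stated assumptions: percentage-based settings as in the `yarn.scheduler.capacity.<queue-path>.capacity` and `yarn.scheduler.capacity.<queue-path>.maximum-capacity` properties, each taken relative to the parent pool, with sibling capacities summing to 100%. The `Queue` class, the `absolute_capacities` function, and the queue names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """A pool (queue) in a hypothetical Capacity Scheduler hierarchy."""
    name: str
    capacity: float               # guaranteed share of the parent, in percent (soft limit)
    max_capacity: float = 100.0   # elastic ceiling as a share of the parent, in percent (hard limit)
    children: list = field(default_factory=list)


def absolute_capacities(queue, parent_abs=1.0, parent_abs_max=1.0, prefix=""):
    """Map each queue path to (guaranteed, maximum) fractions of the whole cluster."""
    path = f"{prefix}.{queue.name}" if prefix else queue.name
    abs_cap = parent_abs * queue.capacity / 100.0
    abs_max = parent_abs_max * queue.max_capacity / 100.0
    result = {path: (abs_cap, abs_max)}
    if queue.children:
        # Sibling guarantees must add up to the parent's full capacity.
        total = sum(child.capacity for child in queue.children)
        if abs(total - 100.0) > 1e-9:
            raise ValueError(f"capacities under {path} sum to {total}%, expected 100%")
        for child in queue.children:
            result.update(absolute_capacities(child, abs_cap, abs_max, path))
    return result


root = Queue("root", 100.0, children=[
    Queue("prod", 70.0, children=[Queue("etl", 60.0), Queue("adhoc", 40.0)]),
    Queue("dev", 30.0, max_capacity=50.0),
])
caps = absolute_capacities(root)
for path, (guaranteed, maximum) in caps.items():
    print(f"{path:16} guaranteed={guaranteed:.0%} max={maximum:.0%}")
```

In this example `root.dev` is guaranteed 30% of the cluster but can elastically grow to 50% when other pools are idle, while `root.prod.etl` is guaranteed 42% and may grow to the whole cluster.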
Preemption Option
• T1: time at which App2 is submitted
• T2: time at which App1 can finish
• T3: time at which App2 can finish
Fair Scheduler
• Each application is assigned to a pool; subpools are possible
• Excess capacity is spread across all pools
• Pools with minimum resources defined receive priority during allocation
• Minimum resources are the amount of resources that must be allocated to a pool before any fair allocation takes place, often used to satisfy an SLA (service level agreement)
• Pools can be assigned a weight
• Two preemption types: minimum share and fair share
• ResourceManager web interface: port 8088
• JobHistory server web interface: port 19888
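Weights and minimum shares can be combined into one computation: find a single resources-per-unit-of-weight ratio R so that each pool receives weight × R, raised to its minimum share and capped at its demand, and the shares add up to the cluster capacity. The Python sketch below does this for a single resource by binary search. It mirrors the general idea behind the Fair Scheduler's fair-share calculation but is a simplified illustration, not its actual code; the pool names and numbers are hypothetical, and it assumes the minimum shares fit within the capacity.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    demand: float           # resources the pool's applications are asking for
    weight: float = 1.0     # relative weight used for fair sharing
    min_share: float = 0.0  # resources the pool should get before fair sharing


def share_at(pool, ratio):
    """Share of a pool for a given resources-per-unit-of-weight ratio."""
    share = max(pool.weight * ratio, pool.min_share)  # raise to the minimum share
    return min(share, pool.demand)                    # never more than the pool asks for


def fair_shares(pools, capacity, iterations=100):
    """Weighted fair shares of a single resource, e.g. memory in GB.

    Assumes the minimum shares fit within the capacity.
    """
    if sum(p.demand for p in pools) <= capacity:
        return {p.name: p.demand for p in pools}  # every pool is fully satisfied
    # Find an upper bound for the ratio, then binary-search the ratio that
    # makes the shares add up to the capacity.
    lo, hi = 0.0, 1.0
    while sum(share_at(p, hi) for p in pools) < capacity:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(share_at(p, mid) for p in pools) < capacity:
            lo = mid
        else:
            hi = mid
    return {p.name: share_at(p, hi) for p in pools}


pools = [
    Pool("A", demand=100, weight=2),
    Pool("B", demand=100, weight=1),
    Pool("C", demand=50, weight=1, min_share=40),
]
print(fair_shares(pools, capacity=100))
```

With 100 units of capacity this gives roughly A = 40, B = 20, C = 40. Without C's minimum share the same pools would get roughly A = 50, B = 25, C = 25, which shows how a minimum share gives a pool priority during allocation.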
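Besides the browser UI, the ResourceManager on port 8088 exposes a REST API; `GET /ws/v1/cluster/apps?states=FINISHED` is roughly the REST counterpart of listing applications from the command line. A minimal Python sketch using only the standard library, assuming an unsecured cluster (no Kerberos/SPNEGO) reachable over plain HTTP on the default port; the host name `rm.example.com` and the helper names are hypothetical, and only the `id`, `state`, and `finalStatus` fields of each application are read.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def apps_url(rm_host, states=None, port=8088):
    """URL of the ResourceManager's cluster applications endpoint."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/apps"
    if states:
        url += "?" + urlencode({"states": ",".join(states)})
    return url


def parse_apps(payload):
    """Return (id, state, finalStatus) tuples from a /ws/v1/cluster/apps JSON body."""
    data = json.loads(payload)
    apps = (data.get("apps") or {}).get("app") or []  # tolerate an empty or null "apps"
    return [(app["id"], app["state"], app.get("finalStatus")) for app in apps]


def list_apps(rm_host, states=None):
    """Fetch the application list over HTTP (requires network access to the ResourceManager)."""
    request = Request(apps_url(rm_host, states), headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return parse_apps(response.read().decode("utf-8"))
```

For example, `list_apps("rm.example.com", ["FINISHED"])` would return the finished applications as `(id, state, finalStatus)` tuples.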
Log Aggregation
• Logs can be grouped by application
• Stored in HDFS (not the case in MapReduce v1)
• Gives better load balancing when writing logs
Common Commands
• Show applications:
  - yarn application -list
  - yarn application -list -appStates ALL
  - yarn application -status <application_id>
  - yarn application -list -appStates FINISHED
• Kill an application: yarn application -kill <application_id>
• Show logs: yarn logs -applicationId <application_id>
• List YARN nodes: yarn node -list
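The listing and log commands can be chained from a script. A minimal Python sketch, assuming the `yarn` client is installed and configured on the machine: it treats the `yarn application -list` output as free text and pulls out anything shaped like an application ID, rather than relying on the exact column layout, which may differ between Hadoop versions. For finished applications `yarn logs` reads the aggregated logs, so log aggregation must be enabled. The function names are hypothetical.

```python
import re
import subprocess

# YARN application IDs have the form application_<cluster timestamp>_<sequence number>.
APP_ID = re.compile(r"\bapplication_\d+_\d+\b")


def extract_app_ids(cli_output):
    """Application IDs in the order they first appear, without duplicates."""
    return list(dict.fromkeys(APP_ID.findall(cli_output)))


def logs_command(app_id):
    """Argument list for `yarn logs -applicationId <application_id>`."""
    return ["yarn", "logs", "-applicationId", app_id]


def print_finished_app_logs():
    """List finished applications, then print each one's aggregated logs.

    Requires a configured Hadoop client with `yarn` on the PATH.
    """
    listing = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "FINISHED"],
        capture_output=True, text=True, check=True,
    ).stdout
    for app_id in extract_app_ids(listing):
        # Output is not captured, so the logs go straight to this process's stdout.
        subprocess.run(logs_command(app_id), check=True)
```

Matching IDs with a regular expression also picks up IDs embedded in tracking URLs; deduplicating with `dict.fromkeys` keeps each application once, in listing order.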
Questions? https://www.meetup.com/Jakarta-Hadoop-Big-Data/