meet hadoop family: part 2

YARN

• What is it? The resource management platform of a Hadoop cluster. It shares memory and CPU dynamically among processing frameworks such as MapReduce, Spark, and others

• Purpose: more predictable performance and better cluster utilization

• Compared to MapReduce v1: MapReduce v1 starts to break down beyond roughly 4,000 nodes; YARN also lets other frameworks run on the same cluster and supports multi-tenancy, while remaining backward compatible with MapReduce v1

YARN Architecture

Scheduler types

• FIFO Scheduler

• Capacity scheduler: fixed pools of resources, FIFO scheduling within each pool

• Fair scheduler: weighted pools of resources, fair sharing between them

Capacity Scheduler

• Guaranteed capacity for each pool, with soft and hard limits

• Hierarchical pools under a single root pool

• Elasticity: a pool can use idle capacity beyond its guarantee, with optional preemption to take it back
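To make "guaranteed capacity, hard limit, elasticity" concrete, here is a minimal sketch in Python. It is a deliberately simplified toy, not the Capacity Scheduler's real algorithm: capacities are given as integer percentages (as in the scheduler's configuration), each pool (a "queue" in Capacity Scheduler terms) first receives its demand up to its guarantee, and idle containers are then lent out up to each pool's hard limit. The function and pool names are hypothetical.

```python
from typing import Dict


def capacity_allocate(total: int,
                      guaranteed_pct: Dict[str, int],
                      max_pct: Dict[str, int],
                      demand: Dict[str, int]) -> Dict[str, int]:
    """Toy Capacity Scheduler: guaranteed share first, then elastic borrowing."""
    # Step 1: each pool receives its demand, up to its guaranteed capacity
    # (the soft limit). Guarantees are assumed to be > 0.
    alloc = {q: min(demand[q], total * pct // 100)
             for q, pct in guaranteed_pct.items()}
    free = total - sum(alloc.values())
    # Step 2 (elasticity): lend idle containers to pools with unmet demand,
    # least-served pool (relative to its guarantee) first, but never beyond
    # the pool's hard limit (maximum capacity).
    for q in sorted(alloc, key=lambda name: alloc[name] / guaranteed_pct[name]):
        hard_limit = total * max_pct[q] // 100
        extra = min(free, demand[q] - alloc[q], hard_limit - alloc[q])
        if extra > 0:
            alloc[q] += extra
            free -= extra
    return alloc


if __name__ == "__main__":
    # 100 containers; "prod" is guaranteed 70%, "dev" 30% with a 50% hard limit.
    print(capacity_allocate(100, {"prod": 70, "dev": 30},
                            {"prod": 100, "dev": 50},
                            {"prod": 20, "dev": 80}))
```

With these hypothetical numbers the call prints `{'prod': 20, 'dev': 50}`: "dev" borrows 20 containers that "prod" is not using, stops at its 50% hard limit, and the remaining 30 containers stay idle.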

Preemption Option

• T1: time at which App2 is submitted

• T2: time at which App1 can finish

• T3: time at which App2 can finish
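The idea behind preemption can be sketched as a small planning step: when a pool that is below its guarantee has pending demand, containers that another pool borrowed beyond its own guarantee are reclaimed. The Python below is a minimal sketch under simplifying assumptions (integer percentages, idle containers used before anything is preempted, no grace period); a real scheduler also gives applications time to release containers before killing them. The function and queue names are hypothetical.

```python
from typing import Dict


def plan_preemption(total: int,
                    guaranteed_pct: Dict[str, int],
                    used: Dict[str, int],
                    demand: Dict[str, int]) -> Dict[str, int]:
    """Toy preemption planner: how many containers to reclaim, per pool."""
    guarantee = {q: total * pct // 100 for q, pct in guaranteed_pct.items()}
    # Containers that under-served pools are owed, up to their guarantee.
    owed = sum(max(0, min(demand[q], guarantee[q]) - used[q]) for q in guarantee)
    # Idle containers are handed out first; only the rest must be preempted.
    need = max(0, owed - (total - sum(used.values())))
    plan = {}
    for q in guarantee:
        surplus = max(0, used[q] - guarantee[q])  # borrowed beyond guarantee
        take = min(surplus, need)
        if take > 0:
            plan[q] = take
            need -= take
    return plan


if __name__ == "__main__":
    # App1 (pool A) has borrowed the whole cluster; App2 arrives in pool B.
    print(plan_preemption(100, {"A": 50, "B": 50},
                          {"A": 100, "B": 0}, {"A": 100, "B": 40}))
```

Here the plan is `{'A': 40}`: App2 gets its 40 containers soon after submission instead of waiting for App1 to release them, at the cost of App1 losing part of the capacity it had borrowed.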

Fair Scheduler

• Each application is assigned to a pool; pools can contain subpools

• Excess capacity is spread across all pools

• Pools with a defined minimum share receive priority during allocation

• Minimum resources are the amount that must be allocated to a pool before any fair sharing takes place; they are often used to meet an SLA (service level agreement)

• Pools can be assigned a weight

• Preemption types: minimum share and fair share

• ResourceManager web interface: port 8088

• Job history web interface: port 19888
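The Fair Scheduler rules above (minimum resources first, then the rest shared according to weights) can be illustrated with a minimal Python sketch. It is a simplified greedy approximation, not the Fair Scheduler's actual implementation: containers are handed out one at a time to the pool with unmet demand whose allocation-to-weight ratio is lowest. Pool names, weights, and demands are hypothetical.

```python
from typing import Dict


def fair_share(total: int,
               weights: Dict[str, float],
               min_share: Dict[str, int],
               demand: Dict[str, int]) -> Dict[str, int]:
    """Toy weighted fair sharing with minimum shares, one container at a time."""
    # Phase 1: pools with a minimum share receive it first (capped at demand).
    alloc = {p: min(min_share.get(p, 0), demand[p]) for p in weights}
    free = total - sum(alloc.values())
    if free < 0:
        raise ValueError("minimum shares exceed cluster capacity")
    # Phase 2: give each remaining container to the pool with unmet demand
    # that is furthest below its weighted share (smallest alloc / weight).
    while free > 0:
        hungry = [p for p in weights if alloc[p] < demand[p]]
        if not hungry:
            break  # all demand met; leftover containers stay idle
        p = min(hungry, key=lambda name: alloc[name] / weights[name])
        alloc[p] += 1
        free -= 1
    return alloc


if __name__ == "__main__":
    # "sla" has a 30-container minimum; "etl" has twice the weight of "adhoc".
    print(fair_share(100, {"etl": 2, "adhoc": 1, "sla": 1},
                     {"sla": 30}, {"etl": 100, "adhoc": 100, "sla": 35}))
```

In this example "sla" keeps its 30-container minimum even though its weighted share alone would be smaller, and the remaining 70 containers are split roughly 2:1 between "etl" and "adhoc".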

Log Aggregation

• Logs can be grouped by application

• Logs are stored in HDFS (this was not the case in MapReduce v1)

• Gives better load balancing when writing logs

Common Commands

• List active applications: yarn application -list

• List applications in all states: yarn application -list -appStates ALL

• Show the status of an application: yarn application -status <application_id>

• List finished applications: yarn application -list -appStates FINISHED

• Kill an application: yarn application -kill <application_id>

• Show logs: yarn logs -applicationId <application_id>

• List YARN nodes: yarn node -list
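Application listings are also available over HTTP from the ResourceManager's REST API, which runs on the same port as its web interface (8088 by default). The Python below is a minimal sketch of listing applications by state through the `/ws/v1/cluster/apps` endpoint and its `states` filter; it assumes plain HTTP without Kerberos (SPNEGO) authentication, and the helper names and the host `rm.example.com` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def apps_url(rm_host: str, state: str = "FINISHED", port: int = 8088) -> str:
    """Build the Cluster Applications API URL, filtered by application state."""
    return f"http://{rm_host}:{port}/ws/v1/cluster/apps?{urlencode({'states': state})}"


def parse_apps(payload: str):
    """Return (id, name, state) tuples; "apps" is null when nothing matches."""
    apps = json.loads(payload).get("apps") or {}
    return [(a["id"], a["name"], a["state"]) for a in apps.get("app", [])]


def list_apps(rm_host: str, state: str = "FINISHED"):
    """Query a live ResourceManager (requires network access)."""
    with urlopen(apps_url(rm_host, state), timeout=10) as resp:
        return parse_apps(resp.read().decode("utf-8"))
```

For example, `list_apps("rm.example.com")` would return the finished applications, similar to filtering `yarn application -list` by the FINISHED state.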

Questions? https://www.meetup.com/Jakarta-Hadoop-Big-Data/