Hadoop YARN in the Cloud Junping Du Staff Engineer, VMware China Hadoop Summit, 2013


Page 1

Hadoop YARN in the Cloud

Junping Du, Staff Engineer, VMware

China Hadoop Summit, 2013

Page 2

Agenda

• Hadoop YARN – Hub for Big Data Applications

• YARN and Cloud Computing

• HVE (Hadoop Virtualization Extension) work on YARN

Page 3

Hadoop MapReduce v1 (Classic)

• JobTracker
  – Manages cluster resources and job scheduling
• TaskTracker
  – Per-node agent
  – Manages tasks

Page 4

MapReduce v1 Limitations

• Scalability
  – A single JobTracker manages both cluster resources and job scheduling
• SPOF (Single Point of Failure)
  – A JobTracker failure causes all queued and running jobs to fail
  – Restart is very tricky due to complex state
• Hard partition of resources into map and reduce slots
  – Low resource utilization
• Lacks support for alternate paradigms
• Lacks wire-compatible protocols
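The cost of hard slot partitioning can be seen with a little arithmetic. The sketch below is illustrative only: the class name, slot counts, and memory sizes are assumptions made up for this example (not figures from the talk or Hadoop code), and it assumes map and reduce tasks need containers of the same size.

```java
// Illustrative arithmetic only; all names and numbers are hypothetical.
class SlotUtilizationSketch {

    /** MRv1-style node: a fixed number of map slots and of reduce slots. */
    static int runnableWithSlots(int mapSlots, int reduceSlots,
                                 int pendingMaps, int pendingReduces) {
        // A map task can only use a map slot, a reduce task only a reduce slot.
        return Math.min(mapSlots, pendingMaps) + Math.min(reduceSlots, pendingReduces);
    }

    /** YARN-style node: one memory pool handed out as equally sized containers. */
    static int runnableWithContainers(int nodeMemoryMb, int containerMb,
                                      int pendingMaps, int pendingReduces) {
        int capacity = nodeMemoryMb / containerMb;
        return Math.min(capacity, pendingMaps + pendingReduces);
    }

    public static void main(String[] args) {
        // Map-heavy phase: 10 pending maps and no reduces yet.
        int withSlots = runnableWithSlots(4, 4, 10, 0);            // reduce slots sit idle
        int withContainers = runnableWithContainers(8192, 1024, 10, 0);
        System.out.println("slot-based tasks running: " + withSlots);
        System.out.println("container-based tasks running: " + withContainers);
    }
}
```

With these made-up numbers the slot-based node runs 4 tasks while its 4 reduce slots stay empty, whereas the pooled node runs 8.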

Page 5

YARN Architecture

• Splits up the two major functions of the JobTracker
  – Resource Manager (RM): cluster resource management
  – Application Master (AM): task scheduling and monitoring
• NodeManager (NM): a new per-node slave
  – Launches the applications’ containers
  – Monitors their resource usage (CPU, memory) and reports it to the Resource Manager
• YARN maintains compatibility with existing MapReduce applications and supports other applications

Page 6

YARN – Hub for Big Data Applications

[Diagram: MapReduce, Tez, Storm, Spark, HBase, Impala, OpenMPI, and Distributed Shell running on YARN, with HDFS underneath]

• App-specific AM
• HOYA (HBase On YArn)
  – Long-running services (YARN-896)
• LLAMA (Low Latency Application MAster)
  – Gang scheduler (YARN-624)

Page 7

YARN and Cloud

• Two different perspectives:
  – YARN-centric perspective
    • YARN is the key platform for apps
    • YARN is independent of infrastructure; running on top of the Cloud shows YARN’s generality
  – Cloud-centric perspective
    • YARN is an umbrella for one kind of application
    • Supporting YARN shows the Cloud’s generality

Page 8

YARN and Cloud: YARN-centric Perspective

[Diagram: Big Data Apps (MapReduce, Tez, Storm, Spark, HBase, Impala, Open MPI, Distributed Shell) run on YARN; YARN runs on Infrastructure, either bare-metal machines or Cloud Infrastructure (VMware, OpenStack)]

Page 9

YARN and Cloud: Cloud-centric Perspective

[Diagram: Cloud Infrastructure (VMware, OpenStack, etc.) hosts Legacy Apps, Non-YARN Big Data Apps, and YARN Apps; YARN in turn runs MapReduce, Tez, Storm, Spark, HBase, Impala, Open MPI, and Distributed Shell]

Page 10

YARN vs. Cloud

• Similarity
  – Both aim to share resources across applications
  – Both provide global resource management
• YARN vs. Cloud
  – YARN manages resources at the OS layer; the Cloud manages resources at the hypervisor layer (not directly comparable, but the hypervisor is more powerful than the OS)
  – Apps managed by YARN need a specific AppMaster; apps managed by the Cloud run exactly as they would on physical machines (advantage: Cloud)
  – YARN tracks application-specific metrics and progress; the Cloud only tracks the underlying resources (advantage: YARN)

Page 11

YARN + Cloud

• Why YARN + Cloud?
  – Leverage virtualization for strong isolation, fine-grained resource sharing, and other benefits
  – Uniform infrastructure to simplify enterprise IT
• What does it look like?
  – Run the YARN NM inside VMs managed by the Cloud Infrastructure
  – Build a communication channel between the YARN RM and the Cloud Resource Manager for coordination
• How do we do it?
  – The first item above is easy and works smoothly
  – The second can be achieved in two ways
    • YARN becomes aware of, and can manipulate, Cloud resource changes
    • YARN provides a generic resource notification mechanism that the Cloud Manager can use when resources change

Page 12

Elastic YARN Node in the Cloud

• A VM’s resource boundary can be elastic
  – CPU is easy: time slicing (with constraints)
  – Memory is harder: page sharing and memory ballooning
  – In case of contention, enforce limits and proportional sharing
  – “Stealing” resources behind the apps’ backs could cause bad performance (paging)
  – App-aware resource management could address these issues
• Hadoop YARN resource model
  – Dynamic at the cluster level: nodes can be added or removed
  – But static per node
• In this case, shall we enable resource elasticity on the VM?
  – If yes, performance is low when resource contention happens
  – If no, utilization is as low as on physical boxes, because free resources cannot be used by other busy VMs
• We need a better answer.
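The dilemma on this slide can be made concrete with a toy calculation. This is a sketch under stated assumptions: the class, method names, and memory figures are hypothetical and are not YARN code. It assumes the scheduler only ever sees the capacity a node reported at registration.

```java
// Toy model; names and numbers are hypothetical, not YARN classes.
class StaticNodeSketch {

    /** Memory the scheduler believes it can still hand out on a node. */
    static int schedulableMb(int configuredMb, int allocatedMb) {
        return Math.max(0, configuredMb - allocatedMb);
    }

    /** Memory promised to containers beyond what the VM really has right now. */
    static int overcommittedMb(int allocatedMb, int actualVmMb) {
        return Math.max(0, allocatedMb - actualVmMb);
    }

    public static void main(String[] args) {
        int configuredMb = 8192; // fixed when the NodeManager registers

        // Elasticity on: the hypervisor reclaims 2 GB from a fully allocated node.
        System.out.println("overcommitted MB: " + overcommittedMb(8192, 6144));

        // Elasticity off: a mostly idle node keeps memory other VMs cannot borrow.
        System.out.println("idle but reserved MB: " + schedulableMb(configuredMb, 2048));
    }
}
```

In the first case containers hold 2048 MB more than the VM has, which is where paging comes from; in the second, 6144 MB sits reserved and unused.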

Page 13

HVE provides the answer!

• Hadoop Virtualization Extensions
  – A project to enhance Hadoop running on virtualization
• Goal: make Hadoop Cloud-ready
  – Provide virtualization-awareness to Hadoop, e.g. virtual topology and virtual resources
  – Deliver generic utilities that can be leveraged by virtualized platforms
• Independent of virtualization platform and cloud infrastructure
• 100% contribution to the Apache Hadoop community

Page 14

HVE

• Philosophy
  – Make infrastructure-related components abstract
  – Deliver different implementations that can be configured properly
• E.g. BlockPlacementPolicy

[Diagram: BlockPlacementPolicy (abstract) is extended by BlockPlacementPolicyDefault and by a BlockPlacementPolicy for virtualization]
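The "abstract component, configurable implementations" philosophy can be sketched in a few lines of Java. Everything in this block is an assumption for illustration: the class names, the config key `sketch.placement.policy.class`, and the `physicalHost/vm` node naming are invented and are not the real HDFS classes or settings.

```java
import java.util.Properties;

// Hypothetical stand-ins for the pattern; these are NOT the real HDFS classes.
abstract class PlacementPolicySketch {

    /** May two replicas of one block live on these two nodes? */
    abstract boolean allows(String nodeA, String nodeB);

    /** Instantiates whichever implementation the (made-up) config key names. */
    static PlacementPolicySketch fromConfig(Properties conf) {
        String className = conf.getProperty("sketch.placement.policy.class",
                DefaultPolicySketch.class.getName());
        try {
            return (PlacementPolicySketch) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load policy " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("sketch.placement.policy.class",
                VirtualizationPolicySketch.class.getName());
        PlacementPolicySketch policy = fromConfig(conf);
        // Two VMs on the same physical host: the virtualization-aware policy says no.
        System.out.println(policy.allows("host1/vm1", "host1/vm2"));
    }
}

/** Default behaviour: replicas only need to be on distinct nodes. */
class DefaultPolicySketch extends PlacementPolicySketch {
    @Override
    boolean allows(String nodeA, String nodeB) {
        return !nodeA.equals(nodeB);
    }
}

/** Virtualization-aware behaviour: replicas must not share a physical host. */
class VirtualizationPolicySketch extends PlacementPolicySketch {
    @Override
    boolean allows(String nodeA, String nodeB) {
        return !hostOf(nodeA).equals(hostOf(nodeB));
    }

    // Node names are assumed to look like "physicalHost/vm".
    private static String hostOf(String node) {
        int slash = node.indexOf('/');
        return slash < 0 ? node : node.substring(0, slash);
    }
}
```

The caller depends only on the abstract type, so switching between bare-metal and virtualized behaviour is a configuration change rather than a code change.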

Page 15

Elastic YARN Node in the Cloud

[Diagram: a Virtualization Host running a Virtual YARN Node (Datanode on a VMDK, NodeManager with Containers) next to Other Workload. Callouts: "Grow/Shrink resource of a VM", "Grow/Shrink by tens of GB in memory?", "Add/Remove Resources?"]

Page 16

Implementation – YARN-291 (umbrella)

• YARN-311
  – Core scheduler changes
• YARN-313
  – CLI
• YARN-312
  – AdminProtocol changes
• REST API, JMX, etc.

[Diagram: components involved in a node resource update. Outside the Resource Manager: Cloud Resource Manager, Admin CLI, and Node Manager (connected by Heartbeat). Inside the Resource Manager: AdminService, Resource Tracker Service, RMContext, RMNode, and the Scheduler with SchedulerNode and Cluster Resource. Labeled call: UpdateNodeResource()]

yarn rmadmin -updateNodeResource <NodeId> <Resource>
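The effect of a node resource update on scheduling can be sketched with a toy model. This is not the YARN-291 patch or any real YARN class: the class name, method names, node id, and memory figures below are hypothetical, and the model tracks memory only.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the flow on this slide; not real YARN classes.
class ElasticNodeSketch {
    private final Map<String, Integer> totalMb = new HashMap<>();
    private final Map<String, Integer> usedMb = new HashMap<>();

    /** NodeManager registration: capacity as configured on the node. */
    void register(String nodeId, int memoryMb) {
        totalMb.put(nodeId, memoryMb);
        usedMb.put(nodeId, 0);
    }

    /** Scheduler side: grant a container only if it fits the current capacity. */
    boolean allocate(String nodeId, int memoryMb) {
        if (availableMb(nodeId) < memoryMb) {
            return false;
        }
        usedMb.put(nodeId, usedMb.get(nodeId) + memoryMb);
        return true;
    }

    /** Admin side: the cloud manager reports a new size for a running node. */
    void updateNodeResource(String nodeId, int newMemoryMb) {
        requireKnown(nodeId);
        totalMb.put(nodeId, newMemoryMb);
    }

    /** Can be negative after a shrink; this sketch does not preempt containers. */
    int availableMb(String nodeId) {
        requireKnown(nodeId);
        return totalMb.get(nodeId) - usedMb.get(nodeId);
    }

    private void requireKnown(String nodeId) {
        if (!totalMb.containsKey(nodeId)) {
            throw new IllegalArgumentException("unknown node " + nodeId);
        }
    }

    public static void main(String[] args) {
        ElasticNodeSketch rm = new ElasticNodeSketch();
        rm.register("vm1:8041", 4096);
        System.out.println(rm.allocate("vm1:8041", 3072)); // fits in 4 GB
        System.out.println(rm.allocate("vm1:8041", 2048)); // does not fit yet
        rm.updateNodeResource("vm1:8041", 8192);           // VM grown by the cloud
        System.out.println(rm.allocate("vm1:8041", 2048)); // fits after the update
    }
}
```

The point of the design is that capacity becomes a value the Resource Manager can change for a live node, instead of something fixed at registration.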

Page 17

Reference

• YARN MapReduce 2.0
  – https://issues.apache.org/jira/browse/MAPREDUCE-279
• HVE topology extension
  – https://issues.apache.org/jira/browse/HADOOP-8468
• HVE topology extension for YARN
  – https://issues.apache.org/jira/browse/YARN-18
• HVE elastic resource configuration
  – https://issues.apache.org/jira/browse/YARN-291
• Gang scheduling
  – https://issues.apache.org/jira/browse/YARN-624
• Long-lived services in YARN
  – https://issues.apache.org/jira/browse/YARN-896

Page 18

Thanks!

Junping Du [email protected]