Hadoop Summit San Jose 2014 - Apache Hadoop YARN: Best Practices
DESCRIPTION
With the advent of YARN as part of Apache Hadoop 2, Hadoop clusters evolved from running only MapReduce jobs to running a wide range of applications, from Apache Tez for interactive/batch applications to Apache Storm for stream processing. To make the best use of a YARN cluster, there are questions that need to be addressed at various levels. For an administrator managing a YARN cluster, how does one go from configuring Map/Reduce slots to configuring resources and containers? Operations teams now have to deal with a new range of metrics when managing YARN clusters. A YARN application developer has to understand how to write an efficient application that makes the best use of YARN while degrading gracefully on a busy cluster. In this talk, we’ll cover YARN best practices from two perspectives: administrators and developers. We’ll describe how administrators can configure a YARN cluster to use resources optimally depending on the kind of hardware and the types of applications being run. We’ll focus on managing a cluster shared across numerous users, and on how to manage queues and allocate capacity across different business units. For developers, we’ll cover how to interact with the various components of YARN and focus on the implicit features that all applications need to be built with, such as security and failure handling.
TRANSCRIPT
© Hortonworks Inc. 2014
Apache Hadoop YARN
Best Practices
Zhijie Shen
zshen [at] hortonworks.com
Varun Vasudev
vvasudev [at] hortonworks.com
Who we are
• Zhijie Shen
  – Software engineer at Hortonworks
  – Apache Hadoop Committer
  – Apache Samza Committer and PPMC member
  – PhD from the National University of Singapore
• Varun Vasudev
  – Software engineer at Hortonworks, working on YARN
  – Worked on image and web search at Yahoo!
Architecting the Future of Big Data
Agenda
• Talking about what we have learnt from our experiences working with YARN users
• Best practices for
  – Administrators
  – Application Developers
For Administrators
Sub-Agenda
• Overview of YARN configuration
• ResourceManager
• Schedulers
• NodeManagers
• Others
  – Log aggregation
  – Metrics
Overview of YARN configuration
• Almost everything YARN related is in yarn-site.xml
• Granular: individual variables are documented
• Nearly 150 configuration properties
  – Required: very small set (hostnames etc.)
  – Common: client and server
  – Advanced: RPC retries etc.
  – yarn.resourcemanager.* and yarn.nodemanager.* are usually server configs
    – Admins can mark them ‘final’ to make clear to users that they cannot be overridden
  – yarn.client.* are client configs
• Security, ResourceManager, NodeManager, TimelineServer, Scheduler: all in one file
• Topology scripts on RM, NM and all nodes
  – BUG: the MR AM has to read the same script. Work is in progress to send it from the RM to AMs
ResourceManager
• Hardware requirements
  – The ResourceManager needs CPU
  – Doesn’t require as much memory as the JobTracker
    – 4 to 8 GB should be fine
• JobHistoryServer
  – Needs memory, at least 8 GB
Enable RM HA
• Enable RM HA for availability
• Only supported using ZooKeeper
  – Leader election used
  – Fencing support
• Automatic failover enabled by default
  – Uses ZooKeeper again
  – Embedded ZKFC, no need to explicitly start a separate process
• You can start multiple ResourceManagers
• Specify rm-ids using yarn.resourcemanager.ha.rm-ids
  – e.g. yarn.resourcemanager.ha.rm-ids = rm1,rm2
• Associate hostnames with rm-ids using yarn.resourcemanager.hostname.rm1, yarn.resourcemanager.hostname.rm2
  – No need to change any other configs: scheduler and resource-tracker addresses are automatically taken care of
• Web UIs automatically redirect to the active ResourceManager
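As a minimal sketch of the HA settings listed above, the following Python generates the corresponding yarn-site.xml fragment. The slide names only yarn.resourcemanager.ha.rm-ids and yarn.resourcemanager.hostname.<rm-id>; the other three properties (ha.enabled, cluster-id, zk-address) are taken from the Hadoop ResourceManager HA documentation as I recall it, and the hostnames, ZooKeeper quorum and cluster id are placeholders. Check everything against your Hadoop version before use.

```python
import xml.etree.ElementTree as ET


def build_configuration(props):
    """Render a dict of properties as a Hadoop-style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def rm_ha_properties(rm_hosts, zk_quorum, cluster_id):
    """Build HA-related yarn-site.xml properties.

    rm_hosts maps an rm-id (e.g. "rm1") to that ResourceManager's hostname.
    """
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.cluster-id": cluster_id,
        "yarn.resourcemanager.ha.rm-ids": ",".join(rm_hosts),
        "yarn.resourcemanager.zk-address": zk_quorum,
    }
    for rm_id, host in rm_hosts.items():
        props["yarn.resourcemanager.hostname." + rm_id] = host
    return props


# Placeholder hostnames, quorum and cluster id (assumptions, not from the talk).
ha_props = rm_ha_properties(
    {"rm1": "master1.example.com", "rm2": "master2.example.com"},
    zk_quorum="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
    cluster_id="example-cluster",
)
print(ET.tostring(build_configuration(ha_props), encoding="unicode"))
```

Only the per-rm-id hostnames are set here because, as the slide notes, the scheduler and resource-tracker addresses are derived from them.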
YARN schedulers
• Two main schedulers
  – Capacity
  – Fair
• Capacity Scheduler allows you to set up queues to split resources; useful for multi-tenant clusters where you want to guarantee resources
• Fair Scheduler allows you to split resources ‘fairly’ across applications
• Both have admin files which can be used to dynamically change the setup
• If you have enabled HA, queue configuration files are on local disk
  – Make sure queue files are consistent across nodes
  – Feature to centralize configs in progress
Capacity Scheduler
[Figure: Guaranteed resources divided among queue-1, queue-2 and queue-3 (50%, 30% and 20%), with apps running in each queue.]
YARN Capacity scheduler
• Configuration in capacity-scheduler.xml
• Take some time to set up your queues!
• Queues have per-queue ACLs to restrict queue access
  – Access can be dynamically changed
• Elasticity can be limited on a per-queue basis: use yarn.scheduler.capacity.<queue-path>.maximum-capacity
• Use yarn.scheduler.capacity.<queue-path>.state to drain queues
  – ‘Decommissioning’ a queue
• Run yarn rmadmin -refreshQueues to make runtime changes
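A small sketch of how the queue settings above fit together, using the 50/30/20 split from the Capacity Scheduler figure. The helper name and the maximum-capacity value of 60 are made up for illustration. The yarn.scheduler.capacity.root.queues and .capacity property names, the RUNNING/STOPPED state values, and the rule that sibling capacities sum to 100 come from the Capacity Scheduler documentation as I understand it, not from the slides.

```python
def capacity_scheduler_properties(capacities, max_capacities=None, stopped=()):
    """Build capacity-scheduler.xml properties for queues directly under root.

    capacities: queue name -> guaranteed share in percent (must sum to 100).
    max_capacities: optional queue name -> elasticity ceiling in percent.
    stopped: queue names to drain (state STOPPED); all others are RUNNING.
    """
    if sum(capacities.values()) != 100:
        raise ValueError("queue capacities under one parent must sum to 100")
    max_capacities = max_capacities or {}
    prefix = "yarn.scheduler.capacity.root"
    props = {prefix + ".queues": ",".join(capacities)}
    for queue, share in capacities.items():
        path = prefix + "." + queue
        props[path + ".capacity"] = str(share)
        if queue in max_capacities:
            props[path + ".maximum-capacity"] = str(max_capacities[queue])
        props[path + ".state"] = "STOPPED" if queue in stopped else "RUNNING"
    return props


# queue-2 may grow to 60% when the cluster is idle; queue-3 is being drained.
props = capacity_scheduler_properties(
    {"queue-1": 50, "queue-2": 30, "queue-3": 20},
    max_capacities={"queue-2": 60},
    stopped=("queue-3",),
)
for name in sorted(props):
    print(name, "=", props[name])
```

After writing such values into capacity-scheduler.xml, the refresh command from the slide applies them without restarting the ResourceManager.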
YARN Fair Scheduler
• Apps get an equal share of resources, on average, over time
• No worry about starvation
• Support for queues, meant to be used so that you can prevent users from flooding the system with apps
• Has support for a fairness policy which can be modified at runtime
• Good if you have lots of small jobs
Size your containers
• Memory and cores: the minimum and maximum allocation affect the number of containers per node
• Set with yarn.scheduler.*-allocation-*
• Defaults are 1 GB (minimum) and 8 GB (maximum) of memory, and 1 core (minimum) and 32 cores (maximum)
• CPU scheduling needs a bit more stabilization
  – Historically: translate CPU requirements into memory calculations
• Similarly for disk scheduling
  – Translate disk limits to memory/CPU
[Figure: Number of containers per node plotted against memory for the NodeManager (in GB); x-axis 4 to 64 GB, y-axis 0 to 70 containers.]
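The relationship between NodeManager memory and containers per node can be sketched as simple arithmetic. This assumes requests are rounded up to a multiple of the minimum allocation (my understanding of how the Capacity Scheduler normalizes requests) and considers memory only. The function names are illustrative, and the defaults mirror the 1 GB and 8 GB values quoted on the slide.

```python
def normalized_request_mb(requested_mb, min_mb=1024, max_mb=8192):
    """Round a memory request up to a multiple of the minimum allocation."""
    if requested_mb > max_mb:
        raise ValueError("request exceeds the maximum allocation")
    steps = max(1, -(-requested_mb // min_mb))  # ceiling division, at least one step
    return steps * min_mb


def containers_per_node(nm_memory_mb, requested_mb, min_mb=1024, max_mb=8192):
    """Upper bound on same-sized containers that fit in the NodeManager's memory."""
    return nm_memory_mb // normalized_request_mb(requested_mb, min_mb, max_mb)


for nm_gb in (8, 32, 64):
    print(nm_gb, "GB NodeManager:",
          containers_per_node(nm_gb * 1024, 1024), "x 1 GB containers,",
          containers_per_node(nm_gb * 1024, 1536), "x 1.5 GB requests")
```

Under this assumption a 1.5 GB request occupies 2 GB, so an awkward container size can halve the number of containers a node runs.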
NodeManagers
• Set resource memory: the variable is yarn.nodemanager.resource.memory-mb
  – Sets how much memory YARN can use for containers
  – Default is 8 GB
• Set up a health-checker script!
  – Check disk
  – Check network
  – Check any external resources required for job completion
  – Test it on your OS
  – Weed out bad nodes automatically!
• Figure out if the physical and virtual memory monitors make sense; both are enabled by default
  – Default virtual-to-physical memory ratio is 2.1
• Multiple disks for containers on NodeManagers
  – HDFS accesses them too
  – If bottlenecked on disks, separate them. Haven’t seen it in the wild though
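A health-checker script can be quite small. The sketch below covers only the disk check and relies on the convention, as I understand it from the NodeManager documentation, that a node is reported unhealthy when the script prints a line beginning with ERROR. The 10% threshold, the function names and the path list are assumptions. The script would be registered through yarn.nodemanager.health-checker.script.path.

```python
import shutil


def check_disks(paths, min_free_fraction=0.10, usage=shutil.disk_usage):
    """Return one "ERROR ..." line per path that is unreadable or nearly full."""
    problems = []
    for path in paths:
        try:
            stats = usage(path)
        except OSError as exc:
            problems.append("ERROR cannot read %s: %s" % (path, exc))
            continue
        if stats.total == 0 or stats.free / stats.total < min_free_fraction:
            problems.append("ERROR low disk space on %s" % path)
    return problems


def run_health_check(paths):
    """Print the report; any ERROR line marks the node unhealthy."""
    report = check_disks(paths)
    for line in report:
        print(line)
    return report


# Hypothetical path list; point this at the NodeManager's local and log directories.
run_health_check(["/"])
```

Network and external-resource checks would follow the same pattern: append an ERROR line when a probe fails, and print nothing when the node is fine.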
YARN log aggregation
• Log aggregation can be enabled using yarn.log-aggregation-enable
• Can control how long you keep the logs by setting parameters for purging
• App logs can be obtained using the “yarn logs” command
• Creates lots of small files, can affect HDFS performance
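To make the purging parameter concrete, here is a minimal sketch that derives the settings from a retention period in days and builds the command line for fetching one application's logs. yarn.log-aggregation.retain-seconds and the -applicationId option are what I recall from the Hadoop documentation rather than from the slides. The helper names and the application id are hypothetical.

```python
def log_aggregation_properties(retain_days):
    """yarn-site.xml properties enabling aggregation with a retention period."""
    return {
        "yarn.log-aggregation-enable": "true",
        "yarn.log-aggregation.retain-seconds": str(retain_days * 24 * 60 * 60),
    }


def yarn_logs_command(application_id):
    """Argument list for fetching the aggregated logs of one application."""
    return ["yarn", "logs", "-applicationId", application_id]


settings = log_aggregation_properties(retain_days=7)
for name, value in settings.items():
    print(name, "=", value)
# The application id below is a made-up example.
print(" ".join(yarn_logs_command("application_1400000000000_0001")))
```

A shorter retention period is one way to limit the small-files pressure on HDFS mentioned above.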
YARN Metrics
• JMX: http://<rm address>:<port>/jmx, http://<nm address>:<port>/jmx
  – Cluster metrics: apps running, successful, failed, etc.
  – Scheduler metrics: queue usage
  – RPC metrics
• Web UI: http://<rm address>:<port>/cluster
  – Cluster metrics
  – Scheduler metrics: easier to digest, especially queue usage
  – Healthy, failed nodes
• Can be emitted to Ganglia directly using the metrics sink
  – Metrics configuration file
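The /jmx endpoint returns JSON, so cluster metrics are easy to pull into a monitoring script. The sketch below parses such a document. The bean name and the NumActiveNMs / NumUnhealthyNMs attributes are the ones I believe the ResourceManager publishes, so verify them against your own cluster's /jmx output. The sample payload contains made-up values, and the helper names are illustrative.

```python
import json
import urllib.request


def fetch_jmx(base_url):
    """GET <base_url>/jmx and decode the JSON document it returns."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/jmx") as response:
        return json.loads(response.read().decode("utf-8"))


def find_bean(jmx_payload, bean_name):
    """Return the first bean whose "name" matches, or None."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return None


def unhealthy_fraction(cluster_bean):
    """Share of active-or-unhealthy NodeManagers currently reported unhealthy."""
    active = cluster_bean.get("NumActiveNMs", 0)
    unhealthy = cluster_bean.get("NumUnhealthyNMs", 0)
    total = active + unhealthy
    return unhealthy / total if total else 0.0


# Illustrative payload with made-up values; a live one would come from
# fetch_jmx("http://<rm address>:<port>").
sample = json.loads("""
{"beans": [{"name": "Hadoop:service=ResourceManager,name=ClusterMetrics",
            "NumActiveNMs": 18, "NumUnhealthyNMs": 2}]}
""")
bean = find_bean(sample, "Hadoop:service=ResourceManager,name=ClusterMetrics")
print(unhealthy_fraction(bean))
```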
For Application Developers
Sub-Agenda
• Framework or a native Application?
• Understanding YARN Basics
• Writing a YARN Client
• Writing an ApplicationMaster
• Misc Lessons
Framework or a native app?
• Two choices
  – Write applications on top of existing frameworks
    – Battle tested
    – Already work
    – APIs
  – Roll your own native YARN application
• Existing frameworks
  – Scalable batch processing: MapReduce
  – Stream processing: Storm/Samza
  – Interactive processing, iterations: Tez/Spark
  – SQL: Hive
  – Data pipelines: Pig
  – Graph processing: Giraph
  – Existing app: Slider
• Apache: Your App Store
Ease of development
• Check out the other development and deployment tools
[Figure: Native, Slider, Twill/REEF and Frameworks compared by complexity.]
Understanding YARN Components
• ResourceManager
  – Master of a cluster
• NodeManager
  – Slave to take care of one host
• ApplicationMaster
  – Master of an application
• Container
  – Resource abstraction, process to complete a task
User code: Client and AM
• Client
  – Client to ResourceManager
• ApplicationMaster
  – ApplicationMaster to scheduler
    – Allocate resources
  – ApplicationMaster to NodeManager
    – Manage containers
Client: Rule of Thumb
• Use the client libraries
  – YarnClient
    – Submit an application
  – AMRMClient(Async)
    – Negotiate resources
  – NMClient(Async)
    – Manage containers
  – TimelineClient
    – Monitor an application
Writing Client
1. Get the application Id from RM
2. Construct ApplicationSubmissionContext
   1. Shell command to run the AM
   2. Environment (class path, env-variable)
   3. LocalResources (Job jars downloaded from HDFS)
3. Submit the request to RM
   1. submitApplication
Tips for Writing Client
• Cluster Dependencies
  – Try to make zero assumptions on the cluster
  – Cluster location
  – Cluster sizes
  – ApplicationMaster too
• Your application bundle should deploy everything required using YARN’s local resources.
Writing ApplicationMaster
1. AM registers with RM (registerApplicationMaster)
2. Heartbeats (allocate) with RM (asynchronously)
   1. Send the request
      1. Request new containers
      2. Release containers
   2. Receive containers and send requests to the NMs to start them
      1. Construct ContainerLaunchContext
         – commands
         – env
         – jars
3. Unregisters with RM (finishApplicationMaster)
Tips for writing ApplicationMaster
• RM assigns containers asynchronously
  – Containers are likely not returned immediately by the current call
  – The AM needs to keep sending empty requests until it gets the containers it requested
  – ResourceRequest is incremental
• Locality requests may not always be met
  – Relaxed Locality
• AMs can fail
  – They run on cluster nodes which can fail
  – RM restarts AMs automatically
  – Write AMs to handle failures on restarts (recovery)
  – Maybe continue your work when the AM restarts
• Optionally talk to your containers directly through the AM
  – To get progress, give work, kill it, etc.
  – YARN doesn’t do anything for you
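The asynchronous allocation pattern in the first tip can be illustrated with a toy model. This is not the YARN API: ToyResourceManager and acquire_containers are hypothetical stand-ins that only mimic the shape of the heartbeat exchange. In this model asks are additive, so the container request is sent once and later heartbeats carry an empty request.

```python
class ToyResourceManager:
    """Stand-in for the RM scheduler; a toy model, NOT the YARN API."""

    def __init__(self, grants_per_heartbeat=1):
        self.pending = 0
        self.grants_per_heartbeat = grants_per_heartbeat
        self.next_id = 0

    def allocate(self, new_requests):
        """One heartbeat: record new asks, then hand back whatever is ready."""
        self.pending += new_requests
        granted = min(self.pending, self.grants_per_heartbeat)
        self.pending -= granted
        ids = ["container_%d" % (self.next_id + i) for i in range(granted)]
        self.next_id += granted
        return ids


def acquire_containers(rm, wanted, max_heartbeats=100):
    """Ask once, then heartbeat with empty requests until everything arrives."""
    containers = rm.allocate(wanted)      # the first heartbeat carries the ask
    heartbeats = 1
    while len(containers) < wanted and heartbeats < max_heartbeats:
        containers += rm.allocate(0)      # empty request: nothing new to ask for
        heartbeats += 1
    return containers, heartbeats


containers, heartbeats = acquire_containers(ToyResourceManager(), wanted=3)
print(containers, "after", heartbeats, "heartbeats")
```

A real ApplicationMaster would use AMRMClient or AMRMClientAsync for this loop, and would also handle the case where the heartbeat budget runs out before all containers arrive.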
Using the Timeline Service
• Metadata/Metrics
• Put application specific information
  – TimelineClient
  – POJO objects
• Query the information
  – Get all entities of an entity type
  – Get one specific entity
  – Get all events of an entity type
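The three queries above map onto REST calls against the timeline server. The sketch below only builds the URLs. The /ws/v1/timeline paths and the entityId query parameter follow the timeline server REST documentation as I recall it, so treat them as assumptions to verify. The host, port, entity type and entity id are placeholders.

```python
from urllib.parse import quote, urlencode


def timeline_urls(base, entity_type, entity_id):
    """Build the three query URLs for one entity type and entity id."""
    root = "%s/ws/v1/timeline/%s" % (base.rstrip("/"), quote(entity_type, safe=""))
    return {
        "all_entities": root,
        "one_entity": root + "/" + quote(entity_id, safe=""),
        "events": root + "/events?" + urlencode({"entityId": entity_id}),
    }


# Placeholder address and names (assumptions, not from the talk).
urls = timeline_urls("http://timeline.example.com:8188", "MY_APP_TYPE", "job_1")
for label, url in urls.items():
    print(label, url)
```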
Summary: Application Workflow
• Execution Sequence
1. Client submits an application
2. RM allocates a container to start AM
3. AM registers with RM
4. AM requests containers from RM
5. AM notifies NM to launch containers
6. Application code is executed in container
7. Client contacts RM/AM to monitor application’s status
8. AM unregisters with RM
[Figure: The numbered steps above drawn as interactions among Client, RM, NM and AM.]
Misc Lessons: Taking What YARN offers
• Monitor your application
  – RM
  – NM
  – Timeline server
Misc Lessons: Debugging/Testing
• MiniYARNCluster
  – In-JVM YARN cluster!
  – Regression tests for your applications
• Unmanaged AM
  – Support to run the AM outside of a YARN cluster for development and testing
  – AM logs on your console!
• Logs
  – RM/NM logs
  – App log aggregation
  – Accessible via CLI, web UI
Thank you! Questions?