Scheduling Policies in YARN

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Wangda Tan, Varun Vasudev
San Jose, June 2016

TRANSCRIPT


Who we are
- Wangda Tan: Apache Hadoop PMC member
- Varun Vasudev: Apache Hadoop committer

Agenda
- Existing scheduling in YARN
- Adding resource types and resource profiles
- Resource scheduling for services
- GUTS (Grand Unified Theory of Scheduling) API
- Q & A

Existing scheduling in YARN

Current resource types
- Currently, scheduling is based only on memory and CPU
- Depending on the resource calculator configured, the scheduler may or may not take CPU into account
- Most applications are unaware of which resources are used for scheduling
- Applications may not get the containers they expect due to this mismatch
- No support for resources like GPU, disk, or network
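The "depending on the calculator" point can be illustrated with a small sketch. This is not YARN's actual Java code, just an outline of how a DominantResourceCalculator-style comparison ranks two requests by their dominant share of cluster capacity:

```python
# Illustrative sketch (not YARN source): rank two resource requests by
# their dominant share of the cluster, DRF-style.

def dominant_share(resource, cluster):
    """Largest fraction of any single cluster resource this request uses."""
    return max(resource[r] / cluster[r] for r in cluster)

def compare(lhs, rhs, cluster):
    """Negative if lhs uses a smaller dominant share than rhs."""
    l, r = dominant_share(lhs, cluster), dominant_share(rhs, cluster)
    return (l > r) - (l < r)

cluster = {"memory_mb": 100_000, "vcores": 100}
a = {"memory_mb": 4096, "vcores": 1}  # dominant resource: memory (~0.041)
b = {"memory_mb": 1024, "vcores": 8}  # dominant resource: cpu (0.08)

print(compare(a, b, cluster))  # -1: a's dominant share is smaller
```

A memory-only calculator would rank `a` above `b`; taking CPU into account reverses the order, which is exactly the mismatch the slide describes.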


Locality for containers
- Applications can request host or rack locality
- If the request can't be satisfied within a certain number of scheduling opportunities, the container is allocated on the next node to heartbeat
- Good for MapReduce-type applications
- Insufficient for services
- Services need support for affinity, anti-affinity, and gang scheduling
- Need support for fallback strategies
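The fallback behavior described above is delay scheduling. A minimal sketch, with assumed names rather than YARN's actual implementation, of how locality relaxes from node to rack to off-switch as missed scheduling opportunities accumulate:

```python
# Sketch of delay scheduling: relax locality node -> rack -> off-switch
# as missed scheduling opportunities accumulate. Names are illustrative.

NODE, RACK, OFF_SWITCH = "node", "rack", "off_switch"

def allowed_locality(missed_opportunities, node_delay, rack_delay):
    """Return the loosest locality level the request may accept now."""
    if missed_opportunities < node_delay:
        return NODE
    if missed_opportunities < node_delay + rack_delay:
        return RACK
    return OFF_SWITCH

print(allowed_locality(3, node_delay=5, rack_delay=5))   # node
print(allowed_locality(7, node_delay=5, rack_delay=5))   # rack
print(allowed_locality(12, node_delay=5, rack_delay=5))  # off_switch
```

This is the single, fixed fallback chain YARN supports today; the GUTS API discussed later is about letting applications specify their own fallback orders.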


Placement and capacity options
- Node partitions
  - End up partitioning the cluster into sub-clusters
  - Support for non-exclusive partitions is available
- Reservations
  - Let you plan for capacity in advance
  - Help you guarantee capacity for high-priority large jobs


Resource types and resource profiles


Extending resource types in YARN
- Add support for generalized resource types
- Users can use configuration to add and remove resource types from the scheduler
- Allows users to experiment with resource types
  - For resources like network, modeling is hard: should you use ops or bandwidth?
  - No need to touch the code
- Current work covers countable resource types
- Support for exclusive resource types (like ports) is future work
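To make "add a resource type via configuration" concrete, here is a sketch of the shape this configuration eventually took in Hadoop 3.x (a `resource-types.xml` file). Since this talk predates the release, treat the property names as illustrative rather than guaranteed:

```xml
<!-- resource-types.xml: declare an extra countable resource type.
     Sketch of the Hadoop 3.x configuration shape; names illustrative. -->
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>
```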


Resource profiles
- Analogous to instance types in EC2
  - It's hard for users to conceptualize resources like disk bandwidth
- A profile is a collection of resource types
- Allows admins to define a set of profiles that users can use to request containers
  - Users don't need to worry about resource types like disk bandwidth
  - New resource types can be added and removed without users needing to change their job submissions
- Profiles are stored on the RM; users just pass the name of the profile they want (small, medium, large)
- YARN-3926 is the umbrella JIRA for the feature


Resource profiles examples

resource-profiles.json (defaults only):

```json
{
  "minimum": { "yarn.io/memory": 1024, "yarn.io/cpu": 1 },
  "maximum": { "yarn.io/memory": 8192, "yarn.io/cpu": 8 },
  "default": { "yarn.io/memory": 2048, "yarn.io/cpu": 2 }
}
```

resource-profiles.json (with named profiles):

```json
{
  "minimum": { "yarn.io/memory": 1024, "yarn.io/cpu": 1 },
  "maximum": { "yarn.io/memory": 8192, "yarn.io/cpu": 8 },
  "default": { "yarn.io/memory": 2048, "yarn.io/cpu": 2 },
  "small":   { "yarn.io/memory": 1024, "yarn.io/cpu": 1 },
  "medium":  { "yarn.io/memory": 3072, "yarn.io/cpu": 3 },
  "large":   { "yarn.io/memory": 8192, "yarn.io/cpu": 8 }
}
```
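A hypothetical sketch of the RM-side lookup: resolve a profile name to concrete resources, clamped into the configured minimum/maximum. The dictionary mirrors the resource-profiles.json above; this is not YARN's actual implementation.

```python
# Hypothetical RM-side profile lookup; mirrors resource-profiles.json.

PROFILES = {
    "minimum": {"yarn.io/memory": 1024, "yarn.io/cpu": 1},
    "maximum": {"yarn.io/memory": 8192, "yarn.io/cpu": 8},
    "default": {"yarn.io/memory": 2048, "yarn.io/cpu": 2},
    "small":   {"yarn.io/memory": 1024, "yarn.io/cpu": 1},
    "medium":  {"yarn.io/memory": 3072, "yarn.io/cpu": 3},
    "large":   {"yarn.io/memory": 8192, "yarn.io/cpu": 8},
}

def resolve_profile(name="default"):
    """Look up a profile and clamp each resource into [minimum, maximum]."""
    profile = PROFILES[name]
    lo, hi = PROFILES["minimum"], PROFILES["maximum"]
    return {r: min(max(v, lo[r]), hi[r]) for r, v in profile.items()}

print(resolve_profile("medium"))
# {'yarn.io/memory': 3072, 'yarn.io/cpu': 3}
```

The user only ever sends the profile name; the RM owns the mapping, which is what lets admins change resource types without touching job submissions.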

Resource Scheduling for Services


Affinity and Anti-affinity: Overview

Anti-affinity
- Some services don't want their daemons to run on the same host/rack, for better fault recovery or performance
- For example, don't run more than one HBase region server in the same fault zone

Affinity
- Some services want to run their daemons close to each other for performance
- For example, run Storm workers as close as possible for better data-exchange performance (SW = Storm Worker)

Requirements
- Be able to specify affinity/anti-affinity within and across applications
  - Intra-application
  - Inter-application (example: inter-application anti-affinity)
- Hard and soft affinity/anti-affinity
  - Hard: reject resources that don't match the constraint
  - Soft: best effort (example: inter-application soft anti-affinity)
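The hard/soft distinction can be sketched in a few lines. This is an illustrative model with assumed names, not the YARN implementation: filter out candidate nodes that already run a container with the same tag, then either reject (hard) or fall back (soft).

```python
# Illustrative hard vs. soft anti-affinity check (not YARN code).

def place(candidates, running_tags, tag, hard=True):
    """running_tags maps node -> set of tags already placed on it."""
    preferred = [n for n in candidates if tag not in running_tags.get(n, set())]
    if preferred:
        return preferred[0]
    # Hard anti-affinity rejects the allocation; soft takes any node.
    return None if hard else candidates[0]

running = {"n1": {"hbase_rs"}, "n2": set()}
print(place(["n1", "n2"], running, "hbase_rs"))        # n2
print(place(["n1"], running, "hbase_rs", hard=True))   # None
print(place(["n1"], running, "hbase_rs", hard=False))  # n1
```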


Affinity and anti-affinity: YARN-1042 is the umbrella JIRA.

Affinity/Anti-affinity Demo

Container Resizing

Use cases
- Services can modify the size of their running containers according to workload changes
- For example, when HBase region servers are running and the workload changes, excess resources can be returned to the RM to improve utilization

Before this feature
- An application had to release the container and re-request one with a different size from YARN
- Context held in task memory would be lost

Status
- The feature will be included in Hadoop 2.8
- YARN-1197 is the umbrella JIRA
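A conceptual sketch of the in-place resize semantics, with a hypothetical helper rather than the YARN API: a decrease can take effect immediately, while an increase only applies up to whatever extra resources the RM has granted.

```python
# Conceptual sketch (hypothetical helper, not the YARN API) of
# in-place container resizing semantics.

def resize(current_mb, target_mb, rm_grant_mb):
    """rm_grant_mb: extra resources the RM has approved for an increase."""
    if target_mb <= current_mb:
        return target_mb               # decrease: apply right away
    granted = current_mb + rm_grant_mb
    return min(target_mb, granted)     # increase: bounded by the grant

print(resize(2048, 1024, rm_grant_mb=0))     # 1024
print(resize(2048, 4096, rm_grant_mb=2048))  # 4096
print(resize(2048, 4096, rm_grant_mb=1024))  # 3072
```

The key contrast with the pre-feature world: the container (and its in-memory state) survives across the size change.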

GUTS (Grand Unified Theory of Scheduling) API


Requirements
We have more and more new scheduling requirements:
- Scheduling fallbacks
  - Try plan A first; fall back to plan B if plan A cannot be satisfied within X seconds
  - Currently YARN supports only one kind of scheduling fallback: node/rack/off-switch via delay scheduling, and users cannot specify the order of fallbacks
- Affinity / anti-affinity

Requirements
- Node partitions: already supported (YARN-796); they divide a big cluster into several smaller ones according to hardware and purpose, and capacities and ACLs can be specified per partition
- Node constraints: a way to tag nodes without complexities like ACLs and capacity configuration (YARN-3409)

Requirements
- Gang scheduling: "give me N containers at once, or nothing"
- Resource reservation: "give me resources at time T"; supported since YARN-1051 (Hadoop 2.6), but we need to consider unifying the APIs
- Combinations of the above
  - Gang scheduling + anti-affinity: give me 10 containers at once, but avoid nodes that have containers from application X
  - Scheduling fallbacks + node partition: give me 10 containers from partition X; if I cannot get them within 5 minutes, any hosts are fine
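Gang scheduling's all-or-nothing semantics can be sketched as follows (not YARN code, just the core invariant): commit the allocation only if all N containers can be placed at once.

```python
# Sketch of gang scheduling's all-or-nothing rule (not YARN code).

def gang_allocate(n, free_slots_per_node):
    """Return a node -> count plan covering n containers, or None."""
    if sum(free_slots_per_node.values()) < n:
        return None  # nothing, rather than a partial allocation
    plan, remaining = {}, n
    for node, free in free_slots_per_node.items():
        take = min(free, remaining)
        if take:
            plan[node] = take
            remaining -= take
        if remaining == 0:
            break
    return plan

print(gang_allocate(10, {"n1": 4, "n2": 3, "n3": 2}))  # None (only 9 free)
print(gang_allocate(5,  {"n1": 4, "n2": 3}))           # {'n1': 4, 'n2': 1}
```

Combining this with anti-affinity or partitions just means filtering `free_slots_per_node` before the all-or-nothing check, which is why a unified API for these constraints is attractive.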

Problems with the existing ResourceRequest API
- The existing ResourceRequest API is not extensible
- Cannot specify relationships between ResourceRequests
- Fragmentation of resource-request APIs: we have ResourceRequest (what I want now), BlacklistRequest (what I dislike), and ReservationRequest (what I want in the future) for different purposes

The existing ResourceRequest is grouped by priority within an app, and the scheduler chooses resource requests according to locality preferences (node/rack/off-switch).

Proposal
We need a unified API to specify resource requirements. The following requirements will be considered:
- Allocation tag: tag the purpose of the allocated container (like hbase_regionserver)
- Quantities of a request
  - Total number of containers
  - Minimum concurrency (give me at least N containers at once)
  - Maximum concurrency (don't give me more than N containers at once)
- Relationships between placement requests
  - And/Or/Not: give me resources according to specified conditions
  - Order and delay of fallbacks: try to allocate request #1 first; fall back to request #2 after waiting X seconds
  - Time: give me resources between [T1, T2]

In simple words: applications can use a unified API to request resources with different constraints and conditions. It is easier to understand, and combinations of resource requests can be supported. Let's see some examples:

Examples

Gang scheduling: "I want 8 containers allocated to me at once."

```
12345: {  // allocation_id
  // Other fields ...
  // Quantity conditions
  allocation_size: 2G,
  maximum_allocations: 8,
  minimum_concurrency: 8,
}
```

Reservation + anti-affinity: "Give me 5 containers tomorrow, and not on the same hosts as application_..._0005."

```
12345: {  // allocation_id
  allocation_size: 1G,
  maximum_allocations: 5,
  placement_strategy: {
    NOT {  // do not place me with this application
      target_app_id: application_123456789_0015
    }
  },
  time_conditions: {
    allocation_start_time: [10:50 pm tomorrow - *]
  }
}
```

Example

Request with fallbacks: "Try to allocate on the GPU partition first, then fall back to any hosts after 5 minutes."

```
567890: {  // allocation_id
  allocation_size: 2G,
  maximum_allocations: 10,
  placement_strategy: {
    ORDERED_OR [
      { node_partition: GPU, delay_to_next: 5 min },
      { host: * }
    ]
  }
}
```
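The ORDERED_OR fallback semantics can be sketched as a runnable model (hypothetical names, not the proposed implementation): try each placement strategy in order, moving to the next once its delay has elapsed without a successful allocation.

```python
# Runnable model of ORDERED_OR fallback selection (names hypothetical).

def choose_strategy(strategies, waited_secs):
    """strategies: list of (label, delay_to_next_secs or None for last)."""
    elapsed = 0
    for label, delay in strategies:
        if delay is None or waited_secs < elapsed + delay:
            return label
        elapsed += delay
    return strategies[-1][0]

strategies = [("partition=GPU", 300), ("host=*", None)]
print(choose_strategy(strategies, 60))   # partition=GPU
print(choose_strategy(strategies, 360))  # host=*
```

Unlike today's fixed node/rack/off-switch chain, the order and delays here come from the request itself.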

Status & Plan
- Working on the API definition to make sure it covers all target scenarios; a POC will start soon
- This will be a replacement for the existing ResourceRequest API; the old API will be kept and automatically converted to new requests, so old applications will not be affected
- For more details, see the design doc and discussions on YARN-4902

Q & A

Thank you!
