Performance-responsive Scheduling for Grid Computing

Post on 01-Jan-2016

TRANSCRIPT

  • Performance-responsive Scheduling for Grid Computing
    Dr Stephen Jarvis
    High Performance Systems Group
    University of Warwick, UK

  • Context
    Funded by / collaborating with:
      UK e-Science Core Programme
      IBM (Watson, Hursley)
      NASA (Ames)
      NEC Europe
      Los Alamos National Laboratory

    Integrate established performance tools into emerging grid middleware

  • What do we mean by scheduling?
    User's view
      Jobs run somewhere on the Grid
      Notion of deadline
      Execution is single domain (includes pre-staging)
    Resource provider's view
      Don't mind which jobs are run where
      As long as resources are well/evenly used
      Maintaining customers' deadlines is important
    System view
      Jobs can run anywhere
      Resources are heterogeneous
      Throughput is important, as are scheduling overheads

  • Managing through Middleware
    Determine what resources are required (predict)
    Determine what resources are available (discover)
    Map requirements to available resources (schedule)
    Maintain contract of performance (QoS)
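
The four middleware steps on this slide can be sketched as a small pipeline. This is a minimal illustration only: the names (`Task`, `Resource`, `predict_runtime`, `discover`, `schedule`, `meets_contract`) and the toy cost model (run time = work / speed) are assumptions made for the example and do not come from the slides or from any real grid middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    work_units: float   # abstract amount of computation (assumed unit)
    deadline_s: float   # seconds within which the task should finish


@dataclass(frozen=True)
class Resource:
    name: str
    speed: float        # work units per second (assumed unit)
    available: bool


def predict_runtime(task, resource):
    """Predict: toy model, run time = work / speed."""
    return task.work_units / resource.speed


def discover(resources):
    """Discover: keep only the resources that are currently available."""
    return [r for r in resources if r.available]


def schedule(task, resources):
    """Schedule: pick the available resource with the shortest predicted run time."""
    candidates = discover(resources)
    if not candidates:
        return None
    best = min(candidates, key=lambda r: predict_runtime(task, r))
    return best, predict_runtime(task, best)


def meets_contract(task, predicted_s):
    """QoS: the performance contract in this sketch is simply the deadline."""
    return predicted_s <= task.deadline_s


# Hypothetical resources: the fastest one is busy, so it is never considered.
resources = [
    Resource("fast", 10.0, False),
    Resource("mid", 5.0, True),
    Resource("slow", 1.0, True),
]
task = Task("sim", work_units=100.0, deadline_s=30.0)

chosen, predicted = schedule(task, resources)
print(chosen.name, predicted, meets_contract(task, predicted))
```

In this toy setting the task is mapped to the fastest resource that is actually available, and the deadline check stands in for the performance contract.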

  • Performance Services
    Intra-domain
      Lab- / department-based
      Shared resources under local administration
    Multi-domain
      Campus- / country-based
      Wide-area resource and task management
      Cross domain

  • Performance Prediction
    Performance prediction tools
    Aim to predict
      Execution time
      Communication usage
      Data and resource requirements
    Provides best guess as to how an application will execute on a given resource

  • PACE
    [Diagram labels: User; Application; Application Model; Resource; Resource Model; Evaluation Engine; Model parameters; Resource config.]
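
To make the separation between application model, resource model and evaluation engine concrete, here is a toy evaluation step that combines the two models into a predicted execution time. It is only a sketch: PACE's real modelling language and evaluation engine are not described in these slides, and every name and formula below (`ApplicationModel`, `ResourceModel`, `evaluate`, the flop and byte counts) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationModel:
    """Hypothetical application model for an n x n grid computation."""
    flops_per_cell: float
    bytes_per_boundary_cell: float

    def compute_flops(self, n, procs):
        # Work is assumed to divide evenly across processors.
        return self.flops_per_cell * n * n / procs

    def comm_bytes(self, n, procs):
        # Each processor exchanges two boundary rows; a serial run exchanges none.
        return 0.0 if procs == 1 else 2 * n * self.bytes_per_boundary_cell


@dataclass(frozen=True)
class ResourceModel:
    """Hypothetical resource model: the 'resource config.' fed to the engine."""
    flops_per_s: float
    bandwidth_bytes_per_s: float
    latency_s: float


def evaluate(app, res, n, procs):
    """Toy evaluation engine: predicted time = compute time + communication time."""
    compute_s = app.compute_flops(n, procs) / res.flops_per_s
    if procs == 1:
        comm_s = 0.0
    else:
        comm_s = 2 * res.latency_s + app.comm_bytes(n, procs) / res.bandwidth_bytes_per_s
    return compute_s + comm_s


# Invented model parameters and resource configuration.
app = ApplicationModel(flops_per_cell=10.0, bytes_per_boundary_cell=8.0)
res = ResourceModel(flops_per_s=1e9, bandwidth_bytes_per_s=1e8, latency_s=1e-4)

serial_s = evaluate(app, res, n=10_000, procs=1)
parallel_s = evaluate(app, res, n=10_000, procs=4)
print(f"{serial_s:.4f} s serial, {parallel_s:.4f} s on 4 processors")
```

Keeping the two models separate means the same application model can be evaluated against a different `ResourceModel` without being rewritten, which is the property the diagram suggests.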

  • Why is prediction useful?
    Scaling properties
    Compare run-time options with
      deadline
      available resources
      priority / other jobs
      etc.
    Allows run-time scenarios to be explored before deployment
    [Figure axis label: Run-time]
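
One way to read "compare run-time options with deadline and available resources" is as a small search over candidate configurations. The sketch below assumes a hypothetical table of predicted run times per processor count (the numbers are invented, not taken from the talk) and picks the smallest allocation that still meets the deadline.

```python
def smallest_allocation(predicted_s, deadline_s, free_procs):
    """Return the fewest processors whose predicted run time meets the deadline, or None."""
    feasible = [
        procs
        for procs, seconds in predicted_s.items()
        if procs <= free_procs and seconds <= deadline_s
    ]
    return min(feasible) if feasible else None


# Hypothetical predictions (processors -> seconds); scaling flattens out at 16.
predictions = {1: 400.0, 2: 210.0, 4: 120.0, 8: 80.0, 16: 75.0}

choice = smallest_allocation(predictions, deadline_s=150.0, free_procs=8)
no_choice = smallest_allocation(predictions, deadline_s=70.0, free_procs=16)
print(choice, no_choice)
```

Because the predictions are available before deployment, an infeasible deadline (the second call) can be detected without running the job at all.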

  • 1. Intra-Domain Co-Scheduling
    Augment Condor scheduler with additional performance information
    Scheduler driver, or co-scheduler (called Titan)
    Use predictive data for system improvement
      Time to complete tasks / utilisation of resources
      QoS: ability to meet deadlines
    Handle predictive and non-predictive tasks
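
The benefit of predictive data can be illustrated with a simple list scheduler. This is not Titan's algorithm (the architecture diagram later in the transcript labels a GA component, whose details are not given); it swaps in a plain longest-predicted-first heuristic to show how predicted run times can shorten the time to complete a batch. The task names, run times and host count are hypothetical.

```python
import heapq


def makespan(order, runtimes, hosts):
    """Give each task, in order, to whichever host frees up first; return the finish time."""
    free_at = [0.0] * hosts
    heapq.heapify(free_at)
    for task in order:
        start = heapq.heappop(free_at)            # earliest-free host
        heapq.heappush(free_at, start + runtimes[task])
    return max(free_at)


# Hypothetical run times in minutes, and the order in which the tasks arrived.
runtimes = {"a": 2.0, "b": 3.0, "c": 2.0, "d": 3.0, "e": 6.0}
arrival = ["a", "b", "c", "d", "e"]

# Without predictions: tasks run in arrival order.
fifo = makespan(arrival, runtimes, hosts=2)

# With predictions (assumed exact here): longest predicted task first.
by_prediction = sorted(arrival, key=lambda t: -runtimes[t])
informed = makespan(by_prediction, runtimes, hosts=2)

print(fifo, informed)
```

In arrival order the long task `e` starts last and stretches the schedule; knowing its predicted length in advance lets the scheduler start it first and balance the two hosts.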

  • Intra-Domain Co-Scheduling
    Non-predictive tasks
    Tasks with prediction data
    [Diagram labels: Portal; Pre-execution engine; Matchmaker; Schedule queue; PACE; GA; Cluster connector; Condor; Requests from users or other domain schedulers; Resources; ClassAds; Titan]

  • Intra-Domain Deployment
    Without co-scheduler: time to complete = 70.08m
    With co-scheduler: time to complete = 35.19m

  • 2. Multi-Domain Management
    Publish intra-domain performance data through global information services (MDS)
    Augment service with agent system
      One agent per domain / VO
    When a task is submitted
      Agents query the information service (IS), and negotiate to discover best domain to run task
    Scheme is tested on a 256-node experimental Grid
      16 resource domains; 6 architecture types
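
The negotiation step can be pictured as each domain's agent advertising an estimated completion time and the task going to the best offer. The sketch below is a deliberately simplified stand-in: the information service is a plain dictionary rather than MDS, and the names (`DomainAgent`, `offer`, `accept`, `best_domain`) and the cost model (queued backlog plus work divided by speed) are assumptions, not the scheme evaluated in the talk.

```python
from dataclasses import dataclass


@dataclass
class DomainAgent:
    domain: str
    speed: float       # work units per second (assumed unit)
    backlog_s: float   # seconds of work already queued in this domain

    def offer(self, work_units):
        """Estimated completion time if this domain took the task."""
        return self.backlog_s + work_units / self.speed

    def accept(self, work_units):
        """Take the task: the domain's backlog grows to the offered completion time."""
        self.backlog_s = self.offer(work_units)


def best_domain(info_service, work_units):
    """Compare the offers published by every agent and return the best one."""
    return min(info_service.values(), key=lambda agent: agent.offer(work_units))


# Hypothetical information service holding one agent per domain.
agents = [DomainAgent("D1", speed=4.0, backlog_s=0.0),
          DomainAgent("D2", speed=2.5, backlog_s=0.0)]
info_service = {agent.domain: agent for agent in agents}

placements = []
for work in [40.0, 40.0, 40.0]:
    winner = best_domain(info_service, work)
    winner.accept(work)
    placements.append(winner.domain)

print(placements)
```

The faster domain wins the first task, but once its backlog grows the slower domain makes the better offer, so load spreads across domains instead of piling onto one.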

  • Multi-Domain Management
    [Figure axis label: time]

  • Multi-Domain Management
    Time to complete = 2752s

  • Multi-Domain Management
    Time to complete = 467s; an improvement of 83%
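
The quoted improvement follows directly from the two completion times. As a quick check, using only figures that appear on the slides (and assuming the two intra-domain times are both in minutes):

```python
def improvement_pct(before, after):
    """Percentage reduction in time to complete."""
    return 100.0 * (before - after) / before


multi_domain = improvement_pct(2752, 467)       # seconds, from the slides
intra_domain = improvement_pct(70.08, 35.19)    # assumed to be minutes

print(f"multi-domain: {multi_domain:.1f}%  intra-domain: {intra_domain:.1f}%")
```

This gives roughly 83% for the multi-domain case, matching the slide, and just under 50% for the intra-domain deployment with the co-scheduler.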

  • QoS: Ability to Meet Deadline
    [Figure legend: active, inactive]

  • Resource usage
    [Figure legend: active, inactive]

  • Other work
    OGSA compatibility
    Prediction
      Accuracy
      Other prediction techniques
    Workflow (CCGrid 2003)
    Reservation

    V. 1.1, Condor/GT2-based
    www.dcs.warwick.ac.uk/~hpsg
    Documented at HPDC-12/GGF-8, FGCS