Performance-responsive Scheduling for Grid Computing
Dr Stephen Jarvis
High Performance Systems Group
University of Warwick, UK

Upload: allen-harrington

Post on 18-Jan-2018




Context
• Funded by / collaborating with
  – UK e-Science Core Programme
  – IBM (Watson, Hursley)
  – NASA (Ames)
  – NEC Europe
  – Los Alamos National Laboratory
• Integrate established performance tools into emerging grid middleware

What do we mean by ‘scheduling’?
• User’s view
  – Jobs run somewhere on the Grid
  – Notion of deadline
  – Execution is single domain (includes pre-staging)
• Resource provider’s view
  – Don’t mind which jobs are run where
  – As long as resources are well/evenly used
  – Maintaining customers’ deadlines is important
• System view
  – Jobs can run anywhere
  – Resources are heterogeneous
  – Throughput is important, as are scheduling overheads


Managing through Middleware


• Determine what resources are required (predict)
• Determine what resources are available (discover)
• Map requirements to available resources (schedule)
• Maintain contract of performance (QoS)
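The four middleware steps can be read as one control loop. The Python sketch below is a minimal illustration under stated assumptions: the `Resource` and `Job` types, the speed factor and the `predict` function are hypothetical stand-ins, not the interfaces of PACE, Condor or Globus.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    name: str
    speed: float      # hypothetical speed relative to a reference machine
    available: bool   # whether discovery currently reports it as free


@dataclass
class Job:
    name: str
    work: float       # hypothetical work units; runtime = work / speed
    deadline: float   # seconds allowed from submission


def predict(job: Job, res: Resource) -> float:
    """predict: toy stand-in for a performance-prediction tool."""
    return job.work / res.speed


def discover(resources: List[Resource]) -> List[Resource]:
    """discover: keep only resources reported as available."""
    return [r for r in resources if r.available]


def schedule(job: Job, candidates: List[Resource]) -> Optional[Resource]:
    """schedule: map the job to the candidate with the shortest predicted runtime."""
    return min(candidates, key=lambda r: predict(job, r), default=None)


def manage(job: Job, resources: List[Resource]) -> Dict[str, object]:
    """Run predict/discover/schedule, then check the QoS (deadline) contract."""
    chosen = schedule(job, discover(resources))
    if chosen is None:
        return {"job": job.name, "resource": None, "qos_ok": False}
    return {"job": job.name, "resource": chosen.name,
            "qos_ok": predict(job, chosen) <= job.deadline}


if __name__ == "__main__":
    pool = [Resource("res-a", 1.0, True), Resource("res-b", 2.5, True),
            Resource("res-c", 4.0, False)]
    # prints {'job': 'job-1', 'resource': 'res-b', 'qos_ok': True}
    print(manage(Job("job-1", work=100.0, deadline=60.0), pool))
```

The fastest resource (`res-c`) is skipped because discovery reports it as unavailable, which is why prediction alone is not enough to schedule.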


Performance Services
• Intra-domain
  – Lab- / department-based
  – Shared resources under local administration
• Multi-domain
  – Campus- / country-based
  – Wide-area resource and task management
  – Cross domain


Performance Prediction
• Performance prediction tools
• Aim to predict
  – Execution time
  – Communication usage
  – Data and resource requirements
• Provide a best guess as to how an application will execute on a given resource


[Figure: PACE. Elements shown: User, Application, Application Model, Resource, Resource Model, Evaluation Engine, Model parameters, Resource config.]
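The PACE figure pairs an application model with a resource model and feeds both, with model parameters and a resource configuration, to an evaluation engine. PACE's own models are far richer; the sketch below only illustrates the idea with a hypothetical application model (total operations, messages per process, message size), a hypothetical resource model (compute rate, latency, bandwidth) and a toy cost formula. All names and the formula are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ApplicationModel:
    """Hypothetical application model: work and communication per run."""
    total_flops: float    # floating-point operations for the whole problem
    msgs_per_proc: int    # messages each process sends when run in parallel
    bytes_per_msg: float  # payload size of each message


@dataclass
class ResourceModel:
    """Hypothetical resource model of one machine."""
    flops_per_sec: float  # sustained compute rate of one processor
    latency_s: float      # start-up cost per message
    bandwidth_bps: float  # point-to-point bandwidth in bytes per second


def evaluate(app: ApplicationModel, res: ResourceModel, nprocs: int) -> float:
    """Toy evaluation engine: predicted run time in seconds on nprocs processors.

    Computation is assumed to divide evenly across processors; communication
    cost per process is assumed not to shrink as processors are added.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    compute = app.total_flops / (nprocs * res.flops_per_sec)
    if nprocs == 1:
        return compute  # a single process is assumed to send no messages
    per_msg = res.latency_s + app.bytes_per_msg / res.bandwidth_bps
    return compute + app.msgs_per_proc * per_msg


if __name__ == "__main__":
    app = ApplicationModel(total_flops=1e10, msgs_per_proc=1000, bytes_per_msg=1e4)
    res = ResourceModel(flops_per_sec=1e8, latency_s=1e-4, bandwidth_bps=1e8)
    for n in (1, 4, 16):
        print(n, round(evaluate(app, res, n), 2))
```

Keeping the application and resource models separate means the same application model can be re-evaluated against any resource description, which is what lets a scheduler compare candidate machines before running anything.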


Why is prediction useful?
• Scaling properties
• Compare runtime options with
  – deadline
  – available resources
  – priority / other jobs
  – etc.

[Figure: running time on SGI Origin2000 (sec) against the number of processors (1 to 16) for sweep3d, fft, improc, closure, jacobi, memsort and cpi; y-axis 0 to 50 s]

Allows runtime scenarios to be explored before deployment
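One way to use predictions to compare runtime options against a deadline and the available resources: given a predicted run time for each candidate processor count, pick the fewest processors that still meet the deadline. The prediction table below is hypothetical (not read from the chart), and the fewest-processors policy is an assumption for illustration.

```python
from typing import Dict, Optional


def choose_processors(predicted: Dict[int, float], deadline: float,
                      available: int) -> Optional[int]:
    """Fewest processors (no more than `available`) whose predicted run time
    meets the deadline; None if no option is feasible."""
    feasible = [n for n, t in predicted.items()
                if n <= available and t <= deadline]
    return min(feasible) if feasible else None


if __name__ == "__main__":
    # Hypothetical predicted run times (s) per processor count.
    predicted = {1: 48.0, 4: 14.0, 8: 8.5, 16: 6.0}
    print(choose_processors(predicted, deadline=10.0, available=16))  # 8
    print(choose_processors(predicted, deadline=10.0, available=4))   # None
```

Choosing the fewest processors leaves the remainder free for other jobs, reflecting the "priority / other jobs" consideration; a scheduler could equally minimise run time instead.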


1. Intra-Domain Co-Scheduling


• Augment Condor scheduler with additional performance information

• Scheduler driver, or co-scheduler (called Titan)

• Use predictive data for system improvement
  – Time to complete tasks / utilisation of resources
  – QoS: ability to meet deadlines

• Handle predictive and non-predictive tasks
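The Titan architecture diagram includes a GA component alongside PACE. As a hedged illustration only, the sketch below uses a small genetic algorithm to assign tasks with predicted run times to hosts so as to minimise makespan; the encoding, operators and parameters are assumptions, not Titan's actual design. Non-predictive tasks, which have no runtime estimate, would be passed to the underlying scheduler unchanged and are not modelled here.

```python
import random
from typing import List, Sequence


def makespan(assign: Sequence[int], runtimes: Sequence[Sequence[float]],
             nhosts: int) -> float:
    """Finish time of the busiest host; runtimes[t][h] is task t's predicted time on host h."""
    load = [0.0] * nhosts
    for task, host in enumerate(assign):
        load[host] += runtimes[task][host]
    return max(load)


def ga_schedule(runtimes: Sequence[Sequence[float]], nhosts: int,
                pop_size: int = 30, generations: int = 200,
                mutation_rate: float = 0.1, seed: int = 0) -> List[int]:
    """Search for a task-to-host assignment with low makespan using a simple GA."""
    if nhosts < 1 or pop_size < 4:
        raise ValueError("need at least one host and a population of at least 4")
    rng = random.Random(seed)
    ntasks = len(runtimes)

    def cost(a: Sequence[int]) -> float:
        return makespan(a, runtimes, nhosts)

    # Initial population: one round-robin assignment plus random assignments.
    population = [[t % nhosts for t in range(ntasks)]]
    population += [[rng.randrange(nhosts) for _ in range(ntasks)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, ntasks) if ntasks > 1 else 0
            child = p1[:cut] + p2[cut:]      # one-point crossover (a new list)
            for i in range(ntasks):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(nhosts)  # mutation: move task i to a random host
            children.append(child)
        population = elite + children

    return min(population, key=cost)


if __name__ == "__main__":
    # Hypothetical predicted run times (s) of five tasks on two hosts.
    runtimes = [[4.0, 2.0], [3.0, 6.0], [2.0, 1.0], [5.0, 5.0], [1.0, 3.0]]
    best = ga_schedule(runtimes, nhosts=2)
    print(best, makespan(best, runtimes, 2))
```

Seeding the population with a simple heuristic and keeping the better half every generation guarantees the result is never worse than that heuristic, so the GA can only improve on round-robin placement.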


Intra-Domain Co-Scheduling
• Non-predictive tasks
• Tasks with prediction data

[Figure: Titan co-scheduler. Components shown: Portal, Pre-execution Engine, Matchmaker, Schedule Queue, PACE, GA, Cluster Connector, Condor; inputs: requests from users or other domain schedulers; also shown: Resources, ClassAds]


Intra-Domain Deployment
• Without co-scheduler: time to complete = 70.08 min
• With co-scheduler: time to complete = 35.19 min
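The two completion times give the following relative reduction and speed-up (both times are in the same unit, so the ratios are unit-free):

$$\frac{70.08 - 35.19}{70.08} = \frac{34.89}{70.08} \approx 0.498, \qquad \frac{70.08}{35.19} \approx 1.99$$

That is, the co-scheduler roughly halves the time to complete.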

2. Multi-Domain Management
• Publish intra-domain performance data through global information services (MDS)
• Augment the service with an agent system
  – One agent per domain / VO
• When a task is submitted
  – Agents query the information service (IS) and negotiate to discover the best domain to run the task
• Scheme is tested on a 256-node experimental Grid
  – 16 resource domains; 6 architecture types
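A minimal sketch of the agent step, assuming each domain's agent can turn the performance data published in the information service into a predicted completion time for a task. The `Agent` class, its `estimate` method and the keep-local-else-earliest policy are hypothetical; the slides do not specify the negotiation protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Agent:
    """Hypothetical per-domain agent backed by published performance data."""
    domain: str
    queue_wait: float  # predicted wait before a new task can start (s)
    predicted_run: Dict[str, float] = field(default_factory=dict)  # task -> run time (s)

    def estimate(self, task: str) -> Optional[float]:
        """Predicted completion time in this domain, or None without a prediction."""
        run = self.predicted_run.get(task)
        return None if run is None else self.queue_wait + run


def negotiate(task: str, deadline: float, local: Agent,
              peers: List[Agent]) -> Tuple[Optional[str], Optional[float]]:
    """Keep the task locally if its prediction meets the deadline; otherwise
    collect estimates from all agents and choose the earliest completion
    (best effort, even if no domain can meet the deadline)."""
    local_est = local.estimate(task)
    if local_est is not None and local_est <= deadline:
        return local.domain, local_est
    offers: List[Tuple[str, float]] = []
    for agent in [local] + peers:
        est = agent.estimate(task)
        if est is not None:
            offers.append((agent.domain, est))
    if not offers:
        return None, None
    return min(offers, key=lambda offer: offer[1])


if __name__ == "__main__":
    local = Agent("domain-a", 100.0, {"fft": 50.0})
    peers = [Agent("domain-b", 10.0, {"fft": 60.0}),
             Agent("domain-c", 0.0, {"jacobi": 5.0})]
    print(negotiate("fft", 200.0, local, peers))  # ('domain-a', 150.0)
    print(negotiate("fft", 100.0, local, peers))  # ('domain-b', 70.0)
```

Because the local agent is listed first, ties are resolved in favour of running locally, which avoids moving a task between domains for no predicted gain.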


Multi-Domain Management
• Time to complete = 2752 s
• Time to complete = 467 s; an improvement of 83%
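The quoted 83% improvement follows from the two completion times:

$$\frac{2752 - 467}{2752} = \frac{2285}{2752} \approx 0.830, \qquad \frac{2752}{467} \approx 5.89$$

Equivalently, a speed-up of roughly 5.9×.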


QoS: Ability to Meet Deadline

[Figure: ability to meet deadlines; series labelled active and inactive]

Resource usage

[Figure: resource usage; series labelled active and inactive]
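The two measures on these slides, ability to meet deadlines and resource usage, can be computed from a completed schedule roughly as follows. The `Record` layout (host, start, end, deadline) and the utilisation definition (busy time divided by number of hosts × makespan) are assumptions for illustration, not the metrics' exact definitions in the original experiments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical record of one completed task."""
    host: str
    start: float      # seconds
    end: float        # seconds
    deadline: float   # absolute deadline in seconds


def deadline_hit_rate(records: List[Record]) -> float:
    """Fraction of tasks that finished on or before their deadline."""
    if not records:
        return 1.0  # vacuously, no deadline was missed
    return sum(r.end <= r.deadline for r in records) / len(records)


def utilisation(records: List[Record], hosts: List[str]) -> float:
    """Busy time on the given hosts divided by (number of hosts x makespan).
    Assumes tasks on the same host do not overlap in time."""
    if not records or not hosts:
        return 0.0
    span = max(r.end for r in records) - min(r.start for r in records)
    if span <= 0:
        return 0.0
    busy = sum(r.end - r.start for r in records if r.host in hosts)
    return busy / (len(hosts) * span)


if __name__ == "__main__":
    recs = [Record("h1", 0, 10, 12), Record("h1", 10, 20, 15), Record("h2", 0, 5, 5)]
    print(deadline_hit_rate(recs), utilisation(recs, ["h1", "h2"]))
```

In this toy schedule two of three tasks meet their deadlines and host `h2` sits idle for most of the run, so both numbers expose the kind of imbalance a co-scheduler aims to remove.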

Other work
• OGSA compatibility
• Prediction
  – Accuracy
  – Other prediction techniques
• Workflow (CCGrid 2003)
• Reservation
• V. 1.1, Condor/GT2-based
  – www.dcs.warwick.ac.uk/~hpsg
  – Documented at HPDC-12/GGF-8, FGCS