GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Xian-He Sun, Department of Computer Science, Illinois Institute of Technology, [email protected]. SC/APART, Nov. 22, 2002


TRANSCRIPT

Page 1: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Xian-He Sun

Department of Computer Science

Illinois Institute of Technology

[email protected]

SC/APART Nov. 22, 2002

Page 2: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Outline

• Introduction
– Concept and challenge

• The Grid Harvest Service (GHS) System
– Design methodology
– Measurement system
– Scheduling algorithms
– Experimental testing

• Conclusion

Scalable Computing Software Laboratory

Page 3: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Introduction

• Parallel Processing
– Two or more working entities work together toward a common goal for better performance

• Grid Computing
– Use distributed resources as a unified compute platform for better performance

• New Challenges of Grid Computing
– Heterogeneous systems, non-dedicated environments, relatively large data access delay

Page 4: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Degradations of Parallel Processing

Unbalanced Workload

Communication Delay

Overhead Increases with the Ensemble Size

Page 5: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Degradations of Grid Computing

Unbalanced Computing Power and Workload

Shared Computing and Communication Resource

Uncertainty, Heterogeneity, and Overhead Increases with the Ensemble Size

Page 6: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Performance Evaluation (Improving performance is the goal)

• Performance Measurement
– Metric, Parameter

• Performance Prediction
– Model, Application-Resource, Scheduling

• Performance Diagnosis/Optimization
– Post-execution, Algorithm improvement, Architecture improvement, State-of-the-art

Page 7: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Parallel Performance Metrics (Run-time is the dominant metric)

• Run-Time (Execution Time)

• Speed: MFLOPS, MIPS, CPI

• Efficiency: throughput

• Speedup

$$S_p = \frac{\text{Uniprocessor Execution Time}}{\text{Parallel Execution Time}}$$

• Parallel Efficiency

• Scalability: The ability to maintain performance gain when system and problem size increase

• Others: portability, programming ability, etc.
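As a small illustration of the speedup metric on this slide, the sketch below computes speedup from two measured run-times. The parallel efficiency function uses the standard definition (speedup divided by the number of processors), which the slide names but does not spell out; the function names are illustrative only.

```python
def speedup(uniprocessor_time: float, parallel_time: float) -> float:
    """Speedup S_p = uniprocessor execution time / parallel execution time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return uniprocessor_time / parallel_time


def parallel_efficiency(uniprocessor_time: float, parallel_time: float, p: int) -> float:
    """Parallel efficiency E_p = S_p / p (standard definition, assumed here)."""
    if p <= 0:
        raise ValueError("p must be a positive processor count")
    return speedup(uniprocessor_time, parallel_time) / p


# Example: a job that takes 100 s on one processor and 25 s on 8 processors.
print(speedup(100.0, 25.0), parallel_efficiency(100.0, 25.0, 8))
```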

Page 8: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Parallel Performance Models (Predicting run-time is the dominant goal)

• PRAM (parallel random-access machine) model
– EREW, CREW, CRCW

• BSP (bulk synchronous parallel) model
– Supersteps, phase parallel model

• Alpha and Beta model
– Communication startup time, data transfer time per byte

• Scalable computing model
– Scalable speedup, scalability

• LogP model
– L: latency, o: overhead, g: gap, P: the number of processors

• Others
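To make the Alpha and Beta model concrete, here is a minimal sketch that assumes the usual linear form: the time to send a message is the startup time plus the per-byte transfer time multiplied by the message size. The function and parameter names, and the sample values, are illustrative assumptions.

```python
def message_time(n_bytes: int, alpha: float, beta: float) -> float:
    """Alpha-beta estimate: alpha (startup, seconds) + beta (seconds per byte) * n_bytes."""
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    return alpha + beta * n_bytes


# Hypothetical link: 100 microseconds startup, 10 nanoseconds per byte.
alpha, beta = 1e-4, 1e-8
for size in (1_000, 1_000_000):
    print(size, message_time(size, alpha, beta))
```

Under this model small messages are dominated by the startup term and large messages by the per-byte term, which is why the two parameters are measured separately.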

Page 9: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Research Projects and Tools

• Parallel Processing
– Paradyn, W3 (why, when, and where)
– TAU, tuning and analysis utilities
– Pablo, Prophesy, SCALEA, SCALA, etc.
– for dedicated systems
– instrumentation, post-execution analysis, visualization, prediction, application performance, I/O performance

Page 10: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Research Projects and Tools

• Grid Computing
– NWS (Network Weather Service)
  • monitors and forecasts resource performance
– RPS (Resource Prediction System)
  • predicts CPU availability of a Unix system
– AppLeS (Application-Level Scheduler)
  • An application-level scheduler extended to non-dedicated environments, based on NWS
– Short-term system-level prediction

Page 11: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Do We Need

• New Metric for Computation Grid?
– ????

• New Model for Computation Grid?
– Yes
– Application-level performance prediction

• New Model for other Technical Advances?
– Yes
– Data access in hierarchical memory systems

Page 12: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

The Grid Harvest Service (GHS) System

• A long-term application-level performance prediction and scheduling system for non-dedicated (Grid) environments

• A new prediction model derived by probability analysis and simulation

• Non-intrusive measurement and scheduling algorithms

• Implementation and testing

Sun/Wu 02

Page 13: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

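Performance Model (Gong, Sun, Watson, 02)

• Remote job has low priority

• Local job arrival and service times are based on extensive monitoring and observation

[Figure: timeline of a remote task on machine k with segments X_1, Y_1, ..., X_S, Y_S, Z and the quantities w_k and T_k marked]

$$T_k = X_1 + Y_1 + X_2 + Y_2 + \cdots + X_S + Y_S + Z$$

$$T_k = w_k + Y_1 + Y_2 + \cdots + Y_S$$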

Page 14: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Prediction Formula

$$\Pr(T_k \le t) = \Pr(T_k \le t \mid S_k = 0)\Pr(S_k = 0) + \Pr(T_k \le t \mid S_k > 0)\Pr(S_k > 0)$$

$$\Pr(T_k \le t) = \begin{cases} e^{-\lambda_k w_k} + (1 - e^{-\lambda_k w_k})\Pr(U(S_k) \le t - w_k \mid S_k > 0), & \text{if } t \ge w_k \\ 0, & \text{if } t < w_k \end{cases}$$

$U(S_k) \mid S_k > 0 \sim$ Gamma distribution

• Arrival of local jobs follows a Poisson distribution with rate $\lambda_k$

• Execution time of the owner job follows a general distribution with mean $1/\mu_k$ and standard deviation $\sigma_k$

• Simulate the distribution of the local service rate and approximate it with a known distribution
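The single-machine model can be explored with a small Monte Carlo sketch. It rests on loudly stated assumptions that go beyond the slide: local service times are exponential with mean 1/mu (the slide only says "general distribution"), every local arrival preempts the remote task, and each interruption lasts one full busy period of the local queue. All function names are illustrative; this is a sanity check on the shape of the formula, not the GHS implementation.

```python
import random


def poisson_count(rng, mean):
    """Sample a Poisson(mean) count by counting unit-rate exponential gaps within `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def busy_period(rng, lam, mu):
    """One local busy period: the interrupting job plus all jobs arriving before the queue drains."""
    total, pending = 0.0, 1
    while pending > 0:
        service = rng.expovariate(mu)  # assumption: exponential service, mean 1/mu
        total += service
        pending += poisson_count(rng, lam * service) - 1
    return total


def completion_time(rng, w, lam, mu):
    """T = w + Y_1 + ... + Y_S, with S ~ Poisson(lam * w) interruptions."""
    interruptions = poisson_count(rng, lam * w)
    return w + sum(busy_period(rng, lam, mu) for _ in range(interruptions))


def simulate(w, lam, mu, runs=20000, seed=1):
    """Draw `runs` completion times; requires lam < mu so busy periods terminate."""
    if lam >= mu:
        raise ValueError("utilization lam/mu must be below 1")
    rng = random.Random(seed)
    return [completion_time(rng, w, lam, mu) for _ in range(runs)]


samples = simulate(w=2.0, lam=0.5, mu=2.0)
uninterrupted = sum(1 for t in samples if t == 2.0) / len(samples)
print("fraction finishing in exactly w:", uninterrupted)
```

The fraction of runs that finish in exactly w estimates Pr(S = 0), which the formula gives as exp(-lam * w); the remaining runs trace out the distribution that the slide approximates with a Gamma distribution.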

Page 15: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

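Prediction Formula

• Parallel task completion time

$$\Pr(T \le t) = \begin{cases} \prod_{k=1}^{m} \left[ e^{-\lambda_k w_k} + (1 - e^{-\lambda_k w_k}) \Pr(U(S_k) \le t - w_k \mid S_k > 0) \right], & \text{if } t \ge w_{\max} \\ 0, & \text{otherwise} \end{cases}$$

where $w_{\max} = \max_k \{w_k\}$

• Homogeneous parallel task completion time

$$\Pr(T \le t) = \begin{cases} \left[ e^{-\lambda w} + (1 - e^{-\lambda w}) \Pr(U(S) \le t - w \mid S > 0) \right]^{m}, & \text{if } t - w \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

• Mean time balancing partition

$$w_k = W \cdot \frac{(1 - \rho_k)\,\tau_k}{\sum_{j=1}^{m} (1 - \rho_j)\,\tau_j}$$

where $W$ is the total work, $\rho_k$ the utilization, and $\tau_k$ the computing capacity of machine $k$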
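The mean time balancing partition can be sketched in a few lines, assuming it assigns work in proportion to each machine's expected available capacity, (1 - rho_k) * tau_k, where rho_k is the local utilization and tau_k the computing capacity of machine k. The function name and the sample values are illustrative assumptions.

```python
def mean_time_partition(total_work, utilizations, capacities):
    """Split total_work so that w_k is proportional to (1 - rho_k) * tau_k."""
    if len(utilizations) != len(capacities):
        raise ValueError("need one utilization and one capacity per machine")
    weights = [(1.0 - rho) * tau for rho, tau in zip(utilizations, capacities)]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("no available capacity to assign work to")
    return [total_work * weight / total_weight for weight in weights]


# Two machines of equal speed; the first is busy with local jobs half the time,
# so it receives half as much work as the idle one.
print(mean_time_partition(90.0, [0.5, 0.0], [1.0, 1.0]))
```

The intent is that every machine has roughly the same expected completion time, in contrast to an equal-load split that ignores local utilization.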

Page 16: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

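Measurement Methodology

• A parameter has a population with a mean and a standard deviation. Given n samples with sample mean $\bar{x}$ and sample standard deviation $d$, a confidence interval for the population mean is

$$\left( \bar{x} - z_{1-\alpha/2}\, \frac{d}{\sqrt{n}},\ \ \bar{x} + z_{1-\alpha/2}\, \frac{d}{\sqrt{n}} \right)$$

• The smallest sample size n for a desired confidence level and a required accuracy of r percent is

$$n = \left( \frac{100\, z_{1-\alpha/2}\, d}{r\, \bar{x}} \right)^{2}$$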
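The two measurement formulas can be checked numerically. The sketch below assumes a normal approximation, takes the quantile z from Python's standard library, and treats the accuracy r as a percentage of the sample mean; the function names and sample values are illustrative.

```python
import math
from statistics import NormalDist


def z_quantile(confidence):
    """z_{1 - alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def confidence_interval(sample_mean, sample_std, n, confidence=0.95):
    """Interval (x - z*d/sqrt(n), x + z*d/sqrt(n)) for the population mean."""
    half_width = z_quantile(confidence) * sample_std / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def min_sample_size(sample_mean, sample_std, accuracy_percent, confidence=0.95):
    """Smallest n with half-width within accuracy_percent of the sample mean."""
    z = z_quantile(confidence)
    return math.ceil((100.0 * z * sample_std / (accuracy_percent * sample_mean)) ** 2)


# Pilot measurements: mean 10, standard deviation 2; target 5% accuracy at 95% confidence.
n = min_sample_size(10.0, 2.0, 5.0)
print(n, confidence_interval(10.0, 2.0, n))
```

A noisier parameter (larger d relative to the mean) needs more samples, which is the lever a least-intrusive measurement scheme can use to sample less often when the system is steady.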

Page 17: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Measurement and Prediction of Parameters

• Utilization

• Job Arrival

• Standard Deviation of Service Rate

• Least-Intrusive Measurement

[Equations not recoverable from the slide extraction: per-interval estimates built from job counts J (subscripts arrival, start, between) over the interval length T_interval, and an adaptive average Adapt_avg(x, t)]

Page 18: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Select $N_a$ previous days, $\{d_1, d_2, \ldots, d_{N_a}\}$, in the system measurement history;

For each day $d_k$ $(1 \le k \le N_a)$,
    $p(d_k) = \frac{1}{|X|} \sum_{i=1}^{|X|} p_i$,
    where $X$ means the set of $p_i$ measured during the time interval $(t_1, t_2)$ beginning from the day $d_k$;
End For

$p_s = \frac{1}{N_a} \sum_{i=1}^{N_a} p(d_i)$

Select $N_b$ previous continuous time intervals before $(t_1, t_2)$, calculate
    $p(d_m) = \frac{1}{|X|} \sum_{i=1}^{|X|} p_i$,
    where $X$ means the set of $p_i$ measured during $(t_m, t_{m+1})$;

$p_r = \frac{1}{N_b} \sum_{i=1}^{N_b} p(d_i)$

Output $\alpha \cdot p_s + \beta \cdot p_r$, where $\alpha + \beta = 1$ and $\alpha, \beta \ge 0$
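The parameter prediction procedure on this slide combines two averages: one over the same time-of-day interval on previous days, and one over the most recent intervals. A minimal sketch follows, assuming the output is a convex combination of the two with weights named alpha and 1 - alpha here (the weight symbols did not survive in the slide text); the function names and sample values are illustrative.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("need at least one measurement")
    return sum(values) / len(values)


def predict_parameter(same_interval_by_day, recent_intervals, alpha=0.5):
    """Blend a day-over-day average with a recent-interval average.

    same_interval_by_day: one list of measurements per previous day, each taken
        during the time-of-day interval being predicted.
    recent_intervals: one list of measurements per recent consecutive interval.
    alpha: weight on the day-over-day average; 1 - alpha goes to the recent one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_s = mean([mean(day) for day in same_interval_by_day])
    p_r = mean([mean(interval) for interval in recent_intervals])
    return alpha * p_s + (1.0 - alpha) * p_r


# Utilization seen in this interval on two earlier days, and in the last two intervals.
print(predict_parameter([[0.2, 0.4], [0.6]], [[0.1], [0.3]], alpha=0.5))
```

Setting alpha close to 1 trusts the daily pattern; setting it close to 0 trusts the most recent behavior.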

Page 19: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Scheduling Algorithm

Scheduling with a Given Number of Sub-tasks

List a set of lightly loaded machines $M = \{m_1, m_2, \ldots, m_q\}$;
List all possible sets of machines $S_k$ $(1 \le k \le z)$, such that $|S_k| = p$;
For each machine set $S_k$ $(1 \le k \le z)$,
    Use mean time balancing partition to partition the task
    Use the formula to calculate the mean and coefficient of variation
    If $E(T_{S_{p'}})\,(1 + \mathrm{Coe.}(T_{S_{p'}})) > E(T_{S_k})\,(1 + \mathrm{Coe.}(T_{S_k}))$, then $p' = k$;
End For
Assign parallel task to the machine set $S_{p'}$;
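The fixed-size scheduling step amounts to an exhaustive search over all machine subsets of size p, keeping the subset with the smallest value of E(T)(1 + Coe.(T)). The sketch below takes that scoring function as a parameter; `toy_score` is a hypothetical stand-in so the example runs, not the GHS prediction formula, and the (name, utilization, capacity) tuple layout is likewise an assumption.

```python
from itertools import combinations


def schedule_fixed_size(machines, p, score):
    """Return the p-machine subset with the smallest score, and that score."""
    best_set, best_score = None, float("inf")
    for candidate in combinations(machines, p):
        value = score(candidate)
        if value < best_score:
            best_set, best_score = candidate, value
    return best_set, best_score


def toy_score(candidate, total_work=100.0):
    """Hypothetical stand-in for E(T)(1 + Coe.(T)); not the GHS model."""
    capacity = sum((1.0 - rho) * tau for _, rho, tau in candidate)
    mean_time = total_work / capacity          # crude mean completion time
    coe = max(rho for _, rho, _ in candidate)  # crude variability proxy
    return mean_time * (1.0 + coe)


# Machines as (name, utilization rho, capacity tau); values are made up.
machines = [("a", 0.1, 1.0), ("b", 0.5, 1.0), ("c", 0.2, 2.0)]
best, value = schedule_fixed_size(machines, 2, toy_score)
print([name for name, _, _ in best], round(value, 2))
```

The search visits every combination, so its cost grows combinatorially with the number of lightly loaded machines; that cost is the motivation for a cheaper heuristic.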

Page 20: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Optimal Scheduling Algorithm

List a set of lightly loaded machines $M = \{m_1, m_2, \ldots, m_q\}$;
While $p \le q$ do
    Scheduling with $p$ Sub-tasks
    If $E(T_{S_k^{p'}})\,(1 + \mathrm{Coe.}(T_{S_k^{p'}})) > E(T_{S_k^{p}})\,(1 + \mathrm{Coe.}(T_{S_k^{p}}))$, then $p' = p$;
    End If
End while
Assign parallel task to the machine set $S_k^{p'}$.

Page 21: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Heuristic Scheduling Algorithm

• List a set of lightly loaded machines $M = \{m_1, m_2, \ldots, m_q\}$;
• Sort the machines in decreasing order of $(1 - \rho_k)\,\tau_k$;
• Use the task ratio to find the upper limit $q$;
• Use bi-section search to find the $p$ such that $E(T_{S_k^{p}})\,(1 + \mathrm{Coe.}(T_{S_k^{p}}))$ is minimum
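The heuristic can be sketched as: rank machines by available capacity, then search over how many of the top-ranked machines to use. The version below assumes the score of the first p ranked machines falls and then rises as p grows (unimodal), which is what makes a bisection valid; it also takes the upper limit as a plain parameter rather than deriving it from the task ratio. `toy_score` and the (name, utilization, capacity) tuples are hypothetical stand-ins, not the GHS model.

```python
def heuristic_schedule(machines, score, upper_limit=None):
    """Pick the best prefix of machines ranked by (1 - rho) * tau, using bisection."""
    if not machines:
        raise ValueError("need at least one machine")
    ranked = sorted(machines, key=lambda m: (1.0 - m[1]) * m[2], reverse=True)
    upper = len(ranked) if upper_limit is None else max(1, min(upper_limit, len(ranked)))
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        # If adding one more machine does not help, the minimum is at mid or earlier.
        if score(ranked[:mid]) <= score(ranked[:mid + 1]):
            hi = mid
        else:
            lo = mid + 1
    return ranked[:lo]


def toy_score(candidate, total_work=100.0, overhead_per_machine=2.0):
    """Hypothetical cost: shared work time plus a per-machine coordination overhead."""
    capacity = sum((1.0 - rho) * tau for _, rho, tau in candidate)
    return total_work / capacity + overhead_per_machine * len(candidate)


# Machines as (name, utilization rho, capacity tau); values are made up.
machines = [("d", 0.75, 1.0), ("b", 0.5, 2.0), ("a", 0.0, 2.0), ("c", 0.5, 1.0)]
print([name for name, _, _ in heuristic_schedule(machines, toy_score)])
```

Bisection evaluates the score only O(log q) times instead of once per subset, at the price of the unimodality assumption and of considering only prefixes of the ranked list.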

Page 22: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Embedded in Grid Run-time System

Page 23: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Experimental Testing

Application-level Prediction

Remote task completion time on single machine

Prediction error = |Prediction - Measurement| / Measurement

[Figure: prediction error (%) versus remote task execution time (0.5, 1, 2, 4, 8 hours); series: expectation+variation, expectation, expectation-variation]

Page 24: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Prediction of parallel task completion time

[Figure: prediction (%) versus parallel task execution time (0.5, 2, 8, 32, 128, 512 hours); series: expectation+variation, expectation, expectation-variation]

Prediction of a multi-processor with local scheduler

[Figure: prediction error (%) versus parallel task execution time (4, 8, 16 hours); series: expectation+variation, expectation, expectation-variation]

Page 25: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Partition and Scheduling

Comparison of three partition approaches

[Figure: execution time (m) versus task demand (1, 2, 4, 8 hours); series: equal-load (heterogeneous), mean-time, equal-load]

[Figure: execution time (m) versus task demand (1, 2, 4, 8 hours) on machine A and B respectively; series: equal-load (heterogeneous), mean-time, equal-load]

Page 26: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Performance Gain with Scheduling

Execution time with different scheduling strategies

[Figure: execution time (seconds) versus machine number (10, 15, 20); series: optimal, random (5 machines), random (10 machines), random (15 machines), 20 machines, heuristic]

Page 27: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Cost and Gain

[Figure: number of measurements per hour (0 to 18) over time (1 to 19)]

The number of measurements falls when the system is steady

Page 28: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

The calculation time of the prediction component

Node Number   8     16    32    64    128   256   512   1024
Time (s)      0.00  0.01  0.02  0.04  0.08  0.16  0.31  0.66

Page 29: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

The GHS System

• A good example and a success story

– Performance modeling

– Parameter measurement and prediction schemes

– Application-level performance prediction

– Partition and Scheduling

• It has its limitations too

– Communication and data access delay

Page 30: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

What We Know, What We Do Not

• We know there is no deterministic prediction in a non-deterministic shared environment. We do not know how to reach a fuzzy engineering solution.

[Diagram: candidate approaches: heuristic algorithms, rule of thumb, stochastic, AI, data mining, statistics, innovative methods, etc.]

Page 31: GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Conclusion

• Application-level Performance Evaluation

– Code-machine versus machine, alg., alg.-machine

• New Requirement under New Environments

We know we are making progress. We do not know if we can keep up with technology improvements.