
1

OPTIMAL SERVICE TASK PARTITION AND DISTRIBUTION IN GRID SYSTEM WITH STAR TOPOLOGY

GREGORY LEVITIN, YUAN-SHUN DAI

Adviser: Frank, Yeong-Sung Lin

Presented by Sean Chou

2

AGENDA

Introduction

The model

Algorithm for determining the pmf of the service time

Numerical example

Conclusions

3



4

INTRODUCTION

Grid computing is a newly developed technology for complex systems with large-scale resource sharing, wide-area communication, and multi-institutional collaboration [1].

This is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering.

5

INTRODUCTION

The sharing is controlled by a resource management system (RMS) [2].

When the RMS receives a service request from a user, the task can be divided into a set of execution blocks (EBs) that are executed in parallel.

The RMS assigns those EBs to available resources for execution.

After the resources finish the assigned jobs, they return the results to the RMS.

6

INTRODUCTION

The above grid service process can be approximated by a structure with star topology.

7

INTRODUCTION

The performance of grid computing is of great concern.

Usually the measure of grid performance is the task execution time (service time).

This index can be significantly improved by using the RMS, which divides a task into a set of EBs that can be executed in parallel by multiple online resources.

Many complicated and time-consuming tasks that could not be implemented before now work well in the grid computing environment.

8

INTRODUCTION

The service time is a random variable affected by many factors [3]:

1. There are many resources available online that have different task processing speeds.

2. Some resources can fail when running the jobs.

3. The communication links in the grid service can fail during the data transmission.

4. The choice of the group of subtasks assigned to the same EB and running on the same resource can influence the total amount of data transmitted between the RMS and the resource, since different subtasks can use common input data blocks.

9

INTRODUCTION

Most previous researchers separated performance and reliability into two different fields and studied them individually.

In fact, however, performance and reliability are closely related and affect each other, in particular when grid computing is implemented.

10

INTRODUCTION

For example, when a task is fully parallelized into n different EBs executed by n resources simultaneously, the performance is high but the reliability can be low, because the failure of any resource makes the entire task incomplete.

Therefore, it is worth having some redundant resources execute the same EB, especially when the resources are failure-prone.

However, too much redundancy, even though it improves the reliability, can decrease the performance by not fully parallelizing the task.

11

INTRODUCTION

Performance and reliability should be studied together in the grid service analysis.

The first model for evaluating the performance (service time) of a grid with star topology, taking the service reliability into account, was presented in [4].

12

INTRODUCTION

Optimizing the division of a service task into EBs and the distribution of these EBs among available grid resources can considerably improve the service performance.

This paper presents an algorithm for solving these optimization problems based on the model developed in [4].

13



14

THE MODEL

2.1. Service execution by the grid system with star architecture

2.2. Assumptions

2.3. Service execution time

2.4. Service reliability and expected performance

15

THE MODEL

Service execution by the grid system with star architecture

Different resources are distributed in the grid system.

The considered service can use a given set of resources.

All the resources and communication channels from this set are available at the time when the request for service arrives at the RMS.

16

THE MODEL

Each resource is directly connected to the RMS by a single communication channel, forming the star topology.

17

THE MODEL

The service task consists of subtasks that can be independently executed by different resources.

Different subtasks may need some common input data blocks for their execution.

The subtasks can be grouped into EBs.

The input data for any EB consists of the input data blocks necessary for executing all the subtasks belonging to this EB.

18

THE MODEL

The request for service (task execution) arrives at the RMS, which forms the EBs and assigns them to different resources for processing.

Each resource gets no more than one EB for processing.

The same EB can be assigned to several resources for parallel execution.

If the same EB is processed by several resources, it is completed when the first output is returned to the RMS.

The entire task is completed when all of the EBs are completed and their results are returned to the RMS from the resources.

19

THE MODEL

Assumptions

Each resource starts processing the assigned EB immediately after it gets all the necessary input data from the RMS through the corresponding communication channel.

Each resource sends the output data to the RMS through the same communication channel immediately after it completes the EB.

Each resource has a given constant processing speed when it is available.

Each resource has a given constant failure rate.

20

THE MODEL

Each communication channel has a constant data transmission speed (bandwidth) when it is available.

Each communication channel has a constant failure rate.

The subtasks belonging to an EB are processed in sequence.

The subtask processing time is proportional to its computational complexity.

The data transmission time is proportional to the amount of data transmitted between the RMS and a resource.

21

THE MODEL

The failure rates of the communication channels or resources are the same when they are idle or loaded (hot standby model).

The failures at different resources and communication channels are independent.

The RMS is fully reliable.

The time of task processing by the RMS (formation and assignment of EBs, sending them to the resources, receiving the results and integrating them into the entire task output) is negligible when compared with the EBs' processing time.

22

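THE MODEL

Service execution time

The entire task consists of m subtasks that can be executed independently.

Any EB i consists of a set $s_i$ of subtasks.

The computational complexity of EB i is the total complexity of the subtasks in $s_i$, since these subtasks are processed in sequence.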

23

THE MODEL

Each subtask j needs a set $B_j$ of data blocks as its input and produces an amount $O_j$ of output data.

The set of input data blocks necessary for the execution of EB i is $\bigcup_{j \in s_i} B_j$.

The amount of data to be transmitted from the RMS to the resource executing this EB is the total amount of data in the blocks of this set.

24

THE MODEL

The total amount of data (input and output) $D_i$ that should be transmitted between the RMS and a resource executing EB i is the amount of input data for the EB plus the output amounts $O_j$ of all subtasks j in $s_i$.
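As a rough illustration of the quantities on these slides, the following Python sketch computes the complexity and the transmitted data for one EB. The subtask data, the block names, and the function names `eb_complexity` and `eb_data` are assumptions made for this example; none of the numbers come from the paper.

```python
# Hypothetical toy data (not from the paper).
complexity = {1: 4.0, 2: 2.0, 3: 6.0}                 # complexity of subtask j
input_blocks = {1: {"a", "b"}, 2: {"b"}, 3: {"c"}}    # B_j: input blocks of subtask j
block_size = {"a": 10.0, "b": 5.0, "c": 8.0}          # amount of data per block
output_size = {1: 1.0, 2: 2.0, 3: 3.0}                # O_j: output of subtask j


def eb_complexity(subtasks):
    """Total complexity of an EB: its subtasks run in sequence."""
    return sum(complexity[j] for j in subtasks)


def eb_data(subtasks):
    """Total data D_i: each needed input block once, plus all outputs."""
    needed = set().union(*(input_blocks[j] for j in subtasks))
    data_in = sum(block_size[b] for b in needed)
    data_out = sum(output_size[j] for j in subtasks)
    return data_in + data_out


grouped = eb_data({1, 2})                 # block "b" is sent only once
separate = eb_data({1}) + eb_data({2})    # block "b" is sent twice
```

In this toy case, grouping subtasks 1 and 2 into one EB transmits 18 units instead of 23, because the shared block is sent only once.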

25

THE MODEL

The EB execution time is defined as the time from the beginning of input data transmission from the RMS to a resource to the end of output data transmission from the resource to the RMS.

Therefore, the random time $t_{ij}$ of EB i completion by resource j can take two possible values: a finite value (data transmission time plus processing time) if resource j and communication channel j do not fail until the subtask completion, and infinity otherwise.
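A minimal sketch of the two-valued completion time. The slide's formula did not survive, so the failure-free time is assumed here to be the transmission time plus the processing time, as the proportionality assumptions suggest. The parameter names `speed` and `bandwidth` are illustrative.

```python
import math


def failure_free_time(complexity, data, speed, bandwidth):
    # Assumed form: transmission time (data / bandwidth)
    # plus processing time (complexity / speed).
    return data / bandwidth + complexity / speed


def completion_time(complexity, data, speed, bandwidth, survived):
    # Finite if the resource and its channel both survive, infinite otherwise.
    if survived:
        return failure_free_time(complexity, data, speed, bandwidth)
    return math.inf


t_ok = completion_time(6.0, 18.0, speed=2.0, bandwidth=9.0, survived=True)
t_fail = completion_time(6.0, 18.0, speed=2.0, bandwidth=9.0, survived=False)
```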

26

THE MODEL

EB i can be successfully completed by resource j if this resource and communication link j do not fail before the end of subtask execution.

For constant failure rates of resource j and communication link j, the probability of EB success follows from the exponential distribution of the time to failure.
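The success probability can be sketched as follows, assuming independent failures with constant rates, as the model states. The argument names `resource_rate` and `channel_rate` are illustrative, not the paper's symbols.

```python
import math


def success_probability(resource_rate, channel_rate, t):
    # Both the resource and its channel must survive for time t.
    # With independent exponential lifetimes the survival
    # probabilities multiply, so the rates add in the exponent.
    return math.exp(-(resource_rate + channel_rate) * t)


p = success_probability(0.01, 0.02, 5.0)   # exp(-0.15)
```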

27

THE MODEL

Assume that each EB i is assigned to the resources composing set $o_i$ such that $o_i \cap o_j = \emptyset$ for any $i \neq j$.

The random time of EB i completion is the minimum of the completion times $t_{ij}$ over the resources j in $o_i$.

The entire task is completed when all of the subtasks (including the slowest one) are completed.

The random task execution time is therefore the maximum of the EB completion times.
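For one realization of the random times, the min/max structure described above can be sketched as below. The function names and the sample values are hypothetical.

```python
import math


def eb_completion_time(times):
    # An EB is done when the first of its resources returns a result.
    return min(times)


def task_completion_time(times_per_eb):
    # The task is done when the slowest EB is done.
    return max(eb_completion_time(ts) for ts in times_per_eb)


# EB 1 runs on two resources (one of them failed), EB 2 on one resource.
sample = [[5.0, math.inf], [7.0]]
total = task_completion_time(sample)
```

If every resource assigned to some EB fails, that EB's time is infinite and so is the task time.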

28

THE MODEL

Service reliability and expected performance

In order to estimate both the service reliability and the performance of a grid system, different measures can be used depending on the application.

The system reliability $R(y)$ is defined (according to the performability concept [5,6]) as the probability that the correct output is produced in time less than y.

29

THE MODEL

The service reliability is defined as the probability that the service produces correct outputs regardless of the service time. This index can be referred to as $R(\infty)$.

The conditional expected service time W is considered to be a measure of its performance.
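Given a pmf of the task time, the three indices can be sketched as follows. The pmf values are made up, and W is taken here to be the expected time conditioned on the task finishing, which is an assumption about what "conditional" means on this slide.

```python
import math


def reliability(pmf, y):
    # Probability that the correct output is produced in time less than y.
    return sum(p for t, p in pmf.items() if t < y)


def service_reliability(pmf):
    # Probability that the task finishes at all (finite time).
    return sum(p for t, p in pmf.items() if t < math.inf)


def conditional_expected_time(pmf):
    # Expected service time given that the task finishes.
    r = service_reliability(pmf)
    if r == 0.0:
        return math.inf
    return sum(t * p for t, p in pmf.items() if t < math.inf) / r


pmf = {10.0: 0.5, 20.0: 0.3, math.inf: 0.2}   # hypothetical task-time pmf
r_15 = reliability(pmf, 15.0)
r_inf = service_reliability(pmf)
w = conditional_expected_time(pmf)
```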

30

THE MODEL

The service task partition into EBs (represented by the sets $s_i$, $1 \le i \le h$) and the distribution of the EBs among the resources (represented by the sets $o_i$, $1 \le i \le h$) determine the service reliability and performance.

Two optimization problems:

31

AGENDA

Introduction

The model

Algorithm for determining the pmf of the service time

Numerical example

Conclusions

32

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME

The procedure used for the evaluation of the service time distribution is based on the universal generating function (u-function) technique.

Its high computational efficiency allows it to be used in optimization procedures where a large number of different solutions should be evaluated.

33

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME

The u-function $u_{i,\{j\}}(z)$ can define the pmf of the total completion time $t_{ij}$ for EB i assigned to resource j.

This u-function has two terms, one for each possible value of $t_{ij}$.
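One convenient way to hold such a u-function in code is a dictionary from exponent (time) to coefficient (probability). This representation and the name `single_resource_u` are choices made for this sketch, not something the slides prescribe.

```python
import math


def single_resource_u(t_ij, p_ij):
    # Dictionary form of p*z^t + (1 - p)*z^inf:
    # the EB finishes at time t with probability p, never otherwise.
    return {t_ij: p_ij, math.inf: 1.0 - p_ij}


u = single_resource_u(5.0, 0.9)
```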

34

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME

The total completion time of EB i assigned to a pair of resources k and j is equal to the minimum of the completion times for the two resources.

To obtain the u-function representing the pmf of this time, a composition operator with the minimum function should be used.

35

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME

The u-function representing the pmf of the completion time of EB i assigned to all of the resources from set $o_i$ can be obtained recursively by composing the u-functions of the individual resources one at a time.
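A hedged sketch of this recursive composition, with u-functions stored as dictionaries from time to probability. The helper names `compose` and `eb_u_function` and the numbers are assumptions. Multiplying coefficients relies on the independence of failures stated in the model.

```python
import math
from functools import reduce


def compose(u, v, f):
    """Pmf of f(X, Y) for independent X ~ u and Y ~ v."""
    out = {}
    for tu, pu in u.items():
        for tv, pv in v.items():
            t = f(tu, tv)
            out[t] = out.get(t, 0.0) + pu * pv   # collect like terms
    return out


def eb_u_function(resource_us):
    # Fold the resources in one at a time using the minimum function.
    return reduce(lambda u, v: compose(u, v, min), resource_us)


u1 = {5.0: 0.9, math.inf: 0.1}   # hypothetical resource 1
u2 = {8.0: 0.8, math.inf: 0.2}   # hypothetical resource 2
eb = eb_u_function([u1, u2])
```

Here the EB finishes at time 5 whenever resource 1 survives, at time 8 only when resource 1 fails and resource 2 survives, and never when both fail.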

36

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME

Having the u-functions $u_{i,o_i}(z)$ for each EB i ($1 \le i \le h$), one can obtain the u-function representing the pmf of the entire task completion time Y.

37

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME

The final u-function $U_h(z)$ represents the pmf of the random task completion time Y as a sum of terms, each pairing a possible value of Y with its probability.

38

ALGORITHM FOR DETERMINING THE PMF OF THE SERVICE TIME

Algorithm for determining service performance/reliability indices for an arbitrary task partition and distribution:
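The step list of this slide did not survive. As a hedged sketch, not the authors' listing, the steps implied by the preceding slides can be combined as follows: build a two-term u-function per (EB, resource) pair, compose with min within each EB, compose with max across EBs, and read the indices off the final pmf. The time formula, all numeric inputs, and the function names are assumptions made for illustration.

```python
import math
from functools import reduce


def compose(u, v, f):
    """Pmf of f(X, Y) for independent X ~ u and Y ~ v (dict: time -> prob)."""
    out = {}
    for tu, pu in u.items():
        for tv, pv in v.items():
            t = f(tu, tv)
            out[t] = out.get(t, 0.0) + pu * pv
    return out


def resource_u(complexity, data, res):
    """Two-term pmf for one EB on one resource (assumed time formula)."""
    t = data / res["bandwidth"] + complexity / res["speed"]
    p = math.exp(-(res["fail_rate"] + res["link_fail_rate"]) * t)
    return {t: p, math.inf: 1.0 - p}


def service_pmf(ebs, resources, assignment):
    """ebs[i] = (complexity, data); assignment[i] = resource ids for EB i."""
    eb_us = []
    for (complexity, data), ids in zip(ebs, assignment):
        us = [resource_u(complexity, data, resources[j]) for j in ids]
        eb_us.append(reduce(lambda u, v: compose(u, v, min), us))
    return reduce(lambda u, v: compose(u, v, max), eb_us)


def indices(pmf, y):
    """Return (R(y), R(inf), W) from the task-time pmf."""
    r_y = sum(p for t, p in pmf.items() if t < y)
    r_inf = sum(p for t, p in pmf.items() if t < math.inf)
    if r_inf == 0.0:
        return r_y, r_inf, math.inf
    w = sum(t * p for t, p in pmf.items() if t < math.inf) / r_inf
    return r_y, r_inf, w


# Hypothetical instance: two EBs, three resources, EB 0 duplicated.
resources = {
    0: {"speed": 2.0, "bandwidth": 9.0, "fail_rate": 0.01, "link_fail_rate": 0.01},
    1: {"speed": 1.0, "bandwidth": 9.0, "fail_rate": 0.02, "link_fail_rate": 0.01},
    2: {"speed": 2.0, "bandwidth": 4.0, "fail_rate": 0.01, "link_fail_rate": 0.02},
}
ebs = [(6.0, 18.0), (4.0, 8.0)]
assignment = [[0, 1], [2]]

pmf = service_pmf(ebs, resources, assignment)
r_y, r_inf, w = indices(pmf, y=6.0)
```

The dictionary form keeps only distinct completion times, so the pmf stays small even when many resource combinations lead to the same time.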

39



40

NUMERICAL EXAMPLE

Formulations (9) and (10) define a complicated NP-complete partitioning/allocation problem.

An exhaustive examination of all possible solutions is not realistic, considering reasonable time limitations.

41

NUMERICAL EXAMPLE

A heuristic search algorithm is needed which uses only estimates of solution quality and which does not require derivative information to determine the next direction of the search.

The genetic algorithm (GA) has proven to be an effective optimization tool for a large number of complicated problems in reliability engineering [10,11].
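The slides do not show the GA's encoding or operators, so the following is only a generic sketch of the idea: a small elitist GA over integer strings that needs nothing but a fitness value per candidate. The function name, parameters, operators, and the toy fitness are assumptions and not the configuration used in the paper.

```python
import random


def genetic_search(fitness, length, n_values, pop_size=20, generations=50, seed=0):
    """Maximize fitness over integer strings of the given length."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_values) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)    # one-point crossover
            child = a[:cut] + b[cut:]
            k = rng.randrange(length)         # mutate one gene
            child[k] = rng.randrange(n_values)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy use: gene k could stand for the EB index of subtask k. A real
# fitness would evaluate the reliability via the u-function procedure.
best = genetic_search(lambda c: -sum(c), length=8, n_values=3)
```

Only fitness comparisons drive the search, which matches the requirement that no derivative information be used.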

42

NUMERICAL EXAMPLE

Consider a grid service that uses six resources distributed in the grid system.

43

NUMERICAL EXAMPLE

The entire service task can be divided into eight independent subtasks.

44

NUMERICAL EXAMPLE

The amount of data in each input data block is presented in Table 4.

45

NUMERICAL EXAMPLE

First, the optimal task partition and distribution problem was solved by the GA for formulation (9).

The solutions for different allowed service times y are presented in Tables 5 and 6.

46

NUMERICAL EXAMPLE

Table 5 contains the obtained task partitions into EBs and their distribution among the resources.

47

NUMERICAL EXAMPLE

Table 6 contains the minimal and maximal possible service times, the service reliability, and the conditional expected service time for each obtained solution.

48

NUMERICAL EXAMPLE

The functions $R(y)$ for the obtained solutions are presented in Fig. 2.

It can be seen that the best solution obtained for a certain y provides the greatest reliability for this value of service time, whereas for other values of y it provides lower reliability than the solutions obtained for those values.

49

NUMERICAL EXAMPLE

50

NUMERICAL EXAMPLE

51



52

CONCLUSIONS

Grid technology is a newly developed method for large-scale distributed systems. This technology allows effective distribution of computational tasks among the different resources present in the grid.

The resource management system (RMS) can divide a service task into subtasks and send the subtasks to different resources for parallel execution.

53

CONCLUSIONS

For any given service task, the service reliability and performance indices depend on the task partition into EBs and their distribution among the available resources.

The suggested optimization algorithm is aimed at achieving the greatest reliability/performance by the optimal task partition and distribution.

54

CONCLUSIONS

Most previous researchers separated performance and reliability into two different fields and studied them individually.

In fact, however, performance and reliability are closely related and affect each other, in particular when grid computing is implemented.

This paper presents an algorithm for solving the optimization problems of task partition and distribution, evaluating the performance (service time) of a grid with star topology while taking the service reliability into account.

55

Thank you for listening.
