grid computing

Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery

Abstract

• There has been quite some research on the development of tools and techniques for grid systems.

• However, key issues such as grid service reliability and task scheduling in the grid have not been sufficiently studied.

• Some grid services have large subtasks that require lengthy computation, so the reliability of the grid service could be low.

Abstract

• To resolve these problems, we use a local node fault recovery (LNFR) mechanism and ant colony optimization (ACO).

• LNFR helps recover from faults that occur in grid systems.

• ACO helps optimize the scheduling of tasks that take a long time to compute.

CONTENTS

• Introduction

• Problem statement

• Existing System

• Proposed System

• Advantages of current system

• Modules

INTRODUCTION

• Grid computing has emerged as the next-generation parallel and distributed computing methodology.

• Its goal is to provide a service-oriented infrastructure that enables easy access to resources for solving various kinds of large-scale parallel applications over wide area networks.

• Grid computing is now widely accepted and actively researched.

System Architecture

[Figure: system architecture. Components shown: Resources Control Server; ACO scheduler and processor; Grid Node 1, Grid Node 2, Grid Node 3]

Problem statement

• To achieve a high level of reliability and availability, the grid infrastructure should be fault tolerant, since the failure of resources fatally affects job execution.

• A fault tolerance service is essential to satisfy QoS requirements in grid computing.

Existing system

• Grid service reliability can be defined as the probability that all subtasks involved in the considered service are executed successfully.

• The modeling and analysis of grid service reliability has mainly concentrated on resource management and sharing.

• It has not addressed fault recovery or task scheduling.
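The definition above can be illustrated with a minimal Python sketch. It assumes that subtasks fail independently, which the slides do not state, so the service reliability becomes the product of the per-subtask success probabilities. The function name and the sample probabilities are hypothetical.

```python
from math import prod


def service_reliability(subtask_success_probs):
    """Probability that every subtask succeeds.

    Assumption (not from the slides): subtasks succeed or fail
    independently, so the probabilities multiply.
    """
    for p in subtask_success_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(subtask_success_probs)


if __name__ == "__main__":
    # Hypothetical per-subtask success probabilities.
    print(service_reliability([0.99, 0.95, 0.90]))
```

Even with individually reliable subtasks, the product shrinks as more subtasks are added, which is one reason service-level reliability needs separate attention.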

Existing System

• Grid services may perform long tasks that require several days of computation.

• For grid services with large subtasks requiring time-consuming computation, the reliability of the grid service could be low.
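Why long subtasks lower reliability can be sketched as follows. The sketch assumes a constant failure rate per node (an exponential failure model), which is a common simplification and not something the slides specify. The function name and the numbers are hypothetical.

```python
import math


def subtask_reliability(failure_rate_per_hour, duration_hours):
    """Probability of no failure during execution.

    Assumption (not from the slides): failures arrive at a constant
    rate, so survival probability is exp(-rate * duration).
    """
    if failure_rate_per_hour < 0 or duration_hours < 0:
        raise ValueError("rate and duration must be non-negative")
    return math.exp(-failure_rate_per_hour * duration_hours)


if __name__ == "__main__":
    # Hypothetical node with 0.01 failures per hour.
    for hours in (1, 24, 72):
        print(hours, subtask_reliability(0.01, hours))
```

Under this assumption, reliability decays exponentially with execution time, so a subtask that runs for days is far more exposed to failure than one that runs for an hour.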

Proposed System

• The basic approach we propose for fault recovery in grid systems is a Remote Node Fault Recovery (RNFR) mechanism.

• When a failure occurs on a node, the state information can be migrated to another node, and the failed subtask execution is resumed from the interrupted point.

(or)

• The failed subtask can be dynamically rescheduled on another node, and that node restarts the subtask from the beginning.
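The two recovery options can be compared with a small Python sketch. It measures work in abstract units and ignores migration and rescheduling overhead, both of which are simplifying assumptions. The function name and the strategy labels are hypothetical.

```python
def work_after_failure(total_units, failed_at, strategy):
    """Work units the recovery node must execute after a failure.

    "resume":  state is migrated, execution continues from the
               interrupted point.
    "restart": the subtask is rescheduled and starts from the beginning.
    Overheads of migration and rescheduling are ignored (assumption).
    """
    if not 0 <= failed_at <= total_units:
        raise ValueError("failed_at must lie within the subtask")
    if strategy == "resume":
        return total_units - failed_at
    if strategy == "restart":
        return total_units
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Hypothetical subtask of 100 units that fails after 60 units.
    print(work_after_failure(100, 60, "resume"))
    print(work_after_failure(100, 60, "restart"))
```

Resuming saves the work already done but requires state information to be saved and transferred; restarting needs no saved state but repeats all completed work.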

Proposed System

• RNFR is useful and effective for recovering grid tasks from failures.

• However, some complex tasks may require several days of computation.

• Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented.

• An ant colony optimization (ACO) algorithm is developed to solve it effectively.
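A minimal Python sketch of ACO-based scheduling follows. It is not the model from the slides: the two objectives (makespan and unreliability) are collapsed into a single weighted sum, node failures are assumed to follow a constant rate, and all names, parameters, and sample values are hypothetical.

```python
import math
import random


def evaluate(assignment, sizes, speeds, fail_rates, weight=0.5):
    """Scalarized cost (lower is better): weighted makespan plus
    weighted unreliability. The weighted sum is an assumption."""
    busy = [0.0] * len(speeds)
    for task, node in enumerate(assignment):
        busy[node] += sizes[task] / speeds[node]
    makespan = max(busy)
    # Assumed exponential failure model per node.
    reliability = math.exp(-sum(r * t for r, t in zip(fail_rates, busy)))
    return weight * makespan + (1.0 - weight) * (1.0 - reliability)


def aco_schedule(sizes, speeds, fail_rates, ants=20, iterations=50,
                 evaporation=0.1, seed=0):
    """Assign each subtask to a node using a basic ant colony search."""
    if ants < 1 or iterations < 1:
        raise ValueError("ants and iterations must be positive")
    rng = random.Random(seed)
    n_tasks, n_nodes = len(sizes), len(speeds)
    # pheromone[t][n]: learned desirability of running subtask t on node n.
    pheromone = [[1.0] * n_nodes for _ in range(n_tasks)]
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        for _ in range(ants):
            # Each ant picks a node per subtask, biased by pheromone.
            assignment = [
                rng.choices(range(n_nodes), weights=pheromone[t])[0]
                for t in range(n_tasks)
            ]
            cost = evaluate(assignment, sizes, speeds, fail_rates)
            if cost < best_cost:
                best, best_cost = assignment, cost
        # Evaporate all trails, then reinforce the best-so-far assignment.
        for t in range(n_tasks):
            for n in range(n_nodes):
                pheromone[t][n] *= 1.0 - evaporation
            pheromone[t][best[t]] += 1.0 / (1.0 + best_cost)
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical subtask sizes, node speeds, and node failure rates.
    print(aco_schedule([4.0, 2.0, 6.0, 3.0], [1.0, 2.0], [0.01, 0.02]))
```

Because makespan and unreliability are on different scales, the weight in `evaluate` would need tuning or normalization in practice; a true multi-objective treatment would track a set of trade-off solutions instead of a single cost.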

MODULES

• The application is split into the following modules:

– USERS

– RESOURCE

– GRID

Users:

-- Connect to Grid

-- Send request to Grid

-- Get response from Grid

Resource:

-- Accept request from Grid

-- Process and send output to Grid

MODULES

Grid:

-- Maintain users/nodes

-- Take request from User

-- Schedule task to Resource

-- Accept the output from Resource

-- Validate the output

-- Reschedule the task (if output is invalid)

-- Notify the Grid of the fault generated

-- Send output to User (if valid)

-- Handle network failure and notify the Grid Manager
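The Grid module's responsibilities can be sketched as a simple dispatch loop in Python. This is an illustration under assumptions, not the project's implementation: resources are modeled as plain callables, network failure is modeled as `ConnectionError`, and `dispatch`, `validate`, and `max_attempts` are hypothetical names.

```python
def dispatch(request, resources, validate, max_attempts=3):
    """Schedule a request, validate the output, and reschedule on failure.

    resources: list of (name, callable) pairs (hypothetical model).
    Returns (output, fault_log) once a valid output is obtained.
    """
    if not resources:
        raise ValueError("no resources registered")
    fault_log = []
    for attempt in range(max_attempts):
        # Rotate through resources so a retry goes to a different node.
        name, process = resources[attempt % len(resources)]
        try:
            output = process(request)
        except ConnectionError as exc:
            # Network failure: register the fault and reschedule.
            fault_log.append((name, f"network failure: {exc}"))
            continue
        if validate(output):
            return output, fault_log
        # Invalid output: register the fault and reschedule.
        fault_log.append((name, "invalid output"))
    raise RuntimeError(
        f"request failed after {max_attempts} attempts: {fault_log}"
    )


if __name__ == "__main__":
    def faulty(x):
        return None  # simulates a resource returning invalid output

    def square(x):
        return x * x

    print(dispatch(4, [("node1", faulty), ("node2", square)],
                   validate=lambda out: out is not None))
```

The returned fault log corresponds to the fault registration step, and the caller would forward the valid output to the user.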

Process Model Used

• The process model used for the project Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery is the waterfall model.

Designs

UML

• Use Case Diagram

• Activity Diagram

• Interaction Diagram

-- Sequence Diagram

-- Collaboration Diagram

• Class Diagram

USE-CASE DIAGRAM

[Figure: use-case diagram. Use cases: connect to grid; resource specification; send request to grid; forward request to resource; process and return output; apply validation on outputs; reschedule task; send output to user. Other labels: resource, database]

ACTIVITY DIAGRAMS

Activity diagram for a user/node:

[Figure: activities shown: accept grid node information; connect to grid; view resources of grid; select function and pass values; send request to grid; display output; accept the output from grid]

ACTIVITY DIAGRAMS

Activity diagram for the grid:

[Figure: activities shown: start grid; resource specification; accept user connection; obtain o/p from resource; verify and transact; accept request from user; forward request to resource; view faults registration; send o/p to user; register fault details; validate o/p from resource. Branch label: invalid]

ACTIVITY DIAGRAMS

Activity diagram for the resource:

[Figure: activities shown: accept request from grid; process the request; send response to grid; start resource]

Interaction Diagrams

grid gsr : to resource info

1 : set resource info request()
2 : prompt for resource details()
3 : enter func name, specify resource node()
4 : verify and save()
5 : return status()
6 : view resource list request()
7 : verify and fetch records()
8 : load resource info()

Sequence diagram for grid:

user usersft : to grid connection

1 : user connection request()
2 : prompts for grid details()
3 : enter grid system IP address()
4 : verify establish connection()
5 : return status of connectivity()
6 : request func call to grid()
7 : verify and fetch function()
8 : load function and pass values()
9 : verify and send request to grid()
10 : load o/p to called function()

Sequence diagram for grid connectivity:

grid gsr : to requests resource fault info

1 : view user request()
2 : verify & fetch request()
3 : load request()
4 : verify & process req()
5 : load resource()
6 : trace & forward req to resource()
7 : verify & process request()
8 : return process output()
9 : validate output()
10 : store validation status()
11 : register fault details()
12 : send response to user()
13 : reschedule request()
14 : return output for request()
15 : send output to user()

Sequence diagram for grid to handle user request:

gsr : to resource info

1 : set resource info request()
2 : prompt for resource details()
3 : enter func name, specify resource node()
4 : verify and save()
5 : return status()
6 : view resource list request()
7 : verify and fetch records()
8 : load resource info()

Collaboration diagram for grid:

usersft : to grid connection

1 : user connection request()
2 : prompts for grid details()
3 : enter grid system IP address()
4 : verify establish connection()
5 : return status of connectivity()
6 : request func call to grid()
7 : verify and fetch function()
8 : load function and pass values()
9 : verify and send request to grid()
10 : load o/p to called function()

Collaboration diagram for grid connectivity:

gsr : to requests resource fault info

1 : view user request()
2 : verify & fetch request()
3 : load request()
4 : verify & process req()
5 : load resource()
6 : trace & forward req to resource()
7 : verify & process request()
8 : return process output()
9 : validate output()
10 : store validation status()
11 : register fault details()
12 : send response to user()
13 : reschedule request()
14 : return output for request()
15 : send output to user()

Collaboration diagram for grid to handle user request:

Class Diagram

+startserver()

Gthread

+connectclient()
+acceptreq()

+port
+servername
+uname
+pwd

+connect()
+sendrequest()
+getresponce()

Rthread

+acceptreqfromgrid()
+processreq()
+sendrestogrid()

Resource

+stratresource()

Snapshots

Conclusion

• Present-day organisations require greater reliability, which the proposed system can provide; it can be used in organisations and research areas.

